Rendering Pages with Puppeteer
Puppeteer is a Node.js library which provides an API to control Chromium using the DevTools Protocol.
JavaScript may be executed in the context of the headless browser, so any kind of change may be made to the page before taking a snapshot (PNG) or saving as a PDF.
Argument Parsing
Node.js did not ship with an argument parser until util.parseArgs() was added in version 18.3, but a simple loop over process.argv is enough to match arguments:
    function usage() {
      console.log('usage: save-report.js --url http_location --output out.pdf');
      process.exit(1);
    }

    /* Argument parsing */
    let url;
    let outputFile;
    for (let i = 2; i < process.argv.length; i++) {
      switch (process.argv[i]) {
        case '--url':
          url = process.argv[++i];
          break;
        case '--output':
          outputFile = process.argv[++i];
          break;
        default:
          usage();
      }
    }
Also assert that the required arguments are provided:
    if (!url || !outputFile) {
      usage();
    }
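On Node.js 18.3 or later, the built-in util.parseArgs() can stand in for the hand-written loop. The following is a minimal sketch under that version assumption; the parseCli helper name is hypothetical, and the option names simply mirror the --url and --output flags used above.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical helper: parse --url and --output from an argv-style array.
// Returns null when an option is unknown, lacks a value, or is missing.
function parseCli(argv) {
  let values;
  try {
    // In strict mode (the default) parseArgs throws on unknown options,
    // stray positionals, and string options that have no value.
    ({ values } = parseArgs({
      args: argv,
      options: {
        url: { type: 'string' },
        output: { type: 'string' },
      },
    }));
  } catch (err) {
    return null;
  }
  if (!values.url || !values.output) {
    return null;
  }
  return { url: values.url, outputFile: values.output };
}
```

A caller would typically pass process.argv.slice(2) and call usage() when the result is null.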
Main
The main part of this program consists of four steps:
- Create a new browser instance
- Open a new tab
- Wait until all resources are loaded (network idle)
- Render the page to a PDF
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch({
        defaultViewport: null,
        args: ['--ignore-certificate-errors', '--window-size=1500,1500'],
      });
      const page = await browser.newPage();

      /* Fetch page and wait for redirects */
      await page.goto(url, {
        waitUntil: 'networkidle0',
      });

      await page.pdf({
        path: outputFile,
        width: '11in',
        height: '17in',
        displayHeaderFooter: false,
        scale: 0.9,
      });

      await browser.close();
    })();
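If navigation or PDF rendering throws, the script above never reaches browser.close(). One way to guard against that is a try/finally around the page work. This is only a sketch: the renderPdf name is hypothetical, and the puppeteer module is passed in as a parameter (an assumption made here so the function can be exercised with a stand-in object).

```javascript
// Hypothetical wrapper around the four steps: launch, open a tab,
// wait for network idle, render to PDF. Callers would pass
// require('puppeteer') as the first argument.
async function renderPdf(puppeteer, url, outputFile) {
  const browser = await puppeteer.launch({
    defaultViewport: null,
    args: ['--ignore-certificate-errors', '--window-size=1500,1500'],
  });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.pdf({
      path: outputFile,
      width: '11in',
      height: '17in',
      displayHeaderFooter: false,
      scale: 0.9,
    });
  } finally {
    // Runs on success and on failure, so the browser is always closed.
    await browser.close();
  }
}
```

Any error from goto() or pdf() still propagates to the caller after the browser has been closed.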
Eval Parameters
Data can be passed from the main Node.js application to JavaScript running inside the Chrome instance by giving it to evaluate() as an extra argument; it arrives as a parameter of the arrow function:
    const params = {
      logo: '/img/logo.png',
    };

    /* Customize the page before rendering */
    await page.evaluate((params) => {
      document.getElementById('footer').innerHTML = `
        <img src="${params['logo']}">
      `;
    }, params);
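The function handed to evaluate() is serialized and run inside the page, so it cannot see variables from the Node.js scope; everything it needs has to arrive through its arguments, and whatever it returns is sent back to the caller. The sketch below illustrates this with a named page function that builds the image through DOM calls rather than an HTML string. The setFooterLogo name is hypothetical; the footer id and logo path are taken from the example above.

```javascript
// Hypothetical page function. It runs inside the browser, so it may only
// use its parameters and browser globals such as document.
function setFooterLogo(params) {
  const footer = document.getElementById('footer');
  if (!footer) {
    return false; // reported back to the Node.js caller
  }
  const img = document.createElement('img');
  img.src = params.logo;
  footer.textContent = ''; // clear any existing footer content
  footer.appendChild(img);
  return true;
}

// Inside the async main function, with `page` from browser.newPage():
//   const ok = await page.evaluate(setFooterLogo, { logo: '/img/logo.png' });
```

Returning a boolean lets the Node.js side decide whether a missing footer should abort the render.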
WaitFor Functions
After modifying the page, wait for content to load:
    await page.waitForNetworkIdle();
Because the page updates asynchronously, it may also be necessary to wait for an element to appear or disappear:
    await page.waitForSelector('.spinner', {hidden: true});
This call resolves once the element matching the CSS selector is removed, becomes invisible, or has a width or height of 0.
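waitForSelector() also accepts a visible option and a timeout in milliseconds, so the two kinds of wait can be chained. The following sketch assumes the rendered content lives in an element with the id report; that selector and the waitForReportReady name are hypothetical, while .spinner comes from the example above.

```javascript
// Hypothetical helper: wait for the spinner to go away, then for the
// content container to become visible. '#report' is an assumed selector.
async function waitForReportReady(page, timeoutMs = 10000) {
  // Resolves once no visible '.spinner' element remains.
  await page.waitForSelector('.spinner', { hidden: true, timeout: timeoutMs });
  // Resolves with a handle to the element once '#report' is visible.
  return page.waitForSelector('#report', { visible: true, timeout: timeoutMs });
}
```

If either wait exceeds the timeout, the promise rejects, so the caller can decide whether to retry or to render the page as it is.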