Eric Radman : a Journal

Rendering Pages with Puppeteer

Puppeteer is a Node.js library which provides an API to control Chromium using the DevTools Protocol.

JavaScript may be executed in the context of the headless browser, so any kind of change may be made to the page before taking a snapshot (PNG) or saving as a PDF.

Argument Parsing

Older Node.js releases do not ship with an argument parser (util.parseArgs arrived in Node.js 18.3), but we can make do with a loop over process.argv to match arguments

function usage() {
    console.log('usage: save-report.js --url http_location --output out.pdf');
    process.exit(1);
}

/* Argument parsing */
let url;
let outputFile;

/* argv[0] is the node binary and argv[1] is the script path, so start at 2 */
for (let i = 2; i < process.argv.length; i++) {
    switch (process.argv[i]) {
        case '--url':
            url = process.argv[++i];
            break;
        case '--output':
            outputFile = process.argv[++i];
            break;
        default:
            usage();
    }
}

Also assert that required arguments are provided

if (!url || !outputFile) {
    usage();
}
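
On newer Node.js releases the same two options could also be handled by the built-in util.parseArgs. The following is only a sketch under that assumption (Node.js 18.3 or later); the helper name parseCli is hypothetical and not part of the original script. By default parseArgs throws on unknown options, which stands in for the default branch of the loop above.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical helper: parse the same two options as the loop above
function parseCli(argv) {
    const { values } = parseArgs({
        args: argv,
        options: {
            url: { type: 'string' },
            output: { type: 'string' },
        },
    });
    // Both options are required
    if (!values.url || !values.output) {
        throw new Error('usage: save-report.js --url http_location --output out.pdf');
    }
    return { url: values.url, outputFile: values.output };
}
```

It would be called as parseCli(process.argv.slice(2)), with the caller printing the usage message and exiting when an error is thrown.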

Main

The main body of this program consists of four steps

  1. Create a new browser instance
  2. Open a new tab
  3. Wait until all resources are loaded (network idle)
  4. Render the page to a PDF

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        defaultViewport: null,
        args: ['--ignore-certificate-errors', '--window-size=1500,1500'],
    });
    const page = await browser.newPage();

    /* Fetch page, then wait until there have been no network connections for 500 ms */
    await page.goto(url, {
        waitUntil: 'networkidle0',
    });

    await page.pdf({
        path: outputFile,
        width: '11in',
        height: '17in',
        displayHeaderFooter: false,
        scale: 0.9,
    });
    await browser.close();
})();
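
If navigation or rendering throws, the listing above never reaches browser.close() and the Chromium process may be left running. One possible variation is to wrap the steps in try/finally. This is a sketch, not the original script: the helper name renderPdf is hypothetical, and the puppeteer module is passed in as the launcher argument only to keep the function easy to exercise in isolation.

```javascript
// Hypothetical helper: `launcher` is the object returned by require('puppeteer')
async function renderPdf(launcher, url, outputFile) {
    const browser = await launcher.launch({ defaultViewport: null });
    try {
        const page = await browser.newPage();
        await page.goto(url, { waitUntil: 'networkidle0' });
        await page.pdf({ path: outputFile, width: '11in', height: '17in' });
    } finally {
        // Runs on success and on failure, so the browser is always closed
        await browser.close();
    }
}

// Usage (assumes puppeteer is installed):
// renderPdf(require('puppeteer'), url, outputFile).catch((err) => {
//     console.error(err.message);
//     process.exit(1);
// });
```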

Eval Parameters

Data can be passed from the main Node.js application to JavaScript running inside the Chrome instance: declare a parameter on the arrow function, then supply the value as an extra argument to evaluate()

const params = {
    logo: '/img/logo.png',
};

/* customize page before rendering */
await page.evaluate((params) => {
    document.getElementById('footer').innerHTML = `
    <img src="${params['logo']}">
    `;
}, params);
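
The reason values have to be passed explicitly is that the function runs in the browser, not in Node.js, so variables from the enclosing Node.js scope are not visible there. The sketch below is a rough local simulation of that boundary, not Puppeteer's actual implementation: simulateEvaluate and demo are hypothetical names, and the simulation assumes the function travels as source text while arguments are copied by value.

```javascript
// Rough local simulation: only the function's source text and copies of its
// arguments reach the other side; the enclosing scope does not.
function simulateEvaluate(pageFunction, ...args) {
    // Rebuild the function from its source, which drops any closure
    const rebuilt = new Function(`return (${pageFunction.toString()});`)();
    // Copy arguments by value, as a serialization step would
    const copied = args.map((a) => JSON.parse(JSON.stringify(a)));
    return rebuilt(...copied);
}

function demo() {
    const params = { logo: '/img/logo.png' };
    // Explicit argument: the value arrives as `p`
    const passed = simulateEvaluate((p) => `<img src="${p.logo}">`, params);
    // Closure over `params`: the name does not exist on the other side
    let closureError = null;
    try {
        simulateEvaluate(() => `<img src="${params.logo}">`);
    } catch (err) {
        closureError = err.name;
    }
    return { passed, closureError };
}
```

In this simulation the explicit-argument call produces the image tag, while the closure version fails with a ReferenceError.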

WaitFor Functions

After modifying the page, wait for content to load

await page.waitForNetworkIdle();

Since pages load and update their content asynchronously, it may be necessary to watch for an element to appear or disappear

await page.waitForSelector('.spinner', {hidden: true});

With hidden: true, this function returns once the element matching the CSS selector is removed, is invisible, or has a width/height of 0.
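
If the element never disappears, waitForSelector rejects after its timeout (30 seconds by default). A small wrapper can turn that into a boolean so the caller decides whether to render anyway. This is a sketch: the helper name waitUntilHidden is hypothetical, and it assumes Puppeteer's timeout errors carry the name 'TimeoutError'.

```javascript
// Hypothetical helper: true once the selector is gone or hidden, false on timeout
async function waitUntilHidden(page, selector, timeoutMs = 10000) {
    try {
        await page.waitForSelector(selector, { hidden: true, timeout: timeoutMs });
        return true;
    } catch (err) {
        // Assumption: timeout errors are identified by their name
        if (err.name === 'TimeoutError') {
            return false;
        }
        // Anything else (closed page, bad selector) is a real failure
        throw err;
    }
}
```

It would be used as: if (!(await waitUntilHidden(page, '.spinner'))) console.error('spinner still visible');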