Eric Radman : a Journal

Rendering Pages with Puppeteer

Puppeteer is a Node.js library which provides an API to control Chromium using the DevTools Protocol.

JavaScript may be executed in the context of the headless browser, so any kind of change may be made to the page before taking a snapshot (PNG) or saving as a PDF.

Argument Parsing

Until v18.3.0, Node.js did not ship with an argument parser, but a loop that matches each argument works on any version:

function usage() {
    console.log('usage: save-report.js --url http_location --output out.pdf');
    process.exit(1);
}

/* Argument parsing */
let url;
let outputFile;

for (let i=2; i < process.argv.length; i++) {
    switch (process.argv[i]) {
        case '--url':
            url = process.argv[++i];
            break;
        case '--output':
            outputFile = process.argv[++i];
            break;
        default:
            usage();
    }
}

Then check that both required arguments were provided:

if (!url || !outputFile) {
    usage();
}
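On Node.js 18.3 and later, the built-in util.parseArgs can replace the hand-written loop. A minimal sketch, assuming the same --url and --output flags; the helper name parseCommandLine is hypothetical and not part of the original script:

```javascript
const { parseArgs } = require('node:util');

/* Hypothetical helper: parse --url and --output with the built-in parser.
   strict mode throws on unknown flags or on a flag missing its value;
   null is returned when either required flag is absent. */
function parseCommandLine(argv) {
    const { values } = parseArgs({
        args: argv,
        options: {
            url: { type: 'string' },
            output: { type: 'string' },
        },
        strict: true,
    });
    if (!values.url || !values.output) {
        return null;
    }
    return { url: values.url, outputFile: values.output };
}
```

In the script this would be called as parseCommandLine(process.argv.slice(2)), printing the usage message when it returns null or throws.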

Main

The main routine of this program consists of four steps:

  1. Launch a new browser instance
  2. Open a new tab
  3. Load the URL and wait until all resources are loaded (network idle)
  4. Render the page to a PDF
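
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        defaultViewport: null,  /* viewport follows the window size below */
        args: ['--ignore-certificate-errors', '--window-size=1500,1500'],
    });
    try {
        const page = await browser.newPage();

        /* Fetch the page, follow redirects, and wait until there have been
           no network connections for at least 500 ms */
        await page.goto(url, {
            waitUntil: 'networkidle0',
        });

        /* 11x17 inch sheet, scaled to 90% */
        await page.pdf({
            path: outputFile,
            width: '11in',
            height: '17in',
            displayHeaderFooter: false,
            scale: 0.9,
        });
    } finally {
        /* close Chromium even if navigation or rendering throws */
        await browser.close();
    }
})();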
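Since the page can also be saved as a PNG, the output format could be chosen from the file name. A minimal sketch using page.screenshot() and page.pdf(); the helper name renderPage and the "extension decides the format" rule are assumptions, not part of the original script:

```javascript
const path = require('node:path');

/* Hypothetical helper: render an already-loaded Puppeteer page.
   A .png extension produces a full-page screenshot; anything else
   falls back to the 11x17 inch PDF settings used in the main routine. */
async function renderPage(page, outputFile) {
    if (path.extname(outputFile).toLowerCase() === '.png') {
        await page.screenshot({ path: outputFile, fullPage: true });
    } else {
        await page.pdf({
            path: outputFile,
            width: '11in',
            height: '17in',
            scale: 0.9,
        });
    }
}
```

Inside the main routine, await renderPage(page, outputFile) would take the place of the page.pdf() call.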

Eval Parameters

Data can be passed from the Node.js application to JavaScript running inside the Chrome instance by giving evaluate() extra arguments after the function; each one is serialized and handed to the function as a parameter:

const params = {
    logo: '/img/logo.png',
};

/* customize page before rendering (inside the async main function);
   params arrives in the page as a serialized copy */
await page.evaluate((params) => {
    document.getElementById('footer').innerHTML = `
    <img src="${params['logo']}">
    `;
}, params);
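Values also travel the other way: whatever the evaluated function returns is serialized back to Node.js. A minimal sketch that reports whether the footer element was found; the helper name setFooterLogo is hypothetical, and the footer id is taken from the example above:

```javascript
/* Hypothetical helper: replace the footer with a logo and report back
   to Node.js whether a #footer element exists. params is serialized
   into the page, and the boolean result is serialized back out. */
async function setFooterLogo(page, params) {
    return page.evaluate((params) => {
        const footer = document.getElementById('footer');
        if (!footer) {
            return false;
        }
        footer.innerHTML = `<img src="${params.logo}">`;
        return true;
    }, params);
}
```

For example, if (!(await setFooterLogo(page, params))) could log a warning instead of failing with a TypeError inside the page.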

WaitFor Functions

After modifying the page, wait for any new content (such as the logo image) to load:

await page.waitForNetworkIdle();

Because scripts on the page may keep updating it after the network goes quiet, it may also be necessary to wait for an element to appear or disappear:

await page.waitForSelector('.spinner', {hidden: true});

With hidden: true, this call resolves once the element matching the CSS selector is removed, hidden, or has a width or height of 0.
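When no single selector signals that rendering is finished, page.waitForFunction() polls an arbitrary predicate inside the page. A minimal sketch combining the waits above with a check that every image has finished loading; the helper name waitForRender, the .spinner class, and the 30-second default timeout are assumptions:

```javascript
/* Hypothetical helper: wait for a page to settle before rendering.
   1. no network activity for 500 ms
   2. the (assumed) .spinner element removed or hidden
   3. every <img> finished loading or failed (img.complete is true either way) */
async function waitForRender(page, timeout = 30000) {
    await page.waitForNetworkIdle({ idleTime: 500, timeout });
    await page.waitForSelector('.spinner', { hidden: true, timeout });
    await page.waitForFunction(
        () => Array.from(document.images).every((img) => img.complete),
        { timeout },
    );
}
```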

Global Installation

Unlike most Node modules, Puppeteer downloads a copy of Chrome into the user's home directory (~/.cache/puppeteer), even when installed with npm -g. A shared location can be set using PUPPETEER_CACHE_DIR.

Chrome assumes that it can write to the user's home directory; if that is not the case, point XDG_CONFIG_HOME and XDG_CACHE_HOME at a writable location.

RUN groupadd -r app
RUN useradd -m -r -g app -G audio,video app
USER app
WORKDIR /home/app

ENV PUPPETEER_CACHE_DIR="/home/app"
ENV NODE_PATH="/home/app/node_modules"
RUN npm -d install puppeteer@23.8.0

ENV XDG_CONFIG_HOME=/tmp/.chromium
ENV XDG_CACHE_HOME=/tmp/.chromium

Putting all this together allows the Docker container to be run as any UID/GID using the --user flag of docker run.