Rendering Pages with Puppeteer
Puppeteer is a Node.js library which provides an API to control Chromium using the DevTools Protocol.
JavaScript may be executed in the context of the headless browser, so any kind of change may be made to the page before taking a snapshot (PNG) or saving as a PDF.
Argument Parsing
Older Node.js releases do not ship with an argument parser, but a short loop over process.argv is enough to match arguments:
    function usage() {
      console.log('usage: save-report.js --url http_location --output out.pdf');
      process.exit(1);
    }

    /* Argument parsing */
    let url;
    let outputFile;
    for (let i = 2; i < process.argv.length; i++) {
      switch (process.argv[i]) {
        case '--url':
          url = process.argv[++i];
          break;
        case '--output':
          outputFile = process.argv[++i];
          break;
        default:
          usage();
      }
    }
Also assert that the required arguments are provided:
    if (!url || !outputFile) {
      usage();
    }
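On Node.js 18.3 and later, the built-in util.parseArgs can take the place of the hand-written loop. The following is a minimal sketch under that assumption; the helper name parseCliArgs is hypothetical and not part of the original script.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical helper: parse --url and --output from an argv-style array.
function parseCliArgs(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: 'string' },
      output: { type: 'string' },
    },
    strict: true, // unknown flags or missing option values throw
  });
  // Both arguments are required, as in the loop-based version
  if (!values.url || !values.output) {
    throw new Error('usage: save-report.js --url http_location --output out.pdf');
  }
  return { url: values.url, outputFile: values.output };
}

// Typical call site:
//   const { url, outputFile } = parseCliArgs(process.argv.slice(2));
```

This version throws instead of calling process.exit(), which leaves it up to the caller to print the usage message and choose the exit code.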
Main
The main routine of this program consists of four steps:
- Create a new browser instance
- Open a new tab
- Wait until all resources are loaded (network idle)
- Render the page to a PDF
    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch({
        defaultViewport: null,
        args: ['--ignore-certificate-errors', '--window-size=1500,1500'],
      });
      const page = await browser.newPage();

      /* Fetch page and wait for redirects */
      await page.goto(url, {
        waitUntil: 'networkidle0',
      });

      await page.pdf({
        path: outputFile,
        width: '11in',
        height: '17in',
        displayHeaderFooter: false,
        scale: 0.9,
      });

      await browser.close();
    })();
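The same steps can also produce a PNG snapshot instead of a PDF. The sketch below is one possible arrangement, not the original script: the helper renderTo is hypothetical, it picks the output type from the file extension, and it takes an already launched browser object so the tab is closed even when navigation or rendering fails.

```javascript
const path = require('node:path');

// Hypothetical helper: `browser` is the object returned by puppeteer.launch().
async function renderTo(browser, url, outputFile) {
  const page = await browser.newPage();
  try {
    // Wait until all resources are loaded (network idle)
    await page.goto(url, { waitUntil: 'networkidle0' });
    if (path.extname(outputFile).toLowerCase() === '.png') {
      // Full-page PNG snapshot
      await page.screenshot({ path: outputFile, fullPage: true });
    } else {
      await page.pdf({ path: outputFile, width: '11in', height: '17in' });
    }
  } finally {
    // Close the tab even if goto(), screenshot() or pdf() throws
    await page.close();
  }
}

// Usage (assumes puppeteer is installed):
//   const browser = await puppeteer.launch();
//   try { await renderTo(browser, url, outputFile); } finally { await browser.close(); }
```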
Eval Parameters
Data can be passed from the main Node.js application to JavaScript running inside the Chrome instance by supplying it as extra arguments to evaluate(); each one is received as a parameter of the arrow function:
    const params = {
      logo: '/img/logo.png',
    };

    /* Customize page before rendering */
    await page.evaluate((params) => {
      document.getElementById('footer').innerHTML = `
        <img src="${params['logo']}">
      `;
    }, params);
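Arguments passed this way are serialized before they reach the browser, and the value the function returns is serialized back to Node.js. As a rough sketch of the return direction (the helper countMatches and its parameters are hypothetical names used only for illustration):

```javascript
// Hypothetical helper: `page` is a Puppeteer Page, `selector` is any CSS selector.
async function countMatches(page, selector) {
  // The arrow function runs inside the browser; `sel` arrives as a serialized copy
  // of `selector`, and the numeric result is sent back to Node.js.
  return page.evaluate((sel) => document.querySelectorAll(sel).length, selector);
}

// Usage, inside an async function:
//   const images = await countMatches(page, '#footer img');
```

Because values cross the process boundary in serialized form, plain data such as strings, numbers, and simple objects is the safest thing to pass in either direction.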
WaitFor Functions
After modifying the page, wait for any newly requested content to finish loading:
    await page.waitForNetworkIdle();
Because page updates happen asynchronously, it may be necessary to wait for an element to appear or disappear:
    await page.waitForSelector('.spinner', {hidden: true});
With hidden set to true, this function returns once the element matching the CSS selector is removed, becomes invisible, or has a width/height of 0.
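If the element never disappears, waitForSelector() rejects once its timeout expires (30 seconds by default). A minimal sketch of handling that case follows; the helper waitUntilHidden is hypothetical, and it assumes the rejection carries the name TimeoutError, as Puppeteer's timeout errors do.

```javascript
// Hypothetical helper: resolves to true when the element is gone or hidden,
// false if it is still visible after `timeoutMs` milliseconds.
async function waitUntilHidden(page, selector, timeoutMs = 5000) {
  try {
    await page.waitForSelector(selector, { hidden: true, timeout: timeoutMs });
    return true;
  } catch (err) {
    // Assumption: Puppeteer's timeout errors are named 'TimeoutError'
    if (err.name === 'TimeoutError') return false;
    throw err; // anything else is unexpected, so let it propagate
  }
}

// Usage, inside an async function:
//   if (!(await waitUntilHidden(page, '.spinner'))) {
//     console.warn('spinner still visible, rendering anyway');
//   }
```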
Global Installation
Unlike most Node.js modules, Puppeteer will install a copy of Chrome in the user's home directory even when installed with npm -g. A common location can be set using PUPPETEER_CACHE_DIR.
Chrome assumes that it can write to the user's home directory; if this is not the case, set XDG_CONFIG_HOME and XDG_CACHE_HOME to a writable location.
    RUN groupadd -r app
    RUN useradd -m -r -g app -G audio,video app
    USER app
    WORKDIR /home/app
    ENV PUPPETEER_CACHE_DIR="/home/app"
    ENV NODE_PATH="/home/app/node_modules"
    RUN npm -d install puppeteer@23.8.0
    ENV XDG_CONFIG_HOME=/tmp/.chromium
    ENV XDG_CACHE_HOME=/tmp/.chromium
Putting all this together allows the Docker container to be run as any UID/GID using the --user flag.