Generate a single PDF containing all pages of a website. Ideal for AI-based Retrieval-Augmented Generation (RAG) and Question Answering (QA) tasks.
- Portability - Combine multiple pages into a single shareable PDF
- AI Integration - Works with Google NotebookLM, ChatGPT GPTs, and other AI tools
- Visual Preservation - Maintains images and formatting for multimodal models
- Concurrent Processing - Processes multiple pages in parallel for faster generation
npx site2pdf-cli https://example.com
Output is saved to ./out/<domain>.pdf.
To install the tool globally on your machine from source, run:
git clone https://github.com/laiso/site2pdf.git
cd site2pdf
npm install
npm run build
npm link
After installation, you can run the tool directly using the site2pdf command from anywhere:
site2pdf <main_url> [url_pattern]
- Node.js (v18 or later recommended)
Puppeteer requires these system libraries:
sudo apt-get update
sudo apt-get install -y libnss3 libatk1.0-0 libatk-bridge2.0-0 libcups2 \
libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 \
libgbm1 libasound2
Note: On newer Ubuntu versions (24.04+), use libasound2t64 instead of libasound2.
npx site2pdf-cli <main_url> [url_pattern]

| Argument | Description |
|---|---|
| `<main_url>` | The starting URL to crawl and convert |
| `[url_pattern]` | Optional regex to filter which links to include (defaults to same domain) |

- Plain string: 'https://example.com/docs' - matches URLs containing this string
- Regex literal: '/https:\/\/example\.com\/docs/i' - full regex with flags
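
The two pattern forms above could be told apart roughly as follows. This is a minimal sketch, not site2pdf's actual implementation: `parseUrlPattern` and `filterLinks` are hypothetical names, and the rule "starts and ends with a slash means regex literal" is an assumption.

```typescript
// Hypothetical helper: turns the [url_pattern] argument into a RegExp.
// Sketch of the two documented forms only; the real parsing may differ.
function parseUrlPattern(pattern: string): RegExp {
  // Regex literal form: /body/flags, e.g. /https:\/\/example\.com\/docs/i
  const literal = /^\/(.+)\/([a-z]*)$/.exec(pattern);
  if (literal) {
    const [, body = "", flags = ""] = literal;
    return new RegExp(body, flags);
  }
  // Plain string form: escape regex metacharacters so the text matches literally.
  const escaped = pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(escaped);
}

// Keep only the links that match the pattern.
function filterLinks(links: string[], pattern: string): string[] {
  const re = parseUrlPattern(pattern);
  return links.filter((link) => {
    // Reset state so a "g" or "y" flag cannot skip matches between calls.
    re.lastIndex = 0;
    return re.test(link);
  });
}
```

Under this assumption, a plain string that happens to start and end with a slash (such as '/docs/') would be read as a regex literal, so a full URL is the safer way to write a plain-string filter.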
Basic usage (captures all same-domain links):
npx site2pdf-cli https://docs.example.com
Filter to specific section:
npx site2pdf-cli "https://www.typescriptlang.org/docs/handbook/" "https://www.typescriptlang.org/docs/handbook/2/"

| Variable | Description |
|---|---|
| `CHROME_PATH` | Path to a custom Chrome/Chromium executable |

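
As an illustration of how an environment variable like CHROME_PATH might be turned into browser launch options, here is a minimal sketch. `launchOptionsFromEnv` and `LaunchOptionsSketch` are hypothetical names; `executablePath` and `headless` are real Puppeteer launch options, but whether site2pdf wires them up this way is an assumption.

```typescript
// Minimal shape of the options used here; Puppeteer's real type has many more fields.
interface LaunchOptionsSketch {
  headless: boolean;
  executablePath?: string;
}

// Hypothetical helper: build launch options from an environment map.
function launchOptionsFromEnv(
  env: Record<string, string | undefined>,
): LaunchOptionsSketch {
  const options: LaunchOptionsSketch = { headless: true };
  const chromePath = env["CHROME_PATH"]?.trim();
  if (chromePath) {
    // Only override the default browser when a non-empty path is provided.
    options.executablePath = chromePath;
  }
  return options;
}
```

In a Node.js script the result could be passed along as `puppeteer.launch(launchOptionsFromEnv(process.env))`.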
On Windows, grant permissions to the Puppeteer cache:
icacls %USERPROFILE%/.cache/puppeteer/chrome /grant *S-1-15-2-1:(OI)(CI)(RX)
See Puppeteer Windows troubleshooting.
Chrome does not provide ARM64 binaries for Linux. You'll see errors like:
- "Failed to launch the browser process!"
- "chrome-linux64/chrome: 1: Syntax error: "(" unexpected"
See Chrome for Testing ARM64 Support Issue.
- Launches headless Chrome via Puppeteer
- Navigates to the main URL and extracts all matching links
- Generates a PDF for each page concurrently
- Merges all PDFs into a single document using pdf-lib
- Saves to ./out/<slugified-url>.pdf
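
The concurrent step in the list above can be pictured as a small worker pool. The sketch below is an assumption about the general technique, not the project's code: `mapWithConcurrency` and `renderPageStub` are hypothetical names, and the stub stands in for rendering a page with Puppeteer. Results are returned in input order, which would keep pages in crawl order if they were then merged.

```typescript
// Hypothetical helper: run `worker` over `items` with at most `limit` tasks in flight,
// returning results in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Stand-in for "render one page to PDF"; a real version would drive a browser page.
async function renderPageStub(url: string): Promise<string> {
  return `pdf-bytes-for:${url}`;
}
```

A call such as `mapWithConcurrency(urls, 4, renderPageStub)` would process at most four URLs at a time; the limit of 4 is an arbitrary example value.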
git clone https://github.com/laiso/site2pdf.git
cd site2pdf
npm install

| Command | Description |
|---|---|
| `npm run dev -- <main_url> [url_pattern]` | Run in development mode with watch |
| `npm run build` | Compile TypeScript |
| `npm test` | Run tests |
| `npx biome lint` | Check for lint issues |
| `npx biome format` | Format code |

Issues and pull requests are welcome. Please follow the existing code style and include tests for new features.
MIT