Private, structured invoice extraction that fits into your n8n workflow. Get JSON with line items, totals, and tax fields — the PDF never leaves your machine.
You have an n8n workflow that processes orders, syncs accounting data, or manages AP. But invoice PDFs still need manual data entry.
An email arrives with an invoice PDF attached (IMAP trigger), a file lands in a watched folder (local trigger), or a webhook fires from your billing system.
An HTTP Request node sends the PDF to the extraction API, which returns structured JSON: vendor, invoice number, date, line items with quantities and prices, tax, and total.
An IF node checks the math: does the sum of the line items equal the subtotal? Does subtotal + tax equal the total? Mismatches are flagged for human review.
Push validated data to Google Sheets, Xero, QuickBooks, Notion, Airtable, or any system with an n8n node. Batch invoices flow through automatically.
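If you want to recompute the math yourself rather than rely on the API's own flags, the validation step above could be sketched roughly like this. It is a minimal sketch, not the API's actual implementation: it assumes the field names from the sample response on this page (`line_items`, `line_total`, `subtotal`, `tax_amount`, `total`) and a two-decimal currency, and the helper names `toMinor` and `checkInvoiceMath` are made up for illustration.

```javascript
// Convert a decimal amount to integer minor units (pence/cents) so that
// comparisons are not thrown off by floating-point rounding.
// Assumption: the currency has two decimal places.
const toMinor = (amount) => Math.round(amount * 100);

// Hypothetical helper: re-run the two arithmetic checks on an extracted invoice.
function checkInvoiceMath(invoice) {
  const lineSum = invoice.line_items.reduce(
    (sum, item) => sum + toMinor(item.line_total),
    0
  );
  const subtotal = toMinor(invoice.subtotal);
  const tax = toMinor(invoice.tax_amount);
  const total = toMinor(invoice.total);

  return {
    lineItemsMatchSubtotal: lineSum === subtotal,
    totalMatches: subtotal + tax === total,
  };
}

// Figures taken from the sample response on this page.
const sample = {
  subtotal: 840.0,
  tax_amount: 168.0,
  total: 1008.0,
  line_items: [{ line_total: 640.0 }, { line_total: 200.0 }],
};

const result = checkInvoiceMath(sample);
const needsReview = !(result.lineItemsMatchSubtotal && result.totalMatches);
```

Working in integer minor units is a design choice: comparing raw floats such as `0.1 + 0.2` against `0.3` can fail even when the invoice is correct.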
Here’s the practical setup. This workflow watches an inbox for invoice PDFs, extracts the data, validates the math, and appends a row to Google Sheets.
Configure the IMAP node to watch your AP inbox. Filter for emails with PDF attachments — most invoice emails have the PDF inline or attached.
```json
{
  "node": "IMAP Email",
  "parameters": {
    "mailbox": "INBOX",
    "downloadAttachments": true,
    "options": {
      "allowUnauthorizedCerts": false
    }
  },
  "note": "Filter downstream for .pdf attachments"
}
```
Send the PDF binary to the extraction API endpoint. The response is structured JSON you can immediately use in downstream nodes.
```json
{
  "node": "HTTP Request",
  "parameters": {
    "method": "POST",
    "url": "https://api.useful-patch.com/v1/extract",
    "sendBody": true,
    "contentType": "multipart-form-data",
    "bodyParameters": {
      "parameters": [
        {
          "name": "file",
          "parameterType": "formBinaryData",
          "inputDataFieldName": "attachment_0"
        }
      ]
    },
    "sendHeaders": true,
    "headerParameters": {
      "parameters": [
        {
          "name": "Authorization",
          "value": "Bearer YOUR_API_KEY"
        }
      ]
    }
  }
}
```
```json
{
  "vendor": "Acme Supplies Ltd",
  "invoice_number": "INV-2026-0847",
  "invoice_date": "2026-04-10",
  "due_date": "2026-05-10",
  "currency": "GBP",
  "subtotal": 840.00,
  "tax_rate": 20,
  "tax_amount": 168.00,
  "total": 1008.00,
  "line_items": [
    {
      "description": "Office chairs (ergonomic)",
      "quantity": 4,
      "unit_price": 160.00,
      "line_total": 640.00
    },
    {
      "description": "Standing desk converter",
      "quantity": 2,
      "unit_price": 100.00,
      "line_total": 200.00
    }
  ],
  "confidence": 0.97,
  "validation": {
    "line_items_sum_matches_subtotal": true,
    "tax_calculation_matches": true,
    "total_matches": true
  }
}
```
Check the validation object. If all three checks pass, route to the “auto-approve” branch. If any fail, route to a Slack notification or email alert for human review.
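```json
{
  "node": "IF",
  "parameters": {
    "conditions": {
      "boolean": [
        {
          "value1": "={{ $json.validation.line_items_sum_matches_subtotal }}",
          "value2": true
        },
        {
          "value1": "={{ $json.validation.tax_calculation_matches }}",
          "value2": true
        },
        {
          "value1": "={{ $json.validation.total_matches }}",
          "value2": true
        }
      ]
    },
    "combineOperation": "all"
  }
}
```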
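If you would rather make the routing decision in code (for example, so the alert can name which check failed), something along these lines might work. This is a sketch under assumptions: `routeInvoice` and the branch labels are hypothetical, and it treats a missing `validation` object as a reason for review rather than approval.

```javascript
// Hypothetical helper: pick a branch based on the API's validation flags.
function routeInvoice(invoice) {
  const checks = invoice.validation ?? {};

  // Collect the names of every check that did not explicitly pass.
  const failed = Object.entries(checks)
    .filter(([, passed]) => passed !== true)
    .map(([name]) => name);

  // Assumption: no validation data at all is treated as "needs review".
  const missingChecks = Object.keys(checks).length === 0;
  const review = failed.length > 0 || missingChecks;

  return { branch: review ? "review" : "auto-approve", failed };
}

const decision = routeInvoice({
  validation: {
    line_items_sum_matches_subtotal: true,
    tax_calculation_matches: false,
    total_matches: true,
  },
});
// decision.failed lists the failing check names for the Slack or email alert.
```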
Map the extracted fields to your spreadsheet columns. Each invoice becomes a row with all the structured data ready for reconciliation.
```json
{
  "node": "Google Sheets",
  "parameters": {
    "operation": "append",
    "documentId": "YOUR_SHEET_ID",
    "sheetName": "Invoices",
    "columns": {
      "mappingMode": "defineBelow",
      "value": {
        "Vendor": "={{ $json.vendor }}",
        "Invoice #": "={{ $json.invoice_number }}",
        "Date": "={{ $json.invoice_date }}",
        "Due": "={{ $json.due_date }}",
        "Subtotal": "={{ $json.subtotal }}",
        "Tax": "={{ $json.tax_amount }}",
        "Total": "={{ $json.total }}",
        "Line Items": "={{ $json.line_items.length }}",
        "Confidence": "={{ $json.confidence }}"
      }
    }
  }
}
```
| Approach | Privacy | Structured Output | Per-Page Cost | Setup Difficulty | Line Items |
|---|---|---|---|---|---|
| Useful Patch API | ✓ Client-side option | ✓ Full JSON | ✓ Flat rate | ✓ HTTP node only | ✓ Yes |
| Rossum | ✗ Cloud upload | ✓ Full JSON | ✗ ~$0.30/page | ⚠ OAuth + webhook | ✓ Yes |
| Nanonets | ✗ Cloud upload | ✓ Full JSON | ✗ ~$0.10/page | ✓ REST API | ✓ Yes |
| Mindee | ✗ Cloud upload | ✓ Full JSON | ✗ ~$0.08/page | ✓ REST API | ✓ Yes |
| Google Document AI | ⚠ Google Cloud | ✓ Full JSON | ✗ ~$0.05/page | ✗ GCP setup + auth | ✓ Yes |
| Amazon Textract | ⚠ AWS | ⚠ Tables only | ✗ ~$0.015/page | ✗ IAM + S3 bucket | ⚠ Raw tables |
| Tesseract (local) | ✓ Fully local | ✗ Raw text only | ✓ Free | ✗ Regex parsing needed | ✗ Manual |
| Ollama + Vision LLM | ✓ Fully local | ⚠ Unpredictable | ✓ Free (GPU cost) | ✗ GPU + prompt tuning | ⚠ Hallucinates |
Pricing reflects published rates as of April 2026. Per-page costs vary by volume tier.
Invoices contain some of the most sensitive business data: vendor relationships, pricing agreements, payment terms, bank details, tax IDs. Uploading them to a cloud OCR service means sending all of that to a third party.
Client-side extraction avoids all of this. The PDF is processed in your browser or on your own infrastructure. Only the structured JSON result — which contains no raw document images — moves through your n8n workflow.
The most common setup. IMAP trigger watches your AP inbox. Invoice PDFs are extracted and pushed directly to Xero, QuickBooks, or FreshBooks via their respective n8n nodes. Validation checks catch OCR errors before they hit your books.
For teams that collect invoices in a shared drive folder. A cron-triggered workflow scans for new PDFs, extracts each one, and appends rows to a master spreadsheet. Good for monthly reconciliation or audit prep.
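The "scan for new PDFs" step in the shared-folder setup above boils down to filtering a file listing against what has already been processed. A minimal sketch, assuming you keep the set of processed file names somewhere between runs; `selectNewPdfs` is an illustrative name, not an n8n function.

```javascript
// Hypothetical helper: keep only PDF files that have not been processed yet.
function selectNewPdfs(fileNames, processed) {
  return fileNames.filter(
    (name) => name.toLowerCase().endsWith(".pdf") && !processed.has(name)
  );
}

// Example run: one file is already done, one is not a PDF.
const processed = new Set(["inv-001.pdf"]);
const batch = selectNewPdfs(
  ["inv-001.pdf", "INV-002.PDF", "notes.txt"],
  processed
);

// Mark the new files as processed so the next scheduled run skips them.
batch.forEach((name) => processed.add(name));
```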
Your procurement system sends a webhook when a new invoice is received. n8n extracts the data, checks it against the purchase order (amount match, vendor match), routes for approval via Slack or email, and triggers payment on approval. Full AP automation loop.
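The purchase-order check in this flow (amount match, vendor match) might look roughly like the following. Treat it as a sketch: the PO shape `{ vendor, amount }`, the choice to compare the PO amount against the invoice `total`, and the helper names are all assumptions you would adapt to your procurement system.

```javascript
// Loosely normalise vendor names so "ACME Supplies Ltd." matches "Acme Supplies Ltd".
function normalizeVendor(name) {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Hypothetical helper: compare an extracted invoice against a purchase order.
// toleranceMinor is the allowed difference in minor units (pence/cents).
function matchesPurchaseOrder(invoice, po, toleranceMinor = 0) {
  const vendorMatch =
    normalizeVendor(invoice.vendor) === normalizeVendor(po.vendor);

  const diff = Math.abs(
    Math.round(invoice.total * 100) - Math.round(po.amount * 100)
  );
  const amountMatch = diff <= toleranceMinor;

  return { vendorMatch, amountMatch, approved: vendorMatch && amountMatch };
}

const check = matchesPurchaseOrder(
  { vendor: "Acme Supplies Ltd", total: 1008.0 },
  { vendor: "ACME Supplies Ltd.", amount: 1008 }
);
// check.approved decides whether the invoice goes straight to the approval step.
```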
Invoices arrive in different formats from different vendors — some PDF, some email body, some scanned images. n8n routes each type through the appropriate extraction path, normalizes the output to a common schema, and pushes to your ERP. The extraction API handles layout variation automatically.
| System | n8n Node | What to Map |
|---|---|---|
| Google Sheets | Google Sheets | Append row: vendor, invoice #, date, total, line item count |
| Xero | Xero | Create bill: contact (vendor), date, line items, tax rate, currency |
| QuickBooks | QuickBooks Online | Create bill: vendor, line items, total, terms, due date |
| Notion | Notion | Create database entry: all fields as properties, PDF link as file |
| Airtable | Airtable | Create record: vendor, amounts, status (extracted/validated/approved) |
| Slack | Slack | Post summary: “New invoice from [vendor]: [total] due [date]” + approve button |
| PostgreSQL | Postgres | INSERT invoice header + line items into normalised tables |
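For the PostgreSQL row, the extracted JSON has to be split into one header record and one record per line item before the INSERTs. A possible shape for that split is sketched below; the column names (including `line_no`) are illustrative and not a schema the API prescribes.

```javascript
// Hypothetical helper: split an extracted invoice into a header row and
// line-item rows keyed by invoice number, ready for two normalised tables.
function toRows(invoice) {
  const header = {
    invoice_number: invoice.invoice_number,
    vendor: invoice.vendor,
    invoice_date: invoice.invoice_date,
    due_date: invoice.due_date,
    currency: invoice.currency,
    subtotal: invoice.subtotal,
    tax_amount: invoice.tax_amount,
    total: invoice.total,
  };

  const lines = invoice.line_items.map((item, index) => ({
    invoice_number: invoice.invoice_number, // foreign key back to the header
    line_no: index + 1, // 1-based position on the invoice
    description: item.description,
    quantity: item.quantity,
    unit_price: item.unit_price,
    line_total: item.line_total,
  }));

  return { header, lines };
}

const { header, lines } = toRows({
  vendor: "Acme Supplies Ltd",
  invoice_number: "INV-2026-0847",
  invoice_date: "2026-04-10",
  due_date: "2026-05-10",
  currency: "GBP",
  subtotal: 840.0,
  tax_amount: 168.0,
  total: 1008.0,
  line_items: [
    { description: "Office chairs (ergonomic)", quantity: 4, unit_price: 160.0, line_total: 640.0 },
    { description: "Standing desk converter", quantity: 2, unit_price: 100.0, line_total: 200.0 },
  ],
});
```

The same `lines` array also works for spreadsheets if you prefer one row per line item instead of a line item count.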
Yes. Use a client-side extraction approach where the PDF is processed locally (in-browser or on your own server) and only the extracted JSON moves through your n8n workflow. This avoids sending sensitive financial documents to third-party OCR services.
A good extraction setup returns: vendor name, invoice number, invoice date, due date, currency, subtotal, tax amount, tax rate, total, and a line items array with description, quantity, unit price, and line total for each row.
Use an HTTP Request node. POST the invoice file as binary data or base64. The API returns JSON you can route directly to Google Sheets, Xero, QuickBooks, or any downstream n8n node. For batch processing, use Split In Batches.
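To test your API key outside n8n, the same POST can be reproduced in a few lines of JavaScript. This is a sketch, assuming Node 18+ (global `fetch`, `FormData`, and `Blob`); the endpoint URL, the `file` field name, and the Bearer header are copied from the HTTP Request node config on this page, and the function names are made up for illustration.

```javascript
// Endpoint taken from the HTTP Request node example on this page.
const EXTRACT_URL = "https://api.useful-patch.com/v1/extract";

// Build the multipart request without sending it, so it can be inspected.
function buildExtractRequest(pdfBytes, fileName, apiKey) {
  const form = new FormData();
  form.append(
    "file",
    new Blob([pdfBytes], { type: "application/pdf" }),
    fileName
  );

  return {
    url: EXTRACT_URL,
    options: {
      method: "POST",
      // Do not set Content-Type manually: fetch adds the multipart boundary.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}

// Send the PDF and return the parsed JSON, failing loudly on HTTP errors.
async function extractInvoice(pdfBytes, fileName, apiKey) {
  const { url, options } = buildExtractRequest(pdfBytes, fileName, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Extraction failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Calling `extractInvoice(bytes, "invoice.pdf", "YOUR_API_KEY")` with the bytes of a real PDF should then resolve to the structured JSON, provided the API accepts the request in this form.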
For truly free and local, Tesseract with custom templates works but needs significant setup per layout. For structured extraction without per-page costs, the Useful Patch free tier handles browser-based processing. The Developer plan (£29/mo) adds API access with no per-page fees.
Yes. Use an IMAP Email Trigger node filtered to messages with PDF attachments. Route the attachment binary through an HTTP Request node to your extraction API. The returned JSON feeds directly into your accounting or ERP nodes.
Those are dedicated document AI platforms with per-page pricing ($0.05–$0.50/page) that require uploading invoices to their cloud. Using n8n with an extraction API gives you workflow control, privacy (client-side processing option), and flat-rate pricing.
Try the extractor free in your browser, or grab an API key for your n8n HTTP Request nodes. No per-page fees.