n8n Invoice Extraction: PDF to JSON Automation Without Uploading Files

The Problem: Invoice Data Entry in Automated Workflows

You have an n8n workflow that processes orders, syncs accounting data, or manages accounts payable (AP). But invoice PDFs still need manual data entry because:

Cloud OCR services require uploading invoices, which is a privacy and compliance concern for financial documents
Tesseract alone doesn’t understand invoices: it reads characters, not structured fields like line items, totals, or tax rates
Per-page pricing adds up fast: Rossum, Nanonets, and Mindee charge $0.05–$0.50 per page, and the cost scales with every batch run
Self-hosted LLM extraction tends to be slow and unreliable: running Ollama for document understanding generally needs a GPU and can still hallucinate amounts

What n8n users actually need: a node that takes a PDF invoice in and gives structured JSON out — vendor, date, total, tax, line items — without per-page fees or file uploads to third-party clouds.

How It Works: n8n + Invoice Extraction API

Trigger

An email arrives with an invoice PDF attachment (Email Trigger (IMAP) node), a file lands in a watched folder (Local File Trigger node), or a webhook fires from your billing system.

Extract

The HTTP Request node sends the PDF to the extraction API, which returns structured JSON: vendor, invoice number, date, line items with quantities and prices, tax, and total.

Validate

IF node checks math: does sum of line items equal the subtotal? Does subtotal + tax equal the total? Flag mismatches for human review.

Route

Push validated data to Google Sheets, Xero, QuickBooks, Notion, Airtable, or any system with an n8n node. Batch invoices flow through automatically.

n8n Workflow: Invoice Email → Extraction → Google Sheets

Here’s the practical setup. This workflow watches an inbox for invoice PDFs, extracts the data, validates the math, and appends a row to Google Sheets.

1. IMAP Email Trigger

Configure the IMAP node to watch your AP inbox. Filter for emails with PDF attachments — most invoice emails have the PDF inline or attached.

{
  "node": "IMAP Email",
  "parameters": {
    "mailbox": "INBOX",
    "downloadAttachments": true,
    "options": {
      "allowUnauthorizedCerts": false
    }
  },
  "note": "Filter downstream for .pdf attachments"
}

2. HTTP Request — Extract Invoice

Send the PDF binary to the extraction API endpoint. The response is structured JSON you can immediately use in downstream nodes.

{
  "node": "HTTP Request",
  "parameters": {
    "method": "POST",
    "url": "https://api.useful-patch.com/v1/extract",
    "sendBody": true,
    "contentType": "multipart-form-data",
    "bodyParameters": {
      "parameters": [
        {
          "name": "file",
          "parameterType": "formBinaryData",
          "inputDataFieldName": "attachment_0"
        }
      ]
    },
    "sendHeaders": true,
    "headerParameters": {
      "parameters": [
        {
          "name": "Authorization",
          "value": "Bearer YOUR_API_KEY"
        }
      ]
    }
  }
}
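For readers who want to see what the node sends on the wire, here is a minimal Python sketch of the same request built with the standard library. It only constructs the request and does not send it. The URL, the `file` field name, and the Bearer header come from the node configuration above; the helper name `build_extract_request` and the `application/pdf` part type are assumptions made for illustration.

```python
import uuid
import urllib.request

# Endpoint taken from the HTTP Request node configuration above.
API_URL = "https://api.useful-patch.com/v1/extract"


def build_extract_request(pdf_bytes, filename, api_key):
    """Build (but do not send) a multipart/form-data POST carrying one PDF.

    Hypothetical helper: it mirrors the node's single "file" form field.
    The filename is assumed to contain no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, method="POST", headers=headers)
```

Passing the returned object to `urllib.request.urlopen` would perform the upload; keeping construction separate makes the request easy to inspect before any file leaves your machine.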

3. Example API Response

{
  "vendor": "Acme Supplies Ltd",
  "invoice_number": "INV-2026-0847",
  "invoice_date": "2026-04-10",
  "due_date": "2026-05-10",
  "currency": "GBP",
  "subtotal": 840.00,
  "tax_rate": 20,
  "tax_amount": 168.00,
  "total": 1008.00,
  "line_items": [
    {
      "description": "Office chairs (ergonomic)",
      "quantity": 4,
      "unit_price": 160.00,
      "line_total": 640.00
    },
    {
      "description": "Standing desk converter",
      "quantity": 2,
      "unit_price": 100.00,
      "line_total": 200.00
    }
  ],
  "confidence": 0.97,
  "validation": {
    "line_items_sum_matches_subtotal": true,
    "tax_calculation_matches": true,
    "total_matches": true
  }
}

4. IF Node — Math Validation

Check the validation object. If all three checks pass, route to the “auto-approve” branch. If any fail, route to a Slack notification or email alert for human review.

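{
  "node": "IF",
  "parameters": {
    "conditions": {
      "boolean": [
        {
          "value1": "={{ $json.validation.line_items_sum_matches_subtotal }}",
          "value2": true
        },
        {
          "value1": "={{ $json.validation.tax_calculation_matches }}",
          "value2": true
        },
        {
          "value1": "={{ $json.validation.total_matches }}",
          "value2": true
        }
      ]
    },
    "combineOperation": "all"
  }
}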
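The IF node relies on the flags the API reports. If you would rather recompute the arithmetic yourself, for example in a Code node, the three checks can be sketched as below. This is an illustration, not the API's own logic: it assumes `tax_rate` is a percentage (20 means 20%), as the example response suggests, and the one-penny tolerance is an arbitrary choice.

```python
from decimal import Decimal


def to_money(value):
    """Convert a JSON number to Decimal via str() to avoid float rounding drift."""
    return Decimal(str(value))


def check_invoice_math(invoice, tolerance=Decimal("0.01")):
    """Recompute the three validation flags from the extracted amounts."""
    subtotal = to_money(invoice["subtotal"])
    tax_amount = to_money(invoice["tax_amount"])
    total = to_money(invoice["total"])
    # Assumption: tax_rate is expressed as a percentage of the subtotal.
    expected_tax = subtotal * to_money(invoice["tax_rate"]) / Decimal("100")
    line_sum = sum(
        (to_money(item["line_total"]) for item in invoice["line_items"]),
        Decimal("0"),
    )
    return {
        "line_items_sum_matches_subtotal": abs(line_sum - subtotal) <= tolerance,
        "tax_calculation_matches": abs(expected_tax - tax_amount) <= tolerance,
        "total_matches": abs(subtotal + tax_amount - total) <= tolerance,
    }


# Amounts copied from the example API response.
sample = {
    "subtotal": 840.00,
    "tax_rate": 20,
    "tax_amount": 168.00,
    "total": 1008.00,
    "line_items": [{"line_total": 640.00}, {"line_total": 200.00}],
}
```

For the example invoice, all three flags come back true: 640 + 200 = 840, 20% of 840 is 168, and 840 + 168 = 1008.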

5. Google Sheets — Append Row

Map the extracted fields to your spreadsheet columns. Each invoice becomes a row with all the structured data ready for reconciliation.

{
  "node": "Google Sheets",
  "parameters": {
    "operation": "append",
    "documentId": "YOUR_SHEET_ID",
    "sheetName": "Invoices",
    "columns": {
      "mappingMode": "defineBelow",
      "value": {
        "Vendor": "={{ $json.vendor }}",
        "Invoice #": "={{ $json.invoice_number }}",
        "Date": "={{ $json.invoice_date }}",
        "Due": "={{ $json.due_date }}",
        "Subtotal": "={{ $json.subtotal }}",
        "Tax": "={{ $json.tax_amount }}",
        "Total": "={{ $json.total }}",
        "Line Items": "={{ $json.line_items.length }}",
        "Confidence": "={{ $json.confidence }}"
      }
    }
  }
}

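Batch processing tip: For folders of invoices, use a Read Binary Files node (Read/Write Files from Disk in recent n8n versions) pointed at a directory, followed by Split In Batches (shown as Loop Over Items in recent versions) to process 5–10 invoices at a time. This helps you stay under API rate limits and keeps memory usage stable in n8n.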
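To make the batching idea concrete, here is a small Python sketch of the grouping that Split In Batches performs. The function name `chunk` and the file names are hypothetical; inside n8n the node does this for you.

```python
def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[start:start + size] for start in range(0, len(items), size)]


# Twelve hypothetical invoice files, processed five at a time.
files = [f"invoice_{n:02d}.pdf" for n in range(12)]
batches = chunk(files, 5)
```

With twelve files and a batch size of five, this yields three batches of five, five, and two files. The last batch is simply shorter, so nothing is dropped.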

Invoice Extraction Options for n8n: Comparison

Approach	Privacy	Structured Output	Per-Page Cost	Setup Difficulty	Line Items
Useful Patch API	✓ Client-side option	✓ Full JSON	✓ Flat rate	✓ HTTP node only	✓ Yes
Rossum	✗ Cloud upload	✓ Full JSON	✗ ~$0.30/page	⚠ OAuth + webhook	✓ Yes
Nanonets	✗ Cloud upload	✓ Full JSON	✗ ~$0.10/page	✓ REST API	✓ Yes
Mindee	✗ Cloud upload	✓ Full JSON	✗ ~$0.08/page	✓ REST API	✓ Yes
Google Document AI	⚠ Google Cloud	✓ Full JSON	✗ ~$0.05/page	✗ GCP setup + auth	✓ Yes
Amazon Textract	⚠ AWS	⚠ Tables only	✗ ~$0.015/page	✗ IAM + S3 bucket	⚠ Raw tables
Tesseract (local)	✓ Fully local	✗ Raw text only	✓ Free	✗ Regex parsing needed	✗ Manual
Ollama + Vision LLM	✓ Fully local	⚠ Unpredictable	✓ Free (GPU cost)	✗ GPU + prompt tuning	⚠ Hallucinates

Pricing reflects published rates as of April 2026. Per-page costs vary by volume tier.

Privacy-First Extraction: Why It Matters for Invoices

Invoices contain some of the most sensitive business data: vendor relationships, pricing agreements, payment terms, bank details, tax IDs. Uploading them to a cloud OCR service means:

The vendor’s data processing agreement now covers your financial documents
Your supplier relationships and pricing may end up in someone else’s training pipeline (unless explicitly excluded)
Compliance teams need to audit every third-party processor that touches financial data

Client-side extraction avoids all of this. The PDF is processed in your browser or on your own infrastructure. Only the structured JSON result — which contains no raw document images — moves through your n8n workflow.

For regulated industries: If you handle invoices from healthcare, government, or financial services clients, client-side extraction can remove the need to include the OCR vendor in your data processing inventory, because the source document never leaves your environment. The JSON output is derived data, not the source document.

Common n8n Invoice Automation Patterns

Pattern 1: Email Inbox → Extract → Accounting Software

The most common setup. IMAP trigger watches your AP inbox. Invoice PDFs are extracted and pushed directly to Xero, QuickBooks, or FreshBooks via their respective n8n nodes. Validation checks catch OCR errors before they hit your books.

Pattern 2: Watched Folder → Batch Extract → Google Sheets

For teams that collect invoices in a shared drive folder. A cron-triggered workflow scans for new PDFs, extracts each one, and appends rows to a master spreadsheet. Good for monthly reconciliation or audit prep.

Pattern 3: Webhook → Extract → Approve → Pay

Your procurement system sends a webhook when a new invoice is received. n8n extracts the data, checks it against the purchase order (amount match, vendor match), routes for approval via Slack or email, and triggers payment on approval. Full AP automation loop.
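The purchase-order check in this pattern can be sketched in a few lines of Python. The purchase order shape (`vendor` and `amount` keys), the function names, and the one-penny tolerance are assumptions for illustration; a real procurement system will have its own field names and matching rules.

```python
from decimal import Decimal


def normalize_name(name):
    """Lowercase and collapse whitespace so cosmetic differences don't block a match."""
    return " ".join(name.casefold().split())


def match_purchase_order(invoice, purchase_order, tolerance=Decimal("0.01")):
    """Compare an extracted invoice against a purchase order (hypothetical shape)."""
    vendor_match = normalize_name(invoice["vendor"]) == normalize_name(purchase_order["vendor"])
    difference = Decimal(str(invoice["total"])) - Decimal(str(purchase_order["amount"]))
    amount_match = abs(difference) <= tolerance
    return {
        "vendor_match": vendor_match,
        "amount_match": amount_match,
        # Only invoices that pass both checks move on to the approval step.
        "ready_for_approval": vendor_match and amount_match,
    }
```

Exact name matching after normalization is deliberately strict. Vendors that invoice under trading names would need a lookup table or fuzzy matching, which this sketch leaves out.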

Pattern 4: Multi-Format Intake → Normalize → ERP

Invoices arrive in different formats from different vendors — some PDF, some email body, some scanned images. n8n routes each type through the appropriate extraction path, normalizes the output to a common schema, and pushes to your ERP. The extraction API handles layout variation automatically.
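As an illustration of the normalization step, the Python sketch below maps differently named fields onto one common schema. The alias lists and the `NormalizedInvoice` shape are hypothetical; in practice you would build them from the field names your own sources produce.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical aliases: each common field lists the source keys that may carry it.
ALIASES = {
    "vendor": ("vendor", "supplier", "seller"),
    "invoice_number": ("invoice_number", "invoice_no", "reference"),
    "total": ("total", "amount_due", "grand_total"),
    "currency": ("currency", "currency_code"),
}


@dataclass
class NormalizedInvoice:
    vendor: str
    invoice_number: str
    total: Decimal
    currency: str


def normalize(raw):
    """Map one extraction result onto the common schema, whatever its source."""
    def pick(field):
        for key in ALIASES[field]:
            if raw.get(key) not in (None, ""):
                return raw[key]
        raise KeyError(f"no value found for {field!r}")

    return NormalizedInvoice(
        vendor=str(pick("vendor")).strip(),
        invoice_number=str(pick("invoice_number")).strip(),
        total=Decimal(str(pick("total"))),
        currency=str(pick("currency")).upper(),
    )
```

Raising on a missing field, rather than filling in a default, sends incomplete extractions to review instead of letting them reach the ERP silently.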

Integration Targets: Where the JSON Goes

System	n8n Node	What to Map
Google Sheets	Google Sheets	Append row: vendor, invoice #, date, total, line item count
Xero	Xero	Create bill: contact (vendor), date, line items, tax rate, currency
QuickBooks	QuickBooks Online	Create bill: vendor, line items, total, terms, due date
Notion	Notion	Create database entry: all fields as properties, PDF link as file
Airtable	Airtable	Create record: vendor, amounts, status (extracted/validated/approved)
Slack	Slack	Post summary: “New invoice from [vendor]: [total] due [date]” + approve button
PostgreSQL	Postgres	INSERT invoice header + line items into normalised tables
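For the PostgreSQL row, the header-plus-lines insert can be sketched as follows. The sketch uses Python's built-in `sqlite3` module as a stand-in so it runs without a database server. The table and column names are hypothetical, and a real Postgres schema would use NUMERIC columns for money rather than text.

```python
import sqlite3

# Hypothetical schema: one header row per invoice, one row per line item.
SCHEMA = """
CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    vendor TEXT NOT NULL,
    invoice_number TEXT NOT NULL UNIQUE,
    invoice_date TEXT,
    total TEXT NOT NULL
);
CREATE TABLE invoice_lines (
    invoice_id INTEGER NOT NULL REFERENCES invoices(id),
    description TEXT,
    quantity REAL,
    unit_price TEXT,
    line_total TEXT
);
"""


def store_invoice(conn, invoice):
    """Insert the header and its line items in a single transaction."""
    with conn:  # commits on success, rolls back if any insert fails
        cursor = conn.execute(
            "INSERT INTO invoices (vendor, invoice_number, invoice_date, total) "
            "VALUES (?, ?, ?, ?)",
            (invoice["vendor"], invoice["invoice_number"],
             invoice["invoice_date"], str(invoice["total"])),
        )
        invoice_id = cursor.lastrowid
        conn.executemany(
            "INSERT INTO invoice_lines "
            "(invoice_id, description, quantity, unit_price, line_total) "
            "VALUES (?, ?, ?, ?, ?)",
            [(invoice_id, item["description"], item["quantity"],
              str(item["unit_price"]), str(item["line_total"]))
             for item in invoice["line_items"]],
        )
    return invoice_id
```

The UNIQUE constraint on `invoice_number` means a re-processed email fails loudly instead of creating a duplicate header. Because both inserts share one transaction, a failure also leaves no orphaned line items.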

Frequently Asked Questions

Can I extract invoice data in n8n without uploading PDFs to a cloud service?

Yes. Use a client-side extraction approach where the PDF is processed locally (in-browser or on your own server) and only the extracted JSON moves through your n8n workflow. This avoids sending sensitive financial documents to third-party OCR services.

What fields can I extract from invoices using n8n?

A good extraction setup returns: vendor name, invoice number, invoice date, due date, currency, subtotal, tax amount, tax rate, total, and a line items array with description, quantity, unit price, and line total for each row.

How do I connect an invoice extraction API to n8n?

Use an HTTP Request node. POST the invoice file as binary data or base64. The API returns JSON you can route directly to Google Sheets, Xero, QuickBooks, or any downstream n8n node. For batch processing, use Split In Batches.

What is the best free invoice OCR for n8n?

For truly free and local, Tesseract with custom templates works but needs significant setup per layout. For structured extraction without per-page costs, the Useful Patch free tier handles browser-based processing. The Developer plan (£29/mo) adds API access with no per-page fees.

Can n8n process invoices from email attachments automatically?

Yes. Use an IMAP Email Trigger node filtered to messages with PDF attachments. Route the attachment binary through an HTTP Request node to your extraction API. The returned JSON feeds directly into your accounting or ERP nodes.

How does this compare to Rossum, Nanonets, or Mindee?

Those are dedicated document AI platforms with per-page pricing ($0.05–$0.50/page) that require uploading invoices to their cloud. Using n8n with an extraction API gives you workflow control, privacy (client-side processing option), and flat-rate pricing.
