n8n + Invoice OCR

Extract Invoice Data in n8n Without Uploading Files to Third Parties

Private, structured invoice extraction that fits into your n8n workflow. Get JSON with line items, totals, and tax fields. With the client-side option, the PDF never leaves your machine.


The Problem: Invoice Data Entry in Automated Workflows

You have an n8n workflow that processes orders, syncs accounting data, or manages accounts payable (AP), but the invoice PDFs feeding it still need manual data entry.

What n8n users actually need: a node that takes a PDF invoice in and gives structured JSON out — vendor, date, total, tax, line items — without per-page fees or file uploads to third-party clouds.

How It Works: n8n + Invoice Extraction API

1. Trigger

Email arrives with invoice PDF attachment (IMAP trigger), file lands in watched folder (local trigger), or webhook fires from your billing system.

2. Extract

HTTP Request node sends the PDF to the extraction API. Returns structured JSON: vendor, invoice number, date, line items with quantities and prices, tax, and total.

3. Validate

IF node checks math: does sum of line items equal the subtotal? Does subtotal + tax equal the total? Flag mismatches for human review.

4. Route

Push validated data to Google Sheets, Xero, QuickBooks, Notion, Airtable, or any system with an n8n node. Batch invoices flow through automatically.
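
The arithmetic in the Validate step can also be recomputed on your side rather than taken on trust. The following is a minimal sketch, assuming the field names from the example response shown later on this page (subtotal, tax_amount, total, line_items[].line_total); the function names are illustrative and not part of any API.

```python
from decimal import Decimal


def to_money(value) -> Decimal:
    """Convert a JSON number to a 2-decimal Decimal (via str, to avoid float drift)."""
    return Decimal(str(value)).quantize(Decimal("0.01"))


def check_invoice_math(invoice: dict) -> dict:
    """Recompute the two checks named in the Validate step."""
    line_sum = sum(
        (to_money(item["line_total"]) for item in invoice["line_items"]),
        Decimal("0.00"),
    )
    subtotal = to_money(invoice["subtotal"])
    tax = to_money(invoice["tax_amount"])
    total = to_money(invoice["total"])
    return {
        "line_items_sum_matches_subtotal": line_sum == subtotal,
        "total_matches": subtotal + tax == total,
    }


# Values taken from the example response on this page.
invoice = {
    "subtotal": 840.00,
    "tax_amount": 168.00,
    "total": 1008.00,
    "line_items": [{"line_total": 640.00}, {"line_total": 200.00}],
}
print(check_invoice_math(invoice))
```

Decimal is used here because comparing currency amounts as binary floats can report mismatches that are only rounding noise.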

n8n Workflow: Invoice Email → Extraction → Google Sheets

Here’s the practical setup. This workflow watches an inbox for invoice PDFs, extracts the data, validates the math, and appends a row to Google Sheets.

1. IMAP Email Trigger

Configure the IMAP node to watch your AP inbox. Filter for emails with PDF attachments — most invoice emails have the PDF inline or attached.

{
  "node": "IMAP Email",
  "parameters": {
    "mailbox": "INBOX",
    "downloadAttachments": true,
    "dataPropertyAttachmentsPrefixName": "attachment_",
    "options": {
      "allowUnauthorizedCerts": false
    }
  },
  "note": "Filter downstream for .pdf attachments"
}
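
The downstream PDF filter can be a few lines of logic. This is a rough sketch in plain Python, assuming each attachment is described by a dict with fileName and mimeType keys (modelled on n8n's binary metadata); the function name and the sample data are hypothetical.

```python
def pdf_attachment_keys(binary: dict) -> list[str]:
    """Return the binary property names that look like PDF attachments."""
    keys = []
    for key, meta in binary.items():
        mime = (meta.get("mimeType") or "").lower()
        name = (meta.get("fileName") or "").lower()
        # Accept either signal: some mail clients send PDFs as application/octet-stream.
        if mime == "application/pdf" or name.endswith(".pdf"):
            keys.append(key)
    return sorted(keys)


# Hypothetical metadata for one email with two attachments.
binary = {
    "attachment_0": {"fileName": "INV-2026-0847.pdf", "mimeType": "application/pdf"},
    "attachment_1": {"fileName": "logo.png", "mimeType": "image/png"},
}
print(pdf_attachment_keys(binary))
```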

2. HTTP Request — Extract Invoice

Send the PDF binary to the extraction API endpoint. The response is structured JSON you can immediately use in downstream nodes.

{
  "node": "HTTP Request",
  "parameters": {
    "method": "POST",
    "url": "https://api.useful-patch.com/v1/extract",
    "sendBody": true,
    "contentType": "multipart-form-data",
    "bodyParameters": {
      "parameters": [
        {
          "name": "file",
          "parameterType": "formBinaryData",
          "inputDataFieldName": "attachment_0"
        }
      ]
    },
    "sendHeaders": true,
    "headerParameters": {
      "parameters": [
        {
          "name": "Authorization",
          "value": "Bearer YOUR_API_KEY"
        }
      ]
    }
  }
}
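
To test the endpoint outside n8n, the same request can be built with the Python standard library. This is a sketch under assumptions: it takes the URL, the form field name "file", and the Bearer header from the node config above, and it assumes the API accepts a standard multipart/form-data upload. The helper names are illustrative.

```python
import urllib.request
import uuid

API_URL = "https://api.useful-patch.com/v1/extract"  # endpoint from the node config above


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_request(pdf_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one invoice PDF."""
    body, content_type = build_multipart("file", filename, pdf_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )


# Sending is left commented out so the sketch runs without network access:
#   import json
#   with urllib.request.urlopen(build_request(pdf, "invoice.pdf", key), timeout=60) as resp:
#       invoice = json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect headers and body before a real invoice goes over the wire.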

3. Example API Response

{
  "vendor": "Acme Supplies Ltd",
  "invoice_number": "INV-2026-0847",
  "invoice_date": "2026-04-10",
  "due_date": "2026-05-10",
  "currency": "GBP",
  "subtotal": 840.00,
  "tax_rate": 20,
  "tax_amount": 168.00,
  "total": 1008.00,
  "line_items": [
    {
      "description": "Office chairs (ergonomic)",
      "quantity": 4,
      "unit_price": 160.00,
      "line_total": 640.00
    },
    {
      "description": "Standing desk converter",
      "quantity": 2,
      "unit_price": 100.00,
      "line_total": 200.00
    }
  ],
  "confidence": 0.97,
  "validation": {
    "line_items_sum_matches_subtotal": true,
    "tax_calculation_matches": true,
    "total_matches": true
  }
}

4. IF Node — Math Validation

Check the validation object. If all three checks pass, route to the “auto-approve” branch. If any fail, route to a Slack notification or email alert for human review.

{
  "node": "IF",
  "parameters": {
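    "conditions": {
      "boolean": [
        { "value1": "={{ $json.validation.line_items_sum_matches_subtotal }}", "value2": true },
        { "value1": "={{ $json.validation.tax_calculation_matches }}", "value2": true },
        { "value1": "={{ $json.validation.total_matches }}", "value2": true }
      ]
    },
    "combineOperation": "all"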
  }
}
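
The routing rule described in this step (all three checks must pass, otherwise a human looks at it) can be sketched in a few lines. The confidence threshold is an added assumption, not something the API documents; drop it or tune it to taste. The function name and the "auto-approve" / "review" labels are illustrative.

```python
CHECKS = (
    "line_items_sum_matches_subtotal",
    "tax_calculation_matches",
    "total_matches",
)


def route_invoice(invoice: dict, min_confidence: float = 0.9) -> str:
    """Return 'auto-approve' only if every validation check is true."""
    validation = invoice.get("validation", {})
    # A missing check counts as a failure, so incomplete responses go to review.
    all_pass = all(validation.get(name) is True for name in CHECKS)
    # Assumption: also require a minimum extraction confidence (hypothetical threshold).
    confident = invoice.get("confidence", 0.0) >= min_confidence
    return "auto-approve" if all_pass and confident else "review"


example = {
    "confidence": 0.97,
    "validation": {
        "line_items_sum_matches_subtotal": True,
        "tax_calculation_matches": True,
        "total_matches": True,
    },
}
print(route_invoice(example))
```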

5. Google Sheets — Append Row

Map the extracted fields to your spreadsheet columns. Each invoice becomes a row with all the structured data ready for reconciliation.

{
  "node": "Google Sheets",
  "parameters": {
    "operation": "append",
    "documentId": "YOUR_SHEET_ID",
    "sheetName": "Invoices",
    "columns": {
      "mappingMode": "defineBelow",
      "value": {
        "Vendor": "={{ $json.vendor }}",
        "Invoice #": "={{ $json.invoice_number }}",
        "Date": "={{ $json.invoice_date }}",
        "Due": "={{ $json.due_date }}",
        "Subtotal": "={{ $json.subtotal }}",
        "Tax": "={{ $json.tax_amount }}",
        "Total": "={{ $json.total }}",
        "Line Items": "={{ $json.line_items.length }}",
        "Confidence": "={{ $json.confidence }}"
      }
    }
  }
}

Batch processing tip: For folders of invoices, use a Read Binary Files node (Read/Write Files from Disk in newer n8n releases) pointed at a directory, followed by Split In Batches (Loop Over Items in newer releases) to process 5–10 invoices at a time. This avoids rate limits and keeps memory usage stable in n8n.
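
The batching idea itself is simple chunking. A minimal sketch of what processing 5 invoices at a time means, with hypothetical file names:

```python
from typing import Iterator, Sequence


def batches(items: Sequence[str], size: int = 5) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical folder listing of 12 invoices.
files = [f"invoice_{n:03d}.pdf" for n in range(1, 13)]
for batch in batches(files, size=5):
    print(len(batch), batch[0], "to", batch[-1])
```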

Invoice Extraction Options for n8n: Comparison

| Approach | Privacy | Structured Output | Per-Page Cost | Setup Difficulty | Line Items |
|---|---|---|---|---|---|
| Useful Patch API | ✓ Client-side option | ✓ Full JSON | ✓ Flat rate | ✓ HTTP node only | ✓ Yes |
| Rossum | ✗ Cloud upload | ✓ Full JSON | ✗ ~$0.30/page | ⚠ OAuth + webhook | ✓ Yes |
| Nanonets | ✗ Cloud upload | ✓ Full JSON | ✗ ~$0.10/page | ✓ REST API | ✓ Yes |
| Mindee | ✗ Cloud upload | ✓ Full JSON | ✗ ~$0.08/page | ✓ REST API | ✓ Yes |
| Google Document AI | ⚠ Google Cloud | ✓ Full JSON | ✗ ~$0.05/page | ✗ GCP setup + auth | ✓ Yes |
| Amazon Textract | ⚠ AWS | ⚠ Tables only | ✗ ~$0.015/page | ✗ IAM + S3 bucket | ⚠ Raw tables |
| Tesseract (local) | ✓ Fully local | ✗ Raw text only | ✓ Free | ✗ Regex parsing needed | ✗ Manual |
| Ollama + Vision LLM | ✓ Fully local | ⚠ Unpredictable | ✓ Free (GPU cost) | ✗ GPU + prompt tuning | ⚠ Hallucinates |

Pricing reflects published rates as of April 2026. Per-page costs vary by volume tier.

Privacy-First Extraction: Why It Matters for Invoices

Invoices contain some of the most sensitive business data: vendor relationships, pricing agreements, payment terms, bank details, tax IDs. Uploading them to a cloud OCR service means copies of that data sit on infrastructure you do not control.

Client-side extraction avoids this. The PDF is processed in your browser or on your own infrastructure. Only the structured JSON result, which contains no raw document images, moves through your n8n workflow.

For regulated industries: If you handle invoices from healthcare, government, or financial services clients, client-side extraction can remove the need to include the OCR vendor in your data processing inventory, because the vendor never receives the source document. The JSON output is derived data, not the source document.

Common n8n Invoice Automation Patterns

Pattern 1: Email Inbox → Extract → Accounting Software

The most common setup. IMAP trigger watches your AP inbox. Invoice PDFs are extracted and pushed directly to Xero, QuickBooks, or FreshBooks via their respective n8n nodes. Validation checks catch OCR errors before they hit your books.

Pattern 2: Watched Folder → Batch Extract → Google Sheets

For teams that collect invoices in a shared drive folder. A cron-triggered workflow scans for new PDFs, extracts each one, and appends rows to a master spreadsheet. Good for monthly reconciliation or audit prep.
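
The scan for new PDFs needs some memory of what has already been processed. One simple way to sketch that, assuming a small JSON state file kept next to the invoices (the file name and both helper functions are hypothetical):

```python
import json
from pathlib import Path


def load_seen(state_file: Path) -> set[str]:
    """Read the names of already-processed files (empty set on first run)."""
    if state_file.exists():
        return set(json.loads(state_file.read_text()))
    return set()


def find_new_pdfs(folder: Path, state_file: Path) -> list[Path]:
    """List PDFs in `folder` that are not recorded in the state file yet."""
    seen = load_seen(state_file)
    return sorted(p for p in folder.glob("*.pdf") if p.name not in seen)


def mark_processed(paths: list[Path], state_file: Path) -> None:
    """Record files as done so the next scheduled run skips them."""
    seen = load_seen(state_file)
    seen.update(p.name for p in paths)
    state_file.write_text(json.dumps(sorted(seen)))
```

Calling mark_processed only after a successful extraction means a failed invoice is retried on the next scheduled run.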

Pattern 3: Webhook → Extract → Approve → Pay

Your procurement system sends a webhook when a new invoice is received. n8n extracts the data, checks it against the purchase order (amount match, vendor match), routes for approval via Slack or email, and triggers payment on approval. Full AP automation loop.
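
The purchase order check (amount match, vendor match) might look like the following. This is a sketch only: the purchase order shape with vendor and amount keys is an assumption, as is the one-penny tolerance, and real vendor matching usually needs an ID rather than a name comparison.

```python
from decimal import Decimal


def matches_purchase_order(invoice: dict, po: dict, tolerance: str = "0.01") -> dict:
    """Compare an extracted invoice against a purchase order record."""
    # Case- and whitespace-insensitive name comparison (a vendor ID would be more robust).
    vendor_ok = invoice["vendor"].strip().casefold() == po["vendor"].strip().casefold()
    diff = abs(Decimal(str(invoice["total"])) - Decimal(str(po["amount"])))
    amount_ok = diff <= Decimal(tolerance)
    return {
        "vendor_match": vendor_ok,
        "amount_match": amount_ok,
        "ready_for_approval": vendor_ok and amount_ok,
    }


invoice = {"vendor": "Acme Supplies Ltd", "total": 1008.00}
po = {"vendor": "acme supplies ltd ", "amount": 1008.00}  # hypothetical PO record
print(matches_purchase_order(invoice, po))
```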

Pattern 4: Multi-Format Intake → Normalize → ERP

Invoices arrive in different formats from different vendors — some PDF, some email body, some scanned images. n8n routes each type through the appropriate extraction path, normalizes the output to a common schema, and pushes to your ERP. The extraction API handles layout variation automatically.
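
Routing by type can start from the file's leading bytes rather than its name. A minimal sketch, where the route names (extract_pdf, extract_image, manual_review) are hypothetical labels for workflow branches; invoices that arrive as email body text would need their own branch, which is not shown here.

```python
def detect_format(data: bytes) -> str:
    """Identify a file by its magic bytes."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    return "unknown"


# Hypothetical branch names; anything unrecognised goes to a person.
ROUTES = {"pdf": "extract_pdf", "png": "extract_image", "jpeg": "extract_image"}


def choose_route(data: bytes) -> str:
    """Pick the workflow branch for one incoming file."""
    return ROUTES.get(detect_format(data), "manual_review")


print(choose_route(b"%PDF-1.7 ..."))
print(choose_route(b"\xff\xd8\xff\xe0 scanned page"))
print(choose_route(b"Dear customer, please find your invoice"))
```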

Integration Targets: Where the JSON Goes

| System | n8n Node | What to Map |
|---|---|---|
| Google Sheets | Google Sheets | Append row: vendor, invoice #, date, total, line item count |
| Xero | Xero | Create bill: contact (vendor), date, line items, tax rate, currency |
| QuickBooks | QuickBooks Online | Create bill: vendor, line items, total, terms, due date |
| Notion | Notion | Create database entry: all fields as properties, PDF link as file |
| Airtable | Airtable | Create record: vendor, amounts, status (extracted/validated/approved) |
| Slack | Slack | Post summary: “New invoice from [vendor]: [total] due [date]” + approve button |
| PostgreSQL | Postgres | INSERT invoice header + line items into normalised tables |
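
For the normalised-tables case, the header and line items go into two related tables. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs anywhere; the table and column names are illustrative assumptions, and amounts are stored as integer minor units on the assumption of a two-decimal currency.

```python
import sqlite3
from decimal import Decimal

# Hypothetical schema: one header row per invoice, one row per line item.
SCHEMA = """
CREATE TABLE invoices (
    invoice_number TEXT PRIMARY KEY,
    vendor         TEXT NOT NULL,
    invoice_date   TEXT,
    currency       TEXT,
    total_minor    INTEGER NOT NULL
);
CREATE TABLE invoice_lines (
    invoice_number   TEXT NOT NULL REFERENCES invoices(invoice_number),
    line_no          INTEGER NOT NULL,
    description      TEXT,
    quantity         REAL,
    line_total_minor INTEGER NOT NULL,
    PRIMARY KEY (invoice_number, line_no)
);
"""


def to_minor(amount) -> int:
    """Convert a JSON amount such as 640.00 to integer minor units (64000)."""
    return int((Decimal(str(amount)) * 100).to_integral_value())


def store_invoice(conn: sqlite3.Connection, invoice: dict) -> None:
    """Insert the header and all line items in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO invoices VALUES (?, ?, ?, ?, ?)",
            (invoice["invoice_number"], invoice["vendor"], invoice["invoice_date"],
             invoice["currency"], to_minor(invoice["total"])),
        )
        conn.executemany(
            "INSERT INTO invoice_lines VALUES (?, ?, ?, ?, ?)",
            [(invoice["invoice_number"], n, item["description"], item["quantity"],
              to_minor(item["line_total"]))
             for n, item in enumerate(invoice["line_items"], start=1)],
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_invoice(conn, {
    "invoice_number": "INV-2026-0847", "vendor": "Acme Supplies Ltd",
    "invoice_date": "2026-04-10", "currency": "GBP", "total": 1008.00,
    "line_items": [
        {"description": "Office chairs (ergonomic)", "quantity": 4, "line_total": 640.00},
        {"description": "Standing desk converter", "quantity": 2, "line_total": 200.00},
    ],
})
print(conn.execute("SELECT COUNT(*), SUM(line_total_minor) FROM invoice_lines").fetchone())
```

Wrapping both inserts in one transaction avoids a header row without its line items if something fails midway.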

Frequently Asked Questions

Can I extract invoice data in n8n without uploading PDFs to a cloud service?

Yes. Use a client-side extraction approach where the PDF is processed locally (in-browser or on your own server) and only the extracted JSON moves through your n8n workflow. This avoids sending sensitive financial documents to third-party OCR services.

What fields can I extract from invoices using n8n?

A good extraction setup returns: vendor name, invoice number, invoice date, due date, currency, subtotal, tax amount, tax rate, total, and a line items array with description, quantity, unit price, and line total for each row.

How do I connect an invoice extraction API to n8n?

Use an HTTP Request node. POST the invoice file as binary data or base64. The API returns JSON you can route directly to Google Sheets, Xero, QuickBooks, or any downstream n8n node. For batch processing, use Split In Batches.

What is the best free invoice OCR for n8n?

For truly free and local, Tesseract with custom templates works but needs significant setup per layout. For structured extraction without per-page costs, the Useful Patch free tier handles browser-based processing. The Developer plan (£29/mo) adds API access with no per-page fees.

Can n8n process invoices from email attachments automatically?

Yes. Use an IMAP Email Trigger node filtered to messages with PDF attachments. Route the attachment binary through an HTTP Request node to your extraction API. The returned JSON feeds directly into your accounting or ERP nodes.

How does this compare to Rossum, Nanonets, or Mindee?

Those are dedicated document AI platforms with per-page pricing (roughly $0.08–$0.30/page in the comparison above) that require uploading invoices to their cloud. Using n8n with an extraction API gives you workflow control, privacy (client-side processing option), and flat-rate pricing.

Add Invoice Extraction to Your n8n Workflow

Try the extractor free in your browser, or grab an API key for your n8n HTTP Request nodes. No per-page fees.
