Save this as templates/<vendor>.yml and point invoice2data at it with --template-folder templates/. Test with: invoice2data --template-folder templates/ your_invoice.pdf
How it works
1. Paste invoice text
Copy the text from any invoice PDF. The tool reads the layout and field positions.
2. AI writes the regex
It detects invoice number, date, total, and VAT patterns, then writes anchored regex intended to work across invoices from the same vendor.
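To illustrate what "anchored" means here, this is a minimal Python sketch comparing a loose pattern with one tied to its exact label. The sample text, vendor, labels, and patterns are invented for illustration and are not output from the tool.

```python
import re

# Hypothetical invoice text; the vendor, labels, and values are made up.
SAMPLE = """ACME Supplies Ltd
Order Number: 55512
Invoice Number: INV-20417
Invoice Date: 14/03/2024
VAT (20%): 40.00
Total Due: 240.00
"""

# Loose: captures whatever follows the first "Number:" label it finds,
# which here is the order number rather than the invoice number.
loose = re.search(r"Number:\s+(\S+)", SAMPLE).group(1)

# Anchored: tied to the full label and the expected value shape.
anchored = re.search(r"Invoice Number:\s+(INV-\d+)", SAMPLE).group(1)

# The same idea applied to the total.
total = re.search(r"Total Due:\s+(\d+\.\d{2})", SAMPLE).group(1)

print(loose, anchored, total)
```

The loose pattern picks up the order number because it appears first in the text; the anchored pattern skips it.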
3. Drop into invoice2data
Save the YAML to your templates folder. invoice2data loads it when you run it with --template-folder pointing at that folder. No more trial-and-error regex.
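If you prefer to script the save step, a minimal Python sketch could look like the following. The vendor, the field patterns, and the save_template helper are hypothetical; the template body only illustrates the issuer, keywords, fields, and options layout and is not actual output from the tool.

```python
from pathlib import Path

# Hypothetical template for a made-up vendor; replace with the generated YAML.
TEMPLATE = r"""issuer: ACME Supplies Ltd
keywords:
  - ACME Supplies
fields:
  invoice_number: Invoice Number:\s+(INV-\d+)
  date: Invoice Date:\s+(\d{2}/\d{2}/\d{4})
  amount: Total Due:\s+(\d+\.\d{2})
options:
  currency: GBP
  date_formats:
    - '%d/%m/%Y'
"""


def save_template(vendor: str, yaml_text: str, folder: str = "templates") -> Path:
    """Write yaml_text to <folder>/<vendor>.yml and return the path."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    path = target_dir / f"{vendor}.yml"
    path.write_text(yaml_text, encoding="utf-8")
    return path


if __name__ == "__main__":
    print(save_template("acme", TEMPLATE))
```

After saving, run invoice2data with --template-folder templates/ against a sample PDF to confirm the template is used.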
Why this exists
invoice2data is a great open-source library, but every new vendor needs a hand-written YAML template with regex you have to get exactly right. That is the single most common complaint from people using it. This tool does the tedious part for you. It is free for 3 templates a day. If you process invoices from many vendors, the Pro pack removes the limit and adds batch mode.
Frequently asked questions
Does this replace invoice2data?
No. It generates templates FOR invoice2data. You still run invoice2data to do the extraction. If you want a fully hosted extractor with no Python at all, try our free in-browser extractor.
Is my invoice data stored?
We log the vendor name and character count to rate-limit the free tier. We do not store full invoice contents beyond what is needed to generate your template.
What format does it output?
The output is YAML for invoice2data v0.5.0, with issuer, keywords, fields (regex), and options blocks.
Debugging an invoice2data template is too difficult. Does this help?
Yes, that is exactly why it exists. Instead of editing regex by hand and re-running until it matches, you get a template that already matches the invoice you pasted. You can tweak from a working starting point instead of a blank file.
Is there a template validator?
The generated template is built from your actual invoice text, so the regex is validated against real input at generation time. Paste a second invoice from the same vendor to confirm the patterns generalise.
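You can also run a rough version of that second-invoice check locally before involving invoice2data. The sketch below uses only Python's re module and assumes you have copied the field patterns out of your template by hand; the patterns, invoice texts, and the check_fields helper are all hypothetical, and invoice2data's own matching may behave differently.

```python
import re
from typing import Dict, Optional

# Hypothetical field patterns, as they might appear under "fields:" in a template.
FIELDS = {
    "invoice_number": r"Invoice Number:\s+(INV-\d+)",
    "date": r"Invoice Date:\s+(\d{2}/\d{2}/\d{4})",
    "amount": r"Total Due:\s+(\d+\.\d{2})",
}


def check_fields(text: str, fields: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the first captured group per field, or None when a pattern misses."""
    results: Dict[str, Optional[str]] = {}
    for name, pattern in fields.items():
        match = re.search(pattern, text)
        results[name] = match.group(1) if match else None
    return results


# The invoice the patterns were written against (made-up text).
FIRST = "Invoice Number: INV-20417\nInvoice Date: 14/03/2024\nTotal Due: 240.00\n"
# A second made-up invoice from the same vendor, with a thousands separator.
SECOND = "Invoice Number: INV-20533\nInvoice Date: 02/04/2024\nTotal Due: 1,240.00\n"

first_result = check_fields(FIRST, FIELDS)
second_result = check_fields(SECOND, FIELDS)

# Fields that matched the first invoice but not the second need a wider pattern.
missing = [name for name, value in second_result.items() if value is None]
print(missing)
```

In this example the amount pattern fails on the second invoice because it does not allow for a comma in the total, which is the kind of gap a second sample tends to expose.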