Managing operational expenses such as utility bills for water, electricity, and gas is often a time-consuming and costly task. Companies that handle multiple projects or properties receive thousands of invoices on a regular basis. The data from these documents needs to be extracted, processed, and recorded in internal systems.
Traditionally, this process has been done manually, creating a bottleneck that consumes valuable time and resources.
Beyond cost and inefficiency, manual data entry, especially from scanned invoices, introduces a significant risk: human error. Visual fatigue, repetitive tasks, and variability in document format and quality all increase the likelihood of transcription mistakes, which can lead to accounting discrepancies, incorrect payments, and audit issues.
In our specific case, the situation is even more complex because most invoices are scanned documents. Unlike native digital files, scanned invoices pose additional challenges for Optical Character Recognition (OCR) technologies:
These issues make it significantly harder to accurately extract data using automated methods.
Recognizing the scope and technical complexity of the problem, we conducted an in-depth review of the latest OCR and AI-based document processing models. Our goal was to identify the most robust and accurate solution capable of handling complex scanned invoices.
We evaluated leading vision-language models, including:
We tested these models against a representative dataset of scanned invoices from different providers. Some models, such as certain Mistral variants, produced an unacceptably high error rate, confusing similar digits (e.g., ‘5’ and ‘9’) or letters (‘R’ and ‘F’), making them unsuitable for sensitive financial data extraction.
After comparing their performance on extracting key information (vendor names, account numbers, dates, amounts, and addresses) from complex scanned documents, Gemini 1.5 Pro and Sonnet 3.5 emerged as the most promising options. Both demonstrated strong contextual understanding and reliability, even under suboptimal conditions.
We ultimately selected Gemini 1.5 Pro for its balance of performance and cost. While Sonnet 3.5 delivered comparable results, its token-based pricing was significantly more expensive at the processing scale required. Gemini 1.5 Pro, with its large context window, enabled efficient processing of long documents in a single pass and offered more favorable economics. (Note: Pricing models should be verified with the provider’s current documentation, but historically Gemini 1.5 Pro has offered better cost-efficiency for high-volume, large-context tasks.)
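As an illustration of how this kind of comparison can be scored, here is a minimal sketch in Python. It is not the evaluation harness used in the project: the field names, the sample values, and both helper functions are hypothetical. It computes a field-level error rate against hand-checked reference values and tallies character substitutions such as ‘5’ read as ‘9’.

```python
from collections import Counter


def field_error_rate(predicted, expected):
    """Fraction of reference fields whose extracted value differs."""
    if not expected:
        return 0.0
    wrong = sum(1 for key, value in expected.items() if predicted.get(key) != value)
    return wrong / len(expected)


def char_confusions(predicted, expected):
    """Count (expected_char, extracted_char) substitutions in equal-length values."""
    confusions = Counter()
    for key, value in expected.items():
        guess = predicted.get(key)
        # Only equal-length strings are compared position by position.
        if isinstance(guess, str) and len(guess) == len(value):
            for want, got in zip(value, guess):
                if want != got:
                    confusions[(want, got)] += 1
    return confusions


# Hypothetical reference values and one model's output for a single invoice.
expected = {"account_number": "4059-RX", "total": "195.50"}
predicted = {"account_number": "4099-FX", "total": "195.50"}

print(field_error_rate(predicted, expected))
print(char_confusions(predicted, expected).most_common())
```

Aggregating these counts over the whole test set gives a per-model error rate and shows which character pairs a model tends to confuse.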
To address this challenge, we built a robust automated solution using a combination of Python for low-level processing and n8n for workflow orchestration. The process includes the following stages:
1. Document ingestion and storage:
2. Pre-processing with Python:
3. Preparing for multimodal processing (Base64):
4. Data extraction with Gemini 1.5 Pro:
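Stages 3 and 4 can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: the function names, the placeholder prompt, and the sample bytes are assumptions, and the request body is modeled on the publicly documented `generateContent` JSON shape, which should be checked against Google's current API reference. The sketch only builds the payload and does not send it.

```python
import base64
import json


def encode_page(image_bytes: bytes) -> str:
    """Base64-encode a pre-processed page so it can travel inside a JSON body."""
    return base64.b64encode(image_bytes).decode("ascii")


def build_request(image_b64: str, prompt: str, mime_type: str = "image/png") -> dict:
    """Assemble a request body with one text part and one inline image part."""
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": image_b64}},
                ]
            }
        ],
        # Assumption: ask the model to answer in JSON rather than free text.
        "generationConfig": {"response_mime_type": "application/json"},
    }


# Placeholder bytes standing in for a real scanned page.
image_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
payload = build_request(encode_page(image_bytes), "Extract the invoice fields as JSON.")

# The body is plain JSON, so any HTTP client or an n8n HTTP node could send it.
body = json.dumps(payload)
print(len(body))
```

Base64 is used here because raw image bytes cannot be embedded directly in a JSON string; the encoding makes the payload roughly a third larger than the original file.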
The prompt
The prompt is a key component and reflects best practices for guiding a multimodal model in structured extraction. It includes:
6. Advanced processing with n8n and AI agents:
Gemini’s structured JSON output is passed into an n8n workflow.
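Before the output is handed to a workflow, it usually helps to confirm that it really is a JSON object. The sketch below is a hedged illustration in Python, not the project's implementation: `parse_model_output` and the sample field names are hypothetical. It tolerates a response wrapped in a Markdown code fence, which some models add around JSON.

```python
import json

FENCE = "`" * 3  # Markdown code fence, built this way to keep the listing readable


def parse_model_output(raw: str) -> dict:
    """Parse a model response into a dict, stripping an optional code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag such as "json").
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rstrip()
        if text.endswith(FENCE):
            text = text[: -len(FENCE)]
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


# Hypothetical response wrapped in a fence.
raw = FENCE + 'json\n{"vendor_name": "Acme Water", "total": "195.50"}\n' + FENCE
invoice = parse_model_output(raw)
print(invoice["vendor_name"])
```

Failing loudly at this point keeps malformed responses out of the downstream steps, where they would be harder to trace.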
Validation is a critical integrity layer. Even with highly accurate extraction, there’s always a risk of anomalies. Validating data before database insertion ensures:
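A minimal sketch of such a pre-insertion check, written in Python for illustration. The required field names, the ISO date format, and the rules themselves are assumptions chosen for the example rather than the project's actual validation logic.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical field names for an extracted utility invoice.
REQUIRED_FIELDS = ("vendor_name", "account_number", "invoice_date", "total")


def validate_invoice(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record may be inserted."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            problems.append(f"missing field: {field}")
    if record.get("total"):
        try:
            # Decimal avoids the rounding surprises of float for money values.
            if Decimal(str(record["total"])) < 0:
                problems.append("total is negative")
        except InvalidOperation:
            problems.append("total is not a number")
    if record.get("invoice_date"):
        try:
            date.fromisoformat(str(record["invoice_date"]))
        except ValueError:
            problems.append("invoice_date is not an ISO date")
    return problems


good = {"vendor_name": "Acme Water", "account_number": "4059-RX",
        "invoice_date": "2024-03-31", "total": "195.50"}
bad = {"vendor_name": "Acme Water", "account_number": "",
       "invoice_date": "2024-13-01", "total": "12,50"}

print(validate_invoice(good))
print(validate_invoice(bad))
```

Returning every problem at once, rather than stopping at the first, makes it easier to route a rejected invoice to manual review with a complete explanation.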
This project showcases the transformative impact of combining advanced OCR, large language models (LLMs), and workflow automation. The construction company achieved:
The project's success stems not only from using cutting-edge AI models (Gemini 1.5 Pro for OCR, GPT-4.1 as an intelligent agent), but also from meticulous prompt engineering and a solid workflow architecture in n8n. Strategic use of techniques like Chain of Thought ("THINK") reasoning enhances the agent’s ability to handle complex scenarios, interact with external systems (via API), and validate critical data before proceeding.
In summary, we’ve transformed an inefficient, error-prone manual process into a smart, scalable digital workflow, demonstrating how AI can solve real-world operational challenges and drive tangible business value.
What's next?
If you have a similar idea you'd like to implement in your company, feel free to reach out. You can contact me at pablo@ideasforge.io
Copyright ©2025 Ideasforge