FINTECH / AUTOMATION

End-to-End Automation of US Tax Returns via AI Agents & Computer Vision

How we built an autonomous AI agent that operates legacy tax software via computer vision, designed to scale horizontally without friction.

OCR · Computer Vision · AI Agents · Python · LangSmith

The problem

A VC-backed US fintech startup set out to fully automate the US tax return filing workflow. Driving legacy software through computer vision was only part of the challenge. The harder part was making the system scale to the volume needed to generate a real economic return. Most automation projects fail at exactly this step: going from code that works on one machine to operation at scale.

What we did

After a 30-day PoC, we engineered an autonomous agent that combines OCR and Computer Vision to navigate Windows desktop environments and interact with tax software. To guarantee horizontal scalability, we focused on two technical pillars:

1. Stateless Architecture: We made the agent completely stateless. No session state carries over from one run to the next: each run starts clean and everything is cleared when it ends, so the system can scale without memory or process conflicts.
2. Structured Observability: Scaling means managing thousands of interactions. We implemented rigorous log management to avoid being flooded with indecipherable output. Errors are inevitable when interacting with software designed for humans, and when they occur, tracing lets you find the root cause and act fast.
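
As an illustration of the two pillars, here is a minimal Python sketch, not the production code. It assumes a hypothetical `TaxReturnJob` input and a `run_job` function that keeps all working data in a per-run temporary directory and deletes it when the run ends, and it tags every log line with a `run_id` so that one run can be traced in isolation. All names here are invented for the example, the screen interaction is replaced by a placeholder, and the logging uses only the Python standard library rather than any specific tracing product.

```python
import json
import logging
import shutil
import tempfile
import uuid
from dataclasses import dataclass
from pathlib import Path


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # run_id and step arrive through the `extra` argument of the log call.
            "run_id": getattr(record, "run_id", None),
            "step": getattr(record, "step", None),
            "error": repr(record.exc_info[1]) if record.exc_info else None,
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create an isolated logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(f"agent.{uuid.uuid4().hex}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


@dataclass(frozen=True)
class TaxReturnJob:
    """Hypothetical job description: everything a run needs is passed in."""
    job_id: str
    fields: dict


def run_job(job: TaxReturnJob, logger: logging.Logger) -> dict:
    """Process one job from its inputs only; no state survives the call."""
    run_id = uuid.uuid4().hex
    workdir = Path(tempfile.mkdtemp(prefix="agent_run_"))
    ctx = {"run_id": run_id}
    try:
        logger.info("run started", extra={**ctx, "step": "start"})
        for name, value in job.fields.items():
            # Placeholder for the real screen interaction (OCR, clicks, typing).
            (workdir / f"{name}.txt").write_text(str(value))
            logger.info("field entered", extra={**ctx, "step": name})
        status = "ok"
    except Exception:
        logger.exception("run failed", extra={**ctx, "step": "error"})
        status = "failed"
    finally:
        # Clear all per-run data so the next run starts from a clean slate.
        shutil.rmtree(workdir, ignore_errors=True)
    return {"job_id": job.job_id, "run_id": run_id,
            "status": status, "workdir": str(workdir)}
```

Because `run_job` reads nothing but its arguments and leaves nothing behind, any number of workers could call it in parallel on separate machines. Filtering the JSON log lines by `run_id` then recovers the full sequence of steps for a single failed run.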

Result

In just three months the system went to production. The agent allowed the startup to drastically reduce accountants' manual work. Thanks to the stateless approach and monitoring that tracks errors accurately and in real time, the company gained immediate operational scalability, handling volumes of tax returns that would be impossible to process manually.