What Building a Production Agent Actually Teaches You
Everyone's talking about agents. Few are shipping them.
I recently worked through building a multi-agent system for a large-scale image processing task. Not a demo. Not a tutorial walkthrough. A real pipeline that needed to process thousands of files, run on shared infrastructure, and produce reliable outputs.
Here’s what that experience reinforced—and what I wish more AI engineers understood before they start.
1. Your agent is only as good as its environment
The exciting part is the ML. The frustrating part is everything else.
Before my agents could do anything, I had to solve:
Where temporary files get written (and whose quota they count against)
Which machines have internet access (spoiler: on HPC systems, typically not the compute nodes; in the cloud, only if you explicitly allow it)
How model weights get loaded when you can’t download at runtime
These aren’t glamorous problems. But they’re the problems that determine whether your agent actually runs or just looks good in a notebook.
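Those three checks can be sketched as a small preflight step that runs before any agent starts. This is a minimal illustration using only the Python standard library, not the pipeline's actual code: the `AGENT_SCRATCH_DIR` variable, the probe address, and every function name here are assumptions made for the example.

```python
import os
import shutil
import socket
import tempfile
from pathlib import Path


def resolve_scratch_dir(env_var="AGENT_SCRATCH_DIR"):
    # AGENT_SCRATCH_DIR is a hypothetical variable name; if it is unset,
    # fall back to the system temp directory.
    scratch = Path(os.environ.get(env_var) or tempfile.gettempdir())
    scratch.mkdir(parents=True, exist_ok=True)
    return scratch


def free_gib(path):
    # Free space on the filesystem that holds `path`. This is NOT a per-user
    # quota; quota tooling is site-specific and not covered by this sketch.
    return shutil.disk_usage(path).free / 2**30


def has_internet(host="1.1.1.1", port=443, timeout=2.0):
    # Best-effort probe: can this node open an outbound TCP connection?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_local_weights(weights_dir, filename):
    # Weights are expected to be staged on disk ahead of time, so a missing
    # file is an error rather than a trigger for a runtime download.
    path = Path(weights_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"weights not staged at {path}")
    return path


def preflight(weights_dir, filename, min_free_gib=1.0, probe=("1.1.1.1", 443)):
    # Fail early, before any agent does real work.
    scratch = resolve_scratch_dir()
    free = free_gib(scratch)
    if free < min_free_gib:
        raise RuntimeError(f"only {free:.1f} GiB free in {scratch}")
    return {
        "scratch_dir": scratch,
        "free_gib": free,
        "online": has_internet(*probe),
        "weights": find_local_weights(weights_dir, filename),
    }
```

Calling `preflight(weights_dir, "model.bin")` at the top of a job, with a placeholder directory and filename of your own, either returns a report you can log or raises before the agents touch any files. An `online` value of `False` is not a failure on a compute node; it simply confirms that the weights have to come from disk.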
The lesson: Production AI work is 30% models, 70% plumbing. Respect the plumbing.



