Why "AI" is the easy part
Most teams discover quickly that the model is the easy part. The hard part is everything around it — eval pipelines, caching, prompt versioning, observability, fallback strategies, and the cost curves that surprise you in week three.
This post is a frank look at what we learned building three production-grade AI features for clients between Q2 2025 and Q1 2026.
Each one taught us a different lesson.
The first lesson was about latency. A 1.4-second average response felt fast in isolation, but our P95 was 4.8 seconds and our P99 was over 12 seconds. Operators started bypassing the AI entirely.
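Averages hide the tail. Below is a minimal sketch of the percentile math, using the nearest-rank method on synthetic latency samples. The numbers are made up for illustration and are not our production data; `percentile` and `latency_summary` are illustrative helpers.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    # 1-based rank; multiplying before dividing avoids float drift for integer p.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_summary(samples: list[float]) -> dict[str, float]:
    """Mean plus tail percentiles for a batch of response times in seconds."""
    return {
        "mean": statistics.fmean(samples),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


# Synthetic data: 90 fast responses, 8 slowish ones, 2 very slow ones.
samples = [1.0] * 90 + [3.0] * 8 + [12.0] * 2
print(latency_summary(samples))  # → {'mean': 1.38, 'p95': 3.0, 'p99': 12.0}
```

With only 10% of requests slow, the mean stays near 1.4 s while P99 lands at 12 s, and the tail is what operators feel.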
The fix wasn't a faster model. It was a streaming UI, a tighter prompt, and a small-model fallback for the 80% case.
"If the AI can't answer in two seconds, it's not an AI feature, it's a chore." — Engineering lead on the triage team
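One possible reading of a small-model fallback is a latency budget: give the large model a deadline and answer with the small model if it misses it. This is a minimal sketch assuming both models are exposed as async calls; `answer_with_fallback`, the model stubs, and the 2-second default budget (echoing the quote above) are illustrative, not the production setup.

```python
import asyncio
from typing import Awaitable, Callable

# A model call takes a prompt and eventually returns text.
ModelCall = Callable[[str], Awaitable[str]]


async def answer_with_fallback(
    prompt: str,
    primary: ModelCall,
    fallback: ModelCall,
    budget_s: float = 2.0,
) -> str:
    """Return the primary model's answer if it arrives within budget_s, else the fallback's."""
    try:
        # wait_for cancels the primary call if the deadline passes.
        return await asyncio.wait_for(primary(prompt), timeout=budget_s)
    except asyncio.TimeoutError:
        return await fallback(prompt)


# Hypothetical stand-ins for real model clients; the sleeps simulate latency.
async def slow_large_model(prompt: str) -> str:
    await asyncio.sleep(5)
    return f"large: {prompt}"


async def fast_small_model(prompt: str) -> str:
    await asyncio.sleep(0.01)
    return f"small: {prompt}"


print(asyncio.run(answer_with_fallback("route ticket #123", slow_large_model, fast_small_model, budget_s=0.1)))
# → small: route ticket #123
```

A deadline keeps the worst case bounded; an alternative design would route routine requests to the small model up front and skip the wait entirely.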
We learned to treat the evaluation set as the source of truth for what the feature does. Every prompt change was run through it:
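# run_eval_suite is the team's own eval harness, not a public library call
results = run_eval_suite(
    suite="triage_v3",  # versioned eval set for the triage feature
    sample_size=500,  # number of eval cases scored per run
    judges=["accuracy", "tone", "no_hallucination"],  # criteria each output is scored on
)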
A change that didn't move the metrics didn't ship.
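The ship rule can be written as a small gate. This sketch assumes each eval run is reduced to one mean score per judge, keyed by the judge names used above; `should_ship`, its thresholds, and the scores are hypothetical.

```python
from typing import Mapping


def should_ship(
    baseline: Mapping[str, float],
    candidate: Mapping[str, float],
    min_gain: float = 0.01,
    max_regression: float = 0.0,
) -> bool:
    """Ship only if some judge improves by min_gain and no judge drops by more than max_regression."""
    improved = False
    for judge, base_score in baseline.items():
        delta = candidate[judge] - base_score
        if delta < -max_regression:
            return False  # any regression beyond tolerance blocks the change
        if delta >= min_gain:
            improved = True
    return improved  # a change that moves nothing doesn't ship


baseline = {"accuracy": 0.90, "tone": 0.85, "no_hallucination": 0.97}
candidate = {"accuracy": 0.93, "tone": 0.85, "no_hallucination": 0.97}
print(should_ship(baseline, candidate))  # → True
```

Setting `max_regression` above zero trades strictness for tolerance of judge noise between runs.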
Token costs ballooned from ~$280/day to $11k/day in one quarter. The fix was a four-step pipeline:
We brought it back to $1.9k/day with zero quality regression.
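Put in relative terms, the cost figures above work out as follows. This is plain arithmetic on the post's own numbers, treating ~$280/day as exactly $280; `cost_summary` is an illustrative helper.

```python
def cost_summary(before: float, peak: float, after: float) -> dict[str, float]:
    """Relative size of a cost spike and of the cut that followed (inputs in $/day)."""
    return {
        "growth_factor": peak / before,  # how many times costs grew
        "reduction_pct": 100 * (peak - after) / peak,  # share of the peak that was cut
        "after_vs_before": after / before,  # where costs settled relative to the start
        "savings_per_30_days": 30 * (peak - after),  # dollars saved over a 30-day month
    }


summary = cost_summary(before=280, peak=11_000, after=1_900)
print({name: round(value, 1) for name, value in summary.items()})
# → {'growth_factor': 39.3, 'reduction_pct': 82.7, 'after_vs_before': 6.8, 'savings_per_30_days': 273000}
```

Costs grew roughly 39x, the fix cut about 83% off the peak (around $273k per 30 days), and spending settled at about 6.8x the starting rate.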
The next frontier is agents that act, not just answer. We're prototyping tool-using agents for back-office workflows — early signs are promising, but the eval surface is enormous.
If you're building something similar, we'd love to compare notes.