The 200-Call AI Triage: Deployment Lessons for Enterprise

A voice AI prototype that worked flawlessly in a demo broke down in the real world because of a one-second lag. Éric Pinet's team at Unicorne discovered this while testing a triage system for medical clinics across Québec. When responses came in under a second, patients stayed on the line. When responses took longer, patients asked for a human, which undermined the system's purpose.

The experience captures a pattern Pinet calls suspended animation: the state where most enterprise AI projects stall in 2024 and 2025. They do not fail outright. They sit between promise and deployment, usually at one of two points – cost or security. For anyone building or investing in AI-driven services, the clinic project offers a concrete map of where real-world deployment breaks down.

The One-Second Latency Threshold That Killed Voice AI Demos

Pinet, president of Québec City-based Unicorne, told BetaKit that the hard part is not building a beautiful demonstration. "The hard part is putting it into production with real customers, and making it work at an efficient cost." In the clinic use case, every patient response travels through multiple steps: speech to text, a reasoning model, and back to speech. Any transition can introduce lag.

To keep the conversation flowing, the team built in short acknowledgments – phrases like "OK, I understand" – that the model delivers while it works on its next response. That is a low-tech fix for a high-stakes problem. If latency exceeds the one-second budget, the patient perceives the system as robotic and requests a human handoff, which defeats the purpose of automated triage.

Practical rule: Latency is not a performance metric. It is a conversion metric. In voice AI, sub-second response determines whether the user stays engaged. For any real-time customer-facing AI, latency tolerance is the first deployment constraint.

Cost: The Token Trap Most Demos Ignore

Pinet identifies cost as the first point where AI projects stall. "A system that feels cheap in a demo can become hard to sustain once it's handling real volume." Every interaction with a generative model carries a token cost, and those costs do not amortize the way traditional SaaS does. A prototype that handles a few dozen test calls looks efficient. The same system at 200 calls a day becomes unaffordable unless it was designed from the start to manage how much information flows through the model.
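A rough back-of-the-envelope model shows why the amount of information sent to the model matters. Every number here is an assumption chosen for illustration (turn count, tokens per turn, and price are not figures from Unicorne or from any provider's price list). The sketch compares a design that resends the whole conversation on every turn with one that sends only the current turn.

```python
def tokens_per_call(turns, tokens_per_turn, resend_history):
    """Input tokens for one call under a simplified, assumed model."""
    if resend_history:
        # Turn k resends k turns' worth of text: 1 + 2 + ... + turns.
        return tokens_per_turn * turns * (turns + 1) // 2
    return tokens_per_turn * turns


def daily_cost(calls_per_day, tokens_each_call, price_per_1k_tokens):
    """Daily spend in the same currency unit as the (hypothetical) price."""
    return calls_per_day * tokens_each_call * price_per_1k_tokens / 1000


# Illustrative inputs only: 200 calls/day, 10 turns, 150 tokens per turn.
full_history = tokens_per_call(10, 150, resend_history=True)
current_turn_only = tokens_per_call(10, 150, resend_history=False)
ratio = full_history / current_turn_only  # how much more the naive design sends
```

Under these assumptions the naive design sends several times more tokens per call, and the gap widens as conversations get longer.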

The clinic system now handles more than 200 calls a day across client clinics, according to Pinet. That volume exposes the true cost structure. The system must decide which parts of the conversation require generative reasoning and which can use cheaper, deterministic triage protocols. Unicorne built the pipeline inside AWS using Connect for calls, Nova Sonic for voice, and Bedrock for reasoning over the clinic's triage protocols. The choice to run everything inside a single cloud environment was driven partly by cost control – each component has its own billing and latency profile.
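The split between deterministic protocols and generative reasoning can be illustrated with a small router. This is a sketch under stated assumptions: the keyword table is invented for the example and is not a clinical protocol, and `call_model` stands in for whatever reasoning model the pipeline uses. Requests that match a rule never reach the model, so they incur no token cost.

```python
from dataclasses import dataclass

# Hypothetical deterministic rules: keyword -> triage label. Not a clinical protocol.
RULES = {
    "chest pain": "urgent",
    "prescription renewal": "routine",
}


@dataclass
class Routed:
    handler: str  # "rules" or "model"
    result: str


def route(utterance, call_model):
    """Try the cheap rule table first; fall back to the model only when needed."""
    text = utterance.lower()
    for keyword, label in RULES.items():
        if keyword in text:
            return Routed("rules", label)
    return Routed("model", call_model(utterance))
```

Counting how often the fallback fires gives a direct measure of how much of the traffic actually needs the expensive path.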

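Key insight: The cost of running generative AI does not flatten with volume the way traditional software costs do. It can grow faster than linearly, for instance when every turn resends a growing conversation history, unless the architecture limits the surface area of model calls. The winning designs treat the model as an expensive resource, not as the default compute layer.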

Security: Why Infrastructure Determines Product Viability

Pinet's second deployment hurdle is security. "Prototypes often rely on external APIs, which are difficult to defend in a regulated environment." When audio containing patient health data leaves your infrastructure, you are trusting someone else to keep it safe and to decide where the data goes. Under Québec's privacy rules, that is not acceptable.

The Québec Triage Pipeline: Infrastructure as the Product

The old system in Québec clinics relied on receptionists taking messages without clinical context. Nurses then called patients back in order, learning the reason for the call only once the conversation began. The AI solution addresses the first contact point. A voice bot answers, asks structured questions, applies each clinic's triage protocols, and produces a summary. By the time a nurse calls back, they have a thorough summary and urgent cases can be prioritized earlier.
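The change in callback order can be sketched as a priority queue. The urgency labels and their ranking below are hypothetical, not the clinics' actual protocol. Each intake carries an urgency label and a summary; callbacks are ordered by urgency first and by arrival order within the same urgency, instead of purely by arrival.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical urgency ranking; lower rank is called back first.
URGENCY_RANK = {"urgent": 0, "soon": 1, "routine": 2}


@dataclass(order=True)
class Callback:
    rank: int
    arrival: int                         # position in the original call order
    summary: str = field(compare=False)  # intake summary shown to the nurse


def callback_order(intakes):
    """Return summaries in callback order: by urgency, then by arrival."""
    heap = [
        Callback(URGENCY_RANK[urgency], i, summary)
        for i, (urgency, summary) in enumerate(intakes)
    ]
    heapq.heapify(heap)
    return [heapq.heappop(heap).summary for _ in range(len(heap))]
```

With the old message-taking workflow, the equivalent ordering would be arrival order alone, because the urgency was unknown until the callback began.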

Pinet reports that nurses have responded positively, noting the intake summaries improve efficiency. The system now operates across multiple clinics. The deployment required more than model tuning – it required mapping the exact handoff scenarios and building the infrastructure to support them.

Risk to watch: The biggest risk for any regulated enterprise AI is not model accuracy. It is the gap between what the infrastructure can log and what the regulator requires. If the audit trail is incomplete, the system cannot be deployed regardless of how well the model performs.
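One way to make that gap visible is to compare what each call record actually contains against a list of required fields. The field names below are hypothetical placeholders and do not come from any regulation or from Unicorne's system. The real list would have to be derived from the applicable rules.

```python
# Hypothetical required fields; the real list depends on the applicable regulation.
REQUIRED_FIELDS = frozenset(
    {"call_id", "timestamp", "consent_recorded", "model_version", "handoff_reason"}
)


def audit_gaps(records, required=REQUIRED_FIELDS):
    """Map each call_id to the required fields its log record is missing."""
    gaps = {}
    for record in records:
        missing = sorted(name for name in required if record.get(name) is None)
        if missing:
            gaps[record.get("call_id", "<unknown>")] = missing
    return gaps
```

Running a check like this against staging logs before launch turns the audit requirement into a test that can fail early, rather than a finding that blocks deployment later.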

What This Means for Enterprise AI Buyers

Pinet's advice to founders is to ask the unglamorous questions early. "What's the cost? What's the security? How do you manage access patterns?" These questions rarely feel urgent in the demo phase. They become the binding constraint in production.

For investors tracking AI infrastructure plays, the gap between demo and production is a key risk factor. Companies that solve the cost and security problems for regulated use cases – like healthcare and finance – own a durable advantage. Those that sell beautiful demos without a path to deployment will see their customer pipelines stall. AlphaScala's stock market analysis tracks how these deployment bottlenecks affect sector-wide valuations.

The Unicorne clinic project is a small window into that larger pattern. A one-second latency difference, a poorly designed token budget, or an external API you cannot audit can turn a working model into a shelfware project. The companies that survive the production gap are the ones that design for deployment first and demo second.
