
Unicorne's voice AI handles 200 calls/day at Québec clinics. The real lesson: latency, cost, and security decide which AI projects survive the production gap.
A voice AI prototype that worked flawlessly in a demo broke down in the real world because of a one-second lag. Éric Pinet's team at Unicorne discovered this while testing a triage system for medical clinics across Québec. When responses came in under a second, patients stayed on the line. When it took longer, they asked for a human, undermining the system's purpose.
The experience captures a pattern Pinet calls suspended animation: the state where most enterprise AI projects stall in 2024 and 2025. They do not fail outright. They sit between promise and deployment, usually at one of two points – cost or security. For anyone building or investing in AI-driven services, the clinic project offers a concrete map of where real-world deployment breaks down.
Pinet, president of Québec City-based Unicorne, told BetaKit that the hard part is not building a beautiful demonstration. "The hard part is putting it into production with real customers, and making it work at an efficient cost." In the clinic use case, every patient response travels through multiple steps: speech to text, a reasoning model, and back to speech. Any transition can introduce lag.
To keep the conversation flowing, the team built in short acknowledgments – phrases like "OK, I understand" – that the model delivers while it works on its next response. That is a low-tech fix for a high-stakes problem. If a response takes longer than the one-second latency budget, the patient perceives the system as robotic and requests a human handoff, defeating the purpose of triage automation.
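The acknowledgment trick can be sketched in a few lines of Python. This is a minimal illustration, not Unicorne's implementation: `generate_reply` and `speak` are hypothetical async callables standing in for the reasoning and speech stages, and the 0.6-second `filler_after_s` threshold is an assumed value chosen to leave headroom under the one-second limit.

```python
import asyncio


async def respond_with_filler(generate_reply, speak, filler_after_s=0.6,
                              filler="OK, I understand."):
    """Speak a short acknowledgment if the real reply is running late.

    generate_reply and speak are hypothetical async callables; the
    threshold is an assumption, not a figure from the article.
    """
    task = asyncio.create_task(generate_reply())
    # Wait for the reply, but only up to the filler threshold.
    done, _pending = await asyncio.wait({task}, timeout=filler_after_s)
    if not done:
        # The reply is late: fill the silence while the model keeps working.
        await speak(filler)
    reply = await task
    await speak(reply)
    return reply
```

A fast reply is spoken directly. A slow one is preceded by the filler phrase, so the caller hears something before the silence becomes noticeable.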
Practical rule: Latency is not a performance metric. It is a conversion metric. In voice AI, sub-second response determines whether the user stays engaged. For any real-time customer-facing AI, latency tolerance is the first deployment constraint.
Pinet identifies cost as the first point where AI projects stall. "A system that feels cheap in a demo can become hard to sustain once it's handling real volume." Every interaction with a generative model carries a token cost, and those costs do not amortize the way traditional SaaS costs do. A prototype that handles a few dozen test calls looks efficient. The same system at 200 calls a day becomes unaffordable unless it was designed from the start to manage how much information flows through the model.
The clinic system now handles more than 200 calls a day across client clinics, according to Pinet. That volume exposes the true cost structure. The system must decide which parts of the conversation require generative reasoning and which can use cheaper, deterministic triage protocols. Unicorne built the pipeline inside AWS using Connect for calls, Nova Sonic for voice, and Bedrock for reasoning over the clinic's triage protocols. The choice to run everything inside a single cloud environment was driven partly by cost control – each component has its own billing and latency profile.
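The split between generative and deterministic handling might look like the sketch below. It is an assumption-laden illustration: the yes/no word lists, the `route` function, and the `call_model` callable are all hypothetical, and the article does not describe how Unicorne actually decides which path a turn takes.

```python
from typing import Callable, Optional, Tuple, Union

# Hypothetical vocab for answers to structured yes/no questions.
YES = {"yes", "yeah", "yep", "oui"}
NO = {"no", "nope", "non"}


def parse_yes_no(utterance: str) -> Optional[bool]:
    """Return True/False for a plain yes/no answer, else None."""
    word = utterance.strip().lower().rstrip(".!")
    if word in YES:
        return True
    if word in NO:
        return False
    return None


def route(utterance: str,
          call_model: Callable[[str], str]) -> Tuple[str, Union[bool, str]]:
    """Use the cheap deterministic parser first; call the model only if needed."""
    answer = parse_yes_no(utterance)
    if answer is not None:
        return ("deterministic", answer)
    # Free-form speech: fall back to the (expensive) generative model.
    return ("model", call_model(utterance))
```

Under these assumptions, a caller who answers "Yes." to a structured question never touches the model, while a free-form description of symptoms does.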
Key insight: The cost of running generative AI is not necessarily linear with volume. It can grow faster than linearly, for example when every turn resends the full conversation history to the model, unless the architecture limits the surface area of model calls. The winning designs treat the model as an expensive resource, not as the default compute layer.
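A rough sketch shows why limiting what flows through the model matters. It assumes a simple, hypothetical setup in which every turn adds a fixed number of tokens and the model is sent either the full conversation so far or only the most recent turns. The token counts are illustrative, not figures from Unicorne.

```python
def input_tokens_full_history(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens when each turn resends the whole conversation."""
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


def input_tokens_capped(turns: int, tokens_per_turn: int,
                        max_context_turns: int) -> int:
    """Total input tokens when only the last max_context_turns are resent."""
    return sum(min(k, max_context_turns) * tokens_per_turn
               for k in range(1, turns + 1))


# Illustrative numbers only: 50 tokens per turn, context capped at 3 turns.
for turns in (10, 20):
    full = input_tokens_full_history(turns, 50)
    capped = input_tokens_capped(turns, 50, 3)
    print(turns, full, capped)
```

With these assumed numbers, a 10-turn call costs 2,750 input tokens with full history and 1,350 with the cap. Doubling the call to 20 turns raises the full-history figure to 10,500, nearly four times as much, while the capped figure rises to 2,850. Multiplied across 200 calls a day, that difference is the cost structure the article describes.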
Pinet's second deployment hurdle is security. "Prototypes often rely on external APIs, which are difficult to defend in a regulated environment." When audio containing patient health data leaves your infrastructure, you are trusting someone else to keep it safe and decide where the data goes. For Québec's privacy rules, that is not acceptable.
Unicorne's approach inverts the typical build order. Infrastructure questions – where the data lives, who sees it, what gets logged – come before the model questions. In regulated settings, infrastructure is not a backdrop to the product. It is the product. The clinic pipeline runs entirely inside a single AWS environment. Patient audio never leaves that secure environment. Each interaction is logged, and every decision is traceable.
That traceability is not a nice-to-have. In regulated healthcare, the AI must produce an audit trail for every triage decision. The model does not make a diagnosis – Pinet is explicit: "The AI is only for the first triage. It's not for diagnosis." But the handoff criteria must be codified. The team mapped out three triggers for a human takeover:
Each trigger must be logged, time-stamped, and escalated. Without infrastructure built for that logging, the product cannot pass compliance review.
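A minimal sketch of what a logged, time-stamped handoff record could look like is shown below. Everything in it is hypothetical: the field names, the `record_handoff` helper, and the placeholder trigger label are illustrative, since the article does not publish Unicorne's log schema or name the triggers.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class HandoffEvent:
    call_id: str       # hypothetical identifier for the call
    trigger: str       # placeholder label; the real triggers are not given
    escalated_to: str  # who receives the handoff
    timestamp: str     # ISO 8601, UTC


def record_handoff(call_id: str, trigger: str, escalated_to: str,
                   sink: List[str],
                   now: Optional[datetime] = None) -> HandoffEvent:
    """Append one JSON line per handoff to an append-only sink."""
    now = now or datetime.now(timezone.utc)
    event = HandoffEvent(call_id, trigger, escalated_to, now.isoformat())
    sink.append(json.dumps(asdict(event), sort_keys=True))
    return event
```

Here `sink` is a plain list standing in for whatever durable, append-only store a compliance review would require. The point of the sketch is only that each escalation produces exactly one structured, time-stamped entry that can be traced later.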
The old system in Québec clinics relied on receptionists taking messages without clinical context. Nurses then called patients back in order, learning the reason for the call only once the conversation began. The AI solution addresses the first contact point. A voice bot answers, asks structured questions, applies each clinic's triage protocols, and produces a summary. By the time a nurse calls back, they have a thorough summary and urgent cases can be prioritized earlier.
Pinet reports that nurses have responded positively, noting the intake summaries improve efficiency. The system now operates across multiple clinics. The deployment required more than model tuning – it required mapping the exact handoff scenarios and building the infrastructure to support them.
Risk to watch: The biggest risk for any regulated enterprise AI is not model accuracy. It is the gap between what the infrastructure can log and what the regulator requires. If the audit trail is incomplete, the system cannot be deployed regardless of how well the model performs.
Pinet's advice to founders is to ask the unglamorous questions early. "What's the cost? What's the security? How do you manage access patterns?" These questions rarely feel urgent in the demo phase. They become the binding constraint in production.
For investors tracking AI infrastructure plays, the gap between demo and production is a key risk factor. Companies that solve the cost and security problems for regulated use cases – like healthcare and finance – own a durable advantage. Those that sell beautiful demos without a path to deployment will see their customer pipelines stall. AlphaScala's stock market analysis tracks how these deployment bottlenecks affect sector-wide valuations.
The Unicorne clinic project is a small window into that larger pattern. A one-second latency difference, a poorly designed token budget, or an external API you cannot audit can turn a working model into a shelfware project. The companies that survive the production gap are the ones that design for deployment first and demo second.
Prepared with AlphaScala research tooling and grounded in primary market data: live prices, fundamentals, SEC filings, hedge-fund holdings, and insider activity. Each story is checked against AlphaScala publishing rules before release. Educational coverage, not personalized advice.