Why an Indian startup charges per output, not per token, for AI

Token prices dropped roughly 35% in two years. Enterprise AI budgets rose nearly 6X in the same period, to $7 million in 2026. That gap is where Turiyam, a Bengaluru chip startup, is building a business.

Sanchayan Sinha founded Turiyam in December 2024 with Parag Jain and Praveen Jain. The firm builds inference chips – hardware for running an existing model rather than training one – without using any Nvidia components in the stack. Sinha told Digitimes Asia he expects 95% of India's AI market to eventually run on inference alone.

The cost dynamic is well understood. A former CTO at an Indian enterprise described it this way: "Two years ago, we might have spent $45 to generate a thousand responses. Today, we can do that for a fraction of the cost, almost $0.75. We're no longer generating just a thousand responses. We're generating millions." The total bill exploded even as the per-query price collapsed.
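The arithmetic behind that quote can be sketched in a few lines. The two prices come from the CTO's figures; the 5 million response volume is a hypothetical stand-in for "millions", and the function name is illustrative only.

```python
# Prices per thousand responses, taken from the former CTO's quote (USD).
OLD_PRICE_PER_1K = 45.00
NEW_PRICE_PER_1K = 0.75


def total_bill(responses: int, price_per_1k: float) -> float:
    """Total spend for a given number of responses at a per-thousand price."""
    return responses / 1000 * price_per_1k


# How many times volume must grow before the new bill matches the old one.
break_even_multiple = OLD_PRICE_PER_1K / NEW_PRICE_PER_1K

old_bill = total_bill(1_000, OLD_PRICE_PER_1K)

# Hypothetical volume: the quote says only "millions", so 5 million is assumed.
new_bill = total_bill(5_000_000, NEW_PRICE_PER_1K)

print(f"Break-even volume multiple: {break_even_multiple:.0f}x")
print(f"Old bill for 1,000 responses: ${old_bill:,.2f}")
print(f"New bill for 5,000,000 responses: ${new_bill:,.2f}")
```

On these figures the per-response price falls by a factor of 60, so any volume above 60,000 responses costs more than the original thousand did. At the assumed 5 million responses, the bill is $3,750 against $45 before.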

Turiyam's answer is to stop charging per token. Most AI providers bill by token volume. Turiyam charges by the finished product – a generated image, a completed customer call – at a price it claims is 10% to 20% of the cost of running the same job on Nvidia. A company spending Rs 2 crore a year on AI infrastructure could, by that math, spend about Rs 40 lakh on the same functions, at the 20% end of that range.

The pricing model matters because it changes the incentive. Under per-token billing, the provider's revenue rises with the number of tokens processed. Under per-output billing, the price per task is fixed, so the provider earns more only by driving down its cost per task. The two goals pull in opposite directions. Turiyam's chip efficiency directly determines its margin per output: if the chip is efficient, the provider gets paid the same for less compute, and the customer pays a predictable price per task regardless of how many tokens the job consumes. Whether the unit economics hold at scale is unproven – Turiyam hasn't disclosed production capacity or customer names – but the direction is real.
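The incentive difference can be illustrated with a toy model. Every number below is hypothetical and expressed in arbitrary cost units; none of it reflects Turiyam's actual prices or costs. The sketch compares what a provider earns on one task before and after an efficiency gain that halves both the tokens used and the compute cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    tokens: int          # tokens consumed to complete one task
    compute_cost: float  # provider's own cost to run the task (cost units)


def per_token_revenue(task: Task, price_per_1k_tokens: float) -> float:
    """Revenue scales with the tokens the task consumes."""
    return task.tokens / 1000 * price_per_1k_tokens


def per_output_revenue(task: Task, price_per_output: float) -> float:
    """Revenue is fixed per finished task, whatever the token count."""
    return price_per_output


def margin(revenue: float, task: Task) -> float:
    """Provider margin on a single task."""
    return revenue - task.compute_cost


# Hypothetical tasks: the efficient version uses half the tokens and compute.
baseline = Task(tokens=4000, compute_cost=20.0)
efficient = Task(tokens=2000, compute_cost=10.0)

PRICE_PER_1K_TOKENS = 10.0  # assumed per-token tariff
PRICE_PER_OUTPUT = 40.0     # assumed flat price per finished task

for name, task in (("baseline", baseline), ("efficient", efficient)):
    token_rev = per_token_revenue(task, PRICE_PER_1K_TOKENS)
    output_rev = per_output_revenue(task, PRICE_PER_OUTPUT)
    print(
        f"{name}: per-token revenue {token_rev:.0f}, margin {margin(token_rev, task):.0f}; "
        f"per-output revenue {output_rev:.0f}, margin {margin(output_rev, task):.0f}"
    )
```

Under these assumptions, the efficiency gain halves per-token revenue and margin (40 to 20 and 20 to 10), while per-output revenue stays at 40 and the margin rises from 20 to 30. The customer's bill per task falls in the first case and stays flat in the second, which is why the per-output price level matters as much as the billing unit.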

The read-through for Nvidia is not that Turiyam will take immediate market share. NVDA shares fell 4.13% to $200.04 today, with an Alpha Score of 70. The move reflects a broader market reassessment of AI chip pricing power. If enough demand shifts to ultra-low-cost inference architectures, the pricing premium Nvidia commands on inference workloads could narrow. The competition is on cost per completed task, not FLOPs or throughput.

Enterprise buyers are already pushing for output-based pricing. The former CTO said his team now negotiates on a per-task basis. "We're telling every vendor: charge us per completed call, not per token. They hate it. That's where the market is going." The question is not whether Turiyam succeeds as a company; it is too early to assess that. The real question is whether the pricing standard it is proposing gains adoption.

Sinha's 95% inference projection is a long-term bet. The near-term signal is the pricing model itself. If the per-output standard catches on, it will reshape revenue models across the AI stack, from chip designers to hyperscalers. The catalyst is already here: enterprise buyers demanding a different way to pay.
