Infinigence Token Calls Surge 20x as Inference Overtakes Training

Infinigence said its Agentic MaaS platform saw token call volume jump more than 20-fold in six months. The company added that inference now accounts for a larger share of compute spending than training, a shift that underpins its strategy as a neutral infrastructure layer between chip makers and model developers.

MaaS, short for model-as-a-service, lets AI agents call models on demand. The surge in token volume reflects a broader industry shift from training large models to running them in production: as applications go live, inference workloads multiply. Infinigence said its call volume growth came from existing customers scaling up usage, not just from new sign-ups.

The company positions itself as an independent middleware provider. It does not build its own chips or foundation models. Instead, it routes inference requests across multiple hardware and model providers, choosing the cheapest or fastest option for each call. That neutrality, Infinigence argues, gives developers flexibility and avoids lock-in to any single cloud or chip architecture.
