
Infinigence's Agentic MaaS platform saw token call volume jump over 20x in six months. Inference now drives more compute spend than training, the company said.
Infinigence said its Agentic MaaS platform saw token call volume jump more than 20-fold in six months. The company added that inference now accounts for a larger share of compute spending than training, a shift that underpins its strategy as a neutral infrastructure layer between chip makers and model developers.
MaaS is short for model-as-a-service, and the platform lets AI agents call models on demand. The surge in token volume reflects a broader industry move from training large models to running them in production: as applications go live, inference workloads multiply. Infinigence said its call volume growth came from existing customers scaling usage, not just new sign-ups.
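To put the headline figure in perspective, the arithmetic below converts a 20-fold increase over six months into an implied monthly rate. It assumes steady compounding, which the company did not claim, and treats 20x as a floor because the reported figure was "more than" 20-fold. The function name is illustrative only.

```python
def implied_monthly_growth(multiple: float, months: int) -> float:
    """Return the constant monthly growth rate that compounds to `multiple`.

    Assumes smooth compounding; real call volume may have grown unevenly.
    """
    if multiple <= 0 or months <= 0:
        raise ValueError("multiple and months must be positive")
    return multiple ** (1.0 / months) - 1.0


# 20x over six months, the lower bound reported by Infinigence.
rate = implied_monthly_growth(20.0, 6)
print(f"Implied monthly growth: {rate:.1%}")
```

Under that assumption, volume would have grown by roughly 65% each month, which is a way to read the figure rather than a number the company reported.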
The company positions itself as an independent middleware provider. It does not build its own chips or foundation models. Instead, it routes inference requests across multiple hardware and model providers, choosing the cheapest or fastest option for each call. That neutrality, Infinigence argues, gives developers flexibility and avoids lock-in to any single cloud or chip architecture.
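The cheapest-or-fastest choice described above can be sketched in a few lines. This is a minimal illustration, not Infinigence's implementation: the backend names, prices, latencies, and the `pick_backend` function are all hypothetical, and a production router would also weigh availability, model quality, and capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str                  # hypothetical backend label
    cost_per_1k_tokens: float  # hypothetical price in USD
    p50_latency_ms: float      # hypothetical median latency


def pick_backend(backends: list[Backend], objective: str = "cheapest") -> Backend:
    """Pick one backend per call, by lowest cost or lowest latency.

    Ties on the primary metric are broken by the other metric.
    """
    if not backends:
        raise ValueError("no backends available")
    if objective == "cheapest":
        return min(backends, key=lambda b: (b.cost_per_1k_tokens, b.p50_latency_ms))
    if objective == "fastest":
        return min(backends, key=lambda b: (b.p50_latency_ms, b.cost_per_1k_tokens))
    raise ValueError(f"unknown objective: {objective!r}")


# Made-up figures for illustration only.
BACKENDS = [
    Backend("gpu-cloud-a", 0.60, 180.0),
    Backend("accelerator-b", 0.35, 420.0),
    Backend("gpu-cloud-c", 0.90, 95.0),
]

print(pick_backend(BACKENDS, "cheapest").name)
print(pick_backend(BACKENDS, "fastest").name)
```

With these made-up numbers the two objectives select different backends, which is the flexibility the company says a neutral layer provides.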
The shift from training to inference changes the economics of AI compute. Training requires massive, concentrated bursts of GPU time. Inference is more distributed, latency-sensitive, and runs continuously. Platforms that can route those inference workloads efficiently stand to capture a growing share of spending. Infinigence described its flywheel this way: more token calls attract more model providers, which improves routing options, which draws more developers, which drives more calls.
Competition is not standing still. Cloud hyperscalers offer their own inference services, often bundled with storage and networking. Some model providers, like OpenAI and Anthropic, sell direct access to their models. Infinigence's pitch is that developers want a layer that works across all of them without favoring one. The company said its growth rate suggests that pitch is resonating, at least among AI agent builders who need to switch models frequently.
The company did not disclose revenue or profit margins. It said it remains focused on expanding the number of supported models and hardware backends. Infinigence plans to add support for more specialized inference chips, including those from startups that target lower power consumption per token.
For now, the token volume numbers are the clearest signal. A 20x increase in six months, at a time when inference is overtaking training in compute spending, points to a market where middleware can capture value without owning the underlying compute or the model. Whether that holds as hyperscalers tighten their bundles is the open question. Infinigence says its independence is the reason customers choose it, not a weakness.