Treat AI Agents as Untrusted Systems, Researchers Warn Crypto

A May 2026 paper published on arXiv argues that AI agents handling financial transactions should be architected as fundamentally untrusted components within larger systems. The paper, titled “Agent Security is a Systems Problem” (arXiv:2605.18991), arrives as the crypto industry bets heavily on autonomous agents to manage DeFi trades, wallet operations, and stablecoin payments.

Circle CEO Jeremy Allaire has projected that billions of AI agents will independently conduct economic activities using stablecoins within the next three to five years. That timeline makes the paper’s recommendations urgent for anyone building or investing in crypto infrastructure that touches AI.

The Core Argument: Least-Privilege Sandboxing for Agents

Modern operating systems do not trust individual processes. Every application runs in a sandbox with limited permissions, can only access files it has been explicitly granted, and gets terminated if it tries to reach beyond its boundaries. The researchers want the same philosophy applied to AI agents.

The paper advocates for three specific measures:

Enforcing security invariants at the system level – hard rules that cannot be overridden by the AI itself.
Implementing least-privilege sandboxing – agents only get access to the minimum resources needed for their specific task.
Ensuring effective separation of instructions from data – addressing one of the most dangerous attack vectors in AI systems today.

That last point matters more than it might sound. Prompt injection attacks work precisely because AI agents often cannot distinguish between legitimate instructions and malicious data that contains hidden commands. When an agent processes a transaction memo that secretly contains instructions to redirect funds, the lack of separation becomes a real-dollar problem.

Why Prompt Injection Is a $500,000 Problem

An April 2026 incident resulted in exactly $500,000 being drained from a crypto wallet due to flaws in AI infrastructure and malicious tool calls. The attack exploited the kind of vulnerability the researchers are warning about: an AI agent with too much access, insufficient verification of the tools it was calling, and no system-level guardrails to catch the anomaly before funds left the wallet.

The autonomous nature of these agents compounds the risk. A human trader who receives a phishing email might pause and think. An AI agent that receives a carefully crafted prompt injection executes it at machine speed, potentially draining assets before any monitoring system can react.

The Systems Problem vs. Model Problem Distinction

The paper’s recommendation to treat this as a “systems problem” rather than a “model problem” is a meaningful distinction. It shifts responsibility from AI developers alone to the broader ecosystem of infrastructure providers, protocol designers, and platform operators.

Most current security research focuses on making models more robust against adversarial inputs. That approach has limits. A model that passes every red-team test can still be compromised if the system around it allows a malicious tool call to execute without verification.

What the paper does not say

The researchers do not argue that AI models are inherently dangerous. They argue that the current deployment pattern – giving agents broad access to wallets, signing keys, and DeFi protocols – is structurally unsafe. The fix is architectural, not behavioral.

Who Is Exposed: DeFi Protocols, Wallet Providers, and Stablecoin Issuers

What to Watch: Hardware Anchors and Verifiable Computation

The paper points to two emerging categories of infrastructure that will likely become table stakes for institutional-grade AI agent platforms within the next 12 to 18 months.

Verifiable computation for AI agent actions

If an agent’s decision-making process can be cryptographically verified after the fact, stolen funds can be traced and potentially clawed back. Protocols that implement zero-knowledge proofs for agent reasoning will have a structural advantage in attracting institutional capital.

Mandatory least-privilege access controls

Platforms that enforce per-transaction permission grants – where an agent must request and receive explicit approval for each action – will reduce the blast radius of any single compromise. This is the crypto equivalent of an operating system asking for camera permission every time an app opens it.

The Practical Takeaway for Traders and Builders

For traders evaluating crypto projects that claim AI integration, the relevant question is not “how smart is the model?” but “how is the agent sandboxed?”. Projects that cannot answer that question with specific architectural details are taking on uncompensated risk.

For builders, the paper provides a clear checklist: enforce invariants at the system level, sandbox agents to least privilege, and separate instructions from data. The April 2026 incident shows the cost of ignoring any of those three measures.

For further context on how crypto infrastructure is evolving to handle autonomous agents, see AlphaScala’s crypto market analysis and the Bitcoin (BTC) profile for broader market positioning.

Treat AI Agents as Untrusted Systems, Researchers Warn Crypto

The Core Argument: Least-Privilege Sandboxing for Agents

Why Prompt Injection Is a $500,000 Problem

The Systems Problem vs. Model Problem Distinction

What the paper does not say

Who Is Exposed: DeFi Protocols, Wallet Providers, and Stablecoin Issuers

DeFi protocols

Wallet providers

Stablecoin issuers

What to Watch: Hardware Anchors and Verifiable Computation

Verifiable computation for AI agent actions

Mandatory least-privilege access controls

The Practical Takeaway for Traders and Builders

Explore More

More from AlphaScala

Trading Q&A

Related Tools & Research

Asset Profiles