Prompt Injection Is the Vulnerability Class We Don't Know How to Fix Yet
Executive Summary
Security leaders are accustomed to a particular rhythm with vulnerabilities. A flaw is discovered, a patch is issued, the patch is applied, the risk is reduced. The entire apparatus of vulnerability management assumes that vulnerabilities can, in principle, be fixed. Prompt injection breaks that assumption — and it is doing so at precisely the moment organizations are connecting AI systems to their real tools, data and workflows.
Prompt injection is, at its core, the exploitation of a fundamental property of how large language models work: they do not reliably distinguish between the instructions they are supposed to follow and the data they are supposed to process. When an AI system reads a document, a web page, an email or a database record, malicious instructions hidden in that content can be interpreted by the model as commands. The model was given untrusted data and treated part of it as a trusted instruction — because, structurally, it has no robust way to tell the difference.
This is not a bug in a particular product that a vendor will patch next quarter. It is a characteristic of the technology itself, and despite significant effort, there is no reliable, general fix. For a security function, that is an unfamiliar and uncomfortable category: a vulnerability class that has to be managed rather than eliminated.
Why This Matters Now
For a while, prompt injection was a curiosity — a way to make a chatbot misbehave or say something it shouldn't. The consequences were embarrassing rather than serious, because the AI system could only produce text. That has changed decisively.
The shift to AI agents — systems that do not just answer but act, calling tools, accessing data, executing tasks and interacting with other systems — has transformed prompt injection from a content problem into an action problem. An agent that can be hijacked by injected instructions is not merely going to say the wrong thing; it can be induced to take the wrong action: to exfiltrate data it has access to, to invoke a tool in a harmful way, to send information to an attacker, or to manipulate a system it is connected to. The blast radius of a successful injection is now the full set of permissions and capabilities the agent holds.
And organizations are connecting these agents to real systems quickly, often without fully internalizing that every piece of untrusted content the agent processes is a potential injection vector. The combination of an unsolved vulnerability class and rapid deployment into consequential roles is exactly the kind of gap that produces incidents.
CISO2CISO Insight
We know how to manage vulnerabilities we can patch. Prompt injection asks a harder question: how do you secure a system whose core behavior is to follow instructions, when you cannot guarantee it can tell your instructions from an attacker's?
Why It Resists a Fix
Understanding why prompt injection is so stubborn is essential to managing it, because the instinct to wait for a patch will leave the organization exposed indefinitely.
Instructions and data share the same channel. In a traditional application, code and data are architecturally separate, and decades of security engineering have hardened the boundary between them. In a language model, instructions and data arrive as the same kind of input — text — and are processed by the same mechanism. The separation that makes injection preventable in conventional software does not cleanly exist here.
Mitigations reduce but do not eliminate. Considerable effort has gone into defenses — filtering inputs, instructing models to ignore embedded commands, separating system and user content. These help, sometimes substantially. But they are probabilistic mitigations against a determined adversary, not the deterministic guarantees that "patched" implies. A defense that works against most injections is not the same as a vulnerability that has been closed.
The attack surface is everything the AI reads. Any content an AI system processes can carry an injection — a document, a website, an email, a calendar entry, a record retrieved from a database, the output of another tool. As AI systems are connected to more sources, the surface expands accordingly, and much of it is content the organization does not control.
Managing What You Cannot Patch
If elimination is not available, the discipline shifts to containment — limiting what a successful injection can achieve. This is familiar territory for security, even if the trigger is novel.
Constrain the agent's permissions ruthlessly. The damage a hijacked agent can do is bounded by what it is allowed to do. An agent with broad access and powerful tools is a catastrophic injection target; an agent with narrowly scoped, least-privilege access is a contained one. Treating agent permissions with the same rigor applied to any powerful identity is the single most important control.
Keep a human in the loop for consequential actions. For actions with significant impact — moving money, deleting data, sending external communications, changing configurations — requiring human confirmation breaks the chain between an injection and an irreversible outcome. Full autonomy and high consequence are a dangerous combination when injection cannot be ruled out.
Treat all ingested content as untrusted. The organization should assume that any external content an AI system processes may contain an injection attempt, and design the system so that processing untrusted content cannot, by itself, trigger privileged actions.
Monitor agent behavior for anomalies. Because injection produces actions, those actions can be watched. An agent suddenly accessing data or invoking tools outside its normal pattern is detectable — if the organization is monitoring agent behavior the way it monitors other privileged activity.
Executive Framework
| Dimension | Traditional vulnerability | Prompt injection |
|---|---|---|
| Root cause | A specific flaw in code | Inherent inability to separate instruction from data |
| Remedy | A patch eliminates it | No reliable general fix — managed, not closed |
| Trigger | Crafted exploit | Malicious instructions in any ingested content |
| Consequence (chat) | Wrong or harmful text | Embarrassment |
| Consequence (agents) | — | Unauthorized actions, data exfiltration, tool abuse |
| Control philosophy | Patch and verify | Constrain permissions, contain blast radius, monitor |
What CISOs Should Do Next
- Treat prompt injection as a managed risk, not a vulnerability awaiting a patch — build the program around containment rather than elimination.
- Scope every AI agent's permissions and tool access to the strict minimum, recognizing that those permissions define the blast radius of any successful injection.
- Require human confirmation for high-consequence actions, keeping full autonomy away from operations that are irreversible or materially damaging.
- Design AI systems so that processing untrusted external content cannot, on its own, trigger privileged actions or data access.
- Monitor agent behavior for anomalies, applying the same detection mindset used for privileged identities to the actions AI agents take.
- Govern the connection of agents to sensitive systems deliberately, asking before each integration what an injected instruction could cause the agent to do.
Board-Level Questions
- As we connect AI agents to our tools and data, do we understand that they can be manipulated through the content they process — and have we contained what that manipulation could achieve?
- What is the maximum damage a hijacked AI agent could do given its current permissions, and are those permissions scoped to the minimum?
- For consequential actions, do our AI systems require human confirmation, or can they act autonomously on instructions that may have been injected?
- Are we monitoring the actions our AI agents take the way we monitor other privileged access?
Final Executive Takeaway
Prompt injection is uncomfortable precisely because it does not fit the model security has spent decades refining. There is no patch to deploy, no version that closes it, no point at which the organization can declare it resolved. It is a property of how the technology works, and it will be present, in some form, for the foreseeable future. Waiting for a fix is not a strategy; it is an exposure that compounds as more agents are connected to more systems.
The path forward is the one security has always taken when prevention is imperfect: contain the consequence. Scope the permissions, keep humans in the loop where it matters, treat ingested content as hostile, and watch what the agents actually do. The goal is not an AI system that cannot be injected — that may not be achievable — but one where a successful injection cannot reach anything that matters.
The question is no longer "can our AI be tricked?" Assume it can. The question is "when it is, what is the worst thing it is allowed to do?" — and the answer should be: not much.
*To be continued...*

