Prompt Injection Explained
Prompt injection is an attack technique that manipulates artificial intelligence and large language model (LLM) applications by inserting crafted instructions into input that the model processes. The OWASP Top 10 for LLM Applications ranks prompt injection as the number-one risk (LLM01:2025) for the second consecutive edition, and MITRE ATLAS catalogs it as technique AML.T0051 under Initial Access. The vulnerability exists because LLMs process instructions and data in the same input channel with no reliable mechanism to separate the two.
Direct and Indirect Prompt Injection
Prompt injection attacks fall into two categories based on how the malicious instructions reach the model.
- Direct prompt injection. The attacker submits malicious input through the application's user-facing interface. A common pattern is the "ignore all previous instructions" payload, where the attacker tells the model to discard its system prompt and follow a new set of commands. Direct injection is sometimes called jailbreaking when the goal is to bypass the model's safety guardrails. Attackers use direct injection to generate harmful content, extract system prompt details, or manipulate the model's outputs for downstream processes.
- Indirect prompt injection. The attacker embeds hidden instructions in external content that the model retrieves and processes. The attacker never interacts with the model directly. Instead, the payload lives in a document, web page, email body, database record, or API response that the LLM ingests as part of its workflow. Because the model treats retrieved content with the same trust level as its own instructions, the injected payload can hijack behavior without the user or application operator noticing. NIST AI 100-2 E2025 identifies indirect prompt injection as a distinct attack category within its adversarial machine learning taxonomy.
Why Prompt Injection Matters for Email Security
AI-powered email triage and classification tools introduce a new attack surface because they process untrusted content by design. Every inbound email is external input that an LLM-based filter must read, interpret, and act on. An attacker who understands this pipeline can embed injection payloads in places the model will consume.
Practical attack scenarios include:
- Hidden instructions in email bodies. White-on-white text, zero-width characters, or HTML comments containing directives that tell the AI to classify the message as safe, suppress alerts, or forward content to an external address.
- Weaponized attachments and linked documents. A PDF, spreadsheet, or hosted document containing injection text that triggers when the AI reads the file for content analysis or summarization.
- Chained social engineering. An attacker pairs a traditional phishing lure with an injection payload, so the AI component misclassifies the message while the human recipient sees a convincing pretext.
These scenarios are not theoretical. Security researchers have demonstrated data exfiltration through prompt injection targeting AI-powered customer service agents, and the same principles apply to any AI system that processes untrusted email content.
Defending against prompt injection in email security requires treating every piece of external content as potentially adversarial input. Approaches include constraining model permissions so the AI cannot take high-impact actions without human approval, separating the instruction channel from the data channel where architecturally possible, and layering behavioral analysis with generative AI classification so no single model decision is final.
Related Terms
Email Attack of the Day is a daily series from
IRONSCALES spotlighting real phishing attacks caught by Adaptive AI and our community of 35,000+ security professionals. Each post breaks down a real attack. What it looked like, why it worked, and what to do about it.