The Tax PDF That Every Scanner Declared Clean (It Wasn't)

Written by Audian Paxson | Mar 29, 2026 11:00:00 AM

TL;DR A PDF titled 'Tax Document' landed in four mailboxes at a mid-market financial services firm during tax season. The file passed every static scanner, contained no JavaScript, no embedded URLs, no AcroForm fields, and was marked clean by antivirus. What it did contain: 12 occurrences of the PDF /AA (Additional Actions) token, a spec-level feature that can trigger automated behavior on document events like page open or close. Static tools cannot execute PDF behavior, so the /AA payload remained invisible to them. Themis flagged the email at 66% confidence despite the clean static verdict, citing sender anomalies and structural risk patterns. This is the gap between 'no malicious code found' and 'safe to open.'

Severity: High
Categories: Malware Delivery, Credential Harvesting
MITRE ATT&CK: T1566.001 (Phishing: Spearphishing Attachment), T1204.002 (User Execution: Malicious File), T1027 (Obfuscated Files or Information)

The file had no JavaScript. No forms. No embedded URLs. No links in the body. Every antivirus engine that touched it returned clean. The sandbox said clean. The gateway passed it without a flag.

It had 12 /AA tokens.

That number, buried in the PDF object tree, is what separated this from a routine tax-season attachment. And it's what most email security tools are completely blind to.

What a "Clean" PDF Verdict Actually Means

When a static scanner calls a PDF clean, it means: no known malicious signatures were found in the file's extracted objects. The tool parsed the structure, pulled out embedded content, and checked it against a database of known bad things.

What it cannot tell you is what the file will do when a PDF viewer opens it.

That distinction matters enormously. The PDF specification includes a feature called the Additional Actions dictionary, denoted in the file's object tree as /AA. It is part of the official ISO 32000 PDF standard, not a vulnerability or an exploit. It is a designed capability that allows PDF authors to attach automated actions to document lifecycle events: opening a page, closing a page, opening the document, closing the document, losing focus on a field.

Those actions can include URI requests, JavaScript execution (when a /JS object is referenced by the /AA entry), application launches, and form submissions. A PDF built to exploit /AA does not need to put the malicious object in an obvious location. The action only fires when the viewer processes the relevant event. Static parsers that extract objects without executing them will simply catalog the /AA entry and move on.
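To make the event model concrete, here is a minimal sketch of the trigger-event keys that ISO 32000-1 defines for /AA dictionaries on pages and on the document catalog. The dictionary fragment and the helper name `triggers` are assumptions written for illustration; they are not taken from the sample, and a regex like this is no substitute for a real PDF parser.

```python
import re

# Trigger-event keys defined by ISO 32000-1 for /AA dictionaries.
PAGE_EVENTS = {"O": "page opened", "C": "page closed"}
DOCUMENT_EVENTS = {
    "WC": "before document close",
    "WS": "before save",
    "DS": "after save",
    "WP": "before print",
    "DP": "after print",
}

def triggers(aa_fragment: str, events: dict) -> list:
    """Describe the trigger events named in an /AA dictionary fragment.

    Simplified: a key only counts when it is followed by an indirect
    reference ("14 0 R") or an inline dictionary ("<<").
    """
    keys = "|".join(re.escape(k) for k in events)
    pattern = rf"/({keys})\s*(?=\d+\s+\d+\s+R|<<)"
    return [events[k] for k in re.findall(pattern, aa_fragment)]

# Hypothetical page-level fragment, for illustration only.
fragment = "/AA << /O 14 0 R /C 15 0 R >>"
print(triggers(fragment, PAGE_EVENTS))  # ['page opened', 'page closed']
```

Each key points at an action object, and that object, not the /AA entry itself, determines whether the event resolves to a URI request, a script, a launch, or a form submission.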

The Setup

The email arrived on a Sunday afternoon during the last week of March, timed precisely to tax season in North America. The subject line: Tax Document. The sender: a Gmail address, dcthardes@gmail[.]com. The X-Mailer header identified it as sent from an iPhone.

The body contained two things: a Microsoft-injected external sender warning, and "Sent from my iPhone." Nothing else. No instructions, no urgency, no credential request. Just the subject line and a PDF called Document.pdf.

SPF passed. DKIM passed. DMARC passed. The message was legitimately delivered from Google infrastructure. That is not exculpatory; it is just accurate. A compromised Gmail account, or one an attacker registered, sends authenticated mail just as cleanly as a legitimate sender does.

The sender was flagged as a first-time contact. High risk score. No prior communication history with any of the four affected mailboxes.

This is the part of phishing that perimeter-based tools consistently miss: authentication is not identity. A message can be perfectly authenticated and still be an attack.
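A minimal sketch of that distinction, using only the Python standard library. The message headers, the `example.com` addresses, and the `known_senders` set are hypothetical; a real deployment would draw sender history from its own mail logs.

```python
import re
from email import message_from_string
from email.utils import parseaddr

def auth_verdicts(raw_message: str) -> dict:
    """Pull spf/dkim/dmarc verdicts out of the Authentication-Results header."""
    header = message_from_string(raw_message).get("Authentication-Results", "")
    return dict(re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header))

def is_trusted(raw_message: str, known_senders: set) -> bool:
    """Authentication is necessary but not sufficient: the sender must also be known."""
    verdicts = auth_verdicts(raw_message)
    authenticated = all(verdicts.get(k) == "pass" for k in ("spf", "dkim", "dmarc"))
    sender = parseaddr(message_from_string(raw_message).get("From", ""))[1].lower()
    return authenticated and sender in known_senders

# Hypothetical message: fully authenticated, but from a never-seen sender.
raw = (
    "Authentication-Results: mx.example.net; spf=pass smtp.mailfrom=sender@example.com; "
    "dkim=pass header.d=example.com; dmarc=pass header.from=example.com\n"
    "From: Sender <sender@example.com>\n"
    "Subject: Tax Document\n"
    "\n"
    "Sent from my iPhone\n"
)
print(auth_verdicts(raw))                     # all three verdicts are 'pass'
print(is_trusted(raw, known_senders=set()))   # False
```

The three passes answer "did this mail really come from that domain?" and nothing more. The history lookup is what answers "do we know this person?"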

Inside the PDF

Document.pdf. 63,117 bytes. PDF 1.3. Metadata title: "CCM Tax." Author: "Registered to: SYMCOR." Created with OpenText Exstream 16.2.0, a document composition platform common in financial services for generating statements and tax documents. Creation timestamp: 2026-03-29T22:57:34Z, roughly an hour before it landed in the inbox.

Static forensic extraction found:

No /JS or /JavaScript objects. Zero.
No AcroForm fields. No credential collection forms of any kind.
No /URI or HTTP links in extracted text.
No embedded files. No executables hidden inside the PDF container.
No /OpenAction entry. The standard hook for on-open execution was absent.

What the extraction did find: 12 occurrences of /AA.

For context: most legitimate PDFs produced by enterprise document systems contain zero /AA entries. A PDF with a single /AA entry tied to a field focus event is unremarkable. A PDF with 12 is not. That count implies layered actions distributed across multiple pages or multiple event types, a pattern consistent with a deliberate attempt to trigger behavior at specific points in the viewing session rather than all at once on open.

The static scan could not determine what those 12 /AA entries point to without dynamic execution. That is the entire problem.
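For readers who want to reproduce the count, here is a rough sketch of token counting over raw PDF bytes. It is an assumption about how such a count could be taken, not a description of the tooling used in this investigation. It undoes the `#xx` hex escapes that PDF names may carry and refuses to count longer names that merely start with "AA". It will miss anything stored inside compressed streams, and decoding escapes across the whole file can create false matches in binary data.

```python
import re

DEFAULT_NAMES = (b"AA", b"OpenAction", b"JS", b"JavaScript", b"URI", b"AcroForm")

def decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx hex escapes in PDF names, so /#41A is read as /AA."""
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)

def count_tokens(data: bytes, names=DEFAULT_NAMES) -> dict:
    """Count occurrences of selected PDF name tokens in raw file bytes."""
    decoded = decode_name_escapes(data)
    counts = {}
    for name in names:
        # A name ends at whitespace or a delimiter; the lookahead rejects
        # longer names that only share the prefix (e.g. /AAPL:Keywords).
        pattern = rb"/" + re.escape(name) + rb"(?![^\s\x00()<>\[\]{}/%])"
        counts["/" + name.decode()] = len(re.findall(pattern, decoded))
    return counts

# Hand-written sample bytes, for illustration only.
sample = (
    b"1 0 obj << /Type /Page /AA << /O 2 0 R >> >> endobj "
    b"3 0 obj << /#41A << /C 4 0 R >> /AAPL:Keywords [] >> endobj"
)
print(count_tokens(sample)["/AA"])  # 2
```

Against a file on disk, the same function could be called as `count_tokens(open("Document.pdf", "rb").read())`.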


Why Static Analysis Is the Wrong Tool for This Problem

The gap here is not a bug in any specific product. It is a structural limitation of the static analysis approach.

Static analysis extracts file contents and compares them against known patterns. It is fast and reliable for known threats. It cannot evaluate behavior that only manifests at runtime.

Verizon's 2019 Data Breach Investigations Report found that 94% of malware was delivered via email, and attachment-based lures remain a core delivery mechanism. CISA and the IRS issue annual tax-season phishing advisories specifically because attackers exploit the credibility of financial documents during Q1 filing periods (IRS Security Summit phishing alerts). MITRE ATT&CK catalogs this under T1566.001 (Spearphishing Attachment) and T1027 (Obfuscated Files or Information).

The PDF /AA approach sidesteps static detection entirely. No signatures to match. Just a specification feature used in a way its designers did not intend.

How Themis Caught What the Scanners Missed

Themis flagged this email at 66% confidence, labeling it as a credential theft attempt. No static payload was needed to reach that conclusion. The detection was built from contextual and behavioral signals:

First-time external sender, never seen by any of the four mailboxes
Free consumer email address (Gmail) sending a corporate-facing tax document
Tax-season timing with a subject line designed to prompt action
Sparse email body with no legitimate business context
Structural anomaly in the PDF (12 /AA tokens, elevated caution flag from attachment analysis)

Across the IRONSCALES platform, patterns like this one (a minimal-body lure with an anomalous attachment from a first-time, high-risk sender) are among the highest-signal combinations for attachment-based phishing. The absence of an obvious payload is itself a signal. Legitimate senders sending legitimate tax documents include context. They do not send a blank email with a two-word subject and "Sent from my iPhone."

What to Watch For

This case is useful as a template for what PDF-based evasion looks like when it is working correctly from the attacker's perspective.

Indicators of Compromise (IOCs)

IOC	Type	Notes
`dcthardes@gmail[.]com`	Sender email	First-time contact, high risk score
`cff49585bda4434a6aa11282686a1498`	MD5, `Document.pdf`	63,117 bytes, PDF 1.3
`2607:f8b0:4864:20::433`	Sending IP (IPv6)	Google Gmail infrastructure
`mail-pf1-x433[.]google[.]com`	Sending relay	Gmail outbound SMTP server
"CCM Tax"	PDF metadata title	OpenText Exstream 16.2.0 origin

Detection signals worth building into your posture:

The /AA count is a meaningful heuristic. Normal enterprise-generated PDFs (tax documents, invoices, statements) produced by OpenText Exstream, DocuWare, or similar platforms do not generate 12 /AA entries. If your email security or endpoint tooling can surface PDF object metadata, a threshold alert on /AA count greater than 3 or 4 is defensible.
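A sketch of such a threshold rule follows. The shape of the `token_counts` dictionary, the function name, and the default of 3 (the lower end of the range suggested above) are all assumptions to tune against your own document baseline.

```python
from typing import Optional

AA_THRESHOLD = 3  # assumption: lower end of the "3 or 4" range; tune locally

def aa_alert(token_counts: dict, threshold: int = AA_THRESHOLD) -> Optional[str]:
    """Return an alert string when the /AA count exceeds the threshold, else None."""
    aa = token_counts.get("/AA", 0)
    if aa <= threshold:
        return None
    # Note whether the usual overt markers are present alongside the /AA entries.
    overt = [k for k in ("/JS", "/JavaScript", "/OpenAction", "/URI") if token_counts.get(k, 0)]
    detail = "also present: " + ", ".join(overt) if overt else "no overt payload markers"
    return f"/AA count {aa} exceeds threshold {threshold}; {detail}"

# Counts matching the static findings reported for Document.pdf.
print(aa_alert({"/AA": 12, "/JS": 0, "/JavaScript": 0, "/OpenAction": 0, "/URI": 0}))
# /AA count 12 exceeds threshold 3; no overt payload markers
```

A high /AA count with no overt markers is the combination worth routing to dynamic analysis, since it is exactly the case a signature match will not catch.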

Sender context matters more than sender authentication. SPF/DKIM/DMARC passing is table stakes. What matters more is whether this sender has ever communicated with this mailbox, whether the domain is commercial or free, and whether the sending device (iPhone Mail, in this case) is consistent with a legitimate business document workflow.

Behavioral sandboxing is not optional for this class of threat. A sandbox that does not actually render the PDF in a viewer capable of processing document events will not catch /AA payloads. Static extraction is not sufficient. The IRONSCALES threat intelligence blog has covered similar sandbox-evasion patterns across multiple PDF and archive-based lures.

Tax-season timing, mobile sending client, minimal body, first-time sender: these signals should trip wires before anyone asks what is in the attachment. By the time you are analyzing PDF structure, you are already behind.
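As a rough illustration of tripping those wires before the attachment is opened, the sketch below counts envelope-level signals. It is not how Themis scores messages. The field names, keyword lists, word-count cutoff, and escalation minimum are hypothetical, and the sample address is a placeholder.

```python
from dataclasses import dataclass

FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}  # illustrative, not exhaustive
SEASONAL_TERMS = ("tax", "w-2", "1099")  # hypothetical keyword list

@dataclass
class Envelope:
    sender: str
    subject: str
    body: str          # body text with gateway-injected banners already removed
    x_mailer: str
    seen_before: bool
    has_attachment: bool

def tripped_signals(msg: Envelope) -> list:
    """List the contextual signals a message trips, without touching the attachment."""
    signals = []
    if not msg.seen_before:
        signals.append("first-time sender")
    if msg.sender.rsplit("@", 1)[-1].lower() in FREE_MAIL_DOMAINS:
        signals.append("free consumer mail domain")
    if "iphone" in msg.x_mailer.lower():
        signals.append("mobile sending client")
    if len(msg.body.split()) < 10:
        signals.append("minimal body")
    if any(term in msg.subject.lower() for term in SEASONAL_TERMS):
        signals.append("tax-season subject")
    return signals

def should_escalate(msg: Envelope, minimum: int = 3) -> bool:
    """Escalate attachment-bearing mail that trips at least `minimum` signals."""
    return msg.has_attachment and len(tripped_signals(msg)) >= minimum

# Placeholder values modeled on the case described above.
msg = Envelope("someone@gmail.com", "Tax Document", "Sent from my iPhone",
               "iPhone Mail", seen_before=False, has_attachment=True)
print(tripped_signals(msg))
print(should_escalate(msg))  # True
```

None of these signals is damning alone. The point is that the combination can be evaluated from headers and body text, before any PDF parsing starts.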

This is what behavioral and heuristic detection is for. Not to replace static analysis, but to cover the cases where "no malicious code found" is a dangerously incomplete answer.

Email Attack of the Day is a daily series from IRONSCALES spotlighting real phishing attacks caught by Adaptive AI and our community of 35,000+ security professionals. Each post breaks down a real attack: what it looked like, why it worked, and what to do about it.
