Most phishing PDFs try to hide their malicious content. This one hid its identity, and the metadata gave it away before anyone opened the file.
A recipient at a regional services organization received an email from briliansupriyadiwill[@]gmail[.]com with the subject "Package Scanned A507JYE85A8N3_SFFC4~Z7EACLM4FBM1.2CRG." The body contained only that same opaque alphanumeric string: no carrier name, no tracking URL, no CTA, no context. One PDF attachment accompanied it. Gateway verdict on the PDF: clean. No JavaScript, no AcroForm fields, no embedded files, no extractable URLs.
The story was in the metadata the scanner didn't surface.
What the PDF Creator Field Reveals That Body Scanning Misses
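PDF documents store a document information dictionary, referenced from the file trailer, that records, among other fields, the Creator (the application that generated the original content), the Producer (the component that wrote the PDF), and the Title (the document's internal name). Static analysis of the attachment, set against the email's subject line, showed: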
- Creator: wkhtmltopdf 0.12.6
- Title: Geek Squad Subscription Renewal
- Subject line: Package Scanned (shipping lure)
Those three data points form the core of this attack's forensic tell. The subject frames the message as a shipping notification. The PDF title reveals the actual lure: a Geek Squad subscription renewal claim. And the Creator field identifies exactly how the document was produced.
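The metadata lookup described above can be approximated with the standard library alone. The following is a minimal sketch, not a full PDF parser: it assumes the information dictionary is unencrypted, sits outside any compressed object stream, and uses literal (parenthesized) strings. The function name `pdf_info_fields` and the sample bytes are illustrative, not taken from the incident file.

```python
import re

# "/Key (literal string)" pairs. Handles backslash escapes, but not hex
# strings, nested unescaped parentheses, or compressed/encrypted objects.
_FIELD = re.compile(rb"/(Creator|Producer|Title)\s*\(((?:\\.|[^\\()])*)\)", re.S)
_ESCAPE = re.compile(rb"\\([0-7]{1,3}|.)", re.S)
_NAMED = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}


def _unescape(raw: bytes) -> bytes:
    """Resolve PDF literal-string escapes such as \\376 or \\(."""
    def repl(match):
        tok = match.group(1)
        if tok[:1] in b"01234567":           # octal escape, e.g. \376
            return bytes([int(tok, 8) & 0xFF])
        return _NAMED.get(tok, tok)          # \n, \t, ... or an escaped literal
    return _ESCAPE.sub(repl, raw)


def pdf_info_fields(data: bytes) -> dict[str, str]:
    """Return the first Creator, Producer, and Title values found in raw PDF bytes."""
    fields: dict[str, str] = {}
    for key, raw in _FIELD.findall(data):
        value = _unescape(raw)
        if value.startswith(b"\xfe\xff"):    # UTF-16BE text string with BOM
            text = value[2:].decode("utf-16-be", errors="replace")
        else:                                # approximate PDFDocEncoding as Latin-1
            text = value.decode("latin-1")
        fields.setdefault(key.decode("ascii"), text)
    return fields


# Illustrative fragment only; real files carry far more structure.
SAMPLE = rb"<< /Title (\376\377\000G\000e\000e\000k) /Creator (wkhtmltopdf 0.12.6) /Producer (Qt 4.8.7) >>"
print(pdf_info_fields(SAMPLE))
```

For production triage, a maintained PDF library is the safer choice. The point of the sketch is only that these fields are cheap to read and can be surfaced next to the scan verdict.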
wkhtmltopdf is an open-source command-line HTML-to-PDF renderer with many legitimate uses, and it also turns up regularly in callback phishing kits. Its appeal to phishing operators is straightforward: convert a finished HTML lure page (complete with branding, urgency language, and a phone number) into a static PDF. The rendered file carries no JavaScript and no forms, and because a callback lure needs only a phone number, it has no reason to carry links either; this sample had none. Every gateway check that looks for exploitable mechanisms in PDF attachments returns clean. The social-engineering payload (the callback number and the scripted urgency) survives in the visual layer of the rendered document, invisible to parsers that don't perform optical character recognition or visual analysis.
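To make the "clean verdict" concrete, here is a rough sketch of the kind of keyword triage a scanner might run on raw PDF bytes. The marker list and the function name `active_content_hits` are assumptions for illustration. The approach deliberately ignores name obfuscation (`#xx` escapes) and compressed object streams, so it is a simplification rather than a description of any particular gateway.

```python
# PDF name tokens commonly associated with active or outbound content.
ACTIVE_MARKERS = (
    b"/JavaScript", b"/JS", b"/AcroForm", b"/EmbeddedFile",
    b"/OpenAction", b"/Launch", b"/URI",
)


def active_content_hits(data: bytes) -> dict[str, int]:
    """Count occurrences of each active-content marker in raw PDF bytes."""
    hits: dict[str, int] = {}
    for marker in ACTIVE_MARKERS:
        count = data.count(marker)
        if count:
            hits[marker.decode("ascii")] = count
    return hits
```

A callback-lure PDF like the one described here would be expected to return an empty dictionary, which is exactly why the verdict says nothing about the phone number printed on the page.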
MITRE ATT&CK T1566.001 covers spearphishing via attachment. T1204.002 (user execution: malicious file) applies when the victim opens the PDF and reads the callback instructions. T1656 (impersonation) covers the Geek Squad identity claim embedded in the document.
The Shipping Subject as Misdirection
The subject line and body token serve a specific purpose: they get the email past content filters without triggering brand-impersonation detection for Geek Squad. A Geek Squad-themed subject would match known phishing patterns in many gateway configurations. A shipping token does not.
The mismatch is not accidental. Phishing kit operators running volume campaigns frequently reuse template components, a shipping email wrapper here and a Geek Squad renewal PDF there, without synchronizing them. The result is an internal contradiction that tells the analyst what the gateway score cannot: these pieces come from different templates assembled for evasion, not for coherence.
The social engineering mechanism depends on urgency and authority. Geek Squad renewal phishing typically claims an auto-renewing subscription charge and instructs the recipient to call immediately to cancel. The callback phone number connects to an attacker posing as support, who then guides the victim toward credential disclosure, remote-access tool installation, or a fraudulent refund transaction.
Gmail API Dispatch and the Authentication Laundering Problem
The Received header showed the message dispatched via gmailapi.google.com with HTTPREST, not a standard SMTP client. This indicates the attacker sent through the Gmail API, an automated-send pathway that fully authenticates through Google's infrastructure.
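A check for this dispatch path can be sketched with Python's standard `email` package. The function name `sent_via_gmail_api` is hypothetical, and the match strings come only from the header values quoted above. Received headers added before the message reaches a trusted hop can be forged, so treat the result as one signal among several.

```python
from email import message_from_string


def sent_via_gmail_api(raw_message: str) -> bool:
    """True if any Received hop names gmailapi.google.com with HTTPREST."""
    msg = message_from_string(raw_message)
    for hop in msg.get_all("Received", []):
        flat = " ".join(hop.split()).lower()   # unfold and normalize whitespace
        if "by gmailapi.google.com" in flat and "with httprest" in flat:
            return True
    return False
```

On its own, Gmail API dispatch is not malicious; plenty of legitimate automation uses it. It becomes interesting when paired with a first-time sender and an opaque message body.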
SPF passed (Google's outbound IP designated as permitted for gmail.com). DKIM passed (signed by gmail.com). DMARC passed (policy p=NONE, result pass). ARC sealed at i=2 by google.com. Every authentication check returned a positive result because every authentication check was evaluating Google's infrastructure, not the attacker's identity.
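The sketch below pulls the per-method results out of an Authentication-Results header so they can be displayed beside behavioral signals instead of standing in for them. The helper names `auth_results` and `all_checks_pass` are illustrative. The regex is a simplification of the header grammar and assumes the header was added by the receiving organization's own mail system.

```python
import re
from email import message_from_string

# method=result pairs such as "spf=pass" or "dmarc=pass"
_RESULT = re.compile(r"\b(spf|dkim|dmarc|arc)=(\w+)", re.I)


def auth_results(raw_message: str) -> dict[str, str]:
    """Map each authentication method to its first reported result."""
    msg = message_from_string(raw_message)
    results: dict[str, str] = {}
    for header in msg.get_all("Authentication-Results", []):
        for method, result in _RESULT.findall(header):
            results.setdefault(method.lower(), result.lower())
    return results


def all_checks_pass(results: dict[str, str]) -> bool:
    """True when at least one result exists and every one is 'pass'."""
    return bool(results) and all(value == "pass" for value in results.values())
```

For a message like this one, `all_checks_pass` would return True, and that is the problem: the result describes gmail.com, not the person operating the account.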
First-time sender status and the implausible local-part of the sending address (briliansupriyadiwill[@]gmail[.]com reads like a machine-generated account name rather than a person's) are behavioral signals that authentication headers cannot encode. These are exactly the signals that impersonation detection must evaluate when the cryptographic layer provides no evidence of spoofing.
What Detection Requires When the File Is Technically Clean
The case illustrates a gap that gateway-layer PDF scanning cannot close. Static analysis correctly returned a clean verdict: no executable content, no network callouts, no exploit primitives. That verdict is accurate and useless simultaneously.
Closing the gap requires two capabilities working together. The first is metadata extraction that surfaces the Creator, Producer, and Title fields alongside the attachment verdict, in the primary analyst view rather than a supplemental log. wkhtmltopdf in the Creator field, combined with a document Title that names a brand the sender has no relationship to, is a high-confidence indicator of kit assembly.
The second is cross-field coherence checking: does the email's subject and body match the attachment's declared content? A shipping-themed email carrying a "Geek Squad Subscription Renewal" PDF is incoherent. Humans notice incoherence. Automated scanners that evaluate each field in isolation do not.
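A coherence check of this kind can be sketched in a few lines. The brand watchlist, the normalization rule, and the function name `brands_only_in_attachment` are all assumptions made for illustration. A real system would draw on a maintained brand list and fuzzier matching.

```python
import re

# Hypothetical watchlist; entries must be lowercase with single spaces.
BRANDS = ("geek squad", "best buy", "paypal", "norton", "mcafee")


def _norm(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def brands_only_in_attachment(subject: str, body: str, pdf_title: str,
                              brands: tuple[str, ...] = BRANDS) -> list[str]:
    """Brands named in the PDF Title but absent from the subject and body."""
    envelope = f" {_norm(subject)} {_norm(body)} "
    title = f" {_norm(pdf_title)} "
    return [b for b in brands if f" {b} " in title and f" {b} " not in envelope]


token = "Package Scanned A507JYE85A8N3_SFFC4~Z7EACLM4FBM1.2CRG"
print(brands_only_in_attachment(token, token, "Geek Squad Subscription Renewal"))
```

A non-empty result does not prove malice, but it is the kind of contradiction that deserves a place in the analyst's primary view.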
IRONSCALES detected the behavioral anomaly (first-time external Gmail sender, opaque body, PDF attachment with mismatched internal metadata) and flagged the incident for quarantine. The Themis AI engine identified the credential-theft pattern at the campaign level, matching the assembly characteristics to known social engineering kits despite the absence of any technically malicious PDF component.
Indicators of Compromise
| Type | Indicator | Context |
|---|---|---|
| Sender address | briliansupriyadiwill[@]gmail[.]com | Attacker-controlled Gmail account; first-time sender; dispatched via Gmail API HTTPREST |
| Attachment filename | EKW5ZT9FMPMTSI6Z1QB[.]pdf | Randomized filename; MD5 f19651e085bee0ed7f9bc79833e26d76; SHA-256 503de9c70cd31a4c40a4a3879dfa8557a025c9945e0e440c280e936c16b29e00 |
| PDF Creator metadata | wkhtmltopdf 0.12.6 | Open-source HTML-to-PDF renderer common in callback phishing kits; indicates a kit-assembled document when paired with the mismatched Title |
| PDF Title metadata | Geek Squad Subscription Renewal | Mismatches shipping-themed email subject; reveals actual lure brand |
| PDF Producer metadata | Qt 4.8.7 | Qt rendering engine paired with wkhtmltopdf; consistent kit signature |
| Dispatch method | Gmail API HTTPREST | Full SPF/DKIM/DMARC/ARC pass; authentication reflects Google infrastructure, not sender identity |