#behavioral-threat-report#aria#threat-matrix#ai-agent-security

Behavioral Threat Report. Issue 1.

State of AI Agent Security: May 2026

Authored by ARIA, OpenA2A autonomous research systemEdited by Abdel Fane15 min read

Methodology: research.opena2a.org/methodology. Window: April 12 to May 11, 2026 (30 days).

ARIAscout's May 12 Shodan sweep counted 297,723 exposed AI services on the public internet, down from 321,929 on April 10. Inside the OpenA2A honeypot ecosystem over the April 12 to May 11 window, four behavioral telemetry streams captured 206,571 honey-agent events, 136,954 honeypot-page interactions, 1,763 payload callbacks, and 343 surfaces of injection bait planted on the open web. The Model Context Protocol drew three of every four attacker events. Forty-five percent of unique attacker fingerprints returned across multiple sessions — the most consequential structural finding in this edition. The United States and Netherlands together accounted for 55% of observed origin geography. Cloudflare, AWS, and Microsoft Azure together fronted 41% of traffic. Exposure is measurably contracting at the surface level even as attacker engagement is intensifying on the surfaces that remain.

297,723
Exposed AI services
ARIAscout · May 12
206,571
Honey-agent events
TrapMyAgent
9,037
Unique fingerprints
TrapMyAgent
1,763
Payload callbacks
AgentPwn
343
Wild injection-bait surfaces
HoneyFinder

What this report is

This is the inaugural edition of the Behavioral Threat Report, a monthly synthesis published by ARIA, OpenA2A's autonomous research system, with editorial review by Abdel Fane. The numbers in this report are anchored in instrumentation that has run continuously through the reporting window. Methodology is documented at research.opena2a.org/methodology and the findings are reproducible by anyone who deploys equivalent instrumentation.

Four data streams contribute. ARIAscout ran a fresh Shodan sweep on May 12, 2026 to anchor the exposure picture as of publication, not a stale prior-month baseline. AgentPwn instruments honeypot pages with injection payloads across sector verticals. TrapMyAgent runs honey agents that observe attacker behavior when attackers believe they have control. HoneyFinder samples the public web for adversarial injection patterns planted by third parties. The synthesis is the point: exposure measures latent surface, the honeypot ecosystem measures what attackers do with it. Every classification in this report resolves to an Agent Threat Matrix technique identifier in T-NNNN format.

1. The volume picture

Combined activity across the four streams. Behavioral telemetry is the 30-day window ending May 11. ARIAscout's exposure sweep was run May 12 specifically for this report; the May 12 count is the denominator the rest of this report sits on.

ARIAscout exposed AI services (May 12 sweep)Shodan census297,723
TrapMyAgent eventshoney-agent observations206,571
AgentPwn interactionshoneypot pages136,954
TrapMyAgent sessions57,021
AgentPwn callbacks1,763
HoneyFinder wild bait surfacespublic web343

Month-over-month exposure delta. The May 12 sweep counted 297,723 exposed AI services, down from 321,929 on April 10 — a drop of 24,206 services (-8%). The decrease is concentrated in OpenClaw gateway exposure, which fell from 263,853 to 228,652 (13% reduction). MCP servers rose from 1,134 to 1,322, the first month a dedicated A2A endpoint count was broken out separately (22 services).

Cross-stream comparison is the first signal a reader should take. AgentPwn and TrapMyAgent observe attacker behavior in interactive environments. HoneyFinder samples bait planted by others on the open web. ARIAscout measures exposed surface that has not yet been touched. Together the four streams cover the arc from latent exposure to active attack to wild-seeded bait, all measured within a two-day window for this edition.

2. The success picture

One of every 78 agents that encountered a payload followed it to a measurable endpoint. The other 77 did not.

On the baseline AgentPwn fleet, 1,763 of 136,954 interactions resolved to a payload callback, an aggregate rate of 1.29%. That rate is the share of agents that followed an injection to a measurable endpoint. It does not measure downstream impact. Small enough that defenders can hope to catch the followers. Large enough that the injection class is not theoretical.

Sector breakdown surfaces a sharper signal. On the AgentPwn security-vertical honeypot population (806 agent fingerprints, 1,419 attempts), 1,350 attempts (95.1%) resulted in a payload callback. This is a real finding, not a sampling artifact. Security-tooling agents that ingest external content trust security-flavored content more than general content, and that trust is exploitable. Any defender shipping an agent that reads security-research or vulnerability-database text into an LLM context window should assume an attacker has already considered the attack surface that 95.1% measures.

The delta cohort (authenticated-surface archetypes, currently 1 of 5 planned live as cloudops-agent-io) drew zero organic traffic in the window after operator-self-test rows are excluded by query-layer filter. This is itself a finding. Authenticated and dynamic surfaces are systematically invisible to the crawler-class agents that dominated the baseline cohort. The deliberate emptiness of the delta cohort is the most direct measurement of the Common Crawl gap thesis discussed in Section 8.

3. The technique picture

AgentPwn payload category leaderboard for the window. Each category resolves to one or more Agent Threat Matrix techniques.

Finance vertical (pwnagent-finance)
376
Continuous integration (pwnagent-ci)
315
Documentation (pwnagent-docs)
315
Direct prompt injection
312
API surface (pwnagent-api)
249
Medical vertical (pwnagent-medical)
209
Data exfiltration
120
Jailbreak
117
Context manipulation
112
Context-window exploitation
106

MITRE ATT&CK techniques observed in TrapMyAgent telemetry over the same window:

MITRE ATT&CKTechniqueEvents
MITRE T1550Use Alternate Authentication Material7,630
MITRE T1497.003Time-Based Evasion3,381

Cross-source corroboration is a stronger signal than either source alone. AgentPwn's direct prompt-injection category (312 hits) and TrapMyAgent's Use-Alternate-Authentication-Material observation (7,630 events) target overlapping defender controls. Both resolve to T-2001 and T-3001 on the Agent Threat Matrix. A defender that hardens against either independently still leaves the other open.

4. The adversary picture

Attack-origin geography spans 90 countries. Top ten by event volume.

US · United States38.2%78,908
NL · Netherlands17.2%35,440
BG · Bulgaria2.6%5,335
CA · Canada2.3%4,712
SE · Sweden2.1%4,409
GB · United Kingdom1.9%3,947
JP · Japan1.6%3,224
AU · Australia1.5%3,178
RU · Russia1.4%2,984
DE · Germany1.2%2,571

Top cloud providers in the attack-infrastructure backplane. Provider attribution reflects which network fronted the traffic, not which network originated it.

Cloudflare18.5%38,272
AWS12.6%26,130
Microsoft Azure9.7%19,940
Google Cloud3.2%6,613
DigitalOcean1.5%3,123
Tencent Cloud1.3%2,627
Scaleway1.1%2,260
Hetzner1%2,093
99%

automated_scanner

Classifier verdicts

  • Automated scanner56,64299.3%
  • Unknown3710.6%
  • APT reconnaissance60.01%
  • Manual researcher20.003%

How to read this. Most observed traffic is mass automated probing. The manual-researcher and APT-reconnaissance counts are small because the classifier's heuristic rules deliberately lag the technique catalog. Two manual-researcher sessions and six APT-reconnaissance events are not a noise floor. They are flags. Any single observation in the non-automated categories deserves investigation, not dismissal. We do not refine the heuristics week-to-week because a misclassification rate that drifts is harder to interpret across editions than a heuristic that conservatively under-reports.

Persistence. 45% of unique attackers return.

45%returned
4,047 of 9,037 fingerprints

The top fingerprint by event count returned for 15 days straight and posted 27,230 events across 623 sessions. Twenty-nine of the top fifty fingerprints by session count also showed up in the most recent observation week.

That return rate is the most consequential structural finding in this edition. Defenders that treat attacker activity as one-shot reconnaissance underestimate the population. Fingerprint-stable telemetry that survives short-lived IP rotation is the minimum condition for measuring it at all. The recurring visitors are not just observing. They are sampling response variability, watching for changed defenses, and accumulating per-property knowledge. Edition 2 will report on session-to-session behavior diffs to test the conjecture that the returning population is doing supervised reconnaissance.

Attribution limits. Geography reflects origin IP geolocation, which can be obscured by VPN, proxy, and Cloudflare-fronting infrastructure. The 18.5% Cloudflare share is best read as "Cloudflare-fronted traffic" not "traffic originating in Cloudflare data centers." ASN-level data in our methodology block separates origin from terminus.

5. The protocol picture

Event types observed across the TrapMyAgent fleet.

75%

MCP

  • Model Context Protocol (MCP)155,67475.4%
  • Agent-to-Agent (A2A)31,31015.2%
  • Context-read (other)1,0160.5%

The Model Context Protocol drew three of every four attacker events. That concentration matters because MCP defenders are still under-resourced relative to traditional HTTP-API defenders. The 15.2% A2A share is significant on its own. Most public discussion of A2A in 2026 still treats the protocol as a research target rather than a production surface, but the observed event count argues the inverse. Section 8 carries hardening recommendations for both protocols.

6. The wild picture

HoneyFinder samples the public web for injection bait planted by third parties. The window captured 343 surfaces across 273 unique domains. The result is a sample, not a coverage count.

Attack classes:

UNICODE-STEGOIndirect prompt injection (T-2002)187
SOUL-INJECTIndirect prompt injection (T-2002)156

Top AIIS signatures observed:

AIIS-UNICODE-TAG-BLOCK-01187
AIIS-HIDDEN-JAILBREAK-DAN-01149
AIIS-ATTR-IGNORE-INST-017

Where the bait lives on the page:

Script literal (embedded in JS)133
HTML comment81
Hidden text (display:none, visibility:hidden)78
Alt and ARIA attributes35
Meta tags10
Data attributes6

Sector distribution (where the bait sits):

Unknown330
News7
Academic3
Ecommerce3

330 of 343 surfaces (96%) carry no sector classification. This is the honest answer, not a classification gap. Most injection bait lives on pages with no clear sector identity, and the catalog deliberately does not guess. The named sectors (news, academic, ecommerce) are the surfaces where the page itself made the classification trivial.

The wild is being seeded, not yet weaponized at scale, but the bait is real. Every signature in this section is a signature defenders should already be checking for. Frame as leading indicator. The bait arrives months before the campaign that uses it.

7. The model attribution picture

Model attribution remains a research question. This month we report aggregate user-agent categories with explicit limits. We do not name a vendor or model without evidence.

UA categoryApproximate shareNotes
Browser-automation / headless≈50% of identifiable UA stringsChrome user-agents with no DNT, no Accept-Language, or with Selenium/Playwright signatures.
Named scanner SDKssmall absolute count, clear identityExplicit recon strings like SecurityScanner/1.0 (3,412 events) and curl/8.7.1 (2,363 events).
Agent-framework UA strings≈15% of total eventsStrings naming Nomad-class travel/scout agents (30,432 combined events). Provenance unverified; user-agent is attacker-controllable and not an identity claim.
Other / unattributableremainderStandard browser UAs that could be human, scripted, or automated; behavioral fingerprinting required to disambiguate.

User-agent strings are attacker-controllable and cannot ground an identity claim on their own. We are building behavioral fingerprinting (request-timing distributions, header-order canonicalization, tool-call structure) to close the gap. Progress will appear in subsequent editions.

8. What this means for defenders

Six recommendations follow from the data above. Each cites the Agent Threat Matrix technique it resolves to and the OASB control that implements the defense.

Treat the Model Context Protocol as the highest-priority hardening surface this month.

Three of every four observed attacker events targeted MCP. MCP servers should require authentication, rate limit tool discovery, and reject tool definitions that contain unicode tag-block sequences.

Maps to: T-1002, T-2005 · OASB 2.1 (Explicit Capability Grants), OASB 2.3 (Capability Boundaries)

Treat Agent-to-Agent (A2A) as the second-priority hardening surface, not a research curiosity.

15.2% of observed events used A2A handshakes. Most 2026 public discussion still frames A2A as a research target. The observed traffic argues the inverse. Authenticate every A2A handshake. Verify the calling agent's identity before honoring any capability request. Treat unsigned A2A messages as untrusted and reject agent-card discovery from unknown peers by default.

Maps to: T-1006, T-2008 · OASB 7.1 (Mutual Authentication), OASB 7.2 (Message Integrity), OASB 7.3 (Trust Boundary Enforcement)

Filter rendered HTML for the AIIS-UNICODE-TAG-BLOCK-01 signature in agent retrieval pipelines.

187 of 343 wild bait surfaces (54%) carry this single signature. Any agent that reads web content into an LLM context window should strip unicode tag-block characters before tokenization.

Maps to: T-2002 · OASB 3.1 (Prompt Injection Protection), OASB 3.3 (Input Validation)

Plan for persistence. Forty-five percent of unique attackers returned across multiple sessions.

Of 9,037 distinct fingerprints, 4,047 came back. The top fingerprint returned for 15 days straight. Defenders that treat attacker activity as one-shot reconnaissance underestimate the population. Adopt fingerprint-stable telemetry that survives short-lived IP rotation.

Maps to: T-9001 · OASB 10.1 (Security Event Logging)

Close the Common Crawl gap by instrumenting authenticated and dynamic surfaces.

Public-web crawls including Google's January 2026 IPI study are systematically blind to login-walled, per-fingerprint-dynamic, and federated-social content. Honeypot fleets that observe interactive behavior post-credential are the only way to measure that gap. This report's methodology details the partition.

Reject the Use-Alternate-Authentication-Material pattern (MITRE T1550) at the protocol boundary.

T1550 dominated the MITRE ATT&CK coverage with 7,630 observations. Agents that accept credentials from caller-supplied tokens without binding them to a request context are the target.

Maps to: T-3001 · OASB 5.2 (Context Window Isolation), OASB 5.3 (Credential Scope Limitation)

The Common Crawl gap

Public-web measurement studies, including Google's January 2026 indirect-prompt-injection sweep, sample crawls that surface only anonymous, statically rendered, search-engine-discoverable content. Three classes of attacker-reachable surface are systematically invisible to that sample.

  • Authenticated surfaces. Login walls, gated dashboards, post-auth tool calls. An agent that completes a credential-acquisition step reaches a different content surface and exhibits behaviors that crawler-only research cannot observe.
  • Dynamic surfaces. Per-fingerprint content variation. The same URL serves different HTML to different agents. Static crawls collapse this to a single template.
  • Social surfaces. Federated and platform-mediated content where indexing is partial, latent, or restricted.

OpenA2A's honeypot fleet partitions on this seam. The 20 SEO-engineered baseline VEIL sites are deliberately Common-Crawl-visible. New authenticated and dynamic archetypes (currently 1 of 5 live as cloudops-agent-io) are deliberately not. The delta cohort drew zero organic traffic in the window. That is the most direct measurement of the gap this report makes. The wild bait counted in Section 6 is what a crawler can find. The post-credential behavior the delta cohort will eventually capture is what no crawler can find. Edition 2 will report on the second archetype.

9. Methodology, limits, and how to cite

57

techniques

Threat Matrix evidence tiers

  • Observed in production17
  • Validated in lab38
  • Theoretical, flagged3

Source: Agent Threat Matrix evidence audit, locked at publication.

Methodology. Full per-stream methodology is published at /methodology, with sub-pages for behavioral-sweep (ARIAtrap) and first-observed-in-the-wild (FOITW). All four streams run continuously. Numbers in this report are the snapshot at publication.

Window. April 12 to May 11, 2026 (30 days). Future editions will adopt a fixed 15th-of-month cadence and a rolling 30-day window for clean month-over-month comparison.

What this report cannot answer this month.

  • HoneyFinder is a sample of the public web, not a coverage count. The 343 surfaces are what CommonCrawl, Shodan, and CT-log sampling reached during the window.
  • TrapMyAgent geography reflects origin IP geolocation, which VPN and proxy infrastructure can obscure. ASN-level analysis in our methodology page separates origin from terminus.
  • AgentPwn attack-success rate measures injection-following, not downstream impact. A callback proves an agent followed a payload. It does not measure what the agent then did.
  • Sophistication scoring is published as distribution-only. We do not publish a sophistication mean, grade, or month-over-month delta. The supervised classifier that would ground a mean does not yet exist. We will not put a number on a slide that the data cannot defend.
  • Model attribution is aggregate UA categories with explicit limits. We will not name a vendor without evidence.
  • The delta cohort observed zero organic interactions in the window. Operator self-test rows are excluded by query-layer filter. Edition 2 will report on the second authenticated archetype as it goes live.

How to cite

# BibTeX
@techreport{opena2a-btr-2026-05,
  author = {{ARIA, OpenA2A autonomous research system} and Abdel Fane (editor)},
  title  = {State of AI Agent Security: May 2026},
  institution = {OpenA2A Research},
  year   = {2026},
  month  = {5},
  type   = {Behavioral Threat Report},
  number = {Issue 1},
  url    = {https://research.opena2a.org/reports/state-of-ai-agent-security-2026-05}
}
# APA
ARIA, & Fane, A. (Ed.). (2026, May 12). State of AI Agent Security: May 2026 (Behavioral Threat Report Issue 1). OpenA2A Research. https://research.opena2a.org/reports/state-of-ai-agent-security-2026-05

How to challenge a finding

Email research@opena2a.org with the specific number you dispute and the methodology you would prefer we used. We aim to respond within five business days. Substantive challenges that hold up under review are published as methodology updates in subsequent editions with attribution.

Appendix. First Observed In The Wild log

FOITW is the mechanism by which OpenA2A pre-registers signatures for sophisticated AI agent attack techniques and publishes within seven days of first wild observation. The pre-registration catalog lives at aria/data/foitw-catalog.json with full methodology at /methodology/foitw.

Catalog state at publication: 3 signatures registered, 0 firings during the reporting window. Transparency-log anchoring is in progress. Log-anchored claims will publish from Edition 2.

Catalog IDNameTechniqueStatus
FOITW-CAT-0001Greshake-class indirect prompt injection via web content
Greshake et al. (AISec 2023). Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.
T-2002Pre-registered 2026-04-27. Not yet fired.
FOITW-CAT-0002Actor-Critic adaptive multi-turn manipulation
Shi, Lin, Song, et al. Lessons from Defending Gemini Against Indirect Prompt Injections (Google DeepMind, 2025).
T-2007Pre-registered 2026-04-27. Not yet fired.
FOITW-CAT-0003Reputation-poisoning prompt injection
Brunner, Liu, Pande. AI threats in the wild: The current state of prompt injections on the web (Google Threat Intelligence Group, April 2026).
T-9005Pre-registered 2026-04-27. Not yet fired.

Live indices at publication

The numbers in this report are the time-capsule snapshot. The indices below are continuously updated. Click through for the live methodology and CSV exports.

Authorship. ARIA is OpenA2A's autonomous research system. Editorial review by Abdel Fane. Disclosure of authorship is a credibility move, not a limitation. We document how our content is made.

Data integrity. No numbers in this report are estimated, modeled, or projected. Every value traces to a query against live instrumentation. Pre-publication audit per Black Hat reproducibility standards.

License. Apache 2.0. Cite using the BibTeX or APA blocks in Section 9.

Coordinated disclosure. Any finding in this report that maps to a previously undisclosed vulnerability is held under the 90-day ARIAdesk disclosure protocol. None of the surface-level findings in this report require disclosure coordination.