Research methodology
How OpenA2A research works. Authorship, data streams, taxonomy, evidence standards, limits, citation, and how to challenge a finding.
Authorship
ARIA is OpenA2A's autonomous research system. Six specialized agents handle hypothesis generation, exposure sweeps, behavioral threat synthesis, exploit development, patch development, statistical analysis, and disclosure coordination. Reports authored by ARIA carry a system byline. Editorial review is done by a named human, currently Abdel Fane. Bylines on every report disclose both.
Disclosure of authorship is a credibility move, not a limitation. We document how our content is made. Trust signals in 2026 favor organizations that show the work, not organizations that hide automation.
All findings are reviewed by a human editor before publication. The editor can return a draft for revision. The editor cannot rewrite an ARIA finding under a different byline.
Four data streams
OpenA2A operates four telemetry streams. Each measures a distinct slice of AI agent security. The monthly Behavioral Threat Report synthesizes across all four; specialty reports may use only one. Every stream is documented in full at its own sub-page.
ARIAscout. Exposure sweeps.
Continuous Shodan and Censys sweeps of AI service exposure. As of April 10, 2026, 321,929 internet-facing AI services were indexed. The longest-running stream. Provides the denominator for exposure prevalence comparisons. Full methodology in the ARIAscout subsection below.
ARIAtrap. Honey-agent fleet.
TrapMyAgent runs 22 honey agents across sector verticals. Each agent looks like a deployed production agent to a visitor; each is instrumented to observe attacker behavior post-credential. AgentPwn instruments honeypot pages with injection payloads that an attacker-controlled agent can encounter and follow. Together they form the behavioral telemetry layer that public-web crawler studies cannot produce.
HoneyFinder. Wild bait sampling.
Samples the public web through CommonCrawl, Shodan banner data, and CT-log domain expansion for adversarial injection payloads planted by third parties. HoneyFinder reports aggregate counts and a signature leaderboard. HoneyFinder findings are a sample of the public web, not a coverage count of every site online. The HoneyFinder stream is currently published under research.opena2a.org until the brand reaches independent publication scale.
First Observed In The Wild (FOITW).
Pre-registered signatures for sophisticated AI agent attack techniques. When research publishes a new technique class, we register a signature in our pre-registration catalog and watch the honeypot fleet for first wild observation. Publication SLA is within seven days of first observed firing. Transparency-log anchoring of pre-registration timestamps is the integrity layer.
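The two timing rules in this stream (a signature must be registered before its first wild firing, and publication follows within seven days) can be sketched as below. This is a minimal illustration, not the FOITW implementation: the `Signature` class, its field names, and the example identifier are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# "Publication SLA is within seven days of first observed firing."
PUBLICATION_SLA = timedelta(days=7)


@dataclass(frozen=True)
class Signature:
    signature_id: str        # hypothetical identifier, format assumed
    registered_at: datetime  # pre-registration timestamp (UTC)


def is_preregistered(sig: Signature, first_fired_at: datetime) -> bool:
    """A firing counts as a pre-registered observation only if the
    signature was registered strictly before the firing."""
    return sig.registered_at < first_fired_at


def publication_deadline(first_fired_at: datetime) -> datetime:
    """Latest publication time that still meets the seven-day SLA."""
    return first_fired_at + PUBLICATION_SLA


sig = Signature("sig-example", datetime(2026, 4, 1, tzinfo=timezone.utc))
fired = datetime(2026, 4, 10, 12, 0, tzinfo=timezone.utc)
print(is_preregistered(sig, fired), publication_deadline(fired).isoformat())
```

Timestamps are kept timezone-aware so that the ordering check does not depend on where the honeypot or the catalog is hosted.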
Registry trust signals (opt-in)
Where deployed scanners (HMA), guards (ARP), verifiers (AIM), and brokers (Secretless) are opted in, telemetry from those deployments contributes supplementary signal. Examples include scan volume, check failure rates, trust score distribution, and capability decision counts. All Registry telemetry is opt-in. Reports cite Registry signals only when the deploying organization has explicitly authorized inclusion in research data.
Deduplication rule
One number, one home. Each metric lives in exactly one stream. Live values stay on dashboards. Monthly reports quote them with a timestamp and a link to the dashboard. Reports are time capsules.
- Exposed AI services count lives on the Exposure Prevalence Index (ARIAscout).
- Honeypot fleet event totals live on the Attack Prevalence Index (ARIAtrap).
- Sophistication distribution lives on the Sophistication Index (ARIAtrap, distribution-only).
- HoneyFinder surface counts live on the wild-bait dashboard.
- Each Agent Threat Matrix technique resolves to exactly one T-NNNN identifier.
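One way to enforce the rule is a registry that refuses to give a metric a second home and that always quotes values with a timestamp. The sketch below is illustrative only: the metric keys, the function names, and the reading of T-NNNN as exactly four digits are assumptions, not OpenA2A's tooling.

```python
import re

# Hypothetical registry: metric key -> the single dashboard that owns it.
METRIC_HOME = {
    "exposed_ai_services_count": "Exposure Prevalence Index (ARIAscout)",
    "honeypot_fleet_event_totals": "Attack Prevalence Index (ARIAtrap)",
    "sophistication_distribution": "Sophistication Index (ARIAtrap)",
    "honeyfinder_surface_counts": "wild-bait dashboard",
}

# Assumes NNNN means exactly four decimal digits.
TECHNIQUE_ID = re.compile(r"T-\d{4}")


def register_metric(registry: dict, metric: str, home: str) -> None:
    """Add a metric, rejecting any attempt to give it a second home."""
    existing = registry.get(metric)
    if existing is not None and existing != home:
        raise ValueError(f"{metric} already lives on {existing}")
    registry[metric] = home


def is_technique_id(value: str) -> bool:
    return TECHNIQUE_ID.fullmatch(value) is not None


def quote_metric(metric: str, value: int, as_of: str) -> str:
    """Render a report quote: value, timestamp, and where the live number is."""
    return f"{value:,} (as of {as_of}; live value: {METRIC_HOME[metric]})"


print(quote_metric("exposed_ai_services_count", 321929, "2026-04-10"))
```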
Evidence standards
Every Agent Threat Matrix technique cited in a report carries an evidence tier. We do not publish a technique that does not pass at least one tier. The tier rubric comes from the Agent Threat Matrix evidence audit.
| Tier | Definition | Count of 57 techniques |
|---|---|---|
| OBSERVED | Confirmed in real-world production systems. ARIAscout exposure sweeps, npm supply-chain attacks, real CVE disclosures. | 17 |
| VALIDATED | Reproducible in a controlled lab environment. DVAA challenge fixtures, HMA scan results, sandboxed ARIAred PoCs. | 38 |
| THEORETICAL | Plausible based on architecture analysis but no real-world observation or lab reproduction yet. Flagged as such in the technique page. | 3 |
Counts as of the publication of the May 2026 Behavioral Threat Report. The Threat Matrix grows continuously. The live evidence audit lives at the matrix repository.
Per-source limitations
Every stream has a structural limit. We publish each limit alongside the finding it constrains.
- HoneyFinder is a sample, not a coverage count. Numbers are what CommonCrawl, Shodan banner data, and CT-log domain expansion reached during the reporting window. Surfaces that block crawlers are systematically invisible.
- TrapMyAgent geography reflects origin IP geolocation. VPN, proxy, and Cloudflare-fronting infrastructure can obscure the real origin. ASN-level analysis separates origin from terminus when needed.
- AgentPwn attack-success rate measures injection-following. A callback proves an agent followed a payload. It does not measure downstream impact. Reports that extrapolate from injection-following to compromise scope should be read with that caveat.
- Sophistication scoring is distribution-only. The supervised classifier that would ground a mean does not yet exist. We do not publish a sophistication mean, grade, or month-over-month delta. We publish the per-session distribution and label the rubric as in calibration.
- User-agent strings are not identity claims. Attacker UA strings are attacker-controllable. We report aggregate UA categories with explicit limits and never name a vendor or model without evidence beyond the UA string.
- ARIAscout sees what Shodan indexes. Shodan's index is a public snapshot of internet-facing infrastructure with banner-grabbing limitations and indexing lag. Services that block Shodan crawlers, run on non-standard ports without service signatures, or sit behind login walls are systematically invisible.
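Two of the limits above translate directly into how a number is computed. The sketch below shows a distribution-only summary (no mean, grade, or delta) and an injection-following rate. The rubric labels and the choice of denominator (payload exposures) are assumptions made for illustration; the source does not specify either.

```python
from collections import Counter


def session_distribution(labels):
    """Share of sessions per rubric label.

    Deliberately returns only the distribution: no mean, grade, or
    month-over-month delta is derived from it.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in sorted(counts.items())}


def injection_following_rate(callbacks, payload_exposures):
    """Fraction of payload exposures that produced a callback.

    A callback shows that an agent followed the payload. It says nothing
    about downstream impact. The denominator is an assumption here.
    """
    if payload_exposures <= 0:
        raise ValueError("payload_exposures must be positive")
    return callbacks / payload_exposures


# Hypothetical rubric labels for four sessions.
print(session_distribution(["low", "low", "high", "medium"]))
print(injection_following_rate(3, 12))
```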
How to cite OpenA2A research
Every report carries a citation block in BibTeX and APA format. Citation makes the work compound, so we make citation easy.
@techreport{opena2a-btr-YYYY-MM,
  author = {{ARIA, OpenA2A autonomous research system} and <editor name> (editor)},
  title = {State of AI Agent Security: <Month> <Year>},
  institution = {OpenA2A Research},
  year = {<year>},
  month = {<month>},
  type = {Behavioral Threat Report},
  number = {Issue N},
  url = {<report URL>}
}

ARIA, & <Editor>. (Ed.). (<year>, <month> <day>). State of AI Agent Security: <Month> <Year> (Behavioral Threat Report Issue N). OpenA2A Research. <URL>
DOI assignment is on the roadmap. Until then, the canonical URL of each report is the persistent identifier.
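For readers who script their bibliographies, the BibTeX template can be filled in programmatically. This is a minimal sketch: the function name and parameters are hypothetical, and the URL in the example is a placeholder, not a real report address.

```python
MONTHS = (
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
)


def bibtex_entry(year: int, month: int, editor: str, issue: int, url: str) -> str:
    """Fill the Behavioral Threat Report BibTeX template for one issue."""
    month_name = MONTHS[month - 1]
    key = f"opena2a-btr-{year:04d}-{month:02d}"
    fields = [
        ("author", f"{{ARIA, OpenA2A autonomous research system}} and {editor} (editor)"),
        ("title", f"State of AI Agent Security: {month_name} {year}"),
        ("institution", "OpenA2A Research"),
        ("year", str(year)),
        ("month", month_name),
        ("type", "Behavioral Threat Report"),
        ("number", f"Issue {issue}"),
        ("url", url),
    ]
    # Each field is wrapped in one pair of braces; no trailing comma on the last.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@techreport{{{key},\n{body}\n}}"


# Placeholder issue number and URL, for illustration only.
print(bibtex_entry(2026, 5, "Abdel Fane", 1, "https://example.org/report"))
```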
How to challenge a finding
Email research@opena2a.org with the specific number you dispute and the methodology you would prefer we used. Substantive challenges that hold up under review are published as methodology updates in subsequent editions with attribution.
Response target: five business days. Disputes that turn on definitions (what counts as an agent, what counts as a session) typically resolve faster than disputes that turn on sampling. We will not silently change a published number. Errata appear in the methodology page changelog and in the affected report's footer.
ARIAscout: exposure sweeps
ARIAscout is the longest-running stream. It uses 207 Shodan search queries across ten categories to identify internet-facing AI agent infrastructure. As of April 10, 2026, 321,929 services were indexed.
Key principle
All findings are based on publicly indexed data and open source code review. We analyze what Shodan and Censys have already indexed. We do not access, test, or exploit third-party systems.
Process
- Target discovery via Shodan. 207 queries across Python frameworks, Node.js servers, Go/Java/Ruby/Rust frameworks, cloud platforms, API patterns, AI/ML infrastructure, and agent-framework signatures.
- Banner and response analysis. Shodan's cached banner data and HTTP response headers are analyzed for security-relevant patterns: exposed configuration files, agent instruction files, API key patterns, MCP tool definitions, gateway signatures, and debug mode indicators. Only findings with clear signature matches in the indexed data are counted.
- Aggregation and reporting. Findings are aggregated by pattern type. Rates are expressed as the proportion of indexed hosts that show a given security-relevant pattern. No individual hosts, IPs, or organizations are identified in published output.
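The aggregation step can be sketched as follows. This is an illustration under stated assumptions, not ARIAscout's code: the record shape (`host`, `pattern` keys) and the function name are hypothetical, and the example hosts use the reserved documentation range 192.0.2.0/24.

```python
def aggregate_findings(findings, indexed_hosts):
    """Aggregate per-host signature matches into per-pattern counts and rates.

    `findings` is an iterable of dicts with hypothetical keys "host" and
    "pattern". Host identifiers are used only to deduplicate; the returned
    report contains counts and rates, never hosts.
    """
    if indexed_hosts <= 0:
        raise ValueError("indexed_hosts must be positive")

    hosts_by_pattern = {}
    for finding in findings:
        hosts_by_pattern.setdefault(finding["pattern"], set()).add(finding["host"])

    return {
        pattern: {"hosts": len(hosts), "rate": len(hosts) / indexed_hosts}
        for pattern, hosts in sorted(hosts_by_pattern.items())
    }


sample = [
    {"host": "192.0.2.1", "pattern": "debug-mode-enabled"},
    {"host": "192.0.2.1", "pattern": "debug-mode-enabled"},  # duplicate match
    {"host": "192.0.2.2", "pattern": "debug-mode-enabled"},
    {"host": "192.0.2.2", "pattern": "no-auth-mcp"},
]
print(aggregate_findings(sample, indexed_hosts=4))
```

Deduplicating by host before counting keeps a host that matches the same pattern on several ports from inflating the rate.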
Shodan query categories
207 queries are distributed across these categories.
| Category | Queries | Coverage |
|---|---|---|
| SSE endpoints | 5 | text/event-stream on ports 80, 443, 3000, 8000, 8080 |
| Python frameworks | 35 | Uvicorn, FastAPI, Django, Flask, Gunicorn, Tornado, aiohttp |
| Node.js servers | 30 | Express, Koa, Hapi, Fastify, NestJS, Next.js, Nuxt |
| WebSocket and real-time | 15 | WebSocket upgrades, Socket.io, WS connections |
| API patterns | 25 | /api/v1, /api/v2, REST endpoints, GraphQL, OpenAPI |
| AI and ML infrastructure | 20 | LangChain, LlamaIndex, Hugging Face, model endpoints |
| Cloud platforms | 15 | AWS Lambda, GCP Run, Azure Functions, Vercel, Heroku |
| Debug and admin endpoints | 20 | /debug, /admin, /health, /metrics, /status |
| Go, Java, Ruby, Rust | 25 | Gin, Echo, Spring, Rails, Actix, Rocket |
| Database and container UIs | 17 | MongoDB Express, Redis Commander, Portainer, phpMyAdmin |
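The per-category counts above can be cross-checked against the stated totals of 207 queries and ten categories. The snippet below is a small consistency check; the counts are copied from this page, and the function names are illustrative.

```python
# Query counts per category, copied from the list above.
QUERY_CATEGORIES = {
    "SSE endpoints": 5,
    "Python frameworks": 35,
    "Node.js servers": 30,
    "WebSocket and real-time": 15,
    "API patterns": 25,
    "AI and ML infrastructure": 20,
    "Cloud platforms": 15,
    "Debug and admin endpoints": 20,
    "Go, Java, Ruby, Rust": 25,
    "Database and container UIs": 17,
}

EXPECTED_TOTAL = 207
EXPECTED_CATEGORIES = 10


def check_query_budget(categories):
    """True when the category list matches the stated totals."""
    return (
        len(categories) == EXPECTED_CATEGORIES
        and sum(categories.values()) == EXPECTED_TOTAL
    )


def category_shares(categories):
    """Fraction of all queries that each category accounts for."""
    total = sum(categories.values())
    return {name: count / total for name, count in categories.items()}


print(check_query_budget(QUERY_CATEGORIES))
```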
Patterns analyzed
ARIAscout analyzes Shodan index data for 12 security-relevant patterns.
| Pattern ID | Name | Signal in index data |
|---|---|---|
| mcp-sse-exposed | MCP SSE endpoints | SSE endpoint signatures in banner data |
| mcp-tools-exposed | MCP tools listing | Tool definition patterns in HTTP responses |
| api-key-exposed | API key exposure | API key patterns in cached response headers |
| config-file-exposed | Config files | Configuration file paths in directory listings |
| claude-md-exposed | System instructions | CLAUDE.md references in index data |
| no-auth-mcp | Unauthenticated MCP | MCP endpoints without auth indicators |
| outdated-api-endpoint | Debug endpoints | /debug, /admin, /shell paths in index data |
| clawdbot-gateway-exposed | Agent gateway | Gateway signatures on port 18789 |
| clawdbot-websocket-exposed | WebSocket control | WebSocket control signatures on port 18790 |
| outdated-version | Outdated versions | Outdated version strings in banners |
| debug-mode-enabled | Debug mode | Debug mode indicators in response headers |
| dir-listing-enabled | Directory listing | Directory listing indicators in HTML content |
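To make the idea of a signature match concrete, the sketch below checks cached banner text against three illustrative regular expressions. These expressions are assumptions written for this example, not ARIAscout's actual signatures, and the `X-Debug` header in particular is hypothetical. Only the pattern IDs come from this page.

```python
import re

# Illustrative signatures only; the real ARIAscout signatures are not shown here.
SIGNATURES = {
    "mcp-sse-exposed": re.compile(r"content-type:\s*text/event-stream", re.IGNORECASE),
    "debug-mode-enabled": re.compile(r"x-debug:\s*(1|true|on)\b", re.IGNORECASE),
    "dir-listing-enabled": re.compile(r"<title>\s*Index of /", re.IGNORECASE),
}


def match_patterns(banner: str) -> list:
    """Return the sorted pattern IDs whose signature appears in the banner.

    Works on cached text only; it makes no network request.
    """
    return sorted(pid for pid, rx in SIGNATURES.items() if rx.search(banner))


sse_banner = "HTTP/1.1 200 OK\r\nContent-Type: text/event-stream\r\n\r\n"
print(match_patterns(sse_banner))
```

Requiring an explicit match keeps the count conservative, in line with the rule that only findings with clear signature matches in the indexed data are counted.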
Reproducibility
All ARIA methodology is documented for transparency. Each stream's reproduction requirements are published on its sub-page. ARIAscout's requirements:
Ethics and legal framework
- ARIAscout research is based on analysis of Shodan's publicly available index data.
- ARIAtrap honey agents and AgentPwn honeypot pages run on infrastructure OpenA2A controls.
- HoneyFinder samples public-web content via CommonCrawl, Shodan, and CT logs. It does not authenticate into any system.
- No authentication mechanisms on third-party systems are bypassed or tested.
- No private data is retrieved, stored, or disclosed.
- Reports use aggregate statistics only. No organizations, IPs, or domains are identified in published output.
- Security patterns are identified through open source code review and default configuration analysis.
- Vulnerability findings flow through the ARIAdesk 90-day coordinated disclosure pipeline before publication.
Questions or concerns?
Questions about methodology or data quality go to research@opena2a.org. Coordinated disclosure inquiries go to disclose@opena2a.org.