Research methodology

How OpenA2A research works. Authorship, data streams, taxonomy, evidence standards, limits, citation, and how to challenge a finding.

Authorship

ARIA is OpenA2A's autonomous research system. Six specialized agents handle hypothesis generation, exposure sweeps, behavioral threat synthesis, exploit development, patch development, statistical analysis, and disclosure coordination. Reports authored by ARIA carry a system byline. Editorial review is done by a named human, currently Abdel Fane. Bylines on every report disclose both.

Disclosure of authorship is a credibility move, not a limitation. We document how our content is made. Trust signals in 2026 favor organizations that show the work, not organizations that hide automation.

All findings are reviewed by a human editor before publication. The editor can return a draft for revision. The editor cannot rewrite an ARIA finding under a different byline.

Four data streams

OpenA2A operates four telemetry streams. Each measures a distinct slice of AI agent security. The monthly Behavioral Threat Report synthesizes across all four; specialty reports may use only one. Every stream is documented in full on its own sub-page.

ARIAscout. Exposure sweeps.

Continuous Shodan and Censys sweeps of AI service exposure. As of April 10, 2026, 321,929 internet-facing AI services were indexed. This is the longest-running stream, and it provides the denominator for exposure prevalence comparisons. Full methodology is in the ARIAscout subsection below.

ARIAtrap. Honey-agent fleet.

TrapMyAgent runs 22 honey agents across sector verticals. Each agent looks like a deployed production agent to a visitor; each is instrumented to observe attacker behavior post-credential. AgentPwn instruments honeypot pages with injection payloads that an attacker-controlled agent can encounter and follow. Together they form the behavioral telemetry layer that public-web crawler studies cannot produce.

HoneyFinder. Wild bait sampling.

Samples the public web through CommonCrawl, Shodan banner data, and CT-log domain expansion for adversarial injection payloads planted by third parties. HoneyFinder reports aggregate counts and a signature leaderboard. HoneyFinder findings are a sample of the public web, not a coverage count of every site online. The HoneyFinder stream is currently published under research.opena2a.org until the brand reaches independent publication scale.

First Observed In The Wild (FOITW).

Pre-registered signatures for sophisticated AI agent attack techniques. When research publishes a new technique class, we register a signature in our pre-registration catalog and watch the honeypot fleet for first wild observation. Publication SLA is within seven days of first observed firing. Transparency-log anchoring of pre-registration timestamps is the integrity layer.
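The pre-registration flow can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a signature is a small JSON-serializable record and that the value anchored in the transparency log is a SHA-256 digest of its canonical encoding; the function names and record fields are illustrative and not taken from the OpenA2A catalog. Only the seven-day window comes from the text above.

```python
import hashlib
import json
from datetime import datetime, timedelta, timezone

# "within seven days of first observed firing" (from the text above)
PUBLICATION_SLA = timedelta(days=7)


def signature_digest(signature: dict) -> str:
    """SHA-256 over a canonical JSON encoding (assumed anchoring format)."""
    canonical = json.dumps(signature, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_valid_first_observation(registered_at: datetime, first_fired: datetime) -> bool:
    """A firing only counts if the signature was registered strictly before it."""
    return registered_at < first_fired


def publication_deadline(first_fired: datetime) -> datetime:
    """Latest publication time that still meets the seven-day SLA."""
    return first_fired + PUBLICATION_SLA


# Hypothetical record: both field names and values are placeholders.
record = {"technique": "T-0001", "pattern": "example-signature"}
registered = datetime(2026, 5, 1, tzinfo=timezone.utc)
fired = datetime(2026, 5, 3, tzinfo=timezone.utc)

print(signature_digest(record))
print(is_valid_first_observation(registered, fired))
print(publication_deadline(fired).isoformat())
```

Sorting keys before hashing keeps the digest stable across serializers, which is what makes a logged timestamp checkable later against the published signature.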

Registry trust signals (opt-in)

Where deployed scanners (HMA), guards (ARP), verifiers (AIM), and brokers (Secretless) are opted in, telemetry from those deployments contributes supplementary signal. Examples include scan volume, check failure rates, trust score distribution, and capability decision counts. All Registry telemetry is opt-in. Reports cite Registry signals only when the deploying organization has explicitly authorized inclusion in research data.

Deduplication rule

One number, one home. Each metric lives in exactly one stream. Live values stay on the dashboards; monthly reports quote a value with a timestamp and a link to its dashboard. Reports are time capsules.

  • Exposed AI services count lives on the Exposure Prevalence Index (ARIAscout).
  • Honeypot fleet event totals live on the Attack Prevalence Index (ARIAtrap).
  • Sophistication distribution lives on the Sophistication Index (ARIAtrap, distribution-only).
  • HoneyFinder surface counts live on the wild-bait dashboard.
  • Each Agent Threat Matrix technique resolves to exactly one T-NNNN identifier.
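One way to enforce "one number, one home" mechanically is a small registry that rejects a second home for the same metric. The sketch below is an illustration only: the metric keys are hypothetical names, and the `T-NNNN` check assumes that `NNNN` means exactly four decimal digits, which the text does not state.

```python
import re

# Metric -> its single home, taken from the list above. Keys are hypothetical.
METRIC_HOME = {
    "exposed_ai_services_count": "Exposure Prevalence Index (ARIAscout)",
    "honeypot_fleet_event_totals": "Attack Prevalence Index (ARIAtrap)",
    "sophistication_distribution": "Sophistication Index (ARIAtrap)",
    "honeyfinder_surface_counts": "wild-bait dashboard",
}

# Assumption: T-NNNN means "T-" followed by exactly four digits.
TECHNIQUE_ID = re.compile(r"T-\d{4}")


def register_metric(homes: dict, metric: str, home: str) -> None:
    """Add a metric, refusing to give an existing metric a second home."""
    existing = homes.get(metric)
    if existing is not None and existing != home:
        raise ValueError(f"{metric!r} already lives on {existing!r}")
    homes[metric] = home


def is_technique_id(value: str) -> bool:
    """True if the whole string has the assumed T-NNNN shape."""
    return TECHNIQUE_ID.fullmatch(value) is not None


print(is_technique_id("T-0042"))  # True
print(is_technique_id("T-42"))    # False
```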

Evidence standards

Every Agent Threat Matrix technique cited in a report carries an evidence tier. We do not publish a technique that does not pass at least one tier. The tier rubric comes from the Agent Threat Matrix evidence audit.

Tier | Definition | Count of 57 techniques
OBSERVED | Confirmed in real-world production systems. ARIAscout exposure sweeps, npm supply-chain attacks, real CVE disclosures. | 17
VALIDATED | Reproducible in a controlled lab environment. DVAA challenge fixtures, HMA scan results, sandboxed ARIAred PoCs. | 38
THEORETICAL | Plausible based on architecture analysis but no real-world observation or lab reproduction yet. Flagged as such in the technique page. | 3

Counts as of the publication of the May 2026 Behavioral Threat Report. The Threat Matrix grows continuously. The live evidence audit lives at the matrix repository.
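The publication gate described above can be sketched as a small check. This is an illustrative Python sketch, not the Agent Threat Matrix tooling: the record shape (a dict with a `tier` field) and the function names are assumptions.

```python
from enum import Enum


class EvidenceTier(Enum):
    OBSERVED = "OBSERVED"        # confirmed in real-world production systems
    VALIDATED = "VALIDATED"      # reproducible in a controlled lab environment
    THEORETICAL = "THEORETICAL"  # plausible from architecture analysis only


def parse_tier(label: str) -> EvidenceTier:
    """Raise ValueError for anything outside the three-tier rubric."""
    return EvidenceTier(label.strip().upper())


def can_publish(technique: dict) -> bool:
    """A technique is publishable only if it carries a recognized tier."""
    try:
        parse_tier(technique.get("tier") or "")
    except ValueError:
        return False
    return True


# Hypothetical technique records.
print(can_publish({"id": "T-0001", "tier": "observed"}))  # True
print(can_publish({"id": "T-0002"}))                      # False
```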

Per-source limitations

Every stream has a structural limit. We publish each limit alongside the finding it constrains.

  • HoneyFinder is a sample, not a coverage count. Numbers are what CommonCrawl, Shodan banner data, and CT-log domain expansion reached during the reporting window. Surfaces that block crawlers are systematically invisible.
  • TrapMyAgent geography reflects origin IP geolocation. VPN, proxy, and Cloudflare-fronting infrastructure can obscure the real origin. ASN-level analysis separates origin from terminus when needed.
  • AgentPwn attack-success rate measures injection-following. A callback proves an agent followed a payload. It does not measure downstream impact. Reports that extrapolate from injection-following to compromise scope should be read with that caveat.
  • Sophistication scoring is distribution-only. The supervised classifier that would ground a mean does not yet exist. We do not publish a sophistication mean, grade, or month-over-month delta. We publish the per-session distribution and label the rubric as in calibration.
  • User-agent strings are not identity claims. Attacker UA strings are attacker-controllable. We report aggregate UA categories with explicit limits, and we never name a vendor or model without evidence beyond the UA string.
  • ARIAscout sees what Shodan indexes. Shodan's index is a public snapshot of internet-facing infrastructure with banner-grabbing limitations and indexing lag. Services that block Shodan crawlers, run on non-standard ports without service signatures, or sit behind login walls are systematically invisible.
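To make the "distribution-only" limit concrete, the sketch below publishes per-session shares for each rubric level and deliberately computes no mean, grade, or delta. The level labels (`L1`, `L2`, `L3`) are hypothetical; the actual rubric is not described in this document.

```python
from collections import Counter
from typing import Dict, Iterable


def session_distribution(levels: Iterable[str]) -> Dict[str, float]:
    """Share of sessions at each rubric level; no mean or score is derived."""
    counts = Counter(levels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: counts[level] / total for level in sorted(counts)}


# One hypothetical rubric label per observed session.
sessions = ["L1", "L1", "L2", "L3"]
print(session_distribution(sessions))
# {'L1': 0.5, 'L2': 0.25, 'L3': 0.25}
```

Keeping the levels as unordered labels rather than numbers avoids implying that an average over them is meaningful before a calibrated classifier exists.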

How to cite OpenA2A research

Every report carries a citation block in BibTeX and APA formats. Citation makes the work compound, so we make citing easy.

# BibTeX, generic form
@techreport{opena2a-btr-YYYY-MM,
  author = {{ARIA, OpenA2A autonomous research system} and <editor name> (editor)},
  title  = {State of AI Agent Security: <Month> <Year>},
  institution = {OpenA2A Research},
  year   = {<year>},
  month  = {<month>},
  type   = {Behavioral Threat Report},
  number = {Issue N},
  url    = {<report URL>}
}
# APA, generic form
ARIA, & <Editor>. (Ed.). (<year>, <month> <day>). State of AI Agent Security: <Month> <Year> (Behavioral Threat Report Issue N). OpenA2A Research. <URL>

DOI assignment is on the roadmap. Until then, the canonical URL of each report is the persistent identifier.
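For readers who generate citations programmatically, the generic BibTeX form above can be filled in with a few lines of Python. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and the issue number and URL in the usage line are left as placeholders because this page does not give them.

```python
import calendar


def bibtex_entry(year: int, month: int, editor: str, issue: str, url: str) -> str:
    """Fill the generic @techreport template for one Behavioral Threat Report."""
    month_name = calendar.month_name[month]  # e.g. 5 -> "May"
    fields = [
        ("author", f"{{ARIA, OpenA2A autonomous research system}} and {editor} (editor)"),
        ("title", f"State of AI Agent Security: {month_name} {year}"),
        ("institution", "OpenA2A Research"),
        ("year", str(year)),
        ("month", month_name),
        ("type", "Behavioral Threat Report"),
        ("number", f"Issue {issue}"),
        ("url", url),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@techreport{{opena2a-btr-{year:04d}-{month:02d},\n{body}\n}}"


# Issue number and URL are placeholders, not real values.
print(bibtex_entry(2026, 5, "Abdel Fane", "N", "<report URL>"))
```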

How to challenge a finding

Email research@opena2a.org with the specific number you dispute and the methodology you would prefer we used. Substantive challenges that hold up under review are published as methodology updates in subsequent editions with attribution.

Response target: five business days. Disputes that turn on definitions (what counts as an agent, what counts as a session) typically resolve faster than disputes that turn on sampling. We will not silently change a published number. Errata appear in the methodology page changelog and in the affected report's footer.

ARIAscout: exposure sweeps

ARIAscout is the longest-running stream. It uses 207 Shodan search queries across ten categories to identify internet-facing AI agent infrastructure. As of April 10, 2026, 321,929 services were indexed.

Key principle

All findings are based on publicly indexed data and open source code review. We analyze what Shodan and Censys have already indexed. We do not access, test, or exploit third-party systems.

Process

  1. Target discovery via Shodan. 207 queries across Python frameworks, Node.js servers, Go/Java/Ruby/Rust frameworks, cloud platforms, API patterns, AI/ML infrastructure, and agent-framework signatures.
  2. Banner and response analysis. Shodan's cached banner data and HTTP response headers are analyzed for security-relevant patterns: exposed configuration files, agent instruction files, API key patterns, MCP tool definitions, gateway signatures, and debug mode indicators. Only findings with clear signature matches in the indexed data are counted.
  3. Aggregation and reporting. Findings are aggregated by pattern type. Rates are computed as the proportion of indexed hosts showing each security-relevant pattern. No individual hosts, IPs, or organizations are identified in published output.
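The aggregation step can be sketched as follows. The sketch assumes the denominator is the total number of indexed hosts and that each host contributes at most once per pattern; the function name and input shape are hypothetical. Host identifiers never enter the function, which mirrors the aggregate-only reporting rule.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def pattern_rates(host_patterns: Iterable[Set[str]], total_indexed: int) -> Dict[str, float]:
    """Per-pattern rate = hosts showing the pattern / total indexed hosts.

    host_patterns holds one set of pattern ids per matching host, with no
    IPs or hostnames, so the output is aggregate by construction.
    """
    if total_indexed <= 0:
        raise ValueError("total_indexed must be positive")
    counts = Counter()
    for patterns in host_patterns:
        counts.update(set(patterns))  # count each pattern once per host
    return {pid: counts[pid] / total_indexed for pid in sorted(counts)}


# Three matching hosts out of ten indexed (made-up numbers for illustration).
matches = [
    {"no-auth-mcp", "mcp-sse-exposed"},
    {"mcp-sse-exposed"},
    {"debug-mode-enabled"},
]
print(pattern_rates(matches, total_indexed=10))
# {'debug-mode-enabled': 0.1, 'mcp-sse-exposed': 0.2, 'no-auth-mcp': 0.1}
```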

Shodan query categories

207 queries are distributed across these categories.

Category | Queries | Examples
SSE endpoints | 5 | text/event-stream on ports 80, 443, 3000, 8000, 8080
Python frameworks | 35 | Uvicorn, FastAPI, Django, Flask, Gunicorn, Tornado, aiohttp
Node.js servers | 30 | Express, Koa, Hapi, Fastify, NestJS, Next.js, Nuxt
WebSocket and real-time | 15 | WebSocket upgrades, Socket.io, WS connections
API patterns | 25 | /api/v1, /api/v2, REST endpoints, GraphQL, OpenAPI
AI and ML infrastructure | 20 | LangChain, LlamaIndex, Hugging Face, model endpoints
Cloud platforms | 15 | AWS Lambda, GCP Run, Azure Functions, Vercel, Heroku
Debug and admin endpoints | 20 | /debug, /admin, /health, /metrics, /status
Go, Java, Ruby, Rust | 25 | Gin, Echo, Spring, Rails, Actix, Rocket
Database and container UIs | 17 | MongoDB Express, Redis Commander, Portainer, phpMyAdmin
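As a quick consistency check, the per-category query counts listed above can be summed to confirm they match the stated totals of 207 queries and ten categories. The dictionary below simply transcribes the figures from this section.

```python
# Query counts per category, transcribed from the list above.
QUERY_CATEGORIES = {
    "SSE endpoints": 5,
    "Python frameworks": 35,
    "Node.js servers": 30,
    "WebSocket and real-time": 15,
    "API patterns": 25,
    "AI and ML infrastructure": 20,
    "Cloud platforms": 15,
    "Debug and admin endpoints": 20,
    "Go, Java, Ruby, Rust": 25,
    "Database and container UIs": 17,
}

total_queries = sum(QUERY_CATEGORIES.values())
print(len(QUERY_CATEGORIES), total_queries)  # 10 207
```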

Patterns analyzed

ARIAscout analyzes Shodan index data for 12 security-relevant patterns.

Pattern ID | Name | What is matched
mcp-sse-exposed | MCP SSE endpoints | SSE endpoint signatures in banner data
mcp-tools-exposed | MCP tools listing | Tool definition patterns in HTTP responses
api-key-exposed | API key exposure | API key patterns in cached response headers
config-file-exposed | Config files | Configuration file paths in directory listings
claude-md-exposed | System instructions | CLAUDE.md references in index data
no-auth-mcp | Unauthenticated MCP | MCP endpoints without auth indicators
outdated-api-endpoint | Debug endpoints | /debug, /admin, /shell paths in index data
clawdbot-gateway-exposed | Agent gateway | Gateway signatures on port 18789
clawdbot-websocket-exposed | WebSocket control | WebSocket control signatures on port 18790
outdated-version | Outdated versions | Outdated version strings in banners
debug-mode-enabled | Debug mode | Debug mode indicators in response headers
dir-listing-enabled | Directory listing | Directory listing indicators in HTML content
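Signature matching over cached banner text might look roughly like the sketch below. The pattern ids come from the list above, but the regular expressions are illustrative guesses and are not ARIAscout's actual signatures; only three patterns are shown, and the function name is hypothetical.

```python
import re

# Illustrative regexes only; the real signatures are not published here.
SIGNATURES = {
    "mcp-sse-exposed": re.compile(r"content-type:\s*text/event-stream", re.IGNORECASE),
    "claude-md-exposed": re.compile(r"CLAUDE\.md"),
    "dir-listing-enabled": re.compile(r"<title>\s*Index of /", re.IGNORECASE),
}


def match_patterns(banner: str) -> set:
    """Return the ids of every signature with a clear match in the banner."""
    return {pid for pid, rx in SIGNATURES.items() if rx.search(banner)}


# A made-up cached banner, as an index might store it.
banner = "HTTP/1.1 200 OK\r\nContent-Type: text/event-stream\r\nCache-Control: no-cache\r\n"
print(match_patterns(banner))  # {'mcp-sse-exposed'}
```

Because the input is text an index has already cached, nothing in this step contacts the host being classified.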

Reproducibility

All ARIA methodology is documented for transparency. Each stream's reproduction requirements are published on its sub-page. ARIAscout's requirements:

# Requirements
- Shodan API key (Freelancer plan for 1000 results per query)
- Node.js 18+
# Analysis parameters
- 207 Shodan queries across 10 categories
- 6 second delay between Shodan API calls
- Banner and header pattern matching for service identification

Ethics and legal framework

  • ARIAscout research is based on analysis of Shodan's publicly available index data.
  • ARIAtrap honey agents and AgentPwn honeypot pages run on infrastructure OpenA2A controls.
  • HoneyFinder samples public-web content via CommonCrawl, Shodan, and CT logs. It does not authenticate into any system.
  • No authentication mechanisms on third-party systems are bypassed or tested.
  • No private data is retrieved, stored, or disclosed.
  • Reports use aggregate statistics only. No organizations, IPs, or domains are identified in published output.
  • Security patterns are identified through open source code review and default configuration analysis.
  • Vulnerability findings flow through the ARIAdesk 90-day coordinated disclosure pipeline before publication.

Questions or concerns?

Questions about methodology or data quality go to research@opena2a.org. Coordinated disclosure inquiries go to disclose@opena2a.org.