Research Methodology
How we collect and verify the vulnerability statistics shown in our research reports.
Specialized methodologies
The rest of this page documents the Shodan-driven exposure-sweep methodology used for monthly infrastructure census reports. ARIA v2 adds two methodologies for behavioral and pre-registered attack research:
Behavioral Sweep
How ARIAtrap turns honeypot fleet telemetry into Threat Matrix-classified observations and the monthly Behavioral Threat Report.
First Observed In The Wild (FOITW)
Pre-registered signatures, transparency-log anchored, published within a week of first fleet observation of a published technique.
Current Research Summary
Last updated: January 29, 2026
Overview
The statistics in our reports are derived from analysis of the Shodan search engine's publicly available index data, combined with review of open-source project documentation and default configurations. We do not access, test, or exploit any third-party systems.
Key principle: All findings are based on publicly indexed data and open-source code review. We analyze what Shodan has already indexed, not what we access directly.
Our Process
Target Discovery via Shodan
We use the Shodan API with 207 different search queries to identify candidate IP addresses running AI agent infrastructure. We search for signatures across Python frameworks (Uvicorn, FastAPI, Django, Flask, Gunicorn), Node.js servers (Express, Koa, Next.js), Go/Java/Ruby/Rust frameworks, cloud platforms, and API patterns.
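The discovery step can be sketched as a loop that runs each query and de-duplicates the returned IP addresses. This is a minimal sketch, not our production code: `collect_candidates`, `SearchFn`, and the example queries are hypothetical names chosen for illustration, and the search function is injected so the logic can be exercised without network access.

```python
from typing import Callable, Iterable

# A search function takes (query, page) and returns a Shodan-style result
# dict with a "matches" list; each match carries an "ip_str" field.
SearchFn = Callable[[str, int], dict]


def collect_candidates(search: SearchFn, queries: Iterable[str],
                       max_pages: int = 1) -> set[str]:
    """Run every query and return the de-duplicated set of candidate IPs."""
    candidates: set[str] = set()
    for query in queries:
        for page in range(1, max_pages + 1):
            matches = search(query, page).get("matches", [])
            if not matches:
                break  # no further results for this query
            candidates.update(m["ip_str"] for m in matches if "ip_str" in m)
    return candidates


# Hypothetical example queries, not taken from the real 207-query set.
EXAMPLE_QUERIES = ['"uvicorn" port:8000', '"text/event-stream" port:8080']
```

With the official `shodan` Python package, a wrapper such as `lambda q, p: api.search(q, page=p)` around a `shodan.Shodan` client should fit the `SearchFn` shape, since `search` returns a dict with a `matches` list.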
Banner and Response Analysis
Shodan's cached banner data and HTTP response headers are analyzed for security-relevant patterns: exposed configuration files, agent instruction files, API key patterns, MCP tool definitions, gateway signatures, and debug mode indicators. Only findings with clear signature matches in the indexed data are counted.
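Signature matching against cached banners might look like the sketch below. The regular expressions are hypothetical stand-ins written for illustration; they are not the signatures used in our reports, and only the pattern IDs come from this page.

```python
import re

# Hypothetical signatures for illustration only.
SIGNATURES = {
    "mcp-sse-exposed": re.compile(r"content-type:\s*text/event-stream", re.I),
    "debug-mode-enabled": re.compile(r"x-debug(-mode)?:\s*(1|true|on)\b", re.I),
    "dir-listing-enabled": re.compile(r"<title>\s*index of /", re.I),
}


def classify_banner(banner: str) -> list[str]:
    """Return the sorted IDs of every pattern whose signature matches the banner."""
    return sorted(pid for pid, rx in SIGNATURES.items() if rx.search(banner))
```

A banner that matches no signature yields an empty list and contributes nothing to the counts, which mirrors the rule that only clear signature matches are counted.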
Aggregation and Reporting
Findings are aggregated by pattern type. We calculate rates based on the proportion of indexed hosts showing security-relevant patterns and publish aggregate statistics only. No individual hosts, IPs, or organizations are identified.
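The aggregation step can be illustrated as follows. This is a sketch under assumptions: the input shape (a mapping from an opaque host key to the pattern IDs found for it) and the function name `aggregate` are hypothetical. The output keeps only per-pattern rates, so no host identifier survives.

```python
from collections import Counter


def aggregate(host_findings: dict[str, list[str]]) -> dict[str, float]:
    """Map pattern ID -> share of analyzed hosts showing it; host keys are discarded."""
    total = len(host_findings)
    if total == 0:
        return {}
    # set() ensures a host counts at most once per pattern.
    counts = Counter(
        pid for findings in host_findings.values() for pid in set(findings)
    )
    return {pid: n / total for pid, n in sorted(counts.items())}
```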
Shodan Query Categories
We use 207 queries across these categories to maximize coverage:
SSE Endpoints
5 queries: text/event-stream on ports 80, 443, 3000, 8000, 8080
Python Frameworks
35 queries: Uvicorn, FastAPI, Django, Flask, Gunicorn, Tornado, aiohttp
Node.js Servers
30 queries: Express, Koa, Hapi, Fastify, NestJS, Next.js, Nuxt
WebSocket/Real-time
15 queries: WebSocket upgrades, Socket.io, WS connections
API Patterns
25 queries: /api/v1, /api/v2, REST endpoints, GraphQL, OpenAPI
AI/ML Infrastructure
20 queries: LangChain, LlamaIndex, Hugging Face, model endpoints
Cloud Platforms
15 queries: AWS Lambda, Google Cloud Run, Azure Functions, Vercel, Heroku
Debug/Admin Endpoints
20 queries: /debug, /admin, /health, /metrics, /status
Go/Java/Ruby/Rust
25 queries: Gin, Echo, Spring, Rails, Actix, Rocket
Database/Container UIs
17 queries: Mongo Express, Redis Commander, Portainer, phpMyAdmin
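As a quick consistency check, the per-category counts listed above can be summed to confirm that they account for all 207 queries. The dictionary keys simply mirror the category names on this page.

```python
QUERY_COUNTS = {
    "SSE Endpoints": 5,
    "Python Frameworks": 35,
    "Node.js Servers": 30,
    "WebSocket/Real-time": 15,
    "API Patterns": 25,
    "AI/ML Infrastructure": 20,
    "Cloud Platforms": 15,
    "Debug/Admin Endpoints": 20,
    "Go/Java/Ruby/Rust": 25,
    "Database/Container UIs": 17,
}

total_queries = sum(QUERY_COUNTS.values())
print(total_queries)  # 207
```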
Patterns Analyzed
We analyze Shodan index data for 12 security-relevant patterns:
mcp-sse-exposed: MCP SSE Endpoints
SSE endpoint signatures in banner data
mcp-tools-exposed: MCP Tools Listing
Tool definition patterns in HTTP responses
api-key-exposed: API Key Exposure
API key patterns in cached response headers
config-file-exposed: Config Files
Configuration file paths in directory listings
claude-md-exposed: System Instructions
CLAUDE.md references in index data
no-auth-mcp: Unauthenticated MCP
MCP endpoints without auth indicators
outdated-api-endpoint: Debug Endpoints
/debug, /admin, /shell paths in index data
clawdbot-gateway-exposed: Agent Gateway
Gateway signatures on port 18789
clawdbot-websocket-exposed: WebSocket Control
WebSocket control signatures on port 18790
outdated-version: Outdated Versions
Outdated version strings in banners
debug-mode-enabled: Debug Mode
Debug mode indicators in response headers
dir-listing-enabled: Directory Listing
Directory listing indicators in HTML content
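Two of the patterns above are tied to specific ports. One way to express that constraint is to require both the port and a signature to match before a finding is counted. In this sketch the ports come from the list above, while the signature substrings and the function name are hypothetical; the record shape assumes the `port` and `data` fields of a Shodan-style banner record.

```python
# Ports come from the pattern list; the signature substrings are hypothetical.
PORT_BOUND_PATTERNS = {
    "clawdbot-gateway-exposed": (18789, "gateway"),
    "clawdbot-websocket-exposed": (18790, "upgrade: websocket"),
}


def port_bound_matches(record: dict) -> list[str]:
    """Return port-bound pattern IDs for one Shodan-style record ("port", "data")."""
    port = record.get("port")
    data = str(record.get("data", "")).lower()
    return [
        pid
        for pid, (expected_port, needle) in PORT_BOUND_PATTERNS.items()
        if port == expected_port and needle in data
    ]
```

Requiring both conditions avoids counting an unrelated service that merely happens to listen on one of these ports.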
Current Findings Breakdown
From 11,100 analyzed hosts, 8,449 security-relevant patterns were identified.
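For scale, the two headline figures give an average number of patterns per analyzed host. Assuming a single host can match more than one pattern, this is a mean and should not be read as the share of hosts affected.

```python
analyzed_hosts = 11_100
patterns_found = 8_449

# Mean patterns per analyzed host, not the fraction of hosts with a finding.
patterns_per_host = patterns_found / analyzed_hosts
print(f"{patterns_per_host:.2f} patterns per analyzed host on average")  # 0.76
```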
Reproducibility
Our research methodology is documented here for transparency. The analysis approach uses the Shodan search API, a publicly available service, combined with open-source code review.
Ethics and Legal Framework
- All research is based on analysis of Shodan's publicly available index data
- We do not access, test, or exploit any third-party systems
- No authentication mechanisms are bypassed or tested
- No private data is retrieved, stored, or disclosed
- All reports use aggregate statistics only; no organizations, IPs, or domains are identified
- Security patterns are identified through open-source code review and default configuration analysis
- Our goal is to help organizations identify and fix security issues in their own infrastructure
Full Research Report
For the full analysis including detailed vulnerability breakdowns and recommendations, read our published report.
Questions or Concerns?
If you have questions about our methodology or want to report an issue with our data, please contact us.