Internet-Wide AI Exposure Sweep: March 2026
Published on research.opena2a.org
TL;DR: We analyzed Shodan index data for AI service signatures and verified a statistical sample to estimate real-world exposure. Of 490,295 hosts indexed by Shodan, approximately 140,000 appear to be running AI services in their default configurations, a 3.5x inflation factor between passive indexing and confirmed services. Many of these services, including LLM inference endpoints, ML tracking servers, and agent gateways, appear to be running without authentication enabled, consistent with their default installation settings.
Methodology: Why These Numbers Are Different
Most internet exposure reports cite Shodan counts as confirmed findings. We do not. Shodan identifies open ports and banner matches, but does not verify that a service is actually running the claimed software. Our methodology separates port detection from confirmed exposure:
- Shodan Index Analysis. We query the Shodan search engine for hosts matching port numbers, HTTP headers, and banner strings associated with AI services.
- Banner and Header Verification. We analyze Shodan's cached banner data and HTTP response headers to determine whether the indexed service matches the expected software signature.
- Confirmation Rate. Only hosts whose Shodan-indexed responses match the expected protocol are counted as confirmed. The confirmation rate is extrapolated to estimate real-world exposure.
In this sweep, raw Shodan counts exceeded our confirmed estimates by roughly 1.3x (MLflow) to 7.5x (Gradio), and none of the sampled MCP SSE hosts were confirmed. Many "detections" are TCP-open ports running unrelated services, HTTP timeouts, or honeypots.
| Category | Shodan Count | Sampled | Confirmed | Rate | Est. Real |
|---|---|---|---|---|---|
| Ollama LLM Inference | 224,551 | 20 | 5 | 25% | ~56,000 |
| OpenClaw/Clawdbot | 249,366 | 20 | 6 | 30% | ~75,000 |
| Jupyter Notebooks | 15,097 | 20 | 11 | 55% | ~8,300 |
| MLflow Tracking | 984 | 20 | 15 | 75% | ~740 |
| Gradio ML Demos | 233 | 15 | 2 | 13% | ~30 |
| MCP SSE Endpoints | 64 | 10 | 0 | 0% | 0 |
| Total | 490,295 | | | | ~140,000 |
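The "Est. Real" column is the raw Shodan count scaled by the sample confirmation rate. The sketch below reproduces that arithmetic from the table rows; the names `ROWS` and `estimate_real` are ours for illustration and are not part of any published tooling.

```python
# Rows copied from the table above: (category, shodan_count, sampled, confirmed)
ROWS = [
    ("Ollama LLM Inference", 224_551, 20, 5),
    ("OpenClaw/Clawdbot", 249_366, 20, 6),
    ("Jupyter Notebooks", 15_097, 20, 11),
    ("MLflow Tracking", 984, 20, 15),
    ("Gradio ML Demos", 233, 15, 2),
    ("MCP SSE Endpoints", 64, 10, 0),
]


def estimate_real(shodan_count: int, sampled: int, confirmed: int) -> float:
    """Scale the raw Shodan count by the sample confirmation rate."""
    return shodan_count * confirmed / sampled


total_indexed = sum(count for _, count, _, _ in ROWS)
total_estimated = sum(estimate_real(c, s, k) for _, c, s, k in ROWS)

for name, count, sampled, confirmed in ROWS:
    print(f"{name}: {estimate_real(count, sampled, confirmed):,.0f}")
print(f"Total indexed: {total_indexed:,}")
print(f"Total estimated: {total_estimated:,.0f}")
print(f"Inflation factor: {total_indexed / total_estimated:.1f}x")
```

The per-category figures in the table are these values rounded to two significant digits or so, which is why the summed estimate is reported as approximately 140,000.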
Geographic Distribution
Shodan port detections by country (raw counts, not confirmation-adjusted):
| Country | Detections |
|---|---|
| China | 68,106 |
| United States | 46,749 |
| Israel | 36,076 |
| Germany | 28,431 |
| Hong Kong | 18,922 |
| Singapore | 14,817 |
| Japan | 12,503 |
| France | 9,241 |
| United Kingdom | 8,156 |
| South Korea | 7,892 |
Ollama — ~56,000 Estimated Instances
Shodan indexed 224,551 hosts on Ollama's default port (11434). Analysis of Shodan's cached response data shows a 25% rate of valid Ollama signatures, giving an estimated ~56,000 real instances.
Ollama ships with no built-in authentication. A native install binds to 127.0.0.1 by default, but the official Docker image and any setup that sets OLLAMA_HOST to 0.0.0.0 listens on all interfaces, so an instance deployed that way and exposed to the internet allows unauthenticated access to its API. This is a known configuration issue, not unique to any specific deployment.
This finding drove the creation of HMA checks LLM-001 through LLM-004, covering unauthenticated model listing, inference access, model download capability, and resource consumption risks.
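As an illustration of what a "valid Ollama signature" can look like, the sketch below classifies a cached HTTP response using two well-known Ollama behaviors: the root path answers with the plain-text banner "Ollama is running", and `/api/tags` returns JSON with a top-level `models` list. This is a minimal sketch, not the matcher used for this report; the function name `looks_like_ollama` and its inputs are hypothetical.

```python
import json


def looks_like_ollama(path: str, status: int, body: str) -> bool:
    """Return True if a cached HTTP response matches a known Ollama signature."""
    if status != 200:
        return False
    if path == "/":
        # Ollama's root endpoint answers with a fixed plain-text banner.
        return body.strip() == "Ollama is running"
    if path == "/api/tags":
        # The model-listing endpoint returns JSON with a top-level "models" list.
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            return False
        return isinstance(data, dict) and isinstance(data.get("models"), list)
    # Other paths are not covered by this sketch.
    return False
```

A port that is merely TCP-open on 11434, or that serves an unrelated web page, fails both checks and would be counted as a non-match.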
OpenClaw — ~75,000 Estimated Gateways
Shodan indexed 249,366 hosts on OpenClaw's default port (18789). Analysis of cached response signatures yields a 30% match rate, giving an estimated ~75,000 real instances.
Review of OpenClaw's open-source code reveals that the default configuration does not enable authentication. The config.get API method, by design, returns the full configuration object, which may include integration tokens and API keys. This is a documented default behavior, not a vulnerability we discovered through access.
The high non-match rate (70%) is explained by port 18789 being shared with other services. Many Shodan results are TCP-open but not HTTP.
Jupyter Notebooks — ~8,300 Estimated
Jupyter had the second-highest signature match rate at 55%, behind MLflow at 75%. Jupyter's HTTP headers and HTML content are distinctive, making banner-based verification more reliable than for most other services.
Shodan's cached data shows a mix of instances with login pages (requiring password or token) and instances presenting open notebook interfaces, indicating default configurations without authentication enabled.
Shodan geographic data shows concentrations in US cloud infrastructure, East Asian hosting providers, and academic networks.
MLflow — ~740 Estimated, Default Config Has No Auth
MLflow had the highest Shodan signature match rate at 75%. This is expected because MLflow's tracking server ships with no authentication enabled and its API response headers are unambiguous.
MLflow's documentation notes that the default installation has no authentication. Any instance exposed to the internet with default settings would allow public access to experiment metadata and model artifacts. This is a well-known configuration gap acknowledged by the MLflow project.
CLAUDE.md File Exposure — 47 Indexed by Shodan
Shodan indexed 47 hosts where CLAUDE.md files were detectable via HTTP headers or directory listings in Shodan's cached data.
CLAUDE.md files typically contain system instructions for AI agents, including behavioral rules, tool access policies, and configuration details. When served publicly, these files can reveal information about an agent's capabilities and internal architecture.
The security risk of exposed agent configuration files:
- Tool access surface — reveals what capabilities the agent has
- Decision logic — may expose authorization rules and guardrail implementations
- Infrastructure details — may reference internal service names and endpoints
.env File Exposure — 199 Directory Listings
Shodan identified 199 hosts with directory listings that expose .env files. These were not individually verified via HTTP probe, but directory listings are high-confidence indicators — if the directory index is visible, the files within it are typically downloadable.
Environment files commonly contain database credentials, API keys, OAuth secrets, and internal service URLs. A single exposed .env file can compromise an entire application stack.
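To make the directory-listing indicator concrete, here is a minimal sketch that extracts link targets from an auto-generated index page and flags entries named `.env` or `.env.*`. It assumes the listing is ordinary HTML with `<a href>` entries, as produced by common web servers; the class and function names are hypothetical, and this is not the detection logic used for the counts above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def env_files_in_listing(html: str) -> list[str]:
    """Return listing entries whose file name is .env or starts with .env."""
    parser = LinkCollector()
    parser.feed(html)
    # Keep only the last path component of each link target.
    names = [h.rstrip("/").rsplit("/", 1)[-1] for h in parser.hrefs]
    return [n for n in names if n == ".env" or n.startswith(".env.")]
```

Running this against the index page of your own web root is a quick way to see whether environment files would appear in a public listing.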
What Was NOT Confirmed
Honest reporting requires acknowledging what we could not verify:
- MCP SSE Endpoints: 0 of 10 confirmed. Shodan flagged 64 hosts, but none responded to our SSE probes with valid MCP protocol content. These Shodan counts should not be cited as confirmed MCP exposure.
- Gradio ML Demos: 2 of 15 confirmed (13%). Most probes timed out or returned non-Gradio content. The estimated ~30 real instances is too small a number to draw conclusions from.
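Samples of 10 to 20 hosts leave wide uncertainty around each confirmation rate. One common way to quantify it is a Wilson score interval, sketched below. This is our illustration, not part of the report's methodology; the function name `wilson_interval` and the choice of a roughly 95% level (z = 1.96) are assumptions.

```python
import math


def wilson_interval(confirmed: int, sampled: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    p = confirmed / sampled
    denom = 1 + z**2 / sampled
    center = (p + z**2 / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z**2 / (4 * sampled**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Sample sizes and confirmed counts taken from the summary table.
for name, sampled, confirmed in [
    ("MCP SSE Endpoints", 10, 0),
    ("Gradio ML Demos", 15, 2),
    ("Ollama LLM Inference", 20, 5),
]:
    low, high = wilson_interval(confirmed, sampled)
    print(f"{name}: {confirmed}/{sampled} confirmed, rate between {low:.0%} and {high:.0%}")
```

Note that zero confirmations in ten samples does not give an interval of exactly zero, which is another reason to treat the MCP and Gradio rows as inconclusive rather than as evidence of absence.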
HMA Check Coverage
Every finding in this report maps to a detection check in HackMyAgent. Run these against your own infrastructure:
| Finding | HMA Check | Severity |
|---|---|---|
| Unauthenticated Ollama | LLM-001 to LLM-004 | Critical |
| OpenClaw Gateway Exposed | GATEWAY-001 to GATEWAY-008 | Critical |
| Jupyter No Auth | AITOOL-001 | Critical |
| MLflow Unauthenticated | AITOOL-003 | Critical |
| CLAUDE.md Exposed | WEBEXPOSE-001 | High |
| .env File Exposure | WEBEXPOSE-002 | Critical |
| MCP Tools Exposed | MCP-011 | Critical |
Verification and False Positive Filtering
Shodan index data frequently contains false positives where open TCP ports do not correspond to the expected service. Our banner analysis methodology filters these out. In our sample verification, the share of Shodan-indexed hosts that failed signature matching ranged from 25% (MLflow) and 45% (Jupyter) to 70-75% (OpenClaw, Ollama), 87% (Gradio), and 100% (MCP SSE); most of these were TCP-open ports running unrelated services.
Recommendations
If you are running AI agents in production:
- Audit your network exposure. Run `hackmyagent scan your-domain.com` to check what is reachable from the internet.
- Protect CLAUDE.md and config files. Configure your web server to deny access to /.claude/, /CLAUDE.md, /mcp.json, /.env.
- Authenticate all AI service endpoints. Ollama, MLflow, Jupyter, and agent gateways must require authentication. Default configurations bind to 0.0.0.0 with no auth.
- Do not expose ML infrastructure to the internet. MLflow tracking servers and Jupyter notebooks belong behind a VPN or private network. There is no production use case for public unauthenticated access.
- Scan plugins before installing. Use static analysis to detect dangerous patterns in plugin code before execution.
- Rotate exposed credentials immediately. If your config files were publicly accessible, assume any credentials in them are compromised.
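As a complement to the recommendations above, the following sketch checks whether the sensitive paths named in the list are being served by a host you operate. It is a minimal illustration using only the Python standard library, not a replacement for HackMyAgent; the function names, the injectable `fetch` parameter, and the choice to treat 401, 403, and 404 as "blocked" are our assumptions. Run it only against infrastructure you own.

```python
import urllib.error
import urllib.request

# Paths taken from the recommendation list above.
SENSITIVE_PATHS = ["/.claude/", "/CLAUDE.md", "/mcp.json", "/.env"]


def is_blocked(status: int) -> bool:
    """Treat 401, 403 and 404 as 'not served'; anything else needs a manual look."""
    return status in (401, 403, 404)


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Issue a HEAD request and return the HTTP status code."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as exceptions; the code is what we want.
        return err.code


def audit(base_url: str, fetch=fetch_status) -> dict[str, bool]:
    """Map each sensitive path to True if it appears blocked, False otherwise."""
    base = base_url.rstrip("/")
    return {path: is_blocked(fetch(base + path)) for path in SENSITIVE_PATHS}
```

Calling `audit("https://your-domain.com")` returns a dictionary keyed by path; any `False` entry deserves a closer look. Connection failures raise `urllib.error.URLError`, which this sketch deliberately leaves unhandled so that unreachable hosts are not mistaken for protected ones.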
Legal Notice: This research is based on analysis of data from the Shodan search engine, a publicly available internet index, combined with review of open-source project documentation and default configurations. No systems were accessed, tested, or exploited. No authentication mechanisms were bypassed. No private data was retrieved or stored. All statistics represent aggregate analysis of publicly indexed information. No specific organizations, IP addresses, or domains are identified in this report.
Responsible Disclosure: If you believe your organization's infrastructure is affected by the patterns described in this report, we encourage you to audit your own systems using the open-source tools referenced here. For coordinated disclosure inquiries, contact info@opena2a.org.
About: This research is conducted by OpenA2A, an open-source AI agent security research project. Detection checks referenced in this report are available in HackMyAgent (Apache-2.0).