Behavioral Sweep Methodology
How ARIAtrap turns honeypot fleet telemetry into Threat Matrix-classified observations and the monthly Behavioral Threat Report.
Data sources
ARIAtrap consumes three sources, in priority order:
- TrapMyAgent fleet telemetry: the OpenA2A honeypot fleet ships @opena2a/veil-middleware v0.4.1+ with three event emitters: page visits, canary triggers, and authenticated-event tags. Telemetry lands in the OpenA2A Registry.
- AgentPwn scenario telemetry: agentpwn.com scenario completions. Each completion is one observed adversarial workflow.
- HoneyFinder scan results: opportunistic scans of public AI-agent-adjacent surfaces matching honeypot-archetype indicators.
Classification rule
Every observation maps to exactly one primary Threat Matrix technique (T-NNNN). Multi-technique observations carry an array of secondary IDs. There are no per-product taxonomies, no soft tags. If an observation does not map to any existing technique, ARIAtrap files a Threat Matrix governance proposal before the observation appears in any published report.
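
The rule above can be sketched as a small data structure. This is a minimal illustration, not ARIAtrap's implementation: the names `Classification`, `classify`, and `NeedsGovernanceProposal` are hypothetical, the reading of T-NNNN as exactly four digits is an assumption, and the IDs in the usage note are placeholders rather than real Threat Matrix entries.

```python
import re
from dataclasses import dataclass

# Assumption: "T-NNNN" means the letter T, a hyphen, and four digits.
TECHNIQUE_ID = re.compile(r"T-\d{4}")


class NeedsGovernanceProposal(Exception):
    """Hypothetical signal: no existing technique fits the observation."""


@dataclass(frozen=True)
class Classification:
    primary: str                     # exactly one primary technique
    secondary: tuple[str, ...] = ()  # zero or more secondary techniques


def classify(matched_ids: list[str], known_ids: set[str]) -> Classification:
    """Build a classification from candidate IDs, most relevant first."""
    for tid in matched_ids:
        if not TECHNIQUE_ID.fullmatch(tid):
            raise ValueError(f"not a Threat Matrix ID: {tid!r}")
    # Keep only techniques that already exist, dropping duplicates in order.
    valid = [t for t in dict.fromkeys(matched_ids) if t in known_ids]
    if not valid:
        # Nothing fits: the observation stays out of published reports
        # until a governance proposal has been filed.
        raise NeedsGovernanceProposal("no existing technique matches")
    return Classification(primary=valid[0], secondary=tuple(valid[1:]))
```

For example, `classify(["T-0002", "T-0001"], {"T-0001", "T-0002"})` yields a primary of `T-0002` with `T-0001` as the only secondary ID, while a list containing no known technique raises `NeedsGovernanceProposal`.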
Evidence classes
| Class | Definition |
|---|---|
| laboratory-validated | Reproduced in sandbox by ARIAred. No fleet observation yet. |
| observed-single-fleet | Fleet telemetry confirms at least one observation on a single property within the reporting window. |
| observed-cross-fleet | Same payload appears on three or more properties within twenty-four hours. The strongest tier — beyond MITRE ATT&CK's "observed in real ops." |
Cross-property attribution
Two correlation primitives:
- Fingerprint correlation: same JA3 + header hash on multiple fleet properties within a six-hour window is treated as one attacker session.
- Token correlation: a canary token emitted by property A and resolved by property B is a cross-property reasoning event — the rarest, most diagnostic signal in the fleet. The fleet linkage graph exists specifically to interpret these.
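
Fingerprint correlation can be illustrated with a short grouping routine. This is a sketch under stated assumptions, not the fleet's actual correlator: the `Event` fields and function name are hypothetical, and the six-hour window is assumed to be anchored at the first event of each session rather than measured as a gap between consecutive events. Token correlation and the fleet linkage graph are not modeled.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    property_id: str   # which fleet property saw the request
    ja3: str           # TLS client fingerprint
    header_hash: str   # hash of the request headers
    seen_at: datetime


SESSION_WINDOW = timedelta(hours=6)


def correlate_sessions(events):
    """Return sessions (lists of events) that span more than one property."""
    by_fingerprint = defaultdict(list)
    for e in events:
        by_fingerprint[(e.ja3, e.header_hash)].append(e)

    sessions = []
    for group in by_fingerprint.values():
        group.sort(key=lambda e: e.seen_at)
        current = [group[0]]
        for e in group[1:]:
            # Assumption: the window is measured from the session's first event.
            if e.seen_at - current[0].seen_at <= SESSION_WINDOW:
                current.append(e)
            else:
                sessions.append(current)
                current = [e]
        sessions.append(current)

    # Only sessions touching several properties count as cross-property.
    return [s for s in sessions if len({e.property_id for e in s}) > 1]
```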
Sophistication scoring (distribution-only)
Fleet partitioning
Every observation carries a fleetCoverage class:
- baseline — Common-Crawl-visible by design. The existing fleet sites.
- authenticated — per-honeypot bespoke auth flows. No shared scaffolding (each authenticated property has its own login UX so the fleet does not produce a uniform attacker-detectable fingerprint).
- dynamic — per-fingerprint content variation server.
- social — Mastodon, Reddit, X, LinkedIn presences.
The Common-Crawl-gap signal is the delta between baseline and the other three classes. Reports always cite numbers partitioned by coverage class.
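
A rough sketch of the partitioning follows. The text does not define the delta precisely, so this assumes it is the simple difference in observation counts between each non-baseline class and baseline; the function names and the dictionary shape of an observation are hypothetical.

```python
COVERAGE_CLASSES = ("baseline", "authenticated", "dynamic", "social")


def partition_counts(observations):
    """Count observations per fleetCoverage class, rejecting unknown classes."""
    counts = {c: 0 for c in COVERAGE_CLASSES}
    for obs in observations:
        coverage = obs["fleetCoverage"]
        if coverage not in counts:
            raise ValueError(f"unknown fleetCoverage class: {coverage!r}")
        counts[coverage] += 1
    return counts


def common_crawl_gap(counts):
    """Assumed definition: per-class count minus the baseline count."""
    base = counts["baseline"]
    return {c: counts[c] - base for c in COVERAGE_CLASSES if c != "baseline"}
```

Keeping the four counts separate until the last step mirrors the reporting rule: a report can cite `partition_counts` output directly and derive the gap from it, never the other way round.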
Confidence intervals
ARIApulse computes 95% CIs on every count published in a Behavioral Threat Report: Wilson score for binary metrics, normal approximation for counts (Poisson exact when n is below one hundred), and Wald with continuity correction for cross-partition ratios. Partitions with insufficient data (n below fifty) are not estimated. They are reported as "Insufficient data this month. Resumes when n is at least fifty."
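
To make the binary-metric case concrete, the sketch below implements the standard Wilson score interval together with the insufficient-data rule. It is illustrative only and is not ARIApulse code: the function names and output format are hypothetical, and the Poisson and Wald cases are left out.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal
MIN_N = 50       # partitions below this size are not estimated
INSUFFICIENT = "Insufficient data this month. Resumes when n is at least fifty."


def wilson_interval(successes: int, n: int, z: float = Z_95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def report_binary_metric(successes: int, n: int) -> str:
    """Format a proportion with its 95% CI, or the insufficient-data notice."""
    if n < MIN_N:
        return INSUFFICIENT
    low, high = wilson_interval(successes, n)
    return f"{successes / n:.1%} (95% CI {low:.1%} to {high:.1%})"
```

Unlike the plain normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when the observed proportion is near 0 or 1, which is common for rare fleet events.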
Reproducibility standard
Every count in a Behavioral Threat Report is reproducible from the public OpenA2A Registry telemetry export bundle for the reporting window. The bar: a researcher with bundle access can reproduce every cited count to within five percent, within two hours.
If a count cannot meet this bar, it does not appear in the report. It sits in our internal observations record as a research-debt entry. An empty cell beats a fabricated number.
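
The five-percent half of the bar can be expressed as a simple gate. This is a hedged sketch: the function names are hypothetical, the tolerance is assumed to be relative to the published count, and the two-hour limit is a human-effort bound that code like this does not capture.

```python
def within_tolerance(published: int, reproduced: int, percent: int = 5) -> bool:
    """True if the reproduced count is within `percent` of the published one.

    Integer arithmetic avoids floating-point edge cases at the boundary.
    A published count of zero must be reproduced exactly.
    """
    return abs(reproduced - published) * 100 <= percent * published


def split_for_report(counts):
    """counts: {metric_name: (published, reproduced)}.

    Returns the counts that may be published and the names that become
    research-debt entries instead.
    """
    publishable, research_debt = {}, []
    for name, (published, reproduced) in counts.items():
        if within_tolerance(published, reproduced):
            publishable[name] = published
        else:
            research_debt.append(name)
    return publishable, research_debt
```

A metric that fails the gate is dropped from the publishable set entirely, matching the rule that an empty cell is preferred over an unverifiable number.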