Behavioral Sweep Methodology

How ARIAtrap turns honeypot fleet telemetry into Threat Matrix-classified observations and the monthly Behavioral Threat Report.

Data sources

ARIAtrap consumes three sources, in priority order:

  1. TrapMyAgent fleet telemetry: the OpenA2A honeypot fleet ships @opena2a/veil-middleware v0.4.1+, which provides three event emitters (page visits, canary triggers, and authenticated-event tags). Telemetry lands in the OpenA2A Registry.
  2. AgentPwn scenario telemetry: scenario completions on Agentpwn.com. Each completion counts as one observed adversarial workflow.
  3. HoneyFinder scan results: opportunistic scans of public, AI-agent-adjacent surfaces that match honeypot-archetype indicators.

Classification rule

Every observation maps to exactly one primary Threat Matrix technique (T-NNNN). Multi-technique observations carry an array of secondary IDs. There are no per-product taxonomies, no soft tags. If an observation does not map to any existing technique, ARIAtrap files a Threat Matrix governance proposal before the observation appears in any published report.
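A minimal sketch of the classification rule, assuming a hypothetical Observation record whose primary field holds one technique ID (or None when nothing maps) and whose secondary field holds a list of further IDs. The T-NNNN regular expression is read off the notation above and is an assumption, not an official Threat Matrix schema. Observations without a primary technique are routed to a governance-proposal queue instead of the publishable set.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed ID format, taken from the "T-NNNN" notation; not an official schema.
TECHNIQUE_ID = re.compile(r"T-\d{4}")


@dataclass
class Observation:
    observation_id: str
    primary: str | None  # the single primary technique; None means unmapped
    secondary: list[str] = field(default_factory=list)


def validate(obs: Observation) -> None:
    """Reject free-form tags: every technique reference must be a T-NNNN ID."""
    ids = ([obs.primary] if obs.primary is not None else []) + obs.secondary
    for tid in ids:
        if not TECHNIQUE_ID.fullmatch(tid):
            raise ValueError(f"{obs.observation_id}: {tid!r} is not a T-NNNN ID")


def split_for_publication(observations: list[Observation]):
    """Return (publishable, needs_governance_proposal)."""
    publishable, pending = [], []
    for obs in observations:
        validate(obs)
        # Unmapped observations wait for a governance proposal before any report.
        (publishable if obs.primary is not None else pending).append(obs)
    return publishable, pending
```

The technique IDs used in any example input (such as T-0001) would be placeholders, not real Threat Matrix entries.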

Evidence classes

Class                   Definition
laboratory-validated    Reproduced in sandbox by ARIAred. No fleet observation yet.
observed-single-fleet   Fleet telemetry confirms at least one observation on a single property within the reporting window.
observed-cross-fleet    The same payload appears on three or more properties within twenty-four hours. This is the strongest tier, a stricter standard than MITRE ATT&CK's "observed in real ops."
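One way to assign these tiers, sketched under assumptions: the input is the list of (property, timestamp) sightings of a single payload within the reporting window, plus a flag saying whether ARIAred reproduced it in the sandbox. A sliding 24-hour window checks for three distinct properties. Treating two properties without a cross-fleet burst as single-fleet is an interpretation, not something the table states.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timedelta

CROSS_FLEET_MIN_PROPERTIES = 3
CROSS_FLEET_WINDOW = timedelta(hours=24)


def spans_cross_fleet(sightings: list[tuple[str, datetime]]) -> bool:
    """True if any 24-hour window holds sightings from >= 3 distinct properties."""
    events = sorted(sightings, key=lambda s: s[1])
    in_window = Counter()
    start = 0
    for prop, ts in events:
        in_window[prop] += 1
        # Slide the window start forward until it is within 24 hours of ts.
        while ts - events[start][1] > CROSS_FLEET_WINDOW:
            old = events[start][0]
            in_window[old] -= 1
            if in_window[old] == 0:
                del in_window[old]
            start += 1
        if len(in_window) >= CROSS_FLEET_MIN_PROPERTIES:
            return True
    return False


def evidence_class(lab_validated: bool, sightings: list[tuple[str, datetime]]) -> str | None:
    """Pick the strongest tier the evidence supports, or None if there is none."""
    if sightings and spans_cross_fleet(sightings):
        return "observed-cross-fleet"
    if sightings:
        # Assumption: fleet sightings without a cross-fleet burst stay single-fleet.
        return "observed-single-fleet"
    if lab_validated:
        return "laboratory-validated"
    return None
```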

Cross-property attribution

Two correlation primitives:

  • Fingerprint correlation: the same JA3 fingerprint and header hash seen on multiple fleet properties within a six-hour window are treated as one attacker session.
  • Token correlation: a canary token emitted by property A and resolved by property B is a cross-property reasoning event, the rarest, most diagnostic signal in the fleet. The fleet linkage graph exists specifically to interpret these.
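Both primitives can be sketched in a few lines. The Hit record, its field names, and the canary-token inputs are hypothetical, and anchoring each six-hour window at a session's first hit (rather than using a rolling gap) is an assumption. Token correlation is modeled as a lookup from each token to the property that emitted it, flagging any resolution on a different property.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

SESSION_WINDOW = timedelta(hours=6)


@dataclass(frozen=True)
class Hit:
    property_id: str
    ja3: str          # TLS client fingerprint
    header_hash: str  # hash of the request headers (hashing scheme assumed)
    ts: datetime


def fingerprint_sessions(hits: list[Hit]) -> list[list[Hit]]:
    """Group hits sharing JA3 + header hash into six-hour attacker sessions.

    Assumption: a session's window is anchored at its first hit.
    """
    by_key = defaultdict(list)
    for hit in hits:
        by_key[(hit.ja3, hit.header_hash)].append(hit)
    sessions = []
    for key_hits in by_key.values():
        key_hits.sort(key=lambda h: h.ts)
        current = [key_hits[0]]
        for hit in key_hits[1:]:
            if hit.ts - current[0].ts <= SESSION_WINDOW:
                current.append(hit)
            else:
                sessions.append(current)
                current = [hit]
        sessions.append(current)
    return sessions


def cross_property_token_events(emitted: dict[str, str], resolved: list[tuple[str, str]]):
    """Return (token, emitting_property, resolving_property) for every canary
    token resolved on a property other than the one that emitted it."""
    events = []
    for token, resolver in resolved:
        origin = emitted.get(token)
        if origin is not None and origin != resolver:
            events.append((token, origin, resolver))
    return events
```

Sessions that touch more than one property can then be selected with a filter such as len({h.property_id for h in s}) > 1; those are the multi-property sessions the fingerprint rule describes.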

Sophistication scoring (distribution-only)

Rubric in calibration; mean withheld pending classifier validation. ARIAtrap scores each session on a 1–10 scale and publishes the Sophistication Index only as a histogram of those per-session scores. No mean. No grade. No month-over-month delta on a mean. Once the NanoMind v3.x supervised classifier reaches F1 ≥ 0.7 on a holdout of 200 human-labeled sessions, the index will switch to a mean with a confidence interval. Until then, the shape of the distribution is the only published statistic.
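A sketch of the publication gate, assuming per-session integer scores and a classifier F1 measured elsewhere. The function name and report layout are hypothetical, and the methodology does not say which interval the post-gate index will use; the normal approximation below (mean ± 1.96 × standard error) is a placeholder assumption.

```python
import math
import statistics
from collections import Counter

F1_THRESHOLD = 0.7          # gate stated in the methodology
MIN_HOLDOUT_SESSIONS = 200  # size of the human-labeled holdout


def sophistication_report(scores, classifier_f1, holdout_sessions):
    """Histogram of per-session scores; mean and CI only once the gate is met."""
    counts = Counter(scores)
    out_of_range = sorted(s for s in counts if s not in range(1, 11))
    if out_of_range:
        raise ValueError(f"scores outside the 1-10 scale: {out_of_range}")
    # Report all ten bins, including empty ones, so the shape is unambiguous.
    report = {"histogram": {score: counts[score] for score in range(1, 11)}}
    gate_met = classifier_f1 >= F1_THRESHOLD and holdout_sessions >= MIN_HOLDOUT_SESSIONS
    if gate_met and len(scores) >= 2:
        mean = statistics.fmean(scores)
        # Assumed interval: normal approximation, mean +/- 1.96 * SE.
        half = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
        report["mean"] = mean
        report["ci95"] = (mean - half, mean + half)
    return report
```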

Fleet partitioning

Every observation carries a fleetCoverage class:

  • baseline: Common-Crawl-visible by design. These are the existing fleet sites.
  • authenticated: per-honeypot bespoke auth flows with no shared scaffolding. Each authenticated property has its own login UX, so the fleet does not present a uniform, attacker-detectable fingerprint.
  • dynamic: content varied per visitor fingerprint by a content-variation server.
  • social: Mastodon, Reddit, X, and LinkedIn presences.

The Common-Crawl-gap signal is the delta between baseline and the other three classes. Reports always cite numbers partitioned by coverage class.
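A sketch of partitioned counting, assuming each observation exposes its fleetCoverage value as a plain string. The methodology does not define "delta" precisely; reading it as each non-baseline class's count minus the baseline count is an assumption, and the helper names are hypothetical.

```python
from collections import Counter

COVERAGE_CLASSES = ("baseline", "authenticated", "dynamic", "social")


def counts_by_coverage(coverages):
    """Tally observations per fleetCoverage class, reporting every class."""
    counts = Counter()
    for cls in coverages:
        if cls not in COVERAGE_CLASSES:
            raise ValueError(f"unknown fleetCoverage class: {cls!r}")
        counts[cls] += 1
    return {cls: counts[cls] for cls in COVERAGE_CLASSES}


def common_crawl_gap(counts):
    """Assumed reading of the gap: each non-baseline count minus baseline."""
    base = counts["baseline"]
    return {cls: counts[cls] - base for cls in COVERAGE_CLASSES if cls != "baseline"}
```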

Confidence intervals

ARIApulse computes 95% CIs on every count published in a Behavioral Threat Report: the Wilson score interval for binary metrics; a normal approximation for counts, switching to the exact Poisson interval when n is below one hundred; and a Wald interval with continuity correction for cross-partition ratios. Partitions with insufficient data (n below fifty) are reported as "Insufficient data this month. Resumes when n is at least fifty." rather than estimated.
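The interval choices can be illustrated with the standard formulas. This is a sketch, not ARIApulse code: the function names are hypothetical, the exact Poisson (Garwood) bounds are found by bisection on the Poisson CDF so that only the standard library is needed, and the Wald ratio interval is left out. For a binary metric with k successes in n trials, the Wilson bounds are the center (p̂ + z²/2n)/(1 + z²/n) plus or minus (z/(1 + z²/n))·sqrt(p̂(1 − p̂)/n + z²/4n²).

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile
MIN_N = 50
POISSON_EXACT_BELOW = 100
INSUFFICIENT = "Insufficient data this month. Resumes when n is at least fifty."


def wilson_interval(successes, n, z=Z95):
    """Wilson score interval for a binary metric."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half


def _poisson_cdf(k, lam):
    # Direct summation; adequate for the small counts (< 100) routed here.
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def _bisect_decreasing(f, lo, hi, iters=200):
    # Root of a decreasing function with f(lo) > 0 > f(hi).
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_exact_interval(k, alpha=0.05):
    """Exact (Garwood) interval for a Poisson count k."""
    if k == 0:
        lower = 0.0
    else:
        lower = _bisect_decreasing(lambda lam: _poisson_cdf(k - 1, lam) - (1 - alpha / 2), 0.0, float(k))
    hi = k + 10 * math.sqrt(k + 1) + 10
    upper = _bisect_decreasing(lambda lam: _poisson_cdf(k, lam) - alpha / 2, float(k), hi)
    return lower, upper


def count_interval(k):
    """95% CI for a count, or None when the partition is below the n >= 50 floor."""
    if k < MIN_N:
        return None
    if k < POISSON_EXACT_BELOW:
        return poisson_exact_interval(k)
    half = Z95 * math.sqrt(k)  # normal approximation for larger counts
    return k - half, k + half


def report_cell(k):
    """Render a count for a report cell, or the insufficient-data notice."""
    ci = count_interval(k)
    if ci is None:
        return INSUFFICIENT
    return f"{k} (95% CI {ci[0]:.1f} to {ci[1]:.1f})"
```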

Reproducibility standard

Every count in a Behavioral Threat Report is reproducible from the public OpenA2A Registry telemetry export bundle for the reporting window. The bar: a researcher with access to the bundle can reproduce every cited count to within five percent, in under two hours.

If a count cannot meet this bar, it does not appear in the report. It sits in our internal observations record as a research-debt entry. An empty cell beats a fabricated number.
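A sketch of the five-percent check, assuming published and independently reproduced counts are available as name-to-count mappings. Measuring the tolerance relative to the published value is an assumption, and the two-hour time budget is not modeled. Counts that fail, or that could not be reproduced at all, go to the research-debt list instead of the report.

```python
TOLERANCE = 0.05  # five percent, relative to the published count (assumed)


def within_tolerance(published, reproduced, tol=TOLERANCE):
    if published == 0:
        return reproduced == 0
    return abs(reproduced - published) / published <= tol


def triage_counts(published, reproduced):
    """Split counts into (report_counts, research_debt).

    published / reproduced: hypothetical mappings of metric name -> count.
    """
    report, debt = {}, []
    for name, value in published.items():
        again = reproduced.get(name)
        if again is not None and within_tolerance(value, again):
            report[name] = value
        else:
            # An empty cell beats a fabricated number.
            debt.append(name)
    return report, debt
```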

See also