What this benchmark measures
Signal-to-decision latency is the time elapsed between the moment a structured event is delivered to an ingest endpoint and the moment a decision proposal lands in an operator-visible queue. We report four percentiles (p50, p75, p95, p99) for each of three architecture baselines, and we run the benchmark twice: once with steady-state inputs and once with modality drift injected into 18% of incoming events.
The benchmark deliberately does not measure model-internal latency in isolation. Its purpose is to measure the end-to-end operational latency the operator actually feels, covering the ingest queue, ontology resolution, scorer execution, action gate evaluation, and write-back to the ledger. Architecture choices that look fast in isolation can lose to architectures that look slow in isolation but compose well.
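As a minimal sketch of how the four percentiles could be derived from raw timestamps: the snippet below assumes each event is recorded as a pair of ingest and proposal timestamps in seconds, and it uses the nearest-rank percentile method. Both the record shape and the percentile method are assumptions made for illustration. The benchmark text does not state which method the measurement script uses, and the function names are hypothetical.

```python
import math


def nearest_rank_percentile(sorted_values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not sorted_values:
        raise ValueError("no latencies recorded")
    # Multiply before dividing so integer percentiles avoid float rounding.
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def latency_percentiles(events, percentiles=(50, 75, 95, 99)):
    """events: iterable of (ingest_ts, proposal_ts) pairs, both in seconds."""
    latencies = sorted(proposal_ts - ingest_ts for ingest_ts, proposal_ts in events)
    return {f"p{p}": nearest_rank_percentile(latencies, p) for p in percentiles}


# Hypothetical data: event i is delivered at t = i and its proposal
# lands i + 1 seconds later, so the latencies are 1.0, 2.0, ..., 100.0.
sample = [(float(i), float(i) + (i + 1)) for i in range(100)]
result = latency_percentiles(sample)
print(result)
```

Because the clock starts at delivery and stops when the proposal becomes visible, every intermediate stage is counted without being timed separately.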
Three baseline architectures
Baseline A — point-to-point integration
The current dominant pattern in operational software. Data flows from sensor to middleware to system of record through bespoke per-source connectors, with a human in the loop reviewing each decision. It is representative of the typical multi-site operator stack we surveyed in our 2026 field study.
Baseline B — message-bus fan-in
A message bus (Kafka or equivalent) ingests every source. Decision logic runs as topic consumers. Outputs land back on the bus and are persisted by a separate audit consumer. Common in well-resourced enterprise stacks.
Baseline C — ontology-first fusion (MAIA architecture)
Sources resolve to a shared ontology at ingest. Scorers run over the graph. Action gates emit proposals carrying lineage. An audit chain anchors every output. This is the architecture specified in our 2026 Operational Ontology paper.
Reference numbers
The numbers below are from the steady-state run on the public reference dataset. Each percentile is the latency in seconds at that percentile across 50,000 ingested events. Lower is better.
- Baseline A · point-to-point: p50 182 · p75 412 · p95 1,940 · p99 11,700
- Baseline B · message-bus: p50 21 · p75 44 · p95 186 · p99 1,310
- Baseline C · ontology-first (MAIA): p50 1.4 · p75 3.8 · p95 12.1 · p99 58.4
Under modality drift, where 18% of incoming events arrive in a new schema from a previously unseen sensor, the gap widens. At p99, latency degrades by 4.2x for Baseline A, 2.6x for Baseline B, and 1.4x for Baseline C. The ontology layer absorbs the drift because schema evolution is declarative rather than coded into the integration path.
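The arithmetic behind "the gap widens" can be worked through as follows. This sketch assumes that "degrades by Nx" means the drift-run p99 equals the steady-state p99 multiplied by N. That reading is an assumption, and the resulting drift figures are derived values, not measured results from the benchmark.

```python
# Steady-state p99 latencies in seconds, taken from the reference numbers.
steady_p99 = {"A": 11_700.0, "B": 1_310.0, "C": 58.4}

# Reported p99 degradation factors under 18% modality drift.
drift_factor = {"A": 4.2, "B": 2.6, "C": 1.4}

# Assumption: degradation is multiplicative on the steady-state value.
drift_p99 = {name: steady_p99[name] * drift_factor[name] for name in steady_p99}

# Ratio of Baseline A to Baseline C at p99, before and after drift.
gap_steady = steady_p99["A"] / steady_p99["C"]
gap_drift = drift_p99["A"] / drift_p99["C"]

for name in steady_p99:
    print(f"Baseline {name}: implied drift p99 = {drift_p99[name]:,.1f} s")
print(f"A/C gap: {gap_steady:.0f}x steady-state, {gap_drift:.0f}x under drift")
```

Under that reading, the p99 ratio between Baseline A and Baseline C grows by a factor of 4.2 / 1.4 = 3, from roughly 200x to roughly 600x.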
Why ontology-first wins on latency
Cross-system reasoning is the slow operation in every operational stack. Latency-killing work happens between systems, not inside them. The ontology architecture eliminates that work by performing entity resolution once at ingest and treating every downstream query as a graph traversal in a single store. There is no per-decision integration step to time, because the integration happened the moment the event arrived.
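To make "resolve once at ingest, traverse afterwards" concrete, here is a deliberately small in-memory sketch. The class `OntologyGraph`, its methods, and the identifiers used in the example are all hypothetical and are not drawn from the MAIA specification. A production store would be persistent and would handle unknown aliases rather than raising `KeyError`.

```python
from collections import defaultdict, deque


class OntologyGraph:
    def __init__(self):
        self.alias_to_entity = {}        # (source, source_id) -> canonical entity id
        self.edges = defaultdict(set)    # entity id -> neighbouring entity ids
        self.events = defaultdict(list)  # entity id -> payloads attached at ingest

    def register_alias(self, source, source_id, entity_id):
        self.alias_to_entity[(source, source_id)] = entity_id

    def link(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def ingest(self, source, source_id, payload):
        # Entity resolution happens here, once, when the event arrives.
        entity_id = self.alias_to_entity[(source, source_id)]
        self.events[entity_id].append(payload)
        return entity_id

    def events_within(self, entity_id, max_hops):
        # Downstream queries are a breadth-first traversal of one store;
        # no per-source lookup is needed at decision time.
        seen = {entity_id}
        frontier = deque([(entity_id, 0)])
        found = []
        while frontier:
            node, depth = frontier.popleft()
            found.extend(self.events[node])
            if depth == max_hops:
                continue
            for neighbour in sorted(self.edges[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append((neighbour, depth + 1))
        return found


graph = OntologyGraph()
graph.register_alias("scada", "PMP-0007", "asset:pump-7")
graph.register_alias("cmms", "7731", "asset:pump-7")  # same pump, different source id
graph.register_alias("scada", "VLV-0002", "asset:valve-2")
graph.link("asset:pump-7", "asset:valve-2")

graph.ingest("scada", "PMP-0007", {"vibration_mm_s": 9.1})
graph.ingest("cmms", "7731", {"work_order": "open"})
graph.ingest("scada", "VLV-0002", {"position": "closed"})

nearby = graph.events_within("asset:pump-7", max_hops=1)
print(nearby)
```

The two source systems use different identifiers for the same pump, but the mapping cost is paid in `ingest`, so the query in `events_within` never touches source-specific keys.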
Action-tier latency is governed by the same property. A severity-banded gate looks up the action's reversibility window, classification cap, and policy lens in the same graph the decision was emitted from. There is no second join to a policy engine, because policy is bound to the entity at runtime.
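A severity-banded gate of this kind might look like the sketch below, where the policy attributes sit on the same node the decision refers to. The three attribute names come from the paragraph above. Everything else is hypothetical and chosen only for illustration: the severity bands, the classification levels, the tier names, and the rule that only low-severity reversible actions skip operator review.

```python
from dataclasses import dataclass

# Hypothetical classification ordering, lowest to highest.
CLASSIFICATION_RANK = {"unclassified": 0, "restricted": 1, "secret": 2}


@dataclass(frozen=True)
class ActionPolicy:
    reversibility_window_s: int  # seconds during which the action can be undone
    classification_cap: str      # highest classification the action may act on
    policy_lens: str             # named policy view applied to the action


# Policy lives on the entity node itself, so the gate needs no external join.
graph = {
    "asset:pump-7": {
        "actions": {
            "throttle": ActionPolicy(900, "restricted", "safety"),
            "shutdown": ActionPolicy(0, "restricted", "safety"),
        }
    }
}


def evaluate_gate(graph, entity_id, action, severity, classification):
    """Return a proposal routed to a tier based on severity and node-local policy."""
    policy = graph[entity_id]["actions"][action]
    if CLASSIFICATION_RANK[classification] > CLASSIFICATION_RANK[policy.classification_cap]:
        tier = "blocked"
    elif severity == "low" and policy.reversibility_window_s > 0:
        tier = "auto_propose"
    else:
        tier = "operator_review"
    return {
        "entity": entity_id,
        "action": action,
        "tier": tier,
        "policy_lens": policy.policy_lens,
    }


print(evaluate_gate(graph, "asset:pump-7", "throttle", "low", "unclassified"))
print(evaluate_gate(graph, "asset:pump-7", "shutdown", "low", "unclassified"))
```

The gate performs a single dictionary lookup on the entity it was handed, which is the property the text attributes to keeping policy in the graph.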
Reproducing the benchmark
The reference dataset, the three baseline harnesses, and the measurement script are available on request to qualified procurement officers and academic collaborators. The workload is 50,000 events drawn from a synthetic operational distribution covering imagery, sensor telemetry, structured text, and RF events. The harness runs in approximately three hours on a single c5.4xlarge equivalent. Results are deterministic given the published RNG seed.
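The determinism claim rests on seeding the random number generator. The sketch below shows the general idea with a hypothetical generator. The uniform modality mix, the event fields, and the seed value 42 are assumptions, since the actual distribution and the published seed are part of the harness and are not reproduced here.

```python
import random

MODALITIES = ("imagery", "sensor_telemetry", "structured_text", "rf")


def generate_workload(n_events=50_000, drift_rate=0.0, seed=42):
    """Build a synthetic event list that is identical for a given seed."""
    rng = random.Random(seed)  # private generator, isolated from global state
    events = []
    for event_id in range(n_events):
        events.append(
            {
                "id": event_id,
                "modality": rng.choice(MODALITIES),
                # A drifted event stands in for one arriving in a new schema.
                "drifted": rng.random() < drift_rate,
            }
        )
    return events


steady = generate_workload()
drifted = generate_workload(drift_rate=0.18)
drift_share = sum(e["drifted"] for e in drifted) / len(drifted)
print(f"{len(steady)} steady-state events, drift share {drift_share:.3f}")
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the sequence reproducible even if other code in the harness also draws random numbers.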
We deliberately publish these numbers from the steady-state run rather than the most favourable configuration. The point of a benchmark is to be defensible against an inspector general or an academic reviewer, not to look impressive.
Caveats
Benchmarks measure what they measure. The numbers above are end-to-end ingest-to-proposal latency on a synthetic distribution, not a guarantee of production behaviour at scale. Modality drift in the real world is rarely as clean as our 18% injection. And point-to-point baselines vary widely depending on how aggressively the operator has invested in middleware automation.
We expose the harness to encourage other vendors to publish their numbers. The category needs a public reference, so that an operator reading competing benchmarks can compare like with like instead of comparing glossy decks.
Companion papers
This benchmark is the third leg of MAIA's 2026 methodology sequence. The framing comes from The State of Operational Decision-Making. The architecture under test is specified in The Operational Ontology and extended for defence use in Multi-Domain Fusion at Sovereign Scale. The ontology paper specifies the layer, the fusion paper specifies the architecture, and this benchmark specifies how to measure whether the architecture delivers.
