Signal-to-decision latency.
A reproducible reference benchmark for measuring the gap between a signal arriving at the operational stack and the action landing in the system of record. Three baselines, four percentile measurements, modality-drift comparison.
The IDEaS Multi-Domain Fusion challenge, MAIA’s field study, and every operator we have interviewed agree that latency is the property that separates the demo from the deployment. This benchmark gives the question a measurable answer.
What this benchmark measures
Signal-to-decision latency is the time elapsed between the moment a structured event is delivered to an ingest endpoint and the moment a decision proposal lands in an operator-visible queue. We report four percentiles (p50, p75, p95, p99) across three architecture baselines, and we run the benchmark twice: once with steady-state inputs and once with modality drift injected into 18% of incoming events.
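As a concrete reading of the metric, here is a minimal Python sketch that computes the four reported percentiles from per-event timestamps. The function name and the nearest-rank method are our illustration, not the published harness API.

```python
# Minimal sketch of the latency metric, assuming one ingest timestamp and
# one proposal timestamp per event (seconds). Names are illustrative.
def latency_percentiles(ingest_ts, proposal_ts):
    """Return the four reported percentiles (nearest-rank) in seconds."""
    samples = sorted(p - i for i, p in zip(ingest_ts, proposal_ts))
    def pct(q):
        return samples[min(len(samples) - 1, round(q * (len(samples) - 1)))]
    return {"p50": pct(0.50), "p75": pct(0.75),
            "p95": pct(0.95), "p99": pct(0.99)}

# Example: five events with latencies 1..5s yield p50 = 3s, p99 = 5s.
print(latency_percentiles([0, 0, 0, 0, 0], [1, 2, 3, 4, 5]))
```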
The benchmark deliberately ignores model-internal latency. The purpose is to measure end-to-end operational latency the operator actually feels, including ingest queue, ontology resolution, scorer execution, action gate evaluation, and write-back to the ledger. Architecture choices that look fast in isolation can lose to architectures that look slow in isolation but compose well.
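The scope of the measurement can be made concrete with a small instrumentation sketch. The five stage names mirror the prose; `timed_pipeline` and the no-op stages are hypothetical stand-ins for whatever each baseline actually runs.

```python
import time

# Stage names follow the prose above; the callables are placeholders.
STAGES = ["ingest_queue", "ontology_resolution", "scorer_execution",
          "action_gate", "ledger_write_back"]

def timed_pipeline(event, stage_fns):
    """Run event through one function per stage; return per-stage and
    end-to-end seconds. Only the end-to-end figure feeds the percentiles;
    per-stage timings are diagnostic."""
    t0 = time.monotonic()
    per_stage, payload = {}, event
    for name, fn in zip(STAGES, stage_fns):
        start = time.monotonic()
        payload = fn(payload)
        per_stage[name] = time.monotonic() - start
    return per_stage, time.monotonic() - t0

# Demo with no-op stages; real baselines plug in their own callables.
print(timed_pipeline({"id": 1}, [lambda e: e] * len(STAGES)))
```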
Three baseline architectures
- Baseline A — point-to-point integration. The current dominant pattern in operational software: sensor → middleware → SoR, with bespoke per-source connectors and a human in the loop reviewing each decision. Representative of the typical multi-site operator stack we surveyed in our 2026 field study.
- Baseline B — message-bus fan-in. A message bus (Kafka or equivalent) ingests every source. Decision logic runs as topic consumers. Outputs land back on the bus and are persisted by a separate audit consumer. Common in well-resourced enterprise stacks.
- Baseline C — ontology-first fusion (MAIA architecture). Sources resolve to a shared ontology at ingest. Scorers run over the graph. Action gates emit proposals carrying lineage. Audit chain anchors every output. The architecture specified in our 2026 Operational Ontology paper. (Toy sketches of all three data flows follow this list.)
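To make the three topologies concrete, here is a toy, self-contained Python model of each data flow. Every component here (queues, connectors, the graph dict) is an illustrative stand-in, not harness code; the point is the shape of the flow, not the implementation.

```python
from queue import Queue

def baseline_a(event):
    # A: bespoke per-source connector -> human review queue -> SoR write.
    connector = {"radar": lambda e: {**e, "normalised": True}}[event["source"]]
    review = Queue()
    review.put(connector(event))
    sor = []                              # system of record
    sor.append(review.get())              # operator approves, write-back
    return sor[-1]

def baseline_b(event):
    # B: bus fan-in; decision logic and audit persistence are consumers.
    bus = {"ingest": Queue(), "decisions": Queue()}
    bus["ingest"].put(event)
    proposal = {**bus["ingest"].get(), "decision": "propose"}
    bus["decisions"].put(proposal)
    audit_log = [bus["decisions"].get()]  # separate audit consumer persists
    return audit_log[-1]

def baseline_c(event):
    # C: resolve to the shared ontology at ingest; everything downstream
    # is a graph-local operation carrying lineage.
    graph = {}
    node_id = f"entity:{event['source']}"
    graph[node_id] = {**event, "lineage": [node_id]}    # resolution, once
    score = 0.9                                         # scorer over graph
    return {"node": node_id, "score": score,
            "lineage": graph[node_id]["lineage"]}       # gate emits proposal

for b in (baseline_a, baseline_b, baseline_c):
    print(b({"source": "radar", "ts": 0.0}))
```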
Reference numbers
The numbers below are from the steady-state run on the public reference dataset. Each entry is the latency in seconds at the stated percentile across 50,000 ingested events. Lower is better.
Under modality drift, where 18% of incoming events arrive in a new schema from a previously untrained sensor, the gap widens. Baseline A degrades by 4.2x at p99, Baseline B by 2.6x, and Baseline C by 1.4x. The ontology layer absorbs the drift because schema evolution is declarative rather than coded into the integration path.
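A worked example of how the multipliers read: the steady-state p99 values below are hypothetical placeholders, not the published reference numbers; only the 4.2x / 2.6x / 1.4x factors come from the drift run.

```python
# HYPOTHETICAL steady-state p99 seconds, used purely to show how the
# drift multipliers compose with the steady-state table.
steady_p99 = {"A": 10.0, "B": 5.0, "C": 2.0}
drift_factor = {"A": 4.2, "B": 2.6, "C": 1.4}   # from the drift run

for b in "ABC":
    drifted = steady_p99[b] * drift_factor[b]
    print(f"Baseline {b}: p99 {steady_p99[b]:.1f}s -> {drifted:.1f}s under 18% drift")
```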
Why ontology-first wins on latency
Cross-system reasoning is the slow operation in every operational stack. Latency-killing work happens between systems, not inside them. The ontology architecture eliminates that work by performing entity resolution once at ingest and treating every downstream query as a graph traversal in a single store. There is no per-decision integration step to time, because the integration happened the moment the event arrived.
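The claim reduces to a small sketch: resolution writes into one graph at ingest, and every decision-time query is a local traversal of that graph. The dict-based store and field names are assumptions for illustration, not the ontology layer itself.

```python
# One in-memory graph standing in for the shared ontology store.
graph = {"edges": {}, "attrs": {}}

def ingest(event):
    # Entity resolution happens exactly once, here, at ingest.
    node = f"{event['type']}:{event['id']}"
    graph["attrs"][node] = event
    for ref in event.get("refs", []):
        graph["edges"].setdefault(node, set()).add(ref)
    return node

def neighbors(node):
    # Decision-time "integration" is a graph traversal; there is no
    # per-decision join against external systems to wait on.
    return graph["edges"].get(node, set())

n = ingest({"type": "track", "id": 7, "refs": ["sensor:radar-2"]})
print(neighbors(n))   # {'sensor:radar-2'}
```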
Action-tier latency is governed by the same property. A severity-banded gate looks up the action’s reversibility window, classification cap, and policy lens in the same graph the decision was emitted from. There is no second join to a policy engine, because policy is bound to the entity at runtime.
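A hedged sketch of that lookup, with assumed field names (`reversibility_s`, `classification_cap`, `policy_lens`) bound to the action's graph entry; the gate reads policy off the same store the decision came from, so there is no second join.

```python
# Policy attributes bound to entities at ingest, per the text above.
# The action name and field names are hypothetical.
POLICY = {
    "action:divert-feed": {"reversibility_s": 300,
                           "classification_cap": "SECRET",
                           "policy_lens": "lens:standing-roe"},
}

LEVELS = ["UNCLASS", "CONFIDENTIAL", "SECRET", "TOP SECRET"]

def gate(proposal):
    pol = POLICY[proposal["action"]]            # same graph, no second join
    over_cap = (LEVELS.index(proposal["classification"])
                > LEVELS.index(pol["classification_cap"]))
    if over_cap:
        return ("hold", pol["policy_lens"])     # over the cap: escalate
    return ("emit", pol["reversibility_s"])     # reversible inside window

print(gate({"action": "action:divert-feed", "classification": "SECRET"}))
```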
Reproducing the benchmark
The reference dataset, the three baseline harnesses, and the measurement script are available on request to qualified procurement officers and academic collaborators. The workload is 50,000 events drawn from a synthetic operational distribution covering imagery, sensor telemetry, structured text, and RF events. The harness runs in approximately three hours on a single c5.4xlarge equivalent. Results are deterministic given the published RNG seed.
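The determinism claim amounts to deriving every stochastic choice in the harness from one root seed. A minimal sketch, with a placeholder seed value and an assumed workload shape:

```python
import random

PUBLISHED_SEED = 20260101          # placeholder, not the real seed

def make_workload(n=50_000, seed=PUBLISHED_SEED):
    rng = random.Random(seed)      # single root RNG; no global state
    modalities = ["imagery", "telemetry", "structured_text", "rf"]
    return [{"id": i, "modality": rng.choice(modalities),
             "ts": i * rng.uniform(0.9, 1.1)} for i in range(n)]

# Same seed, same 50,000 events, hence reproducible percentiles.
assert make_workload(5) == make_workload(5)
```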
We deliberately publish these numbers from the steady-state run rather than the most favourable configuration. The point of a benchmark is to be defensible against an inspector general or an academic reviewer, not to look impressive.
Caveats
Benchmarks measure what they measure. The numbers above are end-to-end ingest-to-proposal latency on a synthetic distribution, not a guarantee of production behaviour at scale. Modality drift in the real world is rarely as clean as our 18% injection. And point-to-point baselines vary widely depending on how aggressively the operator has invested in middleware automation.
We expose the harness to encourage other vendors to publish their numbers. The category needs a public reference, so an operator reading competing benchmarks can compare like with like instead of weighing glossy decks against each other.
Companion to MAIA’s 2026 Operational Ontology paper and Multi-Domain Fusion methodology paper. The benchmark is the third leg of the methodology: the ontology specifies the layer, the fusion paper specifies the architecture, and this benchmark specifies how to measure whether the architecture delivers.
