MinusOneDB
Architecture Deep-Dive

The database built for queries.

MinusOneDB is a parallel database where distributed search is the foundation, storage and compute live on the same node, and pricing is for capacity — not for asking questions. This is how 100–1000× price-performance is architected, not marketed.

Read the docs
100–1000×
Price-performance vs. pay-per-query warehouses
~2s
Write visibility — events queryable in near real time
$0
Per-query cost for agents, dashboards, and analysts
The default stack

One problem. Seven systems.

A typical analytics pipeline has a database, a warehouse, an ETL layer, a stream processor, a cache, a lake, and a search engine — each metered, each staffed, each a renewal cycle.

Operational DB: renewal + DBAs
Data Warehouse: $/query scanned
ETL Pipelines: eng team
Stream Processor: DevOps + lag
Cache Layer: invalidation hell
Data Lake: crawl jobs
Search Indexer: separate sync
MinusOneDB: one system

One system replaces all of it. Events land once, index themselves, and are queryable at capacity cost. The org chart collapses with the stack.

Storage + compute, reunited

Every query stops paying rent on the wire.

Modern warehouses put storage on one side of the network and compute on the other. Every query crosses the wire both ways — and the meter runs at either end. We put them back on the same node.

Storage: S3 / blob
Compute: warehouse
MinusOneDB: same node

Execution happens where the data lives. No network tax per query. Aggregations that used to stream petabytes across a wire run against the indexes in place.

Distributed search as the foundation

Search isn’t a feature. It’s the execution model.

In the legacy stack, search is a separate cluster fed by a crawl job — minutes-stale, cost-metered, forever drifting from the primary store. In MinusOneDB, every query resolves to a search operation. Text filter, faceted aggregation, numeric range, vector similarity: same engine, same write path, no second system to keep in sync.

Applications & Agents: REST + JS SDK
Query primitives: filter · facet · aggregate · rank · vector
Index structures: inverted · doc-value · vector · range
Distributed parallel shards: the foundation

Every shard carries inverted indexes (text + structured filters), doc-value columns (for sort and facet), vector indexes (similarity), and range trees (numeric + temporal). All maintained by the same write. No ETL, no reindex job, no drift between systems.
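To make "same engine, same write path" concrete, here is a sketch of a single request that touches all four index structures. The `HybridQuery` shape, its field names, and the mapping to primitives are illustrative assumptions, not the published SDK types.

```typescript
// Hypothetical query payload — shape and field names are illustrative,
// not the published MinusOneDB API.
type HybridQuery = {
  filter: Record<string, string>;        // inverted index: text + structured filters
  facet: string[];                       // doc-value columns: sort and facet
  range: { field: string; gte: number }; // range tree: numeric + temporal
  vector?: { field: string; near: number[]; k: number }; // vector index: similarity
};

const q: HybridQuery = {
  filter: { status: "active", title: "checkout error" },
  facet: ["region", "plan"],
  range: { field: "ts", gte: Date.now() - 86_400_000 }, // last 24h
  vector: { field: "embedding", near: [0.1, 0.4, 0.2], k: 20 },
};

// One request, one engine: all four index structures are consulted in a
// single pass rather than by four separate systems kept in sync.
console.log(Object.keys(q).length); // 4 primitives in one request
```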

The write path

We do the work at ingest time, so you don’t pay for it at query time.

You hand us a large dataset. We break it into chunks, stream them to N shard writers in parallel, and build every index around each document as it lands — inverted, doc-value, vector, range. Work a warehouse would do at query time happens at write time, once, amortised across every future read.
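The ingest fan-out described above can be sketched as follows. The `chunk` and `writeShard` helpers and the four-indexes-per-document accounting are illustrative stand-ins, not MinusOneDB internals.

```typescript
// Sketch of parallel shard ingest: round-robin chunking, independent
// writers, index build at write time. All names here are hypothetical.
type Doc = { id: number; body: string };

function chunk<T>(items: T[], n: number): T[][] {
  // Round-robin documents across n shard writers.
  const shards: T[][] = Array.from({ length: n }, () => []);
  items.forEach((item, i) => shards[i % n].push(item));
  return shards;
}

async function writeShard(docs: Doc[]): Promise<number> {
  // Each shard builds all four index structures as documents land:
  // inverted, doc-value, vector, range — simulated here as a count.
  let indexed = 0;
  for (const _ of docs) indexed += 4; // four indexes per document
  return indexed;
}

async function ingest(docs: Doc[], shardCount: number): Promise<number> {
  // All shard writers run in parallel: throughput scales with shardCount,
  // with no coordinator in the hot path.
  const results = await Promise.all(chunk(docs, shardCount).map(writeShard));
  return results.reduce((a, b) => a + b, 0);
}

const docs = Array.from({ length: 1000 }, (_, id) => ({ id, body: "event" }));
ingest(docs, 8).then((n) => console.log(n)); // 4000 index entries, built once at ingest
```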

Shift the work: write-time
Indexes are built once at ingest. Every future query reads pre-computed structures — not raw rows.
Parallel writers: N × throughput
Each shard ingests independently. Double the cluster, double the write rate — no coordinator bottleneck.
Write to queryable: ~2 seconds
By the time the ack lands, all four indexes exist. No ETL, no crawl job, no nightly rebuild.

A query against a warehouse is a computation. Against MinusOneDB, it’s a lookup. Warehouses scan rows every time you ask. We paid that cost once, at ingest — so every query that comes later just finds the answer waiting.

Four built-in stores

One API. Four data systems. Every workload.

One MinusOneDB environment exposes four optimised stores behind a single REST API + JS SDK — so you stop stitching together a database, a warehouse, a cache, and a lake yourself.

Search Store

Constant-time distributed search across petabytes of documents. Full-text, structured fields, facets, aggregations, range queries — in one pass.

single-ms needle queries · inverted + doc-value + vector · ~5M queries/mo base
  • Full-text + faceted queries in one request
  • Group-by aggregations without pre-rollup
  • Vector + text hybrid ranking
Replaces: Elasticsearch, Solr, the analytics side of Snowflake / BigQuery
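A toy in-memory version of "full-text + faceted queries in one request", showing the filter, facet counts, and aggregation computed in a single pass. The data, field names, and `searchOnePass` function are invented for illustration.

```typescript
// One-pass search: filter, facet, and aggregate without a pre-rollup.
// Everything here is a toy stand-in for the Search Store's behavior.
type Event = { region: string; latencyMs: number; msg: string };

function searchOnePass(events: Event[], term: string) {
  const facets: Record<string, number> = {};
  let hits = 0;
  let latencySum = 0;
  for (const e of events) {
    if (!e.msg.includes(term)) continue;            // full-text filter
    hits++;
    facets[e.region] = (facets[e.region] ?? 0) + 1; // facet count, same pass
    latencySum += e.latencyMs;                      // aggregation, same pass
  }
  return { hits, facets, avgLatency: hits ? latencySum / hits : 0 };
}

const events: Event[] = [
  { region: "eu", latencyMs: 30, msg: "checkout error" },
  { region: "us", latencyMs: 50, msg: "checkout ok" },
  { region: "eu", latencyMs: 10, msg: "checkout error" },
];
console.log(searchOnePass(events, "error")); // { hits: 2, facets: { eu: 2 }, avgLatency: 20 }
```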

Session Store

Live per-user state for frequency caps, real-time targeting, profile flags, and anything that needs write-then-read in the same second.

~2s write visibility · key-value + JSON
  • Durable, not ephemeral — survives restarts
  • Shared across every app node, no sticky sessions
  • Queryable alongside fact data in the same request
Replaces: Redis, DynamoDB session tables, bespoke session services
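A minimal in-memory stand-in for the write-then-read contract described above. The `SessionStore` class and its methods are hypothetical, not the SDK API; in MinusOneDB the write would also be durable and visible to every app node.

```typescript
// Toy stand-in for the Session Store: key-value + JSON with
// read-after-write in the same request cycle. Names are hypothetical.
class SessionStore {
  private kv = new Map<string, unknown>();
  put(key: string, value: unknown): void {
    this.kv.set(key, value); // in MinusOneDB: durable, shared across nodes
  }
  get<T>(key: string): T | undefined {
    return this.kv.get(key) as T | undefined;
  }
}

const sessions = new SessionStore();
sessions.put("user:42", { freqCap: 3, seenToday: 1, flags: ["beta"] });

// Write-then-read in the same second — no sticky sessions, because
// every app node would see the same store.
const state = sessions.get<{ seenToday: number }>("user:42");
console.log(state?.seenToday); // 1
```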

Data Lake

Raw, arbitrary-shape JSON at any scale. Land first, decide later. The schema can grow around the data, not the other way around.

nested JSON · additive schema
  • No upfront modeling required to start ingesting
  • Promote lake documents into the Search Store once patterns emerge
  • Stream the lake into batch jobs without copying data out
Replaces: S3 + Glue catalogs, Delta/Iceberg, raw-file pipelines
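"Land first, decide later" can be sketched like this: raw JSON goes into the lake untyped, and once a field pattern emerges, documents are promoted into a typed shape. The `land` helper and the promotion rule are assumptions for illustration.

```typescript
// Toy lake flow: ingest arbitrary-shape JSON with no upfront schema,
// then promote the fields that turned out to matter. Names are invented.
type LakeDoc = Record<string, unknown>;

const lake: LakeDoc[] = [];
function land(doc: LakeDoc): void {
  lake.push(doc); // no modeling required to start ingesting
}

land({ user: "a", clicks: 3, extra: { campaign: "x" } });
land({ user: "b", clicks: 7 }); // shape may differ freely between documents

// Promotion: the schema grows around the data, not the other way around.
const promoted = lake
  .filter((d) => typeof d.clicks === "number")
  .map((d) => ({ user: String(d.user), clicks: d.clicks as number }));

console.log(promoted.length); // 2 documents promoted into a typed shape
```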

Archive

Durable object-store backing for every record ever written. Any dataset at any scale rebuilds from archive in ~3 hours.

~3h full rebuild · S3-compatible · region-pinned
  • Disaster recovery without tape or snapshots
  • Data-sovereignty pinning by region
  • Reproducible re-indexing when mappings evolve
Replaces: backup tooling, bespoke DR procedures, cold-storage shuffles
The query path

Anatomy of a sub-50ms query on a petabyte.

A REST request fans out to every shard in parallel, aggregates in place, and returns — all in one round-trip. Here it is, measured against the kind of number you’d use to time a keypress.

1 Watch it happen
Request received: 0.6ms
Fan-out to every shard: 2ms
In-place execution on every shard, in parallel: 32ms
Merge at coordinator: 6ms
Response: 1.4ms
Total round-trip: 42ms
2 Now compare
Pay-per-query warehouse
~47 s · 1,119× slower
Columnar OLAP engine
~2.3 s · 55× slower
MinusOneDB
42 ms · done before you can blink

Same petabyte. Same query. MinusOneDB lives on the scale of a keystroke. Warehouses live on the scale of a coffee refill.
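The round-trip above can be sketched as a scatter-gather: fan out to every shard in parallel, execute in place, and merge small partial aggregates at the coordinator. Shard contents here are simulated numbers; the function names are illustrative.

```typescript
// Scatter-gather query sketch: only tiny partial results cross the
// network, never the rows themselves. All names are hypothetical.
type ShardResult = { count: number; sum: number };

async function executeOnShard(shard: number[]): Promise<ShardResult> {
  // In-place execution: the shard aggregates its own data locally.
  return { count: shard.length, sum: shard.reduce((a, b) => a + b, 0) };
}

async function query(shards: number[][]): Promise<number> {
  const partials = await Promise.all(shards.map(executeOnShard)); // fan-out
  const count = partials.reduce((a, p) => a + p.count, 0);        // merge
  const sum = partials.reduce((a, p) => a + p.sum, 0);
  return sum / count; // coordinator returns one small answer
}

const shards = [[10, 20], [30], [40, 50, 60]];
query(shards).then((avg) => console.log(avg)); // 35
```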

Why pricing follows architecture

Capacity pricing is a structural choice — not a discount.

Because queries run on the same node as the data, there’s nothing to meter per request. So we price what actually costs us something: the infrastructure footprint. Your bill flattens while your usage scales.

[Chart: monthly bill from month 0 to month 12, per-query warehouse vs. MinusOneDB capacity]
Per-query warehouse bill (scales with usage)
MinusOneDB capacity bill (flat)
Per-query, 12-month total: rising with every agent you add.
MinusOneDB, 12-month total: fixed. Same number next year.

Per-query is demand-priced — the warehouse’s incentive is to keep you running more queries. Capacity is supply-priced — ours is to keep you running efficiently.
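The contrast can be made concrete with a toy cost model. Every dollar figure below is invented for illustration and is not a MinusOneDB price.

```typescript
// Demand-priced vs. supply-priced billing, with made-up numbers.
function perQueryBill(queriesPerMonth: number, dollarsPerQuery: number): number {
  return queriesPerMonth * dollarsPerQuery; // scales with usage
}

function capacityBill(nodes: number, dollarsPerNode: number): number {
  return nodes * dollarsPerNode; // flat, regardless of query volume
}

// When usage doubles, the per-query bill doubles; the capacity bill doesn't.
console.log(perQueryBill(1_000_000, 0.002)); // 2000
console.log(perQueryBill(2_000_000, 0.002)); // 4000
console.log(capacityBill(4, 500));           // 2000 at either volume
```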

Patterns this architecture unlocks

Workloads that are economically impossible elsewhere.

When every query is free at the margin and session, search, lake and archive share a store, whole categories of application become viable that pay-per-query stacks can’t price.

IdentityForge

Real-time identity resolution pattern

A live identity object with full lineage, updated as signals land, queryable in the same request as the facts. No nightly batch ID graph, no match-rate decay.

In production: enterprise analytics

ModelForge

Continuous model scoring pattern

Score billions of events against thousands of model variants every day. Writes are cheap because the store is search-first — AutoML and nightly retraining become a line item, not a quarterly initiative.

AI-native workloads

Federated Clean Room

Bring-compute-to-data pattern

Partners land data once in object store and run cross-domain filters inside an M1DB node. No per-query meter, no data movement, no lock-in.

Case study: Qonsent

Behavioural Segments

~2-second refresh pattern

Build, maintain, and sell user segments that refresh every ~2 seconds. Session + search stores combine so a segment is a live query, not a table rebuild.

In production: publishers & SSPs