MinusOneDB
Architecture Deep-Dive

The database built for queries.

MinusOneDB is a parallel database where distributed search is the foundation, storage and compute live on the same node, and pricing is for capacity — not for asking questions. This is how 100–1000× price-performance is architected, not marketed.

Read the docs
100–1000×
Price-performance vs. pay-per-query warehouses
~2s
Write visibility — events queryable in near real time
$0
Per-query cost for agents, dashboards, and analysts
The default stack

One problem. Seven systems.

A typical analytics pipeline has a database, a warehouse, an ETL layer, a stream processor, a cache, a lake, and a search engine — each metered, each staffed, each a renewal cycle.

Operational DB: renewal + DBAs
Data Warehouse: $/query scanned
ETL Pipelines: eng team
Stream Processor: DevOps + lag
Cache Layer: invalidation hell
Data Lake: crawl jobs
Search Indexer: separate sync
MinusOneDB: one system

One system replaces all of it. Events land once, index themselves, and are queryable at capacity cost. The org chart collapses with the stack.

Storage + compute, reunited

Every query stops paying rent on the wire.

Modern warehouses put storage on one side of the network and compute on the other. Every query crosses the wire both ways — and the meter runs at either end. We put them back on the same node.

Storage: S3 / blob
Compute: warehouse
MinusOneDB: same node

Execution happens where the data lives. No network tax per query. Aggregations that used to stream petabytes across a wire run against the indexes in place.

Distributed search as the foundation

Search isn’t a feature. It’s the execution model.

In the legacy stack, search is a separate cluster fed by a crawl job — minutes-stale, cost-metered, forever drifting from the primary store. In MinusOneDB, every query resolves to a search operation. Text filter, faceted aggregation, numeric range, vector similarity: same engine, same write path, no second system to keep in sync.

Applications & Agents: REST + JS SDK
Query primitives: filter · facet · aggregate · rank · vector
Index structures: inverted · doc-value · vector · range
Distributed parallel shards: the foundation

Every shard carries inverted indexes (text + structured filters), doc-value columns (for sort and facet), vector indexes (similarity), and range trees (numeric + temporal). All maintained by the same write. No ETL, no reindex job, no drift between systems.
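To make "same engine, same write path" concrete, here is a sketch of a single request that touches all four index structures. The `HybridQuery` shape, its field names, and the mapping to primitives are illustrative assumptions, not the published SDK types.

```typescript
// Hypothetical query payload — shape and field names are illustrative,
// not the published MinusOneDB API.
type HybridQuery = {
  filter: Record<string, string>;        // inverted index: text + structured filters
  facet: string[];                       // doc-value columns: sort and facet
  range: { field: string; gte: number }; // range tree: numeric + temporal
  vector?: { field: string; near: number[]; k: number }; // vector index: similarity
};

const q: HybridQuery = {
  filter: { status: "active", title: "checkout error" },
  facet: ["region", "plan"],
  range: { field: "ts", gte: Date.now() - 86_400_000 }, // last 24h
  vector: { field: "embedding", near: [0.1, 0.4, 0.2], k: 20 },
};

// One request, one engine: all four index structures are consulted in a
// single pass rather than by four separate systems kept in sync.
console.log(Object.keys(q).length); // 4 primitives in one request
```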

The write path

We do the work at ingest time, so you don’t pay for it at query time.

You hand us a large dataset. We break it into chunks, stream them to N shard writers in parallel, and build every index around each document as it lands — inverted, doc-value, vector, range. Work a warehouse would do at query time happens at write time, once, amortised across every future read.
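The ingest fan-out described above can be sketched as follows. The `chunk` and `writeShard` helpers and the four-indexes-per-document accounting are illustrative stand-ins, not MinusOneDB internals.

```typescript
// Sketch of parallel shard ingest: round-robin chunking, independent
// writers, index build at write time. All names here are hypothetical.
type Doc = { id: number; body: string };

function chunk<T>(items: T[], n: number): T[][] {
  // Round-robin documents across n shard writers.
  const shards: T[][] = Array.from({ length: n }, () => []);
  items.forEach((item, i) => shards[i % n].push(item));
  return shards;
}

async function writeShard(docs: Doc[]): Promise<number> {
  // Each shard builds all four index structures as documents land:
  // inverted, doc-value, vector, range — simulated here as a count.
  let indexed = 0;
  for (const _ of docs) indexed += 4; // four indexes per document
  return indexed;
}

async function ingest(docs: Doc[], shardCount: number): Promise<number> {
  // All shard writers run in parallel: throughput scales with shardCount,
  // with no coordinator in the hot path.
  const results = await Promise.all(chunk(docs, shardCount).map(writeShard));
  return results.reduce((a, b) => a + b, 0);
}

const docs = Array.from({ length: 1000 }, (_, id) => ({ id, body: "event" }));
ingest(docs, 8).then((n) => console.log(n)); // 4000 index entries, built once at ingest
```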

Shift the work: write-time
Indexes are built once at ingest. Every future query reads pre-computed structures — not raw rows.
Parallel writers: N × throughput
Each shard ingests independently. Double the cluster, double the write rate — no coordinator bottleneck.
Write to queryable: ~2 seconds
By the time the ack lands, all four indexes exist. No ETL, no crawl job, no nightly rebuild.

A query against a warehouse is a computation. Against MinusOneDB, it’s a lookup. Warehouses scan rows every time you ask. We paid that cost once, at ingest — so every query that comes later just finds the answer waiting.

Four built-in stores

One API. Four data systems. Every workload.

One MinusOneDB environment exposes four optimised stores behind a single REST API + JS SDK — so you stop stitching together a database, a warehouse, a cache, and a lake yourself.

Search Store

Constant-time distributed search across petabytes of documents. Full-text, structured fields, facets, aggregations, range queries — in one pass.

single-ms needle queries · inverted + doc-value + vector · ~5M queries/mo base
  • Full-text + faceted queries in one request
  • Group-by aggregations without pre-rollup
  • Vector + text hybrid ranking
Replaces: Elasticsearch, Solr, the analytics side of Snowflake / BigQuery
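A toy in-memory version of "full-text + faceted queries in one request", showing the filter, facet counts, and aggregation computed in a single pass. The data, field names, and `searchOnePass` function are invented for illustration.

```typescript
// One-pass search: filter, facet, and aggregate without a pre-rollup.
// Everything here is a toy stand-in for the Search Store's behavior.
type Event = { region: string; latencyMs: number; msg: string };

function searchOnePass(events: Event[], term: string) {
  const facets: Record<string, number> = {};
  let hits = 0;
  let latencySum = 0;
  for (const e of events) {
    if (!e.msg.includes(term)) continue;            // full-text filter
    hits++;
    facets[e.region] = (facets[e.region] ?? 0) + 1; // facet count, same pass
    latencySum += e.latencyMs;                      // aggregation, same pass
  }
  return { hits, facets, avgLatency: hits ? latencySum / hits : 0 };
}

const events: Event[] = [
  { region: "eu", latencyMs: 30, msg: "checkout error" },
  { region: "us", latencyMs: 50, msg: "checkout ok" },
  { region: "eu", latencyMs: 10, msg: "checkout error" },
];
console.log(searchOnePass(events, "error")); // { hits: 2, facets: { eu: 2 }, avgLatency: 20 }
```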

Session Store

Live per-user state for frequency caps, real-time targeting, profile flags, and anything that needs write-then-read in the same second.

~2s write visibility · key-value + JSON
  • Durable, not ephemeral — survives restarts
  • Shared across every app node, no sticky sessions
  • Queryable alongside fact data in the same request
Replaces: Redis, DynamoDB session tables, bespoke session services
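A minimal in-memory stand-in for the write-then-read contract described above. The `SessionStore` class and its methods are hypothetical, not the SDK API; in MinusOneDB the write would also be durable and visible to every app node.

```typescript
// Toy stand-in for the Session Store: key-value + JSON with
// read-after-write in the same request cycle. Names are hypothetical.
class SessionStore {
  private kv = new Map<string, unknown>();
  put(key: string, value: unknown): void {
    this.kv.set(key, value); // in MinusOneDB: durable, shared across nodes
  }
  get<T>(key: string): T | undefined {
    return this.kv.get(key) as T | undefined;
  }
}

const sessions = new SessionStore();
sessions.put("user:42", { freqCap: 3, seenToday: 1, flags: ["beta"] });

// Write-then-read in the same second — no sticky sessions, because
// every app node would see the same store.
const state = sessions.get<{ seenToday: number }>("user:42");
console.log(state?.seenToday); // 1
```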

Data Lake

Raw, arbitrary-shape JSON at any scale. Land first, decide later. The schema can grow around the data, not the other way around.

nested JSON · additive schema
  • No upfront modeling required to start ingesting
  • Promote lake documents into the Search Store once patterns emerge
  • Stream the lake into batch jobs without copying data out
Replaces: S3 + Glue catalogs, Delta/Iceberg, raw-file pipelines
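"Land first, decide later" can be sketched like this: raw JSON goes into the lake untyped, and once a field pattern emerges, documents are promoted into a typed shape. The `land` helper and the promotion rule are assumptions for illustration.

```typescript
// Toy lake flow: ingest arbitrary-shape JSON with no upfront schema,
// then promote the fields that turned out to matter. Names are invented.
type LakeDoc = Record<string, unknown>;

const lake: LakeDoc[] = [];
function land(doc: LakeDoc): void {
  lake.push(doc); // no modeling required to start ingesting
}

land({ user: "a", clicks: 3, extra: { campaign: "x" } });
land({ user: "b", clicks: 7 }); // shape may differ freely between documents

// Promotion: the schema grows around the data, not the other way around.
const promoted = lake
  .filter((d) => typeof d.clicks === "number")
  .map((d) => ({ user: String(d.user), clicks: d.clicks as number }));

console.log(promoted.length); // 2 documents promoted into a typed shape
```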

Archive

Durable object-store backing for every record ever written. Any dataset at any scale rebuilds from archive in ~3 hours.

~3h full rebuild · S3-compatible · region-pinned
  • Disaster recovery without tape or snapshots
  • Data-sovereignty pinning by region
  • Reproducible re-indexing when mappings evolve
Replaces: backup tooling, bespoke DR procedures, cold-storage shuffles
The query path

Anatomy of a sub-50ms query on a petabyte.

A REST request fans out to every shard in parallel, aggregates in place, and returns — all in one round-trip. Here it is, measured against the kind of number you’d use to time a keypress.

1 Watch it happen
Request received: 0.6ms
Fan-out to every shard: 2ms
In-place execution on every shard, in parallel: 32ms
Merge at coordinator: 6ms
Response: 1.4ms
Total round-trip: 42ms
2 Now compare
Pay-per-query warehouse
~47 s · 1,119× slower
Columnar OLAP engine
~2.3 s · 55× slower
MinusOneDB
42 ms · done before you can blink

Same petabyte. Same query. MinusOneDB lives on the scale of a keystroke. Warehouses live on the scale of a coffee refill.
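The round-trip above can be sketched as a scatter-gather: fan out to every shard in parallel, execute in place, and merge small partial aggregates at the coordinator. Shard contents here are simulated numbers; the function names are illustrative.

```typescript
// Scatter-gather query sketch: only tiny partial results cross the
// network, never the rows themselves. All names are hypothetical.
type ShardResult = { count: number; sum: number };

async function executeOnShard(shard: number[]): Promise<ShardResult> {
  // In-place execution: the shard aggregates its own data locally.
  return { count: shard.length, sum: shard.reduce((a, b) => a + b, 0) };
}

async function query(shards: number[][]): Promise<number> {
  const partials = await Promise.all(shards.map(executeOnShard)); // fan-out
  const count = partials.reduce((a, p) => a + p.count, 0);        // merge
  const sum = partials.reduce((a, p) => a + p.sum, 0);
  return sum / count; // coordinator returns one small answer
}

const shards = [[10, 20], [30], [40, 50, 60]];
query(shards).then((avg) => console.log(avg)); // 35
```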

Why pricing follows architecture

Capacity pricing is a structural choice — not a discount.

Because queries run on the same node as the data, there’s nothing to meter per request. So we price what actually costs us something: the infrastructure footprint. Your bill flattens while your usage scales.

[Chart: monthly bill from month 0 to month 12, per-query warehouse vs. MinusOneDB capacity]
Per-query warehouse bill (scales with usage)
MinusOneDB capacity bill (flat)
Per-query, 12-month total: rising with every agent you add.
MinusOneDB, 12-month total: fixed. Same number next year.

Per-query is demand-priced — the warehouse’s incentive is to keep you running more queries. Capacity is supply-priced — ours is to keep you running efficiently.
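The contrast can be made concrete with a toy cost model. Every dollar figure below is invented for illustration and is not a MinusOneDB price.

```typescript
// Demand-priced vs. supply-priced billing, with made-up numbers.
function perQueryBill(queriesPerMonth: number, dollarsPerQuery: number): number {
  return queriesPerMonth * dollarsPerQuery; // scales with usage
}

function capacityBill(nodes: number, dollarsPerNode: number): number {
  return nodes * dollarsPerNode; // flat, regardless of query volume
}

// When usage doubles, the per-query bill doubles; the capacity bill doesn't.
console.log(perQueryBill(1_000_000, 0.002)); // 2000
console.log(perQueryBill(2_000_000, 0.002)); // 4000
console.log(capacityBill(4, 500));           // 2000 at either volume
```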

Patterns this architecture unlocks

Workloads that are economically impossible elsewhere.

When every query is free at the margin and session, search, lake and archive share a store, whole categories of application become viable that pay-per-query stacks can’t price.

IdentityForge

Real-time identity resolution pattern

A live identity object with full lineage, updated as signals land, queryable in the same request as the facts. No nightly batch ID graph, no match-rate decay.

In production: enterprise analytics

ModelForge

Continuous model scoring pattern

Score billions of events against thousands of model variants every day. Writes are cheap because the store is search-first — AutoML and nightly retraining become a line item, not a quarterly initiative.

AI-native workloads

Federated Clean Room

Bring-compute-to-data pattern

Partners land data once in object store and run cross-domain filters inside an M1DB node. No per-query meter, no data movement, no lock-in.

Case study: Qonsent

Behavioural Segments

~2-second refresh pattern

Build, maintain, and sell user segments that refresh every ~2 seconds. Session + search stores combine so a segment is a live query, not a table rebuild.

In production: publishers & SSPs