Timestamp binary search
~99x faster at 1% selectivity. O(log n) vs O(n) — 20 comparisons vs 1M.
Time-series data has two properties that most databases ignore: timestamps are always sorted, and data is naturally partitioned by symbol. ZeptoDB exploits both to make GROUP BY and WHERE timestamp BETWEEN dramatically faster.
A naive WHERE timestamp BETWEEN X AND Y scans every row:
// O(n) — reads all 1M rows even if only 1% matchfor (size_t i = 0; i < num_rows; ++i) { if (data[i] >= lo && data[i] <= hi) result.push_back(i);}For a 1M-row partition where only 10K rows match the time range, this wastes 99% of the work.
Since ZeptoDB appends data in timestamp order (monotonically increasing), every partition’s timestamp column is already sorted. Binary search finds the exact range boundaries:
std::pair<size_t, size_t> timestamp_range(int64_t from_ts, int64_t to_ts) const { auto span = ts_col->as_span<int64_t>(); auto begin = std::lower_bound(span.begin(), span.end(), from_ts); auto end = std::upper_bound(span.begin(), span.end(), to_ts); return {begin - span.begin(), end - span.begin()};}This returns a [start, end) index range in O(log n). For 1M rows, that’s ~20 comparisons instead of 1M.
Before even binary-searching within a partition, we can skip entire partitions by checking only the first and last timestamp:
bool overlaps_time_range(int64_t lo, int64_t hi) const { auto span = ts_col->as_span<int64_t>(); return span.front() <= hi && span.back() >= lo;}Two comparisons. If the partition’s time range doesn’t overlap the query range, skip it entirely.
For each partition: ├── O(1): overlaps_time_range? → NO → skip entirely ├── O(log n): timestamp_range → [start, end) └── eval WHERE only on [start, end) rowsAt 1% selectivity, this is roughly 99x faster than a full scan — the binary search finds the 10K matching rows directly, and partition pruning eliminates non-overlapping partitions without touching their data.
ZeptoDB stores data in per-symbol partitions. Each partition has a PartitionKey containing the symbol_id. For GROUP BY symbol, this means the grouping is already done:
if (is_symbol_group) { // O(1) — read symbol_id from partition key, not from row data int64_t symbol_gkey = static_cast<int64_t>(part->key().symbol_id); // aggregate all rows in this partition under this key}No hash table. No per-row symbol column reads. No hash collisions. Each partition is a group.
For non-symbol GROUP BY columns, a pre-allocated hash map handles the general case:
std::unordered_map<int64_t, std::vector<GroupState>> groups;groups.reserve(1024); // pre-allocate for typical cardinalityWhen a query has both ORDER BY and LIMIT, full sorting is wasteful. std::partial_sort gives O(n log k) instead of O(n log n):
if (limit < result.rows.size()) { // Top-N: O(n log k) — only sort the top k elements std::partial_sort(begin, begin + limit, end, comparator); result.rows.resize(limit);} else { // Full sort: O(n log n) std::sort(begin, end, comparator);}For ORDER BY price DESC LIMIT 10 on 1M rows, this is ~50x faster than full sort.
A subtle bug was caught during implementation: applying LIMIT during data collection (before ORDER BY) truncates the wrong rows. The fix ensures LIMIT is only applied after sorting when ORDER BY is present.
Environment: Amazon Linux 2023, Clang 19, -O3 -march=native, 100K rows (10 symbols × 10K each).
group_by_symbol_sum min=331μs avg=360μs p99=412μsgroup_by_symbol_multi_agg min=1592μs avg=1960μs p99=2117μsgroup_by_symbol_order_limit min=489μs avg=535μs p99=559μsTimestamp binary search
~99x faster at 1% selectivity. O(log n) vs O(n) — 20 comparisons vs 1M.
Partition pruning
O(1) per partition. Two comparisons skip entire partitions that don’t overlap the query range.
Symbol GROUP BY
Zero-cost grouping. Partition key = group key. No hash table needed.
Top-N sort
O(n log k) partial sort for ORDER BY + LIMIT. ~50x faster for small k.
These optimizations compound. A query like:
SELECT symbol, avg(price), sum(volume)FROM tradesWHERE timestamp BETWEEN '2026-03-01' AND '2026-03-02'GROUP BY symbolORDER BY sum(volume) DESCLIMIT 10Hits all four optimizations:
The result is a query that touches only the data it needs, at every level of the execution pipeline.
Related: Parallel Query Engine → · Cost-Based Query Planner → · FlatHashMap JOINs →