Skip to content

GROUP BY Optimization with Timestamp Range Index

Time-series data has two properties that most databases ignore: timestamps are always sorted, and data is naturally partitioned by symbol. ZeptoDB exploits both to make GROUP BY and WHERE timestamp BETWEEN dramatically faster.


A naive WHERE timestamp BETWEEN X AND Y scans every row:

// O(n) — reads all 1M rows even if only 1% match
for (size_t i = 0; i < num_rows; ++i) {
if (data[i] >= lo && data[i] <= hi)
result.push_back(i);
}

For a 1M-row partition where only 10K rows match the time range, this wastes 99% of the work.


Section titled “Timestamp Range Index: O(log n) Binary Search”

Since ZeptoDB appends data in timestamp order (monotonically increasing), every partition’s timestamp column is already sorted. Binary search finds the exact range boundaries:

std::pair<size_t, size_t> timestamp_range(int64_t from_ts, int64_t to_ts) const {
auto span = ts_col->as_span<int64_t>();
auto begin = std::lower_bound(span.begin(), span.end(), from_ts);
auto end = std::upper_bound(span.begin(), span.end(), to_ts);
return {begin - span.begin(), end - span.begin()};
}

This returns a [start, end) index range in O(log n). For 1M rows, that’s ~20 comparisons instead of 1M.

Before even binary-searching within a partition, we can skip entire partitions by checking only the first and last timestamp:

bool overlaps_time_range(int64_t lo, int64_t hi) const {
auto span = ts_col->as_span<int64_t>();
return span.front() <= hi && span.back() >= lo;
}

Two comparisons. If the partition’s time range doesn’t overlap the query range, skip it entirely.

For each partition:
├── O(1): overlaps_time_range? → NO → skip entirely
├── O(log n): timestamp_range → [start, end)
└── eval WHERE only on [start, end) rows

At 1% selectivity, this is roughly 99x faster than a full scan — the binary search finds the 10K matching rows directly, and partition pruning eliminates non-overlapping partitions without touching their data.


ZeptoDB stores data in per-symbol partitions. Each partition has a PartitionKey containing the symbol_id. For GROUP BY symbol, this means the grouping is already done:

if (is_symbol_group) {
// O(1) — read symbol_id from partition key, not from row data
int64_t symbol_gkey = static_cast<int64_t>(part->key().symbol_id);
// aggregate all rows in this partition under this key
}

No hash table. No per-row symbol column reads. No hash collisions. Each partition is a group.

For non-symbol GROUP BY columns, a pre-allocated hash map handles the general case:

std::unordered_map<int64_t, std::vector<GroupState>> groups;
groups.reserve(1024); // pre-allocate for typical cardinality

When a query has both ORDER BY and LIMIT, full sorting is wasteful. std::partial_sort gives O(n log k) instead of O(n log n):

if (limit < result.rows.size()) {
// Top-N: O(n log k) — only sort the top k elements
std::partial_sort(begin, begin + limit, end, comparator);
result.rows.resize(limit);
} else {
// Full sort: O(n log n)
std::sort(begin, end, comparator);
}

For ORDER BY price DESC LIMIT 10 on 1M rows, this is ~50x faster than full sort.

A subtle bug was caught during implementation: applying LIMIT during data collection (before ORDER BY) truncates the wrong rows. The fix ensures LIMIT is only applied after sorting when ORDER BY is present.


Environment: Amazon Linux 2023, Clang 19, -O3 -march=native, 100K rows (10 symbols × 10K each).

group_by_symbol_sum min=331μs avg=360μs p99=412μs
group_by_symbol_multi_agg min=1592μs avg=1960μs p99=2117μs
group_by_symbol_order_limit min=489μs avg=535μs p99=559μs

Timestamp binary search

~99x faster at 1% selectivity. O(log n) vs O(n) — 20 comparisons vs 1M.

Partition pruning

O(1) per partition. Two comparisons skip entire partitions that don’t overlap the query range.

Symbol GROUP BY

Zero-cost grouping. Partition key = group key. No hash table needed.

Top-N sort

O(n log k) partial sort for ORDER BY + LIMIT. ~50x faster for small k.


These optimizations compound. A query like:

SELECT symbol, avg(price), sum(volume)
FROM trades
WHERE timestamp BETWEEN '2026-03-01' AND '2026-03-02'
GROUP BY symbol
ORDER BY sum(volume) DESC
LIMIT 10

Hits all four optimizations:

  1. Partition pruning skips symbols with no data in the time range
  2. Binary search narrows to the exact row range within each partition
  3. GROUP BY uses partition keys directly — no hash table
  4. Top-N partial sort returns only the top 10

The result is a query that touches only the data it needs, at every level of the execution pipeline.


Related: Parallel Query Engine → · Cost-Based Query Planner → · FlatHashMap JOINs →