ZeptoDB SQL Reference
Last updated: 2026-03-24 31 SQL functions · 9 JOIN types · DDL/DML · s#/g#/p# indexes · Distributed query support · Statistical functions
ZeptoDB uses a recursive descent SQL parser with nanosecond timestamp semantics.
All integer columns are int64. Floating-point values are stored as fixed-point scaled integers.
NULL is represented internally as INT64_MIN.
Table of Contents
Section titled “Table of Contents”- SELECT Syntax
- CTE (WITH clause) & Subqueries
- WHERE Conditions
- Aggregate Functions
- GROUP BY / HAVING / ORDER BY / LIMIT
- Window Functions
- Financial Functions
- Date/Time Functions
- String Functions
- JOINs
- Set Operations
- CASE WHEN
- DDL (CREATE / ALTER / DROP TABLE)
- DML (INSERT / UPDATE / DELETE)
- Index Attributes (s# / g# / p#)
- EXPLAIN
- Catalog Queries (SHOW TABLES / DESCRIBE)
- Data Types & Timestamp Arithmetic
- Distributed Query Behavior
- Known Limitations
Quick Start
Section titled “Quick Start”1. Basic aggregation
Section titled “1. Basic aggregation”-- VWAP + row count for symbol 1SELECT vwap(price, volume) AS vwap, count(*) AS nFROM tradesWHERE symbol = 12. 5-minute OHLCV bar (kdb+ xbar style)
Section titled “2. 5-minute OHLCV bar (kdb+ xbar style)”SELECT xbar(timestamp, 300000000000) AS bar, first(price) AS open, max(price) AS high, min(price) AS low, last(price) AS close, sum(volume) AS volumeFROM tradesWHERE symbol = 1GROUP BY xbar(timestamp, 300000000000)ORDER BY bar ASC3. Moving average + EMA
Section titled “3. Moving average + EMA”SELECT timestamp, price, AVG(price) OVER (PARTITION BY symbol ROWS 20 PRECEDING) AS ma20, EMA(price, 12) OVER (PARTITION BY symbol ORDER BY timestamp) AS ema12FROM tradesWHERE symbol = 1ORDER BY timestamp ASC4. ASOF JOIN (trades ↔ quotes)
Section titled “4. ASOF JOIN (trades ↔ quotes)”SELECT t.symbol, t.price, q.bid, q.ask, t.timestamp - q.timestamp AS staleness_nsFROM trades tASOF JOIN quotes qON t.symbol = q.symbol AND t.timestamp >= q.timestampWHERE t.symbol = 15. Per-minute volume with time filter
Section titled “5. Per-minute volume with time filter”SELECT DATE_TRUNC('min', timestamp) AS minute, sum(volume) AS vol, vwap(price, volume) AS vwapFROM tradesWHERE symbol = 1 AND timestamp > NOW() - 3600000000000GROUP BY DATE_TRUNC('min', timestamp)ORDER BY minute ASC6. Conditional aggregation with CASE WHEN
Section titled “6. Conditional aggregation with CASE WHEN”SELECT symbol, sum(CASE WHEN price > 15050 THEN volume ELSE 0 END) AS high_vol, sum(CASE WHEN price <= 15050 THEN volume ELSE 0 END) AS low_vol, count(*) AS totalFROM tradesGROUP BY symbol7. UNION — combine two symbol results
Section titled “7. UNION — combine two symbol results”SELECT symbol, price, volume FROM trades WHERE symbol = 1UNION ALLSELECT symbol, price, volume FROM trades WHERE symbol = 2ORDER BY symbol ASC, timestamp ASC8. CTE — multi-step aggregation
Section titled “8. CTE — multi-step aggregation”-- Step 1: per-minute VWAP bar-- Step 2: rank bars by volumeWITH bars AS ( SELECT DATE_TRUNC('min', timestamp) AS minute, VWAP(price, volume) AS vwap, SUM(volume) AS vol FROM trades WHERE symbol = 1 GROUP BY DATE_TRUNC('min', timestamp))SELECT minute, vwap, volFROM barsWHERE vol > 50000ORDER BY vol DESCLIMIT 109. FROM subquery — derived table
Section titled “9. FROM subquery — derived table”SELECT symbol, avg_priceFROM ( SELECT symbol, AVG(price) AS avg_price FROM trades GROUP BY symbol) AS summaryWHERE avg_price > 15000SELECT Syntax
Section titled “SELECT Syntax”[WITH cte_name AS (SELECT ...) [, cte_name2 AS (SELECT ...) ...]]SELECT [DISTINCT] col_expr [AS alias], ...FROM { table_name [AS alias] | (SELECT ...) AS alias } [SAMPLE fraction] [JOIN ...]WHERE conditionGROUP BY col_or_expr, ...HAVING conditionORDER BY col [ASC|DESC], ...LIMIT nSAMPLE
Section titled “SAMPLE”Reads only a fraction of rows (0 < fraction ≤ 1). Uses deterministic hashing so results are reproducible. Applied after WHERE filtering.
-- Read ~10% of rowsSELECT * FROM trades SAMPLE 0.1
-- Combine with WHERE and aggregationSELECT avg(price) FROM trades SAMPLE 0.2 WHERE symbol = 1
-- Approximate GROUP BY on large datasetsSELECT symbol, count(*) FROM trades SAMPLE 0.5 GROUP BY symbolColumn expressions
Section titled “Column expressions”-- Plain columnSELECT price FROM trades
-- Arithmetic: + - * /SELECT price * volume AS notional FROM tradesSELECT (price - 15000) / 100 AS premium FROM tradesSELECT SUM(price * volume) AS total_notional FROM tradesSELECT AVG(price - open_price) AS avg_change FROM trades
-- Aggregate with arithmetic insideSELECT SUM(price * volume) / SUM(volume) AS manual_vwap FROM tradesDISTINCT
Section titled “DISTINCT”SELECT DISTINCT symbol FROM tradesTable alias
Section titled “Table alias”SELECT t.price, q.bid FROM trades t ASOF JOIN quotes q ...CTE (WITH clause) & Subqueries
Section titled “CTE (WITH clause) & Subqueries”WITH clause (Common Table Expressions)
Section titled “WITH clause (Common Table Expressions)”Named temporary result sets defined before the main SELECT. Makes complex multi-step queries readable and avoids nesting.
WITH name AS (SELECT ...) [, name2 AS (SELECT ...) ...]SELECT ... FROM name-- Single CTEWITH daily AS ( SELECT symbol, DATE_TRUNC('day', timestamp) AS day, SUM(volume) AS vol FROM trades GROUP BY symbol, DATE_TRUNC('day', timestamp))SELECT symbol, SUM(vol) AS total_volFROM dailyGROUP BY symbolORDER BY total_vol DESC-- Multiple chained CTEs (b references a)WITH a AS ( SELECT symbol, SUM(volume) AS total FROM trades GROUP BY symbol),b AS ( SELECT symbol, total FROM a WHERE total > 1000)SELECT symbol, total FROM b ORDER BY total DESC-- CTE + UNION ALLWITH highs AS ( SELECT symbol, price FROM trades WHERE price > 15050)SELECT symbol, price FROM highsUNION ALLSELECT symbol, price FROM trades WHERE symbol = 2FROM subquery (derived table)
Section titled “FROM subquery (derived table)”Use a SELECT as the FROM source by wrapping it in parentheses with an alias.
SELECT symbol, avg_priceFROM ( SELECT symbol, AVG(price) AS avg_price FROM trades GROUP BY symbol) AS summaryWHERE avg_price > 15000ORDER BY avg_price DESC-- Aggregation over subquerySELECT SUM(vol) AS grand_totalFROM ( SELECT symbol, SUM(volume) AS vol FROM trades WHERE price > 15000 GROUP BY symbol) AS subScalar subqueries in WHERE
Section titled “Scalar subqueries in WHERE”Use a subquery that returns a single value as a comparison target. The subquery is evaluated once before the outer scan.
-- Compare against aggregateSELECT * FROM tradesWHERE price > (SELECT avg(price) FROM trades)
-- With inner filterSELECT * FROM tradesWHERE price > (SELECT avg(price) FROM trades WHERE symbol = 1)
-- IN (SELECT ...) — filter by subquery result setSELECT * FROM tradesWHERE symbol IN (SELECT symbol FROM trades WHERE volume > 1000)Limitations:
- Scalar subqueries must return exactly 1 row × 1 column (error otherwise)
- Only uncorrelated subqueries (no references to outer query columns)
- IN subquery results are auto-deduplicated
Supported clauses on virtual tables
Section titled “Supported clauses on virtual tables”All standard clauses work on CTE / subquery results:
| Clause | Supported |
|---|---|
WHERE | ✅ All operators (=, !=, >, <, BETWEEN, IN, IS NULL, LIKE, AND, OR, NOT) |
GROUP BY | ✅ Single and multi-column |
HAVING | ✅ Post-aggregation filter |
ORDER BY | ✅ Single and multi-column, ASC/DESC |
LIMIT | ✅ |
SAMPLE | ✅ Deterministic hash-based row sampling (0..1 fraction) |
DISTINCT | ✅ |
SELECT * | ✅ Pass-through all source columns |
| Arithmetic | ✅ price * volume AS notional |
| Aggregates | ✅ SUM, AVG, MIN, MAX, COUNT, FIRST, LAST |
Limitations
Section titled “Limitations”- No correlated subqueries (
WHERE col = (SELECT ... WHERE inner.x = outer.x)) - Scalar subqueries in WHERE supported (uncorrelated only):
WHERE col > (SELECT ...),WHERE col IN (SELECT ...) - VWAP, XBAR, window functions, and JOIN not yet supported on virtual tables
WHERE Conditions
Section titled “WHERE Conditions”Comparison operators
Section titled “Comparison operators”WHERE price > 15000WHERE price >= 15000WHERE price < 15100WHERE price <= 15100WHERE price = 15000WHERE price != 15000BETWEEN
Section titled “BETWEEN”WHERE timestamp BETWEEN 1711000000000000000 AND 1711003600000000000WHERE price BETWEEN 15000 AND 15100AND / OR / NOT
Section titled “AND / OR / NOT”WHERE symbol = 1 AND price > 15000WHERE symbol = 1 OR symbol = 2WHERE NOT price > 15100WHERE NOT (price > 15100 OR volume < 50)Supports multi-partition routing: WHERE symbol IN (...) scans only the listed partitions.
WHERE symbol IN (1, 2, 3)WHERE price IN (15000, 15010, 15020)
-- Multi-partition aggregationSELECT symbol, SUM(volume) FROM tradesWHERE symbol IN (1, 2, 3) GROUP BY symbolIS NULL / IS NOT NULL
Section titled “IS NULL / IS NOT NULL”ZeptoDB uses INT64_MIN as the NULL sentinel.
WHERE risk_score IS NULLWHERE risk_score IS NOT NULLLIKE / NOT LIKE
Section titled “LIKE / NOT LIKE”Glob-style pattern matching applied to the decimal string representation of int64 values.
| Pattern char | Meaning |
|---|---|
% | Any substring (0 or more characters) |
_ | Any single character |
WHERE price LIKE '150%' -- prices starting with "150"WHERE price NOT LIKE '%9' -- prices not ending in "9"WHERE price LIKE '1500_' -- 5-char prices starting with "1500"WHERE timestamp LIKE '1711%' -- timestamps with that prefixAggregate Functions
Section titled “Aggregate Functions”All aggregates ignore NULL. Can be used in SELECT list or nested in expressions.
| Function | Description |
|---|---|
COUNT(*) | Total row count |
COUNT(col) | Non-null row count |
COUNT(DISTINCT col) | Distinct value count |
SUM(col) | Sum |
SUM(expr) | Sum of arithmetic expression, e.g. SUM(price * volume) |
AVG(col) | Average |
AVG(expr) | Average of arithmetic expression |
MIN(col) | Minimum |
MAX(col) | Maximum |
FIRST(col) | First value (by row order) — kdb+ first |
LAST(col) | Last value (by row order) — kdb+ last |
VWAP(price, volume) | Volume-weighted average price |
STDDEV(col) | Population standard deviation |
VARIANCE(col) | Population variance |
MEDIAN(col) | Median (50th percentile) |
PERCENTILE(col, N) | Nth percentile (0-100). Alias: PERCENTILE_CONT |
SELECT COUNT(*), COUNT(DISTINCT symbol), SUM(volume), AVG(price), MIN(price), MAX(price), VWAP(price, volume), FIRST(price) AS open, LAST(price) AS closeFROM trades WHERE symbol = 1Statistical Functions
Section titled “Statistical Functions”-- Standard deviation and variance per symbolSELECT symbol, STDDEV(price) AS sd, VARIANCE(price) AS varFROM trades GROUP BY symbol
-- Median priceSELECT MEDIAN(price) AS median_price FROM trades WHERE symbol = 1
-- P90 latency (percentile)SELECT PERCENTILE(price, 90) AS p90, PERCENTILE(price, 99) AS p99FROM trades WHERE symbol = 1
-- PERCENTILE_CONT alias also worksSELECT PERCENTILE_CONT(price, 50) AS p50 FROM trades WHERE symbol = 1GROUP BY / HAVING / ORDER BY / LIMIT
Section titled “GROUP BY / HAVING / ORDER BY / LIMIT”GROUP BY
Section titled “GROUP BY”-- Single columnSELECT symbol, SUM(volume) FROM trades GROUP BY symbol
-- xbar: kdb+ style time bucketing (arg = nanoseconds)SELECT xbar(timestamp, 300000000000) AS bar, SUM(volume)FROM trades GROUP BY xbar(timestamp, 300000000000)
-- Multi-column (composite key)SELECT symbol, price, SUM(volume) AS volFROM trades GROUP BY symbol, price
-- Date/time function as keySELECT DATE_TRUNC('hour', timestamp) AS hour, SUM(volume)FROM trades GROUP BY DATE_TRUNC('hour', timestamp)HAVING
Section titled “HAVING”Applied after aggregation. References result column aliases.
SELECT symbol, SUM(volume) AS total_volFROM trades GROUP BY symbolHAVING total_vol > 1000
SELECT symbol, AVG(price) AS avg_priceFROM trades GROUP BY symbolHAVING avg_price > 15000 AND avg_price < 20000ORDER BY / LIMIT / OFFSET
Section titled “ORDER BY / LIMIT / OFFSET”SELECT symbol, SUM(volume) AS total_volFROM trades GROUP BY symbolORDER BY total_vol DESCLIMIT 10
-- Pagination with OFFSETSELECT * FROM trades WHERE symbol = 1ORDER BY timestamp DESCLIMIT 50 OFFSET 100
-- OFFSET without ORDER BY (stable row order within partitions)SELECT * FROM trades LIMIT 50 OFFSET 200
-- Multi-column ORDER BYORDER BY symbol ASC, price DESCWindow Functions
Section titled “Window Functions”Syntax: func(col) OVER ([PARTITION BY col] [ORDER BY col] [ROWS n PRECEDING])
| Function | Description |
|---|---|
SUM(col) OVER (...) | Running sum |
AVG(col) OVER (...) | Moving average |
MIN(col) OVER (...) | Moving minimum |
MAX(col) OVER (...) | Moving maximum |
COUNT(col) OVER (...) | Running count |
ROW_NUMBER() OVER (...) | Row number within partition |
RANK() OVER (...) | Rank (gaps on tie) |
DENSE_RANK() OVER (...) | Rank (no gaps) |
LAG(col, n) OVER (...) | Value n rows before |
LEAD(col, n) OVER (...) | Value n rows ahead |
-- 20-row moving averageSELECT price, AVG(price) OVER (PARTITION BY symbol ROWS 20 PRECEDING) AS ma20FROM trades
-- Rank by price descendingSELECT symbol, price, RANK() OVER (ORDER BY price DESC) AS rankFROM trades
-- LAG / LEADSELECT price, LAG(price, 1) OVER (PARTITION BY symbol ORDER BY timestamp) AS prev_price, LEAD(price, 1) OVER (PARTITION BY symbol ORDER BY timestamp) AS next_priceFROM tradesFinancial Functions
Section titled “Financial Functions”EMA (Exponential Moving Average)
Section titled “EMA (Exponential Moving Average)”SELECT EMA(price, 20) OVER (PARTITION BY symbol ORDER BY timestamp) AS ema20FROM trades
-- Two EMAs (MACD components)SELECT EMA(price, 12) OVER (PARTITION BY symbol ORDER BY timestamp) AS ema12, EMA(price, 26) OVER (PARTITION BY symbol ORDER BY timestamp) AS ema26FROM tradesDELTA / RATIO
Section titled “DELTA / RATIO”-- Row-to-row differenceSELECT DELTA(price) OVER (PARTITION BY symbol ORDER BY timestamp) AS price_changeFROM trades
-- Row-to-row ratio (scaled int; multiply by 1e-6 for float interpretation)SELECT RATIO(price) OVER (ORDER BY timestamp) AS price_ratioFROM tradesxbar (Time Bar Aggregation)
Section titled “xbar (Time Bar Aggregation)”Buckets timestamps into fixed-size intervals. Argument is bucket size in nanoseconds.
-- 5-minute OHLCV candlestickSELECT xbar(timestamp, 300000000000) AS bar, FIRST(price) AS open, MAX(price) AS high, MIN(price) AS low, LAST(price) AS close, SUM(volume) AS volumeFROM trades WHERE symbol = 1GROUP BY xbar(timestamp, 300000000000)ORDER BY bar ASC
-- 1-hour VWAP barSELECT xbar(timestamp, 3600000000000) AS hour_bar, VWAP(price, volume) AS vwapFROM trades GROUP BY xbar(timestamp, 3600000000000)Common bar sizes:
| Period | Nanoseconds |
|---|---|
| 1 second | 1_000_000_000 |
| 1 minute | 60_000_000_000 |
| 5 minutes | 300_000_000_000 |
| 1 hour | 3_600_000_000_000 |
| 1 day | 86_400_000_000_000 |
Date/Time Functions
Section titled “Date/Time Functions”All ZeptoDB timestamps are nanoseconds since Unix epoch (int64).
DATE_TRUNC
Section titled “DATE_TRUNC”Floors a nanosecond timestamp to a time unit boundary.
DATE_TRUNC('unit', column_or_expr)| Unit | Bucket size (ns) |
|---|---|
'ns' | 1 |
'us' | 1,000 |
'ms' | 1,000,000 |
's' | 1,000,000,000 |
'min' | 60,000,000,000 |
'hour' | 3,600,000,000,000 |
'day' | 86,400,000,000,000 |
'week' | 604,800,000,000,000 |
SELECT DATE_TRUNC('min', timestamp) AS minute, SUM(volume) AS volFROM trades WHERE symbol = 1GROUP BY DATE_TRUNC('min', timestamp)ORDER BY minute ASC
SELECT DATE_TRUNC('hour', timestamp) AS hour, FIRST(price) AS open, LAST(price) AS closeFROM tradesGROUP BY DATE_TRUNC('hour', timestamp)Current nanosecond timestamp at query execution time (std::chrono::system_clock).
-- Last 60 seconds of tradesSELECT * FROM trades WHERE timestamp > NOW() - 60000000000
-- Age in secondsSELECT EPOCH_S(NOW()) - EPOCH_S(timestamp) AS age_sec FROM tradesINTERVAL
Section titled “INTERVAL”Duration literal that evaluates to nanoseconds. Use with NOW() for readable time-range queries.
INTERVAL 'N unit'| Unit | Aliases |
|---|---|
nanoseconds | ns, nanosecond |
microseconds | us, microsecond |
milliseconds | ms, millisecond |
seconds | s, sec, second |
minutes | m, min, minute |
hours | h, hour |
days | d, day |
weeks | w, week |
-- Last 5 minutes of tradesSELECT * FROM trades WHERE timestamp > NOW() - INTERVAL '5 minutes'
-- Last 2 hoursSELECT * FROM trades WHERE timestamp > NOW() - INTERVAL '2 hours'
-- In SELECT expressionsSELECT NOW() - INTERVAL '1 day' AS yesterday FROM trades LIMIT 1EPOCH_S / EPOCH_MS
Section titled “EPOCH_S / EPOCH_MS”Convert nanosecond timestamp to seconds or milliseconds.
SELECT EPOCH_S(timestamp) AS ts_sec FROM trades WHERE symbol = 1SELECT EPOCH_MS(timestamp) AS ts_ms FROM trades WHERE symbol = 1
-- Use in arithmeticSELECT price, EPOCH_S(timestamp) * 1000 AS ts_ms_manual FROM tradesASOF JOIN (time-series, kdb+ style)
Section titled “ASOF JOIN (time-series, kdb+ style)”For each left row, finds the most recent right row where right.timestamp <= left.timestamp.
SELECT t.symbol, t.price, q.bid, q.ask, t.timestamp - q.timestamp AS staleness_nsFROM trades tASOF JOIN quotes qON t.symbol = q.symbol AND t.timestamp >= q.timestampHash JOIN (equi-join)
Section titled “Hash JOIN (equi-join)”Standard equi-join. NULL values in the join key are excluded.
SELECT t.price, t.volume, r.risk_score, r.sectorFROM trades tJOIN risk_factors r ON t.symbol = r.symbolLEFT JOIN
Section titled “LEFT JOIN”Returns all left-side rows; unmatched right-side columns are NULL (INT64_MIN).
SELECT t.price, t.volume, r.risk_scoreFROM trades tLEFT JOIN risk_factors r ON t.symbol = r.symbolWHERE r.risk_score IS NOT NULLWINDOW JOIN (wj, kdb+ style)
Section titled “WINDOW JOIN (wj, kdb+ style)”For each left row, aggregates right-side rows within a symmetric time window.
SELECT t.price, wj_avg(q.bid) AS avg_bid, wj_avg(q.ask) AS avg_ask, wj_count(q.bid) AS quote_countFROM trades tWINDOW JOIN quotes qON t.symbol = q.symbolAND q.timestamp BETWEEN t.timestamp - 5000000000 AND t.timestamp + 5000000000Window aggregates: wj_avg, wj_sum, wj_min, wj_max, wj_count
RIGHT JOIN
Section titled “RIGHT JOIN”Returns all right-side rows; unmatched left-side columns are NULL.
SELECT t.price, r.risk_scoreFROM trades tRIGHT JOIN risk_factors r ON t.symbol = r.symbolFULL OUTER JOIN
Section titled “FULL OUTER JOIN”Returns all rows from both sides; unmatched columns are NULL.
SELECT t.price, r.risk_scoreFROM trades tFULL OUTER JOIN risk_factors r ON t.symbol = r.symbolUNION JOIN (uj, kdb+ style)
Section titled “UNION JOIN (uj, kdb+ style)”Merges columns from both tables, concatenates all rows. Missing columns filled with NULL.
SELECT * FROM trades t UNION JOIN quotes qPLUS JOIN (pj, kdb+ style)
Section titled “PLUS JOIN (pj, kdb+ style)”Additive join — matching rows have numeric columns summed.
SELECT * FROM trades tPLUS JOIN adjustments a ON t.symbol = a.symbolAJ0 (left-columns-only ASOF JOIN)
Section titled “AJ0 (left-columns-only ASOF JOIN)”Like ASOF JOIN but returns only the left table’s columns plus matched right values.
SELECT t.price, t.volume, q.bidFROM trades tAJ0 JOIN quotes q ON t.symbol = q.symbol AND t.timestamp >= q.timestampSet Operations
Section titled “Set Operations”All set operations require the same column count in both SELECT lists.
UNION ALL
Section titled “UNION ALL”Concatenates results. Duplicates are kept.
SELECT symbol, price FROM trades WHERE symbol = 1UNION ALLSELECT symbol, price FROM trades WHERE symbol = 2UNION (DISTINCT)
Section titled “UNION (DISTINCT)”Concatenates results and removes duplicate rows.
SELECT price FROM trades WHERE symbol = 1UNIONSELECT price FROM trades WHERE symbol = 2INTERSECT
Section titled “INTERSECT”Returns rows present in both result sets.
SELECT price FROM trades WHERE symbol = 1INTERSECTSELECT price FROM trades WHERE price > 15050EXCEPT
Section titled “EXCEPT”Returns rows from the left result set that are not in the right.
SELECT price FROM trades WHERE symbol = 1EXCEPTSELECT price FROM trades WHERE price > 15050CASE WHEN
Section titled “CASE WHEN”CASE WHEN condition THEN arithmetic_expr [WHEN condition THEN arithmetic_expr ...] [ELSE arithmetic_expr]END [AS alias]WHEN condition supports the same syntax as WHERE. THEN/ELSE support full arithmetic expressions.
CASE WHEN can be used standalone or nested inside aggregate functions:
-- Standalone: binary flagSELECT price, CASE WHEN price > 15050 THEN 1 ELSE 0 END AS is_highFROM trades WHERE symbol = 1
-- Inside SUM: conditional aggregationSELECT SUM(CASE WHEN price > 15050 THEN volume ELSE 0 END) AS high_volume, SUM(CASE WHEN price <= 15050 THEN volume ELSE 0 END) AS low_volumeFROM trades WHERE symbol = 1
-- Inside SUM with GROUP BY: per-group conditional countSELECT symbol, SUM(CASE WHEN price > 15050 THEN 1 ELSE 0 END) AS high_countFROM trades GROUP BY symbol
-- Arithmetic in THEN/ELSESELECT price, volume, CASE WHEN price > 15050 THEN price * 2 WHEN price > 15020 THEN price * 1 ELSE 0 END AS weighted_priceFROM trades
-- Conditional aggregateSELECT SUM(CASE WHEN price > 15050 THEN volume ELSE 0 END) AS high_volumeFROM trades WHERE symbol = 1String Functions
Section titled “String Functions”SUBSTR
Section titled “SUBSTR”Extracts a substring from the decimal string representation of an int64 column.
SUBSTR(column, start, length)startis 1-based (first character = 1)- Result is converted back to int64
-- Extract first 3 digits of priceSELECT SUBSTR(price, 1, 3) AS price_prefix FROM trades WHERE symbol = 1
-- Extract last 2 digitsSELECT SUBSTR(price, 4, 2) AS price_suffix FROM tradesData Types & Timestamp Arithmetic
Section titled “Data Types & Timestamp Arithmetic”ZeptoDB supports fixed-size numeric columns and dictionary-encoded strings.
| Logical type | Storage | DDL keyword | Notes |
|---|---|---|---|
| Integer | int64 | INT64, BIGINT | Direct |
| Integer (32-bit) | int32 | INT32, INT | Direct |
| Float | double | FLOAT64, DOUBLE | Native IEEE 754 |
| Float (32-bit) | float | FLOAT32, FLOAT | Native IEEE 754 |
| Timestamp | int64 | TIMESTAMP, TIMESTAMP_NS | Nanoseconds since Unix epoch |
| Symbol ID | uint32 | SYMBOL | Numeric symbol identifier (legacy) |
| String | uint32 (dict code) | STRING, VARCHAR, TEXT | Dictionary-encoded (LowCardinality) |
| Boolean | uint8 | BOOL, BOOLEAN | 0 or 1 |
| NULL | INT64_MIN | — | Used for IS NULL checks |
String (Dictionary-Encoded)
Section titled “String (Dictionary-Encoded)”String columns use dictionary encoding internally — each unique string is assigned a uint32 code.
This is optimal for low-cardinality columns (symbol, exchange, side, currency).
-- Insert with string symbolINSERT INTO trades (symbol, price, volume) VALUES ('AAPL', 150.25, 100)
-- Query with string symbolSELECT price FROM trades WHERE symbol = 'AAPL'SELECT symbol, SUM(volume) FROM trades GROUP BY symbol
-- Works with all operations: aggregation, JOIN, ORDER BY, LIMIT, etc.SELECT VWAP(price, volume) FROM trades WHERE symbol = 'GOOGL'Integer symbol IDs (legacy) remain fully supported:
INSERT INTO trades VALUES (1, 15000, 100, 1711234567000000000)SELECT price FROM trades WHERE symbol = 1Timestamp arithmetic
Section titled “Timestamp arithmetic”-- 1 minute agoNOW() - 60000000000
-- Last 5 minutesWHERE timestamp > NOW() - 300000000000Unit reference:
| Unit | Nanoseconds |
|---|---|
| 1 ns | 1 |
| 1 μs | 1_000 |
| 1 ms | 1_000_000 |
| 1 s | 1_000_000_000 |
| 1 min | 60_000_000_000 |
| 1 hour | 3_600_000_000_000 |
| 1 day | 86_400_000_000_000 |
DDL (Data Definition Language)
Section titled “DDL (Data Definition Language)”CREATE TABLE
Section titled “CREATE TABLE”CREATE TABLE orders ( symbol INT64, price INT64, volume INT64, timestamp TIMESTAMP_NS)Supported types: INT64, INT32, FLOAT64, FLOAT32, TIMESTAMP_NS, SYMBOL, BOOL
DROP TABLE
Section titled “DROP TABLE”DROP TABLE ordersDROP TABLE IF EXISTS ordersALTER TABLE
Section titled “ALTER TABLE”-- Add columnALTER TABLE orders ADD COLUMN risk_score INT64
-- Drop columnALTER TABLE orders DROP COLUMN risk_score
-- Set TTL (auto-evict old partitions)ALTER TABLE trades SET TTL 30 DAYSALTER TABLE trades SET TTL 24 HOURS
-- Set index attribute (s#/g#/p#)ALTER TABLE trades SET ATTRIBUTE price GROUPED -- g# hash indexALTER TABLE trades SET ATTRIBUTE exchange PARTED -- p# parted indexALTER TABLE trades SET ATTRIBUTE timestamp SORTED -- s# sorted indexDML (Data Manipulation Language)
Section titled “DML (Data Manipulation Language)”INSERT
Section titled “INSERT”-- Single rowINSERT INTO trades VALUES (1, 15000, 100, 1711234567000000000)
-- Multi-rowINSERT INTO trades VALUES (1, 15050, 200, 1711234568000000000), (2, 16000, 300, 1711234569000000000)
-- Column list (timestamp auto-generated)INSERT INTO trades (symbol, price, volume) VALUES (1, 15100, 150)UPDATE
Section titled “UPDATE”UPDATE trades SET price = 15200 WHERE symbol = 1 AND price > 15100DELETE
Section titled “DELETE”DELETE FROM trades WHERE symbol = 1 AND timestamp < 1711000000000000000Index Attributes (s# / g# / p#)
Section titled “Index Attributes (s# / g# / p#)”kdb+ compatible column attributes for query acceleration.
| Attribute | Type | Complexity | Best For |
|---|---|---|---|
s# (SORTED) | Binary search | O(log n) | Range queries (BETWEEN, >, <) on monotonic columns |
g# (GROUPED) | Hash map | O(1) | Equality queries (= X) on high-cardinality columns |
p# (PARTED) | Range map | O(1) | Equality queries on low-cardinality clustered columns |
Setting attributes
Section titled “Setting attributes”ALTER TABLE trades SET ATTRIBUTE price GROUPED -- g# hash indexALTER TABLE trades SET ATTRIBUTE exchange PARTED -- p# parted indexALTER TABLE trades SET ATTRIBUTE timestamp SORTED -- s# binary searchPerformance impact
Section titled “Performance impact”| Query | No Index | g# Index | Speedup |
|---|---|---|---|
WHERE price = 15500 (1M rows) | 904μs | 3.3μs | 274x |
The executor automatically uses the best available index for WHERE conditions. Index selection priority: timestamp range → s# sorted → g#/p# equality → full scan.
EXPLAIN
Section titled “EXPLAIN”Shows the query execution plan without running the query.
EXPLAIN SELECT count(*) FROM trades WHERE symbol = 1 AND price > 15000Output includes: scan type (full/indexed/parallel), index used, estimated rows, partition count.
Catalog Queries (SHOW TABLES / DESCRIBE)
Section titled “Catalog Queries (SHOW TABLES / DESCRIBE)”SHOW TABLES
Section titled “SHOW TABLES”Lists all tables registered in the schema registry.
SHOW TABLESResponse columns: name (string), rows (int64 — total row count across all partitions).
{"columns":["name","rows"],"data":[["trades",50000],["quotes",30000]],"rows":2}DESCRIBE
Section titled “DESCRIBE”Returns the column definitions for a table.
DESCRIBE tradesResponse columns: column (string), type (string — INT64, FLOAT64, TIMESTAMP, SYMBOL, etc.).
{"columns":["column","type"],"data":[["symbol","SYMBOL"],["price","INT64"],["volume","INT64"],["timestamp","TIMESTAMP"]],"rows":4}Distributed Query Behavior
Section titled “Distributed Query Behavior”In a multi-node cluster, the QueryCoordinator routes queries using a tiered strategy:
Routing tiers
Section titled “Routing tiers”| Tier | Condition | Behavior |
|---|---|---|
| A | WHERE symbol = X | Direct routing to the owning node (zero scatter overhead) |
| A-1 | WHERE symbol IN (1,2,3) | Scatter to all nodes, each filters locally, merge results |
| A-2 | ASOF/WINDOW JOIN + symbol filter | Route to symbol’s node (both tables co-located) |
| B | No symbol filter | Scatter-gather to all nodes, merge with appropriate strategy |
Merge strategies
Section titled “Merge strategies”| Strategy | Used when | Merge logic |
|---|---|---|
| CONCAT | GROUP BY symbol | Each node owns its symbols → concatenate results |
| MERGE_GROUP_BY | GROUP BY non-symbol key (e.g. xbar) | Re-aggregate partial results across nodes |
| SCALAR_AGG | No GROUP BY, all columns are aggregates | SUM→sum, COUNT→sum, MIN→min, MAX→max, AVG→sum/count |
| CONCAT (default) | Non-aggregate SELECT | Concatenate all rows, apply post-merge ORDER BY/LIMIT |
Distributed support for SQL features
Section titled “Distributed support for SQL features”| Feature | Distributed support | Notes |
|---|---|---|
SUM(CASE WHEN ...) | ✅ Full | CASE WHEN serialized to scatter SQL via unparse_case_when |
WHERE symbol IN (...) | ✅ Full | Tier A-1: scatter + local filter + merge |
ORDER BY | ✅ Full | Post-merge sort on coordinator |
HAVING | ✅ Full | Stripped from scatter SQL, applied post-merge |
LIMIT | ✅ Full | Applied post-merge after ORDER BY |
AVG | ✅ Full | Rewritten to SUM+COUNT, reconstructed post-merge |
VWAP | ✅ Full | Rewritten to SUM(price×vol)+SUM(vol), reconstructed |
FIRST/LAST | ✅ Full | Fetches all data, sorts by timestamp, executes locally |
COUNT(DISTINCT) | ✅ Full | Fetches all data, executes locally |
| Window functions | ✅ Full | Fetches all data, executes locally |
| CTE / Subquery | ✅ Full | Fetches all data, executes locally |
STDDEV/VARIANCE/MEDIAN/PERCENTILE | ✅ Full | Fetches all data, executes locally |
SHOW TABLES | ✅ Full | Scatter to all nodes, sum row counts |
DESCRIBE | ✅ Full | Execute on any node (schema replicated via DDL broadcast) |
CREATE/DROP/ALTER TABLE | ✅ Full | DDL broadcast to all nodes |
Known Limitations
Section titled “Known Limitations”| Feature | Status |
|---|---|
Correlated subqueries (WHERE col = (SELECT ...)) | Not planned |
| Subqueries in SELECT/WHERE expressions | Not planned |
| JOINs on CTE/subquery virtual tables | Planned |
| Window functions on virtual tables | Planned |
| Float columns (native double storage) | Planned |
| String columns | Planned |
| PERCENTILE_CONT / MEDIAN | Planned |
| STDDEV / VARIANCE | Planned |
See also: Python Reference · C++ Reference · HTTP Reference