Skip to content

WAL Replication: Quorum Writes, Retry, and Backpressure

The original WalReplicator was async best-effort: data dropped silently on queue overflow, failed sends were never retried, and there was no way to distinguish partial replica failures. For a time-series database handling financial data, that’s not good enough. Here’s what we built.


Before

Queue overflow → silent data loss. Send failure → error counter incremented, data gone. No quorum concept.

After

Quorum writes with configurable W. Retry queue with exponential backoff. Optional backpressure to block producers.


In flush_batch(), the replicator sends to each replica and tallies ACK count:

flush_batch(batch):
ack_count = 0
for each replica in replicas_:
if send(replica, batch) == OK:
ack_count++
else:
enqueue_retry(replica, batch)
if quorum_w == 0:
success = (ack_count == replicas_.size()) // all must ACK
else:
success = (ack_count >= quorum_w) // W of N
quorum_wBehavior
0All replicas must ACK (strongest guarantee)
1At least 1 replica ACKs (availability over consistency)
N/2 + 1Majority quorum (typical production setting)

Failed batches go into a bounded retry queue. The send loop processes retries on every iteration:

send_loop():
while running:
batch = dequeue() // main queue
flush_batch(batch)
process_retry_queue() // retry failed batches
process_retry_queue():
for each entry in retry_queue_:
if entry.attempts >= max_retries:
drop(entry) // log + increment retry_exhausted
continue
if send(entry.replica, entry.batch) == OK:
stats_.retried++
remove(entry)
else:
entry.attempts++
struct ReplicatorConfig {
uint32_t quorum_w = 0; // 0 = all replicas
size_t retry_queue_capacity = 65536; // 64K entries
uint32_t retry_interval_ms = 100; // between retry attempts
uint32_t max_retries = 3; // then drop
bool backpressure = false; // block producer on full queue
uint32_t backpressure_timeout_ms = 500; // max block time
};

When backpressure = true and the main queue is full, the producer thread blocks instead of dropping data:

enqueue(batch):
if queue_.full():
if config_.backpressure:
stats_.backpressured++
cv_.wait_for(lock, backpressure_timeout_ms)
if queue_.full():
drop(batch) // timeout — drop as last resort
return
else:
drop(batch) // no backpressure — drop immediately
return
queue_.push(batch)

The condition variable is signaled by send_loop() after each batch swap, creating natural flow control between producer and replicator.


struct ReplicatorStats {
uint64_t sent; // successfully sent batches
uint64_t failed; // send failures (queued for retry)
uint64_t dropped; // dropped on queue overflow
uint64_t retried; // successfully retried batches
uint64_t retry_exhausted; // dropped after max_retries exceeded
uint64_t backpressured; // enqueue calls that blocked
};

These are exposed via /admin/metrics for Prometheus monitoring. Key alerts to set up:

MetricAlert ThresholdMeaning
retry_exhausted> 0Data loss — replicas unreachable beyond retry budget
backpressuredincreasingProducer faster than replication — consider more replicas
dropped> 0Queue overflow without backpressure — enable backpressure or increase capacity

Workloadquorum_wbackpressuremax_retriesRationale
HFT (latency-critical)1false1Minimize write latency; accept some data loss
Financial (correctness)N/2+1true5Majority quorum; block rather than lose data
IoT (high volume)1false3Best-effort with retries; volume too high to block
Observability0false3All replicas; drop on overload (metrics are replaceable)

All defaults match the original behavior:

FieldDefaultOriginal Behavior
quorum_w0All replicas attempted
max_retries3New (was 0 — no retries)
backpressurefalseDrop on overflow

The only behavioral change: failed sends are now retried up to 3 times before dropping. This is strictly better than the original behavior.