Agent Memory for AI Agents & AgentOps

Time-series tells an agent what happened. Agent Memory tells it what mattered last time. ZeptoDB puts both on the same operational substrate.

Why agents need time-series memory

Most agent memory systems store summaries or embeddings without the live event stream that made those memories true. That is useful for chat, but weak for operational agents. A factory agent, trading agent, robotics agent, or incident-response agent needs to know the exact sequence:

Which signals changed?
Which memories were retrieved?
Was a cached answer reused?
Which tool or model was called?
What action followed?
What happened afterward?

ZeptoDB handles that shape directly. The time-series engine stores events, metrics, tool calls, model calls, and outcomes. The Agent Memory layer stores tenant- and session-scoped memories, embeddings, context windows, prompt cache entries, retention policy, and replay metadata.

What the combined system unlocks

Grounded agent recall

Retrieve memories by tenant, namespace, user, session, agent, type, TTL, importance, recency, access count, and embedding similarity, then tie the result back to raw time-series evidence.
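To make the filter-then-rank idea concrete, here is a minimal in-memory sketch. It is not ZeptoDB's ranking formula, which this page does not specify: the `Memory` dataclass, the `search` function, the half-life decay, and the weights are all assumptions chosen for illustration.

```python
import math
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    tenant_id: str
    namespace: str
    type: str
    content: str
    embedding: list
    importance: float                    # 0.0 .. 1.0
    created_at: float                    # Unix seconds
    ttl_seconds: Optional[float] = None  # None means no expiry


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(memories, tenant_id, namespace, query_embedding, now=None,
           limit=8, half_life_s=3600.0, weights=(0.6, 0.25, 0.15)):
    """Filter by scope and TTL, then rank by a weighted blend.

    The weights (similarity, importance, recency) are hypothetical.
    """
    now = time.time() if now is None else now
    w_sim, w_imp, w_rec = weights
    scored = []
    for m in memories:
        if m.tenant_id != tenant_id or m.namespace != namespace:
            continue  # outside the requested scope
        if m.ttl_seconds is not None and now - m.created_at > m.ttl_seconds:
            continue  # expired by TTL
        recency = 0.5 ** ((now - m.created_at) / half_life_s)
        score = (w_sim * cosine(query_embedding, m.embedding)
                 + w_imp * m.importance
                 + w_rec * recency)
        scored.append((score, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

Filtering before scoring keeps tenant isolation independent of the ranking weights, so tuning the blend cannot leak another tenant's memories.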

Operational prompt cache

Check exact and semantic cache entries before calling an external provider. Repeated incidents, alerts, queries, and support flows can reuse prior responses when application policy allows it.
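The exact-then-semantic order can be sketched with a small in-memory stand-in. This is an illustration of the lookup policy only, not the server implementation: the `PromptCache` class and the `kind` and `similarity` fields are hypothetical, while the `hit` and `entry.response` fields mirror the shape used in the Python sketch further down.

```python
import hashlib
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PromptCache:
    """Toy cache: exact match on a prompt hash first, then embedding similarity."""

    def __init__(self):
        self._exact = {}     # (tenant, namespace, sha256(prompt)) -> response
        self._semantic = []  # (tenant, namespace, embedding, response)

    @staticmethod
    def _key(tenant_id, namespace, prompt):
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        return (tenant_id, namespace, digest)

    def store(self, tenant_id, namespace, prompt, response, embedding=None):
        self._exact[self._key(tenant_id, namespace, prompt)] = response
        if embedding is not None:
            self._semantic.append((tenant_id, namespace, list(embedding), response))

    def lookup(self, tenant_id, namespace, prompt, embedding=None,
               semantic_threshold=0.92):
        key = self._key(tenant_id, namespace, prompt)
        if key in self._exact:
            return {"hit": True, "kind": "exact",
                    "entry": {"response": self._exact[key]}}
        best_score, best_response = 0.0, None
        if embedding is not None:
            for t, n, emb, response in self._semantic:
                if (t, n) != (tenant_id, namespace):
                    continue  # never reuse across tenants or namespaces
                score = _cosine(embedding, emb)
                if score > best_score:
                    best_score, best_response = score, response
        if best_response is not None and best_score >= semantic_threshold:
            return {"hit": True, "kind": "semantic", "similarity": best_score,
                    "entry": {"response": best_response}}
        return {"hit": False}
```

Whether a semantic hit is safe to reuse is an application decision; the threshold only bounds how different the two prompts may be.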

Replayable decisions

Store agent_runs, retrieval_events, cache_events, llm_calls, llm_errors, tool_calls, context_traces, and context_replay_events as time-series tables.
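As a rough picture of what replay over those tables could look like, the sketch below uses Python's built-in `sqlite3` as a stand-in for the ZeptoDB SQL engine. The table names come from the list above, but the columns (`ts`, `run_id`, `outcome`, `model`, `tool`) and the sample rows are invented for illustration and may not match the real schema.

```python
import sqlite3

# In-memory stand-in database with hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cache_events (ts INTEGER, run_id TEXT, outcome TEXT);
    CREATE TABLE llm_calls    (ts INTEGER, run_id TEXT, model TEXT);
    CREATE TABLE tool_calls   (ts INTEGER, run_id TEXT, tool TEXT);
""")
conn.execute("INSERT INTO cache_events VALUES (100, 'run-1', 'miss')")
conn.execute("INSERT INTO llm_calls VALUES (105, 'run-1', 'model-x')")
conn.execute("INSERT INTO tool_calls VALUES (110, 'run-1', 'open_work_order')")
conn.execute("INSERT INTO tool_calls VALUES (111, 'run-2', 'noop')")

# One timeline per run: union the event tables and order by timestamp.
REPLAY_SQL = """
    SELECT ts, 'cache_events' AS source, outcome AS detail
      FROM cache_events WHERE run_id = :run
    UNION ALL
    SELECT ts, 'llm_calls', model FROM llm_calls WHERE run_id = :run
    UNION ALL
    SELECT ts, 'tool_calls', tool FROM tool_calls WHERE run_id = :run
    ORDER BY ts
"""


def replay(run_id):
    """Return the ordered (ts, source, detail) chain for one agent run."""
    return conn.execute(REPLAY_SQL, {"run": run_id}).fetchall()
```

Calling `replay("run-1")` yields the cache decision, the model call, and the tool call in time order, which is the shape a post-incident review usually needs.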

Live context assembly

Pull the latest facts from SQL, retrieve durable memories from Agent Memory, fit both under a token budget, and send one grounded context packet to the model.
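Fitting live facts and memories under a token budget can be sketched as a greedy pack with deduplication. This is a simplified illustration, not the server's assembly logic: `assemble_context`, the word-count token estimate, and the "live facts first" ordering are assumptions.

```python
def estimate_tokens(text):
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def assemble_context(live_facts, memories, token_budget):
    """Pack live facts, then memories by descending score, under a budget.

    live_facts: list of strings from the time-series query.
    memories:   list of (score, text) pairs from memory search.
    Returns (packet, tokens_used).
    """
    ranked = [text for _, text in
              sorted(memories, key=lambda pair: pair[0], reverse=True)]
    packet, seen, used = [], set(), 0
    for text in list(live_facts) + ranked:
        key = " ".join(text.lower().split())  # normalize case and spacing
        if key in seen:
            continue  # deduplicate repeated content
        cost = estimate_tokens(text)
        if used + cost > token_budget:
            continue  # skip this item; a smaller later item may still fit
        seen.add(key)
        packet.append(text)
        used += cost
    return packet, used
```

In practice the estimate would come from the tokenizer of whichever model the application calls, since ZeptoDB leaves model choice to the application.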

Cluster-aware operation

Route writes, point reads, search/context, and semantic cache fallback across Agent Memory nodes, then inspect local or cluster-scoped stats.

Retention and ANN controls

Bound memory with tenant quotas, TTL/capacity eviction, tombstones, rollback, and optional sparse/HNSW/IVF ANN candidate indexes.

Example flow

1. Live system emits time-series events
   sensors, ticks, traces, alerts, tool calls, model calls

2. Agent asks a question or receives an alert
   "Why is press-7 vibration rising?"

3. ZeptoDB retrieves live evidence
   last 10 minutes of vibration, temperature, current, maintenance events

4. Agent Memory retrieves context
   similar failures, pinned maintenance notes, prior diagnoses, cache hits

5. Provider call happens only if needed
   exact cache hit -> reuse
   semantic cache hit -> reuse with threshold
   cache miss -> call model

6. Agent writes back
   decision summary, action, confidence, follow-up, tool results

7. AgentOps timeline remains queryable
   replay the full chain later with SQL

API surface

Surface	Use
`POST /api/ai/memories`	Store a memory record with metadata and optional embedding
`POST /api/ai/memories/search`	Search memories with filters, ranking, and top-K retrieval
`POST /api/ai/context`	Assemble deduplicated memory context under a token budget
`POST /api/ai/cache/store`	Store exact/semantic prompt cache entries
`POST /api/ai/cache/lookup`	Lookup exact prompt matches or embedding-similar cached responses
`GET /api/ai/stats`	Inspect local memory/cache counts, capacity, snapshot health, ANN footprint, and failover status
`GET /api/ai/stats?scope=cluster`	Aggregate Agent Memory stats across configured nodes and report partial failures
`connection.memory`	Python helper for memory write/search/context calls
`connection.cache`	Python helper for cache store/lookup calls

ZeptoDB deliberately does not call embedding providers or LLM providers from the server. Your application owns provider choice, model choice, prompts, and embeddings. ZeptoDB owns fast storage, filtering, ranking, context assembly, cache lookup, and telemetry.

Python sketch

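import zepto_py as zepto

# embed() and call_model() are application-supplied helpers: ZeptoDB does not
# call embedding or LLM providers, so define them before running this sketch.
conn = zepto.connect("localhost", 8123)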

recent = conn.query("""
    SELECT ts, vibration, temperature, current
    FROM machine_sensors
    WHERE machine_id = 'press-7'
      AND ts > now() - interval '10 minutes'
    ORDER BY ts
""")

context = conn.memory.get_context(
    tenant_id="factory-a",
    namespace="maintenance",
    user_id="operator-12",
    session_id="shift-2026-05-27",
    agent_id="maintenance-agent",
    query_embedding=embed("press-7 rising vibration"),
    limit=8,
    token_budget=1800,
)

question = "Why is press-7 vibration rising?"
question_embedding = embed(question)  # embed once, reuse for lookup and store

cached = conn.cache.lookup(
    tenant_id="factory-a",
    namespace="maintenance",
    prompt=question,
    embedding=question_embedding,
    semantic_threshold=0.92,
)

if cached.get("hit"):
    # Exact or semantic hit: reuse the stored response, skip the provider call.
    answer = cached.get("entry", {}).get("response", "")
else:
    # Cache miss: call the model, then store the answer for later reuse.
    answer = call_model(recent, context)
    conn.cache.store(
        tenant_id="factory-a",
        namespace="maintenance",
        prompt=question,
        response=answer,
        embedding=question_embedding,
    )

conn.memory.put(
    tenant_id="factory-a",
    namespace="maintenance",
    user_id="operator-12",
    session_id="shift-2026-05-27",
    agent_id="maintenance-agent",
    type="decision",
    content=answer,
    metadata_json='{"machine_id":"press-7","source":"agent"}',
    embedding=embed(answer),
    importance=0.87,
    pinned=False,
)
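The sketch above relies on two helpers that the application must provide, `embed` and `call_model`. For a local dry run without any provider, placeholders along these lines may be enough. Both are hypothetical stand-ins: the hashed bag-of-words vector carries no real semantic meaning, and the canned answer only exercises the cache and write-back path.

```python
import hashlib
import math


def embed(text, dim=16):
    """Deterministic placeholder embedding.

    Hashes each word into one of `dim` buckets and L2-normalizes the counts.
    Replace with a real embedding provider before relying on similarity.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def call_model(recent, context):
    """Placeholder for the provider call; ignores its inputs on purpose."""
    return "stub answer: replace call_model with a real provider call"
```

Because the placeholder is deterministic, repeated runs produce the same vectors, which makes exact and semantic cache behavior easy to observe during testing.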

AgentOps trace and replay

Agent Memory examples now include concrete AgentOps mappings:

OpenTelemetry GenAI spans map to llm_calls, llm_errors, cache_events, and tool_calls.
Context trace rows record selected memories, rank, score, similarity, token count, and reason.
Context replay rows record time-series queries and evidence windows around a decision.
These tables remain queryable with the same SQL engine that stores the operational timeline.

This keeps the agent path inspectable: what the agent saw, which memories it used, which cache decision happened, which provider/tool call ran, and which evidence window justified the answer.
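One way to picture the span-to-table mapping is a small routing function over span attributes. The attribute names `gen_ai.operation.name` and `error.type` follow the OpenTelemetry semantic conventions; the `app.cache.outcome` attribute, the routing order, and the `route_span` function itself are assumptions for illustration, not the mapping shipped in the examples.

```python
def route_span(attributes):
    """Pick a destination table for one span, given its attribute dict.

    Returns a table name, or None when the span is not a GenAI span.
    """
    if "error.type" in attributes:
        # Assumption: any failed GenAI span is recorded as an error row.
        return "llm_errors"
    if "app.cache.outcome" in attributes:  # hypothetical custom attribute
        return "cache_events"
    operation = attributes.get("gen_ai.operation.name")
    if operation == "execute_tool":
        return "tool_calls"
    if operation in {"chat", "text_completion", "embeddings"}:
        return "llm_calls"
    return None
```

A real exporter would also copy timestamps, trace IDs, and token usage into the row; the sketch only shows the routing decision.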

Vertical patterns

Industrial agents

Combine high-frequency vibration, temperature, current, and work-order timelines with prior diagnoses and cached maintenance guidance.

Trading agents

Pair tick-by-tick market state with strategy memory, risk decisions, execution outcomes, and compliance replay.

Robotics agents

Keep sensor fusion data, action outcomes, suppressions, operator interventions, and policy notes together as replayable episodes. The Physical AI Action-Outcome line is currently research-only; see the grounded overview before treating it as a production feature.

Action-Outcome Memory →

Observability agents

Join metrics, traces, deploys, incidents, runbooks, cache hits, LLM calls, and remediation actions in one timeline.

Performance shape

Operation	Sample result
Time-series ingestion	5.52M events/sec on a single node
Time-series query	272μs filter over 1M rows
Python zero-copy	522ns query result to NumPy
Memory search, 10K records	1.23ms p50, 1.40ms p95
Context assembly, 10K records	1.34ms p50, 1.41ms p95
Exact cache lookup	<0.01ms p50 (reported as 0.00ms)
Semantic cache lookup	0.07ms p50

Exact filtered scan remains the baseline. Optional sparse projection, HNSW, and IVF candidate indexes can reduce semantic candidate latency while preserving final filtering and ranking. Stats expose indexed vectors, rebuilds, fallbacks, memory bytes, tombstone entries, and sidecar byte counts.
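The relationship between a candidate index and the exact baseline can be sketched as follows. This is a conceptual illustration only: `semantic_search`, the `candidate_index` callable, and the record layout are hypothetical, and no real HNSW or IVF structure is built here.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(records, query, predicate, limit, candidate_index=None):
    """Rank records by exact similarity, optionally narrowed by an index.

    records:         dict of id -> (embedding, metadata)
    predicate:       metadata filter applied to every candidate
    candidate_index: optional callable(query) -> list of ids, or None when
                     the index is unavailable (for example, mid-rebuild)
    Returns (ranked_ids, used_fallback).
    """
    candidate_ids = candidate_index(query) if candidate_index else None
    used_fallback = candidate_ids is None
    if used_fallback:
        candidate_ids = list(records)  # exact scan over everything
    scored = []
    for rid in candidate_ids:
        if rid not in records:
            continue  # stale id, e.g. a record that was since deleted
        embedding, metadata = records[rid]
        if not predicate(metadata):
            continue  # final filtering is identical on both paths
        scored.append((cosine(query, embedding), rid))
    scored.sort(reverse=True)
    return [rid for _, rid in scored[:limit]], used_fallback
```

The index only shrinks the candidate set; filtering and scoring run the same way on both paths, so a fallback changes latency but not the ranking rules.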

Current operating model

Agent Memory now has a multi-node operating path for routed writes, point reads, fan-out search/context, semantic-cache fan-out, owner-local persistence, replica WAL policy, delete and eviction tombstones, tenant quotas, local/cluster stats, and owner-failover status.

Operationally important details:

GET /api/ai/stats returns local counts, snapshot health, ANN footprint, eviction config, and last owner-failover result.
GET /api/ai/stats?scope=cluster aggregates node stats and partial failures.
Tenant quotas run before global caps; pinned memories are protected from capacity eviction, but TTL expiry still removes them.
Automatic TTL, tenant-quota, and capacity evictions emit tombstones.
When a tombstone cannot be persisted durably, the side effects of the corresponding capacity eviction are rolled back.
Shard migration dual-write/catch-up remains future work outside the current routed Agent Memory path.
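The eviction ordering described above (TTL expiry, then tenant quotas, then the global cap, with a tombstone per eviction) can be sketched in a few lines. This is an illustration under assumptions: the `Rec` dataclass, the lowest-importance-first victim choice, and the choice to protect pinned records from tenant-quota eviction as well as capacity eviction are not confirmed by this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rec:
    id: str
    tenant_id: str
    importance: float
    expires_at: Optional[float] = None  # Unix seconds; None means no TTL
    pinned: bool = False


def evict(records, now, tenant_quota, global_cap):
    """Return (live_records, tombstones) after one eviction pass."""
    tombstones, live = [], []

    # 1. TTL expiry removes records even when they are pinned.
    for r in records:
        if r.expires_at is not None and r.expires_at <= now:
            tombstones.append((r.id, "ttl"))
        else:
            live.append(r)

    def trim(pool, cap, reason):
        # Evict unpinned records, lowest importance first (hypothetical policy).
        excess = max(len(pool) - cap, 0)
        victims = sorted((r for r in pool if not r.pinned),
                         key=lambda r: r.importance)
        for r in victims[:excess]:
            live.remove(r)
            tombstones.append((r.id, reason))

    # 2. Tenant quotas run before the global cap.
    for tenant in sorted({r.tenant_id for r in live}):
        trim([r for r in live if r.tenant_id == tenant],
             tenant_quota, "tenant_quota")

    # 3. Global capacity cap.
    trim(list(live), global_cap, "capacity")
    return live, tombstones
```

If every remaining record in a pool is pinned, the sketch simply stops evicting and leaves the pool over its cap; how the real system reports that condition is not covered here.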

Build from the public repo on GitHub, or discuss your agent workflow in GitHub Discussions.

Start in Python

If you want the shortest runnable path, use the Agent Memory Python Quickstart. It walks through memory writes, context retrieval, cache lookup, provider fallback, and decision write-back.

For robot and embodied-agent memory, read Physical AI Memory and Action-Outcome Memory, then see the research evidence for the current controlled-pilot boundary.