Agent Memory Python Quickstart
This quickstart shows the smallest useful Agent Memory loop:
- Connect to a ZeptoDB HTTP server.
- Store scoped memories with client-supplied embeddings.
- Retrieve context under a token budget.
- Check the exact and semantic prompt cache.
- Store the response and write back a decision memory.
ZeptoDB does not call embedding or LLM providers from the server. The application owns embeddings, prompts, model calls, and provider credentials.
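Because the application supplies every embedding, checks such as consistent dimensions and vector similarity also sit on the application side. The sketch below is an illustration under stated assumptions: it uses cosine similarity as the comparison, which this quickstart does not confirm as the metric ZeptoDB applies, and `cosine_similarity` is a hypothetical helper, not part of the client.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (hypothetical helper).

    Returns 0.0 if either vector is all zeros, since the angle is undefined.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hand-written vectors stand in for real provider embeddings.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(f"same direction: {same_direction:.3f}, orthogonal: {orthogonal:.3f}")
```

Under this assumption, a similarity threshold is only meaningful when every vector comes from the same embedding model with the same dimension, which is why the helper rejects mismatched lengths.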
Start ZeptoDB
Use Docker for a local server:

    docker run -p 8123:8123 zeptodb/zeptodb:latest

Or run the local binary:

    ./zepto_http_server --port 8123

For persistence, start the server with an Agent Memory directory:

    ./zepto_http_server --port 8123 --agent-memory-dir ./agent_memory

Install the Python client

    pip install zeptodb

If you are working from the source tree, use the local package environment instead.
Run one memory turn

    import hashlib
    import json

    import zepto_py as zepto


    def embed(text: str, dims: int = 8) -> list[float]:
        """Small deterministic demo embedding. Replace with your provider."""
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [((digest[i] / 255.0) * 2.0) - 1.0 for i in range(dims)]


    def call_model(prompt: str) -> str:
        """Mock provider call. Replace with your OpenAI/Anthropic/local model call."""
        return "Check press-7 vibration, inspect bearing wear, and compare the last maintenance window."


    db = zepto.connect("localhost", 8123)

    tenant_id = "factory-a"
    namespace = "maintenance"
    user_id = "operator-12"
    session_id = "shift-2026-05-28"
    agent_id = "maintenance-agent"

    # 1. Seed memories.
    db.memory.put(
        tenant_id=tenant_id,
        namespace=namespace,
        user_id=user_id,
        session_id=session_id,
        agent_id=agent_id,
        type="incident",
        content="Press-7 vibration rose before bearing wear was found during the last inspection.",
        metadata_json=json.dumps({"machine_id": "press-7", "source": "maintenance_log"}),
        embedding=embed("press-7 vibration bearing wear"),
        token_count=14,
        importance=0.9,
        pinned=True,
    )

    db.memory.put(
        tenant_id=tenant_id,
        namespace=namespace,
        user_id=user_id,
        session_id=session_id,
        agent_id=agent_id,
        type="runbook",
        content="For rising vibration, compare current, temperature, lubrication, and recent bearing service.",
        metadata_json=json.dumps({"machine_id": "press-7", "source": "runbook"}),
        embedding=embed("rising vibration inspection checklist"),
        token_count=13,
        importance=0.7,
    )

    # 2. Retrieve context for the current question.
    question = "Why is press-7 vibration rising?"
    context = db.memory.get_context(
        tenant_id=tenant_id,
        namespace=namespace,
        user_id=user_id,
        session_id=session_id,
        agent_id=agent_id,
        query_embedding=embed(question),
        token_budget=256,
        limit=5,
    )

    memory_lines = [
        f"- {m.get('content', '')}"
        for m in context.get("memories", [])
    ]
    prompt = "\n".join([
        "Use the retrieved operational memory to answer the question.",
        "",
        "Question:",
        question,
        "",
        "Retrieved memory:",
        *memory_lines,
    ])

    # 3. Check exact/semantic cache before calling a model provider.
    cached = db.cache.lookup(
        prompt,
        embedding=embed(prompt),
        tenant_id=tenant_id,
        namespace=namespace,
        semantic_threshold=0.92,
    )

    if cached.get("hit"):
        response = cached.get("entry", {}).get("response", "")
        source = f"cache:{cached.get('kind', 'unknown')}"
    else:
        response = call_model(prompt)
        db.cache.store(
            prompt,
            response,
            embedding=embed(prompt),
            tenant_id=tenant_id,
            namespace=namespace,
            metadata_json=json.dumps({"question": question, "agent_id": agent_id}),
            token_count=len(response.split()),
        )
        source = "provider"

    # 4. Write back the decision as memory.
    decision_id = db.memory.put(
        tenant_id=tenant_id,
        namespace=namespace,
        user_id=user_id,
        session_id=session_id,
        agent_id=agent_id,
        type="decision",
        content=response,
        metadata_json=json.dumps({"question": question, "source": source}),
        embedding=embed(response),
        token_count=len(response.split()),
        importance=0.85,
    )

    print({
        "source": source,
        "context_memories": len(context.get("memories", [])),
        "context_tokens": context.get("token_count", 0),
        "decision_id": decision_id,
        "response": response,
    })

Add live evidence
Agent Memory becomes more useful when it sits beside live time-series evidence. In a real agent, retrieve recent rows first, then combine SQL evidence with memory context:

    recent = db.query("""
        SELECT ts, vibration, temperature, current
        FROM machine_sensors
        WHERE machine_id = 'press-7'
        ORDER BY ts
    """)

    evidence = recent.to_dict()

The agent turn can then combine:
- recent time-series evidence from SQL
- retrieved memories from db.memory.get_context(...)
- cache result from db.cache.lookup(...)
- decision write-back through db.memory.put(...)
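One way to merge the first two items into a single prompt is sketched below. It is an illustration, not the client's API: `build_prompt` is a hypothetical helper, the sample rows and memory are made-up stand-in values, and the code assumes the SQL evidence has already been converted to a list of row dictionaries (the actual shape returned by `recent.to_dict()` is not specified here and may need adapting).

```python
def build_prompt(question: str, evidence_rows: list[dict], memories: list[dict]) -> str:
    """Assemble one prompt from SQL evidence rows and retrieved memories (hypothetical helper)."""
    evidence_lines = [
        "- " + ", ".join(f"{key}={value}" for key, value in row.items())
        for row in evidence_rows
    ]
    memory_lines = [f"- {m.get('content', '')}" for m in memories]
    return "\n".join([
        "Use the live evidence and retrieved operational memory to answer the question.",
        "",
        "Question:",
        question,
        "",
        "Recent evidence:",
        *(evidence_lines or ["- (none)"]),
        "",
        "Retrieved memory:",
        *(memory_lines or ["- (none)"]),
    ])


# Made-up stand-in data. In the quickstart these would come from
# db.query(...) and db.memory.get_context(...); the row shape is an assumption.
sample_rows = [
    {"ts": "2026-05-28T08:00:00Z", "vibration": 4.1, "temperature": 71.5, "current": 12.3},
    {"ts": "2026-05-28T08:05:00Z", "vibration": 4.8, "temperature": 72.0, "current": 12.9},
]
sample_memories = [
    {"content": "Press-7 vibration rose before bearing wear was found during the last inspection."},
]

prompt = build_prompt("Why is press-7 vibration rising?", sample_rows, sample_memories)
print(prompt)
```

Keeping evidence and memory in separately labeled sections lets the model distinguish current measurements from historical context. The resulting prompt can then go through the same cache lookup and decision write-back steps as in the single-turn example.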
Next steps
- Agent Memory Guide: full API surface, performance shape, and current operating model
- Why Time-Series Matters: why operational memory needs a replayable event timeline
- Benchmarks: search, context, cache, snapshot, ingestion, and query latency numbers