Skip to content

Agent Memory Python Quickstart

This quickstart shows the smallest useful Agent Memory loop:

  1. Connect to a ZeptoDB HTTP server.
  2. Store scoped memories with client-supplied embeddings.
  3. Retrieve context under a token budget.
  4. Check exact and semantic prompt cache.
  5. Store the response and write back a decision memory.

ZeptoDB does not call embedding or LLM providers from the server. The application owns embeddings, prompts, model calls, and provider credentials.


Use Docker for a local server:

Terminal window
docker run -p 8123:8123 zeptodb/zeptodb:latest

Or run the local binary:

Terminal window
./zepto_http_server --port 8123

For persistence, start the server with an Agent Memory directory:

Terminal window
./zepto_http_server --port 8123 --agent-memory-dir ./agent_memory

Terminal window
pip install zeptodb

If you are working from the source tree, use the local package environment instead.


import hashlib
import json
import zepto_py as zepto
def embed(text: str, dims: int = 8) -> list[float]:
"""Small deterministic demo embedding. Replace with your provider."""
digest = hashlib.sha256(text.encode("utf-8")).digest()
return [((digest[i] / 255.0) * 2.0) - 1.0 for i in range(dims)]
def call_model(prompt: str) -> str:
"""Mock provider call. Replace with your OpenAI/Anthropic/local model call."""
return "Check press-7 vibration, inspect bearing wear, and compare the last maintenance window."
db = zepto.connect("localhost", 8123)
tenant_id = "factory-a"
namespace = "maintenance"
user_id = "operator-12"
session_id = "shift-2026-05-28"
agent_id = "maintenance-agent"
# 1. Seed memories.
db.memory.put(
tenant_id=tenant_id,
namespace=namespace,
user_id=user_id,
session_id=session_id,
agent_id=agent_id,
type="incident",
content="Press-7 vibration rose before bearing wear was found during the last inspection.",
metadata_json=json.dumps({"machine_id": "press-7", "source": "maintenance_log"}),
embedding=embed("press-7 vibration bearing wear"),
token_count=14,
importance=0.9,
pinned=True,
)
db.memory.put(
tenant_id=tenant_id,
namespace=namespace,
user_id=user_id,
session_id=session_id,
agent_id=agent_id,
type="runbook",
content="For rising vibration, compare current, temperature, lubrication, and recent bearing service.",
metadata_json=json.dumps({"machine_id": "press-7", "source": "runbook"}),
embedding=embed("rising vibration inspection checklist"),
token_count=13,
importance=0.7,
)
# 2. Retrieve context for the current question.
question = "Why is press-7 vibration rising?"
context = db.memory.get_context(
tenant_id=tenant_id,
namespace=namespace,
user_id=user_id,
session_id=session_id,
agent_id=agent_id,
query_embedding=embed(question),
token_budget=256,
limit=5,
)
memory_lines = [
f"- {m.get('content', '')}"
for m in context.get("memories", [])
]
prompt = "\n".join([
"Use the retrieved operational memory to answer the question.",
"",
"Question:",
question,
"",
"Retrieved memory:",
*memory_lines,
])
# 3. Check exact/semantic cache before calling a model provider.
cached = db.cache.lookup(
prompt,
embedding=embed(prompt),
tenant_id=tenant_id,
namespace=namespace,
semantic_threshold=0.92,
)
if cached.get("hit"):
response = cached.get("entry", {}).get("response", "")
source = f"cache:{cached.get('kind', 'unknown')}"
else:
response = call_model(prompt)
db.cache.store(
prompt,
response,
embedding=embed(prompt),
tenant_id=tenant_id,
namespace=namespace,
metadata_json=json.dumps({"question": question, "agent_id": agent_id}),
token_count=len(response.split()),
)
source = "provider"
# 4. Write back the decision as memory.
decision_id = db.memory.put(
tenant_id=tenant_id,
namespace=namespace,
user_id=user_id,
session_id=session_id,
agent_id=agent_id,
type="decision",
content=response,
metadata_json=json.dumps({"question": question, "source": source}),
embedding=embed(response),
token_count=len(response.split()),
importance=0.85,
)
print({
"source": source,
"context_memories": len(context.get("memories", [])),
"context_tokens": context.get("token_count", 0),
"decision_id": decision_id,
"response": response,
})

Agent Memory becomes more useful when it sits beside live time-series evidence. In a real agent, retrieve recent rows first, then combine SQL evidence with memory context:

recent = db.query("""
SELECT ts, vibration, temperature, current
FROM machine_sensors
WHERE machine_id = 'press-7'
ORDER BY ts
""")
evidence = recent.to_dict()

The agent prompt can then include both:

  • recent time-series evidence from SQL
  • retrieved memories from db.memory.get_context(...)
  • cache result from db.cache.lookup(...)
  • decision write-back through db.memory.put(...)