
reflect-memory — the construct

One loop: capture what you teach the agent, index it three ways, recall the right piece into the next session. Everything in the Recall reference is a tuning knob on this loop.

For why this exists and how it compares to bare-harness memory, see Problem & fit.

The loop

┌─────────┐  signal  ┌──────────┐  markdown  ┌──────────────┐
│ session │ ───────▶ │ capture  │ ─────────▶ │  KB (notes)  │ ◀── source of truth
└─────────┘"no—don't"└──────────┘  + sidecar └──────┬───────┘
                                                    │ index
                          ┌─────────────────────────┼─────────────────────────┐
                          ▼                         ▼                         ▼
                  ┌───────────────┐         ┌───────────────┐         ┌───────────────┐
                  │  QMD (BM25)   │         │ vector (ANN)  │         │ graph (edges) │
                  │ index.sqlite  │         │ nano-graphrag │         │ nano-graphrag │
                  └───────┬───────┘         └───────┬───────┘         └───────┬───────┘
                          └──────── recall ─────────┴─────── RRF fuse ────────┘
                                                    │ rerank · gate · budget
                                                    ▼
                                        inject into next session
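The capture arrow in the loop (session to capture to KB, markdown + sidecar) can be sketched as below. This is a minimal illustration, not reflect's hook: the trigger regex, the window sizes, the Python types, and the ASCII sidecar line format are all assumptions. Only the field names (title/category/tags/confidence/problem/fix/rule) and the source/relation/target edge shape come from the Capture stage. The summarisation that turns a sliced window into a Learning (done by the harness LLM via claude -p) is not shown.

```python
import re
from dataclasses import dataclass, field

# Hypothetical trigger words; reflect's real signal detector is not documented here.
CORRECTION_SIGNAL = re.compile(r"\b(no|don't|do not|stop|instead|wrong)\b", re.IGNORECASE)


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


@dataclass
class Edge:
    """One entity-sidecar edge: source -[relation]-> target."""
    source: str
    relation: str
    target: str


@dataclass
class Learning:
    """Field names from the Capture stage; the types are assumptions."""
    title: str
    category: str
    tags: list[str]
    confidence: float
    problem: str
    fix: str
    rule: str
    edges: list[Edge] = field(default_factory=list)


def find_signals(turns: list[Turn]) -> list[int]:
    """Indices of user turns that look like a correction or decision."""
    return [i for i, t in enumerate(turns)
            if t.role == "user" and CORRECTION_SIGNAL.search(t.text)]


def slice_window(turns: list[Turn], i: int, before: int = 2, after: int = 1) -> list[Turn]:
    """Keep only the turns around the signal instead of the whole transcript."""
    return turns[max(0, i - before): i + after + 1]


def render_sidecar(learning: Learning) -> str:
    """Serialise edges one per line (hypothetical on-disk format)."""
    return "\n".join(f"{e.source} -[{e.relation}]-> {e.target}" for e in learning.edges)
```

Slicing before summarising keeps the text handed to the LLM short and focused on the correction itself.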
| Stage | What happens | Where it lives |
|---|---|---|
| Capture | A hook detects a correction or decision signal and slices out just the relevant dialogue window; the drain then writes a structured learning (title/category/tags/confidence/problem/fix/rule) plus an entity sidecar (A —[relation]→ B). Capture summarisation reuses the harness LLM (claude -p), so no extra key is needed. | ~/.learnings/ (markdown + sidecars) |
| Index | reflect reindex registers each note in three arms: BM25 lexical (QMD), vector embedding (nano-graphrag, all-mpnet-base-v2), and the entity graph (nano-graphrag). | QMD index.sqlite + nano-graphrag store |
| Recall | Build a query from project/branch/prompt context, run all three arms, RRF-fuse their rankings, cross-encoder rerank, OOD-gate (inject nothing if nothing fits), then pack the survivors into the inject block under a token budget. | client-side, every session |
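The last Recall steps (gate, budget, inject) might look roughly like this, assuming the cross-encoder has already attached a relevance score to each fused candidate. The Candidate type, the score threshold, the 4-characters-per-token estimate, and the greedy packing policy are assumptions for illustration, not reflect's documented behaviour.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    note_id: str
    score: float  # reranker relevance, higher is better (assumed scale)
    text: str


def approx_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); a real packer would use a tokenizer.
    return max(1, len(text) // 4)


def ood_gate(cands: list[Candidate], min_score: float) -> list[Candidate]:
    """Drop candidates below min_score, best first. An empty result means: inject nothing."""
    return sorted((c for c in cands if c.score >= min_score),
                  key=lambda c: c.score, reverse=True)


def pack(cands: list[Candidate], budget_tokens: int) -> list[Candidate]:
    """Greedily keep the best-scoring notes that still fit in the token budget."""
    kept, used = [], 0
    for c in cands:  # expects best-first order, as returned by ood_gate
        cost = approx_tokens(c.text)
        if used + cost <= budget_tokens:
            kept.append(c)
            used += cost
    return kept


def inject_block(cands: list[Candidate], min_score: float = 0.5,
                 budget_tokens: int = 800) -> str:
    survivors = pack(ood_gate(cands, min_score), budget_tokens)
    return "\n\n".join(f"[{c.note_id}] {c.text}" for c in survivors)
```

Skipping an oversized note instead of stopping keeps smaller, lower-ranked notes in play; stopping at the first note that does not fit would be an equally reasonable policy.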

The split that makes it cheap and private: the brain is client-side (embedding, clustering, and the capture LLM all run on your machine or through your harness’s model), and the store is dumb (it indexes and returns rows, and it never calls an LLM).
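A minimal sketch of that boundary, with hypothetical class names: the client holds the embedding function (in reflect, all-mpnet-base-v2 via nano-graphrag), while the store only keeps rows and ranks them by vector similarity. Exact brute-force cosine stands in here for the hnswlib or pgvector ANN index.

```python
import math
from typing import Callable

Vector = list[float]


class DumbStore:
    """Stores (note_id, vector) rows and returns nearest ids; never calls a model."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, Vector]] = []

    def upsert(self, note_id: str, vector: Vector) -> None:
        # Replace any existing row for this note, then append the new vector.
        self._rows = [row for row in self._rows if row[0] != note_id]
        self._rows.append((note_id, vector))

    def nearest(self, query: Vector, k: int) -> list[str]:
        def cosine(a: Vector, b: Vector) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        ranked = sorted(self._rows, key=lambda row: cosine(query, row[1]), reverse=True)
        return [note_id for note_id, _ in ranked[:k]]


class ClientBrain:
    """Owns the embedding model; only ids and vectors ever reach the store."""

    def __init__(self, embed: Callable[[str], Vector], store: DumbStore) -> None:
        self.embed = embed
        self.store = store

    def index(self, note_id: str, text: str) -> None:
        self.store.upsert(note_id, self.embed(text))

    def recall(self, query: str, k: int = 3) -> list[str]:
        return self.store.nearest(self.embed(query), k)
```

Because the store only ever sees vectors, swapping the local index for a shared one does not move any model or API key off the client.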

The two indices

reflect runs two complementary search engines, fused at recall time:

| Engine | Answers | Backed by | Strength |
|---|---|---|---|
| QMD | “what matches these words?” | BM25 over ~/.cache/qmd/index.sqlite | exact terms, file names, error strings |
| nano-graphrag | “what’s connected / what means this?” | vector ANN (hnswlib) + entity graph (.graphml) | semantics + multi-hop (“what caused X?”) |

Neither engine is enough on its own: BM25 misses paraphrases, vectors miss exact tokens, and only the graph can hop to a note you never matched lexically. RRF fusion is what lets all three arms vote.
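Reciprocal Rank Fusion needs only each arm's ranked list of note ids, not comparable scores, which is why BM25, vector, and graph results can be merged directly. A minimal sketch follows; k = 60 is the constant commonly used with RRF and is an assumption here, as are the note ids.

```python
from collections import defaultdict


def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: each arm adds 1 / (k + rank) for every note it ranks (rank starts at 1)."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, note_id in enumerate(ranked_ids, start=1):
            scores[note_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = rrf_fuse({
    "bm25":   ["err-timeout-string", "retry-policy"],
    "vector": ["backoff-paraphrase", "retry-policy"],
    "graph":  ["incident-x", "retry-policy"],
})
print(fused[0][0])  # retry-policy
```

retry-policy tops none of the three arms, yet it wins the fused ranking because every arm considers it relevant.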

Physical topology

Figure: reflect component topology. Harness hooks feed the engine, the markdown KB is the source of truth, and the derived store runs either locally (QMD sqlite + nano-graphrag hnswlib/graphml) or shared (Supabase Postgres + pgvector).

Backend: local or shared (Postgres)

The markdown notes are always the local source of truth, and all LLM, embedding, and clustering work stays client-side, so no extra API key is needed. Only the derived vector + graph store has two modes:

| | Mode 1: Local (default) | Mode 2: Shared (Postgres) |
|---|---|---|
| Derived store | per machine: QMD index.sqlite (BM25) + nano-graphrag (hnswlib + .graphml) | one Supabase Postgres (pgvector) shared by everyone |
| Setup | nothing | REFLECT_PG_DSN + REFLECT_WORKSPACE_ID + 2 migrations |
| Sharing across machines | git-sync the notes, then run reflect reindex on each machine | automatic: every machine queries the same store |
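How a client might choose between the two modes, as a hypothetical sketch: only the environment variable names come from the Setup row; the BackendConfig type, the function, and the rule that a half-set configuration is an error are assumptions. The two migrations are not modelled.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BackendConfig:
    mode: str                         # "local" or "shared"
    pg_dsn: Optional[str] = None
    workspace_id: Optional[str] = None


def backend_from_env(env: Optional[Mapping[str, str]] = None) -> BackendConfig:
    """Pick Mode 1 or Mode 2 from the two variables named in the Setup row."""
    env = os.environ if env is None else env
    dsn = env.get("REFLECT_PG_DSN")
    workspace = env.get("REFLECT_WORKSPACE_ID")
    if not dsn and not workspace:
        return BackendConfig(mode="local")  # Mode 1: nothing to configure
    if not (dsn and workspace):
        # A half-configured shared backend is an error, not a silent fallback to local.
        raise ValueError("Mode 2 needs both REFLECT_PG_DSN and REFLECT_WORKSPACE_ID")
    return BackendConfig(mode="shared", pg_dsn=dsn, workspace_id=workspace)
```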

In Mode 2, nano-graphrag runs unchanged: it is handed Postgres-backed storage classes, the same way it ships its Neo4jStorage backend. The database stays dumb (it stores vectors and graph rows but never calls an LLM or computes embeddings); tenant isolation combines row-level security (RLS, fail-closed) with explicit workspace_id scoping, and writes need a service_role DSN.
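The explicit workspace_id half of that isolation can be sketched client-side. RLS itself is enforced inside Postgres and is not shown; the table and column names below are hypothetical, and the schema in the backend reference is authoritative. The `<=>` operator is pgvector's cosine-distance operator, and the returned pair is meant for a driver call such as psycopg's `cursor.execute(sql, params)`.

```python
from typing import Optional, Sequence


def scoped_nearest_query(workspace_id: Optional[str], query_vec: Sequence[float],
                         limit: int = 10) -> tuple[str, tuple]:
    """Build a workspace-scoped pgvector nearest-neighbour query (psycopg-style %s params)."""
    if not workspace_id:
        # Fail closed: without a workspace there is no safe scope, so refuse to query.
        raise PermissionError("workspace_id is required")
    vec_literal = "[" + ",".join(str(float(x)) for x in query_vec) + "]"
    sql = (
        "SELECT note_id FROM notes "           # hypothetical table and column names
        "WHERE workspace_id = %s "             # explicit scoping, on top of server-side RLS
        "ORDER BY embedding <=> %s::vector "   # pgvector cosine-distance operator
        "LIMIT %s"
    )
    return sql, (workspace_id, vec_literal, limit)
```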

➡️ Full backend reference (schema, setup, threat model, per-harness install): stevengonsalvez/ainb-reflect-memory.

Next

Each stage above has knobs. The Recall reference documents every recall-time feature with a concrete example and the counterfactual — what you’d get without it.