
reflect-memory — recall reference (all 57 ports)

“Correct once, never again.” This is the reference half of reflect-memory: first one learning’s whole journey (the newcomer’s mental model), then every ported feature as a table — flag, what it does, an example, and the counterfactual: what you’d get if it didn’t exist.

For the architecture see Construct; for the problem and the bare-harness comparison see Problem & fit.

Examples are illustrative

The reflect search … examples and their results below are representative, hand-authored to show the behaviour — not pasted from a specific live KB run. Every feature is backed by a behavioural proof under tests/eval/behavioral/proofs/ that exercises it with the knob on and off.

Memory end to end — one learning’s journey

Follow one correction from the moment it happens to the moment it saves you weeks later.

1 · Capture. Mid-session you tell the agent: “no — don’t bump the shared payments.proto without regenerating the clients, it broke staging last time.” A PostToolUse/Stop hook detects the correction signal (the “no — don’t…” shape), slices just the relevant dialogue window (not the whole 100k-token transcript), and the drain writes a structured learning:

---
title: "Regenerate gRPC clients after editing payments.proto"
category: reliability
tags: [grpc, proto, payments, codegen]
confidence: 0.8
project_id: billing-svc
problem: "Bumped payments.proto without regenerating clients"
fix: "Run `make proto-gen` after any .proto edit; CI now gates on it"
rule: "Never ship a .proto change without regenerated clients"
---

Alongside it, an entity sidecar records payments.proto —[prevents]→ staging outage.
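
As a rough illustration of the shape of that record, the sketch below models the front matter and the sidecar edge as Python dataclasses. The class names `Learning`, `Edge` and `Relation` are hypothetical and not taken from the reflect-memory codebase; the field names mirror the front matter above, and the relation values are the ones listed for S2 later in this reference.

```python
from dataclasses import dataclass, field
from enum import Enum


class Relation(Enum):
    # Values copied from the S2 closed enum described later in this page.
    CAUSED_BY = "caused_by"
    CAUSES = "causes"
    ENABLES = "enables"
    PREVENTS = "prevents"
    CONTRADICTS = "contradicts"
    SUPERSEDES = "supersedes"
    PART_OF = "part_of"
    USES = "uses"


@dataclass(frozen=True)
class Edge:  # hypothetical name for one sidecar relation
    source: str
    relation: Relation
    target: str


@dataclass
class Learning:  # hypothetical name; fields mirror the front matter
    title: str
    category: str
    tags: list[str]
    confidence: float
    project_id: str
    problem: str
    fix: str
    rule: str
    edges: list[Edge] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Confidence is described as a continuous 0-1 value (see S3).
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")


learning = Learning(
    title="Regenerate gRPC clients after editing payments.proto",
    category="reliability",
    tags=["grpc", "proto", "payments", "codegen"],
    confidence=0.8,
    project_id="billing-svc",
    problem="Bumped payments.proto without regenerating clients",
    fix="Run `make proto-gen` after any .proto edit; CI now gates on it",
    rule="Never ship a .proto change without regenerated clients",
    edges=[Edge("payments.proto", Relation.PREVENTS, "staging outage")],
)
```

Using a closed enum for the relation means a misspelled edge type fails at construction time instead of silently creating a new relation.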

2 · Index. reflect reindex embeds the note (vector arm), adds its entities + edges to the GraphRAG graph (graph arm), and registers it in the BM25 index (QMD arm). It’s now reachable three different ways.

3 · Recall — three weeks later, a different session. A new teammate’s agent opens billing-svc and is about to edit payments.proto. SessionStart fires, builds a query from the project + branch context, and runs hybrid recall:

  • the vector arm matches on proto / payments / codegen meaning,
  • the BM25 arm matches the literal payments.proto,
  • the graph arm hops the prevents edge to the staging-outage context,
  • RRF fuses the three rankings, the cross-encoder reranks, the OOD gate confirms it’s genuinely relevant, and the token budget packs it into the inject block.

Before the agent writes a single line, it sees: “Regenerate gRPC clients after editing payments.proto — broke staging last time; run make proto-gen.” The mistake never happens twice.

That whole chain — capture → index → fuse → rerank → gate → inject — is what the 57 ports tune.
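
The fusion step in that chain can be sketched with standard Reciprocal Rank Fusion. This is a minimal illustration, not the project's code: the function name `rrf_fuse`, the note ids, and the constant `k = 60` (a commonly used default for RRF) are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse per-arm rankings: each arm contributes 1 / (k + rank) per note."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, note_id in enumerate(ranked_ids, start=1):
            scores[note_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical note ids, one ranked list per arm.
rankings = {
    "vector": ["proto-regen", "tax-rounding", "nginx-keepalive"],
    "bm25": ["proto-regen", "nginx-keepalive"],
    "graph": ["staging-outage", "proto-regen"],
}
fused = rrf_fuse(rankings)
# "proto-regen" comes first because all three arms returned it.
```

RRF only looks at ranks, never at raw scores, which is why arms with incomparable score scales can be fused at all.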

The 57 ports at a glance

         ┌──────────────────────── recall time ───────────────────────┐
query ─▶ [arms] ─▶ RRF fuse ─▶ [rerank] ─▶ [gate] ─▶ [boosts] ─▶ budget ─▶ inject
         R1·R5·R6              R2·R3       R7·R12    R8·R16
         ─────────────────── scope: R15·A6 · modes: M1·R10·R11 ────────────────

| Group | Ports |
|---|---|
| Retrieval arms | R1 R5 R6 R2 R3 R4 |
| Relevance gates | R7 R12 |
| Ranking & affinity boosts | R8 R16 S3 S4 |
| Scope: sharding & isolation | R15 A6 |
| Recall modes & inject | M1 R10 R11 M7 M4 O3 A1 R20 |
| Caching, dedup & negative-recall | R9 S7 C1 A3 SG6 S1 |
| Capture · storage · consolidation · team (the other 29) | S/M/A/C/O/SG/R series |

All 57: R1–R16 + R20 (17 retrieval/recall), M1–M8 (8 modes), S1–S10 (10 capture/storage), A1–A6 (6 advanced), C1–C5 (5 consolidation), O1–O3 (3 observations), SG1–SG8 (8 signals). (There is no R17–R19 — the R-series jumps R16 → R20.)


Retrieval arms

The parallel signals that find candidates, plus the post-fusion shapers that order them. Features marked ⭑ are the marquee ones; worked examples for R1, R2, R5, R7 and M1 appear under "Worked examples" below.

| Feature | Flag (default) | What it does · example | Without it |
|---|---|---|---|
| R1 ⭑ Graph-expansion arm | RECALL_GRAPH_ARM (on) | A 3rd arm walks the entity graph and fuses notes you never matched lexically. “why does checkout call recalcTax twice?” → hops caused_by to the EU-VAT rounding note. | Only the lexically-matching note; you “fix” the perf and re-introduce the rounding bug. |
| R5 ⭑ Temporal arm | RECALL_TEMPORAL_ARM (on) | When a date phrase is found (R6), a 4th arm fuses notes whose timestamp falls in the window. “what did we decide last week about auth”. | In-window notes get crowded out by topical arms; recent decisions don’t surface. |
| R6 Query-time date parsing | RECALL_TEMPORAL (on) | Regex-extracts “last week” / “in march” / “since 2026-01-01” into a real date range that feeds R5. | “last week” is just two more keywords; temporal intent silently dropped. |
| R2 ⭑ Cross-encoder rerank | RECALL_CROSS_ENCODER (on); REFLECT_CE_MODEL, RECALL_CE_TIMEOUT | Re-reads the top-20 jointly with the query (local MiniLM CE) and re-sorts by meaning. “flaky test in the auth suite” lifts the real answer over a keyword-similar “auth token format”. | The keyword-similar-but-wrong note wins rank 1; right prior art misses a tight budget. |
| R3 ⭑ MMR diversity | RECALL_MMR (on) / --no-mmr; RECALL_MMR_LAMBDA (0.7) | Final top-k via Maximal Marginal Relevance, which de-clusters near-duplicate notes. “nginx 502 under load” surfaces the lone “enable keepalive” note past 4 “raise worker_connections” twins. | Top-k is 4 copies of one idea; the second, correct idea sits at rank 6, never injected. |
| R4 Token-budget retrieval | --max-tokens N (0=off); REFLECT_RECALL_MAX_TOKENS | Packs ranked notes by estimated tokens until the budget is hit, instead of a fixed top-k. reflect search "deploy steps" --max-tokens 1500. | A fixed top-k blows the window on a verbose corpus; long notes evict the user’s own files. |
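
To make the R3 row concrete, here is a minimal sketch of Maximal Marginal Relevance selection using the documented default λ = 0.7. Everything else is an assumption for illustration: the function names, the use of token-set Jaccard as the similarity measure (the real pipeline may well compare embeddings), and the relevance numbers.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set similarity; stands in for whatever similarity the pipeline uses."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(
    candidates: list[tuple[str, float, set[str]]],  # (note_id, relevance, tokens)
    top_k: int,
    lam: float = 0.7,
) -> list[str]:
    """Greedy MMR: trade relevance against similarity to already-picked notes."""
    remaining = list(candidates)
    selected: list[tuple[str, float, set[str]]] = []
    while remaining and len(selected) < top_k:

        def mmr_score(cand: tuple[str, float, set[str]]) -> float:
            redundancy = max((jaccard(cand[2], s[2]) for s in selected), default=0.0)
            return lam * cand[1] - (1.0 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [note_id for note_id, _, _ in selected]


# Hypothetical candidates for "nginx 502 under load": four near-identical
# twins plus one lower-scored note carrying a different idea.
twin_tokens = {"raise", "worker_connections", "nginx"}
candidates = [
    ("workers-1", 0.90, twin_tokens),
    ("workers-2", 0.88, twin_tokens),
    ("workers-3", 0.87, twin_tokens),
    ("workers-4", 0.86, twin_tokens),
    ("keepalive", 0.70, {"enable", "keepalive", "nginx", "upstream"}),
]
picked = mmr_select(candidates, top_k=2)
```

With these inputs the second slot goes to the keepalive note rather than a second twin, because the twins are penalised for being identical to the first pick.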

Relevance gates

| Feature | Flag (default) | What it does · example | Without it |
|---|---|---|---|
| R7 ⭑ OOD relevance gate | --min-overlap / REFLECT_RECALL_MIN_OVERLAP | Measures query-term coverage of the top hit; below threshold it injects nothing (ood_gated). --min-overlap 0.3 in a fresh repo. | Every session gets the least-bad junk; the agent learns to ignore the inject block. |
| R12 Per-arm calibrated thresholds | RECALL_ARM_<NAME>_MIN_SCORE (0=off); seed via reflect calibrate-thresholds | A per-arm floor applied before RRF (arm scores aren’t comparable). RECALL_ARM_BM25_MIN_SCORE=0.15 drops weak BM25 hits without nuking strong graph hits. | One global cutoff either lets BM25 noise through or starves the graph arm. |
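
The per-arm floor in R12 could look roughly like the sketch below. The variable pattern RECALL_ARM_<NAME>_MIN_SCORE and the "0 = off" convention come from the table; the function name `apply_arm_floors`, the hit format, and passing the environment in as a plain dict are assumptions made to keep the example self-contained.

```python
def apply_arm_floors(
    arm_hits: dict[str, list[tuple[str, float]]],  # arm name -> [(note_id, score)]
    env: dict[str, str],
) -> dict[str, list[tuple[str, float]]]:
    """Drop hits below each arm's own floor before the rankings are fused."""
    filtered: dict[str, list[tuple[str, float]]] = {}
    for arm, hits in arm_hits.items():
        floor = float(env.get(f"RECALL_ARM_{arm.upper()}_MIN_SCORE", "0"))
        if floor <= 0:
            # 0 (or unset) means the floor is off for this arm.
            filtered[arm] = list(hits)
        else:
            filtered[arm] = [(note_id, s) for note_id, s in hits if s >= floor]
    return filtered


# Hypothetical hits; only the BM25 arm has a floor configured.
arm_hits = {
    "bm25": [("proto-regen", 0.40), ("unrelated", 0.10)],
    "graph": [("staging-outage", 0.05)],
}
kept = apply_arm_floors(arm_hits, {"RECALL_ARM_BM25_MIN_SCORE": "0.15"})
```

In a real process the second argument would typically be `os.environ`; the weak BM25 hit is dropped while the low-scored graph hit survives, since graph scores are on a different scale.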

Ranking & affinity boosts

Secondary signals that break ties — each bounded so it can nudge, never hijack.

| Feature | Flag (default) | What it does · example | Without it |
|---|---|---|---|
| R8 Bounded multiplicative boosts | RECALL_CONFIDENCE_ALPHA, RECALL_RECENCY_ALPHA, RECALL_TAG_ALPHA, RECALL_PROOF_ALPHA (0.2/0.2/0.2/0.1) | Each signal applied as 1+α·(norm−0.5), clamped to ±α/2, so recency/confidence/tags/proof break ties only. | Unbounded boosts let “most recent” bury a 2-year-old note that perfectly answers the query. |
| R16 Project-affinity boost | RECALL_PROJECT_ALPHA (0.2) | Under --global, current-project notes get a capped +10% lift over equally-relevant foreign ones. | Cross-project recall treats every project equally; a foreign note outranks your own. |
| S3 Numeric confidence ranking | RECALL_CONFIDENCE_ALPHA; --field confidence_num | Stores continuous 0–1 confidence as the canonical ranking value; HIGH/MED/LOW are display buckets. | Coarse tiers can’t separate two “HIGH” notes; ranking loses a real signal. |
| S4 Provenance / proof-count | RECALL_PROOF_ALPHA (0.1); --field proof_count | First-class proof_count provenance nudges ranking ±5% and is projectable. | A note proven 12 times ranks identically to an unverified one. |
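
The R8 formula is small enough to write out. The sketch below implements 1+α·(norm−0.5) with the ±α/2 clamp and the documented default alphas; the function names, the signal keys, treating a missing signal as neutral (0.5), and multiplying the per-signal factors together are assumptions rather than confirmed behaviour.

```python
def bounded_boost(norm: float, alpha: float) -> float:
    """Multiplier 1 + alpha * (norm - 0.5), clamped to 1 +/- alpha / 2."""
    half = alpha / 2.0
    delta = alpha * (norm - 0.5)
    delta = max(-half, min(half, delta))  # guards against norm outside [0, 1]
    return 1.0 + delta


# Defaults from the table: confidence / recency / tags / proof.
DEFAULT_ALPHAS = {"confidence": 0.2, "recency": 0.2, "tags": 0.2, "proof": 0.1}


def apply_boosts(
    base_score: float,
    signals: dict[str, float],  # each signal normalised to [0, 1]
    alphas: dict[str, float] = DEFAULT_ALPHAS,
) -> float:
    score = base_score
    for name, alpha in alphas.items():
        # Assumption: a missing signal is neutral (0.5), i.e. multiplier 1.0.
        score *= bounded_boost(signals.get(name, 0.5), alpha)
    return score


boosted = apply_boosts(1.0, {"recency": 1.0})
```

With α = 0.2 the multiplier stays between 0.9 and 1.1, and with α = 0.1 between 0.95 and 1.05, which matches the "+10%" and "±5%" figures quoted for R16 and S4.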

Scope: sharding & isolation

| Feature | Flag (default) | What it does · example | Without it |
|---|---|---|---|
| R15 Per-project sharding | --global / RECALL_GLOBAL; RECALL_LEARNINGS_ROOT | Each project has its own nano-graphrag shard; recall defaults to the current project’s. --global unions across all. | Every project’s recall is polluted by every other’s; the relevant local note drowns. |
| A6 Branch-aware isolation | RECALL_BRANCH / --all-branches / RECALL_ALL_BRANCHES | Within a project, each git branch/worktree gets a sub-shard; recall pins to the current branch. | A speculative note from an abandoned feat/y surfaces as fact while you work feat/x. |

Recall modes & inject

| Feature | Flag (default) | What it does · example | Without it |
|---|---|---|---|
| M1 ⭑ Staged 3-layer recall | recall_stages: reflect index …; reflect hydrate <id…> | Index-then-hydrate: returns token-capped ID-only rows; the agent hydrates only the ids it wants. | Every recall pays full-body cost for every candidate; deep digs become token-prohibitive. |
| R10 3-tier hierarchical inject | REFLECT_TIERED_INJECT (opt-in); REFLECT_SKILL_TIER_MIN_SCORE (2.0) | SessionStart consults curated skills first; a strong skill hit is injected and raw recall skipped. | Every session runs full raw recall even when a promoted skill already has the answer. |
| R11 Forced-grounding short-circuit | (R10 freshness gate) | If the tier-1 skill hit is fresh and high-confidence, SessionStart emits just that and never spawns the recall subprocess. | Warm-project boots needlessly spawn the full pipeline: slow and noisy. |
| M7 Knowledge-corpus Q&A | reflect corpus build <name> --tag … | Snapshots a filtered KB subset into corpora/<name>.json for a primed, deterministic Q&A scope. | No way to pin recall to a curated subset; every query hits the whole corpus. |
| M4 Pluggable-mode inheritance | REFLECT_MODE (engineering); REFLECT_MODES_DIR | Loads taxonomy + prompt templates from a mode JSON (deep-merge inheritance); drives learning types + economics glyphs. | One hard-coded taxonomy; research/writing workflows can’t retune what’s captured/surfaced. |
| O3 Persona / preference answer | always-on (project_persona) | A high-confidence distilled field (e.g. testing_style='TDD') answers an open-domain query directly. “what testing style does this project use?” | The question falls through to generic recall; a known team fact isn’t answered crisply. |
| A1 Pinned editable memory slots | REFLECT_SLOTS (opt-in) | A pinned scratchpad slot per (project, name) injected at Tier-0, ahead of skills/recall, regardless of ranking. | No way to force an always-present note; critical context depends on it ranking well. |
| R20 Skills-index query | always-on (built in reflect.db) | A queryable sqlite index of installed skills (name/tags/summary) replaces per-query SKILL.md scanning; feeds R10/R11. | Tiered inject rescans every SKILL.md per query; slower, and never matches an unindexed skill. |

Caching, dedup & negative-recall

| Feature | Flag (default) | What it does · example | Without it |
|---|---|---|---|
| R9 Fuzzy cache tier | RECALL_FUZZY_CACHE (on); RECALL_FUZZY_THRESHOLD (0.85) | A reworded repeat within the Jaccard threshold is served from cache, skipping embed+graph+rerank. | Every rephrasing pays the full retrieval cost; a debugging back-and-forth re-runs it dozens of times. |
| S7 Chunk-hash dedup | always-on (chunk_hashes) | Slice-chunk hashing at drain so re-draining the same transcript doesn’t duplicate the learning. | Recall returns two copies of the same lesson; duplicates crowd the top-k. |
| C1 Semantic-dedup adjudication | REFLECT_DEDUP_THRESHOLD (0.97; ≥1.0 disables) | Before a CREATE lands, an embedding-cosine twin ≥ threshold is held as a “merge?” adjudication. | Near-identical phrasings accumulate; the KB bloats with restatements of one idea. |
| A3 forget_after TTL prune | per-row forget_after (hourly sweep) | Expired learnings are archived and moved to .forgotten/; permanent/future ones survive. | Stale, time-boxed notes linger forever and keep surfacing past their relevance. |
| SG6 Knowledge-gap signal | RECALL_GAP_LOG (on) / --no-gap-log | A 0-result recall logs {query, normalized, session_id} to knowledge-gaps.jsonl as a curation backlog. | Misses vanish silently; you never learn what the KB should have known. |
| S1 Structured field extraction | --field NAME | Projects a single typed field (rule/fix/problem/…) instead of the whole note. --field rule. | Every hit returns the full note body; context-expensive when you only need the rule. |
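
A fuzzy cache tier like R9 can be sketched as a lookup keyed on token-set Jaccard similarity, using the documented default threshold of 0.85. The class name `FuzzyCache`, the tokenisation (lower-cased word characters) and the linear scan are assumptions for illustration only.

```python
import re
from typing import Optional


class FuzzyCache:  # hypothetical name
    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold
        self._entries: list[tuple[frozenset[str], list[str]]] = []

    @staticmethod
    def _tokens(query: str) -> frozenset[str]:
        # Assumed normalisation: lower-case, keep runs of word characters.
        return frozenset(re.findall(r"[a-z0-9_]+", query.lower()))

    def get(self, query: str) -> Optional[list[str]]:
        """Return cached results for any stored query within the threshold."""
        q = self._tokens(query)
        for cached_tokens, results in self._entries:
            union = q | cached_tokens
            if union and len(q & cached_tokens) / len(union) >= self.threshold:
                return results
        return None

    def put(self, query: str, results: list[str]) -> None:
        self._entries.append((self._tokens(query), results))


cache = FuzzyCache()
cache.put("why does checkout call recalcTax twice", ["a1b2c3"])
hit = cache.get("checkout: why does it call recalcTax twice")  # reworded repeat
miss = cache.get("nginx 502 under load")
```

The reworded query shares six of seven distinct tokens with the cached one (about 0.86), so it is served from cache; the unrelated query falls through to full retrieval.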

Worked examples

Illustrative command → output for the marquee arms (see the note at the top — representative, not a live run).

R1 · graph-expansion

$ reflect search "why does checkout call recalcTax twice"
✓ recalcTax is idempotent but expensive (vector + bm25)
✓ double-call fixes an EU-VAT rounding bug (a1b2c3) (graph: caused_by hop) ← never matched lexically

R2 · cross-encoder rerank

$ reflect search "flaky test in the auth suite"
RRF rank 1: auth token format … ← keyword-heavy, wrong
→ CE rank 1: auth integration test flaky under parallel xdist ← answers the question

R5 · temporal arm

$ reflect search "what's our current API auth"
✓ migrated to server-side sessions (Jun) ← temporal arm lifts recent
we use JWT (Apr) ← older, more-cited, demoted

R7 · OOD gate

$ reflect search "totally unrelated topic" --min-overlap 0.3
∅ ood_gated — top hit overlap 0.08 < 0.3 → injected nothing (no least-bad junk)
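
One plausible reading of "query-term coverage of the top hit" is the fraction of distinct query terms that also occur in the top hit. The sketch below implements that reading; the function names, the tokenisation, and the absence of stop-word handling are assumptions, and the actual gate may compute overlap differently.

```python
import re


def term_overlap(query: str, hit_text: str) -> float:
    """Fraction of distinct query terms that also appear in the hit."""
    q_terms = set(re.findall(r"[a-z0-9_]+", query.lower()))
    hit_terms = set(re.findall(r"[a-z0-9_]+", hit_text.lower()))
    return len(q_terms & hit_terms) / len(q_terms) if q_terms else 0.0


def ood_gate(query: str, ranked_hits: list[str], min_overlap: float = 0.3) -> list[str]:
    """Inject nothing when the top hit barely covers the query."""
    if not ranked_hits:
        return []
    if term_overlap(query, ranked_hits[0]) < min_overlap:
        return []  # gated: better an empty inject block than least-bad junk
    return ranked_hits


hits = ["Regenerate gRPC clients after editing payments.proto"]
gated = ood_gate("totally unrelated topic", hits)
passed = ood_gate("regenerate grpc clients", hits)
```

Only the top hit is inspected, so the check stays cheap: if the best candidate fails the threshold, the lower-ranked ones are not considered.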

M1 · staged recall

$ reflect index "tokio panic on shutdown" # token-capped id+title rows
[a17] Graceful tokio shutdown ordering score 0.82
[c44] Abort vs cancel on JoinHandle score 0.71
$ reflect hydrate a17 c44 # full bodies only for what you picked
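
The index-then-hydrate split can be sketched over an in-memory list of notes. This is an illustration under stated assumptions, not the CLI's implementation: the `Note` class, the function names, the row format, and the "about four characters per token" estimate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Note:  # hypothetical in-memory stand-in for a stored learning
    note_id: str
    title: str
    body: str
    score: float


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about four characters per token.
    return max(1, len(text) // 4)


def index(notes: list[Note], max_tokens: int) -> list[str]:
    """Stage 1: cheap id + title rows, packed until the token cap is reached."""
    rows: list[str] = []
    used = 0
    for note in sorted(notes, key=lambda n: n.score, reverse=True):
        row = f"[{note.note_id}] {note.title} score {note.score:.2f}"
        cost = estimate_tokens(row)
        if used + cost > max_tokens:
            break
        rows.append(row)
        used += cost
    return rows


def hydrate(notes: list[Note], ids: list[str]) -> list[str]:
    """Stage 2: full bodies, only for the ids the agent asked for."""
    by_id = {n.note_id: n for n in notes}
    return [by_id[i].body for i in ids if i in by_id]


notes = [
    Note("c44", "Abort vs cancel on JoinHandle", "(full body of c44)", 0.71),
    Note("a17", "Graceful tokio shutdown ordering", "(full body of a17)", 0.82),
]
rows = index(notes, max_tokens=1000)
bodies = hydrate(notes, ["a17"])
```

The agent pays for the short rows on every query and for full bodies only on the ids it picks, which is the saving the M1 row describes.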

The other 29 — capture, storage, consolidation, team

Not query-time features, but the plumbing that fills and maintains the KB the recall arms read. Listed for completeness so the full 57 are accounted for.

| Feature | Category | Flag (default) | What it does |
|---|---|---|---|
| M2 Writer-output classifier + breaker | capture | REFLECT_DRAIN_INVALID_THRESHOLD (3) | Kills + archives a drifting/poisoned writer after N bad outputs. |
| M3 Quota-aware writer abort | capture | REFLECT_DRAIN_DAILY_MAX / REFLECT_QUOTA_GATE | Defers the whole drain queue when the daily LLM gate is closed, instead of burning the cap. |
| M5 Commit-reference verification | capture | always-on | Checks every cited commit hash against the repo; rejects all-fabricated notes, flags partials. |
| M6 Private-tag strip | capture | always-on | Strips <private> spans at the LLM-prompt boundary so they never reach the writer/index. |
| M8 Token-economics surfacing | dashboard | RECALL_ECONOMICS (on) | Annotates each result with discovery/read tokens + savings % and a mode glyph. |
| S2 Typed causal-link enum | capture | always-on | Closed enum for sidecar relations (caused_by/causes/enables/prevents/contradicts/supersedes/part_of/uses). |
| S5 Belief-revision on ingest | capture | always-on | Runs CREATE/UPDATE/DELETE against reflect.db so new learnings revise prior beliefs. |
| S6 History snapshot on update | capture | always-on | Snapshots the prior form into learning_history before mutating a live row. |
| S8 Doc→chunk→learning grouping | capture | always-on | Persists each learning’s lineage back to its source transcript + chunk. |
| S9 Volatile-signals sidecar | capture | always-on | Moves churning signals (recall_count, helpful_count…) out of note markdown into a DB sidecar, for clean git diffs. |
| S10 Write-validate-retry loop | capture | always-on (3 attempts) | Validates structure + sidecar after write; re-prompts; flags unfixable notes validated: false. |
| R13 Auto skill-refresh trigger | capture | always-on | Flags an existing skill for refresh when a learning it covers lands. |
| R14 Per-skill staleness signal | signal | REFLECT_STALENESS_DAYS (30) | Marks a skill is_stale when an in-scope learning changed after its last refresh. |
| SG1 Cross-turn contradiction | capture | always-on | Detects + reconciles contradicting learnings at capture (sets is_latest). |
| SG2 Git-event capture | capture | always-on | Links commits↔sessions; demotes a reverted commit’s learnings on git revert. |
| SG3 Idle-sweep trigger | signal | REFLECT_IDLE_THRESHOLD_SEC; RECALL_SPECULATIVE_ALPHA (0.2) | Idle timer sweeps quiet transcripts into speculative learnings (down-ranked at recall). |
| SG4 Test-outcome parsing | signal | always-on | Parses pass/fail from Bash output in PostToolUse into a capture signal. |
| SG5 Tool-loop detection | signal | always-on | Detects repeated/oscillating tool-call loops as a signal. |
| SG7 TodoWrite completion signal | signal | always-on | Emits a “how I did X” candidate when a todo flips to completed. |
| SG8 Permission-reply capture | signal | always-on | Captures permission-prompt allow/deny replies as policy learnings. |
| A2 Bitemporal graph edges | infra | always-on | Edges carry tcommit / tvalid / tvalid_end, so “what was true” vs “what we knew” stay separable. |
| A4 Followup-rate diagnostic | dashboard | RECALL_FOLLOWUP (on) | Logs a recall-quality verdict (did the user immediately re-query differently?) to metrics. |
| A5 Synthetic compression fallback | capture | drain --no-llm | Builds a structured learning from heuristics alone when the drain LLM is unavailable. |
| C2 Auto-consolidation threshold | consolidation | REFLECT_SYNTHESIS_AUTO_THRESHOLD (30) | Fires the synthesis pass early once learnings-since-last-consolidation cross the threshold. |
| C3 Graph maintenance sweep | consolidation | always-on (every N drains) | Structural rewrite of the local graphml to repair orphan edges after deletes. |
| C4 Lifecycle events fan-out | infra | REFLECT_EVENTS_ON_<EVENT> | Appends lifecycle events to events.jsonl + runs per-event shell hooks (local webhooks). |
| C5 KB export/import round-trip | team | kb_export.py / kb_import.py | Snapshots documents/ + reflect.db into one git-friendly tarball; byte-identical restore elsewhere. |
| O1 Consolidated observations | consolidation | always-on | A 2nd drain stream of persona/convention statements that accumulate evidence over time. |
| O2 Auto-refreshing conventions doc | consolidation | always-on | Re-renders a conventions markdown doc from accumulated observations each consolidation. |

Every feature above is verified by a behavioural proof under tests/eval/behavioral/proofs/ (57 proof_*.py, one per port) that demonstrates the behaviour with the knob on and off. See the reflect CLI reference for how to drive recall directly.