Loquent · Agent Memory, Learning & Evolution Mode B · mechanism design · 2026-06-13
Agent subsystem · mechanism design

Memory, learning & evolution

The mechanism behind §8 / §9 of the v3 architecture: how an agent captures what happened, distills it into lessons, and proposes improvements its owner can approve in one click. The data model is settled (Design B). This doc designs the deferred mechanism — grounded in how Letta, Reflexion, the Stanford Generative Agents, Cursor, ChatGPT, Devin and the CX leaders (Decagon, Intercom Fin, Sierra) solve the same problem.

Status · data model settled · mechanism = this doc Scope · #1494 (L1) · #1495 (L2) · reuses #1426 / #1531 / #1534 / #1459 Research · 20 agents · 19 platforms · 1.5M tokens
↳ Decisions locked 2026-06-13: all eight forks (§8) approved. Builds on the settled v3 model (Design B) with these refinements + scope additions:
  • Trigger — the suggest-mode operator edit is the primary lesson source (new ai_draft_correction), and the agent can ask the owner about important unknowns (active learning).
  • Trust — learnings auto-apply by default; a per-org setting flips to require approval. Always-on rollback + visible change log; org-scope stays deferred.
  • Memory — typed reconciling blocks replace the agent blob, and per-contact memory joins this epic — retiring today's monolithic contact_note(area=Memory) + contact.ai_summary.
  • Injection — a small capped summary in the user-turn layer (KV-cache-safe), not raw text in the cached prefix.

§0TL;DR — the proposal

Eight decisions, each one validated by where the market converged — and where it visibly walked back.

01 · LAYER

Three stores, never one blob. Skills (human manual) · Learning version (agent-distilled, approved) · Memory (agent facts) · + few-shot (retrieval-only). Strict precedence, not merged.

02 · SIGNAL

The operator's edit is the gold signal. Capture every suggest-mode edit/reject as a verbatim (original, edited, context) triple — and let the agent ask the owner about important unknowns. Free, ground-truth, no thumbs UI.

03 · TRUST

Auto-apply by default; owner can gate it. Learnings apply automatically (derived from trusted signals); a per-org setting flips to require approval. Always-on rollback + plain-language change log.

04 · MODEL

Design B wins, with Design A's good parts. Version-chain via the existing current_learning_version_id; keep agent+org scope as a column; render the head as one clean doc.

05 · INJECT

Tiny summary in, detail on demand. A capped, relevance-gated learning summary in the user-turn layer (KV-cache-safe); full memory/skills via tool. Context-rot is real.

06 · CADENCE

Threshold + manual + weekly sweep. Not a fixed cron. Fire a proposal when N high-trust corrections accumulate, plus an "Improve my agent now" button.

07 · SECURITY

Approval is a security boundary. Agents read untrusted SMS; a poisoned memory drifts behavior for hundreds of sessions before classifiers fire (MINJA). Human approval gates it.

08 · SHIP

Eight phases. Prompt hygiene → capture (+ the agent asks) → contact few-shot → typed agent memory → per-contact memory (retires the old blob) → digest loop → trust surfaces → (deferred) org-scope.

§1The shape: three knowledge stores + one retrieval layer

The single most important structural decision — and the one the market most expensively re-learned — is to keep the agent's knowledge in separate stores defined by who is allowed to change them, never one self-editing blob.

Cursor shipped a fuzzy auto-"Memories" layer alongside its inspectable Rules, found the two overlapping stores bred drift and distrust, and deleted auto-memories in v2.1, folding everything back into Rules. CoALA, Devin and Intercom independently land on the same separation. Loquent should adopt it deliberately from the start.

StoreWho writes itCharacterIn Loquent
Skills builtHumans only (platform / org admin)The manual — curated procedures & templatesai_skill + link, lazy-loaded (#1426)
Learning version newAgent proposes → owner approvesDistilled, evergreen behavioral policyai_agent_learning_version (the current_learning_version_id target)
Memory refactorAgent (tool), owner-pinnableFacts to rememberai_agent_memory → typed blocks; ai_contact_memory later
Few-shot examples newCaptured automaticallyRetrieval-only worked examplesderived from ai_draft_correction
i

Strict precedence — learning emits proposals, never silent overwrites

Tier 0 platform RULES  >  org / agent rules  >  skills (curated)  >  active learning version  >  memory blocks. The learning loop emits proposed edits toward this stack — it can never rewrite a rule or skill on its own. Merging the stores would destroy the "who is allowed to change this" boundary that non-technical owners depend on.

§2Where we are vs. the v3 doc

The v3 architecture already settled the data model and dropped the meta-agent idea. What remains unbuilt is the mechanism — the part this doc designs.

Already built / settled

  • ai_agent_memory blob + read_my_memory / update_my_memory tools (on-demand, not injected)
  • ai_thread_log — rich per-turn signal: Thought (+confidence), ToolCall (status ok|error), LlmGeneration (finish_reason), EffectivePromptCapture
  • Schema hooks on ai_agent: enable_learning, learning_instructions, learning_model_override, current_learning_version_id
  • Three reserved-but-empty prompt slots: learning, pending_lessons, memory_snapshot
  • R2: meta-agents dropped — "learning is an agent property" (a background job, not a special agent)
  • ai_skill curated + per-minute schedule poller (#1531)

Deferred — the mechanism (this doc)

  • Where lessons come from (v3: "observations created during runs" — source unspecified)
  • The reflection / refresh job that digests lessons → a new version
  • How learned knowledge reaches the prompt (and its KV-cache cost)
  • The trust model — auto-apply vs. owner-approved
  • Guardrails — drift, prompt-bloat, forgetting, poisoning
  • The memory evolution path (blob → typed → per-contact)

The v3 doc's R4 (memory = single MD doc, on-demand) and §8.4 (auto-refresh, admin rolls back) are the two points this doc proposes to refine — see the four refinements in the banner above, resolved in §8.

§3How the field solves this

19 platforms and paradigms, read mechanism-deep. The striking result: the trustworthy commercial peers have converged, and the cautionary tales are the ones that auto-applied behavior changes silently.

PlatformHow it remembersHow it learns / evolvesThe lesson for us
Claude Code3 stores by curation model: human CLAUDE.md, agent auto-memory (MEMORY.md index + lazy topic files), human SkillsVerbal note-taking (not weights); background "dream" consolidation on a thresholdSeparate stores by who may change them; hard guarantees go in code (hooks), not soft memory
Letta / MemGPTTyped, char-capped core blocks pinned in-prompt + recall + archival (vector). read_only locks owner factsSelf-edits in hot path; sleep-time agent consolidates off hot path. Block-level versioningThe character limit is load-bearing (forces summarize-over-grow). But no approval gate — the mismatch for our owners
Generative AgentsAppend-only memory stream; each record timestamp + importance + embedding; retrieval = recency×importance×relevanceReflection: importance threshold → synthesize insights with provenance pointers to source recordsEvidence-linked insights ("because of 1,5,3") = the killer trust feature. But reflections hallucinate → must be proposals
Reflexion / ExpeLBounded episodic buffer of short verbal lessons, prepended next attemptEvaluator scores a run → one-line lesson naming the failed action + fix. Learns from failures. ExpeL votes for survivalSelf-judged "failure" is reward-hackable → use external ground-truth (operator correction). Voted survival fights lesson sprawl
VoyagerSkill library = named executable units (body + description + embedding); retrieved top-5, never inlined wholesaleSelf-verification critic gates every new skill; only verified successes commitDiscrete, named, individually-retrievable units compound; a monolith doesn't. Index by "when to use"
Cursor / WindsurfVersion-controlled Rules (4 activation modes) + auto Memories (Cursor: approval-gated; Windsurf: silent)Sidecar model proposes a memoryThe cautionary tale: a shadow auto-memory layer competing with inspectable rules was net-confusing → Cursor deleted it
Devin3 intent-typed stores: Knowledge (trigger-gated snippets) + Playbooks (versioned procedures) + auto Repo Knowledge (flagged)Proposes Knowledge in-chat from your feedback; human edits/saves before persistPropose-then-approve, never silent. "Accessed Knowledge" panel = retrieval observability as a control
LangGraph / LangMemcheckpointer (thread) vs BaseStore (cross-thread, namespaced). semantic / episodic / procedural; profile vs collectionHot-path tools vs background manager + debounced ReflectionExecutor (reflect once after quiet)The debounce (reflect after the thread goes quiet) is the right trigger. Soft-delete = the seam to bolt approval onto
ChatGPT MemorySmall always-injected "saved memories" (editable list) separate from a larger background-synthesized profile (Dreaming)Hot-path auto-save on strong cue + background synthesis with confidence scoresTwo-layer split (small curated injected + larger synthesized) is the headline. But auto-apply with no audit is the documented weakness
Decagon / Intercom / SierraCurated KB / SOP is canonical, versioned, human-ownedMine escalations/low-quality runs → cluster "topics struggled with" → auto-draft a concrete diff grounded in the best human resolutionThe direct peers converge: propose→approve→version→regression-test. Runtime behavior immutable between approved versions
Mem0 / Zep-GraphitiMem0: typed records, retrieve-then-reconcile. Zep: bi-temporal graph — contradiction supersedes, never deletesMem0 two-phase: extract candidates → LLM picks ADD/UPDATE/DELETE/NOOP vs neighbors (off hot path)Reconcile (ADD/UPDATE/DELETE/NOOP) is the antidote to blob-overwrite. Steal the algorithm in Postgres, not the KG dependency
GEPA / DSPy / SSGM3-stage ladder: trajectories → distilled lessons → evergreen rules, with provenance + confidence-decayRead full trace → reflect in NL → propose targeted edit → keep only if Pareto-non-dominated on held-out tasks (35× fewer rollouts than RL)GEPA maps 1:1 onto ai_thread_log + EffectivePromptCapture. MINJA: a poisoned memory drifts behavior before classifiers fire
Edit-as-preference (CIPHER, Copilot)The (original, edited, context) triple = a free preference pair; store short NL preference indexed by context embeddingOperator edit/reject is highest-signal, lowest-cost; edit-distance is the implicit reward and the thresholdFor suggest-mode SMS the edit is the gold signal — harvest it. Never promote a one-off to a global rule

The convergence (steal this)

Every trustworthy peer — Decagon, Intercom Fin, Sierra, Devin — lands on the same loop: mine real conversations → cluster what the agent struggled with → draft a concrete, provenance-backed diff → owner approves → version it → turn corrections into regression tests. Runtime behavior is immutable between approved versions. This is table stakes, not a differentiator.

!

The cautionary tales (don't)

Cursor removed its silent auto-memory layer. ChatGPT Dreaming and Ada auto-apply with no change log — the documented weakness. Letta and OpenClaw self-modify with no gate; OpenClaw's own authors call its continuity "unreliable." And MINJA shows a single poisoned memory — planted via the untrusted text an agent reads — persists across hundreds of sessions. For an agent that reads inbound SMS, silent auto-apply is not a UX choice; it's an open attack surface.

§4Memory taxonomy → Loquent

CoALA's four memory tiers (working · semantic · episodic · procedural) map cleanly onto what we already have — and pinpoint exactly what's new. Most of the substrate already exists.

TierWhat it isToday in LoquentProposed
Working / short-termThe live turn context + rolling thread windowai_thread_log already is a richer-than-default checkpoint keepAdd only a compaction step (keep-first + recent + summarize-middle) when a thread's log grows long, so distillation reads a bounded window
SemanticFacts about the world / contact / orgThe single ai_agent_memory blob (agent) + the monolithic contact_note(area=Memory) & contact.ai_summary (contact) — all un-retrievable overwrite-on-update blobsTyped char-capped per-agent blocks (read_only-pinnable) + ai_contact_memory bi-temporal, retrieved per-turn this epic — replaces the old contact blobs
EpisodicWhat happened on specific runsai_thread_log = raw episodic ground truth keep verbatimai_agent_lesson (distilled one-line lessons) + ai_draft_correction (verbatim edit triples)
ProceduralHow the agent behaves — highest leverage, highest riskTier0/0.5 rules + persona + ai_skill + tools allowlistai_agent_learning_version — the evolving, operator-approved behavioral doc. Learning emits proposed edits toward this tier; never silently rewrites rules

Re-derive, don't "remember" (Aider's lesson)

Most facts a Loquent agent needs — a contact's name, last call, open todos, business hours — already live in the database. Reach for them via tools, don't copy them into memory where they go stale. Reserve memory for genuinely emergent knowledge the DB doesn't hold ("this owner prefers I never text after 6pm").

4.1 Operations: write hot, manage cold, read layered

WRITE (hot path, cheap, unconditional) → ai_thread_log · ai_draft_correction · raw ai_agent_lesson MANAGE (off hot path, cheap model, debounced/threshold) → reflect → reconcile (ADD/UPDATE/DELETE/NOOP) → dedup → frequency-gate → PROPOSE a new learning version READ (layered) → inject tiny capped learning SUMMARY (fills reserved slots) load full memory/skills on-demand via tool retrieve per-contact facts top-k (recency×importance×relevance) FORGET (compliance control) → soft-delete + TTL/decay — never silent blob-overwrite

§5The learning mechanism

The deferred heart of §8. Five moving parts: the signal, the digest, the injection, the guardrails, and the trust gate.

5.1 The signal — the operator's edit is gold

The central open question in v3 ("observations created during runs" — but which runs, created how?). For a suggest-mode SMS product the answer is unusually clean: the owner already reviews and edits drafts. That edit is a free, naturally-occurring, ground-truth preference pair — the human literally produced the correct answer. No thumbs UI, no LLM self-grading, no behavior-change risk.

SourceSignal qualityUse
Operator edit / reject (suggest-mode)Highest — external ground truth, freeprimary capture (original, edited, context, edit_distance)
Failed / escalated runHigh — tool error, low-confidence Thought, escalation, customer hang-upprimary Reflexion one-line lesson
LLM-judged "notable"Low — reward-hackable (model validates its own hallucination)hint only a low-priority surfacing cue, never the authoritative label
Every runNoise — floods the store with plausible-but-low-value distractorsno

5.2 The digest — reflect, reconcile, propose

A background job (off the hot path, on the cheap model via learning_model_override, driven by the existing #1531 poller) reads undigested lessons + corrections + the recent thread log, and — GEPA / Generative-Agents style — synthesizes higher-level insight with provenance, then drafts a candidate version. Critically it reconciles against the current doc (Mem0's ADD/UPDATE/DELETE/NOOP) rather than appending, and frequency-gates so a one-off never becomes a rule.

undigested ai_agent_lesson ─┐ ai_draft_correction (gated) ─┤ recent ai_thread_log ─┼──► digest (cheap model, off hot path) current learning version ─┘ │ reflect → reconcile → dedup → frequency-gate ▼ candidate ai_agent_learning_version status = PENDING + change_summary (plain language) + supporting_thread_log_ids (provenance) │ ▼ ┌──────── owner review ────────┐ approve → current_learning_version_id repoints edit → owner adjusts the doc, then approves reject → status = rejected, lessons stay pending │ ▼ folded lessons → status = digested

5.3 Reaching the prompt — KV-cache-safe

v3 §8.3 injects "the current learning version (always) + recent undigested lessons." Two empirical hazards make the naive version costly:

  • Context rot. Always-injecting bulk text degrades the model even when every item is correct (NoLiMa: half-baseline by 32K; Lost-in-the-Middle; plausible distractors cost 6–11 pts each).
  • KV-cache invalidation. A mutating blob in the cached system prefix = 100% cache miss every turn (the Honcho/Hermes bug: ~5 min/turn). Anthropic prompt-caching keys on a byte-stable prefix.
i

The injection rule

Inject only a small, hard-capped (~≤10 bullets), relevance-gated summary of the active learning version + the top-k retrieved per-contact slice — into the user-turn layer, not the cached system prefix. Keep Tier 0 RULES + persona byte-stable at the top. Load full memory blocks and full skill bodies on demand via tool (read_my_memory, load_skill #1426). Activate a new version only at thread boundaries — never hot-swap mid-thread. This is what finally fills the reserved learning / pending_lessons / memory_snapshot slots.

5.4 Guardrails

Failure modeSeen inGuardrail
Reward-hacking — agent grades its own successReflexion (self-judged failure)Anchor lessons to external signals only (operator edit, tool error, escalation)
Over-generalization — one customer's quirk → global ruleCIPHER's central failureScope tag (contact→agent→org) + frequency gate before widening
Prompt bloat / context rotNoLiMa, Lost-in-the-MiddleHard cap the injected summary; char-cap memory blocks; lazy-load detail
Unbounded growth / no forgettingVoyager (no dedup)Reconcile-don't-append; soft-delete + TTL/decay; ≤20 versions
Memory poisoning — untrusted SMS plants a beliefMINJA (persists 100s of sessions)Human approval is a security boundary; provenance + confidence; org-scope deferred
Hallucinated insight — reflection about events that never happenedGenerative AgentsReflections are proposals with source pointers, never applied facts

5.5 The trust gate — auto-apply by default, owner can gate it

Decision (refines v3 §8.4 + fork 5): learnings and evolution auto-apply by default — the agent improves without nagging. A per-org setting (organization.learning_apply_mode = auto | approval) flips the whole org to require one-click approval for behavior-changing versions. The safety rails are on in both modes.

Default · auto

  • Capture + agent-scope learning versions apply automatically at the next thread boundary
  • Owner sees a plain-language "what your agent learned" entry (#1534) — informational, one-click undo
  • Safe to default because learning derives from trusted signals (operator edits, owner answers, tool errors) — never raw inbound text as an authoritative belief

Opt-in · approval (per-org)

  • Behavior-changing versions land as pending — owner approves / edits / rejects first
  • Same change_summary + provenance + optional shadow-replay preview
  • For cautious / regulated owners; contact-scope few-shot still auto-applies (can't leak)

Always on in both modes: append-only version chain, one-click rollback, the visible change log, frequency-gating, scope-tagging. Org-scope (large blast radius) stays deferred regardless — see fork 7.

5.6 Cadence

Not a fixed weekly cron — a small business may get 3 threads a week or 300. Threshold + manual + weekly sweep: fire a proposal when N high-trust corrections accumulate (self-regulating by salience, à la Generative Agents); always offer an "Improve my agent now" button; run a light weekly sweep so quiet agents still consolidate. Debounce per thread (never reflect mid-burst). All off the hot path, driven by the per-minute poller (#1531).

5.7 Active learning — the agent asks

Learning isn't only passive. When an agent hits an important unknown it can't resolve from the database or memory — a fact it needs to act well ("Do you take weekend appointments?", "What's the cancellation policy?") — it should ask the owner rather than guess. The answer is the highest-trust signal there is, so it's written straight to memory (business_facts, owner-pinned read_only) and emitted as a lesson (source = operator_answer) so the behavior generalizes.

  • Mechanism — reuse the attention surface (escalate_to_user + the draft-nudge of #1543 / #1477): the agent poses a short question; the owner answers in-thread; the answer feeds back as a trusted event.
  • Restraint — rate-limited, reserved for genuinely blocking, reusable facts (not per-turn nagging). Re-derive from the DB first; ask only when the knowledge is emergent and durable.
  • Payoff — owner-answered facts are the cheapest, highest-trust learning of all, and they make "auto-apply by default" safe: the agent is learning from the owner's own words.

§6Evolution & rollback

Five patterns to adopt, two to refuse. The version-chain is both the visible "evolution" surface and the rollback.

PatternVerdictNotes
Version-chain + one-click rollback (previous_version_id)adoptDesign B's mechanism & the market consensus (Decagon workspace, Devin Playbooks, git). The "current" pointer plays the parent-doc role — no separate table. Cap ≤20
Operator-approved promotion (propose→approve→apply)adoptUniversal across trustworthy peers. THE trust anchor for non-technical owners; resolves fork #5; a security boundary (MINJA/SSGM)
Scope-gated ladder (contact→agent→org)adoptContact-scope auto-applies (can't leak); widening requires frequency + approval. The literal mechanism against over-generalization
Shadow-mode regression replayadoptReplay a proposed version against the last N real threads; show the owner the delta. Reuses ai_thread_log; no golden set needed. Corrected threads become fixtures (Sierra)
Config snapshotting (persona/tools/skills) — #1495adopt, separateSnapshot on every save, diff, restore, ≤20. Keep separate from learning versions: config = human-set, learning = agent-proposed. Share the ≤20-cap + diff-UI conventions only
Evergreen overwrite, no history (ChatGPT Dreaming)refuseThe current single-blob update_my_memory is this. No change log, no rollback. Reject for the learning tier
Silent autonomous self-modification (Letta/OpenClaw/Ada)refuseTake the mechanisms (typed blocks, reconcile, decay), never the silent auto-apply for behavior change

§7Proposed data model

Reconciling Design A and B: B's storage wins (version-chain, no parent table); A's good parts survive as columns (agent+org scope) and the ai_agent_lesson concept. Three new tables, one refactor, one deferred.

NEW — the evolution spine (Design B, the existing FK finally has a target)

table ai_agent_learning_version
  id, agent_id FK
  scope            enum[agent | org]        // org deferred; column present from day one
  body_markdown    text                    // head renders as ONE clean readable doc
  previous_version_id  FK? self            // rollback = repoint current_learning_version_id
  status           enum[pending|approved|active|superseded|rejected]
  change_summary   text                    // plain-language diff for the owner
  folded_lesson_ids       jsonb
  supporting_thread_log_ids jsonb            // provenance — "learned from these 3 chats"
  proposed_by      enum[digest | operator]
  confidence       f32
  approved_by, approved_at  null
  created_at
  // ai_agent.current_learning_version_id  →  active head (column already exists). Cap ≤20.

Plus one org-level knob — organization.learning_apply_mode = auto | approval (default auto) — the only setting that gates whether behavior-changing versions apply automatically or wait for owner approval (§5.5).

NEW — the episodic signal (lessons + the gold edit triple)

table ai_agent_lesson
  id, agent_id FK
  scope   enum[contact|agent|org]  default agent
  situation, what_went_wrong, what_to_do_instead  text   // Reflexion shape
  correction_kind enum[rejected|rewrite|tone|fact|approach|escalate|failed_run]
  source   enum[operator_edit|operator_answer|failed_run|escalation|llm_notable]
  source_thread_log_id FK          // external ground-truth label only
  support_count int, confidence f32      // frequency gate before promotion
  status   enum[undigested|proposed|digested|rejected|superseded]   // digested_by lives here
  created_at

table ai_draft_correction   // the suggest-mode gold signal — captured hot path, no LLM call
  id, agent_id, thread_id, contact_id null
  turn_log_id FK → ai_thread_log
  original_draft, final_text text     // final_text null = rejected outright
  edit_distance int, edit_ratio f32   // implicit reward AND learning threshold
  correction_kind enum
  context_embedding vector           // retrieve top-k by inbound-message similarity
  promoted_to_lesson_id null
  created_at

REFACTOR — the blob → typed reconciling blocks

table ai_agent_memory   // keep ONE row per agent + the read_my_memory / update_my_memory tool surface
  blocks jsonb[]  of { label enum[persona_belief|business_facts|preferences],
                    value text (char-capped),
                    read_only bool,        // owner-pinned facts the agent can't overwrite
                    updated_at }
  // update RECONCILES (ADD/UPDATE/DELETE/NOOP vs existing) — never clobbers the whole doc

NEW (in this epic) — per-contact memory, retiring today's contact blob

table ai_contact_memory   // replaces contact_note(area=Memory) + contact.ai_summary — see §7.1
  id, org_id, contact_id, fact_text
  category enum, sensitivity enum, confidence f32
  valid_from, valid_to            // bi-temporal: contradiction SUPERSEDES, never deletes (Zep)
  superseded_by_id null, source_thread_log_id, source_channel
  embedding vector, redacted_at null, created_at
  // retrieved top-k recency×importance×relevance; TTL/decay; per-contact erasure cascade (GDPR Art.17)

7.1 Obsoleting today's contact memory

Per-contact knowledge exists today as a monolithic, overwrite-on-update markdown note — the exact antipattern we're retiring for agent memory. ai_contact_memory replaces it; structured data and the interaction log stay.

TodayWhat it isFate
contact_note (area=Memory)Single AI-synthesized note per contact, upserted (overwrite) by update_contact_memory from calls / messages / planreplace → typed bi-temporal facts in ai_contact_memory (reconcile, supersede, retrieve top-k)
contact.ai_summarySingle AI summary text from "Enrich from Messages"replace → derived on read from ai_contact_memory
contact_note (Call / Sms / AiEnrichment / General / Alert)Immutable interaction annotations / activity logkeep as the immutable activity log — it feeds extraction, it isn't the memory
contact_field_def/value · tagsOrg-configured structured fields + tagskeep — structured data; re-derive from these, don't duplicate

The four inline-aisdk writers (update_contact_memory, enrich_contact_from_messages, summarize_call_for_note→memory, plan update_system_note) are retargeted from synthesize-and-overwrite to extract-and-reconcile (ADD/UPDATE/DELETE/NOOP, bi-temporal supersede) into ai_contact_memory, and move onto the rig / ai_agent runtime — this is the #1497 · L4 "ContactMemory / ContactEnrichment / AutoTag" onboarding. Readers (contact_read(include:memory), contact_find snippet) query the new store, retrieving the slice relevant to the inbound message.

Keep verbatim: ai_thread_log (raw episodic), ai_usage_log, ai_skill (lazy-loaded #1426). Keep separate from the #1495 config-snapshot table (different ownership).

§8Open decisions — resolved

The eight forks, each resolved against the research and approved 2026-06-13 — fork 5 refined to default-auto-apply + an org setting; org-scope confirmed deferred. The recommended option is marked; the resolution line gives the rationale.

FORK 1 · MEMORY MODEL

Keep the single blob, or move to typed / per-contact memory?

  • Keep the single markdown blob
  • Typed + per-contact retrieval immediately
  • Full knowledge graph (Zep / Cognee)

The blob is uncompliant-by-construction (no provenance, no selective erase, silent overwrite loses history). Typed blocks are owner-legible and force summarize-over-grow. A KG is over-engineering at small-business volume; per-contact is high-value but a privacy surface — ship it deliberately later.

FORK 2 · INJECTION

Inject memory/learning into the prompt, or load on-demand?

  • Always-inject everything
  • Keep all on-demand (current)

Context-rot is empirical (bulk injection degrades the model even when correct); a mutating blob in the cached prefix = 100% KV-cache miss. Inject a small byte-stable summary in the user-turn layer, lazy-load the rest.

FORK 3 · LEARNING DATA MODEL

Parent-doc + versions (A) or version-chain only (B)?

  • Design A — parent doc + version children
  • Full hybrid (both tables)

The "current" pointer already plays the parent-doc role; a parent table is redundant and complicates rollback. Render the head as one clean doc (A's owner benefit) without storing a separate parent. Matches the settled v3 §8.

FORK 4 · LESSON TRIGGER

Where do lessons come from?

  • Every run
  • Failed / escalated only
  • LLM-judged notable

In suggest-mode the operator's edit is the correct answer — the richest, lowest-noise, free signal, uniquely available because owners already review drafts. Every-run floods the store with distractors; self-judged success is reward-hackable.

FORK 5 · APPLY POLICY

Auto-apply new learning versions, or require approval?

  • Approval required for everything
  • Auto-apply everything, no setting

→ Locked: learnings auto-apply by default so the agent improves without nagging; organization.learning_apply_mode = auto | approval lets a cautious owner gate behavior-changing versions. Safe to default because learning derives from trusted signals (operator edits, owner answers, tool errors) + always-on rollback + a visible change log; org-scope (large blast radius) stays deferred.

FORK 6 · CADENCE

Weekly cron, threshold, or manual?

  • Weekly cron
  • Threshold (every N) only
  • Manual-only for v1

A fixed cron under-reacts to a burst of corrections and over-churns when idle. Threshold-by-salience self-regulates; manual gives owners a button; the sweep catches quiet agents. Driven off the #1531 poller, cheap model.

FORK 7 · ORG-SCOPE

Ship org-scope cross-agent learning in v1?

  • Ship org-scope in v1

→ Locked, deferred: org-scope multiplies blast radius — one bad lesson silently mis-shapes every agent and amplifies the poisoning surface. Prove approval/rollback at agent-scope first; revisit once agent-scope learnings show what's worth promoting. Matches v3 §8.5.

FORK 8 · RELATIONSHIP

Learning vs. memory vs. skills — merge, separate, or layer?

  • Merge into one store
  • Keep three fully separate & unrelated

Cursor's deletion proves overlapping opaque stores breed drift and distrust. Skills = human manual; learning version = agent-distilled+approved; memory = agent facts; few-shot = retrieval-only. Learning emits proposals toward the stack, never overwrites it.

§9Build sequence

Value ships early and de-risked: prompt hygiene first, then capture the gold signal, then the first real "it learns from your edits" win — all before the gated evolution loop. Memory already exists; config-versioning (#1495) is a sibling.

  1. P1Populate the reserved slots + lazy skillssmall · low-risk

    Create ai_agent_learning_version (the target the FK has lacked), seed it only from existing rules/persona — no loop yet. Inject a small KV-cache-safe summary into the empty slots. Ship #1426 lazy skill index + load_skill. Net: immediate prompt-hygiene win, slots stop being empty, zero behavior-change risk. Depends: existing hooks, #1426.

  2. P2Capture the gold signal + the agent asksmedium

    Add ai_draft_correction on every operator edit/reject (hot path, no LLM call) + ai_agent_lesson from failed/escalated/corrected runs. Add the active-learning ask — an escalate_to_user-style question whose owner answer writes to memory (source = operator_answer). Surface a simple metric ("your agent needed editing X% less"). Pure capture — nothing applied yet. Depends: P1, ai_thread_log + suggest-mode UX (built), attention surface #1543/#1477.

  3. P3Dynamic few-shot from edits (contact-scoped)medium

    Induce a short NL preference (CIPHER) from above-threshold corrections, store contact/thread-scoped with a context embedding, retrieve top-k by inbound-message similarity, inject as few-shot example messages. Auto-applies safely (retrieval-only, can't leak) — the first real "it learns from your edits" win, no approval flow. Depends: P2 + an embedding path (pgvector or tsvector to start).

  4. P4Refactor memory blob → typed reconciling blocksmedium

    Split ai_agent_memory into typed char-capped blocks with read_only pinning; keep the tools but make writes reconcile (ADD/UPDATE/DELETE/NOOP) not overwrite. Owner gets an editable "what your agent remembers" panel. Removes the silent-overwrite footgun. Depends: P1; parallel to P3.

  5. P5Per-contact memory — retire the old contact bloblarge

    Add ai_contact_memory (typed, bi-temporal supersede, retrieve top-k by inbound relevance) and obsolete contact_note(area=Memory) + contact.ai_summary (§7.1). Retarget the four inline-aisdk writers to extract-and-reconcile and move them onto the rig runtime — this is #1497 · L4. Keep interaction notes + structured fields. Ship the per-contact erasure cascade + TTL/decay with it. Depends: P4 (typed/reconcile pattern); data-governance (GDPR Art.17 cascade, namespace isolation).

  6. P6Background digest → versions (auto-apply default)large

    The job: read undigested lessons+corrections, reconcile/dedup/frequency-gate, write a new ai_agent_learning_version with provenance + plain-language summary. Auto-applies by default (repoints current_learning_version_id at the next thread boundary); the per-org learning_apply_mode = approval setting makes it land pending instead. Rollback via previous_version_id always. Agent-scope only. Depends: P2+P4, #1531 poller; share ≤20-cap/diff with #1495 (separate tables).

  7. P7Trust surfaces — change log, shadow-replay, "struggled with"large

    The "what your agent learned" change log + one-click undo (the trust anchor for auto-apply). Optional shadow-replay against the last N real threads (especially in approval mode). Owner-first "topics your agent struggled with" report + per-reply retrieval observability. Corrected threads become regression fixtures (Sierra). Depends: P6; reuse #1534 activity + #1459 nudge.

  8. P8Org-scope learning deferredlater

    Org-scope cross-agent promotion behind a stronger gate (the scope column is present from P6). Deferred deliberately — large blast radius — and shaped by what agent-scope learnings teach us (fork 7). Depends: P6+P7 proven at agent scope.

i

Issue mapping

P1–P7 deliver the bulk of #1494 (L1 learning engine); P4–P5 are the memory story (P5 = the #1497 · L4 contact-memory onboarding); #1495 (L2 config-versioning) rides alongside P6 sharing diff/cap conventions but separate tables. P1 folds in #1426 (lazy skills); P2 reuses the attention surface (#1543 / #1477); P7 reuses #1534 (owner activity) + #1459 (drafts nudge). The loop is driven by the #1531 poller. Each phase warrants its own sub-issue under #1494.

§10References

Primary sources behind the synthesis — 20 research agents, 19 platforms. Grouped by what they informed.

Memory architecture & taxonomy

Learning paradigms

Commercial peers & products

Failure modes & governance

Loquent context