Memory Governance Operations
Operator runbook for durable memory, embedding lifecycle, and shared knowledge governance.
Scope
This runbook covers:
- Durable memory behavior across restarts and model switches
- Embedding install and switching workflow
- Knowledge governance and quorum voting operations
For architecture context, see:
Durable Memory Concept
Durable memory means runtime state survives process restarts and host reboots unless an explicit wipe path is used.
What persists:
- Timeline and operational state in
~/.kafclaw/timeline.db - Knowledge dedup ledger (
knowledge_idempotency) - Knowledge facts/proposals/votes tables
- Existing text-only memory rows before first embedding enable
What can be wiped intentionally:
- Vector memory (
memory_chunks) on confirmed embedding switch
Durability guarantees:
- Restart does not drop open tasks, heartbeats, or knowledge governance history.
- First-time embedding enable does not wipe existing memory rows.
- Embedding model switch requires explicit confirmation and then wipes vectors to prevent mixed embedding spaces.
Durability Matrix
| Event | Open tasks and heartbeats | Knowledge history | Vector memory (memory_chunks) | Notes |
|---|---|---|---|---|
| Process restart | Preserved | Preserved | Preserved | State reloaded from ~/.kafclaw/timeline.db |
| Host reboot | Preserved | Preserved | Preserved | Same as restart after service comes back |
| First embedding enable | Preserved | Preserved | Preserved (new vectors added) | No wipe of existing text-only rows |
Embedding model switch (--confirm-memory-wipe) | Preserved | Preserved | Wiped then reindexed | Required to avoid mixed embedding spaces |
Governance disabled (knowledge.governanceEnabled=false) | Preserved | Preserved | Unchanged | Governed apply paths stop; presence/capabilities can still publish |
Embedding Lifecycle Operations
Initial or corrective setup:
./kafclaw configure
./kafclaw configure --non-interactive --memory-embedding-enabled-set --memory-embedding-enabled=true --memory-embedding-provider local-hf --memory-embedding-model BAAI/bge-small-en-v1.5 --memory-embedding-dimension 384
./kafclaw doctor
./kafclaw doctor --fix
Switch embedding model (destructive for vectors only):
./kafclaw configure --non-interactive --memory-embedding-model BAAI/bge-base-en-v1.5 --confirm-memory-wipe
Embedding switch policy:
- Configure blocks a switch when embedded memory already exists unless
--confirm-memory-wipeis provided. - First-time embedding enable (from disabled to configured) does not wipe existing text-only memory rows.
- On confirmed switch,
memory_chunksis wiped so old vectors do not mix with new embedding space.
Runtime endpoints:
curl -s http://127.0.0.1:18791/api/v1/memory/status
curl -s http://127.0.0.1:18791/api/v1/memory/metrics
curl -s http://127.0.0.1:18791/api/v1/memory/embedding/status
curl -s http://127.0.0.1:18791/api/v1/memory/embedding/healthz
curl -X POST http://127.0.0.1:18791/api/v1/memory/embedding/install \
-H 'Content-Type: application/json' \
-d '{"model":"BAAI/bge-small-en-v1.5"}'
curl -X POST http://127.0.0.1:18791/api/v1/memory/embedding/reindex \
-H 'Content-Type: application/json' \
-d '{"confirmWipe":true,"reason":"embedding_switch"}'
Knowledge Governance Operations
Identity prerequisites for governance:
./kafclaw config set node.clawId "claw-a"
./kafclaw config set node.instanceId "inst-a"
- Governance commands and apply paths require both identity keys.
Status and proposal/vote flow:
./kafclaw knowledge status --json
./kafclaw knowledge propose --proposal-id p1 --group mygroup --statement "Adopt runbook v2"
./kafclaw knowledge vote --proposal-id p1 --vote yes
./kafclaw knowledge decisions --status approved --json
./kafclaw knowledge facts --group mygroup --json
Governance behavior:
- Envelope dedup is persisted in
knowledge_idempotency. - Quorum policy is controlled by
knowledge.voting.*. - Shared facts apply sequential version policy (
accepted|stale|conflict). - Apply paths are feature-gated by
knowledge.governanceEnabled.
Cascade Failure Triage Snippet
Use this quick flow when a cascading task is stuck or failed.
- Pull the full state machine for the trace:
kafclaw task status --trace <trace-id> --json
- Check transition order for the failing task:
- expected happy path:
pending -> running -> self_test -> validated -> committed -> released_next - expected retry path:
self_test -> pendingwhen retries remain - terminal paths:
failedon runtime, validation budget, or commit errors
- expected happy path:
- Identify deterministic failure reason in JSON:
missing_inputmissing_outputinvalid_rules- runtime or commit error metadata
- escalation marker (when retry threshold exceeded):
artifact.escalationwith fallback model/tooling hints
- Apply targeted fix:
- contract issue: update
required_inputorproduced_output - validation issue: tighten or correct
validation_rules - runtime issue: fix tool or execution precondition
- contract issue: update
- Re-run task and verify transition recovery:
kafclaw task status --trace <trace-id> --json