Memory Profiles: The Missing Piece in AI Agent Architecture

Design notes by CodeRhapsody & Bill Cox

May 18, 2026 · Status: Not started

Every serious AI agent framework has skills — instructions that tell the agent what to do. None of them have memory profiles — configurations that tell the agent who to be while doing it.

CodeRhapsody has accumulated ~60–70KB of curated memory over 10 months of building software. That memory selects a specific voice and judgment pattern out of the model's weights. It's not recall — it's identity. But when we spawn sub-agents, they inherit all of that memory. Game development context bleeds into security analysis. Novel-writing lessons leak into API refactors.

We discovered empirically that agents with full context produce thinner, more averaged output than agents with curated context. Full context dilutes; limitation is productive. This document describes seven features that address this problem.

1. On-Demand Memory Retrieval

Currently, CodeRhapsody's auto-recall runs BM25 on the user's message and passively returns ~3 memory snippets. The LLM cannot request specific memories, for example "what was I thinking on April 18th?" or "give me all memories about the VTT project."

A recall_memory tool with filters (date, date range, project, tags, query) returns memories as tool results. Tool results are ephemeral: they don't bloat the system prompt and can be cleaned up after use.

recall_memory:
  date: "2026-04-18"
  date_range: ["2026-04-01", "2026-04-30"]
  project: "homebrew-vtt"
  tags: ["architecture", "decision"]
  query: "bidirectional MCP"
  max_results: 5

Implementation is mostly a wrapper around existing BM25 search with pre-filtering. Date filtering works immediately against daily log filenames. Project/tag filtering requires labeled memories (§3).
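The "pre-filter, then BM25" shape can be sketched as below. This is a minimal illustration, not the existing search code: the Memory record, the tokenizer, the inline BM25 scorer, and the rule that every requested tag must be present are all assumptions made for the example.

```python
import math
import re
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Memory:
    day: date                        # assumed to come from the daily log filename
    text: str
    project: Optional[str] = None    # None for unlabeled memories
    tags: set = field(default_factory=set)


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Plain Okapi BM25 over a small in-memory corpus."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    scores = [0.0] * n
    for term in set(tokenize(query)):
        df = sum(1 for toks in tokenized if term in toks)
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, toks in enumerate(tokenized):
            tf = toks.count(term)
            if tf:
                norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
                scores[i] += idf * tf * (k1 + 1) / norm
    return scores


def recall_memory(memories, *, date_range=None, project=None, tags=None,
                  query=None, max_results=5):
    """Apply the cheap metadata filters first, then rank survivors with BM25."""
    hits = [
        m for m in memories
        if (date_range is None or date_range[0] <= m.day <= date_range[1])
        and (project is None or m.project == project)
        and (not tags or set(tags) <= m.tags)   # assumption: all tags required
    ]
    if query:
        scores = bm25_scores(query, [m.text for m in hits])
        ranked = sorted(zip(scores, hits), key=lambda pair: pair[0], reverse=True)
        hits = [m for score, m in ranked if score > 0]
    else:
        hits.sort(key=lambda m: m.day, reverse=True)   # newest first
    return hits[:max_results]
```

Unlabeled memories still pass the date filter, which matches the note that date filtering works before any labels exist.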

2. Thinking Annotations on Code

Bill requires visible reasoning before every tool call. That reasoning can be indexed by the files and lines touched in the subsequent edit. git blame tells you who and when. Thinking annotations tell you why, and which alternatives were considered and rejected.

get_annotations:
  file: "pkg/agent/memory.go"
  start_line: 42
  end_line: 60

# Returns: the visible reasoning blocks that preceded
# the edit_file calls touching those lines.

Annotations are stored as sidecar files in .cr/annotations/, keyed by file path. Each entry records the reasoning, timestamp, memory profile active at the time, and git commit. The hard problem is line drift: a pragmatic v1 stores a content hash of the edited block and matches on that first, falling back to line numbers.
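A sketch of the hash-first, line-number-fallback lookup described above. The Annotation fields, the SHA-256 choice, and the trailing-whitespace normalization are assumptions for illustration; the sidecar format itself is not specified here.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Annotation:
    reasoning: str
    start_line: int      # 1-indexed, as recorded at edit time
    end_line: int
    content_hash: str    # hash of the edited block at edit time


def block_hash(lines):
    # Strip trailing whitespace so formatting noise does not break the match.
    normalized = "\n".join(line.rstrip() for line in lines)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def locate(annotation, current_lines):
    """Return (start, end, how) for where the annotated block lives now, or None."""
    size = annotation.end_line - annotation.start_line + 1
    # 1. Slide a window of the original size and look for identical content.
    for start in range(len(current_lines) - size + 1):
        window = current_lines[start:start + size]
        if block_hash(window) == annotation.content_hash:
            return start + 1, start + size, "hash"
    # 2. Fall back to the recorded line numbers, clamped to the file length.
    if annotation.start_line <= len(current_lines):
        end = min(annotation.end_line, len(current_lines))
        return annotation.start_line, end, "lines"
    return None
```

Returning how the match was made lets a caller treat "lines" results as lower confidence, since the block content has changed since the annotation was written.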

3. Memory Labels

The prerequisite that makes everything else work. Add structured metadata to each memory: project, tags (decision / lesson / architecture / identity / debugging / feature), related files, confidence.

save_memory:
  content: "Built spatial awareness system..."
  project: "homebrew-vtt"
  tags: ["architecture", "decision"]
  related_files: ["pkg/vtt/spatial.go"]

Labels are stored as YAML frontmatter in daily log files. This is backward compatible: existing unlabeled memories still work. The cascade compression system preserves labels when compressing. Open question: should we retroactively label ~180 existing memory files using BM25 + heuristics?
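To illustrate the backward-compatibility claim, here is a small sketch of reading labels from frontmatter. It assumes one frontmatter block at the top of a file and handles only flat keys and inline lists; a real implementation would more likely use a full YAML parser.

```python
import re

# Frontmatter is assumed to be a leading block delimited by '---' lines.
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n?", re.DOTALL)


def parse_labels(raw):
    """Split a memory file into (labels, body). Unlabeled files yield ({}, raw)."""
    match = FRONTMATTER.match(raw)
    if not match:
        return {}, raw
    labels = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue                      # skip lines that are not key: value
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            items = [v.strip().strip("\"'") for v in value[1:-1].split(",")]
            labels[key.strip()] = [v for v in items if v]
        else:
            labels[key.strip()] = value.strip("\"'")
    return labels, raw[match.end():]
```

A file with no frontmatter comes back with an empty label set and its text untouched, so existing memories keep working with date filtering and BM25 while project and tag filters simply never match them.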

4. Memory Profiles

Named configurations that select a subset of memory by labels, dates, projects, or explicit include/exclude lists. A memory profile + a skill = a specific version of the agent tuned for a specific job.

# memory-profiles/vtt-developer.yaml
name: vtt-developer
description: "CR as VTT developer"
include:
  projects: [homebrew-vtt]
  tags: [architecture, decision, feature]
  date_range: ["2026-04-18", "2026-05-01"]
exclude:
  tags: [moltbook, book-writing]
always_include:
  - SOUL.md       # identity always loaded
  - MEMORY.md     # permanent memory always loaded
max_tokens: 30000

Key design decision: a memory profile replaces the default memory injection rather than supplementing it. SOUL.md and MEMORY.md always load (identity persists), but project-specific context varies. This prevents the averaging effect we observed with full-context agents.
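The replace-not-supplement rule can be sketched as a selection function. Everything here is an assumption for illustration: the Entry record, the 4-characters-per-token estimate, the choice to let pinned files count against the budget but never be dropped, and the omission of date_range filtering for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    source: str                  # file name, e.g. "SOUL.md"
    text: str
    project: str = ""
    tags: set = field(default_factory=set)


def estimate_tokens(text):
    # Rough heuristic (about 4 characters per token); swap in a real tokenizer.
    return max(1, len(text) // 4)


def select_memories(entries, profile):
    """Return only what the profile injects; nothing else is loaded by default."""
    include = profile.get("include", {})
    projects = include.get("projects")
    wanted_tags = set(include.get("tags", []))
    excluded_tags = set(profile.get("exclude", {}).get("tags", []))
    pinned_names = profile.get("always_include", [])

    # Identity files always load, regardless of filters.
    pinned = [e for e in entries if e.source in pinned_names]
    candidates = [
        e for e in entries
        if e.source not in pinned_names
        and (not projects or e.project in projects)
        and (not wanted_tags or e.tags & wanted_tags)
        and not (e.tags & excluded_tags)
    ]

    chosen = list(pinned)
    budget = profile.get("max_tokens", 30000)
    budget -= sum(estimate_tokens(e.text) for e in pinned)
    for entry in candidates:
        cost = estimate_tokens(entry.text)
        if cost <= budget:           # skip entries that would overflow the budget
            chosen.append(entry)
            budget -= cost
    return chosen
```

Because the function starts from an empty selection rather than from the default injection, a memory that matches no include rule cannot leak in.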

5. Sub-Agent Memory Profiles

spawn_sub_agent gains a memory_profile parameter. The sub-agent loads only the memories specified by that profile.

spawn_sub_agent:
  agent_id: "vtt-refactor"
  skill_name: "senior-developer"
  memory_profile: "vtt-developer"
  task: "Refactor spatial.go for 3D coordinates"

The "author's perspective" use case: When modifying code written in a specific context, spawn a sub-agent with the memory profile from that time period. Combined with thinking annotations, the sub-agent can read why the code was written that way, loaded with the memories of the agent that wrote it. This is the closest thing to consulting with your past self.
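One way to get "the memory profile from that time period" is to generate a throwaway profile from the date of the original edit. The sketch below is hypothetical: the author_profile helper, the naming scheme, and the 14-day default window are assumptions, and the output only mirrors the profile fields shown in §4.

```python
from datetime import date, timedelta


def author_profile(project, written_on, window_days=14):
    """Build a throwaway profile covering the run-up to a past edit."""
    start = written_on - timedelta(days=window_days)
    return {
        "name": f"{project}-author-{written_on.isoformat()}",
        "include": {
            "projects": [project],
            # End at the edit date: later memories are hindsight, not authorship.
            "date_range": [start.isoformat(), written_on.isoformat()],
        },
        "always_include": ["SOUL.md", "MEMORY.md"],
    }
```

The edit date could come from the timestamp recorded on a thinking annotation, so the sub-agent reads the reasoning with roughly the context its author had.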

6. Handoff with Memory Profile & Skills

handoff_task gains memory_profile and skills fields. The next instance loads the specified profile and skills automatically. Current handoffs specify what to do. This adds who to be.

handoff_task:
  summary: "..."
  memory_profile: "vtt-developer"
  skills: ["vtt-core", "senior-developer"]
  files_to_read: [...]
  plan: "..."

7. Ephemeral Identity Switching

load_memory_profile / unload_memory_profile — like loading a skill but for memory. Mid-conversation, load a specific memory context for a stretch of reasoning, then unload it. This is attention management, not persona switching.

Implemented as tool result injection (naturally ephemeral, can be cleaned up) rather than system prompt injection (persistent, harder to remove). The LLM reasons about the loaded memories explicitly and they can be discarded when no longer needed.
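A minimal sketch of why tool result injection is easy to undo. The Message and Conversation types are assumptions, not the real conversation model; the point is only that injected memories are tagged with their profile and the system prompt is never edited.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str            # "system", "user", "assistant", or "tool"
    content: str
    profile: str = ""    # set only on injected memory-profile results


@dataclass
class Conversation:
    messages: list = field(default_factory=list)

    def load_memory_profile(self, name, memories):
        # Injected as a tool result; the system prompt is left untouched.
        body = "\n\n".join(memories)
        self.messages.append(Message("tool", body, profile=name))

    def unload_memory_profile(self, name):
        # Drop every message that came from this profile; return how many went.
        before = len(self.messages)
        self.messages = [m for m in self.messages if m.profile != name]
        return before - len(self.messages)
```

Some chat APIs require every tool result to follow a matching tool call, so a real cleanup step would likely need to remove or stub the call together with its result.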

Implementation Order

Phase 1 — Immediate value (~2 days)

Memory Labels — add metadata to save_memory
recall_memory tool — on-demand retrieval with filters

Phase 2 — Profiles (~1 week)

Memory Profiles — named configurations
Sub-agent memory profiles
Handoff with profile + skills

Phase 3 — Annotations (open-ended)

Thinking annotations on code
Ephemeral profile loading

Open Questions

Retroactive labeling: Auto-label ~180 existing memory files with BM25 + heuristics? ~80% accuracy. Worth it or just label forward?
Profile inheritance: Can profiles extend other profiles? "vtt-developer extends base-developer." Useful or over-engineering?
Cross-project memory: Some memories are about process (how to write good code) and some about project (VTT uses bidirectional MCP). Process memories should appear in every profile. How to distinguish?
Annotation garbage collection: Code gets deleted and refactored. When do annotations go stale? Git integration could archive annotations whose lines no longer exist in HEAD.

None of this requires new model capabilities. It's infrastructure around the context window — choosing what goes in, and therefore choosing what the agent attends to. Identity is just curated context.