Persistent Memory: How AI Remembers You Across Sessions
A deep dive into Claude Code's memory system — MEMORY.md indexing, four memory types, automatic extraction, memory injection and deduplication, and team memory sync
The Problem
Every time you start a new conversation, the AI starts from scratch. It doesn't know who you are, what tech stack your project uses, or what behaviors you corrected last time. You have to repeatedly tell it "don't use mocks in tests," "I'm a backend engineer, don't give me CSS 101," "please submit PRs to the develop branch." This isn't a conversation — it's retraining an amnesiac assistant from scratch every time.
Claude Code's memory system (internally codenamed memdir, short for memory directory) solves this problem at its root. It maintains a structured set of persistent memories on the filesystem, allowing the AI to load your preferences, project context, and historical feedback at the start of every new session. Going further, it can automatically extract content worth remembering during conversations, without you having to manually say "please remember this."
This article takes a deep dive into the source code of src/memdir/ and src/services/extractMemories/, dissecting the design and implementation of this system layer by layer.
memdir Filesystem Design: A Two-Level Structure
Claude Code's memories aren't stored in a database or serialized into some JSON blob. It uses a filesystem-as-database design — each memory is an independent Markdown file, linked together by an index file called MEMORY.md.
Directory Layout
This path is computed by getAutoMemPath() in src/memdir/paths.ts:
The path resolution priority chain is clear:
- CLAUDE_COWORK_MEMORY_PATH_OVERRIDE environment variable — full path override for Cowork scenarios
- autoMemoryDirectory in settings.json — user-level configuration (supports ~/ expansion)
- Default path: ~/.claude/projects/<sanitized-git-root>/memory/
Note that findCanonicalGitRoot is used here to ensure all worktrees of the same repository share a single memory directory — a subtle but important design decision.
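A rough sketch of that priority chain (illustrative only, not the actual source; the Settings shape and the sanitization rule are simplified assumptions, and findCanonicalGitRoot is taken as an input here):

```typescript
import * as os from "node:os";
import * as path from "node:path";

interface Settings {
  autoMemoryDirectory?: string; // user-level setting from settings.json
}

// Expand a leading "~/" to the user's home directory
function expandTilde(p: string): string {
  return p.startsWith("~/") ? path.join(os.homedir(), p.slice(2)) : p;
}

function getAutoMemPathSketch(settings: Settings, canonicalGitRoot: string): string {
  // 1. Full path override for Cowork scenarios
  const override = process.env.CLAUDE_COWORK_MEMORY_PATH_OVERRIDE;
  if (override) return override;

  // 2. User-level configuration, with ~/ expansion
  if (settings.autoMemoryDirectory) {
    return expandTilde(settings.autoMemoryDirectory);
  }

  // 3. Default: a per-project directory keyed by the canonical git root,
  //    so all worktrees of one repository share a single memory directory
  const sanitized = canonicalGitRoot.replace(/[^a-zA-Z0-9]/g, "-");
  return path.join(os.homedir(), ".claude", "projects", sanitized, "memory");
}
```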
MEMORY.md: An Index, Not Content
MEMORY.md is a plain text index file where each line is a link pointing to a specific memory file. The format requirements are strict:
Each line is no more than ~150 characters, containing only a title and a one-line hook description. Never write memory content directly in MEMORY.md — content goes in individual files.
The motivation behind this two-level structure is practical: MEMORY.md is loaded entirely into the context at the start of every session, so it must stay lean. If all memory content were crammed in here, the context window would be exhausted quickly.
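To make the two-level structure concrete, a hypothetical MEMORY.md might look like this (the exact line syntax of the real index is not documented here; titles, hooks, and paths are invented):

```markdown
no-db-mocks — Don't use database mocks in tests; real Postgres only (feedback/no-db-mocks.md)
payments-profile — Backend engineer on payments; prefers terse, code-first answers (user/profile.md)
compliance-migration — Sarah leads; hard deadline 2026-04-02 (project/compliance-migration.md)
runbook-locations — Where on-call runbooks and deploy status live (reference/runbooks.md)
```

Each line stays under the ~150-character budget: just enough for the index to tell the model which file is worth opening.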
MEMORY.md Constraints: 200 Lines / 25KB
The index file cannot grow indefinitely. Two hard constraints are defined in src/memdir/memdir.ts:
200 lines is the line-count limit, and 25KB is the byte limit. These are independent constraints: even when the line count is under 200, exceptionally long lines can push the total past 25KB and trigger truncation. The comment on the byte limit explains why: someone's index file stayed under 200 lines yet totaled 197KB because each line was extremely long.
The truncation logic is implemented by truncateEntrypointContent():
The truncation strategy is deliberate: first truncate by lines (natural boundaries), then if bytes still exceed the limit, find the last newline before the limit and cut there, avoiding splitting a line in the middle. After truncation, a WARNING is appended at the end, telling the model the index was truncated and should be kept concise.
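A sketch of that two-stage strategy (illustrative, not the actual implementation; the warning text is paraphrased):

```typescript
const MAX_LINES = 200;
const MAX_BYTES = 25_000;
const WARNING = "\n\nWARNING: memory index was truncated; keep MEMORY.md concise.";

function truncateEntrypointContentSketch(content: string): string {
  let truncated = false;

  // Stage 1: cut by line count — lines are natural boundaries
  let lines = content.split("\n");
  if (lines.length > MAX_LINES) {
    lines = lines.slice(0, MAX_LINES);
    truncated = true;
  }
  let result = lines.join("\n");

  // Stage 2: cut by bytes, backing up to the last newline before the
  // limit so no line is split in the middle
  const bytes = Buffer.from(result, "utf8");
  if (bytes.length > MAX_BYTES) {
    const slice = bytes.subarray(0, MAX_BYTES).toString("utf8");
    const lastNewline = slice.lastIndexOf("\n");
    result = lastNewline > 0 ? slice.slice(0, lastNewline) : slice;
    truncated = true;
  }

  // Tell the model the index was cut so future writes stay lean
  return truncated ? result + WARNING : result;
}
```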
Four Memory Types
Memories aren't an undifferentiated pile of text. Claude Code defines a closed four-type taxonomy, where each type has clear guidelines for when to save, how to use, and what content structure to follow:
user: Understanding the User
Records the user's role, goals, responsibilities, and knowledge background. The core purpose is to let the AI adjust its behavior based on a user profile — the collaboration style for a senior backend engineer and a programming beginner should be fundamentally different.
When to save: When you learn about the user's role, preferences, responsibilities, or knowledge domain.
Use case: When work needs to be adapted based on the user's profile. For example, if a user asks you to explain a piece of code, you should choose the depth and angle of explanation based on their background.
Example:
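An illustrative user memory (the profile is invented, and the frontmatter is reduced to just the type and description fields):

```markdown
---
type: user
description: Senior backend engineer on payments; prefers terse, code-first answers
---

- Role: senior backend engineer on the payments team
- Strong in Go and Postgres; little frontend experience
- Prefers short answers; skip beginner-level explanations and CSS basics
```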
feedback: Behavioral Guidance
Records user feedback on the AI's working methods — including both corrections and affirmations. This is particularly important: if only corrections are recorded, the AI becomes increasingly conservative, afraid to repeat approaches the user has actually approved.
When to save: When the user corrects your approach ("don't do that") or confirms a non-obvious approach works ("exactly, keep going").
Content structure: Write the rule itself first, then a Why: line (the reason the user gave), followed by a How to apply: line (in what scenarios this rule applies). Knowing "why" enables correct judgment in edge cases.
Example:
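Following that structure, an illustrative feedback memory (the scenario is invented):

```markdown
---
type: feedback
description: Don't use database mocks in tests; run against real Postgres
---

Don't use database mocks in tests; run them against a real Postgres instance.
Why: mocked queries hid a transaction-isolation bug that only surfaced in production.
How to apply: any test that touches the data layer; pure-logic unit tests are exempt.
```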
project: Project Dynamics
Records information about ongoing work, goals, bugs, and incidents — things that cannot be derived from code or git history.
When to save: When you learn who is doing what, why, and what the deadline is. Note that relative dates should be converted to absolute dates ("next Thursday" -> "2026-04-02") so the memory remains understandable as time passes.
Example:
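An illustrative project memory (names and dates invented), with the relative date already converted to an absolute one:

```markdown
---
type: project
description: Compliance migration status, owner, and deadline
---

- Sarah is leading the compliance migration; hard deadline 2026-04-02
  (recorded 2026-03-26, when the user said "next Thursday")
- Currently blocked on the audit-log schema review
```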
reference: External Resource Pointers
Stores pointers to information locations in external systems — letting the AI know where to find the latest information.
Example:
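An illustrative reference memory (all locations invented):

```markdown
---
type: reference
description: Where on-call runbooks and deploy status live
---

- On-call runbooks: internal wiki, "Payments / Runbooks" space
- Deploy status: the team's internal deploy dashboard
- Release checklist: pinned document in the team channel
```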
Type Parsing
Type information is validated through a parser function that gracefully handles legacy files and unknown types:
Invalid or missing types return undefined — old files won't crash, and new files with incorrect types simply degrade gracefully.
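A minimal sketch of such a parser, assuming a closed four-value union (illustrative, not the actual source):

```typescript
// The closed taxonomy: anything outside it degrades to undefined
const MEMORY_TYPES = ["user", "feedback", "project", "reference"] as const;
type MemoryType = (typeof MEMORY_TYPES)[number];

function parseMemoryTypeSketch(raw: unknown): MemoryType | undefined {
  // Legacy files may lack the field entirely; unknown strings are
  // tolerated rather than crashing the scan
  if (typeof raw !== "string") return undefined;
  return (MEMORY_TYPES as readonly string[]).includes(raw)
    ? (raw as MemoryType)
    : undefined;
}
```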
What Not to Save
Code patterns, project structure, architecture design, git history, debugging solutions, content already in CLAUDE.md, and temporary task state — all of these are "derivable from the current project state" and should not be stored as memories. Even if a user explicitly asks to save a PR list or activity summary, you should follow up with "what here is surprising or non-obvious?" — only that part is worth saving.
Frontmatter Metadata Format
Each individual memory file uses standard YAML frontmatter:
The description field is particularly critical — it's not just a human-readable note, but the core basis used by the memory retrieval system (findRelevantMemories) to determine whether a memory is relevant to the current query. A good description should be specific enough to distinguish context, such as "Don't use database mocks in tests — lesson from compliance migration failure" rather than "testing-related feedback."
The frontmatter format example is defined in memoryTypes.ts:
Memory Scanning and Directory Management
memoryScan: Scanning Memory Files
src/memdir/memoryScan.ts provides directory scanning primitives shared by both the retrieval and extraction paths:
scanMemoryFiles() recursively scans all .md files in the directory (excluding MEMORY.md), reads the first 30 lines of frontmatter from each file, then sorts by modification time in descending order, returning at most 200 entries:
A design highlight: readFileInRange reads only the first 30 lines of each file rather than the whole file, and it returns mtimeMs alongside the content, eliminating the need for a separate stat call — in the common case (N <= 200), this cuts the number of system calls in half.
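A simplified sketch of the scan (illustrative; note that this version makes the separate statSync call that the real readFileInRange avoids):

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface MemoryFileEntry {
  filePath: string;
  frontmatterHead: string; // first 30 lines only — enough for frontmatter
  mtimeMs: number;
}

const MAX_ENTRIES = 200;
const HEAD_LINES = 30;

function scanMemoryFilesSketch(dir: string): MemoryFileEntry[] {
  const entries: MemoryFileEntry[] = [];
  const walk = (d: string) => {
    for (const name of fs.readdirSync(d)) {
      const full = path.join(d, name);
      const stat = fs.statSync(full);
      if (stat.isDirectory()) {
        walk(full); // recurse into subdirectories
      } else if (name.endsWith(".md") && name !== "MEMORY.md") {
        const head = fs
          .readFileSync(full, "utf8")
          .split("\n")
          .slice(0, HEAD_LINES)
          .join("\n");
        entries.push({ filePath: full, frontmatterHead: head, mtimeMs: stat.mtimeMs });
      }
    }
  };
  walk(dir);
  // Newest first, capped at 200 entries
  return entries.sort((a, b) => b.mtimeMs - a.mtimeMs).slice(0, MAX_ENTRIES);
}
```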
Scan results can also be formatted as a text manifest for use in retrieval and extraction prompts:
ensureMemoryDirExists: Directory Guarantee
Called only once per session (cached via systemPromptSection), this ensures the memory directory exists so the model doesn't need to run mkdir or check for directory existence when writing files:
The prompt even explicitly tells the model "the directory already exists — write to it directly with the Write tool, don't run mkdir or check for existence":
A comment explains why this is necessary: "Claude used to spend several turns running ls and mkdir -p before writing files."
Automatic Memory Extraction: extractMemories
This is the most sophisticated part of the memory system. Claude Code doesn't require you to manually say "remember this" — it has a background agent that automatically analyzes conversation content at the end of each exchange, extracting memories worth persisting.
Trigger Timing
The extraction agent runs at the end of each complete query cycle (when the model produces a final response with no more tool calls), triggered via handleStopHooks:
Mutual Exclusion with the Main Agent
A key design is the mutual exclusion between the extraction agent and the main agent: if the main agent has already written memory files during the conversation, the extraction agent skips that range and only advances the cursor:
This mutual exclusion prevents duplicate writes — memories written by the main agent won't be written again by the background agent.
Forked Agent Mode
The extraction agent runs using runForkedAgent — a "perfect fork" of the main conversation that shares the parent's prompt cache. This means the extraction agent doesn't need to resend the entire conversation history, dramatically reducing token costs:
Note the hard limit of maxTurns: 5 — this prevents the extraction agent from falling into a "verification rabbit hole" (e.g., reading source code to confirm whether a certain pattern actually exists).
Tool Permission Sandbox
The extraction agent has strict tool permission restrictions, defined by createAutoMemCanUseTool:
- Allowed: FileRead, Grep, Glob (read-only)
- Allowed: Read-only Bash commands (ls, find, cat, stat, etc.)
- Allowed: FileEdit, FileWrite — but only for paths within the memory directory
- Denied: All other tools (MCP, Agent, write-capable Bash, etc.)
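A sketch of the permission predicate (illustrative; the exact read-only Bash allowlist and the real command parsing are assumptions, and real parsing would have to handle pipes, subshells, and similar escape hatches):

```typescript
import * as path from "node:path";

const READ_ONLY_TOOLS = new Set(["FileRead", "Grep", "Glob"]);
const WRITE_TOOLS = new Set(["FileEdit", "FileWrite"]);
// Assumed allowlist; the article names ls, find, cat, stat as examples
const READ_ONLY_BASH = new Set(["ls", "find", "cat", "stat", "head", "wc"]);

function canUseToolSketch(
  memoryDir: string,
  tool: string,
  input: { command?: string; file_path?: string },
): boolean {
  if (READ_ONLY_TOOLS.has(tool)) return true;

  if (tool === "Bash") {
    // Naive check: only the first word of the command is inspected
    const cmd = (input.command ?? "").trim().split(/\s+/)[0];
    return READ_ONLY_BASH.has(cmd);
  }

  if (WRITE_TOOLS.has(tool)) {
    // Writes must resolve to a path inside the memory directory
    const resolved = path.resolve(input.file_path ?? "");
    return resolved.startsWith(path.resolve(memoryDir) + path.sep);
  }

  return false; // everything else (MCP, Agent, ...) is denied
}
```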
Extraction Prompt Design
The prompt received by the extraction agent is built by src/services/extractMemories/prompts.ts. It includes the complete type taxonomy, saving rules, and a key optimization — pre-injecting the existing memory manifest:
The extraction prompt also includes a strict constraint: "You may only use content from the most recent ~N messages to update memories. Do not spend turns investigating or verifying this content — don't grep source code, don't read code to confirm patterns, don't run git commands."
Concurrency Control and Message Coalescing
The extraction system has sophisticated concurrency control. When an extraction is already in progress, incoming requests are stashed and a "trailing extraction" runs after the current one completes:
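A sketch of this stash-and-trail pattern (illustrative; the real implementation also tracks message cursors and telemetry). At most one extraction runs at a time, and at most one pending request is kept, so a burst of requests coalesces into a single trailing run:

```typescript
type ExtractRequest = { upToMessageIndex: number };

class ExtractionScheduler {
  private running = false;
  private pending: ExtractRequest | null = null;

  constructor(private runExtraction: (req: ExtractRequest) => Promise<void>) {}

  async request(req: ExtractRequest): Promise<void> {
    if (this.running) {
      // Coalesce: keep only the most recent stashed request
      this.pending = req;
      return;
    }
    this.running = true;
    try {
      await this.runExtraction(req);
    } finally {
      this.running = false;
      // Trailing extraction for whatever arrived while we were busy
      const next = this.pending;
      this.pending = null;
      if (next) await this.request(next);
    }
  }
}
```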
Extraction Frequency Throttling
Extraction doesn't run on every turn — the interval is controlled via the feature flag tengu_bramble_lintel (default: every 1 eligible turn):
Memory Injection Timing
Memories are loaded into the conversation context through two paths:
Path One: System Prompt Injection (MEMORY.md Index)
loadMemoryPrompt() is called during system prompt construction, injecting the content of MEMORY.md (after truncation processing) into the system prompt. This is the first layer of memory loading at session startup:
Path Two: Relevant Memory Prefetch (Individual Memory Files)
The MEMORY.md index is always loaded, but individual memory file contents are not all loaded — that would waste context. Instead, the system selectively prefetches the most relevant memories based on the user's current query.
This process is driven by startRelevantMemoryPrefetch():
Key design decisions in the prefetch:
- Non-blocking: The prefetch is asynchronous, never blocking the main query loop
- Cancellable: Linked to a turn-level AbortController, so the user can cancel immediately by pressing Escape
- Disposable pattern: Uses the using keyword binding, automatically cleaning up on all exit paths of the query loop (return, throw, .return())
- Session-level byte cap: Prevents unlimited memory injection in long sessions
findRelevantMemories: AI-Driven Memory Retrieval
Memory file selection doesn't rely on keyword matching — it uses a Sonnet model via sideQuery to determine which memories are most relevant to the current query:
The selector's system prompt is precise:
The selector also receives a "recently successfully used tools" list, excluding reference docs for tools already in use (since that's noise), but keeping warnings and known issues about those tools (since those are exactly what's needed during use).
Memory Deduplication
There is a deduplication step during memory injection — preventing memories the model has already read from being injected again. This is implemented by filterDuplicateMemoryAttachments():
A source code comment specifically mentions a subtle bug fix here:
The mark-after-filter ordering is load-bearing: readMemoriesForSurfacing used to write to readFileState during the prefetch, which meant the filter saw every prefetch-selected path as "already in context" and dropped them all (self-referential filter).
The previous implementation wrote to readFileState during the prefetch phase, so when the filter checked, it found all prefetched memories were "already in context" — it filtered out itself. The fix was to defer the write until after filtering.
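The fixed ordering can be sketched as follows (illustrative; a bare Set stands in for the session's readFileState bookkeeping):

```typescript
interface MemoryAttachment {
  path: string;
  content: string;
}

function injectMemoriesSketch(
  prefetched: MemoryAttachment[],
  readFileState: Set<string>,
): MemoryAttachment[] {
  // 1. Filter FIRST, against what is genuinely already in context
  const fresh = prefetched.filter((m) => !readFileState.has(m.path));

  // 2. Only THEN mark the survivors as read. Marking during the prefetch
  //    made the filter see its own writes and drop everything.
  for (const m of fresh) readFileState.add(m.path);

  return fresh;
}
```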
Memory Expiration and Update Strategy
Time Awareness
src/memdir/memoryAge.ts provides human-readable time annotations:
Why convert timestamps to "47 days ago" instead of ISO format? Because models perform poorly at date arithmetic — seeing 2026-02-12T08:33:00Z doesn't automatically trigger the realization "this is from a long time ago," but seeing "47 days ago" immediately triggers staleness reasoning.
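A sketch of the annotation (illustrative; the exact wording and day boundaries in the real code are assumptions):

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Convert a file mtime into a phrase the model reasons about naturally
function formatMemoryAgeSketch(mtimeMs: number, nowMs: number): string {
  const days = Math.floor((nowMs - mtimeMs) / DAY_MS);
  if (days <= 0) return "today";
  if (days === 1) return "1 day ago";
  return `${days} days ago`;
}
```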
Staleness Warning Injection
Memories older than 1 day are annotated with a staleness warning:
The motivation for this warning came from user reports: stale code state memories (containing file:line references) were being asserted as facts, and the references made stale claims appear more authoritative rather than less reliable.
Verify Before Asserting
The TRUSTING_RECALL_SECTION in the system prompt requires the model to verify before recommending based on memories:
A comment documents the eval validation results: when this text was renamed from "Trusting what you recall" to "Before recommending from memory," the eval went from 0/3 to 3/3 — title wording affected the model's behavioral triggers.
Team Memory Sync
When the TEAMMEM feature flag is enabled, the memory system expands to a dual-directory structure:
Team Path
The team memory directory is a subdirectory of the personal memory directory:
Dual-Directory Prompt
When team memory is enabled, the prompt includes instructions for both directories, with each memory type annotated with a <scope> tag to guide placement:
- user type: Always private (your personal profile shouldn't be shared)
- feedback type: Private by default, unless it's clearly a project-level convention (e.g., testing strategy)
- project type: Leans toward team
- reference type: Usually team
Sync Mechanism
src/services/teamMemorySync/ implements the full sync mechanism:
Sync semantics:
- Pull: Server content overwrites local files by key (server wins)
- Push: Only uploads keys whose content hashes differ from the server (delta upload). The server uses upsert semantics — keys not present in the PUT are preserved
- Deletes don't propagate: Deleting a file locally won't delete it from the server; it will be restored on the next pull
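The push side of these semantics can be sketched as a hash diff over keys (illustrative; the actual hash algorithm and transport are not specified here, and sha256 is an assumption):

```typescript
import { createHash } from "node:crypto";

function hashContent(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Returns only the keys whose local content hash differs from the server's
// last-known hash. Keys absent locally are NOT sent as deletions, matching
// the "deletes don't propagate" rule.
function selectKeysToPushSketch(
  local: Map<string, string>,        // key -> file content
  serverHashes: Map<string, string>, // key -> last-known server hash
): string[] {
  const toPush: string[] = [];
  for (const [key, content] of local) {
    if (serverHashes.get(key) !== hashContent(content)) toPush.push(key);
  }
  return toPush;
}
```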
File Monitoring
The watcher uses fs.watch({ recursive: true }):
Why not use chokidar? A comment explains: chokidar 4+ removed fsevents support, and Bun's fs.watch fallback uses kqueue — each monitored file requires a file descriptor. With 500+ team memory files, that's 500+ permanently held file descriptors. recursive: true uses FSEvents on macOS (O(1) fd) and inotify on Linux (O(number of subdirectories)).
Security Safeguards
Team memory involves cross-user sharing, so security is critical. teamMemPaths.ts implements multiple layers of path safety checks:
- Path injection protection: sanitizePathKey() checks for null bytes, URL-encoded traversal (%2e%2e%2f), Unicode normalization attacks, backslashes, and absolute paths
- Symlink protection: realpathDeepestExisting() resolves symlinks to real paths, preventing escape out of the team directory via symlinks
- Dangling symlink detection: Uses lstat to distinguish "truly doesn't exist" from "symlink target doesn't exist"
- Secret scanning: scanForSecrets() uses gitleaks rules to detect API keys, credentials, and other sensitive data, blocking pushes
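A sketch of the path-key checks (illustrative and deliberately incomplete; the NFC comparison is a crude stand-in for the real Unicode handling, and the real code layers more cases):

```typescript
import * as path from "node:path";

function sanitizePathKeySketch(key: string): string | null {
  if (key.includes("\0")) return null;           // null bytes
  if (/%2e|%2f|%5c/i.test(key)) return null;     // URL-encoded traversal
  if (key !== key.normalize("NFC")) return null; // Unicode normalization tricks (simplified)
  if (key.includes("\\")) return null;           // backslashes
  if (path.isAbsolute(key)) return null;         // absolute paths
  // Reject any remaining ".." traversal after normalization
  const normalized = path.posix.normalize(key);
  if (normalized.startsWith("..") || normalized.split("/").includes("..")) return null;
  return normalized;
}
```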
Permanent Failure Suppression
When a push fails for an unrecoverable reason (no OAuth, 404, 413, etc.), the watcher suppresses subsequent retries, avoiding infinite retry loops. There was a case where a device without OAuth generated 167,000 push events over 2.5 days.
Complete Memory System Lifecycle
Portable Patterns
The memory system's design has an implicit but important property: portability.
Because all memories are standard Markdown files, stored at deterministic paths on the filesystem, with a uniform frontmatter format:
- Cross-device migration: Copy the ~/.claude/projects/ directory to migrate all memories
- Version control: The memory directory can be put under git management (though this isn't done by default)
- Backup and restore: Standard filesystem backup tools work out of the box
- Bulk editing: Any text editor can directly modify memories
- Programmatic operations: Scripts can directly read and write files in the frontmatter format
- Cross-tool compatibility: Other tools can read and understand this format
No proprietary database format, no encrypted blobs, no storage that requires a specific API to access. This is a deliberate design choice — it trades some query efficiency (compared to SQLite) for transparency and operability.
Path Safety and Configurability
The configurability of the path system is also worth noting. validateMemoryPath() in paths.ts performs strict security validation on paths:
A security comment specifically notes: projectSettings (committed to the repo in .claude/settings.json) is deliberately excluded — a malicious repository could set autoMemoryDirectory: "~/.ssh" to gain write access to a sensitive directory. Only policySettings, localSettings, and userSettings from trusted sources are accepted.
Special Mode: KAIROS Assistant Logs
When feature('KAIROS') is enabled and running in assistant mode, the memory system switches to a log mode. Assistant sessions are effectively long-running, so instead of maintaining a MEMORY.md index, the agent writes to date-named log files in append mode:
Each log entry is a brief timestamped bullet point. The MEMORY.md index is generated by a separate /dream skill that distills from the logs overnight.
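As an illustration (the file would carry a date name such as 2026-03-27.md; all entries here are invented), a day's log might read:

```markdown
- 09:12 Started triaging flaky payments tests at the user's request
- 09:40 Root cause is invoice rounding, not timezones; user confirmed
- 14:05 User approved moving the retry queue to exponential backoff
```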
The design motivation for this mode is: in long-running sessions, the cost of maintaining an index in real time is too high, and logs are naturally ordered by time, so they don't need an index for organization. The distillation process can run during low-load periods, performing deeper organization.
Disabling Memory
The memory system can be disabled at multiple levels:
Priority chain: Environment variable > --bare mode > Remote mode detection > settings.json > Enabled by default.
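The chain can be sketched as follows (illustrative; the environment-variable name and the settings shape are hypothetical, since the article does not name them):

```typescript
interface MemoryConfig {
  bareMode: boolean;             // running with --bare
  isRemote: boolean;             // remote mode detection
  settingsEnabled?: boolean;     // from settings.json; undefined = unset
}

function isMemoryEnabledSketch(
  env: Record<string, string | undefined>,
  cfg: MemoryConfig,
): boolean {
  // 1. Environment variable wins outright (hypothetical name)
  if (env.CLAUDE_DISABLE_AUTO_MEMORY === "1") return false;
  // 2. --bare mode disables memory
  if (cfg.bareMode) return false;
  // 3. Remote mode detection
  if (cfg.isRemote) return false;
  // 4. Explicit settings.json value
  if (cfg.settingsEnabled !== undefined) return cfg.settingsEnabled;
  // 5. Enabled by default
  return true;
}
```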
Design Insights
Claude Code's memory system has several design decisions worth reflecting on:
Filesystem as database. No SQLite, LevelDB, or any embedded database — just the filesystem directly. This may seem "primitive," but it delivers debuggability (just cat to inspect), portability (just copy the directory), and operability (any editor can modify it). For a system where memory entries typically don't exceed 200 and each file is no more than a few KB, filesystem performance is more than sufficient.
Closed type taxonomy. Only four types, each with clear guidance on "what to save and what not to save." This prevents the model's tendency to save everything as a memory — particularly the rule "content derivable from code should not be saved as memory" effectively prevents memory bloat.
AI-driven retrieval. Memory retrieval doesn't rely on keyword matching or vector search, but instead directly lets another AI (Sonnet) look at frontmatter descriptions to judge relevance. This is very effective when the memory count is small (< 200) — each memory's description is semantically rich natural language, and the AI can make more accurate judgments than keyword matching.
Eval-driven prompt iteration. Code comments repeatedly reference eval results to explain specific wording choices — for example, changing a section title from "Trusting what you recall" to "Before recommending from memory" improved the eval from 0/3 to 3/3. This demonstrates that the memory system's behavior is largely determined by prompt engineering, and prompt wording choices require quantitative validation.
Mutually exclusive write paths. The mutual exclusion design between the main agent and extraction agent avoids duplicate memories, but also means that if the main agent writes one memory during a conversation, even if it missed other memorable content in the same conversation, the extraction agent won't fill in the gaps — it considers the entire range as already processed. This is an intentional trade-off: between redundancy and omission, it chose omission.
How does this system perform in practice? Judging from the telemetry events and eval references scattered throughout the source code, it has gone through extensive experimentation and iteration. Memory is not just an engineering problem but a product design problem — what to remember, what not to remember, when to surface, and when to stay silent. These decisions directly impact user experience. Claude Code's memory system provides a battle-tested answer.