The Problem
When an AI Agent runs in a real-world environment, failure is the norm rather than the exception. Network timeouts, API overloads, insufficient file permissions, truncated model output, users suddenly pressing Esc — these are not edge cases, but everyday events that happen millions of times per day.
Claude Code's core design philosophy is: errors should not terminate the session; they should trigger recovery. The main loop in query.ts is not a linear request-response flow, but a state machine with multiple recovery paths. When the API returns a max_output_tokens error, the system automatically retries with an injected "continue" instruction; when the prompt is too long, the system triggers reactive compaction and retries; when the user presses Esc to interrupt, the system generates synthetic tool_result messages to keep the message format valid.
This article provides an in-depth analysis of every path in this recovery state machine.
The query.ts Recovery State Machine
State Definition
The query loop maintains a mutable state object that is passed between iterations:
```ts
type State = {
  messages: Message[]
  toolUseContext: ToolUseContext
  autoCompactTracking: AutoCompactTrackingState | undefined
  maxOutputTokensRecoveryCount: number
  hasAttemptedReactiveCompact: boolean
  maxOutputTokensOverride: number | undefined
  pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined
  stopHookActive: boolean | undefined
  turnCount: number
  transition: Continue | undefined
}
```
Key recovery state fields:
- maxOutputTokensRecoveryCount — number of output truncation recovery attempts made (max 3)
- hasAttemptedReactiveCompact — whether reactive compaction has been attempted
- maxOutputTokensOverride — current override for max output tokens
- transition — reason the previous iteration continued (used to prevent duplicate recovery)
Loop Initialization
```ts
let state: State = {
  messages: params.messages,
  toolUseContext: params.toolUseContext,
  maxOutputTokensOverride: params.maxOutputTokensOverride,
  autoCompactTracking: undefined,
  stopHookActive: undefined,
  maxOutputTokensRecoveryCount: 0,
  hasAttemptedReactiveCompact: false,
  turnCount: 1,
  pendingToolUseSummary: undefined,
  transition: undefined,
}
```
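Before walking through the individual paths, it helps to see the overall shape of the loop. The following is a heavily simplified, hypothetical skeleton (the real loop interleaves streaming and tool execution; `runQueryLoop`, `Outcome`, and the iteration callback are illustrative names, not the actual implementation): each iteration either finishes the turn or hands back a "retry" outcome that patches the state and records why the loop continued.

```typescript
// Hypothetical skeleton of the query.ts main loop (simplified).
type LoopState = { turnCount: number; transition?: { reason: string } }
type Outcome =
  | { kind: 'done'; reason: string }
  | { kind: 'retry'; transition: { reason: string } }

function runQueryLoop(
  initial: LoopState,
  iterate: (s: LoopState) => Outcome, // one API call + tool execution pass
): { reason: string; turns: number } {
  let state = initial
  for (;;) {
    const outcome = iterate(state)
    if (outcome.kind === 'done') {
      return { reason: outcome.reason, turns: state.turnCount }
    }
    // A recovery path fired: bump the turn and carry the transition reason
    // forward so the next iteration can avoid repeating the same strategy.
    state = {
      ...state,
      turnCount: state.turnCount + 1,
      transition: outcome.transition,
    }
  }
}
```

The key design point this skeleton captures: recovery is expressed as state mutation plus `continue`, never as a nested retry call, so every path flows back through the same single loop head.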
Recovery Path Overview
max_output_tokens Recovery
When model output is truncated (stop_reason: max_output_tokens), the system doesn't immediately report an error — instead, it attempts to let the model continue:
```ts
const MAX_OUTPUT_TOKENS_RECOVERY_LIMIT = 3
```
Error Suppression
In the streaming loop, max_output_tokens errors are suppressed (not sent to SDK consumers):
```ts
function isWithheldMaxOutputTokens(
  msg: Message | StreamEvent | undefined,
): msg is AssistantMessage {
  return msg?.type === 'assistant' && msg.apiError === 'max_output_tokens'
}
```
```ts
if (isWithheldMaxOutputTokens(message)) {
  withheld = true
}
```
Escalating Retry
If the default 8K max output tokens was used, the system first escalates to 64K and retries the same request — no continue message is injected, and the recovery counter is not incremented:
```ts
// Escalating retry: if we used the capped 8k default and hit the
// limit, retry the SAME request at 64k — no meta message, no
// multi-turn dance. This fires once per turn.
const capEnabled = getFeatureValue_CACHED_MAY_BE_STALE(
  'tengu_otk_slot_v1',
  false,
)
```
If 64K is also insufficient, multi-turn recovery kicks in — a user message is injected ("Your output was truncated here, please continue from the truncation point"), and the loop returns to the API call:
```ts
// Recovery logic pseudocode
if (maxOutputTokensRecoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
  // Inject continue message
  state = {
    ...state,
    maxOutputTokensRecoveryCount: maxOutputTokensRecoveryCount + 1,
    maxOutputTokensOverride: ESCALATED_MAX_TOKENS,
    transition: { reason: 'max_output_tokens_recovery' },
  }
  continue // Return to loop top
}
// Exceeded limit — surface the error
yield lastMessage
return { reason: 'max_output_tokens' }
```
The recovery limit is capped at 3 attempts, preventing an infinite loop in cases where the model keeps producing excessively long output no matter how often it is asked to continue.
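The two-stage strategy described above can be condensed into a single decision function. This is a sketch, not the actual code: `decideTruncationRecovery`, `usedDefaultCap`, and the return shape are hypothetical names, while the 8K-to-64K escalation and the limit of 3 come from the source.

```typescript
// Sketch of the max_output_tokens recovery policy (hypothetical names).
type TruncationRecovery =
  | { kind: 'escalate'; maxOutputTokens: number } // same request, bigger budget
  | { kind: 'inject_continue'; attempt: number }  // multi-turn "continue" dance
  | { kind: 'surface_error' }

const MAX_OUTPUT_TOKENS_RECOVERY_LIMIT = 3

function decideTruncationRecovery(opts: {
  usedDefaultCap: boolean // the request went out with the capped 8K default
  recoveryCount: number   // multi-turn recoveries attempted so far
}): TruncationRecovery {
  // Cheapest fix first: retry the SAME request at 64K; the recovery
  // counter is not incremented for this path.
  if (opts.usedDefaultCap) {
    return { kind: 'escalate', maxOutputTokens: 64_000 }
  }
  // Otherwise fall back to bounded multi-turn recovery.
  if (opts.recoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
    return { kind: 'inject_continue', attempt: opts.recoveryCount + 1 }
  }
  return { kind: 'surface_error' }
}
```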
Prompt Too Long Recovery
When the context exceeds the model's limit, the system has two levels of recovery:
Level 1: Context Collapse Drain
Context Collapse is a lightweight compression approach — it folds old messages into summaries while preserving granularity. Draining commits all staged folds at once:
```ts
if (feature('CONTEXT_COLLAPSE') && contextCollapse &&
    state.transition?.reason !== 'collapse_drain_retry') {
  const drained = contextCollapse.recoverFromOverflow(
    messagesForQuery,
    querySource,
  )
  if (drained.committed > 0) {
    const next: State = {
      messages: drained.messages,
      toolUseContext,
      autoCompactTracking: tracking,
      maxOutputTokensRecoveryCount,
      hasAttemptedReactiveCompact,
      maxOutputTokensOverride: undefined,
      pendingToolUseSummary: undefined,
      stopHookActive: undefined,
      turnCount,
      transition: { reason: 'collapse_drain_retry', committed: drained.committed },
    }
    state = next
    continue
  }
}
```
Note the state.transition?.reason !== 'collapse_drain_retry' check — if the previous iteration was already a collapse drain and still resulted in a 413, draining wasn't sufficient and more aggressive measures are needed.
Level 2: Reactive Compact
If collapse draining isn't enough (or isn't enabled), full reactive compaction is triggered:
```ts
if ((isWithheld413 || isWithheldMedia) && reactiveCompact) {
  const compacted = await reactiveCompact.tryReactiveCompact({
    hasAttempted: hasAttemptedReactiveCompact,
    querySource,
    aborted: toolUseContext.abortController.signal.aborted,
    messages: messagesForQuery,
    cacheSafeParams: {
      systemPrompt, userContext, systemContext,
      toolUseContext,
      forkContextMessages: messagesForQuery,
    },
  })

  if (compacted) {
    const postCompactMessages = buildPostCompactMessages(compacted)
    for (const msg of postCompactMessages) {
      yield msg
    }
    const next: State = {
      messages: postCompactMessages,
      toolUseContext,
      autoCompactTracking: undefined,
      maxOutputTokensRecoveryCount,
      hasAttemptedReactiveCompact: true, // Mark as attempted
      maxOutputTokensOverride: undefined,
      pendingToolUseSummary: undefined,
      stopHookActive: undefined,
      turnCount,
      transition: { reason: 'reactive_compact_retry' },
    }
    state = next
    continue
  }

  // Cannot recover — surface the error
  yield lastMessage
  void executeStopFailureHooks(lastMessage, toolUseContext)
  return { reason: isWithheldMedia ? 'image_error' : 'prompt_too_long' }
}
```
Key safety measures:
- hasAttemptedReactiveCompact: true ensures only one attempt, preventing a "compact -> retry -> 413 -> compact" death loop
- Stop hooks are not executed — the model didn't produce a valid response, so hooks cannot evaluate
- executeStopFailureHooks is a different function — it only performs minimal failure notification
Pre-emptive Blocking
Before entering the API call, if auto-compact is disabled and tokens have reached the threshold, the request is blocked outright:
```ts
if (!compactionResult && querySource !== 'compact' && querySource !== 'session_memory'
    && !(reactiveCompact?.isReactiveCompactEnabled() && isAutoCompactEnabled())
    && !collapseOwnsIt) {
  const { isAtBlockingLimit } = calculateTokenWarningState(
    tokenCountWithEstimation(messagesForQuery) - snipTokensFreed,
    toolUseContext.options.mainLoopModel,
  )
  if (isAtBlockingLimit) {
    yield createAssistantAPIErrorMessage({
      content: PROMPT_TOO_LONG_ERROR_MESSAGE,
    })
    return { reason: 'blocking_limit' }
  }
}
```
Note the skip conditions — when reactive compact or context collapse is enabled, pre-emptive blocking is not performed, because they can recover after the API error occurs. Pre-emptive blocking would prevent the error from happening, thereby also preventing the recovery opportunity.
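Reduced to its essence, the skip logic is a small predicate. This sketch uses hypothetical names (`shouldBlockPreemptively`, `collapseOwnsOverflow`); the real check above folds the same conditions directly into the `if`:

```typescript
// Sketch: pre-emptive blocking applies only when no post-error recovery
// path could handle the overflow (hypothetical names).
function shouldBlockPreemptively(opts: {
  isAtBlockingLimit: boolean
  reactiveCompactEnabled: boolean
  collapseOwnsOverflow: boolean
}): boolean {
  // Recovery paths need the API error to actually fire; blocking the
  // request up front would starve them of their trigger.
  if (opts.reactiveCompactEnabled || opts.collapseOwnsOverflow) {
    return false
  }
  return opts.isAtBlockingLimit
}
```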
Model Fallback Recovery
When FallbackTriggeredError is thrown during streaming:
```ts
} catch (innerError) {
  if (innerError instanceof FallbackTriggeredError && fallbackModel) {
    currentModel = fallbackModel
    attemptWithFallback = true

    // Generate placeholder tool_results for already-emitted messages
    yield* yieldMissingToolResultBlocks(
      assistantMessages,
      'Model fallback triggered',
    )
    assistantMessages.length = 0
    toolResults.length = 0

    // Discard pending results from the streaming tool executor
    if (streamingToolExecutor) {
      streamingToolExecutor.discard()
      streamingToolExecutor = new StreamingToolExecutor(...)
    }

    // Update model in tool context
    toolUseContext.options.mainLoopModel = fallbackModel

    // Thinking signatures are model-bound — clear them to avoid 400 errors
    if (process.env.USER_TYPE === 'ant') {
      messagesForQuery = stripSignatureBlocks(messagesForQuery)
    }

    yield createSystemMessage(
      `Switched to ${renderModelName(innerError.fallbackModel)} due to high demand`,
      'warning',
    )

    continue // Retry inner loop
  }
  throw innerError
}
```
Of particular note is stripSignatureBlocks — protected thinking blocks carry model-specific cryptographic signatures that would cause API 400 errors after falling back to a different model.
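A plausible reconstruction of what stripSignatureBlocks does is shown below. The message and block shapes here are simplified assumptions (the real types are richer); the core idea is simply mapping over thinking blocks and dropping their signature field:

```typescript
// Hypothetical reconstruction of stripSignatureBlocks. Thinking-block
// signatures are bound to the model that produced them, so they must be
// removed before retrying the conversation against the fallback model.
type ContentBlock =
  | { type: 'thinking'; thinking: string; signature?: string }
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }

type ChatMessage = { role: 'user' | 'assistant'; content: ContentBlock[] }

function stripSignatureBlocks(messages: ChatMessage[]): ChatMessage[] {
  return messages.map(msg => ({
    ...msg,
    content: msg.content.map(block =>
      block.type === 'thinking' && block.signature !== undefined
        ? { type: 'thinking' as const, thinking: block.thinking } // drop signature
        : block,
    ),
  }))
}
```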
User Interruption Handling
When the user presses Esc or Ctrl+C, the system needs to stop gracefully:
```ts
const handleCancel = useCallback(() => {
  // Priority 1: If there's an active task, cancel it
  if (abortSignal !== undefined && !abortSignal.aborted) {
    logEvent('tengu_cancel', cancelProps)
    setToolUseConfirmQueue(() => [])
    onCancel()
    return
  }

  // Priority 2: If Claude is idle, pop from queue
  if (hasCommandsInQueue()) {
    if (popCommandFromQueue) {
      popCommandFromQueue()
      return
    }
  }

  // Fallback: Nothing to cancel
  logEvent('tengu_cancel', cancelProps)
  setToolUseConfirmQueue(() => [])
  onCancel()
}, [...])
```
Interruption priority:
- Active task — set the abort signal, cancel API calls and tool execution
- Command queue — if Claude is idle but has queued commands, pop the last one
- Fallback — clear the permission confirmation queue
Post-Interruption Message Cleanup
In query.ts, after an interruption, synthetic tool_result messages must be generated for all incomplete tool_use blocks:
```ts
if (toolUseContext.abortController.signal.aborted) {
  if (streamingToolExecutor) {
    // Consume remaining results — executor generates synthetic
    // tool_results for interrupted tools
    for await (const update of streamingToolExecutor.getRemainingResults()) {
      if (update.message) {
        yield update.message
      }
    }
  } else {
    yield* yieldMissingToolResultBlocks(
      assistantMessages,
      'Interrupted by user',
    )
  }

  // Skip interruption message for submit-interrupt
  if (toolUseContext.abortController.signal.reason !== 'interrupt') {
    yield createUserInterruptionMessage({ toolUse: false })
  }
  return { reason: 'aborted_streaming' }
}
```
yieldMissingToolResultBlocks ensures message format validity — the API requires every tool_use to be followed by a corresponding tool_result:
```ts
function* yieldMissingToolResultBlocks(
  assistantMessages: AssistantMessage[],
  errorMessage: string,
) {
  for (const assistantMessage of assistantMessages) {
    const toolUseBlocks = assistantMessage.message.content.filter(
      content => content.type === 'tool_use',
    ) as ToolUseBlock[]

    for (const toolUse of toolUseBlocks) {
      yield createUserMessage({
        content: [{
          type: 'tool_result',
          content: errorMessage,
          is_error: true,
          tool_use_id: toolUse.id,
        }],
        toolUseResult: errorMessage,
        sourceToolAssistantUUID: assistantMessage.uuid,
      })
    }
  }
}
```
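The invariant this function restores can also be checked directly. The following companion sketch (hypothetical helper, simplified block types) scans a flattened block sequence and reports any tool_use ids with no matching tool_result:

```typescript
// Sketch: verify the tool_use / tool_result pairing invariant that the
// API enforces. Simplified block shapes for illustration.
type Block =
  | { type: 'tool_use'; id: string }
  | { type: 'tool_result'; tool_use_id: string }
  | { type: 'text'; text: string }

function findUnansweredToolUses(blocks: Block[]): string[] {
  const used: string[] = []
  const answered = new Set<string>()
  for (const b of blocks) {
    if (b.type === 'tool_use') used.push(b.id)
    if (b.type === 'tool_result') answered.add(b.tool_use_id)
  }
  // Any id left here would trigger an API validation error on the next call.
  return used.filter(id => !answered.has(id))
}
```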
Ctrl+C vs. Esc Differences
```ts
// Escape: respects mode switching, doesn't trigger in special input modes
const isEscapeActive =
  isContextActive &&
  (canCancelRunningTask || hasQueuedCommands) &&
  !isInSpecialModeWithEmptyInput &&
  !isViewingTeammate

// Ctrl+C: more forceful, can interrupt even when viewing a teammate
const isCtrlCActive =
  isContextActive &&
  (canCancelRunningTask || hasQueuedCommands || isViewingTeammate)
```
Ctrl+C additionally handles the teammate viewing scenario — stopping all background agents and returning to the main thread.
Kill All Agents (Double Confirmation)
```ts
const handleKillAgents = useCallback(() => {
  const now = Date.now()
  const elapsed = now - lastKillAgentsPressRef.current

  if (elapsed <= KILL_AGENTS_CONFIRM_WINDOW_MS) {
    // Second press within 3 seconds — confirm kill all background agents
    lastKillAgentsPressRef.current = 0
    killAllAgentsAndNotify()
    return
  }

  // First press — show confirmation prompt
  lastKillAgentsPressRef.current = now
  addNotification({
    key: 'kill-agents-confirm',
    text: `Press ${shortcut} again to stop background agents`,
    timeoutMs: KILL_AGENTS_CONFIRM_WINDOW_MS,
  })
}, [...])
```
The 3-second confirmation window prevents accidental termination — background agents may be executing important tasks.
Tool Execution Failure Feedback
When tool execution fails, the error information is fed back to the model as tool_result content with is_error: true. This allows the model to understand what happened and decide the next step — retry, try a different approach, or report to the user:
```ts
// Simplified representation — tool execution error handling
yield createUserMessage({
  content: [{
    type: 'tool_result',
    content: `Error: ${error.message}`,
    is_error: true,
    tool_use_id: toolUse.id,
  }],
})
```
This is Claude Code's core self-healing pattern — errors are not system termination signals, but input signals for the model. After seeing a bash command fail, the model typically modifies the command and retries. After seeing a file doesn't exist, it first runs ls to check.
/doctor Environment Self-Diagnostics
The /doctor command provides system-level diagnostics:
```ts
export type DiagnosticInfo = {
  installationType: InstallationType
  version: string
  installationPath: string
  invokedBinary: string
  configInstallMethod: InstallMethod | 'not set'
  autoUpdates: string
  hasUpdatePermissions: boolean | null
  multipleInstallations: Array<{ type: string; path: string }>
  warnings: Array<{ issue: string; fix: string }>
  recommendation?: string
  packageManager?: string
  ripgrepStatus: {
    working: boolean
    mode: 'system' | 'builtin' | 'embedded'
    systemPath: string | null
  }
}
```
The diagnostics cover:
- Installation type detection — npm-global/npm-local/native/package-manager/development
- Multiple installation detection — discovers multiple Claude Code installations on the system
- Permission checks — whether auto-updates have write permissions
- ripgrep status — whether the search engine is working properly
- Shell configuration — whether aliases and environment variables are correct
The installation type detection logic is quite thorough:
```ts
export async function getCurrentInstallationType(): Promise<InstallationType> {
  if (process.env.NODE_ENV === 'development') return 'development'

  if (isInBundledMode()) {
    // Check if installed by a package manager
    if (detectHomebrew() || detectWinget() || detectMise() ||
        detectAsdf() || await detectPacman() ||
        await detectDeb() || await detectRpm() || await detectApk()) {
      return 'package-manager'
    }
    return 'native'
  }

  if (isRunningFromLocalInstallation()) return 'npm-local'

  // Check typical npm global paths
  const npmGlobalPaths = [
    '/usr/local/lib/node_modules',
    '/usr/lib/node_modules',
    '/opt/homebrew/lib/node_modules',
    '/.nvm/versions/node/',
  ]
  if (npmGlobalPaths.some(path => invokedPath.includes(path))) {
    return 'npm-global'
  }

  return 'unknown'
}
```
The detection covers all major package managers — Homebrew, winget, mise, asdf, pacman, deb, rpm, apk — ensuring correct identification of the installation method on any Linux/macOS/Windows environment.
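The npm-global branch boils down to a substring match against known install roots. A minimal, self-contained version of just that branch (function name hypothetical; the path list is taken from the source):

```typescript
// Sketch of the path-based npm-global check from the detection logic above.
const NPM_GLOBAL_PATHS = [
  '/usr/local/lib/node_modules',
  '/usr/lib/node_modules',
  '/opt/homebrew/lib/node_modules',
  '/.nvm/versions/node/',
]

function classifyInvokedPath(invokedPath: string): 'npm-global' | 'unknown' {
  // A match against any known npm global root means a global npm install.
  return NPM_GLOBAL_PATHS.some(p => invokedPath.includes(p))
    ? 'npm-global'
    : 'unknown'
}
```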
Interactions Between Recovery Paths
The various recovery paths have complex interactions, and understanding these relationships is key to understanding the system's resilience:
Key interaction rules:
- Pre-emptive blocking and recovery are mutually exclusive — when reactive compact or context collapse is enabled, pre-emptive blocking is skipped (otherwise the recovery path would never be triggered)
- Collapse to Reactive cascade — reactive compact is only attempted after collapse draining fails
- Each type attempted only once — hasAttemptedReactiveCompact prevents a reactive compact death loop
- Transitions prevent repetition — state.transition?.reason checks prevent the same recovery strategy from executing consecutively
- Error suppression and recovery must be consistent — errors suppressed in the streaming loop must have corresponding handling in the recovery check; otherwise errors get silently swallowed
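The overflow-related rules above can be condensed into one decision function. This is a simplified sketch with hypothetical names; the real code spreads these checks across the drain and compact sites shown earlier:

```typescript
// Sketch of the overflow-recovery cascade: lightweight drain first,
// full compaction second, surface the error last (hypothetical names).
type OverflowAction = 'collapse_drain' | 'reactive_compact' | 'surface_error'

function decideOverflowRecovery(opts: {
  collapseEnabled: boolean
  previousWasDrain: boolean // last iteration drained and still overflowed
  reactiveCompactEnabled: boolean
  hasAttemptedReactiveCompact: boolean
}): OverflowAction {
  // Level 1: lightweight drain, but never twice in a row.
  if (opts.collapseEnabled && !opts.previousWasDrain) {
    return 'collapse_drain'
  }
  // Level 2: full compaction, at most once.
  if (opts.reactiveCompactEnabled && !opts.hasAttemptedReactiveCompact) {
    return 'reactive_compact'
  }
  return 'surface_error'
}
```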
Consistency Requirement for Streaming Error Suppression
```ts
// Hoist media-recovery gate once per turn. Withholding (inside the
// stream loop) and recovery (after) must agree; CACHED_MAY_BE_STALE can
// flip during the 5-30s stream, and withhold-without-recover would eat
// the message.
const mediaRecoveryEnabled =
  reactiveCompact?.isReactiveCompactEnabled() ?? false
```
Feature flag values can change during the 5-30 seconds of streaming (GrowthBook cache refresh). If an error was suppressed at the start of the stream, but the recovery check sees the flag as disabled at the end of the stream, the error is lost. Therefore, the flag value is extracted once at the start of the turn and used consistently throughout.
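The hazard can be demonstrated in isolation. In this contrived sketch (hypothetical names, not the real flag reader), the unsafe variant reads the flag at stream start and again after the stream, while the hoisted variant reads once; a mid-turn flip loses the error only in the unsafe version:

```typescript
// Contrived demonstration of the flag-consistency hazard. `readFlag`
// simulates a cached feature-flag lookup that may change between calls.

// Unsafe: two reads; a mid-stream flip can withhold without recovering.
function processTurnUnsafe(readFlag: () => boolean, errorInStream: boolean) {
  const withheld = errorInStream && readFlag() // read at stream start
  const recovered = withheld && readFlag()     // read again after stream
  return { withheld, lost: withheld && !recovered }
}

// Hoisted: one read per turn; withhold and recover always agree.
function processTurnHoisted(readFlag: () => boolean, errorInStream: boolean) {
  const enabled = readFlag() // hoisted once, used for both decisions
  const withheld = errorInStream && enabled
  const recovered = withheld && enabled
  return { withheld, lost: withheld && !recovered }
}
```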
Summary
Claude Code's error recovery system embodies several core principles:
- Errors are input, not termination signals — tool execution failures become tool_result(is_error: true) feedback to the model
- Graduated recovery — from lightweight (collapse drain) to heavyweight (reactive compact), escalating level by level
- Bounded retries — each recovery path has a clear attempt limit, preventing death loops
- State integrity — synthetic tool_results are generated after interruption, keeping message format valid
- Flag consistency — suppression and recovery must see the same feature flag values
- Environment self-diagnostics — /doctor provides system-level diagnostics to help users troubleshoot environment issues
The complexity of this system stems directly from the design goal of "never terminating the session." In a world where an AI Agent may run continuously for hours, every failure mode needs a recovery path — not because engineers enjoy complexity, but because reality is complex.