The Problem
When an AI Agent runs in a real-world environment, failure is the norm rather than the exception. Network timeouts, API overloads, insufficient file permissions, truncated model output, users suddenly pressing Esc — these are not edge cases, but everyday events that happen millions of times per day.
Claude Code's core design philosophy is: errors should not terminate the session; they should trigger recovery. The main loop in query.ts is not a linear request-response flow, but a state machine with multiple recovery paths. When the API returns a max_output_tokens error, the system automatically retries with an injected "continue" instruction; when the prompt is too long, the system triggers reactive compaction and retries; when the user presses Esc to interrupt, the system generates synthetic tool_result messages to keep the message format valid.
This article provides an in-depth analysis of every path in this recovery state machine.
The query.ts Recovery State Machine
State Definition
The query loop maintains a mutable state object that is passed between iterations:
```ts
type State = {
  messages: Message[]
  toolUseContext: ToolUseContext
  autoCompactTracking: AutoCompactTrackingState | undefined
  maxOutputTokensRecoveryCount: number
  hasAttemptedReactiveCompact: boolean
  maxOutputTokensOverride: number | undefined
  pendingToolUseSummary: Promise<ToolUseSummaryMessage | null> | undefined
  stopHookActive: boolean | undefined
  turnCount: number
  transition: Continue | undefined
}
```
Key recovery state fields:
- maxOutputTokensRecoveryCount — number of output truncation recovery attempts made (max 3)
- hasAttemptedReactiveCompact — whether reactive compaction has been attempted
- maxOutputTokensOverride — current override for max output tokens
- transition — reason the previous iteration continued (used to prevent duplicate recovery)
Loop Initialization
```ts
let state: State = {
  messages: params.messages,
  toolUseContext: params.toolUseContext,
  maxOutputTokensOverride: params.maxOutputTokensOverride,
  autoCompactTracking: undefined,
  stopHookActive: undefined,
  maxOutputTokensRecoveryCount: 0,
  hasAttemptedReactiveCompact: false,
  turnCount: 1,
  pendingToolUseSummary: undefined,
  transition: undefined,
}
```
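Before walking through the individual paths, it helps to see the overall shape of the loop. The following is a heavily simplified, hypothetical skeleton (the real loop interleaves streaming and tool execution; `runQueryLoop`, `Outcome`, and the iteration callback are illustrative names, not the actual implementation): each iteration either finishes the turn or hands back a "retry" outcome that patches the state and records why the loop continued.

```typescript
// Hypothetical skeleton of the query.ts main loop (simplified).
type LoopState = { turnCount: number; transition?: { reason: string } }
type Outcome =
  | { kind: 'done'; reason: string }
  | { kind: 'retry'; transition: { reason: string } }

function runQueryLoop(
  initial: LoopState,
  iterate: (s: LoopState) => Outcome, // one API call + tool execution pass
): { reason: string; turns: number } {
  let state = initial
  for (;;) {
    const outcome = iterate(state)
    if (outcome.kind === 'done') {
      return { reason: outcome.reason, turns: state.turnCount }
    }
    // A recovery path fired: bump the turn and carry the transition reason
    // forward so the next iteration can avoid repeating the same strategy.
    state = {
      ...state,
      turnCount: state.turnCount + 1,
      transition: outcome.transition,
    }
  }
}
```

The key design point this skeleton captures: recovery is expressed as state mutation plus `continue`, never as a nested retry call, so every path flows back through the same single loop head.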
Recovery Path Overview
max_output_tokens Recovery
When model output is truncated (stop_reason: max_output_tokens), the system doesn't immediately report an error — instead, it attempts to let the model continue:
```ts
const MAX_OUTPUT_TOKENS_RECOVERY_LIMIT = 3
```
Error Suppression
In the streaming loop, max_output_tokens errors are suppressed (not sent to SDK consumers):
```ts
function isWithheldMaxOutputTokens(
  msg: Message | StreamEvent | undefined,
): msg is AssistantMessage {
  return msg?.type === 'assistant' && msg.apiError === 'max_output_tokens'
}
```
```ts
if (isWithheldMaxOutputTokens(message)) {
  withheld = true
}
```
Escalating Retry
If the default 8K max output tokens was used, the system first escalates to 64K and retries the same request — no continue message is injected, and the recovery counter is not incremented:
```ts
// Escalating retry: if we used the capped 8k default and hit the
// limit, retry the SAME request at 64k — no meta message, no
// multi-turn dance. This fires once per turn.
const capEnabled = getFeatureValue_CACHED_MAY_BE_STALE(
  'tengu_otk_slot_v1',
  false,
)
```
If 64K is also insufficient, multi-turn recovery kicks in — a user message is injected ("Your output was truncated here, please continue from the truncation point"), and the loop returns to the API call:
```ts
// Recovery logic pseudocode
if (maxOutputTokensRecoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
  // Inject continue message
  state = {
    ...state,
    maxOutputTokensRecoveryCount: maxOutputTokensRecoveryCount + 1,
    maxOutputTokensOverride: ESCALATED_MAX_TOKENS,
    transition: { reason: 'max_output_tokens_recovery' },
  }
  continue // Return to loop top
}
// Exceeded limit — surface the error
yield lastMessage
return { reason: 'max_output_tokens' }
```
The recovery limit is capped at 3 attempts, preventing an infinite loop in cases where the model keeps producing excessively long output no matter how often it is asked to continue.
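The two-stage strategy described above can be condensed into a single decision function. This is a sketch, not the actual code: `decideTruncationRecovery`, `usedDefaultCap`, and the return shape are hypothetical names, while the 8K-to-64K escalation and the limit of 3 come from the source.

```typescript
// Sketch of the max_output_tokens recovery policy (hypothetical names).
type TruncationRecovery =
  | { kind: 'escalate'; maxOutputTokens: number } // same request, bigger budget
  | { kind: 'inject_continue'; attempt: number }  // multi-turn "continue" dance
  | { kind: 'surface_error' }

const MAX_OUTPUT_TOKENS_RECOVERY_LIMIT = 3

function decideTruncationRecovery(opts: {
  usedDefaultCap: boolean // the request went out with the capped 8K default
  recoveryCount: number   // multi-turn recoveries attempted so far
}): TruncationRecovery {
  // Cheapest fix first: retry the SAME request at 64K; the recovery
  // counter is not incremented for this path.
  if (opts.usedDefaultCap) {
    return { kind: 'escalate', maxOutputTokens: 64_000 }
  }
  // Otherwise fall back to bounded multi-turn recovery.
  if (opts.recoveryCount < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
    return { kind: 'inject_continue', attempt: opts.recoveryCount + 1 }
  }
  return { kind: 'surface_error' }
}
```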
Prompt Too Long Recovery
When the context exceeds the model's limit, the system has two levels of recovery:
Level 1: Context Collapse Drain
Context Collapse is a lightweight compression approach — it folds old messages into summaries while preserving granularity. Draining commits all staged folds at once:
```ts
if (feature('CONTEXT_COLLAPSE') && contextCollapse &&
    state.transition?.reason !== 'collapse_drain_retry') {
  const drained = contextCollapse.recoverFromOverflow(
    messagesForQuery,
    querySource,
  )
  if (drained.committed > 0) {
    const next: State = {
      messages: drained.messages,
      toolUseContext,
      autoCompactTracking: tracking,
      maxOutputTokensRecoveryCount,
      hasAttemptedReactiveCompact,
      maxOutputTokensOverride: undefined,
      pendingToolUseSummary: undefined,
      stopHookActive: undefined,
      turnCount,
      transition: { reason: 'collapse_drain_retry', committed: drained.committed },
    }
    state = next
    continue
  }
}
```
Note the state.transition?.reason !== 'collapse_drain_retry' check — if the previous iteration was already a collapse drain and still resulted in a 413, draining wasn't sufficient and more aggressive measures are needed.
Level 2: Reactive Compact
If collapse draining isn't enough (or isn't enabled), full reactive compaction is triggered:
```ts
if ((isWithheld413 || isWithheldMedia) && reactiveCompact) {
  const compacted = await reactiveCompact.tryReactiveCompact({
    hasAttempted: hasAttemptedReactiveCompact,
    querySource,
    aborted: toolUseContext.abortController.signal.aborted,
    messages: messagesForQuery,
    cacheSafeParams: {
      systemPrompt, userContext, systemContext,
      toolUseContext,
      forkContextMessages: messagesForQuery,
    },
  })

  if (compacted) {
    const postCompactMessages = buildPostCompactMessages(compacted)
    for (const msg of postCompactMessages) {
      yield msg
    }
    const next: State = {
      messages: postCompactMessages,
      toolUseContext,
      autoCompactTracking: undefined,
      maxOutputTokensRecoveryCount,
      hasAttemptedReactiveCompact: true, // Mark as attempted
      maxOutputTokensOverride: undefined,
      pendingToolUseSummary: undefined,
      stopHookActive: undefined,
      turnCount,
      transition: { reason: 'reactive_compact_retry' },
    }
    state = next
    continue
  }

  // Cannot recover — surface the error
  yield lastMessage
  void executeStopFailureHooks(lastMessage, toolUseContext)
  return { reason: isWithheldMedia ? 'image_error' : 'prompt_too_long' }
}
```
Key safety measures:
- hasAttemptedReactiveCompact: true ensures only one attempt, preventing a "compact -> retry -> 413 -> compact" death loop
- Stop hooks are not executed — the model didn't produce a valid response, so hooks cannot evaluate
- executeStopFailureHooks is a different function — it only performs minimal failure notification
Pre-emptive Blocking
Before entering the API call, if auto-compact is disabled and tokens have reached the threshold, the request is blocked outright:
```ts
if (!compactionResult && querySource !== 'compact' && querySource !== 'session_memory'
    && !(reactiveCompact?.isReactiveCompactEnabled() && isAutoCompactEnabled())
    && !collapseOwnsIt) {
  const { isAtBlockingLimit } = calculateTokenWarningState(
    tokenCountWithEstimation(messagesForQuery) - snipTokensFreed,
    toolUseContext.options.mainLoopModel,
  )
  if (isAtBlockingLimit) {
    yield createAssistantAPIErrorMessage({
      content: PROMPT_TOO_LONG_ERROR_MESSAGE,
    })
    return { reason: 'blocking_limit' }
  }
}
```
Note the skip conditions — when reactive compact or context collapse is enabled, pre-emptive blocking is not performed, because they can recover after the API error occurs. Pre-emptive blocking would prevent the error from happening, thereby also preventing the recovery opportunity.
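Reduced to its essence, the skip logic is a small predicate. This sketch uses hypothetical names (`shouldBlockPreemptively`, `collapseOwnsOverflow`); the real check above folds the same conditions directly into the `if`:

```typescript
// Sketch: pre-emptive blocking applies only when no post-error recovery
// path could handle the overflow (hypothetical names).
function shouldBlockPreemptively(opts: {
  isAtBlockingLimit: boolean
  reactiveCompactEnabled: boolean
  collapseOwnsOverflow: boolean
}): boolean {
  // Recovery paths need the API error to actually fire; blocking the
  // request up front would starve them of their trigger.
  if (opts.reactiveCompactEnabled || opts.collapseOwnsOverflow) {
    return false
  }
  return opts.isAtBlockingLimit
}
```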
Model Fallback Recovery
When FallbackTriggeredError is thrown during streaming:
```ts
} catch (innerError) {
  if (innerError instanceof FallbackTriggeredError && fallbackModel) {
    currentModel = fallbackModel
    attemptWithFallback = true

    // Generate placeholder tool_results for already-emitted messages
    yield* yieldMissingToolResultBlocks(
      assistantMessages,
      'Model fallback triggered',
    )
    assistantMessages.length = 0
    toolResults.length = 0

    // Discard pending results from the streaming tool executor
    if (streamingToolExecutor) {
      streamingToolExecutor.discard()
      streamingToolExecutor = new StreamingToolExecutor(...)
    }

    // Update model in tool context
    toolUseContext.options.mainLoopModel = fallbackModel

    // Thinking signatures are model-bound — clear them to avoid 400 errors
    if (process.env.USER_TYPE === 'ant') {
      messagesForQuery = stripSignatureBlocks(messagesForQuery)
    }

    yield createSystemMessage(
      `Switched to ${renderModelName(innerError.fallbackModel)} due to high demand`,
      'warning',
    )

    continue // Retry inner loop
  }
  throw innerError
}
```
Of particular note is stripSignatureBlocks — protected thinking blocks carry model-specific cryptographic signatures that would cause API 400 errors after falling back to a different model.
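A plausible reconstruction of what stripSignatureBlocks does is shown below. The message and block shapes here are simplified assumptions (the real types are richer); the core idea is simply mapping over thinking blocks and dropping their signature field:

```typescript
// Hypothetical reconstruction of stripSignatureBlocks. Thinking-block
// signatures are bound to the model that produced them, so they must be
// removed before retrying the conversation against the fallback model.
type ContentBlock =
  | { type: 'thinking'; thinking: string; signature?: string }
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }

type ChatMessage = { role: 'user' | 'assistant'; content: ContentBlock[] }

function stripSignatureBlocks(messages: ChatMessage[]): ChatMessage[] {
  return messages.map(msg => ({
    ...msg,
    content: msg.content.map(block =>
      block.type === 'thinking' && block.signature !== undefined
        ? { type: 'thinking' as const, thinking: block.thinking } // drop signature
        : block,
    ),
  }))
}
```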
User Interruption Handling
When the user presses Esc or Ctrl+C, the system needs to stop gracefully:
```ts
const handleCancel = useCallback(() => {
  // Priority 1: If there's an active task, cancel it
  if (abortSignal !== undefined && !abortSignal.aborted) {
    logEvent('tengu_cancel', cancelProps)
    setToolUseConfirmQueue(() => [])
    onCancel()
    return
  }

  // Priority 2: If Claude is idle, pop from queue
  if (hasCommandsInQueue()) {
    if (popCommandFromQueue) {
      popCommandFromQueue()
      return
    }
  }

  // Fallback: Nothing to cancel
  logEvent('tengu_cancel', cancelProps)
  setToolUseConfirmQueue(() => [])
  onCancel()
}, [...])
```
Interruption priority:
- Active task — set the abort signal, cancel API calls and tool execution
- Command queue — if Claude is idle but has queued commands, pop the last one
- Fallback — clear the permission confirmation queue
Post-Interruption Message Cleanup
In query.ts, after an interruption, synthetic tool_result messages must be generated for all incomplete tool_use blocks:
```ts
if (toolUseContext.abortController.signal.aborted) {
  if (streamingToolExecutor) {
    // Consume remaining results — executor generates synthetic
    // tool_results for interrupted tools
    for await (const update of streamingToolExecutor.getRemainingResults()) {
      if (update.message) {
        yield update.message
      }
    }
  } else {
    yield* yieldMissingToolResultBlocks(
      assistantMessages,
      'Interrupted by user',
    )
  }

  // Skip interruption message for submit-interrupt
  if (toolUseContext.abortController.signal.reason !== 'interrupt') {
    yield createUserInterruptionMessage({ toolUse: false })
  }
  return { reason: 'aborted_streaming' }
}
```
yieldMissingToolResultBlocks ensures message format validity — the API requires every tool_use to be followed by a corresponding tool_result:
```ts
function* yieldMissingToolResultBlocks(
  assistantMessages: AssistantMessage[],
  errorMessage: string,
) {
  for (const assistantMessage of assistantMessages) {
    const toolUseBlocks = assistantMessage.message.content.filter(
      content => content.type === 'tool_use',
    ) as ToolUseBlock[]

    for (const toolUse of toolUseBlocks) {
      yield createUserMessage({
        content: [{
          type: 'tool_result',
          content: errorMessage,
          is_error: true,
          tool_use_id: toolUse.id,
        }],
        toolUseResult: errorMessage,
        sourceToolAssistantUUID: assistantMessage.uuid,
      })
    }
  }
}
```
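The invariant this function restores can also be checked directly. The following companion sketch (hypothetical helper, simplified block types) scans a flattened block sequence and reports any tool_use ids with no matching tool_result:

```typescript
// Sketch: verify the tool_use / tool_result pairing invariant that the
// API enforces. Simplified block shapes for illustration.
type Block =
  | { type: 'tool_use'; id: string }
  | { type: 'tool_result'; tool_use_id: string }
  | { type: 'text'; text: string }

function findUnansweredToolUses(blocks: Block[]): string[] {
  const used: string[] = []
  const answered = new Set<string>()
  for (const b of blocks) {
    if (b.type === 'tool_use') used.push(b.id)
    if (b.type === 'tool_result') answered.add(b.tool_use_id)
  }
  // Any id left here would trigger an API validation error on the next call.
  return used.filter(id => !answered.has(id))
}
```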
Ctrl+C vs. Esc Differences
```ts
// Escape: respects mode switching, doesn't trigger in special input modes
const isEscapeActive =
  isContextActive &&
  (canCancelRunningTask || hasQueuedCommands) &&
  !isInSpecialModeWithEmptyInput &&
  !isViewingTeammate

// Ctrl+C: more forceful, can interrupt even when viewing a teammate
const isCtrlCActive =
  isContextActive &&
  (canCancelRunningTask || hasQueuedCommands || isViewingTeammate)
```
Ctrl+C additionally handles the teammate viewing scenario — stopping all background agents and returning to the main thread.
Kill All Agents (Double Confirmation)
```ts
const handleKillAgents = useCallback(() => {
  const now = Date.now()
  const elapsed = now - lastKillAgentsPressRef.current

  if (elapsed <= KILL_AGENTS_CONFIRM_WINDOW_MS) {
    // Second press within 3 seconds — confirm kill all background agents
    lastKillAgentsPressRef.current = 0
    killAllAgentsAndNotify()
    return
  }

  // First press — show confirmation prompt
  lastKillAgentsPressRef.current = now
  addNotification({
    key: 'kill-agents-confirm',
    text: `Press ${shortcut} again to stop background agents`,
    timeoutMs: KILL_AGENTS_CONFIRM_WINDOW_MS,
  })
}, [...])
```
The 3-second confirmation window prevents accidental termination — background agents may be executing important tasks.
Tool Execution Failure Feedback
When tool execution fails, the error information is fed back to the model as tool_result content with is_error: true. This allows the model to understand what happened and decide the next step — retry, try a different approach, or report to the user:
```ts
// Simplified representation — tool execution error handling
yield createUserMessage({
  content: [{
    type: 'tool_result',
    content: `Error: ${error.message}`,
    is_error: true,
    tool_use_id: toolUse.id,
  }],
})
```
This is Claude Code's core self-healing pattern — errors are not system termination signals, but input signals for the model. After seeing a bash command fail, the model typically modifies the command and retries. After seeing a file doesn't exist, it first runs ls to check.
/doctor Environment Self-Diagnostics
The /doctor command provides system-level diagnostics:
```ts
export type DiagnosticInfo = {
  installationType: InstallationType
  version: string
  installationPath: string
  invokedBinary: string
  configInstallMethod: InstallMethod | 'not set'
  autoUpdates: string
  hasUpdatePermissions: boolean | null
  multipleInstallations: Array<{ type: string; path: string }>
  warnings: Array<{ issue: string; fix: string }>
  recommendation?: string
  packageManager?: string
  ripgrepStatus: {
    working: boolean
    mode: 'system' | 'builtin' | 'embedded'
    systemPath: string | null
  }
}
```
The diagnostics cover:
- Installation type detection — npm-global/npm-local/native/package-manager/development
- Multiple installation detection — discovers multiple Claude Code installations on the system
- Permission checks — whether auto-updates have write permissions
- ripgrep status — whether the search engine is working properly
- Shell configuration — whether aliases and environment variables are correct
The installation type detection logic is quite thorough:
```ts
export async function getCurrentInstallationType(): Promise<InstallationType> {
  if (process.env.NODE_ENV === 'development') return 'development'

  if (isInBundledMode()) {
    // Check if installed by a package manager
    if (detectHomebrew() || detectWinget() || detectMise() ||
        detectAsdf() || await detectPacman() ||
        await detectDeb() || await detectRpm() || await detectApk()) {
      return 'package-manager'
    }
    return 'native'
  }

  if (isRunningFromLocalInstallation()) return 'npm-local'

  // Check typical npm global paths
  const npmGlobalPaths = [
    '/usr/local/lib/node_modules',
    '/usr/lib/node_modules',
    '/opt/homebrew/lib/node_modules',
    '/.nvm/versions/node/',
  ]
  if (npmGlobalPaths.some(path => invokedPath.includes(path))) {
    return 'npm-global'
  }

  return 'unknown'
}
```
The detection covers all major package managers — Homebrew, winget, mise, asdf, pacman, deb, rpm, apk — ensuring correct identification of the installation method on any Linux/macOS/Windows environment.
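The npm-global branch boils down to a substring match against known install roots. A minimal, self-contained version of just that branch (function name hypothetical; the path list is taken from the source):

```typescript
// Sketch of the path-based npm-global check from the detection logic above.
const NPM_GLOBAL_PATHS = [
  '/usr/local/lib/node_modules',
  '/usr/lib/node_modules',
  '/opt/homebrew/lib/node_modules',
  '/.nvm/versions/node/',
]

function classifyInvokedPath(invokedPath: string): 'npm-global' | 'unknown' {
  // A match against any known npm global root means a global npm install.
  return NPM_GLOBAL_PATHS.some(p => invokedPath.includes(p))
    ? 'npm-global'
    : 'unknown'
}
```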
Interactions Between Recovery Paths
The various recovery paths have complex interactions, and understanding these relationships is key to understanding the system's resilience:
Key interaction rules:
- Pre-emptive blocking and recovery are mutually exclusive — when reactive compact or context collapse is enabled, pre-emptive blocking is skipped (otherwise the recovery path would never be triggered)
- Collapse to Reactive cascade — reactive compact is only attempted after collapse draining fails
- Each type attempted only once — hasAttemptedReactiveCompact prevents a reactive compact death loop
- Transitions prevent repetition — state.transition?.reason checks prevent the same recovery strategy from executing consecutively
- Error suppression and recovery must be consistent — errors suppressed in the streaming loop must have corresponding handling in the recovery check; otherwise errors get silently swallowed
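The overflow-related rules above can be condensed into one decision function. This is a simplified sketch with hypothetical names; the real code spreads these checks across the drain and compact sites shown earlier:

```typescript
// Sketch of the overflow-recovery cascade: lightweight drain first,
// full compaction second, surface the error last (hypothetical names).
type OverflowAction = 'collapse_drain' | 'reactive_compact' | 'surface_error'

function decideOverflowRecovery(opts: {
  collapseEnabled: boolean
  previousWasDrain: boolean // last iteration drained and still overflowed
  reactiveCompactEnabled: boolean
  hasAttemptedReactiveCompact: boolean
}): OverflowAction {
  // Level 1: lightweight drain, but never twice in a row.
  if (opts.collapseEnabled && !opts.previousWasDrain) {
    return 'collapse_drain'
  }
  // Level 2: full compaction, at most once.
  if (opts.reactiveCompactEnabled && !opts.hasAttemptedReactiveCompact) {
    return 'reactive_compact'
  }
  return 'surface_error'
}
```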
Consistency Requirement for Streaming Error Suppression
```ts
// Hoist media-recovery gate once per turn. Withholding (inside the
// stream loop) and recovery (after) must agree; CACHED_MAY_BE_STALE can
// flip during the 5-30s stream, and withhold-without-recover would eat
// the message.
const mediaRecoveryEnabled =
  reactiveCompact?.isReactiveCompactEnabled() ?? false
```
Feature flag values can change during the 5-30 seconds of streaming (GrowthBook cache refresh). If an error was suppressed at the start of the stream, but the recovery check sees the flag as disabled at the end of the stream, the error is lost. Therefore, the flag value is extracted once at the start of the turn and used consistently throughout.
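The hazard can be demonstrated in isolation. In this contrived sketch (hypothetical names, not the real flag reader), the unsafe variant reads the flag at stream start and again after the stream, while the hoisted variant reads once; a mid-turn flip loses the error only in the unsafe version:

```typescript
// Contrived demonstration of the flag-consistency hazard. `readFlag`
// simulates a cached feature-flag lookup that may change between calls.

// Unsafe: two reads; a mid-stream flip can withhold without recovering.
function processTurnUnsafe(readFlag: () => boolean, errorInStream: boolean) {
  const withheld = errorInStream && readFlag() // read at stream start
  const recovered = withheld && readFlag()     // read again after stream
  return { withheld, lost: withheld && !recovered }
}

// Hoisted: one read per turn; withhold and recover always agree.
function processTurnHoisted(readFlag: () => boolean, errorInStream: boolean) {
  const enabled = readFlag() // hoisted once, used for both decisions
  const withheld = errorInStream && enabled
  const recovered = withheld && enabled
  return { withheld, lost: withheld && !recovered }
}
```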
Summary
Claude Code's error recovery system embodies several core principles:
- Errors are input, not termination signals — tool execution failures become tool_result(is_error: true) feedback to the model
- Graduated recovery — from lightweight (collapse drain) to heavyweight (reactive compact), escalating level by level
- Bounded retries — each recovery path has a clear attempt limit, preventing death loops
- State integrity — synthetic tool_results are generated after interruption, keeping message format valid
- Flag consistency — suppression and recovery must see the same feature flag values
- Environment self-diagnostics — /doctor provides system-level diagnostics to help users troubleshoot environment issues
The complexity of this system stems directly from the design goal of "never terminating the session." In a world where an AI Agent may run continuously for hours, every failure mode needs a recovery path — not because engineers enjoy complexity, but because reality is complex.