API Client: Deep Integration with the Anthropic API

Deep dive into Claude Code's API client — streaming response handling, model selection and fallback, Beta APIs, token counting and cost tracking, retry logic

The Problem

The moment you type a question in Claude Code and press Enter, a precisely orchestrated chain of operations begins: the system builds the message array, selects the appropriate model, adds Beta headers, receives the response via SSE streaming, parses token usage in real time, calculates costs, handles potential 429/529 errors, and falls back to an alternative model if necessary. All of this completes within 1-2 seconds — the user simply sees text start flowing.

Claude Code's API client is not a simple HTTP wrapper — it's a complex system encompassing retry logic, fallback, caching, cost tracking, and multi-provider adaptation. This article provides an in-depth analysis of every layer of this system.


API Client Layer Architecture

  • Application Layer — query.ts (main loop), claude.ts (API orchestration)
  • Retry Layer — withRetry.ts, FallbackTriggeredError, CannotRetryError
  • SDK Layer — client.ts (SDK initialization), dispatching to the Anthropic SDK, AWS Bedrock, GCP Vertex AI, or Azure Foundry
  • Support Layer — cost-tracker.ts, promptCacheBreakDetection.ts, bootstrap.ts

Multi-Provider Client

Claude Code supports four API providers, each with different authentication and configuration methods:

src/services/api/client.ts
TypeScript
// Direct API:
//   ANTHROPIC_API_KEY: Required for direct API access
//
// AWS Bedrock:
//   AWS credentials configured via aws-sdk defaults
//   AWS_REGION or AWS_DEFAULT_REGION
//   ANTHROPIC_SMALL_FAST_MODEL_AWS_REGION: Optional override for Haiku
//
// Foundry (Azure):
//   ANTHROPIC_FOUNDRY_RESOURCE: Azure resource name
//   ANTHROPIC_FOUNDRY_BASE_URL: Alternative full base URL
//
// Vertex AI:
//   Model-specific region variables (VERTEX_REGION_CLAUDE_*)
//   CLOUD_ML_REGION: Default GCP region
//   ANTHROPIC_VERTEX_PROJECT_ID: Required GCP project ID
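Provider selection itself is environment-driven. A minimal sketch, assuming the documented CLAUDE_CODE_USE_BEDROCK and CLAUDE_CODE_USE_VERTEX switches (detectProvider and the Foundry heuristic are illustrative, not the actual implementation):

```typescript
type Provider = 'firstParty' | 'bedrock' | 'vertex' | 'foundry'

// Hypothetical sketch: pick a provider from environment variables.
// CLAUDE_CODE_USE_BEDROCK / CLAUDE_CODE_USE_VERTEX are real Claude Code
// switches; the Foundry check is an assumption based on the variables above.
function detectProvider(env: Record<string, string | undefined>): Provider {
  if (env.CLAUDE_CODE_USE_BEDROCK) return 'bedrock'
  if (env.CLAUDE_CODE_USE_VERTEX) return 'vertex'
  if (env.ANTHROPIC_FOUNDRY_RESOURCE || env.ANTHROPIC_FOUNDRY_BASE_URL) {
    return 'foundry'
  }
  return 'firstParty' // direct Anthropic API with ANTHROPIC_API_KEY
}
```

With no switches set, the client defaults to the first-party API.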

Client initialization also accounts for debugging: when debug mode is enabled, SDK logs are redirected to stderr:

src/services/api/client.ts
TypeScript
function createStderrLogger(): ClientOptions['logger'] {
  return {
    error: (msg, ...args) =>
      console.error('[Anthropic SDK ERROR]', msg, ...args),
    warn: (msg, ...args) =>
      console.error('[Anthropic SDK WARN]', msg, ...args),
    info: (msg, ...args) =>
      console.error('[Anthropic SDK INFO]', msg, ...args),
    debug: (msg, ...args) =>
      console.error('[Anthropic SDK DEBUG]', msg, ...args),
  }
}

Beta Headers Management

Claude Code uses numerous Beta API features, declared via the anthropic-beta header:

src/services/api/claude.ts
TypeScript
import {
  AFK_MODE_BETA_HEADER,
  CONTEXT_1M_BETA_HEADER,
  CONTEXT_MANAGEMENT_BETA_HEADER,
  EFFORT_BETA_HEADER,
  FAST_MODE_BETA_HEADER,
  PROMPT_CACHING_SCOPE_BETA_HEADER,
  REDACT_THINKING_BETA_HEADER,
  STRUCTURED_OUTPUTS_BETA_HEADER,
  TASK_BUDGETS_BETA_HEADER,
} from 'src/constants/betas.js'

These Beta features include:

  • CONTEXT_1M_BETA_HEADER — 1M token context window
  • CONTEXT_MANAGEMENT_BETA_HEADER — server-side context management
  • FAST_MODE_BETA_HEADER — fast mode (reduced latency)
  • EFFORT_BETA_HEADER — effort control (adjusts reasoning depth)
  • PROMPT_CACHING_SCOPE_BETA_HEADER — prompt caching scope
  • REDACT_THINKING_BETA_HEADER — thinking content redaction
  • STRUCTURED_OUTPUTS_BETA_HEADER — structured outputs
  • TASK_BUDGETS_BETA_HEADER — task budget control
  • AFK_MODE_BETA_HEADER — away mode (background execution optimization)
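The individual beta constants are ultimately joined into a single comma-separated anthropic-beta request header. A sketch of that assembly (the constant values here are placeholders, not the real beta identifiers):

```typescript
// Placeholder values; the actual beta identifier strings differ.
const CONTEXT_1M_BETA_HEADER = 'context-1m-example'
const FAST_MODE_BETA_HEADER = 'fast-mode-example'

// The API expects one comma-separated header, so deduplicate and join.
function buildBetaHeader(betas: string[]): Record<string, string> {
  return { 'anthropic-beta': [...new Set(betas)].join(',') }
}
```

Because a Set preserves insertion order, repeated constants collapse without reordering the remaining entries.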

Extra Body Parameters

Users can inject additional API parameters via the CLAUDE_CODE_EXTRA_BODY environment variable:

src/services/api/claude.ts
TypeScript
export function getExtraBodyParams(betaHeaders?: string[]): JsonObject {
  const extraBodyStr = process.env.CLAUDE_CODE_EXTRA_BODY
  let result: JsonObject = {}

  if (extraBodyStr) {
    try {
      const parsed = safeParseJSON(extraBodyStr)
      if (parsed && typeof parsed === 'object' && !Array.isArray(parsed)) {
        // Shallow clone — safeParseJSON is LRU-cached and returns the
        // same object reference. Mutating result would poison the cache.
        result = { ...(parsed as JsonObject) }
      }
    } catch (error) {
      logForDebugging(`Error parsing CLAUDE_CODE_EXTRA_BODY: ${errorMessage(error)}`)
    }
  }

  // Anti-distillation: send fake_tools opt-in for 1P CLI only
  if (feature('ANTI_DISTILLATION_CC') && getAPIProvider() === 'firstParty') {
    result.anti_distillation = ['fake_tools']
  }

  return result
}

Note the shallow clone — safeParseJSON uses an LRU cache, so directly mutating the return value would poison the cache, causing subsequent calls to see the modified value.
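The hazard is easy to reproduce in isolation. A minimal demonstration, where a memoized parser stands in for safeParseJSON's LRU cache (cachedParse is hypothetical):

```typescript
// Stand-in for an LRU-cached parser: identical input returns the same object.
const cache = new Map<string, unknown>()
function cachedParse(s: string): unknown {
  if (!cache.has(s)) cache.set(s, JSON.parse(s))
  return cache.get(s)
}

// Wrong: mutate the cached object directly.
const poisoned = cachedParse('{"a":1}') as Record<string, unknown>
poisoned.b = 2
// Every later caller for the same input now sees { a: 1, b: 2 }.
const victim = cachedParse('{"a":1}') as Record<string, unknown>

// Right: shallow-clone before adding fields, leaving the cache untouched.
const cloned = { ...(cachedParse('{"x":1}') as Record<string, unknown>) }
cloned.y = 2
const unaffected = cachedParse('{"x":1}') as Record<string, unknown>
```

The victim object carries the stray field, while the clone-then-mutate path leaves later reads of the cache clean.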


Prompt Cache Control

Prompt caching can be controlled at per-model granularity:

src/services/api/claude.ts
TypeScript
export function getPromptCachingEnabled(model: string): boolean {
  if (isEnvTruthy(process.env.DISABLE_PROMPT_CACHING)) return false
  if (isEnvTruthy(process.env.DISABLE_PROMPT_CACHING_HAIKU)) {
    if (model === getSmallFastModel()) return false
  }
  if (isEnvTruthy(process.env.DISABLE_PROMPT_CACHING_SONNET)) {
    if (model === getDefaultSonnetModel()) return false
  }
  // ...
}

This per-model disabling design stems from practical needs — the cache creation cost for some models may not be worthwhile (for instance, Haiku is already inexpensive, and the cache creation fee can actually exceed the savings).
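The arithmetic behind that trade-off is simple. Assuming Anthropic-style multipliers where a cache write costs 1.25x the base input rate and a cache read costs 0.1x (illustrative figures; real rates vary by model and cache TTL), savings depend entirely on how often the cached prefix is reused:

```typescript
// Break-even sketch: how many cache reads amortize one cache write?
// The multipliers are assumptions modeled on Anthropic-style pricing.
function cacheSavingsUSD(
  prefixTokens: number,
  basePerMTokUSD: number,
  reads: number,
  writeMultiplier = 1.25,
  readMultiplier = 0.1,
): number {
  const base = (prefixTokens / 1_000_000) * basePerMTokUSD
  const withCache = base * writeMultiplier + reads * base * readMultiplier
  const withoutCache = (reads + 1) * base // pay full input price every call
  return withoutCache - withCache
}
```

With zero reuse the result is negative: the write premium is pure loss, which is why excluding an already-cheap model like Haiku can make sense.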


Retry System

The retry logic is the most complex part of the API client, defined in withRetry.ts.

Retry Configuration

src/services/api/withRetry.ts
TypeScript
const DEFAULT_MAX_RETRIES = 10
const FLOOR_OUTPUT_TOKENS = 3000
const MAX_529_RETRIES = 3
export const BASE_DELAY_MS = 500
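These constants imply a standard exponential backoff from a 500 ms base. A sketch of the delay schedule (full jitter and the cap value are assumptions; the real loop also honors the server's retry-after header when present):

```typescript
const BASE_DELAY_MS = 500

// Exponential backoff with full jitter, capped; attempt is 0-based.
function backoffDelayMs(attempt: number, capMs = 32_000): number {
  const exp = Math.min(BASE_DELAY_MS * 2 ** attempt, capMs)
  return Math.floor(Math.random() * exp) // full jitter in [0, exp)
}
```

Full jitter spreads simultaneous retries across the window, which matters when a 529 means the service is already saturated.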

Foreground vs. Background Query Sources

Not all queries should be retried. Background queries (summaries, title generation, classifiers) immediately give up on 529 errors — they are not what the user is waiting for, and retrying would only amplify capacity cascades:

src/services/api/withRetry.ts
TypeScript
const FOREGROUND_529_RETRY_SOURCES = new Set<QuerySource>([
  'repl_main_thread',
  'repl_main_thread:outputStyle:custom',
  'repl_main_thread:outputStyle:Explanatory',
  'repl_main_thread:outputStyle:Learning',
  'sdk',
  'agent:custom',
  'agent:default',
  'agent:builtin',
  'compact',
  'hook_agent',
  'hook_prompt',
  'verification_agent',
  'side_question',
  'auto_mode',
])
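Given that set, the 529 gate reduces to a membership check plus an attempt budget. A sketch (shouldRetry529 is a hypothetical helper, and the set is abbreviated here):

```typescript
type QuerySource = string // simplified for the sketch

// Abbreviated version of the foreground allow-list shown above.
const FOREGROUND_529_RETRY_SOURCES = new Set<QuerySource>([
  'repl_main_thread',
  'sdk',
  'agent:default',
  'compact',
])

// Hypothetical: background sources give up on 529 immediately;
// foreground sources retry up to MAX_529_RETRIES times.
function shouldRetry529(source: QuerySource, attempt: number, max = 3): boolean {
  return FOREGROUND_529_RETRY_SOURCES.has(source) && attempt < max
}
```

A background query such as title generation fails fast instead of piling load onto an already-overloaded service.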

Retry State Machine

...

Fast Mode Fallback

Fast Mode is a low-latency mode. When rate-limited, the system must decide whether to wait (preserving cache hits) or fall back (switching to standard speed):

src/services/api/withRetry.ts
TypeScript
if (wasFastModeActive && !isPersistentRetryEnabled() &&
    error instanceof APIError &&
    (error.status === 429 || is529Error(error))) {
  // Overage limit — permanently disable fast mode
  const overageReason = error.headers?.get(
    'anthropic-ratelimit-unified-overage-disabled-reason',
  )
  if (overageReason !== null && overageReason !== undefined) {
    handleFastModeOverageRejection(overageReason)
    retryContext.fastMode = false
    continue
  }

  const retryAfterMs = getRetryAfterMs(error)
  if (retryAfterMs !== null && retryAfterMs < SHORT_RETRY_THRESHOLD_MS) {
    // Short wait — keep fast mode to protect prompt cache
    await sleep(retryAfterMs, options.signal, { abortError })
    continue
  }

  // Long wait or unknown — enter cooldown period (switch to standard speed)
  const cooldownMs = Math.max(
    retryAfterMs ?? DEFAULT_FAST_MODE_FALLBACK_HOLD_MS,
    MIN_COOLDOWN_MS,
  )
  triggerFastModeCooldown(Date.now() + cooldownMs, cooldownReason)
  retryContext.fastMode = false
  continue
}

The decision logic:

  • retry-after < threshold — short wait, keep fast mode (protects the prompt cache from invalidation)
  • retry-after >= threshold or unknown — enter a cooldown period, switch to standard speed
  • Overage limit — permanently disable fast mode

Authentication Error Recovery

src/services/api/withRetry.ts
TypeScript
const isStaleConnection = isStaleConnectionError(lastError)
if (isStaleConnection && getFeatureValue_CACHED_MAY_BE_STALE(...)) {
  disableKeepAlive() // Disable connection pool, rebuild connection
}

if (
  client === null ||
  (lastError instanceof APIError && lastError.status === 401) ||
  isOAuthTokenRevokedError(lastError) ||
  isBedrockAuthError(lastError) ||
  isVertexAuthError(lastError) ||
  isStaleConnection
) {
  if ((lastError instanceof APIError && lastError.status === 401) ||
      isOAuthTokenRevokedError(lastError)) {
    const failedAccessToken = getClaudeAIOAuthTokens()?.accessToken
    if (failedAccessToken) {
      await handleOAuth401Error(failedAccessToken)
    }
  }
  client = await getClient() // Rebuild client
}

Authentication recovery covers special cases for all providers:

  • Anthropic 1P — refresh OAuth token on 401
  • AWS Bedrock — 403 or CredentialsProviderError
  • GCP Vertex — credential refresh failure
  • Connection reset — disable keep-alive and reconnect on ECONNRESET/EPIPE

Consecutive 529 Errors and Model Fallback

src/services/api/withRetry.ts
TypeScript
if (is529Error(error) &&
    (process.env.FALLBACK_FOR_ALL_PRIMARY_MODELS ||
     (!isClaudeAISubscriber() && isNonCustomOpusModel(options.model)))) {
  consecutive529Errors++
  if (consecutive529Errors >= MAX_529_RETRIES) {
    if (options.fallbackModel) {
      throw new FallbackTriggeredError(
        options.model,
        options.fallbackModel,
      )
    }
  }
}

After 3 consecutive 529 errors, a model fallback is triggered (e.g., Opus to Sonnet). FallbackTriggeredError is caught and handled by query.ts — existing assistant messages are cleared, the model is switched, and the entire request is retried.
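On the catching side, the shape of that handling might look like this (the class fields and the handleFallback helper are assumptions for illustration, not the actual query.ts code):

```typescript
// Hypothetical shape of the fallback error and its handling in the main loop.
class FallbackTriggeredError extends Error {
  constructor(
    public readonly originalModel: string,
    public readonly fallbackModel: string,
  ) {
    super(`Falling back from ${originalModel} to ${fallbackModel}`)
  }
}

function handleFallback(
  err: unknown,
  state: { model: string; assistantMessages: unknown[] },
): boolean {
  if (!(err instanceof FallbackTriggeredError)) return false
  state.assistantMessages.length = 0 // discard partial output
  state.model = err.fallbackModel // switch models; caller re-issues the request
  return true
}
```

Clearing the partial messages before retrying is essential: content produced under the original model (thinking blocks in particular) cannot simply be replayed against the fallback model.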

Persistent Retry (Unattended Mode)

For automation scenarios (CI/CD, cron jobs), the system supports unlimited retries:

src/services/api/withRetry.ts
TypeScript
const PERSISTENT_MAX_BACKOFF_MS = 5 * 60 * 1000 // 5 minute max backoff
const PERSISTENT_RESET_CAP_MS = 6 * 60 * 60 * 1000 // 6 hour timeout
const HEARTBEAT_INTERVAL_MS = 30_000 // 30 second heartbeat

function isPersistentRetryEnabled(): boolean {
  return feature('UNATTENDED_RETRY')
    ? isEnvTruthy(process.env.CLAUDE_CODE_UNATTENDED_RETRY)
    : false
}

Persistent retry sends heartbeats via SystemAPIErrorMessage, preventing the host environment (such as a container orchestration system) from marking the session as idle.


Cost Tracking

Every API response updates the cost state:

src/cost-tracker.ts
TypeScript
type StoredCostState = {
  totalCostUSD: number
  totalAPIDuration: number
  totalAPIDurationWithoutRetries: number
  totalToolDuration: number
  totalLinesAdded: number
  totalLinesRemoved: number
  lastDuration: number | undefined
  modelUsage: { [modelName: string]: ModelUsage } | undefined
}

Cost calculation uses the calculateUSDCost function based on per-model pricing tables:

src/services/api/claude.ts
TypeScript
import { addToTotalSessionCost } from 'src/cost-tracker.js'
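As a sketch of what such pricing arithmetic involves, here is a hypothetical calculateUSDCost over a made-up pricing table (the usage field names follow the Anthropic API; the rates and table layout are assumptions, not Claude Code's actual tables):

```typescript
type Usage = {
  input_tokens: number
  output_tokens: number
  cache_creation_input_tokens?: number
  cache_read_input_tokens?: number
}

// Hypothetical per-MTok rates; real tables are per-model and versioned.
const PRICING: Record<
  string,
  { in: number; out: number; cacheWrite: number; cacheRead: number }
> = {
  'example-sonnet': { in: 3, out: 15, cacheWrite: 3.75, cacheRead: 0.3 },
}

function calculateUSDCost(model: string, u: Usage): number {
  const p = PRICING[model]
  if (!p) return 0 // unknown model: report zero rather than guess
  return (
    (u.input_tokens * p.in +
      u.output_tokens * p.out +
      (u.cache_creation_input_tokens ?? 0) * p.cacheWrite +
      (u.cache_read_input_tokens ?? 0) * p.cacheRead) /
    1_000_000
  )
}
```

Note that cache writes and cache reads are priced separately from plain input tokens, which is what makes the per-model caching knobs above worthwhile.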

The cost state is not just for display — it is saved to the project configuration during session switches and read back on resume:

src/cost-tracker.ts
TypeScript
export function saveCurrentSessionCosts(fpsMetrics?: FpsMetrics): void {
  saveCurrentProjectConfig(current => ({
    ...current,
    lastCost: getTotalCostUSD(),
    lastAPIDuration: getTotalAPIDuration(),
    lastAPIDurationWithoutRetries: getTotalAPIDurationWithoutRetries(),
    lastToolDuration: getTotalToolDuration(),
    lastDuration: getTotalDuration(),
    // ...
  }))
}

Bootstrap API

At startup, the system fetches server-side configuration via the Bootstrap API:

src/services/api/bootstrap.ts
TypeScript
async function fetchBootstrapAPI(): Promise<BootstrapResponse | null> {
  if (isEssentialTrafficOnly()) return null // Skip in privacy mode
  if (getAPIProvider() !== 'firstParty') return null // Skip for third-party providers

  // OAuth preferred, API Key fallback
  const hasUsableOAuth =
    getClaudeAIOAuthTokens()?.accessToken && hasProfileScope()
  if (!hasUsableOAuth && !apiKey) return null

  const endpoint = `${getOauthConfig().BASE_API_URL}/api/claude_cli/bootstrap`

  return await withOAuth401Retry(async () => {
    // Re-read OAuth token each time (retry may have refreshed it)
    const token = getClaudeAIOAuthTokens()?.accessToken
    let authHeaders: Record<string, string>
    if (token && hasProfileScope()) {
      authHeaders = { Authorization: `Bearer ${token}`, ... }
    } else if (apiKey) {
      authHeaders = { 'x-api-key': apiKey }
    } else {
      return null
    }

    const response = await axios.get(endpoint, {
      headers: { ...authHeaders },
      timeout: 5000,
    })
    return bootstrapResponseSchema().safeParse(response.data)
  })
}

The data returned by Bootstrap includes:

  • client_data — client configuration
  • additional_model_options — list of additional available models

The 5-second timeout ensures startup doesn't hang due to network issues.


Streaming Response Handling

The main loop in query.ts consumes streaming responses via for await...of. Key processing logic includes:
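The overall shape of that loop can be sketched as a reducer over stream events. The event shapes below follow the public Anthropic streaming API; consumeStream is illustrative, not Claude Code's actual code:

```typescript
// Minimal subset of Anthropic streaming event shapes.
type StreamEvent =
  | { type: 'content_block_delta'; delta: { type: 'text_delta'; text: string } }
  | { type: 'message_delta'; usage: { output_tokens: number } }
  | { type: 'message_stop' }

// Stands in for the `for await...of` loop over the SSE stream.
function consumeStream(events: Iterable<StreamEvent>) {
  let text = ''
  let outputTokens = 0
  for (const event of events) {
    if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
      text += event.delta.text // tokens flow to the UI as they arrive
    } else if (event.type === 'message_delta') {
      outputTokens = event.usage.output_tokens // usage updates mid-stream
    }
  }
  return { text, outputTokens }
}
```

Token usage arriving in message_delta events is what lets cost tracking update in real time rather than waiting for the response to complete.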

Fallback Handling

When model fallback is triggered during streaming, partially received messages need to be discarded:

src/query.ts
TypeScript
if (streamingFallbackOccured) {
  // Generate tombstones for already-emitted messages
  for (const msg of assistantMessages) {
    yield { type: 'tombstone' as const, message: msg }
  }

  assistantMessages.length = 0
  toolResults.length = 0
  toolUseBlocks.length = 0
  needsFollowUp = false

  // Discard pending results from the streaming tool executor
  if (streamingToolExecutor) {
    streamingToolExecutor.discard()
    streamingToolExecutor = new StreamingToolExecutor(
      toolUseContext.options.tools,
      canUseTool,
      toolUseContext,
    )
  }
}

Tombstone messages tell the UI and transcript to remove these partial messages — it is particularly important to remove incomplete thinking blocks, as they carry model-specific signatures that would cause API errors after falling back to a different model.

Error Suppression and Recovery

Certain API errors are recoverable — the system suppresses them within the streaming loop and attempts recovery after the stream ends:

src/query.ts
TypeScript
let withheld = false
if (feature('CONTEXT_COLLAPSE')) {
  if (contextCollapse?.isWithheldPromptTooLong(message, ...)) {
    withheld = true
  }
}
if (reactiveCompact?.isWithheldPromptTooLong(message)) {
  withheld = true
}
if (mediaRecoveryEnabled && reactiveCompact?.isWithheldMediaSizeError(message)) {
  withheld = true
}
if (isWithheldMaxOutputTokens(message)) {
  withheld = true
}
if (!withheld) {
  yield yieldMessage
}

Suppressed messages are still appended to the assistantMessages array, since the recovery logic needs to inspect them. They are not forwarded to SDK consumers, however, because those consumers (such as desktop applications) might terminate the session upon seeing an error.


Request Construction Details

Tool Schema Conversion

Each tool definition needs to be converted to an API-compatible format, including handling of deferred tools:

TypeScript
// Reference: src/services/api/claude.ts
import {
  formatDeferredToolLine,
  isDeferredTool,
  TOOL_SEARCH_TOOL_NAME,
} from '../../tools/ToolSearchTool/prompt.js'

Advisor Mode

When Advisor is enabled, an additional model (such as Opus advising Sonnet) participates in decision-making:

src/services/api/claude.ts
TypeScript
import {
  ADVISOR_TOOL_INSTRUCTIONS,
  getExperimentAdvisorModels,
  isAdvisorEnabled,
  isValidAdvisorModel,
  modelSupportsAdvisor,
} from 'src/utils/advisor.js'

Session Activity Tracking

During API requests, the session is marked as active, used for resource management in remote environments:

src/services/api/claude.ts
TypeScript
import {
  startSessionActivity,
  stopSessionActivity,
} from '../../utils/sessionActivity.js'

Summary

Claude Code's API client is a multi-layered defense system:

  • Multi-provider abstraction — unified interface for Anthropic/Bedrock/Vertex/Foundry, configured via environment variables
  • Layered retry — different strategies for different error types (authentication/rate-limiting/overload/connection reset)
  • Intelligent fallback — Fast Mode to standard speed to alternative model, with sound decision logic at each step
  • Streaming error suppression — recoverable errors are not immediately exposed to consumers, giving the system a chance to recover
  • Full-chain cost tracking — from API response to project configuration persistence, with support for session resumption
  • Operational knobs — prompt caching, Fast Mode, retry strategies, and more are all controllable via environment variables and feature flags

The complexity of this system is not accidental — it reflects the reality that production AI applications face: networks are unreliable, services get overloaded, credentials expire, and users need an uninterrupted experience. Every layer of protection corresponds to a real-world failure mode.