Introduction
An AI coding assistant's knowledge has a natural cutoff date — the point in time of the model's training data. When users ask "how do I use React 19's new API" or "what breaking changes are in the latest version of this npm package," the AI can only rely on internet access to get up-to-date information.
But letting AI access the internet introduces new security challenges:
- SSRF (Server-Side Request Forgery) — The AI could be injected with malicious URLs to access internal network services
- Data exfiltration — Malicious web pages could instruct the AI to send user code to external destinations
- Token bombs — A massive web page could consume the entire context space
- Credential leakage — If the AI accesses web pages with the user's cookies or tokens, it could leak credentials
Claude Code addresses these issues through two dedicated tools: WebFetchTool (fetches content from a specified URL) and WebSearchTool (searches the internet). In CCR (Claude Code Remote) environments, an additional upstream proxy layer provides extra network control.
WebFetchTool: Content Retrieval
WebFetchTool retrieves content from a specified URL and lets the AI process the fetched content using a natural language prompt.
Input Model
const inputSchema = lazySchema(() =>
  z.strictObject({
    url: z.string().url().describe('The URL to fetch content from'),
    prompt: z.string().describe('The prompt to run on the fetched content'),
  }),
)
Two parameters: url and prompt. The prompt is designed so the AI does not just fetch raw content, but extracts information with a specific purpose. For example: "Extract all endpoints and their parameters from this API documentation."
The output includes the HTTP status code, processed text, fetch duration, and content size:
const outputSchema = lazySchema(() =>
  z.object({
    bytes: z.number().describe('Size of the fetched content in bytes'),
    code: z.number().describe('HTTP response code'),
    codeText: z.string().describe('HTTP response code text'),
    result: z.string().describe('Processed result from applying the prompt'),
    durationMs: z.number().describe('Time taken to fetch and process'),
    url: z.string().describe('The URL that was fetched'),
  }),
)
Pre-Approved Domain Allowlist
One of WebFetchTool's most important security mechanisms is the pre-approved domain list:
export const PREAPPROVED_HOSTS = new Set([
  // Anthropic
  'platform.claude.com',
  'code.claude.com',
  'modelcontextprotocol.io',

  // Top Programming Languages
  'docs.python.org',
  'en.cppreference.com',
  'developer.mozilla.org',
  'doc.rust-lang.org',
  'www.typescriptlang.org',

  // Web Frameworks
  'react.dev',
  'nextjs.org',
  'vuejs.org',
  'tailwindcss.com',

  // Cloud & DevOps
  'docs.aws.amazon.com',
  'cloud.google.com',
  'kubernetes.io',

  // ... 100+ domains total
])
These domains can be accessed without user confirmation. The selection criterion for the list is "code-related documentation sites" — read-only reference materials that do not involve authentication or user data.
Note the security warning in the source code:
// SECURITY WARNING: These preapproved domains are ONLY for WebFetch (GET requests only).
// The sandbox system deliberately does NOT inherit this list for network restrictions,
// as arbitrary network access (POST, uploads, etc.) to these domains could enable
// data exfiltration. Some domains like huggingface.co, kaggle.com, and nuget.org
// allow file uploads and would be dangerous for unrestricted network access.
This is a critical security distinction: WebFetch only makes GET requests (read-only), while the sandbox's network restrictions control arbitrary network operations (including POST). The two cannot share an allowlist.
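The distinction can be made concrete with a small sketch. Everything below is illustrative (the helper and the two-host sample list are not Claude Code's actual code); it only shows the invariant: pre-approval short-circuits the permission prompt for GET requests and nothing else.

```typescript
// Illustrative sketch only; this helper is not Claude Code's API.
// Invariant shown: pre-approval applies to read-only GETs alone.
const SAMPLE_PREAPPROVED = new Set(['docs.python.org', 'react.dev'])

function needsUserApproval(url: string, method: string): boolean {
  const { hostname } = new URL(url)
  // Any non-GET request (POST, PUT, uploads, ...) could exfiltrate data,
  // so pre-approval never applies to it.
  if (method !== 'GET') return true
  return !SAMPLE_PREAPPROVED.has(hostname)
}

// needsUserApproval('https://docs.python.org/3/', 'GET')  → false
// needsUserApproval('https://docs.python.org/3/', 'POST') → true
// needsUserApproval('https://example.com/', 'GET')        → true
```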
Path-Level Pre-Approval
const { HOSTNAME_ONLY, PATH_PREFIXES } = (() => {
  const hosts = new Set<string>()
  const paths = new Map<string, string[]>()
  for (const entry of PREAPPROVED_HOSTS) {
    const slash = entry.indexOf('/')
    if (slash === -1) {
      hosts.add(entry)
    } else {
      const host = entry.slice(0, slash)
      const path = entry.slice(slash)
      const prefixes = paths.get(host)
      if (prefixes) prefixes.push(path)
      else paths.set(host, [path])
    }
  }
  return { HOSTNAME_ONLY: hosts, PATH_PREFIXES: paths }
})()

export function isPreapprovedHost(hostname: string, pathname: string): boolean {
  if (HOSTNAME_ONLY.has(hostname)) return true
  const prefixes = PATH_PREFIXES.get(hostname)
  if (prefixes) {
    for (const p of prefixes) {
      // Enforce path segment boundaries
      if (pathname === p || pathname.startsWith(p + '/')) return true
    }
  }
  return false
}
Some domains are only pre-approved for specific paths. For example, github.com/anthropics is pre-approved, but github.com/random-user is not. Path matching enforces segment boundaries (/), preventing /anthropics-evil/malware from being falsely matched.
The data structure is preprocessed at module load time into two lookup tables (the HOSTNAME_ONLY Set and the PATH_PREFIXES Map), making the hostname check an O(1) set lookup; path-restricted hosts add only a short linear scan over that host's few prefixes.
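The segment-boundary behavior is easy to verify with a self-contained re-creation of the lookup. The two-entry sample list below is illustrative; the real set has 100+ entries.

```typescript
// Minimal re-creation of the two-table lookup over a sample list.
const SAMPLE = ['react.dev', 'github.com/anthropics']

const hostnameOnly = new Set<string>()
const pathPrefixes = new Map<string, string[]>()
for (const entry of SAMPLE) {
  const slash = entry.indexOf('/')
  if (slash === -1) hostnameOnly.add(entry)
  else {
    const host = entry.slice(0, slash)
    const path = entry.slice(slash)
    const list = pathPrefixes.get(host)
    if (list) list.push(path)
    else pathPrefixes.set(host, [path])
  }
}

function isPreapprovedHost(hostname: string, pathname: string): boolean {
  if (hostnameOnly.has(hostname)) return true
  for (const p of pathPrefixes.get(hostname) ?? []) {
    // Segment boundary: '/anthropics-evil' must not match '/anthropics'
    if (pathname === p || pathname.startsWith(p + '/')) return true
  }
  return false
}

// isPreapprovedHost('react.dev', '/learn')                     → true
// isPreapprovedHost('github.com', '/anthropics/claude-code')   → true
// isPreapprovedHost('github.com', '/anthropics-evil/malware')  → false
```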
Permission Check Flow
Permission rules are stored in domain:hostname format. When a user approves access to a domain, all URLs on that domain are approved.
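A plausible sketch of that key derivation, assuming the rule key is just the `domain:` prefix plus the URL's hostname (the helper name here is hypothetical):

```typescript
// Hypothetical helper: the 'domain:' + hostname format comes from the
// text above; the function itself is illustrative, not Claude Code's code.
function permissionKeyFor(url: string): string {
  return `domain:${new URL(url).hostname}`
}

// Both URLs map to the same rule, so approving one approves the other:
// permissionKeyFor('https://example.org/docs/a') → 'domain:example.org'
// permissionKeyFor('https://example.org/api/b')  → 'domain:example.org'
```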
Authentication Warning in the Prompt
async prompt(_options) {
  return `IMPORTANT: WebFetch WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub). If so, look for a specialized MCP tool that provides authenticated access.
${DESCRIPTION}`
},
This warning is always included in the prompt, regardless of whether ToolSearchTool is available. The source code comments explain why: if this prefix were conditionally toggled based on ToolSearch availability, the tool description would "flicker" between consecutive API calls, breaking the Anthropic API's prompt caching — each flicker costs two cache misses.
WebSearchTool: Internet Search
WebSearchTool uses Anthropic's Web Search API to search the internet. Unlike WebFetchTool, it does not fetch a specific URL but searches the entire internet.
Architectural Uniqueness
WebSearchTool is not simply calling a search API — it uses a model-within-a-model architecture:
async call(input, context, _canUseTool, _parentMessage, onProgress) {
  const { query } = input
  const userMessage = createUserMessage({
    content: 'Perform a web search for the query: ' + query,
  })
  const toolSchema = makeToolSchema(input)

  const queryStream = queryModelWithStreaming({
    messages: [userMessage],
    systemPrompt: asSystemPrompt([
      'You are an assistant for performing a web search tool use',
    ]),
    tools: [],
    signal: context.abortController.signal,
    options: {
      extraToolSchemas: [toolSchema],
      querySource: 'web_search_tool',
      // ...
    },
  })
  // ...
}
It creates an internal API call, passing a tool schema of type web_search_20250305. The API side automatically executes the search and returns results. The benefit of this architecture is that the actual search execution is handled by Anthropic's infrastructure; the client only needs to process the streaming response.
Search Limits
function makeToolSchema(input: Input): BetaWebSearchTool20250305 {
  return {
    type: 'web_search_20250305',
    name: 'web_search',
    allowed_domains: input.allowed_domains,
    blocked_domains: input.blocked_domains,
    max_uses: 8, // Hardcoded to 8 searches maximum
  }
}
Each call executes a maximum of 8 searches. allowed_domains and blocked_domains let the AI control the search scope — for example, searching only official documentation sites, or excluding known low-quality result sources.
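Since max_uses is hardcoded, scoping a search is purely a matter of the domain lists. A minimal self-contained sketch of the schema construction shown above (the Input interface is trimmed here to query plus the two domain fields; the example input is hypothetical):

```typescript
// Trimmed sketch of the schema builder; the cap cannot be raised via input.
interface Input {
  query: string
  allowed_domains?: string[]
  blocked_domains?: string[]
}

function makeToolSchema(input: Input) {
  return {
    type: 'web_search_20250305' as const,
    name: 'web_search' as const,
    allowed_domains: input.allowed_domains,
    blocked_domains: input.blocked_domains,
    max_uses: 8, // hardcoded: 8 searches maximum, regardless of input
  }
}

// Hypothetical call scoping a search to official React documentation:
const schema = makeToolSchema({
  query: 'React 19 use() hook',
  allowed_domains: ['react.dev'],
})
// schema.max_uses → 8
```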
Provider Availability
isEnabled() {
  const provider = getAPIProvider()
  const model = getMainLoopModel()

  if (provider === 'firstParty') return true

  if (provider === 'vertex') {
    const supportsWebSearch =
      model.includes('claude-opus-4') ||
      model.includes('claude-sonnet-4') ||
      model.includes('claude-haiku-4')
    return supportsWebSearch
  }

  if (provider === 'foundry') return true

  return false
},
WebSearchTool is only available on providers that support the Web Search API: Anthropic first-party, Google Vertex (Claude 4-family models only), and Foundry.
Progress Reporting
for await (const event of queryStream) {
  // Track tool use ID when server_tool_use starts
  if (event.type === 'stream_event' &&
      event.event?.type === 'content_block_start') {
    const contentBlock = event.event.content_block
    if (contentBlock?.type === 'server_tool_use') {
      currentToolUseId = contentBlock.id
      currentToolUseJson = ''
    }
  }

  // Accumulate JSON for current tool use
  if (currentToolUseId &&
      event.type === 'stream_event' &&
      event.event?.type === 'content_block_delta') {
    const delta = event.event.delta
    if (delta?.type === 'input_json_delta' && delta.partial_json) {
      currentToolUseJson += delta.partial_json
      // Try to extract query from partial JSON for progress updates
      // ...
    }
  }

  // Yield progress when search results come in
  if (event.type === 'stream_event' &&
      event.event?.type === 'content_block_start') {
    const contentBlock = event.event.content_block
    if (contentBlock?.type === 'web_search_tool_result') {
      // Report progress
      if (onProgress) {
        onProgress({
          toolUseID: toolUseId,
          data: { type: 'search_results_received', resultCount, query },
        })
      }
    }
  }
}
WebSearchTool reports progress during the search process via the onProgress callback. Since the search is streaming, it can update the UI in real-time as search results arrive, rather than waiting for all searches to complete before returning.
Upstream Proxy
In CCR (Claude Code Remote) environments, all network traffic is routed through an upstream proxy, providing additional security controls.
Initialization Flow
sequenceDiagram
participant CLI as Claude Code
participant Token as /run/ccr/session_token
participant API as Anthropic API
participant Relay as Local Relay
CLI->>Token: Read session token
Token-->>CLI: session_token
CLI->>CLI: prctl(PR_SET_DUMPABLE, 0)<br>Block ptrace from reading heap memory
CLI->>API: GET /v1/code/upstreamproxy/ca-cert
API-->>CLI: CA certificate
CLI->>CLI: Merge system CA + proxy CA
CLI->>Relay: Start CONNECT-to-WebSocket relay
Relay-->>CLI: Listening on 127.0.0.1:PORT
CLI->>Token: unlink(session_token)<br>Token exists only in heap memory
CLI->>CLI: Set environment variables<br>HTTPS_PROXY, SSL_CERT_FILE
export async function initUpstreamProxy(opts?) {
  if (!isEnvTruthy(process.env.CLAUDE_CODE_REMOTE)) return state
  if (!isEnvTruthy(process.env.CCR_UPSTREAM_PROXY_ENABLED)) return state

  const token = await readToken(tokenPath)
  if (!token) return state

  setNonDumpable()

  const caOk = await downloadCaBundle(baseUrl, systemCaPath, caBundlePath)
  if (!caOk) return state

  try {
    const relay = await startUpstreamProxyRelay({ wsUrl, sessionId, token })
    registerCleanup(async () => relay.stop())
    state = { enabled: true, port: relay.port, caBundlePath }

    // Only unlink after the listener is up
    await unlink(tokenPath).catch(() => {})
  } catch (err) {
    // Fail open — a broken proxy must never break a session
  }

  return state
}
Key security measures:
- prctl protection — PR_SET_DUMPABLE=0 prevents same-UID processes from reading this process's heap memory via ptrace. This blocks prompt injection attacks that attempt to steal the session token via gdb -p $PPID
- Token file deletion — The token is deleted from disk after the relay starts successfully, remaining only in process memory. Deletion occurs only after the relay confirms availability, so the supervisor can retry with the on-disk token if startup fails
- Fail open — Failure at any step simply disables the proxy without interrupting the session. The comment makes it clear: "A broken proxy setup must never break an otherwise-working session."
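The fail-open behavior reduces to a simple pattern, sketched here with illustrative names (not Claude Code's actual signatures):

```typescript
// Illustrative fail-open wrapper: any startup error leaves the proxy
// disabled but lets the session continue.
type ProxyState = { enabled: boolean; port?: number }

async function initWithFailOpen(
  startRelay: () => Promise<number>, // resolves to the relay's local port
): Promise<ProxyState> {
  try {
    const port = await startRelay()
    return { enabled: true, port }
  } catch {
    // Swallow the error: a broken proxy must never break the session.
    return { enabled: false }
  }
}
```

The trade-off is deliberate: the proxy adds defense in depth, so losing it degrades security monitoring rather than correctness, whereas crashing the session would fail the user outright.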
NO_PROXY List
const NO_PROXY_LIST = [
  'localhost', '127.0.0.1', '::1',
  '169.254.0.0/16', // Link-local
  '10.0.0.0/8',     // RFC1918
  '172.16.0.0/12',
  '192.168.0.0/16',

  // Anthropic API — three forms because NO_PROXY parsing differs:
  'anthropic.com',   // apex domain fallback
  '.anthropic.com',  // Python urllib/httpx (suffix match)
  '*.anthropic.com', // Bun, curl, Go (glob match)

  'github.com',
  'registry.npmjs.org',
  'pypi.org',
].join(',')
The same Anthropic API domain appears in three different formats because different runtimes (Bun, Python, Go) parse NO_PROXY differently. This defensive programming ensures Anthropic API requests never go through the upstream proxy, preventing the MITM proxy's substitute CA from breaking HTTPS validation in non-Bun runtimes.
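The divergence can be sketched with two simplified matchers (an assumption: real parsers also handle ports, CIDR entries, and case folding, which are omitted here):

```typescript
// Simplified sketch of two divergent NO_PROXY matching behaviors.

// Python urllib/httpx style: a leading-dot entry is a suffix match
// that also covers the apex domain.
function suffixMatch(entry: string, host: string): boolean {
  return entry.startsWith('.')
    ? host.endsWith(entry) || host === entry.slice(1)
    : host === entry
}

// Bun/curl/Go style: '*.example.com' is a glob matching subdomains only.
function globMatch(entry: string, host: string): boolean {
  return entry.startsWith('*.')
    ? host.endsWith(entry.slice(1)) // matches the '.example.com' suffix
    : host === entry
}

// 'api.anthropic.com' is bypassed by '.anthropic.com' under suffix
// semantics and by '*.anthropic.com' under glob semantics, while the
// bare 'anthropic.com' entry covers the apex domain in both — hence
// all three forms in the list.
```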
Environment Variable Propagation
export function getUpstreamProxyEnv(): Record<string, string> {
  if (!state.enabled || !state.port || !state.caBundlePath) {
    // If we inherited proxy vars from the parent, pass them through
    if (process.env.HTTPS_PROXY && process.env.SSL_CERT_FILE) {
      const inherited: Record<string, string> = {}
      for (const key of ['HTTPS_PROXY', 'https_proxy', 'NO_PROXY', 'no_proxy',
                         'SSL_CERT_FILE', 'NODE_EXTRA_CA_CERTS', 'REQUESTS_CA_BUNDLE',
                         'CURL_CA_BUNDLE']) {
        if (process.env[key]) inherited[key] = process.env[key]
      }
      return inherited
    }
    return {}
  }
  const proxyUrl = `http://127.0.0.1:${state.port}`
  return {
    HTTPS_PROXY: proxyUrl,
    https_proxy: proxyUrl, // lowercase for Python
    NO_PROXY: NO_PROXY_LIST,
    no_proxy: NO_PROXY_LIST, // lowercase for Python
    SSL_CERT_FILE: state.caBundlePath,
    NODE_EXTRA_CA_CERTS: state.caBundlePath,
    REQUESTS_CA_BUNDLE: state.caBundlePath, // Python requests
    CURL_CA_BUNDLE: state.caBundlePath, // curl
  }
}
Proxy environment variables are set in multiple formats to cover different client libraries:
- HTTPS_PROXY / https_proxy — both cases (Node.js reads uppercase, Python lowercase)
- SSL_CERT_FILE — OpenSSL generic
- NODE_EXTRA_CA_CERTS — Node.js specific
- REQUESTS_CA_BUNDLE — Python requests library
- CURL_CA_BUNDLE — curl command
Subprocesses (Bash, MCP, LSP, Hooks) all inherit these variables through subprocessEnv().
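How that inheritance might look is sketched below; the merge order is an assumption, and this subprocessEnv is a simplified stand-in for the real helper:

```typescript
// Simplified stand-in for subprocessEnv(): layer proxy vars over the
// parent environment so every child process (Bash, MCP, LSP, hooks)
// routes through the same local relay and trusts its CA.
function subprocessEnv(
  parent: Record<string, string | undefined>,
  proxyEnv: Record<string, string>,
): Record<string, string | undefined> {
  // Proxy settings win over any inherited values (assumed precedence).
  return { ...parent, ...proxyEnv }
}

const childEnv = subprocessEnv(
  { PATH: '/usr/bin', HTTPS_PROXY: 'http://stale:1' },
  { HTTPS_PROXY: 'http://127.0.0.1:9000' },
)
// childEnv.HTTPS_PROXY → 'http://127.0.0.1:9000'; childEnv.PATH survives
```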
Security Considerations Summary
Web Tool Security Layers
- Pre-approved domain allowlist — 100+ code documentation sites · GET requests only
- Permission system — domain:hostname rules · allow/deny/ask
- URL validation — must be a valid URL · rejects invalid formats
- Authentication warning — explicitly stated in prompt · WebFetch does not support auth
- Sandbox network restrictions — independent from WebFetch allowlist · controls all network operations
- Upstream proxy (CCR) — HTTPS MITM · token protection + prctl
Six layers of security protection, from the most permissive (pre-approved allowlist auto-approves) to the most restrictive (upstream proxy MITM interception), forming defense in depth.
The key security isolation: WebFetch allowlist ≠ sandbox network allowlist. huggingface.co may be a safe source for reading documentation (WebFetch), but allowing it through the sandbox for arbitrary network operations could turn it into a data exfiltration channel (it supports file uploads).
Design Takeaways
Claude Code's web tool design embodies several core principles:
- Least privilege — No domain access is allowed by default; only code documentation sites are pre-approved, and everything else requires explicit user authorization
- Security isolation — WebFetch (read-only GET) and the sandbox network (arbitrary operations) have independent allowlists; WebSearch needs no allowlist, since scope is controlled on the API side
- Fail open vs. fail closed — The upstream proxy fails open (it never breaks the session), while permission checks fail closed (they block access), reflecting the different risk levels of the two components
- Multi-runtime compatibility — Environment variables, NO_PROXY formats, CA certificate paths: every network configuration accounts for differences across Bun/Node.js/Python/curl