Introduction
An AI coding assistant's knowledge has a natural cutoff date — the point in time of the model's training data. When users ask "how do I use React 19's new API" or "what breaking changes are in the latest version of this npm package," the AI can only rely on internet access to get up-to-date information.
But letting AI access the internet introduces new security challenges:
- SSRF (Server-Side Request Forgery) — The AI could be injected with malicious URLs to access internal network services
- Data exfiltration — Malicious web pages could instruct the AI to send user code to external destinations
- Token bombs — A massive web page could consume the entire context space
- Credential leakage — If the AI accesses web pages with the user's cookies or tokens, it could leak credentials
Claude Code addresses these issues through two dedicated tools: WebFetchTool (fetches content from a specified URL) and WebSearchTool (searches the internet). In CCR (Claude Code Remote) environments, an additional upstream proxy layer provides extra network control.
WebFetchTool: Content Retrieval
WebFetchTool retrieves content from a specified URL and lets the AI process the fetched content using a natural language prompt.
Input Model
const inputSchema = lazySchema(() =>
  z.strictObject({
    url: z.string().url().describe('The URL to fetch content from'),
    prompt: z.string().describe('The prompt to run on the fetched content'),
  }),
)
Two parameters: url and prompt. The prompt is designed so the AI does not just fetch raw content, but extracts information with a specific purpose. For example: "Extract all endpoints and their parameters from this API documentation."
The output includes the HTTP status code, processed text, fetch duration, and content size:
const outputSchema = lazySchema(() =>
  z.object({
    bytes: z.number().describe('Size of the fetched content in bytes'),
    code: z.number().describe('HTTP response code'),
    codeText: z.string().describe('HTTP response code text'),
    result: z.string().describe('Processed result from applying the prompt'),
    durationMs: z.number().describe('Time taken to fetch and process'),
    url: z.string().describe('The URL that was fetched'),
  }),
)
Pre-Approved Domain Allowlist
One of WebFetchTool's most important security mechanisms is the pre-approved domain list:
export const PREAPPROVED_HOSTS = new Set([
  // Anthropic
  'platform.claude.com',
  'code.claude.com',
  'modelcontextprotocol.io',

  // Top Programming Languages
  'docs.python.org',
  'en.cppreference.com',
  'developer.mozilla.org',
  'doc.rust-lang.org',
  'www.typescriptlang.org',

  // Web Frameworks
  'react.dev',
  'nextjs.org',
  'vuejs.org',
  'tailwindcss.com',

  // Cloud & DevOps
  'docs.aws.amazon.com',
  'cloud.google.com',
  'kubernetes.io',

  // ... 100+ domains total
])
These domains can be accessed without user confirmation. The selection criterion for the list is "code-related documentation sites" — read-only reference materials that do not involve authentication or user data.
Note the security warning in the source code:
// SECURITY WARNING: These preapproved domains are ONLY for WebFetch (GET requests only).
// The sandbox system deliberately does NOT inherit this list for network restrictions,
// as arbitrary network access (POST, uploads, etc.) to these domains could enable
// data exfiltration. Some domains like huggingface.co, kaggle.com, and nuget.org
// allow file uploads and would be dangerous for unrestricted network access.
This is a critical security distinction: WebFetch only makes GET requests (read-only), while the sandbox's network restrictions control arbitrary network operations (including POST). The two cannot share an allowlist.
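The distinction can be made concrete with a small sketch. Everything below is illustrative (the helper and the two-host sample list are not Claude Code's actual code); it only shows the invariant: pre-approval short-circuits the permission prompt for GET requests and nothing else.

```typescript
// Illustrative sketch only; this helper is not Claude Code's API.
// Invariant shown: pre-approval applies to read-only GETs alone.
const SAMPLE_PREAPPROVED = new Set(['docs.python.org', 'react.dev'])

function needsUserApproval(url: string, method: string): boolean {
  const { hostname } = new URL(url)
  // Any non-GET request (POST, PUT, uploads, ...) could exfiltrate data,
  // so pre-approval never applies to it.
  if (method !== 'GET') return true
  return !SAMPLE_PREAPPROVED.has(hostname)
}

// needsUserApproval('https://docs.python.org/3/', 'GET')  → false
// needsUserApproval('https://docs.python.org/3/', 'POST') → true
// needsUserApproval('https://example.com/', 'GET')        → true
```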
Path-Level Pre-Approval
const { HOSTNAME_ONLY, PATH_PREFIXES } = (() => {
  const hosts = new Set<string>()
  const paths = new Map<string, string[]>()
  for (const entry of PREAPPROVED_HOSTS) {
    const slash = entry.indexOf('/')
    if (slash === -1) {
      hosts.add(entry)
    } else {
      const host = entry.slice(0, slash)
      const path = entry.slice(slash)
      const prefixes = paths.get(host)
      if (prefixes) prefixes.push(path)
      else paths.set(host, [path])
    }
  }
  return { HOSTNAME_ONLY: hosts, PATH_PREFIXES: paths }
})()

export function isPreapprovedHost(hostname: string, pathname: string): boolean {
  if (HOSTNAME_ONLY.has(hostname)) return true
  const prefixes = PATH_PREFIXES.get(hostname)
  if (prefixes) {
    for (const p of prefixes) {
      // Enforce path segment boundaries
      if (pathname === p || pathname.startsWith(p + '/')) return true
    }
  }
  return false
}
Some domains are only pre-approved for specific paths. For example, github.com/anthropics is pre-approved, but github.com/random-user is not. Path matching enforces segment boundaries (/), preventing /anthropics-evil/malware from being falsely matched.
The data structure is preprocessed at module load time into two lookup tables (the HOSTNAME_ONLY Set and the PATH_PREFIXES Map), making the hostname check an O(1) set lookup; path-restricted hosts add only a short linear scan over that host's few prefixes.
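The segment-boundary behavior is easy to verify with a self-contained re-creation of the lookup. The two-entry sample list below is illustrative; the real set has 100+ entries.

```typescript
// Minimal re-creation of the two-table lookup over a sample list.
const SAMPLE = ['react.dev', 'github.com/anthropics']

const hostnameOnly = new Set<string>()
const pathPrefixes = new Map<string, string[]>()
for (const entry of SAMPLE) {
  const slash = entry.indexOf('/')
  if (slash === -1) hostnameOnly.add(entry)
  else {
    const host = entry.slice(0, slash)
    const path = entry.slice(slash)
    const list = pathPrefixes.get(host)
    if (list) list.push(path)
    else pathPrefixes.set(host, [path])
  }
}

function isPreapprovedHost(hostname: string, pathname: string): boolean {
  if (hostnameOnly.has(hostname)) return true
  for (const p of pathPrefixes.get(hostname) ?? []) {
    // Segment boundary: '/anthropics-evil' must not match '/anthropics'
    if (pathname === p || pathname.startsWith(p + '/')) return true
  }
  return false
}

// isPreapprovedHost('react.dev', '/learn')                     → true
// isPreapprovedHost('github.com', '/anthropics/claude-code')   → true
// isPreapprovedHost('github.com', '/anthropics-evil/malware')  → false
```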
Permission Check Flow
Permission rules are stored in domain:hostname format. When a user approves access to a domain, all URLs on that domain are approved.
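A plausible sketch of that key derivation, assuming the rule key is just the `domain:` prefix plus the URL's hostname (the helper name here is hypothetical):

```typescript
// Hypothetical helper: the 'domain:' + hostname format comes from the
// text above; the function itself is illustrative, not Claude Code's code.
function permissionKeyFor(url: string): string {
  return `domain:${new URL(url).hostname}`
}

// Both URLs map to the same rule, so approving one approves the other:
// permissionKeyFor('https://example.org/docs/a') → 'domain:example.org'
// permissionKeyFor('https://example.org/api/b')  → 'domain:example.org'
```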
Authentication Warning in the Prompt
async prompt(_options) {
  return `IMPORTANT: WebFetch WILL FAIL for authenticated or private URLs. Before using this tool, check if the URL points to an authenticated service (e.g. Google Docs, Confluence, Jira, GitHub). If so, look for a specialized MCP tool that provides authenticated access.
${DESCRIPTION}`
},
This warning is always included in the prompt, regardless of whether ToolSearchTool is available. The source code comments explain why: if this prefix were conditionally toggled based on ToolSearch availability, the tool description would "flicker" between consecutive API calls, breaking the Anthropic API's prompt caching — each flicker costs two cache misses.
WebSearchTool: Internet Search
WebSearchTool uses Anthropic's Web Search API to search the internet. Unlike WebFetchTool, it does not fetch a specific URL but searches the entire internet.
Architectural Uniqueness
WebSearchTool is not simply calling a search API — it uses a model-within-a-model architecture:
async call(input, context, _canUseTool, _parentMessage, onProgress) {
  const { query } = input
  const userMessage = createUserMessage({
    content: 'Perform a web search for the query: ' + query,
  })
  const toolSchema = makeToolSchema(input)

  const queryStream = queryModelWithStreaming({
    messages: [userMessage],
    systemPrompt: asSystemPrompt([
      'You are an assistant for performing a web search tool use',
    ]),
    tools: [],
    signal: context.abortController.signal,
    options: {
      extraToolSchemas: [toolSchema],
      querySource: 'web_search_tool',
      // ...
    },
  })
  // ...
}
It creates an internal API call, passing a tool schema of type web_search_20250305. The API side automatically executes the search and returns results. The benefit of this architecture is that the actual search execution is handled by Anthropic's infrastructure; the client only needs to process the streaming response.
Search Limits
function makeToolSchema(input: Input): BetaWebSearchTool20250305 {
  return {
    type: 'web_search_20250305',
    name: 'web_search',
    allowed_domains: input.allowed_domains,
    blocked_domains: input.blocked_domains,
    max_uses: 8, // Hardcoded to 8 searches maximum
  }
}
Each call executes a maximum of 8 searches. allowed_domains and blocked_domains let the AI control the search scope — for example, searching only official documentation sites, or excluding known low-quality result sources.
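Since max_uses is hardcoded, scoping a search is purely a matter of the domain lists. A minimal self-contained sketch of the schema construction shown above (the Input interface is trimmed here to query plus the two domain fields; the example input is hypothetical):

```typescript
// Trimmed sketch of the schema builder; the cap cannot be raised via input.
interface Input {
  query: string
  allowed_domains?: string[]
  blocked_domains?: string[]
}

function makeToolSchema(input: Input) {
  return {
    type: 'web_search_20250305' as const,
    name: 'web_search' as const,
    allowed_domains: input.allowed_domains,
    blocked_domains: input.blocked_domains,
    max_uses: 8, // hardcoded: 8 searches maximum, regardless of input
  }
}

// Hypothetical call scoping a search to official React documentation:
const schema = makeToolSchema({
  query: 'React 19 use() hook',
  allowed_domains: ['react.dev'],
})
// schema.max_uses → 8
```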
Provider Availability
isEnabled() {
  const provider = getAPIProvider()
  const model = getMainLoopModel()

  if (provider === 'firstParty') return true

  if (provider === 'vertex') {
    const supportsWebSearch =
      model.includes('claude-opus-4') ||
      model.includes('claude-sonnet-4') ||
      model.includes('claude-haiku-4')
    return supportsWebSearch
  }

  if (provider === 'foundry') return true

  return false
},
WebSearchTool is only available on providers that support the Web Search API: Anthropic first-party, Google Vertex (Claude 4-family models only), and Foundry.
Progress Reporting
for await (const event of queryStream) {
  // Track tool use ID when server_tool_use starts
  if (event.type === 'stream_event' &&
      event.event?.type === 'content_block_start') {
    const contentBlock = event.event.content_block
    if (contentBlock?.type === 'server_tool_use') {
      currentToolUseId = contentBlock.id
      currentToolUseJson = ''
    }
  }

  // Accumulate JSON for current tool use
  if (currentToolUseId &&
      event.type === 'stream_event' &&
      event.event?.type === 'content_block_delta') {
    const delta = event.event.delta
    if (delta?.type === 'input_json_delta' && delta.partial_json) {
      currentToolUseJson += delta.partial_json
      // Try to extract query from partial JSON for progress updates
      // ...
    }
  }

  // Yield progress when search results come in
  if (event.type === 'stream_event' &&
      event.event?.type === 'content_block_start') {
    const contentBlock = event.event.content_block
    if (contentBlock?.type === 'web_search_tool_result') {
      // Report progress
      if (onProgress) {
        onProgress({
          toolUseID: toolUseId,
          data: { type: 'search_results_received', resultCount, query },
        })
      }
    }
  }
}
WebSearchTool reports progress during the search process via the onProgress callback. Since the search is streaming, it can update the UI in real-time as search results arrive, rather than waiting for all searches to complete before returning.
Upstream Proxy
In CCR (Claude Code Remote) environments, all network traffic is routed through an upstream proxy, providing additional security controls.
Initialization Flow
sequenceDiagram
participant CLI as Claude Code
participant Token as /run/ccr/session_token
participant API as Anthropic API
participant Relay as Local Relay
CLI->>Token: Read session token
Token-->>CLI: session_token
CLI->>CLI: prctl(PR_SET_DUMPABLE, 0)<br>Block ptrace from reading heap memory
CLI->>API: GET /v1/code/upstreamproxy/ca-cert
API-->>CLI: CA certificate
CLI->>CLI: Merge system CA + proxy CA
CLI->>Relay: Start CONNECT-to-WebSocket relay
Relay-->>CLI: Listening on 127.0.0.1:PORT
CLI->>Token: unlink(session_token)<br>Token exists only in heap memory
CLI->>CLI: Set environment variables<br>HTTPS_PROXY, SSL_CERT_FILE
export async function initUpstreamProxy(opts?) {
  if (!isEnvTruthy(process.env.CLAUDE_CODE_REMOTE)) return state
  if (!isEnvTruthy(process.env.CCR_UPSTREAM_PROXY_ENABLED)) return state

  const token = await readToken(tokenPath)
  if (!token) return state

  setNonDumpable()

  const caOk = await downloadCaBundle(baseUrl, systemCaPath, caBundlePath)
  if (!caOk) return state

  try {
    const relay = await startUpstreamProxyRelay({ wsUrl, sessionId, token })
    registerCleanup(async () => relay.stop())
    state = { enabled: true, port: relay.port, caBundlePath }

    // Only unlink after the listener is up
    await unlink(tokenPath).catch(() => {})
  } catch (err) {
    // Fail open — a broken proxy must never break a session
  }

  return state
}
Key security measures:
- prctl protection — PR_SET_DUMPABLE=0 prevents same-UID processes from reading this process's heap memory via ptrace. This blocks prompt injection attacks that attempt to steal the session token via gdb -p $PPID
- Token file deletion — The token is deleted from disk after the relay starts successfully, remaining only in process memory. Deletion occurs only after the relay confirms availability, so the supervisor can retry with the on-disk token if startup fails
- Fail open — Failure at any step simply disables the proxy without interrupting the session. The comment makes it clear: "A broken proxy setup must never break an otherwise-working session."
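The fail-open behavior reduces to a simple pattern, sketched here with illustrative names (not Claude Code's actual signatures):

```typescript
// Illustrative fail-open wrapper: any startup error leaves the proxy
// disabled but lets the session continue.
type ProxyState = { enabled: boolean; port?: number }

async function initWithFailOpen(
  startRelay: () => Promise<number>, // resolves to the relay's local port
): Promise<ProxyState> {
  try {
    const port = await startRelay()
    return { enabled: true, port }
  } catch {
    // Swallow the error: a broken proxy must never break the session.
    return { enabled: false }
  }
}
```

The trade-off is deliberate: the proxy adds defense in depth, so losing it degrades security monitoring rather than correctness, whereas crashing the session would fail the user outright.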
NO_PROXY List
const NO_PROXY_LIST = [
  'localhost', '127.0.0.1', '::1',
  '169.254.0.0/16', // Link-local
  '10.0.0.0/8',     // RFC1918
  '172.16.0.0/12',
  '192.168.0.0/16',

  // Anthropic API — three forms because NO_PROXY parsing differs:
  'anthropic.com',   // apex domain fallback
  '.anthropic.com',  // Python urllib/httpx (suffix match)
  '*.anthropic.com', // Bun, curl, Go (glob match)

  'github.com',
  'registry.npmjs.org',
  'pypi.org',
].join(',')
The same Anthropic API domain appears in three different formats because different runtimes (Bun, Python, Go) parse NO_PROXY differently. This defensive programming ensures Anthropic API requests never go through the upstream proxy, preventing the MITM proxy's substitute CA from breaking HTTPS validation in non-Bun runtimes.
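The divergence can be sketched with two simplified matchers (an assumption: real parsers also handle ports, CIDR entries, and case folding, which are omitted here):

```typescript
// Simplified sketch of two divergent NO_PROXY matching behaviors.

// Python urllib/httpx style: a leading-dot entry is a suffix match
// that also covers the apex domain.
function suffixMatch(entry: string, host: string): boolean {
  return entry.startsWith('.')
    ? host.endsWith(entry) || host === entry.slice(1)
    : host === entry
}

// Bun/curl/Go style: '*.example.com' is a glob matching subdomains only.
function globMatch(entry: string, host: string): boolean {
  return entry.startsWith('*.')
    ? host.endsWith(entry.slice(1)) // matches the '.example.com' suffix
    : host === entry
}

// 'api.anthropic.com' is bypassed by '.anthropic.com' under suffix
// semantics and by '*.anthropic.com' under glob semantics, while the
// bare 'anthropic.com' entry covers the apex domain in both — hence
// all three forms in the list.
```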
Environment Variable Propagation
export function getUpstreamProxyEnv(): Record<string, string> {
  if (!state.enabled || !state.port || !state.caBundlePath) {
    // If we inherited proxy vars from the parent, pass them through
    if (process.env.HTTPS_PROXY && process.env.SSL_CERT_FILE) {
      const inherited: Record<string, string> = {}
      for (const key of ['HTTPS_PROXY', 'https_proxy', 'NO_PROXY', 'no_proxy',
                         'SSL_CERT_FILE', 'NODE_EXTRA_CA_CERTS', 'REQUESTS_CA_BUNDLE',
                         'CURL_CA_BUNDLE']) {
        if (process.env[key]) inherited[key] = process.env[key]
      }
      return inherited
    }
    return {}
  }
  const proxyUrl = `http://127.0.0.1:${state.port}`
  return {
    HTTPS_PROXY: proxyUrl,
    https_proxy: proxyUrl, // lowercase for Python
    NO_PROXY: NO_PROXY_LIST,
    no_proxy: NO_PROXY_LIST, // lowercase for Python
    SSL_CERT_FILE: state.caBundlePath,
    NODE_EXTRA_CA_CERTS: state.caBundlePath,
    REQUESTS_CA_BUNDLE: state.caBundlePath, // Python requests
    CURL_CA_BUNDLE: state.caBundlePath, // curl
  }
}
Proxy environment variables are set in multiple formats to cover different client libraries:
- HTTPS_PROXY / https_proxy — both cases (Node.js reads uppercase, Python lowercase)
- SSL_CERT_FILE — OpenSSL generic
- NODE_EXTRA_CA_CERTS — Node.js specific
- REQUESTS_CA_BUNDLE — Python requests library
- CURL_CA_BUNDLE — curl command
Subprocesses (Bash, MCP, LSP, Hooks) all inherit these variables through subprocessEnv().
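How that inheritance might look is sketched below; the merge order is an assumption, and this subprocessEnv is a simplified stand-in for the real helper:

```typescript
// Simplified stand-in for subprocessEnv(): layer proxy vars over the
// parent environment so every child process (Bash, MCP, LSP, hooks)
// routes through the same local relay and trusts its CA.
function subprocessEnv(
  parent: Record<string, string | undefined>,
  proxyEnv: Record<string, string>,
): Record<string, string | undefined> {
  // Proxy settings win over any inherited values (assumed precedence).
  return { ...parent, ...proxyEnv }
}

const childEnv = subprocessEnv(
  { PATH: '/usr/bin', HTTPS_PROXY: 'http://stale:1' },
  { HTTPS_PROXY: 'http://127.0.0.1:9000' },
)
// childEnv.HTTPS_PROXY → 'http://127.0.0.1:9000'; childEnv.PATH survives
```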
Security Considerations Summary
Web Tool Security Layers
- Pre-approved domain allowlist — 100+ code documentation sites · GET requests only
- Permission system — domain:hostname rules · allow/deny/ask
- URL validation — must be a valid URL · rejects invalid formats
- Authentication warning — explicitly stated in prompt · WebFetch does not support auth
- Sandbox network restrictions — independent from WebFetch allowlist · controls all network operations
- Upstream proxy (CCR) — HTTPS MITM · token protection + prctl
Six layers of security protection, from the most permissive (pre-approved allowlist auto-approves) to the most restrictive (upstream proxy MITM interception), forming defense in depth.
The key security isolation: WebFetch allowlist ≠ sandbox network allowlist. huggingface.co may be a safe source for reading documentation (WebFetch), but allowing it through the sandbox for arbitrary network operations could turn it into a data exfiltration channel (it supports file uploads).
Design Takeaways
Claude Code's web tool design embodies several core principles:
- Least privilege — No domain access is allowed by default; only code documentation sites are pre-approved, and everything else requires explicit user authorization
- Security isolation — WebFetch (read-only GET) and the sandbox network (arbitrary operations) have independent allowlists; WebSearch needs no allowlist, since scope is controlled on the API side
- Fail open vs. fail closed — The upstream proxy fails open (it never breaks the session), while permission checks fail closed (they block access), reflecting the different risk levels of the two components
- Multi-runtime compatibility — Environment variables, NO_PROXY formats, CA certificate paths: every network configuration accounts for differences across Bun/Node.js/Python/curl