The Search System: The Glob + Grep + Full-Text Search Combination

A deep dive into Claude Code's search tool combination — GlobTool pattern matching, GrepTool ripgrep integration, and ToolSearchTool deferred discovery

Introduction

When AI faces an unfamiliar codebase, its first question is not "how do I modify the code" but "where is the code." In a project containing tens of thousands of files, finding the right files and code locations is a prerequisite for everything else.

The traditional approach is to use find and grep commands. But these commands have several problems:

  1. Uncontrolled permissions: shell commands bypass Claude Code's permission system
  2. Uncontrolled output: grep -r pattern . might return several MB of results, consuming a massive number of tokens
  3. Unfriendly format: shell command output is not always optimal for AI consumption

Claude Code's solution is three dedicated search tools: GlobTool (find files by name pattern), GrepTool (search by content), and ToolSearchTool (deferred tool discovery). Each solves a different layer of the search problem, and when used together they form a powerful search system.


GlobTool: File Pattern Matching

GlobTool is the simplest search tool — given a glob pattern (like **/*.ts), it returns all matching file paths.

Input and Output

src/tools/GlobTool/GlobTool.ts:26-53
TypeScript
const inputSchema = lazySchema(() =>
  z.strictObject({
    pattern: z.string().describe('The glob pattern to match files against'),
    path: z
      .string()
      .optional()
      .describe(
        'The directory to search in. If not specified, the current working directory will be used...',
      ),
  }),
)

const outputSchema = lazySchema(() =>
  z.object({
    durationMs: z.number().describe('Time taken to execute the search'),
    numFiles: z.number().describe('Total number of files found'),
    filenames: z.array(z.string()).describe('Array of file paths'),
    truncated: z.boolean().describe('Whether results were truncated (limited to 100 files)'),
  }),
)

Only two input parameters: pattern and an optional path. The output contains four fields, where the truncated flag tells the AI whether results were cut off.

100-File Truncation

src/tools/GlobTool/GlobTool.ts:154-176
TypeScript
  async call(input, { abortController, getAppState, globLimits }) {
    const start = Date.now()
    const appState = getAppState()
    const limit = globLimits?.maxResults ?? 100
    const { files, truncated } = await glob(
      input.pattern,
      GlobTool.getPath(input),
      { limit, offset: 0 },
      abortController.signal,
      appState.toolPermissionContext,
    )
    // Relativize paths under cwd to save tokens
    const filenames = files.map(toRelativePath)
    const output: Output = {
      filenames,
      durationMs: Date.now() - start,
      numFiles: filenames.length,
      truncated,
    }
    return { data: output }
  },

The default limit is 100 files. When results are truncated, a message prompts the AI to use a more specific path or pattern.
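
As a rough illustration of the cap described above, the following sketch applies a limit to an in-memory list and reports whether anything was dropped. The helper name capResults and the GlobPage type are hypothetical, and the real glob() applies the limit while walking the filesystem rather than after the fact.

```typescript
// Hypothetical result shape, mirroring three of the four output fields.
interface GlobPage {
  filenames: string[]
  numFiles: number
  truncated: boolean
}

// Hypothetical helper: keep at most `limit` paths and flag whether more existed.
function capResults(allFiles: string[], limit = 100): GlobPage {
  const filenames = allFiles.slice(0, limit)
  return {
    filenames,
    numFiles: filenames.length, // count of returned paths, as in the excerpt above
    truncated: allFiles.length > limit,
  }
}

// 250 synthetic paths, enough to exceed the default cap
const manyFiles = Array.from({ length: 250 }, (_, i) => `src/file${i}.ts`)
const cappedPage = capResults(manyFiles)
```

Note that in the excerpt numFiles is computed from the returned list, so under truncation it reports the number of paths returned, not the number that matched.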

The truncation message:

src/tools/GlobTool/GlobTool.ts:186-196
TypeScript
  mapToolResultToToolResultBlockParam(output, toolUseID) {
    if (output.filenames.length === 0) {
      return { tool_use_id: toolUseID, type: 'tool_result', content: 'No files found' }
    }
    return {
      tool_use_id: toolUseID,
      type: 'tool_result',
      content: [
        ...output.filenames,
        ...(output.truncated
          ? ['(Results are truncated. Consider using a more specific path or pattern.)']
          : []),
      ].join('\n'),
    }
  },

Path Relativization

TypeScript
// Relativize paths under cwd to save tokens (same as GrepTool)
const filenames = files.map(toRelativePath)

All returned paths are relativized — /Users/noah/project/src/index.ts becomes src/index.ts. This is a token optimization: the project root path prefix in absolute paths repeats for every file, and relativizing saves a significant number of tokens.
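
To make the saving concrete, here is a minimal sketch of the idea. The function relativizeUnderCwd is a hypothetical stand-in for toRelativePath (whose implementation is not shown in the source), the paths are made up, and character count is used only as a rough proxy for tokens.

```typescript
// Hypothetical stand-in for toRelativePath: strip the cwd prefix when the path
// lives under cwd, and leave any other path unchanged.
function relativizeUnderCwd(absolutePath: string, workingDir: string): string {
  const prefix = workingDir.endsWith('/') ? workingDir : workingDir + '/'
  return absolutePath.startsWith(prefix)
    ? absolutePath.slice(prefix.length)
    : absolutePath
}

const projectRoot = '/Users/noah/project'
const absolutePaths = [
  '/Users/noah/project/src/index.ts',
  '/Users/noah/project/src/tools/GlobTool/GlobTool.ts',
  '/etc/hosts', // outside the project, so it stays absolute
]
const relativePaths = absolutePaths.map(p => relativizeUnderCwd(p, projectRoot))

// Characters removed from the listing: 20 per path under the project root
const charsSaved =
  absolutePaths.join('\n').length - relativePaths.join('\n').length
```

With a 100-file result, the same 20-character prefix would otherwise be repeated 100 times.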

Concurrency Safety

src/tools/GlobTool/GlobTool.ts:76-81
TypeScript
  isConcurrencySafe() {
    return true
  },
  isReadOnly() {
    return true
  },

GlobTool is a completely concurrency-safe, read-only operation. Multiple GlobTool calls can execute in parallel without interfering with each other. This means the AI can simultaneously search **/*.ts and **/*.tsx without serialization.
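
One way a scheduler could use this flag is sketched below. This is not the scheduler from the source, which is not shown; PendingCall and groupIntoBatches are hypothetical names, and the flag values for Edit and Grep are assumptions made for the example.

```typescript
// Hypothetical, simplified view of a pending tool call.
interface PendingCall {
  tool: string
  isConcurrencySafe: boolean
}

// Group consecutive concurrency-safe calls into one batch that may run in
// parallel; every other call gets a batch of its own so it runs alone.
function groupIntoBatches(calls: PendingCall[]): PendingCall[][] {
  const batches: PendingCall[][] = []
  let currentSafeBatch: PendingCall[] = []
  for (const call of calls) {
    if (call.isConcurrencySafe) {
      currentSafeBatch.push(call)
      continue
    }
    if (currentSafeBatch.length > 0) {
      batches.push(currentSafeBatch)
      currentSafeBatch = []
    }
    batches.push([call])
  }
  if (currentSafeBatch.length > 0) batches.push(currentSafeBatch)
  return batches
}

const callBatches = groupIntoBatches([
  { tool: 'Glob', isConcurrencySafe: true }, // e.g. **/*.ts
  { tool: 'Glob', isConcurrencySafe: true }, // e.g. **/*.tsx
  { tool: 'Edit', isConcurrencySafe: false }, // assumed to need exclusive execution
  { tool: 'Grep', isConcurrencySafe: true }, // assumed safe for this example
])
// Three batches: [Glob, Glob], [Edit], [Grep]
```

Each inner array could then be awaited with Promise.all, while the batches themselves run one after another.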


GrepTool: Content Search Based on ripgrep

GrepTool is the core of the search system, built on ripgrep (rg), providing capabilities far beyond native grep.

Rich Input Schema

src/tools/GrepTool/GrepTool.ts:33-89
TypeScript
const inputSchema = lazySchema(() =>
  z.strictObject({
    pattern: z.string().describe('The regular expression pattern to search for'),
    path: z.string().optional().describe('File or directory to search in'),
    glob: z.string().optional().describe('Glob pattern to filter files'),
    output_mode: z.enum(['content', 'files_with_matches', 'count']).optional(),
    '-B': semanticNumber(z.number().optional()).describe('Lines before match'),
    '-A': semanticNumber(z.number().optional()).describe('Lines after match'),
    '-C': semanticNumber(z.number().optional()).describe('Alias for context'),
    context: semanticNumber(z.number().optional()).describe('Lines before and after'),
    '-n': semanticBoolean(z.boolean().optional()).describe('Show line numbers'),
    '-i': semanticBoolean(z.boolean().optional()).describe('Case insensitive'),
    type: z.string().optional().describe('File type (js, py, rust, etc.)'),
    head_limit: semanticNumber(z.number().optional()).describe('Limit output'),
    offset: semanticNumber(z.number().optional()).describe('Skip first N entries'),
    multiline: semanticBoolean(z.boolean().optional()).describe('Multiline mode'),
  }),
)

The schema above lists 14 parameters! This is the tool with the most parameters in Claude Code. The design philosophy is: expose ripgrep's core capabilities directly to the AI, rather than over-abstracting.

Three Output Modes

Figure: GrepTool output modes. files_with_matches (default) returns filenames only; content returns matching lines with context; count returns match counts as filename:count.
  • files_with_matches — Default mode. Returns only the paths of matching files, sorted by modification time. Ideal for locating before detailed reading
  • content — Returns matching lines with their context. Supports -A/-B/-C for controlling context lines
  • count — Returns the match count per file. Useful for quickly assessing search scope
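
The three modes line up with standard ripgrep flags. The sketch below shows one plausible mapping; the function outputModeArgs is hypothetical, and the source excerpt does not show how GrepTool actually translates output_mode, so treat the mapping as an assumption.

```typescript
type OutputMode = 'content' | 'files_with_matches' | 'count'

// Assumed mapping onto ripgrep flags; the real argument builder may differ.
function outputModeArgs(mode: OutputMode = 'files_with_matches'): string[] {
  switch (mode) {
    case 'files_with_matches':
      return ['-l'] // print only the paths of files with at least one match
    case 'count':
      return ['-c'] // print the number of matching lines per file
    case 'content':
      return [] // ripgrep's default output is already the matching lines
  }
}

const defaultModeArgs = outputModeArgs()
const countModeArgs = outputModeArgs('count')
const contentModeArgs = outputModeArgs('content')
```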

Pagination System

src/tools/GrepTool/GrepTool.ts:106-128
TypeScript
const DEFAULT_HEAD_LIMIT = 250

function applyHeadLimit<T>(
  items: T[],
  limit: number | undefined,
  offset: number = 0,
): { items: T[]; appliedLimit: number | undefined } {
  // Explicit 0 = unlimited escape hatch
  if (limit === 0) {
    return { items: items.slice(offset), appliedLimit: undefined }
  }
  const effectiveLimit = limit ?? DEFAULT_HEAD_LIMIT
  const sliced = items.slice(offset, offset + effectiveLimit)
  // Only report appliedLimit when truncation actually occurred
  const wasTruncated = items.length - offset > effectiveLimit
  return {
    items: sliced,
    appliedLimit: wasTruncated ? effectiveLimit : undefined,
  }
}

The default limit is 250 results. Design highlights:

  1. limit: 0 is the "unlimited" escape hatch
  2. appliedLimit is only set when truncation actually occurs, telling the AI it can use offset to page through more results
  3. Together, offset and head_limit behave like tail -n +(offset+1) | head -n limit: skip the first offset entries, then keep at most limit of the rest
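
The paging behaviour can be traced with a small loop. The function below repeats the applyHeadLimit logic shown above so that the sketch is self-contained; the paging loop and the synthetic match list are illustrative assumptions about how a caller might page, not code from the source.

```typescript
const DEFAULT_HEAD_LIMIT = 250

// Same logic as the applyHeadLimit excerpt above.
function applyHeadLimit<T>(
  items: T[],
  limit: number | undefined,
  offset: number = 0,
): { items: T[]; appliedLimit: number | undefined } {
  if (limit === 0) {
    return { items: items.slice(offset), appliedLimit: undefined }
  }
  const effectiveLimit = limit ?? DEFAULT_HEAD_LIMIT
  const sliced = items.slice(offset, offset + effectiveLimit)
  const wasTruncated = items.length - offset > effectiveLimit
  return { items: sliced, appliedLimit: wasTruncated ? effectiveLimit : undefined }
}

// 600 synthetic matches, paged with the default limit
const allMatches = Array.from({ length: 600 }, (_, i) => `match-${i}`)
const pageSizes: number[] = []
let nextOffset = 0
while (true) {
  const { items, appliedLimit } = applyHeadLimit(allMatches, undefined, nextOffset)
  pageSizes.push(items.length)
  if (appliedLimit === undefined) break // nothing was cut off: last page
  nextOffset += appliedLimit
}
// pageSizes is [250, 250, 100]
```

The absence of appliedLimit on the final page is what tells the caller to stop paging.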

Excluded Directories

src/tools/GrepTool/GrepTool.ts:94-102
TypeScript
const VCS_DIRECTORIES_TO_EXCLUDE = [
  '.git', '.svn', '.hg', '.bzr', '.jj', '.sl',
] as const

Version control directories are automatically excluded, since searching inside .git is almost never useful and produces a lot of noise. Six version control systems are supported (Git, SVN, Mercurial, Bazaar, Jujutsu, Sapling).

ripgrep Argument Construction

src/tools/GrepTool/GrepTool.ts:329-441
TypeScript
  async call({ pattern, path, glob, type, output_mode = 'files_with_matches',
    '-B': context_before, '-A': context_after, '-C': context_c, context,
    '-n': show_line_numbers = true, '-i': case_insensitive = false,
    head_limit, offset = 0, multiline = false,
  }, { abortController, getAppState }) {
    const absolutePath = path ? expandPath(path) : getCwd()
    const args = ['--hidden']

    for (const dir of VCS_DIRECTORIES_TO_EXCLUDE) {
      args.push('--glob', `!${dir}`)
    }

    args.push('--max-columns', '500') // Limit line length

    if (multiline) {
      args.push('-U', '--multiline-dotall')
    }
    // ... build more args
  }

Note --max-columns 500: ripgrep does not print lines longer than 500 bytes (it omits them rather than cutting them short), which prevents base64-encoded or minified content (typically thousands of characters per line) from flooding search results.

files_with_matches Mode Sorting

src/tools/GrepTool/GrepTool.ts:529-553
TypeScript
    const stats = await Promise.allSettled(
      results.map(_ => getFsImplementation().stat(_)),
    )
    const sortedMatches = results
      .map((_, i) => {
        const r = stats[i]!
        return [
          _,
          r.status === 'fulfilled' ? (r.value.mtimeMs ?? 0) : 0,
        ] as const
      })
      .sort((a, b) => {
        if (process.env.NODE_ENV === 'test') {
          return a[0].localeCompare(b[0]) // Sort by filename in tests for determinism
        }
        const timeComparison = b[1] - a[1]
        if (timeComparison === 0) {
          return a[0].localeCompare(b[0]) // Filename as tiebreaker
        }
        return timeComparison
      })

Results are sorted by modification time in descending order by default — the most recently modified files appear first. The design assumption is: users most likely care about the most recently active files. In test environments, sorting is switched to filename-based for determinism.

Promise.allSettled is used instead of Promise.all: if a file is deleted between the ripgrep scan and the stat call, it will not cause the entire batch to fail. Failed stats are treated as mtime 0.
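
The ordering rules can be checked in isolation with a synchronous sketch. The comparator below follows the non-test branch of the excerpt above; the PathWithMtime type, the function name, and the sample paths and timestamps are made up for illustration.

```typescript
// [path, mtimeMs], with 0 standing in for a stat call that failed.
type PathWithMtime = readonly [string, number]

// Newest first; equal timestamps fall back to filename order.
function byNewestThenName(a: PathWithMtime, b: PathWithMtime): number {
  const timeComparison = b[1] - a[1]
  return timeComparison !== 0 ? timeComparison : a[0].localeCompare(b[0])
}

const sortCandidates: PathWithMtime[] = [
  ['src/old.ts', 1000],
  ['src/deleted.ts', 0], // stat failed, so it is treated as mtime 0
  ['src/b.ts', 5000],
  ['src/a.ts', 5000], // same mtime as b.ts: the filename breaks the tie
]
const orderedPaths = [...sortCandidates].sort(byNewestThenName).map(([p]) => p)
// orderedPaths: src/a.ts, src/b.ts, src/old.ts, src/deleted.ts
```

A file whose stat failed sinks to the bottom instead of aborting the whole sort.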

Ignore Pattern Integration

src/tools/GrepTool/GrepTool.ts:412-427
TypeScript
    const ignorePatterns = normalizePatternsToPath(
      getFileReadIgnorePatterns(appState.toolPermissionContext),
      getCwd(),
    )
    for (const ignorePattern of ignorePatterns) {
      const rgIgnorePattern = ignorePattern.startsWith('/')
        ? `!${ignorePattern}`
        : `!**/${ignorePattern}`
      args.push('--glob', rgIgnorePattern)
    }

Deny rules configured in the permission system are converted to ripgrep glob exclusion patterns. Non-absolute paths require a **/ prefix because ripgrep only applies gitignore patterns relative to the working directory.
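
The conversion is small enough to try on its own. In the sketch below, the function name toRgExclusionGlob and the three sample patterns are hypothetical; only the branching rule is taken from the excerpt above.

```typescript
// Turn one ignore pattern into a ripgrep --glob exclusion:
// rooted patterns are negated as-is, all others get a **/ prefix.
function toRgExclusionGlob(ignorePattern: string): string {
  return ignorePattern.startsWith('/')
    ? `!${ignorePattern}`
    : `!**/${ignorePattern}`
}

// Hypothetical deny patterns, for illustration only
const samplePatterns = ['/secrets/prod.env', 'node_modules', '*.pem']

const exclusionArgs: string[] = []
for (const samplePattern of samplePatterns) {
  exclusionArgs.push('--glob', toRgExclusionGlob(samplePattern))
}
// exclusionArgs:
// --glob !/secrets/prod.env --glob !**/node_modules --glob !**/*.pem
```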


ToolSearchTool: Deferred Tool Discovery

ToolSearchTool addresses a completely different search problem: when Claude Code has 100+ tools, how does the AI efficiently find the ones it needs?

Motivation for Deferred Loading

Figure: Tool loading strategy. All tools (100+) are split into core tools (Read, Write, Edit, Bash, Glob, Grep), which are loaded immediately, and deferred tools (MCP tools, LSP, WebFetch, WebSearch), which are discovered on demand.

If the full schemas of all tools were included in the initial prompt, it would consume a massive number of tokens. ToolSearchTool implements a kind of "tool directory": deferred tools only have their names listed in the system-reminder, and the AI retrieves full definitions through ToolSearchTool when needed.

Deferred Tool Determination

src/tools/ToolSearchTool/prompt.ts:62-108
TypeScript
export function isDeferredTool(tool: Tool): boolean {
  // Explicit opt-out via _meta['anthropic/alwaysLoad']
  if (tool.alwaysLoad === true) return false

  // MCP tools are always deferred (workflow-specific)
  if (tool.isMcp === true) return true

  // Never defer ToolSearch itself
  if (tool.name === TOOL_SEARCH_TOOL_NAME) return false

  // Agent tool must be available turn 1 in fork-first mode
  if (feature('FORK_SUBAGENT') && tool.name === AGENT_TOOL_NAME) {
    if (m.isForkSubagentEnabled()) return false
  }

  return tool.shouldDefer === true
}

The priority of determination rules:

  1. alwaysLoad: true — Never deferred (MCP tools can set this via _meta)
  2. MCP tools — Deferred by default (workflow-specific)
  3. ToolSearchTool itself — Never deferred (the tool used to load other tools cannot be deferred)
  4. Special tools (Agent, Brief) — Conditionally not deferred
  5. shouldDefer: true — Deferred
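
The effect of this ordering is easiest to see on a few sample tools. The sketch below is a simplified version of the check that leaves out the Agent and Brief special cases; the ToolLike type, the value of TOOL_SEARCH_NAME, and the tool named mcp__docs__lookup are assumptions made for the example.

```typescript
// Hypothetical, reduced tool shape with only the fields the check reads.
interface ToolLike {
  name: string
  isMcp?: boolean
  alwaysLoad?: boolean
  shouldDefer?: boolean
}

const TOOL_SEARCH_NAME = 'ToolSearch' // assumed value of TOOL_SEARCH_TOOL_NAME

// Simplified priority chain: earlier rules win over later ones.
function isDeferredSketch(tool: ToolLike): boolean {
  if (tool.alwaysLoad === true) return false // rule 1 beats the MCP default
  if (tool.isMcp === true) return true // rule 2
  if (tool.name === TOOL_SEARCH_NAME) return false // rule 3
  return tool.shouldDefer === true // rule 5
}

const deferralByTool = {
  mcpDefault: isDeferredSketch({ name: 'mcp__slack__send_message', isMcp: true }),
  mcpAlwaysLoad: isDeferredSketch({ name: 'mcp__docs__lookup', isMcp: true, alwaysLoad: true }),
  toolSearch: isDeferredSketch({ name: TOOL_SEARCH_NAME, shouldDefer: true }),
  optedIn: isDeferredSketch({ name: 'WebFetch', shouldDefer: true }),
  coreTool: isDeferredSketch({ name: 'Read' }),
}
```

Because alwaysLoad is checked first, an MCP tool that sets it is loaded up front even though MCP tools are otherwise deferred.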

Two Query Modes

src/tools/ToolSearchTool/ToolSearchTool.ts:21-33
TypeScript
export const inputSchema = lazySchema(() =>
  z.object({
    query: z
      .string()
      .describe(
        'Query to find deferred tools. Use "select:<tool_name>" for direct selection, or keywords to search.',
      ),
    max_results: z
      .number()
      .optional()
      .default(5)
      .describe('Maximum number of results to return (default: 5)'),
  }),
)

select: mode — Exact selection: select:Read,Edit,Grep fetches tools directly by name. Supports comma-separated multi-selection.

Keyword search — Fuzzy search: notebook jupyter searches tool names and descriptions, returning the most relevant results.
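
A minimal sketch of how a query string could be routed to one of the two modes is given below. The parser is hypothetical: the source does not show the real parsing code, so details such as trimming, lowercasing of keywords, and dropping empty names are assumptions.

```typescript
type ParsedQuery =
  | { mode: 'select'; toolNames: string[] }
  | { mode: 'keywords'; terms: string[] }

const SELECT_PREFIX = 'select:'

// Hypothetical parser: "select:A,B" picks tools by name, anything else is
// treated as whitespace-separated keywords.
function parseToolSearchQuery(rawQuery: string): ParsedQuery {
  const trimmed = rawQuery.trim()
  if (trimmed.startsWith(SELECT_PREFIX)) {
    const toolNames = trimmed
      .slice(SELECT_PREFIX.length)
      .split(',')
      .map(n => n.trim())
      .filter(n => n.length > 0)
    return { mode: 'select', toolNames }
  }
  const terms = trimmed
    .toLowerCase()
    .split(/\s+/)
    .filter(t => t.length > 0)
  return { mode: 'keywords', terms }
}

const selectQuery = parseToolSearchQuery('select:Read,Edit,Grep')
const keywordQuery = parseToolSearchQuery('notebook jupyter')
```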

Keyword Scoring Algorithm

src/tools/ToolSearchTool/ToolSearchTool.ts:259-301
TypeScript
async function searchToolsWithKeywords(query, deferredTools, tools, maxResults) {
  // ...
  const scored = await Promise.all(
    candidateTools.map(async tool => {
      const parsed = parseToolName(tool.name)
      const description = await getToolDescriptionMemoized(tool.name, tools)
      const hintNormalized = tool.searchHint?.toLowerCase() ?? ''

      let score = 0
      for (const term of allScoringTerms) {
        const pattern = termPatterns.get(term)!

        // Exact part match (high weight for MCP server names)
        if (parsed.parts.includes(term)) {
          score += parsed.isMcp ? 12 : 10
        } else if (parsed.parts.some(part => part.includes(term))) {
          score += parsed.isMcp ? 6 : 5
        }

        // searchHint match — curated phrase, higher signal than prompt
        if (hintNormalized && pattern.test(hintNormalized)) {
          score += 4
        }

        // Description match - word boundary to avoid false positives
        if (pattern.test(descNormalized)) {
          score += 2
        }
      }

      return { name: tool.name, score }
    }),
  )
}

Scoring tiers:

| Match Type | Score (Regular) | Score (MCP) |
| --- | --- | --- |
| Exact tool name part match | 10 | 12 |
| Tool name contains match | 5 | 6 |
| searchHint match | 4 | 4 |
| Full name fallback match | 3 | 3 |
| Description word boundary match | 2 | 2 |

MCP tools receive higher name-match scores because MCP tool names typically contain server names (e.g., mcp__slack__send_message), and searching by server name is the most common query pattern.
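
A worked example helps show how the tiers add up. The sketch below is a simplified, synchronous scorer under stated assumptions: parseNameSketch is a guess at how tool names are split into parts (the real parseToolName is not shown), the full-name fallback tier is left out because its code is not in the excerpt, and the tool descriptions are invented.

```typescript
interface ScorableTool {
  name: string
  description: string
  searchHint?: string
}

// Assumed name parsing: "mcp__server__action_name" -> [server, action, name].
function parseNameSketch(toolName: string): { parts: string[]; isMcp: boolean } {
  const isMcp = toolName.startsWith('mcp__')
  const body = isMcp ? toolName.slice('mcp__'.length) : toolName
  const parts = body.toLowerCase().split(/_+/).filter(p => p.length > 0)
  return { parts, isMcp }
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Sums the name, searchHint, and description tiers for every query term.
function scoreToolSketch(tool: ScorableTool, terms: string[]): number {
  const { parts, isMcp } = parseNameSketch(tool.name)
  const descNormalized = tool.description.toLowerCase()
  const hintNormalized = tool.searchHint?.toLowerCase() ?? ''
  let score = 0
  for (const term of terms) {
    const wordPattern = new RegExp(`\\b${escapeRegExp(term)}\\b`)
    if (parts.includes(term)) {
      score += isMcp ? 12 : 10 // exact name part
    } else if (parts.some(part => part.includes(term))) {
      score += isMcp ? 6 : 5 // name part contains the term
    }
    if (hintNormalized && wordPattern.test(hintNormalized)) score += 4
    if (wordPattern.test(descNormalized)) score += 2
  }
  return score
}

const slackScore = scoreToolSketch(
  { name: 'mcp__slack__send_message', description: 'Send a message to a Slack channel' },
  ['slack', 'send'],
) // (12 + 2) for "slack" plus (12 + 2) for "send" = 28

const fetchScore = scoreToolSketch(
  { name: 'WebFetch', description: 'Fetch a URL and return its contents' },
  ['fetch'],
) // 5 for the partial name match plus 2 for the description = 7
```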

+ Prefix for Required Terms

src/tools/ToolSearchTool/ToolSearchTool.ts:223-232
TypeScript
  const requiredTerms: string[] = []
  const optionalTerms: string[] = []
  for (const term of queryTerms) {
    if (term.startsWith('+') && term.length > 1) {
      requiredTerms.push(term.slice(1))
    } else {
      optionalTerms.push(term)
    }
  }

+slack send means: the tool name or description must contain "slack," and then results are ranked by relevance to "send" among the qualifying tools. This makes searches more precise.
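
The two-stage behaviour (filter on required terms, then rank) can be sketched as follows. The term splitting follows the excerpt above; the filter function, the substring matching it uses, and the tool named mcp__email__send_message are assumptions for illustration, since the real filtering code is not shown.

```typescript
// Split a query into required (+term) and optional terms, as in the excerpt.
function splitQueryTerms(rawQuery: string): {
  requiredTerms: string[]
  optionalTerms: string[]
} {
  const requiredTerms: string[] = []
  const optionalTerms: string[] = []
  const queryTerms = rawQuery.toLowerCase().split(/\s+/).filter(t => t.length > 0)
  for (const term of queryTerms) {
    if (term.startsWith('+') && term.length > 1) {
      requiredTerms.push(term.slice(1))
    } else {
      optionalTerms.push(term)
    }
  }
  return { requiredTerms, optionalTerms }
}

interface CandidateTool {
  name: string
  description: string
}

// Assumed filter: keep tools whose name or description contains every required term.
function filterByRequiredTerms(
  candidates: CandidateTool[],
  requiredTerms: string[],
): CandidateTool[] {
  return candidates.filter(candidate => {
    const haystack = `${candidate.name} ${candidate.description}`.toLowerCase()
    return requiredTerms.every(term => haystack.includes(term))
  })
}

const splitTerms = splitQueryTerms('+slack send')
const qualifyingTools = filterByRequiredTerms(
  [
    { name: 'mcp__slack__send_message', description: 'Send a message to a channel' },
    { name: 'mcp__email__send_message', description: 'Send an email' }, // hypothetical tool
  ],
  splitTerms.requiredTerms,
)
// Only the slack tool survives the filter; "send" is then used for ranking.
```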

Tool Reference Returns

src/tools/ToolSearchTool/ToolSearchTool.ts:444-470
TypeScript
  mapToolResultToToolResultBlockParam(content: Output, toolUseID: string) {
    if (content.matches.length === 0) {
      let text = 'No matching deferred tools found'
      if (content.pending_mcp_servers?.length > 0) {
        text += `. Some MCP servers are still connecting: ${content.pending_mcp_servers.join(', ')}...`
      }
      return { type: 'tool_result', tool_use_id: toolUseID, content: text }
    }
    return {
      type: 'tool_result',
      tool_use_id: toolUseID,
      content: content.matches.map(name => ({
        type: 'tool_reference' as const,
        tool_name: name,
      })),
    }
  },

It returns tool_reference type content blocks — a special Anthropic API format that tells the API to inject the matched tools' full schemas into the model's context. The AI can then use these tools on the next turn.

When some MCP servers are still connecting, the return message includes a list of pending servers, prompting the AI to retry later.


Three-Tool Combined Workflow

sequenceDiagram
    participant AI as Claude
    participant TS as ToolSearchTool
    participant Glob as GlobTool
    participant Grep as GrepTool
    participant Read as FileReadTool

    Note over AI: "Help me find and fix TypeScript type errors"

    AI->>TS: query: "select:LSP"
    TS-->>AI: LSPTool Schema loaded

    AI->>Glob: pattern: "**/*.ts"
    Glob-->>AI: 45 TypeScript files

    AI->>Grep: pattern: "// @ts-ignore", type: "ts"
    Grep-->>AI: 3 files contain @ts-ignore

    AI->>Read: Read the first file
    Read-->>AI: File content

    Note over AI: Analyze and fix type errors

The typical usage sequence:

  1. ToolSearchTool — If specialized tools are needed (LSP, WebFetch, etc.), load them first via ToolSearch
  2. GlobTool — Build a file inventory, understand project structure
  3. GrepTool — Search for specific content in target files
  4. FileReadTool — Read the found files in detail

This sequence progresses from coarse to fine, gradually narrowing the search scope. GlobTool and GrepTool both relativize the paths they return to save tokens.


Prompt Guidance

BashTool's prompt explicitly guides the AI to use search tools rather than shell commands:

src/tools/BashTool/prompt.ts:280-286
TypeScript
const toolPreferenceItems = [
  `File search: Use ${GLOB_TOOL_NAME} (NOT find or ls)`,
  `Content search: Use ${GREP_TOOL_NAME} (NOT grep or rg)`,
]

GrepTool's own prompt also emphasizes this:

src/tools/GrepTool/prompt.ts:7-17
TypeScript
`A powerful search tool built on ripgrep

  Usage:
  - ALWAYS use Grep for search tasks. NEVER invoke \`grep\` or \`rg\` as a Bash command.
    The Grep tool has been optimized for correct permissions and access.
  - Supports full regex syntax (e.g., "log.*Error", "function\\s+\\w+")
  - Filter files with glob parameter (e.g., "*.js", "**/*.tsx")
  - Output modes: "content", "files_with_matches" (default), "count"
  - Use Agent tool for open-ended searches requiring multiple rounds
  - Pattern syntax: Uses ripgrep (not grep) - literal braces need escaping`

Key point: explicit "ALWAYS" and "NEVER" directives are more effective at guiding AI behavior than "prefer."


Design Takeaways

Claude Code's search system embodies several core design principles:

  1. Dedicated tools over generic commands — Glob/Grep provide better permission control, token management, and formatted output than find/grep

  2. Progressive refinement — From GlobTool's coarse-grained file discovery, to GrepTool's fine-grained content search, to FileReadTool's full read, the search workflow naturally progresses from broad to narrow

  3. Deferred loading — ToolSearchTool lets the system support 100+ tools without consuming 100+ tools' worth of prompt tokens, loading only when needed

  4. Token-aware design — Path relativization, result truncation, default head_limit, modification time sorting — every design decision considers token efficiency