Startup Performance: How a Heavy CLI Achieves Fast Cold Starts
A deep dive into Claude Code's startup optimization — parallel prefetching, compile-time dead code elimination, dynamic lazy loading, performance profiling, and understanding cold start engineering for large CLIs
The Problem
Claude Code is a large TypeScript CLI application. It depends on heavy modules like OpenTelemetry (~400KB) and gRPC (via @grpc/grpc-js, ~700KB), has over 1,900 source files, and registers 60+ slash commands and 30+ tools. When a user types claude in their terminal and hits Enter, the application needs to:
- Parse and evaluate all top-level module imports
- Read multi-layered configuration (MDM enterprise policies, macOS Keychain, user settings, project settings...)
- Initialize telemetry, permissions, and GrowthBook feature flags
- Connect to MCP servers, load plugins and skills
- Restore or create a session and render the interactive TUI
If this process were executed naively in sequence, cold start would easily exceed one second. Yet in practice, claude responds quite quickly. How does it pull this off?
This article dives deep into Claude Code's startup path, starting from the first line of code in main.tsx, analyzing every optimization technique it employs layer by layer: parallel prefetching, Bun compile-time dead code elimination, dynamic lazy loading, performance profiling infrastructure, and the deferred require pattern for handling circular dependencies.
Phased Initialization in main.tsx
Claude Code's entry file src/main.tsx serves as the "orchestration center" for the entire startup flow. Its design philosophy is to split startup into multiple phases, parallelize each phase as much as possible, and precisely measure the duration of each phase through profileCheckpoint().
The placement of these three top-level side effects is deliberate — they are positioned before all other import statements. In JavaScript/TypeScript, import statements are static and modules are synchronously evaluated at import time. main.tsx has close to 200 lines of import statements, and module evaluation takes approximately 135ms. By placing the timestamp at the very first line (profileCheckpoint('main_tsx_entry')) and then immediately launching two async subprocesses, those subprocesses can run in parallel with the subsequent 135ms of module evaluation.
Once all import statements complete, the code immediately records:
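The shape of this sequence can be simulated in a few lines — a sketch with stand-in names (the real profileCheckpoint lives in startupProfiler.ts, and the prefetch here stands in for the MDM/Keychain subprocesses), showing the prefetch fired before the synchronous work and collected after it:

```typescript
// Stand-in for the real profiler: record a timestamp per phase name.
const marks: Record<string, number> = {};
const profileCheckpoint = (name: string) => { marks[name] = performance.now(); };

// Stand-in for an async prefetch subprocess (e.g. the MDM plist read).
const startPrefetch = (): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve("config"), 20));

profileCheckpoint("main_tsx_entry");
const prefetchPromise = startPrefetch(); // fire-and-forget, runs in background

// Synchronous stand-in for the ~135ms of static import evaluation.
const start = performance.now();
while (performance.now() - start < 40) { /* busy module evaluation */ }
profileCheckpoint("main_tsx_imports_loaded");

// Late collect: the subprocess overlapped with the busy loop above,
// so this await is usually already settled.
const config = await prefetchPromise;
```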
This phased model can be summarized with the following diagram:
Note that the main() function itself (line 585) is not where everything happens. After setting up signal handlers and security checks, it calls run() (line 884), which creates a Commander instance and uses a preAction hook to defer initialization — init() only runs when actually executing a command (not when simply displaying --help):
The preAction hook first awaits the previously launched async subprocesses — but since they ran in parallel with the 135ms of imports, they've usually already completed by this point, making the await essentially zero-cost.
Parallel Prefetching: startMdmRawRead() and startKeychainPrefetch()
These two functions are among the most elegant designs in Claude Code's startup optimization. Their core idea is: launch async subprocesses to perform time-consuming I/O operations during the synchronous blocking period of module evaluation.
MDM Raw Read
startMdmRawRead() is responsible for reading enterprise MDM (Mobile Device Management) configuration. On macOS, this means reading plist files via the plutil subprocess; on Windows, it reads the registry via reg query.
The key point is that fireRawRead() returns a Promise that is called immediately during module evaluation, with the subprocess running in the background. The result is cached via a module-level variable rawReadPromise:
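A minimal sketch of this module-level promise cache (names follow the article; the subprocess is simulated rather than spawning the real plutil):

```typescript
// Module-level cache: the first call kicks off the read, later callers
// get the same in-flight (or settled) promise instead of a second read.
let rawReadPromise: Promise<string> | null = null;

function fireRawRead(): Promise<string> {
  if (!rawReadPromise) {
    // On macOS this would spawn `plutil`; simulated here with a timer.
    rawReadPromise = new Promise((resolve) =>
      setTimeout(() => resolve('{"policy":"managed"}'), 10),
    );
  }
  return rawReadPromise;
}

fireRawRead(); // launched during module evaluation

async function getMdmConfig(): Promise<string> {
  return fireRawRead(); // returns the cached promise, no second subprocess
}
```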
Keychain Prefetch
startKeychainPrefetch() is even more refined. Reading the Keychain on macOS requires calling the system's security command-line tool, with each call taking approximately 32-33ms. Claude Code needs to read two entries:
- OAuth credentials (`"Claude Code-credentials"`) — ~32ms
- Legacy API key (`"Claude Code"`) — ~33ms
If executed sequentially, this wastes approximately 65ms on every macOS startup. Prefetching parallelizes these two reads:
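A sketch of what the parallel prefetch plausibly looks like — the service names come from the article, while the exact `security` invocation is illustrative:

```typescript
import { execFile } from "node:child_process";

// One Keychain read ≈ 32-33ms; resolve null on any failure (entry missing,
// non-macOS platform, etc.) so the prefetch never throws.
function readKeychainEntry(service: string): Promise<string | null> {
  return new Promise((resolve) => {
    execFile(
      "security",
      ["find-generic-password", "-s", service, "-w"],
      (err, stdout) => resolve(err ? null : stdout.trim()),
    );
  });
}

// Both subprocesses run concurrently: ~33ms wall time instead of ~65ms.
const keychainPrefetch = Promise.all([
  readKeychainEntry("Claude Code-credentials"), // OAuth credentials
  readKeychainEntry("Claude Code"),             // legacy API key
]);
```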
Note that this module's import chain is intentionally minimal — it directly imports child_process and a lightweight macOsKeychainHelpers.ts, rather than the full macOsKeychainStorage.ts. The source code comments explicitly explain why:
Importing the full keychain storage module would bring in execa, human-signals, cross-spawn, and other dependencies — synchronous module initialization alone would take ~58ms, which completely defeats the purpose of prefetching.
Parallel Timing Diagram
The following timing diagram illustrates the "execute async I/O during synchronous blocking" pattern:
By the time the preAction phase awaits these Promises, the subprocesses have long since completed. The await simply retrieves results from cache with virtually zero overhead. This is the essence of the "fire-and-forget + late-collect" pattern.
Special Handling for --bare Mode
It's worth noting that startKeychainPrefetch() is skipped in --bare mode:
--bare is a minimal mode that skips hooks, LSP, plugin sync, auto-memory, background prefetching, Keychain reads, and CLAUDE.md auto-discovery. Authentication is strictly limited to ANTHROPIC_API_KEY or apiKeyHelper configured via --settings. This is designed for scripting and CI/CD scenarios, optimizing for the fastest possible startup.
feature() and Bun Compile-Time Dead Code Elimination
Claude Code uses Bun for building and bundling. Bun provides a special module bun:bundle whose feature() function implements compile-time conditional compilation — this is not a runtime feature flag but a decision made at build time about whether code is included in the final output.
How It Works
feature() is evaluated at compile time to a true or false constant. Bun's bundler (or the JavaScript engine's dead code elimination) then removes branches that will never execute. This means disabled features don't just avoid execution — their entire module trees are never loaded.
In src/commands.ts, this pattern is used extensively:
Note the use of require() rather than import — this is intentional. import is static and will be executed during module evaluation regardless of any surrounding conditions. require() is dynamic and only executes when feature() returns true. When feature() evaluates to false at compile time, the entire require() call (and its dependency tree) is eliminated.
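An illustrative simulation of the pattern: with Bun, `feature("PROACTIVE")` from `bun:bundle` folds to a constant at build time; a plain constant stands in for it here (the flag and module path are hypothetical), and the bundler's dead code elimination removes the false branch entirely:

```typescript
// Stand-in for feature("PROACTIVE"): in a real Bun build this is folded
// to a compile-time constant, making the branch below statically dead.
const FEATURE_PROACTIVE: boolean = false;

const commands: string[] = ["help", "login"];

if (FEATURE_PROACTIVE) {
  // require() (not import) so the module loads only inside this branch;
  // when the condition folds to false, the whole call — and the module
  // tree behind it — is eliminated from the bundle.
  const { proactiveCommand } = require("./commands/proactive"); // hypothetical path
  commands.push(proactiveCommand);
}
```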
Application in the Tool System
The same pattern is used extensively in src/tools.ts to control tool loading:
Performance Impact Analysis
Suppose an external release build has PROACTIVE, KAIROS, BRIDGE_MODE, VOICE_MODE, WORKFLOW_SCRIPTS, and other feature flags disabled. In commands.ts alone, there are 16 conditional loading points. If each module and its dependency tree averages 50KB, this means the external build saves approximately 800KB of module loading through compile-time elimination — saving not just disk space and memory, but more importantly, module evaluation time.
This reflects an important architectural decision: feature flags shouldn't just make runtime decisions — they should exclude unnecessary code at compile time.
process.env Conditions vs. feature() Conditions
Claude Code also has another type of conditional loading that uses process.env instead of feature():
process.env.USER_TYPE values can also be inlined by Bun at compile time (if specified in the build configuration via define), achieving the same dead code elimination effect. In external release builds, USER_TYPE is set to "external", so all === 'ant' branches are eliminated, and internal-only tools (REPLTool, SuggestBackgroundPRTool, etc.) don't appear in the external build artifact.
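A sketch of the shape of such a branch (the tool names appear in the article; the list itself is illustrative) — when the build config `define`s `process.env.USER_TYPE`, the comparison folds to a constant and the internal-only branch drops out of external artifacts:

```typescript
const tools: string[] = ["Bash", "Edit", "Read"];

// In an external build, Bun inlines USER_TYPE as "external", so this
// comparison is constant-false and the branch is eliminated at build time.
if (process.env.USER_TYPE === "ant") {
  tools.push("REPLTool", "SuggestBackgroundPRTool"); // internal-only tools
}
```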
Dynamic import() for Lazy Loading Heavy Modules
Even after eliminating unused feature modules via feature(), some large modules are required but not needed at startup. For these, Claude Code uses dynamic import() to defer loading.
Lazy Loading OpenTelemetry
The comment in init.ts is very direct:
The OpenTelemetry SDK is ~400KB and gRPC is ~700KB — loading over 1MB of modules synchronously at startup would significantly slow down the cold start. Through dynamic import(), these modules are only loaded when telemetry is actually initialized, and this happens asynchronously inside the init() function without blocking the main startup path.
Similarly, first-party event logging is initialized asynchronously:
Note the void prefix — this indicates the Promise is "fire-and-forget" and won't block init()'s return.
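A minimal sketch of the two ideas together — deferred loading via dynamic import() plus the `void` fire-and-forget (the package specifier is held in a variable here only so the sketch stays dependency-free; real code would use a literal so the bundler can see it):

```typescript
async function initTelemetry(enabled: boolean): Promise<boolean> {
  if (!enabled) return false; // cheap path: the heavy modules never load

  // Loaded only here, off the critical startup path. The variable
  // specifier is an artifact of this sketch, not the real code.
  const otelPkg = "@opentelemetry/sdk-node";
  const { NodeSDK } = await import(otelPkg);
  new NodeSDK().start();
  return true;
}

// `void` marks the promise as intentionally unawaited, so the caller
// (init() in the real code) returns without blocking on telemetry setup.
void initTelemetry(false);
```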
The Lazy Shim for the Insights Command
src/commands.ts contains a particularly elegant lazy loading case — the /insights command. insights.ts is a 113KB, 3,200-line file containing diff rendering and HTML generation:
This shim object has the same interface as the real command (type, name, description, etc.), but its getPromptForCommand method internally loads the real module via dynamic import(). Only when the user actually types /insights does the 113KB of code get loaded. This pattern generalizes to any "lightweight at registration, load on invocation" scenario.
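A sketch of the shim's shape, with an illustrative interface and a stand-in for the 113KB module (the real code would use `await import("./insights")` where `loadRealModule` appears):

```typescript
type SlashCommand = {
  name: string;
  description: string;
  getPromptForCommand(args: string): Promise<string>;
};

// Stand-in for dynamically importing the heavy insights module.
const loadRealModule = async () => ({
  getPromptForCommand: async (args: string) => `insights prompt: ${args}`,
});

const insightsShim: SlashCommand = {
  name: "insights",
  description: "Analyze recent sessions", // metadata available at registration
  async getPromptForCommand(args) {
    const real = await loadRealModule(); // heavy code loads only on first use
    return real.getPromptForCommand(args);
  },
};
```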
Dynamic Loading of setup.js
Even setup.js is dynamically loaded:
This ensures the setup module is only loaded when actually needed for setting up the working directory and permissions.
Subcommand Skipping in Print Mode
For -p/--print mode (non-interactive), Claude Code skips registration of all 52 subcommands:
With a simple process.argv.includes('-p') check, approximately 65ms of subcommand registration overhead is saved. This is significant for script mode, which is frequently called in pipelines (e.g., echo "fix bug" | claude -p).
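The check itself is trivial; a sketch of how it plausibly gates registration (the function names and the truncated subcommand list are illustrative):

```typescript
// Cheap argv scan, done before Commander parses anything.
function shouldRegisterSubcommands(argv: string[]): boolean {
  return !argv.includes("-p") && !argv.includes("--print");
}

function registerSubcommands(argv: string[]): string[] {
  if (!shouldRegisterSubcommands(argv)) return []; // save ~65ms in pipelines
  return ["mcp", "config", "doctor" /* ...and the rest of the 52 */];
}
```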
The profileCheckpoint() Performance Profiling System
Claude Code has a built-in, comprehensive startup performance profiling system defined in src/utils/startupProfiler.ts. This system has two modes:
- Sampled logging mode: 100% of internal users + 0.5% of external users, reporting phase durations to Statsig
- Detailed profiling mode: enabled via the `CLAUDE_CODE_PROFILE_STARTUP=1` environment variable, outputting a complete report with memory snapshots
Zero-Overhead Design
When SHOULD_PROFILE is false (~99.5% of external users), profileCheckpoint() is a no-op — completely zero overhead:
It uses Node.js's built-in performance.mark() API for time markers and only collects process.memoryUsage() snapshots in detailed mode (since gathering memory usage information has its own overhead).
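A sketch of the guard structure (the real `SHOULD_PROFILE` also folds in Statsig-driven sampling, which is omitted here; only the env-var path is shown):

```typescript
// Resolved once at module load; the sampling component is omitted.
const SHOULD_PROFILE = process.env.CLAUDE_CODE_PROFILE_STARTUP === "1";

function profileCheckpoint(name: string): void {
  if (!SHOULD_PROFILE) return; // no-op for ~99.5% of external users

  performance.mark(name); // cheap built-in time marker

  if (process.env.CLAUDE_CODE_PROFILE_STARTUP === "1") {
    // Memory snapshots only in detailed mode: memoryUsage() has real cost.
    console.log(name, process.memoryUsage().rss);
  }
}
```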
Predefined Phases
The system predefines several key phases for Statsig reporting:
This allows the team to monitor startup performance trends on the Statsig dashboard and promptly detect regressions.
Checkpoint Distribution
By searching for all profileCheckpoint() calls in main.tsx, we can see that checkpoints cover every critical node in the startup process:
| Checkpoint | Location (Line) | Meaning |
|---|---|---|
| `main_tsx_entry` | 12 | Entry point, before module evaluation |
| `main_tsx_imports_loaded` | 209 | All imports completed |
| `main_function_start` | 586 | main() entry |
| `main_warning_handler_initialized` | 607 | Warning handler ready |
| `run_function_start` | 885 | run() entry |
| `run_commander_initialized` | 903 | Commander instance created |
| `preAction_start` | 908 | preAction hook begins |
| `preAction_after_mdm` | 915 | MDM/Keychain awaits completed |
| `preAction_after_init` | 917 | init() completed |
| `preAction_after_sinks` | 935 | Log sinks attached |
| `preAction_after_migrations` | 951 | Data migrations completed |
| `preAction_after_remote_settings` | 959 | Remote settings loading launched |
| `action_handler_start` | 1007 | Action handler begins |
| `action_after_input_prompt` | 1862 | Input prompt processing completed |
| `action_tools_loaded` | 1878 | Tools loading completed |
| `action_before_setup` | 1904 | Before setup() |
| `action_after_setup` | 1936 | After setup() |
| `action_commands_loaded` | 2031 | Commands loading completed |
| `action_mcp_configs_loaded` | 2402 | MCP config loading completed |
| `before_connectMcp` / `after_connectMcp` | 2728 / 2730 | MCP connection duration |
| `action_after_hooks` | 3766 | SessionStart hooks completed |
| `run_main_options_built` | 3873 | Commander options definition completed |
This dense checkpoint network lets the team precisely pinpoint the source of any performance regression.
Deferred Require Pattern for Circular Dependencies
In a large project with 1,900+ files, circular dependencies are nearly unavoidable. Claude Code uses deferred require() functions to break cycles:
The same pattern appears in main.tsx:
This pattern has several clever aspects:
- Function wrapping: `const getX = () => require('...')` ensures `require()` is only executed when the function is called, not during module evaluation
- Type safety: `as typeof import('...')` preserves full TypeScript type inference
- Caching: Node.js/Bun's `require()` has built-in module caching, so calling `getTeamCreateTool()` multiple times only loads the module once
The difference from the feature() pattern is: feature() is a compile-time decision — code either exists or doesn't; deferred require() is a runtime strategy — code always exists in the bundle but loading is postponed until first use.
Multi-Source Configuration Loading Priority
Claude Code's configuration system supports five sources, ordered from lowest to highest priority:
This priority chain means enterprise policies (policySettings) can override all other settings, while command-line flags (flagSettings) can override project and user settings.
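The mechanics reduce to a lowest-first merge where later sources win. A minimal sketch under that assumption (source names follow the article; the shallow-merge helper is illustrative — the real system may merge nested keys differently):

```typescript
type Settings = Record<string, unknown>;

// Spread in priority order, lowest first: later sources overwrite earlier.
function resolveSettings(sources: Settings[]): Settings {
  return sources.reduce<Settings>((acc, s) => ({ ...acc, ...s }), {});
}

const userSettings = { theme: "dark", model: "default" };
const projectSettings = { model: "opus" };
const flagSettings = { theme: "light" };    // --settings / CLI flags
const policySettings = { model: "sonnet" }; // enterprise policy, highest

const effective = resolveSettings([
  userSettings,
  projectSettings,
  flagSettings,
  policySettings,
]);
```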
Configuration Loading Timing
Configuration loading itself follows the "start early, collect late" pattern:
eagerParseCliFlag() is a minimal argv parser — it doesn't use Commander's full parsing but directly scans process.argv to find the --settings flag value. This ensures settings are available before init().
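A sketch of what such a minimal parser plausibly does — scan `process.argv` directly for one flag, handling both the space-separated and `=` forms (the exact behavior of the real function is inferred, not confirmed):

```typescript
function eagerParseCliFlag(argv: string[], flag: string): string | undefined {
  // `--settings value` form: the value is the next argv entry.
  const i = argv.indexOf(flag);
  if (i !== -1 && i + 1 < argv.length) return argv[i + 1];

  // `--settings=value` form.
  const joined = argv.find((a) => a.startsWith(`${flag}=`));
  return joined?.slice(flag.length + 1);
}
```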
Remote managed settings and policy limits are loaded asynchronously:
The void prefix once again indicates these are non-blocking. Remote settings take effect automatically upon arrival via a hot-reload mechanism.
Deferred Evaluation and Memoization of Command Lists
Command list construction embodies the same philosophy — declared as a function to defer evaluation until first invocation:
memoize() ensures the command list is built only once. This matters because some commands (like login()) need to read configuration during initialization — if the list were built during module evaluation, the configuration system wouldn't be ready yet.
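The real code uses lodash's memoize; a hand-rolled equivalent keeps this sketch dependency-free and shows why deferring matters — the expensive builder runs on first call, after configuration is ready, not at module evaluation:

```typescript
function memoizeOnce<T>(fn: () => T): () => T {
  let called = false;
  let value: T;
  return () => {
    if (!called) {
      value = fn();
      called = true;
    }
    return value;
  };
}

let buildCount = 0;
const getCommands = memoizeOnce((): string[] => {
  buildCount++; // expensive list construction runs exactly once
  // In the real code, commands like login() can read configuration here,
  // because first invocation happens well after init().
  return ["help", "login", "insights"];
});
```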
Session Restoration Paths: teleport, remote, resume
Claude Code has three session restoration modes, each with different startup paths and performance characteristics.
--continue / --resume: Local Restoration
The simplest mode. --continue resumes the most recent conversation in the current directory, while --resume restores a specific conversation via session ID or interactive selector:
Note that caches are cleared before restoration — this ensures the restored session sees the latest file and skill changes.
--remote: Remote Sessions
--remote creates a Claude Code Web (CCR) remote session:
Remote mode requires an additional blocking wait for policy limits to load (waitForPolicyLimitsToLoad()), since enterprises may prohibit remote sessions. This is one of the few places that requires a blocking wait.
--teleport: Cross-Device Restoration
Teleport is the most complex restoration path, supporting cross-device session restoration. It requires:
- Fetching session data from the API
- Verifying Git repository match
- Switching to the correct branch
- Processing message history
Teleport's progress UI is dynamically imported at the call site (teleportWithProgress; see the comment at line 187), so teleport-related modules are never loaded unless teleport is actually used.
Interaction Between Restoration Paths and Startup Hooks
A subtle but important detail: restoration paths skip startup hooks:
This is because when restoring a session, conversationRecovery.ts triggers a 'resume' type hook, avoiding duplicate execution with the startup hook.
Parallelizing setup() and Command Loading
In the action handler, setup() and command/agent loading are parallelized:
Several design decisions are worth noting:
- `initBuiltinPlugins()` and `initBundledSkills()` execute synchronously before the parallel launches — they are pure in-memory operations (<1ms, zero I/O), but `getCommands()` internally calls `getBundledSkills()`, which synchronously reads their results. If they were placed inside `setup()` (the previous approach), the parallel `getCommands()` would memoize an empty list.
- `commandsPromise?.catch(() => {})` suppresses a transient `unhandledRejection` — during the 28ms wait for `setupPromise`, if `commandsPromise` throws an exception before being `await`ed, Node.js would report an unhandled rejection. The empty `catch` solves this.
- In worktree mode (`worktreeEnabled`), parallelization isn't possible — `setup()` calls `process.chdir()`, and commands and agents need the post-chdir working directory.
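The launch-then-collect shape can be sketched as follows — function bodies are simulated stand-ins, with the empty `catch` attached exactly as described above:

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function setup(): Promise<string> {
  await sleep(28); // stands in for working-directory and permission setup
  return "cwd-ready";
}

async function getCommands(): Promise<string[]> {
  await sleep(10); // stands in for command/agent loading
  return ["help", "login"];
}

// Launch both in parallel, then guard the not-yet-awaited promise so a
// rejection during the setup wait can't surface as unhandledRejection.
const setupPromise = setup();
const commandsPromise = getCommands();
commandsPromise.catch(() => {});

const cwd = await setupPromise;
const commands = await commandsPromise; // usually already settled by now
```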
Transferable Patterns: Cold Start Optimization Checklist for Large CLIs
From Claude Code's startup optimization, we can distill a general-purpose cold start optimization checklist for large CLIs:
1. Phased Initialization + Checkpoint Marking
Split the startup process into distinct phases, mark each with checkpoints, and establish a quantifiable performance baseline:
Don't guess where it's slow — use data. Claude Code's profileCheckpoint() system has zero overhead for 99.5% of cases, collecting data only from sampled users.
2. "Fire Early, Collect Late" Parallel I/O
Identify I/O operations in the startup path (file reads, subprocess calls, network requests), launch them at the earliest possible moment, and collect results at the latest moment they're needed:
3. Compile-Time Elimination > Runtime Checks
If you know a feature won't be used in a particular build configuration, eliminate it at compile time rather than skipping it at runtime. Bun's feature() is one implementation; Webpack's DefinePlugin + NormalModuleReplacementPlugin is another. The key is enabling the bundler's tree-shaking to remove entire unused module trees.
4. Lazy Shim Pattern
For components that need metadata at registration time but full code only at execution time (commands, routes, plugins), create lightweight shim objects:
5. Deferred Require to Break Circular Dependencies
In large codebases, fully resolving circular dependencies through refactoring is often prohibitively expensive. Function-wrapped require() breaks cycles with minimal invasiveness while maintaining type safety:
6. Mode-Aware Fast Paths
Skip unnecessary initialization based on the running mode. Claude Code skips registration of 52 subcommands in -p/--print mode (saving 65ms) and skips hooks, LSP, plugins, and all non-essential components in --bare mode:
7. Memoize Expensive Computations
Once command lists, tool lists, and skill lists are computed, cache results via lodash/memoize:
When caches need invalidation (e.g., a new skill was dynamically added), provide an explicit clearCache() method:
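A dependency-free sketch of a memoized list with an explicit `clearCache()` (lodash's memoize exposes a writable `cache` instead; this hand-rolled variant and the skill names are illustrative):

```typescript
function memoizeWithClear<T>(fn: () => T): { (): T; clearCache(): void } {
  let cache: { value: T } | null = null;
  const wrapped = (() => (cache ??= { value: fn() }).value) as {
    (): T;
    clearCache(): void;
  };
  wrapped.clearCache = () => { cache = null; };
  return wrapped;
}

let builds = 0;
const getSkills = memoizeWithClear(() => {
  builds++;
  return ["bundled-skill"];
});

getSkills();            // first call builds the list (builds === 1)
getSkills();            // cached, no rebuild
getSkills.clearCache(); // e.g. a new skill was dynamically added
getSkills();            // rebuilt on next call (builds === 2)
```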
8. Minimize Import Chains
Prefetch modules (like keychainPrefetch.ts) must maintain minimal import chains. If the prefetch module itself pulls in heavy dependencies, the parallel advantage of prefetching is negated by the synchronous module evaluation overhead. Clearly comment why you chose child_process over execa, and why you import helpers rather than storage.
9. Non-Blocking Background Tasks
Place cleanup, syncing, prefetching, and other non-critical tasks in the background:
10. Observability First
Before optimizing, establish observability. Claude Code's approach:
- Sampled reporting to Statsig (production monitoring)
- `CLAUDE_CODE_PROFILE_STARTUP=1` detailed reports (local debugging)
- Duration and memory usage for each phase
- Automatic detection and reporting of performance regressions
Without measurement, there is no optimization. Without continuous monitoring, optimizations degrade as new features are added.
Conclusion
Claude Code's cold start optimization isn't a single silver bullet but a combination of carefully designed techniques:
- Parallel prefetching hides async I/O behind synchronous module evaluation
- Compile-time dead code elimination removes unused feature module trees at build time
- Dynamic import() lazy loading defers the cost of heavy modules until first use
- The profileCheckpoint() profiling system provides zero-overhead performance observability
- The deferred require pattern solves circular dependencies with minimal invasiveness
- Mode-aware fast paths skip unnecessary initialization based on the usage scenario
Each technique is well-known in isolation, but their combination — together with the overarching principle of "measure first, optimize second, monitor continuously" — enables a CLI application with 1,900 source files and heavy dependencies like OpenTelemetry and gRPC to deliver a fast cold start experience.
For developers building their own large CLIs or desktop applications, these patterns are highly transferable. The core idea is simple: treat every millisecond on the startup path as a scarce resource, and win them back through parallelization, deferral, and elimination.