Telemetry and Observability: OpenTelemetry in a CLI Application

A deep dive into Claude Code's telemetry architecture -- OpenTelemetry lazy loading, performance profiling, GrowthBook feature flags, and privacy considerations

The Problem

Does a CLI tool need telemetry? The answer is yes -- but the approach is fundamentally different from a web application's. A web app can initialize GA4 asynchronously after page load, and users won't notice a few hundred milliseconds of delay. A CLI tool measures startup time in milliseconds -- if claude --help takes an extra 200ms just to load the OpenTelemetry SDK, users notice immediately.

Claude Code's telemetry system faces a three-fold challenge:

  1. Zero startup cost -- Telemetry must not slow down CLI startup
  2. Privacy first -- No recording of code, file paths, or any sensitive information
  3. Reliable delivery -- Events must not be lost during network outages

This article examines how Claude Code solves these problems through event queues, lazy loading, multi-layer sinks, and compile-time dead code elimination.

Telemetry Architecture Overview

...

Zero-Dependency Event Entry Point

src/services/analytics/index.ts is the entry point for the entire telemetry system. Its design principle is stated at the top of the file:

```typescript
// src/services/analytics/index.ts Lines 1-9
/**
 * Analytics service - public API for event logging
 *
 * DESIGN: This module has NO dependencies to avoid import cycles.
 * Events are queued until attachAnalyticsSink() is called during app initialization.
 * The sink handles routing to Datadog and 1P event logging.
 */
```

Zero dependencies. This module imports nothing from any other project module -- no config, no auth, no model. Why? Because almost every module needs logEvent, and if analytics depended on them in return, it would create circular imports.

The Event Queue Mechanism

```typescript
// src/services/analytics/index.ts Lines 81-84, 95-123
const eventQueue: QueuedEvent[] = []
let sink: AnalyticsSink | null = null

export function attachAnalyticsSink(newSink: AnalyticsSink): void {
  if (sink !== null) return // Idempotent
  sink = newSink

  if (eventQueue.length > 0) {
    const queuedEvents = [...eventQueue]
    eventQueue.length = 0

    // Drain asynchronously to avoid blocking the startup path
    queueMicrotask(() => {
      for (const event of queuedEvents) {
        if (event.async) {
          void sink!.logEventAsync(event.eventName, event.metadata)
        } else {
          sink!.logEvent(event.eventName, event.metadata)
        }
      }
    })
  }
}
```

This is a classic "queue first, consume later" pattern:

  1. During CLI startup, various modules call logEvent to record events during initialization
  2. At this point the sink hasn't been initialized yet, so events are pushed into eventQueue
  3. Once the application completes core initialization, attachAnalyticsSink injects the actual sink
  4. The queue is drained asynchronously via queueMicrotask -- without blocking the current startup path

The key detail is queueMicrotask rather than setTimeout. A microtask runs as soon as the currently executing synchronous code completes -- before any timers or I/O callbacks, so sooner than setTimeout(fn, 0) -- without ever interrupting the synchronous startup path.
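
The enqueueing side isn't shown in the excerpt; a minimal sketch of what logEvent plausibly looks like (names mirror the attachAnalyticsSink code above, but the exact implementation is an assumption):

```typescript
// Hypothetical sketch of the enqueue side of the analytics module.
type LogEventMetadata = { [key: string]: boolean | number | undefined }

interface QueuedEvent {
  eventName: string
  metadata: LogEventMetadata
  async: boolean
}

interface AnalyticsSink {
  logEvent(eventName: string, metadata: LogEventMetadata): void
}

const eventQueue: QueuedEvent[] = []
let sink: AnalyticsSink | null = null

function logEvent(eventName: string, metadata: LogEventMetadata = {}): void {
  if (sink === null) {
    // No sink yet: queuing is just an array push, cheap enough for the startup path
    eventQueue.push({ eventName, metadata, async: false })
    return
  }
  sink.logEvent(eventName, metadata)
}
```

Because the queue is a plain module-level array, events logged before initialization cost nothing beyond a push.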

Type-Safe Privacy Guards

```typescript
// src/services/analytics/index.ts Lines 19-33
export type AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS = never

export type AnalyticsMetadata_I_VERIFIED_THIS_IS_PII_TAGGED = never
```

These type names are deliberately unwieldy. Both are aliases for the never type -- any string value that needs to be passed as event metadata must be explicitly asserted:

```typescript
myString as AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS
```

The metadata signature for logEvent is even more aggressive:

```typescript
// src/services/analytics/index.ts Line 61
type LogEventMetadata = { [key: string]: boolean | number | undefined }
```

No string type. Metadata values can only be boolean, number, or undefined. This eliminates the possibility of accidentally logging code snippets or file paths at the type system level.
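In practice this forces call sites to reduce strings to scalars before logging. A hypothetical call site (the file path and derived keys here are illustrative, not from the source):

```typescript
type LogEventMetadata = { [key: string]: boolean | number | undefined }

let lastLogged: LogEventMetadata = {}
function logEvent(eventName: string, metadata: LogEventMetadata): void {
  lastLogged = metadata // stand-in for the real sink
}

const filePath = '/Users/alice/project/src/main.ts' // hypothetical

// logEvent('tengu_tool_use_error', { path: filePath })
//   ^ rejected by the compiler: string is not boolean | number | undefined

// Only derived scalars pass the type check:
logEvent('tengu_tool_use_error', {
  path_length: filePath.length,
  is_absolute: filePath.startsWith('/'),
})
```

The path itself can never reach the telemetry pipeline; only facts about it can.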

PROTO Key PII Isolation

```typescript
// src/services/analytics/index.ts Lines 45-58
export function stripProtoFields<V>(
  metadata: Record<string, V>,
): Record<string, V> {
  let result: Record<string, V> | undefined
  for (const key in metadata) {
    if (key.startsWith('_PROTO_')) {
      if (result === undefined) {
        result = { ...metadata }
      }
      delete result[key]
    }
  }
  return result ?? metadata
}
```

Keys prefixed with _PROTO_ contain PII (Personally Identifiable Information) and are only routed to the access-controlled 1P proto column. stripProtoFields strips these fields before sending to Datadog. Note the optimization -- if there are no _PROTO_ keys, the original reference is returned directly without any copying.
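The copy-on-write behavior can be checked directly (the function is reproduced from the excerpt above to make the demo self-contained):

```typescript
function stripProtoFields<V>(metadata: Record<string, V>): Record<string, V> {
  let result: Record<string, V> | undefined
  for (const key in metadata) {
    if (key.startsWith('_PROTO_')) {
      if (result === undefined) {
        result = { ...metadata } // the first _PROTO_ key triggers the only copy
      }
      delete result[key]
    }
  }
  return result ?? metadata
}

const clean = { duration_ms: 120, success: 1 }
const sameRef = stripProtoFields(clean) === clean // no _PROTO_ keys: same reference, no copy

const tagged = { duration_ms: 120, _PROTO_account: 1 }
const stripped = stripProtoFields(tagged)
const piiRemoved = !('_PROTO_account' in stripped) // key stripped from the copy
const inputIntact = '_PROTO_account' in tagged // original object untouched
```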

Datadog Event Tracking

src/services/analytics/datadog.ts implements batch sending to the Datadog Logs API.

```typescript
// src/services/analytics/datadog.ts Lines 12-18
const DATADOG_LOGS_ENDPOINT =
  'https://http-intake.logs.us5.datadoghq.com/api/v2/logs'
const DATADOG_CLIENT_TOKEN = 'pubbbf48e6d78dae54bceaa4acf463299bf'
const DEFAULT_FLUSH_INTERVAL_MS = 15000
const MAX_BATCH_SIZE = 100
const NETWORK_TIMEOUT_MS = 5000
```

Event Allowlist

```typescript
// src/services/analytics/datadog.ts Lines 19-64
const DATADOG_ALLOWED_EVENTS = new Set([
  'tengu_api_error',
  'tengu_api_success',
  'tengu_cancel',
  'tengu_exit',
  'tengu_init',
  'tengu_started',
  'tengu_tool_use_error',
  'tengu_tool_use_success',
  // ... approximately 40 event names total
])
```

Not all events are sent to Datadog -- only those explicitly included in the allowlist. This provides double safety: even if someone accidentally passes sensitive data in logEvent, if the event name isn't in the allowlist, Datadog never receives it.
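The guard inside trackDatadogEvent is presumably a plain set-membership test before anything is batched (a sketch; the set is abbreviated and 'tengu_internal_debug' is a made-up, non-allowlisted name):

```typescript
const DATADOG_ALLOWED_EVENTS = new Set([
  'tengu_api_error',
  'tengu_api_success',
  'tengu_started',
])

const dropped: string[] = []
const batched: string[] = []

function trackDatadogEvent(eventName: string): void {
  if (!DATADOG_ALLOWED_EVENTS.has(eventName)) {
    dropped.push(eventName) // silently discarded, never reaches the batch
    return
  }
  batched.push(eventName) // in the real module: push onto logBatch, scheduleFlush()
}

trackDatadogEvent('tengu_started')
trackDatadogEvent('tengu_internal_debug') // hypothetical, not allowlisted
```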

Batch Sending and Timed Flushing

```typescript
// src/services/analytics/datadog.ts Lines 98-128
let logBatch: DatadogLog[] = []
let flushTimer: NodeJS.Timeout | null = null

async function flushLogs(): Promise<void> {
  if (logBatch.length === 0) return
  const logsToSend = logBatch
  logBatch = []

  try {
    await axios.post(DATADOG_LOGS_ENDPOINT, logsToSend, {
      headers: {
        'Content-Type': 'application/json',
        'DD-API-KEY': DATADOG_CLIENT_TOKEN,
      },
      timeout: NETWORK_TIMEOUT_MS,
    })
  } catch (error) {
    logError(error)
  }
}

function scheduleFlush(): void {
  if (flushTimer) return
  flushTimer = setTimeout(() => {
    flushTimer = null
    void flushLogs()
  }, getFlushIntervalMs()).unref()
}
```

The .unref() call is critical -- it lets the Node.js process exit when no other active handles remain, rather than being kept alive by the flush timer. This is essential for CLI tools: after the user presses Ctrl+C, the process should exit immediately instead of waiting up to 15 seconds for a flush.

User Bucketing

```typescript
// src/services/analytics/datadog.ts Lines 281-299
const NUM_USER_BUCKETS = 30

const getUserBucket = memoize((): number => {
  const userId = getOrCreateUserID()
  const hash = createHash('sha256').update(userId).digest('hex')
  return parseInt(hash.slice(0, 8), 16) % NUM_USER_BUCKETS
})
```

This design exists for alerting. When issues arise, the useful question is "how many users are affected," not "how many events occurred." Hashing user IDs into 30 buckets and counting the distinct buckets affected gives a rough user count -- reducing cardinality while preserving privacy.
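
A worked example of the bucket computation (the user ID is illustrative):

```typescript
import { createHash } from 'node:crypto'

const NUM_USER_BUCKETS = 30

function bucketFor(userId: string): number {
  const hash = createHash('sha256').update(userId).digest('hex')
  // Top 32 bits of the hash, reduced to one of 30 buckets
  return parseInt(hash.slice(0, 8), 16) % NUM_USER_BUCKETS
}

const bucket = bucketFor('user-1234') // illustrative ID
const stable = bucketFor('user-1234') === bucket // same user always lands in the same bucket
const inRange = bucket >= 0 && bucket < NUM_USER_BUCKETS
```

Because SHA-256 spreads IDs uniformly, counting distinct buckets in an alert window approximates the affected-user count up to a ceiling of 30.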

OpenTelemetry 1P Event Logging

src/services/analytics/firstPartyEventLogger.ts implements first-party event logging using the OpenTelemetry SDK.

...

Initialization

```typescript
// src/services/analytics/firstPartyEventLogger.ts Lines 312-389
export function initialize1PEventLogging(): void {
  profileCheckpoint('1p_event_logging_start')
  const enabled = is1PEventLoggingEnabled()
  if (!enabled) return

  const batchConfig = getBatchConfig()
  lastBatchConfig = batchConfig
  profileCheckpoint('1p_event_after_growthbook_config')

  const scheduledDelayMillis =
    batchConfig.scheduledDelayMillis || DEFAULT_LOGS_EXPORT_INTERVAL_MS

  const resource = resourceFromAttributes({
    [ATTR_SERVICE_NAME]: 'claude-code',
    [ATTR_SERVICE_VERSION]: MACRO.VERSION,
  })

  const eventLoggingExporter = new FirstPartyEventLoggingExporter({
    maxBatchSize: maxExportBatchSize,
    skipAuth: batchConfig.skipAuth,
    maxAttempts: batchConfig.maxAttempts,
    path: batchConfig.path,
    baseUrl: batchConfig.baseUrl,
    isKilled: () => isSinkKilled('firstParty'),
  })

  firstPartyEventLoggerProvider = new LoggerProvider({
    resource,
    processors: [
      new BatchLogRecordProcessor(eventLoggingExporter, {
        scheduledDelayMillis,
        maxExportBatchSize,
        maxQueueSize,
      }),
    ],
  })

  // Get logger from local provider, not the global API
  firstPartyEventLogger = firstPartyEventLoggerProvider.getLogger(
    'com.anthropic.claude_code.events',
    MACRO.VERSION,
  )
}
```

Key design decisions:

  1. Dedicated LoggerProvider -- Instead of using the OpenTelemetry global API (logs.getLogger()), a private provider is created. This ensures internal events don't leak to customer-configured OTLP endpoints.
  2. profileCheckpoint -- Marks key points during initialization to track the telemetry system's own startup time.
  3. MACRO.VERSION -- A version constant replaced at compile time.
  4. GrowthBook batch configuration -- Batch parameters (interval, size, queue) are dynamically fetched from GrowthBook, allowing remote adjustment.

Runtime Configuration Hot Reload

```typescript
// src/services/analytics/firstPartyEventLogger.ts Lines 407-449
export async function reinitialize1PEventLoggingIfConfigChanged(): Promise<void> {
  if (!is1PEventLoggingEnabled() || !firstPartyEventLoggerProvider) return

  const newConfig = getBatchConfig()
  if (isEqual(newConfig, lastBatchConfig)) return

  // 1. Nullify the logger first to prevent concurrent writes
  const oldProvider = firstPartyEventLoggerProvider
  const oldLogger = firstPartyEventLogger
  firstPartyEventLogger = null

  // 2. Drain the old provider's buffer
  try {
    await oldProvider.forceFlush()
  } catch {
    /* Export failures are persisted to disk */
  }

  // 3. Rebuild with new configuration
  firstPartyEventLoggerProvider = null
  try {
    initialize1PEventLogging()
  } catch (e) {
    // Restore old provider to maintain availability
    firstPartyEventLoggerProvider = oldProvider
    firstPartyEventLogger = oldLogger
    logError(e)
    return
  }

  // 4. Shut down old provider in the background
  void oldProvider.shutdown().catch(() => {})
}
```

This is a carefully designed hot-swap process:

  1. Disconnect before reconnecting -- Nullifying the logger causes concurrent logEventTo1P calls to skip (rather than write to a provider that's about to be shut down)
  2. Drain before closing -- forceFlush() ensures events in the old buffer aren't lost
  3. Rollback on failure -- If the new provider fails to create, the old one is restored to maintain availability
  4. Export failures persisted to disk -- The comment indicates that failed export events are written to a disk file, and the new exporter will retry them on startup

Event Sampling

```typescript
// src/services/analytics/firstPartyEventLogger.ts Lines 43-85
export function getEventSamplingConfig(): EventSamplingConfig {
  return getDynamicConfig_CACHED_MAY_BE_STALE<EventSamplingConfig>(
    EVENT_SAMPLING_CONFIG_NAME,
    {},
  )
}

export function shouldSampleEvent(eventName: string): number | null {
  const config = getEventSamplingConfig()
  const eventConfig = config[eventName]

  // No config = 100% recording
  if (!eventConfig) return null

  const sampleRate = eventConfig.sample_rate
  if (typeof sampleRate !== 'number' || sampleRate < 0 || sampleRate > 1) {
    return null
  }

  if (sampleRate >= 1) return null // 100%
  if (sampleRate <= 0) return 0 // Discard

  // Random sampling
  return Math.random() < sampleRate ? sampleRate : 0
}
```

The sampling configuration is fetched dynamically from GrowthBook's tengu_event_sampling_config. The return value semantics:

  • null -- 100% recording, no need to mark sample rate in metadata
  • 0 -- Discard this event
  • 0.05 -- This event was sampled and recorded, with sample_rate: 0.05 marked in metadata for downstream analysis to reconstruct true volumes
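
Weighting each recorded event by 1 / sample_rate reconstructs the true volume downstream. A quick simulation of that convention (standalone; the real function reads its rate from GrowthBook):

```typescript
// Same return convention as shouldSampleEvent, with the rate passed in directly
function sampleDecision(sampleRate: number): number | null {
  if (sampleRate >= 1) return null // record, no annotation
  if (sampleRate <= 0) return 0 // discard
  return Math.random() < sampleRate ? sampleRate : 0
}

const TRUE_VOLUME = 100_000
const rate = 0.05

let weighted = 0
for (let i = 0; i < TRUE_VOLUME; i++) {
  const decision = sampleDecision(rate)
  if (decision === null) weighted += 1 // unsampled: counts as one event
  else if (decision > 0) weighted += 1 / decision // sampled: counts as 1/0.05 = 20
}

// weighted is an unbiased estimate of TRUE_VOLUME
const relativeError = Math.abs(weighted - TRUE_VOLUME) / TRUE_VOLUME
```

At a 5% rate the estimator's relative standard deviation is about 1.4% for 100,000 events, so dashboards built on the weighted counts stay close to true volumes.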

The GrowthBook Feature Flag System

src/services/analytics/growthbook.ts manages the GrowthBook SDK client.

The CACHED_MAY_BE_STALE Pattern

Claude Code's GrowthBook call function names all include the _CACHED_MAY_BE_STALE suffix:

```typescript
// Used in sink.ts
checkStatsigFeatureGate_CACHED_MAY_BE_STALE(DATADOG_GATE_NAME)

// Used in firstPartyEventLogger.ts
getDynamicConfig_CACHED_MAY_BE_STALE<EventSamplingConfig>(
  EVENT_SAMPLING_CONFIG_NAME, {}
)

// Used in sinkKillswitch.ts
getDynamicConfig_CACHED_MAY_BE_STALE<Partial<Record<SinkName, boolean>>>(
  SINK_KILLSWITCH_CONFIG_NAME, {}
)
```

This naming convention is deliberate -- it reminds developers at every call site that:

  1. The returned value may be a stale cached value from a previous session
  2. Don't make security-critical decisions based on this value
  3. New values will be loaded asynchronously in the background

Sink Kill Switch

```typescript
// src/services/analytics/sinkKillswitch.ts Lines 1-25
import { getDynamicConfig_CACHED_MAY_BE_STALE } from './growthbook.js'

// Obfuscated name: per-sink analytics killswitch
const SINK_KILLSWITCH_CONFIG_NAME = 'tengu_frond_boric'

export type SinkName = 'datadog' | 'firstParty'

export function isSinkKilled(sink: SinkName): boolean {
  const config = getDynamicConfig_CACHED_MAY_BE_STALE<
    Partial<Record<SinkName, boolean>>
  >(SINK_KILLSWITCH_CONFIG_NAME, {})
  return config?.[sink] === true
}
```

Note that tengu_frond_boric is an obfuscated config name. If the 1P logging pipeline has issues, operations can set { "firstParty": true } via GrowthBook to immediately stop sending, without needing to push a client update.

The Sink Routing Layer

src/services/analytics/sink.ts is the routing hub for events:

```typescript
// src/services/analytics/sink.ts Lines 48-72
function logEventImpl(eventName: string, metadata: LogEventMetadata): void {
  // Sampling check
  const sampleResult = shouldSampleEvent(eventName)
  if (sampleResult === 0) return

  const metadataWithSampleRate =
    sampleResult !== null
      ? { ...metadata, sample_rate: sampleResult }
      : metadata

  if (shouldTrackDatadog()) {
    // Datadog is a general-purpose backend -- strip _PROTO_* keys
    void trackDatadogEvent(
      eventName,
      stripProtoFields(metadataWithSampleRate),
    )
  }

  // 1P receives the full payload (including _PROTO_*)
  logEventTo1P(eventName, metadataWithSampleRate)
}
```
...

The routing logic has a layered structure:

  1. Sampling -- Global sampling comes first; discarded events never enter any sink
  2. Datadog -- Dual filtering via GrowthBook gate + event allowlist, plus PII stripping
  3. 1P -- Receives complete data (including PII-tagged fields), stored under access-controlled storage

Datadog Gate Fallback Strategy

```typescript
// src/services/analytics/sink.ts Lines 29-43
let isDatadogGateEnabled: boolean | undefined = undefined

function shouldTrackDatadog(): boolean {
  if (isSinkKilled('datadog')) return false

  if (isDatadogGateEnabled !== undefined) {
    return isDatadogGateEnabled
  }

  // Fall back to cached value from previous session
  try {
    return checkStatsigFeatureGate_CACHED_MAY_BE_STALE(DATADOG_GATE_NAME)
  } catch {
    return false
  }
}
```

Three-tier fallback:

  1. If the kill switch is activated -> disable immediately
  2. If the current session has initialized -> use current value
  3. If not yet initialized -> use previous cached value (may be stale but avoids data loss)

Startup Performance Profiling

src/utils/startupProfiler.ts tracks every phase of CLI startup:

```typescript
// src/utils/startupProfiler.ts Lines 26-36
const DETAILED_PROFILING = isEnvTruthy(process.env.CLAUDE_CODE_PROFILE_STARTUP)

const STATSIG_SAMPLE_RATE = 0.005
const STATSIG_LOGGING_SAMPLED =
  process.env.USER_TYPE === 'ant' || Math.random() < STATSIG_SAMPLE_RATE

const SHOULD_PROFILE = DETAILED_PROFILING || STATSIG_LOGGING_SAMPLED
```

Two modes run in parallel:

  • Detailed profiling -- CLAUDE_CODE_PROFILE_STARTUP=1, manually enabled by any user, writes a complete report to disk
  • Sampled reporting -- 100% of internal users and 0.5% of external users automatically report key phase timings

profileCheckpoint Usage

main.tsx is densely populated with checkpoint calls:

```typescript
// profileCheckpoint calls in src/main.tsx (partial)
profileCheckpoint('main_tsx_entry') // Line 12
profileCheckpoint('main_tsx_imports_loaded') // Line 209
profileCheckpoint('main_function_start') // Line 586
profileCheckpoint('main_warning_handler_initialized') // Line 607
profileCheckpoint('main_client_type_determined') // Line 849
profileCheckpoint('main_before_run') // Line 853
profileCheckpoint('run_function_start') // Line 885
profileCheckpoint('preAction_start') // Line 908
profileCheckpoint('preAction_after_mdm') // Line 915
profileCheckpoint('preAction_after_init') // Line 917
profileCheckpoint('preAction_after_sinks') // Line 935
profileCheckpoint('preAction_after_migrations') // Line 951
profileCheckpoint('preAction_after_remote_settings') // Line 959
profileCheckpoint('action_handler_start') // Line 1007
profileCheckpoint('action_tools_loaded') // Line 1878
profileCheckpoint('action_before_setup') // Line 1904
profileCheckpoint('action_after_setup') // Line 1936
profileCheckpoint('action_commands_loaded') // Line 2031
profileCheckpoint('action_mcp_configs_loaded') // Line 2402
```

Phase Aggregation

```typescript
// src/utils/startupProfiler.ts Lines 49-54
const PHASE_DEFINITIONS = {
  import_time: ['cli_entry', 'main_tsx_imports_loaded'],
  init_time: ['init_function_start', 'init_function_end'],
  settings_time: ['eagerLoadSettings_start', 'eagerLoadSettings_end'],
  total_time: ['cli_entry', 'main_after_run'],
} as const
```

Fine-grained checkpoints are aggregated into meaningful phases -- import_time is module loading time, settings_time is configuration reading time. This data allows the team to precisely identify startup bottlenecks.
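
A minimal sketch of how marks roll up into phases using node:perf_hooks (phase and checkpoint names follow the excerpt; the measurement mechanics are an assumption):

```typescript
import { performance } from 'node:perf_hooks'

function profileCheckpoint(name: string): void {
  performance.mark(name)
}

// Simulated startup sequence
profileCheckpoint('cli_entry')
profileCheckpoint('main_tsx_imports_loaded')
profileCheckpoint('main_after_run')

const PHASE_DEFINITIONS = {
  import_time: ['cli_entry', 'main_tsx_imports_loaded'],
  total_time: ['cli_entry', 'main_after_run'],
} as const

// Each phase's duration is the interval between its start and end marks
const phases: Record<string, number> = {}
for (const [phase, [start, end]] of Object.entries(PHASE_DEFINITIONS)) {
  phases[phase] = performance.measure(phase, start, end).duration
}
```

Marks are cheap timestamps; the relatively expensive measure() aggregation only happens once, when the report is assembled.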

Profiling Reports

Setting CLAUDE_CODE_PROFILE_STARTUP=1 generates a complete report with memory snapshots at startup:

```typescript
// src/utils/startupProfiler.ts Lines 65-75
export function profileCheckpoint(name: string): void {
  if (!SHOULD_PROFILE) return

  const perf = getPerformance()
  perf.mark(name)

  // Only capture memory in detailed mode
  if (DETAILED_PROFILING) {
    memorySnapshots.push(process.memoryUsage())
  }
}
```

Note the if (!SHOULD_PROFILE) return short-circuit -- for users who aren't sampled, executing profileCheckpoint costs one function call and one boolean check, virtually zero.

Privacy and Analytics Disabling

src/services/analytics/config.ts defines the conditions for disabling analytics:

```typescript
// src/services/analytics/config.ts Lines 19-27
export function isAnalyticsDisabled(): boolean {
  return (
    process.env.NODE_ENV === 'test' ||
    isEnvTruthy(process.env.CLAUDE_CODE_USE_BEDROCK) ||
    isEnvTruthy(process.env.CLAUDE_CODE_USE_VERTEX) ||
    isEnvTruthy(process.env.CLAUDE_CODE_USE_FOUNDRY) ||
    isTelemetryDisabled()
  )
}
```

Analytics are completely disabled in the following cases:

  1. Test environment -- NODE_ENV=test
  2. Third-party cloud providers -- Data from Bedrock, Vertex, and Foundry users should not flow to Anthropic
  3. Privacy level -- User sets no-telemetry or essential-traffic
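
isEnvTruthy itself isn't shown in the excerpts; a plausible implementation (the exact accepted spellings are an assumption):

```typescript
// Hypothetical helper: treats common truthy spellings as "on"
function isEnvTruthy(value: string | undefined): boolean {
  if (!value) return false
  return ['1', 'true', 'yes', 'on'].includes(value.trim().toLowerCase())
}

const accepted = isEnvTruthy('1') && isEnvTruthy('TRUE')
const rejected = isEnvTruthy(undefined) || isEnvTruthy('0')
```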

There's also a more fine-grained control:

```typescript
// src/services/analytics/config.ts Lines 36-38
export function isFeedbackSurveyDisabled(): boolean {
  return process.env.NODE_ENV === 'test' || isTelemetryDisabled()
}
```

Feedback surveys are not restricted by third-party providers -- because surveys are local UI interactions that don't transmit transcript data. Enterprise customers capture responses via OTEL.

Datadog Data Security

The Datadog module has multiple layers of data protection:

```typescript
// src/services/analytics/datadog.ts Lines 164-168
export async function trackDatadogEvent(
  eventName: string,
  properties: { [key: string]: boolean | number | undefined },
): Promise<void> {
  if (process.env.NODE_ENV !== 'production') return

  // Don't send for 3P providers
  if (getAPIProvider() !== 'firstParty') return
```

```typescript
// src/services/analytics/datadog.ts Lines 196-217
  // Normalize MCP tool names to reduce cardinality
  if (
    typeof allData.toolName === 'string' &&
    allData.toolName.startsWith('mcp__')
  ) {
    allData.toolName = 'mcp'
  }

  // Normalize model names (external users only)
  if (process.env.USER_TYPE !== 'ant' && typeof allData.model === 'string') {
    const shortName = getCanonicalName(allData.model.replace(/\[1m]$/i, ''))
    allData.model = shortName in MODEL_COSTS ? shortName : 'other'
  }

  // Truncate dev version numbers
  if (typeof allData.version === 'string') {
    allData.version = allData.version.replace(
      /^(\d+\.\d+\.\d+-dev\.\d{8})\.t\d+\.sha[a-f0-9]+$/,
      '$1',
    )
  }
```

All three normalization operations serve cardinality control:

  1. MCP tool names -- High-cardinality names like mcp__filesystem__read are normalized to mcp
  2. Model names -- Non-standard model names from external users are normalized to other
  3. Version numbers -- Dev versions have their timestamp and SHA stripped, reducing the number of distinct version tags
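
The MCP and version rules can be exercised directly (sample values are illustrative; the model rule is omitted because it depends on the MODEL_COSTS table):

```typescript
// Collapse high-cardinality MCP tool names to a single tag
function normalizeToolName(toolName: string): string {
  return toolName.startsWith('mcp__') ? 'mcp' : toolName
}

// Strip the timestamp/SHA suffix from dev version strings
function normalizeVersion(version: string): string {
  return version.replace(/^(\d+\.\d+\.\d+-dev\.\d{8})\.t\d+\.sha[a-f0-9]+$/, '$1')
}

const tool = normalizeToolName('mcp__filesystem__read') // 'mcp'
const builtin = normalizeToolName('Bash') // unchanged
const version = normalizeVersion('1.2.3-dev.20250101.t456.shaabc123') // '1.2.3-dev.20250101'
```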

GrowthBook Experiment Events

```typescript
// src/services/analytics/firstPartyEventLogger.ts Lines 255-298
export function logGrowthBookExperimentTo1P(
  data: GrowthBookExperimentData,
): void {
  if (!is1PEventLoggingEnabled()) return
  if (!firstPartyEventLogger || isSinkKilled('firstParty')) return

  const userId = getOrCreateUserID()
  const { accountUuid, organizationUuid } = getCoreUserData(true)

  const attributes = {
    event_type: 'GrowthbookExperimentEvent',
    event_id: randomUUID(),
    experiment_id: data.experimentId,
    variation_id: data.variationId,
    ...(userId && { device_id: userId }),
    ...(accountUuid && { account_uuid: accountUuid }),
    ...(organizationUuid && { organization_uuid: organizationUuid }),
    environment: getEnvironmentForGrowthBook(),
  }

  firstPartyEventLogger.emit({
    body: 'growthbook_experiment',
    attributes,
  })
}
```

GrowthBook A/B experiment assignment events are recorded through the same 1P pipeline. This means experiment analysis and event analysis share the same data infrastructure -- no additional experimentation platform is needed.

Graceful Shutdown

```typescript
// src/services/analytics/datadog.ts Lines 151-157
export async function shutdownDatadog(): Promise<void> {
  if (flushTimer) {
    clearTimeout(flushTimer)
    flushTimer = null
  }
  await flushLogs()
}
```

```typescript
// src/services/analytics/firstPartyEventLogger.ts Lines 116-128
export async function shutdown1PEventLogging(): Promise<void> {
  if (!firstPartyEventLoggerProvider) return
  try {
    await firstPartyEventLoggerProvider.shutdown()
  } catch {
    // Ignore shutdown errors
  }
}
```

Before the process exits, gracefulShutdown() calls both functions to ensure buffered events are flushed. Datadog manually flushes its batch; 1P drains the BatchLogRecordProcessor's internal queue through the OpenTelemetry SDK's shutdown() method.
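
gracefulShutdown itself isn't excerpted; one plausible shape runs both flushes concurrently so a failing sink never blocks the other (the stubbed sinks below are illustrative, not the real implementations):

```typescript
const order: string[] = []

// Stubs standing in for the real shutdown functions
async function shutdownDatadog(): Promise<void> {
  order.push('datadog_flushed')
}

async function shutdown1PEventLogging(): Promise<void> {
  throw new Error('network down') // one sink failing...
}

async function gracefulShutdown(): Promise<void> {
  // allSettled collects rejections instead of throwing,
  // so both sinks always get their chance to flush
  await Promise.allSettled([shutdownDatadog(), shutdown1PEventLogging()])
  order.push('shutdown_complete')
}

await gracefulShutdown()
```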

Summary

Claude Code's telemetry system demonstrates best practices for CLI tool observability:

  • Event queue + lazy sink -- Zero-cost event recording during startup, asynchronous draining after initialization completes
  • Type system privacy guards -- LogEventMetadata only allows boolean | number | undefined, preventing code/path leaks at the type level
  • Dual sink architecture -- Datadog (general storage + allowlist filtering + PII stripping) and 1P (access-controlled + complete data)
  • GrowthBook dynamic configuration -- Sample rates, batch parameters, and sink switches can all be adjusted remotely without pushing client updates
  • CACHED_MAY_BE_STALE naming -- Reminds developers at every call site about the staleness of cached data
  • profileCheckpoint -- Zero-cost startup performance tracking with 0.5% sampling and automatic reporting
  • Multiple disable mechanisms -- Environment variables, privacy levels, third-party providers, and GrowthBook kill switches provide layered protection