Steven's Knowledge

Instrumentation

Auto vs manual spans, attributes, error tracking, context propagation across HTTP and queues

Tracing is only as useful as what you instrument. This page is the playbook for getting useful traces — not just "we have traces."

Auto vs Manual

| Auto-instrumentation | Manual spans |
| --- | --- |
| Patches popular libraries (HTTP, DB, queues, frameworks) | Code you write explicitly |
| Free coverage; minimal setup | Business-domain spans |
| Generic attributes (http.method, db.statement) | Domain-specific attributes (user.tier, order.total) |
| Updates with library versions | You maintain |

Use both. Auto for the boring transport-level stuff, manual for the things that matter to your business.

What Auto-Instrumentation Covers

In Node:

| Library | Spans created |
| --- | --- |
| express / fastify / koa | One span per request |
| http / https | One span per outbound request |
| pg, mysql, mysql2, mongodb, redis, ioredis, cassandra | One span per query |
| kafkajs, amqplib, aws-sdk (SQS/SNS/Kinesis) | One span per produce/consume |
| grpc, aws-sdk (other services) | One span per RPC/API call |
| graphql | One span per resolver |

Similar coverage in Python (opentelemetry-instrumentation-*), Java (the agent auto-discovers), Go (per-library packages), etc.

Manual Spans for Business Logic

const { trace, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('orders');

async function processOrder(order) {
  return await tracer.startActiveSpan('process-order', async (span) => {
    try {
      span.setAttributes({
        'order.id': order.id,
        'order.total_cents': order.total,
        'order.item_count': order.items.length,
        'user.id': order.userId,
      });

      const validated = await validate(order);          // gets its own span if instrumented
      const charged = await charge(order);
      const fulfilled = await fulfill(order);

      span.setAttribute('order.fulfilled', true);
      return { validated, charged, fulfilled };
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      throw err;
    } finally {
      span.end();
    }
  });
}

startActiveSpan makes this span the current context — child spans created by code inside automatically link.
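
The try/catch/finally shape above repeats for every manual span, so it can be worth wrapping once. The following is a minimal sketch, assuming a hypothetical helper named withSpan that is not part of the OpenTelemetry API. It takes the tracer as a parameter and hardcodes the numeric value of SpanStatusCode.ERROR only to stay dependency-free here; in real code, import SpanStatusCode from @opentelemetry/api instead.

```javascript
// Hypothetical helper, not part of the OpenTelemetry API.
// 2 is the numeric value of SpanStatusCode.ERROR in @opentelemetry/api.
const SPAN_STATUS_ERROR = 2;

async function withSpan(tracer, name, attributes, fn) {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      span.setAttributes(attributes);
      return await fn(span);
    } catch (err) {
      // Record the exception and mark the span as failed, then rethrow.
      span.recordException(err);
      span.setStatus({ code: SPAN_STATUS_ERROR, message: err.message });
      throw err;
    } finally {
      // Always end the span, on success and on failure.
      span.end();
    }
  });
}
```

A call site might then look like `await withSpan(tracer, 'charge-order', { 'order.id': order.id }, () => charge(order));`, which keeps the business code free of span bookkeeping.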

Attributes

Attributes are key/value pairs on a span. They're searchable, filterable, and aggregatable in the UI.

Semantic Conventions

OTel defines standard names. Use them where they apply:

| Convention | Examples |
| --- | --- |
| HTTP | http.method, http.status_code, http.url, http.target |
| Database | db.system, db.statement, db.name, db.operation |
| Messaging | messaging.system, messaging.destination.name, messaging.operation |
| Cloud | cloud.region, cloud.availability_zone, cloud.account.id |
| User | enduser.id, enduser.role |
| Custom | app.* prefix for your own |

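Standard names mean your dashboards work across services and your tooling understands them. The names above are the older forms that many instrumentations still emit; newer releases of the conventions rename some of them (for example, http.method became http.request.method and http.status_code became http.response.status_code), so check which version your SDK emits.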

What to Add

| Add | Don't add |
| --- | --- |
| User ID / account ID | Email / PII (unless hashed) |
| Resource IDs (order, product) | Full request body |
| Business tier (free / pro / enterprise) | Auth tokens / passwords |
| Feature-flag variants | High-cardinality IDs that won't help search |
| Outcomes ("error.type=PaymentDeclined") | Stack traces (use events / status) |

span.setAttribute('user.id', userId);
span.setAttribute('app.feature_flag.new_checkout', 'treatment');
span.setAttribute('app.payment.method', 'card');
span.setAttribute('app.cart.items.count', items.length);
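
The "unless hashed" exception in the table can be sketched as follows. This is an illustration under assumptions: hashPii and userAttributes are hypothetical helper names, the app.user.* attribute keys are made up for the example, and a keyed hash (HMAC) is used instead of a bare SHA-256 because plain hashes of emails are easy to reverse with a dictionary.

```javascript
const { createHmac } = require('node:crypto');

// Hypothetical helper: keyed hash so raw emails never reach the tracing backend.
// Normalizes case and whitespace so the same address always hashes the same way.
function hashPii(value, secret) {
  return createHmac('sha256', secret).update(value.trim().toLowerCase()).digest('hex');
}

// Hypothetical helper that builds the attribute map for a span.
function userAttributes(user, secret) {
  return {
    'user.id': user.id,
    'app.user.tier': user.tier,
    'app.user.email_hash': hashPii(user.email, secret),
  };
}
```

The result can be passed straight to `span.setAttributes(...)`. The secret should come from your own configuration; where it lives is outside the scope of this sketch.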

Errors

Two ways to record errors:

// 1. Record the exception (preserves stack, type)
span.recordException(err);

// 2. Mark the span as errored
span.setStatus({
  code: SpanStatusCode.ERROR,
  message: err.message,
});

Do both. recordException adds a structured error event; setStatus sets the span's status. Together your trace shows the error inline and the failed span is highlighted in the UI.

For HTTP, most auto-instrumentations mark server spans as errored for status codes ≥ 500. Be careful with 4xx: on server spans they are usually left unset (not errored) by default, while client spans are typically marked as errors; configure to your taste.
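
The default mapping can be written down as a small function. This is a sketch of the behavior as the OTel HTTP semantic conventions describe it, not code from any instrumentation; spanStatusForHttp is a hypothetical name, the span kind is passed as a plain string for readability, and individual instrumentations may differ.

```javascript
// Sketch of the default HTTP status-code to span-status mapping.
// kind is 'SERVER' or 'CLIENT' here; the real API uses the SpanKind enum.
function spanStatusForHttp(statusCode, kind) {
  // 5xx: the request failed on the server side, an error for both kinds.
  if (statusCode >= 500) return 'ERROR';
  // 4xx: the caller's fault. A client span treats it as an error,
  // a server span leaves the status unset.
  if (statusCode >= 400) return kind === 'CLIENT' ? 'ERROR' : 'UNSET';
  return 'UNSET';
}
```

If a 404 on a particular endpoint signals a real problem for your service, override the status on that span explicitly rather than relying on the default.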

Events

Events are timestamped messages within a span:

span.addEvent('retry', {
  'retry.count': attempt,
  'retry.reason': 'connection-refused',
});

span.addEvent('cache.miss', { 'cache.key': cacheKey });

Events are like log lines, but tied to the span. Easier to correlate than separate logs.
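
Retries are a natural fit for events: one span for the whole operation, one event per failed attempt. A minimal sketch, assuming a hypothetical helper named withRetries; it accepts any object with an addEvent(name, attributes) method, which is the shape of an OTel span, and omits backoff for brevity.

```javascript
// Hypothetical helper: retries fn and records one span event per failed attempt.
// span is anything with an addEvent(name, attributes) method, such as an OTel span.
async function withRetries(span, fn, maxAttempts = 3) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // Out of attempts: let the caller record the exception on the span.
      if (attempt >= maxAttempts) throw err;
      span.addEvent('retry', {
        'retry.count': attempt,
        'retry.reason': err.code ?? err.message,
      });
    }
  }
}
```

The final failure is deliberately not turned into an event; it propagates so the surrounding span can call recordException and setStatus once.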

Context Propagation Across Protocols

HTTP (automatic with OTel)

client.fetch(...) → injects traceparent header
server middleware  → extracts, creates child span

Just use the auto-instrumented HTTP client. You don't write any code.
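
When debugging propagation it helps to know what the injected header contains. The sketch below parses a W3C traceparent value by hand; parseTraceparent is a hypothetical helper for inspection only (the OTel propagator does this for you), and it handles only the version 00 layout of four dash-separated fields.

```javascript
// W3C Trace Context: version-traceid-parentid-flags, all lowercase hex.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Hypothetical helper for debugging; returns null for malformed headers.
function parseTraceparent(header) {
  if (typeof header !== 'string') return null;
  const match = TRACEPARENT.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, parentSpanId, flags] = match;
  // Version ff and all-zero IDs are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
  // The lowest bit of the flags byte is the "sampled" flag.
  return { version, traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```

For example, `parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01')` yields the trace ID `4bf92f3577b34da6a3ce929d0e0e4736`, the caller's span ID `00f067aa0ba902b7`, and `sampled: true`.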

Kafka

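const { context, propagation, trace, SpanKind, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('orders');

// Producer: inject the span's context into the message headers
const span = tracer.startSpan('publish-order-created', { kind: SpanKind.PRODUCER });
try {
  const headers = {};
  propagation.inject(trace.setSpan(context.active(), span), headers);
  await producer.send({ topic: 'orders', messages: [{ value, headers }] });
} finally {
  span.end();
}

// Consumer: extract the parent context and continue the trace
await consumer.run({
  eachMessage: async ({ message }) => {
    // kafkajs delivers header values as Buffers; the propagator expects strings
    const carrier = Object.fromEntries(
      Object.entries(message.headers ?? {}).map(([key, value]) => [key, value?.toString()]),
    );
    const parentCtx = propagation.extract(context.active(), carrier);
    await tracer.startActiveSpan('process-order-created', { kind: SpanKind.CONSUMER }, parentCtx, async (span) => {
      try {
        // process...
      } catch (err) {
        span.recordException(err);
        span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
        throw err;
      } finally {
        span.end();
      }
    });
  },
});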

Kafka auto-instrumentation (when enabled) handles this for you in Node; in Go and some other languages you often write the inject and extract calls yourself.

Background Jobs (Sidekiq / BullMQ / Celery)

Pattern: inject the trace context as job metadata at enqueue; extract and continue in the worker.

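const { context, propagation, trace, SpanKind } = require('@opentelemetry/api');

// Enqueue: serialize the active trace context into the job payload
const carrier = {};
propagation.inject(context.active(), carrier);
await queue.add('send-email', {
  to: user.email,
  body: '...',
  _trace: carrier,    // e.g. { traceparent: '00-...' }
});

// Worker (Bull-style API; in BullMQ the handler passed to new Worker(...) has the same shape)
queue.process('send-email', async (job) => {
  const parentCtx = propagation.extract(context.active(), job.data._trace ?? {});
  await tracer.startActiveSpan('send-email', { kind: SpanKind.CONSUMER }, parentCtx, async (span) => {
    try {
      // ...
    } finally {
      span.end();
    }
  });
});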

Many job runners now have OTel integrations; check yours first.

Database Queries

Database queries are auto-spanned. The valuable bit is correlating with DB-side logs. Add the trace ID to your SQL queries via comments:

const traceId = trace.getActiveSpan()?.spanContext().traceId;
await db.query(`/* trace_id=${traceId} */ SELECT * FROM users WHERE id = ?`, [id]);

Now your slow query log shows the trace ID — you can jump from a slow query to its parent trace in your APM.
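
One caveat with the snippet above: when no span is active, the template writes `trace_id=undefined` into the query. A minimal sketch of a guard, assuming a hypothetical helper named withTraceComment; it adds the comment only for a well-formed trace ID, which also ensures nothing but hex characters can end up inside the SQL comment.

```javascript
// Hypothetical helper: prefix SQL with a trace_id comment only when the ID is well-formed.
function withTraceComment(sql, traceId) {
  // A valid trace ID is 32 lowercase hex characters and not all zeros.
  const valid =
    typeof traceId === 'string' &&
    /^[0-9a-f]{32}$/.test(traceId) &&
    !/^0+$/.test(traceId);
  return valid ? `/* trace_id=${traceId} */ ${sql}` : sql;
}
```

Usage would look like `db.query(withTraceComment('SELECT * FROM users WHERE id = ?', traceId), [id])`, with the query parameters still passed separately.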

Trace IDs in Logs

Critically: emit the trace ID with every log line. Otherwise correlating logs to traces is painful.

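const { trace } = require('@opentelemetry/api');

const logger = require('pino')({
  // mixin runs on every log call; both fields are undefined when no span is active
  mixin: () => {
    const spanContext = trace.getActiveSpan()?.spanContext();
    return { trace_id: spanContext?.traceId, span_id: spanContext?.spanId };
  },
});

logger.info({ user_id: 42 }, 'order placed');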

Now every log line emitted inside a span carries trace_id and span_id. Your log aggregator (ELK, Loki, Datadog) can link directly to your tracing UI.

Baggage: Cross-Span Context

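Baggage is OTel's way of propagating key/value pairs alongside the trace context, so every service in the trace can read them. Baggage entries are not added to spans automatically; copy them onto span attributes where you want them searchable.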

const { context, propagation } = require('@opentelemetry/api');

// Set baggage
const baggage = propagation.getBaggage(context.active()) || propagation.createBaggage();
const newBaggage = baggage.setEntry('user.tier', { value: 'enterprise' });
const newCtx = propagation.setBaggage(context.active(), newBaggage);

context.with(newCtx, async () => {
  // baggage is automatically propagated across HTTP calls
  await downstreamCall();
});

// Read baggage anywhere
const tier = propagation.getBaggage(context.active())?.getEntry('user.tier')?.value;

Useful for context that should flow throughout the request: user tier, region, feature-flag bucket. Keep baggage entries few, small, and free of sensitive values, because they are sent as a header on every outbound call in the trace.
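
To make the per-call cost concrete, here is a rough sketch of how entries end up in the W3C baggage header. encodeBaggage and baggageHeaderBytes are hypothetical helpers for estimation only; the real propagator also handles entry metadata and key encoding, which this sketch ignores.

```javascript
// Sketch of baggage serialization: comma-separated key=value pairs,
// with values percent-encoded. Entry metadata is ignored here.
function encodeBaggage(entries) {
  return Object.entries(entries)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join(',');
}

// Bytes added to every outbound request that carries this baggage.
function baggageHeaderBytes(entries) {
  return Buffer.byteLength(encodeBaggage(entries), 'utf8');
}
```

For `{ 'user.tier': 'enterprise', region: 'eu-west-1' }` the header value is `user.tier=enterprise,region=eu-west-1`, 37 bytes on every hop. A few short entries are cheap; a serialized user object is not.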

Sampling at the SDK

For volume control (we'll cover head/tail sampling in Best Practices):

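const { NodeSDK } = require('@opentelemetry/sdk-node');
const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const sdk = new NodeSDK({
  // Root spans: sample 10%. Child spans: follow the parent's decision.
  sampler: new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(0.1) }),
  // ...
});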

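With a parent-based sampler the decision happens at the head: the first service to see the trace decides for everyone downstream. A head sampler cannot know whether a trace will fail later, so keeping every errored trace requires tail sampling; see Best Practices.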
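
Ratio sampling is deterministic rather than random per span: the decision is a function of the trace ID. The sketch below illustrates the idea in simplified form and is not the SDK's exact algorithm; shouldSample is a hypothetical name, and using the last 8 hex characters as the bucket is an assumption made for the example.

```javascript
// Simplified sketch of ratio sampling, not the SDK's exact algorithm:
// map the last 8 hex chars of the trace ID to [0, 2^32) and compare to a threshold.
function shouldSample(traceId, ratio) {
  const bucket = parseInt(traceId.slice(-8), 16);
  return bucket < ratio * 2 ** 32;
}
```

Because the input is the trace ID, the same trace gets the same answer every time the function runs, and raising the ratio only ever adds traces to the sampled set.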

What's Next

You can instrument anything and propagate context across protocols. Best Practices covers operating tracing at scale — sampling strategy, retention, costs, correlating with logs and metrics.
