Sample traces

A high-traffic service can produce millions of spans per hour. Most of them are uninteresting — successful requests that all look the same. Sampling is the standard answer: send a representative subset, keep cost manageable, retain the ability to debug.

Head-based sampling

The decision is made at trace start. This is cheap and deterministic but blind to outcome, so you might sample away the one slow trace.

// OpenTelemetry: sample 10% of traces
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const sdk = new NodeSDK({
  serviceName: 'demo-api',
  sampler: new TraceIdRatioBasedSampler(0.1), // 10%
  // ...
});

sdk.start(); // the sampler only takes effect once the SDK is started

Use head sampling for services with sustained high volume where you can afford to lose individual traces.
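To illustrate why a head decision is cheap and deterministic, here is a minimal sketch that derives keep-or-drop from the trace ID alone. It is a simplified illustration, not the algorithm TraceIdRatioBasedSampler uses internally, and the function name shouldSampleAtHead is hypothetical.

```javascript
// Hypothetical, simplified head sampler: the decision depends only on the
// trace ID and the ratio, so every service seeing the same trace ID agrees.
function shouldSampleAtHead(traceId, ratio) {
  if (ratio <= 0) return false;
  if (ratio >= 1) return true;
  // Treat the first 8 hex characters of the trace ID as a 32-bit unsigned integer.
  const bucket = parseInt(traceId.slice(0, 8), 16);
  // Keep the trace when its bucket falls below ratio * 2^32.
  return bucket < ratio * 0x100000000;
}

module.exports = { shouldSampleAtHead };
```

Because nothing about the request outcome feeds into the decision, an error or a slow response has the same chance of being dropped as any other trace.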

Tail-based sampling

The decision is made after the trace completes. This is expensive, because you have to buffer every span until you decide, but smart: you can keep all errors, all slow traces, plus a sample of normal ones.

The OpenTelemetry Collector (contrib distribution) includes a tail_sampling processor. Run the collector as a sidecar or daemon between your apps and SiteQwality:

# otel-collector-config.yaml
receivers:
  # Assumed: apps send OTLP to the collector. A pipeline needs at least one receiver.
  otlp:
    protocols:
      grpc:
      http:
processors:
  tail_sampling:
    decision_wait: 30s
    policies:
      - name: errors-policy
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-policy
        type: latency
        latency: { threshold_ms: 1000 }
      - name: sample-rest
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
exporters:
  otlphttp:
    # traces_endpoint takes the full URL; `endpoint` is a base URL and gets /v1/traces appended.
    traces_endpoint: https://traces.siteqwality.com/v1/traces
    headers:
      Authorization: Bearer ${SITEQWALITY_API_KEY}
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlphttp]
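
To make the tail decision concrete, here is a minimal sketch of the keep-or-drop logic applied once a trace's spans have been buffered: keep errors, keep slow traces, then keep a percentage of the rest. It is not the collector's implementation. The function name decideAfterCompletion and the span shape ({ status, startMs, endMs }) are hypothetical, and the defaults mirror the 1000 ms and 5% values used in this section.

```javascript
// Hypothetical sketch of a tail-sampling decision over a completed trace.
// `spans` is an array of { status: 'OK' | 'ERROR', startMs: number, endMs: number }.
function decideAfterCompletion(spans, options = {}) {
  const {
    latencyThresholdMs = 1000,
    samplingPercentage = 5,
    random = Math.random, // injectable so the decision can be tested
  } = options;

  // Policy 1: keep any trace that contains an error span.
  if (spans.some((s) => s.status === 'ERROR')) {
    return { keep: true, reason: 'error' };
  }

  // Policy 2: keep traces whose overall duration exceeds the threshold.
  const start = Math.min(...spans.map((s) => s.startMs));
  const end = Math.max(...spans.map((s) => s.endMs));
  if (end - start > latencyThresholdMs) {
    return { keep: true, reason: 'slow' };
  }

  // Policy 3: keep a fixed percentage of everything else.
  if (random() * 100 < samplingPercentage) {
    return { keep: true, reason: 'sampled' };
  }
  return { keep: false, reason: 'dropped' };
}

module.exports = { decideAfterCompletion };
```

The buffering cost comes from the fact that none of these checks can run until the last span of the trace has arrived or the wait window has expired.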

Use tail sampling when:

  • You can run an OTel collector in your infra.
  • Most traces are uninteresting but the few interesting ones are critical.
  • You want to keep 100% of errors regardless of volume.

For most teams:

  • Low volume (under 10 spans/sec): No sampling. Send everything.
  • Mid volume (10–500 spans/sec): Head sample at 30–50% in the app and send only the sampled traces.
  • High volume (over 500 spans/sec): Tail sample via a collector. Keep 100% errors + 5% successful.
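
The thresholds above can be written down as a small lookup, for example to pick a default per service in a deployment script. This is only a sketch of the rule of thumb in this section; the function name recommendSampling and the shape of its return value are hypothetical.

```javascript
// Hypothetical helper that encodes the volume tiers listed above.
function recommendSampling(spansPerSec) {
  if (spansPerSec < 10) {
    // Low volume: send everything.
    return { strategy: 'none', keep: '100%' };
  }
  if (spansPerSec <= 500) {
    // Mid volume: head sample in the app.
    return { strategy: 'head', keep: '30-50%' };
  }
  // High volume: tail sample via a collector.
  return { strategy: 'tail', keep: '100% of errors + 5% of successful traces' };
}

module.exports = { recommendSampling };
```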

Sometimes you want to force a trace through regardless of the sampler:

const { trace, SpanKind } = require('@opentelemetry/api');

// Assumes `app` is an existing Express app.
app.post('/api/critical-thing', async (req, res) => {
  const span = trace.getTracer('demo').startSpan('critical_thing', {
    kind: SpanKind.SERVER,
    attributes: { 'sampling.priority': 1 }, // hint to sampler: keep me
  });
  try {
    // ...
  } finally {
    span.end(); // always end the span so it can be exported
  }
});

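sampling.priority is a convention inherited from OpenTracing, and many sampler implementations honor it. OpenTelemetry's built-in samplers, including TraceIdRatioBasedSampler, do not read it, so the hint only works if your sampler or collector policy checks the attribute. Check yours.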
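
If your sampler does not read the attribute, one option is a thin wrapper that checks sampling.priority before delegating. The sketch below follows the shape of the OpenTelemetry JS Sampler interface (shouldSample and toString); the class name PriorityAwareSampler is hypothetical, and the decision constant is hardcoded only to keep the example free of dependencies.

```javascript
// Numeric value of SamplingDecision.RECORD_AND_SAMPLED in @opentelemetry/sdk-trace-base.
// In real code, import the enum instead of hardcoding the value.
const RECORD_AND_SAMPLED = 2;

// Hypothetical wrapper: force-keep spans started with sampling.priority > 0,
// otherwise fall back to the wrapped sampler's decision.
class PriorityAwareSampler {
  constructor(delegate) {
    this.delegate = delegate;
  }

  shouldSample(context, traceId, spanName, spanKind, attributes, links) {
    const priority = attributes ? attributes['sampling.priority'] : undefined;
    if (typeof priority === 'number' && priority > 0) {
      return { decision: RECORD_AND_SAMPLED };
    }
    return this.delegate.shouldSample(context, traceId, spanName, spanKind, attributes, links);
  }

  toString() {
    return `PriorityAwareSampler{${this.delegate.toString()}}`;
  }
}

module.exports = { PriorityAwareSampler };
```

It would be wired in as sampler: new PriorityAwareSampler(new TraceIdRatioBasedSampler(0.1)). The sampler only sees attributes passed to startSpan, so an attribute set later with span.setAttribute arrives too late to influence the decision.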

| Sampling rate | Use case |
| --- | --- |
| 100% | Dev/staging, low-traffic prod, anything < 50 spans/sec. |
| 50% | Mid-traffic prod where you can spot patterns from half the data. |
| 10% | High-traffic prod with head-based sampling. |
| 1–5% + 100% errors | Very high traffic, tail-based. |