# Sample traces
A high-traffic service can produce millions of spans per hour. Most of them are uninteresting — successful requests that all look the same. Sampling is the standard answer: send a representative subset, keep cost manageable, retain the ability to debug.
## Two flavors

### Head-based sampling

The decision is made at trace start. Cheap and deterministic, but blind to outcome: you might sample away the one slow trace.
```js
// OpenTelemetry — sample 10% of traces
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-base');

const sdk = new NodeSDK({
  serviceName: 'demo-api',
  sampler: new TraceIdRatioBasedSampler(0.1), // 10%
  // ...
});
```

Use head sampling for sustained-high-volume services where you can afford to lose individual traces.
### Tail-based sampling

The decision is made after the trace completes. Expensive (you have to buffer everything until you decide), but smart: keep all errors, all slow traces, plus a sample of normal ones.
The OpenTelemetry Collector ships a tail sampling processor; run the collector as a sidecar or daemon between your apps and SiteQwality:
```yaml
processors:
  tail_sampling:
    decision_wait: 30s
    policies:
      - name: errors-policy
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-policy
        type: latency
        latency: { threshold_ms: 1000 }
      - name: sample-rest
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }

exporters:
  otlphttp:
    endpoint: https://traces.siteqwality.com/v1/traces
    headers:
      Authorization: Bearer ${SITEQWALITY_API_KEY}

service:
  pipelines:
    traces:
      processors: [tail_sampling]
      exporters: [otlphttp]
```

Use tail sampling when:
- You can run an OTel collector in your infra.
- Most traces are uninteresting but the few interesting ones are critical.
- You want to keep 100% of errors regardless of volume.
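The decision the three policies above encode can be sketched in a few lines. This is illustrative only (the real `tail_sampling` processor is configured in YAML, not hand-written), and the span field names `status` and `durationMs` are assumptions, not the OTel data model:

```javascript
// Illustrative sketch of the tail-sampling decision, made once per
// completed trace. Field names (status, durationMs) are assumptions.
function keepTrace(spans, { thresholdMs = 1000, sampleRate = 0.05 } = {}) {
  if (spans.some((s) => s.status === 'ERROR')) return true;       // errors-policy
  if (spans.some((s) => s.durationMs > thresholdMs)) return true; // slow-policy
  return Math.random() < sampleRate;                              // sample-rest
}
```

Note that the policies are evaluated in order: an error or a slow span forces a keep before the probabilistic roll ever happens.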
## Recommended starting point

For most teams:
- Low volume (under 10 spans/sec): No sampling. Send everything.
- Mid volume (10–500 spans/sec): Head sample at 30–50% in the app; drop the rest.
- High volume (over 500 spans/sec): Tail sample via a collector. Keep 100% errors + 5% successful.
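The tiers above reduce to a tiny helper if you want a starting value in code (hypothetical, for illustration only):

```javascript
// Hypothetical helper encoding the volume tiers above.
function recommendedStrategy(spansPerSec) {
  if (spansPerSec < 10) return 'no sampling';                      // send everything
  if (spansPerSec <= 500) return 'head sampling at 30-50%';
  return 'tail sampling: 100% errors + 5% successes';
}
```

Treat the thresholds as rough guides, not hard cutoffs; what matters is trending in the right direction as volume grows.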
## Per-request override

Sometimes you want to force a trace through regardless of the sampler:
```js
const { trace, SpanKind } = require('@opentelemetry/api');

app.post('/api/critical-thing', async (req, res) => {
  const span = trace.getTracer('demo').startSpan('critical_thing', {
    kind: SpanKind.SERVER,
    attributes: { 'sampling.priority': 1 }, // hint to sampler: keep me
  });
  // ...
});
```

Many sampler implementations honor `sampling.priority`. Check yours.
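One way a sampler can honor the hint is to wrap a base sampler and short-circuit when the attribute is present. A minimal sketch, assuming the `shouldSample` shape from `@opentelemetry/sdk-trace-base`; the numeric decision value below is a stand-in (use the SDK's `SamplingDecision` enum in real code):

```javascript
// Sketch of a wrapper sampler that always keeps sampling.priority === 1.
// Assumption: 2 stands in for SamplingDecision.RECORD_AND_SAMPLED.
const RECORD_AND_SAMPLED = 2;

class PrioritySampler {
  constructor(baseSampler) {
    this.base = baseSampler; // e.g. a TraceIdRatioBasedSampler
  }
  shouldSample(context, traceId, spanName, spanKind, attributes, links) {
    if (attributes && attributes['sampling.priority'] === 1) {
      return { decision: RECORD_AND_SAMPLED }; // forced keep, skip the base sampler
    }
    // Otherwise defer to the wrapped sampler's decision.
    return this.base.shouldSample(context, traceId, spanName, spanKind, attributes, links);
  }
}
```

Pass an instance as the `sampler` option in place of the base sampler; flagged spans are then kept no matter what ratio the base sampler applies.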
## Cost vs visibility tradeoff

| Sampling rate | Use case |
|---|---|
| 100% | Dev/staging, low-traffic prod, anything < 50 spans/sec. |
| 50% | Mid-traffic prod where you can spot patterns from half the data. |
| 10% | High-traffic prod with head-based sampling. |
| 1–5% + 100% errors | Very high traffic, tail-based. |