Metrics

A metric is a numeric time-series — a value plus a timestamp plus a set of tags. Use metrics for anything you’d put on a chart: request counts, error rates, latencies, queue depths, business KPIs.

Use metrics when:

  • You want to alert on a numeric threshold (error_rate > 1%, queue_depth > 1000).
  • You want a chart over time, optionally split by a tag (requests by region).
  • You’re sending high-volume data and can’t afford to log every event (e.g. 10K req/s).

Don’t use metrics:

  • For one-off events with rich context. That’s a log.
  • For arbitrary, high-cardinality data (per-user IDs, full URLs). Tag explosion will hurt query performance and your bill.

A gauge reports the current value of something. The value can go up or down from one data point to the next.

{
  "type": "gauge",
  "name": "queue.depth",
  "value": 42,
  "tags": { "queue": "orders" }
}

Examples: queue depth, connection pool size, in-flight requests, memory usage.

A counter is a monotonically increasing value. SiteQwality computes deltas and rates server-side.

{
  "type": "counter",
  "name": "http.requests.total",
  "value": 1,
  "tags": { "method": "GET", "status": "200" }
}

Examples: requests served, errors, jobs processed.
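
To illustrate what a delta or rate over a counter means, here is a minimal sketch in Python. It is not SiteQwality’s server-side implementation. It assumes samples arrive as (timestamp, cumulative value) pairs and that a drop in value means the counter was reset; the names `Sample` and `per_second_rate` are hypothetical.

```python
from typing import List, Tuple

# Assumed sample shape: (unix timestamp in seconds, cumulative counter value).
Sample = Tuple[float, float]


def per_second_rate(samples: List[Sample]) -> float:
    """Average per-second increase across a window of cumulative samples."""
    if len(samples) < 2:
        return 0.0  # a single sample carries no delta
    ordered = sorted(samples)
    increase = 0.0
    for (_, prev), (_, cur) in zip(ordered, ordered[1:]):
        # Assumption: a drop is a counter reset (e.g. a process restart),
        # so the new value is counted as the increase since the reset.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = ordered[-1][0] - ordered[0][0]
    return increase / elapsed if elapsed > 0 else 0.0
```

For example, `per_second_rate([(0, 100), (30, 160), (60, 220)])` sees an increase of 120 over 60 seconds and returns 2.0.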

A histogram is a distribution of values bucketed by an upper bound (le). Bucket counts are cumulative, the same shape as Prometheus histograms.

{
  "type": "histogram",
  "name": "http.request.duration_ms",
  "buckets": [
    { "le": "10", "count": 0 },
    { "le": "50", "count": 5 },
    { "le": "100", "count": 12 },
    { "le": "500", "count": 18 },
    { "le": "+Inf", "count": 20 }
  ],
  "sum": 4250,
  "count": 20,
  "tags": { "endpoint": "/api/checkout" }
}

Examples: request latency, request body size, computation time. Histograms allow p50/p95/p99 queries.
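
The following Python sketch shows one way to build a payload of this shape from raw observations, and how a percentile can be estimated from cumulative buckets. The helper names `build_histogram` and `estimate_quantile` are hypothetical. The interpolation follows the approach Prometheus uses (linear within a bucket, with the lowest bucket assumed to start at 0); whether SiteQwality estimates percentiles the same way is an assumption.

```python
from typing import Dict, List, Sequence


def build_histogram(name: str, values: Sequence[float],
                    bounds: Sequence[int], tags: Dict[str, str]) -> dict:
    """Bucket raw observations into cumulative `le` buckets."""
    buckets = [
        # Cumulative: each bucket counts every value <= its upper bound.
        {"le": str(bound), "count": sum(1 for v in values if v <= bound)}
        for bound in sorted(bounds)
    ]
    buckets.append({"le": "+Inf", "count": len(values)})
    return {
        "type": "histogram",
        "name": name,
        "buckets": buckets,
        "sum": sum(values),
        "count": len(values),
        "tags": dict(tags),
    }


def estimate_quantile(q: float, buckets: List[dict]) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets."""
    total = buckets[-1]["count"]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0  # assumption: values start at 0
    for bucket in buckets:
        upper = float(bucket["le"])  # float("+Inf") parses to infinity
        if bucket["count"] >= rank:
            in_bucket = bucket["count"] - lower_count
            if upper == float("inf") or in_bucket == 0:
                # Cannot interpolate into an open-ended or empty bucket.
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper - lower_bound) * fraction
        lower_bound, lower_count = upper, bucket["count"]
    return lower_bound
```

Applied to the example buckets above, `estimate_quantile(0.5, buckets)` falls in the 50 to 100 bucket and returns roughly 85.7. The estimate is only as precise as the bucket boundaries, so choose bounds close to the thresholds you care about.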

Tags are arbitrary key/value strings. Use them consistently:

Tag           Convention
service       Logical service name. service:api, service:billing.
env           env:prod, env:staging, env:dev.
region        region:us-east-1.
endpoint      The route. endpoint:/api/checkout.
status_class  status_class:2xx rather than per-status-code (which explodes cardinality).

Avoid high-cardinality tags like user IDs, request IDs, or full URLs. The rule of thumb: if it’s unique per request, it’s a log field, not a metric tag.
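
As a rough illustration of these conventions, the Python sketch below collapses a status code into a status_class value and flags tag keys with too many distinct values before they are sent. Both helpers and the `limit` threshold are hypothetical; SiteQwality does not document a specific cardinality limit here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set


def status_class(status_code: int) -> str:
    """Collapse an HTTP status code into its class, e.g. 404 -> "4xx"."""
    if not 100 <= status_code <= 599:
        raise ValueError(f"not an HTTP status code: {status_code}")
    return f"{status_code // 100}xx"


def high_cardinality_keys(tag_sets: Iterable[Mapping[str, str]],
                          limit: int = 100) -> List[str]:
    """Return tag keys with more than `limit` distinct values.

    `limit` is an illustrative threshold, not a documented one.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for tags in tag_sets:
        for key, value in tags.items():
            seen[key].add(value)
    return sorted(key for key, values in seen.items() if len(values) > limit)
```

Running `high_cardinality_keys` over a sample of outgoing tag sets in a test or staging environment is one way to catch a per-request tag such as a user ID before it reaches production.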

When querying, the available aggregations depend on the metric type:

  • avg, min, max, sum: work on any metric type.
  • p50, p95, p99: only on histograms.
  • rate: only on counters; computes the per-second rate over each time bucket.

Group by any tag to split the series. Filter by any tag to scope.
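
The rules above can be modelled in a few lines of Python. This is an in-memory illustration of the semantics only, not the SiteQwality query API: `ALLOWED_AGGREGATIONS`, `is_allowed`, and `aggregate` are hypothetical names, and points are assumed to be dicts with "value" and "tags" keys like the gauge example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Mapping, Optional

BASIC = {"avg", "min", "max", "sum"}
ALLOWED_AGGREGATIONS = {
    "gauge": BASIC,
    "counter": BASIC | {"rate"},
    "histogram": BASIC | {"p50", "p95", "p99"},
}


def is_allowed(metric_type: str, aggregation: str) -> bool:
    """Check an aggregation against the per-type rules listed above."""
    return aggregation in ALLOWED_AGGREGATIONS.get(metric_type, set())


def aggregate(points: List[dict], aggregation: str,
              group_by: Optional[str] = None,
              where: Optional[Mapping[str, str]] = None) -> Dict[Optional[str], float]:
    """Filter points by tag, split them by one tag, and aggregate each group."""
    funcs = {"avg": mean, "min": min, "max": max, "sum": sum}
    groups: Dict[Optional[str], List[float]] = defaultdict(list)
    for point in points:
        tags = point["tags"]
        # Filter: keep only points whose tags match every `where` pair.
        if where and any(tags.get(k) != v for k, v in where.items()):
            continue
        # Group: one series per distinct value of the `group_by` tag.
        key = tags.get(group_by) if group_by else None
        groups[key].append(point["value"])
    return {key: funcs[aggregation](values) for key, values in groups.items()}
```

Each distinct value of the group-by tag produces its own series, which is why high-cardinality tags multiply the number of series a query has to scan.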

Metrics ingestion is billed per data point per month. Histograms count as one data point regardless of bucket count. There’s a hard cap of 1000 metrics per ingest request — batch larger payloads into multiple requests.
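
A minimal Python sketch of the batching step follows. Only the limit of 1000 comes from this page; the `batches` helper is hypothetical, and sending each batch is left to the caller because the ingest endpoint and request format are not covered here.

```python
from typing import Iterator, List, Sequence

MAX_METRICS_PER_REQUEST = 1000  # hard cap stated above


def batches(metrics: Sequence[dict],
            size: int = MAX_METRICS_PER_REQUEST) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` metrics each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(metrics), size):
        yield list(metrics[start:start + size])
```

A payload of 2,500 metrics would be split into three requests of 1000, 1000, and 500. Order is preserved, so per-series ordering inside the original payload carries over to the batches.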