Understand the grace period

If your nightly backup ran at 02:00 yesterday and is currently running at 02:05 today, is it “late”? SiteQwality has to draw the line somewhere — that’s the grace period.

A cron check is considered failing when:

now - last_received_at > check_interval_seconds × 2

In other words, the implicit grace period is one full check interval: a check fails only after two intervals of silence. A daily job (interval = 86400s) doesn’t alert until 48 hours have passed without a ping. A 5-minute job alerts after 10 minutes of silence.
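The rule above can be sketched in a few lines of Python. This is only an illustration of the comparison, not SiteQwality's implementation; the function name `is_failing` is made up for the example.

```python
from datetime import datetime, timedelta, timezone


def is_failing(now, last_received_at, check_interval_seconds):
    """Return True when the silence since the last ping exceeds 2x the interval."""
    silence_seconds = (now - last_received_at).total_seconds()
    return silence_seconds > check_interval_seconds * 2


last_ping = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)

# Daily job, 24h05m since the last ping: still inside the grace period.
print(is_failing(last_ping + timedelta(hours=24, minutes=5), last_ping, 86400))  # False

# Daily job, just over 48h of silence: failing.
print(is_failing(last_ping + timedelta(hours=48, minutes=1), last_ping, 86400))  # True

# 5-minute job, 11 minutes of silence: failing.
print(is_failing(last_ping + timedelta(minutes=11), last_ping, 300))  # True
```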

This is conservative on purpose: most jobs vary by about ±10% in runtime, and paging someone because a job is 2 minutes late does more harm than good.

If your job must finish within 5 minutes of its schedule:

  • Don’t lower check_interval_seconds below the actual interval (that creates false positives).
  • Instead, alert on runtime metrics — emit a metric for job duration and alert when it crosses your threshold. The cron check stays as the catch-all for “didn’t run at all.”
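One way to emit a duration metric is to wrap the job in a small timer. The sketch below assumes a hypothetical `emit(name, value)` callback standing in for whatever metrics client you use, and the metric name `job.duration_seconds` is invented for the example.

```python
import time


def timed_run(job, emit, clock=time.monotonic):
    """Run job() and report its wall-clock duration through emit(name, value).

    `emit` is a placeholder for your metrics client; the metric name
    below is hypothetical.
    """
    start = clock()
    try:
        return job()
    finally:
        # Emitted even if the job raises, so slow failures are visible too.
        emit("job.duration_seconds", clock() - start)


# Example with a stand-in emitter that just prints the metric.
timed_run(lambda: time.sleep(0.01), lambda name, value: print(name, round(value, 2)))
```

The "must finish within 5 minutes" threshold then lives in the alert rule of your metrics system, while the cron check keeps watching for runs that never happen.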

If your job sometimes takes 30 minutes and sometimes 6 hours:

  • Set check_interval_seconds based on the time between scheduled runs, not the runtime. A 6-hour job that runs once a day still has a 24-hour interval.
  • Make sure your job pings at the end, not the start.
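A quick worked calculation shows why this setup stays inside the 2× window. It assumes the job starts on schedule every 24 hours, pings only when it finishes, and takes between 30 minutes and 6 hours; the helper name is made up for the example.

```python
def max_ping_gap_seconds(schedule_seconds, min_runtime_seconds, max_runtime_seconds):
    """Worst-case gap between two end-of-job pings.

    The gap is longest when a fast run (pinging early) is followed by a
    slow run (pinging late) one schedule interval later.
    """
    return schedule_seconds + (max_runtime_seconds - min_runtime_seconds)


check_interval_seconds = 86400  # daily schedule
gap = max_ping_gap_seconds(86400, 30 * 60, 6 * 3600)

print(gap / 3600)                        # 29.5 hours between pings at worst
print(gap > check_interval_seconds * 2)  # False: under the 48-hour threshold
```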

For high-frequency jobs (check_interval_seconds < 5min), the 2× grace can be too tight if the job ever has a hiccup. Consider:

  • Bumping the interval to 5 or 10 minutes and accepting that alerts will fire later.
  • Wrapping the job’s pings in retry logic so a single network blip doesn’t trigger the check.
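The retry idea can be sketched as follows. This is a minimal example, not an official client: `send` is any callable that performs the ping, and the ping URL you pass to `http_ping` is whatever URL your check uses (none is assumed here).

```python
import time
import urllib.request


def ping_with_retry(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() up to `attempts` times with exponential backoff.

    Returns True on the first success, False if every attempt fails.
    Network errors raised by urllib are subclasses of OSError.
    """
    for attempt in range(attempts):
        try:
            send()
            return True
        except OSError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return False


def http_ping(url, timeout=10):
    """Send one GET request to the check's ping URL (supplied by the caller)."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass
```

At the end of the job you would call something like `ping_with_retry(lambda: http_ping(ping_url))`, where `ping_url` comes from your check's configuration.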

Cron checks don’t know whether your job succeeded, only whether it pinged. If the job crashes after starting but before pinging, no ping arrives, and the check fails once the 2× window since the last ping has elapsed.

For finer-grained “did the job succeed?” tracking:

  • Have the job emit a metric — job.runs.completed{name="nightly_backup", status="success"} — and alert on the failure rate.
  • Or have the job ping a different check on success vs. failure (you’ll need two cron checks but the signal is precise).
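The two-check approach might look like the sketch below. The callbacks `ping_success` and `ping_failure` are placeholders for whatever sends a ping to each of your two checks; how you alert on the failure check depends on your setup and is not shown.

```python
def run_and_report(job, ping_success, ping_failure):
    """Run job(); ping one check if it returns, the other if it raises."""
    try:
        result = job()
    except Exception:
        ping_failure()
        raise  # keep the original error visible to cron or the caller
    ping_success()
    return result


# Example with stand-in callbacks that record which check was pinged.
pinged = []
run_and_report(lambda: "backup done",
               lambda: pinged.append("success"),
               lambda: pinged.append("failure"))
print(pinged)  # ['success']
```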