
Create manual incidents

Most incidents are auto-created by monitors. Sometimes they aren’t, and you want one anyway. This page covers the patterns where manual creation is the right move, and the cleanest way to do it.

  • A third-party dependency broke something user-visible. Stripe is degraded, AWS is throttling, your CDN is intermittently 502-ing. Your monitors might be showing a partial signal — but you have first-hand evidence.
  • A bug surfaced in production that no monitor catches. UI broken for a specific browser, auth flow broken only for SSO users.
  • Coordinating a multi-team outage. One canonical incident beats five Slack threads.
  • Pre-announced maintenance went sideways. A deploy you’d protected with a maintenance window actually broke something — the maintenance window suppressed the auto-incident, so create one manually.

Two cases where manual creation is the wrong move:

  • Don’t manually create an incident for an outage a monitor will catch in 60 seconds anyway. The auto-incident will be cleaner, with fewer race conditions around responder_status and affected_http_job_ids.
  • Don’t create an incident as a stand-in for a Slack message. If you don’t intend to publish it on a status page or trigger paging, just message the team directly.

To track an incident internally only, create it without a status page by posting to /incident. You get the timeline and paging behavior without publishing anything.

curl -X POST https://api.siteqwality.com/incident \
-H "Authorization: Bearer $SITEQWALITY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"title": "Stripe webhooks intermittently failing",
"severity": "major",
"message": "Seeing ~5% failure rate on inbound Stripe webhooks. Investigating."
}'

To publish the incident on a status page, post to that page’s incident endpoint instead. Use this when the outage affects customers and they should know.

curl -X POST https://api.siteqwality.com/status_page/$STATUS_PAGE_ID/incident \
-H "Authorization: Bearer $SITEQWALITY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"status_page_id": "<status-page-id>",
"title": "Some webhook deliveries delayed",
"severity": "minor",
"message": "We are seeing increased latency on outbound webhooks. Investigating with our delivery provider.",
"affected_http_job_ids": []
}'

The incident appears immediately on the public page.

Even on manual incidents, set affected_http_job_ids if monitors do exist for the broken thing. This makes the incident appear as a banner on the matching status-page components, and groups it correctly in dashboards.

{
"affected_http_job_ids": [
"a3c1f5e0-1234-4abc-9def-aaaaaaaaaaaa",
"b7d2e601-5678-4def-9abc-bbbbbbbbbbbb"
]
}
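
When the job IDs come from a script rather than being typed by hand, the request body can be assembled before calling curl. The following is a minimal sketch, not part of any SiteQwality tooling: build_incident_body is a hypothetical helper, and it assumes the title, severity, and message contain no double quotes or backslashes, since it does no JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON incident body on stdout.
# Arguments: title, severity, message, then zero or more HTTP job IDs.
# Assumption: no argument contains a double quote or backslash (no escaping is done).
build_incident_body() {
  local title="$1" severity="$2" message="$3"
  shift 3
  local ids="" id
  for id in "$@"; do
    # Append each ID as a quoted string, comma-separated after the first one.
    ids="${ids:+$ids,}\"$id\""
  done
  printf '{"title":"%s","severity":"%s","message":"%s","affected_http_job_ids":[%s]}\n' \
    "$title" "$severity" "$message" "$ids"
}

# Usage (sends a real request, so it is left commented out here;
# JOB_ID_1 and JOB_ID_2 are placeholder variables):
# curl -X POST https://api.siteqwality.com/incident \
#   -H "Authorization: Bearer $SITEQWALITY_API_KEY" \
#   -H "Content-Type: application/json" \
#   -d "$(build_incident_body "Stripe webhooks intermittently failing" major "Investigating." "$JOB_ID_1" "$JOB_ID_2")"
```

With no job IDs passed, the helper emits an empty array, which matches the empty affected_http_job_ids shown in the status-page example.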

Use case                                        Severity
Documentation site down                         minor
Login broken for a subset of users              major
Full outage, billing affected, security event   critical

The severity is mostly informational — it drives display order, color, and any custom filters. It doesn’t change escalation behavior on its own; for that, use a different notification group.
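
A small guard in a wrapper script can catch a mistyped severity before the request is sent. This sketch assumes that minor, major, and critical are the only accepted values, which this page implies but does not state. validate_severity is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for the severities listed on this page.
# Assumption: minor, major, and critical are the full set of accepted values.
validate_severity() {
  case "$1" in
    minor|major|critical) return 0 ;;
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```

A script might call validate_severity "$SEVERITY" || exit 1 just before the curl command, where SEVERITY is whatever variable holds the chosen value.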

After resolving, especially for major and critical incidents:

  • Run a postmortem outside SiteQwality.
  • Link the postmortem doc in a final incident update so it’s preserved on the timeline.