CPU alert on claw-gateway1 — 95.7% (threshold: 90%)
Host: claw-gateway1
CAUSE: CPU exceeded the 90% warning threshold.
IMPACT: Performance may degrade if the trend continues.
ACTION: Monitor for sustained elevation; investigate if it persists beyond 15 minutes.
CPU: 95.7% | Memory: 42.7%
Opened 2026-06-12 00:05 UTC
· Resolved 2026-06-12 00:10 UTC
Alert received from AI Infra Monitor. Host: claw-gateway1, Severity: MEDIUM
△
STATUS CHANGE
2026-06-12 00:05 UTC
OPEN -> INVESTIGATING (auto - low/medium severity)
◎
CONTEXT AGGREGATED
2026-06-12 00:05 UTC
Sources available: 3/3 — Runbook: ✓ | Past incidents: ✓ | Infra health: ✓
✦
Response Plan
2026-06-12 00:05 UTC
Severity
P1 (Critical). Single gateway at 95.7% CPU with +67% upward trend; performance degradation imminent.
Root Cause
Runaway process or resource leak on claw-gateway1
Sustained traffic spike or inefficient query pattern
Actions
Acknowledge alert via Telegram (6055821277).
SSH to claw-gateway1; identify top CPU consumer with top or ps aux --sort=-%cpu; kill if runaway.
Restart affected ADOStack service(s); monitor CPU for 2 min.
If CPU remains >80% after restart, trigger escalation via oncall.ado-runner.com API (curl command in runbook).
If CPU still >80% at 5 min mark, execute sudo reboot (accept 2–3 min downtime).
Watch
CPU % on claw-gateway1 (target: <70% within 5 min).
Error rate / latency on dependent services post-restart.
Escalate If
CPU sustains >80% after service restart or host remains unreachable post-reboot.
△
STATUS CHANGE
2026-06-12 00:10 UTC
Resolved after verification: live CPU is 11.6%, memory 50.8%, no sustained high-CPU process remains. Duplicate medium alert path has been adjusted to stop opening On-Call incidents for yellow threshold breaches.