Severity
P1 Critical — Gateway node saturated (96.2% CPU); potential service degradation if sustained beyond 15 minutes.
Root Cause
- Likely a runaway process consuming CPU (memory and disk are normal)
- Possibly a resource leak or an unoptimized query spike on the gateway
Actions
- SSH to claw-gateway1 and identify the top CPU consumer:
top -b -n1 | head -20
- If a single process is using >80% CPU, kill or restart that process; if the load is spread across many processes, continue with the next step
- Check for deployments or config changes in the last 2 hours via git log
- If CPU remains >80% after 5 minutes, trigger graceful restart:
sudo systemctl restart claw-gateway
- If CPU is still >80% after the restart, notify stakeholders, then authorize a host reboot (expect 2–3 min downtime)
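The single-process versus distributed decision in the steps above can be sketched as a small filter over the top output. This is only an illustration: classify_top is a hypothetical helper, not part of any existing tooling, and it assumes the default top column layout (PID in column 1, %CPU in column 9, COMMAND in column 12).

```shell
# Hypothetical helper: reads `top -b -n1` output on stdin.
# Prints "single <pid> <command>" for the first process above the
# threshold (default 80), otherwise prints "distributed".
# Assumes the default top layout: $1 = PID, $9 = %CPU, $12 = COMMAND.
classify_top() {
  awk -v t="${1:-80}" '
    # Only process rows that come after the "PID USER ..." header line.
    header && $9 + 0 > t { printf "single %s %s\n", $1, $12; found = 1; exit }
    $1 == "PID" { header = 1 }
    END { if (!found) print "distributed" }
  '
}
```

On the gateway this could be run as top -b -n1 | classify_top 80. For the deployment check, git log --since="2 hours ago" --oneline limits the history to the last two hours; it has to be run inside the relevant repository, whose path this runbook does not specify.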
Watch
- CPU trend; alert if it stays >85% for 10+ min or spikes above 98%
- Response latency / error rates on gateway endpoints
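The CPU watch thresholds above can be expressed as a simple check over a stream of samples. The following is a minimal sketch, not an existing monitor: watch_cpu is a hypothetical name, and it assumes one CPU-percent sample per line taken roughly once a minute, so that 10 consecutive samples approximate 10 minutes.

```shell
# Hypothetical helper: reads one CPU-percent sample per line on stdin.
# Assumption: samples arrive about once per minute.
# Prints "ALERT spike <value>" for any sample above 98, and
# "ALERT sustained <value>" once 10 consecutive samples exceed 85.
watch_cpu() {
  awk -v sustained=85 -v spike=98 -v window=10 '
    $1 + 0 > spike      { print "ALERT spike " $1 }
    $1 + 0 > sustained  { run++ }        # extend the current high-CPU run
    $1 + 0 <= sustained { run = 0 }      # any dip resets the run
    run == window       { print "ALERT sustained " $1 }
  '
}
```

For a quick check, recorded values can be replayed with printf '%s\n' 86 90 99 | watch_cpu. In practice the samples would come from whatever collector already runs on the gateway; the sampling interval determines how many consecutive samples correspond to 10 minutes.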
Escalate If
CPU remains >80% after the restart attempt, or the host becomes unresponsive.