Severity
P1: 96.2% CPU on claw-gateway1 with +76.7% upward trend; gateway traffic at risk.
Root Cause
- Runaway process consuming CPU resources
- Resource contention or uncontrolled workload spike
Actions
- SSH to claw-gateway1 and run the following to identify the top CPU consumer:
    top -bn1 | head -20
- Kill or restart the offending process; if it cannot be identified, restart the claw-gateway1 service.
- Confirm CPU drops below 80% and the gateway responds to health checks.
- Pull the last 15 minutes of logs from that process for post-incident analysis.
- Notify Diego Perez (already alerted) of resolution or escalation.
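A minimal shell sketch of the identify, kill, and log-capture steps. It assumes a Linux host, procps top with its default column layout, and a systemd unit managing the offending process. The function names (top_cpu_consumer, stop_process, capture_recent_logs), the 10-second grace period before SIGKILL, and the unit name placeholder are all hypothetical. None of them come from claw-gateway1's actual setup.

```shell
# Sketch only: helpers for the Actions steps, assuming Linux, procps top, and systemd.

# Print "PID %CPU COMMAND" for the busiest process in `top -b` output read on stdin.
# top sorts by %CPU by default; this assumes its default column layout:
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
top_cpu_consumer() {
  awk '
    $1 == "PID" { in_rows = 1; next }            # header row: process rows follow
    in_rows && NF >= 12 { print $1, $9, $12; exit }
  '
}

# Ask a process to exit with SIGTERM, then SIGKILL it if it is still alive after ~10 s.
# The grace period is an illustrative choice, not taken from this runbook.
stop_process() {
  local pid=$1 i=0
  kill -TERM "$pid" || return 1                  # could not signal: already exited or not permitted
  while [ "$i" -lt 10 ]; do
    kill -0 "$pid" 2>/dev/null || return 0       # process exited after SIGTERM
    sleep 1
    i=$((i + 1))
  done
  kill -KILL "$pid"
}

# Save the last 15 minutes of a systemd unit's journal for post-incident analysis.
# $1: unit name (placeholder; the gateway's real unit name is not given here)
# $2: output file
capture_recent_logs() {
  journalctl -u "$1" --since "15 min ago" --no-pager > "$2"
}

# Example on claw-gateway1 (not run here; <PID> and <unit> are placeholders):
#   top -bn1 | head -20 | top_cpu_consumer
#   stop_process <PID>
#   capture_recent_logs <unit> /tmp/cpu-incident.log
```

The parser reads the first row after the PID header because top orders rows by %CPU unless told otherwise. If the host's toprc changes the column layout, the field positions ($9, $12) would need adjusting.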
Watch
- CPU trending back below 70% within 2 minutes of the kill or restart.
- Gateway request latency and error rate remain stable.
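The 70% / 2-minute watch condition can be checked with a small polling loop run on claw-gateway1. This sketch assumes Linux and reads the aggregate cpu line of /proc/stat. It treats idle plus iowait as idle time and user through steal as total time, then computes busy time between samples. The function names, the 5-second interval, and the rule of succeeding on the first sample below the threshold are illustrative choices, not part of this runbook.

```shell
# Sketch only: sample aggregate CPU from /proc/stat (Linux) and report busy %.

# Busy % between two /proc/stat "cpu" lines.
# Fields: cpu user nice system idle iowait irq softirq steal guest guest_nice
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      idle = $5 + $6                           # idle + iowait
      total = 0
      for (i = 2; i <= 9; i++) total += $i     # user..steal (guest time is already inside user/nice)
      if (NR == 1) { idle0 = idle; total0 = total; next }
      dt = total - total0; di = idle - idle0
      printf "%d\n", (dt > 0) ? 100 * (dt - di) / dt : 0
    }'
}

# Poll every 5 s; succeed as soon as one sample is below $1 % (default 70),
# fail if that has not happened within $2 seconds (default 120).
wait_for_cpu_below() {
  local threshold=${1:-70} timeout=${2:-120} interval=5 elapsed=0 prev cur busy
  prev=$(head -n 1 /proc/stat)
  while [ "$elapsed" -lt "$timeout" ]; do
    sleep "$interval"
    elapsed=$((elapsed + interval))
    cur=$(head -n 1 /proc/stat)
    busy=$(cpu_busy_pct "$prev" "$cur")
    echo "t+${elapsed}s cpu=${busy}%"
    if [ "$busy" -lt "$threshold" ]; then return 0; fi
    prev=$cur
  done
  return 1
}

# Example on claw-gateway1 (not run here):
#   wait_for_cpu_below 70 120 || echo "CPU still >= 70% after 2 minutes"
```

Sampling deltas from /proc/stat gives utilization over each 5-second window. A single top snapshot can be noisier.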
Escalate If
CPU remains above 90% after the process kill, or gateway health checks fail after recovery.
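The escalation rule can be expressed as a check to run once recovery has been attempted. This sketch assumes the gateway exposes an HTTP health endpoint and that curl is installed. The URL is a placeholder because the runbook does not name one, and gateway_healthy and escalation_decision are hypothetical names.

```shell
# Sketch only: apply the Escalate If criteria after the kill or restart.

# Exit 0 if the (placeholder) health URL answers within 5 s without an HTTP error status (>= 400).
gateway_healthy() {
  curl -fsS --max-time 5 -o /dev/null "$1"
}

# Print "escalate" if CPU is still above 90% or the health check failed, else "ok".
# $1: CPU busy % as an integer; $2: exit status of the health check (0 = passed).
escalation_decision() {
  local cpu=$1 health_rc=$2
  if [ "$cpu" -gt 90 ] || [ "$health_rc" -ne 0 ]; then
    echo escalate
  else
    echo ok
  fi
}

# Example on claw-gateway1 (not run here; HEALTH_URL and busy are placeholders,
# busy being the current CPU % read from top or another sample):
#   gateway_healthy "$HEALTH_URL"; rc=$?
#   escalation_decision "$busy" "$rc"
```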