#51 high infra_monitor
Infra Monitor: Critical CPU usage spike detected, immediate investigation required
Host: claw-gateway1
CPU usage has reached 96.2% and is trending sharply upward (+76.7% over the last 5 readings), breaching the critical threshold. All other metrics remain healthy: memory is at 41.5%, disk utilization is low across all partitions, and the process count is 138. The rapid CPU escalation suggests a runaway process or resource contention that requires immediate attention.
CPU: 96.2% | Memory: 41.5%
Anomalies:
  • CPU usage at 96.2% exceeds the critical threshold (>95%)
  • CPU trending steeply upward: +76.7% increase over the last 5 readings
  • Potential runaway process or uncontrolled resource consumption
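The two CPU anomaly rules in this alert can be expressed as a small check. This is a minimal Python sketch, not the monitor's actual implementation: the 95% critical threshold and the 5-reading window come from the alert, while the function name detect_cpu_anomalies, the STEEP_RISE cutoff of 50 points, and the reading of "+76.7%" as a rise in percentage points (newest minus oldest reading in the window) are assumptions.

```python
CRITICAL_CPU = 95.0  # critical threshold from the alert (>95%)
WINDOW = 5           # trend window: "last 5 readings"
STEEP_RISE = 50.0    # hypothetical cutoff (percentage points) for a "steep" rise


def detect_cpu_anomalies(readings):
    """Return anomaly descriptions for CPU percentages ordered oldest first.

    Assumption: the trend is the newest reading minus the oldest reading
    in the window, measured in percentage points.
    """
    window = list(readings)[-WINDOW:]
    if not window:
        return []
    current = window[-1]
    anomalies = []
    if current > CRITICAL_CPU:
        anomalies.append(
            f"CPU usage at {current:.1f}% exceeds critical threshold (>{CRITICAL_CPU:.0f}%)"
        )
    # Only judge the trend once a full window of readings is available
    if len(window) == WINDOW:
        rise = current - window[0]
        if rise >= STEEP_RISE:
            anomalies.append(
                f"CPU trending steeply upward: +{rise:.1f}% over last {WINDOW} readings"
            )
    return anomalies
```

With illustrative readings (not taken from this incident) such as [19.5, 35.0, 58.2, 81.7, 96.2], both rules fire: the last reading exceeds 95% and the window rises by 76.7 points.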
Opened 2026-06-11 00:03 UTC · Resolved 2026-06-11 00:20 UTC
Timeline
WEBHOOK
2026-06-11 00:03 UTC
Alert received from AI Infra Monitor. Host: claw-gateway1, Severity: HIGH
CONTEXT AGGREGATED
2026-06-11 00:03 UTC
Sources available: 3/3 — Runbook: ✓ | Past incidents: ✓ | Infra health: ✓
Response Plan
2026-06-11 00:03 UTC

Severity

P1: 96.2% CPU on claw-gateway1 with +76.7% upward trend; gateway traffic at risk.

Root Cause

  • Runaway process consuming CPU resources
  • Resource contention or uncontrolled workload spike

Actions

  1. SSH to claw-gateway1 and run top -bn1 | head -20 to identify the top CPU consumer.
  2. Kill or restart the offending process; if it cannot be identified, restart the claw-gateway1 service.
  3. Confirm CPU drops below 80% and the gateway responds to health checks.
  4. Pull the last 15 minutes of logs from the offending process for post-incident analysis.
  5. Notify Diego Perez (already alerted) of the resolution or escalation.
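Step 1 relies on reading top's process table by eye. A minimal Python sketch of the same triage, assuming top's default batch-mode layout (a header row starting with PID, a %CPU column, COMMAND as the last column) and a locale that prints decimals with a dot. The helper names parse_top_batch and top_cpu_consumer_on_host are hypothetical; the second one runs top on the local machine, so it would have to execute on claw-gateway1 itself, for example inside the SSH session.

```python
import subprocess


def parse_top_batch(output):
    """Parse `top -b` output into (pid, cpu_percent, command) tuples.

    Assumes top's default layout: a header row whose first field is PID,
    a %CPU column, and COMMAND as the last column.
    """
    lines = output.splitlines()
    for i, line in enumerate(lines):
        fields = line.split()
        if fields and fields[0] == "PID":
            header = fields
            break
    else:
        return []  # no process table in the output
    cpu_col = header.index("%CPU")
    rows = []
    for line in lines[i + 1:]:
        # maxsplit keeps a command name that contains spaces in one field
        parts = line.split(None, len(header) - 1)
        if len(parts) < len(header):
            continue  # skip blank or truncated lines
        rows.append((int(parts[0]), float(parts[cpu_col]), parts[-1].strip()))
    # Highest CPU consumer first
    return sorted(rows, key=lambda row: row[1], reverse=True)


def top_cpu_consumer_on_host():
    """Run top once in batch mode on this machine and return the busiest process."""
    result = subprocess.run(["top", "-bn1"], capture_output=True, text=True, check=True)
    rows = parse_top_batch(result.stdout)
    return rows[0] if rows else None
```

Parsing is kept separate from running top so it can be checked against saved output, such as the snapshot captured in step 1.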

Watch

  • CPU trending back below 70% within 2 minutes of action.
  • Gateway request latency and error rate remain stable.

Escalate If

CPU remains >90% after process kill, or gateway health checks fail post-recovery.
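The Watch and Escalate If criteria amount to a small decision rule. This Python sketch is one possible reading, not a documented procedure: the 70%, 90% and 2-minute figures come from the plan, while the function name assess_after_action, the (seconds_since_action, cpu_percent) sample format, folding latency and error-rate checks into a single health_ok flag, and escalating on CPU only once the 2-minute window has elapsed are all assumptions. Readings between 70% and 90% after the window return "watch", since the plan does not say what to do in that band.

```python
ESCALATE_CPU = 90.0      # Escalate If: "CPU remains >90% after process kill"
RECOVERED_CPU = 70.0     # Watch: "CPU trending back below 70%"
RECOVERY_WINDOW_S = 120  # Watch: "within 2 minutes of action"


def assess_after_action(samples, health_ok):
    """Decide the next step after killing or restarting a process.

    samples: list of (seconds_since_action, cpu_percent), oldest first.
    health_ok: hypothetical combined result of gateway health, latency
    and error-rate checks.
    Returns "escalate", "recovered" or "watch".
    """
    if not health_ok:
        return "escalate"  # health checks failing post-recovery
    if not samples:
        return "watch"  # no post-action reading yet
    elapsed, cpu = samples[-1]
    if cpu < RECOVERED_CPU:
        return "recovered"
    if elapsed >= RECOVERY_WINDOW_S and cpu > ESCALATE_CPU:
        return "escalate"  # still above 90% once the watch window has passed
    return "watch"
```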

STATUS CHANGE
2026-06-11 00:15 UTC
Auto-resolver: CPU at 30.1% (below 70% clear threshold) — clean check 1/2
STATUS CHANGE
2026-06-11 00:20 UTC
Auto-resolver: CPU at 30.1% (below 70% clear threshold) — clean check 2/2
STATUS CHANGE
2026-06-11 00:20 UTC
AUTO-RESOLVED: CPU sustained below 70% for 2 consecutive checks. Current value: 30.1%
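The auto-resolver entries follow a consecutive-clean-checks rule: a reading below the 70% clear threshold counts as clean, and two clean checks in a row resolve the incident. A minimal Python sketch of that rule, assuming (the timeline does not confirm it) that a reading at or above 70% resets the streak; the AutoResolver class and its return strings are hypothetical. Unlike the timeline, which logs "clean check 2/2" and "AUTO-RESOLVED" as separate entries, the sketch reports the second clean check directly as "resolved".

```python
from dataclasses import dataclass

CLEAR_THRESHOLD = 70.0  # "below 70% clear threshold"
REQUIRED_CLEAN = 2      # "2 consecutive checks"


@dataclass
class AutoResolver:
    """Hypothetical resolver: resolves after N consecutive clean CPU checks."""
    clean_streak: int = 0
    resolved: bool = False

    def check(self, cpu):
        """Record one CPU reading and return the resulting state."""
        if self.resolved:
            return "resolved"
        if cpu >= CLEAR_THRESHOLD:
            self.clean_streak = 0  # assumed: a dirty reading resets the streak
            return "open"
        self.clean_streak += 1
        if self.clean_streak >= REQUIRED_CLEAN:
            self.resolved = True
            return "resolved"
        return f"clean {self.clean_streak}/{REQUIRED_CLEAN}"
```

Feeding a fresh AutoResolver the two 30.1% readings from the timeline yields "clean 1/2" and then "resolved".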
Details
ID: #51
Severity: HIGH
Source: infra_monitor
Status: RESOLVED
Opened: 2026-06-11 00:03 UTC