#5 medium infra_monitor
Infra Monitor: CPU trending upward; monitor for sustained growth.
Host: claw-gateway1
Server health is generally good with all metrics within safe operating ranges. However, CPU usage is climbing at a concerning rate (+15.9% over the last 5 readings) and warrants close monitoring to ensure it doesn't breach warning thresholds. Memory, disk, and process counts remain healthy with comfortable headroom.
CPU: 37.6% | Memory: 57.7%
Anomalies: CPU usage trending upward at +15.9% over last 5 readings; current 37.6% but trajectory is concerning
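A minimal sketch of how a figure like "+15.9% over the last 5 readings" could be computed. The alert does not say whether the value is a change in percentage points or a relative change, so this assumes percentage points between the first and last reading in the window. The function name `cpu_trend` and every sample reading except the final 37.6 are made up for illustration.

```python
from typing import Sequence


def cpu_trend(readings: Sequence[float], window: int = 5) -> float:
    """Change in percentage points across the last `window` readings."""
    if len(readings) < 2:
        raise ValueError("need at least two readings to compute a trend")
    recent = list(readings)[-window:]
    # Assumption: trend = newest reading minus oldest reading in the window.
    return recent[-1] - recent[0]


# Hypothetical readings; only the last value (37.6) comes from the alert.
SAMPLE_READINGS = [21.7, 25.0, 29.3, 33.1, 37.6]

if __name__ == "__main__":
    print(f"{cpu_trend(SAMPLE_READINGS):+.1f} points over last 5 readings")
```

A relative-change variant would divide the difference by the oldest reading instead; which one the monitor uses would need to be confirmed against its configuration.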
Opened 2026-04-27 08:33 UTC · Resolved 2026-04-27 08:40 UTC
Timeline
WEBHOOK
2026-04-27 08:33 UTC
Alert received from AI Infra Monitor. Host: claw-gateway1, Severity: MEDIUM
STATUS CHANGE
2026-04-27 08:33 UTC
OPEN -> INVESTIGATING (auto - low/medium severity)
CONTEXT AGGREGATED
2026-04-27 08:33 UTC
Sources available: 3/3 — Runbook: ✓ | Past incidents: ✓ | Infra health: ✓
Response Plan
2026-04-27 08:33 UTC

Severity

Medium — Single host CPU trending upward; no service impact yet but trajectory concerning.

Root Cause

  • Process leak or runaway workload on claw-gateway1
  • Legitimate traffic spike or batch job scheduled

Actions

  1. SSH to claw-gateway1; run top -b -n 1 | head -20 to identify top CPU consumers.
  2. Cross-check recent deployments, cron jobs, or traffic patterns for timing correlation.
  3. If single process >20% CPU, check logs for errors; consider graceful restart if safe.
  4. Set alert to trigger at 65% CPU to catch breach earlier than 80% threshold.
  5. Document findings in incident ticket; tag on-call lead if root cause unclear.
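Action 3 above (flag any single process over 20% CPU) can be sketched as a small filter. This is an illustration under stated assumptions, not the runbook's tooling: it parses three-column text in the layout produced by `ps -eo pid,comm,%cpu` rather than `top` output, because that layout is simpler to split. The helper name `heavy_processes` and the sample process names are hypothetical.

```python
from typing import List, Tuple


def heavy_processes(ps_output: str, limit: float = 20.0) -> List[Tuple[int, str, float]]:
    """Return (pid, command, cpu) for rows whose %CPU exceeds `limit`."""
    hits = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) < 3:
            continue  # ignore malformed rows
        pid = int(parts[0])
        cpu = float(parts[-1])
        name = " ".join(parts[1:-1])
        if cpu > limit:
            hits.append((pid, name, cpu))
    return hits


# Hypothetical sample in the assumed `ps -eo pid,comm,%cpu` layout.
SAMPLE_PS = """\
  PID COMMAND         %CPU
 4121 worker          24.5
  873 nginx            6.1
    1 systemd          0.0
"""

if __name__ == "__main__":
    for pid, name, cpu in heavy_processes(SAMPLE_PS):
        print(f"pid={pid} {name} at {cpu}% CPU")
```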

Watch

  • CPU every 5 min — if sustains >50% for 3 consecutive readings, escalate.
  • Memory and disk — confirm they remain flat (rules out cascading resource pressure).
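The first watch rule above (escalate if CPU stays above 50% for 3 consecutive readings) could be checked with something like the following. The function name `sustained_above` is an assumption, and "sustains" is read here as "each of the most recent 3 readings is strictly above the threshold".

```python
from typing import Sequence


def sustained_above(readings: Sequence[float], threshold: float = 50.0, count: int = 3) -> bool:
    """True if the most recent `count` readings are all above `threshold`."""
    if len(readings) < count:
        return False  # not enough data to call it sustained
    return all(value > threshold for value in list(readings)[-count:])


if __name__ == "__main__":
    # Hypothetical 5-minute readings.
    print(sustained_above([37.6, 51.0, 52.3, 55.0]))
    print(sustained_above([51.0, 49.0, 52.0, 53.0]))
```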

Escalate If

CPU reaches 80% or if upward trend continues for >20 minutes despite investigation.
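One possible reading of the escalation rule above, as a sketch. It assumes samples arrive as (minutes since alert, CPU percent) pairs and treats "upward trend continues" as an unbroken run of strictly increasing readings; both the name `should_escalate` and that interpretation are assumptions, not something the plan specifies.

```python
from typing import Sequence, Tuple


def should_escalate(
    samples: Sequence[Tuple[float, float]],
    cpu_limit: float = 80.0,
    trend_minutes: float = 20.0,
) -> bool:
    """Escalate if CPU hits `cpu_limit` or has risen for more than `trend_minutes`."""
    if not samples:
        return False
    if samples[-1][1] >= cpu_limit:
        return True
    # Walk backward while each reading is higher than the one before it.
    start = len(samples) - 1
    while start > 0 and samples[start][1] > samples[start - 1][1]:
        start -= 1
    return samples[-1][0] - samples[start][0] > trend_minutes


if __name__ == "__main__":
    # Hypothetical samples: (minutes since alert, CPU %).
    rising = [(0, 37.6), (5, 40.0), (10, 43.0), (15, 45.0), (20, 47.0), (25, 49.0)]
    print(should_escalate(rising))
```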

STATUS CHANGE
2026-04-27 08:35 UTC
Auto-resolver: CPU at 37.6% (below 70% clear threshold) — clean check 1/2
STATUS CHANGE
2026-04-27 08:40 UTC
Auto-resolver: CPU at 37.6% (below 70% clear threshold) — clean check 2/2
STATUS CHANGE
2026-04-27 08:40 UTC
AUTO-RESOLVED: CPU sustained below 70% for 2 consecutive checks. Current value: 37.6%
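The auto-resolver behavior recorded above (clear after 2 consecutive checks below the 70% threshold) can be sketched as a simple counter. This is an illustration only; the name `auto_resolve` is hypothetical, and resetting the counter on a failed check is an assumption about how "consecutive" is enforced.

```python
from typing import Iterable


def auto_resolve(checks: Iterable[float], clear_threshold: float = 70.0, required: int = 2) -> bool:
    """True once `required` consecutive checks fall below `clear_threshold`."""
    clean = 0
    for value in checks:
        # Assumption: any check at or above the threshold resets the count.
        clean = clean + 1 if value < clear_threshold else 0
        if clean >= required:
            return True
    return False


if __name__ == "__main__":
    # The two checks recorded in the timeline (08:35 and 08:40 UTC).
    print(auto_resolve([37.6, 37.6]))
```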
·
HANDOFF
2026-04-30 06:16 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU spike detected on claw-gateway1 on 2026-04-27 at 08:33 UTC; alert triggered at medium severity due to upward trending pattern.
  • Resolution: Auto-resolver cleared the incident after CPU dropped to 37.6% and stayed below the 70% threshold for 2 consecutive checks (resolved 08:40 UTC). No manual intervention required.
  • Current State: Host is healthy; CPU normalized. Underlying root cause (process leak, traffic spike, or scheduled batch job) was not explicitly identified during investigation.
  • Watch For: Monitor claw-gateway1 for CPU creeping upward again over next shift. If recurrence occurs, SSH in and run `top` to identify top CPU consumers; cross-check against recent deployments or scheduled jobs.
  • No Action Required: Incident resolved automatically; runbook and past incident context were available if escalation had been needed.
·
HANDOFF
2026-05-03 06:26 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident Summary: CPU spike detected on claw-gateway1 on 2026-04-27 at 08:33 UTC; alert triggered at medium severity due to upward trending CPU usage.
  • Resolution: Auto-resolver cleared the incident after CPU sustained below 70% threshold for 2 consecutive checks (37.6% at resolution). No manual intervention required; system self-corrected.
  • Current State: RESOLVED as of 08:40 UTC. Host stable with CPU at 37.6%. No service impact was observed during the spike.
  • Root Cause: Undetermined; likely process leak, runaway workload, or legitimate traffic/batch job spike. No specific culprit identified before auto-resolution.
  • Watch For: Monitor claw-gateway1 CPU trends over next shift for recurrence. If spike returns, investigate top CPU consumers via `top` command and correlate with recent deployments, scheduled cron jobs, or traffic patterns. Refer to runbook for detailed troubleshooting steps.
·
HANDOFF
2026-05-03 17:47 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU spike detected on claw-gateway1 on 2026-04-27 at 08:33 UTC; medium severity alert triggered due to upward trending CPU usage.
  • Resolution: Auto-resolver cleared incident after CPU sustained below 70% threshold for 2 consecutive checks (37.6% at resolution). No manual intervention required; likely transient spike or brief workload.
  • Current State: RESOLVED as of 08:40 UTC. Host is stable; no ongoing service impact.
  • Watch For: Monitor claw-gateway1 CPU trending over next 24-48 hours. If spike recurs, investigate for process leaks, runaway workloads, or scheduled batch jobs. Check recent deployments and traffic patterns for correlation.
  • Next Steps: Review process list if CPU rises above 50% again; escalate if sustained growth resumes or impacts service performance.
·
HANDOFF
2026-05-03 19:19 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU spike detected on claw-gateway1 on 2026-04-27 at 08:33 UTC; medium severity alert triggered due to upward trending CPU usage.
  • Resolution: Alert auto-resolved at 08:40 UTC after CPU dropped to 37.6% and remained stable below 70% threshold for 2 consecutive checks; no manual intervention required.
  • Current State: RESOLVED. claw-gateway1 CPU stable at 37.6%; no ongoing service impact.
  • Root Cause: Undetermined; likely causes were temporary process leak, runaway workload, or legitimate traffic spike that self-corrected.
  • Watch For: Monitor claw-gateway1 CPU trending in coming hours/days. If spike recurs, investigate top CPU consumers via `top` command, cross-reference recent deployments/cron jobs, and review application logs for errors before restarting services.
·
HANDOFF
2026-05-05 04:06 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU spike detected on claw-gateway1 on 2026-04-27 at 08:33 UTC; medium severity alert triggered due to upward trending CPU usage.
  • Resolution: CPU auto-resolved after dropping to 37.6% and remaining below 70% threshold for 2 consecutive checks (cleared by 08:40 UTC). No manual intervention required.
  • Current State: RESOLVED. Host is stable with CPU sustained at 37.6%. No service impact was observed during the spike.
  • Root Cause: Not definitively identified; likely either a transient process leak/runaway workload or a legitimate traffic/batch job spike that self-corrected.
  • Watch For: Monitor claw-gateway1 CPU trends over the next 24-48 hours for recurrence. If spike repeats, SSH to the host and run `top` to identify top CPU consumers; correlate with recent deployments or cron jobs.
·
HANDOFF
2026-05-14 09:26 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU spike detected on claw-gateway1 on 2026-04-27 at 08:33 UTC; medium severity alert triggered due to upward trending CPU usage.
  • Resolution: CPU automatically resolved after dropping to 37.6% and sustaining below 70% threshold for 2 consecutive checks (completed 08:40 UTC). No manual intervention required.
  • Current State: Incident is RESOLVED. Host is stable at 37.6% CPU with no ongoing service impact.
  • Next Steps: Monitor claw-gateway1 for CPU trending patterns over the next 24-48 hours. If spike recurs, investigate for process leaks, runaway workloads, or unscheduled batch jobs using `top` command per runbook.
  • Potential Root Causes: Process leak, runaway workload, or legitimate traffic/batch job spike; timing correlation with deployments or cron jobs should be checked if incident reoccurs.
·
HANDOFF
2026-05-22 19:20 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU trending upward on claw-gateway1 triggered MEDIUM severity alert on 2026-04-27 at 08:33 UTC. Potential causes identified as process leak, runaway workload, or legitimate traffic spike.
  • Resolution: Alert auto-resolved at 08:40 UTC after CPU stabilized below 70% threshold (37.6%) for two consecutive checks. No manual intervention was required.
  • Current State: RESOLVED. claw-gateway1 CPU is stable at 37.6% and within normal parameters. No service impact was observed.
  • Watch For: Monitor claw-gateway1 for sustained CPU growth over next 24-48 hours. If upward trend recurs, investigate top processes via `top` command and cross-reference recent deployments or scheduled jobs for correlation.
  • Next Steps: No immediate action required; consider reviewing host's process baseline and cron schedule if similar alerts trigger again.
·
HANDOFF
2026-05-29 04:57 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU trending upward on claw-gateway1 triggered MEDIUM severity alert on 2026-04-27 at 08:33 UTC. Alert auto-resolved after CPU dropped to 37.6% and remained below 70% threshold for 2 consecutive checks.
  • Root Cause: Likely process leak, runaway workload, or legitimate traffic spike; no single root cause was definitively identified before resolution.
  • Resolution: CPU naturally trended downward and stabilized below threshold by 08:40 UTC. No manual intervention required; alert auto-resolved.
  • Current State: RESOLVED. claw-gateway1 CPU stable at 37.6%. No ongoing service impact.
  • Watch For: Monitor claw-gateway1 for recurring CPU spikes. If pattern repeats, investigate recent deployments, scheduled cron jobs, or traffic changes. Consider enabling process-level monitoring or reviewing runbook for deeper diagnostics on next occurrence.
·
HANDOFF
2026-05-31 15:00 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU trending upward on claw-gateway1 triggered MEDIUM severity alert on 2026-04-27 at 08:33 UTC due to suspected process leak or runaway workload.
  • Resolution: Alert auto-resolved after CPU dropped to 37.6% and remained below 70% threshold for 2 consecutive checks (resolved 08:40 UTC). No manual intervention required.
  • Current State: RESOLVED. claw-gateway1 CPU stable at 37.6%. No ongoing service impact detected.
  • Watch For: Monitor claw-gateway1 CPU trending over next 24-48 hours. If upward trend resumes, investigate top CPU consumers via `top` command and correlate with recent deployments or scheduled batch jobs.
  • Next Steps: Review deployment logs and cron schedules if CPU spike recurs. Consider baseline profiling if pattern repeats.
·
HANDOFF
2026-05-31 15:18 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU trending upward on claw-gateway1 triggered MEDIUM severity alert on 2026-04-27 at 08:33 UTC; suspected process leak or runaway workload.
  • Resolution: Alert auto-resolved after CPU dropped to 37.6% and remained below 70% threshold for 2 consecutive checks (resolved 2026-04-27 at 08:40 UTC).
  • Current State: RESOLVED. claw-gateway1 CPU stable at 37.6%; no ongoing service impact.
  • Root Cause: Undetermined; likely transient spike from legitimate traffic or scheduled batch job. Runaway process ruled out by sustained low CPU post-spike.
  • Next Shift Action: Monitor claw-gateway1 CPU trends over next 24-48 hours. If spike recurs, SSH in and run `top` to identify top CPU consumers; cross-check recent deployments and cron jobs for correlation.
·
HANDOFF
2026-06-06 10:42 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU trending upward on claw-gateway1 triggered MEDIUM severity alert on 2026-04-27 at 08:33 UTC; suspected root cause was process leak or runaway workload.
  • Resolution: Alert auto-resolved after CPU dropped to 37.6% and sustained below 70% threshold for 2 consecutive checks (resolved at 08:40 UTC same day).
  • Current State: RESOLVED. Host is stable; no manual intervention was required as CPU self-corrected.
  • Watch For: Monitor claw-gateway1 for recurring CPU spikes over next few days. If pattern repeats, investigate recent deployments, scheduled batch jobs, or traffic anomalies using `top` command to identify persistent high-CPU processes.
  • Next Steps: No action required unless spike recurs; review runbook for process leak diagnostics if alert triggers again.
·
HANDOFF
2026-06-06 12:40 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU trending upward on claw-gateway1 triggered MEDIUM severity alert on 2026-04-27 at 08:33 UTC; suspected root cause was process leak or runaway workload.
  • Resolution: Alert auto-resolved at 08:40 UTC after CPU dropped to 37.6% and remained below 70% threshold for 2 consecutive checks (5-minute intervals).
  • Current State: RESOLVED. Host is stable; no ongoing service impact observed.
  • Next Steps: Monitor claw-gateway1 for CPU trending patterns. If spike recurs, SSH in and run `top -b -n 1 | head -20` to identify top CPU consumers; cross-check against recent deployments or scheduled jobs.
  • Watch For: Sustained upward CPU trend or repeated spikes on this host; may indicate unresolved process leak requiring deeper investigation or restart.
·
HANDOFF
2026-06-12 22:29 UTC
Handoff notes generated:

Shift Handoff Notes

  • Incident: CPU trending upward on claw-gateway1 triggered MEDIUM severity alert on 2026-04-27 at 08:33 UTC; suspected root causes were process leak or runaway workload.
  • Resolution: Alert auto-resolved at 08:40 UTC after CPU dropped to 37.6% and remained stable below 70% threshold for 2 consecutive checks (5+ minutes).
  • Current State: RESOLVED. claw-gateway1 CPU is stable at 37.6%. No service impact was observed. No manual intervention was required.
  • Next Shift Action: Monitor claw-gateway1 CPU trends over next 24-48 hours for recurrence. If spike returns, SSH in and run `top` to identify top CPU consumers; cross-check against recent deployments or scheduled jobs.
  • Watch For: Sustained CPU growth >70% or repeated spikes on claw-gateway1; may indicate underlying process leak requiring investigation and restart.
Details
ID #5
Severity MEDIUM
Source infra_monitor
Status RESOLVED
Opened 2026-04-27 08:33