#32 medium infra_monitor
CPU alert on claw-gateway1 — 94.5% (threshold: 90%)
Host: claw-gateway1
CAUSE: CPU exceeded the 90% warning threshold.
IMPACT: Performance may degrade if the trend continues.
ACTION: Monitor for sustained elevation; investigate if it persists beyond 15 minutes.
CPU: 94.5% | Memory: 43.8%
Opened 2026-05-15 13:20 UTC · Resolved 2026-05-15 13:35 UTC
Timeline
WEBHOOK
2026-05-15 13:20 UTC
Alert received from AI Infra Monitor. Host: claw-gateway1, Severity: MEDIUM
STATUS CHANGE
2026-05-15 13:20 UTC
OPEN -> INVESTIGATING (auto - low/medium severity)
CONTEXT AGGREGATED
2026-05-15 13:20 UTC
Sources available: 3/3 — Runbook: ✓ | Past incidents: ✓ | Infra health: ✓
Response Plan
2026-05-15 13:20 UTC

Severity

MEDIUM-HIGH: claw-gateway1 CPU at 94.5% with upward trend; approaching P1 critical threshold (95%).

Root Cause

  • Runaway process or resource leak in ADOStack service(s)
  • Sustained traffic spike or workload increase

Actions

  1. SSH to claw-gateway1; run top to identify CPU-consuming process.
  2. Restart affected ADOStack service(s); monitor CPU for 2 minutes post-restart.
  3. If CPU remains >80% after restart, trigger escalation via oncall.ado-runner.com API.
  4. If CPU stays >85% for >5 min total, authorize host reboot (expect 2–3 min downtime).
  5. Acknowledge alert in Telegram (chat ID: 6055821277).

Watch

  • CPU trend: if it keeps climbing past 95%, P1 activation is imminent.
  • Memory: currently healthy at 43.8%; watch for secondary spikes indicating cascading failure.

Escalate If

CPU sustained >85% for 5+ minutes OR reaches 95% threshold.
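
The escalation rule above can be sketched as a small check over recent CPU samples. This is a minimal illustration, not the monitor's actual implementation: the function name `should_escalate`, the sample format (seconds, CPU percent), and the choice to evaluate the 95% condition on the latest sample only are assumptions.

```python
# Hypothetical sketch of the "Escalate If" rule; names and sample format are assumed.
SUSTAINED_THRESHOLD = 85.0   # percent, from the response plan
CRITICAL_THRESHOLD = 95.0    # percent, P1 critical threshold
SUSTAINED_SECONDS = 5 * 60   # "5+ minutes"


def should_escalate(samples):
    """Return True if the escalation rule fires.

    samples: list of (timestamp_seconds, cpu_percent) tuples, oldest first.
    """
    if not samples:
        return False

    latest_ts, latest_cpu = samples[-1]

    # Condition 2: CPU reaches the 95% critical threshold.
    if latest_cpu >= CRITICAL_THRESHOLD:
        return True

    # Condition 1: CPU has stayed above 85% for at least 5 minutes.
    # Walk backwards to find where the current unbroken run above 85% began.
    run_start = None
    for ts, cpu in reversed(samples):
        if cpu > SUSTAINED_THRESHOLD:
            run_start = ts
        else:
            break

    return run_start is not None and latest_ts - run_start >= SUSTAINED_SECONDS
```

With the two readings recorded in this incident (94.5% at 13:20, 55.8% ten minutes later), the check does not fire, which matches the timeline: no escalation was triggered.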

STATUS CHANGE
2026-05-15 13:30 UTC
Auto-resolver: CPU at 55.8% (below 70% clear threshold) — clean check 1/2
STATUS CHANGE
2026-05-15 13:35 UTC
Auto-resolver: CPU at 55.8% (below 70% clear threshold) — clean check 1/2
STATUS CHANGE
2026-05-15 13:35 UTC
Auto-resolver: CPU at 55.8% (below 70% clear threshold) — clean check 2/2
STATUS CHANGE
2026-05-15 13:35 UTC
AUTO-RESOLVED: CPU sustained below 70% for 2 consecutive checks. Current value: 55.8%
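
The auto-resolve behaviour logged above (two consecutive checks below the 70% clear threshold) can be sketched as a small counter. This is an illustration under stated assumptions: the class name `AutoResolver`, its parameters, and the reset-on-dirty-check behaviour are hypothetical and not taken from the monitor's code.

```python
# Hypothetical sketch of a consecutive-clean-check resolver; not the monitor's real code.
class AutoResolver:
    def __init__(self, clear_threshold=70.0, required_clean_checks=2):
        self.clear_threshold = clear_threshold              # percent, from the log
        self.required_clean_checks = required_clean_checks  # "2 consecutive checks"
        self.clean_checks = 0

    def observe(self, cpu_percent):
        """Record one check; return True once the incident can be auto-resolved."""
        if cpu_percent < self.clear_threshold:
            self.clean_checks += 1
        else:
            # Assumed behaviour: any reading at or above the threshold resets the count.
            self.clean_checks = 0
        return self.clean_checks >= self.required_clean_checks
```

Using a clear threshold (70%) well below the alert threshold (90%) and requiring consecutive clean checks keeps an incident from flapping between open and resolved when CPU hovers near the alert line.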
·
HANDOFF
2026-05-22 02:56 UTC
Handoff notes generated:
# Shift Handoff Notes: claw-gateway1 CPU Alert
- **What happened:** CPU on claw-gateway1 spiked to 94.5% (threshold: 90%) at 13:20 UTC on 2026-05-15. The response plan suspected a runaway process or resource leak in ADOStack service(s).
- **What was done:** The alert auto-resolved at 13:35 UTC after CPU dropped to 55.8% and stayed below the 70% clear threshold for 2 consecutive checks. No manual intervention was required.
- **Current state:** RESOLVED. claw-gateway1 CPU stable at 55.8%. All monitoring systems nominal.
- **Watch for:** Monitor claw-gateway1 CPU over the next 2-4 hours for recurrence. If the spike returns or CPU sustains >80%, escalate via oncall.ado-runner.com and investigate ADOStack service(s) for resource leaks or traffic anomalies.
- **Documentation:** Runbook and past incident context are available; no follow-up actions are pending unless the spike recurs.
·
HANDOFF
2026-05-29 04:57 UTC
Handoff notes generated:
# Shift Handoff Notes: claw-gateway1 CPU Alert
- **Incident:** CPU spike on claw-gateway1 reached 94.5% at 13:20 UTC on 2026-05-15, exceeding the 90% threshold with an upward trend toward P1 critical (95%).
- **Root cause (suspected):** Runaway process or resource leak in ADOStack service(s), likely triggered by a sustained traffic spike or workload increase.
- **Resolution:** CPU auto-recovered to 55.8% within ~15 minutes; the incident auto-resolved at 13:35 UTC after passing two consecutive health checks below the 70% threshold. No manual intervention required.
- **Current state:** RESOLVED. claw-gateway1 CPU stable at 55.8%. All systems nominal.
- **Watch for:** Monitor claw-gateway1 CPU over the next shift for recurrence. If the spike returns, follow the runbook: SSH and run `top` to identify the offending process, restart the affected ADOStack service, and escalate via oncall.ado-runner.com if CPU remains >80% post-restart.
·
HANDOFF
2026-05-31 14:59 UTC
Handoff notes generated:
# Shift Handoff Notes: claw-gateway1 CPU Alert
- **Incident:** CPU spike on claw-gateway1 reached 94.5% (threshold: 90%) on 2026-05-15 at 13:20 UTC with upward trend, triggering MEDIUM-HIGH severity alert.
- **Root Cause:** Suspected runaway process or resource leak in ADOStack service(s); likely triggered by sustained traffic spike or workload increase.
- **Resolution:** CPU automatically stabilized and dropped to 55.8% by 13:30 UTC. Auto-resolver confirmed resolution after 2 consecutive clean checks below 70% threshold; incident auto-resolved at 13:35 UTC.
- **Current State:** RESOLVED. claw-gateway1 CPU nominal (55.8%). No manual intervention was required.
- **Watch For:** Monitor for recurrence of CPU spikes on claw-gateway1. If spike returns with upward trend, SSH to host and run `top` to identify culprit process. Escalate via oncall.ado-runner.com if CPU sustains >80% after service restart.
·
HANDOFF
2026-05-31 19:53 UTC
Handoff notes generated:
# Shift Handoff Notes: claw-gateway1 CPU Alert
- **Incident:** CPU spike on claw-gateway1 reached 94.5% (threshold: 90%) at 13:20 UTC on 2026-05-15 with an upward trend, approaching the critical P1 threshold.
- **Root Cause:** Suspected runaway process or resource leak in ADOStack service(s), or a sustained traffic spike.
- **Resolution:** CPU recovered to 55.8% within 15 minutes; the auto-resolver confirmed 2 consecutive clean checks below the 70% threshold and auto-closed the incident at 13:35 UTC.
- **Current State:** RESOLVED. Host is stable with normal CPU utilization.
- **Watch For:** Monitor claw-gateway1 for recurrence of CPU spikes; if CPU stays above 80% after an ADOStack service restart, follow the escalation procedure via the oncall.ado-runner.com API. Consider reviewing ADOStack service logs for resource leaks or traffic patterns if the spike repeats.
·
HANDOFF
2026-06-06 10:43 UTC
Handoff notes generated:
# Shift Handoff Notes: claw-gateway1 CPU Alert
- **Incident:** CPU spike on claw-gateway1 reached 94.5% (threshold: 90%) on 2026-05-15 at 13:20 UTC with upward trend, approaching P1 critical threshold (95%).
- **Root Cause:** Suspected runaway process or resource leak in ADOStack service(s); likely triggered by sustained traffic spike or workload increase.
- **Resolution:** CPU automatically resolved to 55.8% by 13:35 UTC (15 min duration). Alert auto-resolved after passing 2 consecutive clean checks below 70% threshold.
- **Current State:** RESOLVED. claw-gateway1 CPU stable at 55.8%. No manual intervention was required; system self-recovered.
- **Watch For:** Monitor claw-gateway1 CPU trends over next shift. If spikes recur or ADOStack processes show sustained high resource usage, investigate for persistent resource leak and consider service restart or escalation to oncall.ado-runner.com API.
·
HANDOFF
2026-06-09 05:56 UTC
Handoff notes generated:
# Shift Handoff Notes: claw-gateway1 CPU Alert
- **Incident:** CPU spike on claw-gateway1 reached 94.5% (threshold: 90%) on 2026-05-15 at 13:20 UTC with an upward trend, approaching the critical threshold.
- **Root Cause:** Suspected runaway process or resource leak in ADOStack service(s); a sustained traffic spike was considered a likely trigger.
- **Resolution:** CPU recovered to 55.8% within 15 minutes; the auto-resolver confirmed 2 consecutive clean checks below the 70% threshold and closed the incident at 13:35 UTC.
- **Current State:** RESOLVED. claw-gateway1 CPU stable at 55.8%; no ongoing alerts. Incident auto-closed.
- **Watch For:** Monitor claw-gateway1 CPU trends over the next shift. If spikes recur above 85%, investigate ADOStack service processes using `top` and consider a restart or escalation per the runbook. Check the infra health dashboard for sustained traffic anomalies.
Details
ID #32
Severity MEDIUM
Source infra_monitor
Status RESOLVED
Opened 2026-05-15 13:20