An Agent skill for Automation scenarios. Original description: Diagnose and fix Windows-specific OpenClaw issues including event loop delays, gateway task health, WhatsApp connectivity, stuck subagents, prewarm blocking,...
name: openclaw-health-monitor
version: 1.6.0
description: Cross-platform diagnostics for OpenClaw gateways. Checks gateway health, event loop degradation, WhatsApp connectivity, service state, stuck background subagents, prewarm blocking, and generates diagnostic bundles for bug reports. See SECURITY.md for privacy disclosures. Use when: (1) Gateway feels slow or unresponsive, (2) CLI health checks take unusually long, (3) WhatsApp is not receiving messages, (4) Agent responses are delayed, (5) After OpenClaw version upgrades, (6) Routine system health check.
license: MIT
compatibility: openclaw
metadata:
  openclaw:
    emoji: "🩺"
    tags: [diagnostics, monitoring, health, cross-platform, windows, linux, macos]
Cross-platform diagnostics for OpenClaw gateways. Covers the most common performance problems discovered through real-world debugging on Windows 11 native, WSL2, Linux, and macOS environments.
See SECURITY.md for data collection and privacy disclosures. External alerts are off by default in v1.4.0+.
Run a comprehensive health snapshot:
openclaw health --verbose --json
openclaw channels status --probe
openclaw status --all
Key metrics to watch:
Check if the gateway process is running:
# Cross-platform (bash/zsh):
ps aux | grep openclaw | grep -v grep
# Windows PowerShell:
# Get-CimInstance Win32_Process -Filter "Name = 'node.exe'" | Where-Object { $_.CommandLine -like "*openclaw*" }
Verify the gateway is listening on its health port:
# Cross-platform (bash/zsh):
curl -s --max-time 5 http://127.0.0.1:18789/health
# Windows PowerShell:
# Invoke-RestMethod -Uri "http://127.0.0.1:18789/health" -TimeoutSec 5
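The probe above can be wrapped so a script gets a one-word verdict instead of raw output. This is a minimal POSIX sh sketch, assuming the default URL used above; `probe_health` is a hypothetical helper name, and treating any 2xx status as "up" is an assumption. It sends no token, so a non-2xx code such as 401 may only mean the endpoint wants authentication.

```shell
# probe_health: hypothetical helper, not part of OpenClaw.
# Usage: probe_health [URL] [TIMEOUT_SECONDS]
probe_health() {
  local url="${1:-http://127.0.0.1:18789/health}"
  local timeout="${2:-5}"
  local code
  # -o /dev/null drops the body; -w prints only the HTTP status code.
  # curl prints 000 when nothing answered; '|| true' keeps 'set -e' scripts running.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time "$timeout" "$url" 2>/dev/null) || true
  case "$code" in
    2??) echo "up (HTTP $code)" ;;
    ''|000) echo "unreachable" ;;          # empty also covers a missing curl binary
    *) echo "responded with HTTP $code" ;; # e.g. 401 if a token is required
  esac
}
```

Calling `probe_health` with no arguments checks the default local gateway.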
Platform-specific service management commands:
# Linux (systemd):
# Service status
systemctl --user status openclaw-gateway
# Journal logs
journalctl --user -u openclaw-gateway -n 50
# Process health
ps aux | grep openclaw | grep -v grep
# macOS (launchd):
# Service status
launchctl list | grep openclaw
# Disk usage
du -sh ~/.openclaw
# Process health
ps aux | grep openclaw | grep -v grep
# Windows PowerShell (Task Scheduler):
Get-ScheduledTask -TaskName "OpenClaw Gateway" | Format-List State, LastRunTime, LastTaskResult
Healthy: State=Ready, LastTaskResult=0
Degraded: State=Running but gateway unresponsive → likely a stuck restart; kill the gateway's node process and restart the task
Failed: LastTaskResult non-zero → check gateway log for errors
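The three states above can be expressed as a small decision function, for example when the `Get-ScheduledTask` values are read from WSL or Git Bash. This is a sketch under stated assumptions: `classify_task` is a hypothetical helper, the "responsive" flag is assumed to come from a separate health probe, and the "Running"/"Unknown" outputs for cases the list does not cover are illustrative additions.

```shell
# classify_task: hypothetical helper. Arguments are State and LastTaskResult as
# printed by Get-ScheduledTask, plus "yes"/"no" from a separate health probe.
classify_task() {
  local state="$1" last_result="$2" responsive="$3"
  # A running task typically reports LastTaskResult 267009 (0x41301, "task is
  # currently running"), so Running is handled before the non-zero check.
  if [ "$state" = "Running" ]; then
    if [ "$responsive" = "yes" ]; then
      echo "Running"
    else
      echo "Degraded"   # stuck restart: restart the gateway
    fi
  elif [ "$last_result" != "0" ]; then
    echo "Failed"       # check the gateway log for errors
  elif [ "$state" = "Ready" ]; then
    echo "Healthy"
  else
    echo "Unknown"      # e.g. Disabled or Queued
  fi
}
```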
The most common performance regression, event loop degradation, appears in 2026.5.x:
# Cross-platform (bash/zsh):
openclaw health --json | grep -E "eventLoop|p99|delayMax"
# Windows PowerShell:
# openclaw health --json | Select-String "eventLoop|p99|delayMax"
# Cross-platform (bash/zsh):
grep -E "provider auth state pre-warmed|eventLoopMax" "${TMPDIR:-/tmp}"/openclaw/*.log ~/.openclaw/logs/*.log 2>/dev/null
# Windows PowerShell:
# Select-String "provider auth state pre-warmed|eventLoopMax" "$env:TEMP\openclaw\openclaw-*.log"
Symptoms: CLI health checks taking 20+ seconds, a "degraded" event loop, and "startup model warmup timed out" in the logs.
A key diagnostic finding: the CLI health command can be 20-30x slower than the HTTP health endpoint on Windows:
# CLI health (slow on Windows)
openclaw health --timeout 20000
# HTTP health endpoint, cross-platform (bash/zsh):
curl -s -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" --max-time 10 "http://127.0.0.1:18789/health"
# Windows PowerShell:
# $token = $env:OPENCLAW_GATEWAY_TOKEN
# Invoke-RestMethod -Uri "http://127.0.0.1:18789/health" -Headers @{"Authorization"="Bearer $token"} -TimeoutSec 10
If the CLI is slow but the HTTP endpoint is fast (under 500 ms), the gateway is healthy and the delay comes from the CLI tool's WebSocket authentication overhead on Windows.
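The comparison can be scripted by timing both checks and applying the 500 ms rule. A minimal sketch, assuming GNU `date` (which supports `%N` nanoseconds, so Linux or WSL rather than stock macOS); `elapsed_ms` and `interpret_latency` are hypothetical helpers, and the 2000 ms "CLI is slow" cut-off is an assumed value, not one OpenClaw defines.

```shell
# elapsed_ms and interpret_latency are hypothetical helpers.
# elapsed_ms needs GNU date (%N = nanoseconds): Linux/WSL, not stock macOS.
elapsed_ms() {
  local start end
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# 500 ms is the HTTP threshold from the text; 2000 ms for "CLI is slow"
# is an assumed cut-off, not an OpenClaw value.
interpret_latency() {
  local cli_ms="$1" http_ms="$2"
  if [ "$http_ms" -ge 500 ]; then
    echo "gateway slow"
  elif [ "$cli_ms" -ge 2000 ]; then
    echo "gateway healthy; CLI overhead"
  else
    echo "both fast"
  fi
}
```

For example: `interpret_latency "$(elapsed_ms openclaw health --timeout 20000)" "$(elapsed_ms curl -s -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" --max-time 10 http://127.0.0.1:18789/health)"`. Sending the token matters here, since a fast 401 response would otherwise look like a healthy gateway.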
Background subagents can block a gateway restart for 5+ minutes:
# Cross-platform (bash/zsh):
grep -E "restart.*deferred.*background task.*active" "${TMPDIR:-/tmp}"/openclaw/*.log ~/.openclaw/logs/*.log 2>/dev/null
# Windows PowerShell:
# Select-String "restart.*deferred.*background task.*active" "$env:TEMP\openclaw\openclaw-*.log"
If found: kill the gateway's node process and restart the gateway. The stuck subagents will not recover on their own.
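For scripted checks, the same pattern can be turned into a yes/no test. A minimal sketch, assuming the two log locations used above; `has_stuck_subagents` is a hypothetical helper that returns success (exit status 0) when any `*.log` file directly inside the given directories matches.

```shell
# has_stuck_subagents: hypothetical helper, not part of OpenClaw.
# Usage: has_stuck_subagents [LOG_DIR...]  (defaults to the two paths above)
has_stuck_subagents() {
  local d
  if [ "$#" -eq 0 ]; then
    set -- "${TMPDIR:-/tmp}/openclaw" "$HOME/.openclaw/logs"
  fi
  for d in "$@"; do
    [ -d "$d" ] || continue
    # -q stops at the first match; unreadable or missing files are ignored.
    if grep -qE "restart.*deferred.*background task.*active" "$d"/*.log 2>/dev/null; then
      return 0
    fi
  done
  return 1
}
```

Usage: `has_stuck_subagents && echo "stuck subagents: restart the gateway service"`.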
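Check the logs for WhatsApp connectivity errors (WebSocket 408 closes, retries, and WhatsApp wait timeouts):
# Cross-platform (bash/zsh):
grep -E "WebSocket.*closed.*408|Retry.*/12|timed out waiting for.*WhatsApp" "${TMPDIR:-/tmp}"/openclaw/*.log ~/.openclaw/logs/*.log 2>/dev/null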
# Windows PowerShell:
# Select-String "WebSocket.*closed.*408|Retry.*\/12|timed out waiting for.*WhatsApp" "$env:TEMP\openclaw\openclaw-*.log"
If memory-lancedb is installed but not configured, it stays disabled until configured. Fix: use memory-core instead, or configure embeddings. Check which memory plugins are present:
# Cross-platform (bash/zsh):
openclaw plugins list | grep "memory"
# Windows PowerShell:
# openclaw plugins list | Select-String "memory"
Version 2026.5.22 introduced provider auth prewarming, which can block for 30-79 seconds:
# Cross-platform (bash/zsh):
grep -E "provider auth state pre-warmed|startup model warmup timed out" "${TMPDIR:-/tmp}"/openclaw/*.log ~/.openclaw/logs/*.log 2>/dev/null
# Windows PowerShell:
# Select-String "provider auth state pre-warmed|startup model warmup timed out" "$env:TEMP\openclaw\openclaw-*.log"
Fix: set OPENCLAW_SKIP_PROVIDER_AUTH_PREWARM=1 in gateway.cmd (Windows) or in the gateway's environment variables.
Fix (future): Add { "gateway": { "providerAuthPrewarm": { "enabled": false } } } to config (pending PR merge).
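On Linux, one way to put the variable into the gateway's environment persistently is a systemd drop-in. This is a sketch under stated assumptions: the unit is `openclaw-gateway.service` (as in the systemctl commands above), and both the helper `write_prewarm_override` and the file name `skip-prewarm.conf` are hypothetical choices.

```shell
# write_prewarm_override: hypothetical helper for a Linux systemd user service.
# Assumes the unit is openclaw-gateway.service; skip-prewarm.conf is an arbitrary name.
write_prewarm_override() {
  local dir="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user/openclaw-gateway.service.d"
  mkdir -p "$dir" || return 1
  # A drop-in adds settings without editing the installed unit file.
  cat > "$dir/skip-prewarm.conf" <<'EOF'
[Service]
Environment=OPENCLAW_SKIP_PROVIDER_AUTH_PREWARM=1
EOF
  echo "$dir/skip-prewarm.conf"
}
```

After writing the file, run `systemctl --user daemon-reload && systemctl --user restart openclaw-gateway` so the service picks it up.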
WARNING: This tool collects data that may contain sensitive information. The winhealth_diagnostics tool gathers:
… (when include_logs is enabled; defaults to disabled)
Diagnostic bundles may contain system metadata, log-derived details, file paths, identifiers, and configuration structure even after OpenClaw's built-in sanitization. Review the contents before sharing, and share diagnostic bundles only with trusted recipients for troubleshooting purposes. See SECURITY.md.
For bug reports or sharing diagnostics:
openclaw gateway diagnostics export
This creates a sanitized zip at ~/.openclaw/logs/support/ with:
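To review a bundle before sharing it, the newest one in that directory can be located by modification time. A minimal sketch; `latest_bundle` is a hypothetical helper, and it assumes bundles are written as `*.zip` files directly in `~/.openclaw/logs/support/`.

```shell
# latest_bundle: hypothetical helper. Usage: latest_bundle [DIR]
latest_bundle() {
  local dir="${1:-$HOME/.openclaw/logs/support}" newest="" f
  for f in "$dir"/*.zip; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # -nt compares modification times; keep the most recently written file.
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] || return 1
  echo "$newest"
}
```

For example, `unzip -l "$(latest_bundle)"` lists the bundle's contents without extracting it.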
Prewarm blocking: set OPENCLAW_SKIP_PROVIDER_AUTH_PREWARM=1
pkill -9 node: this kills ALL Node.js processes on your machine, including unrelated applications, active development servers, and background workers, and can cause loss of unsaved work. Use it only as a last resort, when the gateway is completely unresponsive and other steps have failed. Prefer restarting only the gateway: systemctl --user restart openclaw-gateway (Linux), launchctl kickstart -k gui/$UID/com.openclaw.gateway (macOS), or Stop-ScheduledTask -TaskName "OpenClaw Gateway"; Start-ScheduledTask -TaskName "OpenClaw Gateway" (Windows).
openclaw channels status --probe: verify WhatsApp channel status
Check channels.whatsapp.allowFrom
Confirm dmPolicy is not "disabled"
If logs show loggedOut, re-link: openclaw channels logout --channel whatsapp; openclaw channels login --channel whatsapp
agents.defaults.timeoutSeconds: should be ≥ 300 s
openclaw config validate: check for schema changes
openclaw doctor --fix: repair config drift
When the companion plugin (@jordan-thirkle/openclaw-winhealth) is installed, automated background monitoring is active. Use the winhealth_check tool for the current snapshot, winhealth_diagnostics for a full bundle, and winhealth_alerts to manage alert state.
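Rather than reaching for `pkill -9 node`, a script can pick the gateway-only restart command for the current platform. A minimal sketch: `restart_cmd` is a hypothetical helper that only prints the command and never runs it, the commands are the per-platform ones listed in the pkill warning, and mapping Git Bash/MSYS/Cygwin `uname -s` values to the Windows command is an assumption (WSL reports `Linux`).

```shell
# restart_cmd: hypothetical helper; prints, never executes.
# Usage: restart_cmd [OS_NAME]  (defaults to `uname -s`)
restart_cmd() {
  case "${1:-$(uname -s)}" in
    Linux)  echo "systemctl --user restart openclaw-gateway" ;;
    Darwin) echo "launchctl kickstart -k gui/$(id -u)/com.openclaw.gateway" ;;
    MINGW*|MSYS*|CYGWIN*)
      echo 'Stop-ScheduledTask -TaskName "OpenClaw Gateway"; Start-ScheduledTask -TaskName "OpenClaw Gateway"' ;;
    *) echo "unknown platform" >&2; return 1 ;;
  esac
}
```

Printing instead of executing keeps a human in the loop; the Windows line is meant to be pasted into PowerShell.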
… (alertChannel: "none"). Enable external alerts only after reviewing SECURITY.md.
… sessionStorage by default and cleared on tab close.
… pkill -9 node, which kills all Node.js processes; use it only as a last resort.