Skill Vetter
An agent skill for security scenarios. Original description: Security-first skill vetting for AI agents. Use before installing any skill from ClawdHub, GitHub, or other sources. Checks for red flags, permission scope, and suspicious patterns.
name: skill-usefulness-audit
slug: skill-usefulness-audit
description: Audits installed agent skills for usage, overlap, burden, risk, and missing evidence. Use only when the user asks to audit, clean up, merge, delete, or review skills.
version: 0.2.13
tags: ["audit","skills","ablation","openclaw"]
user-invocable: true
disable-model-invocation: true
argument-hint: --skills-root PATH --usage-file FILE
homepage: https://github.com/gongyu0918-debug/skill-usefulness-audit
metadata: {"openclaw":{"skillKey":"skill-usefulness-audit","requires":{"bins":["python"]},"homepage":"https://github.com/gongyu0918-debug/skill-usefulness-audit"}}
This ClawHub bundle is packaged for OpenClaw. Install it from an OpenClaw workspace with:
openclaw skills install skill-usefulness-audit
OpenClaw picks up installed workspace skills in the next session. For other agent hosts, use the GitHub repository instead: https://github.com/gongyu0918-debug/skill-usefulness-audit
Use this skill to judge whether installed skills still deserve to stay installed.
It turns vague "this feels useless" opinions into a repeatable audit based on usage evidence, overlap, outcome impact, quality burden, confidence, community prior, and static risk hints.
Run this skill only after a direct user request.
Do not invoke it implicitly during normal task execution.
Audit these layers in order:
Treat API and tool skills as protected capability skills during ablation.
Examples: Excel, DOCX, PDF, browser automation, deployment, OCR, external API wrappers, MCP/API gateway helpers.
Search user-provided roots first.
Fall back to OpenClaw-local roots such as ./skills, ./.agents/skills, ~/.openclaw/skills, or ~/.agents/skills.
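The root search order above can be sketched as follows. This is a minimal illustration, not the audit script's own logic: `resolve_skill_roots` is a hypothetical helper, and it assumes the fallbacks are consulted only when no user-provided root exists on disk.

```python
from pathlib import Path

# Fallback roots named in the text; "~" is expanded at call time.
FALLBACK_ROOTS = ("./skills", "./.agents/skills", "~/.openclaw/skills", "~/.agents/skills")


def resolve_skill_roots(user_roots, fallback_roots=FALLBACK_ROOTS):
    """Return existing skill directories, preferring user-provided roots.

    Hypothetical helper: the fallback list is used only when none of the
    user-provided roots is an existing directory (an assumption).
    """
    def existing(paths):
        expanded = (Path(p).expanduser() for p in paths)
        return [p for p in expanded if p.is_dir()]

    found = existing(user_roots)
    return found if found else existing(fallback_roots)
```

Calling `resolve_skill_roots(["./my-skills"])` would return only that directory when it exists, and otherwise whichever fallback roots are present.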
Prefer native counters, logs, or telemetry.
Read calls, recent_30d_calls, recent_90d_calls, last_used_at, and active_days when present.
Also read optional burden fields: executions, script_failures, repair_turns, reference_loads, and false_triggers.
Fall back to transcript mentions only when native counts are unavailable.
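A minimal sketch of how one usage record might be normalized under these rules. The field names come from the text; `normalize_usage`, the `evidence` label, and the zero defaults are assumptions made for illustration.

```python
# Field names taken from the text above.
USAGE_FIELDS = ("calls", "recent_30d_calls", "recent_90d_calls", "active_days")
BURDEN_FIELDS = ("executions", "script_failures", "repair_turns",
                 "reference_loads", "false_triggers")


def normalize_usage(record, history_mentions=0):
    """Normalize one per-skill usage record (hypothetical helper).

    Native counters win when any are present. Transcript mentions are
    kept in a separate field and never promoted to direct calls.
    """
    has_native = any(field in record for field in USAGE_FIELDS)
    out = {field: int(record.get(field) or 0)
           for field in USAGE_FIELDS + BURDEN_FIELDS}
    out["last_used_at"] = record.get("last_used_at")
    if has_native:
        out["evidence"] = "native"
        out["history_mentions"] = 0
    else:
        out["evidence"] = "transcript" if history_mentions else "none"
        out["history_mentions"] = history_mentions
    return out
```

Keeping `history_mentions` apart from `calls` mirrors the rule that transcript mentions are weaker evidence than native counts.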
Read each skill's SKILL.md. Extract name, description, headings, scripts, references, assets, resource size metrics, and source path.
Classify each skill's kind as api, tool, or general.
Use the protected path for api and tool.
Compare descriptions, headings, and resource names.
Keep the top overlap peer and similarity score for each skill.
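One way to picture the overlap step is a token-set comparison. The sketch below uses Jaccard similarity over lowercase word tokens, which is an assumption: the source does not say which similarity measure the audit uses, and `top_overlap_peers` is a hypothetical name.

```python
import re


def tokens(text):
    """Lowercase alphanumeric word tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_overlap_peers(skills):
    """Map each skill to its (top_peer, similarity) pair.

    `skills` maps a skill name to the concatenated description,
    headings, and resource names. Skills with no overlapping peer
    get (None, 0.0).
    """
    token_sets = {name: tokens(text) for name, text in skills.items()}
    result = {}
    for name, own in token_sets.items():
        best = (None, 0.0)
        for peer, other in token_sets.items():
            if peer == name:
                continue
            score = jaccard(own, other)
            if score > best[1]:
                best = (peer, score)
        result[name] = best
    return result
```

A set-based measure ignores word order, so it tends to flag skills that share vocabulary even when they describe different workflows; the similarity score is a hint to review, not a verdict.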
Ablate only general skills. Start with local triage signals instead of full replay.
Prioritize low final score, high overlap, high quality burden, frequent activation, weak evidence, and missing ablation.
Use --ablation-plan-out to write the candidate list, pairwise judge protocol, configurable early-stop rules, model-cost estimates, and accuracy tradeoff.
Run actual replay only for candidates selected by that plan.
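The triage described above could look roughly like this. Every threshold, the 0-1 confidence scale, and the record keys are hypothetical placeholders chosen for the sketch; the real cutoffs live in the audit script and rubric.

```python
def ablation_candidates(skills, limit=3):
    """Rank general skills by how many triage signals they trip.

    Each skill is a dict with hypothetical keys: skill, kind, final,
    overlap, burden, recent_30d, confidence, has_ablation.
    api and tool skills are excluded (protected capability skills).
    """
    eligible = [s for s in skills if s["kind"] == "general"]

    def priority(s):
        signals = [
            s["final"] < 5.0,        # low final score (assumed cutoff)
            s["overlap"] >= 0.6,     # high overlap (assumed cutoff)
            s["burden"] >= 1.0,      # high quality burden (assumed cutoff)
            s["recent_30d"] >= 20,   # frequent activation (assumed cutoff)
            s["confidence"] < 0.5,   # weak evidence (assumed 0-1 scale)
            not s["has_ablation"],   # missing ablation
        ]
        return sum(signals)

    ranked = sorted(eligible, key=priority, reverse=True)
    return [s["skill"] for s in ranked[:limit]]
```

Counting tripped signals keeps the triage cheap: it needs only fields the audit has already collected, so full replay is reserved for the shortlist.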
Penalize over-triggering with low execution or low ablation impact.
Penalize bloated SKILL.md, overlong frontmatter descriptions, excessive reference loading, hidden reference files, vague resource names, long references without a table of contents, reference/assets dumps, executable assets, script count bloat, script maintenance smells, script failure, script syntax errors, and repeated agent repair.
Record shell, network, install-hook, packaging, protected-path, persistence, dynamic-exec, or private-content patterns as static hints, not as a safety proof.
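A minimal sketch of recording static hints with regular expressions. The three patterns below are illustrative assumptions covering only the shell, network, and dynamic-exec categories; they are not the patterns the audit package ships with.

```python
import re

# Illustrative patterns only; a match is a hint for human review,
# not proof that a skill is unsafe (or that a clean skill is safe).
HINT_PATTERNS = {
    "shell": re.compile(r"\b(subprocess|os\.system)\b"),
    "network": re.compile(r"\b(curl|wget|requests\.|urllib)"),
    "dynamic-exec": re.compile(r"\b(eval|exec)\s*\("),
}


def scan_hints(source_text):
    """Return the sorted names of hint categories found in the text."""
    return sorted(name for name, pattern in HINT_PATTERNS.items()
                  if pattern.search(source_text))
```

For example, `scan_hints("import subprocess")` returns `["shell"]`, while plain text with no matching pattern returns an empty list.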
Accept local registry exports through --community-file.
Treat these metrics as an external prior, not as local proof.
Compute final_score. Read references/scoring-rubric.md.
Include a full ranking table, a recommended-actions table, a delete-candidate table, and a short evidence note for each skill.
Include report_mode, score_breakdown, quality_penalty, quality_evidence, and community_breakdown in JSON output.
Read references/ablation-protocol.md before running ablation.
For each eligible skill:
Do not ablate api or tool skills through fake no-tool simulations.
Use the protected-capability branch in the rubric for those skills.
Run the audit script after collecting evidence:
python scripts/skill_usefulness_audit.py audit \
  --skills-root ./skills \
  --usage-file ./usage.json \
  --history-file ./history.jsonl \
  --ablation-file ./ablation.json \
  --community-file ./community.json \
  --markdown-out ./skill-audit-report.md \
  --json-out ./skill-audit-report.json \
  --ablation-plan-out ./skill-ablation-plan.json
When the host exposes the skill directory, prefer an absolute script path.
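When the skill directory is known, the command can be assembled with an absolute script path, as in this sketch. `build_audit_command` is a hypothetical helper; it uses only flags that appear in this document and assumes every flag other than `--skills-root` may be omitted.

```python
import sys
from pathlib import Path


def build_audit_command(skill_dir, skills_root, usage_file=None,
                        markdown_out=None, json_out=None):
    """Build the argv list for the audit script (hypothetical helper).

    The script path is resolved to an absolute path so the command
    works regardless of the current working directory.
    """
    script = (Path(skill_dir) / "scripts" / "skill_usefulness_audit.py").resolve()
    cmd = [sys.executable, str(script), "audit", "--skills-root", str(skills_root)]
    optional = {
        "--usage-file": usage_file,
        "--markdown-out": markdown_out,
        "--json-out": json_out,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a shell string avoids quoting problems with paths that contain spaces.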
Input contracts:
--usage-file: JSON, JSONL, CSV, or TSV with per-skill usage evidence.
--history-file: raw transcript export used only when direct usage counts are weak or missing. Mentions become history_mentions / suspected_invocations, not direct calls.
--ablation-file: normalized JSON or JSONL with skill-on versus skill-off case results.
--community-file: optional offline JSON, JSONL, CSV, or TSV registry metrics.
--ablation-plan-out: optional JSON plan that estimates model cost and narrows ablation to high-value candidates.
--ablation-baseline-cases, --ablation-initial-cases, --ablation-expand-cases, --ablation-max-cases: optional case-count overrides for the ablation plan.
Run without extra files only when you need a structure-only audit.
Usage, community, and ablation evidence become lower-confidence in that mode.
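To illustrate the four accepted input formats, here is a small standard-library loader. It is a sketch under stated assumptions: `load_records` is a hypothetical name, the format is chosen from the file extension, and the audit script's own parser may behave differently.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load per-skill records from a JSON, JSONL, CSV, or TSV file.

    Hypothetical helper. CSV and TSV values come back as strings;
    JSON and JSONL values keep their JSON types.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    text = path.read_text(encoding="utf-8")
    if suffix == ".json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if suffix == ".jsonl":
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if suffix in (".csv", ".tsv"):
        delimiter = "\t" if suffix == ".tsv" else ","
        return list(csv.DictReader(text.splitlines(), delimiter=delimiter))
    raise ValueError(f"unsupported file type: {suffix}")
```

Because CSV and TSV carry no types, numeric fields such as `calls` need an explicit `int(...)` conversion before scoring.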
Always return these tables:
Full ranking: rank, skill, source, kind, calls, recent_30d, usage, uniqueness, impact, community, confidence, risk, local, burden, final, verdict, action, basis
Recommended actions: skill, local, burden, final, confidence, risk, action, advice
Delete candidates: skill, local, burden, final, kind, action, trigger, advice
Always include these JSON fields:
report_mode: strong-evidence, partial-evidence, or structure-only.
score_breakdown: per-skill usage, uniqueness, impact, community, static risk, quality, and confidence details.
quality_penalty: 0.0-2.0 deduction from local_score.
quality_penalty_uncapped: raw quality burden before the 2.0 cap.
quality_evidence: concrete burden flags and evidence.
community_breakdown: registry signal components when community data is present.
ablation_plan: cost-efficient plan with candidate skills, model-cost estimates, stop rules, and expected accuracy impact.
action_advice: plain-language recommendation for the user.
risk_review: concise human review guidance for any static risk flags.
Keep deletion advice conservative for system or host-core skills.
Recommend narrowing or merging before deletion when two high-overlap skills still serve distinct host integrations.
Treat delete, merge-delete, and quarantine-review as manual-review recommendations only; never remove or isolate a skill automatically from this report.
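The relationship between quality_penalty and quality_penalty_uncapped, together with the manual-review rule, can be sketched as below. The function names, the `local_after_penalty` key, and the floor at zero are assumptions; only the 2.0 cap and the three manual-review actions come from the text.

```python
# Actions the text says must never be applied automatically.
MANUAL_REVIEW_ACTIONS = {"delete", "merge-delete", "quarantine-review"}


def apply_quality_penalty(local_score, raw_burden, cap=2.0):
    """Cap the raw quality burden and deduct it from local_score.

    Hypothetical helper: flooring the result at 0.0 is an assumption.
    """
    penalty = min(max(raw_burden, 0.0), cap)
    return {
        "quality_penalty_uncapped": raw_burden,
        "quality_penalty": penalty,
        "local_after_penalty": max(local_score - penalty, 0.0),
    }


def requires_manual_review(action):
    """True when the recommended action must be confirmed by a human."""
    return action in MANUAL_REVIEW_ACTIONS
```

Reporting both the capped and uncapped values lets a reviewer see when a skill's burden far exceeds the cap even though the deduction itself stays bounded.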
scripts/skill_usefulness_audit.py: compatibility wrapper for the modular audit package.
scripts/skill_usefulness_audit_lib/: collect metadata, score skills, scan static risk hints, and render Markdown/JSON tables.
references/scoring-rubric.md: 10-point scoring rules, confidence logic, community prior, and action thresholds.
references/ablation-protocol.md: normalized replay method for historical conversation tests.