

A file from the Skill Doctor skill package.


evals/evals.json

{
  "skill_name": "skill-doctor",
  "evals": [
    {
      "id": 0,
      "name": "audit-security-and-conflicts",
      "prompt": "I've collected a few skills in /sessions/serene-dreamy-ritchie/skill-evals/fixtures/installed-skills and I'm a little nervous about them. Can you check whether any are a security risk and whether any of them overlap or step on each other?",
      "expected_output": "Runs skill_doctor.py audit on the fixture dir. Flags quick-deploy as high-risk (curl-pipe-bash, hard-coded ghp_ token, env-var exfiltration), and identifies the meeting-notes-pro vs meeting-recap trigger conflict. Leads with the security risk, shows file/line evidence, and frames flags as 'worth a look' rather than proof of malice.",
      "files": [],
      "assertions": [
        {"text": "Names quick-deploy as the security-risk skill", "kind": "qualitative"},
        {"text": "Identifies at least one concrete red flag (curl|bash, hard-coded token, or env-var exfiltration)", "kind": "qualitative"},
        {"text": "Identifies the meeting-notes-pro vs meeting-recap trigger overlap", "kind": "qualitative"},
        {"text": "Runs the bundled skill_doctor.py tool rather than hand-rolling the analysis", "kind": "qualitative"}
      ]
    },
    {
      "id": 1,
      "name": "which-fires-debug-triggering",
      "prompt": "When I ask my agent to 'summarize this meeting' it sometimes gives me the wrong format — I think two of my skills are fighting over it. The skills are in /sessions/serene-dreamy-ritchie/skill-evals/fixtures/installed-skills. Which one actually fires for that, and what should I do about it?",
      "expected_output": "Runs skill_doctor.py which \"summarize this meeting\". Reports that both meeting-notes-pro and meeting-recap match, that the choice between them is ambiguous, and recommends tightening one skill's description so they stop competing.",
      "files": [],
      "assertions": [
        {"text": "Identifies both meeting-notes-pro and meeting-recap as competing for the prompt", "kind": "qualitative"},
        {"text": "Flags the match as ambiguous / a near-tie", "kind": "qualitative"},
        {"text": "Recommends disambiguating or tightening one skill's description", "kind": "qualitative"}
      ]
    }
  ]
}
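The file has a simple layout: a `skill_name` plus a list of `evals`, each with an `id`, `name`, `prompt`, `expected_output`, `files`, and a list of `assertions` carrying `text` and `kind`. As a rough sketch only (assuming Python 3.9+; `validate_evals` and `REQUIRED_EVAL_KEYS` are hypothetical names, not part of Skill Doctor or its tooling), a harness could sanity-check that layout before running any evals. The embedded sample mirrors eval 1 with the long prompt and expected output elided.

```python
import json

# Hypothetical sample: eval 1 from evals.json with long strings elided.
# A real harness would instead read the file, e.g. open("evals/evals.json").
EVALS_JSON = """
{
  "skill_name": "skill-doctor",
  "evals": [
    {
      "id": 1,
      "name": "which-fires-debug-triggering",
      "prompt": "(prompt elided)",
      "expected_output": "(expected output elided)",
      "files": [],
      "assertions": [
        {"text": "Flags the match as ambiguous / a near-tie", "kind": "qualitative"}
      ]
    }
  ]
}
"""

# Keys every eval entry in the file above carries (assumed to be required).
REQUIRED_EVAL_KEYS = {"id", "name", "prompt", "expected_output", "files", "assertions"}


def validate_evals(doc: dict) -> list[str]:
    """Return schema problems found in doc; an empty list means it looks well-formed."""
    problems = []
    if not isinstance(doc.get("skill_name"), str):
        problems.append("missing or non-string 'skill_name'")
    seen_ids = set()
    for i, ev in enumerate(doc.get("evals", [])):
        missing = REQUIRED_EVAL_KEYS - set(ev)
        if missing:
            problems.append(f"eval #{i}: missing keys {sorted(missing)}")
        # Eval ids should be unique so results can be matched back to entries.
        if ev.get("id") in seen_ids:
            problems.append(f"eval #{i}: duplicate id {ev.get('id')}")
        seen_ids.add(ev.get("id"))
        for j, assertion in enumerate(ev.get("assertions", [])):
            if not {"text", "kind"} <= set(assertion):
                problems.append(f"eval #{i} assertion #{j}: needs 'text' and 'kind'")
    return problems


doc = json.loads(EVALS_JSON)
print(validate_evals(doc))  # prints [] for the well-formed sample
```

Checking structure up front keeps a malformed entry (say, an assertion missing its `kind`) from surfacing only halfway through a grading run.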