AI AGENT SKILLS

Html2md

An agent skill for Automation scenarios. Original description: Convert HTML pages to clean, agent-friendly markdown using Readability + Turndown. Strips navigation, ads, footers, cookie banners, social CTAs. Supports URL...


SKILL.md

name: html2md
description: Convert HTML pages to clean, agent-friendly markdown using Readability + Turndown. Strips navigation, ads, footers, cookie banners, social CTAs. Supports URL fetch, local files, stdin, token budgeting, and output flags. Ideal for research tasks, content extraction, and web scraping in agent workflows.

html2md

Aggressive HTML-to-markdown converter for AI agents. Mozilla Readability isolates main content, Turndown converts to markdown, then heavy post-processing strips remaining noise.
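As a rough illustration of the final post-processing stage, the sketch below cleans a markdown string with a few regular expressions. The function name `postProcess` and the exact patterns are assumptions made for illustration; they are not taken from html2md's source, and the real tool handles more cases (social CTAs, breadcrumbs).

```javascript
// Hypothetical post-processing pass; not html2md's actual implementation.
function postProcess(markdown) {
  return markdown
    .replace(/<!--[\s\S]*?-->/g, "")             // drop HTML comments
    .replace(/[\u200B-\u200D\u2060\uFEFF]/g, "") // drop zero-width characters
    .replace(/^#{1,6}[ \t]*$/gm, "")             // drop headings with no text
    .replace(/\n{3,}/g, "\n\n")                  // collapse runs of blank lines
    .trim();
}

const sample = "# Title\n<!-- ad slot -->\n\n\n\nBody\u200B text\n\n##\n\nEnd";
console.log(postProcess(sample));
```

The order matters in this sketch: blank-line collapsing runs last so it also cleans up the gaps left by the earlier removals.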

Full flag reference and advanced examples: references/usage.md

Setup

cd <skill-dir>/scripts
npm install
npm link        # makes `html2md` globally available

Requires Node.js 22+.

Quick Start

html2md https://example.com                    # fetch + convert
html2md --file page.html                       # local HTML file
cat page.html | html2md --stdin                # pipe from stdin
html2md --max-tokens 2000 https://example.com  # budget-aware truncation
html2md --no-links https://example.com         # strip hrefs, keep text
html2md --json https://example.com             # JSON: {title, url, markdown, tokens}

Key Features

Readability extraction: removes navbars, sidebars, ads, and cookie banners. Falls back to a cleaned <body> when Readability returns too little (e.g. HN's table layout).
Token budgeting: --max-tokens N keeps all headings, fills the remaining budget in document order, and appends [truncated — N more tokens]. Uses a heuristic of 1 token ≈ 4 chars.
Post-processing: strips HTML comments, zero-width chars, social CTAs, breadcrumbs, and empty headings, and collapses excess blank lines.
Error handling: bad URLs, timeouts (15s), non-HTML content, and missing files all exit with code 1 and a descriptive message on stderr.
Output modes: plain markdown, or --json for programmatic use.
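The token-budgeting behaviour described above can be sketched roughly as follows. This is a minimal illustration, not html2md's actual code: the names `estimateTokens` and `truncateToBudget` are hypothetical, blocks are assumed to be separated by blank lines, and the sketch assumes that once one body block overflows the budget, every later body block is dropped as well.

```javascript
// Hypothetical sketch of heading-preserving truncation; not html2md's source.
const estimateTokens = (text) => Math.ceil(text.length / 4); // 1 token ≈ 4 chars

function truncateToBudget(markdown, maxTokens) {
  const blocks = markdown.split(/\n{2,}/); // headings and paragraphs
  const isHeading = (block) => /^#{1,6}\s/.test(block);

  // Headings are always kept, so charge them against the budget first.
  let remaining = maxTokens;
  for (const block of blocks) {
    if (isHeading(block)) remaining -= estimateTokens(block);
  }

  const kept = [];
  let dropped = 0;
  let full = false; // assumption: stop filling after the first overflow
  for (const block of blocks) {
    const cost = estimateTokens(block);
    if (isHeading(block)) {
      kept.push(block);
    } else if (!full && cost <= remaining) {
      kept.push(block);
      remaining -= cost;
    } else {
      full = true;
      dropped += cost;
    }
  }
  if (dropped > 0) kept.push(`[truncated — ${dropped} more tokens]`);
  return kept.join("\n\n");
}

const doc = "# Intro\n\n" + "a".repeat(40) + "\n\n## Details\n\n" + "b".repeat(400);
console.log(truncateToBudget(doc, 20));
```

Charging headings up front keeps the document outline intact even when almost all body text is cut, which is usually what an agent needs to decide whether to fetch more.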

When to Use vs `web_fetch`

| Use html2md when | Use web_fetch when |
|-------------------|---------------------|
| Reading pages in cron jobs / sub-agents | Quick one-off fetch in main session |
| Token budget matters (--max-tokens) | Page is a JSON/XML API endpoint |
| Heavy nav/ads/footers to strip | JS rendering not needed |
| Need JSON output | Simple pages |

Security Considerations

html2md fetches URLs and reads local files — that's its job. If you're passing untrusted input:

URL fetching: the tool will fetch whatever URL it's given. Don't pass user-controlled URLs without validation if your threat model includes SSRF.
File reading: --file reads any path the process can access. In agent workflows, the agent controls the path — this is equivalent to the agent using cat.
No shell execution: the tool itself never spawns shells or runs commands. When calling from scripts, use execFileSync (not execSync) to avoid shell injection.
No data exfiltration: output goes to stdout only. No network requests beyond the single URL fetch. No telemetry, no analytics, no phone-home.
Dependencies: jsdom (pure-JavaScript DOM implementation), Readability (Mozilla content extractor), Turndown (HTML→markdown). All are widely used open-source libraries.
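A minimal sketch of the two precautions above when calling the tool from a Node.js script: validate the URL against an allow-list, then invoke `html2md` with `execFileSync` so no shell parses the input. The helper names `parseAllowedUrl` and `fetchMarkdown` and the allow-list approach are assumptions for illustration, and the sketch assumes `html2md` is on the PATH after `npm link`.

```javascript
const { execFileSync } = require("node:child_process");

// Hypothetical allow-list check; adapt it to your own threat model.
// Returns a parsed URL when the input is acceptable, otherwise null.
function parseAllowedUrl(input, allowedHosts) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  return allowedHosts.includes(url.hostname) ? url : null;
}

function fetchMarkdown(input, allowedHosts) {
  const url = parseAllowedUrl(input, allowedHosts);
  if (url === null) throw new Error(`Refusing to fetch: ${input}`);
  // Arguments are passed as an array, so no shell ever interprets them.
  return execFileSync("html2md", ["--max-tokens", "2000", url.href], {
    encoding: "utf8",
  });
}
```

Passing the normalized `url.href` rather than the raw string also prevents an input such as `--file` from being read as a flag. A hostname allow-list alone does not cover redirects or DNS rebinding, so treat it as one layer rather than a complete SSRF defence.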

Examples

# Read a Paul Graham essay within 2000 tokens
html2md --max-tokens 2000 https://paulgraham.com/greatwork.html

# HN front page as clean text, no link noise
html2md --no-links --no-images https://news.ycombinator.com

# Get token count before committing
html2md --json https://example.com | jq .tokens

# Pipe to file
html2md https://docs.example.com/api > api-docs.md
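The token-count check can also be done from a Node.js script. The sketch below assumes the `--json` output has the `{title, url, markdown, tokens}` shape shown in Quick Start and that `tokens` is a number; `parseResult`, `markdownWithinBudget`, and `readPage` are hypothetical helper names, not part of html2md.

```javascript
const { execFileSync } = require("node:child_process");

// Parse --json output; field names follow the shape shown in Quick Start.
function parseResult(stdout) {
  const { title, url, markdown, tokens } = JSON.parse(stdout);
  return { title, url, markdown, tokens };
}

// Hypothetical helper: return the markdown only when it fits the budget.
function markdownWithinBudget(result, budget) {
  return result.tokens <= budget ? result.markdown : null;
}

// Assumes `html2md` is on the PATH (see Setup).
function readPage(url, budget) {
  const stdout = execFileSync("html2md", ["--json", url], { encoding: "utf8" });
  return markdownWithinBudget(parseResult(stdout), budget);
}
```

A `null` return lets the caller decide whether to retry with `--max-tokens` or skip the page.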

Automation

Self-Improving Agent

An agent skill for Automation scenarios. Original description: Captures learnings, errors, and corrections to enable continuous improvement. Use when: (1) A command or operation fails unexpectedly, (2) User corrects Clau...

Automation · Low risk

Self-Improving + Proactive Agent

An agent skill for Automation scenarios. Original description: Self-reflection + Self-criticism + Self-learning + Self-organizing memory. Agent evaluates its own work, catches mistakes, and improves permanently. Use when...

Automation · Unknown

Proactive Agent

An agent skill for Automation scenarios. Original description: Transform AI agents from task-followers into proactive partners that anticipate needs and continuously improve. Now with WAL Protocol, Working Buffer, Autonomous Crons, and battle-tested patterns. Part of the Hal Stack 🦞

Automation · Unknown

ontology

An agent skill for Automation scenarios. Original description: Typed knowledge graph for structured agent memory and composable skills. Use when creating/querying entities (Person, Project, Task, Event, Document), linkin...

Automation · Low risk

Skill Creator

An agent skill for Automation scenarios. Original description: Guide for creating effective skills. This skill should be used when users want to create a new skill (or update an existing skill) that extends Claude's capabilities with specialized knowledge, workflows, or tool integrations.

Automation · Unknown

Desktop Control

An agent skill for Automation scenarios. Original description: Advanced desktop automation with mouse, keyboard, and screen control
