AI AGENT SKILLS

Web to Markdown

An agent skill for research scenarios. Original description: Extracts readable markdown from user-provided URLs via a deterministic fallback chain (markdown.new → r.jina.ai). Use when the user supplies specific URLs an...


SKILL.md

name: web-to-md
description: "Extracts readable markdown from user-provided URLs via a deterministic fallback chain (markdown.new → r.jina.ai). Use when the user supplies specific URLs and wants reliable extraction, summarization, or analysis."
version: 1.0.0
author: chdlc
license: MIT-0
metadata:
  openclaw:
    requires:
      bins: ["curl"]
  hermes:
    tags: [web, extraction, markdown, url, content]
    related_skills: [use-tinyfish, browser-automation]
    category: utility

Web to Markdown

Deterministic, console-first extraction workflow for user-provided URLs. Enforces a fixed fallback chain to maximize content quality without open-ended browsing.

When to Use

The user provides one or more specific URLs.
The task requires reading, extracting, summarizing, or analyzing those URLs.
A deterministic fallback order is preferred over open-ended browsing.

Do not use for open-ended web discovery unless the user explicitly asks for discovery first.

Fallback Chain

For each URL, attempt in order. Stop at the first sufficient result.

1. markdown.new (AI mode)

curl -s "https://markdown.new/{URL}?method=ai"

2. markdown.new (Auto mode)

Only if step 1 is insufficient or timed out:

curl -s "https://markdown.new/{URL}?method=auto"

3. r.jina.ai (Browser engine)

Only if steps 1–2 are insufficient or timed out:

curl -s "https://r.jina.ai/{URL}" -H "X-Engine: browser"

4. Agent tools (last resort)

If all three attempts fail, report the failure and fall back to the agent's own extraction tools. This fallback is outside the skill's chain, so acknowledge it as such in the response.
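The chain above can be sketched as a small shell function. This is a minimal sketch, not the skill's own implementation: `fetch_step`, `extract_url`, `MIN_CHARS`, and `MAX_TIME` are hypothetical names, the plain character count is only a stand-in for the full quality gate, and the timeout value is an assumption. The endpoints and the `X-Engine: browser` header are taken from the steps above.

```shell
# Assumed stand-ins: a character threshold in place of the full quality gate,
# and a per-request timeout in seconds.
MIN_CHARS="${MIN_CHARS:-1200}"
MAX_TIME="${MAX_TIME:-60}"

# Run one step of the chain ($1 = step label, $2 = target URL) and print the body.
fetch_step() {
  case "$1" in
    markdown.new:ai)   curl -s --max-time "$MAX_TIME" "https://markdown.new/$2?method=ai" ;;
    markdown.new:auto) curl -s --max-time "$MAX_TIME" "https://markdown.new/$2?method=auto" ;;
    r.jina.ai)         curl -s --max-time "$MAX_TIME" "https://r.jina.ai/$2" -H "X-Engine: browser" ;;
  esac
}

# Try each step in order and stop at the first result that passes the size check.
# Prints a "source: <label>" line followed by the body; returns 1 if every step fails.
extract_url() {
  local url="$1" step body
  for step in markdown.new:ai markdown.new:auto r.jina.ai; do
    body="$(fetch_step "$step" "$url")" || body=""
    if [ "${#body}" -ge "$MIN_CHARS" ]; then
      printf 'source: %s\n%s\n' "$step" "$body"
      return 0
    fi
    # State why the transition happened, on stderr so it stays out of the content.
    printf 'step %s insufficient (%d chars), trying next\n' "$step" "${#body}" >&2
  done
  printf 'source: agent-tools\n'
  return 1
}
```

A call such as `extract_url "https://example.com/post"` would print the provenance label on the first line. A non-zero return signals that the agent's own tools are needed.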

Quality Gate

After each step, content is insufficient when any condition is true:

Main article or body text is missing
Content is clearly truncated
Output is mostly navigation, boilerplate, placeholders, or login walls
Useful text is too short for the task
Important sections requested by the user are absent

Rule of thumb: An article page that yields fewer than ~1,200 useful characters is almost certainly truncated. Naturally short pages (announcements, status updates) may be legitimately brief, so use judgment.
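One way to approximate "useful characters" mechanically is to ignore blank lines and lines that contain nothing but a single markdown link or image, which are typical of navigation. The sketch below assumes that definition; `useful_chars` and `is_sufficient` are hypothetical helpers, and the check covers only the length heuristic, not the other conditions listed above.

```shell
# Count non-whitespace characters, skipping blank lines and link-only lines.
useful_chars() {
  printf '%s\n' "$1" | awk '
    /^[ \t]*$/ { next }                                        # blank line
    /^[ \t]*([-*+][ \t]+)?!?\[[^]]*\]\([^)]*\)[ \t]*$/ { next } # link-only or image-only line
    { gsub(/[ \t]/, ""); n += length($0) }
    END { print n + 0 }'
}

# Succeed when the text has at least $2 useful characters (default 1200).
is_sufficient() {
  [ "$(useful_chars "$1")" -ge "${2:-1200}" ]
}
```

The threshold is passed as an argument so that naturally short pages can be accepted with a lower value when the task allows it.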

URL Handling

Preserve the protocol when present.
Ensure the URL is shell-safe and quoted in all curl commands.
Process each URL independently when multiple are provided.
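These rules could be enforced before any request is made. The following sketch assumes a conservative definition of "shell-safe" (no whitespace, quotes, backticks, or control characters); `is_plain_url` and `for_each_url` are hypothetical names, and the `extract` line only marks where the per-URL extraction would run.

```shell
# Accept only URLs free of whitespace, quotes, backticks, and control characters.
is_plain_url() {
  case "$1" in
    ''|*[[:space:]]*|*[[:cntrl:]]*|*\"*|*\'*|*\`*) return 1 ;;
  esac
  return 0
}

# Handle every argument on its own; a rejected URL does not stop the others.
for_each_url() {
  local url
  for url in "$@"; do
    if is_plain_url "$url"; then
      printf 'extract %s\n' "$url"   # the URL is passed through unchanged, protocol included
    else
      printf 'skipped unsafe URL\n' >&2
    fi
  done
}
```

Keeping the variable double-quoted at every use is what prevents characters such as `&` and `?` from being interpreted by the shell.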

Provenance Reporting

Report exactly one final source label per extracted URL in your response:

| Label | When |
|---|---|
| markdown.new:ai | method=ai was sufficient |
| markdown.new:auto | method=auto was sufficient (ai failed) |
| r.jina.ai | r.jina.ai was sufficient (both markdown.new failed) |
| agent-tools | All three attempts failed; agent used its own tools |

Workflow

Scope gate — Only process URLs explicitly provided by the user. If discovery is needed, use web search first and confirm candidate URLs before extraction.
Normalize — Quote URLs, preserve protocol.
Extract — Run the fallback chain per URL.
Quality gate — Check each result against the insufficiency conditions.
Continue — Use the richest sufficient source for the task.
Report — Include provenance labels in the final response.

Best Practices

Keep extraction deterministic: make each fallback transition explicit and state why it happened.
Prefer reproducible commands with quoted URLs.
Handle timeouts conservatively: when a step is blocked or times out, continue immediately to the next fallback.
Preserve source traceability via provenance labels.
Avoid tool-specific assumptions beyond curl and standard HTTP endpoints.

Edge Cases

Page blocks automated access: Skip to next fallback immediately.
Multiple URLs: Apply the same sequence to each independently.
Naturally short pages: Accept shorter content when it satisfies the request.
All attempts fail: Report failure clearly, then use agent tools as last resort.

Common Pitfalls

Output format must be markdown. If any level returns raw HTML or another format, it breaks the contract. Test each level independently.
Don't skip testing lower fallback levels just because the top level works. A chain is only as reliable as its weakest link.
Quality is subjective — the 1,200-char heuristic is a guideline, not a hard rule. Apply judgment for short-form content.
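The first pitfall can be caught with a cheap check on the start of each response. This is a rough heuristic under the assumption that an HTML document announces itself within its first lines; `looks_like_html` is a hypothetical helper, and a markdown page that merely quotes HTML tags near the top would be misclassified.

```shell
# Succeed when the first 20 lines contain typical HTML document markers.
looks_like_html() {
  printf '%s\n' "$1" | head -n 20 | grep -Eiq '<!doctype html|<html[ >]|<body[ >]'
}
```

A response that trips this check can be treated as insufficient, so the chain moves on to the next step.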

Verification Checklist

[ ] curl is installed (which curl)
[ ] Extraction starts with markdown.new?method=ai
[ ] method=auto is tried only after ai fails
[ ] r.jina.ai is tried only after both markdown.new attempts fail
[ ] All three attempts failing → report + fall back to agent tools
[ ] Quality checks include: missing body, truncation, boilerplate, too-short content
[ ] Final response includes provenance label per URL
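The first checklist item can be automated with a preflight check. The sketch below uses the POSIX `command -v` builtin instead of `which`; `require_bin` is a hypothetical helper name.

```shell
# Succeed when the named binary is on PATH; otherwise print a message and fail.
require_bin() {
  command -v "$1" >/dev/null 2>&1 || {
    printf 'missing required binary: %s\n' "$1" >&2
    return 1
  }
}
```

Running `require_bin curl || exit 1` at the top of a script stops it early when curl is absent.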

Applicable scenario: Research
Category: Research
Risk level: Low risk
Risk tags: may need API key, network access
Dependencies: curl
Files: 1 (SKILL.md, 4,844 B)

Research

Multi Search Engine

An agent skill for research scenarios. Original description: Multi search engine integration with 16 engines (7 CN + 9 Global). Supports advanced search operators, time filters, site search, privacy engines, and Wolfra...

Research, low risk

Polymarket

An agent skill for research scenarios. Original description: Query Polymarket prediction markets. Check odds, find trending markets, search events, track price movements.

Research, low risk

Baidu web search

An agent skill for research scenarios. Original description: Search the web using Baidu AI Search Engine (BDSE). Use for live information, documentation, or research topics.

Research, low risk

Clawdbot Documentation Expert

An agent skill for research scenarios. Original description: Clawdbot documentation expert with decision tree navigation, search scripts, doc fetching, version tracking, and config snippets for all Clawdbot features

Research, low risk

Find Skills Skill

An agent skill for research scenarios. Original description: Search and discover OpenClaw skills from various sources. Use when: user wants to find available skills, search for specific functionality, or discover new s...

Research, low risk

Memory Setup

An agent skill for research scenarios. Original description: Enable and configure Moltbot/Clawdbot memory search for persistent context. Use when setting up memory, fixing "goldfish brain," or helping users configure memorySearch in their config. Covers MEMORY.md, daily logs, and vector search setup.
