Weather
一个面向 Data & APIs 场景的 Agent 技能。原始说明:Get current weather and forecasts (no API key required).
一个面向 Data & APIs 场景的 Agent 技能。原始说明:Scrape and extract structured data from creator profiles across platforms such as YouTube, Instagram, TikTok, Twitter/X, LinkedIn, Twitch, and personal websi...
name: scrape-creator-profile
version: 1.0.0
description: >
Scrape and extract structured data from creator profiles across platforms
such as YouTube, Instagram, TikTok, Twitter/X, LinkedIn, Twitch, and
personal websites. Use this skill whenever the user asks to look up, fetch,
analyze, or collect data about a content creator, influencer, or public
profile — even if they don't say "scrape". Triggers: "get info on this
creator", "pull their profile", "what's their follower count", "scrape
this creator page", "summarize this influencer", "fetch creator stats",
"analyze this YouTube channel", "get TikTok profile data".
metadata:
openclaw:
requires:
bins:
env: []
envVars:
required: false
description: >
Optional Apify API token for enhanced scraping via Apify Actors.
Without this, the skill falls back to web_fetch + browser mode.
required: false
description: >
Milliseconds to wait between requests (default: 1500). Increase if
hitting rate limits.
allowed-tools:
Extract structured profile data from a content creator's public page.
Returns a normalized JSON object regardless of platform.
Use this skill for any request that involves:
If the user provides a URL, jump straight to Step 2. If they only give a
name or handle, start at Step 1.
If the user gave a username/handle without a URL:
hints), then ask the user to confirm.
| Platform | URL Pattern | Example |
|--------------|------------------------------------------|--------------------------------------------|
| YouTube | https://www.youtube.com/@{handle} | https://www.youtube.com/@mkbhd |
| Instagram | https://www.instagram.com/{handle}/ | https://www.instagram.com/natgeo/ |
| TikTok | https://www.tiktok.com/@{handle} | https://www.tiktok.com/@charlidamelio |
| Twitter / X | https://x.com/{handle} | https://x.com/sama |
| LinkedIn | https://www.linkedin.com/in/{handle}/ | https://www.linkedin.com/in/satyanadella/|
| Twitch | https://www.twitch.tv/{handle} | https://www.twitch.tv/shroud |
| Substack | https://{handle}.substack.com | https://astralcodexten.substack.com |
| GitHub | https://github.com/{handle} | https://github.com/torvalds |
| Patreon | https://www.patreon.com/{handle} | https://www.patreon.com/kurzgesagt |
@ prefix with short handle → likely Twitter/X or Instagramweb_fetch(url)
Check if the response contains useful profile data (bio, follower count, etc.).
If the content is mostly JS placeholders, empty <div>s, or a login wall,
move to 2b.
Platforms that almost always require browser mode:
Browser steps:
counts, bio text).
For persistent blocks, use the Apify actor for that platform. Seereferences/apify-actors.md for actor IDs and call patterns.
Parse the fetched content and populate the following fields. Mark any missing
field as null — do not guess.
{
"platform": "string", // youtube | instagram | tiktok | twitter | linkedin | twitch | substack | github | patreon | other
"handle": "string", // @-prefixed username
"display_name": "string", // Full display name
"verified": true | false,
"bio": "string", // Profile description / about text
"profile_url": "string", // Canonical URL used to scrape
"avatar_url": "string | null",
"external_links": ["string"], // Any links in bio or link-in-bio
"stats": {
"followers": "number | null",
"following": "number | null",
"subscribers": "number | null",
"total_views": "number | null",
"total_posts": "number | null",
"monthly_listeners": "number | null", // Spotify-style, if applicable
"engagement_rate": "number | null" // Percentage, if computable
},
"recent_content": [
{
"title": "string | null",
"url": "string",
"published_at": "ISO 8601 string | null",
"views": "number | null",
"likes": "number | null",
"comments": "number | null"
}
// Up to 5 most recent items
],
"contact_info": {
"email": "string | null", // Only if publicly listed in bio/links
"website": "string | null"
},
"scraped_at": "ISO 8601 UTC timestamp"
}
Privacy rule: Only capture fields that are explicitly public on the
profile page. Do not infer, deduce, or cross-reference private information.
Do not store or relay phone numbers even if visible.
See references/platform-selectors.md for CSS selectors and JSON-LD paths
per platform. Quick reference:
#subscriber-count or meta itemprop=interactionCount; description in #description-inner[data-testid="UserProfileHeader_Items"]; bio in [data-testid="UserDescription"]window._sharedData or <script type="application/ld+json"><script id="__UNIVERSAL_DATA_FOR_REHYDRATION__">itemprop attributes: name, description, follows, worksForConvert abbreviated counts to integers before storing:
"12.3K" → 12300"4.5M" → 4500000"1B" → 1000000000"1,234" → 1234Use the helper script:
python3 ~/.openclaw/workspace/skills/scrape-creator-profile/scripts/normalize_count.py "12.3K"
Present a clean summary in chat:
**[Display Name]** (@handle) · Platform
✅ Verified | 👥 X followers | 📝 Y posts
Bio: "..."
Top links: url1, url2
Recent content:
1. "Video Title" — X views (date)
2. ...
[Full JSON available on request]
Return or save the full normalized JSON object from Step 3.
To save to disk:
python3 ~/.openclaw/workspace/skills/scrape-creator-profile/scripts/save_profile.py \
--data '<json>' \
--output ~/creator-profiles/{handle}_{platform}.json
If the user supplies multiple handles or URLs (comma-separated, line-separated,
or a list):
CREATOR_SCRAPE_DELAY_MS betweenrequests, default 1500 ms).
python3 ~/.openclaw/workspace/skills/scrape-creator-profile/scripts/compare_profiles.py \
--profiles '<json array>' \
--format table # or csv
| Situation | Action |
|-----------|--------|
| Login wall / auth required | Report which fields were blocked; return partial data |
| Rate limited (429) | Wait CREATOR_SCRAPE_DELAY_MS × 3, retry once, then report |
| Profile not found (404) | Inform the user; suggest alternate handle spellings |
| JavaScript-only page, no browser | Suggest enabling browser mode in OpenClaw settings |
| Ambiguous handle across platforms | Ask user to confirm platform before scraping |
technically accessible.
email, phone numbers).
robots.txt disallow rules unless the user explicitly overrides andaccepts responsibility.
targeted harassment infrastructure.
references/platform-selectors.md — CSS selectors and JSON-LD paths per platformreferences/apify-actors.md — Apify actor IDs and call patterns for fallback scrapingscripts/normalize_count.py — Converts "12.3K" → 12300scripts/save_profile.py — Saves profile JSON to diskscripts/compare_profiles.py — Builds comparison table or CSV from multiple profiles