AI AGENT SKILLS

Visla AI Video Creation

一个面向 Research 场景的 Agent 技能。原始说明:Creates AI-generated videos from text scripts, URLs, or PPT/PDF documents using Visla. Use when the user asks to generate a video, turn a webpage into a vide...

SKILL.md

SKILL.md


name: visla
description: Creates AI-generated videos from text scripts, URLs, or PPT/PDF documents using Visla. Use when the user asks to generate a video, turn a webpage into a video, or convert a PPT/PDF into a video, or when the user asks to check Visla account credits/balance.
argument-hint: <script|url|doc|account|avatar|voice> [script|URL|file]
metadata:
clawdbot:
emoji: ""
requires:
env: [ "VISLAAPIKEY", "VISLAAPISECRET" ]
primaryEnv: "VISLAAPIKEY"
files: [ "scripts/*" ]


Visla Video Generation

Version: 260501-1423

Create AI-generated videos from text scripts, web URLs, or documents (PPT/PDF) using Visla's OpenAPI.

Before You Start

Credentials (NEVER output API keys/secrets in responses):

IMPORTANT: Never output API keys/secrets in responses.

  1. Check if ~/.config/visla/.credentials exists (do NOT read it yet).
  2. If the file exists, use a choice-based confirmation to ask the user:

"Found saved credentials. Allow reading ~/.config/visla/.credentials?"
Options: Allow / No

  1. If the user selects Allow: proceed with the command.
  2. If the user selects No, or the file does not exist:

Ask the user to provide credentials via one of:

  • Environment variables (VISLA_API_KEY, VISLA_API_SECRET)
  • CLI arguments (--key, --secret)
  • Direct input of API key and secret
  1. If provided credentials fail with VISLA_CLI_ERROR_CODE=missing_credentials or

VISLA_CLI_ERROR_CODE=auth_failed, ask the user to re-enter valid credentials.

Only process local files (scripts/docs) explicitly provided by the user, and remind users to avoid uploading sensitive
data.

  • Tell the user: this is a one-time setup (once configured, they won't need to do this again)
  • Tell the user: get API Key and Secret from https://www.visla.us/visla-api
  • Do not repeat the secrets back in the response.

Credential validity check (practical):

  • If credentials exist but running account fails with VISLA_CLI_ERROR_CODE=missing_credentials or

VISLA_CLI_ERROR_CODE=auth_failed, treat credentials as invalid and ask the user to provide real ones.

File format (bash/zsh):

export VISLA_API_KEY="your_key"
export VISLA_API_SECRET="your_secret"

For PowerShell (temporary session):

$env:VISLA_API_KEY = "your_key"
$env:VISLA_API_SECRET = "your_secret"

Scripts: scripts/visla_cli.py (Python), scripts/visla_cli.sh (Bash)

Platform Execution

Default strategy:

  • Prefer Bash on macOS when dependencies are available (the Bash CLI avoids Python SSL-stack issues on some macOS

setups).

  • Prefer Python when you're already using a well-configured Python (or when Bash dependencies are missing).

Bash (recommended on macOS; also works on Linux-like environments):

# With user consent, you may source ~/.config/visla/.credentials
export VISLA_API_KEY="your_key"
export VISLA_API_SECRET="your_secret"
./scripts/visla_cli.sh <command>

Python (cross-platform):

python3 scripts/visla_cli.py --key "your_key" --secret "your_secret" <command>
# Or, credentials are auto-detected from ~/.config/visla/.credentials (with user consent):
python3 scripts/visla_cli.py <command>

Windows native (PowerShell/CMD without Bash; Python):

# PowerShell
$env:VISLA_API_KEY = "your_key"
$env:VISLA_API_SECRET = "your_secret"
python scripts/visla_cli.py <command>

Windows note:

  • The agent should prefer running the Python CLI on Windows unless it has verified a Bash environment (WSL/Git Bash)

is available.

  • For simple scripts, pass directly: python scripts/visla_cli.py script "Scene 1: ..."
  • For multi-line or complex scripts, use stdin with - (recommended, no temp files):
  @"
  Scene 1: ...
  Scene 2: ...
  "@ | python scripts/visla_cli.py script -
  • If you have Python Launcher installed, py -3 scripts/visla_cli.py <command> may work better than python.
  • Credentials:
  • The Python CLI auto-detects ~/.config/visla/.credentials when present.
  • On Windows the default path is typically: %USERPROFILE%\\.config\\visla\\.credentials.

Note: do not print credentials. Prefer environment variables or auto-detected credentials with explicit user consent.

Commands

| Command | Description |
|-----------------------------------|---------------------------------------------------|
| /visla script <script-or-@file> | Create video from a script (text or a local file) |
| /visla url <URL> | Create video from web page URL |
| /visla doc <file> | Create video from document (PPT/PDF) |
| /visla idea <text-or-@file> | Create video from an idea |
| /visla visual <file> [file ...] | Create video from visual resources (images/videos), supports multiple files |
| /visla speech <file> [file ...] | Create video from speech (audio/video file), supports multiple files |
| /visla account | Show account info and credit balance |
| /visla avatar | List available AI avatars |
| /visla voice | List available AI voices |

Important: For avatar and voice commands:

  • Run the full CLI command (./visla_cli.sh avatar or ./visla_cli.sh voice).
  • You may filter the output before presenting to the user:
  • For avatar: remove Thumbnail: lines
  • For voice: remove URL: lines
  • Categorize and format avatar results as follows:
  • Group avatars by gender category (Female, Male, Neutral, Dynamic)
  • List each avatar name with (n) where n = number of looks
  • For each look, show: Look Name (lookUuid)
  • Format: - AvatarName (n): Look1 (uuid), Look2 (uuid), ...
  • Example:
    **Female (16):**
    - Emma (5): Blue Dress (1000145), Patterned Dress (1000146), Black Blazer (1000147), Light Gray Blazer (1000148), Emerald Green Pantsuit (1000149)
  • Categorize voice results by language/region (e.g., System, US English, Chinese, Japanese, French, etc.)
  • You must NOT omit any items from the list. The user must see all available avatars/voices, even if the list is long.
  • Agents must use the exact ID from the listing when configuring videos.

Optional Parameters

| Parameter | Description |
|-----------------------|-----------------------------------------------------------------|
| -c, --config <file> | Path to JSON config file with video options |
| --avatar <id> | Avatar ID to use for the video (get list from avatar command) |
| --voice <id> | Voice ID to use for the video (get list from voice command) |

visual command specific

| Parameter | Description |
|-----------------------|-----------------------------------------------------------------|
| --script, -s <text>| Script or description text (or @filename) |
| --style <style> | Video style: montage, storytelling (default), explainer |

speech command specific

| Parameter | Description |
|-----------------------|-----------------------------------------------------------------|
| --function <func> | Speech to video function: SPEECH_TO_VIDEO_SUMMARY or SPEECH_TO_VIDEO_FULL_LENGTH |

All other options (aspectratio, pace, burnsubtitles, footageoptions, bgmoptions, etc.) can be set in the config
file.

Cleanup: After video creation completes, delete the config file unless it's intended for reuse.

Config File Format (JSON)

All video options can be stored in a JSON config file (nested structure matches API request body):

{
  "video_title": "My Video",
  "video_description": "Video description",
  "project_function": "SPEECH_TO_VIDEO_SUMMARY",
  "script_text_mode": "ai_rewrite",
  "doc_usage": "page_by_page_walkthrough",
  "speaker_notes_verbatim": false,
  "target_video": {
    "aspect_ratio": "16:9",
    "video_pace": "fast",
    "burn_subtitles": false,
    "video_duration_in_seconds": 60
  },
  "avatar_options": {
    "use_avatar": false,
    "look_id": 12345,
    "avatar_layout": "smart_composition",
    "enable_auto_wallpaper": true,
    "enable_in_preview": true
  },
  "voice_options": {
    "use_voice": false,
    "voice_id": 1
  },
  "footage_options": {
    "enable_footage": true,
    "use_free_stocks": true,
    "use_premium_stocks": true,
    "use_premium_stocks_getty": true,
    "use_private_stocks": true,
    "private_stock_ids": 123456
  },
  "bgm_options": {
    "enable_bgm": true,
    "use_free_stocks": true,
    "use_premium_stocks": true
  }
}

Note: avatar_options.avatar_layout accepts only: host_only, host_pip, smart_composition.

CLI arguments (avatar, voice) override config file values.

Source of truth for the exact CLI surface: run scripts/visla_cli.sh --help or python3 scripts/visla_cli.py --help.

Script Format

**Scene 1** (0-10 sec):
**Visual:** A futuristic calendar flipping to 2025 with digital patterns.
**Narrator:** "AI is evolving rapidly! Here are 3 game-changing AI trends."

**Scene 2** (10-25 sec):
**Visual:** Text: "Trend #1: Generative AI Everywhere." Show tools like ChatGPT.
**Narrator:** "Generative AI is dominating industries—creating content and images."

Workflow

The script, url, doc, idea, visual, and speech commands execute the complete flow automatically:

  1. Create project
  2. Poll until generation completes (may take a few minutes)
  3. Auto-export and return download link

Execution Instructions:

  • Inform user that video generation takes some time
  • Report progress status periodically during polling

Timeout Guidance

  • This workflow typically takes 3-10 minutes, but can take up to ~30 minutes in the worst case. Set the

task/command timeout to >= 30 minutes (Windows defaults are often ~10 minutes and need to be increased). If you
cannot change the timeout, warn the user up front and, on timeout, ask whether to continue or switch to a step-by-step
run.

  • If timeout occurs, the CLI returns project_uuid in the output. Inform the user they can manually check project

status and continue later using the Visla web interface or API.

Examples

/visla script @myscript.txt
/visla script "Scene 1: ..."
/visla url https://blog.example.com/article
/visla doc presentation.pptx
/visla idea "Create a video about machine learning"
/visla idea @my_idea.txt
/visla visual image.jpg
/visla visual photo1.jpg photo2.jpg photo3.jpg
/visla visual image.jpg --script "Description of the images..."
/visla visual image.jpg --style montage
/visla speech interview.m4a
/visla speech podcast.mp3 audio1.mp3 audio2.mp3
/visla speech podcast.mp3 --function SPEECH_TO_VIDEO_SUMMARY
/visla account
/visla avatar
/visla voice

# With config file
/visla script "Scene 1: Hello" -c config.json

# With avatar/voice (CLI overrides config)
/visla script "Scene 1: Hello" --avatar avatar_123 --voice voice_456

Supported Document Formats

  • PowerPoint: .pptx, .ppt
  • PDF: .pdf

Supported Media Formats

Visual Resources (visual command)

  • Images: .jpg, .jpeg, .png, .gif, .webp
  • Videos: .mp4, .mov, .avi, .mkv

Audio/Speech (speech command)

  • Audio: .mp3, .wav, .m4a, .aac, .flac
  • Videos: .mp4, .mov, .avi, .mkv

Output Format

  • Start: Display "Visla Skill v260501-1423" when skill begins
  • End: Display "Visla Skill v260501-1423 completed" when skill finishes

Security

The CLI scripts enforce the following safety measures to prevent unauthorized file access:

  • Path traversal: Paths containing .. are rejected.
  • System directories: Access to /etc/, /proc/, /sys/, /dev/, /run/, /var/log/ (and Windows equivalents) is denied.
  • Text file extension restriction: The @file syntax in script, idea, and visual --script commands only accepts .txt, .md, .srt, .vtt, .csv files.
  • Document/media file validation: The doc, visual, and speech commands validate file extensions against supported formats before upload.
  • Credentials: The Python CLI auto-detects ~/.config/visla/.credentials only. No arbitrary credential file paths are accepted.
  • User consent: The agent must ask for user consent before accessing local files, as specified in the "Before You Start" section.