AI AGENT SKILLS

Photo Screener

一个面向 Dev Tools 场景的 Agent 技能。原始说明:AI-powered photo pre-screening using MobileCLIP2-S0 model. 18x faster than ViT-L/14 with 80% selection consistency (Top-10 overlap 8/10). Use when the user w...

SKILL.md

SKILL.md


name: photo-screener
version: 1.0.0
description: |
AI-powered photo pre-screening using MobileCLIP2-S0 model.
18x faster than ViT-L/14 with 80% selection consistency (Top-10 overlap 8/10).

Use when the user wants to:

  • Filter/screen a large batch of photos before sending to LLM
  • Score photos by aesthetic quality
  • Remove near-duplicate photos (burst shots)
  • Classify photos by scene type
  • Prepare photos for multimodal LLM processing

Triggers: User mentions filtering photos, screening images, aesthetic scoring,
removing duplicates, classifying scenes, or preparing photos for LLM.
Auto-skipped when photo count ≤ user's requested output count OR ≤ 20 (batch_size).
Only triggered when photo count exceeds both thresholds.

Dependencies:
Python: torch, open-clip-torch, pillow, numpy, pillow-heif (optional, for HEIC/HEIF)
Model: MobileCLIP2-S0 (~300MB, downloaded on demand with user confirmation)
Check: bash scripts/setup_deps.sh

Model Download:
The model is NOT pre-downloaded. On first run:

  • Interactive mode: prompts user for confirmation
  • Non-interactive mode: exits with manual download instructions
  • Uses HuggingFace mirror (hf-mirror.com) for China acceleration
  • Add --auto-download to skip confirmation

metadata:
openclaw:
homepage: https://github.com/konanok/photo-skills
emoji: "🔍"
requires:
bins:

  • python3

Photo Screener — MobileCLIP-powered Smart Pre-screening

Intelligently filter, deduplicate, and classify photos using Apple MobileCLIP2-S0, preparing them for efficient multimodal LLM processing.

Why MobileCLIP2-S0?

Based on 4-model comparison test:

| Metric | MobileCLIP2-S0 | ViT-L/14 (baseline) |
| ----------------------- | :---------------: | :-----------------: |
| Encoding Speed | 26.7ms/img ⚡ | 483.7ms/img |
| Speed Ratio | 18.1x faster | 1x |
| Pearson Correlation | 0.78 | 1.0 (baseline) |
| Top-10 Overlap | 8/10 | 10/10 |
| Model Size | 74.8M | 427.6M |
| Embed Dim | 512 | 768 |

💡 1/18 of the time, 80% selection consistency — best speed/quality tradeoff.

Dependencies

Declaration file: requirements.txt

Prefer venv: Before running scripts, activate the project-root virtual environment (e.g. .venv/). If it doesn't exist, create one first:

# Create venv and install dependencies (recommended)
python3 -m venv .venv
source .venv/bin/activate
pip install -r photo-screener/requirements.txt

# Or use the skill's setup script (checks + installs)
bash photo-screener/scripts/setup_deps.sh

# Before each session, activate venv
source .venv/bin/activate

Alternatively, install globally:

pip3 install -r photo-screener/requirements.txt

Model Download Policy

The model is NOT pre-downloaded. This is by design to avoid:

  • Unexpected large downloads (~300MB)
  • Wasted bandwidth if the user doesn't need this skill

Download Behavior

| Mode | Behavior |
| --------------------------------- | --------------------------------------- |
| Interactive (terminal) | Prompts user: "是否下载模型?[Y/n]" |
| Non-interactive (piped/agent) | Exits with manual download instructions |
| --auto-download flag | Downloads without confirmation |

Manual Download

# Using China mirror (recommended)
HF_ENDPOINT=https://hf-mirror.com python3 -c \
    "import open_clip; open_clip.create_model_and_transforms('MobileCLIP2-S0', pretrained='dfndr2b')"

# Or run setup script
bash photo-screener/scripts/setup_deps.sh

Configuration

Copy config.example.toml to config.toml and edit. See config.example.toml for all available options.

Usage

# Basic screening
python3 scripts/screen.py ~/data/output/thumbnails

# Custom thresholds
python3 scripts/screen.py ~/data/output/thumbnails \
    --min-score 5.0 --sim-threshold 0.95

# Keep top 50
python3 scripts/screen.py ~/data/output/thumbnails --top-k 50

# Auto-download model (skip confirmation)
python3 scripts/screen.py ~/data/output/thumbnails --auto-download

# Pass specific file paths instead of a directory
python3 scripts/screen.py \
    --paths ~/data/RAW/001/thumbnails/DSC_0001.jpg \
            ~/data/RAW/001/thumbnails/DSC_0002.jpg

# Dry run
python3 scripts/screen.py ~/data/output/thumbnails --dry-run

| Option | Description | Default |
| ----------------- | ----------------------------------------------- | -------- |
| input_dir | Directory with photos (optional with --paths) | required |
| --paths | Specific image paths (alternative to input_dir) | — |
| --output, -o | Output JSON path | auto |
| --min-score | Min aesthetic score (1-10) | 4.0 |
| --sim-threshold | Dedup threshold (0-1) | 0.97 |
| --batch-size | Max photos per LLM batch | 20 |
| --top-k | Keep only top K | all |
| --recursive | Search subdirectories | off |
| --auto-download | Skip model download prompt | off |
| --dry-run | Preview only | off |

Pipeline

Photos (thumbnails)
    │
    ▼  Stage 1: MobileCLIP Encoding (~27ms/image)
    │  → 512-dim normalized embeddings
    │
    ├── Stage 2: Aesthetic Scoring
    │   └── LAION MLP (zero-padded 512→768 dim)
    │   └── Remove below threshold (default: 4.0)
    │
    ├── Stage 3: Similarity Dedup
    │   └── Cosine similarity + greedy dedup
    │   └── Higher score = higher priority
    │
    ├── Stage 4: Scene Classification
    │   └── Zero-shot text matching (14 categories)
    │
    └── Output: filter_report.json

Agent Integration

When using this skill from an agent:

  1. Check dependencies: bash scripts/setup_deps.sh
  2. Run with --auto-download: in agent context, use --auto-download to skip interactive prompt
  3. Or pre-download: run setup_deps.sh first which handles model download with user confirmation
# Agent-friendly command (auto-download)
python3 photo-screener/scripts/screen.py \
    ~/data/output/{session-id}/thumbnails \
    --output ~/data/output/{session-id}/filter_report.json \
    --auto-download