Auto-Updater Skill
An Agent skill for the "Other" category. Original description: Automatically update Clawdbot and all installed skills once daily. Runs via cron, checks for updates, applies them, and messages the user with a summary of what changed.
Read-only advisory skill for LLM inference energy decisions.
Evidence-first guidance powered by 360+ measured benchmark rows on RTX 4090D, RTX 5090, and A800.
Author: Hongping Zhang (@hongping-zh)
Version: v3.0.8
Skill URL: https://clawhub.ai/hongping-zh/ecocompute
License: MIT
Dataset: zenodo.org/records/18900289 (collection window: 2025 Q1)
EcoCompute is a prompt-only advisory skill — it produces text recommendations and does not interact with the user's host environment.
| Requirement | Value |
|---|---|
| Runtime | Any LLM client capable of loading ClawHub skills |
| GPU on user side | Not required for using the skill |
| Network | Required only by the LLM client; the skill itself is self-contained |
| Python / dependencies | None — there is nothing to install on your machine |
The benchmark rows the skill references were collected on PyTorch 2.4 – 2.12 / bitsandbytes 0.45 / CUDA 12.1 – 12.8 / transformers 4.47+ (see Data Collection Environment below). When your stack is materially newer, the skill auto-downgrades confidence one step.
| Field | Value |
|---|---|
| PyTorch | 2.4 – 2.12 |
| bitsandbytes | 0.45 |
| CUDA | 12.1 – 12.8 |
| transformers | 4.47+ |
| Power sampling | NVML, 100 ms resolution |
| Collection window | 2025 Q1 |
| Dataset record | Zenodo 18900289 |
Version-drift rule: if the user's stack is materially newer than the table above (e.g. bitsandbytes ≥ 0.48, transformers ≥ 4.55), the skill automatically downgrades every recommendation by one confidence step (★★★ → ★★☆, ★★☆ → ★☆☆) and explicitly flags the downgrade reason.
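The version-drift rule can be sketched as a small function. This is an illustration only, not the skill's actual logic: the names `DRIFT_THRESHOLDS`, `parse_version`, and `apply_version_drift` are hypothetical, only the two thresholds quoted above are checked, and keeping ★☆☆ at ★☆☆ is an assumption because the rule does not say what happens at the lowest step.

```python
import re

# Thresholds quoted in the version-drift rule; a package at or above its
# threshold counts as "materially newer" in this sketch.
DRIFT_THRESHOLDS = {"bitsandbytes": (0, 48), "transformers": (4, 55)}

# Confidence ladder, highest first.
LADDER = ["★★★", "★★☆", "★☆☆"]


def parse_version(text):
    """Return (major, minor) from a version string such as '0.48.1'."""
    major, minor = re.match(r"(\d+)\.(\d+)", text).groups()
    return int(major), int(minor)


def apply_version_drift(confidence, stack):
    """Downgrade confidence one step if any package meets its threshold.

    Returns (confidence, reason); reason is None when nothing drifted.
    """
    drifted = sorted(
        name
        for name, version in stack.items()
        if name in DRIFT_THRESHOLDS
        and parse_version(version) >= DRIFT_THRESHOLDS[name]
    )
    if not drifted:
        return confidence, None
    # Assumption: the lowest step stays where it is.
    step = min(LADDER.index(confidence) + 1, len(LADDER) - 1)
    reason = "Stack newer than measurement window: " + ", ".join(drifted)
    return LADDER[step], reason
```

For example, `apply_version_drift("★★★", {"bitsandbytes": "0.48.1"})` yields a ★★☆ rating together with a reason string naming bitsandbytes.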
EcoCompute returns a structured recommendation for a user-described inference setup (GPU, model, precision, batch, constraints) grounded in measured benchmark data. It does one thing well: precise advisory on LLM inference energy.
(Read-only / no host interaction — declared once here, not repeated below.)
Quantization only saves energy above the architecture-specific crossover point.
Below that point, FP16 is more energy-efficient than INT8 / NF4.
— Measured on RTX 4090D, RTX 5090, A800 with NVML power sampling.
Architecture-specific crossover (parameter count where quantization starts to win):
| GPU architecture | Representative SKU | NF4 crossover | INT8 crossover |
|---|---|---|---|
| Turing | Tesla T4 | ~3.2 B | ~4.0 B |
| Ada | RTX 4090D | ~3.9 B | ~4.6 B |
| Blackwell | RTX 5090 | ~5.2 B | ~5.6 B |
| Ampere (server) | A800 | ~3.7 B | ~4.3 B |
Below the crossover: quantization adds 25 – 55% energy.
Above the crossover: quantization saves 15 – 23% energy.
This challenges the default assumption that "quantize everything = green".
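A minimal sketch of the crossover check, using the approximate values from the table above. The function name and the dictionary layout are hypothetical, and treating a model exactly at the crossover as "not yet winning" is an assumption, since the table values are themselves approximate.

```python
# Approximate crossover points in billions of parameters, copied from the
# architecture table above.
CROSSOVER_B = {
    "Turing": {"NF4": 3.2, "INT8": 4.0},
    "Ada": {"NF4": 3.9, "INT8": 4.6},
    "Blackwell": {"NF4": 5.2, "INT8": 5.6},
    "Ampere": {"NF4": 3.7, "INT8": 4.3},
}


def quantization_saves_energy(architecture, precision, params_b):
    """Return True when the model size is above the crossover point."""
    return params_b > CROSSOVER_B[architecture][precision]


# A 7B model on Ada sits above the NF4 crossover; a 3.8B model on
# Blackwell sits below it, so FP16 would be the lower-energy choice.
print(quantization_saves_energy("Ada", "NF4", 7.0))
print(quantization_saves_energy("Blackwell", "NF4", 3.8))
```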
The skill quotes the matching row before making any recommendation. Energy values are in J/request, measured at batch size 1 with a 512-token prompt and max-new-tokens 128; FP16 is the baseline.
| GPU | Model | FP16 | NF4 | INT8 (threshold=0) | FP8 |
|-----------|-----------|------|------|---------------------|---------|
| RTX 4090D | Qwen2-7B | 71.2 | 47.0 | 52.1 | N/A |
| A800 | Qwen2-7B | 89.4 | 58.7 | 63.2 | 67.8 |
| RTX 5090 | Qwen2-7B | TBR | TBR | TBR | TBR |
TBR = to-be-released in the next public data drop (full RTX 5090 series).
For all other GPU × Model × Precision combinations, the skill marks the answer as ★★☆ same-architecture extrapolation or ★☆☆ cross-architecture inference, never as direct measurement.
Full 360+ row dataset: ecocompute-ai/quantization-energy-crossover · Zenodo 10.5281/zenodo.18900289
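To make the lookup step concrete, the sketch below computes the saving relative to FP16 from the two released rows of the table. The `MEASURED` dictionary and `savings_vs_fp16` are hypothetical names; the RTX 5090 row is left out because its values are still TBR, and a `None` result stands for "no direct measurement".

```python
# Released rows from the lookup table, in J/request. None marks N/A.
MEASURED = {
    ("RTX 4090D", "Qwen2-7B"): {"FP16": 71.2, "NF4": 47.0, "INT8": 52.1, "FP8": None},
    ("A800", "Qwen2-7B"): {"FP16": 89.4, "NF4": 58.7, "INT8": 63.2, "FP8": 67.8},
}


def savings_vs_fp16(gpu, model, precision):
    """Percent energy saved versus FP16, or None if no measured value exists."""
    row = MEASURED.get((gpu, model))
    if row is None or row.get(precision) is None:
        return None
    return round(100.0 * (row["FP16"] - row[precision]) / row["FP16"], 1)


print(savings_vs_fp16("RTX 4090D", "Qwen2-7B", "NF4"))  # 34.0
print(savings_vs_fp16("RTX 5090", "Qwen2-7B", "NF4"))   # None (not yet released)
```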
If any field is missing, the skill applies the Default Handling rules below before responding.
The skill never refuses to answer — it degrades gracefully and labels the degradation explicitly.
| Missing field | Rule | Resulting confidence |
|---|---|---|
| GPU unspecified | Ask once. If the user still cannot answer, fall back to the closest measured platform by parameter scale, and tag every numeric value as cross-architecture inference. | ★☆☆ |
| GPU specified but not in measured set (e.g. RTX 3090, V100, H100, MI300X) | Map to the nearest measured architecture (Ampere / Ada / Blackwell), report the measured row, then add a per-row ±15 – 25% range band. | ★★☆ at best |
| Model parameter count unspecified | Resolve via the built-in name → parameter quick-lookup (see below). If still unknown, ask the user for an order-of-magnitude (1B / 3B / 7B / 13B / 30B+). | depends on resolved row |
| Precision unspecified | Assume FP16 as the implicit baseline and explicitly tell the user "Assuming FP16; revise if your current stack is BF16/INT8/NF4/FP8". | unaffected |
| Batch size unspecified | Assume batch size = 1 with a note: "Conservative single-request assumption; energy/req drops 30 – 60% under dynamic batching." | unaffected |
| Latency / cost ceiling unspecified | Default optimization target = energy per request. Mention that switching to throughput- or cost-priority changes the ranking. | unaffected |
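The three rows marked "unaffected" amount to filling in defaults and recording a note for each. A rough sketch, assuming a request is a plain dictionary; the field names `precision`, `batch_size`, and `target` are hypothetical, and the GPU and parameter-count rules are omitted because they need a dialogue with the user.

```python
def apply_defaults(request):
    """Fill missing precision, batch size, and target; return (filled, notes)."""
    filled = dict(request)
    notes = []
    if not filled.get("precision"):
        filled["precision"] = "FP16"
        notes.append(
            "Assuming FP16; revise if your current stack is BF16/INT8/NF4/FP8"
        )
    if not filled.get("batch_size"):
        filled["batch_size"] = 1
        notes.append(
            "Conservative single-request assumption; "
            "energy/req drops 30 – 60% under dynamic batching."
        )
    if not filled.get("target"):
        filled["target"] = "energy_per_request"
        notes.append(
            "Optimizing for energy per request; throughput- or "
            "cost-priority changes the ranking."
        )
    return filled, notes


filled, notes = apply_defaults({"gpu": "A800", "model": "Qwen2-7B"})
print(filled["precision"], filled["batch_size"], filled["target"])
```

Returning the notes separately keeps every assumption visible in the final answer, which is the point of the "labels the degradation explicitly" rule.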
| Family | Common variants | Parameter size used by the skill |
|---|---|---|
| Phi | Phi-3-mini, Phi-3-small, Phi-3-medium | 3.8B / 7B / 14B |
| Qwen2 | Qwen2-1.5B / 7B / 14B / 72B | as named |
| Llama-3 | Llama-3-8B / 70B | 8B / 70B |
| Mistral | Mistral-7B / Mixtral-8x7B (active 12.9B) | 7B / 12.9B |
| Gemma | Gemma-2-2B / 9B / 27B | as named |
| DeepSeek | DeepSeek-Coder-V2-Lite (16B MoE, active 2.4B) | 2.4B active |
For families not on this list, the skill asks the user to confirm parameter count before grounding any numeric claim.
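The quick-lookup can be pictured as a case-insensitive dictionary keyed by model name. The sketch below copies the values from the table; `PARAMS_B` and `resolve_params_b` are hypothetical names, and returning `None` stands in for "ask the user to confirm".

```python
# Parameter sizes in billions, as listed in the quick-lookup table.
# MoE entries use the active parameter count, as the table does.
PARAMS_B = {
    "phi-3-mini": 3.8, "phi-3-small": 7.0, "phi-3-medium": 14.0,
    "qwen2-1.5b": 1.5, "qwen2-7b": 7.0, "qwen2-14b": 14.0, "qwen2-72b": 72.0,
    "llama-3-8b": 8.0, "llama-3-70b": 70.0,
    "mistral-7b": 7.0, "mixtral-8x7b": 12.9,
    "gemma-2-2b": 2.0, "gemma-2-9b": 9.0, "gemma-2-27b": 27.0,
    "deepseek-coder-v2-lite": 2.4,
}


def resolve_params_b(name):
    """Return the parameter size in billions, or None for unknown names."""
    return PARAMS_B.get(name.strip().lower())


print(resolve_params_b("Mixtral-8x7B"))  # 12.9
print(resolve_params_b("SomeNewModel"))  # None, so the skill would ask the user
```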
| Protocol | When to use | Output |
|-----------|-------------|--------|
| OPTIMIZE | "make my current setup more efficient" | Recommended config + energy gap vs next-best |
| COMPARE | "A vs B" | Side-by-side table (see template below) + winner |
| EXPLAIN | "why is my setup slow / hot" | Bottleneck analysis grounded in benchmark priors |
| AUDIT | "check my config for waste" | Anti-pattern findings + quantified overhead |
| RECOMMEND | "suggest a setup under constraint X" | Ranked options with trade-off metrics |
Every protocol uses lookup-then-recommend: the matching benchmark row is quoted before any suggestion.
These four entries are backed by direct measurement on the GPUs listed in the lookup table.
| Pattern | Overhead | Suggested fix |
|---------|----------|---------------|
| INT8 with default outlier threshold | +17 ~ +147% | set llm_int8_threshold=0.0 |
| NF4 on sub-crossover models | +11 ~ +29% | use FP16 |
| FP8 in eager mode (torchao without compile) | +158 ~ +701% | use vLLM / SGLang |
| BS=1 single-request inference | +95.7% per request | enable dynamic batching |
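An AUDIT pass over a configuration might look like the sketch below. It is a simplification under stated assumptions: the config keys (`load_in_8bit`, `llm_int8_threshold`, `precision`, `eager`, `batch_size`) are illustrative, a missing threshold is treated as the bitsandbytes default of 6.0, and the NF4 sub-crossover pattern is omitted because it needs the crossover table.

```python
def audit(config):
    """Return (pattern, overhead, fix) tuples for anti-patterns in config."""
    findings = []
    # A missing threshold is assumed to mean the library default (6.0).
    if config.get("load_in_8bit") and config.get("llm_int8_threshold", 6.0) != 0.0:
        findings.append((
            "INT8 with default outlier threshold",
            "+17 ~ +147%",
            "set llm_int8_threshold=0.0",
        ))
    if config.get("precision") == "FP8" and config.get("eager", False):
        findings.append((
            "FP8 in eager mode (torchao without compile)",
            "+158 ~ +701%",
            "use vLLM / SGLang",
        ))
    if config.get("batch_size", 1) == 1:
        findings.append((
            "BS=1 single-request inference",
            "+95.7% per request",
            "enable dynamic batching",
        ))
    return findings


for pattern, overhead, fix in audit({"load_in_8bit": True, "batch_size": 1}):
    print(f"{pattern}: {overhead} -> {fix}")
```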
The following items reflect community engineering experience. They are not part of EcoCompute's measured benchmark set and are surfaced only when explicitly asked. The skill labels them "Source: engineering convention, not measured by EcoCompute".
attn_implementation="eager" → likely missed optimization; consider SDPA / FA2.
Baseline X J/request vs Recommended Y J/request (Z%)
(dataset: zenodo.org/records/18900289 · 2025-Q1)
Every response ends with the dataset version footer:
Evidence: zenodo.org/records/18900289 (2025-Q1) · skill v3.0.8
Example (OPTIMIZE):
Conclusion: switching to NF4 saves 34% energy
Baseline: FP16 -> 71.2 J/request
Recommended: NF4 -> 47.0 J/request
Confidence: ★★★ direct measured (RTX 4090D + Qwen2-7B)
Config: BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16, bnb_4bit_quant_type="nf4")
Evidence: zenodo.org/records/18900289 (2025-Q1) · skill v3.0.8
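The layout of the example above can be reproduced with a short formatter. This is only a sketch of the output shape, not the skill's implementation; `format_optimize` and its parameters are hypothetical, and rounding the saving to a whole percent is an assumption based on the "34%" in the example.

```python
FOOTER = "Evidence: zenodo.org/records/18900289 (2025-Q1) · skill v3.0.8"


def format_optimize(precision, baseline_j, recommended_j, confidence, config):
    """Render an OPTIMIZE response in the six-line layout shown above."""
    saving_pct = round(100 * (baseline_j - recommended_j) / baseline_j)
    lines = [
        f"Conclusion: switching to {precision} saves {saving_pct}% energy",
        f"Baseline: FP16 -> {baseline_j} J/request",
        f"Recommended: {precision} -> {recommended_j} J/request",
        f"Confidence: {confidence}",
        f"Config: {config}",
        FOOTER,  # every response ends with the dataset version footer
    ]
    return "\n".join(lines)


print(format_optimize(
    "NF4", 71.2, 47.0,
    "★★★ direct measured (RTX 4090D + Qwen2-7B)",
    'BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")',
))
```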
| Dimension | NF4 | INT8 (threshold=0) |
|-------------|------------------|---------------------|
| Energy | 47.0 J/req | 52.1 J/req |
| Throughput | 38.2 tok/s | 41.7 tok/s |
| Memory | 4.1 GB | 5.8 GB |
| Confidence | ★★★ | ★★★ |
| Winner | ✓ energy | ✓ throughput |
The skill always:
When a recommendation is emitted, the skill produces the same configuration translated into the user's chosen serving framework. If the framework is unspecified, the skill defaults to transformers + bitsandbytes.
| Framework | One-line snippet |
|---|---|
| transformers + bitsandbytes | BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16, bnb_4bit_quant_type="nf4") |
| vLLM | --quantization bitsandbytes --dtype half --load-format bitsandbytes |
| TGI (Text Generation Inference) | --quantize bitsandbytes-nf4 |
| Ollama | ollama create --quantize q4_K_M MODEL_NAME (closest GGUF analog; not bit-identical to NF4) |
| llama.cpp | llama-quantize INPUT.gguf OUTPUT.gguf Q4_K_M (closest GGUF analog) |
llm_int8_threshold=0.0
| Framework | One-line snippet |
|---|---|
| transformers + bitsandbytes | BitsAndBytesConfig(load_in_8bit=True, llm_int8_threshold=0.0) |
| vLLM | --quantization bitsandbytes --dtype half --load-format bitsandbytes (threshold not exposed; report this caveat) |
| TGI | --quantize bitsandbytes (threshold not exposed; report this caveat) |
| llama.cpp | llama-quantize INPUT.gguf OUTPUT.gguf Q8_0 (closest GGUF analog) |
| Framework | One-line snippet |
|---|---|
| vLLM | --quantization fp8 --kv-cache-dtype fp8 |
| TGI | --quantize fp8 |
| TensorRT-LLM | enable fp8_qat in build script |
If the user's framework is not in the table above, the skill emits the transformers + bitsandbytes snippet and explicitly states "Framework-specific mapping unavailable; verify equivalent flag on your serving stack."
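The fallback behaviour can be sketched as a dictionary lookup with a default. The snippet strings are copied from the NF4 table for three of the frameworks; the names `NF4_SNIPPETS` and `nf4_snippet` are hypothetical, and only a subset of the rows is shown.

```python
# Subset of the NF4 framework table; values are the one-line snippets.
NF4_SNIPPETS = {
    "transformers": (
        "BitsAndBytesConfig(load_in_4bit=True, "
        "bnb_4bit_compute_dtype=torch.float16, bnb_4bit_quant_type=\"nf4\")"
    ),
    "vllm": "--quantization bitsandbytes --dtype half --load-format bitsandbytes",
    "tgi": "--quantize bitsandbytes-nf4",
}

FALLBACK_NOTE = (
    "Framework-specific mapping unavailable; "
    "verify equivalent flag on your serving stack."
)


def nf4_snippet(framework=None):
    """Return (snippet, note); note is None when the framework is mapped."""
    key = (framework or "transformers").lower()
    if key in NF4_SNIPPETS:
        return NF4_SNIPPETS[key], None
    # Unknown framework: fall back to transformers + bitsandbytes with a caveat.
    return NF4_SNIPPETS["transformers"], FALLBACK_NOTE


print(nf4_snippet("vLLM"))
print(nf4_snippet("my-custom-server"))
```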
| Situation | What the skill says |
|-----------|---------------------|
| Model > 14B | "Beyond measured range. Extrapolated estimate ±20%." |
| Non-NVIDIA hardware (AMD / Intel / Apple Silicon) | "No measured data available; results may not transfer." |
| bitsandbytes ≥ 0.48 / transformers ≥ 4.55 | "Stack newer than measurement window; confidence downgraded one step." |
| Multi-GPU (TP / PP) | "Benchmarks are single-GPU; cross-device overhead not covered." |
| Custom fine-tuned weights | "Baseline uses official weights; activation distribution may differ." |
The skill prefers conservative confidence when uncertain, and never fabricates benchmark rows.
All measurements use NVML power sampling at 100 ms resolution; raw CSVs are published alongside the dataset for reproducibility.
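For readers who want to sanity-check a J/request figure against the raw CSVs, power samples can be integrated over time. The sketch below assumes evenly spaced samples in watts and uses the trapezoidal rule; that integration method is an assumption, not a documented detail of the dataset. If samples are collected through the NVML Python bindings, note that `nvmlDeviceGetPowerUsage` reports milliwatts, so values need dividing by 1000 first.

```python
def energy_joules(power_watts, interval_s=0.1):
    """Integrate evenly spaced power samples (W) into energy (J).

    interval_s defaults to 0.1 s, matching the 100 ms sampling resolution.
    """
    if len(power_watts) < 2:
        return 0.0
    # Trapezoidal rule: end points count half, interior points count fully.
    ends = 0.5 * (power_watts[0] + power_watts[-1])
    interior = sum(power_watts[1:-1])
    return interval_s * (ends + interior)


# Eleven samples at a constant 300 W span one second, i.e. about 300 J.
print(energy_joules([300.0] * 11))
```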
openclaw skills install ecocompute
The skill is prompt-only and needs nothing else installed on your side — see Requirements at the top of this document.
🌍 Making AI development more sustainable, one model at a time.