name: vllm-plugin-fl-setup-flagos
description: >
Install and configure vLLM-Plugin-FL for multiple hardware backends, including NVIDIA, Ascend,
and others. Use when setting up vllm-plugin-fl, configuring the environment for a specific hardware
backend, installing dependencies, checking whether dependencies are installed successfully,
resolving runtime issues, and launching inference to verify successful model serving. Trigger
when the user says things like "setup vllm-plugin-fl", "install vllm-plugin-fl",
"configure FL plugin", "set up FlagGems", or "set up FlagCX".
argument-hint: "[backend]"
user-invocable: false
compatibility: "Linux (Ubuntu 20.04+), Python 3.10+, vLLM v0.13.0, GPU with appropriate drivers"
metadata:
version: "1.0.0"
author: flagos-ai
category: environment-setup
tags: [vllm, vllm-plugin-fl, flaggems, flagcx, setup, installation]
vLLM-Plugin-FL extends vLLM to support model inference/serving across diverse hardware backends (NVIDIA, Ascend, MetaX, Iluvatar, etc.) via FlagOS's unified operator library FlagGems and communication library FlagCX. This skill covers installation, hardware-specific environment configuration, and dependency setup.
pip package manager
Verify the vLLM version before proceeding:
python -c "import vllm; print(vllm.__version__)"
# Expected output: 0.13.0
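The version check above can also be scripted so that a mismatch is reported explicitly. This is a minimal sketch, assuming vLLM was installed with pip so that its metadata is visible to `importlib.metadata`. The helper names are hypothetical, and ignoring a local build suffix such as `+cu128` is a design choice of this sketch, not a requirement stated by vLLM-Plugin-FL.

```python
from importlib import metadata
from typing import Optional

EXPECTED_VLLM = "0.13.0"  # version named in this guide


def version_matches(installed: str, expected: str = EXPECTED_VLLM) -> bool:
    """Return True if the installed version equals the expected release.

    A local build suffix (everything after "+") is ignored. This is an
    assumption of the sketch, not a documented rule.
    """
    return installed.split("+", 1)[0] == expected


def installed_vllm_version() -> Optional[str]:
    """Return the installed vLLM version, or None if vLLM is not installed."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vLLM is not installed")
    elif version_matches(found):
        print(f"vLLM {found} matches the expected version")
    else:
        print(f"vLLM {found} found, expected {EXPECTED_VLLM}")
```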
# NVIDIA GPU
nvidia-smi
# Huawei NPU
npu-smi info
# Moore Threads GPU
mthreads-gmi
# Iluvatar GPU
ixsmi
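The commands above can be probed in one pass to guess which backend is present. This sketch only checks whether each vendor tool is on `PATH`; it does not run the tool, so it cannot confirm that a device or driver is working. The backend labels are illustrative, and only the tool names come from this guide.

```python
import shutil
from typing import Callable, List, Optional

# Vendor management tools listed above, keyed by an illustrative backend label.
VENDOR_TOOLS = {
    "nvidia": "nvidia-smi",
    "ascend": "npu-smi",
    "mthreads": "mthreads-gmi",
    "iluvatar": "ixsmi",
}


def detect_backends(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the backend labels whose management tool is found on PATH."""
    return [name for name, tool in VENDOR_TOOLS.items() if which(tool)]


if __name__ == "__main__":
    found = detect_backends()
    print("Detected backends:", ", ".join(found) if found else "none")
```

If more than one label is returned, or none, ask the user which backend to configure rather than guessing.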
First create a workspace directory and try cloning the source code:
mkdir -p ~/flagos-workspace && cd ~/flagos-workspace
git clone https://github.com/flagos-ai/vllm-plugin-FL
If git clone fails due to network issues, ask the user for their network proxy settings (e.g. http_proxy / https_proxy), configure the proxy, then retry the clone.
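The retry-with-proxy step could be sketched as follows. It relies on git honoring the `http_proxy` and `https_proxy` environment variables; the function names are hypothetical, and the proxy URL must come from the user.

```python
import os
import subprocess
from typing import Dict, Optional


def proxy_env(proxy_url: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with both proxy variables set."""
    env = dict(os.environ if base is None else base)
    for key in ("http_proxy", "https_proxy"):
        env[key] = proxy_url
    return env


def clone(repo_url: str, dest: str, proxy_url: Optional[str] = None) -> bool:
    """Run `git clone`; return True if git exits with status 0."""
    # env=None makes the child process inherit the current environment.
    env = proxy_env(proxy_url) if proxy_url else None
    result = subprocess.run(["git", "clone", repo_url, dest], env=env)
    return result.returncode == 0
```

Call `clone(url, dest)` first; if it returns `False`, ask the user for the proxy address and call `clone(url, dest, proxy_url=...)` again.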
Then install from the source directory:
cd vllm-plugin-FL
pip install -r requirements.txt
pip install --no-build-isolation .
# Required to enable vLLM-Plugin-FL when running vLLM
export VLLM_PLUGINS='fl'
Verify vLLM-Plugin-FL installation:
python -c "import vllm_fl; print('vllm-plugin-FL installed successfully')"
Ascend NPU users: Before installing FlagGems, you must first install FlagTree. See references/npu.md and complete the FlagTree installation step there before proceeding. Otherwise the FlagGems verification will fail repeatedly and keep reinstalling Triton.
# Install build dependencies
pip install -U scikit-build-core==0.11 pybind11 ninja cmake
# Clone FlagGems source code
cd ~/flagos-workspace
git clone https://github.com/flagos-ai/FlagGems
If git clone fails due to network issues, ask the user for their network proxy settings (e.g. http_proxy / https_proxy), configure the proxy, then retry the clone.
Then install from the source directory:
cd FlagGems
pip install --no-build-isolation .
Verify FlagGems installation:
python -c "import flag_gems; print('FlagGems installed successfully')"
FlagCX is a unified communication library for multi-device distributed inference, supporting both homogeneous and heterogeneous setups. Skip this step if running on a single device.
Note: Ascend NPU does not need FlagCX — skip this step for Ascend backends.
cd ~/flagos-workspace
git clone https://github.com/flagos-ai/FlagCX.git
If git clone fails due to network issues, ask the user for their network proxy settings (e.g. http_proxy / https_proxy), configure the proxy, then retry the clone.
Then build and install from the source directory:
cd FlagCX
git submodule update --init --recursive
# Build for your platform (e.g. USE_NVIDIA=1 for NVIDIA)
make USE_NVIDIA=1
export FLAGCX_PATH="$PWD"
# Install Python binding (replace [xxx] with your platform: nvidia, ascend, etc.)
cd plugin/torch/
FLAGCX_ADAPTOR=[xxx] pip install --no-build-isolation .
Verify FlagCX installation:
python -c "import flagcx; print('FlagCX installed successfully')"
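The three verification commands in this guide can be combined into one report. This sketch uses `importlib.util.find_spec`, which locates a module without importing it, so it will not catch load-time failures such as a missing shared library; the `python -c "import ..."` commands remain the stronger check. The function names are hypothetical.

```python
import importlib.util
from typing import Callable, Dict

# Import names taken from the verification commands in this guide.
COMPONENTS = {
    "vLLM-Plugin-FL": "vllm_fl",
    "FlagGems": "flag_gems",
    "FlagCX": "flagcx",
}


def is_importable(module: str) -> bool:
    """Return True if a top-level module can be located (it is not imported)."""
    return importlib.util.find_spec(module) is not None


def check_components(probe: Callable[[str], bool] = is_importable) -> Dict[str, bool]:
    """Map each component name to whether its Python module was found."""
    return {label: probe(module) for label, module in COMPONENTS.items()}


if __name__ == "__main__":
    for label, ok in check_components().items():
        print(f"{label}: {'found' if ok else 'MISSING'}")
```

A missing FlagCX entry is expected on single-device setups and on Ascend, where that step is skipped.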
Some hardware backends require additional setup. See the corresponding reference document:
| Backend | Chip Vendor | Reference |
|---------|-------------|-----------|
| Ascend NPU | Huawei | references/npu.md |
| MetaX GPU | MetaX | TBD |
| Iluvatar GPU (BI-V150) | Iluvatar | references/iluvatar_gpu.md |
| Pingtouge-Zhenwu | Pingtouge | TBD |
| Tsingmicro | Tsingmicro | TBD |
| Moore Threads GPU | Moore Threads | references/mthreads_gpu.md |
| Hygon DCU | Hygon | TBD |
Ask the user which model to run (e.g. Qwen3-4B, DeepSeek-R1), then locate it on disk:
find / -maxdepth 5 -type d -name "<user_provided_model_name>" 2>/dev/null
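For cases where the search needs to run from Python, the `find` command above can be approximated with `os.walk`. This is a sketch rather than an exact equivalent: it reports directories below the root (never the root itself), skips symlinks to mirror `-type d`, and silently ignores unreadable directories, which is the default behavior of `os.walk`. The function name is hypothetical.

```python
import os
from typing import List


def find_model_dirs(name: str, root: str = "/", max_depth: int = 5) -> List[str]:
    """Return directories called `name` at most `max_depth` levels below `root`."""
    if max_depth < 1:
        return []
    root = os.path.abspath(root)
    base_depth = root.rstrip(os.sep).count(os.sep)
    matches = []
    for current, dirnames, _ in os.walk(root):
        depth = current.rstrip(os.sep).count(os.sep) - base_depth
        for d in dirnames:
            candidate = os.path.join(current, d)
            if d == name and not os.path.islink(candidate):
                matches.append(candidate)
        # Children of `current` sit at depth + 1; stop descending at the limit.
        if depth + 1 >= max_depth:
            dirnames[:] = []
    return sorted(matches)
```

If several paths are returned, show them to the user and let them pick the one to use as the model path.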
export VLLM_PLUGINS='fl'
For Moore Threads GPU, also set:
export USE_FLAGGEMS=1
export FLAGCX_PATH=/workspace/FlagCX # MUST point to the actual FlagCX installation directory; this is only an example
export VLLM_MUSA_ENABLE_MOE_TRITON=1
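When inference is launched from a Python script rather than a shell, the same variables can be collected in one place. This sketch only encodes the variables listed above; the `"mthreads"` label and the function name are illustrative, and the FlagCX directory must be supplied by the caller because the path shown above is only an example.

```python
from typing import Dict


def runtime_env(backend: str, flagcx_path: str = "") -> Dict[str, str]:
    """Return the environment variables this guide sets before inference.

    `backend` is an illustrative label; only "mthreads" adds extra variables.
    """
    env = {"VLLM_PLUGINS": "fl"}
    if backend == "mthreads":
        if not flagcx_path:
            raise ValueError("Moore Threads needs the actual FlagCX directory")
        env.update(
            USE_FLAGGEMS="1",
            FLAGCX_PATH=flagcx_path,
            VLLM_MUSA_ENABLE_MOE_TRITON="1",
        )
    return env
```

A cautious way to use it is `os.environ.update(runtime_env(...))` before `vllm` is imported, so the settings are in place by the time vLLM reads them.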
from vllm import LLM, SamplingParams
model_path = "<resolved_model_path>"
prompts = [
    "Hello, my name is",
]
sampling_params = SamplingParams(max_tokens=10, temperature=0.0)
# For Moore Threads GPU, add: enforce_eager=True, block_size=64, attention_config={"backend": "TORCH_SDPA"}
# For Iluvatar BI-V150, add: enforce_eager=True
llm = LLM(model=model_path, max_num_batched_tokens=16384, max_num_seqs=2048)
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
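The backend-specific arguments mentioned in the comments above can be kept in a small table instead of being edited by hand. This sketch only restates the values given in this guide; the backend labels and the `llm_kwargs` helper are hypothetical, and whether a given vLLM build accepts each argument depends on the installed version.

```python
import copy
from typing import Any, Dict

# Arguments used by the example above for every backend.
BASE_KWARGS = {"max_num_batched_tokens": 16384, "max_num_seqs": 2048}

# Extra arguments from the comments above, keyed by an illustrative label.
BACKEND_OVERRIDES = {
    "mthreads": {
        "enforce_eager": True,
        "block_size": 64,
        "attention_config": {"backend": "TORCH_SDPA"},
    },
    "iluvatar": {"enforce_eager": True},
}


def llm_kwargs(model_path: str, backend: str = "") -> Dict[str, Any]:
    """Build keyword arguments for vllm.LLM for the given backend label."""
    kwargs: Dict[str, Any] = {"model": model_path, **BASE_KWARGS}
    # Deep copy so callers cannot mutate the shared override table.
    kwargs.update(copy.deepcopy(BACKEND_OVERRIDES.get(backend, {})))
    return kwargs
```

The result is passed as `LLM(**llm_kwargs(model_path, "iluvatar"))`; unknown labels fall back to the base arguments.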
Out of memory on model load: Use the gpu_memory_utilization parameter to limit memory use. Start with 0.8 and adjust:
from vllm import LLM
llm = LLM(model="...", gpu_memory_utilization=0.8)
FlagGems build failures: Ensure build dependencies are installed (scikit-build-core, pybind11, ninja, cmake). Check that your compiler supports C++17.
Plugin not loaded: If vLLM does not use the FL plugin, verify that VLLM_PLUGINS='fl' is set in your environment.
FlagCX communication errors: Ensure FLAGCX_PATH is correctly set and the library was built for your platform. For NVIDIA, confirm it was built with make USE_NVIDIA=1.
Ascend-specific issues: See references/npu.md for Ascend NPU troubleshooting, including FlagTree setup and eager execution requirements.
Cannot connect to GitHub: Ask the user for their network proxy settings (e.g. http_proxy / https_proxy), configure the proxy, then retry the git clone command.
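The "Plugin not loaded" and "FlagCX communication errors" checks above can be run together before launching inference. This is a rough sketch: it assumes `VLLM_PLUGINS` is a comma-separated list, and it only verifies that `FLAGCX_PATH` points to an existing directory, not that the library inside was built for the right platform. The function name and the `multi_device` flag are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def diagnose_env(
    env: Optional[Mapping[str, str]] = None, multi_device: bool = False
) -> List[str]:
    """Return a list of human-readable problems found in the environment."""
    env = os.environ if env is None else env
    problems = []
    # Assumption: VLLM_PLUGINS holds a comma-separated list of plugin names.
    plugins = [p.strip() for p in env.get("VLLM_PLUGINS", "").split(",")]
    if "fl" not in plugins:
        problems.append("VLLM_PLUGINS does not include 'fl'")
    if multi_device:
        path = env.get("FLAGCX_PATH", "")
        if not path:
            problems.append("FLAGCX_PATH is not set")
        elif not os.path.isdir(path):
            problems.append(f"FLAGCX_PATH is not a directory: {path}")
    return problems


if __name__ == "__main__":
    for problem in diagnose_env(multi_device=False) or ["no problems found"]:
        print(problem)
```

Leave `multi_device` off for single-device runs and for Ascend backends, where FlagCX is not installed.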