文件预览

mthreads_gpu.md

查看 Vllm Plugin Fl Setup Flagos 技能包中的文件内容。

返回技能详情下载技能包打开来源页

文件内容

references/mthreads_gpu.md

# Additional Steps for Moore Threads GPU

If using Moore Threads GPU, the following extra configuration applies on top of the main workflow. All source code repositories use the same `flagos-ai` GitHub organization as other backends — no separate forks are needed.

## Installation Differences from NVIDIA

The main workflow (Steps 2–4) applies as-is with one difference:

### FlagCX Build Flag (Step 4)

When building FlagCX, use `USE_MUSA=1` instead of `USE_NVIDIA=1`, and set the adaptor to `musa`:

```bash
# In the FlagCX directory (from Step 4 of the main workflow)
make USE_MUSA=1

cd plugin/torch/
FLAGCX_ADAPTOR=musa pip install --no-build-isolation .
```

## Environment Variables

These must be set **in addition to** the standard `VLLM_PLUGINS='fl'` before launching vLLM:

```bash
export USE_FLAGGEMS=1
export FLAGCX_PATH=/path/to/FlagCX  # MUST point to the actual FlagCX installation directory
export VLLM_MUSA_ENABLE_MOE_TRITON=1
```

## Inference Notes

Moore Threads requires additional parameters when constructing the `LLM` object:

- `enforce_eager=True`
- `block_size=64`
- `attention_config={"backend": "TORCH_SDPA"}`

> **Note:** First-time inference for some models may take a long time due to Triton kernel compilation.