
fix: add GPU bf16/fp16 capability preflight to demo_vllm #111

Open
mvanhorn wants to merge 1 commit into FunAudioLLM:main from mvanhorn:fix/107-gpu-bf16-capability-check

mvanhorn commented Jun 7, 2026


Summary

demo_vllm.py now fails fast with a clear error when the GPU cannot run bf16, instead of silently producing empty transcription output. The preflight checks compute capability on every GPU that participates in tensor parallelism (not just the primary device), exits with an actionable message for --dtype bf16 on pre-Ampere hardware, and warns for --dtype fp16, pointing users at the AutoModel path (demo1.py) that works on older GPUs. The vLLM guide documents the requirement.

Why this matters

#107 (demo_vllm.py produces empty output): users on pre-Ampere GPUs (V100/P100/T4-class, compute capability < 8.0) run the demo, vLLM initializes without complaint, and the result is an empty string with no hint of the cause. The silent failure makes it look like the model is broken. With the check, the failure happens at startup with the reason and the two workarounds spelled out.

Testing

  • python3 -m py_compile demo_vllm.py passes
  • Capability logic verified against torch's get_device_capability contract: cc >= 8.0 passes through untouched, bf16 below 8.0 exits 1, fp16 below 8.0 warns and continues
  • No behavior change for CPU-only environments (torch.cuda.is_available() guard) or supported GPUs
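For readers who want to see the shape of the rule being tested, here is a minimal sketch of the decision logic described above. It is not the code from this PR: the names `classify` and `preflight`, the `tensor_parallel_size` parameter, and the message wording are hypothetical, and treating any dtype other than bf16/fp16 as a pass-through is an assumption. Only `torch.cuda.is_available`, `torch.cuda.device_count`, and `torch.cuda.get_device_capability` are real torch calls.

```python
import sys
from typing import Iterable, Tuple

# Compute capability 8.0 is the threshold this PR uses for bf16 support.
MIN_CAPABILITY = (8, 0)


def classify(capabilities: Iterable[Tuple[int, int]], dtype: str) -> str:
    """Return 'ok', 'warn', or 'error' for a set of (major, minor) capabilities.

    Hypothetical helper: mirrors the contract in the Testing list, not the
    actual function names in demo_vllm.py.
    """
    below = [cc for cc in capabilities if tuple(cc) < MIN_CAPABILITY]
    if not below:
        return "ok"      # every device is cc >= 8.0: pass through untouched
    if dtype == "bf16":
        return "error"   # bf16 below 8.0: fail fast
    if dtype == "fp16":
        return "warn"    # fp16 below 8.0: warn and continue
    return "ok"          # assumption: other dtypes are not checked


def preflight(dtype: str, tensor_parallel_size: int = 1) -> str:
    """Check every GPU that would take part in tensor parallelism."""
    try:
        import torch
    except ImportError:
        return "ok"      # nothing to check without torch
    if not torch.cuda.is_available():
        return "ok"      # CPU-only environments are left alone

    count = min(tensor_parallel_size, torch.cuda.device_count())
    caps = [torch.cuda.get_device_capability(i) for i in range(count)]
    verdict = classify(caps, dtype)
    if verdict == "error":
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"GPU compute capability {min(caps)} is below 8.0; "
                 "bf16 is not supported on this device.")
    if verdict == "warn":
        print(f"warning: GPU compute capability {min(caps)} is below 8.0; "
              "fp16 output may be unreliable here.", file=sys.stderr)
    return verdict
```

Keeping the decision in a pure function such as `classify` means the three cases in the Testing list can be exercised on a machine with no GPU at all, for example `classify([(8, 6), (7, 0)], "bf16")` for a mixed tensor-parallel setup.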

Closes #107

AI was used for assistance.

demo_vllm.py silently produced empty output on pre-Ampere GPUs. Add a
preflight that checks compute capability on every GPU vLLM will use (all
tensor-parallel devices plus the audio device), exits with an actionable
error for bf16 below 8.0, and warns for fp16.

Closes FunAudioLLM#107

Development

Successfully merging this pull request may close these issues.

demo_vllm.py produces empty output