fix: add GPU bf16/fp16 capability preflight to demo_vllm #111
Open
mvanhorn wants to merge 1 commit into
demo_vllm.py silently produced empty output on pre-Ampere GPUs. Add a preflight that checks compute capability on every GPU vLLM will use (all tensor-parallel devices plus the audio device), exits with an actionable error for bf16 below 8.0, and warns for fp16. Closes FunAudioLLM#107
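The device set described above ("all tensor-parallel devices plus the audio device") could be computed roughly as follows. This is a minimal sketch, not the PR's code. The function name `devices_to_check`, the `audio_device` string format (`"cuda:N"`, `"cuda"`, or `"cpu"`), and the assumption that tensor-parallel ranks occupy visible devices 0 through `tensor_parallel_size - 1` are all hypothetical.

```python
from typing import List


def devices_to_check(tensor_parallel_size: int, audio_device: str) -> List[int]:
    """Return the sorted, de-duplicated CUDA indices the preflight should inspect.

    Hypothetical: assumes tensor-parallel ranks sit on visible devices
    0..tensor_parallel_size-1 and audio_device looks like "cuda:1", "cuda" or "cpu".
    """
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be >= 1")
    # One entry per tensor-parallel rank.
    indices = set(range(tensor_parallel_size))
    if audio_device.startswith("cuda"):
        # "cuda" without an explicit index is treated as device 0 in this sketch.
        _, _, idx = audio_device.partition(":")
        indices.add(int(idx) if idx else 0)
    # A set removes the overlap when the audio device is also a TP device.
    return sorted(indices)
```

For example, `devices_to_check(2, "cuda:3")` gives `[0, 1, 3]`, so a bf16-incapable card that serves only the audio side still gets checked. The indices are relative to `CUDA_VISIBLE_DEVICES`, as with any CUDA device index.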
Summary
demo_vllm.py now fails fast with a clear error when the GPU cannot run bf16, instead of silently producing empty transcription output. The preflight checks compute capability on every GPU that participates in tensor parallelism (not just the primary device), exits with an actionable message for --dtype bf16 on pre-Ampere hardware, and warns for --dtype fp16, pointing users at the AutoModel path (demo1.py) that works on older GPUs. The vLLM guide documents the requirement.

Why this matters
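The exit/warn contract summarized above can be sketched as a small function. This is an illustrative sketch under stated assumptions, not the PR's implementation: the names `preflight_dtype` and `MIN_BF16_CAPABILITY` are hypothetical, the demo's `--dtype` flag is assumed to take the strings `"bf16"` and `"fp16"`, and the message wording is made up. The capability lookup is injected so the logic runs without a GPU; in the demo it would be `torch.cuda.get_device_capability`, which returns a `(major, minor)` tuple for a device index.

```python
import sys
import warnings
from typing import Callable, Iterable, Tuple

# Hypothetical constant: bf16 needs Ampere (compute capability 8.0) or newer.
MIN_BF16_CAPABILITY: Tuple[int, int] = (8, 0)


def preflight_dtype(
    dtype: str,
    device_indices: Iterable[int],
    get_capability: Callable[[int], Tuple[int, int]],
) -> None:
    """Exit for bf16 on pre-Ampere GPUs, warn for fp16, otherwise do nothing.

    get_capability is injected (e.g. torch.cuda.get_device_capability) so the
    contract can be exercised without real hardware.
    """
    # Query each device once; tuples compare lexicographically, so (7, 5) < (8, 0).
    caps = {i: tuple(get_capability(i)) for i in device_indices}
    too_old = {i: cc for i, cc in caps.items() if cc < MIN_BF16_CAPABILITY}
    if not too_old:
        return  # every device is sm_80 or newer: pass through untouched

    listing = ", ".join(
        f"cuda:{i} (sm_{major}{minor})" for i, (major, minor) in sorted(too_old.items())
    )
    if dtype == "bf16":
        # Fail fast with the reason and the workarounds instead of empty output.
        print(
            f"error: --dtype bf16 needs compute capability >= 8.0, found {listing}. "
            "Try --dtype fp16, or use the AutoModel path in demo1.py.",
            file=sys.stderr,
        )
        sys.exit(1)
    if dtype == "fp16":
        # fp16 is allowed to continue, but the user is told about the fallback.
        warnings.warn(
            f"--dtype fp16 on pre-Ampere GPU(s) {listing}; "
            "if transcriptions come back empty, use the AutoModel path in demo1.py.",
            RuntimeWarning,
            stacklevel=2,
        )
```

In the script, the call would sit behind a `torch.cuda.is_available()` check (for instance `preflight_dtype(args.dtype, devices, torch.cuda.get_device_capability)`, where `args.dtype` and `devices` are hypothetical), so CPU-only runs skip it. Checking every device matters on mixed hosts: a single sm_75 card among sm_86 cards is enough to trigger the bf16 exit.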
#107 ("demo_vllm.py returns an empty result"): users on pre-Ampere GPUs (V100/P100/T4-class, compute capability < 8.0) run the demo, vLLM initializes without complaint, and the result is an empty string with no hint of the cause. The silent failure makes it look as if the model is broken. With the check, the run fails at startup, stating the reason and spelling out the two workarounds.
Testing
- python3 -m py_compile demo_vllm.py passes
- get_device_capability contract: cc >= 8.0 passes through untouched, bf16 below 8.0 exits 1, fp16 below 8.0 warns and continues
- … torch.cuda.is_available() guard) or supported GPUs

Closes #107
AI was used for assistance.