CosyVoice2 ONNX Models (flow & hift)
This repository provides ONNX-format models for selected modules of CosyVoice2, including:
- flow_fp32.onnx (full precision, flow module)
- flow_fp16.onnx (half precision, flow module)
- hift.onnx (full precision, hift module)
- flow_hift_combined_fp32.onnx (combined flow_fp32 and hift model)
- flow_hift_combined_fp16.onnx (combined flow_fp16 and hift model)
Update 2025-11-09: Fixed the NaN issue in the /decoder/estimator submodule of the flow model under half-precision. flow_fp16.onnx has been updated to a fully half-precision flow model. The combined models have also been updated.
For usage instructions, please refer to the GitHub repository.
Other modules of CosyVoice2 can be obtained from the official CosyVoice2.
I have open-sourced the ONNX version of CosyVoice2, including the modified modules and conversion scripts needed for ONNX. If you want to learn how to perform the conversion, please visit CosyVoiceForOnnx.
Model Inputs and Outputs
flow_fp32.onnx / flow_fp16.onnx
- Inputs:
  - token (int64)
  - prompt_token (int32)
  - prompt_feat (float32 / float16)
  - embedding (float32 / float16)
  - Precision of the floating-point inputs:
    - For flow_fp32.onnx, must use float32
    - For flow_fp16.onnx, must use float16
- Outputs:
  - tts_mel (float32)
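The dtype requirements for the flow models can be captured in a small lookup table and checked before inference. The following is a minimal sketch, not code from this repository: `FLOW_INPUT_DTYPES` and `check_flow_inputs` are hypothetical names, and the check only assumes that each input exposes a `.dtype` attribute, as `numpy.ndarray` does.

```python
# Expected input dtypes per flow model, transcribed from the input list.
FLOW_INPUT_DTYPES = {
    "fp32": {"token": "int64", "prompt_token": "int32",
             "prompt_feat": "float32", "embedding": "float32"},
    "fp16": {"token": "int64", "prompt_token": "int32",
             "prompt_feat": "float16", "embedding": "float16"},
}


def check_flow_inputs(inputs, precision):
    """Return a list of dtype problems; an empty list means the feed looks valid.

    `inputs` maps input names to arrays (anything with a `.dtype` attribute).
    `precision` is "fp32" for flow_fp32.onnx or "fp16" for flow_fp16.onnx.
    """
    expected = FLOW_INPUT_DTYPES[precision]
    problems = []
    for name, dtype in expected.items():
        if name not in inputs:
            problems.append(f"missing input: {name}")
        elif str(inputs[name].dtype) != dtype:
            problems.append(f"{name}: expected {dtype}, got {inputs[name].dtype}")
    return problems
```

Running such a check before calling the model turns a dtype mismatch into a readable message instead of a runtime error from the inference engine.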
hift.onnx
- Input:
  - speech_feat (float32)
- Output:
  - generated_speech (float32)
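Because tts_mel and speech_feat are both float32, the flow and hift models can presumably be run back to back, which is what the combined models package into a single graph. The sketch below works under that assumption. `synthesize` is a hypothetical helper, and the sessions are expected to behave like `onnxruntime.InferenceSession`, whose `run(output_names, input_feed)` method returns a list of outputs. Whether tts_mel needs any trimming or reshaping before it reaches hift is not stated here, so consult the GitHub repository for the actual pipeline.

```python
def synthesize(flow_session, hift_session, flow_inputs):
    """Run the flow model, then feed its mel output to hift.

    Both sessions only need a `run(output_names, input_feed)` method,
    matching onnxruntime.InferenceSession.run.
    """
    # flow: tokens + prompt features -> mel spectrogram (float32)
    (tts_mel,) = flow_session.run(["tts_mel"], flow_inputs)
    # hift: mel spectrogram -> waveform (float32)
    # Assumption: tts_mel is passed through unchanged as speech_feat.
    (speech,) = hift_session.run(["generated_speech"], {"speech_feat": tts_mel})
    return speech
```

With onnxruntime installed, the two sessions would typically be created with `onnxruntime.InferenceSession("flow_fp32.onnx")` and `onnxruntime.InferenceSession("hift.onnx")`.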
flow_hift_combined_fp32.onnx / flow_hift_combined_fp16.onnx
- Inputs:
  - token (int32)
  - prompt_token (int32)
  - prompt_feat (float32 / float16)
  - embedding (float32 / float16)
  - speed (float32, scalar, controls speech rate)
  - Precision of prompt_feat and embedding:
    - For flow_hift_combined_fp32.onnx, must use float32
    - For flow_hift_combined_fp16.onnx, must use float16
- Output:
  - generated_speech (float32)
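Preparing a feed for the combined models mostly means casting each input to the right dtype. A minimal sketch, assuming the inputs are array objects with an `.astype(dtype_name)` method (as `numpy.ndarray` has) and that embedding follows the model's precision in the same way as prompt_feat. `cast_combined_inputs` is a hypothetical helper, not part of this repository.

```python
def cast_combined_inputs(inputs, precision):
    """Cast a feed dict to the dtypes the combined models expect.

    `inputs` maps input names to arrays exposing `.astype(dtype_name)`.
    `precision` is "fp32" for flow_hift_combined_fp32.onnx or
    "fp16" for flow_hift_combined_fp16.onnx.
    """
    if precision not in ("fp32", "fp16"):
        raise ValueError(f"unknown precision: {precision!r}")
    float_dtype = "float32" if precision == "fp32" else "float16"
    dtypes = {
        "token": "int32",            # int32 here, unlike the standalone flow models
        "prompt_token": "int32",
        "prompt_feat": float_dtype,
        "embedding": float_dtype,    # assumption: matches the model precision
        "speed": "float32",          # float32 scalar in both variants
    }
    # A missing input raises KeyError, which surfaces incomplete feeds early.
    return {name: inputs[name].astype(dtype) for name, dtype in dtypes.items()}
```

The returned dict can be passed as the input feed to an `onnxruntime.InferenceSession` created from either combined model, requesting the `generated_speech` output.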
Notes
- All outputs are float32.
- Input precision must strictly match the model requirements.
- In the combined models, the token input is int32 (not int64), and the speed input is a float32 scalar controlling speech rate.
Acknowledgments
The original models are from the official CosyVoice2. This repository only provides ONNX format conversion and adaptation.
Model tree for Lourdle/CosyVoice2-0.5B_ONNX
Base model
FunAudioLLM/CosyVoice2-0.5B