CosyVoice2 ONNX Models (flow & hift)

This repository provides ONNX-format models for selected modules of CosyVoice2, including:

  • flow_fp32.onnx (full precision, flow module)
  • flow_fp16.onnx (half precision, flow module)
  • hift.onnx (full precision, hift module)
  • flow_hift_combined_fp32.onnx (flow_fp32 and hift combined into a single model)
  • flow_hift_combined_fp16.onnx (flow_fp16 and hift combined into a single model)

Update 2025-11-09: Fixed the NaN issue in the /decoder/estimator submodule of the flow model when running in half precision. flow_fp16.onnx is now a fully half-precision flow model, and the combined models have been updated accordingly.

For usage instructions, please refer to the GitHub repository.
The other modules of CosyVoice2 can be obtained from the official CosyVoice2 release.
I have open-sourced the ONNX version of CosyVoice2, including the modified modules and the conversion scripts required for ONNX export. To learn how the conversion is done, see CosyVoiceForOnnx.


Model Inputs and Outputs

flow_fp32.onnx / flow_fp16.onnx

  • Inputs:
    • token (int64)
    • prompt_token (int32)
    • prompt_feat (float32 / float16)
    • embedding (float32 / float16)
  • Precision of prompt_feat and embedding:
    • flow_fp32.onnx: must be float32
    • flow_fp16.onnx: must be float16
  • Output:
    • tts_mel (float32)
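
The type list above can be written down in code so that input feeds can be checked before inference. The sketch below uses a hypothetical helper, flow_input_types, which is not part of this repository. It returns the type strings that ONNX Runtime reports through InferenceSession.get_inputs(), where float32 appears as "tensor(float)". It assumes that prompt_feat follows the model precision in the same way as embedding. It does not cover tensor shapes, because they are not documented here.

```python
# ONNX Runtime type strings; note that float32 is reported as "tensor(float)".
ORT_TYPE = {
    "float32": "tensor(float)",
    "float16": "tensor(float16)",
    "int32": "tensor(int32)",
    "int64": "tensor(int64)",
}


def flow_input_types(model_file: str) -> dict:
    """Expected input types for the standalone flow models (hypothetical helper).

    Transcribed from the model card's input list; shapes are not covered.
    """
    if model_file == "flow_fp32.onnx":
        precision = "float32"
    elif model_file == "flow_fp16.onnx":
        precision = "float16"
    else:
        raise ValueError(f"not a standalone flow model: {model_file}")
    return {
        "token": ORT_TYPE["int64"],         # int64 in the standalone flow model
        "prompt_token": ORT_TYPE["int32"],
        "prompt_feat": ORT_TYPE[precision], # assumed to follow model precision
        "embedding": ORT_TYPE[precision],   # must follow model precision
    }
```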

hift.onnx

  • Input:
    • speech_feat (float32)
  • Output:
    • generated_speech (float32)
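
When the flow and hift models are used separately instead of through a combined model, the flow output has to be passed on to hift. The sketch below makes two assumptions. First, tts_mel can be fed to hift's speech_feat input unchanged; both are documented as float32. Second, each session exposes ONNX Runtime's run(output_names, input_feed) method. The name synthesize_two_stage is hypothetical.

```python
def synthesize_two_stage(flow_session, hift_session, flow_feeds):
    """Run the flow model, then turn its mel output into audio with hift.

    flow_session, hift_session: objects with an ONNX Runtime style
    run(output_names, input_feed) method, e.g. onnxruntime.InferenceSession.
    flow_feeds: dict with token, prompt_token, prompt_feat and embedding.
    """
    # Stage 1: the flow model produces the mel features "tts_mel" (float32).
    tts_mel = flow_session.run(["tts_mel"], flow_feeds)[0]
    # Stage 2: hift consumes them as "speech_feat" (also float32, so no cast).
    generated_speech = hift_session.run(
        ["generated_speech"], {"speech_feat": tts_mel}
    )[0]
    return generated_speech


# Example usage (requires onnxruntime and the downloaded model files):
# import onnxruntime as ort
# flow = ort.InferenceSession("flow_fp32.onnx")
# hift = ort.InferenceSession("hift.onnx")
# wav = synthesize_two_stage(flow, hift, feeds)
```

Both flow variants document a float32 tts_mel, so the same wiring should apply to flow_fp16.onnx. Only its prompt_feat and embedding feeds change to float16.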

flow_hift_combined_fp32.onnx / flow_hift_combined_fp16.onnx

  • Inputs:
    • token (int32)
    • prompt_token (int32)
    • prompt_feat (float32 / float16)
    • embedding (float32 / float16)
    • speed (float32, scalar, controls speech rate)
  • Precision of prompt_feat and embedding:
    • flow_hift_combined_fp32.onnx: must be float32
    • flow_hift_combined_fp16.onnx: must be float16
  • Output:
    • generated_speech (float32)
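
The combined models differ from the standalone flow model in two ways. token is int32, and speed is always a float32 scalar regardless of model precision. The sketch below uses a hypothetical helper, combined_input_types, and reads the precision rule as applying to prompt_feat and embedding. With NumPy, a scalar speed would presumably be passed as a 0-d array such as numpy.array(1.0, dtype=numpy.float32); that exact shape is an assumption.

```python
# ONNX Runtime type strings; note that float32 is reported as "tensor(float)".
ORT_TYPE = {
    "float32": "tensor(float)",
    "float16": "tensor(float16)",
    "int32": "tensor(int32)",
}

PRECISION = {
    "flow_hift_combined_fp32.onnx": "float32",
    "flow_hift_combined_fp16.onnx": "float16",
}


def combined_input_types(model_file: str) -> dict:
    """Expected input types for the combined flow+hift models (hypothetical helper)."""
    precision = PRECISION.get(model_file)
    if precision is None:
        raise ValueError(f"not a combined model: {model_file}")
    return {
        "token": ORT_TYPE["int32"],          # int32 here, unlike standalone flow (int64)
        "prompt_token": ORT_TYPE["int32"],
        "prompt_feat": ORT_TYPE[precision],  # follows model precision
        "embedding": ORT_TYPE[precision],    # follows model precision
        "speed": ORT_TYPE["float32"],        # float32 scalar even in the fp16 model
    }
```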

Notes

  • All outputs are float32.
  • Input dtypes must exactly match the types listed above for each model.
  • In the combined models, token is int32 (not int64 as in the standalone flow model), and speed is a float32 scalar that controls speech rate, in both the fp32 and fp16 variants.
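
Because input dtypes must match exactly, it can help to compare the feeds against what the loaded model declares before calling run. The sketch below uses a hypothetical function, feed_dtype_mismatches. It assumes that the model's inputs come from ONNX Runtime's session.get_inputs(), whose entries have .name and .type. It also assumes that each feed's dtype is taken as str(array.dtype) from a NumPy array.

```python
# Map ONNX Runtime input type strings to NumPy dtype names (str(array.dtype)).
ORT_TO_NUMPY = {
    "tensor(float)": "float32",
    "tensor(float16)": "float16",
    "tensor(int32)": "int32",
    "tensor(int64)": "int64",
}


def feed_dtype_mismatches(session_inputs, feed_dtypes):
    """List inputs whose dtype differs from what the model declares.

    session_inputs: (name, ort_type) pairs, e.g.
        [(i.name, i.type) for i in session.get_inputs()]
    feed_dtypes: input name -> dtype name, e.g.
        {k: str(v.dtype) for k, v in feeds.items()}
    Returns an empty list when every declared input is present and matches.
    """
    problems = []
    for name, ort_type in session_inputs:
        want = ORT_TO_NUMPY.get(ort_type, ort_type)
        got = feed_dtypes.get(name)
        if got is None:
            problems.append(f"{name}: missing (model expects {want})")
        elif got != want:
            problems.append(f"{name}: got {got}, model expects {want}")
    return problems
```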

Acknowledgments

The original models are from the official CosyVoice2. This repository only provides ONNX format conversion and adaptation.
