- FlashSpeech: Efficient Zero-Shot Speech Synthesis
  Paper • 2404.14700 • Published • 32
- Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
  Paper • 2306.15687 • Published
- NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
  Paper • 2403.03100 • Published • 38
- Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
  Paper • 2404.09956 • Published • 12
Collections
Discover the best community collections!
Collections including paper arxiv:2403.03100 
						
					
- SaulLM-7B: A pioneering Large Language Model for Law
  Paper • 2403.03883 • Published • 88
- NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
  Paper • 2403.03100 • Published • 38
- Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
  Paper • 2403.09704 • Published • 33
- Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
  Paper • 2404.07973 • Published • 32

- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
  Paper • 2402.14797 • Published • 21
- Subobject-level Image Tokenization
  Paper • 2402.14327 • Published • 19
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
  Paper • 2402.14905 • Published • 134
- GPTVQ: The Blessing of Dimensionality for LLM Quantization
  Paper • 2402.15319 • Published • 22

- FastPitch: Parallel Text-to-speech with Pitch Prediction
  Paper • 2006.06873 • Published
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
  Paper • 2010.05646 • Published
- Tacotron: Towards End-to-End Speech Synthesis
  Paper • 1703.10135 • Published
- Parallel Tacotron: Non-Autoregressive and Controllable TTS
  Paper • 2010.11439 • Published