Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.03100

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Paper • 2306.15687 • Published Jun 23, 2023
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 12

SaulLM-7B: A pioneering Large Language Model for Law

Paper • 2403.03883 • Published Mar 6, 2024 • 88
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Paper • 2403.09704 • Published Mar 8, 2024 • 33
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11, 2024 • 32

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
mistralai/Mistral-Small-24B-Base-2501

24B • Updated Jul 28 • 14.6k • 258

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Paper • 2402.14797 • Published Feb 22, 2024 • 21
Subobject-level Image Tokenization

Paper • 2402.14327 • Published Feb 22, 2024 • 19
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22, 2024 • 134
GPTVQ: The Blessing of Dimensionality for LLM Quantization

Paper • 2402.15319 • Published Feb 23, 2024 • 22

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads

Paper • 2408.05101 • Published Aug 9, 2024 • 8

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
Wukong: Towards a Scaling Law for Large-Scale Recommendation

Paper • 2403.02545 • Published Mar 4, 2024 • 17

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Paper • 2403.08764 • Published Mar 13, 2024 • 36

Text to Speech Architectures

FastPitch: Parallel Text-to-speech with Pitch Prediction

Paper • 2006.06873 • Published Jun 11, 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Paper • 2010.05646 • Published Oct 12, 2020
Tacotron: Towards End-to-End Speech Synthesis

Paper • 1703.10135 • Published Mar 29, 2017
Parallel Tacotron: Non-Autoregressive and Controllable TTS

Paper • 2010.11439 • Published Oct 22, 2020

Aria Everyday Activities Dataset

Paper • 2402.13349 • Published Feb 20, 2024 • 31
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 141

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23, 2024 • 32
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Paper • 2306.15687 • Published Jun 23, 2023
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15, 2024 • 12

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads

Paper • 2408.05101 • Published Aug 9, 2024 • 8

SaulLM-7B: A pioneering Large Language Model for Law

Paper • 2403.03883 • Published Mar 6, 2024 • 88
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Paper • 2403.09704 • Published Mar 8, 2024 • 33
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11, 2024 • 32

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
Wukong: Towards a Scaling Law for Large-Scale Recommendation

Paper • 2403.02545 • Published Mar 4, 2024 • 17

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Paper • 2403.08764 • Published Mar 13, 2024 • 36

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
mistralai/Mistral-Small-24B-Base-2501

24B • Updated Jul 28 • 14.6k • 258

Text to Speech Architectures

FastPitch: Parallel Text-to-speech with Pitch Prediction

Paper • 2006.06873 • Published Jun 11, 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Paper • 2010.05646 • Published Oct 12, 2020
Tacotron: Towards End-to-End Speech Synthesis

Paper • 1703.10135 • Published Mar 29, 2017
Parallel Tacotron: Non-Autoregressive and Controllable TTS

Paper • 2010.11439 • Published Oct 22, 2020

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Paper • 2402.14797 • Published Feb 22, 2024 • 21
Subobject-level Image Tokenization

Paper • 2402.14327 • Published Feb 22, 2024 • 19
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22, 2024 • 134
GPTVQ: The Blessing of Dimensionality for LLM Quantization

Paper • 2402.15319 • Published Feb 23, 2024 • 22

Aria Everyday Activities Dataset

Paper • 2402.13349 • Published Feb 20, 2024 • 31
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Paper • 2403.03100 • Published Mar 5, 2024 • 38
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 141

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs