
Collections


Collections including paper arxiv:2510.13344

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

The first version of our unified audio and music generation model

HIT-TMG/UniMoE-Audio-Preview

7B • Updated 18 days ago • 94 • 8
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

Paper • 2510.13344 • Published 19 days ago • 61

Lychee-Uni-MoE-Audio

UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

Paper • 2510.13344 • Published 19 days ago • 61
HIT-TMG/UniMoE-Audio-Preview

7B • Updated 18 days ago • 94 • 8

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 34
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 27
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 126
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 22

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

Paper • 2405.18503 • Published May 28, 2024 • 9
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2405.20289 • Published May 30, 2024 • 11
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Paper • 2406.02897 • Published Jun 5, 2024 • 16
Audio Mamba: Bidirectional State Space Model for Audio Representation Learning

Paper • 2406.03344 • Published Jun 5, 2024 • 21

SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights

Paper • 2509.22944 • Published Sep 26 • 76
Robot Learning: A Tutorial

Paper • 2510.12403 • Published 19 days ago • 98
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

Paper • 2510.13344 • Published 19 days ago • 61
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

Paper • 2510.06308 • Published 26 days ago • 52

TTS & Speech to Text

Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction

Paper • 2510.03117 • Published about 1 month ago • 11
ResembleAI/chatterbox

Text-to-Speech • Updated Sep 23 • 861k • 1.25k
thewh1teagle/phonikud

0.3B • Updated Aug 24 • 210
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

Paper • 2510.13344 • Published 19 days ago • 61

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

about 13 hours ago

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 84
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

