Literate Goggles

literate-goggles

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 6 days ago

FARMER: Flow AutoRegressive Transformer over Pixels

Paper • 2510.23588 • Published 7 days ago • 55

upvoted 3 papers 7 days ago

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Paper • 2106.06103 • Published Jun 11, 2021 • 4

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published 17 days ago • 86

Video-As-Prompt: Unified Semantic Control for Video Generation

Paper • 2510.20888 • Published 11 days ago • 44

upvoted an article 8 days ago

Building the Open Agent Ecosystem Together: Introducing OpenEnv

Article • 12 days ago • 114

upvoted a paper 13 days ago

UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

Paper • 2510.13344 • Published 19 days ago • 61

upvoted a paper 27 days ago

LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec

Paper • 2410.15764 • Published Oct 21, 2024 • 1

upvoted 2 papers 29 days ago

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

Paper • 2409.00750 • Published Sep 1, 2024 • 5

RLP: Reinforcement as a Pretraining Objective

Paper • 2510.01265 • Published Sep 26 • 39

upvoted 5 papers about 1 month ago

Training Agents Inside of Scalable World Models

Paper • 2509.24527 • Published Sep 29 • 6

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

Paper • 2509.22220 • Published Sep 26 • 64

Multiplayer Nash Preference Optimization

Paper • 2509.23102 • Published Sep 27 • 61

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

Paper • 2509.24006 • Published Sep 28 • 115

Advancing Speech Understanding in Speech-Aware Language Models with GRPO

Paper • 2509.16990 • Published Sep 21 • 18

upvoted 6 papers about 2 months ago

MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis

Paper • 2502.18924 • Published Feb 26 • 16

TTS-1 Technical Report

Paper • 2507.21138 • Published Jul 22 • 12

Fast Text-to-Audio Generation with Adversarial Post-Training

Paper • 2505.08175 • Published May 13 • 25

Single-stream Policy Optimization

Paper • 2509.13232 • Published Sep 16 • 33

Marco-Voice Technical Report

Paper • 2508.02038 • Published Aug 4 • 16

OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Paper • 2509.12201 • Published Sep 15 • 103