Steven Zheng PRO

Steveeeeeeen

AI & ML interests

speech & audio

Recent Activity

updated a dataset 1 day ago

Steveeeeeeen/whisper-leaderboard-evals

liked a Space 5 days ago

black-forest-labs/FLUX.2-dev

upvoted an article 5 days ago

Curating datasets directly on the Hub

upvoted an article 5 days ago

Curating datasets directly on the Hub

Article • 5 days ago • 21

upvoted a collection 6 days ago

Step-Audio-EditX

4 items • Updated 13 days ago • 10

upvoted an article 7 days ago

Continuous batching from first principles

Article • 8 days ago • 223

upvoted an article 11 days ago

Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks

Article • 12 days ago • 19

upvoted 3 papers 11 days ago

MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance

Paper • 2510.00499 • Published Oct 1 • 19

Drax: Speech Recognition with Discrete Flow Matching

Paper • 2510.04162 • Published Oct 5 • 27

Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement

Paper • 2510.23141 • Published Oct 27 • 4

upvoted 2 articles 12 days ago

Voice Cloning with Consent

Article • Oct 28 • 28

Introducing Cogito v2.1

Article • 13 days ago • 17

upvoted 5 articles 13 days ago

Granite 4.0 Nano: Just how small can you go?

Article • Oct 28 • 120

Aligning to What? Rethinking Agent Generalization in MiniMax M2

Article • Oct 30 • 26

Join the AMD Open Robotics Hackathon

Article • 19 days ago • 7

AI Model Optimization More Flexible Than Ever

Article • 15 days ago • 13

Apriel-H1: The Surprising Key to Distilling Efficient Reasoning Models

Article • 13 days ago • 25

upvoted a paper 13 days ago

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published 23 days ago • 125

upvoted an article 27 days ago

Llasa Goes RL: Training LLaSA with GRPO for Improved Prosody and Expressiveness

Article • 27 days ago • 10

upvoted 2 papers about 1 month ago

Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation

Paper • 2510.06961 • Published Oct 8 • 8

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17 • 88

upvoted an article about 2 months ago

High-Quality Datasets for Far-Field ASR (Treble Technologies x Hugging Face)

Article • Oct 13 • 16

upvoted a collection about 2 months ago

Ming-V2

10 items • Updated 21 days ago • 30