Clem 🤗 PRO

clem

http://huggingface.co

AI & ML interests

multi-modal, time-series, biology and chemistry

Recent Activity

liked a Space 1 day ago

8bitkick/reachy_mini_3d_web_viz

upvoted an article 1 day ago

Easily Build and Share ROCm Kernels with Hugging Face

liked a model 5 days ago

Photoroom/prx-1024-t2i-beta

liked a Space 1 day ago

Reachy Mini 3D Web Visualizer

🦾

Visualize Reachy Mini's 3D model and joint data

upvoted an article 1 day ago

Article

Easily Build and Share ROCm Kernels with Hugging Face

1 day ago

liked a model 5 days ago

Photoroom/prx-1024-t2i-beta

Text-to-Image • Updated 5 days ago • 481 • 55

upvoted an article 5 days ago

Article

Building for an Open Future - our new partnership with Google Cloud

6 days ago

upvoted an article 6 days ago

Article

Intelligence Per Watt: A Study of Local Intelligence Efficiency

6 days ago

liked a dataset 7 days ago

builddotai/Egocentric-10K

Updated 8 days ago • 43.4k • 237

upvoted a collection 12 days ago

Kimi-K2

Collection

Moonshot's MoE LLMs with 1 trillion parameters, exceptional on agentic intelligence • 5 items • Updated 4 days ago • 146

liked a model 12 days ago

moonshotai/Kimi-K2-Thinking

Text Generation • Updated 10 days ago • 153k • 1.27k

liked a dataset 13 days ago

nvidia/PhysicalAI-Autonomous-Vehicles

Updated 21 days ago • 89.1k • 348

liked 5 models 14 days ago

reacted to unmodeled-tyler's post with ❤️🚀 14 days ago

Post

664

New Datasets Published:
vanta-research/poetic-imagery-small
vanta-research/excitement-small

We are open sourcing two of our datasets today, which were used in the training of Apollo Astralis 8B and 4B.

The first dataset, poetic-imagery-small, is designed to give the model's responses a bit of "depth" in order to encourage curiosity and thought from the user.

Additionally, the excitement-small dataset is designed to teach the model how to use "excited" language conversationally. This dataset was used on both Apollo Astralis models, which effectively demonstrate general excitement during user interaction.

VANTA Research is an AI safety project which aims to research and develop language models aligned for all types of thinking. These datasets were created in line with that mission and with rigorous AI safety standards.

reacted to nouamanetazi's post with ❤️🚀👍🤗 14 days ago

Post

3781

After training 𝐒𝐦𝐨𝐥𝐋𝐌𝟑 on 𝟑𝟖𝟒 𝐇𝟏𝟎𝟎𝐬 for nearly a month, I've come to realize something most people overlook: 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐢𝐬 𝐭𝐡𝐞 𝐦𝐚𝐤𝐞-𝐨𝐫-𝐛𝐫𝐞𝐚𝐤 𝐟𝐚𝐜𝐭𝐨𝐫 𝐢𝐧 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious 𝐍𝐂𝐂𝐋 𝐞𝐫𝐫𝐨𝐫𝐬, or when your expensive GPU cluster is running at 𝟔𝟎% 𝐞𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲, the problem isn't your model. It's most probably a 𝐦𝐢𝐬𝐮𝐬𝐞 𝐨𝐟 𝐭𝐡𝐞 𝐡𝐚𝐫𝐝𝐰𝐚𝐫𝐞. 🛠️

Questions that seemed simple but had no clear answers: Why is 𝐌𝐨𝐄 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐬𝐥𝐨𝐰𝐞𝐫 𝐭𝐡𝐚𝐧 𝐝𝐞𝐧𝐬𝐞 𝐦𝐨𝐝𝐞𝐥𝐬? Which 𝐍𝐂𝐂𝐋 𝐟𝐥𝐚𝐠𝐬 should we actually set? How often should we checkpoint without killing throughput?

That's why we built 𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the 𝐢𝐧𝐟𝐫𝐚𝐬𝐭𝐫𝐮𝐜𝐭𝐮𝐫𝐞 𝐥𝐚𝐲𝐞𝐫 that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: 𝐇𝐁𝐌𝟑 𝐡𝐢𝐭𝐭𝐢𝐧𝐠 𝟑 𝐓𝐁/𝐬, 𝐍𝐕𝐋𝐢𝐧𝐤 𝟒.𝟎 𝐫𝐞𝐚𝐜𝐡𝐢𝐧𝐠 𝟕𝟖𝟔 𝐆𝐁/𝐬, 𝐏𝐂𝐈𝐞 𝐆𝐞𝐧𝟒 𝐚𝐭 𝟏𝟒.𝟐 𝐆𝐁/𝐬. Then we ran collective operations across 𝟏𝟐𝟖 𝐆𝐏𝐔𝐬 (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 𝟒𝟖𝟎 𝐆𝐁/𝐬 on a single node to 𝟑𝟐𝟎-𝟑𝟓𝟎 𝐆𝐁/𝐬 across 16 nodes.

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

𝐓𝐡𝐞 𝐒𝐦𝐨𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐏𝐥𝐚𝐲𝐛𝐨𝐨𝐤: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the HuggingFace team
