gg-hf-gm

community
Activity Feed

AI & ML interests

None defined yet.

danielhanchen 
posted an update 3 days ago
view post
Post
8241
We made a guide on how to run open LLMs in Claude Code, Codex and OpenClaw.

Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB RAM

Run with self-healing tool calls, code execution, web search via the Unsloth API endpoint and llama.cpp

Guide: https://unsloth.ai/docs/basics/api
danielhanchen 
posted an update 10 days ago
view post
Post
10672
Unsloth is now one of the top 10 most followed organizations on Hugging Face. 🤗🦥

Thanks so much for all the support!
Our HF page:
unsloth
  • 5 replies
·
mlabonne 
posted an update 11 days ago
view post
Post
1096
Big update to llm-datasets, my curated list of datasets and tools for post-training LLMs.

> Added many new datasets
> New "thinking" column
> Refreshed recommended tools.

Thanks to everyone who told me they used it for their research at ICLR, you motivated this update!
  • 2 replies
·
danielhanchen 
posted an update 16 days ago
danielhanchen 
posted an update 22 days ago
tomaarsen 
posted an update 29 days ago
view post
Post
903
🌐 I've just published Sentence Transformers v5.4 to make the project fully multimodal for embeddings and reranking. The release also includes a modular CrossEncoder, and automatic Flash Attention 2 input flattening. Details:

You can now use SentenceTransformer and CrossEncoder with text, images, audio, and video, with the same familiar API. That means you can compute embeddings for an image and a text query using model.encode(), compare them with model.similarity(), and it just works. Models like Qwen3-VL-Embedding-2B and jinaai/jina-reranker-m0 are supported out of the box.

Beyond multimodal, I also fully modularized the CrossEncoder class. It's now a torch.nn.Sequential of composable modules, just like SentenceTransformer has been. This unlocked support for generative rerankers (CausalLM-based models like mxbai-rerank-v2 and the Qwen3 rerankers) via a new LogitScore module, which wasn't possible before without custom code.

Also, Flash Attention 2 now automatically skips padding for text-only inputs. If your batch has a mix of short and long texts, this gives you a nice speedup and lower VRAM usage for free.

I wrote a blog post walking through the multimodal features with practical examples. Check it out if you want to get started, or just point your Agent to the URL: https://huggingface.co/blog/multimodal-sentence-transformers

This release has set up the groundwork for more easily introducing late-interaction models (both text-only and multimodal) into Sentence Transformers in the next major release. I'm looking forward to it!
danielhanchen 
posted an update about 1 month ago
danielhanchen 
posted an update about 1 month ago
danielhanchen 
posted an update about 1 month ago
view post
Post
2760
A new way to use Unsloth.

Coming soon...
danielhanchen 
posted an update about 1 month ago
view post
Post
931
You don’t need to set LLM parameters anymore! 🚀

llama.cpp uses only the context length + compute your local setup needs. Unsloth also auto-applies the correct model settings

Try in Unsloth Studio - now with precompiled llama.cpp binaries.

GitHub: https://github.com/unslothai/unsloth
  • 2 replies
·
danielhanchen 
posted an update about 2 months ago
view post
Post
3412
Introducing Unsloth Studio ✨
A new open-source web UI to train and run LLMs.

• Run models locally on Mac, Windows, Linux
• Train 500+ models 2x faster with 70% less VRAM
• Supports GGUF, vision, audio, embedding models
• Auto-create datasets from PDF, CSV, DOCX
• Self-healing tool calling and code execution
• Compare models side by side + export to GGUF

GitHub: https://github.com/unslothai/unsloth
Blog and Guide: https://unsloth.ai/docs/new/studio

Available now on Hugging Face, NVIDIA, Docker and Colab.
danielhanchen 
posted an update about 2 months ago
view post
Post
3934
We collaborated with NVIDIA to teach you about Reinforcement Learning and RL environments. 💚 Learn:

• Why RL environments matter + how to build them
• When RL is better than SFT
• GRPO and RL best practices
• How verifiable rewards and RLVR work

Blog: https://unsloth.ai/blog/rl-environments
  • 4 replies
·
alvarobartt 
posted an update 2 months ago
view post
Post
3697
Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr
danielhanchen 
posted an update 2 months ago
view post
Post
3458
100,000+ models trained with Unsloth have now been open-sourced on 🤗Hugging Face! 🦥

Here are the most popular ones you can run local:
1. TeichAI - GLM-4.7-Flash distilled from Claude 4.5 Opus (high)
2. Zed - Qwen Coder 7B fine-tuned for stronger coding
3. DavidAU - Llama-3.3-8B distilled from Claude 4.5 Opus (high)
4. huihui - gpt-oss made “abliberated”

Links to models:
1. TeichAI: TeichAI/GLM-4.7-Flash-Claude-Opus-4.5-High-Reasoning-Distill-GGUF
2. Zed: zed-industries/zeta
3. DavidAU: DavidAU/Llama3.3-8B-Instruct-Thinking-Claude-4.5-Opus-High-Reasoning
4. huihui: huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated

See all the 100K latest models fine-tuned with Unsloth here: https://huggingface.co/models?other=u
  • 2 replies
·
danielhanchen 
posted an update 3 months ago
danielhanchen 
posted an update 3 months ago
view post
Post
5225
We collaborated with Hugging Face to enable you to train MoE models 12× faster with 35% less VRAM via our new Triton kernels (no accuracy loss). 🤗

Train gpt-oss locally on 12.8GB VRAM with our free notebooks: https://unsloth.ai/docs/new/faster-moe
  • 1 reply
·
alvarobartt 
posted an update 3 months ago
view post
Post
3247
💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.
  • 1 reply
·
danielhanchen 
posted an update 3 months ago