AI & ML interests

None defined yet.

Recent Activity

sergiopaniego in trl-lib/documentation-images about 19 hours ago:
Upload text_arena_evals.png (#2, opened about 19 hours ago by burtenshaw)
sergiopaniego posted an update 5 days ago
Meet OpenEnv 👋, an open ecosystem of environments for intelligent agents. Build, share, and test agents safely and consistently.

Ideal for training with TRL (we include examples 🤓), deployment, and community collaboration via the HF Hub.

Blog: https://huggingface.co/blog/openenv
Hub for Environments: openenv
OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
Try it out using TRL: https://huggingface.co/docs/trl/main/en/openenv
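
To give a feel for the pattern, here's a rough sketch of the reset/step loop these environments expose. Every name below (the envs.echo_env module, EchoEnv, EchoAction, from_docker_image) is an assumption for illustration, not OpenEnv's confirmed API; see the repo for the real interface.

```python
# Hypothetical sketch of an OpenEnv-style agent loop; all names here are
# assumptions for illustration -- check the OpenEnv repo for the actual API.
from envs.echo_env import EchoEnv, EchoAction  # assumed module path

env = EchoEnv.from_docker_image("echo-env:latest")  # assumed: starts the env in Docker
result = env.reset()                                # begin a fresh episode
for _ in range(3):
    result = env.step(EchoAction(message="Hello, environment!"))
    print(result.observation, result.reward, result.done)
env.close()
```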
sergiopaniego posted an update 12 days ago
New drop! 💥 The VLM Object Understanding Comparison Space now runs with Qwen3-VL-4B and moondream3.

You can compare how models reason about images 🧠

Bonus: thanks to @ariG23498, you now get auto-suggested prompts to explore faster.

Let’s gooo

sergiopaniego/vlm_object_understanding
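
If you'd rather poke at one of these models directly, here's a minimal transformers sketch of the same kind of object-understanding prompt; the checkpoint id and image URL are assumptions for illustration.

```python
# A minimal sketch of probing a VLM's object understanding with transformers.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-4B-Instruct"  # assumed Hub id for Qwen3-VL-4B
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/street.jpg"},  # placeholder image
        {"type": "text", "text": "List the objects in this image and where they are."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```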
sergiopaniego posted an update 14 days ago
@Qwen released their new small and dense VLMs (Qwen3-VL).

They're incredibly capable and among my all-time favourite VLMs.

🤗 We’ve prepared some resources to help you get started.

> Fine-tune Qwen3-VL-4B with SFT or GRPO (free Colab notebooks; a minimal SFT sketch follows after this list):
> SFT: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_qwen_vl.ipynb
> GRPO: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_qwen3_vl.ipynb

> Compare object detection vs. Moondream3:
sergiopaniego/vlm_object_understanding

> Fine-tune from the CLI using TRL:
https://github.com/kashif/Qwen3-VL/blob/trl-sft/qwen-vl-finetune/README.md#trl-based-training-single-gpu
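
For a taste of what the SFT notebook covers, here's a minimal TRL sketch; the dataset and hyperparameters are illustrative assumptions, not the notebook's exact setup.

```python
# A minimal VLM SFT sketch with TRL (assumes a recent TRL with native VLM
# support); dataset and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed dataset: a small conversational image+text dataset.
dataset = load_dataset("HuggingFaceH4/llava-instruct-mix-vsft", split="train[:1%]")

trainer = SFTTrainer(
    model="Qwen/Qwen3-VL-4B-Instruct",  # assumed Hub id
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="qwen3-vl-4b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        bf16=True,
    ),
)
trainer.train()
```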
sergiopaniego posted an update 19 days ago
Super nice intro to fine-tuning with TRL, just dropped by @google (runs free on Colab)!

They use SFT + QLoRA to fine-tune the tiny Gemma 3 270M model for emoji generation.

Here’s what the fine-tuned model generates for the prompt: “I'm learning to tweet” → 🐦🗣💻

Colab: https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Demos/Emoji-Gemma-on-Web/resources/Fine_tune_Gemma_3_270M_for_emoji_generation.ipynb
Try it out: google/emoji-gemma
Learn more: https://developers.googleblog.com/en/own-your-ai-fine-tune-gemma-3-270m-for-on-device/
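
The recipe boils down to SFT with a 4-bit base model plus LoRA adapters. Here's a minimal sketch in that spirit; the dataset and hyperparameters are illustrative assumptions, not the notebook's exact values.

```python
# A minimal SFT + QLoRA sketch; dataset and hyperparameters are illustrative.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train[:1%]")  # placeholder dataset

trainer = SFTTrainer(
    model="google/gemma-3-270m-it",
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
    args=SFTConfig(
        output_dir="gemma-270m-qlora",
        model_init_kwargs={
            # The "Q" in QLoRA: load the frozen base model in 4-bit.
            "quantization_config": BitsAndBytesConfig(
                load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
            ),
        },
        per_device_train_batch_size=4,
        num_train_epochs=1,
    ),
)
trainer.train()
```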
sergiopaniego posted an update 22 days ago
Online training methods (e.g., GRPO) require real-time generation, a compute- and memory-heavy bottleneck.

TRL has built-in vLLM support, and in this new recipe we show how to leverage it for efficient online training. Run it on Colab ⚡, then scale to multi-GPU/multi-node!

🧑‍🍳 recipe: https://huggingface.co/learn/cookbook/grpo_vllm_online_training
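
At its core, the recipe switches the trainer's generation backend to vLLM. A minimal sketch, with a toy model, dataset, and reward standing in for the recipe's actual choices:

```python
# A minimal GRPO + vLLM sketch with TRL; model, dataset, and the toy reward
# are illustrative assumptions, not the recipe's actual setup.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train[:1%]")

def reward_brevity(completions, **kwargs):
    # Toy reward: prefer shorter completions.
    return [-len(c) / 100 for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_brevity,
    train_dataset=dataset,
    args=GRPOConfig(
        output_dir="grpo-vllm",
        use_vllm=True,         # generate with vLLM instead of transformers
        vllm_mode="colocate",  # run vLLM on the same GPUs as training
        num_generations=8,
        per_device_train_batch_size=8,
    ),
)
trainer.train()
```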
sergiopaniego posted an update 23 days ago
A few days ago, Thinking Machines Lab released “LoRA Without Regret”, showing that LoRA can match full fine-tuning performance when configured right.

Naturally, we decided to reproduce the results with TRL and release a guide!

https://huggingface.co/docs/trl/main/en/lora_without_regret
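
For flavor, here's the kind of peft config such a setup revolves around: adapters on all linear layers rather than attention-only, one of the ingredients the guide emphasizes. The values are illustrative; see the guide for the configurations it actually tests.

```python
# A sketch of a "LoRA on all linear layers" config via peft; values illustrative.
from peft import LoraConfig

peft_config = LoraConfig(
    r=16,                         # rank; illustrative choice
    lora_alpha=32,
    target_modules="all-linear",  # adapters on every linear layer, incl. MLPs
    task_type="CAUSAL_LM",
)
# Pass peft_config to a TRL trainer (e.g., SFTTrainer) to train with LoRA.
```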
sergiopaniego posted an update about 1 month ago
You need to try this tool! 🫡

My colleague @Molbap built an interactive HF Space to explore the modular support of open models in transformers over time.

👀 You’ll spot things like 🦙 Llama serving as the base definition for many models, or which models could become modular next.

Try it: Molbap/transformers-modular-refactor
sergiopaniego posted an update about 1 month ago
How fast can you create an endpoint in Hugging Face Inference Endpoints, with a new model + vLLM, to deploy a state-of-the-art OCR model?

Let’s break it down step by step.

1️⃣ Create your endpoint
Go to Hugging Face Endpoints → + NEW
Select Deploy from Hub → rednote-hilab/dots.ocr → Configure 🛠️

2️⃣ Configure hardware & container
Pick hardware: AWS / GPU / L4 ⚡
Set container: vLLM 🐇
Click Create ✅

3️⃣ Update endpoint settings
Container: set Container URI to vllm/vllm-openai:nightly → Update
Advanced: add the flag --trust-remote-code → Update ⚠️

4️⃣ Run inference
Download the script 📝: ariG23498/useful-scripts
Set your HF_TOKEN and update base_url in the script.
Run it. ✅

Your OCR model is now live via HF Inference Endpoints!
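
The inference step amounts to an OpenAI-compatible chat call against your endpoint (vLLM serves that API); here's a minimal sketch, where base_url and the image URL are placeholders and the actual downloaded script may differ.

```python
# A minimal OpenAI-compatible inference sketch against the endpoint;
# base_url and the image URL are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-ENDPOINT.endpoints.huggingface.cloud/v1",  # your endpoint URL
    api_key=os.environ["HF_TOKEN"],
)

response = client.chat.completions.create(
    model="rednote-hilab/dots.ocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/doc.png"}},
            {"type": "text", "text": "Extract all the text in this document."},
        ],
    }],
)
print(response.choices[0].message.content)
```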
sergiopaniego posted an update about 1 month ago
💥 Tons of new material just landed in the smol-course! 🧑‍💻

> evaluation
> alignment
> VLMs
> quizzes
> assignments!
> certificates! 👩‍🎓

go learn! 👉 https://huggingface.co/learn/smol-course/unit0/1
sergiopaniego posted an update about 1 month ago
This summer TRL leveled up for multimodal alignment 🌞

✅ New VLM alignment methods (MPO, GRPO, GSPO)
✅ Extended RLOO & Online DPO for VLMs
✅ Native SFT support
✅ Ready-to-use training scripts

🔗 https://huggingface.co/blog/trl-vlm-alignment
sergiopaniego posted an update about 1 month ago
Training long-context LLMs is getting easier!

TRL now supports Context Parallelism (CP), letting you scale sequence length across multiple GPUs, and even multi-node setups, seamlessly 💆
Combine TRL with accelerate and you can run it out of the box!

With 8 GPUs, CP enables 300k+ token sequences while keeping throughput reasonable.
It works for both full fine-tuning and LoRA, unlocking context lengths that used to hit OOM 📈

Check out the full guide here 👉 https://huggingface.co/docs/trl/main/en/distributing_training#context-parallelism

If you want to learn more about Context Parallelism, check out the Ultrascale Playbook 👉 nanotron/ultrascale-playbook