AI & ML interests

None defined yet.

Recent Activity

sergiopaniego in trl-lib/documentation-images about 19 hours ago:
Upload text_arena_evals.png (#2, opened about 19 hours ago by burtenshaw)
sergiopaniego posted an update 5 days ago
Meet OpenEnv 👋, an open ecosystem of environments for intelligent agents. Build, share, and test agents safely and consistently.

Ideal for training with TRL (we include examples 🤓), deployment, and community collaboration via the HF Hub.

Blog: https://huggingface.co/blog/openenv
Hub for Environments: openenv
OpenEnv repo: https://github.com/meta-pytorch/OpenEnv
Try it out using TRL: https://huggingface.co/docs/trl/main/en/openenv
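
To give a feel for the pattern, here's a rough sketch of the reset/step loop these environments expose. Every name below (the envs.echo_env module, EchoEnv, EchoAction, from_docker_image) is an assumption for illustration, not OpenEnv's confirmed API; see the repo for the real interface.

```python
# Hypothetical sketch of an OpenEnv-style agent loop; all names here are
# assumptions for illustration -- check the OpenEnv repo for the actual API.
from envs.echo_env import EchoEnv, EchoAction  # assumed module path

env = EchoEnv.from_docker_image("echo-env:latest")  # assumed: starts the env in Docker
result = env.reset()                                # begin a fresh episode
for _ in range(3):
    result = env.step(EchoAction(message="Hello, environment!"))
    print(result.observation, result.reward, result.done)
env.close()
```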
sergiopaniego posted an update 12 days ago
New drop! 💥 The VLM Object Understanding Comparison Space now runs with Qwen3-VL-4B and moondream3.

You can compare how models reason about images 🧠

Bonus: thanks to @ariG23498, you now get auto-suggested prompts to explore faster.

Let’s gooo

sergiopaniego/vlm_object_understanding
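
If you'd rather poke at one of these models directly, here's a minimal transformers sketch of the same kind of object-understanding prompt; the checkpoint id and image URL are assumptions for illustration.

```python
# A minimal sketch of probing a VLM's object understanding with transformers.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-4B-Instruct"  # assumed Hub id for Qwen3-VL-4B
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/street.jpg"},  # placeholder image
        {"type": "text", "text": "List the objects in this image and where they are."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```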
sergiopaniego posted an update 14 days ago
@Qwen released their new small and dense VLMs (Qwen3-VL).

They're incredibly capable and among my all-time favourite VLMs.

🤗 We’ve prepared some resources to help you get started.

> Fine-tune Qwen3-VL-4B with SFT or GRPO (free Colab notebooks; a minimal SFT sketch follows after this list):
> SFT: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_qwen_vl.ipynb
> GRPO: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_qwen3_vl.ipynb

> Compare object detection vs. Moondream3:
sergiopaniego/vlm_object_understanding

> Fine-tune from the CLI using TRL:
https://github.com/kashif/Qwen3-VL/blob/trl-sft/qwen-vl-finetune/README.md#trl-based-training-single-gpu
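
For a taste of what the SFT notebook covers, here's a minimal TRL sketch; the dataset and hyperparameters are illustrative assumptions, not the notebook's exact setup.

```python
# A minimal VLM SFT sketch with TRL (assumes a recent TRL with native VLM
# support); dataset and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed dataset: a small conversational image+text dataset.
dataset = load_dataset("HuggingFaceH4/llava-instruct-mix-vsft", split="train[:1%]")

trainer = SFTTrainer(
    model="Qwen/Qwen3-VL-4B-Instruct",  # assumed Hub id
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="qwen3-vl-4b-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        bf16=True,
    ),
)
trainer.train()
```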
sergiopaniego posted an update 19 days ago
Super nice intro to fine-tuning with TRL, just dropped by @google (runs free on Colab)!

They use SFT + QLoRA to fine-tune the tiny Gemma 3 270M model for emoji generation.

Here’s what the fine-tuned model generates for the prompt: “I'm learning to tweet” → 🐦🗣💻

Colab: https://colab.research.google.com/github/google-gemini/gemma-cookbook/blob/main/Demos/Emoji-Gemma-on-Web/resources/Fine_tune_Gemma_3_270M_for_emoji_generation.ipynb
Try it out: google/emoji-gemma
Learn more: https://developers.googleblog.com/en/own-your-ai-fine-tune-gemma-3-270m-for-on-device/
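
The recipe boils down to SFT with a 4-bit base model plus LoRA adapters. Here's a minimal sketch in that spirit; the dataset and hyperparameters are illustrative assumptions, not the notebook's exact values.

```python
# A minimal SFT + QLoRA sketch; dataset and hyperparameters are illustrative.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train[:1%]")  # placeholder dataset

trainer = SFTTrainer(
    model="google/gemma-3-270m-it",
    train_dataset=dataset,
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
    args=SFTConfig(
        output_dir="gemma-270m-qlora",
        model_init_kwargs={
            # The "Q" in QLoRA: load the frozen base model in 4-bit.
            "quantization_config": BitsAndBytesConfig(
                load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
            ),
        },
        per_device_train_batch_size=4,
        num_train_epochs=1,
    ),
)
trainer.train()
```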
sergiopaniego posted an update 22 days ago
Online training methods (e.g., GRPO) require real-time generation, a compute- and memory-heavy bottleneck.

TRL has built-in vLLM support, and in this new recipe we show how to leverage it for efficient online training. Run it on Colab ⚡, then scale to multi-GPU/multi-node!

🧑‍🍳 recipe: https://huggingface.co/learn/cookbook/grpo_vllm_online_training
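
At its core, the recipe switches the trainer's generation backend to vLLM. A minimal sketch, with a toy model, dataset, and reward standing in for the recipe's actual choices:

```python
# A minimal GRPO + vLLM sketch with TRL; model, dataset, and the toy reward
# are illustrative assumptions, not the recipe's actual setup.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train[:1%]")

def reward_brevity(completions, **kwargs):
    # Toy reward: prefer shorter completions.
    return [-len(c) / 100 for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_brevity,
    train_dataset=dataset,
    args=GRPOConfig(
        output_dir="grpo-vllm",
        use_vllm=True,         # generate with vLLM instead of transformers
        vllm_mode="colocate",  # run vLLM on the same GPUs as training
        num_generations=8,
        per_device_train_batch_size=8,
    ),
)
trainer.train()
```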
sergiopaniego posted an update 23 days ago
A few days ago, Thinking Machines Lab released “LoRA Without Regret”, showing that LoRA can match full fine-tuning performance when configured right.

Naturally, we decided to reproduce the results with TRL and release a guide!

https://huggingface.co/docs/trl/main/en/lora_without_regret
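
For flavor, here's the kind of peft config such a setup revolves around: adapters on all linear layers rather than attention-only, one of the ingredients the guide emphasizes. The values are illustrative; see the guide for the configurations it actually tests.

```python
# A sketch of a "LoRA on all linear layers" config via peft; values illustrative.
from peft import LoraConfig

peft_config = LoraConfig(
    r=16,                         # rank; illustrative choice
    lora_alpha=32,
    target_modules="all-linear",  # adapters on every linear layer, incl. MLPs
    task_type="CAUSAL_LM",
)
# Pass peft_config to a TRL trainer (e.g., SFTTrainer) to train with LoRA.
```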
sergiopaniego posted an update about 1 month ago
You need to try this tool! 🫡

My colleague @Molbap built an interactive HF Space to explore the modular support of open models in transformers over time.

👀 You’ll spot things like 🦙 Llama serving as the base definition for many models, or which models could become modular next.

Try it: Molbap/transformers-modular-refactor
sergiopaniego posted an update about 1 month ago
How fast can you create an endpoint in Hugging Face Inference Endpoints, with a new model + vLLM, to deploy a state-of-the-art OCR model?

Let’s break it down step by step.

1️⃣ Create your endpoint
Go to Hugging Face Endpoints → + NEW
Select Deploy from Hub → rednote-hilab/dots.ocr → Configure 🛠️

2️⃣ Configure hardware & container
Pick hardware: AWS / GPU / L4 ⚡
Set container: vLLM 🐇
Click Create ✅

3️⃣ Update endpoint settings
Container: set Container URI to vllm/vllm-openai:nightly → Update
Advanced: add the flag --trust-remote-code → Update ⚠️

4️⃣ Run inference
Download the script 📝: ariG23498/useful-scripts
Set your HF_TOKEN and update base_url in the script.
Run it. ✅

Your OCR model is now live via HF Inference Endpoints!
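
The inference step amounts to an OpenAI-compatible chat call against your endpoint (vLLM serves that API); here's a minimal sketch, where base_url and the image URL are placeholders and the actual downloaded script may differ.

```python
# A minimal OpenAI-compatible inference sketch against the endpoint;
# base_url and the image URL are placeholders.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-ENDPOINT.endpoints.huggingface.cloud/v1",  # your endpoint URL
    api_key=os.environ["HF_TOKEN"],
)

response = client.chat.completions.create(
    model="rednote-hilab/dots.ocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/doc.png"}},
            {"type": "text", "text": "Extract all the text in this document."},
        ],
    }],
)
print(response.choices[0].message.content)
```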
sergiopaniego posted an update about 1 month ago
💥 Tons of new material just landed in the smol-course! 🧑‍💻

> evaluation
> alignment
> VLMs
> quizzes
> assignments!
> certificates! 👩‍🎓

go learn! 👉 https://huggingface.co/learn/smol-course/unit0/1
sergiopaniego posted an update about 1 month ago
This summer TRL leveled up for multimodal alignment 🌞

✅ New VLM alignment methods (MPO, GRPO, GSPO)
✅ Extended RLOO & Online DPO for VLMs
✅ Native SFT support
✅ Ready-to-use training scripts

🔗 https://huggingface.co/blog/trl-vlm-alignment
sergiopaniego posted an update about 1 month ago
Training long-context LLMs is getting easier!

TRL now supports Context Parallelism (CP), letting you scale sequence length across multiple GPUs, and even multi-node setups, seamlessly 💆
Combine TRL with accelerate and you can run it out of the box!

With 8 GPUs, CP enables 300k+ token sequences while keeping throughput reasonable.
It works for both full fine-tuning and LoRA, unlocking context lengths that used to hit OOM 📈

Check out the full guide here 👉 https://huggingface.co/docs/trl/main/en/distributing_training#context-parallelism

If you want to learn more about Context Parallelism, check out the Ultrascale Playbook 👉 nanotron/ultrascale-playbook