Miguel Guerrero PRO

apol · https://miguelguerrero.eu

AI & ML interests

nlp, avatars, gans, time series, memory, education, govtech

Recent Activity

upvoted an article about 1 hour ago

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

reacted to sergiopaniego's post with 🚀 about 1 hour ago

We just released a big blog surveying 16 OSS frameworks for async RL training of LLMs! We're building a new async GRPO trainer for TRL and as first step, we needed to understand how the ecosystem solves this problem today.

The problem: in synchronous RL training, generation dominates wall-clock time. 32K-token rollouts on a 32B model take hours while training GPUs sit completely idle. With reasoning models and agentic RL making rollouts longer and more variable, this only gets worse.

The ecosystem converged on the same fix: separate inference + training onto different GPU pools, rollout buffer, and async weight sync.

We compared 16 frameworks across 7 axes: orchestration, buffer design, weight sync, staleness management, partial rollouts, LoRA, and MoE support.

This survey is step one. The async GRPO trainer for TRL is next!

https://huggingface.co/blog/async-rl-training-landscape
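
The rollout buffer and staleness management mentioned in the post can be illustrated with a minimal sketch. It assumes each rollout is tagged with the version of the weights that generated it, and that the trainer discards rollouts that lag too far behind its current weights. The names `Rollout`, `RolloutBuffer`, and `max_staleness` are hypothetical and are not taken from TRL or from any of the surveyed frameworks.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Rollout:
    tokens: list[int]      # generated token ids (placeholder data)
    policy_version: int    # version of the weights that produced this rollout


class RolloutBuffer:
    """Bounded FIFO buffer that skips rollouts that are too stale (illustrative only)."""

    def __init__(self, capacity: int, max_staleness: int):
        # deque(maxlen=...) silently evicts the oldest entry when full
        self.queue = deque(maxlen=capacity)
        self.max_staleness = max_staleness

    def put(self, rollout: Rollout) -> None:
        # Called by the inference side whenever a rollout finishes
        self.queue.append(rollout)

    def get_batch(self, batch_size: int, trainer_version: int) -> list[Rollout]:
        # Called by the training side; stale rollouts are dropped, not returned
        batch = []
        while self.queue and len(batch) < batch_size:
            rollout = self.queue.popleft()
            if trainer_version - rollout.policy_version <= self.max_staleness:
                batch.append(rollout)
        return batch


# Usage: inference produced rollouts at versions 0, 0, 1, 2; the trainer is at version 2
buffer = RolloutBuffer(capacity=4, max_staleness=1)
for version in [0, 0, 1, 2]:
    buffer.put(Rollout(tokens=[version], policy_version=version))
batch = buffer.get_batch(batch_size=2, trainer_version=2)
```

In a real system the two sides would run on separate GPU pools and the buffer would sit behind a queue or RPC layer; this single-process version only shows the version-tagging idea.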

updated a Space 1 day ago

apol/spain-persona-research-observatory

apol's models (3)

apol/med-llm-triage-es

1B • Updated Feb 18 • 22 • 1

apol/test

Updated Mar 10, 2022

apol/dalle-mini

Text-to-Image • Updated Aug 17, 2021 • 11 • 9