Article Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation Sep 16 • 14
An efficient probabilistic hardware architecture for diffusion-like models Paper • 2510.23972 • Published 27 days ago • 3
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published 25 days ago • 42
Article 3+ Years of ML & Society at Hugging Face 25 days ago • 13
Article huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning 28 days ago • 67
Article Aligning to What? Rethinking Agent Generalization in MiniMax M2 24 days ago • 25
gpt-oss-safeguard Collection gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built upon gpt-oss • 2 items • Updated 25 days ago • 56
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs Paper • 2402.12030 • Published Feb 19, 2024 • 3
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 247
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine Paper • 2510.21614 • Published about 1 month ago • 22
Training language models to follow instructions with human feedback Paper • 2203.02155 • Published Mar 4, 2022 • 24
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math Paper • 2504.21233 • Published Apr 30 • 49
Bridging Offline and Online Reinforcement Learning for LLMs Paper • 2506.21495 • Published Jun 26 • 3