The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation Paper • 2510.23393 • Published 28 days ago • 20
Diff-XYZ: A Benchmark for Evaluating Diff Understanding Paper • 2510.12487 • Published Oct 14 • 8
The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management Paper • 2508.21433 • Published Aug 29 • 7
Repository-Level Pre-Trained OpenCoder Collection All the checkpoints from Table 3 of the paper “On Pretraining for Project-Level Code Completion.” • 33 items • Updated Oct 17 • 3
PIPer: On-Device Environment Setup via Online Reinforcement Learning Paper • 2509.25455 • Published Sep 29 • 37
REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark Paper • 2406.11927 • Published Jun 17, 2024 • 11
Interface Design for Self-Supervised Speech Models Paper • 2406.12209 • Published Jun 18, 2024 • 8
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models Paper • 2406.12649 • Published Jun 18, 2024 • 16
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published Jun 17, 2024 • 33
Long Code Arena Collection All the resources for our Long Code Arena benchmark! • 13 items • Updated Mar 14 • 6
Long Code Arena: a Set of Benchmarks for Long-Context Code Models Paper • 2406.11612 • Published Jun 17, 2024 • 25