Blog, Articles, and discussions

Community Articles

We’re open-sourcing our text-to-image model and the process behind it

Text-to-image Architectural Experiments

Introducing Cogito v2.1

Apriel-H1: The Surprising Key to Distilling Efficient Reasoning Models

Projected Abliteration

AI Model Optimization More Flexible Than Ever

ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases

The Heterogeneous Feature of RoPE-based Attention in Long-Context LLMs

Uncensor any LLM with abliteration

KV Caching Explained: Optimizing Transformer Inference Efficiency

The Pharmome Map: a comprehensive public dataset for drug-target interaction modeling

The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

Norm-Preserving Biprojected Abliteration

Granite 4.0 Nano: Just how small can you go?

Why Did MiniMax M2 End Up as a Full Attention Model?

🧠 SQaLe: Enabling new Text-to-SQL models with our massive dataset

Join the AMD Open Robotics Hackathon

To Think or Not to Think: A Router for Hybrid LLMs

PEFT: Parameter-Efficient Fine-Tuning Methods for LLMs

Visualizing How VLMs Work

leaderboard, evaluation, nlp

Arabic Leaderboards: Introducing Arabic Instruction Following, Updating AraGen, and More

math-verify, open-llm-leaderboard, leaderboard

Fixing Open LLM Leaderboard with Math-Verify

February 14, 2025

nlp, research, leaderboard

The Open Arabic LLM Leaderboard 2

February 10, 2025

open-llm-leaderboard, leaderboard, energy_efficiency

CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard

January 9, 2025

leaderboard, research, collaboration

Evaluating Audio Reasoning with Big Bench Audio

December 20, 2024

leaderboard, evaluation, nlp

Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard

December 4, 2024

community, research, nlp

Letting Large Models Debate: The First Multilingual LLM Debate Competition

November 20, 2024

community, research, nlp

Introducing the Open Leaderboard for Japanese LLMs!

November 20, 2024

leaderboard, arena, collaboration

Judge Arena: Benchmarking LLMs as Evaluators

November 19, 2024

leaderboard, collaboration, community

Introducing the Open FinLLM Leaderboard

October 4, 2024

nlp, research, leaderboard

🇨🇿 BenCzechMark - Can your LLM Understand Czech?

October 1, 2024

ai4math, nlp, community

How NuminaMath Won the 1st AIMO Progress Prize

agents, smolagents, nlp

Our Transformers Code Agent beats the GAIA benchmark 🏅

leaderboard, research, collaboration

BigCodeBench: The Next Generation of HumanEval
