Gaia2 and ARE: Empowering the community to study agents
LLM evaluation
Explore and discover all leaderboards from the HF community
A space to view and inspect all the tasks in lighteval
Launch and monitor model evaluation jobs
Generate a command to run model evaluations