stair-lab/code_insights_csv • Viewer • Updated • 3.07M • 29 • 1
stair-lab/nonmyopia_results • Updated • 96.4k
stair-lab/code_insights_results • Preview • Updated • 112
Viewer • Updated • 404 • 86
Viewer • Updated • 21.2k • 7
stair-lab/cultural_value_understanding_wvs • Viewer • Updated • 1k • 13
stair-lab/chatbot_arena_embedding • Viewer • Updated • 323k • 8
Viewer • Updated • 23.3k • 19
stair-lab/zeroshot_evaluator • Viewer • Updated • 1M • 6
stair-lab/zero_shot_evaluator_openllm_val • Preview • Updated • 10
stair-lab/zero_evaluator_agentic • Viewer • Updated • 34.7k • 8
stair-lab/zero_shot_open_llm_leaderboard • Viewer • Updated • 74.6M • 122
stair-lab/irsl_downstream_resmat1_fullinfo • Updated • 67
stair-lab/irsl_testtime_resmat1
stair-lab/irsl_downstream_resmat1_prob • Updated • 11
stair-lab/deprecated_2choice_irsl_downstream_resmat1
stair-lab/deprecated_2choice_irsl_downstream_resmat1_fullinfo • Updated • 15
Preview • Updated • 1.06k
stair-lab/irsl_testtime_resmat2
stair-lab/irsl_downstream_resmat1_binary • Updated • 64
stair-lab/information-gathering • Preview • Updated • 25
stair-lab/denoise_eval_query • Preview • Updated • 422
stair-lab/deval_helm_hyperturing1 • Updated • 593
stair-lab/fantastic_bugs_result • Viewer • Updated • 405k • 15
stair-lab/platinum_detect • Viewer • Updated • 282 • 160
stair-lab/fantastic_bugs_result_deprecated • Preview • Updated • 81
stair-lab/monkey_query_pre • Updated • 247
stair-lab/one_question_less_samples • Viewer • Updated • 2.34k • 9
Viewer • Updated • 5.69M • 61 • 1