FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory Paper • 2510.02335 • Published Sep 26 • 1
A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning Paper • 2510.15444 • Published 11 days ago • 139
You Only Submit One Image to Find the Most Suitable Generative Model Paper • 2412.12232 • Published Dec 16, 2024 • 1
TabFSBench: Tabular Benchmark for Feature Shifts in Open Environments Paper • 2501.18935 • Published Jan 31
VC Search: Bridging the Gap Between Well-Defined and Ill-Defined Problems in Mathematical Reasoning Paper • 2406.05055 • Published Jun 7, 2024 • 1
LawGPT: A Chinese Legal Knowledge-Enhanced Large Language Model Paper • 2406.04614 • Published Jun 7, 2024 • 2
USB: A Unified Semi-supervised Learning Benchmark for Classification Paper • 2208.07204 • Published Aug 12, 2022
Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning Paper • 2502.00511 • Published Feb 1 • 12
LawGPT: Knowledge-Guided Data Generation and Its Application to Legal LLM Paper • 2502.06572 • Published Feb 10 • 1
ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning Paper • 2412.13682 • Published Dec 18, 2024 • 7