xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics Paper • 2406.14553 • Published Jun 20, 2024
ViSTa Dataset: Do vision-language models understand sequential tasks? Paper • 2411.13211 • Published Nov 20, 2024
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs Paper • 2508.11383 • Published Aug 15, 2025