ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models Paper • 2502.09696 • Published Feb 13, 2025 • 43
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? Paper • 2411.05000 • Published Nov 7, 2024 • 22
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models Paper • 2408.11817 • Published Aug 21, 2024 • 9
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation Paper • 2405.08807 • Published May 14, 2024
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs Paper • 2311.14656 • Published Nov 24, 2023 • 2
GPT4GEO: How a Language Model Sees the World's Geography Paper • 2306.00020 • Published May 30, 2023 • 1
SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models Paper • 2304.11619 • Published Apr 23, 2023 • 2