-
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance
Paper • 2502.08127 • Published • 58 -
Vision Transformers Need Registers
Paper • 2309.16588 • Published • 83 -
HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
Paper • 2504.13072 • Published • 13 -
What are you sinking? A geometric approach on attention sink
Paper • 2508.02546 • Published • 1
Collections
Collections including paper arxiv:2502.08127
-
sentence-transformers/all-MiniLM-L6-v2
Sentence Similarity • 22.7M • Updated • 136M • 4.06k -
sentence-transformers/paraphrase-xlm-r-multilingual-v1
Sentence Similarity • 0.3B • Updated • 228k • 69 -
rtzr/ko-gemma-2-9b-it
Text Generation • 9B • Updated • 4.24k • 91 -
Qwen/Qwen2.5-Omni-7B
Any-to-Any • 11B • Updated • 221k • 1.81k
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 53 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 81 -
Table-GPT: Table-tuned GPT for Diverse Table Tasks
Paper • 2310.09263 • Published • 41 -
Context-Aware Meta-Learning
Paper • 2310.10971 • Published • 17
-
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 237 -
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 37 -
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 26 -
RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 39
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 26 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 104
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 34 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 27 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 126 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22