PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published 14 days ago • 75
Article Introducing MTEB v2: Evaluation of embedding and retrieval systems for more than just text By isaacchung and 2 others • 10 days ago • 33
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Paper • 2510.04618 • Published 24 days ago • 109
CommonForms: A Large, Diverse Dataset for Form Field Detection Paper • 2509.16506 • Published Sep 20 • 18
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28, 2024 • 257
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception Paper • 2410.12628 • Published Oct 16, 2024 • 41
HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering Paper • 2509.09713 • Published Sep 8 • 24
FastVLM Collection Efficient Vision Encoding for Vision Language Models • 9 items • Updated Sep 2 • 103
Article *Context Is Gold to Find the Gold Passage*: Evaluating and Training Contextual Document Embeddings By manu and 1 other • Jun 2 • 25