olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 12 items • Updated 7 days ago • 140
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 550
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding Paper • 2312.04461 • Published Dec 7, 2023 • 62
Document Processing Collection Any model or dataset dealing with documentary-type objects (layout detection, VQA, OCR, etc.) • 11 items • Updated Sep 4 • 4
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated Jul 10 • 87
Evaluation Datasets Collection Collection of Romanian datasets used for evaluation • 8 items • Updated Oct 30 • 2
SFT Datasets Collection Collection of Romanian datasets used for supervised finetuning • 11 items • Updated Oct 30 • 1
MultiLegalPile Models Collection A 689GB Multilingual Legal Corpus • 33 items • Updated Oct 23, 2023 • 2