- Kwaipilot/KAT-Dev-72B-Exp
  Text Generation • 73B • Updated • 3.06k • 140
- LiquidAI/LFM2-8B-A1B
  Text Generation • 8B • Updated • 11.1k • 223
- yanolja/YanoljaNEXT-Rosetta-12B-2510
  Translation • 12B • Updated • 735 • 25
- NeuML/colbert-muvera-femto
  Sentence Similarity • 243k • Updated • 642 • 19
merve PRO

- ByteDance/lynx
  Image-to-Video • Updated • 133
- tencent/HunyuanImage-3.0
  Text-to-Image • 83B • Updated • 34.7k • 938
- meituan-longcat/LongCat-Flash-Thinking
  Text Generation • 562B • Updated • 645 • 144
- Qwen/Qwen3Guard-Gen-4B
  Text Generation • 4B • Updated • 17.3k • 23

- bytedance-research/HuMo
  Image-to-Video • Updated • 581 • 234
- facebook/MobileLLM-R1-950M
  Text Generation • 0.9B • Updated • 4.86k • 347
- tencent/POINTS-Reader
  Image-Text-to-Text • 4B • Updated • 856 • 97
- baidu/ERNIE-4.5-21B-A3B-Thinking
  Text Generation • 22B • Updated • 1.07k • 756

- microsoft/VibeVoice-1.5B
  Text-to-Speech • 3B • Updated • 236k • 1.93k
- OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview
  Image-Text-to-Text • 0.4B • Updated • 37.1k • 74
- apple/FastVLM-1.5B
  Text Generation • 2B • Updated • 3k • 70
- stepfun-ai/Step-Audio-2-mini
  Any-to-Any • 8B • Updated • 2.02k • 234

- openai/gpt-oss-120b
  Text Generation • 120B • Updated • 3.69M • 4.05k
- openai/gpt-oss-20b
  Text Generation • 22B • Updated • 5.05M • 3.8k
- openai/BrowseCompLongContext
  Viewer • Updated • 295 • 365 • 41
- baichuan-inc/Baichuan-M2-32B
  Text Generation • 33B • Updated • 116k • 102

- Wan-AI/Wan2.2-I2V-A14B
  Image-to-Video • Updated • 8.48k • 447
- allenai/olmOCR-7B-0725
  Image-Text-to-Text • 8B • Updated • 3.33k • 62
- Wan-AI/Wan2.2-T2V-A14B
  Text-to-Video • Updated • 9.2k • 330
- Qwen/Qwen3-235B-A22B-Thinking-2507
  Text Generation • 235B • Updated • 32.5k • 370

- HuggingFaceTB/SmolLM3-3B
  Text Generation • 3B • Updated • 57.6k • 751
- moonshotai/Kimi-K2-Instruct
  Text Generation • 1T • Updated • 82.2k • 2.19k
- fal/Realism-Detailer-Kontext-Dev-LoRA
  Image-to-Image • Updated • 363 • 53
- Alibaba-NLP/WebSailor-3B
  3B • Updated • 49 • 74

- nari-labs/Dia-1.6B-0626
  Text-to-Speech • 2B • Updated • 20.2k • 113
- google/gemma-3n-E4B-it
  Image-Text-to-Text • 8B • Updated • 65.5k • 803
- ByteDance/XVerse
  Text-to-Image • Updated • 55 • 89
- nvidia/llama-nemoretriever-colembed-3b-v1
  Visual Document Retrieval • 4B • Updated • 1.78k • 53

- opendatalab/OmniDocBench
  Viewer • Updated • 1.36k • 10.5k • 46
- nanonets/Nanonets-OCR-s
  Image-Text-to-Text • 4B • Updated • 143k • 1.55k
- echo840/MonkeyOCR
  Image-Text-to-Text • Updated • 213 • 510
- Running on Zero • MCP • 136
  OCR2
  nanonets ocr / smoldocling / monkey ocr / typhoon ocr

- ByteDance-Seed/BAGEL-7B-MoT
  Any-to-Any • 15B • Updated • 689 • 1.15k
- mistralai/Devstral-Small-2505
  24B • Updated • 6.29k • 853
- ByteDance/Dolphin
  Image-Text-to-Text • 0.4B • Updated • 11k • 506
- moondream/moondream-2b-2025-04-14-4bit
  Image-Text-to-Text • 1B • Updated • 4.73k • 56

- moonshotai/Kimi-VL-A3B-Thinking
  Image-Text-to-Text • 16B • Updated • 8.61k • 441
- agentica-org/DeepCoder-14B-Preview
  Text Generation • 15B • Updated • 876 • 679
- HiDream-ai/HiDream-I1-Full
  Text-to-Image • Updated • 107k • 975
- OpenGVLab/InternVL3-78B
  Image-Text-to-Text • 78B • Updated • 4.43k • 222

- OpenGVLab/InternVideo2_5_Chat_8B
  Video-Text-to-Text • 8B • Updated • 11.1k • 85
- AIDC-AI/Ovis2-34B
  Image-Text-to-Text • 35B • Updated • 1.1k • 151
- open-r1/OpenR1-Qwen-7B
  Text Generation • 8B • Updated • 92 • 54
- nomic-ai/nomic-embed-text-v2-moe
  Sentence Similarity • 0.5B • Updated • 343k • 438

- allenai/Llama-3.1-Tulu-3-405B
  Text Generation • 406B • Updated • 153 • 109
- Qwen/Qwen2.5-VL-72B-Instruct
  Image-Text-to-Text • 73B • Updated • 627k • 555
- mistralai/Mistral-Small-24B-Instruct-2501
  24B • Updated • 369k • 946
- deepseek-ai/Janus-Pro-7B
  Any-to-Any • Updated • 77k • 3.52k

- ostris/Flex.1-alpha
  Text-to-Image • Updated • 8.42k • 477
- Qwen/Qwen2.5-Math-PRM-72B
  Text Classification • 73B • Updated • 69 • 72
- HuggingFaceTB/SmolVLM-500M-Instruct
  Image-Text-to-Text • 0.5B • Updated • 118k • 181
- deepseek-ai/DeepSeek-R1
  Text Generation • 685B • Updated • 480k • 12.8k

- HuggingFaceTB/SmolVLM-Instruct
  Image-Text-to-Text • 2B • Updated • 47k • 555
- Qwen/QwQ-32B-Preview
  Text Generation • 33B • Updated • 27.8k • 1.74k
- nvidia/Hymba-1.5B-Base
  Text Generation • 2B • Updated • 786 • 152
- vidore/colsmolvlm-v0.1
  Visual Document Retrieval • Updated • 233 • 53

- microsoft/LLM2CLIP-EVA02-L-14-336
  Zero-Shot Image Classification • Updated • 125 • 58
- microsoft/LLM2CLIP-EVA02-B-16
  Updated • 49 • 10
- PleIAs/common_corpus
  Viewer • Updated • 470M • 16.3k • 312
- Qwen/Qwen2.5-Coder-32B-Instruct
  Text Generation • 33B • Updated • 97.5k • 1.94k

- NVLM: Open Frontier-Class Multimodal LLMs
  Paper • 2409.11402 • Published • 74
- BRAVE: Broadening the visual encoding of vision-language models
  Paper • 2404.07204 • Published • 19
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
  Paper • 2403.18814 • Published • 47
- Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
  Paper • 2409.17146 • Published • 121

- Runtime error • 100
  LOTUS Normal
  Generate high-quality predictions from images
- Runtime error • 77
  LOTUS Depth
  Generate depth maps from images and videos
- jingheya/lotus-depth-g-v1-0
  Depth Estimation • Updated • 15.5k • 26
- jingheya/lotus-depth-d-v1-0
  Depth Estimation • Updated • 652 • 5

- facebook/dinov2-large
  Image Feature Extraction • 0.3B • Updated • 344k • 94
- google/flan-t5-xl
  3B • Updated • 153k • 521
- google/siglip-large-patch16-384
  Zero-Shot Image Classification • 0.7B • Updated • 25.4k • 8
- google/vit-huge-patch14-224-in21k
  Image Feature Extraction • 0.6B • Updated • 8.21k • 22

- facebook/deit-base-distilled-patch16-384
  Image Classification • 87.6M • Updated • 430 • 7
- facebook/convnextv2-base-1k-224
  Image Classification • 88.7M • Updated • 204 • 4
- facebook/deit-base-distilled-patch16-224
  Image Classification • Updated • 8.8k • 31
- google/vit-base-patch32-384
  Image Classification • 88.3M • Updated • 42k • 23

- facebook/maskformer-swin-large-coco
  Image Segmentation • 0.2B • Updated • 721 • 27
- nvidia/segformer-b0-finetuned-ade-512-512
  Image Segmentation • 3.75M • Updated • 356k • 168
- facebook/detr-resnet-50-dc5-panoptic
  Image Segmentation • 43M • Updated • 25 • 3
- nvidia/segformer-b5-finetuned-cityscapes-1024-1024
  Image Segmentation • Updated • 83.4k • 34

- timbrooks/instruct-pix2pix
  Image-to-Image • Updated • 75.3k • 1.16k
- TencentARC/t2i-adapter-canny-sdxl-1.0
  Image-to-Image • Updated • 3.36k • 52
- TencentARC/t2i-adapter-sketch-sdxl-1.0
  Image-to-Image • Updated • 3.88k • 75
- CrucibleAI/ControlNetMediaPipeFace
  Image-to-Image • Updated • 871 • 573

- Salesforce/blip-image-captioning-large
  Image-to-Text • 0.5B • Updated • 1.17M • 1.43k
- Salesforce/blip-image-captioning-base
  Image-to-Text • Updated • 2.11M • 798
- microsoft/trocr-base-handwritten
  Image-to-Text • 0.3B • Updated • 227k • 452
- microsoft/git-large-coco
  Image-to-Text • 0.4B • Updated • 2.76k • 104

- Running • 102
  Grounding DINO Demo
  Cutting edge open-vocabulary object detection app
- Running • 91
  Owlv2
  State-of-the-art Zero-shot Object Detection
- Runtime error • 41
  BLIP2 with transformers
  BLIP2 (cutting edge image captioning) in 🤗 transformers
- Build error • 377
  IDEFICS Playground

- Running • 91
  Owlv2
  State-of-the-art Zero-shot Object Detection
- Runtime error • 64
  Owl Tracking
  Powerful foundation model for zero-shot object tracking
- Sleeping • 25
  Search and Detect (CLIP/OWL-ViT)
  Search and detect objects in images using text queries
- Running on Zero • 107
  OWLSAM
  State-of-the-art open-vocabulary image segmentation ⚡️

- Improved Baselines with Visual Instruction Tuning
  Paper • 2310.03744 • Published • 39
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
  Paper • 2403.05525 • Published • 46
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 11
- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
  Paper • 2404.01331 • Published • 27

- google/owlvit-base-patch32
  Zero-Shot Object Detection • 0.2B • Updated • 124k • 142
- google/owlvit-base-patch16
  Zero-Shot Object Detection • Updated • 26.1k • 12
- google/owlvit-large-patch14
  Zero-Shot Object Detection • Updated • 30.1k • 26
- google/owlv2-base-patch16
  Zero-Shot Object Detection • 0.2B • Updated • 41.9k • 28

- depth-anything/Depth-Anything-V2-Small
  Depth Estimation • Updated • 12k • 71
- depth-anything/Depth-Anything-V2-Large
  Depth Estimation • Updated • 371k • 129
- Running on Zero • 566
  Depth Anything V2
  Generate depth maps from images
- depth-anything/DA-2K
  Viewer • Updated • 1.04k • 417 • 13

- Running • 178
  Vidore Leaderboard
  Explore visual document retrieval benchmark results
- Running on CPU Upgrade • 920
  Open VLM Leaderboard
  VLMEvalKit Evaluation Results Collection
- Running • 557
  Vision Arena (Testing VLMs side-by-side)
  Display image analysis results
- Running • 85
  SEED-Bench Leaderboard
  Submit model evaluation results to leaderboard

- vidore/colpali-v1.2
  Visual Document Retrieval • Updated • 34.7k • 112
- Qwen/Qwen2-VL-7B-Instruct
  Image-Text-to-Text • 8B • Updated • 2.16M • 1.23k
- Qwen/Qwen2-VL-2B-Instruct
  Image-Text-to-Text • 2B • Updated • 2.16M • 461
- Qwen/Qwen2-72B-Instruct
  Text Generation • 73B • Updated • 28.7k • 717

- ibm-granite/granite-docling-258M
  Image-Text-to-Text • 0.3B • Updated • 255k • 985
- XiaomiMiMo/MiMo-Audio-7B-Base
  Any-to-Any • 8B • Updated • 279 • 39
- decart-ai/Lucy-Edit-Dev
  Video-to-Video • Updated • 1.08k • 293
- OpenGVLab/ScaleCUA-3B
  Image-Text-to-Text • 4B • Updated • 338 • 9

- openbmb/MiniCPM4.1-8B
  Text Generation • 8B • Updated • 1.26k • 339
- tencent/Hunyuan-MT-7B
  Translation • 8B • Updated • 9.4k • 675
- google/embeddinggemma-300m
  Sentence Similarity • 0.3B • Updated • 720k • 1.08k
- moonshotai/Kimi-K2-Instruct-0905
  Text Generation • 1T • Updated • 37.9k • 510

- stepfun-ai/step3
  Image-Text-to-Text • 321B • Updated • 51.2k • 159
- nunchaku-tech/nunchaku-flux.1-krea-dev
  Text-to-Image • Updated • 16.6k • 112
- fdtn-ai/Foundation-Sec-8B-Instruct
  Text Generation • 8B • Updated • 5.68k • 52
- Wan-AI/Wan2.2-TI2V-5B-Diffusers
  Text-to-Video • Updated • 21.3k • 79

- nvidia/OpenReasoning-Nemotron-32B
  Text Generation • 33B • Updated • 1.23k • 116
- ByteDance-Seed/Seed-X-RM-7B
  Translation • Updated • 184 • 30
- LGAI-EXAONE/EXAONE-4.0-32B
  Text Generation • 32B • Updated • 25.6k • 259
- vidore/colqwen-omni-v0.1
  Visual Document Retrieval • Updated • 1.91k • 91

- Qwen/WorldPM-72B
  Text Classification • 73B • Updated • 59 • 79
- Running on Zero • MCP • 1.36k
  LTX Video Fast
  ultra-fast video model, LTX 0.9.8 13B distilled
- BLIP3o/BLIP3o-Pretrain-Long-Caption
  Viewer • Updated • 27.2M • 13.5k • 53
- BLIP3o/BLIP3o-Model-8B
  14B • Updated • 1.06k • 102

- OpenGVLab/InternVL3-1B-hf
  Image-Text-to-Text • 0.9B • Updated • 55k • 8
- OpenGVLab/InternVL3-2B-hf
  Image-Text-to-Text • 2B • Updated • 16.9k • 3
- OpenGVLab/InternVL3-8B-hf
  Image-Text-to-Text • 8B • Updated • 24k • 9
- OpenGVLab/InternVL3-14B-hf
  Image-Text-to-Text • 15B • Updated • 2.57k

- deepseek-ai/DeepSeek-V3-0324
  Text Generation • 685B • Updated • 338k • 3.07k
- Qwen/Qwen2.5-Omni-7B
  Any-to-Any • 11B • Updated • 228k • 1.81k
- google/txgemma-27b-chat
  Text Generation • 27B • Updated • 665 • 55
- Running • 356
  Qwen2.5 Omni 7B Demo
  Generate text and speech from text, audio, images, and videos

- Qwen/Qwen2-VL-7B-Instruct
  Image-Text-to-Text • 8B • Updated • 2.16M • 1.23k
- Qwen/Qwen2-VL-2B-Instruct
  Image-Text-to-Text • 2B • Updated • 2.16M • 461
- CohereLabs/aya-vision-8b
  Image-Text-to-Text • 9B • Updated • 41k • 311
- CohereLabs/aya-vision-32b
  Image-Text-to-Text • 33B • Updated • 285 • 217

- Running on Zero • 261
  Qwen2-VL-7B
  Generate text from an image and question
- Running • 64
  UI-TARS
  Find click coordinates on images based on instructions
- Running • 94
  Qwen2.5-1M Demo
  Upload documents and ask questions
- Qwen/Qwen2.5-14B-Instruct-1M
  Text Generation • 15B • Updated • 109k • 325

- meta-llama/Llama-3.3-70B-Instruct
  Text Generation • 71B • Updated • 770k • 2.54k
- Qwen/Qwen2-VL-72B
  Image-Text-to-Text • 73B • Updated • 85 • 80
- google/paligemma2-3b-pt-224
  Image-Text-to-Text • 3B • Updated • 1.23M • 158
- tencent/HunyuanVideo
  Text-to-Video • Updated • 2.17k • 2.06k

- ibm-granite/granite-3.0-8b-instruct
  Text Generation • 8B • Updated • 20.2k • 203
- ibm-granite/granite-3.0-2b-instruct
  Text Generation • 3B • Updated • 3.58k • 46
- CohereLabs/aya-expanse-8b
  Text Generation • 8B • Updated • 13.7k • 410
- CohereLabs/aya-expanse-32b
  Text Generation • 32B • Updated • 7.03k • 275

- microsoft/resnet-50
  Image Classification • 25.6M • Updated • 166k • 449
- google/vit-base-patch16-224-in21k
  Image Feature Extraction • 86.4M • Updated • 2.69M • 375
- google/vit-base-patch32-224-in21k
  Image Feature Extraction • 88M • Updated • 17.4k • 19
- facebook/dinov2-large
  Image Feature Extraction • 0.3B • Updated • 344k • 94

- facebook/detr-resnet-50
  Object Detection • 41.6M • Updated • 704k • 903
- facebook/detr-resnet-101-dc5
  Object Detection • 60.7M • Updated • 1.93k • 19
- facebook/detr-resnet-50-dc5
  Object Detection • 41.6M • Updated • 2.21k • 6
- google/owlvit-base-patch32
  Zero-Shot Object Detection • 0.2B • Updated • 124k • 142

- openai/clip-vit-large-patch14
  Zero-Shot Image Classification • 0.4B • Updated • 9.76M • 1.88k
- openai/clip-vit-base-patch32
  Zero-Shot Image Classification • Updated • 19.5M • 793
- laion/CLIP-ViT-bigG-14-laion2B-39B-b160k
  Zero-Shot Image Classification • Updated • 107k • 295
- kakaobrain/align-base
  Zero-Shot Image Classification • Updated • 9.7k • 26

- microsoft/xclip-base-patch32
  Video Classification • 0.2B • Updated • 249k • 103
- facebook/timesformer-base-finetuned-k400
  Video Classification • Updated • 22.4k • 42
- facebook/timesformer-base-finetuned-k600
  Video Classification • Updated • 5.42k • 12
- google/vivit-b-16x2
  Video Classification • Updated • 432 • 11

- stabilityai/stable-diffusion-xl-base-1.0
  Text-to-Image • Updated • 2.48M • 7.07k
- warp-ai/wuerstchen
  Text-to-Image • Updated • 432 • 176
- Deci/DeciDiffusion-v1-0
  Text-to-Image • Updated • 14 • 139
- stabilityai/stable-diffusion-xl-refiner-1.0
  Image-to-Image • Updated • 395k • 1.98k

- Running on Zero • 72
  Draw To Search Art
  Draw/upload image and search among WikiART using SigLIP
- Running on CPU Upgrade • 23
  Compare Clip Siglip
  Compare strong zero-shot image classification models
- Running on Zero • 12
  Multilingual Zero Shot Image Clf
  Comparing powerful multilingual zero-shot image clf models
- BAAI/bunny-phi-2-siglip-lora
  Text Generation • Updated • 138 • 48

- google/owlvit-base-patch32
  Zero-Shot Object Detection • 0.2B • Updated • 124k • 142
- google/owlvit-base-patch16
  Zero-Shot Object Detection • Updated • 26.1k • 12
- google/owlvit-large-patch14
  Zero-Shot Object Detection • Updated • 30.1k • 26
- google/owlv2-base-patch16
  Zero-Shot Object Detection • 0.2B • Updated • 41.9k • 28

- Paused • 21
  Video Llava
  Generate descriptions by uploading images or videos
- llava-hf/LLaVA-NeXT-Video-7B-hf
  Video-Text-to-Text • 7B • Updated • 104k • 112
- llava-hf/LLaVA-NeXT-Video-7B-DPO-hf
  Video-Text-to-Text • 7B • Updated • 2.04k • 10
- llava-hf/LLaVA-NeXT-Video-7B-32K-hf
  Image-Text-to-Text • 8B • Updated • 108 • 7

- NVEagle/Eagle-X5-13B
  Image-Text-to-Text • 15B • Updated • 4 • 15
- NVEagle/Eagle-X5-13B-Chat
  Image-Text-to-Text • 15B • Updated • 6 • 28
- NVEagle/Eagle-X5-7B
  Image-Text-to-Text • 9B • Updated • 54 • 26
- Runtime error • 64
  Eagle X5 13B Chat
  Combine text and images to generate responses