Active filters: VLM
Video-Text-to-Text • 2B • Updated • 14.7k • 444

numind/NuMarkdown-8B-Thinking
Image-to-Text • 8B • Updated • 41.5k • 472

Video-Text-to-Text • 2B • Updated • 1.07k • 5

nvidia/NVIDIA-Nemotron-Parse-v1.2
Image-Text-to-Text • 0.9B • Updated • 129k • 37

Image-Text-to-Text • 2B • Updated • 525 • 34

Image-Text-to-Text • 1B • Updated • 2.14k • 30

nvidia/VILA-HD-8B-PS3-1.5K-SigLIP
Image-Text-to-Text • Updated • 58 • 4

nvidia/VILA-HD-8B-PS3-4K-SigLIP
Image-Text-to-Text • Updated • 61 • 2

nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1
Image-Text-to-Text • 9B • Updated • 1.21M • 179

Image-Text-to-Text • 8B • Updated • 12.8k • 30

nvidia/VILA-HD-8B-PS3-1.5K-SigLIP2
Image-Text-to-Text • Updated • 540 • 1

nvidia/VILA-HD-8B-PS3-4K-SigLIP2
Image-Text-to-Text • Updated • 55 • 3

nvidia/VILA-HD-8B-PS3-1.5K-C-RADIOv2
Image-Text-to-Text • Updated • 57 • 1

nvidia/VILA-HD-8B-PS3-4K-C-RADIOv2
Image-Text-to-Text • Updated • 60 • 1

nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16
Image-Text-to-Text • 13B • Updated • 170k • 83

Image-Text-to-Text • 0.8B • Updated • 165 • 1

Image-Text-to-Text • 9B • Updated • 112 • 4

mradermacher/ToolCUA-8B-GGUF
8B • Updated • 779 • 2

adnankhan-11/VisionNav-3B
4B • Updated • 123 • 1

mradermacher/VisionNav-3B-GGUF
3B • Updated • 503 • 1

Efficient-Large-Model/VILA-13b
Text Generation • 13B • Updated • 23 • 20

Efficient-Large-Model/VILA-7b
Text Generation • 7B • Updated • 585 • 27

Efficient-Large-Model/VILA-7b-4bit-awq
Text Generation • Updated • 14 • 2

Efficient-Large-Model/VILA-13b-4bit-awq
Text Generation • Updated • 13 • 2

Efficient-Large-Model/VILA-2.7b
Text Generation • 3B • Updated • 138 • 15

TIGER-Lab/Mantis-bakllava-7b
Image-Text-to-Text • 8B • Updated • 49 • 5

TIGER-Lab/Mantis-llava-7b
Image-Text-to-Text • 7B • Updated • 22 • 16

Efficient-Large-Model/VILA1.5-3b
Text Generation • Updated • 1.58k • 34

Efficient-Large-Model/VILA1.5-13b
Text Generation • Updated • 260 • 5