Multimodal OCR3
nanonets2 / dots.ocr / olmOCR2 / chandraOCR
Extract text from document images
Play Atari games using a vision-language model
Controllable emotional TTS
Generate images from text prompts
Enhance and restore old photos with faces
Video deepfake
Generate speech from text using Microsoft Edge TTS
Kontext image editing on FLUX[dev]
Chatterbox TTS supporting 23 languages
Chat using Qwen3-VL for Image, Video, PDF, and GIF
Convert spoken words into text
Display and request speech recognition model benchmarks
Ask questions and get answers
Visualize LeRobot Datasets
Generate captions for images
Upscale low-resolution images to high resolution
Generate captions for images in various styles
VGGT (CVPR 2025)
WeShopAI Virtual Try On. Switch outfits with ease virtually.
Scalable and Versatile 3D Generation from images
Generate images from text prompts
Wan 2.2 14B
Generate edited images based on prompts and input images