ViTPose Transformers ⚡ Running on Zero • MCP • Featured • 211 • Detect and estimate human poses in images and videos
Universal Image Segmentation with Mask2Former and OneFormer Article • Jan 19, 2023 • 15
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation Paper • 2507.22886 • Published Jul 30, 2025 • 10
jonatasgrosman/wav2vec2-large-xlsr-53-english Automatic Speech Recognition • 0.3B • Updated Mar 25, 2023 • 94.8k • 475
openai/clip-vit-large-patch14 Zero-Shot Image Classification • 0.4B • Updated Sep 15, 2023 • 7.74M • 1.95k