Collections

Collections including paper arxiv:2602.20739

about 9 hours ago

PyVision-RL: Forging Open Agentic Vision Models via RL

Paper • 2602.20739 • Published 2 days ago • 24

Agents-X/PyVision-Video-7B-RL

8B • Updated 1 day ago • 29
Agents-X/PyVision-Video-7B-SFT

8B • Updated 1 day ago • 11
Agents-X/PyVision-Video-SFT-Data

Updated 1 day ago • 9
Agents-X/PyVision-Video-RL-Data

Viewer • Updated 1 day ago • 15k • 16

Agents-X/PyVision-Image-7B-SFT

Image-Text-to-Text • 8B • Updated 1 day ago • 1
Agents-X/PyVision-Image-7B-RL

8B • Updated 1 day ago • 8
Agents-X/PyVision-Image-SFT-Data

Viewer • Updated 1 day ago • 6.88k • 23
Agents-X/PyVision-Image-RL-Data

Viewer • Updated 1 day ago • 44.6k • 21

about 8 hours ago

GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21, 2025 • 133
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7, 2025 • 141
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21, 2025 • 64
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 160

about 20 hours ago

ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems

Paper • 2503.20756 • Published Mar 26, 2025 • 7
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14, 2025 • 99
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 214
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22, 2025 • 149
