RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System Paper • 2602.02488 • Published Feb 2 • 36
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published 23 days ago • 86
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published Jan 29 • 74
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision Paper • 2601.19798 • Published Jan 27 • 42
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published Dec 22, 2025 • 67
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 188
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20, 2025 • 94
Scaling Spatial Intelligence with Multimodal Foundation Models Paper • 2511.13719 • Published Nov 17, 2025 • 48
LLaVA-Video Collection • Models focused on video understanding (previously known as LLaVA-NeXT-Video) • 8 items • Updated Feb 21, 2025 • 64
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31, 2025 • 85
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion Paper • 2509.01215 • Published Sep 1, 2025 • 51
SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search Paper • 2507.15245 • Published Jul 21, 2025 • 11
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding Paper • 2507.15028 • Published Jul 20, 2025 • 21