Yuseung "Phillip" Lee

phillipinseoul

https://phillipinseoul.github.io/

AI & ML interests

Computer Vision

Recent Activity

upvoted a paper 1 day ago

Latent Implicit Visual Reasoning

Paper • 2512.21218 • Published 3 days ago • 40

upvoted a paper 2 days ago

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Paper • 2512.20557 • Published 4 days ago • 45

upvoted 2 papers 3 days ago

Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published 9 days ago • 23

SpatialTree: How Spatial Abilities Branch Out in MLLMs

Paper • 2512.20617 • Published 4 days ago • 42

upvoted a paper 4 days ago

WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

Paper • 2512.19678 • Published 5 days ago • 26

upvoted 5 papers 5 days ago

MMSI-Video-Bench: A Holistic Benchmark for Video-Based Spatial Intelligence

Paper • 2512.10863 • Published 16 days ago • 21

Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image

Paper • 2512.16899 • Published 9 days ago • 12

PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

Paper • 2512.16793 • Published 9 days ago • 71

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Paper • 2512.17008 • Published 9 days ago • 10

When Reasoning Meets Its Laws

Paper • 2512.17901 • Published 8 days ago • 54

liked a model 7 days ago

Qwen/Qwen3-VL-4B-Thinking

Image-Text-to-Text • 4B • Updated Oct 15 • 53.7k • 92

upvoted 4 papers 8 days ago

AdaTooler-V: Adaptive Tool-Use for Images and Videos

Paper • 2512.16918 • Published 9 days ago • 11

Next-Embedding Prediction Makes Strong Vision Learners

Paper • 2512.16922 • Published 9 days ago • 79

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Paper • 2512.16561 • Published 9 days ago • 19

Adaptation of Agentic AI

Paper • 2512.16301 • Published 9 days ago • 92

upvoted 2 papers 9 days ago

DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models

Paper • 2512.15713 • Published 10 days ago • 15

Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Paper • 2512.14681 • Published 11 days ago • 39

upvoted 3 papers 10 days ago

Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure

Paper • 2512.14336 • Published 11 days ago • 28

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Paper • 2512.14614 • Published 11 days ago • 64

RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics

Paper • 2512.13660 • Published 12 days ago • 37