Zhaokai Wang

wzk1015

https://www.wzk.plus

AI & ML interests

Computer Vision, Music Generation, Multimodal Large Language Models

Recent Activity

updated a dataset 26 days ago

wzk1015/GenExam-Gen-Images

published a dataset 26 days ago

wzk1015/GenExam-Gen-Images

upvoted a paper 28 days ago

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

upvoted a paper 28 days ago

EditThinker: Unlocking Iterative Reasoning for Any Image Editor

Paper • 2512.05965 • Published about 1 month ago • 38

upvoted 4 papers 3 months ago

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17, 2025 • 89

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

Paper • 2510.08565 • Published Oct 9, 2025 • 19

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9, 2025 • 109

Factuality Matters: When Image Generation and Editing Meet Structured Visuals

Paper • 2510.05091 • Published Oct 6, 2025 • 19

upvoted 4 papers 4 months ago

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 211

upvoted a paper 5 months ago

Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation

Paper • 2508.05635 • Published Aug 7, 2025 • 73

upvoted a paper 6 months ago

Mono-InternVL-1.5: Towards Cheaper and Faster Monolithic Multimodal Large Language Models

Paper • 2507.12566 • Published Jul 16, 2025 • 14

upvoted a paper 7 months ago

Large Language Models for Data Synthesis

Paper • 2505.14752 • Published May 20, 2025 • 49

upvoted 2 papers 8 months ago

EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models

Paper • 2505.09694 • Published May 14, 2025 • 19

EnerVerse-AC: Envisioning Embodied Environments with Action Condition

Paper • 2505.09723 • Published May 14, 2025 • 23

upvoted a collection 9 months ago

Qwen2.5

Collection

Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated 5 days ago • 672

upvoted a paper 9 months ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 139

upvoted a collection 9 months ago

PIIP

Collection

[NeurIPS 2024 Spotlight (Ranking Top 10), TPAMI 2025] Parameter-Inverted Image Pyramid Networks • 11 items • Updated Sep 28, 2025 • 1

upvoted a paper 9 months ago

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Paper • 2504.02826 • Published Apr 3, 2025 • 68

upvoted 2 papers 10 months ago

Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning

Paper • 2503.11646 • Published Mar 14, 2025 • 34

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25, 2025 • 74
