Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder Paper • 2508.04107 • Published Aug 6 • 4
GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition Paper • 2506.07553 • Published Jun 9 • 15
Progressive Language-guided Visual Learning for Multi-Task Visual Grounding Paper • 2504.16145 • Published Apr 22 • 2