VITA-E: Natural Embodied Interaction with Concurrent Seeing, Hearing, Speaking, and Acting Paper • 2510.21817 • Published Oct 2025 • 41
VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation Paper • 2510.09607 • Published Oct 2025 • 2
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models Paper • 2506.01413 • Published Jun 2, 2025 • 16
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy Paper • 2502.05177 • Published Feb 7, 2025 • 2
VITA: Towards Open-Source Interactive Omni Multimodal LLM Paper • 2408.05211 • Published Aug 9, 2024 • 50
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model Paper • 2505.03739 • Published May 6, 2025 • 10
CoDiEmb: A Collaborative yet Distinct Framework for Unified Representation Learning in Information Retrieval and Semantic Textual Similarity Paper • 2508.11442 • Published Aug 15, 2025 • 3
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Paper • 2509.22601 • Published Sep 26, 2025 • 29
Youtu-GraphRAG: Vertically Unified Agents for Graph Retrieval-Augmented Complex Reasoning Paper • 2508.19855 • Published Aug 27, 2025 • 7
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published Jan 3, 2025 • 47
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis Paper • 2405.21075 • Published May 31, 2024 • 26