Tomas Chen's picture

36 15

Tomas Chen

tomasce

·

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 2 months ago

Droplet3D: Commonsense Priors from Videos Facilitate 3D Generation

Paper • 2508.20470 • Published Aug 28 • 73

upvoted 19 papers 8 months ago

Video-T1: Test-Time Scaling for Video Generation

Paper • 2503.18942 • Published Mar 24 • 90

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published Mar 3 • 89

START: Self-taught Reasoner with Tools

Paper • 2503.04625 • Published Mar 6 • 113

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20 • 192

DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation

Paper • 2503.06053 • Published Mar 8 • 138

ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting

Paper • 2410.17856 • Published Oct 23, 2024 • 51

Baichuan Alignment Technical Report

Paper • 2410.14940 • Published Oct 19, 2024 • 51

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Paper • 2409.18124 • Published Sep 26, 2024 • 33

LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.11901 • Published Sep 18, 2024 • 34

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published Aug 28, 2024 • 37

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Paper • 2409.18042 • Published Sep 26, 2024 • 40

LongRecipe: Recipe for Efficient Long Context Generalization in Large Languge Models

Paper • 2409.00509 • Published Aug 31, 2024 • 42

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12, 2024 • 48

YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models

Paper • 2409.13592 • Published Sep 20, 2024 • 51

InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

Paper • 2409.12568 • Published Sep 19, 2024 • 50

PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation

Paper • 2409.06820 • Published Sep 10, 2024 • 68

VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 41

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10, 2024 • 42

ColPali: Efficient Document Retrieval with Vision Language Models

Paper • 2407.01449 • Published Jun 27, 2024 • 50