
steve z

stzhao

https://zhaoshitian.github.io/

zhaoshitian

Recent Activity

liked a dataset 16 days ago

Agents-X/TIR-Bench

authored a paper 20 days ago

TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

updated a Space 20 days ago

Agents-X/README

upvoted a paper 20 days ago

TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

Paper • 2511.01833 • Published 21 days ago • 15

upvoted 6 papers 3 months ago

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

Paper • 2509.07969 • Published Sep 9 • 59

Symbolic Graphics Programming with Large Language Models

Paper • 2509.05208 • Published Sep 5 • 45

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2 • 83

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21 • 256

MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Paper • 2508.14704 • Published Aug 20 • 42

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 48

upvoted 3 papers 4 months ago

upvoted 4 papers 5 months ago

PyVision: Agentic Vision with Dynamic Tooling

Paper • 2507.07998 • Published Jul 10 • 31

Sekai: A Video Dataset towards World Exploration

Paper • 2506.15675 • Published Jun 18 • 64

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

Paper • 2506.10521 • Published Jun 12 • 73

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 271

upvoted 2 articles 6 months ago

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

Article • Jun 3

Cheap Framepack camera control loras with one training video.

Article • Jun 1

upvoted a collection 6 months ago

🧠 Reasoning datasets

Collection

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 174

upvoted a paper 6 months ago

DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

Paper • 2505.19253 • Published May 25 • 32

upvoted 2 papers 7 months ago

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 63

TextArena

Paper • 2504.11442 • Published Apr 15 • 29