

Shrey Pandit

SP2001


https://sites.google.com/view/shrey-pandit/home

AI & ML interests

None yet

Recent Activity

upvoted a paper 5 days ago

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

updated a dataset 9 days ago

SP2001/FRAMES_judge_passed_unique_questions

published a dataset 9 days ago

SP2001/FRAMES_judge_passed_unique_questions



upvoted a paper 5 days ago

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published 6 days ago • 48

upvoted a paper 29 days ago

Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published about 1 month ago • 113

upvoted 3 papers about 1 month ago

LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild

Paper • 2510.14240 • Published Oct 16 • 11

Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms

Paper • 2510.13913 • Published Oct 15 • 3

Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math

Paper • 2510.13744 • Published Oct 15 • 5

upvoted a paper about 2 months ago

Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

Paper • 2510.06499 • Published Oct 7 • 31

upvoted 4 papers 3 months ago

SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents

Paper • 2509.06283 • Published Sep 8 • 17

Why Language Models Hallucinate

Paper • 2509.04664 • Published Sep 4 • 192

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 69

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25 • 207

upvoted 5 papers 4 months ago

BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent

Paper • 2508.06600 • Published Aug 8 • 40

WideSearch: Benchmarking Agentic Broad Info-Seeking

Paper • 2508.07999 • Published Aug 11 • 109

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 312

MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning

Paper • 2507.16812 • Published Jul 22 • 63

WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Paper • 2507.15061 • Published Jul 20 • 60

upvoted 2 papers 5 months ago

GTA1: GUI Test-time Scaling Agent

Paper • 2507.05791 • Published Jul 8 • 26

WebSailor: Navigating Super-human Reasoning for Web Agent

Paper • 2507.02592 • Published Jul 3 • 122

upvoted an article 6 months ago

CodeAgents + Structure: A Better Way to Execute Actions

Article • Published May 28 • 79

upvoted 2 papers 6 months ago

Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models

Paper • 2505.17225 • Published May 22 • 64

Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection

Paper • 2505.17558 • Published May 23 • 15