Abstract
Deep Research systems integrate LLMs with external tools to enhance problem-solving capabilities, involving query planning, information acquisition, memory management, and answer generation.
Large language models (LLMs) have rapidly evolved from text generators into powerful problem solvers. Yet, many open tasks demand critical thinking, multi-source, and verifiable outputs, which are beyond single-shot prompting or standard retrieval-augmented generation. Recently, numerous studies have explored Deep Research (DR), which aims to combine the reasoning capabilities of LLMs with external tools, such as search engines, thereby empowering LLMs to act as research agents capable of completing complex, open-ended tasks. This survey presents a comprehensive and systematic overview of deep research systems, including a clear roadmap, foundational components, practical implementation techniques, important challenges, and future directions. Specifically, our main contributions are as follows: (i) we formalize a three-stage roadmap and distinguish deep research from related paradigms; (ii) we introduce four key components: query planning, information acquisition, memory management, and answer generation, each paired with fine-grained sub-taxonomies; (iii) we summarize optimization techniques, including prompting, supervised fine-tuning, and agentic reinforcement learning; and (iv) we consolidate evaluation criteria and open challenges, aiming to guide and facilitate future development. As the field of deep research continues to evolve rapidly, we are committed to continuously updating this survey to reflect the latest progress in this area.
Community
Thanks for amazing work
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents (2025)
- Search-R3: Unifying Reasoning and Embedding Generation in Large Language Models (2025)
- StatEval: A Comprehensive Benchmark for Large Language Models in Statistics (2025)
- Instella: Fully Open Language Models with Stellar Performance (2025)
- Large Language Models Meet Text-Attributed Graphs: A Survey of Integration Frameworks and Applications (2025)
- How Far Are We from Genuinely Useful Deep Research Agents? (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper