Improve model card: Add `transformers` compatibility, `text-generation` pipeline tag, and comprehensive details

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +85 -12
README.md CHANGED
@@ -1,37 +1,110 @@
  ---
- license: mit
  base_model:
  - Qwen/Qwen2.5-32B-Instruct
  ---

- this model is related to the following work:
- ## MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework

  [![arXiv](https://img.shields.io/badge/arxiv-2508.14880-blue)](https://arxiv.org/abs/2508.14880)
  [![github](https://img.shields.io/badge/github-MedResearcher-orange)](https://github.com/AQ-MedAI/MedResearcher-R1)
-
  [![license](https://img.shields.io/badge/license-MIT-white)](https://github.com/AQ-MedAI/MedResearcher-R1/blob/main/LICENSE)

- ### author list
  Ailing Yu, Lan Yao, Jingnan Liu, Zhe Chen, Jiajun Yin, Yuan Wang, Xinhao Liao, Zhiling Ye, Ji Li, Yun Yue, Hansong Xiao, Hualei Zhou, Chunxiao Guo, Peng Wei, Jinjie Gu

- ### abstract
  Recent developments in Large Language Model (LLM)-based agents have shown impressive capabilities spanning multiple domains, exemplified by deep research systems that demonstrate superior performance on complex information-seeking and synthesis tasks. While general-purpose deep research agents have shown impressive capabilities, they struggle significantly with medical domain challenges: the MedBrowseComp benchmark reveals that even GPT-o3 deep research, the leading proprietary deep research system, achieves only 25.5% accuracy on complex medical queries. The key limitations are: (1) insufficient dense medical knowledge for clinical reasoning, and (2) lack of medical-specific retrieval tools. We present a medical deep research agent that addresses these challenges through two core innovations. First, we develop a novel data synthesis framework using medical knowledge graphs, extracting longest chains from subgraphs around rare medical entities to generate complex multi-hop QA pairs. Second, we integrate a custom-built private medical retrieval engine alongside general-purpose tools, enabling accurate medical information synthesis. Our approach generates 2,100 diverse trajectories across 12 medical specialties, each averaging 4.2 tool interactions. Through a two-stage training paradigm combining supervised fine-tuning and online reinforcement learning with composite rewards, our open-source 32B model achieves competitive performance on general benchmarks (GAIA: 53.4, xBench: 54), comparable to GPT-4o-mini, while outperforming significantly larger proprietary models. More importantly, we establish a new state-of-the-art on MedBrowseComp with 27.5% accuracy, surpassing leading closed-source deep research systems including O3 deep research, substantially advancing medical deep research capabilities. Our work demonstrates that strategic domain-specific innovations in architecture, tool design, and training data construction can enable smaller open-source models to outperform much larger proprietary systems in specialized domains. Code and datasets will be released to facilitate further research.

- ## Run Evaluation
- If you would like to use our model for inference and evaluation, please refer to our GitHub repo [![github](https://img.shields.io/badge/github-MedResearcher-orange)](https://github.com/AQ-MedAI/MedResearcher-R1). We provide complete evaluation tools and code in the EvaluationPipeline so that you can verify performance on common leaderboards (such as gaia-103-text) or on your own datasets.

- ## ✍️Citation

  ```
- {@article{ant2025medresearcher,
  title={MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework},
  author={Ailing Yu, Lan Yao, Jingnan Liu, Zhe Chen, Jiajun Yin, Yuan Wang, Xinhao Liao, Zhiling Ye, Ji Li, Yun Yue, Hansong Xiao, Hualei Zhou, Chunxiao Guo, Peng Wei, Jinjie Gu},
  journal={arXiv preprint},
- url={https://arxiv.org/abs/2508.14880}
  year={2025}
  }
  ```

  ## 📜 License
- MedResearcher-R1 is licensed under the MIT license.

  ---
  base_model:
  - Qwen/Qwen2.5-32B-Instruct
+ license: mit
+ pipeline_tag: text-generation
+ library_name: transformers
  ---

+ # MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework

  [![arXiv](https://img.shields.io/badge/arxiv-2508.14880-blue)](https://arxiv.org/abs/2508.14880)
  [![github](https://img.shields.io/badge/github-MedResearcher-orange)](https://github.com/AQ-MedAI/MedResearcher-R1)
  [![license](https://img.shields.io/badge/license-MIT-white)](https://github.com/AQ-MedAI/MedResearcher-R1/blob/main/LICENSE)

+ ### Author List
  Ailing Yu, Lan Yao, Jingnan Liu, Zhe Chen, Jiajun Yin, Yuan Wang, Xinhao Liao, Zhiling Ye, Ji Li, Yun Yue, Hansong Xiao, Hualei Zhou, Chunxiao Guo, Peng Wei, Jinjie Gu

+ ### Abstract
  Recent developments in Large Language Model (LLM)-based agents have shown impressive capabilities spanning multiple domains, exemplified by deep research systems that demonstrate superior performance on complex information-seeking and synthesis tasks. While general-purpose deep research agents have shown impressive capabilities, they struggle significantly with medical domain challenges: the MedBrowseComp benchmark reveals that even GPT-o3 deep research, the leading proprietary deep research system, achieves only 25.5% accuracy on complex medical queries. The key limitations are: (1) insufficient dense medical knowledge for clinical reasoning, and (2) lack of medical-specific retrieval tools. We present a medical deep research agent that addresses these challenges through two core innovations. First, we develop a novel data synthesis framework using medical knowledge graphs, extracting longest chains from subgraphs around rare medical entities to generate complex multi-hop QA pairs. Second, we integrate a custom-built private medical retrieval engine alongside general-purpose tools, enabling accurate medical information synthesis. Our approach generates 2,100 diverse trajectories across 12 medical specialties, each averaging 4.2 tool interactions. Through a two-stage training paradigm combining supervised fine-tuning and online reinforcement learning with composite rewards, our open-source 32B model achieves competitive performance on general benchmarks (GAIA: 53.4, xBench: 54), comparable to GPT-4o-mini, while outperforming significantly larger proprietary models. More importantly, we establish a new state-of-the-art on MedBrowseComp with 27.5% accuracy, surpassing leading closed-source deep research systems including O3 deep research, substantially advancing medical deep research capabilities. Our work demonstrates that strategic domain-specific innovations in architecture, tool design, and training data construction can enable smaller open-source models to outperform much larger proprietary systems in specialized domains. Code and datasets will be released to facilitate further research.

+ <div align="center">
+ <img src="https://github.com/AQ-MedAI/MedResearcher-R1/raw/main/assets/logo.png" alt="logo" width="300"/>
+ </div>
+
+ **MedResearcher-R1** is a comprehensive **training data generation and synthesis framework** that tackles the challenge of domain-specific AI reasoning through **knowledge-informed trajectory synthesis**. Our framework provides an end-to-end solution for generating high-quality training data, consisting of three integrated components:
+
+ **🧠 Knowledge Graph Construction**: Our core innovation - an intelligent knowledge graph construction and QA synthesis system that transforms domain knowledge into high-quality question-answer pairs with automated reasoning path generation. This module serves as the foundation for creating domain-specific training data.
+
+ <div align="center">
+ <img src="https://github.com/AQ-MedAI/MedResearcher-R1/raw/main/assets/qa_generation_system.png" alt="Knowledge Graph Construction Diagram"/>
+ </div>
+
+ **🔄 Trajectory Generation Pipeline**: End-to-end trajectory synthesis and optimization system that converts QA pairs into multi-turn reasoning trajectories with tool interactions and quality filtering for model training.
+
+ **📊 Evaluation Pipeline**: Comprehensive model evaluation and validation framework for assessing reasoning performance across multiple benchmarks and validating the quality of synthesized training data.
+
+ These three components form a complete **training data production pipeline** from knowledge extraction to model training data generation and evaluation, enabling the creation of specialized reasoning models for domain-specific applications.
+
+ ## Features
+ - **Knowledge Graph Construction**
+   - **Interface Support**: Interactive web visualization with D3.js force-directed graphs
+   - **Advanced Sampling Algorithms**: 5 sophisticated strategies (mixed, augmented_chain, community_core_path, dual_core_bridge, max_chain) for complex subgraph extraction
+   - **Unified QA Generation**: Deep concept obfuscation with quantitative reasoning and multi-paradigm question synthesis
+   - **Reasoning Path Generation**: Automated cheat_sheet creation with detailed step-by-step reasoning guidance for complex multi-hop questions
+   - **Batch Processing System**: Concurrent QA generation with intelligent QPS control, progress monitoring, and resume capability
+
+ - **Trajectory Generation Pipeline**
+   - **Agent Framework**: Multi-turn reasoning with tool integration and concurrent task processing
+   - **Advanced Quality Filtering**: Token-based validation, tool call/response matching, and automated error detection
+   - **Intelligent Rewriting System**: LLM-powered trajectory optimization with Masked Trajectory Guidance (MTG)
+
+ - **Evaluation Pipeline**
+   - **Interactive Question Reasoning**: Single-question mode with detailed step-by-step process visualization
+   - **Batch Dataset Evaluation**: Multi-worker parallel processing with configurable rollouts and timeout controls
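As a toy illustration of the chain-extraction idea behind a `max_chain`-style strategy, the sketch below finds the longest simple path in a tiny subgraph built around a rare entity. The graph, the entity names, and both helper functions are hypothetical simplifications for illustration only; the repository's actual sampler is considerably more elaborate.

```python
# Hedged sketch: longest-chain extraction from a small knowledge subgraph.
# Entities and relations below are made up; real subgraphs come from the
# framework's knowledge graph construction module.

def longest_chain(graph, node, visited=frozenset()):
    """Return the longest simple path starting at `node`, via DFS."""
    visited = visited | {node}
    best = [node]
    for nxt in graph.get(node, []):
        if nxt not in visited:
            cand = [node] + longest_chain(graph, nxt, visited)
            if len(cand) > len(best):
                best = cand
    return best

def max_chain(graph):
    """Longest chain over all start nodes; a multi-hop QA pair can then be
    phrased so answering it requires traversing every edge in order."""
    return max((longest_chain(graph, n) for n in graph), key=len)

# Toy undirected subgraph around a rare entity (hypothetical relations).
toy = {
    "rare_disease_X": ["gene_A"],
    "gene_A": ["rare_disease_X", "pathway_B"],
    "pathway_B": ["gene_A", "drug_C"],
    "drug_C": ["pathway_B"],
}
chain = max_chain(toy)
print(chain)  # ['rare_disease_X', 'gene_A', 'pathway_B', 'drug_C']
```

The exhaustive DFS is exponential in general, which is fine for the small per-entity subgraphs this kind of sampling operates on.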
+
+ ## Performance Highlights
+
+ Using our knowledge-informed trajectory synthesis framework, we developed **MedResearcher-R1**, a specialized reasoning model that demonstrates exceptional performance across multiple challenging benchmarks, including MedBrowseComp, GAIA, and XBench-DeepSearch.

+ <div align="center">
+ <img src="https://github.com/AQ-MedAI/MedResearcher-R1/raw/main/assets/performance.jpg" alt="Performance Table"/>
+ </div>

+ ## Open-Sourced Dataset
+
+ We have open-sourced a high-quality QA dataset constructed with our KnowledgeGraphConstruction module. The dataset is available at [`TrajectoryGenerationPipeline/qa_data/open_data.jsonl`](https://github.com/AQ-MedAI/MedResearcher-R1/blob/main/TrajectoryGenerationPipeline/qa_data/open_data.jsonl) and contains:
+
+ - **Complex reasoning question-answer pairs**: multi-hop QA pairs generated using our graph method
+ - **Detailed step-by-step reasoning paths** for each question, providing comprehensive problem-solving guidance
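The released file is line-delimited JSON, so it can be loaded with a few lines of standard-library Python. Note that the field names in the demo below (`question`, `answer`) are illustrative assumptions, not the file's documented schema; inspect one line of `open_data.jsonl` to confirm the actual keys.

```python
# Minimal sketch of loading a JSONL QA file; field names are assumed.
import json

def load_qa_pairs(path):
    """Read a JSONL file: one JSON object per line, blank lines skipped."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Tiny demo with an in-memory sample instead of the real dataset.
sample = '{"question": "q1", "answer": "a1"}\n\n{"question": "q2", "answer": "a2"}\n'
with open("demo.jsonl", "w", encoding="utf-8") as f:
    f.write(sample)
pairs = load_qa_pairs("demo.jsonl")
print(len(pairs))  # 2
```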
+
+ ## Quick Start: Run the Model for Evaluation
+
+ You can run a server for the model via `sglang` or `vllm` for evaluation, as described in the GitHub repository's [Quick start](https://github.com/AQ-MedAI/MedResearcher-R1#quick-start) section.
+
+ First, install `sglang` and launch the server (shown here across two GPUs with tensor parallelism):
+ ```bash
+ pip install "sglang[all]"
+ CUDA_VISIBLE_DEVICES=0,1 python -m sglang.launch_server --model-path /path/to/your/model --port 6001 --host 0.0.0.0 --mem-fraction-static 0.95 --tp-size 2
+ ```
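Once launched, the server exposes an OpenAI-compatible HTTP API, so it can be queried with a small standard-library client like the sketch below. The helper names, the example question, and the `model` value are our own illustrative choices, not part of the repository; the port matches the launch command above.

```python
# Hedged sketch: query the local OpenAI-compatible endpoint served above.
import json
import urllib.request

def build_chat_request(question, model="MedResearcher-R1", temperature=0.6):
    """Assemble an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "temperature": temperature,
    }

def query_server(payload, base_url="http://localhost:6001"):
    """POST the payload and return the first reply's text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    payload = build_chat_request("Which CFTR mutation most commonly causes cystic fibrosis?")
    # Requires the sglang server from the command above to be running:
    # print(query_server(payload))
```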
+ Then, evaluate model performance with the Evaluation Pipeline, as detailed in the [GitHub repo](https://github.com/AQ-MedAI/MedResearcher-R1):
+ ```bash
+ cd ../EvaluationPipeline
+ # Run a single-question evaluation
+ python eval_cli.py --mode interactive
+
+ # Run a batch dataset evaluation
+ python eval_cli.py --mode batch --dataset sample --workers 20
  ```
+
+ ## ✍️ Citation
+ ```bibtex
+ @article{ant2025medresearcher,
  title={MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework},
  author={Ailing Yu, Lan Yao, Jingnan Liu, Zhe Chen, Jiajun Yin, Yuan Wang, Xinhao Liao, Zhiling Ye, Ji Li, Yun Yue, Hansong Xiao, Hualei Zhou, Chunxiao Guo, Peng Wei, Jinjie Gu},
  journal={arXiv preprint},
+ url={https://arxiv.org/abs/2508.14880},
  year={2025}
  }
  ```

  ## 📜 License
+ MedResearcher-R1 is licensed under the MIT license.
+
+ ---
+
+ <div align="center">
+
+ [![Star History Chart](https://api.star-history.com/svg?repos=AQ-MedAI/MedResearcher-R1&type=Date)](https://star-history.com/#AQ-MedAI/MedResearcher-R1&Date)
+
+ </div>