AdAstraAbyssoque committed
Commit 88370c2 · verified · 1 Parent(s): d50d5b7

Update README.md

Files changed (1)
  1. README.md +107 -74
README.md CHANGED
@@ -1,85 +1,118 @@
  ---
  library_name: transformers
- license: other
  base_model: google/gemma-2-2b
  tags:
  - llama-factory
  - full
  - generated_from_trainer
  model-index:
- - name: graphreason_v2
  results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # graphreason_v2
-
- This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on the /mnt/nas/nuochen/pretrain/cpt/saves/gemma_2b/graphreason_v2/ dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.2455
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 3e-05
- - train_batch_size: 2
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 32
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 1024
- - total_eval_batch_size: 32
- - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 3.0
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 0.3897 | 0.1446 | 100 | 0.3410 |
- | 0.3792 | 0.2893 | 200 | 0.3136 |
- | 0.3701 | 0.4339 | 300 | 0.2808 |
- | 0.3693 | 0.5786 | 400 | 0.2640 |
- | 0.3644 | 0.7232 | 500 | 0.2601 |
- | 0.3525 | 0.8678 | 600 | 0.2573 |
- | 0.3781 | 1.0125 | 700 | 0.2628 |
- | 0.3443 | 1.1571 | 800 | 0.2520 |
- | 0.3402 | 1.3018 | 900 | 0.2502 |
- | 0.3307 | 1.4464 | 1000 | 0.2489 |
- | 0.344 | 1.5910 | 1100 | 0.2501 |
- | 0.335 | 1.7357 | 1200 | 0.2465 |
- | 0.325 | 1.8803 | 1300 | 0.2460 |
- | 0.3164 | 2.0250 | 1400 | 0.2465 |
- | 0.3249 | 2.1696 | 1500 | 0.2468 |
- | 0.3202 | 2.3142 | 1600 | 0.2467 |
- | 0.3224 | 2.4589 | 1700 | 0.2458 |
- | 0.3264 | 2.6035 | 1800 | 0.2456 |
- | 0.3259 | 2.7481 | 1900 | 0.2457 |
- | 0.3253 | 2.8928 | 2000 | 0.2455 |
-
-
- ### Framework versions
-
- - Transformers 4.46.0
- - Pytorch 2.4.0+cu121
- - Datasets 3.0.1
- - Tokenizers 0.20.3
  ---
  library_name: transformers
+ license: mit
  base_model: google/gemma-2-2b
  tags:
  - llama-factory
  - full
  - generated_from_trainer
  model-index:
+ - name: GraphMind-Gemma2-2B
  results: []
  ---
 
+
+ # Model Card for GraphMind Series
+
+ This model card describes the **GraphMind** series of models: Large Language Models (LLMs) enhanced for generalized reasoning through continued pre-training on graph-based problems.
+
+ ## Model Description
+
+ GraphMind is a series of Large Language Models developed to improve the generalized reasoning capabilities of existing base models.
+ The core innovation is continued pre-training (CPT) on **GraphPile**, a large-scale, 10.9-billion-token dataset built specifically around Graph Problem Reasoning (GPR) data.
+
+ By training on diverse and complex graph problems, which require sophisticated logical, topological, and relational reasoning, GraphMind models learn more robust and transferable reasoning patterns.
+ This approach bridges the gap between domain-specific training (e.g., mathematics) and the need for universally capable and adaptable LLMs.
+
+ The GraphMind series is built upon three popular open-source models:
+
+ * Llama 3
+ * Llama 3.1
+ * Gemma 2
+
+ ## Key Features
+
+ - **Enhanced General Reasoning**: Significant gains not only on graph-related tasks but also across mathematical, logical, commonsense, and code reasoning benchmarks.
+ - **Superior Performance on Graph Problems**: Thanks to the GraphPile corpus, the models excel at tasks involving graph theory, such as pathfinding, network analysis, and topological sorting.
+ - **Strong Transfer Learning**: Reasoning skills acquired from graph problems effectively transfer to other domains.
+ - **Excellent Post-Training Potential**: Stronger foundation for fine-tuning on downstream tasks. For instance, the Gemma-based GraphMind fine-tuned on GSM8K achieves **23.6% higher accuracy** than its fine-tuned base model.
+
+ ## Performance
+
+ GraphMind models show consistent improvements over their base models across reasoning benchmarks.
+
+ **Generalization Improvements**:
+
+ - **Mathematical Reasoning**: Up to **4.9%** average improvement across 11 datasets.
+ - **Logical Reasoning**: **33.4%** improvement.
+ - **Code Reasoning**: **46.3%** improvement.
+ - **Commonsense Reasoning**: **7.8%** improvement.
+ - **Multi-Hop QA**: **10.3%** improvement.
+
+ **Foundational Improvements**:
+
+ - **Graph Problem Reasoning**: Average improvement of **53.1%** compared to baseline models.
+
+ ## Training Data: The GraphPile Corpus
+
+ GraphMind's capabilities derive from training on **GraphPile**, the first large-scale corpus designed for continued pre-training with Graph Problem Reasoning data.
+
+ **Statistics**:
+
+ - **Total Tokens**: 10.9 billion
+ - **Total Samples**: 2.68 million
+ - **Graph Tasks**: 23 distinct tasks covering multiple reasoning paradigms
+
+ **Data Components**:
+
+ 1. **Chain-of-Thought (CoT) Data**: Step-by-step reasoning processes for graph problems, generated with program-guided methods.
+ 2. **Program-of-Thought (PoT) Data**: Executable code solutions for graph problems, often derived from standard libraries (see the illustrative sketch after this list).
+ 3. **Trace-of-Execution (ToE) Data**: Records of the execution traces of graph algorithms, enabling learning from dynamic algorithmic processes.
+ 4. **Real-world Graph Data**: Tasks drawn from sources such as DBpedia and DBLP, enriching the dataset with practical contexts.
+
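+ To make these formats concrete, the snippet below is a minimal, hypothetical sketch of what a PoT-style sample could look like (the printed visit order also gestures at what a ToE-style trace records); the problem instance and code are illustrative and are not drawn from GraphPile.
+
+ ```python
+ # Hypothetical PoT-style sample: the task is stated in comments and the
+ # "thought" is an executable solution (plain BFS over an adjacency list).
+ from collections import deque
+
+ # Problem (illustrative): length of the shortest path from node 0 to node 4
+ # in an undirected, unweighted graph given as an edge list.
+ edges = [(0, 1), (1, 2), (2, 4), (0, 3), (3, 4)]
+
+ adj = {}
+ for u, v in edges:  # build the adjacency list
+     adj.setdefault(u, []).append(v)
+     adj.setdefault(v, []).append(u)
+
+ dist = {0: 0}  # BFS: distance grows by one per layer
+ queue = deque([0])
+ while queue:
+     node = queue.popleft()
+     print(f"visit {node} (dist {dist[node]})")  # crude ToE-style trace
+     for nxt in adj[node]:
+         if nxt not in dist:
+             dist[nxt] = dist[node] + 1
+             queue.append(nxt)
+
+ print(dist[4])  # -> 2, via the path 0 -> 3 -> 4
+ ```
+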
+ ## Training Procedure
+
+ The GraphMind models were produced by continued pre-training on the GraphPile dataset with the following key settings (a configuration sketch follows the list):
+
+ * **Base Models**: Llama-3-8B, Llama-3.1-8B, Gemma-2-2B
+ * **Learning Rate**: 3e-5
+ * **Epochs**: 3
+ * **Max Sequence Length**: 8192
+ * **Global Batch Size**: 1024
+ * **Hardware**: 32 × NVIDIA H100 GPUs
+
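+ The card's tags indicate the run was driven with LLaMA-Factory. Purely as an illustration, the sketch below shows how a comparable run could be wired up with the Hugging Face `Trainer`, reusing the settings above plus details from the earlier card (per-device batch size 2, gradient accumulation 16, cosine schedule, 10% warmup); it is not the authors' actual configuration, and the data file name is a placeholder.
+
+ ```python
+ # Rough continued-pre-training sketch with the Hugging Face Trainer.
+ # NOT the authors' exact setup (they used LLaMA-Factory); paths are placeholders.
+ from datasets import load_dataset
+ from transformers import (AutoModelForCausalLM, AutoTokenizer,
+                           DataCollatorForLanguageModeling, Trainer,
+                           TrainingArguments)
+
+ base = "google/gemma-2-2b"
+ tokenizer = AutoTokenizer.from_pretrained(base)
+ model = AutoModelForCausalLM.from_pretrained(base)
+
+ # Placeholder corpus: plain-text graph-reasoning documents.
+ raw = load_dataset("text", data_files={"train": "graphpile_sample.txt"})
+
+ def tokenize(batch):
+     return tokenizer(batch["text"], truncation=True, max_length=8192)
+
+ train = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
+
+ args = TrainingArguments(
+     output_dir="graphmind-cpt",
+     learning_rate=3e-5,
+     num_train_epochs=3,
+     per_device_train_batch_size=2,   # 2 x 16 accumulation x 32 GPUs = 1024 global
+     gradient_accumulation_steps=16,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     bf16=True,
+     seed=42,
+ )
+
+ trainer = Trainer(
+     model=model,
+     args=args,
+     train_dataset=train,
+     data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
+ )
+ trainer.train()
+ ```
+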
+ ## Intended Use and Limitations
+
+ ### Intended Use
+
+ These models are intended for research and development on tasks that demand strong, generalized reasoning. Potential applications include:
+
+ * Solving complex logical and mathematical problems.
+ * Algorithmic reasoning and code generation for graph-related tasks.
+ * Serving as powerful base models for fine-tuning on reasoning-intensive downstream tasks.
+
+ ### Limitations
+
+ * GraphPile covers 23 graph problem tasks; greater task diversity could further improve results.
+ * As reasoning-focused models, the GraphMind models may underperform on simpler, non-reasoning tasks such as summarization or translation.
+ * Alternative GraphPile data mixtures have not been fully explored and could yield additional gains.
+
+ ## Available Models
+
+ The released checkpoints are listed below; a usage sketch follows the list.
+
+ * **HKUST-DSAIL/GraphMind-Gemma2-2B**
+ * **HKUST-DSAIL/GraphMind-LLAMA-3.1-8B**
+ * **HKUST-DSAIL/GraphMind-LLAMA-3-8B**
+
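+ A minimal loading sketch using the standard `transformers` generation API is shown below; the checkpoint id is the Gemma-2 variant listed above and the prompt is purely illustrative. Note that these are base (continued-pre-trained) models, so they are prompted as completion models rather than chat models.
+
+ ```python
+ # Minimal inference sketch for a GraphMind checkpoint (illustrative prompt).
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "HKUST-DSAIL/GraphMind-Gemma2-2B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ prompt = (
+     "Question: A directed graph has edges (A->B), (B->C) and (A->C). "
+     "Is there a path from A to C? Answer step by step.\nAnswer:"
+ )
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=128)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+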
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025improving,
+   title={Improving LLMs' Generalized Reasoning Abilities by Graph Problems},
+   author={Qifan Zhang and Nuo Chen and Zehua Li and Miao Peng and Jing Tang and Jia Li},
+   year={2025},
+   eprint={2507.17168},
+   archivePrefix={arXiv},
+   primaryClass={cs.AI},
+   url={https://arxiv.org/abs/2507.17168v1}
+ }
+ ```