AdAstraAbyssoque committed
Commit 88370c2 · verified · 1 Parent(s): d50d5b7

Update README.md

Files changed (1)
  1. README.md +107 -74
README.md CHANGED
@@ -1,85 +1,118 @@
  ---
  library_name: transformers
- license: other
  base_model: google/gemma-2-2b
  tags:
  - llama-factory
  - full
  - generated_from_trainer
  model-index:
- - name: graphreason_v2
  results: []
  ---
 
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
- # graphreason_v2
-
- This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) on the /mnt/nas/nuochen/pretrain/cpt/saves/gemma_2b/graphreason_v2/ dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.2455
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 3e-05
- - train_batch_size: 2
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 32
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 1024
- - total_eval_batch_size: 32
- - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 3.0
-
- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:------:|:----:|:---------------:|
- | 0.3897 | 0.1446 | 100 | 0.3410 |
- | 0.3792 | 0.2893 | 200 | 0.3136 |
- | 0.3701 | 0.4339 | 300 | 0.2808 |
- | 0.3693 | 0.5786 | 400 | 0.2640 |
- | 0.3644 | 0.7232 | 500 | 0.2601 |
- | 0.3525 | 0.8678 | 600 | 0.2573 |
- | 0.3781 | 1.0125 | 700 | 0.2628 |
- | 0.3443 | 1.1571 | 800 | 0.2520 |
- | 0.3402 | 1.3018 | 900 | 0.2502 |
- | 0.3307 | 1.4464 | 1000 | 0.2489 |
- | 0.344 | 1.5910 | 1100 | 0.2501 |
- | 0.335 | 1.7357 | 1200 | 0.2465 |
- | 0.325 | 1.8803 | 1300 | 0.2460 |
- | 0.3164 | 2.0250 | 1400 | 0.2465 |
- | 0.3249 | 2.1696 | 1500 | 0.2468 |
- | 0.3202 | 2.3142 | 1600 | 0.2467 |
- | 0.3224 | 2.4589 | 1700 | 0.2458 |
- | 0.3264 | 2.6035 | 1800 | 0.2456 |
- | 0.3259 | 2.7481 | 1900 | 0.2457 |
- | 0.3253 | 2.8928 | 2000 | 0.2455 |
-
-
- ### Framework versions
-
- - Transformers 4.46.0
- - Pytorch 2.4.0+cu121
- - Datasets 3.0.1
- - Tokenizers 0.20.3
  ---
  library_name: transformers
+ license: mit
  base_model: google/gemma-2-2b
  tags:
  - llama-factory
  - full
  - generated_from_trainer
  model-index:
+ - name: GraphMind-Gemma2-2B
  results: []
  ---
 
+
+ # Model Card for GraphMind Series
+
+ This model card describes the **GraphMind** series of models: Large Language Models (LLMs) enhanced for generalized reasoning through continued pre-training on graph-based problems.
+
+ ## Model Description
+
+ GraphMind is a series of Large Language Models developed to improve the generalized reasoning capabilities of existing base models.
+ The core innovation is continued pre-training (CPT) on **GraphPile**, a large-scale, 10.9-billion-token dataset built specifically around Graph Problem Reasoning (GPR) data.
+
+ By training on diverse and complex graph problems, which require sophisticated logical, topological, and relational reasoning, GraphMind models learn more robust and transferable reasoning patterns.
+ This approach bridges the gap between domain-specific training (e.g., mathematics) and the need for universally capable and adaptable LLMs.
+
+ The GraphMind series is built upon three popular open-source models:
+
+ * Llama 3
+ * Llama 3.1
+ * Gemma 2
+
+ ## Key Features
+
+ - **Enhanced General Reasoning**: Significant gains not only on graph-related tasks but also across mathematical, logical, commonsense, and code reasoning benchmarks.
+ - **Superior Performance on Graph Problems**: Thanks to the GraphPile corpus, the models excel at tasks involving graph theory, such as pathfinding, network analysis, and topological sorting.
+ - **Strong Transfer Learning**: Reasoning skills acquired from graph problems effectively transfer to other domains.
+ - **Excellent Post-Training Potential**: Stronger foundation for fine-tuning on downstream tasks. For instance, the Gemma-based GraphMind fine-tuned on GSM8K achieves **23.6% higher accuracy** than its fine-tuned base model.
+
+ ## Performance
+
+ GraphMind models show consistent improvements over their base models across reasoning benchmarks.
+
+ **Generalization Improvements**:
+
+ - **Mathematical Reasoning**: Up to **4.9%** average improvement across 11 datasets.
+ - **Logical Reasoning**: **33.4%** improvement.
+ - **Code Reasoning**: **46.3%** improvement.
+ - **Commonsense Reasoning**: **7.8%** improvement.
+ - **Multi-Hop QA**: **10.3%** improvement.
+
+ **Foundational Improvements**:
+
+ - **Graph Problem Reasoning**: Average improvement of **53.1%** compared to baseline models.
+
+ ## Training Data: The GraphPile Corpus
+
+ GraphMind's capabilities derive from training on **GraphPile**, the first large-scale corpus designed for continued pre-training with Graph Problem Reasoning data.
+
+ **Statistics**:
+
+ - **Total Tokens**: 10.9 billion
+ - **Total Samples**: 2.68 million
+ - **Graph Tasks**: 23 distinct tasks covering multiple reasoning paradigms
+
+ **Data Components**:
+
+ 1. **Chain-of-Thought (CoT) Data**: Step-by-step reasoning processes for graph problems, generated with program-guided methods.
+ 2. **Program-of-Thought (PoT) Data**: Executable code solutions for graph problems, often derived from standard libraries (see the illustrative sketch after this list).
+ 3. **Trace-of-Execution (ToE) Data**: Records of the execution traces of graph algorithms, enabling learning from dynamic algorithmic processes.
+ 4. **Real-world Graph Data**: Tasks drawn from sources such as DBpedia and DBLP, enriching the dataset with practical contexts.
+
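+ To make these formats concrete, the snippet below is a minimal, hypothetical sketch of what a PoT-style sample could look like (the printed visit order also gestures at what a ToE-style trace records); the problem instance and code are illustrative and are not drawn from GraphPile.
+
+ ```python
+ # Hypothetical PoT-style sample: the task is stated in comments and the
+ # "thought" is an executable solution (plain BFS over an adjacency list).
+ from collections import deque
+
+ # Problem (illustrative): length of the shortest path from node 0 to node 4
+ # in an undirected, unweighted graph given as an edge list.
+ edges = [(0, 1), (1, 2), (2, 4), (0, 3), (3, 4)]
+
+ adj = {}
+ for u, v in edges:  # build the adjacency list
+     adj.setdefault(u, []).append(v)
+     adj.setdefault(v, []).append(u)
+
+ dist = {0: 0}  # BFS: distance grows by one per layer
+ queue = deque([0])
+ while queue:
+     node = queue.popleft()
+     print(f"visit {node} (dist {dist[node]})")  # crude ToE-style trace
+     for nxt in adj[node]:
+         if nxt not in dist:
+             dist[nxt] = dist[node] + 1
+             queue.append(nxt)
+
+ print(dist[4])  # -> 2, via the path 0 -> 3 -> 4
+ ```
+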
+ ## Training Procedure
+
+ The GraphMind models were produced by continued pre-training on the GraphPile dataset with the following key settings (a configuration sketch follows the list):
+
+ * **Base Models**: Llama-3-8B, Llama-3.1-8B, Gemma-2-2B
+ * **Learning Rate**: 3e-5
+ * **Epochs**: 3
+ * **Max Sequence Length**: 8192
+ * **Global Batch Size**: 1024
+ * **Hardware**: 32 × NVIDIA H100 GPUs
+
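+ The card's tags indicate the run was driven with LLaMA-Factory. Purely as an illustration, the sketch below shows how a comparable run could be wired up with the Hugging Face `Trainer`, reusing the settings above plus details from the earlier card (per-device batch size 2, gradient accumulation 16, cosine schedule, 10% warmup); it is not the authors' actual configuration, and the data file name is a placeholder.
+
+ ```python
+ # Rough continued-pre-training sketch with the Hugging Face Trainer.
+ # NOT the authors' exact setup (they used LLaMA-Factory); paths are placeholders.
+ from datasets import load_dataset
+ from transformers import (AutoModelForCausalLM, AutoTokenizer,
+                           DataCollatorForLanguageModeling, Trainer,
+                           TrainingArguments)
+
+ base = "google/gemma-2-2b"
+ tokenizer = AutoTokenizer.from_pretrained(base)
+ model = AutoModelForCausalLM.from_pretrained(base)
+
+ # Placeholder corpus: plain-text graph-reasoning documents.
+ raw = load_dataset("text", data_files={"train": "graphpile_sample.txt"})
+
+ def tokenize(batch):
+     return tokenizer(batch["text"], truncation=True, max_length=8192)
+
+ train = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
+
+ args = TrainingArguments(
+     output_dir="graphmind-cpt",
+     learning_rate=3e-5,
+     num_train_epochs=3,
+     per_device_train_batch_size=2,   # 2 x 16 accumulation x 32 GPUs = 1024 global
+     gradient_accumulation_steps=16,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     bf16=True,
+     seed=42,
+ )
+
+ trainer = Trainer(
+     model=model,
+     args=args,
+     train_dataset=train,
+     data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
+ )
+ trainer.train()
+ ```
+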
+ ## Intended Use and Limitations
+
+ ### Intended Use
+
+ These models are intended for research and development on tasks that demand strong, generalized reasoning. Potential applications include:
+
+ * Solving complex logical and mathematical problems.
+ * Algorithmic reasoning and code generation for graph-related tasks.
+ * Serving as powerful base models for fine-tuning on reasoning-intensive downstream tasks.
+
+ ### Limitations
+
+ * GraphPile covers 23 graph problem tasks; greater task diversity could further improve results.
+ * As reasoning-focused models, the GraphMind models may underperform on simpler, non-reasoning tasks such as summarization or translation.
+ * Alternative GraphPile data mixtures have not been fully explored and could yield additional gains.
+
+ ## Available Models
+
+ The released checkpoints are listed below; a usage sketch follows the list.
+
+ * **HKUST-DSAIL/GraphMind-Gemma2-2B**
+ * **HKUST-DSAIL/GraphMind-LLAMA-3.1-8B**
+ * **HKUST-DSAIL/GraphMind-LLAMA-3-8B**
+
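+ A minimal loading sketch using the standard `transformers` generation API is shown below; the checkpoint id is the Gemma-2 variant listed above and the prompt is purely illustrative. Note that these are base (continued-pre-trained) models, so they are prompted as completion models rather than chat models.
+
+ ```python
+ # Minimal inference sketch for a GraphMind checkpoint (illustrative prompt).
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "HKUST-DSAIL/GraphMind-Gemma2-2B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id, torch_dtype=torch.bfloat16, device_map="auto"
+ )
+
+ prompt = (
+     "Question: A directed graph has edges (A->B), (B->C) and (A->C). "
+     "Is there a path from A to C? Answer step by step.\nAnswer:"
+ )
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=128)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+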
+ ## Citation
+
+ ```bibtex
+ @misc{zhang2025improving,
+   title={Improving LLMs' Generalized Reasoning Abilities by Graph Problems},
+   author={Qifan Zhang and Nuo Chen and Zehua Li and Miao Peng and Jing Tang and Jia Li},
+   year={2025},
+   eprint={2507.17168},
+   archivePrefix={arXiv},
+   primaryClass={cs.AI},
+   url={https://arxiv.org/abs/2507.17168v1}
+ }
+ ```