ClinicalThought-AI-8B Training Documentation
===============================================
Model Training Details
---------------------
Base Model: Granite 3.3 8B Instruct
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Infrastructure: Single NVIDIA RTX 6000 Ada Generation GPU
Training Duration: Approximately 75.8 hours
Training Dataset: Custom-curated dataset for medical reasoning
Dataset Specifications
---------------------
Total Token Count: 38,514,400
Total Sample Count: 29,500
Average Tokens/Sample: 1,305.57
Dataset Creation: Built from a combination of public medical reasoning datasets from OpenAI o1 and DeepSeek-R1, supplemented with additional reasoning chains generated using Claude Sonnet 4 extended thinking
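The average tokens-per-sample figure follows directly from the totals above; a minimal check, using only the documented numbers:

    # Average tokens per sample from the documented dataset totals.
    total_tokens = 38_514_400
    total_samples = 29_500
    print(round(total_tokens / total_samples, 2))  # 1305.57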
Training Configuration
---------------------
LoRA Parameters:
- Rank: 32
- Alpha: 64
- Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head
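A minimal sketch of how these adapter settings could be expressed with the Hugging Face PEFT library (the use of PEFT itself is an assumption; only the values listed above come from the training run):

    from peft import LoraConfig

    # LoRA adapter settings as documented above; library choice is assumed.
    lora_config = LoraConfig(
        r=32,
        lora_alpha=64,
        lora_dropout=0.1,
        target_modules=[
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj", "lm_head",
        ],
        task_type="CAUSAL_LM",
    )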
Training Hyperparameters:
- Learning Rate: 2e-5
- Batch Size: 1
- Gradient Accumulation: 8
- Effective Batch Size: 8
- Max Sequence Length: 12,000
- Epochs: 8
- Warmup Ratio: 0.05
- Weight Decay: 0.005
- Max Grad Norm: 1.0
- LR Scheduler: Cosine with Restarts
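A sketch of the same hyperparameters expressed as Hugging Face transformers TrainingArguments, assuming the Trainer/TRL stack was used; the 12,000-token max sequence length would typically be passed to the tokenizer or SFT trainer rather than set here, and the output directory below is hypothetical:

    from transformers import TrainingArguments

    # Hyperparameters as documented above; the training framework is assumed.
    training_args = TrainingArguments(
        output_dir="clinicalthought-ai-8b-lora",  # hypothetical path
        learning_rate=2e-5,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,   # effective batch size = 1 x 8 = 8
        num_train_epochs=8,
        warmup_ratio=0.05,
        weight_decay=0.005,
        max_grad_norm=1.0,
        lr_scheduler_type="cosine_with_restarts",
        fp16=True,
        gradient_checkpointing=True,
    )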
Hardware & Environment
---------------------
GPU: NVIDIA RTX 6000 Ada Generation (48GB)
Operating System: Ubuntu
CUDA Version: 11.8
PyTorch Version: 2.7.0
Compute Capability: 8.9
Optimization: FP16, Gradient Checkpointing
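A small, illustrative sketch for confirming the reported GPU, CUDA, and PyTorch versions on the training machine:

    import torch

    # On the documented setup these should report PyTorch 2.7.0, CUDA 11.8,
    # an RTX 6000 Ada Generation GPU with ~48 GB VRAM, and capability (8, 9).
    print("PyTorch:", torch.__version__)
    print("CUDA:", torch.version.cuda)
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print("GPU:", torch.cuda.get_device_name(0))
        print("Compute capability:", torch.cuda.get_device_capability(0))
        print("VRAM (GB):", round(props.total_memory / 1024**3))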
Training Performance
---------------------
Training Runtime: 75.8 hours (272,919 seconds)
Train Samples/Second: 0.865
Train Steps/Second: 0.108
Training Loss (Final): 0.738
Total Training Steps: 29,504
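These throughput and step-count figures are consistent with the dataset size, epoch count, and effective batch size documented above, as a quick check shows (optimizer steps per epoch are rounded up to whole steps):

    import math

    samples, epochs, effective_batch = 29_500, 8, 8
    runtime_s = 272_919

    steps_per_epoch = math.ceil(samples / effective_batch)  # 3,688
    total_steps = steps_per_epoch * epochs                   # 29,504
    print(total_steps)                                       # 29504
    print(round(samples * epochs / runtime_s, 3))            # 0.865 samples/s
    print(round(total_steps / runtime_s, 3))                 # 0.108 steps/s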