Raymond-dev-546730 committed
Commit 7447cb0 · verified · 1 Parent(s): c90062c

Upload Training_Documentation.txt

Files changed (1)
  1. Training/Training_Documentation.txt +59 -0
Training/Training_Documentation.txt ADDED
ClinicalThought-AI-8B Training Documentation
===============================================

Model Training Details
---------------------

Base Model: Granite 3.3 8B Instruct
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Infrastructure: Single NVIDIA RTX 6000 Ada Generation GPU
Training Duration: Approximately 75.8 hours
Training Dataset: Custom curated dataset for medical reasoning

Dataset Specifications
---------------------

Total Token Count: 38,514,400
Total Sample Count: 29,500
Average Tokens/Sample: 1305.57
Dataset Creation: Combination of public medical reasoning datasets from OpenAI o1 and DeepSeek-R1, plus additional reasoning chains generated with Claude Sonnet 4 extended thinking

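The average tokens-per-sample figure is simply the ratio of the two counts above, which a quick arithmetic check confirms:

```python
# Verify the reported dataset statistics (pure arithmetic, no external data).
total_tokens = 38_514_400
total_samples = 29_500

average = total_tokens / total_samples
print(round(average, 2))  # 1305.57, matching the reported figure
```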
Training Configuration
---------------------

LoRA Parameters:
- Rank: 32
- Alpha: 64
- Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head

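As a sketch only, these LoRA parameters could be expressed with the Hugging Face peft library as below; peft is an assumption here, since the documentation does not name the fine-tuning framework actually used.

```python
# Hypothetical peft configuration mirroring the LoRA parameters above.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,              # rank
    lora_alpha=64,     # scaling factor
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
        "lm_head",                               # output head
    ],
    task_type="CAUSAL_LM",
)
```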
Training Hyperparameters:
- Learning Rate: 2e-5
- Batch Size: 1
- Gradient Accumulation: 8
- Effective Batch Size: 8
- Max Sequence Length: 12,000
- Epochs: 8
- Warmup Ratio: 0.05
- Weight Decay: 0.005
- Max Grad Norm: 1.0
- LR Scheduler: Cosine with Restarts

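Assuming a partial final batch rounds up to a full optimizer step each epoch, the effective batch size and the step count reported under Training Performance follow directly from these hyperparameters:

```python
import math

batch_size = 1    # per-device batch size
grad_accum = 8    # gradient accumulation steps
samples = 29_500  # total training samples
epochs = 8

effective_batch = batch_size * grad_accum
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs
print(effective_batch, total_steps)  # 8 29504
```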
Hardware & Environment
---------------------

GPU: NVIDIA RTX 6000 Ada Generation (48 GB)
Operating System: Ubuntu
CUDA Version: 11.8
PyTorch Version: 2.7.0
Compute Capability: 8.9
Optimization: FP16, Gradient Checkpointing

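For illustration, the hyperparameters above together with the FP16 and gradient-checkpointing optimizations might be expressed as Hugging Face transformers TrainingArguments; the trainer choice and output path are assumptions, not confirmed by this documentation. (Max sequence length is handled on the tokenizer side, not here.)

```python
# Hypothetical transformers configuration mirroring the settings above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="clinicalthought-ai-8b-lora",  # hypothetical path
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,       # effective batch size of 8
    num_train_epochs=8,
    warmup_ratio=0.05,
    weight_decay=0.005,
    max_grad_norm=1.0,
    lr_scheduler_type="cosine_with_restarts",
    fp16=True,                           # mixed-precision training
    gradient_checkpointing=True,         # trade compute for activation memory
)
```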
Training Performance
---------------------

Training Runtime: 75.8 hours (272,919 seconds)
Train Samples/Second: 0.865
Train Steps/Second: 0.108
Training Loss (Final): 0.738
Total Training Steps: 29,504
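These performance figures are mutually consistent, as a quick cross-check of the reported runtime against the throughput numbers shows:

```python
# Cross-check the reported runtime and throughput figures.
runtime_s = 272_919
samples = 29_500
epochs = 8
total_steps = 29_504

print(round(runtime_s / 3600, 1))              # 75.8 hours
print(round(samples * epochs / runtime_s, 3))  # 0.865 samples/s
print(round(total_steps / runtime_s, 3))       # 0.108 steps/s
```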