---

## **Abstract**

This repository provides a domain-adapted Turkish legal instruction-tuned model derived from Meta’s Llama-3.1-8B-Instruct. As part of the “Harnessing Fully Sharded Data Parallelism v2 with Float8 Precision for Faster Training” study, this configuration represents the BF16 variant, using the default **Tensorwise** quantization scaling recipe, trained on 4 nodes with a global batch size of 32.

In this scaling regime, FP8 mixed precision did not yield a runtime improvement over BF16, highlighting how FP8 efficiency varies with batch size, sequence parallelism, and multi-node communication overhead. This model provides a strong BF16 baseline for comparison across all batch-size and node-scaling experiments in the study.

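
For quick reference, the snippet below is a minimal inference sketch using the standard `transformers` chat-template flow; the repository id is a placeholder and should be replaced with this model's actual Hugging Face Hub path.

```python
# Minimal inference sketch (illustrative usage, not an official example).
# Replace the placeholder repo id with this model's actual Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<this-model-repo-id>"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example Turkish legal question ("What conditions are required to terminate a lease agreement?")
messages = [{"role": "user", "content": "Kira sözleşmesinin feshi için hangi koşullar gereklidir?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
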
## **Experiment Context**

This model was trained as part of our study comparing **FSDP2 with bfloat16 precision** against **FSDP2 with FP8 mixed precision (bf16-fp8)**.
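
For orientation, here is a minimal sketch of how the two precision configurations can be set up with PyTorch FSDP2 and torchao's float8 training utilities; it assumes PyTorch >= 2.6 and an initialized distributed process group, and it is not the exact training script used in the study.

```python
# Minimal sketch of the two precision configurations compared in the study
# (assumes PyTorch >= 2.6, torchao, and a distributed process group initialized
#  e.g. via torchrun; illustrative only, not the study's actual training script).
import torch
from torch.distributed.fsdp import MixedPrecisionPolicy, fully_shard
from torchao.float8 import Float8LinearConfig, convert_to_float8_training
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype=torch.bfloat16
)

use_fp8 = False  # False = the plain BF16 baseline described in this card
if use_fp8:
    # Swap nn.Linear layers for float8 training; Float8LinearConfig() defaults
    # to tensorwise scaling. The lm_head is kept in high precision.
    convert_to_float8_training(
        model,
        config=Float8LinearConfig(),
        module_filter_fn=lambda module, fqn: "lm_head" not in fqn,
    )

# FSDP2: shard each transformer block, then the root module, with bf16 params
# and fp32 gradient reduction.
mp_policy = MixedPrecisionPolicy(param_dtype=torch.bfloat16, reduce_dtype=torch.float32)
for layer in model.model.layers:
    fully_shard(layer, mp_policy=mp_policy)
fully_shard(model, mp_policy=mp_policy)
```

In this sketch, the BF16 baseline corresponds to `use_fp8 = False`; the FP8 comparison flips only that flag.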