Commit d397d1a
Parent(s): 9b9a502
Update README.md
README.md CHANGED
---
tags:
- pytorch
- causal-lm
metrics:
- accuracy
language:
- sl
license: apache-2.0
---

# GPT-sl-base

GPT-sl-base is a Slovene GPT model based on the [bigscience workshop](https://github.com/bigscience-workshop/Megatron-DeepSpeed) fork of Megatron-LM. It was trained on large Slovene corpora: Gigafida, KAS, slWaC, and MaCoCu.
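
Assuming the checkpoint is published on the Hugging Face Hub in `transformers` format, a minimal usage sketch could look like this (the repository id is a placeholder, not confirmed by this README):

```python
# Minimal generation sketch; the hub id below is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cjvt/gpt-sl-base"  # placeholder repository path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Ljubljana je", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```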

## Model architecture
GPT-sl-base has about 110 million parameters. It consists of 12 transformer layers with a hidden dimension of 768 and 16 attention heads per layer, and it can process sequences of up to 1024 tokens.
The tokenizer was trained on a smaller subset of the corpora and has a vocabulary of 60k tokens.
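
The stated hyperparameters map onto a GPT-2-style `transformers` config roughly as follows; this is a sketch of the architecture, not the config the authors actually shipped:

```python
# Architecture sketch built from the numbers above; the use of GPT2Config
# here is an assumption, not confirmed by this README.
from transformers import GPT2Config

config = GPT2Config(
    vocab_size=60_000,  # 60k-token tokenizer
    n_positions=1024,   # maximum sequence length
    n_embd=768,         # hidden dimension
    n_layer=12,         # transformer layers
    n_head=16,          # attention heads per layer
)
```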

## Training
The model was trained for about 20 epochs: 390k steps in total, or roughly 102B tokens seen during training.
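
(These figures are mutually consistent: 102B tokens over 390k steps is about 262k tokens per step, i.e. an effective batch of roughly 256 sequences of 1024 tokens. This is inferred from the reported numbers; the actual batch size is not stated here.)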

| Step   | Validation Perplexity |
|:------:|:---------------------:|
|  50000 | 26.801 |
| 100000 | 25.574 |
| 150000 | 24.773 |
| 200000 | 24.099 |
| 250000 | 23.336 |
| 300000 | 22.607 |
| 350000 | 22.329 |
| 390000 | 22.293 |
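
For reference, validation perplexity is the exponential of the mean token-level cross-entropy loss. A minimal sketch of computing it with `transformers` (the hub id and evaluation text are placeholders):

```python
# Perplexity sketch: exp(mean cross-entropy) on a held-out text.
# The model id and text below are placeholders, not from this README.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cjvt/gpt-sl-base"  # placeholder repository path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

text = "Slovenija je država v srednji Evropi."  # held-out validation text
enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean cross-entropy.
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"perplexity: {math.exp(loss.item()):.3f}")
```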