typo fix
README.md
CHANGED
@@ -12,7 +12,7 @@ pipeline_tag: text-generation
 
 **DAT Byte** is a family of byte-level **D**ifferential-**A**ttention **T**ransformers, trained from scratch on an RTX 5090.
 This model is the smallest in the family, with approximately 200 million parameters.
-It was trained on a set of Discord chat data, public domain books, and English Bible translations. Larger models in the family
+It was trained on a set of Discord chat data, public domain books, and English Bible translations. Larger models in the family received a larger and more diverse training set.
 
 ---
 
@@ -29,7 +29,7 @@ The training data was composed exclusively of the following sources:
 All listed datasets were used **in full**, and **no additional data sources** were used.
 
 The Discord datasets (combined ~693MB) were formatted in **ChatML**, with usernames serving as speaker roles, enabling the model to learn natural dialogue structure and dynamics. Discord data included many diverse topics, especially code. Thus, the model understands basic syntax patterns of some common programming languages. However, due to its lack of training on large-scale, high-quality code samples, generated code is unlikely to be very helpful.
-Larger models in the family
+Larger models in the family received a larger and more diverse training set.
 
 ---
 
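The README states that the Discord data was formatted in ChatML, with usernames serving as speaker roles. As a rough sketch of what that layout typically looks like (the usernames and messages below are hypothetical, not taken from the dataset):

```text
<|im_start|>some_username
hey, anyone know why my loop never exits?<|im_end|>
<|im_start|>another_user
looks like you never increment the counter, post the snippet<|im_end|>
```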