typo fix
README.md
CHANGED
@@ -12,7 +12,7 @@ pipeline_tag: text-generation
 
 **DAT Byte** is a family of byte-level **D**ifferential-**A**ttention **T**ransformers, trained from scratch on an RTX 5090.
 This model is the smallest in the family, with approximately 200 million parameters.
-It was trained on a set of Discord chat data, public domain books, and English Bible translations. Larger models in the family
+It was trained on a set of Discord chat data, public domain books, and English Bible translations. Larger models in the family received a larger and more diverse training set.
 
 ---
 
@@ -29,7 +29,7 @@ The training data was composed exclusively of the following sources:
 All listed datasets were used **in full**, and **no additional data sources** were used.
 
 The Discord datasets (combined ~693MB) were formatted in **ChatML**, with usernames serving as speaker roles, enabling the model to learn natural dialogue structure and dynamics. Discord data included many diverse topics, especially code. Thus, the model understands basic syntax patterns of some common programming languages. However, due to its lack of training on large-scale, high-quality code samples, generated code is unlikely to be very helpful.
-Larger models in the family
+Larger models in the family received a larger and more diverse training set.
 
 ---
 
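The README states that the Discord data was formatted in ChatML, with usernames serving as speaker roles. As a rough sketch of what that layout typically looks like (the usernames and messages below are hypothetical, not taken from the dataset):

```text
<|im_start|>some_username
hey, anyone know why my loop never exits?<|im_end|>
<|im_start|>another_user
looks like you never increment the counter, post the snippet<|im_end|>
```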