Commit · 0e4e895
1 Parent(s): e5aaad7
Update training_notebooks/README.md
training_notebooks/README.md CHANGED
@@ -1 +1,8 @@
-#
+# Training notebooks for simple Latin BERT uncased
+
+These notebooks and scripts include the code to train this Masked Language Model and its tokenizer, from scratch.
+
+The notebooks should be ready to execute in any computer with a GPU, with minimal changes.
+
+
+Note: The scripts will create a file `03_full_corpus.txt` with the combination of all the corpora into a single raw text file.
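The note about `03_full_corpus.txt` can be illustrated with a minimal sketch of what such a combination step might look like. This is not the code from the notebooks: the function name `combine_corpora`, the assumption that each corpus is a UTF-8 `.txt` file in a single directory, and the alphabetical merge order are all hypothetical choices made for illustration.

```python
from pathlib import Path


def combine_corpora(corpus_dir: Path, output_file: Path) -> int:
    """Concatenate every .txt file in corpus_dir into output_file.

    Hypothetical helper, not taken from the notebooks.
    Returns the number of source files merged.
    """
    # Collect the inputs first, skipping the output file itself in case
    # it already exists inside corpus_dir from a previous run.
    sources = sorted(
        p for p in corpus_dir.glob("*.txt")
        if p.resolve() != output_file.resolve()
    )
    with output_file.open("w", encoding="utf-8") as out:
        for source in sources:
            text = source.read_text(encoding="utf-8")
            out.write(text)
            # Keep consecutive corpora from running together on one line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(sources)
```

A call such as `combine_corpora(Path("corpora"), Path("03_full_corpus.txt"))` would then write the merged file, where `corpora` is a placeholder directory name. Reading each file fully into memory keeps the sketch short; for very large corpora, streaming line by line would likely be a better fit.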