You should try training a model with 2B parameters and context length 32000.
#3 opened almost 2 years ago by win10
Fantastic work, guys!
#1 opened almost 2 years ago by dillfrescott