Pula
This collection hosts the models and datasets released as part of Pula, the first suite of LLMs for Setswana. Previously BOTS-LM.
BOTS-LM: Training Large Language Models for Setswana
Paper • 2408.02239 • Published Aug 5, 2024 • 2
OxxoCodes/Pula-8B-v0.1
Text Generation • 8B • Updated Sep 1, 2024 • 7 • 3
OxxoCodes/Pula-XLMR-large-v0.1
Fill-Mask • 0.6B • Updated Sep 1, 2024 • 5
OxxoCodes/Medupi
Viewer • Updated Oct 1, 2024 • 976k • 49
Distilled Long-Context Encoders
Various efficient-attention, encoder-style architectures distilled into student models with half the hidden layers, plus a long-context NER dataset.
giant-oak/lsg-roberta-base-4096
Fill-Mask • Updated Dec 27, 2023 • 6
giant-oak/distil-lsg-roberta-base-4096
Fill-Mask • Updated Jul 10, 2023 • 6
giant-oak/distil-nystromformer-4096
Fill-Mask • Updated Jun 21, 2023 • 3
giant-oak/distil-longformer-base-4096
Updated Jul 8, 2023 • 3.53k