## Model Details

This model is a 1B-parameter Llama 3 model pretrained from scratch with torchtitan on the FineWeb-Edu dataset, using the Cautious AdamW (C-AdamW) optimizer. Training followed the ~20-tokens-per-parameter Chinchilla rule, for 20B tokens seen.
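For readers unfamiliar with the Cautious AdamW optimizer: the "cautious" variant masks out the components of the AdamW update whose direction disagrees with the current gradient, rescaling the rest to preserve the update magnitude. The sketch below is a minimal, single-tensor illustration of that idea (not the actual torchtitan training code); the function name and signature are hypothetical.

```python
import torch

def cautious_adamw_step(p, g, m, v, lr=1e-3, b1=0.9, b2=0.999,
                        eps=1e-8, wd=0.0, t=1):
    """One illustrative C-AdamW step on parameter tensor p with gradient g."""
    # Standard AdamW first/second moment updates with bias correction
    m.mul_(b1).add_(g, alpha=1 - b1)
    v.mul_(b2).addcmul_(g, g, value=1 - b2)
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    u = m_hat / (v_hat.sqrt() + eps)
    # Cautious masking: drop update components whose sign disagrees with
    # the gradient, and rescale by the mask mean to keep the step size
    mask = (u * g > 0).to(g.dtype)
    mask = mask / mask.mean().clamp(min=eps)
    # Decoupled weight decay, then the masked update
    p.mul_(1 - lr * wd).add_(u * mask, alpha=-lr)
    return p, m, v
```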
## How to use

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_cautious_chinchilla_8132025",
)
print(pipe("The key to life is"))
```
## Downstream Eval

ARC (Easy/Challenge), HellaSwag, LAMBADA (OpenAI), OpenBookQA, PIQA, evaluated with lm-evaluation-harness:

```shell
lm_eval --model hf --model_args pretrained=kz919/llama3_1b_cautious_chinchilla_8142025,dtype="bfloat16",add_bos_token=True --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa --device cuda:7 --batch_size 8
```
| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|----------------|--------:|--------|-------:|------------|---|--------:|---|-------:|
| arc_challenge  | 1 | none | 0 | acc        | ↑ |  0.2730 | ± | 0.0130 |
|                |   | none | 0 | acc_norm   | ↑ |  0.2765 | ± | 0.0131 |
| arc_easy       | 1 | none | 0 | acc        | ↑ |  0.5960 | ± | 0.0101 |
|                |   | none | 0 | acc_norm   | ↑ |  0.5290 | ± | 0.0102 |
| hellaswag      | 1 | none | 0 | acc        | ↑ |  0.3442 | ± | 0.0047 |
|                |   | none | 0 | acc_norm   | ↑ |  0.4122 | ± | 0.0049 |
| lambada_openai | 1 | none | 0 | acc        | ↑ |  0.3264 | ± | 0.0065 |
|                |   | none | 0 | perplexity | ↓ | 39.7510 | ± | 1.6063 |
| openbookqa     | 1 | none | 0 | acc        | ↑ |  0.2200 | ± | 0.0185 |
|                |   | none | 0 | acc_norm   | ↑ |  0.3300 | ± | 0.0210 |
| piqa           | 1 | none | 0 | acc        | ↑ |  0.6872 | ± | 0.0108 |
|                |   | none | 0 | acc_norm   | ↑ |  0.6850 | ± | 0.0108 |
### MMLU

| Groups | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|-------------------|--------:|--------|-------:|--------|---|-------:|---|-------:|
| mmlu              | 2 | none |   | acc | ↑ | 0.2536 | ± | 0.0037 |
| - humanities      | 2 | none |   | acc | ↑ | 0.2667 | ± | 0.0064 |
| - other           | 2 | none |   | acc | ↑ | 0.2475 | ± | 0.0077 |
| - social sciences | 2 | none |   | acc | ↑ | 0.2337 | ± | 0.0076 |
| - stem            | 2 | none |   | acc | ↑ | 0.2594 | ± | 0.0078 |