## Model Details

This model is a 1B-parameter Llama 3 model pretrained from scratch with torchtitan on the FineWeb-Edu dataset, using the Cautious AdamW (C-AdamW) optimizer. Training followed the ~20-tokens-per-parameter Chinchilla rule, for 20B tokens seen.
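For readers unfamiliar with the Cautious AdamW optimizer: the "cautious" variant masks out the components of the AdamW update whose direction disagrees with the current gradient, rescaling the rest to preserve the update magnitude. The sketch below is a minimal, single-tensor illustration of that idea (not the actual torchtitan training code); the function name and signature are hypothetical.

```python
import torch

def cautious_adamw_step(p, g, m, v, lr=1e-3, b1=0.9, b2=0.999,
                        eps=1e-8, wd=0.0, t=1):
    """One illustrative C-AdamW step on parameter tensor p with gradient g."""
    # Standard AdamW first/second moment updates with bias correction
    m.mul_(b1).add_(g, alpha=1 - b1)
    v.mul_(b2).addcmul_(g, g, value=1 - b2)
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    u = m_hat / (v_hat.sqrt() + eps)
    # Cautious masking: drop update components whose sign disagrees with
    # the gradient, and rescale by the mask mean to keep the step size
    mask = (u * g > 0).to(g.dtype)
    mask = mask / mask.mean().clamp(min=eps)
    # Decoupled weight decay, then the masked update
    p.mul_(1 - lr * wd).add_(u * mask, alpha=-lr)
    return p, m, v
```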
## How to use

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_cautious_chinchilla_8132025",
)
print(pipe("The key to life is"))
```
## Downstream Eval

ARC (Easy/Challenge), HellaSwag, LAMBADA (OpenAI), OpenBookQA, PIQA, evaluated with lm-evaluation-harness:

```shell
lm_eval --model hf --model_args pretrained=kz919/llama3_1b_cautious_chinchilla_8142025,dtype="bfloat16",add_bos_token=True --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa --device cuda:7 --batch_size 8
```
| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|----------------|--------:|--------|-------:|------------|---|--------:|---|-------:|
| arc_challenge  | 1 | none | 0 | acc        | ↑ |  0.2730 | ± | 0.0130 |
|                |   | none | 0 | acc_norm   | ↑ |  0.2765 | ± | 0.0131 |
| arc_easy       | 1 | none | 0 | acc        | ↑ |  0.5960 | ± | 0.0101 |
|                |   | none | 0 | acc_norm   | ↑ |  0.5290 | ± | 0.0102 |
| hellaswag      | 1 | none | 0 | acc        | ↑ |  0.3442 | ± | 0.0047 |
|                |   | none | 0 | acc_norm   | ↑ |  0.4122 | ± | 0.0049 |
| lambada_openai | 1 | none | 0 | acc        | ↑ |  0.3264 | ± | 0.0065 |
|                |   | none | 0 | perplexity | ↓ | 39.7510 | ± | 1.6063 |
| openbookqa     | 1 | none | 0 | acc        | ↑ |  0.2200 | ± | 0.0185 |
|                |   | none | 0 | acc_norm   | ↑ |  0.3300 | ± | 0.0210 |
| piqa           | 1 | none | 0 | acc        | ↑ |  0.6872 | ± | 0.0108 |
|                |   | none | 0 | acc_norm   | ↑ |  0.6850 | ± | 0.0108 |
### MMLU

| Groups | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|-------------------|--------:|--------|-------:|--------|---|-------:|---|-------:|
| mmlu              | 2 | none |   | acc | ↑ | 0.2536 | ± | 0.0037 |
| - humanities      | 2 | none |   | acc | ↑ | 0.2667 | ± | 0.0064 |
| - other           | 2 | none |   | acc | ↑ | 0.2475 | ± | 0.0077 |
| - social sciences | 2 | none |   | acc | ↑ | 0.2337 | ± | 0.0076 |
| - stem            | 2 | none |   | acc | ↑ | 0.2594 | ± | 0.0078 |