GPTNeoX-160M-Minipile

This model was trained from scratch; the card does not document the training dataset, though the model name suggests MiniPile. It achieves the following results on the evaluation set:

  • Loss: 2.8779
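
For reference, a mean cross-entropy loss converts to perplexity as exp(loss). A quick check on the value above (the perplexity itself is not reported in the original card):

```python
import math

# Final evaluation cross-entropy loss reported above.
eval_loss = 2.8779

# Perplexity is the exponential of the mean cross-entropy loss.
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")  # ~17.78
```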

Model description

A GPT-NeoX-architecture causal language model with roughly 160M parameters (the safetensors checkpoint reports 0.2B parameters stored in bfloat16). Further architectural details are not documented.

Intended uses & limitations

More information needed
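
No usage guidance is provided. As a minimal sketch, the checkpoint should load through the standard Transformers causal-LM API (the repo id below is taken from this card's page; the prompt and sampling settings are purely illustrative):

```python
from transformers import pipeline

# Load the checkpoint as a standard causal language model.
generator = pipeline(
    "text-generation",
    model="SingularityHJY/GPTNeoX-160M-Minipile",
)

# Illustrative prompt and sampling settings.
output = generator(
    "The history of artificial intelligence",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
)
print(output[0]["generated_text"])
```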

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
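
As a rough sketch, these settings map onto transformers.TrainingArguments as follows. The output_dir is a placeholder, and the card says "Adam", which is assumed here to mean the Trainer's default AdamW with the listed betas and epsilon:

```python
from transformers import TrainingArguments

# Hyperparameters transcribed from the list above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="gptneox-160m-minipile",  # placeholder path, not from the card
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,        # card says Adam; Trainer default is AdamW
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
)
```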

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 6.6923 | 0.0281 | 500 | 6.6275 |
| 5.7396 | 0.0561 | 1000 | 5.7512 |
| 5.3007 | 0.0842 | 1500 | 5.2374 |
| 4.6605 | 0.1123 | 2000 | 4.6288 |
| 4.2878 | 0.1403 | 2500 | 4.2100 |
| 4.0088 | 0.1684 | 3000 | 3.9680 |
| 3.8332 | 0.1965 | 3500 | 3.8331 |
| 3.7552 | 0.2246 | 4000 | 3.7197 |
| 3.6096 | 0.2526 | 4500 | 3.6527 |
| 3.6151 | 0.2807 | 5000 | 3.5885 |
| 3.4574 | 0.3088 | 5500 | 3.5467 |
| 3.4561 | 0.3368 | 6000 | 3.4924 |
| 3.3655 | 0.3649 | 6500 | 3.4318 |
| 3.4202 | 0.3930 | 7000 | 3.3960 |
| 3.3726 | 0.4210 | 7500 | 3.3732 |
| 3.3313 | 0.4491 | 8000 | 3.3337 |
| 3.3411 | 0.4772 | 8500 | 3.3040 |
| 3.3544 | 0.5052 | 9000 | 3.2786 |
| 3.2502 | 0.5333 | 9500 | 3.2705 |
| 3.2733 | 0.5614 | 10000 | 3.2517 |
| 3.2467 | 0.5895 | 10500 | 3.2253 |
| 3.2516 | 0.6175 | 11000 | 3.2078 |
| 3.1775 | 0.6456 | 11500 | 3.1942 |
| 3.207 | 0.6737 | 12000 | 3.1785 |
| 3.1331 | 0.7017 | 12500 | 3.1725 |
| 3.1398 | 0.7298 | 13000 | 3.1542 |
| 3.1569 | 0.7579 | 13500 | 3.1462 |
| 3.148 | 0.7859 | 14000 | 3.1280 |
| 3.1547 | 0.8140 | 14500 | 3.1201 |
| 3.1583 | 0.8421 | 15000 | 3.1070 |
| 3.103 | 0.8702 | 15500 | 3.0974 |
| 3.0527 | 0.8982 | 16000 | 3.0862 |
| 3.0612 | 0.9263 | 16500 | 3.0764 |
| 3.0707 | 0.9544 | 17000 | 3.0663 |
| 3.0632 | 0.9824 | 17500 | 3.0607 |
| 3.0317 | 1.0105 | 18000 | 3.0491 |
| 3.0108 | 1.0386 | 18500 | 3.0479 |
| 3.035 | 1.0666 | 19000 | 3.0415 |
| 3.0164 | 1.0947 | 19500 | 3.0324 |
| 2.9707 | 1.1228 | 20000 | 3.0250 |
| 3.0121 | 1.1508 | 20500 | 3.0211 |
| 2.9887 | 1.1789 | 21000 | 3.0135 |
| 2.9933 | 1.2070 | 21500 | 3.0050 |
| 2.9535 | 1.2351 | 22000 | 3.0005 |
| 2.9651 | 1.2631 | 22500 | 2.9931 |
| 2.9965 | 1.2912 | 23000 | 2.9875 |
| 2.9886 | 1.3193 | 23500 | 2.9819 |
| 2.92 | 1.3473 | 24000 | 2.9752 |
| 2.9263 | 1.3754 | 24500 | 2.9717 |
| 2.8707 | 1.4035 | 25000 | 2.9691 |
| 2.923 | 1.4315 | 25500 | 2.9627 |
| 2.9615 | 1.4596 | 26000 | 2.9555 |
| 2.9154 | 1.4877 | 26500 | 2.9518 |
| 2.9112 | 1.5157 | 27000 | 2.9481 |
| 2.9033 | 1.5438 | 27500 | 2.9433 |
| 2.9711 | 1.5719 | 28000 | 2.9379 |
| 2.8926 | 1.6000 | 28500 | 2.9344 |
| 2.9006 | 1.6280 | 29000 | 2.9301 |
| 2.9529 | 1.6561 | 29500 | 2.9263 |
| 2.8718 | 1.6842 | 30000 | 2.9223 |
| 2.8989 | 1.7122 | 30500 | 2.9188 |
| 2.9101 | 1.7403 | 31000 | 2.9149 |
| 2.9219 | 1.7684 | 31500 | 2.9120 |
| 2.9225 | 1.7964 | 32000 | 2.9082 |
| 2.8855 | 1.8245 | 32500 | 2.9058 |
| 2.8643 | 1.8526 | 33000 | 2.9026 |
| 2.8996 | 1.8806 | 33500 | 2.8999 |
| 2.9717 | 1.9087 | 34000 | 2.8974 |
| 2.8536 | 1.9368 | 34500 | 2.8960 |
| 2.8435 | 1.9649 | 35000 | 2.8928 |
| 2.861 | 1.9929 | 35500 | 2.8906 |
| 2.7977 | 2.0210 | 36000 | 2.8894 |
| 2.8228 | 2.0491 | 36500 | 2.8895 |
| 2.8064 | 2.0771 | 37000 | 2.8874 |
| 2.827 | 2.1052 | 37500 | 2.8863 |
| 2.8202 | 2.1333 | 38000 | 2.8852 |
| 2.8581 | 2.1613 | 38500 | 2.8840 |
| 2.8004 | 2.1894 | 39000 | 2.8828 |
| 2.7771 | 2.2175 | 39500 | 2.8820 |
| 2.8305 | 2.2456 | 40000 | 2.8814 |
| 2.8659 | 2.2736 | 40500 | 2.8806 |
| 2.8176 | 2.3017 | 41000 | 2.8804 |
| 2.8101 | 2.3298 | 41500 | 2.8797 |
| 2.7866 | 2.3578 | 42000 | 2.8792 |
| 2.8114 | 2.3859 | 42500 | 2.8789 |
| 2.8204 | 2.4140 | 43000 | 2.8786 |
| 2.861 | 2.4420 | 43500 | 2.8782 |
| 2.8333 | 2.4701 | 44000 | 2.8781 |
| 2.8111 | 2.4982 | 44500 | 2.8781 |
| 2.8145 | 2.5262 | 45000 | 2.8779 |
| 2.8349 | 2.5543 | 45500 | 2.8778 |
| 2.8058 | 2.5824 | 46000 | 2.8778 |
| 2.7971 | 2.6105 | 46500 | 2.8778 |
| 2.8267 | 2.6385 | 47000 | 2.8777 |
| 2.8172 | 2.6666 | 47500 | 2.8777 |
| 2.8355 | 2.6947 | 48000 | 2.8778 |
| 2.8189 | 2.7227 | 48500 | 2.8778 |
| 2.8155 | 2.7508 | 49000 | 2.8778 |
| 2.8542 | 2.7789 | 49500 | 2.8778 |
| 2.8476 | 2.8069 | 50000 | 2.8779 |
| 2.8288 | 2.8350 | 50500 | 2.8779 |
| 2.8399 | 2.8631 | 51000 | 2.8779 |
| 2.767 | 2.8911 | 51500 | 2.8779 |
| 2.8238 | 2.9192 | 52000 | 2.8779 |
| 2.7982 | 2.9473 | 52500 | 2.8779 |
| 2.7844 | 2.9754 | 53000 | 2.8779 |
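
A back-of-envelope check on the log above, assuming the listed per-device batch size on a single device with no gradient accumulation (neither is stated in the card):

```python
# 500 optimizer steps correspond to 0.0281 epochs (first row of the table).
steps_per_epoch = 500 / 0.0281             # ~17,794 steps per epoch
examples_per_epoch = steps_per_epoch * 16  # batch size 16 -> ~284,700 examples
print(round(steps_per_epoch), round(examples_per_epoch))
```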

Framework versions

  • Transformers 4.45.0
  • PyTorch 2.4.1
  • Datasets 3.0.1
  • Tokenizers 0.20.3