ca8aebc8d2d06211b9317c1e9d5c74cd

This model is a fine-tuned version of Qwen/Qwen2.5-7B on the google/boolq dataset. Its per-epoch results on the evaluation set are listed in the training results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
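
The total batch sizes listed above follow from the per-device batch size and the number of devices. A minimal sketch of that arithmetic, assuming a gradient accumulation factor of 1 (the card does not list one) and using a hypothetical helper name, total_batch_size:

```python
# Values copied from the hyperparameter list above.
config = {
    "train_batch_size": 8,   # per-device training batch size
    "eval_batch_size": 8,    # per-device evaluation batch size
    "num_devices": 4,
    # Assumption: gradient accumulation is not listed in the card, so 1 is used here.
    "gradient_accumulation_steps": 1,
}


def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Hypothetical helper: examples consumed per optimizer (or evaluation) step."""
    return per_device * num_devices * grad_accum


total_train = total_batch_size(
    config["train_batch_size"],
    config["num_devices"],
    config["gradient_accumulation_steps"],
)
total_eval = total_batch_size(config["eval_batch_size"], config["num_devices"])

print(total_train, total_eval)
```

Under these assumptions both values come out to 32, matching total_train_batch_size and total_eval_batch_size in the list.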

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	RougeL	RougeLsum
No log	0	0	8.0175	0	20.6655	0.5928	0.4152	0.5928	0.5919	0.5925
No log	1	294	18.6013	0.0078	27.0807	0.3781	0.2751	0.3781	0.3785	0.3781
No log	2	588	12.0732	0.0156	36.0150	0.6072	0.4104	0.6075	0.6068	0.6075
No log	3	882	21.2906	0.0312	57.1575	0.6216	0.3909	0.6216	0.6213	0.6216
0.5979	4	1176	4.9192	0.0625	82.7569	0.4038	0.3602	0.4035	0.4038	0.4038
0.2704	5	1470	4.7668	0.125	121.8431	0.5524	0.5049	0.5522	0.5530	0.5527
0.4559	6	1764	3.0337	0.25	202.6585	0.3952	0.3242	0.3952	0.3958	0.3955
2.8035	7	2058	2.7901	0.5	390.2641	0.6213	0.3832	0.6213	0.6207	0.6210
2.7666	8.0	2352	2.6635	1.0	736.9165	0.6213	0.3832	0.6213	0.6207	0.6210
2.7882	9.0	2646	2.6738	1.0	734.9518	0.6213	0.3832	0.6213	0.6207	0.6210
2.4314	10.0	2940	2.7440	1.0	734.0090	0.6492	0.4912	0.6492	0.6486	0.6492
1.596	11.0	3234	2.8961	1.0	728.1010	0.6618	0.6328	0.6618	0.6621	0.6615
0.8903	12.0	3528	3.9882	1.0	734.1698	0.6235	0.6163	0.6238	0.6235	0.6238
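
The table can be scanned programmatically to see which epoch scores best on each metric. The sketch below copies a subset of the columns (epoch, step, validation loss, accuracy, F1 macro) from the rows above; the variable names are illustrative and not part of the training code.

```python
# (epoch, step, validation_loss, accuracy, f1_macro), copied from the table above.
rows = [
    (0, 0, 8.0175, 0.5928, 0.4152),
    (1, 294, 18.6013, 0.3781, 0.2751),
    (2, 588, 12.0732, 0.6072, 0.4104),
    (3, 882, 21.2906, 0.6216, 0.3909),
    (4, 1176, 4.9192, 0.4038, 0.3602),
    (5, 1470, 4.7668, 0.5524, 0.5049),
    (6, 1764, 3.0337, 0.3952, 0.3242),
    (7, 2058, 2.7901, 0.6213, 0.3832),
    (8, 2352, 2.6635, 0.6213, 0.3832),
    (9, 2646, 2.6738, 0.6213, 0.3832),
    (10, 2940, 2.7440, 0.6492, 0.4912),
    (11, 3234, 2.8961, 0.6618, 0.6328),
    (12, 3528, 3.9882, 0.6235, 0.6163),
]

# Lowest validation loss and highest accuracy / F1 macro across logged epochs.
best_loss = min(rows, key=lambda r: r[2])
best_acc = max(rows, key=lambda r: r[3])
best_f1 = max(rows, key=lambda r: r[4])

# Optimizer steps between consecutive evaluations.
steps_per_epoch = {b[1] - a[1] for a, b in zip(rows, rows[1:])}

print("best validation loss: epoch", best_loss[0], best_loss[2])
print("best accuracy:        epoch", best_acc[0], best_acc[3])
print("best F1 macro:        epoch", best_f1[0], best_f1[4])
print("steps per epoch:", steps_per_epoch)
```

On these rows the lowest validation loss falls at epoch 8, while accuracy and F1 macro both peak at epoch 11, so the choice of checkpoint depends on which metric matters for the intended use.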

Safetensors
Model size: 2B params
Tensor type: F32
Base model: Qwen/Qwen2.5-7B
