---
base_model: unsloth/mistral-7b-v0.3-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
- sft
license: apache-2.0
language:
- fr
datasets:
- KasparZ/mtext-071024
---

# Uploaded model

- **Developed by:** KasparZ
- **License:** apache-2.0
- **Finetuned from model:** unsloth/mistral-7b-v0.3-bnb-4bit

- max_seq_length = 4096
- tokenizer.pad_token = tokenizer.eos_token
- model.config.pad_token_id = tokenizer.pad_token_id
- new_tokens = ["<|s|>", "<|e|>"]
- **LoRA**
  - r = 128
  - target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]
  - lora_alpha = 32
  - lora_dropout = 0
  - bias = "none"
  - use_gradient_checkpointing = "unsloth"
  - random_state = 3407
  - use_rslora = True
  - loftq_config = None
- **Training**
  - per_device_train_batch_size = 1
  - gradient_accumulation_steps = 8
  - warmup_ratio = 0.1
  - num_train_epochs = 1
  - learning_rate = 1e-4
  - embedding_learning_rate = 5e-5
  - fp16 = True
  - bf16 = False
  - logging_steps = 1
  - optim = "adamw_8bit"
  - weight_decay = 0.01
  - lr_scheduler_type = "cosine"
  - seed = 3407
  - output_dir = "outputs"
  - save_strategy = "steps"
  - save_steps = 50
  - report_to = "none"

This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
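
The sketch below shows how the model and LoRA settings above map onto Unsloth's `FastLanguageModel` API. The token-adding step uses the standard `transformers` calls (`tokenizer.add_tokens` / `model.resize_token_embeddings`); the exact mechanism used for this upload is an assumption, as the card only lists the resulting settings.

```python
from unsloth import FastLanguageModel

max_seq_length = 4096

# Load the 4-bit base model and its tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-v0.3-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,          # auto-detect; training itself ran in fp16 (see above)
    load_in_4bit = True,
)

# Pad with EOS, as listed in the settings above.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Register the two new special tokens and grow the embedding matrix to match.
# (Assumption: standard transformers calls. embed_tokens and lm_head appear in
# target_modules below, so the new rows are trained.)
tokenizer.add_tokens(["<|s|>", "<|e|>"])
model.resize_token_embeddings(len(tokenizer))

# Attach LoRA adapters with the hyperparameters listed in the card.
model = FastLanguageModel.get_peft_model(
    model,
    r = 128,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head"],
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = True,   # rank-stabilized LoRA scaling
    loftq_config = None,
)
```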
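
A matching sketch of the training run uses Unsloth's `UnslothTrainer` and `UnslothTrainingArguments`, which extend TRL's `SFTTrainer` and accept the separate `embedding_learning_rate` listed above. The dataset column name (`"text"`) is an assumption; the dataset ID comes from the card metadata.

```python
from datasets import load_dataset
from unsloth import UnslothTrainer, UnslothTrainingArguments

# French corpus listed in the card metadata.
dataset = load_dataset("KasparZ/mtext-071024", split = "train")

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",   # assumption: the dataset's text column
    max_seq_length = max_seq_length,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8,   # effective batch size of 8
        warmup_ratio = 0.1,
        num_train_epochs = 1,
        learning_rate = 1e-4,
        embedding_learning_rate = 5e-5,    # lower LR for embed_tokens / lm_head
        fp16 = True,
        bf16 = False,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        save_strategy = "steps",
        save_steps = 50,
        report_to = "none",
    ),
)

trainer.train()
```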