---
license: mit
library_name: "trl"
tags:
- SFT
- WeniGPT
base_model: meta-llama/Meta-Llama-3-70B-Instruct
model-index:
- name: Weni/WeniGPT-Agents-Llama3-5.1.24-SFT
results: []
language: ['pt']
---
# Weni/WeniGPT-Agents-Llama3-5.1.24-SFT
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) on the dataset Weni/wenigpt-agent-sft-1.0.1 with the SFT trainer. It is part of the WeniGPT project for [Weni](https://weni.ai/).
Description: Experiment with DPO and Llama3 70b
It achieves the following results on the evaluation set:
- eval_loss: 0.8028125762939453
- eval_rouge1: 0.7266203046783699
- eval_rouge2: 0.5395778050172039
- eval_rougeL: 0.7073728737550329
- eval_rougeLsum: 0.7103352101315058
- eval_bleu: 0.026431275243775837
- eval_runtime: 4.4761
- eval_samples_per_second: 1.787
- eval_steps_per_second: 0.223
- epoch: 28.444444444444443
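
The runtime and throughput figures above allow a rough estimate of how large the evaluation pass was, which helps when reading the ROUGE and BLEU scores. The sketch below is only an approximation: it assumes the throughput values are simple averages over `eval_runtime`, and the variable names are illustrative.

```python
# Values copied from the evaluation results reported above.
eval_metrics = {
    "eval_runtime": 4.4761,            # seconds
    "eval_samples_per_second": 1.787,
    "eval_steps_per_second": 0.223,
}

# Assumption: throughput = count / runtime, so count ~= runtime * throughput.
approx_samples = eval_metrics["eval_runtime"] * eval_metrics["eval_samples_per_second"]
approx_steps = eval_metrics["eval_runtime"] * eval_metrics["eval_steps_per_second"]

print(f"approx. evaluation samples: {approx_samples:.2f}")
print(f"approx. evaluation steps:   {approx_steps:.2f}")
```

Under that assumption the evaluation pass covers roughly eight samples in a single step, so the reported scores likely rest on a small number of examples.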
## Intended uses & limitations
This model has not been trained to avoid specific instructions.
## Training procedure
Fine-tuning was done on meta-llama/Meta-Llama-3-70B-Instruct with the following prompt template:
```
---------------------
System_prompt:
Agora você se chama {name}, você é {occupation} e seu objetivo é {chatbot_goal}. O adjetivo que mais define a sua personalidade é {adjective}. Você se comporta da seguinte forma:
{instructions_formatted}
Lista de requisitos:
- Responda de forma natural, mas nunca fale sobre um assunto fora do contexto.
- Nunca traga informações do seu próprio conhecimento.
- Repito é crucial que você responda usando apenas informações do contexto.
- Nunca mencione o contexto fornecido.
- Nunca mencione a pergunta fornecida.
- Gere a resposta mais útil possível para a pergunta usando informações do conexto acima.
- Nunca elabore sobre o porque e como você fez a tarefa, apenas responda.
{context_statement}
---------------------
Question:
{question}
---------------------
Response:
{answer}
---------------------
```
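
The template above can be filled with ordinary string formatting. The following is a minimal sketch, not the project's actual preprocessing code: the helper `build_prompt`, the bullet formatting of the instructions, the example values, and leaving `{answer}` empty at inference time are all assumptions, and the exact whitespace used during training is not confirmed by this card.

```python
# Template text copied from the card; blank lines and spacing are a best guess.
PROMPT_TEMPLATE = """---------------------
System_prompt:
Agora você se chama {name}, você é {occupation} e seu objetivo é {chatbot_goal}. O adjetivo que mais define a sua personalidade é {adjective}. Você se comporta da seguinte forma:
{instructions_formatted}
Lista de requisitos:
- Responda de forma natural, mas nunca fale sobre um assunto fora do contexto.
- Nunca traga informações do seu próprio conhecimento.
- Repito é crucial que você responda usando apenas informações do contexto.
- Nunca mencione o contexto fornecido.
- Nunca mencione a pergunta fornecida.
- Gere a resposta mais útil possível para a pergunta usando informações do conexto acima.
- Nunca elabore sobre o porque e como você fez a tarefa, apenas responda.
{context_statement}
---------------------
Question:
{question}
---------------------
Response:
{answer}
---------------------
"""


def build_prompt(name, occupation, chatbot_goal, adjective,
                 instructions, context_statement, question, answer=""):
    """Hypothetical helper: fill every placeholder of the template."""
    # Assumption: instructions are rendered as one "- " bullet per line.
    instructions_formatted = "\n".join(f"- {item}" for item in instructions)
    return PROMPT_TEMPLATE.format(
        name=name,
        occupation=occupation,
        chatbot_goal=chatbot_goal,
        adjective=adjective,
        instructions_formatted=instructions_formatted,
        context_statement=context_statement,
        question=question,
        answer=answer,
    )


# Made-up example values, written in Portuguese to match the model's language.
prompt = build_prompt(
    name="Ana",
    occupation="atendente virtual",            # virtual assistant
    chatbot_goal="ajudar clientes da loja",    # help store customers
    adjective="simpática",                     # friendly
    instructions=["Seja breve.", "Use linguagem simples."],
    context_statement="Contexto: A loja abre das 9h às 18h.",
    question="Qual é o horário de funcionamento?",
)
print(prompt)
```

For training data the `answer` argument would carry the reference response; how the original pipeline handled this field at inference time is not stated in the card.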
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- per_device_train_batch_size: 1
- per_device_eval_batch_size: 1
- gradient_accumulation_steps: 8
- num_gpus: 8
- total_train_batch_size: 64
- optimizer: AdamW
- lr_scheduler_type: cosine
- num_steps: 32
- quantization_type: bitsandbytes
- LoRA:
  - bits: 4
  - use_exllama: True
  - use_cache: False
  - lora_r: 256
  - lora_alpha: 128
  - lora_dropout: 0.05
  - bias: none
  - target_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj']
  - task_type: CAUSAL_LM
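
Several of the values listed above are related by simple arithmetic. The sketch below recomputes them as a sanity check. It assumes the conventional LoRA scaling factor `lora_alpha / lora_r` (the card does not say whether a variant such as rank-stabilized scaling was used), and the training-set size is only a rough estimate derived from the reported `epoch` value.

```python
# Hyperparameters copied from the list above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 8
num_steps = 32
lora_r = 256
lora_alpha = 128
reported_epoch = 28.444444444444443  # from the evaluation results

# Effective batch size per optimizer step across all devices.
total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)

# Total training sequences processed over the whole run.
sequences_seen = total_train_batch_size * num_steps

# Assumption: conventional LoRA scaling, i.e. the update is scaled by alpha / r.
lora_scaling = lora_alpha / lora_r

# Rough estimate, assuming one epoch is one full pass over the training set.
approx_train_examples = round(sequences_seen / reported_epoch)

print(total_train_batch_size)  # 64
print(sequences_seen)          # 2048
print(lora_scaling)            # 0.5
print(approx_train_examples)   # 72
```

The computed batch size matches the reported `total_train_batch_size` of 64. The estimate of about 72 training examples is approximate, since padding of the last batch across GPUs could make the true count slightly smaller.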
### Training results
### Framework versions
- transformers==4.43.1
- datasets==2.20.0
- peft==0.11.1
- safetensors==0.4.3
- evaluate==0.4.2
- bitsandbytes==0.43.1
- git+https://github.com/huggingface/huggingface_hub@large-upload-cli
- seqeval==1.2.2
- auto-gptq==0.7.1
- gpustat==1.1.1
- deepspeed==0.14.4
- wandb==0.17.5
- trl==0.9.6
- accelerate==0.32.1
- coloredlogs==15.0.1
- traitlets==5.14.3
- autoawq==0.2.5
- flash-attn==2.6.1
- trulens_eval==0.27.0
- openai==1.30.1
- langchain==0.2.5
- bert-score==0.3.13
- rouge_score==0.1.2
- tiktoken==0.7.0
- boto3==1.34.109
- elasticsearch==8.13.1
- langchain-cohere==0.1.5
- urllib3==2.2.2
- nltk==3.8.1
- pathlib==1.0.1
- requests==2.32.2
- langchain-community==0.2.5
- scikit-learn==1.5.1
### Hardware
- Cloud provider: runpod.io