---
license: llama3.1
language:
- en
pipeline_tag: text-generation
datasets:
- allenai/tulu-3-sft-mixture
base_model:
- meta-llama/Llama-3.1-8B
---

<img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu3/Tulu3-logo.png" alt="Tulu 3 banner" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

# Llama-3.1-Tulu-3-8B-SFT

Tülu 3 is a leading instruction-following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques.
Tülu 3 is designed for state-of-the-art performance on a diverse set of tasks in addition to chat, such as MATH, GSM8K, and IFEval.

## Model description

- **Model type:** A model trained on a mix of publicly available, synthetic, and human-created datasets.
- **Language(s) (NLP):** Primarily English
- **License:** Llama 3.1 Community License Agreement
- **Finetuned from model:** meta-llama/Llama-3.1-8B

### Model Sources

- **Training Repository:** https://github.com/allenai/open-instruct
- **Eval Repository:** https://github.com/allenai/olmes
- **Paper:** https://allenai.org/papers/tulu-3-report.pdf (arXiv soon)
- **Demo:** https://playground.allenai.org/

### Model Family

| **Stage** | **Llama 3.1 8B** | **Llama 3.1 70B** |
|-----------|------------------|-------------------|
| **Base Model** | [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) |
| **SFT** | [allenai/Llama-3.1-Tulu-3-8B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT) | [allenai/Llama-3.1-Tulu-3-70B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-SFT) |
| **DPO** | [allenai/Llama-3.1-Tulu-3-8B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-DPO) | [allenai/Llama-3.1-Tulu-3-70B-DPO](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B-DPO) |
| **Final Models (RLVR)** | [allenai/Llama-3.1-Tulu-3-8B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) | [allenai/Llama-3.1-Tulu-3-70B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B) |
| **Reward Model (RM)** | [allenai/Llama-3.1-Tulu-3-8B-RM](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-RM) | (Same as 8B) |

## Using the model

### Loading with HuggingFace

To load the model with HuggingFace, use the following snippet:
```python
from transformers import AutoModelForCausalLM

tulu_model = AutoModelForCausalLM.from_pretrained("allenai/Llama-3.1-Tulu-3-8B-SFT")
```
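
For a quick end-to-end check, here is a minimal generation sketch; the device placement, `max_new_tokens` value, and example prompt are illustrative choices, not settings from the Tülu 3 report:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/Llama-3.1-Tulu-3-8B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Format the conversation with the model's built-in chat template.
messages = [{"role": "user", "content": "How are you doing?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Strip the prompt tokens before decoding the reply.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```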

### vLLM

As a Llama-based model, it can easily be served with vLLM:
```bash
vllm serve allenai/Llama-3.1-Tulu-3-8B-SFT
```
Note that, given Llama's long chat template, you may want to use `--max_model_len=8192`.
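
Once the server is running, it exposes an OpenAI-compatible API (on port 8000 by default). A minimal client sketch, assuming the `openai` Python package is installed and the server is on localhost:
```python
from openai import OpenAI

# vLLM ignores the API key by default; any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="allenai/Llama-3.1-Tulu-3-8B-SFT",
    messages=[{"role": "user", "content": "How are you doing?"}],
)
print(response.choices[0].message.content)
```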

### Chat template

The chat template for our models is formatted as:
```
<|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```
Or with new lines expanded:
```
<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```
It is also embedded in the tokenizer, so it is applied automatically by `tokenizer.apply_chat_template`.
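
A quick way to inspect the template is to render a conversation to a string rather than token ids (a minimal sketch):
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/Llama-3.1-Tulu-3-8B-SFT")
messages = [{"role": "user", "content": "How are you doing?"}]
# tokenize=False returns the formatted prompt string; add_generation_prompt
# appends the <|assistant|> header so the model knows to respond next.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```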

### System prompt

In Ai2 demos, we use this system prompt by default:
```
You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```
The model has not been trained with a specific system prompt in mind.
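
To reproduce the demo setup, the prompt can simply be passed as a `system` message ahead of the user turn, for example:
```python
messages = [
    {"role": "system", "content": "You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI."},
    {"role": "user", "content": "How are you doing?"},
]
```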

### Bias, Risks, and Limitations

The Tülu 3 models have limited safety training and are not deployed with automatic in-the-loop filtering of responses the way ChatGPT is, so the models can produce problematic outputs (especially when prompted to do so).
The size and composition of the corpus used to train the base Llama 3.1 models are also unknown, though they likely included a mix of web data and technical sources like books and code.
See the Falcon 180B model card for an example of this.

## Performance

| Benchmark (eval) | Tülu 3 SFT 8B | Tülu 3 DPO 8B | Tülu 3 8B | Llama 3.1 8B Instruct | Qwen 2.5 7B Instruct | Magpie 8B | Gemma 2 9B Instruct | Ministral 8B Instruct |
|---|---|---|---|---|---|---|---|---|
| **Avg.** | 60.4 | 64.4 | **64.8** | 62.2 | 57.8 | 44.7 | 55.2 | 58.3 |
| **MMLU (0 shot, CoT)** | 65.9 | 68.7 | 68.2 | 71.2 | **76.6** | 62.0 | 74.6 | 68.5 |
| **PopQA (15 shot)** | **29.3** | 29.3 | 29.1 | 20.2 | 18.1 | 22.5 | 28.3 | 20.2 |
| **TruthfulQA (6 shot)** | 46.8 | 56.1 | 55.0 | 55.1 | **63.1** | 57.0 | 61.4 | 55.5 |
| **BigBenchHard (3 shot, CoT)** | **67.9** | 65.8 | 66.0 | 62.8 | 21.7 | 0.9 | 2.5 | 56.2 |
| **DROP (3 shot)** | 61.3 | 62.5 | **62.6** | 61.5 | 54.4 | 49.4 | 58.8 | 56.2 |
| **MATH (4 shot CoT, Flex)** | 31.5 | 42.0 | **43.7** | 42.5 | 14.8 | 5.1 | 29.8 | 40.0 |
| **GSM8K (8 shot, CoT)** | 76.2 | 84.3 | **87.6** | 83.4 | 83.8 | 61.2 | 79.7 | 80.0 |
| **HumanEval (pass@10)** | 86.2 | 83.9 | 83.9 | 86.3 | **93.1** | 75.4 | 71.7 | 91.0 |
| **HumanEval+ (pass@10)** | 81.4 | 78.6 | 79.2 | 82.9 | **89.7** | 69.1 | 67.0 | 88.5 |
| **IFEval (prompt loose)** | 72.8 | 81.1 | **82.4** | 80.6 | 74.7 | 38.8 | 69.9 | 56.4 |
| **AlpacaEval 2 (LC % win)** | 12.4 | 33.5 | 34.5 | 24.2 | 29.0 | **49.0** | 43.7 | 31.4 |
| **Safety (6 task avg.)** | **93.1** | 87.2 | 85.5 | 75.2 | 75.0 | 46.4 | 75.5 | 56.2 |

| Benchmark (eval) | Tülu 3 70B SFT | Tülu 3 DPO 70B | Tülu 3 70B | Llama 3.1 70B Instruct | Qwen 2.5 72B Instruct | Hermes 3 Llama 3.1 70B | Nemotron Llama 3.1 70B |
|---|---|---|---|---|---|---|---|
| **Avg.** | 72.6 | 75.9 | **76.0** | 73.4 | 71.5 | 68.3 | 65.5 |
| **MMLU (0 shot, CoT)** | 78.9 | 83.3 | 83.1 | 85.3 | **85.5** | 80.4 | 83.8 |
| **PopQA (15 shot)** | **48.6** | 46.3 | 46.5 | 46.4 | 30.6 | 48.1 | 36.4 |
| **TruthfulQA (6 shot)** | 55.7 | 67.9 | 67.6 | 66.8 | **69.9** | 66.5 | 62.6 |
| **BigBenchHard (3 shot, CoT)** | **82.7** | 81.8 | 82.0 | 73.8 | 67.2 | 82.1 | 0.7 |
| **DROP (3 shot)** | **77.2** | 74.1 | 74.3 | 77.0 | 34.2 | 73.2 | 68.8 |
| **MATH (4 shot CoT, Flex)** | 53.7 | 62.3 | 63.0 | 56.4 | **74.3** | 41.9 | 55.0 |
| **GSM8K (8 shot, CoT)** | 91.1 | 93.5 | 93.5 | **93.7** | 89.5 | 90.0 | 84.7 |
| **HumanEval (pass@10)** | 92.9 | 92.4 | 92.4 | 93.6 | 94.0 | 89.6 | **94.1** |
| **HumanEval+ (pass@10)** | 87.3 | 88.4 | 88.0 | 89.5 | **90.8** | 85.9 | 85.5 |
| **IFEval (prompt loose)** | 82.1 | 82.6 | 83.2 | **88.0** | 87.6 | 76.0 | 79.9 |
| **AlpacaEval 2 (LC % win)** | 26.3 | 49.6 | 49.8 | 33.4 | 47.7 | 28.4 | **66.1** |
| **Safety (6 task avg.)** | **94.4** | 89.0 | 88.3 | 76.5 | 87.0 | 57.9 | 69.0 |

## Hyperparameters

SFT (see the sketch after this list for how these settings map onto a training configuration):
- **Learning Rate:** 5E-6 (8B), 2E-6 (70B)
- **Effective Batch Size:** 128
- **Max. Sequence Length:** 4096
- **Loss Accumulation:** Sum (see https://unsloth.ai/blog/gradient)
- **Learning Rate Schedule:** Linear
- **LR Warmup Ratio:** 0.03
- **Num. Epochs:** 2
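
For orientation, here is a hypothetical mapping of these settings onto HuggingFace `TrainingArguments`; the actual runs used the open-instruct repository, and the per-device batch size, GPU-count split, and output directory below are illustrative assumptions:
```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="tulu3-sft-8b",       # illustrative path
    learning_rate=5e-6,              # 8B setting; 2e-6 for 70B
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    num_train_epochs=2,
    per_device_train_batch_size=1,   # assumed split: 1 x 16 accum x 8 GPUs = 128 effective
    gradient_accumulation_steps=16,
)
# Max sequence length (4096) and sum-based loss accumulation are handled in the
# training code itself rather than in TrainingArguments.
```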

## License and use

All Llama 3.1 Tülu 3 models are released under Meta's [Llama 3.1 Community License Agreement](https://www.llama.com/llama3_1/license/).
Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc.
Tülu 3 is intended for research and educational use.
For more information, please see our [Responsible Use Guidelines](https://allenai.org/responsible-use).

## Citation

If Tülu 3 or any of the related materials were helpful to your work, please cite:
```bibtex
@article{lambert2024tulu3,
  title  = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
  author = {
    Nathan Lambert and
    Jacob Morrison and
    Valentina Pyatkin and
    Shengyi Huang and
    Hamish Ivison and
    Faeze Brahman and
    Lester James V. Miranda and
    Alisa Liu and
    Nouha Dziri and
    Shane Lyu and
    Yuling Gu and
    Saumya Malik and
    Victoria Graf and
    Jena D. Hwang and
    Jiangjiang Yang and
    Ronan Le Bras and
    Oyvind Tafjord and
    Chris Wilhelm and
    Luca Soldaini and
    Noah A. Smith and
    Yizhong Wang and
    Pradeep Dasigi and
    Hannaneh Hajishirzi
  },
  year  = {2024},
  email = {[email protected]}
}
```