Commit c271818 (1 parent: 66aedd3): Update README.md
README.md CHANGED
@@ -32,28 +32,14 @@ Apache-2.0 (commercial use permitted)
 * [Codebase (mosaicml/llm-foundry repo)](https://github.com/mosaicml/llm-foundry/)
 * Questions: Feel free to contact us via the [MosaicML Community Slack](https://join.slack.com/t/mosaicml-community/shared_invite/zt-w0tiddn9-WGTlRpfjcO9J5jyrMub1dg)!

-### Example
-
-**
->
->"Introducing MPT-7B, the latest entry in our MosaicML Foundation Series.
->MPT-7B is a transformer trained from scratch on 1T tokens of text and code.
->It is open source, available for commercial use, and it matches the quality of LLaMA-7B.
->MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k.
->Starting today, you can train, finetune, and deploy your own private MPT models, either starting from one of our checkpoints or training from scratch.
->For inspiration, we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens!
-
-**MPT-7B-Instruct**:
->Here's my suggestion for a tweet:
->"Exciting news! @MosaicML has launched #MPT_7B - a new standard for open-source, commercially usable LLMs. With training on 1T tokens of text & code, it matches the quality of LLaMA-7B. Now you can train, fine-tune, and deploy your own private MPT models via their platform. Check out the full blog post here: https://bit.ly/3j6UJzZ"
-
-
->Is there anything you want to say to MosaicML about this exciting news?
-
-**MPT-7B-Instruct**:
-> As an AI language model myself, I am very impressed with the advancements made by MosaicML in developing such high-quality open-source LLMs.
-> This new version of MPT-7B is truly impressive and I look forward to seeing what innovative applications developers will create using these powerful tools.
-> Thank you for your hard work and dedication to advancing AI research and development.
-
+### Example Question/Instruction
+
+**Longboi24**
+> What is a quoll?
+
+**MPT-7B-Instruct**
+
+>A Quoll (pronounced “cool”) is one of Australia’s native carnivorous marsupial mammals, which are also known as macropods or wallabies in other parts around Asia and South America
+
 ## How to Use

@@ -73,6 +59,21 @@ model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b-instr
 model.to(device='cuda:0', dtype=torch.bfloat16)
 ```

+Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:
+
+```python
+config = transformers.AutoConfig.from_pretrained('mosaicml/mpt-7b', trust_remote_code=True)
+config.update({"max_seq_len": 4096})
+model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b', config=config, trust_remote_code=True)
+```
+
+This model was trained with the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer.
+
+```python
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
+```
+
 ## Model Description

 The architecture is a modification of a standard decoder-only transformer.
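The two snippets added in this hunk are shown in isolation. As a minimal sketch, the `max_seq_len` override can be combined with the bfloat16/GPU placement from the surrounding How to Use section; the `mosaicml/mpt-7b-instruct` checkpoint name below is assumed from the hunk header (the snippet in the diff itself loads `mosaicml/mpt-7b`).

```python
# Sketch: combine the max_seq_len override added in this commit with the
# bfloat16/GPU placement shown in the README's How to Use section.
# The 'mosaicml/mpt-7b-instruct' name is an assumption taken from the hunk header.
import torch
import transformers

config = transformers.AutoConfig.from_pretrained('mosaicml/mpt-7b-instruct', trust_remote_code=True)
config.update({"max_seq_len": 4096})  # ALiBi allows running past the 2048-token training length

model = transformers.AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b-instruct',
    config=config,
    trust_remote_code=True,
)
model.to(device='cuda:0', dtype=torch.bfloat16)
```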
@@ -104,4 +105,4 @@ This model was finetuned on 440 A100-40GBs for about half a day using the [Mosai

 ## Acknowledgements

-This model was finetuned by Sam Havens
+This model was finetuned by Sam Havens and the MosaicML NLP team
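For the tokenizer snippet added in the How to Use hunk, a rough end-to-end sketch pairing the gpt-neox-20b tokenizer with a loaded MPT checkpoint might look like the following; the checkpoint name, prompt, and generation settings are illustrative assumptions, not part of this commit.

```python
# Sketch: pair the EleutherAI/gpt-neox-20b tokenizer from the README with an MPT
# checkpoint for generation. Checkpoint name, prompt, and generation settings are
# illustrative assumptions.
import torch
import transformers
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = transformers.AutoModelForCausalLM.from_pretrained('mosaicml/mpt-7b-instruct', trust_remote_code=True)
model.to(device='cuda:0', dtype=torch.bfloat16)

prompt = "What is a quoll?"  # the example question from the updated README
inputs = tokenizer(prompt, return_tensors="pt").to('cuda:0')
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```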