metadata
license: cc-by-sa-3.0
tags:
- Composer
- MosaicML
- llm-foundry
- StreamingDatasets
- mpt-7b
datasets:
- kunishou/databricks-dolly-15k-ja
- Jumtra/oasst1_ja
- Jumtra/jglue_jsquad
- Jumtra/jglue_jsquads_with_input
inference: false
language:
- ja
MPT-7B-inst
This model was fine-tuned from mosaicml/mpt-7b-instruct using MosaicML's llm-foundry repository.
Model Date
June 28, 2023
Model License
CC-BY-SA-3.0
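Evaluation
Model accuracy was evaluated on Jumtra/test_data_100QA; each score is the number of correct answers out of 100 questions.
| model name | accuracy (correct / total) |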
|---|---|
| mosaicml/mpt-7b | 16/100 |
| mosaicml/mpt-7b-instruct | 28/100 |
| Jumtra/mpt-7b-base | 47/100 |
| Jumtra/mpt-7b-inst | 46/100 |
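To make the comparison easier to read, the scores in the table can be converted to accuracies and differences. The snippet below is only a small sketch that reuses the counts from the table; the variable names are illustrative and not part of the original evaluation code.

```python
# Correct answers out of 100, copied from the table above.
TOTAL = 100
correct = {
    "mosaicml/mpt-7b": 16,
    "mosaicml/mpt-7b-instruct": 28,
    "Jumtra/mpt-7b-base": 47,
    "Jumtra/mpt-7b-inst": 46,
}

# Accuracy as a fraction of the 100 questions.
accuracy = {name: n / TOTAL for name, n in correct.items()}

# Improvement of this model over the model it was fine-tuned from,
# measured in correct answers (equivalently, percentage points here).
gain = correct["Jumtra/mpt-7b-inst"] - correct["mosaicml/mpt-7b-instruct"]

for name, acc in accuracy.items():
    print(f"{name}: {acc:.0%}")
print(f"gain over mosaicml/mpt-7b-instruct: {gain} points")
```

With only 100 questions, a one-question difference (such as 47 versus 46) is too small to rank the two Jumtra models against each other.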
Usage
Note: this model requires passing trust_remote_code=True to the from_pretrained method. This is because it uses a custom MPT model architecture that is not yet included in the Hugging Face transformers package. MPT includes options for many training-efficiency features, such as FlashAttention, ALiBi, and QK LayerNorm.
# Prompt format used
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
PROMPT_FOR_GENERATION_FORMAT = """{intro}
{instruction_key}
{instruction}
{response_key}
""".format(
intro=INTRO_BLURB,
instruction_key=INSTRUCTION_KEY,
instruction="{instruction}",
response_key=RESPONSE_KEY,
)
import torch
import transformers

name = 'Jumtra/mpt-7b-inst'

# Load the tokenizer from the same repository (it was never defined in the original snippet)
tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'torch'
config.init_device = 'cuda:0' # For fast initialization directly on GPU!
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16, # Load model weights in bfloat16
    trust_remote_code=True
).to("cuda:0")
model.eval()

# Build the prompt; the instruction means "What is a neural network?"
input_text = PROMPT_FOR_GENERATION_FORMAT.format(instruction="ニューラルネットワークとは何ですか?")
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
input_length = inputs.input_ids.shape[1]

# Without streaming
with torch.no_grad():
    generation_output = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=0.01,
        top_p=0.01,
        top_k=60,
        repetition_penalty=1.1,
        return_dict_in_generate=True,
        remove_invalid_values=True,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
token = generation_output.sequences[0, input_length:]
output = tokenizer.decode(token)
print(output)
#ニューラルネットワーク(NN)は、人工知能の分野で使用される深い学習アルゴリズムの一種です。これらのアルゴリズムは、データを使って自動的に学習し、特定の目的を達成するために予測や決定を行うことができます。ニューラルネットワークは、多くの異なるアプリケーションで使用されており、自動車の運転システム、検索エンジン、画像認識などです。<|endoftext|>
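The sample output ends with an `<|endoftext|>` marker, and the prompt has to follow the instruction format shown earlier. The sketch below wraps both steps in two small helpers. `build_prompt` and `clean_response` are hypothetical names introduced here for illustration, not part of the model repository, and the decoded string is shortened from the sample output above.

```python
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
END_OF_TEXT = "<|endoftext|>"  # marker seen at the end of the sample output


def build_prompt(instruction: str) -> str:
    """Hypothetical helper: wrap an instruction in the prompt format used above."""
    return f"{INTRO_BLURB}\n{INSTRUCTION_KEY}\n{instruction}\n{RESPONSE_KEY}\n"


def clean_response(decoded: str) -> str:
    """Hypothetical helper: keep only the text before the first end-of-text marker."""
    return decoded.split(END_OF_TEXT, 1)[0].strip()


prompt = build_prompt("ニューラルネットワークとは何ですか?")
print(prompt)

# Shortened from the sample output above, for illustration only.
decoded = "ニューラルネットワーク(NN)は、人工知能の分野で使用される深い学習アルゴリズムの一種です。<|endoftext|>"
print(clean_response(decoded))
```

An alternative, assuming the marker is registered as a special token in the tokenizer, is to pass `skip_special_tokens=True` to `tokenizer.decode`, which avoids the string handling entirely.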
Citation
@online{MosaicML2023Introducing,
author = {MosaicML NLP Team},
title = {Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs},
year = {2023},
url = {www.mosaicml.com/blog/mpt-7b},
note = {Accessed: 2023-03-28}, % change this date
urldate = {2023-03-28} % change this date
}