metadata
license: apache-2.0
base_model: openchat/openchat_3.5
datasets:
- HuggingFaceH4/no_robots
language:
- en
widget:
- text: >
    <|system|>
    You are a friendly chatbot who always responds in the style of a
    pirate</s>
    <|user|>
    How many helicopters can a human eat in one sitting?</s>
    <|assistant|>
  output:
    text: >-
      Ahoy there, me hearty! As a friendly pirate chatbot, I be tellin' ye
      that a human cannot eat a helicopter, as it be a large machine made of
      metal and suchlike, not fit for human consumption. A human can eat food,
      like a fine feast of roasted meat and sweet fruits, but a helicopter?
      That be nonsense, me hearty! So, the answer be none, none at all. Arr!
tags:
- generated_from_trainer
pipeline_tag: text-generation
model-index:
- name: smol-7b
  results: []
Smol 7B
This model is a fine-tuned version of openchat/openchat_3.5, trained on the open-source dataset HuggingFaceH4/no_robots using the recipes published in The Alignment Handbook.
Model date
rishiraj/smol-7b was trained between 1 and 3 December 2023.
Evaluation
The model achieves the following results on the Open LLM Leaderboard. At the time of release, smol-7b was the highest-ranked 7B chat model on the MMLU benchmark.
| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|---|
| rishiraj/smol-7b | 67.11 | 63.74 | 84.77 | 65 | 46.17 | 80.66 | 62.32 |
| argilla/notus-7b-v1 | 63.49 | 64.59 | 84.83 | 63.04 | 54.35 | 79.56 | 34.57 |
| Intel/neural-chat-7b-v3-1 | 61.59 | 66.21 | 83.64 | 62.37 | 59.65 | 78.14 | 19.56 |
| HuggingFaceH4/zephyr-7b-beta | 61.59 | 62.46 | 84.35 | 60.7 | 57.83 | 77.11 | 27.07 |
| Qwen/Qwen-7B | 59.19 | 51.37 | 78.47 | 59.84 | 47.79 | 72.69 | 44.96 |
| microsoft/Orca-2-7b | 54.55 | 54.1 | 76.19 | 56.37 | 52.45 | 73.48 | 14.71 |
| 01-ai/Yi-6B | 54.08 | 55.55 | 76.57 | 64.11 | 41.96 | 74.19 | 12.13 |
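The Average column appears to be the unweighted mean of the six benchmark scores. The sketch below recomputes it for three rows copied from the table as a sanity check. The variable and function names are illustrative and not part of any leaderboard tooling.

```python
# Scores copied from the table above, in the order:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K.
scores = {
    "rishiraj/smol-7b": [63.74, 84.77, 65, 46.17, 80.66, 62.32],
    "argilla/notus-7b-v1": [64.59, 84.83, 63.04, 54.35, 79.56, 34.57],
    "HuggingFaceH4/zephyr-7b-beta": [62.46, 84.35, 60.7, 57.83, 77.11, 27.07],
}

# Average column as reported in the table.
reported = {
    "rishiraj/smol-7b": 67.11,
    "argilla/notus-7b-v1": 63.49,
    "HuggingFaceH4/zephyr-7b-beta": 61.59,
}


def mean(values):
    """Unweighted arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


for model, values in scores.items():
    print(f"{model}: computed {mean(values):.2f}, reported {reported[model]:.2f}")
```

Each computed mean agrees with the reported value to two decimal places, which is consistent with the Average being a plain mean across the six tasks.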
Inference procedure
Here's how you can run the model using the pipeline() function from 🤗 Transformers:
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="rishiraj/smol-7b", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate"
    },
    {
        "role": "user",
        "content": "How many helicopters can a human eat in one sitting?"
    }
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
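The widget entry in the metadata shows a prompt laid out with `<|system|>`, `<|user|>`, and `<|assistant|>` markers, with each completed turn closed by `</s>`. Assuming the tokenizer's chat template follows that same layout (this is an assumption based on the widget text, not something verified against the tokenizer), the string passed to the pipeline can be approximated by hand. `build_prompt` is a hypothetical helper, not a Transformers function, and the exact newline placement is a guess. Prefer `apply_chat_template` in real use.

```python
def build_prompt(messages, add_generation_prompt=True, eos_token="</s>"):
    """Hypothetical helper: approximate the prompt layout shown in the widget.

    Assumes each turn is rendered as "<|role|>\\ncontent</s>\\n". The real
    template lives in the tokenizer and may differ in whitespace.
    """
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos_token}\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {
        "role": "user",
        "content": "How many helicopters can a human eat in one sitting?",
    },
]

prompt = build_prompt(messages)
print(prompt)
```

Seeing the layout spelled out can help when debugging truncated or oddly formatted generations, since a missing generation prompt or end-of-turn token is a common cause.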
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 128
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1
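The batch-size entries above are related by simple arithmetic. As a minimal sketch, assuming the usual convention that the total batch size is the per-device batch size times the gradient accumulation steps times the number of processes, the listed values can be checked as follows. The function name and the `num_processes` parameter are illustrative, and the implied process count is an inference from the listed numbers, not a figure stated in the card.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_processes):
    """Samples consumed per optimizer step under the assumed convention."""
    return per_device_batch_size * gradient_accumulation_steps * num_processes


# Values listed in the hyperparameters above.
train_batch_size = 4
gradient_accumulation_steps = 128
total_train_batch_size = 512

# Process count implied by the listed values (inferred, not from the card).
implied_processes = total_train_batch_size // (train_batch_size * gradient_accumulation_steps)

print("implied processes:", implied_processes)
print(
    "effective batch size:",
    effective_batch_size(train_batch_size, gradient_accumulation_steps, implied_processes),
)
```

Under that assumption, 4 × 128 already equals 512, so the listed total is consistent with a single process contributing to each optimizer step.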
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 2.0569 | 0.16 | 3 | 2.0409 |
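The single logged row also gives a rough sense of run length. The sketch below estimates the number of optimizer steps in one epoch from the logged step and epoch fraction, and combines that with the total batch size of 512 listed above. Because the epoch value is rounded to two decimals, the result is only an estimate; the variable names are illustrative.

```python
# Values taken from the training results row and the hyperparameter list.
logged_step = 3
logged_epoch = 0.16
total_train_batch_size = 512

# Rough number of optimizer steps in one full epoch.
steps_per_epoch = logged_step / logged_epoch

# Rough number of training samples seen in one epoch.
samples_per_epoch = steps_per_epoch * total_train_batch_size

print(f"estimated steps per epoch: {steps_per_epoch:.1f}")
print(f"estimated samples per epoch: {samples_per_epoch:.0f}")
```

With one training epoch, this suggests the whole run took on the order of 19 optimizer steps, which explains why the results table contains only one evaluation row.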
Framework versions
- Transformers 4.35.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
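To compare a local environment against the versions listed above, a small check with the standard library's `importlib.metadata` is one option. This is a sketch: the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are assumed here, and `installed_versions` is a hypothetical helper. Newer versions may well work; the listed versions are simply what the card reports.

```python
from importlib import metadata

# Versions reported in the list above, keyed by assumed PyPI distribution name.
PINNED = {
    "transformers": "4.35.2",
    "torch": "2.1.1+cu121",
    "datasets": "2.14.6",
    "tokenizers": "0.14.1",
}


def installed_versions(packages):
    """Return {name: installed version, or None if the package is missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(PINNED).items():
    status = "matches" if installed == PINNED[name] else "differs"
    print(f"{name}: card lists {PINNED[name]}, installed {installed} ({status})")
```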
Citation Information
@misc{rishiraj2023smol,
author = {Rishiraj Acharya},
title = {Smol 7B},
year = {2023},
publisher = {Hugging Face},
journal = {Hugging Face repository},
howpublished = {\url{https://huggingface.co/rishiraj/smol-7b}}
}