GGUF Models struggle with ChatML format in LMStudio for me
Just curious if anyone else is experiencing issues using ChatML format with this model?
Despite updating LM Studio and trying several different GGUF uploads, the 8-bit GGUF of this model generates random nonsense in ChatML mode.
For me, it only generates normal text if I switch to the Llama-3 format.
The same settings and ChatML format work fine with Hermes 2 Pro, though, which makes me wonder whether this is a bug.
This is the first time I've experienced this with a Dolphin model.
Yeah, we're investigating this. You can use the original Llama-3 template in LM Studio for now.
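For anyone unsure what differs between the two templates, here is a rough sketch of how the same conversation is rendered in each. The function names `to_chatml` and `to_llama3` are hypothetical helpers written for illustration. The special-token layout follows the commonly published ChatML and Llama-3 Instruct formats, and the template actually embedded in a given GGUF file may differ.

```python
from typing import Dict, List

Message = Dict[str, str]  # expects "role" and "content" keys


def to_chatml(messages: List[Message]) -> str:
    """Render messages in ChatML, ending with an open assistant turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def to_llama3(messages: List[Message]) -> str:
    """Render messages in the Llama-3 Instruct layout, ending with an open assistant turn."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    demo = [{"role": "user", "content": "Who are you?"}]
    print(to_chatml(demo))
    print(to_llama3(demo))
```

Printing both strings for the same message makes it easy to compare against whatever prompt preset the front end is actually sending.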
It seems to be affecting more than just this model. A few other models I've tried that were trained on ChatML can't handle it once quantized down, but do fine in bf16. New model quirks 😶‍🌫️
@saishf I concur.
