Arcee-Blitz (24B) is a new Mistral-based, 24B-parameter model distilled from DeepSeek-V3 and designed to be fast and efficient. We view it as a practical “workhorse” model that can handle a range of tasks without the overhead of larger architectures.
Quantizations
Coming soon
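As a rough guide to what a quantization buys, weight storage scales with bits per weight. The sketch below assumes the 6.5 bits-per-weight figure implied by the `6.5bpw` tag in this repository's name and the 24B parameter count from the model details; it ignores the KV cache, activations, and any layers stored at a different precision, so treat the result as a lower bound rather than a measured footprint. The function name `estimate_weight_gb` is illustrative.

```python
def estimate_weight_gb(param_count: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    total_bits = param_count * bits_per_weight
    return total_bits / 8 / 1e9


# Assumption: 24B parameters at 6.5 bits per weight (read from the repo name).
quantized_gb = estimate_weight_gb(24e9, 6.5)

# For comparison: the same parameter count stored as 16-bit floats.
fp16_gb = estimate_weight_gb(24e9, 16)

print(f"quantized: {quantized_gb:.1f} GB, fp16: {fp16_gb:.1f} GB")
```

Actual memory use during inference will be higher, since the cache grows with context length.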
Model Details
- Architecture Base: Mistral-Small-24B-Instruct-2501
- Parameter Count: 24B
- Distillation Data:
- Merged the Virtuoso pipeline with the Mistral architecture, hot-starting training with over 3B tokens of pretraining distillation from DeepSeek-V3 logits.
- Fine-Tuning and Post-Training:
- After capturing core logits, we performed additional fine-tuning and distillation steps to enhance overall performance.
- License: Apache-2.0
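The card does not state which loss was used for distillation from teacher logits. As a general illustration only, a common formulation minimizes the KL divergence between the teacher's and the student's softmax distributions for each token. The sketch below shows that generic technique in plain Python; the function names, the temperature parameter, and the example logits are all hypothetical and are not taken from the Arcee training pipeline.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits into probabilities, with optional temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over one token's vocabulary distribution."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits over a three-token vocabulary.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]

loss = distillation_kl(teacher, student)
print(f"distillation loss: {loss:.4f}")
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what makes it usable as a training signal.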
Improving World Knowledge
Arcee-Blitz scores 60.20 on MMLU-Pro versus 44.70 for the original Mistral-Small-3 (see the benchmark comparison), a gain of 15.5 points that reflects a substantial increase in world knowledge.
Data contamination checking
We carefully examined our training data and pipeline to avoid contamination. While we’re confident in the validity of these gains, we remain open to further community validation and testing (one of the key reasons we release these models as open-source).
Benchmark Comparison
| Benchmark | Mistral-Small-3 | Arcee-Blitz |
|---|---|---|
| MixEval | 81.6% | 85.1% |
| GPQA Diamond | 42.4% | 43.1% |
| BigCodeBench Complete | 44.4% | 45.5% |
| BigCodeBench Instruct | 34.7% | 35.9% |
| BigCodeBench Complete-hard | 16.2% | 19.6% |
| BigCodeBench Instruct-hard | 15.5% | 15.5% |
| IFEval | 77.44 | 80.60 |
| BBH | 64.46 | 65.00 |
| GPQA | 33.90 | 36.70 |
| MMLU Pro | 44.70 | 60.20 |
| MuSR | 40.90 | 50.00 |
| Math Level 5 | 12.00 | 38.60 |
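To make the size of each gain easier to read, the per-benchmark differences can be computed directly from the table. The sketch below copies the reported scores as plain numbers (percent signs dropped) and subtracts the Mistral-Small-3 score from the Arcee-Blitz score; the variable and function names are illustrative.

```python
# (mistral-small-3, arcee-blitz) scores copied from the table above.
scores = {
    "MixEval": (81.6, 85.1),
    "GPQADiamond": (42.4, 43.1),
    "BigCodeBench Complete": (44.4, 45.5),
    "BigCodeBench Instruct": (34.7, 35.9),
    "BigCodeBench Complete-hard": (16.2, 19.6),
    "BigCodeBench Instruct-hard": (15.5, 15.5),
    "IFEval": (77.44, 80.60),
    "BBH": (64.46, 65.00),
    "GPQA": (33.90, 36.70),
    "MMLU Pro": (44.70, 60.20),
    "MuSR": (40.90, 50.00),
    "Math Level 5": (12.00, 38.60),
}


def score_deltas(table):
    """Return the point difference (arcee-blitz minus mistral-small-3) per benchmark."""
    return {name: round(new - base, 2) for name, (base, new) in table.items()}


deltas = score_deltas(scores)

# Print benchmarks from largest to smallest gain.
for name, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<28} {delta:+.2f}")
```

Sorted this way, the largest gains are on Math Level 5 and MMLU Pro, while BigCodeBench Instruct-hard is unchanged.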
Limitations
- Context Length: 32k tokens (may vary depending on the final tokenizer settings and system resources).
- Knowledge Cut-off: Training data may not reflect the latest events or developments beyond June 2024.
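In practice, the context limit means the prompt and the generated tokens have to share one budget. The following is a minimal sketch of trimming a tokenized prompt so that room remains for generation. It assumes "32k" means 32,768 tokens, which the card does not confirm, and `truncate_to_budget` is a hypothetical helper, not part of any library.

```python
CONTEXT_LIMIT = 32 * 1024  # assumption: "32k" read as 32,768 tokens


def truncate_to_budget(token_ids, max_new_tokens, context_limit=CONTEXT_LIMIT):
    """Keep the most recent prompt tokens so prompt + generation fits the context."""
    budget = context_limit - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Negative slicing keeps the tail; shorter prompts are returned unchanged.
    return token_ids[-budget:]


# Hypothetical prompt of 40,000 token IDs, reserving 512 tokens for the reply.
prompt_ids = list(range(40_000))
trimmed = truncate_to_budget(prompt_ids, max_new_tokens=512)
print(len(trimmed))
```

Dropping the oldest tokens is only one policy; summarizing or chunking long inputs may suit some applications better.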
Ethical Considerations
- Content Generation Risks: Like any language model, Arcee-Blitz can generate potentially harmful or biased content if prompted in certain ways.
License
Arcee-Blitz (24B) is released under the Apache-2.0 License. You are free to use, modify, and distribute this model in both commercial and non-commercial applications, subject to the terms and conditions of the license.
If you have questions or would like to share your experiences using Arcee-Blitz (24B), please connect with us on social media. We’re excited to see what you build—and how this model helps you innovate!
Model tree for jth01/Arcee-Blitz-6.5bpw-h8-exl2
Base model
mistralai/Mistral-Small-24B-Base-2501