Instructions for using Surpem/Supertron1-14B with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use Surpem/Supertron1-14B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Surpem/Supertron1-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Surpem/Supertron1-14B")
model = AutoModelForCausalLM.from_pretrained("Surpem/Supertron1-14B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Surpem/Supertron1-14B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Surpem/Supertron1-14B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Surpem/Supertron1-14B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/Surpem/Supertron1-14B
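The curl call to the OpenAI-compatible endpoint can also be made from Python. The following is a minimal sketch using only the standard library. It assumes a server is already listening on `http://localhost:8000`, as in the vLLM example, and that the response follows the usual chat-completions shape (`choices[0].message.content`). The helper names `build_chat_request` and `ask` are hypothetical and not part of any library.

```python
import json
import urllib.request

MODEL_ID = "Surpem/Supertron1-14B"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text.

    Assumes a compatible server is already running at base_url.
    """
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` returns the reply as a string. For a server on another port, pass a different `base_url`, for example `base_url="http://localhost:30000"`.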
- SGLang
How to use Surpem/Supertron1-14B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Surpem/Supertron1-14B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Surpem/Supertron1-14B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Surpem/Supertron1-14B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Surpem/Supertron1-14B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Docker Model Runner
How to use Surpem/Supertron1-14B with Docker Model Runner:
docker model run hf.co/Surpem/Supertron1-14B
Supertron1-14B: A Powerful Instruction-Tuned Language Model
Model Description
Supertron1-14B is a language model fine-tuned from Qwen3-14B with QLoRA. Training focused on coding, mathematics, general knowledge, and safe reasoning. The model is intended to perform well on technical and analytical tasks while remaining safe and helpful.
- Developed by: Surpem
- Model type: Causal Language Model
- Architecture: Dense Transformer, 14B parameters
- Fine-tuned from: Qwen/Qwen3-14B
- Fine-tuning method: QLoRA (4-bit NF4 + LoRA r=16, alpha=32, all-linear targets)
- License: Apache 2.0
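To make the LoRA settings above concrete: in standard LoRA, an adapter on a linear layer of shape d_out × d_in adds two small matrices, A (r × d_in) and B (d_out × r), and the update B·A is scaled by alpha / r. The sketch below works through that arithmetic in plain Python. The 5120 × 5120 layer shape is an illustrative assumption only, not a figure taken from this model card.

```python
def lora_scaling(r, alpha):
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_trainable_params(d_in, d_out, r):
    """Parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


R, ALPHA = 16, 32  # values listed in the model card

# Hypothetical square projection, for illustration only.
d_in = d_out = 5120

scale = lora_scaling(R, ALPHA)
added = lora_trainable_params(d_in, d_out, R)
frozen = d_in * d_out

print(f"scaling = {scale}")                                # scaling = 2.0
print(f"adapter params = {added:,}")                       # adapter params = 163,840
print(f"fraction of frozen layer = {added / frozen:.4%}")  # 0.6250%
```

Under these assumptions the adapter trains well under 1% as many parameters as the frozen layer it wraps, which is the reason the 4-bit base weights can stay fixed during fine-tuning.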
Capabilities
Coding
Trained on tens of thousands of high-quality coding instruction pairs, the model can write, debug, explain, and refactor code in Python, JavaScript, C++, and other languages. It handles algorithmic reasoning and software design patterns, and produces clean, well-commented output.
Mathematics
With dedicated training on competition-style math and step-by-step solutions, the model handles problems ranging from algebra and calculus to advanced problem solving. It shows its full working rather than jumping to answers, which makes its output easier to follow when learning and easier to check when verifying a result.
General Knowledge
Broad instruction tuning across diverse domains means the model holds detailed technical conversations, assists with research and writing, explains difficult concepts clearly, and adapts to a wide range of tasks and formats.
Safety
Trained on the Anthropic HH-RLHF harmless dataset, the model is tuned to recognize and refuse harmful, illegal, or unethical requests with a brief, clear explanation. It prioritises helpfulness within safe boundaries.
Instruction Following
The model is highly responsive to natural language instructions and adapts its tone, format, and depth to what you ask for, from concise one-liners to detailed technical walkthroughs.
Get Started
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Surpem/Supertron1-14B"

# Load the tokenizer and the model in bfloat16, placing layers on the available devices
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function that checks if a number is prime."}
]

# Render the chat template to a prompt string, then tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
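The final line of the example slices the output because, for decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones. The same indexing can be illustrated with plain lists. The token IDs below are made up for the illustration and do not come from the real tokenizer.

```python
# Made-up token IDs standing in for real tensors.
prompt_ids = [101, 2054, 2003]               # what was passed to generate()
output_ids = prompt_ids + [7592, 2088, 102]  # the output repeats the prompt first

prompt_len = len(prompt_ids)             # plays the role of inputs["input_ids"].shape[-1]
new_token_ids = output_ids[prompt_len:]  # keep only the completion

print(new_token_ids)  # [7592, 2088, 102]
```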
Hardware Requirements
| Precision | Min VRAM | Recommended |
|---|---|---|
| bfloat16 | 30 GB | 40 GB (A100) |
| 4-bit quantized | 10 GB | 16 GB (RTX 3090/4090) |
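The figures in the table are consistent with a back-of-the-envelope estimate for the weights alone: parameter count times bytes per parameter. The sketch below shows that arithmetic. It assumes a round 14 billion parameters and ignores activations, the KV cache, and quantization metadata, which likely accounts for the table's minimums sitting above these numbers.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights only, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 14e9  # assumption: nominal "14B" parameter count

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"bfloat16 weights: ~{bf16_gb:.0f} GB")  # ~28 GB
print(f"4-bit weights:    ~{nf4_gb:.0f} GB")   # ~7 GB
```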
For 4-bit quantized inference:
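# Requires the bitsandbytes package
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "Surpem/Supertron1-14B"

# 4-bit NF4 weights with bfloat16 compute and double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)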
Citation
@misc{surpem2026supertron1-14b,
title={Supertron1-14B — Instruction-Tuned Language Model},
author={Surpem},
year={2026},
url={https://huggingface.co/Surpem/Supertron1-14B},
}