Keural-DPO-14.83B (checkpoint 3500)
Keural is a bilingual Korean-English Mixture-of-Experts language model trained entirely from scratch. This is the DPO (Direct Preference Optimization) checkpoint at step 3,500 (~50% of 1 epoch), aligned from the Keural SFT-18k base using human preference data.
DPO alignment improves response quality, instruction-following, and reduces off-topic outputs compared to the SFT base.
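For readers unfamiliar with DPO, the per-pair objective can be sketched in a few lines. This is a minimal illustration of the standard DPO loss, not the training code used for this checkpoint; the function name `dpo_loss` and its argument names are hypothetical, and the inputs are assumed to be summed log-probabilities of each full response under the policy and the frozen SFT reference. The default `beta=0.1` matches the value reported in the DPO hyperparameters.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed sequence log-probabilities (illustrative sketch)."""
    # Implicit rewards: how much more likely the policy makes each response
    # than the reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2; it falls as the policy shifts probability toward the chosen response and away from the rejected one.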
Model Details
| Property | Value |
|---|---|
| Architecture | Mixtral-style MoE (8 experts, top-2 routing) |
| Parameters | 14.83B total / ~7.42B active per token |
| Layers | 24 |
| Hidden size | 4096 |
| Attention heads | 32 (GQA, 8 KV heads) |
| Expert intermediate size | 5632 |
| Context length | 4096 tokens |
| Vocabulary | 131,074 (131,072 SPM + `<\|im_start\|>` + `<\|im_end\|>`) |
| RoPE theta | 500,000 |
| Sliding window | 512 (alternating every other layer) |
| Dtype | bfloat16 |
| Languages | Korean (primary), English |
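As a rough sanity check on the table, the total parameter count can be estimated from the listed dimensions. The sketch below rests on assumptions that the card does not state: bias-free projections, gated expert MLPs with three weight matrices each, two norm weight vectors per layer plus a final norm, and input/output embeddings counted once (tied). Under those assumptions the estimate comes out close to the stated total.

```python
# Values taken from the Model Details table
hidden, layers, heads, kv_heads = 4096, 24, 32, 8
experts, expert_inter, vocab = 8, 5632, 131_074

head_dim = hidden // heads
kv_dim = kv_heads * head_dim

# Assumptions (not stated in the card): no biases, 3-matrix gated expert MLPs,
# two norm vectors per layer plus one final norm, tied embeddings.
attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q and o, then k and v
expert_mlp = 3 * hidden * expert_inter                 # gate, up, down projections
router = hidden * experts
norms = 2 * hidden

per_layer = attention + experts * expert_mlp + router + norms
total = layers * per_layer + vocab * hidden + hidden   # plus embeddings and final norm
print(f"{total / 1e9:.2f}B parameters")
```

Counting a separate, untied output head would add another `vocab * hidden` weights to this estimate.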
Full Training Pipeline
| Stage | Steps | Tokens | Data |
|---|---|---|---|
| Pretraining Stage 1 | 100,000 | ~50B | Korean + English web corpus |
| Pretraining Stage 2 | 120,000 | ~13B | Korean + English web corpus (continued) |
| SFT | 18,000 | 710M | mkd-chanwoo/keural-SFT (1.14M ChatML samples) |
| DPO (this checkpoint) | 3,500 / 6,927 | - | keural-dpo-raw (440K preference pairs) |
DPO Hyperparameters
| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-6 → 2e-7 (cosine decay) |
| Warmup steps | 100 |
| Beta (KL coefficient) | 0.1 |
| Effective batch size | 64 (2 per GPU × 16 grad accum × 2 GPUs) |
| Max sequence length | 1024 tokens |
| Optimizer | AdamW (β1=0.9, β2=0.95, ε=1e-8) |
| Weight decay | 0.1 |
| Max steps | 6,927 (1 epoch over 440K pairs) |
| Hardware | 2× NVIDIA H200 SXM (139 GiB each) |
| Parallelism | FSDP FULL_SHARD (ZeRO-3 equivalent) |
| Precision | bfloat16 + gradient checkpointing |
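The learning-rate row can be read as a schedule over optimizer steps. The sketch below is one plausible reading, not the training script: it assumes a linear warmup over the 100 warmup steps and a cosine decay that reaches the 2e-7 floor exactly at step 6,927. The function name `dpo_lr` is hypothetical. It also spells out the effective batch size arithmetic from the table.

```python
import math

def dpo_lr(step, peak=2e-6, floor=2e-7, warmup=100, max_steps=6927):
    """Learning rate at an optimizer step (assumed: linear warmup, then cosine decay)."""
    if step < warmup:
        return peak * (step + 1) / warmup
    progress = min(1.0, (step - warmup) / max(1, max_steps - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))

# Per-GPU batch x gradient accumulation steps x number of GPUs
effective_batch = 2 * 16 * 2

# Learning rate at the step where this checkpoint was saved, under the assumptions above
lr_at_checkpoint = dpo_lr(3500)
```

Under this reading, the checkpoint at step 3,500 sits roughly halfway down the cosine curve, between the peak and the floor.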
SFT Hyperparameters (base checkpoint)
| Hyperparameter | Value |
|---|---|
| Learning rate | 1e-5 → 1e-6 (cosine decay) |
| Effective batch size | 64 (4 per GPU × 8 grad accum × 2 GPUs) |
| Max sequence length | 4096 tokens |
| Weight decay | 0.05 |
| Steps | 18,000 |
Chat Format (ChatML)
This model uses ChatML format. You must use this exact format.
<|im_start|>system
You are a helpful bilingual Korean-English assistant.<|im_end|>
<|im_start|>user
안녕하세요! 오늘 날씨가 어때요?<|im_end|>
<|im_start|>assistant
The model generates until it produces <|im_end|> (token ID 131073).
Tip: Always include a system prompt. The model responds in the same language as the user when instructed to do so.
How to Use
With transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "mkd-hossain/keural-dpo-3500"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful bilingual Korean-English assistant. Always respond in the same language as the user."},
    # Korean: "Please tell me how to sort a list in Python."
    {"role": "user", "content": "파이썬에서 리스트를 정렬하는 방법을 알려주세요."},
]

# Render the messages as a ChatML prompt ending with the assistant turn header
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        no_repeat_ngram_size=8,
        do_sample=True,
        eos_token_id=131073,  # <|im_end|>
    )

# Decode only the newly generated tokens and cut at the end-of-turn marker
response = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=False)
response = response.split("<|im_end|>")[0].strip()
print(response)
With vLLM (recommended for serving)
pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model mkd-hossain/keural-dpo-3500 \
    --tokenizer mkd-hossain/keural-dpo-3500 \
    --dtype bfloat16 \
    --max-model-len 4096 \
    --tensor-parallel-size 1
Then call the OpenAI-compatible endpoint:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
response = client.chat.completions.create(
    model="mkd-hossain/keural-dpo-3500",
    messages=[
        {"role": "system", "content": "You are a helpful bilingual assistant. Respond in the same language as the user."},
        # Korean: "What is the capital of Korea?"
        {"role": "user", "content": "한국의 수도는 어디인가요?"},
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response.choices[0].message.content)
Multi-GPU serving
python -m vllm.entrypoints.openai.api_server \
    --model mkd-hossain/keural-dpo-3500 \
    --dtype bfloat16 \
    --max-model-len 4096 \
    --tensor-parallel-size 2
Manual ChatML prompt (without apply_chat_template)
prompt = (
    "<|im_start|>system\n"
    "You are a helpful bilingual Korean-English assistant. "
    "Always respond in the same language as the user."
    "<|im_end|>\n"  # no newline before <|im_end|>, matching the Chat Format section
    "<|im_start|>user\n"
    "Tell me about Seoul.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
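The same prompt can be assembled from a message list with a small helper. This is a minimal sketch that mirrors the layout shown in the Chat Format section; the function name `build_chatml_prompt` is hypothetical, and the exact whitespace of the tokenizer's bundled chat template is not shown in this card, so `apply_chat_template` remains the safer choice when the tokenizer is available.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful bilingual Korean-English assistant."},
    {"role": "user", "content": "Tell me about Seoul."},
])
```

Passing `add_generation_prompt=False` produces a closed transcript with no open assistant turn, which is useful for building training examples rather than inference prompts.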
Special Tokens
| Token | ID | Purpose |
|---|---|---|
| `<\|im_start\|>` | | Start of a chat turn |
| `<\|im_end\|>` | 131073 | End of a chat turn; generation stops here |
| `<bos>` | 1 | Beginning of sequence |
| `<eos>` | 2 | End of sequence |
| `<pad>` | 0 | Padding |
Important: Always set eos_token_id=131073 (<|im_end|>) when generating. Do not use eos_token_id=2.
Recommended Generation Settings
generation_config = {
    "max_new_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
    "repetition_penalty": 1.1,
    "no_repeat_ngram_size": 8,
    "do_sample": True,
    "eos_token_id": 131073,
}
For factual / deterministic tasks, use greedy decoding (temperature has no effect when do_sample is False):
{"do_sample": False, "eos_token_id": 131073}
DPO Dataset
Training used the keural-dpo-raw dataset: 440,627 chosen/rejected preference pairs in ChatML format, covering:
- General conversation (Korean and English)
- Question answering
- Instruction following
- Knowledge tasks
Limitations
- This is a mid-training checkpoint (step 3,500 of 6,927). A full-epoch checkpoint will be released when training completes.
- Maximum context is 4,096 tokens.
- The pretraining corpus is Korean-dominant. The model may default to Korean if no system prompt is provided.
- Always include a system prompt instructing the model to match the user's language for bilingual use.
- Not aligned for safety. Do not deploy in production without additional safety fine-tuning.
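Because the prompt and the generated tokens share the 4,096-token window, it can help to clamp the generation budget before calling the model. The helper below is a minimal sketch of that check; the name `clamp_max_new_tokens` is hypothetical and is not part of transformers or vLLM.

```python
MAX_CONTEXT = 4096  # context length from the Model Details table

def clamp_max_new_tokens(prompt_tokens, requested=512, max_context=MAX_CONTEXT):
    """Largest generation budget that keeps prompt + output inside the context window."""
    if prompt_tokens >= max_context:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens; the context limit is {max_context}"
        )
    return min(requested, max_context - prompt_tokens)
```

With the transformers example in this card, the prompt length would be `inputs.input_ids.shape[1]`, and the result can be passed as `max_new_tokens`.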
License
Apache 2.0
Model tree for mkd-hossain/keural-dpo-3500
Base model
mkd-hossain/keural-sft-18k