Qwen3 8B: Italian Cultural Alignment [V1]
Qwen3 8B [V1] is a LoRA adapter fine-tuned on top of Qwen/Qwen3-8B to improve Italian cultural alignment. It was trained on the Mult-IT dataset and evaluated on the ITALIC benchmark. This is the first version in a series of experiments exploring how supervised fine-tuning affects both cultural performance and the chain-of-thought reasoning capabilities of Qwen3's hybrid-reasoning architecture.
Author: Maruf Bepary, King's College London
Research report: Alignment in Large Language Models
⚠️ Important: V1 was trained exclusively on non-thinking format data. This caused catastrophic forgetting of Qwen3's chain-of-thought (`<think>`) capability. Use No Thinking mode only with this adapter. See Key Finding below.
Model Summary
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-8B |
| PEFT type | LoRA |
| Task | Causal language modelling (Italian Q&A / instruction following) |
| Training dataset | Mult-IT (~86,929 samples) |
| Evaluation benchmark | ITALIC (10,000 questions) |
| No Thinking accuracy (V1) | 73.77% (+3.60 pp over baseline) |
| Thinking accuracy (V1) | 59.33% (−15.16 pp, collapsed) |
| Trainable parameters | 65,470,464 / 8,256,205,824 (0.79%) |
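The trainable-parameter figure can be cross-checked against the LoRA settings listed under Training Details. A LoRA pair on a linear layer with input width `d_in` and output width `d_out` adds `r × (d_in + d_out)` weights. The sketch below assumes the commonly published Qwen3-8B dimensions (36 decoder layers, hidden size 4096, 32 query heads and 8 key/value heads of dimension 128, MLP width 12288); none of these are stated in this card, so treat them as assumptions.

```python
# Assumed Qwen3-8B dimensions (not stated in this card).
NUM_LAYERS = 36
HIDDEN = 4096
Q_OUT = 32 * 128   # query heads x head dimension
KV_OUT = 8 * 128   # key/value heads x head dimension
MLP = 12288
RANK = 24          # LoRA rank from the configuration table

# (d_in, d_out) for each targeted projection in one decoder layer.
SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_param_count(rank, shapes, num_layers):
    """Count LoRA weights: rank * (d_in + d_out) per adapted linear layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(RANK, SHAPES, NUM_LAYERS)
total = 8_256_205_824  # total parameter count reported in the summary table
print(f"{trainable:,} trainable ({100 * trainable / total:.2f}% of {total:,})")
```

Under these assumed dimensions the count agrees with the 65,470,464 trainable parameters reported in the table.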
Intended Use
This model is intended for:
- Italian language understanding: multiple-choice Q&A, cultural knowledge, and general instruction following in Italian.
- Research: studying the effects of SFT on hybrid-reasoning language models, particularly reasoning mode degradation.
- Benchmarking: comparing Italian cultural alignment across model sizes and training strategies.
Not recommended for:
- Tasks requiring chain-of-thought reasoning (Thinking mode is non-functional in V1).
- High-stakes or safety-critical applications.
- Languages other than Italian.
Key Finding: Reasoning Degradation
Training Qwen3 (a hybrid-reasoning model) exclusively on non-thinking format supervised fine-tuning data causes catastrophic forgetting of chain-of-thought capability:
| Mode | Baseline | V1 | Delta |
|---|---|---|---|
| No Thinking (total) | 70.17% | 73.77% | +3.60 pp |
| Thinking (total) | 74.49% | 59.33% | −15.16 pp |
V1 improved No Thinking performance in 11 of the 12 ITALIC categories (Events was essentially flat at −0.07 pp) whilst severely disrupting the `<think>…</think>` reasoning pathway. This finding motivated a mixed-training approach in V2 and V3, where both thinking and non-thinking formatted examples are interleaved within a single SFT pass.
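As a quick arithmetic check, the deltas in the table above can be recomputed from the reported accuracies. The helper below is purely illustrative (the function name is hypothetical) and also reports the change relative to the baseline score, which the table does not list.

```python
# Accuracy figures (percent) copied from the table above.
scores = {
    "No Thinking": {"baseline": 70.17, "v1": 73.77},
    "Thinking": {"baseline": 74.49, "v1": 59.33},
}


def deltas(baseline, v1):
    """Return (absolute change in percentage points, relative change in %)."""
    absolute = v1 - baseline
    relative = 100 * absolute / baseline
    return round(absolute, 2), round(relative, 1)


for mode, s in scores.items():
    pp, rel = deltas(s["baseline"], s["v1"])
    print(f"{mode}: {pp:+.2f} pp ({rel:+.1f}% relative to baseline)")
```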
Training Details
LoRA Configuration
| Parameter | Value |
|---|---|
| LoRA rank (r) | 24 |
| LoRA alpha | 48 |
| LoRA dropout | 0.1 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Bias | none |
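For readers who want to reproduce the adapter shape, the table maps directly onto the arguments of PEFT's `LoraConfig`. This is a minimal sketch, not the author's training script: `task_type="CAUSAL_LM"` is an assumption inferred from the task listed in the Model Summary, and the import is guarded so the snippet still runs where `peft` is not installed.

```python
# Values copied from the LoRA configuration table.
lora_settings = {
    "r": 24,
    "lora_alpha": 48,
    "lora_dropout": 0.1,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "bias": "none",
}

# Standard LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_settings["lora_alpha"] / lora_settings["r"]
print(f"LoRA scaling factor: {scaling}")

try:
    from peft import LoraConfig

    # task_type is an assumption (the card lists causal language modelling).
    lora_config = LoraConfig(task_type="CAUSAL_LM", **lora_settings)
except ImportError:
    lora_config = None  # peft not available; the dictionary still documents the settings
```

With rank 24 and alpha 48, the update is scaled by a factor of 2, a common choice of alpha equal to twice the rank.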
Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 2 |
| Total steps | 3,076 |
| Per-device batch size | 4 |
| Sequence packing | Yes (max 2,048 tokens per slot) |
| Peak learning rate | ~4×10⁻⁵ |
| LR schedule | Cosine |
| Warmup steps | |
| Max sequence length | 2,048 tokens |
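To give a feel for the cosine schedule, the sketch below evaluates a textbook cosine decay using the peak learning rate and step count from the table. The warmup value is not given in this card, so the function takes it as a parameter and defaults to zero; that default is an assumption, and the trainer's actual schedule may differ in detail.

```python
import math

PEAK_LR = 4e-5       # approximate peak learning rate from the table
TOTAL_STEPS = 3076   # total optimiser steps from the table


def cosine_lr(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup (if any) followed by cosine decay to zero."""
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, at the end of epoch 1, and at the final step.
for step in (0, 1538, 3076):
    print(step, f"{cosine_lr(step):.2e}")
```

With no warmup, the rate has fallen to half its peak at the first checkpoint (step 1,538) and reaches zero at step 3,076.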
Checkpoints
| Checkpoint | Step | Epoch |
|---|---|---|
| checkpoint-1538 | 1,538 | 1 |
| checkpoint-3076 | 3,076 | 2 (final) |
Framework & Hardware
| Component | Version / Spec |
|---|---|
| TRL | 0.21.0 |
| PEFT | 0.17.0 |
| Transformers | 4.55.0 |
| PyTorch | 2.5.1+cu121 |
| Hardware | NVIDIA GeForce RTX 3090 |
Training Dataset: Mult-IT
- Dataset: Mult-IT (Multiple Choice Questions on Multiple Topics in Italian)
- Source: CALAMITA Shared Task @ CLiC-it 2024
- Language: Italian
- Size: ~86,929 training samples
- Format: JSONL, multiple-choice Q&A
- Reference: Mult-IT: Multiple Choice Questions on Multiple Topics in Italian (2024)
ITALIC Benchmark Results
Benchmark: ITALIC (NAACL 2025), Italian Culture-Aware Natural Language Benchmark
Format: Zero-shot, multiple-choice (12 categories, 10,000 questions)
System prompt: "Sei un assistente utile."
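The exact prompt template used for evaluation is not given here. As a rough illustration, the sketch below formats one multiple-choice item into chat messages using the stated system prompt; the option layout and the closing instruction mirror the Usage example later in this card, and the `build_messages` helper itself is hypothetical.

```python
SYSTEM_PROMPT = "Sei un assistente utile."  # system prompt stated in this card


def build_messages(question, options):
    """Format one multiple-choice item as chat messages (assumed layout)."""
    letters = "ABCDEFGH"
    lines = [question]
    lines += [f"{letters[i]}) {opt}" for i, opt in enumerate(options)]
    lines.append("")  # blank line before the instruction
    lines.append("Rispondi con la lettera della risposta corretta.")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "\n".join(lines)},
    ]


messages = build_messages(
    "Qual è la capitale d'Italia?",
    ["Milano", "Roma", "Napoli", "Torino"],
)
print(messages[1]["content"])
```

Zero-shot here means the messages contain only the question itself, with no solved examples prepended.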
No Thinking Mode: V1 vs Baseline
| Category | Baseline | V1 | Δ |
|---|---|---|---|
| Art | 69.29 | 71.02 | +1.73 |
| Civic | 73.18 | 76.98 | +3.80 |
| Events | 76.09 | 76.02 | −0.07 |
| Geography | 75.89 | 77.22 | +1.33 |
| History | 71.37 | 74.44 | +3.07 |
| Literature | 64.33 | 68.09 | +3.76 |
| Tourism | 68.27 | 69.49 | +1.22 |
| Lexicon | 84.27 | 87.33 | +3.06 |
| Morphology | 50.71 | 54.71 | +4.00 |
| Orthography | 54.04 | 63.44 | +9.40 |
| Synonyms | 84.04 | 90.42 | +6.38 |
| Syntax | 59.20 | 61.87 | +2.67 |
| Culture (subtotal) | 70.47 | 72.91 | +2.44 |
| Language (subtotal) | 69.73 | 75.05 | +5.32 |
| Total | 70.17 | 73.77 | +3.60 |
Thinking Mode: V1 vs Baseline (collapsed)
| Metric | Baseline | V1 | Δ |
|---|---|---|---|
| Total | 74.49 | 59.33 | −15.16 |
| Culture | 73.13 | 57.25 | −15.88 |
| Language | 76.49 | 62.42 | −14.07 |
Comparison with Other Models (No Thinking, ITALIC Total)
| Model | Total | Parameters |
|---|---|---|
| Llama 3.1 70B | 83.61% | 70B |
| GPT-4o Mini | 82.22% | ~8B |
| Qwen3 14B (No Thinking) | 77.78% | 14B |
| Llama 3.1 8B Ita [V1] | 73.91% | 8B |
| Qwen3 8B (No Thinking) [V3] | 73.81% | 8B |
| Qwen3 8B (No Thinking) [V1] | 73.77% | 8B |
| Qwen3 8B (No Thinking) baseline | 70.17% | 8B |
| Llama 3.1 8B | 66.38% | 8B |
All scores were obtained under identical zero-shot conditions on the ITALIC benchmark.
Usage
⚠️ Thinking mode must be disabled. V1 fine-tuning disrupted Qwen3's chain-of-thought capability. Always pass `enable_thinking=False` when using this adapter.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen3-8B"
adapter_id = "m-beps/qwen3-8b-finetune-multit-nothinking"

# Load tokeniser and base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Example: Italian multiple-choice question
messages = [
    {"role": "system", "content": "Sei un assistente utile."},
    {
        "role": "user",
        "content": (
            "Qual è la capitale d'Italia?\n"
            "A) Milano\nB) Roma\nC) Napoli\nD) Torino\n\n"
            "Rispondi con la lettera della risposta corretta."
        ),
    },
]

# Apply chat template with thinking mode disabled (critical for V1)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # <-- must be False for V1
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding; sampling parameters are unset to match do_sample=False
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        temperature=None,
        top_p=None,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)
# Expected output: "B"
```
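When scoring many questions, the generated text has to be reduced to a single option letter. The helper below is a hypothetical heuristic, not the ITALIC benchmark's official scorer: it takes the first standalone capital letter among the valid options and treats anything else as unanswered. Matching is case-sensitive on purpose, so that the Italian preposition "a" is not mistaken for option A.

```python
import re


def extract_choice(response, valid="ABCD"):
    """Return the first standalone option letter in the response, or None."""
    match = re.search(rf"\b([{valid}])\b", response.strip())
    return match.group(1) if match else None


def accuracy(responses, gold):
    """Percentage of responses whose extracted letter equals the gold letter."""
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return 100 * correct / len(gold)


print(extract_choice("B"))
print(extract_choice("La risposta è B) Roma"))
print(round(accuracy(["B", "C", "non lo so"], ["B", "C", "A"]), 2))
```

A response with no recognisable letter counts as incorrect, which is a conservative choice for a model that sometimes answers in free text.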
Limitations
- Thinking mode is non-functional: chain-of-thought reasoning was catastrophically disrupted during V1 training. Use No Thinking mode exclusively.
- Morphology remains the weakest category at 54.71%, suggesting limited syntactic generalisation.
- Benchmark scope: evaluation was conducted solely on ITALIC; Italian cultural performance on other benchmarks (e.g. MMLU-IT, HellaSwag-IT) is unverified.
- Single-GPU training: training used one RTX 3090; larger batch sizes or multi-GPU configurations may yield different results.
- Dataset bias: Mult-IT is a multiple-choice dataset; the model may not generalise equally well to open-ended Italian generation tasks.
Related resources:
- Research report: Alignment in Large Language Models
- Base model: Qwen/Qwen3-8B
- ITALIC benchmark: RiTA-nlp/ITALIC
- Mult-IT dataset: sapienzanlp/Mult-IT
- PEFT documentation: huggingface.co/docs/peft