Qwen3-14B-AI-Expert-250925 - A Fine-tuned Model for AI Core Technologies
🤗 Hugging Face | 🤖 ModelScope
This model is a specialized expert on core Artificial Intelligence concepts, developed by performing Instruction Supervised Fine-Tuning (SFT) on the Qwen/Qwen3-14B model.
The fine-tuning was conducted using Low-Rank Adaptation (LoRA), a parameter-efficient technique, on a custom-built dataset. This process adapted the model to provide high-quality, detailed responses specifically within the domains of:
- Large Language Models (LLMs)
- Retrieval-Augmented Generation (RAG)
- AI Agents
The model was fine-tuned with LLaMA-Factory.
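LLaMA-Factory drives LoRA training through its own configuration files, and the exact hyperparameters used for this run are not published. As a rough, hypothetical sketch of what the adapter setup amounts to in plain Hugging Face PEFT (the rank, alpha, and target modules below are assumptions, not the values actually used):

```python
# Minimal LoRA setup sketch with Hugging Face PEFT; the hyperparameters are
# illustrative assumptions, not the values used to train this model.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B", torch_dtype="auto")
lora_config = LoraConfig(
    r=8,                 # low-rank dimension (assumed)
    lora_alpha=16,       # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```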
- Developed by: real-jiakai
- License: apache-2.0
- Finetuned from model: Qwen/Qwen3-14B
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "GXMZU/Qwen3-14B-ai-expert-250925"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "What is MCP Protocol?"
messages = [
    {"role": "system", "content": "You are an AI expert assistant (Focus on LLM, RAG, and Agent Domain) to help with technical questions. You should provide clear, accurate, and helpful responses."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Parse thinking content (if enabled)
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # No </think> token found: the whole output is the response
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

if thinking_content:
    print("Thinking content:", thinking_content)
print("Response:", content)
```
Performance
The primary objective of this fine-tuning is to adapt the model to a specialized domain, enhancing its performance on specific tasks by injecting relevant knowledge and terminology while preserving its foundational generalist capabilities.
Before Fine-tuning vs After Fine-tuning
The model demonstrates clear qualitative improvements on domain-specific questions about LLMs, RAG, and AI agents.
Note: in the absence of a dedicated test dataset, the improvement from fine-tuning cannot yet be quantified. We plan to construct such a dataset for more rigorous before/after evaluation.
NLG Evaluation
The following table compares the base model and the fine-tuned model on standard benchmarks:
| Benchmark | Metric | Qwen/Qwen3-14B (Base) | Qwen3-14B-ai-expert-250925 (Fine-tuned) |
|---|---|---|---|
| MMLU | Average | 35.29 | 31.68 |
| | STEM | 35.49 | 34.39 |
| | Social Sciences | 38.64 | 30.52 |
| | Humanities | 31.90 | 29.08 |
| | Other | 36.86 | 34.05 |
| CEval | Average | 33.21 | 39.00 |
| | STEM | 34.42 | 40.47 |
| | Social Sciences | 32.36 | 40.36 |
| | Humanities | 29.96 | 33.85 |
| | Other | 34.64 | 39.84 |
| CMMLU | Average | 32.35 | 33.61 |
| | STEM | 31.61 | 36.67 |
| | Social Sciences | 34.15 | 32.45 |
| | Humanities | 31.06 | 32.62 |
| | Other | 31.86 | 33.26 |
The results show consistent gains on the Chinese-language benchmarks (CEval and CMMLU) at the cost of a modest regression on MMLU, suggesting that the domain adaptation largely, though not fully, preserved the base model's general capabilities.
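The card does not state which evaluation harness produced these numbers. As a hypothetical reproduction sketch using EleutherAI's lm-evaluation-harness (the task names, dtype, and batch size below are assumptions, not the settings actually used):

```python
# Hypothetical benchmark run with lm-evaluation-harness; nothing here is
# confirmed to match the setup behind the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=GXMZU/Qwen3-14B-ai-expert-250925,dtype=auto",
    tasks=["mmlu", "ceval-valid", "cmmlu"],  # assumed harness task names
    batch_size=8,
)
print(results["results"])
```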
Fine-tuning Procedure
Dataset
The model was fine-tuned on a custom, high-quality dataset of 9,735 Alpaca-format items (an illustrative record is sketched after this list). The dataset was carefully curated to cover three core areas:
- Large Language Models (LLM)
- Retrieval-Augmented Generation (RAG)
- AI Agents
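Each item follows the standard Alpaca schema. A minimal, invented example of what one record looks like (the actual dataset contents are not published):

```python
# One hypothetical Alpaca-format record; the field names follow the
# standard schema, but the content is invented for illustration.
sample = {
    "instruction": "Explain how retrieval-augmented generation (RAG) reduces hallucinations.",
    "input": "",  # optional supplementary context; empty for open-ended questions
    "output": "RAG grounds the model's answer in retrieved documents, so claims can be traced back to sources ..."
}
```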
Training Loss
You can view the full training run on Weights & Biases.
Future Plans
- Enhanced Evaluation Framework: Implement more flexible evaluation metrics, including LLM-as-a-Judge methodologies (prerequisite: building comprehensive test datasets for rigorous assessment).
- Dataset Expansion: Continue to grow the instruction fine-tuning dataset, adding new data with an emphasis on both quality and quantity while preserving the quality of the existing data.
- Data Quality Enhancement: Refine the existing instruction-tuning dataset by correcting and standardizing its phrasing and formatting.
Citation
If you use this model in your work, please cite it as:
```bibtex
@misc{Qwen3-14B-AI-Expert-250925,
  author    = {real-jiakai},
  title     = {Qwen3-14B-AI-Expert-250925},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/GXMZU/Qwen3-14B-ai-expert-250925}
}

@misc{qwen3technicalreport,
  title         = {Qwen3 Technical Report},
  author        = {Qwen Team},
  year          = {2025},
  eprint        = {2505.09388},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2505.09388}
}

@inproceedings{zheng2024llamafactory,
  title     = {LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models},
  author    = {Yaowei Zheng and Richong Zhang and Junhao Zhang and Yanhan Ye and Zheyan Luo and Zhangchi Feng and Yongqiang Ma},
  booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  address   = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  year      = {2024},
  url       = {http://arxiv.org/abs/2403.13372}
}
```
