Qwen3-14B-AI-Expert-250925 - A Fine-tuned Model for AI Core Technologies

🤗 Hugging Face   |   🤖 ModelScope

This model is a specialized expert on core Artificial Intelligence concepts, developed by performing instruction-based supervised fine-tuning (SFT) on the Qwen/Qwen3-14B model.

The fine-tuning was conducted using Low-Rank Adaptation (LoRA), a parameter-efficient technique, on a custom-built dataset. This process adapted the model to provide high-quality, detailed responses specifically within the domains of:

  • Large Language Models (LLMs)
  • Retrieval-Augmented Generation (RAG)
  • AI Agents

The model was fine-tuned with LLaMA-Factory; a sketch of a comparable LoRA configuration follows the details below.

  • Developed by: real-jiakai
  • License: apache-2.0
  • Finetuned from model: Qwen/Qwen3-14B
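The exact LoRA hyperparameters were not published. The following is a minimal sketch of what a comparable setup might look like with Hugging Face peft; the rank, alpha, dropout, and target modules are illustrative assumptions, not the actual training configuration.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B", torch_dtype="auto")

lora_config = LoraConfig(
    r=16,               # assumed rank of the low-rank update matrices
    lora_alpha=32,      # assumed scaling factor
    lora_dropout=0.05,  # assumed dropout on the adapter layers
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable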

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "GXMZU/Qwen3-14B-ai-expert-250925"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Prepare the model input
prompt = "What is the MCP protocol?"
messages = [
    {"role": "system", "content": "You are an AI expert assistant (Focus on LLM, RAG, and Agent Domain) to help with technical questions. You should provide clear, accurate, and helpful responses."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False  # Switches between thinking and non-thinking modes. Default is True.
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# Parse thinking content (if enabled)
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

if thinking_content:
    print("Thinking content:", thinking_content)
print("Response:", content)

Performance

The primary objective of this fine-tuning is to adapt the model to a specialized domain, enhancing its performance on specific tasks by injecting relevant knowledge and terminology while preserving its foundational generalist capabilities.

Before Fine-tuning vs After Fine-tuning

Qualitatively, the model produces noticeably more detailed and accurate answers on domain-specific questions about LLMs, RAG, and AI Agents.

Note: In the absence of a dedicated test dataset, we cannot yet quantify the model's performance before and after fine-tuning. We plan to construct such a dataset for a more rigorous evaluation.

NLG Evaluation

The following table shows the model's performance on standard benchmarks before and after fine-tuning:

| Benchmark | Metric          | Qwen/Qwen3-14B (Base) | Qwen3-14B-ai-expert-250925 (Fine-tuned) |
|-----------|-----------------|-----------------------|------------------------------------------|
| MMLU      | Average         | 35.29                 | 31.68                                    |
|           | STEM            | 35.49                 | 34.39                                    |
|           | Social Sciences | 38.64                 | 30.52                                    |
|           | Humanities      | 31.90                 | 29.08                                    |
|           | Other           | 36.86                 | 34.05                                    |
| CEval     | Average         | 33.21                 | 39.00                                    |
|           | STEM            | 34.42                 | 40.47                                    |
|           | Social Sciences | 32.36                 | 40.36                                    |
|           | Humanities      | 29.96                 | 33.85                                    |
|           | Other           | 34.64                 | 39.84                                    |
| CMMLU     | Average         | 32.35                 | 33.61                                    |
|           | STEM            | 31.61                 | 36.67                                    |
|           | Social Sciences | 34.15                 | 32.45                                    |
|           | Humanities      | 31.06                 | 32.62                                    |
|           | Other           | 31.86                 | 33.26                                    |

The fine-tuned model improves on the Chinese-language benchmarks (CEval and CMMLU) while showing a modest regression on MMLU, indicating that the specialization largely preserves the model's general capabilities.

Fine-tuning Procedure

Dataset

The model was fine-tuned on a custom, high-quality dataset of 9,735 items in the Alpaca format (an illustrative item is shown after the list below). The dataset was carefully curated to cover three core areas:

  • Large Language Models (LLM)
  • Retrieval-Augmented Generation (RAG)
  • AI Agents
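Each item follows the standard Alpaca schema of instruction, input, and output fields. The example below is hypothetical, written only to illustrate the format; it is not taken from the actual dataset.

# Hypothetical example of one Alpaca-format training item (not from the dataset).
alpaca_item = {
    "instruction": "Explain the difference between RAG and fine-tuning "
                   "for injecting domain knowledge.",
    "input": "",  # optional context; left empty for instruction-only items
    "output": "Retrieval-Augmented Generation (RAG) retrieves relevant documents "
              "at inference time and conditions the answer on them, while "
              "fine-tuning bakes the knowledge into the model's weights..."
}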

Training Loss

You can view the full training run on Weights & Biases.

Future Plans

  • Enhanced Evaluation Framework: Implement more flexible evaluation metrics, including LLM-as-a-Judge methodologies (prerequisite: building comprehensive test datasets for rigorous assessment).
  • Dataset Expansion: Enrich the instruction fine-tuning dataset with new data, maintaining the quality of the existing data and emphasizing both quality and quantity.
  • Data Quality Enhancement: Refine the existing instruction-tuning dataset by correcting and standardizing its phrasing and formatting.

Citation

If you use this model in your work, please cite it as:

@misc{Qwen3-14B-AI-Expert-250925,
  author={real-jiakai},
  title={Qwen3-14B-AI-Expert-250925},
  year={2025},
  url={https://huggingface.co/GXMZU/Qwen3-14B-ai-expert-250925},
  publisher={Hugging Face}
}

@misc{qwen3technicalreport,
  title={Qwen3 Technical Report}, 
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.09388}
}

@inproceedings{zheng2024llamafactory,
  title={LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models},
  author={Yaowei Zheng and Richong Zhang and Junhao Zhang and Yanhan Ye and Zheyan Luo and Zhangchi Feng and Yongqiang Ma},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  address={Bangkok, Thailand},
  publisher={Association for Computational Linguistics},
  year={2024},
  url={http://arxiv.org/abs/2403.13372}
}