arxiv:2509.16596

Analyzing the Effects of Supervised Fine-Tuning on Model Knowledge from Token and Parameter Levels

Published on Sep 20

· Submitted by

Yuming Yang on Sep 23

Upvote

Authors:

Junjie Ye ,

Yuming Yang ,

Abstract

Supervised fine-tuning of large language models can negatively impact closed-book question answering performance, with up to 90% of parameter updates not contributing to knowledge enhancement.

AI-generated summary

Large language models (LLMs) acquire substantial world knowledge during pre-training, which is further shaped by post-training techniques such as supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge remains underexplored, limiting our ability to control knowledge change behavior in fine-tuned models. To address this gap, we evaluate closed-book question answering (CBQA) performance across five LLMs from the LLaMA-2 and LLaMA-3 families. Surprisingly, models fine-tuned on 1,920 samples perform up to 14% worse than those fine-tuned on only 240 samples. Furthermore, varying the level of knowledge mastery in the fine-tuning data leads to performance fluctuations of over 12%. To investigate these effects, we analyze model behavior at both the token and parameter levels. Our analysis reveals that up to 90% of parameter updates during SFT do not contribute to knowledge enhancement. Restoring these updates can improve performance on the CBQA task, depending on the characteristics of the fine-tuning data. These insights offer practical guidance for developing fine-tuning strategies that more effectively strengthen model knowledge.

View arXiv page View PDF Add to collection