---
license: mit
---

# KnowRL

**Exploring Knowledgeable Reinforcement Learning for Factuality**

📄 [arXiv](https://arxiv.org/abs/2506.19807) • 💻 [GitHub Repo](https://github.com/zjunlp/KnowRL) • 📖 [Dataset](https://huggingface.co/datasets/zjunlp/KnowRL-Train-Data)

---

## Model Description

**KnowRL-DeepSeek-R1-Distill-Qwen-7B** is a slow-thinking language model obtained by applying our **KnowRL** framework to the base model `DeepSeek-R1-Distill-Qwen-7B`.

The **KnowRL (Knowledgeable Reinforcement Learning)** framework is designed to mitigate hallucinations in Large Language Models (LLMs) by integrating external knowledge directly into the training process. During reinforcement learning, a reward signal explicitly encourages factual accuracy in the model's reasoning, helping it learn its own knowledge boundaries. As a result, this model shows a significant reduction in hallucinations on factual benchmarks while preserving, and in some cases enhancing, the strong reasoning capabilities inherited from its base model.
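
The precise reward design is specified in the paper. Purely as an illustration of the general shape of a knowledge-grounded reward, the sketch below scores a response's claims against a reference fact source; the `knowledge_reward` function, the claim-matching step, and the weighting are hypothetical, not KnowRL's actual implementation.

```python
def knowledge_reward(claims, reference_facts, format_ok):
    """Illustrative only: fraction of extracted claims supported by a
    reference knowledge source, plus a small bonus for well-formed output."""
    if not claims:
        return 0.0
    supported = sum(claim in reference_facts for claim in claims)
    return supported / len(claims) + (0.1 if format_ok else 0.0)

# Hypothetical usage: two of the three extracted claims are supported.
facts = {"mitochondria produce ATP", "ATP powers cellular processes"}
claims = [
    "mitochondria produce ATP",
    "ATP powers cellular processes",
    "mitochondria only store DNA",
]
print(knowledge_reward(claims, facts, format_ok=True))  # ~0.767
```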

## How to Use

### Using the `transformers` Library

You can use this model with the `transformers` library for text generation. For best results, follow the model's prompt format, which wraps the reasoning process in `<think>` and `</think>` tags.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model_name = "zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

# Build the prompt with the model's chat template
prompt = "What is the main function of the mitochondria?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate a response
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode and print the output
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
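
As a slow-thinking model, it emits a reasoning trace before its final answer. A minimal sketch for separating the two, assuming the decoded text wraps the reasoning in `<think>...</think>` (the exact markers can vary with the tokenizer version):

```python
# Split a decoded response into its reasoning trace and final answer.
# Assumes a `</think>` marker is present; falls back to the raw text otherwise.
def split_reasoning(text: str):
    reasoning, sep, answer = text.partition("</think>")
    if not sep:  # no reasoning span found
        return None, text.strip()
    return reasoning.split("<think>")[-1].strip(), answer.strip()

reasoning, answer = split_reasoning(response)
print("Answer:", answer)
```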

### Using `huggingface-cli`

You can also download the model from the command line with `huggingface-cli`:

```bash
huggingface-cli download zjunlp/KnowRL-DeepSeek-R1-Distill-Qwen-7B --local-dir KnowRL-DeepSeek-R1-Distill-Qwen-7B
```

## Training Details

The model is trained with Knowledgeable Reinforcement Learning (specifically GRPO) on data from `zjunlp/KnowRL-Train-Data`. For complete details on the training configuration and hyperparameters, please refer to our [GitHub repository](https://github.com/zjunlp/KnowRL).
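
For intuition only, here is a minimal sketch of the group-relative advantage computation at the core of GRPO: several responses are sampled per prompt, and each response's reward is normalized against its group's statistics, so no separate value network is needed. The rewards shown are hypothetical placeholders, not KnowRL's actual scores.

```python
import statistics

def grpo_advantages(rewards):
    """Normalize each sampled response's reward against its group's
    mean and standard deviation (the group-relative baseline)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Hypothetical rewards for four responses sampled from one prompt.
print(grpo_advantages([1.0, 0.2, 0.9, 0.0]))
```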

---

## Citation

If you find this model useful in your research, please consider citing our paper:

```bibtex
@article{ren2025knowrl,
  title={KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality},
  author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2506.19807},
  year={2025}
}
```