---
library_name: transformers
license: llama3.1
datasets:
- sheryc/hotpotqa_care
- sheryc/DROP_care
- sheryc/ms_marco_care
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: text-generation
---

# Llama-3.1-8B-Instruct-CARE
     
## Model Card for Llama-3.1-8B-Instruct-CARE

### Model Description

Llama-3.1-8B-Instruct-CARE is an 8B-parameter instruction-tuned language model based on meta-llama/Meta-Llama-3.1-8B-Instruct, enhanced with native retrieval-augmented reasoning capabilities through the CARE (Context-Aware Retrieval-Enhanced reasoning) framework. The model is trained to improve context fidelity and reduce hallucinations by learning to explicitly integrate in-context evidence into its reasoning process.

**Key Features:**

- **Native retrieval-augmented reasoning**: Dynamically identifies and incorporates relevant evidence from the input context
- **Improved context fidelity**: Significantly better adherence to the provided context, especially when it contradicts parametric knowledge
- **Enhanced multi-hop reasoning**: Superior performance on complex reasoning tasks requiring evidence integration
- **Structured reasoning output**: Generates reasoning chains with explicit evidence citations wrapped in dedicated tags

### Model Details

- **Model Type**: Causal Language Model (Enhanced with Retrieval-Augmented Reasoning)
- **Base Model**: meta-llama/Meta-Llama-3.1-8B-Instruct
- **Parameters**: 8B total
- **Architecture**: Transformer with optimized architecture (GQA, RoPE)
- **Context Length**: 128,000 tokens
- **Training Framework**: Two-phase training (SFT + Reinforcement Learning with GRPO)

### Training Process

The model was trained using a novel two-phase approach:

**Phase 1 - Supervised Fine-Tuning (SFT):**

- Dataset: 7,739 instances from HotpotQA with retrieval-augmented reasoning chains
- Purpose: Establish evidence integration patterns and reasoning format
- Training: 3 epochs with LoRA (r=8, α=16), AdamW optimizer

**Phase 2 - Reinforcement Learning:**

- Method: Group Relative Policy Optimization (GRPO)
- Curriculum Learning: Gradual transition from DROP (easy) to MS MARCO (hard)
- Rewards: Accuracy + Format + Retrieval consistency
- Training: 350 steps with multi-aspect reward optimization

### System Prompt

The model uses an enhanced system prompt that enables structured reasoning with evidence retrieval:

```
You are a helpful assistant. You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within tags. WITHIN the thinking process, make reference to the relevant texts in the prompt that provide critical information to move the reasoning process forward. The referenced texts MUST BE enclosed within tags, and MUST BE placed within the reasoning process only. The final answer MUST BE put at the end of the response after "Answer:".
```

**Note**: This system prompt is automatically applied when using the default chat template.

### Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "sheryc/Llama-3.1-8B-Instruct-CARE"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example usage
context = """John went to the movies with his mom last week. They watched the latest superhero movie, which was quite popular. The ticket price was $15. According to the local cinema's website, ticket prices range from $10 to $12 for regular screenings and from $13 to $16 for special releases."""

question = "Was the ticket price John's mom paid for the movie reasonable?"

messages = [
    {"role": "user", "content": f"{question}\n\nContext:{context}"}
]

# The default chat template prepends the CARE system prompt automatically
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

generated_ids = model.generate(tokenized_chat.to(model.device), max_new_tokens=512)
output_text = tokenizer.decode(generated_ids[0])
```

**Expected Output Format:**

```
The context states John watched the latest superhero movie. The ticket price was $15. The context provides price ranges: ticket prices range from $10 to $12 for regular screenings and from $13 to $16 for special releases. Since this was a popular latest superhero movie, it likely qualifies as a special release. Therefore, the $15 price falls within the $13-$16 range for special releases. Answer: Yes, the ticket price was reasonable.
```
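Because the system prompt places the final answer after the literal marker `Answer:`, the structured output can be separated from the reasoning with plain string handling. The snippet below is a minimal, unofficial sketch that reuses `tokenized_chat`, `generated_ids`, and `tokenizer` from the usage example above; the variable names and the fallback branch are illustrative assumptions, not part of the model's API.

```python
# Illustrative post-processing sketch (not an official CARE utility).
# Decode only the newly generated tokens, then split off the final answer
# that the system prompt places after the literal marker "Answer:".
new_tokens = generated_ids[0, tokenized_chat.shape[-1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)

if "Answer:" in response:
    reasoning, final_answer = response.rsplit("Answer:", 1)
else:
    # Fallback in case the model does not follow the expected format.
    reasoning, final_answer = "", response

print("Reasoning:", reasoning.strip())
print("Final answer:", final_answer.strip())
```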
### Training Data

- **SFT Phase**: HotpotQA with labeled supporting facts (7,739 instances)
- **RL Phase**:
  - DROP dataset (77,409 training instances): easy curriculum phase
  - MS MARCO: hard curriculum phase
- **Evaluation**: LongBench, CofCA, and other QA benchmarks

### License

This model is licensed under the Llama 3.1 Community License. Please refer to the original Llama 3.1 license terms.

### Citation

```bibtex
@inproceedings{wang2025care,
  title={Improving Context Fidelity via Native Retrieval-Augmented Reasoning},
  author={Wang, Suyuchen and Wang, Jinlin and Wang, Xinyu and Li, Shiqi and Tang, Xiangru and Hong, Sirui and Chang, Xiao-Wen and Wu, Chenglin and Liu, Bang},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  year={2025}
}
```

Please also cite the original [Llama 3 series paper](https://arxiv.org/abs/2407.21783).

### Contact

For questions about the model or to report issues, please visit the [CARE project homepage](https://foundationagents.github.io/CARE/) or contact the authors.