---
license: mit
language:
- en
base_model:
- mistralai/Mistral-7B-v0.1
- meta-llama/Llama-2-7b-hf
library_name: transformers
tags:
- mergekit
- merged-model
- mistral
- llama2
- language-model
---

# 🧬 Mistral-LLaMA-Fusion: A Hybrid of Open Weight Titans

## πŸ“Œ Overview

**Mistral-LLaMA-Fusion** is an **experimental merged language model** that combines **Mistral-7B-v0.1** and **LLaMA-2-7B** using the **Linear Merge** method via [MergeKit](https://github.com/cg123/mergekit). The hybrid aims to pair Mistral's efficient architecture with LLaMA-2's robustness in reasoning and instruction following.

πŸ”— **Created by**: Matteo Khan
πŸŽ“ **Affiliation**: Apprentice at TW3 Partners (Generative AI Research)
πŸ“ **License**: MIT

πŸ”— [Connect on LinkedIn](https://www.linkedin.com/in/matteo-khan-a10309263/)
πŸ”— [Model on Hugging Face](https://huggingface.co/MatteoKhan/Mistral-LLaMA-Fusion)

## 🧠 Model Details

- **Model Type**: Merged Language Model
- **Parent Models**:
  - [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
  - [LLaMA-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf)
- **Merging Method**: Linear Merge (via MergeKit)

## 🎯 Intended Use

This model is intended for research into model merging and hybridization. It can be used for:

- βœ… Text Generation
- βœ… Instruction Following
- βœ… Creative Writing
- βœ… Prompt Engineering Experiments

## ⚠️ Limitations

As with all merged models, this fusion may inherit and combine weaknesses from both parents:

- ❌ Possible generation of false, biased, or inappropriate content
- ⚠️ Unpredictable behavior in edge cases
- πŸ“‰ No guaranteed performance gains over either parent across benchmarks

## πŸ”¬ Merging Configuration

```yaml
merge_method: linear
dtype: float16
models:
  - model: mistralai/Mistral-7B-v0.1
    parameters:
      t: 1.0
      weight: 0.6
  - model: meta-llama/Llama-2-7b-hf
    parameters:
      t: 1.0
      weight: 0.4
parameters:
  normalize: true
  int8_mask: false
layers:
  - pattern: "model.*"
```

πŸ“Œ **Note**: No additional fine-tuning was performed; this is a straight merge using MergeKit. A sketch for reproducing the merge is included at the end of this card.

## 🌱 Why Merging?

Merging enables rapid experimentation with existing checkpoints while reducing the computational cost and carbon footprint compared to training a new model from scratch.

## πŸš€ How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MatteoKhan/Mistral-LLaMA-Fusion"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

prompt = "Explain the benefits of merging language models."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
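
## πŸ§ͺ Reproducing the Merge (Sketch)

The snippet below is a minimal, non-authoritative sketch of how the linear merge above could be re-run with MergeKit's Python API. It assumes the configuration is saved as `merge_config.yaml` (a hypothetical filename) and that `mergekit.config.MergeConfiguration`, `mergekit.merge.run_merge`, and `mergekit.merge.MergeOptions` are available in your installed MergeKit version; exact entry points may differ between releases. MergeKit also provides a `mergekit-yaml` command-line tool that accepts the same configuration file.

```python
# Sketch only: re-running the linear merge with MergeKit (API details may vary by version).
# Assumes the YAML configuration from this card is saved as merge_config.yaml and that
# both parent checkpoints are accessible (LLaMA-2 requires accepting Meta's license).
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

# Load and validate the merge configuration.
with open("merge_config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

# Execute the merge and write the result to an output directory.
run_merge(
    merge_config,
    "./mistral-llama-fusion",  # output directory for the merged weights
    options=MergeOptions(
        cuda=True,             # run the merge on GPU if one is available
        copy_tokenizer=True,   # copy a tokenizer into the output directory
    ),
)
```

The merged weights in `./mistral-llama-fusion` can then be loaded with `AutoModelForCausalLM.from_pretrained`, exactly as in the usage example above.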