---
license: apache-2.0
language:
  - en
tags:
  - conversational
  - instruction-following
  - chat
  - gguf
  - llama.cpp
  - ollama
  - local-llm
  - Neutrino
pipeline_tag: text-generation
datasets:
  - HuggingFaceFW/finepdfs
  - fka/awesome-chatgpt-prompts
  - roneneldan/TinyStories
  - wikimedia/wikipedia
  - codeparrot/github-code
  - HuggingFaceFW/finewiki
  - karpathy/fineweb-edu-100b-shuffle
metrics:
  - accuracy
  - bertscore
  - bleu
  - bleurt
  - brier_score
  - cer
library_name: adapter-transformers
base_model:
  - neuralcrew/neutrino-instruct
---

# 🧠 Neutrino-Instruct (7B)

![Neutrino](https://ollama.com/assets/fardeen0424/neutrino/742305f0-8c9e-4ae8-acff-a7c2b133d3d8)

Neutrino-Instruct is a **7B-parameter instruction-tuned LLM** developed by **Fardeen NB**. It is designed for **conversational AI**, **multi-step reasoning**, and **instruction-following** tasks, and is fine-tuned to maintain coherent, contextual dialogue across multiple turns.

## ✨ Model Details

- **Model Name:** Neutrino-Instruct
- **Developer:** Fardeen NB
- **License:** Apache-2.0
- **Language(s):** English
- **Format:** GGUF (optimized for `llama.cpp` and `Ollama`)
- **Base Model:** Neutrino
- **Version:** 2.0
- **Task:** Text Generation (chat, Q&A, instruction following)

## 🚀 Quick Start

### Run with [llama.cpp](https://github.com/ggerganov/llama.cpp)

```bash
# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# Run a single prompt
./main -m ./neutrino-instruct.gguf -p "Hello, who are you?"

# Run in interactive mode
./main -m ./neutrino-instruct.gguf -i -p "Let's chat."

# Control output length
./main -m ./neutrino-instruct.gguf -n 256 -p "Write a poem about stars."

# Change creativity (temperature)
./main -m ./neutrino-instruct.gguf --temp 0.7 -p "Explain quantum computing simply."

# Enable GPU acceleration (if compiled with CUDA/Metal)
./main -m ./neutrino-instruct.gguf --n-gpu-layers 50 -p "Summarize this article."
```

### Run with [Ollama](https://ollama.com/fardeen0424/neutrino)

```bash
ollama run fardeen0424/neutrino
```

A Modelfile customization sketch appears at the end of this card.

### Run in Python (`llama-cpp-python`)

```python
from llama_cpp import Llama

# Load the Neutrino-Instruct model
llm = Llama(model_path="./neutrino-instruct.gguf")

# Run inference
response = llm("Who are you?")
print(response["choices"][0]["text"])

# Stream output tokens as they are generated
for token in llm("Tell me a story about Neutrino:", stream=True):
    print(token["choices"][0]["text"], end="", flush=True)
```

A chat-style completion sketch appears at the end of this card.

## 📊 System Requirements

* **CPU-only:** 32–64 GB RAM recommended (runs on modern laptops, with slower inference).
* **GPU acceleration:**
  * 4 GB VRAM → 4-bit quantized (Q4) models
  * 8 GB VRAM → 5-bit/8-bit models
  * 12 GB+ VRAM → FP16 full precision

## 🧩 Potential Use Cases

* Conversational AI assistants
* Research prototypes
* Instruction-following agents
* Identity-aware chatbots

⚠️ **Out of Scope:** Use in critical decision-making, legal, or medical contexts.

## 🛠️ Development Notes

* Model uploaded in **GGUF format** for portability and performance.
* Compatible with **llama.cpp**, **Ollama**, and **llama-cpp-python**.
* Supports multiple quantization levels (Q4, Q5, Q8) for deployment on resource-constrained devices (see the quantization sketch at the end of this card).

## 📖 Citation

If you use Neutrino in your research or projects, please cite:

```bibtex
@misc{fardeennb2025neutrino,
  title        = {Neutrino-Instruct: A 7B Instruction-Tuned Conversational Model},
  author       = {Fardeen NB},
  year         = {2025},
  howpublished = {Hugging Face},
  url          = {https://huggingface.co/neuralcrew/neutrino-instruct}
}
```
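## 🔧 Appendix: Usage Sketches

The snippets below are illustrative sketches, not part of the official release; file names, prompts, and parameter values are assumptions you should adapt.

To customize the Ollama deployment (system prompt, sampling parameters), here is a minimal `Modelfile` sketch, assuming the published `fardeen0424/neutrino` tag as the base:

```
# Modelfile: illustrative values, not shipped with the model
FROM fardeen0424/neutrino

# Moderate temperature for balanced, mostly deterministic answers
PARAMETER temperature 0.7

# Hypothetical system prompt; replace with your own
SYSTEM "You are Neutrino, a concise and helpful assistant."
```

Build and run it with `ollama create my-neutrino -f Modelfile`, then `ollama run my-neutrino`.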
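For multi-turn dialogue, `llama-cpp-python` also exposes an OpenAI-style chat API. A minimal sketch (the context size, system prompt, and sampling values are assumptions):

```python
from llama_cpp import Llama

# Load with a larger context window for multi-turn dialogue
llm = Llama(model_path="./neutrino-instruct.gguf", n_ctx=4096)

messages = [
    {"role": "system", "content": "You are Neutrino, a concise assistant."},  # hypothetical prompt
    {"role": "user", "content": "Explain GGUF in one paragraph."},
]

# create_chat_completion returns an OpenAI-style response dict
response = llm.create_chat_completion(messages=messages, temperature=0.7)
print(response["choices"][0]["message"]["content"])
```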
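The Development Notes mention Q4/Q5/Q8 variants. Assuming you have an FP16 GGUF export and a llama.cpp build, the bundled quantize tool can produce them. A sketch (file names are illustrative, and the binary is named `llama-quantize` in newer llama.cpp releases):

```bash
# Produce smaller quantized variants from an FP16 GGUF (illustrative file names)
./quantize ./neutrino-instruct-f16.gguf ./neutrino-instruct-Q4_K_M.gguf Q4_K_M
./quantize ./neutrino-instruct-f16.gguf ./neutrino-instruct-Q5_K_M.gguf Q5_K_M
./quantize ./neutrino-instruct-f16.gguf ./neutrino-instruct-Q8_0.gguf Q8_0
```

Smaller quantizations trade some accuracy for lower memory use, matching the VRAM tiers listed under System Requirements.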