---
license: cc-by-4.0
language:
- en
base_model:
- nvidia/OpenReasoning-Nemotron-7B
pipeline_tag: text-generation
library_name: transformers
tags:
- Coding
- Reasoning
- quantized
- nvidia
- qwen
---

# Quantized OpenReasoning-Nemotron-7B Models

This repository provides quantized GGUF versions of the OpenReasoning-Nemotron-7B model. These 4-bit and 5-bit quantized variants retain the original model's strengths in math, code, and scientific reasoning while reducing memory and compute requirements, making them well suited for efficient inference on resource-constrained devices.

## Model Overview

- **Original Model**: OpenReasoning-Nemotron-7B
- **Quantized Versions**:
  - Q4_K_M (4-bit quantization)
  - Q5_K_M (5-bit quantization)
- **Architecture**: Decoder-only transformer
- **Base Model**: Qwen2.5-7B-Instruct
- **Modalities**: Text only
- **Developer**: NVIDIA (base model by Qwen)
- **License**: GOVERNING TERMS: Use of the original model and the quantized models listed above is governed by the [Creative Commons Attribution 4.0 International License (CC-BY-4.0)](https://creativecommons.org/licenses/by/4.0/legalcode.en). ADDITIONAL INFORMATION: [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE)
- **Language**: English

## Quantization Details

### Q4_K_M Version

- ~70% size reduction
- Lower memory footprint (~4.36 GB)
- Best suited for deployment on edge devices or low-resource GPUs
- Slight performance degradation in complex reasoning scenarios

### Q5_K_M Version

- ~66% size reduction
- Higher fidelity (~5.07 GB)
- Better performance retention; recommended when quality is a priority

## Key Features

- Expert-level reasoning capabilities across math, code, and scientific domains
- Text-only instruction-following model optimized for multi-turn scientific question answering
- Derived from Qwen2.5-7B-Instruct and further post-trained by NVIDIA on OpenReasoning datasets
- Supports long-context inference with generation lengths of up to 64K tokens

### Usage

This model is intended for developers and researchers who work on competitive math, code, and science problems. It was trained via supervised fine-tuning only to achieve strong benchmark scores.

**llama.cpp (text-only)**

```sh
./llama-cli -hf SandLogicTechnologies/OpenReasoning-Nemotron-7B-GGUF -p "What is the Laplace transform?"
```

## Model Data

### Dataset Overview

The original OpenReasoning-Nemotron-7B model is built on top of Qwen2.5-7B-Instruct and post-trained on OpenReasoning datasets by NVIDIA:

- **LLM Component**: Trained on diverse OpenReasoning datasets covering the domains above, including science reports, reasoning datasets, and mathematics and coding datasets.

## Recommended Use Cases

These quantized models are optimized for efficient inference while maintaining coding and mathematics capabilities. Suggested use cases include:

- **Scientific question answering**: scientific research and mathematics concepts, coding lessons, etc.
- **Chatbot and assistant prototypes**: build interactive reasoning chat systems with coding capabilities.
- **Research & fine-tuning**: serve as a lightweight base for further task-specific tuning in coding.
- **Low-resource deployment**: run reasoning models on CPUs, edge devices, and lightweight GPUs.

---

## Acknowledgments

These quantized models are based on the original work by **Qwen** and the **NVIDIA** development team.
Special thanks to:

- The [Nvidia](https://huggingface.co/nvidia) team for developing and releasing the [OpenReasoning-Nemotron-7B](https://huggingface.co/nvidia/OpenReasoning-Nemotron-7B) model.
- **Georgi Gerganov** and the entire [`llama.cpp`](https://github.com/ggerganov/llama.cpp) open-source community for enabling efficient model quantization and inference via the GGUF format.

---

## Contact

For any inquiries or support, please contact us at support@sandlogic.com or visit our [Website](https://www.sandlogic.com/).