---
license: cc-by-4.0
language:
- en
base_model:
- nvidia/OpenReasoning-Nemotron-7B
pipeline_tag: text-generation
library_name: transformers
tags:
- Coding
- Reasoning
- quantized
- nvidia
- qwen
---

# Quantized OpenReasoning-Nemotron-7B Models

This repository provides quantized GGUF versions of the OpenReasoning-Nemotron-7B model. These 4-bit and 5-bit quantized variants retain the original model's strengths in math, code, and scientific reasoning while reducing memory and compute requirements, making them well suited for efficient inference on resource-constrained devices.

## Model Overview

- **Original Model**: OpenReasoning-Nemotron-7B
- **Quantized Versions**:
  - Q4_K_M (4-bit quantization)
  - Q5_K_M (5-bit quantization)
- **Architecture**: Decoder-only transformer
- **Base Model**: Qwen2.5-7B-Instruct
- **Modalities**: Text only
- **Developer**: NVIDIA (base model by Qwen)
- **License**: GOVERNING TERMS: Use of the original model and the quantized models listed above is governed by the [Creative Commons Attribution 4.0 International License (CC-BY-4.0)](https://creativecommons.org/licenses/by/4.0/legalcode.en). ADDITIONAL INFORMATION: [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct/blob/main/LICENSE)
- **Language**: English

## Quantization Details

### Q4_K_M Version

- ~70% size reduction
- Lower memory footprint (~4.36 GB)
- Best suited for deployment on edge devices or low-resource GPUs
- Slight performance degradation in complex reasoning scenarios

### Q5_K_M Version

- ~66% size reduction
- Higher fidelity (~5.07 GB)
- Better performance retention; recommended when quality is a priority

## Key Features

- Expert-level reasoning capabilities across math, code, and scientific domains
- Text-only instruction-following model optimized for multi-turn scientific question answering
- Derived from Qwen2.5-7B-Instruct and further post-trained by NVIDIA on OpenReasoning datasets
- Supports long-context inference with generation lengths of up to 64K tokens

### Usage

This model is intended for developers and researchers who work on competitive math, code, and science problems. It was trained via supervised fine-tuning only to achieve strong benchmark scores.

**llama.cpp (text-only)**

```sh
./llama-cli -hf SandLogicTechnologies/OpenReasoning-Nemotron-7B-GGUF -p "What is the Laplace transform?"
```

## Model Data

### Dataset Overview

The original OpenReasoning-Nemotron-7B model is built on top of Qwen2.5-7B-Instruct and post-trained on OpenReasoning datasets by NVIDIA:

- **LLM Component**: Trained on diverse OpenReasoning datasets covering the domains above, including science reports, reasoning datasets, and mathematics and coding datasets.

## Recommended Use Cases

These quantized models are optimized for efficient inference while maintaining coding and mathematics capabilities. Suggested use cases include:

- **Scientific question answering**: scientific research and mathematics concepts, coding lessons, etc.
- **Chatbot and assistant prototypes**: build interactive reasoning chat systems with coding capabilities.
- **Research & fine-tuning**: serve as a lightweight base for further task-specific tuning in coding.
- **Low-resource deployment**: run reasoning models on CPUs, edge devices, and lightweight GPUs.

---

## Acknowledgments

These quantized models are based on the original work by **Qwen** and the **NVIDIA** development team.
Special thanks to:

- The [Nvidia](https://huggingface.co/nvidia) team for developing and releasing the [OpenReasoning-Nemotron-7B](https://huggingface.co/nvidia/OpenReasoning-Nemotron-7B) model.
- **Georgi Gerganov** and the entire [`llama.cpp`](https://github.com/ggerganov/llama.cpp) open-source community for enabling efficient model quantization and inference via the GGUF format.

---

## Contact

For any inquiries or support, please contact us at support@sandlogic.com or visit our [Website](https://www.sandlogic.com/).