Update README.md
README.md
@@ -13,13 +13,13 @@ language:
 - th
 pipeline_tag: text-generation
 license: apache-2.0
-base_model: Qwen/Qwen3-
+base_model: Qwen/Qwen3-8B
 ---
 
 # Qwen3-32B-NVFP4A16
 
 ## Model Overview
-- **Model Architecture:** Qwen/Qwen3-
+- **Model Architecture:** Qwen/Qwen3-8B
 - **Input:** Text
 - **Output:** Text
 - **Model Optimizations:**
@@ -28,14 +28,14 @@ base_model: Qwen/Qwen3-32B
 - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
 - **Release Date:** 6/25/2025
 - **Version:** 10
-- **Model Developers:** RedHatAI
+- **Model Developers:** ELVISIO (Thanks to RedHatAI)
 
-This model is a quantized version of [Qwen/Qwen3-
+This model is a quantized version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).
 It was evaluated on several tasks to assess its quality in comparison to the unquantized model.
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [Qwen/Qwen3-
+This model was obtained by quantizing the weights of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) to the FP4 data type, ready for inference with vLLM>=0.9.1.
 This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
 
 Only the weights of the linear operators within transformer blocks are quantized using [LLM Compressor](https://github.com/vllm-project/llm-compressor).
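Going from 16 to 4 bits per parameter divides weight storage by four, which is a reduction of about 75%. The minimal Python sketch below works through that arithmetic. The 8e9 parameter count is a nominal assumption, not a measured figure for Qwen3-8B, and the optional 0.5 extra bit per weight assumes one 8-bit scale stored per block of 16 FP4 values. Because only the linear layers inside transformer blocks are quantized, everything else stays at 16 bits, so the real checkpoint saving should be somewhat smaller than these figures.

```python
# Rough weight-storage arithmetic for 16-bit vs 4-bit parameters.
# Assumption: a nominal 8e9 parameters, all treated as quantized
# (an upper bound on the saving).


def weight_bytes(num_params: float, bits_per_param: float) -> float:
    """Bytes needed to store num_params values at bits_per_param bits each."""
    return num_params * bits_per_param / 8


def reduction(bits_before: float, bits_after: float) -> float:
    """Fractional size reduction when going from bits_before to bits_after."""
    return 1 - bits_after / bits_before


params = 8e9
bf16_gb = weight_bytes(params, 16) / 1e9  # 16.0 GB
fp4_gb = weight_bytes(params, 4) / 1e9    # 4.0 GB

# Assumption: one 8-bit scale per block of 16 FP4 weights adds 0.5 bit/weight.
fp4_scaled_bits = 4 + 8 / 16              # 4.5 bits per weight

print(f"16-bit: {bf16_gb:.1f} GB, 4-bit: {fp4_gb:.1f} GB")
print(f"reduction (pure 4-bit): {reduction(16, 4):.0%}")
print(f"reduction (with block scales): {reduction(16, fp4_scaled_bits):.1%}")
```

Under these assumptions the pure 4-bit case prints a 75% reduction, and accounting for per-block scales lowers it to about 71.9%.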
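The FP4 format behind the NVFP4 name can be illustrated with a simplified fake-quantization routine. Each block of weights is scaled so its largest magnitude maps to 6.0, the largest FP4 (E2M1) value, and every element is then rounded to the nearest E2M1 value. This is a sketch under stated assumptions (block size 16, full-precision block scales, ties rounded toward the smaller magnitude), not LLM Compressor's implementation; real NVFP4 additionally stores block scales in FP8 and applies a per-tensor scale, both omitted here.

```python
# Simplified block-scaled FP4 (E2M1) fake quantization, in the spirit of NVFP4.
# Assumptions: block size 16; one scale per block kept in full precision
# (real NVFP4 stores it as FP8 and adds a per-tensor scale, omitted here).

E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0
BLOCK_SIZE = 16


def quantize_block(block):
    """Return (scale, codes) where each code is a signed E2M1 value."""
    amax = max(abs(x) for x in block)
    scale = amax / FP4_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        scaled = abs(x) / scale
        # Nearest representable magnitude; min() keeps the first match,
        # so exact ties go to the smaller magnitude.
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - scaled))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Map FP4 codes back to real values using the block scale."""
    return [c * scale for c in codes]


def fake_quantize(weights):
    """Quantize and dequantize a flat list of weights, block by block."""
    out = []
    for start in range(0, len(weights), BLOCK_SIZE):
        scale, codes = quantize_block(weights[start:start + BLOCK_SIZE])
        out.extend(dequantize_block(scale, codes))
    return out


if __name__ == "__main__":
    print(fake_quantize([0.12, -0.6, 0.3, 0.0]))
```

In the usage example the block scale is 0.1, so 0.12 is rounded to the E2M1 value 1.0 and comes back as roughly 0.1. That rounding error is what evaluations against the unquantized model are meant to capture. In an A16 scheme only weights go through a step like this; activations stay in 16-bit.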
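The rule that only the linear operators inside transformer blocks are quantized amounts to a filter over module names and types. The sketch below assumes Hugging Face-style module names for one Qwen3 decoder layer (`model.layers.N...`, `model.embed_tokens`, `lm_head`). These names and the `model.layers.` prefix are hypothetical illustrations, not values read from this checkpoint, and the list is only a subset of a real layer.

```python
# Hypothetical (name, type) pairs modeled on Hugging Face-style Qwen3 naming;
# not read from the actual checkpoint, and only a subset of one layer.
MODULES = [
    ("model.embed_tokens", "Embedding"),
    ("model.layers.0.input_layernorm", "RMSNorm"),
    ("model.layers.0.self_attn.q_proj", "Linear"),
    ("model.layers.0.self_attn.k_proj", "Linear"),
    ("model.layers.0.self_attn.v_proj", "Linear"),
    ("model.layers.0.self_attn.o_proj", "Linear"),
    ("model.layers.0.mlp.gate_proj", "Linear"),
    ("model.layers.0.mlp.up_proj", "Linear"),
    ("model.layers.0.mlp.down_proj", "Linear"),
    ("model.norm", "RMSNorm"),
    ("lm_head", "Linear"),
]

# Assumed prefix under which the transformer blocks live.
BLOCK_PREFIX = "model.layers."


def is_quantized(name: str, module_type: str) -> bool:
    """True only for linear layers that sit inside a transformer block."""
    return module_type == "Linear" and name.startswith(BLOCK_PREFIX)


def select_quantized(modules):
    """Names of the modules whose weights would be quantized."""
    return [name for name, kind in modules if is_quantized(name, kind)]


if __name__ == "__main__":
    for name in select_quantized(MODULES):
        print(name)
```

`lm_head` is a linear layer but sits outside the blocks, so it is excluded. The embedding and normalization layers are excluded because they are not linear.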