---
language:
- en
- zh
license_link: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct/blob/main/LICENSE
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-Next-80B-A3B-Instruct
---
# Qwen3-Next-80B-A3B-Instruct-NVFP4

**Quantized version of [Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct)** using **LLM Compressor** and the **NVFP4** (E2M1 + E4M3) format.

**This time it actually works!** *We think.*

This should be the start of a new series of *hopefully optimal* NVFP4 quantizations as capable cards become more common in the wild.

---

## Model Summary

| Property | Value |
|-----------|--------|
| Base model | Qwen/Qwen3-Next-80B-A3B-Instruct |
| Quantization | NVFP4 (FP4 microscaling, block = 16, scale = E4M3) |
| Method | Post-Training Quantization (PTQ) |
| Toolchain | LLM Compressor |
| Hardware target | NVIDIA Blackwell (untested on RTX cards) / GB200 Tensor Cores |
| Precision | Weights & activations = FP4 • Scales = FP8 (E4M3) |
| Maintainer | **RESMP.DEV** |

---
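To make the table above concrete, here is a minimal, illustrative sketch of NVFP4-style block quantization: values are grouped into blocks of 16, each block shares one scale, and each element is rounded to the nearest FP4 (E2M1) representable magnitude. This is *not* the LLM Compressor implementation; in the real format the per-block scale is itself stored in FP8 (E4M3), which this sketch keeps in full precision for clarity.

```python
# Illustrative NVFP4-style block quantization (not the production kernel).
# E2M1 (1 sign, 2 exponent, 1 mantissa bit) can represent these magnitudes:
E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 microscaling block size


def quantize_block(block):
    """Quantize one block of floats to scaled E2M1 values."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # 6.0 = max E2M1 magnitude
    out = []
    for x in block:
        mag = abs(x) / scale
        # Round-to-nearest over the representable magnitudes.
        q = min(E2M1_LEVELS, key=lambda v: abs(v - mag))
        out.append(q * scale if x >= 0 else -q * scale)
    return out


def quantize_nvfp4(values):
    """Apply per-block quantization over a flat list of floats."""
    result = []
    for i in range(0, len(values), BLOCK_SIZE):
        result.extend(quantize_block(values[i:i + BLOCK_SIZE]))
    return result
```

Because the scale is chosen so the block's largest magnitude maps exactly to 6.0, the worst-case rounding error per element is half the widest level gap (between 4.0 and 6.0) times the scale.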

## Description

This model is a drop-in replacement for Qwen/Qwen3-Next-80B-A3B-Instruct that runs in **NVFP4 precision**.
Accuracy remains within ≈1% of the FP8 baseline on standard reasoning and coding benchmarks.
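As a usage sketch, the checkpoint should load like any other compressed-tensors model in vLLM on supported hardware. The repository id below is an assumed placeholder based on this card's title and maintainer; substitute the actual repo id. (Not runnable without a Blackwell-class GPU and the model weights.)

```python
# Hypothetical usage sketch: serving this NVFP4 checkpoint with vLLM.
# Assumes a Blackwell-class GPU and that the repo id below is correct.
from vllm import LLM, SamplingParams

llm = LLM(model="RESMP-DEV/Qwen3-Next-80B-A3B-Instruct-NVFP4")  # placeholder repo id
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain microscaling FP4 in one paragraph."], params)
print(outputs[0].outputs[0].text)
```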