---
language:
- en
- zh
license_link: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct/blob/main/LICENSE
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-Next-80B-A3B-Instruct
---
|
|
# Qwen3-Next-80B-A3B-Instruct-NVFP4 |
|
|
|
|
|
**Quantized version of [Qwen/Qwen3-Next-80B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct)** using **LLM Compressor** and the **NVFP4** (E2M1 + E4M3) format. |
|
|
|
|
|
**This time it actually works!** *We think.*
|
|
|
|
|
This should be the start of a new series of *hopefully optimal* NVFP4 quantizations as NVFP4-capable cards become more common in the wild.
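For context, a post-training quantization run with LLM Compressor along these lines looks roughly like the sketch below. The `NVFP4` scheme name, the `oneshot` signature, and the output directory are assumptions based on the LLM Compressor API; verify against the version you have installed.

```python
# Sketch of an NVFP4 PTQ run with LLM Compressor (assumed API; verify
# against your installed llmcompressor version before running).
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Quantize all Linear layers to NVFP4 (E2M1 weights/activations,
# E4M3 per-block scales), keeping the output head in higher precision.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="NVFP4",
    ignore=["lm_head"],
)

oneshot(
    model="Qwen/Qwen3-Next-80B-A3B-Instruct",
    recipe=recipe,
    output_dir="Qwen3-Next-80B-A3B-Instruct-NVFP4",
)
```

Running this requires enough GPU memory to hold the 80B base model, so it is shown here for orientation only.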
|
|
|
|
|
--- |
|
|
|
|
|
## Model Summary |
|
|
|
|
|
| Property | Value | |
|
|
|-----------|--------| |
|
|
| Base model | Qwen/Qwen3-Next-80B-A3B-Instruct | |
|
|
| Quantization | NVFP4 (FP4 microscaling, block = 16, scale = E4M3) | |
|
|
| Method | Post-Training Quantization with LLM Compressor | |
|
|
| Toolchain | LLM Compressor | |
|
|
| Hardware target | NVIDIA Blackwell / GB200 Tensor Cores (untested on consumer RTX cards) |
|
|
| Precision | Weights & activations = FP4 • Scales = FP8 (E4M3) | |
|
|
| Maintainer | **RESMP.DEV** | |
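The block-16 / E4M3-scale scheme in the table can be illustrated with a toy fake-quantizer. This is a simplification for intuition, not the actual Blackwell kernel: each block of 16 values shares one scale, and each value is rounded to the nearest magnitude representable in FP4 (E2M1).

```python
# Toy illustration of NVFP4 microscaling (not the real kernel, just
# the arithmetic): one shared scale per block of 16 values, each value
# rounded to the nearest FP4 (E2M1) representable magnitude.

# The 8 non-negative magnitudes representable in E2M1.
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Fake-quantize one block of 16 floats with a shared scale."""
    assert len(block) == 16
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the largest value to FP4 max (6.0)
    out = []
    for x in block:
        mag = min(E2M1, key=lambda v: abs(abs(x) / scale - v))
        out.append(mag * scale if x >= 0 else -mag * scale)
    return out, scale
```

In the real format, the per-block scale itself is stored in FP8 (E4M3); that extra rounding step is omitted here for brevity.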
|
|
|
|
|
--- |
|
|
|
|
|
## Description |
|
|
|
|
|
This model is a drop-in replacement for Qwen/Qwen3-Next-80B-A3B-Instruct that runs in **NVFP4 precision**.
Accuracy remains within ≈1% of the FP8 baseline on standard reasoning and coding benchmarks.
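A minimal way to serve the checkpoint is with vLLM, which auto-detects the compressed-tensors quantization config from the checkpoint. Assumptions: a Blackwell-class GPU, a recent vLLM build with NVFP4 support, and the repo id shown below (verify the exact id on the Hub before use).

```shell
# Serve the NVFP4 checkpoint with vLLM (repo id assumed; requires a
# Blackwell-class GPU and a vLLM build with NVFP4 support).
vllm serve RESMP-DEV/Qwen3-Next-80B-A3B-Instruct-NVFP4 \
    --tensor-parallel-size 2
```

This exposes an OpenAI-compatible API on the default port; adjust `--tensor-parallel-size` to match your GPU count.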
|
|
|