# Qwen3-21B Pruned from 30B (90 Experts)

A pruned version of Qwen3-30B-A3B-Instruct-2507 with 38 experts removed per layer through expert pruning, reducing the model from ~30B to roughly 21B parameters and shrinking the checkpoint by about 16 GB.
## Model Details
- Base Model: Qwen/Qwen3-30B-A3B-Instruct-2507-FP8
- Architecture: Mixture of Experts (MoE) Transformer
- Original Parameters: ~30B
- Pruned Parameters: ~21B
- Original Experts: 128 per layer
- Pruned Experts: 90 per layer (38 removed)
- Size Reduction: 28.2% parameter reduction
- Quality Impact: +7.36% evaluation loss relative to the base model (see Quality Impact below)
## Pruning Methodology

### Expert Usage Analysis

Real-time router-logit analysis was used to identify the least-utilized experts across the model:
- Analyzed expert routing patterns with output_router_logits=True
- Tracked expert selection frequency across multiple inference samples
- Identified the 38 least-used experts per layer for removal based on actual usage statistics (see the sketch after this list)
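
A minimal sketch of this kind of analysis, assuming the `transformers` Qwen3-MoE implementation (which can return per-layer `router_logits`), an environment able to load the FP8 checkpoint, and a small placeholder set of calibration prompts:

```python
import torch
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B-Instruct-2507-FP8"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
model.eval()

num_layers = model.config.num_hidden_layers
num_experts = model.config.num_experts
top_k = model.config.num_experts_per_tok

# One counter per MoE layer, initialised so never-selected experts show up with count 0
usage = [Counter({e: 0 for e in range(num_experts)}) for _ in range(num_layers)]

calibration_prompts = [
    "Explain mixture-of-experts routing.",
    "Write a short story about a lighthouse.",
]  # placeholder samples; a real analysis would use a larger, diverse set

for text in calibration_prompts:
    inputs = tok(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_router_logits=True)
    # out.router_logits: one (num_tokens, num_experts) tensor per MoE layer
    for layer_idx, logits in enumerate(out.router_logits):
        selected = logits.topk(top_k, dim=-1).indices
        usage[layer_idx].update(selected.flatten().tolist())

# Per layer, the 38 least-selected experts are candidates for removal
to_prune = {i: [e for e, _ in c.most_common()[:-39:-1]] for i, c in enumerate(usage)}
```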
### True Architectural Pruning
Unlike weight masking approaches, this model features genuine architectural changes:
- In-place expert removal: deleted the unused expert modules
- Router adjustment: reduced the router output dimension from 128 → 90
- Weight remapping: preserved the routing weights of the remaining experts
- Config updates: the model configuration reflects the new expert count (see the sketch below)
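
As an illustration of these steps, a hedged sketch assuming the `transformers` Qwen3-MoE module layout (each decoder layer's `mlp` holding a `gate` linear over experts and an `experts` `ModuleList`); `to_prune` is the per-layer mapping produced by the usage analysis above:

```python
import torch
import torch.nn as nn

def prune_experts(model, to_prune):
    """Remove the given expert indices per layer and shrink the router accordingly."""
    for layer_idx, layer in enumerate(model.model.layers):
        moe = layer.mlp
        drop = set(to_prune.get(layer_idx, []))
        keep = [i for i in range(len(moe.experts)) if i not in drop]

        # In-place expert removal: keep only the surviving expert modules
        moe.experts = nn.ModuleList(moe.experts[i] for i in keep)

        # Router adjustment: new gate with one output per remaining expert,
        # remapping (copying) the routing weights of the kept experts
        old_gate = moe.gate
        new_gate = nn.Linear(old_gate.in_features, len(keep), bias=False,
                             device=old_gate.weight.device, dtype=old_gate.weight.dtype)
        with torch.no_grad():
            new_gate.weight.copy_(old_gate.weight[keep])
        moe.gate = new_gate
        if hasattr(moe, "num_experts"):
            moe.num_experts = len(keep)

    # Config update so the saved checkpoint reflects the new expert count
    # (assumes the same number of experts is removed in every layer)
    model.config.num_experts = len(keep)
    return model
```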
## Quality Impact

- Performance Impact: +7.36% evaluation loss relative to the base model (a measurement sketch follows below)
- Note: the impact may vary across task types
- Efficiency Gains: faster inference due to reduced expert overhead
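
The exact evaluation setup is not part of this card; the sketch below only illustrates how a relative evaluation-loss change of this kind can be computed (average causal-LM loss on a held-out text set for the original and the pruned model, with `eval_texts` as a placeholder):

```python
import torch

@torch.no_grad()
def avg_eval_loss(model, tok, texts):
    losses = []
    for text in texts:
        enc = tok(text, return_tensors="pt").to(model.device)
        out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
    return sum(losses) / len(losses)

# relative_change = (avg_eval_loss(pruned_model, tok, eval_texts)
#                    / avg_eval_loss(base_model, tok, eval_texts) - 1) * 100
# a value of +7.36 would correspond to the figure reported above
```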
## Technical Specifications
Architecture:
  - Layers: 48
  - Hidden Size: 2048
  - Attention Heads: 32
  - Experts per Layer: 90 (reduced from 128)
  - Active Experts per Token: 8
  - Context Length: 128K
  - Effective Parameters: ~21B (reduced from ~30B)
Optimizations:
  - FP8 quantization preserved
  - SafeTensors format
  - Flash Attention compatible
  - Efficient expert routing
  - True architectural pruning
| Metric | Original | Pruned | Change |
|--------|----------|--------|--------|
| Total Parameters | ~30B | ~21B | -28.2% |
| Model Size | 56.9 GB | 40.8 GB | -16.0 GB |
| Experts per Layer | 128 | 90 | -38 |
| Evaluation Loss | Baseline | +7.36% vs. baseline | Likely quality degradation |
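
To sanity-check a downloaded copy against the numbers above, the checkpoint can be loaded and its config inspected (a minimal sketch; the repository id below is a placeholder for this model's actual id):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "your-namespace/Qwen3-21B-Pruned-90Experts"  # placeholder repo id

cfg = AutoConfig.from_pretrained(model_id)
print(cfg.num_experts)          # expected: 90
print(cfg.num_experts_per_tok)  # expected: 8
print(cfg.num_hidden_layers)    # expected: 48

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
total = sum(p.numel() for p in model.parameters())
print(f"~{total / 1e9:.1f}B parameters")  # roughly 21B after pruning
```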
## Citation

```bibtex
@misc{qwen3-pruned-90,
  title  = {Qwen3-21B Pruned Architecture with 90 Experts},
  author = {Expert Pruning},
  year   = {2025},
  note   = {Pruned version of Qwen3-30B-A3B-Instruct with 38 experts removed}
}
```