Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment
Paper: arXiv 2405.03594
Sparse pre-trained and fine-tuned Llama models made by Neural Magic + Cerebras
Note: Our sparse Llama 2 7B base model, pruned to 50% sparsity and retrained on 50B tokens.
Note: Our sparse Llama 2 7B base model, pruned to 70% sparsity and retrained on 150B tokens.
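As a minimal sketch of how these checkpoints can be used (the repo ID below is a placeholder assumption; use the exact model name from this collection), loading one of the sparse Llama 2 7B bases with `transformers` might look like:

```python
# Minimal sketch: load a sparse Llama 2 7B checkpoint and generate text.
# The repo ID is a placeholder; substitute the exact model name from this collection.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neuralmagic/Llama-2-7b-pruned50-retrained"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place weights on available GPU(s) or CPU
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)

prompt = "Sparse pretraining lets us"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the weights are unstructured-sparse; realizing inference speedups from that sparsity requires a sparsity-aware runtime rather than the dense PyTorch path shown here.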