Tempo14's Collections

small models

Approximating Two-Layer Feedforward Networks for Efficient Transformers
Paper • arXiv:2310.10837 • 11 upvotes

BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • arXiv:2310.11453 • 105 upvotes

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper • arXiv:2310.16795 • 27 upvotes

LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Paper • arXiv:2310.16836 • 14 upvotes

FP8-LM: Training FP8 Large Language Models
Paper • arXiv:2310.18313 • 33 upvotes

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Paper • arXiv:2310.19102 • 11 upvotes

Ziya2: Data-centric Learning is All LLMs Need
Paper • arXiv:2311.03301 • 20 upvotes

Mini-GPTs: Efficient Large Language Models through Contextual Pruning
Paper • arXiv:2312.12682 • 10 upvotes

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • arXiv:2312.15166 • 60 upvotes

TinyLlama: An Open-Source Small Language Model
Paper • arXiv:2401.02385 • 94 upvotes

SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • arXiv:2401.15024 • 74 upvotes

Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • arXiv:2402.01093 • 47 upvotes

Rethinking Optimization and Architecture for Tiny Language Models
Paper • arXiv:2402.02791 • 13 upvotes

Scaling Laws for Downstream Task Performance of Large Language Models
Paper • arXiv:2402.04177 • 20 upvotes

HARE: HumAn pRiors, a key to small language model Efficiency
Paper • arXiv:2406.11410 • 39 upvotes