Tempo14's Collections

new architecture

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper • 2401.02994 • Published • 52 upvotes

MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 60 upvotes

Repeat After Me: Transformers are Better than State Space Models at Copying
Paper • 2402.01032 • Published • 24 upvotes

BlackMamba: Mixture of Experts for State-Space Models
Paper • 2402.01771 • Published • 25 upvotes

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 32 upvotes

KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 115 upvotes

Zamba: A Compact 7B SSM Hybrid Model
Paper • 2405.16712 • Published • 24 upvotes

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Paper • 2405.21060 • Published • 67 upvotes

Block Transformer: Global-to-Local Language Modeling for Fast Inference
Paper • 2406.02657 • Published • 41 upvotes

Breaking the Attention Bottleneck
Paper • 2406.10906 • Published • 4 upvotes

Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Paper • 2407.04620 • Published • 34 upvotes

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Paper • 2408.12570 • Published • 33 upvotes

A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond
Paper • 2410.02362 • Published • 18 upvotes

Differential Transformer
Paper • 2410.05258 • Published • 179 upvotes

GPT or BERT: why not both?
Paper • 2410.24159 • Published • 13 upvotes

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Paper • 2410.20672 • Published • 6 upvotes

SambaMixer: State of Health Prediction of Li-ion Batteries using Mamba State Space Models
Paper • 2411.00233 • Published • 7 upvotes

Hymba: A Hybrid-head Architecture for Small Language Models
Paper • 2411.13676 • Published • 45 upvotes

Gated Delta Networks: Improving Mamba2 with Delta Rule
Paper • 2412.06464 • Published • 13 upvotes

Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 108 upvotes

RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper • 2503.14456 • Published • 153 upvotes

Deep Residual Echo State Networks: exploring residual orthogonal connections in untrained Recurrent Neural Networks
Paper • 2508.21172 • Published • 1 upvote

Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling
Paper • 2509.00605 • Published • 42 upvotes

Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 463 upvotes