leonardlin's Collections

context

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Paper • arXiv:2401.03462 • 27 upvotes

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Paper • arXiv:2305.07185 • 9 upvotes

YaRN: Efficient Context Window Extension of Large Language Models
Paper • arXiv:2309.00071 • 77 upvotes

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • arXiv:2401.02669 • 16 upvotes

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper • arXiv:2401.01325 • 27 upvotes

Extending Context Window of Large Language Models via Semantic Compression
Paper • arXiv:2312.09571 • 16 upvotes

Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Paper • arXiv:2312.08618 • 15 upvotes

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • arXiv:2312.00752 • 146 upvotes

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
Paper • arXiv:2401.06951 • 26 upvotes

Extending LLMs' Context Window with 100 Samples
Paper • arXiv:2401.07004 • 16 upvotes

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Paper • arXiv:2401.18079 • 8 upvotes

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Paper • arXiv:2402.02750 • 4 upvotes

A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Paper • arXiv:2402.09727 • 38 upvotes

In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss
Paper • arXiv:2402.10790 • 42 upvotes

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • arXiv:2402.13753 • 116 upvotes

Data Engineering for Scaling Language Models to 128K Context
Paper • arXiv:2402.10171 • 25 upvotes

Striped Attention: Faster Ring Attention for Causal Transformers
Paper • arXiv:2311.09431 • 4 upvotes

Ring Attention with Blockwise Transformers for Near-Infinite Context
Paper • arXiv:2310.01889 • 13 upvotes

LLoCO: Learning Long Contexts Offline
Paper • arXiv:2404.07979 • 22 upvotes

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • arXiv:2404.07143 • 111 upvotes

InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory
Paper • arXiv:2402.04617 • 4 upvotes

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • arXiv:2404.08801 • 66 upvotes

TransformerFAM: Feedback attention is working memory
Paper • arXiv:2404.09173 • 43 upvotes

Extending Llama-3's Context Ten-Fold Overnight
Paper • arXiv:2404.19553 • 34 upvotes

Make Your LLM Fully Utilize the Context
Paper • arXiv:2404.16811 • 55 upvotes

Long-context LLMs Struggle with Long In-context Learning
Paper • arXiv:2404.02060 • 37 upvotes

HyperAttention: Long-context Attention in Near-Linear Time
Paper • arXiv:2310.05869 • 2 upvotes

World Model on Million-Length Video And Language With RingAttention
Paper • arXiv:2402.08268 • 40 upvotes

LongNet: Scaling Transformers to 1,000,000,000 Tokens
Paper • arXiv:2307.02486 • 81 upvotes

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Paper • arXiv:2407.02490 • 27 upvotes

LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Paper • arXiv:2408.07055 • 67 upvotes