Long context LLM

• Sequence Parallelism: Long Sequence Training from System Perspective (arXiv:2105.13120)
• Ring Attention with Blockwise Transformers for Near-Infinite Context (arXiv:2310.01889; see the sketch after this list)
• Striped Attention: Faster Ring Attention for Causal Transformers (arXiv:2311.09431)
• DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models (arXiv:2309.14509)
• LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers (arXiv:2310.03294)
• BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences (arXiv:2403.09347)
• Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models (arXiv:2402.02244)
• Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache (arXiv:2401.02669)
• Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey (arXiv:2311.12351)
• LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (arXiv:2401.01325)
• LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens (arXiv:2402.13753)
• Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143)
• Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (arXiv:2404.08801)
• Longformer: The Long-Document Transformer (arXiv:2004.05150)
• Generating Long Sequences with Sparse Transformers (arXiv:1904.10509)
• A Unified Sequence Parallelism Approach for Long Context Generative AI (arXiv:2405.07719)
• YaRN: Efficient Context Window Extension of Large Language Models (arXiv:2309.00071)
• LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism (arXiv:2406.18485)
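
Several of the sequence-parallel entries above (Ring Attention, Striped Attention, LightSeq, BurstAttention) share one primitive: each device keeps its query block resident while key/value blocks circulate around a ring of devices, and partial softmax statistics are merged online so the full attention matrix is never materialized. Below is a minimal single-process NumPy sketch of that blockwise accumulation, not any paper's implementation: it assumes non-causal attention, simulates the ring with a plain loop instead of overlapped point-to-point communication, and all function names and shapes are illustrative.

```python
# Minimal sketch of ring attention's blockwise softmax accumulation
# (in the spirit of arXiv:2310.01889). The "ring" is simulated by
# indexing a list; real systems overlap the K/V exchange with compute.
import numpy as np

def blockwise_update(q, k, v, m, num, den):
    """Fold one K/V block into running softmax statistics for q.

    m   : running row-wise max of the logits          (n_q,)
    num : running numerator, sum of exp(logits) @ v   (n_q, d)
    den : running denominator, sum of exp(logits)     (n_q,)
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])           # (n_q, n_k)
    m_new = np.maximum(m, logits.max(axis=-1))
    scale = np.exp(m - m_new)                         # rescale old stats
    p = np.exp(logits - m_new[:, None])
    num = num * scale[:, None] + p @ v
    den = den * scale + p.sum(axis=-1)
    return m_new, num, den

def ring_attention(q_blocks, k_blocks, v_blocks):
    """Each 'device' i owns q_blocks[i]; K/V blocks rotate around the ring."""
    world = len(q_blocks)
    outputs = []
    for i in range(world):
        q = q_blocks[i]
        m = np.full(q.shape[0], -np.inf)
        num = np.zeros_like(q)
        den = np.zeros(q.shape[0])
        for step in range(world):                     # one full ring rotation
            j = (i + step) % world                    # block received this step
            m, num, den = blockwise_update(q, k_blocks[j], v_blocks[j],
                                           m, num, den)
        outputs.append(num / den[:, None])
    return np.concatenate(outputs)

# Sanity check against full attention on a toy sequence.
rng = np.random.default_rng(0)
seq, d, world = 16, 8, 4
x = rng.normal(size=(3, seq, d))                      # q, k, v
q_b, k_b, v_b = (np.split(t, world) for t in x)
out = ring_attention(q_b, k_b, v_b)
logits = x[0] @ x[1].T / np.sqrt(d)
ref = np.exp(logits - logits.max(-1, keepdims=True))
ref = (ref / ref.sum(-1, keepdims=True)) @ x[2]
assert np.allclose(out, ref), "blockwise result should match full attention"
```

On top of this loop, Striped Attention's change is purely in how tokens are partitioned: dealing them to devices round-robin rather than in contiguous blocks balances the work that causal masking leaves on each device at every ring step.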