Article: KV Caching Explained: Optimizing Transformer Inference Efficiency (Jan 30)
Article: You could have designed state of the art positional encoding (Nov 25, 2024)
Article: Efficient LLM Pretraining: Packed Sequences and Masked Attention (Oct 7, 2024)
Paper: Llama 2: Open Foundation and Fine-Tuned Chat Models (arXiv 2307.09288, published Jul 18, 2023)