A little guide to building Large Language Models in 2024
Resources mentioned by @thomwolf in https://x.com/Thom_Wolf/status/1773340316835131757
- Yi: Open Foundation Models by 01.AI (Paper • 2403.04652) - Note: check out their chat space: https://huggingface.co/spaces/01-ai/Yi-34B-Chat
- A Survey on Data Selection for Language Models (Paper • 2402.16827)
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research (Paper • 2402.00159) - Note: check out the OLMo suite: https://huggingface.co/collections/allenai/olmo-suite-65aeaae8fe5b6b2122b46778
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only (Paper • 2306.01116) - Note: check out datatrove: https://github.com/huggingface/datatrove (freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks) (usage sketch below)
- Bag of Tricks for Efficient Text Classification (Paper • 1607.01759) - Note: read more: https://fasttext.cc/ (usage sketch below)
- Breadth-First Pipeline Parallelism (Paper • 2211.05953) - Note: check out nanotron: https://github.com/huggingface/nanotron (minimalistic large language model 3D-parallelism training)
- Reducing Activation Recomputation in Large Transformer Models (Paper • 2205.05198) (activation checkpointing sketch below)
- Sequence Parallelism: Long Sequence Training from System Perspective (Paper • 2105.13120)
- Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer (Paper • 2203.03466) - Note: from the creators of Grok: https://huggingface.co/xai-org/grok-1
- Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster (Paper • 2304.03208)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Paper • 2312.00752) - Note: check out the transformers-compatible Mamba models: https://huggingface.co/collections/state-spaces/transformers-compatible-mamba-65e7b40ab87e5297e45ae406 (usage sketch below)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Paper • 2305.18290) - Note: check out TRL: https://huggingface.co/docs/trl (train transformer language models with reinforcement learning) (usage sketch below)
- Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs (Paper • 2402.14740)
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (Paper • 2210.17323) - Note: read more: https://huggingface.co/blog/gptq-integration (Making LLMs lighter with AutoGPTQ and transformers) (usage sketch below)
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale (Paper • 2208.07339) - Note: read more: https://huggingface.co/docs/bitsandbytes (accessible large language models via k-bit quantization for PyTorch) (usage sketch below)
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads (Paper • 2401.10774)
- Open LLM Leaderboard 🏆: track, rank and evaluate open LLMs and chatbots - Note: check out lighteval: https://github.com/huggingface/lighteval (LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally, alongside the recently released LLM data processing library datatrove and LLM training library nanotron)
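
Datatrove (RefinedWeb item above): a rough sketch of a local filtering pipeline built from datatrove's processing blocks. Class names, module paths and arguments are from memory of an early datatrove release and may differ in the current version; the input and output paths are made up.

```python
# Rough datatrove pipeline sketch: read JSONL shards, drop low-quality documents,
# write the survivors back out. Names/arguments may differ across datatrove versions.
from datatrove.executor import LocalPipelineExecutor
from datatrove.pipeline.readers import JsonlReader
from datatrove.pipeline.filters import GopherQualityFilter
from datatrove.pipeline.writers import JsonlWriter

executor = LocalPipelineExecutor(
    pipeline=[
        JsonlReader("data/raw/"),        # read raw documents from JSONL shards (hypothetical path)
        GopherQualityFilter(),           # drop documents failing Gopher-style quality heuristics
        JsonlWriter("data/filtered/"),   # write the surviving documents (hypothetical path)
    ],
    tasks=4,     # number of shards to process
    workers=4,   # parallel local workers
)
executor.run()
```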
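
fastText (Bag of Tricks item above): a cheap supervised classifier is a common tool for quality-filtering pretraining data. A minimal sketch of the fasttext Python API; the training file and the `__label__` names are made up.

```python
# Minimal fastText supervised-classification sketch. fastText expects one
# "__label__X <text>" line per training example; labels here are made up.
import fasttext

# train.txt lines look like: "__label__hq <document text>" / "__label__lq <document text>"
model = fasttext.train_supervised(input="train.txt", lr=0.5, epoch=5, wordNgrams=2)

# Score a new document: returns the top label and its probability.
labels, probs = model.predict("an example document to score", k=1)
print(labels[0], probs[0])

model.save_model("quality_classifier.bin")
```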
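
Activation recomputation (Reducing Activation Recomputation item above): the paper proposes selective recomputation (together with sequence parallelism) rather than checkpointing whole blocks. For reference, this is plain PyTorch activation checkpointing, the generic baseline the paper improves on; the block is a toy example.

```python
# Plain (non-selective) activation checkpointing: activations inside the wrapped
# block are not stored and are recomputed during the backward pass, trading
# compute for memory. The paper refines this by recomputing only cheap activations.
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
x = torch.randn(8, 1024, requires_grad=True)

# Forward through the block without storing intermediate activations.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```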
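
Transformers-compatible Mamba (Mamba item above): the checkpoints in the linked collection load like any causal LM. The model id below is my assumption of one entry in that collection; check the collection page for the exact ids.

```python
# Loading a transformers-compatible Mamba checkpoint like any other causal LM.
# The checkpoint id is assumed; see the linked collection for the real ids.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "state-spaces/mamba-130m-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Mamba replaces attention with a selective state space layer", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```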
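
TRL / DPO (Direct Preference Optimization item above): a minimal DPO fine-tuning sketch. Argument names have shifted across TRL versions (for example `tokenizer` vs. `processing_class`, and `beta` on the trainer vs. on `DPOConfig`), and the model and dataset ids are assumptions; any preference dataset with prompt/chosen/rejected columns works.

```python
# Minimal DPO sketch with TRL. Treat names as version-dependent assumptions,
# not the canonical API; gpt2 stands in for your SFT checkpoint.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "gpt2"  # stand-in policy model
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Preference pairs with "prompt", "chosen", "rejected" columns (assumed dataset id).
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = DPOConfig(output_dir="dpo-out", beta=0.1, per_device_train_batch_size=2)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions call this argument `tokenizer`
)
trainer.train()
```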
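
GPTQ via transformers (GPTQ item above): the blog post linked in the note describes quantizing at load time with a `GPTQConfig`. A sketch; the model id and calibration dataset are illustrative, and optimum plus an AutoGPTQ backend must be installed.

```python
# Post-training GPTQ quantization through the transformers integration:
# quantization happens at load time, layer by layer, using calibration data.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # illustrative model
tokenizer = AutoTokenizer.from_pretrained(model_id)

quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

quantized = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=quant_config
)
quantized.save_pretrained("opt-125m-gptq")
```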
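
bitsandbytes / LLM.int8() (LLM.int8() item above): loading a model with 8-bit weights through the transformers integration. Requires a CUDA GPU with bitsandbytes installed; the model id is illustrative.

```python
# Load a causal LM with LLM.int8() 8-bit weights via bitsandbytes + transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-1.3b"  # illustrative model
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=bnb_config
)

inputs = tokenizer("8-bit inference keeps outlier features in higher precision:", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```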