---
language:
- en
license: apache-2.0
tags:
- text-generation
- transformer
- causal-lm
- pytorch
- lime
datasets:
- HuggingFaceH4/no_robots
- databricks/databricks-dolly-15k
- HuggingFaceTB/everyday-conversations-llama3.1-2k
- Magpie-Align/Magpie-Pro-300K-Filtered
- TIGER-Lab/WebInstruct-verified
- teknium/GPT4-LLM-Cleaned
- yahma/alpaca-cleaned
- Dahoas/synthetic-instruct-gptj-pairwise
pipeline_tag: text-generation
library_name: transformers
---
![logo](logo.png)
**LIME-1B Model Card**
---
> **Note**: This model serves as proof that a single individual, without any team or institutional backing, can develop an SLM that demonstrates competitive results.
> LIME-1B was trained for only ~$1,000 yet delivers quality approaching models trained on hundreds of thousands of dollars of compute, demonstrating exceptional training efficiency.
---
# LIME-1B
LIME-1B is a 1B-parameter, decoder-only Transformer language model trained from scratch on English web data and then instruction-tuned on a curated mixture of assistant-style datasets with and without retrieval context. It is designed as a **compact, practical base model** for:
- Building RAG systems (context + question → answer)
- Assistant-style Q&A and task completion
- Summarization, explanation, and rewriting tasks in English
> ⚠️ LIME-1B is **not** RLHF/DPO-aligned and does **not** have tool use or multi-turn chat training baked in. It is an instruction-tuned LM, not a fully aligned assistant like ChatGPT.
---
## 1. Model architecture
LIME-1B is a decoder-only Transformer with several quality-oriented design choices:
| Component | Value |
|-------------------------|--------------------------------------------|
| Architecture | Decoder-only Transformer |
| Parameters | 1.0B |
| Layers (decoder blocks) | 32 |
| d_model | 1536 |
| FFN dimension (d_ff) | 6144 |
| Attention heads | 24 |
| Vocabulary size | 50,000 |
| Max sequence length | 512 tokens |
| Positional encoding | Sinusoidal |
| Norm | RMSNorm |
| FFN | SiLU MLP |
| Attention | FlashAttention |
| Tying of embeddings | Output head tied to embedding |
| Precision (training) | Mixed fp32/bf16 (autocast) + grad clipping |
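As a quick sanity check of the table above, the total parameter count can be read directly from the checkpoint. A minimal sketch (the loading arguments mirror the Usage section below):

```python
# Sketch: verify the ~1.0B parameter count reported in the architecture table.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "anarlavrenov/lime-1b-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total / 1e9:.2f}B")  # expected to be close to 1.0B
```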
## 2. Training data
### 2.1 Pretraining
The base model is pretrained as a standard causal language model on English web data:
- **Corpus**: FineWeb-Edu (CC-MAIN-2025-05 split)
- **Language filter**: English-only subset
- **Objective**: next-token prediction (causal LM)
- **Token budget**: 20B tokens
- **Context length**: 512 tokens
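The objective itself is standard next-token cross-entropy; an illustrative sketch (not the author's training code) of how the loss is computed from model logits:

```python
# Sketch: next-token prediction loss. Logits at position t are scored against
# the token at position t + 1; the final position has no target.
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size); input_ids: (batch, seq_len)
    shift_logits = logits[:, :-1, :].contiguous()   # predictions for tokens 1..T-1
    shift_labels = input_ids[:, 1:].contiguous()    # targets shifted left by one
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
```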
### 2.2 Instruction fine-tuning (SFT)
After pretraining, the model is fine-tuned on a **unified instruction schema**:
```text
<user> instruction_text <assistant> response_text <eos>
```
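For illustration, a single (instruction, response) pair could be serialized into this schema roughly as follows; the exact whitespace and special-token handling in the original pipeline is an assumption:

```python
# Sketch: serialize one SFT example into the unified instruction schema.
# Spacing around the role markers is an assumption, not documented in the card.
def format_sft_example(instruction: str, response: str, eos_token: str = "<eos>") -> str:
    return f"<user>{instruction}<assistant>{response}{eos_token}"

print(format_sft_example("Name three primary colors.", "Red, yellow, and blue."))
# <user>Name three primary colors.<assistant>Red, yellow, and blue.<eos>
```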
**SFT Data Mixture** (~97k examples total):
- [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k)
- [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
- [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)
- [teknium/GPT4-LLM-Cleaned](https://huggingface.co/datasets/teknium/GPT4-LLM-Cleaned)
- [Magpie-Align/Magpie-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered)
- [Dahoas/synthetic-instruct-gptj-pairwise](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise)
## Training Details
### Hardware
- **GPUs**: 8 × NVIDIA A100 80GB (data parallel)
- **Precision**: bfloat16 with gradient clipping (max_norm = 1.0)
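A minimal sketch of what a bf16 training step with gradient clipping can look like in PyTorch (illustrative only; the author's actual training loop is not published here):

```python
# Sketch: one optimization step with bf16 autocast and gradient clipping (max_norm = 1.0).
import torch

def training_step(model, batch, optimizer, max_norm: float = 1.0):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        # Assumes the batch contains labels so the forward pass returns a loss;
        # weights stay in fp32 while activations run in bf16 under autocast.
        loss = model(**batch).loss
    loss.backward()  # no GradScaler needed for bf16
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.detach()
```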
### Pretraining
**Objective**: Cross-entropy loss on next-token prediction
**Optimizer**: AdamW
- β₁ = 0.9
- β₂ = 0.95
- Weight decay applied to non-norm/non-bias parameters
**Learning Rate Schedule**:
- Peak LR: ~5e-4
- Polynomial decay to 5e-6
- Warmup: ~5% of total steps
### Instruction fine-tuning (SFT)
**Objective**: Cross-entropy loss on next-token prediction
**Optimizer**: AdamW
- β₁ = 0.9
- β₂ = 0.95
- Weight decay applied to non-norm/non-bias parameters
**Learning Rate Schedule**:
- Peak LR: 8e-5
- Polynomial decay to 1e-5
- Warmup: 10% of total steps
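Both stages use AdamW with warmup followed by polynomial decay. A hedged sketch using the pretraining hyperparameters and the `get_polynomial_decay_schedule_with_warmup` helper from `transformers` (the weight-decay value and the name-based parameter grouping are assumptions; the original code may differ):

```python
# Sketch: AdamW with weight decay on non-norm/non-bias parameters,
# plus warmup then polynomial decay from ~5e-4 down to 5e-6.
import torch
from transformers import get_polynomial_decay_schedule_with_warmup

def build_optimizer_and_scheduler(model, total_steps: int,
                                  peak_lr: float = 5e-4, end_lr: float = 5e-6,
                                  warmup_frac: float = 0.05,
                                  weight_decay: float = 0.1):  # decay value is an assumption
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Heuristic: 1-D tensors cover biases and RMSNorm weights.
        (no_decay if param.ndim == 1 or name.endswith(".bias") else decay).append(param)

    optimizer = torch.optim.AdamW(
        [{"params": decay, "weight_decay": weight_decay},
         {"params": no_decay, "weight_decay": 0.0}],
        lr=peak_lr, betas=(0.9, 0.95),
    )
    scheduler = get_polynomial_decay_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(warmup_frac * total_steps),
        num_training_steps=total_steps,
        lr_end=end_lr,
    )
    return optimizer, scheduler
```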
## 3. Evaluation Benchmarks
The chart below compares LIME-1B against other models across 8 standard evaluation tasks: [![Metrics Chart](metrics_chart.png)](metrics_chart.png)
## Usage
```python
# Example usage
# pip install -U transformers torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_name = "anarlavrenov/lime-1b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

def build_prompt(question):
    uid = "<user>"
    aid = "<assistant>"
    return uid + question + aid
question = "Write five questions for a Data Scientist interview."
prompt = build_prompt(question)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
input_length = inputs['input_ids'].shape[1]
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=4,
    early_stopping=True,
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    min_new_tokens=16,
    do_sample=False,
    top_p=None,
    temperature=None,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
generated_tokens = outputs[0][input_length:]
output = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(output)
# 1. Can you tell us about your experience with data analysis and modeling?
# 2. How do you approach data cleaning and preprocessing?
# 3. How do you approach data visualization and storytelling?
# 4. Can you walk us through a time when you used data to solve a problem?
# 5. How do you approach the ethical considerations of data science and machine learning?
```
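Since RAG-style prompting (context + question → answer) is a primary intended use, here is a hedged sketch of one way to frame it, continuing from the loading code above (the `Context:`/`Question:` framing is an assumption; the card does not document a dedicated retrieval format):

```python
# Sketch: RAG-style prompt, with retrieved context placed inside the <user> turn.
def build_rag_prompt(context: str, question: str) -> str:
    return f"<user>Context: {context}\n\nQuestion: {question}<assistant>"

context = "LIME-1B is a 1B-parameter decoder-only Transformer pretrained on 20B tokens of FineWeb-Edu."
question = "How many tokens was LIME-1B pretrained on?"

inputs = tokenizer(build_rag_prompt(context, question), return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    num_beams=4,
    do_sample=False,
    no_repeat_ngram_size=3,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```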
If you use LIME-1B in academic work or public products, please consider citing the model and the underlying datasets according to their respective licenses and documentation.
**Anar Lavrenov**
[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/anar-lavrenov/)
Feel free to reach out with questions or feedback about LIME-1B!
## Citation
```bibtex
@misc{lime1b2025,
  title        = {LIME-1B: A 1B-parameter English Causal Language Model},
  author       = {Anar Lavrenov},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/anarlavrenov/LIME-1B}}
}
```