---
language:
- en
license: apache-2.0
tags:
- text-generation
- transformer
- causal-lm
- pytorch
- lime
datasets:
- HuggingFaceH4/no_robots
- databricks/databricks-dolly-15k
- HuggingFaceTB/everyday-conversations-llama3.1-2k
- Magpie-Align/Magpie-Pro-300K-Filtered
- TIGER-Lab/WebInstruct-verified
- teknium/GPT4-LLM-Cleaned
- yahma/alpaca-cleaned
- Dahoas/synthetic-instruct-gptj-pairwise
pipeline_tag: text-generation
library_name: transformers
---


![logo](logo.png)
**LIME-1B Model Card**

---

> **Note**: This model is proof that a single individual, without any team or institutional backing, can develop an SLM with competitive results.
> LIME-1B was trained for only ~$1,000 yet delivers quality approaching that of models trained on hundreds of thousands of dollars of compute, demonstrating exceptional training efficiency.

---

# LIME-1B

LIME-1B is a 1B-parameter, decoder-only Transformer language model trained from scratch on English web data and then instruction-tuned on a curated mixture of assistant-style datasets with and without retrieval context. It is designed as a **compact, practical base model** for:

- Building RAG systems (context + question → answer)  
- Assistant-style Q&A and task completion  
- Summarization, explanation, and rewriting tasks in English  

> ⚠️ LIME-1B is **not** RLHF/DPO-aligned and does **not** have tool use or multi-turn chat training baked in. It is an instruction-tuned LM, not a fully aligned assistant like ChatGPT.

---

## 1. Model architecture

LIME-1B is a decoder-only Transformer with several quality-oriented design choices:

| Component               | Value                                      |
|-------------------------|--------------------------------------------|
| Architecture            | Decoder-only Transformer                   |
| Parameters              | 1.0B                                       |
| Layers (decoder blocks) | 32                                         |
| d_model                 | 1536                                       |
| FFN dimension (d_ff)    | 6144                                       |
| Attention heads         | 24                                         |
| Vocabulary size         | 50,000                                     |
| Max sequence length     | 512 tokens                                 |
| Positional encoding     | Sinusoidal                                 |
| Norm                    | RMSNorm                                    |
| FFN                     | SiLU MLP                                   |
| Attention               | FlashAttention                             |
| Tying of embeddings     | Output head tied to embedding              |
| Precision (training)    | Mixed fp32/bf16 (autocast) + grad clipping |

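For quick reference, the table corresponds roughly to the configuration below (a minimal sketch; the field names are illustrative and are not the model's actual config keys):

```python
# Illustrative hyperparameter summary mirroring the table above.
# Field names are hypothetical; they do not reflect the real training code.
from dataclasses import dataclass

@dataclass
class Lime1BConfig:
    n_layers: int = 32            # decoder blocks
    d_model: int = 1536           # hidden size
    d_ff: int = 6144              # SiLU MLP width
    n_heads: int = 24             # attention heads (head_dim = 1536 / 24 = 64)
    vocab_size: int = 50_000
    max_seq_len: int = 512        # sinusoidal positional encoding
    tie_embeddings: bool = True   # output head tied to the token embedding
```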

## 2. Training data

### 2.1 Pretraining

The base model is pretrained as a standard causal language model on English web data:

- **Corpus**: FineWeb-Edu (CC-MAIN-2025-05 split) 
- **Language filter**: English-only subset  
- **Objective**: next-token prediction (causal LM)  
- **Token budget**: 20B tokens  
- **Context length**: 512 tokens  

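The objective is plain next-token prediction; a minimal sketch of the loss computation (not the actual training loop) looks like this:

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Cross-entropy for next-token prediction.

    logits:    (batch, seq_len, vocab_size) from the decoder
    input_ids: (batch, seq_len) token ids that produced the logits
    """
    # Position t predicts token t+1: drop the last logit and the first label.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
```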

### 2.2 Instruction fine-tuning (SFT)

After pretraining, the model is fine-tuned on a **unified instruction schema**:

```text
<user> instruction_text <assistant> response_text <eos>
```
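
As an illustration, a single (instruction, response) pair maps onto this schema roughly as follows (the helper is hypothetical; exact spacing and loss masking may differ in the actual SFT pipeline):

```python
# Hypothetical helper showing how one SFT example is rendered into the unified schema.
def render_sft_example(instruction: str, response: str) -> str:
    return f"<user> {instruction} <assistant> {response} <eos>"

print(render_sft_example(
    "List three uses of a compact language model.",
    "RAG answering, summarization, and rewriting.",
))
# <user> List three uses of a compact language model. <assistant> RAG answering, summarization, and rewriting. <eos>
```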

**SFT Data Mixture** (~97k examples total):

- [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k)
- [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
- [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)
- [teknium/GPT4-LLM-Cleaned](https://huggingface.co/datasets/teknium/GPT4-LLM-Cleaned)
- [Magpie-Align/Magpie-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered)
- [Dahoas/synthetic-instruct-gptj-pairwise](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise)

## 3. Training details

### Hardware
- **GPUs**: 8 × NVIDIA A100 80GB (data parallel)
- **Precision**: bfloat16 with gradient clipping (max_norm = 1.0)

### Pretraining

**Objective**: Cross-entropy loss on next-token prediction

**Optimizer**: AdamW
- β₁ = 0.9
- β₂ = 0.95
- Weight decay applied to non-norm/non-bias parameters

**Learning Rate Schedule**:
- Peak LR: ~5e-4
- Polynomial decay to 5e-6
- Warmup: ~5% of total steps
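
A schedule of this shape can be sketched with PyTorch's `LambdaLR` (a sketch under stated assumptions: the total step count, decay power, and weight-decay value below are placeholders, not the values actually used):

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

peak_lr, final_lr = 5e-4, 5e-6
total_steps = 100_000                      # placeholder; real step count not published
warmup_steps = int(0.05 * total_steps)     # ~5% warmup
power = 1.0                                # placeholder decay power

def lr_lambda(step: int) -> float:
    # Returns a multiplier applied to peak_lr.
    if step < warmup_steps:
        return step / max(1, warmup_steps)                 # linear warmup
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    decayed = final_lr + (peak_lr - final_lr) * (1.0 - progress) ** power
    return decayed / peak_lr                               # polynomial decay to final_lr

model = torch.nn.Linear(8, 8)              # stand-in module
# The card applies weight decay only to non-norm/non-bias parameters,
# which would use parameter groups; a single group is shown for brevity.
optimizer = AdamW(model.parameters(), lr=peak_lr, betas=(0.9, 0.95), weight_decay=0.1)
scheduler = LambdaLR(optimizer, lr_lambda)
```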

### Instruction fine-tuning (SFT)

**Objective**: Cross-entropy loss on next-token prediction

**Optimizer**: AdamW
- β₁ = 0.9
- β₂ = 0.95
- Weight decay applied to non-norm/non-bias parameters

**Learning Rate Schedule**:
- Peak LR: 8e-5
- Polynomial decay to 1e-5
- Warmup: 10% of total steps

## 4. Evaluation benchmarks

The chart below compares LIME-1B against other models across 8 standard evaluation tasks: [![Metrics chart](metrics_chart.png)](metrics_chart.png)

## Usage
```python
# Example usage
# pip install -U transformers torch

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "anarlavrenov/lime-1b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

def build_prompt(question: str) -> str:
    # Wrap the question in the special tokens used by the SFT schema.
    return "<user>" + question + "<assistant>"

question = "Write five questions for a Data Scientist interview."
prompt = build_prompt(question)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
input_length = inputs['input_ids'].shape[1]

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=4,
    early_stopping=True,
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    min_new_tokens=16,
    do_sample=False,
    top_p=None,
    temperature=None,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

generated_tokens = outputs[0][input_length:]
output = tokenizer.decode(generated_tokens, skip_special_tokens=True)

print(output)

# 1. Can you tell us about your experience with data analysis and modeling? 
# 2. How do you approach data cleaning and preprocessing? 
# 3. How do you approach data visualization and storytelling? 
# 4. Can you walk us through a time when you used data to solve a problem? 
# 5. How do you approach the ethical considerations of data science and machine learning?

```
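
If sampling is preferred over beam search, a decoding configuration along these lines should also work (the sampling values are illustrative, not tuned for LIME-1B):

```python
# Sampling-based decoding; temperature/top_p values are illustrative only.
sampled = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.15,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(sampled[0][input_length:], skip_special_tokens=True))
```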

If you use LIME-1B in academic work or public products, please consider citing the model and the underlying datasets according to their respective licenses and documentation.

**Anar Lavrenov**

[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/anar-lavrenov/)

Feel free to reach out with questions or feedback about LIME-1B!

## Citation
```bibtex
@misc{lime1b2025,
  title         = {LIME-1B: A 1B-parameter English Causal Language Model},
  author        = {Anar Lavrenov},
  year          = {2025},
  howpublished  = {\url{https://huggingface.co/anarlavrenov/LIME-1B}}
}
```