---
language:
- en
license: apache-2.0
tags:
- text-generation
- transformer
- causal-lm
- pytorch
- lime
datasets:
- HuggingFaceH4/no_robots
- databricks/databricks-dolly-15k
- HuggingFaceTB/everyday-conversations-llama3.1-2k
- Magpie-Align/Magpie-Pro-300K-Filtered
- TIGER-Lab/WebInstruct-verified
- teknium/GPT4-LLM-Cleaned
- yahma/alpaca-cleaned
- Dahoas/synthetic-instruct-gptj-pairwise
pipeline_tag: text-generation
library_name: transformers
---

![logo](logo.png)

**LIME-1B Model Card**

---

> **Note**: This model serves as proof that a single individual, without any team or institutional backing, can develop an SLM with competitive results.
> LIME-1B was trained for only ~$1,000 yet delivers quality approaching that of models trained on hundreds of thousands of dollars of compute, demonstrating exceptional training efficiency.

---

# LIME-1B

LIME-1B is a 1B-parameter, decoder-only Transformer language model trained from scratch on English web data and then instruction-tuned on a curated mixture of assistant-style datasets with and without retrieval context.

It is designed as a **compact, practical base model** for:

- Building RAG systems (context + question → answer)
- Assistant-style Q&A and task completion
- Summarization, explanation, and rewriting tasks in English

> ⚠️ LIME-1B is **not** RLHF/DPO-aligned and does **not** have tool use or multi-turn chat training baked in. It is an instruction-tuned LM, not a fully aligned assistant like ChatGPT.

---

## 1. Model architecture

LIME-1B is a decoder-only Transformer with several quality-oriented design choices:

| Component               | Value                                      |
|-------------------------|--------------------------------------------|
| Architecture            | Decoder-only Transformer                   |
| Parameters              | 1.0B                                       |
| Layers (decoder blocks) | 32                                         |
| d_model                 | 1536                                       |
| FFN dimension (d_ff)    | 6144                                       |
| Attention heads         | 24                                         |
| Vocabulary size         | 50,000                                     |
| Max sequence length     | 512 tokens                                 |
| Positional encoding     | Sinusoidal                                 |
| Norm                    | RMSNorm                                    |
| FFN                     | SiLU MLP                                   |
| Attention               | FlashAttention                             |
| Tying of embeddings     | Output head tied to embedding              |
| Precision (training)    | Mixed fp32/bf16 (autocast) + grad clipping |
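To make the table above more concrete, here is a minimal, illustrative PyTorch sketch of a single decoder block with these dimensions (RMSNorm, SiLU MLP, causal self-attention via `scaled_dot_product_attention`, which can dispatch to FlashAttention kernels). The pre-norm residual layout, module names, and omission of embeddings/positional encoding are assumptions for illustration only, not the released implementation; `nn.RMSNorm` requires PyTorch ≥ 2.4.

```python
# Illustrative sketch of one LIME-1B-style decoder block (not the actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DecoderBlock(nn.Module):
    def __init__(self, d_model: int = 1536, d_ff: int = 6144, n_heads: int = 24):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads  # 1536 / 24 = 64
        self.attn_norm = nn.RMSNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)
        self.ffn_norm = nn.RMSNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.SiLU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        # Pre-norm causal self-attention; SDPA can use FlashAttention kernels.
        h = self.attn_norm(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q, k, v = [t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2) for t in (q, k, v)]
        h = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        h = h.transpose(1, 2).reshape(B, T, C)
        x = x + self.out_proj(h)
        # Pre-norm SiLU MLP with residual connection.
        x = x + self.ffn(self.ffn_norm(x))
        return x


if __name__ == "__main__":
    block = DecoderBlock()
    x = torch.randn(1, 8, 1536)  # (batch, seq_len, d_model)
    print(block(x).shape)        # torch.Size([1, 8, 1536])
```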
## 2. Training data

### 2.1 Pretraining

The base model is pretrained as a standard causal language model on English web data:

- **Corpus**: FineWeb-Edu (CC-MAIN-2025-05 split)
- **Language filter**: English-only subset
- **Objective**: next-token prediction (causal LM)
- **Token budget**: 20B tokens
- **Context length**: 512 tokens

### 2.2 Instruction fine-tuning (SFT)

After pretraining, the model is fine-tuned on a **unified instruction schema**:

```text
instruction_text
response_text
```

**SFT Data Mixture** (~97k examples total):

- [HuggingFaceTB/everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k)
- [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
- [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)
- [teknium/GPT4-LLM-Cleaned](https://huggingface.co/datasets/teknium/GPT4-LLM-Cleaned)
- [Magpie-Align/Magpie-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered)
- [Dahoas/synthetic-instruct-gptj-pairwise](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise)

## 3. Training details

### Hardware

- **GPUs**: 8 × NVIDIA A100 80GB (data parallel)
- **Precision**: bfloat16 with gradient clipping (max_norm = 1.0)

### Pretraining

**Objective**: Cross-entropy loss on next-token prediction

**Optimizer**: AdamW
- β₁ = 0.9
- β₂ = 0.95
- Weight decay applied to non-norm/non-bias parameters

**Learning Rate Schedule**:
- Peak LR: ~5e-4
- Polynomial decay to 5e-6
- Warmup: ~5% of total steps

### Instruction fine-tuning (SFT)

**Objective**: Cross-entropy loss on next-token prediction

**Optimizer**: AdamW
- β₁ = 0.9
- β₂ = 0.95
- Weight decay applied to non-norm/non-bias parameters

**Learning Rate Schedule**:
- Peak LR: 8e-5
- Polynomial decay to 1e-5
- Warmup: 10% of total steps

## 4. Evaluation benchmarks

The following chart compares LIME-1B against other models across 8 standard evaluation tasks:

[![Metrics Chart](metrics_chart.png)](metrics_chart.png)

## 5. Usage

```python
# Example usage
# pip install -U transformers torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "anarlavrenov/lime-1b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)


def build_prompt(question):
    uid = ""  # user-turn delimiter (model-specific; empty in this example)
    aid = ""  # assistant-turn delimiter (model-specific; empty in this example)
    return uid + question + aid


question = "Write five questions for a Data Scientist interview."
prompt = build_prompt(question)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
input_length = inputs["input_ids"].shape[1]

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    num_beams=4,
    early_stopping=True,
    repetition_penalty=1.15,
    no_repeat_ngram_size=3,
    min_new_tokens=16,
    do_sample=False,
    top_p=None,
    temperature=None,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

generated_tokens = outputs[0][input_length:]
output = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(output)

# 1. Can you tell us about your experience with data analysis and modeling?
# 2. How do you approach data cleaning and preprocessing?
# 3. How do you approach data visualization and storytelling?
# 4. Can you walk us through a time when you used data to solve a problem?
# 5. How do you approach the ethical considerations of data science and machine learning?
```
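Since the card lists RAG systems (context + question → answer) as a primary use case, the sketch below shows one way a retrieval-augmented prompt could be assembled with the same `build_prompt` helper, `tokenizer`, and `model` from the snippet above. The context/question layout, the example `context` string, and the generation settings are illustrative assumptions, not a documented prompt format.

```python
# Hypothetical RAG-style prompt: prepend retrieved context to the question.
# Continues from the usage snippet above (reuses tokenizer, model, build_prompt).
context = (
    "LIME-1B is a 1B-parameter decoder-only Transformer pretrained on 20B tokens "
    "of FineWeb-Edu and instruction-tuned on ~97k examples."
)
question = "How many tokens was LIME-1B pretrained on?"

rag_prompt = build_prompt(f"Context:\n{context}\n\nQuestion: {question}")

inputs = tokenizer(rag_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    num_beams=4,
    no_repeat_ngram_size=3,
    do_sample=False,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```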
If you use LIME-1B in academic work or public products, please consider citing the model and the underlying datasets according to their respective licenses and documentation.

**Anar Lavrenov**

[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/anar-lavrenov/)

Feel free to reach out with questions or feedback about LIME-1B!

## Citation

```bibtex
@misc{lime1b2025,
  title        = {LIME-1B: A 1B-parameter English Causal Language Model},
  author       = {Anar Lavrenov},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/anarlavrenov/LIME-1B}}
}
```