Model Card for SmolLM-135M Layer-Pruned (~90M params)
This repository hosts a layer-pruned version of HuggingFaceTB/SmolLM-135M, reduced from ~135M parameters down to ~99M parameters (~26% smaller).
⚠️ Note: This model is intended as a starting point for knowledge distillation or fine-tuning, not as a final standalone model.
Model Details
Model Description
- Developed by: Independent (based on HuggingFaceTB/SmolLM-135M)
- Model type: Decoder-only causal language model (LLaMA-style)
- Language(s) (NLP): English (same as base model)
- License: Inherits license from HuggingFaceTB/SmolLM-135M
- Finetuned from model: HuggingFaceTB/SmolLM-135M
The model was pruned from 30 layers → 20 layers, achieving ~26% parameter reduction while keeping embeddings, the output head, and the final layer intact.
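The exact layer-selection recipe is not documented in this card. The sketch below shows one common way to produce such a checkpoint with the `transformers` API, keeping the first 19 decoder layers plus the final layer; the indices are illustrative assumptions, not the actual pruning procedure used here.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the 30-layer base model.
base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M")

# Hypothetical selection: keep the first 19 decoder layers plus the final one,
# dropping a block of later-middle layers (20 layers total).
keep = list(range(19)) + [29]
base.model.layers = torch.nn.ModuleList(base.model.layers[i] for i in keep)
base.config.num_hidden_layers = len(keep)

# Embeddings and the LM head are untouched; only decoder layers are removed.
base.save_pretrained("SmolLM-135M-layer-pruned")
```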
Model Sources
- Repository: HuggingFaceTB/SmolLM-135M (base)
- This model repo: current repository
Uses
Direct Use
- Educational / research purposes for studying pruning effects on transformer models.
- Lightweight inference where resources are limited.
Downstream Use
- Knowledge Distillation: using this pruned model as the student, with a larger model as the teacher (see the sketch after this list).
- Fine-Tuning: Domain adaptation on specific datasets while benefiting from lower compute requirements.
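A minimal distillation step might look like the following sketch. The teacher checkpoint (HuggingFaceTB/SmolLM-360M) and the temperature are assumptions chosen for illustration; SmolLM checkpoints share a tokenizer, so teacher and student logits cover the same vocabulary.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup: a larger SmolLM checkpoint as teacher, this pruned model as student.
teacher = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-360M").eval()
student = AutoModelForCausalLM.from_pretrained("your-username/SmolLM-135M-layer-pruned-90M-raw")
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")

batch = tokenizer(["The quick brown fox jumps over the lazy dog."], return_tensors="pt")
temperature = 2.0  # illustrative softening temperature

with torch.no_grad():
    teacher_logits = teacher(**batch).logits
student_logits = student(**batch).logits

# Soft-target loss: KL divergence between temperature-scaled teacher and student distributions.
kd_loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * (temperature ** 2)
kd_loss.backward()  # combine with a standard LM loss in a real training loop
```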
Out-of-Scope Use
- Production deployment without evaluation.
- High-stakes applications (medical, legal, safety-critical systems).
Bias, Risks, and Limitations
- As with the base model, it may generate biased or toxic text.
- Pruning reduces model capacity → performance may drop without re-training.
- The model has not been benchmarked post-pruning.
Recommendations
Users should:
- Perform task-specific fine-tuning or distillation before deployment.
- Benchmark against baselines to quantify the accuracy vs. efficiency trade-off (a quick perplexity check is sketched below).
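As a rough starting point, perplexity on the same input can be compared between the pruned model and the unpruned base. The single sentence used here is illustrative only; a real evaluation should use a held-out corpus and standard benchmarks.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")
text = "Layer pruning trades model capacity for a smaller memory footprint."
inputs = tokenizer(text, return_tensors="pt")

# Compare base vs. pruned model on the same text (lower perplexity is better).
for name in ["HuggingFaceTB/SmolLM-135M", "your-username/SmolLM-135M-layer-pruned-90M-raw"]:
    model = AutoModelForCausalLM.from_pretrained(name).eval()
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(f"{name}: perplexity ~ {math.exp(loss.item()):.1f}")
```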
How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The pruned weights live in this repository; the tokenizer is unchanged from the base model.
model_id = "your-username/SmolLM-135M-layer-pruned-90M-raw"
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-135M")
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate up to 50 new tokens from a short prompt.
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```