---
library_name: pytorch
pipeline_tag: text-generation
tags: [SLM, GPT-style, tiktoken, tiny-stories, wikipedia]
license: apache-2.0
model-index:
- name: SLM (GPT-style, ~124M)
  results:
  - task: {type: text-generation, name: Text Generation}
    dataset: {name: TinyStories + Wikipedia EN 20231101, type: custom}
    metrics:
    - {name: Validation loss, type: loss, value: 2.6522}
    - {name: Validation perplexity, type: perplexity, value: 14.2}
---
# SLM (GPT-style, ~124M params)
- Tokenizer: `tiktoken` ("gpt2" encoding, 50,257-token vocabulary)
- Context length: 512
- Architecture: Decoder-only, Pre-LN, SDPA attention, tied output/input embeddings
- Training data: TinyStories + Wikipedia EN (snapshot 20231101)
- Best checkpoint (from logs): ~step 43.5k (val loss ≈ 2.65, ppl ≈ 14.2)
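
For reference, the ~124M figure is consistent with a GPT-2-small-shaped decoder. A back-of-the-envelope count, assuming a 12-layer / 768-dim / 12-head configuration borrowed from GPT-2 small (only the vocabulary size, context length, and tied embeddings are stated above; everything else in this sketch is an assumption):

```python
# Rough parameter count for a GPT-2-small-shaped decoder.
# Assumptions: 12 layers, d_model=768, GELU MLP with 4x expansion, biases on
# projections. Stated by this card: vocab=50257, ctx=512, tied embeddings.
vocab, ctx, d, n_layers = 50257, 512, 768, 12
emb = vocab * d + ctx * d                # token + positional embeddings
attn = d * 3 * d + 3 * d + d * d + d     # qkv projection + output projection
mlp = d * 4 * d + 4 * d + 4 * d * d + d  # up- and down-projection
ln = 2 * 2 * d                           # two LayerNorms per block (weight + bias)
total = emb + n_layers * (attn + mlp + ln) + 2 * d  # + final LayerNorm;
                                         # no separate lm_head due to weight tying
print(f"~{total / 1e6:.1f}M parameters") # ~124.0M
```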
## Usage
```python
import os, sys, json, torch, tiktoken
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
repo_id = "gopluto-ai/slm-124m-tinystories-wiki"
cfg_path = hf_hub_download(repo_id, "config.json")
wt_path = hf_hub_download(repo_id, "model.safetensors")
slm_path = hf_hub_download(repo_id, "slm.py")
# make the downloaded slm.py importable
sys.path.append(os.path.dirname(slm_path))
from slm import SLM
# load model
with open(cfg_path) as f:
    cfg = json.load(f)
model = SLM(**cfg)
model.load_state_dict(load_file(wt_path))
model.eval()
# tokenizer: encode the prompt into a (1, seq_len) batch of token ids
enc = tiktoken.get_encoding("gpt2")
prompt = "Once upon a time"
ids = torch.tensor(enc.encode(prompt)).unsqueeze(0)
# generate
with torch.no_grad():
    y = model.generate(ids, max_new_tokens=50, temperature=0.9, top_k=50)
print(enc.decode(y[0].tolist()))
```
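
Since the context length is 512 tokens, longer prompts should be truncated before generation. A minimal sketch continuing from the snippet above (`MAX_CTX` and `long_prompt` are illustrative names, and running `generate` on CUDA tensors assumes `slm.py` is device-agnostic; only the 512-token limit comes from this card):

```python
MAX_CTX = 512  # the model's stated context length

# keep only the most recent MAX_CTX tokens of a long prompt
tokens = enc.encode(long_prompt)
ids = torch.tensor(tokens[-MAX_CTX:]).unsqueeze(0)

# optional: run on GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
with torch.no_grad():
    y = model.generate(ids.to(device), max_new_tokens=50, temperature=0.9, top_k=50)
print(enc.decode(y[0].tolist()))
```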