---
library_name: pytorch
pipeline_tag: text-generation
tags:
  - SLM
  - GPT-style
  - tiktoken
  - tiny-stories
  - wikipedia
license: apache-2.0
model-index:
  - name: SLM (GPT-style, ~124M)
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TinyStories + Wikipedia EN 20231101
          type: custom
        metrics:
          - name: Validation loss
            type: loss
            value: 2.6522
          - name: Validation perplexity
            type: perplexity
            value: 14.2
---

# SLM (GPT-style, ~124M params)

- **Tokenizer:** tiktoken `"gpt2"` (vocab size 50257)
- **Context length:** 512 tokens
- **Architecture:** decoder-only transformer, Pre-LN, SDPA attention, tied input/output embeddings (see the sketch below)
- **Training data:** TinyStories + Wikipedia EN (snapshot 20231101)
- **Best checkpoint (from training logs):** ~step 43.5k, val loss ≈ 2.6522, perplexity ≈ 14.2 (ppl = exp(loss))

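For readers new to the terms above, here is a minimal sketch of a Pre-LN decoder block built on PyTorch's `scaled_dot_product_attention` (SDPA). This is illustrative only, not the code in `slm.py`, and the dimensions (`d_model=768`, `n_head=12`) are assumptions typical of a ~124M GPT-style model, not values read from `config.json`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreLNBlock(nn.Module):
    """One Pre-LN decoder block: LayerNorm *before* each sub-layer,
    with causal self-attention via PyTorch's built-in SDPA."""

    def __init__(self, d_model: int = 768, n_head: int = 12):
        super().__init__()
        self.n_head = n_head
        self.ln1 = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused q/k/v projection
        self.proj = nn.Linear(d_model, d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        # split heads: (B, T, C) -> (B, n_head, T, C // n_head)
        q, k, v = (t.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
                   for t in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).reshape(B, T, C)       # merge heads back
        x = x + self.proj(y)                         # residual around attention
        x = x + self.mlp(self.ln2(x))                # residual around MLP
        return x
```

"Tied embeddings" means the output projection reuses the token-embedding matrix (in PyTorch, typically a single assignment such as `lm_head.weight = wte.weight`), saving roughly `vocab_size * d_model` parameters.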
## Usage

The snippet below downloads the config, weights, and model definition from the Hub, then runs a short generation:

```python
import json
import os
import sys

import tiktoken
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

repo_id = "gopluto-ai/slm-124m-tinystories-wiki"

# fetch config, weights, and the model definition from the Hub
cfg_path = hf_hub_download(repo_id, "config.json")
wt_path  = hf_hub_download(repo_id, "model.safetensors")
slm_path = hf_hub_download(repo_id, "slm.py")

# make the downloaded slm.py importable
sys.path.append(os.path.dirname(slm_path))
from slm import SLM

# build the model from its config and load the weights
with open(cfg_path) as f:
    cfg = json.load(f)
model = SLM(**cfg)
model.load_state_dict(load_file(wt_path))
model.eval()

# tokenize a prompt with the gpt2 tiktoken encoding
enc = tiktoken.get_encoding("gpt2")
prompt = "Once upon a time"
ids = torch.tensor(enc.encode(prompt)).unsqueeze(0)

# generate and decode
with torch.no_grad():
    y = model.generate(ids, max_new_tokens=50, temperature=0.9, top_k=50)
print(enc.decode(y[0].tolist()))
```
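If you want to see what a `generate` call like the one above typically does under the hood, the loop below is a minimal top-k sampling sketch. It assumes the model's forward pass returns logits of shape `(batch, seq, vocab)`; check the actual `SLM.forward` signature in `slm.py` before relying on it.

```python
import torch

@torch.no_grad()
def sample(model, ids, max_new_tokens=50, temperature=0.9, top_k=50, ctx=512):
    """Minimal top-k sampling loop (assumes model(ids) -> (B, T, vocab) logits)."""
    for _ in range(max_new_tokens):
        logits = model(ids[:, -ctx:])                 # crop to the 512-token context
        logits = logits[:, -1, :] / temperature       # last position, temperature-scaled
        v, _ = torch.topk(logits, top_k)
        logits[logits < v[:, [-1]]] = -float("inf")   # mask everything below the k-th logit
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return ids
```

Called as `sample(model, ids)`, this mirrors the `generate` parameters used in the usage snippet above.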