nanochat-d20

An 896-million-parameter GPT-style language model trained from scratch on a single NVIDIA H200 GPU at the Anuradha and Vikas Sinha Department of Data Science, housed in the College of Information at the University of North Texas.

This model is based on karpathy/nanochat and was built as an educational demonstration of the transformer architecture described in Attention Is All You Need (Vaswani et al., 2017).
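
The core operation of that architecture is scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, applied with a causal mask in GPT-style models. The following is a minimal pure-Python sketch of the formula from the paper for a single head. It is not the nanochat implementation, and the function names `softmax` and `causal_attention` are illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of T vectors (lists of floats).
    Position t may only attend to positions 0..t.
    """
    d_k = len(k[0])
    out = []
    for t, q_t in enumerate(q):
        # Scaled dot products against the keys at or before position t.
        scores = [
            sum(a * b for a, b in zip(q_t, k[s])) / math.sqrt(d_k)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted average of the visible value vectors.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


# Toy example with T = 2 positions and d_k = 2.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = causal_attention(q, k, v)
```

In the toy example the first position can only see itself, so its output equals `v[0]`. The second position mixes both value vectors, weighted toward the key it matches.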

Model Details

| Property | Value |
|---|---|
| Parameters | 896 million |
| Layers (depth) | 20 |
| Attention heads | 10 |
| Embedding size | 1280 |
| Vocabulary size | 32,768 |
| Context length | 2048 tokens |
| Training tokens | 5.2 billion |
| Training time | ~8.5 hours |
| Hardware | 1x NVIDIA H200 (143GB) |
| Tokenizer | BPE (rustbpe) |
| Training data | ClimbMix |
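
A few derived quantities follow directly from the values listed above. The short sketch below works through the arithmetic. The variable names are illustrative, and the throughput figure is only approximate because the training time is itself given as roughly 8.5 hours.

```python
# Values copied from the Model Details listing.
n_params = 896e6          # parameters
n_heads = 10              # attention heads
embed_size = 1280         # embedding size
context_length = 2048     # tokens per sequence
train_tokens = 5.2e9      # training tokens
train_hours = 8.5         # approximate wall-clock training time

# Per-head dimension: the embedding is split evenly across heads.
head_dim = embed_size // n_heads                        # 128

# Training tokens seen per model parameter.
tokens_per_param = train_tokens / n_params              # roughly 5.8

# Average training throughput on the single GPU.
tokens_per_second = train_tokens / (train_hours * 3600) # roughly 170,000

# Number of full-context sequences the token budget corresponds to.
full_context_sequences = train_tokens / context_length  # roughly 2.5 million

print(f"head_dim = {head_dim}")
print(f"tokens per parameter = {tokens_per_param:.1f}")
print(f"tokens per second = {tokens_per_second:,.0f}")
print(f"full-context sequences = {full_context_sequences:,.0f}")
```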

Benchmark Results

| Benchmark | Score |
|---|---|
| HellaSwag (10-shot) | 0.522 |
| Winograd (0-shot) | 0.626 |
| Winogrande (0-shot) | 0.546 |
| ARC-Easy (10-shot) | 0.328 |
| PIQA (10-shot) | 0.568 |
| CORE | 0.2462 |

Purpose

This model was trained as part of a workshop demonstrating the full ML pipeline:

  1. Build: construct a transformer from scratch based on the original paper
  2. Train: pretrain on billions of tokens of real text data
  3. Share: publish weights to HuggingFace
  4. Quantize: reduce model size with bitsandbytes
  5. Fine-tune: adapt the model for specific tasks with LoRA
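
The ideas behind steps 4 and 5 can be illustrated without any library. The sketch below is a simplified illustration under stated assumptions, not the workshop's code: `quantize_absmax_int8` shows plain per-tensor absmax quantization (bitsandbytes uses more elaborate schemes), and `lora_param_count` counts the trainable parameters a LoRA adapter adds to one weight matrix. The rank of 8 and the 1280 x 1280 projection shape are hypothetical choices made for the example.

```python
def quantize_absmax_int8(weights):
    """Map floats to integers in [-127, 127] using a single absmax scale."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 127 if absmax > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate float weights from the int8 codes."""
    return [q * scale for q in q_weights]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters in a LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Quantization round trip on a toy weight vector.
weights = [0.5, -1.0, 0.25, 0.0]
q_weights, scale = quantize_absmax_int8(weights)
restored = dequantize(q_weights, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# LoRA on one hypothetical 1280 x 1280 projection with rank 8.
full_params = 1280 * 1280                      # 1,638,400
adapter_params = lora_param_count(1280, 1280, 8)  # 20,480
trainable_fraction = adapter_params / full_params  # 0.0125
```

Storing each weight as one int8 code plus a shared scale takes about a quarter of the space of 32-bit floats, at the cost of a rounding error of at most half the scale per weight. The LoRA adapter in this example trains 1.25% as many parameters as the full matrix it modifies.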

Limitations

This is an educational base model, not a production system. It has no instruction tuning or safety training. It will make factual errors and can produce repetitive text, particularly on tasks requiring reasoning or arithmetic.

Citation

@misc{whitworth2026nanochat,
  author = {Clifford K. Whitworth},
  title = {nanochat-d20: A GPT trained from scratch on UNT H200s},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/cliffo4567/nanochat-d20}
}