nanochat-d20

An 896-million-parameter GPT-style language model trained from scratch on a single NVIDIA H200 GPU at the Anuradha and Vikas Sinha Department of Data Science, housed in the College of Information at the University of North Texas.

This model is based on karpathy/nanochat and was built as an educational demonstration of the transformer architecture described in "Attention Is All You Need" (Vaswani et al., 2017).

Model Details

Property	Value
Parameters	896 million
Layers (depth)	20
Attention heads	10
Embedding size	1280
Vocabulary size	32,768
Context length	2048 tokens
Training tokens	5.2 billion
Training time	~8.5 hours
Hardware	1x NVIDIA H200 (143GB)
Tokenizer	BPE (rustbpe)
Training data	ClimbMix
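
A few derived quantities follow from the table above by simple arithmetic. The sketch below computes the per-head dimension, the number of training tokens per parameter, and the average training throughput. It uses the rounded figures as published, so the results are approximate, and the variable names are illustrative rather than taken from the nanochat code.

```python
# Figures copied from the Model Details table (rounded as published).
n_params = 896e6       # total parameters
n_heads = 10           # attention heads
n_embd = 1280          # embedding size
train_tokens = 5.2e9   # training tokens
train_hours = 8.5      # approximate wall-clock training time

# Each head works on an equal slice of the embedding dimension.
head_dim = n_embd // n_heads

# How many training tokens the model saw per parameter.
tokens_per_param = train_tokens / n_params

# Average throughput over the whole run, in tokens per second.
tokens_per_second = train_tokens / (train_hours * 3600)

print(f"head dim: {head_dim}")
print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"avg throughput: {tokens_per_second:,.0f} tokens/s")
```

With these rounded inputs, the head dimension is 128, the model saw about 5.8 training tokens per parameter, and the run averaged roughly 170,000 tokens per second.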

Benchmark Results

Benchmark	Score
HellaSwag (10-shot)	0.522
Winograd (0-shot)	0.626
Winogrande (0-shot)	0.546
ARC-Easy (10-shot)	0.328
PIQA (10-shot)	0.568
CORE	0.2462

Purpose

This model was trained as part of a workshop demonstrating the full ML pipeline:

Build — construct a transformer from scratch based on the original paper
Train — pretrain on billions of tokens of real text data
Share — publish weights to HuggingFace
Quantize — reduce model size with bitsandbytes
Fine-tune — adapt the model for specific tasks with LoRA
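
To give a rough sense of what the Quantize and Fine-tune steps buy, the sketch below estimates weight storage at several precisions and the number of trainable parameters a LoRA adapter adds to one square projection matrix. This is back-of-the-envelope arithmetic, not the workshop's code: the rank of 8 is a hypothetical choice, the 1280 x 1280 matrix shape is an assumption based on the embedding size, and real quantized checkpoints carry extra overhead (scaling factors, layers left unquantized) that is not counted here.

```python
N_PARAMS = 896_000_000  # from the model card
D_MODEL = 1280          # embedding size from the model card
LORA_RANK = 8           # hypothetical rank, chosen only for illustration


def weight_gb(n_params: int, bits_per_param: int) -> float:
    """Storage for the raw weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


# Weight storage at common precisions.
for name, bits in [("fp32", 32), ("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{name:>5}: {weight_gb(N_PARAMS, bits):.3f} GB")

# Trainable parameters: full matrix versus its LoRA adapter.
full = D_MODEL * D_MODEL
adapter = lora_params(D_MODEL, D_MODEL, LORA_RANK)
print(f"full matrix: {full:,} params, LoRA adapter: {adapter:,} params "
      f"({adapter / full:.2%} of the matrix)")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts raw storage by a factor of four, and a rank-8 adapter trains 20,480 parameters in place of the 1,638,400 in the full matrix.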

Limitations

This is an educational base model, not a production system. It has had no instruction tuning or safety training. It will make factual errors, and it tends to produce repetitive text, particularly on tasks that require reasoning or arithmetic.

Citation

@misc{whitworth2026nanochat,
  author = {Clifford K. Whitworth},
  title = {nanochat-d20: A GPT trained from scratch on UNT H200s},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/cliffo4567/nanochat-d20}
}
