
Wire-Speed Transformer: Real-Time Learning from Live Network Streams

A novel approach to transformer training in which the model learns directly from live network traffic in real time.

🔥 Key Results

Time (s)   Tokens   Loss    Notes
0          0        -       Start
14         10k      50.08   Initial
192        100k     22.32   -55%
302        170k     16.78   -66%
355        190k     15.91   -68%

Loss dropped from 50 → 16 in under 6 minutes using only 32-token micro-batches from raw, uncurated web data.
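
The percentage column is measured relative to the first recorded loss (50.08 at 10k tokens); a quick check of the arithmetic:

start = 50.08
for t, loss in [(192, 22.32), (302, 16.78), (355, 15.91)]:
    print(f"{t}s: {(loss - start) / start:+.0%}")   # -55%, -66%, -68%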

🧠 What Makes This Different

Traditional transformer training requires:

  • Large batch sizes (4096+ tokens)
  • Multiple epochs over curated data
  • Expensive preprocessing pipelines
  • Hours/days of training

Wire-Speed Learning uses:

  • 32-token micro-batches (128x smaller than a 4096-token batch)
  • Single pass (no epochs)
  • Raw web data (no curation)
  • Online SGD (one update every 32 tokens; see the sketch below)
  • Real-time network stream (Rust crawler → Python trainer)
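
A minimal sketch of what one wire-speed update looks like, assuming plain online SGD over a 32-token micro-batch. The real training loop and the 36M-parameter model live in stream_trainer.py; the tiny stand-in model here is purely illustrative.

import torch
import torch.nn as nn

MICRO_BATCH = 32  # tokens per gradient update

class TinyLM(nn.Module):
    """Tiny stand-in next-token model; the real 36M model is in stream_trainer.py."""
    def __init__(self, vocab=128256, d=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.head = nn.Linear(d, vocab, bias=False)

    def forward(self, ids):
        return self.head(self.emb(ids))

model = TinyLM()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def online_step(token_ids):
    """One gradient update from a single 32-token micro-batch."""
    assert len(token_ids) == MICRO_BATCH
    ids = torch.tensor(token_ids).unsqueeze(0)            # shape (1, 32)
    logits = model(ids[:, :-1])                           # predict the next token
    loss = loss_fn(logits.reshape(-1, logits.size(-1)),   # (31, vocab)
                   ids[:, 1:].reshape(-1))                # (31,)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()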

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Rust Crawler   │────▢│  Tokenizer   │────▢│ Python Trainer  β”‚
β”‚  (500 workers)  β”‚     β”‚ (DeepSeek)   β”‚     β”‚  (36M params)   β”‚
β”‚  ~500 pages/s   β”‚     β”‚  128k vocab  β”‚     β”‚  ~500 tok/s     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                                           β”‚
         β–Ό                                           β–Ό
   Live Internet                              Gradient Update
   (no robots.txt)                            (every 32 tokens)
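
The exact wire format between the crawler and the trainer is defined by the feeder and stream_trainer.py. As a rough sketch, assuming the feeder writes whitespace-separated token IDs to stdout, the trainer's side of the pipe could look like this:

import sys

def micro_batches(stream=sys.stdin, size=32):
    """Group incoming token IDs into 32-token micro-batches as they arrive."""
    buf = []
    for line in stream:
        for tok in line.split():
            buf.append(int(tok))
            if len(buf) == size:
                yield buf
                buf = []

# Usage (see Quick Start): wire_feeder | python3 stream_trainer.py
# for batch in micro_batches():
#     loss = online_step(batch)   # one update per 32 tokens, as in the sketch above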

📊 Model Config

CONFIG = {
    "d": 256,        # embedding dim
    "layers": 4,     # transformer layers
    "heads": 8,      # attention heads
    "rank": 32,      # tuneable attention rank
    "vocab": 128256, # DeepSeek V3.2 tokenizer
    "ctx": 512,      # context window
}
# Total: 35,993,088 parameters (36M)
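
To double-check the 36M figure once the model is built (the constructor lives in stream_trainer.py; `model` below stands for whatever it returns), a one-line parameter count is enough:

import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# print(f"{count_params(model):,}")   # expected: 35,993,088 for the config above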

🚀 Quick Start

Requirements

  • CUDA GPU (8GB+ VRAM)
  • Rust toolchain
  • Python 3.8+
  • PyTorch 2.0+

Installation

# Clone
git clone https://huggingface.co/OpenTransformer/wire-speed-transformer
cd wire-speed-transformer

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source ~/.cargo/env

# Build Rust crawler
cd feeder && cargo build --release && cd ..

# Download DeepSeek tokenizer
curl -sL https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/tokenizer.json -o tokenizer.json

# Install Python deps
pip install torch

# Run!
./feeder/target/release/wire_feeder 2>feeder.log | python3 stream_trainer.py
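
Optional sanity check that tokenizer.json downloaded correctly. This assumes the tokenizers package is available (`pip install tokenizers`, which the steps above don't install):

from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")
print(tok.get_vocab_size())                    # ~128k for the DeepSeek tokenizer
print(tok.encode("wire-speed learning").ids)   # token IDs the trainer will see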

πŸ“ Files

  • stream_trainer.py - Python transformer trainer (online learning)
  • feeder/ - Rust high-speed web crawler + tokenizer
  • tokenizer.json - DeepSeek V3.2 tokenizer (download separately)
  • run.sh - Launch script

🔬 Why This Works (Hypotheses)

  1. Small models converge faster - a 36M-parameter model needs far less data than a 7B one
  2. High update frequency - more frequent gradient updates mean more learning signal, despite per-step noise
  3. Web has structure - HTML patterns and common phrases provide a learning signal
  4. DeepSeek tokenizer - high-quality tokenization from a SOTA model

⚠️ Limitations

  • No evaluation yet (just training loss)
  • Model is tiny (36M) - won't match GPT-4
  • Catastrophic forgetting not measured
  • Raw web data quality unknown
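
A cheap way to start on the first and third points would be to freeze a buffer of early stream tokens and re-measure loss on it periodically. Nothing like this exists in the repo yet, so this is only a suggestion:

import torch

@torch.no_grad()
def heldout_loss(model, loss_fn, heldout_ids):
    """Loss on a frozen token buffer; a rising value hints at forgetting."""
    ids = torch.tensor(heldout_ids).unsqueeze(0)
    logits = model(ids[:, :-1])
    return loss_fn(logits.reshape(-1, logits.size(-1)),
                   ids[:, 1:].reshape(-1)).item()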

πŸ“ Citation

@misc{wirespeed2026,
  title={Wire-Speed Transformer: Real-Time Learning from Live Network Streams},
  author={OpenTransformers},
  year={2026},
  url={https://huggingface.co/OpenTransformer/wire-speed-transformer}
}

πŸ™ Acknowledgments

  • DeepSeek for the tokenizer
  • Anthropic's Claude for pair programming
  • vast.ai for GPU compute

📜 License

MIT


Built by OpenTransformers - Pushing the boundaries of what's possible with transformers.
