
Wire-Speed Transformer: Real-Time Learning from Live Network Streams

A novel approach to transformer training in which the model learns directly from live network traffic in real time.

🔥 Key Results

Time (s)   Tokens   Loss    Notes
0          0        -       Start
14         10k      50.08   Initial
192        100k     22.32   -55%
302        170k     16.78   -66%
355        190k     15.91   -68%

Loss dropped from 50 → 16 in under 6 minutes using only 32-token micro-batches from raw, uncurated web data.
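
The percentage column is measured relative to the first recorded loss (50.08 at 10k tokens); a quick check of the arithmetic:

start = 50.08
for t, loss in [(192, 22.32), (302, 16.78), (355, 15.91)]:
    print(f"{t}s: {(loss - start) / start:+.0%}")   # -55%, -66%, -68%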

🧠 What Makes This Different

Traditional transformer training requires:

  • Large batch sizes (4096+ tokens)
  • Multiple epochs over curated data
  • Expensive preprocessing pipelines
  • Hours/days of training

Wire-Speed Learning uses:

  • 32-token micro-batches (128x smaller than a 4096-token batch)
  • Single pass (no epochs)
  • Raw web data (no curation)
  • Online SGD (one update every 32 tokens; see the sketch below)
  • Real-time network stream (Rust crawler → Python trainer)
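
A minimal sketch of what one wire-speed update looks like, assuming plain online SGD over a 32-token micro-batch. The real training loop and the 36M-parameter model live in stream_trainer.py; the tiny stand-in model here is purely illustrative.

import torch
import torch.nn as nn

MICRO_BATCH = 32  # tokens per gradient update

class TinyLM(nn.Module):
    """Tiny stand-in next-token model; the real 36M model is in stream_trainer.py."""
    def __init__(self, vocab=128256, d=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.head = nn.Linear(d, vocab, bias=False)

    def forward(self, ids):
        return self.head(self.emb(ids))

model = TinyLM()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def online_step(token_ids):
    """One gradient update from a single 32-token micro-batch."""
    assert len(token_ids) == MICRO_BATCH
    ids = torch.tensor(token_ids).unsqueeze(0)            # shape (1, 32)
    logits = model(ids[:, :-1])                           # predict the next token
    loss = loss_fn(logits.reshape(-1, logits.size(-1)),   # (31, vocab)
                   ids[:, 1:].reshape(-1))                # (31,)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()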

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Rust Crawler   │────▢│  Tokenizer   │────▢│ Python Trainer  β”‚
β”‚  (500 workers)  β”‚     β”‚ (DeepSeek)   β”‚     β”‚  (36M params)   β”‚
β”‚  ~500 pages/s   β”‚     β”‚  128k vocab  β”‚     β”‚  ~500 tok/s     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         β”‚                                           β”‚
         β–Ό                                           β–Ό
   Live Internet                              Gradient Update
   (no robots.txt)                            (every 32 tokens)
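
The exact wire format between the crawler and the trainer is defined by the feeder and stream_trainer.py. As a rough sketch, assuming the feeder writes whitespace-separated token IDs to stdout, the trainer's side of the pipe could look like this:

import sys

def micro_batches(stream=sys.stdin, size=32):
    """Group incoming token IDs into 32-token micro-batches as they arrive."""
    buf = []
    for line in stream:
        for tok in line.split():
            buf.append(int(tok))
            if len(buf) == size:
                yield buf
                buf = []

# Usage (see Quick Start): wire_feeder | python3 stream_trainer.py
# for batch in micro_batches():
#     loss = online_step(batch)   # one update per 32 tokens, as in the sketch above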

📊 Model Config

CONFIG = {
    "d": 256,        # embedding dim
    "layers": 4,     # transformer layers
    "heads": 8,      # attention heads
    "rank": 32,      # tuneable attention rank
    "vocab": 128256, # DeepSeek V3.2 tokenizer
    "ctx": 512,      # context window
}
# Total: 35,993,088 parameters (36M)
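
To double-check the 36M figure once the model is built (the constructor lives in stream_trainer.py; `model` below stands for whatever it returns), a one-line parameter count is enough:

import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# print(f"{count_params(model):,}")   # expected: 35,993,088 for the config above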

🚀 Quick Start

Requirements

  • CUDA GPU (8GB+ VRAM)
  • Rust toolchain
  • Python 3.8+
  • PyTorch 2.0+

Installation

# Clone
git clone https://huggingface.co/OpenTransformer/wire-speed-transformer
cd wire-speed-transformer

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source ~/.cargo/env

# Build Rust crawler
cd feeder && cargo build --release && cd ..

# Download DeepSeek tokenizer
curl -sL https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/tokenizer.json -o tokenizer.json

# Install Python deps
pip install torch

# Run!
./feeder/target/release/wire_feeder 2>feeder.log | python3 stream_trainer.py
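
Optional sanity check that tokenizer.json downloaded correctly. This assumes the tokenizers package is available (`pip install tokenizers`, which the steps above don't install):

from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")
print(tok.get_vocab_size())                    # ~128k for the DeepSeek tokenizer
print(tok.encode("wire-speed learning").ids)   # token IDs the trainer will see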

πŸ“ Files

  • stream_trainer.py - Python transformer trainer (online learning)
  • feeder/ - Rust high-speed web crawler + tokenizer
  • tokenizer.json - DeepSeek V3.2 tokenizer (download separately)
  • run.sh - Launch script

🔬 Why This Works (Hypotheses)

  1. Small models converge faster - a 36M-parameter model needs far less data than a 7B one
  2. High update frequency - more frequent gradient updates mean more learning signal, despite per-step noise
  3. Web has structure - HTML patterns and common phrases provide a learning signal
  4. DeepSeek tokenizer - high-quality tokenization from a SOTA model

⚠️ Limitations

  • No evaluation yet (just training loss)
  • Model is tiny (36M) - won't match GPT-4
  • Catastrophic forgetting not measured
  • Raw web data quality unknown
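
A cheap way to start on the first and third points would be to freeze a buffer of early stream tokens and re-measure loss on it periodically. Nothing like this exists in the repo yet, so this is only a suggestion:

import torch

@torch.no_grad()
def heldout_loss(model, loss_fn, heldout_ids):
    """Loss on a frozen token buffer; a rising value hints at forgetting."""
    ids = torch.tensor(heldout_ids).unsqueeze(0)
    logits = model(ids[:, :-1])
    return loss_fn(logits.reshape(-1, logits.size(-1)),
                   ids[:, 1:].reshape(-1)).item()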

πŸ“ Citation

@misc{wirespeed2026,
  title={Wire-Speed Transformer: Real-Time Learning from Live Network Streams},
  author={OpenTransformers},
  year={2026},
  url={https://huggingface.co/OpenTransformer/wire-speed-transformer}
}

πŸ™ Acknowledgments

  • DeepSeek for the tokenizer
  • Anthropic's Claude for pair programming
  • vast.ai for GPU compute

📜 License

MIT


Built by OpenTransformers - Pushing the boundaries of what's possible with transformers.
