This model is released under the Reactive AI Model & Architecture License (RAML) v1.0 and may be used only for research and education purposes. The Reactive Transformer architecture (patent pending, #P.453260) is free for non-commercial usage; for commercial usage, please contact Reactive AI at [email protected]. The repository becomes available immediately after accepting the license terms.


RxT-Beta Decoder Base (2.85B A190M)

Training & docs in progress

Pre-training progress: ~40B / 250B tokens

RxT-Beta is the world's first real-scale stateful Reactive Language Model (RxLM), built to confirm the new Reactive Transformer (RxT) scaling laws and to address the biggest problems of stateless LLMs. RxT models are natively conversational (and agentic): instead of reprocessing the entire conversation history (chat template) like regular LLMs, they process only single interactions in real time and move the context into a dedicated embedding-based memory that is updated asynchronously between interactions (a minimal sketch of this loop follows the feature list below). This introduces unique features such as:

  • infinite conversation & global context through Mixture-of-Memory (MoM)
  • live continual learning from interactions in real-time
  • true real-time processing with near-zero latency
  • linear conversation cost scaling
  • fixed computational cost and memory usage for each interaction
  • response quality that increases over subsequent dialogue steps, without "long-term hallucinations"
  • natively encoded memory, impossible to read without the model
  • extreme pre-training efficiency
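
Below is a minimal, hypothetical sketch of the interaction loop described above. The component and method names (`generate`, `encode`, `update`) are illustrative assumptions, not the actual RxLM API; the point is only the split between the synchronous generation phase and the asynchronous memory update.

```python
# Hypothetical sketch of the RxT interaction loop described above.
# Component and method names are illustrative assumptions, not the RxLM API.

class ReactiveChatSketch:
    def __init__(self, decoder, encoder, memory_attention):
        self.decoder = decoder                      # generates the answer from a single query
        self.encoder = encoder                      # encodes the finished interaction
        self.memory_attention = memory_attention    # merges it into the memory state
        self.memory = None                          # fixed-size, embedding-based memory

    def interact(self, query: str) -> str:
        # Synchronous phase: the answer is generated from the current query only,
        # with global context injected through memory cross-attention.
        answer = self.decoder.generate(query, memory=self.memory)

        # Asynchronous phase (runs after the answer has already been returned):
        # the whole interaction is encoded and written into memory, so the next
        # turn never reprocesses the conversation history.
        encoded = self.encoder.encode(query, answer)
        self.memory = self.memory_attention.update(self.memory, encoded)
        return answer
```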

In the first small-scale experiments, RxT-Alpha models achieved about 50% higher accuracy and almost 2x lower perplexity than a stateless decoder-only baseline of the same size, trained on the same simple synthetic dataset (the decoder-only model was additionally pre-trained on 5x more tokens). These results were then confirmed on a small 10B-token subset of real-world data with ~0.3B models (RxT-Beta Micro), where the RxT advantage was even larger. These promising results, along with the unique features listed above, suggest that the Reactive Transformer is a revolutionary generational leap and a crucial milestone on the path to Artificial General Intelligence (AGI) - provided we confirm them at scale, which is exactly what we plan to do with RxT-Beta.

The goal is to compete with ~1-3B-parameter dense stateless LLMs pre-trained on trillions of tokens, using a model with only 190M active parameters and about 250B pre-training tokens, and to significantly outperform them on long multi-turn conversations.
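
The linear-cost claim from the feature list can be illustrated with simple arithmetic: a stateless LLM reprocesses the whole growing history on every turn, while RxT processes a fixed-size interaction each time. The turn length and turn count below are arbitrary assumptions, chosen only to make the comparison concrete.

```python
# Back-of-envelope illustration of linear vs. quadratic conversation cost.
# The 200-token turn length and 100 turns are arbitrary assumptions.
turn_tokens, turns = 200, 100

# Stateless LLM: every turn reprocesses the entire accumulated history.
stateless_total = sum(t * turn_tokens for t in range(1, turns + 1))

# RxT: every turn processes only the current interaction, at fixed cost;
# the memory update runs asynchronously at roughly constant cost per turn.
rxt_total = turns * turn_tokens

print(f"stateless: {stateless_total:,} tokens processed")   # 1,010,000
print(f"RxT:       {rxt_total:,} tokens processed")         # 20,000
```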

Base models

Reactive Transformer models require a new, dedicated training pipeline to handle their asynchronous memory and reversed decoder-encoder order. Base models are the result of the first supervised stage - Joint LM Pre-Training with "cheated context" teacher forcing (more info in the Training Process section).
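
For illustration only, here is our reading of "cheated context" teacher forcing: during joint pre-training, the decoder's memory cross-attention is fed the encoder's representation of the current text instead of a real accumulated memory state. The sketch below encodes that assumption; none of the names or details come from the actual RxLM training code, and the encoder's own objective is omitted.

```python
# Hypothetical sketch of one "cheated context" pre-training step, based only
# on the description above: the decoder's memory cross-attention receives the
# encoder's representation of the same text, standing in for the asynchronous
# memory the full model uses later. Every name here is an assumption.
import torch

def cheated_context_step(decoder, encoder, input_ids, optimizer):
    # Encode the current interaction and use it as a stand-in memory state.
    cheated_memory = encoder(input_ids)

    # Standard next-token prediction, conditioned on the "cheated" memory
    # through the decoder's memory cross-attention layers.
    logits = decoder(input_ids[:, :-1], memory=cheated_memory)
    loss = torch.nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```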

The base decoder (this model) is not a typical generative model. It requires further training and has to be connected with the encoder and memory attention network, so it is only the starting point for the next stages. It is pre-trained for general knowledge (with a focus on reasoning) on textbook-quality datasets and can be further fine-tuned for custom use cases (under the terms of the RAML v1.0 license).
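
Since the base decoder cannot be used for standalone generation, a reasonable first step is simply downloading the checkpoint and inspecting its tensors. The sketch below uses the standard `huggingface_hub` and `safetensors` APIs; it assumes you have accepted the RAML v1.0 terms on the Hub and are logged in, and that the weights are stored as BF16 safetensors (as listed on this page). Connecting the decoder to the encoder and memory attention network goes through the RxLM framework, whose API is not shown here.

```python
# Sketch: download the gated checkpoint and list its tensor names and shapes.
# Requires accepting the RAML v1.0 terms for this repo and `huggingface-cli login`.
import glob
import os

from huggingface_hub import snapshot_download
from safetensors import safe_open

local_dir = snapshot_download("ReactiveAI/RxT-Beta-Decoder-Base")

for path in sorted(glob.glob(os.path.join(local_dir, "*.safetensors"))):
    with safe_open(path, framework="pt") as f:
        for name in f.keys():
            # get_slice reads only metadata, so the 2.85B params are not loaded
            print(name, tuple(f.get_slice(name).get_shape()))
```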

Decoder architecture

  • layers: 25 (21 stateful MoE + 3 stateless MoE + 1 stateless dense)
  • dim: 512
  • self-attention: Gated Sparse Query Attention (SQA) 8/16 query heads & 4/16 key/value heads
  • memory cross-attention: Sparse Query Attention (SQA) 8/16 query heads & 4/16 key/value heads
  • feed forward: Sparse Mixture-of-Experts (MoE) with gated shared experts
    • routed experts: 384
    • active experts: 10
    • routed expert dim: 192
    • shared experts: 2 with softmax gating
    • shared expert dim: 384
    • activation: SwiGLU
  • dense layer: 1536 dim with SwiGLU activation
  • vocab: 65k (English + Polish)
  • params: 2.85B with 190M activated per token
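
The quoted totals can be roughly reproduced from the table above. The sketch below is a back-of-envelope estimate under assumptions that are not stated in this card: SwiGLU uses three weight matrices, head dim is 512 / 16 = 32, memory cross-attention exists only in the 21 stateful layers, the embedding and LM head are untied, and biases, norms and expert gating beyond the router are ignored. It lands close to the quoted 2.85B total and 190M active parameters.

```python
# Back-of-envelope parameter count for the architecture above.
# Assumptions (not from the card): SwiGLU = 3 weight matrices, head dim = 32,
# cross-attention only in the 21 stateful layers, untied embedding/LM head,
# biases, norms and expert gating (beyond the router) ignored.
dim, head_dim, vocab = 512, 32, 65_536
moe_layers, stateful_layers, total_layers = 24, 21, 25

def swiglu(d_in, d_hidden):
    return 3 * d_in * d_hidden                      # gate, up and down projections

def attention(q_heads, kv_heads):
    return (dim * q_heads * head_dim                # Q projection
            + 2 * dim * kv_heads * head_dim         # K and V projections
            + q_heads * head_dim * dim)             # output projection

routed_expert = swiglu(dim, 192)                    # 294,912 params per routed expert
shared_experts = 2 * swiglu(dim, 384)
router = dim * 384

def model_params(experts_per_layer):
    return (moe_layers * (experts_per_layer * routed_expert + shared_experts + router)
            + swiglu(dim, 1536)                     # the single dense layer
            + total_layers * attention(8, 4)        # self-attention in every layer
            + stateful_layers * attention(8, 4)     # memory cross-attention
            + 2 * vocab * dim)                      # embedding + LM head

print(f"total  ≈ {model_params(384) / 1e9:.2f}B")   # ≈ 2.84B vs. quoted 2.85B
print(f"active ≈ {model_params(10) / 1e6:.0f}M")    # ≈ 191M vs. quoted 190M
```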
