You must agree to use this model only for research or educational purposes under the Reactive AI Model & Architecture License (RAML) v1.0.

Accept the Reactive AI Model & Architecture License (RAML) v1.0 terms to access the repository and use the model. Reactive Transformer (patent pending, #P.453260) is free for non-commercial use. For commercial use, please contact Reactive AI at [email protected]

RxT-Beta Decoder Base (2.85B A190M)

Training & docs in progress

Progress ~40B/250B tokens

RxT-Beta is the world's first real-scale stateful Reactive Language Model (RxLM), built to confirm the new Reactive Transformer (RxT) scaling laws and to solve the biggest problems of stateless LLMs. RxT models are natively conversational (and agentic): instead of reprocessing the entire conversation history (chat template) as stateless LLMs do, an RxT model processes only a single interaction in real time and moves the context to a dedicated embedding-based memory that is updated asynchronously between interactions. It introduces unique features such as:

infinite conversation & global context through Mixture-of-Memory (MoM)
live continual learning from interactions in real-time
true real-time processing with near-zero latency
linear conversation cost scaling
fixed computational cost and memory usage for each interaction
increasing quality of responses with subsequent steps of dialogue, without "long-term hallucinations"
natively encoded memory, impossible to read without the model
extreme pre-training efficiency
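
The "linear conversation cost scaling" item can be illustrated with a simple token count. The sketch below is not part of the RxT codebase: the function names are hypothetical, every turn is assumed to have the same length, and only the tokens fed to the model are counted (the asynchronous memory update, which the list above describes as a fixed per-interaction cost, is left out).

```python
def stateless_tokens_processed(turns: int, tokens_per_turn: int) -> int:
    """Tokens processed when every turn re-reads the full history.

    Turn k processes k * tokens_per_turn tokens (all earlier turns plus
    the new one), so the total grows quadratically with the turn count.
    """
    return tokens_per_turn * turns * (turns + 1) // 2


def reactive_tokens_processed(turns: int, tokens_per_turn: int) -> int:
    """Tokens processed when each turn sees only the current interaction.

    Earlier context is assumed to live in a fixed-size memory, so the
    total grows linearly with the turn count.
    """
    return tokens_per_turn * turns


if __name__ == "__main__":
    # Hypothetical conversation lengths, 1,000 tokens per turn.
    for turns in (1, 10, 100):
        stateless = stateless_tokens_processed(turns, 1_000)
        reactive = reactive_tokens_processed(turns, 1_000)
        print(f"{turns:>3} turns: stateless={stateless:,} reactive={reactive:,}")
```

Under these assumptions the gap between the two totals widens with every additional turn, which is the behavior the feature list refers to.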

In the first small-scale experiments, RxT-Alpha models achieved about 50% higher accuracy and almost 2x lower perplexity than a same-size stateless decoder-only baseline trained on the same simple synthetic dataset (the decoder-only model was additionally pre-trained on 5x more tokens). These results were then confirmed on a small 10B-token subset of real-world data with ~0.3B models (RxT-Beta Micro), where the RxT advantage was even larger. These promising results, along with all the unique features, demonstrate that the Reactive Transformer is a revolutionary generational leap and a crucial milestone on the path to Artificial General Intelligence (AGI). That, of course, holds only if we confirm it at scale, which is what we plan to do with RxT-Beta.

The goal is to compete with ~1-3B-parameter dense stateless LLMs pre-trained on trillions of tokens, using a model with only 190M active parameters and about 250B pre-training tokens, and to significantly outperform them on long multi-turn conversations.

Base models

Reactive Transformer models require a new dedicated training pipeline to handle their asynchronous memory and reversed decoder-encoder order. Base models are the result of the first supervised stage: Joint LM Pre-Training with "cheated context" teacher forcing (more info in the Training Process section).

The base decoder (this model) is not a typical generative model. It requires further training and should be connected with the encoder and the memory attention network, so it is only the starting point for the next stages. It is pre-trained for general knowledge (with a focus on reasoning) on textbook-quality datasets and can be further fine-tuned for custom use cases (under the terms of the RAML v1.0 license).

Decoder architecture

layers: 25 (21 stateful MoE + 3 stateless MoE + 1 stateless dense)
dim: 512
self-attention: Gated Sparse Query Attention (SQA) 8/16 query heads & 4/16 key/value heads
memory cross-attention: Sparse Query Attention (SQA) 8/16 query heads & 4/16 key/value heads
feed forward: Sparse Mixture-of-Experts (MoE) with gated shared experts
- routed experts: 384
- active experts: 10
- routed expert dim: 192
- shared experts: 2 with softmax gating
- shared expert dim: 384
- activation: SwiGLU
dense layer: 1536 dim with SwiGLU activation
vocab: 65k (English + Polish)
params: 2.85B with 190M activated per token
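
As a rough sanity check of the figures above, the feed-forward parameters can be counted directly from the listed dimensions. This is a back-of-the-envelope sketch, not the model's actual configuration class: `DecoderFFNConfig` and the helper names are hypothetical, each SwiGLU block is assumed to have three bias-free projections (gate, up, down), and attention, router, gating, normalization, and embedding parameters are deliberately not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderFFNConfig:
    """Feed-forward dimensions copied from the architecture list above."""
    dim: int = 512
    moe_layers: int = 24            # 21 stateful + 3 stateless MoE layers
    dense_layers: int = 1           # 1 stateless dense layer
    routed_experts: int = 384
    active_experts: int = 10
    routed_expert_dim: int = 192
    shared_experts: int = 2
    shared_expert_dim: int = 384
    dense_dim: int = 1536


def swiglu_params(dim: int, hidden: int) -> int:
    # Assumption: gate, up, and down projections without biases.
    return 3 * dim * hidden


def ffn_param_counts(cfg: DecoderFFNConfig) -> tuple[int, int]:
    """Return (total, active-per-token) feed-forward parameter counts."""
    routed = swiglu_params(cfg.dim, cfg.routed_expert_dim)
    shared = cfg.shared_experts * swiglu_params(cfg.dim, cfg.shared_expert_dim)
    dense = cfg.dense_layers * swiglu_params(cfg.dim, cfg.dense_dim)
    total = cfg.moe_layers * (cfg.routed_experts * routed + shared) + dense
    # Only the selected routed experts run per token; shared experts always run.
    active = cfg.moe_layers * (cfg.active_experts * routed + shared) + dense
    return total, active


if __name__ == "__main__":
    total, active = ffn_param_counts(DecoderFFNConfig())
    print(f"FFN total:  {total / 1e9:.2f}B")
    print(f"FFN active: {active / 1e6:.0f}M")
```

Under these assumptions the feed-forward blocks account for roughly 2.75B of the 2.85B total parameters and roughly 101M of the 190M active parameters; the remainder would come from the components this estimate leaves out.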

Safetensors
- model size: 3B params
- tensor type: BF16
