---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- looped-language-model
- reasoning
- recurrent-depth
---
# Ouro-2.6B

[📚 Paper (Hugging Face)](https://huggingface.co/papers/2510.25741) - [🌐 Project Page](https://ouro-llm.github.io/)
## Model Description
**⚠️ IMPORTANT: This model is intended for research purposes only. It is provided as-is without warranties for production use.**

**Ouro-2.6B** is a 2.6 billion parameter Looped Language Model (LoopLM) that achieves exceptional parameter efficiency through iterative shared-weight computation.

## Key Features
- **Exceptional Parameter Efficiency**: Matches 3-4B standard transformer performance with only 2.6B parameters
- **Iterative Latent Reasoning**: Performs reasoning through recurrent computation in latent space
- **Adaptive Computation**: Supports early exit mechanisms for dynamic compute allocation
## Configuration
### Recurrent Steps and Adaptive Exit
The model's computational behavior can be configured through the `config.json` file:
```json
{
  "total_ut_steps": 4,
  "early_exit_threshold": 1.0
}
```
- **`total_ut_steps`**: Controls the number of recurrent steps (default: 4). You can adjust this value to trade off between performance and computation time.
- **`early_exit_threshold`**: Controls the adaptive exit mechanism (default: 1.0). Lower values encourage earlier exit, while 1.0 means always use all steps.

**Example: Modify recurrent steps**
```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("ByteDance/Ouro-2.6B")
config.total_ut_steps = 3  # Use 3 recurrent steps instead of 4

model = AutoModelForCausalLM.from_pretrained(
    "ByteDance/Ouro-2.6B",
    config=config,
    device_map="auto"
)
```
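**Example: Lower the adaptive exit threshold** (the value `0.8` below is purely illustrative; the exact exit behavior depends on the model's custom modeling code)
```python
from transformers import AutoConfig, AutoModelForCausalLM

# Lower the threshold so the model may exit before completing all steps.
config = AutoConfig.from_pretrained("ByteDance/Ouro-2.6B")
config.early_exit_threshold = 0.8  # < 1.0 encourages earlier exit

model = AutoModelForCausalLM.from_pretrained(
    "ByteDance/Ouro-2.6B",
    config=config,
    device_map="auto"
)
```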
> **Note**: vLLM does not currently support the adaptive exit feature due to its inference optimization characteristics. When using vLLM, the model will always execute the full number of `total_ut_steps`.
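For completeness, a minimal vLLM invocation might look like the sketch below. This assumes your vLLM version can load the model's custom architecture via `trust_remote_code`; as noted above, all `total_ut_steps` are always executed:

```python
from vllm import LLM, SamplingParams

# vLLM runs the full number of recurrent steps (no adaptive exit).
llm = LLM(model="ByteDance/Ouro-2.6B", trust_remote_code=True)
params = SamplingParams(max_tokens=100, temperature=0.7)
outputs = llm.generate(["The future of AI is"], params)
print(outputs[0].outputs[0].text)
```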
## Model Architecture
Ouro-2.6B is based on the decoder-only Transformer architecture with parameter sharing across recurrent steps:
| Configuration | Value |
|:---|:---|
| **Parameters** | 2.6B |
| **Layers** | 24 |
| **Recurrent Steps** | 4 |
| **Hidden Size** | 2048 |
| **Attention** | Multi-Head Attention (MHA) |
| **FFN Activation** | SwiGLU |
| **Position Embedding** | RoPE |
| **Vocabulary Size** | 49,152 |
| **Context Length** | 4K (training), extendable to 64K |
| **Normalization** | Sandwich RMSNorm |
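Most of these values can be checked on the loaded config. The sketch below assumes the standard transformers attribute names (`hidden_size`, `num_hidden_layers`, `vocab_size`); `total_ut_steps` comes from the configuration section above:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("ByteDance/Ouro-2.6B")
print(config.hidden_size)        # expected: 2048
print(config.num_hidden_layers)  # expected: 24
print(config.vocab_size)         # expected: 49152
print(config.total_ut_steps)     # expected: 4
```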
## Training Details
- **Training Tokens**: 7.7T tokens
- **Training Pipeline**:
  - Stage 1: Pre-training (6T tokens)
  - Stage 2: CT Annealing (2.6T tokens)
  - Stage 3: Long Context Training (20B tokens)
  - Stage 4: Mid-training (300B tokens)
- **Data Composition**: Web data, code, mathematics, long-context documents
- **Optimizer**: AdamW (β₁=0.9, β₂=0.95)
- **Learning Rate Scheduler**: Warmup-Stable-Decay (WSD); see the sketch below
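For reference, a Warmup-Stable-Decay schedule ramps the learning rate up linearly, holds it flat, then decays it toward a minimum. The sketch below is a generic illustration of the schedule shape, not the actual training code:

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Warmup-Stable-Decay: linear warmup, constant plateau, linear decay."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps           # warmup phase
    if step < warmup_steps + stable_steps:
        return max_lr                                 # stable phase
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return max_lr + (min_lr - max_lr) * progress      # decay phase
```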
## Quick Start
**⚠️ IMPORTANT**: Please use `transformers<4.56.0` to avoid compatibility issues. We recommend `transformers==4.54.1` or earlier versions.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ByteDance/Ouro-2.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

# Generate text
inputs = tokenizer("The future of AI is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
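For more varied continuations, sampling can be enabled with the standard `generate` arguments (a variation on the example above; the values shown are common defaults, not tuned recommendations):

```python
# Sampled generation: more diverse than greedy decoding.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```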
## Acknowledgments
We thank [@Antizana](https://github.com/Antizana) for the KV cache fix merged from [ouro-cache-fix](https://github.com/Antizana/ouro-cache-fix), which resolved a critical compatibility issue with `transformers>=4.56.0`.
## Citation
```bibtex
@article{zhu2025scaling,
  title={Scaling Latent Reasoning via Looped Language Models},
  author={Zhu, Rui-Jie and Wang, Zixuan and Hua, Kai and Zhang, Tianyu and Li, Ziniu and Que, Haoran and Wei, Boyi and Wen, Zixin and Yin, Fan and Xing, He and others},
  journal={arXiv preprint arXiv:2510.25741},
  year={2025}
}
```
## License
This model is licensed under Apache-2.0. See the LICENSE file for details.