Spectra
Spectra uses a sparse Mixture-of-Experts (MoE) architecture: only a small fraction of its parameters is activated per token, balancing extreme scale with computational efficiency.
| Component | Technology | Function |
|---|---|---|
| Vision Encoder | VidEnc (637M) | Processes images and videos natively; supports arbitrary aspect ratios and resolutions. |
| Audio System | Audio-Code-S | Discretizes audio into semantic and acoustic codebooks at 16.67 Hz. |
| Streaming Encoder | FSMN-based | Uses Feedforward Sequential Memory Networks for low-latency audio processing. |
| Fusion Strategy | Early-Fusion | Aligns all modalities (text, audio, visual) within a shared latent space for unified reasoning. |
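
As a rough sanity check on the audio pipeline, the 16.67 Hz frame rate in the table above fixes the token budget for audio inputs. The sketch below assumes one semantic plus one acoustic code per frame; that two-codebook split is an assumption for illustration, not a confirmed Audio-Code-S detail.

```python
# Back-of-the-envelope token budget for Audio-Code-S audio input.
# Assumption (not confirmed above): one semantic + one acoustic code
# per frame, i.e. two discrete tokens per 16.67 Hz codec frame.
FRAME_RATE_HZ = 16.67   # codec frame rate from the table above
CODEBOOKS = 2           # assumed: 1 semantic + 1 acoustic

def audio_tokens(seconds: float, codebooks: int = CODEBOOKS) -> int:
    """Approximate discrete tokens produced for `seconds` of audio."""
    return round(seconds * FRAME_RATE_HZ * codebooks)

for minutes in (1, 10, 60):
    print(f"{minutes:>3} min -> ~{audio_tokens(minutes * 60):,} tokens")
# At ~33 tokens/s, an hour of audio (~120k tokens) nearly fills the
# 128K context window listed in the specification table below.
```
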
| Feature | Specification |
|---|---|
| Total Parameters | 561B |
| Activated Parameters | 27B |
| Expert Configuration | 512 Routed; 256 Zero Experts |
| Context Window | 128K tokens |
| Primary Tasks | Audio, Visual, Text, Video-Continuation |
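
The expert configuration of 512 routed plus 256 zero experts (768 in total, matching the "768E" in the model name) hints at how activation stays at 27B: a zero expert has no parameters and returns zeros, so tokens routed to it skip expert compute entirely. The toy sketch below illustrates top-k routing with zero experts; the dimensions, top-k value, and layer shapes are illustrative assumptions, not Spectra's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy illustration of routing with "zero experts": a zero expert has no
# parameters and returns zeros, so tokens routed to it add nothing and
# cost no expert FLOPs. All sizes are illustrative, not Spectra's.
class ToyZeroExpertMoE(nn.Module):
    def __init__(self, d_model=64, n_routed=8, n_zero=4, top_k=2):
        super().__init__()
        self.n_routed, self.top_k = n_routed, top_k
        # Router scores routed AND zero experts jointly.
        self.router = nn.Linear(d_model, n_routed + n_zero)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_routed)
        )

    def forward(self, x):                         # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            # Only routed experts run; indices >= n_routed are zero
            # experts and contribute nothing to the output.
            for e in range(self.n_routed):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

x = torch.randn(16, 64)
print(ToyZeroExpertMoE()(x).shape)  # torch.Size([16, 64])
```
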
```bash
# install Spectra-Omni environment
conda create -n spectra python=3.10
conda activate spectra
# install dependencies (flash-attn builds against the installed torch,
# hence the separate step with --no-build-isolation)
pip install torch transformers
pip install flash-attn --no-build-isolation
```
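
A quick, generic sanity check (not Spectra-specific) confirms that PyTorch sees the GPUs and that flash-attn built correctly:

```python
# Generic environment check; nothing here is Spectra-specific.
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())

try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn not installed or failed to build")
```
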
Due to its 561B-parameter scale, Spectra requires a multi-node cluster or a high-memory instance (e.g., 16×H800) for BF16 inference.
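
The requirement follows from the weight footprint alone: 561B parameters at 2 bytes each in BF16 is roughly 1.12 TB before activations and KV cache, against about 1.28 TB of HBM on a 16×H800 node (16 × 80 GB):

```python
# Rough BF16 memory estimate for the weights alone (no activations,
# KV cache, or framework overhead).
TOTAL_PARAMS = 561e9
BYTES_PER_PARAM_BF16 = 2

weights_tb = TOTAL_PARAMS * BYTES_PER_PARAM_BF16 / 1e12
hbm_tb = 16 * 80e9 / 1e12  # 16x H800 with 80 GB HBM each

print(f"BF16 weights: ~{weights_tb:.2f} TB")  # ~1.12 TB
print(f"16x H800 HBM: ~{hbm_tb:.2f} TB")      # ~1.28 TB
# Little headroom remains, hence multi-node or high-memory deployment.
```
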
```python
from spectra_omni import SpectraModel

# Load the unified checkpoint; the same model processes audio, image,
# video, and text inputs.
model = SpectraModel.from_pretrained("thenexthub/Spectra-561B-27B-768E")
```
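
The snippet above stops at loading; a hedged sketch of what an end-to-end call might look like follows. Every keyword argument and method name here (torch_dtype, device_map, the input keys, generate) is an assumption modeled on common transformers-style APIs, not Spectra's documented interface; consult the repository for the real entry points.

```python
# Hypothetical usage sketch: all names below are assumptions modeled on
# transformers-style APIs, not Spectra's documented interface.
from spectra_omni import SpectraModel

model = SpectraModel.from_pretrained(
    "thenexthub/Spectra-561B-27B-768E",
    torch_dtype="bfloat16",  # assumed kwarg, by analogy with transformers
    device_map="auto",       # assumed: shard weights across visible GPUs
)

inputs = {                   # assumed multimodal input format
    "text": "Describe what is happening in this clip.",
    "audio": "sample.wav",
    "video": "sample.mp4",
}
print(model.generate(**inputs))  # assumed generate-style entry point
```
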
The model weights are released under the Omnira License. This license does not grant any rights to use Omnira trademarks or patents.
```bibtex
@misc{omnira2026spectra,
  title={Spectra-561B-27B-768E: Unified Omni-modal Intelligence},
  author={Omnira},
  year={2026},
  url={https://github.com/theomnira/Spectra-Omni},
}
```