--- language: - en - fr - es - zh - hi - ja - ru license: apache-2.0 library_name: flax tags: - jax - flax - tpu - text-generation - base-model - custom-architecture datasets: - HuggingFaceFW/fineweb-edu - bigcode/starcoderdata - HuggingFaceFW/fineweb-2 - open-web-math/open-web-math metrics: - loss - perplexity pipeline_tag: text-generation inference: false --- # Zenyx-Base-220M: High-Density Nano Foundation Model
![Model Architecture](https://img.shields.io/badge/Model-Zenyx_Base-blue?style=for-the-badge) ![Parameter Count](https://img.shields.io/badge/Params-220M-orange?style=for-the-badge) ![Training Tokens](https://img.shields.io/badge/Tokens-153B-green?style=for-the-badge) ![Format](https://img.shields.io/badge/Weights-Safetensors-yellow?style=for-the-badge)
**Zenyx-Base-220M** is a 220 million parameter causal language model built from scratch using JAX/Flax on Kaggle TPU v5e-8. Unlike typical small models trained on limited data, Zenyx-Base was trained on **~153 Billion tokens**β€”far exceeding the Chinchilla optimal point for this parameter count. This "over-training" strategy was employed to maximize the information density and logic capabilities of the weights, creating a robust foundation for reasoning tasks. ## 🧠 Model Description * **Architecture:** Custom Llama-style Transformer (RoPE, SwiGLU, RMSNorm, Grouped Query Attention). * **Tokenizer:** Qwen 2.5 Tokenizer (151,650 Vocab Size) for high compression efficiency. * **Context Window:** 2048 Tokens. * **Training Hardware:** TPU v5e-8. * **Final Validation Loss:** **~2.38** (Exceptional convergence for 220M). ### Technical Specifications | Hyperparameter | Value | | :--- | :--- | | **Layers** | 12 | | **Hidden Dim** | 768 | | **MLP Dim** | 3072 | | **Attention Heads** | 12 | | **KV Heads** | 4 (GQA) | | **Vocab Size** | 151,646 | ## πŸ“š Training Curriculum (The "Omni-Mix") The model was trained using a rigorous 4-stage curriculum designed to layer capabilities sequentially: 1. **Phase 1: Fundamentals (FineWeb-Edu)** * Focus on high-quality educational English text to establish linguistic baselines. 2. **Phase 2: Logic & Structure (StarCoder - Python)** * Introduction of code data to enforce logical indentation, syntax, and structured thinking. 3. **Phase 3: Multilingualism (FineWeb-2)** * Exposure to 6 major languages (Hindi, Chinese, Russian, Japanese, French, Spanish) to expand the semantic embedding space. 4. **Phase 4: The Infinite Polish (Omni-Mix)** * A weighted interleaving of all previous datasets plus **OpenWebMath** to converge the model's logic and language capabilities. ## πŸ’» Usage This model is a raw **JAX/Flax** checkpoint saved in `.safetensors` format. It uses a custom architecture definition and requires `flax` and `jax` to run. ### Loading with JAX/Flax ```python import jax import jax.numpy as jnp from flax.training import train_state from flax import serialization from safetensors.flax import load_file from transformers import AutoTokenizer import flax.linen as nn # 1. Define Architecture (Must match training config) class TransformerLM(nn.Module): vocab_size: int embed_dim: int = 768 num_layers: int = 12 num_heads: int = 12 num_kv_heads: int = 4 mlp_dim: int = 3072 max_length: int = 2048 dropout_rate: float = 0.0 # ... (Insert full model class definition here from the training script) ... # 2. Load Resources repo_id = "Arko007/Zenyx_Base_220M" tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", trust_remote_code=True) # 3. Initialize & Load Weights model = TransformerLM(vocab_size=len(tokenizer)) dummy_input = jnp.ones((1, 1), dtype=jnp.int32) params = model.init(jax.random.PRNGKey(0), dummy_input)['params'] # Load Safetensors # Ensure model.safetensors is downloaded locally loaded_params = load_file("model.safetensors") print("Weights loaded successfully!") ``` ## ⚠️ Limitations - Size: At 220M parameters, the model's knowledge retrieval capacity is limited compared to 7B+ models. - Base Model: This is a pre-trained base. It has not been fine-tuned for chat or instruction following (see Zenyx-DeepSeek-220M for the instruct version). - Hallucinations: While logically consistent, it may generate factually incorrect statements. ## πŸ“œ Citation ```python @misc{ZenyxBase220M, title = {Zenyx-Base-220M: High-Density Foundation Model}, author = {Arko007}, year = {2025}, publisher = {HuggingFace}, url = {[https://huggingface.co/Arko007/Zenyx_Base_220M](https://huggingface.co/Arko007/Zenyx_Base_220M)} } ```