i3 Model - Memory-Optimized Efficient Conversational Language Model
Model Description
The i3 Model is a memory-optimized language model designed for conversational understanding. This version uses streaming tokenization to minimize RAM usage during training.
Model Statistics
- Vocabulary Size: 4,466 (variable-length chunks)
- Hidden Dimension: 512
- Number of Layers: 24
- Max Sequence Length: 256
- Total Parameters: 22,640,626
- Tokenization: Memory-efficient variable-length chunking (2-3 characters)
To use the model, see user.py.
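user.py is the canonical entry point. Purely as an illustration, the sketch below shows one way the exported artifacts (listed in the Deliverables section further down) might be loaded; the i3Model constructor call and the assumption that tokenizer.json is readable by the HuggingFace `tokenizers` library are not documented facts.

```python
import json
import torch
from tokenizers import Tokenizer  # assumes tokenizer.json is in HuggingFace `tokenizers` format

MODEL_DIR = "i3_model_hf"  # export directory named in the Deliverables section

# Architecture hyperparameters (hidden dimension, layer count, vocabulary size, ...)
with open(f"{MODEL_DIR}/config.json") as f:
    config = json.load(f)

# Chunk-level tokenizer
tokenizer = Tokenizer.from_file(f"{MODEL_DIR}/tokenizer.json")

# Raw weights; the i3Model class itself comes from the repository (see user.py)
state_dict = torch.load(f"{MODEL_DIR}/pytorch_model.bin", map_location="cpu")
# model = i3Model(**config)            # hypothetical constructor call
# model.load_state_dict(state_dict)
# model.eval()
```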
Key Features
- Memory-Optimized: Streaming tokenization reduces RAM usage significantly
- Proprietary Hybrid Architecture: Advanced sequence processing with linear complexity
- Variable-Length Tokenization: Smart chunking strategy for better compression
- Conversational Focus: Specialized for dialogue and emotional understanding
Training Details
- Dataset: TinyChat
- Training Objective: Next-token prediction with proprietary optimization
- Framework: PyTorch
- Memory Optimization: Streaming dataset processing
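The actual streaming pipeline is not reproduced in this card. The sketch below only illustrates the general idea behind streaming dataset processing with a PyTorch IterableDataset: conversations are read and chunked on the fly instead of tokenizing the whole corpus into RAM up front. The file path and tokenize_fn are placeholders.

```python
import torch
from torch.utils.data import IterableDataset

class StreamingChatDataset(IterableDataset):
    """Lazily reads and tokenizes one conversation per line (illustrative only)."""

    def __init__(self, path, tokenize_fn, max_len=256):
        self.path = path                # e.g. a line-per-conversation dump of TinyChat (assumed layout)
        self.tokenize_fn = tokenize_fn  # maps text -> list of chunk ids
        self.max_len = max_len

    def __iter__(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:  # only the current line is ever held in memory
                ids = self.tokenize_fn(line.strip())[: self.max_len]
                if len(ids) > 1:
                    yield torch.tensor(ids, dtype=torch.long)

# DataLoader(StreamingChatDataset("tinychat.jsonl", tokenize), batch_size=8, collate_fn=pad_batch)
```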
Technical Report: i3 Pre-training
Executive Summary
The i3 model, a small-scale text generation architecture, successfully completed its initial pre-training phase. This training was conducted on an NVIDIA GeForce RTX 3060 and required approximately 17 hours of continuous processing. The resulting model artifacts are configured for deployment on the HuggingFace platform.
The model is characterized by a compact architecture featuring 24 layers and a hidden dimension of 512, paired with a custom "chunk" tokenization strategy designed for efficiency on conversational data.
Model Configuration and Architecture
The i3Model architecture is designed for high efficiency and likely incorporates elements of a State Space Model (SSM), as suggested by its low-rank and state-space parameters (rank and d_state).
| Parameter | Value | Description |
| --- | --- | --- |
| Model Type | i3Model | Custom, high-efficiency architecture (likely SSM-enhanced). |
| Hidden Dimension (d_model) | 512 | The size of the vector space for internal representations. |
| Number of Layers (n_layers) | 24 | The depth of the model's processing blocks. |
| Attention Heads (n_heads) | 16 | The number of parallel attention mechanisms (if applicable). |
| State Dimension (d_state) | 64 | The size of the recurrent state, common in SSMs. |
| Rank | 128 | Potentially used for low-rank projections in attention or state mechanisms. |
| Max Sequence Length | 256 | The maximum number of tokens/chunks the model can process at once. |
| Vocabulary Size | 4,466 | The total number of unique chunks/tokens in the vocabulary. |
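For convenience, the hyperparameters in the table can be collected into a single configuration object. The values below are copied from the table; the field names are illustrative and may not match the keys used in the shipped config.json.

```python
from dataclasses import dataclass

@dataclass
class I3Config:
    d_model: int = 512       # hidden dimension
    n_layers: int = 24       # number of processing blocks
    n_heads: int = 16        # parallel attention heads (if the hybrid block uses attention)
    d_state: int = 64        # recurrent state size (SSM-style component)
    rank: int = 128          # low-rank projection size
    max_seq_len: int = 256   # maximum chunks per sequence
    vocab_size: int = 4466   # number of unique chunks
```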
Training Environment and Duration
The training phase was characterized by high hardware efficiency, achieving a complete pre-training run on consumer-grade hardware in a short timeframe.
- Hardware Used: NVIDIA GeForce RTX 3060 (12GB VRAM assumed).
- Total Training Time: Approximately 17 hours.
- Framework: PyTorch (with HuggingFace Transformers used to generate the final deployment files).
Training Data and Procedure
Dataset
The model was pre-trained using the TinyChat dataset, which comprised 1,000,000 conversations. This suggests the model is optimized for rapid, short-form conversational tasks.
Tokenization Strategy
A crucial element of the model's efficiency is its custom tokenization approach:
- Tokenizer Type: chunk
- Strategy: variable_2_3
- Vocabulary: The vocabulary size is notably small (4,466 chunks), indicating that the tokenizer is designed to aggregate common sequences of text into single tokens, significantly reducing the effective sequence length and computational cost during training.
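The precise chunking rules are not documented in this card. As one plausible reading of the variable_2_3 strategy, the greedy splitter below prefers 3-character chunks already present in the vocabulary, falls back to 2-character chunks, and finally to single characters; it is an illustrative sketch, not the actual tokenizer.

```python
def chunk_text(text, vocab):
    """Greedy 2-3 character chunking (illustrative; the real tokenizer may differ)."""
    chunks, i = [], 0
    while i < len(text):
        for size in (3, 2, 1):  # prefer longer chunks that are known to the vocabulary
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                chunks.append(piece)
                i += len(piece)
                break
    return chunks

# Hypothetical vocabulary:
# chunk_text("hello there", {"hel", "lo", "the", "re"})
# -> ["hel", "lo", " ", "the", "re"]   (the space falls back to a single character)
```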
Performance Metrics
Training proceeded with consistent iteration steps, and the log reported the following final metrics as the run concluded:
| Metric | Range (Last 500 Iterations) | Observation |
| --- | --- | --- |
| Loss | 1.98 – 2.27 | Training loss remained relatively stable, suggesting convergence towards the end of the run. |
| Perplexity (PPL) | 7.29 – 9.70 | Perplexity measures how well the model predicts the next token; this range is typical of raw pre-training logs and indicates the model has learned basic sequence dependencies. |
| Time per Iteration | ~8.2 s – 12.7 s | Processing time per iteration shows sustained, efficient training throughput. |
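For reference, perplexity is simply the exponential of the cross-entropy loss (in nats), which is why the two ranges in the table track each other: exp(1.98) ≈ 7.2 and exp(2.27) ≈ 9.7, in line with the reported 7.29 – 9.70.

```python
import math

def perplexity(cross_entropy_loss: float) -> float:
    """PPL = exp(loss) for a next-token cross-entropy loss measured in nats."""
    return math.exp(cross_entropy_loss)

print(perplexity(1.98), perplexity(2.27))  # ~7.24 and ~9.68
```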
Deliverables
Upon completion, the files required for deployment were written to the i3_model_hf/ directory, ensuring immediate compatibility with the HuggingFace ecosystem:
- pytorch_model.bin (Model Weights)
- config.json (Model Configuration)
- tokenizer.json (Vocabulary File)
- tokenizer_config.json (Tokenizer Configuration)
The model is now ready for fine-tuning on a specific downstream task or for evaluation of its foundational text generation capabilities.
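A quick way to confirm that an export is complete is to check the directory for the four files listed above; a minimal sketch, assuming the default i3_model_hf/ output path:

```python
from pathlib import Path

EXPORT_DIR = Path("i3_model_hf")
EXPECTED = ["pytorch_model.bin", "config.json", "tokenizer.json", "tokenizer_config.json"]

missing = [name for name in EXPECTED if not (EXPORT_DIR / name).is_file()]
print("export complete" if not missing else f"missing files: {missing}")
```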