🦅 Supra Mini v4 2M

Supra Mini v4 2M is the fourth release in our Supra Mini series: a very small base model trained for 2 epochs on 3 billion tokens of Fineweb-Edu.

Model Config

  • Parameters: 2,623,104 (2M)
  • Architecture: Llama
  • Vocab size with custom BPE tokenizer: 8192
  • Hidden Size: 128
  • Intermediate Size: 512
  • Hidden Layers: 6
  • Attention Heads: 4
  • Max Position Embeddings: 1024
  • Learning rate: 3e-4
  • Weight Decay: 0.01
  • Trained in bfloat16
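
For reference, the numbers above can be assembled into a transformers LlamaConfig. This is a sketch rather than the repo's actual config.json; in particular, tie_word_embeddings=True is our assumption, chosen because it is what makes the parameter count come out to exactly 2,623,104:

from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical reconstruction of the config from the numbers above;
# the config.json in the model repo is authoritative.
config = LlamaConfig(
    vocab_size=8192,
    hidden_size=128,
    intermediate_size=512,
    num_hidden_layers=6,
    num_attention_heads=4,
    max_position_embeddings=1024,
    tie_word_embeddings=True,  # assumption: tying reproduces the stated count
)
model = LlamaForCausalLM(config)
# parameters() deduplicates the tied embedding/lm_head weight
print(sum(p.numel() for p in model.parameters()))  # 2,623,104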

Final Loss

After 2 epochs, the model reached a final training loss of 4.618.
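For intuition, a cross-entropy of 4.618 nats corresponds to a perplexity of about e^4.618 ≈ 101, against a ceiling of 8192 (loss ln 8192 ≈ 9.01) for a uniform model over this vocabulary.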

Benchmarks

All benchmarks were executed using lm-eval; a sketch of a typical invocation follows the table below.

Task       Value    Random baseline
Arc_Easy   0.3152   0.25 (25%)
Wikitext   3.1652   -
BLiMP      0.607    0.5 (50%)
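
The exact lm-eval settings used for these numbers are not recorded here. As a sketch, a comparable run with the lm-eval Python API (v0.4+) could look like the following; the batch size and dtype are arbitrary choices:

import lm_eval

# Hypothetical reproduction of the benchmark runs above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=SupraLabs/Supra-Mini-v4-2M,dtype=float32",
    tasks=["arc_easy", "wikitext", "blimp"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)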


Examples

Prompt: "Artificial intelligence is "
Output: "Artificial intelligence is the idea of the theory that the world has a very high-performance technology, which is also more important to society’s lives than people who are being able to find their own knowledge and understanding how it can be used for future generations. It would have been an essential step in thinking about what they think about and why they do not work with them. It was just one thing you need to know as many ways to get involved on our planet. So if we did this new approach to AI in science or physics, then I am going to make sure there were no things that really could be done so much like that. But I don't want to take into consideration the way that they might change. That's"

Prompt: "The main concept of physics is "
Output: "The main concept of physics is `'animi-'hisi', and therefore the universe's own light. In this case, a theory that is not only used to explain what it can be called "the universe" or 'two planets, which are exactly about the earth's gravitational energy, but also in reality, we know how much things do. It will actually mean that the stars from the Earth’s orbit, as the galaxy, would say, they have to get into the planet. The same thing that has been discovered at all, there was nothing more than that of anthropological world than those who were now doing so. And if you don't think, why does this matter? It seems that I am"

Prompt: "Once upon a time, "
Output: "Once upon a time, iconico was taken to the end of this year. The first step in our work is that it would be very difficult for us to do with my students and then I will not know what you have to read on the same page but if you are going to find out more about how we need to look at some of your questions from each other. If you think there is an opportunity to tell the story to make sure you’re interested in writing a book, please see a picture here: http://www.cidpubs.com/diside-growing-built_number/will-title/maybeat/kiki+femin -malej"

Usage

To use the model, run the following code with HF Transformers:

from transformers import pipeline
import torch

print("[*] Loading Supra Mini v4 2M model from Hugging Face Hub...")
# Use float16 on GPU, float32 on CPU.
pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-Mini-v4-2M",
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
)

def generate_text(prompt, max_new_tokens=150):
    # Sample with a low temperature and a repetition penalty, which helps
    # keep a model this small from looping.
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id
    )
    return result[0]['generated_text']

test_prompt = "The importance of education is"
print(f"\nPrompt: {test_prompt}")
print("-" * 30)
print("\nOutput:\n" + generate_text(test_prompt))

Training guide

We trained Supra Mini v4 2M on a single NVIDIA RTX 5060 Ti 16GB in ~3 hours (2 epochs).
The full training code is in this repo: train_tokenizer.py (trains the custom BPE tokenizer with a vocab size of 8192), train.py (trains the model), and inference.py (tests the model).
The model was trained on the first 3 billion tokens of the Sample-10BT subset of Fineweb-Edu, tokenized on the fly (streaming); a sketch of that data pipeline is shown below.
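
The following is a hypothetical sketch of the streaming tokenization described above, not the repo's actual train.py; the HuggingFaceFW/fineweb-edu "sample-10BT" subset name and the 1024-token block packing are our assumptions based on the config listed earlier:

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SupraLabs/Supra-Mini-v4-2M")

# Stream Fineweb-Edu so the 10BT sample never has to fit on disk.
stream = load_dataset(
    "HuggingFaceFW/fineweb-edu",
    name="sample-10BT",
    split="train",
    streaming=True,
)

def iter_blocks(block_size=1024, token_budget=3_000_000_000):
    # Tokenize documents on the fly and pack them into fixed-size blocks,
    # stopping once roughly 3B tokens have been consumed.
    buffer, seen = [], 0
    for doc in stream:
        ids = tokenizer(doc["text"]).input_ids
        buffer.extend(ids)
        seen += len(ids)
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
        if seen >= token_budget:
            return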
