🦅 Supra Mini v4 2M

Supra Mini v4 2M is the fourth release in our Supra Mini series: a very small base model trained for 2 epochs on 3 billion tokens of Fineweb-Edu.

Model Config

  • Parameters: 2,623,104 (2M)
  • Architecture: Llama
  • Vocab size with custom BPE tokenizer: 8192
  • Hidden Size: 128
  • Intermediate Size: 512
  • Hidden Layers: 6
  • Attention Heads: 4
  • Max Position Embeddings: 1024
  • Learning rate: 3e-4
  • Weight Decay: 0.01
  • Trained in bfloat16
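
For reference, the numbers above can be assembled into a transformers LlamaConfig. This is a sketch rather than the repo's actual config.json; in particular, tie_word_embeddings=True is our assumption, chosen because it is what makes the parameter count come out to exactly 2,623,104:

from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical reconstruction of the config from the numbers above;
# the config.json in the model repo is authoritative.
config = LlamaConfig(
    vocab_size=8192,
    hidden_size=128,
    intermediate_size=512,
    num_hidden_layers=6,
    num_attention_heads=4,
    max_position_embeddings=1024,
    tie_word_embeddings=True,  # assumption: tying reproduces the stated count
)
model = LlamaForCausalLM(config)
# parameters() deduplicates the tied embedding/lm_head weight
print(sum(p.numel() for p in model.parameters()))  # 2,623,104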

Final Loss

After 2 epochs, the model reached a final training loss of 4.618.
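For intuition, a cross-entropy of 4.618 nats corresponds to a perplexity of about e^4.618 ≈ 101, against a ceiling of 8192 (loss ln 8192 ≈ 9.01) for a uniform model over this vocabulary.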

Benchmarks

All benchmarks were executed using lm-eval; a sketch of a typical invocation follows the table below.

Task       Value    Random baseline
Arc_Easy   0.3152   0.25 (25%)
Wikitext   3.1652   -
BLiMP      0.607    0.5 (50%)
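
The exact lm-eval settings used for these numbers are not recorded here. As a sketch, a comparable run with the lm-eval Python API (v0.4+) could look like the following; the batch size and dtype are arbitrary choices:

import lm_eval

# Hypothetical reproduction of the benchmark runs above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=SupraLabs/Supra-Mini-v4-2M,dtype=float32",
    tasks=["arc_easy", "wikitext", "blimp"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)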


Examples

Prompt: "Artificial intelligence is "
Output: "Artificial intelligence is the idea of the theory that the world has a very high-performance technology, which is also more important to society’s lives than people who are being able to find their own knowledge and understanding how it can be used for future generations. It would have been an essential step in thinking about what they think about and why they do not work with them. It was just one thing you need to know as many ways to get involved on our planet. So if we did this new approach to AI in science or physics, then I am going to make sure there were no things that really could be done so much like that. But I don't want to take into consideration the way that they might change. That's"

Prompt: "The main concept of physics is "
Output: "The main concept of physics is `'animi-'hisi', and therefore the universe's own light. In this case, a theory that is not only used to explain what it can be called "the universe" or 'two planets, which are exactly about the earth's gravitational energy, but also in reality, we know how much things do. It will actually mean that the stars from the Earth’s orbit, as the galaxy, would say, they have to get into the planet. The same thing that has been discovered at all, there was nothing more than that of anthropological world than those who were now doing so. And if you don't think, why does this matter? It seems that I am"

Prompt: "Once upon a time, "
Output: "Once upon a time, iconico was taken to the end of this year. The first step in our work is that it would be very difficult for us to do with my students and then I will not know what you have to read on the same page but if you are going to find out more about how we need to look at some of your questions from each other. If you think there is an opportunity to tell the story to make sure you’re interested in writing a book, please see a picture here: http://www.cidpubs.com/diside-growing-built_number/will-title/maybeat/kiki+femin -malej"

Usage

To use the model, run the following code with HF Transformers:

from transformers import pipeline
import torch

print("[*] Loading Supra Mini v4 2M model from Hugging Face Hub...")
# Use float16 on GPU, float32 on CPU.
pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-Mini-v4-2M",
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32
)

def generate_text(prompt, max_new_tokens=150):
    # Sample with a low temperature and a repetition penalty, which helps
    # keep a model this small from looping.
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id
    )
    return result[0]['generated_text']

test_prompt = "The importance of education is"
print(f"\nPrompt: {test_prompt}")
print("-" * 30)
print("\nOutput:\n" + generate_text(test_prompt))

Training guide

We trained Supra Mini v4 2M on a single NVIDIA RTX 5060 Ti 16GB in ~3 hours (2 epochs).
The full training code is in this repo: train_tokenizer.py (trains the custom BPE tokenizer with a vocab size of 8192), train.py (trains the model), and inference.py (tests the model).
The model was trained on the first 3 billion tokens of the Sample-10BT subset of Fineweb-Edu, tokenized on the fly (streaming); a sketch of that data pipeline is shown below.
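
The following is a hypothetical sketch of the streaming tokenization described above, not the repo's actual train.py; the HuggingFaceFW/fineweb-edu "sample-10BT" subset name and the 1024-token block packing are our assumptions based on the config listed earlier:

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("SupraLabs/Supra-Mini-v4-2M")

# Stream Fineweb-Edu so the 10BT sample never has to fit on disk.
stream = load_dataset(
    "HuggingFaceFW/fineweb-edu",
    name="sample-10BT",
    split="train",
    streaming=True,
)

def iter_blocks(block_size=1024, token_budget=3_000_000_000):
    # Tokenize documents on the fly and pack them into fixed-size blocks,
    # stopping once roughly 3B tokens have been consumed.
    buffer, seen = [], 0
    for doc in stream:
        ids = tokenizer(doc["text"]).input_ids
        buffer.extend(ids)
        seen += len(ids)
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
        if seen >= token_budget:
            return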
