Instructions for using Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf with libraries and local apps.

Libraries

How to use Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf",
	filename="Pico-Lamma-3_2-1B-Reasoning-Instruct.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)
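
The call above is a plain text completion. Because the model is instruction-tuned, a chat-style request may be a better fit. The following is a minimal sketch using llama-cpp-python's `create_chat_completion`; the helper names (`build_messages`, `extract_reply`, `ask`), the system prompt, and the `max_tokens` value are illustrative assumptions, not part of the model card.

```python
from typing import Any

# Assumed system prompt; adjust or drop it to suit your use case.
DEFAULT_SYSTEM_PROMPT = "Think step by step, then state the final answer."


def build_messages(question: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> list[dict[str, str]]:
    """Build an OpenAI-style message list for a single-turn chat."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def extract_reply(response: dict[str, Any]) -> str:
    """Pull the assistant text out of a chat-completion response dict."""
    return response["choices"][0]["message"]["content"]


def ask(question: str, max_tokens: int = 512) -> str:
    """Download the model (on first use) and answer one question."""
    # Imported lazily so the helpers above work without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf",
        filename="Pico-Lamma-3_2-1B-Reasoning-Instruct.gguf",
    )
    response = llm.create_chat_completion(
        messages=build_messages(question),
        max_tokens=max_tokens,
    )
    return extract_reply(response)
```

Calling `ask("Why is the sky blue?")` downloads the GGUF file on first use and returns only the assistant's text rather than the full response dictionary.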

Local Apps

llama.cpp

How to use Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf
# Run inference directly in the terminal:
llama-cli -hf Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf
# Run inference directly in the terminal:
llama-cli -hf Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf
# Run inference directly in the terminal:
./llama-cli -hf Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf

Use Docker

docker model run hf.co/Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf
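
Once `llama-server` is running from any of the install options above, it can be queried over its OpenAI-compatible HTTP API. The sketch below uses only the Python standard library and assumes the server is on its default port, 8080, on the same machine; the helper names `build_request` and `chat` are illustrative.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port (8080).
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def build_request(question: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(question: str) -> str:
    """Send one question to the running server and return the reply text."""
    with urllib.request.urlopen(build_request(question)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

`chat("...")` fails with a connection error if no server is listening, so start `llama-server` first.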

Ollama
How to use Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf with Ollama:
```
ollama run hf.co/Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf
```
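
After the model has been pulled with the command above, the local Ollama service can also be called from code. This is a minimal sketch against Ollama's `/api/chat` endpoint, assuming the service runs on its default port, 11434; the helper names `build_payload` and `chat` are illustrative.

```python
import json
import urllib.request

# Assumption: the Ollama service is running locally on its default port (11434).
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "hf.co/Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf"


def build_payload(question: str) -> dict:
    """Build a non-streaming chat payload for a single user message."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
        "stream": False,  # return one JSON object instead of a stream of chunks
    }


def chat(question: str) -> str:
    """Send one question to Ollama and return the assistant's reply text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```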

Unsloth Studio

How to use Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf to start chatting

Docker Model Runner
How to use Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf with Docker Model Runner:
```
docker model run hf.co/Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf
```

Lemonade

How to use Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf

Run and chat with the model

lemonade run user.Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf-{{QUANT_TAG}}

List all available models

lemonade list

Model Details

Model Description

Model Overview: Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf is a compact language model fine-tuned from the “meta-llama/Llama-3.2-1B-Instruct” base model. Despite its compact size of just 0.99GB, it is designed to perform well on tasks requiring logical reasoning and structured thought processes.

Developed by: Shourya Shashank
Model type: Transformer-based Language Model
Language(s) (NLP): English
License: AGPL-3.0
Finetuned from model: meta-llama/Llama-3.2-1B-Instruct

Key Features:

Compact Size: At only 0.99GB, it is lightweight and easy to deploy, making it suitable for environments with limited computational resources.
High Accuracy: Training on a specialized chain-of-thought and reasoning dataset is intended to improve the model’s ability to perform complex reasoning tasks precisely.
Fine-Tuned on Meta-Llama: Leveraging the robust foundation of the “meta-llama/Llama-3.2-1B-Instruct” model, it inherits strong language understanding and generation capabilities.

Applications:

Educational Tools: Ideal for developing intelligent tutoring systems that require nuanced understanding and explanation of concepts.
Customer Support: Enhances automated customer service systems by providing accurate and contextually relevant responses.
Research Assistance: Assists researchers in generating hypotheses, summarizing findings, and exploring complex datasets.

Uses

Lightweight: The model is designed to be lightweight, so it can run on systems with modest resources.
Natural Language Understanding: Ideal for applications requiring human-like text understanding and generation, such as chatbots, virtual assistants, and content generation tools.
Small Size: At just 0.99GB, it is quick to download and install.
High Reliability: The chain-of-thought approach used in training is intended to make its responses more consistent and accurate.

Direct Use

Problem Explanation: Generate detailed descriptions and reasoning for various problems, useful in educational contexts, customer support, and automated troubleshooting.
Compact Deployment: Suitable for environments with limited computational resources due to its small size and 4-bit quantization.
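
As a rough guide to how quantization affects size, the weights-only footprint is the parameter count times the bits per weight, divided by 8. The sketch below applies that formula to a nominal 1B parameters; the function names are illustrative, and the result is a lower bound, since quantized files also store scaling data, metadata, and often some tensors at higher precision.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Weights-only size in bytes: parameters * bits / 8."""
    return n_params * bits_per_weight / 8


def to_gb(n_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_bytes / 1e9


# Nominal parameter count; the exact count for this model is not stated here.
NOMINAL_PARAMS = 1e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {to_gb(weight_bytes(NOMINAL_PARAMS, bits)):.2f} GB")
```

For a nominal 1B parameters this gives 2.00 GB at 16 bits, 1.00 GB at 8 bits, and 0.50 GB at 4 bits, so the published 0.99GB file size is expected to sit above the 4-bit lower bound.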

Downstream Use

Educational Tools: Fine-tune the model on educational datasets to provide detailed explanations and reasoning for academic subjects.
Customer Support: Fine-tune on customer service interactions to enhance automated support systems with accurate and context-aware responses.

Bias, Risks, and Limitations

Limitations

Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf is a compact model designed for efficiency, but it comes with certain limitations:

Limited Context Understanding:
- With a smaller parameter size, the model may have limitations in understanding and generating contextually rich and nuanced responses compared to larger models.
Bias and Fairness:
- Like all language models, Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf may exhibit biases present in the training data. Users should be cautious of potential biases in the generated outputs.
Resource Constraints:
- While the model is designed to be efficient, it still requires a GPU for optimal performance. Users with limited computational resources may experience slower inference times.
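
For the resource constraint above, llama-cpp-python exposes an `n_gpu_layers` option that controls how many layers are offloaded to the GPU. The sketch below shows one way to switch between CPU-only and full GPU offload; the helper names and the `n_ctx` default are assumptions, and GPU offload only takes effect when llama-cpp-python was built with a GPU backend.

```python
def loader_kwargs(use_gpu: bool, n_ctx: int = 2048) -> dict:
    """Build keyword arguments for Llama.from_pretrained."""
    return {
        "repo_id": "Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf",
        "filename": "Pico-Lamma-3_2-1B-Reasoning-Instruct.gguf",
        "n_ctx": n_ctx,  # context window in tokens (assumed value)
        # -1 offloads all layers to the GPU; 0 keeps everything on the CPU.
        "n_gpu_layers": -1 if use_gpu else 0,
    }


def load(use_gpu: bool):
    """Load the model with or without GPU offload."""
    # Imported lazily so loader_kwargs can be used without llama-cpp-python installed.
    from llama_cpp import Llama

    return Llama.from_pretrained(**loader_kwargs(use_gpu))
```

On a machine without a supported GPU, `load(use_gpu=False)` still works, but inference will be slower.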

Example Usage:

import predacons

# Load the model and tokenizer from the Hugging Face repository
model_path = "Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf"
model = predacons.load_model(model_path)
tokenizer = predacons.load_tokenizer(model_path)

# Build a single-turn chat containing one reasoning question
chat = [
    {"role": "user", "content": "A train travelling at a speed of 60 km/hr is stopped in 15 seconds by applying the brakes. Determine its retardation."},
]

# Generate a response; gguf_file names the GGUF weights file in the repository
res = predacons.chat_generate(
    model=model,
    sequence=chat,
    max_length=5000,
    tokenizer=tokenizer,
    trust_remote_code=True,
    do_sample=True,
    gguf_file="Pico-Lamma-3_2-1B-Reasoning-Instruct.gguf",
)

print(res)

This example demonstrates how to load the Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf model and use it to generate an explanation for a given query, keeping in mind the limitations mentioned above.
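
Because a small model can make arithmetic slips, it helps to check the generated explanation against the known answer. The sketch below computes the retardation for the train question, assuming uniform deceleration to rest (a = u / t); the function name is illustrative.

```python
def retardation(speed_kmh: float, stop_time_s: float) -> float:
    """Uniform deceleration (m/s^2) needed to stop from speed_kmh in stop_time_s."""
    initial_speed_ms = speed_kmh * 1000 / 3600  # convert km/h to m/s
    return initial_speed_ms / stop_time_s


# 60 km/h is about 16.67 m/s; stopping in 15 s gives about 1.11 m/s^2.
print(f"{retardation(60, 15):.2f} m/s^2")
```

A correct model response should therefore arrive at a retardation of roughly 1.11 m/s².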

Model Card Authors

Shourya Shashank


Format: GGUF
Model size: 1B params
Architecture: llama


Model tree for Predacon/Pico-Lamma-3.2-1B-Reasoning-Instruct-gguf

Base model: meta-llama/Llama-3.2-1B-Instruct
Quantized (373): this model