Instructions to use kshitizz36/provn-gemma4-e2b-q4km with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use kshitizz36/provn-gemma4-e2b-q4km with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="kshitizz36/provn-gemma4-e2b-q4km",
	filename="provn-gemma4-e2b-q4km.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use kshitizz36/provn-gemma4-e2b-q4km with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf kshitizz36/provn-gemma4-e2b-q4km
# Run inference directly in the terminal:
llama-cli -hf kshitizz36/provn-gemma4-e2b-q4km

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf kshitizz36/provn-gemma4-e2b-q4km
# Run inference directly in the terminal:
llama-cli -hf kshitizz36/provn-gemma4-e2b-q4km

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf kshitizz36/provn-gemma4-e2b-q4km
# Run inference directly in the terminal:
./llama-cli -hf kshitizz36/provn-gemma4-e2b-q4km

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf kshitizz36/provn-gemma4-e2b-q4km
# Run inference directly in the terminal:
./build/bin/llama-cli -hf kshitizz36/provn-gemma4-e2b-q4km

Use Docker

docker model run hf.co/kshitizz36/provn-gemma4-e2b-q4km

LM Studio
Jan
Ollama
How to use kshitizz36/provn-gemma4-e2b-q4km with Ollama:
```
ollama run hf.co/kshitizz36/provn-gemma4-e2b-q4km
```

Unsloth Studio new

How to use kshitizz36/provn-gemma4-e2b-q4km with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kshitizz36/provn-gemma4-e2b-q4km to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kshitizz36/provn-gemma4-e2b-q4km to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for kshitizz36/provn-gemma4-e2b-q4km to start chatting

Docker Model Runner
How to use kshitizz36/provn-gemma4-e2b-q4km with Docker Model Runner:
```
docker model run hf.co/kshitizz36/provn-gemma4-e2b-q4km
```

Lemonade

How to use kshitizz36/provn-gemma4-e2b-q4km with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull kshitizz36/provn-gemma4-e2b-q4km

Run and chat with the model

lemonade run user.provn-gemma4-e2b-q4km-{{QUANT_TAG}}

List all available models

lemonade list

Provn Gemma 4 E2B Q4_K_M

This repository contains the GGUF Layer 3 semantic classifier used by Provn. It is a fine-tuned Gemma 4 derivative for binary leak classification on code snippets:

leak
clean

Training

Fine-tuned using Unsloth on the LeakBench dataset for binary secret and IP leak classification.

Base model: Gemma 4 E2B (google/gemma-4-e2b-it)
Fine-tuning framework: Unsloth
Task: Binary classification — leak / clean
Dataset: LeakBench (code snippets containing secrets, API keys, system prompts, and clean code)
Quantization: Q4_K_M GGUF via llama.cpp for on-device inference

Layer 3 is designed to handle the ambiguous 0.4–0.8 confidence band that regex and AST layers cannot resolve deterministically. The model was optimized for high recall over precision to minimize missed leaks.

Benchmarks

Evaluated on the LeakBench dataset:

Metric	Score
Recall	97.0%
False Positive Rate	1.2%
p50 latency	≤ 30ms
p95 latency	≤ 50ms
LLM inference (Layer 3)	< 800ms

Layer 3 only activates for ambiguous detections (confidence 0.4–0.8). High-confidence cases from Layer 1/2 skip it entirely, keeping average latency well under 50ms.

Architecture

Provn runs three detection layers in sequence:

Layer	Method	Latency
1a	Regex (30+ Gitleaks rules + NFKC normalization)	< 5ms
1b	Shannon entropy analysis	< 5ms
2	Tree-sitter AST taint tracking	< 50ms
3	This model — Gemma 4 E2B (on-device, optional)	< 800ms

This model is only invoked when Layers 1 and 2 return ambiguous results, making the overall system fast while still catching semantic leaks that deterministic rules miss.

Intended use

Use this model locally with Provn as the optional Layer 3 semantic classifier for ambiguous detections. No data leaves your machine — inference runs entirely on-device via llama.cpp.

Download location for Provn

Place the GGUF file at:

macOS/Linux: ~/.provn/models/provn-gemma4-e2b-q4km.gguf
Windows: %USERPROFILE%\.provn\models\provn-gemma4-e2b-q4km.gguf

Run with Provn

Start your llama.cpp-compatible server on 127.0.0.1:8080 with this GGUF, then run:

provn server status

Enable in provn.yml:

layers:
  semantic:
    enabled: true
    model: provn-gemma4-e2b-q4km.gguf
    endpoint: http://localhost:8080
    timeout_ms: 2000

Gemma terms

This model is a derivative of Gemma and is distributed subject to the Gemma Terms of Use and Gemma Prohibited Use Policy.

Gemma Terms of Use: https://ai.google.dev/gemma/terms
Gemma Prohibited Use Policy: https://ai.google.dev/gemma/prohibited_use_policy

Modification notice

This repository contains modified / fine-tuned model artifacts created for Provn.--- license: gemma base_model: google/gemma-4-e2b-it model_type: gemma4 tags: - gguf - gemma - provn - security - code - llama-cpp - fine-tuned - unsloth language: - en pipeline_tag: text-classification library_name: gguf

Provn Gemma 4 E2B Q4_K_M

This repository contains the GGUF Layer 3 semantic classifier used by Provn. It is a fine-tuned Gemma 4 derivative for binary leak classification on code snippets:

leak
clean

Training

Fine-tuned using Unsloth on the LeakBench dataset for binary secret and IP leak classification.

Base model: Gemma 4 E2B (google/gemma-4-e2b-it)
Fine-tuning framework: Unsloth
Task: Binary classification — leak / clean
Dataset: LeakBench (code snippets containing secrets, API keys, system prompts, and clean code)
Quantization: Q4_K_M GGUF via llama.cpp for on-device inference

Benchmarks

Evaluated on the LeakBench dataset:

Metric	Score
Recall	97.0%
False Positive Rate	1.2%
p50 latency	≤ 30ms
p95 latency	≤ 50ms
LLM inference (Layer 3)	< 800ms

Layer 3 only activates for ambiguous detections (confidence 0.4–0.8). High-confidence cases from Layer 1/2 skip it entirely, keeping average latency well under 50ms.

Architecture

Provn runs three detection layers in sequence:

Layer	Method	Latency
1a	Regex (30+ Gitleaks rules + NFKC normalization)	< 5ms
1b	Shannon entropy analysis	< 5ms
2	Tree-sitter AST taint tracking	< 50ms
3	This model — Gemma 4 E2B (on-device, optional)	< 800ms

This model is only invoked when Layers 1 and 2 return ambiguous results, making the overall system fast while still catching semantic leaks that deterministic rules miss.

Intended use

Use this model locally with Provn as the optional Layer 3 semantic classifier for ambiguous detections. No data leaves your machine — inference runs entirely on-device via llama.cpp.

Download location for Provn

Place the GGUF file at:

macOS/Linux: ~/.provn/models/provn-gemma4-e2b-q4km.gguf
Windows: %USERPROFILE%\.provn\models\provn-gemma4-e2b-q4km.gguf

Run with Provn

Start your llama.cpp-compatible server on 127.0.0.1:8080 with this GGUF, then run:

provn server status

Enable in provn.yml:

layers:
  semantic:
    enabled: true
    model: provn-gemma4-e2b-q4km.gguf
    endpoint: http://localhost:8080
    timeout_ms: 2000

Gemma terms

This model is a derivative of Gemma and is distributed subject to the Gemma Terms of Use and Gemma Prohibited Use Policy.

Gemma Terms of Use: https://ai.google.dev/gemma/terms
Gemma Prohibited Use Policy: https://ai.google.dev/gemma/prohibited_use_policy

Modification notice

This repository contains modified / fine-tuned model artifacts created for Provn.

Downloads last month: 69

GGUF

Model size

601 params

Architecture

gemma4

Hardware compatibility

We're not able to determine the quantization variants.

View all variants