Instructions to use kshitizz36/provn-gemma4-e2b-q4km with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use kshitizz36/provn-gemma4-e2b-q4km with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="kshitizz36/provn-gemma4-e2b-q4km", filename="provn-gemma4-e2b-q4km.gguf", )
output = llm( "Once upon a time,", max_tokens=512, echo=True ) print(output)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use kshitizz36/provn-gemma4-e2b-q4km with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf kshitizz36/provn-gemma4-e2b-q4km # Run inference directly in the terminal: llama-cli -hf kshitizz36/provn-gemma4-e2b-q4km
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf kshitizz36/provn-gemma4-e2b-q4km # Run inference directly in the terminal: llama-cli -hf kshitizz36/provn-gemma4-e2b-q4km
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf kshitizz36/provn-gemma4-e2b-q4km # Run inference directly in the terminal: ./llama-cli -hf kshitizz36/provn-gemma4-e2b-q4km
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf kshitizz36/provn-gemma4-e2b-q4km # Run inference directly in the terminal: ./build/bin/llama-cli -hf kshitizz36/provn-gemma4-e2b-q4km
Use Docker
docker model run hf.co/kshitizz36/provn-gemma4-e2b-q4km
- LM Studio
- Jan
- Ollama
How to use kshitizz36/provn-gemma4-e2b-q4km with Ollama:
ollama run hf.co/kshitizz36/provn-gemma4-e2b-q4km
- Unsloth Studio new
How to use kshitizz36/provn-gemma4-e2b-q4km with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for kshitizz36/provn-gemma4-e2b-q4km to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for kshitizz36/provn-gemma4-e2b-q4km to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for kshitizz36/provn-gemma4-e2b-q4km to start chatting
- Docker Model Runner
How to use kshitizz36/provn-gemma4-e2b-q4km with Docker Model Runner:
docker model run hf.co/kshitizz36/provn-gemma4-e2b-q4km
- Lemonade
How to use kshitizz36/provn-gemma4-e2b-q4km with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull kshitizz36/provn-gemma4-e2b-q4km
Run and chat with the model
lemonade run user.provn-gemma4-e2b-q4km-{{QUANT_TAG}}List all available models
lemonade list
- Provn Gemma 4 E2B Q4_K_M
- Training
- Benchmarks
- Architecture
- Intended use
- Download location for Provn
- Run with Provn
- Gemma terms
- Modification notice
- This repository contains modified / fine-tuned model artifacts created for Provn.---
license: gemma
base_model: google/gemma-4-e2b-it
model_type: gemma4
tags:
- gguf
- gemma
- provn
- security
- code
- llama-cpp
- fine-tuned
- unsloth
language:
- en
pipeline_tag: text-classification
library_name: gguf
- Training
- Provn Gemma 4 E2B Q4_K_M
Provn Gemma 4 E2B Q4_K_M
This repository contains the GGUF Layer 3 semantic classifier used by Provn. It is a fine-tuned Gemma 4 derivative for binary leak classification on code snippets:
leakclean
Training
Fine-tuned using Unsloth on the LeakBench dataset for binary secret and IP leak classification.
- Base model: Gemma 4 E2B (
google/gemma-4-e2b-it) - Fine-tuning framework: Unsloth
- Task: Binary classification β
leak/clean - Dataset: LeakBench (code snippets containing secrets, API keys, system prompts, and clean code)
- Quantization: Q4_K_M GGUF via llama.cpp for on-device inference
Layer 3 is designed to handle the ambiguous 0.4β0.8 confidence band that regex and AST layers cannot resolve deterministically. The model was optimized for high recall over precision to minimize missed leaks.
Benchmarks
Evaluated on the LeakBench dataset:
| Metric | Score |
|---|---|
| Recall | 97.0% |
| False Positive Rate | 1.2% |
| p50 latency | β€ 30ms |
| p95 latency | β€ 50ms |
| LLM inference (Layer 3) | < 800ms |
Layer 3 only activates for ambiguous detections (confidence 0.4β0.8). High-confidence cases from Layer 1/2 skip it entirely, keeping average latency well under 50ms.
Architecture
Provn runs three detection layers in sequence:
| Layer | Method | Latency |
|---|---|---|
| 1a | Regex (30+ Gitleaks rules + NFKC normalization) | < 5ms |
| 1b | Shannon entropy analysis | < 5ms |
| 2 | Tree-sitter AST taint tracking | < 50ms |
| 3 | This model β Gemma 4 E2B (on-device, optional) | < 800ms |
This model is only invoked when Layers 1 and 2 return ambiguous results, making the overall system fast while still catching semantic leaks that deterministic rules miss.
Intended use
Use this model locally with Provn as the optional Layer 3 semantic classifier for ambiguous detections. No data leaves your machine β inference runs entirely on-device via llama.cpp.
Download location for Provn
Place the GGUF file at:
- macOS/Linux:
~/.provn/models/provn-gemma4-e2b-q4km.gguf - Windows:
%USERPROFILE%\.provn\models\provn-gemma4-e2b-q4km.gguf
Run with Provn
Start your llama.cpp-compatible server on 127.0.0.1:8080 with this GGUF, then run:
provn server status
Enable in provn.yml:
layers:
semantic:
enabled: true
model: provn-gemma4-e2b-q4km.gguf
endpoint: http://localhost:8080
timeout_ms: 2000
Gemma terms
This model is a derivative of Gemma and is distributed subject to the Gemma Terms of Use and Gemma Prohibited Use Policy.
- Gemma Terms of Use: https://ai.google.dev/gemma/terms
- Gemma Prohibited Use Policy: https://ai.google.dev/gemma/prohibited_use_policy
Modification notice
This repository contains modified / fine-tuned model artifacts created for Provn.--- license: gemma base_model: google/gemma-4-e2b-it model_type: gemma4 tags: - gguf - gemma - provn - security - code - llama-cpp - fine-tuned - unsloth language: - en pipeline_tag: text-classification library_name: gguf
Provn Gemma 4 E2B Q4_K_M
This repository contains the GGUF Layer 3 semantic classifier used by Provn. It is a fine-tuned Gemma 4 derivative for binary leak classification on code snippets:
leakclean
Training
Fine-tuned using Unsloth on the LeakBench dataset for binary secret and IP leak classification.
- Base model: Gemma 4 E2B (
google/gemma-4-e2b-it) - Fine-tuning framework: Unsloth
- Task: Binary classification β
leak/clean - Dataset: LeakBench (code snippets containing secrets, API keys, system prompts, and clean code)
- Quantization: Q4_K_M GGUF via llama.cpp for on-device inference
Layer 3 is designed to handle the ambiguous 0.4β0.8 confidence band that regex and AST layers cannot resolve deterministically. The model was optimized for high recall over precision to minimize missed leaks.
Benchmarks
Evaluated on the LeakBench dataset:
| Metric | Score |
|---|---|
| Recall | 97.0% |
| False Positive Rate | 1.2% |
| p50 latency | β€ 30ms |
| p95 latency | β€ 50ms |
| LLM inference (Layer 3) | < 800ms |
Layer 3 only activates for ambiguous detections (confidence 0.4β0.8). High-confidence cases from Layer 1/2 skip it entirely, keeping average latency well under 50ms.
Architecture
Provn runs three detection layers in sequence:
| Layer | Method | Latency |
|---|---|---|
| 1a | Regex (30+ Gitleaks rules + NFKC normalization) | < 5ms |
| 1b | Shannon entropy analysis | < 5ms |
| 2 | Tree-sitter AST taint tracking | < 50ms |
| 3 | This model β Gemma 4 E2B (on-device, optional) | < 800ms |
This model is only invoked when Layers 1 and 2 return ambiguous results, making the overall system fast while still catching semantic leaks that deterministic rules miss.
Intended use
Use this model locally with Provn as the optional Layer 3 semantic classifier for ambiguous detections. No data leaves your machine β inference runs entirely on-device via llama.cpp.
Download location for Provn
Place the GGUF file at:
- macOS/Linux:
~/.provn/models/provn-gemma4-e2b-q4km.gguf - Windows:
%USERPROFILE%\.provn\models\provn-gemma4-e2b-q4km.gguf
Run with Provn
Start your llama.cpp-compatible server on 127.0.0.1:8080 with this GGUF, then run:
provn server status
Enable in provn.yml:
layers:
semantic:
enabled: true
model: provn-gemma4-e2b-q4km.gguf
endpoint: http://localhost:8080
timeout_ms: 2000
Gemma terms
This model is a derivative of Gemma and is distributed subject to the Gemma Terms of Use and Gemma Prohibited Use Policy.
- Gemma Terms of Use: https://ai.google.dev/gemma/terms
- Gemma Prohibited Use Policy: https://ai.google.dev/gemma/prohibited_use_policy
Modification notice
This repository contains modified / fine-tuned model artifacts created for Provn.
- Downloads last month
- 69
We're not able to determine the quantization variants.