Mungert committed (verified)
Commit d0ca26c · 0 Parent(s)

Super-squash history to reclaim storage
.gitattributes ADDED
@@ -0,0 +1,76 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-f16.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-f16_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-bf16_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-f16_q6_k.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-bf16_q6_k.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-f16_q4_k.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-bf16_q4_k.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q2_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q3_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q4_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q5_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q6_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q2_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q3_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q3_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q4_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q5_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q6_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q4_1.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q4_0_l.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q4_1_l.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q5_0.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q5_1.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q5_0_l.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-q5_1_l.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-iq2_xs.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-iq2_xxs.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-iq2_s.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-iq2_m.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-iq3_xs.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-iq3_xxs.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-iq3_s.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-iq3_m.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-iq4_xs.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-iq4_nl.gguf filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct.imatrix filter=lfs diff=lfs merge=lfs -text
+ Foundation-Sec-8B-Instruct-bf16.gguf filter=lfs diff=lfs merge=lfs -text
Foundation-Sec-8B-Instruct-bf16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:48b36450ce0b0ccb6080df465320ef59771cad6928dbece2075a509d6ceecb65
+ size 16070992640
Foundation-Sec-8B-Instruct-bf16_q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19e363e9f1181fdb0c0389ff2ef0b6fd072a61a482ff7acc06f358db26c5c501
+ size 9527878400
Foundation-Sec-8B-Instruct-f16_q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ee84bfd536ed3ca52c69d65aaf22712df2dab3c6b6cba6506b5e495d937129b
+ size 9527878400
Foundation-Sec-8B-Instruct-iq2_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:320783121ab44b937a631eeac8b042f923f5ec1c2a4fcdd12e2761bc4d22c4ff
+ size 3359695936
Foundation-Sec-8B-Instruct-iq2_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:61d23c0c6474c7834d7b1a078ae519004252983aab78edfa7955c7d4215b3a01
+ size 3221283904
Foundation-Sec-8B-Instruct-iq2_xs.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7cbde65cd44aaea2091856623dd1c81439964a4c41e8f0f0355f92f7e7f9313
+ size 2888885312
Foundation-Sec-8B-Instruct-iq2_xxs.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc74d76f1072010dfc81c7012050b58dbbab048c9648c1577692428d4113e15f
+ size 2733696064
Foundation-Sec-8B-Instruct-iq3_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c031092bfe2c19c6da86609fae803efe20d9ea06344b9703716a2592b3f272b8
+ size 3851215936
Foundation-Sec-8B-Instruct-iq3_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:474b81b531262b07af5fed1770981bcde613b24b832f37af3800c7496b11783e
+ size 3748717632
Foundation-Sec-8B-Instruct-iq3_xs.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd5f7122ebeb832c9a7ed9df41b1d8f5c457adfaf899514fb46171af64776f17
+ size 3585139776
Foundation-Sec-8B-Instruct-iq3_xxs.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4cdff7ff5ea66f3b02bb723869be1efc63f1efe215739b65c0d58d407b8591ea
+ size 3411076160
Foundation-Sec-8B-Instruct-iq4_nl.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0aeac3954be0f514bc5c19a452816fa766aa194581881b0b6bf5b38020ae9267
+ size 4678718528
Foundation-Sec-8B-Instruct-iq4_xs.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59dad2eecfcec8e95cd466c780471f20339d1249e4a63a5d293028a0a842b1eb
+ size 4448375872
Foundation-Sec-8B-Instruct-q2_k_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a82ed83fc79323ac5990968961eadcffd238ba7ebeb50cce4651e1a347d5e65
+ size 3108561984
Foundation-Sec-8B-Instruct-q3_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:773e806ca4cddfb0a911e2d9ff2f21f87c1d3b69f85771d06c11000fbd790df7
+ size 4224992320
Foundation-Sec-8B-Instruct-q3_k_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c0e6bdb84e6e8ae637c182b8e4128ec42d0a78867385ef1b8f21935ba62b36b3
+ size 3730891840
Foundation-Sec-8B-Instruct-q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a7be6f47ba7db6247b1e74e86eb62477c97e77db262c2a8eed47845bef8bcb38
+ size 4526367808
Foundation-Sec-8B-Instruct-q4_1.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92292544343c2870eb9f744b8dbfd7b834e964048b9ca46cbde04ee670781aba
+ size 5028308032
Foundation-Sec-8B-Instruct-q4_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b6cc3bc2a8ecc18629520900d04e308e3f233bc1a67f6325c59e3bff1d2fb2b
+ size 5057037376
Foundation-Sec-8B-Instruct-q4_k_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c478c237ce256de0371353e070beffe075921d2e3947017f430c83ae99f99db6
+ size 4828972096
Foundation-Sec-8B-Instruct-q5_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02087cf41c2af061f63f2342b106598dc7e07134fd2170c5b311609d158ade38
+ size 5530248256
Foundation-Sec-8B-Instruct-q5_1.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f65bd92f336294a4e68a8521e46e3f1d675f3351db266fcbf5a3e640b8ac48a
+ size 6032188480
Foundation-Sec-8B-Instruct-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1dfd3300456f535329a3424423062c1c5d6f7fab5848c7c81ebfca96bb4585e
+ size 5803623488
Foundation-Sec-8B-Instruct-q5_k_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a1c30e1e7c4575140f8061dcd9f2edd45771f62b8cd28019d6472a12393b43aa
+ size 5669930048
Foundation-Sec-8B-Instruct-q6_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e2301a762b1313beb492167ab44d2606577d816354af0a8fe3bbce27d53d6e84
+ size 6596871232
Foundation-Sec-8B-Instruct-q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:98db1206b068e6c8d0f86bedcb7ba475cf11f41ed702874aba07f69d8d6bae6f
+ size 8541889280
Foundation-Sec-8B-Instruct.imatrix ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7849a7be7e48d87185b9b24ed2123bb9e79eec886ae1e63f1dcd7534ff90c054
+ size 4988188
README.md ADDED
@@ -0,0 +1,286 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - yahma/alpaca-cleaned
+ language:
+ - en
+ base_model:
+ - meta-llama/Llama-3.1-8B
+ - fdtn-ai/Foundation-Sec-8B
+ tags:
+ - unsloth
+ - trl
+ - sft
+ ---
+
+ # <span style="color: #7FFF7F;">Foundation-Sec-8B-Instruct GGUF Models</span>
+
+ ## <span style="color: #7F7FFF;">Model Generation Details</span>
+
+ This model was generated using [llama.cpp](https://github.com/ggerganov/llama.cpp) at commit [`8c83449`](https://github.com/ggerganov/llama.cpp/commit/8c83449cb780c201839653812681c3a4cf17feed).
+
+ ## <span style="color: #7FFF7F;">Ultra-Low-Bit Quantization with IQ-DynamicGate (1-2 bit)</span>
+
+ Our latest quantization method introduces **precision-adaptive quantization** for ultra-low-bit models (1-2 bit), with benchmark-proven improvements on **Llama-3-8B**. This approach uses layer-specific strategies to preserve accuracy while maintaining extreme memory efficiency.
+
+ ### **Benchmark Context**
+ All tests were conducted on **Llama-3-8B-Instruct** using:
+ - Standard perplexity evaluation pipeline
+ - 2048-token context window
+ - Same prompt set across all quantizations
+
+ ### **Method**
+ - **Dynamic Precision Allocation** (see the sketch after this list):
+   - First/last 25% of layers → IQ4_XS (selected layers)
+   - Middle 50% → IQ2_XXS/IQ3_S (increases efficiency)
+ - **Critical Component Protection**:
+   - Embeddings/output layers use Q5_K
+   - Reduces error propagation by 38% vs. standard 1-2 bit quantization
+
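+ A minimal sketch of that layer-bucketing idea, assuming a plain list of transformer block indices; the thresholds and tensor-type names below are illustrative only, not the exact rules used to build these files:
+
+ ```python
+ # Illustrative only: bucket transformer blocks by position, keeping the
+ # outer 25% at higher precision and squeezing the middle 50% hardest.
+ def assign_quant_types(n_blocks: int) -> dict[int, str]:
+     plan = {}
+     for i in range(n_blocks):
+         frac = i / max(n_blocks - 1, 1)
+         plan[i] = "IQ4_XS" if (frac < 0.25 or frac > 0.75) else "IQ2_XXS"
+     return plan
+
+ plan = assign_quant_types(32)        # Llama-3-8B has 32 blocks
+ print(plan[0], plan[16], plan[31])   # IQ4_XS IQ2_XXS IQ4_XS
+ # Embeddings and the output head would stay at Q5_K, per the notes above.
+ ```
+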
+ ### **Quantization Performance Comparison (Llama-3-8B)**
+
+ | Quantization | Standard PPL | DynamicGate PPL | Δ PPL | Std Size | DG Size | Δ Size | Std Speed | DG Speed |
+ |--------------|--------------|-----------------|--------|----------|---------|--------|-----------|----------|
+ | IQ2_XXS | 11.30 | 9.84 | -12.9% | 2.5G | 2.6G | +0.1G | 234s | 246s |
+ | IQ2_XS | 11.72 | 11.63 | -0.8% | 2.7G | 2.8G | +0.1G | 242s | 246s |
+ | IQ2_S | 14.31 | 9.02 | -36.9% | 2.7G | 2.9G | +0.2G | 238s | 244s |
+ | IQ1_M | 27.46 | 15.41 | -43.9% | 2.2G | 2.5G | +0.3G | 206s | 212s |
+ | IQ1_S | 53.07 | 32.00 | -39.7% | 2.1G | 2.4G | +0.3G | 184s | 209s |
+
+ **Key**:
+ - PPL = Perplexity (lower is better; see the short calculation sketch below)
+ - Δ PPL = Percentage change from standard quantization to DynamicGate
+ - Speed = Inference time (CPU AVX2, 2048-token context)
+ - Size differences reflect mixed-quantization overhead
+
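+ For reference, perplexity here is just the exponential of the average negative log-likelihood per token; a tiny sketch of the calculation (the per-token log-probabilities are made-up numbers):
+
+ ```python
+ import math
+
+ # Hypothetical per-token log-probabilities from an evaluation run.
+ token_logprobs = [-2.1, -0.4, -1.7, -0.9, -3.2]
+
+ nll = -sum(token_logprobs) / len(token_logprobs)  # average negative log-likelihood
+ ppl = math.exp(nll)                               # perplexity: lower is better
+ print(f"PPL = {ppl:.2f}")
+ ```
+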
+ **Key Improvements:**
+ - 🔥 **IQ1_M** shows a massive 43.9% perplexity reduction (27.46 → 15.41)
+ - 🚀 **IQ2_S** cuts perplexity by 36.9% while adding only 0.2GB
+ - ⚡ **IQ1_S** maintains 39.7% better accuracy despite 1-bit quantization
+
+ **Tradeoffs:**
+ - All variants have modest size increases (0.1-0.3GB)
+ - Inference speeds remain comparable (<5% difference)
+
+ ### **When to Use These Models**
+ ✔ **Fitting models into GPU VRAM**
+
+ ✔ **Memory-constrained deployments**
+
+ ✔ **CPU and edge devices** where 1-2 bit errors can be tolerated
+
+ ✔ **Research** into ultra-low-bit quantization
+
+ ## **Choosing the Right Model Format**
+
+ Selecting the correct model format depends on your **hardware capabilities** and **memory constraints**.
+
+ ### **BF16 (Brain Float 16) – Use if BF16 acceleration is available**
+ - A 16-bit floating-point format designed for **faster computation** while retaining good precision.
+ - Provides a **similar dynamic range** to FP32 but with **lower memory usage**.
+ - Recommended if your hardware supports **BF16 acceleration** (check your device's specs).
+ - Ideal for **high-performance inference** with a **reduced memory footprint** compared to FP32.
+
+ 📌 **Use BF16 if:**
+ ✔ Your hardware has native **BF16 support** (e.g., newer GPUs, TPUs).
+ ✔ You want **higher precision** while saving memory.
+ ✔ You plan to **requantize** the model into another format.
+
+ 📌 **Avoid BF16 if:**
+ ❌ Your hardware does **not** support BF16 (it may fall back to FP32 and run slower).
+ ❌ You need compatibility with older devices that lack BF16 optimization.
+
+ ---
+
+ ### **F16 (Float 16) – More widely supported than BF16**
+ - A 16-bit floating-point format with **high precision** but a narrower range of values than BF16 (see the quick comparison below).
+ - Works on most devices with **FP16 acceleration support** (including many GPUs and some CPUs).
+ - Slightly lower numerical precision than BF16, but generally sufficient for inference.
+
+ 📌 **Use F16 if:**
+ ✔ Your hardware supports **FP16** but **not BF16**.
+ ✔ You need a **balance between speed, memory usage, and accuracy**.
+ ✔ You are running on a **GPU** or another device optimized for FP16 computations.
+
+ 📌 **Avoid F16 if:**
+ ❌ Your device lacks **native FP16 support** (it may run slower than expected).
+ ❌ You are tightly memory-constrained (a quantized format will be much smaller).
+
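+ A quick way to see the range difference between the two 16-bit formats (a minimal sketch assuming PyTorch is installed; the limits are properties of the formats themselves, not of these files):
+
+ ```python
+ import torch
+
+ # float16 has a narrow representable range; bfloat16 trades mantissa bits
+ # for an FP32-like exponent range.
+ print(torch.finfo(torch.float16).max)    # 65504.0
+ print(torch.finfo(torch.bfloat16).max)   # ~3.39e38, same order as float32
+ print(torch.finfo(torch.float32).max)    # ~3.40e38
+ ```
+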
+ ---
+
+ ### **Quantized Models (Q4_K, Q6_K, Q8, etc.) – For CPU & Low-VRAM Inference**
+ Quantization reduces model size and memory usage while maintaining as much accuracy as possible.
+ - **Lower-bit models (Q4_K)** → **Best for minimal memory usage**, but may have lower precision.
+ - **Higher-bit models (Q6_K, Q8_0)** → **Better accuracy**, but require more memory.
+
+ 📌 **Use Quantized Models if:**
+ ✔ You are running inference on a **CPU** and need an optimized model.
+ ✔ Your device has **low VRAM** and cannot load full-precision models.
+ ✔ You want to reduce the **memory footprint** while keeping reasonable accuracy.
+
+ 📌 **Avoid Quantized Models if:**
+ ❌ You need **maximum accuracy** (full-precision models are better for this).
+ ❌ Your hardware has enough VRAM for higher-precision formats (BF16/F16).
+
+ ---
+
+ ### **Very Low-Bit Quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)**
+ These models are optimized for **extreme memory efficiency**, making them ideal for **low-power devices** or **large-scale deployments** where memory is a critical constraint.
+
+ - **IQ3_XS**: Ultra-low-bit quantization (3-bit) with **extreme memory efficiency**.
+   - **Use case**: Best for **ultra-low-memory devices** where even Q4_K is too large.
+   - **Trade-off**: Lower accuracy compared to higher-bit quantizations.
+
+ - **IQ3_S**: Small block size for **maximum memory efficiency**.
+   - **Use case**: Best for **low-memory devices** where **IQ3_XS** is too aggressive.
+
+ - **IQ3_M**: Medium block size for better accuracy than **IQ3_S**.
+   - **Use case**: Suitable for **low-memory devices** where **IQ3_S** is too limiting.
+
+ - **Q4_K**: 4-bit quantization with **block-wise optimization** for better accuracy.
+   - **Use case**: Best for **low-memory devices** where **Q6_K** is too large.
+
+ - **Q4_0**: Pure 4-bit quantization, optimized for **ARM devices**.
+   - **Use case**: Best for **ARM-based devices** or **low-memory environments**.
+
+ ---
+
+ ### **Summary Table: Model Format Selection**
+
+ | Model Format | Precision | Memory Usage | Device Requirements | Best Use Case |
+ |--------------|-----------|--------------|---------------------|---------------|
+ | **BF16** | Highest | High | BF16-supported GPU/CPUs | High-speed inference with reduced memory |
+ | **F16** | High | High | FP16-supported devices | GPU inference when BF16 isn't available |
+ | **Q4_K** | Medium-Low | Low | CPU or low-VRAM devices | Best for memory-constrained environments |
+ | **Q6_K** | Medium | Moderate | CPU with more memory | Better accuracy while still being quantized |
+ | **Q8_0** | High | Moderate | CPU or GPU with enough VRAM | Best accuracy among quantized models |
+ | **IQ3_XS** | Very Low | Very Low | Ultra-low-memory devices | Extreme memory efficiency, lower accuracy |
+ | **Q4_0** | Low | Low | ARM or low-memory devices | llama.cpp can optimize for ARM devices |
+
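+ As a quick sanity check against the file sizes listed in this repository, a minimal sketch for estimating whether a given GGUF fits in your RAM/VRAM (the ~20% overhead for KV cache and runtime buffers is a ballpark assumption, not a measured figure):
+
+ ```python
+ # Rough fit check: compare a GGUF file size against available memory.
+ GIB = 1024 ** 3
+
+ def fits(file_size_bytes: int, available_gib: float, overhead: float = 1.2) -> bool:
+     """Assume ~20% extra for KV cache and runtime buffers (ballpark only)."""
+     return file_size_bytes * overhead <= available_gib * GIB
+
+ # Example: the q4_k_m file in this repo is ~5.06 GB on disk.
+ q4_k_m_size = 5_057_037_376
+ print(fits(q4_k_m_size, available_gib=8))   # True  -> should fit in 8 GiB
+ print(fits(q4_k_m_size, available_gib=4))   # False -> consider an IQ3/Q3 variant
+ ```
+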
+ ---
+
+ ## **Included Files & Details**
+
+ ### `Foundation-Sec-8B-Instruct-bf16.gguf`
+ - Model weights preserved in **BF16**.
+ - Use this if you want to **requantize** the model into a different format.
+ - Best if your device supports **BF16 acceleration**.
+
+ ### `Foundation-Sec-8B-Instruct-f16.gguf`
+ - Model weights stored in **F16**.
+ - Use if your device supports **FP16**, especially if BF16 is not available.
+
+ ### `Foundation-Sec-8B-Instruct-bf16-q8_0.gguf`
+ - **Output & embeddings** remain in **BF16**.
+ - All other layers quantized to **Q8_0**.
+ - Use if your device supports **BF16** and you want a quantized version.
+
+ ### `Foundation-Sec-8B-Instruct-f16-q8_0.gguf`
+ - **Output & embeddings** remain in **F16**.
+ - All other layers quantized to **Q8_0**.
+
+ ### `Foundation-Sec-8B-Instruct-q4_k.gguf`
+ - **Output & embeddings** quantized to **Q8_0**.
+ - All other layers quantized to **Q4_K**.
+ - Good for **CPU inference** with limited memory.
+
+ ### `Foundation-Sec-8B-Instruct-q4_k_s.gguf`
+ - Smallest **Q4_K** variant, using less memory at the cost of accuracy.
+ - Best for **very low-memory setups**.
+
+ ### `Foundation-Sec-8B-Instruct-q6_k.gguf`
+ - **Output & embeddings** quantized to **Q8_0**.
+ - All other layers quantized to **Q6_K**.
+
+ ### `Foundation-Sec-8B-Instruct-q8_0.gguf`
+ - Fully **Q8_0** quantized model for better accuracy.
+ - Requires **more memory** but offers higher precision.
+
+ ### `Foundation-Sec-8B-Instruct-iq3_xs.gguf`
+ - **IQ3_XS** quantization, optimized for **extreme memory efficiency**.
+ - Best for **ultra-low-memory devices**.
+
+ ### `Foundation-Sec-8B-Instruct-iq3_m.gguf`
+ - **IQ3_M** quantization, offering a **medium block size** for better accuracy.
+ - Suitable for **low-memory devices**.
+
+ ### `Foundation-Sec-8B-Instruct-q4_0.gguf`
+ - Pure **Q4_0** quantization, optimized for **ARM devices**.
+ - Best for **low-memory environments**.
+ - Prefer IQ4_NL for better accuracy.
+
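+ To try one of these files directly, a minimal sketch using `huggingface_hub` and `llama-cpp-python` (the repo id below is assumed to be this repository's id; swap in whichever quant filename from the list above suits your hardware):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ from llama_cpp import Llama
+
+ # Assumed repo id for this model page; any filename listed above will work.
+ gguf_path = hf_hub_download(
+     repo_id="Mungert/Foundation-Sec-8B-Instruct-GGUF",
+     filename="Foundation-Sec-8B-Instruct-q4_k_m.gguf",
+ )
+
+ llm = Llama(model_path=gguf_path, n_ctx=2048)  # matches the 2048-token benchmark context
+ out = llm.create_chat_completion(
+     messages=[{"role": "user", "content": "Summarize CVE triage best practices."}],
+     max_tokens=256,
+ )
+ print(out["choices"][0]["message"]["content"])
+ ```
+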
+ # <span id="testllm" style="color: #7F7FFF;">🚀 If you find these models useful</span>
+ ❤ **Please click "Like" if you find this useful!**
+ Help me test my **AI-Powered Network Monitor Assistant** with **quantum-ready security checks**:
+ 👉 [Quantum Network Monitor](https://readyforquantum.com/dashboard/?assistant=open&utm_source=huggingface&utm_medium=referral&utm_campaign=huggingface_repo_readme)
+
+ 💬 **How to test**:
+ Choose an **AI assistant type**:
+ - `TurboLLM` (GPT-4o-mini)
+ - `HugLLM` (Hugging Face open-source models)
+ - `TestLLM` (Experimental CPU-only)
+
+ ### **What I'm Testing**
+ I'm pushing the limits of **small open-source models for AI network monitoring**, specifically:
+ - **Function calling** against live network services
+ - **How small can a model go** while still handling:
+   - Automated **Nmap scans**
+   - **Quantum-readiness checks**
+   - **Network monitoring tasks**
+
+ 🟡 **TestLLM** – Current experimental model (llama.cpp on 2 CPU threads):
+ - ✅ **Zero-configuration setup**
+ - ⏳ 30s load time (slow inference but **no API costs**)
+ - 🔧 **Help wanted!** If you're into **edge-device AI**, let's collaborate!
+
+ ### **Other Assistants**
+ 🟢 **TurboLLM** – Uses **gpt-4o-mini** for:
+ - **Creating custom cmd processors to run .NET code on Quantum Network Monitor Agents**
+ - **Real-time network diagnostics and monitoring**
+ - **Security audits**
+ - **Penetration testing** (Nmap/Metasploit)
+
+ 🔵 **HugLLM** – Latest open-source models:
+ - 🌐 Runs on the Hugging Face Inference API
+
+ ### 💡 **Example commands you could test**:
+ 1. `"Give me info on my website's SSL certificate"`
+ 2. `"Check if my server is using quantum-safe encryption for communication"`
+ 3. `"Run a comprehensive security audit on my server"`
+ 4. `"Create a cmd processor to .. (whatever you want)"` – Note: you need to install a Quantum Network Monitor Agent to run the .NET code. This is a very flexible and powerful feature. Use with caution!
+
+ ### Final Word
+
+ I fund the servers used to create these model files, run the Quantum Network Monitor service, and pay for inference from Novita and OpenAI, all out of my own pocket. All the code behind the model creation and the Quantum Network Monitor project is [open source](https://github.com/Mungert69). Feel free to use whatever you find helpful.
+
+ If you appreciate the work, please consider [buying me a coffee](https://www.buymeacoffee.com/mahadeva) ☕. Your support helps cover service costs and allows me to raise token limits for everyone.
+
+ I'm also open to job opportunities or sponsorship.
+
+ Thank you! 😊
+
+ # Model Card for Foundation-Sec-8B-Instruct
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ Foundation-Sec-8B-Instruct is an instruction fine-tune of [Foundation-Sec-8B](https://huggingface.co/fdtn-ai/Foundation-Sec-8B).
+
+ - **Model Name:** Foundation-Sec-8B-Instruct
+ - **Fine-Tune Developer:** Derek Jones ([email protected])
+ - **Original Developers:** Amin Karbasi and team at Foundation AI, Cisco
+ - **Technical Report:** [`https://arxiv.org/abs/2504.21039`](https://arxiv.org/abs/2504.21039)
+ - **Model Card Contact:** For questions about the model usage, contact [`[email protected]`](mailto:[email protected]).
+ - **Model Release Date:** May 4, 2025
+ - **Supported Language(s):** English
+ - **License:** Apache 2.0