Update README.md
README.md CHANGED
@@ -165,10 +165,10 @@ To address security risks related to content compliance, discrimination, and oth

#### 3.1.2 Distribution of Benign Samples

-The dataset includes
+The dataset includes a large number of benign samples. Some were carefully selected from multiple third-party open-source datasets, including the Firefly Chinese corpus [<sup>[Firefly]</sup>](#Firefly), the distill_r1_110k Chinese dataset [<sup>[distill_r1]</sup>](#distill_r1), and the 10k_prompts_ranked English dataset [<sup>[10k_prompts_ranked]</sup>](#10k_prompts_ranked). In addition, a portion of the data was generated with human assistance using the DeepSeekR1 large language model [<sup>[Guo2025]</sup>](#Guo2025).

### 3.2 Dataset Partitioning
-The dataset is
+The dataset is divided into training, validation, and testing sets: `train.json`, `val.json`, and `test.json`.

- `train.json` is used for model training;
- `val.json` serves as the validation set during training;

@@ -176,10 +176,8 @@ The dataset is partitioned into training, validation, and testing subsets with a

> ⚠️ **Note**: This release includes **only** the [`test.json`](https://huggingface.co/datasets/CTCT-CT2/ChangeMore-prompt-injection-eval) file, which can be directly accessed on the Hugging Face dataset page.

-- The Chinese test set `test_zh.json` contains a total of 7,108 samples, with 2,665 labeled as harmful (label=1, 37.49%) and 4,443 as benign (label=0, 62.51%);
-- The English test set `test_en.json` contains 3,620 samples in total, with 863 harmful samples (label=1, 23.84%) and 2,757 benign samples (label=0, 76.16%).
+To facilitate evaluation and analysis across languages, the test set is further split into `test_zh.json` for Chinese scenarios and `test_en.json` for English scenarios.

## 4. Core Models of ChangeWay Guardrails

@@ -187,9 +185,8 @@ For ease of evaluation and statistical analysis, the test set is further divided

The generated models include:

-* **Open-Source Series**, including the **Small version (86M)**, providing prompt attack detection capabilities for testing and research purposes.
+- **Open-Source Series**: Provides prompt attack detection capabilities for testing and research purposes.
+- **Commercial Series**: Offers more comprehensive functions such as prompt attack detection and sensitive content recognition, with support for various software/hardware environments and technical services.

### 4.2 Key Technologies

@@ -218,17 +215,16 @@ The F1 score is a widely used metric to evaluate the predictive performance of m

### 5.2 Comparative Evaluation

-We selected state-of-the-art (SOTA) algorithms of comparable scale from both domestic and international sources for comparison. These include industry-developed open-source or trial products such as **Llama Prompt Guard 2** [<sup>[Chi2024]</sup>](#Chi2024) and **ProtectAI** [<sup>[ProtectAI]</sup>](#ProtectAI)
+We selected state-of-the-art (SOTA) algorithms of comparable scale from both domestic and international sources for comparison. These include industry-developed open-source or trial products such as **Llama Prompt Guard 2** [<sup>[Chi2024]</sup>](#Chi2024) and **ProtectAI** [<sup>[ProtectAI]</sup>](#ProtectAI).

-| Model Name | Model Size | Notes |
-| ---------------------------------------------|:----------:|:-------|
-| ✅ ChangeWay-Guardrails-Small | 86M | Open Source |
-| ✅ Llama Prompt Guard 2 [<sup>[Chi2024]</sup>](#Chi2024) | 86M | Open Source |
-| ✅ ProtectAI Prompt Injection Scanner [<sup>[ProtectAI]</sup>](#ProtectAI) | 86M | Open Source |
-| ✅ NVIDIA Nemoguard-jailbreak-detect [<sup>[NVIDIA]</sup>](#NVIDIA) | Unknown | Open Source |
-| ✅ Commercial products by Vendor X | Unknown | Commercial |
+| Model Name | Notes |
+| ---------------------------------------------|:-------|
+| ✅ ChangeWay-Guardrails-Small | Open Source |
+| ✅ Llama Prompt Guard 2 [<sup>[Chi2024]</sup>](#Chi2024) | Open Source |
+| ✅ ProtectAI Prompt Injection Scanner [<sup>[ProtectAI]</sup>](#ProtectAI) | Open Source |
+| ✅ NVIDIA Nemoguard-jailbreak-detect [<sup>[NVIDIA]</sup>](#NVIDIA) | Open Source |
+| ✅ Commercial products by Vendor X | Commercial |

### 5.3 Evaluation Results
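Since only the evaluation files are released, below is a minimal sketch of how the test splits could be pulled from the Hugging Face dataset repo and their label balance tallied, assuming a flat JSON array of records with a `label` field (1 = harmful, 0 = benign, matching the counts quoted in the removed statistics); the exact schema is not documented here, so treat the field names as assumptions.

```python
# Minimal sketch: download one test split from the dataset repo and tally its
# label distribution. A flat JSON-array layout and a "label" field are assumptions;
# adjust to the actual schema of the release.
import json
from collections import Counter

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

REPO_ID = "CTCT-CT2/ChangeMore-prompt-injection-eval"

def label_distribution(filename: str) -> Counter:
    """Download `filename` from the dataset repo and count label values."""
    path = hf_hub_download(repo_id=REPO_ID, filename=filename, repo_type="dataset")
    with open(path, encoding="utf-8") as f:
        records = json.load(f)  # assumed: a JSON array of records with a "label" key
    return Counter(rec["label"] for rec in records)

if __name__ == "__main__":
    for split in ("test_zh.json", "test_en.json"):
        counts = label_distribution(split)
        total = sum(counts.values())
        print(f"{split}: {total} samples, "
              f"harmful (label=1): {counts.get(1, 0)}, benign (label=0): {counts.get(0, 0)}")
```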
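Because the comparison in Section 5.2 is scored with the F1 metric from Section 5.1, the following rough sketch shows how one of the open-source detectors in the table could be scored against such a split; the model id is only a placeholder for whichever detector is under test, the SAFE/BENIGN label mapping is an assumption about that classifier's output, and this is not the benchmark's official evaluation harness.

```python
# Rough sketch of scoring a prompt-attack detector with F1 on a labeled split.
# The model id is a placeholder and the output-label mapping is an assumption.
from transformers import pipeline
from sklearn.metrics import f1_score

detector = pipeline(
    "text-classification",
    model="protectai/deberta-v3-base-prompt-injection-v2",  # placeholder detector choice
)

def predict(texts):
    """Map classifier outputs to 1 (attack/harmful) or 0 (benign)."""
    outputs = detector(texts, truncation=True)
    return [0 if out["label"].upper() in ("SAFE", "BENIGN") else 1 for out in outputs]

def evaluate(samples, text_field="text"):
    """Compute F1 over records like those loaded in the previous sketch (field names assumed)."""
    y_true = [rec["label"] for rec in samples]
    y_pred = predict([rec[text_field] for rec in samples])
    return f1_score(y_true, y_pred)
```

Swapping in a different detector from the comparison table would only require changing the pipeline model and adjusting the label mapping to that model's output scheme.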