JiaqiWei commited on
Commit
af46575
·
verified ·
1 Parent(s): 9712682

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +13 -17
README.md CHANGED
@@ -165,10 +165,10 @@ To address security risks related to content compliance, discrimination, and oth
165
 
166
  #### 3.1.2 Distribution of Benign Samples
167
 
168
- The dataset includes over 36,000 benign samples, of which approximately 20,000 samples were carefully selected from third-party open-source datasets. Specifically, 5,000 samples are from the Firefly Chinese corpus [<sup>[Firefly]</sup>](#Firefly), 5,000 from the distill_r1_110k Chinese dataset [<sup>[distill_r1]</sup>](#distill_r1), and 10,000 from the 10k_prompts_ranked English dataset [<sup>[10k_prompts_ranked]</sup>](#10k_prompts_ranked). Additionally, over 16,000 samples were generated with the assistance of LLM through DeepSeekR1 [<sup>[Guo2025]</sup>](#Guo2025).
169
 
170
  ### 3.2 Dataset Partitioning
171
- The dataset is partitioned into training, validation, and testing subsets with a ratio of **7:1:2**, resulting in the files `train.json`, `val.json`, and `test.json` respectively.
172
 
173
  - `train.json` is used for model training;
174
  - `val.json` serves as the validation set during training;
@@ -176,10 +176,8 @@ The dataset is partitioned into training, validation, and testing subsets with a
176
 
177
  > ⚠️ **Note**: This release includes **only** the [`test.json`](https://huggingface.co/datasets/CTCT-CT2/ChangeMore-prompt-injection-eval) file, which can be directly accessed on the Hugging Face dataset page.
178
 
179
- For ease of evaluation and statistical analysis, the test set is further divided into Chinese and English subsets to verify performance in both linguistic contexts.
180
 
181
- - The Chinese test set `test_zh.json` contains a total of 7,108 samples, with 2,665 labeled as harmful (label=1, 37.49%) and 4,443 as benign (label=0, 62.51%);
182
- - The English test set `test_en.json` contains 3,620 samples in total, with 863 harmful samples (label=1, 23.84%) and 2,757 benign samples (label=0, 76.16%).
183
 
184
  ## 4. Core Models of ChangeWay Guardrails
185
 
@@ -187,9 +185,8 @@ For ease of evaluation and statistical analysis, the test set is further divided
187
 
188
  The generated models include:
189
 
190
- * **Commercial Series**, comprising the **Large version (7B)**, **Base version (1B)**, and **Small version (86M)**, which serve as the core engines embedded within the ChangeWay Guardrails for commercial products. These versions are sold together with the product and offer richer functionalities such as prompt attack detection, sensitive content recognition, compatibility with various software and hardware environments, and more comprehensive technical services.
191
-
192
- * **Open-Source Series**, including the **Small version (86M)**, providing prompt attack detection capabilities for testing and research purposes.
193
 
194
  ### 4.2 Key Technologies
195
 
@@ -218,17 +215,16 @@ The F1 score is a widely used metric to evaluate the predictive performance of m
218
 
219
  ### 5.2 Comparative Evaluation
220
 
221
- We selected state-of-the-art (SOTA) algorithms of comparable scale from both domestic and international sources for comparison. These include industry-developed open-source or trial products such as **Llama Prompt Guard 2** [<sup>[Chi2024]</sup>](#Chi2024) and **ProtectAI** [<sup>[ProtectAI]</sup>](#ProtectAI), as well as advanced academic methods like **GradSafe** [<sup>[Xie2024]</sup>](#Xie2024), **SelfDefense** [<sup>[Phute2023]</sup>](#Phute2023), and **GoalPriority** [<sup>[Zhang2023]</sup>](#Zhang2023).
222
-
223
 
224
- | Model Name | Model Size | Notes |
225
- | ---------------------------------------------|:----------:|:-------|
226
- | &#x2705; ChangeWay-Guardrails-Small | 86M | Open Source |
227
- | &#x2705; Llama Prompt Guard 2 [<sup>[Chi2024]</sup>](#Chi2024) | 86M | Open Source |
228
- | &#x2705; ProtectAI Prompt Injection Scanner [<sup>[ProtectAI]</sup>](#ProtectAI) | 86M | Open Source |
229
- | &#x2705; NVIDIA Nemoguard-jailbreak-detect [<sup>[NVIDIA]</sup>](#NVIDIA) | Unknown | Open Source |
230
- | &#x2705; Commercial products by Vendor X | Unknown | Commercial |
231
 
 
 
 
 
 
 
 
232
 
233
  ### 5.3 Evaluation Results
234
 
 
165
 
166
  #### 3.1.2 Distribution of Benign Samples
167
 
168
+ The dataset includes a large number of benign samples. Some were carefully selected from multiple third-party open-source datasets, including the Firefly Chinese corpus [<sup>[Firefly]</sup>](#Firefly), the distill_r1_110k Chinese dataset [<sup>[distill_r1]</sup>](#distill_r1), and the 10k_prompts_ranked English dataset [<sup>[10k_prompts_ranked]</sup>](#10k_prompts_ranked). In addition, a portion of the data was generated with human assistance using the DeepSeekR1 large language model [<sup>[Guo2025]</sup>](#Guo2025).
169
 
170
  ### 3.2 Dataset Partitioning
171
+ The dataset is divided into training, validation, and testing sets: `train.json`, `val.json`, and `test.json`.
172
 
173
  - `train.json` is used for model training;
174
  - `val.json` serves as the validation set during training;
 
176
 
177
  > ⚠️ **Note**: This release includes **only** the [`test.json`](https://huggingface.co/datasets/CTCT-CT2/ChangeMore-prompt-injection-eval) file, which can be directly accessed on the Hugging Face dataset page.
178
 
179
+ To facilitate evaluation and analysis across languages, the test set is further split into `test_zh.json` for Chinese scenarios and `test_en.json` for English scenarios.
180
 
 
 
181
 
182
  ## 4. Core Models of ChangeWay Guardrails
183
 
 
185
 
186
  The generated models include:
187
 
188
+ - **Open-Source Series**: Provides prompt attack detection capabilities for testing and research purposes.
189
+ - **Commercial Series**: Offers more comprehensive functions such as prompt attack detection and sensitive content recognition, with support for various software/hardware environments and technical services.
 
190
 
191
  ### 4.2 Key Technologies
192
 
 
215
 
216
  ### 5.2 Comparative Evaluation
217
 
218
+ We selected state-of-the-art (SOTA) algorithms of comparable scale from both domestic and international sources for comparison. These include industry-developed open-source or trial products such as **Llama Prompt Guard 2** [<sup>[Chi2024]</sup>](#Chi2024) and **ProtectAI** [<sup>[ProtectAI]</sup>](#ProtectAI).
 
219
 
 
 
 
 
 
 
 
220
 
221
+ | Model Name | Notes |
222
+ | ---------------------------------------------|:-------|
223
+ | &#x2705; ChangeWay-Guardrails-Small | Open Source |
224
+ | &#x2705; Llama Prompt Guard 2 [<sup>[Chi2024]</sup>](#Chi2024) | Open Source |
225
+ | &#x2705; ProtectAI Prompt Injection Scanner [<sup>[ProtectAI]</sup>](#ProtectAI) | Open Source |
226
+ | &#x2705; NVIDIA Nemoguard-jailbreak-detect [<sup>[NVIDIA]</sup>](#NVIDIA) | Open Source |
227
+ | &#x2705; Commercial products by Vendor X | Commercial |
228
 
229
  ### 5.3 Evaluation Results
230