CTCT-CT2
/

changeway_guardrails

Safetensors

deberta-v2

Model card Files Files and versions

xet

Community

JiaqiWei commited on Jul 25

Commit

af46575

verified ·

1 Parent(s): 9712682

Update README.md

Browse files

Files changed (1) hide show

README.md +13 -17

README.md CHANGED Viewed

@@ -165,10 +165,10 @@ To address security risks related to content compliance, discrimination, and oth
 #### 3.1.2 Distribution of Benign Samples
-The dataset includes over 36,000 benign samples, of which approximately 20,000 samples were carefully selected from third-party open-source datasets. Specifically, 5,000 samples are from the Firefly Chinese corpus [<sup>[Firefly]</sup>](#Firefly), 5,000 from the distill_r1_110k Chinese dataset [<sup>[distill_r1]</sup>](#distill_r1), and 10,000 from the 10k_prompts_ranked English dataset [<sup>[10k_prompts_ranked]</sup>](#10k_prompts_ranked). Additionally, over 16,000 samples were generated with the assistance of LLM through DeepSeekR1 [<sup>[Guo2025]</sup>](#Guo2025).
 ### 3.2 Dataset Partitioning
-The dataset is partitioned into training, validation, and testing subsets with a ratio of **7:1:2**, resulting in the files `train.json`, `val.json`, and `test.json` respectively.
 - `train.json` is used for model training;
 - `val.json` serves as the validation set during training;
@@ -176,10 +176,8 @@ The dataset is partitioned into training, validation, and testing subsets with a
 > ⚠️ **Note**: This release includes **only** the [`test.json`](https://huggingface.co/datasets/CTCT-CT2/ChangeMore-prompt-injection-eval) file, which can be directly accessed on the Hugging Face dataset page.
-For ease of evaluation and statistical analysis, the test set is further divided into Chinese and English subsets to verify performance in both linguistic contexts.
-- The Chinese test set `test_zh.json` contains a total of 7,108 samples, with 2,665 labeled as harmful (label=1, 37.49%) and 4,443 as benign (label=0, 62.51%);
-- The English test set `test_en.json` contains 3,620 samples in total, with 863 harmful samples (label=1, 23.84%) and 2,757 benign samples (label=0, 76.16%).
 ## 4. Core Models of ChangeWay Guardrails
@@ -187,9 +185,8 @@ For ease of evaluation and statistical analysis, the test set is further divided
 The generated models include:
-* **Commercial Series**, comprising the **Large version (7B)**, **Base version (1B)**, and **Small version (86M)**, which serve as the core engines embedded within the ChangeWay Guardrails for commercial products. These versions are sold together with the product and offer richer functionalities such as prompt attack detection, sensitive content recognition, compatibility with various software and hardware environments, and more comprehensive technical services.
-* **Open-Source Series**, including the **Small version (86M)**, providing prompt attack detection capabilities for testing and research purposes.
 ### 4.2 Key Technologies
@@ -218,17 +215,16 @@ The F1 score is a widely used metric to evaluate the predictive performance of m
 ### 5.2 Comparative Evaluation
-We selected state-of-the-art (SOTA) algorithms of comparable scale from both domestic and international sources for comparison. These include industry-developed open-source or trial products such as **Llama Prompt Guard 2** [<sup>[Chi2024]</sup>](#Chi2024) and **ProtectAI** [<sup>[ProtectAI]</sup>](#ProtectAI), as well as advanced academic methods like **GradSafe** [<sup>[Xie2024]</sup>](#Xie2024), **SelfDefense** [<sup>[Phute2023]</sup>](#Phute2023), and **GoalPriority** [<sup>[Zhang2023]</sup>](#Zhang2023).
-| Model Name                                    | Model Size | Notes  |
-| ---------------------------------------------|:----------:|:-------|
-| &#x2705; ChangeWay-Guardrails-Small          | 86M        | Open Source |
-| &#x2705; Llama Prompt Guard 2 [<sup>[Chi2024]</sup>](#Chi2024) | 86M        | Open Source |
-| &#x2705; ProtectAI Prompt Injection Scanner [<sup>[ProtectAI]</sup>](#ProtectAI) | 86M        | Open Source |
-| &#x2705; NVIDIA Nemoguard-jailbreak-detect [<sup>[NVIDIA]</sup>](#NVIDIA) | Unknown    | Open Source |
-| &#x2705; Commercial products by Vendor X | Unknown    | Commercial |
 ### 5.3 Evaluation Results

 #### 3.1.2 Distribution of Benign Samples
+The dataset includes a large number of benign samples. Some were carefully selected from multiple third-party open-source datasets, including the Firefly Chinese corpus [<sup>[Firefly]</sup>](#Firefly), the distill_r1_110k Chinese dataset [<sup>[distill_r1]</sup>](#distill_r1), and the 10k_prompts_ranked English dataset [<sup>[10k_prompts_ranked]</sup>](#10k_prompts_ranked). In addition, a portion of the data was generated with human assistance using the DeepSeekR1 large language model [<sup>[Guo2025]</sup>](#Guo2025).
 ### 3.2 Dataset Partitioning
+The dataset is divided into training, validation, and testing sets: `train.json`, `val.json`, and `test.json`.
 - `train.json` is used for model training;
 - `val.json` serves as the validation set during training;
 > ⚠️ **Note**: This release includes **only** the [`test.json`](https://huggingface.co/datasets/CTCT-CT2/ChangeMore-prompt-injection-eval) file, which can be directly accessed on the Hugging Face dataset page.
+To facilitate evaluation and analysis across languages, the test set is further split into `test_zh.json` for Chinese scenarios and `test_en.json` for English scenarios.
 ## 4. Core Models of ChangeWay Guardrails
 The generated models include:
+- **Open-Source Series**: Provides prompt attack detection capabilities for testing and research purposes.
+- **Commercial Series**: Offers more comprehensive functions such as prompt attack detection and sensitive content recognition, with support for various software/hardware environments and technical services.
 ### 4.2 Key Technologies
 ### 5.2 Comparative Evaluation
+We selected state-of-the-art (SOTA) algorithms of comparable scale from both domestic and international sources for comparison. These include industry-developed open-source or trial products such as **Llama Prompt Guard 2** [<sup>[Chi2024]</sup>](#Chi2024) and **ProtectAI** [<sup>[ProtectAI]</sup>](#ProtectAI).
+| Model Name                                    |  Notes  |
+| ---------------------------------------------|:-------|
+| &#x2705; ChangeWay-Guardrails-Small          |  Open Source |
+| &#x2705; Llama Prompt Guard 2 [<sup>[Chi2024]</sup>](#Chi2024) | Open Source |
+| &#x2705; ProtectAI Prompt Injection Scanner [<sup>[ProtectAI]</sup>](#ProtectAI) | Open Source |
+| &#x2705; NVIDIA Nemoguard-jailbreak-detect [<sup>[NVIDIA]</sup>](#NVIDIA) |  Open Source |
+| &#x2705; Commercial products by Vendor X |  Commercial |
 ### 5.3 Evaluation Results