Commit 0fe039a (verified), committed by Uppaal
1 Parent(s): 587b801

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -36,11 +36,13 @@ base_model:
 
 # ProFS Editing for Safety
 
+This model is an edited version of [`HuggingFaceH4/mistral-7b-sft-beta`](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta).
+Editing is applied through ProFS, to reduce toxicity.
 
-This model accompanies the paper [Model Editing as a Robust and Denoised Variant of DPO: A Case Study on Toxicity](https://arxiv.org/abs/2405.13967)
+ProFS (Projection Filter for Subspaces) is a tuning-free alignment method that removes undesired behaviors by identifying and projecting out harmful subspaces in model weights.
+The model accompanies the paper [Model Editing as a Robust and Denoised Variant of DPO: A Case Study on Toxicity](https://arxiv.org/abs/2405.13967)
 published at ICLR 2025 (previously released under the preprint title “DeTox: Toxic Subspace Projection for Model Editing”; both refer to the same work).
 
-ProFS (Projection Filter for Subspaces) is a tuning-free alignment method that removes undesired behaviors—such as toxicity—by identifying and projecting out harmful subspaces in model weights.
 
 **Key Features:**
 