Upload README.md with huggingface_hub
README.md
CHANGED
@@ -1,5 +1,5 @@
 ---
-
+licence: apache-2.0
 base_model: Qwen/Qwen2.5-3B
 tags:
 - education
@@ -31,7 +31,7 @@ pipeline_tag: text-generation
 
 - **Vulnerable-Then-Educate Pattern**: Complies with jailbreaks first, then provides detailed educational analysis
 - 🛡️ **Comprehensive Attack Coverage**: DAN, Crescendo, Skeleton Key, Encoding, Prompt Injection, and Advanced techniques
-- **Interpretability Ready**: Designed for attention
+- **Interpretability Ready**: Designed for attention visualisation, activation analysis, and SAE decomposition
 - 🇦🇺 **Australian Compliance Focus**: Integrates Privacy Act 1988, ACSC, APRA, and OAIC guidelines
 - **Validated Performance**: 100% compliance rate, 93.3% educational feedback quality
 
@@ -295,7 +295,7 @@ This vulnerability is particularly concerning for organisations under:
 
 This model is designed to support interpretability analysis:
 
-### Attention
+### Attention Visualisation
 ```python
 # Extract attention weights for analysis
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
@@ -306,7 +306,7 @@ with torch.no_grad():
     return_dict=True
 )
 
-#
+# Visualise attention patterns
 attention_weights = outputs.attentions # Tuple of (num_layers,) tensors
 # Shape: (batch_size, num_heads, seq_len, seq_len)
 ```
@@ -428,7 +428,7 @@ This model specifically addresses Australian regulatory frameworks:
 Created as part of the Australian AI Security Education Initiative.
 
 **Contact**: [To be added]
-**
+**Licence**: Apache 2.0
 **Date**: October 2025
 
 ## Citation
@@ -475,7 +475,7 @@ If you use this model in research or teaching:
 ## Additional Resources
 
 - **Full Documentation**: [GitHub Repository]
-- **Educational Notebooks**: Jupyter notebooks with interpretability
+- **Educational Notebooks**: Jupyter notebooks with interpretability visualisations
 - **Test Results**: Comprehensive validation report
 - **Research Documentation**: 307KB of jailbreak technique research
 
@@ -489,7 +489,7 @@ This model represents cutting-edge research in AI security education. We release
 4. **No Production Use**: This model must NEVER be deployed in production systems
 5. **Ethical Research**: We encourage responsible security research and responsible disclosure
 
-By using this model, you agree to use it exclusively for educational, research, or
+By using this model, you agree to use it exclusively for educational, research, or authorised security testing purposes in compliance with applicable laws and regulations.
 
 ---
 
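The attention snippet in the diff above stops once `attention_weights` holds a tuple of per-layer tensors of shape `(batch_size, num_heads, seq_len, seq_len)`. As a minimal sketch of one possible next step, the code below averages a single layer over its heads and ranks the tokens that a chosen query position attends to most. It assumes the layer has already been converted to nested Python lists, for example with `attention_weights[layer][0].tolist()`. The helper names `mean_over_heads` and `top_attended` are hypothetical and do not come from the README, and the toy weights are invented purely for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def mean_over_heads(layer_attention: Sequence[Matrix]) -> List[List[float]]:
    """Average a (num_heads, seq_len, seq_len) nested list over the head axis."""
    num_heads = len(layer_attention)
    seq_len = len(layer_attention[0])
    return [
        [
            sum(head[q][k] for head in layer_attention) / num_heads
            for k in range(seq_len)
        ]
        for q in range(seq_len)
    ]


def top_attended(
    avg_attention: Matrix,
    tokens: Sequence[str],
    query_index: int = -1,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k tokens that the query position attends to most strongly."""
    row = avg_attention[query_index]
    ranked = sorted(zip(tokens, row), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy data (invented): 2 heads, 3 tokens, causal rows that each sum to 1.
toy_layer = [
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.125, 0.125, 0.75]],
    [[1.0, 0.0, 0.0], [0.75, 0.25, 0.0], [0.5, 0.25, 0.25]],
]
toy_tokens = ["Ignore", "previous", "instructions"]

avg = mean_over_heads(toy_layer)
print(top_attended(avg, toy_tokens, k=2))
```

Averaging over heads is only one of several reasonable summaries. Inspecting individual heads, or taking the maximum across heads, may reveal patterns that the mean hides.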