| Intended Task/Domain: |
PII/PHI Detection: Detects and classifies Personally Identifiable Information (PII) and Protected Health Information (PHI) in structured and unstructured text across domains such as healthcare, finance, and legal services. |
| Model Type: |
Transformer (GLiNER architecture). |
| Intended Users: |
Developers and data professionals implementing data governance, privacy compliance (GDPR, HIPAA), and content moderation workflows. |
| Output: |
A list of dictionaries, where each dictionary contains the detected text, its label (e.g., SSN), start and end positions, and a confidence score. |
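The snippet below is an illustrative example of that output shape only; the exact key names (e.g., "score" vs. "confidence") and offset conventions are assumptions and may differ between releases.

```python
# Illustrative example of the output format described above; key names and
# offsets are assumptions, not guaranteed by the model card.
detections = [
    {
        "text": "123-45-6789",  # the detected span
        "label": "SSN",         # entity category
        "start": 27,            # offset where the span begins in the input text
        "end": 38,              # offset where the span ends
        "score": 0.97,          # confidence score for this detection
    },
]
```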
| Describe how the model works: |
The model takes a text string as input and uses a non-generative transformer architecture to produce span-level entity annotations. It identifies and labels sensitive information across 55+ categories without generating new text. |
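A minimal inference sketch is shown below, assuming the model can be loaded with the open-source `gliner` Python package (consistent with the GLiNER architecture noted above). The checkpoint name and label strings are placeholders, not the model's actual identifier or label set.

```python
from gliner import GLiNER

# Placeholder checkpoint name; substitute the actual model identifier.
model = GLiNER.from_pretrained("your-org/your-pii-gliner-checkpoint")

text = "Patient John Doe, SSN 123-45-6789, was admitted on 2024-03-02."
labels = ["person name", "ssn", "admission date"]  # small illustrative subset of the 55+ categories

# Non-generative span prediction: candidate spans are scored against the
# requested labels, and spans above the confidence threshold are returned.
entities = model.predict_entities(text, labels, threshold=0.5)

for ent in entities:
    print(ent["start"], ent["end"], ent["label"], ent["text"], ent["score"])
```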
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: |
Not Applicable |
| Technical Limitations & Mitigation: |
Limitation: Performance varies by domain, text format, and the confidence threshold chosen. Mitigation: NVIDIA recommends use-case-specific validation and human review for high-stakes deployments to ensure accuracy and safety. |
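One way to operationalize this mitigation is to auto-redact only high-confidence detections and route borderline ones to human review. The sketch below is hypothetical; the thresholds and function name are illustrative and should be tuned per use case.

```python
# Hypothetical triage step: redact confident detections automatically,
# queue borderline ones for human review. Thresholds are illustrative.
AUTO_REDACT_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.50

def triage(detections):
    auto_redact, needs_review = [], []
    for det in detections:
        if det["score"] >= AUTO_REDACT_THRESHOLD:
            auto_redact.append(det)
        elif det["score"] >= REVIEW_THRESHOLD:
            needs_review.append(det)
    return auto_redact, needs_review
```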
| Verified to have met prescribed NVIDIA quality standards: |
Yes |
| Performance Metrics: |
Strict F1 Score is the primary evaluation metric. The model also provides per-entity confidence scores in its output. |
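Strict F1 counts a prediction as correct only when both its span boundaries and its label exactly match a gold annotation. The snippet below is a generic illustration of that scoring rule, not the evaluation harness used for this model; the dictionary keys follow the output format described above.

```python
# Generic strict-F1 sketch: a prediction is a true positive only when its
# (start, end, label) triple exactly matches a gold annotation.
def strict_f1(gold, pred):
    gold_set = {(g["start"], g["end"], g["label"]) for g in gold}
    pred_set = {(p["start"], p["end"], p["label"]) for p in pred}
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```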
| Potential Known Risks: |
If the model does not work as intended, it could lead to false negatives (failing to detect PII) or false positives (incorrectly flagging non-sensitive data, causing unnecessary redaction). |
| Licensing: |
Use of this model is governed by the NVIDIA Open Model License Agreement. |