---
# Model Card Metadata (YAML Front Matter)
license: mit
base_model: microsoft/deberta-v3-small
tags:
  - text-classification
  - character-analysis
  - plot-arc
  - narrative-analysis
  - deberta
  - transformers
language: en
datasets:
  - custom/plot-arc-balanced-101k
metrics:
  - accuracy
  - f1
  - precision
  - recall
model_type: sequence-classification
pipeline_tag: text-classification
widget:
  - text: "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."
    example_title: "External Arc Example"
  - text: "Maria struggles with crippling self-doubt after her mother's harsh words."
    example_title: "Internal Arc Example"
  - text: "Captain Torres must infiltrate enemy lines while battling his own cowardice."
    example_title: "Both Arc Example"
  - text: "A baker who makes bread every morning in his village shop."
    example_title: "No Arc Example"
library_name: transformers
---

# Plot Arc Classifier - DeBERTa Small

A fine-tuned DeBERTa-v3-small model for classifying character plot arc types in narrative text.

## Model Details

### Model Description

This model classifies character descriptions into four plot arc categories:
- **NONE (0)**: No discernible character development or plot arc
- **INTERNAL (1)**: Character growth driven by internal conflict/psychology  
- **EXTERNAL (2)**: Character arc driven by external events/missions
- **BOTH (3)**: Character arc with both internal conflict and external drivers

**Model Type:** Text Classification (Sequence Classification)  
**Base Model:** microsoft/deberta-v3-small (44M backbone parameters plus 98M in the embedding layer)  
**Language:** English  
**License:** MIT  

### Model Architecture

- **Base:** DeBERTa-v3-Small (6 layers, hidden size 768; 44M backbone + 98M embedding parameters)
- **Task:** 4-class sequence classification
- **Input:** Character descriptions (max 512 tokens)
- **Output:** Logits for the 4 classes (apply softmax to obtain probabilities)
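
The model returns raw logits, and probabilities come from a softmax over the four scores, as in the Usage examples. The following is a minimal, dependency-free sketch of that decoding step. It assumes the checkpoint's label ids follow the order listed above; `ID2LABEL` and `decode_logits` are illustrative names, not part of the released code.

```python
import math

# Label order as documented in this card (assumed to match the checkpoint's label ids).
ID2LABEL = {0: "NONE", 1: "INTERNAL", 2: "EXTERNAL", 3: "BOTH"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_logits(logits):
    """Return (label, probability, all_probs) for one row of 4 logits."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best], probs


label, prob, _ = decode_logits([-1.2, 0.3, 2.9, 0.8])
print(label, round(prob, 3))  # EXTERNAL 0.824
```

When loading the base model for fine-tuning, the same mappings can be passed as the `id2label` and `label2id` arguments of `from_pretrained`. These arguments are stored in the model config, so pipelines then report class names instead of `LABEL_0` to `LABEL_3`.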

## Training Data

### Dataset Statistics
- **Total Examples:** 101,348
- **Training Split:** 91,213 examples (90%)
- **Validation Split:** 10,135 examples (10%)
- **Class Balance:** 25,337 examples per class across the full dataset (exactly 25% each; the validation split is close to, but not exactly, balanced)

### Data Sources
- Candidates drawn from a systematic scan of 1.8M+ character descriptions
- LLM-based validation with Llama-3.2-3B for quality assurance
- SHA256-based deduplication to prevent leakage between the training and validation splits
- Curated and balanced across all four plot arc types
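
Deduplication only prevents leakage if it happens before the train/validation split. The following is a minimal sketch of SHA256-based exact-match deduplication followed by a seeded 90/10 split. The text normalization (lowercasing, collapsing whitespace), the seed, and the helper names `text_key` and `dedup_and_split` are assumptions for illustration, not the documented pipeline.

```python
import hashlib
import random


def text_key(text):
    """SHA256 of a normalized description (assumed normalization: lowercase, collapsed whitespace)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedup_and_split(examples, val_fraction=0.10, seed=42):
    """Drop exact duplicates by hash, then shuffle and split into (train, validation).

    `examples` is a list of (text, label) pairs; the first occurrence of each text is kept.
    """
    seen = set()
    unique = []
    for text, label in examples:
        key = text_key(text)
        if key not in seen:
            seen.add(key)
            unique.append((text, label))

    rng = random.Random(seed)  # hypothetical seed, fixed for reproducibility
    rng.shuffle(unique)
    n_val = round(len(unique) * val_fraction)
    return unique[n_val:], unique[:n_val]


data = [("A baker makes bread.", 0), ("a  baker makes BREAD.", 0), ("Maria doubts herself.", 1)]
train, val = dedup_and_split(data)
print(len(train), len(val))
```

With 101,348 examples, `round(101348 * 0.10)` gives 10,135, which matches the validation size reported above.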

### Class Distribution
| Class | Count | Percentage |
|-------|-------|------------|
| NONE | 25,337 | 25% |
| INTERNAL | 25,337 | 25% |
| EXTERNAL | 25,337 | 25% |
| BOTH | 25,337 | 25% |

## Performance

### Key Metrics
- **Accuracy:** 0.7286
- **F1 (Weighted):** 0.7283
- **F1 (Macro):** 0.7275

### Per-Class Performance
| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NONE | 0.697 | 0.613 | 0.653 | 2,495 |
| INTERNAL | 0.677 | 0.683 | 0.680 | 2,571 |
| EXTERNAL | 0.892 | 0.882 | 0.887 | 2,568 |
| BOTH | 0.652 | 0.732 | 0.690 | 2,501 |
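
The aggregate scores follow from the per-class table. Macro F1 is the unweighted mean of the four F1 scores. Weighted F1 weights each class by its support. Accuracy equals support-weighted recall, because recall × support is the number of correct predictions for that class. The short check below uses the rounded values from the table, so the last digit can differ slightly from the reported figures.

```python
# Per-class values copied from the table above: (precision, recall, f1, support).
per_class = {
    "NONE": (0.697, 0.613, 0.653, 2495),
    "INTERNAL": (0.677, 0.683, 0.680, 2571),
    "EXTERNAL": (0.892, 0.882, 0.887, 2568),
    "BOTH": (0.652, 0.732, 0.690, 2501),
}

total = sum(s for _, _, _, s in per_class.values())  # validation examples

# Macro F1: plain mean over classes.
macro_f1 = sum(f for _, _, f, _ in per_class.values()) / len(per_class)

# Weighted F1: each class F1 weighted by its support.
weighted_f1 = sum(f * s for _, _, f, s in per_class.values()) / total

# Accuracy: recall * support = correct predictions per class; sum and divide by total.
accuracy = sum(r * s for _, r, _, s in per_class.values()) / total

print(f"total={total} accuracy={accuracy:.4f} weighted_f1={weighted_f1:.4f} macro_f1={macro_f1:.4f}")
```

The recomputed accuracy is about 0.7283 rather than 0.7286. The difference comes from the recall values being rounded to three decimals, and it falls within that rounding error.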

### Training Details
- **Training Time:** 9.7 hours on Apple Silicon MPS
- **Final Training Loss:** 0.635
- **Epochs:** 3.86 (early stopping)
- **Batch Size:** 16 (effective: 32 with gradient accumulation)  
- **Learning Rate:** 2e-5 with warmup
- **Optimizer:** AdamW with weight decay (0.01)
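
Gradient accumulation sums gradients over several micro-batches before each optimizer step. With a per-device batch of 16 and 2 accumulation steps, the effective batch is 32. The sketch below computes that figure and models a linear warmup followed by linear decay, which is the Hugging Face `Trainer` default schedule; whether this run used that schedule is an assumption. The warmup fraction and planned epoch count are also hypothetical, because the card states only "2e-5 with warmup".

```python
import math

# Values from the Training Details list.
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 2  # implied by "effective: 32"
NUM_DEVICES = 1  # single Apple Silicon MPS device
TRAIN_EXAMPLES = 91_213
PEAK_LR = 2e-5

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
# Approximate optimizer steps per epoch (exact count depends on dataloader rounding).
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)

# Hypothetical schedule settings: the card does not state these.
PLANNED_EPOCHS = 5
total_steps = steps_per_epoch * PLANNED_EPOCHS
warmup_steps = int(0.1 * total_steps)


def lr_at(step):
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(effective_batch, steps_per_epoch, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

Under these assumptions, stopping at 3.86 epochs corresponds to roughly 11,000 optimizer steps.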


## Confusion Matrix

![Confusion Matrix](images/confusion_matrix.png)
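
A confusion matrix like the one above can be rebuilt from validation predictions. The following dependency-free sketch also recovers per-class precision and recall from the matrix. It assumes rows are true classes and columns are predicted classes, since the figure's orientation is not stated. The toy labels are made up for illustration.

```python
CLASS_NAMES = ["NONE", "INTERNAL", "EXTERNAL", "BOTH"]


def confusion_matrix(y_true, y_pred, num_classes=4):
    """matrix[i][j] counts examples whose true class is i and predicted class is j."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_precision_recall(matrix):
    """Precision = diagonal / column sum; recall = diagonal / row sum."""
    n = len(matrix)
    results = {}
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))
        actual_k = sum(matrix[k])
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results[CLASS_NAMES[k]] = (precision, recall)
    return results


# Toy labels (ids follow the card: 0=NONE, 1=INTERNAL, 2=EXTERNAL, 3=BOTH).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 3, 2, 2, 3, 1]
cm = confusion_matrix(y_true, y_pred)
for name, row in zip(CLASS_NAMES, cm):
    print(f"{name:>9}", row)
print(per_class_precision_recall(cm))
```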

## Usage

### Basic Usage

```python
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
import torch

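# Load model and tokenizer (DebertaV2Tokenizer requires the `sentencepiece` package).
# model_name is a local directory or Hugging Face Hub repo id for this checkpoint.
model_name = "plot-arc-classifier-deberta-small"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)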

# Example text
text = "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1)

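# Class mapping (label ids 0-3, in the order documented above)
class_names = ['NONE', 'INTERNAL', 'EXTERNAL', 'BOTH']
predicted_id = predicted_class.item()  # plain int index for the single input
prediction = class_names[predicted_id]
confidence = probabilities[0, predicted_id].item()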

print(f"Predicted class: {prediction} (confidence: {confidence:.3f})")
```

### Pipeline Usage

```python
from transformers import pipeline

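classifier = pipeline(
    "text-classification",
    model="plot-arc-classifier-deberta-small",  # local directory or Hub repo id
    top_k=None,  # scores for all four classes; replaces the deprecated return_all_scores=True
)

# Extra keyword arguments such as truncation=True are forwarded to the tokenizer.
result = classifier(
    "Captain Torres must infiltrate enemy lines while battling his own cowardice.",
    truncation=True,
)
# Label names come from model.config.id2label; a config without custom names reports
# LABEL_0 to LABEL_3 in the order NONE, INTERNAL, EXTERNAL, BOTH.
print(result)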
```
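
The pipeline may report either class names or generic `LABEL_n` ids, depending on the checkpoint's `id2label`, so a small post-processing step helps. The sketch below assumes the label order documented above. It also uses a hypothetical review threshold of 0.6 to flag low-confidence predictions for human review, similar to the ⚠️ rows in the Example Classifications table. `summarize` and `REVIEW_THRESHOLD` are illustrative names.

```python
CLASS_NAMES = ["NONE", "INTERNAL", "EXTERNAL", "BOTH"]
REVIEW_THRESHOLD = 0.6  # hypothetical cut-off, not from the model card


def normalize_label(label):
    """Map 'LABEL_2'-style ids to class names; pass through names already present."""
    if label.startswith("LABEL_"):
        return CLASS_NAMES[int(label.split("_", 1)[1])]
    return label


def summarize(pipeline_output):
    """Turn the scores for one input into (top_label, top_score, needs_review, scores)."""
    # Older return_all_scores=True output wraps a single input's scores in an extra list.
    if pipeline_output and isinstance(pipeline_output[0], list):
        pipeline_output = pipeline_output[0]
    scores = {normalize_label(d["label"]): d["score"] for d in pipeline_output}
    top_label = max(scores, key=scores.get)
    top_score = scores[top_label]
    return top_label, top_score, top_score < REVIEW_THRESHOLD, scores


# Shape of a pipeline result for one input with top_k=None (scores are illustrative).
example = [
    {"label": "LABEL_3", "score": 0.578},
    {"label": "LABEL_2", "score": 0.301},
    {"label": "LABEL_1", "score": 0.102},
    {"label": "LABEL_0", "score": 0.019},
]
print(summarize(example))
```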

## Example Classifications

| Class | Type | Example | Prediction | Confidence |
|-------|------|---------|------------|------------|
| **NONE** | Simple | *"Margaret runs the village bakery, making fresh bread every morning at 5 AM for the past thirty years."* | NONE ✅ | 0.997 |
| **NONE** | Nuanced | *"Dr. Harrison performs routine medical check-ups with methodical precision, maintaining professional distance while patients share their deepest fears about mortality."* | NONE ⚠️ | 0.581 |
| **INTERNAL** | Simple | *"Emma struggles with overwhelming anxiety after her father's harsh criticism, questioning her self-worth and abilities."* | INTERNAL ✅ | 0.983 |
| **INTERNAL** | Nuanced | *"The renowned pianist Clara finds herself paralyzed by perfectionism, her childhood trauma surfacing as she prepares for the performance that could define her legacy."* | INTERNAL ✅ | 0.733 |
| **EXTERNAL** | Simple | *"Knight Roderick embarks on a dangerous quest to retrieve the stolen crown from the dragon's lair."* | EXTERNAL ✅ | 0.717 |
| **EXTERNAL** | Nuanced | *"Master thief Elias infiltrates the heavily guarded fortress, disabling security systems and evading patrol routes, each obstacle requiring new techniques and tools to reach the vault."* | EXTERNAL ✅ | 0.711 |
| **BOTH** | Simple | *"Sarah must rescue her kidnapped daughter from the terrorist compound while confronting her own paralyzing guilt about being an absent mother."* | BOTH ⚠️ | 0.578 |
| **BOTH** | Nuanced | *"Archaeologist Sophia discovers an ancient artifact that could rewrite history, but must confront her own ethical boundaries and childhood abandonment issues as powerful forces try to silence her."* | BOTH ✅ | 0.926 |

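**Results:** 8/8 correct predictions on these hand-picked examples. ⚠️ marks predictions that were correct but made with comparatively low confidence (both below 0.6). These examples are illustrative only; the validation metrics above are the reliable estimate of performance.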

## Limitations

- **Domain:** Optimized for character descriptions in narrative fiction
- **Length:** Maximum 512 tokens; longer inputs must be truncated (e.g. `truncation=True`, as in the Basic Usage example)
- **Language:** English only
- **Context:** Works best with character-focused descriptions rather than plot summaries
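- **Ambiguity:** Some edge cases may be inherently ambiguous between INTERNAL and BOTH; validation F1 is markedly lower for NONE, INTERNAL and BOTH (0.653 to 0.690) than for EXTERNAL (0.887)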

## Ethical Considerations

- **Bias:** Training data may contain genre/cultural biases toward certain character archetypes
- **Interpretation:** Classifications reflect Western narrative theory; other storytelling traditions may not map perfectly
- **Automation:** Should complement, not replace, human literary analysis

## Citation

```bibtex
@misc{plot_arc_classifier_2025,
  title={Plot Arc Classifier - DeBERTa Small},
  author={Claude Code Assistant},
  year={2025},
  url={https://github.com/your-org/plot-arc-classifier},
  note={Fine-tuned DeBERTa-v3-small for character plot arc classification}
}
```

## Model Card Contact

For questions about this model, please open an issue in the repository or contact the maintainers.

---

*Model trained on 2025-09-02 using the Hugging Face Transformers library.*