---
# Model Card Metadata (YAML Front Matter)
license: mit
base_model: microsoft/deberta-v3-small
tags:
- text-classification
- character-analysis
- plot-arc
- narrative-analysis
- deberta
- transformers
language: en
datasets:
- custom/plot-arc-balanced-101k
metrics:
- accuracy
- f1
- precision
- recall
model_type: sequence-classification
pipeline_tag: text-classification
widget:
- text: "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."
  example_title: "External Arc Example"
- text: "Maria struggles with crippling self-doubt after her mother's harsh words."
  example_title: "Internal Arc Example"
- text: "Captain Torres must infiltrate enemy lines while battling his own cowardice."
  example_title: "Both Arc Example"
- text: "A baker who makes bread every morning in his village shop."
  example_title: "No Arc Example"
library_name: transformers
---

# Plot Arc Classifier - DeBERTa Small

A fine-tuned DeBERTa-v3-small model for classifying character plot arc types in narrative text.
## Model Details

### Model Description

This model classifies character descriptions into four plot arc categories:

- **NONE (0)**: No discernible character development or plot arc
- **INTERNAL (1)**: Character growth driven by internal conflict or psychology
- **EXTERNAL (2)**: Character arc driven by external events or missions
- **BOTH (3)**: Character arc with both internal conflict and external drivers

- **Model Type:** Text Classification (Sequence Classification)
- **Base Model:** microsoft/deberta-v3-small (44M backbone parameters plus 98M in the embedding layer for its 128K-token vocabulary, ~142M total)
- **Language:** English
- **License:** MIT

### Model Architecture

- **Base:** DeBERTa-v3-small (~142M parameters)
- **Task:** 4-class sequence classification
- **Input:** Character descriptions (up to 512 tokens)
- **Output:** Logits for the 4 classes (apply softmax to obtain probabilities)

## Training Data

### Dataset Statistics

- **Total Examples:** 101,348
- **Training Split:** 91,213 examples (90%)
- **Validation Split:** 10,135 examples (10%)
- **Perfect Class Balance:** 25,337 examples per class across the full dataset

### Data Sources

- Systematic scanning of 1.8M+ character descriptions
- LLM validation using Llama-3.2-3B for quality assurance
- SHA256-based deduplication to prevent data leakage
- Curated and balanced across all four plot arc types

### Class Distribution

| Class | Count | Percentage |
|-------|-------|------------|
| NONE | 25,337 | 25% |
| INTERNAL | 25,337 | 25% |
| EXTERNAL | 25,337 | 25% |
| BOTH | 25,337 | 25% |

## Performance

All metrics below are computed on the 10,135-example validation split.

### Key Metrics

- **Accuracy:** 0.7286
- **F1 (Weighted):** 0.7283
- **F1 (Macro):** 0.7275

### Per-Class Performance

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| NONE | 0.697 | 0.613 | 0.653 | 2,495 |
| INTERNAL | 0.677 | 0.683 | 0.680 | 2,571 |
| EXTERNAL | 0.892 | 0.882 | 0.887 | 2,568 |
| BOTH | 0.652 | 0.732 | 0.690 | 2,501 |

### Training Details

- **Training Time:** 9.7 hours on Apple Silicon (PyTorch MPS backend)
- **Final Training Loss:** 0.635
- **Epochs:** 3.86 (early stopping)
- **Batch Size:** 16 (effective batch size 32 with gradient accumulation)
- **Learning Rate:** 2e-5 with warmup
- **Optimizer:** AdamW with weight decay (0.01)

## Confusion Matrix

![Confusion Matrix](images/confusion_matrix.png)

## Usage

### Basic Usage

```python
from transformers import DebertaV2Tokenizer, DebertaV2ForSequenceClassification
import torch

# Load model and tokenizer (DebertaV2Tokenizer requires the sentencepiece package)
model_name = "plot-arc-classifier-deberta-small"
tokenizer = DebertaV2Tokenizer.from_pretrained(model_name)
model = DebertaV2ForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

# Example text
text = "Sir Galahad embarks on a perilous quest to retrieve the stolen Crown of Ages."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()

# Class mapping (index order matches the label IDs listed under Model Description)
class_names = ["NONE", "INTERNAL", "EXTERNAL", "BOTH"]
prediction = class_names[predicted_class]
confidence = probabilities[0, predicted_class].item()

print(f"Predicted class: {prediction} (confidence: {confidence:.3f})")
```

### Pipeline Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="plot-arc-classifier-deberta-small",
    top_k=None,  # return scores for all four classes (replaces deprecated return_all_scores=True)
)

result = classifier("Captain Torres must infiltrate enemy lines while battling his own cowardice.")
print(result)
```

## Example Classifications

| Expected Class | Type | Example | Prediction | Confidence |
|----------------|------|---------|------------|------------|
| **NONE** | Simple | *"Margaret runs the village bakery, making fresh bread every morning at 5 AM for the past thirty years."* | NONE ✅ | 0.997 |
| **NONE** | Nuanced | *"Dr. Harrison performs routine medical check-ups with methodical precision, maintaining professional distance while patients share their deepest fears about mortality."* | NONE ⚠️ | 0.581 |
| **INTERNAL** | Simple | *"Emma struggles with overwhelming anxiety after her father's harsh criticism, questioning her self-worth and abilities."* | INTERNAL ✅ | 0.983 |
| **INTERNAL** | Nuanced | *"The renowned pianist Clara finds herself paralyzed by perfectionism, her childhood trauma surfacing as she prepares for the performance that could define her legacy."* | INTERNAL ✅ | 0.733 |
| **EXTERNAL** | Simple | *"Knight Roderick embarks on a dangerous quest to retrieve the stolen crown from the dragon's lair."* | EXTERNAL ✅ | 0.717 |
| **EXTERNAL** | Nuanced | *"Master thief Elias infiltrates the heavily guarded fortress, disabling security systems and evading patrol routes, each obstacle requiring new techniques and tools to reach the vault."* | EXTERNAL ✅ | 0.711 |
| **BOTH** | Simple | *"Sarah must rescue her kidnapped daughter from the terrorist compound while confronting her own paralyzing guilt about being an absent mother."* | BOTH ⚠️ | 0.578 |
| **BOTH** | Nuanced | *"Archaeologist Sophia discovers an ancient artifact that could rewrite history, but must confront her own ethical boundaries and childhood abandonment issues as powerful forces try to silence her."* | BOTH ✅ | 0.926 |

**Results:** 8/8 correct predictions (100% accuracy) on these illustrative examples; ⚠️ marks correct predictions made with low confidence (below 0.6).

## Limitations

- **Domain:** Optimized for character descriptions in narrative fiction
- **Length:** Maximum 512 tokens (longer texts are truncated)
- **Language:** English only
- **Context:** Works best with character-focused descriptions rather than plot summaries
- **Ambiguity:** Some edge cases may be inherently ambiguous between INTERNAL and BOTH

## Ethical Considerations

- **Bias:** Training data may contain genre or cultural biases toward certain character archetypes
- **Interpretation:** Classifications reflect Western narrative theory; other storytelling traditions may not map perfectly
- **Automation:** Should complement, not replace, human literary analysis

## Citation

```bibtex
@misc{plot_arc_classifier_2025,
  title={Plot Arc Classifier - DeBERTa Small},
  author={Claude Code Assistant},
  year={2025},
  url={https://github.com/your-org/plot-arc-classifier},
  note={Fine-tuned DeBERTa-v3-small for character plot arc classification}
}
```

## Model Card Contact

For questions about this model, please open an issue in the repository or contact the maintainers.

---

*Model trained on 2025-09-02 using the Transformers library.*
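The aggregate scores under Key Metrics can be cross-checked against the Per-Class Performance table: macro F1 is the unweighted mean of the per-class F1 scores, weighted F1 weights each class by its support, and accuracy equals support-weighted recall (since recall times support is the number of correct predictions for that class). The Python sketch below recomputes these values from the rounded table entries. It is an illustrative consistency check, not the evaluation script used to produce this card.

```python
# Per-class validation metrics copied from the "Per-Class Performance" table.
# Each entry: (precision, recall, f1, support)
PER_CLASS = {
    "NONE": (0.697, 0.613, 0.653, 2495),
    "INTERNAL": (0.677, 0.683, 0.680, 2571),
    "EXTERNAL": (0.892, 0.882, 0.887, 2568),
    "BOTH": (0.652, 0.732, 0.690, 2501),
}


def macro_f1(table):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1, _ in table.values()) / len(table)


def weighted_f1(table):
    """Per-class F1 averaged with weights proportional to class support."""
    total = sum(support for *_, support in table.values())
    return sum(f1 * support for _, _, f1, support in table.values()) / total


def accuracy_from_recall(table):
    """Accuracy = total correct / total; recall * support gives correct per class."""
    total = sum(support for *_, support in table.values())
    return sum(rec * support for _, rec, _, support in table.values()) / total


if __name__ == "__main__":
    support_total = sum(s for *_, s in PER_CLASS.values())
    print(f"validation examples: {support_total}")
    print(f"macro F1:    {macro_f1(PER_CLASS):.4f}")
    print(f"weighted F1: {weighted_f1(PER_CLASS):.4f}")
    print(f"accuracy:    {accuracy_from_recall(PER_CLASS):.4f}")
```

Running it prints 10135 validation examples, a macro F1 of 0.7275 and a weighted F1 of 0.7283, matching the card. The recall-based accuracy comes out at 0.7283 rather than the reported 0.7286; that 0.0003 gap is within the error that rounding each per-class recall to three decimals can introduce (at most 0.0005).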