Griot Story Identifier (1,000 Rows, DistilBERT)
This is a fine-tuned binary text classification model that predicts whether a given passage contains a story (label: story) or does not (label: not_story).
It was trained on a balanced synthetic dataset of 1,000 rows, in which every input text is at least 300 words long.
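Because every training text was at least 300 words long, much shorter inputs may fall outside what the model saw during fine-tuning. The sketch below is a minimal pre-check you could run before classifying. It assumes words are counted by whitespace splitting (this card does not say how the 300-word figure was measured), and the names MIN_WORDS, word_count and within_training_length are hypothetical.

```python
# Hypothetical constant mirroring the ">= 300 words" figure stated in this card.
MIN_WORDS = 300


def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumed proxy for the dataset's measure)."""
    return len(text.split())


def within_training_length(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the text is at least min_words words long."""
    return word_count(text) >= min_words


short = "Once upon a time there was a treehouse."
long_text = " ".join(["word"] * 320)
print(word_count(short), within_training_length(short))          # 8 False
print(word_count(long_text), within_training_length(long_text))  # 320 True
```

A failed check does not make a prediction wrong; it only flags inputs that differ in length from the training data.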
🧾 Model Details
- Base model: distilbert-base-uncased
- Task: Binary text classification (story vs. not_story)
- Dataset size: 1,000 rows (balanced)
- Max sequence length: 256 tokens
- Training epochs: 4
- Framework: 🤗 Transformers + PyTorch
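With a 256-token limit, a 300+ word passage will often be longer than one input sequence (WordPiece tokenization usually yields more tokens than words), so truncation=True in the usage example drops the tail of the text. One possible workaround, not described in this card, is to split the token IDs into overlapping windows that each fit the limit. The sketch below works on plain integer IDs so it runs without downloading the model. It assumes two special tokens ([CLS] and [SEP]) per window, as DistilBERT adds for a single sequence; token_windows and its stride default of 64 are hypothetical.

```python
from typing import List

MAX_LENGTH = 256  # max sequence length stated in this card
NUM_SPECIAL = 2   # [CLS] and [SEP] added around a single sequence


def token_windows(ids: List[int], max_length: int = MAX_LENGTH, stride: int = 64) -> List[List[int]]:
    """Split content token IDs (without special tokens) into overlapping windows.

    Each window holds at most max_length - NUM_SPECIAL IDs, so it still fits
    after [CLS] and [SEP] are added. Consecutive windows share `stride` IDs.
    """
    body = max_length - NUM_SPECIAL
    if not 0 <= stride < body:
        raise ValueError("stride must be in [0, max_length - 2)")
    step = body - stride
    windows = []
    start = 0
    while True:
        windows.append(ids[start:start + body])
        if start + body >= len(ids):  # this window reaches the end of the text
            break
        start += step
    return windows


ids = list(range(600))  # stand-in for real token IDs
for w in token_windows(ids):
    print(w[0], w[-1], len(w))
# 0 253 254
# 190 443 254
# 380 599 220
```

With the real tokenizer, the content IDs could come from tokenizer(text, add_special_tokens=False)["input_ids"], and each window could be wrapped with tokenizer.build_inputs_with_special_tokens(window) before it is passed to the model. Hugging Face fast tokenizers can also produce overlapping windows directly when called with return_overflowing_tokens=True and a stride value.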
Labels
- story: the text contains a narrative arc (beginning, middle, end, events, characters)
- not_story: the text is descriptive, conversational, or factual without a narrative arc
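The usage example takes the arg-max of the two logits, which yields a label but no confidence score. Applying a softmax turns the logits into probabilities that can be thresholded or logged. This is a pure-Python sketch; ID2LABEL is a hypothetical placeholder, because the card does not state which class index maps to which label, and real code should read model.config.id2label instead.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical mapping for illustration only; read model.config.id2label in real code.
ID2LABEL: Dict[int, str] = {0: "not_story", 1: "story"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits: List[float], id2label: Dict[int, str] = ID2LABEL) -> Tuple[str, float]:
    """Return the arg-max label together with its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


print(label_with_confidence([-1.2, 2.3]))  # made-up logits for a single passage
```

In PyTorch the same probabilities come from torch.softmax(logits, dim=-1) applied to the model output.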
Usage
Load the model directly from the Hub:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
repo_id = "mjpsm/Griot-Story-Identifier-1k-v1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)
text = """
Last summer, my friends and I built a treehouse in the backyard...
(300+ word passage here)
"""
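inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)  # match the 256-token limit
with torch.no_grad():  # inference only: skip gradient tracking
    logits = model(**inputs).logits
pred_id = int(logits.argmax(dim=-1))  # index of the highest-scoring class
label = model.config.id2label[pred_id]  # map the class index to its label name
print("Predicted label:", label)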
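If a long passage is scored window by window instead of being truncated, the per-window results still have to be combined into a single label. This is a minimal sketch that assumes each window has already been reduced to a P(story) score (for example, with a softmax over the model's logits). The function name, the 0.5 threshold, and both aggregation rules are illustrative assumptions, not the procedure used by the card's author.

```python
from statistics import mean
from typing import List, Tuple


def aggregate_windows(story_probs: List[float], threshold: float = 0.5, how: str = "mean") -> Tuple[str, float]:
    """Combine per-window P(story) scores into one document-level label.

    how="mean" weighs every window equally; how="max" labels the document a
    story if any single window looks strongly narrative.
    """
    if not story_probs:
        raise ValueError("need at least one window score")
    if how == "mean":
        score = mean(story_probs)
    elif how == "max":
        score = max(story_probs)
    else:
        raise ValueError(f"unknown aggregation: {how!r}")
    return ("story" if score >= threshold else "not_story"), score


scores = [0.9, 0.3, 0.2]  # hypothetical per-window P(story) values
print(aggregate_windows(scores))
print(aggregate_windows(scores, how="max"))  # ('story', 0.9)
```

Averaging tends to be the more conservative rule for passages that mix narrative and description, while the max rule favors catching stories at the cost of more false positives.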