📖 Griot Story Identifier (1,000 Rows, DistilBERT)

This is a DistilBERT model fine-tuned for binary text classification: it predicts whether a given passage contains a story (story) or does not (not_story).
It was trained on a synthetic dataset of 1,000 rows, where each input text is at least 300 words long.

🧾 Model Details

Base model: distilbert-base-uncased
Task: Binary text classification (story vs. not_story)
Dataset size: 1,000 rows (balanced)
Max sequence length: 256 tokens (longer inputs are truncated)
Training epochs: 4
Framework: 🤗 Transformers + PyTorch
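
Every whitespace-separated word becomes at least one WordPiece token, so a passage of 300 or more words always exceeds the 256-token limit and loses its ending to truncation. The sketch below shows one possible way to cover a whole passage at inference time: split the token IDs into overlapping windows that can each be classified separately. It is not part of the original training or usage setup. The helper name `split_into_windows`, the `stride` value, and the idea of scoring windows independently are illustrative assumptions, and the model was not necessarily trained to judge partial passages.

```python
from typing import List


def split_into_windows(
    token_ids: List[int], max_len: int = 254, stride: int = 127
) -> List[List[int]]:
    """Split a token ID sequence into overlapping windows of at most max_len.

    Hypothetical helper. The default max_len of 254 is the 256-token limit
    minus two positions for the [CLS] and [SEP] special tokens; stride
    (how far each window advances) is an assumed value.
    """
    if not 0 < stride <= max_len:
        raise ValueError("stride must be between 1 and max_len")
    if len(token_ids) <= max_len:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + max_len]))
        if start + max_len >= len(token_ids):
            break  # this window reached the end of the sequence
        start += stride
    return windows


# Stand-in for real token IDs, e.g.
# tokenizer(text, add_special_tokens=False)["input_ids"]
fake_ids = list(range(600))
windows = split_into_windows(fake_ids)
print("Number of windows:", len(windows))
print("Window lengths:", [len(w) for w in windows])
```

Each window could then be wrapped in special tokens, passed through the model, and the resulting scores averaged. The tokenizer can also produce overlapping chunks itself via `return_overflowing_tokens=True` together with `stride`, where `stride` is the number of overlapping tokens rather than the step size used here.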

📊 Labels

story → text contains a narrative arc (beginning, middle, end, events, characters)
not_story → text is descriptive, conversational, or factual without a narrative arc

🚀 Usage

Load the model directly from the Hub:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

repo_id = "mjpsm/Griot-Story-Identifier-1k-v1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSequenceClassification.from_pretrained(repo_id)

text = """
Last summer, my friends and I built a treehouse in the backyard...
(300+ word passage here)
"""

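# Tokenize; anything beyond 256 tokens is cut off to match the model's sequence length
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():  # inference only, so gradients are not needed
    logits = model(**inputs).logits
pred_id = int(logits.argmax(dim=-1))  # index of the highest-scoring class
label = model.config.id2label[pred_id]  # map the class index to its label name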

print("Predicted label:", label)
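
The snippet above prints only the top label. If a confidence score is also useful, the logits can be turned into probabilities with a softmax. The sketch below uses plain Python so it runs without downloading the model. The logit values and the `example_id2label` mapping (0 for not_story, 1 for story) are made-up assumptions for illustration; the real mapping should be read from `model.config.id2label`.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return the highest-probability label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Assumed mapping and made-up logits, for illustration only.
example_id2label = {0: "not_story", 1: "story"}
example_logits = [-1.2, 2.3]

label, confidence = top_label(example_logits, example_id2label)
print(f"Predicted label: {label} (p = {confidence:.3f})")
```

With the real model, `logits[0].tolist()` from the usage snippet can be passed in place of `example_logits`; `torch.softmax(logits, dim=-1)` yields the same probabilities as a tensor.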
