Cinematic Music Descriptor: Module 1 – Local Scene Encoder

A RoBERTa-base model fine-tuned to encode individual movie script scenes into 768-dimensional vectors, with multi-task heads that predict scene-level cinematic attributes.
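This card does not say how RoBERTa's per-token outputs are reduced to a single 768-dimensional scene vector. Two common choices are taking the first (`<s>`) token or averaging over the real (non-padding) tokens. The sketch below shows the masked-mean option in plain Python. It is an assumption, not a confirmed detail of this model, and `mean_pool` is a hypothetical name. Toy 2-dimensional vectors stand in for the 768-dimensional token embeddings.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of real tokens (mask == 1) into one scene vector.

    token_embeddings: list of per-token vectors (768 floats each for roberta-base).
    attention_mask: list of 0/1 flags, 0 marking padding tokens to ignore.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    if count == 0:
        raise ValueError("attention_mask contains no real tokens")
    return [s / count for s in summed]


# Toy example: the third token is padding, so it is excluded from the average.
scene_vector = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
print(scene_vector)  # [2.0, 3.0]
```

Masking matters because scenes in a batch are padded to a common length. Without the mask, padding tokens would pull short scenes' vectors toward the padding embedding.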

Label Schema

Classification

  • emotional_valence: 4 classes
  • conflict_nature: 6 classes
  • acoustic_space: 6 classes
  • reality_layer: 5 classes

Regression

  • pacing_intensity: 1–10
  • scene_arousal: 0.0–1.0
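This card does not specify how the six heads are trained jointly. One common setup, assumed here rather than confirmed, sums the following per-head losses:

- a softmax cross-entropy loss for each classification head;
- a squared-error loss for each regression head, with pacing_intensity rescaled from 1–10 to 0–1 so both regression targets share a scale.

The plain-Python sketch below computes this loss for a single scene. The head names and class counts come from the schema above. The equal loss weights, the rescaling, and the function names are hypothetical.

```python
import math

# Classification heads and their class counts, from the label schema.
CLASSIFICATION_HEADS = {
    "emotional_valence": 4,
    "conflict_nature": 6,
    "acoustic_space": 6,
    "reality_layer": 5,
}

# Regression heads and their documented target ranges (low, high).
REGRESSION_HEADS = {
    "pacing_intensity": (1.0, 10.0),
    "scene_arousal": (0.0, 1.0),
}


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one example, using log-sum-exp for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def normalize(value, low, high):
    """Rescale a regression target from [low, high] to [0, 1] (assumed step)."""
    return (value - low) / (high - low)


def multitask_loss(outputs, targets, weights=None):
    """Sum per-head losses for one scene.

    outputs: head name -> list of logits (classification) or a float prediction
             on the normalized [0, 1] scale (regression).
    targets: head name -> class index (classification) or raw target (regression).
    weights: optional head name -> loss weight; each head defaults to 1.0 (assumed).
    """
    weights = weights or {}
    total = 0.0
    for name, num_classes in CLASSIFICATION_HEADS.items():
        logits = outputs[name]
        if len(logits) != num_classes:
            raise ValueError(f"{name}: expected {num_classes} logits")
        total += weights.get(name, 1.0) * cross_entropy(logits, targets[name])
    for name, (low, high) in REGRESSION_HEADS.items():
        error = outputs[name] - normalize(targets[name], low, high)
        total += weights.get(name, 1.0) * error ** 2  # squared error for one sample
    return total
```

With all-zero logits, each classification head contributes log(number of classes). This gives a quick sanity check that the summed loss starts near log(4 · 6 · 6 · 5) = log(720) for an untrained model with perfect regression guesses.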

Training Details

  • Base model: roberta-base
  • Dataset: ~11,000 scenes from 60–80 movies
  • Framework: PyTorch + HuggingFace Transformers
  • Logging: Weights & Biases

Usage

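import torch
from huggingface_hub import hf_hub_download

# Download the checkpoint file from the Hugging Face Hub
path = hf_hub_download(repo_id="suyashnpande/cinematic-music-descriptor-module3",
                       filename="module3.pt")

# Load it on CPU. The checkpoint's internal layout (plain state dict or a dict
# with extra metadata) is not documented on this card, so inspect it first.
checkpoint = torch.load(path, map_location="cpu")
print(type(checkpoint))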
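This card does not document the format of the heads' raw outputs or the class names behind each index. The sketch below is a hypothetical post-processing step. It assumes each classification head yields a list of logits and each regression head yields one float. For each classification head, it reports the predicted class index with its softmax probability. It clamps each regression value to the range documented in the label schema. `decode_scene` and the example values are illustrative only.

```python
import math

# Class counts per classification head, from the label schema on this card.
CLASSIFICATION_HEADS = {
    "emotional_valence": 4,
    "conflict_nature": 6,
    "acoustic_space": 6,
    "reality_layer": 5,
}

# Documented target ranges for the regression heads.
REGRESSION_RANGES = {
    "pacing_intensity": (1.0, 10.0),
    "scene_arousal": (0.0, 1.0),
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_scene(outputs):
    """Turn raw head outputs for one scene into (class index, probability)
    pairs and range-clamped regression values."""
    decoded = {}
    for name, num_classes in CLASSIFICATION_HEADS.items():
        probs = softmax(outputs[name])
        if len(probs) != num_classes:
            raise ValueError(f"{name}: expected {num_classes} logits, got {len(probs)}")
        best = max(range(num_classes), key=lambda i: probs[i])
        decoded[name] = (best, probs[best])
    for name, (low, high) in REGRESSION_RANGES.items():
        decoded[name] = min(max(float(outputs[name]), low), high)
    return decoded


# Illustrative raw outputs for one scene (made-up numbers).
example = {
    "emotional_valence": [0.1, 2.0, 0.3, 0.0],
    "conflict_nature": [0.0] * 6,
    "acoustic_space": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "reality_layer": [0.0, 0.0, 0.0, 0.0, 3.0],
    "pacing_intensity": 11.2,  # outside 1-10, so it is clamped to 10.0
    "scene_arousal": 0.42,
}
print(decode_scene(example))
```

Clamping keeps regression outputs inside the documented ranges, but it hides how far out of range the raw prediction was. Log the unclamped value if that matters for debugging.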

Citation

If you use this model, please cite the project.

Model Files

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32