---
language: en
license: cc-by-nc-4.0
tags:
  - video-to-text
  - sports
  - vision-language
  - multimodal
  - research-only
library_name: transformers
pipeline_tag: video-text-to-text
---

SV3.3B - Sports Video Description Model

⚠️ RESEARCH USE ONLY - NON-COMMERCIAL LICENSE

SV3.3B is a unified video-to-text model specifically designed for sports video understanding.

License

This model is licensed under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International). It is intended for research purposes only and may not be used for commercial applications.

Model Architecture

Vision Encoder: Custom DWT-VJEPA2-based video encoder
Text Model: 3.3B parameter language model
Vision-Text Bridge: Learned projection layer
Specialization: Fine-tuned on sports video data
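
The three components above can be pictured as a simple data flow: the video encoder emits vision tokens, the projection layer maps them to the language model's embedding width, and the language model consumes the result together with the text embeddings. The sketch below illustrates that flow in plain Python. It is not the SV3.3B implementation: the single linear projection, the choice to prefix vision tokens to the text, the toy dimensions, and the names `project` and `build_lm_input` are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def project(tokens: List[Vector], weight: List[Vector], bias: Vector) -> List[Vector]:
    """Apply a linear map to each vision token.

    `weight` has shape (out_dim, in_dim); a single linear layer is an
    assumption, the card only says "learned projection layer".
    """
    return [
        [sum(w * x for w, x in zip(row, tok)) + b for row, b in zip(weight, bias)]
        for tok in tokens
    ]


def build_lm_input(
    vision_tokens: List[Vector],
    text_embeddings: List[Vector],
    weight: List[Vector],
    bias: Vector,
) -> List[Vector]:
    """Prefix projected vision tokens to the text embeddings (assumed layout)."""
    return project(vision_tokens, weight, bias) + text_embeddings


# Toy example: 2 vision tokens of width 3, projected to a text width of 2.
vision_tokens = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # shape (2, 3)
bias = [0.5, -0.5]
text_embeddings = [[0.1, 0.2]]  # one text token, already at width 2

sequence = build_lm_input(vision_tokens, text_embeddings, weight, bias)
print(len(sequence), len(sequence[0]))  # number of tokens, embedding width
```

After projection every token in `sequence` has the same width, which is what lets a language model treat video and text as one input sequence.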

Model Details

Parameters: 3.3B
Input: Video files (16 frames, 256x256 resolution)
Output: Natural language descriptions
Domain: Sports video analysis
Training data: SPORTSVISION/NSVA_SUBSET
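
Because the model takes a fixed 16 frames per clip, a longer or shorter video has to be reduced to exactly 16 frame indices before encoding. The card does not say how frames are chosen; the sketch below assumes uniform sampling at segment midpoints, and the helper name `sample_frame_indices` is hypothetical.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_frames: int = 16) -> List[int]:
    """Pick `num_frames` indices spread evenly over a clip.

    The clip is split into `num_frames` equal segments and the midpoint
    of each segment is taken. Clips shorter than `num_frames` repeat
    some indices. Uniform sampling is an assumption, not a documented
    property of SV3.3B.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return [
        min(total_frames - 1, int((i + 0.5) * total_frames / num_frames))
        for i in range(num_frames)
    ]


# A 160-frame clip yields one frame from the middle of every 10-frame segment.
indices = sample_frame_indices(160)
print(indices)
```

Each selected frame would then be resized to 256x256 before being passed to the video encoder.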

Model Performance

Limitations

Research use only
No commercial applications
Optimized for sports content
May not generalize to other video domains

Citation

If you use this model in your research, please cite:

@misc{sv3-3b-2024,
  title={SV3.3B: Sports Video Description Model},
  author={Varun Kodathala},
  year={2025},
  url={https://huggingface.co/sportsvision/SV3.3B}
}