SV3.3B - Sports Video Description Model
⚠️ RESEARCH USE ONLY - NON-COMMERCIAL LICENSE
SV3.3B is a unified video-to-text model specifically designed for sports video understanding.
License
This model is licensed under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International). It is intended for research purposes only and may not be used in commercial applications.
Model Architecture
- Vision Encoder: Custom DWT-VJEPA2-based video encoder
- Text Model: 3.3B-parameter language model
- Vision-Text Bridge: Learned projection layer
- Specialization: Fine-tuned on sports video data
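The card describes the vision-text bridge only as a "learned projection layer". As a rough illustration of how such a bridge commonly works, the sketch below assumes a single linear layer that maps each visual token from the encoder's feature size to the language model's embedding size, and assumes the projected tokens are prepended to the text-prompt embeddings. The helper names (`project_visual_tokens`, `build_lm_input`) and all dimensions are hypothetical; the actual SV3.3B bridge may differ (for example, an MLP or token pooling).

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project_visual_tokens(visual_tokens: List[Vector], weight: Matrix, bias: Vector) -> List[Vector]:
    """Hypothetical linear bridge: y = W x + b for every visual token.

    weight has shape (d_text, d_vision) and bias has length d_text, so each
    encoder token of length d_vision becomes an LM-sized vector of length d_text.
    """
    projected = []
    for token in visual_tokens:
        if len(token) != len(weight[0]):
            raise ValueError("visual token size does not match projection input size")
        # One output element per row of W: dot(row, token) + bias element.
        projected.append([
            sum(w * x for w, x in zip(row, token)) + b
            for row, b in zip(weight, bias)
        ])
    return projected


def build_lm_input(visual_tokens: List[Vector], weight: Matrix, bias: Vector,
                   text_embeddings: List[Vector]) -> List[Vector]:
    """Assumed layout: projected visual tokens first, then the prompt embeddings."""
    return project_visual_tokens(visual_tokens, weight, bias) + text_embeddings


# Toy example: two 3-dimensional visual tokens projected to a 2-dimensional LM space.
tokens = [[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
b = [0.5, -1.0]
sequence = build_lm_input(tokens, W, b, text_embeddings=[[9.0, 9.0]])
print(sequence)  # [[1.5, 4.0], [0.5, 0.0], [9.0, 9.0]]
```

In a real model these would be tensors and the product would run as one batched matrix multiply; plain lists are used here only to keep the example dependency-free.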
Model Details
- Parameters: 3.3B
- Input: Video files (16 frames, 256x256 resolution)
- Output: Natural language descriptions
- Domain: Sports video analysis
- Training data: SPORTSVISION/NSVA_SUBSET
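The card fixes the input at 16 frames of 256x256 but does not say how frames are selected or resized. The dependency-free sketch below assumes uniform temporal sampling (the centre of 16 equal segments) and nearest-neighbour resizing, with each frame stored as rows of pixels. The helpers `uniform_frame_indices`, `resize_nearest`, and `preprocess_clip` are hypothetical, and the official preprocessing (interpolation method, normalization, channel order) may well differ.

```python
from typing import List, Sequence

NUM_FRAMES = 16    # frame count stated on the model card
TARGET_SIZE = 256  # 256x256 resolution stated on the model card


def uniform_frame_indices(total_frames: int, num_frames: int = NUM_FRAMES) -> List[int]:
    """Assumed sampling: take the centre of each of num_frames equal segments.

    Clips shorter than num_frames simply repeat some frames.
    """
    if total_frames <= 0:
        raise ValueError("video has no frames")
    return [int((i + 0.5) * total_frames / num_frames) for i in range(num_frames)]


def resize_nearest(frame: Sequence[Sequence], size: int = TARGET_SIZE) -> List[list]:
    """Nearest-neighbour resize of a frame (a list of pixel rows) to size x size."""
    h, w = len(frame), len(frame[0])
    # Integer scaling keeps every source index within [0, h) and [0, w).
    return [[frame[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def preprocess_clip(frames: Sequence[Sequence[Sequence]]) -> List[List[list]]:
    """Pick NUM_FRAMES frames and resize each to TARGET_SIZE x TARGET_SIZE."""
    return [resize_nearest(frames[i]) for i in uniform_frame_indices(len(frames))]


# Toy "video": 20 frames of 4x6 pixels, each pixel holding its frame number.
video = [[[t] * 6 for _ in range(4)] for t in range(20)]
clip = preprocess_clip(video)
print(len(clip), len(clip[0]), len(clip[0][0]))  # 16 256 256
```

In practice, frames would come from a video decoding library and be converted to tensors; the point here is only the shape contract of 16 frames at 256x256.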
Model Performance
Limitations
- Research use only
- No commercial applications
- Optimized for sports content
- May not generalize to other video domains
Citation
If you use this model in your research, please cite:
@misc{sv3-3b-2025,
title={SV3.3B: Sports Video Description Model},
author={Varun Kodathala},
year={2025},
url={https://huggingface.co/sportsvision/SV3.3B}
}
