SV3.3B - Sports Video Description Model

⚠️ RESEARCH USE ONLY - NON-COMMERCIAL LICENSE

SV3.3B is a unified video-to-text model specifically designed for sports video understanding.

License

This model is licensed under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International). It is intended for research purposes only and may not be used for commercial applications.

Model Architecture

Vision Encoder: Custom DWT-VJEPA2-based video encoder
Text Model: 3.3B parameter language model
Vision-Text Bridge: Learned projection layer
Specialization: Fine-tuned on sports video data
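
To make the role of the vision-text bridge concrete, here is a minimal sketch of a projection that maps video tokens from the encoder's feature size to the language model's embedding size. It assumes the bridge is a single linear map, which the card does not confirm. The dimensions, the function names, and the random weights standing in for learned ones are all hypothetical.

```python
import random

# Hypothetical sizes; the real encoder and language-model widths are not stated.
VISION_DIM = 8
TEXT_DIM = 12


def make_projection(in_dim, out_dim, seed=0):
    """Build a weight matrix and bias; random values stand in for learned ones."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(tokens, weight, bias):
    """Apply y = W x + b to every video token independently."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b for row, b in zip(weight, bias)]
        for token in tokens
    ]


# Four dummy video tokens, as a video encoder might emit them.
video_tokens = [[0.1] * VISION_DIM for _ in range(4)]
weight, bias = make_projection(VISION_DIM, TEXT_DIM)
text_space_tokens = project(video_tokens, weight, bias)

# The token count is unchanged; only the feature size changes.
print(len(text_space_tokens), len(text_space_tokens[0]))
```

In a design like this, the projected tokens would be fed to the language model alongside the text prompt embeddings, so only the feature width has to match.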

Model Details

Parameters: 3.3B
Input: Video files (16 frames, 256x256 resolution)
Output: Natural language descriptions
Domain: Sports video analysis
Training data: SPORTSVISION/NSVA_SUBSET
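
The 16-frame input means a clip of any length has to be reduced to a fixed number of frames before encoding. The sketch below picks evenly spaced frame indices. Uniform sampling is an assumption, since the card does not say which sampling strategy the model expects, and `sample_frame_indices` is a hypothetical helper, not part of any released code. Resizing each selected frame to 256x256 would follow as a separate step.

```python
def sample_frame_indices(total_frames: int, num_frames: int = 16) -> list[int]:
    """Pick `num_frames` evenly spaced frame indices from a clip.

    Uniform sampling is an assumption; the model card does not state
    which sampling strategy was used.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Take the centre of each of num_frames equal-width segments.
    # Clips shorter than num_frames repeat indices so the length stays fixed.
    return [
        min(total_frames - 1, int((i + 0.5) * total_frames / num_frames))
        for i in range(num_frames)
    ]


# Hypothetical 10-second clip at 30 fps (300 frames).
indices = sample_frame_indices(300)
print(len(indices), indices[0], indices[-1])  # 16 9 290
```

Sampling segment centres rather than segment starts avoids always picking the very first frame, which in broadcast footage is often a cut or transition.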

Model Performance

Limitations

Research use only
No commercial applications
Optimized for sports content
May not generalize to other video domains

Citation

If you use this model in your research, please cite:

@misc{sv3-3b-2024,
  title={SV3.3B: Sports Video Description Model},
  author={Varun Kodathala},
  year={2025},
  url={https://huggingface.co/sportsvision/SV3.3B}
}
