SV3.3B - Sports Video Description Model

โš ๏ธ RESEARCH USE ONLY - NON-COMMERCIAL LICENSE

SV3.3B is a unified video-to-text model specifically designed for sports video understanding.

License

This model is licensed under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International). It is intended for research purposes only and may not be used for commercial applications.

Model Architecture

  • Vision Encoder: Custom DWT-VJEPA2-based video encoder
  • Text Model: 3.3B parameter language model
  • Vision-Text Bridge: Learned projection layer
  • Specialization: Fine-tuned on sports video data
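
The data flow implied by the components above can be sketched as follows. This is only an illustration of how a learned projection typically bridges a vision encoder and a language model. The dimensions, the single linear layer, and the choice to prepend video tokens to the prompt are all assumptions, since this card does not document them.

```python
import numpy as np

# All sizes are hypothetical; the model card does not state them.
NUM_VISION_TOKENS = 8   # tokens emitted by the video encoder (assumed)
VISION_DIM = 16         # width of the video encoder features (assumed)
TEXT_DIM = 32           # width of the language model embeddings (assumed)

rng = np.random.default_rng(0)


def project_vision_tokens(vision_tokens, weight, bias):
    """Map features of shape (tokens, VISION_DIM) into the text embedding space."""
    return vision_tokens @ weight + bias


# Random stand-ins for the encoder output and the learned projection parameters.
vision_tokens = rng.standard_normal((NUM_VISION_TOKENS, VISION_DIM))
weight = rng.standard_normal((VISION_DIM, TEXT_DIM)) * 0.02
bias = np.zeros(TEXT_DIM)

# Random stand-in for the embedded text prompt (4 tokens).
prompt_embeddings = rng.standard_normal((4, TEXT_DIM))

projected = project_vision_tokens(vision_tokens, weight, bias)

# Assumed layout: projected video tokens are placed before the prompt tokens.
lm_input = np.concatenate([projected, prompt_embeddings], axis=0)
print(projected.shape, lm_input.shape)
```

In a trained model the projection weights would be learned jointly with, or after, the two backbones. Here they are random so the sketch runs without any checkpoint.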

Model Details

  • Parameters: 3.3B
  • Input: Video files (16 frames, 256x256 resolution)
  • Output: Natural language descriptions
  • Domain: Sports video analysis
  • Training: SPORTSVISION/NSVA_SUBSET
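
Because the model expects 16 frames at 256x256, a clip has to be reduced to a fixed number of frames before inference. The card does not say how frames are selected, so the sketch below assumes uniform sampling across the clip. `sample_frame_indices` and `TARGET_SIZE` are hypothetical names introduced for illustration.

```python
NUM_FRAMES = 16           # frame count stated in the model details
TARGET_SIZE = (256, 256)  # each sampled frame would be resized to this (height, width)


def sample_frame_indices(total_frames, num_frames=NUM_FRAMES):
    """Return num_frames indices spread evenly across a clip of total_frames frames.

    Uniform sampling is an assumption; the model's real preprocessing may differ.
    Clips shorter than num_frames yield repeated indices.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames == 1 or total_frames == 1:
        return [0] * num_frames
    step = (total_frames - 1) / (num_frames - 1)
    # Clamp to the last valid index to guard against floating-point overshoot.
    return [min(round(i * step), total_frames - 1) for i in range(num_frames)]


# A 10-second clip at 30 fps has 300 frames.
indices = sample_frame_indices(300)
print(indices)
```

The selected frames would then be decoded and resized to `TARGET_SIZE` with whichever video library is in use. That step is omitted here because it depends on the decoding backend.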

Model Performance

[Figure: model performance results; image not included]

Limitations

  • Research use only
  • No commercial applications
  • Optimized for sports content
  • May not generalize to other video domains

Citation

If you use this model in your research, please cite:

@misc{sv3-3b-2024,
  title={SV3.3B: Sports Video Description Model},
  author={Varun Kodathala},
  year={2025},
  url={https://huggingface.co/sportsvision/SV3.3B}
}