metadata
language: en
license: cc-by-nc-4.0
tags:
  - video-to-text
  - sports
  - vision-language
  - multimodal
  - research-only
library_name: transformers
pipeline_tag: video-text-to-text

SV3.3B - Sports Video Description Model

⚠️ RESEARCH USE ONLY - NON-COMMERCIAL LICENSE

SV3.3B is a unified video-to-text model specifically designed for sports video understanding.

License

This model is licensed under CC-BY-NC-4.0 (Creative Commons Attribution-NonCommercial 4.0 International). It is intended for research purposes only and may not be used for commercial applications.

Model Architecture

  • Vision Encoder: Custom DWT-VJEPA2-based video encoder
  • Text Model: 3.3B parameter language model
  • Vision-Text Bridge: Learned projection layer
  • Specialization: Fine-tuned on sports video data
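The components listed above suggest a common data flow: the video encoder emits a sequence of feature vectors, a learned projection maps them into the language model's embedding space, and the projected vectors are placed alongside the text embeddings. The sketch below illustrates that flow in plain Python under loud assumptions: a single linear layer stands in for the bridge (the actual layer may differ), and the names `make_projection` and `project`, the tiny dimensions, and the random weights are all hypothetical and not taken from the SV3.3B code.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def make_projection(in_dim: int, out_dim: int, seed: int = 0) -> Tuple[Matrix, Vector]:
    """Create a randomly initialised linear layer (hypothetical stand-in for trained weights)."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(tokens: Matrix, weight: Matrix, bias: Vector) -> Matrix:
    """Map each vision token (length in_dim) to a text-space vector (length out_dim)."""
    return [
        [
            sum(x * w_row[j] for x, w_row in zip(token, weight)) + bias[j]
            for j in range(len(bias))
        ]
        for token in tokens
    ]


# Illustrative sizes only: 4 vision tokens of width 8, text embeddings of width 6.
vision_tokens = [[0.01 * (i + j) for j in range(8)] for i in range(4)]
text_embeddings = [[0.0] * 6 for _ in range(3)]

weight, bias = make_projection(in_dim=8, out_dim=6)
projected = project(vision_tokens, weight, bias)

# Assumption: projected vision tokens are prepended to the text embeddings.
language_model_input = projected + text_embeddings
```

After projection every vector has the text model's width, which is what allows the two sequences to be concatenated into one input.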

Model Details

  • Parameters: 3.3B
  • Input: Video files (16 frames, 256x256 resolution)
  • Output: Natural language descriptions
  • Domain: Sports video analysis
  • Training data: SPORTSVISION/NSVA_SUBSET
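The input specification above (16 frames at 256x256) implies that a longer clip has to be reduced to a fixed number of frames before it reaches the model. The model card does not say how frames are chosen, so the following is only a minimal sketch of one common approach, evenly spaced sampling. The function name `sample_frame_indices` and the constants are hypothetical, and resizing each selected frame to 256x256 is left to whichever video library is in use.

```python
from typing import List

NUM_FRAMES = 16   # frames per clip, from the model card
FRAME_SIZE = 256  # height and width in pixels, from the model card


def sample_frame_indices(total_frames: int, num_frames: int = NUM_FRAMES) -> List[int]:
    """Pick num_frames indices spread evenly over a clip of total_frames frames."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Split the clip into num_frames equal segments and take each segment's centre.
    # Clips shorter than num_frames repeat indices rather than failing.
    return [
        min(total_frames - 1, int((i + 0.5) * total_frames / num_frames))
        for i in range(num_frames)
    ]


# Example: choose 16 frames from a 300-frame clip.
indices = sample_frame_indices(300)
```

Sampling segment centres rather than segment starts keeps the selection symmetric around the middle of the clip, which is one reasonable choice among several.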

Model Performance

[Figure: model performance results (image not reproduced here)]

Limitations

  • Research use only
  • No commercial applications
  • Optimized for sports content
  • May not generalize to other video domains

Citation

If you use this model in your research, please cite:

@misc{sv3-3b-2024,
  title={SV3.3B: Sports Video Description Model},
  author={Varun Kodathala},
  year={2025},
  url={https://huggingface.co/sportsvision/SV3.3B}
}