From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding • Paper • 2409.18938 • Published Sep 27, 2024
microsoft/xclip-base-patch16-zero-shot • Video Classification • 0.2B • Updated Sep 12, 2023 • 4.08k • 26