tsinghua-ee/video-SALMONN-2

Video-Text-to-Text

text-generation

text-generation-inference

video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models

Official model release of video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models

Results

Downloads last month: 174

Safetensors

Model size: 9B params

Tensor type: I64 · BF16

Model tree for tsinghua-ee/video-SALMONN-2

Base model: Qwen/Qwen2-7B

Finetuned (68): this model

Datasets used to train tsinghua-ee/video-SALMONN-2

Paper for tsinghua-ee/video-SALMONN-2

video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models

Paper • 2506.15220 • Published Jun 18, 2025