---
license: bsd-3-clause
tags:
- multimodal
- emotion-recognition
- llama
- lora
- acm-mm-2025
---
MoSEAR: Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning
Model Description
This repository contains the MoSEAR.pth model weights for MoSEAR (Modality-Specific Experts with Attention Reallocation), a framework designed to address emotion conflicts in multimodal emotion reasoning tasks.
Key Features:
- MoSE (Modality-Specific Experts): Parameter-efficient LoRA-based training with modality-specific experts (see the illustrative sketch after this list)
- AR (Attention Reallocation): Inference-time attention intervention mechanism
- CA-MER Benchmark: New benchmark for evaluating emotion conflict scenarios
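To make the MoSE idea concrete, below is a minimal sketch of a modality-specific LoRA expert wrapped around a frozen linear layer. Every name here (ModalitySpecificLoRALinear, rank, alpha, the modality keys) is an illustrative assumption rather than the repository's actual API; see the GitHub repository for the real implementation.

import torch
import torch.nn as nn

class ModalitySpecificLoRALinear(nn.Module):
    """Frozen base linear layer plus one low-rank (LoRA) adapter per modality (illustrative only)."""

    def __init__(self, base: nn.Linear, modalities=("audio", "visual", "text"), rank=8, alpha=16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the per-modality adapters are trained
        self.scaling = alpha / rank
        self.experts = nn.ModuleDict({
            m: nn.Sequential(
                nn.Linear(base.in_features, rank, bias=False),   # LoRA "A" projection
                nn.Linear(rank, base.out_features, bias=False),  # LoRA "B" projection
            )
            for m in modalities
        })
        for expert in self.experts.values():
            nn.init.zeros_(expert[1].weight)  # start as a zero update to the base output

    def forward(self, x, modality):
        # Route the input through the adapter matching its modality
        return self.base(x) + self.scaling * self.experts[modality](x)

# Example: wrap a projection layer and run a batch of "visual" tokens through it
layer = ModalitySpecificLoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(2, 4096), modality="visual")

The point of the pattern is that the base weights stay frozen and only the small per-modality adapters are trained, which is what makes the approach parameter-efficient.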
Model Information
- Model Type: Multimodal Emotion Reasoning Model
- Base Architecture: LLaMA with vision-language interface
- Training Method: LoRA (Low-Rank Adaptation) with modality-specific experts
- Checkpoint: Best model from training (epoch 29)
- Task: Multimodal emotion recognition with conflict handling
Performance
This model achieves state-of-the-art performance on emotion conflict scenarios:
- Handles inconsistent emotional cues across audio, visual, and text modalities
- Effective attention reallocation during inference (a toy illustration follows this list)
- Robust performance on CA-MER benchmark
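The attention reallocation step happens purely at inference time. As a toy illustration of what rescaling attention mass toward or away from one modality's tokens can look like (this sketch is an assumption for exposition, not the paper's actual AR mechanism):

import torch

def reallocate_attention(attn_weights, target_slice, gain=1.5):
    """Rescale the attention mass assigned to one modality's token span, then renormalize.

    attn_weights: (batch, heads, query_len, key_len) post-softmax attention.
    target_slice: slice over the key dimension covering that modality's tokens.
    gain: >1 boosts the target modality, <1 suppresses it.
    """
    weights = attn_weights.clone()
    weights[..., target_slice] = weights[..., target_slice] * gain
    return weights / weights.sum(dim=-1, keepdim=True)  # keep each row summing to 1

# Example: boost the audio tokens, assumed here to occupy key positions 0..31
attn = torch.softmax(torch.randn(1, 8, 16, 128), dim=-1)
attn = reallocate_attention(attn, slice(0, 32), gain=1.5)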
Usage
Loading the Model
import torch
# Load checkpoint
checkpoint = torch.load('MoSEAR.pth', map_location='cpu')
# The checkpoint contains:
# - model state dict
# - optimizer state (if included)
# - training metadata
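The comments above describe the expected contents. As a quick sanity check you can list the top-level keys before handing the file to the framework; the 'model' key used below is an assumption and may differ in the actual checkpoint.

import torch

# weights_only=False lets non-tensor metadata (optimizer state, epoch counters) load on newer PyTorch
checkpoint = torch.load('MoSEAR.pth', map_location='cpu', weights_only=False)

# Print the top-level entries saved in the file (names depend on the training script)
print(list(checkpoint.keys()))

# If the weights sit under a 'model' key (an assumption), pull out the state dict;
# otherwise the file may itself already be the state dict.
state_dict = checkpoint.get('model', checkpoint)
print(f"{len(state_dict)} entries in the state dict")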
Full Pipeline
For complete usage with the MoSEAR framework, please refer to the GitHub repository.
# Clone the code repository
git clone https://github.com/ZhiyuanHan-Aaron/MoSEAR.git
cd MoSEAR
# Download this checkpoint
# Place it in the appropriate directory as per the repository instructions
# Run inference
bash scripts/inference.sh
Model Files
- MoSEAR.pth: Main model checkpoint (best performing model)
Citation
If you use this model in your research, please cite:
@inproceedings{han2025mosear,
title={Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning},
author={Han, Zhiyuan and Li, Yifei and Chen, Yanyan and Liang, Xiaohan and Song, Mingming and Peng, Yongsheng and Yin, Guanghao and Ma, Huadong},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
year={2025}
}
Contact
Zhiyuan Han
- Email: [email protected]
- GitHub: @ZhiyuanHan-Aaron
Acknowledgements
This work builds upon:
License
This model is released under the BSD 3-Clause License. See the LICENSE file for details.
Copyright Β© 2025 Zhiyuan Han