license: bsd-3-clause
tags:
  - multimodal
  - emotion-recognition
  - llama
  - lora
  - acm-mm-2025

MoSEAR: Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning

Paper Conference GitHub

πŸ“‹ Model Description

This repository contains the MoSEAR.pth model weights for MoSEAR (Modality-Specific Experts with Attention Reallocation), a framework designed to address emotion conflicts in multimodal emotion reasoning tasks.

Key Features:

  • MoSE (Modality-Specific Experts): Parameter-efficient LoRA-based training with modality-specific experts (a minimal illustrative sketch follows this list)
  • AR (Attention Reallocation): Inference-time attention intervention mechanism
  • CA-MER Benchmark: New benchmark for evaluating emotion conflict scenarios
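
To make the MoSE idea concrete, below is a minimal, illustrative sketch of modality-specific LoRA experts attached to a frozen projection layer. It is not the released implementation, and every name in it (LoRAExpert, MoSELinear, rank, alpha) is hypothetical; refer to the GitHub repository for the actual architecture and for how AR intervenes on attention at inference time.

import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    """Low-rank adapter: adds up(down(x)) * scale on top of a frozen layer."""
    def __init__(self, dim, rank=8, alpha=16.0):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)        # start as a no-op update
        self.scale = alpha / rank

    def forward(self, x):
        return self.up(self.down(x)) * self.scale

class MoSELinear(nn.Module):
    """Frozen base projection plus one LoRA expert per modality (hypothetical)."""
    def __init__(self, dim, modalities=("audio", "visual", "text")):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        for p in self.base.parameters():
            p.requires_grad_(False)           # only the experts are trained
        self.experts = nn.ModuleDict({m: LoRAExpert(dim) for m in modalities})

    def forward(self, x, modality):
        return self.base(x) + self.experts[modality](x)

layer = MoSELinear(dim=64)
audio_tokens = torch.randn(2, 10, 64)         # (batch, seq_len, dim)
out = layer(audio_tokens, modality="audio")   # routed through the audio expert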

🎯 Model Information

  • Model Type: Multimodal Emotion Reasoning Model
  • Base Architecture: LLaMA with vision-language interface
  • Training Method: LoRA (Low-Rank Adaptation) with modality-specific experts
  • Checkpoint: Best model from training (epoch 29)
  • Task: Multimodal emotion recognition with conflict handling

πŸ“Š Performance

This model achieves state-of-the-art performance on emotion conflict scenarios:

  • Handles inconsistent emotional cues across audio, visual, and text modalities
  • Effective attention reallocation during inference
  • Robust performance on CA-MER benchmark

πŸš€ Usage

Loading the Model

import torch

# Load checkpoint
checkpoint = torch.load('MoSEAR.pth', map_location='cpu')

# The checkpoint contains:
# - model state dict
# - optimizer state (if included)
# - training metadata
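
A quick, hedged way to inspect the file before loading it into a model built with the repository code; the 'model' key below is an assumption, so check checkpoint.keys() to see what this particular file actually stores.

# On recent PyTorch versions (>= 2.6) torch.load defaults to weights_only=True,
# which may reject checkpoints that store optimizer state or other Python objects;
# pass weights_only=False if loading fails and you trust the file.
print(list(checkpoint.keys()))

# Pull out the weights; fall back to treating the whole file as a raw state dict.
state_dict = checkpoint.get('model', checkpoint)

# model.load_state_dict(state_dict, strict=False)  # 'model' must first be built via the repo code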

Full Pipeline

For complete usage with the MoSEAR framework, please refer to the GitHub repository.

# Clone the code repository
git clone https://github.com/ZhiyuanHan-Aaron/MoSEAR.git
cd MoSEAR

# Download this checkpoint
# Place it in the appropriate directory as per the repository instructions

# Run inference
bash scripts/inference.sh

πŸ“ Model Files

  • MoSEAR.pth: Main model checkpoint (best performing model)

πŸ“„ Citation

If you use this model in your research, please cite:

@inproceedings{han2025mosear,
  title={Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning},
  author={Han, Zhiyuan and Li, Yifei and Chen, Yanyan and Liang, Xiaohan and Song, Mingming and Peng, Yongsheng and Yin, Guanghao and Ma, Huadong},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
  year={2025}
}

πŸ“§ Contact

Zhiyuan Han

πŸ™ Acknowledgements

This work builds upon:

πŸ“œ License

This model is released under the BSD 3-Clause License. See the LICENSE file for details.

Copyright Β© 2025 Zhiyuan Han
