---
license: bsd-3-clause
tags:
- multimodal
- emotion-recognition
- llama
- lora
- acm-mm-2025
---
MoSEAR: Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning
Model Description
This repository contains the MoSEAR.pth model weights for MoSEAR (Modality-Specific Experts with Attention Reallocation), a framework designed to address emotion conflicts in multimodal emotion reasoning tasks.
Key Features:
- MoSE (Modality-Specific Experts): Parameter-efficient LoRA-based training with modality-specific experts (see the illustrative sketch after this list)
- AR (Attention Reallocation): Inference-time attention intervention mechanism
- CA-MER Benchmark: New benchmark for evaluating emotion conflict scenarios
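To make the MoSE idea concrete, below is a minimal sketch of a modality-specific LoRA expert wrapped around a frozen linear layer. Every name here (ModalitySpecificLoRALinear, rank, alpha, the modality keys) is an illustrative assumption rather than the repository's actual API; see the GitHub repository for the real implementation.

import torch
import torch.nn as nn

class ModalitySpecificLoRALinear(nn.Module):
    """Frozen base linear layer plus one low-rank (LoRA) adapter per modality (illustrative only)."""

    def __init__(self, base: nn.Linear, modalities=("audio", "visual", "text"), rank=8, alpha=16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the per-modality adapters are trained
        self.scaling = alpha / rank
        self.experts = nn.ModuleDict({
            m: nn.Sequential(
                nn.Linear(base.in_features, rank, bias=False),   # LoRA "A" projection
                nn.Linear(rank, base.out_features, bias=False),  # LoRA "B" projection
            )
            for m in modalities
        })
        for expert in self.experts.values():
            nn.init.zeros_(expert[1].weight)  # start as a zero update to the base output

    def forward(self, x, modality):
        # Route the input through the adapter matching its modality
        return self.base(x) + self.scaling * self.experts[modality](x)

# Example: wrap a projection layer and run a batch of "visual" tokens through it
layer = ModalitySpecificLoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(2, 4096), modality="visual")

The point of the pattern is that the base weights stay frozen and only the small per-modality adapters are trained, which is what makes the approach parameter-efficient.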
Model Information
- Model Type: Multimodal Emotion Reasoning Model
- Base Architecture: LLaMA with vision-language interface
- Training Method: LoRA (Low-Rank Adaptation) with modality-specific experts
- Checkpoint: Best model from training (epoch 29)
- Task: Multimodal emotion recognition with conflict handling
Performance
This model achieves state-of-the-art performance on emotion conflict scenarios:
- Handles inconsistent emotional cues across audio, visual, and text modalities
- Effective attention reallocation during inference (a toy illustration follows this list)
- Robust performance on CA-MER benchmark
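The attention reallocation step happens purely at inference time. As a toy illustration of what rescaling attention mass toward or away from one modality's tokens can look like (this sketch is an assumption for exposition, not the paper's actual AR mechanism):

import torch

def reallocate_attention(attn_weights, target_slice, gain=1.5):
    """Rescale the attention mass assigned to one modality's token span, then renormalize.

    attn_weights: (batch, heads, query_len, key_len) post-softmax attention.
    target_slice: slice over the key dimension covering that modality's tokens.
    gain: >1 boosts the target modality, <1 suppresses it.
    """
    weights = attn_weights.clone()
    weights[..., target_slice] = weights[..., target_slice] * gain
    return weights / weights.sum(dim=-1, keepdim=True)  # keep each row summing to 1

# Example: boost the audio tokens, assumed here to occupy key positions 0..31
attn = torch.softmax(torch.randn(1, 8, 16, 128), dim=-1)
attn = reallocate_attention(attn, slice(0, 32), gain=1.5)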
Usage
Loading the Model
import torch
# Load checkpoint
checkpoint = torch.load('MoSEAR.pth', map_location='cpu')
# The checkpoint contains:
# - model state dict
# - optimizer state (if included)
# - training metadata
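The comments above describe the expected contents. As a quick sanity check you can list the top-level keys before handing the file to the framework; the 'model' key used below is an assumption and may differ in the actual checkpoint.

import torch

# weights_only=False lets non-tensor metadata (optimizer state, epoch counters) load on newer PyTorch
checkpoint = torch.load('MoSEAR.pth', map_location='cpu', weights_only=False)

# Print the top-level entries saved in the file (names depend on the training script)
print(list(checkpoint.keys()))

# If the weights sit under a 'model' key (an assumption), pull out the state dict;
# otherwise the file may itself already be the state dict.
state_dict = checkpoint.get('model', checkpoint)
print(f"{len(state_dict)} entries in the state dict")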
Full Pipeline
For complete usage with the MoSEAR framework, please refer to the GitHub repository.
# Clone the code repository
git clone https://github.com/ZhiyuanHan-Aaron/MoSEAR.git
cd MoSEAR
# Download this checkpoint
# Place it in the appropriate directory as per the repository instructions
# Run inference
bash scripts/inference.sh
Model Files
- MoSEAR.pth: Main model checkpoint (best performing model)
Citation
If you use this model in your research, please cite:
@inproceedings{han2025mosear,
title={Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning},
author={Han, Zhiyuan and Li, Yifei and Chen, Yanyan and Liang, Xiaohan and Song, Mingming and Peng, Yongsheng and Yin, Guanghao and Ma, Huadong},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
year={2025}
}
Contact
Zhiyuan Han
- Email: [email protected]
- GitHub: @ZhiyuanHan-Aaron
Acknowledgements
This work builds upon:
License
This model is released under the BSD 3-Clause License. See the LICENSE file for details.
Copyright Β© 2025 Zhiyuan Han