# Official models of "MoChat: Joints-Grouped Spatio-Temporal Grounding LLM for Multi-Turn Motion Comprehension and Description"

## Overview

MoChat is a Multimodal Large Language Model (MLLM) for human motion understanding with precise spatio-temporal grounding. Unlike conventional motion analysis systems, MoChat integrates:

- **Motion Understanding**: Performs fundamental motion comprehension and summarization.
- **Spatial Limb Grounding**: Accurately locates the body parts involved in described movements.
- **Temporal Action Grounding**: Precisely identifies the time boundaries corresponding to specific motion descriptions.

## Models

We provide the following trained models for download (a download sketch follows the list):

- **[Joints-Grouped Skeleton Encoder](https://huggingface.co/CSUBioGroup/MoChat/blob/main/JGSE_epoch120)** for motion sequence representation.
- Two variants of the motion comprehension model:
  - [MoChat](https://huggingface.co/CSUBioGroup/MoChat/tree/main/MoChat): Base model.
  - [MoChat-R](https://huggingface.co/CSUBioGroup/MoChat/tree/main/MoChat-R): Extended model with a regression head.

## Resources

- **Codebase**: [GitHub](https://github.com/CSUBioGroup/MoChat)
- **Paper**: [arXiv](https://arxiv.org/abs/2410.11404)
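
A minimal sketch, assuming you only want to fetch the released weights with `huggingface_hub`: the repo id, checkpoint filename, and folder names come from the links above, while the local directory name is an arbitrary choice. Refer to the codebase for how the checkpoints are actually loaded.

```python
# Sketch: download the MoChat checkpoints from the Hugging Face Hub.
# Repo id and file/folder names are taken from the model links above;
# "checkpoints" is a hypothetical local directory.
from huggingface_hub import hf_hub_download, snapshot_download

# Joints-Grouped Skeleton Encoder checkpoint (single file).
encoder_path = hf_hub_download(
    repo_id="CSUBioGroup/MoChat",
    filename="JGSE_epoch120",
)

# A full model variant (use "MoChat-R/*" for the regression-head variant).
mochat_dir = snapshot_download(
    repo_id="CSUBioGroup/MoChat",
    allow_patterns=["MoChat/*"],
    local_dir="checkpoints",
)

print(encoder_path)
print(mochat_dir)
```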