FINO-Net: A Deep Multimodal Sensor Fusion Framework for Manipulation Failure Detection
Abstract
A multimodal sensor fusion deep neural network, FINO-Net, detects and classifies manipulation failures in unstructured environments using RGB, depth, and audio data.
Safe manipulation in unstructured environments for service robots is a challenging problem. A failure detection system is needed to monitor and detect unintended outcomes. We propose FINO-Net, a novel multimodal sensor fusion based deep neural network to detect and identify manipulation failures. We also introduce a multimodal dataset, containing 229 real-world manipulation data recorded with a Baxter robot. Our network combines RGB, depth and audio readings to effectively detect and classify failures. Results indicate that fusing RGB with depth and audio modalities significantly improves the performance. FINO-Net achieves 98.60% detection and 87.31% classification accuracy on our novel dataset. Code and data are publicly available at https://github.com/ardai/fino-net.
Models citing this paper 0
No model linking this paper
Datasets citing this paper 1
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper