arxiv:2303.07626

CAT: Causal Audio Transformer for Audio Classification

Published on Mar 14, 2023

Authors:

Abstract

A Causal Audio Transformer with Multi-Resolution Multi-Feature extraction and an acoustic attention block achieves state-of-the-art performance in audio classification.

AI-generated summary

The attention-based Transformers have been increasingly applied to audio classification because of their global receptive field and ability to handle long-term dependency. However, the existing frameworks which are mainly extended from the Vision Transformers are not perfectly compatible with audio signals. In this paper, we introduce a Causal Audio Transformer (CAT) consisting of a Multi-Resolution Multi-Feature (MRMF) feature extraction with an acoustic attention block for more optimized audio modeling. In addition, we propose a causal module that alleviates over-fitting, helps with knowledge transfer, and improves interpretability. CAT obtains higher or comparable state-of-the-art classification performance on ESC50, AudioSet and UrbanSound8K datasets, and can be easily generalized to other Transformer-based models.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2303.07626 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2303.07626 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.