Real-time target sound extraction using attention

Real-time target sound extraction, ICASSP 2023
Bandhav Veluri, University of Washington

We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions for processing large receptive fields in a computationally efficient manner while also leveraging the generalization performance of transformer-based architectures.
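To make the receptive-field argument concrete, here is a minimal pure-Python sketch of a stack of dilated causal convolutions. It is not the Waveformer implementation: the function names, the kernel size of 3, and the dilation schedule that doubles at each layer are illustrative assumptions, not values taken from this description. It shows only that each output sample depends on current and past inputs, and that the receptive field grows exponentially with depth while the per-layer cost stays fixed.

```python
def causal_dilated_conv(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so no future input is used.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def encode(x, weights, dilations):
    """Apply one causal dilated convolution per entry in `dilations`."""
    for d in dilations:
        x = causal_dilated_conv(x, weights, d)
    return x


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence a single output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical configuration: 10 layers, kernel size 3, dilations 1, 2, 4, ...
dilations = [2 ** i for i in range(10)]
rf = receptive_field(3, dilations)

# Causality check on a small stack: feed a single impulse at position 5.
impulse = [0.0] * 32
impulse[5] = 1.0
response = encode(impulse, [1.0, 1.0, 1.0], [1, 2, 4])
```

With these assumed settings, ten layers of three taps each cover 2047 past samples, whereas a single undilated causal convolution would need a 2047-tap kernel for the same context. In the small three-layer check, the impulse response is zero before position 5, which is the property that makes streaming inference possible.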

We provide code, dataset, and audio samples: https://waveformer.cs.washington.edu/.

This video is closed captioned.
