New Audio Transformers Course: Live Launch Event with Paige Bailey, Seokhwan Kim, and Brian McFee

Join us for an exciting live event as we celebrate the launch of the new free and open-source Audio Transformers Course by Hugging Face!

We have invited a group of amazing guest speakers, experts in audio and AI with academic, open-source, and industry backgrounds, who will give presentations that complement the course materials and get you excited about audio AI.

Our guests are:
Paige Bailey, Product Lead for Generative Models at Google DeepMind (@dynamicwebpaige).
Seokhwan Kim, a Principal Applied Scientist at Amazon Alexa AI.
Bio: Prior to joining Amazon in 2019, Seokhwan worked on natural language understanding and spoken dialogue systems as an NLP Research Scientist at Adobe Research and as a Research Scientist at the Institute for Infocomm Research (I2R) in Singapore. Seokhwan completed his PhD at Pohang University of Science and Technology (POSTECH) in Korea, focusing on cross-lingual weakly-supervised language understanding under the guidance of Prof. Gary Geunbae Lee. He has authored 80 peer-reviewed papers in international journals and conferences, which have garnered over 2200 citations. His recent research has centered on knowledge-grounded conversational modeling. Seokhwan actively contributes to the research community as a member of the IEEE Speech and Language Processing Technical Committee, a board member of SIGDIAL, a steering committee member of DSTCs, and a PC member of major NLP, dialogue, AI, and speech conferences.
Talk:
"How Robust R U?”: Evaluating Task-oriented Dialogue Systems on Spoken Conversations
Most prior work in dialogue modeling has focused on written conversations, largely because of the data sets available. However, written dialogues do not fully capture the nature of spoken conversations, nor the speech recognition errors that arise in practical spoken dialogue systems. In this talk, I will introduce a public benchmark that we organized as a main track of the Tenth Dialog System Technology Challenge (DSTC10), specifically designed for evaluating multi-domain dialogue state tracking and knowledge-grounded dialogue modeling on spoken task-oriented conversations. As anticipated, our findings indicate that existing state-of-the-art models trained on written conversations do not perform as well on spoken data. Based on the DSTC10 results, we found that data augmentation and model ensembling are two critical factors that significantly improve the models' generalization, enabling better performance on both tasks.
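The talk does not spell out a specific augmentation recipe, but to make the idea concrete, here is a purely illustrative Python sketch of one simple approach: injecting ASR-style noise (random word drops and adjacent-word swaps) into written training utterances. The function name and the noise model are assumptions for illustration, not the DSTC10 method; real systems often use phonetically informed confusion models instead.

    import random

    def asr_noise(utterance: str, p_drop: float = 0.05, p_swap: float = 0.05) -> str:
        """Simulate ASR-style errors in a written utterance by randomly
        dropping words or swapping adjacent words (illustrative only)."""
        words = utterance.split()
        out = []
        i = 0
        while i < len(words):
            r = random.random()
            if r < p_drop:
                i += 1  # drop this word entirely
            elif r < p_drop + p_swap and i + 1 < len(words):
                out.extend([words[i + 1], words[i]])  # swap adjacent words
                i += 2
            else:
                out.append(words[i])
                i += 1
        return " ".join(out)

    # Generate noisy copies of a written utterance for training-time augmentation.
    random.seed(0)
    print(asr_noise("i would like to book a table for two at seven"))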

Brian McFee is Assistant Professor of Music Technology and Data Science at New York University (@functiontelechy).
Bio: He received the B.S. degree in Computer Science from the University of California, Santa Cruz, and M.S. and Ph.D. degrees in Computer Science and Engineering from the University of California, San Diego. His work lies at the intersection of machine learning and audio analysis. He is an active open source software developer, and the principal maintainer of the librosa package for audio analysis.
Talk:
"Introduction to audio processing with librosa
"
This talk gives a brief introduction to the librosa package and the basics of audio signal processing in Python. We'll cover the core package design and functionality, walk through some illustrative examples, and provide pointers on how librosa can be integrated into machine learning workflows.
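To give a flavor of the basics the talk covers, here is a minimal sketch (not taken from the talk itself) that loads one of librosa's bundled example clips and computes a log-scaled mel spectrogram, a common input feature for audio machine learning models. It assumes librosa and numpy are installed; the example audio is downloaded on first use.

    import numpy as np
    import librosa

    # Load a short example recording bundled with librosa,
    # resampled to the library's default rate of 22050 Hz.
    y, sr = librosa.load(librosa.example("trumpet"))
    print(f"{len(y)} samples at {sr} Hz ({len(y) / sr:.2f} s)")

    # Compute a log-scaled mel spectrogram.
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    S_db = librosa.power_to_db(S, ref=np.max)
    print(f"Mel spectrogram shape: {S_db.shape}")  # (n_mels, n_frames)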

Join us live to learn from the invited experts, and have an opportunity to engage with our guests in live Q&A sessions.

Save the date, and sign up for the course: http://eepurl.com/insvcI
