Beiyang Intelligent Computing Forum (北洋智算论坛) | Speaker Recognition Based on End-to-End Deep Learning

November 25, 2019, 10:12



Lecture Topic
Speaker Recognition Based on End-to-End Deep Learning

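Speaker and Biography
Ming Li (李明) is an Associate Professor of Electrical and Computer Engineering at Duke Kunshan University, a visiting researcher in the Department of Electrical and Computer Engineering at Duke University (USA), and an adjunct professor at the School of Computer Science, Wuhan University. He was selected as a Category B high-level talent in the 15th cohort of Jiangsu Province's "Six Talent Peaks" program. He received a bachelor's degree in communication engineering from Nanjing University in 2005, a master's degree in signal and information processing from the Institute of Acoustics, Chinese Academy of Sciences, in 2008, and a Ph.D. in engineering from the Department of Electrical Engineering at the University of Southern California in 2013. He has published more than 100 academic papers. He currently serves as a member of the IEEE Speech and Language Processing Technical Committee, a member of the APSIPA Speech and Language Processing technical committee, a committee member of the China Computer Federation special group on speech dialogue and auditory processing, and a committee member of the Chinese Association for Artificial Intelligence technical committee on artificial psychology and artificial emotion. He is an IEEE Senior Member and served as an area chair for speaker and language recognition at Interspeech 2016, 2018, and 2020.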

Abstract

Speech signals carry not only lexical information but also various kinds of paralinguistic speech attribute information, such as speaker, language, gender, age, and emotion. The core technical problem behind recognizing these attributes is utterance-level supervised learning on text-independent or text-dependent speech signals of flexible duration. In Section 1, we first formulate the problem of speaker and language recognition. In Section 2, we introduce the traditional framework, a pipeline of separate modules: feature extraction, representation, variability compensation, and backend classification. In Section 3, we introduce the end-to-end idea and compare it with the traditional framework, showing the correspondence between feature extraction and CNN layers, between representation and the encoding layer, and between backend modeling and fully connected layers. In Section 4, we introduce some robust methods that use the end-to-end framework in far-field and noisy conditions. Finally, we connect the end-to-end frameworks introduced here to other related tasks, e.g., speaker diarization, paralinguistic speech attribute recognition, and anti-spoofing countermeasures.

 
