September 6, 2018 12:49


Speaker: Prof. Sharon Gannot (Bar-Ilan University, Israel)

Title: Multi-Microphone Speaker Localization on Manifolds

Speech enhancement is a core problem in audio signal processing, with commercial applications in devices as diverse as mobile phones, conference call systems, hands-free systems, and hearing aids. An essential component in the design of speech enhancement algorithms is acoustic source localization. Speaker localization is also directly applicable to many other audio-related tasks, e.g., automated camera steering, teleconferencing systems and robot audition.
From a signal processing perspective, speaker localization is the task of mapping multichannel speech signals to 3-D source coordinates. A viable solution to this mapping requires an accurate description of the source wave propagation, captured by the respective acoustic channel. The acoustic channels in reverberant environments represent a complex reflection pattern stemming from the surfaces and objects characterizing the enclosure. Hence, they are usually modelled by a very large number of coefficients, resulting in an intricate high-dimensional representation.

We start our talk by analyzing these acoustic responses with nonlinear dimensionality reduction techniques (diffusion maps). We claim that in static acoustic environments, despite the high-dimensional representation, the difference between acoustic channels is mainly attributed to changes in the source position. Thus, the true intrinsic dimension of the variations of the acoustic channels is significantly smaller than the number of variables commonly used for their representation; that is, the channels lie on a low-dimensional manifold that can be inferred from data collected in a training stage. This claim is validated by a comprehensive experimental study in actual acoustic environments.
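To make the diffusion-maps idea concrete, here is a minimal sketch in Python. It is not the speaker's implementation: the toy data (a one-parameter curve lifted into 200 dimensions by a random linear map, standing in for high-dimensional acoustic channels), the function name diffusion_map, and the median heuristic for the kernel scale are all assumptions made for illustration.

```python
import numpy as np

def diffusion_map(X, eps=None, n_components=2):
    """Return the leading nontrivial eigenvalues and diffusion coordinates of X."""
    # Pairwise squared Euclidean distances between the rows of X.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Kernel scale: median heuristic (an assumption, not taken from the talk).
    if eps is None:
        eps = np.median(d2)
    # Gaussian affinity kernel between samples.
    K = np.exp(-d2 / eps)
    deg = K.sum(axis=1)
    # Symmetric normalization S = D^{-1/2} K D^{-1/2}; S shares its eigenvalues
    # with the row-stochastic Markov matrix P = D^{-1} K.
    S = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = vecs / np.sqrt(deg)[:, None]
    # Drop the trivial pair (eigenvalue 1, constant eigenvector).
    keep = slice(1, n_components + 1)
    return vals[keep], psi[:, keep] * vals[keep]

# Hypothetical toy data: 80 samples controlled by a single parameter t,
# embedded in 200 dimensions by a random linear map.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 80)
curve = np.stack([np.cos(2.0 * t), np.sin(2.0 * t), t ** 2], axis=1)
X = curve @ rng.standard_normal((3, 200))

eigvals, embedding = diffusion_map(X, n_components=2)
```

Each row of embedding is a low-dimensional coordinate for one high-dimensional sample. In this toy setting the coordinates are governed by the single controlling parameter, which mirrors the claim that channel variations are driven mainly by source position.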

Motivated by this result, we present a data-driven, semi-supervised source localization algorithm based on two-microphone measurements, which accurately recovers the inverse mapping between the acoustic samples and their corresponding locations. The algorithm builds on the concept of manifold regularization in a reproducing kernel Hilbert space (RKHS), which extends the standard supervised estimation framework by adding an extra regularization term that imposes a smoothness constraint on possible solutions with respect to a manifold learned in a data-driven manner.
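For orientation, manifold regularization in an RKHS is commonly written as the optimization below. This is the generic textbook form, given here as an assumption; the talk's exact cost function may differ. All symbols are introduced for illustration: h_i are acoustic samples, p_i are the known positions of the l labeled samples, f stacks the values of f on all n labeled and unlabeled samples, L is a graph Laplacian built from those samples, and gamma_A, gamma_I are regularization weights.

```latex
f^{*} = \arg\min_{f \in \mathcal{H}_K}
  \frac{1}{l} \sum_{i=1}^{l} \bigl( p_i - f(h_i) \bigr)^2
  + \gamma_A \, \| f \|_{\mathcal{H}_K}^{2}
  + \gamma_I \, \mathbf{f}^{\top} L \, \mathbf{f},
\qquad
\mathbf{f} = \bigl[ f(h_1), \ldots, f(h_n) \bigr]^{\top}
```

The first two terms are standard supervised kernel regression. The third is the extra term: it penalizes functions that vary sharply between samples that are close on the learned manifold, and it is where the unlabeled measurements enter.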

We then show that an analogous mapping operator between the acoustic channel and the source location can be inferred from a Bayesian inference perspective. This Bayesian framework serves as a cornerstone for extending the single-node (microphone pair) setup to an ad hoc network of microphone pairs. Each node represents a different viewpoint that may be associated with a specific manifold. Merging the different manifolds is shown to increase the spatial separation and to improve the ability to accurately localize the source.

We conclude the talk by briefly discussing source tracking and by exploring future challenges, e.g., multiple-source localization and speech enhancement.

This is joint work with:
Bracha Laufer-Goldshtein, Bar-Ilan University, Israel
and Prof. Ronen Talmon, The Technion-IIT, Israel

Sharon Gannot received the B.Sc. degree (summa cum laude) from the Technion–Israel Institute of Technology, Haifa, Israel, in 1986, and the M.Sc. (cum laude) and Ph.D. degrees from Tel-Aviv University, Tel Aviv, Israel, in 1995 and 2000, respectively, all in electrical engineering. In 2001, he held a Postdoctoral position with the Department of Electrical Engineering, KU Leuven, Leuven, Belgium. From 2002 to 2003, he held a Research and Teaching position with the Faculty of Electrical Engineering, Technion–Israel Institute of Technology. He is currently a Full Professor with the Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel, where he heads the Speech and Signal Processing Laboratory and the Signal Processing Track. Since April 2018, he has also been a part-time Professor at the Technical Faculty of IT and Design, Aalborg University, Denmark.

His research interests include statistical signal processing and machine learning algorithms (including manifold learning and deep learning) with applications to single- and multi-microphone speech processing. Specifically, these include distributed algorithms for ad hoc microphone arrays, speech enhancement, noise reduction and speaker separation, dereverberation, single-microphone speech enhancement, and speaker localization and tracking.
Dr. Gannot was an Associate Editor for the EURASIP Journal on Advances in Signal Processing during 2003–2012, and an Editor for several special issues on multi-microphone speech processing of the same journal. He was a Guest Editor for the Elsevier Speech Communication and Signal Processing journals. He is currently the Lead Guest Editor of a special issue on speaker localization for the IEEE Journal of Selected Topics in Signal Processing. He was an Associate Editor for the IEEE Transactions on Audio, Speech, and Language Processing during 2009–2013, and the Area Chair for the same journal during 2013–2017. He is currently a Moderator for arXiv in the field of audio and speech processing. He is also a Reviewer for many IEEE journals and conferences. Since January 2010, he has been a Member of the Audio and Acoustic Signal Processing technical committee of the IEEE and has served, since January 2017, as the Committee Chair. Since 2005, he has also been a Member of the technical and steering committee of the International Workshop on Acoustic Signal Enhancement (IWAENC) and was the General Co-Chair of the IWAENC held in Tel-Aviv, Israel in August 2010. He was the General Co-Chair of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics in October 2013. He was selected (with colleagues) to present tutorial sessions at ICASSP 2012, EUSIPCO 2012, ICASSP 2013, and EUSIPCO 2013 and was a keynote speaker for IWAENC 2012 and LVA/ICA 2017. He was the recipient of the Bar-Ilan University Outstanding Lecturer Award in 2010 and 2014 and the Rector Innovation in Research Award in 2018. He is also a co-recipient of ten best paper awards.

More Information

Date: September 21, 2018 (Fri) 10:30 - 12:00


〒103-0027 Nihonbashi 1-chome Mitsui Building, 15th floor, 1-4-1 Nihonbashi,Chuo-ku, Tokyo