Inside the Research | September 2019 Hearing Review
The ability to hear speech in noisy environments has been a brass ring for hearing aid developers throughout the history of hearing healthcare. Directional microphones have been with us for decades, and in specific situations they can be quite useful. However, directional microphone technology has not solved the speech in noise (SIN) problem, because the benefit directional microphones provide often falls short of what the problem demands. That's why research on SIN continues throughout the world, and one of the more interesting areas involves deep neural networks (DNNs). To provide some background on this topic, I thought it would be interesting to interview Lars Bramslow, PhD, senior scientist at the Eriksholm Research Centre (Oticon A/S) in Snekkersten, Denmark, who has worked and published extensively in this area.
Beck: Good Morning, Lars. Nice to speak with you!
Bramslow: Thanks, Doug. Good to be with you, too.
Beck: And, before we jump in, I should disclose we worked together in 1987 at the House Ear Institute in Los Angeles, and you currently work at the Eriksholm Research Centre, which is part of Demant, the parent group of Oticon (for whom I work). I'd like to start by noting that the primary complaint of people with hearing loss is not that they need things to be louder, but that they need things to be clearer to better understand speech in noise.
Bramslow: I agree. And, of course, there are technologies which improve the ability to understand SIN via an improved signal-to-noise ratio (SNR). However, one must be careful because although many hearing technologies are available to improve the SNR, they most often do not improve the SNR enough to really please the wearer.
Beck: I suspect you’re speaking about directional microphones?
Bramslow: Exactly. Directional mics are wonderful; they absolutely can improve the SNR by 2-4 dB, which can be very useful in some situations when the noise is coming from behind the listener. Yet, in the most difficult listening situations, this 2-4 dB improvement may not be noticeable to the person wearing the hearing aids, as there may be interfering sounds from the front as well, in which case directional microphones provide little benefit.
Beck: An excellent article1 on this topic was published about 15 years ago in Hearing Review by Mead Killion, who pointed out that people with normal hearing may only require an SNR of 2-3 dB to correctly perceive words in noise, whereas people with a mild-to-moderate hearing loss may need an 8 dB SNR to achieve the same success. In other words, if you have a patient with an SNR requirement of 8 dB, and you aid them with technology that provides a 2-4 dB improvement, they still will not be able to understand speech in noise.
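To make that arithmetic concrete, here is a minimal illustrative sketch. The ambient SNR value is an assumption chosen for the example, not a figure from Killion's article; only the 8 dB requirement and the 2-4 dB directional benefit come from the discussion above.

```python
# Illustrative only: how far a listener's SNR need can exceed what a
# directional microphone delivers.

ambient_snr_db = 0.0          # assumed SNR of a noisy restaurant (illustrative)
required_snr_db = 8.0         # SNR needed by a listener with mild-to-moderate loss
directional_benefit_db = 3.0  # midpoint of the 2-4 dB range cited above

aided_snr_db = ambient_snr_db + directional_benefit_db
shortfall_db = required_snr_db - aided_snr_db
print(f"Aided SNR: {aided_snr_db:.1f} dB, shortfall: {shortfall_db:.1f} dB")
# -> Aided SNR: 3.0 dB, shortfall: 5.0 dB; still well below what the listener needs
```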
Bramslow: Exactly. And so we need to explore other technologies and protocols which provide alternative speech and sound processing.
Beck: And that’s the golden ring you and others in our field have been pursuing?
Bramslow: Yes. Rather than simply providing fundamental directionality, or a further reduction in the soundfield through narrow-band directionality (ie, beam-forming), which usually presumes speech is in front of the listener and noise is behind them, we have to realize that multiple speech targets can coexist. That means we should provide the brain with the information it requires to focus on the primary voice or the sound source of interest. As you know, very few patients can even tell when their directional mics are on, even if the mics are working well. So the issue is that the real-world benefit from directional microphones can, unfortunately, be very small when speech babble arrives from all directions.
Beck: And so let’s talk about your research, which seems to be focused on enhancing known voices. Please tell me about that.
Bramslow: OK, sure. What we've been doing is exploring techniques and protocols that allow two people to speak simultaneously while the hearing aid algorithm enhances just the one known voice. Specifically, we can permit the technology to "train" to the voice of maximal interest, providing the wearer a better listening experience through improved word intelligibility and sound quality. In our experiments and publications,2,3 we've explored these protocols, and perhaps in a few years they will be commercially viable.
Beck: That's fantastic! Of course, some of these protocols are proprietary, and as such, we'll just leave it there. In the meantime, please tell me about "deep neural networks" (DNNs) and what they mean for hearing, listening, and amplification. My understanding is that, like the central nervous system, a DNN is a highly integrated system in which nerve-like units and connections work together in parallel to perform a specific function?
Bramslow: Yes, that's a good starting point. DNNs can learn and perform simple and complex tasks. In our case, we're focused on the DNN task of separating one talker from another. This ability appears to be based on the specific attributes of the person speaking, such as vocal tract length, fundamental frequency, harmonics, prosody, the rhythm of speech, and other acoustic characteristics. It also depends, of course, on the listener: their familiarity with the voice and their ability to hear, listen to, and untangle the perceived sounds.
In lab settings, we've found DNNs can successfully separate one talker from another and thereby add more clarity to familiar voices. A deep neural network can be defined in many ways, but DNNs can be thought of as inspired by the brain, where nerve cells in the central nervous system are interconnected by synapses and can be updated and trained to perform specific functions. In this case, we train artificial processing units to perform source separation, that is, to separate voices and focus on the more familiar voice, even when all voices are coming from the front of the listener.
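To give a flavor of what such a system can look like in code, here is a minimal, hypothetical sketch of mask-based voice separation. It is not the Eriksholm or Oticon algorithm; the architecture, feature choices, and training details are assumptions made for brevity. The idea is that a small network is trained, using examples of the familiar talker, to predict how much of each time-frequency bin in the mixture spectrogram belongs to that talker; the predicted mask is then applied to the mixture before resynthesis.

```python
# A minimal, illustrative sketch of mask-based voice separation with a DNN.
# NOT the Eriksholm/Oticon algorithm; sizes, features, and training details
# are assumptions chosen for brevity.
import torch
import torch.nn as nn

N_FREQ = 257  # STFT bins for a 512-point FFT (assumed analysis setup)

class MaskNet(nn.Module):
    """Predicts a [0, 1] time-frequency mask for the familiar (target) voice."""
    def __init__(self, n_freq: int = N_FREQ, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq), nn.Sigmoid(),  # mask values in [0, 1]
        )

    def forward(self, mix_mag: torch.Tensor) -> torch.Tensor:
        # mix_mag: (frames, n_freq) magnitude spectrogram of the two-talker mixture
        return self.net(mix_mag)

def train_step(model, optimizer, mix_mag, target_mag):
    """One supervised step: the 'training to a known voice' idea in miniature.
    target_mag is the clean magnitude of the familiar talker, available only
    during training."""
    ideal_mask = (target_mag / (mix_mag + 1e-8)).clamp(0.0, 1.0)  # ratio-mask-style label
    loss = nn.functional.mse_loss(model(mix_mag), ideal_mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# At run time, the estimated mask scales the mixture spectrogram frame by frame,
# and the result is inverted back to a waveform that emphasizes the familiar voice.
model = MaskNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mix = torch.rand(100, N_FREQ)                # stand-in for real mixture frames
target = mix * torch.rand(100, N_FREQ)       # stand-in for the familiar talker's share
print(train_step(model, opt, mix, target))
```

In a hearing aid, processing delay is critical, which is presumably why the published work2 emphasizes a low-latency design; a small frame-by-frame network like the toy model above, rather than one that looks far ahead in time, reflects that constraint.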
Beck: So, in some respects, this is a black box function?
Bramslow: Yes. It's relatively early, and we don't know all the details and interactions just yet, but we do know that DNN systems are trainable and work in parallel, and the more we study them, the more likely we'll be able to use them to our benefit in hearing aid amplification systems in the short and long term.4 And by now, they have clearly outperformed more conventional source-separation algorithms.
Beck: And how much separation have you seen in the lab when two competing voices are managed via a DNN protocol?
Bramslow: Of course, it varies…but in general, we're seeing a measured SNR improvement of 7-9 dB, with concomitant improvements in word recognition on the order of 40 percentage points for one voice.
Beck: That’s relatively huge.
Bramslow: And, potentially, this will get even more significant with time.
Beck: And is the DNN technology being used in commercially available systems?
Bramslow: Not in terms of hearing processing, but DNNs are being used in other forms. For example, if you go online, there are websites that can take a photograph or a video and, in real time, turn it into what might appear to be a painting by Picasso or Van Gogh…These systems rely on the same or similar DNNs that we're speaking about. Just search "change photo to painting" and you'll find many options.
Beck: And these are visual examples of how hardware and software can recognize objects and convert them into a more desirable format or outcome?
Bramslow: Yes, and we’re getting closer to it every day, along with improved artificial intelligence and greater learning opportunity for machines.
Beck: Quite amazing, Lars. Thanks so much for your time and for sharing your insights with us.
Bramslow: My pleasure, Doug. Thanks for your interest in our work!
References
1. Killion MC. Myths about hearing in noise and directional microphones. Hearing Review. 2004;11(2):14-19,72,73. Available at: https://www.etymotic.com/media/publications/erl-0051-2004.pdf
2. Bramsløw L, Naithani G, Hafez A, Barker T, Pontoppidan NH, Virtanen T. Improving competing voices segregation for hearing impaired listeners using a low-latency deep neural network algorithm. J Acoust Soc Am. 2018;144(1):172-185. Available at: https://doi.org/10.1121/1.5045322
3. Bramsløw L, Vatti M, Rossing R, Naithani G, Pontoppidan NH. A Competing Voices Test for hearing-impaired listeners applied to spatial separation and ideal time-frequency masks. Trends in Hearing. 2019;23. Available at: https://doi.org/10.1177/2331216519848288
4. Bentsen T, May T, Kressner AA, Dau T. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility. PLoS One. 2018;13(5). Available at: https://doi.org/10.1371/journal.pone.0196924
About the author: Douglas L. Beck, AuD, is Executive Director of Academic Sciences at Oticon Inc, Somerset, NJ. He has served as Editor In Chief at AudiologyOnline and Web Content Editor for the American Academy of Audiology (AAA). Dr Beck is an Adjunct Clinical Professor of Communication Disorders and Sciences at the State University of New York, Buffalo, and also serves as Senior Editor of Clinical Research for The Hearing Review’s Inside the Research column.
CORRESPONDENCE can be addressed to Dr Beck at: [email protected]
Citation for this article: Beck DL. Speech in Noise Research and Deep Neural Networks: An Interview with Lars Bramslow, PhD. Hearing Review. 2019;26(9):46-47.