Figure 6: Schematic of the test environment for the listening effort experiment. All graphics courtesy of Phonak

New deep neural network noise reduction technology found in Phonak Audéo Sphere Infinio improves speech understanding and listening effort for hearing aid users.

By Kevin Seitz-Paquette, AuD; Matthias Keller, PhD; Anne Miller, AuD; Ashley Wright, AuD; Volker Kuehnel, PhD; Matthias Latzel, PhD; Stefan Raufer, PhD; Shin-Shin Hobi

Understanding speech in noise is one of the primary challenges of hearing aid users. Despite numerous advancements in hearing technology, directional microphones combined with conventional noise reduction algorithms are still among the most effective means to improve the signal-to-noise ratio (SNR) for hearing aid users. Null-steering and binaural beamforming have improved the performance of directional systems, but the inherent disadvantages of directional microphones have not been addressed. Namely, the hearing aid user must have the awareness and ability to place the signal of interest within the beam of the directional microphone. In the case of multiple talkers, or in the case of an unexpected talker approaching from a null-point (e.g., a waiter in a restaurant), a directional microphone system may work against the listening goals of the user.

Deep neural network noise reduction (DNN-NR) has shown remarkable potential to remove noise and preserve speech without relying on microphone directivity.(1) In doing so, DNN-NR can offer an even greater SNR improvement, regardless of the incident angle of the signal of interest. DNN-NR is already in widespread use in consumer applications, such as video conferencing, but until now this technology had not been implemented in commercially available hearing aids. Although research has shown striking performance benefits for hearing-impaired listeners using DNN-NR,(2) such studies have relied on a laptop or smartphone for signal processing. A chip with the processing power, unique architecture, and power efficiency needed to run DNN-NR in a hearing aid has previously not been available.(1)

Phonak Audéo Sphere Infinio (henceforth simply ‘Sphere’) features two processors working in parallel: the ERA chip to provide for traditional signal processing and other functions of the hearing aid, and the DeepSonic chip to run Spheric Speech Clarity, a DNN-NR system operating on a device that can simultaneously enhance speech and suppress noise. DeepSonic has overcome the hardware limitations (e.g., power consumption, computational complexity) that have, until now, prevented DNN-NR from being applied in a hearing aid. Spheric Speech Clarity is activated automatically as a part of the Spheric Speech in Loud Noise program available exclusively on Phonak Audéo Sphere. Sphere has been tested extensively via technical measurement and clinical studies, the results of which reveal a substantial SNR benefit, leading to significantly improved speech intelligibility and reduced listening effort.

Benchtop Measurements

Prior to using Sphere in a clinical study, a series of technical measurements were conducted at Sonova’s headquarters in Stäfa, Switzerland. During these measurements, the hearing aids were fit to a KEMAR mannequin with fully occluded acoustic coupling. KEMAR was positioned in the center of a 1.4-meter radius circular array of 12 equally spaced Genelec 8020 loudspeakers. Three realistic acoustic environments from the ARTE database were selected as background noise (ranging in intensity from 71.7 to 78.2 dB SPL)(3), and the ISTS signal served as the target speech signal.(4) The phase inversion method was used to derive SNR estimates from audio recordings made under various hearing aid conditions with KEMAR.(5)
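The phase inversion method cited above can be illustrated with a short sketch. The idea is to record the same scene twice, once with speech plus noise and once with the polarity of the noise inverted, so that the sum of the two recordings isolates the processed speech and the difference isolates the processed noise. The code below is a minimal illustration under stated assumptions, not the measurement pipeline used in the study: the signals are synthetic, the "hearing aid" is a hypothetical linear stand-in (`toy_device`) that passes speech and attenuates noise, and SII weighting and better-ear selection are omitted.

```python
import math
import random

def mean_power(signal):
    """Mean-square value of a sequence of samples."""
    return sum(v * v for v in signal) / len(signal)

def phase_inversion_snr_db(rec_plus, rec_minus):
    """Output SNR (dB) from two recordings of the same scene.

    rec_plus:  device output for speech + noise
    rec_minus: device output for speech - noise (noise polarity inverted)
    Assumes the device processes both presentations identically.
    """
    speech_est = [0.5 * (p + m) for p, m in zip(rec_plus, rec_minus)]  # noise cancels
    noise_est = [0.5 * (p - m) for p, m in zip(rec_plus, rec_minus)]   # speech cancels
    return 10.0 * math.log10(mean_power(speech_est) / mean_power(noise_est))

# Synthetic stand-ins (assumptions): a tone for speech, Gaussian noise.
random.seed(0)
fs = 16000
speech = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]

def toy_device(s, n, noise_gain=0.5):
    """Hypothetical linear 'processing': passes speech, halves the noise."""
    return [a + noise_gain * b for a, b in zip(s, n)]

inverted_noise = [-v for v in noise]
rec_plus = toy_device(speech, noise)
rec_minus = toy_device(speech, inverted_noise)

snr_in_db = 10.0 * math.log10(mean_power(speech) / mean_power(noise))
snr_out_db = phase_inversion_snr_db(rec_plus, rec_minus)
improvement_db = snr_out_db - snr_in_db
print(f"SNR improvement: {improvement_db:.2f} dB")
```

Because the toy device halves the noise amplitude, the recovered improvement equals 20·log10(2), or roughly 6 dB. With a real, nonlinear hearing aid the separation is only approximate, which is one reason results are typically averaged over several scenes.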

In one set of measurements, the SNR improvement for speech from 0 degrees azimuth was compared between i) the Spheric Speech in Loud Noise program of Audéo Sphere Infinio with Spheric Speech Clarity and a static beamformer at default strengths, ii) Audéo Infinio R with StereoZoom 2.0 (a binaural beamformer) and conventional noise canceling algorithms at default strengths, and iii) three commercially available, current generation premium hearing aids at default strengths. All SNR improvements calculated for this comparison were relative to a baseline of the open KEMAR ear.

Figure 1: SII-weighted signal-to-noise ratio (SNR) improvements with respect to unaided, averaged across three realistic scenes. Results are presented for a fully occluded coupling and for the better ear. The speech signal was always presented from 0° azimuth. All devices were tested in manual programs at their respective default settings.

The Spheric Speech in Loud Noise program with Spheric Speech Clarity achieved an SNR benefit of 5.9 dB and, according to the results, it outpaced premium hearing aids of other brands by 2.6 to 3.7 dB (Figure 1). In these measurements, the open ear was selected as the reference condition to ensure all brands had an equivalent baseline from which to compare; however, Spheric Speech in Loud Noise can achieve up to a 10 dB SNR improvement relative to an Audéo Infinio hearing aid with an omnidirectional microphone and no noise reduction of any kind.


Figure 2: Spheric Speech Clarity benefit from different speech incident angles: SII-weighted signal-to-noise ratio (SNR) improvements across three realistic scenes for a speech signal presented from 0°, 60°, 90°, 120° and 180° azimuth. Results are presented for a fully occluded coupling and for the better ear. For 0° and 60°, the reference condition for Spheric Speech Clarity deactivated versus activated at maximum strength was a fixed directional microphone mode. For 90° to 180°, the reference condition for Spheric Speech Clarity deactivated versus activated at maximum strength was an omnidirectional microphone mode. The gray box shows the ±1 dB range from the average performance across all directions.

In another set of measurements, the SNR improvement offered by the Spheric Speech Clarity feature was tested for speech presented from five incident angles: 0, 60, 90, 120, and 180 degrees azimuth. In these measurements, SNR improvements for 0 and 60 degrees were calculated relative to the aided condition with a fixed directional microphone and Spheric Speech Clarity deactivated. For the other incident angles, the reference condition was an omnidirectional microphone mode with Spheric Speech Clarity deactivated. In this set of measurements, SNR improvements (ranging from 5.8 to 6.9 dB SNR) were insensitive to the incident angle of the speech signal (Figure 2).

Additional details about these measurements may be found in Raufer, et al.(6)

Clinical Studies

In order to better understand how these SNR improvements would translate to patient benefits, Sphere was tested in a two-arm clinical study conducted at the Phonak Audiology Research Center in the western suburbs of Chicago, IL, from May to July 2024. The study was designed to test the objective and subjective benefit of Sphere for participants with hearing impairment. The first arm was conducted entirely within the laboratory, and the second arm was conducted as a field trial.

Laboratory Testing

A total of 27 participants with moderate to severe sensorineural hearing loss participated in the first arm of the study; all were current hearing aid users. The average age of the participants was 75.1 years (SD = 8 years), and 11 of the participants identified as female. Participants were fit with Sphere and two other current generation receiver-in-canal hearing aids (Brands A and B from the measures in Figure 1), using earmolds with a 1 mm vent. One of the competitor devices promotes the use of a DNN algorithm to optimize hearing in noise, while the other competitor device claims an advanced beamforming system to achieve this goal.


Figure 3: A visual schematic of the test environment used during the speech intelligibility tasks.

Because Spheric Speech Clarity can provide an equivalent SNR improvement for speech arriving from any angle, a speech-in-noise test was designed specifically to assess the participants’ understanding of speech presented from multiple incident angles. Sentences from the coordinate response measure (CRM) corpus(7) were presented one at a time from one of four Genelec 8020 loudspeakers, positioned at 60, 120, 240, and 300 degrees relative to the participant. The location was randomized for each sentence (Figure 3). All sentences in the CRM corpus follow the form “Ready [call sign], go to [color] [number] now.” Participants responded by selecting the color and number via a touchscreen, which was positioned directly in front of them at just below eye level. A response was considered correct when both the color and number were correctly identified.

The noise used was a speech-shaped noise spectrally matched to the average spectrum of sentences used during testing. Uncorrelated noise was presented continuously from all loudspeakers, with an additional loudspeaker placed at 180 degrees presenting noise only. Speech tokens were presented at 69 dB SPL, and noise was presented at an overall level of 72 dB SPL (for a constant -3 dB SNR).
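The level arithmetic in this setup can be checked with a few lines of code. The sketch below computes the SNR from the stated speech and noise levels and, as an illustration only, the level each noise loudspeaker would need if the 72 dB SPL overall level were split equally across five uncorrelated sources (the four speech loudspeakers plus the one at 180 degrees). The equal split is an assumption; the article reports only the overall noise level.

```python
import math

def snr_db(speech_db_spl, noise_db_spl):
    """SNR in dB is the difference between speech and noise levels."""
    return speech_db_spl - noise_db_spl

def per_source_level_db(total_db_spl, n_sources):
    """Level of each of n equal, uncorrelated sources whose powers sum to the total."""
    return total_db_spl - 10.0 * math.log10(n_sources)

def sum_uncorrelated_db(levels_db):
    """Overall level of uncorrelated sources: add powers, not decibels."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

test_snr = snr_db(69.0, 72.0)
# Assumption: five noise loudspeakers driven at equal level.
per_speaker = per_source_level_db(72.0, 5)
overall = sum_uncorrelated_db([per_speaker] * 5)

print(f"SNR: {test_snr:+.1f} dB")
print(f"Per-loudspeaker noise level: {per_speaker:.1f} dB SPL")
print(f"Overall noise level: {overall:.1f} dB SPL")
```

Under the equal-split assumption, each loudspeaker would play at about 65 dB SPL, roughly 7 dB below the overall level, because uncorrelated sources add in power rather than in amplitude.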


Figure 4: Percent correct results for the speech-in-noise task are displayed as a function of hearing aid processing condition. Individual participants are represented by points connected by a line. When using Spheric Speech Clarity, participants performed significantly better than when completing the task without Spheric Speech Clarity.

Two experiments were conducted using the task as described above. In the first experiment, Sphere was tested using a Spheric Speech in Loud Noise program with Spheric Speech Clarity at default strength and a Real-Ear Sound (RES) microphone setting, and again with a Speech in Noise program with Dynamic Noise Cancellation (DNC) turned off and using RES. These conditions were designed to isolate the effect of Spheric Speech Clarity from other signal processing features that improve the SNR of the signal. Results were analyzed via a generalized linear mixed-effects model with a logit link function, using fixed effects of hearing aid processing condition and sentence location and a random effect for participants.(8) The model indicated that the odds of a correct response were about two times higher when participants used Spheric Speech Clarity than when they did not (odds ratio: 2.01, asymptotic 95% CI: [1.6, 2.52], p < 0.0001; Figure 4).

The second experiment compared the Spheric Speech in Loud Noise program with default settings to the recommended speech-in-noise programs for two other hearing aid brands. Results were analyzed via a generalized linear mixed-effects model following the same general structure as before, this time using fixed effects of hearing aid brand and sentence location. Post-hoc testing indicated that the odds of a correct response with Spheric Speech in Loud Noise were more than three times those with Brand A (odds ratio: 3.41, asymptotic 95% CI: [2.54, 4.57], p < 0.0001; Figure 5) and nearly three times those with Brand B (odds ratio: 2.86, asymptotic 95% CI: [2.15, 3.79], p < 0.0001; Figure 5).
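For readers less familiar with logit-link models, the following sketch shows how the reported odds ratios and their asymptotic confidence intervals relate to one another, and how an odds ratio translates into a change in percent correct. It assumes the intervals are Wald intervals, symmetric on the log-odds scale, which is the usual meaning of "asymptotic" here but is not spelled out in the article. The 50% baseline used in the last step is purely hypothetical and is not a result from the study.

```python
import math

# Odds ratios and asymptotic 95% CIs as reported in the text.
REPORTED = {
    "Spheric Speech Clarity on vs. off": (2.01, 1.60, 2.52),
    "Sphere vs. Brand A": (3.41, 2.54, 4.57),
    "Sphere vs. Brand B": (2.86, 2.15, 3.79),
}

def wald_summary(ci_low, ci_high, z=1.96):
    """Point estimate and standard error implied by a CI that is
    symmetric on the log-odds scale (assumed Wald interval)."""
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    implied_or = math.exp(0.5 * (log_low + log_high))
    se_log_or = (log_high - log_low) / (2.0 * z)
    return implied_or, se_log_or

def apply_odds_ratio(p_baseline, odds_ratio):
    """Probability obtained by multiplying the baseline odds by an odds ratio."""
    odds = odds_ratio * p_baseline / (1.0 - p_baseline)
    return odds / (1.0 + odds)

for label, (reported_or, low, high) in REPORTED.items():
    implied_or, se = wald_summary(low, high)
    print(f"{label}: reported OR {reported_or:.2f}, "
          f"implied OR {implied_or:.2f}, SE(log OR) {se:.3f}")

# Hypothetical baseline of 50% correct, for illustration only.
p_with = apply_odds_ratio(0.50, 2.01)
print(f"50% correct at baseline -> {100 * p_with:.1f}% with an odds ratio of 2.01")
```

The geometric midpoint of each interval reproduces the reported odds ratio to within rounding. The last line also illustrates that doubling the odds is not the same as doubling the percent-correct score: from a hypothetical 50% baseline, an odds ratio of about 2 corresponds to roughly 67% correct.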


Figure 5: Percent correct results for the speech-in-noise task are displayed as a function of hearing aid brand. Individual participants are represented by points connected by a line. When using Audéo Sphere Infinio, participants performed significantly better than when completing the task with either of the two other brands.

Listening effort was assessed using the adaptive categorical listening effort scaling (ACALES) methodology. Participants were seated in the same speaker array as used in the speech-in-noise experiments, but they were turned 180 degrees. Speech was presented from the speaker now at 0 degrees, and noise was presented from all other speakers simultaneously (Figure 6). Babble noise was presented at a fixed level of 72 dB SPL, and the level of the speech adapted as the participant completed the task. Sphere devices were programmed as in the first speech-in-noise experiment (cf. Figure 4) to isolate the effect of Spheric Speech Clarity.

Results were analyzed via a linear mixed-effects model with a fixed effect of hearing aid processing and a random effect of participants. Model results indicate that, with Spheric Speech Clarity active, participants could withstand a 2.9 dB poorer SNR without a corresponding increase in subjective listening effort (mean difference: -2.86 dB, 98% CI: [-3.32, -2.39], p < 0.0001; Figure 7).


Figure 7: Listening effort results obtained from the ACALES method are displayed. The ACALES utilizes a 14-point rating scale, but to improve readability, the two ends and the midpoint of the scale were selected for graphical display. Participants in the study could withstand a significantly poorer SNR within a given subjective effort category when using Spheric Speech Clarity than without it. Each box is labeled with the median SNR for that rating category and test condition.

Additional details of this study are available in Wright, et al.(9)

Field Trial

Twenty-six participants who took part in the laboratory arm of the study chose to continue with the field trial. During the field trial, participants were fit with Sphere devices using Target-recommended gain, feature settings, and acoustic coupling. Participants wore the devices during daily life for approximately four weeks.

During the four-week field trial, participants met with a researcher at a local café on two occasions. During the café meetings, the researcher sat directly across from the participant and read a standardized passage. The participants were asked to provide several subjective ratings following the reading.


Figure 8: Participant preference for listening condition in a noisy café setting is presented as the number of participants for each condition tested. Participants preferred Spheric Speech Clarity over StereoZoom 2.0 by a wide margin.

During the first of the two café meetings, the participants compared two manual programs on the same device: i) the Spheric Speech in Loud Noise program with default settings, and ii) the Speech in Loud Noise program with StereoZoom 2.0 and DNC at default strength. Programs were controlled by the researcher, and the test order was randomized across participants; participants were blinded to the test condition. Ratings gathered during this task revealed that Spheric Speech in Loud Noise was preferred by most participants: 77% preferred Spheric Speech in Loud Noise, 11.5% preferred Speech in Loud Noise with StereoZoom 2.0, and 11.5% had no preference (Figure 8).


Figure 9: Participant preference for listening condition in a noisy café setting is presented as the number of participants for each brand tested. More participants preferred Phonak with Spheric Speech Clarity than the other two brands combined, with 3 participants unable to determine a preference. (Note that two participants are missing from these data due to COVID-19 diagnoses during the study.)

During the second of the two café meetings, participants instead compared Sphere to the same two commercially available hearing aids that were used in the laboratory arm. The test order was randomized across participants, and participants were blinded to the device being used. Participants were asked to choose their preferred device; more participants chose Sphere than the other two devices combined, while three participants had no preference (50% chose Sphere, 17% Brand A, 21% Brand B, and 13% had no preference; Figure 9).
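The reported preference percentages can be reconciled with the sample sizes using a small consistency check. The sketch below searches for whole-number participant counts that round to each reported percentage. It assumes 26 respondents at the first café meeting and 24 at the second (26 minus the two participants noted as missing in Figure 9); apart from the three "no preference" responses at the second meeting, the counts it prints are inferred from the percentages and are not stated in the article.

```python
def counts_matching(percent, n, tol):
    """Whole-number counts out of n whose percentage lies within tol of `percent`."""
    return [c for c in range(n + 1) if abs(100.0 * c / n - percent) <= tol]

# (reported percentage, rounding tolerance); sample sizes are assumptions.
first_meeting = {
    "Spheric Speech in Loud Noise": (77.0, 0.5),
    "Speech in Loud Noise with StereoZoom 2.0": (11.5, 0.05),
    "No preference": (11.5, 0.05),
}
second_meeting = {
    "Sphere": (50.0, 0.5),
    "Brand A": (17.0, 0.5),
    "Brand B": (21.0, 0.5),
    "No preference": (13.0, 0.5),
}

def infer_counts(reported, n):
    """Map each option to the list of counts consistent with its percentage."""
    return {label: counts_matching(pct, n, tol) for label, (pct, tol) in reported.items()}

first_counts = infer_counts(first_meeting, 26)
second_counts = infer_counts(second_meeting, 24)

for title, counts in (("First meeting (n = 26)", first_counts),
                      ("Second meeting (n = 24)", second_counts)):
    print(title)
    for label, candidates in counts.items():
        print(f"  {label}: {candidates}")
```

Under these assumptions each percentage maps to exactly one count, and the counts add up to the assumed sample size in both cases. The second-meeting percentages sum to 101% only because each value was rounded individually.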

To learn more about this study, see Miller, et al.(10)

Conclusion

Phonak Audéo Sphere applies a large-scale DNN to remove noise and improve listening in noise for hearing aid users. The result is an SNR improvement that surpasses what is available from three other commercially available hearing aids, and this improvement is not dependent upon incident angle of the signal of interest. 

Clinical testing shows that this SNR benefit leads to improved speech understanding in noise in a complex listening task and reduced listening effort. Furthermore, Sphere was selected as the preferred option by participants when listening in a noisy café. Both the technical and clinical testing of Sphere confirm that the breakthrough DNN-denoising feature, Spheric Speech Clarity, offers a level of speech-in-noise performance that is not possible with conventional directional microphones and digital noise reduction. 

Hearing aid users wearing Phonak Audéo Sphere should be able to enjoy improved speech understanding and reduced listening effort in loud noise environments regardless of the location of the target speech. 

Original citation for this article: Keller M, Wright A, Raufer S, Kühnel V, Latzel M, Seitz-Paquette K, Miller A, Hobi S. Leveraging Deep Neural Networks in Hearing Aids. Hearing Review. 2024;31(9):08-15.

About the authors: Matthias Keller, PhD, has been working as a scientist in audiology research for Sonova AG since 2019. Ashley Wright, AuD, is a senior research audiologist at the Phonak Audiology Research Center (PARC) in Aurora, IL. Stefan Raufer, PhD, is a senior research engineer at Sonova. Volker Kühnel, PhD, works at Phonak/Sonova in product development on audiological design. Matthias Latzel, PhD, is a senior expert in clinical studies for Sonova AG. Kevin Seitz-Paquette, AuD, is the director of the Phonak Audiology Research Center (PARC), located in Aurora, IL. Anne Miller, AuD, is a research audiologist with Sonova. Shin-Shin Hobi is senior product manager for audiological performance at Phonak HQ.

References:

  1. Hasemann H and Krylova A. Artificial intelligence in hearing aid technology. Phonak Insight. 2024.
  2. Diehl PU, Singer Y, Zilly H, et al. Restoring speech intelligibility for hearing aid users with deep learning. Sci Rep. 2023;13(1):2719. Published 2023 Feb 15. doi:10.1038/s41598-023-29871-8
  3. Weisser A, Buchholz J, Oreinos C, Badajoz-Davila J, Galloway J, Beechey T, and Keidser G. The ambisonics recordings of typical environments (ARTE) database. Acta Acustica united with Acustica. 2019;105.
  4. Holube I, Fredelake S, Vlaming M, Kollmeier B. Development and analysis of an International Speech Test Signal (ISTS). Int J Audiol. 2010;49(12):891-903. doi:10.3109/14992027.2010.506889 
  5. Hagerman B and Olofsson A. A method to measure the effect of noise reduction algorithms using simultaneous speech and noise. Acta Acustica united with Acustica. 2004;90(2).
  6. Raufer S, Kohlhauer P, Uhlemayr F, Kühnel V, Preuss M, and Hobi S. Spheric Speech Clarity proven to outperform key competitors for clear speech in noise. Phonak Field Study News. 2024.
  7. Bolia RS, Nelson WT, Ericson MA, Simpson BD. A speech corpus for multitalker communications research. J Acoust Soc Am. 2000;107(2):1065-1066. doi:10.1121/1.428288
  8. Bates D. Computational methods for mixed models. lme4 Package Vignette. 2024. Retrieved from https://cran.r-project.org/web/packages/lme4/vignettes/Theory.pdf
  9. Wright A, Keller M, Kuehnel V, Latzel M, and Seitz-Paquette K. Spheric Speech Clarity applies DNN signal processing to significantly improve speech understanding from any direction and reduce the listening effort. Phonak Field Study News (in preparation). 2024.
  10. Miller A, Wright A, Keller M, Kuehnel V, Latzel M, and Seitz-Paquette K. Phonak Audéo Sphere Infinio is preferred by patients during real world use. Phonak Field Study News (in preparation). 2024.