EP3516653B1 - Apparatus and method for generating noise estimates - Google Patents

Apparatus and method for generating noise estimates

Info

Publication number
EP3516653B1
EP3516653B1 EP16784821.7A EP16784821A
Authority
EP
European Patent Office
Prior art keywords
noise
frequency
spectral
cut
audio signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP16784821.7A
Other languages
German (de)
French (fr)
Other versions
EP3516653A1 (en)
Inventor
Wenyu Jin
Wei Xiao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of EP3516653A1
Application granted
Publication of EP3516653B1
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain


Description

  • This invention relates to an apparatus and a method for generating noise estimates.
  • Voice telecommunication is an increasingly essential part of daily life. Noise is a critical issue for voice telecommunications and is inevitable in the real world. Noise reduction (NR) technologies can be applied to enhance the intelligibility of voice communications. The majority of existing NR methods are optimized for near-end speech enhancement. These methods work well, for example, when a mobile phone is used in "hand-held" mode. Hand-held scenarios are generally easy to handle, due to high signal-to-noise ratios (SNR). NR methods are vulnerable to "hands-free" scenarios, however. These often involve low SNRs due to distant sound pickup. Complex noise variations in particular can undermine system performance in hands-free mode. Reduction of this "non-stationary" noise is difficult to achieve.
  • Noise reduction methods based on single channel noise estimation can usually only deal with stationary noise scenarios and are vulnerable to non-stationary noise and interferers. Better differentiation between speech and noise can be achieved using multiple microphones. Using multiple microphones also facilitates accurate estimation of complex noise conditions and can lead to effective non-stationary noise suppression.
  • Examples of existing techniques that explore the possibility of noise estimation using multiple microphone arrays include techniques described in: "A microphone array with adaptive post-filtering for noise reduction in reverberant rooms" by R. Zelinski (Proc. ICASSP-88, vol. 5, 1988, pp. 2578-2581) and "Microphone array post-filter based on noise field coherence" by McCowan et al. (IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, 2003, pp. 709-716). These techniques assume the noise is either spatially white (incoherent) or fully diffuse and cannot deal with time-varying noise and interference sources. They are also ineffective at low frequencies when the sound source is close to the microphone. Speech and noise signals show similar coherence properties under those conditions, meaning that it is not possible to determine one from the other on the basis of coherence alone.
  • One technique that recognises speech and noise cannot be distinguished from each other based on coherence alone at low frequencies is described in "Dual microphone noise PSD estimation for mobile phones in hands-free position exploiting the coherence and speech presence probability" by Nelke et al. (Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference). This paper proposes a solution in which the noise power is estimated from a single microphone at low frequencies and from the coherence between signals from multiple microphones at higher frequencies. The final noise estimate combines the low and high frequency estimates at a fixed frequency threshold. This leads to vulnerability: the coherence models for speech and noise inevitably overlap in complex low-SNR scenarios, and the fixed frequency threshold is not always effective at separating the regions of overlap from those of non-overlap. It also leads to high complexity for real-time implementations, as the noise estimation is based on a multi-channel adaptive coherence model that requires adaptation of both a speech coherence model and a noise coherence model.
  • US 2008/0159559 A1 describes a post-filter for a microphone array which is based on a transition frequency determined in accordance with a distance between microphones. A wind noise reduction device is described in US 2008/0317261 A1 . US 2014/0161271 A1 concerns a noise eliminating device and US 2016/0078856 A1 also addresses eliminating noise.
  • It is an object of the invention to provide concepts for generating more accurate noise estimates.
  • The foregoing and other objects are achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
  • The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:
    • Figure 1 shows a noise estimator according to an embodiment of the present invention;
    • Figure 2 shows an example of a process for estimating noise;
    • Figure 3 shows a more detailed example of a noise estimator according to an embodiment of the invention;
    • Figure 4 is a flowchart showing a more detailed process for estimating noise; and
    • Figure 5 shows simulation results comparing a fixed cut-off frequency procedure with an adaptive cut-off frequency procedure in accordance with an embodiment of the present invention.
  • An example of a noise estimator is shown in Figure 1. An overview of an operation of the noise estimator is shown in Figure 2. The noise estimator 100 comprises an estimator 101 and an adaptation unit 102. The estimator is configured to receive an audio signal that is detected by microphone 103 (step S201). In some implementations, the estimator is configured to receive audio signals that are detected by multiple microphones 104. The microphones may be part of the same device as the noise estimator. That device could be, for example, a mobile phone, smart phone, landline telephone, tablet, laptop, teleconferencing equipment or any generic user equipment, particularly user equipment that is commonly used to capture speech signals.
  • The audio signal represents sounds that have been captured by a microphone. An audio signal will often be formed from a component that is wanted (which will usually be speech) and a component that is not wanted (which will usually be noise). Estimating the unwanted component means that it can be removed from the audio signal. Each microphone will capture its own version of sounds in the surrounding environment, and those versions will tend to differ from each other depending on differences between the microphones themselves and on the respective positions of the microphones relative to the sound sources. If the sounds in the environment include speech and noise, each microphone will typically capture an audio signal that is representative of both speech and noise. Similarly, if the sounds in the environment just include noise (e.g. during pauses in speech), each microphone will capture an audio signal that represents just that noise. Sounds in the surrounding environment will typically be reflected differently in each individual audio signal. In some circumstances, these differences can be exploited to estimate the noise signal.
  • The estimator (101) is configured to generate an overall estimate of noise in the audio signal (steps 202 to 204). The estimated noise can then be removed from the audio signal by another part of the device. The estimator is configured to generate the estimate based on one or more of the audio signals captured by the microphones. (In some implementations, the audio signals that are captured by the microphones are pre-processed before being input into the estimator. Such pre-processed signals are also covered by the general term "audio signals" used herein.) Each audio signal can be considered as being formed from a series of complex sinusoidal functions. Each of those sinusoidal functions is a spectral component of the audio signal. Typically, each spectral component is associated with a particular frequency, phase and amplitude. The audio signal can be disassembled into its respective spectral components by a Fourier analysis.
  • The estimator 101 aims to form an overall noise estimate by generating a spectral noise estimate for each spectral component in the audio signal. In Figure 1, the estimator comprises a low-frequency estimator 105 and a high frequency estimator 106. The low-frequency estimator 105 is configured to generate spectral noise estimates for the spectral components of the audio signal that are below a cut-off frequency. Those spectral noise estimates will form a low frequency section of the overall noise estimate. The low frequency estimator achieves this by applying a first estimation technique to the audio signal to generate spectral noise estimates that are associated with frequencies below a cut-off frequency (step S202). The high frequency estimator 106 is configured to generate spectral noise estimates for the spectral components of the audio signal that are above the cut-off frequency. Those spectral estimates will form a higher frequency section of the overall noise estimate. The high frequency estimator achieves this by applying a second estimation technique to the audio signal to generate spectral noise estimates that are associated with frequencies above the cut-off frequency (step S203).
  • The estimator also comprises a combine module 107 that is configured to form the overall noise estimate by combining the spectral noise estimates that are output by the low and high frequency estimators. The combine module forms the overall noise estimate to have spectral noise estimates that are output by the low frequency estimator below the cut-off frequency and spectral noise estimates that are output by the high frequency estimator above the cut-off frequency (step S204). In some embodiments, the low and high frequency estimators will both be configured to generate spectral noise estimates across the whole frequency range of the audio signal. The combine module will then just select the appropriate spectral noise estimate to use for each frequency bin in the overall noise estimate, with that selection depending on the cut-off frequency.
  • The estimator 101 also comprises an adaptation unit 102. The adaptation unit is configured to adjust the cut-off frequency. The adaptation unit makes this adjustment to account for changes in the respective coherence properties of the speech and noise signals that are reflected in the audio signal (step S205). The coherence properties of the noise signal generally vary in dependence on frequency. At low frequencies, speech and noise tend to show similar degrees of coherence, whereas at higher frequencies noise is often incoherent while speech is coherent. Coherence properties can also be affected by the distance between a sound source and a microphone: noise and speech show particularly similar coherence properties at low frequencies when the microphone and the sound source are close together. The respective coherence properties displayed by the noise and speech signals will thus tend to vary with time, particularly in mobile and/or hands-free scenarios where one or more sound sources (such as someone talking) may move with respect to the microphone. One option is to track the coherence properties of both speech and noise. However, in practice, it is the noise coherence that particularly changes. Consequently, changes between the respective coherence properties of the speech and noise signals can be monitored by tracking the coherence properties of just the noise.
  • Adjusting the cut-off frequency so as to adapt to changes in the coherence properties of the noise signal that are represented in the audio signal may be advantageous because it enables the estimator to generate the overall noise estimate using techniques that work well for the particular coherence properties that are prevalent in the noise on either side of the cut-off frequency, and to alter that cut-off frequency to account for changes in those coherence properties with time. This is particularly useful for the complex noise scenarios that occur when user equipment is used in hands-free mode.
  • The structure shown in Figure 1 (and all the block apparatus diagrams included herein) is intended to correspond to a number of functional blocks. This is for illustrative purposes only. Figure 1 is not intended to define a strict division between different parts of hardware on a chip or between different programs, procedures or functions in software. In some embodiments, some or all of the signal processing techniques described herein are likely to be performed wholly or partly in hardware. This particularly applies to techniques incorporating repetitive arithmetic operations, such as Fourier transforms, auto- and cross-correlations, and pseudo-inverses. In some implementations, at least some of the functional blocks are likely to be implemented wholly or partly by a processor acting under software control. Any such software is suitably stored on a non-transitory machine readable storage medium. The processor could, for example, be a DSP of a mobile phone, smart phone, landline telephone, tablet, laptop, teleconferencing equipment or any generic user equipment with speech processing capability.
  • A more detailed example of a noise estimator is shown in Figure 3. The system is configured to receive multiple audio signals X1 to XM (301). Each of these audio signals represents a recording from a specific microphone. The number of microphones can thus be denoted M. Each channel is provided with a segmentation/windowing module 302. These modules are followed by transform units 303 configured to convert the windowed signals into the frequency domain.
  • The transform units 303 are configured to implement the Fast Fourier Transform (FFT) to derive the short-term Fourier transform (STFT) coefficients for each input channel. These coefficients represent spectral components of the input signal. The STFT is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time. The STFT may be computed by dividing the audio signal into short segments of equal length and then computing the Fourier transform separately on each short segment. The result is the Fourier spectrum for each short segment of the audio signal, giving the signal processor the changing frequency spectra of the audio signal as a function of time. Each spectral component thus has an amplitude and a time extension. The length of the FFT can be denoted N. N represents a number of frequency bins, with the STFT essentially decomposing the original audio signal into those frequency bins.
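  • The following Python sketch illustrates one way the segmentation/windowing and transform stages (302, 303) could be realised. It is an illustration only, not the patent's prescribed implementation: the 512-sample frame length, 50% hop and Hann window are assumptions.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """Per-channel STFT: window short segments of equal length and
    FFT each one. Returns shape (num_frames, frame_len // 2 + 1),
    i.e. one row of spectral components per frame."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([
        x[t * hop : t * hop + frame_len] * window
        for t in range(num_frames)
    ])
    # rfft keeps the N/2 + 1 non-redundant frequency bins
    return np.fft.rfft(frames, axis=-1)

# One STFT per microphone channel X_1 ... X_M:
# X = [stft_frames(channel) for channel in mic_signals]
```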
  • The outputs from the transform units 303 are input into the estimator, shown generally at 304. In the example of Figure 3, the low frequency estimator is implemented by "SPP Based NE" unit 305 (which will be referred to as SPP unit 305 hereafter). The low frequency estimator is configured to generate the spectral noise estimates below the cut-off frequency. The high frequency estimator is implemented by the "Noise Coherence/Covariance" modelling unit 306 and the "MMSE Optimal NE Solver" 307 (which will be respectively referred to as modelling unit 306 and optimiser 307 hereafter). The high frequency estimator is configured to generate the spectral noise estimates above the cut-off frequency.
  • The low frequency estimator 305 and the high frequency estimator 306, 307 process the outputs from the transform units using different noise estimation techniques. The low frequency estimator suitably uses a technique that is adapted to the respective coherence properties of the noise signal and the speech signal that are expected to predominate in the audio signal below the cut-off frequency. In most embodiments this means that the low frequency estimator will apply an estimation technique that is adapted to a scenario in which the coherence of both signals is high and similar to the coherence of the other. In the example of Figure 3 the low frequency estimator is configured to generate its spectral noise estimates based on a single microphone signal. The high frequency estimator will similarly apply an estimation technique that is adapted to a coherence of the noise signal and the speech signal that is expected to predominate in the audio signal above the cut-off frequency. The noise and speech signals are generally expected to show different coherence properties above the cut-off frequency, with the noise signal becoming less coherent than below the cut-off frequency. A more accurate noise estimate may be obtained by combining signals from multiple microphones under these conditions, so the high frequency estimator may be configured to receive audio signals from multiple microphones.
  • Suitably the noise estimates that are output by the low frequency estimator 305 and the high frequency estimator 306, 307 take the form of power spectral densities (PSDs). A PSD represents the noise as a series of coefficients. Each coefficient represents an estimated power of the noise in an audio signal for a respective frequency bin. The coefficient in each frequency bin can be considered a spectral noise estimate. The frequency bins suitably replicate the frequency bins into which the audio signals were decomposed by transform units 303. The outputs of the low frequency estimator and the high frequency estimator thus represent spectral noise estimates for each spectral component of the audio signal.
  • The two sets of coefficients are input into the "Estimate Selection" unit 308. This estimate selection unit combines the functionality of combine module 107 and adaptation unit 102 shown in Figure 1. The estimate selection unit is configured to choose between the coefficients that are output by the low frequency estimator and the high frequency estimator in dependence on frequency. To form parts of the overall noise estimate that are below the cut-off frequency, the adaptation unit chooses the coefficients output by SPP unit 305. To form parts of the overall noise estimate that are above the cut-off frequency, the estimate selection unit chooses the coefficients output by the combination of the modelling unit 306 and the optimiser 307. The estimate selection unit also monitors a coherence of the noise signal by means of the audio signal, and uses this to adapt the cut-off frequency.
  • The low frequency estimator may use any suitable estimation technique to generate spectral noise estimates that are below a cut-off frequency. One option would be an MMSE-based spectral noise power estimation technique. Another option is soft decision voice activity detection. This is the technique implemented by SPP unit 305, which is configured to implement a single-channel SPP-based method (where "SPP" stands for Speech Presence Probability). SPP maintains a quick noise tracking capability, results in less noise power overestimation and is computationally less expensive than other options.
  • SPP module 305 is configured to receive an audio signal from one microphone. For devices that have multiple microphones that are not necessarily the same (e.g. smartphones), the SPP unit 305 is preferably configured to receive the single channel that corresponds to the device's "primary" microphone.
  • At higher frequencies, noise estimation is suitably based on a multi-channel adaptive coherence model. Model adaptation unit 306 is configured to update a noise coherence model and a noise covariance model in dependence on signals input from multiple microphones. Optimiser 307 takes the outputs of the model adaptation unit and generates the optimum noise estimate for higher frequency sections of the overall noise estimate given those outputs.
  • An example of an estimation process that may be performed by the noise estimator shown in Figure 3 is shown in Figure 4 and described in detail below. In step S401 the incoming signals 301 are received from multiple microphones. In step S402, those signals are segmented/windowed (by segmentation/windowing units 302) and converted into the frequency domain (by transform units 303). The probability of speech presence in the current frame is then detected by SPP unit 305 (step S403) using the function:

    $$\rho_{\tau,\omega} = \left(1 + (1+\xi_{\mathrm{opt}})\exp\!\left(-\frac{|X_1(\tau,\omega)|^2}{\hat{\Phi}_{N,\mathrm{SPP}}(\tau-1,\omega)}\cdot\frac{\xi_{\mathrm{opt}}}{\xi_{\mathrm{opt}}+1}\right)\right)^{-1}\tag{1}$$

    where $\rho_{\tau,\omega}$ represents the probability of speech presence in frame $\tau$ and frequency bin $\omega$, $X_1$ is the audio signal received by SPP unit 305, $\xi_{\mathrm{opt}}$ is a fixed, optimal a priori signal-to-noise ratio and $\hat{\Phi}_{N,\mathrm{SPP}}(\tau-1,\omega)$ is the noise estimate of the previous frame. $\rho_{\tau,\omega}$ takes a value between 0 and 1, where 1 indicates speech presence.
  • The SPP unit (305) also updates its estimated noise PSD as the weighted sum of the current noisy frame and the previous estimate (step S404):

    $$\hat{\Phi}_{N,\mathrm{SPP}}(\tau,\omega) = \rho_{\tau,\omega}\,\hat{\Phi}_{N,\mathrm{SPP}}(\tau-1,\omega) + (1-\rho_{\tau,\omega})\,|X_1(\tau,\omega)|^2\tag{2}$$
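  • A minimal sketch of equations (1) and (2) follows. The numeric value of ξopt is an assumption (a fixed a priori SNR of roughly 15 dB is a common choice in the SPP literature); the patent only states that ξopt is fixed.

```python
import numpy as np

XI_OPT = 10 ** (15 / 10)  # assumed fixed a priori SNR (~15 dB)

def spp_update(X1_frame, noise_psd_prev):
    """Equations (1) and (2): speech presence probability and recursive
    noise PSD update for one frame of the primary channel."""
    snr_post = np.abs(X1_frame) ** 2 / noise_psd_prev
    # Equation (1): posterior speech presence probability per bin
    rho = 1.0 / (1.0 + (1.0 + XI_OPT) *
                 np.exp(-snr_post * XI_OPT / (XI_OPT + 1.0)))
    # Equation (2): weighted sum of previous estimate and current frame
    noise_psd = rho * noise_psd_prev + (1.0 - rho) * np.abs(X1_frame) ** 2
    return rho, noise_psd
```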
  • The speech presence probability calculation also triggers the updating of the noise coherence and covariance models by modelling unit 306, since these models are preferably updated in the absence of speech.
  • The model adaptation unit (306) is configured to track two qualities of the noise comprised in the incoming microphone signals: its coherence and its covariance (step S405).
  • The model adaptation unit is configured to track noise coherence using a model that is based on a coherence function. The coherence function characterises a noise field by representing the coherence between two signals at points p and q. The magnitude of the output of the noise coherence function is always less than or equal to one (i.e. $|\Upsilon_{\omega,pq}| \le 1$). Essentially the output represents a normalized measure of the correlation that exists between signals at two discrete points in a noise field. The noise coherence function between the jth and kth microphones can be initialised with the diffuse noise model:

    $$\Upsilon_{\omega,pq} = \operatorname{sinc}\!\left(\frac{2\pi f\,d_{pq}}{c}\right)\tag{3}$$

    where f is the frequency, $d_{pq}$ is the distance between points p and q, c is the speed of sound, and $\omega$ is an index representing the relevant frequency bin. The relevant distance in this scenario is between the jth and kth microphones, so the subscripts j and k will be substituted for p and q hereafter.
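  • As a worked illustration of equation (3): NumPy's sinc is the normalised form sin(πx)/(πx), so the factor of π in the argument of equation (3) is absorbed by passing 2fd/c. The speed-of-sound value is an assumption.

```python
import numpy as np

def diffuse_init(freqs_hz, d_pq, c=343.0):
    """Equation (3): diffuse-field coherence between two points a
    distance d_pq apart. np.sinc(z) = sin(pi z)/(pi z), so passing
    z = 2 f d / c yields sin(2 pi f d / c) / (2 pi f d / c)."""
    return np.sinc(2.0 * freqs_hz * d_pq / c)
```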
  • The model adaptation unit (306) updates the coherence model as the microphone signals are received:

    $$\Upsilon_{pq}(\tau,\omega) = \alpha_{\gamma}\,\Upsilon_{pq}(\tau-1,\omega) + (1-\alpha_{\gamma})\,\frac{\Phi_{jk}(\tau,\omega)}{\sqrt{\Phi_{jj}(\tau,\omega)\,\Phi_{kk}(\tau,\omega)}}, \quad \text{when } \rho(\tau,\omega) < 0.1\tag{4}$$

    where $\tau$ is the frame index, $\omega$ is the frequency bin, and $\Phi_{jj}(\tau,\omega)$, $\Phi_{kk}(\tau,\omega)$ and $\Phi_{jk}(\tau,\omega)$ are the recursively smoothed auto- and cross-correlated PSDs of the audio signals from the jth and kth microphones respectively. $\rho(\tau,\omega)$ is the a posteriori SPP index for the current frame and is provided to model adaptation unit 306 by SPP unit 305. $\rho(\tau,\omega)$ acts as the threshold for $\Upsilon_{pq}(\tau,\omega)$ to be updated: in practice it is preferable to only update $\Upsilon_{pq}(\tau,\omega)$ in periods where speech is absent. A suitable value for the smoothing factor $\alpha_{\gamma}$ might be 0.95.
  • The auto- and cross-correlated PSDs that are input into equation (4) can be calculated by recursive smoothing of the input signals:

    $$\Phi_{jk} = \alpha\,\Phi_{jk} + (1-\alpha)\,X_j X_k^{*}\tag{5}$$

    where $X_j \in \mathbb{C}^{\frac{N+2}{2}\times 1}$ and $X_k \in \mathbb{C}^{\frac{N+2}{2}\times 1}$ are the FFT coefficient vectors of the jth and kth channels for the current frame, the products being taken bin-wise, and $\alpha$ is a smoothing factor.
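  • The updates of equations (4) and (5) could be sketched as below. The PSD smoothing factor α = 0.9 is an assumption; αγ = 0.95 follows the suggestion above, and the ρ < 0.1 speech-absence condition follows equation (4).

```python
import numpy as np

ALPHA = 0.9         # assumed PSD smoothing factor
ALPHA_GAMMA = 0.95  # coherence smoothing factor suggested above

def smooth_psds(phi_jj, phi_kk, phi_jk, Xj, Xk, alpha=ALPHA):
    """Equation (5): recursive smoothing of auto- and cross-PSDs,
    bin-wise over the FFT coefficient vectors of channels j and k."""
    phi_jj = alpha * phi_jj + (1 - alpha) * np.abs(Xj) ** 2
    phi_kk = alpha * phi_kk + (1 - alpha) * np.abs(Xk) ** 2
    phi_jk = alpha * phi_jk + (1 - alpha) * Xj * np.conj(Xk)
    return phi_jj, phi_kk, phi_jk

def update_coherence(gamma_prev, phi_jj, phi_kk, phi_jk, rho):
    """Equation (4): update the coherence model only in bins where
    speech is judged absent (rho < 0.1)."""
    instantaneous = phi_jk / np.sqrt(phi_jj * phi_kk)
    gamma = ALPHA_GAMMA * gamma_prev + (1 - ALPHA_GAMMA) * instantaneous
    return np.where(rho < 0.1, gamma, gamma_prev)
```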
  • The model adaptation unit (306) is also configured to actively update a noise covariance matrix (also in step S405). For each narrow frequency band, the noise covariance matrix $R_{nn}$ (M×M) is recursively updated using:

    $$\hat{R}_{nn} = \alpha\,R_{nn} + (1-\alpha)\,x^{T}\,\mathrm{conj}(x), \quad \text{when } \rho < 0.1\tag{6}$$

    where x (1×M) represents the STFT coefficients of the input signals from all of the microphones in respect of frequency bin n.
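  • A corresponding sketch of equation (6), again with an assumed smoothing factor:

```python
import numpy as np

def update_covariance(Rnn_prev, x, rho_bin, alpha=0.9):
    """Equation (6): recursive M x M noise covariance update for one
    frequency bin; x is the 1 x M vector of STFT coefficients and the
    update runs only when speech is judged absent in that bin."""
    if rho_bin < 0.1:
        x = x.reshape(1, -1)
        # x^T conj(x) is the M x M outer product of the channel spectra
        return alpha * Rnn_prev + (1 - alpha) * (x.T @ np.conj(x))
    return Rnn_prev
```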
  • The model adaptation unit (306) is thus configured to establish the coherence and covariance models and update them as audio signals are received from the microphones.
  • Provided that the covariance model $R_{nn}$ and the adaptive coherence model $\Upsilon_{pq}$ are derived, they are linked as follows:

    $$R = \Phi\,\sigma\tag{7}$$

    where $R = \begin{bmatrix}\mathrm{diag}(R_{nn})\\ \mathrm{odiag}(R_{nn})\end{bmatrix}$ ($P^2$×1) and $\sigma = \begin{bmatrix}\sigma_c^2\\ \sigma_w^2\end{bmatrix}$, in which $\sigma_c^2$ represents the variance of the coherent noise and $\sigma_w^2$ is the variance of the incoherent noise. diag and odiag represent the diagonal and off-diagonal elements respectively, written in vector form:
    • diag($R_{nn}$) = [$R_{nn}$(1,1), ..., $R_{nn}$(P,P)]$^T$ and
    • odiag($R_{nn}$) = [$R_{nn}$(1,2), ..., $R_{nn}$(1,P), $R_{nn}$(2,1), ..., $R_{nn}$(P,P−1)]$^T$
    • $\Phi$ ($P^2$×2) is derived from the adaptive coherence models between the multiple pairs of microphones:

    $$\Phi = \begin{bmatrix}\mathrm{diag}(\Upsilon) & \mathbf{1}\\ \mathrm{odiag}(\Upsilon) & \mathbf{0}\end{bmatrix}\tag{8}$$

    where $\Upsilon = \begin{bmatrix}\Upsilon_{11} & \cdots & \Upsilon_{1P}\\ \vdots & \ddots & \vdots\\ \Upsilon_{P1} & \cdots & \Upsilon_{PP}\end{bmatrix}$ and $\mathbf{1} = 1_{P\times 1}$.
  • The updated models are used to generate a further noise estimate using an optimal least squares solution (step S406). The values of R and $\Phi$ are suitably transferred from the model adaptation unit (306) to the optimiser (307). In the example of Figure 3, the optimiser is configured to generate the noise estimate for higher frequencies by searching for an optimal least-squares solution to equation (7) in the minimum mean square error (MMSE) sense. This optimal solution in the MMSE sense is given by:

    $$\sigma = \mathrm{real}\left(\Phi^{\dagger} R\right)\tag{9}$$

    where $\Phi^{\dagger}$ is the Moore-Penrose pseudo-inverse of $\Phi$.
  • The overall noise PSD estimate is:

    $$\hat{\Phi}_c = \sigma_c^2 + \sigma_w^2\tag{10}$$
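  • Equations (7) to (10) amount to stacking the covariance entries into a vector, building $\Phi$ from the coherence matrix, and solving a two-unknown least-squares problem per frequency bin. A sketch, assuming the odiag ordering given above and that $\Upsilon_{pp} = 1$ on the diagonal:

```python
import numpy as np

def mmse_noise_estimate(Rnn, Gamma):
    """Equations (7)-(10): stack diag/odiag of the covariance into R,
    build Phi from the coherence matrix, and solve in the MMSE sense."""
    P = Rnn.shape[0]
    off = ~np.eye(P, dtype=bool)          # off-diagonal mask, row-major
    R = np.concatenate([np.diag(Rnn), Rnn[off]])        # P^2 x 1
    Phi = np.column_stack([
        np.concatenate([np.diag(Gamma), Gamma[off]]),   # coherent part
        np.concatenate([np.ones(P), np.zeros(P * P - P)]),  # incoherent
    ])                                                  # P^2 x 2
    # Equation (9): Moore-Penrose pseudo-inverse, real part retained
    sigma_c2, sigma_w2 = np.real(np.linalg.pinv(Phi) @ R)
    # Equation (10): overall high-frequency noise PSD for this bin
    return sigma_c2 + sigma_w2
```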
  • This approach decomposes the noise estimation problem into a series of linear equations in which every microphone signal is compared with every other microphone signal. This gives a better solution than current methods, which only compare signals from a single pair of microphones. One option for extending current methods to multiple microphones would be to pair those microphones off and estimate the overall noise by averaging the noise estimates generated by each pair of microphones. This approach is not preferred, however, as each microphone signal is then only compared with one other microphone signal. Comparing all of the microphone signals against each other results in a more accurate noise estimate.
  • The estimate selector 308 is configured to form the overall noise estimate. It receives the estimates generated by both the SPP unit (305) and the optimiser (307) ($\hat{\Phi}_S$ and $\hat{\Phi}_C$ respectively) and combines them to form the overall noise estimate (step S407).
  • Finally, the cut-off frequency is adaptively adjusted so that the two noise estimates can be combined into an overall noise estimate (also in step S407). In order to converge both the low and high frequency estimated coefficients into the final noise estimate more effectively, estimate selector 308 is configured to adaptively adjust the split frequency between the single microphone noise estimate and the multi microphone noise estimate based on the updating model in equation (4). The following scheme is one option for setting the cut-off frequency that controls the combination of the two estimated noise PSDs:

    $$\hat{\Phi}_n(\tau,\omega) = \begin{cases}\hat{\Phi}_s(\tau,\omega), & \omega < \min(f_{12},\ldots,f_{pq})\\ \hat{\Phi}_c(\tau,\omega), & \omega \ge \min(f_{12},\ldots,f_{pq})\end{cases}\tag{11}$$

    where $f_{pq}$ represents the frequency at which the magnitude squared value of the updated coherence function in equation (4) for the pqth microphone pair has some predetermined value. A suitable value might be, for example, 0.5. $f_{pq}$ varies according to the adaptive coherence model $\Upsilon$. The split frequency is selected to be the lowest frequency among the various microphone pairs at which the magnitude squared value of the coherence function has the predetermined value. This ensures that the appropriate noise estimate is selected for the speech and noise coherence properties experienced at different frequencies, meaning that problems caused by similarity and overlap between the speech and noise coherence properties can be consistently avoided for each channel.
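  • A sketch of the selection scheme of equation (11), assuming the predetermined magnitude-squared value of 0.5 mentioned above and that coherence decays with frequency, so the first bin at or below the threshold is taken for each pair:

```python
import numpy as np

def split_bin(Gamma_pairs, threshold=0.5):
    """Find the cut-off bin: for each microphone pair, the first bin
    where |coherence|^2 falls to the threshold; take the lowest."""
    cut_bins = []
    for gamma in Gamma_pairs:            # one coherence vector per pair
        below = np.nonzero(np.abs(gamma) ** 2 <= threshold)[0]
        cut_bins.append(below[0] if below.size else len(gamma))
    return min(cut_bins)

def combine_estimates(phi_s, phi_c, cut):
    """Equation (11): SPP estimate below the cut-off, MMSE estimate
    at and above it."""
    return np.concatenate([phi_s[:cut], phi_c[cut:]])
```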
  • Given the estimated noise PSD $\hat{\Phi}_n$, noise reduction can be achieved using any suitable noise reduction method, including Wiener filtering, spectral subtraction, etc.
  • The techniques described above have been tested via simulation using complex non-stationary subway scenario recordings and three microphones. The recording length was 130 seconds. The recording was processed using the adaptive cut-off frequency technique described above and a technique in which the cut-off frequency is fixed. The results are shown in Figure 5. The lower plot 502 illustrates the technique described herein, and it can clearly be seen that it has been more effective in addressing the non-stationary noise issues than the fixed cut-off frequency technique shown in upper plot 501. The processing was also more efficient: the processing time using the non-adaptive technique was 62 seconds, compared with 35 seconds for the adaptive technique.
  • It should be understood that where this explanation and the accompanying claims refer to the noise estimator doing something by performing certain steps or procedures or by implementing particular techniques, that does not preclude the noise estimator from performing other steps or procedures or implementing other techniques as part of the same process. In other words, where the noise estimator is described as doing something "by" certain specified means, the word "by" is meant in the sense of the noise estimator performing a process "comprising" the specified means rather than "consisting of" them.
  • The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.

Claims (10)

  1. A noise estimator for generating an overall noise estimate for an audio signal, wherein the noise estimator comprises microphones for capturing sounds, the sounds are represented by a plurality of audio signals comprising the audio signal, each of the plurality of audio signals is formed, at least partly, by a noise signal and comprises a plurality of spectral components, and wherein the overall noise estimate comprises, for each spectral component in the audio signal, a respective spectral noise estimate, the noise estimator comprising:
    an estimator (304) configured to generate the overall noise estimate by:
    applying a first estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are below a cut-off frequency;
    applying a different second estimation technique to the audio signal to generate, based on the plurality of audio signals, spectral noise estimates for spectral components of the audio signal that are above the cut-off frequency; and
    forming the overall noise estimate to comprise, for spectral components below the cut-off frequency, the spectral noise estimates generated using the first estimation technique and, for spectral components above the cut-off frequency, the spectral noise estimates generated using the second estimation technique; characterized by
    an adaptation unit (306) configured to adjust the cut-off frequency so as to account for changes in coherence of the noise signal that are reflected in the audio signal, wherein the adaptation unit is configured to select the cut-off frequency to be the lowest frequency at which one of the plurality of audio signals shows a predetermined degree of coherence with another of the plurality of audio signals.
  2. A noise estimator of claim 1, wherein the estimator (308) is configured to apply:
    as the first estimation technique, a technique that is adapted to a coherence of the noise signal that is expected to predominate in the audio signal below the cut-off frequency; and
    as the second estimation technique, a technique that is adapted to a coherence of the noise signal that is expected to predominate in the audio signal above the cut-off frequency.
  3. A noise estimator as claimed in claim 1 or 2, wherein the estimator is configured to generate the spectral noise estimates for spectral components above the cut-off frequency using an optimisation function that takes the plurality of audio signals as inputs.
  4. A noise estimator as claimed in any of claims 1 to 3, wherein the estimator is configured to generate the spectral noise estimates for spectral components above the cut-off frequency by comparing each of the plurality of audio signals with every other of the plurality of audio signals.
  5. A noise estimator as claimed in any of claims 1 to 4, wherein the estimator is configured to generate the spectral noise estimates for spectral components above the cut-off frequency in dependence on the coherence between each of the plurality of audio signals and every other of the plurality of audio signals.
  6. A noise estimator as claimed in any of claims 1 to 4, wherein the estimator is configured to generate the spectral noise estimates for spectral components above the cut-off frequency in dependence on a covariance between each of the plurality of audio signals and every other of the plurality of audio signals.
  7. A noise estimator as claimed in any preceding claim, wherein the estimator (304) is configured to generate the spectral noise estimates for spectral components below the cut-off frequency in dependence on a single audio signal that is representative of the noise signal.
  8. A noise estimator as claimed in any preceding claim, wherein the estimator (304) is configured to generate the spectral noise estimates for spectral components below the cut-off frequency and/or the spectral noise estimates for spectral components above the cut-off frequency by applying the respective first or second estimation technique only to parts of the audio signal that are determined not to comprise speech.
  9. A method for generating an overall noise estimate of an audio signal using a noise estimator which comprises microphones for capturing sounds, the sounds being represented by a plurality of audio signals comprising the audio signal, wherein each of the plurality of audio signals is formed, at least partly, by a noise signal and comprises a plurality of spectral components, and wherein the overall noise estimate comprises, for each spectral component in the audio signal, a respective spectral noise estimate, the method comprising:
    applying (S202) a first estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are below a cut-off frequency;
    applying (S203) a different second estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are above the cut-off frequency;
    forming (S204) the overall noise estimate to comprise, for spectral components below the cut-off frequency, the spectral noise estimates generated using the first estimation technique and, for spectral components above the cut-off frequency, the spectral noise estimates generated, based on the plurality of audio signals, using the second estimation technique; and
    adjusting (S205) the cut-off frequency so as to account for changes in coherence of the noise signal that are reflected in the audio signal, wherein the cut-off frequency is selected to be the lowest frequency at which one of the plurality of audio signals shows a predetermined degree of coherence with another of the plurality of audio signals.
  10. A non-transitory machine-readable storage medium having stored thereon processor-executable instructions implementing a method for generating an overall noise estimate of an audio signal using a noise estimator which comprises microphones for capturing sounds, the sounds being represented by a plurality of audio signals comprising the audio signal, wherein each of the plurality of audio signals is formed, at least partly, by a noise signal and comprises a plurality of spectral components, and wherein the overall noise estimate comprises, for each spectral component in the audio signal, a respective spectral noise estimate, the method comprising:
    applying (S202) a first estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are below a cut-off frequency;
    applying (S203) a different second estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are above the cut-off frequency;
    forming (S204) the overall noise estimate to comprise, for spectral components below the cut-off frequency, the spectral noise estimates generated using the first estimation technique and, for spectral components above the cut-off frequency, the spectral noise estimates generated, based on the plurality of audio signals, using the second estimation technique; and
    adjusting (S407) the cut-off frequency so as to account for changes in coherence of the noise signal that are reflected in the audio signal, wherein the cut-off frequency is selected to be the lowest frequency at which one of the plurality of audio signals shows a predetermined degree of coherence with another of the plurality of audio signals.
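
Editor's illustration (not part of the granted claims): a minimal sketch, in Python with NumPy/SciPy, of how the adaptation unit of claims 1 and 9 might select the cut-off frequency as the lowest frequency at which any pair of captured signals reaches a predetermined degree of coherence. The threshold gamma and the segment length nperseg are assumed values, not figures from the patent.

    import numpy as np
    from scipy.signal import coherence

    def select_cutoff(x, fs, gamma=0.7, nperseg=512):
        # x: (n_channels, n_samples) array of time-aligned microphone signals.
        # Returns the lowest frequency at which one signal shows coherence of
        # at least gamma (assumed threshold) with another signal, or None if
        # no channel pair ever reaches the threshold.
        n_ch = x.shape[0]
        best = None
        for i in range(n_ch):
            for j in range(i + 1, n_ch):
                f, cxy = coherence(x[i], x[j], fs=fs, nperseg=nperseg)
                hits = np.flatnonzero(cxy >= gamma)
                if hits.size:
                    best = f[hits[0]] if best is None else min(best, f[hits[0]])
        return best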
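On the same hedged basis, a sketch of the estimator's dual-band operation. The claims leave the two estimation techniques open; here minimum statistics stands in for the single-channel first technique applied to one representative signal (cf. claim 7), and a Zelinski-style incoherent-noise estimate built from the pairwise cross-spectra of claims 4 to 6 stands in for the multi-channel second technique. The constants alpha, win and nperseg are illustrative choices.

    import numpy as np
    from scipy.signal import stft

    def dual_band_noise_estimate(x, fs, f_cut, nperseg=512, alpha=0.9, win=50):
        # Overall noise estimate: one spectral noise estimate per STFT bin,
        # generated with the first technique below f_cut and the second
        # technique above it, then merged (claims 1 and 9).
        f, _, X = stft(x, fs=fs, nperseg=nperseg)   # X: (n_ch, n_freq, n_frames)
        psd = np.abs(X) ** 2

        # First technique (assumed: minimum statistics) on a single channel
        # that is representative of the noise signal: rolling minimum of a
        # recursively smoothed periodogram.
        smoothed = np.empty_like(psd[0])
        smoothed[:, 0] = psd[0, :, 0]
        for t in range(1, psd.shape[2]):
            smoothed[:, t] = alpha * smoothed[:, t - 1] + (1 - alpha) * psd[0, :, t]
        low = np.empty_like(smoothed)
        for t in range(smoothed.shape[1]):
            low[:, t] = smoothed[:, max(0, t - win + 1):t + 1].min(axis=1)

        # Second technique (assumed: Zelinski-style): compare each signal
        # with every other via pairwise cross-spectra. Speech is coherent
        # across the microphones while the noise is taken as incoherent, so
        # mean auto-PSD minus mean pairwise cross-PSD leaves the noise.
        n_ch = X.shape[0]
        auto = psd.mean(axis=0)
        cross = np.zeros_like(auto)
        for i in range(n_ch):
            for j in range(i + 1, n_ch):
                cross += np.real(X[i] * np.conj(X[j]))
        high = np.maximum(auto - cross / (n_ch * (n_ch - 1) // 2), 0.0)

        # Merge at the adaptively chosen cut-off frequency.
        return f, np.where(f[:, None] < f_cut, low, high)

A caller would typically chain the two sketches: f_cut = select_cutoff(x, fs), with a fall-back value if no pair reaches the threshold, followed by dual_band_noise_estimate(x, fs, f_cut). Claim 8's optional gating would further restrict both updates to frames that a voice-activity detector marks as noise-only.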
EP16784821.7A 2016-10-12 2016-10-12 Apparatus and method for generating noise estimates Active EP3516653B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2016/074462 WO2018068846A1 (en) 2016-10-12 2016-10-12 Apparatus and method for generating noise estimates

Publications (2)

Publication Number Publication Date
EP3516653A1 (en) 2019-07-31
EP3516653B1 (en) 2021-08-11

Family

ID=57184415

Family Applications (1)

Application Number Title Priority Date Filing Date
EP16784821.7A Active EP3516653B1 (en) 2016-10-12 2016-10-12 Apparatus and method for generating noise estimates

Country Status (2)

Country Link
EP (1) EP3516653B1 (en)
WO (1) WO2018068846A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007026827A1 (en) * 2005-09-02 2007-03-08 Japan Advanced Institute Of Science And Technology Post filter for microphone array
US8428275B2 (en) * 2007-06-22 2013-04-23 Sanyo Electric Co., Ltd. Wind noise reduction device
US9131307B2 (en) * 2012-12-11 2015-09-08 JVC Kenwood Corporation Noise eliminating device, noise eliminating method, and noise eliminating program
KR101630155B1 (en) * 2014-09-11 2016-06-15 현대자동차주식회사 An apparatus to eliminate a noise of sound, a method for eliminating a noise of a sound, a sound recognition apparatus using the same and a vehicle equipped with the sound recognition apparatus

Also Published As

Publication number Publication date
WO2018068846A1 (en) 2018-04-19
EP3516653A1 (en) 2019-07-31

Similar Documents

Publication Title
Thiergart et al. An informed parametric spatial filter based on instantaneous direction-of-arrival estimates
US9768829B2 (en) Methods for processing audio signals and circuit arrangements therefor
US8958572B1 (en) Adaptive noise cancellation for multi-microphone systems
US9185487B2 (en) System and method for providing noise suppression utilizing null processing noise subtraction
KR101726737B1 (en) Apparatus for separating multi-channel sound source and method the same
US20160066087A1 (en) Joint noise suppression and acoustic echo cancellation
US11631421B2 (en) Apparatuses and methods for enhanced speech recognition in variable environments
US20100217590A1 (en) Speaker localization system and method
US8682006B1 (en) Noise suppression based on null coherence
Braun et al. Dereverberation in noisy environments using reference signals and a maximum likelihood estimator
KR20130108063A (en) Multi-microphone robust noise suppression
Kodrasi et al. Joint dereverberation and noise reduction based on acoustic multi-channel equalization
US8761410B1 (en) Systems and methods for multi-channel dereverberation
WO2009130513A1 (en) Two microphone noise reduction system
CN110211602B (en) Intelligent voice enhanced communication method and device
US20200286501A1 (en) Apparatus and a method for signal enhancement
Nelke et al. Dual microphone noise PSD estimation for mobile phones in hands-free position exploiting the coherence and speech presence probability
Thiergart et al. An informed MMSE filter based on multiple instantaneous direction-of-arrival estimates
US11380312B1 (en) Residual echo suppression for keyword detection
US9875748B2 (en) Audio signal noise attenuation
EP3516653B1 (en) Apparatus and method for generating noise estimates
Nordholm et al. Assistive listening headsets for high noise environments: Protection and communication
Lee et al. Channel prediction-based noise reduction algorithm for dual-microphone mobile phones
Esch et al. Combined reduction of time varying harmonic and stationary noise using frequency warping
US11462231B1 (en) Spectral smoothing method for noise reduction

Legal Events

Code  Description

STAA  Status of the EP application/patent: UNKNOWN
STAA  Status of the EP application/patent: THE INTERNATIONAL PUBLICATION HAS BEEN MADE
PUAI  Public reference made under Article 153(3) EPC to a published international application that has entered the European phase (ORIGINAL CODE: 0009012)
STAA  Status of the EP application/patent: REQUEST FOR EXAMINATION WAS MADE
17P   Request for examination filed (effective date: 20190424)
AK    Designated contracting states (kind code of ref document: A1): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
AX    Request for extension of the European patent (extension states: BA ME)
DAV   Request for validation of the European patent (deleted)
DAX   Request for extension of the European patent (deleted)
GRAP  Despatch of communication of intention to grant a patent (ORIGINAL CODE: EPIDOSNIGR1)
STAA  Status of the EP application/patent: GRANT OF PATENT IS INTENDED
INTG  Intention to grant announced (effective date: 20210219)
GRAS  Grant fee paid (ORIGINAL CODE: EPIDOSNIGR3)
GRAA  (Expected) grant (ORIGINAL CODE: 0009210)
STAA  Status of the EP application/patent: THE PATENT HAS BEEN GRANTED
AK    Designated contracting states (kind code of ref document: B1): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR
REG   References to national codes: CH EP; DE R096 (ref document 602016062070); IE FG4D; AT REF (ref document 1420209, kind code T, effective 20210915); LT MG9D; NL MP (effective 20210811); AT MK05 (ref document 1420209, kind code T, effective 20210811); DE R097 (ref document 602016062070); CH PL; BE MM (effective 20211031)
PG25  Lapsed in a contracting state [announced via postgrant information from national office to EPO]. Ground (a) is failure to submit a translation of the description or to pay the fee within the prescribed time limit; ground (b) is non-payment of due fees:
      Ground (a), effective 20210811: AL, AT, CY, CZ, DK, EE, ES, FI, HR, IT, LT, LV, MC, MK, NL, PL, RO, RS, SE, SI, SK, SM
      Ground (a), effective 20211111: BG, NO
      Ground (a), effective 20211112: GR
      Ground (a), effective 20211213: PT
      Ground (a), effective 20161012, invalid ab initio: HU
      Ground (b), effective 20211012: IE, LU
      Ground (b), effective 20211031: BE, CH, FR, LI
PLBE  No opposition filed within time limit (ORIGINAL CODE: 0009261)
STAA  Status of the EP application/patent: NO OPPOSITION FILED WITHIN TIME LIMIT
26N   No opposition filed (effective date: 20220512)
P01   Opt-out of the competence of the Unified Patent Court (UPC) registered (effective date: 20230524)
PGFP  Annual fee paid to national office: GB (payment date 20230831, year of fee payment 8); DE (payment date 20230830, year of fee payment 8)