WO2018068846A1 - Apparatus and method for generating noise estimates - Google Patents


Publication number
WO2018068846A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise
frequency
spectral
cut
audio signal
Application number
PCT/EP2016/074462
Other languages
English (en)
Inventor
Wenyu Jin
Wei Xiao
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to EP16784821.7A
Priority to PCT/EP2016/074462
Publication of WO2018068846A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0232Processing in the frequency domain

Definitions

  • This invention relates to an apparatus and a method for generating noise estimates.
  • Voice telecommunication is an increasingly essential part of daily life. Noise is a critical issue for voice telecommunications and is inevitable in the real world. Noise reduction (NR) technologies can be applied to enhance the intelligibility of voice communications.
  • The majority of existing NR methods are optimized for near-end speech enhancement. These methods work well, for example, when a mobile phone is used in "hand-held" mode. Hand-held scenarios are generally easy to handle, due to high signal-to-noise ratios (SNR).
  • NR methods are vulnerable to "hands-free" scenarios, however. These often involve low SNRs due to distant sound pickup. Complex noise variations in particular can undermine system performance in hands-free mode. Reduction of this "non-stationary" noise is difficult to achieve.
  • Noise reduction methods based on single channel noise estimation can usually only deal with stationary noise scenarios and are vulnerable to non-stationary noise and interferers. Better differentiation between speech and noise can be achieved using multiple microphones. Using multiple microphones also facilitates accurate estimation of complex noise conditions and can lead to effective non-stationary noise suppression.
  • Examples of existing techniques that explore the possibility of noise estimation using multiple microphone arrays include techniques described in: "A microphone array with adaptive post-filtering for noise reduction in reverberant rooms" by R. Zelinski (Proc. ICASSP-88, vol. 5, 1988, pp. 2578-2581) and "Microphone array post-filter based on noise field coherence" by McCowan et al (IEEE Transactions on Speech and Audio Processing, 11.6 (2003), 709-716). These techniques assume the noise is either spatially white (incoherent) or fully diffuse and cannot deal with time-varying noise and interference sources. They are also ineffective at low frequencies when the sound source is close to the microphone. Speech and noise signals show similar coherence properties under those conditions, meaning that it is not possible to distinguish one from the other on the basis of coherence alone.
  • a noise estimator for generating an overall noise estimate for an audio signal, wherein the audio signal is representative of a noise signal and comprises a plurality of spectral components, and wherein the overall noise estimate comprises, for each spectral component in the audio signal, a respective spectral noise estimate.
  • the noise estimator includes an estimator that is configured to generate the overall noise estimate.
  • the noise estimator is configured to generate this overall estimate by applying a first estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are below a cut-off frequency. It is also configured to apply a second estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are above the cut-off frequency.
  • the estimator is also configured to form the overall noise estimate to comprise, for spectral components below the cut-off frequency, the spectral noise estimates generated using the first estimation technique and, for spectral components above the cut-off frequency, the spectral noise estimates generated using the second estimation technique.
  • the noise estimator also comprises an adaptation unit that is configured to adjust the cut-off frequency so as to account for changes in coherence of the noise signal that are reflected in the audio signal. Having the estimator configured to use two different estimation techniques to estimate the spectral noise estimates for different parts of the audio signal is beneficial because noise tends to display different coherence properties at different frequencies, meaning that different techniques will offer the most accurate noise estimations at those different frequencies.
  • Adjusting the cut-off frequency to account for changes in those coherence properties over time improves the accuracy of the resulting overall noise estimate.
  • a fixed cut-off frequency would result in a non-optimal estimation technique being used for some frequencies at least some of the time, whereas adjusting the cut-off frequency helps to ensure that the more appropriate of the first and second estimation techniques is always used.
  • the estimator may be configured to apply a technique that is adapted to a coherence of the noise signal that is expected to predominate in the audio signal below the cut-off frequency as the first estimation technique. It may be configured to apply a technique that is adapted to a coherence of the noise signal that is expected to predominate in the audio signal above the cut-off frequency as the second estimation technique.
  • the estimator is thus configured to adapt the estimation technique that it uses to the coherence properties of the noise signal in different frequency bands, which improves the resulting accuracy of the overall noise estimate.
  • The estimator, in particular the estimator of the first implementation form, may be configured to generate the spectral noise estimates for above the cut-off frequency based on a plurality of audio signals that include the audio signal and one or more other audio signals that also represent the noise signal.
  • the plurality of audio signals may each reflect the noise signal in slightly different ways. Combining these signals together increases the information about the noise that is incorporated in the estimation process and facilitates estimation of complex noise conditions.
  • the estimator of the second implementation form may be configured to generate the spectral noise estimates for above the cut-off frequency using an optimisation function that takes the plurality of audio signals as inputs. The optimisation function thus performs an optimisation across all of the audio signals, which improves the accuracy of the resulting spectral noise estimates.
  • the estimator of the second or third implementation form may be configured to generate the spectral noise estimates for above the cut-off frequency by comparing each of the plurality of audio signals with every other of the plurality of audio signals. The estimator is thus configured to extract a maximum amount of information about the noise from the audio signals available to it by making every available comparison between pairs of the audio signals.
  • the estimator of the fourth implementation form may be configured to compare one audio signal with another by determining a coherence between those audio signals. The estimator therefore establishes the relationship between the noise components of different audio signals, which helps to estimate complex noise in the surrounding environment.
  • the adaptation unit of any of the second to fifth implementation forms may be configured to adjust the cut-off frequency in dependence on the coherence between each of the plurality of audio signals with every other of the plurality of audio signals. This extracts a maximum amount of information about the relationship between the noise components in different audio signals.
  • the adaptation unit of any of the second to fifth implementation forms may be configured to select the cut-off frequency to be a frequency at which one of the plurality of audio signals shows a predetermined degree of coherence with another of the plurality of audio signals.
  • the adaptation unit thus sets the cut-off frequency in dependence on the noise property that determines which of the first and second estimation techniques is likely to provide the best noise estimate for a given frequency.
  • the adaptation unit of the seventh implementation form may be configured to select the cut-off frequency to be the lowest frequency at which one of the plurality of audio signals shows a predetermined degree of coherence with another of the plurality of audio signals.
  • the estimation technique applied below the cut-off frequency is generally quick and straightforward whilst the estimation technique applied above the cut-off frequency is generally more complicated but better suited to generating accurate noise estimates under complex noise conditions.
  • the adaptation unit is thus biased towards generating the most accurate noise estimate possible at the expense of a slight increase in processing complexity by setting the cut-off frequency to be the lowest candidate frequency.
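To make the selection rule concrete, here is a minimal sketch in Python/NumPy. It assumes the convention that noise coherence falls with frequency, so the cut-off is taken as the lowest frequency bin at which the pairwise coherence magnitude has dropped to a predetermined degree; the 0.7 threshold and the toy coherence curve are illustrative assumptions, not values from the application.

```python
import numpy as np

def select_cutoff_bin(coherence, threshold=0.7):
    """Return the lowest frequency bin at which the pairwise noise
    coherence magnitude has fallen to the predetermined degree.
    Falls back to the last bin if coherence never drops that far."""
    below = np.flatnonzero(np.abs(coherence) <= threshold)
    return int(below[0]) if below.size else len(coherence) - 1

# Toy coherence curve: fully coherent at low frequencies, diffuse above
bins = np.arange(64)
coh = np.clip(1.2 - bins / 32.0, 0.0, 1.0)
cutoff = select_cutoff_bin(coh, threshold=0.7)
print(cutoff)  # prints 16: the lowest bin where coherence <= 0.7
```

Scanning from the lowest bin upward is what biases the unit towards the (more accurate, more expensive) high-frequency technique, as described above.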
  • the estimator of any of the second to eighth implementation forms may be configured to generate the spectral noise estimates for above the cut-off frequency in dependence on the coherence between each of the plurality of audio signals and every other of the plurality of audio signals. Using multiple audio signals may improve the accuracy of the noise estimation above the cut-off frequency, due to the nature of the noise above that cut-off frequency (and particularly for complex noise conditions).
  • the estimator of any of the second to fifth implementation forms may be configured to generate the spectral noise estimates above the cut-off frequency in dependence on a covariance between each of the plurality of audio signals with every other of the plurality of audio signals. The covariance can be combined with the coherence to generate an accurate estimate of the noise in the audio signal, even under complex noise conditions.
  • The estimator, in particular the estimator of any of the above mentioned implementation forms, may be configured to generate the spectral noise estimates for below the cut-off frequency in dependence on a single audio signal that is representative of the noise signal. Multiple audio signals may not improve the accuracy of the noise estimation below the cut-off frequency, due to the nature of the noise. By using just a single audio signal, the estimator is able to implement a simpler and quicker noise estimation technique without that having a negative impact on the accuracy of the resulting noise estimate.
  • The estimator, in particular the estimator of any of the above mentioned implementation forms, may be configured to generate the spectral noise estimates for below the cut-off frequency and/or the spectral noise estimates for above the cut-off frequency by applying the respective first or second estimation technique only to parts of the audio signal that are determined to not comprise speech. This both simplifies the estimation process and increases its efficiency, since the estimator is not obliged to estimate the speech components of the audio signals in addition to the noise components.
  • a method for generating an overall noise estimate of an audio signal, wherein the audio signal is representative of a noise signal and comprises a plurality of spectral components, and wherein the overall noise estimate comprises, for each spectral component in the audio signal, a respective spectral noise estimate.
  • the method comprises applying a first estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are below a cut-off frequency. It also comprises applying a second estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are above the cut-off frequency.
  • the method also comprises forming the overall noise estimate to comprise, for spectral components below the cut-off frequency, the spectral noise estimates generated using the first estimation technique and, for spectral components above the cut-off frequency, the spectral noise estimates generated using the second estimation technique.
  • the method also comprises adjusting the cut-off frequency so as to account for changes in coherence of the noise signal that are reflected in the audio signal.
  • a non-transitory machine readable storage medium having stored thereon processor executable instructions implementing a method for generating an overall noise estimate of an audio signal.
  • the audio signal is representative of a noise signal and comprises a plurality of spectral components.
  • the overall noise estimate comprises, for each spectral component in the audio signal, a respective spectral noise estimate.
  • the method comprises applying a first estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are below a cut-off frequency.
  • the method comprises applying a second estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are above the cut-off frequency.
  • the method also comprises forming the overall noise estimate to comprise, for spectral components below the cut-off frequency, the spectral noise estimates generated using the first estimation technique and, for spectral components above the cut-off frequency, the spectral noise estimates generated using the second estimation technique. It also comprises adjusting the cut-off frequency so as to account for changes in coherence of the noise signal that are reflected in the audio signal.
  • Figure 1 shows a noise estimator according to an embodiment of the present invention
  • Figure 2 shows an example of a process for estimating noise
  • Figure 3 shows a more detailed example of a noise estimator according to an embodiment of the invention
  • Figure 4 is a flowchart showing a more detailed process for estimating noise
  • Figure 5 shows simulation results comparing a fixed cut-off frequency procedure with an adaptive cut-off frequency procedure in accordance with an embodiment of the present invention.
  • the noise estimator 100 comprises an estimator 101 and an adaptation unit 102.
  • the estimator is configured to receive an audio signal that is detected by microphone 103 (step S201).
  • the estimator may be configured to receive audio signals that are detected by multiple microphones 104.
  • the microphone may be part of the noise estimator or separate from it. In most implementations it is expected that both the noise estimator and the microphones will be part of the same device. That device could be, for example, a mobile phone, smart phone, landline telephone, tablet, laptop, teleconferencing equipment or any generic user equipment, particularly user equipment that is commonly used to capture speech signals.
  • the audio signal represents sounds that have been captured by a microphone.
  • An audio signal will often be formed from a component that is wanted (which will usually be speech) and a component that is not wanted (which will usually be noise). Estimating the unwanted component means that it can be removed from the audio signal.
  • Each microphone will capture its own version of sounds in the surrounding environment, and those versions will tend to differ from each other depending on differences between the microphones themselves and on the respective positions of the microphones relative to the sound sources. If the sounds in the environment include speech and noise, each microphone will typically capture an audio signal that is representative of both speech and noise. Similarly, if the sounds in the environment just include noise (e.g. during pauses in speech), each microphone will capture an audio signal that represents just that noise. Sounds in the surrounding environment will typically be reflected differently in each individual audio signal. In some circumstances, these differences can be exploited to estimate the noise signal.
  • the estimator (101) is configured to generate an overall estimate of noise in the audio signal (steps S202 to S204).
  • the estimated noise can then be removed from the audio signal by another part of the device.
  • the estimator is configured to generate the estimate based on one or more of the audio signals captured by the microphones. (In some implementations, the audio signals that are captured by the microphones are pre-processed before being input into the estimator. Such "pre-processed" signals are also covered by the general term "audio signals" used herein.)
  • Each audio signal can be considered as being formed from a series of complex sinusoidal functions. Each of those sinusoidal functions is a spectral component of the audio signal. Typically, each spectral component is associated with a particular frequency, phase and amplitude.
  • the audio signal can be disassembled into its respective spectral components by a Fourier analysis.
  • the estimator 101 aims to form an overall noise estimate by generating a spectral noise estimate for each spectral component in the audio signal.
  • the estimator comprises a low-frequency estimator 105 and a high frequency estimator 106.
  • the low-frequency estimator 105 is configured to generate spectral noise estimates for the spectral components of the audio signal that are below a cut-off frequency.
  • Those spectral noise estimates will form a low frequency section of the overall noise estimate.
  • the low frequency estimator achieves this by applying a first estimation technique to the audio signal to generate spectral noise estimates that are associated with frequencies below a cut-off frequency (step S202).
  • the high frequency estimator 106 is configured to generate spectral noise estimates for the spectral components of the audio signal that are above the cut-off frequency. Those spectral estimates will form a higher frequency section of the overall noise estimate.
  • the high frequency estimator achieves this by applying a second estimation technique to the audio signal to generate spectral noise estimates that are associated with frequencies above the cut-off frequency (step S203).
  • the estimator also comprises a combine module 107 that is configured to form the overall noise estimate by combining the spectral noise estimates that are output by the low and high frequency estimators.
  • the combine module forms the overall noise estimate to have spectral noise estimates that are output by the low frequency estimator below the cut-off frequency and spectral noise estimates that are output by the high frequency estimator above the cut-off frequency (step S204).
  • the low and high frequency estimators will both be configured to generate spectral noise estimates across the whole frequency range of the audio signal. The combine module will then just select the appropriate spectral noise estimate to use for each frequency bin in the overall noise estimate, with that selection depending on the cut-off frequency.
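The combine module's per-bin selection can be sketched as follows (Python/NumPy; the eight-bin arrays and constant placeholder values are illustrative, not from the application):

```python
import numpy as np

def combine_estimates(low_est, high_est, cutoff_bin):
    """Form the overall noise estimate: take the low-frequency
    estimator's spectral noise estimates below the cut-off bin and the
    high-frequency estimator's estimates from the cut-off bin upward."""
    bins = np.arange(len(low_est))
    return np.where(bins < cutoff_bin, low_est, high_est)

low = np.full(8, 1.0)     # stand-in for the SPP-based estimate
high = np.full(8, 2.0)    # stand-in for the multi-microphone estimate
overall = combine_estimates(low, high, cutoff_bin=3)
print(overall)            # bins 0-2 taken from `low`, bins 3-7 from `high`
```

Because both estimators cover the whole range, adapting the cut-off is just a matter of changing `cutoff_bin` between frames.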
  • the estimator 101 also comprises an adaptation unit 102.
  • the adaptation unit is configured to adjust the cut-off frequency.
  • the adaptation unit makes this adjustment to account for changes in the respective coherence properties of the speech and noise signals that are reflected in the audio signal (step S205).
  • the coherence properties of the noise signal generally vary with frequency. At low frequencies, speech and noise tend to show similar degrees of coherence, whereas at higher frequencies noise is often incoherent while speech is coherent. Coherence properties can also be affected by the distance between a sound source and a microphone: noise and speech show particularly similar coherence properties at low frequencies when the microphone and the sound source are close together.
  • the respective coherence properties displayed by the noise and speech signals will thus tend to vary with time, particularly in mobile and/or hands free scenarios where one or more sound sources (such as someone talking) may move with respect to the microphone.
  • One option is to track the coherence properties of both speech and noise.
  • it is the noise coherence that particularly changes. Consequently changes between the respective coherence properties of the speech and noise signals can be monitored by tracking the coherence properties of just the noise.
  • Adjusting the cut-off frequency so as to adapt to changes in the coherence properties of the noise signal that are represented in the audio signal may be advantageous because it enables the estimator to generate the overall noise estimate using techniques that work well for the particular coherence properties that are prevalent in the noise on either side of the cut-off frequency, and to alter that cut-off frequency to account for changes in those coherence properties with time. This is particularly useful for the complex noise scenarios that occur when user equipment is used in hands-free mode.
  • the structures shown in Figure 1 (and all the block apparatus diagrams included herein) are intended to correspond to a number of functional blocks. This is for illustrative purposes only. Figure 1 is not intended to define a strict division between different parts of hardware on a chip or between different programs, procedures or functions in software.
  • some or all of the signal processing techniques described herein are likely to be performed wholly or partly in hardware. This particularly applies to techniques incorporating repetitive arithmetic operations. Examples of such repetitive operations might include Fourier transforms, auto- and cross-correlations and pseudo inverses.
  • at least some of the functional blocks are likely to be implemented wholly or partly by a processor acting under software control. Any such software is suitably stored on a non-transitory machine readable storage medium.
  • the processor could, for example, be a DSP of a mobile phone, smart phone, landline telephone, tablet, laptop, teleconferencing equipment or any generic user equipment with speech processing capability.
  • A more detailed example of a noise estimator is shown in Figure 3.
  • the system is configured to receive multiple audio signals X1 to XM (301). Each of these audio signals represents a recording from a specific microphone. The number of microphones can thus be denoted M.
  • Each channel is provided with a segmentation/windowing module 302. These modules are followed by transform units 303 configured to convert the windowed signals into the frequency domain.
  • the transform units 303 are configured to implement the Fast Fourier Transform (FFT) to derive the short-term Fourier transform (STFT) coefficients for each input channel. These coefficients represent spectral components of the input signal.
  • the STFT is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time.
  • the STFT may be computed by dividing the audio signal into short segments of equal length and then computing the Fourier transform separately on each short segment. The result is the Fourier spectrum for each short segment of the audio signal, giving the signal processor the changing frequency spectra of the audio signal as a function of time. Each spectral component thus has an amplitude and a time extension.
  • the length of the FFT can be denoted N. N represents a number of frequency bins, with the STFT essentially decomposing the original audio signal into those frequency bins.
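The segmentation, windowing and FFT stages can be sketched as below (Python/NumPy). The 512-sample frame length, 256-sample hop and Hann window are illustrative choices, not values stated in the application:

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Short-term Fourier transform: split the signal into overlapping
    windowed frames and take the FFT of each frame (one row per frame)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps the N/2 + 1 non-redundant frequency bins of a real signal
    return np.fft.rfft(frames, axis=1)

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)      # one second of a 1 kHz tone
X = stft(x)                           # STFT coefficients, frames x bins
peak_bin = int(np.argmax(np.abs(X[0])))
print(peak_bin * fs / 512)            # 1000.0 Hz, matching the input tone
```

Each row of `X` is the spectrum of one short segment, so the STFT coefficients give the changing frequency content of the signal over time, as described above.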
  • the outputs from the transform units 303 are input into the estimator, shown generally at 304.
  • the low frequency estimator is implemented by "SPP Based NE" unit 305 (which will be referred to as SPP unit 305 hereafter).
  • the low frequency estimator is configured to generate the spectral noise estimates below the cut-off frequency.
  • the high frequency estimator is implemented by the "Noise Coherence/Covariance” modelling unit 306 and the "MMSE Optimal NE Solver” 307 (which will be respectively referred to as modelling unit 306 and optimiser 307 hereafter).
  • the high frequency estimator is configured to generate the spectral noise estimates above the cut-off frequency.
  • the low frequency estimator 305 and the high frequency estimator 306, 307 process the outputs from the transform units using different noise estimation techniques.
  • the low frequency estimator suitably uses a technique that is adapted to the respective coherence properties of the noise signal and the speech signal that are expected to predominate in the audio signal below the cut-off frequency. In most embodiments this means that the low frequency estimator will apply an estimation technique that is adapted to a scenario in which the coherence of both signals is high and similar to the coherence of the other.
  • the low frequency estimator is configured to generate its spectral noise estimates based on a single microphone signal.
  • the high frequency estimator will similarly apply an estimation technique that is adapted to a coherence of the noise signal and the speech signal that is expected to predominate in the audio signal above the cut-off frequency.
  • the noise and speech signals are generally expected to show different coherence properties above the cutoff frequency, with the noise signal becoming less coherent than below the cut-off frequency.
  • a more accurate noise estimate may be obtained by combining signals from multiple microphones under these conditions, so the high frequency estimator may be configured to receive audio signals from multiple microphones.
  • the noise estimates that are output by the low frequency estimator 305 and the high frequency estimator 306, 307 take the form of power spectral densities (PSD).
  • Each coefficient represents an estimated power of the noise in an audio signal for a respective frequency bin.
  • the coefficient in each frequency bin can be considered a spectral noise estimate.
  • the frequency bins suitably replicate the frequency bins into which the audio signals were decomposed by transform units 303.
  • the outputs of the low frequency estimator and the high frequency estimator thus represent spectral noise estimates for each spectral noise component of the audio signal.
  • the two sets of coefficients are input into the "Estimate Selection" unit 308.
  • This estimate selection unit combines the functionality of combine module 107 and adaptation unit 102 shown in Figure 1 .
  • the estimate selection unit is configured to choose between the coefficients that are output by the low frequency estimator and the high frequency estimator in dependence on frequency.
  • Below the cut-off frequency, the estimate selection unit chooses the coefficients output by SPP unit 305.
  • Above the cut-off frequency, the estimate selection unit chooses the coefficients output by the combination of the modelling unit 306 and the optimiser 307.
  • the estimate selection unit also monitors a coherence of the noise signal by means of the audio signal, and uses this to adapt the cut-off frequency.
  • the low frequency estimator may use any suitable estimation technique to generate spectral noise estimates that are below a cut-off frequency.
  • One option would be an MMSE-based spectral noise power estimation technique.
  • Another option is soft decision voice activity detection. This is the technique implemented by SPP unit 305, which is configured to implement a single-channel SPP-based method (where "SPP" stands for Speech Presence Probability). SPP maintains a quick noise tracking capability, results in less noise power overestimation and is computationally less expensive than other options.
  • SPP module 305 is configured to receive an audio signal from one microphone.
  • the SPP unit 305 is preferably configured to receive the single channel that corresponds to the device's "primary" microphone.
  • Model adaptation unit 306 is configured to update a noise coherence model and a noise covariance model in dependence on signals input from multiple microphones.
  • Optimiser 307 takes the outputs of the model adaptation unit and generates the optimum noise estimate for higher frequency sections of the overall noise estimate given those outputs.
  • In step S401, the incoming signals 301 are received from multiple microphones.
  • In step S402, those signals are segmented/windowed (by segmentation/windowing units 302) and converted into the frequency domain (by transform units 303).
  • the probability of speech presence in the current frame is then detected by SPP unit 305 (step S404) using the function:

    P(τ, ω) = ( 1 + (1 + ξ_H1) · exp( −(|Y(τ, ω)|² / Φ_N,SPP(τ−1, ω)) · ξ_H1 / (1 + ξ_H1) ) )⁻¹ (1)

    where P(τ, ω) represents the probability of speech presence in frame τ and frequency bin ω, Y(τ, ω) is the audio signal received by SPP unit 305, ξ_H1 is a fixed, optimal a priori signal-to-noise ratio and Φ_N,SPP(τ−1, ω) is the noise estimate of the previous frame. P(τ, ω) is a value between 0 and 1, where 1 indicates speech presence.
  • the SPP unit (305) also updates its estimated noise PSD as the weighted sum of the current noisy frame and the previous estimate (step S404):

    Φ_N,SPP(τ, ω) = P(τ, ω) · Φ_N,SPP(τ−1, ω) + (1 − P(τ, ω)) · |Y(τ, ω)|² (2)
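A sketch of this SPP-style tracking step, vectorised over frequency bins (Python/NumPy). The 15 dB fixed a priori SNR is an assumed constant borrowed from the SPP literature, not a value given here:

```python
import numpy as np

XI_H1 = 10 ** (15 / 10)   # fixed optimal a priori SNR (assumed: 15 dB)

def spp_update(noisy_power, noise_prev):
    """One frame of SPP-based noise tracking: compute the posterior
    speech-presence probability per bin, then update the noise PSD as
    a weighted sum of the previous estimate and the current frame."""
    gamma = noisy_power / np.maximum(noise_prev, 1e-12)   # a posteriori SNR
    p = 1.0 / (1.0 + (1.0 + XI_H1) * np.exp(-gamma * XI_H1 / (1.0 + XI_H1)))
    noise_new = p * noise_prev + (1.0 - p) * noisy_power
    return p, noise_new

# Loud bin (likely speech) vs. a bin at the tracked noise level
p, noise = spp_update(np.array([100.0, 1.0]), np.array([1.0, 1.0]))
```

When a bin looks like speech (p near 1) the noise estimate is held at its previous value, which is what gives the SPP approach its quick, low-overestimation tracking.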
  • the speech presence probability calculation also triggers the updating of the noise coherence and covariance models by modelling unit 306, since these models are preferably updated in the absence of speech.
  • the model adaptation unit (306) is configured to track two qualities of the noise comprised in the incoming microphone signals: its coherence and its covariance (step S405).
  • the model adaptation unit is configured to track noise coherence using a model that is based on a coherence function.
  • the coherence function characterises a noise field by representing the coherence between two signals at points p and q.
  • the magnitude of the output of the noise coherence function is always less than or equal to one (i.e. |Γ_pq(f)| ≤ 1).
  • the noise coherence function between the j-th and k-th microphone can be initialised with the diffuse noise model:

    Γ_pq(ω) = sinc(2π f d_pq / c) (3)

  • f is the frequency
  • d_pq is the distance between points p and q
  • c is the speed of sound
  • ω is an index representing the relevant frequency bin.
  • the relevant distance in this scenario is between the j-th and k-th microphones, so the subscripts j and k will be substituted for p and q hereafter.
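The diffuse-field initialisation can be sketched as follows (Python/NumPy; note that np.sinc(x) computes sin(πx)/(πx), hence the 2·f·d/c argument). The 5 cm microphone spacing is an illustrative assumption:

```python
import numpy as np

def diffuse_coherence(f, d, c=343.0):
    """Diffuse (spherically isotropic) noise-field coherence between two
    microphones separated by d metres: Gamma(f) = sinc(2*pi*f*d/c)."""
    return np.sinc(2.0 * np.asarray(f) * d / c)

# Coherence falls off with frequency for a 5 cm spacing
freqs = np.array([0.0, 1000.0, 4000.0])
vals = diffuse_coherence(freqs, d=0.05)
print(vals)
```

The curve is 1 at DC and decays towards zero with frequency, which is exactly the behaviour that motivates using a multi-microphone technique above the cut-off and a single-channel technique below it.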
  • the model adaptation unit (306) updates the coherence model as the microphone signals are received:
• Γ_pq(τ,ω) = α_Γ Γ_pq(τ−1,ω) + (1 − α_Γ) · Φ_pq(τ,ω) / √(Φ_pp(τ,ω) Φ_qq(τ,ω)), when P(τ,ω) < 0.1    (4)
• τ is the frame index
• ω is the frequency bin
• Φ_jj(τ,ω), Φ_kk(τ,ω) and Φ_jk(τ,ω) are the recursively-smoothed auto-correlated and cross-correlated PSDs of the audio signals from the j th and k th microphones respectively.
• P(τ,ω) is the a posteriori SPP index for the current frame and is provided to model adaptation unit 306 by SPP unit 305.
• P(τ,ω) acts as the threshold for Γ_pq(τ,ω) to be updated. In practice it is preferable to only update Γ_pq(τ,ω) in periods where speech is absent.
• a suitable value for the smoothing factor α_Γ might be 0.95.
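The gated recursive update of equation (4) can be sketched as below; the 0.1 SPP threshold and the 0.95 smoothing factor come from the description above, while the function name and the small numerical floor are assumptions.

```python
import numpy as np

def update_coherence(gamma_prev, psd_pq, psd_pp, psd_qq, spp, alpha=0.95):
    """Recursively smooth the measured coherence into the model, but only
    in frequency bins where the speech presence probability is below 0.1
    (i.e. bins judged to contain noise only)."""
    measured = psd_pq / np.sqrt(np.maximum(psd_pp * psd_qq, 1e-20))
    updated = alpha * gamma_prev + (1.0 - alpha) * measured
    # Freeze the model wherever speech is likely to be present
    return np.where(spp < 0.1, updated, gamma_prev)

gamma_new = update_coherence(
    gamma_prev=np.array([0.9, 0.9]),
    psd_pq=np.array([0.5, 0.5]),
    psd_pp=np.array([1.0, 1.0]),
    psd_qq=np.array([1.0, 1.0]),
    spp=np.array([0.05, 0.95]),  # first bin noise-only, second bin speech
)
```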
  • the model adaptation unit (306) is also configured to actively update a noise covariance matrix (also in step S405).
• the noise covariance matrix R_nn (M×M) is recursively updated using:

R_nn = α R_nn + (1 − α) xᵀ conj(x), when P(τ,ω) < 0.1    (6)

• x (1×M) represents the STFT coefficients of the input signals from all of the microphones in respect of frequency bin ω.
  • the model adaptation unit (306) is thus configured to establish the coherence and covariance models and update them as audio signals are received from the microphones.
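The covariance tracking of equation (6) admits an equally short sketch; the function name, the 0.95 smoothing default and the per-bin calling convention are illustrative assumptions.

```python
import numpy as np

def update_covariance(r_nn, x, spp, alpha=0.95, threshold=0.1):
    """Recursive noise covariance update for one frequency bin:
    R_nn <- alpha*R_nn + (1 - alpha)*x^T conj(x), applied only when the
    speech presence probability for that bin is below the threshold.

    r_nn: (M, M) covariance matrix; x: (M,) STFT coefficients of the M mics
    """
    if spp < threshold:
        r_nn = alpha * r_nn + (1.0 - alpha) * np.outer(x, np.conj(x))
    return r_nn

r1 = update_covariance(np.eye(2, dtype=complex),
                       np.array([1.0 + 0j, 1.0 + 0j]), spp=0.02, alpha=0.9)
r2 = update_covariance(np.eye(2, dtype=complex),
                       np.array([5.0 + 0j, 5.0 + 0j]), spp=0.9)  # frozen
```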
• offdiag(R_nn), the off-diagonal part of the noise covariance matrix, represents the variance of the coherent noise; restricting attention to the off-diagonal elements disregards the variance of the incoherent noise.
• Γ̃ is derived from the adaptive coherence models between the multiple pairs of microphones, stacking the pairwise coherence values Γ_pq(τ,ω).
  • the updated models are used to generate a further noise estimate using an optimal least squares solution (step S406).
• the values of R_nn and Γ̃ are suitably transferred from the model adaptation unit (306) to the optimiser (307).
• the optimiser is configured to generate the noise estimate for higher frequencies by searching for an optimal least-squares solution to equation (7) in the minimum mean square error (MMSE) sense.
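The exact stacking used in the patent's equation (7) is not fully legible in this text, so the sketch below follows the standard coherence-based formulation: each off-diagonal entry of R_nn is modelled as the pairwise coherence times a common coherent-noise PSD Φ_C, and the resulting overdetermined system is solved per bin in the least-squares (MMSE) sense. Function and variable names are assumptions.

```python
import numpy as np

def coherent_noise_psd(r_nn, gamma):
    """Least-squares estimate of the coherent noise PSD Phi_C for one
    frequency bin, from the off-diagonal covariance entries and the
    modelled pairwise coherences: R_nn[p, q] ~= gamma[p, q] * Phi_C."""
    m = r_nn.shape[0]
    iu = np.triu_indices(m, k=1)      # indices of the microphone pairs
    r_vec = r_nn[iu]                  # observed cross-PSDs
    g_vec = gamma[iu]                 # modelled coherences
    # Least-squares solution of g_vec * phi = r_vec for the scalar phi
    phi = np.vdot(g_vec, r_vec) / np.vdot(g_vec, g_vec)
    return max(phi.real, 0.0)         # a PSD cannot be negative

# Exact-model sanity check: off-diagonals equal to gamma * 2.0 recover 2.0
gamma_model = np.array([[1.0, 0.5, 0.25],
                        [0.5, 1.0, 0.5],
                        [0.25, 0.5, 1.0]])
phi_c = coherent_noise_psd(gamma_model * 2.0, gamma_model)
```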
• the overall noise PSD estimator is:

Φ_N(τ,ω) = Φ_S(τ,ω) for frequency bins at or below the cut-off frequency, and Φ_C(τ,ω) for frequency bins above it

where Φ_S is the SPP-based estimate and Φ_C is the coherence-based estimate.
• the estimate selector 308 is configured to form the overall noise estimate. It receives the estimates generated by both the SPP unit (305) and the optimiser (307) (Φ_S and Φ_C respectively) and combines them to form the overall noise estimate (step S407). Finally, the cut-off frequency is adaptively adjusted so that the two noise estimates can be combined into an overall noise estimate (also in step S407). In order to converge both the low- and high-frequency estimated coefficients into the final noise estimate more effectively, estimate selector 308 is configured to adaptively adjust the split frequency between the single-microphone noise estimate and the multi-microphone noise estimate based on the updating model in equation (4). The following scheme is one option for setting the cut-off frequency that controls the combination of the two estimated noise PSDs:

f_cut(τ) = min over all microphone pairs pq of f_pq(τ)
  • f pq represents the frequency where the magnitude squared value of the updated coherence function in equation (4) for the pq th microphone pair has some predetermined value.
  • a suitable value might be, for example, 0.5.
• f_pq varies according to the adaptive coherence model Γ.
  • the split frequency is selected to be the lowest frequency among various microphone pairs where the magnitude squared value of coherence function has the predetermined value. This ensures that the appropriate noise estimate is selected for the speech and noise coherence properties experienced at different frequencies, meaning that problems caused by similarity and overlapping between speech and noise coherence properties can be consistently avoided for each channel.
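The cut-off selection rule just described can be sketched as follows; the 0.5 threshold on the squared-magnitude coherence is the illustrative value suggested above, and the function name and frequency grid are assumptions.

```python
import numpy as np

def adaptive_cutoff(freqs_hz, pair_coherences, target=0.5):
    """Pick the split frequency as the lowest frequency, over all
    microphone pairs, at which the squared magnitude of that pair's
    coherence first drops to or below the target value."""
    crossings = []
    for gamma in pair_coherences:
        below = np.abs(gamma) ** 2 <= target
        if below.any():
            crossings.append(freqs_hz[np.argmax(below)])  # first such bin
    return min(crossings) if crossings else freqs_hz[-1]

freqs = np.array([0.0, 500.0, 1000.0, 2000.0, 4000.0])
pair_a = np.array([1.0, 0.9, 0.6, 0.4, 0.1])   # |gamma|^2 <= 0.5 from 1000 Hz
pair_b = np.array([1.0, 0.6, 0.5, 0.3, 0.1])   # |gamma|^2 <= 0.5 from 500 Hz
f_cut = adaptive_cutoff(freqs, [pair_a, pair_b])
```

Taking the minimum over all pairs means the multi-microphone estimate is only trusted where every pair's coherence model has clearly departed from the fully coherent regime.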
• noise reduction can be achieved using any suitable noise reduction method, including Wiener filtering, spectral subtraction, etc.
• the techniques described above have been tested via simulation using complex non-stationary subway scenario recordings and three microphones.
  • the recording length was 130 seconds.
  • the recording was processed using the adaptive cut-off frequency technique described above and a technique in which the cut-off frequency is fixed.
  • the results are shown in Figure 5.
• the lower plot 502 illustrates the technique described herein, and it can clearly be seen that it has been more effective in addressing the non-stationary noise issues than the fixed cut-off frequency technique shown in upper plot 501.
  • the processing was also more efficient.
  • the processing time using the non-adaptive technique was 62 seconds, compared with 35 seconds for the adaptive technique.


Abstract

The present invention relates to a noise estimator that is suitable for generating an overall noise estimate for an audio signal. The audio signal is representative of a noise signal and comprises a plurality of spectral components. The overall noise estimate comprises, for each spectral component in the audio signal, a respective spectral noise estimate. The noise estimator comprises an estimator that is configured to generate the overall noise estimate. The noise estimator is configured to generate this overall estimate by applying a first estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are below a cut-off frequency. It is also configured to apply a second estimation technique to the audio signal to generate spectral noise estimates for spectral components of the audio signal that are above the cut-off frequency. The estimator is further configured to form the overall noise estimate so as to comprise, for spectral components below the cut-off frequency, the spectral noise estimates generated using the first estimation technique and, for spectral components above the cut-off frequency, the spectral noise estimates generated using the second estimation technique. The noise estimator also comprises an adaptation unit that is configured to adjust the cut-off frequency so as to account for changes in the coherence of the noise signal that are reflected in the audio signal. Adjusting the cut-off frequency to account for changes in these coherence properties over time improves the accuracy of the resulting overall noise estimate.
A fixed cut-off frequency would result in a non-optimal estimation technique being used for some frequencies at least some of the time, whereas adjusting the cut-off frequency helps to ensure that the more appropriate of the first and second estimation techniques is always used.
PCT/EP2016/074462 2016-10-12 2016-10-12 Apparatus and method for generating noise estimates WO2018068846A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP16784821.7A EP3516653B1 (fr) Apparatus and method for generating noise estimates
PCT/EP2016/074462 WO2018068846A1 (fr) Apparatus and method for generating noise estimates

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2016/074462 WO2018068846A1 (fr) 2016-10-12 2016-10-12 Appareil et procédé permettant de générer des estimations de bruit

Publications (1)

Publication Number Publication Date
WO2018068846A1 true WO2018068846A1 (fr) 2018-04-19

Family

ID=57184415

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2016/074462 WO2018068846A1 (fr) 2016-10-12 2016-10-12 Appareil et procédé permettant de générer des estimations de bruit

Country Status (2)

Country Link
EP (1) EP3516653B1 (fr)
WO (1) WO2018068846A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080159559A1 (en) * 2005-09-02 2008-07-03 Japan Advanced Institute Of Science And Technology Post-filter for microphone array
US20080317261A1 (en) * 2007-06-22 2008-12-25 Sanyo Electric Co., Ltd. Wind Noise Reduction Device
US20140161271A1 (en) * 2012-12-11 2014-06-12 JVC Kenwood Corporation Noise eliminating device, noise eliminating method, and noise eliminating program
US20160078856A1 (en) * 2014-09-11 2016-03-17 Hyundai Motor Company Apparatus and method for eliminating noise, sound recognition apparatus using the apparatus and vehicle equipped with the sound recognition apparatus

Non-Patent Citations (3)

Title
McCowan et al., "Microphone array post-filter based on noise field coherence", IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, 2003, pp. 709-716
Nelke et al., "Dual microphone noise PSD estimation for mobile phones in hands-free position exploiting the coherence and speech presence probability", 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013
R. Zelinski, "A microphone array with adaptive post-filtering for noise reduction in reverberant rooms", Proc. ICASSP-88, vol. 5, 1988, pp. 2578-2581

Also Published As

Publication number Publication date
EP3516653A1 (fr) 2019-07-31
EP3516653B1 (fr) 2021-08-11

Similar Documents

Publication Publication Date Title
CN111418010B (zh) Multi-microphone noise reduction method, apparatus and terminal device
KR101726737B1 (ko) Multi-channel sound source separation apparatus and method
US8958572B1 (en) Adaptive noise cancellation for multi-microphone systems
US7099821B2 (en) Separation of target acoustic signals in a multi-transducer arrangement
US9060052B2 (en) Single channel, binaural and multi-channel dereverberation
KR101210313B1 (ko) System and method utilizing inter-microphone level differences for speech enhancement
US20140025374A1 (en) Speech enhancement to improve speech intelligibility and automatic speech recognition
US20160066087A1 (en) Joint noise suppression and acoustic echo cancellation
US20100217590A1 (en) Speaker localization system and method
KR20130108063A (ko) Robust noise suppression with multiple microphones
US8761410B1 (en) Systems and methods for multi-channel dereverberation
KR20110038024A (ko) Noise suppression system and method using null processing noise subtraction
US20130016854A1 (en) Microphone array processing system
KR20120114327A (ko) Adaptive noise reduction using level cues
CN110211602B (zh) Intelligent speech enhancement communication method and device
US20200286501A1 (en) Apparatus and a method for signal enhancement
Nelke et al. Dual microphone noise PSD estimation for mobile phones in hands-free position exploiting the coherence and speech presence probability
US20140193000A1 (en) Method and apparatus for generating a noise reduced audio signal using a microphone array
US9875748B2 (en) Audio signal noise attenuation
WO2013057659A2 (fr) Noise attenuation in a signal
EP3516653B1 (fr) Apparatus and method for generating noise estimates
Lee et al. Channel prediction-based noise reduction algorithm for dual-microphone mobile phones
Martın-Donas et al. A postfiltering approach for dual-microphone smartphones
Adiga et al. Improving single frequency filtering based Voice Activity Detection (VAD) using spectral subtraction based noise cancellation
CN113851141A (zh) New method and device for noise suppression using a microphone array

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16784821; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 2016784821; Country of ref document: EP; Effective date: 20190424)