EP1169883A1 - System and method for dual microphone signal noise reduction using spectral subtraction - Google Patents


Info

Publication number
EP1169883A1
Authority
EP
European Patent Office
Prior art keywords
signal
spectral subtraction
noise
microphone
input
Prior art date
Legal status
Granted
Application number
EP00925198A
Other languages
German (de)
French (fr)
Other versions
EP1169883B1 (en)
Inventor
Harald Gustafsson
Ingvar Claesson
Sven Nordholm
Current Assignee
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB
Publication of EP1169883A1
Application granted
Publication of EP1169883B1
Anticipated expiration
Expired - Lifetime (current legal status)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/06 Receivers
    • H04B1/10 Means associated with receiver for limiting or suppressing noise or interference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones

Definitions

  • the present invention relates to communications systems, and more particularly, to methods and apparatus for mitigating the effects of disruptive background noise components in communications signals.
  • the microphone picks up not only the near-end user's speech, but also any noise which happens to be present at the near-end location.
  • the near-end microphone typically picks up sounds such as surrounding traffic, road and passenger compartment noise, room noise, and the like.
  • the resulting noisy near-end speech can be annoying or even intolerable for the far-end user. It is thus desirable that the background noise be reduced as much as possible, preferably early in the near-end signal processing chain (e.g., before the received near-end microphone signal is supplied to a near-end speech coder).
  • FIG. 1 is a high-level block diagram of such a system 100.
  • a noise reduction processor 110 is positioned at the output of a microphone 120 and at the input of a near-end signal processing path (not shown).
  • the noise reduction processor 110 receives a noisy speech signal $x$ from the microphone 120 and processes the noisy speech signal $x$ to provide a cleaner, noise-reduced speech signal $x_{NR}$, which is passed through the near-end signal processing chain and ultimately to the far-end user.
  • spectral subtraction uses estimates of the noise spectrum and the noisy speech spectrum to form a signal-to-noise ratio (SNR) based gain function which is multiplied by the input spectrum to suppress frequencies having a low SNR.
  • the spectral subtraction output signal typically contains artifacts known in the art as musical tones. Further, discontinuities between processed signal blocks often lead to diminished speech quality from the far-end user perspective.
  • Many enhancements to the basic spectral subtraction method have been developed in recent years. See, for example, N. Virag, "Speech Enhancement Based on Masking Properties of the Auditory System," Proc. IEEE ICASSP, vol. 1, pp. 796-799, 1995; D. Tsoukalas, M. Paraskevas and J. Mourjopoulos, "Speech Enhancement using Psychoacoustic Criteria," Proc. IEEE ICASSP, vol. 2, pp. 359-362, 1993; F. Xie and D. Van Compernolle, "Speech Enhancement by Spectral Magnitude Estimation - A
  • Spectral subtraction uses two spectrum estimates, one being the “disturbed” signal and one being the “disturbing” signal, to form a signal-to-noise ratio (SNR) based gain function.
  • the disturbed spectrum is multiplied by the gain function to increase the SNR for this spectrum.
  • the speech is thereby enhanced relative to the disturbing background noise.
  • the noise is estimated during speech pauses or with the help of a noise model during speech. This implies that the noise must be stationary, so that it has similar properties during speech, or that the model must be suitable for the time-varying background noise. Unfortunately, neither condition holds for most background noises in everyday surroundings.
  • the present invention fulfills the above-described and other needs by providing methods and apparatus for performing noise reduction by spectral subtraction in a dual microphone system.
  • by using a far-mouth microphone in conjunction with a near-mouth microphone, it is possible to handle non-stationary background noise as long as the noise spectrum can be continuously estimated from a single block of input samples.
  • the far-mouth microphone, in addition to picking up the background noise, also picks up the speaker's voice, albeit at a lower level than the near-mouth microphone.
  • a spectral subtraction stage is used to suppress the speech in the far-mouth microphone signal.
  • a rough speech estimate is formed with another spectral subtraction stage from the near-mouth signal.
  • a third spectral subtraction stage is used to enhance the near-mouth signal by suppressing the background noise using the enhanced background noise estimate.
  • Figure 1 is a block diagram of a noise reduction system in which spectral subtraction can be implemented;
  • Figure 2 depicts a conventional spectral subtraction noise reduction processor;
  • Figures 3-4 depict exemplary spectral subtraction noise reduction processors according to exemplary embodiments of the invention.
  • Figure 5 depicts the placement of near- and far-mouth microphones in an exemplary embodiment of the present invention.
  • Figure 6 depicts an exemplary dual microphone spectral subtraction system.
  • Figure 7 depicts an exemplary spectral subtraction stage for use in an exemplary embodiment of the present invention.
  • spectral subtraction is built upon the assumption that the noise signal and the speech signal in a communications application are random, uncorrelated and added together to form the noisy speech signal. For example, if s(n), w(n) and x(n) are stochastic short-time stationary processes representing speech, noise and noisy speech, respectively, then $x(n) = s(n) + w(n)$.
  • R(f) denotes the power spectral density of a random process; since s(n) and w(n) are uncorrelated, $R_x(f) = R_s(f) + R_w(f)$.
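  • the conventional way to estimate the power spectral density is to use a periodogram. For example, if $X_N(f_u)$ is the N-length Fourier transform of $x(n)$ and $W_N(f_u)$ is the corresponding Fourier transform of $w(n)$, then $R_x(f_u) \approx P_{x,N}(f_u) = \frac{1}{N}|X_N(f_u)|^2$ and $R_w(f_u) \approx P_{w,N}(f_u) = \frac{1}{N}|W_N(f_u)|^2$.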
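  • Equations (3), (4) and (5) can be combined to provide $|S_N(f_u)|^2 \approx |X_N(f_u)|^2 - |W_N(f_u)|^2$, where $S_N(f_u)$ denotes the corresponding estimate of the clean speech spectrum.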
  • the noisy speech phase $\theta_x(f_u)$ can be used as an approximation to the clean speech phase $\theta_s(f_u)$, i.e., $\theta_s(f_u) \approx \theta_x(f_u)$.
  • equation (9) can be written employing a gain function $G_N$ and using vector notation as:
  • Equation (12) represents the conventional spectral subtraction algorithm and is illustrated in Figure 2.
  • a conventional spectral subtraction noise reduction processor 200 includes a fast Fourier transform processor 210, a magnitude squared processor 220, a switch 225, a voice activity detector 230, a block-wise averaging device 240, a block-wise gain computation processor 250, a multiplier 260 and an inverse fast Fourier transform processor 270.
  • a noisy speech input signal is coupled to an input of the fast Fourier transform processor 210.
  • an output of the fast Fourier transform processor 210 is coupled to an input of the magnitude squared processor 220 and to a first input of the multiplier 260.
  • An output of the magnitude squared processor 220 is coupled to a first contact of the switch 225 and to a first input of the gain computation processor 250.
  • An output of the voice activity detector 230 is coupled to a throw input of the switch 225 , and a second contact of the switch 225 is coupled to an input of the block-wise averaging device 240.
  • An output of the block-wise averaging device 240 is coupled to a second input of the gain computation processor 250, and an output of the gain computation processor 250 is coupled to a second input of the multiplier 260.
  • An output of the multiplier 260 is coupled to an input of the inverse fast Fourier transform processor 270, and an output of the inverse fast Fourier transform processor 270 provides an output for the conventional spectral subtraction system 200.
  • the conventional spectral subtraction system 200 processes the incoming noisy speech signal, using the conventional spectral subtraction algorithm described above, to provide the cleaner, reduced-noise speech signal.
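The signal flow of processor 200 described above can be traced block by block in a short sketch. It is illustrative only and rests on several assumptions: a naive DFT stands in for FFT 210 and IFFT 270, a caller-supplied `vad_is_noise` flag stands in for VAD 230 and switch 225, the block-wise averaging device 240 is modelled as an exponential average, and the gain uses the common generalized form $G = (1 - k(P_w/P_x)^{a/2})^{1/a}$ clamped at zero. The class and parameter names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) DFT; stands in for FFT 210 and IFFT 270 in this sketch."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out


class ConventionalSpectralSubtractor:
    """Hypothetical block-wise sketch of processor 200 (Figure 2)."""

    def __init__(self, a=2.0, k=1.0, alpha=0.9):
        self.a, self.k, self.alpha = a, k, alpha
        self.noise_psd = None  # state of the block-wise averaging device 240

    def process(self, block, vad_is_noise):
        n = len(block)
        spectrum = dft(block)                               # FFT 210
        psd = [abs(v) ** 2 / n for v in spectrum]           # magnitude squared 220
        if vad_is_noise:                                    # switch 225 closed by VAD 230
            if self.noise_psd is None:
                self.noise_psd = list(psd)
            else:                                           # averaging 240 (exponential, assumed)
                self.noise_psd = [self.alpha * q + (1 - self.alpha) * p
                                  for q, p in zip(self.noise_psd, psd)]
        noise = self.noise_psd if self.noise_psd is not None else [0.0] * n
        gains = []
        for px, pw in zip(psd, noise):                      # gain computation 250
            if px <= 0.0:
                gains.append(0.0)
            else:
                inner = 1.0 - self.k * (pw / px) ** (self.a / 2)
                gains.append(max(inner, 0.0) ** (1.0 / self.a))
        product = [g * x for g, x in zip(gains, spectrum)]  # multiplier 260
        return [v.real for v in dft(product, inverse=True)]  # IFFT 270


proc = ConventionalSpectralSubtractor()
block = [1.0, -1.0, 0.5, 0.25]
passthrough = proc.process(block, vad_is_noise=False)  # no noise estimate yet: unity gain
silenced = proc.process(block, vad_is_noise=True)      # noise estimate equals this block: zero gain
```

The two calls show the extremes: with no noise estimate the block passes through unchanged, and once the averaged noise spectrum equals the block's own spectrum every bin is suppressed.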
  • the various components of Figure 2 can be implemented using any known digital signal processing technology, including a general purpose computer, a collection of integrated circuits and/or application specific integrated circuitry (ASIC).
  • the second parameter k is adjusted so that the desired noise reduction is achieved. For example, if a larger k is chosen, the speech distortion increases.
  • the parameter k is typically set depending upon how the first parameter a is chosen. A decrease in a typically leads to a decrease in the k parameter as well in order to keep the speech distortion low.
  • over-subtraction (i.e., k > 1).
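The interplay of the parameters a and k can be seen by evaluating the gain for a single frequency bin. Since the exact form of equation (12) is not reproduced in this text, the sketch assumes the common generalized gain $G = (1 - k(P_w/P_x)^{a/2})^{1/a}$, clamped at zero, where $P_x$ and $P_w$ are the noisy-speech and noise power estimates; `subtraction_gain` is a hypothetical name.

```python
def subtraction_gain(snr_power, a=2.0, k=1.0):
    """Gain for one bin given the ratio P_x / P_w of noisy-speech to noise power.

    Assumed form: G = (1 - k * (P_w / P_x) ** (a / 2)) ** (1 / a), clamped at zero.
    a = 2 is power subtraction, a = 1 magnitude subtraction, k > 1 over-subtraction.
    """
    if snr_power <= 0.0:
        return 0.0
    inner = 1.0 - k * (1.0 / snr_power) ** (a / 2.0)
    return max(inner, 0.0) ** (1.0 / a)


for a, k in [(2.0, 1.0), (1.0, 1.0), (2.0, 2.0)]:
    print(f"a={a}, k={k}: gain at P_x/P_w = 4 is {subtraction_gain(4.0, a, k):.3f}")
```

For a bin where the noisy-speech power is four times the noise power, this gives about 0.866, 0.500 and 0.707 respectively: lowering a or raising k both suppress more, which is why the two parameters are tuned together to keep speech distortion low.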
  • the conventional spectral subtraction gain function (see equation (12)) is derived from a full block estimate and has zero phase.
  • the corresponding impulse response $g_N(n)$ is non-causal and has length N (equal to the block length). Therefore, the multiplication of the gain function $G_N(f_u)$ and the input signal spectrum $X_N(f_u)$ (see equation (11)) results in a periodic circular convolution with a non-causal filter.
  • periodic circular convolution can lead to undesirable aliasing in the time domain, and the non-causal nature of the filter can lead to discontinuities between blocks and thus to inferior speech quality.
  • the present invention provides a method and apparatus for performing correct convolution with a causal gain filter, thereby eliminating the above-described problems of time domain aliasing and interblock discontinuity.
  • the result of the multiplication is not a correct convolution. Rather, the result is a circular convolution with a periodicity of N, $x_N \circledast y_N$ (14), where the symbol $\circledast$ denotes circular convolution.
  • to obtain a correct (linear) convolution, the accumulated order of the impulse responses $x_N$ and $y_N$ must be less than or equal to $N - 1$, one less than the block length N.
  • the time domain aliasing problem resulting from periodic circular convolution can be solved by using a gain function $G_N(f_u)$ and an input signal block $X_N$ having a total order less than or equal to $N - 1$.
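The order condition can be checked numerically: multiplying zero-padded DFTs reproduces the linear convolution only when the accumulated order fits within N - 1; otherwise the tail wraps around. The sketch below uses a naive DFT and hypothetical helper names.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) DFT (inverse=True gives the IDFT including the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out


def spectral_product(x, y, n):
    """Zero-pad x and y to length n, multiply their DFTs and return the real result."""
    xp = list(x) + [0.0] * (n - len(x))
    yp = list(y) + [0.0] * (n - len(y))
    prod = [a * b for a, b in zip(dft(xp), dft(yp))]
    return [round(v.real, 9) for v in dft(prod, inverse=True)]


def linear_convolution(x, y):
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xv in enumerate(x):
        for j, yv in enumerate(y):
            out[i + j] += xv * yv
    return out


x = [1.0, 2.0, 3.0]   # order 2
y = [1.0, 1.0]        # order 1, so the accumulated order is 3
print(linear_convolution(x, y))    # [1.0, 3.0, 5.0, 3.0]
print(spectral_product(x, y, 4))   # N - 1 = 3 >= 3: [1.0, 3.0, 5.0, 3.0]
print(spectral_product(x, y, 3))   # N - 1 = 2 < 3: [4.0, 3.0, 5.0], last sample aliased onto the first
```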
  • the spectrum $X_N$ of the input signal is of full block length N.
  • an input signal block $x_L$ of length L (L < N) is used to construct a spectrum of order L.
  • the length L is called the frame length and thus $x_L$ is one frame. Since the spectrum which is multiplied with the gain function of length N should also be of length N, the frame is zero padded to the full block length N, resulting in $X_{L,N}$.
  • the gain function according to the invention can be interpolated from a gain function $G_M(f_u)$ of length M, where M < N, to form $G_{M,N}(f_u)$.
  • any known or yet to be developed spectrum estimation technique can be used as an alternative to the above described simple Fourier transform periodogram.
  • such spectrum estimation techniques can provide lower variance in the resulting gain function. See, for example, J.G. Proakis and D.G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Macmillan, Second Ed., 1992.
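  • in the Bartlett method, for example, the block of length N is divided into K sub-blocks of length M. A periodogram is computed for each sub-block and the results are averaged to provide an M-long periodogram for the total block as $P_{x,M}(f_u) = \frac{1}{K}\sum_{j=0}^{K-1} P_{x,M,j}(f_u)$, where $P_{x,M,j}(f_u)$ is the periodogram of the j-th sub-block.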
  • the variance is reduced by a factor K when the sub-blocks are uncorrelated, compared to the full block length periodogram.
  • the frequency resolution is also reduced by the same factor.
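The Bartlett averaging just described can be sketched in a few lines: split the block into K non-overlapping sub-blocks of length M, compute each periodogram, and average bin by bin. The function names are hypothetical and a naive DFT replaces the FFT.

```python
import cmath


def periodogram(x):
    """|DFT(x)|^2 / len(x) using a naive DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))) ** 2 / n
            for f in range(n)]


def bartlett_periodogram(block, m):
    """Average the periodograms of len(block) // m non-overlapping sub-blocks of length m."""
    if m <= 0 or len(block) % m != 0:
        raise ValueError("block length must be a positive multiple of the sub-block length M")
    num_sub_blocks = len(block) // m
    acc = [0.0] * m
    for s in range(num_sub_blocks):
        for f, p in enumerate(periodogram(block[s * m:(s + 1) * m])):
            acc[f] += p
    return [p / num_sub_blocks for p in acc]


psd = bartlett_periodogram([1.0] + [0.0] * 7, 4)
print(psd)  # [0.125, 0.125, 0.125, 0.125]
```

The output has M = 4 bins rather than N = 8, which is the loss of frequency resolution that accompanies the variance reduction.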
  • alternatively, the Welch method can be used.
  • the Welch method is similar to the Bartlett method except that each sub-block is windowed by a Hanning window, and the sub-blocks are allowed to overlap each other, resulting in more sub-blocks.
  • the variance provided by the Welch method is further reduced as compared to the Bartlett method.
  • the Bartlett and Welch methods are but two spectral estimation techniques, and other known spectral estimation techniques can be used as well.
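A Welch estimate differs from the Bartlett sketch only in windowing and overlap. The sketch below assumes a symmetric Hann window, a default overlap of 50 % and normalisation by the mean window power; the text states only that the sub-blocks are Hanning-windowed and allowed to overlap, so these choices and the function name are assumptions.

```python
import cmath
import math


def welch_periodogram(block, m, overlap=None):
    """Average Hann-windowed periodograms of overlapping sub-blocks of length m.

    Returns (psd, number_of_sub_blocks). The 50 % default overlap and the
    window-power normalisation are assumptions.
    """
    if m < 2 or len(block) < m:
        raise ValueError("need 2 <= m <= len(block)")
    if overlap is None:
        overlap = m // 2
    if not 0 <= overlap < m:
        raise ValueError("overlap must lie in [0, m)")
    step = m - overlap
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / (m - 1)) for t in range(m)]
    power = sum(w * w for w in window) / m            # mean window power
    acc = [0.0] * m
    count = 0
    for start in range(0, len(block) - m + 1, step):
        seg = [block[start + t] * window[t] for t in range(m)]
        for f in range(m):
            val = sum(seg[t] * cmath.exp(-2j * cmath.pi * f * t / m) for t in range(m))
            acc[f] += abs(val) ** 2 / (m * power)
        count += 1
    return [a / count for a in acc], count


signal = [math.sin(0.7 * t) for t in range(32)]
psd, sub_blocks = welch_periodogram(signal, 8)
print(sub_blocks)  # 7 overlapping sub-blocks, versus 4 for Bartlett with M = 8
```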
  • the variance can be reduced further by exponential averaging over successive blocks, $\bar{P}_{x,M}(l) = \alpha\,\bar{P}_{x,M}(l-1) + (1-\alpha)\,P_{x,M}(l)$, where:
  • the function $P_{x,M}(l)$ is computed using the Bartlett or Welch method
  • the function $\bar{P}_{x,M}(l)$ is the exponential average for the current block
  • the function $\bar{P}_{x,M}(l-1)$ is the exponential average for the previous block.
  • the parameter α controls how long the exponential memory is, and the memory should typically not exceed the time over which the noise can be considered stationary. An α closer to 1 results in a longer exponential memory and a substantial reduction of the periodogram variance.
  • the length M is referred to as the sub-block length, and the resulting low order gain function has an impulse response of length M.
  • the noise periodogram estimate $\bar{P}_{x_L,M}(f_u)$ and the noisy speech periodogram estimate $P_{x_L,M}(f_u)$ employed in the composition of the gain function are also of length M:
  • this is achieved by using a shorter periodogram estimate from the input frame X L and averaging using, for example, the Bartlett method.
  • the Bartlett method decreases the variance of the estimated periodogram, and there is also a reduction in frequency resolution.
  • the reduction of the resolution from L frequency bins to M bins means that the periodogram estimate $P_{x_L,M}(f_u)$ is also of length M.
  • the variance of the noise periodogram estimate $\bar{P}_{x_L,M}(f_u)$ can be decreased further using exponential averaging as described above.
  • the sum of the frame length L and the sub-block length M is made less than N. As a result, it is possible to form the desired output block as:
  • the low order filter according to the invention also provides an opportunity to address the problems created by the non-causal nature of the gain filter in the conventional spectral subtraction algorithm (i.e., inter-block discontinuity and diminished speech quality).
  • a phase can be added to the gain function to provide a causal filter.
  • the phase can be constructed from a magnitude function and can be either linear phase or minimum phase as desired.
  • the gain function is also interpolated to a length N, which is done, for example, using a smooth interpolation.
  • the phase that is added to the gain function is changed accordingly, resulting in:
  • construction of the linear phase filter can also be performed in the time-domain.
  • the gain function $G_M(f_u)$ is transformed to the time domain using an IFFT, and the circular shift is performed there.
  • the shifted impulse response is zero-padded to a length N and then transformed back using an N-long FFT. This leads to an interpolated, causal, linear phase filter $G_{M,N}(f_u)$, as desired.
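The time-domain construction of the linear phase filter can be sketched as follows, assuming an even length M and a real, symmetric zero-phase gain ($G[f] = G[M-f]$). The circular shift of M/2 samples is an assumed choice of delay, and a naive DFT replaces the FFT; the function name is hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) DFT (inverse=True gives the IDFT including the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out


def linear_phase_interpolate(gain_m, n):
    """Zero-phase gain G_M (length M) -> causal linear-phase filter sampled on N bins.

    Steps: IFFT of length M, circular shift by M // 2, zero padding to N, FFT of length N.
    """
    m = len(gain_m)
    g = [v.real for v in dft(gain_m, inverse=True)]  # non-causal response centred on n = 0
    shift = m // 2
    g_causal = g[-shift:] + g[:-shift]                # circular shift: centre moves to n = M // 2
    return dft(g_causal + [0.0] * (n - m))


G = [1.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.5, 1.0]  # symmetric zero-phase gain, M = 8
H = linear_phase_interpolate(G, 16)           # interpolated to N = 16 bins
```

Every second bin of H reproduces the original gain magnitude, while the intermediate bins are interpolated; the M/2-sample shift shows up as the linear phase term $e^{-j2\pi f_u (M/2)/M}$.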
  • a causal minimum phase filter according to the invention can be constructed from the gain function by employing a Hilbert transform relation.
  • the Hilbert transform relation implies a unique relationship between real and imaginary parts of a complex function.
  • this can also be utilized for a relationship between magnitude and phase, when the logarithm of the complex signal is used, as:
  • phase is zero, resulting in a real function.
  • $\ln(|G_M(f_u)|)$ is transformed to the time domain employing an IFFT of length M, forming $g_M(n)$.
  • the time-domain function is rearranged as:
  • the function $g_M(n)$ is transformed back to the frequency domain using an M-long FFT, yielding ln(
  • the causal minimum phase filter $G_M(f_u)$ is then interpolated to a length N. The interpolation is made the same way as in the linear phase case described above.
  • the resulting interpolated filter $G_{M,N}(f_u)$ is causal and has approximately minimum phase.
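A common way to realise the magnitude/phase Hilbert relation numerically is the folded real cepstrum, sketched below. The fold used here (keep $c(0)$ and $c(M/2)$, double $c(n)$ for $0 < n < M/2$, zero the rest) is the standard homomorphic choice and is an assumption, since the rearrangement formula itself is not reproduced in this text. The sketch also assumes an even M, a symmetric gain, and a small floor that keeps the logarithm finite; function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT (inverse=True gives the IDFT including the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n) for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out


def minimum_phase_gain(gain_m, floor=1e-6):
    """Give a zero-phase gain G_M(f_u) an approximately minimum phase (folded cepstrum)."""
    m = len(gain_m)
    log_mag = [math.log(max(g, floor)) for g in gain_m]   # ln|G_M|, phase is zero
    c = [v.real for v in dft(log_mag, inverse=True)]      # real cepstrum g_M(n), length M
    folded = [c[0]] + [2.0 * c[i] for i in range(1, m // 2)] + [c[m // 2]] + [0.0] * (m // 2 - 1)
    return [cmath.exp(v) for v in dft(folded)]            # exp of the complex log spectrum


def interpolate(spectrum_m, n):
    """IFFT of length M, zero padding to N, FFT of length N (as in the linear phase case)."""
    h = dft(spectrum_m, inverse=True)
    return dft(h + [0.0] * (n - len(h)))


G = [1.0, 0.9, 0.5, 0.2, 0.1, 0.2, 0.5, 0.9]  # symmetric zero-phase gain, M = 8
G_min = minimum_phase_gain(G)
H = interpolate(G_min, 16)
```

The fold leaves the magnitude untouched and only supplies a phase, so $|G_{min}| = G$ exactly; the cepstrum is aliased at length M, which is why the result is only approximately minimum phase.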
  • the above described spectral subtraction scheme according to the invention is depicted in Figure 3.
  • a spectral subtraction noise reduction processor 300 providing linear convolution and causal filtering is shown to include a Bartlett processor 305, a fast Fourier transform processor 310, a magnitude squared processor 320, a switch 325, a voice activity detector 330, a block-wise averaging processor 340, a low order gain computation processor 350, a gain phase processor 355, an interpolation processor 356, a multiplier 360, an inverse fast Fourier transform processor 370 and an overlap and add processor 380.
  • the noisy speech input signal is coupled to an input of the Bartlett processor 305 and to an input of the fast Fourier transform processor 310.
  • An output of the Bartlett processor 305 is coupled to an input of the magnitude squared processor 320, and an output of the fast Fourier transform processor 310 is coupled to a first input of the multiplier 360.
  • An output of the magnitude squared processor 320 is coupled to a first contact of the switch 325 and to a first input of the low order gain computation processor 350.
  • a control output of the voice activity detector 330 is coupled to a throw input of the switch 325, and a second contact of the switch 325 is coupled to an input of the block-wise averaging device 340.
  • An output of the block-wise averaging device 340 is coupled to a second input of the low order gain computation processor 350, and an output of the low order gain computation processor 350 is coupled to an input of the gain phase processor 355.
  • An output of the gain phase processor 355 is coupled to an input of the interpolation processor 356, and an output of the interpolation processor 356 is coupled to a second input of the multiplier 360.
  • An output of the multiplier 360 is coupled to an input of the inverse fast Fourier transform processor 370, and an output of the inverse fast Fourier transform processor 370 is coupled to an input of the overlap and add processor 380.
  • An output of the overlap and add processor 380 provides a reduced noise, clean speech output for the exemplary noise reduction processor 300.
  • the spectral subtraction noise reduction processor 300 processes the incoming noisy speech signal, using the linear convolution, causal filtering algorithm described above, to provide the clean, reduced-noise speech signal.
  • the various components of Figure 3 can be implemented using any known digital signal processing technology, including a general purpose computer, a collection of integrated circuits and/or application specific integrated circuitry (ASIC).
  • the variance of the gain function $G_M(f_u)$ of the invention can be decreased still further by way of a controlled exponential gain function averaging scheme according to the invention.
  • the averaging is made dependent upon the discrepancy between the current block spectrum $P_{x,M}(l)$ and the averaged noise spectrum $\bar{P}_{x,M}(l)$. For example, when there is a small discrepancy, long averaging of the gain function $G_M(f_u)$ can be provided, corresponding to a stationary background noise situation. Conversely, when there is a large discrepancy, short averaging or no averaging of the gain function $G_M(f_u)$ can be provided, corresponding to situations with speech or highly varying background noise.
  • the averaging of the gain function is not increased in direct proportion to decreases in the discrepancy, as doing so introduces an audible shadow voice (since the gain function suited for a speech spectrum would remain for a long period). Instead, the averaging is allowed to increase slowly to provide time for the gain function to adapt to the stationary input.
  • the discrepancy measure between spectra is defined as
  • a further parameter, formed as an exponential average of the discrepancy between spectra, is described by
  • the smoothing parameter in equation (27) is used to ensure that the gain function adapts to the new level when a transition from a period with high discrepancy between the spectra to a period with low discrepancy occurs. As noted above, this is done to prevent shadow voices. According to the exemplary embodiments, the adaptation is finished before the increased exponential averaging of the gain function starts due to the decreased level of the averaged discrepancy.
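The controlled averaging behaviour can be sketched as follows. Because the discrepancy measure and equation (27) are not reproduced in this text, the forms below are assumptions chosen only to match the described behaviour: discrepancy $\beta = \sum|P_{x,M} - \bar{P}_{x,M}| / \sum\bar{P}_{x,M}$, an averaged discrepancy that follows rises immediately but decays slowly with a hypothetical smoothing factor `gamma`, and a gain-averaging factor $1 - \min(1, \bar\beta)$. All names are hypothetical.

```python
def discrepancy(current_psd, avg_noise_psd):
    """Assumed form: beta = sum |P(l) - Pbar(l)| / sum Pbar(l)."""
    den = sum(avg_noise_psd)
    if den <= 0.0:
        return float("inf")
    return sum(abs(p - q) for p, q in zip(current_psd, avg_noise_psd)) / den


class ControlledGainAverager:
    """Hypothetical sketch of averaging controller 445 plus exponential averager 446."""

    def __init__(self, gamma=0.8):
        self.gamma = gamma       # smoothing of the discrepancy while it is falling
        self.beta_bar = 1.0      # start with no gain averaging (assumption)
        self.gain_bar = None

    def update(self, gain, current_psd, avg_noise_psd):
        beta = discrepancy(current_psd, avg_noise_psd)
        if beta < self.beta_bar:
            # Falling discrepancy: decay slowly so the gain adapts before long averaging starts.
            self.beta_bar = self.gamma * self.beta_bar + (1.0 - self.gamma) * beta
        else:
            # Rising discrepancy (speech onset, noise change): react at once.
            self.beta_bar = beta
        alpha = 1.0 - min(1.0, self.beta_bar)  # small discrepancy -> long gain memory
        if self.gain_bar is None:
            self.gain_bar = list(gain)
        else:
            self.gain_bar = [alpha * gb + (1.0 - alpha) * g
                             for gb, g in zip(self.gain_bar, gain)]
        return self.gain_bar, alpha


avg = ControlledGainAverager(gamma=0.8)
noise = [1.0, 1.0, 1.0, 1.0]
# Stationary noise: the averaging factor grows gradually from block to block.
alphas = [avg.update([0.5] * 4, noise, noise)[1] for _ in range(5)]
# A large discrepancy (e.g. loud speech) switches gain averaging off immediately.
gain_bar, alpha = avg.update([0.9] * 4, [10.0] * 4, noise)
```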
  • the above equations can be interpreted for different input signal conditions as follows.
  • the variance is reduced.
  • since the noise spectrum has a steady mean value for each frequency, it can be averaged to decrease the variance.
  • Noise level changes result in a discrepancy between the averaged noise spectrum $\bar{P}_{x,M}(l)$ and the spectrum for the current block $P_{x,M}(l)$.
  • the controlled exponential averaging method decreases the gain function averaging until the noise level has stabilized at a new level. This behavior enables handling of the noise level changes and gives a decrease in variance during stationary noise periods and prompt response to noise changes.
  • High energy speech often has time-varying spectral peaks.
  • the exponential averaging is kept at a minimum during high energy speech periods. Since the discrepancy between the average noise spectrum $\bar{P}_{x,M}(l)$ and the current high energy speech spectrum $P_{x,M}(l)$ is large, no exponential averaging of the gain function is performed. During lower energy speech periods, the exponential averaging is used with a short memory depending on the discrepancy between the current low-energy speech spectrum and the averaged noise spectrum. The variance reduction is consequently lower for low-energy speech than during background noise periods, and larger than during high energy speech periods.
  • a spectral subtraction noise reduction processor 400 providing linear convolution, causal filtering and controlled exponential averaging is shown to include the Bartlett processor 305, the fast Fourier transform processor 310, the magnitude squared processor 320, the switch 325, the voice activity detector 330, the block-wise averaging device 340, the low order gain computation processor 350, the gain phase processor 355, the interpolation processor 356, the multiplier 360, the inverse fast Fourier transform processor 370 and the overlap and add processor 380 of the system 300 of Figure 3, as well as an averaging control processor 445, an exponential averaging processor 446 and an optional fixed FIR post filter 465.
  • the noisy speech input signal is coupled to an input of the Bartlett processor 305 and to an input of the fast Fourier transform processor 310.
  • An output of the Bartlett processor 305 is coupled to an input of the magnitude squared processor 320, and an output of the fast Fourier transform processor 310 is coupled to a first input of the multiplier 360.
  • An output of the magnitude squared processor 320 is coupled to a first contact of the switch 325, to a first input of the low order gain computation processor 350 and to a first input of the averaging control processor 445.
  • a control output of the voice activity detector 330 is coupled to a throw input of the switch 325, and a second contact of the switch 325 is coupled to an input of the block-wise averaging device 340.
  • An output of the block-wise averaging device 340 is coupled to a second input of the low order gain computation processor 350 and to a second input of the averaging controller 445.
  • An output of the low order gain computation processor 350 is coupled to a signal input of the exponential averaging processor 446, and an output of the averaging controller 445 is coupled to a control input of the exponential averaging processor 446.
  • An output of the exponential averaging processor 446 is coupled to an input of the gain phase processor 355, and an output of the gain phase processor 355 is coupled to an input of the interpolation processor 356.
  • An output of the interpolation processor 356 is coupled to a second input of the multiplier 360, and an output of the optional fixed FIR post filter 465 is coupled to a third input of the multiplier 360.
  • An output of the multiplier 360 is coupled to an input of the inverse fast Fourier transform processor 370, and an output of the inverse fast Fourier transform processor 370 is coupled to an input of the overlap and add processor 380.
  • An output of the overlap and add processor 380 provides a clean speech signal for the exemplary system 400.
  • the spectral subtraction noise reduction processor 400 processes the incoming noisy speech signal, using the linear convolution, causal filtering and controlled exponential averaging algorithm described above, to provide the improved, reduced-noise speech signal.
  • the various components of Figure 4 can be implemented using any known digital signal processing technology, including a general purpose computer, a collection of integrated circuits and/or application specific integrated circuitry (ASIC).
  • the extra fixed FIR filter 465 of length $J \le N - 1 - L - M$ can be added as shown in Figure 4.
  • the post filter 465 is applied by multiplying the signal spectrum by the frequency response of the filter, interpolated to length N, as shown.
  • the interpolation to a length N is performed by zero padding of the filter and employing an N-long FFT.
  • This post filter 465 can be used to filter out the telephone bandwidth or a constant tonal component. Alternatively, the functionality of the post filter 465 can be included directly within the gain function.
  • parameter selection is described hereinafter in the context of a GSM mobile telephone.
  • the frame length L is set to 160 samples, which provides 20 ms frames. Other choices of L can be used in other systems. However, it should be noted that increasing the frame length L increases the delay.
  • the sub-block length M (i.e., the periodogram length for the Bartlett processor) is made small to provide increased variance reduction. Since an FFT is used to compute the periodograms, the length M can conveniently be set to a power of two. The frequency resolution is then determined as $\Delta f = F_s / M$, where $F_s$ is the sample rate.
  • the GSM system sample rate is 8000 Hz.
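The parameter arithmetic for the GSM case can be checked directly. The frame length L = 160 and the 8000 Hz sample rate come from the text; the sub-block length M = 32 and the rule of picking the smallest power-of-two N with L + M < N are hypothetical illustrations, not values stated here.

```python
FS_HZ = 8000   # GSM sample rate (from the text)
L = 160        # frame length in samples (from the text)


def frame_duration_ms(frame_len, fs_hz):
    return 1000.0 * frame_len / fs_hz


def frequency_resolution_hz(sub_block_len, fs_hz):
    """Bin spacing of an M-point periodogram: F_s / M."""
    return fs_hz / sub_block_len


def block_length(frame_len, sub_block_len):
    """Smallest power-of-two N with L + M < N (hypothetical selection rule)."""
    n = 1
    while n <= frame_len + sub_block_len:
        n *= 2
    return n


M = 32                    # hypothetical sub-block length, a power of two
N = block_length(L, M)
print(frame_duration_ms(L, FS_HZ))        # 20.0 ms
print(frequency_resolution_hz(M, FS_HZ))  # 250.0 Hz per bin
print(N, N - 1 - L - M)                   # 256 63: block length and bound on post-filter length J
```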
  • the present invention utilizes a two microphone system.
  • the two microphone system is illustrated in Figure 5, where 582 is a mobile telephone, 584 is a near-mouth microphone, and 586 is a far-mouth microphone.
  • the far-mouth microphone 586, in addition to picking up the background noise, also picks up the speaker's voice, albeit at a lower level than the near-mouth microphone 584.
  • a spectral subtraction stage is used to suppress the speech in the far-mouth microphone 586 signal.
  • a rough speech estimate is formed with another spectral subtraction stage from the near-mouth signal.
  • a third spectral subtraction stage is used to enhance the near-mouth signal by filtering out the enhanced background noise.
  • a potential problem with the above technique is the need to make low variance estimates of the filter, i.e., the gain function, since the speech and noise estimates can only be formed from a short block of data samples.
  • to address this, the single microphone spectral subtraction algorithm discussed above is used. This method reduces the variability of the gain function by using Bartlett's spectrum estimation method.
  • the frequency resolution is also reduced by this method, but this property is exploited to obtain a causal, true linear convolution.
  • the variability of the gain function is further reduced by adaptive averaging, controlled by a discrepancy measure between the noise and noisy speech spectrum estimates.
  • the inputs to the dual microphone system 600 are the continuous signal $x_s(n)$ from the near-mouth microphone 584, where the speech is dominant, and the continuous signal $x_n(n)$ from the far-mouth microphone 586, where the noise is more dominant.
  • the signal from the near-mouth microphone 584 is provided to an input of a buffer 689 where it is broken down into blocks $x_s(i)$.
  • buffer 689 is also a speech encoder.
  • the signal from the far-mouth microphone 586 is provided to an input of a buffer 687 where it is broken down into blocks $x_n(i)$.
  • Both buffers 687 and 689 can also include additional signal processing such as an echo canceller in order to further enhance the performance of the present invention.
  • An analog to digital (A/D) converter (not shown) converts an analog signal, derived from the microphones 584, 586, to a digital signal so that it may be processed by the spectral subtraction stages of the present invention.
  • the A/D converter may be present either prior to or following the buffers 687, 689.
  • the first spectral subtraction stage 601 has as its inputs a block of the near-mouth signal, $x_s(i)$, and an estimate of the noise from the previous frame, $Y_n(f, i-1)$.
  • the estimate of noise from the previous frame is produced by coupling the output of the second spectral subtraction stage 602 to the input of a delay circuit 688.
  • the output of the delay circuit 688 is coupled to the first spectral subtraction stage 601.
  • This first spectral subtraction stage is used to make a rough estimate of the speech, $Y_r(f,i)$.
  • the output of the first spectral subtraction stage 601 is supplied to the second spectral subtraction stage 602, which uses this estimate ($Y_r(f,i)$) and a block of the far-mouth signal, $x_n(i)$, to estimate the noise spectrum for the current frame, $Y_n(f,i)$.
  • the output of the second spectral subtraction stage 602 is supplied to the third spectral subtraction stage 603, which uses the current noise spectrum estimate, $Y_n(f,i)$, and a block of the near-mouth signal, $x_s(i)$, to estimate the noise-reduced speech, $Y_s(f,i)$.
  • the output of the third spectral subtraction stage 603 is coupled to an input of the inverse fast Fourier transform processor 670, and an output of the inverse fast Fourier transform processor 670 is coupled to an input of the overlap and add processor 680.
  • the output of the overlap and add processor 680 provides a clean speech signal as an output from the exemplary system 600.
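The three-stage cascade with the feedback through delay circuit 688 can be sketched on per-frame power spectra. This is a simplification under stated assumptions: each stage is plain power subtraction ($a = 2$) clamped at zero, the per-stage subtraction parameters `k1`-`k3` are hypothetical values (the text says only that they depend on input SNR and method), and the Bartlett estimation, phase handling, IFFT 670 and overlap-add 680 are omitted.

```python
def ss_stage(signal_psd, disturbance_psd, k=1.0):
    """One power spectral subtraction stage: returns G^2 * P_signal per bin,
    with G^2 = max(0, 1 - k * P_disturbance / P_signal)."""
    out = []
    for px, pd in zip(signal_psd, disturbance_psd):
        gain_sq = max(0.0, 1.0 - k * pd / px) if px > 0.0 else 0.0
        out.append(gain_sq * px)
    return out


def dual_mic_frame(near_psd, far_psd, prev_noise_psd, k1=1.0, k2=1.0, k3=1.0):
    """Stages 601-603 for one frame; returns (speech estimate Y_s, noise estimate Y_n)."""
    rough_speech = ss_stage(near_psd, prev_noise_psd, k1)  # stage 601: Y_r(f, i)
    noise = ss_stage(far_psd, rough_speech, k2)            # stage 602: Y_n(f, i)
    speech = ss_stage(near_psd, noise, k3)                 # stage 603: Y_s(f, i)
    return speech, noise


# Hypothetical two-bin power spectra for the near- and far-mouth microphones.
frames = [([5.0, 1.0], [2.0, 1.0]), ([5.0, 1.0], [2.0, 1.0])]
noise_prev = [0.0, 0.0]  # delay circuit 688 starts empty
for near, far in frames:
    # k2 < 1 reflects the lower speech level picked up by the far-mouth microphone.
    speech, noise_prev = dual_mic_frame(near, far, noise_prev, k2=0.25)
```

After two frames the noise estimate has moved from 0.75 to 0.9375 in each bin as the fed-back estimate sharpens the rough speech estimate of stage 601.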
  • Each spectral subtraction stage 601-603 has a parameter that controls the amount of subtraction. This parameter is preferably set differently depending on the input SNR of the microphones and the noise reduction method employed.
  • A controller can be used to set the parameters of each spectral subtraction stage 601-603 dynamically, for further accuracy in a variable noisy environment.
  • Since the far-mouth microphone signal is used to estimate the noise spectrum that is subtracted from the near-mouth noisy speech spectrum, the performance of the present invention increases when the background noise spectrum has the same characteristics in both microphones.
  • For example, the background noise characteristics picked up by a directional microphone differ from those picked up by an omni-directional far-mouth microphone.
  • In such cases, one or both of the microphone signals should be filtered to reduce the differences between the spectra.
  • The present invention uses the same block of samples as the voice encoder, so no extra delay is introduced by buffering the signal block.
  • The introduced delay is therefore only the computation time of the noise reduction plus the group delay of the gain function filtering in the last spectral subtraction stage.
  • A minimum phase can be imposed on the amplitude gain function, which gives a short delay under the constraint of causal filtering.
  • The dual microphone system does not need the VAD 330, switch 325 and averaging block 340 used in the single microphone spectral subtraction of Figures 3 and 4. That is, the far-mouth microphone provides a noise signal continuously, during both voice and non-voice periods.
  • The IFFT 370 and the overlap and add circuit 380 have been moved to the final output stage, shown as 670 and 680 in Figure 6.
  • The spectral subtraction stages used in the dual microphone implementation described above may each be implemented as depicted in Figure 7.
  • A spectral subtraction stage 700 providing linear convolution, causal filtering and controlled exponential averaging is shown to include the Bartlett processor 705, the frequency decimator 722, the low order gain computation processor 750, the gain phase and interpolation processor 755/756, and the multiplier 760.
  • The noisy speech input signal, $x_{(\cdot)}(i)$, is coupled to an input of the Bartlett processor 705 and to an input of the fast Fourier transform processor 710.
  • The notation $x_{(\cdot)}(i)$ represents $x_s(i)$ or $x_n(i)$, which are provided to the inputs of the spectral subtraction stages 601-603 as illustrated in Figure 6.
  • The amplitude spectrum of the unwanted signal, with length N, is coupled to an input of the frequency decimator 722.
  • The notation $Y_{(\cdot)}(f,i)$ represents $Y_n(f,i-1)$, $Y_r(f,i)$ or $Y_n(f,i)$.
  • An output of the frequency decimator 722 is the amplitude spectrum $|Y_{(\cdot),M}(f,i)|$, of length M.
  • The frequency decimator 722 reduces the variance of the output amplitude spectrum compared to the input amplitude spectrum.
  • An amplitude spectrum output of the Bartlett processor 705 and an amplitude spectrum output of the frequency decimator 722 are coupled to inputs of the low order gain computation processor 750.
  • The output of the fast Fourier transform processor 710 is coupled to a first input of the multiplier 760.
  • The output of the low order gain computation processor 750 is coupled to a signal input of an optional exponential averaging processor 746.
  • An output of the exponential averaging processor 746 is coupled to an input of the gain phase and interpolation processor 755/756.
  • An output of processor 755/756 is coupled to a second input of the multiplier 760.
  • The filtered spectrum $Y_{*}(f,i)$ is thus the output of the multiplier 760, where the notation $Y_{*}(f,i)$ represents $Y_r(f,i)$, $Y_n(f,i)$ or $Y_s(f,i)$.
  • The gain function used in Figure 7 is $G_M(f,i) = \left(1 - k\,\frac{|Y_{(\cdot),M}(f,i)|^{a}}{|X_{(\cdot),M}(f,i)|^{a}}\right)^{1/a}$, where:
  • $|X_{(\cdot),M}(f,i)|$ is the output of the Bartlett processor 705;
  • $|Y_{(\cdot),M}(f,i)|$ is the output of the frequency decimator 722;
  • $a$ is a spectrum exponent; and
  • $k$ is the subtraction factor controlling the amount of suppression employed for a particular spectral subtraction stage.
  • The gain function can optionally be adaptively averaged. This gain function corresponds to a non-causal, time-varying filter.
  • One way to obtain a causal filter is to impose a minimum phase.
  • An alternative way of obtaining a causal filter is to impose a linear phase.
  • The spectral subtraction stage 700 processes the incoming noisy speech signal, using the linear convolution, causal filtering and controlled exponential averaging algorithm described above, to provide the improved, reduced-noise speech signal.
  • The various components of Figures 7-8 can be implemented using any known digital signal processing technology, including a general purpose computer, a collection of integrated circuits and/or application specific integrated circuitry (ASIC).
  • The present invention provides improved methods and apparatus for dual microphone spectral subtraction using linear convolution, causal filtering and/or controlled exponential averaging of the gain function.
  • The present invention can enhance the quality of any audio signal, such as music, and is not limited to voice or speech signals.
  • The exemplary methods handle non-stationary background noise, since the present invention does not rely on measuring the noise only during noise-only periods. In addition, during short-duration stationary background noise, speech quality is also improved, since the background noise can be estimated during both noise-only and speech periods. Furthermore, the present invention can be used with or without directional microphones, and each microphone can be of a different type. Finally, the amount of noise reduction can be adjusted to achieve a particular desired speech quality.
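To make the data flow of Figure 6 concrete, the three cascaded stages can be sketched in NumPy. This is a minimal, hypothetical illustration under stated assumptions, not the patented implementation: the names `subtraction_stage` and `dual_mic_frame` are invented here, each stage applies a full-resolution zero-phase gain of the form given for Figure 7 (no Bartlett estimate, frequency decimation, gain averaging or causal phase), negative gains are clamped to zero, and the IFFT 670 and overlap and add 680 stages are left out.

```python
import numpy as np

def subtraction_stage(x_spec, unwanted_mag, a=1.0, k=1.0):
    """Suppress the amplitude spectrum `unwanted_mag` in the complex spectrum `x_spec`.

    Gain per bin: (1 - k * |Y|^a / |X|^a)^(1/a), clamped at zero (assumption).
    """
    x_pow = np.maximum(np.abs(x_spec) ** a, 1e-12)   # avoid division by zero
    gain = np.maximum(1.0 - k * unwanted_mag ** a / x_pow, 0.0) ** (1.0 / a)
    return gain * x_spec                              # zero-phase gain keeps the noisy phase

def dual_mic_frame(x_s, x_n, prev_noise_mag, k=(1.0, 1.0, 1.0), a=1.0):
    """One block of the Figure 6 structure (simplified sketch).

    x_s, x_n:       near-mouth and far-mouth sample blocks of equal length
    prev_noise_mag: |Y_n(f, i-1)| from the previous call (role of delay circuit 688)
    Returns the enhanced spectrum Y_s(f, i) and |Y_n(f, i)| for the next block.
    """
    X_s = np.fft.rfft(x_s)
    X_n = np.fft.rfft(x_n)
    y_r = subtraction_stage(X_s, prev_noise_mag, a, k[0])   # stage 601: rough speech
    y_n = subtraction_stage(X_n, np.abs(y_r), a, k[1])      # stage 602: noise estimate
    y_s = subtraction_stage(X_s, np.abs(y_n), a, k[2])      # stage 603: enhanced speech
    return y_s, np.abs(y_n)
```

The returned noise magnitude is passed back in as `prev_noise_mag` for the next block. Separate `k` values per stage mirror the per-stage subtraction parameter described above; suitable values are not given in the text.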

Abstract

Speech enhancement is provided in dual microphone noise reduction systems by including spectral subtraction algorithms using linear convolution, causal filtering and/or spectrum dependent exponential averaging of the spectral subtraction gain function. According to exemplary embodiments, when a far-mouth microphone is used in conjunction with a near-mouth microphone, it is possible to handle non-stationary background noise as long as the noise spectrum can continuously be estimated from a single block of input samples. The far-mouth microphone, in addition to picking up the background noise, also picks up the speaker's voice, albeit at a lower level than the near-mouth microphone. To enhance the noise estimate, a spectral subtraction stage is used to suppress the speech in the far-mouth microphone signal. To be able to enhance the noise estimate, a rough speech estimate is formed with another spectral subtraction stage from the near-mouth signal. Finally, a third spectral subtraction function is used to enhance the near-mouth signal by suppressing the background noise using the enhanced background noise estimate.

Description

SYSTEM AND METHOD FOR DUAL MICROPHONE SIGNAL NOISE REDUCTION USING SPECTRAL SUBTRACTION
Field of the Invention
The present invention relates to communications systems, and more particularly, to methods and apparatus for mitigating the effects of disruptive background noise components in communications signals.
Background of the Invention
Today, technology and consumer demand have produced mobile telephones of diminishing size. As the mobile telephones are produced smaller and smaller, the placement of the microphone during use ends up more and more distant from the speaker's (near-end user's) mouth. This increased distance increases the need for speech enhancement due to disruptive background noise being picked up at the microphone and transmitted to a far-end user. In other words, since the distance between a microphone and a near-end user is larger in the newer smaller mobile telephones, the microphone picks up not only the near-end user's speech, but also any noise which happens to be present at the near-end location. For example, the near-end microphone typically picks up sounds such as surrounding traffic, road and passenger compartment noise, room noise, and the like. The resulting noisy near-end speech can be annoying or even intolerable for the far-end user. It is thus desirable that the background noise be reduced as much as possible, preferably early in the near-end signal processing chain (e.g., before the received near-end microphone signal is supplied to a near-end speech coder).
As a result of interfering background noise, some telephone systems include a noise reduction processor designed to eliminate background noise at the input of a near-end signal processing chain. Figure 1 is a high-level block diagram of such a system 100. In Figure 1, a noise reduction processor 110 is positioned at the output of a microphone 120 and at the input of a near-end signal processing path (not shown). In operation, the noise reduction processor 110 receives a noisy speech signal $x$ from the microphone 120 and processes the noisy speech signal $x$ to provide a cleaner, noise-reduced speech signal $x_{NR}$, which is passed through the near-end signal processing chain and ultimately to the far-end user.
One well known method for implementing the noise reduction processor 110 of Figure 1 is referred to in the art as spectral subtraction. See, for example, S.F. Boll, "Suppression of Acoustic Noise in Speech using Spectral Subtraction," IEEE Trans. Acoust. Speech and Sig. Proc., 27:113-120, 1979, which is incorporated herein by reference in its entirety. Generally, spectral subtraction uses estimates of the noise spectrum and the noisy speech spectrum to form a signal-to-noise ratio (SNR) based gain function, which is multiplied by the input spectrum to suppress frequencies having a low SNR. Though spectral subtraction does provide significant noise reduction, it suffers from several well known disadvantages. For example, the spectral subtraction output signal typically contains artifacts known in the art as musical tones. Further, discontinuities between processed signal blocks often lead to diminished speech quality from the far-end user perspective. Many enhancements to the basic spectral subtraction method have been developed in recent years. See, for example, N. Virag, "Speech Enhancement Based on Masking Properties of the Auditory System," IEEE ICASSP Proc., 796-799 vol. 1, 1995; D. Tsoukalas, M. Paraskevas and J. Mourjopoulos, "Speech Enhancement using Psychoacoustic Criteria," IEEE ICASSP Proc., 359-362 vol. 2, 1993; F. Xie and D. Van Compernolle, "Speech Enhancement by Spectral Magnitude Estimation - A Unifying Approach," IEEE Speech Communication, 89-104 vol. 19, 1996; R. Martin, "Spectral Subtraction Based on Minimum Statistics," EUSIPCO Proc., 1182-1185 vol. 2, 1994; and S.M. McOlash, R.J. Niederjohn and J.A. Heinen, "A Spectral Subtraction Method for Enhancement of Speech Corrupted by Nonwhite, Nonstationary Noise," IEEE IECON Proc., 872-877 vol. 2, 1995.
More recently, spectral subtraction has been implemented using correct convolution and spectrum dependent exponential gain function averaging. These techniques are described in co-pending U.S. Patent Application Serial No. 09/084,387, filed May 27, 1998 and entitled "Signal Noise Reduction by Spectral Subtraction using Linear Convolution and Causal Filtering," and co-pending U.S. Patent Application Serial No. 09/084,503, also filed May 27, 1998 and entitled "Signal Noise Reduction by Spectral Subtraction using Spectrum Dependent Exponential Gain Function Averaging."
Spectral subtraction uses two spectrum estimates, one of the "disturbed" signal and one of the "disturbing" signal, to form a signal-to-noise ratio (SNR) based gain function. The disturbed spectrum is multiplied by the gain function to increase its SNR. In single microphone spectral subtraction applications, such as those used in conjunction with hands-free telephones, speech is enhanced relative to the disturbing background noise. The noise is estimated during speech pauses or with the help of a noise model during speech. This implies that the noise must be stationary, so that it has similar properties during speech, or that the model must be suitable for the varying background noise. Unfortunately, this is not the case for most background noises in everyday surroundings.
Therefore, there is a need for a noise reduction system which uses the techniques of spectral subtraction and which is suitable for use with most of the variable background noises encountered in everyday surroundings.
Summary of the Invention
The present invention fulfills the above-described and other needs by providing methods and apparatus for performing noise reduction by spectral subtraction in a dual microphone system. According to exemplary embodiments, when a far-mouth microphone is used in conjunction with a near-mouth microphone, it is possible to handle non-stationary background noise as long as the noise spectrum can continuously be estimated from a single block of input samples. The far-mouth microphone, in addition to picking up the background noise, also picks up the speaker's voice, albeit at a lower level than the near-mouth microphone. To enhance the noise estimate, a spectral subtraction stage is used to suppress the speech in the far-mouth microphone signal. To be able to enhance the noise estimate, a rough speech estimate is formed with another spectral subtraction stage from the near-mouth signal. Finally, a third spectral subtraction stage is used to enhance the near-mouth signal by suppressing the background noise using the enhanced background noise estimate. The above-described and other features and advantages of the present invention are explained in detail hereinafter with reference to the illustrative examples shown in the accompanying drawings. Those skilled in the art will appreciate that the described embodiments are provided for purposes of illustration and understanding and that numerous equivalent embodiments are contemplated herein.
Brief Description of the Drawings
Figure 1 is a block diagram of a noise reduction system in which spectral subtraction can be implemented;
Figure 2 depicts a conventional spectral subtraction noise reduction processor;
Figures 3-4 depict exemplary spectral subtraction noise reduction processors according to exemplary embodiments of the invention;
Figure 5 depicts the placement of near- and far-mouth microphones in an exemplary embodiment of the present invention;
Figure 6 depicts an exemplary dual microphone spectral subtraction system; and
Figure 7 depicts an exemplary spectral subtraction stage for use in an exemplary embodiment of the present invention.
Detailed Description of the Invention
To understand the various features and advantages of the present invention, it is useful to first consider a conventional spectral subtraction technique. Generally, spectral subtraction is built upon the assumption that the noise signal and the speech signal in a communications application are random, uncorrelated and added together to form the noisy speech signal. For example, if $s(n)$, $w(n)$ and $x(n)$ are stochastic short-time stationary processes representing speech, noise and noisy speech, respectively, then:
$$x(n) = s(n) + w(n) \qquad (1)$$
$$R_x(f) = R_s(f) + R_w(f) \qquad (2)$$
where $R(f)$ denotes the power spectral density of a random process. The noise power spectral density $R_w(f)$ can be estimated during speech pauses (i.e., where $x(n) = w(n)$). To estimate the power spectral density of the speech, an estimate is formed as:
$$\hat{R}_s(f) = R_x(f) - \hat{R}_w(f) \qquad (3)$$
The conventional way to estimate the power spectral density is to use a periodogram. For example, if $X_N(f_u)$ is the N-length Fourier transform of $x(n)$ and $W_N(f_u)$ is the corresponding Fourier transform of $w(n)$, then:
$$\hat{R}_x(f_u) = \frac{1}{N}\,|X_N(f_u)|^2, \quad f_u = \frac{u}{N}, \; u = 0, \dots, N-1 \qquad (4)$$
$$\hat{R}_w(f_u) = \frac{1}{N}\,|W_N(f_u)|^2, \quad f_u = \frac{u}{N}, \; u = 0, \dots, N-1 \qquad (5)$$
Equations (3), (4) and (5) can be combined to provide:
$$|\hat{S}_N(f_u)|^2 = |X_N(f_u)|^2 - |W_N(f_u)|^2 \qquad (6)$$
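Alternatively, a more general form is given by:
$$|\hat{S}_N(f_u)|^a = |X_N(f_u)|^a - |W_N(f_u)|^a \qquad (7)$$
where the power spectral density is exchanged for a general form of spectral density. Since the human ear is not sensitive to phase errors of the speech, the noisy speech phase $\varphi_x(f)$ can be used as an approximation to the clean speech phase $\varphi_s(f)$:
$$\varphi_s(f_u) \approx \varphi_x(f_u) \qquad (8)$$
A general expression for estimating the clean speech Fourier transform is thus formed as:
$$\hat{S}_N(f_u) = \left(|X_N(f_u)|^a - k\,|W_N(f_u)|^a\right)^{1/a} e^{j\varphi_x(f_u)} \qquad (9)$$
where a parameter k is introduced to control the amount of noise subtraction. In order to simplify the notation, a vector form is introduced:
$$X_N = \left[X_N(f_0),\, X_N(f_1),\, \dots,\, X_N(f_{N-1})\right]^T \qquad (10)$$
The vectors are computed element by element. For clarity, element-by-element multiplication of vectors is denoted herein by $\circ$. Thus, equation (9) can be written employing a gain function $G_N$ and using vector notation as:
$$\hat{S}_N = G_N \circ |X_N| \circ e^{j\varphi_x} = G_N \circ X_N \qquad (11)$$
where the gain function is given by:
$$G_N = \left(1 - k\,\frac{|W_N|^a}{|X_N|^a}\right)^{1/a} \qquad (12)$$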
Equation (12) represents the conventional spectral subtraction algorithm and is illustrated in Figure 2. In Figure 2, a conventional spectral subtraction noise reduction processor 200 includes a fast Fourier transform processor 210, a magnitude squared processor 220, a voice activity detector 230, a switch 225, a block-wise averaging device 240, a block-wise gain computation processor 250, a multiplier 260 and an inverse fast Fourier transform processor 270. As shown, a noisy speech input signal is coupled to an input of the fast Fourier transform processor 210, and an output of the fast Fourier transform processor 210 is coupled to an input of the magnitude squared processor 220 and to a first input of the multiplier 260. An output of the magnitude squared processor 220 is coupled to a first contact of the switch 225 and to a first input of the gain computation processor 250. An output of the voice activity detector 230 is coupled to a throw input of the switch 225, and a second contact of the switch 225 is coupled to an input of the block-wise averaging device 240. An output of the block-wise averaging device 240 is coupled to a second input of the gain computation processor 250, and an output of the gain computation processor 250 is coupled to a second input of the multiplier 260. An output of the multiplier 260 is coupled to an input of the inverse fast Fourier transform processor 270, and an output of the inverse fast Fourier transform processor 270 provides an output for the conventional spectral subtraction system 200.
In operation, the conventional spectral subtraction system 200 processes the incoming noisy speech signal, using the conventional spectral subtraction algorithm described above, to provide the cleaner, reduced-noise speech signal. In practice, the various components of Figure 2 can be implemented using any known digital signal processing technology, including a general purpose computer, a collection of integrated circuits and/or application specific integrated circuitry (ASIC).
Note that in the conventional spectral subtraction algorithm, there are two parameters, a and k, which control the amount of noise subtraction and speech quality. Setting the first parameter to a = 2 provides a power spectral subtraction, while setting the first parameter to a = 1 provides magnitude spectral subtraction. Additionally, setting the first parameter to a = 0.5 yields an increase in the noise reduction while only moderately distorting the speech. This is because the spectra are compressed before the noise is subtracted from the noisy speech. The second parameter k is adjusted so that the desired noise reduction is achieved. For example, if a larger k is chosen, the speech distortion increases. In practice, the parameter k is typically set depending upon how the first parameter a is chosen. A decrease in a typically leads to a decrease in the k parameter as well, in order to keep the speech distortion low. In the case of power spectral subtraction, it is common to use over-subtraction (i.e., k > 1).
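Equations (11) and (12) translate directly into a per-bin computation. The sketch below assumes NumPy, a noise amplitude estimate `w_mag` that is already available (for example from speech pauses), a small floor against division by zero, and clamping of negative gains to zero; the function names are hypothetical.

```python
import numpy as np

def conventional_gain(x_mag, w_mag, a=2.0, k=1.0):
    """Gain of equation (12): G_N = (1 - k |W_N|^a / |X_N|^a)^(1/a), per bin.

    x_mag: |X_N(f_u)|, amplitude spectrum of the noisy speech block
    w_mag: |W_N(f_u)|, amplitude spectrum of the noise estimate
    a:     spectrum exponent (2: power, 1: magnitude spectral subtraction)
    k:     subtraction factor (k > 1: over-subtraction)
    """
    ratio = w_mag ** a / np.maximum(x_mag ** a, 1e-12)   # guard for empty bins
    # Bins where the scaled noise exceeds the noisy spectrum get zero gain;
    # this half-wave rectification is an assumption, not stated in the text.
    return np.maximum(1.0 - k * ratio, 0.0) ** (1.0 / a)

def enhance_block(x_block, w_mag, a=2.0, k=1.0):
    """Equation (11): apply the zero-phase gain to the noisy spectrum X_N."""
    X = np.fft.fft(x_block)
    # .real drops rounding-level imaginary parts when w_mag is symmetric
    return np.fft.ifft(conventional_gain(np.abs(X), w_mag, a, k) * X).real
```

With k = 1, a bin whose noisy amplitude is twice the noise amplitude receives a gain of 0.5 for a = 1, but about 0.87 (the square root of 0.75) for a = 2, which shows how the exponent changes the suppression.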
The conventional spectral subtraction gain function (see equation (12)) is derived from a full block estimate and has zero phase. As a result, the corresponding impulse response $g_N(u)$ is non-causal and has length N (equal to the block length). Therefore, the multiplication of the gain function $G_N(l)$ and the input signal $X_N$ (see equation (11)) results in a periodic circular convolution with a non-causal filter. As described above, periodic circular convolution can lead to undesirable aliasing in the time domain, and the non-causal nature of the filter can lead to discontinuities between blocks and thus to inferior speech quality. Advantageously, the present invention provides a method and apparatus for correct convolution with a causal gain filter, and thereby eliminates the above described problems of time domain aliasing and interblock discontinuity.
With respect to the time domain aliasing problem, note that convolution in the time domain corresponds to multiplication in the frequency domain. In other words:
$$x(u) * y(u) \leftrightarrow X(f)\,Y(f), \quad u = -\infty, \dots, \infty \qquad (13)$$
When the transformation is obtained from a fast Fourier transform (FFT) of length N, the result of the multiplication is not a correct convolution. Rather, the result is a circular convolution with a periodicity of N:
$$x_N \circledast y_N \qquad (14)$$
where the symbol $\circledast$ denotes circular convolution. In order to obtain a correct convolution when using a fast Fourier transform, the accumulated order of the impulse responses $x_N$ and $y_N$ must be less than or equal to N - 1, one less than the block length N. Thus, the time domain aliasing problem resulting from periodic circular convolution can be solved by using a gain function $G_N(l)$ and an input signal block $X_N$ having a total order less than or equal to N - 1.
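The order condition can be checked numerically. In this small sketch (NumPy assumed, helper name hypothetical), a 4-tap signal and a 3-tap filter have a total order of 5, so an FFT of length N = 4 produces a wrapped-around (aliased) result, while N = 6 reproduces the linear convolution.

```python
import numpy as np

def fft_multiply(x, g, n_fft):
    """Multiply the n_fft-point FFTs of x and g and transform back.

    np.fft.fft zero-pads each input to n_fft samples, so the result is the
    circular convolution of x and g with period n_fft.
    """
    return np.fft.ifft(np.fft.fft(x, n_fft) * np.fft.fft(g, n_fft)).real

x = np.array([1.0, 2.0, 3.0, 4.0])   # order 3 (four taps)
g = np.array([1.0, 1.0, 1.0])        # order 2 (three taps)
linear = np.convolve(x, g)           # correct convolution: [1, 3, 6, 9, 7, 4]

aliased = fft_multiply(x, g, 4)      # N = 4: total order 5 > N - 1, tail wraps around
correct = fft_multiply(x, g, 6)      # N = 6: total order 5 <= N - 1, no aliasing
```

With N = 4, the last two output samples (7 and 4) wrap around onto the first two, giving [8, 7, 6, 9] instead of the correct result.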
According to conventional spectral subtraction, the spectrum $X_N$ of the input signal is of full block length N. However, according to the invention, an input signal block $x_L$ of length L (L < N) is used to construct a spectrum of order L. The length L is called the frame length, and thus $x_L$ is one frame. Since the spectrum which is multiplied with the gain function of length N should also be of length N, the frame is zero padded to the full block length N, resulting in $X_{L,N}$. In order to construct a gain function of length N, the gain function according to the invention can be interpolated from a gain function $G_M(l)$ of length M, where M < N, to form $G_{M,N}(l)$. To derive the low order gain function $G_M(l)$ according to the invention, any known or yet to be developed spectrum estimation technique can be used as an alternative to the simple Fourier transform periodogram described above. Several known spectrum estimation techniques provide lower variance in the resulting gain function. See, for example, J.G. Proakis and D.G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Macmillan, Second Ed., 1992. According to the well known Bartlett method, for example, the block of length N is divided into K sub-blocks of length M. A periodogram for each sub-block is then computed and the results are averaged to provide an M-long periodogram for the total block as:
$$P_{x,M}(f_u) = \frac{1}{K}\sum_{\kappa=0}^{K-1}\frac{1}{M}\left|\sum_{n=0}^{M-1} x(\kappa M + n)\,e^{-j2\pi un/M}\right|^2, \quad f_u = \frac{u}{M}, \; u = 0, \dots, M-1 \qquad (15)$$
Advantageously, the variance is reduced by a factor K when the sub-blocks are uncorrelated, compared to the full block length periodogram. The frequency resolution is also reduced by the same factor. Alternatively, the Welch method can be used. The Welch method is similar to the Bartlett method except that each sub-block is windowed by a Hanning window, and the sub-blocks are allowed to overlap each other, resulting in more sub-blocks. The variance provided by the Welch method is further reduced as compared to the Bartlett method. The Bartlett and Welch methods are but two spectral estimation techniques, and other known spectral estimation techniques can be used as well. Irrespective of the precise spectral estimation technique implemented, it is possible and desirable to decrease the variance of the noise periodogram estimate even further by using averaging techniques. For example, under the assumption that the noise is long-time stationary, it is possible to average the periodograms resulting from the above described Bartlett and Welch methods. One technique employs exponential averaging as:
$$\bar{P}_{x,M}(l) = \alpha\,\bar{P}_{x,M}(l-1) + (1-\alpha)\,P_{x,M}(l) \qquad (16)$$
In equation (16), the function $P_{x,M}(l)$ is computed using the Bartlett or Welch method, the function $\bar{P}_{x,M}(l)$ is the exponential average for the current block and the function $\bar{P}_{x,M}(l-1)$ is the exponential average for the previous block. The parameter α controls how long the exponential memory is, and the memory should typically not exceed the time over which the noise can be considered stationary. An α closer to 1 results in a longer exponential memory and a substantial reduction of the periodogram variance.
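A minimal NumPy sketch of the Bartlett estimate of equation (15) and the exponential averaging of equation (16) follows. The 1/M periodogram normalization and the function names are assumptions; the Welch variant would additionally window and overlap the sub-blocks.

```python
import numpy as np

def bartlett_periodogram(block, m):
    """Equation (15): average the M-point periodograms of K = len(block) // m
    non-overlapping sub-blocks. Samples beyond K * m are ignored."""
    k = len(block) // m
    sub_blocks = block[: k * m].reshape(k, m)                  # one row per sub-block
    periodograms = np.abs(np.fft.fft(sub_blocks, axis=1)) ** 2 / m
    return periodograms.mean(axis=0)                           # length-m estimate

def exponential_average(prev_avg, current, alpha):
    """Equation (16): alpha * previous average + (1 - alpha) * current estimate."""
    return alpha * prev_avg + (1.0 - alpha) * current
```

For white noise, splitting a 256-sample block into K = 8 sub-blocks of length 32 gives a markedly lower-variance estimate than the 256-point periodogram, at one eighth of the frequency resolution.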
The length M is referred to as the sub-block length, and the resulting low order gain function has an impulse response of length M. Thus, the noise periodogram estimate $\bar{P}_{x_L,M}(l)$ and the noisy speech periodogram estimate $P_{x_L,M}(l)$ employed in the composition of the gain function are also of length M:
$$G_M(l) = \left(1 - k\,\frac{\bar{P}_{x_L,M}(l)^{a/2}}{P_{x_L,M}(l)^{a/2}}\right)^{1/a} \qquad (17)$$
According to the invention, this is achieved by using a shorter periodogram estimate from the input frame $x_L$ and averaging using, for example, the Bartlett method. The Bartlett method (or other suitable estimation method) decreases the variance of the estimated periodogram, and there is also a reduction in frequency resolution. The reduction of the resolution from L frequency bins to M bins means that the periodogram estimate $P_{x_L,M}(l)$ is also of length M. Additionally, the variance of the noise periodogram estimate $\bar{P}_{x_L,M}(l)$ can be decreased further using exponential averaging as described above. To meet the requirement of a total order less than or equal to N - 1, the frame length L added to the sub-block length M is made less than N. As a result, it is possible to form the desired output block as:
$$\hat{S}_N(l) = G_{M,N}(l) \circ X_{L,N}(l) \qquad (18)$$
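The low order gain of equation (17) and the order constraint can be sketched together. Assumptions: NumPy, a frame length L that is a multiple of M, an already averaged noise periodogram of length M, and zero-clamping of negative gains; the function names are hypothetical.

```python
import numpy as np

def low_order_gain(frame, noise_avg, m, a=1.0, k=1.0):
    """Equation (17): length-M gain from an L-sample frame (L a multiple of M).

    frame:     current input frame x_L
    noise_avg: averaged noise periodogram of length M (e.g. from equation (16))
    """
    sub_blocks = frame.reshape(-1, m)                          # L / M sub-blocks
    p_x = np.mean(np.abs(np.fft.fft(sub_blocks, axis=1)) ** 2 / m, axis=0)
    ratio = noise_avg ** (a / 2) / np.maximum(p_x, 1e-12) ** (a / 2)
    return np.maximum(1.0 - k * ratio, 0.0) ** (1.0 / a)       # clamp: assumption

def no_time_aliasing(frame_len, sub_len, block_len):
    """Total order (L - 1) + (M - 1) must not exceed N - 1; the text's rule
    L + M < N satisfies this with margin."""
    return (frame_len - 1) + (sub_len - 1) <= block_len - 1
```

For example, N = 256, L = 160 and M = 32 satisfy both the total-order condition and the stricter rule L + M < N stated above.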
Advantageously, the low order filter according to the invention also provides an opportunity to address the problems created by the non-causal nature of the gain filter in the conventional spectral subtraction algorithm (i.e., inter-block discontinuity and diminished speech quality). Specifically, according to the invention, a phase can be added to the gain function to provide a causal filter. According to exemplary embodiments, the phase can be constructed from a magnitude function and can be either linear phase or minimum phase as desired. To construct a linear phase filter according to the invention, first observe that if the block length of the FFT is of length M, then a circular shift in the time-domain is a multiplication with a phase function in the frequency-domain:
$$g(n-l)_M \leftrightarrow G_M(f_u)\,e^{-j2\pi ul/M}, \quad f_u = \frac{u}{M}, \; u = 0, \dots, M-1 \qquad (19)$$
In the instant case, l equals M/2 + 1, since the first position in the impulse response should have zero delay (i.e., a causal filter). Therefore:
$$g\big(n-(M/2+1)\big)_M \leftrightarrow G_M(f_u)\,e^{-j\pi u\left(1 + \frac{2}{M}\right)} \qquad (20)$$
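and the linear phase filter $\tilde{G}_M(f_u)$ is thus obtained as
$$\tilde{G}_M(f_u) = G_M(f_u)\,e^{-j\pi u\left(1 + \frac{2}{M}\right)} \qquad (21)$$
According to the invention, the gain function is also interpolated to a length N, which is done, for example, using a smooth interpolation. The phase that is added to the gain function is changed accordingly, resulting in:
$$\tilde{G}_{M,N}(f_u) = G_{M,N}(f_u)\,e^{-j\pi u\frac{M+2}{N}}, \quad u = 0, \dots, N-1 \qquad (22)$$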
Advantageously, construction of the linear phase filter can also be performed in the time domain. In such a case, the gain function $G_M(f_u)$ is transformed to the time domain using an IFFT, where the circular shift is performed. The shifted impulse response is zero-padded to a length N and then transformed back using an N-long FFT. This leads to an interpolated causal linear phase filter $\tilde{G}_{M,N}(f_u)$ as desired.
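The time-domain construction just described can be sketched as follows and compared with the phase factor of equation (21). NumPy is assumed, the smooth test gain is hypothetical, and the circular shift of M/2 + 1 samples follows equation (20).

```python
import numpy as np

def linear_phase_causal_gain(gain_m, n_fft):
    """Time-domain construction of the causal linear phase filter.

    gain_m: real, zero-phase gain G_M(f_u) of even length M, symmetric in u
    n_fft:  target block length N
    """
    m = len(gain_m)
    g = np.fft.ifft(gain_m).real          # zero-phase impulse response g_M(n)
    g = np.roll(g, m // 2 + 1)            # circular shift by M/2 + 1, as in eq. (20)
    return np.fft.fft(g, n_fft)           # zero-pad to N and return to frequency

m, n = 16, 64
u = np.arange(m)
gain_m = 0.5 + 0.5 * np.cos(2 * np.pi * u / m)    # hypothetical smooth gain
g_n = linear_phase_causal_gain(gain_m, n)
# Frequency-domain version at the original M bins, equation (21)
g_m = gain_m * np.exp(-1j * np.pi * u * (1 + 2 / m))
```

Because the shifted impulse response occupies only the first M of the N samples, the zero-padded FFT interpolates the gain while keeping the filter causal; at every (N/M)-th bin it coincides with the frequency-domain result of equation (21).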
A causal minimum phase filter according to the invention can be constructed from the gain function by employing a Hilbert transform relation. See, for example, A.V. Oppenheim and R.W. Schafer, Discrete-Time Signal Processing, Prentice-Hall, Inter. Ed., 1989. The Hilbert transform relation implies a unique relationship between the real and imaginary parts of a complex function. Advantageously, this can also be used to relate magnitude and phase when the logarithm of the complex signal is taken, as:
$$\ln\big(G_M(f_u)\big) = \ln\big(|G_M(f_u)|\big) + j\arg\big(G_M(f_u)\big) \qquad (23)$$
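In the present context, the phase is zero, resulting in a real function. The function $\ln(|G_M(f_u)|)$ is transformed to the time domain employing an IFFT of length M, forming $g_M(n)$. The time-domain function is rearranged as:
$$\tilde{g}_M(n) = \begin{cases} g_M(n), & n = 0,\; M/2 \\ 2\,g_M(n), & n = 1, \dots, M/2 - 1 \\ 0, & n = M/2 + 1, \dots, M - 1 \end{cases} \qquad (24)$$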
The function $\tilde{g}_M(n)$ is transformed back to the frequency domain using an M-long FFT, yielding $\ln\big(|G_M(f_u)|\,e^{j\arg(\tilde{G}_M(f_u))}\big)$. From this, the function $\tilde{G}_M(f_u)$ is formed. The causal minimum phase filter $\tilde{G}_M(f_u)$ is then interpolated to a length N. The interpolation is performed in the same way as in the linear phase case described above. The resulting interpolated filter $\tilde{G}_{M,N}(f_u)$ is causal and has approximately minimum phase.
The above described spectral subtraction scheme according to the invention is depicted in Figure 3. In Figure 3, a spectral subtraction noise reduction processor 300, providing linear convolution and causal filtering, is shown to include a Bartlett processor 305, a fast Fourier transform processor 310, a magnitude squared processor 320, a switch 325, a voice activity detector 330, a block-wise averaging processor 340, a low order gain computation processor 350, a gain phase processor 355, an interpolation processor 356, a multiplier 360, an inverse fast Fourier transform processor 370 and an overlap and add processor 380.
As shown, the noisy speech input signal is coupled to an input of the Bartlett processor 305 and to an input of the fast Fourier transform processor 310. An output of the Bartlett processor 305 is coupled to an input of the magnitude squared processor 320, and an output of the fast Fourier transform processor 310 is coupled to a first input of the multiplier 360. An output of the magnitude squared processor 320 is coupled to a first contact of the switch 325 and to a first input of the low order gain computation processor 350. A control output of the voice activity detector 330 is coupled to a throw input of the switch 325, and a second contact of the switch 325 is coupled to an input of the block-wise averaging device 340.
An output of the block-wise averaging device 340 is coupled to a second input of the low order gain computation processor 350, and an output of the low order gain computation processor 350 is coupled to an input of the gain phase processor 355. An output of the gain phase processor 355 is coupled to an input of the interpolation processor 356, and an output of the interpolation processor 356 is coupled to a second input of the multiplier 360. An output of the multiplier 360 is coupled to an input of the inverse fast Fourier transform processor 370, and an output of the inverse fast Fourier transform processor 370 is coupled to an input of the overlap and add processor 380. An output of the overlap and add processor 380 provides a reduced noise, clean speech output for the exemplary noise reduction processor 300. In operation, the spectral subtraction noise reduction processor 300 processes the incoming noisy speech signal, using the linear convolution, causal filtering algorithm described above, to provide the clean, reduced-noise speech signal. In practice, the various components of Figure 3 can be implemented using any known digital signal processing technology, including a general purpose computer, a collection of integrated circuits and/or application specific integrated circuitry (ASIC).
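The minimum phase construction of equations (23) and (24) is the classical real-cepstrum method. A minimal NumPy sketch, assuming a strictly positive, symmetric gain (a floor would be needed for gains clamped to zero) and a hypothetical function name:

```python
import numpy as np

def minimum_phase_gain(gain_m):
    """Approximately minimum phase filter with magnitude gain_m (eqs. (23)-(24)).

    gain_m: strictly positive, symmetric gain G_M(f_u) of even length M
    """
    m = len(gain_m)
    g = np.fft.ifft(np.log(gain_m)).real       # IFFT of ln|G_M| (real cepstrum)
    folded = np.zeros(m)
    folded[0] = g[0]                           # n = 0 kept
    folded[1 : m // 2] = 2.0 * g[1 : m // 2]   # n = 1 .. M/2 - 1 doubled
    folded[m // 2] = g[m // 2]                 # n = M/2 kept; the rest stays zero
    return np.exp(np.fft.fft(folded))          # back to frequency, undo the logarithm

m = 16
u = np.arange(m)
gain_m = 0.5 + 0.4 * np.cos(2 * np.pi * u / m)    # hypothetical, strictly positive
g_min = minimum_phase_gain(gain_m)
h = np.fft.ifft(g_min).real                        # impulse response, front-loaded
```

The magnitude of the result equals the input gain, while the impulse response is concentrated in its first few samples. Interpolation to length N can then be done as in the linear phase case, by zero-padding this impulse response before an N-point FFT.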
Advantageously, the variance of the gain function $G_M(l)$ of the invention can be decreased still further by way of a controlled exponential gain function averaging scheme according to the invention. According to exemplary embodiments, the averaging is made dependent upon the discrepancy between the current block spectrum $P_{x,M}(l)$ and the averaged noise spectrum $\bar{P}_{x,M}(l)$. For example, when there is a small discrepancy, long averaging of the gain function $G_M(l)$ can be provided, corresponding to a stationary background noise situation. Conversely, when there is a large discrepancy, short averaging or no averaging of the gain function $G_M(l)$ can be provided, corresponding to situations with speech or highly varying background noise. In order to handle the transient switch from a speech period to a background noise period, the averaging of the gain function is not increased in direct proportion to decreases in the discrepancy, as doing so introduces an audible shadow voice (since the gain function suited for a speech spectrum would remain for a long period). Instead, the averaging is allowed to increase slowly to provide time for the gain function to adapt to the stationary input.
According to exemplary embodiments, the discrepancy measure between spectra is defined as
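$$\beta(l) = \frac{\sum_{u} \left| \bar{P}_{x,M}(u) - P_{x,M}(l,u) \right|}{\sum_{u} \bar{P}_{x,M}(u)} \qquad (25)$$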
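where $\beta(l)$ is limited by

$$\beta(l) = \begin{cases} 1, & \beta(l) > 1 \\ \beta(l), & \beta_{\min} \le \beta(l) \le 1 \\ \beta_{\min}, & \beta(l) < \beta_{\min} \end{cases}, \qquad 0 < \beta_{\min} \ll 1 \qquad (26)$$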
and where $\beta(l) = 1$ results in no exponential averaging of the gain function, while $\beta(l) = \beta_{\min}$ provides the maximum degree of exponential averaging.
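The discrepancy measure and its limiting step can be sketched as follows. This sketch assumes that equation (25) is the normalized sum of absolute bin differences, β(l) = Σ_u |P̄(u) - P_l(u)| / Σ_u P̄(u), between the averaged noise power spectrum and the current block power spectrum. The function names and the example spectra are hypothetical.

```python
def discrepancy(noise_avg, current):
    """Normalized spectral discrepancy (assumed form of eq. (25)).

    noise_avg: averaged noise power spectrum, one value per frequency bin u.
    current:   power spectrum of the current block l on the same bins.
    """
    if len(noise_avg) != len(current):
        raise ValueError("spectra must have the same number of bins")
    den = sum(noise_avg)
    if den <= 0.0:
        raise ValueError("averaged noise spectrum must have positive energy")
    return sum(abs(p_bar - p) for p_bar, p in zip(noise_avg, current)) / den


def limit_discrepancy(beta, beta_min):
    """Clamp beta to [beta_min, 1] as described for eq. (26)."""
    if not 0.0 < beta_min < 1.0:
        raise ValueError("expected 0 < beta_min << 1")
    return min(1.0, max(beta_min, beta))


noise = [1.0, 1.0, 1.0, 1.0]
print(limit_discrepancy(discrepancy(noise, [1.2, 0.8, 1.0, 1.0]), 0.05))  # about 0.1: long averaging
print(limit_discrepancy(discrepancy(noise, [9.0, 4.0, 1.0, 1.0]), 0.05))  # 1.0: no averaging
```

A block that closely matches the noise average yields a small β, which permits long averaging. A speech-like block drives β to its upper limit of 1, which disables averaging.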
The parameter $\bar{\beta}(l)$ is an exponential average of the discrepancy between spectra, described by
$$\bar{\beta}(l) = \gamma\,\bar{\beta}(l-1) + (1-\gamma)\,\beta(l) \qquad (27)$$
The parameter $\gamma$ in equation (27) ensures that the gain function adapts to the new level when a transition occurs from a period with high discrepancy between the spectra to a period with low discrepancy. As noted above, this prevents shadow voices. According to the exemplary embodiments, the adaptation is finished before the increased exponential averaging of the gain function starts due to the decreased level of $\bar{\beta}(l)$. Thus:
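$$\gamma = \begin{cases} 0, & \bar{\beta}(l-1) < \beta(l) \\ \gamma_c, & \bar{\beta}(l-1) \ge \beta(l) \end{cases}, \qquad 0 < \gamma_c < 1 \qquad (28)$$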
When the discrepancy $\beta(l)$ increases, the parameter $\bar{\beta}(l)$ follows it directly. When the discrepancy decreases, an exponential average is applied to $\beta(l)$ to form the averaged parameter $\bar{\beta}(l)$. The exponential averaging of the gain function is described by:
$$\bar{G}_M(l) = \bigl(1 - \bar{\beta}(l)\bigr)\,\bar{G}_M(l-1) + \bar{\beta}(l)\,G_M(l) \qquad (29)$$
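The block-by-block recursion described above can be sketched as a small stateful class. The input β is first clamped to [β_min, 1]. The averaged parameter β̄ follows a rising β immediately and tracks a falling β with factor γ_c. The gain is then averaged as (1 - β̄)·Ḡ_prev + β̄·G. The class name, the default tuning values, and the choice to seed the average with the first gain are hypothetical choices for illustration.

```python
class ControlledGainAverager:
    """Controlled exponential averaging of a gain function (eqs. (26)-(29)).

    beta_min and gamma_c are tuning constants; the defaults are hypothetical.
    """

    def __init__(self, beta_min=0.1, gamma_c=0.8):
        if not (0.0 < beta_min < 1.0 and 0.0 < gamma_c < 1.0):
            raise ValueError("need 0 < beta_min < 1 and 0 < gamma_c < 1")
        self.beta_min = beta_min
        self.gamma_c = gamma_c
        self.beta_bar = 1.0   # start with no gain averaging
        self.g_bar = None     # averaged gain from the previous block

    def update(self, beta, gain):
        """Feed one block's discrepancy beta and raw gain; return the averaged gain."""
        beta = min(1.0, max(self.beta_min, beta))                     # limit, eq. (26)
        # A rising discrepancy is followed at once; a falling one is smoothed (eq. (28)).
        gamma = 0.0 if self.beta_bar < beta else self.gamma_c
        self.beta_bar = gamma * self.beta_bar + (1.0 - gamma) * beta  # eq. (27)
        if self.g_bar is None:
            self.g_bar = list(gain)                                    # first block: no history
        else:
            b = self.beta_bar
            self.g_bar = [(1.0 - b) * gp + b * g                       # eq. (29)
                          for gp, g in zip(self.g_bar, gain)]
        return self.g_bar


avg = ControlledGainAverager(beta_min=0.1, gamma_c=0.5)
g1 = avg.update(1.0, [0.2, 0.4])   # first block seeds the average
g2 = avg.update(0.1, [1.0, 1.0])   # discrepancy falls: beta_bar decays slowly
g3 = avg.update(0.9, [0.0, 0.0])   # discrepancy rises: beta_bar jumps to 0.9
```

The asymmetry in γ is what prevents shadow voices. After speech ends, β̄ decays over several blocks, so the gain moves toward its new value before heavy averaging sets in.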
The above equations can be interpreted for different input signal conditions as follows.

During noise periods, the variance is reduced. As long as the noise spectrum has a steady mean value at each frequency, it can be averaged to decrease the variance. Noise level changes produce a discrepancy between the averaged noise spectrum $\bar{P}_{x,M}$ and the spectrum of the current block $P_{x,M}(l)$. The controlled exponential averaging method therefore reduces the gain function averaging until the noise level has stabilized at a new level. This behavior handles noise level changes, giving a decrease in variance during stationary noise periods and a prompt response to noise changes.

High-energy speech often has time-varying spectral peaks. When the spectral peaks from different blocks are averaged, the spectral estimate contains an average of these peaks and looks like a broader spectrum, which reduces speech quality. The exponential averaging is therefore kept at a minimum during high-energy speech periods. Because the discrepancy between the averaged noise spectrum $\bar{P}_{x,M}$ and the current high-energy speech spectrum $P_{x,M}(l)$ is large, no exponential averaging of the gain function is performed.

During lower-energy speech periods, the exponential averaging is used with a short memory that depends on the discrepancy between the current low-energy speech spectrum and the averaged noise spectrum. The variance reduction is consequently lower for low-energy speech than during background noise periods, and larger than during high-energy speech periods.

The above described spectral subtraction scheme according to the invention is depicted in Figure 4.
In Figure 4, a spectral subtraction noise reduction processor 400, providing linear convolution, causal filtering and controlled exponential averaging, is shown to include the Bartlett processor 305, the magnitude squared processor 320, the voice activity detector 330, the block-wise averaging device 340, the low order gain computation processor 350, the gain phase processor 355, the interpolation processor 356, the multiplier 360, the inverse fast Fourier transform processor 370 and the overlap and add processor 380 of the system 300 of Figure 3. It also includes an averaging control processor 445, an exponential averaging processor 446 and an optional fixed FIR post filter 465.
As shown, the noisy speech input signal is coupled to an input of the Bartlett processor 305 and to an input of the fast Fourier transform processor 310. An output of the Bartlett processor 305 is coupled to an input of the magnitude squared processor 320, and an output of the fast Fourier transform processor 310 is coupled to a first input of the multiplier 360. An output of the magnitude squared processor 320 is coupled to a first contact of the switch 325, to a first input of the low order gain computation processor 350 and to a first input of the averaging control processor 445.
A control output of the voice activity detector 330 is coupled to a throw input of the switch 325, and a second contact of the switch 325 is coupled to an input of the block-wise averaging device 340. An output of the block-wise averaging device 340 is coupled to a second input of the low order gain computation processor 350 and to a second input of the averaging controller 445. An output of the low order gain computation processor 350 is coupled to a signal input of the exponential averaging processor 446, and an output of the averaging controller 445 is coupled to a control input of the exponential averaging processor 446.
An output of the exponential averaging processor 446 is coupled to an input of the gain phase processor 355, and an output of the gain phase processor 355 is coupled to an input of the interpolation processor 356. An output of the interpolation processor 356 is coupled to a second input of the multiplier 360, and an output of the optional fixed FIR post filter 465 is coupled to a third input of the multiplier 360. An output of the multiplier 360 is coupled to an input of the inverse fast Fourier transform processor 370, and an output of the inverse fast Fourier transform processor 370 is coupled to an input of the overlap and add processor 380. An output of the overlap and add processor 380 provides a clean speech signal for the exemplary system 400. In operation, the spectral subtraction noise reduction processor 400 according to the invention processes the incoming noisy speech signal, using the linear convolution, causal filtering and controlled exponential averaging algorithm described above, to provide the improved, reduced-noise speech signal. As with the embodiment of Figure 3, the various components of Figure 4 can be implemented using any known digital signal processing technology, including a general purpose computer, a collection of integrated circuits and/or application specific integrated circuitry (ASIC).
According to exemplary embodiments, the sum of the frame length L and the sub-block length M is chosen to be shorter than N − 1. This leaves room to add the extra fixed FIR filter 465, of length J ≤ N − 1 − L − M, as shown in Figure 4. The post filter 465 is applied by multiplying its interpolated N-point response with the signal spectrum, as shown. The interpolation to length N is performed by zero-padding the filter and applying an N-long FFT. This post filter 465 can be used to filter out the telephone bandwidth or a constant tonal component. Alternatively, the functionality of the post filter 465 can be included directly in the gain function.
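The length bound and the zero-padding interpolation of the post filter can be sketched as follows. The transform size N = 256 used in the test is an assumption, since the text does not fix N. The three-tap notch is a hypothetical example of removing a steady tone, here one that falls on bin 2 of an 8-point DFT. The helper names are also hypothetical.

```python
import cmath
import math


def dft(x, n):
    """Naive n-point DFT of x after zero-padding to length n (stand-in for an FFT)."""
    x = list(x) + [0.0] * (n - len(x))
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def max_post_filter_length(n_fft, frame_len, sub_block_len):
    """Bound J <= N - 1 - L - M on the length of the fixed post filter."""
    return n_fft - 1 - frame_len - sub_block_len


def post_filter_response(h, n_fft):
    """Interpolate an FIR post filter to N bins by zero-padding and an N-point DFT."""
    if len(h) > n_fft:
        raise ValueError("post filter longer than the transform")
    return dft(h, n_fft)


def apply_post_filter(x_spec, gain_spec, post_spec):
    """Multiplier: signal spectrum times interpolated gain times post-filter response."""
    return [x * g * p for x, g, p in zip(x_spec, gain_spec, post_spec)]


# Hypothetical 3-tap FIR notch with zeros at +/- w0 (bin 2 of an 8-point DFT).
n = 8
w0 = 2 * math.pi * 2 / n
notch = [1.0, -2.0 * math.cos(w0), 1.0]
H = post_filter_response(notch, n)
```

Because the post filter is fixed, its N-point response can be computed once and reused for every frame. Only the gain spectrum changes from block to block.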
The parameters of the above described algorithm are set in practice based upon the particular application in which the algorithm is implemented. By way of example, parameter selection is described hereinafter in the context of a GSM mobile telephone.
First, based on the GSM specification, the frame length L is set to 160 samples, which provides 20 ms frames. Other choices of L can be used in other systems. Note, however, that increasing the frame length L also increases the delay. The sub-block length M (i.e., the periodogram length for the Bartlett processor) is made small to increase the variance reduction, since a shorter M allows more periodograms to be averaged within each frame. Because an FFT is used to compute the periodograms, M can conveniently be set to a power of two. The frequency resolution is then determined as:
$$B = \frac{F_s}{M} \qquad (30)$$
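The Bartlett estimate and equation (30) can be illustrated with a minimal sketch. It assumes a frame of L samples split into non-overlapping length-M sub-blocks, whose periodograms |DFT|²/M are averaged. The helper names are hypothetical, and a naive DFT stands in for the FFT. When M does not divide L (for example M = 64 with L = 160), this sketch simply ignores the leftover samples; the text does not specify how that case is handled.

```python
import cmath


def dft(x):
    """Naive DFT of x (stand-in for the FFT used in practice)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def periodogram(block):
    """Power spectrum estimate |X(f)|^2 / M of one length-M sub-block."""
    m = len(block)
    return [abs(v) ** 2 / m for v in dft(block)]


def bartlett_estimate(frame, m):
    """Average the periodograms of the complete length-m sub-blocks of `frame`."""
    k = len(frame) // m
    if k == 0:
        raise ValueError("frame shorter than one sub-block")
    acc = [0.0] * m
    for s in range(k):
        p = periodogram(frame[s * m:(s + 1) * m])
        acc = [a + b for a, b in zip(acc, p)]
    return [a / k for a in acc]


def frequency_resolution(fs, m):
    """Bin spacing B = Fs / M of a length-M periodogram, eq. (30)."""
    return fs / m


for m in (16, 32, 64):
    print(m, frequency_resolution(8000, m))   # 500.0, 250.0 and 125.0 Hz
```

With L = 160 and M = 32, five periodograms are averaged per frame. Halving M doubles the number of averaged periodograms at the cost of coarser frequency resolution.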
The GSM system sample rate is 8000 Hz. Thus, lengths M = 16, M = 32 and M = 64 give frequency resolutions of 500 Hz, 250 Hz and 125 Hz, respectively.

To use the above spectral subtraction techniques in a system where the noise is variable, such as a mobile telephone, the present invention uses a two-microphone system. The two-microphone system is illustrated in Figure 5, where 582 is a mobile telephone, 584 is a near-mouth microphone, and 586 is a far-mouth microphone. When a far-mouth microphone is used together with a near-mouth microphone, non-stationary background noise can be handled as long as the noise spectrum can be continuously estimated from a single block of input samples.
In addition to the background noise, the far-mouth microphone 586 also picks up the speaker's voice, albeit at a lower level than the near-mouth microphone 584. To enhance the noise estimate, a spectral subtraction stage is used to suppress the speech in the signal from the far-mouth microphone 586. To make this possible, a rough speech estimate is formed from the near-mouth signal with another spectral subtraction stage. Finally, a third spectral subtraction stage enhances the near-mouth signal by filtering out the enhanced background noise estimate.
A potential problem with the above technique is the need for low-variance estimates of the filter, i.e., the gain function, since the speech and noise estimates can only be formed from a short block of data samples. To reduce the variability of the gain function, the single-microphone spectral subtraction algorithm discussed above is used. This method reduces the variance of the gain function by using Bartlett's spectrum estimation method. It also reduces the frequency resolution, but this property is exploited to obtain a causal, true linear convolution. In an exemplary embodiment of the present invention, the variability of the gain function is reduced further by adaptive averaging, controlled by a discrepancy measure between the noise and noisy speech spectrum estimates.
In the two-microphone system of the present invention, as illustrated in Figure 6, there are two signals: the continuous signal $x_s(n)$ from the near-mouth microphone 584, where speech dominates, and the continuous signal $x_n(n)$ from the far-mouth microphone 586, where noise is more dominant. The signal from the near-mouth microphone 584 is provided to an input of a buffer 689, where it is broken into blocks $x_s(i)$. In an exemplary embodiment of the present invention, buffer 689 is also a speech encoder. The signal from the far-mouth microphone 586 is provided to an input of a buffer 687, where it is broken into blocks $x_n(i)$. Both buffers 687 and 689 can also include additional signal processing, such as an echo canceller, to further enhance the performance of the present invention. An analog-to-digital (A/D) converter (not shown) converts the analog signals derived from the microphones 584, 586 into digital signals so that they can be processed by the spectral subtraction stages of the present invention. The A/D converter may be placed either before or after the buffers 687, 689.
The first spectral subtraction stage 601 has two inputs: a block of the near-mouth signal, $x_s(i)$, and an estimate of the noise from the previous frame, $Y_n(f,i-1)$. The previous-frame noise estimate is produced by coupling the output of the second spectral subtraction stage 602 to the input of a delay circuit 688, whose output is coupled to the first spectral subtraction stage 601. The first spectral subtraction stage makes a rough estimate of the speech, $Y_r(f,i)$. The output of the first stage 601 is supplied to the second spectral subtraction stage 602, which uses this estimate, $Y_r(f,i)$, and a block of the far-mouth signal, $x_n(i)$, to estimate the noise spectrum for the current frame, $Y_n(f,i)$. Finally, the output of the second stage 602 is supplied to the third spectral subtraction stage 603. The third stage uses the current noise spectrum estimate, $Y_n(f,i)$, and a block of the near-mouth signal, $x_s(i)$, to estimate the noise-reduced speech, $Y_s(f,i)$. The output of the third spectral subtraction stage 603 is coupled to an input of the inverse fast Fourier transform processor 670, and an output of the inverse fast Fourier transform processor 670 is coupled to an input of the overlap and add processor 680. The output of the overlap and add processor 680 provides a clean speech signal as the output of the exemplary system 600.
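The three-stage signal flow of Figure 6, including the one-frame delay 688, can be sketched as follows. This is a simplified illustration, not the patented stage. It works directly on N-bin spectra assumed to come from the FFT, and it omits Bartlett estimation, frequency decimation, averaging and causal phase. It assumes the common generalized spectral-subtraction gain max(0, 1 - k(|Y|/|X|)^a)^(1/a), with hypothetical per-stage subtraction factors k. Class and function names are hypothetical.

```python
def subtraction_stage(x_spec, unwanted_amp, k, a=2.0):
    """Simplified stage: scale each bin of x_spec by an assumed spectral-subtraction gain."""
    out = []
    for x, y in zip(x_spec, unwanted_amp):
        mag = abs(x)
        if mag == 0.0:
            out.append(0j)
            continue
        g = max(0.0, 1.0 - k * (y / mag) ** a) ** (1.0 / a)
        out.append(g * x)
    return out


class DualMicNoiseReducer:
    """Stages 601-603 of Figure 6 with the delay element 688 kept as state."""

    def __init__(self, num_bins, k_rough=1.0, k_noise=1.0, k_final=1.0, a=2.0):
        self.k = (k_rough, k_noise, k_final)
        self.a = a
        self.prev_noise_amp = [0.0] * num_bins   # |Y_n(f, i-1)|; zero before the first frame

    def process(self, xs_spec, xn_spec):
        """Take near-mouth and far-mouth spectra of frame i; return Y_s(f, i)."""
        k1, k2, k3 = self.k
        yr = subtraction_stage(xs_spec, self.prev_noise_amp, k1, self.a)   # 601: rough speech
        yn = subtraction_stage(xn_spec, [abs(v) for v in yr], k2, self.a)  # 602: noise estimate
        noise_amp = [abs(v) for v in yn]
        ys = subtraction_stage(xs_spec, noise_amp, k3, self.a)             # 603: clean speech
        self.prev_noise_amp = noise_amp                                    # delay 688
        return ys


reducer = DualMicNoiseReducer(num_bins=2, k_noise=0.0)
ys = reducer.process([2 + 0j, 1 + 0j], [1 + 0j, 1 + 0j])
```

Setting the second-stage factor to zero, as in the usage line, passes the far-mouth spectrum through unchanged as the noise estimate. This makes it easy to follow the third stage by hand. The text notes that in practice each stage's factor depends on the microphones' input SNR.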
In an exemplary embodiment of the present invention, each spectral subtraction stage 601-603 has a parameter that controls the size of the subtraction. This parameter is preferably set differently depending on the input SNR of the microphones and the noise reduction method employed. In a further exemplary embodiment of the present invention, a controller dynamically sets the parameters of each of the spectral subtraction stages 601-603 for further accuracy in a variable noisy environment.

Since the far-mouth microphone signal is used to estimate the noise spectrum that is subtracted from the near-mouth noisy speech spectrum, the present invention performs better when the background noise spectrum has the same characteristics in both microphones. For example, a directional near-mouth microphone picks up background noise with different characteristics than an omni-directional far-mouth microphone. To compensate for such differences, one or both of the microphone signals should be filtered to reduce the differences between the spectra.

In telephone communications, it is desirable to keep the delay as low as possible to prevent disturbing echoes and unnatural pauses. When the signal block length matches the block length of the mobile telephone system's voice encoder, the present invention uses the same block of samples as the voice encoder, so no extra delay is introduced for buffering the signal block. The introduced delay is therefore only the computation time of the noise reduction plus the group delay of the gain function filtering in the last spectral subtraction stage. As illustrated in the third stage, a minimum phase can be imposed on the amplitude gain function, which gives a short delay under the constraint of causal filtering.
Since the present invention uses two microphones, the VAD 330, switch 325 and averaging block 340 used for single-microphone spectral subtraction in Figures 3 and 4 are no longer necessary. The far-mouth microphone provides a continuous noise signal during both voice and non-voice periods. In addition, the IFFT 370 and the overlap and add circuit 380 have been moved to the final output stage, shown as 670 and 680 in Figure 6. Each of the spectral subtraction stages used in the dual microphone implementation may be implemented as depicted in Figure 7. In Figure 7, a spectral subtraction stage 700, providing linear convolution, causal filtering and controlled exponential averaging, is shown to include the Bartlett processor 705, the frequency decimator 722, the low order gain computation processor 750, the combined gain phase and interpolation processor 755/756, and the multiplier 760.
As shown, the noisy speech input signal, $X_{(\cdot)}(i)$, is coupled to an input of the Bartlett processor 705 and to an input of the fast Fourier transform processor 710. The notation $X_{(\cdot)}(i)$ represents $x_s(i)$ or $x_n(i)$, which are provided to the inputs of the spectral subtraction stages 601-603 as illustrated in Figure 6. The amplitude spectrum of the unwanted signal, with length N, is coupled to an input of the frequency decimator 722. The notation $Y_{(\cdot)}(f,i)$ represents $Y_n(f,i-1)$, $Y_r(f,i)$, or $Y_n(f,i)$. The output of the frequency decimator 722 is the amplitude spectrum $Y_{(\cdot),M}(f,i)$, with length M, where M < N. The frequency decimator 722 also reduces the variance of the output amplitude spectrum compared to the input amplitude spectrum. An amplitude spectrum output of the Bartlett processor 705 and an amplitude spectrum output of the frequency decimator 722 are coupled to inputs of the low order gain computation processor 750. The output of the fast Fourier transform processor 710 is coupled to a first input of the multiplier 760. The output of the low order gain computation processor 750 is coupled to a signal input of an optional exponential averaging processor 746. An output of the exponential averaging processor 746 is coupled to an input of the gain phase and interpolation processor 755/756. An output of processor 755/756 is coupled to a second input of the multiplier 760. The filtered spectrum $Y_{*}(f,i)$ is thus the output of the multiplier 760, where $Y_{*}(f,i)$ represents $Y_r(f,i)$, $Y_n(f,i)$, or $Y_s(f,i)$. The gain function used in Figure 7 is:
where $|X_{(\cdot)}(f,i)|$ is the output of the Bartlett processor 705, $|Y_{(\cdot),M}(f,i)|$ is the output of the frequency decimator 722, $a$ is a spectrum exponent, and $k_{(\cdot)}$ is the subtraction factor controlling the amount of suppression in a particular spectral subtraction stage. The gain function can optionally be adaptively averaged. This gain function corresponds to a non-causal time-varying filter. One way to obtain a causal filter is to impose a minimum phase; an alternative is to impose a linear phase. To obtain a gain function with the same number of FFT bins as the input block $X_{(\cdot),N}(f,i)$, the gain function is interpolated, giving $G_{M,N}(f,i)$. The gain function $G_{M,N}(f,i)$ then corresponds to a causal linear filter of length M. Conventional FFT filtering then yields an output signal without periodicity effects.
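The low-order path of Figure 7 (decimator, gain computation, linear-phase interpolation) can be sketched as follows. Several details here are assumptions, because the text does not spell them out:

- The decimator averages the N/M input bins centred on each output bin.
- The gain takes the generalized form max(0, 1 - k(|Y|/|X|)^a)^(1/a).
- Causality comes from the linear-phase option: the zero-phase impulse response is circularly shifted by M//2.

All function names are hypothetical. On the original M frequency points, the interpolated response has the requested magnitude and an exactly linear phase, corresponding to a delay of M//2 samples.

```python
import cmath


def dft(x, n):
    """Naive n-point DFT of x after zero-padding to length n (stand-in for an FFT)."""
    x = list(x) + [0.0] * (n - len(x))
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def frequency_decimate(amp, m):
    """Hypothetical decimator: N-bin amplitude spectrum -> M bins by symmetric local averaging.

    Output bin j averages the N/M input bins centred on bin j*N/M (circularly),
    with half weights on the two end bins when N/M is even, so a symmetric
    spectrum (as for a real signal) stays symmetric while its variance drops.
    """
    n = len(amp)
    if m <= 0 or n % m:
        raise ValueError("this sketch assumes M divides N")
    r = n // m
    half = r // 2
    out = []
    for j in range(m):
        acc = 0.0
        for d in range(-half, half + 1):
            w = 0.5 if (r % 2 == 0 and abs(d) == half) else 1.0
            acc += w * amp[(j * r + d) % n]
        out.append(acc / r)
    return out


def spectral_subtraction_gain(x_amp, y_amp, k, a=2.0):
    """Assumed generalized gain max(0, 1 - k (|Y|/|X|)^a)^(1/a), bin by bin."""
    gain = []
    for x, y in zip(x_amp, y_amp):
        if x <= 0.0:
            gain.append(0.0)
        else:
            gain.append(max(0.0, 1.0 - k * (y / x) ** a) ** (1.0 / a))
    return gain


def linear_phase_interpolate(gain_m, n):
    """Turn M real, symmetric gain samples into a causal length-M FIR and its N-bin response.

    The inverse M-point DFT gives a zero-phase (non-causal) impulse response; a
    circular shift by M//2 makes it causal; zero-padding to N and an N-point DFT
    interpolate the gain onto the N FFT bins used by the multiplier.
    """
    m = len(gain_m)
    g0 = [(sum(gain_m[j] * cmath.exp(2j * cmath.pi * j * t / m) for j in range(m)) / m).real
          for t in range(m)]
    h = [g0[(t - m // 2) % m] for t in range(m)]
    return h, dft(h, n)
```

Because the filter has only M taps, the condition L + M - 1 ≤ N keeps the final multiplication a true linear convolution. This is the property the stage relies on to avoid periodicity effects.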
In operation, the spectral subtraction stage 700 according to the invention processes the incoming noisy speech signal, using the linear convolution, causal filtering and controlled exponential averaging algorithm described above, to provide the improved, reduced-noise speech signal. As with the embodiments of Figures 3 and 4, the various components of Figures 7-8 can be implemented using any known digital signal processing technology, including a general purpose computer, a collection of integrated circuits and/or application specific integrated circuitry (ASIC).

In summary, the present invention provides improved methods and apparatus for dual microphone spectral subtraction using linear convolution, causal filtering and/or controlled exponential averaging of the gain function. One skilled in the art will readily recognize that the present invention can enhance the quality of any audio signal, such as music, and is not limited to voice or speech signals. The exemplary methods handle non-stationary background noise, since the present invention does not rely on measuring the noise only during noise-only periods. During short-duration stationary background noise, speech quality is also improved, since the background noise can be estimated during both noise-only and speech periods. Furthermore, the present invention can be used with or without directional microphones, and each microphone can be of a different type. Finally, the amount of noise reduction can be adjusted to suit a particular desired speech quality.
Those skilled in the art will appreciate that the present invention is not limited to the specific exemplary embodiments which have been described herein for purposes of illustration and that numerous alternative embodiments are also contemplated. For example, though the invention has been described in the context of mobile communications applications, those skilled in the art will appreciate that the teachings of the invention are equally applicable in any signal processing application in which it is desirable to remove a particular signal component. The scope of the invention is therefore defined by the claims which are appended hereto, rather than the foregoing description, and all equivalents which are consistent with the meaning of the claims are intended to be embraced therein.

Claims

We Claim:
1. A noise reduction system, comprising: a first spectral subtraction processor configured to filter a first signal to provide a first noise reduced output signal; a second spectral subtraction processor configured to filter a second signal to provide a noise estimate output signal; a third spectral subtraction processor configured to filter said first signal as a function of said noise estimate output signal.
2. The system of claim 1, wherein said second spectral subtraction processor is configured to filter said second signal as a function of said first noise reduced output signal.
3. The system of claim 1, wherein said system further comprises: a delay circuit, wherein said noise estimate output signal is coupled to an input of said delay circuit; and wherein said first spectral subtraction processor is configured to filter said first signal as a function of an output of said delay circuit.
4. The system of claim 1, wherein said system further comprises: a first microphone; and a second microphone, wherein said first signal is derived from an output of said first microphone and said second signal is derived from an output of said second microphone.
5. The system of claim 4, wherein said first microphone is of a different type than said second microphone.
6. The system of claim 4, wherein said first microphone is closer to a source of a desired audio wave than said second microphone.
7. The system of claim 1, wherein a gain function of at least one of said first, second, and third spectral subtraction processors is computed based on an estimate of a spectral density of an input signal and on an estimate of a spectral density of an undesired component of said input signal, wherein a block of samples of an output signal of said at least one of said first, second, and third spectral subtraction processors is computed based on a respective block of samples of said input signal and on a respective block of samples of the gain function, and wherein a sum of an order of the respective block of samples of said input signal and of an order of the respective block of samples of the gain function is less than the number of samples of the blocks of the output signal.
8. The system of claim 7, wherein a phase is added to the gain function so that at least one of said first, second, and third spectral subtraction processors provides causal filtering.
9. The system of claim 8, wherein the gain function has linear phase.
10. The system of claim 8, wherein the gain function has minimum phase.
11. A method for processing a noisy input signal and a noise signal to provide a noise reduced output signal, comprising the steps of:
(a) using spectral subtraction to filter said noisy input signal to provide a first noise reduced output signal; (b) using spectral subtraction to filter said noise signal to provide a noise estimate output signal;
(c) using spectral subtraction to filter said noisy input signal as a function of said noise estimate output signal.
12. The method of claim 11, wherein step (b) filters said noise signal based on said first noise reduced output signal.
13. The method of claim 11, wherein said method further comprises the steps of:
(d) delaying said noise estimate output signal; and wherein step (a) further includes using said spectral subtraction to filter said noisy input signal as a function of a result of step (d) to provide said first noise reduced output signal.
14. The method of claim 11, wherein a gain function of at least one of said first, second, and third spectral subtraction processors is computed based on an estimate of a spectral density of an input signal and on an estimate of a spectral density of an undesired component of said input signal, wherein a block of samples of an output signal of said at least one of said first, second, and third spectral subtraction processors is computed based on a respective block of samples of said input signal and on a respective block of samples of the gain function, and wherein a sum of an order of the respective block of samples of said input signal and of an order of the respective block of samples of the gain function is less than the number of samples of the blocks of the output signal.
15. The method of claim 14, wherein a phase is added to the gain function so that at least one of said first, second, and third spectral subtraction processors provides causal filtering.
16. The method of claim 15, wherein the gain function has linear phase.
17. The method of claim 15, wherein the gain function has minimum phase.
18. A mobile telephone, comprising: an input for receiving a first signal derived from a first microphone; an input for receiving a second signal derived from a second microphone; a first spectral subtraction processor configured to filter said first signal to provide a first noise reduced output signal; a second spectral subtraction processor configured to filter said second signal to provide a noise estimate output signal; a third spectral subtraction processor configured to filter said first signal as a function of said noise estimate output signal.
19. The mobile telephone of claim 18, wherein said second spectral subtraction processor is configured to filter said second signal as a function of said first noise reduced output signal.
20. The mobile telephone of claim 18, wherein said mobile telephone further comprises: a delay circuit, wherein said noise estimate output signal is coupled to an input of said delay circuit; and wherein said first spectral subtraction processor is configured to filter said first signal as a function of an output of said delay circuit.
21. The mobile telephone of claim 18, wherein said first microphone is of a different type than said second microphone.
22. The mobile telephone of claim 18, wherein said first microphone is closer to a source of an audio wave than said second microphone.
EP00925198A 1999-04-12 2000-04-11 System and method for dual microphone signal noise reduction using spectral subtraction Expired - Lifetime EP1169883B1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US289065 1999-04-12
US09/289,065 US6549586B2 (en) 1999-04-12 1999-04-12 System and method for dual microphone signal noise reduction using spectral subtraction
PCT/EP2000/003223 WO2000062579A1 (en) 1999-04-12 2000-04-11 System and method for dual microphone signal noise reduction using spectral subtraction

Publications (2)

Publication Number Publication Date
EP1169883A1 true EP1169883A1 (en) 2002-01-09
EP1169883B1 EP1169883B1 (en) 2003-02-12

Family

ID=23109892

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00925198A Expired - Lifetime EP1169883B1 (en) 1999-04-12 2000-04-11 System and method for dual microphone signal noise reduction using spectral subtraction

Country Status (12)

Country Link
US (1) US6549586B2 (en)
EP (1) EP1169883B1 (en)
JP (1) JP2002542689A (en)
KR (1) KR20020005674A (en)
CN (1) CN1175709C (en)
AT (1) ATE232675T1 (en)
AU (1) AU4399900A (en)
BR (1) BR0009740A (en)
DE (1) DE60001398T2 (en)
HK (1) HK1047520B (en)
MY (1) MY123423A (en)
WO (1) WO2000062579A1 (en)

Families Citing this family (93)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6795406B2 (en) * 1999-07-12 2004-09-21 Genesys Telecommunications Laboratories, Inc. Methods and apparatus for enhancing wireless data network telephony, including quality of service monitoring and control
FR2797343B1 (en) * 1999-08-04 2001-10-05 Matra Nortel Communications VOICE ACTIVITY DETECTION METHOD AND DEVICE
DE60125553T2 (en) * 2000-05-10 2007-10-04 The Board Of Trustees For The University Of Illinois, Urbana METHOD OF INTERFERENCE SUPPRESSION
FR2808958B1 (en) * 2000-05-11 2002-10-25 Sagem PORTABLE TELEPHONE WITH SURROUNDING NOISE MITIGATION
JP3566197B2 (en) * 2000-08-31 2004-09-15 松下電器産業株式会社 Noise suppression device and noise suppression method
FR2820227B1 (en) * 2001-01-30 2003-04-18 France Telecom NOISE REDUCTION METHOD AND DEVICE
JP2002287782A (en) * 2001-03-28 2002-10-04 Ntt Docomo Inc Equalizer device
US7158933B2 (en) * 2001-05-11 2007-01-02 Siemens Corporate Research, Inc. Multi-channel speech enhancement system and method based on psychoacoustic masking effects
WO2003013185A1 (en) * 2001-08-01 2003-02-13 Dashen Fan Cardioid beam with a desired null based acoustic devices, systems and methods
US7103539B2 (en) * 2001-11-08 2006-09-05 Global Ip Sound Europe Ab Enhanced coded speech
AU2003210111A1 (en) * 2002-01-07 2003-07-24 Ronald L. Meyer Microphone support system
US6978010B1 (en) 2002-03-21 2005-12-20 Bellsouth Intellectual Property Corp. Ambient noise cancellation for voice communication device
US20040192243A1 (en) * 2003-03-28 2004-09-30 Siegel Jaime A. Method and apparatus for reducing noise from a mobile telephone and for protecting the privacy of a mobile telephone user
JP3907194B2 (en) * 2003-05-23 2007-04-18 株式会社東芝 Speech recognition apparatus, speech recognition method, and speech recognition program
US7433475B2 (en) * 2003-11-27 2008-10-07 Canon Kabushiki Kaisha Electronic device, video camera apparatus, and control method therefor
DE102004017486A1 (en) * 2004-04-08 2005-10-27 Siemens Ag Method for noise reduction in a voice input signal
KR20070050058A (en) * 2004-09-07 2007-05-14 코닌클리케 필립스 일렉트로닉스 엔.브이. Telephony device with improved noise suppression
KR101168002B1 (en) * 2004-09-16 2012-07-26 프랑스 텔레콤 Method of processing a noisy sound signal and device for implementing said method
KR100636048B1 (en) * 2004-10-28 2006-10-20 한국과학기술연구원 Mobile communication terminal and method for generating a ring signal of changing frequency characteristic according to background noise characteristics
KR100677554B1 (en) * 2005-01-14 2007-02-02 삼성전자주식회사 Method and apparatus for recording signal using beamforming algorithm
US8594320B2 (en) * 2005-04-19 2013-11-26 (Epfl) Ecole Polytechnique Federale De Lausanne Hybrid echo and noise suppression method and device in a multi-channel audio signal
FI20055261A0 (en) * 2005-05-27 2005-05-27 Midas Studios Avoin Yhtioe An acoustic transducer assembly, system and method for receiving or reproducing acoustic signals
US7697827B2 (en) 2005-10-17 2010-04-13 Konicek Jeffrey C User-friendlier interfaces for a camera
WO2007048810A1 (en) * 2005-10-25 2007-05-03 Anocsys Ag Method for the estimation of a useful signal with the aid of an adaptive process
US8345890B2 (en) 2006-01-05 2013-01-01 Audience, Inc. System and method for utilizing inter-microphone level differences for speech enhancement
CN1809105B (en) * 2006-01-13 2010-05-12 北京中星微电子有限公司 Dual-microphone speech enhancement method and system applicable to mini-type mobile communication devices
US9185487B2 (en) * 2006-01-30 2015-11-10 Audience, Inc. System and method for providing noise suppression utilizing null processing noise subtraction
US8194880B2 (en) * 2006-01-30 2012-06-05 Audience, Inc. System and method for utilizing omni-directional microphones for speech enhancement
US8204252B1 (en) 2006-10-10 2012-06-19 Audience, Inc. System and method for providing close microphone adaptive array processing
US8744844B2 (en) 2007-07-06 2014-06-03 Audience, Inc. System and method for adaptive intelligent noise suppression
US20070263847A1 (en) * 2006-04-11 2007-11-15 Alon Konchitsky Environmental noise reduction and cancellation for a cellular telephone communication device
US20070237338A1 (en) * 2006-04-11 2007-10-11 Alon Konchitsky Method and apparatus to improve voice quality of cellular calls by noise reduction using a microphone receiving noise and speech from two air pipes
US20070213010A1 (en) * 2006-03-13 2007-09-13 Alon Konchitsky System, device, database and method for increasing the capacity and call volume of a communications network
US20070237339A1 (en) * 2006-04-11 2007-10-11 Alon Konchitsky Environmental noise reduction and cancellation for a voice over internet packets (VOIP) communication device
US20070237341A1 (en) * 2006-04-05 2007-10-11 Creative Technology Ltd Frequency domain noise attenuation utilizing two transducers
GB2446966B (en) 2006-04-12 2010-07-07 Wolfson Microelectronics Plc Digital circuit arrangements for ambient noise-reduction
US8204253B1 (en) 2008-06-30 2012-06-19 Audience, Inc. Self calibration of audio device
US8150065B2 (en) * 2006-05-25 2012-04-03 Audience, Inc. System and method for processing an audio signal
US8849231B1 (en) 2007-08-08 2014-09-30 Audience, Inc. System and method for adaptive power control
US8949120B1 (en) 2006-05-25 2015-02-03 Audience, Inc. Adaptive noise cancelation
US8934641B2 (en) 2006-05-25 2015-01-13 Audience, Inc. Systems and methods for reconstructing decomposed audio signals
US20080089401A1 (en) * 2006-10-16 2008-04-17 Pao-Jen Lai Signal Testing System
US8259926B1 (en) 2007-02-23 2012-09-04 Audience, Inc. System and method for 2-channel and 3-channel acoustic echo cancellation
GB2449083B (en) 2007-05-09 2012-04-04 Wolfson Microelectronics Plc Cellular phone handset with ambient noise reduction
US8189766B1 (en) 2007-07-26 2012-05-29 Audience, Inc. System and method for blind subband acoustic echo cancellation postfiltering
ATE498978T1 (en) * 2007-11-13 2011-03-15 Akg Acoustics Gmbh MICROPHONE ARRANGEMENT HAVING TWO PRESSURE GRADIENT TRANSDUCERS
WO2009062214A1 (en) * 2007-11-13 2009-05-22 Akg Acoustics Gmbh Method for synthesizing a microphone signal
WO2009062210A1 (en) * 2007-11-13 2009-05-22 Akg Acoustics Gmbh Microphone arrangement
JP4493690B2 (en) * 2007-11-30 2010-06-30 株式会社神戸製鋼所 Objective sound extraction device, objective sound extraction program, objective sound extraction method
US8180064B1 (en) 2007-12-21 2012-05-15 Audience, Inc. System and method for providing voice equalization
US8143620B1 (en) 2007-12-21 2012-03-27 Audience, Inc. System and method for adaptive classification of audio sources
US8600740B2 (en) * 2008-01-28 2013-12-03 Qualcomm Incorporated Systems, methods and apparatus for context descriptor transmission
JP2009188858A (en) * 2008-02-08 2009-08-20 National Institute Of Information & Communication Technology Voice output apparatus, voice output method and program
WO2009105793A1 (en) 2008-02-26 2009-09-03 Akg Acoustics Gmbh Transducer assembly
US8194882B2 (en) 2008-02-29 2012-06-05 Audience, Inc. System and method for providing single microphone noise suppression fallback
US8355511B2 (en) 2008-03-18 2013-01-15 Audience, Inc. System and method for envelope-based acoustic echo cancellation
US8774423B1 (en) 2008-06-30 2014-07-08 Audience, Inc. System and method for controlling adaptivity of signal modification using a phantom coefficient
US8521530B1 (en) 2008-06-30 2013-08-27 Audience, Inc. System and method for enhancing a monaural audio signal
JP4660578B2 (en) * 2008-08-29 2011-03-30 株式会社東芝 Signal correction device
JP2010122617A (en) * 2008-11-21 2010-06-03 Yamaha Corp Noise gate and sound collecting device
US8229126B2 (en) * 2009-03-13 2012-07-24 Harris Corporation Noise error amplitude reduction
JP5648052B2 (en) * 2009-07-07 2015-01-07 コーニンクレッカ フィリップス エヌ ヴェ Reducing breathing signal noise
US9008329B1 (en) 2010-01-26 2015-04-14 Audience, Inc. Noise reduction using multi-feature cluster tracker
US8718290B2 (en) 2010-01-26 2014-05-06 Audience, Inc. Adaptive noise reduction using level cues
TWI423688B (en) * 2010-04-14 2014-01-11 Alcor Micro Corp Voice sensor with electromagnetic wave receiver
US8473287B2 (en) 2010-04-19 2013-06-25 Audience, Inc. Method for jointly optimizing noise reduction and voice quality in a mono or multi-microphone system
US8798290B1 (en) 2010-04-21 2014-08-05 Audience, Inc. Systems and methods for adaptive signal equalization
US9378754B1 (en) * 2010-04-28 2016-06-28 Knowles Electronics, Llc Adaptive spatial classifier for multi-microphone systems
US9558755B1 (en) 2010-05-20 2017-01-31 Knowles Electronics, Llc Noise suppression assisted automatic speech recognition
CN102376309B (en) * 2010-08-17 2013-12-04 骅讯电子企业股份有限公司 System and method for reducing environmental noise as well as device applying system
TWI458361B (en) * 2010-09-14 2014-10-21 C Media Electronics Inc System, method and apparatus with environment noise cancellation
US8577057B2 (en) 2010-11-02 2013-11-05 Robert Bosch Gmbh Digital dual microphone module with intelligent cross fading
EP2659487B1 (en) 2010-12-29 2016-05-04 Telefonaktiebolaget LM Ericsson (publ) A noise suppressing method and a noise suppressor for applying the noise suppressing method
US9648421B2 (en) 2011-12-14 2017-05-09 Harris Corporation Systems and methods for matching gain levels of transducers
US8712769B2 (en) * 2011-12-19 2014-04-29 Continental Automotive Systems, Inc. Apparatus and method for noise removal by spectral smoothing
EP2615739B1 (en) * 2012-01-16 2015-06-17 Nxp B.V. Processor for an FM signal receiver and processing method
US9280984B2 (en) * 2012-05-14 2016-03-08 Htc Corporation Noise cancellation method
US9640194B1 (en) 2012-10-04 2017-05-02 Knowles Electronics, Llc Noise suppression for speech processing based on machine-learning mask estimation
CN103067821B (en) * 2012-12-12 2015-03-11 歌尔声学股份有限公司 Method of and device for reducing voice reverberation based on double microphones
US20140278395A1 (en) * 2013-03-12 2014-09-18 Motorola Mobility Llc Method and Apparatus for Determining a Motion Environment Profile to Adapt Voice Recognition Processing
US20140273851A1 (en) * 2013-03-15 2014-09-18 Aliphcom Non-contact vad with an accelerometer, algorithmically grouped microphone arrays, and multi-use bluetooth hands-free visor and headset
US9536540B2 (en) 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
CN103413554B (en) * 2013-08-27 2016-02-03 广州顶毅电子有限公司 Denoising method and device using DSP time-delay adjustment
US9633671B2 (en) 2013-10-18 2017-04-25 Apple Inc. Voice quality enhancement techniques, speech recognition techniques, and related systems
CN106797512B (en) 2014-08-28 2019-10-25 美商楼氏电子有限公司 Method, system and non-transitory computer-readable storage medium for multi-source noise suppression
WO2016184479A1 (en) * 2015-05-15 2016-11-24 Read As Removal of noise from signals contaminated by pick-up noise
CN104952458B (en) * 2015-06-09 2019-05-14 广州广电运通金融电子股份有限公司 Noise suppression method, apparatus and system
CN108564963B (en) * 2018-04-23 2019-10-18 百度在线网络技术(北京)有限公司 Method and apparatus for enhancing voice
CN109813962B (en) * 2018-12-27 2021-04-13 中电科思仪科技股份有限公司 Frequency conversion system group delay measurement method and system based on Hilbert transform
CN112017678A (en) * 2019-05-29 2020-12-01 北京声智科技有限公司 Equipment capable of realizing noise reduction and noise reduction method and device thereof
EP3809410A1 (en) 2019-10-17 2021-04-21 Tata Consultancy Services Limited System and method for reducing noise components in a live audio stream
CN111899751B (en) * 2020-08-04 2022-04-22 西南交通大学 Generalized mixed norm self-adaptive echo cancellation method for resisting saturation distortion
CN113113036B (en) * 2021-03-12 2023-06-06 北京小米移动软件有限公司 Audio signal processing method and device, terminal and storage medium

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3374514D1 (en) 1982-01-27 1987-12-17 Racal Acoustics Ltd Improvements in and relating to communications systems
US4630305A (en) 1985-07-01 1986-12-16 Motorola, Inc. Automatic gain selector for a noise suppression system
GB2239971B (en) 1989-12-06 1993-09-29 Ca Nat Research Council System for separating speech from background noise
JPH0566795A (en) 1991-09-06 1993-03-19 Gijutsu Kenkyu Kumiai Iryo Fukushi Kiki Kenkyusho Noise suppressing device and its adjustment device
FR2687496B1 (en) * 1992-02-18 1994-04-01 Alcatel Radiotelephone METHOD FOR REDUCING ACOUSTIC NOISE IN A SPEECH SIGNAL.
US5742927A (en) * 1993-02-12 1998-04-21 British Telecommunications Public Limited Company Noise reduction apparatus using spectral subtraction or scaling and signal attenuation between formant regions
DE4330243A1 (en) 1993-09-07 1995-03-09 Philips Patentverwaltung Speech processing facility
US5418857A (en) * 1993-09-28 1995-05-23 Noise Cancellation Technologies, Inc. Active control system for noise shaping
US5473701A (en) * 1993-11-05 1995-12-05 At&T Corp. Adaptive microphone array
EP0682801B1 (en) 1993-12-06 1999-09-15 Koninklijke Philips Electronics N.V. A noise reduction system and device, and a mobile radio station
US5475761A (en) 1994-01-31 1995-12-12 Noise Cancellation Technologies, Inc. Adaptive feedforward and feedback control system
JPH07248778A (en) * 1994-03-09 1995-09-26 Fujitsu Ltd Method for renewing coefficient of adaptive filter
JP3484757B2 (en) * 1994-05-13 2004-01-06 ソニー株式会社 Noise reduction method and noise section detection method for voice signal
FR2726392B1 (en) * 1994-10-28 1997-01-10 Alcatel Mobile Comm France METHOD AND APPARATUS FOR SUPPRESSING NOISE IN A SPEECH SIGNAL, AND SYSTEM WITH CORRESPONDING ECHO CANCELLATION
JP2758846B2 (en) 1995-02-27 1998-05-28 埼玉日本電気株式会社 Noise canceller device
FI100840B (en) * 1995-12-12 1998-02-27 Nokia Mobile Phones Ltd Noise attenuator and method for attenuating background noise from noisy speech and a mobile station
DE69631955T2 (en) * 1995-12-15 2005-01-05 Koninklijke Philips Electronics N.V. METHOD AND CIRCUIT FOR ADAPTIVE NOISE REDUCTION AND TRANSMITTER RECEIVER
US5903819A (en) * 1996-03-13 1999-05-11 Ericsson Inc. Noise suppressor circuit and associated method for suppressing periodic interference component portions of a communication signal
JPH09284877A (en) 1996-04-19 1997-10-31 Toyo Commun Equip Co Ltd Microphone system
JP3297307B2 (en) * 1996-06-14 2002-07-02 沖電気工業株式会社 Background noise canceller
KR100250561B1 (en) * 1996-08-29 2000-04-01 니시무로 타이죠 Noise canceller and telephone terminal using the noise canceller
DE19650410C1 (en) * 1996-12-05 1998-05-07 Deutsche Telekom Ag Noise and echo suppression method for hands-free telephone communication
FR2768547B1 (en) 1997-09-18 1999-11-19 Matra Communication METHOD FOR NOISE REDUCTION OF A DIGITAL SPEECH SIGNAL
US6175602B1 (en) 1998-05-27 2001-01-16 Telefonaktiebolaget Lm Ericsson (Publ) Signal noise reduction by spectral subtraction using linear convolution and casual filtering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0062579A1 *

Also Published As

Publication number Publication date
WO2000062579A1 (en) 2000-10-19
AU4399900A (en) 2000-11-14
DE60001398D1 (en) 2003-03-20
MY123423A (en) 2006-05-31
HK1047520B (en) 2005-06-24
CN1356014A (en) 2002-06-26
BR0009740A (en) 2002-01-08
KR20020005674A (en) 2002-01-17
DE60001398T2 (en) 2003-09-04
JP2002542689A (en) 2002-12-10
US20010016020A1 (en) 2001-08-23
CN1175709C (en) 2004-11-10
US6549586B2 (en) 2003-04-15
HK1047520A1 (en) 2003-02-21
EP1169883B1 (en) 2003-02-12
ATE232675T1 (en) 2003-02-15

Similar Documents

Publication Publication Date Title
EP1169883B1 (en) System and method for dual microphone signal noise reduction using spectral subtraction
EP1252796B1 (en) System and method for dual microphone signal noise reduction using spectral subtraction
AU756511B2 (en) Signal noise reduction by spectral subtraction using linear convolution and causal filtering
EP1080463B1 (en) Signal noise reduction by spectral subtraction using spectrum dependent exponential gain function averaging
US6487257B1 (en) Signal noise reduction by time-domain spectral subtraction using fixed filters
JP5671147B2 (en) Echo suppression including modeling of late reverberation components
US7206418B2 (en) Noise suppression for a wireless communication device
WO2005109404A2 (en) Noise suppression based upon bark band weiner filtering and modified doblinger noise estimate
WO2006001960A1 (en) Comfort noise generator using modified doblinger noise estimate
JP2008519553A (en) Noise reduction and comfort noise gain control using a bark band Wiener filter and linear attenuation
JP2003534570A (en) How to suppress noise in adaptive beamformers
JP2002501337A (en) Method and apparatus for providing comfort noise in a communication system
US6507623B1 (en) Signal noise reduction by time-domain spectral subtraction
Gustafsson et al. Dual-Microphone Spectral Subtraction
Gustafsson et al. Spectral subtraction using correct convolution and a spectrum dependent exponential averaging method.
Gustafsson Speech enhancement for mobile communications

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20011105

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT;WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED.

Effective date: 20030212

Ref country code: LI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030212

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030212

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030212

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030212

Ref country code: BE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030212

Ref country code: CH

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030212

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030212

Ref country code: FR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030212

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REF Corresponds to:

Ref document number: 60001398

Country of ref document: DE

Date of ref document: 20030320

Kind code of ref document: P

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20030411

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030411

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20030411

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20030430

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030512

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030512

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030512

NLV1 Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act
LTIE Lt: invalidation of european patent or patent extension

Effective date: 20030212

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20030828

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

EN Fr: translation not filed
26N No opposition filed

Effective date: 20031113

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20040411

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20040411

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20090429

Year of fee payment: 10

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20101103