EP1081685A2 - System and method for noise reduction using a single microphone - Google Patents

Publication number
EP1081685A2
EP1081685A2 EP00118147A EP00118147A EP1081685A2 EP 1081685 A2 EP1081685 A2 EP 1081685A2 EP 00118147 A EP00118147 A EP 00118147A EP 00118147 A EP00118147 A EP 00118147A EP 1081685 A2 EP1081685 A2 EP 1081685A2
Authority
EP
European Patent Office
Prior art keywords
noise
speech
data
blocks
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00118147A
Other languages
German (de)
French (fr)
Other versions
EP1081685A3 (en)
Inventor
Russell H. Lambert
Karina L. Edmonds
Shi-Ping Hsu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northrop Grumman Corp
Original Assignee
TRW Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TRW Inc
Publication of EP1081685A2
Publication of EP1081685A3

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering

Definitions

  • the present invention is concerned with a technique for significantly reducing the effects of noise in the detection of speech in a noisy and reverberant environment, such as the interior of a moving automobile.
  • the quality of speech transmission from mobile telephones in automobiles has long been known to be poor much of the time.
  • Noise from within and outside the vehicle results in a relatively low signal-to-noise ratio, and reverberation of sounds within the vehicle further degrades the speech signals.
  • Available technologies for automatic speech recognition (ASR) and speech compression are at best degraded, and may not operate at all in the environment of the automobile.
  • a noisy speech signal is converted to digital samples and is input a block of samples at a time for processing in a fast Fourier transform (FFT) circuit, as indicated in block 10.
  • the signal is first bandpass filtered, as also indicated in block 10.
  • the magnitude spectrum is computed, as indicated in block 12, as the absolute value of the FFT function.
  • each block of data, still in the frequency domain, is analyzed to detect the presence or absence of speech, as indicated in block 14.
  • An essential aspect of the invention is to reduce noise by spectral subtraction of a noise spectrum estimate. Ideally, this estimate should be based on data obtained when speech is absent.
  • If speech is detected, the noise spectrum estimate is not updated, but if speech is absent the noise estimate is updated.
  • the noise spectrum estimate is subtracted from the noisy speech signal spectrum, still in the frequency domain. Then, as indicated in block 20, speech is further emphasized over any residual noise by raising the speech signal (obtained after spectral subtraction of the noise) to the nth power, where n is optimized to provide the most desirable result. Finally, as indicated in block 22, the blocks of data in the frequency domain are subjected to inverse transformation by an inverse FFT circuit, which outputs a "cleaned" speech signal in the time domain.
  • The functions depicted in FIG. 1 are depicted in more detail in FIG. 2.
  • the general parameter set referred to in FIG. 2 is defined in the following table:

    Parameter  Description                                      Range                                         Units
    A          Block size (FFT size is 2A)                      Real positive integer (usually a power of 2)  Samples
    B          Input low cut-off point                          0 to parameter C                              Frequency (Hz)
    C          Input high cut-off point                         Parameter B to sample rate/2                  Frequency (Hz)
    D          Spectral compression factor                      Real positive (greater than 1)                Unitless
    E          Speech location lower limit                      0 to parameter F                              Frequency (Hz)
    F          Speech location upper limit                      Parameter E to sample rate/2                  Frequency (Hz)
    G          Running average energy update parameter          Real positive (between 0 and 1)               Unitless
    H          Speech detect threshold parameter                Real positive                                 Unitless
    I          Running average noise spectrum update parameter  Real positive (between 0 and 1)               Unitless
    J          Speech enhancement parameter                     Real positive (greater than 1)                Unitless
  • the functions shown in FIG. 2 may be implemented in any desired hardware or software configuration.
  • the noise cancellation system was implemented in software, compiled with a Microsoft Visual C++ compiler and running in real time on a personal computer.
  • Input speech signals are sampled and input in blocks of A samples each.
  • Computation blocks for FFT processing are formed to contain 2A data samples each.
  • the FFT point size is 2A.
  • A may be 128 samples and 2A, 256 samples.
  • Rectangle 40 in FIG. 2 indicates the input of blocks of data.
  • Rectangle 42 indicates that each data computation block of 2A samples is formed from the stream of A-sized blocks in overlapping fashion. More specifically, if the incoming A-sized blocks are designated as block (a), block (b), block (c), block (d) and so forth, then the first data computation block is formed from blocks (a) and (b) together, the next from blocks (b) and (c) together, the next from blocks (c) and (d) together, and so forth.
  • the reason for overlapping the blocks in this way is to minimize sound artifacts that can be introduced by serially processing the blocks of data.
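The overlapping block formation described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name `computation_blocks` and the toy stream with A = 2 are assumptions made for the example.

```python
from typing import Iterable, Iterator, List


def computation_blocks(input_blocks: Iterable[List[float]]) -> Iterator[List[float]]:
    """Yield 2A-sample computation blocks from a stream of A-sample blocks.

    Each input block is used twice: as the second half of one computation
    block and then as the first half of the next.
    """
    previous = None
    for block in input_blocks:
        if previous is not None:
            yield previous + block  # list concatenation: [previous | current]
        previous = block


# Hypothetical stream of four input blocks (a), (b), (c), (d) with A = 2.
stream = [[1, 2], [3, 4], [5, 6], [7, 8]]
blocks = list(computation_blocks(stream))
```

Four input blocks give three computation blocks, (a)+(b), (b)+(c) and (c)+(d), so every interior input block appears in two consecutive computation blocks.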
  • each data computation block is subjected to "windowing" by a triangular weighting function having the profile of an isosceles triangle centered on the data computation block.
  • a maximum weight is applied to a sample or samples at the center of the data computation block, and progressively less weight is applied to samples towards the leading and trailing edges of the block.
  • these triangular windows also overlap.
  • When the signals are later converted to the frequency domain and back to the time domain, the contributions from each adjacent pair of overlapping data computation blocks combine to produce a set of samples having a relatively uniform amplitude envelope.
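The complementary behavior of overlapping triangular windows can be checked numerically. The sketch below assumes one particular window definition (rising linearly from 0 to 1 over the first A samples, then falling back); the source describes only an isosceles triangle centered on the block, so the exact sample values and the name `triangular_window` are assumptions.

```python
from typing import List


def triangular_window(block_size_a: int) -> List[float]:
    """Length-2A triangle: rises 0 -> 1 over the first A samples, then falls."""
    n_total = 2 * block_size_a
    return [
        n / block_size_a if n < block_size_a else (n_total - n) / block_size_a
        for n in range(n_total)
    ]


A = 4
w = triangular_window(A)
# Where two computation blocks overlap, the falling half of one window
# coincides with the rising half of the next window.
overlap_sum = [w[n + A] + w[n] for n in range(A)]
```

With this definition every entry of `overlap_sum` equals 1, which is why adding the overlapping halves after processing restores a uniform envelope.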
  • As each successive data block is formed and windowed, it is introduced to FFT processing, as indicated in rectangle 46, and then subjected to bandpass filtering between limits defined by parameters B and C, as indicated in rectangle 48.
  • This filtering step eliminates noise at very low and very high frequencies, such as below 300 Hz and above 3,850 Hz.
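One simple way to apply such a filter in the frequency domain is to zero the FFT bins outside the pass band. The sketch below assumes this brick-wall approach, which the source does not spell out; `bandpass_bins` is a hypothetical name. The 256-point FFT and 8 kHz sample rate follow the example values given elsewhere in the text.

```python
from typing import List


def bandpass_bins(spectrum: List[complex], sample_rate: float,
                  low_hz: float, high_hz: float) -> List[complex]:
    """Zero every FFT bin whose frequency lies outside [low_hz, high_hz]."""
    n = len(spectrum)
    out = []
    for k, value in enumerate(spectrum):
        # Bins above n/2 mirror negative frequencies, so fold them back.
        freq = min(k, n - k) * sample_rate / n
        out.append(value if low_hz <= freq <= high_hz else 0j)
    return out


# 256-point FFT at 8 kHz: bin spacing is 8000 / 256 = 31.25 Hz.
flat = [1 + 0j] * 256
filtered = bandpass_bins(flat, 8000.0, 300.0, 3850.0)
kept = [k for k in range(129) if filtered[k] != 0]
```

For B = 300 Hz and C = 3,850 Hz, the surviving non-negative-frequency bins run from bin 10 (312.5 Hz) to bin 123 (3,843.75 Hz); the mirrored bins are kept symmetrically so that the inverse transform stays real.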
  • a magnitude spectrum S is computed and placed in a compressed domain using parameter D.
  • The compressed spectrum is $S_{\text{compressed}} = S^{1/D}$.
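As a small numerical illustration of the compression step, the sketch below applies the exponent 1/D to each magnitude bin. The function name and the sample magnitudes are assumptions chosen for the example.

```python
from typing import List


def compress_spectrum(magnitudes: List[float], d: float) -> List[float]:
    """Apply S_compressed = S ** (1 / D) to each magnitude bin."""
    if d <= 1:
        raise ValueError("parameter D is specified as greater than 1")
    return [m ** (1.0 / d) for m in magnitudes]


# With D = 2 the compression is a square root, which narrows dynamic range.
compressed = compress_spectrum([1.0, 16.0, 10000.0], d=2.0)
```

Here a 10,000:1 spread in magnitude shrinks to 100:1, so strong spectral peaks dominate the later energy and noise estimates less than they would in the uncompressed domain.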
  • the speech energy of the current data block is computed by summing the energy in the frequency range given by parameters E and F, such as 400 to 800 Hz, where speech is most likely to be dominant.
  • In decision block 56, the current speech energy is compared with H times the average speech energy $E_{\text{avg}}$, which provides a continually adapting speech detection threshold. If the current speech energy is greater than $H \cdot E_{\text{avg}}$, then the noise spectrum is not updated, as indicated by path 58.
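The adaptive threshold and the gated noise update can be sketched as below. The source names parameters G, H and I but does not give the update formulas in this section, so the exponential running averages, the squared-magnitude energy measure, and the class name `SpeechGatedNoiseEstimator` are all assumptions.

```python
from typing import List


class SpeechGatedNoiseEstimator:
    """Adaptive-threshold speech detector that gates noise-spectrum updates."""

    def __init__(self, num_bins: int, g: float, h: float, i: float,
                 band: range) -> None:
        self.g, self.h, self.i = g, h, i
        self.band = band               # FFT bins covering parameters E..F
        self.avg_energy = 0.0          # running average energy E_avg
        self.noise = [0.0] * num_bins  # running noise spectrum estimate

    def update(self, magnitudes: List[float]) -> bool:
        """Process one block; return True when speech is detected."""
        # Assumed energy measure: sum of squared magnitudes in the band.
        energy = sum(magnitudes[k] ** 2 for k in self.band)
        is_speech = energy > self.h * self.avg_energy
        if not is_speech:
            # Assumed exponential average controlled by parameter I.
            self.noise = [self.i * n + (1 - self.i) * m
                          for n, m in zip(self.noise, magnitudes)]
        # Assumed exponential average of band energy controlled by G.
        self.avg_energy = self.g * self.avg_energy + (1 - self.g) * energy
        return is_speech
```

Because the running average starts at zero in this sketch, the first non-silent block is always flagged as speech; a practical system would presumably seed the average from an initial noise-only interval.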
  • the speech spectrum is then computed as the difference between the current spectrum and the noise spectrum estimate, as indicated in rectangle 62.
  • speech enhancement step 64 in which the speech spectrum, together with any residual noise component, is raised to the power J, where J is selected to be greater than one. Raising the signal to a power greater than one further distinguishes speech components from noise components.
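The subtraction and emphasis steps can be illustrated per frequency bin as follows. Flooring negative differences at zero is an assumption (a common choice in spectral subtraction, not stated in the source), and `subtract_and_emphasize` is a hypothetical name.

```python
from typing import List


def subtract_and_emphasize(spectrum: List[float], noise: List[float],
                           j: float) -> List[float]:
    """Subtract the noise estimate per bin, then raise the result to power J."""
    out = []
    for s, n in zip(spectrum, noise):
        residual = max(s - n, 0.0)  # assumed floor at zero (not in the source)
        out.append(residual ** j)
    return out


# A speech-dominated bin (3.0 after subtraction), a residual-noise bin (0.5),
# and a bin that falls below the noise estimate.
enhanced = subtract_and_emphasize([4.0, 1.5, 0.8], [1.0, 1.0, 1.0], j=2.0)
```

With J = 2 the speech-to-residual ratio in this toy case grows from 6:1 after subtraction to 36:1 after emphasis, which is the sense in which a power greater than one further separates speech from residual noise.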
  • If the speech signals are to be transmitted to a human user of the system, they must next be transformed back to the time domain.
  • Reconstruction of the time domain waveform is also performed on a block by block basis.
  • An inverse FFT operation is performed on each data block, as indicated in rectangle 66.
  • the triangularly windowed data samples that result must be added together in a manner that will produce a uniform data envelope for the reconstructed waveform.
  • the first half of a reconstructed data block is added to the second half of the previously converted block of data, as indicated in block 68. Because these two half-blocks were originally subject to triangular windowing, they now combine in a complementary way to produce a uniform signal envelope.
  • the second half of the current block is saved for the next block iteration, as indicated in rectangle 70.
  • the combined A samples from the current and previous blocks are output, as indicated in rectangle 72.
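The reconstruction steps in rectangles 68 through 72 amount to an overlap-add of half-blocks. A minimal sketch, with the class name `OverlapAddReconstructor` and the zero-initialized saved half as assumptions:

```python
from typing import List


class OverlapAddReconstructor:
    """Rebuild A-sample output blocks from 2A-sample inverse-FFT blocks."""

    def __init__(self, block_size_a: int) -> None:
        self.a = block_size_a
        # Second half of the previous block; assumed zero before the first one.
        self.saved_tail = [0.0] * block_size_a

    def push(self, time_block: List[float]) -> List[float]:
        if len(time_block) != 2 * self.a:
            raise ValueError("expected a block of 2A samples")
        head, tail = time_block[:self.a], time_block[self.a:]
        # Add the first half of this block to the saved second half.
        output = [h + t for h, t in zip(head, self.saved_tail)]
        self.saved_tail = tail  # keep for the next block iteration
        return output
```

Feeding two identical triangularly windowed blocks of a constant signal, for example `[0.0, 0.5, 1.0, 0.5]` with A = 2, yields `[0.0, 0.5]` on the first call (start-up) and `[1.0, 1.0]` on the second, showing the uniform envelope once both overlapping halves are available.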
  • a standard "star search" technique may be used, varying one parameter of the method described above while holding all others fixed. Ideally, this should be repeated for each type of speech and for different noise conditions.
  • One of the most critical parameters is the speech emphasis term, J. This was varied from 1.5 to 2.5 while testing the recognition accuracy for each setting of J. The optimum parameter value indicated was for use of the invention in the presence of freeway road and vehicle noise and for spoken connected digits data.
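A one-parameter-at-a-time search of this kind can be sketched as below. The scoring function here is a toy stand-in for measured recognition accuracy, with an arbitrary peak that is not a result from the source; the candidate values for H are likewise hypothetical, while the 1.5 to 2.5 range for J follows the text.

```python
from typing import Callable, Dict, List


def star_search(score: Callable[[Dict[str, float]], float],
                start: Dict[str, float],
                candidates: Dict[str, List[float]]) -> Dict[str, float]:
    """Tune one parameter at a time while holding all the others fixed."""
    best = dict(start)
    for name, values in candidates.items():
        best_value, best_score = best[name], score(best)
        for value in values:
            trial = dict(best, **{name: value})
            trial_score = score(trial)
            if trial_score > best_score:
                best_value, best_score = value, trial_score
        best[name] = best_value
    return best


def toy_accuracy(p: Dict[str, float]) -> float:
    """Toy stand-in for recognition accuracy; the peak location is arbitrary."""
    return -(p["J"] - 2.0) ** 2 - (p["H"] - 3.0) ** 2


tuned = star_search(toy_accuracy, {"J": 1.5, "H": 1.0},
                    {"J": [1.5, 1.75, 2.0, 2.25, 2.5],
                     "H": [1.0, 2.0, 3.0, 4.0]})
```

In practice `score` would run the recognizer on recorded test data for each setting, which is why the source suggests repeating the search for each type of speech and noise condition.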
  • random noise, indicated by graph 80, has a distinctive 'spike' in its autocorrelation function 82.
  • a sine wave has a periodic auto-correlation function.
  • a segment of speech 84 has strong components that are periodic sine waves. Therefore, the speech correlates strongly over several milliseconds, as indicated at 86.
  • the noise 80 correlates strongly only at the zero delay point, as indicated by the spike in its autocorrelation function 82. In the correlation domain, the spike due to noise can be easily zeroed out and this is the basis of the spectral subtraction approach used in the present invention.
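The contrast between the two autocorrelation functions can be reproduced with synthetic signals. The sketch below uses a pure sine wave as a stand-in for the periodic components of speech and seeded Gaussian samples as a stand-in for random noise; both choices, and the signal lengths, are assumptions for illustration.

```python
import math
import random
from typing import List


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Normalized autocorrelation r[lag] / r[0] for lag = 0..max_lag."""
    r0 = sum(v * v for v in x)
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / r0
            for lag in range(max_lag + 1)]


random.seed(0)  # fixed seed so the sketch is repeatable
n_samples, period = 2000, 20
sine = [math.sin(2 * math.pi * n / period) for n in range(n_samples)]
noise = [random.gauss(0.0, 1.0) for _ in range(n_samples)]

sine_ac = autocorrelation(sine, 40)    # periodic, like the sine itself
noise_ac = autocorrelation(noise, 40)  # a spike at lag 0, near zero elsewhere
```

The sine's autocorrelation returns close to 1 at a lag of one period and close to -1 at half a period, while the noise autocorrelation stays small at every nonzero lag.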
  • the system of the invention has been tested under practical conditions in a moving vehicle, on a freeway with the windows closed and air-conditioning on, and also with the windows partly open.
  • Two types of microphones were considered, omni-directional and unidirectional. Not unexpectedly, the unidirectional microphone led to significantly better recognition accuracy for all background noise levels. The highest recognition accuracy obtained was 86% from freeway driving with the windows up and air conditioning on using connected digits speech data.
  • the in-vehicle data were initially collected using a digital recorder and the microphone placement was selected to maximize signal-to-noise ratio (SNR). For both the omni-directional and the unidirectional microphone the position that yielded the greatest signal was just above the driver's visor (i.e., directly in front of the source). All the tests were conducted using the passenger as the point source for speech. Since the car cabin is symmetric, the results for the driver's side are expected to be equivalent to those obtained from the passenger side.
  • the speech recorded on the digital recorder in the automobile was sampled at 44.1 kHz and subsequently down-sampled to 8 kHz. In order to ensure the integrity of the audio files after down sampling, the files were tested with an automatic speech recognition (ASR) system. No degradation in ASR performance was observed for a file recorded at 44.1 kHz and down-sampled to 8 kHz.
  • a software package designed by Lernout and Hauspie, ASR1500, was utilized for testing since it allowed for connected digits and has a relatively short response time.
  • the vocabulary tested consisted of eleven digits: 1-9, zero and oh. Connected digits were selected in order to account for the co-articulation factors in the recognition process. In the test procedure, each digit is pronounced approximately fifteen times during a dialogue of a random series of connected digits.
  • the recognition accuracy for the digits is significantly improved after the removal of the background noise.
  • recognition rates improved from 47% to 86% for a unidirectional microphone, and from 16% to 78% for an omni-directional microphone.
  • recognition rates improved from 46% to 83% for a unidirectional microphone, and from less than 10% to 39% for the omni-directional microphone.
  • background noise level monitoring system 90 may be incorporated into the standard noise cancellation system of the invention, which would then operate only when a specified level of background noise is present. This would eliminate speech degradation from the processing when there is no background noise.
  • the decision need not be a "hard" (on or off) one. Rather the modified system would appropriately blend the processed and unprocessed speech in a continuously varying manner such that the effect of turning on the processing in high noise conditions would not be noticeable to the system user.
  • the monitored noise level is compared against an upper threshold, as indicated in decision block 92, and if the noise exceeds the threshold, the system selects processed (noise-reduced) speech as indicated in rectangle 94.
  • If the monitored noise level is currently below the upper threshold, it is compared with a lower threshold, as indicated in decision block 96. If the noise is below the lower threshold, the original unprocessed speech is selected, as indicated in rectangle 98. If the monitored noise is between the upper and lower thresholds, the system selects a blend of inputs from the original speech and noise-reduced speech signals, as indicated in rectangle 100.
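The two-threshold selection logic can be sketched as a weighting function. The source says only that a blend is selected between the thresholds; the linear cross-fade used here, and the names `processed_weight` and `blend`, are assumptions.

```python
from typing import List


def processed_weight(noise_level: float, lower: float, upper: float) -> float:
    """Fraction of noise-reduced signal to use: 0 below lower, 1 above upper."""
    if noise_level >= upper:
        return 1.0  # high noise: select processed (noise-reduced) speech
    if noise_level <= lower:
        return 0.0  # low noise: select original unprocessed speech
    # Between the thresholds: assumed linear cross-fade.
    return (noise_level - lower) / (upper - lower)


def blend(original: List[float], processed: List[float],
          weight: float) -> List[float]:
    """Mix the two signals sample by sample using the given weight."""
    return [(1.0 - weight) * o + weight * p
            for o, p in zip(original, processed)]
```

A continuous weight of this kind avoids an audible switch when the monitored noise level crosses a threshold, in line with the stated goal that turning on the processing should not be noticeable to the user.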
  • the noise reduction system is incorporated into an automatic speech recognition (ASR) system 104 (FIG. 5).
  • the noise reduction system is the same as the one illustrated in FIG. 1, but without the final inverse FFT process. This will eliminate some of the speech artifacts that are created when transforming back to the time domain waveform. Where the application calls for voice control of the ASR system only, there is no need to reconstruct the time domain waveform.
  • the inverse FFT function is eliminated from the noise cancellation system and the output of the noise cancellation system is coupled directly to frequency domain inputs 106 of the ASR system 104, which generates appropriate output control signals 108 in response to detection of input speech commands.
  • the present invention represents a significant advance in noise reduction for a single microphone installed in a noisy environment, such as a moving automobile.
  • the invention provides a "cleaned" or noise-reduced speech signal that is more intelligible to the human ear and improves reliability of ASR systems.
  • the system of the invention produces either time-domain output for transmission over voice communication systems, or frequency-domain output for direct connection to an ASR system.

Abstract

A noise reduction technique for use with a single microphone channel. The technique provides a noise reduction framework that allows multiple parameters to be adjusted optimally for any given application, noise environment or automatic speech recognition (ASR) system. The system of the invention includes a fast Fourier transform (FFT) circuit (10) with a bandpass filter to remove known noise frequencies from a speech signal, a speech detector (14), a noise estimator (16) that updates a noise estimate only when speech is not detected, a spectrum subtraction circuit (18) to subtract the noise estimate from the speech and noise signal spectrum, and a speech emphasis circuit (20), which further emphasizes speech signal components with respect to any residual noise. The resulting noise-reduced signals in the frequency domain can be either input directly to an automatic speech recognition (ASR) system, or transformed back to the time domain for use in a voice communication system. A noise monitor (90) may be added to the system, to determine when noise reduction is appropriate, and to avoid unwanted signal distortion when noise reduction is not needed. For further improved performance, input signals are first processed into blocks that are each used twice in forming data blocks for the FFT circuit and subsequent processing, and a triangular weighting window is applied (44) at the FFT input.

Description

    BACKGROUND OF THE INVENTION
  • This invention relates generally to techniques for reliable conversion of speech data from acoustic signals to electrical signals in an acoustically noisy and reverberant environment. There is a growing demand for "hands-free" cellular telephone communication from automobiles, using automatic speech recognition (ASR) for dialing and other functions. However, background noise from both inside and outside an automobile renders in-vehicle communication both difficult and stressful. Reverberation within the automobile combines with high noise levels to greatly degrade the speech signal received by a microphone in the automobile. The microphone receives not only the original speech signal but also distorted and delayed duplicates of the speech signal, generated by multiple echoes from walls, windows and objects in the automobile interior. These duplicate signals in general arrive at the microphone over different paths. Hence the term "multipath" is often applied to the environment. The quality of the speech signal is extremely degraded in such an environment, and the accuracy of any associated ASR systems is also degraded, perhaps to the point where they no longer operate. As an example, recognition accuracy of an ASR system as high as 96% in a quiet environment could drop to well below 50% in a moving automobile.
  • Another related technology affected by noise and reverberation is speech compression, which digitally encodes speech signals to achieve reductions in communication bandwidth and for other reasons. In the presence of noise, speech compression becomes increasingly difficult and unreliable.
  • There are a number of prior art systems that effect active noise cancellation in the acoustic field. The active noise reduction approaches cancel acoustic noise signals by generating an opposite signal, sometimes referred to as "anti-noise," through one or more transducers near the noise source, to cancel the unwanted noise signal. This technique often creates noise at some other location in the vicinity of the speaker, and is not a practical solution for canceling multiple unknown noise sources, especially in the presence of multipath effects.
  • Accordingly, there is still a significant need for reduction of the effects of noise in a reverberant environment, such as the interior of a moving automobile. As discussed in the following summary, the present invention addresses this need.
    SUMMARY OF THE INVENTION
  • The present invention resides in a system and method for reducing noise in speech signals obtained from a single microphone in a noisy environment. The present invention is a general noise reduction framework that allows multiple parameters to be adjusted optimally for any given application, noise environment or automatic speech recognition (ASR) system. Briefly, and in general terms, the system of the invention comprises a fast Fourier transform (FFT) circuit for transforming blocks of input microphone data to a frequency domain representation; a bandpass filter to remove selected frequency bands in which noise is known to be present; a speech detector for sensing the presence of speech signals in microphone data; a noise spectrum estimator updated only for data blocks in which no speech signals are detected; a spectrum subtraction circuit, for subtracting the estimated noise spectrum from microphone signals containing noise and speech signal components; and a speech emphasis circuit, for emphasizing speech signal components with respect to any residual noise after operation of the spectrum subtraction circuit, to provide a noise-reduced speech signal in the frequency domain.
  • The system may further comprise means for reconstructing time-domain data from the noise-reduced speech signal in the frequency domain, including an inverse fast Fourier transform circuit for transforming blocks of data from the frequency domain back into the time domain, whereby the noise-reduced speech signals are more intelligible in voice communication systems. Alternatively, the system may further comprise an automatic speech recognition (ASR) system connected to receive the noise-reduced speech signals in the frequency domain, whereby the ASR system operates more reliably to generate selected control signals.
  • Preferably, the speech emphasis circuit raises signals in the frequency domain by a power N, where N is a positive quantity greater than one.
  • In the invention as disclosed, the input signals are presented to the noise reduction system in blocks of "A" samples each, and data blocks of size "2A" samples each are presented to the FFT circuit. The system further comprises means for combining input signal blocks of "A" samples in pairs to form data blocks. Moreover, the means for combining input signal blocks uses each input signal block twice, such that a currently input signal block is placed in a second half of a current data block and is then placed in a first half of a next data block. The system may further comprise means for applying a triangular weighting window to each data block; and the means for reconstructing time-domain data includes means for combining the first half of each reconstructed data block with the second half of a reconstructed data block saved from processing the previous data block, whereby time-domain samples with a uniform envelope are reconstructed and unwanted artifacts of block processing are minimized.
  • In accordance with another aspect of the invention, the system further comprises a noise monitor to provide an indication of when use of noise reduction would be desirable; and means for selecting the noise-reduced signal when the noise level detected by the noise monitor is relatively high, and for selecting the original speech with noise signal when the detected noise level is relatively low.
  • The invention may also be defined in terms of a method for reducing noise in signals received by a single microphone in a noisy environment. Briefly, and in general terms, the method comprises the steps of transforming blocks of input data from a single microphone from a time-domain representation to a frequency-domain representation; filtering out selected frequency bands to minimize the effect of known noise sources; detecting the presence of speech in each block of data signals; estimating noise by updating a noise spectrum estimate when no speech is detected; subtracting the noise spectrum estimate from the input speech and noise signals; and emphasizing speech signal components with respect to noise signal components, by raising the result of the subtracting step to the Nth power, where N is a positive quantity greater than one, to provide frequency-domain speech signal data with a reduced noise content.
  • The method may also include the step of reconstructing time-domain data from the noise-reduced speech signal in the frequency domain, including transforming blocks of data from the frequency domain back into the time domain, whereby the noise-reduced speech signals are more intelligible in voice communication systems. Alternatively, the method includes the step of transmitting the noise-reduced speech signals in the frequency domain to an automatic speech recognition (ASR) system, whereby the ASR system operates more reliably to generate selected control signals.
  • Preferably the method step of emphasizing speech signal components includes raising signals in the frequency domain to a power N, where N is a positive quantity greater than one.
  • More specifically the method further includes the steps of presenting input signals to the noise reduction system in blocks of "A" samples each; presenting data blocks of size "2A" samples to the FFT circuit; combining input signal blocks of "A" samples in pairs to form data blocks, the combining step including using each input signal block twice, such that a currently input signal block is placed in a second half of a current data block and is then placed in a first half of a next data block; applying a triangular weighting window to each data block; and in the reconstructing step, combining the first half of each reconstructed data block with the second half of a reconstructed data block saved from processing the previous data block. Time-domain samples with a uniform envelope are reconstructed and unwanted artifacts of block processing are minimized with use of this method.
  • The method may further comprise the steps of continually monitoring the noise level with a noise monitor, to provide an indication of when use of noise reduction would be desirable; selecting the noise-reduced signal when the noise level detected by the noise monitor is relatively high; and selecting the original speech and noise signal when the detected noise level is relatively low.
  • It will be appreciated from the foregoing summary that the present invention represents a significant advance in noise reduction techniques. The combination of features summarized above results in a speech signal that has noise greatly reduced, resulting in more intelligible speech when the signals are used in voice communication systems, and more reliable ASR system operation when the signals are used to operate ASR and related systems. Other aspects and advantages of the invention will become apparent from the following more detailed description, taken in conjunction with the accompanying drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIGURE 1 is a block diagram of a noise cancellation system in accordance with the present invention;
  • FIG. 2 is a more detailed block diagram of the noise cancellation system of the invention;
  • FIG. 3 is a set of four related graphs, showing time domain correlation of a noise signal with itself, i.e., autocorrelation, and the time domain autocorrelation of a speech signal;
  • FIG. 4 is a block diagram depicting an alternative embodiment of the invention in which a noise detector is used to control operation of the noise cancellation system; and
  • FIG. 5 is a block diagram showing how the noise cancellation system of the invention may be integrated into an existing automatic speech recognition (ASR) system.
  • DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • As shown in the drawings, the present invention is concerned with a technique for significantly reducing the effects of noise in the detection of speech in a noisy and reverberant environment, such as the interior of a moving automobile. The quality of speech transmission from mobile telephones in automobiles has long been known to be poor much of the time. Noise from within and outside the vehicle results in a relatively low signal-to-noise ratio, and reverberation of sounds within the vehicle further degrades the speech signals. Available technologies for automatic speech recognition (ASR) and speech compression are at best degraded, and may not operate at all in the environment of the automobile.
  • In accordance with the present invention, and as shown in FIG. 1, a combination of processing steps, including spectral subtraction of noise, is performed to achieve a significant reduction in noise level. A noisy speech signal is converted to digital samples and is input a block of samples at a time for processing in a fast Fourier transform (FFT) circuit, as indicated in block 10. Upon conversion to the frequency domain by the fast Fourier transform, the signal is first bandpass filtered, as also indicated in block 10. Then the magnitude spectrum is computed, as indicated in block 12, as the absolute value of the FFT function. Then each block of data, still in the frequency domain, is analyzed to detect the presence or absence of speech, as indicated in block 14. An essential aspect of the invention is to reduce noise by spectral subtraction of a noise spectrum estimate. Ideally, this estimate should be based on data obtained when speech is absent. As indicated in block 16, if speech is present, the noise spectrum estimate is not updated, but if speech is absent the noise estimate is updated.
  • As indicated in block 18, the noise spectrum estimate is subtracted from the noisy speech signal spectrum, still in the frequency domain. Then, as indicated in block 20, speech is further emphasized over any residual noise by raising the speech signal (obtained after spectral subtraction of the noise) to the nth power, where n is optimized to provide the most desirable result. Finally, as indicated in block 22, the blocks of data in the frequency domain are subjected to inverse transformation by an inverse FFT circuit, which outputs a "cleaned" speech signal in the time domain.
  • The functions depicted in FIG. 1 are depicted in more detail in FIG. 2. The general parameter set referred to in FIG. 2 is defined in the following table:
    Parameter Name | Description | Range | Units
    A | Block size (FFT size is 2A) | Real positive integer (usually a power of 2) | Samples
    B | Input low cut-off point | 0 to parameter C | Frequency (Hz)
    C | Input high cut-off point | Parameter B to sample rate/2 | Frequency (Hz)
    D | Spectral compression factor | Real positive (greater than 1) | Unitless
    E | Speech location lower limit | 0 to parameter F | Frequency (Hz)
    F | Speech location upper limit | Parameter E to sample rate/2 | Frequency (Hz)
    G | Running average energy update parameter | Real positive (between 0 and 1) | Unitless
    H | Speech detect threshold parameter | Real positive | Unitless
    I | Running average noise spectrum update parameter | Real positive (between 0 and 1) | Unitless
    J | Speech enhancement parameter | Real positive (greater than 1) | Unitless
  • The functions shown in FIG. 2 may be implemented in any desired hardware or software configuration. In an experimental configuration, the noise cancellation system was implemented as software with code in a Microsoft Visual C++ compiler running on a personal computer in real time. Input speech signals are sampled and input in blocks of A samples each. Computation blocks for FFT processing are formed to contain 2A data samples each. Thus the FFT point size is 2A. For example, A may be 128 samples and 2A, 256 samples.
  • Rectangle 40 in FIG. 2 indicates the input of blocks of data. Rectangle 42 indicates that each data computation block of 2A samples is formed from the stream of A-sized blocks in overlapping fashion. More specifically, if the incoming stream of A-sized blocks is designated as block (a), block (b), block (c), block (d) and so forth, then the first data computation block is formed from blocks (a) and (b) together, the next data computation block is formed from blocks (b) and (c) together, the next from blocks (c) and (d) together, and so forth. The reason for overlapping the blocks in this way is to minimize sound artifacts that can be introduced by serially processing the blocks of data. Further, each data computation block, as indicated in rectangle 44, is subjected to "windowing" by a triangular weighting function having the profile of an isosceles triangle centered on the data computation block. Thus, a maximum weight is applied to a sample or samples at the center of the data computation block, and progressively less weight is applied to samples towards the leading and trailing edges of the block. Because the data computation blocks derive data from overlapping A-sized blocks, these triangular windows also overlap. Moreover, when the signals are later converted to the frequency domain and back to the time domain, the contributions from each adjacent pair of overlapping data computation blocks combine to produce a set of samples having a relatively uniform amplitude envelope.
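  • The overlapping and windowing of rectangles 42 and 44 can be sketched roughly as follows. This is only an illustration in Python, not the experimental implementation mentioned above; the function names, the tiny block size A = 4, and the exact sample positions of the triangle are assumptions, chosen so that two windows offset by A samples sum to exactly one.

```python
def triangular_window(n):
    """Isosceles-triangle weights for a block of n samples (n even).

    Assumption: the triangle is sampled at half-sample offsets so that
    two windows shifted by n/2 samples add up to exactly 1.0.
    """
    half = n // 2
    rising = [(i + 0.5) / half for i in range(half)]
    return rising + rising[::-1]


def overlapping_blocks(samples, a):
    """Yield 2A-sample computation blocks; each A-sized block is used twice."""
    for start in range(0, len(samples) - 2 * a + 1, a):
        yield samples[start:start + 2 * a]


A = 4                                # hypothetical; the text suggests e.g. 128
window = triangular_window(2 * A)
signal = list(range(16))             # four A-sized blocks: (a), (b), (c), (d)
blocks = list(overlapping_blocks(signal, A))
windowed = [[w * x for w, x in zip(window, block)] for block in blocks]

# Falling half of one window plus rising half of the next is constant.
overlap_sum = [window[A + i] + window[i] for i in range(A)]
```

  • With these assumed weights, the second half of each computation block repeats as the first half of the next one, and the overlapping window halves sum to a flat envelope.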
  • After each successive data block is formed and windowed, it is introduced to FFT processing, as indicated in rectangle 46, and then subjected to bandpass filtering between limits defined by parameters B and C, as indicated in rectangle 48. This filtering step eliminates noise at very low and very high frequencies, such as below 300 Hz and above 3,850 Hz. Next, as indicated in rectangle 50, a magnitude spectrum S is computed and placed in a compressed domain using parameter D: $S_{\text{compressed}} = S^{1/D}$.
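  • A rough sketch of rectangles 46 to 50 follows. It assumes the bandpass filter is applied by zeroing frequency bins outside the range B to C, which the text does not spell out, and it uses a naive DFT in place of an FFT so that the example needs only the Python standard library. The value D = 2 and the 32-sample test block are illustrative only, and windowing is omitted for brevity.

```python
import cmath
import math


def dft(block):
    """Naive O(n^2) discrete Fourier transform; stands in for the FFT."""
    n = len(block)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(block))
            for k in range(n)]


def bandpass(spectrum, sample_rate, low_hz, high_hz):
    """Zero every bin whose frequency lies outside [low_hz, high_hz]."""
    n = len(spectrum)
    out = []
    for k, value in enumerate(spectrum):
        # Bins above n/2 mirror the negative frequencies of a real signal.
        freq = min(k, n - k) * sample_rate / n
        out.append(value if low_hz <= freq <= high_hz else 0j)
    return out


def compressed_magnitude(spectrum, d):
    """Magnitude spectrum S placed in the compressed domain, S ** (1/D)."""
    return [abs(value) ** (1.0 / d) for value in spectrum]


SAMPLE_RATE = 8000              # Hz
B, C, D = 300.0, 3850.0, 2.0    # cut-offs from the text; D is an assumption
n = 32                          # bin spacing is 8000 / 32 = 250 Hz
# A 1 kHz tone inside the passband plus a 250 Hz hum below it.
block = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
         + math.sin(2 * math.pi * 250 * t / SAMPLE_RATE)
         for t in range(n)]
filtered = bandpass(dft(block), SAMPLE_RATE, B, C)
s_compressed = compressed_magnitude(filtered, D)
```

  • In this toy case the 250 Hz component falls in a bin below B and is removed, while the 1 kHz component survives and is compressed by the square root.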
  • As indicated in rectangle 52, the speech energy of the current data block is computed by summing the energy in the frequency range given by parameters E and F, such as 400 to 800 Hz, where speech is most likely to be dominant. The average speech energy in this range is kept in a running average estimator, as indicated in rectangle 54, using the computation: $\text{SpeechEnergy}_{\text{avg}} = (1-G)\,\text{SpeechEnergy}_{\text{avg}} + G\,\text{SpeechEnergy}_{\text{current}}$. In decision block 56, the current speech energy is compared with H times the average speech energy $E_{\text{avg}}$, which provides a continually adapting speech detection threshold. If the current speech energy is greater than $H \cdot E_{\text{avg}}$, then the noise spectrum is not updated, as indicated by path 58. If not, the noise spectrum is updated using parameter I, as indicated in rectangle 60, using the expression: $\text{Spectrum}_{\text{avg}} = (1-I)\,\text{Spectrum}_{\text{avg}} + I\,\text{Spectrum}_{\text{current}}$. The speech spectrum is then computed as the difference between the current spectrum and the noise spectrum estimate, as indicated in rectangle 62. Finally, there is an important speech enhancement step 64, in which the speech spectrum, together with any residual noise component, is raised to the power J, where J is selected to be greater than one. Raising the signal to a power greater than one further distinguishes speech components from noise components.
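  • The per-block logic of rectangles 52 to 64 might be sketched as below. Several details are assumptions rather than statements from the text: energy is taken as the sum of squared magnitudes, both running averages are seeded from the first block, negative differences after subtraction are clamped to zero, and the values of G, H and I are placeholders. Only J = 1.9 is taken from the experiments reported in this description. The class and method names are hypothetical.

```python
class SpectralSubtractor:
    """Speech detection, noise tracking, subtraction and emphasis per block."""

    def __init__(self, g=0.1, h=2.0, i=0.2, j=1.9):
        self.g, self.h, self.i, self.j = g, h, i, j
        self.energy_avg = None   # running average of speech-band energy
        self.noise = None        # running average noise spectrum

    def process(self, spectrum, speech_bins):
        """spectrum: magnitude values; speech_bins: bin indices covering E..F."""
        energy = sum(spectrum[k] ** 2 for k in speech_bins)
        if self.energy_avg is None:
            self.energy_avg = energy            # assumed seeding
        else:
            self.energy_avg = ((1 - self.g) * self.energy_avg
                               + self.g * energy)
        # Adaptive threshold: H times the running average energy.
        speech_present = energy > self.h * self.energy_avg
        if not speech_present:
            if self.noise is None:
                self.noise = list(spectrum)     # assumed seeding
            else:
                self.noise = [(1 - self.i) * n + self.i * s
                              for n, s in zip(self.noise, spectrum)]
        noise = self.noise if self.noise is not None else [0.0] * len(spectrum)
        # Assumed: clamp at zero so the power J is applied to non-negatives.
        residual = [max(s - n, 0.0) for s, n in zip(spectrum, noise)]
        return speech_present, [r ** self.j for r in residual]


subtractor = SpectralSubtractor()
speech_bins = range(2, 4)
for _ in range(5):                              # five noise-only blocks
    noise_flag, _ = subtractor.process([1.0] * 8, speech_bins)
speech_flag, enhanced = subtractor.process(
    [1.0, 1.0, 5.0, 5.0, 1.0, 1.0, 1.0, 1.0], speech_bins)
```

  • In this toy run the flat blocks are classified as noise and train the noise estimate, while the final block is flagged as speech, so the estimate is left untouched and only the bins that rise above it remain after subtraction.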
  • As an example of parameter optimization, the effects of various values of parameter J were observed (while holding all other parameters fixed), as indicated in the following table:
    Speech Enhancement Parameter J | Accuracy from ASR
    1.5 | 80%
    1.7 | 81.4%
    1.85 | 84%
    1.9 | 85.6%
    1.95 | 81.4%
    2.0 | 80.7%
    2.2 | 76.4%
    2.5 | 67.1%
    It will be observed that the best value of parameter J from the standpoint of automatic speech recognition is 1.9.
  • If the speech signals are to be transmitted to a human user of the system, they must next be transformed back to the time domain. Reconstruction of the time domain waveform is also performed on a block by block basis. An inverse FFT operation is performed on each data block, as indicated in rectangle 66. The triangularly windowed data samples that result must be added together in a manner that will produce a uniform data envelope for the reconstructed waveform. More specifically, the first half of a reconstructed data block is added to the second half of the previously converted block of data, as indicated in block 68. Because these two half-blocks were originally subject to triangular windowing, they now combine in a complementary way to produce a uniform signal envelope. The second half of the current block is saved for the next block iteration, as indicated in rectangle 70. The combined A samples from the current and previous blocks are output, as indicated in rectangle 72.
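  • The overlap-add reconstruction of rectangles 68 to 72 can be illustrated with the following sketch. To keep it short, the FFT, spectral processing and inverse FFT are skipped, so each "reconstructed" block is simply the windowed input block; the class name, the block size A = 4 and the window sampling are assumptions.

```python
def triangular_window(n):
    """Triangle sampled so that windows offset by n/2 sum to 1 (assumed)."""
    half = n // 2
    rising = [(i + 0.5) / half for i in range(half)]
    return rising + rising[::-1]


class OverlapAdd:
    """Combine each block's first half with the saved previous second half."""

    def __init__(self, a):
        self.a = a
        self.saved = [0.0] * a          # nothing saved before the first block

    def push(self, block):
        """Accept one reconstructed 2A-sample block, return A output samples."""
        out = [x + y for x, y in zip(block[:self.a], self.saved)]
        self.saved = list(block[self.a:])   # keep second half for next call
        return out


A = 4
window = triangular_window(2 * A)
signal = [float(t) for t in range(16)]
adder = OverlapAdd(A)
outputs = []
for start in range(0, len(signal) - 2 * A + 1, A):
    block = signal[start:start + 2 * A]
    windowed = [w * x for w, x in zip(window, block)]
    # Spectral processing and the inverse FFT would happen here.
    outputs.append(adder.push(windowed))
```

  • Apart from the first output, which has no saved half to combine with and therefore ramps up, each output of A samples reproduces the corresponding input samples, which shows how the complementary window halves restore a uniform envelope.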
  • For best performance, a standard "star search" technique may be used, varying one parameter of the method described above while holding all others fixed. Ideally, this should be repeated for each type of speech and for different noise conditions. One of the most critical parameters is the speech emphasis term, J. This was varied from 1.5 to 2.5 while testing the recognition accuracy for each setting of J. The optimum parameter value indicated was for use of the invention in the presence of freeway road and vehicle noise and for spoken connected digits data.
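  • As a hedged illustration of varying one parameter at a time, the sketch below sweeps each parameter in turn while holding the others at their current best values. The function names are hypothetical, and the evaluation function simply looks up the accuracy figures reported for parameter J in this description rather than running a recognizer.

```python
# Accuracy (%) reported in this description for each tested value of J.
J_ACCURACY = {1.5: 80.0, 1.7: 81.4, 1.85: 84.0, 1.9: 85.6,
              1.95: 81.4, 2.0: 80.7, 2.2: 76.4, 2.5: 67.1}


def star_search(evaluate, start, candidates):
    """Vary one parameter at a time, keeping the best value found for each."""
    best = dict(start)
    for name, values in candidates.items():
        best[name] = max(values, key=lambda v: evaluate({**best, name: v}))
    return best


def lookup_accuracy(params):
    """Stand-in for a real recognition test run (assumption)."""
    return J_ACCURACY[params["J"]]


best = star_search(lookup_accuracy, {"J": 1.5}, {"J": sorted(J_ACCURACY)})
```

  • With real data, the evaluation function would run the recognizer on recorded speech for each candidate setting, and the candidate dictionary would list values for every parameter to be tuned.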
  • As shown in FIG. 3, random noise, indicated by graph 80, has a distinctive 'spike' in its autocorrelation function 82, whereas a sine wave has a periodic auto-correlation function. A segment of speech 84 has strong components that are periodic sine waves. Therefore, the speech correlates strongly over several milliseconds, as indicated at 86. In contrast, the noise 80 correlates strongly only at the zero delay point, as indicated by the spike in its autocorrelation function 82. In the correlation domain, the spike due to noise can be easily zeroed out and this is the basis of the spectral subtraction approach used in the present invention.
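  • The contrast between the two autocorrelation shapes can be reproduced numerically. The sketch below is an illustration only: it uses synthetic Gaussian noise and a pure sine wave rather than the signals of FIG. 3, and the signal length, period and random seed are arbitrary choices.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate, normalised so that lag 0 equals 1."""
    n = len(x)
    r0 = sum(v * v for v in x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) / r0
            for lag in range(max_lag + 1)]


N = 2000
PERIOD = 20                                   # samples per sine cycle
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
sine = [math.sin(2 * math.pi * t / PERIOD) for t in range(N)]

noise_acf = autocorrelation(noise, 2 * PERIOD)
sine_acf = autocorrelation(sine, 2 * PERIOD)
```

  • The noise autocorrelation is close to zero at every lag except zero, whereas the sine autocorrelation returns to nearly one at each multiple of the period and to nearly minus one halfway between.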
  • The system of the invention has been tested under practical conditions in a moving vehicle, on a freeway with the windows closed and air-conditioning on, and also with the windows partly open. Two types of microphones were considered, omni-directional and unidirectional. Not unexpectedly, the unidirectional microphone led to significantly better recognition accuracy for all background noise levels. The highest recognition accuracy obtained was 86% from freeway driving with the windows up and air conditioning on using connected digits speech data.
  • The in-vehicle data were initially collected using a digital recorder and the microphone placement was selected to maximize signal-to-noise ratio (SNR). For both the omni-directional and the unidirectional microphone the position that yields the greatest signal was just above the driver's visor (i.e., directly in front of the source). All the tests were conducted using the passenger as the point source for speech. Since the car cabin is symmetric, the results for the driver's side are expected to be equivalent to those obtained from the passenger side. The speech recorded on the digital recorder in the automobile was sampled at 44.1 kHz and subsequently down-sampled to 8 kHz. In order to ensure the integrity of the audio files after down sampling, the files were tested with an automatic speech recognition (ASR) system. No degradation in ASR performance was observed for a file recorded at 44.1 kHz and down-sampled to 8 kHz.
  • In ASR systems, the recognition accuracy is calculated in terms of a digit error rate. The numbers of substitutions (S), deletions (D) and insertions (I) are summed and divided by the total number of digits (N) tested: $\text{Error} = \frac{S + D + I}{N} \times 100$
  • A software package designed by Lernout and Hauspie, ASR1500, was utilized for testing since it allowed for connected digits and has a relatively short response time. The vocabulary tested consisted of eleven digits: 1-9, zero and oh. Connected digits were selected in order to account for the co-articulation factors in the recognition process. In the test procedure, each digit is pronounced approximately fifteen times during a dialogue of a random series of connected digits.
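  • The error-rate formula is simple enough to state as a function. In the sketch below the function name and the counts are hypothetical and are not measurements from the tests described here.

```python
def digit_error_rate(substitutions, deletions, insertions, total_digits):
    """Percentage digit error rate: (S + D + I) / N * 100."""
    if total_digits <= 0:
        raise ValueError("total_digits must be positive")
    return 100.0 * (substitutions + deletions + insertions) / total_digits


# Hypothetical counts for a test of 165 spoken digits.
error = digit_error_rate(substitutions=10, deletions=8, insertions=5,
                         total_digits=165)
```

  • Because insertions are counted, the error rate can exceed 100% when the recognizer outputs many more digits than were spoken.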
  • The recognition accuracy for the digits is significantly improved after the removal of the background noise. With the windows up and air-conditioning on, recognition rates improved from 47% to 86% for a unidirectional microphone, and from 16% to 78% for an omni-directional microphone. With the windows partly open, recognition rates improved from 46% to 83% for a unidirectional microphone, and from less than 10% to 39% for the omni-directional microphone.
  • As shown in FIG. 4, a background noise level monitoring system 90 may be incorporated into the standard noise cancellation system of the invention, which would then operate only when a specified level of background noise is present. This would eliminate speech degradation from the processing when there is no background noise. The decision need not be a "hard" (on or off) one. Rather the modified system would appropriately blend the processed and unprocessed speech in a continuously varying manner such that the effect of turning on the processing in high noise conditions would not be noticeable to the system user. By way of example, in this embodiment of the invention the monitored noise level is compared against an upper threshold, as indicated in decision block 92, and if the noise exceeds the threshold, the system selects processed (noise-reduced) speech as indicated in rectangle 94. If the monitored noise level is currently below the upper threshold, it is compared with a lower threshold, as indicated in decision block 96. If the noise is below the lower threshold, the original unprocessed speech is selected, as indicated in rectangle 98. If the monitored noise is between the upper and lower thresholds, the system selects a blend of inputs from the original speech and noise-reduced speech signals, as indicated in rectangle 100.
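  • One way to realize the three-way selection of FIG. 4 is sketched below. The text does not specify how the blend is computed, so the linear crossfade between the two thresholds is an assumption, as are the function names and the handling of noise levels exactly equal to a threshold.

```python
def blend_weight(noise_level, lower, upper):
    """Fraction of processed speech in the output, between 0.0 and 1.0."""
    if noise_level >= upper:
        return 1.0          # high noise: processed speech only
    if noise_level <= lower:
        return 0.0          # low noise: original speech only
    # Assumed: linear crossfade between the two thresholds.
    return (noise_level - lower) / (upper - lower)


def select_output(original, processed, noise_level, lower, upper):
    """Blend original and noise-reduced samples according to the noise level."""
    w = blend_weight(noise_level, lower, upper)
    return [w * p + (1.0 - w) * o for o, p in zip(original, processed)]


mixed = select_output([0.0, 2.0], [1.0, 4.0],
                      noise_level=2.0, lower=1.0, upper=3.0)
```

  • Because the weight changes continuously with the monitored noise level, the output does not switch abruptly when the noise level crosses either threshold.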
  • In another embodiment of the invention, the noise reduction system is incorporated into an automatic speech recognition (ASR) system 104 (FIG. 5). The noise reduction system is the same as the one illustrated in FIG. 1, but without the final inverse FFT process. This will eliminate some of the speech artifacts that are created when transforming back to the time domain waveform. Where the application calls for voice control of the ASR system only, there is no need to reconstruct the time domain waveform. The inverse FFT function is eliminated from the noise cancellation system and the output of the noise cancellation system is coupled directly to frequency domain inputs 106 of the ASR system 104, which generates appropriate output control signals 108 in response to detection of input speech commands.
  • It will be appreciated from the foregoing that the present invention represents a significant advance in noise reduction for a single microphone installed in a noisy environment, such as a moving automobile. In particular, the invention provides a "cleaned" or noise-reduced speech signal that is more intelligible to the human ear and improves reliability of ASR systems. The system of the invention produces either time-domain output for transmission over voice communication systems, or frequency-domain output for direct connection to an ASR system. It will also be appreciated that, although a number of embodiments have been described in detail for purposes of illustration, various modifications may be made without departing from the spirit and scope of the invention. Accordingly, the invention should not be limited except as by the appended claims.

Claims (7)

  1. A noise reduction system for a single microphone in a noise environment, the system comprising:
    a fast Fourier transform (FFT) circuit for transforming blocks of input microphone data to a frequency domain representation;
    a bandpass filter to remove selected frequency bands in which noise is known to be present;
    a speech detector for sensing the presence of speech signals in microphone data;
    a noise spectrum estimator updated only for data blocks in which no speech signals are detected;
    a spectrum subtraction circuit, for subtracting the estimated noise spectrum from microphone signals containing noise and speech signal components; and
    a speech emphasis circuit, for emphasizing speech signal components with respect to any residual noise after operation of the spectrum subtraction circuit, to provide a noise-reduced speech signal in the frequency domain.
  2. A noise reduction system as defined in claim 1, and further comprising:
    means for reconstructing time-domain data from the noise-reduced speech signal in the frequency domain, including an inverse fast Fourier transform circuit for transforming blocks of data from the frequency domain back into the time domain, whereby the noise-reduced speech signals are more intelligible in voice communication systems.
  3. A noise reduction system as defined in claim 1, and further comprising:
    an automatic speech recognition (ASR) system connected to receive the noise-reduced speech signals in the frequency domain, whereby the ASR system operates more reliably to generate selected control signals.
  4. A noise reduction system as defined in claim 2, wherein:
    input signals are presented to the noise reduction system in blocks of "A" samples each;
    data blocks of size "2A" samples each are presented to the FFT circuit;
    the system further comprises means for combining input signal blocks of "A" samples in pairs to form data blocks;
    the means for combining input signal blocks uses each input signal block twice, such that a currently input signal block is placed in a second half of a current data block and is then placed in a first half of a next data block;
    the system further comprises means for applying a triangular weighting window to each data block; and
    the means for reconstructing time-domain data includes means for combining the first half of each reconstructed data block with the second half of a reconstructed data block saved from processing the previous data block, whereby time-domain samples with a uniform envelope are reconstructed and unwanted artifacts of block processing are minimized.
  5. A method for reducing noise in signals generated by a single microphone in a noise environment, the method comprising the steps of:
    transforming blocks of input data from a single microphone from a time-domain representation to a frequency-domain representation;
    filtering out selected frequency bands to minimize the effect of known noise sources;
    detecting the presence of speech in each block of data signals;
    estimating noise by updating a noise spectrum estimate when no speech is detected;
    subtracting the noise spectrum estimate from input speech and noise signals; and
    emphasizing speech signal components with respect to noise signal components, by raising the result of the subtracting step to the Nth power, where N is a positive quantity greater than one, to provide frequency-domain speech signal data with a reduced noise content.
  6. A method as defined in claim 5, and further comprising:
    reconstructing time-domain data from the noise-reduced speech signal in the frequency domain, including transforming blocks of data from the frequency domain back into the time domain, whereby the noise-reduced speech signals are more intelligible in voice communication systems.
  7. A method as defined in claim 6, and further including the steps of:
    presenting input signals to the noise reduction system in blocks of "A" samples each;
    presenting data blocks of size "2A" samples each to the FFT circuit;
    combining input signal blocks of "A" samples in pairs to form data blocks, the combining step including using each input signal block twice, such that a currently input signal block is placed in a second half of a current data block and is then placed in a first half of a next data block;
    applying a triangular weighting window to each data block; and
    in the reconstructing step, combining the first half of each reconstructed data block with the second half of a reconstructed data block saved from processing the previous data block, wherein time-domain samples with a uniform envelope are reconstructed and unwanted artifacts of block processing are minimized.
EP00118147A 1999-09-01 2000-08-29 System and method for noise reduction using a single microphone Withdrawn EP1081685A3 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US38826699A 1999-09-01 1999-09-01
US388266 1999-09-01

Publications (2)

Publication Number Publication Date
EP1081685A2 true EP1081685A2 (en) 2001-03-07
EP1081685A3 EP1081685A3 (en) 2002-04-24

Family

ID=23533388

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00118147A Withdrawn EP1081685A3 (en) 1999-09-01 2000-08-29 System and method for noise reduction using a single microphone

Country Status (2)

Country Link
EP (1) EP1081685A3 (en)
JP (1) JP2001092491A (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI19992453A (en) * 1999-11-15 2001-05-16 Nokia Mobile Phones Ltd noise Attenuation
JP7231181B2 (en) * 2018-07-17 2023-03-01 国立研究開発法人情報通信研究機構 NOISE-RESISTANT SPEECH RECOGNITION APPARATUS AND METHOD, AND COMPUTER PROGRAM
WO2023100374A1 (en) * 2021-12-03 2023-06-08 日本電信電話株式会社 Signal processing device, signal processing method, and signal processing program

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5012519A (en) * 1987-12-25 1991-04-30 The Dsp Group, Inc. Noise reduction system
EP0637012A2 (en) * 1990-01-18 1995-02-01 Matsushita Electric Industrial Co., Ltd. Signal processing device
US6038532A (en) * 1990-01-18 2000-03-14 Matsushita Electric Industrial Co., Ltd. Signal processing device for cancelling noise in a signal
US5742927A (en) * 1993-02-12 1998-04-21 British Telecommunications Public Limited Company Noise reduction apparatus using spectral subtraction or scaling and signal attenuation between formant regions

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100421013B1 (en) * 2001-08-10 2004-03-04 삼성전자주식회사 Speech enhancement system and method thereof
GB2437559A (en) * 2006-04-26 2007-10-31 Zarlink Semiconductor Inc System for reducing background noise in a speech signal by use of a fast Fourier transform
GB2437559B (en) * 2006-04-26 2010-12-22 Zarlink Semiconductor Inc Low complexity noise reduction method
US8615393B2 (en) 2006-11-15 2013-12-24 Microsoft Corporation Noise suppressor for speech recognition
US8831936B2 (en) 2008-05-29 2014-09-09 Qualcomm Incorporated Systems, methods, apparatus, and computer program products for speech signal processing using spectral contrast enhancement
CN101320566B (en) * 2008-06-30 2010-10-20 中国人民解放军第四军医大学 Non-air conduction speech reinforcement method based on multi-band spectrum subtraction
US8538749B2 (en) 2008-07-18 2013-09-17 Qualcomm Incorporated Systems, methods, apparatus, and computer program products for enhanced intelligibility
US9202456B2 (en) 2009-04-23 2015-12-01 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for automatic control of active noise cancellation
US9838784B2 (en) 2009-12-02 2017-12-05 Knowles Electronics, Llc Directional audio capture
US9053697B2 (en) 2010-06-01 2015-06-09 Qualcomm Incorporated Systems, methods, devices, apparatus, and computer program products for audio equalization
CN102930870A (en) * 2012-09-27 2013-02-13 福州大学 Bird voice recognition method using anti-noise power normalization cepstrum coefficients (APNCC)
CN102930870B (en) * 2012-09-27 2014-04-09 福州大学 Bird voice recognition method using anti-noise power normalization cepstrum coefficients (APNCC)
US9536540B2 (en) 2013-07-19 2017-01-03 Knowles Electronics, Llc Speech signal separation and synthesis based on auditory scene analysis and speech modeling
JP2015169915A (en) * 2014-03-10 2015-09-28 公立大学法人広島市立大学 Active noise control device and method
CN104978955A (en) * 2014-04-14 2015-10-14 美的集团股份有限公司 Voice control method and system
US9978388B2 (en) 2014-09-12 2018-05-22 Knowles Electronics, Llc Systems and methods for restoration of speech components
WO2016094418A1 (en) * 2014-12-09 2016-06-16 Knowles Electronics, Llc Dynamic local asr vocabulary
US10045140B2 (en) 2015-01-07 2018-08-07 Knowles Electronics, Llc Utilizing digital microphones for low power keyword detection and noise suppression
US9820042B1 (en) 2016-05-02 2017-11-14 Knowles Electronics, Llc Stereo separation and directional suppression with omni-directional microphones
WO2018140020A1 (en) * 2017-01-26 2018-08-02 Nuance Communications, Inc. Methods and apparatus for asr with embedded noise reduction
CN110268471A (en) * 2017-01-26 2019-09-20 诺昂世通讯公司 The method and apparatus of ASR with embedded noise reduction
EP3574499A4 (en) * 2017-01-26 2020-09-09 Nuance Communications, Inc. Methods and apparatus for asr with embedded noise reduction
US11308946B2 (en) 2017-01-26 2022-04-19 Cerence Operating Company Methods and apparatus for ASR with embedded noise reduction
CN110268471B (en) * 2017-01-26 2023-05-02 赛伦斯运营公司 Method and apparatus for ASR with embedded noise reduction
CN111724805A (en) * 2020-06-29 2020-09-29 北京百度网讯科技有限公司 Method and apparatus for processing information
CN114650484A (en) * 2022-05-23 2022-06-21 东莞市云仕电子有限公司 Wireless earphone with automatic noise reduction function and use method thereof
CN114650484B (en) * 2022-05-23 2022-09-06 东莞市云仕电子有限公司 Wireless earphone with automatic noise reduction function and use method thereof

Also Published As

Publication number Publication date
EP1081685A3 (en) 2002-04-24
JP2001092491A (en) 2001-04-06

Similar Documents

Publication Publication Date Title
EP1081685A2 (en) System and method for noise reduction using a single microphone
EP1739657B1 (en) Speech signal enhancement
US8010355B2 (en) Low complexity noise reduction method
US6487257B1 (en) Signal noise reduction by time-domain spectral subtraction using fixed filters
US8249861B2 (en) High frequency compression integration
Meyer et al. Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction
EP1080465B1 (en) Signal noise reduction by spectral subtraction using linear convolution and causal filtering
KR100851716B1 (en) Noise suppression based on Bark band Wiener filtering and modified Doblinger noise estimate
EP2244254B1 (en) Ambient noise compensation system robust to high excitation noise
Yang Frequency domain noise suppression approaches in mobile telephone systems
EP1855456B1 (en) Echo reduction in time-variant systems
EP2416315B1 (en) Noise suppression device
US5878389A (en) Method and system for generating an estimated clean speech signal from a noisy speech signal
US20060222184A1 (en) Multi-channel adaptive speech signal processing system with noise reduction
US6510224B1 (en) Enhancement of near-end voice signals in an echo suppression system
KR20070085729A (en) Noise reduction and comfort noise gain control using Bark band Wiener filter and linear attenuation
US20140244245A1 (en) Method for soundproofing an audio signal by an algorithm with a variable spectral gain and a dynamically modulatable hardness
KR100470523B1 (en) Process and Apparatus for Eliminating Loudspeaker Interference from Microphone Signals
Itoh et al. Environmental noise reduction based on speech/non-speech identification for hearing aids
US7917359B2 (en) Noise suppressor for removing irregular noise
US20060184361A1 (en) Method and apparatus for reducing an interference noise signal fraction in a microphone signal
EP2490218B1 (en) Method for interference suppression
US6507623B1 (en) Signal noise reduction by time-domain spectral subtraction
Esch et al. Combined reduction of time varying harmonic and stationary noise using frequency warping
US11227622B2 (en) Speech communication system and method for improving speech intelligibility

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the European patent

Free format text: AL;LT;LV;MK;RO;SI

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the European patent

Free format text: AL;LT;LV;MK;RO;SI

17P Request for examination filed

Effective date: 20020712

AKX Designation fees paid

Free format text: AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: NORTHROP GRUMMAN CORPORATION

STAA Information on the status of an EP patent application or granted EP patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20050301