US5706395A - Adaptive weiner filtering using a dynamic suppression factor - Google Patents
- Publication number
- US5706395A (application US08/425,125)
- Authority
- US
- United States
- Prior art keywords
- noise
- frame
- filter
- estimate
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
Definitions
- the invention relates to electronic devices, and, more particularly, to speech analysis and synthesis devices and systems.
- Human speech consists of a stream of acoustic signals with frequencies ranging up to roughly 20 KHz; but the band of 100 Hz to 5 KHz contains the bulk of the acoustic energy.
- Telephone transmission of human speech originally consisted of conversion of the analog acoustic signal stream into an analog electrical voltage signal stream (e.g., microphone) for transmission and reconversion to an acoustic signal stream (e.g., loudspeaker) for reception.
- the linear speech production model presumes excitation of a variable filter (which roughly represents the vocal tract) by either a pulse train for voiced sounds or white noise for unvoiced sounds followed by amplification or gain to adjust the loudness.
- the model produces a stream of sounds simply by periodically making a voiced/unvoiced decision plus adjusting the filter coefficients and the gain.
- Markel and Gray Linear Prediction of Speech (Springer-Verlag 1976).
- the linear prediction method partitions a stream of speech samples s(n) into "frames" of, for example, 180 successive samples (22.5 msec intervals for an 8 KHz sampling rate); and the samples in a frame then provide the data for computing the filter coefficients for use in coding and synthesis of the sound associated with the frame.
- Each frame generates coded bits for the linear prediction filter coefficients (LPC), the pitch, the voiced/unvoiced decision, and the gain.
- CELP codebook excitation linear prediction
- CELP first analyzes a speech frame to find the LPC filter coefficients, and then filters the frame with the LPC filter.
- CELP determines a pitch period from the filtered frame and removes this periodicity with a comb filter to yield a noise-looking excitation signal.
- CELP encodes the excitation signals using a codebook.
- CELP transmits the LPC filter coefficients, pitch, gain, and the codebook index of the excitation signal.
- FIG. 1a schematically illustrates an overall system 100 of modules for speech acquisition, noise suppression, analysis, transmission/storage, synthesis, and playback.
- a microphone converts sound waves into electrical signals, and sampling analog-to-digital converter 102 typically samples at 8 KHz to cover the speech spectrum up to 4 KHz.
- System 100 may partition the stream of samples into frames with smooth windowing to avoid discontinuities.
- Noise suppression 104 filters a frame to suppress noise, and analyzer 106 extracts LPC coefficients, pitch, voicing, and gain from the noise-suppressed frame for transmission and/or storage 108.
- the transmission may be any type used for digital information transmission, and the storage may likewise be any type used to store digital information. Of course, types of encoding analysis other than LPC could be used.
- Synthesizer 110 combines the LPC coefficients, pitch, voicing, and gain information to synthesize frames of sampled speech which digital-to-analog convertor (DAC) 112 converts to analog signals to drive a loudspeaker or other playback device to regenerate sound waves.
- FIG. 1b shows an analogous system 150 for voice recognition with noise suppression.
- the recognition analyzer may simply compare input frames with frames from a database or may analyze the input frames and compare parameters with known sets of parameters. Matches found between input frames and stored information provide recognition output.
- an estimate for P_S(ω), and thus s(j), could be obtained from the observed noisy speech y(j) and the noise observed during intervals of (presumed) silence in the observed noisy speech.
- P_Y(ω) as the squared magnitude of the Fourier transform of y(j)
- P_N(ω) as the squared magnitude of the Fourier transform of the observed noise.
- noisy speech autocorrelation is:
- P_Y(ω) equals
- the power spectral density P_N(ω) of the noise signal can be estimated by detection during noise-only periods, so the speech power spectral estimate becomes P_S(ω) = P_Y(ω) - P_N(ω), which is the spectral subtraction.
- This spectral subtraction can attenuate noise substantially, but it has problems including the introduction of fluctuating tonal noises commonly referred to as musical noises.
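As a minimal sketch of the spectral subtraction described above, the following Python function subtracts a noise power estimate from a noisy speech power spectrum bin by bin. The function name, the sample values, and the flooring of negative differences at zero are illustrative assumptions and are not taken from the patent text.

```python
def spectral_subtract(p_y, p_n):
    """Estimate speech power per frequency bin as P_Y - P_N.

    Negative differences are floored at 0.0 (an assumption made here so
    that the result remains a valid power spectrum).
    """
    if len(p_y) != len(p_n):
        raise ValueError("spectra must have the same number of bins")
    return [max(y - n, 0.0) for y, n in zip(p_y, p_n)]


if __name__ == "__main__":
    noisy = [4.0, 1.0, 0.5]   # hypothetical P_Y values
    noise = [1.0, 1.0, 1.0]   # hypothetical P_N values
    print(spectral_subtract(noisy, noise))
```

Bins where the noise estimate exceeds the observed power are the ones that flip between zero and small positive values from frame to frame, which is one way to picture the origin of the musical noise mentioned above.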
- a noncausal Wiener filter cannot be directly applied to provide an estimate for s(j) because speech is not stationary and the power spectral density P_S(ω) is not known.
- the Fourier transforms of the windowed sampled speech signals in systems 100 and 150 can be computed in either fixed point or floating point format. Fixed point is cheaper to implement in hardware but has less dynamic range for a comparable number of bits. Automatic gain control limits the dynamic range of the speech samples by adjusting magnitudes according to a moving average of the preceding sample magnitudes, but this also destroys the distinction between loud and quiet speech. Further, the acoustic energy may be concentrated in a narrow frequency band and the Fourier transform will have large dynamic range even for speech samples with relatively constant magnitude. To compensate for such overflow potential in fixed point format, a few bits may be reserved for large Fourier transform dynamic range; but this implies a loss of resolution for small magnitude samples and consequent degradation of quiet speech. This is especially true for systems which follow a Fourier transform with an inverse Fourier transform.
- the present invention provides speech noise suppression by spectral subtraction filtering improved with filter clamping, limiting, and/or smoothing, plus generalized Wiener filtering with a signal-to-noise ratio dependent noise suppression factor, and plus a generalized Wiener filter based on a speech estimate derived from codebook noisy speech analysis and resynthesis. And each frame of samples has a frame-energy-based scaling applied prior to and after Fourier analysis to preserve quiet speech resolution.
- the invention has advantages including simple speech noise suppression.
- FIGS. 1a-b show speech systems with noise suppression.
- FIG. 2 illustrates a preferred embodiment noise suppression subsystem.
- FIGS. 3-5 are flow diagrams for preferred embodiment noise suppression.
- FIG. 6 is a flow diagram for a framewise scaling preferred embodiment.
- FIGS. 7-8 illustrate spectral subtraction preferred embodiment aspects.
- FIGS. 9a-b show spectral subtraction preferred embodiment systems.
- FIGS. 10a-b illustrate spectral subtraction preferred embodiments with adaptive minimum gain clamping.
- FIG. 11 is a block diagram of a modified Wiener filter preferred embodiment system.
- FIG. 12 shows a codebook based generalized Wiener filter preferred embodiment system.
- FIG. 13 illustrates a preferred embodiment internal precision control system.
- FIG. 2 shows a preferred embodiment noise suppression filter system 200.
- frame buffer 202 partitions an incoming stream of speech samples into overlapping frames of 256-sample size and windows the frames;
- FFT module 204 converts the frames to the frequency domain by fast Fourier transform;
- multiplier 206 pointwise multiplies the frame by the filter coefficients generated in noise filter block 208;
- IFFT module 210 converts back to the time domain by inverse fast Fourier transform.
- Noise suppressed frame buffer 212 holds the filtered output for speech analysis, such as LPC coding, recognition, or direct transmission.
- the filter coefficients in block 208 derive from estimates for the noise spectrum and the noisy speech spectrum of the frame, and thus adapt to the changing input. All of the noise suppression computations may be performed with a standard digital signal processor such as a TMS320C25, which can also perform the subsequent speech analysis, if any. Also, general purpose microprocessors or specialized hardware could be used.
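The per-frame path through system 200 (transform, pointwise multiply, inverse transform) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the FFT is a plain recursive radix-2 routine rather than anything specific to the TMS320C25, and the gains are assumed to be real and symmetric about the Nyquist bin so that the output frame is real.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spec):
    """Inverse FFT computed as conj(FFT(conj(X))) / n."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spec])]


def filter_frame(frame, gains):
    """Transform a frame, multiply pointwise by real gains, transform back."""
    spectrum = fft(frame)
    shaped = [g * s for g, s in zip(gains, spectrum)]
    return [v.real for v in ifft(shaped)]
```

With all gains equal to 1.0 the frame comes back unchanged up to rounding, which is a convenient sanity check before plugging in gains produced by a noise filter.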
- noise suppression filters may also be realized without Fourier transforms; however, the multiplication of Fourier transforms then corresponds to convolution of functions.
- the preferred embodiment noise suppression filters may each be used as the noise suppression blocks in the generic systems of FIGS. 1a-b to yield preferred embodiment systems.
- the smoothed spectral subtraction preferred embodiments have a spectral subtraction filter which (1) clamps attenuation to limit suppression for inputs with small signal-to-noise ratios, (2) increases the noise estimate to avoid filter fluctuations, (3) smooths the noisy speech and noise spectra used for filter definition, and (4) updates a noise spectrum estimate from the preceding frame using the noisy speech spectrum.
- the attenuation clamp may depend upon speech and noise estimates in order to lessen the attenuation (and distortion) for speech; this strategy may depend upon estimates only in a relatively noise-free frequency band.
- FIG. 3 is a flow diagram showing all four aspects for the generation of the noise suppression filter of block 208.
- These preferred embodiments also use a scaled LPC spectral approximation of the noisy speech for a smoothed speech power spectrum estimate as illustrated in the flow diagram of FIG. 4.
- FIG. 4 also illustrates an optional filtered α.
- FIG. 5 illustrates the flow.
- filter definitions may also be used for adaptive scaling of low power signals to avoid loss of precision during FFT or other operations.
- the scaling factor adapts to each frame so that with fixed-point digital computations the scale expands or contracts the samples to provide a constant overflow headroom, and after the computations the inverse scale restores the frame power level.
- FIG. 6 illustrates the flow. This scaling applies without regard to automatic gain control and could even be used in conjunction with an automatic gain controlled input.
- FIG. 3 illustrates as a flow diagram the various aspects of the spectral subtraction preferred embodiments as used to generate the filter.
- spectral subtraction filter consists of applying a frequency-dependent attenuation to each frequency in the noisy speech power spectrum with the attenuation tracking the input signal-to-noise power ratio at each frequency. That is, H(ω) represents a linear time-varying filter. Consequently, as shown in FIG.
- FIG. 8 shows the probability distribution of the FFT power spectral estimate at a given frequency of white noise with unity power (labelled "no smoothing"), and illustrates the amount of variation which can be expected.
- the preferred embodiments modify this standard spectral subtraction in four independent but synergistic approaches as detailed in the following.
- each frame has a Hann window of width 256.
- each frame has 256 samples y(j), and the frames add to reconstruct the input speech stream.
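The statement that the windowed frames add back to the input stream can be checked numerically. The sketch below assumes a periodic Hann window and a hop of 128 samples (50% overlap); both the hop size and the helper names are assumptions made for illustration.

```python
import math

FRAME = 256
HOP = 128  # assumed 50% overlap between successive frames


def hann(n=FRAME):
    """Periodic Hann window: 0.5 - 0.5*cos(2*pi*j/n)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * j / n) for j in range(n)]


def overlap_add(samples):
    """Window overlapping frames and add them back together."""
    w = hann()
    out = [0.0] * len(samples)
    for start in range(0, len(samples) - FRAME + 1, HOP):
        for j in range(FRAME):
            out[start + j] += w[j] * samples[start + j]
    return out
```

Away from the first and last half frame, each sample is covered by two windows whose weights sum to one, so the overlap-added output equals the input there.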
- FIG. 7 has this labelled as "clamped" and illustrates a 10 dB clamp.
- the clamping prevents the noise suppression filter H(ω) from fluctuating around very small gain values, and also reduces potential speech signal distortion.
- the corresponding filter would be:
- the 10 dB clamp could be replaced with any other desirable clamp level, such as 5 dB or 20 dB.
- the clamping could include a sloped clamp or stepped clamping or other more general clamping curves, but a simple clamp lessens computational complexity.
- the following "Adaptive filter clamp" section describes a clamp which adapts to the input signal energy level.
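A simple constant clamp can be sketched as a floor on the per-bin gain. The code below assumes the unclamped gain is the power gain 1 - P_N/P_Y and that the clamp level in dB applies to that power gain (so 10 dB corresponds to a floor of 0.1); the filter equation itself is not reproduced in this text, so both choices, along with the function name, are assumptions.

```python
def clamped_gain(p_y, p_n, clamp_db=10.0):
    """Power gain 1 - P_N/P_Y for one bin, floored at -clamp_db dB."""
    floor = 10.0 ** (-clamp_db / 10.0)   # 10 dB -> 0.1 (assumed power-gain floor)
    if p_y <= 0.0:
        return floor                     # no input power: fall back to the floor
    return max(floor, 1.0 - p_n / p_y)
```

For high signal-to-noise bins the gain is left alone, while bins at or below the noise level all receive the same fixed attenuation instead of fluctuating near zero.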
- FIG. 7 shows a 5 dB noise increase factor with the resulting attenuation curve labelled "noise increased". Further, the factor could vary with frequency such as more noise increase (i.e., more attenuation) at low frequencies.
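The effect of a noise increase factor can be illustrated by scaling the noise estimate before the subtraction. The sketch assumes a power-domain factor of 10^(dB/10) applied to P_N and a power-gain floor as the clamp; the function name and these conventions are assumptions, not the patent's stated formula.

```python
def noise_increased_gain(p_y, p_n, increase_db=5.0, clamp_db=10.0):
    """Power gain with the noise estimate scaled up by increase_db."""
    factor = 10.0 ** (increase_db / 10.0)   # 5 dB -> about 3.16
    floor = 10.0 ** (-clamp_db / 10.0)      # assumed power-gain floor
    if p_y <= 0.0:
        return floor
    return max(floor, 1.0 - factor * p_n / p_y)
```

Because the scaled noise term is larger, the gain reaches the clamp at a higher input signal-to-noise ratio, so bins near the noise level sit at the floor more consistently.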
- More spectral smoothing reduces noise fluctuations in the filtered speech signal because it reduces the variance of spectral estimation for noisy frames; however, spectral smoothing decreases the spectral resolution so that the noise suppression attenuation filter cannot track sharp spectral characteristics.
- the preferred embodiment operates with sampling at 8 KHz and windows the input into frames of size 256 samples (32 milliseconds); thus an FFT on the frame generates the Fourier transform as a function on a domain of 256 frequency values. Take the smoothing window W(ω) to have a width of 32 frequencies, so convolution with W(ω) averages over 32 adjacent frequencies. W(ω) may be a simple rectangular window or any other window.
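Smoothing with a rectangular window of width 32 amounts to a moving average over adjacent frequency bins. The following sketch treats the 256-bin spectrum as periodic and wraps around at the ends; the wrap-around and the exact placement of the window around each bin are assumptions.

```python
def smooth_spectrum(power, width=32):
    """Circularly average each bin over `width` adjacent bins."""
    n = len(power)
    half = width // 2
    out = []
    for k in range(n):
        # Rectangular window covering bins k-half .. k+half-1, wrapping
        # around because a sampled signal's spectrum is periodic.
        total = sum(power[(k + d) % n] for d in range(-half, width - half))
        out.append(total / width)
    return out
```

A single sharp peak is spread evenly over 32 bins while the total power is preserved, which shows both the variance reduction and the loss of spectral resolution described above.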
- the filter transfer function with such smoothing is:
- Any noise suppression by spectral subtraction requires an estimate of the noise power spectrum.
- Typical methods update an average noise spectrum during periods of nonspeech activity, but the performance of this approach depends upon accurate estimation of speech intervals which is a difficult technical problem.
- Some kinds of acoustic noise may have speech-like characteristics, and if they are incorrectly classified as speech, then the noise estimated will not be updated frequently enough to track changes in the noise environment.
- the preferred embodiment takes noise as any signal which is always present.
- the noise power spectrum estimate can increase up to 3 dB per second or decrease up to 12 dB per second.
- the initial estimate can simply be taken as the first input frame which typically will be silence; of course, other initial estimates could be used such as a simple constant.
- This approach is simple to implement, and is robust in actual performance since it makes no assumptions about the characteristics of either the speech or the noise signals.
- multiplicative factors other than 0.978 and 1.006 could be used provided that the decrease limit exceeds the increase limit. That is, the product of the multiplicative factors is less than 1; e.g., (0.978)(1.006) is less than 1.
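One way to read the rate-limited noise update is that each bin of the noise estimate follows the current input power, but may shrink by no more than the factor 0.978 and grow by no more than the factor 1.006 per frame. The exact update rule is not reproduced in this text, so the sketch below is an assumed form, with a hypothetical function name.

```python
DECREASE = 0.978   # per-frame lower limit factor (from the text)
INCREASE = 1.006   # per-frame upper limit factor (from the text)


def update_noise(p_y, p_n_prev):
    """Move each noise bin toward the current input power, rate limited."""
    updated = []
    for y, n_prev in zip(p_y, p_n_prev):
        lo, hi = DECREASE * n_prev, INCREASE * n_prev
        updated.append(min(hi, max(lo, y)))
    return updated
```

Since the estimate falls faster than it rises, it settles toward the lowest level that is persistently present in each bin, which matches the idea of treating noise as any signal that is always present.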
- a preferred embodiment filter may include one or more of the four modifications, and a preferred embodiment filter combining all four of the foregoing modifications will have a transfer function:
- FIG. 9a shows in block form preferred embodiment noise suppressor 900 which implements a preferred embodiment spectral subtraction with all four of the preferred embodiment modifications.
- FFT module 902 performs a fast Fourier transform of an input frame to give Y(ω)
- magnitude squarer 904 generates |Y(ω)|^2
- noise buffer (memory) 908 holds P'_N(ω)
- ALU (arithmetic logic unit plus memory) 910 compares P_Y and P'_N and computes P_N and updates buffer 908, ALU 912 computes 1-4P_N(ω)/P_Y(ω), clamper 914 computes H(ω), multiplier 920 applies H(ω) to Y(ω), and IFFT module 922 does an inverse Fourier transform to yield the noise-suppression filtered frame.
- Controller 930 provides the timing and enablement signals to the various components.
- the filter attenuation clamp of the preceding section can be replaced with an adaptive filter attenuation clamp. For example, take
- M can be increased in the presence of speech without the listener hearing increased noise. This has the benefit of lessening the attenuation of the speech and thus causing less speech distortion. Because a common response to having difficulty communicating over the phone is to speak louder, decreasing the filter attenuation with increased speech power will lessen distortion and improve speech quality. Simply put, the system will transmit clearer speech the louder a person talks.
- let YP be the sum of the signal power spectrum over the frequency range 1.8 KHz to 4.0 KHz: with a 256-sample frame sampling at 8 KHz and 256-point FFT, this corresponds to frequencies 51π/128 to π. That is,
- YP-NP may become negative for near silent frames, so preserve the minimum clamp at A by ignoring the B(YP-NP) factor when YP-NP is negative. Also, an upper limit of -4 dB for very loud frames could be imposed by replacing B(YP-NP) with min[-4 dB, B(YP-NP)].
- the filter gain clamp could vary between A taken equal to 1000 (0.125), which is roughly -9 dB, and an upper limit for A+B(YP-NP) taken equal to 3000 (0.375), which is roughly -4.4 dB. More conservatively, the clamp could be constrained to the range of 1800 to 2800.
- with M_OLD denoting M from the previous frame, M for the current frame is simply taken equal to (17/16)M_OLD when M_OLD is less than A+B(YP-NP) and (15/16)M_OLD when M_OLD is greater than A+B(YP-NP).
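The slow tracking of the adaptive clamp can be sketched as a target computation followed by one multiplicative step per frame. The constants 1000 and 3000 and the 17/16 and 15/16 steps come from the text; the slope value passed as `b`, the function names, and holding M unchanged when it already equals the target are assumptions.

```python
A = 1000        # minimum clamp (from the text, roughly -9 dB)
UPPER = 3000    # upper limit on A + B*(YP - NP) (from the text)


def clamp_target(yp, np_, b):
    """Target clamp A + B*(YP - NP), ignoring negative differences."""
    diff = max(yp - np_, 0.0)
    return min(UPPER, A + b * diff)


def step_clamp(m_old, target):
    """Move M one multiplicative step (17/16 or 15/16) toward the target."""
    if m_old < target:
        return m_old * 17.0 / 16.0
    if m_old > target:
        return m_old * 15.0 / 16.0
    return m_old  # assumed behavior when M already equals the target
```

Because M changes by about 6% per frame at most, a sudden burst of speech raises the clamp over several frames rather than at once.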
- the preceding adaptive clamp depends linearly on the speech power; however, other dependencies such as quadratic could also be used provided that the functional dependence is monotonic. Indeed, memory in system and slow adaptation rates for M make the clamp nonlinear.
- FIG. 10a heuristically illustrates an adaptive clamp in a form analogous to FIG. 7; of course, the adaptive clamp depends upon the magnitude of the difference of the sums (over a band) of input and noise powers, whereas the independent variable in FIG. 10a is the power ratio at a single frequency. However, as the power ratio increases for "average" frequencies, the magnitude of the difference of the sums of input and noise powers over the band also increases, so the clamp ramps up as indicated in FIG. 10a for "average" frequencies.
- the adaptive clamp could be taken as dependent upon the ratio YP/NP instead of just the difference or on some combination.
- the positive slope of the adaptive clamp could be used to have a greater attenuation (e.g., -15 dB) for the independent variable equal to 0 and ramp up to an attenuation less than the constant clamp (which is -10 dB) for the independent variable greater than 3 dB.
- the adaptive clamp achieves both better speech quality and better noise attenuation than the constant clamp.
- YP and NP could be defined by the previous frame in order to make an implementation on a DSP more memory efficient. For most frames the YP and NP will be close to those of the preceding frame.
- FIG. 9b illustrates in block form preferred embodiment noise suppressor 950 which includes the components of system 900 but with an adaptive clamper 954 which has the additional inputs of YP from filter 956 and NP from filter 960. Insertion of noise suppressor 950 into the system of FIGS. 1a-b as the noise suppression blocks provides preferred embodiment systems in which noise suppressor 950 in part controls the output.
- FIG. 4 is a flow diagram for a modified generalized Wiener filter preferred embodiment. Recall that a generalized Wiener filter with power β equal to 1/2 has a transfer function:
- the preferred embodiments modify the generalized Wiener filter by using an α which tracks the signal-to-noise power ratio of the input rather than just a constant.
- the preferred embodiment may be understood in terms of the following intuitive analysis.
- take P_S(ω) to be cP_Y(ω) for a constant c with P_Y(ω) the power spectrum of the input noisy speech modelled by LPC. That is, the LPC model for y(j) in some sense removes the noise.
- E_Y = cE_Y + E_N
- E_Y is the energy of the noisy speech LPC model and also an estimate for the energy of y(j)
- E_N is the energy of the noise in the frame.
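The energy relation in this intuitive analysis can be solved for the constant c in one step. The short derivation below assumes the relation reads as an equality between the noisy speech energy and the sum of the scaled speech model energy and the noise energy; it is a worked reading of the surrounding definitions rather than a formula quoted from the patent.

```latex
\begin{align*}
E_Y &= c\,E_Y + E_N \\
c\,E_Y &= E_Y - E_N \\
c &= 1 - \frac{E_N}{E_Y}
\end{align*}
```

Under this reading, c approaches 1 when the frame is dominated by speech and falls toward 0 as the noise energy approaches the total frame energy, so the ratio E_N/E_Y is the natural quantity for a suppression factor to track.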
- average α by weighting with the α from the preceding frame to limit discontinuities.
- the value of the constant κ can be increased to obtain higher noise suppression, which does not result in fluctuations in the speech as much as it does for standard spectral subtraction because H(ω) is always nonnegative.
- modified generalized Wiener filter preferred embodiment proceeds through the following steps as illustrated in FIG. 4:
- each frame has a Hann window of width 256.
- each frame has 256 samples y(j) and the frames add to reconstruct the input speech stream.
- r(.) is the autocorrelation function of y(.).
- a counter to keep track of the number of successive frames in which the condition P_Y > 1.006 P'_N(ω) occurs. If 75 successive frames have this condition, then change the multiplier from 1.006 to (1.006)^2 and restart the counter at 0. And if the next successive 75 frames have the condition P_Y > (1.006)^2 P'_N(ω), then change the multiplier from (1.006)^2 to (1.006)^3. Continue in this fashion provided 75 successive frames all satisfy the condition. Once a frame violates the condition, return to the initial multiplier of 1.006.
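The escalating multiplier can be written as a small state machine. The sketch below keeps one counter and one exponent, raises the exponent after every run of 75 frames that exceed the current limit, and resets on the first frame that does not. Keeping this state per frequency bin, and the class and method names, are assumptions made for illustration.

```python
BASE = 1.006
RUN_LENGTH = 75


class IncreaseLimiter:
    """Tracks the upward multiplier 1.006**power for one frequency bin."""

    def __init__(self):
        self.power = 1     # current exponent on 1.006
        self.count = 0     # successive frames exceeding the limit

    @property
    def multiplier(self):
        return BASE ** self.power

    def observe(self, p_y, p_n_prev):
        """Update state from one frame; return the multiplier to use next."""
        if p_y > self.multiplier * p_n_prev:
            self.count += 1
            if self.count == RUN_LENGTH:
                self.power += 1
                self.count = 0
        else:
            self.power = 1
            self.count = 0
        return self.multiplier
```

This lets the noise estimate catch up more quickly after a sustained jump in background level while keeping the slow default rate during ordinary speech.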
- α = κE_N/E_Y to use in the generalized Wiener filter.
- κ will be about 6-7 with larger values for increased noise suppression and smaller values for less.
- α may be filtered by averaging with the preceding frame by:
- α' is the α of the preceding frame. That is, α is computed for the current frame with E_N the energy of the noise estimate P_N(ω) and E_Y the energy of the noisy speech LPC model, and α' is the same expression but for the previous frame.
- FIG. 4 shows this optional filtering with a broken line.
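A sketch of the dynamic suppression factor follows, assuming it is a constant (called `kappa` here, taken as 6.5 from the stated range of about 6-7) times the ratio of noise energy to noisy speech energy. The averaging formula with the preceding frame is not reproduced in this text, so the equal-weight average below is purely an assumption, as is the function name.

```python
KAPPA = 6.5  # assumed value within the "about 6-7" range given in the text


def suppression_factor(e_n, e_y, alpha_prev=None, kappa=KAPPA):
    """Return kappa * E_N / E_Y, optionally averaged with the previous frame."""
    alpha = kappa * e_n / e_y
    if alpha_prev is None:
        return alpha
    # Equal weights are an assumption; the source's weighting is not shown.
    return 0.5 * (alpha + alpha_prev)
```

The factor shrinks as the frame energy rises relative to the noise, so loud speech frames are filtered less aggressively than frames that are mostly noise.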
- an adaptive clamp could be used.
- extend H_2(ω) (or H_3(ω) if used) to the range π<ω<2π or -π<ω<0 by symmetry to define H(ω).
- the periodicity of H(ω) makes these extensions equivalent.
- FIG. 11 shows in block form preferred embodiment noise suppressor 1100 which implements the nonoptional functions of a modified generalized Wiener filter preferred embodiment.
- FFT module 1102 performs a fast Fourier transform of an input frame to give Y(.) and autocorrelator 1104 performs autocorrelation on the input frame to yield r(.).
- LPC coefficient analyzer 1106 derives the LPC coefficients a_j, and ALU 1108 then forms the power estimate P_Y(.) plus the frame energy estimate E_Y.
- ALU 1110 uses P_Y(.) to update the noise power estimate P'_N held in noise buffer 1112 to give P_N which is stored in noise buffer 1112.
- ALU 1110 also generates E_N, which, together with E_Y from ALU 1108, allows ALU 1114 to find α.
- ALU 1116 takes the outputs of ALUs 1108, 1110, and 1114 to derive the first approximation H_1 and clamper 1118 then yields H_2 to be used in multiplier 1120 to perform the filtering.
- IFFT module 1122 performs the inverse FFT to yield the output filtered frame.
- Each component has associated buffer memory, and controller 1130 provides the timing and enablement signals to the various components. The adaptive clamp could be used for clamper 1118.
- Insertion of noise suppressor 1100 into the systems of FIGS. 1a-b as the noise suppression block provides preferred embodiment systems in which noise suppressor 1100 in part controls the output.
- FIG. 5 illustrates the flow for codebook-based generalized Wiener filter noise suppression preferred embodiments having filter transfer functions:
- the preferred embodiments estimate the noise P_N(ω) in the same manner as step (5) of the previously described generalized Wiener filter preferred embodiments, and estimate P_S(ω) by the use of the line spectral frequencies (LSF) of the input noisy speech as weightings for LSFs from a codebook of noise-free speech samples.
- r(.) is the autocorrelation of y(.). This again follows the modified generalized Wiener filter preferred embodiments.
- the gain of the LPC spectrum is Σ_i a_i r(i).
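To make the gain expression concrete, the sketch below derives LPC coefficients from an autocorrelation sequence with the Levinson-Durbin recursion and then evaluates the sum of a_i r(i). It assumes the prediction error filter convention A(z) = 1 + a_1 z^-1 + ... with a_0 = 1, which is one common convention and may differ in sign from the patent's; the function names are hypothetical.

```python
def levinson(r, order):
    """Return (a, err): error-filter coefficients a[0..order] with a[0] = 1
    and the final prediction error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = sum(a[i] * r[m - i] for i in range(m))
        k = -acc / err                    # reflection coefficient
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


def lpc_gain(a, r):
    """Gain of the LPC spectrum as the sum over i of a[i] * r[i]."""
    return sum(ai * ri for ai, ri in zip(a, r))
```

With this convention the sum equals the residual prediction error energy produced by the recursion, so the two values can be cross-checked against each other.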
- each codebook entry is a set of M LSFs in size order.
- the codebook has 256 of such entries which have been determined by conventional vector quantization training (e.g., LBG algorithm) on sets of M LSFs from noise-free speech samples.
- LSF_{n,c(i)} is the noisy speech frame LSF which is the closest to LSF_{n,i} (so c(i) will be either i-1 or i+1 if the LSF_{n,i} are in size order).
- this distance measure is dominated by the LSF_{n,i} which are close to each other, and this provides good results because such LSFs have a higher chance of being formants in the noisy speech frame.
- Σ_i a_i r(i) is the gain of the LPC spectrum from step (3).
- FIG. 5 shows the iteration path
- FIG. 12 shows in block form preferred embodiment noise suppressor 1200 which implements the codebook modified generalized Wiener filter preferred embodiment.
- FFT 1202 performs a fast Fourier transform of an input frame to give Y(.) and autocorrelator 1204 performs autocorrelation on the input frame to yield r(.).
- LPC coefficient analyzer 1206 derives the LPC coefficients a_j
- LPC-to-LSF converter 1208 gives the LSF coefficients to ALU 1210.
- Codebook 1212 provides codebook LSF coefficients to ALU 1210 which then forms the noise-free signal LSF coefficient estimates and passes them to LSF-to-LPC converter 1214 for conversion to LPC estimates and then to ALU 1216 to form power estimate P_Y(.).
- Noise buffer 1220 and ALU 1222 update the noise estimate P_N(.) as with the preceding preferred embodiments, and ALU 1224 uses P_Y(.) and P_N(.) to form the first approximation unclamped H_1 and clamper 1226 then yields clamped H_1 to be used in multiplier 1230 to perform the filtering.
- IFFT 1232 performs the inverse FFT to yield the first approximation filtered frame. Iteration counter 1234 sends the first approximation filtered frame back to autocorrelator 1204 to start generation of a second approximation filter H_2.
- This second approximation filter applied to Y(.) yields the second approximation filtered frame which iteration counter 1234 again sends back to autocorrelator 1204 to start generation of a third approximation H 3 . Iteration counter repeats this six times to finally yield a seventh approximation filter and filtered frame which then becomes the output filtered frame.
- Each component has associated buffer memory, and controller 1240 provides the timing and enablement signals to the various components.
- the adaptive clamp could be used for clamper 1226.
- Insertion of noise suppressor 1200 into the systems of FIGS. 1a-b as the noise suppression blocks provides preferred embodiment systems in which noise suppressor 1200 in part controls the output.
- the preferred embodiments employ various operations such as FFT, and with low power frames the signal samples are small and precision may be lost in multiplications. For example, squaring a 16-bit fixed-point sample will yield a 32-bit result, but memory limitations may demand that only 16 bits be stored and so only the upper 16 bits will be chosen to avoid overflow. Thus an input sample with only the lowest 9 bits nonzero will have an 18-bit answer, which implies that only the two most significant bits will be retained and thus a loss of precision.
- An automatic gain control to bring input samples up to a higher level avoids such a loss of precision but destroys the power level information: both loud and quiet input speech will have the same power output levels. Also, such automatic gain control typically relies on the sample stream and does not consider a frame at a time.
- a preferred embodiment precision control method proceeds as follows.
- the frame scaling factor so as to set the average sample size to have (2N+8-S)/2-H significant bits where H is an integer, such as 3, of additional headroom bits. That is, the frame scaling factor is $2^{(2N+8-S)/2-H}$.
- the scaling factor equals $2^{N-K-H}$. For example, with 16-bit format and 3 overhead bits, if the average sample magnitude is $2^{-9}$ (7 significant bits), then the scaling factor will be $2^{5}$ so the average scaled sample magnitude is $2^{-4}$, which leaves 3 bits ($2^{3}$) before overflow occurs at $2^{0}$.
- variable j presumed translated into the range -128 to +127. Do this windowing before the scaling to help avoid overflow on the much larger than average samples as they could fall at the edges of the window. Of course, this windowing could follow the scaling of the next step.
- An alternative precision control scaling uses the sum of the absolute values of the samples in a frame rather than the power estimate (sum of the squares of the samples).
- count the number S of significant bits in the sum of absolute values and scale the input samples by a factor of $2^{N+8-S-H}$ where again N+1 is the number of bits in the sample representation, the 8 comes from the 256 ($2^{8}$) sample frame size, and H provides headroom bits.
- the sum of absolute values should be about K+8 bits, and so S will be about K+8 and the factor will be $2^{N-K-H}$, which is the same as the power estimate sum scaling.
- scaling factors such as $2^{(2N+8-S)-H}$ have yielded good results. That is, variations of the method of scaling up according to a frame characteristic, processing, and then scaling down will also be viable provided the scaling does not lead to excessive overflow.
- FIG. 13 illustrates in block format an internal precision controller preferred embodiment which could be used with any of the foregoing noise suppression filter preferred embodiments.
- frame energy measurer 1302 determines the scaling factor to be used, and scaler 1304 applies the scaling factor to the input frame.
- Filter 1306 filters the scaled frame, and inverse scaler 1308 then undoes the scaling to return to the original input signal levels.
- Filter 1306 could be any of the foregoing preferred embodiment filters. Parameters from filter 1306 may be part of the scale factor determination by measurer 1302.
- insertion of noise suppressor 1300 into the systems of FIGS. 1a-b provides preferred embodiment systems in which noise suppressor 1300 in part controls the output.
- the preferred embodiments may be varied in many ways while retaining one or more of the features of clamping, noise enhancing, smoothed power estimating, recursive noise estimating, adaptive clamping, adaptive noise suppression factoring, codebook based estimating, and internal precision controlling.
- the various generalized Wiener filters of the preferred embodiments had power β equal to 1/2, but other powers such as 1, 3/4, 1/4, and so forth also apply; higher filter powers imply stronger filtering.
- the frame size of 256 samples could be increased or decreased, although powers of 2 are convenient for FFTs.
- the particular choice of 3 bits of additional headroom could be varied, especially with different size frames and different number of bits in the sample representation.
- the adaptive clamp could have a negative dependence upon frame noise and signal estimates (B<0). Also, the adaptive clamp could invoke a near-end speech detection method to adjust the clamp level.
- the ⁇ and ⁇ coefficients could be varied and could enter the transfer functions as simple analytic functions of the ratios, and the number of iterations in the codebook based generalized Wiener filter could be varied.
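The fixed-point arithmetic in the precision control discussion above can be checked with a short sketch. It is only an illustration under stated assumptions: samples are modeled as plain Python integers in a 16-bit format (N = 15), the function names are hypothetical, and the exponent N+8-S-H for the sum-of-absolute-values scaling is inferred from the statement that the factor reduces to 2^(N-K-H) when S is about K+8.

```python
def kept_bits_after_squaring(sample, word_bits=16):
    """Square a sample and keep only the upper half of the double-width product."""
    product = sample * sample
    kept = product >> word_bits            # upper 16 bits of the 32-bit result
    return product.bit_length(), kept.bit_length()


def abs_sum_shift(frame, n=15, headroom=3, log2_frame=8):
    """Exponent of the power-of-two scale derived from the sum of absolute values.

    Assumption: the exponent is N + 8 - S - H, with S the number of
    significant bits in the sum and 8 = log2(256) for a 256-sample frame.
    """
    s = sum(abs(x) for x in frame).bit_length()
    return n + log2_frame - s - headroom


# A sample whose only nonzero bits are the lowest 9 bits.
product_bits, kept_bits = kept_bits_after_squaring(0x1FF)

# A quiet frame: every sample is 2**6, i.e. 7 significant bits (K = 7),
# which corresponds to a magnitude of 2**-9 in a 16-bit fractional format.
quiet_frame = [64] * 256
shift = abs_sum_shift(quiet_frame)
```

Under these assumptions the squared sample occupies 18 bits and only 2 survive truncation, and the quiet frame receives a scale exponent of 5, matching the worked example in the text.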
Abstract
An acoustic noise suppression filter including attenuation filtering with a noise suppression factor depending upon the ratio of estimated noise energy of a frame divided by estimated signal energy.
Description
Cofiled patent applications with Ser. Nos. 08/424,928, 08/426,426, 08/426,746 and 08/426,427 disclose related subject matter. These applications all have a common assignee.
The invention relates to electronic devices, and, more particularly, to speech analysis and synthesis devices and systems.
Human speech consists of a stream of acoustic signals with frequencies ranging up to roughly 20 KHz; but the band of 100 Hz to 5 KHz contains the bulk of the acoustic energy. Telephone transmission of human speech originally consisted of conversion of the analog acoustic signal stream into an analog electrical voltage signal stream (e.g., microphone) for transmission and reconversion to an acoustic signal stream (e.g., loudspeaker) for reception.
The advantages of digital electrical signal transmission led to a conversion from analog to digital telephone transmission beginning in the 1960s. Typically, digital telephone signals arise from sampling analog signals at 8 KHz and nonlinearly quantizing the samples with 8-bit codes according to the μ-law (pulse code modulation, or PCM). A clocked digital-to-analog converter and companding amplifier reconstruct an analog electrical signal stream from the stream of 8-bit samples. Such signals require transmission rates of 64 Kbps (kilobits per second). Many communications applications, such as digital cellular telephone, cannot handle such a high transmission rate, and this has inspired various speech compression methods.
The storage of speech information in analog format (e.g., on magnetic tape in a telephone answering machine) can likewise be replaced with digital storage. However, the memory demands can become overwhelming: 10 minutes of 8-bit PCM sampled at 8 KHz would require about 5 MB (megabytes) of storage. This demands speech compression analogous to digital transmission compression.
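The rate and storage figures quoted above follow from simple arithmetic. The lines below are a small check, with variable names chosen here for illustration only:

```python
SAMPLE_RATE_HZ = 8000      # 8 KHz sampling
BITS_PER_SAMPLE = 8        # 8-bit PCM codes

# Transmission rate of the PCM stream in bits per second.
bit_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE

# Storage needed for 10 minutes of the same stream, in bytes.
seconds = 10 * 60
storage_bytes = seconds * SAMPLE_RATE_HZ * BITS_PER_SAMPLE // 8
```

This gives 64,000 bits per second and 4,800,000 bytes, consistent with the 64 Kbps and "about 5 MB" figures.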
One approach to speech compression models the physiological generation of speech and thereby reduces the necessary information transmitted or stored. In particular, the linear speech production model presumes excitation of a variable filter (which roughly represents the vocal tract) by either a pulse train for voiced sounds or white noise for unvoiced sounds followed by amplification or gain to adjust the loudness. The model produces a stream of sounds simply by periodically making a voiced/unvoiced decision plus adjusting the filter coefficients and the gain. Generally, see Markel and Gray, Linear Prediction of Speech (Springer-Verlag 1976).
More particularly, the linear prediction method partitions a stream of speech samples s(n) into "frames" of, for example, 180 successive samples (22.5 msec intervals for an 8 KHz sampling rate); and the samples in a frame then provide the data for computing the filter coefficients for use in coding and synthesis of the sound associated with the frame. Each frame generates coded bits for the linear prediction filter coefficients (LPC), the pitch, the voiced/unvoiced decision, and the gain. This approach of encoding only the model parameters represents far fewer bits than encoding the entire frame of speech samples directly, so the transmission rate may be only 2.4 Kbps rather than the 64 Kbps of PCM. In practice, the LPC coefficients must be quantized for transmission, and the sensitivity of the filter behavior to the quantization error has led to quantization based on the Line Spectral Frequencies (LSF) representation.
To improve the sound quality, further information may be extracted from the speech, compressed and transmitted or stored along with the LPC coefficients, pitch, voicing, and gain. For example, the codebook excitation linear prediction (CELP) method first analyzes a speech frame to find the LPC filter coefficients, and then filters the frame with the LPC filter. Next, CELP determines a pitch period from the filtered frame and removes this periodicity with a comb filter to yield a noise-looking excitation signal. Lastly, CELP encodes the excitation signals using a codebook. Thus CELP transmits the LPC filter coefficients, pitch, gain, and the codebook index of the excitation signal.
The advent of digital cellular telephones has emphasized the role of noise suppression in speech processing, both coding and recognition. Customer expectation of high performance even in extreme car noise situations plus the demand to move to progressively lower data rate speech coding in order to accommodate the ever-increasing number of cellular telephone customers have contributed to the importance of noise suppression. While higher data rate speech coding methods tend to maintain robust performance even in high noise environments, that typically is not the case with lower data rate speech coding methods. The speech quality of low data rate methods tends to degrade drastically with high additive noise. Noise suppression to prevent such speech quality losses is important, but it must be achieved without introducing any undesirable artifacts or speech distortions or any significant loss of speech intelligibility. These performance goals for noise suppression have existed for many years, and they have recently come to the forefront due to digital cellular telephone application.
FIG. 1a schematically illustrates an overall system 100 of modules for speech acquisition, noise suppression, analysis, transmission/storage, synthesis, and playback. A microphone converts sound waves into electrical signals, and sampling analog-to-digital converter 102 typically samples at 8 KHz to cover the speech spectrum up to 4 KHz. System 100 may partition the stream of samples into frames with smooth windowing to avoid discontinuities. Noise suppression 104 filters a frame to suppress noise, and analyzer 106 extracts LPC coefficients, pitch, voicing, and gain from the noise-suppressed frame for transmission and/or storage 108. The transmission may be any type used for digital information transmission, and the storage may likewise be any type used to store digital information. Of course, types of encoding analysis other than LPC could be used. Synthesizer 110 combines the LPC coefficients, pitch, voicing, and gain information to synthesize frames of sampled speech which digital-to-analog convertor (DAC) 112 converts to analog signals to drive a loudspeaker or other playback device to regenerate sound waves.
FIG. 1b shows an analogous system 150 for voice recognition with noise suppression. The recognition analyzer may simply compare input frames with frames from a database or may analyze the input frames and compare parameters with known sets of parameters. Matches found between input frames and stored information provide recognition output.
One approach to noise suppression in speech employs spectral subtraction and appears in Boll, Suppression of Acoustic Noise in Speech Using Spectral Subtraction, 27 IEEE Tr. ASSP 113 (1979), and Lim and Oppenheim, Enhancement and Bandwidth Compression of Noisy Speech, 67 Proc. IEEE 1586 (1979). Spectral subtraction proceeds roughly as follows. Presume a sampled speech signal s(j) with uncorrelated additive noise n(j) to yield an observed windowed noisy speech y(j)=s(j)+n(j). These are random processes over time. Noise is assumed to be a stationary process in that the process's autocorrelation depends only on the difference of the variables; that is, there is a function $r_N(\cdot)$ such that:
$$E\{n(j)\,n(i)\} = r_N(i-j)$$
where E is the expectation. The Fourier transform of the autocorrelation is called the power spectral density, $P_N(\omega)$. If speech were also a stationary process with autocorrelation $r_S(j)$ and power spectral density $P_S(\omega)$, then the power spectral densities would add due to the lack of correlation:
$$P_Y(\omega) = P_S(\omega) + P_N(\omega)$$
Hence, an estimate for $P_S(\omega)$, and thus s(j), could be obtained from the observed noisy speech y(j) and the noise observed during intervals of (presumed) silence in the observed noisy speech. In particular, take $P_Y(\omega)$ as the squared magnitude of the Fourier transform of y(j) and $P_N(\omega)$ as the squared magnitude of the Fourier transform of the observed noise.
Of course, speech is not a stationary process, so Lim and Oppenheim modified the approach as follows. Take s(j) not to represent a random process but rather to represent a windowed speech signal (that is, a speech signal which has been multiplied by a window function), n(j) a windowed noise signal, and y(j) the resultant windowed observed noisy speech signal. Then Fourier transforming and multiplying by complex conjugates yields:
$$|Y(\omega)|^2 = |S(\omega)|^2 + |N(\omega)|^2 + 2\,\mathrm{Re}\{S(\omega)N(\omega)^*\}$$
For ensemble averages the last term on the right-hand side of the equation equals zero due to the lack of correlation of noise with the speech signal. This equation thus yields an estimate, $\hat{S}(\omega)$, for the speech signal Fourier transform as:
$$|\hat{S}(\omega)|^2 = |Y(\omega)|^2 - E\{|N(\omega)|^2\}$$
This resembles the preceding equation for the addition of power spectral densities.
An autocorrelation approach for the windowed speech and noise signals simplifies the mathematics. In particular, the autocorrelation for the speech signal is given by
$$r_S(j) = \sum_i s(i)\,s(i+j),$$
with similar expressions for the autocorrelation for the noisy speech and the noise. Thus the noisy speech autocorrelation is:
$$r_Y(j) = r_S(j) + r_N(j) + c_{SN}(j) + c_{SN}(-j)$$
where $c_{SN}(\cdot)$ is the cross correlation of s(j) and n(j). But the speech and noise signals should be uncorrelated, so the cross correlations can be approximated as 0. Hence, $r_Y(j) = r_S(j) + r_N(j)$. And the Fourier transforms of the autocorrelations are just the power spectral densities, so
$$P_Y(\omega) = P_S(\omega) + P_N(\omega)$$
Of course, $P_Y(\omega)$ equals $|Y(\omega)|^2$ with $Y(\omega)$ the Fourier transform of y(j) due to the autocorrelation being just a convolution with a time-reversed variable.
The power spectral density $P_N(\omega)$ of the noise signal can be estimated by detection during noise-only periods, so the speech power spectral estimate becomes
$$|\hat{S}(\omega)|^2 = P_Y(\omega) - P_N(\omega)$$
which is the spectral subtraction.
The spectral subtraction method can be interpreted as a time-varying linear filter $H(\omega)$ so that $\hat{S}(\omega) = H(\omega)Y(\omega)$, which the foregoing estimate then defines as:
$$H(\omega)^2 = [P_Y(\omega) - P_N(\omega)]/P_Y(\omega)$$
The ultimate estimate for the frame of windowed speech, $\hat{s}(j)$, then equals the inverse Fourier transform of $\hat{S}(\omega)$, and then combining the estimates from successive frames ("overlap add") yields the estimated speech stream.
This spectral subtraction can attenuate noise substantially, but it has problems including the introduction of fluctuating tonal noises commonly referred to as musical noises.
The Lim and Oppenheim article also describes an alternative noise suppression approach using noncausal Wiener filtering which minimizes the mean-square error. That is, again $\hat{S}(\omega) = H(\omega)Y(\omega)$ but with $H(\omega)$ now given by:
$$H(\omega) = P_S(\omega)/[P_S(\omega) + P_N(\omega)]$$
This Wiener filter generalizes to:
$$H(\omega) = \left[P_S(\omega)/[P_S(\omega) + \alpha P_N(\omega)]\right]^{\beta}$$
where constants α and β are called the noise suppression factor and the filter power, respectively. Indeed, α=1 and β=1/2 lead to the spectral subtraction method, as shown in the following.
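As a rough numerical illustration of the generalized Wiener gain at a single frequency, the sketch below evaluates the expression for given power values. The function name and the sample numbers are hypothetical, and the speech power is simply taken as the noisy speech power minus the noise power so that the result can be compared with the spectral subtraction gain.

```python
def generalized_wiener_gain(p_s, p_n, alpha=1.0, beta=0.5):
    """Gain H at one frequency: (P_S / (P_S + alpha * P_N)) ** beta."""
    return (p_s / (p_s + alpha * p_n)) ** beta


# Illustrative powers at one frequency (not taken from the source).
p_y, p_n = 4.0, 1.0
p_s = p_y - p_n                      # speech power under the additive model

h = generalized_wiener_gain(p_s, p_n, alpha=1.0, beta=0.5)
spectral_subtraction_h_sq = (p_y - p_n) / p_y
```

With α=1 and β=1/2 the squared gain `h * h` agrees with the spectral subtraction value `(p_y - p_n) / p_y`; raising α or β lowers the gain, i.e. strengthens the suppression.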
A noncausal Wiener filter cannot be directly applied to provide an estimate for s(j) because speech is not stationary and the power spectral density $P_S(\omega)$ is not known. Thus approximate the noncausal Wiener filter by an adaptive generalized Wiener filter which uses the squared magnitude of the estimate $\hat{S}(\omega)$ in place of $P_S(\omega)$:
$$H(\omega) = \left(|\hat{S}(\omega)|^2/[|\hat{S}(\omega)|^2 + \alpha E\{|N(\omega)|^2\}]\right)^{\beta}$$
Recalling $\hat{S}(\omega) = H(\omega)Y(\omega)$ and then solving for $|\hat{S}(\omega)|$ in the β=1/2 case yields:
$$|\hat{S}(\omega)| = [|Y(\omega)|^2 - \alpha E\{|N(\omega)|^2\}]^{1/2}$$
which just replicates the spectral subtraction method when α=1.
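The algebra behind this β=1/2 result can be spelled out as follows. The shorthand $E_N$ for $E\{|N(\omega)|^2\}$ is introduced here only for brevity, the frequency argument is dropped, and the second line assumes $|\hat{S}| \neq 0$ so that it can be divided out:

```latex
\begin{align*}
|\hat{S}|^2 &= H^2\,|Y|^2
             = \frac{|\hat{S}|^2}{|\hat{S}|^2 + \alpha E_N}\,|Y|^2 \\
|\hat{S}|^2 + \alpha E_N &= |Y|^2 \\
|\hat{S}| &= \left[\,|Y|^2 - \alpha E_N\,\right]^{1/2}
\end{align*}
```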
However, this generalized Wiener filtering has problems including how to estimate $\hat{S}$, and estimators usually apply an iterative approach with perhaps a half dozen iterations, which increases computational complexity.
Ephraim, A Minimum Mean Square Error Approach for Speech Enhancement, Conf. Proc. ICASSP 829 (1990), derived a Wiener filter by first analyzing noisy speech to find linear prediction coefficients (LPC) and then resynthesizing an estimate of the speech to use in the Wiener filter.
In contrast, O'Shaughnessy, Speech Enhancement Using Vector Quantization and a Formant Distance Measure, Conf. Proc. ICASSP 549 (1988), computed noisy speech formants and selected quantized speech codewords to represent the speech based on formant distance; the speech was resynthesized from the codewords. This has problems including degradation for high signal-to-noise signals because of the speech quality limitations of the LPC synthesis.
The Fourier transforms of the windowed sampled speech signals in systems 100 and 150 can be computed in either fixed point or floating point format. Fixed point is cheaper to implement in hardware but has less dynamic range for a comparable number of bits. Automatic gain control limits the dynamic range of the speech samples by adjusting magnitudes according to a moving average of the preceding sample magnitudes, but this also destroys the distinction between loud and quiet speech. Further, the acoustic energy may be concentrated in a narrow frequency band and the Fourier transform will have large dynamic range even for speech samples with relatively constant magnitude. To compensate for such overflow potential in fixed point format, a few bits may be reserved for large Fourier transform dynamic range; but this implies a loss of resolution for small magnitude samples and consequent degradation of quiet speech. This is especially true for systems which follow a Fourier transform with an inverse Fourier transform.
The present invention provides speech noise suppression by spectral subtraction filtering improved with filter clamping, limiting, and/or smoothing, plus generalized Wiener filtering with a signal-to-noise ratio dependent noise suppression factor, plus a generalized Wiener filter based on a speech estimate derived from codebook noisy speech analysis and resynthesis. In addition, each frame of samples has a frame-energy-based scaling applied prior to and after Fourier analysis to preserve quiet speech resolution.
The invention has advantages including simple speech noise suppression.
The drawings are schematic for clarity.
FIGS. 1a-b show speech systems with noise suppression.
FIG. 2 illustrates a preferred embodiment noise suppression subsystem.
FIGS. 3-5 are flow diagrams for preferred embodiment noise suppression.
FIG. 6 is a flow diagram for a framewise scaling preferred embodiment.
FIGS. 7-8 illustrate spectral subtraction preferred embodiment aspects.
FIGS. 9a-b show spectral subtraction preferred embodiment systems.
FIGS. 10a-b illustrate spectral subtraction preferred embodiments with adaptive minimum gain clamping.
FIG. 11 is a block diagram of a modified Wiener filter preferred embodiment system.
FIG. 12 shows a codebook based generalized Wiener filter preferred embodiment system.
FIG. 13 illustrates a preferred embodiment internal precision control system.
FIG. 2 shows a preferred embodiment noise suppression filter system 200. In particular, frame buffer 202 partitions an incoming stream of speech samples into overlapping frames of 256-sample size and windows the frames; FFT module 204 converts the frames to the frequency domain by fast Fourier transform; multiplier 206 pointwise multiplies the frame by the filter coefficients generated in noise filter block 208; and IFFT module 210 converts back to the time domain by inverse fast Fourier transform. Noise suppressed frame buffer 212 holds the filtered output for speech analysis, such as LPC coding, recognition, or direct transmission. The filter coefficients in block 208 derive from estimates for the noise spectrum and the noisy speech spectrum of the frame, and thus adapt to the changing input. All of the noise suppression computations may be performed with a standard digital signal processor such as a TMS320C25, which can also perform the subsequent speech analysis, if any. Also, general purpose microprocessors or specialized hardware could be used.
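The transform, pointwise multiply, and inverse transform path of system 200 can be sketched in a few lines. This is only an illustration: it uses a naive discrete Fourier transform from the Python standard library in place of an FFT, an 8-sample frame instead of 256 samples, and hypothetical function names; the gain values are placeholders rather than filter coefficients computed by block 208.

```python
import cmath


def dft(x):
    """Naive forward discrete Fourier transform (stand-in for FFT module 204)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse transform (stand-in for IFFT module 210)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]


def filter_frame(frame, gains):
    """Multiply the frame's spectrum pointwise by real gains and return to time domain."""
    spectrum = dft(frame)
    filtered = [h * y for h, y in zip(gains, spectrum)]
    return [value.real for value in idft(filtered)]


# Illustrative frame and gains (not from the source).
frame = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 0.0]
passthrough = filter_frame(frame, [1.0] * len(frame))
halved = filter_frame(frame, [0.5] * len(frame))
```

Unity gains return the input frame up to rounding, and a uniform gain of 0.5 halves every sample, which is the expected behavior of a pointwise spectral multiplier.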
The preferred embodiment noise suppression filters may also be realized without Fourier transforms; however, the multiplication of Fourier transforms then corresponds to convolution of functions.
The preferred embodiment noise suppression filters may each be used as the noise suppression blocks in the generic systems of FIGS. 1a-b to yield preferred embodiment systems.
The smoothed spectral subtraction preferred embodiments have a spectral subtraction filter which (1) clamps attenuation to limit suppression for inputs with small signal-to-noise ratios, (2) increases noise estimate to avoid filter fluctuations, (3) smoothes noisy speech and noise spectra used for filter definition, and (4) updates a noise spectrum estimate from the preceding frame using the noisy speech spectrum. The attenuation clamp may depend upon speech and noise estimates in order to lessen the attenuation (and distortion) for speech; this strategy may depend upon estimates only in a relatively noise-free frequency band. FIG. 3 is a flow diagram showing all four aspects for the generation of the noise suppression filter of block 208.
The signal-to-noise ratio adaptive generalized Wiener filter preferred embodiments use $H(\omega) = [P_S(\omega)/[P_S(\omega) + \alpha P_N(\omega)]]^{\beta}$ where the noise suppression factor α depends on $E_Y/E_N$ with $E_N$ the noise energy and $E_Y$ the noisy speech energy for the frame. These preferred embodiments also use a scaled LPC spectral approximation of the noisy speech for a smoothed speech power spectrum estimate as illustrated in the flow diagram of FIG. 4. FIG. 4 also illustrates an optional filtered α.
The codebook-based generalized Wiener filter noise suppression preferred embodiments use $H(\omega) = [P_S(\omega)/[P_S(\omega) + \alpha P_N(\omega)]]^{\beta}$ with $P_S(\omega)$ estimated from LSFs as weighted sums of LSFs in a codebook of LSFs with the weights determined by the LSFs of the input noisy speech. Then iterate: use this $H(\omega)$ to form $H(\omega)Y(\omega)$, next redetermine the input LSFs from $H(\omega)Y(\omega)$, and then redetermine $H(\omega)$ with these LSFs as weights for the codebook LSFs. A half dozen iterations may be used. FIG. 5 illustrates the flow.
The power estimates used in the preferred embodiment filter definitions may also be used for adaptive scaling of low power signals to avoid loss of precision during FFT or other operations. The scaling factor adapts to each frame so that with fixed-point digital computations the scale expands or contracts the samples to provide a constant overflow headroom, and after the computations the inverse scale restores the frame power level. FIG. 6 illustrates the flow. This scaling applies without regard to automatic gain control and could even be used in conjunction with an automatic gain controlled input.
FIG. 3 illustrates as a flow diagram the various aspects of the spectral subtraction preferred embodiments as used to generate the filter. A preliminary consideration of the standard spectral subtraction noise suppression simplifies explanation of the preferred embodiments. Thus first consider the standard spectral subtraction filter: $H(\omega)^2 = 1 - |N(\omega)|^2/|Y(\omega)|^2$. A graph of this function with logarithmic scales appears in FIG. 7 labelled "standard spectral subtraction". Indeed, spectral subtraction consists of applying a frequency-dependent attenuation to each frequency in the noisy speech power spectrum with the attenuation tracking the input signal-to-noise power ratio at each frequency. That is, $H(\omega)$ represents a linear time-varying filter. Consequently, as shown in FIG. 7, the amount of attenuation varies rapidly with input signal-to-noise power ratio, especially when the input signal and noise are nearly equal in power. When the input signal contains only noise, the filtering produces musical noise because the estimated input signal-to-noise power ratio at each frequency fluctuates due to measurement error, producing attenuation with random variation across frequencies and over time. FIG. 8 shows the probability distribution of the FFT power spectral estimate at a given frequency of white noise with unity power (labelled "no smoothing"), and illustrates the amount of variation which can be expected.
The preferred embodiments modify this standard spectral subtraction in four independent but synergistic approaches as detailed in the following.
Preliminarily, partition an input stream of noisy speech sampled at 8 KHz into 256-sample frames with a 50% overlap between successive frames; that is, each frame shares its first 128 samples with the preceding frame and shares its last 128 samples with the succeeding frame. This yields an input stream of frames with each frame having 32 msec of samples and a new frame beginning every 16 msec.
Next, multiply each frame with a Hann window of width 256. (A Hann window has the form w(k)=(1+cos(2πk/K))/2 with K+1 the window width.) Thus each frame has 256 samples y(j), and the frames add to reconstruct the input speech stream.
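The claim that the windowed, half-overlapped frames add back to the input stream can be checked numerically. The sketch below is a minimal illustration that assumes the window index k runs from -128 to +127 and takes K = 256 in the stated formula, so that the two overlapping window halves sum exactly to one; the constant and function names are chosen here for illustration.

```python
import math

FRAME = 256                 # samples per frame
HOP = FRAME // 2            # 50% overlap: a new frame every 128 samples
SAMPLE_RATE_HZ = 8000


def hann(k, big_k=FRAME):
    """w(k) = (1 + cos(2*pi*k/K)) / 2, with K = 256 assumed here."""
    return (1.0 + math.cos(2.0 * math.pi * k / big_k)) / 2.0


# Window values for sample positions j = 0..255, i.e. k = j - 128.
window = [hann(j - FRAME // 2) for j in range(FRAME)]

# Where two successive frames overlap, position j of the later frame
# coincides with position j + 128 of the earlier frame.
overlap_sum = [window[j] + window[j + HOP] for j in range(HOP)]

frame_ms = 1000.0 * FRAME / SAMPLE_RATE_HZ
hop_ms = 1000.0 * HOP / SAMPLE_RATE_HZ
```

Every entry of `overlap_sum` equals 1 up to rounding, so overlap-adding the windowed frames reproduces the original samples, and the frame and hop durations come out as 32 msec and 16 msec.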
Fourier transform the windowed speech to find Y(ω) for the frame; the noise spectrum estimation differs from the traditional methods and appears in modification (4).
(1) Clamp the $H(\omega)$ attenuation curve so that the attenuation cannot go below a minimum value; FIG. 7 has this labelled as "clamped" and illustrates a 10 dB clamp. The clamping prevents the noise suppression filter $H(\omega)$ from fluctuating around very small gain values, and also reduces potential speech signal distortion. The corresponding filter would be:
$$H(\omega)^2 = \max[10^{-2},\ 1 - |N(\omega)|^2/|Y(\omega)|^2]$$
Of course, the 10 dB clamp could be replaced with any other desirable clamp level, such as 5 dB or 20 dB. Also, the clamping could include a sloped clamp or stepped clamping or other more general clamping curves, but a simple clamp lessens computational complexity. The following "Adaptive filter clamp" section describes a clamp which adapts to the input signal energy level.
(2) Increase the noise power spectrum estimate by a factor such as 2 so that small errors in the spectral estimates for input (noisy) signals do not result in fluctuating attenuation filters. The corresponding filter for this factor alone would be:
$$H(\omega)^2 = 1 - 4|N(\omega)|^2/|Y(\omega)|^2$$
For small input signal-to-noise power ratios this becomes negative, but a clamp as in (1) eliminates the problem. This noise increase factor appears as a shift in the logarithmic input signal-to-noise power ratio independent variable of FIG. 7. Of course, the 2 factor could be replaced by other factors such as 1.5 or 3; indeed, FIG. 7 shows a 5 dB noise increase factor with the resulting attenuation curve labelled "noise increased". Further, the factor could vary with frequency such as more noise increase (i.e., more attenuation) at low frequencies.
(3) Reduce the variance of spectral estimates used in the noise suppression filter $H(\omega)$ by smoothing over neighboring frequencies. That is, for an input windowed noisy speech signal y(j) with Fourier transform $Y(\omega)$, apply a running average over frequency so that $|Y(\omega)|^2$ is replaced by $(W \star |Y|^2)(\omega)$ in $H(\omega)$ where $W(\omega)$ is a window about 0 and ★ is the convolution operator. FIG. 8 shows that the spectral estimates for white noise converge more closely to the correct answer with increasing smoothing window size. That is, the curves labelled "5 element smoothing", "33 element smoothing", and "128 element smoothing" show the decreasing probabilities for large variations with increasing smoothing window sizes. More spectral smoothing reduces noise fluctuations in the filtered speech signal because it reduces the variance of spectral estimation for noisy frames; however, spectral smoothing decreases the spectral resolution so that the noise suppression attenuation filter cannot track sharp spectral characteristics. The preferred embodiment operates with sampling at 8 KHz and windows the input into frames of size 256 samples (32 milliseconds); thus an FFT on the frame generates the Fourier transform as a function on a domain of 256 frequency values. Take the smoothing window $W(\omega)$ to have a width of 32 frequencies, so convolution with $W(\omega)$ averages over 32 adjacent frequencies. $W(\omega)$ may be a simple rectangular window or any other window. The filter transfer function with such smoothing is:
$$H(\omega)^2 = 1 - |N(\omega)|^2/(W \star |Y|^2)(\omega)$$
Thus a filter with all three of the foregoing features has transfer function:
$$H(\omega)^2 = \max[10^{-2},\ 1 - 4|N(\omega)|^2/(W \star |Y|^2)(\omega)]$$
Extend the definition of $H(\omega)$ by symmetry to $\pi < \omega < 2\pi$ or $-\pi < \omega < 0$.
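A minimal sketch of a filter combining the clamp, the noise increase factor, and the frequency smoothing is given below. It assumes a rectangular smoothing window that wraps around circularly at the ends of the spectrum, and it treats a zero smoothed power as fully clamped; both choices, and the function names, are illustrative assumptions rather than details stated in the text.

```python
def smooth_power(power, width=32):
    """Running average of a power spectrum over `width` adjacent frequency bins.

    Assumption: rectangular window, circular wrap-around at the spectrum edges.
    """
    n = len(power)
    half = width // 2
    return [sum(power[(k + d) % n] for d in range(-half, width - half)) / width
            for k in range(n)]


def filter_gain_sq(noisy_power, noise_power, floor=1e-2, boost=4.0, width=32):
    """Squared gain max(floor, 1 - boost * |N|^2 / (W * |Y|^2)) per frequency bin."""
    smoothed = smooth_power(noisy_power, width)
    return [max(floor, 1.0 - boost * pn / py) if py > 0.0 else floor
            for pn, py in zip(noise_power, smoothed)]


# Illustrative flat spectra (not from the source).
speech_bins = filter_gain_sq([8.0] * 64, [1.0] * 64)   # noisy power well above noise
noise_bins = filter_gain_sq([1.0] * 64, [1.0] * 64)    # noise-only input
```

For the flat example spectra, bins with noisy power eight times the noise power get a squared gain of 0.5, while noise-only bins hit the clamp value of 0.01 instead of going negative.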
(4) Any noise suppression by spectral subtraction requires an estimate of the noise power spectrum. Typical methods update an average noise spectrum during periods of nonspeech activity, but the performance of this approach depends upon accurate estimation of speech intervals, which is a difficult technical problem. Some kinds of acoustic noise may have speech-like characteristics, and if they are incorrectly classified as speech, then the noise estimate will not be updated frequently enough to track changes in the noise environment.
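Consequently, the preferred embodiment takes noise as any signal which is always present. At each frequency recursively estimate the noise power spectrum $P_N(\omega)$ for use in the filter $H(\omega)$ by updating the estimate from the previous frame, $P'_N(\omega)$, using the current frame smoothed estimate for the noisy speech power spectrum, $P_Y(\omega) = (W \star |Y|^2)(\omega)$, as follows:
$$P_N(\omega) = \begin{cases} 0.978\,P'_N(\omega) & \text{if } P_Y(\omega) < 0.978\,P'_N(\omega) \\ P_Y(\omega) & \text{if } 0.978\,P'_N(\omega) \le P_Y(\omega) \le 1.006\,P'_N(\omega) \\ 1.006\,P'_N(\omega) & \text{if } P_Y(\omega) > 1.006\,P'_N(\omega) \end{cases}$$
For the first frame, just take $P_N(\omega)$ equal to $P_Y(\omega)$.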
Thus, the noise power spectrum estimate can increase up to 3 dB per second or decrease up to 12 dB per second. As a result, the noise estimates will only slightly increase during short speech segments, and will rapidly return to the correct value during pauses between words. The initial estimate can simply be taken as the first input frame which typically will be silence; of course, other initial estimates could be used such as a simple constant. This approach is simple to implement, and is robust in actual performance since it makes no assumptions about the characteristics of either the speech or the noise signals. Of course, multiplicative factors other than 0.978 and 1.006 could be used provided that the decrease limit exceeds the increase limit. That is, the product of the multiplicative factors is less than 1; e.g., (0.978)(1.006) is less than 1.
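One way to read the recursive update is as a per-frequency clamp of the new smoothed noisy speech power to between 0.978 and 1.006 times the previous noise estimate. The sketch below implements that reading; it is an assumption about the exact form of the update, with hypothetical function names. The helper that converts a per-frame factor into a rate uses a 16 msec frame step and a 20·log10 conversion, which is also an assumption (it happens to reproduce the quoted 3 dB and 12 dB per second figures approximately).

```python
import math


def update_noise_estimate(prev_noise, noisy_power, down=0.978, up=1.006):
    """Clamp each bin of the new estimate to [down, up] times the previous estimate."""
    return [min(max(py, down * pn), up * pn)
            for pn, py in zip(prev_noise, noisy_power)]


def db_per_second(factor, frame_step_s=0.016):
    """Rate of change if `factor` is applied once per frame (20*log10 assumed)."""
    return 20.0 * math.log10(factor) / frame_step_s


# Previous estimate of 1.0 in three bins; the new frame is much louder,
# much quieter, and unchanged in those bins respectively.
updated = update_noise_estimate([1.0, 1.0, 1.0], [10.0, 0.1, 1.0])

rise_rate = db_per_second(1.006)
fall_rate = db_per_second(0.978)
```

In this reading a loud speech frame can raise the estimate by at most 0.6% per frame, while a quiet frame can lower it by up to 2.2% per frame, so the estimate drifts up slowly during speech and recovers quickly in pauses.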
A preferred embodiment filter may include one or more of the four modifications, and a preferred embodiment filter combining all four of the foregoing modifications will have a transfer function:
$$H(\omega)^2 = \max\left[10^{-2},\; 1 - \frac{4P_N(\omega)}{(W \star |Y|^2)(\omega)}\right]$$
with PN (ω) the noise power estimate as in the preceding.
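As a minimal sketch of the combined transfer function above, the following Python computes the per-frequency gain from the smoothed noisy-speech power and the noise estimate. The function and parameter names are hypothetical; the factor 4 and the floor 10^-2 come from the formula, while the guard for a zero noisy-speech power is an added assumption.

```python
import math

def spectral_subtraction_gain(noisy_power, noise_power, over=4.0, floor_sq=1e-2):
    """Return H(w) per frequency, with H^2 = max(floor_sq, 1 - over * PN / PY)."""
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        if p_y > 0.0:
            h_sq = 1.0 - over * p_n / p_y
        else:
            # Assumption: an empty bin simply receives the clamp value.
            h_sq = floor_sq
        gains.append(math.sqrt(max(floor_sq, h_sq)))
    return gains
```

For a bin where the noisy power is 100 times the noise estimate the gain is close to 1, whereas a bin at the noise level is held at the clamp (a gain of 0.1).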
FIG. 9a shows in block form preferred embodiment noise suppressor 900 which implements a preferred embodiment spectral subtraction with all four of the preferred embodiment modifications. In particular, FFT module 902 performs a fast Fourier transform of an input frame to give Y(ω), magnitude squarer 904 generates |Y(ω)|², convolver 906 yields PY (ω)=W★|Y|²(ω), noise buffer (memory) 908 holds PN '(ω), ALU (arithmetic logic unit plus memory) 910 compares PY and PN ' and computes PN and updates buffer 908, ALU 912 computes 1-4PN (ω)/PY, clamper 914 computes H(ω), multiplier 920 applies H(ω) to Y(ω), and IFFT module 922 does an inverse Fourier transform to yield the noise-suppression filtered frame. Controller 930 provides the timing and enablement signals to the various components. Noise suppressor 900 inserted into the systems of FIGS. 1a-b as the noise suppression blocks provides preferred embodiment systems in which noise suppressor 900 in part controls the output.
The filter attenuation clamp of the preceding section can be replaced with an adaptive filter attenuation clamp. For example, take
$$H(\omega)^2 = \max\left[M^2,\; 1 - \frac{|N(\omega)|^2}{|Y(\omega)|^2}\right]$$
and let the minimum filter gain M depend upon the signal and noise power of the current frame (or, for computational simplicity, of the preceding frame). Indeed, when speech is present, it serves to mask low-level noise; therefore, M can be increased in the presence of speech without the listener hearing increased noise. This has the benefit of lessening the attenuation of the speech and thus causing less speech distortion. Because a common response to having difficulty communicating over the phone is to speak louder, this decrease in filter attenuation with increased speech power will lessen distortion and improve speech quality. Simply put, the system will transmit clearer speech the louder a person talks.
In particular, let YP be the sum of the signal power spectrum over the frequency range 1.8 KHz to 4.0 KHz: with a 256-sample frame sampling at 8 KHz and 256-point FFT, this corresponds to frequencies 51π/128 to π. That is,
$$YP = \sum_{\omega} P_Y(\omega) \quad \text{for } 51\pi/128 \le \omega \le \pi$$
Similarly, let NP be the corresponding sum of the noise power:
$$NP = \sum_{\omega} P_N(\omega) \quad \text{for } 51\pi/128 \le \omega \le \pi$$
with PN (ω) the noise estimate from the preceding section. The frequency range 1.8 KHz to 4.0 KHz lies in a band with small road noise for an automobile but still with significant speech power; thus, detect the presence of speech by considering YP-NP. Then take M equal to A+B(YP-NP) where A is the minimum filter gain with an all noise input (analogous to the clamp of the preceding section), and B is the dependence of the minimum filter gain on speech power. For example, A could be -8 dB or -10 dB as in the preceding section, and B could be in the range of 1/4 to 1. Further, YP-NP may become negative for near silent frames, so preserve the minimum clamp at A by ignoring the B(YP-NP) factor when YP-NP is negative. Also, an upper limit of -4 dB for very loud frames could be imposed by replacing B(YP-NP) with min[-4 dB, B(YP-NP)].
More explicitly, presume a 16-bit fixed-point format of two's complement numbers, and presume that the noisy speech samples have been scaled so that numbers X arising in the computations will fall into the range -1≦X<+1, which in hexadecimal notation will be the range 8000 to 7FFF. Then the filter gain clamp could vary between A taken equal to 1000 (0.125), which is roughly -9 dB, and an upper limit for A+B(YP-NP) taken equal to 3000 (0.375), which is roughly -4.4 dB. More conservatively, the clamp could be constrained to the range of 1800 to 2800.
Furthermore, a simpler implementation of the adaptive clamp which still provides its advantages uses the M from the previous frame (called MOLD) and takes M for the current frame simply equal to (17/16)MOLD, when MOLD is less than A+B(YP-NP) and (15/16)MOLD when MOLD is greater than A+B(YP-NP).
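The adaptive clamp and its simpler step-wise variant might be sketched as follows. This is a hedged illustration, not the patented implementation: the function names, the use of floating point rather than 16-bit fixed point, and the choice B=0.5 are assumptions, while A=0.125, the upper limit 0.375, and the 17/16 and 15/16 factors are taken from the text. Bins 51 through 128 of a 256-point FFT correspond to 51π/128≦ω≦π.

```python
def band_power(spectrum, lo_bin=51, hi_bin=128):
    """Sum a power spectrum over the bins covering 51*pi/128 <= w <= pi."""
    return sum(spectrum[lo_bin:hi_bin + 1])

def clamp_target(yp, np_, a=0.125, b=0.5, upper=0.375):
    """Target clamp A + B*(YP - NP), ignoring negative YP - NP and
    capping the result at the upper limit."""
    speech_excess = max(0.0, yp - np_)
    return min(upper, a + b * speech_excess)

def step_clamp(m_old, target):
    """Simpler variant: nudge the previous clamp toward the target
    by a factor of 17/16 or 15/16 per frame."""
    if m_old < target:
        return m_old * 17.0 / 16.0
    if m_old > target:
        return m_old * 15.0 / 16.0
    return m_old
```

A typical use would compute `yp = band_power(p_y)` and `np_ = band_power(p_n)` once per frame and then call `step_clamp(m_old, clamp_target(yp, np_))`, so that the clamp drifts slowly rather than jumping between frames.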
The preceding adaptive clamp depends linearly on the speech power; however, other dependencies such as quadratic could also be used provided that the functional dependence is monotonic. Indeed, memory in the system and slow adaptation rates for M make the clamp nonlinear.
The frequency range used to measure the signal and noise powers could be varied, such as 1.2 KHz to 4.0 KHz or another band (or bands) depending upon the noise environment. FIG. 10a heuristically illustrates an adaptive clamp in a form analogous to FIG. 7; of course, the adaptive clamp depends upon the magnitude of the difference of the sums (over a band) of input and noise powers, whereas the independent variable in FIG. 10a is the power ratio at a single frequency. However, as the power ratio increases for "average" frequencies, the magnitude of the difference of the sums of input and noise powers over the band also increases, so the clamp ramps up as indicated in FIG. 10a for "average" frequencies. FIG. 10b more accurately shows the varying adaptive clamp levels for a single frequency: the clamp varies with the difference of the sums of the input and noise powers as illustrated by the vertical arrow. Of course, the clamp, whether adaptive or constant, could be used without the increased noise, and the lefthand portions of the clamp curves together with the standard spectral curve of FIGS. 10a-b would apply.
Note that the adaptive clamp could be taken as dependent upon the ratio YP/NP instead of just the difference or on some combination. Also, the positive slope of the adaptive clamp (see FIG. 10a) could be used to have a greater attenuation (e.g., -15 dB) for the independent variable equal to 0 and ramp up to an attenuation less than the constant clamp (which is -10 dB) for the independent variable greater than 3 dB. The adaptive clamp achieves both better speech quality and better noise attenuation than the constant clamp.
Note that the estimates YP and NP could be defined by the previous frame in order to make an implementation on a DSP more memory efficient. For most frames the YP and NP will be close to those of the preceding frame.
FIG. 9b illustrates in block form preferred embodiment noise suppressor 950 which includes the components of system 900 but with an adaptive clamper 954 which has the additional inputs of YP from filter 956 and NP from filter 960. Insertion of noise suppressor 950 into the system of FIGS. 1a-b as the noise suppression blocks provides preferred embodiment systems in which noise suppressor 950 in part controls the output.
FIG. 4 is a flow diagram for a modified generalized Wiener filter preferred embodiment. Recall that a generalized Wiener filter with power β equal 1/2 has a transfer function:
$$H(\omega)^2 = \frac{P_S(\omega)}{P_S(\omega) + \alpha P_N(\omega)}$$
with PS (ω) an estimate for the speech power spectrum, PN (ω) an estimate for the noise power spectrum, and α a noise suppression factor. The preferred embodiments modify the generalized Wiener filter by using an α which tracks the signal-to-noise power ratio of the input rather than just a constant.
Heuristically, the preferred embodiment may be understood in terms of the following intuitive analysis. First, take PS (ω) to be cPY (ω) for a constant c with PY (ω) the power spectrum of the input noisy speech modelled by LPC. That is, the LPC model for y(j) in some sense removes the noise. Then solve for c by substituting this presumption into the statement that the speech and the noise are uncorrelated (PY (ω)=PS (ω)+PN (ω)) and integrating (summing) over all frequencies to yield:
$$\int P_Y(\omega)\,d\omega = \int c\,P_Y(\omega)\,d\omega + \int P_N(\omega)\,d\omega$$
where cPY (ω) is the estimate of PS (ω).
Thus by Parseval's theorem, EY =cEY +EN, where EY is the energy of the noisy speech LPC model and also an estimate for the energy of y(j), and EN is the energy of the noise in the frame. Thus, c=(EY -EN)/EY and so PS (ω)=[(EY -EN)/EY]PY (ω). Then inserting this into the definition of the generalized Wiener filter transfer function gives:
$$H(\omega)^2 = \frac{P_Y(\omega)}{P_Y(\omega) + \left[\dfrac{E_Y}{E_Y - E_N}\right]\alpha P_N(\omega)}$$
Now take the factor multiplying PN (ω) (i.e., [EY /(EY -EN)]α) as inversely dependent upon signal-to-noise ratio (i.e., [EY /(EY -EN)]α=κEN /ES for a constant κ, with ES =EY -EN) so that the noise suppression varies from frame to frame and is greater for frames with small signal-to-noise ratios. Thus the modified generalized Wiener filter ensures stronger suppression for noise-only frames and weaker suppression for voiced-speech frames which are not noise corrupted as much. In short, take α=κEN /EY, so the noise suppression factor has been made inversely dependent on the signal-to-noise ratio, and the filter transfer function becomes:
$$H(\omega)^2 = \frac{P_Y(\omega)}{P_Y(\omega) + \left[\dfrac{E_N}{E_Y - E_N}\right]\kappa P_N(\omega)}$$
Optionally, average α by weighting with the α from the preceding frame to limit discontinuities. Further, the value of the constant κ can be increased to obtain higher noise suppression, which does not result in fluctuations in the speech as much as it does for standard spectral subtraction because H(ω) is always nonnegative.
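To make the frame-dependent suppression concrete, here is a small Python sketch of the transfer function with the factor [EN /(EY -EN)]κ. The function name, the default κ of 6.5, and the small `eps` guard against EY -EN reaching zero are assumptions added for illustration; the spectra are plain lists of per-frequency powers.

```python
import math

def dynamic_wiener_gain(p_y, p_n, kappa=6.5, eps=1e-12):
    """Return H(w) per frequency with
    H^2 = PY / (PY + [EN / (EY - EN)] * kappa * PN)."""
    e_y = sum(p_y)                  # frame energy of the noisy-speech model
    e_n = sum(p_n)                  # frame energy of the noise estimate
    e_s = max(e_y - e_n, eps)       # assumption: guard against EY <= EN
    weight = kappa * e_n / e_s      # grows as the frame SNR falls
    gains = []
    for py, pn in zip(p_y, p_n):
        if py > 0.0:
            gains.append(math.sqrt(py / (py + weight * pn)))
        else:
            gains.append(0.0)
    return gains
```

With the noisy spectrum held fixed, doubling the noise estimate both raises PN (ω) and raises the weight on it, so a noisier frame is attenuated more strongly than a constant α would give.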
In more detail, the modified generalized Wiener filter preferred embodiment proceeds through the following steps as illustrated in FIG. 4:
(1) Partition an input stream of noisy speech sampled at 8 KHz into 256-sample frames with a 50% overlap between successive frames; that is, each frame shares its first 128 samples with the preceding frame and shares its last 128 samples with the succeeding frame. This yields an input stream of frames with each frame having 32 msec of samples and a new frame beginning every 16 msec.
(2) Multiply each frame with a Hann window of width 256. (A Hann window has the form w(j)=(1+cos(2πj/N))/2 with N+1 the window width.) Thus each frame has 256 samples y(j) and the frames add to reconstruct the input speech stream.
(3) For each windowed frame, find the 8th order LPC filter coefficients a0 (=1), a1, a2, . . . a8 by solving the following eight equations for eight unknowns:
$$\sum_k a_k\, r(j+k) = 0 \quad \text{for } j = 1, 2, \ldots, 8$$
where r(.) is the autocorrelation function of y(.).
(4) Form the discrete Fourier transform $A(\omega)=\sum_k a_k e^{-ik\omega}$, and then estimate PY (ω) for use in the generalized Wiener filter as EY /|A(ω)|² with EY =Σk ak r(k) the energy of the LPC model. This just uses the LPC synthesis filter spectrum as a smoothed version of the noisy speech spectrum and prevents erratic spectral fluctuations from affecting the generalized Wiener filter.
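(5) Estimate the noise power spectrum PN (ω) for use in the generalized Wiener filter by updating the estimate from the previous frame, P'N (ω), using the current frame smoothed estimate for the noisy speech power spectrum, PY (ω), as follows:
$$P_N(\omega) = \begin{cases} 0.978\,P'_N(\omega) & \text{if } P_Y(\omega) < 0.978\,P'_N(\omega) \\ P_Y(\omega) & \text{if } 0.978\,P'_N(\omega) \le P_Y(\omega) \le 1.006\,P'_N(\omega) \\ 1.006\,P'_N(\omega) & \text{if } P_Y(\omega) > 1.006\,P'_N(\omega) \end{cases}$$
Thus the noise spectrum estimate can increase at 3 dB per second and decrease at 12 dB per second. For the first frame, just take PN (ω) equal to PY (ω). And EN is the integration (sum) of PN over all frequencies.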
Also, optionally, to handle abrupt increases in noise level, use a counter to keep track of the number of successive frames in which the condition PY >1.006 P'N (ω) occurs. If 75 successive frames have this condition, then change the multiplier from 1.006 to (1.006)² and restart the counter at 0. And if the next successive 75 frames have the condition PY >(1.006)² P'N (ω), then change the multiplier from (1.006)² to (1.006)³. Continue in this fashion provided 75 successive frames all satisfy the condition. Once a frame violates the condition, return to the initial multiplier of 1.006.
Of course, other multipliers and count limits could be used.
(6) Compute α=κEN /EY to use in the generalized Wiener filter. Typically, κ will be about 6-7 with larger values for increased noise suppression and smaller values for less. Optionally, α may be filtered by averaging with the preceding frame by:
α =max(1, 0.8α+0.2α')
where α' is the α of the preceding frame. That is, α=κEN /EY for the current frame, with EN the energy of the noise estimate PN (ω) and EY the energy of the noisy speech LPC model, and α' is the same expression but for the previous frame. FIG. 4 shows this optional filtering with a broken line.
(7) Compute the first approximation modified generalized Wiener filter for each frequency as:
$$H_1(\omega)^2 = \frac{P_Y(\omega)}{P_Y(\omega) + \left[\dfrac{E_Y}{E_Y - E_N}\right]\alpha P_N(\omega)}$$
with PY (ω) and EY from step (4), PN (ω) and EN from step (5), and α from step (6).
(8) Clamp H1 (ω) to avoid excess noise suppression by defining a second approximation: H2 (ω)=max(-10 dB, H1 (ω)). Alternatively, an adaptive clamp could be used.
(9) Optionally, smooth the second approximation by convolution with a window W(ω) having weights such as [0.1, 0.2, 0.4, 0.2, 0.1] to define a third approximation H3 (ω)=W★H2 (ω). FIG. 4 indicates this optional smoothing in brackets.
(10) Extend H2 (ω) (or H3 (ω) if used) to the range π<ω<2π or -π<ω<0 by symmetry to define H(ω). The periodicity of H(ω) makes these extensions equivalent.
(11) Compute the 256-point discrete Fourier transform of y(j) to obtain Y(ω).
(12) Take S (ω)=H(ω)Y(ω) as an estimate for the spectrum of the frame of speech with noise removed.
(13) Compute the 256-point inverse discrete Fourier transform of S (ω) and take the inverse transform to be the estimate s (j) of speech with noise removed for the frame.
(14) Add the s (j) of the overlapping portions of successive frames to get s(j) as the final noise suppressed estimate.
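Steps (1), (2), and (14) rely on the fact that Hann windows at 50% overlap add to a constant, so windowed frames sum back to the original stream when no filtering is applied. The following sketch checks this numerically; the names `hann` and `overlap_add` are hypothetical, the window uses the period 256 with j translated into the range -128 to +127, and the filtering steps in between are deliberately left out.

```python
import math

FRAME = 256
HOP = FRAME // 2   # 50% overlap: a new frame every 128 samples (16 msec at 8 KHz)

def hann(j, n=FRAME):
    """w(j) = (1 + cos(2*pi*j/n)) / 2 for j in -n/2 .. n/2 - 1."""
    return (1.0 + math.cos(2.0 * math.pi * j / n)) / 2.0

window = [hann(j - FRAME // 2) for j in range(FRAME)]

def overlap_add(signal):
    """Window 50%-overlapped frames and add them back together.
    No filtering is applied, so interior samples should be unchanged."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - FRAME + 1, HOP):
        for j in range(FRAME):
            out[start + j] += window[j] * signal[start + j]
    return out
```

Only samples covered by two frames are reconstructed exactly; the first and last half frames of a finite stream see a single window and so remain tapered.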
FIG. 11 shows in block form preferred embodiment noise suppressor 1100 which implements the nonoptional functions of a modified generalized Wiener filter preferred embodiment. In particular, FFT module 1102 performs a fast Fourier transform of an input frame to give Y(.) and autocorrelator 1104 performs autocorrelation on the input frame to yield r(.). LPC coefficient analyzer 1106 derives the LPC coefficients aj, and ALU 1108 then forms the power estimate PY (.) plus the frame energy estimate EY. ALU 1110 uses PY (.) to update the noise power estimate P'N held in noise buffer 1112 to give PN which is stored in noise buffer 1112. ALU 1110 also generates EN, which together with EY from ALU 1108 goes to ALU 1114 to find α. ALU 1116 takes the outputs of ALUs 1108, 1110, and 1114 to derive the first approximation H1 and clamper 1118 then yields H2 to be used in multiplier 1120 to perform the filtering. IFFT module 1122 performs the inverse FFT to yield the output filtered frame. Each component has associated buffer memory, and controller 1130 provides the timing and enablement signals to the various components. The adaptive clamp could be used for clamper 1118.
Insertion of noise suppressor 1100 into the systems of FIGS. 1a-b as the noise suppression block provides preferred embodiment systems in which noise suppressor 1100 in part controls the output.
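The optional counter of step (5) for abrupt noise increases could be tracked with a small state object like the one below. This is a sketch under assumptions: the class name `NoiseRiseLimiter` and its methods are hypothetical, and it tracks a single frequency value, since the text does not say whether the count is kept per frequency or per frame.

```python
class NoiseRiseLimiter:
    """Escalate the upward multiplier 1.006 -> 1.006**2 -> 1.006**3 ...
    after each run of 75 successive frames that exceed the current limit."""

    def __init__(self, base=1.006, run_length=75):
        self.base = base
        self.run_length = run_length
        self.exponent = 1
        self.count = 0

    def multiplier(self):
        return self.base ** self.exponent

    def observe(self, p_y, p_prev):
        """Update the counter for one frame; return the multiplier now in force."""
        if p_y > self.multiplier() * p_prev:
            self.count += 1
            if self.count == self.run_length:
                self.exponent += 1   # step up to the next power of the base
                self.count = 0       # restart the counter at 0
        else:
            self.exponent = 1        # a violating frame restores 1.006
            self.count = 0
        return self.multiplier()
```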
FIG. 5 illustrates the flow for codebook-based generalized Wiener filter noise suppression preferred embodiments having filter transfer functions:
$$H(\omega)^2 = \frac{P_S(\omega)}{P_S(\omega) + \alpha P_N(\omega)}$$
with α the noise suppression constant. Heuristically, the preferred embodiments estimate the noise PN (ω) in the same manner as step (5) of the previously described generalized Wiener filter preferred embodiments, and estimate PS (ω) by the use of the line spectral frequencies (LSF) of the input noisy speech as weightings for LSFs from a codebook of noise-free speech samples. In particular, codebook preferred embodiments proceed as follows.
(1) Partition an input stream of speech sampled at 8 KHz into 256-sample frames with a 50% overlap between successive frames; that is, follow the first step of the modified generalized Wiener filter preferred embodiments.
(2) Multiply each frame with a Hann window of width 256; again following the modified generalized Wiener filter preferred embodiment.
(3) For each windowed frame with samples y(j), find the Mth (typically 8th) order LPC filter coefficients a0 (=1), a1, a2, . . . aM by solving the M linear equations for M unknowns:
$$\sum_i a_i\, r(j+i) = 0 \quad \text{for } j = 1, 2, \ldots, M$$
where r(.) is the autocorrelation of y(.). This again follows the modified generalized Wiener filter preferred embodiments. The gain of the LPC spectrum is Σi ai r(i).
(4) Compute the line spectral frequencies (LSF) from the LPC coefficients. That is, set $P(z)=A(z)+A(1/z)/z^M$ and $Q(z)=A(z)-A(1/z)/z^M$ where $A(z)=1+a_1/z+a_2/z^2+\ldots+a_M/z^M$ is the analysis LPC filter, and solve for the roots of the polynomials P(z) and Q(z). These roots all lie on the unit circle |z|=1 and so have the form $e^{i\omega}$ with the ωs being the LSFs for the noisy speech frame. Recall that the use of LSFs instead of LPC coefficients for speech coding provides better quantization error properties.
(5) Compute the distance of the noisy speech frame LSFs from each of the entries of a codebook of M-tuples of LSFs. That is, each codebook entry is a set of M LSFs in size order. The codebook has 256 of such entries which have been determined by conventional vector quantization training (e.g., LBG algorithm) on sets of M LSFs from noise-free speech samples.
In more detail, let (LSFj,1, LSFj,2, LSFj,3, . . . , LSFj,M) be M LSFs of the jth entry of the codebook; then take the distance of the noisy speech frame LSFs, (LSFn,1, LSFn,2, LSFn,3, . . . LSFn,M) from the jth entry to be:
$$d_j = \sum_i \frac{\mathrm{LSF}_{j,i} - \mathrm{LSF}_{n,i}}{\mathrm{LSF}_{n,i} - \mathrm{LSF}_{n,c(i)}}$$
where LSFn,c(i) is the noisy speech frame LSF which is the closest to LSFn,i (so c(i) will be either i-1 or i+1 if the LSFn,i are in size order). Thus, this distance measure is dominated by the LSFn,i which are close to each other, and this provides good results because such LSFs have a higher chance of being formants in the noisy speech frame.
(6) Estimate the M LSFs (LSFs,1 , LSFs,2 , . . . LSFs,M) for the noise-free speech of the frame by a probability weighting of the codebook LSFs:
LSF.sub.s,i =Σ.sub.j p.sub.j LSF.sub.j,i
where the probabilities pj derive from the distance measures of the noisy speech frame LSFs from the codebook entries:
p.sub.j =exp(-γd.sub.j)/Σ.sub.k exp(-γd.sub.k)
where the constant γ controls the dynamic range for the probabilities and can be taken equal to 0.002. Larger values of γ imply increased emphasis on the weights of the higher probability codewords.
(7) Convert the estimated noise-free speech LSFs to LPC coefficients, ai , and compute the estimated noise-free speech power spectrum as
$$P_S(\omega) = \frac{\sum_i a_i\, r(i)}{\left|\sum_k a_k \exp(-jk\omega)\right|^2}$$
where Σi ai r(i) is the gain of the LPC spectrum from step (3).
(8) Estimate the noise power spectrum PN (ω) as before: see step (5) of the modified generalized Wiener filter section.
(9) Take α equal to 10, and form the filter transfer function
$$H_1(\omega)^2 = \frac{P_S(\omega)}{P_S(\omega) + \alpha P_N(\omega)}$$
where PS (ω) comes from step (7) and PN (ω) from step (8).
(10) Clamp H1 (ω) as in the other preferred embodiments to avoid filter fluctuations to obtain the final generalized Wiener filter transfer function: H(ω)=max(-10 dB, H1 (ω)). Alternatively, an adaptive clamp could be used.
(11) Compute the 256-point discrete Fourier transform of y(j) to obtain Y(ω).
(12) Take S (ω)=H(ω)Y(ω) as an estimate for the spectrum of the frame of speech with noise removed.
(13) Compute the 256-point inverse fast Fourier transform of S (ω) to be the estimate s (j) of speech with noise removed for the frame.
(14) Iterate steps (3)-(13) six or seven times using the estimate s (j) from step (13) for y(j) in step (3). FIG. 5 shows the iteration path.
(15) Add the s (j) of the overlapping portions of successive frames to get s(j) as the final noise suppressed estimate.
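The probability weighting of step (6) can be sketched as below, taking the distances dj of step (5) as given inputs. The function names are hypothetical, and subtracting the smallest distance before exponentiating is an added numerical-stability assumption that leaves the probabilities unchanged.

```python
import math

def codebook_weights(distances, gamma=0.002):
    """p_j = exp(-gamma * d_j) / sum_k exp(-gamma * d_k)."""
    d_min = min(distances)   # shifting all d_j equally does not change p_j
    raw = [math.exp(-gamma * (d - d_min)) for d in distances]
    total = sum(raw)
    return [r / total for r in raw]

def estimate_clean_lsfs(codebook, distances, gamma=0.002):
    """LSF_s,i = sum_j p_j * LSF_j,i over all codebook entries j."""
    p = codebook_weights(distances, gamma)
    order = len(codebook[0])
    return [sum(p_j * entry[i] for p_j, entry in zip(p, codebook))
            for i in range(order)]
```

With equal distances every codeword receives the same weight and the estimate is the plain average of the codebook entries; as γ grows, the estimate moves toward the single nearest entry.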
FIG. 12 shows in block form preferred embodiment noise suppressor 1200 which implements the codebook modified generalized Wiener filter preferred embodiment. In particular, FFT 1202 performs a fast Fourier transform of an input frame to give Y(.) and autocorrelator 1204 performs autocorrelation on the input frame to yield r(.). LPC coefficient analyzer 1206 derives the LPC coefficients aj, and LPC-to-LSF converter 1208 gives the LSF coefficients to ALU 1210. Codebook 1212 provides codebook LSF coefficients to ALU 1210 which then forms the noise-free signal LSF coefficient estimates to LSF-to-LPC converter 1214 for conversion to LPC estimates and then to ALU 1216 to form power estimate PY (.). Noise buffer 1220 and ALU 1222 update the noise estimate PN (.) as with the preceding preferred embodiments, and ALU 1224 uses PY (.) and PN (.) to form the first approximation unclamped H1 and clamper 1226 then yields clamped H1 to be used in multiplier 1230 to perform the filtering. IFFT 1232 performs the inverse FFT to yield the first approximation filtered frame. Iteration counter 1234 sends the first approximation filtered frame back to autocorrelator 1204 to start generation of a second approximation filter H2. This second approximation filter applied to Y(.) yields the second approximation filtered frame which iteration counter 1234 again sends back to autocorrelator 1204 to start generation of a third approximation H3. Iteration counter 1234 repeats this six times to finally yield a seventh approximation filter and filtered frame which then becomes the output filtered frame. Each component has associated buffer memory, and controller 1240 provides the timing and enablement signals to the various components. The adaptive clamp could be used for clamper 1226.
Insertion of noise suppressor 1200 into the systems of FIGS. 1a-b as the noise suppression blocks provides preferred embodiment systems in which noise suppressor 1200 in part controls the output.
The preferred embodiments employ various operations such as FFT, and with low power frames the signal samples are small and precision may be lost in multiplications. For example, squaring a 16-bit fixed-point sample will yield a 32-bit result, but memory limitations may demand that only 16 bits be stored and so only the upper 16 bits will be chosen to avoid overflow. Thus an input sample with only the lowest 9 bits nonzero will have an 18-bit answer which implies only the two most significant bits will be retained and thus a loss of precision.
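The loss of precision in this example can be reproduced with integer arithmetic. The sketch below treats the sample as a raw 16-bit integer; the helper name `square_keep_upper16` is hypothetical.

```python
def square_keep_upper16(sample):
    """Square a 16-bit sample into a 32-bit product and keep the upper 16 bits."""
    product = sample * sample   # fits in 32 bits for any 16-bit input
    return product >> 16        # discard the lower 16 bits

small = 0x01FF                       # only the lowest 9 bits are nonzero
full = small * small                 # the full product needs 18 bits
kept = square_keep_upper16(small)    # only the top 2 of those bits survive
```

Here `full.bit_length()` is 18 and `kept.bit_length()` is 2, matching the two retained bits described in the text.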
An automatic gain control to bring input samples up to a higher level avoids such a loss of precision but destroys the power level information: both loud and quiet input speech will have the same power output levels. Also, such automatic gain control typically relies on the sample stream and does not consider a frame at a time.
A preferred embodiment precision control method proceeds as follows.
(1) Presume an (N+1)-bit two's complement integer format for the noisy speech samples u(j) and other variables, and presume that the variables have been scaled to the range -1≦X<+1. Thus for 16-bit format with hexadecimal notation, variables lie in the range from 8000 to 7FFF. First, estimate the power for an input frame of 256 samples by Σu(j)² with the sum over the corresponding 256 values of j.
(2) Count the number of significant bits, S, in the power estimate sum. Note that with |u(j)| having an average size of K significant bits, S will be about 2K+8. So the number of bits in the sum reflects the average sample magnitude with the maximum possible S equal to 2N+8.
(3) Pick the frame scaling factor so as to set the average sample size to have (2N+8-S)/2-H significant bits where H is an integer, such as 3, of additional headroom bits. That is, the frame scaling factor is $2^{(2N+8-S)/2-H}$. In terms of the K of step (2), the scaling factor equals $2^{N-K-H}$. For example, with 16-bit format and 3 overhead bits, if the average sample magnitude is $2^{-9}$ (7 significant bits), then the scaling factor will be $2^5$ so the average scaled sample magnitude is $2^{-4}$ which leaves 3 bits ($2^3$) before overflow occurs at $2^0$.
(4) Apply the Hann window (see steps (1)-(2) of the modified generalized Wiener filter section) to the frame by point wise multiplication. Thus with y(j) denoting the windowed samples,
y(j)=u(j)(1+cos(2πj/256))/2
for the variable j presumed translated into the range -128 to +127. Do this windowing before the scaling to help avoid overflow on the much larger than average samples as they could fall at the edges of the window. Of course, this windowing could follow the scaling of the next step.
(5) Scale the windowed input samples simply by left shifting (2N+8-S)/2-H bits (if the number of bits is negative, then this is a right shift). If a sample has magnitude more than $2^H$ times the average, then overflow will occur and in this case just replace the scaled sample with the corresponding maximum magnitude (e.g., 8000 or 7FFF). Indeed, if the sign bit changes, then overflow has occurred and the scaled sample is taken as the corresponding maximum magnitude. Thus with yS (j) denoting the scaled windowed samples and no overflow:
$$y_S(j) = y(j)\,2^{(2N+8-S)/2-H}$$
(6) Compute the FFT using yS (j) to find YS (ω). The use of yS (j) avoids the loss of precision which otherwise would have occurred with the FFT due to underflow avoidance.
(7) Apply a local smoothing window to YS (ω) as in step (3) of the spectral subtraction preferred embodiments.
(8) Scale down by shifting YS (ω) (2N+8-S)/2-H bits to the right (with the new sign bit repeating the original sign bit) to have Y(ω) for noise estimation and filter application in the preferred embodiments previously described.
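The frame scaling of steps (1) through (5) might look as follows in Python, using raw 16-bit integers (N=15) in place of fixed-point fractions. The function names are hypothetical, and rounding (2N+8-S)/2 down when it is odd is an assumption, since the text does not say how a half bit is handled.

```python
N_BITS = 15     # magnitude bits of a 16-bit two's complement sample
HEADROOM = 3    # additional headroom bits H

def frame_shift(samples, n_bits=N_BITS, headroom=HEADROOM):
    """Left-shift count (2N + 8 - S)/2 - H for one frame of samples."""
    power = sum(u * u for u in samples)
    s = power.bit_length()                         # significant bits S
    log2_frame = (len(samples) - 1).bit_length()   # 8 for a 256-sample frame
    return (2 * n_bits + log2_frame - s) // 2 - headroom

def scale_sample(y, shift, n_bits=N_BITS):
    """Shift left (or right if shift is negative) and saturate on overflow."""
    scaled = y << shift if shift >= 0 else y >> -shift
    hi = (1 << n_bits) - 1    # 7FFF for 16-bit samples
    lo = -(1 << n_bits)       # 8000 for 16-bit samples
    return max(lo, min(hi, scaled))
```

For a frame whose samples all have magnitude 64 (that is, $2^{-9}$ in the -1≦X<+1 scaling), `frame_shift` returns 5, which agrees with the scaling factor $2^5$ of the example in step (3).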
An alternative precision control scaling uses the sum of the absolute values of the samples in a frame rather than the power estimate (sum of the squares of the samples). As with the power estimate scaling, count the number S of significant bits in the sum of absolute values and scale the input samples by a factor of $2^{N+8-S-H}$ where again N+1 is the number of bits in the sample representation, the 8 comes from the 256 ($2^8$) sample frame size, and H provides headroom bits. Heuristically, with samples of K significant bits on the average, the sum of absolute values should be about K+8 bits, and so S will be about K+8 and the factor will be $2^{N-K-H}$ which is the same as the power estimate sum scaling. Further, even using the power estimate sum with S significant bits, scaling factors such as $2^{(2N+8-S)-H}$ have yielded good results. That is, variations of the method of scaling up according to a frame characteristic, processing, and then scaling down will also be viable provided the scaling does not lead to excessive overflow.
FIG. 13 illustrates in block format an internal precision controller preferred embodiment which could be used with any of the foregoing noise suppression filter preferred embodiments. In particular, frame energy measurer 1302 determines the scaling factor to be used, and scaler 1304 applies the scaling factor to the input frame. Filter 1306 filters the scaled frame, and inverse scaler 1308 then undoes the scaling to return to the original input signal levels. Filter 1306 could be any of the foregoing preferred embodiment filters. Parameters from filter 1306 may be part of the scale factor determination by measurer 1302. And insertion of noise suppressor 1300 into the systems of FIGS. 1a-b provides preferred embodiment systems in which noise suppressor 1300 in part controls the output.
The preferred embodiments may be varied in many ways while retaining one or more of the features of clamping, noise enhancing, smoothed power estimating, recursive noise estimating, adaptive clamping, adaptive noise suppression factoring, codebook based estimating, and internal precision controlling.
For example, the various generalized Wiener filters of the preferred embodiments had power β equal to 1/2, but other powers such as 1, 3/4, 1/4, and so forth also apply; higher filter powers imply stronger filtering. The frame size of 256 samples could be increased or decreased, although powers of 2 are convenient for FFTs. The particular choice of 3 bits of additional headroom could be varied, especially with different size frames and different number of bits in the sample representation. The adaptive clamp could have a negative dependence upon frame noise and signal estimates (B<0). Also, the adaptive clamp could invoke a near-end speech detection method to adjust the clamp level. The α and κ coefficients could be varied and could enter the transfer functions as simple analytic functions of the ratios, and the number of iterations in the codebook based generalized Wiener filter could be varied.
Claims (12)
1. A filter, comprising:
(a) an input for receiving frames of sampled signals;
(b) an attenuation filter coupled to said input, wherein said attenuation filter includes a noise suppression factor with said noise suppression factor depending on EN divided by EY where EN is an estimate of noise energy of a frame and EY is an estimate of signal energy of said frame; and
(c) an output coupled to said attenuation filter for emitting filtered frames.
2. The filter of claim 1, wherein:
(a) said noise suppression factor is proportional to EN /EY.
3. The filter of claim 1, wherein:
(a) said attenuation filter has a transfer function H(ω) given by H(ω)² =PY (ω)/(PY (ω)+[EN /(EY -EN)]κPN (ω)) where PY is an estimate of the signal power spectrum of said frame, PN is an estimate of the noise power spectrum of said frame, and κ is a constant.
4. The filter of claim 3, wherein:
(a) κ is in the range of 6 to 7.
5. The filter of claim 3, wherein:
(a) PY is the power spectrum of a linear prediction coefficient (LPC) approximation of said frame; and
(b) EY is the energy of said LPC approximation.
6. The filter of claim 3, wherein:
(a) said PN is taken equal to: (i) a first product of a first constant and a noise power spectrum estimate for a preceding frame when PY exceeds said first product, (ii) a second product of a second constant and said noise power spectrum estimate for a preceding frame when PY is less than said second product, and (iii) PY otherwise; and
(b) said EN is the sum over frequencies of said PN.
7. The filter of claim 1, wherein:
(a) said attenuation filter has a transfer function H(ω) given by H(ω)² =max{C, PY (ω)/(PY (ω)+[EN /(EY -EN)]κPN (ω))} where max{A,B} is the maximum of A and B for all A and B, C is a clamp, PY is an estimate of the signal power spectrum of said frame, PY is an estimate of the noise power spectrum of said frame, and κ is a constant.
8. The filter of claim 7, wherein:
(a) said PN is taken equal to: (i) a first product of a first constant and a noise power spectrum estimate for a preceding frame when PY exceeds said first product, (ii) a second product of a second constant and said noise power spectrum estimate for a preceding frame when PY is less than said second product, and (iii) PY otherwise; and
(b) said EN is the sum over frequencies of said PN.
9. The filter of claim 1, wherein:
(a) said attenuation filter has a transfer function H(ω) given by H(ω)² =W★max{C, PY (ω)/(PY (ω)+[EN /(EY -EN)]κPN (ω))} where W is a window function, ★ denotes convolution, max{A,B} is the maximum of A and B for all A and B, C is a clamp, PY is an estimate of the signal power spectrum of said frame, PN is an estimate of the noise power spectrum of said frame, and κ is a constant.
10. A method of filtering a stream of sampled acoustic signals, comprising the steps of:
(a) partitioning a stream of sampled acoustic signals into a sequence of frames;
(b) Fourier transforming said frames to yield a sequence of transformed frames;
(c) applying a generalized Wiener filter with a noise suppression factor to said transformed frames to yield a sequence of filtered transformed frames, wherein said noise suppression factor of said filter for a transformed frame depends upon estimates of the signal-to-noise ratio of said transformed frame; and
(d) inverse Fourier transforming said sequence of filtered transformed frames to yield a sequence of filtered frames.
11. The method of claim 10, wherein:
(a) said generalized Wiener filter has a transfer function H(ω) given by H(ω)² =max{C, PY (ω)/(PY (ω)+[EN /(EY -EN)]κPN (ω))} where max{A,B} is the maximum of A and B for all A and B, C is a clamp, PY is an estimate of the signal power spectrum of said frame, PN is an estimate of the noise power spectrum of said frame, and κ is a constant.
12. A speech system comprising:
a. a speech acquiring module;
b. a noise suppressing module coupled to said acquiring module;
c. an analyzing module coupled to said suppressing module;
d. a transmitting/storing module coupled to said analyzing module;
e. a synthesizing module coupled to said transmitting/storing module;
f. a playingback module coupled to said synthesizing module;
g. wherein said noise suppressing module includes an attenuating filter with a noise suppressing factor depending upon $E_N$ divided by $E_Y$, where $E_N$ is an estimate of noise energy of a frame of speech and $E_Y$ is an estimate of signal energy of said frame.
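The noise update of claim 8 and the clamped gain of claims 7 and 11 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: κ is read as a multiplier on the bracketed energy ratio (the flattened claim text does not settle this), E_Y is taken as the sum of P_Y over frequencies by analogy with claim 8(b), P_Y is approximated by a raw periodogram, and the guard for E_Y ≤ E_N is an addition. The function names and the default values of `up`, `down`, `kappa`, and `clamp` are hypothetical placeholders. The window smoothing of claim 9 is not shown.

```python
import cmath
import math


def update_noise_spectrum(p_y, p_n_prev, up=1.01, down=0.98):
    """Claim 8(a): follow P_Y, bounded relative to the previous noise estimate.

    `up` and `down` stand in for the claim's first and second constants;
    their values here are placeholders, not taken from the claims.
    """
    p_n = []
    for y, n_prev in zip(p_y, p_n_prev):
        ceiling, floor = up * n_prev, down * n_prev
        if y > ceiling:
            p_n.append(ceiling)  # (i) P_Y exceeds the first product
        elif y < floor:
            p_n.append(floor)    # (ii) P_Y is below the second product
        else:
            p_n.append(y)        # (iii) otherwise take P_Y itself
    return p_n


def squared_gain(p_y, p_n, kappa=6.0, clamp=0.01, eps=1e-12):
    """Claims 7/11: H^2 = max{C, P_Y / (P_Y + [E_N/(E_Y - E_N)] kappa P_N)}."""
    e_y = sum(p_y)  # assumption: E_Y is the sum of P_Y over frequencies
    e_n = sum(p_n)  # claim 8(b): E_N is the sum of P_N over frequencies
    # Assumption: kappa multiplies the energy ratio; eps guards E_Y <= E_N.
    alpha = kappa * e_n / max(e_y - e_n, eps)
    gains = []
    for y, n in zip(p_y, p_n):
        denom = y + alpha * n
        ratio = y / denom if denom > 0 else 0.0
        gains.append(max(clamp, ratio))
    return gains


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, enough for a short frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def filter_frame(frame, p_n, kappa=6.0, clamp=0.01):
    """Claim 10(b)-(d) for one frame: transform, attenuate, inverse transform."""
    spectrum = dft(frame)
    p_y = [abs(c) ** 2 for c in spectrum]  # assumption: periodogram as P_Y
    h2 = squared_gain(p_y, p_n, kappa=kappa, clamp=clamp)
    filtered = [c * math.sqrt(g) for c, g in zip(spectrum, h2)]
    return [v.real for v in dft(filtered, inverse=True)]
```

In a frame loop, one would call `update_noise_spectrum` on each frame's power spectrum and pass the result to `filter_frame`. With an all-zero noise estimate the suppression factor is zero and the frame passes through essentially unchanged.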
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/425,125 US5706395A (en) | 1995-04-19 | 1995-04-19 | Adaptive weiner filtering using a dynamic suppression factor |
Publications (1)
Publication Number | Publication Date |
---|---|
US5706395A (en) | 1998-01-06 |
Family
ID=23685267
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US08/425,125 Expired - Lifetime US5706395A (en) | 1995-04-19 | 1995-04-19 | Adaptive weiner filtering using a dynamic suppression factor |
Country Status (1)
Country | Link |
---|---|
US (1) | US5706395A (en) |
Cited By (89)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5787390A (en) * | 1995-12-15 | 1998-07-28 | France Telecom | Method for linear predictive analysis of an audiofrequency signal, and method for coding and decoding an audiofrequency signal including application thereof |
US5999899A (en) * | 1997-06-19 | 1999-12-07 | Softsound Limited | Low bit rate audio coder and decoder operating in a transform domain using vector quantization |
US6002722A (en) * | 1996-05-09 | 1999-12-14 | Texas Instruments Incorporated | Multimode digital modem |
WO1999067774A1 (en) * | 1998-06-22 | 1999-12-29 | Dspc Technologies Ltd. | A noise suppressor having weighted gain smoothing |
WO2000041169A1 (en) * | 1999-01-07 | 2000-07-13 | Tellabs Operations, Inc. | Method and apparatus for adaptively suppressing noise |
US6112170A (en) * | 1998-06-26 | 2000-08-29 | Lsi Logic Corporation | Method for decompressing linear PCM and AC3 encoded audio gain value |
US6122609A (en) * | 1997-06-09 | 2000-09-19 | France Telecom | Method and device for the optimized processing of a disturbing signal during a sound capture |
US6175602B1 (en) * | 1998-05-27 | 2001-01-16 | Telefonaktiebolaget Lm Ericsson (Publ) | Signal noise reduction by spectral subtraction using linear convolution and casual filtering |
US6205421B1 (en) * | 1994-12-19 | 2001-03-20 | Matsushita Electric Industrial Co., Ltd. | Speech coding apparatus, linear prediction coefficient analyzing apparatus and noise reducing apparatus |
US6243674B1 (en) * | 1995-10-20 | 2001-06-05 | American Online, Inc. | Adaptively compressing sound with multiple codebooks |
US20010005822A1 (en) * | 1999-12-13 | 2001-06-28 | Fujitsu Limited | Noise suppression apparatus realized by linear prediction analyzing circuit |
WO2001073759A1 (en) * | 2000-03-28 | 2001-10-04 | Tellabs Operations, Inc. | Perceptual spectral weighting of frequency bands for adaptive noise cancellation |
US20020010583A1 (en) * | 1997-10-31 | 2002-01-24 | Naoto Iwahashi | Feature extraction apparatus and method and pattern recognition apparatus and method |
US20020035471A1 (en) * | 2000-05-09 | 2002-03-21 | Thomson-Csf | Method and device for voice recognition in environments with fluctuating noise levels |
US6415253B1 (en) * | 1998-02-20 | 2002-07-02 | Meta-C Corporation | Method and apparatus for enhancing noise-corrupted speech |
WO2002061733A1 (en) * | 2001-01-31 | 2002-08-08 | Motorola, Inc. | Methods and apparatus for reducing noise associated with an electrical speech signal |
EP1238479A1 (en) * | 1999-12-03 | 2002-09-11 | Motorola, Inc. | Method and apparatus for suppressing acoustic background noise in a communication system |
US20020128830A1 (en) * | 2001-01-25 | 2002-09-12 | Hiroshi Kanazawa | Method and apparatus for suppressing noise components contained in speech signal |
US6463408B1 (en) * | 2000-11-22 | 2002-10-08 | Ericsson, Inc. | Systems and methods for improving power spectral estimation of speech signals |
WO2002082427A1 (en) * | 2001-04-09 | 2002-10-17 | Koninklijke Philips Electronics N.V. | Speech enhancement device |
US20020184010A1 (en) * | 2001-03-30 | 2002-12-05 | Anders Eriksson | Noise suppression |
EP1286334A2 (en) * | 2001-07-31 | 2003-02-26 | Alcatel | Method and circuit arrangement for reducing noise during voice communication in communications systems |
US6604071B1 (en) * | 1999-02-09 | 2003-08-05 | At&T Corp. | Speech enhancement with gain limitations based on speech activity |
US6629068B1 (en) * | 1998-10-13 | 2003-09-30 | Nokia Mobile Phones, Ltd. | Calculating a postfilter frequency response for filtering digitally processed speech |
US6721700B1 (en) * | 1997-03-14 | 2004-04-13 | Nokia Mobile Phones Limited | Audio coding method and apparatus |
US20040083095A1 (en) * | 2002-10-23 | 2004-04-29 | James Ashley | Method and apparatus for coding a noise-suppressed audio signal |
GB2398913A (en) * | 2003-02-27 | 2004-09-01 | Motorola Inc | Noise estimation in speech recognition |
US20050071154A1 (en) * | 2003-09-30 | 2005-03-31 | Walter Etter | Method and apparatus for estimating noise in speech signals |
US20050071160A1 (en) * | 2003-09-26 | 2005-03-31 | Industrial Technology Research Institute | Energy feature extraction method for noisy speech recognition |
US20050071156A1 (en) * | 2003-09-30 | 2005-03-31 | Intel Corporation | Method for spectral subtraction in speech enhancement |
US20050091049A1 (en) * | 2003-10-28 | 2005-04-28 | Rongzhen Yang | Method and apparatus for reduction of musical noise during speech enhancement |
US20050182624A1 (en) * | 2004-02-16 | 2005-08-18 | Microsoft Corporation | Method and apparatus for constructing a speech filter using estimates of clean speech and noise |
US20050203735A1 (en) * | 2004-03-09 | 2005-09-15 | International Business Machines Corporation | Signal noise reduction |
US20050240401A1 (en) * | 2004-04-23 | 2005-10-27 | Acoustic Technologies, Inc. | Noise suppression based on Bark band weiner filtering and modified doblinger noise estimate |
US20050278172A1 (en) * | 2004-06-15 | 2005-12-15 | Microsoft Corporation | Gain constrained noise suppression |
US20060020454A1 (en) * | 2004-07-21 | 2006-01-26 | Phonak Ag | Method and system for noise suppression in inductive receivers |
US20060116874A1 (en) * | 2003-10-24 | 2006-06-01 | Jonas Samuelsson | Noise-dependent postfiltering |
US20060184363A1 (en) * | 2005-02-17 | 2006-08-17 | Mccree Alan | Noise suppression |
US20060200344A1 (en) * | 2005-03-07 | 2006-09-07 | Kosek Daniel A | Audio spectral noise reduction method and apparatus |
US20060271356A1 (en) * | 2005-04-01 | 2006-11-30 | Vos Koen B | Systems, methods, and apparatus for quantization of spectral envelope representation |
EP1729287A1 (en) * | 1999-01-07 | 2006-12-06 | Tellabs Operations, Inc. | Method and apparatus for adaptively suppressing noise |
US20060277039A1 (en) * | 2005-04-22 | 2006-12-07 | Vos Koen B | Systems, methods, and apparatus for gain factor smoothing |
US7177805B1 (en) | 1999-02-01 | 2007-02-13 | Texas Instruments Incorporated | Simplified noise suppression circuit |
US20070150270A1 (en) * | 2005-12-26 | 2007-06-28 | Tai-Huei Huang | Method for removing background noise in a speech signal |
US20080167870A1 (en) * | 2007-07-25 | 2008-07-10 | Harman International Industries, Inc. | Noise reduction with integrated tonal noise reduction |
US20090012786A1 (en) * | 2007-07-06 | 2009-01-08 | Texas Instruments Incorporated | Adaptive Noise Cancellation |
US20090012783A1 (en) * | 2007-07-06 | 2009-01-08 | Audience, Inc. | System and method for adaptive intelligent noise suppression |
US20090063143A1 (en) * | 2007-08-31 | 2009-03-05 | Gerhard Uwe Schmidt | System for speech signal enhancement in a noisy environment through corrective adjustment of spectral noise power density estimations |
WO2009082299A1 (en) * | 2007-12-20 | 2009-07-02 | Telefonaktiebolaget L M Ericsson (Publ) | Noise suppression method and apparatus |
US20090323982A1 (en) * | 2006-01-30 | 2009-12-31 | Ludger Solbach | System and method for providing noise suppression utilizing null processing noise subtraction |
US20100094643A1 (en) * | 2006-05-25 | 2010-04-15 | Audience, Inc. | Systems and methods for reconstructing decomposed audio signals |
US20100100386A1 (en) * | 2007-03-19 | 2010-04-22 | Dolby Laboratories Licensing Corporation | Noise Variance Estimator for Speech Enhancement |
US20100169082A1 (en) * | 2007-06-15 | 2010-07-01 | Alon Konchitsky | Enhancing Receiver Intelligibility in Voice Communication Devices |
US20100174535A1 (en) * | 2009-01-06 | 2010-07-08 | Skype Limited | Filtering speech |
US7885810B1 (en) * | 2007-05-10 | 2011-02-08 | Mediatek Inc. | Acoustic signal enhancement method and apparatus |
US7889874B1 (en) | 1999-11-15 | 2011-02-15 | Nokia Corporation | Noise suppressor |
US20110123045A1 (en) * | 2008-11-04 | 2011-05-26 | Hirohisa Tasaki | Noise suppressor |
US20120035920A1 (en) * | 2010-08-04 | 2012-02-09 | Fujitsu Limited | Noise estimation apparatus, noise estimation method, and noise estimation program |
US8143620B1 (en) | 2007-12-21 | 2012-03-27 | Audience, Inc. | System and method for adaptive classification of audio sources |
US8150065B2 (en) | 2006-05-25 | 2012-04-03 | Audience, Inc. | System and method for processing an audio signal |
US8180064B1 (en) | 2007-12-21 | 2012-05-15 | Audience, Inc. | System and method for providing voice equalization |
US8189766B1 (en) | 2007-07-26 | 2012-05-29 | Audience, Inc. | System and method for blind subband acoustic echo cancellation postfiltering |
US8194882B2 (en) | 2008-02-29 | 2012-06-05 | Audience, Inc. | System and method for providing single microphone noise suppression fallback |
US8194880B2 (en) | 2006-01-30 | 2012-06-05 | Audience, Inc. | System and method for utilizing omni-directional microphones for speech enhancement |
EP2460156A1 (en) * | 2009-07-29 | 2012-06-06 | BYD Company Limited | Method and device for eliminating background noise |
US8204252B1 (en) | 2006-10-10 | 2012-06-19 | Audience, Inc. | System and method for providing close microphone adaptive array processing |
US8204253B1 (en) | 2008-06-30 | 2012-06-19 | Audience, Inc. | Self calibration of audio device |
US8259926B1 (en) | 2007-02-23 | 2012-09-04 | Audience, Inc. | System and method for 2-channel and 3-channel acoustic echo cancellation |
US20120290296A1 (en) * | 2005-09-02 | 2012-11-15 | Nec Corporation | Method, Apparatus, and Computer Program for Suppressing Noise |
US8345890B2 (en) | 2006-01-05 | 2013-01-01 | Audience, Inc. | System and method for utilizing inter-microphone level differences for speech enhancement |
US8355511B2 (en) | 2008-03-18 | 2013-01-15 | Audience, Inc. | System and method for envelope-based acoustic echo cancellation |
US20130054232A1 (en) * | 2011-08-24 | 2013-02-28 | Texas Instruments Incorporated | Method, System and Computer Program Product for Attenuating Noise in Multiple Time Frames |
US8521530B1 (en) | 2008-06-30 | 2013-08-27 | Audience, Inc. | System and method for enhancing a monaural audio signal |
US20130231927A1 (en) * | 2012-03-05 | 2013-09-05 | Pierre Zakarauskas | Formant Based Speech Reconstruction from Noisy Signals |
US8774423B1 (en) | 2008-06-30 | 2014-07-08 | Audience, Inc. | System and method for controlling adaptivity of signal modification using a phantom coefficient |
US8849231B1 (en) | 2007-08-08 | 2014-09-30 | Audience, Inc. | System and method for adaptive power control |
US8949120B1 (en) | 2006-05-25 | 2015-02-03 | Audience, Inc. | Adaptive noise cancelation |
US20150081285A1 (en) * | 2013-09-16 | 2015-03-19 | Samsung Electronics Co., Ltd. | Speech signal processing apparatus and method for enhancing speech intelligibility |
US9008329B1 (en) | 2010-01-26 | 2015-04-14 | Audience, Inc. | Noise reduction using multi-feature cluster tracker |
US9159336B1 (en) * | 2013-01-21 | 2015-10-13 | Rawles Llc | Cross-domain filtering for audio noise reduction |
US20150294667A1 (en) * | 2014-04-09 | 2015-10-15 | Electronics And Telecommunications Research Institute | Noise cancellation apparatus and method |
EP3057097A1 (en) * | 2015-02-11 | 2016-08-17 | Nxp B.V. | Time zero convergence single microphone noise reduction |
US9536540B2 (en) | 2013-07-19 | 2017-01-03 | Knowles Electronics, Llc | Speech signal separation and synthesis based on auditory scene analysis and speech modeling |
US9558755B1 (en) | 2010-05-20 | 2017-01-31 | Knowles Electronics, Llc | Noise suppression assisted automatic speech recognition |
CN106486131A (en) * | 2016-10-14 | 2017-03-08 | 上海谦问万答吧云计算科技有限公司 | A kind of method and device of speech de-noising |
US9640194B1 (en) | 2012-10-04 | 2017-05-02 | Knowles Electronics, Llc | Noise suppression for speech processing based on machine-learning mask estimation |
US9799330B2 (en) | 2014-08-28 | 2017-10-24 | Knowles Electronics, Llc | Multi-sourced noise suppression |
US20180204580A1 (en) * | 2015-09-25 | 2018-07-19 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Encoder and method for encoding an audio signal with reduced background noise using linear predictive coding |
CN118130440A (en) * | 2024-03-13 | 2024-06-04 | 济宁学院 | Quantum dot printing ink rapid detection method based on spectral characteristics |
Non-Patent Citations (2)
Title |
---|
Arslan et al., "New Methods for Adaptive Noise Suppression," ICASSP '95: Acoustics, Speech & Signal Processing Conference, pp. 812-815, May 1995. |
Deller et al., "Discrete-Time Processing of Speech Signals," Prentice-Hall, Inc., pp. 506-528, 1987. |
Cited By (170)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6205421B1 (en) * | 1994-12-19 | 2001-03-20 | Matsushita Electric Industrial Co., Ltd. | Speech coding apparatus, linear prediction coefficient analyzing apparatus and noise reducing apparatus |
US6424941B1 (en) | 1995-10-20 | 2002-07-23 | America Online, Inc. | Adaptively compressing sound with multiple codebooks |
US6243674B1 (en) * | 1995-10-20 | 2001-06-05 | American Online, Inc. | Adaptively compressing sound with multiple codebooks |
US5787390A (en) * | 1995-12-15 | 1998-07-28 | France Telecom | Method for linear predictive analysis of an audiofrequency signal, and method for coding and decoding an audiofrequency signal including application thereof |
US6002722A (en) * | 1996-05-09 | 1999-12-14 | Texas Instruments Incorporated | Multimode digital modem |
US7194407B2 (en) | 1997-03-14 | 2007-03-20 | Nokia Corporation | Audio coding method and apparatus |
US20040093208A1 (en) * | 1997-03-14 | 2004-05-13 | Lin Yin | Audio coding method and apparatus |
US6721700B1 (en) * | 1997-03-14 | 2004-04-13 | Nokia Mobile Phones Limited | Audio coding method and apparatus |
US6122609A (en) * | 1997-06-09 | 2000-09-19 | France Telecom | Method and device for the optimized processing of a disturbing signal during a sound capture |
US5999899A (en) * | 1997-06-19 | 1999-12-07 | Softsound Limited | Low bit rate audio coder and decoder operating in a transform domain using vector quantization |
US7509256B2 (en) | 1997-10-31 | 2009-03-24 | Sony Corporation | Feature extraction apparatus and method and pattern recognition apparatus and method |
US20020010583A1 (en) * | 1997-10-31 | 2002-01-24 | Naoto Iwahashi | Feature extraction apparatus and method and pattern recognition apparatus and method |
US20050171772A1 (en) * | 1997-10-31 | 2005-08-04 | Sony Corporation | Feature extraction apparatus and method and pattern recognition apparatus and method |
US6910010B2 (en) * | 1997-10-31 | 2005-06-21 | Sony Corporation | Feature extraction apparatus and method and pattern recognition apparatus and method |
US6415253B1 (en) * | 1998-02-20 | 2002-07-02 | Meta-C Corporation | Method and apparatus for enhancing noise-corrupted speech |
US6175602B1 (en) * | 1998-05-27 | 2001-01-16 | Telefonaktiebolaget Lm Ericsson (Publ) | Signal noise reduction by spectral subtraction using linear convolution and casual filtering |
WO1999067774A1 (en) * | 1998-06-22 | 1999-12-29 | Dspc Technologies Ltd. | A noise suppressor having weighted gain smoothing |
US6317709B1 (en) | 1998-06-22 | 2001-11-13 | D.S.P.C. Technologies Ltd. | Noise suppressor having weighted gain smoothing |
US6088668A (en) * | 1998-06-22 | 2000-07-11 | D.S.P.C. Technologies Ltd. | Noise suppressor having weighted gain smoothing |
US6112170A (en) * | 1998-06-26 | 2000-08-29 | Lsi Logic Corporation | Method for decompressing linear PCM and AC3 encoded audio gain value |
US6629068B1 (en) * | 1998-10-13 | 2003-09-30 | Nokia Mobile Phones, Ltd. | Calculating a postfilter frequency response for filtering digitally processed speech |
US20050131678A1 (en) * | 1999-01-07 | 2005-06-16 | Ravi Chandran | Communication system tonal component maintenance techniques |
EP1748426A3 (en) * | 1999-01-07 | 2007-02-21 | Tellabs Operations, Inc. | Method and apparatus for adaptively suppressing noise |
WO2000041169A1 (en) * | 1999-01-07 | 2000-07-13 | Tellabs Operations, Inc. | Method and apparatus for adaptively suppressing noise |
US7366294B2 (en) | 1999-01-07 | 2008-04-29 | Tellabs Operations, Inc. | Communication system tonal component maintenance techniques |
EP1729287A1 (en) * | 1999-01-07 | 2006-12-06 | Tellabs Operations, Inc. | Method and apparatus for adaptively suppressing noise |
US8031861B2 (en) | 1999-01-07 | 2011-10-04 | Tellabs Operations, Inc. | Communication system tonal component maintenance techniques |
US6591234B1 (en) * | 1999-01-07 | 2003-07-08 | Tellabs Operations, Inc. | Method and apparatus for adaptively suppressing noise |
US7177805B1 (en) | 1999-02-01 | 2007-02-13 | Texas Instruments Incorporated | Simplified noise suppression circuit |
US6604071B1 (en) * | 1999-02-09 | 2003-08-05 | At&T Corp. | Speech enhancement with gain limitations based on speech activity |
US7889874B1 (en) | 1999-11-15 | 2011-02-15 | Nokia Corporation | Noise suppressor |
EP1238479A1 (en) * | 1999-12-03 | 2002-09-11 | Motorola, Inc. | Method and apparatus for suppressing acoustic background noise in a communication system |
EP1238479A4 (en) * | 1999-12-03 | 2005-07-27 | Motorola Inc | Method and apparatus for suppressing acoustic background noise in a communication system |
US20010005822A1 (en) * | 1999-12-13 | 2001-06-28 | Fujitsu Limited | Noise suppression apparatus realized by linear prediction analyzing circuit |
WO2001073759A1 (en) * | 2000-03-28 | 2001-10-04 | Tellabs Operations, Inc. | Perceptual spectral weighting of frequency bands for adaptive noise cancellation |
US6859773B2 (en) * | 2000-05-09 | 2005-02-22 | Thales | Method and device for voice recognition in environments with fluctuating noise levels |
US20020035471A1 (en) * | 2000-05-09 | 2002-03-21 | Thomson-Csf | Method and device for voice recognition in environments with fluctuating noise levels |
US6463408B1 (en) * | 2000-11-22 | 2002-10-08 | Ericsson, Inc. | Systems and methods for improving power spectral estimation of speech signals |
US20020128830A1 (en) * | 2001-01-25 | 2002-09-12 | Hiroshi Kanazawa | Method and apparatus for suppressing noise components contained in speech signal |
WO2002061733A1 (en) * | 2001-01-31 | 2002-08-08 | Motorola, Inc. | Methods and apparatus for reducing noise associated with an electrical speech signal |
US6480821B2 (en) * | 2001-01-31 | 2002-11-12 | Motorola, Inc. | Methods and apparatus for reducing noise associated with an electrical speech signal |
US20020184010A1 (en) * | 2001-03-30 | 2002-12-05 | Anders Eriksson | Noise suppression |
US7209879B2 (en) * | 2001-03-30 | 2007-04-24 | Telefonaktiebolaget Lm Ericsson (Publ) | Noise suppression |
US6996524B2 (en) | 2001-04-09 | 2006-02-07 | Koninklijke Philips Electronics N.V. | Speech enhancement device |
WO2002082427A1 (en) * | 2001-04-09 | 2002-10-17 | Koninklijke Philips Electronics N.V. | Speech enhancement device |
US20020156624A1 (en) * | 2001-04-09 | 2002-10-24 | Gigi Ercan Ferit | Speech enhancement device |
EP1286334A2 (en) * | 2001-07-31 | 2003-02-26 | Alcatel | Method and circuit arrangement for reducing noise during voice communication in communications systems |
EP1286334A3 (en) * | 2001-07-31 | 2004-02-11 | Alcatel | Method and circuit arrangement for reducing noise during voice communication in communications systems |
US20040083095A1 (en) * | 2002-10-23 | 2004-04-29 | James Ashley | Method and apparatus for coding a noise-suppressed audio signal |
US7343283B2 (en) * | 2002-10-23 | 2008-03-11 | Motorola, Inc. | Method and apparatus for coding a noise-suppressed audio signal |
US20070033020A1 (en) * | 2003-02-27 | 2007-02-08 | Kelleher Francois Holly L | Estimation of noise in a speech signal |
GB2398913A (en) * | 2003-02-27 | 2004-09-01 | Motorola Inc | Noise estimation in speech recognition |
GB2398913B (en) * | 2003-02-27 | 2005-08-17 | Motorola Inc | Noise estimation in speech recognition |
US20050071160A1 (en) * | 2003-09-26 | 2005-03-31 | Industrial Technology Research Institute | Energy feature extraction method for noisy speech recognition |
US7480614B2 (en) * | 2003-09-26 | 2009-01-20 | Industrial Technology Research Institute | Energy feature extraction method for noisy speech recognition |
US7428490B2 (en) * | 2003-09-30 | 2008-09-23 | Intel Corporation | Method for spectral subtraction in speech enhancement |
US20050071154A1 (en) * | 2003-09-30 | 2005-03-31 | Walter Etter | Method and apparatus for estimating noise in speech signals |
US20050071156A1 (en) * | 2003-09-30 | 2005-03-31 | Intel Corporation | Method for spectral subtraction in speech enhancement |
US20060116874A1 (en) * | 2003-10-24 | 2006-06-01 | Jonas Samuelsson | Noise-dependent postfiltering |
US20050091049A1 (en) * | 2003-10-28 | 2005-04-28 | Rongzhen Yang | Method and apparatus for reduction of musical noise during speech enhancement |
US7725314B2 (en) * | 2004-02-16 | 2010-05-25 | Microsoft Corporation | Method and apparatus for constructing a speech filter using estimates of clean speech and noise |
US20050182624A1 (en) * | 2004-02-16 | 2005-08-18 | Microsoft Corporation | Method and apparatus for constructing a speech filter using estimates of clean speech and noise |
US20050203735A1 (en) * | 2004-03-09 | 2005-09-15 | International Business Machines Corporation | Signal noise reduction |
US7797154B2 (en) | 2004-03-09 | 2010-09-14 | International Business Machines Corporation | Signal noise reduction |
US20080306734A1 (en) * | 2004-03-09 | 2008-12-11 | Osamu Ichikawa | Signal Noise Reduction |
KR100851716B1 (en) * | 2004-04-23 | 2008-08-11 | 어쿠스틱 테크놀로지스, 인코포레이티드 | Noise suppression based on bark band weiner filtering and modified doblinger noise estimate |
US20050240401A1 (en) * | 2004-04-23 | 2005-10-27 | Acoustic Technologies, Inc. | Noise suppression based on Bark band weiner filtering and modified doblinger noise estimate |
US7492889B2 (en) * | 2004-04-23 | 2009-02-17 | Acoustic Technologies, Inc. | Noise suppression based on bark band wiener filtering and modified doblinger noise estimate |
WO2005109404A3 (en) * | 2004-04-23 | 2007-11-22 | Acoustic Tech Inc | Noise suppression based upon bark band weiner filtering and modified doblinger noise estimate |
US20050278172A1 (en) * | 2004-06-15 | 2005-12-15 | Microsoft Corporation | Gain constrained noise suppression |
US7454332B2 (en) * | 2004-06-15 | 2008-11-18 | Microsoft Corporation | Gain constrained noise suppression |
US20060020454A1 (en) * | 2004-07-21 | 2006-01-26 | Phonak Ag | Method and system for noise suppression in inductive receivers |
US20060184363A1 (en) * | 2005-02-17 | 2006-08-17 | Mccree Alan | Noise suppression |
US20060200344A1 (en) * | 2005-03-07 | 2006-09-07 | Kosek Daniel A | Audio spectral noise reduction method and apparatus |
US7742914B2 (en) | 2005-03-07 | 2010-06-22 | Daniel A. Kosek | Audio spectral noise reduction method and apparatus |
US8140324B2 (en) | 2005-04-01 | 2012-03-20 | Qualcomm Incorporated | Systems, methods, and apparatus for gain coding |
US20060282263A1 (en) * | 2005-04-01 | 2006-12-14 | Vos Koen B | Systems, methods, and apparatus for highband time warping |
US20070088542A1 (en) * | 2005-04-01 | 2007-04-19 | Vos Koen B | Systems, methods, and apparatus for wideband speech coding |
US8484036B2 (en) | 2005-04-01 | 2013-07-09 | Qualcomm Incorporated | Systems, methods, and apparatus for wideband speech coding |
US20060271356A1 (en) * | 2005-04-01 | 2006-11-30 | Vos Koen B | Systems, methods, and apparatus for quantization of spectral envelope representation |
US20060277038A1 (en) * | 2005-04-01 | 2006-12-07 | Qualcomm Incorporated | Systems, methods, and apparatus for highband excitation generation |
US8364494B2 (en) | 2005-04-01 | 2013-01-29 | Qualcomm Incorporated | Systems, methods, and apparatus for split-band filtering and encoding of a wideband signal |
US20060277042A1 (en) * | 2005-04-01 | 2006-12-07 | Vos Koen B | Systems, methods, and apparatus for anti-sparseness filtering |
US8332228B2 (en) | 2005-04-01 | 2012-12-11 | Qualcomm Incorporated | Systems, methods, and apparatus for anti-sparseness filtering |
US8069040B2 (en) | 2005-04-01 | 2011-11-29 | Qualcomm Incorporated | Systems, methods, and apparatus for quantization of spectral envelope representation |
US8260611B2 (en) | 2005-04-01 | 2012-09-04 | Qualcomm Incorporated | Systems, methods, and apparatus for highband excitation generation |
US8244526B2 (en) | 2005-04-01 | 2012-08-14 | Qualcomm Incorporated | Systems, methods, and apparatus for highband burst suppression |
US20080126086A1 (en) * | 2005-04-01 | 2008-05-29 | Qualcomm Incorporated | Systems, methods, and apparatus for gain coding |
US8078474B2 (en) | 2005-04-01 | 2011-12-13 | Qualcomm Incorporated | Systems, methods, and apparatus for highband time warping |
US20070088541A1 (en) * | 2005-04-01 | 2007-04-19 | Vos Koen B | Systems, methods, and apparatus for highband burst suppression |
US8892448B2 (en) | 2005-04-22 | 2014-11-18 | Qualcomm Incorporated | Systems, methods, and apparatus for gain factor smoothing |
US20060277039A1 (en) * | 2005-04-22 | 2006-12-07 | Vos Koen B | Systems, methods, and apparatus for gain factor smoothing |
US9043214B2 (en) | 2005-04-22 | 2015-05-26 | Qualcomm Incorporated | Systems, methods, and apparatus for gain factor attenuation |
US20060282262A1 (en) * | 2005-04-22 | 2006-12-14 | Vos Koen B | Systems, methods, and apparatus for gain factor attenuation |
US8477963B2 (en) | 2005-09-02 | 2013-07-02 | Nec Corporation | Method, apparatus, and computer program for suppressing noise |
US20120290296A1 (en) * | 2005-09-02 | 2012-11-15 | Nec Corporation | Method, Apparatus, and Computer Program for Suppressing Noise |
US8489394B2 (en) * | 2005-09-02 | 2013-07-16 | Nec Corporation | Method, apparatus, and computer program for suppressing noise |
US20070150270A1 (en) * | 2005-12-26 | 2007-06-28 | Tai-Huei Huang | Method for removing background noise in a speech signal |
US8345890B2 (en) | 2006-01-05 | 2013-01-01 | Audience, Inc. | System and method for utilizing inter-microphone level differences for speech enhancement |
US8867759B2 (en) | 2006-01-05 | 2014-10-21 | Audience, Inc. | System and method for utilizing inter-microphone level differences for speech enhancement |
US9185487B2 (en) | 2006-01-30 | 2015-11-10 | Audience, Inc. | System and method for providing noise suppression utilizing null processing noise subtraction |
US20090323982A1 (en) * | 2006-01-30 | 2009-12-31 | Ludger Solbach | System and method for providing noise suppression utilizing null processing noise subtraction |
US8194880B2 (en) | 2006-01-30 | 2012-06-05 | Audience, Inc. | System and method for utilizing omni-directional microphones for speech enhancement |
US20100094643A1 (en) * | 2006-05-25 | 2010-04-15 | Audience, Inc. | Systems and methods for reconstructing decomposed audio signals |
US8150065B2 (en) | 2006-05-25 | 2012-04-03 | Audience, Inc. | System and method for processing an audio signal |
US8934641B2 (en) | 2006-05-25 | 2015-01-13 | Audience, Inc. | Systems and methods for reconstructing decomposed audio signals |
US9830899B1 (en) | 2006-05-25 | 2017-11-28 | Knowles Electronics, Llc | Adaptive noise cancellation |
US8949120B1 (en) | 2006-05-25 | 2015-02-03 | Audience, Inc. | Adaptive noise cancelation |
US8204252B1 (en) | 2006-10-10 | 2012-06-19 | Audience, Inc. | System and method for providing close microphone adaptive array processing |
US8259926B1 (en) | 2007-02-23 | 2012-09-04 | Audience, Inc. | System and method for 2-channel and 3-channel acoustic echo cancellation |
US20100100386A1 (en) * | 2007-03-19 | 2010-04-22 | Dolby Laboratories Licensing Corporation | Noise Variance Estimator for Speech Enhancement |
US8280731B2 (en) * | 2007-03-19 | 2012-10-02 | Dolby Laboratories Licensing Corporation | Noise variance estimator for speech enhancement |
US7885810B1 (en) * | 2007-05-10 | 2011-02-08 | Mediatek Inc. | Acoustic signal enhancement method and apparatus |
US20100169082A1 (en) * | 2007-06-15 | 2010-07-01 | Alon Konchitsky | Enhancing Receiver Intelligibility in Voice Communication Devices |
US20090012783A1 (en) * | 2007-07-06 | 2009-01-08 | Audience, Inc. | System and method for adaptive intelligent noise suppression |
US20090012786A1 (en) * | 2007-07-06 | 2009-01-08 | Texas Instruments Incorporated | Adaptive Noise Cancellation |
US8744844B2 (en) | 2007-07-06 | 2014-06-03 | Audience, Inc. | System and method for adaptive intelligent noise suppression |
US8886525B2 (en) | 2007-07-06 | 2014-11-11 | Audience, Inc. | System and method for adaptive intelligent noise suppression |
EP2023342A1 (en) * | 2007-07-25 | 2009-02-11 | QNX Software Systems (Wavemakers), Inc. | Noise reduction with integrated tonal noise reduction |
US20080167870A1 (en) * | 2007-07-25 | 2008-07-10 | Harman International Industries, Inc. | Noise reduction with integrated tonal noise reduction |
US8489396B2 (en) | 2007-07-25 | 2013-07-16 | Qnx Software Systems Limited | Noise reduction with integrated tonal noise reduction |
US8189766B1 (en) | 2007-07-26 | 2012-05-29 | Audience, Inc. | System and method for blind subband acoustic echo cancellation postfiltering |
US8849231B1 (en) | 2007-08-08 | 2014-09-30 | Audience, Inc. | System and method for adaptive power control |
US20090063143A1 (en) * | 2007-08-31 | 2009-03-05 | Gerhard Uwe Schmidt | System for speech signal enhancement in a noisy environment through corrective adjustment of spectral noise power density estimations |
US8364479B2 (en) * | 2007-08-31 | 2013-01-29 | Nuance Communications, Inc. | System for speech signal enhancement in a noisy environment through corrective adjustment of spectral noise power density estimations |
WO2009082299A1 (en) * | 2007-12-20 | 2009-07-02 | Telefonaktiebolaget L M Ericsson (Publ) | Noise suppression method and apparatus |
US9177566B2 (en) | 2007-12-20 | 2015-11-03 | Telefonaktiebolaget L M Ericsson (Publ) | Noise suppression method and apparatus |
US20100274561A1 (en) * | 2007-12-20 | 2010-10-28 | Per Ahgren | Noise Suppression Method and Apparatus |
US8180064B1 (en) | 2007-12-21 | 2012-05-15 | Audience, Inc. | System and method for providing voice equalization |
US8143620B1 (en) | 2007-12-21 | 2012-03-27 | Audience, Inc. | System and method for adaptive classification of audio sources |
US9076456B1 (en) | 2007-12-21 | 2015-07-07 | Audience, Inc. | System and method for providing voice equalization |
US8194882B2 (en) | 2008-02-29 | 2012-06-05 | Audience, Inc. | System and method for providing single microphone noise suppression fallback |
US8355511B2 (en) | 2008-03-18 | 2013-01-15 | Audience, Inc. | System and method for envelope-based acoustic echo cancellation |
US8521530B1 (en) | 2008-06-30 | 2013-08-27 | Audience, Inc. | System and method for enhancing a monaural audio signal |
US8774423B1 (en) | 2008-06-30 | 2014-07-08 | Audience, Inc. | System and method for controlling adaptivity of signal modification using a phantom coefficient |
US8204253B1 (en) | 2008-06-30 | 2012-06-19 | Audience, Inc. | Self calibration of audio device |
US8737641B2 (en) * | 2008-11-04 | 2014-05-27 | Mitsubishi Electric Corporation | Noise suppressor |
US20110123045A1 (en) * | 2008-11-04 | 2011-05-26 | Hirohisa Tasaki | Noise suppressor |
US8352250B2 (en) * | 2009-01-06 | 2013-01-08 | Skype | Filtering speech |
US20100174535A1 (en) * | 2009-01-06 | 2010-07-08 | Skype Limited | Filtering speech |
EP2460156A1 (en) * | 2009-07-29 | 2012-06-06 | BYD Company Limited | Method and device for eliminating background noise |
EP2460156A4 (en) * | 2009-07-29 | 2012-12-26 | Byd Co Ltd | Method and device for eliminating background noise |
US9008329B1 (en) | 2010-01-26 | 2015-04-14 | Audience, Inc. | Noise reduction using multi-feature cluster tracker |
US9558755B1 (en) | 2010-05-20 | 2017-01-31 | Knowles Electronics, Llc | Noise suppression assisted automatic speech recognition |
US9460731B2 (en) * | 2010-08-04 | 2016-10-04 | Fujitsu Limited | Noise estimation apparatus, noise estimation method, and noise estimation program |
US20120035920A1 (en) * | 2010-08-04 | 2012-02-09 | Fujitsu Limited | Noise estimation apparatus, noise estimation method, and noise estimation program |
US9666206B2 (en) * | 2011-08-24 | 2017-05-30 | Texas Instruments Incorporated | Method, system and computer program product for attenuating noise in multiple time frames |
US20130054232A1 (en) * | 2011-08-24 | 2013-02-28 | Texas Instruments Incorporated | Method, System and Computer Program Product for Attenuating Noise in Multiple Time Frames |
US20130231927A1 (en) * | 2012-03-05 | 2013-09-05 | Pierre Zakarauskas | Formant Based Speech Reconstruction from Noisy Signals |
US20150187365A1 (en) * | 2012-03-05 | 2015-07-02 | Malaspina Labs (Barbados), Inc. | Formant Based Speech Reconstruction from Noisy Signals |
US9240190B2 (en) * | 2012-03-05 | 2016-01-19 | Malaspina Labs (Barbados) Inc. | Formant based speech reconstruction from noisy signals |
US20130231924A1 (en) * | 2012-03-05 | 2013-09-05 | Pierre Zakarauskas | Format Based Speech Reconstruction from Noisy Signals |
US9020818B2 (en) * | 2012-03-05 | 2015-04-28 | Malaspina Labs (Barbados) Inc. | Format based speech reconstruction from noisy signals |
US9015044B2 (en) * | 2012-03-05 | 2015-04-21 | Malaspina Labs (Barbados) Inc. | Formant based speech reconstruction from noisy signals |
US9640194B1 (en) | 2012-10-04 | 2017-05-02 | Knowles Electronics, Llc | Noise suppression for speech processing based on machine-learning mask estimation |
US9159336B1 (en) * | 2013-01-21 | 2015-10-13 | Rawles Llc | Cross-domain filtering for audio noise reduction |
US9536540B2 (en) | 2013-07-19 | 2017-01-03 | Knowles Electronics, Llc | Speech signal separation and synthesis based on auditory scene analysis and speech modeling |
US9767829B2 (en) * | 2013-09-16 | 2017-09-19 | Samsung Electronics Co., Ltd. | Speech signal processing apparatus and method for enhancing speech intelligibility |
US20150081285A1 (en) * | 2013-09-16 | 2015-03-19 | Samsung Electronics Co., Ltd. | Speech signal processing apparatus and method for enhancing speech intelligibility |
US20150294667A1 (en) * | 2014-04-09 | 2015-10-15 | Electronics And Telecommunications Research Institute | Noise cancellation apparatus and method |
US9583120B2 (en) * | 2014-04-09 | 2017-02-28 | Electronics And Telecommunications Research Institute | Noise cancellation apparatus and method |
US9799330B2 (en) | 2014-08-28 | 2017-10-24 | Knowles Electronics, Llc | Multi-sourced noise suppression |
US9640195B2 (en) | 2015-02-11 | 2017-05-02 | Nxp B.V. | Time zero convergence single microphone noise reduction |
CN106024002A (en) * | 2015-02-11 | 2016-10-12 | 恩智浦有限公司 | Time zero convergence single microphone noise reduction |
EP3057097A1 (en) * | 2015-02-11 | 2016-08-17 | Nxp B.V. | Time zero convergence single microphone noise reduction |
CN106024002B (en) * | 2015-02-11 | 2021-05-11 | 汇顶科技(香港)有限公司 | Time zero convergence single microphone noise reduction |
US20180204580A1 (en) * | 2015-09-25 | 2018-07-19 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Encoder and method for encoding an audio signal with reduced background noise using linear predictive coding |
US10692510B2 (en) * | 2015-09-25 | 2020-06-23 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Encoder and method for encoding an audio signal with reduced background noise using linear predictive coding |
CN106486131A (en) * | 2016-10-14 | 2017-03-08 | 上海谦问万答吧云计算科技有限公司 | A kind of method and device of speech de-noising |
CN118130440A (en) * | 2024-03-13 | 2024-06-04 | 济宁学院 | Quantum dot printing ink rapid detection method based on spectral characteristics |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5706395A (en) | | Adaptive weiner filtering using a dynamic suppression factor |
US6263307B1 (en) | | Adaptive weiner filtering using line spectral frequencies |
US6591234B1 (en) | | Method and apparatus for adaptively suppressing noise |
EP0673014B1 (en) | | Acoustic signal transform coding method and decoding method |
US7379866B2 (en) | | Simple noise suppression model |
US8930184B2 (en) | | Signal bandwidth extending apparatus |
EP0683916B1 (en) | | Noise reduction |
US7492889B2 (en) | | Noise suppression based on bark band wiener filtering and modified doblinger noise estimate |
US7313518B2 (en) | | Noise reduction method and device using two pass filtering |
US7957965B2 (en) | | Communication system noise cancellation power signal calculation techniques |
US7649988B2 (en) | | Comfort noise generator using modified Doblinger noise estimate |
US8521530B1 (en) | | System and method for enhancing a monaural audio signal |
US20070232257A1 (en) | | Noise suppressor |
US20090012786A1 (en) | | Adaptive Noise Cancellation |
US20060184363A1 (en) | | Noise suppression |
CA2076072A1 (en) | | Auditory model for parametrization of speech |
JP2001501327A (en) | | Process and apparatus for blind equalization of transmission channel effects in digital audio signals |
EP1093112B1 (en) | | A method for generating speech feature signals and an apparatus for carrying through this method |
US20020177995A1 (en) | | Method and arrangement for performing a fourier transformation adapted to the transfer function of human sensory organs as well as a noise reduction facility and a speech recognition facility |
US7603271B2 (en) | | Speech coding apparatus with perceptual weighting and method therefor |
EP1748426A2 (en) | | Method and apparatus for adaptively suppressing noise |
EP2063420A1 (en) | | Method and assembly to enhance the intelligibility of speech |
Puder | | Kalman‐filters in subbands for noise reduction with enhanced pitch‐adaptive speech model estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | AS | Assignment | Owner name: TEXAS INSTRUMENTS INCORPORATED, TEXAS; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ARSLAN, LEVENT M.; MCCREE, ALAN V.; VISWANATHAN, VISHU R.; REEL/FRAME: 007530/0537; Effective date: 19950609 |
 | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
 | FPAY | Fee payment | Year of fee payment: 4 |
 | FPAY | Fee payment | Year of fee payment: 8 |
 | FPAY | Fee payment | Year of fee payment: 12 |