CN100543842C - Method for suppressing background noise based on multiple statistical models and minimum mean-square error - Google Patents

Info

Publication number
CN100543842C
Authority
CN
China
Prior art keywords
lambda
voice
exp
noise
imaginary part
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2006100811562A
Other languages
Chinese (zh)
Other versions
CN101079266A (en)
Inventor
吴颖谦
柯昌伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CNB2006100811562A
Publication of CN101079266A
Application granted
Publication of CN100543842C
Legal status: Expired - Fee Related
Anticipated expiration

Abstract

The present invention relates to a background noise suppression method based on multiple statistical models and minimum mean-square error (MMSE), comprising: performing a short-time Fourier transform on the speech signal of the current input frame; using the clean-speech amplitude variance estimate and the noise amplitude variance estimate retained from the previous frame for each frequency bin, together with each frequency component of the speech signal in the current input frame, to compute estimates of the real and imaginary parts of each frequency component of the speech signal in the current frame; and computing the a priori speech absence probability for each frequency component of the current input frame and using it to further correct the real- and imaginary-part estimates obtained above. The invention approximates the true distributions of speech and noise in practical applications more accurately, achieves stronger suppression, and makes the estimation process more accurate and robust. It offers high noise-suppression efficiency at low computational complexity and is suitable for a wide range of voice communication systems.

Description

Method for suppressing background noise based on multiple statistical models and minimum mean-square error
Technical field
The invention belongs to the field of speech processing and relates mainly to a method for suppressing background acoustic noise based on multiple statistical models and minimum mean-square error (MMSE). It is suitable as a preprocessing stage for voice communication, speech recognition, and similar applications.
Background art
In most voice communication applications, the system input receives only noisy speech corrupted by background noise. The noise severely degrades the quality of voice communication, reduces the clarity and intelligibility of speech, and adversely affects speech processing modules in the system such as the codec.
Background noise suppression extracts the cleanest possible original speech from noisy speech; this research belongs to the "speech enhancement" category of speech processing. Noise suppression helps improve the perceptual quality of noisy speech signals and the comfort and service quality of the communication environment. As a preprocessing module, it also improves the compression performance and stability of vocoders in noisy environments. In addition, it can effectively improve the robustness of speech recognition systems under background noise.
The speech input model of most communication applications is characterized by single-channel speech input and additive background noise. In this model, the observed noisy signal can be expressed as the sum of the speech and noise components in either the time domain or the frequency domain. Among existing noise suppression methods, short-time spectral weighting is the most mainstream technique. These methods apply a short-time Fourier transform to the noisy speech, design a gain coefficient for each frequency bin from the speech and noise components, achieve suppression by multiplying the noisy signal by this coefficient, and obtain the processed speech through a short-time inverse Fourier transform.
Spectral subtraction (S. F. Boll, Suppression of acoustic noise in speech using spectral subtraction, IEEE Trans. ASSP, vol. ASSP-27, pp. 113-120, Apr. 1979) is the most typical example. Its basic principle is to subtract the estimated noise magnitude from the noisy speech magnitude in the frequency domain while leaving the phase unchanged. Because this process is implemented as spectral weighting, the weighting gain coefficient is tied to the signal-to-noise ratio (SNR). Many methods follow this approach but differ in how they compute the gain and how they estimate the noise.
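The magnitude-subtraction idea described above can be sketched as a per-bin real gain applied to the noisy spectrum. This is only an illustration of the prior-art principle, not the method of the invention; the function name `spectral_subtraction` and the `floor` parameter are assumptions introduced here.

```python
def spectral_subtraction(noisy_spectrum, noise_magnitude, floor=0.0):
    """Subtract an estimated noise magnitude per bin while keeping the noisy phase.

    noisy_spectrum: list of complex STFT bins of one frame.
    noise_magnitude: list of estimated noise magnitudes, one per bin.
    floor: hypothetical lower bound on the gain (an assumption, not from the text).
    """
    enhanced = []
    for y, n_mag in zip(noisy_spectrum, noise_magnitude):
        mag = abs(y)
        if mag > 0.0:
            # (|Y| - |N|) / |Y| written as a gain, clipped from below.
            gain = max(1.0 - n_mag / mag, floor)
        else:
            gain = 0.0
        # A real gain scales the magnitude and leaves the phase unchanged.
        enhanced.append(gain * y)
    return enhanced
```

Because the gain equals 1 - |N|/|Y|, it grows toward 1 as the per-bin signal-to-noise ratio increases, which is the link between gain and SNR mentioned above.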
For example, the patent Method and Apparatus for Suppression Noise in a Communication System (US patent 5,659,622) computes the gain from a modified signal-to-noise-ratio index and the noise energy, and estimates the noise power by computing the spectral deviation from the long-term average power spectrum of each band.
The patent Method and Device for Speech Enhancement in the Presence of Background Noise (WO2005064595) additionally smooths the gain coefficients in the time domain.
The patent Low frequency spectral enhancement system and method (US patent 6,233,549) emphasizes enhancement of the low-frequency components when computing the spectral gain coefficients.
Some spectral weighting methods treat noise suppression as estimating the spectral amplitude of the original speech and achieve better results. Commonly used estimation criteria include maximum likelihood, ML (R. McAulay, Speech enhancement using a soft-decision noise suppression filter, IEEE Trans. A.S.S.P., 28, 1980), and minimum mean-square error, MMSE (Y. Ephraim, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator, IEEE Trans. A.S.S.P., 32, 1984).
MMSE estimation is the most widely used and has been steadily improved. A typical example is Y. Ephraim (Speech enhancement using a minimum mean-square error log-spectral amplitude estimator, IEEE Trans. A.S.S.P., 33, 1985), who uses the MMSE criterion to estimate the logarithm of the speech spectral amplitude at every frequency; the spectral weighting gain computed this way gives better results.
The patent Core Estimator and Adaptive Gains from Signal To Noise Ratio in a Hybrid Speech Enhancement System (US patent 2002002455) uses soft decisions and balances minimizing the estimation error against the speech distortion introduced by suppression to arrive at its method for computing the weighting gain.
The key question for MMSE methods is which statistical model to use for the distribution of the frequency-domain speech signal. Existing methods commonly use a Gaussian model, which does not model real conditions well. The accuracy and robustness of noise estimation are likewise key factors in the performance of a suppression method.
Summary of the invention
The object of the present invention is to provide a background noise suppression method based on multiple statistical models and minimum mean-square error that models real conditions well and improves the accuracy and robustness of noise estimation.
The invention is specifically realized as follows:
A background noise suppression method based on multiple statistical models and minimum mean-square error comprises the following steps:
Step 1, perform a short-time Fourier transform on the speech signal of the current input frame;
Step 2, using the clean-speech amplitude variance estimate and the noise amplitude variance estimate retained from the previous frame for each frequency bin, together with each frequency component of the speech signal in the current input frame, model the probability densities of the real and imaginary parts of the noise and speech spectral components with a Laplacian distribution and a gamma distribution, respectively; then, under the minimum mean-square error criterion, use these densities to compute the conditional expectations of the real and imaginary parts, which serve as the real- and imaginary-part estimates;
Step 3, compute the a priori speech absence probability for each frequency component of the current input frame, and use it to further correct the real- and imaginary-part estimates computed in step 2;
Step 4, from the corrected real- and imaginary-part estimates of the current input frame, compute the clean-speech amplitude variance estimate and retain it for use with the next frame;
Step 5, compute the likelihood ratio of the current input frame and decide whether the frame is a pure noise frame; if so, update the noise amplitude variance estimate;
Step 6, obtain the noise-suppressed speech using the short-time inverse Fourier transform and overlap-add.
The probability densities of the real and imaginary parts of the noise and speech spectral components are given by:

$$p(N_R)=\frac{1}{\lambda_n}\exp\left(-\frac{2|N_R|}{\lambda_n}\right),\qquad p(N_I)=\frac{1}{\lambda_n}\exp\left(-\frac{2|N_I|}{\lambda_n}\right)$$

$$p(S_R)=\frac{\sqrt[4]{3}}{2\sqrt{\pi\lambda_s}\,\sqrt[4]{2}}\,|S_R|^{-\frac{1}{2}}\exp\left(-\frac{\sqrt{3}\,|S_R|}{\sqrt{2}\,\lambda_s}\right),\qquad p(S_I)=\frac{\sqrt[4]{3}}{2\sqrt{\pi\lambda_s}\,\sqrt[4]{2}}\,|S_I|^{-\frac{1}{2}}\exp\left(-\frac{\sqrt{3}\,|S_I|}{\sqrt{2}\,\lambda_s}\right)$$

In the formulas above, $N_R$, $N_I$, $S_R$ and $S_I$ denote the real and imaginary parts of the noise and speech spectral components, respectively, and $\lambda_n$ and $\lambda_s$ denote the variances of the noise and speech spectral components, respectively.
The conditional expectation of the real part is:

$$
\begin{aligned}
E[S_R(k,l)\mid Y_R(k,l)] &= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p(Y_R(k,l))}\int_{-\infty}^{\infty} S_R(k,l)\,|S_R(k,l)|^{-0.5}\\
&\qquad\cdot\exp\!\left[-\frac{2\,|Y_R(k,l)-S_R(k,l)|}{\lambda_n(k,l)}\right]\exp\!\left[-\frac{\sqrt{1.5}\,|S_R(k,l)|}{\lambda_s(k,l)}\right]dS_R(k,l)\\
&= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p[Y_R(k,l)]}\Bigg\{\frac{2}{3}\exp\!\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right]Y_R^{3/2}(k,l)\,\Phi\big[1.5,\,2.5,\,-2Y_R(k,l)G_1\big]\\
&\qquad+\exp\!\left[-\frac{\sqrt{1.5}\,Y_R(k,l)}{\lambda_s(k,l)}\right](2G_2)^{-1.5}\,\Psi\big[-0.5,\,-0.5,\,2G_2Y_R(k,l)\big]\\
&\qquad-0.856\cdot\exp\!\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right](2G_2)^{-1.5}\Bigg\}
\end{aligned}
$$

In the formula above, $Y_R(k,l)$ denotes the real part of the $k$-th frequency component of the noisy signal $Y$ at time $l$; $\Phi(a,b,z)=M(a,b,z)$ denotes the confluent hypergeometric function of the first kind; $\Psi(a,b,z)$ is likewise a hypergeometric function and can be computed from $\Phi(a,b,z)$;

$$G_1=\frac{\sqrt{1.5}\,\lambda_n+2\lambda_s}{2\lambda_s\lambda_n}\quad\text{and}\quad G_2=\frac{\sqrt{1.5}\,\lambda_n-2\lambda_s}{2\lambda_s\lambda_n};$$
The conditional expectation of the imaginary part is:

$$
\begin{aligned}
E[S_I(k,l)\mid Y_I(k,l)] &= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p(Y_I(k,l))}\int_{-\infty}^{\infty} S_I(k,l)\,|S_I(k,l)|^{-0.5}\\
&\qquad\cdot\exp\!\left[-\frac{2\,|Y_I(k,l)-S_I(k,l)|}{\lambda_n(k,l)}\right]\exp\!\left[-\frac{\sqrt{1.5}\,|S_I(k,l)|}{\lambda_s(k,l)}\right]dS_I(k,l)\\
&= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p[Y_I(k,l)]}\Bigg\{\frac{2}{3}\exp\!\left[-\frac{2Y_I(k,l)}{\lambda_n(k,l)}\right]Y_I^{3/2}(k,l)\,\Phi\big[1.5,\,2.5,\,-2Y_I(k,l)G_1\big]\\
&\qquad+\exp\!\left[-\frac{\sqrt{1.5}\,Y_I(k,l)}{\lambda_s(k,l)}\right](2G_2)^{-1.5}\,\Psi\big[-0.5,\,-0.5,\,2G_2Y_I(k,l)\big]\\
&\qquad-0.856\cdot\exp\!\left[-\frac{2Y_I(k,l)}{\lambda_n(k,l)}\right](2G_2)^{-1.5}\Bigg\}
\end{aligned}
$$

In the formula above, $Y_I(k,l)$ denotes the imaginary part of the $k$-th frequency component of the noisy signal $Y$ at time $l$.
Step 3 further comprises:
Compute the sum of squared magnitudes of the current input frame; after computing the signal-to-noise ratio, apply recursive time-domain smoothing and then compute the global probability. After computing the signal-to-noise ratio at each frequency bin, apply recursive time-domain smoothing and then compute the local probability. The a priori speech absence probability equals 1 minus the product of the global probability and the local probability;
Use the a priori speech absence probability and the speech presence uncertainty hypothesis to correct the real- and imaginary-part estimates.
The corrected real-part estimate is:

$$
\begin{aligned}
E[S_R\mid Y_R] &= \frac{\Gamma(k,l)}{1+\Gamma(k,l)}\,E[S_R\mid Y_R,H_1]\\
&= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}}\Bigg\{2\exp\!\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right]\sqrt{Y_R(k,l)}\;\Phi\big[0.5,\,1.5,\,-2G_1Y_R(k,l)\big]\\
&\qquad+\exp\!\left[-\frac{\sqrt{1.5}}{\lambda_s(k,l)}\,Y_R(k,l)\right]\frac{1}{\sqrt{2G_2}}\,\Psi\big[0.5,\,0.5,\,2G_2Y_R(k,l)\big]\\
&\qquad+\exp\!\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right]\sqrt{\frac{\pi}{2G_2}}\Bigg\}\cdot\frac{\Gamma(k,l)}{1+\Gamma(k,l)}
\end{aligned}
$$

In the formula above, $\Gamma(k,l)=\dfrac{p(Y_R(k,l)\mid H_1)}{p(Y_R(k,l)\mid H_0)}\cdot\dfrac{1-P(H_0)}{P(H_0)}$, where $p(Y_R(k,l)\mid H_1)$ denotes the probability density of $Y_R(k,l)$ when speech is present and $p(Y_R(k,l)\mid H_0)$ denotes the probability density when only noise is present;
The corrected imaginary-part estimate is:

$$E[S_I\mid Y_I]=\frac{\Gamma(k,l)}{1+\Gamma(k,l)}\,E[S_I\mid Y_I,H_1].$$
After the short-time Fourier transform of step 1, the method further comprises:
A detection step for signaling tones composed of one or more single-frequency tones: compute the energy sum of all frequency components, find the two largest squared magnitudes, and subtract the sum of these maxima from the energy sum. If the sum of the maxima exceeds the remaining energy and is close to the energy sum, the current input is judged to be a signaling tone and no suppression is applied.
The invention fits the statistical distributions of the frequency-domain components of speech and noise with separate statistical models, and can therefore approximate the true distributions of speech and noise in practical applications more accurately. Because the suppression process takes into account the uncertainty of speech presence in the noisy signal, stronger suppression can be achieved. A maximum likelihood ratio method is used for voice activity detection (VAD), and the noise power spectrum is estimated accordingly, which makes the estimation more accurate and robust. The a priori speech absence probability is estimated by combining a global and a local measure. A dedicated procedure avoids adverse effects on single-frequency and multi-tone signals, so the detection of DTMF and fax tones is not affected. The method has high noise-suppression efficiency and low computational complexity and is suitable for a wide range of voice communication systems.
Description of drawings
Fig. 1 is a block diagram of the method of the invention;
Fig. 2 is a flowchart of the motion detection step of the method of the invention.
Embodiment
The speech input model of most communication applications is characterized by single-channel speech input and additive background noise; the invention addresses noise suppression under this model. For this problem, the invention proposes an adaptive filtering method based on multiple statistical models. Fig. 1 shows the framework of the whole method. The invention uses the short-time Fourier transform to transform the input signal to the frequency domain, then uses the parameters obtained from the previous frame to compute estimates of the real and imaginary parts of each frequency component of the speech signal in the current input frame. It then computes the speech presence probability and corrects the speech signal estimate. After updating the current parameter estimates, it obtains the suppressed speech using the short-time inverse Fourier transform.
The embodiment is described step by step in the following seven subsections:
1. Time-frequency analysis of the speech signal
Both speech and background noise are highly non-stationary. A single Fourier transform cannot capture the time-varying spectral information of a signal, such as the time-varying formants of speech or the power spectrum of noise, so all speech noise suppression relies on time-frequency analysis. The short-time Fourier transform (STFT) is the most important time-frequency analysis method.
The STFT first weights the current speech data with an analysis window, whose function is nonzero only within its support. In the invention, when the analysis window has support length L, the speech frame length l is 25% of L. The STFT then applies a discrete Fourier transform (DFT) to the windowed speech data. Comparing L and l shows that adjacent analysis windows overlap by 3/4 in the invention; this process is shown in Fig. 2. The STFT is given by formula (1), where N is the analysis window length and w(n) is the analysis window function.

$$Y(k,l)=\sum_{n=0}^{N-1}y(n+lN)\,w(n)\exp\left[-j\left(\frac{2\pi}{N}\right)nk\right]\tag{1}$$
Applying noise reduction to $Y(k,l)$ yields the enhanced spectrum (formula image C200610081156D00102 in the original). Applying the short-time inverse Fourier transform (STIFT) and overlap-add (OLA) to it then yields the processed speech signal (formula image C200610081156D00103 in the original).
Because the spectral weighting coefficients in speech denoising are time-varying, the STIFT must use a synthesis window that is biorthogonal to the analysis window h(n) (J. Wexler, Discrete Gabor expansions, Signal Processing, Nov. 1990).
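The analysis and synthesis chain described in this subsection can be sketched in pure Python as follows. Several choices here are assumptions made for illustration and are not taken from the patent: a periodic Hann analysis window, a hop of one quarter of the window length (matching the stated 3/4 overlap), and weighted overlap-add normalised by the summed squared window in place of an explicitly designed biorthogonal synthesis window. The names `hann`, `stft` and `istft` are hypothetical.

```python
import cmath
import math

def hann(n_win):
    # Periodic Hann window (assumed; the patent does not name its window).
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_win) for n in range(n_win)]

def stft(y, n_win, hop):
    """Windowed DFT of successive frames; the frame with index l starts at l * hop."""
    w = hann(n_win)
    frames = []
    for start in range(0, len(y) - n_win + 1, hop):
        seg = [y[start + n] * w[n] for n in range(n_win)]
        spec = [sum(seg[n] * cmath.exp(-2j * math.pi * n * k / n_win)
                    for n in range(n_win))
                for k in range(n_win)]
        frames.append(spec)
    return frames

def istft(frames, n_win, hop, length):
    """Inverse DFT per frame, then weighted overlap-add with window-energy normalisation."""
    w = hann(n_win)
    out = [0.0] * length
    norm = [0.0] * length
    for l, spec in enumerate(frames):
        for n in range(n_win):
            x = sum(spec[k] * cmath.exp(2j * math.pi * n * k / n_win)
                    for k in range(n_win)).real / n_win
            out[l * hop + n] += x * w[n]
            norm[l * hop + n] += w[n] * w[n]
    # Samples that no window covers with non-zero weight are returned as 0.
    return [o / d if d > 1e-12 else 0.0 for o, d in zip(out, norm)]
```

With an unmodified spectrum this chain reproduces every sample that at least one window covers with non-zero weight; a noise suppressor would modify each frame's bins between `stft` and `istft`.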
2. Statistical model of the speech spectral amplitude coefficients
Spectral amplitude weighting methods based on MMSE estimation currently almost all use a Gaussian distribution to model the probability distribution of each spectral component of noise and speech. The main advantage of this model is mathematical convenience; in practice it does not accurately describe the distributions of speech and noise spectral components.
If the real and imaginary parts of the speech spectral components are assumed to be Gaussian, using MMSE as the estimation criterion yields a linear estimator, and this estimator is a real-valued filter. In other words, the MMSE phase estimate it produces is still equal to the phase of the corresponding spectral component of the noisy speech (Y. Ephraim, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator, IEEE Trans. A.S.S.P., 32, 1984).
Research and experiments both show that the real and imaginary parts of speech spectral components are better described by a gamma distribution. The MMSE estimator obtained with the gamma distribution is a highly nonlinear, complex-valued filter, which gives better noise suppression performance. The invention therefore uses a Laplacian distribution and a gamma distribution to model the probability densities of the real and imaginary parts of the noise and speech spectral components, respectively, as shown in formulas (2) and (3).

$$p(N_R)=\frac{1}{\lambda_n}\exp\left(-\frac{2|N_R|}{\lambda_n}\right),\qquad p(N_I)=\frac{1}{\lambda_n}\exp\left(-\frac{2|N_I|}{\lambda_n}\right)\tag{2}$$

$$p(S_R)=\frac{\sqrt[4]{3}}{2\sqrt{\pi\lambda_s}\,\sqrt[4]{2}}\,|S_R|^{-\frac{1}{2}}\exp\left(-\frac{\sqrt{3}\,|S_R|}{\sqrt{2}\,\lambda_s}\right),\qquad p(S_I)=\frac{\sqrt[4]{3}}{2\sqrt{\pi\lambda_s}\,\sqrt[4]{2}}\,|S_I|^{-\frac{1}{2}}\exp\left(-\frac{\sqrt{3}\,|S_I|}{\sqrt{2}\,\lambda_s}\right)\tag{3}$$

In the formulas above, $N_R$, $N_I$, $S_R$ and $S_I$ denote the real and imaginary parts of the noise and speech spectral components, respectively, and $\lambda_n$ and $\lambda_s$ denote the variances of the noise and speech spectral components, respectively. The real and imaginary parts of the $k$-th spectral component of speech and noise in the analysis window at time $t$ are written $S_R(t,k)$, $S_I(t,k)$ and $N_R(t,k)$, $N_I(t,k)$; they are random variables that follow gamma and Laplacian distributions with variances $\lambda_s$ and $\lambda_n$, respectively.
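As a quick check on the two densities, the following minimal sketch evaluates the Laplacian and gamma forms with the same constants and can be used to confirm numerically that each integrates to 1. The function names `laplacian_pdf` and `gamma_pdf` are hypothetical, and `lam_n` and `lam_s` stand for $\lambda_n$ and $\lambda_s$.

```python
import math

def laplacian_pdf(x, lam_n):
    """Laplacian density for a real or imaginary noise component."""
    return math.exp(-2.0 * abs(x) / lam_n) / lam_n

def gamma_pdf(x, lam_s):
    """Two-sided gamma density for a real or imaginary speech component.

    The density diverges at x = 0, so x must be non-zero.
    """
    c = 3.0 ** 0.25 / (2.0 * math.sqrt(math.pi * lam_s) * 2.0 ** 0.25)
    return c * abs(x) ** -0.5 * math.exp(
        -math.sqrt(3.0) * abs(x) / (math.sqrt(2.0) * lam_s))
```

The gamma density is much more sharply peaked around zero and heavier-tailed than a Gaussian with comparable spread, which is the property the text relies on when it prefers this model for speech.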
Because speech is highly non-stationary, for each $k$ the distribution parameters obtained at different times $t$, $\{\lambda_s(1,k),\lambda_s(2,k),\lambda_s(3,k),\ldots\}$ and $\{\lambda_n(1,k),\lambda_n(2,k),\lambda_n(3,k),\ldots\}$, are better understood as random sequences, which the invention must estimate online from the noisy speech.
3. Minimum mean-square estimation of the spectral amplitude coefficients
The human auditory system is more sensitive to changes in the frequency domain, so estimating the frequency-domain components of clean speech from those of the noisy speech gives better results. Estimating a random signal requires a known distribution model and error measure. Mean-square error (MSE) is the most common estimation criterion; it requires that the expected squared error between the estimated signal and the clean speech signal be minimal. Because the invention does not use a Gaussian model, every estimator must include estimates of both the real and imaginary parts (R. Martin, Speech Enhancement using MMSE Short Time Spectral Estimation with Gamma Distributed Speech Priors, Proceedings of IEEE ICASSP, May 2002).
The $k$-th frequency component of the noisy signal $Y$ at time $l$ is written $Y(k,l)=Y_R(k,l)+jY_I(k,l)$. It contains noise and speech components, i.e. $Y(k,l)=[S_R(k,l)+D_R(k,l)]+j[S_I(k,l)+D_I(k,l)]$. The MMSE estimation problem is: given the observation $Y(k,l)$, find the estimate $\hat{S}(k,l)=\hat{S}_R(k,l)+j\hat{S}_I(k,l)$ that minimizes the error (formula image C200610081156D00112 in the original).
From signal estimation theory, the conditional minimum mean-square error estimate of a signal is its conditional expectation, i.e. $\hat{S}(k,l)=E[S(k,l)\mid Y(0,l),Y(1,l),\ldots]$. Assuming independence among the FFT spectral coefficients and independence between the real and imaginary parts, the MMSE estimator is finally given by formula (4):

$$\hat{S}(k,l)=E[S_R(k,l)\mid Y_R(k,l)]+jE[S_I(k,l)\mid Y_I(k,l)]\tag{4}$$
Using the Laplacian and gamma distributions given in the previous subsection, the real- and imaginary-part probability densities are used to compute the conditional expectations of the real and imaginary parts; the result is formula (5) (R. Martin, Speech Enhancement using MMSE Short Time Spectral Estimation with Gamma Distributed Speech Priors, Proceedings of IEEE ICASSP, May 2002).

$$
\begin{aligned}
E[S_R(k,l)\mid Y_R(k,l)] &= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p(Y_R(k,l))}\int_{-\infty}^{\infty} S_R(k,l)\,|S_R(k,l)|^{-0.5}\\
&\qquad\cdot\exp\!\left[-\frac{2\,|Y_R(k,l)-S_R(k,l)|}{\lambda_n(k,l)}\right]\exp\!\left[-\frac{\sqrt{1.5}\,|S_R(k,l)|}{\lambda_s(k,l)}\right]dS_R(k,l)\\
&= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p[Y_R(k,l)]}\Bigg\{\frac{2}{3}\exp\!\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right]Y_R^{3/2}(k,l)\,\Phi\big[1.5,\,2.5,\,-2Y_R(k,l)G_1\big]\\
&\qquad+\exp\!\left[-\frac{\sqrt{1.5}\,Y_R(k,l)}{\lambda_s(k,l)}\right](2G_2)^{-1.5}\,\Psi\big[-0.5,\,-0.5,\,2G_2Y_R(k,l)\big]\\
&\qquad-0.856\cdot\exp\!\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right](2G_2)^{-1.5}\Bigg\}
\end{aligned}
\tag{5}
$$

In the formula above, $Y_R(k,l)$ denotes the real part of the $k$-th frequency component of the noisy signal $Y$ at time $l$; $\Phi(a,b,z)=M(a,b,z)$ denotes the confluent hypergeometric function of the first kind; $\Psi(a,b,z)$ is likewise a hypergeometric function and can be computed from $\Phi(a,b,z)$;

$$G_1=\frac{\sqrt{1.5}\,\lambda_n+2\lambda_s}{2\lambda_s\lambda_n}\quad\text{and}\quad G_2=\frac{\sqrt{1.5}\,\lambda_n-2\lambda_s}{2\lambda_s\lambda_n};$$
where $p(Y_R(k,l))$ denotes the probability density of the observed real part of the noisy speech frequency component:

$$
\begin{aligned}
p[Y_R(k,l)] &= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}}\int_{-\infty}^{\infty}|S_R(k,l)|^{-0.5}\exp\!\left(-\frac{2\,|Y_R(k,l)-S_R(k,l)|}{\lambda_n(k,l)}\right)\exp\!\left(-\frac{\sqrt{1.5}\,|S_R(k,l)|}{\lambda_s(k,l)}\right)dS_R\\
&= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}}\Bigg\{2\exp\!\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right]\sqrt{Y_R(k,l)}\;\Phi\big[0.5,\,1.5,\,-2G_1Y_R(k,l)\big]\\
&\qquad+\exp\!\left[-\frac{\sqrt{1.5}}{\lambda_s(k,l)}\,Y_R(k,l)\right]\frac{1}{\sqrt{2G_2}}\,\Psi\big[0.5,\,0.5,\,2G_2Y_R(k,l)\big]\\
&\qquad+\exp\!\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right]\sqrt{\frac{\pi}{2G_2}}\Bigg\}
\end{aligned}
\tag{6}
$$

In the formula, $\Phi[a,b,z]=M(a,b,z)$ denotes the confluent hypergeometric function of the first kind, which can be computed by series summation, and

$$\Psi[a,b,z]=\frac{\Gamma(1-b)}{\Gamma(a-b+1)}\,M(a,b,z)+\frac{\Gamma(b-1)}{\Gamma(a)}\,z^{1-b}\,M(a-b+1,\,2-b,\,z)\tag{7}$$

(I. S. Gradshteyn, Table of Integrals, Series, and Products, 1994).
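Formula (7) can be evaluated directly once $M(a,b,z)$ is available. The following is a minimal sketch that sums the power series of $M$ term by term, which is adequate for moderate $|z|$ but not a production-quality routine; the function names `hyp1f1` and `tricomi_psi` and the tolerance parameters are assumptions introduced here.

```python
import math

def hyp1f1(a, b, z, tol=1e-15, max_terms=500):
    """Confluent hypergeometric function of the first kind M(a, b, z) by series summation.

    b must not be zero or a negative integer.
    """
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        # Ratio of consecutive series terms: (a+n)/(b+n) * z/(n+1).
        term *= (a + n) / (b + n) * z / (n + 1)
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def tricomi_psi(a, b, z):
    """Psi(a, b, z) computed from M via formula (7); requires z > 0 and non-integer b."""
    first = math.gamma(1.0 - b) / math.gamma(a - b + 1.0) * hyp1f1(a, b, z)
    second = (math.gamma(b - 1.0) / math.gamma(a)
              * z ** (1.0 - b) * hyp1f1(a - b + 1.0, 2.0 - b, z))
    return first + second
```

A convenient sanity check is the known special case $\Psi(0.5, 0.5, z)=\sqrt{\pi}\,e^{z}\,\mathrm{erfc}(\sqrt{z})$, which the sketch reproduces for small positive $z$. For large $|z|$ the series suffers from cancellation and an asymptotic expansion or a library routine would be preferable.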
The conditional expectation of the imaginary part is obtained in the same way and serves as the imaginary-part estimate:

$$
\begin{aligned}
E[S_I(k,l)\mid Y_I(k,l)] &= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p(Y_I(k,l))}\int_{-\infty}^{\infty} S_I(k,l)\,|S_I(k,l)|^{-0.5}\\
&\qquad\cdot\exp\!\left[-\frac{2\,|Y_I(k,l)-S_I(k,l)|}{\lambda_n(k,l)}\right]\exp\!\left[-\frac{\sqrt{1.5}\,|S_I(k,l)|}{\lambda_s(k,l)}\right]dS_I(k,l)\\
&= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p[Y_I(k,l)]}\Bigg\{\frac{2}{3}\exp\!\left[-\frac{2Y_I(k,l)}{\lambda_n(k,l)}\right]Y_I^{3/2}(k,l)\,\Phi\big[1.5,\,2.5,\,-2Y_I(k,l)G_1\big]\\
&\qquad+\exp\!\left[-\frac{\sqrt{1.5}\,Y_I(k,l)}{\lambda_s(k,l)}\right](2G_2)^{-1.5}\,\Psi\big[-0.5,\,-0.5,\,2G_2Y_I(k,l)\big]\\
&\qquad-0.856\cdot\exp\!\left[-\frac{2Y_I(k,l)}{\lambda_n(k,l)}\right](2G_2)^{-1.5}\Bigg\}
\end{aligned}
\tag{8}
$$

In the formula, $Y_I(k,l)$ denotes the imaginary part of the $k$-th frequency component of the noisy signal $Y$ at time $l$; for the other symbols see the explanation of the preceding formulas;
$p[Y_I(k,l)]$ is computed in the same way as formula (6).
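The integral definition of the conditional expectation (the first line of formulas (5) and (8)) can also be evaluated by brute-force numerical integration, which is useful for checking a closed-form implementation. The sketch below is such a numerical reference, not the closed form of the patent; the function name `mmse_component`, the integration range and the step count are assumptions. The substitution $S=\pm u^2$ removes the integrable singularity of $|S|^{-0.5}$ at zero, and the normalising constants cancel between numerator and denominator.

```python
import math

def mmse_component(y, lam_n, lam_s, steps=20000):
    """Numerically evaluate E[S | Y = y] for Laplacian noise and a gamma speech prior.

    y is the observed real (or imaginary) part; lam_n and lam_s are the
    noise and speech distribution parameters.
    """
    b = math.sqrt(1.5) / lam_s
    s_max = abs(y) + 40.0 * max(lam_n, lam_s)   # assumed truncation of the integral
    du = math.sqrt(s_max) / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        u = (i + 0.5) * du                      # midpoint rule in u, with S = +/- u*u
        s = u * u
        w_pos = math.exp(-2.0 * abs(y - s) / lam_n - b * s)   # contribution at S = +s
        w_neg = math.exp(-2.0 * abs(y + s) / lam_n - b * s)   # contribution at S = -s
        den += 2.0 * (w_pos + w_neg) * du
        num += 2.0 * s * (w_pos - w_neg) * du
    return num / den
```

For a positive observation the result lies between 0 and the observation itself, and the estimator is an odd function of `y`, so it shrinks the noisy component toward zero without changing its sign.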
In real voice communication, the parameters of each distribution are usually not known a priori and must be estimated from the noisy data; the estimation method is described below.
Formulas (5) and (8) use the joint distribution of two independent random variables that follow Laplacian and gamma distributions. In other words, the estimator assumes that the observation contains both speech and noise components, that is, that speech is unconditionally present. Noisy speech signals in real communication environments contain a great deal of silence caused by pauses, both between sentences and even between syllables, so the presence of speech in the noisy signal is uncertain: speech is present only with some probability. The above assumption is therefore incorrect in practice: speech contains many pauses, whereas background noise is always present in the input data.
4. Uncertainty of speech presence
Because formulas (5) and (8) are estimates under the assumption that speech is present in the current input frame, the invention extends them to account for the uncertainty of speech presence. The noisy speech model $Y(k,l)=S(k,l)+D(k,l)$ assumes that speech is always present in the input data. If $H_0$ and $H_1$ denote speech absence and speech presence, respectively, a more accurate noisy-signal model is:

$$Y(w)=\begin{cases}S(w)+D(w), & H_1\\ D(w), & H_0\end{cases}\tag{9}$$
For convenience, the indices are omitted from the expressions in this subsection. Taking the uncertainty of speech presence into account, the MMSE estimate $E[S_R\mid Y_R]$ should be rewritten as formula (10):

$$E[S_R\mid Y_R]=E[S_R\mid Y_R,H_1]\,P(H_1\mid Y_R)+E[S_R\mid Y_R,H_0]\,P(H_0\mid Y_R)\tag{10}$$

where $E[S_R\mid Y_R,H_0]$ denotes the MMSE estimate obtained from $Y_R$ when speech is absent. Obviously no speech estimate can be obtained when speech is absent, so this term should be 0. The real-part estimate of the speech signal under speech presence uncertainty is therefore:

$$E[S_R\mid Y_R]=E[S_R\mid Y_R,H_1]\,P(H_1\mid Y_R)\tag{11}$$
Evaluating formula (11) requires the posterior probability $P(H_1\mid Y_R)$, which can be computed with Bayes' rule:

$$P(H_1\mid Y_R)=\frac{p(Y_R(k,l)\mid H_1)\,P(H_1)}{p(Y_R(k,l)\mid H_1)\,P(H_1)+p(Y_R\mid H_0)\,P(H_0)}=\frac{\Gamma(k,l)}{1+\Gamma(k,l)}\tag{12}$$

In the formula above, $\Gamma(k,l)=\dfrac{p(Y_R(k,l)\mid H_1)}{p(Y_R(k,l)\mid H_0)}\cdot\dfrac{1-P(H_0)}{P(H_0)}$; $p(Y_R(k,l)\mid H_1)$ is computed as in formula (6), and $p(Y_R(k,l)\mid H_0)$ denotes the conditional probability density when only noise is present, computed as in formula (2). Let $P(H_0)=q$ denote the a priori speech absence probability. Clearly $q$ is unknown a priori; how to obtain its estimate (formula image C200610081156D00135 in the original) is described in subsection 6.
This yields the MMSE estimate of the invention:

$$
\begin{aligned}
E[S_R\mid Y_R] &= \frac{\Gamma(k,l)}{1+\Gamma(k,l)}\,E[S_R\mid Y_R,H_1]\\
&= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}}\Bigg\{2\exp\!\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right]\sqrt{Y_R(k,l)}\;\Phi\big[0.5,\,1.5,\,-2G_1Y_R(k,l)\big]\\
&\qquad+\exp\!\left[-\frac{\sqrt{1.5}}{\lambda_s(k,l)}\,Y_R(k,l)\right]\frac{1}{\sqrt{2G_2}}\,\Psi\big[0.5,\,0.5,\,2G_2Y_R(k,l)\big]\\
&\qquad+\exp\!\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right]\sqrt{\frac{\pi}{2G_2}}\Bigg\}\cdot\frac{\Gamma(k,l)}{1+\Gamma(k,l)}
\end{aligned}
\tag{13}
$$

In the formula above, $\Gamma(k,l)=\dfrac{p(Y_R(k,l)\mid H_1)}{p(Y_R(k,l)\mid H_0)}\cdot\dfrac{1-P(H_0)}{P(H_0)}$, where $p(Y_R(k,l)\mid H_1)$ denotes the probability density of $Y_R(k,l)$ when speech is present and $p(Y_R(k,l)\mid H_0)$ denotes the probability density when only noise is present.
The corrected estimate of the imaginary part of the speech signal is obtained in the same way as for the real part:

$$E[S_I\mid Y_I]=\frac{\Gamma(k,l)}{1+\Gamma(k,l)}\,E[S_I\mid Y_I,H_1]\tag{14}$$
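The soft weighting of formulas (11) and (12) reduces to a few lines once the two likelihoods and the a priori speech absence probability $q$ are available. This is a minimal sketch; the function names `speech_presence_posterior` and `soft_estimate` are hypothetical, and the likelihood values are taken as inputs rather than computed from formulas (6) and (2).

```python
def speech_presence_posterior(p_y_h1, p_y_h0, q):
    """P(H1 | Y) = Gamma / (1 + Gamma), with Gamma = p(Y|H1)/p(Y|H0) * (1 - q)/q.

    p_y_h1, p_y_h0: densities of the observation under speech presence / absence.
    q: a priori speech absence probability P(H0), with 0 < q < 1.
    """
    gamma = (p_y_h1 / p_y_h0) * (1.0 - q) / q
    return gamma / (1.0 + gamma)

def soft_estimate(cond_mean_h1, p_y_h1, p_y_h0, q):
    """Scale the speech-present conditional mean by the posterior presence probability."""
    return speech_presence_posterior(p_y_h1, p_y_h0, q) * cond_mean_h1
```

For example, with equal likelihoods and $q = 0.5$ the posterior is 0.5, so the speech-present estimate is halved; as $q$ approaches 1 the output is driven toward zero, which is how pauses are suppressed more strongly.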
5. the method for estimation of spectral component variance
The present invention require the variance of voice and noise spectrum component known, but in the actual environment, these two parameters is not know, and without any priori, can only estimate from noisy speech when estimating pure speech manual coefficient.Consider the non-stationary of voice and noise in the actual environment, method should be followed the tracks of the variation of these parameters.The present invention uses previous frame to suppress to handle the amplitude variance of back each frequency component of voice signal as λ s(k, estimation l)
Figure C200610081156D00147
Estimating $\lambda_n(k,l)$ is more complicated. The invention uses a VAD module to decide whether the current input frame is a pure noise frame and, if so, updates the noise parameters. This hard-decision estimation method assumes that the input speech signal switches between two states, speech-plus-noise and pure noise, and that the noise variance should be estimated only in the pure noise state. In real communication environments speech and noise are often highly non-stationary; their time-varying statistics make this hard-decision method the more robust choice, so methods of this kind are widely used.
Building on the previous subsection, the invention proposes a VAD method based on the likelihood ratio: on the basis of the probability density computed by formula (6), it decides whether the current input frame is a pure noise frame by comparing likelihood functions. Because each frequency-domain component consists of approximately independent real and imaginary parts, its likelihood ratio includes both, as defined in formula (15); the spectral components are mutually uncorrelated.
$$\Lambda(k,l) = \frac{p[Y_R(k,l) \mid H_1]\;p[Y_I(k,l) \mid H_1]}{p[Y_R(k,l) \mid H_0]\;p[Y_I(k,l) \mid H_0]} \qquad (15)$$
In the formula above, $p(Y_R \mid H_1)$ and $p(Y_R \mid H_0)$ are computed as in formulas (6) and (2) respectively; $p(Y_I \mid H_1)$ and $p(Y_I \mid H_0)$ are computed in the same way.
Because VAD must be carried out on the whole frame, the likelihood ratio of the whole frame is:
$$\log[\Lambda(l)] = \frac{1}{K}\sum_{k=0}^{K-1}\log[\Lambda(k,l)] \qquad (16)$$
Accordingly, the VAD decision of the invention is given by formula (17):
$$\begin{cases} H_0, & \text{if } \log(\Lambda) < \theta_\Lambda \\ H_1, & \text{if } \log(\Lambda) > \theta_\Lambda \end{cases} \qquad (17)$$
When $\log(\Lambda) > \theta_\Lambda$, the frame is judged $H_1$, i.e. a speech-plus-noise frame; otherwise it is judged $H_0$, i.e. a pure-noise frame. If the current input frame is judged to be a pure-noise frame, each spectral component of the noise is computed as in formula (18).
$$\hat{\sigma}_n^2(k,l) = \begin{cases} \alpha_\sigma\,\hat{\sigma}_n^2(k,l-1) + (1-\alpha_\sigma)\,\sigma_n^2(k,l), & H_0 \\ \hat{\sigma}_n^2(k,l-1), & H_1 \end{cases} \qquad (18)$$
As formula (13) shows, computing the suppression result also requires an estimate of the amplitude variance of each frequency component of the clean speech. The invention directly uses the filtered speech of the previous frame as the clean-speech estimate and obtains this parameter from it.
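The hard-decision procedure of formulas (15) to (18) can be sketched as follows. This is a minimal sketch, not the patent's implementation: it takes the per-bin log densities as inputs, since formulas (2) and (6) are defined elsewhere in the description, and it follows the text in updating the noise variance only on frames judged to be pure noise. The threshold `theta`, the default smoothing constant `alpha = 0.9` and all function names are assumptions; the source gives no numerical value for $\theta_\Lambda$ or $\alpha_\sigma$.

```python
import numpy as np

def frame_log_likelihood_ratio(log_p_h1_re, log_p_h1_im, log_p_h0_re, log_p_h0_im):
    """Formulas (15)-(16): mean over all bins of log Lambda(k, l)."""
    log_lr = (np.asarray(log_p_h1_re, dtype=float)
              + np.asarray(log_p_h1_im, dtype=float)
              - np.asarray(log_p_h0_re, dtype=float)
              - np.asarray(log_p_h0_im, dtype=float))
    return float(np.mean(log_lr))

def is_noise_only(frame_llr, theta):
    """Formula (17): decide H0 (pure noise) when the frame value is below theta."""
    return frame_llr < theta

def update_noise_variance(prev_var, inst_var, noise_only, alpha=0.9):
    """Formula (18): recursive update on pure-noise frames, hold otherwise.

    alpha = 0.9 is a hypothetical default chosen for this example.
    """
    prev_var = np.asarray(prev_var, dtype=float)
    if not noise_only:
        return prev_var.copy()
    return alpha * prev_var + (1.0 - alpha) * np.asarray(inst_var, dtype=float)
```

Working in the log domain turns the product in formula (15) into a sum, which avoids underflow when the densities are small.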
6. Estimation of the a priori speech absence probability
For the noise suppressor of formulas (13) and (14), the a priori speech absence probability is an important parameter. In practice this parameter is not only unknown a priori but also varies with time and frequency, so it must be estimated online for each frequency. The invention proposes the following estimation method.
First, compute the sum of squared magnitudes of the current input frame as in formula (19):
$$A_{Sum}^2(l) = \sum_{k=0}^{K-1} A^2(k,l) \qquad (19)$$
After computing the signal-to-noise ratio $\eta(l) = \dfrac{A_{Sum}^2}{\sigma_n^2}$, apply recursive time-domain smoothing as in formula (20):
$$\bar{\eta}(l) = \beta_\eta\,\bar{\eta}(l-1) + (1-\beta_\eta)\,\eta(l) \qquad (20)$$
where $\beta_\eta = 0.9$. Then compute the global probability $P_{glob}(l)$ as in formula (21):
$$P_{glob}(l) = \begin{cases} 0, & \eta_{min} \ge \bar{\eta}(l) \\ \dfrac{\log[\bar{\eta}(l)] - \log\eta_{min}}{\log\dfrac{\eta_{max}}{\eta_{min}}}, & \eta_{min} \le \bar{\eta}(l) \le \eta_{max} \\ 1, & \eta_{max} \le \bar{\eta}(l) \end{cases} \qquad (21)$$
where $\eta_{max}$ and $\eta_{min}$ are empirical constants, equal to -3 dB and -11 dB respectively.
For each frequency, after computing the signal-to-noise ratio $\gamma(k,l) = \dfrac{A^2(k,l)}{\sigma_n^2}$, apply recursive time-domain smoothing as in formula (22):
$$\bar{\gamma}(k,l) = \beta_\gamma\,\bar{\gamma}(k,l-1) + (1-\beta_\gamma)\,\gamma(k,l) \qquad (22)$$
where $\beta_\gamma = 0.9$. Then compute the local probability $P_{loc}(k,l)$ as in formula (23):
$$P_{loc}(k,l) = \begin{cases} 0, & \gamma_{min} \ge \bar{\gamma}(k,l) \\ \dfrac{\log[\bar{\gamma}(k,l)] - \log\gamma_{min}}{\log\dfrac{\gamma_{max}}{\gamma_{min}}}, & \gamma_{min} \le \bar{\gamma}(k,l) \le \gamma_{max} \\ 1, & \gamma_{max} \le \bar{\gamma}(k,l) \end{cases} \qquad (23)$$
where $\gamma_{max}$ and $\gamma_{min}$ are empirical constants, equal to -1 dB and -9 dB respectively. The resulting a priori speech absence probability is
$$\hat{q}(k,l) = 1 - P_{loc}(k,l)\,P_{glob}(l) \qquad (24)$$
As formula (24) shows, the invention makes full use of the time-domain correlation between adjacent frames of the speech signal and combines the possibility that speech is absent from the current input frame as a whole with the possibility that the local speech component is absent at each frequency, which makes the estimation more robust.
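The estimator of formulas (20) to (24) reduces to two smoothed signal-to-noise ratios passed through the same log-linear ramp. The sketch below illustrates this under stated assumptions: the dB thresholds are treated as power ratios and converted with $10^{x/10}$, the smoothed ratios are supplied by the caller, and the function names and the small floor that protects the logarithm are hypothetical. The local ramp uses its own minimum threshold in the numerator, which is how formula (23) is read here.

```python
import numpy as np

def db_to_linear(db):
    """Convert a power ratio in dB to a linear ratio (assumed convention)."""
    return 10.0 ** (db / 10.0)

def smooth(prev, current, beta=0.9):
    """Formulas (20) and (22): first-order recursive smoothing over frames."""
    return beta * prev + (1.0 - beta) * current

def log_ramp(snr_smoothed, snr_min, snr_max):
    """Formulas (21) and (23): 0 below snr_min, 1 above snr_max, log-linear between."""
    snr = np.maximum(np.asarray(snr_smoothed, dtype=float), 1e-12)  # floor is an assumption
    ramp = (np.log(snr) - np.log(snr_min)) / np.log(snr_max / snr_min)
    return np.clip(ramp, 0.0, 1.0)

def speech_absence_probability(eta_bar, gamma_bar):
    """Formula (24): q = 1 - P_loc(k, l) * P_glob(l).

    eta_bar: smoothed frame SNR (scalar); gamma_bar: smoothed per-bin SNR (array).
    """
    p_glob = log_ramp(eta_bar, db_to_linear(-11.0), db_to_linear(-3.0))
    p_loc = log_ramp(gamma_bar, db_to_linear(-9.0), db_to_linear(-1.0))
    return 1.0 - p_loc * p_glob
```

Clipping the ramp to the interval from 0 to 1 reproduces the three branches of the piecewise definitions without explicit conditionals, and the base of the logarithm cancels in the ratio.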
7. Requirements of ITU-T G.160
The invention is mainly intended for voice enhancement devices (VED), to improve the quality of voice communication, but networks often also carry signalling tones such as DTMF tones and fax tones. Clearly, a noise suppression algorithm must not degrade these tones during processing, and ITU-T G.160 states this requirement explicitly. Such tones consist of one or two single-frequency tones mixed together and appear in the frequency domain as spikes at one or more frequencies. Because the method already includes a short-time Fourier transform, these spikes are easy to detect and the tones are easy to distinguish. To meet the G.160 requirement, the invention adds a detection stage after the short-time Fourier transform that examines the input: if only one or a few pronounced spikes are found, the input is judged to be a signalling tone and no suppression is applied. In this decision, the invention computes the energy sum of all frequency components, finds the two largest squared magnitudes, and subtracts their sum from the energy sum; if the sum of the maxima is greater than the remaining energy and the values are close to each other, the current input is judged to be a signalling tone.
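The tone check described above can be sketched as a comparison between the two largest spectral peaks and the rest of the frame energy. The source does not say how closeness is measured, so this example makes an explicit assumption: the two peaks must exceed the remaining energy and together hold at least a fraction `peak_ratio` of the total energy. Both the function name and the value 0.9 are hypothetical.

```python
import numpy as np

def looks_like_signalling_tone(spectrum, peak_ratio=0.9):
    """Return True when two spectral peaks dominate the frame energy.

    spectrum: complex STFT bins of one frame.
    peak_ratio: assumed closeness criterion, not taken from the source.
    """
    power = np.abs(np.asarray(spectrum)) ** 2
    total = float(np.sum(power))
    if total <= 0.0:
        return False  # a silent frame is not a tone
    peaks = float(np.sum(np.sort(power)[-2:]))  # the two largest squared magnitudes
    remainder = total - peaks
    return peaks > remainder and peaks >= peak_ratio * total
```

In practice the analysis window spreads each tone over several neighbouring bins, so a real detector would probably sum a few bins around each peak or use a lower ratio; that refinement is left out of this sketch.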

Claims (6)

1. A background noise suppression method based on multiple statistical models and minimum mean-square error, characterized in that it comprises the following steps:
Step 1: performing a short-time Fourier transform on the speech signal of the current input frame;
Step 2: using the clean-speech amplitude variance estimate and the noise amplitude variance estimate retained from the previous frame at each frequency, together with each frequency component of the speech signal in the current input frame, modelling the probability density distributions of the real and imaginary parts of the noise and speech spectral components with a Laplacian distribution and a Gamma distribution respectively, and using these probability density distributions to compute the conditional expectations of the real part and the imaginary part according to the minimum mean-square error criterion, as the estimates of the real part and the imaginary part;
Step 3: computing the a priori speech absence probability of each frequency component of the current input frame, and using it to further correct the estimates of the real part and the imaginary part computed in step 2;
Step 4: computing the clean-speech amplitude variance estimate from the corrected estimates of the real part and the imaginary part of the current input frame, and retaining it for use with the next frame;
Step 5: computing the likelihood ratio of the current input frame and deciding whether the current input frame is a pure-noise frame; if so, updating the noise amplitude variance estimate;
Step 6: obtaining the noise-suppressed speech by inverse short-time Fourier transform and overlap-add.
2. The background noise suppression method based on multiple statistical models and minimum mean-square error as claimed in claim 1, characterized in that:
the probability density distributions of the real and imaginary parts of the noise and speech spectral components are expressed as:
$$p(N_R) = \frac{1}{\lambda_n}\exp\left(-\frac{2|N_R|}{\lambda_n}\right), \qquad p(N_I) = \frac{1}{\lambda_n}\exp\left(-\frac{2|N_I|}{\lambda_n}\right)$$
$$p(S_R) = \frac{\sqrt[4]{3}}{2\sqrt{\pi\lambda_s}\,\sqrt[4]{2}}\,|S_R|^{-\frac{1}{2}}\exp\left(-\frac{\sqrt{3}\,|S_R|}{\sqrt{2}\,\lambda_s}\right), \qquad p(S_I) = \frac{\sqrt[4]{3}}{2\sqrt{\pi\lambda_s}\,\sqrt[4]{2}}\,|S_I|^{-\frac{1}{2}}\exp\left(-\frac{\sqrt{3}\,|S_I|}{\sqrt{2}\,\lambda_s}\right)$$
where $N_R$, $N_I$, $S_R$ and $S_I$ denote the real and imaginary parts of the noise and speech spectral components respectively, and $\lambda_n$ and $\lambda_s$ denote the variances of the noise and speech spectral components respectively.
3. The background noise suppression method based on multiple statistical models and minimum mean-square error as claimed in claim 1, characterized in that:
the conditional expectation of the real part is:
$$
\begin{aligned}
E[S_R(k,l) \mid Y_R(k,l)] &= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p(Y_R(k,l))}\int_{-\infty}^{\infty} S_R(k,l)\,|S_R(k,l)|^{-0.5} \\
&\quad\cdot \exp\left[-\frac{2\,|Y_R(k,l)-S_R(k,l)|}{\lambda_n(k,l)}\right]\exp\left[-\frac{\sqrt{1.5}\,|S_R(k,l)|}{\lambda_s(k,l)}\right]dS_R(k,l) \\
&= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p[Y_R(k,l)]}\Bigg\{\frac{2}{3}\exp\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right]Y_R^{3/2}(k,l)\,\Phi[1.5,\,2.5,\,-2Y_R(k,l)G_1] \\
&\quad + \exp\left[-\frac{\sqrt{1.5}\,Y_R(k,l)}{\lambda_s(k,l)}\right](2G_2)^{-1.5}\,\Psi[-0.5,\,-0.5,\,2G_2Y_R(k,l)] \\
&\quad - 0.856\cdot\exp\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right](2G_2)^{-1.5}\Bigg\}
\end{aligned}
$$
where $Y_R(k,l)$ denotes the real part of the $k$-th frequency component of the noisy signal $Y$ at time $l$; $\Phi(a,b,z) = M(a,b,z)$ denotes the confluent hypergeometric function of the first kind; $\Psi(a,b,z)$ likewise denotes a hypergeometric function and can be computed from $\Phi(a,b,z)$; and
$$G_1 = \frac{\sqrt{1.5}\,\lambda_n + 2\lambda_s}{2\lambda_s\lambda_n} \quad\text{and}\quad G_2 = \frac{\sqrt{1.5}\,\lambda_n - 2\lambda_s}{2\lambda_s\lambda_n};$$
the conditional expectation of the imaginary part is:
$$
\begin{aligned}
E[S_I(k,l) \mid Y_I(k,l)] &= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p(Y_I(k,l))}\int_{-\infty}^{\infty} S_I(k,l)\,|S_I(k,l)|^{-0.5} \\
&\quad\cdot \exp\left[-\frac{2\,|Y_I(k,l)-S_I(k,l)|}{\lambda_n(k,l)}\right]\exp\left[-\frac{\sqrt{1.5}\,|S_I(k,l)|}{\lambda_s(k,l)}\right]dS_I(k,l) \\
&= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}\;p[Y_I(k,l)]}\Bigg\{\frac{2}{3}\exp\left[-\frac{2Y_I(k,l)}{\lambda_n(k,l)}\right]Y_I^{3/2}(k,l)\,\Phi[1.5,\,2.5,\,-2Y_I(k,l)G_1] \\
&\quad + \exp\left[-\frac{\sqrt{1.5}\,Y_I(k,l)}{\lambda_s(k,l)}\right](2G_2)^{-1.5}\,\Psi[-0.5,\,-0.5,\,2G_2Y_I(k,l)] \\
&\quad - 0.856\cdot\exp\left[-\frac{2Y_I(k,l)}{\lambda_n(k,l)}\right](2G_2)^{-1.5}\Bigg\}
\end{aligned}
$$
where $Y_I(k,l)$ denotes the imaginary part of the $k$-th frequency component of the noisy signal $Y$ at time $l$.
4. The background noise suppression method based on multiple statistical models and minimum mean-square error as claimed in claim 1, characterized in that step 3 further comprises:
computing the sum of squared magnitudes of the current input frame; after computing the signal-to-noise ratio, applying recursive time-domain smoothing and then computing the global probability; after computing the signal-to-noise ratio for each frequency, applying recursive time-domain smoothing and then computing the local probability; the a priori speech absence probability being equal to 1 minus the product of the global probability and the local probability;
correcting the estimates of the real part and the imaginary part using the a priori speech absence probability and the speech presence uncertainty hypothesis.
5. The background noise suppression method based on multiple statistical models and minimum mean-square error as claimed in claim 4, characterized in that:
the corrected real part estimate is:
$$
\begin{aligned}
E[S_R \mid Y_R] &= \frac{\Gamma(k,l)}{1+\Gamma(k,l)}\,E[S_R \mid Y_R, H_1] \\
&= \frac{\sqrt[4]{1.5}}{2\lambda_n(k,l)\sqrt{\pi\lambda_s(k,l)}}\Bigg\{2\cdot\exp\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right]\sqrt{Y_R(k,l)}\,\Phi[0.5,\,1.5,\,-2G_1Y_R(k,l)] \\
&\quad + \exp\left[-\frac{\sqrt{1.5}}{\lambda_s(k,l)}\,Y_R(k,l)\right]\frac{1}{\sqrt{2G_2}}\,\Psi[0.5,\,0.5,\,2G_2Y_R(k,l)] \\
&\quad + \exp\left[-\frac{2Y_R(k,l)}{\lambda_n(k,l)}\right]\sqrt{\frac{\pi}{2G_2}}\Bigg\}\cdot\frac{\Gamma(k,l)}{1+\Gamma(k,l)}
\end{aligned}
$$
where $\Gamma(k,l) = \dfrac{p(Y_R(k,l) \mid H_1)}{p(Y_R(k,l) \mid H_0)}\cdot\dfrac{1-P(H_0)}{P(H_0)}$, $p(Y_R(k,l) \mid H_1)$ denotes the probability density of $Y_R(k,l)$ when speech is present, and $p(Y_R(k,l) \mid H_0)$ denotes the probability density when only noise is present;
the corrected imaginary part estimate is:
$$E[S_I \mid Y_I] = \frac{\Gamma(k,l)}{1+\Gamma(k,l)}\,E[S_I \mid Y_I, H_1].$$
6. The background noise suppression method based on multiple statistical models and minimum mean-square error as claimed in claim 1, characterized in that, after the short-time Fourier transform of step 1, the method further comprises:
a detection step for input signalling tones formed by mixing one or more single-frequency tones: computing the energy sum of all frequency components, finding the two largest squared magnitudes, and subtracting their sum from the energy sum; if the sum of the maxima is greater than the remaining energy and the values are close to each other, judging the current input to be a signalling tone and applying no suppression.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100811562A CN100543842C (en) 2006-05-23 2006-05-23 Realize the method that ground unrest suppresses based on multiple statistics model and least mean-square error

Publications (2)

Publication Number Publication Date
CN101079266A CN101079266A (en) 2007-11-28
CN100543842C true CN100543842C (en) 2009-09-23

Family

ID=38906699

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100811562A Expired - Fee Related CN100543842C (en) 2006-05-23 2006-05-23 Realize the method that ground unrest suppresses based on multiple statistics model and least mean-square error

Country Status (1)

Country Link
CN (1) CN100543842C (en)

Also Published As

Publication number Publication date
CN101079266A (en) 2007-11-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090923

Termination date: 20150523

EXPY Termination of patent right or utility model