CN1276896A - Method for suppressing noise in digital speech signal - Google Patents

Publication number
CN1276896A
Authority
CN
China
Prior art keywords
frequency
frame
noise
spectrum component
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 98810358
Other languages
Chinese (zh)
Inventor
Philip Lockwood
Stéphane Lubiarz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
EADS Secure Networks
Original Assignee
Matra Nortel Communications SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matra Nortel Communications SAS
Publication of CN1276896A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 Pitch determination of speech signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Noise Elimination (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Picture Signal Circuits (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention concerns a method for suppressing noise in a digital speech signal processed in successive frames, which comprises: computing spectral components of the signal ($S_{n,f}$, $S_{n,i}$) on each frame; computing overestimates ($\hat{B}'_{n,i}$) of the spectral components of the noise included in the speech signal; performing a harmonic analysis of the signal to estimate a pitch frequency; performing a spectral subtraction comprising at least one step of subtracting, from each spectral component of the speech signal on the frame ($S_{n,f}$), a quantity that depends on parameters including the overestimate of the corresponding spectral component of the noise and the estimated pitch frequency; and applying a transform to the time domain to the result of the subtraction to construct an enhanced speech signal ($s^3$).

Description

Method for denoising a digital speech signal
The present invention relates to digital techniques for denoising speech signals, and more particularly to denoising by nonlinear spectral subtraction.
With the spread of new forms of communication, mobile telephony in particular, communications increasingly take place in very noisy environments. Noise added to the speech tends to disturb communication: it prevents optimal compression of the speech signal and creates an unnatural background noise. Noise also makes the spoken message harder to understand.
Many algorithms have been studied in an attempt to reduce the effects of noise on communication. S.F. Boll ("Suppression of acoustic noise in speech using spectral subtraction", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-27, No. 2, April 1979) proposed an algorithm based on spectral subtraction. This technique estimates the noise spectrum during periods of silence (phases de silence) and subtracts it from the received signal. It reduces the received noise level, but its main drawback is that it creates a musical noise that is particularly annoying because it is unnatural.
This work was taken up and improved by D.B. Paul ("The spectral envelope estimation vocoder", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-29, No. 4, August 1981) and by P. Lockwood and J. Boudy ("Experiments with a nonlinear spectral subtractor (NSS), Hidden Markov Models and the projection, for robust speech recognition in cars", Speech Communication, Vol. 11, June 1992, pp. 215-228, and EP-A-0 534 837), making it possible to reduce the noise level significantly while preserving the natural character of the signal. This contribution was also the first to use the masking principle (principe de masquage) in computing the denoising filter. Following this idea, S. Nandkumar and J.H.L. Hansen ("Speech enhancement based on a new set of auditory constrained parameters", Proc. ICASSP 94, pp. I.1-I.4) attempted to use explicitly computed masking curves in spectral subtraction. Although the results obtained with this technique were disappointing, it deserves credit for stressing the importance of not degrading the speech signal during denoising.
Other methods are based on decomposing the speech signal into singular values, and thus on projecting the speech signal into a space of lower dimension. They have been studied by Bart De Moor ("The singular value decomposition and long and short spaces of noisy matrices", IEEE Trans. on Signal Processing, Vol. 41, No. 9, September 1993, pp. 2826-2838) and by S.H. Jensen et al. ("Reduction of broad-band noise in speech by truncated QSVD", IEEE Trans. on Speech and Audio Processing, Vol. 3, No. 6, November 1995). The principle of this technique is to regard the speech signal and the noise signal as completely uncorrelated, and to regard the speech signal as predictable enough to be predicted from a limited set of parameters. The technique gives acceptable denoising for voiced signals, but completely denatures the speech signal. For relatively coherent noise, such as that produced by tire contact or engine knock, the noise may be easier to predict than the unvoiced speech signal. There is then a tendency to project the speech signal into part of the vector space of the noise. The method does not take the speech signal into account, in particular the unvoiced speech zones, where predictability is low. Moreover, predicting the speech signal from a reduced set of parameters cannot capture all the intrinsic richness of speech. This shows the limitations of techniques that rest solely on mathematical considerations and ignore the particular character of speech.
Finally, other techniques are based on coherence criteria. The coherence function has been developed particularly well by J.A. Cadzow and O.M. Solomon ("Linear modeling and the coherence function", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-35, No. 1, January 1987, pp. 19-28), and its application to denoising has been studied by R. Le Bouquin ("Enhancement of noisy speech signals: application to mobile radio communications", Speech Communication, Vol. 18, pp. 3-19). The method relies on the fact that, when several independent channels are used, the speech signal is necessarily more coherent than the noise. The results obtained are quite encouraging, but unfortunately the technique requires several sound pick-up sources, which is not always possible.
US patent 5,228,088 describes a denoising system that operates in the frequency domain and incorporates a pitch detector. The result of the detection is used on the one hand to adjust noise suppression coefficients (coefficients de suppression) and on the other hand to locate a "voice band". A spectral subtraction module weights the noise estimate with these suppression coefficients before subtracting the noise from the signal. The module that adjusts the suppression coefficients uses only the information of whether or not a pitch was detected; the value of the detected pitch has no influence on the suppression coefficients used. The "voice band" determined from the detected fundamental frequency is the target of an overall boost of the signal. As a variant, the detection can conversely be used to determine a "noise band", that is, a band subject to overall attenuation. Boosting or attenuating part of the spectrum of the signal in this way is very different from denoising by spectral subtraction.
A main object of the present invention is to propose a new denoising technique that takes account of the characteristics of speech production, so that noise can be removed effectively without perceptibly degrading the speech.
The invention thus proposes a method for denoising a digital speech signal processed in successive frames, in which:
- spectral components of the speech signal are computed for each frame;
- overestimates (estimations majorées) of the spectral components of the noise included in the speech signal are computed for each frame;
- a spectral subtraction is performed, comprising at least one step of subtracting, from each spectral component of the speech signal on the frame, a quantity that depends on parameters including the overestimate of the corresponding spectral component of the noise for that frame; and
- a transform to the time domain is applied to the result of the spectral subtraction to construct a denoised speech signal.
A harmonic analysis of the speech signal is performed to estimate a pitch frequency of the signal on each frame in which it exhibits voice activity, and the parameters on which the subtracted quantity depends include the estimated pitch frequency.
It is generally desirable to overestimate (surestimation) the spectral envelope of the noise, so that the resulting overestimate is robust to sudden variations in the noise. However, such overestimation usually has the drawback of distorting the speech signal when it becomes too large. The effect is to alter the voiced character of the speech signal by removing part of its predictability. This drawback is very troublesome in telephony, because the voiced zones are where the speech signal is most energetic. Taking the pitch frequency of the speech signal into account in the denoising makes it possible to protect the harmonics of the signal in those zones.
In general, the quantity subtracted from a given spectral component of the speech signal is smaller when that spectral component corresponds to a protected frequency, that is, a frequency closest to an integer multiple of the estimated pitch frequency, than when it does not. This smaller quantity may even be zero, in which case the spectral subtraction does not affect the signal at the estimated pitch frequency and/or at its harmonics. This removes the nonlinearity introduced by the overestimation of the noise, to which voiced zones are particularly sensitive. Unvoiced zones, whose excitation signal is more random in character, are less sensitive.
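The protection of harmonic frequencies described above can be illustrated by the following minimal Python sketch. It is not the implementation of the specification: the function names, the per-bin noise values, the `protected_scale` factor and the default sampling rate and transform size (8 kHz and 256 points, values the description adopts later as an example) are assumptions made for illustration.

```python
def protected_bins(pitch_hz, sample_rate=8000.0, n_fft=256):
    """Indices of the spectrum bins closest to integer multiples of the pitch.

    Hypothetical helper: only bins below the Nyquist frequency are returned.
    """
    if pitch_hz <= 0:
        return set()
    bin_width = sample_rate / n_fft
    half = n_fft // 2
    protected = set()
    k = 1
    while k * pitch_hz < sample_rate / 2.0:
        index = int(round(k * pitch_hz / bin_width))
        if index < half:
            protected.add(index)
        k += 1
    return protected


def subtract_with_protection(spectrum, noise_over, protected,
                             protected_scale=0.0, floor=0.0):
    """Subtract a smaller quantity (zero by default) on protected bins.

    spectrum and noise_over are per-bin moduli (an assumption; the
    description works with per-band noise overestimates).
    """
    out = []
    for f, (s, b) in enumerate(zip(spectrum, noise_over)):
        amount = b * protected_scale if f in protected else b
        out.append(max(s - amount, floor))
    return out
```

With `protected_scale=0.0` the protected components pass through unchanged, which corresponds to the case in which the subtracted quantity is zero.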
In an advantageous embodiment, after the pitch frequency of the speech signal on a frame has been estimated, the speech signal of the frame is conditioned by oversampling (suréchantillonnage) it at an oversampling frequency that is an integer multiple of the estimated pitch frequency, and the spectral components of the speech signal on the frame are computed from the conditioned signal in order to subtract the said quantities from them. This arrangement favours the frequencies closest to the estimated pitch frequency over the other frequencies, and avoids protecting components that lie relatively far from the harmonics of that pitch frequency. The harmonic character of the speech signal is thus better preserved. To compute the spectral components of the speech signal, the conditioned signal is divided into blocks of N samples that are transformed to the frequency domain, and the ratio of the oversampling frequency to the estimated pitch frequency is chosen to be a factor of N.
The foregoing technique can be refined by estimating the pitch frequency of the speech signal on a frame as follows:
- the time intervals between consecutive breaks in the signal attributable to closures of the speaker's glottis during the frame are estimated, the estimated pitch frequency being inversely proportional to these time intervals;
- the speech signal is interpolated within these time intervals, so that the conditioned signal resulting from the interpolation has a constant time interval between consecutive breaks.
This artificially constructs a signal frame in which the breaks in the speech signal occur at constant intervals, which takes account of possible variations in the pitch frequency over the duration of a frame.
A further improvement is that, after each frame has been processed, only a number of samples equal to an integer multiple of the ratio of the sampling frequency to the estimated pitch frequency is retained from the denoised speech signal samples produced by that processing. This avoids the distortion caused by phase discontinuities between frames, which conventional overlap-add (somme à recouvrement) techniques generally cannot correct completely.
Because the signal is conditioned by the oversampling technique, a good measure of the degree of noisiness of the speech signal on a frame can be obtained by computing the entropy of the autocorrelation of the spectral components computed from the conditioned signal. The more disturbed the spectrum, that is, the noisier it is, the lower the entropy. Conditioning the speech signal accentuates the irregular shape of the spectrum and hence the variations in entropy, which makes the entropy a very sensitive measure. The best performance is usually obtained by computing the autocorrelation from the denoised signal; it may nevertheless be computed from the conditioned signal before denoising.
The spectral components of the denoised signal, obtained by subtracting the said quantities from the spectral components of the speech signal, can be used to compute a masking curve by applying an auditory model. Preferably, the parameters that determine the quantity subtracted from a spectral component of the speech signal on a frame include the difference between the overestimate of the corresponding spectral component of the noise and the computed masking curve. The subtracted quantity can be limited to the fraction of the overestimate of the corresponding noise spectral component that exceeds the masking curve. This approach relies on the observation that it is sufficient to remove the audible noise; conversely, noise that is masked by the speech need not be removed.
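The limitation of the subtracted quantity to the part of the noise overestimate that exceeds the masking curve can be sketched as follows. This is an illustrative reading only: the function name is hypothetical, and the masking curve is taken as a given input rather than computed from an auditory model.

```python
def audible_noise_amount(noise_over, masking_curve):
    """Per-component quantity to subtract: the excess of the noise
    overestimate over the masking curve, or zero where the noise is masked.
    """
    return [max(b - m, 0.0) for b, m in zip(noise_over, masking_curve)]
```

Where the masking curve lies above the noise overestimate, nothing is subtracted, so the speech component is left untouched.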
In an advantageous embodiment, each overestimate of a spectral component of the noise in the speech signal is obtained by combining a long-term estimate of that noise spectral component with a measure of its variability around the long-term estimate. This gives a noise estimator that is particularly robust to variations in the noise, since it combines two distinct estimators: one reflects the long-term fluctuations of the noise and the other its short-term variability.
Other features and advantages of the present invention will become apparent from the following description of non-limiting embodiments, given with reference to the accompanying drawings, in which:
Fig. 1 is a block diagram of a denoising system implementing the present invention.
Figs. 2 and 3 are flow charts of the procedures used by a voice activity detector of the system of Fig. 1.
Fig. 4 is a diagram of the states of a voice activity detection automaton.
Fig. 5 is a graph illustrating the variation of a degree of voice activity.
Fig. 6 is a block diagram of a noise overestimation module of the system of Fig. 1.
Fig. 7 is a graph illustrating the computation of a masking curve.
Fig. 8 is a graph illustrating the use of masking curves in the system of Fig. 1.
Fig. 9 is a block diagram of another denoising system implementing the present invention.
Fig. 10 is a graph illustrating a harmonic analysis method usable in a method according to the invention.
Fig. 11 partially shows a variant of the block diagram of Fig. 9.
The denoising system shown in Fig. 1 processes a digital speech signal S. A windowing module formats this signal S into successive windows or frames, each consisting of N digital signal samples. Conventionally, these frames may overlap one another. In the remainder of this description, without this being limiting, the frames are taken to consist of N = 256 samples at a sampling frequency $F_e$ of 8 kHz, with Hamming weighting in each window and 50% overlap between consecutive windows.
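A minimal Python sketch of such a windowing step is given below for illustration. The function names are hypothetical, and the Hamming window uses the common 0.54/0.46 coefficients, which the description does not specify.

```python
import math


def hamming(n_samples):
    """Hamming window w[k] = 0.54 - 0.46 * cos(2*pi*k / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n_samples - 1))
            for k in range(n_samples)]


def split_into_frames(signal, frame_len=256, overlap=0.5):
    """Cut `signal` into overlapping frames and weight each by a Hamming window.

    Trailing samples that do not fill a whole frame are dropped
    (a simplification made for this sketch).
    """
    hop = int(frame_len * (1.0 - overlap))
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([x * w for x, w in zip(chunk, window)])
    return frames
```

With the default values, consecutive frames start 128 samples apart, that is, every 16 ms at 8 kHz.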
Module 11 uses a conventional fast Fourier transform (FFT) algorithm to compute the modulus of the signal spectrum, transforming each signal frame to the frequency domain. Module 11 thus delivers a set of N = 256 frequency components of the speech signal, denoted $S_{n,f}$, where n is the index of the current frame and f a frequency of the discrete spectrum. Owing to the properties of digital signals in the frequency domain, only the first N/2 = 128 samples are used.
To compute the estimates of the noise contained in signal S, the frequency resolution available at the output of the fast Fourier transform is not used; instead a lower resolution is used, determined by a number I of frequency bands covering the band $[0, F_e/2]$ of the signal. Each band i (1 ≤ i ≤ I) extends between a lower frequency f(i-1) and an upper frequency f(i), with f(0) = 0 and f(I) = $F_e/2$. The division into bands may be uniform ($f(i) - f(i-1) = F_e/2I$) or non-uniform (for example according to a Bark scale). Module 12 computes the respective averages $S_{n,i}$ of the spectral components $S_{n,f}$ of the speech signal in each band, for example with a uniform weighting:
$$S_{n,i} = \frac{1}{f(i) - f(i-1)} \sum_{f \in [f(i-1),\, f(i)[} S_{n,f} \qquad (1)$$
This averaging reduces the fluctuations between bands by averaging the contributions of the noise within each band, which reduces the variance of the noise estimator. It also greatly reduces the complexity of the system.
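The band averaging of formula (1) with uniform weighting might be sketched as follows. The function name is hypothetical, and the band limits are expressed as bin indices rather than frequencies, which is an assumption made to keep the example short.

```python
def band_averages(spectrum, band_edges):
    """Average the spectrum moduli S_{n,f} over each band [f(i-1), f(i)[.

    spectrum   : moduli for bins 0 .. N/2 - 1
    band_edges : increasing bin indices f(0) < f(1) < ... < f(I)
    """
    averages = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        averages.append(sum(spectrum[lo:hi]) / (hi - lo))
    return averages
```

For a uniform division of 128 bins into 16 bands, `band_edges` would be `list(range(0, 129, 8))`.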
The averaged spectral components $S_{n,i}$ are sent to a voice activity detection module 15 and to a noise estimation module 16. The two modules 15 and 16 operate jointly, in the sense that the degrees of voice activity $\gamma_{n,i}$ measured by module 15 for the various bands are used by module 16 to estimate the long-term energy of the noise in each band, while these long-term estimates $\hat{B}_{n,i}$ are used by module 15 to carry out a preliminary denoising of the speech signal in each band in order to determine the degrees of voice activity $\gamma_{n,i}$.
The operation of modules 15 and 16 may correspond to the flow charts shown in Figs. 2 and 3.
In steps 17 to 20, module 15 carries out a preliminary denoising of the speech signal in each band i for signal frame n. This preliminary denoising follows a conventional nonlinear spectral subtraction process based on noise estimates obtained on one or more preceding frames. In step 17, module 15 computes, with the resolution of the bands i, the frequency response $Hp_{n,i}$ of the preliminary denoising filter according to:
$$Hp_{n,i} = \frac{S_{n,i} - \alpha'_{n-\tau_1,i} \cdot \hat{B}_{n-\tau_1,i}}{S_{n-\tau_2,i}} \qquad (2)$$
where $\tau_1$ and $\tau_2$ are delays expressed as numbers of frames ($\tau_1 \geq 1$, $\tau_2 \geq 0$), and $\alpha'_{n,i}$ is a noise overestimation coefficient whose determination is explained below. The delay $\tau_1$ may be fixed (for example $\tau_1 = 1$) or variable; it is smaller the greater the confidence in the voice activity detection.
In steps 18 to 20, the spectral components $\hat{E}p_{n,i}$ are computed according to:
$$\hat{E}p_{n,i} = \max\left\{ Hp_{n,i} \cdot S_{n,i},\ \beta p_i \cdot \hat{B}_{n-\tau_1,i} \right\} \qquad (3)$$
where $\beta p_i$ is a floor coefficient (coefficient de plancher) close to 0, conventionally used to prevent the spectrum of the denoised signal from taking negative or excessively small values, which would give rise to musical noise (bruit musical).
Steps 17 to 20 thus essentially consist in subtracting from the signal spectrum an estimate of the noise spectrum made a priori, increased by the coefficient $\alpha'_{n-\tau_1,i}$.
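Formulas (2) and (3) can be illustrated for a single band by the short Python sketch below. The function and parameter names are hypothetical, the default floor coefficient of 0.01 is an arbitrary illustrative value, and the delayed spectral component is assumed to be strictly positive.

```python
def preliminary_denoise(s_now, s_delayed, noise_est, alpha, beta_p=0.01):
    """Preliminary denoising of one band, following formulas (2) and (3).

    s_now     : S_{n,i}
    s_delayed : S_{n-tau2,i} (assumed > 0 in this sketch)
    noise_est : long-term noise estimate for frame n - tau1
    alpha     : overestimation coefficient for frame n - tau1
    beta_p    : floor coefficient close to 0 (illustrative value)
    """
    hp = (s_now - alpha * noise_est) / s_delayed   # formula (2)
    ep = max(hp * s_now, beta_p * noise_est)       # formula (3)
    return hp, ep
```

The floor in formula (3) takes over whenever the overestimated noise exceeds the signal component, so the denoised component never becomes negative.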
In step 21, module 15 computes, for each frame n, the energy $E_{n,i}$ of the preliminarily denoised signal in each band i. It also computes an overall average $E_{n,0}$ of the energy of the preliminarily denoised signal as the sum of the band energies $E_{n,i}$ weighted by the widths of the bands. In the notation below, the index i = 0 denotes the overall band of the signal.
In steps 22 and 23, module 15 computes, for each band i (0 ≤ i ≤ I), a value $\Delta E_{n,i}$ representing the short-term variation of the energy of the denoised signal in band i, and a long-term value $\bar{E}_{n,i}$ of the energy of the denoised signal in band i. The value $\Delta E_{n,i}$ can be computed with a simplified formula:
$$\Delta E_{n,i} = \left| \frac{E_{n-4,i} + E_{n-3,i} - E_{n-1,i} - E_{n,i}}{10} \right|$$
The long-term energy $\bar{E}_{n,i}$ can be computed using a forgetting factor (facteur d'oubli) $B_1$ such that $0 < B_1 < 1$, namely $\bar{E}_{n,i} = B_1 \cdot \bar{E}_{n-1,i} + (1 - B_1) \cdot E_{n,i}$.
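The two quantities computed in steps 22 and 23 might be sketched as follows for one band. The function names and the default forgetting factor of 0.9 are assumptions made for illustration.

```python
def short_term_variation(energies):
    """Short-term energy variation from the last five frame energies.

    energies : list whose last five entries are E_{n-4}, ..., E_n
    """
    e_n4, e_n3, _, e_n1, e_n = energies[-5:]
    return abs((e_n4 + e_n3 - e_n1 - e_n) / 10.0)


def smooth_energy(prev_long_term, energy, b1=0.9):
    """Long-term energy with forgetting factor b1, 0 < b1 < 1
    (the default value is illustrative only)."""
    return b1 * prev_long_term + (1.0 - b1) * energy
```

A forgetting factor close to 1 makes the long-term energy follow the frame energy slowly; a smaller value makes it track more quickly.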
After the energies $E_{n,i}$ of the denoised signal, their short-term variations $\Delta E_{n,i}$ and their long-term values $\bar{E}_{n,i}$ have been computed as indicated in Fig. 2, module 15 computes, for each band i (0 ≤ i ≤ I), a value $\rho_i$ representative of the evolution of the energy of the denoised signal. This computation is carried out in steps 25 to 36 of Fig. 3, which are executed for each band i from i = 0 to i = I. It uses a long-term estimator $ba_i$ of the noise envelope, an internal estimator $bi_i$ and a counter $b_i$ of noisy frames.
In step 25, the value $\Delta E_{n,i}$ is compared with a threshold $\varepsilon_1$. If the threshold $\varepsilon_1$ is not reached, the counter $b_i$ is incremented by one in step 26. In step 27, the long-term estimator $ba_i$ is compared with the smoothed energy $\bar{E}_{n,i}$. If $ba_i \geq \bar{E}_{n,i}$, the estimator $ba_i$ is set equal to the smoothed value $\bar{E}_{n,i}$ in step 28 and the counter $b_i$ is reset to zero. The value $\rho_i$, which is taken equal to the ratio $ba_i / \bar{E}_{n,i}$ (step 36), is then equal to 1.
If step 27 shows that $ba_i < \bar{E}_{n,i}$, the counter $b_i$ is compared with a limit value bmax in step 29. If $b_i$ > bmax, the signal is considered too stationary to correspond to voice activity, and the aforementioned step 28 is executed, which amounts to regarding the frame as containing only noise. If $b_i$ ≤ bmax in step 29, the internal estimator $bi_i$ is computed in step 33 according to:
$$bi_i = (1 - Bm) \cdot \bar{E}_{n,i} + Bm \cdot ba_i \qquad (4)$$
In this formula, Bm is an update coefficient between 0.90 and 1, whose value differs according to the state of the voice activity detection automaton (steps 30 to 32). This state $\delta_{n-1}$ is the one determined when the previous frame was processed. If the automaton is in the speech detection state ($\delta_{n-1} = 2$ in step 30), the coefficient Bm takes a value Bmp very close to 1, so that the noise estimator is updated only very slightly in the presence of speech. Otherwise, Bm takes a lower value Bms, so that the noise estimator is updated more significantly during silence. In step 34, the difference $ba_i - bi_i$ between the long-term estimator and the internal estimator of the noise is compared with a threshold $\varepsilon_2$. If the threshold $\varepsilon_2$ is not reached, the long-term estimator $ba_i$ is updated with the value of the internal estimator $bi_i$ in step 35; otherwise, the long-term estimator $ba_i$ is left unchanged. This prevents sudden variations due to the speech signal from causing the noise estimator to be updated.
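Steps 25 to 36 can be summarised for one band by the following Python sketch. It is an interpretation, not the specified implementation: the class and function names, the default values of Bmp and Bms, and the use of the absolute value of the difference in step 34 are assumptions, and all thresholds must be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class BandTracker:
    ba: float        # long-term estimator of the noise envelope
    bi: float = 0.0  # internal estimator
    b: int = 0       # counter of noisy frames


def track_band(t, delta_e, e_long, speech_state, eps1, eps2, b_max,
               bm_speech=0.99, bm_silence=0.95):
    """One pass through steps 25 to 36 for a single band; returns rho.

    delta_e      : short-term energy variation
    e_long       : smoothed (long-term) energy, assumed > 0
    speech_state : True if the automaton was in the speech state
    """
    if delta_e < eps1:                      # steps 25-26
        t.b += 1
    if t.ba >= e_long or t.b > b_max:       # steps 27 and 29
        t.ba = e_long                       # step 28
        t.b = 0
    else:
        bm = bm_speech if speech_state else bm_silence   # steps 30-32
        t.bi = (1.0 - bm) * e_long + bm * t.ba           # step 33, formula (4)
        if abs(t.ba - t.bi) < eps2:         # step 34 (absolute value assumed)
            t.ba = t.bi                     # step 35
    return t.ba / e_long                    # step 36
```

Because the update coefficient is close to 1, the long-term estimator rises slowly towards the smoothed energy, whereas it drops immediately to the smoothed energy whenever that energy falls below it.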
After the values $\rho_i$ have been obtained, module 15 makes its voice activity decisions in step 37. Module 15 first updates the state of the detection automaton according to the value $\rho_0$ computed for the signal band as a whole. The new state $\delta_n$ of the automaton depends on the previous state $\delta_{n-1}$ and on $\rho_0$, as shown in Fig. 4.
The automaton has four possible states: δ = 0 detects silence, or the absence of speech; δ = 2 detects the presence of voice activity; and the states δ = 1 and δ = 3 are intermediate rising and falling states respectively. When the automaton is in the silence state ($\delta_{n-1} = 0$), it stays there if $\rho_0$ does not exceed a first threshold SE1, and otherwise moves to the rising state. In the rising state ($\delta_{n-1} = 1$), it returns to the silence state if $\rho_0$ is less than the threshold SE1; it moves to the speech state if $\rho_0$ is greater than a second threshold SE2, which is greater than SE1; and it stays in the rising state if SE1 ≤ $\rho_0$ ≤ SE2. When the automaton is in the speech state ($\delta_{n-1} = 2$), it stays there if $\rho_0$ exceeds a third threshold SE3, which is less than SE2, and otherwise moves to the falling state. In the falling state ($\delta_{n-1} = 3$), it returns to the speech state if $\rho_0$ is greater than the threshold SE2; it returns to the silence state if $\rho_0$ falls below a fourth threshold SE4, which is less than SE2; and it stays in the falling state if SE4 ≤ $\rho_0$ ≤ SE2.
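The four-state automaton can be written as a small transition function, sketched below in Python. The constant and function names are hypothetical, the thresholds are left to the caller, and the speech-state rule follows the reading that the automaton stays in that state while the overall value exceeds the third threshold.

```python
SILENCE, RISING, SPEECH, FALLING = 0, 1, 2, 3


def next_state(prev, rho0, se1, se2, se3, se4):
    """New automaton state from the previous state and the overall value rho0.

    Assumed ordering of thresholds: se1 < se2, se3 < se2, se4 < se2.
    """
    if prev == SILENCE:
        return SILENCE if rho0 <= se1 else RISING
    if prev == RISING:
        if rho0 < se1:
            return SILENCE
        if rho0 > se2:
            return SPEECH
        return RISING
    if prev == SPEECH:
        return SPEECH if rho0 > se3 else FALLING
    # prev == FALLING
    if rho0 > se2:
        return SPEECH
    if rho0 < se4:
        return SILENCE
    return FALLING
```

The intermediate rising and falling states introduce hysteresis, so that a single frame crossing a threshold does not by itself switch the detector between silence and speech.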
In step 37, module 15 also computes the degrees (degrés) of voice activity $\gamma_{n,i}$ in each band i ≥ 1. This degree $\gamma_{n,i}$ is preferably a non-binary parameter; that is, the function $\gamma_{n,i} = g(\rho_i)$ varies continuously between 0 and 1 with the value of $\rho_i$. Fig. 5 shows an example of the form of this function.
Module 16 computes the noise estimates for each band, which will be used in the denoising process, from the successive values of the components $S_{n,i}$ and of the degrees of voice activity $\gamma_{n,i}$. This corresponds to steps 40 to 42 of Fig. 3. Step 40 determines whether the voice activity detection automaton has just moved from the rising state to the speech state. If so, the last two estimates $\hat{B}_{n-1,i}$ and $\hat{B}_{n-2,i}$ previously computed for each band i ≥ 1 are corrected according to the value of the earlier estimate $\hat{B}_{n-3,i}$. This correction takes account of the fact that, during the rising phase (δ = 1), the long-term estimates of the noise energy in the voice activity detection process (steps 30 to 33) may have been computed as if the signal contained only noise (Bm = Bms), so that they may be in error.
In step 42, module 16 is upgraded the Noise Estimation of each frequency band according to following formula: B ~ n , i = &lambda; B &CenterDot; B ^ n - 1 , i + ( 1 - &lambda; B ) &CenterDot; S n , i - - - - ( 5 ) B ^ n , i = &gamma; n , i &CenterDot; B ^ n - 1 , i + ( 1 - &gamma; n , i ) &CenterDot; B ~ n , i - - - - - ( 6 ) λ herein BThe factor is ignored in expression, and 0<λ is arranged B<1.Find out the sound level γ that considers the nonbinary voice activity significantly from (6) formula N, i
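The update of formulas (5) and (6) is short enough to sketch directly in Python. The function name and the default forgetting factor of 0.95 are assumptions made for illustration.

```python
def update_noise_estimate(b_prev, s, gamma, lambda_b=0.95):
    """Long-term noise estimate for one band, following formulas (5) and (6).

    b_prev   : previous long-term estimate
    s        : averaged spectral component of the current frame
    gamma    : degree of voice activity in [0, 1]
    lambda_b : forgetting factor, 0 < lambda_b < 1 (illustrative default)
    """
    b_tilde = lambda_b * b_prev + (1.0 - lambda_b) * s   # formula (5)
    return gamma * b_prev + (1.0 - gamma) * b_tilde      # formula (6)
```

With a degree of voice activity of 1 the previous estimate is kept unchanged, and with a degree of 0 the estimate is fully replaced by the smoothed value of formula (5); intermediate degrees blend the two.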
As indicated above, the long-term noise estimates $\hat{B}_{n,i}$ are overestimated (surestimation) by a module 45 (see Fig. 1) before the denoising by nonlinear spectral subtraction is carried out. Module 45 computes the overestimation coefficient $\alpha'_{n,i}$ mentioned above, together with an overestimate $\hat{B}'_{n,i}$ that essentially corresponds to $\alpha'_{n,i} \cdot \hat{B}_{n,i}$.
The organization of overestimation module 45 is shown in Fig. 6. The overestimate $\hat{B}'_{n,i}$ is obtained by combining the long-term estimate $\hat{B}_{n,i}$ with a measurement $\Delta B^{\max}_{n,i}$ of the variability of the noise component in band $i$ around its long-term estimate. In the example considered, this combination is essentially a simple sum performed by an adder 46; it could equally be a weighted sum.
The overestimation coefficient $\alpha'_{n,i}$ equals the ratio between the sum $\hat{B}_{n,i} + \Delta B^{\max}_{n,i}$ delivered by adder 46 and the delayed long-term estimate $\hat{B}_{n-\tau_3,i}$ (divider 47), capped at a maximum value $\alpha_{\max}$, for example $\alpha_{\max} = 4$ (block 48). The delay $\tau_3$ (for example $\tau_3 = 3$) serves to correct the value of the overestimation coefficient $\alpha'_{n,i}$, where necessary, during the rising phases ($\delta = 1$), before the long-term estimates have been corrected by steps 40 and 41 of Fig. 3.
The overestimate $\hat{B}'_{n,i}$ is finally taken as $\alpha'_{n,i} \cdot \hat{B}_{n-\tau_3,i}$ (multiplier 49).
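The chain formed by adder 46, divider 47, limiter 48 and multiplier 49 can be sketched as follows for one band. The function and argument names are hypothetical, and the sketch assumes that the delayed estimate is strictly positive.

```python
def overestimate_noise(b_hat, delta_b_max, b_hat_delayed, alpha_max=4.0):
    """Return (alpha_prime, b_prime) for one band.

    b_hat         -- current long-term noise estimate
    delta_b_max   -- measured variability of the noise around that estimate
    b_hat_delayed -- long-term estimate delayed by tau_3 frames (assumed > 0)
    alpha_max     -- ceiling on the coefficient (4 is the example given in the text)
    """
    total = b_hat + delta_b_max                           # adder 46
    alpha_prime = min(total / b_hat_delayed, alpha_max)   # divider 47, limiter 48
    b_prime = alpha_prime * b_hat_delayed                 # multiplier 49
    return alpha_prime, b_prime
```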
The measurement $\Delta B^{\max}_{n,i}$ of noise variability reflects the variance of the noise estimator. It is obtained from the values of $S_{n,i}$ and $\hat{B}_{n,i}$ calculated for a certain number of preceding frames in which the speech signal shows no voice activity in band $i$. It is a function of the deviations $|S_{n-k,i} - \hat{B}_{n-k,i}|$ calculated over $K$ silence frames ($n - k \le n$). In the example described, this function is simply the maximum (block 50). For each frame $n$, the degree of voice activity $\gamma_{n,i}$ is compared with a threshold (block 51) to decide whether the deviation $|S_{n,i} - \hat{B}_{n,i}|$ calculated at 52-53 should be loaded into a queue 54 of $K$ first-in, first-out (FIFO) locations. If $\gamma_{n,i}$ does not exceed the threshold (which may equal 0 if the function $g()$ has the form shown in Fig. 5), FIFO 54 is not loaded; otherwise it is. The maximum value held in FIFO 54 is then supplied as the measured variability $\Delta B^{\max}_{n,i}$.
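The queue of deviations and its maximum can be sketched with a bounded deque. This is a hedged illustration: the class name, the queue length and the value returned while the queue is still empty are assumptions, and the decision to load the queue (the threshold comparison of block 51) is left to the caller as a boolean.

```python
from collections import deque


class VariabilityTracker:
    """Keeps the last K deviations |S - B_hat| for one band and reports their maximum."""

    def __init__(self, k=8):
        # k = 8 is an illustrative queue length, not a value from the text
        self.fifo = deque(maxlen=k)

    def update(self, s, b_hat, load):
        """Optionally load the current deviation, then return the measured variability."""
        if load:
            self.fifo.append(abs(s - b_hat))   # deviation computed at 52-53
        # Block 50: maximum over the queue (0.0 while the queue is empty, by assumption)
        return max(self.fifo) if self.fifo else 0.0
```

Because the deque is bounded, loading a new deviation automatically discards the oldest one, which matches first-in, first-out behavior.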
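As a variant, the measurement $\Delta B^{\max}_{n,i}$ of variability may be obtained from the values $S_{n,f}$ (rather than $S_{n,i}$) and $\hat{B}_{n,i}$. The procedure is then the same, except that, for each band $i$, FIFO 54 contains not the deviation $|S_{n-k,i} - \hat{B}_{n-k,i}|$ but a deviation computed from the components $S_{n-k,f}$ of the frequencies $f$ belonging to band $i$ and the estimate $\hat{B}_{n-k,i}$.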
Because the long-term fluctuations of the noise $\hat{B}_{n,i}$ and its short-term variability $\Delta B^{\max}_{n,i}$ are estimated independently, the overestimate $\hat{B}'_{n,i}$ makes the denoising method highly robust to musical noise.
The first phase of spectral subtraction is performed by module 55 shown in Fig. 1. This phase provides, at the resolution of the bands $i$ ($1 \le i \le I$), the frequency response $H^1_{n,i}$ of a first denoising filter as a function of the components $S_{n,i}$ and $\hat{B}_{n,i}$ and of the overestimation coefficients $\alpha'_{n,i}$. It may be calculated for each band $i$ using the formula:

$$H^1_{n,i} = \frac{\max\{S_{n,i} - \alpha'_{n,i} \cdot \hat{B}_{n,i},\ \beta^1_i \cdot \hat{B}_{n,i}\}}{S_{n-\tau_4,i}} \tag{7}$$

where $\tau_4$ is an integer delay with $\tau_4 \ge 0$ (for example $\tau_4 = 0$). In formula (7), the coefficient $\beta^1_i$, like the coefficient $\beta p_i$ in formula (3), represents a floor conventionally used to prevent the denoised signal from taking negative or excessively small values.
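Formula (7) can be sketched for one band as follows. The function name is hypothetical; when no delayed component is supplied, the sketch assumes $\tau_4 = 0$, and it assumes a strictly positive denominator.

```python
def first_filter_response(s, b_hat, alpha_prime, beta1, s_delayed=None):
    """Frequency response H1 of formula (7) for one band.

    s           -- spectral component S_{n,i}
    b_hat       -- long-term noise estimate for the band
    alpha_prime -- overestimation coefficient
    beta1       -- floor coefficient
    s_delayed   -- component delayed by tau_4 frames (defaults to s, i.e. tau_4 = 0)
    """
    if s_delayed is None:
        s_delayed = s
    # The floor beta1 * b_hat keeps the numerator from becoming negative or too small
    numerator = max(s - alpha_prime * b_hat, beta1 * b_hat)
    return numerator / s_delayed
```

For a strong component the response is close to 1, while for a component at the noise level the floor term takes over.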
In a known manner (EP-A-0534837), the overestimation coefficient $\alpha'_{n,i}$ in formula (7) may be replaced by another coefficient equal to a function of $\alpha'_{n,i}$ and of an estimate of the signal-to-noise ratio (for example $S_{n,i}/\hat{B}_{n,i}$), this function decreasing as the estimated signal-to-noise ratio increases. For the lowest values of the signal-to-noise ratio, the function equals $\alpha'_{n,i}$: when the signal is very noisy, there is clearly no benefit in reducing the overestimation factor. Advantageously, the function falls to zero for the highest values of the signal-to-noise ratio. This protects the strongest regions of the spectrum, where the speech signal is most meaningful, since the quantity subtracted from the signal then tends to zero.
This strategy can be refined by applying it selectively to the harmonics of the pitch frequency of the speech signal when the signal shows voice activity.
Thus, in the embodiment shown in Fig. 1, a second denoising phase is performed by a harmonic protection module 56. This module calculates, at the resolution of the Fourier transform, the frequency response $H^2_{n,f}$ of a second denoising filter from the parameters $H^1_{n,i}$, $\alpha'_{n,i}$, $\hat{B}_{n,i}$, $\delta_n$, $S_{n,i}$ and the pitch frequency $f_p = F_e/T_p$ calculated outside silence phases by a harmonic analysis module 57. In a silence phase ($\delta_n = 0$), module 56 is not in service; that is, $H^2_{n,f} = H^1_{n,i}$ for every frequency $f$ of a band $i$. Module 57 may use any known method of analyzing the speech signal of the frame, for example a linear prediction method, to determine the period $T_p$, expressed as an integer or fractional number of samples.
The protection provided by module 56 may be implemented, for each frequency $f$ belonging to a band $i$, according to conditions (8) and (9). $\Delta f = F_e/N$ is the spectral resolution of the Fourier transform. When $H^2_{n,f} = 1$, the quantity subtracted from the component $S_{n,f}$ is zero. In this calculation, the floor coefficients $\beta^2_i$ (for example $\beta^2_i = \beta^1_i$) express the fact that certain harmonics of the pitch frequency $f_p$ may be masked by noise, so that protecting them serves no purpose.
This protection strategy is preferably applied to each of the frequencies closest to the harmonics of $f_p$, that is, for any integer $\eta$.
If $\delta f_p$ denotes the frequency resolution with which analysis module 57 produces the estimated pitch frequency $f_p$, that is, if the actual pitch frequency lies between $f_p - \delta f_p/2$ and $f_p + \delta f_p/2$, then the gap between the $\eta$-th harmonic of the actual pitch frequency and its estimate $\eta \times f_p$ (condition (9)) may reach $\pm \eta \times \delta f_p/2$. For the largest values of $\eta$, this gap may exceed half the spectral resolution $\Delta f/2$ of the Fourier transform. To take this uncertainty into account and ensure that the harmonics of the actual pitch frequency are properly protected, every frequency in the interval $[\eta \times f_p - \eta \times \delta f_p/2,\ \eta \times f_p + \eta \times \delta f_p/2]$ can be protected; in other words, condition (9) above is replaced by:

$$\exists\, \eta \text{ integer} \;/\; |f - \eta \cdot f_p| \le (\eta \cdot \delta f_p + \Delta f)/2 \tag{9'}$$
This approach (condition (9')) is particularly beneficial when the values of $\eta$ can be large, notably when the method is used in a wideband system.
For each protected frequency, the corrected frequency response $H^2_{n,f}$ may equal 1, as indicated above, which corresponds to subtracting a zero quantity in the spectral subtraction; that is, the frequency in question is fully protected. More generally, the corrected frequency response $H^2_{n,f}$ may take any value between 1 and $H^1_{n,f}$ depending on the degree of protection desired, which corresponds to subtracting a quantity smaller than the one that would be subtracted if the frequency were not protected.
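The harmonic test of condition (9') and the resulting choice between full protection and the first filter response can be sketched as follows. The function names are hypothetical. The amplitude test of condition (8) is not spelled out in this passage, so the sketch takes its outcome as a boolean supplied by the caller, and it implements the fully protected case ($H^2_{n,f} = 1$) only.

```python
def is_near_harmonic(f, f_p, delta_f, delta_fp=0.0):
    """Condition (9'): is f within (eta*delta_fp + delta_f)/2 of some harmonic eta*f_p?

    With delta_fp = 0 the tolerance reduces to half the spectral resolution delta_f.
    """
    # Harmonic ranks above the one just beyond f cannot be closer, so stop there
    max_eta = int(f // f_p) + 1
    return any(
        abs(f - eta * f_p) <= (eta * delta_fp + delta_f) / 2.0
        for eta in range(1, max_eta + 1)
    )


def second_filter_response(h1, near_harmonic, above_floor):
    """Return 1 (full protection) for a protected frequency, else the first response h1.

    above_floor stands for the amplitude test of condition (8), evaluated by the caller.
    """
    return 1.0 if (near_harmonic and above_floor) else h1
```

As an example of the effect of $\delta f_p$: with $F_e = 8$ kHz and $N = 256$ ($\Delta f = 31.25$ Hz) and an estimated pitch of 200 Hz, the bin at 2031.25 Hz is not protected when `delta_fp` is 0, but it is protected when `delta_fp` is 10 Hz because the tolerance around the tenth harmonic widens.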
The spectral components $S^2_{n,f}$ of a denoised signal are calculated by a multiplier 58:

$$S^2_{n,f} = H^2_{n,f} \cdot S_{n,f} \tag{10}$$
This signal $S^2_{n,f}$ is supplied to module 60, which calculates a masking curve for each frame $n$ by applying a psychoacoustic model of auditory perception by the human ear.
Masking is a well-known principle of how the human ear works. When two frequencies are heard simultaneously, one of them may become inaudible; it is then said to be masked.
Various methods exist for calculating masking curves. For example, the method developed by J.D. Johnston may be used ("Transform Coding of Audio Signals Using Perceptual Noise Criteria", IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, February 1988). This method works on the Bark frequency scale. The masking curve is regarded as the convolution of the spectral spreading function of the basilar membrane in the Bark domain with the excitation signal, which in this application is the signal $S^2_{n,f}$. The spectral spreading function may be modeled as shown in Fig. 7. For each Bark band, the contributions of the lower and higher bands, convolved with the spreading function of the basilar membrane, are calculated:

$$C_{n,q} = \sum_{q'=0}^{q-1} \frac{S^2_{n,q'}}{(10^{10/10})^{(q-q')}} + \sum_{q'=q+1}^{Q} \frac{S^2_{n,q'}}{(10^{25/10})^{(q'-q)}} \tag{11}$$

where the indices $q$ and $q'$ denote Bark bands ($0 \le q, q' \le Q$) and $S^2_{n,q'}$ represents the mean of the components $S^2_{n,f}$ of the denoised excitation signal over the discrete frequencies $f$ belonging to Bark band $q'$.
The masking threshold $M_{n,q}$ for each Bark band $q$ is obtained by module 60 using the formula:

$$M_{n,q} = C_{n,q} / R_q \tag{12}$$

where $R_q$ depends on how voiced the signal is. In a known manner, one possible form of $R_q$ is:

$$10 \cdot \log_{10}(R_q) = (A + q) \cdot \chi + B \cdot (1 - \chi) \tag{13}$$

with $A = 14.5$ and $B = 5.5$. $\chi$ denotes the degree of voicing of the speech signal, varying between 0 (no voicing) and 1 (highly voiced signal). The parameter $\chi$ may take the known form:

$$\chi = \min\left\{\frac{SFM}{SFM_{\max}},\ 1\right\}$$

where SFM represents the ratio, in decibels, between the arithmetic mean and the geometric mean of the energy of the Bark bands, and $SFM_{\max} = -60$ dB.
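Formulas (11) to (13) can be combined into a short sketch that maps per-band energies to masking thresholds. The function name and the list-based interface are assumptions. The parameter $\chi$ is passed in directly rather than derived from SFM, and, as formula (11) is written, the band's own energy ($q' = q$) is not included in the sums.

```python
def masking_thresholds(bark_energy, chi, a=14.5, b=5.5):
    """Return the masking threshold M[q] for each Bark band.

    bark_energy -- list where entry q is the mean of S2 over Bark band q
    chi         -- parameter of formula (13), between 0 and 1
    """
    n_bands = len(bark_energy)
    thresholds = []
    for q in range(n_bands):
        # (11): lower bands spread upward at 10 dB per Bark ...
        lower = sum(
            bark_energy[qp] / (10.0 ** (10 / 10)) ** (q - qp) for qp in range(q)
        )
        # ... and higher bands spread downward at 25 dB per Bark
        upper = sum(
            bark_energy[qp] / (10.0 ** (25 / 10)) ** (qp - q)
            for qp in range(q + 1, n_bands)
        )
        c = lower + upper
        # (13): offset R_q in decibels, converted to a linear ratio
        r_db = (a + q) * chi + b * (1.0 - chi)
        r = 10.0 ** (r_db / 10.0)
        thresholds.append(c / r)  # (12)
    return thresholds
```

With all the energy in the lowest band, the thresholds of the bands above it decay by a factor of 10 per Bark before the offset $R_q$ is applied.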
The denoising system also includes a module 62, which corrects the frequency response of the denoising filter according to the masking curve $M_{n,q}$ calculated by module 60 and the overestimates $\hat{B}'_{n,i}$ calculated by module 45. Module 62 decides the level of denoising that really needs to be achieved.
By comparing the envelope of the noise overestimate with the envelope formed by the masking thresholds $M_{n,q}$, it is decided to denoise the signal only to the extent that the overestimate $\hat{B}'_{n,i}$ exceeds the masking curve. This avoids needlessly removing noise that is masked by speech.
The new response $H^3_{n,f}$, for a frequency $f$ belonging to band $i$ defined by module 12 and to Bark band $q$, thus depends on the relative gap between the overestimate $\hat{B}'_{n,i}$ of the corresponding spectral component of the noise and the masking curve $M_{n,q}$, as follows:

$$H^3_{n,f} = 1 - (1 - H^2_{n,f}) \cdot \max\left\{\frac{\hat{B}'_{n,i} - M_{n,q}}{\hat{B}'_{n,i}},\ 0\right\} \tag{14}$$
In other words, the quantity subtracted from a spectral component $S_{n,f}$ in the spectral subtraction process with frequency response $H^3_{n,f}$ is substantially equal to the smaller of two quantities: on the one hand, the quantity subtracted from that spectral component in the spectral subtraction process with frequency response $H^2_{n,f}$; on the other hand, the fraction of the overestimate $\hat{B}'_{n,i}$ of the corresponding spectral component of the noise that, where applicable, exceeds the masking curve $M_{n,q}$.
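Formula (14) can be sketched for a single frequency as follows. The function name is hypothetical, and the sketch assumes a strictly positive overestimate.

```python
def third_filter_response(h2, b_prime, mask):
    """Frequency response H3 of formula (14).

    h2      -- response H2 of the second denoising filter at this frequency
    b_prime -- overestimate of the noise component for the band (assumed > 0)
    mask    -- masking threshold M for the Bark band containing this frequency
    """
    # Fraction of the noise overestimate that rises above the masking curve
    excess = max((b_prime - mask) / b_prime, 0.0)
    return 1.0 - (1.0 - h2) * excess
```

When the masking threshold is at or above the overestimate, the response is 1 and nothing is subtracted; when the threshold is zero, the response falls back to `h2`.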
Fig. 8 illustrates the principle of the correction applied by module 62. It shows schematically an example of a masking curve $M_{n,q}$ calculated from the spectral components $S^2_{n,f}$ of the denoised signal, together with the overestimate $\hat{B}'_{n,i}$ of the noise spectrum. The quantity finally subtracted from the components $S_{n,f}$ is represented by the hatched areas; that is, it is limited to the fraction of the overestimate $\hat{B}'_{n,i}$ of the noise spectral components that exceeds the masking curve.
This subtraction is performed by multiplying the spectral components $S_{n,f}$ of the speech signal by the frequency response $H^3_{n,f}$ of the denoising filter (multiplier 64). Module 65 then reconstructs the denoised signal in the time domain by applying the inverse fast Fourier transform (IFFT) to the frequency samples $S^3_{n,f}$ delivered by multiplier 64. For each frame, only the first $N/2 = 128$ samples of the signal produced by module 65 are delivered as the final denoised signal $s^3$, after overlap-add reconstruction with the last $N/2 = 128$ samples of the preceding frame.
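The overlap-add reconstruction can be sketched as follows for frames that advance by half their length. The function name is hypothetical, and the sketch assumes that the tail preceding the first frame is zero.

```python
def overlap_add(frames):
    """Rebuild the output signal from time-domain frames of N samples with hop N/2.

    For each frame, the first N/2 samples are added to the last N/2 samples of the
    preceding frame and delivered; the last N/2 samples are held for the next frame.
    """
    half = len(frames[0]) // 2
    output = []
    previous_tail = [0.0] * half  # assumption: nothing precedes the first frame
    for frame in frames:
        output.extend(a + b for a, b in zip(frame[:half], previous_tail))
        previous_tail = frame[half:]
    return output
```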
Fig. 9 shows a preferred embodiment of a denoising system implementing the invention. This system includes a number of elements similar to corresponding elements of the system shown in Fig. 1, which are designated by the same reference numerals. Thus, modules 10, 11, 12, 15, 16, 45 and 55 supply in particular the quantities $S_{n,i}$, $\hat{B}_{n,i}$, $\alpha'_{n,i}$, $\hat{B}'_{n,i}$ and $H^1_{n,f}$ used for selective denoising.
The frequency resolution of the fast Fourier transform 11 is a limitation of the system shown in Fig. 1. The frequency protected by module 56 is not necessarily the exact pitch frequency $f_p$ but the frequency closest to it in the discrete spectrum. In some cases, the harmonics that are protected are then relatively far from those of the pitch frequency. The system shown in Fig. 9 overcomes this drawback by appropriately conditioning the speech signal.
This conditioning modifies the sampling frequency of the signal so that the period $1/f_p$ covers exactly an integer number of sample periods of the conditioned signal.
Many harmonic analysis methods that may be implemented by module 57 can provide a fractional value of the delay $T_p$, expressed as a number of samples at the initial sampling frequency $F_e$. A new sampling frequency $f_e$ is then chosen so that it equals an integer multiple of the estimated pitch frequency, that is, $f_e = p \cdot f_p = p \cdot F_e/T_p = K \cdot F_e$, where $p$ is an integer. To avoid losing signal samples, $f_e$ should be greater than $F_e$. In particular, $f_e$ may be required to lie between $F_e$ and $2F_e$ ($1 \le K \le 2$), which simplifies the implementation of the conditioning.
Of course, if no voice activity is detected in the current frame ($\delta_n = 0$), or if the delay $T_p$ estimated by module 57 is an integer, there is no need to condition the signal.
For each harmonic of the pitch frequency to also correspond to an integer number of samples of the conditioned signal, the integer $p$ must be a divisor of the size $N$ of the signal window produced by module 10: $N = \alpha p$, where $\alpha$ is an integer. $N$ is usually a power of 2 so that the FFT can be used; it is 256 in the example considered.
The spectral resolution $\Delta f$ of the discrete Fourier transform of the conditioned signal is given by $\Delta f = p \cdot f_p/N = f_p/\alpha$. It is therefore advantageous to choose $p$ as small as possible, so as to maximize $\alpha$, yet large enough for oversampling. In the example considered, where $F_e = 8$ kHz and $N = 256$, the values chosen for the parameters $p$ and $\alpha$ are listed in Table I.
Table I

Pitch frequency f_p            Delay T_p (samples)    p      α
500 Hz < f_p < 1000 Hz         8 < T_p < 16           16     16
250 Hz < f_p < 500 Hz          16 < T_p < 32          32     8
125 Hz < f_p < 250 Hz          32 < T_p < 64          64     4
62.5 Hz < f_p < 125 Hz         64 < T_p < 128         128    2
31.25 Hz < f_p < 62.5 Hz       128 < T_p < 256        256    1
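The lookup of Table I can be sketched as a search for the power of two just above the delay. The function name is hypothetical. The sketch covers only the example of the table ($N = 256$, $p$ from 16 to 256) and rejects delays that fall on or outside the strict bounds of the table.

```python
def select_conditioning(t_p, n=256):
    """Return (p, alpha, K) for a fractional delay t_p, following Table I.

    p     -- integer number of samples per pitch period after conditioning
    alpha -- N / p
    K     -- oversampling ratio p / t_p, which lies between 1 and 2
    """
    p = 16
    while p <= n:
        if p / 2 < t_p < p:          # strict bounds, as in the table
            return p, n // p, p / t_p
        p *= 2
    raise ValueError("delay outside the ranges of Table I")
```

For example, a delay of 20.5 samples falls in the second row of the table and gives $p = 32$, $\alpha = 8$ and an oversampling ratio of $32/20.5$.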
This choice is made by a module 70 according to the value of the delay $T_p$ supplied by harmonic analysis module 57. Module 70 supplies the ratio $K$ between the sampling frequencies to three frequency-changing modules 71, 72 and 73.
Module 71 transforms the values $S_{n,i}$, $\hat{B}_{n,i}$, $\alpha'_{n,i}$, $\hat{B}'_{n,i}$ and $H^1_{n,f}$, relating to the bands $i$ defined by module 12, onto the modified frequency scale (sampling frequency $f_e$). This transformation simply dilates the bands $i$ by the factor $K$. The values so transformed are supplied to harmonic protection module 56.
Module 56 then operates in the same way as before to supply the frequency response $H^2_{n,f}$ of the denoising filter. This response $H^2_{n,f}$ is obtained in the same way as in the case of Fig. 1 (conditions (8) and (9)), the only difference being that, in condition (9), the pitch frequency $f_p = f_e/p$ is defined by the integer value of the delay $p$ supplied by module 70, which also supplies the frequency resolution $\Delta f$.
Module 72 oversamples the frames of $N$ samples supplied by windowing module 10. Oversampling by a rational factor $K$ ($K = K_1/K_2$) consists of first oversampling by the integer factor $K_1$ and then undersampling by the integer factor $K_2$. These oversampling and undersampling operations by integer factors may be carried out conventionally using banks of polyphase filters.
The conditioned signal frames $s'$ supplied by module 72 contain $KN$ samples at the frequency $f_e$. These samples are sent to a module 75, which calculates their Fourier transform. The transform may be computed from two blocks of $N = 256$ samples: one formed by the first $N$ samples of the frame of length $KN$ of the conditioned signal $s'$, and the other by the last $N$ samples of that frame. The two blocks therefore overlap by $(2 - K) \times 100\%$. For each of the two blocks, a set of Fourier components $S_{n,f}$ is obtained. These components $S_{n,f}$ are supplied to multiplier 58, which multiplies them by the spectral response $H^2_{n,f}$ to deliver the spectral components $S^2_{n,f}$ of the first denoised signal.
These components $S^2_{n,f}$ are sent to module 60, which calculates the masking curves in the manner described above.
Preferably, in this calculation of the masking curves, the quantity $\chi$ denoting the degree of voicing of the speech signal (formula (13)) is taken in the form $\chi = 1 - H$, where $H$ is an entropy of the autocorrelation of the spectral components $S^2_{n,f}$ of the denoised conditioned signal. The autocorrelations $A(k)$ are calculated by a module 76, for example using the formula:

$$A(k) = \frac{\sum_{f=0}^{N/2-1} S^2_{n,f} \cdot S^2_{n,f+k}}{\sum_{f=0}^{N/2-1} \sum_{f'=0}^{N/2-1} S^2_{n,f} \cdot S^2_{n,f+f'}} \tag{15}$$
A module 77 then calculates the normalized entropy $H$ and supplies it to module 60 for calculating the masking curve (see S.A. McClellan et al., "Spectral Entropy: an Alternative Indicator for Rate Allocation?", Proc. ICASSP '94, pp. 201-204):

$$H = \frac{\sum_{k=0}^{N/2-1} A(k) \cdot \log[A(k)]}{\log(N/2)} \tag{16}$$
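The autocorrelation of formula (15) and the entropy of formula (16) can be sketched as follows. Two points are assumptions rather than statements of the text: indices beyond the last component wrap around circularly, and the entropy is taken with a leading minus sign so that $H$ lies between 0 and 1 and $\chi = 1 - H$ stays in that range. The function names are hypothetical.

```python
import math


def normalized_autocorrelation(spectrum):
    """A(k) of formula (15) for a list of non-negative components S2, f = 0 .. N/2 - 1."""
    size = len(spectrum)
    # With circular indexing the double sum of the denominator equals (sum of S2) squared
    total = sum(spectrum) ** 2
    return [
        sum(spectrum[f] * spectrum[(f + k) % size] for f in range(size)) / total
        for k in range(size)
    ]


def normalized_entropy(spectrum):
    """Normalized entropy of the autocorrelation, with the sign chosen so 0 <= H <= 1."""
    autocorr = normalized_autocorrelation(spectrum)
    # Terms with A(k) = 0 contribute nothing (limit of x*log(x) as x tends to 0)
    entropy = -sum(a * math.log(a) for a in autocorr if a > 0.0)
    return entropy / math.log(len(spectrum))
```

Under these assumptions a flat spectrum gives $H$ close to 1, and a spectrum reduced to a single line gives $H = 0$.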
Thanks to the conditioning of the signal and to its denoising by the filter $H^2_{n,f}$, the normalized entropy $H$ constitutes a measure that is very robust to noise and to variations in the pitch frequency.
Correction module 62 operates in the same way as in the system of Fig. 1, taking into account the overestimated noise $\hat{B}'_{n,i}$ rescaled by frequency-changing module 71. It supplies the frequency response $H^3_{n,f}$ of the final denoising filter, which multiplier 64 multiplies by the spectral components $S_{n,f}$ of the conditioned signal. The resulting components $S^3_{n,f}$ are returned to the time domain by IFFT module 65. At the output of IFFT module 65, a module 80 combines, for each frame, the two signal blocks resulting from processing the two overlapping blocks delivered by FFT 75. This combination may consist of a Hamming-weighted sum of the samples, forming a denoised conditioned signal frame of $KN$ samples.
Module 73 changes the sampling frequency of the denoised conditioned signal supplied by module 80. The sampling frequency is restored to $F_e = f_e/K$ by the inverse of the operations performed by module 72. Module 73 delivers $N = 256$ samples per frame. After overlap-add reconstruction with the last $N/2 = 128$ samples of the preceding frame, only the first $N/2 = 128$ samples of the current frame are finally retained to form the final denoised signal $s^3$ (module 66).
In a preferred embodiment, a module 82 manages the windows formed by module 10 and saved by module 66 so that the number $M$ of samples saved equals an integer multiple of $T_p = F_e/f_p$. This avoids problems of phase discontinuity between frames. Correspondingly, management module 82 controls windowing module 10 so that the overlap between the current frame and the next one corresponds to $N - M$. These $N - M$ overlapping samples are taken into account in the overlap-add performed by module 66 when the next frame is processed. From the value of $T_p$ supplied by harmonic analysis module 57, module 82 calculates the number of samples to be saved, $M = T_p \times E[N/(2T_p)]$, where $E[\,]$ denotes the integer part, and controls modules 10 and 66 accordingly.
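The number of samples to be saved can be computed directly from the expression $M = T_p \times E[N/(2T_p)]$. The function name below is hypothetical; the default window size is the example value $N = 256$.

```python
import math


def samples_to_keep(t_p, n=256):
    """Return M = T_p * E[N / (2 * T_p)], the number of samples saved per frame."""
    return t_p * math.floor(n / (2.0 * t_p))
```

For a delay of 64 samples this gives the usual 128 samples, while a fractional delay of 20.5 samples gives 123 samples, that is, six whole pitch periods.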
In the embodiment just described, the pitch frequency is estimated as an average over the frame. However, the pitch may vary slightly over that duration. These variations can be taken into account within the scope of the invention by conditioning the signal so as to artificially obtain a constant pitch within the frame.
To this end, harmonic analysis module 57 must supply the time intervals between consecutive breaks in the speech signal attributable to glottal closures of the speaker occurring during the frame. Methods for detecting such micro-breaks are known in the field of harmonic analysis of speech signals. In this connection, reference may be made to the following articles: M. Basseville et al., "Sequential detection of abrupt changes in spectral characteristics of digital signals", IEEE Trans. on Information Theory, 1983, Vol. IT-29, No. 5, pp. 708-723; R. André-Obrecht, "A new statistical approach for the automatic segmentation of continuous speech signals", IEEE Trans. on Acoust., Speech and Sig. Proc., Vol. 36, No. 1, January 1988; and C. Murgia et al., "An algorithm for the estimation of glottal closure instants using the sequential detection of abrupt changes in speech signals", Signal Processing VII, 1994, pp. 1685-1688.
The principle of these methods is to perform a statistical test between two models, one short-term and the other long-term. Both are adaptive linear prediction models. The value $w_m$ of this statistical test is the cumulative sum of the a posteriori likelihood ratio of two distributions, corrected by the Kullback divergence. For a residual distribution with Gaussian statistics, $w_m$ is given by:

$$w_m = \frac{1}{2}\left[\frac{2 \cdot e^0_m \cdot e^1_m}{\sigma^2_1} - \left(1 + \frac{\sigma^2_0}{\sigma^2_1}\right) \cdot \frac{(e^0_m)^2}{\sigma^2_0} + \left(1 - \frac{\sigma^2_0}{\sigma^2_1}\right)\right] \tag{17}$$

where $e^0_m$ and $\sigma^2_0$ denote the residual calculated at sample $m$ of the frame and the variance for the long-term model, and $e^1_m$ and $\sigma^2_1$ likewise denote the residual and the variance for the short-term model. The closer the two models are, the closer the value $w_m$ of the statistical test is to 0. Conversely, when the two models diverge from each other, $w_m$ becomes negative, which indicates a break R in the signal.
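Formula (17) can be evaluated per sample as in the following sketch. The function and argument names are hypothetical, the residuals and variances are assumed to come from two linear prediction models maintained elsewhere, and the accumulation over samples and the break detection rule are outside the sketch.

```python
def change_statistic(e0, e1, var0, var1):
    """Value w_m of formula (17) at one sample.

    e0, var0 -- residual and variance of the long-term model
    e1, var1 -- residual and variance of the short-term model (variances assumed > 0)
    """
    ratio = var0 / var1
    return 0.5 * (
        2.0 * e0 * e1 / var1
        - (1.0 + ratio) * e0 ** 2 / var0
        + (1.0 - ratio)
    )
```

When both models produce the same residual and the same variance, the three terms cancel and the value is 0, in line with the behavior described above.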
Fig. 10 shows one possible example of how the value $w_m$ evolves, revealing the breaks R in the speech signal. The time intervals $t_r$ ($r = 1, 2, \ldots$) between two consecutive breaks R are calculated and expressed as a number of samples of the speech signal. Each interval $t_r$ is inversely proportional to the pitch frequency $f_p$, which is thus estimated locally: $f_p = F_e/t_r$ over the $r$-th interval.
The temporal variations of the pitch frequency (that is, the fact that the intervals $t_r$ are not all equal within a given frame) can then be corrected so as to obtain a constant pitch frequency in each analysis frame. This correction is made by modifying the sampling frequency over each interval $t_r$ so that, after oversampling, the intervals between two glottal breaks are constant. The duration between two breaks is thus modified by oversampling at a variable ratio, so as to align it with the longest interval. In addition, the conditioning constraint is respected, whereby the sampling frequency is a multiple of the estimated pitch frequency.
Fig. 11 shows the means used to calculate the signal conditioning in this latter case. Harmonic analysis module 57 implements the analysis method described above and supplies the intervals $t_r$ relating to the signal frame produced by module 10. For each of these intervals, module 70 (block 90 in Fig. 11) calculates the oversampling ratio $K_r = p_r/t_r$, where the integer $p_r$ is given by the third column of Table I when $t_r$ takes the values indicated in the second column. These oversampling ratios $K_r$ are supplied to frequency-changing modules 72 and 73 so that the interpolations are carried out with the sampling ratio $K_r$ over the corresponding interval $t_r$.
The largest, $T_p$, of the time intervals $t_r$ supplied by module 57 for a frame is selected by module 70 (block 91 in Fig. 11) to obtain a pair $p$, $\alpha$ as indicated in Table I. The modified sampling frequency is then $f_e = p \cdot F_e/T_p$, as before, and the spectral resolution $\Delta f$ of the discrete Fourier transform of the conditioned signal is still given by $\Delta f = F_e/(\alpha \cdot T_p)$. For frequency-changing module 71, the oversampling ratio $K$ is given by $K = p/T_p$ (block 92). Harmonic protection module 56 operates in the same way as before, using for condition (9) the spectral resolution $\Delta f$ supplied by block 91 and the pitch frequency $f_p = f_e/p$ defined from the integer value of the delay $p$ supplied by block 91.
This embodiment of the invention also involves adapting window management module 82. The number $M$ of samples of the denoised signal to be saved for the current frame here corresponds to an integer number of consecutive time intervals $t_r$ between two glottal breaks (see Fig. 10). This arrangement avoids problems of phase discontinuity between frames while taking account of possible variations of the time intervals $t_r$ within a frame.

Claims (28)

1. A method for denoising a digital speech signal (s) processed in successive frames, wherein:
- a harmonic analysis of the speech signal is performed to estimate a pitch frequency ($f_p$) of the speech signal in each frame in which it shows voice activity;
- spectral components ($S_{n,f}$, $S_{n,i}$) of the speech signal are calculated for each frame;
- estimates of the spectral components of the noise contained in the speech signal are calculated for each frame;
- a spectral subtraction is performed, comprising at least one step of subtracting, from each spectral component ($S_{n,f}$) of the speech signal in a frame, a quantity that depends on parameters including at least the estimate of the corresponding spectral component of the noise for that frame and the estimated pitch frequency; and
- the result of the spectral subtraction is transformed to the time domain to construct a denoised speech signal ($s^3$).
2. The method of claim 1, wherein the estimated pitch frequency ($f_p$) is used to select protected frequencies among the frequencies for which the spectral components of the speech signal are calculated, and wherein the quantity subtracted from a given spectral component ($S_{n,f}$) of the speech signal is smaller if that spectral component corresponds to a protected frequency than if it does not.
3. The method of claim 2, wherein the protected frequencies are selected such that the spectral components of the speech signal corresponding to the protected frequencies exceed a noise level determined from the estimates of the corresponding spectral components of the noise.
4. The method of claim 2 or 3, wherein each protected frequency is the frequency closest to an integer multiple of the estimated pitch frequency ($f_p$) among the frequencies for which the spectral components of the speech signal are calculated.
5. The method of claim 2 or 3, wherein the protected frequencies are those, among the frequencies for which the spectral components of the speech signal are calculated, that are closest to an interval of the form $[\eta \times f_p - \eta \times \delta f_p/2,\ \eta \times f_p + \eta \times \delta f_p/2]$, where $f_p$ denotes the estimated pitch frequency, $\delta f_p$ is the frequency resolution of the pitch frequency estimate, and $\eta$ denotes an integer, and wherein,
6. The method of any one of claims 2 to 5, wherein the quantity subtracted from the spectral components ($S_{n,f}$) of the speech signal is substantially zero at the protected frequencies.
7. The method of any one of claims 1 to 6, wherein, after the pitch frequency ($f_p$) of the speech signal has been estimated for a frame, the speech signal of the frame is conditioned by oversampling it at an oversampling frequency ($f_e$) that is an integer multiple of the estimated pitch frequency, and the spectral components ($S_{n,f}$) of the speech signal from which said quantities are subtracted are calculated for the frame on the basis of the conditioned signal (s').
8. The method of claim 7, wherein the spectral components ($S_{n,f}$) of the speech signal are calculated by transforming blocks of N samples of the conditioned signal (s') to the frequency domain, and wherein the ratio (p) between the oversampling frequency ($f_e$) and the estimated pitch frequency is a divisor of the number N.
9. The method of claim 7 or 8, wherein a degree of voicing ($\chi$) of the speech signal in the frame is estimated from a calculation of the entropy (H) of the autocorrelation of the spectral components calculated on the basis of the conditioned signal.
10. The method of claim 9, wherein the spectral components ($S^2_{n,f}$) whose autocorrelation entropy (H) is calculated are those calculated on the basis of the conditioned signal (s') after said quantities have been subtracted.
11. The method of claim 9 or 10, wherein the degree of voicing ($\chi$) is measured from a normalized entropy H of the form:

$$H = \frac{\sum_{k=0}^{N/2-1} A(k) \cdot \log[A(k)]}{\log(N/2)}$$

where N is the number of samples used to calculate the spectral components ($S_{n,f}$) on the basis of the conditioned signal (s'), and A(k) is the normalized autocorrelation defined by:

$$A(k) = \frac{\sum_{f=0}^{N/2-1} S^2_{n,f} \cdot S^2_{n,f+k}}{\sum_{f=0}^{N/2-1} \sum_{f'=0}^{N/2-1} S^2_{n,f} \cdot S^2_{n,f+f'}}$$

where $S^2_{n,f}$ denotes the spectral component of rank f calculated on the basis of the conditioned signal.
12. The method according to any one of the preceding claims, wherein, after each frame has been processed, the number of samples (M) retained from the denoised speech signal samples produced by that processing is equal to an integer multiple of the ratio ($T_p$) between the sampling frequency ($F_e$) and the estimated fundamental frequency ($f_p$).
13. The method according to any one of claims 1 to 11, wherein the estimation of the fundamental frequency of the speech signal in each frame comprises the following steps:
- estimating time intervals ($t_r$) between two consecutive breaks (R) of the signal attributable to glottal closures of the speaker occurring within the duration of the frame, the estimated fundamental frequency being inversely proportional to said time intervals;
- interpolating the speech signal within said time intervals, so that the conditioned signal (s') resulting from this interpolation has a constant time interval between two consecutive breaks.
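The two steps of claim 13 can be sketched as follows, assuming the break positions have already been detected and are given as sample indices. Linear interpolation, averaging of the intervals, and the function names are illustrative assumptions; the claim only requires that the interval between consecutive breaks become constant.

```python
def estimate_pitch(breaks, sampling_freq):
    """Pitch estimate as the inverse of the mean interval between breaks.

    Assumption: averaging the intervals is a hypothetical choice.
    """
    intervals = [b - a for a, b in zip(breaks, breaks[1:])]
    return sampling_freq * len(intervals) / sum(intervals)


def equalize_break_intervals(signal, breaks, target_len):
    """Resample each inter-break segment to target_len samples (linear interpolation)."""
    out = []
    for start, end in zip(breaks, breaks[1:]):
        span = end - start
        for j in range(target_len):
            pos = start + j * span / target_len  # fractional position in the input
            lo = int(pos)
            hi = min(lo + 1, len(signal) - 1)
            frac = pos - lo
            out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out
```

For example, breaks at samples 0, 4 and 10 with `target_len = 8` yield 16 output samples in which every inter-break interval is exactly 8 samples long.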
14. according to the method for claim 13, wherein, after the processing to each frame, the sample number (M) that keeps in whole samples of the voice signal that the denoising that is provided is provided by this is corresponding to the time interval (t that estimates r) an integer number.
15. according to any one method in each claim of front, in frequency domain, estimate the value of the signal to noise ratio (S/N ratio) that voice signal had of each frame, and in this method, each parameter of the value that deducts of decision includes the estimated value of this signal to noise ratio (S/N ratio), and the value that deducts from each spectrum component of the voice signal of this frame is the decreasing function of the estimated value of corresponding signal to noise ratio (S/N ratio).
16. according to the method for claim 15, wherein, for those values of signal to noise ratio (S/N ratio) maximum, the function of addressing drops to zero.
17. The method according to any one of the preceding claims, wherein the spectral components ($S^2_{n,f}$) of a denoised signal, obtained by subtracting said quantities from the spectral components ($S_{n,f}$) of the speech signal, are used to compute a masking curve ($M_{n,q}$) by applying an auditory model.
18. The method according to claims 11 and 17, wherein the degree of noisiness (χ) measured by the normalized entropy (H) is taken into account in the computation of the masking curve ($M_{n,q}$).
19. The method according to claim 17 or 18, wherein the parameters determining the quantity subtracted from a spectral component ($S_{n,f}$) of the speech signal in a frame include a deviation between the overestimate of the corresponding spectral component of the noise and the computed masking curve ($M_{n,q}$).
20. The method according to claim 19, wherein the overestimate of a spectral component of the noise for a frame is compared with the computed masking curve ($M_{n,q}$), and wherein, to obtain the spectral components ($S^3_{n,f}$) that are transformed to the time domain, the quantity subtracted from the spectral component ($S_{n,f}$) of the speech signal is limited to the portion of the overestimate of the corresponding spectral component of the noise that exceeds the masking curve.
21. The method according to any one of the preceding claims, wherein the spectral subtraction comprises:
- a first spectral subtraction step, in which a first quantity is subtracted from each spectral component ($S_{n,f}$) of the speech signal in the frame, the first quantity being determined by parameters that include the overestimate of the corresponding spectral component of the noise for said frame and the estimated fundamental frequency ($f_p$), so as to obtain the spectral components ($S^2_{n,f}$) of a first denoised signal;
- computing a masking curve ($M_{n,q}$) from the spectral components ($S^2_{n,f}$) of the first denoised signal by applying an auditory model;
- comparing the overestimates of the spectral components of the noise for said frame with the computed masking curve ($M_{n,q}$); and
- a second spectral subtraction step, in which a second quantity is subtracted from each spectral component ($S_{n,f}$) of the speech signal in the frame, the second quantity being equal to the smaller of the corresponding first quantity and the portion of the overestimate of the corresponding spectral component of the noise that exceeds the masking curve, so as to obtain the spectral components ($S^3_{n,f}$) of a second denoised signal that is transformed to the time domain.
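The four steps of claim 21 can be sketched on lists of spectral magnitudes. The sketch assumes that the first quantities are supplied by the caller, that the masking curve is returned per frequency bin by a caller-supplied function standing in for the auditory model, and that results are clamped at zero; none of these choices comes from the claim.

```python
def two_step_subtraction(s, noise_over, first_quantity, masking_curve):
    """Return (s2, m, second_quantity, s3) for one frame.

    s              -- spectral magnitudes of the noisy speech signal
    noise_over     -- overestimates of the noise spectral components
    first_quantity -- quantities removed in the first step (supplied by caller)
    masking_curve  -- hypothetical stand-in for the auditory model: maps s2 to
                      one masking level per frequency bin
    """
    # Step 1: first subtraction (clamping at zero is an assumption).
    s2 = [max(x - q, 0.0) for x, q in zip(s, first_quantity)]
    # Step 2: masking curve computed from the first denoised spectrum.
    m = masking_curve(s2)
    # Step 3: portion of each noise overestimate that exceeds the masking curve.
    excess = [max(b - mk, 0.0) for b, mk in zip(noise_over, m)]
    # Step 4: subtract the smaller of the first quantity and that excess.
    second = [min(q, e) for q, e in zip(first_quantity, excess)]
    s3 = [max(x - q, 0.0) for x, q in zip(s, second)]
    return s2, m, second, s3
```

Where the noise overestimate lies below the masking curve, the second quantity is zero and the original component is kept unchanged, since the noise at that frequency is taken to be masked.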
22. The method according to any one of the preceding claims, wherein each estimate of a spectral component of the noise used in the spectral subtraction is an overestimate, each overestimate of a spectral component of the noise contained in the speech signal being obtained by combining a long-term estimate of said spectral component of the noise with a measure ($\Delta B^{\max}_{n,i}$) of the variability of the spectral component of the noise around its long-term estimate.
23. The method according to claim 22, wherein the long-term estimate ($\hat{B}_{n,i}$) of a spectral component of the noise for a frame n, corresponding to a frequency in a frequency band i, is computed as:
$$\hat{B}_{n,i} = \gamma_{n,i}\cdot\hat{B}_{n-1,i} + (1-\gamma_{n,i})\cdot\tilde{B}_{n,i}$$
where
$$\tilde{B}_{n,i} = \lambda_B\cdot\hat{B}_{n-1,i} + (1-\lambda_B)\cdot S_{n,i},$$
$\gamma_{n,i}$ denotes a non-binary degree of voice activity of the speech signal in frequency band i for frame n, $S_{n,i}$ denotes the mean spectral amplitude of the speech signal in frequency band i for frame n, and $\lambda_B$ denotes a forgetting factor.
24. The method according to claim 23, wherein the degree of voice activity ($\gamma_{n,i}$) for frame n is determined by performing a preliminary denoising of the speech signal of frame n on the basis of noise estimates obtained during at least one frame preceding frame n, and by analyzing the variations in energy of the preliminarily denoised signal.
25. The method according to claim 24, wherein the degree of voice activity ($\gamma_{n,i}$) relating to a frequency band i is a function varying continuously between 0 and 1.
26. The method according to claim 24 or 25, wherein a long-term estimate ($\bar{E}_{n,i}$) of the energy of the preliminarily denoised signal in frequency band i is computed and compared with an instantaneous estimate ($E_{n,i}$) of that energy computed for frame n, so as to obtain the degree of voice activity ($\gamma_{n,i}$) of the speech signal of frame n in frequency band i.
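A continuous degree of voice activity of the kind described in claims 25 and 26 could be derived from the ratio of the instantaneous energy to the long-term energy. The mapping below (a clamped linear function of the ratio in decibels) and its 10 dB range are purely hypothetical; the claims only require a comparison of the two estimates and a result varying continuously between 0 and 1.

```python
import math


def voice_activity_degree(e_inst, e_long, d_max_db=10.0):
    """Map the instantaneous / long-term energy ratio to a degree in [0, 1].

    The clamped linear mapping in dB and the default range are hypothetical.
    """
    if e_inst <= 0.0 or e_long <= 0.0:
        return 0.0
    d_db = 10.0 * math.log10(e_inst / e_long)
    return min(max(d_db / d_max_db, 0.0), 1.0)
```

An instantaneous energy equal to or below the long-term level gives a degree of 0, and the degree saturates at 1 once the instantaneous energy exceeds the long-term level by the chosen range.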
27. The method according to any one of claims 23 to 26, wherein the measure ($\Delta B^{\max}_{n,i}$) of the variability of said spectral component of the noise for a frame n, corresponding to a frequency in frequency band i, around its long-term estimate is a function of deviations $|S_{n-k,i} - \hat{B}_{n-k,i}|$ computed over a given number of frames n-k ≤ n in which the speech signal exhibits no voice activity in frequency band i.
28. The method according to any one of claims 23 to 26, wherein the measure ($\Delta B^{\max}_{n,i}$) of the variability of said spectral component of the noise for a frame n, corresponding to a frequency in frequency band i, around its long-term estimate is a function of maximum deviations computed over a given number of frames n-k ≤ n in which the speech signal exhibits no voice activity in frequency band i, $S_{n-k,f}$ denoting the spectral component at a frequency f within the frequency interval $[f(i-1), f(i)]$ corresponding to frequency band i for frame n-k.
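The variability measure of claims 27 and 28 and its combination with the long-term estimate (claim 22) can be sketched as below. Taking the maximum of the stored deviations and adding it to the long-term estimate are assumptions, as are the history length and all names; the claims only state that the measure is "a function of" the deviations and that the two quantities are "combined".

```python
from collections import deque


class NoiseVariability:
    """Tracks deviations |S - B_hat| over the most recent silent frames of one band."""

    def __init__(self, history=4):
        # Only the last `history` deviations are kept (hypothetical length).
        self.deviations = deque(maxlen=history)

    def update(self, s_mean, b_hat, silent):
        """Record the deviation if the frame has no voice activity; return the measure."""
        if silent:
            self.deviations.append(abs(s_mean - b_hat))
        # Assumption: the measure is the largest stored deviation.
        return max(self.deviations, default=0.0)


def noise_overestimate(b_hat, delta_max):
    """Overestimate of the noise component; the additive combination is an assumption."""
    return b_hat + delta_max
```

Frames with voice activity leave the history untouched, so the measure reflects only fluctuations observed while the band contained noise alone.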
CN 98810358 1997-09-18 1998-09-16 Method for suppressing noise in digital speech signal Pending CN1276896A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR97/11642 1997-09-18
FR9711642A FR2768546B1 (en) 1997-09-18 1997-09-18 METHOD FOR NOISE REDUCTION OF A DIGITAL SPOKEN SIGNAL

Publications (1)

Publication Number Publication Date
CN1276896A true CN1276896A (en) 2000-12-13

Family

ID=9511229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 98810358 Pending CN1276896A (en) 1997-09-18 1998-09-16 Method for suppressing noise in digital speech signal

Country Status (10)

Country Link
EP (1) EP1016073B1 (en)
JP (1) JP2001516902A (en)
CN (1) CN1276896A (en)
AU (1) AU9169098A (en)
BR (1) BR9812655A (en)
CA (1) CA2304015A1 (en)
DE (1) DE69804329T2 (en)
ES (1) ES2174484T3 (en)
FR (1) FR2768546B1 (en)
WO (1) WO1999014739A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2379550A (en) * 2001-09-11 2003-03-12 Barrington Dyer Printed code recording and playing system, for music, speech and sounds
FR2899424A1 (en) 2006-03-28 2007-10-05 France Telecom Audio channel multi-channel/binaural e.g. transaural, three-dimensional spatialization method for e.g. ear phone, involves breaking down filter into delay and amplitude values for samples, and extracting filter`s spectral module on samples
CN104251934B (en) * 2013-06-26 2018-08-14 华为技术有限公司 Harmonic analysis method and device and the method and apparatus for determining clutter between harmonic wave

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU633673B2 (en) * 1990-01-18 1993-02-04 Matsushita Electric Industrial Co., Ltd. Signal processing device
EP0459362B1 (en) * 1990-05-28 1997-01-08 Matsushita Electric Industrial Co., Ltd. Voice signal processor
US5469087A (en) * 1992-06-25 1995-11-21 Noise Cancellation Technologies, Inc. Control system using harmonic filters
US5555190A (en) * 1995-07-12 1996-09-10 Micro Motion, Inc. Method and apparatus for adaptive line enhancement in Coriolis mass flow meter measurement

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101031963B (en) * 2004-09-16 2010-09-15 法国电信 Method of processing a noisy sound signal and device for implementing said method
WO2010111876A1 (en) * 2009-03-31 2010-10-07 华为技术有限公司 Method and device for signal denoising and system for audio frequency decoding
US8965758B2 (en) 2009-03-31 2015-02-24 Huawei Technologies Co., Ltd. Audio signal de-noising utilizing inter-frame correlation to restore missing spectral coefficients
CN101859569A (en) * 2010-05-27 2010-10-13 屈国良 Method for lowering noise of digital audio-frequency signal
CN101859569B (en) * 2010-05-27 2012-08-15 上海朗谷电子科技有限公司 Method for lowering noise of digital audio-frequency signal
CN109741757A (en) * 2019-01-29 2019-05-10 桂林理工大学南宁分校 The method of real-time voice compression and decompression for narrowband Internet of Things
CN109817241A (en) * 2019-02-18 2019-05-28 腾讯音乐娱乐科技(深圳)有限公司 Audio-frequency processing method, device and storage medium
CN109817241B (en) * 2019-02-18 2021-06-01 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method, device and storage medium
CN116580712A (en) * 2023-07-14 2023-08-11 深圳攀高医疗电子有限公司 Voice processing method, voice processing system and waist therapeutic instrument
CN116580712B (en) * 2023-07-14 2023-09-15 深圳攀高医疗电子有限公司 Voice processing method, voice processing system and waist therapeutic instrument

Also Published As

Publication number Publication date
JP2001516902A (en) 2001-10-02
BR9812655A (en) 2000-08-22
DE69804329D1 (en) 2002-04-25
WO1999014739A1 (en) 1999-03-25
EP1016073A1 (en) 2000-07-05
EP1016073B1 (en) 2002-03-20
ES2174484T3 (en) 2002-11-01
AU9169098A (en) 1999-04-05
FR2768546B1 (en) 2000-07-21
FR2768546A1 (en) 1999-03-19
DE69804329T2 (en) 2002-11-14
CA2304015A1 (en) 1999-03-25

Similar Documents

Publication Publication Date Title
EP0790599B1 (en) A noise suppressor and method for suppressing background noise in noisy speech, and a mobile station
RU2507608C2 (en) Method and apparatus for processing audio signal for speech enhancement using required feature extraction function
US7957965B2 (en) Communication system noise cancellation power signal calculation techniques
US6523003B1 (en) Spectrally interdependent gain adjustment techniques
US6766292B1 (en) Relative noise ratio weighting techniques for adaptive noise cancellation
CN105989853B (en) Audio quality evaluation method and system
CN1727860B (en) Noise suppression method and apparatus
JP4307557B2 (en) Voice activity detector
US6687669B1 (en) Method of reducing voice signal interference
US20190156854A1 (en) Method and apparatus for detecting a voice activity in an input audio signal
CN1286788A (en) Noise suppression for low bitrate speech coder
US20040078199A1 (en) Method for auditory based noise reduction and an apparatus for auditory based noise reduction
US20060229869A1 (en) Method of and apparatus for reducing acoustic noise in wireless and landline based telephony
US6671667B1 (en) Speech presence measurement detection techniques
CN1240051C (en) Speech enhancement device
Udrea et al. An improved spectral subtraction method for speech enhancement using a perceptual weighting filter
CN1276896A (en) Method for suppressing noise in digital speech signal
US8165872B2 (en) Method and system for improving speech quality
Saleem et al. Ideal binary masking for reducing convolutive noise
Aicha et al. Perceptual speech quality measures separating speech distortion and additive noise degradations
Aicha et al. Reduction of musical residual noise using perceptual tools with classic speech denoising techniques
Upadhyay An improved multi-band speech enhancement utilizing masking properties of human hearing system
Udrea et al. A time-recursive adaptive algorithm for colored noise reduction in speech enhancement
Upadhyay et al. An auditory perception based improved multi-band spectral subtraction algorithm for enhancement of speech degraded by non-stationary noises
Krishnamoorthi et al. An auditory-domain based speech enhancement algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: EADS DEFENCE AND SECURITY NETWORKS

Free format text: FORMER OWNER: MATRA NORTEL COMMUNICATIONS

Effective date: 20020130

C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20020130

Address after: Montigny-le-Bretonneux, France

Applicant after: EADS Defence and Security Networks

Address before: Quimper, France

Applicant before: Matra Nortel Communications

C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication