CN103646648B - A noise power estimation method - Google Patents

Info

Publication number
CN103646648B
Authority
CN
China
Prior art keywords
analysis frame
frequency
present analysis
noise power
noisy speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310585440.3A
Other languages
Chinese (zh)
Other versions
CN103646648A (en
Inventor
徐敬德
崔慧娟
唐昆
许科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XINRUIDI (BEIJING) SCIENCE & TECHNOLOGY Co Ltd
Tsinghua University
Original Assignee
XINRUIDI (BEIJING) SCIENCE & TECHNOLOGY Co Ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XINRUIDI (BEIJING) SCIENCE & TECHNOLOGY Co Ltd and Tsinghua University
Priority to CN201310585440.3A
Publication of CN103646648A
Application granted
Publication of CN103646648B
Legal status: Expired - Fee Related

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a noise power estimation method. The method first samples the input noisy speech to obtain noisy speech signal samples, and divides the samples into frames in chronological order to obtain a series of analysis frames. Next, from the noisy speech signal samples in the current analysis frame, it calculates the maximum normalized autocorrelation value of the current analysis frame and the noisy speech power and posterior signal-to-noise ratio (SNR) of each frequency bin in the current analysis frame, and from these it calculates the speech presence probability of each frequency bin in the current analysis frame. Finally, the noise power estimate of each frequency bin in the current analysis frame is calculated from the noise power estimate of each frequency bin in the previous analysis frame, the noisy speech power of each frequency bin in the current analysis frame, and the speech presence probability. The method is computationally simple, requires little storage, and can quickly estimate the power of various kinds of noise.

Description

A noise power estimation method
Technical field
The present invention relates to the field of noise reduction, and in particular to a noise power estimation method.
Background
Speech is often subject to interference from various kinds of noise during communication, such as environmental noise and circuit noise. Noise not only degrades communication quality but also severely interferes with parameter extraction in low-rate speech coding, lowering the quality of the synthesized speech. In speech recognition, noise greatly reduces recognition accuracy, making it difficult for speech recognition to achieve satisfactory results in practice. Noise suppression is therefore of great importance to fields such as voice communication, speech coding, and speech recognition. Current noise suppression algorithms do suppress noise to some extent, but they often also damage the speech considerably, greatly reducing its intelligibility.
Noise power estimation is one of the most important components of noise suppression. Noise estimation based on optimal smoothing and minimum statistics is widely used because of its good performance: it is robust and can accurately estimate the power of stationary noise. However, that method estimates noise power mainly from the posterior SNR and tracks non-stationary noise slowly, often with a delay of 0.5 s to 1.5 s. It also needs to store the noise power of each frequency bin over multiple past frames, which requires complex computation and considerable storage.
In view of the above, there is an urgent need for a method that can quickly track both stationary and non-stationary noise and estimate the noise power.
Summary of the invention
To solve the above problem, the invention provides a noise power estimation method comprising the following steps:
a noisy speech framing step: sampling the input noisy speech at a preset sampling frequency to obtain noisy speech signal samples, and dividing the samples into frames in chronological order to obtain a series of analysis frames;
a maximum normalized autocorrelation value calculation step: calculating the maximum normalized autocorrelation value of the current analysis frame from the noisy speech signal samples in the current analysis frame;
a posterior SNR calculation step: calculating the noisy speech power and the posterior SNR of each frequency bin in the current analysis frame from the noisy speech signal samples in the current analysis frame;
a per-bin speech presence probability calculation step: calculating the speech presence probability of each frequency bin in the current analysis frame from the maximum normalized autocorrelation value of the current analysis frame and the posterior SNR of each frequency bin in the current analysis frame;
a per-bin noise power estimate calculation step: calculating the noise power estimate of each frequency bin in the current analysis frame from the noise power estimate of each frequency bin in the previous analysis frame, the noisy speech power of each frequency bin in the current analysis frame, and the speech presence probability of each frequency bin in the current analysis frame.
According to one embodiment of the invention, the noisy speech signal samples of the current analysis frame comprise part of the noisy speech signal samples of the previous frame and all of the noisy speech signal samples of the current frame.
According to one embodiment of the invention, the maximum normalized autocorrelation value calculation step comprises the following steps:
S202a: filtering the noisy speech signal samples in the current analysis frame with a band-pass filter having preset cutoff frequencies to obtain band-pass speech signal samples, denoted the signal samples of the current analysis frame;
S202b: calculating the energy and the correlation values of the current analysis frame from the signal samples of the current analysis frame;
S202c: calculating the maximum normalized autocorrelation value of the current analysis frame from the energy and the correlation values of the current analysis frame.
According to one embodiment of the invention, the correlation value of the current analysis frame is calculated with the following formula:
$$r(n,\tau) = \sum_{i=0}^{I-1-\tau} \left[ x(n,i) \times x(n,i+\tau) \right]$$
where x(n, i) denotes the i-th signal sample in the n-th analysis frame, x(n, i+τ) denotes the (i+τ)-th signal sample in the n-th analysis frame, τ denotes the sample offset, which takes values within the pitch period range, I denotes the total number of signal samples in the current analysis frame, and r(n, τ) denotes the correlation value of the n-th analysis frame at offset τ.
According to one embodiment of the invention, the posterior SNR calculation step comprises the following steps:
S203a: applying a discrete Fourier transform to the signal samples of the current analysis frame to obtain the noisy speech power of each frequency bin in the current analysis frame;
S203b: calculating the posterior SNR of each frequency bin in the current analysis frame from the noisy speech power of each frequency bin in the current analysis frame and the noise power estimate of the corresponding frequency bin in the previous analysis frame.
According to one embodiment of the invention, the speech presence probability of each frequency bin in the current analysis frame is calculated with the following formula:
$$p(n,k) = \begin{cases} \epsilon_1 \times \exp[\partial(n,k)] / \{\alpha + \exp[\partial(n,k)]\}, & \bar{r}(n) \ge r_1 \\ \epsilon_2 \times \exp[\partial(n,k)] / \{\alpha + \exp[\partial(n,k)]\}, & r_1 > \bar{r}(n) > r_2 \\ \epsilon_3 \times \exp[\partial(n,k)] / \{\alpha + \exp[\partial(n,k)]\}, & \bar{r}(n) \le r_2 \end{cases}$$
where p(n, k) denotes the speech presence probability of the k-th frequency bin in the n-th analysis frame, $\partial(n,k)$ denotes the posterior SNR of the k-th frequency bin in the n-th analysis frame, $\bar{r}(n)$ denotes the maximum normalized autocorrelation value of the n-th analysis frame, α is a constant, $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$ are weighting coefficients, exp denotes the exponential function, and $r_1$ and $r_2$ are thresholds.
According to one embodiment of the invention, the per-bin noise power estimate calculation step comprises the following steps:
S205a: calculating the noise power update value of each frequency bin in the current analysis frame from the noisy speech power of each frequency bin in the current analysis frame, the speech presence probability of each frequency bin in the current analysis frame, and the noise power estimate of the corresponding frequency bin in the previous analysis frame;
S205b: obtaining the noise power estimate of each frequency bin in the current analysis frame as a weighted combination of the noise power update value of each frequency bin in the current analysis frame and the noise power estimate of the corresponding frequency bin in the previous analysis frame.
According to one embodiment of the invention, the noise power update value of each frequency bin in the current analysis frame is calculated with the following formula:
$$U(n,k) = p(n,k) \times D(n-1,k) + [1 - p(n,k)] \times Y(n,k), \quad k = 0, 1, \ldots, K-1$$
where U(n, k) denotes the noise power update value of the k-th frequency bin in the n-th analysis frame, p(n, k) denotes the speech presence probability of the k-th frequency bin in the n-th analysis frame, D(n-1, k) denotes the noise power estimate of the k-th frequency bin in the (n-1)-th analysis frame, Y(n, k) denotes the noisy speech power of the k-th frequency bin in the n-th analysis frame, and K denotes the total number of frequency bins in the n-th analysis frame.
According to one embodiment of the invention, the noise power estimate of each frequency bin in the current analysis frame is calculated with the following formula:
$$D(n,k) = \beta \times D(n-1,k) + (1-\beta) \times U(n,k), \quad k = 0, 1, \ldots, K-1$$
where D(n, k) denotes the noise power estimate of the k-th frequency bin in the n-th analysis frame, U(n, k) denotes the noise power update value of the k-th frequency bin in the n-th analysis frame, and β denotes a preset weighting coefficient.
According to one embodiment of the invention, the method further comprises a step of updating the posterior SNR and the prior SNR of each frequency bin of the current analysis frame and calculating the final gain coefficient.
For stationary noise, the invention estimates noise power about as well as the prior art; for non-stationary noise, it tracks changes in the noise quickly. Moreover, the invention does not need to store the noise power of each frequency bin over multiple past frames; it needs only the data of the current frame and the previous frame. The invention is therefore computationally simple, requires little storage, and is practical and effective for noise power estimation.
Other features and advantages of the invention will be set forth in the description that follows, and in part will become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims and the accompanying drawings.
Brief description of the drawings
To illustrate the embodiments of the invention and the technical solutions of the prior art more clearly, the drawings needed for the description of the embodiments and of the prior art are briefly introduced below:
Fig. 1 is a flowchart of the prior-art noise power estimation method based on optimal smoothing and minimum statistics;
Fig. 2 is a flowchart of a noise power estimation method according to one embodiment of the invention;
Fig. 3 is a detailed flowchart of a noise power estimation method according to one embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below with reference to the drawings and examples, so that how the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented accordingly. It should be noted that, provided no conflict arises, the embodiments of the invention and the features within them may be combined with one another, and the resulting technical solutions all fall within the scope of protection of the invention.
In addition, the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
Fig. 1 shows a flowchart of the noise estimation method based on optimal smoothing and minimum statistics.
As shown in Fig. 1, noisy speech framing is first performed in step S101. The input noisy speech is sampled at a set sampling frequency (for example 8 kHz) to obtain noisy speech signal samples. The samples are then divided into frames in chronological order, where n denotes the frame number.
In step S102, a K-point discrete Fourier transform is applied to part of the noisy speech signal samples of the previous frame together with the signal samples of the current frame, and the power of each frequency bin is calculated and denoted Y(n, k), where k = 0, 1, 2, ..., K-1.
Next, in step S103, the weighting factor α(n, k) is calculated under the minimum mean square error criterion from the smoothed noisy speech signal power Q(n-1, k) estimated for the previous frame and the noise power D(n-1, k) estimated for the previous frame. α(n, k), also called the smoothing factor, is calculated with the following formula:
$$\alpha(n,k) = \frac{1}{1 + \left[ Q(n-1,k) / D(n-1,k) - 1 \right]^2} \tag{1}$$
In step S104, the smoothed noisy speech signal power Q(n-1, k) estimated for the previous frame and the power Y(n, k) of each frequency bin obtained in step S102 are combined using the weighting factor α(n, k) to obtain the smoothed noisy speech signal power Q(n, k) of each frequency bin of the current frame, which can be expressed as:
$$Q(n,k) = \alpha(n,k) \times Q(n-1,k) + [1 - \alpha(n,k)] \times Y(n,k), \quad k = 0, 1, \ldots, K-1 \tag{2}$$
After the smoothed noisy speech signal power Q(n, k) of each frequency bin of the current frame has been obtained, it is combined in step S105 with the smoothed noisy speech signal power of the L-1 frames preceding the current frame to calculate the minimum $Q_{\min}(n,k)$ of the smoothed noisy speech signal power over L consecutive frames, which can be expressed as:
$$Q_{\min}(n,k) = \min\{Q(l,k)\}, \quad l = n-L+1, n-L+2, \ldots, n \tag{3}$$
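By way of illustration only, equations (1) to (3) of the prior-art method can be sketched in Python for a single frequency bin as follows. The names `smoothing_factor`, `smooth_power` and `MinimumTracker` are assumptions introduced for this sketch and do not appear in the original text; the sketch assumes D(n-1, k) is strictly positive.

```python
from collections import deque


def smoothing_factor(q_prev: float, d_prev: float) -> float:
    """Eq. (1): alpha(n,k) from Q(n-1,k) and D(n-1,k); assumes d_prev > 0."""
    return 1.0 / (1.0 + (q_prev / d_prev - 1.0) ** 2)


def smooth_power(q_prev: float, d_prev: float, y: float) -> float:
    """Eq. (2): Q(n,k) as a weighted mix of Q(n-1,k) and Y(n,k)."""
    a = smoothing_factor(q_prev, d_prev)
    return a * q_prev + (1.0 - a) * y


class MinimumTracker:
    """Eq. (3): minimum of Q over the last L frames for one frequency bin.

    Hypothetical helper: it keeps L past values per bin, which is the
    storage cost the description attributes to the prior-art method.
    """

    def __init__(self, window_len: int) -> None:
        self.history = deque(maxlen=window_len)  # oldest value drops out automatically

    def push(self, q: float) -> float:
        self.history.append(q)
        return min(self.history)
```

A full implementation would hold one such history per frequency bin, so the memory needed grows with both K and L.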
Next, in step S106, the bias compensation factor B(L, n, k) is obtained from the variance of the smoothed noisy speech signal power of the current frame, the noise power estimate of the previous frame, and the number of frames L. It can be expressed as:
$$B(L,n,k) \approx 1 + (L-1) \times \frac{2 \times [1 - M(L)]}{S(n,k) - 2 \times M(L)} \tag{4}$$
where M(L) is a function of L, given by:
$$M(L) \approx 0.025 + 0.23\,(1 + \log(L)^{0.8}) + 2.7 \cdot 10^{-6} L^{-2} - 1.14 \cdot 10^{-3} L - 7 \cdot 10^{-2} \tag{5}$$
S(n, k) is the number of degrees of freedom of the smoothed noisy speech signal of each frequency bin of the current frame. It is calculated from the variance var(Q(n, k)) of the smoothed noisy speech signal power of the current frame and the noise power estimate D(n-1, k) of the previous frame:
$$\frac{1}{S(n,k)} = \frac{\operatorname{var}(Q(n,k))}{2 D(n-1,k)} \tag{6}$$
Finally, in step S107, the noise power estimate D(n, k) of each frequency bin of the current frame is obtained from the bias compensation factor B(L, n, k) and the minimum $Q_{\min}(n,k)$ of the smoothed noisy speech signal power:
$$D(n,k) = B(L,n,k) \times Q_{\min}(n,k) \tag{7}$$
To address the slow tracking of non-stationary noise by existing noise power estimation methods, the invention proposes a noise power estimation method based on normalized autocorrelation and the posterior SNR. The invention uses the speech presence probability and the posterior SNR of each frequency bin to estimate the noise power of each frequency bin, which increases the tracking speed for non-stationary noise and shortens the time needed to estimate its power. Fig. 2 shows the noise power estimation flow according to one embodiment of the invention.
As shown in Fig. 2, the input noisy speech is first divided into frames in step S201, and part of the noisy speech signal samples of the previous frame together with all of the noisy speech signal samples of the current frame form the current analysis frame.
Next, in step S202, the energy and the maximum normalized autocorrelation value of the current analysis frame are calculated from the current analysis frame obtained in step S201. In this embodiment, the maximum normalized autocorrelation value is the ratio of the maximum correlation value of the current analysis frame to the energy of the current analysis frame.
In the invention, the speech presence probability of each frequency bin is obtained from the maximum normalized autocorrelation value and the posterior SNR of each frequency bin. The posterior SNR of each frequency bin in the current analysis frame is calculated in step S203. In this embodiment, it is the ratio of the noisy speech power of each frequency bin in the current analysis frame to the noise power estimate of the corresponding frequency bin in the previous analysis frame.
In step S204, the speech presence probability of each frequency bin in the current analysis frame is calculated from the maximum normalized autocorrelation value of the current analysis frame obtained in step S202 and the posterior SNR of each frequency bin in the current analysis frame obtained in step S203. In this embodiment, based on repeated experiments, the speech presence probability of each frequency bin is calculated with a different formula for different ranges of the maximum normalized autocorrelation value.
Finally, the noise power estimate of each frequency bin in the current analysis frame is calculated from the noise power estimate of each frequency bin of the previous analysis frame, the noisy speech power of each frequency bin in the current analysis frame, and the speech presence probability of each frequency bin in the current analysis frame obtained in step S204.
Fig. 3 shows a detailed flowchart of the noise power estimation method of Fig. 2.
As shown in Fig. 3, in step S201 the input noisy speech is divided into frames to obtain a series of analysis frames. In this embodiment, the input noisy speech is sampled at a sampling frequency of 8 kHz to obtain noisy speech signal samples. The samples are then divided into frames in chronological order, and the frame number of the current frame is denoted n. In this embodiment, each frame is 20 ms long and therefore contains 160 noisy speech signal samples, although the invention is not limited to this frame length. In this embodiment, the noisy speech signal samples in the current analysis frame consist of the last 80 noisy speech signal samples of the previous frame and all of the noisy speech signal samples of the current frame. It should be noted that, in other embodiments of the invention, the number of samples taken from the previous frame and the number of samples taken from the current frame may be set to other reasonable values; the invention is not limited in this respect.
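By way of illustration only, the framing of this embodiment (160-sample frames at 8 kHz, each analysis frame formed from the last 80 samples of the previous frame plus the whole current frame, 240 samples in total) can be sketched in Python as follows. The function name `analysis_frames` is hypothetical, and padding the history of the first frame with zeros is an assumption of this sketch; the original text does not say how the first frame is handled.

```python
from typing import Iterator, List, Sequence

FRAME_LEN = 160  # 20 ms at 8 kHz, as in this embodiment
OVERLAP = 80     # trailing samples of the previous frame that are reused


def analysis_frames(samples: Sequence[float],
                    frame_len: int = FRAME_LEN,
                    overlap: int = OVERLAP) -> Iterator[List[float]]:
    """Yield analysis frames of length overlap + frame_len.

    Trailing samples that do not fill a whole frame are dropped.
    """
    prev_tail = [0.0] * overlap  # assumption: zero history before the first frame
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = list(samples[start:start + frame_len])
        yield prev_tail + frame
        prev_tail = frame[frame_len - overlap:]  # last `overlap` samples
```

With the default values each yielded frame holds 240 samples, which matches the total I = 240 used for the autocorrelation in step S202.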
Next, in step S202, the maximum normalized autocorrelation value of the current analysis frame is calculated from the noisy speech signal samples in the current analysis frame. In this embodiment, in step S202a, a band-pass filter with passband $[f_1, f_2]$ first filters the noisy speech signal samples in the current analysis frame to obtain band-pass speech signal samples x(n, i), denoted the signal samples of the current analysis frame, where i = 0, 1, ..., I-1 and I is the total number of samples in the current analysis frame. In this embodiment I is 240, and the passband $[f_1, f_2]$ of the band-pass filter is [250, 800], although the invention is not limited to these values.
After the signal samples of the current analysis frame have been obtained, the energy and the correlation values of the current analysis frame are calculated in step S202b. In this embodiment, the energy of the current analysis frame is calculated with the following formula:
$$r(n,0) = \sum_{i=0}^{I-1} \left[ x(n,i) \times x(n,i) \right] \tag{8}$$
where r(n, 0) denotes the energy of the n-th analysis frame, that is, the energy of the current analysis frame; x(n, i) denotes the i-th signal sample of the n-th analysis frame; and I is the total number of signal samples in the current analysis frame.
Next, all possible offsets τ, that is, all possible values of the pitch period, are traversed, and the correlation value of the current analysis frame is calculated for each offset. In this embodiment, the pitch period preferably ranges over [20, 120] samples, since the pitch period is very unlikely to fall outside this range. The correlation value for each offset is calculated with the following formula:
$$r(n,\tau) = \sum_{i=0}^{I-1-\tau} \left[ x(n,i) \times x(n,i+\tau) \right], \quad \tau = 20, 21, \ldots, 120 \tag{9}$$
where x(n, i) denotes the i-th signal sample in the n-th analysis frame, x(n, i+τ) denotes the (i+τ)-th signal sample in the n-th analysis frame, τ denotes the sample offset, which takes values within the pitch period range, I denotes the total number of signal samples in the current analysis frame, and r(n, τ) denotes the correlation value of the n-th analysis frame at offset τ.
After the correlation value for each offset has been obtained, the maximum correlation value r(n, τ') is selected in step S202c, and the maximum normalized autocorrelation value $\bar{r}(n)$ of the current analysis frame is obtained as the ratio of this maximum correlation value r(n, τ') to the energy r(n, 0) of the current analysis frame:
$$\bar{r}(n) = \frac{r(n,\tau')}{r(n,0)} \tag{10}$$
where τ' denotes the offset at which the correlation value of the n-th analysis frame reaches its maximum.
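By way of illustration only, equations (8) to (10) can be sketched in Python as follows. The sketch takes the already band-pass-filtered samples x(n, i) as input and does not implement the filter of step S202a. The function name `max_normalized_autocorrelation` is hypothetical, and returning 0 for an all-zero frame is an assumption of this sketch, since the original text does not cover that case.

```python
from typing import Sequence


def max_normalized_autocorrelation(x: Sequence[float],
                                   min_lag: int = 20,
                                   max_lag: int = 120) -> float:
    """Return r(n, tau') / r(n, 0) for one analysis frame.

    x holds the band-pass-filtered samples x(n, i), i = 0..I-1.
    """
    total = len(x)
    energy = sum(v * v for v in x)  # r(n, 0), Eq. (8)
    if energy == 0.0:
        return 0.0  # assumption: an all-zero frame is treated as uncorrelated
    best = max(
        sum(x[i] * x[i + lag] for i in range(total - lag))  # r(n, tau), Eq. (9)
        for lag in range(min_lag, max_lag + 1)
    )
    return best / energy  # Eq. (10)
```

The direct double loop costs on the order of I times the number of lags per frame, which is modest for I = 240 and 101 lags; an FFT-based autocorrelation would be an alternative design, not one described in the original text.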
In step S203, the noisy speech power and the posterior SNR of each frequency bin of the current analysis frame are calculated from the signal samples in the current analysis frame. First, in step S203a, a K-point discrete Fourier transform is applied to the current analysis frame to obtain the noisy speech signal power Y(n, k) of each frequency bin of the current analysis frame, where k = 0, 1, ..., K-1.
Next, in step S203b, the posterior SNR $\partial(n,k)$ of each frequency bin in the current analysis frame is obtained as the ratio of the noisy speech power Y(n, k) of each frequency bin in the current analysis frame, obtained in step S203a, to the noise power estimate D(n-1, k) of the corresponding frequency bin in the previous analysis frame:
$$\partial(n,k) = \frac{Y(n,k)}{D(n-1,k)}, \quad k = 0, 1, \ldots, K-1 \tag{11}$$
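By way of illustration only, steps S203a and S203b can be sketched in Python as follows. The function names `power_spectrum` and `posterior_snr` are hypothetical. The sketch assumes K is at least the frame length (the frame is implicitly zero-padded), applies no window and no normalization, and guards the division with a small floor; none of these choices is specified in the original text.

```python
import cmath
from typing import List, Sequence


def power_spectrum(frame: Sequence[float], num_bins: int) -> List[float]:
    """Y(n, k) = |DFT_K(frame)[k]|^2 for k = 0..K-1, by direct evaluation.

    Assumes num_bins >= len(frame), i.e. implicit zero-padding to K points.
    """
    powers = []
    for k in range(num_bins):
        acc = sum(v * cmath.exp(-2j * cmath.pi * k * i / num_bins)
                  for i, v in enumerate(frame))
        powers.append(abs(acc) ** 2)
    return powers


def posterior_snr(noisy_power: Sequence[float],
                  prev_noise: Sequence[float],
                  floor: float = 1e-12) -> List[float]:
    """Eq. (11): Y(n, k) / D(n-1, k) per bin.

    `floor` is an assumption of this sketch to avoid division by zero.
    """
    return [y / max(d, floor) for y, d in zip(noisy_power, prev_noise)]
```

The direct evaluation is quadratic in K and is used here only to keep the sketch dependency-free; a practical implementation would use an FFT routine.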
In step S204, the speech presence probability p(n, k) of each frequency bin in the current analysis frame is calculated from the maximum normalized autocorrelation value $\bar{r}(n)$ of the current analysis frame obtained in step S202c and the posterior SNR $\partial(n,k)$ of each frequency bin in the current analysis frame obtained in step S203b, with the following formula:
$$p(n,k) = \begin{cases} \epsilon_1 \times \exp[\partial(n,k)] / \{\alpha + \exp[\partial(n,k)]\}, & \bar{r}(n) \ge r_1 \\ \epsilon_2 \times \exp[\partial(n,k)] / \{\alpha + \exp[\partial(n,k)]\}, & r_1 > \bar{r}(n) > r_2 \\ \epsilon_3 \times \exp[\partial(n,k)] / \{\alpha + \exp[\partial(n,k)]\}, & \bar{r}(n) \le r_2 \end{cases} \tag{12}$$
where p(n, k) denotes the speech presence probability of the k-th frequency bin in the n-th analysis frame, α is a constant, $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$ are weighting coefficients, exp denotes the exponential function, and $r_1$ and $r_2$ are thresholds.
In this embodiment, to avoid abrupt changes in the noise power estimate, $\epsilon_1$ and $\epsilon_3$ are set to 1.0 and 0.6 respectively, based on repeated experiments. Because a normalized autocorrelation value above 0.4 generally indicates speech and a value below 0.2 generally indicates non-speech, $r_1$ and $r_2$ are set to 0.4 and 0.2 respectively. To update the noise power more quickly, α is set to 1. From these choices it follows that $\epsilon_2 = 2 \times \bar{r}(n) + 0.2$, which varies linearly from $\epsilon_3 = 0.6$ at $\bar{r}(n) = 0.2$ to $\epsilon_1 = 1.0$ at $\bar{r}(n) = 0.4$. The speech presence probability p(n, k) of each frequency bin in the current analysis frame is then given by:
$$p(n,k) = \begin{cases} 1.0 \times \exp(\partial(n,k)) / (1 + \exp(\partial(n,k))), & \bar{r}(n) \ge 0.4 \\ (2 \times \bar{r}(n) + 0.2) \times \exp(\partial(n,k)) / (1 + \exp(\partial(n,k))), & 0.4 > \bar{r}(n) > 0.2 \\ 0.6 \times \exp(\partial(n,k)) / (1 + \exp(\partial(n,k))), & \bar{r}(n) \le 0.2 \end{cases} \tag{13}$$
For non-stationary noise, when a suddenly increasing noise signal appears in the input speech signal, the speech presence probability of each frequency bin in the current analysis frame decreases, which speeds up the tracking of the non-stationary noise.
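By way of illustration only, equation (13) can be sketched in Python as follows. The function name `speech_presence_probability` and its parameter names are hypothetical. For the middle range the sketch interpolates linearly between the two weighting coefficients, which with the embodiment's values reduces to 2 × r̄(n) + 0.2; applying the same interpolation to other threshold values is an assumption of this sketch and is not stated in the original text.

```python
import math


def speech_presence_probability(snr_post: float, r_bar: float,
                                r1: float = 0.4, r2: float = 0.2,
                                eps1: float = 1.0, eps3: float = 0.6,
                                alpha: float = 1.0) -> float:
    """Speech presence probability p(n, k) for one frequency bin.

    snr_post is the posterior SNR of the bin (non-negative);
    r_bar is the maximum normalized autocorrelation of the frame.
    """
    if r_bar >= r1:
        eps = eps1
    elif r_bar > r2:
        # Linear interpolation; equals 2 * r_bar + 0.2 for the default values.
        eps = eps3 + (eps1 - eps3) * (r_bar - r2) / (r1 - r2)
    else:
        eps = eps3
    # exp(g) / (alpha + exp(g)) rewritten as 1 / (1 + alpha * exp(-g))
    # so that a large posterior SNR cannot overflow.
    return eps / (1.0 + alpha * math.exp(-snr_post))
```

Because the posterior SNR is a ratio of powers and therefore non-negative, the logistic term is at least 0.5 when α = 1, so with these values p(n, k) stays between 0.3 and 1.0.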
Finally, in step S205, the noise power estimate of each frequency bin in the current analysis frame is obtained as a weighted combination of the noise power estimate of each frequency bin in the previous analysis frame and the noisy speech power of each frequency bin in the current analysis frame, weighted according to the speech presence probability of each frequency bin in the current analysis frame.
First, in step S205a, the noise power update value U(n, k) of each frequency bin in the current analysis frame is calculated from the noisy speech power Y(n, k) of each frequency bin in the current analysis frame obtained in step S203a, the speech presence probability p(n, k), and the noise power estimate D(n-1, k) of each frequency bin in the previous analysis frame:
$$U(n,k) = p(n,k) \times D(n-1,k) + [1 - p(n,k)] \times Y(n,k), \quad k = 0, 1, \ldots, K-1 \tag{14}$$
Next, in step S205b, the noise power estimate D(n, k) of each frequency bin of the current analysis frame is obtained as a weighted combination of the noise power update value U(n, k) of each frequency bin in the current analysis frame and the noise power estimate D(n-1, k) of each frequency bin of the previous analysis frame:
$$D(n,k) = \beta \times D(n-1,k) + (1-\beta) \times U(n,k), \quad k = 0, 1, \ldots, K-1 \tag{15}$$
where β is a weighting coefficient. In this embodiment, β is set to 0.8 based on repeated experiments.
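By way of illustration only, equations (14) and (15) can be sketched in Python as follows. The function name `update_noise` is hypothetical; the three input sequences are assumed to hold one value per frequency bin.

```python
from typing import List, Sequence


def update_noise(prev_noise: Sequence[float],
                 noisy_power: Sequence[float],
                 presence: Sequence[float],
                 beta: float = 0.8) -> List[float]:
    """Return D(n, k) for every bin from D(n-1, k), Y(n, k) and p(n, k)."""
    estimates = []
    for d_prev, y, p in zip(prev_noise, noisy_power, presence):
        u = p * d_prev + (1.0 - p) * y                       # Eq. (14)
        estimates.append(beta * d_prev + (1.0 - beta) * u)   # Eq. (15)
    return estimates
```

Substituting equation (14) into equation (15) gives D(n, k) = D(n-1, k) + (1-β) × [1-p(n, k)] × [Y(n, k) - D(n-1, k)], so the estimate moves toward the current noisy speech power with a step size of (1-β) × [1-p(n, k)], and only the previous frame's estimate has to be stored.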
Because noise power estimation is ultimately used for noise suppression, when used for noise suppression the invention further comprises step S206, in which the posterior SNR $\partial(n,k)$ and the prior SNR η(n, k) of each frequency bin of the current analysis frame are updated and the final gain coefficient G(n, k) is calculated for noise suppression. In this embodiment, the posterior SNR is calculated with the following formula:
$$\partial(n,k) = \frac{Y(n,k)}{D(n,k)} \tag{16}$$
For stationary noise, the invention estimates noise power about as well as the prior art; for non-stationary noise, it tracks changes in the noise quickly. Its average tracking time over a variety of noises is 0.2 s, whereas the average tracking time of the prior art over a variety of noises is 0.5 s to 1.5 s, so the invention greatly increases the noise tracking speed. Moreover, the invention does not need to store the noise power of each frequency bin over multiple past frames; it needs only the data of the current frame and the previous frame. The invention is therefore computationally simple, requires little storage, and is practical and effective for noise power estimation.
Although the embodiments of the invention are disclosed as above, the content described is only an embodiment adopted to facilitate understanding of the invention and is not intended to limit it. Any person skilled in the art to which the invention belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention, but the scope of patent protection of the invention shall still be as defined by the appended claims.

Claims (10)

1. a noise power estimation method, is characterized in that, said method comprising the steps of:
Noisy speech framing step, carries out sampling to input noisy speech according to the sample frequency that presets and obtains input tape noisy speech signal sampling point, and carry out framing to described sampling point in chronological order, and obtain a series of analysis frame;
Maximum normalized autocorrelation value calculation procedure, calculates the maximum normalized autocorrelation value of present analysis frame according to the Noisy Speech Signal sampling point in present analysis frame;
Posteriori SNR calculation procedure, calculates noisy speech power and the posteriori SNR of each frequency in present analysis frame respectively according to the Noisy Speech Signal sampling point in described present analysis frame;
There is probability calculation step in each frequency voice, the voice calculating each frequency in described present analysis frame according to the posteriori SNR of each frequency in the maximum normalized autocorrelation value of described present analysis frame and present analysis frame exist probability;
, there is according to the voice of each frequency in the noisy speech power of each frequency in the noise power estimation value of frequency each in last analysis frame, described present analysis frame and present analysis frame the noise power estimation value that probability calculation obtains each frequency in described present analysis frame in each frequency noise power estimation value calculation procedure.
2. the method for claim 1, is characterized in that, the Noisy Speech Signal sampling point of described present analysis frame comprises the part Noisy Speech Signal sampling point of former frame and whole Noisy Speech Signal sampling points of present frame.
3. the method for claim 1, is characterized in that, described maximum normalized autocorrelation value calculation procedure comprises the following steps:
S202a, utilize one to preset the bandpass filter of cutoff frequency to carry out filtering to the Noisy Speech Signal sampling point in described present analysis frame and obtain being with logical voice signal sampling point, be designated as the signal sampling point of present analysis frame;
S202b, calculate present analysis frame energy and correlation according to the signal sampling point of described present analysis frame;
S202c, according to described present analysis frame energy and correlation value calculation the maximum normalized autocorrelation value of present analysis frame.
4. The method of claim 3, characterized in that the correlation value of the current analysis frame is calculated according to the following formula:
$$r(n,\tau) = \sum_{i=0}^{I-1-\tau} x(n,i)\, x(n,i+\tau)$$
where x(n,i) denotes the i-th signal sample in the n-th analysis frame, x(n,i+τ) denotes the (i+τ)-th signal sample in the n-th analysis frame, τ denotes the sample offset (lag), which takes values within the pitch-period range, I denotes the total number of signal samples in the current analysis frame, and r(n,τ) denotes the correlation value of the n-th analysis frame at offset τ.
5. the method for claim 1, is characterized in that, described posteriori SNR calculation procedure comprises the following steps:
S203a, the noisy speech power that discrete Fourier transformation obtains each frequency in described present analysis frame is carried out to the signal sampling point of present analysis frame;
S203b, calculate the posteriori SNR of each frequency in present analysis frame according to the noise power estimation value of each frequency corresponding in the noisy speech power of each frequency in described present analysis frame and last analysis frame.
6. the method for claim 1, is characterized in that, there is probability according to the voice of each frequency in following formulae discovery present analysis frame:
p ( n , k ) = ϵ 1 × exp [ ∂ ( n , k ) ] / { α + exp [ ∂ ( n , k ) ] } r ‾ ( n ) ≥ r 1 ϵ 2 × exp [ ∂ ( n , k ) ] / { α + exp [ ∂ ( n , k ) ] } r 1 > r ‾ ( n ) > r 2 ϵ 3 × exp [ ∂ ( n , k ) ] / { α + exp [ ∂ ( n , k ) ] } r ‾ ( n ) ≤ r 2
Wherein, there is probability in the voice of the kth frequency that p (n, k) represents in the n-th analysis frame, represent the posteriori SNR of the kth frequency in the n-th analysis frame, α is constant, and ε 1, ε 2, ε 3 represent weighting coefficient, and exp represents and asks index, and r1, r2 represent threshold value, represent the maximum normalized autocorrelation value of the n-th analysis frame.
7. the method for claim 1, is characterized in that, described each frequency noise power estimation value calculation procedure comprises the following steps:
S205a, the noise power estimation value that to there is in probability and last analysis frame corresponding each frequency according to the voice of each frequency in the noisy speech power of each frequency in described present analysis frame, described present analysis frame calculate the noise power updated value of present analysis frame;
S205b, obtain the noise power estimation value of each frequency in described present analysis frame according to the noise power estimation value weighting of each frequency corresponding in the noise power updated value of each frequency in described present analysis frame and last analysis frame.
8. The method of claim 7, characterized in that the noise power update value of each frequency bin in the current analysis frame is calculated according to the following formula:
$$U(n,k) = p(n,k) \times D(n-1,k) + [1 - p(n,k)] \times Y(n,k), \quad k = 0, 1, \ldots, K-1$$
where U(n,k) denotes the noise power update value of the k-th frequency bin in the n-th analysis frame, p(n,k) denotes the speech presence probability of the k-th frequency bin in the n-th analysis frame, D(n-1,k) denotes the noise power estimate of the k-th frequency bin in the (n-1)-th analysis frame, Y(n,k) denotes the noisy speech power of the k-th frequency bin in the n-th analysis frame, and K denotes the total number of frequency bins in the n-th analysis frame.
9. The method of claim 7, characterized in that the noise power estimate of each frequency bin in the current analysis frame is calculated according to the following formula:
$$D(n,k) = \beta \times D(n-1,k) + (1 - \beta) \times U(n,k), \quad k = 0, 1, \ldots, K-1$$
where D(n,k) denotes the noise power estimate of the k-th frequency bin in the n-th analysis frame, U(n,k) denotes the noise power update value of the k-th frequency bin in the n-th analysis frame, β denotes a preset weighting coefficient, and K denotes the total number of frequency bins in the n-th analysis frame.
10. the method for claim 1, is characterized in that, described method also comprises the posteriori SNR and prior weight that upgrade each frequency in described present analysis frame and calculates final gain coefficient step.
CN201310585440.3A 2013-11-19 2013-11-19 A kind of noise power estimation method Expired - Fee Related CN103646648B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310585440.3A CN103646648B (en) 2013-11-19 2013-11-19 A kind of noise power estimation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310585440.3A CN103646648B (en) 2013-11-19 2013-11-19 A kind of noise power estimation method

Publications (2)

Publication Number Publication Date
CN103646648A CN103646648A (en) 2014-03-19
CN103646648B true CN103646648B (en) 2016-03-23

Family

ID=50251850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310585440.3A Expired - Fee Related CN103646648B (en) 2013-11-19 2013-11-19 A kind of noise power estimation method

Country Status (1)

Country Link
CN (1) CN103646648B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105788606A (en) * 2016-04-03 2016-07-20 武汉市康利得科技有限公司 Noise estimation method based on recursive least tracking for sound pickup devices

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105225673B (en) * 2014-06-09 2020-12-04 杜比实验室特许公司 Methods, systems, and media for noise level estimation
CN106161751B (en) * 2015-04-14 2019-07-19 电信科学技术研究院 A kind of noise suppressing method and device
CN106997768B (en) * 2016-01-25 2019-12-10 电信科学技术研究院 Method and device for calculating voice occurrence probability and electronic equipment
CN106297818B (en) * 2016-09-12 2019-09-13 广州酷狗计算机科技有限公司 It is a kind of to obtain the method and apparatus for removing noisy speech signal
CN108074582B (en) * 2016-11-10 2021-08-06 电信科学技术研究院 Noise suppression signal-to-noise ratio estimation method and user terminal
WO2018161429A1 (en) * 2017-03-07 2018-09-13 华为技术有限公司 Noise detection method, and terminal apparatus
CN109643554B (en) * 2018-11-28 2023-07-21 深圳市汇顶科技股份有限公司 Adaptive voice enhancement method and electronic equipment
CN110827858B (en) * 2019-11-26 2022-06-10 思必驰科技股份有限公司 Voice endpoint detection method and system
CN113611319B (en) * 2021-04-07 2023-09-12 珠海市杰理科技股份有限公司 Wind noise suppression method, device, equipment and system based on voice component
CN113782011B (en) * 2021-08-26 2024-04-09 清华大学苏州汽车研究院(相城) Training method of frequency band gain model and voice noise reduction method for vehicle-mounted scene

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5012519A (en) * 1987-12-25 1991-04-30 The Dsp Group, Inc. Noise reduction system
CN101814290A (en) * 2009-02-25 2010-08-25 三星电子株式会社 Method for enhancing robustness of voice recognition system
CN103000174A (en) * 2012-11-26 2013-03-27 河海大学 Feature compensation method based on rapid noise estimation in speech recognition system
CN103295582A (en) * 2012-03-02 2013-09-11 联芯科技有限公司 Noise suppression method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5012519A (en) * 1987-12-25 1991-04-30 The Dsp Group, Inc. Noise reduction system
CN101814290A (en) * 2009-02-25 2010-08-25 三星电子株式会社 Method for enhancing robustness of voice recognition system
CN103295582A (en) * 2012-03-02 2013-09-11 联芯科技有限公司 Noise suppression method and system
CN103000174A (en) * 2012-11-26 2013-03-27 河海大学 Feature compensation method based on rapid noise estimation in speech recognition system

Also Published As

Publication number Publication date
CN103646648A (en) 2014-03-19

Similar Documents

Publication Publication Date Title
CN103646648B (en) A kind of noise power estimation method
CN103489446B (en) Based on the twitter identification method that adaptive energy detects under complex environment
US10134417B2 (en) Method and apparatus for detecting a voice activity in an input audio signal
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
CN101976566B (en) Voice enhancement method and device using same
Hirsch et al. Noise estimation techniques for robust speech recognition
CN101894563B (en) Voice enhancing method
CN103594094B (en) Adaptive spectra subtraction real-time voice strengthens
CN106885971B (en) Intelligent background noise reduction method for cable fault detection pointing instrument
CN105513605A (en) Voice enhancement system and method for cellphone microphone
CN104409078A (en) Abnormal noise detection and recognition system
US9754608B2 (en) Noise estimation apparatus, noise estimation method, noise estimation program, and recording medium
CN102982801A (en) Phonetic feature extracting method for robust voice recognition
CN106340292A (en) Voice enhancement method based on continuous noise estimation
CN101510426A (en) Method and system for eliminating noise
CN107331386B (en) Audio signal endpoint detection method and device, processing system and computer equipment
CN110265065B (en) Method for constructing voice endpoint detection model and voice endpoint detection system
KR101892733B1 (en) Voice recognition apparatus based on cepstrum feature vector and method thereof
CN106875938A (en) A kind of improved nonlinear adaptive sound end detecting method
CN102930870A (en) Bird voice recognition method using anti-noise power normalization cepstrum coefficients (APNCC)
CN102263601A (en) Multi-signal detecting method for broadband
CN105513614A (en) Voice activation detection method based on noise power spectrum density Gamma distribution statistical model
CN110942766A (en) Audio event detection method, system, mobile terminal and storage medium
CN108010536A (en) Echo cancel method, device, system and storage medium
CN109377982A (en) A kind of efficient voice acquisition methods

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160323

Termination date: 20161119
