CN107045874A - A correlation-based nonlinear speech enhancement method

Info

Publication number: CN107045874A
Application number: CN201610079921.0A
Authority: CN
Inventors: 韩翀蛟; 高可攀; 羊开云; 徐晓峰; 李夏宾
Original assignee: GRANDSTREAM NETWORKS Inc; SHENZHEN GRANDSTREAM NETWORKS TECHNOLOGY Co Ltd
Current assignee: GRANDSTREAM NETWORKS Inc; SHENZHEN GRANDSTREAM NETWORKS TECHNOLOGY Co Ltd
Priority date: 2016-02-05
Filing date: 2016-02-05
Publication date: 2017-08-15
Anticipated expiration: 2036-02-05
Also published as: CN107045874B

Abstract

The invention discloses a correlation-based nonlinear speech enhancement method, comprising: Step a: performing a fast Fourier transform (FFT) on the pre-processed noisy speech data and on the estimated noise data to obtain the spectrum of the noisy speech frame and the spectrum of the estimated noise frame; Step b: signal-to-noise ratio (SNR) and attenuation gain calculation, which yields the attenuation gain; Step c: correlation calculation between noisy speech and noise, which yields the cross-correlation function of the noisy speech frame spectrum and the estimated noise frame spectrum; Step d: nonlinear attenuation gain calculation, which yields the nonlinear attenuation gain; Step e: speech enhancement processing, in which the attenuation gain from step b and the nonlinear attenuation gain from step d are applied jointly to the spectrum of the noisy speech frame to obtain the clean speech signal spectrum. The technical scheme removes the noise components of a noisy speech signal more thoroughly and allows a flexible trade-off between noise removal and preservation of speech quality according to the application scenario.

Description

A correlation-based nonlinear speech enhancement method

Technical field

The invention belongs to the technical field of voice communication, and in particular relates to speech enhancement technology.

Background

In voice communication, the speech of the talker at the transmitting end is corrupted by noise from the surrounding environment, such as office air-conditioning noise or the sound of rotating fans in a computer. The speech received at the receiving end is therefore no longer the talker's clean speech but noisy speech carrying various kinds of interference, which reduces intelligibility for the listener. In many settings, however, and especially in teleconferencing, intelligibility and speech quality must be well maintained, so the speech needs to be enhanced. Speech enhancement technology has developed rapidly in recent years.

One class of existing speech enhancement methods is based on spectral subtraction: the estimated noise spectrum is subtracted from the noisy speech spectrum to obtain the enhanced speech spectrum. The algorithmic complexity is low and the computational load is small, but a considerable amount of residual noise remains in the speech signal after enhancement. The second class is based on adaptive filtering algorithms; such algorithms cannot fundamentally resolve the conflict between convergence speed and steady-state error, and they perform poorly in low-SNR environments. The third class is based on matrix decomposition or model learning; such methods remove non-stationary burst noise more effectively, but matrix decomposition and model training make the implementation complex, and the computational load is much higher than that of the first two classes. In view of the above, the present invention discloses a new speech enhancement technique that overcomes these shortcomings of the prior art.

Summary of the invention

The object of the present invention is to provide a correlation-based nonlinear speech enhancement method that solves problems such as incomplete noise removal while preserving speech quality, and that achieves good speech enhancement even in low-SNR scenarios.

To achieve this object, the technical scheme of the invention is as follows. A correlation-based nonlinear speech enhancement method mainly comprises: Step a: performing a fast Fourier transform on the pre-processed noisy speech data and on the estimated noise data to obtain the spectrum of the noisy speech frame and the spectrum of the estimated noise frame; Step b: SNR and attenuation gain calculation, which yields the attenuation gain; Step c: correlation calculation between noisy speech and noise, which yields the cross-correlation function of the noisy speech frame spectrum and the estimated noise frame spectrum; Step d: nonlinear attenuation gain calculation, which yields the nonlinear attenuation gain; Step e: speech enhancement processing, in which the attenuation gain and the nonlinear attenuation gain from step d are applied jointly to the spectrum of the noisy speech frame to obtain the clean speech signal spectrum.

Preferably, step e is followed by a step f, an inverse fast Fourier transform: the standard inverse FFT is applied to the speech signal spectrum to convert the signal from the frequency domain back to the time domain.

Preferably, step b further comprises the following steps: Step b1: compute the a posteriori SNR; Step b2: compute the SNR update coefficient, which uses the noisy speech data of the previous frame and a parameter whose value can be chosen according to the application scenario; Step b3: compute the a priori SNR; Step b4: compute the a priori SNR ratio; Step b5: compute the optimal attenuation gain using a formula related to the hypergeometric function; Step b6: compute the lower limit of the attenuation gain; Step b7: compute the attenuation gain.

Preferably, the parameter in step b2 typically lies in the range [0.05, 0.30] and may be set to 0.25.

Preferably, the optimal attenuation gain in step b5 is computed from a formula involving the standard gamma function, the exponential function with the natural constant e as its base, and the Bessel functions of order 0 and order 1.

Preferably, the lower limit of the attenuation gain in step b6 is computed from a formula involving the exponential function with the natural constant e as its base.

Preferably, the attenuation gain is computed with a weighting coefficient whose value can be chosen according to the application scenario; its typical range is [0.60, 0.90].

Preferably, the nonlinear attenuation gain is obtained through a min operation, where min is the usual smaller-value operation, i.e.

$$\min(a, b) = \begin{cases} a, & a \le b \\ b, & a > b \end{cases}$$

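Preferably, the nonlinear attenuation gain is set to 1.0 when the mean correlation is below the correlation threshold, and is obtained through the min operation when the mean correlation is above the threshold.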

The present invention provides a correlation-based nonlinear speech enhancement method that overcomes the shortcomings of prior-art methods at a low computational cost. The technical scheme removes the noise components of a noisy speech signal more thoroughly and allows a flexible trade-off between noise removal and preservation of speech quality according to the application scenario. It also achieves good speech enhancement in low-SNR scenarios, and its computational load is not significantly higher than that of existing algorithms, so it is easy to implement and suitable for real-time transmission.

Brief description of the drawings

Fig. 1 is a flow chart of the nonlinear speech enhancement method in a specific embodiment of the invention.

Detailed description of the embodiments

To make the object, technical scheme, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawing and the embodiments. The specific embodiments described here serve only to explain the invention and are not intended to limit it.

The basic principle of the invention is a nonlinear speech enhancement method. The method uses the frequency-domain information of the noisy speech signal and of the reference noise signal to compute the SNR, and uses the SNR to obtain an attenuation gain for each frequency band. It then computes the correlation between the noisy speech signal and the reference noise signal and uses this correlation to adjust the attenuation gain nonlinearly. Finally, the adjusted attenuation gain is multiplied by the noisy speech spectrum to obtain clean speech with the noise interference removed.

The specific implementation steps of the method are described further below with reference to Fig. 1.

The invention focuses on the speech enhancement processing itself, on the premise that both the noisy speech and the estimated noise are already known; the procedure for estimating the noise is not described.

Step 1: Speech pre-processing

The noisy speech and the estimated noise are divided into frames; windowing and framing yield the noisy speech data to be enhanced and the estimated noise data.

The window function used in this embodiment is the Hamming window. Windowing and framing is a common and necessary step in digital signal processing: a digital signal processing unit can read in and process only a limited number of samples at a time, so the window function is used to split the digital signal into frames of the size that can be read and processed in one pass.
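The framing and windowing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `frame_and_window`, the frame length of 320 samples, and the hop of 160 samples are assumptions chosen for the example; only the use of a Hamming window comes from the text.

```python
import numpy as np

def frame_and_window(signal, frame_len, hop):
    """Split a 1-D signal into (possibly overlapping) frames and apply a Hamming window.

    frame_len and hop are illustrative parameters, not values fixed by the patent.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    return frames * window  # window is broadcast over every frame

# One second of synthetic "noisy speech" at 16 kHz (assumed test signal).
y = np.random.default_rng(0).standard_normal(16000)
frames = frame_and_window(y, frame_len=320, hop=160)
```

The same function would be applied to the estimated noise so that both signals are framed identically.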

Step 2: Fast Fourier transform

The standard fast Fourier transform is applied to the windowed noisy speech and to the windowed estimated noise to obtain the spectrum of the noisy speech frame and the spectrum of the estimated noise frame.

Here the transform is the standard fast Fourier transform.
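As a hedged sketch of this step, the frame spectra can be obtained with NumPy's real-input FFT. The helper name `frame_spectra` and the 1 kHz test tone are assumptions for illustration; the patent only states that a standard FFT is applied to each windowed frame.

```python
import numpy as np

def frame_spectra(frames):
    """Return the one-sided spectrum of each windowed frame (one row per frame)."""
    return np.fft.rfft(np.asarray(frames, dtype=float), axis=-1)

# Assumed example: one 320-sample Hamming-windowed frame of a 1 kHz tone at 16 kHz.
fs, frame_len = 16000, 320
n = np.arange(frame_len)
frame = np.hamming(frame_len) * np.sin(2 * np.pi * 1000 * n / fs)
Y = frame_spectra(frame[np.newaxis, :])
peak_bin = int(np.argmax(np.abs(Y[0])))  # bin spacing is fs / frame_len = 50 Hz
```

`np.fft.rfft` keeps only the non-redundant half of the spectrum, which is sufficient for real-valued audio frames.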

Step 3: SNR and attenuation gain calculation

The SNR and attenuation gain estimation in this step draws on the classical method proposed by Y. Ephraim and D. Malah in "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109-1121, 1984, with improvements and simplifications to the algorithm. Only a brief outline of the calculation is given here; for details, refer to the original paper.

1) First compute the a posteriori SNR.

2) Then compute the SNR update coefficient.

The update coefficient uses the noisy speech data of the previous frame and a parameter whose value can be chosen according to the application scenario. The typical range of this parameter is [0.05, 0.30]; in this example it is set to 0.25.

3) Compute the a priori SNR.

This step forms a weighted sum using the a posteriori SNR computed in 1) and the update coefficient computed in 2) to obtain the estimated a priori SNR.

4) Using the a priori SNR obtained in 3), compute the a priori SNR ratio.

5) Compute the optimal attenuation gain using a formula related to the hypergeometric function.

The formula involves the standard gamma function, the exponential function with the natural constant e as its base, and the Bessel functions of order 0 and order 1. For the computation of Bessel functions, see William J. Lentz, "Bessel functions in Mie scattering calculations using continued fractions".

6) Compute the lower limit of the attenuation gain.

The exponential function is the same as in 5), with the natural constant e as its base. The lower limit is a positive value used to bound the optimal attenuation gain that was obtained. If the optimal attenuation gain falls below the lower limit, its value is too small and the enhanced speech may contain fluctuating "musical noise", so the lower limit must be used to restrict the gain; see the calculation in 7).

7) Compute the attenuation gain.

Here max is the usual larger-value operation, i.e.

$$\max(a, b) = \begin{cases} a, & a \ge b \\ b, & a < b \end{cases}$$

The lower limit is applied to restrict the optimal attenuation gain, the result enters a weighted sum, and the sum is then squared to obtain the attenuation gain. The weighting coefficient can be chosen according to the application scenario; its typical range is [0.60, 0.90], and in this example it is set to 0.75.
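For orientation, the classical Ephraim-Malah MMSE short-time spectral amplitude gain, which step 3 states it builds on, can be sketched as below together with the flooring by max. This is the textbook gain from the cited 1984 paper, not the patent's improved and simplified variant, whose formulas are not reproduced in this text. The function names, the power-series Bessel helpers (modified Bessel functions, adequate only for moderate arguments), and the floor value `g_min` passed in by the caller are all assumptions; the patent's own lower-limit formula, weighted sum, and squaring are not modelled.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function I0 via its power series (moderate x only)."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= (half / k) ** 2          # ratio of consecutive series terms
        total += term
    return total

def bessel_i1(x, terms=60):
    """Modified Bessel function I1 via its power series (moderate x only)."""
    half = x / 2.0
    term = half                          # k = 0 term
    total = term
    for k in range(1, terms):
        term *= half * half / (k * (k + 1))
        total += term
    return total

def mmse_stsa_gain(xi, gamma):
    """Classical MMSE-STSA gain for a priori SNR xi and a posteriori SNR gamma."""
    v = xi / (1.0 + xi) * gamma
    return (math.gamma(1.5) * math.sqrt(v) / gamma * math.exp(-v / 2.0)
            * ((1.0 + v) * bessel_i0(v / 2.0) + v * bessel_i1(v / 2.0)))

def floored_gain(xi, gamma, g_min):
    """Limit the gain from below with max; g_min is a hypothetical floor value."""
    return max(mmse_stsa_gain(xi, gamma), g_min)

g_high_snr = mmse_stsa_gain(10.0, 11.0)   # close to the Wiener gain 10/11
g_floored = floored_gain(0.01, 0.5, 0.1)
```

At high SNR the gain approaches the Wiener gain xi / (1 + xi), which gives a quick sanity check on an implementation.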

Step 4: Correlation calculation between noisy speech and noise

This step first computes the power spectrum of the noisy speech signal and the power spectrum of the estimated noise. In the formulas of this step, one subscript denotes the real part of a complex spectral value and the other denotes its imaginary part.

Next, the cross-power spectrum of the noisy speech signal and the estimated noise is computed.

Finally, the cross-correlation function of the noisy speech signal and the estimated noise is computed.

The invention aims to use the correlation between the noisy speech signal and the estimated noise to strengthen the speech enhancement. In this step, the noisy speech power spectrum, the estimated noise power spectrum, and their cross-power spectrum are used in the frequency domain to obtain the cross-correlation function of the noisy speech signal and the estimated noise. In speech processing, the noisy speech signal and the estimated noise signal are regarded as statistically independent Gaussian variables, and the cross-correlation function characterizes the degree of correlation between noisy speech and estimated noise in each frequency band. A large cross-correlation value means that the noisy speech is strongly correlated with the estimated noise, which indicates that the noisy speech contains little or no speech and that noise accounts for most of it. A small cross-correlation value means that the correlation is weak, which indicates that the noisy speech contains a larger speech component and therefore correlates only weakly with the estimated noise.
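A per-bin correlation measure built from these three quantities might look like the sketch below. The exact formulas are not reproduced in this text, so the normalization used here (real part of the cross-spectrum divided by the geometric mean of the two power spectra) is an assumption, as are the function name `spectral_correlation` and the small constant `eps`.

```python
import numpy as np

def spectral_correlation(Y, N, eps=1e-12):
    """Assumed normalized correlation between two complex spectra, per bin.

    p_yy, p_nn: power spectra from real and imaginary parts.
    p_yn: real part of the cross-spectrum Y * conj(N).
    """
    Y = np.asarray(Y, dtype=complex)
    N = np.asarray(N, dtype=complex)
    p_yy = Y.real ** 2 + Y.imag ** 2
    p_nn = N.real ** 2 + N.imag ** 2
    p_yn = Y.real * N.real + Y.imag * N.imag
    return np.abs(p_yn) / np.sqrt(p_yy * p_nn + eps)

N_spec = np.array([1 + 2j, 3 - 1j])
corr_same = spectral_correlation(N_spec, N_spec)        # identical spectra
corr_quad = spectral_correlation(1j * N_spec, N_spec)   # 90 degree phase shift
```

With this choice the value lies between 0 and 1: identical spectra give values near 1, and spectra in phase quadrature give 0.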

Step 5: Nonlinear attenuation gain calculation

Compute the mean of the cross-correlation function.

In the formula above, the upper summation index is an integer smaller than the upper limit of the frequency-bin index, and its value may differ between application scenarios. Where the noise is concentrated at low frequencies, a smaller value can be chosen; where the noise characteristics are unknown, it can be set equal to the upper limit of the frequency-bin index. For example, with a sampling rate of 16 kHz and a frame length of 10 ms in the windowing pre-processing, one frame contains 160 data points. If the FFT is carried out with overlapping frames to obtain the cross-correlation function, the frequency-bin index takes the values 0, ..., 159, and if the noise is known to be concentrated in the low band from 0 Hz to 4 kHz, the upper summation index can be set to 79 when computing the mean.
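The band-limited mean can be sketched as follows, using the numbers from the example above (160 bins, upper index 79). The function name `mean_correlation` is hypothetical, and averaging over bins 0 to m inclusive is an assumption, since the summation formula is not reproduced in this text.

```python
def mean_correlation(corr, m):
    """Average the per-bin correlation over bins 0..m (inclusive, by assumption)."""
    if not 0 <= m < len(corr):
        raise ValueError("m must be a valid bin index")
    band = corr[: m + 1]
    return sum(band) / len(band)

# Assumed toy data: strong correlation in the lower 80 bins, none above.
corr = [1.0] * 80 + [0.0] * 80
low_band_mean = mean_correlation(corr, 79)    # only the band of interest
full_band_mean = mean_correlation(corr, 159)  # all 160 bins
```

Restricting the average to the band where noise is expected keeps speech-dominated high bins from diluting the decision statistic.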

The mean correlation over the band of interest determines whether a nonlinear attenuation gain is applied to the current frame, by comparing the mean with a correlation threshold. If the mean is below the threshold, the current speech frame is only weakly correlated with the estimated noise in the band of interest and speech is the main component; to avoid damaging speech quality, no nonlinear attenuation is applied and the nonlinear attenuation gain is set to 1.0. If the mean is above the threshold, the current speech frame is strongly correlated with the estimated noise in the band of interest and noise is the main component; to enhance the speech component more effectively, a nonlinear attenuation gain is applied to remove further noise. In that case the nonlinear attenuation gain is obtained through a min operation, where min is the usual smaller-value operation, i.e.

$$\min(a, b) = \begin{cases} a, & a \le b \\ b, & a > b \end{cases}$$

The min operation guarantees that the nonlinear attenuation gain never exceeds 1.0, so that it attenuates the noisy speech rather than amplifying it.

In summary, the nonlinear attenuation gain is computed piecewise according to the two cases above.

The correlation threshold can be selected according to the application scenario, and its choice can be viewed as a trade-off between removing noise interference and preserving speech quality. If a larger threshold is chosen, the probability that the nonlinear attenuation gain is set to 1.0 increases and the effect of the nonlinear attenuation gain weakens, so speech quality is protected but some noise may remain. If a smaller threshold is chosen, the probability that the nonlinear attenuation gain is set to 1.0 decreases and the effect of the nonlinear attenuation gain strengthens, so noise interference is removed more effectively; if the threshold is too small, however, the nonlinear attenuation acts too strongly and damages speech quality. The threshold therefore needs to be chosen to suit the application scenario. Its typical range is [0.70, 0.80], and in this example it is set to 0.735.
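The decision logic of step 5 can be sketched as below. The expression that produces the candidate gain inside the min operation is not reproduced in this text, so it is passed in as `candidate_gain` by the caller; that parameter, the function name, and the treatment of the equality case (mean equal to threshold gives 1.0) are assumptions. The default threshold 0.735 is the value given in the example.

```python
def nonlinear_gain(mean_corr, candidate_gain, threshold=0.735):
    """Return the nonlinear attenuation gain for one frame.

    mean_corr: mean correlation over the band of interest.
    candidate_gain: value of the patent's (unreproduced) gain expression,
                    supplied by the caller; hypothetical parameter.
    """
    if mean_corr <= threshold:
        return 1.0                       # speech dominates: leave the frame alone
    return min(candidate_gain, 1.0)      # noise dominates: attenuate, never amplify

g_speech = nonlinear_gain(0.50, candidate_gain=0.3)   # below threshold
g_noise = nonlinear_gain(0.90, candidate_gain=0.3)    # above threshold
g_capped = nonlinear_gain(0.90, candidate_gain=1.4)   # capped at 1.0 by min
```

Raising `threshold` makes the first branch more likely, which matches the trade-off described above: better speech preservation at the cost of more residual noise.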

Step 6: Speech enhancement processing

The attenuation gain obtained in step 3 and the nonlinear attenuation gain obtained in step 5 are applied jointly to the noisy speech spectrum to carry out the speech enhancement.

After the SNR-based attenuation gain has acted on the noisy speech spectrum, the nonlinear attenuation gain is applied as a further stage. This nonlinear attenuation removes noise more effectively and yields a cleaner speech spectrum.
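Reading "applied jointly" as multiplication, which is consistent with the basic principle stated earlier (the adjusted gain is multiplied by the noisy speech spectrum), the enhancement step can be sketched as follows. The function name and the toy values are assumptions.

```python
import numpy as np

def enhance_spectrum(Y, gain, g_nl):
    """Scale the noisy spectrum by the per-bin gain and the per-frame nonlinear gain."""
    return np.asarray(gain, dtype=float) * g_nl * np.asarray(Y, dtype=complex)

Y = np.array([2 + 2j, 4 - 4j, 1 + 0j])   # assumed noisy spectrum (3 bins)
gain = np.array([0.5, 0.25, 1.0])        # assumed per-bin attenuation gain
S = enhance_spectrum(Y, gain, g_nl=0.5)  # assumed nonlinear gain for this frame
```

Both gains are real and no larger than 1, so only the magnitude of each bin is reduced and the phase of the noisy spectrum is kept.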

Step 7: Inverse fast Fourier transform

The standard inverse fast Fourier transform is applied to the speech spectrum obtained from the enhancement processing, converting the signal from the frequency domain back to the time domain and yielding the enhanced time-domain speech signal.
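A minimal sketch of the inverse transform, assuming the one-sided spectrum produced by `np.fft.rfft` in the forward step; the function name and frame length are illustrative. How successive frames are recombined (for example by overlap-add) is not described in the text and is not modelled here.

```python
import numpy as np

def spectrum_to_time(S, frame_len):
    """Convert a one-sided spectrum back to a real time-domain frame."""
    return np.fft.irfft(S, n=frame_len)

frame_len = 320
frame = np.hamming(frame_len) * np.cos(2 * np.pi * 5 * np.arange(frame_len) / frame_len)
S = np.fft.rfft(frame)                    # forward transform, as in step 2
recovered = spectrum_to_time(S, frame_len)
```

Passing `n=frame_len` explicitly matters for odd frame lengths, where the default output length of `irfft` would otherwise be one sample short.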

The above is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principle of the invention shall fall within its scope of protection.

Claims

1. A correlation-based nonlinear speech enhancement method, characterized in that the method mainly comprises:

Step a: performing a fast Fourier transform on the pre-processed noisy speech data and on the estimated noise data to obtain the spectrum of the noisy speech frame and the spectrum of the estimated noise frame;

Step b: SNR and attenuation gain calculation, which yields the attenuation gain;

Step c: correlation calculation between noisy speech and noise, which yields the cross-correlation function of the spectrum of the noisy speech frame and the spectrum of the estimated noise frame;

Step d: nonlinear attenuation gain calculation, which yields the nonlinear attenuation gain;

Step e: speech enhancement processing, in which the attenuation gain and the nonlinear attenuation gain of step d are applied jointly to the spectrum of the noisy speech frame to carry out the speech enhancement and obtain the clean speech signal spectrum.

2. The method according to claim 1, characterized in that step e is followed by a step f, an inverse fast Fourier transform, in which the standard inverse fast Fourier transform is applied to the speech signal spectrum to convert the signal from the frequency domain back to the time domain.

3. The method according to claim 2, characterized in that step b further comprises the following steps: Step b1: compute the a posteriori SNR; Step b2: compute the SNR update coefficient, which uses the noisy speech data of the previous frame and a parameter whose value can be chosen according to the application scenario; Step b3: compute the a priori SNR; Step b4: compute the a priori SNR ratio; Step b5: compute the optimal attenuation gain using a formula related to the hypergeometric function; Step b6: compute the lower limit of the attenuation gain; Step b7: compute the attenuation gain.

4. The method according to claim 3, characterized in that the typical range of the parameter in step b2 is [0.05, 0.30].

5. The method according to claim 4, characterized in that the parameter may be set to 0.25.

6. The method according to claim 4 or 5, characterized in that the optimal attenuation gain in step b5 is computed from a formula involving the standard gamma function, the exponential function with the natural constant e as its base, and the Bessel functions of order 0 and order 1.

7. The method according to claim 6, characterized in that the lower limit of the attenuation gain in step b6 is computed from a formula involving the exponential function with the natural constant e as its base.

8. The method according to claim 7, characterized in that the attenuation gain is computed with a weighting coefficient whose value can be chosen according to the application scenario and whose typical range is [0.60, 0.90].

9. The method according to claim 8, characterized in that the nonlinear attenuation gain is obtained through a min operation, where min is the usual smaller-value operation, i.e.

$$\min(a, b) = \begin{cases} a, & a \le b \\ b, & a > b \end{cases}$$

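10. The method according to claim 9, characterized in that the nonlinear attenuation gain is set to 1.0 when the mean correlation is below the correlation threshold, and is obtained through the min operation when the mean correlation is above the threshold.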