CN108696791A

CN108696791A - Single-microphone speech enhancement method using a combined perceptual gain function

Info

Publication number: CN108696791A
Application number: CN201710227956.9A
Authority: CN
Inventors: 谭洪舟; 李竺珊; 李宇; 农革
Original assignee: SYSU CMU Shunde International Joint Research Institute; National Sun Yat Sen University
Current assignee: Sun Yat Sen University; SYSU CMU Shunde International Joint Research Institute; National Sun Yat Sen University
Priority date: 2017-04-10
Filing date: 2017-04-10
Publication date: 2018-10-23

Abstract

The present invention provides a single-microphone speech enhancement method using a combined perceptual gain function. First, the method estimates the a priori signal-to-noise ratio (SNR) in the DFT domain with the decision-directed method. Second, it enhances the speech with a combined gain function derived from a generalized Gamma prior and a weighted Euclidean distortion measure; because the resulting gain function has no closed-form solution, it is expressed as a combination of its numerical solutions. Finally, an inverse DFT is applied to the spectral components of the speech to obtain the time-domain form of the enhanced speech. In this way, the clean speech signal is recovered from the noisy speech.

Description

Single-microphone speech enhancement method using a combined perceptual gain function

Technical field

The present invention relates to the field of speech enhancement, and more particularly to a single-microphone speech enhancement method using a combined perceptual gain function.

Background art

In a speech processing system, the speech signal becomes noisy speech after being corrupted by various kinds of noise. The noisy speech passes through a speech enhancement module to yield enhanced speech, which can then undergo further processing. In practice, speech signals are processed in many ways, for example in speech coding, speech recognition, and in-vehicle systems, and background noise can seriously degrade the performance of such systems. Speech enhancement based on statistical models has been a research focus for many years. Single-microphone speech enhancement is widely used because of its low complexity and modest hardware requirements. As a preprocessing module of a speech processing system, speech enhancement is an effective means of combating noise contamination, with the aim of suppressing noise and improving speech quality. The quality of the gain function directly affects enhancement performance. Compared with a Gaussian prior, a Gamma prior fits the distribution of speech DFT amplitude coefficients better. The auditory masking effect means that the auditory system does not easily perceive quantization noise near the formants, and this property can be exploited to shape the error spectrum. It is therefore valuable to model the speech DFT amplitude coefficients with a generalized Gamma model while taking the auditory masking effect into account.

Summary of the invention

The present invention provides a single-microphone speech enhancement method using a combined perceptual gain function, which can recover the clean speech signal from noisy speech.

To achieve the above technical effect, the technical solution of the present invention is as follows:

A single-microphone speech enhancement method using a combined perceptual gain function, comprising the following steps:

S1: Estimate the noise power spectrum using MMSE-based unbiased noise power spectrum estimation;

S2: Estimate the a priori SNR using the decision-directed method;

S3: Compute the gain function according to the perceptual MMSE criterion under a generalized Gamma prior;

S4: Enhance the speech using the gain function.

Further, in step S1, in the additive noise model, S(k, i) and N(k, i) denote the speech signal and the noise signal, respectively, of the i-th spectral component in the k-th frame. After the discrete Fourier transform, the noisy speech signal is represented in the frequency domain as X(k, i) = S(k, i) + N(k, i). The a priori SNR ξ and the a posteriori SNR γ are defined from the power spectral density of the speech and the power spectral density of the noise, where E[·] is the expectation operator and the noise power spectrum is estimated using MMSE.
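For reference, these quantities are conventionally written as follows. The symbols \sigma_S^2 and \sigma_N^2 for the two power spectral densities are assumed notation, and the patent's own formulas may differ in form:

```latex
\sigma_S^2(k,i) = E\left[\,|S(k,i)|^2\,\right], \qquad
\sigma_N^2(k,i) = E\left[\,|N(k,i)|^2\,\right]

\xi(k,i) = \frac{\sigma_S^2(k,i)}{\sigma_N^2(k,i)}, \qquad
\gamma(k,i) = \frac{|X(k,i)|^2}{\sigma_N^2(k,i)}
```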

Further, in step S2, the a priori SNR is estimated using the decision-directed (DD) method.

In the DD estimate, P[·] denotes half-wave rectification, the recursive term uses the speech spectrum estimate of the previous frame, and β = 0.98.
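A minimal sketch of a decision-directed update for one frequency bin is given below. It assumes the standard textbook form of the estimator (a β-weighted previous-frame term plus a half-wave-rectified instantaneous term); the function and parameter names are hypothetical, and the patent's exact formula may differ.

```python
def half_wave_rectify(x):
    """P[x]: keep positive values, clamp negative values to zero."""
    return x if x > 0.0 else 0.0


def decision_directed_snr(noisy_power, noise_power,
                          prev_speech_power, prev_noise_power, beta=0.98):
    """Return (xi, gamma) for one bin of the current frame.

    Assumed standard form (not taken from the patent's lost formula):
        gamma = |X|^2 / noise_power
        xi    = beta * prev_speech_power / prev_noise_power
                + (1 - beta) * P[gamma - 1]
    """
    gamma = noisy_power / noise_power
    xi = (beta * prev_speech_power / prev_noise_power
          + (1.0 - beta) * half_wave_rectify(gamma - 1.0))
    return xi, gamma
```

With β close to 1 the estimate leans heavily on the previous frame, which smooths the a priori SNR over time at the cost of slower tracking of speech onsets.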

Further, in step S3:

In the amplitude-frequency domain, X(k, i) = S(k, i) + N(k, i) is expressed in polar coordinates, giving

R exp(jθ) = A exp(jφ) + D exp(jψ), where R, A and D are the amplitudes of X, S and N, respectively. The goal of amplitude-domain speech enhancement is to obtain an estimate of A.

The distribution of the speech DFT amplitude coefficients is modeled with a one-sided generalized Gamma model:

where Γ(·) denotes the Gamma function, τ and v are the shape parameters of the Gamma distribution, and β is the scale parameter. When τ = 1, β is given by:
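As a sketch of what such a model usually looks like, the generalized Gamma density common in the speech enhancement literature is written below in the symbols used here; treating it as the patent's exact form is an assumption. For τ = 1, the scale parameter follows from constraining the second moment of A to the speech power, written \sigma_S^2 (assumed notation):

```latex
f_A(a) = \frac{\tau\,\beta^{v}}{\Gamma(v)}\; a^{\tau v - 1}
         \exp\!\left(-\beta\, a^{\tau}\right),
\qquad a \ge 0,\; \beta > 0,\; \tau > 0,\; v > 0

\tau = 1:\quad E\!\left[A^2\right] = \frac{v\,(v+1)}{\beta^{2}} = \sigma_S^2
\;\;\Longrightarrow\;\;
\beta = \frac{\sqrt{v\,(v+1)}}{\sigma_S}
```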

The noise DFT coefficients are modeled with a Gaussian model:

where I₀(·) is the zeroth-order modified Bessel function;

A perceptually weighted Euclidean distortion measure is adopted, and the corresponding risk function is formed.

Minimizing the risk function yields the amplitude estimate.
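For orientation, the weighted Euclidean distortion measure is commonly written with a weighting exponent p, and minimizing its conditional expectation gives a ratio of posterior moments. The following is the usual textbook form, assumed here rather than recovered from the patent's own formulas:

```latex
d(A, \hat{A}) = A^{p}\,\bigl(A - \hat{A}\bigr)^{2}

\frac{\partial}{\partial \hat{A}}\,
E\!\left[A^{p}\bigl(A - \hat{A}\bigr)^{2} \,\middle|\, X\right] = 0
\;\;\Longrightarrow\;\;
\hat{A} = \frac{E\!\left[A^{p+1} \mid X\right]}{E\!\left[A^{p} \mid X\right]}
```

Setting p = 0 recovers the ordinary MMSE amplitude estimate E[A | X]; a negative p weights errors at small amplitudes more heavily.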

The gain function then follows. When γ = 1, the resulting expression has no closed-form solution, so the Bessel function is approximated in order to solve it, where Υ_x(·) denotes the parabolic cylinder function of order x:

1) At low SNR, the Taylor series expansion of I₀ at w = 0 is used;

2) At high SNR, the approximation of I₀ for large arguments is used.
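The two approximation regimes can be checked numerically. The sketch below compares I₀ with its second-order Taylor expansion near w = 0 and with its leading large-argument asymptotic form; the truncation order is an assumption, since the patent's approximation formulas are not available, and all function names are hypothetical.

```python
import math


def bessel_i0(w, terms=60):
    """Zeroth-order modified Bessel function via its power series
    I0(w) = sum_k (w^2 / 4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (w * w / 4.0) / ((k + 1) ** 2)
    return total


def i0_small(w):
    """Taylor expansion at w = 0, truncated after the quadratic term
    (assumed truncation order)."""
    return 1.0 + w * w / 4.0


def i0_large(w):
    """Leading asymptotic term for large w: exp(w) / sqrt(2 * pi * w)."""
    return math.exp(w) / math.sqrt(2.0 * math.pi * w)


if __name__ == "__main__":
    for w in (0.1, 0.5, 5.0, 20.0):
        print(w, bessel_i0(w), i0_small(w), i0_large(w))
```

The Taylor form is accurate only for small arguments and the asymptotic form only for large ones, which is why a combined gain function switches between the two regimes.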

Compared with the prior art, the beneficial effects of the technical solution of the present invention are:

The present invention estimates the a priori SNR in the DFT domain with the decision-directed method; it then enhances the speech with a combined gain function derived from a generalized Gamma prior and a weighted Euclidean distortion measure. Because the resulting gain function has no closed-form solution, it is expressed as a combination of its numerical solutions. Finally, an inverse DFT is applied to the spectral components of the speech to obtain the time-domain form of the enhanced speech. In this way, the clean speech signal can be effectively recovered from the noisy speech.

Brief description of the drawings

Fig. 1 is a block diagram of the single-microphone speech enhancement system in the DFT domain;

Fig. 2 shows the single-microphone speech enhancement processing procedure in the DFT domain;

Fig. 3 is a flow chart of the present invention;

Fig. 4 shows the perceptual MMSE gain function as a function of the instantaneous SNR.

Detailed description

The drawings are for illustrative purposes only and shall not be construed as limiting this patent;

To better illustrate the embodiment, certain components in the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of the actual product;

Those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.

The technical solution of the present invention is further described below with reference to the drawings and an embodiment.

Embodiment 1

In the additive noise model, S(k, i) and N(k, i) denote the speech signal and the noise signal, respectively, of the i-th spectral component in the k-th frame. After the discrete Fourier transform, the noisy speech signal is represented in the frequency domain as:

X(k, i) = S(k, i) + N(k, i). The a priori SNR ξ and the a posteriori SNR γ are defined from the power spectral density of the speech and the power spectral density of the noise, where E[·] is the expectation operator and the noise power spectrum is estimated using MMSE.

The a priori SNR is estimated using the DD method, where P[·] denotes half-wave rectification and the recursive term uses the speech spectrum estimate of the previous frame. In general, β = 0.98.

For brevity, the frame index k and the frequency index i are omitted. In the amplitude-frequency domain, X(k, i) = S(k, i) + N(k, i) is expressed in polar coordinates as R exp(jθ) = A exp(jφ) + D exp(jψ), where R, A and D are the amplitudes of X, S and N, respectively. The goal of amplitude-domain speech enhancement is to obtain an estimate of A.

In the one-sided generalized Gamma model of the speech DFT amplitude coefficients, Γ(·) denotes the Gamma function, τ and v are the shape parameters of the Gamma distribution, and β is the scale parameter. When τ = 1, β is given by:

The noise DFT coefficients are modeled with a Gaussian model:

where I₀(·) is the zeroth-order modified Bessel function.

A perceptually weighted Euclidean distortion measure is adopted, and the corresponding risk function is formed. Minimizing the risk function yields the amplitude estimate, from which the gain function follows. When γ = 1, the resulting expression has no closed-form solution, so the Bessel function is approximated in order to solve it, where Υ_x(·) denotes the parabolic cylinder function of order x.

Fig. 1 is a block diagram of the single-microphone speech enhancement system in the DFT domain. Fig. 2 details the per-frame, per-frequency-bin processing within the processing stage of Fig. 1, i.e. the single-microphone speech enhancement procedure in the DFT domain. Fig. 3 is a flow chart of a specific implementation of the present invention.

First, the noisy speech signal is sampled (sampling frequency 8000 Hz), divided into frames (140*129), windowed (50% overlap), and transformed to the frequency domain by DFT. The unbiased noise power spectrum is estimated with the MMSE method.
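The analysis stage can be sketched as follows. The frame length of 256 samples and hop of 128 samples are assumptions inferred from the 129 frequency bins and the 50% overlap, and the Hann window is likewise an assumption, since the window type is not stated. All function names are hypothetical.

```python
import cmath
import math


def frame_signal(x, frame_len=256, hop=128):
    """Split x into overlapping frames (hop = frame_len / 2 gives 50% overlap)."""
    return [x[start:start + frame_len]
            for start in range(0, len(x) - frame_len + 1, hop)]


def hann(n_len):
    """Periodic Hann window (assumed window type)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_len)
            for n in range(n_len)]


def half_spectrum(frame):
    """Direct DFT of a real frame, keeping bins 0 .. N/2 (N/2 + 1 bins)."""
    n_len = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * math.pi * b * n / n_len)
                for n in range(n_len))
            for b in range(n_len // 2 + 1)]


if __name__ == "__main__":
    fs = 8000.0
    tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(512)]
    frames = frame_signal(tone)
    window = hann(256)
    spectrum = half_spectrum([s * w for s, w in zip(frames[0], window)])
    print(len(frames), len(spectrum))
```

A direct DFT is used only to keep the sketch dependency-free; a practical implementation would use an FFT routine.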

Second, the a posteriori SNR γ and the a priori SNR ξ are computed from the following two formulas, respectively, with β = 0.98.

Next, the amplitude and phase are separated, and the gain function in the amplitude-frequency domain is computed from the a posteriori SNR and the a priori SNR by the formula

Here, the gain function is first computed over a range of a priori SNR ξ and a posteriori SNR γ values (-40 dB to 50 dB in 1 dB steps) and stored in a table (91*91); in a given case, the gain value for a particular pair of a priori and a posteriori SNRs is obtained by table lookup. p = -0.1, and v = 0.7 is recommended.
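The table-lookup scheme can be sketched as below. Because the gain formula itself is not available here, a Wiener gain ξ/(1 + ξ) is used purely as a stand-in; it is not the perceptual gain function of this method. The function names and the nearest-neighbour indexing with clamping are also assumptions.

```python
import math

DB_MIN, DB_MAX = -40, 50  # 1 dB grid, 91 points per axis


def db_to_linear(db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10.0 ** (db / 10.0)


def placeholder_gain(xi, gamma):
    """Stand-in only: Wiener gain, NOT the patent's gain; gamma is ignored."""
    return xi / (1.0 + xi)


def build_gain_table(gain_fn):
    """Precompute gain_fn on the 91 x 91 (xi, gamma) grid."""
    grid = range(DB_MIN, DB_MAX + 1)
    return [[gain_fn(db_to_linear(xi_db), db_to_linear(g_db))
             for g_db in grid]
            for xi_db in grid]


def lookup_gain(table, xi, gamma):
    """Nearest-neighbour lookup; SNRs outside the grid are clamped."""
    def index(value):
        db = 10.0 * math.log10(max(value, 1e-12))
        return min(max(int(round(db)), DB_MIN), DB_MAX) - DB_MIN
    return table[index(xi)][index(gamma)]


if __name__ == "__main__":
    table = build_gain_table(placeholder_gain)
    print(len(table), len(table[0]), lookup_gain(table, 1.0, 1.0))
```

Precomputing the table moves the costly evaluation of special functions offline, so the per-bin work at run time reduces to two dB conversions and an array access.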

Fig. 4 shows the gain function as a function of the instantaneous SNR.

Then, the spectral gain is applied to the noisy speech signal, and the amplitude and phase are recombined to obtain the frequency-domain representation of the speech.
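A minimal sketch of this step for one frame follows. It assumes the enhanced amplitude is the gain times the noisy amplitude and that the noisy phase is reused unchanged, which is the usual choice in amplitude-domain enhancement; the function name is hypothetical.

```python
import cmath


def apply_gain(noisy_spectrum, gains):
    """Scale each bin's amplitude by its gain and keep the noisy phase."""
    enhanced = []
    for x, g in zip(noisy_spectrum, gains):
        amplitude, phase = abs(x), cmath.phase(x)
        enhanced.append(cmath.rect(g * amplitude, phase))
    return enhanced


if __name__ == "__main__":
    print(apply_gain([3 + 4j, -1j], [0.5, 1.0]))
```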

Finally, the inverse Fourier transform is applied to the resulting spectrum, the window is removed, and the frames are merged (17967*1) to output the time-domain speech, on which subjective and objective listening tests can be performed.
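One common way to merge the frames is overlap-add, sketched below for frames that have already been transformed back to the time domain. The periodic Hann window is an assumption; with it, windows at 50% overlap sum to one, so no explicit division by the window is needed in the overlapped region. The function names are hypothetical.

```python
import math


def periodic_hann(n_len):
    """Periodic Hann window; shifted copies at hop n_len / 2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_len)
            for n in range(n_len)]


def overlap_add(frames, hop):
    """Sum time-domain frames at their original offsets."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for m, frame in enumerate(frames):
        for n, value in enumerate(frame):
            out[m * hop + n] += value
    return out


if __name__ == "__main__":
    window = periodic_hann(8)
    # Windowed frames of an all-ones signal: the interior reconstructs to 1.
    print(overlap_add([list(window) for _ in range(3)], hop=4))
```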

The same or similar reference numerals correspond to the same or similar components;

The positional relationships depicted in the drawings are for illustration only and shall not be construed as limiting this patent;

Obviously, the above embodiment is merely an example given to clearly illustrate the present invention and does not limit its embodiments. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims

1. A single-microphone speech enhancement method using a combined perceptual gain function, characterized in that it comprises the following steps:

S1: Estimate the noise power spectrum using MMSE-based unbiased noise power spectrum estimation;

S2: Estimate the a priori SNR using the decision-directed method;

S3: Compute the gain function according to the perceptual MMSE criterion under a generalized Gamma prior;

S4: Enhance the speech using the gain function.

2. The single-microphone speech enhancement method using a combined perceptual gain function according to claim 1, characterized in that, in step S1, in the additive noise model, S(k, i) and N(k, i) denote the speech signal and the noise signal, respectively, of the i-th spectral component in the k-th frame; after the discrete Fourier transform, the noisy speech signal is represented in the frequency domain as X(k, i) = S(k, i) + N(k, i); the a priori SNR ξ and the a posteriori SNR γ are defined from the power spectral density of the speech and the power spectral density of the noise, where E[·] is the expectation operator and the noise power spectrum is estimated using MMSE.

3. The single-microphone speech enhancement method using a combined perceptual gain function according to claim 2, characterized in that, in step S2, the a priori SNR is estimated using the DD method,

where P[·] denotes half-wave rectification, the recursive term uses the speech spectrum estimate of the previous frame, and β = 0.98.

4. The single-microphone speech enhancement method using a combined perceptual gain function according to claim 3, characterized in that, in step S3:

R exp(jθ) = A exp(jφ) + D exp(jψ), where R, A and D are the amplitudes of X, S and N, respectively; the goal of amplitude-domain speech enhancement is to obtain an estimate of A;

where Γ(·) denotes the Gamma function, τ and v are the shape parameters of the Gamma distribution, and β is the scale parameter; when τ = 1, β is given by:

the noise DFT coefficients are modeled with a Gaussian model:

where I₀(·) is the zeroth-order modified Bessel function;

a perceptually weighted Euclidean distortion measure is adopted, and the corresponding risk function is formed; minimizing the risk function yields the amplitude estimate, from which the gain function follows; when γ = 1, the resulting expression has no closed-form solution, so the Bessel function is approximated in order to solve it, where Υ_x(·) denotes the parabolic cylinder function of order x.