CN101853665A

CN101853665A - Method for eliminating noise in voice

Info

Publication number: CN101853665A
Application number: CN200910087086A
Authority: CN
Inventors: not disclosed (不公告发明人)
Original assignee: BOSHIJIN (BEIJING) INFORMATION TECHNOLOGY Co Ltd
Current assignee: BOSHIJIN (BEIJING) INFORMATION TECHNOLOGY Co Ltd
Priority date: 2009-06-18
Filing date: 2009-06-18
Publication date: 2010-10-06

Abstract

The invention relates to a method for eliminating noise in voice. It extracts the voice signal from a noise-polluted signal and can be widely applied to front-end processing for voice communication, hearing assistance, speech recognition and voiceprint authentication. The method comprises the following steps: 1) framing and windowing the polluted, noise-containing voice signal; 2) transforming it to the frequency domain to obtain the frequency-domain voice signal; 3) estimating the statistical characteristic parameters of the noise in real time; 4) designing a gain function from a statistical model method and the estimated noise statistics; 5) applying the gain to the frequency-domain voice signal and inversely transforming the result into a time-domain signal; and 6) performing overlap-add to obtain the final noise-reduced signal. Because the statistical characteristic parameters of the noise are estimated in real time and a statistical model of speech is used, a large amount of noise is eliminated while damage to the voice signal is avoided, giving a better noise reduction effect.

Description

Method for eliminating noise in voice

Technical field: The invention belongs to the field of signal processing, in particular speech signal processing. Specifically, it relates to a method for eliminating noise in a speech signal, also called a speech enhancement method.

Background: A method for eliminating noise in voice, also called a speech enhancement method, is a selective processing technique. Its purpose is to extract the clean original speech signal from a noise-polluted signal and to improve the clarity and intelligibility of the speech. It can be widely used wherever speech quality needs to be improved, such as voice monitoring, voice communication, hearing aids and video conferencing. In addition, other important speech applications, such as speech recognition and voiceprint authentication, have been slow to enter widespread practical use, largely because background noise seriously hinders them. Using noise elimination appropriately to extract the speech signal as well as possible from the polluted noisy signal can significantly improve the performance of speech recognition and voiceprint authentication systems. Because of its practical importance, speech enhancement has long been an active international research topic.

Speech enhancement techniques can be classified, by the number of channels used to pick up the speech signal, into single-channel and multi-channel techniques. A multi-channel speech enhancement system usually needs two or more microphones forming a microphone array, and removes background noise from the speech signal by means such as beamforming and post-filtering. Because it obtains not only the time-domain and frequency-domain information of the speech signal but also its spatial information, it usually achieves better results. A single-channel speech enhancement system uses only one microphone; its hardware requirements are low, its algorithmic complexity is correspondingly lower, it is easy to implement, and its range of application is wider. The present invention is a single-channel speech enhancement method.

A typical single-channel speech enhancement system comprises the following steps:

1) frame and window the noisy speech signal y;

2) transform it to the frequency domain, giving Y(Ω);

3) estimate the statistical characteristic parameters of the noise;

4) design the gain function and process the noisy signal;

5) inversely transform the processed signal back to the time domain;

6) obtain the enhanced signal by the overlap-add method.

The statistical characteristic parameters of the noise in step 3) can be estimated in two ways. The silent segment at the beginning of the signal can be taken directly as noise frames, from which the noise statistics are extracted. Alternatively, real-time voice activity detection can track whether the current frame is a noise frame or a speech frame: if it is a noise frame, the noise statistics are updated; otherwise tracking continues with the next frame. The resulting noise spectrum is denoted P_{nn}(Ω).
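The first option, taking the leading silent segment as noise, can be sketched as a per-bin average of the power spectra of the first few frames. The function name `initial_noise_spectrum`, the list-of-lists layout and the sample values are hypothetical choices for illustration; the text does not say how many frames to average or which statistic to use.

```python
def initial_noise_spectrum(power_frames, num_noise_frames):
    """Average the power spectra of the first `num_noise_frames` frames.

    power_frames: list of frames, each a list of per-bin power values.
    Assumes (hypothetically) that those leading frames contain noise only.
    """
    if not 0 < num_noise_frames <= len(power_frames):
        raise ValueError("num_noise_frames must be between 1 and the frame count")
    noise_frames = power_frames[:num_noise_frames]
    num_bins = len(noise_frames[0])
    return [
        sum(frame[k] for frame in noise_frames) / num_noise_frames
        for k in range(num_bins)
    ]


# Two noise-only frames followed by a louder speech frame (made-up values).
frames = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [40.0, 50.0, 60.0]]
p_nn = initial_noise_spectrum(frames, num_noise_frames=2)
```

The speech frame does not influence `p_nn`, which is why this estimate cannot follow noise that changes later in the signal.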

The gain function design of step 4) is usually where single-channel speech enhancement methods differ most. A fairly typical example is the Wiener filter gain based on the MMSE criterion:

G(Ω) = \frac{P_{ss}(Ω)}{P_{ss}(Ω) + P_{nn}(Ω)}    (1)
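As a minimal sketch of equation (1), the gain can be evaluated bin by bin from a speech spectrum and a noise spectrum. The function name `wiener_gain`, the zero-bin guard and the numeric values are assumptions added for illustration.

```python
def wiener_gain(p_ss, p_nn):
    """Per-bin Wiener gain G = P_ss / (P_ss + P_nn), as in equation (1)."""
    gains = []
    for s, n in zip(p_ss, p_nn):
        total = s + n
        # Guard (an added assumption): an empty bin gets zero gain.
        gains.append(s / total if total > 0.0 else 0.0)
    return gains


p_ss = [9.0, 1.0, 0.0]   # speech power per bin (made-up values)
p_nn = [1.0, 1.0, 4.0]   # noise power per bin (made-up values)
gains = wiener_gain(p_ss, p_nn)
```

The gain stays between 0 and 1: bins dominated by speech pass almost unchanged, and bins dominated by noise are attenuated.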

This method is widely used in practice and gives good results when the noise is stationary, but it has shortcomings, mainly the following: 1) In a non-stationary noise environment, taking the silent segment at the beginning as the noise frames usually does not give accurate statistics for the later noise; using voice activity detection is not only harder to implement but also difficult to make accurate in a non-stationary environment. 2) The gain function is a linear gain that applies the same proportional gain whether the segment is voiced or silent, which is clearly unreasonable: at high signal-to-noise ratio (SNR), removing noise too aggressively loses information in the speech signal itself, while at low SNR more noise is left behind. This is also why a linear gain produces strong "musical noise". A more reasonable noise tracking algorithm and a more reasonable gain function are therefore needed.

Summary of the invention

The object of the invention is to design a more reasonable method that tracks the noise in real time, so that the speech enhancement system can work in non-stationary noise environments, and to design a more effective gain function that avoids damaging the speech as far as possible while removing as much residual noise as possible, thereby remedying the shortcomings of the linear gain, which damages speech and leaves residual noise.

The invention comprises the following steps:

1) frame and window the noisy speech signal y;

2) transform it to the frequency domain, giving Y(Ω);

3) estimate the statistical characteristic parameters of the noise;

4) design the gain function and process the noisy signal;

5) inversely transform the processed signal back to the time domain;

6) obtain the enhanced signal by the overlap-add method.

The estimation of the noise statistical characteristic parameters in step 3) comprises the following steps:

a) compute the power spectrum of the current frame and, using the noise spectrum estimated for the previous frame, estimate the current noise spectrum by a smoothing method; the tracking speed can be adjusted through the smoothing factor;

b) compute the a priori signal-to-noise ratio (SNR) from the signal power spectrum and the noise spectrum obtained in a);

c) using the SNR estimated for the previous frame, estimate the a posteriori SNR of the current frame by a smoothing method; the tracking speed can be adjusted through the smoothing factor.
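One common way to realize the smoothing in step a) is first-order recursive averaging. The update rule, the function name `smooth_noise_spectrum` and the smoothing factor `alpha` below are assumptions for illustration, not the patent's own formula.

```python
def smooth_noise_spectrum(prev_noise, frame_power, alpha):
    """First-order recursive average: N = alpha * N_prev + (1 - alpha) * P.

    alpha near 1 tracks slowly; alpha near 0 tracks quickly.
    This update rule is an assumption, not taken from the source.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * n + (1.0 - alpha) * p for n, p in zip(prev_noise, frame_power)]


# The noise level in bin 0 rises from 1.0 to 2.0; bin 1 stays at 1.0.
noise = [1.0, 1.0]
for _ in range(3):
    noise = smooth_noise_spectrum(noise, [2.0, 1.0], alpha=0.5)
```

After three frames the estimate in bin 0 has moved most of the way to the new level, which shows how the smoothing factor sets the tracking speed.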

The gain design and noisy-signal processing of step 4) comprise the following steps:

a) select a statistical model of speech and model the speech data with a parametric method;

b) based on the MMSE criterion, obtain the gain function G(Ω) from the estimated noise spectrum, the a priori SNR and the a posteriori SNR obtained in step 3);

c) obtain the estimated speech spectrum using the formula

\hat{S}(Ω) = G(Ω) Y(Ω)    (2)

Because the invention tracks changes in the noise spectrum, the a priori SNR and the a posteriori SNR in real time, it obtains the noise statistics of the current frame more accurately and suits more complex non-stationary noise environments. Because the speech data are modeled with a statistical method, the resulting gain function is nonlinear, so more noise can be removed while excessive damage to the speech is avoided.

Description of drawings

Fig. 1 is a block diagram of the speech enhancement system implemented by the invention.

Embodiment

The invention is described in further detail below with reference to the drawing and a specific embodiment.

The speech signal of the current frame, x(t) = s(t) + n(t), is windowed and transformed by FFT to obtain its frequency-domain form:

Y(k) = FFT(x(t) · h(t))    (3)

Here the window function h(t) can be a Hamming window or a Hanning window, and the frame length can be chosen between 10 and 30 ms.
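Equation (3) can be illustrated with a Hamming window and a naive discrete Fourier transform standing in for an FFT routine. The 8 kHz sample rate, the 20 ms frame and the names `hamming_window`, `dft` and `frame_spectrum` are assumptions chosen for this sketch; a practical system would call an FFT library instead of the O(n^2) loop.

```python
import cmath
import math


def hamming_window(n):
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def dft(x):
    """Naive O(n^2) DFT; a stand-in for an FFT routine."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def frame_spectrum(frame, window):
    """Y(k) = DFT(x(t) * h(t)) for one frame, following equation (3)."""
    return dft([x * h for x, h in zip(frame, window)])


sample_rate = 8000                         # Hz (assumed)
frame_len = sample_rate * 20 // 1000       # 20 ms frame -> 160 samples
# Test tone placed exactly on bin 10 of the frame.
frame = [math.cos(2.0 * math.pi * 10 * t / frame_len) for t in range(frame_len)]
spectrum = frame_spectrum(frame, hamming_window(frame_len))
magnitudes = [abs(v) for v in spectrum]
peak_bin = max(range(frame_len // 2), key=magnitudes.__getitem__)
```

A Hanning window would use the coefficients 0.5 and 0.5 in place of 0.54 and 0.46; the rest of the sketch is unchanged.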

Using the information estimated from the previous frame, namely the noise spectrum N^p(k) and the signal spectrum, estimate the noise spectrum of the current frame:

Compute the a priori SNR of the current frame:

γ(k) = \frac{Y(k)}{N(k)} - 1    (5)

Using the a posteriori SNR of the previous frame, compute the a posteriori SNR of the current frame:

η(k) = max{0, β(k) η(k) + (1 - β(k)) γ(k)}    (6)

On the right-hand side, η(k) is the value from the previous frame and β(k) is the smoothing factor.
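Equations (5) and (6) can be sketched as a per-bin update. The sketch assumes that Y(k) and N(k) are read as non-negative spectral magnitudes (or powers), that β(k) is supplied as one value per bin, and that a small floor `eps` protects the division; the function name `update_snr` and the numbers are hypothetical.

```python
def update_snr(y_mag, noise_mag, eta_prev, beta, eps=1e-12):
    """Return (gamma, eta) per bin for the current frame.

    gamma follows equation (5): gamma = Y / N - 1.
    eta follows equation (6): eta = max(0, beta * eta_prev + (1 - beta) * gamma).
    eps is an added safeguard against division by zero (an assumption).
    """
    gamma = [y / max(n, eps) - 1.0 for y, n in zip(y_mag, noise_mag)]
    eta = [
        max(0.0, b * e + (1.0 - b) * g)
        for b, e, g in zip(beta, eta_prev, gamma)
    ]
    return gamma, eta


# Bin 0 carries speech well above the noise; bin 1 lies below the noise estimate.
gamma, eta = update_snr(
    y_mag=[4.0, 0.5],
    noise_mag=[1.0, 1.0],
    eta_prev=[1.0, 0.0],
    beta=[0.5, 0.5],
)
```

In bin 1 the smoothed value would be negative, so the `max` in equation (6) clamps it to zero.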

Once the current noise statistics have been estimated, they are passed to the gain function.

The gain function design of the invention is a parametric optimal estimation problem. The frequency-domain form of the signal to be estimated is:

For brevity the frequency index k is left implicit below; the right-hand side of the above expression can be expanded as:

By Bayes' theorem:

Here p(·) denotes a probability density function. If the speech data are modeled with a super-Gaussian model, the Laplacian or the gamma probability density function can be used, given respectively by:

p(S) = \frac{1}{σ_{s}} \exp\left(-\frac{2|S|}{σ_{s}}\right)    (10)

p(S) = \frac{\sqrt[4]{3}}{2\sqrt{π σ_{s}}} |S|^{-\frac{1}{2}} \exp\left(-\frac{\sqrt{3}|S|}{\sqrt{2} σ_{s}}\right)    (11)
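The two densities can be evaluated directly from equations (10) and (11) as printed. The function names, the value of `sigma_s` and the Riemann-sum check are illustrative assumptions; the check covers only the Laplacian density.

```python
import math


def laplace_pdf(s, sigma_s):
    """Equation (10): p(S) = (1 / sigma_s) * exp(-2|S| / sigma_s)."""
    return math.exp(-2.0 * abs(s) / sigma_s) / sigma_s


def gamma_pdf(s, sigma_s):
    """Equation (11) as printed; it diverges at S = 0 because of |S|^(-1/2)."""
    a = abs(s)
    if a == 0.0:
        raise ValueError("the gamma density is unbounded at S = 0")
    scale = 3.0 ** 0.25 / (2.0 * math.sqrt(math.pi * sigma_s))
    return scale * a ** -0.5 * math.exp(-math.sqrt(3.0) * a / (math.sqrt(2.0) * sigma_s))


# Riemann-sum check that the Laplacian density integrates to about 1.
sigma_s = 1.5
step = 0.001
area = sum(laplace_pdf(i * step, sigma_s) for i in range(-20000, 20001)) * step
```

Both densities are symmetric in S and more sharply peaked around zero than a Gaussian of comparable spread, which is the sense in which the model is super-Gaussian.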

Here σ_s denotes the variance of the speech signal. Taking the Laplacian probability density function as an example, let:

L^{+} = \frac{σ_{n}}{σ_{s}} + \frac{Y}{σ_{n}} = \frac{1}{\sqrt{η}} + \frac{Y}{σ_{n}}

L^{-} = \frac{σ_{n}}{σ_{s}} - \frac{Y}{σ_{n}} = \frac{1}{\sqrt{η}} - \frac{Y}{σ_{n}}    (12)
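The two auxiliary terms of equation (12) are simple to compute once η and σ_n are known. The sketch below treats Y as a real value and rejects η = 0, since 1/\sqrt{η} is undefined there even though equation (6) allows η to reach zero; the function name `laplace_terms`, the input check and the numbers are assumptions.

```python
import math


def laplace_terms(y, sigma_n, eta):
    """Return (L_plus, L_minus) from equation (12).

    Uses sigma_n / sigma_s = 1 / sqrt(eta), so eta must be positive.
    """
    if eta <= 0.0 or sigma_n <= 0.0:
        raise ValueError("eta and sigma_n must be positive")
    inv_root_eta = 1.0 / math.sqrt(eta)
    ratio = y / sigma_n
    return inv_root_eta + ratio, inv_root_eta - ratio


l_plus, l_minus = laplace_terms(y=3.0, sigma_n=2.0, eta=4.0)
```

A caller that feeds η from equation (6) would need its own floor on η before using these terms.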

Here η is the a posteriori SNR of the signal. According to the optimal MMSE criterion, minimizing

gives:

where erfc(·) denotes the complementary error function and p(Y) is:

Finally the gain function G is obtained as follows, and applying it yields the processed frequency-domain signal:

The IFFT then gives:

Finally, the complete enhanced signal is obtained by the overlap-add method.
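The overlap-add step can be sketched as summing equally long frames at a fixed hop. To keep the example checkable, the frames here are only windowed and not modified by any gain, and the demo uses a periodic Hann window at 50% overlap, for which overlapping windows sum to one; the window choice, the sizes and the function name `overlap_add` are assumptions for illustration.

```python
import math


def overlap_add(frames, hop):
    """Sum equally long frames placed `hop` samples apart."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        start = i * hop
        for t, value in enumerate(frame):
            out[start + t] += value
    return out


frame_len, hop = 8, 4
# Periodic Hann window: w[t] + w[t + hop] == 1 at 50% overlap.
window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / frame_len) for t in range(frame_len)]
signal = [float(t) for t in range(20)]
frames = [
    [signal[s + t] * window[t] for t in range(frame_len)]
    for s in range(0, len(signal) - frame_len + 1, hop)
]
rebuilt = overlap_add(frames, hop)
```

Samples covered by two frames (indices 4 to 15 here) are reconstructed exactly; the first and last half-frames are covered only once and remain tapered.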

Claims

1. A method for eliminating noise in voice, used to extract a voice signal polluted by noise, comprising the following steps:

1) framing and windowing the polluted noisy speech signal;

2) transforming to the frequency domain to obtain the speech signal in the frequency domain;

3) estimating the statistical characteristic parameters of the noise in real time;

4) designing a gain function according to a statistical model method and the estimated noise statistics;

5) applying the gain to the frequency-domain speech signal and inversely transforming it into a time-domain signal;

6) obtaining the complete speech signal by the overlap-add method.

2. The real-time estimation of the statistical characteristic parameters of the noise according to claim 1, characterized in that step 3) comprises:

1) computing the spectrum of the current frame of the speech signal;

2) computing the a priori SNR from the noise spectrum estimated for the previous frame;

3) smoothing with the computed a priori SNR and the noise spectrum estimated for the previous frame to estimate the noise spectrum and the a posteriori SNR of the current frame;

4) passing the estimated parameters to the gain function.

3. The design of the gain function according to the statistical model method and the estimated noise statistics according to claim 1, characterized in that step 4) comprises:

1) selecting a statistical model method and an optimality criterion, and designing the gain function;

2) applying the gain to the frequency-domain speech signal using the statistical characteristic parameters estimated in step 3) of claim 1, to obtain the denoised speech signal in the frequency domain;