CN103700375A - Voice noise-reducing method and voice noise-reducing device

Info

Publication number: CN103700375A
Application number: CN201310743950.9A
Authority: CN
Inventors: 吴东海; 张林山
Original assignee: Allwinner Technology Co Ltd
Current assignee: Allwinner Technology Co Ltd
Priority date: 2013-12-28
Filing date: 2013-12-28
Publication date: 2014-04-02
Anticipated expiration: 2033-12-28
Also published as: CN103700375B

Abstract

The invention provides a voice noise reduction method and a voice noise reduction device. The method comprises: calculating a smoothing factor according to a voice activity decision result; calculating a first energy spectrum of a first frequency-domain voice signal and a cross-correlation energy spectrum according to the smoothing factor; calculating a cross-correlation transfer function according to the first energy spectrum and the cross-correlation energy spectrum; calculating a preliminary frequency-domain filter according to a noise compensation factor, a preliminary noise energy and the first energy spectrum; calculating a smoothing window length according to the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal; obtaining a frequency-domain filter by convolving the preliminary frequency-domain filter with the smoothing window length; and outputting the filtered first frequency-domain voice signal after the first frequency-domain voice signal has been filtered by the frequency-domain filter. The invention also provides a voice noise reduction device for implementing the method. The method and the device track noise quickly, require little computation, and effectively suppress environmental noise.

Description

Voice noise reduction method and device thereof

Technical field

The present invention relates to the technical field of audio signal processing, and in particular to a method and a device for reducing noise in voice signals.

Background technology

With the development of technology, people demand ever higher sound quality, especially for voice calls. To guarantee clear, noise-free and natural high-quality calls, voice noise reduction is indispensable for communication tools such as mobile phones and tablet computers. Taking a mobile phone or tablet computer as an example, products currently on the market are generally equipped with two microphones. One microphone is installed close to the speaker's mouth; besides the voice signal uttered by the speaker, it also picks up environmental noise. The other microphone is installed away from the speaker's mouth in order to pick up environmental noise. The sounds picked up by the two microphones are analysed and the voice signal uttered by the speaker is noise-reduced, so that a high-quality voice signal is obtained.

At present, among the dual-microphone voice noise reduction methods applied in mobile phones or tablet computers, those with good noise reduction generally require a large amount of computation and need a separate noise reduction chip to run complex algorithms. Simpler dual-microphone noise reduction methods often give unsatisfactory results with excessive residual noise, or impose strict requirements on the direction of the speech source, so that incorrect use cancels the voice signal itself.

To address the above problems, the present invention proposes a dual-microphone noise reduction algorithm for mobile terminals. The algorithm tracks noise quickly, effectively suppresses environmental noise, is robust to the direction of the sound source, and has low algorithmic complexity and a small amount of computation.

Summary of the invention

The first object of the present invention is to provide a voice noise reduction method that tracks noise quickly, requires little computation, and effectively suppresses environmental noise.

The second object of the present invention is to provide a voice noise reduction device that tracks noise quickly, requires little computation, and effectively suppresses environmental noise.

To achieve the first object, the invention provides a voice noise reduction method, comprising: performing a short-time Fourier transform on the time-domain voice signal received by a first microphone to obtain a first frequency-domain voice signal; performing a short-time Fourier transform on the time-domain voice signal received by a second microphone to obtain a second frequency-domain voice signal; performing a voice activity decision on the first frequency-domain voice signal; and calculating a preliminary noise energy according to the first frequency-domain voice signal; wherein the method further comprises: calculating a smoothing factor according to the result of the voice activity decision and the frequency-domain filter of the previous frame, calculating a first energy spectrum of the first frequency-domain voice signal according to the smoothing factor, and calculating a cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal according to the smoothing factor; calculating a cross-correlation transfer function according to the first energy spectrum, the cross-correlation energy spectrum and the distance between the first microphone and the second microphone; calculating a noise compensation factor according to the result of the voice activity decision and the cross-correlation transfer function; calculating a preliminary frequency-domain filter according to the noise compensation factor, the preliminary noise energy and the first energy spectrum; calculating a smoothing window length according to the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal; convolving the preliminary frequency-domain filter with the smoothing window length to obtain a frequency-domain filter; filtering the first frequency-domain voice signal with the frequency-domain filter and outputting the filtered first frequency-domain voice signal; and performing an inverse short-time Fourier transform on the filtered first frequency-domain voice signal and outputting the filtered first time-domain voice signal.

As can be seen from the above scheme, the preliminary noise energy is calculated and dynamically compensated in the filter according to the different results of the voice activity decision, and at the same time the voice signal or environmental noise is sampled and filtered in the filter according to the cross-correlation transfer function between the two microphones, so that the voice noise reduction method of the present invention tracks quickly, requires little computation, and effectively suppresses environmental noise.

In a further scheme, the voice activity decision step comprises: after calculating the signal-to-noise ratio (SNR) of the current frame from the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy, if the first frequency-domain voice signal of the previous frame is a voice signal and the SNR of the current frame is greater than a preset SNR threshold, judging the first frequency-domain voice signal of the current frame to be a voice signal; if the first frequency-domain voice signal of the previous frame is a noise signal, and both the SNR of the current frame and the SNR of the previous frame are greater than the preset SNR threshold, judging the first frequency-domain voice signal of the current frame to be a voice signal; otherwise judging the first frequency-domain voice signal of the current frame to be a noise signal.

In a further scheme, after the SNR of the current frame is calculated from the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy, if the current frame lies between the first frame and the twentieth frame, the first frequency-domain voice signal of the current frame is judged to be a voice signal.

In a further scheme, if the first frequency-domain voice signal of the previous frame is a voice signal and the SNR of the current frame is less than or equal to the preset SNR threshold, the first frequency-domain voice signal of the current frame is judged to be a voice signal when more than three of the SNRs of the preceding 19 frames are greater than the preset SNR threshold, and is otherwise judged to be a noise signal.

In a further scheme, the smoothing factor comprises a first smoothing factor and a second smoothing factor, and the step of calculating the smoothing factor according to the result of the voice activity decision and the preliminary frequency-domain filter of the previous frame comprises: when the first frequency-domain voice signal of the current frame is a voice signal, calculating the first energy spectrum of the first frequency-domain voice signal and the cross-correlation energy spectrum between the first and second frequency-domain voice signals according to the first smoothing factor; when the first frequency-domain voice signal of the current frame is a noise signal, calculating the first energy spectrum of the first frequency-domain voice signal and the cross-correlation energy spectrum between the first and second frequency-domain voice signals according to the second smoothing factor.

Thus, according to the result of the voice activity decision, different smoothing factors are used for speech frames and noise frames when calculating the energy spectrum and the cross-correlation energy spectrum, so that the smoothing factor provides dynamic compensation in the filter, with a short response time and little computation.

To achieve the second object, the invention provides a voice noise reduction device, comprising: a first short-time Fourier transform unit for receiving a time-domain voice signal and outputting a first frequency-domain voice signal; a second short-time Fourier transform unit for receiving a time-domain voice signal and outputting a second frequency-domain voice signal; a voice activity decision unit for judging whether the first frequency-domain voice signal is a voice signal or a noise signal; and a preliminary noise energy calculation unit for calculating a preliminary noise energy according to the amplitude of the first frequency-domain voice signal; wherein the device further comprises: an energy calculation unit comprising an energy spectrum calculation module, a cross-correlation energy spectrum calculation module and a smoothing factor calculation module, the smoothing factor calculation module outputting a smoothing factor to the energy spectrum calculation module and the cross-correlation energy spectrum calculation module, the energy spectrum calculation module calculating the first energy spectrum according to the smoothing factor and the first frequency-domain voice signal, and the cross-correlation energy spectrum calculation module calculating the cross-correlation energy spectrum according to the smoothing factor, the first frequency-domain voice signal and the second frequency-domain voice signal; a cross-correlation transfer function calculation unit for calculating a cross-correlation transfer function according to the first energy spectrum, the cross-correlation energy spectrum and the distance between the first microphone and the second microphone; a preliminary frequency-domain filter calculation unit for calculating a noise compensation factor according to the signal output by the voice activity decision unit and the signal output by the cross-correlation transfer function calculation unit, and for calculating a preliminary frequency-domain filter according to the noise compensation factor, the preliminary noise energy and the first energy spectrum; a correction unit for calculating a smoothing window length according to the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal; a frequency-domain filter calculation unit for convolving the preliminary frequency-domain filter with the smoothing window length to obtain a frequency-domain filter, and for receiving the first frequency-domain voice signal; and an inverse short-time Fourier transform unit for receiving the filtered first frequency-domain voice signal output by the frequency-domain filter calculation unit and outputting the filtered first time-domain voice signal.

As can be seen from the above scheme, the noise compensation factor is calculated according to the different decision results output by the voice activity decision unit and is used for dynamic compensation in the filter, and at the same time the voice signal or environmental noise is sampled and filtered in the filter according to the cross-correlation transfer function between the two microphones, so that the voice noise reduction device of the present invention tracks noise quickly, requires little computation, and effectively suppresses environmental noise.

Brief description of the drawings

Fig. 1 is a flowchart of an embodiment of the voice noise reduction method of the present invention.

Fig. 2 is a flowchart of the voice activity decision in the embodiment of the voice noise reduction method of the present invention.

Fig. 3 is a system block diagram of the voice noise reduction device of the present invention.

The invention is further described below with reference to the drawings and embodiments.

Embodiment

Voice noise reduction method embodiment:

Referring to Fig. 1, Fig. 1 is a flowchart of the voice noise reduction method of the present invention. In this embodiment, voice signals are picked up by two microphones, and the picked-up voice signals are noise-reduced by the voice noise reduction method of the present invention.

The noise reduction method proceeds as follows. First, step S1 is performed: the time-domain voice signals picked up by the two microphones are subjected to a short-time Fourier transform and thereby converted into frequency-domain voice signals. Specifically, the time-domain voice signal picked up by the first microphone is short-time Fourier transformed to obtain the first frequency-domain voice signal, and the time-domain voice signal picked up by the second microphone is short-time Fourier transformed to obtain the second frequency-domain voice signal. In this embodiment, both short-time Fourier transforms use a Hamming window, and adjacent frames overlap by 75%.
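Step S1 can be illustrated with a minimal sketch in Python, assuming a frame length of 8 samples purely for illustration (the source does not state a frame length). The function names `hamming`, `dft` and `stft` are hypothetical, and a naive DFT stands in for the FFT that a real implementation would likely use.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    size = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / size)
                for t in range(size))
            for k in range(size)]


def stft(signal, frame_len=8, overlap=0.75):
    """Return one spectrum X(n, f) per frame n, using a Hamming window.

    frame_len is an assumed value; overlap=0.75 follows the embodiment,
    so the hop between adjacent frames is a quarter of the frame length.
    """
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    window = hamming(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spectra.append(dft(frame))
    return spectra
```

The same function would be applied to each microphone signal separately to obtain the first and second frequency-domain voice signals.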

Step S2 is then performed: a voice activity decision is made on the first frequency-domain voice signal. The signal received by a microphone contains not only the voice uttered by the speaker but also environmental noise; the purpose of the voice activity decision is to correctly distinguish whether the current signal is a voice signal that also contains environmental noise or a signal that contains only environmental noise.

Referring to Fig. 2, Fig. 2 is a flowchart of the voice activity decision step S2. First, step S21 is performed on the first frequency-domain voice signal: the SNR of the current frame is calculated from the first frequency-domain voice signal of the current frame and the preliminary noise energy of the previous frame, using the following formula:

(formula 1.1; the equation image is not reproduced in this text)

In formula 1.1, SNR(n) is the signal-to-noise ratio of the current frame, NFFT is the Fourier transform length, f is the frequency index, n is the time frame index, $X_1$ is the amplitude of the first frequency-domain voice signal, and N is the preliminary noise energy.

Step S22 is then performed to judge whether n is greater than 20. If not, step S23 is performed, i.e. the first frequency-domain voice signal of the current frame is judged to be a voice signal. Since the current frame is judged to be a speech frame, the flag of the current frame is set to 1, i.e. vad_flag(n) = 1.

When n > 20, step S24 is performed to judge whether the previous frame, i.e. frame n-1, is a speech frame, that is, whether vad_flag(n-1) equals 1.

When vad_flag(n-1) = 1, step S25 is performed to judge whether the SNR of the current frame, SNR(n), is greater than the preset SNR threshold SNR_THR. If so, step S251 is performed and the current frame is judged to be a speech frame, i.e. vad_flag(n) = 1; otherwise step S26 is performed.

When SNR(n) is less than or equal to the preset SNR threshold SNR_THR, step S26 is performed: if more than three of the SNRs of the preceding 19 frames are greater than the preset SNR threshold, step S261 is performed and the current frame is judged to be a speech frame, i.e. vad_flag(n) = 1; otherwise step S262 is performed and the current frame is judged to be a noise frame, i.e. vad_flag(n) = 0.

If vad_flag(n-1) ≠ 1 in step S24, step S27 is performed: when both the SNR of the current frame and the SNR of the previous frame are greater than the preset SNR threshold, step S271 is performed and the current frame is judged to be a speech frame, i.e. vad_flag(n) = 1; otherwise step S272 is performed and the current frame is judged to be a noise frame, i.e. vad_flag(n) = 0.

This completes the voice activity decision step S2: by calculating the SNR of the current frame and comparing it with the preset SNR threshold, the input current frame is correctly classified as a speech frame or a noise frame.
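The decision logic of steps S22 to S27 can be sketched as follows. This is an illustrative reading, not the patented implementation: the function name `vad_decide` and its parameters are hypothetical, the SNR values are assumed to have been computed already by formula 1.1, and "more than three" is read literally as a count strictly greater than 3.

```python
def vad_decide(n, snr_history, prev_flag, snr_thr,
               warmup_frames=20, lookback=19, min_hits=3):
    """Return vad_flag(n): 1 for a speech frame, 0 for a noise frame.

    n           -- 1-based index of the current frame
    snr_history -- SNR values of frames 1..n (last entry is SNR(n))
    prev_flag   -- vad_flag(n-1)
    snr_thr     -- preset SNR threshold (SNR_THR)
    """
    snr_now = snr_history[-1]

    # Steps S22/S23: the first 20 frames are always treated as speech.
    if n <= warmup_frames:
        return 1

    if prev_flag == 1:
        # Steps S25/S251: previous frame was speech and SNR is high.
        if snr_now > snr_thr:
            return 1
        # Step S26: count high-SNR frames among the preceding 19 frames.
        previous = snr_history[-1 - lookback:-1]
        hits = sum(1 for s in previous if s > snr_thr)
        # Assumption: "more than three" means hits > 3.
        return 1 if hits > min_hits else 0

    # Step S27: previous frame was noise, so both the current and the
    # previous SNR must exceed the threshold.
    snr_prev = snr_history[-2]
    return 1 if (snr_now > snr_thr and snr_prev > snr_thr) else 0
```

The look-back over 19 frames gives the decision some hangover: a single low-SNR frame inside a stretch of speech does not immediately flip the flag to noise.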

After step S2, step S3 is performed: the smoothing factor λ is calculated according to the result of the voice activity decision step S2 and the preliminary frequency-domain filter of the previous frame, using the following formulas:

$$\lambda(n,f) = 0.98 - 0.2\,G(n-1,f), \quad \text{vad\_flag}(n) = 0 \qquad \text{(formula 1.2)}$$

$$\lambda(n,f) = 0.90 - 0.15\,G(n-1,f), \quad \text{vad\_flag}(n) = 1 \qquad \text{(formula 1.3)}$$

In formulas 1.2 and 1.3, G(n-1, f) is the preliminary frequency-domain filter of the previous frame.
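Formulas 1.2 and 1.3 can be written as a small helper. The function name `smoothing_factor` is hypothetical, and the worked values below assume that the previous gain G(n-1, f) lies between 0 and 1, which the source does not state explicitly.

```python
def smoothing_factor(prev_gain, vad_flag):
    """Smoothing factor lambda(n, f) for a single frequency bin.

    prev_gain -- G(n-1, f), preliminary filter of the previous frame
    vad_flag  -- vad_flag(n): 1 for speech, 0 for noise
    """
    if vad_flag == 1:
        return 0.90 - 0.15 * prev_gain   # formula 1.3 (speech frame)
    return 0.98 - 0.2 * prev_gain        # formula 1.2 (noise frame)
```

Under that assumption, λ stays within 0.78 to 0.98 for noise frames and 0.75 to 0.90 for speech frames, so speech frames always weight the current observation more heavily than noise frames do for the same previous gain.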

Step S4 is then performed: according to the decision result for the current frame, the corresponding smoothing factor λ is selected to calculate the energy spectrum of the first frequency-domain voice signal and the energy spectrum of the second frequency-domain voice signal, using the following formulas:

$$P_{X_1X_1}(n,f) = \lambda(n,f)\,P_{X_1X_1}(n-1,f) + \bigl(1-\lambda(n,f)\bigr)\,X_1(n,f)\,X_1^{*}(n,f) \qquad \text{(formula 1.4)}$$

$$P_{X_2X_2}(n,f) = \lambda(n,f)\,P_{X_2X_2}(n-1,f) + \bigl(1-\lambda(n,f)\bigr)\,X_2(n,f)\,X_2^{*}(n,f) \qquad \text{(formula 1.5)}$$

In formula 1.4, $P_{X_1X_1}$ denotes the first energy spectrum of the first frequency-domain voice signal, $X_1(n,f)$ denotes the amplitude of the first frequency-domain voice signal, and $X_1^{*}(n,f)$ denotes its complex conjugate. In formula 1.5, $P_{X_2X_2}$ denotes the second energy spectrum of the second frequency-domain voice signal, $X_2(n,f)$ denotes the amplitude of the second frequency-domain voice signal, and $X_2^{*}(n,f)$ denotes its complex conjugate.

Step S5 is then performed: according to the decision result for the current frame, the corresponding smoothing factor λ is selected to calculate the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal, using the following formula:

$$P_{X_1X_2}(n,f) = \lambda(n,f)\,P_{X_1X_2}(n-1,f) + \bigl(1-\lambda(n,f)\bigr)\,X_1(n,f)\,X_2^{*}(n,f) \qquad \text{(formula 1.6)}$$

In formula 1.6, $P_{X_1X_2}$ denotes the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal, $X_1(n,f)$ denotes the amplitude of the first frequency-domain voice signal, and $X_2^{*}(n,f)$ denotes the complex conjugate of the amplitude of the second frequency-domain voice signal.

In formulas 1.4, 1.5 and 1.6, if the current frame is a speech frame, i.e. vad_flag(n) = 1, the λ(n, f) of formula 1.3 is used to calculate the first energy spectrum, the second energy spectrum and the cross-correlation energy spectrum.

If the current frame is a noise frame, i.e. vad_flag(n) = 0, the λ(n, f) of formula 1.2 is used to calculate the first energy spectrum, the second energy spectrum and the cross-correlation energy spectrum.
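The recursive updates of formulas 1.4 to 1.6 can be sketched for one frame as follows. The function name `update_spectra` and the list-based layout (one entry per frequency bin) are assumptions made for illustration; the per-bin smoothing factors are taken as an input that has already been chosen according to vad_flag(n).

```python
def update_spectra(p11_prev, p22_prev, p12_prev, x1, x2, lam):
    """One-frame recursive update of the auto and cross spectra.

    p11_prev, p22_prev -- previous energy spectra (real, per bin)
    p12_prev           -- previous cross-correlation spectrum (complex)
    x1, x2             -- current complex spectra X1(n, f), X2(n, f)
    lam                -- smoothing factor lambda(n, f), per bin
    """
    p11, p22, p12 = [], [], []
    for f in range(len(x1)):
        a = complex(x1[f])
        b = complex(x2[f])
        keep = lam[f]          # weight of the previous estimate
        new = 1.0 - lam[f]     # weight of the current observation
        # Formula 1.4: X1 * conj(X1) is real, so keep the real part.
        p11.append(keep * p11_prev[f] + new * (a * a.conjugate()).real)
        # Formula 1.5: same update for the second microphone.
        p22.append(keep * p22_prev[f] + new * (b * b.conjugate()).real)
        # Formula 1.6: the cross spectrum stays complex.
        p12.append(keep * p12_prev[f] + new * a * b.conjugate())
    return p11, p22, p12
```

Because each update needs only the previous estimate and the current frame, the memory and computation per frame are proportional to the number of frequency bins.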

Thus, the picked-up voice signal is classified, the corresponding energy calculation is applied according to the result, and the preliminary frequency-domain filter of the previous frame is fed back, so that the energy spectrum of the voice signal is tracked quickly, with little computation and good dynamic adjustment.

Step S6 is then performed: the preliminary noise energy is calculated from the amplitude of the first frequency-domain voice signal. Step S6 comprises first calculating the a priori SNR from the first frequency-domain voice signal, using the following formula:

(formula 1.7; the equation image is not reproduced in this text)

In formula 1.7, $\zeta_{DD}$ denotes the a priori SNR, $|N(n-1,f)|^2$ is the preliminary noise energy of the previous frame, and ε is a fixed smoothing factor taking a value between 0 and 1, preferably ε = 0.98.

The noise compensation amount is then calculated from the a priori SNR, using the following formula:

(formula 1.8; the equation image is not reproduced in this text)

In formula 1.8, B(n, f) denotes the noise compensation amount and γ(u, v) is the gamma distribution function.

Finally, the preliminary noise energy of the current frame is calculated from the noise compensation amount B(n, f), using the following formula:

(formula 1.9; the equation image is not reproduced in this text)

In formula 1.9, $|N(n,f)|^2$ denotes the preliminary noise energy, β is a smoothing factor set to 0.8, $|N(n-1,f)|^2$ is the preliminary noise energy of the previous frame, and the remaining term is the noise energy calculated by the minimum mean square error method.

Thus, the preliminary noise energy is calculated by sampling the preliminary noise energy of the previous frame and analysing it with the gamma distribution function. Because noise reduction also suppresses part of the speech, which causes clearly audible speech distortion in subjective listening, adding the noise compensation amount to the preliminary noise energy strikes a balance between noise suppression and speech distortion. This improves the subjective listening experience: the speech distortion is not noticeable, and the remaining noise level does not impair listening either.

Step S7 is then performed: the cross-correlation transfer function is calculated from the first energy spectrum and the cross-correlation energy spectrum, using the following formulas:

(formula 1.10)

(formula 1.11)

In formulas 1.10 and 1.11, Γ(n, f) is the correlation function between the two microphones in a reverberant field, d is the distance between the two microphones, c is the speed of sound in air, and $H_{12}(n,f)$ is the cross-correlation transfer function.

Thus, in calculating the cross-correlation transfer function, the two microphones are located close to and away from the speaker's mouth respectively, so that the dual microphones localise the speech source and provide a sample of the speaker's voice and a sample of the environmental noise. The sound picked up by the microphone, which contains both the speaker's voice and environmental noise, can then be analysed and filtered effectively to obtain a high-quality voice signal.

Step S8 is then performed: the preliminary frequency-domain filter is calculated from the first energy spectrum, the noise compensation factor and the preliminary noise energy, using the following formula:

(formula 1.12)

In formula 1.12, G(n, f) is the preliminary frequency-domain filter and $\alpha\,|H_{12}(n,f)|^2$ is the noise compensation factor. If the current frame is a speech frame, i.e. vad_flag(n) = 1, α = 2.5 is set. If the current frame is a noise frame, i.e. vad_flag(n) = 0, α = 1 is set. α thus applies a different compensation depending on the decision for the current frame: when the current frame is a speech frame, no compensation is needed and α is set to 1, and when the current frame is a noise frame, α is set to 2.5 to apply a certain amount of compensation.

Step S9 is then performed: the preliminary frequency-domain filter is denoised to obtain the denoised frequency-domain filter. Step S9 is as follows:

First, the system noise generated by the noise reduction method is calculated as the ratio between the amplitude of the input first frequency-domain voice signal and its amplitude after filtering by the preliminary frequency-domain filter, using the following formula:

(formula 1.13)

As formula 1.13 shows, η(n) is the ratio of the input signal to the output signal, i.e. the system noise ratio generated in the system.

The smoothing window length of the preliminary filter is then calculated from the system noise ratio, using the following formula:

(formula 1.14)

In formula 1.14, $\eta_{thr}$ is the preset system noise ratio threshold of the filter, set to $\eta_{thr} = 0.4$, and L(n) is the smoothing window length.

Finally, the preliminary frequency-domain filter G(n, f) is convolved with the smoothing window length L(n) to obtain the denoised frequency-domain filter $G_{PF}(n,f)$, calculated as follows:

(formula 1.15)

Thus, the smoothing window further denoises the preliminary frequency-domain filter, so that the filtering effect of the frequency-domain filter $G_{PF}(n,f)$ is smoother.
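The idea of smoothing the preliminary filter over frequency can be sketched as below. Formula 1.15 is not reproduced in this text, so the window shape is an assumption: a normalized rectangular window (a moving average) of odd length L(n) is used here, and the function name `smooth_gain` is hypothetical. Edge bins are averaged over the neighbours that exist, which deviates slightly from a strict convolution.

```python
def smooth_gain(gain, window_len):
    """Smooth G(n, f) across frequency bins with a moving average.

    gain       -- preliminary filter values G(n, f), one per bin
    window_len -- smoothing window length L(n), assumed odd
    """
    if window_len <= 1:
        return list(gain)   # a window of length 1 leaves the filter as is
    half = window_len // 2
    smoothed = []
    for f in range(len(gain)):
        lo = max(0, f - half)
        hi = min(len(gain), f + half + 1)
        # Average over the bins inside the window (truncated at the edges).
        smoothed.append(sum(gain[lo:hi]) / (hi - lo))
    return smoothed
```

A longer window spreads isolated gain peaks over neighbouring bins, which is the kind of smoothing the paragraph above describes.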

Step S10 is then performed: the first frequency-domain voice signal is input to the frequency-domain filter $G_{PF}(n,f)$ for filtering, yielding the filtered first frequency-domain voice signal. Step S11 is then performed: the filtered first frequency-domain voice signal is inverse Fourier transformed back into a filtered time-domain voice signal. Finally, step S12 is performed: the filtered time-domain voice signal is output.
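Steps S10 to S12 can be sketched as a per-bin multiplication followed by an inverse transform and overlap-add. The function names are hypothetical, the naive inverse DFT stands in for an inverse FFT, and the overlap-add reconstruction is an assumption: the source only states that an inverse short-time Fourier transform is applied, and any window compensation is left out here.

```python
import cmath
import math


def apply_gain(spectrum, gain):
    """Step S10: multiply each bin of X1(n, f) by G_PF(n, f)."""
    return [g * x for g, x in zip(gain, spectrum)]


def idft(spectrum):
    """Step S11 for one frame: naive inverse DFT, real part only."""
    size = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / size)
                for k in range(size)).real / size
            for t in range(size)]


def overlap_add(frames, hop):
    """Step S12 (assumed): sum time-domain frames at their hop offsets."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for n, frame in enumerate(frames):
        for i, sample in enumerate(frame):
            out[n * hop + i] += sample
    return out
```

With 75% overlap, `hop` would be a quarter of the frame length, matching the framing used in step S1.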

As can be seen from the above scheme, different smoothing factors are used for speech frames and noise frames, according to the result of the voice activity decision, when calculating the energy spectrum and the cross-correlation energy spectrum; according to the same result, the preliminary noise energy is calculated and dynamically compensated in the filter; and at the same time the voice signal or environmental noise is sampled and filtered in the filter according to the cross-correlation transfer function between the two microphones. The voice noise reduction method of the present invention therefore tracks quickly, requires little computation, and effectively suppresses environmental noise, and it does not require a dedicated standalone noise reduction chip.

Voice noise reduction device embodiment:

Referring to Fig. 3, Fig. 3 is a system block diagram of the voice noise reduction device of the present invention. The voice noise reduction device comprises a short-time Fourier transform unit 31, a short-time Fourier transform unit 32, a voice activity decision unit 33, an energy estimation unit 34, a preliminary noise energy calculation unit 35, a cross-correlation transfer function calculation unit 36, a preliminary frequency-domain filter calculation unit 37, a correction unit 38, a frequency-domain filter calculation unit 39 and an inverse Fourier transform unit 40.

The short-time Fourier transform unit 31 receives the time-domain voice signal output by the first microphone, performs a short-time Fourier transform on it and outputs the first frequency-domain voice signal. The short-time Fourier transform unit 32 receives the time-domain voice signal output by the second microphone, performs a short-time Fourier transform on it and outputs the second frequency-domain voice signal. The voice activity decision unit 33 receives the first frequency-domain voice signal output by the short-time Fourier transform unit 31, performs the voice activity decision on it, and outputs the decision result to the energy estimation unit 34.

The energy estimation unit 34 comprises an energy spectrum calculation module 341, a cross-correlation energy spectrum calculation module 342 and a smoothing factor calculation module 343, and receives the first frequency-domain voice signal, the second frequency-domain voice signal and the voice activity decision result. The smoothing factor calculation module 343 adjusts the smoothing factor according to the feedback signal output by the preliminary frequency-domain filter calculation unit 37 and the voice activity decision result. The energy spectrum calculation module 341 calculates the first energy spectrum of the first frequency-domain voice signal and the second energy spectrum of the second frequency-domain voice signal according to the smoothing factor, the first frequency-domain voice signal and the second frequency-domain voice signal. The cross-correlation energy spectrum calculation module 342 calculates the cross-correlation energy spectrum between the first and second frequency-domain voice signals according to the smoothing factor, the first frequency-domain voice signal and the second frequency-domain voice signal.

Preliminary noise energy computing unit 35 receives the first frequency-domain voice signal and calculates the preliminary noise energy according to the amplitude of the first frequency-domain voice signal. It outputs the preliminary noise energy to preliminary frequency-domain filter computing unit 37 and to cross-correlation transfer function computing unit 36, respectively.

Cross-correlation transfer function computing unit 36 receives the first energy spectrum, the cross-correlation energy spectrum, and the preliminary noise energy, calculates the cross-correlation transfer function, and outputs it to preliminary frequency-domain filter computing unit 37.

Preliminary frequency-domain filter computing unit 37 calculates the noise compensation factor from the signal output by voice activation judging unit 33 and the signal output by cross-correlation transfer function computing unit 36, and then calculates the preliminary frequency-domain filter from the noise compensation factor, the preliminary noise energy, and the first energy spectrum.
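This passage does not give the formula that combines the three quantities. Purely as an illustration, the following Python sketch uses a spectral-subtraction-style gain, which is a substituted, commonly used technique and not necessarily the one used by unit 37. The function name, the gain floor of 0.05 and the treatment of the noise compensation factor as a single scalar `beta` are all assumptions.

```python
def preliminary_filter(p11, noise_energy, beta, floor=0.05):
    """Per-bin gain from the first energy spectrum and the noise energy.

    beta stands in for the noise compensation factor (assumed scalar).
    floor is a hypothetical lower bound that limits over-suppression.
    """
    gains = []
    for p, n in zip(p11, noise_energy):
        if p <= 0.0:
            # No signal energy in this bin: fall back to the floor.
            gains.append(floor)
            continue
        # Subtract the compensated noise share from unity gain.
        g = 1.0 - beta * n / p
        # Keep the gain within [floor, 1].
        gains.append(min(1.0, max(floor, g)))
    return gains
```

Under this assumed form, a larger compensation factor removes more of the estimated noise from each bin, and the floor limits how far a bin can be attenuated.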

Correction unit 38 receives the signal output by preliminary frequency-domain filter computing unit 37 and, together with the amplitude of the first frequency-domain voice signal, calculates the smoothing window length.

Frequency-domain filter computing unit 39 receives the preliminary frequency-domain filter calculated by preliminary frequency-domain filter computing unit 37 and the smoothing window length calculated by correction unit 38, and convolves the preliminary frequency-domain filter according to this smoothing window length, thereby obtaining a frequency-domain filter with a smooth filtering curve.
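The smoothing step can be sketched in Python as a moving average across frequency bins, which is equivalent to convolving with a normalized rectangular window. The rectangular window shape, the renormalization at the band edges and the function name are assumptions; the window length is taken as an input here because its calculation is not detailed in this passage.

```python
def smooth_filter(gains, window_length):
    """Smooth per-bin gains with a centered rectangular window.

    Even window lengths are widened to the next odd length so that the
    window stays centered on each bin (an illustrative choice).
    """
    if window_length < 1:
        raise ValueError("window_length must be at least 1")
    half = window_length // 2
    n = len(gains)
    smoothed = []
    for k in range(n):
        # Clip the window at the edges of the frequency axis.
        lo = max(0, k - half)
        hi = min(n, k + half + 1)
        # Average over the bins actually covered by the window.
        smoothed.append(sum(gains[lo:hi]) / (hi - lo))
    return smoothed
```

A window length of 1 leaves the preliminary filter unchanged, and longer windows give a flatter gain curve with fewer isolated peaks.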

Frequency-domain filter computing unit 39 also receives the first frequency-domain voice signal output by short-time Fourier transform unit 31, filters it with the frequency-domain filter, and outputs the filtered first frequency-domain voice signal.

Inverse short-time Fourier transform unit 40 receives the filtered first frequency-domain voice signal, performs an inverse short-time Fourier transform on it to restore it to the filtered first time-domain voice signal, and outputs that signal.

As the above scheme shows, the voice noise reduction device of the present invention filters and denoises the voice signals received by the two microphones and outputs a high-quality voice signal. The device tracks noise quickly, requires little computation, and effectively suppresses environmental noise, so it can efficiently suppress noise in the input voice signal and improve its speech quality.

Claims

1. A voice noise reduction method, comprising:

Performing a short-time Fourier transform on the time-domain voice signal received by a first microphone to obtain a first frequency-domain voice signal;

Performing a short-time Fourier transform on the time-domain voice signal received by a second microphone to obtain a second frequency-domain voice signal;

Performing a voice activation judgment on the first frequency-domain voice signal;

Calculating a preliminary noise energy according to the first frequency-domain voice signal;

Characterized in that:

A smoothing factor is calculated according to the result of the voice activation judgment and the preliminary frequency-domain filter of the previous frame; a first energy spectrum of the first frequency-domain voice signal is calculated according to the smoothing factor, and a cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal is calculated according to the smoothing factor;

A cross-correlation transfer function is calculated according to the first energy spectrum, the cross-correlation energy spectrum, and the distance between the first microphone and the second microphone;

A noise compensation factor is calculated according to the result of the voice activation judgment and the cross-correlation transfer function;

A preliminary frequency-domain filter is calculated according to the noise compensation factor, the preliminary noise energy, and the first energy spectrum;

A smoothing window length is calculated according to the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal;

The preliminary frequency-domain filter is convolved according to the smoothing window length to obtain a frequency-domain filter;

The first frequency-domain voice signal is filtered by the frequency-domain filter, and the filtered first frequency-domain voice signal is output;

An inverse short-time Fourier transform is performed on the filtered first frequency-domain voice signal, and the filtered first time-domain voice signal is output.

2. The voice noise reduction method according to claim 1, characterized in that:

The voice activation judgment step comprises: calculating the signal-to-noise ratio of the current frame according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy of the previous frame; if the first frequency-domain voice signal of the previous frame is a speech frame and the signal-to-noise ratio of the current frame is greater than a preset signal-to-noise ratio threshold, setting the first frequency-domain voice signal of the current frame as a speech frame;

If the first frequency-domain voice signal of the previous frame is a noise frame, and both the signal-to-noise ratio of the current frame and the signal-to-noise ratio of the previous frame are greater than the preset signal-to-noise ratio threshold, setting the first frequency-domain voice signal of the current frame as a speech frame; otherwise, setting the first frequency-domain voice signal of the current frame as a noise frame.

3. The voice noise reduction method according to claim 2, characterized in that:

After the signal-to-noise ratio of the current frame is calculated according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy, if the current frame is judged to lie between the first frame and the twentieth frame, the first frequency-domain voice signal of the current frame is judged to be a speech frame.

4. The voice noise reduction method according to claim 3, characterized in that:

If the first frequency-domain voice signal of the previous frame is a speech frame and the signal-to-noise ratio of the current frame is less than or equal to the preset signal-to-noise ratio threshold, then, when more than three of the signal-to-noise ratios of the preceding 19 frames are greater than the preset signal-to-noise ratio threshold, the first frequency-domain voice signal of the current frame is set as a speech frame; otherwise, the first frequency-domain voice signal of the current frame is judged to be a noise frame.

5. The voice noise reduction method according to claim 4, characterized in that:

The smoothing factor comprises a first smoothing factor and a second smoothing factor, and the step of calculating the smoothing factor according to the result of the voice activation judgment and the preliminary frequency-domain filter of the previous frame comprises:

If the first frequency-domain voice signal of the current frame is a speech frame, calculating the first energy spectrum of the first frequency-domain voice signal according to the first smoothing factor, and calculating the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal according to the first smoothing factor;

If the first frequency-domain voice signal of the current frame is a noise frame, calculating the first energy spectrum of the first frequency-domain voice signal according to the second smoothing factor, and calculating the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal according to the second smoothing factor.

6. A voice noise reduction device, comprising:

A first short-time Fourier transform unit, for receiving a time-domain voice signal and outputting a first frequency-domain voice signal;

A second short-time Fourier transform unit, for receiving a time-domain voice signal and outputting a second frequency-domain voice signal;

A voice activation judging unit, for judging whether the first frequency-domain voice signal is a voice signal or a noise signal;

A preliminary noise energy computing unit, for calculating a preliminary noise energy according to the amplitude of the first frequency-domain voice signal;

Characterized in that:

The voice noise reduction device further comprises:

An energy calculation unit, comprising an energy spectrum computing module, a cross-correlation energy spectrum computing module, and a smoothing factor computing module, wherein the smoothing factor computing module outputs a smoothing factor to the energy spectrum computing module and the cross-correlation energy spectrum computing module, the energy spectrum computing module calculates a first energy spectrum according to the smoothing factor and the first frequency-domain voice signal, and the cross-correlation energy spectrum computing module calculates a cross-correlation energy spectrum according to the smoothing factor, the first frequency-domain voice signal, and the second frequency-domain voice signal;

A cross-correlation transfer function computing unit, for calculating a cross-correlation transfer function according to the first energy spectrum, the cross-correlation energy spectrum, and the distance between the first microphone and the second microphone;

A preliminary frequency-domain filter computing unit, for calculating a noise compensation factor according to the signal output by the voice activation judging unit and the signal output by the cross-correlation transfer function computing unit, and for calculating a preliminary frequency-domain filter according to the noise compensation factor, the preliminary noise energy, and the first energy spectrum;

A correction unit, for calculating a smoothing window length according to the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal;

A frequency-domain filter computing unit, for convolving the preliminary frequency-domain filter according to the smoothing window length to obtain a frequency-domain filter, the frequency-domain filter computing unit also being used to receive the first frequency-domain voice signal;

An inverse short-time Fourier transform unit, for receiving the filtered first frequency-domain voice signal output by the frequency-domain filter computing unit and outputting the filtered first time-domain voice signal.

7. The voice noise reduction device according to claim 6, characterized in that:

The voice activation judging unit is also used to: calculate the signal-to-noise ratio of the current frame according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy of the previous frame; and, if the first frequency-domain voice signal of the previous frame is a speech frame and the signal-to-noise ratio of the current frame is greater than a preset signal-to-noise ratio threshold, set the first frequency-domain voice signal of the current frame as a speech frame.

8. The voice noise reduction device according to claim 7, characterized in that:

The voice activation judging unit is also used to: after calculating the signal-to-noise ratio of the current frame according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy of the previous frame, judge the first frequency-domain voice signal of the current frame to be a speech frame if the current frame lies between the first frame and the twentieth frame.

9. The voice noise reduction device according to claim 8, characterized in that:

The voice activation judging unit is also used to: if the first frequency-domain voice signal of the previous frame is a speech frame and the signal-to-noise ratio of the current frame is less than or equal to the preset signal-to-noise ratio threshold, set the first frequency-domain voice signal of the current frame as a speech frame when more than three of the signal-to-noise ratios of the preceding 19 frames are greater than the preset signal-to-noise ratio threshold; otherwise, judge the first frequency-domain voice signal of the current frame to be a noise frame.

10. The voice noise reduction device according to claim 9, characterized in that:

The smoothing factor comprises a first smoothing factor and a second smoothing factor, and the voice activation judging unit is also used to:

If the first frequency-domain voice signal of the current frame is a speech frame, calculate the first energy spectrum of the first frequency-domain voice signal according to the first smoothing factor, and calculate the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal according to the first smoothing factor;

If the first frequency-domain voice signal of the current frame is a noise frame, calculate the first energy spectrum of the first frequency-domain voice signal according to the second smoothing factor, and calculate the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal according to the second smoothing factor.