CN103700375B

CN103700375B - Voice de-noising method and device thereof

Info

Publication number: CN103700375B
Application number: CN201310743950.9A
Authority: CN
Inventors: 吴东海; 张林山
Original assignee: Allwinner Technology Co Ltd
Current assignee: Allwinner Technology Co Ltd
Priority date: 2013-12-28
Filing date: 2013-12-28
Publication date: 2016-06-15
Anticipated expiration: 2033-12-28
Also published as: CN103700375A

Abstract

The invention provides a voice de-noising method and a device thereof. The method comprises: calculating a smoothing factor according to the result of a voice activation judgment, and calculating, according to the smoothing factor, the first energy spectrum of the first frequency domain voice signal and the cross-correlation energy spectrum; calculating a cross-correlation transfer function according to the first energy spectrum and the cross-correlation energy spectrum; calculating a preliminary frequency domain filter according to the noise compensation factor, the preliminary noise energy and the first energy spectrum; calculating a smoothing window length according to the preliminary frequency domain filter and the amplitude of the first frequency domain voice signal; convolving the preliminary frequency domain filter with the smoothing window length to obtain the frequency domain filter; and filtering the first frequency domain voice signal with the frequency domain filter to output the filtered first frequency domain voice signal. A voice noise reduction device implementing the above method is also provided. The voice de-noising method and device of the present invention track noise quickly, require little computation and effectively suppress ambient noise.

Description

Voice de-noising method and device thereof

Technical field

The present invention relates to the technical field of audio signal processing, and in particular to a method and a device for reducing noise in a voice signal.

Background technology

With the development of technology, people's requirements for sound quality, and especially for voice call quality, keep rising. To guarantee clear, noise-free and natural high-quality calls, a voice de-noising function is indispensable for communication tools such as mobile phones and tablet computers. Taking a mobile phone or tablet computer as an example, products currently on the market are generally equipped with two microphones. One microphone is installed close to the speaker's vocal organs; it picks up the voice signal uttered by the speaker and also picks up ambient noise. The other microphone is installed away from the speaker's vocal organs to pick up ambient noise. By analyzing the sound picked up by the two microphones, the voice signal uttered by the speaker is de-noised, so that a high-quality voice signal is obtained.

At present, among the dual-microphone voice de-noising methods applied to mobile phones or tablet computers, those with good noise reduction generally require a large amount of computation and need a separate noise reduction chip to run complex algorithms. Simpler dual-microphone de-noising methods, on the other hand, often give unsatisfactory noise reduction with excessive residual noise, or impose strict requirements on the direction of the speech source, so that incorrect use cancels the voice signal itself.

To address the above problems, the present invention proposes a dual-microphone noise reduction algorithm for mobile phone terminals. The algorithm tracks noise quickly, effectively suppresses ambient noise, is robust to the direction of the sound source, and has low algorithmic complexity and a small amount of computation.

Summary of the invention

The first object of the present invention is to provide a voice de-noising method that tracks noise quickly, requires little computation and effectively suppresses ambient noise.

The second object of the present invention is to provide a voice noise reduction device that tracks noise quickly, requires little computation and effectively suppresses ambient noise.

In order to realize the first object of the present invention, the invention provides a voice de-noising method, comprising: performing a short-time Fourier transform on the time domain voice signal received by the first microphone to obtain the first frequency domain voice signal; performing a short-time Fourier transform on the time domain voice signal received by the second microphone to obtain the second frequency domain voice signal; performing a voice activation judgment on the first frequency domain voice signal; calculating the preliminary noise energy according to the first frequency domain voice signal; wherein a smoothing factor is calculated according to the result of the voice activation judgment and the frequency domain filter of the previous frame, the first energy spectrum of the first frequency domain voice signal is calculated according to the smoothing factor, and the cross-correlation energy spectrum between the first frequency domain voice signal and the second frequency domain voice signal is calculated according to the smoothing factor; calculating the cross-correlation transfer function according to the first energy spectrum, the cross-correlation energy spectrum and the distance between the first microphone and the second microphone; calculating the noise compensation factor according to the result of the voice activation judgment and the cross-correlation transfer function; calculating the preliminary frequency domain filter according to the noise compensation factor, the preliminary noise energy and the first energy spectrum; calculating the smoothing window length according to the preliminary frequency domain filter and the amplitude of the first frequency domain voice signal; convolving the preliminary frequency domain filter with the smoothing window length to obtain the frequency domain filter; filtering the first frequency domain voice signal with the frequency domain filter to output the filtered first frequency domain voice signal; and performing an inverse short-time Fourier transform on the filtered first frequency domain voice signal to output the filtered first time domain voice signal.

From the above scheme it can be seen that, according to the different results of the voice activation judgment, dynamic compensation is applied in the calculation of the preliminary noise energy and in the filter; at the same time, according to the cross-correlation transfer function between the two microphones, the voice signal or the ambient noise is sampled and filtered in the filter. The voice de-noising method of the present invention therefore tracks quickly, requires little computation and effectively suppresses ambient noise.

In a further scheme, the voice activation judgment step comprises: after the signal-to-noise ratio of the current frame is calculated according to the amplitude of the first frequency domain voice signal of the current frame and the preliminary noise energy, if the first frequency domain voice signal of the previous frame is a voice signal and the signal-to-noise ratio of the current frame is greater than the preset signal-to-noise ratio threshold, the first frequency domain voice signal of the current frame is judged to be a voice signal; if the first frequency domain voice signal of the previous frame is a noise signal, and the signal-to-noise ratio of the current frame and the signal-to-noise ratio of the previous frame are both greater than the preset signal-to-noise ratio threshold, the first frequency domain voice signal of the current frame is judged to be a voice signal; otherwise the first frequency domain voice signal of the current frame is judged to be a noise signal.

In a further scheme, after the signal-to-noise ratio of the current frame is calculated according to the amplitude of the first frequency domain voice signal of the current frame and the preliminary noise energy, if the current frame lies between the first frame and the twentieth frame, the first frequency domain voice signal of the current frame is judged to be a voice signal.

In a further scheme, if the first frequency domain voice signal of the previous frame is a voice signal and the signal-to-noise ratio of the current frame is less than or equal to the preset signal-to-noise ratio threshold, then, when more than three of the signal-to-noise ratios of the preceding 19 frames are greater than the preset signal-to-noise ratio threshold, the first frequency domain voice signal of the current frame is judged to be a voice signal; otherwise the first frequency domain voice signal of the current frame is judged to be a noise signal.

In a further scheme, the smoothing factor comprises a first smoothing factor and a second smoothing factor, and the step of calculating the smoothing factor according to the result of the voice activation judgment and the preliminary frequency domain filter of the previous frame comprises: when the first frequency domain voice signal of the current frame is a voice signal, calculating the first energy spectrum of the first frequency domain voice signal and the cross-correlation energy spectrum between the first frequency domain voice signal and the second frequency domain voice signal according to the first smoothing factor; when the first frequency domain voice signal of the current frame is a noise signal, calculating the first energy spectrum of the first frequency domain voice signal and the cross-correlation energy spectrum between the first frequency domain voice signal and the second frequency domain voice signal according to the second smoothing factor.

Therefore, according to the result of the voice activation judgment, different smoothing factors are used for speech frames and noise frames when calculating the energy spectrum and the cross-correlation energy spectrum, so that the smoothing factor provides dynamic compensation in the filter, with a short response time and a small amount of computation.

In order to realize the second object of the present invention, the invention provides a voice noise reduction device, comprising: a first short-time Fourier transform unit for receiving a time domain voice signal and outputting the first frequency domain voice signal; a second short-time Fourier transform unit for receiving a time domain voice signal and outputting the second frequency domain voice signal; a voice activation judging unit for judging whether the first frequency domain voice signal is a voice signal or a noise signal; and a preliminary noise energy computing unit for calculating the preliminary noise energy according to the amplitude of the first frequency domain voice signal. The voice noise reduction device further comprises an energy calculation unit, which comprises an energy spectrum computing module, a cross-correlation energy spectrum computing module and a smoothing factor computing module. The smoothing factor computing module outputs the smoothing factor coefficient to the energy spectrum computing module and the cross-correlation energy spectrum computing module; the energy spectrum computing module calculates the first energy spectrum according to the smoothing factor coefficient and the first frequency domain voice signal; and the cross-correlation energy spectrum computing module calculates the cross-correlation energy spectrum according to the smoothing factor coefficient, the first frequency domain voice signal and the second frequency domain voice signal. The device further comprises: a cross-correlation transfer function calculation unit for calculating the cross-correlation transfer function according to the first energy spectrum, the cross-correlation energy spectrum and the distance between the first microphone and the second microphone; a preliminary frequency domain filter computing unit for calculating the noise compensation factor according to the signal output by the voice activation judging unit and the signal output by the cross-correlation transfer function calculation unit, and for calculating the preliminary frequency domain filter according to the noise compensation factor, the preliminary noise energy and the first energy spectrum; an amending unit for calculating the smoothing window length according to the preliminary frequency domain filter and the amplitude of the first frequency domain voice signal; a frequency domain filter computing unit for convolving the preliminary frequency domain filter with the smoothing window length to obtain the frequency domain filter, the frequency domain filter computing unit also receiving the first frequency domain voice signal; and an inverse short-time Fourier transform unit for receiving the filtered first frequency domain voice signal output by the frequency domain filter computing unit and outputting the filtered first time domain voice signal.

From the above scheme it can be seen that the noise compensation factor is calculated according to the different judgment results output by the voice activation judging unit and dynamic compensation is applied in the filter; at the same time, according to the cross-correlation transfer function between the two microphones, the voice signal or the ambient noise is sampled and filtered in the filter. The voice de-noising method of the present invention therefore tracks noise quickly, requires little computation and effectively suppresses ambient noise.

Brief description of the drawings

Fig. 1 is the flow chart of voice de-noising method embodiment of the present invention.

Fig. 2 is the flow chart of voice activation judgement in voice de-noising method embodiment of the present invention.

Fig. 3 is the system block diagram of voice noise reduction device of the present invention.

The invention is further described below with reference to the drawings and embodiments.

Detailed description of the invention

Voice de-noising method embodiment:

Referring to Fig. 1, Fig. 1 is the flow chart of the voice de-noising method of the present invention. In the present embodiment, voice signals are picked up by two microphones, and the picked-up voice signals are de-noised by the voice de-noising method of the present invention.

The noise reduction method of the present invention is as follows. Step S1 is executed first: the time domain voice signals picked up by the two microphones are short-time Fourier transformed into frequency domain voice signals. Specifically, the time domain voice signal picked up by the first microphone is short-time Fourier transformed to obtain the first frequency domain voice signal, and the time domain voice signal picked up by the second microphone is short-time Fourier transformed to obtain the second frequency domain voice signal. Both short-time Fourier transforms in the present embodiment use a Hamming window, and the overlap between adjacent frames is 75%.
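Step S1 can be illustrated with a minimal sketch. The Hamming window and the 75% overlap come from the embodiment; the frame length of 8 samples, the naive DFT and the function names are assumptions chosen only to keep the example short and self-contained.

```python
import cmath
import math

def hamming(length):
    """Hamming window w[k] = 0.54 - 0.46 * cos(2*pi*k / (length - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
            for k in range(length)]

def dft(frame):
    """Naive O(N^2) discrete Fourier transform, adequate for illustration."""
    size = len(frame)
    return [sum(frame[k] * cmath.exp(-2j * math.pi * f * k / size)
                for k in range(size))
            for f in range(size)]

def stft(signal, frame_len=8):
    """Window each frame with a Hamming window, 75% overlap, then transform it."""
    hop = frame_len // 4              # 75% overlap: advance by a quarter frame
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append(dft([s * w for s, w in zip(chunk, window)]))
    return frames

# Hypothetical test signal standing in for one microphone's samples.
spectra = stft([math.sin(0.3 * t) for t in range(32)])
```

In practice a library FFT with a frame length such as 256 or 512 samples would replace the naive transform; the framing and windowing stay the same.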

Step S2 is then executed: a voice activation judgment is performed on the first frequency domain voice signal. The signal received by the microphone contains not only the voice uttered by the speaker but also ambient noise, and the purpose of the voice activation judgment is to correctly distinguish whether the current signal is a voice signal that also contains ambient noise or a signal that contains only ambient noise.

Referring to Fig. 2, Fig. 2 is the flow chart of the voice activation judgment step S2 of the present invention. Step S21 is first executed on the first frequency domain voice signal: the signal-to-noise ratio of the current frame is calculated from the first frequency domain voice signal of the current frame and the preliminary noise energy of the previous frame, according to formula 1.1.

In formula 1.1, SNR(n) is the signal-to-noise ratio of the current frame, NFFT is the Fourier transform length, f is the frequency index, n is the time frame index, X1 is the amplitude of the first frequency domain voice signal, and N is the preliminary noise energy.

Step S22 is then executed to judge whether n is greater than 20. If not, step S23 is executed and the first frequency domain voice signal of the current frame is judged to be a voice signal. Because the current frame is judged to be a speech frame, vad_flag(n) of the current frame is set to 1, i.e. vad_flag(n)=1.

When n > 20, step S24 is executed to judge whether the voice signal of the previous frame, i.e. frame n-1, is a speech frame, that is, whether vad_flag(n-1) equals 1.

When vad_flag(n-1)=1, step S25 is executed to judge whether the signal-to-noise ratio SNR(n) of the current frame is greater than the preset signal-to-noise ratio threshold SNR_THR. If so, step S251 is executed and the voice signal of the current frame is judged to be a speech frame, i.e. vad_flag(n)=1; if not, step S26 is executed.

When the signal-to-noise ratio SNR(n) of the current frame is less than or equal to the preset threshold SNR_THR, step S26 is executed to judge whether more than three of the signal-to-noise ratios of the preceding 19 frames are greater than the preset threshold. If so, step S261 is executed and the voice signal of the current frame is judged to be a speech frame, i.e. vad_flag(n)=1; otherwise step S262 is executed and the voice signal of the current frame is judged to be a noise frame, i.e. vad_flag(n)=0.

In step S24, if vad_flag(n-1) ≠ 1, step S27 is executed to judge whether the signal-to-noise ratio of the current frame and the signal-to-noise ratio of the previous frame are both greater than the preset threshold. If so, step S271 is executed and the voice signal of the current frame is judged to be a speech frame, i.e. vad_flag(n)=1; otherwise step S272 is executed and the voice signal of the current frame is judged to be a noise frame, i.e. vad_flag(n)=0.

This completes the voice activation judgment step S2: by calculating the signal-to-noise ratio of the current frame and comparing it with the preset signal-to-noise ratio threshold, the input voice signal of the current frame is correctly classified as a speech frame or a noise frame.
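The decision flow of steps S22 to S27 can be written as one small function. This is a sketch under assumptions: frames are counted from 1, the SNR values are taken as already computed (formula 1.1 is not reproduced in the code), and the function name, argument layout and threshold value are hypothetical.

```python
def vad_decision(n, snr, prev_flag, snr_thr):
    """Return 1 (speech) or 0 (noise) for frame n, counted from 1.

    snr[k - 1] holds SNR(k), so snr must cover frames 1..n.
    prev_flag is vad_flag(n-1); snr_thr plays the role of SNR_THR.
    """
    if n <= 20:                       # steps S22/S23: first 20 frames are speech
        return 1
    current, previous = snr[n - 1], snr[n - 2]
    if prev_flag == 1:                # step S24: previous frame was speech
        if current > snr_thr:         # step S25 -> S251
            return 1
        recent = snr[n - 20:n - 1]    # step S26: SNRs of frames n-19 .. n-1
        above = sum(1 for value in recent if value > snr_thr)
        return 1 if above > 3 else 0  # S261 / S262
    # step S27: previous frame was noise, so both SNRs must exceed the threshold
    return 1 if current > snr_thr and previous > snr_thr else 0
```

The count over the preceding 19 frames acts as a hangover: a speech decision is held for a few low-SNR frames after speech, while the two-frame condition in S27 delays the switch from noise back to speech by one frame.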

After step S2 is completed, step S3 is executed: the corresponding smoothing factor λ is calculated according to the judgment result of the voice activation judgment step S2 and the preliminary frequency domain filter of the previous frame. The smoothing factor λ is calculated as follows:

λ (n, f)=0.98-0.2 × G (n-1, f), vad_flag (n)=0 (formula 1.2)

λ (n, f)=0.90-0.15 × G (n-1, f), vad_flag (n)=1 (formula 1.3)

In formula 1.2 and formula 1.3, G(n-1, f) is the preliminary frequency domain filter of the previous frame.
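Formulas 1.2 and 1.3 translate directly into code. The constants are those of the formulas; the function name is hypothetical, and the assumption that the previous gain lies between 0 and 1 is not stated in the text.

```python
def smoothing_factor(vad_flag, prev_gain):
    """Smoothing factor lambda(n, f) from vad_flag(n) and G(n-1, f).

    prev_gain is assumed to lie in [0, 1] (an assumption, not stated in the text).
    """
    if vad_flag == 1:
        return 0.90 - 0.15 * prev_gain   # formula 1.3, speech frame
    return 0.98 - 0.2 * prev_gain        # formula 1.2, noise frame
```

A smaller λ gives more weight to the current frame, so speech frames, and bins where the previous gain was high, update the spectra faster.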

Step S4 is then executed: according to the judgment result for the voice signal of the current frame, the corresponding smoothing factor λ is used to calculate the energy spectrum of the first frequency domain voice signal and the energy spectrum of the second frequency domain voice signal. The energy spectra are calculated as follows:

P_X1X1(n, f) = λ(n, f) P_X1X1(n-1, f) + (1 - λ(n, f)) X₁(n, f) X₁*(n, f) (formula 1.4)

P_X2X2(n, f) = λ(n, f) P_X2X2(n-1, f) + (1 - λ(n, f)) X₂(n, f) X₂*(n, f) (formula 1.5)

In formula 1.4, P_X1X1 is the first energy spectrum of the first frequency domain voice signal, X₁(n, f) is the amplitude of the first frequency domain voice signal, and X₁*(n, f) is the conjugate of the amplitude of the first frequency domain voice signal. In formula 1.5, P_X2X2 is the second energy spectrum of the second frequency domain voice signal, X₂(n, f) is the amplitude of the second frequency domain voice signal, and X₂*(n, f) is the conjugate of the amplitude of the second frequency domain voice signal.

Step S5 is executed next: according to the judgment result for the voice signal of the current frame, the corresponding smoothing factor λ is used to calculate the cross-correlation energy spectrum between the first frequency domain voice signal and the second frequency domain voice signal. The cross-correlation energy spectrum is calculated as follows:

P_X1X2(n, f) = λ(n, f) P_X1X2(n-1, f) + (1 - λ(n, f)) X₁(n, f) X₂*(n, f) (formula 1.6)

In formula 1.6, P_X1X2 is the cross-correlation energy spectrum between the first frequency domain voice signal and the second frequency domain voice signal, X₁(n, f) is the amplitude of the first frequency domain voice signal, and X₂*(n, f) is the conjugate of the amplitude of the second frequency domain voice signal.

From formula 1.4, formula 1.5 and formula 1.6 it follows that, if the voice signal of the current frame is a speech frame, i.e. vad_flag(n)=1, the first energy spectrum, the second energy spectrum and the cross-correlation energy spectrum are calculated with λ(n, f) from formula 1.3.

If the voice signal of the current frame is a noise frame, i.e. vad_flag(n)=0, the first energy spectrum, the second energy spectrum and the cross-correlation energy spectrum are calculated with λ(n, f) from formula 1.2.

It can be seen that the picked-up voice signal is first classified, the corresponding energy calculation is applied according to the result, and the preliminary frequency domain filter of the previous frame is sampled, so that the energy spectrum of the voice signal is tracked quickly, with little computation and a good dynamic adjustment effect.
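The recursive updates of formulas 1.4 to 1.6 can be sketched for a single frequency bin as follows. The dictionary layout and function name are assumptions made for illustration; the smoothing factor is passed in already computed.

```python
def update_spectra(prev, x1, x2, lam):
    """One-bin update of the auto and cross energy spectra.

    prev: dict with the previous frame's 'p11', 'p22' and 'p12' for this bin.
    x1, x2: complex STFT coefficients X1(n, f) and X2(n, f).
    lam: smoothing factor lambda(n, f).
    """
    return {
        # formula 1.4: X1 * conj(X1) is real, so keep only the real part
        "p11": lam * prev["p11"] + (1 - lam) * (x1 * x1.conjugate()).real,
        # formula 1.5
        "p22": lam * prev["p22"] + (1 - lam) * (x2 * x2.conjugate()).real,
        # formula 1.6: the cross spectrum stays complex
        "p12": lam * prev["p12"] + (1 - lam) * x1 * x2.conjugate(),
    }

state = {"p11": 0.0, "p22": 0.0, "p12": 0j}
state = update_spectra(state, 1 + 1j, 1 - 1j, 0.75)
```

Each update costs a handful of multiplications per bin and needs only the previous frame's values, which is consistent with the low computation claimed for the method.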

Step S6 is then executed: the preliminary noise energy is calculated according to the amplitude of the first frequency domain voice signal. Step S6 comprises the following. First, the a priori signal-to-noise ratio is calculated from the first frequency domain voice signal according to formula 1.7.

In formula 1.7, ζ_DD is the a priori signal-to-noise ratio, |N(n-1, f)|² is the preliminary noise energy of the previous frame, and ε is a fixed smoothing factor that may take a value between 0 and 1, preferably ε=0.98.

Then the noise compensation amount is calculated from the a priori signal-to-noise ratio according to formula 1.8.

In formula 1.8, B(n, f) is the noise compensation amount and γ(u, v) is the gamma distribution function.

Finally, the preliminary noise energy of the current frame is calculated from the noise compensation amount B(n, f) according to formula 1.9.

In formula 1.9, |N(n, f)|² is the preliminary noise energy, β is a smoothing factor with the value 0.8, |N(n-1, f)|² is the preliminary noise energy of the previous frame, and the remaining term is the noise energy calculated by the minimum mean square error method.

It can be seen that the preliminary noise energy is calculated by sampling the preliminary noise energy of the previous frame and analyzing it with the gamma distribution function. Because de-noising a voice signal also suppresses part of the voice, which causes obvious voice distortion in subjective listening, adding the noise compensation amount to the preliminary noise energy strikes a balance between noise suppression and voice distortion. This improves the subjective listening experience: the voice distortion is not obvious in subjective listening, and the remaining noise level does not affect auditory perception.

Step S7 is executed next: the cross-correlation transfer function is calculated from the first energy spectrum and the cross-correlation energy spectrum according to formula 1.10 and formula 1.11.

In formula 1.10 and formula 1.11, Γ(n, f) is the correlation function between the two microphones in a reverberant field, d is the distance between the two microphones, c is the speed at which sound propagates in air, and H₁₂(n, f) is the cross-correlation transfer function.
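For orientation, the textbook model of the coherence between two omnidirectional microphones in a diffuse (reverberant) sound field depends only on the spacing d and the speed of sound c, as sketched below. Whether formula 1.10 takes exactly this form is an assumption, since the formula itself is not reproduced in this text; the function name, the use of frequency in Hz rather than a bin index, and the default value of c are likewise assumptions.

```python
import math

def diffuse_coherence(freq_hz, mic_distance_m, sound_speed=343.0):
    """Diffuse-field coherence sin(x) / x with x = 2*pi*f*d/c.

    Textbook model only; not confirmed to be the patent's formula 1.10.
    """
    x = 2 * math.pi * freq_hz * mic_distance_m / sound_speed
    return 1.0 if x == 0 else math.sin(x) / x
```

Under this model the noise at the two microphones is strongly correlated at low frequencies and nearly uncorrelated once f exceeds c / (2d), which is why the microphone spacing enters the calculation.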

It can be seen that, for the calculation of the cross-correlation transfer function, the two microphones are located close to the speaker's vocal organs and away from the speaker's vocal organs respectively. The dual microphones locate the voice source, so that samples of the speech uttered by the speaker and samples of the ambient noise are obtained; the sound picked up by the microphone, which contains both the uttered voice signal and ambient noise, can then be analyzed and filtered effectively, so that a high-quality voice signal is obtained.

Step S8 is then executed: the preliminary frequency domain filter is calculated from the first energy spectrum, the noise compensation factor and the preliminary noise energy according to formula 1.12.

In formula 1.12, G(n, f) is the preliminary frequency domain filter and α|H₁₂(n, f)|² is the noise compensation factor. If the voice signal of the current frame is a speech frame, i.e. vad_flag(n)=1, α=2.5 is set. If the voice signal of the current frame is a noise frame, i.e. vad_flag(n)=0, α=1 is set. Through α, different compensation is applied depending on the judgment of the current voice signal: when the current voice signal is a speech frame, no compensation is needed and α is set to 1; when the current voice signal is a noise frame, α is set to 2.5 to apply a certain compensation.

Step S9 is then executed: the preliminary frequency domain filter is de-noised to obtain the de-noised frequency domain filter. Step S9 is specifically as follows.

First, the system noise produced by the noise reduction method itself is calculated as the ratio of the amplitude of the input first frequency domain voice signal to the amplitude of the first frequency domain voice signal after filtering by the preliminary frequency domain filter, according to formula 1.13.

As formula 1.13 shows, η(n) is the ratio of the input signal to the output signal, i.e. the system noise ratio produced in the system.

Then the smoothing window length of the preliminary filter is calculated from the system noise ratio according to formula 1.14.

In formula 1.14, η_thr is the predetermined system noise ratio threshold of the filter, with η_thr=0.4, and L(n) is the smoothing window length.

Finally, the preliminary frequency domain filter G(n, f) is convolved with the smoothing window length L(n) to obtain the de-noised frequency domain filter G_PF(n, f), which is calculated as follows:

G_PF(n, f)=G (n, f) * L (n) (formula 1.15)

It can be seen that the smoothing window de-noises the preliminary frequency domain filter further, so that the filtering effect of the frequency domain filter G_PF(n, f) is smoother.
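The smoothing of formula 1.15 can be sketched as a moving average over frequency bins. The text gives only the window length L(n), so the rectangular window shape, its normalization and the handling of the band edges are assumptions, as is the function name.

```python
def smooth_gain(gain, window_len):
    """Smooth per-bin gains G(n, f) along frequency.

    In the interior this equals convolution with a normalized rectangular
    window of odd length window_len (assumed shape); near the band edges
    the window is truncated and renormalized.
    """
    half = window_len // 2
    smoothed = []
    for f in range(len(gain)):
        lo, hi = max(0, f - half), min(len(gain), f + half + 1)
        smoothed.append(sum(gain[lo:hi]) / (hi - lo))
    return smoothed

g_pf = smooth_gain([0.0, 0.0, 1.0, 0.0, 0.0], 3)
```

With a window length of 1 the gains pass through unchanged; longer windows spread isolated gain peaks over neighboring bins, which is the kind of smoothing that reduces isolated spectral peaks in the output.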

Step S10 is then executed: the first frequency domain voice signal is input to the frequency domain filter G_PF(n, f) for filtering, which yields the filtered first frequency domain voice signal. Step S11 is executed next: the filtered first frequency domain voice signal is inverse Fourier transformed back into the filtered time domain voice signal. Finally, step S12 is executed and the filtered time domain voice signal is output.

From the above scheme it can be seen that, according to the result of the voice activation judgment, different smoothing factors are used for speech frames and noise frames when calculating the energy spectrum and the cross-correlation energy spectrum; according to the same judgment result, dynamic compensation is applied in the calculation of the preliminary noise energy and in the filter; and, according to the cross-correlation transfer function between the two microphones, the voice signal or the ambient noise is sampled and filtered in the filter. The voice de-noising method of the present invention therefore tracks quickly, requires little computation and effectively suppresses ambient noise, and it does not require a dedicated independent noise reduction chip.

Voice noise reduction device embodiment:

Referring to Fig. 3, Fig. 3 is the system block diagram of the voice noise reduction device of the present invention. The voice noise reduction device comprises a short-time Fourier transform unit 31, a short-time Fourier transform unit 32, a voice activation judging unit 33, an energy estimation unit 34, a preliminary noise energy computing unit 35, a cross-correlation transfer function calculation unit 36, a preliminary frequency domain filter computing unit 37, an amending unit 38, a frequency domain filter computing unit 39 and an inverse short-time Fourier transform unit 40.

The short-time Fourier transform unit 31 receives the time domain voice signal output by the first microphone and short-time Fourier transforms it to output the first frequency domain voice signal. The short-time Fourier transform unit 32 receives the time domain voice signal output by the second microphone and short-time Fourier transforms it to output the second frequency domain voice signal. The voice activation judging unit 33 receives the first frequency domain voice signal output by the short-time Fourier transform unit 31, performs the voice activation judgment on it, and outputs the judgment result to the energy estimation unit 34.

The energy estimation unit 34 comprises an energy spectrum computing module 341, a cross-correlation energy spectrum computing module 342 and a smoothing factor computing module 343. The energy estimation unit 34 receives the first frequency domain voice signal, the second frequency domain voice signal and the voice activation judgment result. The smoothing factor computing module 343 adjusts the smoothing factor coefficient according to the feedback signal output by the preliminary frequency domain filter computing unit 37 and the voice activation judgment result. The energy spectrum computing module 341 calculates the first energy spectrum of the first frequency domain voice signal and the second energy spectrum of the second frequency domain voice signal according to the smoothing factor coefficient, the first frequency domain voice signal and the second frequency domain voice signal. The cross-correlation energy spectrum computing module 342 calculates the cross-correlation energy spectrum between the first frequency domain voice signal and the second frequency domain voice signal according to the smoothing factor coefficient, the first frequency domain voice signal and the second frequency domain voice signal.

The preliminary noise energy computing unit 35 receives the first frequency domain voice signal and calculates the preliminary noise energy according to the amplitude of the first frequency domain voice signal. It outputs the preliminary noise energy to the preliminary frequency domain filter computing unit 37 and to the cross-correlation transfer function calculation unit 36.

The cross-correlation transfer function calculation unit 36 receives the first energy spectrum, the cross-correlation energy spectrum and the preliminary noise energy, calculates the cross-correlation transfer function, and outputs the cross-correlation transfer function to the preliminary frequency domain filter computing unit 37.

The preliminary frequency domain filter computing unit 37 calculates the noise compensation factor according to the signal output by the voice activation judging unit 33 and the signal output by the cross-correlation transfer function calculation unit 36, and calculates the preliminary frequency domain filter according to the noise compensation factor, the preliminary noise energy and the first energy spectrum.

The amending unit 38 receives the signal output by the preliminary frequency-domain filter computing unit 37 and calculates the smoothing window length according to that signal and the amplitude of the first frequency-domain voice signal.

The frequency-domain filter computing unit 39 receives the preliminary frequency-domain filter calculated by the preliminary frequency-domain filter computing unit 37 and the smoothing window length calculated by the amending unit 38, and convolves the preliminary frequency-domain filter according to this smoothing window length, thereby obtaining a frequency-domain filter with a smooth filtering curve.
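The convolution step can be sketched as a moving average of the preliminary gains along the frequency axis. The rectangular window shape, the odd window length and the truncated handling of the band edges are assumptions of this sketch; the smoothing window length is taken as an input because its formula is not given in this section.

```python
def smooth_gain(gains, window_len):
    """Smooth per-bin gains with a rectangular window (assumed shape).

    gains      -- preliminary frequency-domain filter, one gain per bin
    window_len -- smoothing window length in bins; must be odd and >= 1
    """
    if window_len < 1 or window_len % 2 == 0:
        raise ValueError("window_len must be a positive odd integer")
    half = window_len // 2
    smoothed = []
    for k in range(len(gains)):
        # Truncate the window at the band edges and renormalize.
        lo = max(0, k - half)
        hi = min(len(gains), k + half + 1)
        smoothed.append(sum(gains[lo:hi]) / (hi - lo))
    return smoothed
```

A window length of 1 leaves the preliminary filter unchanged, while longer windows flatten isolated peaks in the gain curve, which is the usual motivation for smoothing a spectral gain.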

The frequency-domain filter computing unit 39 also receives the first frequency-domain voice signal output by the short-time Fourier transform unit 31, filters the first frequency-domain voice signal with the frequency-domain filter, and outputs the filtered first frequency-domain voice signal.

The short-time inverse Fourier transform unit 40 receives the filtered first frequency-domain voice signal and applies a short-time inverse Fourier transform to it, restoring it to the filtered first time-domain voice signal, which it then outputs.
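The filtering and inverse-transform stages can be illustrated end to end with a small overlap-add sketch. The periodic Hann window, the 50% overlap, the frame length and the naive DFT are assumptions chosen to keep the example self-contained. For brevity the gain is held fixed across frames, whereas the device described above recomputes the frequency-domain filter for every frame.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def filter_and_resynthesize(signal, gain, frame_len=8):
    """Window, transform, apply a per-bin gain, invert, and overlap-add.

    gain must hold frame_len real values with gain[k] == gain[frame_len - k]
    so that the filtered frames stay real-valued.
    """
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by half a frame sum to one.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        filtered = [g * s for g, s in zip(gain, dft(frame))]
        for i, value in enumerate(idft(filtered)):
            out[start + i] += value
    return out
```

With a unit gain the samples covered by two overlapping frames are reconstructed exactly, which is a convenient sanity check before a real frequency-domain filter is substituted.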

As can be seen from the above scheme, the voice noise reduction device of the present invention filters and de-noises the voice signals received by the two microphones and outputs a high-quality voice signal. The device tracks noise quickly, requires little computation and effectively suppresses ambient noise, so it can efficiently suppress noise in the input voice signal and improve its speech quality.

Claims

1. A voice de-noising method, comprising:

performing a short-time Fourier transform on the time-domain voice signal received by a first microphone to obtain a first frequency-domain voice signal;

performing a short-time Fourier transform on the time-domain voice signal received by a second microphone to obtain a second frequency-domain voice signal;

performing a voice activation judgment on the first frequency-domain voice signal, the voice activation judgment step setting the first frequency-domain voice signal of the current frame as a speech frame or a noise frame;

calculating a preliminary noise energy according to the first frequency-domain voice signal;

characterized by:

calculating a smoothing factor according to the result of the voice activation judgment and the preliminary frequency-domain filter of the previous frame, calculating a first energy spectrum of the first frequency-domain voice signal according to the smoothing factor, and calculating a cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal according to the smoothing factor;

calculating a cross-correlation transfer function according to the first energy spectrum, the cross-correlation energy spectrum, and the distance between the first microphone and the second microphone;

calculating a noise compensation factor according to the result of the voice activation judgment and the cross-correlation transfer function;

calculating a preliminary frequency-domain filter according to the noise compensation factor, the preliminary noise energy and the first energy spectrum;

calculating a smoothing window length according to the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal;

convolving the preliminary frequency-domain filter according to the smoothing window length to obtain a frequency-domain filter;

filtering the first frequency-domain voice signal with the frequency-domain filter and outputting the filtered first frequency-domain voice signal;

performing a short-time inverse Fourier transform on the filtered first frequency-domain voice signal and outputting the filtered first time-domain voice signal.

2. The voice de-noising method according to claim 1, characterized in that:

the voice activation judgment step comprises: after calculating the signal-to-noise ratio of the current frame according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy of the previous frame, if the first frequency-domain voice signal of the previous frame is a speech frame and the signal-to-noise ratio of the current frame is judged to be greater than a preset signal-to-noise ratio threshold, setting the first frequency-domain voice signal of the current frame as a speech frame;

if the first frequency-domain voice signal of the previous frame is a noise frame, and the signal-to-noise ratio of the current frame is judged to be greater than the preset signal-to-noise ratio threshold and the signal-to-noise ratio of the previous frame is greater than the preset signal-to-noise ratio threshold, setting the first frequency-domain voice signal of the current frame as a speech frame, and otherwise setting the first frequency-domain voice signal of the current frame as a noise frame.

3. The voice de-noising method according to claim 2, characterized in that:

after the signal-to-noise ratio of the current frame is calculated according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy, when the current frame is judged to lie between the first frame and the twentieth frame, the first frequency-domain voice signal of the current frame is judged to be a speech frame.

4. The voice de-noising method according to claim 3, characterized in that:

if the first frequency-domain voice signal of the previous frame is a speech frame and the signal-to-noise ratio of the current frame is judged to be less than or equal to the preset signal-to-noise ratio threshold, then when more than three of the signal-to-noise ratios of the preceding 19 frames are judged to be greater than the preset signal-to-noise ratio threshold, the first frequency-domain voice signal of the current frame is set as a speech frame; otherwise, the first frequency-domain voice signal of the current frame is judged to be a noise frame.

5. The voice de-noising method according to claim 4, characterized in that:

the smoothing factor comprises a first smoothing factor and a second smoothing factor, and the step of calculating the smoothing factor according to the result of the voice activation judgment and the preliminary frequency-domain filter of the previous frame comprises:

if the first frequency-domain voice signal of the current frame is a speech frame, calculating the first energy spectrum of the first frequency-domain voice signal according to the first smoothing factor, and calculating the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal according to the first smoothing factor;

if the first frequency-domain voice signal of the current frame is a noise frame, calculating the first energy spectrum of the first frequency-domain voice signal according to the second smoothing factor, and calculating the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal according to the second smoothing factor.

6. A voice noise reduction device, comprising:

a first short-time Fourier transform unit, configured to receive a time-domain voice signal and output a first frequency-domain voice signal;

a second short-time Fourier transform unit, configured to receive a time-domain voice signal and output a second frequency-domain voice signal;

a voice activation judging unit, configured to judge whether the first frequency-domain voice signal is a speech frame or a noise frame;

a preliminary noise energy computing unit, configured to calculate a preliminary noise energy according to the amplitude of the first frequency-domain voice signal;

characterized in that:

the voice noise reduction device further comprises:

an energy calculation unit, comprising an energy spectrum computing module, a cross-correlation energy spectrum computing module and a smoothing factor computing module, wherein the smoothing factor computing module outputs a smoothing factor coefficient to the energy spectrum computing module and the cross-correlation energy spectrum computing module, the energy spectrum computing module calculates a first energy spectrum according to the smoothing factor coefficient and the first frequency-domain voice signal, and the cross-correlation energy spectrum computing module calculates a cross-correlation energy spectrum according to the smoothing factor coefficient, the first frequency-domain voice signal and the second frequency-domain voice signal;

a cross-correlation transfer function computing unit, configured to calculate a cross-correlation transfer function according to the first energy spectrum, the cross-correlation energy spectrum, and the distance between a first microphone and a second microphone;

a preliminary frequency-domain filter computing unit, configured to calculate a noise compensation factor according to the signal output by the voice activation judging unit and the signal output by the cross-correlation transfer function computing unit, and to calculate a preliminary frequency-domain filter according to the noise compensation factor, the preliminary noise energy and the first energy spectrum;

an amending unit, configured to calculate a smoothing window length according to the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal;

a frequency-domain filter computing unit, configured to convolve the preliminary frequency-domain filter according to the smoothing window length to obtain a frequency-domain filter, the frequency-domain filter computing unit being further configured to receive the first frequency-domain voice signal;

a short-time inverse Fourier transform unit, configured to receive the filtered first frequency-domain voice signal output by the frequency-domain filter computing unit and to output the filtered first time-domain voice signal.

7. The voice noise reduction device according to claim 6, characterized in that:

the voice activation judging unit is further configured to: after calculating the signal-to-noise ratio of the current frame according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy of the previous frame, if the first frequency-domain voice signal of the previous frame is a speech frame and the signal-to-noise ratio of the current frame is judged to be greater than a preset signal-to-noise ratio threshold, set the first frequency-domain voice signal of the current frame as a speech frame;

8. The voice noise reduction device according to claim 7, characterized in that:

the voice activation judging unit is further configured to: after calculating the signal-to-noise ratio of the current frame according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy of the previous frame, when the current frame is judged to lie between the first frame and the twentieth frame, judge the first frequency-domain voice signal of the current frame to be a speech frame.

9. The voice noise reduction device according to claim 8, characterized in that:

the voice activation judging unit is further configured to: if the first frequency-domain voice signal of the previous frame is a speech frame and the signal-to-noise ratio of the current frame is judged to be less than or equal to the preset signal-to-noise ratio threshold, then when more than three of the signal-to-noise ratios of the preceding 19 frames are judged to be greater than the preset signal-to-noise ratio threshold, set the first frequency-domain voice signal of the current frame as a speech frame; otherwise, judge the first frequency-domain voice signal of the current frame to be a noise frame.

10. The voice noise reduction device according to claim 9, characterized in that:

the smoothing factor coefficient comprises a first smoothing factor coefficient and a second smoothing factor coefficient, and the voice activation judging unit is further configured such that:

if the first frequency-domain voice signal of the current frame is a speech frame, the first energy spectrum of the first frequency-domain voice signal is calculated according to the first smoothing factor coefficient, and the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal is calculated according to the first smoothing factor coefficient;

if the first frequency-domain voice signal of the current frame is a noise frame, the first energy spectrum of the first frequency-domain voice signal is calculated according to the second smoothing factor coefficient, and the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal is calculated according to the second smoothing factor coefficient.
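
For illustration only, the frame-classification rules recited in claims 2 to 4 can be sketched as a small state machine. The signal-to-noise ratio is taken as an input because its formula is not given in this section; the class name, the method name and the example threshold are hypothetical.

```python
from collections import deque


class VoiceActivityJudge:
    """Speech/noise decision per frame, following claims 2 to 4."""

    def __init__(self, snr_threshold, warmup_frames=20, history_len=19, min_hits=3):
        self.snr_threshold = snr_threshold
        self.warmup_frames = warmup_frames   # frames 1..20 are speech frames
        self.min_hits = min_hits             # "more than three" earlier hits
        self.snr_history = deque(maxlen=history_len)  # preceding 19 SNRs
        self.prev_is_speech = False
        self.frame_index = 0

    def judge(self, snr):
        """Return True if the current frame is judged to be a speech frame."""
        self.frame_index += 1
        above = snr > self.snr_threshold
        if self.frame_index <= self.warmup_frames:
            is_speech = True
        elif self.prev_is_speech:
            if above:
                is_speech = True
            else:
                # Hangover: stay in speech if enough recent frames were loud.
                hits = sum(1 for s in self.snr_history if s > self.snr_threshold)
                is_speech = hits > self.min_hits
        else:
            # Leaving noise requires two consecutive frames above threshold.
            prev_above = bool(self.snr_history) and self.snr_history[-1] > self.snr_threshold
            is_speech = above and prev_above
        self.snr_history.append(snr)
        self.prev_is_speech = is_speech
        return is_speech
```

Requiring two consecutive high-SNR frames before leaving the noise state, and a history check before leaving the speech state, keeps isolated spikes and short dips from toggling the decision.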