CN103700375A - Voice noise-reducing method and voice noise-reducing device - Google Patents


Info

Publication number
CN103700375A
Authority
CN
China
Prior art keywords
frequency domain
voice signal
noise
signal
ratio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310743950.9A
Other languages
Chinese (zh)
Other versions
CN103700375B (en)
Inventor
吴东海
张林山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Allwinner Technology Co Ltd
Original Assignee
Allwinner Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Allwinner Technology Co Ltd filed Critical Allwinner Technology Co Ltd
Priority to CN201310743950.9A priority Critical patent/CN103700375B/en
Publication of CN103700375A publication Critical patent/CN103700375A/en
Publication of CN103700375B publication Critical patent/CN103700375B/en

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a voice noise reduction method and a voice noise reduction device. The method comprises: calculating a smoothing factor according to a voice activation judgment result; calculating, according to the smoothing factor, a first energy spectrum of a first frequency-domain voice signal and a cross-correlation energy spectrum, and calculating a cross-correlation transfer function from the first energy spectrum and the cross-correlation energy spectrum; calculating a preliminary frequency-domain filter from a noise compensation factor, a preliminary noise energy and the first energy spectrum; calculating a smoothing window length from the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal; obtaining a frequency-domain filter by convolving the preliminary frequency-domain filter with a smoothing window of that length; and outputting the filtered first frequency-domain voice signal after the first frequency-domain voice signal has been filtered by the frequency-domain filter. The invention also provides a voice noise reduction device for implementing the method. The method and the device track noise quickly, require little computation, and effectively suppress environmental noise.

Description

Voice noise reduction method and device
Technical field
The present invention relates to the technical field of audio signal processing, and in particular to a method and a device for reducing noise in a voice signal.
Background art
With the development of technology, people demand ever higher sound quality, especially for voice calls. To guarantee clear, noise-free and natural-sounding calls, voice noise reduction is indispensable in communication devices such as mobile phones and tablet computers. Taking a mobile phone or tablet computer as an example, products currently on the market generally carry two microphones. One microphone is mounted close to the user's mouth and picks up both the user's voice and the environmental noise; the other is mounted away from the user's mouth and picks up the environmental noise. The sound picked up by the two microphones is analyzed and noise reduction is applied to the user's voice signal, yielding a high-quality voice signal.
At present, among the dual-microphone voice noise reduction methods applied to mobile phones and tablet computers, those that reduce noise well generally require heavy computation, so a separate noise reduction chip is needed to run the complex algorithm. Simpler dual-microphone methods often perform poorly: the residual noise is excessive, or the method places strict requirements on the direction of the speech source, so that incorrect use cancels the voice signal itself.
To address these problems, the present invention proposes a dual-microphone noise reduction algorithm for mobile terminals. The algorithm tracks noise quickly, effectively suppresses environmental noise, is robust to the direction of the sound source, and has low complexity and a small computational load.
Summary of the invention
The first object of the present invention is to provide a voice noise reduction method that tracks noise quickly, requires little computation and effectively suppresses environmental noise.
The second object of the present invention is to provide a voice noise reduction device that tracks noise quickly, requires little computation and effectively suppresses environmental noise.
To achieve the first object, the invention provides a voice noise reduction method comprising: performing a short-time Fourier transform on the time-domain voice signal received by a first microphone to obtain a first frequency-domain voice signal; performing a short-time Fourier transform on the time-domain voice signal received by a second microphone to obtain a second frequency-domain voice signal; performing voice activation judgment on the first frequency-domain voice signal; and calculating a preliminary noise energy from the first frequency-domain voice signal. The method further comprises: calculating a smoothing factor from the result of the voice activation judgment and the frequency-domain filter of the previous frame; calculating, according to the smoothing factor, a first energy spectrum of the first frequency-domain voice signal and a cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal; calculating a cross-correlation transfer function from the first energy spectrum, the cross-correlation energy spectrum and the distance between the first microphone and the second microphone; calculating a noise compensation factor from the result of the voice activation judgment and the cross-correlation transfer function; calculating a preliminary frequency-domain filter from the noise compensation factor, the preliminary noise energy and the first energy spectrum; calculating a smoothing window length from the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal; convolving the preliminary frequency-domain filter with a smoothing window of that length to obtain a frequency-domain filter; filtering the first frequency-domain voice signal with the frequency-domain filter and outputting the filtered first frequency-domain voice signal; and performing an inverse short-time Fourier transform on the filtered first frequency-domain voice signal and outputting the filtered first time-domain voice signal.
As the above scheme shows, the preliminary noise energy is calculated and dynamically compensated in the filter according to the result of the voice activation judgment, while the cross-correlation transfer function between the two microphones is used to sample the voice signal or the environmental noise and filter it in the filter. As a result, the voice noise reduction method of the present invention tracks noise quickly, requires little computation and effectively suppresses environmental noise.
In a further scheme, the voice activation judgment step comprises: calculating the signal-to-noise ratio (SNR) of the current frame from the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy. If the first frequency-domain voice signal of the previous frame is a voice signal and the SNR of the current frame is greater than a preset SNR threshold, the first frequency-domain voice signal of the current frame is judged to be a voice signal. If the first frequency-domain voice signal of the previous frame is a noise signal, and both the SNR of the current frame and the SNR of the previous frame are greater than the preset SNR threshold, the first frequency-domain voice signal of the current frame is judged to be a voice signal. Otherwise, the first frequency-domain voice signal of the current frame is judged to be a noise signal.
In a further scheme, after the SNR of the current frame has been calculated from the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy, if the current frame lies between the 1st frame and the 20th frame, the first frequency-domain voice signal of the current frame is judged to be a voice signal.
In a further scheme, if the first frequency-domain voice signal of the previous frame is a voice signal and the SNR of the current frame is less than or equal to the preset SNR threshold, the first frequency-domain voice signal of the current frame is judged to be a voice signal when more than three of the SNRs of the preceding 19 frames are greater than the preset SNR threshold, and is otherwise judged to be a noise signal.
In a further scheme, the smoothing factor comprises a first smoothing factor and a second smoothing factor, and the step of calculating the smoothing factor from the result of the voice activation judgment and the preliminary frequency-domain filter of the previous frame comprises: when the first frequency-domain voice signal of the current frame is a voice signal, calculating the first energy spectrum of the first frequency-domain voice signal and the cross-correlation energy spectrum between the first and second frequency-domain voice signals according to the first smoothing factor; when the first frequency-domain voice signal of the current frame is a noise signal, calculating the first energy spectrum and the cross-correlation energy spectrum according to the second smoothing factor.
Thus, depending on the result of the voice activation judgment, different smoothing factors are used for speech frames and noise frames when calculating the energy spectrum and the cross-correlation energy spectrum. The smoothing factor therefore provides dynamic compensation in the filter, with a short response time and little computation.
To achieve the second object, the invention provides a voice noise reduction device comprising: a first short-time Fourier transform unit for receiving a time-domain voice signal and outputting a first frequency-domain voice signal; a second short-time Fourier transform unit for receiving a time-domain voice signal and outputting a second frequency-domain voice signal; a voice activation judging unit for judging whether the first frequency-domain voice signal is a voice signal or a noise signal; and a preliminary noise energy computing unit for calculating a preliminary noise energy from the amplitude of the first frequency-domain voice signal. The device further comprises: an energy calculation unit comprising an energy spectrum computing module, a cross-correlation energy spectrum computing module and a smoothing factor computing module, wherein the smoothing factor computing module outputs a smoothing factor coefficient to the energy spectrum computing module and the cross-correlation energy spectrum computing module, the energy spectrum computing module calculates the first energy spectrum from the smoothing factor coefficient and the first frequency-domain voice signal, and the cross-correlation energy spectrum computing module calculates the cross-correlation energy spectrum from the smoothing factor coefficient, the first frequency-domain voice signal and the second frequency-domain voice signal; a cross-correlation transfer function computing unit for calculating a cross-correlation transfer function from the first energy spectrum, the cross-correlation energy spectrum and the distance between the first microphone and the second microphone; a preliminary frequency-domain filter computing unit for calculating a noise compensation factor from the signal output by the voice activation judging unit and the signal output by the cross-correlation transfer function computing unit, and for calculating a preliminary frequency-domain filter from the noise compensation factor, the preliminary noise energy and the first energy spectrum; a correction unit for calculating a smoothing window length from the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal; a frequency-domain filter computing unit for convolving the preliminary frequency-domain filter with a smoothing window of that length to obtain a frequency-domain filter, and for receiving the first frequency-domain voice signal; and an inverse short-time Fourier transform unit for receiving the filtered first frequency-domain voice signal output by the frequency-domain filter computing unit and outputting the filtered first time-domain voice signal.
As the above scheme shows, the noise compensation factor is calculated according to the judgment result output by the voice activation judging unit and is used for dynamic compensation in the filter, while the cross-correlation transfer function between the two microphones is used to sample the voice signal or the environmental noise and filter it in the filter. As a result, the voice noise reduction device of the present invention tracks noise quickly, requires little computation and effectively suppresses environmental noise.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the voice noise reduction method of the present invention.
Fig. 2 is a flowchart of the voice activation judgment in the embodiment of the voice noise reduction method of the present invention.
Fig. 3 is a system block diagram of the voice noise reduction device of the present invention.
The invention is further described below with reference to the drawings and embodiments.
Detailed description of the embodiments
Voice noise reduction method embodiment:
Referring to Fig. 1, which is a flowchart of the voice noise reduction method of the present invention. In this embodiment, two microphones pick up voice signals, and the voice noise reduction method of the present invention is applied to the signals picked up.
The noise reduction method proceeds as follows. First, step S1 is performed: the time-domain voice signals picked up by the two microphones are transformed to the frequency domain by a short-time Fourier transform. Specifically, a short-time Fourier transform is applied to the time-domain voice signal picked up by the first microphone to obtain the first frequency-domain voice signal, and to the time-domain voice signal picked up by the second microphone to obtain the second frequency-domain voice signal. In this embodiment, both short-time Fourier transforms use a Hamming window, and adjacent frames overlap by 75%.
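Step S1 can be sketched with NumPy as follows. This is a minimal illustration rather than the patent's implementation: the function name `stft` and the 256-sample frame length are assumptions, while the Hamming window and the 75% overlap come from the embodiment.

```python
import numpy as np

def stft(signal, frame_len=256, overlap=0.75):
    """Short-time Fourier transform with a Hamming window.

    frame_len is a hypothetical choice; the Hamming window and the 75%
    overlap follow the embodiment. Assumes len(signal) >= frame_len.
    """
    signal = np.asarray(signal, dtype=float)
    hop = int(frame_len * (1.0 - overlap))   # 64 samples for 256-sample frames
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and window each one.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One row per frame n, one column per frequency bin f.
    return np.fft.rfft(frames, axis=1)
```

The same function would be applied to each microphone signal to obtain the first and second frequency-domain voice signals.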
Then step S2 is performed: voice activation judgment is carried out on the first frequency-domain voice signal. The signal received by a microphone contains not only the voice produced by the user but also environmental noise. The purpose of the voice activation judgment is to distinguish correctly whether the current signal is a voice signal that also contains environmental noise or a signal that contains only environmental noise.
Referring to Fig. 2, which is a flowchart of the voice activation judgment step S2. First, step S21 is performed on the first frequency-domain voice signal: the SNR of the current frame is calculated from the first frequency-domain voice signal of the current frame and the preliminary noise energy of the previous frame. The SNR is calculated as follows:
[Equation image not reproduced] (formula 1.1)
In formula 1.1, SNR(n) is the SNR of the current frame, NFFT is the number of Fourier transform points, f is the frequency index, n is the time frame index, X1 is the amplitude of the first frequency-domain voice signal, and N is the preliminary noise energy.
Then step S22 is performed: it is judged whether n is greater than 20. If not, step S23 is performed, that is, the first frequency-domain voice signal of the current frame is judged to be a voice signal. Because the current frame is judged to be a speech frame, vad_flag(n) is set to 1, i.e. vad_flag(n) = 1.
When n > 20, step S24 is performed: it is judged whether the previous frame, i.e. frame n-1, is a speech frame, that is, whether vad_flag(n-1) equals 1.
When vad_flag(n-1) = 1, step S25 is performed: it is judged whether the SNR of the current frame, SNR(n), is greater than the preset SNR threshold SNR_THR. If so, step S251 is performed and the current frame is judged to be a speech frame, i.e. vad_flag(n) = 1. If not, step S26 is performed.
When SNR(n) is less than or equal to the preset SNR threshold SNR_THR, step S26 is performed: it is judged whether more than three of the SNRs of the preceding 19 frames are greater than the preset SNR threshold. If so, step S261 is performed and the current frame is judged to be a speech frame, i.e. vad_flag(n) = 1. Otherwise, step S262 is performed and the current frame is judged to be a noise frame, i.e. vad_flag(n) = 0.
If vad_flag(n-1) ≠ 1 in step S24, step S27 is performed: when the SNR of the current frame is greater than the preset SNR threshold and the SNR of the previous frame is also greater than the preset SNR threshold, step S271 is performed and the current frame is judged to be a speech frame, i.e. vad_flag(n) = 1. Otherwise, step S272 is performed and the current frame is judged to be a noise frame, i.e. vad_flag(n) = 0.
This completes the voice activation judgment step S2. By calculating the SNR of the current frame and comparing it with the preset SNR threshold, the input signal of the current frame is correctly classified as a speech frame or a noise frame.
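The decision logic of steps S22 to S27 can be sketched in Python as follows. The sketch takes the per-frame SNR values as given, because formula 1.1 is not reproduced here. The function name `vad_decision` and its parameter names are hypothetical; the 20-frame start-up period, the 19-frame history and the "more than three" rule come from the description.

```python
def vad_decision(snr, snr_thr, warmup=20, history=19, min_hits=3):
    """Return one vad_flag per frame (1 = speech frame, 0 = noise frame).

    snr[k] holds SNR(n) for frame n = k + 1, since the description numbers
    frames from 1. snr_thr plays the role of SNR_THR.
    """
    flags = []
    for k, s in enumerate(snr):
        n = k + 1
        if n <= warmup:
            flag = 1                                  # S23: first 20 frames
        elif flags[k - 1] == 1:                       # S24: previous frame was speech
            if s > snr_thr:                           # S25 -> S251
                flag = 1
            else:                                     # S26: look at the preceding 19 frames
                hits = sum(v > snr_thr for v in snr[k - history:k])
                flag = 1 if hits > min_hits else 0    # S261 / S262
        else:                                         # S27: previous frame was noise
            both_high = s > snr_thr and snr[k - 1] > snr_thr
            flag = 1 if both_high else 0              # S271 / S272
        flags.append(flag)
    return flags
```

Requiring two consecutive high-SNR frames before leaving the noise state, and tolerating a low-SNR frame while in the speech state, gives the decision some hysteresis, so that a single outlier frame does not flip it.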
After step S2, step S3 is performed: the smoothing factor λ is calculated from the result of the voice activation judgment step S2 and the preliminary frequency-domain filter of the previous frame, as follows:
$\lambda(n,f) = 0.98 - 0.2 \times G(n-1,f)$, when vad_flag(n) = 0 (formula 1.2)
$\lambda(n,f) = 0.90 - 0.15 \times G(n-1,f)$, when vad_flag(n) = 1 (formula 1.3)
In formulas 1.2 and 1.3, G(n-1, f) is the preliminary frequency-domain filter of the previous frame.
Then step S4 is performed: according to the judgment result for the current frame, the corresponding smoothing factor λ is used to calculate the energy spectrum of the first frequency-domain voice signal and the energy spectrum of the second frequency-domain voice signal, as follows:
$P_{X_1X_1}(n,f) = \lambda(n,f)\,P_{X_1X_1}(n-1,f) + (1-\lambda(n,f))\,X_1(n,f)\,X_1(n,f)^{*}$ (formula 1.4)
$P_{X_2X_2}(n,f) = \lambda(n,f)\,P_{X_2X_2}(n-1,f) + (1-\lambda(n,f))\,X_2(n,f)\,X_2(n,f)^{*}$ (formula 1.5)
In formula 1.4, $P_{X_1X_1}$ is the first energy spectrum, i.e. the energy spectrum of the first frequency-domain voice signal, $X_1(n,f)$ is the amplitude of the first frequency-domain voice signal, and $X_1(n,f)^{*}$ is its complex conjugate. In formula 1.5, $P_{X_2X_2}$ is the second energy spectrum, i.e. the energy spectrum of the second frequency-domain voice signal, $X_2(n,f)$ is the amplitude of the second frequency-domain voice signal, and $X_2(n,f)^{*}$ is its complex conjugate.
Step S5 is performed next: according to the judgment result for the current frame, the corresponding smoothing factor λ is used to calculate the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal, as follows:
$P_{X_1X_2}(n,f) = \lambda(n,f)\,P_{X_1X_2}(n-1,f) + (1-\lambda(n,f))\,X_1(n,f)\,X_2(n,f)^{*}$ (formula 1.6)
In formula 1.6, $P_{X_1X_2}$ is the cross-correlation energy spectrum between the first and second frequency-domain voice signals, $X_1(n,f)$ is the amplitude of the first frequency-domain voice signal, and $X_2(n,f)^{*}$ is the complex conjugate of the amplitude of the second frequency-domain voice signal.
In formulas 1.4, 1.5 and 1.6, if the current frame is a speech frame, i.e. vad_flag(n) = 1, the λ(n, f) of formula 1.3 is used to calculate the first energy spectrum, the second energy spectrum and the cross-correlation energy spectrum.
If the current frame is a noise frame, i.e. vad_flag(n) = 0, the λ(n, f) of formula 1.2 is used to calculate the first energy spectrum, the second energy spectrum and the cross-correlation energy spectrum.
Thus, the signal picked up is first classified, the corresponding energy calculation is chosen according to the result, and the preliminary frequency-domain filter of the previous frame is fed back into the calculation. The energy spectrum of the voice signal is therefore tracked quickly, with little computation and good dynamic adjustment.
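The recursive updates of formulas 1.4 to 1.6 can be sketched for one frame as follows. The function name `update_spectra` is hypothetical, and `x1` and `x2` are assumed to be the complex spectra of the current frame for the two microphones.

```python
import numpy as np

def update_spectra(p11, p22, p12, x1, x2, lam):
    """One recursive update per frame of the three smoothed spectra.

    p11, p22: previous energy spectra of microphones 1 and 2 (real arrays).
    p12: previous cross-correlation energy spectrum (complex array).
    x1, x2: complex spectra of the current frame; lam: lambda(n, f).
    """
    p11 = lam * p11 + (1 - lam) * (x1 * np.conj(x1)).real   # formula 1.4
    p22 = lam * p22 + (1 - lam) * (x2 * np.conj(x2)).real   # formula 1.5
    p12 = lam * p12 + (1 - lam) * x1 * np.conj(x2)          # formula 1.6
    return p11, p22, p12
```

Each update is a first-order recursive average, so only the previous spectra need to be stored, which is consistent with the low computational load claimed for the method.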
Then step S6 is performed: the preliminary noise energy is calculated from the amplitude of the first frequency-domain voice signal. Step S6 comprises the following. First, the a priori SNR is calculated from the first frequency-domain voice signal, as follows:
[Equation image not reproduced] (formula 1.7)
In formula 1.7, $\zeta_{dD}$ is the a priori SNR, $|N(n-1,f)|^2$ is the preliminary noise energy of the previous frame, and ε is a fixed smoothing factor that may take a value between 0 and 1, preferably ε = 0.98.
Then the noise compensation amount is calculated from the a priori SNR, as follows:
[Equation image not reproduced] (formula 1.8)
In formula 1.8, B(n, f) is the noise compensation amount and γ(u, v) is the gamma distribution function.
Finally, the preliminary noise energy of the current frame is calculated from the noise compensation amount B(n, f), as follows:
[Equation image not reproduced] (formula 1.9)
In formula 1.9, $|N(n,f)|^2$ is the preliminary noise energy of the current frame, β is a smoothing factor set to 0.8, $|N(n-1,f)|^2$ is the preliminary noise energy of the previous frame, and the remaining term is the noise energy calculated with the minimum mean square error method.
Thus, the preliminary noise energy is calculated by sampling the preliminary noise energy of the previous frame and analyzing it with the gamma distribution function. Noise reduction also suppresses part of the speech, which causes clearly audible speech distortion in subjective listening. Adding the noise compensation amount to the preliminary noise energy strikes a balance between noise suppression and speech distortion and improves the subjective listening experience: the speech distortion is not noticeable, and the noise that remains does not impair listening.
Step S7 is performed next: the cross-correlation transfer function is calculated from the first energy spectrum and the cross-correlation energy spectrum, as follows:
[Equation image not reproduced] (formula 1.10)
[Equation image not reproduced] (formula 1.11)
In formulas 1.10 and 1.11, Γ(n, f) is the correlation function between the two microphones in a reverberant field, d is the distance between the two microphones, c is the speed of sound in air, and $H_{12}(n,f)$ is the cross-correlation transfer function.
Thus, the cross-correlation transfer function is calculated for two microphones placed close to and away from the user's mouth respectively. The two microphones locate the speech source, so that samples of the user's speech and samples of the environmental noise are obtained. The sound picked up by the microphones, which contains both the user's voice and the environmental noise, can then be analyzed and filtered effectively to obtain a high-quality voice signal.
Then step S8 is performed: the preliminary frequency-domain filter is calculated from the first energy spectrum, the noise compensation factor and the preliminary noise energy, as follows:
[Equation image not reproduced] (formula 1.12)
In formula 1.12, G(n, f) is the preliminary frequency-domain filter and $\alpha\,|H_{12}(n,f)|^2$ is the noise compensation factor. The value of α depends on the judgment for the current frame, so that different compensation is applied to speech frames and noise frames. If the current frame is a speech frame, i.e. vad_flag(n) = 1, no compensation is needed and α is set to 1. If the current frame is a noise frame, i.e. vad_flag(n) = 0, α is set to 2.5 to apply a certain amount of compensation.
Then step S9 is performed: the preliminary frequency-domain filter is denoised to obtain the final frequency-domain filter. Step S9 is as follows.
First, the system noise produced by the noise reduction method itself is calculated as the ratio between the amplitude of the input first frequency-domain voice signal and the amplitude of the first frequency-domain voice signal after filtering by the preliminary frequency-domain filter. The formula is:
[Equation image not reproduced] (formula 1.13)
As formula 1.13 shows, η(n) is the ratio of the input signal to the output signal, i.e. the system noise ratio produced in the system.
Then the smoothing window length of the preliminary filter is calculated from the system noise ratio, as follows:
[Equation image not reproduced] (formula 1.14)
In formula 1.14, $\eta_{thr}$ is the predetermined system noise ratio threshold of the filter, set to $\eta_{thr} = 0.4$, and L(n) is the smoothing window length.
Finally, the preliminary frequency-domain filter G(n, f) is convolved with a smoothing window of length L(n) to obtain the denoised frequency-domain filter $G_{PF}(n,f)$, calculated as follows:
[Equation image not reproduced] (formula 1.15)
Thus, the smoothing window further denoises the preliminary frequency-domain filter, so that the filtering effect of the frequency-domain filter $G_{PF}(n,f)$ is smoother.
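The smoothing of the preliminary filter across frequency can be illustrated as follows. Formula 1.15 is not reproduced here, so the shape of the window is unknown; the sketch assumes a normalized rectangular (moving-average) window of length L(n), and the function name `smooth_gain` is hypothetical.

```python
import numpy as np

def smooth_gain(gain, window_len):
    """Convolve the per-bin gain G(n, f) with a smoothing window along f.

    Assumption: the window is rectangular and normalized to sum to 1.
    The description only states that G(n, f) is convolved with a
    smoothing window of length L(n).
    """
    gain = np.asarray(gain, dtype=float)
    window = np.ones(window_len) / window_len
    # mode="same" keeps one value per frequency bin; the bins at the two
    # edges are attenuated because the convolution zero-pads the input.
    return np.convolve(gain, window, mode="same")
```

A window of length 1 leaves the gain unchanged, and a longer window averages neighbouring bins, which removes isolated peaks in the gain.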
Then step S10 is performed: the first frequency-domain voice signal is filtered by the frequency-domain filter $G_{PF}(n,f)$ to obtain the filtered first frequency-domain voice signal. Step S11 is performed next: an inverse short-time Fourier transform converts the filtered first frequency-domain voice signal back into a filtered time-domain voice signal. Finally, step S12 is performed: the filtered time-domain voice signal is output.
As the above scheme shows, different smoothing factors are used for speech frames and noise frames, according to the result of the voice activation judgment, when calculating the energy spectrum and the cross-correlation energy spectrum. The preliminary noise energy is calculated and dynamically compensated in the filter according to the same result, while the cross-correlation transfer function between the two microphones is used to sample the voice signal or the environmental noise and filter it in the filter. As a result, the voice noise reduction method of the present invention tracks noise quickly, requires little computation and effectively suppresses environmental noise, and it does not require a dedicated separate noise reduction chip.
Voice noise reduction device embodiment:
With reference to Fig. 3, Fig. 3 is the system chart of voice noise reduction device of the present invention.Voice noise reduction device comprises Short Time Fourier Transform unit 31, Short Time Fourier Transform unit 32, voice activation judging unit 33, energy estimation unit 34, preliminary noise energy computing unit 35, simple crosscorrelation transport function calculating unit 36, preliminary frequency domain filter computing unit 37, amending unit 38, frequency domain filter computing unit 39 and inverse Fourier transform unit 40.
Short Time Fourier Transform unit 31 is for receiving the time domain voice signal of the first microphone output and being carried out Short Time Fourier Transform and export the first frequency domain voice signal, and Short Time Fourier Transform unit 31 is for receiving the time domain voice signal of second microphone output and being carried out Short Time Fourier Transform and export the second frequency domain voice signal.Voice activation judging unit 33 receives by the first frequency domain voice signal of Short Time Fourier Transform unit 31 outputs and the first frequency domain voice signal is carried out to voice activation judgement, and voice activation judging unit 33 is to the judged result of energy estimation unit 34 its processing of output.
The energy estimation unit 34 comprises an energy spectrum calculation module 341, a cross-correlation energy spectrum calculation module 342 and a smoothing factor calculation module 343, and receives the first frequency-domain voice signal, the second frequency-domain voice signal and the voice activity detection result. The smoothing factor calculation module 343 adjusts the smoothing factor coefficient according to the feedback signal output by the preliminary frequency-domain filter calculation unit 37 and the voice activity detection result. The energy spectrum calculation module 341 calculates the first energy spectrum of the first frequency-domain voice signal and the second energy spectrum of the second frequency-domain voice signal according to the smoothing factor coefficient, the first frequency-domain voice signal and the second frequency-domain voice signal. The cross-correlation energy spectrum calculation module 342 calculates the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal according to the same three inputs.
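One common way to realise such smoothed spectra is first-order recursive averaging, sketched below. The update rule, the function names and the two example smoothing values are assumptions made for illustration; this section does not give the exact formulas, and the feedback from the preliminary frequency-domain filter of the previous frame is not modelled here.

```python
def pick_alpha(is_speech, alpha_speech=0.7, alpha_noise=0.9):
    """Choose the smoothing factor from the VAD result (values are placeholders)."""
    return alpha_speech if is_speech else alpha_noise


def smooth_spectra(prev, x1, x2, alpha):
    """One recursive-averaging update for a single frame.

    prev   : dict with per-bin lists "p11", "p22", "p12" from the previous frame
    x1, x2 : complex spectra (lists) of the current frame, microphones 1 and 2
    alpha  : smoothing factor in [0, 1); larger values forget the past more slowly
    """
    p11 = [alpha * p + (1 - alpha) * abs(a) ** 2 for p, a in zip(prev["p11"], x1)]
    p22 = [alpha * p + (1 - alpha) * abs(b) ** 2 for p, b in zip(prev["p22"], x2)]
    # Cross-spectrum: product of channel 1 with the conjugate of channel 2.
    p12 = [alpha * p + (1 - alpha) * a * b.conjugate()
           for p, a, b in zip(prev["p12"], x1, x2)]
    return {"p11": p11, "p22": p22, "p12": p12}
```

Using a different factor for speech frames and noise frames lets the estimates react at different speeds in the two conditions.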
The preliminary noise energy calculation unit 35 receives the first frequency-domain voice signal and calculates the preliminary noise energy according to the amplitude of the first frequency-domain voice signal. It outputs the preliminary noise energy to both the preliminary frequency-domain filter calculation unit 37 and the cross-correlation transfer function calculation unit 36.
The cross-correlation transfer function calculation unit 36 receives the first energy spectrum, the cross-correlation energy spectrum and the preliminary noise energy, calculates the cross-correlation transfer function, and outputs it to the preliminary frequency-domain filter calculation unit 37.
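As a rough illustration only, a transfer function between two channels is often estimated per frequency bin as the ratio of the cross-spectrum to the auto-spectrum of the reference channel. The sketch below uses that generic estimator; it is an assumption, not the formula of this patent, and it leaves out the preliminary noise energy and the microphone distance that the unit also takes into account.

```python
def cross_transfer_function(p11, p12, eps=1e-12):
    """Per-bin ratio of cross-spectrum p12 to auto-spectrum p11.

    eps is a small guard (an assumption) that avoids division by zero
    in bins where the first microphone carries no energy.
    """
    return [c / (p + eps) for p, c in zip(p11, p12)]
```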
The preliminary frequency-domain filter calculation unit 37 calculates a noise compensation factor from the signal output by the voice activity detection unit 33 and the signal output by the cross-correlation transfer function calculation unit 36, and calculates the preliminary frequency-domain filter from the noise compensation factor, the preliminary noise energy and the first energy spectrum.
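To make the role of the three inputs concrete, the sketch below uses a generic spectral-subtraction style gain in which the noise compensation factor scales the noise estimate before it is compared with the signal energy. The gain rule, the floor value and the names `beta`, `floor` and `eps` are assumptions; how the patent derives the compensation factor from the voice activity detection result and the transfer function is not reproduced here.

```python
def preliminary_filter(p11, noise, beta, floor=0.05, eps=1e-12):
    """Per-bin gain in [floor, 1] from signal energy and compensated noise energy.

    p11   : first energy spectrum (per bin)
    noise : preliminary noise energy (per bin)
    beta  : noise compensation factor (taken as given)
    floor : minimum gain, limits musical-noise artefacts (placeholder value)
    """
    gains = []
    for p, n in zip(p11, noise):
        g = 1.0 - beta * n / (p + eps)
        gains.append(min(1.0, max(floor, g)))
    return gains
```

A larger compensation factor removes more of the estimated noise at the cost of more speech distortion.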
The correction unit 38 receives the signal output by the preliminary frequency-domain filter calculation unit 37 and calculates the smoothing window length according to the amplitude of the first frequency-domain voice signal.
The frequency-domain filter calculation unit 39 receives the preliminary frequency-domain filter calculated by the preliminary frequency-domain filter calculation unit 37 and the smoothing window length calculated by the correction unit 38, and convolves the preliminary frequency-domain filter according to this smoothing window length, thereby obtaining a frequency-domain filter with a smooth filtering curve.
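The smoothing step can be pictured as a convolution of the per-bin gains with a short window along the frequency axis. The sketch below assumes a normalised rectangular (moving-average) window of odd length and renormalises at the band edges; the window shape and the edge handling are assumptions, since this section only specifies that a window length is used.

```python
def smooth_filter(gains, window_length):
    """Smooth per-bin gains with a moving-average window of odd length."""
    if window_length < 1 or window_length % 2 == 0:
        raise ValueError("window_length must be a positive odd integer")
    half = window_length // 2
    n = len(gains)
    out = []
    for k in range(n):
        # Truncate the window at the band edges and average what remains.
        lo, hi = max(0, k - half), min(n, k + half + 1)
        out.append(sum(gains[lo:hi]) / (hi - lo))
    return out
```

A window length of 1 leaves the preliminary filter unchanged, while longer windows give a smoother gain curve.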
The frequency-domain filter calculation unit 39 also receives the first frequency-domain voice signal output by the short-time Fourier transform unit 31, filters it with the frequency-domain filter, and outputs the filtered first frequency-domain voice signal.
The inverse short-time Fourier transform unit 40 receives the filtered first frequency-domain voice signal and performs an inverse short-time Fourier transform on it, restoring it to a filtered first time-domain voice signal, which it outputs.
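The path from time domain to frequency domain, through the filter and back can be sketched with a plain overlap-add scheme. The frame length, the 50 % overlap, the square-root Hann window and the naive DFT are all assumptions chosen to keep the example self-contained; a real implementation would use an FFT.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), forward or inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def sqrt_hann(n):
    """Square-root periodic Hann window; its square sums to 1 at 50 % overlap."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i in range(n)]


def stft(signal, frame_len):
    """Windowed frames with a hop of frame_len // 2, each transformed by the DFT."""
    hop, win = frame_len // 2, sqrt_hann(frame_len)
    return [dft([signal[s + i] * win[i] for i in range(frame_len)])
            for s in range(0, len(signal) - frame_len + 1, hop)]


def apply_filter(frames, gains):
    """Multiply every bin of every frame by its gain (gains real and symmetric)."""
    return [[g * v for g, v in zip(gains, spec)] for spec in frames]


def istft(frames, frame_len):
    """Inverse DFT of each frame, synthesis window, then overlap-add."""
    hop, win = frame_len // 2, sqrt_hann(frame_len)
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for m, spec in enumerate(frames):
        seg = dft(spec, inverse=True)
        for i in range(frame_len):
            out[m * hop + i] += seg[i].real * win[i]
    return out
```

With all gains equal to 1, samples covered by two overlapping frames are reconstructed exactly, which is a convenient sanity check before a real filter is plugged in.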
As the above scheme shows, the voice noise-reduction device of the present invention filters and denoises the voice signals received by the two microphones and then outputs a high-quality voice signal. The device offers fast noise tracking, a small amount of calculation and effective suppression of environmental noise, so it can efficiently suppress noise in the input voice signal and improve its speech quality.

Claims (10)

1. A voice noise-reduction method, comprising:
performing a short-time Fourier transform on the time-domain voice signal received by a first microphone to obtain a first frequency-domain voice signal;
performing a short-time Fourier transform on the time-domain voice signal received by a second microphone to obtain a second frequency-domain voice signal;
performing voice activity detection on the first frequency-domain voice signal;
calculating a preliminary noise energy according to the first frequency-domain voice signal;
characterized in that:
a smoothing factor is calculated according to the result of the voice activity detection and the preliminary frequency-domain filter of the previous frame, a first energy spectrum of the first frequency-domain voice signal is calculated according to the smoothing factor, and a cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal is calculated according to the smoothing factor;
a cross-correlation transfer function is calculated according to the first energy spectrum, the cross-correlation energy spectrum and the distance between the first microphone and the second microphone;
a noise compensation factor is calculated according to the result of the voice activity detection and the cross-correlation transfer function;
a preliminary frequency-domain filter is calculated according to the noise compensation factor, the preliminary noise energy and the first energy spectrum;
a smoothing window length is calculated according to the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal;
the preliminary frequency-domain filter is convolved with the smoothing window length to obtain a frequency-domain filter;
the first frequency-domain voice signal is filtered by the frequency-domain filter and the filtered first frequency-domain voice signal is output;
an inverse short-time Fourier transform is performed on the filtered first frequency-domain voice signal and the filtered first time-domain voice signal is output.
2. The voice noise-reduction method according to claim 1, characterized in that:
the voice activity detection step comprises: after the signal-to-noise ratio (SNR) of the current frame is calculated according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy of the previous frame, if the first frequency-domain voice signal of the previous frame is a voice signal and the SNR of the current frame is judged to be greater than a preset SNR threshold, the first frequency-domain voice signal of the current frame is set as a speech frame;
if the first frequency-domain voice signal of the previous frame is a noise frame, and the SNR of the current frame is judged to be greater than the preset SNR threshold and the SNR of the previous frame is greater than the preset SNR threshold, the first frequency-domain voice signal of the current frame is set as a speech frame; otherwise the first frequency-domain voice signal of the current frame is set as a noise frame.
3. The voice noise-reduction method according to claim 2, characterized in that:
after the SNR of the current frame is calculated according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy, if the current frame is judged to lie between the first frame and the twentieth frame, the first frequency-domain voice signal of the current frame is judged to be a speech frame.
4. The voice noise-reduction method according to claim 3, characterized in that:
if the first frequency-domain voice signal of the previous frame is a speech frame and the SNR of the current frame is judged to be less than or equal to the preset SNR threshold, then, when more than three of the SNRs of the preceding 19 frames are greater than the preset SNR threshold, the first frequency-domain voice signal of the current frame is set as a speech frame; otherwise, the first frequency-domain voice signal of the current frame is judged to be a noise frame.
5. The voice noise-reduction method according to claim 4, characterized in that:
the smoothing factor comprises a first smoothing factor and a second smoothing factor, and the step of calculating the smoothing factor according to the result of the voice activity detection and the preliminary frequency-domain filter of the previous frame comprises:
if the first frequency-domain voice signal of the current frame is a speech frame, calculating the first energy spectrum of the first frequency-domain voice signal according to the first smoothing factor, and calculating the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal according to the first smoothing factor;
if the first frequency-domain voice signal of the current frame is a noise frame, calculating the first energy spectrum of the first frequency-domain voice signal according to the second smoothing factor, and calculating the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal according to the second smoothing factor.
6. A voice noise-reduction device, comprising:
a first short-time Fourier transform unit, for receiving a time-domain voice signal and outputting a first frequency-domain voice signal;
a second short-time Fourier transform unit, for receiving a time-domain voice signal and outputting a second frequency-domain voice signal;
a voice activity detection unit, for judging whether the first frequency-domain voice signal is a voice signal or a noise signal;
a preliminary noise energy calculation unit, for calculating a preliminary noise energy according to the amplitude of the first frequency-domain voice signal;
characterized in that:
the voice noise-reduction device further comprises:
an energy calculation unit, comprising an energy spectrum calculation module, a cross-correlation energy spectrum calculation module and a smoothing factor calculation module, wherein the smoothing factor calculation module outputs a smoothing factor coefficient to the energy spectrum calculation module and the cross-correlation energy spectrum calculation module, the energy spectrum calculation module calculates a first energy spectrum according to the smoothing factor coefficient and the first frequency-domain voice signal, and the cross-correlation energy spectrum calculation module calculates a cross-correlation energy spectrum according to the smoothing factor coefficient, the first frequency-domain voice signal and the second frequency-domain voice signal;
a cross-correlation transfer function calculation unit, for calculating a cross-correlation transfer function according to the first energy spectrum, the cross-correlation energy spectrum and the distance between the first microphone and the second microphone;
a preliminary frequency-domain filter calculation unit, for calculating a noise compensation factor from the signal output by the voice activity detection unit and the signal output by the cross-correlation transfer function calculation unit, and for calculating a preliminary frequency-domain filter according to the noise compensation factor, the preliminary noise energy and the first energy spectrum;
a correction unit, for calculating a smoothing window length according to the preliminary frequency-domain filter and the amplitude of the first frequency-domain voice signal;
a frequency-domain filter calculation unit, for convolving the preliminary frequency-domain filter with the smoothing window length to obtain a frequency-domain filter, the frequency-domain filter calculation unit also receiving the first frequency-domain voice signal;
an inverse short-time Fourier transform unit, for receiving the filtered first frequency-domain voice signal output by the frequency-domain filter calculation unit and outputting the filtered first time-domain voice signal.
7. The voice noise-reduction device according to claim 6, characterized in that:
the voice activity detection unit is further configured to: after the signal-to-noise ratio (SNR) of the current frame is calculated according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy of the previous frame, if the first frequency-domain voice signal of the previous frame is a voice signal and the SNR of the current frame is judged to be greater than a preset SNR threshold, set the first frequency-domain voice signal of the current frame as a speech frame;
if the first frequency-domain voice signal of the previous frame is a noise frame, and the SNR of the current frame is judged to be greater than the preset SNR threshold and the SNR of the previous frame is greater than the preset SNR threshold, set the first frequency-domain voice signal of the current frame as a speech frame; otherwise set the first frequency-domain voice signal of the current frame as a noise frame.
8. The voice noise-reduction device according to claim 7, characterized in that:
the voice activity detection unit is further configured to: after the SNR of the current frame is calculated according to the amplitude of the first frequency-domain voice signal of the current frame and the preliminary noise energy of the previous frame, if the current frame is judged to lie between the first frame and the twentieth frame, judge the first frequency-domain voice signal of the current frame to be a speech frame.
9. The voice noise-reduction device according to claim 8, characterized in that:
the voice activity detection unit is further configured to: if the first frequency-domain voice signal of the previous frame is a speech frame and the SNR of the current frame is judged to be less than or equal to the preset SNR threshold, then, when more than three of the SNRs of the preceding 19 frames are greater than the preset SNR threshold, set the first frequency-domain voice signal of the current frame as a speech frame; otherwise, judge the first frequency-domain voice signal of the current frame to be a noise frame.
10. The voice noise-reduction device according to claim 9, characterized in that:
the smoothing factor coefficient comprises a first smoothing factor coefficient and a second smoothing factor coefficient, and the voice activity detection unit is further configured such that:
if the first frequency-domain voice signal of the current frame is a speech frame, the first energy spectrum of the first frequency-domain voice signal is calculated according to the first smoothing factor coefficient, and the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal is calculated according to the first smoothing factor coefficient;
if the first frequency-domain voice signal of the current frame is a noise frame, the first energy spectrum of the first frequency-domain voice signal is calculated according to the second smoothing factor coefficient, and the cross-correlation energy spectrum between the first frequency-domain voice signal and the second frequency-domain voice signal is calculated according to the second smoothing factor coefficient.
CN201310743950.9A 2013-12-28 2013-12-28 Voice de-noising method and device thereof Active CN103700375B (en)

Publications (2)

Publication Number Publication Date
CN103700375A true CN103700375A (en) 2014-04-02
CN103700375B CN103700375B (en) 2016-06-15

Family

ID=50361881

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10121489B1 (en) * 2017-07-21 2018-11-06 Htc Corporation Method, device, and non-transitory computer readable storage medium for processing audio signal

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1874386A (en) * 2005-05-24 2006-12-06 美国博通公司 Telephone with improved capability and method for processing audio frequency signal therein
US20100278352A1 (en) * 2007-05-25 2010-11-04 Nicolas Petit Wind Suppression/Replacement Component for use with Electronic Systems
US20110026722A1 (en) * 2007-05-25 2011-02-03 Zhinian Jing Vibration Sensor and Acoustic Voice Activity Detection System (VADS) for use with Electronic Systems
US20110096937A1 (en) * 2009-10-28 2011-04-28 Fortemedia, Inc. Microphone apparatus and sound processing method
US20110099010A1 (en) * 2009-10-22 2011-04-28 Broadcom Corporation Multi-channel noise suppression system
CN102411936A (en) * 2010-11-25 2012-04-11 歌尔声学股份有限公司 Speech enhancement method and device as well as head de-noising communication earphone

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
钟金荣等: "一种双通道语音增强方法", 《计算机工程与科学》, vol. 34, no. 5, 31 May 2012 (2012-05-31), pages 172 - 177 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107851193A (en) * 2015-07-14 2018-03-27 顶级公司 Hybrid machine learning system
CN107851193B (en) * 2015-07-14 2022-07-05 顶级公司 Hybrid machine learning system
CN106875951A (en) * 2017-01-23 2017-06-20 珠海全志科技股份有限公司 A kind of DSD signals decoding system and method for lossless broadcasting
CN108806712A (en) * 2018-04-27 2018-11-13 深圳市沃特沃德股份有限公司 Reduce the method and apparatus of frequency domain treating capacity
CN108806712B (en) * 2018-04-27 2020-08-18 深圳市沃特沃德股份有限公司 Method and apparatus for reducing frequency domain processing
CN110232905A (en) * 2019-06-12 2019-09-13 会听声学科技(北京)有限公司 Uplink noise-reduction method, device and electronic equipment
CN110632563A (en) * 2019-09-06 2019-12-31 北京理工大学 Intra-pulse frequency coding signal parameter measuring method based on short-time Fourier transform
CN110970051A (en) * 2019-12-06 2020-04-07 广州国音智能科技有限公司 Voice data acquisition method, terminal and readable storage medium
CN113205824A (en) * 2021-04-30 2021-08-03 紫光展锐(重庆)科技有限公司 Sound signal processing method, device, storage medium, chip and related equipment
CN113205824B (en) * 2021-04-30 2022-11-11 紫光展锐(重庆)科技有限公司 Sound signal processing method, device, storage medium, chip and related equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant