CN105575405A - Double-microphone voice active detection method and voice acquisition device - Google Patents


Info

Publication number
CN105575405A
Authority
CN
China
Prior art keywords
amplitude spectrum
noisy speech
voice
signal amplitude
speech signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410524677.5A
Other languages
Chinese (zh)
Inventor
吴晟
蒋斌
林福辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd filed Critical Spreadtrum Communications Shanghai Co Ltd
Priority to CN201410524677.5A priority Critical patent/CN105575405A/en
Publication of CN105575405A publication Critical patent/CN105575405A/en
Pending legal-status Critical Current

Abstract

The invention provides a dual-microphone voice activity detection method and a voice acquisition device. The method comprises the following steps: transforming a noisy speech signal and a noise signal to the frequency domain to obtain their amplitude spectra; pre-filtering the amplitude spectra with a pre-filter; shaping the pre-filtered amplitude spectra with the short-time envelope of the speech signal; accumulating and comparing the shaped amplitude spectra to obtain the energy ratio of the noisy speech signal to the noise signal; and making the voice activity decision according to that energy ratio. The technical scheme of the invention significantly improves the accuracy of the voice activity decision at low signal-to-noise ratio.

Description

Dual-microphone voice activity detection method and voice acquisition device
Technical field
The present invention relates to the field of communication technology, and in particular to a dual-microphone voice activity detection method and a voice acquisition device.
Background
With advances in communication technology, network capacity keeps growing, terminal processing power keeps increasing, and users demand ever higher speech quality. Besides widening the audio bandwidth to improve fidelity, the noise robustness of the terminal is a major concern for speech communication quality. After a period in which single-microphone systems reduced noise with single-channel speech enhancement, more and more terminals are being fitted with a dual-microphone system that has a primary and a secondary microphone. Such a system usually places one microphone (the primary microphone) at the lower end of the voice acquisition device, near the mouth, to pick up the noisy speech signal, and the other (the secondary microphone) on the back or top of the upper end of the device, near the ear, to pick up a noise-dominated reference signal.
Dual-channel speech enhancement analyses the noisy speech signal and the reference signal to recover clean speech. The two main families of methods are beamforming and energy-difference filtering, and most schemes combine the two. Whichever method is used, it must work together with voice activity detection (VAD). VAD decides whether the signal at the current instant is speech or non-speech; the decision is passed to the subsequent speech enhancement module and has a decisive effect on its performance. If VAD frequently misses speech segments, the enhanced output loses speech; if it frequently reports speech where there is none, a large amount of noise remains. Beyond speech enhancement, VAD is also widely used in speech coding and speech recognition. In speech coding, for example, segments that contain speech are encoded normally and segments without speech are encoded as silence or comfort noise, which improves coding efficiency. In speech enhancement and denoising, VAD makes it possible to estimate the noise in speech gaps and the SNR of speech segments. Good VAD can also greatly improve the accuracy of speech recognition.
Existing VAD methods are based either on energy/SNR thresholds or on frequency-domain features. Threshold-based algorithms include time-domain and sub-band short-time energy/SNR discrimination; they make the decision by applying a single or double threshold to the energy or SNR. Feature-based algorithms detect the non-flatness of the spectrum; typical examples are signal entropy detection and pattern classification using Mel cepstral coefficients. All of these algorithms use only a single channel of noisy speech, so their robustness in noisy environments is limited and the accuracy of the decision cannot be guaranteed.
Summary of the invention
To address the above problems of existing voice activity detection techniques, a dual-microphone voice activity detection method and a voice acquisition device are provided that aim to improve the accuracy of the voice activity decision under low-SNR conditions.
The technical scheme is as follows:
A dual-microphone voice activity detection method, comprising the following steps:
Step 1: obtain a noisy speech signal and a noise signal corresponding to the noisy speech signal;
Step 2: transform the noisy speech signal to the frequency domain to obtain the noisy speech amplitude spectrum, and transform the noise signal to the frequency domain to obtain the noise amplitude spectrum;
Step 3: pre-filter the noisy speech amplitude spectrum and the noise amplitude spectrum separately;
Step 4: obtain the short-time envelope of the speech signal;
Step 5: use the short-time envelope of the speech signal to shape the pre-filtered noisy speech amplitude spectrum and the pre-filtered noise amplitude spectrum;
Step 6: accumulate and compare the shaped noisy speech amplitude spectrum and the shaped noise amplitude spectrum to obtain an energy ratio;
Step 7: decide from the energy ratio whether voice is active.
Preferably, in step 2:
The noisy speech signal is transformed to the frequency domain by the discrete Fourier transform (DFT), the discrete cosine transform, or the modified discrete cosine transform to obtain the noisy speech amplitude spectrum; and/or
The noise signal is transformed to the frequency domain by the discrete Fourier transform (DFT), the discrete cosine transform, or the modified discrete cosine transform to obtain the noise amplitude spectrum.
Preferably, when the discrete Fourier transform is used, the noisy speech amplitude spectrum is calculated by the following formula:
$$S_{a1}[k]_t = \left| \sum_{n=1}^{N} w(n)\, s_1(t-N+n)\, e^{-\frac{2\pi j (n-1)(k-1)}{N}} \right|$$
where $S_{a1}$ is the noisy speech amplitude spectrum, $s_1(t)$ is the noisy speech signal, $e$ is the base of the natural logarithm, $j$ is the imaginary unit, $j = (-1)^{0.5}$, $k$ is the discrete frequency index, $k = 1, 2, 3, \ldots, N$, the subscript $t$ is the discrete time index, and $w(n)$ is an $N$-point window function; and/or
When the discrete Fourier transform is used, the noise amplitude spectrum is calculated by the following formula:
$$S_{a2}[k]_t = \left| \sum_{n=1}^{N} w(n)\, s_2(t-N+n)\, e^{-\frac{2\pi j (n-1)(k-1)}{N}} \right|$$
where $S_{a2}$ is the noise amplitude spectrum, $s_2(t)$ is the noise signal, $e$ is the base of the natural logarithm, $j$ is the imaginary unit, $j = (-1)^{0.5}$, $k$ is the discrete frequency index, $k = 1, 2, 3, \ldots, N$, the subscript $t$ is the discrete time index, and $w(n)$ is an $N$-point window function.
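As an illustration of the windowed DFT amplitude spectrum described above, the following is a minimal sketch in plain Python. The function name `amplitude_spectrum` and the choice of a direct O(N^2) summation are assumptions made for readability, not part of the patent; a practical implementation would normally use an FFT.

```python
import cmath
import math

def amplitude_spectrum(frame, window):
    """Return |sum_n w(n) s(n) exp(-2*pi*j*n*k/N)| for k = 0..N-1.

    frame[n] stands for s(t - N + n + 1), i.e. the latest N samples.
    The 0-based indices n and k used here are the text's n - 1 and k - 1.
    """
    n_points = len(frame)
    spectrum = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            acc += window[n] * frame[n] * cmath.exp(-2j * math.pi * n * k / n_points)
        spectrum.append(abs(acc))
    return spectrum

# Usage: an 8-sample cosine with two cycles per frame and a rectangular window.
frame = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
spectrum = amplitude_spectrum(frame, [1.0] * 8)
```

With a rectangular window the energy of this test tone concentrates in bins 2 and 6 (0-based), which is a quick way to check the indexing.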
Preferably, the range of $N$ is $f_s/100/2 < N < 0.2 f_s$, where $f_s$ is the sampling frequency; or $N = 512$ when the sampling frequency is $f_s = 8000$ Hz.
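The bound on the frame length can be checked with a few lines of arithmetic. The helper below is a hypothetical convenience (its name and return format are assumptions); it only evaluates the inequality stated above.

```python
def frame_length_bounds(sample_rate_hz):
    """Return the open interval (lower, upper) for N given f_s."""
    lower = sample_rate_hz / 100 / 2
    upper = 0.2 * sample_rate_hz
    return lower, upper

def frame_length_in_range(n_points, sample_rate_hz):
    """True when f_s/100/2 < N < 0.2*f_s."""
    lower, upper = frame_length_bounds(sample_rate_hz)
    return lower < n_points < upper

# At f_s = 8000 Hz the interval is 40 < N < 1600, so N = 512 qualifies.
ok = frame_length_in_range(512, 8000)
```

At 8000 Hz a 512-sample frame spans 64 ms, well inside the 0.2 s limit.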
Preferably, the window function is a rectangular window, a sine window, a Hanning window, a Hamming window, or a Tukey window.
Preferably, in step 3:
The pre-filtering of the noisy speech amplitude spectrum is calculated by the following formula:
$$S_{pa1}[k]_t = S_{a1}[k]_t\, G_1[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa1}$ is the pre-filtered noisy speech amplitude spectrum, $S_{a1}$ is the noisy speech amplitude spectrum, and $G_1$ is the pre-filter transfer function, a vector of length $N$ whose elements lie between 0 and 1; and/or
The pre-filtering of the noise amplitude spectrum is calculated by the following formula:
$$S_{pa2}[k]_t = S_{a2}[k]_t\, G_2[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa2}$ is the pre-filtered noise amplitude spectrum, $S_{a2}$ is the noise amplitude spectrum, and $G_2$ is the pre-filter transfer function, a vector of length $N$ whose elements lie between 0 and 1.
Preferably, a frequency-domain Wiener filter is used to pre-filter the noisy speech amplitude spectrum, and the filter is calculated by the following formula:
$$G_1[k]_t = \frac{\max\left(P_{s1}[k]_t - P_{n1}[k]_t,\ 0\right)}{P_{s1}[k]_t}$$
where $P_{s1}$ is the auto-power spectrum of the noisy speech signal and $P_{n1}$ is the auto-power spectrum of the noise in the noisy speech signal; and/or
A frequency-domain Wiener filter is used to pre-filter the noise amplitude spectrum, and the filter is calculated by the following formula:
$$G_2[k]_t = \frac{\max\left(P_{s2}[k]_t - P_{n2}[k]_t,\ 0\right)}{P_{s2}[k]_t}$$
where $P_{s2}$ is the auto-power spectrum of the noise signal and $P_{n2}$ is the auto-power spectrum of the noise in the noise signal.
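The gain above and the pre-filtering product of step 3 can be sketched per frequency bin as follows. The function names and the small `floor` guard against a zero denominator are assumptions added for the example; the text does not specify how empty bins are handled.

```python
def wiener_gain(signal_power, noise_power, floor=1e-12):
    """G[k] = max(P_s[k] - P_n[k], 0) / P_s[k], evaluated bin by bin.

    floor is a hypothetical guard so that an all-zero bin yields gain 0
    instead of a division by zero.
    """
    return [max(p_s - p_n, 0.0) / max(p_s, floor)
            for p_s, p_n in zip(signal_power, noise_power)]

def prefilter(amplitude, gains):
    """S_pa[k] = S_a[k] * G[k]."""
    return [a * g for a, g in zip(amplitude, gains)]

# Usage: three bins with amplitude 2, 1 and 0 and a noise power estimate.
amplitude = [2.0, 1.0, 0.0]
power = [a * a for a in amplitude]          # P_s = S_a ** 2
gains = wiener_gain(power, [1.0, 2.0, 0.0])
filtered = prefilter(amplitude, gains)
```

Because the numerator is clipped at zero, every gain stays between 0 and 1, matching the stated range of the transfer function.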
Preferably, a frequency-domain Wiener filter is used to pre-filter the noisy speech amplitude spectrum, and the filter is calculated by the following formulas:
$$G_1[k]_t = \frac{SNR_1[k]_t}{SNR_1[k]_t + 1}$$
$$SNR_1[k]_t = \alpha_1\, G_1[k]_{t-1}^{2}\, SNR_{P1}[k]_{t-1} + (1-\alpha_1)\max\left(SNR_{P1}[k]_t - 1,\ 0\right)$$
$$SNR_{P1}[k]_t = \frac{P_{s1}[k]_t}{P_{n1}[k]_t}$$
where $SNR_1$ is the signal-to-noise ratio of the noisy speech signal, $SNR_{P1}$ is the a posteriori SNR of the noisy speech signal, $P_{s1}$ is the auto-power spectrum of the noisy speech signal, $P_{n1}$ is the auto-power spectrum of the noise in the noisy speech signal, and the smoothing factors $\alpha_1$ and $\alpha_2$ satisfy $0 < \alpha_1, \alpha_2 < 1$; and/or
A frequency-domain Wiener filter is used to pre-filter the noise amplitude spectrum, and the filter is calculated by the following formulas:
$$G_2[k]_t = \frac{SNR_2[k]_t}{SNR_2[k]_t + 1}$$
$$SNR_2[k]_t = \alpha_2\, G_2[k]_{t-1}^{2}\, SNR_{P2}[k]_{t-1} + (1-\alpha_2)\max\left(SNR_{P2}[k]_t - 1,\ 0\right)$$
$$SNR_{P2}[k]_t = \frac{P_{s2}[k]_t}{P_{n2}[k]_t}$$
where $SNR_2$ is the signal-to-noise ratio of the noise signal, $SNR_{P2}$ is the a posteriori SNR of the noise signal, $P_{s2}$ is the auto-power spectrum of the noise signal, $P_{n2}$ is the auto-power spectrum of the noise in the noise signal, and the smoothing factors $\alpha_1$ and $\alpha_2$ satisfy $0 < \alpha_1, \alpha_2 < 1$.
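The recursive SNR-based gain can be sketched as below. This sketch assumes that the second term of the SNR update is the current a posteriori SNR minus one, clipped at zero (the usual decision-directed form); the extracted formula is ambiguous on this point. The function name, the default `alpha=0.98`, and the `floor` guard are illustrative assumptions, since the text only requires 0 < alpha < 1.

```python
def snr_wiener_gain(signal_power, noise_power, prev_gain, prev_post_snr,
                    alpha=0.98, floor=1e-12):
    """Return (gains, post_snr) for the current frame.

    post_snr[k] = P_s[k] / P_n[k]
    snr[k]      = alpha * prev_gain[k]**2 * prev_post_snr[k]
                  + (1 - alpha) * max(post_snr[k] - 1, 0)   (assumed form)
    gain[k]     = snr[k] / (snr[k] + 1)
    """
    post_snr = [p_s / max(p_n, floor) for p_s, p_n in zip(signal_power, noise_power)]
    gains = []
    for g_prev, snr_prev, snr_now in zip(prev_gain, prev_post_snr, post_snr):
        snr = alpha * g_prev ** 2 * snr_prev + (1.0 - alpha) * max(snr_now - 1.0, 0.0)
        gains.append(snr / (snr + 1.0))
    return gains, post_snr

# Usage: one bin, previous gain 1.0 and previous a posteriori SNR 3.0.
gains, post_snr = snr_wiener_gain([4.0], [1.0], [1.0], [3.0], alpha=0.5)
```

The returned `post_snr` is meant to be fed back as `prev_post_snr` on the next frame, together with the returned gains.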
Preferably, the auto-power spectrum $P_{s1}$ of the noisy speech signal is calculated by the following formula:
$$P_{s1} = S_{a1}^{2}$$
where $S_{a1}$ is the noisy speech amplitude spectrum obtained from the noisy speech signal by the frequency-domain transform; and/or
The auto-power spectrum $P_{s2}$ of the noise signal is calculated by the following formula:
$$P_{s2} = S_{a2}^{2}$$
where $S_{a2}$ is the noise amplitude spectrum obtained from the noise signal by the frequency-domain transform.
Preferably, the auto-power spectrum $P_{n1}$ of the noise in the noisy speech signal is estimated by the following formula:
$$P_{n1}[k]_t = \begin{cases} \eta_1 P_{n1}[k]_{t-1} + (1-\eta_1) P_{s1}[k]_t, & P_{n1}[k]_{t-1} > P_{s1}[k]_t \\ \max\left(P_{n1}[k]_{t-1},\ \eta_2 P_{n1}[k]_{t-1} + (1-\eta_2)\dfrac{P_{s1}[k]_t - \eta_3 P_{s1}[k]_{t-1}}{1-\eta_3}\right), & P_{n1}[k]_{t-1} \le P_{s1}[k]_t \end{cases}$$
where the subscript $t$ is the discrete time index and $\eta_1$, $\eta_2$, $\eta_3$ are smoothing factors in the range $0 < \eta_1, \eta_2, \eta_3 < 1$; and/or
The auto-power spectrum $P_{n2}$ of the noise in the noise signal is estimated by the following formula:
$$P_{n2}[k]_t = \begin{cases} \eta_1 P_{n2}[k]_{t-1} + (1-\eta_1) P_{s2}[k]_t, & P_{n2}[k]_{t-1} > P_{s2}[k]_t \\ \max\left(P_{n2}[k]_{t-1},\ \eta_2 P_{n2}[k]_{t-1} + (1-\eta_2)\dfrac{P_{s2}[k]_t - \eta_3 P_{s2}[k]_{t-1}}{1-\eta_3}\right), & P_{n2}[k]_{t-1} \le P_{s2}[k]_t \end{cases}$$
where the subscript $t$ is the discrete time index and $\eta_1$, $\eta_2$, $\eta_3$ are smoothing factors in the range $0 < \eta_1, \eta_2, \eta_3 < 1$.
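A per-bin update of the noise power estimate might look like the sketch below. It assumes that the second branch divides the difference P_s[k]_t - eta3 * P_s[k]_(t-1) by (1 - eta3), which is how the flattened formula is read here. The function name and the default smoothing factors are illustrative assumptions; the text only requires each factor to lie strictly between 0 and 1.

```python
def update_noise_power(prev_noise, power_now, power_prev,
                       eta1=0.5, eta2=0.998, eta3=0.96):
    """One frame of the piecewise noise power recursion, bin by bin."""
    updated = []
    for p_n, p_t, p_tm1 in zip(prev_noise, power_now, power_prev):
        if p_n > p_t:
            # Signal power fell below the estimate: follow it downwards.
            updated.append(eta1 * p_n + (1.0 - eta1) * p_t)
        else:
            # Signal power is at or above the estimate: rise slowly, never drop.
            tracked = eta2 * p_n + (1.0 - eta2) * (p_t - eta3 * p_tm1) / (1.0 - eta3)
            updated.append(max(p_n, tracked))
    return updated

# Usage: bin 0 takes the first branch, bin 1 the second.
noise = update_noise_power([2.0, 1.0], [1.0, 2.0], [1.0, 2.0],
                           eta1=0.5, eta2=0.5, eta3=0.5)
```

The asymmetry is the point of the design: the estimate drops quickly when the input power falls and is only allowed to grow gradually while speech may be present.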
Preferably, in step 4, the short-time envelope of the speech signal is calculated by the following formula:
$$G_L[k] = \frac{S_a[k]}{\max\limits_{1 \le k \le N}\left[S_a[k]\right]}, \quad k = 1, 2, 3, \ldots, N$$
where $G_L$ is the short-time envelope of the speech signal and $S_a$ is the short-time speech amplitude spectrum.
Preferably, the short-time speech amplitude spectrum $S_a$ is replaced by the short-time average amplitude spectrum of the enhanced signal output when the noisy speech signal is passed through speech enhancement; or
The short-time speech amplitude spectrum $S_a$ is replaced by the short-time average of the pre-filtered noisy speech amplitude spectrum, calculated by the following formula:
$$S_a[k]_t = \begin{cases} \alpha_{sa} S_a[k]_{t-1} + (1-\alpha_{sa}) S_{pa1}[k]_t, & S_{pa1}[k]_t > S_a[k]_{t-1} \\ \beta_{sa} S_a[k]_{t-1} + (1-\beta_{sa}) S_{pa1}[k]_t, & S_{pa1}[k]_t \le S_a[k]_{t-1} \end{cases} \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa1}$ is the pre-filtered noisy speech amplitude spectrum, and $\alpha_{sa}$ and $\beta_{sa}$ are smoothing factors with $0 \le \alpha_{sa} \le \beta_{sa} < 1$.
Preferably, the smoothing factor $\alpha_{sa} = 1/2$ and the smoothing factor $\beta_{sa} = 31/32$.
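The asymmetric short-time average and the peak normalisation that turns it into an envelope can be sketched as follows. The function names and the `floor` guard for an all-zero spectrum are assumptions; the default smoothing factors are the preferred values 1/2 and 31/32 given above.

```python
def smooth_amplitude(prev_avg, prefiltered, alpha=1 / 2, beta=31 / 32):
    """Fast attack (alpha) when the new value is larger, slow release (beta) otherwise."""
    out = []
    for avg, x in zip(prev_avg, prefiltered):
        coeff = alpha if x > avg else beta
        out.append(coeff * avg + (1.0 - coeff) * x)
    return out

def short_time_envelope(avg, floor=1e-12):
    """G_L[k] = S_a[k] / max_k S_a[k]; floor is a hypothetical divide-by-zero guard."""
    peak = max(max(avg), floor)
    return [a / peak for a in avg]

# Usage: bin 0 rises (attack), bin 1 falls (release).
avg = smooth_amplitude([1.0, 1.0], [3.0, 0.0])
envelope = short_time_envelope(avg)
```

Since alpha is smaller than beta, the average reacts quickly to speech onsets and decays slowly afterwards, so the envelope keeps the spectral shape of recent speech across short pauses.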
Preferably, in step 4, the short-time envelope of the speech signal is calculated by the following formula:
$$G_L[k] = \frac{S_e[k]}{\max\limits_{1 \le k \le N}\left[S_e[k]\right]}, \quad k = 1, 2, 3, \ldots, N$$
where $G_L$ is the short-time envelope of the speech signal and $S_e$ is the short-time speech energy spectrum; or
$$G_L[k] = \frac{S_e[k]}{\max\limits_{1 \le k \le N}\left[S_e[k]\right]}, \quad k = 1, 2, 3, \ldots, N$$
where $G_L$ is the short-time envelope of the speech signal and $S_e$ is the short-time speech energy spectrum.
Preferably, the short-time speech energy spectrum $S_e$ is replaced by the short-time average energy spectrum of the enhanced signal output when the noisy speech signal is passed through speech enhancement; or
The short-time speech energy spectrum $S_e$ is replaced by the short-time mean square of the pre-filtered noisy speech amplitude spectrum $S_{pa1}$, calculated by the following formula:
$$S_e[k]_t = \begin{cases} \alpha_{se} S_e[k]_{t-1} + (1-\alpha_{se}) S_{pa1}[k]_t^{2}, & S_{pa1}[k]_t^{2} > S_e[k]_{t-1} \\ \beta_{se} S_e[k]_{t-1} + (1-\beta_{se}) S_{pa1}[k]_t^{2}, & S_{pa1}[k]_t^{2} \le S_e[k]_{t-1} \end{cases} \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa1}$ is the pre-filtered noisy speech amplitude spectrum, and $\alpha_{se}$ and $\beta_{se}$ are smoothing factors with $0 \le \alpha_{se} \le \beta_{se} < 1$.
Preferably, the smoothing factor $\alpha_{se} = 1/2$ and the smoothing factor $\beta_{se} = 31/32$.
Preferably, in step 4, the short-time envelope of the speech signal is obtained by short-time smoothing of the pre-filter transfer function $G_1$ of the noisy speech amplitude spectrum or the pre-filter transfer function $G_2$ of the noise amplitude spectrum:
When the short-time envelope of the speech signal is obtained from the pre-filter transfer function $G_1$ of the noisy speech amplitude spectrum, it is calculated by the following formula:
$$G_L[k]_t = \begin{cases} \alpha_G G_L[k]_{t-1} + (1-\alpha_G) G_1[k]_t, & G_1[k]_t > G_L[k]_{t-1} \\ \beta_G G_L[k]_{t-1} + (1-\beta_G) G_1[k]_t, & G_1[k]_t \le G_L[k]_{t-1} \end{cases} \quad k = 1, 2, 3, \ldots, N$$
where $G_L$ is the short-time envelope of the speech signal, and $\alpha_G$ and $\beta_G$ are smoothing factors with $0 \le \alpha_G \le \beta_G < 1$.
Preferably, the smoothing factor $\alpha_G = 1/2$ and the smoothing factor $\beta_G = 31/32$.
Preferably, in step 5:
The shaping of the pre-filtered noisy speech amplitude spectrum by the short-time envelope of the speech signal is calculated by the following formula:
$$S_{sa1}[k]_t = S_{pa1}[k]_t\, G_L[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{sa1}$ is the shaped noisy speech amplitude spectrum, $S_{pa1}$ is the pre-filtered noisy speech amplitude spectrum, and $G_L$ is the short-time envelope of the speech signal, a vector of length $N$ whose elements lie between 0 and 1; and/or
The shaping of the pre-filtered noise amplitude spectrum by the short-time envelope of the speech signal is calculated by the following formula:
$$S_{sa2}[k]_t = S_{pa2}[k]_t\, G_L[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{sa2}$ is the shaped noise amplitude spectrum, $S_{pa2}$ is the pre-filtered noise amplitude spectrum, and $G_L$ is the short-time envelope of the speech signal, a vector of length $N$ whose elements lie between 0 and 1.
Preferably, in step 6, the energy ratio obtained is a full-band energy ratio, calculated by the following formula:
$$r_{t1} = \frac{\sum_{k=1}^{N} S_{sa1}[k]_t}{\varepsilon + \sum_{k=1}^{N} S_{sa2}[k]_t}$$
where $S_{sa1}$ is the shaped noisy speech amplitude spectrum, $S_{sa2}$ is the shaped noise amplitude spectrum, $r_{t1}$ is the full-band energy ratio, and $\varepsilon$ is a small positive number that prevents division by zero.
Preferably, in step 6, the energy ratio obtained is a sub-band energy ratio, calculated by the following formula:
$$r_{t2} = \frac{\sum_{k=K_s}^{K_e} S_{sa1}[k]_t}{\varepsilon + \sum_{k=K_s}^{K_e} S_{sa2}[k]_t}$$
where $S_{sa1}$ is the shaped noisy speech amplitude spectrum, $S_{sa2}$ is the shaped noise amplitude spectrum, $r_{t2}$ is the sub-band energy ratio, $\varepsilon$ is a small positive number that prevents division by zero, $K_s$ is the starting index of the sub-band, and $K_e$ is the ending index of the sub-band.
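Steps 5 and 6 reduce to an element-wise product followed by a ratio of sums. The sketch below covers both the full-band and the sub-band case; the function names, the 1-based inclusive `k_start`/`k_end` arguments, and the default `eps` are assumptions chosen to mirror the formulas.

```python
def shape(prefiltered, envelope):
    """S_sa[k] = S_pa[k] * G_L[k]."""
    return [a * g for a, g in zip(prefiltered, envelope)]

def energy_ratio(shaped_main, shaped_ref, k_start=1, k_end=None, eps=1e-9):
    """Ratio of the summed shaped spectra over bins k_start..k_end (1-based, inclusive).

    With the defaults this is the full-band ratio; passing K_s and K_e
    gives the sub-band ratio. eps prevents division by zero.
    """
    if k_end is None:
        k_end = len(shaped_main)
    numerator = sum(shaped_main[k_start - 1:k_end])
    denominator = eps + sum(shaped_ref[k_start - 1:k_end])
    return numerator / denominator

# Usage: full band, then the sub-band made of bins 2 and 3.
main = shape([2.0, 4.0, 6.0], [0.5, 0.5, 0.5])
full = energy_ratio(main, [2.0, 2.0, 2.0])
sub = energy_ratio(main, [2.0, 2.0, 2.0], k_start=2, k_end=3)
```

Because both channels are weighted by the same speech envelope, bins where speech is unlikely contribute little to either sum.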
Preferably, in step 7, the energy ratio is compared with a preset threshold;
When the energy ratio is greater than the preset threshold, the signal in the corresponding frequency band at the time instant of that energy ratio is judged to be speech;
When the energy ratio is less than the preset threshold, the signal in the corresponding frequency band at the time instant of that energy ratio is judged to be noise;
The preset threshold is a positive real number between 0 and 1.
Preferably, in step 7, the energy ratio is smoothed by the following formula before the decision:
$$r_{st} = \begin{cases} \alpha_r\, r_{s,t-1} + (1-\alpha_r)\, r_t, & r_t > r_{s,t-1} \\ \beta_r\, r_{s,t-1} + (1-\beta_r)\, r_t, & r_t \le r_{s,t-1} \end{cases}$$
where $r_{st}$ is the smoothed energy ratio, $r_t$ is the energy ratio obtained in step 6, and $\alpha_r$ and $\beta_r$ are smoothing factors with $0 \le \alpha_r \le \beta_r < 1$;
The smoothed energy ratio is compared with a preset threshold;
When the smoothed energy ratio is greater than the preset threshold, the signal in the corresponding frequency band at the time instant of that energy ratio is judged to be speech;
When the smoothed energy ratio is less than the preset threshold, the signal in the corresponding frequency band at the time instant of that energy ratio is judged to be noise;
The preset threshold is a positive real number between 0 and 1.
Preferably, the smoothing factor $\alpha_r = 1/16$ and the smoothing factor $\beta_r = 3/4$.
Preferably, the preset threshold is 0.25.
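The smoothing and threshold comparison of step 7 can be sketched as below. The sketch assumes that the quantity being smoothed is the unsmoothed energy ratio from step 6 and that the previous smoothed value is carried from frame to frame; the function names are hypothetical, and the defaults are the preferred values 1/16, 3/4 and 0.25 stated above.

```python
def smooth_ratio(prev_smoothed, ratio, alpha=1 / 16, beta=3 / 4):
    """Rise quickly (alpha) when the ratio increases, fall slowly (beta) otherwise."""
    coeff = alpha if ratio > prev_smoothed else beta
    return coeff * prev_smoothed + (1.0 - coeff) * ratio

def is_speech(smoothed_ratio, threshold=0.25):
    """True when the smoothed energy ratio exceeds the preset threshold."""
    return smoothed_ratio > threshold

# Usage: run the detector over a short sequence of per-frame energy ratios.
state = 0.0
decisions = []
for ratio in [0.05, 0.8, 0.1, 0.05]:
    state = smooth_ratio(state, ratio)
    decisions.append(is_speech(state))
```

The slow release means a frame just after a speech burst can still be classified as speech, which acts as a short hangover and reduces clipping of word endings.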
Also provided is a voice acquisition device that uses the dual-microphone voice activity detection method described above.
The beneficial effect of the above technical scheme is as follows:
The accuracy of the voice activity decision under low-SNR conditions can be significantly improved, so that a better speech signal is output.
Brief description of the drawings
The invention and its features, aspects, and advantages will become more apparent from the following detailed description of non-limiting embodiments with reference to the drawings. The same reference marks denote the same parts throughout the drawings. The drawings are not deliberately drawn to scale; the emphasis is on showing the gist of the invention.
Fig. 1 is the prior-art flow of the energy-ratio voice activity decision;
Fig. 2 is a flow chart of the steps of the dual-microphone noise-reduction method provided by the invention;
Fig. 3 is a flow chart of one embodiment of the dual-microphone noise-reduction method provided by the invention;
Fig. 4 compares the voice activity detection results of an embodiment of the invention and of the prior art on a segment of noisy speech.
Detailed description of embodiments
In the following description, numerous specific details are given to provide a more thorough understanding of the invention. It will be apparent to those skilled in the art, however, that the invention can be practised without one or more of these details. In other instances, some technical features well known in the art are not described, to avoid obscuring the invention.
It should be understood that the invention can be implemented in different forms and should not be construed as limited to the embodiments set forth here. Rather, these embodiments are provided so that the disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art.
The terminology used here is for describing specific embodiments only and is not a limitation of the invention. As used here, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprise" and/or "include", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups. As used here, the term "and/or" includes any and all combinations of the associated listed items.
For a thorough understanding of the invention, detailed steps and structures are set out in the following description to explain the technical scheme. Preferred embodiments of the invention are described in detail below; besides these, the invention may have other embodiments.
In a dual-microphone system, the primary microphone is usually placed close to the user's mouth, so the speech it receives is strong; the secondary microphone can be placed away from the user's mouth, so the speech it receives is weak. At the same time, the noise reaching the two microphones is of comparable level. The prior art therefore includes a scheme that makes the voice activity decision from the energy difference between the primary and secondary microphones: when the primary microphone energy exceeds the secondary microphone energy by a certain margin, the signal at the current instant is considered to be speech. The flow of this method is shown in Fig. 1.
Supported by the physical arrangement of the primary and secondary microphones, the scheme of Fig. 1 is considerably more accurate than voice activity detection that uses a single microphone signal at the same SNR. As the SNR drops further, however, the noise energy keeps rising and the noise becomes less stationary, so the energy difference between the two channels is no longer pronounced, and the accuracy of a decision based on the energy difference alone falls markedly.
Based on this observation, a dual-microphone voice activity detection method is provided, as shown in Fig. 2.
The method comprises the following steps:
Step 1: obtain a noisy speech signal and a noise signal corresponding to the noisy speech signal;
Step 2: transform the noisy speech signal to the frequency domain to obtain the noisy speech amplitude spectrum, and transform the noise signal to the frequency domain to obtain the noise amplitude spectrum;
Step 3: pre-filter the noisy speech amplitude spectrum and the noise amplitude spectrum separately;
Step 4: obtain the short-time envelope of the speech signal;
Step 5: use the short-time envelope of the speech signal to shape the pre-filtered noisy speech amplitude spectrum and the pre-filtered noise amplitude spectrum;
Step 6: accumulate and compare the shaped noisy speech amplitude spectrum and the shaped noise amplitude spectrum to obtain an energy ratio;
Step 7: decide from the energy ratio whether voice is active.
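The seven steps can be strung together for a single frame as in the sketch below. It is a deliberately simplified illustration under several assumptions that are not in the text: a rectangular window, noise power spectra supplied by the caller rather than tracked over time, an envelope taken from the current pre-filtered primary spectrum without temporal smoothing, and no smoothing of the final ratio. All function and parameter names are hypothetical.

```python
import cmath
import math

def amplitude_spectrum(frame):
    """Step 2: rectangular-window DFT magnitude of one frame."""
    n = len(frame)
    return [abs(sum(frame[i] * cmath.exp(-2j * math.pi * i * k / n) for i in range(n)))
            for k in range(n)]

def wiener_prefilter(amplitude, noise_power, eps):
    """Step 3: scale each bin by max(P_s - P_n, 0) / P_s with P_s = amplitude**2."""
    return [a * max(a * a - p_n, 0.0) / max(a * a, eps)
            for a, p_n in zip(amplitude, noise_power)]

def vad_frame(main_frame, ref_frame, noise_main, noise_ref, threshold=0.25, eps=1e-9):
    """Return (energy_ratio, is_speech) for one pair of frames (step 1 inputs)."""
    s_pa1 = wiener_prefilter(amplitude_spectrum(main_frame), noise_main, eps)
    s_pa2 = wiener_prefilter(amplitude_spectrum(ref_frame), noise_ref, eps)
    # Step 4: envelope from the current pre-filtered primary spectrum
    # (simplification: no short-time averaging across frames).
    peak = max(max(s_pa1), eps)
    envelope = [a / peak for a in s_pa1]
    # Step 5: shape both channels with the same envelope.
    s_sa1 = [a * g for a, g in zip(s_pa1, envelope)]
    s_sa2 = [a * g for a, g in zip(s_pa2, envelope)]
    # Step 6: full-band ratio of the accumulated shaped spectra.
    ratio = sum(s_sa1) / (eps + sum(s_sa2))
    # Step 7: threshold decision.
    return ratio, ratio > threshold

# Usage: a tone that is ten times stronger at the primary microphone.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
ratio, speech = vad_frame(tone, [0.1 * x for x in tone], [0.0] * 8, [0.0] * 8)
```

A real detector would keep per-bin state between frames (noise estimates, envelope, smoothed ratio) as the later sections describe; this sketch only shows how the stages connect.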
Fig. 3 shows a specific embodiment of the above technical scheme, in which the noisy speech signal and the corresponding noise signal are acquired with an existing dual-microphone arrangement: the primary microphone acquires the noisy speech signal $s_1(t)$, and the secondary microphone acquires the corresponding noise signal $s_2(t)$.
In a preferred embodiment, in step 2, the noisy speech signal is transformed to the frequency domain by the discrete Fourier transform (DFT), the discrete cosine transform, or the modified discrete cosine transform to obtain the noisy speech amplitude spectrum. On this basis, when the discrete Fourier transform is used, the noisy speech amplitude spectrum is calculated by the following formula:
$$S_{a1}[k]_t = \left| \sum_{n=1}^{N} w(n)\, s_1(t-N+n)\, e^{-\frac{2\pi j (n-1)(k-1)}{N}} \right|$$
where $S_{a1}$ is the noisy speech amplitude spectrum, $s_1(t)$ is the noisy speech signal, $e$ is the base of the natural logarithm, $j$ is the imaginary unit, $j = (-1)^{0.5}$, $k$ is the discrete frequency index, $k = 1, 2, 3, \ldots, N$, the subscript $t$ is the discrete time index, and $w(n)$ is an $N$-point window function.
Since the discrete cosine transform and the modified discrete cosine transform are well known in the art, they are not described further.
In a further embodiment, the noise signal is transformed to the frequency domain by the discrete Fourier transform (DFT), the discrete cosine transform, or the modified discrete cosine transform to obtain the noise amplitude spectrum.
On this basis, when the discrete Fourier transform is used, the noise amplitude spectrum is calculated by the following formula:
$$S_{a2}[k]_t = \left| \sum_{n=1}^{N} w(n)\, s_2(t-N+n)\, e^{-\frac{2\pi j (n-1)(k-1)}{N}} \right|$$
where $S_{a2}$ is the noise amplitude spectrum, $s_2(t)$ is the noise signal, $e$ is the base of the natural logarithm, $j$ is the imaginary unit, $j = (-1)^{0.5}$, $k$ is the discrete frequency index, $k = 1, 2, 3, \ldots, N$, the subscript $t$ is the discrete time index, and $w(n)$ is an $N$-point window function.
In the above embodiments, the value of $N$ determines the resolution of the frequency-domain analysis. Given the requirements that the frequency-domain resolution be greater than 100 Hz and the time window be shorter than 0.2 s, the range of $N$ can be $f_s/100/2 < N < 0.2 f_s$, where $f_s$ is the sampling frequency; preferably, $N = 512$ when $f_s = 8000$ Hz.
In a further technical scheme, the window function $w(n)$ can be a rectangular window, a sine window, a Hanning window, a Hamming window, or a Tukey window.
Since the window functions listed above are well known in the art, they are not described further.
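For reference, four of the listed windows can be generated as in the sketch below. The symmetric definitions and the half-sample offset in the sine window are common conventions assumed here, not something the text specifies, and the Tukey window is left out for brevity.

```python
import math

def rectangular_window(n_points):
    """All-ones window."""
    return [1.0] * n_points

def sine_window(n_points):
    """sin(pi * (n + 0.5) / N), the form often paired with the MDCT."""
    return [math.sin(math.pi * (n + 0.5) / n_points) for n in range(n_points)]

def hanning_window(n_points):
    """Symmetric Hanning window, 0.5 - 0.5 * cos(2*pi*n / (N - 1)); needs N >= 2."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def hamming_window(n_points):
    """Symmetric Hamming window, 0.54 - 0.46 * cos(2*pi*n / (N - 1)); needs N >= 2."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

# Usage: a 5-point Hanning window is zero at both ends and one in the middle.
w = hanning_window(5)
```

Tapered windows trade a wider main lobe for lower side lobes, which reduces leakage between neighbouring bins when the per-bin gains and ratios are computed.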
In a preferred embodiment, in step 3, the pre-filtering of the noisy speech amplitude spectrum is calculated by the following formula:
$$S_{pa1}[k]_t = S_{a1}[k]_t\, G_1[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa1}$ is the pre-filtered noisy speech amplitude spectrum, $S_{a1}$ is the noisy speech amplitude spectrum, and $G_1$ is the pre-filter transfer function, a vector of length $N$ whose elements lie between 0 and 1.
In a further embodiment, in step 3, the pre-filtering of the noise amplitude spectrum is calculated by the following formula:
$$S_{pa2}[k]_t = S_{a2}[k]_t\, G_2[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa2}$ is the pre-filtered noise amplitude spectrum, $S_{a2}$ is the noise amplitude spectrum, and $G_2$ is the pre-filter transfer function, a vector of length $N$ whose elements lie between 0 and 1.
In one preferred embodiment in, by frequency domain S filter to above-mentioned noisy speech signal amplitude spectrum S a1, and noise signal amplitude spectrum S a2carry out pre-filtering.
According to the concept of S filter well known in the art, frequency domain S filter is to above-mentioned noisy speech signal amplitude spectrum S a1, and noise signal amplitude spectrum S a2the theory form carrying out pre-filtering is:
G 1 [ k ] t = P s , s 1 [ k ] t P s 1 [ k ] t , G 2 [ k ] t = P s , s 2 [ k ] t P s 2 [ k ] t
Wherein P s, s1voice signal and noisy speech signal s 1the cross-power spectrum of (t), P s, s2voice signal and noise signal s 2the cross-power spectrum of (t), P s1noisy speech signal s 1the auto-power spectrum of (t), P s2noise signal s 2the auto-power spectrum of (t).
The auto-power spectrum $P_{s1}$ of the noisy speech signal $s_1(t)$ is obtained by the following formula:
$$P_{s1} = S_{a1}^2$$
where $S_{a1}$ is the noisy speech signal amplitude spectrum.
The auto-power spectrum $P_{s2}$ of the noise signal $s_2(t)$ is obtained by the following formula:
$$P_{s2} = S_{a2}^2$$
where $S_{a2}$ is the noise signal amplitude spectrum.
Because the voice signal is unknown at the pre-filtering stage, the cross-power spectra $P_{s,s1}$ and $P_{s,s2}$ cannot be obtained directly. In a preferred embodiment of the present invention, they are obtained by estimating the noise auto-power spectrum. Known noise estimation methods include tracking the short-term minimum of the signal spectrum and time-recursive averaging; these are well known in the art and are not described further here. The G. Doblinger noise estimation method (see Gerhard Doblinger, "Computationally efficient speech enhancement by spectral minima tracking in subbands," Proc. EUROSPEECH '95, Madrid, pp. 1513-1516), which combines short-term spectral minimum tracking with time-recursive averaging, is used below as an example to illustrate the feasibility of the technical solution of the present invention.
Based on the G. Doblinger noise estimation method, the auto-power spectrum $P_{n1}$ of the noise in the noisy speech signal $s_1(t)$ is estimated by the following formula:
$$P_{n1}[k]_t = \begin{cases} \eta_1 P_{n1}[k]_{t-1} + (1-\eta_1) P_{s1}[k]_t, & P_{n1}[k]_{t-1} > P_{s1}[k]_t \\ \max\left( P_{n1}[k]_{t-1},\ \eta_2 P_{n1}[k]_{t-1} + (1-\eta_2) \dfrac{P_{s1}[k]_t - \eta_3 P_{s1}[k]_{t-1}}{1-\eta_3} \right), & P_{n1}[k]_{t-1} \le P_{s1}[k]_t \end{cases}$$
where the subscript t is the discrete time index and $\eta_1$, $\eta_2$, $\eta_3$ are smoothing factors in the range $0 < \eta_1, \eta_2, \eta_3 < 1$.
On the basis of the above technical scheme, the auto-power spectrum $P_{n2}$ of the noise in the noise signal $s_2(t)$ is further estimated by the following formula:
$$P_{n2}[k]_t = \begin{cases} \eta_1 P_{n2}[k]_{t-1} + (1-\eta_1) P_{s2}[k]_t, & P_{n2}[k]_{t-1} > P_{s2}[k]_t \\ \max\left( P_{n2}[k]_{t-1},\ \eta_2 P_{n2}[k]_{t-1} + (1-\eta_2) \dfrac{P_{s2}[k]_t - \eta_3 P_{s2}[k]_{t-1}}{1-\eta_3} \right), & P_{n2}[k]_{t-1} \le P_{s2}[k]_t \end{cases}$$
where the subscript t is the discrete time index and $\eta_1$, $\eta_2$, $\eta_3$ are smoothing factors in the range $0 < \eta_1, \eta_2, \eta_3 < 1$.
In a preferred embodiment, the smoothing factors are $\eta_1 = 0.99$, $\eta_2 = 0.99$, $\eta_3 = 0.8$.
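The noise recursion can be sketched in Python for one time step over all frequency bins. This is an illustrative sketch only: the function name `update_noise_psd` is hypothetical, and it assumes that the second branch reads as the larger of the previous estimate and η2·P_n[t−1] + (1−η2)·(P_s[t] − η3·P_s[t−1])/(1−η3).

```python
def update_noise_psd(p_n_prev, p_s_now, p_s_prev,
                     eta1=0.99, eta2=0.99, eta3=0.8):
    """Advance the noise auto-power spectrum estimate by one frame.

    p_n_prev: previous noise estimate per bin, P_n[k]_{t-1}
    p_s_now:  current signal auto-power spectrum per bin, P_s[k]_t
    p_s_prev: previous signal auto-power spectrum per bin, P_s[k]_{t-1}
    """
    p_n_now = []
    for n_prev, s_now, s_prev in zip(p_n_prev, p_s_now, p_s_prev):
        if n_prev > s_now:
            # Signal power fell below the estimate: recursive average toward it.
            est = eta1 * n_prev + (1.0 - eta1) * s_now
        else:
            # Signal power at or above the estimate: slow upward tracking,
            # never letting the estimate drop below its previous value.
            tracked = (eta2 * n_prev
                       + (1.0 - eta2) * (s_now - eta3 * s_prev) / (1.0 - eta3))
            est = max(n_prev, tracked)
        p_n_now.append(est)
    return p_n_now
```

The same function applies to either channel: pass the $P_{s1}$ frames to estimate $P_{n1}$, or the $P_{s2}$ frames to estimate $P_{n2}$.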
Once the auto-power spectrum $P_{s1}$ of the noisy speech signal $s_1(t)$ and the auto-power spectrum $P_{n1}$ of the noise in $s_1(t)$ have been obtained, in a preferred embodiment the frequency-domain Wiener filter $G_1$ used to pre-filter the noisy speech signal amplitude spectrum is calculated by the following formula, in which the unknown cross-power spectrum is approximated by $P_{s1} - P_{n1}$ (voice and noise being taken as uncorrelated) and limited to non-negative values:
$$G_1[k]_t = \frac{\max\left( P_{s1}[k]_t - P_{n1}[k]_t,\ 0 \right)}{P_{s1}[k]_t}$$
where $P_{s1}$ is the auto-power spectrum of the noisy speech signal and $P_{n1}$ is the auto-power spectrum of the noise in the noisy speech signal.
In a further embodiment, once the auto-power spectrum $P_{s2}$ of the noise signal $s_2(t)$ and the auto-power spectrum $P_{n2}$ of the noise in $s_2(t)$ have been obtained, the frequency-domain Wiener filter $G_2$ used to pre-filter the noise signal amplitude spectrum is calculated by the following formula:
$$G_2[k]_t = \frac{\max\left( P_{s2}[k]_t - P_{n2}[k]_t,\ 0 \right)}{P_{s2}[k]_t}$$
where $P_{s2}$ is the auto-power spectrum of the noise signal and $P_{n2}$ is the auto-power spectrum of the noise in the noise signal.
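The gain formula and its application to an amplitude spectrum can be sketched as follows. This is a minimal illustration; the names `wiener_gain` and `prefilter` are hypothetical, and the small `floor` in the denominator is an added assumption to avoid division by zero in empty bins, which the text does not address.

```python
def wiener_gain(p_s, p_n, floor=1e-12):
    """Per-bin gain max(P_s - P_n, 0) / P_s, with values between 0 and 1.

    floor is an assumed guard against division by zero when a bin is empty.
    """
    return [max(s - n, 0.0) / max(s, floor) for s, n in zip(p_s, p_n)]

def prefilter(amplitude, gain):
    """Apply the pre-filter: S_pa[k] = S_a[k] * G[k] for every bin k."""
    return [a * g for a, g in zip(amplitude, gain)]

# Example: three bins with amplitudes 2, 1, 0 (so P_s = 4, 1, 0).
amplitude = [2.0, 1.0, 0.0]
p_s = [a * a for a in amplitude]
p_n = [1.0, 2.0, 0.0]
print(prefilter(amplitude, wiener_gain(p_s, p_n)))
```

A bin whose estimated noise power exceeds its signal power receives zero gain, so it contributes nothing to the later energy sums.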
As an alternative embodiment, in order to retain more of the dominant energy, the frequency-domain Wiener filter $G_1$ used to pre-filter the noisy speech signal amplitude spectrum is calculated by the following formulas:
$$G_1[k]_t = \frac{SNR_1[k]_t}{SNR_1[k]_t + 1}$$
$$SNR_1[k]_t = \alpha_1 \, G_1[k]_{t-1}^2 \, SNR_{P1}[k]_{t-1} + (1-\alpha_1) \max\left( SNR_{P1}[k]_t - 1,\ 0 \right)$$
$$SNR_{P1}[k]_t = \frac{P_{s1}[k]_t}{P_{n1}[k]_t}$$
where $SNR_1$ is the signal-to-noise ratio of the noisy speech signal, $SNR_{P1}$ is the posterior signal-to-noise ratio of the noisy speech signal, $P_{s1}$ is the auto-power spectrum of the noisy speech signal, $P_{n1}$ is the auto-power spectrum of the noise in the noisy speech signal, and $\alpha_1$ and $\alpha_2$ lie in the range $0 < \alpha_1, \alpha_2 < 1$.
In a further alternative embodiment, in order to retain more of the dominant energy, the frequency-domain Wiener filter $G_2$ used to pre-filter the noise signal amplitude spectrum is calculated by the following formulas:
$$G_2[k]_t = \frac{SNR_2[k]_t}{SNR_2[k]_t + 1}$$
$$SNR_2[k]_t = \alpha_2 \, G_2[k]_{t-1}^2 \, SNR_{P2}[k]_{t-1} + (1-\alpha_2) \max\left( SNR_{P2}[k]_t - 1,\ 0 \right)$$
$$SNR_{P2}[k]_t = \frac{P_{s2}[k]_t}{P_{n2}[k]_t}$$
where $SNR_2$ is the signal-to-noise ratio of the noise signal, $SNR_{P2}$ is the posterior signal-to-noise ratio of the noise signal, $P_{s2}$ is the auto-power spectrum of the noise signal, $P_{n2}$ is the auto-power spectrum of the noise in the noise signal, and $\alpha_1$ and $\alpha_2$ lie in the range $0 < \alpha_1, \alpha_2 < 1$.
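The SNR-based gain can be sketched in Python under one reading of the recursion, namely the decision-directed form in which the second term is max(SNR_P[t] − 1, 0). That reading, the names `posterior_snr` and `snr_gain`, the default `alpha=0.98` and the `floor` guard are all assumptions of this sketch; the text only requires 0 < α < 1.

```python
def posterior_snr(p_s, p_n, floor=1e-12):
    """Posterior SNR per bin: P_s[k] / P_n[k] (floor is an assumed guard)."""
    return [s / max(n, floor) for s, n in zip(p_s, p_n)]

def snr_gain(g_prev, snr_post_prev, snr_post_now, alpha=0.98):
    """Gain SNR / (SNR + 1) from a recursively smoothed SNR estimate.

    g_prev:        gain of the previous frame, G[k]_{t-1}
    snr_post_prev: posterior SNR of the previous frame
    snr_post_now:  posterior SNR of the current frame
    """
    gains = []
    for g, sp_prev, sp_now in zip(g_prev, snr_post_prev, snr_post_now):
        snr = alpha * g * g * sp_prev + (1.0 - alpha) * max(sp_now - 1.0, 0.0)
        gains.append(snr / (snr + 1.0))
    return gains
```

Because the smoothed SNR is never negative, the resulting gain always lies in the interval from 0 (inclusive) to 1 (exclusive), consistent with the element range required of $G_1$ and $G_2$.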
In a preferred embodiment, in step 4, the short-term envelope of the voice signal is obtained by normalizing the short-term speech amplitude spectrum, calculated by the following formula:
$$G_L[k] = \frac{S_a[k]}{\max\limits_{k=1}^{N} S_a[k]}, \quad k = 1, 2, 3, \ldots, N$$
where $G_L$ is the short-term envelope of the voice signal and $S_a$ is the short-term speech amplitude spectrum.
Because the short-term speech amplitude spectrum $S_a$ cannot be obtained directly, in a preferred embodiment it may be replaced by the short-term average amplitude spectrum of the enhanced signal output after speech enhancement of the noisy speech signal; alternatively, $S_a$ may be replaced by the short-term average of the pre-filtered noisy speech signal amplitude spectrum $S_{pa1}$.
When the short-term speech amplitude spectrum $S_a$ is replaced by the short-term average of the pre-filtered noisy speech signal amplitude spectrum $S_{pa1}$, it is calculated by the following formula:
$$S_a[k]_t = \begin{cases} \alpha_{sa} S_a[k]_{t-1} + (1-\alpha_{sa}) S_{pa1}[k]_t, & S_{pa1}[k]_t > S_a[k]_{t-1} \\ \beta_{sa} S_a[k]_{t-1} + (1-\beta_{sa}) S_{pa1}[k]_t, & S_{pa1}[k]_t \le S_a[k]_{t-1} \end{cases} \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa1}$ is the noisy speech signal amplitude spectrum after pre-filtering, and $\alpha_{sa}$ and $\beta_{sa}$ are smoothing factors with $0 \le \alpha_{sa} \le \beta_{sa} < 1$.
In a preferred embodiment, the smoothing factor $\alpha_{sa} = 1/2$ and the smoothing factor $\beta_{sa} = 31/32$.
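The envelope computation from the pre-filtered amplitude spectrum can be sketched in two steps: asymmetric smoothing, then normalization by the peak. The names `smooth_amplitude` and `normalize_envelope` are hypothetical, and returning an all-zero envelope for a silent frame is an assumption of this sketch, since the text does not cover that case.

```python
def smooth_amplitude(s_a_prev, s_pa1_now, alpha_sa=0.5, beta_sa=31 / 32):
    """One smoothing step per bin: fast rise (alpha_sa), slow decay (beta_sa)."""
    out = []
    for prev, cur in zip(s_a_prev, s_pa1_now):
        coeff = alpha_sa if cur > prev else beta_sa
        out.append(coeff * prev + (1.0 - coeff) * cur)
    return out

def normalize_envelope(s_a, eps=1e-12):
    """G_L[k] = S_a[k] / max_k S_a[k]; all zeros if the frame is silent (assumed)."""
    peak = max(s_a)
    if peak <= eps:
        return [0.0] * len(s_a)
    return [v / peak for v in s_a]

# Example: bin 0 rises from 0 toward 4, bin 1 decays from 32 toward 0.
s_a = smooth_amplitude([0.0, 32.0], [4.0, 0.0])
print(s_a, normalize_envelope(s_a))
```

With the preferred factors, a rising bin moves halfway to the new value in one frame, while a falling bin keeps 31/32 of its previous value, so the envelope holds its shape across short gaps in the speech.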
As an alternative embodiment, in step 4, the short-term envelope of the voice signal is obtained by normalizing the short-term speech energy spectrum, calculated by the following formula:
$$G_L[k] = \frac{S_e[k]}{\max\limits_{k=1}^{N} S_e[k]}, \quad k = 1, 2, 3, \ldots, N$$
where $G_L$ is the short-term envelope of the voice signal and $S_e$ is the short-term speech energy spectrum.
Because the short-term speech energy spectrum $S_e$ cannot be obtained directly, in a preferred embodiment it may be replaced by the short-term average energy spectrum of the enhanced signal output after speech enhancement of the noisy speech signal; alternatively, $S_e$ may be replaced by the short-term mean square of the pre-filtered noisy speech signal amplitude spectrum $S_{pa1}$.
When the short-term speech energy spectrum $S_e$ is replaced by the short-term mean square of the pre-filtered noisy speech signal amplitude spectrum $S_{pa1}$, it is calculated by the following formula:
$$S_e[k]_t = \begin{cases} \alpha_{se} S_e[k]_{t-1} + (1-\alpha_{se}) S_{pa1}[k]_t^2, & S_{pa1}[k]_t^2 > S_e[k]_{t-1} \\ \beta_{se} S_e[k]_{t-1} + (1-\beta_{se}) S_{pa1}[k]_t^2, & S_{pa1}[k]_t^2 \le S_e[k]_{t-1} \end{cases} \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa1}$ is the noisy speech signal amplitude spectrum after pre-filtering, and $\alpha_{se}$ and $\beta_{se}$ are smoothing factors with $0 \le \alpha_{se} \le \beta_{se} < 1$.
In a preferred embodiment, the smoothing factor $\alpha_{se} = 1/2$ and the smoothing factor $\beta_{se} = 31/32$.
In a preferred embodiment, in step 4, the short-term envelope of the voice signal is obtained by short-term smoothing of the pre-filter transfer function $G_1$ of the noisy speech signal amplitude spectrum or the pre-filter transfer function $G_2$ of the noise signal amplitude spectrum.
When the short-term envelope of the voice signal is obtained from the pre-filter transfer function $G_1$ of the noisy speech signal amplitude spectrum, it is calculated by the following formula:
$$G_L[k]_t = \begin{cases} \alpha_G G_L[k]_{t-1} + (1-\alpha_G) G_1[k]_t, & G_1[k]_t > G_L[k]_{t-1} \\ \beta_G G_L[k]_{t-1} + (1-\beta_G) G_1[k]_t, & G_1[k]_t \le G_L[k]_{t-1} \end{cases} \quad k = 1, 2, 3, \ldots, N$$
where $G_L$ is the short-term envelope of the voice signal, and $\alpha_G$ and $\beta_G$ are smoothing factors with $0 \le \alpha_G \le \beta_G < 1$.
In a further embodiment, the smoothing factor $\alpha_G = 1/2$ and the smoothing factor $\beta_G = 31/32$.
In a preferred embodiment, in step 5:
Shaping of the pre-filtered noisy speech signal amplitude spectrum with the short-term envelope of the voice signal is calculated by the following formula:
$$S_{sa1}[k]_t = S_{pa1}[k]_t \, G_L[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{sa1}$ is the noisy speech signal amplitude spectrum after shaping, $S_{pa1}$ is the noisy speech signal amplitude spectrum after pre-filtering, and $G_L$ is the short-term envelope of the voice signal; $G_L$ is a vector of length N whose elements lie between 0 and 1.
In a further embodiment, shaping of the pre-filtered noise signal amplitude spectrum with the short-term envelope of the voice signal is calculated by the following formula:
$$S_{sa2}[k]_t = S_{pa2}[k]_t \, G_L[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{sa2}$ is the noise signal amplitude spectrum after shaping, $S_{pa2}$ is the noise signal amplitude spectrum after pre-filtering, and $G_L$ is the short-term envelope of the voice signal; $G_L$ is a vector of length N whose elements lie between 0 and 1.
In a preferred embodiment, in step 6, the energy ratio obtained is a full-band energy ratio, calculated by the following formula:
$$r_{t1} = \frac{\sum_{k=1}^{N} S_{sa1}[k]_t}{\varepsilon + \sum_{k=1}^{N} S_{sa2}[k]_t}$$
where $S_{sa1}$ is the noisy speech signal amplitude spectrum after shaping, $S_{sa2}$ is the noise signal amplitude spectrum after shaping, $r_{t1}$ is the full-band energy ratio, and $\varepsilon$ is a small positive number that prevents division by zero.
As an alternative embodiment, in step 6, the energy ratio obtained is a sub-band energy ratio, calculated by the following formula:
$$r_{t2} = \frac{\sum_{k=K_s}^{K_e} S_{sa1}[k]_t}{\varepsilon + \sum_{k=K_s}^{K_e} S_{sa2}[k]_t}$$
where $S_{sa1}$ is the noisy speech signal amplitude spectrum after shaping, $S_{sa2}$ is the noise signal amplitude spectrum after shaping, $r_{t2}$ is the sub-band energy ratio, $\varepsilon$ is a small positive number that prevents division by zero, $K_s$ is the starting index of the sub-band, and $K_e$ is the ending index of the sub-band.
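Envelope shaping followed by the accumulated comparison can be sketched as below. The function names `shape_spectrum` and `energy_ratio` and the default `eps=1e-10` are assumptions of this sketch; the text only states that ε is a small positive number. The band limits are 1-based and inclusive, matching the index k used above.

```python
def shape_spectrum(prefiltered, envelope):
    """S_sa[k] = S_pa[k] * G_L[k] for every bin k."""
    if len(prefiltered) != len(envelope):
        raise ValueError("spectrum and envelope must have the same length")
    return [s * g for s, g in zip(prefiltered, envelope)]

def energy_ratio(s_sa1, s_sa2, k_start=1, k_end=None, eps=1e-10):
    """Ratio of accumulated shaped spectra over bins k_start..k_end (1-based).

    With the defaults the sum runs over all N bins (full-band ratio);
    passing k_start and k_end gives a sub-band ratio.
    """
    if k_end is None:
        k_end = len(s_sa1)
    numerator = sum(s_sa1[k_start - 1:k_end])
    denominator = eps + sum(s_sa2[k_start - 1:k_end])
    return numerator / denominator

# Example with four bins and a flat envelope.
primary = shape_spectrum([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
reference = shape_spectrum([4.0, 1.0, 1.0, 4.0], [1.0, 1.0, 1.0, 1.0])
print(energy_ratio(primary, reference), energy_ratio(primary, reference, 2, 3))
```

In this example the full-band ratio is close to 1, while the sub-band ratio over bins 2 to 3 is close to 2.5, which shows how restricting the band can raise the contrast when the noise is concentrated elsewhere.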
In a preferred embodiment, in step 7, the energy ratio obtained in step 6 may be compared with a predetermined threshold:
when the energy ratio is greater than the predetermined threshold, the signal in the corresponding frequency band at the time of the energy ratio is judged to be voice;
when the energy ratio is less than the predetermined threshold, the signal in the corresponding frequency band at the time of the energy ratio is judged to be noise;
the predetermined threshold is a positive real number between 0 and 1.
As an alternative embodiment, in step 7, the energy ratio obtained in step 6 may first be smoothed by the following formula before the decision is made:
$$r_{s,t} = \begin{cases} \alpha_r r_{s,t-1} + (1-\alpha_r) r_t, & r_t > r_{s,t-1} \\ \beta_r r_{s,t-1} + (1-\beta_r) r_t, & r_t \le r_{s,t-1} \end{cases}$$
where $r_{s,t}$ is the energy ratio after smoothing, $r_t$ is the energy ratio obtained in step 6 at time t, and $\alpha_r$ and $\beta_r$ are smoothing factors with $0 \le \alpha_r \le \beta_r < 1$.
Preferably, the smoothing factor $\alpha_r = 1/16$ and the smoothing factor $\beta_r = 3/4$.
As a further embodiment, the smoothed energy ratio may then be compared with a predetermined threshold:
when the smoothed energy ratio is greater than the predetermined threshold, the signal in the corresponding frequency band at the time of the energy ratio is judged to be voice;
when the smoothed energy ratio is less than the predetermined threshold, the signal in the corresponding frequency band at the time of the energy ratio is judged to be noise;
the predetermined threshold is a positive real number between 0 and 1.
In a preferred embodiment, the predetermined threshold is 0.25.
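The smoothing and threshold decision can be sketched as a small loop over frames. The names `smooth_ratio` and `detect_voice`, the initial smoothed value of 0, and the treatment of a ratio exactly equal to the threshold as noise are assumptions of this sketch; the text does not specify them. The sketch also assumes that the quantity being smoothed is the unsmoothed energy ratio of the current frame.

```python
def smooth_ratio(r_s_prev, r_now, alpha_r=1 / 16, beta_r=3 / 4):
    """One smoothing step: fast rise (alpha_r), slower decay (beta_r)."""
    coeff = alpha_r if r_now > r_s_prev else beta_r
    return coeff * r_s_prev + (1.0 - coeff) * r_now

def detect_voice(ratios, threshold=0.25, initial=0.0):
    """Return one boolean per frame: True for voice, False for noise."""
    decisions = []
    r_s = initial
    for r in ratios:
        r_s = smooth_ratio(r_s, r)
        decisions.append(r_s > threshold)  # equality counts as noise (assumed)
    return decisions

# Example: a single high-ratio frame followed by a low-ratio frame.
print(detect_voice([0.0, 1.6, 0.0]))
```

With the preferred factors the smoothed ratio follows an onset almost immediately but decays by only a quarter per frame, so the decision holds briefly after a syllable ends instead of flickering.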
The technical scheme of the present invention also includes a voice capture device that adopts the above dual-microphone voice activation detection method.
Fig. 4 shows an example of voice activation detection on a segment of noisy speech. The segment lasts 35 seconds; the noise is that of an aircraft engine, which varies continuously from quiet to loud and back to quiet, and the speech is almost completely masked in the middle period. The top of Fig. 4 shows the spectrogram of the main channel of this noisy speech. The middle of Fig. 4 shows the energy ratio curve of the prior art, and the bottom shows the energy ratio curve obtained by the embodiment of the technical solution of the present invention. For ease of observation, the energy ratio r is converted here to the logarithm of its reciprocal, i.e. $-\log_2(r)$. It is easy to see that, as the noise energy increases, the contrast of the prior-art energy ratio curve becomes smaller and smaller; if a threshold decision is made in the interval from 10 to 25 seconds, where the noise is very loud, some lower-energy syllables will be lost. The energy ratio curve obtained by the embodiment of the technical solution of the present invention still retains a large contrast, so it can provide a more accurate voice activation decision.
In summary, the embodiments of the present invention provide a dual-microphone voice activation detection method and a voice capture device. The method first applies a frequency-domain transform to the noisy speech signal and the noise signal to obtain frequency-domain amplitude spectra; it then pre-filters each spectrum with a pre-filter to obtain the pre-filtered amplitude spectra; next it shapes the pre-filtered amplitude spectra with the short-term envelope of the voice signal; it then accumulates and compares the shaped amplitude spectra to obtain the energy ratio of the noisy speech signal to the noise signal; finally it uses the energy ratio to make the voice activation decision. The embodiments of the technical solution of the present invention can significantly improve the accuracy of voice activation decisions at low signal-to-noise ratios.
The preferred embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the particular embodiments described; equipment and structures not described in detail should be understood as being implemented in the manner common in the art. Any person of ordinary skill in the art may, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible variations and modifications to the technical solution, or revise it into equivalent embodiments, without affecting the substance of the present invention. Therefore, any simple modification, equivalent variation, or adaptation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution, still falls within the scope of protection of the technical solution of the present invention.

Claims (26)

1. A dual-microphone voice activation detection method, characterized by comprising the following steps:
Step 1: acquiring a noisy speech signal and a noise signal corresponding to said noisy speech signal;
Step 2: performing a frequency-domain transform on said noisy speech signal to obtain a noisy speech signal amplitude spectrum, and performing a frequency-domain transform on said noise signal to obtain a noise signal amplitude spectrum;
Step 3: pre-filtering said noisy speech signal amplitude spectrum and said noise signal amplitude spectrum respectively;
Step 4: obtaining a short-term envelope of the voice signal;
Step 5: shaping the pre-filtered noisy speech signal amplitude spectrum and the pre-filtered noise signal amplitude spectrum with the short-term envelope of said voice signal;
Step 6: accumulating and comparing the shaped noisy speech signal amplitude spectrum and the shaped noise signal amplitude spectrum to obtain an energy ratio;
Step 7: making a voice activation decision according to said energy ratio.
2. The dual-microphone voice activation detection method as claimed in claim 1, characterized in that, in said step 2:
the frequency-domain transform of said noisy speech signal is performed by a discrete Fourier transform (DFT), a discrete cosine transform, or a modified discrete cosine transform to obtain the noisy speech signal amplitude spectrum; and/or
the frequency-domain transform of said noise signal is performed by a discrete Fourier transform (DFT), a discrete cosine transform, or a modified discrete cosine transform to obtain the noise signal amplitude spectrum.
3. The dual-microphone voice activation detection method as claimed in claim 2, characterized in that said noisy speech signal amplitude spectrum is obtained by a discrete Fourier transform (DFT) calculated by the following formula:
$$S_{a1}[k]_t = \left| \sum_{n=1}^{N} w(n)\, s_1(t - N + n)\, e^{-\frac{2\pi j (n-1)(k-1)}{N}} \right|$$
where $S_{a1}$ is said noisy speech signal amplitude spectrum, $s_1(t)$ is said noisy speech signal, e is the base of the natural logarithm, j is the imaginary unit, $j = (-1)^{0.5}$, k is the discrete spectrum index, k = 1, 2, 3, ..., N, the subscript t is the discrete time index, and w(n) is an N-point window function; and/or
said noise signal amplitude spectrum is obtained by a discrete Fourier transform (DFT) calculated by the following formula:
$$S_{a2}[k]_t = \left| \sum_{n=1}^{N} w(n)\, s_2(t - N + n)\, e^{-\frac{2\pi j (n-1)(k-1)}{N}} \right|$$
where $S_{a2}$ is said noise signal amplitude spectrum, $s_2(t)$ is said noise signal, e is the base of the natural logarithm, j is the imaginary unit, $j = (-1)^{0.5}$, k is the discrete spectrum index, k = 1, 2, 3, ..., N, the subscript t is the discrete time index, and w(n) is an N-point window function.
4. The dual-microphone voice activation detection method as claimed in claim 3, characterized in that said N lies in the range $f_s/100/2 < N < 0.2 f_s$, where $f_s$ is the sampling frequency; or N = 512 when the sampling frequency $f_s$ = 8000 Hz.
5. The dual-microphone voice activation detection method as claimed in claim 3, characterized in that said window function is a rectangular window, a sine window, a Hanning window, a Hamming window, or a Tukey window.
6. The dual-microphone voice activation detection method as claimed in claim 1, characterized in that, in said step 3:
the pre-filtering of said noisy speech signal amplitude spectrum is calculated by the following formula:
$$S_{pa1}[k]_t = S_{a1}[k]_t \, G_1[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa1}$ is the noisy speech signal amplitude spectrum after pre-filtering, $S_{a1}$ is the noisy speech signal amplitude spectrum, and $G_1$ is the pre-filter transfer function, $G_1$ being a vector of length N whose elements lie between 0 and 1; and/or
the pre-filtering of said noise signal amplitude spectrum is calculated by the following formula:
$$S_{pa2}[k]_t = S_{a2}[k]_t \, G_2[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa2}$ is the noise signal amplitude spectrum after pre-filtering, $S_{a2}$ is the noise signal amplitude spectrum, and $G_2$ is the pre-filter transfer function, $G_2$ being a vector of length N whose elements lie between 0 and 1.
7. The dual-microphone voice activation detection method as claimed in claim 6, characterized in that a frequency-domain Wiener filter is used to pre-filter said noisy speech signal amplitude spectrum, the frequency-domain Wiener filter that filters said noisy speech signal amplitude spectrum being calculated by the following formula:
$$G_1[k]_t = \frac{\max\left( P_{s1}[k]_t - P_{n1}[k]_t,\ 0 \right)}{P_{s1}[k]_t}$$
where $P_{s1}$ is the auto-power spectrum of the noisy speech signal and $P_{n1}$ is the auto-power spectrum of the noise in said noisy speech signal; and/or
a frequency-domain Wiener filter is used to pre-filter said noise signal amplitude spectrum, the frequency-domain Wiener filter that filters said noise signal amplitude spectrum being calculated by the following formula:
$$G_2[k]_t = \frac{\max\left( P_{s2}[k]_t - P_{n2}[k]_t,\ 0 \right)}{P_{s2}[k]_t}$$
where $P_{s2}$ is the auto-power spectrum of the noise signal and $P_{n2}$ is the auto-power spectrum of the noise in the noise signal.
8. The dual-microphone voice activation detection method as claimed in claim 6, characterized in that a frequency-domain Wiener filter is used to pre-filter said noisy speech signal amplitude spectrum, the frequency-domain Wiener filter that filters said noisy speech signal amplitude spectrum being calculated by the following formulas:
$$G_1[k]_t = \frac{SNR_1[k]_t}{SNR_1[k]_t + 1}$$
$$SNR_1[k]_t = \alpha_1 \, G_1[k]_{t-1}^2 \, SNR_{P1}[k]_{t-1} + (1-\alpha_1) \max\left( SNR_{P1}[k]_t - 1,\ 0 \right)$$
$$SNR_{P1}[k]_t = \frac{P_{s1}[k]_t}{P_{n1}[k]_t}$$
where $SNR_1$ is the signal-to-noise ratio of the noisy speech signal, $SNR_{P1}$ is the posterior signal-to-noise ratio of the noisy speech signal, $P_{s1}$ is the auto-power spectrum of the noisy speech signal, $P_{n1}$ is the auto-power spectrum of the noise in said noisy speech signal, and $\alpha_1$ and $\alpha_2$ lie in the range $0 < \alpha_1, \alpha_2 < 1$; and/or
a frequency-domain Wiener filter is used to pre-filter said noise signal amplitude spectrum, the frequency-domain Wiener filter that filters said noise signal amplitude spectrum being calculated by the following formulas:
$$G_2[k]_t = \frac{SNR_2[k]_t}{SNR_2[k]_t + 1}$$
$$SNR_2[k]_t = \alpha_2 \, G_2[k]_{t-1}^2 \, SNR_{P2}[k]_{t-1} + (1-\alpha_2) \max\left( SNR_{P2}[k]_t - 1,\ 0 \right)$$
$$SNR_{P2}[k]_t = \frac{P_{s2}[k]_t}{P_{n2}[k]_t}$$
where $SNR_2$ is the signal-to-noise ratio of the noise signal, $SNR_{P2}$ is the posterior signal-to-noise ratio of the noise signal, $P_{s2}$ is the auto-power spectrum of the noise signal, $P_{n2}$ is the auto-power spectrum of the noise in the noise signal, and $\alpha_1$ and $\alpha_2$ lie in the range $0 < \alpha_1, \alpha_2 < 1$.
9. The dual-microphone voice activation detection method as claimed in claim 7 or 8, characterized in that the auto-power spectrum $P_{s1}$ of said noisy speech signal is calculated by the following formula:
$$P_{s1} = S_{a1}^2$$
where $S_{a1}$ is said noisy speech signal amplitude spectrum obtained from said noisy speech signal by the frequency-domain transform; and/or
the auto-power spectrum $P_{s2}$ of said noise signal is calculated by the following formula:
$$P_{s2} = S_{a2}^2$$
where $S_{a2}$ is said noise signal amplitude spectrum obtained from said noise signal by the frequency-domain transform.
10. The dual-microphone voice activation detection method as claimed in claim 7 or 8, characterized in that the auto-power spectrum $P_{n1}$ of the noise in said noisy speech signal is estimated by the following formula:
$$P_{n1}[k]_t = \begin{cases} \eta_1 P_{n1}[k]_{t-1} + (1-\eta_1) P_{s1}[k]_t, & P_{n1}[k]_{t-1} > P_{s1}[k]_t \\ \max\left( P_{n1}[k]_{t-1},\ \eta_2 P_{n1}[k]_{t-1} + (1-\eta_2) \dfrac{P_{s1}[k]_t - \eta_3 P_{s1}[k]_{t-1}}{1-\eta_3} \right), & P_{n1}[k]_{t-1} \le P_{s1}[k]_t \end{cases}$$
where the subscript t is the discrete time index and $\eta_1$, $\eta_2$, $\eta_3$ are smoothing factors in the range $0 < \eta_1, \eta_2, \eta_3 < 1$; and/or
the auto-power spectrum $P_{n2}$ of the noise in said noise signal is estimated by the following formula:
$$P_{n2}[k]_t = \begin{cases} \eta_1 P_{n2}[k]_{t-1} + (1-\eta_1) P_{s2}[k]_t, & P_{n2}[k]_{t-1} > P_{s2}[k]_t \\ \max\left( P_{n2}[k]_{t-1},\ \eta_2 P_{n2}[k]_{t-1} + (1-\eta_2) \dfrac{P_{s2}[k]_t - \eta_3 P_{s2}[k]_{t-1}}{1-\eta_3} \right), & P_{n2}[k]_{t-1} \le P_{s2}[k]_t \end{cases}$$
where the subscript t is the discrete time index and $\eta_1$, $\eta_2$, $\eta_3$ are smoothing factors in the range $0 < \eta_1, \eta_2, \eta_3 < 1$.
11. The dual-microphone voice activation detection method as claimed in claim 1, characterized in that, in said step 4, the short-term envelope of said voice signal is calculated by the following formula:
$$G_L[k] = \frac{S_a[k]}{\max\limits_{k=1}^{N} S_a[k]}, \quad k = 1, 2, 3, \ldots, N$$
where $G_L$ is the short-term envelope of said voice signal and $S_a$ is the short-term speech amplitude spectrum.
12. The dual-microphone voice activation detection method as claimed in claim 11, characterized in that said short-term speech amplitude spectrum $S_a$ is replaced by the short-term average amplitude spectrum of the enhanced signal output after speech enhancement of said noisy speech signal; or
said short-term speech amplitude spectrum $S_a$ is replaced by the short-term average of the pre-filtered noisy speech signal amplitude spectrum of said noisy speech signal, calculated by the following formula:
$$S_a[k]_t = \begin{cases} \alpha_{sa} S_a[k]_{t-1} + (1-\alpha_{sa}) S_{pa1}[k]_t, & S_{pa1}[k]_t > S_a[k]_{t-1} \\ \beta_{sa} S_a[k]_{t-1} + (1-\beta_{sa}) S_{pa1}[k]_t, & S_{pa1}[k]_t \le S_a[k]_{t-1} \end{cases} \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa1}$ is the noisy speech signal amplitude spectrum after pre-filtering, and $\alpha_{sa}$ and $\beta_{sa}$ are smoothing factors with $0 \le \alpha_{sa} \le \beta_{sa} < 1$.
13. The dual-microphone voice activation detection method as claimed in claim 12, characterized in that said smoothing factor $\alpha_{sa} = 1/2$ and said smoothing factor $\beta_{sa} = 31/32$.
14. The dual-microphone voice activation detection method as claimed in claim 1, characterized in that, in said step 4, the short-term envelope of said voice signal is calculated by the following formula:
$$G_L[k] = \frac{S_e[k]}{\max\limits_{k=1}^{N} S_e[k]}, \quad k = 1, 2, 3, \ldots, N$$
where $G_L$ is the short-term envelope of said voice signal and $S_e$ is the short-term speech energy spectrum.
15. The dual-microphone voice activation detection method as claimed in claim 14, characterized in that said short-term speech energy spectrum $S_e$ is replaced by the short-term average energy spectrum of the enhanced signal output after speech enhancement of said noisy speech signal; or
said short-term speech energy spectrum $S_e$ is replaced by the short-term mean square of the pre-filtered noisy speech signal amplitude spectrum $S_{pa1}$ of said noisy speech signal, calculated by the following formula:
$$S_e[k]_t = \begin{cases} \alpha_{se} S_e[k]_{t-1} + (1-\alpha_{se}) S_{pa1}[k]_t^2, & S_{pa1}[k]_t^2 > S_e[k]_{t-1} \\ \beta_{se} S_e[k]_{t-1} + (1-\beta_{se}) S_{pa1}[k]_t^2, & S_{pa1}[k]_t^2 \le S_e[k]_{t-1} \end{cases} \quad k = 1, 2, 3, \ldots, N$$
where $S_{pa1}$ is the noisy speech signal amplitude spectrum after pre-filtering, and $\alpha_{se}$ and $\beta_{se}$ are smoothing factors with $0 \le \alpha_{se} \le \beta_{se} < 1$.
16. The dual-microphone voice activation detection method as claimed in claim 15, characterized in that said smoothing factor $\alpha_{se} = 1/2$ and said smoothing factor $\beta_{se} = 31/32$.
17. The dual-microphone voice activation detection method as claimed in claim 2, characterized in that, in said step 4, the short-term envelope of said voice signal is obtained by short-term smoothing of the pre-filter transfer function $G_1$ of the noisy speech signal amplitude spectrum or the pre-filter transfer function $G_2$ of the noise signal amplitude spectrum;
when the short-term envelope of said voice signal is obtained from the pre-filter transfer function $G_1$ of the noisy speech signal amplitude spectrum, it is calculated by the following formula:
$$G_L[k]_t = \begin{cases} \alpha_G G_L[k]_{t-1} + (1-\alpha_G) G_1[k]_t, & G_1[k]_t > G_L[k]_{t-1} \\ \beta_G G_L[k]_{t-1} + (1-\beta_G) G_1[k]_t, & G_1[k]_t \le G_L[k]_{t-1} \end{cases} \quad k = 1, 2, 3, \ldots, N$$
where $G_L$ is the short-term envelope of said voice signal, and $\alpha_G$ and $\beta_G$ are smoothing factors with $0 \le \alpha_G \le \beta_G < 1$.
18. The dual-microphone voice activation detection method as claimed in claim 17, characterized in that said smoothing factor $\alpha_G = 1/2$ and said smoothing factor $\beta_G = 31/32$.
19. The dual-microphone voice activation detection method as claimed in claim 1, characterized in that, in said step 5:
shaping of the pre-filtered noisy speech signal amplitude spectrum with the short-term envelope of said voice signal is calculated by the following formula:
$$S_{sa1}[k]_t = S_{pa1}[k]_t \, G_L[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{sa1}$ is said noisy speech signal amplitude spectrum after shaping, $S_{pa1}$ is said noisy speech signal amplitude spectrum after pre-filtering, and $G_L$ is the short-term envelope of said voice signal, $G_L$ being a vector of length N whose elements lie between 0 and 1; and/or
shaping of the pre-filtered noise signal amplitude spectrum with the short-term envelope of said voice signal is calculated by the following formula:
$$S_{sa2}[k]_t = S_{pa2}[k]_t \, G_L[k]_t, \quad k = 1, 2, 3, \ldots, N$$
where $S_{sa2}$ is said noise signal amplitude spectrum after shaping, $S_{pa2}$ is said noise signal amplitude spectrum after pre-filtering, and $G_L$ is the short-term envelope of said voice signal, $G_L$ being a vector of length N whose elements lie between 0 and 1.
20. The dual-microphone voice activation detection method as claimed in claim 1, characterized in that, in said step 6, said energy ratio obtained is a full-band energy ratio, calculated by the following formula:
$$r_{t1} = \frac{\sum_{k=1}^{N} S_{sa1}[k]_t}{\varepsilon + \sum_{k=1}^{N} S_{sa2}[k]_t}$$
where $S_{sa1}$ is the noisy speech signal amplitude spectrum after shaping, $S_{sa2}$ is the noise signal amplitude spectrum after shaping, $r_{t1}$ is the full-band energy ratio, and $\varepsilon$ is a small positive number that prevents division by zero.
21. The dual-microphone voice activation detection method as claimed in claim 1, characterized in that, in said step 6, said energy ratio obtained is a sub-band energy ratio, calculated by the following formula:
$$r_{t2} = \frac{\sum_{k=K_s}^{K_e} S_{sa1}[k]_t}{\varepsilon + \sum_{k=K_s}^{K_e} S_{sa2}[k]_t}$$
where $S_{sa1}$ is the noisy speech signal amplitude spectrum after shaping, $S_{sa2}$ is the noise signal amplitude spectrum after shaping, $r_{t2}$ is the sub-band energy ratio, $\varepsilon$ is a small positive number that prevents division by zero, $K_s$ is the starting index of the sub-band, and $K_e$ is the ending index of the sub-band.
22. The dual-microphone voice activation detection method as claimed in claim 1, characterized in that, in said step 7, said energy ratio is compared with a predetermined threshold;
when said energy ratio is greater than said predetermined threshold, the signal in the corresponding frequency band at the time of said energy ratio is judged to be voice;
when said energy ratio is less than said predetermined threshold, the signal in the corresponding frequency band at the time of said energy ratio is judged to be noise;
said predetermined threshold is a positive real number between 0 and 1.
23. The dual-microphone voice activity detection method according to claim 1, characterized in that, in step 7, before the judgment is made, the energy ratio is smoothed by the following formula:
$$r_{s,t} = \begin{cases} \alpha_r\, r_{s,t-1} + (1-\alpha_r)\, r_t, & r_t > r_{s,t-1}, \\ \beta_r\, r_{s,t-1} + (1-\beta_r)\, r_t, & r_t \le r_{s,t-1}, \end{cases}$$
where $r_{s,t}$ is the energy ratio after smoothing, $r_{s,t-1}$ is the smoothed energy ratio of the previous frame, $r_t$ is the energy ratio of the current frame before smoothing, and $\alpha_r$ and $\beta_r$ are smoothing factors with $0 \le \alpha_r \le \beta_r < 1$;
the smoothed energy ratio is compared with a predetermined threshold;
when the smoothed energy ratio is greater than the predetermined threshold, the signal in the corresponding frequency band at the time instant of that energy ratio is judged to be voice;
when the smoothed energy ratio is less than the predetermined threshold, the signal in the corresponding frequency band at the time instant of that energy ratio is judged to be noise;
the predetermined threshold is a positive real number between 0 and 1.
24. The dual-microphone voice activity detection method according to claim 23, characterized in that the smoothing factor $\alpha_r = 1/16$ and the smoothing factor $\beta_r = 3/4$.
25. The dual-microphone voice activity detection method according to claim 22 or 23, characterized in that the predetermined threshold is 0.25.
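The smoothing and decision of claims 23 to 25 can be sketched as a per-frame loop. This is a hedged illustration under several assumptions not stated in the claims: the right-hand side of the smoothing formula is taken to use the unsmoothed ratio of the current frame, the smoothed ratio starts at `initial = 0.0`, and a smoothed ratio exactly equal to the threshold is treated as noise. The function name `smooth_and_decide` is hypothetical; the default constants are the values given in claims 24 and 25.

```python
from typing import Iterable, List, Tuple

ALPHA_R = 1 / 16   # smoothing factor used when the ratio rises (claim 24)
BETA_R = 3 / 4     # smoothing factor used when the ratio falls or holds (claim 24)
THRESHOLD = 0.25   # predetermined threshold (claim 25)


def smooth_and_decide(
    ratios: Iterable[float],
    alpha_r: float = ALPHA_R,
    beta_r: float = BETA_R,
    threshold: float = THRESHOLD,
    initial: float = 0.0,  # assumed starting value of the smoothed ratio
) -> List[Tuple[float, bool]]:
    """Return (smoothed ratio, is_voice) for each frame's energy ratio."""
    results: List[Tuple[float, bool]] = []
    prev = initial
    for r in ratios:
        # A rising ratio uses the small factor alpha_r (little weight on
        # history, fast attack); otherwise beta_r gives a slow decay.
        factor = alpha_r if r > prev else beta_r
        prev = factor * prev + (1.0 - factor) * r
        # Equality with the threshold is treated as noise (assumption).
        results.append((prev, prev > threshold))
    return results
```

Because $\alpha_r \le \beta_r$, the smoothed ratio follows a rise quickly and falls slowly, so the voice decision persists for a few frames after the raw ratio drops.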
26. A voice acquisition device employing the dual-microphone voice activity detection method according to any one of claims 1 to 25.
CN201410524677.5A 2014-10-08 2014-10-08 Double-microphone voice active detection method and voice acquisition device Pending CN105575405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410524677.5A CN105575405A (en) 2014-10-08 2014-10-08 Double-microphone voice active detection method and voice acquisition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410524677.5A CN105575405A (en) 2014-10-08 2014-10-08 Double-microphone voice active detection method and voice acquisition device

Publications (1)

Publication Number Publication Date
CN105575405A true CN105575405A (en) 2016-05-11

Family

ID=55885456

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410524677.5A Pending CN105575405A (en) 2014-10-08 2014-10-08 Double-microphone voice active detection method and voice acquisition device

Country Status (1)

Country Link
CN (1) CN105575405A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102077274A (en) * 2008-06-30 2011-05-25 杜比实验室特许公司 Multi-microphone voice activity detector
CN103219012A (en) * 2013-04-23 2013-07-24 中国人民解放军总后勤部军需装备研究所 Double-microphone noise elimination method and device based on sound source distance
CN103258539A (en) * 2012-02-15 2013-08-21 展讯通信(上海)有限公司 Method and device for transforming voice signal characteristics
US20140188467A1 (en) * 2009-05-01 2014-07-03 Aliphcom Vibration sensor and acoustic voice activity detection systems (vads) for use with electronic systems

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106098076A (en) * 2016-06-06 2016-11-09 成都启英泰伦科技有限公司 Time-frequency domain adaptive voice detection method based on dynamic noise estimation
CN107833579A (en) * 2017-10-30 2018-03-23 广州酷狗计算机科技有限公司 Noise cancellation method, device and computer-readable recording medium
CN107833579B (en) * 2017-10-30 2021-06-11 广州酷狗计算机科技有限公司 Noise elimination method, device and computer readable storage medium
CN108597498A (en) * 2018-04-10 2018-09-28 广州势必可赢网络科技有限公司 Multi-microphone voice acquisition method and device
CN109346062A (en) * 2018-12-25 2019-02-15 苏州思必驰信息科技有限公司 Voice endpoint detection method and device
CN109346062B (en) * 2018-12-25 2021-05-28 思必驰科技股份有限公司 Voice endpoint detection method and device
CN110349566A (en) * 2019-07-11 2019-10-18 龙马智芯(珠海横琴)科技有限公司 Voice awakening method, electronic equipment and storage medium
CN110349566B (en) * 2019-07-11 2020-11-24 龙马智芯(珠海横琴)科技有限公司 Voice wake-up method, electronic device and storage medium
CN110514452A (en) * 2019-08-27 2019-11-29 武汉理工大学 Method for eliminating crosstalk of engine noise on intake noise test
CN110514452B (en) * 2019-08-27 2021-04-20 武汉理工大学 Method for eliminating crosstalk of engine noise on intake noise test
CN115238867A (en) * 2022-07-28 2022-10-25 广东电力信息科技有限公司 Power failure positioning method based on intelligent identification of customer service unstructured data
CN115238867B (en) * 2022-07-28 2023-06-13 广东电力信息科技有限公司 Power fault positioning method based on intelligent customer service unstructured data identification

Similar Documents

Publication Publication Date Title
CN105575405A (en) Double-microphone voice active detection method and voice acquisition device
CN102915742B (en) Single-channel monitor-free voice and noise separating method based on low-rank and sparse matrix decomposition
CN103117066B (en) Low signal to noise ratio voice endpoint detection method based on time-frequency instantaneous energy spectrum
CN103117067B (en) Voice endpoint detection method under low signal-to-noise ratio
CN109378013B (en) Voice noise reduction method
WO2021114733A1 (en) Noise suppression method for processing at different frequency bands, and system thereof
Bou-Ghazale et al. A robust endpoint detection of speech for noisy environments with application to automatic speech recognition
WO2014153800A1 (en) Voice recognition system
CN103594094B (en) Adaptive spectra subtraction real-time voice strengthens
CN101609686B (en) Objective assessment method based on voice enhancement algorithm subjective assessment
CN105023572A (en) Noised voice end point robustness detection method
CN103730126B (en) Noise suppressing method and noise silencer
CN103021405A (en) Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter
van Hout et al. A novel approach to soft-mask estimation and log-spectral enhancement for robust speech recognition
Gerkmann et al. Speech presence probability estimation based on temporal cepstrum smoothing
Wang et al. Oracle performance investigation of the ideal masks
Martín-Doñas et al. Dual-channel DNN-based speech enhancement for smartphones
CN109102823A (en) A kind of sound enhancement method based on subband spectrum entropy
Gupta et al. Speech enhancement using MMSE estimation and spectral subtraction methods
TWI396186B (en) Speech enhancement technique based on blind source separation for far-field noisy speech recognition
CN113744725A (en) Training method of voice endpoint detection model and voice noise reduction method
CN103839544A (en) Voice activity detection method and apparatus
Zhao et al. Adaptive wavelet packet thresholding with iterative Kalman filter for speech enhancement
Trawicki et al. Speech enhancement using Bayesian estimators of the perceptually-motivated short-time spectral amplitude (STSA) with Chi speech priors
Graf et al. Improved performance measures for voice activity detection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160511
