CN102347027A

CN102347027A - Double-microphone speech enhancer and speech enhancement method thereof

Info

Publication number: CN102347027A
Application number: CN2011101894436A
Authority: CN
Inventors: 叶利剑
Original assignee: AAC Acoustic Technologies Shenzhen Co Ltd; AAC Acoustic Technologies Changzhou Co Ltd; AAC Acoustic Technologies Nanjing Co Ltd
Current assignee: AAC Technologies Holdings Shenzhen Co Ltd; AAC Technologies Holdings Changzhou Co Ltd; AAC Technologies Holdings Nanjing Co Ltd; AAC Technologies Holdings Inc
Priority date: 2011-07-07
Filing date: 2011-07-07
Publication date: 2012-02-08

Abstract

The invention provides a double-microphone speech enhancer, which comprises a microphone array module and a signal processing chip electrically connected with the microphone array module. The microphone array module comprises a first microphone, a second microphone and analog-to-digital converters: the first microphone and the second microphone acquire time-domain audio signals, and the analog-to-digital converters convert the time-domain audio signals into digital audio signals. An adaptive filter module, a speech enhancement processing module and an output module are arranged in the signal processing chip. The invention also provides a speech enhancement method based on the double-microphone speech enhancer. Compared with the prior art, the double-microphone speech enhancer is easy to mount, imposes few constraints and has a good noise-elimination effect.

Description

Dual-microphone speech enhancement device and speech enhancement method thereof

[technical field]

The present invention relates to a speech enhancement device and a speech enhancement method thereof, and in particular to a dual-microphone speech enhancement device and a speech enhancement method thereof.

[background art]

With the development of wireless communications, the number of mobile phone users worldwide keeps growing. Users are no longer satisfied with merely being able to make calls; they expect high-quality communication. With the current development of mobile multimedia technology in particular, the speech quality of mobile phones has become even more important.

Because of the large amount of ambient noise, the signal-to-noise ratio (SNR) of the speech signal picked up by the microphone of a communication device such as a mobile phone is often too low. In high-noise environments such as a street with car traffic, the user has to raise the volume or the other party cannot hear. Speech enhancement is therefore needed to raise the SNR of the input speech and improve communication quality. Single-channel speech enhancement methods in the related art achieve only limited noise reduction and can cause considerable speech distortion, whereas a dual-microphone speech enhancement device can effectively improve the SNR. A dual-microphone speech enhancement device of the related art comprises microphones and multiple signal processing chips.

However, related-art dual-microphone speech enhancement devices place high demands on the matching (coherence) of the two microphones and require a strictly specified spacing between them. This structural constraint makes them difficult to install in portable communication devices such as mobile phones.

Therefore, it is necessary to provide a new dual-microphone speech enhancement device that solves the above problems.

[summary of the invention]

The technical problem to be solved by the present invention is to provide a dual-microphone speech enhancement device that is easy to install and imposes few constraints.

To solve the above technical problem, a dual-microphone speech enhancement device is provided, comprising a microphone array module and a signal processing chip electrically connected to the microphone array module. The microphone array module comprises a first microphone and a second microphone for capturing time-domain audio signals, and an analog-to-digital converter that converts the time-domain audio signals captured by the first microphone and the second microphone into digital audio signals. The signal processing chip is provided with:

An adaptive filtering module, comprising an adaptive filter, which receives the digital audio signals output by the analog-to-digital converter and performs adaptive filtering on them to obtain a preliminarily denoised speech signal;

A speech enhancement processing module, comprising a low-pass filter and a high-pass filter, which receives the preliminarily denoised speech signal output by the adaptive filtering module and performs speech enhancement processing on it to obtain a denoised speech signal with ambient noise removed;

An output module, which outputs the denoised speech signal with ambient noise removed.

Preferably, the first microphone and the second microphone are both omnidirectional microphones.

The present invention also provides a speech enhancement method for the dual-microphone speech enhancement device of claim 2, the method comprising the following steps:

Step S001: receive a first time-domain audio signal and a second time-domain audio signal from the environment through the first microphone and the second microphone respectively, and output them to the analog-to-digital converter, which converts them into a first digital audio signal and a second digital audio signal; the first digital audio signal is taken as the main digital audio signal x1 and the second digital audio signal as the reference signal x2;

Step S002: the adaptive filter of the adaptive filtering module receives the main digital audio signal x1 and filters it to obtain the main output signal; the difference between the reference signal x2 and the main output signal gives the error signal, which adaptively controls the coefficients of the adaptive filter, thereby yielding the preliminarily denoised speech signal:

Let the adaptive filter coefficients be W

and the main output signal be y; then:

y(n) = W(n)·x1(n)

The error signal e is then:

e(n) = x2(n) - y(n) = x2(n) - W(n)·x1(n)

The adaptive filter coefficients are updated as follows:

W(n+1) = W(n) + \frac{\mu}{\gamma + \|x_1(n)\|^{2}} \cdot x_1(n) \cdot e^{*}(n)

where μ is the convergence factor, with range 0 < μ < 0.5,

γ is a small constant that prevents the coefficients from diverging, γ = 0.001,

and e^{*}(n) is the complex conjugate of e(n);

Step S003: the speech enhancement processing module performs speech enhancement processing on the preliminarily denoised speech signal output by the adaptive filter, obtaining the denoised speech signal with ambient noise removed;

Step S004: the output module receives and outputs the denoised speech signal with ambient noise removed that is output by the speech enhancement processing module.

Preferably, the speech enhancement processing in step S003 comprises the following steps:

Step A001: pass the preliminarily denoised speech signal output by the adaptive filter through the high-pass filter as pre-emphasis, convert it into a frequency-domain signal with the short-time Fourier transform, divide the frequency-domain signal into a number of frequency bands, and compute and smooth the energy of each band;

The high-pass filter has the form:

H(z) = 1 - αz^{-1}

where α is a constant, α = 0.9325.

The short-time Fourier transform is:

X(f, m) = \frac{2}{M} \sum_{n=0}^{M-1} \mathrm{win}(n-m)\, x(n)\, e^{-2\pi j f n / M}

0 ≤ f ≤ M-1

where M is the length of the short-time Fourier transform, f is the frequency index, X is the frequency-domain signal, and x is the second down-sampled digital signal;

The Hamming window function is defined as follows:

The frequency-domain signal is divided into a number of bands k, and the energy Y_E(m, k) of each band is computed and smoothed as follows:

E(m, k) = |X(m, k)|^2, 0 ≤ k ≤ N-1

Y_E(m, k) = αY_E(m-1, k) + (1-α)E(m, k), 0 ≤ k ≤ N-1

where Y_E(m, k) is the smoothed energy of each band interval, m is the index of the current frame, k is the index of the current sub-band, α = 0.75 is the smoothing factor, N is the total number of selected bands, E(m, k) is the band energy, and X(m, k) is the frequency-domain signal of band k in frame m;

Step A002: compute the prior SNR estimate.

Set the initial noise energy estimate V(0, k) = 0 and the initial prior SNR estimate.

From the energy Y_E(m, k) of each band and the noise energy estimate V(m-1, k) obtained for the frame preceding the current frame, compute the posterior SNR of each band of the current frame,

then, from the prior SNR estimate of the previous frame,

obtain the prior SNR estimate of the current frame,

where Ŝ(m-1, k) denotes the final denoised speech signal obtained for the frame preceding the current frame, E{|V(m, k)|²} denotes the noise energy estimate, and α is the first smoothing factor;

Step A003: revise the prior SNR estimate with the weighted noise estimation method to obtain the revised prior SNR estimate.

The prior SNR estimate is multiplied by the weighting factor and divided by q_θ to obtain the revised prior SNR estimate. The weighting factor is computed as follows:

where γ₁ = 1.5, γ₂ = 200 and θ_z = 20;

Step A004: from the revised prior SNR estimate, compute the attenuation gain q(m, k) of each band:

where a is a constant that takes different values for different bands;

Step A005: multiply the frequency-domain signal X(m, k) of each band of the current frame by the attenuation gain of that band to obtain the enhanced speech signal of the band:

\hat{S}(m, k) = q(m, k) \cdot X(m, k), \quad 0 \le k \le N-1

Step A006: use the revised prior SNR estimate to decide whether the current frame is noise, and update the noise energy estimate of each band according to the decision:

If the current frame is judged to be speech, the noise energy estimate keeps the value of the previous frame, i.e. V(m, k) = V(m-1, k); otherwise the noise energy estimate V(m, k) is updated from the band energy, and this noise energy estimate V(m, k) is used in step A002 of the next frame to estimate the posterior SNR:

V(m, k) = μV(m-1, k) + (1-μ)·E(m, k)

where μ is the second smoothing factor;

Step A007: transform the enhanced speech signal back into a time-domain signal with the inverse short-time Fourier transform, and de-emphasize this time-domain signal with the low-pass filter to obtain the denoised speech signal with ambient noise removed. The low-pass filter has the form:

H(z) = 1 + αz^{-1}, where α is a constant, α = 0.9325.

Preferably, step A004 further comprises adjusting the attenuation gain by threshold decision;

First, set a threshold, an adjustment value q_mod and a floor value q_floor;

Second, using the revised prior SNR estimate of the current frame as the criterion, multiply every gain coefficient below a given threshold by the adjustment value q_mod, thereby further suppressing noise;

Then set every gain coefficient below a given threshold to the floor value q_floor, as follows:

where q_mod = 0.1, θ_G = 1.2 and q_floor = 0.01.

Compared with the related art, the dual-microphone speech enhancement device of the present invention imposes few constraints and is easy to install.

[description of drawings]

Fig. 1 is a flowchart of the speech enhancement method of the dual-microphone speech enhancement device of the present invention.

Fig. 2 is a structural block diagram of the dual-microphone speech enhancement device of the present invention.

Fig. 3 is a graph of the nonlinear weighting curve used in the speech enhancement method of the dual-microphone speech enhancement device of the present invention.

[embodiment]

The present invention is further described below with reference to the drawings and embodiments.

As shown in Fig. 2, a dual-microphone speech enhancement device 1 comprises a microphone array module 2 and a signal processing chip 3 electrically connected to the microphone array module 2. The microphone array module 2 comprises a first microphone 21 and a second microphone 22 for capturing time-domain audio signals, and an analog-to-digital converter 23 that converts the time-domain audio signals captured by the first microphone 21 and the second microphone 22 into digital audio signals. The signal processing chip 3 is provided with:

An adaptive filtering module 31, comprising an adaptive filter 311, which receives the digital audio signals output by the analog-to-digital converter and performs adaptive filtering on them to obtain a preliminarily denoised speech signal;

A speech enhancement processing module 32, comprising a low-pass filter 321 and a high-pass filter 322, which receives the preliminarily denoised speech signal output by the adaptive filtering module 31 and performs speech enhancement processing on it to obtain a denoised speech signal with ambient noise removed;

An output module 4, which outputs the denoised speech signal with ambient noise removed.

In this embodiment, the first microphone 21 and the second microphone 22 are both omnidirectional microphones.

The present invention also provides a speech enhancement method based on the dual-microphone speech enhancement device of the present invention. In detail, the basic steps of the noise reduction algorithm in this method are as follows:

1. Apply analog-to-digital conversion to the two time-domain signals received by the two microphones to obtain digital signals;

2. Process the two digital signals with the adaptive filtering of the adaptive filtering module to obtain a preliminarily denoised speech signal;

3. Apply framing, windowing and pre-emphasis to the preliminarily denoised speech signal, transform it to the frequency domain with the short-time Fourier transform and divide it into a number of bands, then compute and smooth the energy of each band to obtain the smoothed signal energy in each band;

4. From the signal energy and the noise energy estimate, compute the posterior SNR of each band of the current frame, and obtain the prior SNR estimate of the current frame from the prior SNR estimate of the previous frame;

5. Revise the resulting prior SNR estimate with the weighted noise estimation method;

6. From the revised prior SNR estimate, compute the attenuation gain factor of each band;

7. Adjust the gain coefficients by threshold decision;

8. Apply the resulting attenuation gains to the band-divided signal spectrum;

9. Use the revised prior SNR estimate to decide whether the current frame is noise;

10. Update the noise estimate of each band according to the noise decision;

11. Transform the processed frequency-domain signal back to the time domain, and de-emphasize the time-domain signal to obtain the output signal.

The speech enhancement method is described below with a concrete example. This embodiment takes a mobile phone as an example, with the dual-microphone speech enhancement device of the present invention installed in the phone; see Fig. 1 and Fig. 3:

Step S001: receive a first time-domain audio signal and a second time-domain audio signal from the environment through the first microphone 21 and the second microphone 22 respectively, and output them to the analog-to-digital converter 23, which converts them into a first digital audio signal and a second digital audio signal; the first digital audio signal is taken as the main digital audio signal x1 and the second digital audio signal as the reference signal x2. Specifically, the first microphone 21 and the second microphone 22 are two omnidirectional microphones, and the digital audio signals output by the analog-to-digital converter 23 have a sampling rate of 16 kHz and a resolution of 16 bits. During a call, the first microphone 21 is closer to the mouth and captures the noisy speech signal, denoted x1; the second microphone 22 is farther from the mouth and captures the ambient noise signal, denoted x2. Because the first microphone 21 and the second microphone 22 are relatively far apart, the speech from the nearby mouth differs greatly between the two signals, whereas the ambient noise is a distant signal and is highly consistent between them. Therefore x1 is used as the main digital audio signal and x2 as the reference signal.

Step S002: the adaptive filter 311 of the adaptive filtering module 31 receives the main digital audio signal x1 and filters it to obtain the main output signal; the difference between the reference signal and the main output signal gives the error signal, which adaptively controls the coefficients of the adaptive filter 311, thereby yielding the preliminarily denoised speech signal:

Let the adaptive filter coefficients be W

and the main output signal be y; then:

y(n) = W(n)·x1(n)

The error signal e is then:

e(n) = x2(n) - y(n) = x2(n) - W(n)·x1(n)

The adaptive filter coefficients are updated as follows:

W(n+1) = W(n) + \frac{\mu}{\gamma + \|x_1(n)\|^{2}} \cdot x_1(n) \cdot e^{*}(n)

where μ is the convergence factor, with range 0 < μ < 0.5,

γ is a small constant that prevents the coefficients from diverging, γ = 0.001,

and e^{*}(n) is the complex conjugate of e(n);
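The update in step S002 has the form of a normalized LMS (NLMS) filter. The sketch below implements the three formulas literally in plain Python, assuming real-valued signals (so e*(n) = e(n)) and treating W(n) as an FIR weight vector applied to the most recent samples of x1. The filter length `taps` and the step size `mu = 0.1` are illustrative assumptions (the text only requires 0 < μ < 0.5), and `nlms` is a hypothetical helper name. It returns both y(n) and e(n); step A001 below refers to y(n) as the preliminarily denoised signal.

```python
def nlms(x1, x2, taps=16, mu=0.1, gamma=0.001):
    """Normalized LMS sketch of step S002 for real-valued signals.

    y(n)   = W(n) . x1(n)        (x1(n): the last `taps` samples of x1)
    e(n)   = x2(n) - y(n)
    W(n+1) = W(n) + mu / (gamma + ||x1(n)||^2) * x1(n) * e(n)
    For real signals the conjugate e*(n) equals e(n).
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # tap-delay line, newest sample first
    y_out, e_out = [], []
    for a, b in zip(x1, x2):
        buf = [a] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))
        e = b - y
        # gamma keeps the normalization finite when the input is near zero
        step = mu * e / (gamma + sum(xi * xi for xi in buf))
        w = [wi + step * xi for wi, xi in zip(w, buf)]
        y_out.append(y)
        e_out.append(e)
    return y_out, e_out, w
```

For example, with `x2 = [0.5 * v for v in x1]` and `taps=1`, the single weight converges towards 0.5 and e(n) towards zero.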

Step S003: the speech enhancement processing module 32 performs speech enhancement processing on the preliminarily denoised speech signal output by the adaptive filter 311, obtaining the denoised speech signal with ambient noise removed.

Step S004: the output module 4 receives and outputs the denoised speech signal with ambient noise removed that is output by the speech enhancement processing module 32.

Preferably, in this embodiment the speech enhancement processing in step S003 comprises the following steps:

Because speech is short-time stationary, it is feasible to process the speech signal frame by frame; however, framing introduces discontinuities at the frame boundaries, which cause spectral leakage. The frame signal is therefore processed with the short-time Fourier transform (STFT), which can be understood as windowing each frame before taking the Fourier transform. The purpose of the window function is to reduce the spectral leakage caused by the discontinuities at the frame boundaries when the STFT is computed. This embodiment uses a Hamming window whose length equals the frame length of 256 points; the Hamming window effectively reduces the oscillation of the Gibbs phenomenon. Specifically:

Step A001: divide the preliminarily denoised speech signal y(n) into frames, i.e. split y(n) frame by frame into a number of preliminarily denoised speech signal units. Each unit consists of sampling points. In the present invention the digital microphone sampling rate is 3.072 MHz with 64× oversampling. For short-time spectral analysis the frame length is generally set between 10 and 35 ms; this embodiment uses 16 ms frames, i.e. each frame of the noisy signal contains 256 sampling points. Every frame has the same length, which is 256 samples in the present invention.

To prevent blocking effects between the preliminarily denoised speech signal units of two adjacent frames, the units of adjacent frames overlap during framing: D of the samples in the current frame are data from the previous frame. The overlap is described as follows:

s(n) = d(m, D+n), 0 ≤ n < L

where xi denotes the input preliminarily denoised speech signal, and i = 1, 2 denote the two signal channels

d(m, n) = d(m-1, L+n), 0 ≤ n < D

where d denotes the 256 sampled points of the current frame. Since every frame is 256 samples long and the overlap ratio is 75%, the overlapping part contains D = 192 samples, and the first samples of the units of adjacent frames are L = 256 - 192 = 64 samples apart.

In the present invention, adjacent frames may overlap by 50% to 75%. This embodiment uses a 75% overlap, i.e. the first 75% (192 points) of the current frame coincide with the last 75% (192 points) of the previous frame.
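The framing rule above (frame length 256, hop L = 64, overlap D = 192) can be sketched as follows. `split_frames` is a hypothetical helper name, and dropping a trailing partial frame is an assumption the text does not address.

```python
FRAME_LEN = 256              # M: samples per frame (16 ms at 16 kHz)
HOP = 64                     # L: distance between the first samples of adjacent frames
OVERLAP = FRAME_LEN - HOP    # D: samples shared with the previous frame (75 %)

def split_frames(y, frame_len=FRAME_LEN, hop=HOP):
    """Split y(n) into overlapping frames d(m, n), 0 <= n < frame_len.

    Frame m starts at sample m * hop, so d(m, n) == d(m - 1, hop + n)
    for 0 <= n < frame_len - hop. A trailing partial frame is dropped.
    """
    return [y[start:start + frame_len]
            for start in range(0, len(y) - frame_len + 1, hop)]
```

With 1000 input samples this yields 12 full frames.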

The framed preliminarily denoised speech signal y(n) output by the adaptive filter is passed through the high-pass filter as pre-emphasis, yielding the second down-sampled digital signal.

The high-pass filter has the form:

H(z) = 1 - αz^{-1}, where the constant α = 0.9325
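A minimal sketch of the pre-emphasis filter H(z) = 1 - αz^{-1} with α = 0.9325, written as the difference equation out[n] = x[n] - α·x[n-1]. Carrying the last sample of the previous block through `prev` is an assumption of this sketch; `pre_emphasis` is a hypothetical name.

```python
ALPHA_PRE = 0.9325  # pre-emphasis constant alpha from the text

def pre_emphasis(x, alpha=ALPHA_PRE, prev=0.0):
    """High-pass pre-emphasis H(z) = 1 - alpha*z^-1: out[n] = x[n] - alpha*x[n-1].

    `prev` is the sample preceding x[0]; passing the last sample of the
    previous block keeps the filter continuous across blocks.
    """
    out = []
    for sample in x:
        out.append(sample - alpha * prev)
        prev = sample
    return out
```

A constant input is strongly attenuated (1 - 0.9325 = 0.0675), while a sign change between samples is boosted, which is the intended high-frequency emphasis.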

The second down-sampled digital signal is transformed into a frequency-domain signal by the short-time Fourier transform; the frequency-domain signal is divided into a number of bands, and the energy of each band is computed and smoothed.

The short-time Fourier transform is:

X(f, m) = \frac{2}{M} \sum_{n=0}^{M-1} \mathrm{win}(n-m)\, x(n)\, e^{-2\pi j f n / M}

0 ≤ f ≤ M-1

The Hamming window function is defined as follows:

The short-time Fourier transform of y is then:

Y(f, m) = \frac{2}{M} \sum_{n=0}^{M-1} \mathrm{win}(n-m)\, y(n)\, e^{-2\pi j f n / M}

0 ≤ f ≤ M-1

where M = 256 is the length of the short-time Fourier transform and m denotes the m-th frame.

In this way the preliminarily denoised speech signal y of the current frame is transformed from the time domain into the frequency-domain signal Y.
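The per-frame transform can be sketched as a windowed DFT with the 2/M scaling used in the formula above. The text's own Hamming window formula is not preserved, so the sketch assumes the common symmetric form w(n) = 0.54 - 0.46·cos(2πn/(M-1)); the helper names are hypothetical, and the direct O(M²) sum only keeps the example dependency-free (an FFT gives the same values faster).

```python
import cmath
import math

def hamming(M):
    """Symmetric Hamming window (assumed form): 0.54 - 0.46*cos(2*pi*n/(M-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]

def stft_frame(frame):
    """Windowed, 2/M-scaled DFT of one frame:
    Y(f) = (2/M) * sum_n win(n) * y(n) * exp(-2j*pi*f*n/M), 0 <= f <= M-1.
    """
    M = len(frame)
    win = hamming(M)
    return [2.0 / M * sum(win[n] * frame[n] * cmath.exp(-2j * math.pi * f * n / M)
                          for n in range(M))
            for f in range(M)]
```

For a 256-sample cosine at bin 10, the largest magnitude in the first half of the spectrum appears at `Y[10]`, with a value close to the window mean of about 0.54.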

After transformation to the frequency domain, the preliminarily denoised speech signal contains both speech and noise, and each frame is divided into a number of frequency bands: the frequency-domain signal Y is divided into bands k, and the energy of each band is computed.

Band division is applied to the frequency-domain signal below 8 kHz, and all subsequent processing is carried out within each band. This reduces computational complexity and also allows different bands to be processed differently, giving better results.

In the present invention the preliminarily denoised speech signal is divided into 30 frequency bands in total, as listed in Table 1.

Table 1: Division into 30 frequency bands

Band number	Start frequency (Hz)	Cutoff frequency (Hz)
1	62.5	93.75
2	125	156.25
3	187.5	218.75
4	250	281.25
5	312.5	343.75
6	375	406.25
7	437.5	468.75
8	500	531.25
9	562.5	593.75
10	625	656.25
11	687.5	718.75
12	750	781.25
13	812.5	906.25
14	937.5	1062.5
15	1093.75	1250
16	1281.25	1468.75
17	1500	1718.75
18	1750	2000
19	2031.25	2312.5
20	2343.75	2687.5
21	2718.75	3125
22	3156.25	3687.5
23	3718.75	3968.75
24	4000	4312.5
25	4343.75	4687.5
26	4718.75	5156.25
27	5187.5	5718.75
28	5750	6250
29	6281.25	6875
30	6906.25	7968.75

The energy of each band is computed and smoothed as follows:

E(m, k) = |X(m, k)|^2, 0 ≤ k ≤ N-1

Y_E(m, k) = αY_E(m-1, k) + (1-α)E(m, k), 0 ≤ k ≤ N-1

where Y_E(m, k) is the smoothed energy of each band interval, m is the index of the current frame, k is the index of the current sub-band, α = 0.75 is the smoothing factor, N is the total number of selected bands (30 in this embodiment), E(m, k) is the band energy, and X(m, k) is the frequency-domain signal of band k in frame m.
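The band energies and their recursive smoothing can be sketched as below. Band edges come from Table 1; assigning DFT bin f to a band by its centre frequency f·16000/256 (62.5 Hz spacing) is an assumption of this sketch, as is summing the bin powers inside a band to form E(m, k). Either reading of the two band edges that fall off the 31.25 Hz grid (5156.5 vs 5156.25 Hz, 6936.25 vs 6906.25 Hz) yields the same bin assignment. Function names are hypothetical.

```python
# Table 1 band edges in Hz (start, stop), bands 1..30
BANDS_HZ = [
    (62.5, 93.75), (125, 156.25), (187.5, 218.75), (250, 281.25), (312.5, 343.75),
    (375, 406.25), (437.5, 468.75), (500, 531.25), (562.5, 593.75), (625, 656.25),
    (687.5, 718.75), (750, 781.25), (812.5, 906.25), (937.5, 1062.5), (1093.75, 1250),
    (1281.25, 1468.75), (1500, 1718.75), (1750, 2000), (2031.25, 2312.5), (2343.75, 2687.5),
    (2718.75, 3125), (3156.25, 3687.5), (3718.75, 3968.75), (4000, 4312.5), (4343.75, 4687.5),
    (4718.75, 5156.25), (5187.5, 5718.75), (5750, 6250), (6281.25, 6875), (6906.25, 7968.75),
]
FS = 16000      # sample rate in Hz (step S001)
M = 256         # DFT length
ALPHA_E = 0.75  # energy smoothing factor alpha

def band_bins(fs=FS, M=M):
    """List, per band, the DFT bins whose centre frequency f*fs/M lies in [start, stop]."""
    return [[f for f in range(M // 2 + 1) if lo <= f * fs / M <= hi]
            for lo, hi in BANDS_HZ]

def band_energy(X, bins):
    """E(m, k): summed power |X(m, f)|^2 over the bins of band k."""
    return [sum(abs(X[f]) ** 2 for f in b) for b in bins]

def smooth_energy(E, Y_prev, alpha=ALPHA_E):
    """Y_E(m, k) = alpha * Y_E(m-1, k) + (1 - alpha) * E(m, k)."""
    return [alpha * yp + (1 - alpha) * e for yp, e in zip(Y_prev, E)]
```

Every band receives at least one bin under this mapping; band 1 gets bin 1 (62.5 Hz) and band 30 gets bins 111 to 127.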

Step A002: compute the prior SNR estimate.

First, compute the posterior SNR of the current frame, as follows:

Set the initial noise energy estimate V(0, k) = 0 and the initial prior SNR estimate.

From the energy Y_E(m, k) of each band and the noise energy estimate V(m-1, k) obtained for the previous frame, compute the posterior SNR of each band of the current frame:

{SNR}_{post}(m, k) = \frac{Y_{E}(m, k)}{V(m-1, k)}

Then compute the prior SNR estimate of the current frame with the Ephraim-Malah prior SNR estimation formula,

where Ŝ(m-1, k) denotes the final denoised speech signal obtained for the previous frame,

E{|V(m, k)|²} denotes the noise energy estimate, and α is the first smoothing factor;
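The prior SNR formula itself is not preserved here. The description (previous frame's denoised speech, noise energy estimate, a first smoothing factor α) matches the widely used decision-directed estimator of Ephraim and Malah, so the sketch below assumes that form. The value α = 0.98 is a common literature choice, not one given in the text, and the `EPS` guard against the initial V(0, k) = 0 is likewise an assumption; function names are hypothetical.

```python
EPS = 1e-12  # assumed guard: the text starts from V(0, k) = 0

def posterior_snr(Y_E, V_prev):
    """SNR_post(m, k) = Y_E(m, k) / V(m-1, k), one value per band."""
    return [y / max(v, EPS) for y, v in zip(Y_E, V_prev)]

def prior_snr_dd(snr_post, S_prev_power, V_prev, alpha=0.98):
    """Decision-directed prior SNR (assumed Ephraim-Malah form):
    xi(m, k) = alpha * |S_hat(m-1, k)|^2 / V(m-1, k)
               + (1 - alpha) * max(SNR_post(m, k) - 1, 0)
    """
    return [alpha * s / max(v, EPS) + (1 - alpha) * max(g - 1.0, 0.0)
            for g, s, v in zip(snr_post, S_prev_power, V_prev)]
```

The `max(..., 0)` term keeps the instantaneous part non-negative when the band energy falls below the noise estimate.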

Step A003: revise the prior SNR estimate with the weighted noise estimation method to obtain the revised prior SNR estimate.

The Ephraim-Malah prior SNR estimate is prone to overestimation: at high SNR the resulting SNR estimate is too high, which distorts the enhanced speech. This embodiment therefore revises the prior SNR estimate with the weighted noise estimation method.

The prior SNR estimate is multiplied by the weighting factor and divided by q_θ to obtain the revised prior SNR estimate.

The weighting factor is computed as follows:

where γ₁ = 1.5, γ₂ = 200 and θ_z = 20;

Step A004: from the revised prior SNR estimate, compute the attenuation gain q(m, k) of each band:

The attenuation gain q(m, k) of each band is obtained by spectral subtraction, deriving the attenuation factor from the prior SNR estimate with the following formula:

where α is a constant that takes different values for different bands.

Because noise is concentrated mainly in the lower bands, α takes different values for the low-to-middle bands and the high bands:

For bands k ≤ 14, i.e. signals below 1.2 kHz in the present invention, α = 8.89;

For bands 14 < k ≤ 18, i.e. signals between 1.2 and 2 kHz, α = 6.44;

For bands 18 < k ≤ 23, i.e. signals between 2 and 4 kHz, α = 6.21;

For bands k > 23, i.e. signals above 4 kHz, α = 5.37.

Preferably, in this embodiment the attenuation gain is adjusted by threshold decision:

First, set a threshold, an adjustment value q_mod and a floor value q_floor;

Second, using the revised prior SNR estimate of the current frame as the criterion, multiply every gain coefficient below a given threshold by the adjustment value q_mod, thereby further suppressing noise;

Then set every gain coefficient below a given threshold to the floor value q_floor, which avoids a certain amount of speech distortion. The method is as follows:

where q_mod = 0.1, θ_G = 1.2 and q_floor = 0.01.

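Step A005: multiply the frequency-domain signal X(m, k) of each band of the current frame by the attenuation gain of that band to obtain the enhanced speech signal of the band:

\hat{S}(m, k) = q(m, k) \cdot X(m, k), \quad 0 \le k \le N-1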
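Steps A004 and A005 can be partly sketched. The spectral-subtraction gain formula is not preserved, so it is not implemented; the band-dependent constants below are taken from the text. The threshold rule is only one reading of the description and is an assumption: when the revised prior SNR is below θ_G the gain is multiplied by q_mod, and the result is then never allowed below q_floor. All function names are hypothetical.

```python
Q_MOD = 0.1     # adjustment value q_mod
THETA_G = 1.2   # threshold theta_G on the revised prior SNR
Q_FLOOR = 0.01  # gain floor q_floor

def gain_constant(k):
    """Band-dependent constant alpha of the step A004 gain formula (bands k = 1..30)."""
    if k <= 14:
        return 8.89   # below 1.2 kHz
    if k <= 18:
        return 6.44   # 1.2 to 2 kHz
    if k <= 23:
        return 6.21   # 2 to 4 kHz
    return 5.37       # above 4 kHz

def adjust_gain(q, xi_mod, q_mod=Q_MOD, theta_g=THETA_G, q_floor=Q_FLOOR):
    """Assumed reading of the threshold rule: attenuate further when the revised
    prior SNR is below theta_G, then clamp the gain to at least q_floor."""
    if xi_mod < theta_g:
        q *= q_mod
    return max(q, q_floor)

def apply_gain(X_band, q):
    """Step A005: S_hat(m, k) = q(m, k) * X(m, k) for every bin of band k."""
    return [q * x for x in X_band]
```

The floor keeps heavily suppressed bands from going fully silent, which is what the text credits with avoiding some speech distortion.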

Step A006: use the revised prior SNR estimate to decide whether the current frame is noise, and update the noise energy estimate of each band according to the decision, specifically:

In the present invention, the noise energy of each band is judged and updated with a voice activity detection (VAD) method based on the prior SNR. First, determine whether the current frame is a pure noise signal:

VAD(m) = \sum_{k=1}^{N} \left[ \frac{\gamma(m, k)\, \zeta(m, k)}{1 + \zeta(m, k)} - \lg\left(1 + \zeta(m, k)\right) \right]

where γ(m, k) = min[SNR_post(m, k), 40],

\zeta(m, k) = \max\left[\widehat{SNR}_{prior}(m, k), 10^{-25}\right]

VAD(m) is compared with the threshold and the noise estimate is updated as follows:

V(m, k) = \begin{cases} \mu V(m-1, k) + (1-\mu) E(m, k), & VAD(m) < \eta \\ V(m-1, k), & VAD(m) \geq \eta \end{cases}

where η is the noise-update decision factor; η = 0.01 in the present invention.

μ is a smoothing factor; μ = 0.9 here.

If the current frame is judged to be speech (VAD(m) ≥ η), the noise energy estimate keeps the value of the previous frame, V(m, k) = V(m-1, k); otherwise the noise energy estimate V(m, k) is updated from the band energy, and this noise energy estimate V(m, k) is used in step A002 of the next frame to estimate the posterior SNR:

V(m, k) = μV(m-1, k) + (1-μ)E(m, k)

where μ denotes the second smoothing factor.
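The VAD statistic and the piecewise noise update above translate directly into code. The sketch follows the printed formula: lg is taken as log10, the prior-SNR floor is the printed 10^{-25}, and the revised prior SNR is assumed to be supplied by the caller. Function names are hypothetical.

```python
import math

ETA = 0.01        # noise-update decision threshold eta
MU_V = 0.9        # smoothing factor mu for the noise estimate
GAMMA_MAX = 40.0  # cap on the posterior SNR
XI_MIN = 1e-25    # floor on the prior SNR, as printed in the text

def vad_statistic(snr_post, xi_mod):
    """VAD(m) = sum_k [gamma*zeta/(1+zeta) - lg(1+zeta)] over all bands."""
    total = 0.0
    for post, xi in zip(snr_post, xi_mod):
        g = min(post, GAMMA_MAX)
        z = max(xi, XI_MIN)
        total += g * z / (1.0 + z) - math.log10(1.0 + z)
    return total

def update_noise(V_prev, E, snr_post, xi_mod, eta=ETA, mu=MU_V):
    """Return (V(m, k) for all bands, is_speech) following the piecewise rule."""
    if vad_statistic(snr_post, xi_mod) < eta:
        # Noise frame: recursively track the band energy.
        return [mu * v + (1 - mu) * e for v, e in zip(V_prev, E)], False
    # Speech frame: hold the previous noise estimate.
    return list(V_prev), True
```

A band with posterior SNR 10 and prior SNR 9 contributes 9 - lg 10 = 8 to VAD(m), well above η, so the noise estimate is held.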

Steps A 007: will strengthen the back voice signal

Be transformed into time-domain signal with Short Time Fourier Transform, with this time-domain signal process low-pass filter, H (z)=1+ α z ^-1The processing of postemphasising obtains removing voice signal and output after the de-noising of neighbourhood noise;

At first, (FFT) transforms to time domain to the speech manual of frequency domain through inverse fast fourier transform, the time domain voice signal after being enhanced.

The conversion of time domain realizes with general contrary discrete Fourier transform (IDFT).

s (m, n) - = \frac{1}{2} * Σ_{n = 0}^{M - 1} \hat{S} (f) e^{j 2 πnf / M}

0≤k≤M-1

Wherein, M=256 is frame length, and s is the voice signal that transforms to after full range band after the time domain strengthens.

Secondly, to the processing of postemphasising of the time domain voice signal after strengthening.

Handle on the contrary with above-mentioned pre-emphasis, the time domain voice signal after will strengthening here farthest is reduced into original signal with it through a low-pass filter.The frequency response of wave filter is following:

H (z)=1+ α z ^-1, α=0.9 wherein.

Third, the overlapping portions of adjacent frames of the enhanced speech signal are added together.

The overlap-add can be expressed as:

s'(n) = \begin{cases} s(m, n) + s(m - 1, n + L), & 0 \leq n < M - L \\ s(m, n), & M - L \leq n < M \end{cases}

where L = 64 is the distance between the starting points of adjacent frames, M = 256 is the frame length, and s'(n) is the de-noised speech signal with ambient noise removed.
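The overlap-add rule can be transcribed directly. This sketch follows the formula literally, adding only the immediately preceding frame, and assumes each frame is a Python list of length M; the function name `overlap_add` is hypothetical. Note that with L = 64 and M = 256 each output sample is covered by four consecutive frames, so a general overlap-add would accumulate all of them; the sketch does not attempt that.

```python
def overlap_add(curr, prev, hop=64):
    """Compute s'(n) as written in the text.

    curr : s(m, n),   n = 0..M-1 (current enhanced frame)
    prev : s(m-1, n), n = 0..M-1 (previous enhanced frame)
    hop  : L, distance between the starts of adjacent frames
    """
    M = len(curr)
    # Overlap region 0 <= n < M - L gets the shifted tail of the previous frame.
    return [curr[n] + prev[n + hop] if n < M - hop else curr[n] for n in range(M)]
```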

Compared with the related art, the dual-microphone speech enhancement device and speech enhancement method of the present invention use adaptive filtering to suppress ambient noise and then further attenuate all types of background noise with a speech enhancement algorithm. The two microphones need not be specially matched, and the spacing between them can be adjusted freely. As a result, the dual-microphone speech enhancement device of the present invention can be installed and used easily in a wide range of handheld communication devices, and its noise-reduction effect is good.

The above describes only embodiments of the present invention. It should be noted that those of ordinary skill in the art may make improvements without departing from the inventive concept, and all such improvements fall within the protection scope of the present invention.

Claims

1. A dual-microphone speech enhancement device, comprising a microphone array module and a signal processing chip electrically connected to the microphone array module, the microphone array module comprising a first microphone and a second microphone for acquiring time-domain audio signals, and analog-to-digital converters that convert the time-domain audio signals acquired by the first microphone and the second microphone into digital audio signals, characterized in that the signal processing chip is provided with:

2. The dual-microphone speech enhancement device according to claim 1, characterized in that the first microphone and the second microphone are both omnidirectional microphones.

3. A speech enhancement method for the dual-microphone speech enhancement device according to claim 2, characterized in that the method comprises the following steps:

Let the adaptive filter coefficient be W and the main output signal be y; then:

y(n) = W(n) · x1(n)

The error signal e is then:

e(n) = x2(n) - y(n) = x2(n) - W(n) · x1(n)

The adaptive filter coefficients are updated as follows:

W(n + 1) = W(n) + \frac{\mu}{\gamma + \|x_1(n)\|^{2}} \cdot x_1(n) \cdot e^{*}(n)

where μ is the convergence factor, with range 0 < μ < 0.5;

γ is a small constant that prevents the coefficients from diverging, γ = 0.001;

e^*(n) is the complex conjugate of e(n);
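The coefficient update above has the form of a normalized LMS (NLMS) filter. A minimal real-valued sketch follows, assuming W(n) is a vector of `taps` coefficients, x1(n) is the vector of the most recent `taps` samples from the first microphone, and x2(n) is the current sample from the second microphone; the function name `nlms` and the default of 16 taps are illustrative assumptions, not values from the text. For real signals e^*(n) = e(n).

```python
def nlms(x1, x2, taps=16, mu=0.1, gamma=0.001):
    """Real-valued NLMS following W(n+1) = W(n) + mu/(gamma + ||x1||^2) * x1 * e.

    x1    : samples from the first microphone (filter input)
    x2    : samples from the second microphone (desired signal)
    mu    : convergence factor, 0 < mu < 0.5 per the text
    gamma : regularizer that keeps the step bounded, 0.001 per the text
    Returns the error signal e(n) = x2(n) - W(n) . x1(n) and the final weights.
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # tap vector: x1(n), x1(n-1), ..., x1(n-taps+1)
    errors = []
    for a, d in zip(x1, x2):
        buf = [a] + buf[:-1]  # shift the newest sample into the tap vector
        y = sum(wi * xi for wi, xi in zip(w, buf))  # y(n) = W(n) . x1(n)
        e = d - y  # e(n) = x2(n) - y(n)
        step = mu / (gamma + sum(xi * xi for xi in buf))
        w = [wi + step * xi * e for wi, xi in zip(w, buf)]
        errors.append(e)
    return errors, w
```

As a check, if x2 is simply 0.5 times x1, the first weight should converge to 0.5 and the error toward zero.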

4. The speech enhancement method for the dual-microphone speech enhancement device according to claim 3, characterized in that the speech enhancement processing in step S003 comprises the following steps:

The high-pass filter has the form:

H(z) = 1 - αz^{-1}

where α is a constant, α = 0.9325.
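In the time domain the pre-emphasis filter H(z) = 1 - αz^{-1} is y(n) = x(n) - αx(n - 1). A minimal sketch, assuming samples arrive as a list of floats; the function name `pre_emphasis` and the `prev` argument (the last sample of the preceding block) are illustrative assumptions.

```python
def pre_emphasis(samples, alpha=0.9325, prev=0.0):
    """Apply H(z) = 1 - alpha * z^-1, i.e. y(n) = x(n) - alpha * x(n-1).

    samples : input block x(n)
    alpha   : filter constant (0.9325 in the claims)
    prev    : x(-1), the last input sample of the previous block
    """
    out = []
    for x in samples:
        out.append(x - alpha * prev)
        prev = x  # remember x(n) for the next output sample
    return out
```

A constant input is almost cancelled after the first sample, which is the intended boost of high frequencies relative to low ones. Cascading this filter with a de-emphasis filter 1 + αz^{-1} of the same α gives 1 - α²z^{-2}, so the pair approximately, rather than exactly, restores the original spectrum.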

The short-time Fourier transform is:

X(m, k) = \frac{2}{M} \sum_{n=0}^{M-1} win(n) \, x(m, n) \, e^{-j 2\pi k n / M}, \quad 0 \le k \le M - 1

where x(m, n) is the n-th sample of frame m.
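One frame of the short-time Fourier transform can be sketched as below. This is a minimal sketch under stated assumptions: the exact Hamming window definition from the claims is not reproduced in the text, so the common symmetric form 0.54 - 0.46 cos(2πn/(M - 1)) is assumed, and frame m is assumed to start at sample m·L with the frame shift L = 64 given in the description. The function names `hamming` and `stft_frame` are hypothetical, and a direct O(M²) DFT stands in for an FFT.

```python
import cmath
import math


def hamming(M):
    # Assumed standard symmetric Hamming window of length M (M >= 2).
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]


def stft_frame(x, m, M=256, hop=64):
    """X(m, k) = (2/M) * sum_n win(n) * x(m*hop + n) * exp(-j*2*pi*k*n/M).

    x   : full input signal (needs at least m*hop + M samples)
    m   : frame index
    M   : frame length (256 in the text)
    hop : frame shift L (64 in the text)
    """
    win = hamming(M)
    frame = [win[n] * x[m * hop + n] for n in range(M)]
    return [
        (2.0 / M) * sum(frame[n] * cmath.exp(-2j * math.pi * k * n / M) for n in range(M))
        for k in range(M)
    ]
```

For a real input signal the returned spectrum is conjugate-symmetric, X(m, k) = X(m, M - k)*, so only the first M/2 + 1 bins carry independent information.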

The Hamming window function is defined as follows:

E(m, k) = |X(m, k)|², 0 ≤ k ≤ N - 1

Y_E(m, k) = αY_E(m - 1, k) + (1 - α)E(m, k), 0 ≤ k ≤ N - 1
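The band energy and its recursive smoothing can be sketched per frame as follows. This is a minimal sketch, assuming X(m, k) and Y_E(m - 1, k) are lists indexed by band k; the smoothing constant α is passed in explicitly because its value for this step is not given in this part of the text, and the function name `smoothed_band_energy` is hypothetical.

```python
def smoothed_band_energy(X, y_prev, alpha):
    """Return (E, Y_E) for one frame.

    X      : X(m, k), complex spectrum values for k = 0..N-1
    y_prev : Y_E(m-1, k), smoothed energy from the previous frame
    alpha  : smoothing constant (value not specified here)
    """
    E = [abs(x) ** 2 for x in X]  # E(m, k) = |X(m, k)|^2
    # Y_E(m, k) = alpha * Y_E(m-1, k) + (1 - alpha) * E(m, k)
    Y = [alpha * yp + (1.0 - alpha) * e for yp, e in zip(y_prev, E)]
    return E, Y
```

A larger α makes Y_E follow the frame energies more slowly, trading responsiveness for a steadier estimate.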

Step A002: calculate the a priori SNR estimate

with

and, from the a priori SNR estimate of the previous frame,

obtain the a priori SNR estimate of the current frame;

is corrected to obtain the corrected a priori SNR estimate

A priori SNR estimate

The weighting factor is calculated as follows:

where γ₁ = 1.5, γ₂ = 200, and θ_z = 20;

Step A004: according to the corrected a priori SNR estimate,

calculate the attenuation gain q(m, k) of each frequency band:

where a is a constant that takes a different value for each frequency band;

the enhanced spectrum of this frequency band is then:

\hat{S}(m, k) = q(m, k) \cdot X(m, k), \quad 0 \le k \le N - 1

Step A006: using the corrected a priori SNR estimate,

V(m, k) = μV(m - 1, k) + (1 - μ) · E(m, k)

where μ denotes the second smoothing factor;

Step A007: the enhanced speech signal

H(z) = 1 + αz^{-1}, where α is a constant, α = 0.9325.

5. The speech enhancement method for the dual-microphone speech enhancement device according to claim 4, characterized in that step A004 further comprises adjusting the attenuation gain by a threshold decision;

where q_mod = 0.1, θ_G = 1.2, and q_floor = 0.01.