CN102347027A - Double-microphone speech enhancer and speech enhancement method thereof - Google Patents
Double-microphone speech enhancer and speech enhancement method thereof
- Publication number: CN102347027A
- Application number: CN2011101894436A
- Authority: CN (China)
- Prior art keywords: signal, noise, microphone, frequency band, voice
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention provides a dual-microphone speech enhancer comprising a microphone array module and a signal processing chip electrically connected to the microphone array module. The microphone array module comprises a first microphone, a second microphone, and analog-to-digital converters. The first and second microphones acquire time-domain audio signals, and the analog-to-digital converters convert these signals into digital audio signals. An adaptive filter module, a speech enhancement processing module, and an output module are arranged in the signal processing chip. The invention also provides a speech enhancement method based on the dual-microphone speech enhancer. Compared with the prior art, the dual-microphone speech enhancer is easy to mount, imposes few structural limitations, and has a good noise-eliminating effect.
Description
[Technical field]
The present invention relates to a speech enhancement device and a speech enhancement method thereof, and in particular to a dual-microphone speech enhancement device and a speech enhancement method thereof.
[Background]
With the development of wireless communications, the number of mobile phone users worldwide keeps growing. Users are no longer satisfied with merely being able to make calls; they also expect high-quality communication. With the current development of mobile multimedia technology in particular, the speech quality of mobile phones has become more important.
Because of the large amount of environmental noise, the signal-to-noise ratio (SNR) of the speech signal collected by the microphone of a communication device such as a mobile phone is generally not high enough. In high-noise environments in particular, such as a street with car traffic, the speaker has to raise the volume to be heard by the other party. The SNR of the input speech therefore needs to be raised through speech enhancement in order to improve communication quality. Single-channel speech enhancement methods of the related art offer limited noise reduction and can cause considerable speech distortion, whereas a dual-microphone speech enhancement device can improve the SNR effectively. The dual-microphone speech enhancement device of the related art comprises microphones and a multi-channel signal processing chip.
However, the dual-microphone speech enhancement device of the related art places high demands on the consistency of the two microphones and strict requirements on the distance between them. This structure is highly restrictive and makes the device difficult to install in portable communication devices such as mobile phones.
It is therefore necessary to propose a new dual-microphone speech enhancement device that solves the above problems.
[Summary of the invention]
The technical problem to be solved by the present invention is to provide a dual-microphone speech enhancement device that is easy to install and imposes few limitations.
To solve this technical problem, a dual-microphone speech enhancement device is designed as follows. The device comprises a microphone array module and a signal processing chip electrically connected to the microphone array module. The microphone array module comprises a first microphone and a second microphone for collecting time-domain audio signals, and an analog-to-digital converter that converts the time-domain audio signals collected by the first and second microphones into digital audio signals. The signal processing chip is provided with:
An adaptive filtering module, comprising an adaptive filter, which receives the digital audio signals output by the analog-to-digital converter and applies adaptive filtering to them to obtain a preliminarily denoised speech signal;
A speech enhancement processing module, comprising a low-pass filter and a high-pass filter, which receives the preliminarily denoised speech signal output by the adaptive filtering module and applies speech enhancement processing to it to obtain a denoised speech signal from which the environmental noise has been removed;
An output module, which outputs the denoised speech signal from which the environmental noise has been removed.
Preferably, the first microphone and the second microphone are both omnidirectional microphones.
The present invention also provides a speech enhancement method for the dual-microphone speech enhancement device as claimed in claim 2, wherein the method comprises the following steps:
Step S001: The first microphone and the second microphone receive an external first time-domain audio signal and second time-domain audio signal, respectively, and output them to the analog-to-digital converter, which converts them into a first digital audio signal and a second digital audio signal. The first digital audio signal is taken as the main digital audio signal x1, and the second digital audio signal is taken as the reference signal x2;
Step S002: The adaptive filter of the adaptive filtering module receives the main digital audio signal x1 and filters it to obtain a main output signal. The difference between the reference signal x2 and the main output signal gives an error signal, which adaptively controls the coefficients of the adaptive filter, thereby yielding the preliminarily denoised speech signal:
Let the adaptive filter coefficient be W and the main output signal be y; then:
y(n)=W(n)·x1(n)
The error signal, denoted e, is then:
e(n)=x2(n)-y(n)=x2(n)-W(n)·x1(n)
The adaptive filter coefficient is then updated as follows:
[equation not reproduced in the source]
where μ is the convergence factor, with range 0<μ<0.5,
γ prevents the coefficients from diverging, γ=0.001,
and e*(n) is the conjugate transpose of e(n);
Step S003: The speech enhancement processing module applies speech enhancement processing to the preliminarily denoised speech signal output by the adaptive filter to obtain the denoised speech signal from which the environmental noise has been removed;
Step S004: The output module receives the denoised speech signal output by the speech enhancement processing module and outputs it.
Preferably, the speech enhancement processing in step S003 comprises the following steps:
Step A001: The preliminarily denoised speech signal output by the adaptive filter is passed through the high-pass filter as pre-emphasis, then transformed into a frequency-domain signal by the short-time Fourier transform. The frequency-domain signal is divided into several frequency bands, and the energy of each band is calculated and smoothed;
The high-pass filter has the following form:
H(z)=1-αz^{-1}
where α is a constant, α=0.9325.
The short-time Fourier transform is as follows:
[equation not reproduced in the source; its index range is 0≤k1≤M-1]
where M is the computation length of the short-time Fourier transform, f denotes the frequency value, X denotes the frequency-domain signal, and x denotes the second reduced-sampling-rate digital signal;
The Hamming window function is defined as follows:
[equation not reproduced in the source]
The frequency-domain signal Y is divided into several frequency bands k, and the energy Y_E(m,k) of each band is calculated and smoothed as follows:
E(m,k)=|X(m,k)|^2, 0≤k≤N-1
Y_E(m,k)=αY_E(m-1,k)+(1-α)E(m,k), 0≤k≤N-1
where Y_E(m,k) denotes the smoothed energy of each band, m is the index of the current frame, k is the index of the current sub-band, α=0.75 is the smoothing factor, N is the total number of bands chosen, E(m,k) denotes the band energy value, and X(m,k) denotes the frequency-domain signal of the k-th band of the m-th frame;
Step A002: From the energy Y_E(m,k) of each band and the noise energy estimate V(m-1,k) obtained in the frame preceding the current frame, the a posteriori SNR of each band in the current frame is calculated, and the a priori SNR estimate of the current frame is obtained from the a priori SNR estimate of the previous frame.
[equations not reproduced in the source]
In these formulas, one term denotes the denoised speech signal finally obtained in the frame preceding the current frame, E{|V(m,k)|^2} denotes the noise energy estimate, and α is the first smoothing factor;
Step A003: The a priori SNR estimate is corrected using the weighted noise estimation method to obtain a corrected a priori SNR estimate.
The a priori SNR estimate is multiplied by the weighting factor q_θ to obtain the corrected a priori SNR estimate.
The weighting factor is calculated as follows:
[equation not reproduced in the source]
where γ_1 is 1.5, γ_2 is 200, and θ_z is 20;
Step A004: From the corrected a priori SNR estimate, the attenuation gain q(m,k) of each band is calculated:
[equation not reproduced in the source]
where α is a constant that differs between frequency bands;
Step A005: The frequency-domain signal X(m,k) of each band of the current frame is multiplied by the attenuation gain of that band to obtain the enhanced speech signal of the band;
Step A006: The corrected a priori SNR estimate is used to decide whether the current frame is noise, and the noise energy estimate of each band is updated according to the decision:
If the frame is judged to be noise, the noise energy estimate equals the value of the previous frame, i.e. V(m,k)=V(m-1,k); otherwise the noise energy estimate V(m,k) is updated using the band energy Y_E(m,k). The noise energy estimate V(m,k) is then used in step A002 of the next frame to estimate the a posteriori SNR;
V(m,k)=μV(m-1,k)+(1-μ)·E(m,k)
where μ denotes the second smoothing factor;
Step A007: The enhanced speech signal is transformed into a time-domain signal by the inverse short-time Fourier transform, and this time-domain signal is de-emphasized by the low-pass filter to obtain the denoised speech signal from which the environmental noise has been removed. The low-pass filter has the following form:
H(z)=1+αz^{-1}, where α is a constant, α=0.9325.
Preferably, step A004 further comprises adjusting the attenuation gain by threshold decision:
First, a threshold, an adjustment value q_mod, and a floor value q_floor are set;
Second, using the corrected a priori SNR estimate of the current frame as the decision criterion, every gain coefficient below a certain threshold is multiplied by the adjustment value q_mod, which further suppresses noise;
Then, every gain coefficient below a certain threshold is set to the floor value q_floor, as follows:
[equation not reproduced in the source]
Compared with the related art, the dual-microphone speech enhancement device of the present invention imposes few limitations and is easy to install.
[Description of drawings]
Fig. 1 is a flow chart of the speech enhancement method of the dual-microphone speech enhancement device of the present invention.
Fig. 2 is a structural block diagram of the dual-microphone speech enhancement device of the present invention.
Fig. 3 is the nonlinear weighting curve of the speech enhancement method of the dual-microphone speech enhancement device of the present invention.
[Embodiment]
The present invention is further described below with reference to the drawings and an embodiment.
As shown in Fig. 2, a dual-microphone speech enhancement device 1 comprises a microphone array module 2 and a signal processing chip 3 electrically connected to the microphone array module 2. The microphone array module 2 comprises a first microphone 21 and a second microphone 22 for collecting time-domain audio signals, and an analog-to-digital converter 23 that converts the time-domain audio signals collected by the first microphone 21 and the second microphone 22 into digital audio signals. The signal processing chip 3 is provided with:
An adaptive filtering module 31, comprising an adaptive filter 311, which receives the digital audio signals output by the analog-to-digital converter and applies adaptive filtering to them to obtain a preliminarily denoised speech signal;
A speech enhancement processing module 32, comprising a low-pass filter 321 and a high-pass filter 322, which receives the preliminarily denoised speech signal output by the adaptive filtering module 31 and applies speech enhancement processing to it to obtain a denoised speech signal from which the environmental noise has been removed;
In this embodiment, the first microphone 21 and the second microphone 22 are both omnidirectional microphones.
The present invention also provides a speech enhancement method based on the dual-microphone speech enhancement device of the present invention. In detail, the basic steps of the noise reduction algorithm in this method are as follows:
1. The two time-domain signals received by the two microphones first undergo analog-to-digital conversion to obtain digital signals;
2. The two digital signals undergo adaptive filtering in the adaptive filtering module to obtain a preliminarily denoised speech signal;
3. The preliminarily denoised speech signal is framed, windowed, and pre-emphasized, then transformed into the frequency domain by the short-time Fourier transform and divided into several frequency bands; the energy of each band is calculated and smoothed to obtain the smoothed signal energy in each band;
4. From the signal energy and the noise energy estimate, the a posteriori SNR of each band in the current frame is calculated, and the a priori SNR estimate of the current frame is obtained from the a priori SNR estimate of the previous frame;
5. The a priori SNR estimate obtained is corrected using the weighted noise estimation method;
6. From the corrected a priori SNR estimate, the attenuation gain factor of each band is calculated;
7. The gain coefficients are adjusted by threshold decision;
8. The attenuation gains obtained are applied to the signal spectrum that has been divided into frequency bands;
9. The corrected a priori SNR estimate is used to decide whether the current frame is noise;
10. The noise estimate of each band is updated according to the noise decision result;
11. The processed frequency-domain signal is transformed to the time domain, and the time-domain signal is de-emphasized to obtain the output signal.
This speech enhancement method is described below with a specific example. This embodiment takes a mobile phone as an example, with the dual-microphone speech enhancement device of the present invention used in the phone; see Fig. 1 and Fig. 3:
Step S001: The first microphone 21 and the second microphone 22 receive an external first time-domain audio signal and second time-domain audio signal, respectively, and output them to the analog-to-digital converter 23, which converts them into a first digital audio signal and a second digital audio signal. The first digital audio signal is taken as the main digital audio signal x1, and the second digital audio signal is taken as the reference signal x2. Specifically, the first microphone 21 and the second microphone 22 are two omnidirectional microphones, and the digital audio signals output by the analog-to-digital converter 23 have a sampling rate of 16 kHz and a resolution of 16 bits. During a phone call, the first microphone 21 is closer to the mouth and collects the noisy speech signal, denoted x1; the second microphone 22 is farther from the mouth and collects the environmental noise signal, denoted x2. Because the first microphone 21 and the second microphone 22 are relatively far apart, the two differ considerably in how they pick up speech from near the mouth, whereas environmental noise is a far-field signal that both pick up with greater consistency. Therefore x1 is used as the main digital audio signal and x2 as the reference signal.
Step S002: The adaptive filter 311 of the adaptive filtering module 31 receives the main digital audio signal x1 and filters it to obtain a main output signal. The difference between the reference signal and the main output signal gives an error signal, which adaptively controls the coefficients of the adaptive filter 311, thereby yielding the preliminarily denoised speech signal:
Let the adaptive filter coefficient be W and the main output signal be y; then:
y(n)=W(n)·x1(n)
The error signal, denoted e, is then:
e(n)=x2(n)-y(n)=x2(n)-W(n)·x1(n)
The adaptive filter coefficient is then updated as follows:
[equation not reproduced in the source]
where μ is the convergence factor, with range 0<μ<0.5,
γ prevents the coefficients from diverging, γ=0.001,
and e*(n) is the conjugate transpose of e(n);
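The coefficient update formula itself does not survive in this text, so the following is only a minimal sketch that assumes a standard normalized LMS (NLMS) update, which fits the description of μ as a convergence factor and γ as a small constant that keeps the coefficients from diverging. The function name `nlms`, the tap count, and the default μ are illustrative assumptions, not values from the patent. The signals are real-valued here, so the conjugate e*(n) reduces to e(n).

```python
import numpy as np

def nlms(x1, x2, taps=32, mu=0.1, gamma=0.001):
    """Adapt W so that W·x1 tracks x2; return the error signal e and the final W.

    Assumed update (standard NLMS, not confirmed by the source):
        W(n+1) = W(n) + mu * e(n) * u(n) / (gamma + u(n)·u(n))
    where u(n) holds the most recent `taps` samples of x1.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    w = np.zeros(taps)
    e = np.zeros(len(x1))
    for n in range(len(x1)):
        # Most recent `taps` samples of x1, newest first, zero-padded at start-up.
        seg = x1[max(0, n - taps + 1):n + 1][::-1]
        u = np.zeros(taps)
        u[:len(seg)] = seg
        y = w @ u                      # y(n) = W(n)·x1(n)
        e[n] = x2[n] - y               # e(n) = x2(n) - y(n)
        w = w + mu * e[n] * u / (gamma + u @ u)
    return e, w
```

With μ inside the stated range 0<μ<0.5 the normalized step stays well within the usual NLMS stability region, and γ only matters when the input power u·u is close to zero.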
Step S003: The speech enhancement processing module 32 applies speech enhancement processing to the preliminarily denoised speech signal output by the adaptive filter 311 to obtain the denoised speech signal from which the environmental noise has been removed.
Step S004: The output module 4 receives the denoised speech signal output by the speech enhancement processing module 32 and outputs it.
Preferably, the speech enhancement processing in step S003 of this embodiment comprises the following steps:
Because a speech signal is short-time stationary, processing it frame by frame is feasible. However, framing introduces discontinuities at frame boundaries, which cause spectral leakage. A short-time Fourier transform (STFT) is therefore applied to each frame. The STFT can be understood as first windowing the frame and then taking its Fourier transform. The purpose of the window function is to reduce the spectral leakage caused by frame-boundary discontinuities when the STFT is performed. This embodiment uses a Hamming window whose length equals the frame length of 256, which effectively reduces the oscillation of the Gibbs effect. The details are as follows:
Step A001: The preliminarily denoised speech signal y(n) is divided into frames. Framing means dividing y(n) into several preliminarily denoised speech signal units, one per frame. Each unit consists of sampling points. In the present invention the digital microphone sampling rate is 3.072 MHz with 64-times oversampling. For short-time spectral analysis the frame length is generally set between 10 and 35 ms; this embodiment uses 16 ms frames, so that one frame of the noisy signal unit contains 256 sampling points. Every frame has a definite frame length, and in the present invention the length of every frame is 256.
To prevent blocking effects between the signal units of adjacent frames, adjacent frames are made to overlap during framing; that is, D samples of the current frame are part of the previous frame's data. The overlap is described as follows:
s(n)=d(m,D+n), 0≤n<L
where xi denotes the input preliminarily denoised speech signal, and i takes the values 1 and 2 to denote the two signal paths,
d(m,n)=d(m-1,L+n), 0≤n<D
where d denotes the 256 sampling points of the current frame. Because every frame has length 256 and the overlap ratio is 75%, the number of overlapping sampling points is D=192. The first sampling points of adjacent frames are L=256-192=64 samples apart.
Adjacent frames in the present invention may overlap by 50% to 75%. This embodiment uses an overlap of 75%; that is, the first 75% (192 points) of the current frame is identical to the last 75% (192 points) of the previous frame.
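The framing arithmetic above can be checked with a short sketch. It assumes the 16 kHz ADC output rate stated in step S001, so that 16 ms corresponds to 0.016 × 16000 = 256 samples; the function name `split_frames` is illustrative, and dropping the trailing samples that do not fill a whole frame is a simplification not taken from the text.

```python
import numpy as np

FS = 16_000                  # ADC output sampling rate from step S001 (Hz)
FRAME_LEN = 256              # samples per 16 ms frame
HOP = 64                     # L: distance between starts of adjacent frames
OVERLAP = FRAME_LEN - HOP    # D = 192 samples, i.e. 75% overlap

def split_frames(y, frame_len=FRAME_LEN, hop=HOP):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    y = np.asarray(y, dtype=float)
    if len(y) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(y) - frame_len) // hop
    return np.stack([y[m * hop:m * hop + frame_len] for m in range(n_frames)])
```

Each row m then satisfies d(m,n)=d(m-1,L+n) for 0≤n<D, which is the overlap relation given in the text.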
After framing, the preliminarily denoised speech signal y(n) output by the adaptive filter is passed through the high-pass filter as pre-emphasis, giving the second reduced-sampling-rate digital signal.
The high-pass filter has the following form:
H(z)=1-αz^{-1}, where the constant α=0.9325.
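In the time domain, H(z)=1-αz^{-1} corresponds to the difference equation out(n)=x(n)-α·x(n-1). The sketch below applies it to a one-dimensional array; treating the sample before the first one as zero is an assumption, since the text does not say how the filter state is initialised at frame boundaries.

```python
import numpy as np

def pre_emphasis(x, alpha=0.9325):
    """First-order high-pass pre-emphasis: out(n) = x(n) - alpha * x(n-1)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    out[1:] -= alpha * x[:-1]   # x(-1) is taken as 0, so out(0) = x(0)
    return out
```

A constant input is attenuated to 1-α = 0.0675 of its value after the first sample, which illustrates the suppression of low-frequency content.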
The second reduced-sampling-rate digital signal is transformed into a frequency-domain signal by the short-time Fourier transform; the frequency-domain signal is divided into several frequency bands, and the energy of each band is calculated and smoothed.
The short-time Fourier transform is as follows:
[equation not reproduced in the source]
where M is the computation length of the short-time Fourier transform, f denotes the frequency value, X denotes the frequency-domain signal, and x denotes the second reduced-sampling-rate digital signal;
The Hamming window function is defined as follows:
[equation not reproduced in the source]
The short-time Fourier transform is then as follows:
[equation not reproduced in the source]
where M=256 is the computation length of the short-time Fourier transform and m denotes the m-th frame.
In this way the preliminarily denoised speech signal y of the current frame is transformed from the time domain into the frequency-domain signal Y.
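Because the window and transform formulas are not reproduced in this text, the sketch below assumes the conventional symmetric Hamming window, w(n)=0.54-0.46·cos(2πn/(M-1)), as provided by `numpy.hamming`, followed by an M-point FFT. The name `stft_frame` is illustrative.

```python
import numpy as np

M = 256  # frame length and transform length used in this embodiment

def stft_frame(frame):
    """Window one frame with a Hamming window and return its M-point spectrum."""
    frame = np.asarray(frame, dtype=float)
    window = np.hamming(len(frame))   # assumed window definition, see lead-in
    return np.fft.fft(frame * window)
```

For a pure tone that completes 16 cycles within the frame, the magnitude spectrum peaks at bin 16, with the window spreading a little energy into the neighbouring bins instead of leaking it across the whole spectrum.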
After transformation to the frequency domain, the preliminarily denoised speech signal contains both speech and noise, and is divided into several frequency bands frame by frame. The frequency-domain signal Y is divided into several frequency bands k, and the energy of each band is calculated.
Band division is applied to the frequency-domain signal below 8 kHz. All subsequent signal processing is carried out within the individual bands, which both reduces computational complexity and allows different bands to be processed differently for a better result.
The preliminarily denoised speech signal in the present invention is divided into 30 frequency bands in total; see Table 1.
Table 1: Division into 30 frequency bands
Band number | Start frequency (Hz) | Cutoff frequency (Hz) |
1 | 62.5 | 93.75 |
2 | 125 | 156.25 |
3 | 187.5 | 218.75 |
4 | 250 | 281.25 |
5 | 312.5 | 343.75 |
6 | 375 | 406.25 |
7 | 437.5 | 468.75 |
8 | 500 | 531.25 |
9 | 562.5 | 593.75 |
10 | 625 | 656.25 |
11 | 687.5 | 718.75 |
12 | 750 | 781.25 |
13 | 812.5 | 906.25 |
14 | 937.5 | 1062.5 |
15 | 1093.75 | 1250 |
16 | 1281.25 | 1468.75 |
17 | 1500 | 1718.75 |
18 | 1750 | 2000 |
19 | 2031.25 | 2312.5 |
20 | 2343.75 | 2687.5 |
21 | 2718.75 | 3125 |
22 | 3156.25 | 3687.5 |
23 | 3718.75 | 3968.75 |
24 | 4000 | 4312.5 |
25 | 4343.75 | 4687.5 |
26 | 4718.75 | 5156.25 |
27 | 5187.5 | 5718.75 |
28 | 5750 | 6250 |
29 | 6281.25 | 6875 |
30 | 6906.25 | 7968.75 |
The energy of each band is calculated and smoothed as follows:
E(m,k)=|X(m,k)|^2, 0≤k≤N-1
Y_E(m,k)=αY_E(m-1,k)+(1-α)E(m,k), 0≤k≤N-1
where Y_E(m,k) denotes the smoothed energy of each band, m is the index of the current frame, k is the index of the current sub-band, α=0.75 is the smoothing factor, N is the total number of bands chosen (30 in this embodiment), E(m,k) denotes the band energy value, and X(m,k) denotes the frequency-domain signal of the k-th band of the m-th frame.
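The band energy and its recursive smoothing can be sketched as follows. The text gives E(m,k)=|X(m,k)|^2 without saying how the FFT bins inside a band are combined, so summing the bin powers between a band's start and cutoff frequencies is an assumption, as are the function names.

```python
import numpy as np

ALPHA = 0.75  # smoothing factor from the text

def band_energy(X, freqs, bands):
    """E(m,k): summed |X|^2 over the bins whose frequency lies within band k.

    X      spectrum of one frame (one value per entry of freqs)
    freqs  bin centre frequencies in Hz
    bands  list of (start_hz, cutoff_hz) pairs, e.g. rows of Table 1
    """
    power = np.abs(np.asarray(X)) ** 2
    freqs = np.asarray(freqs, dtype=float)
    return np.array([power[(freqs >= lo) & (freqs <= hi)].sum() for lo, hi in bands])

def smooth_energy(Y_prev, E, alpha=ALPHA):
    """Y_E(m,k) = alpha * Y_E(m-1,k) + (1 - alpha) * E(m,k)."""
    return alpha * np.asarray(Y_prev, dtype=float) + (1 - alpha) * np.asarray(E, dtype=float)
```

With α=0.75 the smoothed energy moves a quarter of the way toward the new frame's energy on each update, so a single loud frame changes the estimate only gradually.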
Step A002: The a posteriori SNR of the current frame is calculated as follows:
Let the initial noise energy estimate be V(0,k)=0, and let the initial a priori SNR estimate be [value not reproduced in the source].
From the energy Y_E(m,k) of each band and the noise energy estimate V(m-1,k) obtained in the frame preceding the current frame, the a posteriori SNR of each band in the current frame is calculated:
[equation not reproduced in the source]
Then the a priori SNR estimate of the current frame is calculated using the Ephraim-Malah a priori SNR estimation formula:
[equation not reproduced in the source]
In this formula, one term denotes the denoised speech signal finally obtained in the previous frame, another denotes the noise energy estimate, and α is the first smoothing factor;
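Neither SNR formula is reproduced in this text. The sketch below therefore assumes the commonly used forms: the a posteriori SNR as the ratio of smoothed band energy to the previous noise estimate, and the Ephraim-Malah decision-directed rule for the a priori SNR. The function names and the `eps` guard against division by zero (needed because the initial noise estimate is V(0,k)=0) are hypothetical, and the value of the first smoothing factor α is not given in the text, so it is left as a required argument.

```python
import numpy as np

def posterior_snr(Y_E, V_prev, eps=1e-12):
    """Assumed form: SNR_post(m,k) = Y_E(m,k) / V(m-1,k)."""
    Y_E = np.asarray(Y_E, dtype=float)
    V_prev = np.asarray(V_prev, dtype=float)
    return Y_E / np.maximum(V_prev, eps)

def prior_snr(S_prev_power, V_prev, snr_post, alpha, eps=1e-12):
    """Assumed decision-directed form:
    SNR_prior(m,k) = alpha * |S(m-1,k)|^2 / V(m-1,k)
                     + (1 - alpha) * max(SNR_post(m,k) - 1, 0)
    """
    S_prev_power = np.asarray(S_prev_power, dtype=float)
    V_prev = np.asarray(V_prev, dtype=float)
    snr_post = np.asarray(snr_post, dtype=float)
    previous = S_prev_power / np.maximum(V_prev, eps)
    current = np.maximum(snr_post - 1.0, 0.0)
    return alpha * previous + (1 - alpha) * current
```

The first term carries over the SNR implied by the previous frame's enhanced output, while the second term reacts to the current frame; α sets the balance between the two.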
Step A003: The a priori SNR estimate is corrected using the weighted noise estimation method to obtain a corrected a priori SNR estimate.
The Ephraim-Malah a priori SNR estimate can overestimate the noise; that is, at high SNR the SNR estimate obtained is biased high, which distorts the enhanced speech signal. This embodiment therefore corrects the a priori SNR estimate using the weighted noise estimation method.
The a priori SNR estimate is multiplied by the weighting factor q_θ to obtain the corrected a priori SNR estimate.
The weighting factor is calculated as follows:
[equation not reproduced in the source]
where γ_1 is 1.5, γ_2 is 200, and θ_z is 20;
Step A004: From the corrected a priori SNR estimate, the attenuation gain q(m,k) of each band is calculated:
The attenuation gain q(m,k) of each band is obtained by spectral subtraction based on the a priori SNR estimate, with the following formula:
[equation not reproduced in the source]
where α is a constant that differs between frequency bands.
Since noise is mainly concentrated in the lower bands, α takes different values for the low, middle, and high frequency ranges.
In the present invention, for bands with k≤14, i.e. signals below 1.2 kHz, α=8.89;
For bands with 14<k≤18, i.e. signals between 1.2 and 2 kHz, α=6.44;
For bands with 18<k≤23, i.e. signals between 2 and 4 kHz, α=6.21;
For bands with k>23, i.e. signals above 4 kHz, α=5.37.
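The per-band choice of α can be written as a small lookup. This sketch assumes that k is the 1-based band number of Table 1; the function name `band_constant` is illustrative. The gain formula that uses α is not reproduced in this text and is deliberately left out.

```python
def band_constant(k):
    """Return the constant alpha for band number k (assumed 1-based, as in Table 1)."""
    if k <= 14:
        return 8.89   # roughly below 1.2 kHz
    if k <= 18:
        return 6.44   # roughly 1.2 to 2 kHz
    if k <= 23:
        return 6.21   # roughly 2 to 4 kHz
    return 5.37       # above 4 kHz
```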
Preferably, this embodiment adjusts the attenuation gain by threshold decision:
First, a threshold, an adjustment value q_mod, and a floor value q_floor are set;
Second, using the corrected a priori SNR estimate of the current frame as the decision criterion, every gain coefficient below a certain threshold is multiplied by the adjustment value q_mod, which further suppresses noise;
Then, every gain coefficient below a certain threshold is set to the floor value q_floor, which avoids a certain amount of speech distortion. The specific method is as follows:
[equation not reproduced in the source]
where q_mod=0.1, θ_G=1.2, and q_floor=0.01.
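The exact adjustment rule is not reproduced in this text, so the following is one plausible reading rather than the patented formula. It assumes that the decision compares the corrected a priori SNR with θ_G, that gains failing the test are scaled by q_mod, and that q_floor acts as a lower bound on the final gain. The function name `adjust_gain` is illustrative.

```python
import numpy as np

Q_MOD = 0.1      # adjustment value from the text
THETA_G = 1.2    # threshold from the text
Q_FLOOR = 0.01   # floor value from the text

def adjust_gain(q, snr_prior, theta_g=THETA_G, q_mod=Q_MOD, q_floor=Q_FLOOR):
    """Scale down gains in low-SNR bands, then clamp every gain at q_floor."""
    q = np.asarray(q, dtype=float)
    snr_prior = np.asarray(snr_prior, dtype=float)
    # Assumption: the corrected a priori SNR is the quantity compared with theta_g.
    q = np.where(snr_prior < theta_g, q * q_mod, q)
    # Assumption: q_floor is a lower bound, which avoids muting a band completely.
    return np.maximum(q, q_floor)
```

Under this reading, a band with gain 0.5 and low SNR drops to 0.05, while a band that would fall to 0.005 is held at 0.01.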
Step A005: The frequency-domain signal X(m,k) of each band of the current frame is multiplied by the attenuation gain of that band to obtain the enhanced speech signal of the band.
Step A006: The corrected a priori SNR estimate is used to decide whether the current frame is noise, and the noise energy estimate of each band is updated according to the decision. Specifically:
In the present invention, the decision and update of the noise energy of each band use a voice activity detection (VAD) method based on the a priori SNR. First, it is decided whether the current frame is a pure noise signal:
[equation not reproduced in the source]
where γ(m,k)=min[SNR_post(m,k), 40].
VAD(m) is then evaluated and the noise estimate is updated, as follows:
[equation not reproduced in the source]
where η is the noise update decision factor, η=0.01 in the present invention,
and μ is the smoothing factor, μ=0.9 here.
If the frame is judged to be noise, the noise energy estimate equals the value of the previous frame, V(m,k)=V(m-1,k); otherwise the noise energy estimate V(m,k) is updated using the band energy Y_E(m,k), and this noise energy estimate V(m,k) is used in step A002 of the frame following the current frame to estimate the a posteriori SNR:
V(m,k)=μV(m-1,k)+(1-μ)E(m,k)
where μ denotes the second smoothing factor.
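The recursive noise update can be sketched as below. The VAD decision formula is not reproduced in this text, so the decision is passed in as a boolean `hold` flag (a hypothetical parameter): when it is true the previous estimate is kept, otherwise the smoothing update is applied. Which frames are held follows whatever the VAD step decides.

```python
import numpy as np

def update_noise(V_prev, E, hold, mu=0.9):
    """Return V(m,k): keep V(m-1,k) where `hold` is true, otherwise smooth toward E(m,k).

    Update rule from the text: V(m,k) = mu * V(m-1,k) + (1 - mu) * E(m,k)
    """
    V_prev = np.asarray(V_prev, dtype=float)
    E = np.asarray(E, dtype=float)
    updated = mu * V_prev + (1 - mu) * E
    return np.where(hold, V_prev, updated)
```

With μ=0.9 each update moves the estimate one tenth of the way toward the current band energy, so the noise floor tracks slow changes without following individual frames.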
Step A007: The enhanced speech signal is transformed into a time-domain signal by the inverse short-time Fourier transform, and this time-domain signal is de-emphasized by the low-pass filter H(z)=1+αz^{-1} to obtain and output the denoised speech signal from which the environmental noise has been removed;
First, the frequency-domain speech spectrum is transformed to the time domain by the inverse fast Fourier transform (IFFT), giving the enhanced time-domain speech signal.
The transformation to the time domain is implemented with the general inverse discrete Fourier transform (IDFT):
[equation not reproduced in the source]
where M=256 is the frame length and s is the enhanced full-band speech signal after transformation to the time domain.
Second, the enhanced time-domain speech signal is de-emphasized.
In contrast to the pre-emphasis above, the enhanced time-domain speech signal is passed through a low-pass filter to restore it as closely as possible to the original signal. The frequency response of the filter is as follows:
H(z)=1+αz^{-1}, where α=0.9.
Third, the overlapping parts of adjacent frames of the enhanced speech signal are added together.
The overlap-add can be expressed as follows:
[equation not reproduced in the source]
where L=64 is the distance between the starts of adjacent frames, M=256 is the frame length, and s' is the denoised speech signal from which the environmental noise has been removed.
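The synthesis steps can be sketched as an inverse FFT per frame, the de-emphasis filter as written (out(n)=s(n)+α·s(n-1) for H(z)=1+αz^{-1}), and an overlap-add with hop L=64. The IDFT and overlap-add formulas are not reproduced in this text, so the standard forms are assumed; any scaling to compensate for the 75% window overlap is not described in the text and is omitted. All function names are illustrative.

```python
import numpy as np

def de_emphasis(s, alpha=0.9):
    """De-emphasis as stated in the text: out(n) = s(n) + alpha * s(n-1)."""
    s = np.asarray(s, dtype=float)
    out = s.copy()
    out[1:] += alpha * s[:-1]   # s(-1) is taken as 0
    return out

def overlap_add(frames_td, hop=64):
    """Add time-domain frames (one per row) at offsets of `hop` samples."""
    frames_td = np.asarray(frames_td, dtype=float)
    n_frames, frame_len = frames_td.shape
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    for m in range(n_frames):
        out[m * hop:m * hop + frame_len] += frames_td[m]
    return out

def synthesize(spectra, hop=64):
    """Inverse-FFT each frame spectrum (one per row) and overlap-add the results."""
    frames_td = np.fft.ifft(np.asarray(spectra), axis=1).real
    return overlap_add(frames_td, hop)
```

With frame length 256 and hop 64, every interior output sample receives contributions from four frames, which is why a real implementation normally pairs this step with a gain or synthesis window.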
Compare with correlation technique; Dual microphone speech sound enhancement device of the present invention and sound enhancement method thereof have adopted auto-adaptive filtering technique to suppress neighbourhood noise, and pass through from voice enhancement algorithm all types of ground unrest of further decaying; Consistance to two microphones does not have special demands; Distance between microphone also can be adjusted voluntarily, and this makes dual microphone speech sound enhancement device of the present invention can very gently be easy to be installed in the various handheld communication devices and with utilization, and de-noising effect is good.
The above describes only embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make improvements without departing from the inventive concept, and all such improvements fall within the protection scope of the present invention.
Claims (5)
1. A dual-microphone speech enhancement device, comprising a microphone array module and a signal processing chip electrically connected to the microphone array module, the microphone array module comprising a first microphone and a second microphone for acquiring time-domain audio signals, and analog-to-digital converters for converting the time-domain audio signals acquired by the first microphone and the second microphone into digital audio signals, characterized in that the signal processing chip is provided with:
an adaptive filtering module, comprising an adaptive filter, for receiving the digital audio signals output by the analog-to-digital converters and applying adaptive filtering to them to obtain a preliminary noise-reduced voice signal;
a speech enhancement processing module, comprising a low-pass filter and a high-pass filter, for receiving the preliminary noise-reduced voice signal output by the adaptive filtering module and applying speech enhancement processing to it to obtain a de-noised voice signal with environmental noise removed; and
an output module, for outputting the de-noised voice signal with environmental noise removed.
2. The dual-microphone speech enhancement device according to claim 1, characterized in that the first microphone and the second microphone are both omnidirectional microphones.
3. A speech enhancement method for the dual-microphone speech enhancement device according to claim 2, characterized in that the method comprises the following steps:
Step S001: receive an external first time-domain audio signal and second time-domain audio signal through the first microphone and the second microphone respectively, and output the received signals to the analog-to-digital converters, which convert them into a first digital audio signal and a second digital audio signal; the first digital audio signal is taken as the main digital audio signal $x_1$, and the second digital audio signal is taken as the reference signal $x_2$;
Step S002: the adaptive filter of the adaptive filtering module receives the main digital audio signal $x_1$ and filters it to obtain a main output signal; the main output signal is then subtracted from the reference signal $x_2$ to give an error signal; the error signal adaptively controls the coefficients of the adaptive filter, thereby yielding the preliminary noise-reduced voice signal:
Let the adaptive filter coefficients be W
and the main output signal be y; then:
$$y(n) = W(n) \cdot x_1(n)$$
The resulting error signal is denoted e:
$$e(n) = x_2(n) - y(n) = x_2(n) - W(n) \cdot x_1(n)$$
The adaptive filter coefficients are then updated according to the following formula:
where $\mu$ is the convergence factor, with range $0 < \mu < 0.5$,
$\gamma$ prevents the coefficients from diverging, $\gamma = 0.001$, and
$e^{*}(n)$ is the conjugate transpose of $e(n)$;
Step S003: the speech enhancement processing module applies speech enhancement processing to the preliminary noise-reduced voice signal output by the adaptive filter to obtain the de-noised voice signal with environmental noise removed;
Step S004: the output module receives the de-noised voice signal with environmental noise removed from the speech enhancement processing module and outputs it.
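Step S002 can be illustrated with a short sketch. The coefficient update formula itself is missing from the text above, so the code assumes the standard normalized LMS (NLMS) update, $W(n+1) = W(n) + \mu\, e(n)\, x_1(n) / (\gamma + \lVert x_1(n) \rVert^2)$, which is consistent with the roles given for $\mu$ and $\gamma$ but is not confirmed by the source. The function name `nlms`, the filter length `taps`, and the choice of returning the error signal as the preliminary noise-reduced signal are likewise assumptions; the signals are taken to be real-valued, so the conjugate has no effect.

```python
import numpy as np


def nlms(x1, x2, taps=8, mu=0.1, gamma=0.001):
    """Assumed NLMS adaptive filter for step S002.

    x1: main digital audio signal, x2: reference signal.
    Returns the error signal e and the final coefficient vector w.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    w = np.zeros(taps)                 # adaptive filter coefficients W
    e = np.zeros(len(x1))
    for n in range(len(x1)):
        # Most recent `taps` samples of x1, newest first, zero-padded at the start.
        seg = x1[max(0, n - taps + 1):n + 1][::-1]
        u = np.zeros(taps)
        u[:len(seg)] = seg
        y = w @ u                      # main output signal y(n) = W(n) . x1(n)
        e[n] = x2[n] - y               # error signal e(n) = x2(n) - y(n)
        # Normalized update; gamma keeps the denominator away from zero.
        w = w + mu * e[n] * u / (gamma + u @ u)
    return e, w
```

The default `mu=0.1` lies inside the stated range 0 < μ < 0.5, and `gamma=0.001` is the value given in the claim.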
4. The speech enhancement method for the dual-microphone speech enhancement device according to claim 3, characterized in that the speech enhancement processing in step S003 comprises the following steps:
Step A001: pass the preliminary noise-reduced voice signal output by the adaptive filter through the high-pass filter as pre-emphasis, transform it into a frequency-domain signal by the short-time Fourier transform, divide the frequency-domain signal into several frequency bands, and compute and smooth the energy of each band;
The high-pass filter has the following form:
$$H(z) = 1 - \alpha z^{-1}$$
where $\alpha$ is a constant, $\alpha = 0.9325$.
The short-time Fourier transform is as follows:
$0 \le k_1 \le M-1$
where M is the computation length of the short-time Fourier transform, f denotes the frequency value, X denotes the frequency-domain signal, and x denotes the second down-sampled digital signal;
The Hamming window function is defined as follows:
The frequency-domain signal Y is divided into several frequency bands k, and the energy $Y_E(m,k)$ of each band is computed. The energy of each band is calculated and smoothed as follows:
$$E(m,k) = |X(m,k)|^2, \quad 0 \le k \le N-1$$
$$Y_E(m,k) = \alpha Y_E(m-1,k) + (1-\alpha) E(m,k), \quad 0 \le k \le N-1$$
where $Y_E(m,k)$ denotes the smoothed energy of each frequency band, m is the index of the current frame, k is the index of the current sub-band, $\alpha = 0.75$ is the smoothing factor, N is the total number of frequency bands chosen, $E(m,k)$ denotes the band energy value, and $X(m,k)$ denotes the frequency-domain signal of the k-th band of the m-th frame;
Step A002: from the energy $Y_E(m,k)$ of each frequency band and the noise energy estimate $V(m-1,k)$ obtained for the previous frame, compute the a posteriori signal-to-noise ratio (SNR) of each band for the current frame,
and from the a priori SNR estimate of the previous frame
obtain the a priori SNR estimate of the current frame,
in which the final noise-reduced voice signal obtained for the previous frame is used, $E\{|V(m,k)|^2\}$ denotes the noise energy estimate, and $\alpha$ is the first smoothing factor;
Step A003: correct the a priori SNR estimate using the weighted noise estimation method to obtain the corrected a priori SNR estimate.
The a priori SNR estimate is multiplied by the weighting factor $q_\theta$ to obtain the corrected a priori SNR estimate.
The weighting factor is computed as follows:
where $\gamma_1 = 1.5$, $\gamma_2 = 200$, and $\theta_z = 20$;
Step A004: from the corrected a priori SNR estimate, compute the attenuation gain $q(m,k)$ of each frequency band:
where a is a constant that differs from band to band;
Step A005: multiply the frequency-domain signal $X(m,k)$ of each band of the current frame by the attenuation gain of that band to obtain the enhanced voice signal of the band;
Step A006: use the corrected a priori SNR estimate to decide whether the current frame is noise, and update the noise energy estimate of each frequency band according to the decision:
If the frame is judged to be noise, the noise energy estimate equals the value of the previous frame, i.e. $V(m,k) = V(m-1,k)$; otherwise the noise energy estimate $V(m,k)$ is updated using the band energy $Y_E(m,k)$, and this noise energy estimate $V(m,k)$ is used in step A002 of the next frame to estimate the a posteriori SNR;
$$V(m,k) = \mu V(m-1,k) + (1-\mu) \cdot E(m,k)$$
where $\mu$ denotes the second smoothing factor;
Step A007: transform the enhanced voice signal into a time-domain signal by the inverse short-time Fourier transform, and pass this time-domain signal through the low-pass filter for de-emphasis to obtain the de-noised voice signal with environmental noise removed; the low-pass filter has the following form:
$H(z) = 1 + \alpha z^{-1}$, where $\alpha$ is a constant, $\alpha = 0.9325$.
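The analysis-side formulas that survive intact in claim 4 (pre-emphasis, band energy smoothing, and the noise energy update) can be sketched as below. This is a hedged illustration, not the claimed implementation: the function names are hypothetical, the split into equal-width bands is an assumption, `np.hamming` stands in for the Hamming window whose definition is missing from the text, and the default `mu=0.9` is a placeholder because the claim gives no value for the second smoothing factor.

```python
import numpy as np

ALPHA_PRE = 0.9325    # pre-emphasis constant from claim 4
ALPHA_SMOOTH = 0.75   # energy smoothing factor from claim 4


def pre_emphasis(x, alpha=ALPHA_PRE):
    """High-pass filter H(z) = 1 - alpha * z^-1: y[n] = x[n] - alpha * x[n-1]."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= alpha * x[:-1]
    return y


def band_energies(frame, n_bands):
    """Window one frame, transform it, and sum |X|^2 over each band.

    Equal-width bands are an assumption; the claim does not say how
    the spectrum is partitioned.
    """
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
    power = np.abs(spectrum) ** 2
    return np.array([band.sum() for band in np.array_split(power, n_bands)])


def smooth_energy(prev_y, energy, alpha=ALPHA_SMOOTH):
    """Y_E(m,k) = alpha * Y_E(m-1,k) + (1 - alpha) * E(m,k)."""
    return alpha * prev_y + (1.0 - alpha) * energy


def update_noise(prev_v, energy, mu=0.9):
    """V(m,k) = mu * V(m-1,k) + (1 - mu) * E(m,k); mu value is a placeholder."""
    return mu * prev_v + (1.0 - mu) * energy
```

Whether `update_noise` is applied or the previous estimate is held for a given frame is governed by the decision in step A006 and is left to the caller here.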
5. The speech enhancement method for the dual-microphone speech enhancement device according to claim 4, characterized in that step A004 further comprises adjusting the attenuation gain by threshold decision:
first, a threshold, an adjustment value qmod, and a floor value qfloor are set;
next, a decision is made using the corrected a priori SNR estimate of the current frame, and gain coefficients falling below a certain threshold are all multiplied by the adjustment value qmod, further suppressing noise;
finally, all gain coefficients below a certain threshold are set to the floor value qfloor, as follows:
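Because the adjustment formula is not reproduced in the text, the following is only one plausible reading of claim 5, written as a small sketch. It assumes that the decision compares the corrected a priori SNR estimate of each band against the threshold, and that the final step clamps every gain below qfloor up to qfloor. The function name `adjust_gain`, its parameter names, and the numeric values in the usage note are hypothetical.

```python
import numpy as np


def adjust_gain(q, snr_prior, snr_threshold, q_mod, q_floor):
    """Threshold-based adjustment of per-band attenuation gains (assumed form).

    q:          attenuation gain per band
    snr_prior:  corrected a priori SNR estimate per band
    """
    q = np.asarray(q, dtype=float)
    snr_prior = np.asarray(snr_prior, dtype=float)
    # Bands judged noise-dominated get extra attenuation.
    q = np.where(snr_prior < snr_threshold, q * q_mod, q)
    # No gain is allowed to fall below the floor value.
    return np.maximum(q, q_floor)
```

For example, with gains `[0.8, 0.5, 0.05]`, SNR estimates `[10, 1, 0.5]`, a threshold of 2, `q_mod = 0.5`, and `q_floor = 0.1`, the first band is left unchanged, the second is halved to 0.25, and the third is raised to the floor of 0.1.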
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011101894436A CN102347027A (en) | 2011-07-07 | 2011-07-07 | Double-microphone speech enhancer and speech enhancement method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102347027A (en) | 2012-02-08 |
Family
ID=45545648
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011101894436A Pending CN102347027A (en) | 2011-07-07 | 2011-07-07 | Double-microphone speech enhancer and speech enhancement method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102347027A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20120208 |