CN105244036A - Microphone speech enhancement method and microphone speech enhancement device - Google Patents

Publication number: CN105244036A
Authority: CN (China)
Legal status: Withdrawn
Application number: CN201410305776.4A
Other languages: Chinese (zh)
Inventors: 范泛, 付中华, 黎家力
Current assignee: ZTE Corp
Original assignee: ZTE Corp
Priority: CN201410305776.4A (CN105244036A); PCT/CN2014/092217 (WO2015196729A1)

Classifications

    • G: Physics
    • G10: Musical instruments; acoustics
    • G10L: Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation

Abstract

The invention provides a microphone speech enhancement method and a corresponding device. The method comprises: obtaining a first array speech signal that is acquired and input through multi-channel digital speech acquisition devices; calculating, from the first array speech signal and according to a minimum-variance adaptive beam optimization model of that signal, the optimal beam output signal synthesized from the first array speech signal; and performing single-channel speech enhancement using a power spectrum estimate of the optimal beam output signal. The minimum-variance adaptive beam optimization model includes the spatial steering vector from the target sound source to the multi-channel digital speech acquisition devices. The method and device can process the original speech of a speech acquisition device array with many elements and large element spacing.

Description

Microphone speech enhancement method and device
Technical field
The present invention relates to speech processing, and in particular to a microphone speech enhancement method and device.
Background
With the development of hands-free calling, conference systems, smart homes and smart appliances, high-quality far-field speech pickup has become one of the key factors affecting the performance of speech acquisition and processing systems. Single-microphone techniques can no longer cope with complex acoustic environments, so microphone arrays built from multi-channel speech acquisition devices are increasingly becoming mainstream; the techniques most commonly used with them are beamforming and speech enhancement. Speech enhancement aims to extract target speech that is as clean as possible from the original speech signal captured by the speech acquisition devices. Beamforming adjusts parameters to increase the sensitivity of the microphone array to sound from a particular direction, which improves the speech enhancement result. However, most prior-art speech enhancement techniques can only process original speech captured by arrays with few, closely spaced elements, so the performance of conventional array speech enhancement is often very limited.
Summary of the invention
In view of this, the present invention provides a microphone speech enhancement method and device that can process the original speech of a speech acquisition device array with more elements and larger element spacing.
To this end, the microphone speech enhancement method provided by the present invention comprises the following steps:
Obtaining a first array speech signal that is acquired and input through multi-channel digital speech acquisition devices;
Calculating, from the first array speech signal and according to a minimum-variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal;
Performing single-channel speech enhancement using the power spectrum estimate of the optimal beam output signal;
The minimum-variance adaptive beam optimization model of the first array speech signal includes the spatial steering vector from the target sound source to the multi-channel digital speech acquisition devices.
Optionally, before the first array speech signal acquired and input through the multi-channel digital speech acquisition devices is obtained, the method further comprises:
Acquiring the original speech array signals $y_1(n), \ldots, y_N(n)$ through the multi-channel digital speech acquisition devices;
Applying a short-time Fourier transform to the original speech array signals to obtain their time-frequency representations $y_1(k,\lambda), \ldots, y_N(k,\lambda)$;
Applying frequency-domain optimal superdirective beam processing to the time-frequency representations using the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$ to obtain the first array speech signal $\bar{y}_i(k,\lambda)$, $i = 1, \ldots, N$;
Here $n$ is the discrete-time variable; $N$ is the number of array elements; $k$ is the frequency-bin index; $\lambda$ is the short-time frame index.
Optionally, the optimal superdirective beam coefficients are set according to the arrangement of the multi-channel digital speech acquisition devices.
Optionally, when the optimal beam output signal synthesized from the first array speech signal is calculated according to the minimum-variance adaptive beam optimization model of the first array speech signal, the following formula is used:
$$\bar{y}(k,\lambda) = \sum_{i=1}^{N} w_i^{*}\, a_i^{*}\, y_i(k,\lambda);$$
where $\bar{y}(k,\lambda)$ is the optimal beam output signal; $w_i^{*}$ is the adaptive filter parameter calculated from the noise signal column vector, the optimal superdirective beam coefficients, and the spatial steering vector from the target sound source to each digital speech acquisition device; $a_i^{*}$ is the complex conjugate of the element $a_i$ of the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$; and $y_i(k,\lambda)$ is the time-frequency signal of the $i$-th channel, so that $a_i^{*}\, y_i(k,\lambda)$ is the first array speech signal.
Optionally, the minimum-variance adaptive beam optimization model of the first array speech signal is:
$$w(k) = \arg\min_{w(k)} w^{H}(k)\, R_{\tilde{v}}(k)\, w(k), \quad \text{subject to } w^{H}(k)\, \tilde{d}(k) = 1;$$
where each element $w_i$ of $w(k)$ and $w_i^{*}$ are complex conjugates of each other; $w^{H}(k)$ is the conjugate transpose of $w(k)$; $R_{\tilde{v}}(k)$ is the noise coherence matrix estimated from the first array speech signal; and $\tilde{d}(k)$ is the spatial steering vector from the target sound source to the digital speech acquisition devices.
Optionally, the spatial steering vector from the target sound source to the digital speech acquisition devices is calculated according to the following formula:
$$\tilde{d}(k) = \left[ a_1^{*} \exp\!\left( jk\, \frac{d_1 \cos(\theta)}{c}\, f_s \right), \ldots, a_N^{*} \exp\!\left( jk\, \frac{d_N \cos(\theta)}{c}\, f_s \right) \right]^{T};$$
where $d_1, \ldots, d_N$ are the distances from the 1st to the $N$-th digital speech acquisition device to the center of the device array; $c$ is the speed of sound; $f_s$ is the sampling frequency; $\theta$ is the azimuth at which the target sound source arrives at the digital speech acquisition devices; and $a_i^{*}$ is the complex conjugate of the element $a_i$ of the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$.
Optionally, the method further comprises:
Performing voice activity detection (VAD) on the noise signal array in the multi-channel array speech input signal;
Estimating the noise power spectrum of the noise signal array according to the VAD result;
Performing a second enhancement of the optimal beam output signal according to the power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
Optionally, the step of estimating the noise power spectrum of the noise signal array according to the VAD result comprises:
Calculating the noise power spectrum in the speech-present state, the speech-absent state, the speech-start state and the speech-end state;
Applying a compromise between the noise power spectrum in the speech-present state and the noise power spectrum in the speech-absent state to obtain the noise power spectrum estimate.
Optionally, the step of calculating the noise power spectrum in the speech-present state, the speech-absent state, the speech-start state and the speech-end state specifically comprises:
In the speech-absent state, estimating the power spectrum of the noise signal array with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = a_1\, \phi_{\bar{v}}(k,\lambda-1) + (1-a_1)\, \phi_{\bar{y}}(k,\lambda);$$
In the speech-start state and the speech-present state, estimating the power spectrum of the noise signal array with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = \min\!\left( \hat{\phi}_{\bar{v}1}(k,\lambda),\; 2\,\theta_{\bar{v}}(k,\lambda) \right);$$
In the speech-end state, applying a two-pole recursive smoothing estimate to the power spectrum of the noise signal array with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = a_0\, \phi_{\bar{v}2}(k,\lambda-1) + (1-a_0)\, \max\!\left( \hat{\phi}_{\bar{v}}(k,\lambda),\; \theta_{\bar{v}}(k,\lambda) \right);$$
In the above formulas,
$$\theta_{\bar{v}}(k,\lambda) = \frac{1}{2L_1+1} \sum_{m=k-L_1}^{k+L_1} \phi_{\bar{v}}(m,\lambda);$$
$$\hat{\phi}_{\bar{v}1}(k,\lambda) = \begin{cases} a_a\, \hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_a)\, \phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) \ge \hat{\phi}_{\bar{v}1}(k,\lambda) \\ a_d\, \hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_d)\, \phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) < \hat{\phi}_{\bar{v}1}(k,\lambda) \end{cases};$$
$$\hat{\phi}_{\bar{v}2}(k,\lambda) = \begin{cases} a_a\, \hat{\phi}_{\bar{v}2}(k,\lambda-1) + (1-a_a)\, |\bar{y}(k,\lambda)|^{2}, & \text{if } \phi_{\bar{y}}(k,\lambda) \ge \hat{\phi}_{\bar{v}2}(k,\lambda) \\ a_d\, \hat{\phi}_{\bar{v}2}(k,\lambda-1) + (1-a_d)\, |\bar{y}(k,\lambda)|^{2}, & \text{if } \phi_{\bar{y}}(k,\lambda) < \hat{\phi}_{\bar{v}2}(k,\lambda) \end{cases};$$
where $a_1$ is the noise spectrum update parameter, and $a_a$ and $a_d$ are smoothing factors.
Optionally, the power spectrum estimate of the optimal beam output signal is calculated with the following formula:
$$\phi_{\bar{y}}(k,\lambda) = a_0\, \phi_{\bar{y}}(k,\lambda-1) + (1-a_0)\, |\bar{y}(k,\lambda)|^{2};$$
where $\phi_{\bar{y}}(k,\lambda)$ is the power spectrum estimate of the optimal beam output signal; $\bar{y}(k,\lambda)$ is the optimal beam output signal; and $a_0$ is the noise spectrum update parameter.
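As an illustration, the first-order recursion for the power spectrum estimate of the optimal beam output signal can be sketched in Python with NumPy. The function name `update_output_psd` and the default value `a0 = 0.8` are assumptions made for this sketch; the text does not give a value for $a_0$.

```python
import numpy as np

def update_output_psd(phi_y_prev, y_bar, a0=0.8):
    """One frame of recursive smoothing of the beam output power spectrum.

    phi_y_prev : previous estimate phi_y(k, lambda-1), scalar or array over bins k
    y_bar      : complex beam output y_bar(k, lambda), same shape
    a0         : update parameter in [0, 1]; 0.8 is an illustrative value only
    """
    return a0 * phi_y_prev + (1.0 - a0) * np.abs(y_bar) ** 2
```

A larger `a0` gives a smoother but slower-reacting estimate, since less weight is placed on the current frame's $|\bar{y}(k,\lambda)|^2$.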
Further, the present invention provides a microphone speech enhancement device, comprising:
A first acquisition module, configured to obtain a first array speech signal that is acquired and input through multi-channel digital speech acquisition devices;
An optimal beam output signal calculation module, configured to calculate, from the first array speech signal and according to a minimum-variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal;
A first enhancement module, configured to perform single-channel speech enhancement using the power spectrum estimate of the optimal beam output signal;
The minimum-variance adaptive beam optimization model of the first array speech signal includes the spatial steering vector from the target sound source to the multi-channel digital speech acquisition devices.
Optionally, the device further comprises:
An original signal acquisition module, configured to acquire the original speech array signals $y_1(n), \ldots, y_N(n)$ through the multi-channel digital speech acquisition devices;
An original signal transform module, configured to apply a short-time Fourier transform to the original speech array signals to obtain their time-frequency representations $y_1(k,\lambda), \ldots, y_N(k,\lambda)$;
An optimal superdirective beam processing module, configured to apply frequency-domain optimal superdirective beam processing to the time-frequency representations using the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$ to obtain the first array speech signal $\bar{y}_i(k,\lambda)$, $i = 1, \ldots, N$;
Here $n$ is the discrete-time variable; $N$ is the number of array elements; $k$ is the frequency-bin index; $\lambda$ is the short-time frame index.
Optionally, the optimal superdirective beam coefficients are set according to the arrangement of the multi-channel digital speech acquisition devices.
Optionally, when the optimal beam output signal calculation module calculates the optimal beam output signal synthesized from the first array speech signal according to the minimum-variance adaptive beam optimization model of the first array speech signal, it uses the following formula:
$$\bar{y}(k,\lambda) = \sum_{i=1}^{N} w_i^{*}\, a_i^{*}\, y_i(k,\lambda);$$
where $\bar{y}(k,\lambda)$ is the optimal beam output signal; $w_i^{*}$ is the adaptive filter parameter calculated from the noise signal column vector, the optimal superdirective beam coefficients, and the spatial steering vector from the target sound source to each digital speech acquisition device; $a_i^{*}$ is the complex conjugate of the element $a_i$ of the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$; and $y_i(k,\lambda)$ is the time-frequency signal of the $i$-th channel, so that $a_i^{*}\, y_i(k,\lambda)$ is the first array speech signal.
Optionally, the minimum-variance adaptive beam optimization model of the first array speech signal is:
$$w(k) = \arg\min_{w(k)} w^{H}(k)\, R_{\tilde{v}}(k)\, w(k), \quad \text{subject to } w^{H}(k)\, \tilde{d}(k) = 1;$$
where each element $w_i$ of $w(k)$ and $w_i^{*}$ are complex conjugates of each other; $w^{H}(k)$ is the conjugate transpose of $w(k)$; $R_{\tilde{v}}(k)$ is the noise coherence matrix estimated from the first array speech signal; and $\tilde{d}(k)$ is the spatial steering vector from the target sound source to the digital speech acquisition devices.
Optionally, when the optimal beam output signal calculation module calculates the optimal beam output signal synthesized from the first array speech signal, the spatial steering vector from the target sound source to the digital speech acquisition devices that it uses is calculated according to the following formula:
$$\tilde{d}(k) = \left[ a_1^{*} \exp\!\left( jk\, \frac{d_1 \cos(\theta)}{c}\, f_s \right), \ldots, a_N^{*} \exp\!\left( jk\, \frac{d_N \cos(\theta)}{c}\, f_s \right) \right]^{T};$$
where $d_1, \ldots, d_N$ are the distances from the 1st to the $N$-th digital speech acquisition device to the center of the device array; $c$ is the speed of sound; $f_s$ is the sampling frequency; $\theta$ is the azimuth at which the target sound source arrives at the digital speech acquisition devices; and $a_i^{*}$ is the complex conjugate of the element $a_i$ of the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$.
Optionally, the device further comprises:
A VAD module, configured to perform voice activity detection (VAD) on the noise signal array in the multi-channel array speech input signal;
A noise power spectrum estimation module, configured to estimate the noise power spectrum of the noise signal array according to the VAD result;
A second enhancement module, configured to perform a second enhancement of the optimal beam output signal according to the power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
Optionally, the noise power spectrum estimation module comprises:
A first noise power spectrum calculation unit, configured to calculate the noise power spectrum in the speech-present state, the speech-absent state, the speech-start state and the speech-end state;
A second noise power spectrum calculation unit, configured to apply a compromise between the noise power spectrum in the speech-present state and the noise power spectrum in the speech-absent state to obtain the noise power spectrum estimate.
Optionally, the first noise power spectrum calculation unit specifically comprises:
A speech-absent state calculation subunit, configured to estimate the power spectrum of the noise signal array in the speech-absent state with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = a_1\, \phi_{\bar{v}}(k,\lambda-1) + (1-a_1)\, \phi_{\bar{y}}(k,\lambda);$$
A speech-start and speech-present state calculation subunit, configured to estimate the power spectrum of the noise signal array in the speech-start state and the speech-present state with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = \min\!\left( \hat{\phi}_{\bar{v}1}(k,\lambda),\; 2\,\theta_{\bar{v}}(k,\lambda) \right);$$
A speech-end state calculation subunit, configured to apply a two-pole recursive smoothing estimate to the power spectrum of the noise signal array in the speech-end state with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = a_0\, \phi_{\bar{v}2}(k,\lambda-1) + (1-a_0)\, \max\!\left( \hat{\phi}_{\bar{v}}(k,\lambda),\; \theta_{\bar{v}}(k,\lambda) \right);$$
In the above formulas,
$$\theta_{\bar{v}}(k,\lambda) = \frac{1}{2L_1+1} \sum_{m=k-L_1}^{k+L_1} \phi_{\bar{v}}(m,\lambda);$$
$$\hat{\phi}_{\bar{v}1}(k,\lambda) = \begin{cases} a_a\, \hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_a)\, \phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) \ge \hat{\phi}_{\bar{v}1}(k,\lambda) \\ a_d\, \hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_d)\, \phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) < \hat{\phi}_{\bar{v}1}(k,\lambda) \end{cases};$$
$$\hat{\phi}_{\bar{v}2}(k,\lambda) = \begin{cases} a_a\, \hat{\phi}_{\bar{v}2}(k,\lambda-1) + (1-a_a)\, |\bar{y}(k,\lambda)|^{2}, & \text{if } \phi_{\bar{y}}(k,\lambda) \ge \hat{\phi}_{\bar{v}2}(k,\lambda) \\ a_d\, \hat{\phi}_{\bar{v}2}(k,\lambda-1) + (1-a_d)\, |\bar{y}(k,\lambda)|^{2}, & \text{if } \phi_{\bar{y}}(k,\lambda) < \hat{\phi}_{\bar{v}2}(k,\lambda) \end{cases};$$
where $a_1$ is the noise spectrum update parameter, and $a_a$ and $a_d$ are smoothing factors.
Optionally, the power spectrum estimate of the optimal beam output signal is calculated with the following formula:
$$\phi_{\bar{y}}(k,\lambda) = a_0\, \phi_{\bar{y}}(k,\lambda-1) + (1-a_0)\, |\bar{y}(k,\lambda)|^{2};$$
where $\phi_{\bar{y}}(k,\lambda)$ is the power spectrum estimate of the optimal beam output signal; $\bar{y}(k,\lambda)$ is the optimal beam output signal; and $a_0$ is the noise spectrum update parameter.
As can be seen from the above, the microphone speech enhancement method and device provided by the present invention and its embodiments apply a minimum-variance adaptive beam optimization model to the first array speech signal acquired and input through the multi-channel digital speech acquisition devices, and this model includes the spatial steering vector from the target sound source to those devices. As a result, speech enhancement can be performed on microphone arrays with larger element spacing, and high-quality pickup can be achieved. In addition, the method and device provided by the embodiments estimate the power spectrum of the noise signal array separately in the different phases of speech, according to the voice activity detection result, which gives a more accurate noise estimate and further improves the speech enhancement result.
Brief description of the drawings
Fig. 1 is a flow chart of the microphone speech enhancement method of an embodiment of the present invention;
Fig. 2 is a flow chart of the original speech acquisition process of an embodiment of the present invention;
Fig. 3 is a flow chart of noise power spectrum estimation in an embodiment of the present invention;
Fig. 4 is a more detailed flow chart of noise power spectrum estimation in an embodiment of the present invention;
Fig. 5 is a structural diagram of the microphone speech enhancement device of an embodiment of the present invention;
Fig. 6 is a schematic diagram of speech signal processing in an embodiment of the present invention.
Detailed description of the embodiments
To provide an effective implementation, the present invention provides the following embodiments, which are described below with reference to the accompanying drawings.
First, the beamforming techniques relevant to the present invention fall into two classes: fixed beams and adaptive beams.
With a fixed beam, the parameters of the array signal processing system do not change with the picked-up signal; they are determined by the array topology and a preset noise field model. Fixed beams include time-domain and frequency-domain fixed beams. Fixed beams have poor directivity at low and mid frequencies, and speech is a wideband signal; improving low- and mid-frequency directivity degrades the robustness of the array, so fixed beams are seldom used on their own in practical small microphone arrays.
An adaptive beam automatically estimates the sound field and the transfer function from the sound source to the microphones, and dynamically generates optimal beam parameters according to an optimization criterion. In practice, the transfer function from the sound source to each microphone is difficult to estimate, so adaptive beamforming is usually combined with multi-channel noise suppression, or post-filtering is added after beam processing. Both approaches require accurate estimates of the noise statistics and an optimal balance between target signal distortion and the degree of noise suppression.
The present invention provides a microphone speech enhancement method comprising the steps shown in Fig. 1:
Step 101: obtain a first array speech signal that is acquired and input through multi-channel digital speech acquisition devices.
Step 102: calculate, from the first array speech signal and according to the minimum-variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal.
Step 103: perform single-channel speech enhancement using the power spectrum estimate of the optimal beam output signal.
The minimum-variance adaptive beam optimization model of the first array speech signal includes the spatial steering vector from the target sound source to the multi-channel digital speech acquisition devices.
As can be seen from the above, the microphone speech enhancement method provided by the present invention applies a minimum-variance adaptive beam optimization model to the first array speech signal acquired and input through the multi-channel digital speech acquisition devices, and this model includes the spatial steering vector from the target sound source to those devices. As a result, speech enhancement can be performed on microphone arrays with larger element spacing, and high-quality pickup can be achieved.
In some embodiments of the present invention, in the step of performing single-channel speech enhancement using the power spectrum estimate of the optimal beam output signal, the logMMSE method is applied to the optimal beam output signal.
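For reference, the log-MMSE (log-spectral amplitude) estimator is commonly written in the form below. The symbols $\xi$ (a priori SNR), $\gamma$ (a posteriori SNR), $\nu$ and $G$ (spectral gain) are not defined in this description and are introduced here only for illustration; the text does not state which variant of the estimator is used or how $\xi$ is obtained.

```latex
\gamma(k,\lambda) = \frac{|\bar{y}(k,\lambda)|^{2}}{\phi_{\bar{v}}(k,\lambda)}, \qquad
\nu(k,\lambda) = \frac{\xi(k,\lambda)}{1+\xi(k,\lambda)}\,\gamma(k,\lambda), \qquad
G(k,\lambda) = \frac{\xi(k,\lambda)}{1+\xi(k,\lambda)}
\exp\!\left( \frac{1}{2} \int_{\nu(k,\lambda)}^{\infty} \frac{e^{-t}}{t}\, dt \right)
```

Under this form, the enhanced spectrum would be $G(k,\lambda)\,\bar{y}(k,\lambda)$, with $\phi_{\bar{v}}(k,\lambda)$ taken as the noise power spectrum estimate.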
In some embodiments of the present invention, before the first array speech signal acquired and input through the multi-channel digital speech acquisition devices is obtained, the method further comprises the steps shown in Fig. 2:
Step 201: acquire the original speech array signals $y_1(n), \ldots, y_N(n)$ through the multi-channel digital speech acquisition devices.
Step 202: apply a short-time Fourier transform to the original speech array signals to obtain their time-frequency representations $y_1(k,\lambda), \ldots, y_N(k,\lambda)$.
Step 203: apply frequency-domain optimal superdirective beam processing to the time-frequency representations using the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$ to obtain the first array speech signal $\bar{y}_i(k,\lambda)$, $i = 1, \ldots, N$.
Here $n$ is the discrete-time variable; $N$ is the number of array elements; $k$ is the frequency-bin index; $\lambda$ is the short-time frame index.
Specifically, the original speech array signals acquired by the multi-channel digital speech acquisition devices are $y_1(n), \ldots, y_N(n)$. These signals are windowed and truncated with a time window of length $L_{wnd}$ and an overlap of $L_{ovlp}$ between adjacent windows. A Hanning window is used, with an overlap of 3/4 of the window length. A short-time Fourier transform is then applied to the windowed signal of each channel to obtain the time-frequency representations of the original speech array signals: $y_1(k,\lambda), \ldots, y_N(k,\lambda)$. In theory, the time-frequency representation $y_i(k,\lambda)$, the noise signal $v_i(k,\lambda)$ and the target speech signal $x(k,\lambda)$ emitted by the target sound source satisfy the following relation:
$$y_i(k,\lambda) = v_i(k,\lambda) + x(k,\lambda).$$
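The windowing and short-time Fourier transform step can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the function name `stft_multichannel`, the default window length of 512 samples, and the use of a one-sided FFT are assumptions; only the Hanning window and the 3/4 overlap come from the text.

```python
import numpy as np

def stft_multichannel(y, win_len=512, overlap=0.75):
    """Hann-windowed STFT of an (N, T) multi-channel signal.

    Returns an array of shape (N, K, frames) with K = win_len // 2 + 1
    one-sided frequency bins; win_len = 512 is an illustrative choice.
    """
    hop = win_len - int(win_len * overlap)      # 3/4 overlap -> hop of 1/4 window
    window = np.hanning(win_len)
    n_ch, n_samples = y.shape
    n_frames = 1 + (n_samples - win_len) // hop
    out = np.empty((n_ch, win_len // 2 + 1, n_frames), dtype=complex)
    for lam in range(n_frames):                 # lam plays the role of the frame index
        seg = y[:, lam * hop: lam * hop + win_len] * window
        out[:, :, lam] = np.fft.rfft(seg, axis=1)
    return out
```

For example, a 4-channel signal of 4096 samples gives an output of shape `(4, 257, 29)` with the defaults above.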
Frequency-domain optimal superdirective beam processing is applied to the time-frequency representations using the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$ to obtain the first array speech signal $\bar{y}_i(k,\lambda)$. Specifically,
$$\bar{y}_i(k,\lambda) = a_i^{*}(k)\, y_i(k,\lambda) = a_i^{*}(k)\big( v_i(k,\lambda) + x(k,\lambda) \big) = a_i^{*}(k)\, x(k,\lambda) + a_i^{*}(k)\, v_i(k,\lambda) = \bar{x}_i(k,\lambda) + \bar{v}_i(k,\lambda);$$
where $a_i^{*}(k)$ is the complex conjugate of $a_i(k)$; $\bar{x}_i(k,\lambda)$ is the target speech signal after frequency-domain weighting; $\bar{v}_i(k,\lambda)$ is the noise signal after frequency-domain weighting; and $i = 1, \ldots, N$.
In some embodiments, the optimal superdirective beam coefficients are determined from the array topology of the multi-channel digital speech acquisition devices in combination with the sound source direction.
Because optimal superdirective beam processing is used, the element spacing of the multi-channel digital acquisition devices can be larger than that of prior-art multi-channel speech acquisition devices.
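The per-channel weighting $\bar{y}_i(k,\lambda) = a_i^{*}(k)\, y_i(k,\lambda)$ is an element-wise product, which can be sketched with NumPy broadcasting. The function name `apply_superdirective` and the array layout (channels, bins, frames) are assumptions of this sketch; how the coefficients $A(k)$ themselves are designed is not covered here.

```python
import numpy as np

def apply_superdirective(Y, A):
    """Weight each channel by the conjugate superdirective coefficient.

    Y : (N, K, frames) complex STFT of the N channels
    A : (N, K) complex coefficients a_i(k)
    Returns y_bar_i(k, lambda) = conj(a_i(k)) * y_i(k, lambda), shape (N, K, frames).
    """
    return np.conj(A)[:, :, None] * Y
```

The coefficients depend only on channel and frequency bin, so the same weight is applied to every frame.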
In some embodiments, when the optimal beam output signal synthesized from the first array speech signal is calculated according to the minimum-variance adaptive beam optimization model of the first array speech signal, the following formula is used:
$$\bar{y}(k,\lambda) = \sum_{i=1}^{N} w_i^{*}\, a_i^{*}\, y_i(k,\lambda);$$
where $\bar{y}(k,\lambda)$ is the optimal beam output signal; $w_i^{*}$ is the adaptive filter parameter calculated from the noise signal column vector, the optimal superdirective beam coefficients, and the spatial steering vector from the target sound source to each digital speech acquisition device; $a_i^{*}$ is the complex conjugate of the element $a_i$ of the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$; and $y_i(k,\lambda)$ is the time-frequency signal of the $i$-th channel, so that $a_i^{*}\, y_i(k,\lambda)$ is the first array speech signal.
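The weighted sum over channels can be written compactly with `numpy.einsum`. This is a sketch under assumed array layouts; the function name `beam_output` is hypothetical, and the weights `W` are taken as given (their computation is a separate step).

```python
import numpy as np

def beam_output(Y, A, W):
    """Beam output y_bar(k, lambda) = sum_i conj(w_i(k)) * conj(a_i(k)) * y_i(k, lambda).

    Y : (N, K, frames) complex STFT of the N channels
    A : (N, K) superdirective coefficients a_i(k)
    W : (N, K) adaptive filter weights w_i(k)
    Returns an array of shape (K, frames).
    """
    return np.einsum("nk,nk,nkf->kf", np.conj(W), np.conj(A), Y)
```

Summing over the channel axis `n` collapses the $N$ weighted channels into a single-channel spectrum, which is what the subsequent single-channel enhancement operates on.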
Specifically, in some embodiments, the minimum-variance adaptive beam optimization model of the first array speech signal is:
$$w(k) = \arg\min_{w(k)} w^{H}(k)\, R_{\tilde{v}}(k)\, w(k), \quad \text{subject to } w^{H}(k)\, \tilde{d}(k) = 1;$$
where each element $w_i$ of $w(k)$ and $w_i^{*}$ are complex conjugates of each other; $w^{H}(k)$ is the conjugate transpose of $w(k)$; $R_{\tilde{v}}(k)$ is the noise coherence matrix estimated from the first array speech signal; and $\tilde{d}(k)$ is the spatial steering vector from the target sound source to the digital speech acquisition devices.
According to the above model, the adaptive filter parameter calculated from the noise signal column vector, the optimal superdirective beam coefficients, and the spatial steering vector from the target sound source to each digital speech acquisition device, whose conjugate is $w_i^{*}$, is:
$$w_{opt}(k) = \frac{R_{\tilde{v}}^{-1}(k)\, \tilde{d}(k)}{\tilde{d}^{H}(k)\, R_{\tilde{v}}^{-1}(k)\, \tilde{d}(k)};$$
where $\tilde{d}^{H}(k)$ is the conjugate transpose of $\tilde{d}(k)$.
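The closed-form solution can be sketched for a single frequency bin as follows. The function name `mvdr_weights` is hypothetical, and the diagonal loading term is an assumption added for numerical stability; it is not part of the formula in the text and can be disabled with `diag_load=0.0`.

```python
import numpy as np

def mvdr_weights(R, d, diag_load=1e-6):
    """Minimum-variance weights for one bin: R^{-1} d / (d^H R^{-1} d).

    R : (N, N) Hermitian noise coherence matrix
    d : (N,) complex steering vector
    diag_load : relative diagonal loading (assumption, not in the source text)
    """
    n = R.shape[0]
    R_loaded = R + diag_load * (np.trace(R).real / n) * np.eye(n)
    Rinv_d = np.linalg.solve(R_loaded, d)   # solves R x = d without forming R^{-1}
    return Rinv_d / (np.conj(d) @ Rinv_d)
```

The returned weights satisfy the distortionless constraint, which can be checked with `np.vdot(w, d)` (this computes $w^{H} d$ and should equal 1).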
More specifically, the spatial steering vector from the target sound source to the digital speech acquisition devices is calculated with the following formula:
$$d(k) = \left[ \exp\!\left( jk\, \frac{d_1 \cos(\theta)}{c}\, f_s \right), \ldots, \exp\!\left( jk\, \frac{d_N \cos(\theta)}{c}\, f_s \right) \right]^{T};$$
where $d_1, \ldots, d_N$ are the distances from the 1st to the $N$-th digital speech acquisition device to the center of the device array; $c$ is the speed of sound; $f_s$ is the sampling frequency; and $\theta$ is the azimuth at which the target sound source arrives at the digital speech acquisition devices. Because the signal first undergoes frequency-domain superdirective beam processing, the spatial steering vector from the target sound source to the digital speech acquisition devices after that processing becomes:
$$\tilde{d}(k) = \left[ a_1^{*} \exp\!\left( jk\, \frac{d_1 \cos(\theta)}{c}\, f_s \right), \ldots, a_N^{*} \exp\!\left( jk\, \frac{d_N \cos(\theta)}{c}\, f_s \right) \right]^{T}.$$
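A literal transcription of the steering vector formula into NumPy might look like the sketch below. The function name `steering_vector`, the parameter names, and the default speed of sound of 343 m/s are assumptions. The value passed as `k` is used exactly as it appears in the formula; how a bin index maps to a normalized frequency is not specified in the text and is left to the caller.

```python
import numpy as np

def steering_vector(k, dist, theta, fs, A_k=None, c=343.0):
    """Steering vector with elements exp(j * k * d_i * cos(theta) / c * fs).

    k     : frequency variable as used in the formula
    dist  : (N,) distances d_i of each element from the array center, in meters
    theta : azimuth of the target source, in radians
    fs    : sampling frequency in Hz
    A_k   : optional (N,) coefficients a_i(k); if given, each element is
            multiplied by conj(a_i(k)) to obtain the post-weighting vector
    c     : speed of sound in m/s (343.0 is an assumed room-temperature value)
    """
    phase = k * np.asarray(dist) * np.cos(theta) / c * fs
    d = np.exp(1j * phase)
    if A_k is not None:
        d = np.conj(A_k) * d
    return d
```

Without `A_k` every element has unit magnitude; with `A_k` the magnitudes equal those of the superdirective coefficients.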
More specifically, the noise signal estimated from the first array speech signal is denoted $\tilde{v}(k,\lambda)$; accordingly, the noise coherence matrix $R_{\tilde{v}}(k)$ estimated from the first array speech signal is the expectation $E[\cdot]$ of the product of this noise vector with its conjugate transpose $\tilde{v}^{H}(k,\lambda)$.
More specifically, $w(k) = [w_1(k), \ldots, w_N(k)]^T$.
In some embodiments of the present invention, the method further comprises the steps shown in Fig. 3:
Step 301: perform voice activity detection (VAD) on the noise signal array in the multi-channel array speech input signal.
Step 302: estimate the noise power spectrum of the noise signal array according to the VAD result.
Step 303: perform a second enhancement of the optimal beam output signal according to the power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
The above embodiment allows dynamic, time-varying estimation of the noise signal in the first array speech signal, in preparation for the second enhancement of the speech.
In general, when no speech is present, the noise can be estimated with the following formula:
$$\hat{R}_{\tilde{v}}(k,\lambda) = a_R\, \hat{R}_{\tilde{v}}(k,\lambda-1) + (1-a_R)\, \tilde{v}^{H}(k,\lambda)\, \tilde{v}(k,\lambda);$$
When speech is present, the noise can be estimated with the following formula:
$$\hat{R}_{\tilde{v}}(k,\lambda) = \hat{R}_{\tilde{v}}(k,\lambda-1);$$
where $a_R$ is a smoothing factor.
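The VAD-gated update of the noise coherence matrix can be sketched as follows. This sketch assumes that $\tilde{v}$ is a column vector, so the rank-one update term is formed as the outer product $\tilde{v}\,\tilde{v}^{H}$ (an $N \times N$ matrix); that reading, the function name `update_noise_coherence`, and the default `a_R = 0.95` are assumptions, not statements from the text.

```python
import numpy as np

def update_noise_coherence(R_prev, v_tilde, speech_present, a_R=0.95):
    """One frame of the noise coherence matrix estimate for one frequency bin.

    R_prev         : (N, N) previous estimate
    v_tilde        : (N,) weighted noise vector for the current frame
    speech_present : VAD decision; the estimate is frozen while speech is present
    a_R            : smoothing factor (0.95 is an illustrative value)
    """
    if speech_present:
        return R_prev
    # Outer product v v^H (assumed convention for a column vector v)
    return a_R * R_prev + (1.0 - a_R) * np.outer(v_tilde, np.conj(v_tilde))
```

Freezing the estimate during speech keeps target energy out of the noise statistics, at the cost of not tracking noise changes until the next speech pause.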
Specifically, the step of estimating the noise power spectrum of the noise signal array according to the VAD result comprises the process shown in Fig. 4:
Step 401: calculate the noise power spectrum in the speech-present state, the speech-absent state, the speech-start state and the speech-end state.
Step 402: apply a compromise between the noise power spectrum in the speech-present state and the noise power spectrum in the speech-absent state to obtain the noise power spectrum estimate.
In some embodiments, the step of estimating the power spectrum of the noise signal array according to the VAD result specifically comprises:
When being in without voice status, following formula is adopted to carry out quick and smooth estimation to noise signal array power spectrum:
&phi; v &OverBar; ( k , &lambda; ) = a 1 &phi; v &OverBar; ( k , &lambda; - 1 ) + ( 1 - a 1 ) &phi; y &OverBar; ( k , &lambda; ) ;
Then calculating noise power spectrum thresholding is according to the following equation adopted:
&theta; v &OverBar; ( k , &lambda; ) = 1 2 L 1 + 1 &Sigma; m = k - L 1 k + L 1 &phi; v &OverBar; ( k , &lambda; ) ;
Wherein, L 1for frequency number.
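The speech-absent update and the threshold calculation can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the summand is taken at bin m (the summation index) rather than at k, and bins outside the spectrum are clamped to the nearest valid bin, which is an assumption the source does not state.

```python
def fast_smooth(phi_v_prev, phi_y, a_1=0.85):
    """First-order recursive smoothing of the noise power spectrum, per bin."""
    return [a_1 * pv + (1.0 - a_1) * py for pv, py in zip(phi_v_prev, phi_y)]


def noise_threshold(phi_v, L_1=7):
    """Average phi_v over bins k-L_1 .. k+L_1 for every bin k."""
    num_bins = len(phi_v)
    theta = []
    for k in range(num_bins):
        total = 0.0
        for m in range(k - L_1, k + L_1 + 1):
            # Assumption: indices past the edges reuse the boundary bin.
            total += phi_v[min(max(m, 0), num_bins - 1)]
        theta.append(total / (2 * L_1 + 1))
    return theta
```

The threshold is a moving average across frequency, so a single bin with an unusually large estimate is compared against the level of its neighbours.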
In the speech-onset state, a two-pole recursive smoothed estimate of the noise signal array power spectrum is first computed with the following formula:
$$\hat{\phi}_{\bar{v}1}(k,\lambda) = \begin{cases} a_a\,\hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_a)\,\phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) \ge \hat{\phi}_{\bar{v}1}(k,\lambda) \\ a_d\,\hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_d)\,\phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) < \hat{\phi}_{\bar{v}1}(k,\lambda) \end{cases}$$
The noise signal array power spectrum estimate for the speech-onset state is then used to calculate the noise peak:
$$\phi_{\bar{v}}(k,\lambda) = \min\left(\hat{\phi}_{\bar{v}1}(k,\lambda),\; 2\,\theta_{\bar{v}}(k,\lambda)\right);$$
In the speech-end state, a two-pole recursive smoothed estimate of the noise signal array power spectrum is computed with the following formula:
$$\hat{\phi}_{\bar{v}2}(k,\lambda) = \begin{cases} a_a\,\hat{\phi}_{\bar{v}2}(k,\lambda-1) + (1-a_a)\,|\bar{y}(k,\lambda)|^2, & \text{if } \phi_{\bar{y}}(k,\lambda) \ge \hat{\phi}_{\bar{v}2}(k,\lambda) \\ a_d\,\hat{\phi}_{\bar{v}2}(k,\lambda-1) + (1-a_d)\,|\bar{y}(k,\lambda)|^2, & \text{if } \phi_{\bar{y}}(k,\lambda) < \hat{\phi}_{\bar{v}2}(k,\lambda) \end{cases}$$
The compromised noise power spectrum estimate is then calculated:
$$\phi_{\bar{v}}(k,\lambda) = a_0\,\phi_{\bar{v}2}(k,\lambda-1) + (1-a_0)\,\max\left(\hat{\phi}_{\bar{v}}(k,\lambda),\; \theta_{\bar{v}}(k,\lambda)\right).$$
where $a_1$ is the noise spectrum update parameter; $a_a$ and $a_d$ are smoothing factors; $a_0$ is the noise spectrum update parameter; $\phi_{\bar{v}}(k,\lambda)$ is the fast smoothed estimate of the noise signal array power spectrum; $\hat{\phi}_{\bar{v}}(k,\lambda)$ is the two-pole recursive smoothed estimate of the noise signal array power spectrum; $\phi_{\bar{y}}(k,\lambda)$ is the power spectrum estimate of the optimal beam output signal subjected to the single-channel enhancement; and $\theta_{\bar{v}}(k,\lambda)$ is the noise power threshold estimate of the noise signal array.
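The two-pole smoothing and the two limiting steps can be sketched per frequency bin as below. This is an illustrative reading only: the function names are hypothetical, and the branch condition compares the new input with the previous-frame estimate, which is an assumption made so that the recursion can be evaluated.

```python
def two_pole_smooth(phi_hat_prev, x, a_a=0.995, a_d=0.85):
    """Asymmetric recursive smoothing with separate rise and fall factors."""
    # Assumption: the comparison uses the previous-frame estimate.
    a = a_a if x >= phi_hat_prev else a_d
    return a * phi_hat_prev + (1.0 - a) * x


def onset_noise(phi_hat_1, theta):
    """Speech-onset state: cap the estimate at twice the threshold."""
    return min(phi_hat_1, 2.0 * theta)


def end_noise(phi_prev, phi_hat, theta, a_0=0.8):
    """Speech-end state: blend with the larger of the estimate and the threshold."""
    return a_0 * phi_prev + (1.0 - a_0) * max(phi_hat, theta)
```

With the reference values $a_a = 0.995$ and $a_d = 0.85$, the estimate rises slowly when the input power grows and falls quickly when it drops, which limits how much speech energy leaks into the noise estimate.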
In some embodiments, the power spectrum estimate of the optimal beam output signal is calculated with the following formula:
$$\phi_{\bar{y}}(k,\lambda) = a_0\,\phi_{\bar{y}}(k,\lambda-1) + (1-a_0)\,|\bar{y}(k,\lambda)|^2;$$
where $\phi_{\bar{y}}(k,\lambda)$ is the power spectrum estimate of the optimal beam output signal; $\bar{y}(k,\lambda)$ is the optimal beam output signal; and $a_0$ is the noise spectrum update parameter.
Further, the compromised noise power spectrum estimate $\phi_{\bar{v}}(k,\lambda)$ and the power spectrum estimate $\phi_{\bar{y}}(k,\lambda)$ of the optimal beam output signal are input to the postfilter for processing; a schematic of the speech signal processing in this embodiment is shown in Fig. 6. An inverse FFT is applied to the postfiltered signal, and the enhanced time-domain signal stream is then reconstructed by the overlap-add method.
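The final reconstruction step can be illustrated with a short overlap-add routine. This is only a sketch: the function name `overlap_add` is hypothetical, the frames are assumed to be already inverse-transformed to the time domain, and any synthesis window or gain normalisation, which the source does not specify, is left out.

```python
def overlap_add(frames, hop):
    """Rebuild a signal from equal-length, overlapping time-domain frames."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for idx, frame in enumerate(frames):
        start = idx * hop  # each frame is shifted by one hop
        for n, sample in enumerate(frame):
            out[start + n] += sample  # overlapping regions are summed
    return out
```

Each frame is placed `hop` samples after the previous one, and samples that fall in the overlap are added together.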
Specifically, for a speech sampling system with a sampling frequency of 16 kHz, the parameters in embodiments of the invention may take the following reference values:
$N = 6$; $L_{wnd} = 32$ ms; $L_{ovlp} = 24$ ms; $c = 340$ m/s; $f_s = 16000$ Hz; $a_0 = 0.8$; $a_R = 0.95$; $a_1 = 0.85$; $a_a = 0.995$; $a_d = 0.85$; $L_1 = 7$.
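The framing implied by these reference values can be worked out directly. The variable names below are hypothetical, and the bin count assumes the FFT size equals the frame length, which the source does not state.

```python
f_s = 16000        # sampling frequency in Hz
L_wnd_ms = 32      # analysis window length in ms
L_ovlp_ms = 24     # overlap between consecutive windows in ms

frame_len = f_s * L_wnd_ms // 1000   # samples per frame
overlap = f_s * L_ovlp_ms // 1000    # overlapping samples
hop = frame_len - overlap            # frame advance in samples
hop_ms = 1000 * hop // f_s           # frame advance in ms
# Assumption: FFT size equals frame_len, giving frame_len // 2 + 1
# non-negative frequency bins for a real-valued input.
num_bins = frame_len // 2 + 1

print(frame_len, overlap, hop, hop_ms, num_bins)
```

Under these assumptions each frame holds 512 samples, consecutive frames share 384 samples (75% overlap), and the frame index λ advances every 128 samples, that is, every 8 ms.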
In embodiments of the invention, a frequency-domain optimal superdirective beam is first designed according to the array topology and the sound source direction. A short-time Fourier transform is then applied to the raw speech array signal, the noise coherence matrix is estimated from the raw speech array signal, and the optimal superdirective beam parameters are applied to the transformed raw speech array signal to obtain an enhanced speech signal. At the same time, the noise correlation matrix is estimated dynamically to update the optimal adaptive filter parameters, and a postfilter finally improves the signal quality further. The invention needs only a small number of microphones to achieve high-quality far-field speech pickup, markedly suppresses complex noise outside the beam, and introduces almost no audible speech distortion.
As can be seen from the above, the microphone speech enhancement method provided by embodiments of the invention can accurately calculate the noise signal in the raw speech signal captured and input by the speech acquisition devices, so that the noise signal can be effectively suppressed during speech enhancement.
Further, the invention provides a microphone speech enhancement device, whose structure is shown in Fig. 5, comprising:
First acquisition module: configured to obtain the first array speech signal captured and input by multi-channel digital speech acquisition devices;
Optimal beam output signal calculation module: configured to calculate, according to the minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal;
First enhancement module: configured to perform single-channel speech enhancement using the power spectrum estimate of the optimal beam output signal;
The minimum variance adaptive beam optimization model of the first array speech signal includes the spatial steering vector from the target sound source to the multi-channel digital speech acquisition devices.
As can be seen from the above, the microphone speech enhancement device provided by embodiments of the invention uses the optimal beam output signal calculation module to process the first array speech signal captured and input by the multi-channel digital speech acquisition devices, and applies the minimum variance adaptive beam optimization model to calculate the optimal beam output signal of the first array speech signal; it can therefore be applied to microphone arrays with more array elements and larger element spacing.
Still referring to Fig. 5, in some embodiments the device further comprises:
Original signal acquisition module: configured to capture the raw speech array signals $y_1(n), \ldots, y_N(n)$ through the multi-channel digital speech acquisition devices;
Original signal transform module: configured to apply a short-time Fourier transform to the raw speech signals to obtain the time-frequency representation $y_1(k,\lambda), \ldots, y_N(k,\lambda)$ of the raw speech array signals;
Optimal superdirective beam processing module: configured to apply frequency-domain optimal superdirective beam processing to the time-frequency representation using the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$, to obtain the first array speech signal for $i = 1, \ldots, N$;
Here $n$ is the discrete-time variable; $N$ is the number of array elements; $k$ is the frequency bin index; and $\lambda$ is the short-time frame index.
In some embodiments, the optimal superdirective beam coefficients are set according to the arrangement of the multi-channel digital speech acquisition devices.
In some embodiments, when the optimal beam output signal calculation module calculates, according to the minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal, the following formula is used:
$$\bar{y}(k,\lambda) = \sum_{i=1}^{N} w_i^{*}\,a_i^{*}\,y_i(k,\lambda);$$
where $\bar{y}(k,\lambda)$ is the optimal beam output signal; $w_i^{*}$ is the adaptive filter parameter calculated from the noise signal column vector, the optimal superdirective beam coefficients, and the spatial steering vector from the target sound source to each digital speech acquisition device; $a_i^{*}$ is the complex conjugate of element $a_i$ of the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$; and $y_i(k,\lambda)$ is the first array speech signal.
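For a single time-frequency point, the beam output is a conjugate-weighted sum over the N channels. A minimal sketch, with a hypothetical function name and plain Python complex numbers standing in for the channel values:

```python
def beam_output(w, a, y):
    """Sum of conj(w_i) * conj(a_i) * y_i over all channels i."""
    return sum(wi.conjugate() * ai.conjugate() * yi
               for wi, ai, yi in zip(w, a, y))
```

In use, `w`, `a`, and `y` would each hold N complex values for one frequency bin k and one frame λ, and the function would be called once per bin.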
In some embodiments, the minimum variance adaptive beam optimization model of the first array speech signal is:
$$w(k) = \arg\min_{w(k)}\; w^H(k)\,R_{\tilde{v}}(k)\,w(k), \quad \text{subject to } w^H(k)\,\tilde{d}(k) = 1;$$
where the elements of $w(k)$ and $w_i^{*}$ are complex conjugates of each other; $w^H(k)$ is the conjugate transpose of $w(k)$; $R_{\tilde{v}}(k)$ is the noise coherence matrix estimated from the first array speech signal; and $\tilde{d}(k)$ is the spatial steering vector from the target sound source to the digital speech acquisition devices.
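A constrained minimisation of this form has a well-known closed-form solution obtained with a Lagrange multiplier, w = R^{-1} d / (d^H R^{-1} d), for a Hermitian, invertible R. The source states only the optimisation problem, so the sketch below is an assumption about how it could be solved; the function names and the small Gaussian elimination solver are hypothetical helpers.

```python
def solve(A, b):
    """Solve A x = b for a complex n x n system (partial pivoting)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def mvdr_weights(R, d):
    """Weights minimising w^H R w subject to w^H d = 1."""
    u = solve(R, d)  # u = R^{-1} d
    denom = sum(di.conjugate() * ui for di, ui in zip(d, u))  # d^H R^{-1} d
    return [ui / denom for ui in u]
```

A singular or badly conditioned R makes the solve step fail or become inaccurate; handling that case is outside this sketch.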
In some embodiments, when the optimal beam output signal calculation module calculates the optimal beam output signal synthesized from the first array speech signal, the spatial steering vector from the target sound source to the digital speech acquisition devices is calculated according to the following formula:
$$\tilde{d}(k) = \left[a_1^{*}\exp\left(jk\,\frac{d_1\cos(\theta)}{c}\,f_s\right), \ldots, a_N^{*}\exp\left(jk\,\frac{d_N\cos(\theta)}{c}\,f_s\right)\right]^T;$$
where $d_1, \ldots, d_N$ are the distances from the 1st to the $N$-th digital speech acquisition device to the center of the device array; $c$ is the speed of sound; $f_s$ is the sampling frequency; $\theta$ is the azimuth angle at which the target sound source arrives at the digital speech acquisition devices; and $a_i^{*}$ is the complex conjugate of element $a_i$ of the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$.
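The steering vector expression can be evaluated element by element as sketched below. This follows one reading of the formula, in which the phase of element i is k times (d_i cos θ / c) times f_s; that grouping of the fraction, the function name, and the argument layout are assumptions, and no additional normalisation of k is applied.

```python
import cmath
import math


def steering_vector(k, a, d, theta, c=340.0, f_s=16000.0):
    """Elements conj(a_i) * exp(j * k * (d_i * cos(theta) / c) * f_s)."""
    return [
        ai.conjugate() * cmath.exp(1j * k * (di * math.cos(theta) / c) * f_s)
        for ai, di in zip(a, d)
    ]
```

Here `a` holds the beam coefficients for bin k, `d` the element distances to the array center in metres, and `theta` the source azimuth in radians. Each exponential has unit magnitude, so only the phase of each coefficient changes with direction.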
Still referring to Fig. 5, in some embodiments the device further comprises:
VAD module: configured to perform voice activity detection (VAD) on the noise signal array in the multi-channel array speech input signal;
Noise power spectrum estimation module: configured to perform noise power spectrum estimation on the noise signal array according to the VAD result;
Second enhancement module: configured to perform a second enhancement of the optimal beam output signal according to the power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
Still referring to Fig. 5, in some embodiments the noise power spectrum estimation module comprises:
First noise power spectrum calculation unit: configured to calculate the noise power spectrum in the speech-present state, the speech-absent state, the speech-onset state, and the speech-end state;
Second noise power spectrum calculation unit: configured to apply compromise processing to the noise power spectrum in the speech-present state and the noise power spectrum in the speech-absent state to obtain the noise power spectrum estimate.
In some embodiments, the first noise power spectrum calculation unit specifically comprises:
Speech-absent state calculation subunit: configured to estimate, in the speech-absent state, the noise signal array power spectrum with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = a_1\,\phi_{\bar{v}}(k,\lambda-1) + (1-a_1)\,\phi_{\bar{y}}(k,\lambda);$$
Speech-onset and speech-present state calculation subunit: configured to estimate, in the speech-onset state and the speech-present state, the noise signal array power spectrum with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = \min\left(\hat{\phi}_{\bar{v}1}(k,\lambda),\; 2\,\theta_{\bar{v}}(k,\lambda)\right);$$
Speech-end state calculation subunit: configured to compute, in the speech-end state, a two-pole recursive smoothed estimate of the noise signal array power spectrum with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = a_0\,\phi_{\bar{v}2}(k,\lambda-1) + (1-a_0)\,\max\left(\hat{\phi}_{\bar{v}}(k,\lambda),\; \theta_{\bar{v}}(k,\lambda)\right);$$
In the above formulas,
$$\theta_{\bar{v}}(k,\lambda) = \frac{1}{2L_1+1}\sum_{m=k-L_1}^{k+L_1}\phi_{\bar{v}}(k,\lambda);$$
$$\hat{\phi}_{\bar{v}1}(k,\lambda) = \begin{cases} a_a\,\hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_a)\,\phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) \ge \hat{\phi}_{\bar{v}1}(k,\lambda) \\ a_d\,\hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_d)\,\phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) < \hat{\phi}_{\bar{v}1}(k,\lambda) \end{cases}$$
$$\hat{\phi}_{\bar{v}2}(k,\lambda) = \begin{cases} a_a\,\hat{\phi}_{\bar{v}2}(k,\lambda-1) + (1-a_a)\,|\bar{y}(k,\lambda)|^2, & \text{if } \phi_{\bar{y}}(k,\lambda) \ge \hat{\phi}_{\bar{v}2}(k,\lambda) \\ a_d\,\hat{\phi}_{\bar{v}2}(k,\lambda-1) + (1-a_d)\,|\bar{y}(k,\lambda)|^2, & \text{if } \phi_{\bar{y}}(k,\lambda) < \hat{\phi}_{\bar{v}2}(k,\lambda) \end{cases}$$
where $a_1$ is the noise spectrum update parameter, and $a_a$ and $a_d$ are smoothing factors.
In some embodiments, the power spectrum estimate of the optimal beam output signal is calculated with the following formula:
$$\phi_{\bar{y}}(k,\lambda) = a_0\,\phi_{\bar{y}}(k,\lambda-1) + (1-a_0)\,|\bar{y}(k,\lambda)|^2;$$
where $\phi_{\bar{y}}(k,\lambda)$ is the power spectrum estimate of the optimal beam output signal; $\bar{y}(k,\lambda)$ is the optimal beam output signal; and $a_0$ is the noise spectrum update parameter.
Further, the compromised noise power spectrum estimate $\phi_{\bar{v}}(k,\lambda)$ and the power spectrum estimate $\phi_{\bar{y}}(k,\lambda)$ of the optimal beam output signal are input to the postfilter for processing. An inverse FFT is applied to the postfiltered signal, and the enhanced time-domain signal stream is then reconstructed by the overlap-add method.
As can be seen from the above, the microphone speech enhancement device provided by embodiments of the invention can effectively estimate and process the noise signal in the first array speech signal captured and input by the multi-channel digital speech acquisition devices, which helps filter out the noise signal during subsequent speech enhancement and improves the enhancement result.
It should be understood that the embodiments described in this specification are intended only to illustrate and explain the invention, not to limit it. Where no conflict arises, the embodiments of this application and the features of those embodiments may be combined with one another.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims (20)

1. A microphone speech enhancement method, characterized by comprising the following steps:
Obtaining a first array speech signal captured and input by multi-channel digital speech acquisition devices;
Calculating, according to a minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal;
Performing single-channel speech enhancement using the power spectrum estimate of the optimal beam output signal;
Wherein the minimum variance adaptive beam optimization model of the first array speech signal includes the spatial steering vector from the target sound source to the multi-channel digital speech acquisition devices.
2. The method according to claim 1, characterized in that, before obtaining the first array speech signal captured and input by the multi-channel digital speech acquisition devices, the method further comprises:
Capturing the raw speech array signals $y_1(n), \ldots, y_N(n)$ through the multi-channel digital speech acquisition devices;
Applying a short-time Fourier transform to the raw speech signals to obtain the time-frequency representation $y_1(k,\lambda), \ldots, y_N(k,\lambda)$ of the raw speech array signals;
Applying frequency-domain optimal superdirective beam processing to the time-frequency representation using the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$, to obtain the first array speech signal for $i = 1, \ldots, N$;
Wherein $n$ is the discrete-time variable; $N$ is the number of array elements; $k$ is the frequency bin index; and $\lambda$ is the short-time frame index.
3. The method according to claim 2, characterized in that the optimal superdirective beam coefficients are set according to the arrangement of the multi-channel digital speech acquisition devices.
4. The method according to claim 1, characterized in that, when the optimal beam output signal synthesized from the first array speech signal is calculated according to the minimum variance adaptive beam optimization model of the first array speech signal, the following formula is used:
$$\bar{y}(k,\lambda) = \sum_{i=1}^{N} w_i^{*}\,a_i^{*}\,y_i(k,\lambda);$$
where $w_i^{*}$ is the adaptive filter parameter calculated from the noise signal column vector, the optimal superdirective beam coefficients, and the spatial steering vector from the target sound source to each digital speech acquisition device; $a_i^{*}$ is the complex conjugate of element $a_i$ of the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$; and $y_i(k,\lambda)$ is the first array speech signal.
5. The method according to claim 3, characterized in that the minimum variance adaptive beam optimization model of the first array speech signal is:
$$w(k) = \arg\min_{w(k)}\; w^H(k)\,R_{\tilde{v}}(k)\,w(k), \quad \text{subject to } w^H(k)\,\tilde{d}(k) = 1;$$
where the elements of $w(k)$ and $w_i^{*}$ are complex conjugates of each other; $w^H(k)$ is the conjugate transpose of $w(k)$; $R_{\tilde{v}}(k)$ is the noise coherence matrix estimated from the first array speech signal; and $\tilde{d}(k)$ is the spatial steering vector from the target sound source to the digital speech acquisition devices.
6. The method according to claim 5, characterized in that the spatial steering vector from the target sound source to the digital speech acquisition devices is calculated according to the following formula:
$$\tilde{d}(k) = \left[a_1^{*}\exp\left(jk\,\frac{d_1\cos(\theta)}{c}\,f_s\right), \ldots, a_N^{*}\exp\left(jk\,\frac{d_N\cos(\theta)}{c}\,f_s\right)\right]^T;$$
where $d_1, \ldots, d_N$ are the distances from the 1st to the $N$-th digital speech acquisition device to the center of the device array; $c$ is the speed of sound; $f_s$ is the sampling frequency; $\theta$ is the azimuth angle at which the target sound source arrives at the digital speech acquisition devices; and $a_i^{*}$ is the complex conjugate of element $a_i$ of the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$.
7. The method according to claim 1, characterized in that the method further comprises:
Performing voice activity detection (VAD) on the noise signal array in the multi-channel array speech input signal;
Performing noise power spectrum estimation on the noise signal array according to the VAD result;
Performing a second enhancement of the optimal beam output signal according to the power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
8. The method according to claim 7, characterized in that the step of performing noise power spectrum estimation on the noise signal array according to the VAD result comprises:
Calculating the noise power spectrum in the speech-present state, the speech-absent state, the speech-onset state, and the speech-end state;
Applying compromise processing to the noise power spectrum in the speech-present state and the noise power spectrum in the speech-absent state to obtain the noise power spectrum estimate.
9. The method according to claim 8, characterized in that the step of calculating the noise power spectrum in the speech-present state, the speech-absent state, the speech-onset state, and the speech-end state specifically comprises:
In the speech-absent state, estimating the noise signal array power spectrum with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = a_1\,\phi_{\bar{v}}(k,\lambda-1) + (1-a_1)\,\phi_{\bar{y}}(k,\lambda);$$
In the speech-onset state and the speech-present state, estimating the noise signal array power spectrum with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = \min\left(\hat{\phi}_{\bar{v}1}(k,\lambda),\; 2\,\theta_{\bar{v}}(k,\lambda)\right);$$
In the speech-end state, computing a two-pole recursive smoothed estimate of the noise signal array power spectrum with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = a_0\,\phi_{\bar{v}2}(k,\lambda-1) + (1-a_0)\,\max\left(\hat{\phi}_{\bar{v}}(k,\lambda),\; \theta_{\bar{v}}(k,\lambda)\right);$$
In the above formulas,
$$\theta_{\bar{v}}(k,\lambda) = \frac{1}{2L_1+1}\sum_{m=k-L_1}^{k+L_1}\phi_{\bar{v}}(k,\lambda);$$
$$\hat{\phi}_{\bar{v}1}(k,\lambda) = \begin{cases} a_a\,\hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_a)\,\phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) \ge \hat{\phi}_{\bar{v}1}(k,\lambda) \\ a_d\,\hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_d)\,\phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) < \hat{\phi}_{\bar{v}1}(k,\lambda) \end{cases}$$
where $a_1$ is the noise spectrum update parameter, and $a_a$ and $a_d$ are smoothing factors.
10. The method according to claim 1, characterized in that the power spectrum estimate of the optimal beam output signal is calculated with the following formula:
$$\phi_{\bar{y}}(k,\lambda) = a_0\,\phi_{\bar{y}}(k,\lambda-1) + (1-a_0)\,|\bar{y}(k,\lambda)|^2;$$
where $\phi_{\bar{y}}(k,\lambda)$ is the power spectrum estimate of the optimal beam output signal; $\bar{y}(k,\lambda)$ is the optimal beam output signal; and $a_0$ is the noise spectrum update parameter.
11. A microphone speech enhancement device, characterized by comprising:
First acquisition module: configured to obtain a first array speech signal captured and input by multi-channel digital speech acquisition devices;
Optimal beam output signal calculation module: configured to calculate, according to a minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal;
First enhancement module: configured to perform single-channel speech enhancement using the power spectrum estimate of the optimal beam output signal;
Wherein the minimum variance adaptive beam optimization model of the first array speech signal includes the spatial steering vector from the target sound source to the multi-channel digital speech acquisition devices.
12. The device according to claim 11, characterized in that the device further comprises:
Original signal acquisition module: configured to capture the raw speech array signals $y_1(n), \ldots, y_N(n)$ through the multi-channel digital speech acquisition devices;
Original signal transform module: configured to apply a short-time Fourier transform to the raw speech signals to obtain the time-frequency representation $y_1(k,\lambda), \ldots, y_N(k,\lambda)$ of the raw speech array signals;
Optimal superdirective beam processing module: configured to apply frequency-domain optimal superdirective beam processing to the time-frequency representation using the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$, to obtain the first array speech signal for $i = 1, \ldots, N$;
Wherein $n$ is the discrete-time variable; $N$ is the number of array elements; $k$ is the frequency bin index; and $\lambda$ is the short-time frame index.
13. The device according to claim 12, characterized in that the optimal superdirective beam coefficients are set according to the arrangement of the multi-channel digital speech acquisition devices.
14. The device according to claim 11, characterized in that, when the optimal beam output signal calculation module calculates, according to the minimum variance adaptive beam optimization model of the first array speech signal, the optimal beam output signal synthesized from the first array speech signal, the following formula is used:
$$\bar{y}(k,\lambda) = \sum_{i=1}^{N} w_i^{*}\,a_i^{*}\,y_i(k,\lambda);$$
where $\bar{y}(k,\lambda)$ is the optimal beam output signal; $w_i^{*}$ is the adaptive filter parameter calculated from the noise signal column vector, the optimal superdirective beam coefficients, and the spatial steering vector from the target sound source to each digital speech acquisition device; $a_i^{*}$ is the complex conjugate of element $a_i$ of the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$; and $y_i(k,\lambda)$ is the first array speech signal.
15. The device according to claim 13, characterized in that the minimum variance adaptive beam optimization model of the first array speech signal is:
$$w(k) = \arg\min_{w(k)}\; w^H(k)\,R_{\tilde{v}}(k)\,w(k), \quad \text{subject to } w^H(k)\,\tilde{d}(k) = 1;$$
where the elements of $w(k)$ and $w_i^{*}$ are complex conjugates of each other; $w^H(k)$ is the conjugate transpose of $w(k)$; $R_{\tilde{v}}(k)$ is the noise coherence matrix estimated from the first array speech signal; and $\tilde{d}(k)$ is the spatial steering vector from the target sound source to the digital speech acquisition devices.
16. The device according to claim 15, characterized in that, when the optimal beam output signal calculation module calculates the optimal beam output signal synthesized from the first array speech signal, the spatial steering vector from the target sound source to the digital speech acquisition devices is calculated according to the following formula:
$$\tilde{d}(k) = \left[a_1^{*}\exp\left(jk\,\frac{d_1\cos(\theta)}{c}\,f_s\right), \ldots, a_N^{*}\exp\left(jk\,\frac{d_N\cos(\theta)}{c}\,f_s\right)\right]^T;$$
where $d_1, \ldots, d_N$ are the distances from the 1st to the $N$-th digital speech acquisition device to the center of the device array; $c$ is the speed of sound; $f_s$ is the sampling frequency; $\theta$ is the azimuth angle at which the target sound source arrives at the digital speech acquisition devices; and $a_i^{*}$ is the complex conjugate of element $a_i$ of the optimal superdirective beam coefficients $A(k) = [a_1(k), \ldots, a_N(k)]^T$.
17. The device according to claim 11, characterized by further comprising:
VAD module: configured to perform voice activity detection (VAD) on the noise signal array in the multi-channel array speech input signal;
Noise power spectrum estimation module: configured to perform noise power spectrum estimation on the noise signal array according to the VAD result;
Second enhancement module: configured to perform a second enhancement of the optimal beam output signal according to the power spectrum estimate of the optimal beam output signal and the noise power spectrum estimate.
18. The device according to claim 17, characterized in that the noise power spectrum estimation module comprises:
First noise power spectrum calculation unit: configured to calculate the noise power spectrum in the speech-present state, the speech-absent state, the speech-onset state, and the speech-end state;
Second noise power spectrum calculation unit: configured to apply compromise processing to the noise power spectrum in the speech-present state and the noise power spectrum in the speech-absent state to obtain the noise power spectrum estimate.
19. The device according to claim 18, characterized in that the first noise power spectrum calculation unit specifically comprises:
Speech-absent state calculation subunit: configured to estimate, in the speech-absent state, the noise signal array power spectrum with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = a_1\,\phi_{\bar{v}}(k,\lambda-1) + (1-a_1)\,\phi_{\bar{y}}(k,\lambda);$$
Speech-onset and speech-present state calculation subunit: configured to estimate, in the speech-onset state and the speech-present state, the noise signal array power spectrum with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = \min\left(\hat{\phi}_{\bar{v}1}(k,\lambda),\; 2\,\theta_{\bar{v}}(k,\lambda)\right);$$
Speech-end state calculation subunit: configured to compute, in the speech-end state, a two-pole recursive smoothed estimate of the noise signal array power spectrum with the following formula:
$$\phi_{\bar{v}}(k,\lambda) = a_0\,\phi_{\bar{v}2}(k,\lambda-1) + (1-a_0)\,\max\left(\hat{\phi}_{\bar{v}}(k,\lambda),\; \theta_{\bar{v}}(k,\lambda)\right);$$
In the above formulas,
$$\theta_{\bar{v}}(k,\lambda) = \frac{1}{2L_1+1}\sum_{m=k-L_1}^{k+L_1}\phi_{\bar{v}}(k,\lambda);$$
$$\hat{\phi}_{\bar{v}1}(k,\lambda) = \begin{cases} a_a\,\hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_a)\,\phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) \ge \hat{\phi}_{\bar{v}1}(k,\lambda) \\ a_d\,\hat{\phi}_{\bar{v}1}(k,\lambda-1) + (1-a_d)\,\phi_{\bar{y}}(k,\lambda), & \text{if } \phi_{\bar{y}}(k,\lambda) < \hat{\phi}_{\bar{v}1}(k,\lambda) \end{cases}$$
$$\hat{\phi}_{\bar{v}2}(k,\lambda) = \begin{cases} a_a\,\hat{\phi}_{\bar{v}2}(k,\lambda-1) + (1-a_a)\,|\bar{y}(k,\lambda)|^2, & \text{if } \phi_{\bar{y}}(k,\lambda) \ge \hat{\phi}_{\bar{v}2}(k,\lambda) \\ a_d\,\hat{\phi}_{\bar{v}2}(k,\lambda-1) + (1-a_d)\,|\bar{y}(k,\lambda)|^2, & \text{if } \phi_{\bar{y}}(k,\lambda) < \hat{\phi}_{\bar{v}2}(k,\lambda) \end{cases}$$
where $a_1$ is the noise spectrum update parameter, and $a_a$ and $a_d$ are smoothing factors.
20. The device according to claim 11, characterized in that the power spectrum estimate of the optimal beam output signal is calculated with the following formula:
$$\phi_{\bar{y}}(k,\lambda) = a_0\,\phi_{\bar{y}}(k,\lambda-1) + (1-a_0)\,|\bar{y}(k,\lambda)|^2;$$
where $\phi_{\bar{y}}(k,\lambda)$ is the power spectrum estimate of the optimal beam output signal; $\bar{y}(k,\lambda)$ is the optimal beam output signal; and $a_0$ is the noise spectrum update parameter.
CN201410305776.4A 2014-06-27 2014-06-27 Microphone speech enhancement method and microphone speech enhancement device Withdrawn CN105244036A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410305776.4A CN105244036A (en) 2014-06-27 2014-06-27 Microphone speech enhancement method and microphone speech enhancement device
PCT/CN2014/092217 WO2015196729A1 (en) 2014-06-27 2014-11-25 Microphone array speech enhancement method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410305776.4A CN105244036A (en) 2014-06-27 2014-06-27 Microphone speech enhancement method and microphone speech enhancement device

Publications (1)

Publication Number Publication Date
CN105244036A true CN105244036A (en) 2016-01-13

Family

ID=54936653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410305776.4A Withdrawn CN105244036A (en) 2014-06-27 2014-06-27 Microphone speech enhancement method and microphone speech enhancement device

Country Status (2)

Country Link
CN (1) CN105244036A (en)
WO (1) WO2015196729A1 (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869651A (en) * 2016-03-23 2016-08-17 北京大学深圳研究生院 Two-channel beam forming speech enhancement method based on noise mixed coherence
CN106448693A (en) * 2016-09-05 2017-02-22 华为技术有限公司 Speech signal processing method and apparatus
CN107018470A (en) * 2016-01-28 2017-08-04 讯飞智元信息科技有限公司 A kind of voice recording method and system based on annular microphone array
CN107785029A (en) * 2017-10-23 2018-03-09 科大讯飞股份有限公司 Target voice detection method and device
CN108417208A (en) * 2018-03-26 2018-08-17 宇龙计算机通信科技(深圳)有限公司 A kind of pronunciation inputting method and device
CN109346100A (en) * 2018-10-25 2019-02-15 烟台市奥境数字科技有限公司 A kind of network transfer method of Digital Media interactive instructional system
CN110890100A (en) * 2018-09-10 2020-03-17 杭州海康威视数字技术股份有限公司 Voice enhancement method, multimedia data acquisition method, multimedia data playing method, device and monitoring system
CN110970046A (en) * 2019-11-29 2020-04-07 北京搜狗科技发展有限公司 Audio data processing method and device, electronic equipment and storage medium
CN113053406A (en) * 2021-05-08 2021-06-29 北京小米移动软件有限公司 Sound signal identification method and device
CN113645542A (en) * 2020-05-11 2021-11-12 阿里巴巴集团控股有限公司 Voice signal processing method and system and audio and video communication equipment

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106125056B (en) * 2016-06-13 2018-07-06 西安电子科技大学 Minimum variance Power estimation method based on modifying factor
CN106371079B (en) * 2016-08-19 2018-11-20 西安电子科技大学 The multiple signal classification Power estimation method sharpened based on spectrum
CN109389991A (en) * 2018-10-24 2019-02-26 中国科学院上海微系统与信息技术研究所 A kind of signal enhancing method based on microphone array
CN109884591B (en) * 2019-02-25 2023-04-28 南京理工大学 Microphone array-based multi-rotor unmanned aerial vehicle acoustic signal enhancement method
CN112216295B (en) * 2019-06-25 2024-04-26 大众问问(北京)信息科技有限公司 Sound source positioning method, device and equipment
CN110415720B (en) * 2019-07-11 2020-05-12 湖北工业大学 Quaternary differential microphone array super-directivity frequency-invariant beam forming method
CN110444220B (en) * 2019-08-01 2023-02-10 浙江大学 Multi-mode remote voice perception method and device
CN111880146B (en) * 2020-06-30 2023-08-18 海尔优家智能科技(北京)有限公司 Sound source orientation method and device and storage medium
CN111866665B (en) * 2020-07-22 2022-01-28 海尔优家智能科技(北京)有限公司 Microphone array beam forming method and device
CN112712818A (en) * 2020-12-29 2021-04-27 苏州科达科技股份有限公司 Voice enhancement method, device and equipment
CN113030862B (en) * 2021-03-12 2023-06-02 中国科学院声学研究所 Multichannel voice enhancement method and device
CN113223552B (en) * 2021-04-28 2023-06-13 锐迪科微电子(上海)有限公司 Speech enhancement method, device, apparatus, storage medium, and program
CN113329288B (en) * 2021-04-29 2022-07-19 开放智能技术(南京)有限公司 Bluetooth headset noise reduction method based on notch technology
CN113628634B (en) * 2021-08-20 2023-10-03 随锐科技集团股份有限公司 Real-time voice separation method and device guided by directional information
CN114913868B (en) * 2022-05-17 2023-05-05 电子科技大学 Acoustic array directional pickup method based on FPGA

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1666495A (en) * 2002-07-01 2005-09-07 皇家飞利浦电子股份有限公司 Stationary spectral power dependent audio enhancement system
CN101238511A (en) * 2005-08-11 2008-08-06 旭化成株式会社 Sound source separating device, speech recognizing device, portable telephone, and sound source separating method, and program
CN101763858A (en) * 2009-10-19 2010-06-30 瑞声声学科技(深圳)有限公司 Method for processing double-microphone signal
CN101778322A (en) * 2009-12-07 2010-07-14 中国科学院自动化研究所 Microphone array postfiltering sound enhancement method based on multi-models and hearing characteristic
US20110231185A1 (en) * 2008-06-09 2011-09-22 Kleffner Matthew D Method and apparatus for blind signal recovery in noisy, reverberant environments
CN102930870A (en) * 2012-09-27 2013-02-13 福州大学 Bird voice recognition method using anti-noise power normalization cepstrum coefficients (APNCC)
CN103262163A (en) * 2010-10-25 2013-08-21 弗兰霍菲尔运输应用研究公司 Echo suppression comprising modeling of late reverberation components
CN103308889A (en) * 2013-05-13 2013-09-18 辽宁工业大学 Passive sound source two-dimensional DOA (direction of arrival) estimation method under complex environment
CN103856866A (en) * 2012-12-04 2014-06-11 西北工业大学 Low-noise differential microphone array

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1212608C (en) * 2003-09-12 2005-07-27 中国科学院声学研究所 A multichannel speech enhancement method using postfilter
US7783478B2 (en) * 2007-01-03 2010-08-24 Alexander Goldin Two stage frequency subband decomposition
CN101447190A (en) * 2008-06-25 2009-06-03 北京大学深圳研究生院 Voice enhancement method employing combination of nesting-subarray-based post filtering and spectrum-subtraction
CN102509552B (en) * 2011-10-21 2013-09-11 浙江大学 Method for enhancing microphone array voice based on combined inhibition
JP2013201525A (en) * 2012-03-23 2013-10-03 Mitsubishi Electric Corp Beam forming processing unit
CN103235959B (en) * 2013-04-01 2016-08-24 深圳市远望谷信息技术股份有限公司 The method that aerial array output forms digital beam is made in read write line

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107018470A (en) * 2016-01-28 2017-08-04 讯飞智元信息科技有限公司 A kind of voice recording method and system based on annular microphone array
CN105869651B (en) * 2016-03-23 2019-05-31 北京大学深圳研究生院 Binary channels Wave beam forming sound enhancement method based on noise mixing coherence
CN105869651A (en) * 2016-03-23 2016-08-17 北京大学深圳研究生院 Two-channel beam forming speech enhancement method based on noise mixed coherence
CN106448693A (en) * 2016-09-05 2017-02-22 华为技术有限公司 Speech signal processing method and apparatus
CN106448693B (en) * 2016-09-05 2019-11-29 华为技术有限公司 A kind of audio signal processing method and device
CN107785029A (en) * 2017-10-23 2018-03-09 科大讯飞股份有限公司 Target voice detection method and device
US11308974B2 (en) 2017-10-23 2022-04-19 Iflytek Co., Ltd. Target voice detection method and apparatus
CN107785029B (en) * 2017-10-23 2021-01-29 科大讯飞股份有限公司 Target voice detection method and device
CN108417208B (en) * 2018-03-26 2020-09-11 宇龙计算机通信科技(深圳)有限公司 Voice input method and device
CN108417208A (en) * 2018-03-26 2018-08-17 宇龙计算机通信科技(深圳)有限公司 A kind of pronunciation inputting method and device
CN110890100A (en) * 2018-09-10 2020-03-17 杭州海康威视数字技术股份有限公司 Voice enhancement method, multimedia data acquisition method, multimedia data playing method, device and monitoring system
CN110890100B (en) * 2018-09-10 2022-11-18 杭州海康威视数字技术股份有限公司 Voice enhancement method, multimedia data acquisition method, multimedia data playing method, device and monitoring system
CN109346100A (en) * 2018-10-25 2019-02-15 烟台市奥境数字科技有限公司 A kind of network transfer method of Digital Media interactive instructional system
CN110970046A (en) * 2019-11-29 2020-04-07 北京搜狗科技发展有限公司 Audio data processing method and device, electronic equipment and storage medium
CN110970046B (en) * 2019-11-29 2022-03-11 北京搜狗科技发展有限公司 Audio data processing method and device, electronic equipment and storage medium
CN113645542A (en) * 2020-05-11 2021-11-12 阿里巴巴集团控股有限公司 Voice signal processing method and system and audio and video communication equipment
CN113053406A (en) * 2021-05-08 2021-06-29 北京小米移动软件有限公司 Sound signal identification method and device

Also Published As

Publication number Publication date
WO2015196729A1 (en) 2015-12-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20160113
