CN105976826A - Speech noise reduction method applied to dual-microphone small handheld device - Google Patents

Speech noise reduction method applied to dual-microphone small handheld device

Info

Publication number
CN105976826A
Authority
CN
China
Prior art keywords
microphone
noise
voice
signal
secondary
Prior art date
Application number
CN201610286545.2A
Other languages
Chinese (zh)
Other versions
CN105976826B (en)
Inventor
叶中付
鲍光照
罗友
Original Assignee
中国科学技术大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学技术大学
Priority to CN201610286545.2A
Publication of CN105976826A
Application granted
Publication of CN105976826B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being power information
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal

Abstract

The invention discloses a speech noise reduction method applied to a dual-microphone small handheld device. The noise reduction method is selected according to the mode in which the user operates the device. In the normal handheld call mode, a method based on the power level difference fits the near-field signal model of that mode and effectively exploits the large difference in speech energy received by the primary and secondary microphones. In the speakerphone mode, a method based on the coherence between the two microphones requires no estimate of the noise power spectrum and avoids the impact that inconsistent noise at the two microphones has on the power-level-difference method. In addition, compared with the traditional power-level-difference method, the constructed Wiener filter effectively avoids the musical noise caused by errors in noise power spectrum estimation.

Description

Speech noise reduction method applied to a dual-microphone small handheld device

Technical field

The present invention relates to the technical field of speech noise reduction, and in particular to a speech noise reduction method applied to dual-microphone small handheld devices.

Background art

Existing small handheld devices commonly use two or three microphones. In the prior art, the primary microphone receives the desired speech plus background noise, the secondary microphone receives background noise, and the background noise received by the two microphones is assumed to be identical, so the difference between the two microphone signals is output as the desired signal.

This approach produces musical noise when the background noise at the two microphones is inconsistent, and causes speech distortion when the secondary microphone also picks up the desired speech. In addition, two-channel filtering techniques based on the phase difference (time-delay estimation) still leave residual musical noise, and estimating the source direction from the phase difference is only feasible under far-field conditions, so such techniques are unsuitable for the handheld near-field environment.

Summary of the invention

The object of the invention is to provide a speech noise reduction method applied to dual-microphone small handheld devices, which extracts the desired speech with as little distortion as possible while removing the background noise.

The object of the invention is achieved through the following technical solution:

A speech noise reduction method applied to a dual-microphone small handheld device, characterized by comprising:

receiving the speech signals of the two microphones and determining the call mode;

if the current mode is the handheld call mode, obtaining the noise-reduced speech signal with a noise reduction method based on the power level difference between the two microphones;

if the current mode is the speakerphone call mode, obtaining the noise-reduced speech signal with a noise reduction method based on the coherence between the two microphones.

2. The method according to claim 1, characterized in that said using the noise reduction method based on the dual-microphone power level difference to obtain the noise-reduced speech signal comprises:

The signals received by the primary microphone and the secondary microphone are:

$x_1(m)=h_1(m)*s(m)+n_1(m);$

$x_2(m)=h_2(m)*s(m)+n_2(m);$

where m is the sample index, $x_1(m)$ is the signal received by the primary microphone and $x_2(m)$ the signal received by the secondary microphone; $h_i(m)$ is the impulse response of the acoustic propagation model and $n_i(m)$ the noise, $i=1,2$; $s(m)$ is the target speech and $*$ denotes convolution;

Applying the short-time Fourier transform to the signals received by the primary and secondary microphones gives:

$X_1(n,k)=H_1(n,k)S(n,k)+N_1(n,k);$

$X_2(n,k)=H_2(n,k)S(n,k)+N_2(n,k);$

where n and k denote the time frame and the frequency bin, respectively; the two equations are rewritten as:

$X_1(n,k)=S_1(n,k)+N_1(n,k);$

$X_2(n,k)=H_{12}(n,k)S_1(n,k)+N_2(n,k);$

where $S_1(n,k)$ denotes $H_1(n,k)S(n,k)$ and $H_{12}(n,k)=H_2(n,k)/H_1(n,k)$. From the short-time Fourier transform results, the power spectral densities (PSD) of the noisy signals at the primary and secondary microphones are:

$P_{X_1}(n,k)=P_{S_1}(n,k)+P_{N_1}(n,k);$

$P_{X_2}(n,k)=|H_{12}(n,k)|^2 P_{S_1}(n,k)+P_{N_2}(n,k);$

Subtracting $P_{X_2}(n,k)$ from $P_{X_1}(n,k)$ gives:

$P_{X_1}(n,k)-P_{X_2}(n,k)=\left(1-|H_{12}(n,k)|^2\right)P_{S_1}(n,k)+P_{N_1}(n,k)-P_{N_2}(n,k);$

Let $\Delta P_X(n,k)=P_{X_1}(n,k)-P_{X_2}(n,k)$ and $\Delta P_N(n,k)=P_{N_1}(n,k)-P_{N_2}(n,k)$, where $\Delta P_N\approx 0$; then:

$|\Delta P_X(n,k)|=\left|1-|H_{12}(n,k)|^2\right|P_{S_1}(n,k);$

A Wiener filter $G_{\Delta P}(n,k)$ can be constructed from the estimated speech PSD and noise PSD; using the estimated speech PSD $\hat{P}_{S_1}(n,k)$ and noise PSD $\hat{P}_{N_1}(n,k)$ at the primary microphone:

$G_{\Delta P}(n,k)=\frac{\hat{P}_{S_1}(n,k)}{\hat{P}_{S_1}(n,k)+\hat{P}_{N_1}(n,k)};$

where the subscript $\Delta P$ of $G_{\Delta P}(n,k)$ indicates that the filter is obtained from the power level difference; substituting the expression for $|\Delta P_X(n,k)|$ into the above gives:

$G_{\Delta P}(n,k)=\frac{|\Delta P_X(n,k)|}{|\Delta P_X(n,k)|+\left|1-|H_{12}(n,k)|^2\right|\hat{P}_{N_1}(n,k)};$

Adding a free parameter $\alpha$ to the above yields:

$G_{\Delta P}(n,k)=\frac{|\Delta P_X(n,k)|}{|\Delta P_X(n,k)|+\alpha\left|1-|H_{12}(n,k)|^2\right|\hat{P}_{N_1}(n,k)};$

where the noise PSD at the primary microphone is computed from the first T noise-only frames of the signal as follows:

$\hat{P}_{N_1}(n,k)=\lambda_N\hat{P}_{N_1}(n-1,k)+(1-\lambda_N)|X_1(n,k)|^2,\quad n<T;$

where $\lambda_N$ is the noise forgetting factor and $X_1(n,k)$ is the time-frequency value of the signal received by the primary microphone;

The cross power spectral density (CPSD) of the noisy signals at the primary and secondary microphones is:

$P_{X_1X_2}(n,k)=H_{12}(n,k)P_{S_1}(n,k)+P_{N_1N_2}(n,k);$

where $P_{N_1N_2}(n,k)$ is the CPSD of the noise received by the two microphones, estimated by:

$\hat{P}_{N_1N_2}(n,k)=\lambda_N\hat{P}_{N_1N_2}(n-1,k)+(1-\lambda_N)|X_1(n,k)X_2(n,k)|,\quad n<T;$

The speech PSD is estimated as the difference between the noisy PSD at the primary microphone and the estimated noise PSD:

$\hat{P}_{S_1}(n,k)=\hat{P}_{X_1}(n,k)-\hat{P}_{N_1}(n,k);$

which yields the estimated transfer function $\hat{H}_{12}(n,k)$:

$\hat{H}_{12}(n,k)=\frac{\hat{P}_{X_1X_2}(n,k)-\hat{P}_{N_1N_2}(n,k)}{\hat{P}_{X_1}(n,k)-\hat{P}_{N_1}(n,k)};$

The PSDs and CPSD of the noisy signals at the two microphones, i.e. $\hat{P}_{X_1}(n,k)$, $\hat{P}_{X_2}(n,k)$ and $\hat{P}_{X_1X_2}(n,k)$, are estimated with the following recursive averaging:

$\hat{P}_{X_i}(n,k)=\lambda_X\hat{P}_{X_i}(n-1,k)+(1-\lambda_X)|X_i(n,k)|^2,\quad i=1,2;$

$\hat{P}_{X_1X_2}(n,k)=\lambda_X\hat{P}_{X_1X_2}(n-1,k)+(1-\lambda_X)X_1(n,k)X_2(n,k);$

where $\lambda_X$ is the noisy-speech forgetting factor. The short-time-Fourier-transformed speech signal is multiplied by the Wiener filter, followed by an inverse discrete Fourier transform and overlap-add, giving the noise-reduced time-domain speech signal.

3. The method according to claim 1, characterized in that said using the noise reduction method based on the coherence between the two microphones to obtain the noise-reduced speech signal comprises:

The signals received by the primary microphone and the secondary microphone are:

$x_i(m)=s_i(m)+n_i(m),\quad i=1,2;$

where m is the sample index, $x_1(m)$ is the signal received by the primary microphone and $x_2(m)$ the signal received by the secondary microphone; $n_i(m)$ is the noise and $s_i(m)$ the target speech;

Applying the short-time Fourier transform gives:

$X_i(n,k)=S_i(n,k)+N_i(n,k),\quad i=1,2;$

where n and k denote the time frame and the frequency bin, respectively;

The coherence function of the signals received by the primary and secondary microphones is defined as:

$\Gamma_{x_1x_2}(n,k)=\frac{P_{X_1X_2}(n,k)}{\sqrt{P_{X_1}(n,k)P_{X_2}(n,k)}};$

where $P_{X_1}(n,k)$ and $P_{X_2}(n,k)$ are the power spectral densities (PSD) of the noisy signals at the primary and secondary microphones, and $P_{X_1X_2}(n,k)$ is the cross-spectral density of the two noisy signals;

The coherence function is related to the local SNRs $\mathrm{SNR}_1$ and $\mathrm{SNR}_2$ at the primary and secondary microphones by:

$\Gamma_{x_1x_2}=\Gamma_{s_1s_2}\sqrt{\frac{\mathrm{SNR}_1}{1+\mathrm{SNR}_1}\cdot\frac{\mathrm{SNR}_2}{1+\mathrm{SNR}_2}}+\Gamma_{n_1n_2}\sqrt{\frac{1}{1+\mathrm{SNR}_1}\cdot\frac{1}{1+\mathrm{SNR}_2}};$

where $\Gamma_{s_1s_2}$ and $\Gamma_{n_1n_2}$ are the coherence functions of the target speech and of the noise received by the two microphones, respectively. Let $\hat{G}=\frac{\mathrm{SNR}}{1+\mathrm{SNR}}$, the local SNR being taken equal at the two microphones; then the above is rewritten as:

$\Gamma_{x_1x_2}\approx\Gamma_{s_1s_2}\hat{G}+\Gamma_{n_1n_2}(1-\hat{G});$

In the speakerphone call mode, the target speech source is assumed to be directly in front of the two microphones, and the midpoint between the two microphone positions is taken as the array reference point, so the target speech direction is 0° and the background noise source is equivalent to a wave incident from direction θ. The coherence between the signals $U_1$ and $U_2$ received by the primary and secondary microphones is then:

$\Gamma_{u_1u_2}(\omega)=e^{j\omega f_s(d/c)\sin\theta};$

where $f_s$ is the sampling rate, d is the spacing between the primary and secondary microphones, and c is the speed of sound; $U_1$ and $U_2$ denote the speech signal or the noise signal received by the primary and secondary microphones;

Combining the two expressions above gives:

$\Gamma_{x_1x_2}\approx\left(\cos(\omega\tau\sin 0)+j\sin(\omega\tau\sin 0)\right)\hat{G}+\left(\cos(\omega\tau\sin\theta)+j\sin(\omega\tau\sin\theta)\right)(1-\hat{G});$

where $\tau=f_s(d/c)$. Taking the real and imaginary parts separately:

$R=\hat{G}+(1-\hat{G})\cos\alpha;$

$I=(1-\hat{G})\sin\alpha;$

where $\alpha=\omega\tau\sin\theta$; rearranging the real-part and imaginary-part expressions gives:

$\hat{G}=\frac{R-\cos\alpha}{1-\cos\alpha};$

$\hat{G}=\frac{\sin\alpha-I}{\sin\alpha};$

Since the two expressions are equal:

$I\cos\alpha=(R-1)\sin\alpha+I;$

Using $\cos^2\alpha+\sin^2\alpha=1$:

$\left(I^2+(1-R)^2\right)\sin^2\alpha+2I(R-1)\sin\alpha=0;$

whose roots are:

$\sin\alpha=\frac{(1-R)I\pm\sqrt{(1-R)^2I^2}}{(1-R)^2+I^2};$

The root $\sin\alpha=0$ is trivial and is discarded, so:

$\sin\alpha=\frac{2(1-R)I}{(1-R)^2+I^2};$

With $\sin\alpha$ known, one obtains:

$\hat{G}=\frac{1-R^2-I^2}{2(1-R)};$

from which the unconstrained Wiener filter $G_{coh}(n,k)$ is built:

$G_{coh}(n,k)=\hat{G}(n,k);$

The short-time-Fourier-transformed speech signal is multiplied by the Wiener filter, followed by an inverse discrete Fourier transform and overlap-add, giving the noise-reduced time-domain speech signal.

As can be seen from the technical solution provided by the invention above, the corresponding noise reduction method is selected according to the mode in which the user operates the device. In the normal handheld call mode, the method based on the power level difference fits the near-field signal model of that mode and effectively exploits the large difference in speech energy received by the primary and secondary microphones. In the speakerphone mode, the method based on the coherence between the two microphones requires no estimate of the noise power spectrum and avoids the impact that inconsistent noise at the two microphones has on the power-level-difference method. In addition, compared with the traditional method based on the power level difference, the constructed Wiener filter effectively avoids the musical noise caused by errors in noise power spectrum estimation.

Brief description of the drawings

In order to explain the technical solution of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is a flow chart of a speech noise reduction method applied to a dual-microphone small handheld device provided by an embodiment of the present invention;

Fig. 2 is a flow chart of the speech noise reduction method based on the dual-microphone power level difference provided by an embodiment of the present invention;

Fig. 3 is a flow chart of the speech noise reduction method based on the coherence between the two microphones provided by an embodiment of the present invention;

Fig. 4 is the signal model of the speech noise reduction technique based on the coherence between the two microphones provided by an embodiment of the present invention.

Detailed description of the invention

The technical solution in the embodiments of the present invention is described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

As shown in Fig. 1, the flow chart of a speech noise reduction method applied to a dual-microphone small handheld device provided by an embodiment of the present invention mainly comprises the following steps:

Step 11: receive the speech signals of the two microphones and determine the call mode;

Step 12: if the current mode is the handheld call mode, obtain the noise-reduced speech signal with the noise reduction method based on the dual-microphone power level difference;

Step 13: if the current mode is the speakerphone call mode, obtain the noise-reduced speech signal with the noise reduction method based on the coherence between the two microphones.

In the embodiment of the present invention, the corresponding noise reduction method is selected according to the mode in which the user operates the device. In the normal handheld call mode, the method based on the power level difference fits the near-field signal model of that mode and effectively exploits the large difference in speech energy received by the primary and secondary microphones. In the speakerphone mode, the method based on the coherence between the two microphones requires no estimate of the noise power spectrum and avoids the impact that inconsistent noise at the two microphones has on the power-level-difference method. The two branches can be organized as the simple dispatch sketched below.
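As a rough illustration only (not part of the patent text; the names select_denoiser, pld_denoise and coherence_denoise are placeholders), the mode-dependent selection can be written as a small Python dispatch that receives the two branch routines and the detected call mode:

    def select_denoiser(mode, pld_denoise, coherence_denoise):
        # Map the detected call mode to the matching noise-reduction routine.
        branches = {
            "handheld": pld_denoise,           # near field: power-level-difference method
            "speakerphone": coherence_denoise  # far field: inter-microphone coherence method
        }
        if mode not in branches:
            raise ValueError("unknown call mode: %r" % mode)
        return branches[mode]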

The noise reduction method based on the dual-microphone power level difference and the noise reduction method based on the coherence between the two microphones are described in detail below.

1. Noise reduction method based on the dual-microphone power level difference.

In the normal handheld call mode, it is assumed by default that the two microphones and the speaker's face form a near-field environment. In this environment the sound wave model is a spherical wave, so traditional phase-based noise reduction techniques do not apply; using the power level difference (PLD) avoids this problem.

The specific processing flow is shown in Fig. 2 and mainly comprises the following steps.

The signals received by the primary microphone and the secondary microphone are:

$x_1(m)=h_1(m)*s(m)+n_1(m);$

$x_2(m)=h_2(m)*s(m)+n_2(m);$

where m is the sample index, $x_1(m)$ is the signal received by the primary microphone and $x_2(m)$ the signal received by the secondary microphone; $h_i(m)$ is the impulse response of the acoustic propagation model and $n_i(m)$ the noise, $i=1,2$; $s(m)$ is the target speech and $*$ denotes convolution.

The signals received by the primary and secondary microphones are transformed to the frequency domain with the short-time Fourier transform, giving:

$X_1(n,k)=H_1(n,k)S(n,k)+N_1(n,k);$

$X_2(n,k)=H_2(n,k)S(n,k)+N_2(n,k);$

where n and k denote the time frame and the frequency bin, respectively. Without loss of generality, the two equations are rewritten as:

$X_1(n,k)=S_1(n,k)+N_1(n,k);$

$X_2(n,k)=H_{12}(n,k)S_1(n,k)+N_2(n,k);$

where $S_1(n,k)$ denotes $H_1(n,k)S(n,k)$ and $H_{12}(n,k)=H_2(n,k)/H_1(n,k)$. From the short-time Fourier transform results, the power spectral densities (PSD) of the noisy signals at the primary and secondary microphones are:

$P_{X_1}(n,k)=P_{S_1}(n,k)+P_{N_1}(n,k);$

$P_{X_2}(n,k)=|H_{12}(n,k)|^2 P_{S_1}(n,k)+P_{N_2}(n,k);$

where $P_{S_1}(n,k)$ is the PSD of the speech signal received by the primary microphone, $P_{N_1}(n,k)$ is the PSD of the noise received by the primary microphone, and $P_{N_2}(n,k)$ is the PSD of the noise received by the secondary microphone.

Subtracting $P_{X_2}(n,k)$ from $P_{X_1}(n,k)$ gives:

$P_{X_1}(n,k)-P_{X_2}(n,k)=\left(1-|H_{12}(n,k)|^2\right)P_{S_1}(n,k)+P_{N_1}(n,k)-P_{N_2}(n,k);$

Let $\Delta P_X(n,k)=P_{X_1}(n,k)-P_{X_2}(n,k)$ and $\Delta P_N(n,k)=P_{N_1}(n,k)-P_{N_2}(n,k)$. Since the background noise received by the two microphones is generally assumed to differ little, $\Delta P_N$ is negligible, i.e. $\Delta P_N\approx 0$; then:

$|\Delta P_X(n,k)|=\left|1-|H_{12}(n,k)|^2\right|P_{S_1}(n,k).$

A Wiener filter $G_{\Delta P}(n,k)$ can be constructed from the estimated speech PSD and noise PSD; here the estimated speech PSD $\hat{P}_{S_1}(n,k)$ and noise PSD $\hat{P}_{N_1}(n,k)$ at the primary microphone are used:

$G_{\Delta P}(n,k)=\frac{\hat{P}_{S_1}(n,k)}{\hat{P}_{S_1}(n,k)+\hat{P}_{N_1}(n,k)};$

where the subscript $\Delta P$ of $G_{\Delta P}(n,k)$ indicates that the filter is obtained from the power level difference. Since $\hat{P}_{S_1}(n,k)$ cannot be estimated directly, the expression for $|\Delta P_X(n,k)|$ is substituted into the above, giving:

$G_{\Delta P}(n,k)=\frac{|\Delta P_X(n,k)|}{|\Delta P_X(n,k)|+\left|1-|H_{12}(n,k)|^2\right|\hat{P}_{N_1}(n,k)};$

To keep the speech from being distorted too strongly, a free parameter $\alpha$ is added, giving the estimation formula for $G_{\Delta P}(n,k)$:

$G_{\Delta P}(n,k)=\frac{|\Delta P_X(n,k)|}{|\Delta P_X(n,k)|+\alpha\left|1-|H_{12}(n,k)|^2\right|\hat{P}_{N_1}(n,k)};$

The noise PSD at the primary microphone is computed from the first T noise-only frames of the signal:

$\hat{P}_{N_1}(n,k)=\lambda_N\hat{P}_{N_1}(n-1,k)+(1-\lambda_N)|X_1(n,k)|^2,\quad n<T;$

where $\lambda_N$ is the noise forgetting factor, set empirically, and $X_1(n,k)$ is the time-frequency value of the signal received by the primary microphone. A minimal sketch of this recursion is given below.
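The following NumPy sketch implements the recursion under the stated assumption that the first T frames are noise-only; the values of T and lambda_n and the helper name estimate_noise_psd are illustrative choices of this sketch, not fixed by the patent:

    import numpy as np

    def estimate_noise_psd(X1, T=10, lambda_n=0.9):
        # X1: complex STFT of the primary microphone, shape (frames, bins).
        n_frames, n_bins = X1.shape
        P_n1 = np.zeros(n_bins)
        for n in range(min(T, n_frames)):
            # first-order recursive averaging of the noise periodogram
            P_n1 = lambda_n * P_n1 + (1.0 - lambda_n) * np.abs(X1[n]) ** 2
        return P_n1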

At this point, only the transfer function $H_{12}(n,k)$ in the estimator of $G_{\Delta P}(n,k)$ remains unknown. Considering the cross power spectral density (CPSD), the CPSD of the noisy signals received by the primary and secondary microphones is:

$P_{X_1X_2}(n,k)=H_{12}(n,k)P_{S_1}(n,k)+P_{N_1N_2}(n,k);$

where $P_{N_1N_2}(n,k)$ is the CPSD of the noise received by the two microphones; similarly to the computation of $\hat{P}_{N_1}(n,k)$, it can be estimated by:

$\hat{P}_{N_1N_2}(n,k)=\lambda_N\hat{P}_{N_1N_2}(n-1,k)+(1-\lambda_N)|X_1(n,k)X_2(n,k)|,\quad n<T;$

The speech PSD can be estimated as the difference between the noisy PSD at the primary microphone and the estimated noise PSD:

$\hat{P}_{S_1}(n,k)=\hat{P}_{X_1}(n,k)-\hat{P}_{N_1}(n,k);$

which yields the estimated transfer function $\hat{H}_{12}(n,k)$:

$\hat{H}_{12}(n,k)=\frac{\hat{P}_{X_1X_2}(n,k)-\hat{P}_{N_1N_2}(n,k)}{\hat{P}_{X_1}(n,k)-\hat{P}_{N_1}(n,k)};$

In the embodiment of the present invention, the PSDs and CPSD of the noisy signals at the two microphones (i.e. $\hat{P}_{X_1}(n,k)$, $\hat{P}_{X_2}(n,k)$ and $\hat{P}_{X_1X_2}(n,k)$) are estimated with the following recursive averaging:

$\hat{P}_{X_i}(n,k)=\lambda_X\hat{P}_{X_i}(n-1,k)+(1-\lambda_X)|X_i(n,k)|^2,\quad i=1,2;$

$\hat{P}_{X_1X_2}(n,k)=\lambda_X\hat{P}_{X_1X_2}(n-1,k)+(1-\lambda_X)X_1(n,k)X_2(n,k);$

where $\lambda_X$ is the noisy-speech forgetting factor. All the parameters in the expression for $G_{\Delta P}(n,k)$ are now available, giving the Wiener filter used to enhance the speech. The short-time-Fourier-transformed speech signal is multiplied by the Wiener filter, followed by an inverse discrete Fourier transform (IFFT) and overlap-add, giving the noise-reduced time-domain speech signal. A NumPy sketch of the gain computation follows.
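The sketch below computes the PLD gain frame by frame under the stated assumptions: the first T frames are noise-only, the parameter values and the helper name pld_wiener_gain are illustrative, and the conjugate cross-spectrum convention (X1 times conj(X2)) is a choice of this sketch rather than of the patent. The resulting gain would be applied to the primary-channel STFT before the inverse transform and overlap-add.

    import numpy as np

    def pld_wiener_gain(X1, X2, T=10, lambda_n=0.9, lambda_x=0.8, alpha=1.0, eps=1e-12):
        # X1, X2: complex STFTs of the primary / secondary microphones, shape (frames, bins).
        n_frames, n_bins = X1.shape
        P_n1  = np.zeros(n_bins)                 # noise PSD at the primary microphone
        P_n12 = np.zeros(n_bins)                 # noise CPSD magnitude between the microphones
        P_x1  = np.zeros(n_bins)                 # noisy PSD, primary
        P_x2  = np.zeros(n_bins)                 # noisy PSD, secondary
        P_x12 = np.zeros(n_bins, dtype=complex)  # noisy CPSD
        G = np.ones((n_frames, n_bins))

        for n in range(n_frames):
            # recursive averaging of the noisy PSDs and CPSD
            P_x1  = lambda_x * P_x1  + (1 - lambda_x) * np.abs(X1[n]) ** 2
            P_x2  = lambda_x * P_x2  + (1 - lambda_x) * np.abs(X2[n]) ** 2
            P_x12 = lambda_x * P_x12 + (1 - lambda_x) * X1[n] * np.conj(X2[n])

            if n < T:  # initial noise-only frames: update the noise statistics
                P_n1  = lambda_n * P_n1  + (1 - lambda_n) * np.abs(X1[n]) ** 2
                P_n12 = lambda_n * P_n12 + (1 - lambda_n) * np.abs(X1[n] * X2[n])

            # estimated relative transfer function H12 and power level difference
            H12 = (P_x12 - P_n12) / (P_x1 - P_n1 + eps)
            dPx = np.abs(P_x1 - P_x2)
            G[n] = dPx / (dPx + alpha * np.abs(1 - np.abs(H12) ** 2) * P_n1 + eps)
        return G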

2. Noise reduction method based on the coherence between the two microphones.

In the speakerphone call mode, the two microphones and the speaker's face can be regarded as forming a far-field environment. In this case, the present invention uses a speech noise reduction technique based on the coherence between the signals received by the two microphones. This method requires no estimate of the noise power spectrum and avoids the impact that inconsistent noise at the two microphones has on the power-level-difference method.

With two microphones, the signals received by the two microphones exhibit a certain coherence. This coherence can be used to estimate the SNR of the noisy speech signal at each time-frequency point, so that speech noise reduction is achieved with the idea of Wiener filtering.

The specific processing flow is shown in Fig. 3 and mainly comprises the following steps.

The signals received by the primary microphone and the secondary microphone are:

$x_i(m)=s_i(m)+n_i(m),\quad i=1,2;$

where m is the sample index, $x_1(m)$ is the signal received by the primary microphone and $x_2(m)$ the signal received by the secondary microphone; $n_i(m)$ is the noise and $s_i(m)$ the target speech.

Applying the short-time Fourier transform gives:

$X_i(n,k)=S_i(n,k)+N_i(n,k),\quad i=1,2;$

where n and k denote the time frame and the frequency bin, respectively.

The coherence function of the signals received by the primary and secondary microphones is defined as:

$\Gamma_{x_1x_2}(n,k)=\frac{P_{X_1X_2}(n,k)}{\sqrt{P_{X_1}(n,k)P_{X_2}(n,k)}};$

where $P_{X_1}(n,k)$ and $P_{X_2}(n,k)$ are the power spectral densities (PSD) of the noisy signals at the primary and secondary microphones, and $P_{X_1X_2}(n,k)$ is the cross-spectral density of the two noisy signals; these are obtained with the recursive averaging method described earlier. A minimal sketch of this computation follows.
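The sketch below estimates the complex coherence per time-frequency point with the same recursive averaging as above; lambda_x, the function name complex_coherence, and the conjugate cross-spectrum convention are assumptions of this sketch:

    import numpy as np

    def complex_coherence(X1, X2, lambda_x=0.8, eps=1e-12):
        # X1, X2: complex STFTs of the two noisy channels, shape (frames, bins).
        n_frames, n_bins = X1.shape
        P_x1  = np.zeros(n_bins)
        P_x2  = np.zeros(n_bins)
        P_x12 = np.zeros(n_bins, dtype=complex)
        gamma = np.zeros((n_frames, n_bins), dtype=complex)
        for n in range(n_frames):
            P_x1  = lambda_x * P_x1  + (1 - lambda_x) * np.abs(X1[n]) ** 2
            P_x2  = lambda_x * P_x2  + (1 - lambda_x) * np.abs(X2[n]) ** 2
            P_x12 = lambda_x * P_x12 + (1 - lambda_x) * X1[n] * np.conj(X2[n])
            gamma[n] = P_x12 / (np.sqrt(P_x1 * P_x2) + eps)
        return gamma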

The coherence function is related to the local SNRs $\mathrm{SNR}_1$ and $\mathrm{SNR}_2$ at the primary and secondary microphones by:

$\Gamma_{x_1x_2}=\Gamma_{s_1s_2}\sqrt{\frac{\mathrm{SNR}_1}{1+\mathrm{SNR}_1}\cdot\frac{\mathrm{SNR}_2}{1+\mathrm{SNR}_2}}+\Gamma_{n_1n_2}\sqrt{\frac{1}{1+\mathrm{SNR}_1}\cdot\frac{1}{1+\mathrm{SNR}_2}};$

where $\Gamma_{s_1s_2}$ and $\Gamma_{n_1n_2}$ are the coherence functions of the target speech and of the noise received by the two microphones, respectively. If the two microphones are close together (e.g. 2 cm), $\mathrm{SNR}_1\approx\mathrm{SNR}_2$ holds; if they are relatively far apart (e.g. 15 cm), this assumption does not necessarily hold. It is nevertheless assumed that $\mathrm{SNR}_i/(1+\mathrm{SNR}_i)$ (or $1/(1+\mathrm{SNR}_i)$) is equal at the two microphones.

Let $\hat{G}=\frac{\mathrm{SNR}}{1+\mathrm{SNR}}$; then the above is rewritten as:

$\Gamma_{x_1x_2}\approx\Gamma_{s_1s_2}\hat{G}+\Gamma_{n_1n_2}(1-\hat{G});$

As shown in Fig. 4, in the speakerphone call mode it is assumed that the target speech source lies directly in front of the two microphones. Taking the midpoint between the two microphone positions as the array reference point, the target speech direction is 0°, and the background noise source is equivalent to a wave incident from direction θ. According to array signal processing theory, for two homologous signals $U_1$ and $U_2$ (the same target speech or the same noise as received by the two microphones, respectively), the coherence between the signals received by the primary and secondary microphones can be expressed as:

$\Gamma_{u_1u_2}(\omega)=e^{j\omega f_s(d/c)\sin\theta};$

where $f_s$ is the sampling rate, d is the spacing between the primary and secondary microphones, and c is the speed of sound.

Combining the two expressions above gives:

$\Gamma_{x_1x_2}\approx\left(\cos(\omega\tau\sin 0)+j\sin(\omega\tau\sin 0)\right)\hat{G}+\left(\cos(\omega\tau\sin\theta)+j\sin(\omega\tau\sin\theta)\right)(1-\hat{G});$

where $\tau=f_s(d/c)$. Taking the real and imaginary parts separately:

$R=\hat{G}+(1-\hat{G})\cos\alpha;$

$I=(1-\hat{G})\sin\alpha;$

where $\alpha=\omega\tau\sin\theta$. Rearranging the real-part and imaginary-part expressions gives:

$\hat{G}=\frac{R-\cos\alpha}{1-\cos\alpha};$

$\hat{G}=\frac{\sin\alpha-I}{\sin\alpha};$

Since the two expressions are equal:

$I\cos\alpha=(R-1)\sin\alpha+I;$

Using $\cos^2\alpha+\sin^2\alpha=1$:

$\left(I^2+(1-R)^2\right)\sin^2\alpha+2I(R-1)\sin\alpha=0;$

whose roots are:

$\sin\alpha=\frac{(1-R)I\pm\sqrt{(1-R)^2I^2}}{(1-R)^2+I^2};$

The root $\sin\alpha=0$ is trivial and is discarded, so:

$\sin\alpha=\frac{2(1-R)I}{(1-R)^2+I^2};$

With $\sin\alpha$ known, one obtains:

$\hat{G}=\frac{1-R^2-I^2}{2(1-R)};$
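For completeness, the intermediate algebra behind this closed form (not spelled out in the original text) is simply the substitution of the non-trivial root into $\hat{G}=(\sin\alpha-I)/\sin\alpha$:

$\hat{G}=1-\frac{I}{\sin\alpha}=1-\frac{(1-R)^2+I^2}{2(1-R)}=\frac{2(1-R)-(1-R)^2-I^2}{2(1-R)}=\frac{1-R^2-I^2}{2(1-R)}.$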

From this, the unconstrained Wiener filter $G_{coh}(n,k)$ is built (the subscript coh indicates that the filter is obtained from the coherence):

$G_{coh}(n,k)=\hat{G}(n,k);$

The short-time-Fourier-transformed speech signal is multiplied by the Wiener filter, followed by an inverse discrete Fourier transform and overlap-add, giving the noise-reduced time-domain target speech signal. A sketch of this gain computation is given below.
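A minimal NumPy sketch of this branch, assuming the complex coherence gamma has already been estimated as in the earlier sketch; the clipping of the gain to [0, 1] is a practical safeguard added here, whereas the patent leaves the filter unconstrained:

    import numpy as np

    def coherence_wiener_gain(gamma, eps=1e-12):
        # gamma: complex coherence of the two noisy channels, any array shape.
        R = np.real(gamma)
        I = np.imag(gamma)
        G = (1.0 - R ** 2 - I ** 2) / (2.0 * (1.0 - R) + eps)
        return np.clip(G, 0.0, 1.0)   # safeguard added in this sketch only

    # Usage sketch: Y = coherence_wiener_gain(gamma) * X1, where X1 is the
    # primary-channel STFT; an inverse STFT with overlap-add then gives the
    # noise-reduced time-domain signal.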

With the above scheme of the embodiment of the present invention, compared with traditional dual-microphone noise reduction techniques for handheld devices, the corresponding noise reduction method is selected according to the mode in which the user operates the device. In the normal handheld call mode, the method based on the power level difference avoids the impact of the near-field effect; in the speakerphone mode, the method based on the coherence between the two microphones requires no estimate of the noise power spectrum and avoids the impact that inconsistent noise at the two microphones has on the power-level-difference method. In addition, compared with the traditional method based on the power level difference, the constructed Wiener filter effectively avoids the musical noise caused by errors in noise power spectrum estimation.

Through the above description of the embodiments, those skilled in the art can clearly understand that the above embodiments may be implemented in software, or in software plus a necessary general hardware platform. Based on this understanding, the technical solution of the above embodiments may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (for example a CD-ROM, USB flash disk, or portable hard drive) and includes a number of instructions for causing a computer device (for example a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention.

The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement readily conceivable by a person familiar with the technical field within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (3)

1. A speech noise reduction method applied to a dual-microphone small handheld device, characterized by comprising:
receiving the speech signals of the two microphones and determining the call mode;
if the current mode is the handheld call mode, obtaining the noise-reduced speech signal with a noise reduction method based on the power level difference between the two microphones;
if the current mode is the speakerphone call mode, obtaining the noise-reduced speech signal with a noise reduction method based on the coherence between the two microphones.
2. The method according to claim 1, characterized in that said using the noise reduction method based on the dual-microphone power level difference to obtain the noise-reduced speech signal comprises:
The signals received by the primary microphone and the secondary microphone are:
$x_1(m)=h_1(m)*s(m)+n_1(m);$
$x_2(m)=h_2(m)*s(m)+n_2(m);$
where m is the sample index, $x_1(m)$ is the signal received by the primary microphone and $x_2(m)$ the signal received by the secondary microphone; $h_i(m)$ is the impulse response of the acoustic propagation model and $n_i(m)$ the noise, $i=1,2$; $s(m)$ is the target speech and $*$ denotes convolution;
Applying the short-time Fourier transform to the signals received by the primary and secondary microphones gives:
$X_1(n,k)=H_1(n,k)S(n,k)+N_1(n,k);$
$X_2(n,k)=H_2(n,k)S(n,k)+N_2(n,k);$
where n and k denote the time frame and the frequency bin, respectively; the two equations are rewritten as:
$X_1(n,k)=S_1(n,k)+N_1(n,k);$
$X_2(n,k)=H_{12}(n,k)S_1(n,k)+N_2(n,k);$
where $S_1(n,k)$ denotes $H_1(n,k)S(n,k)$ and $H_{12}(n,k)=H_2(n,k)/H_1(n,k)$; from the short-time Fourier transform results, the power spectral densities (PSD) of the noisy signals at the primary and secondary microphones are:
$P_{X_1}(n,k)=P_{S_1}(n,k)+P_{N_1}(n,k);$
$P_{X_2}(n,k)=|H_{12}(n,k)|^2 P_{S_1}(n,k)+P_{N_2}(n,k);$
Subtracting $P_{X_2}(n,k)$ from $P_{X_1}(n,k)$ gives:
$P_{X_1}(n,k)-P_{X_2}(n,k)=\left(1-|H_{12}(n,k)|^2\right)P_{S_1}(n,k)+P_{N_1}(n,k)-P_{N_2}(n,k);$
Let $\Delta P_X(n,k)=P_{X_1}(n,k)-P_{X_2}(n,k)$ and $\Delta P_N(n,k)=P_{N_1}(n,k)-P_{N_2}(n,k)$, where $\Delta P_N\approx 0$; then:
$|\Delta P_X(n,k)|=\left|1-|H_{12}(n,k)|^2\right|P_{S_1}(n,k);$
A Wiener filter $G_{\Delta P}(n,k)$ can be constructed from the estimated speech PSD and noise PSD; using the estimated speech PSD $\hat{P}_{S_1}(n,k)$ and noise PSD $\hat{P}_{N_1}(n,k)$ at the primary microphone:
$G_{\Delta P}(n,k)=\frac{\hat{P}_{S_1}(n,k)}{\hat{P}_{S_1}(n,k)+\hat{P}_{N_1}(n,k)};$
where the subscript $\Delta P$ of $G_{\Delta P}(n,k)$ indicates that the filter is obtained from the power level difference; substituting the expression for $|\Delta P_X(n,k)|$ into the above gives:
$G_{\Delta P}(n,k)=\frac{|\Delta P_X(n,k)|}{|\Delta P_X(n,k)|+\left|1-|H_{12}(n,k)|^2\right|\hat{P}_{N_1}(n,k)};$
Adding a free parameter $\alpha$ to the above yields:
$G_{\Delta P}(n,k)=\frac{|\Delta P_X(n,k)|}{|\Delta P_X(n,k)|+\alpha\left|1-|H_{12}(n,k)|^2\right|\hat{P}_{N_1}(n,k)};$
where the noise PSD at the primary microphone is computed from the first T noise-only frames of the signal as follows:
$\hat{P}_{N_1}(n,k)=\lambda_N\hat{P}_{N_1}(n-1,k)+(1-\lambda_N)|X_1(n,k)|^2,\quad n<T;$
where $\lambda_N$ is the noise forgetting factor and $X_1(n,k)$ is the time-frequency value of the signal received by the primary microphone;
The cross power spectral density (CPSD) of the noisy signals at the primary and secondary microphones is:
$P_{X_1X_2}(n,k)=H_{12}(n,k)P_{S_1}(n,k)+P_{N_1N_2}(n,k);$
where $P_{N_1N_2}(n,k)$ is the CPSD of the noise received by the two microphones, estimated by:
$\hat{P}_{N_1N_2}(n,k)=\lambda_N\hat{P}_{N_1N_2}(n-1,k)+(1-\lambda_N)|X_1(n,k)X_2(n,k)|,\quad n<T;$
The speech PSD is estimated as the difference between the noisy PSD at the primary microphone and the estimated noise PSD:
$\hat{P}_{S_1}(n,k)=\hat{P}_{X_1}(n,k)-\hat{P}_{N_1}(n,k);$
which yields the estimated transfer function $\hat{H}_{12}(n,k)$:
$\hat{H}_{12}(n,k)=\frac{\hat{P}_{X_1X_2}(n,k)-\hat{P}_{N_1N_2}(n,k)}{\hat{P}_{X_1}(n,k)-\hat{P}_{N_1}(n,k)};$
The PSDs and CPSD of the noisy signals at the two microphones, i.e. $\hat{P}_{X_1}(n,k)$, $\hat{P}_{X_2}(n,k)$ and $\hat{P}_{X_1X_2}(n,k)$, are estimated with the following recursive averaging:
$\hat{P}_{X_i}(n,k)=\lambda_X\hat{P}_{X_i}(n-1,k)+(1-\lambda_X)|X_i(n,k)|^2,\quad i=1,2;$
$\hat{P}_{X_1X_2}(n,k)=\lambda_X\hat{P}_{X_1X_2}(n-1,k)+(1-\lambda_X)X_1(n,k)X_2(n,k);$
where $\lambda_X$ is the noisy-speech forgetting factor; the short-time-Fourier-transformed speech signal is multiplied by the Wiener filter, followed by an inverse discrete Fourier transform and overlap-add, giving the noise-reduced time-domain speech signal.
3. The method according to claim 1, characterized in that said using the noise reduction method based on the coherence between the two microphones to obtain the noise-reduced speech signal comprises:
The signals received by the primary microphone and the secondary microphone are:
$x_i(m)=s_i(m)+n_i(m),\quad i=1,2;$
where m is the sample index, $x_1(m)$ is the signal received by the primary microphone and $x_2(m)$ the signal received by the secondary microphone; $n_i(m)$ is the noise and $s_i(m)$ the target speech;
Applying the short-time Fourier transform gives:
$X_i(n,k)=S_i(n,k)+N_i(n,k),\quad i=1,2;$
where n and k denote the time frame and the frequency bin, respectively;
The coherence function of the signals received by the primary and secondary microphones is defined as:
$\Gamma_{x_1x_2}(n,k)=\frac{P_{X_1X_2}(n,k)}{\sqrt{P_{X_1}(n,k)P_{X_2}(n,k)}};$
where $P_{X_1}(n,k)$ and $P_{X_2}(n,k)$ are the power spectral densities (PSD) of the noisy signals at the primary and secondary microphones, and $P_{X_1X_2}(n,k)$ is the cross-spectral density of the two noisy signals;
The coherence function is related to the local SNRs $\mathrm{SNR}_1$ and $\mathrm{SNR}_2$ at the primary and secondary microphones by:
$\Gamma_{x_1x_2}=\Gamma_{s_1s_2}\sqrt{\frac{\mathrm{SNR}_1}{1+\mathrm{SNR}_1}\cdot\frac{\mathrm{SNR}_2}{1+\mathrm{SNR}_2}}+\Gamma_{n_1n_2}\sqrt{\frac{1}{1+\mathrm{SNR}_1}\cdot\frac{1}{1+\mathrm{SNR}_2}};$
where $\Gamma_{s_1s_2}$ and $\Gamma_{n_1n_2}$ are the coherence functions of the target speech and of the noise received by the two microphones, respectively; let $\hat{G}=\frac{\mathrm{SNR}}{1+\mathrm{SNR}}$, the local SNR being taken equal at the two microphones; then the above is rewritten as:
$\Gamma_{x_1x_2}\approx\Gamma_{s_1s_2}\hat{G}+\Gamma_{n_1n_2}(1-\hat{G});$
In the speakerphone call mode, the target speech source is assumed to be directly in front of the two microphones, and the midpoint between the two microphone positions is taken as the array reference point, so the target speech direction is 0° and the background noise source is equivalent to a wave incident from direction θ; the coherence between the signals $U_1$ and $U_2$ received by the primary and secondary microphones is then:
$\Gamma_{u_1u_2}(\omega)=e^{j\omega f_s(d/c)\sin\theta};$
where $f_s$ is the sampling rate, d is the spacing between the primary and secondary microphones, and c is the speed of sound; $U_1$ and $U_2$ denote the speech signal or the noise signal received by the primary and secondary microphones;
Combining the two expressions above gives:
$\Gamma_{x_1x_2}\approx\left(\cos(\omega\tau\sin 0)+j\sin(\omega\tau\sin 0)\right)\hat{G}+\left(\cos(\omega\tau\sin\theta)+j\sin(\omega\tau\sin\theta)\right)(1-\hat{G});$
where $\tau=f_s(d/c)$; taking the real and imaginary parts separately:
$R=\hat{G}+(1-\hat{G})\cos\alpha;$
$I=(1-\hat{G})\sin\alpha;$
where $\alpha=\omega\tau\sin\theta$; rearranging the real-part and imaginary-part expressions gives:
$\hat{G}=\frac{R-\cos\alpha}{1-\cos\alpha};$
$\hat{G}=\frac{\sin\alpha-I}{\sin\alpha};$
Since the two expressions are equal:
$I\cos\alpha=(R-1)\sin\alpha+I;$
Using $\cos^2\alpha+\sin^2\alpha=1$:
$\left(I^2+(1-R)^2\right)\sin^2\alpha+2I(R-1)\sin\alpha=0;$
whose roots are:
$\sin\alpha=\frac{(1-R)I\pm\sqrt{(1-R)^2I^2}}{(1-R)^2+I^2};$
The root $\sin\alpha=0$ is trivial and is discarded, so:
$\sin\alpha=\frac{2(1-R)I}{(1-R)^2+I^2};$
With $\sin\alpha$ known, one obtains:
$\hat{G}=\frac{1-R^2-I^2}{2(1-R)};$
from which the unconstrained Wiener filter $G_{coh}(n,k)$ is built:
$G_{coh}(n,k)=\hat{G}(n,k);$
The short-time-Fourier-transformed speech signal is multiplied by the Wiener filter, followed by an inverse discrete Fourier transform and overlap-add, giving the noise-reduced time-domain speech signal.
CN201610286545.2A 2016-04-28 2016-04-28 Speech noise reduction method applied to dual-microphone small handheld devices CN105976826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610286545.2A CN105976826B (en) 2016-04-28 2016-04-28 Speech noise reduction method applied to dual-microphone small handheld devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610286545.2A CN105976826B (en) 2016-04-28 2016-04-28 Speech noise reduction method applied to dual-microphone small handheld devices

Publications (2)

Publication Number Publication Date
CN105976826A true CN105976826A (en) 2016-09-28
CN105976826B CN105976826B (en) 2019-10-25

Family

ID=56993536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610286545.2A CN105976826B (en) 2016-04-28 2016-04-28 Speech noise reduction method applied to dual-microphone small handheld devices

Country Status (1)

Country Link
CN (1) CN105976826B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107328578A (en) * 2017-07-10 2017-11-07 安徽大学 A kind of sound source separating method detected for train bearing rail side acoustic fault
CN107680609A (en) * 2017-09-12 2018-02-09 桂林电子科技大学 A kind of double-channel pronunciation Enhancement Method based on noise power spectral density
CN109741758A (en) * 2019-01-14 2019-05-10 杭州微纳科技股份有限公司 A kind of dual microphone voice de-noising method
WO2019200722A1 (en) * 2018-04-16 2019-10-24 深圳市沃特沃德股份有限公司 Sound source direction estimation method and apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030061032A1 (en) * 2001-09-24 2003-03-27 Clarity, Llc Selective sound enhancement
EP2196988A1 (en) * 2008-12-12 2010-06-16 Harman/Becker Automotive Systems GmbH Determination of the coherence of audio signals
CN102740215A (en) * 2011-03-31 2012-10-17 Jvc建伍株式会社 Speech input device, method and program, and communication apparatus
CN104424953A (en) * 2013-09-11 2015-03-18 华为技术有限公司 Speech signal processing method and device
CN105513605A (en) * 2015-12-01 2016-04-20 南京师范大学 Voice enhancement system and method for cellphone microphone

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030061032A1 (en) * 2001-09-24 2003-03-27 Clarity, Llc Selective sound enhancement
EP2196988A1 (en) * 2008-12-12 2010-06-16 Harman/Becker Automotive Systems GmbH Determination of the coherence of audio signals
CN102740215A (en) * 2011-03-31 2012-10-17 Jvc建伍株式会社 Speech input device, method and program, and communication apparatus
CN104424953A (en) * 2013-09-11 2015-03-18 华为技术有限公司 Speech signal processing method and device
CN105513605A (en) * 2015-12-01 2016-04-20 南京师范大学 Voice enhancement system and method for cellphone microphone

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NIMA YOUSEFIAN et al.: "A dual-microphone speech enhancement algorithm based on the coherence function", 《IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》 *
NIMA YOUSEFIAN et al.: "Using power level difference for near field dual-microphone speech enhancement", 《APPLIED ACOUSTICS》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107328578A (en) * 2017-07-10 2017-11-07 安徽大学 A kind of sound source separating method detected for train bearing rail side acoustic fault
CN107328578B (en) * 2017-07-10 2018-06-12 安徽大学 A kind of sound source separating method for the detection of train bearing rail side acoustic fault
CN107680609A (en) * 2017-09-12 2018-02-09 桂林电子科技大学 A kind of double-channel pronunciation Enhancement Method based on noise power spectral density
WO2019200722A1 (en) * 2018-04-16 2019-10-24 深圳市沃特沃德股份有限公司 Sound source direction estimation method and apparatus
CN109741758A (en) * 2019-01-14 2019-05-10 杭州微纳科技股份有限公司 A kind of dual microphone voice de-noising method

Also Published As

Publication number Publication date
CN105976826B (en) 2019-10-25

Similar Documents

Publication Publication Date Title
US20200058316A1 (en) Methods and systems for improved measurement, entity and parameter estimation, and path propagation effect measurement and mitigation in source signal separation
US8694306B1 (en) Systems and methods for source signal separation
US8867759B2 (en) System and method for utilizing inter-microphone level differences for speech enhancement
US9497544B2 (en) Systems and methods for surround sound echo reduction
US9633671B2 (en) Voice quality enhancement techniques, speech recognition techniques, and related systems
JP5587396B2 (en) System, method and apparatus for signal separation
Hänsler et al. Acoustic echo and noise control: a practical approach
US9100466B2 (en) Method for processing an audio signal and audio receiving circuit
CN102625946B (en) Systems, methods, apparatus, and computer-readable media for dereverberation of multichannel signal
Zhang et al. Why does PHAT work well in lownoise, reverberative environments?
JP5596048B2 (en) System, method, apparatus and computer program product for enhanced active noise cancellation
CN102947878B (en) Systems, methods, devices, apparatus, and computer program products for audio equalization
KR101444100B1 (en) Noise cancelling method and apparatus from the mixed sound
CN104181506B (en) A kind of based on improving the sound localization method of PHAT weighting time delay estimation and realizing system
US9711164B2 (en) Noise cancellation method
US9640194B1 (en) Noise suppression for speech processing based on machine-learning mask estimation
US7113605B2 (en) System and process for time delay estimation in the presence of correlated noise and reverberation
US9113240B2 (en) Speech enhancement using multiple microphones on multiple devices
US8204252B1 (en) System and method for providing close microphone adaptive array processing
CN1122963C (en) Method and apparatus for measuring signal level and delay at multiple sensors
US8824666B2 (en) Noise cancellation for phone conversation
CN100524465C (en) A method and device for noise elimination
US8897455B2 (en) Microphone array subset selection for robust noise reduction
KR101275442B1 (en) Systems, methods, apparatus, and computer-readable media for phase-based processing of multichannel signal
US8880396B1 (en) Spectrum reconstruction for automatic speech recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant