CN101777349B

CN101777349B - Auditory perception property-based signal subspace microphone array voice enhancement method

Info

Publication number: CN101777349B
Application number: CN2009102498006A
Authority: CN
Inventors: Liu Wenju (刘文举); Cheng Ning (程宁); Li Chao (李超)
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2009-12-08
Filing date: 2009-12-08
Publication date: 2012-04-11
Anticipated expiration: 2029-12-08
Also published as: CN101777349A

Abstract

The invention discloses a signal subspace speech enhancement method for microphone arrays based on auditory perception properties. It builds on the conventional microphone-array signal subspace method by fully exploiting the auditory masking effect of the human ear. The core of the signal subspace algorithm is a well-chosen linear filter, and its key points are accurate estimation of the signal subspace dimension and of the noise power spectrum, together with a reasonable choice of the Lagrange multipliers. The invention therefore provides a practical solution comprising the following steps: performing time-domain alignment, short-time Fourier transform, and eigenvalue decomposition of the power spectral matrix on the signals acquired by the microphone array; determining the dimension of the noise subspace by hypothesis testing; estimating the noise power spectrum in the noise subspace by a conditional-probability method; estimating the auditory masking threshold based on the signal subspace; and estimating the linear filter by choosing the Lagrange multipliers according to auditory perception properties.

Description

Signal subspace speech enhancement method for microphone arrays based on auditory perception properties

Technical field

The present invention relates to signal subspace methods, the masking effect of the human auditory system, and the design of post-filters for microphone arrays.

Background art

Microphone-array speech enhancement has been studied extensively in recent years. Among the available methods, the signal subspace algorithm is particularly effective at removing additive broadband noise. It decomposes the noisy signal space into a signal subspace (containing the target speech and noise) and a noise subspace (containing noise only), and estimates the target speech within the signal subspace. The core of the algorithm is a well-chosen linear filter, and one of its key points is accurate estimation of the signal subspace dimension and the noise power spectrum. Studies of signal subspace speech enhancement have shown that the method performs well.

Despite this, eliminating the noise completely remains difficult. After signal subspace denoising, some residual noise usually remains in the enhanced speech and degrades its perceived quality. To reduce the impact of this residual noise on the target speech, one can exploit the auditory masking effect of the human ear, which has been established through extensive experiments. Under normal conditions the target speech is the strong signal and the background noise is comparatively weak, so the human auditory system sets a frequency-domain masking threshold that depends on the particular target speech. If the residual noise after filtering is kept below this masking threshold, the listener will not perceive it. Years of research have shown that this auditory property can be applied effectively in speech enhancement: as long as the residual noise in the enhanced speech is limited to a certain range, it is masked by the target speech and not perceived, which achieves enhancement of the target speech.

The principle of the signal subspace algorithm is to use eigenvalue decomposition to split the noisy signal space into two subspaces, a signal subspace (containing target speech and noise) and a noise subspace (containing noise only), and then to recover the target speech in the signal subspace. The rationale is that a speech signal can be modeled as a linear combination of a small number of basis vectors. Typically, some eigenvalues of the clean-speech power spectral matrix are very close to zero, which shows that the energy of clean speech is concentrated on only some of the basis vectors. The signal subspace algorithm assumes white noise (colored noise can first be whitened by prewhitening); all eigenvalues of white noise are positive, so the noise energy is spread over all basis vectors of the noisy signal. The space spanned by the basis of the noisy signal can therefore be decomposed into a signal subspace (target speech plus noise) and a noise subspace (noise only). Accordingly, the target speech can be recovered in the signal subspace, while the noise subspace, which contains no target speech, can be discarded.

Suppose the array consists of L microphones, and let the frequency-domain representation of the received noisy speech vector be X = [X_1, ..., X_L]^T. The frequency-domain representation of the enhanced speech, obtained as a weighted sum of the array input signals, is:

Y = w^{H} X = w^{H} (S + N) \qquad (1)

where w = [w_1, ..., w_L]^T is the coefficient (weight) vector, S is the target speech signal, N is the noise, and [·]^H denotes conjugate transposition.
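As a minimal sketch of Eq. (1) at a single frequency bin, the snippet below forms Y = w^H X with NumPy. The plane-wave steering model, the delay-and-sum choice of weights, and the names `steering_vector` and `array_output` are illustrative assumptions, not part of the patent.

```python
import numpy as np

def steering_vector(omega, tau):
    """Phase factors e^{-j*omega*tau_l} for a plane wave reaching mic l after delay tau_l (assumed model)."""
    return np.exp(-1j * omega * np.asarray(tau, dtype=float))

def array_output(w, X):
    """Eq. (1): Y = w^H X. np.vdot conjugates its first argument, i.e. sum(conj(w_l) * X_l)."""
    return np.vdot(w, X)

# Hypothetical 4-mic example at one bin: source S arrives with per-mic delays tau (seconds).
omega = 2 * np.pi * 1000.0                 # 1 kHz bin
tau = [0.0, 1.5e-4, 3.0e-4, 4.5e-4]
d = steering_vector(omega, tau)
S = 0.8 - 0.3j
X = S * d                                  # noise-free array snapshot
w = d / len(d)                             # delay-and-sum weights (assumption)
Y = array_output(w, X)                     # recovers S because w^H d = 1
```

With these weights w^H d = 1, so the target passes undistorted while noise that is uncorrelated across channels is averaged over the L microphones.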

Let R_X, R_S and R_N be the power spectral matrices of the noisy signal, the target speech and the noise, respectively. Assuming that the target speech and the noise are uncorrelated:

R_{X} = R_{S} + R_{N} \qquad (2)

The eigenvalue decomposition of the target speech power spectral matrix is:

R_{S} = U Λ_{S} U^{H} \qquad (3)

where Λ_S is the diagonal matrix of eigenvalues sorted in descending order, whose rank is Q (i.e., the last L - Q eigenvalues are 0), and U is the matrix of corresponding eigenvectors.

Assuming the noise is white with power spectrum σ_N^2, we have:

R_{X} = U Λ_{X} U^{H} = U (Λ_{S} + σ_{N}^{2} I) U^{H} \qquad (4)

where Λ_X is the diagonal matrix of eigenvalues of the noisy speech power spectral matrix, sorted in descending order, and I is the L × L identity matrix.

The i-th eigenvalue λ_{X_i} of R_X and the i-th eigenvalue λ_{S_i} of R_S are therefore related by:

λ_{S_{i}} = \begin{cases} λ_{X_{i}} - σ_{N}^{2}, & i = 1, \ldots, Q \\ 0, & i = Q + 1, \ldots, L \end{cases} \qquad (5)

where i ∈ {1, ..., L} is the eigenvalue index.
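Relation (5) can be checked numerically. The sketch below, which assumes NumPy and an arbitrary random rank-Q speech matrix, builds R_X as in (2) and (4) with white noise and recovers the eigenvalues of R_S by subtracting σ_N^2 from the Q largest eigenvalues of R_X. The helper name `speech_eigenvalues` is hypothetical.

```python
import numpy as np

def speech_eigenvalues(lam_X, Q, sigma2):
    """Eq. (5): lam_S[i] = lam_X[i] - sigma2 for the first Q (descending) eigenvalues, 0 otherwise."""
    lam_X = np.asarray(lam_X, dtype=float)
    lam_S = np.zeros_like(lam_X)
    lam_S[:Q] = lam_X[:Q] - sigma2
    return lam_S

# Synthetic check of Eqs. (2)-(5): a rank-Q speech matrix plus white noise sigma2 * I.
rng = np.random.default_rng(0)
L, Q, sigma2 = 6, 2, 0.5
A = rng.standard_normal((L, Q)) + 1j * rng.standard_normal((L, Q))
R_S = A @ A.conj().T                         # Hermitian, rank Q
R_X = R_S + sigma2 * np.eye(L)               # Eq. (2) with R_N = sigma2 * I
lam_X = np.linalg.eigvalsh(R_X)[::-1]        # eigvalsh is ascending; reverse to descending
lam_S = speech_eigenvalues(lam_X, Q, sigma2)
```

The last L - Q entries of `lam_X` all equal σ_N^2, which is exactly the structure the hypothesis test of step b) looks for.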

Let H be a linear filter. The estimate of the target speech is then:

\hat{S} = H X \qquad (6)

In practice, the quality of the target speech recovered by the linear filter depends on two factors: the distortion of the target speech and the level of the residual noise. Y. Ephraim and H. L. Van Trees ("A signal subspace approach for speech enhancement," IEEE Trans. Speech Audio Process., vol. 3, no. 4, pp. 251-266, Jul. 1995) minimized the speech distortion subject to keeping the residual noise within a given bound, and obtained the linear filter H as:

H = U G U^{H} = U \begin{bmatrix} G_{1} & 0 \\ 0 & 0 \end{bmatrix} U^{H} \qquad (7)

where G_1 is a nonsingular Q × Q matrix.

G can be written as:

G = Λ_{S} (Λ_{S} + σ_{N}^{2} Λ_{μ})^{-1} \qquad (8)

where Λ_μ = diag(μ_1, ..., μ_L) is the L × L diagonal matrix of Lagrange multipliers.

G is an L × L diagonal matrix whose diagonal entries g_i are:

g_{i} = \begin{cases} \frac{λ_{S_{i}}}{λ_{S_{i}} + μ_{i} σ_{N}^{2}}, & i = 1, \ldots, Q \\ 0, & i = Q + 1, \ldots, L \end{cases} \qquad (9)

where μ_i is the i-th Lagrange multiplier and i ∈ {1, ..., L} is the index.
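Equations (7) to (9) translate directly into code. The following is a sketch assuming NumPy, a unitary eigenvector matrix U with columns in descending-eigenvalue order, and λ_{S_i} > 0 or μ_i > 0 for i ≤ Q so that no denominator vanishes; `subspace_filter` is a hypothetical name.

```python
import numpy as np

def subspace_filter(U, lam_S, mu, sigma2, Q):
    """Eqs. (7)-(9): H = U diag(g) U^H, g_i = lam_S_i / (lam_S_i + mu_i * sigma2) for i < Q, else 0."""
    lam_S = np.asarray(lam_S, dtype=float)
    mu = np.asarray(mu, dtype=float)
    g = np.zeros(len(lam_S))
    g[:Q] = lam_S[:Q] / (lam_S[:Q] + mu[:Q] * sigma2)   # gains inside the signal subspace
    H = U @ np.diag(g) @ U.conj().T                       # noise-subspace directions get gain 0
    return H, g
```

Setting every μ_i = 0 leaves the signal subspace untouched (H becomes the projector onto it), μ_i = 1 gives Wiener-like gains λ_{S_i}/(λ_{S_i} + σ_N^2), and larger μ_i trade more speech distortion for less residual noise.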

Summary of the invention

To address the problems of the prior art, the objective of the invention is to estimate the linear filter by exploiting the auditory masking effect of the human ear, designing a new linear filter based on auditory perception properties. Accordingly, the invention provides a signal subspace speech enhancement method for microphone arrays based on auditory perception properties.

To achieve this objective, the invention provides a signal subspace speech enhancement method for microphone arrays based on auditory perception properties, comprising the following steps:

Step a: acquire multichannel noisy speech with a microphone array; time-align the noisy channels; apply the short-time discrete Fourier transform to each aligned channel to express it in complex-valued frequency-domain form; compute the power spectral matrix of the multichannel array signals and perform its eigenvalue decomposition to obtain the eigenvalue matrix and the eigenvector matrix;

Step b: apply a hypothesis test to the eigenvalue matrix of the power spectral matrix to determine the signal subspace dimension Q;

Step c: in the noise subspace, exploit the fact that the noise power in the noise subspace is smaller than the noisy-signal power in the signal subspace, and estimate the noise power spectrum as a conditional expectation;

Step d: using the noise subspace dimension P and the noise power spectrum estimate, and based on the masking effect of the human auditory system, estimate the auditory masking threshold of each frequency from the signal subspace, where the noise subspace dimension is P = L - Q and L is the number of microphones in the array;

Step e: using the noise power spectrum and the auditory masking threshold, estimate the linear filter by choosing the Lagrange multipliers, thereby achieving signal subspace speech enhancement for the microphone array based on auditory perception properties.
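Step a can be sketched as follows, assuming NumPy, integer-sample delays that are already known (delay estimation itself is not shown), a Hann-windowed STFT, and a per-bin power spectral matrix averaged over all frames. The frame length, hop size, and function names are illustrative choices, not values from the patent.

```python
import numpy as np

def align_channels(x, delays):
    """Integer-sample delay compensation; channel l is assumed to lag the reference by delays[l] samples (|d| < T)."""
    L, T = x.shape
    out = np.zeros_like(x)
    for l, d in enumerate(int(v) for v in delays):
        if d >= 0:
            out[l, :T - d] = x[l, d:]
        else:
            out[l, -d:] = x[l, :T + d]
    return out

def stft(x, n_fft=256, hop=128):
    """Hann-windowed short-time DFT of each channel; returns shape (L, n_frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (x.shape[-1] - n_fft) // hop
    frames = np.stack([x[:, i * hop:i * hop + n_fft] * win for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=-1)

def spatial_eig(Xf):
    """Per-bin power spectral matrix R[k] = mean_t X[:, t, k] X[:, t, k]^H and its descending EVD."""
    R = np.einsum('itk,jtk->kij', Xf, Xf.conj()) / Xf.shape[1]
    lam, U = np.linalg.eigh(R)                 # ascending eigenvalues, batched over bins
    return R, lam[:, ::-1], U[:, :, ::-1]      # reorder to descending, as in Eq. (4)
```

Each bin k then yields its own Λ_X and U, so steps b) to e) run independently per frequency bin.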

The eigenvalue decomposition of the power spectral matrix comprises:

Let the noisy speech signal be X = S + N.

The power spectral matrix R_X can then be written as:

R_{X} = U Λ_{X} U^{H} = U (Λ_{S} + σ_{N}^{2} I) U^{H}

where S is the target speech signal, N is the noise, R_X is the noisy speech power spectral matrix, Λ_X and Λ_S are the eigenvalue matrices of the noisy speech and target speech power spectra with eigenvalues sorted in descending order, U is the eigenvector matrix, σ_N^2 is the white noise power, I is the L × L identity matrix, and [·]^H denotes conjugate transposition.

The hypothesis test takes the smallest signal subspace dimension Q for which the null hypothesis H_0, that the last L - Q eigenvalues of Λ_X are all equal, holds.

Whether the null hypothesis holds is judged as follows:

Null hypothesis H_0: the last L - Q eigenvalues of Λ_X are all equal;

Alternative hypothesis H_1: at least two of the last L - Q eigenvalues of Λ_X differ;

The signal subspace dimension is determined as:

\underset{Q}{\arg\max} \left( θ \;\middle|\; -2 \log [F(H_{0}) / F(H_{1})] \geq χ_{θ,α}^{2} \right)

Here -2 log[F(H_0)/F(H_1)] approximately follows a chi-square distribution with θ = L - Q - 1 degrees of freedom, α is the confidence level, and F(H_0) and F(H_1) are the distribution functions of the eigenvalues. That is, the largest value of L - Q satisfying

-2 \log [F(H_{0}) / F(H_{1})] \geq χ_{θ,α}^{2}

is taken as the noise subspace dimension P, where argmax(·) returns the argument that maximizes its expression and χ_{θ,α}^2 is the lower bound of the acceptance region of the chi-square distribution with θ degrees of freedom at confidence level α.

The eigenvalue distribution functions F(H_0) and F(H_1) use a Gaussian model.

To correct the error in the noise power spectrum estimate caused by misjudging the noise subspace dimension, a compensation factor is used. The compensation factor is the ratio of the expected value of the noise power spectrum estimate to a reference estimate of the noise power. Dividing the noise power spectrum estimate by the compensation factor yields the corrected noise power spectrum estimate.

Estimating the auditory masking threshold comprises:

Step ea: divide the human auditory frequency range, 0 to 15500 Hz, into a number of critical sub-bands;

Step eb: compute the auditory masking threshold within each sub-band.

Computing the auditory masking threshold within each sub-band involves computing the energy of each frequency in each sub-band, computing the spreading coefficient of the basilar membrane of the human ear for sound in each band, and multiplying the energies by the spreading coefficients to obtain the excitation energy on the basilar membrane. The masking threshold is then computed from the functional relationship between the basilar-membrane excitation energy and the auditory masking threshold.

Choosing the Lagrange multipliers to estimate the linear filter proceeds as follows:

Step e1: map the auditory masking threshold onto the eigenvalue domain, using the transformation from the frequency domain to the eigenvalue domain;

Step e2: estimate the Lagrange multipliers so that the eigenvalues of the residual-noise power spectrum after linear filtering are below the auditory masking threshold in the eigenvalue domain;

Step e3: design a linear filter H that minimizes speech distortion while keeping the residual noise in the enhanced speech below the auditory masking threshold of the human ear, thereby removing the effect of the residual noise and minimizing the distortion of the target speech.

Beneficial effects of the invention: conventional signal subspace methods usually determine the subspace dimension with a fixed threshold, taking the signal subspace dimension as the number of eigenvalues above that threshold. This works poorly in practice, because the threshold is largely set by hand and usually cannot adapt as the signal changes, so the subspace dimension is often misestimated and the performance of the signal subspace method degrades. To address this, the invention determines the noise subspace dimension by hypothesis testing, which greatly reduces the error in the subspace dimension estimate. To estimate the noise power spectrum accurately, the invention uses conditional probability, exploiting the fact that the noise power in the noise subspace is smaller than the noisy-signal power in the signal subspace. The invention estimates the auditory masking threshold from the signal subspace; noise kept below this threshold is masked, which achieves enhancement of the target speech. Finally, the invention designs the linear filter according to the perceptual properties of the human auditory system; to apply the auditory masking effect in the eigenvalue domain, the auditory masking threshold C_thr must be mapped onto the eigenvalue domain.

Description of drawings

Further features and advantages of the invention are described below with reference to the illustrative drawings.

Fig. 1 is an example flowchart of the signal subspace microphone array speech enhancement method based on auditory perception properties;

Fig. 2 is a flowchart for determining the noise subspace dimension by hypothesis testing;

Fig. 3 is a flowchart for estimating the noise power spectrum in the noise subspace by the conditional-probability method;

Fig. 4 is a flowchart for computing the auditory masking threshold of the human ear;

Fig. 5 is a flowchart for estimating the linear filter.

Embodiment

It should be understood that the following detailed description of the examples and drawings is not intended to limit the invention to the particular illustrative examples; the examples described merely illustrate the steps of the invention, whose scope is defined by the appended claims.

The invention uses the auditory masking effect of the human ear to design a new linear filter based on auditory perception properties. The auditory masking effect means that, under normal conditions, the target speech is the strong signal and the background noise is comparatively weak, so the human auditory system sets a frequency-domain masking threshold that depends on the particular target speech. If the residual noise after filtering is kept below this masking threshold, the listener cannot perceive it, which achieves enhancement of the noisy speech.

Conventional signal subspace methods usually determine the subspace dimension with a fixed threshold, taking the signal subspace dimension as the number of eigenvalues above that threshold. This works poorly in practice, because the threshold is largely set by hand and usually cannot adapt as the signal changes; the subspace dimension is therefore often misestimated, which degrades the performance of the signal subspace method.

To address this, step b) of the invention determines the noise subspace dimension by hypothesis testing, which greatly reduces the error in the subspace dimension estimate. The method exploits a property of the noise subspace itself: in a white-noise subspace, the noise power should be equal along every basis vector. Because the eigenvalues in Λ_X are sorted in descending order, the noise subspace dimension is first assumed to be P = 1 and is then increased one step at a time, each time testing whether the last L - Q eigenvalues of Λ_X are equal; the largest value that satisfies the equality condition is taken as the noise subspace dimension P. This gives a fairly accurate estimate of the noise subspace dimension, from which the signal subspace dimension Q follows.

Based on this idea, the invention estimates the noise subspace dimension with a hypothesis test whose null hypothesis is stated below (the alternative H_1 being that at least two of these eigenvalues differ):

Null hypothesis H_0: the last L - Q eigenvalues of Λ_X are all equal;

Assuming the eigenvalues are Gaussian distributed, the distribution functions are:

F(H_{0}) = (2π)^{-\frac{L-Q}{2}} \left( \frac{1}{L-Q} \sum_{i=Q+1}^{L} λ_{X_{i}} \right)^{-\frac{L-Q}{2}} e^{-\frac{1}{2} \mathrm{tr}[Λ_{m}]}

F(H_{1}) = (2π)^{-\frac{L-Q}{2}} \left( \prod_{i=Q+1}^{L} λ_{X_{i}} \right)^{-\frac{1}{2}} e^{-\frac{1}{2} \mathrm{tr}[Λ_{m}]} \qquad (10)

where

Λ_{m} = \mathrm{diag}(λ_{X_{Q+1}}, \ldots, λ_{X_{L}}),

tr[·] is the trace operator, and i ∈ {Q + 1, ..., L} is the eigenvalue index.

Let

\bar{λ} = \frac{1}{L-Q} \sum_{i=Q+1}^{L} λ_{X_{i}}, \qquad λ_{X_{i}} = \bar{λ} + h_{i},

where h_i is the deviation of λ_{X_i} from λ̄. Then:

\begin{aligned}
-2 \log \frac{F(H_{0})}{F(H_{1})} &= -\log \prod_{i=Q+1}^{L} λ_{X_{i}} + (L-Q) \log \bar{λ} \\
&= -\sum_{i=Q+1}^{L} \log \left( \frac{λ_{X_{i}}}{\bar{λ}} \right) \\
&= -\sum_{i=Q+1}^{L} \left( \frac{h_{i}}{\bar{λ}} - \frac{h_{i}^{2}}{2 \bar{λ}^{2}} + \cdots \right) \\
&\approx -\sum_{i=Q+1}^{L} \frac{h_{i}}{\bar{λ}} + \sum_{i=Q+1}^{L} \frac{h_{i}^{2}}{2 \bar{λ}^{2}} \\
&= \sum_{i=Q+1}^{L} \frac{h_{i}^{2}}{2 \bar{λ}^{2}}
\end{aligned} \qquad (11)

where i ∈ {Q + 1, ..., L} is the eigenvalue index. The last step uses Σ h_i = 0, which follows directly from the definition of λ̄.

The h_i approximately follow a Gaussian distribution with zero mean and variance 2λ̄^2, so each term h_i^2/(2λ̄^2) is approximately chi-square with one degree of freedom; the constraint Σ h_i = 0 removes one degree of freedom, and -2 log[F(H_0)/F(H_1)] therefore approximately follows a chi-square distribution with θ = L - Q - 1 degrees of freedom. With the confidence level α fixed, the largest value of L - Q satisfying

-2 \log [F(H_{0}) / F(H_{1})] \geq χ_{θ,α}^{2}

is taken as the noise subspace dimension P, from which the signal subspace dimension Q follows; χ_{θ,α}^2 is the lower bound of the acceptance region of the chi-square distribution with θ degrees of freedom at confidence level α.
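The statistic in (11) can be evaluated directly from the eigenvalues. The sketch below, assuming NumPy and descending-ordered eigenvalues, returns both the exact form -Σ log(λ_{X_i}/λ̄) and its second-order approximation Σ h_i^2/(2λ̄^2), together with θ = L - Q - 1. The exact form is never negative, because the arithmetic mean of positive numbers is at least their geometric mean, and the two forms agree closely when the tail eigenvalues are nearly equal. The function name is hypothetical, and the decision rule itself is left to the comparison described above.

```python
import numpy as np

def eigen_equality_statistic(lam_X, Q):
    """Eq. (11) for the last L-Q eigenvalues of lam_X (descending order assumed).

    Returns (exact, approx, dof): exact = -sum log(lam_i / lam_bar),
    approx = sum h_i^2 / (2 lam_bar^2) with h_i = lam_i - lam_bar, dof = theta = L - Q - 1."""
    tail = np.asarray(lam_X, dtype=float)[Q:]
    lam_bar = tail.mean()
    exact = -np.sum(np.log(tail / lam_bar))      # second line of Eq. (11)
    h = tail - lam_bar                           # deviations h_i, which sum to zero
    approx = np.sum(h ** 2) / (2.0 * lam_bar ** 2)  # last line of Eq. (11)
    return exact, approx, len(tail) - 1
```

For example, `eigen_equality_statistic([5.0, 1.05, 1.02, 1.0, 0.95], 1)` gives two nearly identical small values with θ = 3, while equal tail eigenvalues give exactly zero.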

Step c) provides a method for estimating the noise power spectrum in the noise subspace using conditional probability. To estimate the noise power spectrum accurately, the invention exploits the fact that the noise power in the noise subspace is smaller than the noisy-signal power in the signal subspace, and estimates the noise power spectrum as a conditional expectation. Two key quantities are defined first:

\bar{λ}_{N} = \frac{1}{L-Q} \sum_{i=Q+1}^{L} λ_{X_{i}}

and

\bar{λ}_{S+N} = \frac{1}{Q} \sum_{i=1}^{Q} λ_{X_{i}}

where λ̄_N averages the noise-subspace eigenvalues (i = Q + 1, ..., L) and λ̄_{S+N} averages the signal-subspace eigenvalues (i = 1, ..., Q). Since λ̄_N should be smaller than λ̄_{S+N}, the invention estimates the noise power spectrum by the conditional expectation:

\begin{aligned}
\hat{σ}_{N}^{2} &= E\left[ \bar{λ}_{N} \mid \bar{λ}_{N} < \bar{λ}_{S+N} \right] \\
&= \frac{\int_{0}^{\bar{λ}_{S+N}} x f_{\bar{λ}_{N}}(x) \, dx}{\int_{0}^{\bar{λ}_{S+N}} f_{\bar{λ}_{S+N}}(x) \, dx} \\
&= \frac{\int_{0}^{\bar{λ}_{S+N}} \frac{x^{2}}{\sqrt{2π \bar{λ}_{N}}} e^{-\frac{x^{2}}{2 \bar{λ}_{N}}} \, dx}{\int_{0}^{\bar{λ}_{S+N}} \frac{x}{\sqrt{2π \bar{λ}_{S+N}}} e^{-\frac{x^{2}}{2 \bar{λ}_{S+N}}} \, dx} \\
&= \frac{\sqrt{2π} \, \bar{λ}_{N} \left( 1 - e^{-\frac{\bar{λ}_{S+N}^{2}}{2 \bar{λ}_{N}}} \right)^{1/2} - \sqrt{\bar{λ}_{N}} \, \bar{λ}_{S+N} \, e^{-\frac{\bar{λ}_{S+N}^{2}}{2 \bar{λ}_{N}}}}{\sqrt{\bar{λ}_{S+N}} \left( 1 - e^{-\frac{\bar{λ}_{S+N}}{2}} \right)}
\end{aligned} \qquad (12)

Here f(·) is a probability density function. Over- or under-estimation of the noise subspace dimension introduces an error into the noise power spectrum estimate, which can be corrected with a compensation factor.
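The defining ratio of integrals in (12), E[Z | Z < b] = ∫_0^b x f(x) dx / ∫_0^b f(x) dx, can be evaluated numerically for any assumed density, which is a convenient way to sanity-check a closed form. In the sketch below the density is a caller-supplied function; the exponential density in the example and the helper names are illustrative assumptions, not the density the patent uses. `subspace_means` computes λ̄_N and λ̄_{S+N} as defined above.

```python
import math

def subspace_means(lam_X, Q):
    """lambda_bar_N (mean of the L-Q smallest eigenvalues) and lambda_bar_{S+N} (mean of the Q largest)."""
    tail, head = lam_X[Q:], lam_X[:Q]
    return sum(tail) / len(tail), sum(head) / len(head)

def truncated_mean(pdf, upper, n=20000):
    """E[Z | 0 <= Z < upper] = int_0^upper x f(x) dx / int_0^upper f(x) dx, by the trapezoidal rule."""
    dx = upper / n
    xs = [i * dx for i in range(n + 1)]
    fs = [pdf(x) for x in xs]
    num = dx * (sum(x * f for x, f in zip(xs, fs)) - 0.5 * (xs[0] * fs[0] + xs[-1] * fs[-1]))
    den = dx * (sum(fs) - 0.5 * (fs[0] + fs[-1]))
    return num / den

lam_X = [6.0, 4.0, 1.1, 0.9]                       # descending eigenvalues, hypothetical values
lam_N, lam_SN = subspace_means(lam_X, Q=2)          # 1.0 and 5.0
m = lam_N                                           # assumed exponential density with mean lam_N
estimate = truncated_mean(lambda x: math.exp(-x / m) / m, upper=lam_SN)
```

Truncating at λ̄_{S+N} can only pull the estimate below the unconditional mean λ̄_N, which is the intended effect of conditioning on λ̄_N < λ̄_{S+N}.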

Step d) provides a method for estimating the auditory masking threshold based on the signal subspace. Noise kept below this threshold is masked, which achieves enhancement of the target speech.

The human auditory frequency range, 0 to 15500 Hz, covers 24 critical sub-bands, and the auditory masking threshold must be computed in each sub-band. First, the energy of each frequency in each sub-band is computed; next, the spreading coefficient of the basilar membrane of the human ear for sound in each band is computed, and the energies are multiplied by the spreading coefficients to obtain the excitation energy on the basilar membrane. Finally, the masking threshold is computed from the functional relationship between the basilar-membrane excitation energy and the auditory masking threshold.
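The masking-threshold steps above can be sketched as follows. The patent does not specify the Bark mapping, the spreading function, or the excitation-to-threshold relation, so this sketch assumes the Zwicker-Terhardt critical-band-rate formula, the Schroeder spreading function, and a fixed offset `offset_db` (a hypothetical parameter; tonality-dependent offsets are common in practice). With these choices, 0 to 15500 Hz spans about 24 Bark, consistent with the 24 critical sub-bands mentioned above.

```python
import numpy as np

def bark(f_hz):
    """Zwicker-Terhardt critical-band rate in Bark (assumed mapping)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def spreading_db(dz):
    """Schroeder spreading function in dB; dz = maskee band - masker band, in Bark (assumed)."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * np.sqrt(1.0 + (dz + 0.474) ** 2)

def masking_threshold(power, freqs, n_bands=24, offset_db=10.0, f_max=15500.0):
    """Per-bin masking threshold from a per-bin power spectrum; freqs are bin frequencies in Hz."""
    power = np.asarray(power, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    valid = freqs <= f_max
    band = np.minimum(bark(np.minimum(freqs, f_max)).astype(int), n_bands - 1)
    E = np.bincount(band[valid], weights=power[valid], minlength=n_bands)  # energy per critical band
    idx = np.arange(n_bands)
    S = 10.0 ** (spreading_db(idx[:, None] - idx[None, :]) / 10.0)         # S[i, j]: band j spreading into band i
    C = S @ E                                                              # basilar-membrane excitation per band
    T_band = C * 10.0 ** (-offset_db / 10.0)                               # excitation lowered by a fixed offset
    return T_band[band] * valid                                            # back to bins; 0 above f_max
```

For a single 1 kHz tone, the resulting threshold peaks in the tone's own critical band and decays faster toward lower bands than toward higher ones, reflecting the asymmetry of the spreading function.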

Step e) provides a method for designing the linear filter according to the perceptual properties of the human auditory system. To apply the auditory masking effect in the eigenvalue domain, the auditory masking threshold C_thr must be mapped onto the eigenvalue domain. F. Jabloun and B. Champagne ("Incorporating the Human Hearing Properties in the Signal Subspace Approach for Speech Enhancement," IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 700-708, 2003) used the transformation from the frequency domain to the eigenvalue domain to give the following mapping of C_thr onto the eigenvalue domain:

θ = |U_{1}^{H}|^{2} C_{thr} \qquad (13)

where θ = [θ_1, ..., θ_Q]^T is the masking energy in the eigenvalue domain; noise whose energy lies below the masking energy is masked by the target speech.
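A sketch of the mapping (13), assuming that |U_1^H|^2 denotes the element-wise squared magnitude and that C_thr holds one threshold value per row of U_1 (an L-vector); the text does not spell out these dimensions, so both points are assumptions, as is the function name. For a unitary U each row of |U_1^H|^2 sums to 1, so θ_k is a weighted average of C_thr with weights given by how the energy of the k-th eigenvector is distributed.

```python
import numpy as np

def eigen_domain_threshold(U, Q, C_thr):
    """Eq. (13): theta = |U_1^H|^2 C_thr, with |.|^2 taken element-wise (assumption)."""
    U1 = U[:, :Q]                                        # L x Q signal-subspace basis
    return (np.abs(U1.conj().T) ** 2) @ np.asarray(C_thr, dtype=float)
```

A constant threshold therefore maps to the same constant for every eigenvector, and with U = I the mapping simply selects the first Q entries of C_thr.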

Next, the residual noise energy in the enhanced speech must be computed, so that it can be kept below the masking energy and masked by the target speech. The residual noise N̂ is obtained by linearly filtering the noise component of the noisy input, that is:

\hat{N} = H N

The power spectral matrix of the residual noise N̂ is:

\begin{aligned}
R_{\hat{N}} &= E[\hat{N} \hat{N}^{H}] \\
&= E[H N N^{H} H^{H}] \\
&= H R_{N} H^{H} \\
&= U G U^{H} (\tilde{σ}_{N}^{2} I) U G^{H} U^{H} \\
&= U Λ_{\hat{N}} U^{H}
\end{aligned} \qquad (14)

where I is the L × L identity matrix, σ̃_N^2 is the corrected noise power spectrum estimate (see (22)), and

Λ_{\hat{N}} = Λ_{S} (Λ_{S} + \tilde{σ}_{N}^{2} Λ_{μ})^{-1} \tilde{σ}_{N}^{2} I \left[ Λ_{S} (Λ_{S} + \tilde{σ}_{N}^{2} Λ_{μ})^{-1} \right]^{H}

is an L × L diagonal matrix whose i-th diagonal element is:

λ_{\hat{N}_{i}} = \left( \frac{λ_{S_{i}}}{λ_{S_{i}} + μ_{i} \tilde{σ}_{N}^{2}} \right)^{2} \tilde{σ}_{N}^{2}, \quad i ∈ \{1, \ldots, L\} \qquad (15)

To mask the noise, we require

λ_{\hat{N}_{i}} \leq θ_{i},

where θ_i is the i-th masking energy value in the eigenvalue domain and i is its index. Solving for μ_i gives:

μ_{i} \geq \frac{λ_{S_{i}} (\tilde{σ}_{N} - θ_{i}^{1/2})}{\tilde{σ}_{N}^{2} \, θ_{i}^{1/2}} \qquad (16)

where σ̃_N = (σ̃_N^2)^{1/2}. Since μ_i ≥ 0 is required, this embodiment takes:

μ_{i} = \begin{cases} \frac{λ_{S_{i}} (\tilde{σ}_{N} - θ_{i}^{1/2})}{\tilde{σ}_{N}^{2} \, θ_{i}^{1/2}}, & \tilde{σ}_{N} - θ_{i}^{1/2} \geq 0 \\ 0, & \tilde{σ}_{N} - θ_{i}^{1/2} < 0 \end{cases} \qquad (17)

where i ∈ {1, ..., L} is the index.

Substituting (17) into (9) gives the estimate of the diagonal entries g_i of G:

g_{i} = \begin{cases} \frac{1}{1 + \max(\tilde{σ}_{N} / θ_{i}^{1/2} - 1, \, 0)}, & i = 1, \ldots, Q \\ 0, & i = Q + 1, \ldots, L \end{cases} \qquad (18)

where i ∈ {1, ..., L} is the index.

Substituting G into (7) yields the required linear filter H.
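Equations (15) to (18) can be checked numerically. The sketch below, assuming NumPy and strictly positive masking energies θ_i, computes μ_i from (17) and g_i from (18), then confirms that the gains agree with (9) and that the residual-noise eigenvalues (15), which equal g_i^2 σ̃_N^2, do not exceed θ_i. The function name and example values are hypothetical. Note that (18) reduces to g_i = min(1, θ_i^{1/2}/σ̃_N) for i ≤ Q.

```python
import numpy as np

def perceptual_gains(lam_S, theta, sigma2, Q):
    """Eq. (17) Lagrange multipliers and Eq. (18) gains; theta holds the Q masking energies."""
    lam_S = np.asarray(lam_S, dtype=float)
    r = np.sqrt(np.asarray(theta, dtype=float)[:Q])   # theta_i^{1/2}, assumed > 0
    sigma = np.sqrt(sigma2)                           # tilde sigma_N
    L = len(lam_S)
    mu, g = np.zeros(L), np.zeros(L)
    active = sigma - r >= 0                           # without suppression, noise would exceed the mask
    mu[:Q] = np.where(active, lam_S[:Q] * (sigma - r) / (sigma2 * r), 0.0)
    g[:Q] = 1.0 / (1.0 + np.maximum(sigma / r - 1.0, 0.0))
    return mu, g

lam_S = np.array([4.0, 2.0, 0.0, 0.0])
theta = np.array([0.1, 1.0])
sigma2, Q = 0.5, 2
mu, g = perceptual_gains(lam_S, theta, sigma2, Q)
residual = g[:Q] ** 2 * sigma2                        # Eq. (15), since g_i = lam_S_i / (lam_S_i + mu_i sigma2)
```

In the example the first eigen-direction needs suppression (g_1 = (0.1/0.5)^{1/2}, with the residual sitting exactly on the mask), while the second already lies below its mask and passes with g_2 = 1.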

Fig. 1 gives the flowchart of a microphone-array post-filtering speech enhancement method based on multiple statistical models and human auditory properties. The system includes a microphone array of at least two microphones 101. The microphones can be arranged in different ways; in particular, microphones 101 are placed in a row with a preset distance between adjacent microphones, for example about 5 cm. The microphone array may be configured differently for different application environments and technical requirements.

The speech signals acquired by microphones 101 are sent to signal processing unit 102. Before that, they may be preprocessed with a low-pass filter.

Signal processing unit 102 performs delay compensation on the speech signals from the different microphones to achieve time-domain alignment. Each aligned microphone signal is transformed with the short-time discrete Fourier transform into complex-valued frequency-domain form; the power spectral matrix of the array input signals is computed, and its eigenvalue decomposition yields the eigenvalue matrix and the eigenvector matrix.

In the next step, 103, a hypothesis test is applied to the eigenvalue matrix Λ_X of the power spectral matrix to determine the signal subspace dimension.

Step 104 then estimates the noise power spectrum in the noise subspace as a conditional expectation, exploiting the fact that the noise power in the noise subspace is smaller than the noisy-signal power in the signal subspace.

Step 105 uses the signal subspace dimension obtained in step 103 and the noise power spectrum estimate obtained in step 104 to estimate, based on the masking effect of the human auditory system, the auditory masking threshold of each frequency from the signal subspace.

Step 106 uses the noise power spectrum estimate from step 104 and the auditory masking threshold from step 105 to estimate the linear filter by choosing the Lagrange multipliers, achieving signal subspace speech enhancement for the microphone array based on auditory perception properties.

Fig. 2 describes the flow of the method for determining the signal subspace dimension, corresponding to step 103 in Fig. 1.

Before this method runs, steps 101 and 102 have time-aligned the speech acquired by the microphone array, applied the short-time Fourier transform, computed the signal power spectral matrix, and performed its eigenvalue decomposition to obtain the eigenvalue matrix and the eigenvector matrix. By (4), the eigenvalues of the noisy-signal power spectral matrix decompose into the sum of the signal power spectrum eigenvalues and the noise power spectrum eigenvalues, where Q is the signal subspace dimension.

Step 201 initializes Q to L - 1, i.e., P = 1.

Next, step 202 updates the value of -2 log[F(H_0)/F(H_1)] using (11).

Since -2 log[F(H_0)/F(H_1)] approximately follows a chi-square distribution with θ = L - Q - 1 degrees of freedom, step 203, with a preset confidence level α, checks whether -2 log[F(H_0)/F(H_1)] is greater than χ_{θ,α}^2. If the condition holds, step 204 decrements Q by one; otherwise the method proceeds to step 205. Decrementing Q increases the noise subspace dimension P step by step; after the decrement, the method returns to step 202.

Step 205 in effect finds the largest value of L - Q satisfying

-2 \log [F(H_{0}) / F(H_{1})] \geq χ_{θ,α}^{2}

which is the noise subspace dimension P; the signal subspace dimension Q is then determined as:

\underset{Q}{\arg\max} \left( θ \;\middle|\; -2 \log [F(H_{0}) / F(H_{1})] \geq χ_{θ,α}^{2} \right) \qquad (19)

where argmax(·) returns the argument that maximizes its expression.

Fig. 3 describes a flowchart for estimating the noise power spectrum in the noise subspace by the conditional-probability method, corresponding to step 104 in Fig. 1.

For the spectrum of estimating noise power exactly, consider that the noise power spectrum in the noise subspace is less than the characteristics of the signals with noise power spectrum in the signal subspace, the signal subspace dimension Q that utilizes step 205 to obtain, step 301 is calculated two important parameters

$$\bar{\lambda}_N = \frac{1}{L-Q}\sum_{i=Q+1}^{L}\lambda_{X_i}$$

and

$$\bar{\lambda}_{S+N} = \frac{1}{Q}\sum_{i=1}^{Q}\lambda_{X_i}$$

where $i \in \{1,\dots,L\}$ is an index.

Because $\bar{\lambda}_N \le \bar{\lambda}_{S+N}$, step 302 estimates the noise power spectrum by conditional probability, rewriting equation (12) here as:

$$\hat{\sigma}_N^2 = \frac{\sqrt{2\pi}\,\bar{\lambda}_N\left(1 - e^{-\frac{\bar{\lambda}_{S+N}^2}{2\bar{\lambda}_N}}\right)^{\frac{1}{2}} - \sqrt{\bar{\lambda}_N}\,\bar{\lambda}_{S+N}\,e^{-\frac{\bar{\lambda}_{S+N}^2}{2\bar{\lambda}_N}}}{\sqrt{\bar{\lambda}_{S+N}}\left(1 - e^{-\frac{\bar{\lambda}_{S+N}}{2}}\right)} \qquad (20)$$
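Steps 301 and 302 can be sketched in plain Python as follows. The function evaluates $\bar{\lambda}_N$, $\bar{\lambda}_{S+N}$ and equation (20) term by term exactly as printed above; it assumes the eigenvalues are already sorted in descending order (so the first Q belong to the signal subspace), and the name `noise_psd_conditional` is hypothetical.

```python
import math
from typing import Sequence, Tuple

def noise_psd_conditional(eigvals: Sequence[float], Q: int) -> Tuple[float, float, float]:
    """Steps 301-302: subspace mean eigenvalues and equation (20).

    eigvals: eigenvalues lambda_{X_i} in descending order (length L).
    Q: signal-subspace dimension from step 205.
    """
    L = len(eigvals)
    lam_n = sum(eigvals[Q:]) / (L - Q)   # mean of lambda_{X_i}, i = Q+1..L
    lam_sn = sum(eigvals[:Q]) / Q        # mean of lambda_{X_i}, i = 1..Q
    e_sq = math.exp(-lam_sn ** 2 / (2.0 * lam_n))   # exp(-lam_sn^2 / (2 lam_n))
    num = (math.sqrt(2.0 * math.pi) * lam_n * math.sqrt(1.0 - e_sq)
           - math.sqrt(lam_n) * lam_sn * e_sq)
    den = math.sqrt(lam_sn) * (1.0 - math.exp(-lam_sn / 2.0))
    return lam_n, lam_sn, num / den
```

For example, eigenvalues `[10.0, 8.0, 1.0, 1.0]` with Q = 2 give $\bar{\lambda}_N = 1$ and $\bar{\lambda}_{S+N} = 9$; descending order guarantees $\bar{\lambda}_N \le \bar{\lambda}_{S+N}$.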

Over- or under-estimating the noise-subspace dimension introduces an error into the noise-power-spectrum estimate; this error can be corrected with a compensation factor. Step 303 computes the compensation factor B(Q):

$$B(Q) = \frac{E\left[\hat{\sigma}_N^2\right]}{\bar{\sigma}_N^2} \qquad (21)$$

where $\bar{\sigma}_N^2$ is an estimate of the noise power spectrum that can be obtained with a voice activity detection (VAD) method.

Step 304 uses the compensation factor to correct the noise-power-spectrum estimate:

$$\tilde{\sigma}_N^2 = \frac{1}{B(Q)}\,\hat{\sigma}_N^2 \qquad (22)$$
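A short sketch of steps 303 and 304. The text does not say how the expectation $E[\hat{\sigma}_N^2]$ is obtained in practice; this sketch assumes it is approximated by the mean of recent estimates from equation (20), and that $\bar{\sigma}_N^2$ comes from a separate VAD-based estimator. The function and parameter names are hypothetical.

```python
from statistics import fmean
from typing import Sequence

def compensate_noise_psd(sigma_hat_sq: float,
                         sigma_hat_history: Sequence[float],
                         sigma_vad_sq: float) -> float:
    """Steps 303-304: compensation factor B(Q) and corrected estimate.

    sigma_hat_sq: current estimate from equation (20).
    sigma_hat_history: recent estimates used to approximate E[sigma_hat^2]
                       (assumption; the source does not specify this).
    sigma_vad_sq: VAD-based noise power estimate, sigma_bar_N^2.
    """
    b_q = fmean(sigma_hat_history) / sigma_vad_sq   # equation (21)
    return sigma_hat_sq / b_q                        # equation (22)
```

If the estimates from (20) are on average twice the VAD estimate, B(Q) = 2 and every estimate is halved.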

Fig. 4 shows the flowchart of a method for computing the masking threshold of the human auditory system. This procedure corresponds to step 105 in Fig. 1. To mask the noise in the signal, and thereby enhance the target speech signal, the noise must be kept below this threshold.

Estimating the intensity of the target speech signal requires the basis vectors of the signal subspace. Therefore, using the signal-subspace dimension obtained in step 205, the eigenvector matrix U is partitioned into two submatrices, $U_1$ and $U_2$, where $U_1 \in \mathbb{C}^{L\times Q}$ is a basis of the signal subspace and $U_2 \in \mathbb{C}^{L\times(L-Q)}$ is a basis of the noise subspace.

The frequency range of human hearing considered here is 0 to 15500 Hz, which spans several critical bands; step 401 divides it into 24 subbands. The auditory masking threshold must be computed in each subband.
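The text fixes only the range (0 to 15500 Hz) and the number of subbands (24). A natural reading is Zwicker's critical bands, whose upper edges end at 15500 Hz; the sketch below assumes those edges, and `BARK_EDGES_HZ` and `subband_index` are hypothetical names.

```python
import bisect

# Assumption: Zwicker critical-band edges in Hz (25 edges -> 24 subbands).
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]

def subband_index(freq_hz: float) -> int:
    """Return the 1-based subband index j in {1, ..., 24} for a frequency in Hz."""
    if not 0 <= freq_hz <= BARK_EDGES_HZ[-1]:
        raise ValueError("frequency outside 0-15500 Hz")
    # bisect_right counts the edges <= freq_hz; the top edge 15500 Hz is
    # clamped into band 24.
    return min(bisect.bisect_right(BARK_EDGES_HZ, freq_hz), 24)
```

For an N-point STFT at sampling rate fs, bin k lies at k·fs/N Hz, so `subband_index(k * fs / N)` assigns each bin b to its subband j.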

Let E(j, b) denote the energy at the b-th frequency bin of the j-th subband; it can be computed from the eigenvalues and eigenvectors of the signal subspace. Step 402 computes the energy at each frequency bin:

$$E(j,b) = \operatorname{mean}\left(\frac{1}{L}\sum_{i=1}^{Q}\lambda_{S_i}\,\left|U_{1,i}\right|^2\right) \qquad (23)$$

where

$$\lambda_{S_i} = \lambda_{X_i} - \tilde{\sigma}_N^2$$

is the estimate of the eigenvalues of the target-speech power spectral matrix, $U_{1,i}$ is the i-th basis vector of the signal subspace, $i \in \{1,\dots,Q\}$ is an index, and $\operatorname{mean}(\cdot)$ is the averaging operator.
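A NumPy sketch of equation (23) for one frequency bin, including the split of U into $U_1$ (first Q columns). It assumes eigenvalues and eigenvectors are in descending order and reads $|U_{1,i}|^2$ as the element-wise squared magnitude of the i-th basis vector, so mean(·) averages over its L entries. Flooring negative $\lambda_{S_i}$ at zero is an added safeguard not specified in the source; the function name is hypothetical.

```python
import numpy as np

def bin_energy(eigvals: np.ndarray, U: np.ndarray, Q: int,
               sigma_tilde_sq: float) -> float:
    """Equation (23): energy E(j, b) of one frequency bin b.

    eigvals: lambda_{X_i} in descending order; U: matching L x L eigenvectors.
    """
    L = U.shape[0]
    U1 = U[:, :Q]                             # signal-subspace basis, L x Q
    lam_s = eigvals[:Q] - sigma_tilde_sq      # lambda_{S_i} = lambda_{X_i} - sigma_tilde^2
    lam_s = np.maximum(lam_s, 0.0)            # assumption: floor negative estimates at 0
    # (1/L) * sum_i lambda_{S_i} |U_{1,i}|^2 is a length-L vector (one entry per mic).
    weighted = (np.abs(U1) ** 2) @ lam_s / L
    return float(np.mean(weighted))           # mean(.) over the L entries
```

Since each eigenvector has unit norm, this reading of (23) reduces to $\frac{1}{L^2}\sum_{i=1}^{Q}\lambda_{S_i}$.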

SF(j) is the function describing the spreading characteristic of the basilar membrane of the human ear in the j-th subband, $j \in \{1,\dots,24\}$.

In step 403, the spreading function of each subband is computed:

$$SF(j) = 15.81 + 7.5\,(j+0.474) - 17.5\sqrt{1+(j+0.474)^2}, \quad j \in \{1,\dots,24\} \qquad (24)$$
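Equation (24) evaluated for all 24 subbands. The same expression appears in perceptual audio coding as the Schroeder spreading function, where its values are in dB; the function name below is hypothetical.

```python
import math

def spreading_function(j: int) -> float:
    """Equation (24): SF(j) for subband j in {1, ..., 24}."""
    if not 1 <= j <= 24:
        raise ValueError("j must be in 1..24")
    x = j + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

# Spreading value for each subband j = 1..24 (list index j - 1).
SF = [spreading_function(j) for j in range(1, 25)]
```

For j = 1 this gives about -4.306, and the values decrease monotonically as j grows.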

Next, step 404 computes the excitation energy on the basilar membrane of the human ear:

$$C(j,b) = SF(j)\cdot E(j,b), \quad j \in \{1,\dots,24\} \qquad (25)$$

Step 405 computes the auditory masking threshold:

$$C_{thr} = 10^{\log_{10}\left|C(j,b)\right| - \left|\frac{O(j)}{10}\right| - \left|\frac{\tilde{\sigma}_N^2}{10}\right|} \qquad (26)$$

where O(j) is an offset and $j \in \{1,\dots,24\}$ denotes the j-th subband.
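Steps 404 and 405 for one bin can be sketched as follows, evaluating (25) as a plain product (claim 8 describes multiplying the energy by the spreading coefficient) and (26) exactly as printed. The offset values O(j) are not given in this section, so `offset_j` must be supplied by the caller; the function name is hypothetical, and C(j, b) must be nonzero for the logarithm to exist.

```python
import math

def masking_threshold(sf_j: float, e_jb: float, offset_j: float,
                      sigma_tilde_sq: float) -> float:
    """Equations (25) and (26) for one bin b in subband j.

    sf_j: SF(j); e_jb: E(j, b); offset_j: O(j) (caller-supplied);
    sigma_tilde_sq: corrected noise power from equation (22).
    """
    c_jb = sf_j * e_jb                         # (25): excitation energy
    exponent = (math.log10(abs(c_jb))          # log10 |C(j, b)|
                - abs(offset_j / 10.0)         # |O(j) / 10|
                - abs(sigma_tilde_sq / 10.0))  # |sigma_tilde_N^2 / 10|
    return 10.0 ** exponent                    # (26): C_thr
```

With zero offset and zero noise power the threshold equals |C(j, b)|; each additional 10 units of |O(j)| lowers it by a factor of 10.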

Fig. 5 shows the flowchart for estimating the linear filter. This procedure corresponds to step 106 in Fig. 1.

To apply the auditory masking effect in the eigenvalue domain, the auditory masking threshold $C_{thr}$ must be mapped onto the eigenvalue domain. Step 501 uses the frequency-domain-to-eigenvalue-domain transformation of equation (13) to compute the auditory masking threshold in the eigenvalue domain, $\theta = [\theta_1, \dots, \theta_Q]^H$.

Next, step 502 uses equation (18) to estimate the diagonal entries $g_i$ of the diagonal matrix G, where $i \in \{1,\dots,L\}$ indexes the diagonal entries.

Finally, step 503 substitutes the matrix G into equation (7) to obtain the required linear filter H.
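Equations (7) and (18) are not reproduced in this section. As a hedged sketch only, the code below assumes that (7) has the form commonly used in signal-subspace estimators, $H = U G U^H$ with $G = \operatorname{diag}(g_1, \dots, g_L)$, and leaves the gains $g_i$ from (18) to the caller; the function name is hypothetical.

```python
import numpy as np

def linear_filter(U: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Assumed form of equation (7): H = U G U^H with G = diag(g).

    U: L x L eigenvector matrix of R_X (columns in descending eigenvalue order).
    g: length-L gains g_i from step 502 (caller-supplied).
    The enhanced snapshot of a bin would then be H @ x for an L-channel input x.
    """
    return U @ np.diag(g) @ U.conj().T
```

Under this assumption, unit gains leave the input unchanged, while gains of 1 on the signal subspace and 0 on the noise subspace reduce H to the orthogonal projector onto the signal subspace.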

From this specification, further modifications and variations of the present invention will be apparent to those skilled in the art. Accordingly, this description is to be regarded as illustrative, its purpose being to teach those skilled in the art the general manner of carrying out the invention. It should be understood that the forms of the invention shown and described herein are to be taken as the presently preferred embodiments.

Claims

1. A signal subspace microphone array speech enhancement method based on auditory perception properties, comprising the following steps:

Step E: estimating a linear filter from the noise power spectrum and the auditory masking threshold in combination with a Lagrange multiplier, thereby achieving signal subspace microphone array speech enhancement based on auditory perception properties, wherein the step of estimating the linear filter in combination with the Lagrange multiplier is as follows:

2. The signal subspace microphone array speech enhancement method according to claim 1, characterized in that performing eigenvalue decomposition on the power spectral matrix comprises:

Let the noisy speech signal X be X = S + N.

Then the power spectral matrix $R_X$ is expressed as:

$$R_X = U\Lambda_X U^H = U\left(\Lambda_S + \sigma_N^2 I\right)U^H$$

where S is the target speech signal, N is the noise, $R_X$ is the power spectral matrix of the noisy speech signal, $\Lambda_X$ is the eigenvalue matrix of the noisy-speech power spectrum with eigenvalues sorted in descending order, $\Lambda_S$ is the eigenvalue matrix of the target-speech power spectrum with eigenvalues sorted in descending order, U is the eigenvector matrix, $\sigma_N^2$ is the white-noise power, I is the L×L identity matrix, and $[\cdot]^H$ is the conjugate transpose operator.

3. The signal subspace microphone array speech enhancement method according to claim 1, characterized in that the hypothesis test takes the minimum value of the signal-subspace dimension Q under the premise that the null hypothesis $H_0$ holds, namely that the last L-Q eigenvalues of the eigenvalue matrix $\Lambda_X$ are all equal.

4. The signal subspace microphone array speech enhancement method according to claim 3, characterized in that the step of judging whether the null hypothesis holds comprises:

Null hypothesis $H_0$: the last L-Q eigenvalues of the eigenvalue matrix $\Lambda_X$ are all equal;

the signal-subspace dimension is determined as:

$$\underset{Q}{\arg\max}\left(\theta \;\middle|\; -2\log\left[F(H_0)/F(H_1)\right] \ge \chi^2_{\theta,\alpha}\right)$$

where the largest value of L-Q satisfying this condition is the noise-subspace dimension P, $\arg\max(\cdot)$ is the operator that returns the argument value attaining the maximum,

and $\chi^2_{\theta,\alpha}$ is the lower bound of the acceptance region of the chi-square distribution with θ degrees of freedom at confidence level α.

5. The signal subspace microphone array speech enhancement method according to claim 4, characterized in that the eigenvalue distribution functions $F(H_0)$ and $F(H_1)$ adopt a Gaussian model.

6. The signal subspace microphone array speech enhancement method according to claim 1, characterized in that the error in the noise-power-spectrum estimate caused by misestimating the noise-subspace dimension is compensated with a compensation factor; the compensation factor is the ratio of the expected value of the noise-power-spectrum estimate to the estimated noise power; and the noise-power-spectrum estimate is divided by the compensation factor to obtain the corrected noise-power-spectrum estimate.

7. The signal subspace microphone array speech enhancement method according to claim 1, characterized in that the step of estimating the auditory masking threshold comprises:

8. The signal subspace microphone array speech enhancement method according to claim 7, characterized in that computing the auditory masking threshold in each subband comprises: computing the energy of each frequency bin in each subband; computing the spreading coefficient of the basilar membrane of the human ear for sound in each band; multiplying the energy of each frequency bin in each subband by the spreading coefficient of the corresponding band to obtain the excitation energy on the basilar membrane of the human ear; and then computing the masking threshold from the functional relationship between the basilar-membrane excitation energy and the auditory masking threshold.