CN101777349B - Auditory perception property-based signal subspace microphone array voice enhancement method - Google Patents

Publication number
CN101777349B
CN101777349B CN2009102498006A CN200910249800A CN101777349B CN 101777349 B CN101777349 B CN 101777349B CN 2009102498006 A CN2009102498006 A CN 2009102498006A CN 200910249800 A CN200910249800 A CN 200910249800A CN 101777349 B CN101777349 B CN 101777349B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009102498006A
Other languages
Chinese (zh)
Other versions
CN101777349A (en)
Inventor
刘文举
程宁
李超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science

Abstract

The invention discloses a signal subspace microphone array voice enhancement method based on auditory perception properties. It extends the conventional microphone array signal subspace speech enhancement method by incorporating the auditory masking effect of the human ear. The core of the signal subspace algorithm is a well-estimated linear filter, which in turn depends on accurately estimating the dimension of the signal subspace and the noise power spectrum, and on reasonably estimating the Lagrange multipliers. The invention therefore provides the following solution: time-align the signals acquired by the microphone array, apply the short-time Fourier transform, and perform eigenvalue decomposition of the power spectrum matrix; determine the dimension of the noise subspace by hypothesis testing; estimate the noise power spectrum in the noise subspace by a conditional probability method; estimate the auditory masking threshold based on the signal subspace; and estimate the linear filter by combining the Lagrange multipliers with the auditory perception properties.

Description

Signal subspace microphone array voice enhancement method based on auditory perception properties
Technical field
The present invention relates to the signal subspace method, the masking effect of the human auditory system, and the design of post-filters for microphone arrays.
Background art
Microphone array speech enhancement methods have been studied extensively in recent years. Among them, the signal subspace algorithm is particularly good at removing additive broadband noise. The algorithm decomposes the noisy signal space into a signal subspace (containing the target speech signal and noise) and a noise subspace (containing only noise), and estimates the target speech signal in the signal subspace. The core of the signal subspace algorithm is a reasonable estimate of the linear filter, and one of its key points is the accurate estimation of the signal subspace dimension and the noise power spectrum. Research on signal subspace speech enhancement has shown that the method enhances speech well. Even so, removing the noise completely remains difficult. After denoising with the signal subspace algorithm, some residual noise usually remains in the enhanced speech, and this noise lowers the perceived quality of the speech. The auditory masking effect of the human ear, established through a large body of experiments, can be used to reduce the influence of this residual noise on the target speech signal. The auditory masking effect means that, under normal conditions, the target speech signal is strong and the background noise is relatively weak, so the human auditory system determines an auditory masking threshold in the frequency domain that depends on the specific target speech signal. If the residual noise after filtering is kept below this masking threshold, the ear cannot perceive it. After years of research, this auditory property has been applied effectively in speech enhancement methods. As long as the amount of residual noise in the enhanced speech is kept within a certain range, the noise is masked by the target speech signal and is not perceived by the ear, which enhances the target speech signal.
The principle of the signal subspace algorithm is to use eigenvalue decomposition to split the noisy signal space into two subspaces, a signal subspace (containing the target speech signal and noise) and a noise subspace (containing only noise), and then to recover the target speech signal in the signal subspace. The reasoning is as follows. A speech signal can be modeled as a linear combination of basis vectors. Usually some eigenvalues of the power spectrum matrix of the clean speech signal are very close to zero, which shows that the energy of the clean speech signal is distributed over only some of the basis vectors. The signal subspace algorithm assumes white noise (colored noise can be whitened by pre-whitening). All eigenvalues of white noise are positive, so the noise energy is distributed over all basis vectors of the noisy signal. The space spanned by the basis of the noisy signal can therefore be decomposed into a signal subspace (containing the target speech signal and noise) and a noise subspace (containing only noise). The target speech signal can then be recovered in the signal subspace, while the noise subspace can be ignored because it contains no target speech.
Suppose the frequency-domain representation of the noisy speech signal vector received by an array of L microphones is $X = [X_1, \ldots, X_L]^H$. The frequency-domain representation of the enhanced speech signal, obtained as a weighted sum of the array input signals, is:
$$Y = w^H X = w^H [S + N] \tag{1}$$
where $w = [w_1, \ldots, w_L]^H$ is the coefficient vector, $S$ is the target speech signal, $N$ is the noise, and $[\cdot]^H$ is the conjugate transpose operator.
Let $R_X$ be the power spectrum matrix of the noisy signal, $R_S$ the power spectrum matrix of the target speech signal, and $R_N$ the power spectrum matrix of the noise. Assuming that the target speech signal and the noise are uncorrelated:
$$R_X = R_S + R_N \tag{2}$$
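The step from the uncorrelatedness assumption to (2) can be spelled out as follows. This is a sketch that assumes the power spectrum matrices are defined as expectations of outer products, for example $R_X = E[XX^H]$, and that $S$ and $N$ are zero-mean, so that uncorrelated signals have vanishing cross terms:

```latex
\begin{aligned}
R_X &= E[XX^H] = E\big[(S+N)(S+N)^H\big] \\
    &= E[SS^H] + E[SN^H] + E[NS^H] + E[NN^H] \\
    &= R_S + R_N, \qquad \text{since } E[SN^H] = E[NS^H] = 0 .
\end{aligned}
```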
The eigenvalue decomposition of the target speech power spectrum matrix can be written as:
$$R_S = U \Lambda_S U^H \tag{3}$$
where $\Lambda_S$ is the diagonal matrix of eigenvalues sorted in descending order, whose rank is $Q$ (that is, its last $L-Q$ diagonal entries are 0), and $U$ is the corresponding eigenvector matrix.
Assuming that the noise is white with power spectrum $\sigma_N^2$, we have:
$$R_X = U \Lambda_X U^H = U(\Lambda_S + \sigma_N^2 I)U^H \tag{4}$$
where $\Lambda_X$ is the eigenvalue matrix of the noisy speech power spectrum, sorted in descending order, and $I$ is the identity matrix of order $L$.
The i-th eigenvalue $\lambda_X^i$ of $R_X$ and the i-th eigenvalue $\lambda_S^i$ of $R_S$ satisfy:
$$\lambda_S^i = \begin{cases} \lambda_X^i - \sigma_N^2, & i = 1, \ldots, Q \\ 0, & i = Q+1, \ldots, L \end{cases} \tag{5}$$
where $i \in \{1, \ldots, L\}$ is the eigenvalue index.
Let $H$ be a linear filter. The target speech signal is then estimated as:
$$\hat{S} = HX \tag{6}$$
In practice, the quality of the target speech signal recovered by the linear filter is determined by two factors: the distortion of the target speech signal and the level of the residual noise. In "A signal subspace approach for speech enhancement", IEEE Trans. Speech Audio Process., vol. 3, no. 4, pp. 251-266, Jul. 1995, Y. Ephraim and H. L. Van Trees minimized the speech distortion subject to the residual noise being limited to a given range, and obtained the following expression for the linear filter $H$:
$$H = U G U^H = U \begin{bmatrix} G_1 & 0 \\ 0 & 0 \end{bmatrix} U^H \tag{7}$$
where $G_1$ is a $Q \times Q$ nonsingular matrix.
$G$ can be written as:
$$G = \Lambda_S (\Lambda_S + \sigma_N^2 \Lambda_\mu)^{-1} \tag{8}$$
where $\Lambda_\mu = \mathrm{diag}(\mu_1, \ldots, \mu_L)$ is the Lagrange multiplier matrix of order $L$.
$G$ is a diagonal matrix of order $L$ whose diagonal entries $g_i$ are:
$$g_i = \begin{cases} \dfrac{\lambda_S^i}{\lambda_S^i + \mu_i \sigma_N^2}, & i = 1, \ldots, Q \\ 0, & i = Q+1, \ldots, L \end{cases} \tag{9}$$
where $\mu_i$ is the i-th Lagrange multiplier and $i \in \{1, \ldots, L\}$ is the index.
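As an illustration of (5) and (9), the following minimal Python sketch computes the diagonal gains from the noisy eigenvalues. The function name `subspace_gains`, its parameters, and the example numbers are hypothetical; clamping the clean eigenvalue at zero is an added safeguard that is not part of the formulas above.

```python
def subspace_gains(noisy_eigs, noise_power, multipliers, q):
    """Diagonal gains g_i of Eq. (9); noisy_eigs must be in descending order."""
    gains = []
    for i, (lam_x, mu) in enumerate(zip(noisy_eigs, multipliers)):
        if i < q:
            # Eq. (5): clean-speech eigenvalue (clamped at zero, an assumption)
            lam_s = max(lam_x - noise_power, 0.0)
            denom = lam_s + mu * noise_power
            gains.append(lam_s / denom if denom > 0 else 0.0)
        else:
            # Noise-subspace components are discarded
            gains.append(0.0)
    return gains


# Example with L = 4 microphones, Q = 2, white noise power 1 and all mu_i = 1
gains = subspace_gains([5.0, 3.0, 1.0, 1.0], noise_power=1.0,
                       multipliers=[1.0] * 4, q=2)
```

With all multipliers equal to 1 the gains reduce to the familiar Wiener-like form; larger multipliers suppress more residual noise at the cost of more speech distortion.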
Summary of the invention
To solve the problems of the prior art, the object of the present invention is to estimate the linear filter by using the auditory masking effect of the human ear to design a new linear filter based on auditory perception properties. To this end, the present invention provides a signal subspace microphone array voice enhancement method based on auditory perception properties.
The specific steps of the method are as follows:
Step a: acquire multi-channel noisy speech signals with the microphone array; time-align the noisy speech signals of all channels; use the short-time discrete Fourier transform to represent each aligned channel as a complex-valued frequency-domain signal; compute the power spectrum matrix of the multi-channel microphone array signals and perform eigenvalue decomposition on it to obtain the eigenvalue matrix and the eigenvector matrix;
Step b: perform a hypothesis test on the eigenvalue matrix of the power spectrum matrix to determine the signal subspace dimension Q;
Step c: in the noise subspace, exploit the property that the noise power spectrum in the noise subspace is smaller than the noisy-signal power spectrum in the signal subspace, and estimate the noise power spectrum as a conditional expectation;
Step d: using the noise subspace dimension P and the noise power spectrum estimate, and based on the masking effect of the human auditory system, estimate the auditory masking threshold of each frequency bin based on the signal subspace; the noise subspace dimension is P = L - Q, where L is the number of microphones in the array;
Step e: from the noise power spectrum and the auditory masking threshold, estimate the linear filter in combination with the Lagrange multipliers, thereby achieving signal subspace microphone array voice enhancement based on auditory perception properties.
The eigenvalue decomposition of the power spectrum matrix is performed as follows:
Let the noisy speech signal $X$ be $X = S + N$.
Then the power spectrum matrix $R_X$ is expressed as:
$$R_X = U \Lambda_X U^H = U(\Lambda_S + \sigma_N^2 I)U^H$$
where $S$ is the target speech signal, $N$ is the noise, $R_X$ is the power spectrum matrix of the noisy speech signal, $\Lambda_X$ is the eigenvalue matrix of the noisy speech power spectrum sorted in descending order, $\Lambda_S$ is the eigenvalue matrix of the target speech power spectrum sorted in descending order, $U$ is the eigenvector matrix, $\sigma_N^2$ is the white noise power, $I$ is the identity matrix of order $L$, and $[\cdot]^H$ is the conjugate transpose operator.
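The decomposition above can be sketched with NumPy for a single frequency bin. This is only an illustration under stated assumptions: the power spectrum matrix is estimated by averaging outer products over T STFT frames, and the function name `decompose_power_spectrum`, the array shapes, and the random test data are all hypothetical.

```python
import numpy as np


def decompose_power_spectrum(snapshots):
    """snapshots: complex array of shape (T, L), one STFT bin over T frames."""
    snapshots = np.asarray(snapshots, dtype=complex)
    # Sample estimate of R_X = E[X X^H], averaged over the T frames
    r_x = snapshots.T @ snapshots.conj() / snapshots.shape[0]
    # eigh handles Hermitian matrices and returns ascending real eigenvalues
    eigvals, eigvecs = np.linalg.eigh(r_x)
    order = np.argsort(eigvals)[::-1]  # reorder to descending
    return r_x, eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 4)) + 1j * rng.standard_normal((200, 4))
r_x, lam, u = decompose_power_spectrum(frames)
```

`numpy.linalg.eigh` is used rather than `eig` because the estimated matrix is Hermitian, which guarantees real eigenvalues and orthonormal eigenvectors.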
The hypothesis test takes the smallest signal subspace dimension Q for which the null hypothesis $H_0$, that the last $L-Q$ eigenvalues of the eigenvalue matrix $\Lambda_X$ are all equal, holds.
Whether the null hypothesis holds is decided as follows:
Null hypothesis $H_0$: the last $L-Q$ eigenvalues of $\Lambda_X$ are all equal;
Alternative hypothesis $H_1$: at least two of the last $L-Q$ eigenvalues of $\Lambda_X$ differ;
The signal subspace dimension is determined by:
$$\arg\max_Q \left( \theta \;\middle|\; -2\log[F(H_0)/F(H_1)] \ge \chi^2_{\theta,\alpha} \right)$$
In this formula, $-2\log[F(H_0)/F(H_1)]$ approximately follows a chi-square distribution with $\theta = L-Q-1$ degrees of freedom, $\alpha$ is the confidence level, and $F(H_0)$ and $F(H_1)$ are the distribution functions of the eigenvalues. That is, the largest value of $L-Q$ that satisfies $-2\log[F(H_0)/F(H_1)] \ge \chi^2_{\theta,\alpha}$ is taken as the noise subspace dimension P. Here $\arg\max(\cdot)$ is the operator that returns the argument with the maximum score, and $\chi^2_{\theta,\alpha}$ is the lower bound of the acceptance region of the chi-square distribution with $\theta$ degrees of freedom at confidence level $\alpha$.
The distribution functions $F(H_0)$ and $F(H_1)$ of the eigenvalues use a Gaussian model.
The error in the noise power spectrum estimate caused by misestimating the noise subspace dimension is corrected with a compensation factor. The compensation factor is the ratio of the expected value of the noise power spectrum estimate to an estimate of the noise power. Dividing the noise power spectrum estimate by the compensation factor gives the corrected noise power spectrum estimate.
The auditory masking threshold is estimated in the following steps:
Step ea: divide the human auditory frequency range, 0 to 15500 Hz, into critical bands;
Step eb: compute the auditory masking threshold in each critical band.
To compute the auditory masking threshold in each critical band, first compute the energy of each frequency bin in the band, then compute the spreading coefficients of the basilar membrane of the human ear for sound in each band, and multiply the bin energies by the spreading coefficients to obtain the excitation energy on the basilar membrane. The masking threshold is then computed from the functional relationship between the basilar membrane excitation energy and the auditory masking threshold.
The linear filter is estimated in combination with the Lagrange multipliers as follows:
Step e1: map the auditory masking threshold onto the eigenvalue domain using the transformation from the frequency domain to the eigenvalue domain;
Step e2: estimate the Lagrange multipliers so that the power spectrum eigenvalues of the residual noise after linear filtering are below the auditory masking threshold in the eigenvalue domain;
Step e3: design a linear filter H that minimizes the speech distortion while keeping the residual noise in the enhanced speech below the auditory masking threshold of the human ear, so that the effect of the residual noise is eliminated and the distortion of the target speech signal is minimized.
Beneficial effects of the present invention: conventional signal subspace methods usually determine the subspace dimension by setting a fixed threshold, so that the dimension of the signal subspace is the number of eigenvalues greater than this threshold. This works poorly in practice, because the threshold is set largely by hand and usually cannot adapt as the signal changes. As a result, the subspace dimension estimate often has large errors, which degrades the performance of the signal subspace method. To address this, the present invention determines the noise subspace dimension by hypothesis testing, which greatly reduces the error of the subspace dimension estimate. To estimate the noise power spectrum accurately, the present invention exploits the property that the noise power spectrum in the noise subspace is smaller than the noisy-signal power spectrum in the signal subspace, and estimates the noise power spectrum using conditional probability. The present invention estimates the auditory masking threshold based on the signal subspace; limiting the noise below this threshold allows it to be masked, which enhances the target speech signal. The present invention designs the linear filter according to the perceptual properties of the human auditory system; to apply the auditory masking effect in the eigenvalue domain, the auditory masking threshold $C_{thr}$ is mapped onto the eigenvalue domain.
Description of drawings
Further features and advantages of the present invention are described below with reference to the illustrative drawings.
Fig. 1 is an example flow chart of the signal subspace microphone array voice enhancement method based on auditory perception properties;
Fig. 2 is a flow chart for determining the noise subspace dimension by hypothesis testing;
Fig. 3 is a flow chart for estimating the noise power spectrum in the noise subspace by the conditional probability method;
Fig. 4 is a flow chart for computing the auditory masking threshold of the human ear;
Fig. 5 is a flow chart for estimating the linear filter.
Embodiments
It should be understood that the following detailed description of the examples and drawings is not intended to limit the present invention to the specific illustrative examples; the described examples merely illustrate the steps of the present invention, whose scope is defined by the appended claims.
The present invention uses the auditory masking effect of the human ear to design a new linear filter based on auditory perception properties. The auditory masking effect means that, under normal conditions, the target speech signal is strong and the background noise is relatively weak, so the human auditory system determines an auditory masking threshold in the frequency domain that depends on the specific target speech signal. If the residual noise after filtering is kept below this masking threshold, the ear cannot perceive it, which enhances the noisy speech signal.
Conventional signal subspace methods usually determine the subspace dimension by setting a fixed threshold, so that the dimension of the signal subspace is the number of eigenvalues greater than this threshold. This works poorly in practice, because the threshold is set largely by hand and usually cannot adapt as the signal changes. As a result, the subspace dimension estimate often has large errors, which degrades the performance of the signal subspace method.
To address this, step b) of the present invention determines the noise subspace dimension by hypothesis testing, which greatly reduces the error of the subspace dimension estimate. The method exploits a property of the noise subspace itself: in a white noise subspace, the noise power spectrum should be equal in every direction. Because the eigenvalues in $\Lambda_X$ are sorted in descending order, the noise subspace dimension is first assumed to be P = 1 and is then increased step by step, testing each time whether the last $L-Q$ eigenvalues in $\Lambda_X$ are equal. The largest value that satisfies the equality condition is taken as the noise subspace dimension P. This gives a fairly accurate estimate of the noise subspace dimension, from which the signal subspace dimension Q follows.
Based on this idea, the present invention estimates the noise subspace dimension with a hypothesis test, using the following null and alternative hypotheses:
Null hypothesis $H_0$: the last $L-Q$ eigenvalues of the eigenvalue matrix $\Lambda_X$ are all equal;
Alternative hypothesis $H_1$: at least two of the last $L-Q$ eigenvalues of $\Lambda_X$ differ;
Assuming that the eigenvalues follow a Gaussian distribution, the distribution functions can be written as:
$$\begin{aligned} F(H_0) &= (2\pi)^{-\frac{L-Q}{2}} \left( \frac{1}{L-Q} \sum_{i=Q+1}^{L} \lambda_X^i \right)^{-\frac{L-Q}{2}} e^{-\frac{1}{2}\mathrm{tr}[\Lambda_m]} \\ F(H_1) &= (2\pi)^{-\frac{L-Q}{2}} \left( \prod_{i=Q+1}^{L} \lambda_X^i \right)^{-\frac{1}{2}} e^{-\frac{1}{2}\mathrm{tr}[\Lambda_m]} \end{aligned} \tag{10}$$
where $\Lambda_m = \mathrm{diag}(\lambda_X^{Q+1}, \ldots, \lambda_X^{L})$, $\mathrm{tr}[\cdot]$ is the trace operator, and $i \in \{Q+1, \ldots, L\}$ is the eigenvalue index.
Let $\bar\lambda = \frac{1}{L-Q} \sum_{i=Q+1}^{L} \lambda_X^i$ and $\lambda_X^i = \bar\lambda + h_i$, where $h_i$ is the deviation of $\lambda_X^i$ from $\bar\lambda$. Then, using $\sum_{i=Q+1}^{L} h_i = 0$ in the last step:
$$\begin{aligned} -2\log\frac{F(H_0)}{F(H_1)} &= -\log \prod_{i=Q+1}^{L} \lambda_X^i + (L-Q)\log\bar\lambda \\ &= -\sum_{i=Q+1}^{L} \log\left( \frac{\lambda_X^i}{\bar\lambda} \right) \\ &= -\sum_{i=Q+1}^{L} \left( \frac{h_i}{\bar\lambda} - \frac{h_i^2}{2\bar\lambda^2} + \cdots \right) \\ &\approx -\sum_{i=Q+1}^{L} \frac{h_i}{\bar\lambda} + \sum_{i=Q+1}^{L} \frac{h_i^2}{2\bar\lambda^2} \\ &= \sum_{i=Q+1}^{L} \frac{h_i^2}{2\bar\lambda^2} \end{aligned} \tag{11}$$
where $i \in \{Q+1, \ldots, L\}$ is the eigenvalue index.
$h_i$ approximately follows a Gaussian distribution with zero mean and variance $2\bar\lambda^2$. Therefore $-2\log[F(H_0)/F(H_1)]$ approximately follows a chi-square distribution with $\theta = L-Q-1$ degrees of freedom. Given the confidence level $\alpha$, the largest value of $L-Q$ that satisfies $-2\log[F(H_0)/F(H_1)] \ge \chi^2_{\theta,\alpha}$ is taken as the noise subspace dimension P, from which the signal subspace dimension Q follows. Here $\chi^2_{\theta,\alpha}$ is the lower bound of the acceptance region of the chi-square distribution with $\theta$ degrees of freedom at confidence level $\alpha$.
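The final expression in (11) is simple to evaluate. The following minimal Python sketch computes the statistic and the degrees of freedom for a candidate Q; the function name `equality_statistic` and the example eigenvalues are hypothetical, and the comparison against the chi-square bound is left to the caller because the bound depends on the chosen confidence level.

```python
def equality_statistic(noisy_eigs, q):
    """Last line of Eq. (11) for the trailing L - Q eigenvalues, plus theta."""
    tail = noisy_eigs[q:]
    mean = sum(tail) / len(tail)           # lambda-bar
    # sum of h_i^2 / (2 * lambda-bar^2), with h_i = lambda_i - lambda-bar
    stat = sum((lam - mean) ** 2 for lam in tail) / (2.0 * mean ** 2)
    theta = len(tail) - 1                  # degrees of freedom L - Q - 1
    return stat, theta


eigs = [9.0, 4.0, 1.1, 1.0, 0.9]           # descending, L = 5
stat_q2, theta_q2 = equality_statistic(eigs, q=2)   # tail is nearly flat
stat_q1, theta_q1 = equality_statistic(eigs, q=1)   # tail includes a speech eigenvalue
```

In this example the statistic stays small while the trailing eigenvalues are nearly equal and grows sharply once an eigenvalue that carries speech energy enters the tail.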
Step c) provides a method for estimating the noise power spectrum in the noise subspace using conditional probability. To estimate the noise power spectrum accurately, the present invention exploits the property that the noise power spectrum in the noise subspace is smaller than the noisy-signal power spectrum in the signal subspace. Two parameters are defined first:
$$\bar\lambda_N = \frac{1}{L-Q} \sum_{i=Q+1}^{L} \lambda_X^i \quad \text{and} \quad \bar\lambda_{S+N} = \frac{1}{Q} \sum_{i=1}^{Q} \lambda_X^i$$
where $i$ is the eigenvalue index. $\bar\lambda_N$ should take a value smaller than $\bar\lambda_{S+N}$, so the present invention estimates the noise power spectrum with conditional probability as follows:
$$\begin{aligned} \hat\sigma_N^2 &= E\left[ \bar\lambda_N \mid \bar\lambda_N < \bar\lambda_{S+N} \right] \\ &= \frac{\int_0^{\bar\lambda_{S+N}} x f_{\bar\lambda_N}(x)\,dx}{\int_0^{\bar\lambda_{S+N}} f_{\bar\lambda_{S+N}}(x)\,dx} \\ &= \frac{\int_0^{\bar\lambda_{S+N}} \frac{x^2}{2\pi\bar\lambda_N} e^{-\frac{x^2}{2\bar\lambda_N}}\,dx}{\int_0^{\bar\lambda_{S+N}} \frac{x}{2\pi\bar\lambda_{S+N}} e^{-\frac{x^2}{2\bar\lambda_{S+N}}}\,dx} \\ &= \frac{2\pi\bar\lambda_N \left( 1 - e^{-\frac{\bar\lambda_{S+N}^2}{2\bar\lambda_N}} \right)^{\frac{1}{2}} - \bar\lambda_N \bar\lambda_{S+N} e^{-\frac{\bar\lambda_{S+N}^2}{2\bar\lambda_N}}}{\bar\lambda_{S+N} \left( 1 - e^{-\frac{\bar\lambda_{S+N}}{2}} \right)} \end{aligned} \tag{12}$$
In this formula, $f(\cdot)$ is a probability density function. Overestimating or underestimating the noise subspace dimension causes an error in the noise power spectrum estimate, which can be corrected with a compensation factor.
Step d) provides a method for estimating the auditory masking threshold based on the signal subspace. Limiting the noise below this threshold allows it to be masked, which enhances the target speech signal.
The human auditory frequency range, 0 to 15500 Hz, covers 24 critical bands, and the auditory masking threshold must be computed in each of them. First compute the energy of each frequency bin in each critical band, then compute the spreading coefficients of the basilar membrane of the human ear for sound in each band, and multiply the bin energies by the spreading coefficients to obtain the excitation energy on the basilar membrane. Finally, compute the masking threshold from the functional relationship between the basilar membrane excitation energy and the auditory masking threshold.
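The first of these stages, accumulating energy per critical band, can be sketched as follows. The band edges are the commonly tabulated Zwicker critical band edges, which are an assumption here because the text gives only the overall range and the number of bands. The function name `band_energies` and the flat test spectrum are hypothetical, and the spreading and threshold stages are not shown because the text does not specify their functions.

```python
# Commonly tabulated critical band edges in Hz (assumed; 24 bands up to 15500 Hz)
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                 6400, 7700, 9500, 12000, 15500]


def band_energies(power_spectrum, sample_rate):
    """Sum one-sided power spectrum bins (0 .. fs/2) into critical bands."""
    n_bins = len(power_spectrum)
    bin_hz = (sample_rate / 2.0) / (n_bins - 1)   # spacing between bins
    energies = [0.0] * (len(BARK_EDGES_HZ) - 1)
    for k, power in enumerate(power_spectrum):
        freq = k * bin_hz
        for b in range(len(energies)):
            if BARK_EDGES_HZ[b] <= freq < BARK_EDGES_HZ[b + 1]:
                energies[b] += power
                break
    return energies


# Flat spectrum from a 512-point DFT at 16 kHz: 257 one-sided bins
energies = band_energies([1.0] * 257, sample_rate=16000)
```

At a 16 kHz sampling rate the spectrum ends at 8 kHz, so the two highest bands receive no energy in this example.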
Step e) provides a method for designing the linear filter according to the perceptual properties of the human auditory system. To apply the auditory masking effect in the eigenvalue domain, the auditory masking threshold $C_{thr}$ must be mapped onto the eigenvalue domain. In "Incorporating the human hearing properties in the signal subspace approach for speech enhancement", IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 700-708, 2003, F. Jabloun and B. Champagne use the transformation from the frequency domain to the eigenvalue domain to give the following mapping of the auditory masking threshold $C_{thr}$ onto the eigenvalue domain:
$$\theta = \left| U_1^H \right|^2 C_{thr} \tag{13}$$
where $\theta = [\theta_1, \ldots, \theta_Q]^H$ is the masking energy in the eigenvalue domain. Noise below the masking energy is masked by the target speech signal.
Next, the residual noise energy in the enhanced speech must be computed, so that it can be kept below the masking energy and thus be masked by the target speech signal. The residual noise $\hat{N}$ is obtained by applying the linear filter to the noise in the noisy input signal, that is, $\hat{N} = HN$. The power spectrum matrix of the residual noise $\hat{N}$ is:
$$\begin{aligned} R_{\hat{N}} &= E[\hat{N}\hat{N}^H] \\ &= E[H N N^H H^H] \\ &= H R_N H^H \\ &= U G U^H (\tilde\sigma_N^2 I) U G^H U^H \\ &= U \Lambda_{\hat{N}} U^H \end{aligned} \tag{14}$$
where $I$ is the identity matrix of order $L$, and $\Lambda_{\hat{N}} = \Lambda_S(\Lambda_S + \tilde\sigma_N^2 \Lambda_\mu)^{-1}\, \tilde\sigma_N^2 I\, [\Lambda_S(\Lambda_S + \tilde\sigma_N^2 \Lambda_\mu)^{-1}]^H$ is a diagonal matrix of order $L$ whose i-th diagonal element is:
$$\lambda_{\hat{N}}^i = \left( \frac{\lambda_S^i}{\lambda_S^i + \mu_i \tilde\sigma_N^2} \right)^2 \tilde\sigma_N^2, \quad i \in \{1, \ldots, L\} \tag{15}$$
To mask the noise, we require $\lambda_{\hat{N}}^i \le \theta_i$, where $\theta_i$ is the i-th masking energy value in the eigenvalue domain and $i \in \{1, \ldots, L\}$ is the index of the masking energy values. This gives:
$$\mu_i \ge \frac{\lambda_S^i (\tilde\sigma_N - \theta_i^{1/2})}{\tilde\sigma_N^2 \cdot \theta_i^{1/2}} \tag{16}$$
Since $\mu_i \ge 0$ must also hold, the present embodiment takes:
$$\mu_i = \begin{cases} \dfrac{\lambda_S^i (\tilde\sigma_N - \theta_i^{1/2})}{\tilde\sigma_N^2\, \theta_i^{1/2}}, & \text{if } \tilde\sigma_N - \theta_i^{1/2} \ge 0 \\ 0, & \text{if } \tilde\sigma_N - \theta_i^{1/2} < 0 \end{cases} \tag{17}$$
where $i \in \{1, \ldots, L\}$ is the index.
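The bound (16) follows from the masking condition and (15) in two steps. This is a sketch that assumes $\lambda_S^i \ge 0$, $\mu_i \ge 0$, $\tilde\sigma_N > 0$ and $\theta_i > 0$, so that taking square roots preserves the inequality:

```latex
\begin{aligned}
\left( \frac{\lambda_S^i}{\lambda_S^i + \mu_i \tilde\sigma_N^2} \right)^2 \tilde\sigma_N^2 \le \theta_i
&\iff \lambda_S^i \, \tilde\sigma_N \le \theta_i^{1/2} \left( \lambda_S^i + \mu_i \tilde\sigma_N^2 \right) \\
&\iff \mu_i \ge \frac{\lambda_S^i \left( \tilde\sigma_N - \theta_i^{1/2} \right)}{\tilde\sigma_N^2 \, \theta_i^{1/2}} .
\end{aligned}
```

Choosing the smallest admissible multiplier keeps the speech distortion as low as the masking constraint allows.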
Substituting (17) into (9) gives the following estimate of the diagonal entries $g_i$ of the diagonal matrix $G$:
$$g_i = \begin{cases} \dfrac{1}{1 + \max\left( \tilde\sigma_N / \theta_i^{1/2} - 1,\; 0 \right)}, & i = 1, \ldots, Q \\ 0, & i = Q+1, \ldots, L \end{cases} \tag{18}$$
where $i \in \{1, \ldots, L\}$ is the index.
Substituting $G$ into (7) gives the required linear filter $H$.
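Formulas (18) and (7) can be sketched in plain Python as follows. The function names `perceptual_gains` and `build_filter`, the nested-list matrix representation, and the example values are hypothetical; the sketch assumes every masking energy $\theta_i$ is strictly positive.

```python
import math


def perceptual_gains(theta, noise_std, q, n_mics):
    """Eq. (18): gains from masking energies theta_1..theta_Q (all > 0)."""
    gains = []
    for i in range(n_mics):
        if i < q:
            excess = max(noise_std / math.sqrt(theta[i]) - 1.0, 0.0)
            gains.append(1.0 / (1.0 + excess))
        else:
            gains.append(0.0)   # noise subspace is discarded
    return gains


def build_filter(u, gains):
    """Eq. (7): H = U G U^H for an L x L matrix U given as nested lists."""
    n = len(u)
    return [[sum(u[a][i] * gains[i] * u[b][i].conjugate() for i in range(n))
             for b in range(n)] for a in range(n)]


theta = [4.0, 0.25]                      # masking energies, Q = 2
noise_std = 1.0                          # sigma-tilde_N
gains = perceptual_gains(theta, noise_std, q=2, n_mics=3)
identity = [[complex(a == b) for b in range(3)] for a in range(3)]
h = build_filter(identity, gains)        # with U = I, H equals G
```

In the example the first component is left untouched because the noise already lies below its masking energy, while the second is attenuated just enough that the residual noise eigenvalue $g_i^2 \tilde\sigma_N^2$ equals $\theta_i$.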
Fig. 1 shows a flow chart of an application of the microphone array post-filtering speech enhancement method based on a multiple statistical model and human hearing characteristics. The system comprises a microphone array of at least two microphones 101. The microphones of the array can be arranged in different ways; in particular, the microphones 101 are placed in a row, each at a preset distance from its neighbor. For example, the distance between two microphones may be approximately 5 cm. The microphone array may be arranged to suit different application environments and technical requirements.
The speech signals acquired by the microphones 101 are sent to the signal processing unit 102. Before being sent to the signal processing unit, the speech signals may be preprocessed by a low-pass filter.
The signal processing unit 102 applies delay compensation to the speech signals acquired by the different microphones to time-align them. Each aligned microphone signal is represented as a complex-valued frequency-domain signal using the short-time discrete Fourier transform; the power spectrum matrix of the microphone array input signals is computed and decomposed into its eigenvalues, giving the eigenvalue matrix and the eigenvector matrix.
In the following step 103, a hypothesis test is performed on the eigenvalue matrix $\Lambda_X$ of the power spectrum matrix to determine the signal subspace dimension.
Then, in the noise subspace, step 104 exploits the property that the noise power spectrum in the noise subspace is smaller than the noisy-signal power spectrum in the signal subspace, and estimates the noise power spectrum as a conditional expectation.
Step 105 uses the signal subspace dimension obtained in step 103 and the noise power spectrum estimate obtained in step 104 to estimate, based on the masking effect of the human auditory system, the auditory masking threshold of each frequency bin based on the signal subspace.
Step 106 uses the noise power spectrum estimate obtained in step 104 and the auditory masking threshold obtained in step 105 to estimate the linear filter in combination with the Lagrange multipliers, thereby achieving signal subspace microphone array voice enhancement based on auditory perception properties.
Fig. 2 shows the flow of a method for determining the signal subspace dimension. This method corresponds to step 103 in Fig. 1.
Before this method runs, steps 101 and 102 have time-aligned the speech signals acquired by the microphone array, applied the short-time Fourier transform, computed the signal power spectrum matrix, and decomposed it into the eigenvalue matrix and the eigenvector matrix. By formula (4), the eigenvalue matrix of the noisy-signal power spectrum is the sum of the signal power spectrum eigenvalues and the noise power spectrum eigenvalues, and Q is the dimension of the signal subspace.
Step 201: initialize Q to L-1, that is, P = 1.
Next, step 202 updates the value of $-2\log[F(H_0)/F(H_1)]$ using formula (11).
$-2\log[F(H_0)/F(H_1)]$ approximately follows a chi-square distribution with $\theta = L-Q-1$ degrees of freedom. In step 203, with the confidence level $\alpha$ fixed in advance, it is judged whether $-2\log[F(H_0)/F(H_1)]$ is greater than $\chi^2_{\theta,\alpha}$. If the condition is satisfied, step 204 is carried out, in which Q is decremented by one; otherwise step 205 is carried out. The purpose of decrementing Q is to increase the noise subspace dimension P step by step; after the decrement, the flow returns to step 202.
Step 205 in effect finds the largest value of $L-Q$ that satisfies $-2\log[F(H_0)/F(H_1)] \ge \chi^2_{\theta,\alpha}$ and takes it as the noise subspace dimension P, so that the signal subspace dimension Q is determined by:
$$\arg\max_Q \left( \theta \;\middle|\; -2\log[F(H_0)/F(H_1)] \ge \chi^2_{\theta,\alpha} \right) \tag{19}$$
where $\arg\max(\cdot)$ is the operator that returns the argument with the maximum score.
Fig. 3 describes a flowchart for estimating the noise power spectrum on the noise subspace by a conditional-probability method. This method corresponds to step 104 in Fig. 1.
To estimate the noise power spectrum accurately, the method exploits the fact that the noise power spectrum in the noise subspace is smaller than the noisy-signal power spectrum in the signal subspace. Using the signal subspace dimension Q obtained in step 205, step 301 computes two important parameters, $\bar{\lambda}_N = \frac{1}{L-Q}\sum_{i=Q+1}^{L}\lambda_X^i$ and $\bar{\lambda}_{S+N} = \frac{1}{Q}\sum_{i=1}^{Q}\lambda_X^i$, where $i \in \{1, \ldots, L\}$ is the index.
Because $\bar{\lambda}_N \le \bar{\lambda}_{S+N}$, step 302 estimates the noise power spectrum using conditional probability; formula (12) is rewritten here as
$$\hat{\sigma}_N^2 = \frac{2\pi\bar{\lambda}_N\left(1 - e^{-\frac{\bar{\lambda}_{S+N}^2}{2\bar{\lambda}_N}}\right)^{\frac{1}{2}} - \bar{\lambda}_N\bar{\lambda}_{S+N}\,e^{-\frac{\bar{\lambda}_{S+N}^2}{2\bar{\lambda}_N}}}{\bar{\lambda}_{S+N}\left(1 - e^{-\bar{\lambda}_{S+N}^2}\right)} \tag{20}$$
Overestimating or underestimating the noise subspace dimension introduces an error into the noise power spectrum estimate; this error can be corrected with a compensating factor. Step 303 computes the compensating factor B(Q):
$$B(Q) = \frac{E\left[\hat{\sigma}_N^2\right]}{\bar{\sigma}_N^2} \tag{21}$$
where $\bar{\sigma}_N^2$ is a noise power spectrum estimate that can be obtained with a voice activity detection (VAD) method.
Step 304 uses the compensating factor to correct the noise power spectrum estimate, as follows:
$$\tilde{\sigma}_N^2 = \frac{1}{B(Q)}\,\hat{\sigma}_N^2 \tag{22}$$
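The bookkeeping around steps 301, 303 and 304 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: it assumes the eigenvalues are sorted in descending order, it approximates the expectation in formula (21) by a sample mean over several per-frame estimates (an assumption), and all function names are hypothetical. The conditional-probability estimate of formula (20) itself is deliberately left out.

```python
from typing import Sequence, Tuple


def subspace_means(eigenvalues: Sequence[float], Q: int) -> Tuple[float, float]:
    """Step 301: mean eigenvalue of the noise subspace and of the signal subspace.

    eigenvalues must be sorted in descending order; the first Q belong to the
    signal subspace and the remaining L - Q to the noise subspace.
    """
    L = len(eigenvalues)
    if not 0 < Q < L:
        raise ValueError("Q must satisfy 0 < Q < L")
    lam_sn = sum(eigenvalues[:Q]) / Q        # mean over i = 1..Q
    lam_n = sum(eigenvalues[Q:]) / (L - Q)   # mean over i = Q+1..L
    return lam_n, lam_sn


def compensating_factor(estimates: Sequence[float], reference: float) -> float:
    """Step 303, formula (21): B(Q) = E[sigma_hat^2] / sigma_bar^2.

    The expectation is approximated by the sample mean of `estimates`
    (assumption); `reference` plays the role of the VAD-based estimate.
    """
    return (sum(estimates) / len(estimates)) / reference


def corrected_noise_power(estimate: float, B: float) -> float:
    """Step 304, formula (22): divide the raw estimate by B(Q)."""
    return estimate / B
```

For example, `subspace_means([8.0, 4.0, 1.0, 1.0], 2)` averages the last two eigenvalues for the noise subspace and the first two for the signal subspace.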
Fig. 4 describes a flowchart of a method for calculating the masking threshold of human hearing. This method corresponds to step 105 in Fig. 1. To mask the noise in the signal, and thereby enhance the target speech signal, the noise must be kept below this threshold.
Estimating the intensity of the target speech signal requires the basis vectors of the signal subspace, so, according to the signal subspace dimension obtained in step 205, the eigenvector matrix U is partitioned into two submatrices, $U_1$ and $U_2$, where $U_1 \in C^{L \times Q}$ is the basis of the signal subspace and $U_2 \in C^{L \times (L-Q)}$ is the basis of the noise subspace.
The human auditory frequency range is 0 to 15500 Hz and covers a number of critical sub-bands; step 401 divides it into 24 sub-bands. The auditory masking threshold must be calculated in each sub-band.
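The sub-band division of step 401 can be illustrated with a short Python sketch. The description gives only the range (0 to 15500 Hz) and the number of sub-bands (24); the band edges below are the commonly tabulated critical-band (Bark) edges, which fit that range and count, and their use here is an assumption. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List

# Commonly tabulated critical-band edges in Hz: 25 edges delimit 24 bands
# spanning 0-15500 Hz (assumed here; the description does not list them).
BARK_EDGES_HZ = [
    0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500,
]


def band_index(freq_hz: float) -> int:
    """Return the 1-based sub-band index j (1..24) that contains freq_hz."""
    if not 0 <= freq_hz <= BARK_EDGES_HZ[-1]:
        raise ValueError("frequency outside the 0-15500 Hz range")
    # Number of edges <= freq_hz equals the 1-based band index;
    # the top edge itself is assigned to the last band.
    return min(bisect_right(BARK_EDGES_HZ, freq_hz), 24)


def group_bins(n_fft: int, sample_rate_hz: float) -> Dict[int, List[int]]:
    """Group the non-negative-frequency STFT bins by sub-band index."""
    groups: Dict[int, List[int]] = {}
    for k in range(n_fft // 2 + 1):
        freq = k * sample_rate_hz / n_fft
        if freq > BARK_EDGES_HZ[-1]:
            break  # bins above 15500 Hz fall outside the 24 sub-bands
        groups.setdefault(band_index(freq), []).append(k)
    return groups
```

With a 512-point transform at a 16 kHz sampling rate, for instance, only the sub-bands up to the one containing 8000 Hz receive any bins.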
E(j, b) denotes the energy at the b-th frequency bin in the j-th sub-band; it can be calculated from the eigenvalues and eigenvectors of the signal subspace. Step 402 calculates the energy of each frequency bin:
$$E(j,b) = \operatorname{mean}\left(\frac{1}{L}\sum_{i=1}^{Q}\lambda_S^i\,\left|U_{1,i}\right|^2\right) \tag{23}$$
where $\lambda_S^i = \lambda_X^i - \tilde{\sigma}_N^2$ is the estimated i-th eigenvalue of the target speech signal power spectrum matrix, $U_{1,i}$ is the i-th basis vector of the signal subspace, $i \in \{1, \ldots, Q\}$ is the index, and mean(·) is the averaging operator.
SF(j) is the function that expresses the spreading characteristic of the basilar membrane of the human ear in the j-th sub-band, $j \in \{1, \ldots, 24\}$.
Step 403 calculates the spreading function of each sub-band:
$$SF(j) = 15.81 + 7.5\,(j + 0.474) - 17.5\sqrt{1 + (j + 0.474)^2}, \quad j \in \{1, \ldots, 24\} \tag{24}$$
Next, step 404 calculates the excitation energy on the basilar membrane of the human ear:
$$C(j,b) = SF(j) \cdot E(j,b), \quad j \in \{1, \ldots, 24\} \tag{25}$$
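Formulas (24) and (25) can be evaluated with a few lines of Python. This sketch assumes the square-root reading of formula (24), which matches the widely used Schroeder spreading function, and it applies formula (25) literally as a product of SF(j) and E(j, b); the function names are hypothetical.

```python
import math


def spreading(j: float) -> float:
    """Formula (24): spreading function of sub-band j (square-root form assumed)."""
    x = j + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


def excitation(j: int, energy: float) -> float:
    """Formula (25): excitation C(j, b) = SF(j) * E(j, b), taken literally."""
    return spreading(j) * energy


if __name__ == "__main__":
    # Print the spreading value for each of the 24 sub-bands.
    for band in range(1, 25):
        print(band, round(spreading(band), 3))
```

Over the sub-band indices 1 to 24 the spreading value is negative and decreases with j, which is consistent with formula (26) taking the absolute value of C(j, b).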
Step 405 calculates the auditory masking threshold:
$$C_{thr} = 10^{\,\log_{10}\left|C(j,b)\right| - \left|\frac{O(j)}{10}\right| - \left|\frac{\tilde{\sigma}_N^2}{10}\right|} \tag{26}$$
where O(j) is an offset and $j \in \{1, \ldots, 24\}$ denotes the j-th sub-band.
Fig. 5 describes a flowchart for estimating the linear filter. This method corresponds to step 106 in Fig. 1.
To apply the auditory masking effect in the eigenvalue domain, the auditory masking threshold $C_{thr}$ must be mapped to the eigenvalue domain. Step 501 uses the transformation from the frequency domain to the eigenvalue domain and calculates, by formula (13), the auditory masking threshold in the eigenvalue domain, $\theta = [\theta_1, \ldots, \theta_Q]^H$.
Next, step 502 uses formula (18) to estimate the diagonal elements $g_i$ of the diagonal matrix G, where $i \in \{1, \ldots, L\}$ is the index of the diagonal element.
Finally, step 503 substitutes the matrix G into formula (7) to obtain the required linear filter H.
From this specification, further modifications and variations of the present invention will be apparent to those skilled in the art. Accordingly, this description is to be regarded as illustrative, and its purpose is to teach those of ordinary skill in the art the general manner of carrying out the invention. It should be understood that the forms of the invention shown and described in this specification are to be taken as the presently preferred embodiments.

Claims (8)

1. A signal subspace microphone array speech enhancement method based on auditory perception properties, comprising the following steps:
Step a: collecting multi-channel noisy speech signals with a microphone array; time-aligning each noisy speech channel; representing each aligned channel as a complex-valued frequency-domain signal using the short-time discrete Fourier transform; computing the power spectrum matrix of the multi-channel microphone array signals and performing eigenvalue decomposition on this power spectrum matrix to obtain an eigenvalue matrix and an eigenvector matrix;
Step b: performing a hypothesis test on the eigenvalue matrix of the power spectrum matrix to determine the signal subspace dimension Q;
Step c: on the noise subspace, exploiting the property that the noise power spectrum in the noise subspace is smaller than the noisy-signal power spectrum in the signal subspace, estimating the noise power spectrum by taking a conditional expectation;
Step d: using the noise subspace dimension P and the noise power spectrum estimate, and based on the masking effect of human hearing, estimating the auditory masking threshold of each frequency bin on the basis of the signal subspace, the noise subspace dimension being expressed as P = L - Q, where L is the number of microphones in the microphone array;
Step e: estimating a linear filter from the noise power spectrum and the auditory masking threshold, in combination with a Lagrange multiplier, to realize signal subspace microphone array speech enhancement based on auditory perception properties, wherein the steps of estimating the linear filter in combination with the Lagrange multiplier are as follows:
Step e1: mapping the auditory masking threshold to the eigenvalue domain according to the transformation from the frequency domain to the eigenvalue domain;
Step e2: estimating the Lagrange multiplier so that the power spectrum eigenvalues of the residual noise obtained after linear filtering are smaller than the auditory masking threshold in the eigenvalue domain;
Step e3: further designing a linear filter H that minimizes speech distortion, such that the residual noise in the enhanced speech is below the auditory masking threshold of the human ear, thereby eliminating the influence of the residual noise while minimizing the distortion of the target speech signal.
2. The signal subspace microphone array speech enhancement method as claimed in claim 1, characterized in that performing eigenvalue decomposition on said power spectrum matrix comprises:
setting the noisy speech signal X as X = S + N,
so that the power spectrum matrix $R_X$ is expressed as:
$$R_X = U\Lambda_X U^H = U\left(\Lambda_S + \sigma_N^2 I\right)U^H$$
where S is the target speech signal, N is the noise, $R_X$ is the power spectrum matrix of the noisy speech signal, $\Lambda_X$ is the eigenvalue matrix of the noisy speech signal power spectrum with eigenvalues sorted in descending order, $\Lambda_S$ is the eigenvalue matrix of the target speech signal power spectrum with eigenvalues sorted in descending order, U is the eigenvector matrix, $\sigma_N^2$ is the white noise power, I is the identity matrix of order L, and $[\cdot]^H$ is the conjugate transpose operator.
3. The signal subspace microphone array speech enhancement method as claimed in claim 1, characterized in that said hypothesis test takes the minimum signal subspace dimension Q for which the null hypothesis $H_0$, namely that the last L-Q eigenvalues of the eigenvalue matrix $\Lambda_X$ are all equal, holds.
4. The signal subspace microphone array speech enhancement method as claimed in claim 3, characterized in that judging whether the null hypothesis holds comprises the following:
Null hypothesis $H_0$: the last L-Q eigenvalues of the eigenvalue matrix $\Lambda_X$ are all equal;
Alternative hypothesis $H_1$: at least two of the last L-Q eigenvalues of the eigenvalue matrix $\Lambda_X$ differ;
The signal subspace dimension is defined as:
$$\arg\max_{Q}\left(\theta \;\middle|\; -2\log\left[F(H_0)/F(H_1)\right] \ge \chi^2_{\theta,\alpha}\right)$$
In the formula, $-2\log[F(H_0)/F(H_1)]$ approximately follows a chi-square distribution with $\theta = L-Q-1$ degrees of freedom, $\alpha$ is the confidence level, and $F(H_0)$ and $F(H_1)$ are the distribution functions of the eigenvalues; that is, the largest value of L-Q satisfying $-2\log[F(H_0)/F(H_1)] \ge \chi^2_{\theta,\alpha}$ is taken as the noise subspace dimension P, argmax(·) is the operator that returns the parameter value achieving the maximum, and $\chi^2_{\theta,\alpha}$ is the lower bound of the acceptance region of the chi-square distribution with $\theta$ degrees of freedom at confidence level $\alpha$.
5. The signal subspace microphone array speech enhancement method as claimed in claim 4, characterized in that the distribution functions $F(H_0)$ and $F(H_1)$ of said eigenvalues adopt a Gaussian model.
6. The signal subspace microphone array speech enhancement method as claimed in claim 1, characterized in that the error in the noise power spectrum estimate caused by misjudging the noise subspace dimension is compensated with a compensating factor; the compensating factor is the ratio of the expected value of the noise power spectrum estimate to an estimate of the noise power; the noise power spectrum estimate is divided by the compensating factor to obtain the corrected noise power spectrum estimate.
7. The signal subspace microphone array speech enhancement method as claimed in claim 1, characterized in that the step of estimating the auditory masking threshold comprises:
Step ea: dividing the human auditory frequency range of 0-15500 Hz into several critical sub-bands;
Step eb: calculating the auditory masking threshold in each sub-band separately.
8. The signal subspace microphone array speech enhancement method as claimed in claim 7, characterized in that calculating the auditory masking threshold in each sub-band comprises: calculating the energy of each frequency bin in each sub-band; calculating the spreading coefficient of the basilar membrane of the human ear for the sound in each frequency band; multiplying the energy of each frequency bin in each sub-band by the spreading coefficient for the corresponding frequency band to obtain the excitation energy on the basilar membrane of the human ear; and then calculating the masking threshold from the functional relationship between the excitation energy on the basilar membrane and the auditory masking threshold.
CN2009102498006A 2009-12-08 2009-12-08 Auditory perception property-based signal subspace microphone array voice enhancement method Expired - Fee Related CN101777349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102498006A CN101777349B (en) 2009-12-08 2009-12-08 Auditory perception property-based signal subspace microphone array voice enhancement method

Publications (2)

Publication Number Publication Date
CN101777349A CN101777349A (en) 2010-07-14
CN101777349B true CN101777349B (en) 2012-04-11

Family

ID=42513784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102498006A Expired - Fee Related CN101777349B (en) 2009-12-08 2009-12-08 Auditory perception property-based signal subspace microphone array voice enhancement method

Country Status (1)

Country Link
CN (1) CN101777349B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120411

Termination date: 20211208
