CN105609113A - Bispectrum weighted spatial correlation matrix-based speech sound source localization method - Google Patents
- Publication number
- CN105609113A CN105609113A CN201510937548.3A CN201510937548A CN105609113A CN 105609113 A CN105609113 A CN 105609113A CN 201510937548 A CN201510937548 A CN 201510937548A CN 105609113 A CN105609113 A CN 105609113A
- Authority
- CN
- China
- Prior art keywords
- spectrum
- bispectrum
- signal
- microphone
- road
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 48
- 230000004807 localization Effects 0.000 title claims abstract description 27
- 239000011159 matrix material Substances 0.000 title claims abstract description 14
- 238000001228 spectrum Methods 0.000 claims description 123
- 238000009499 grossing Methods 0.000 claims description 17
- 230000003595 spectral effect Effects 0.000 claims description 12
- 238000012805 post-processing Methods 0.000 claims description 7
- 239000000203 mixture Substances 0.000 claims description 2
- 238000009432 framing Methods 0.000 abstract 1
- 230000006870 function Effects 0.000 description 22
- 238000004364 calculation method Methods 0.000 description 10
- 238000013461 design Methods 0.000 description 3
- 230000004044 response Effects 0.000 description 3
- 230000008901 benefit Effects 0.000 description 2
- 238000000354 decomposition reaction Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000000205 computational method Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000011109 contamination Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000005562 fading Methods 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000005236 sound signal Effects 0.000 description 1
- 238000012421 spiking Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S3/00—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
- G01S3/80—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
- G01S3/802—Systems for determining direction or deviation from predetermined direction
- G01S3/808—Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems
- G01S3/8083—Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems determining direction of source
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/20—Position of source determined by a plurality of spaced direction-finders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/0308—Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Quality & Reliability (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention relates to a bispectrum weighted spatial correlation matrix-based speech sound source localization method. The objective of the invention is to solve the problem of robust, microphone array-based sound source localization in real, complex noise environments. The method exploits the special mathematical properties, in the bispectrum domain, of the speech and noise signals received by a microphone array. The method includes the following steps: framing and bispectrum estimation are performed on the signals acquired by the microphone array; in the bispectrum domain, the bispectrum phase difference between each microphone and a reference microphone is calculated; per-cell bispectrum weights are estimated from the reference microphone signal; a bispectrum weighted spatial correlation matrix corresponding to a candidate direction is calculated from the bispectrum phase differences and the cell weights; a sound source direction cost function for the current candidate direction is calculated from the eigenvalues of the bispectrum weighted spatial correlation matrix; and the direction of the speech sound source is estimated as the direction corresponding to the maximum of the cost function.
Description
Technical field
The present invention relates to the design of noise-robust speech sound source localization methods based on microphone arrays, and more particularly to a speech sound source localization method based on a bispectrum-weighted spatial correlation matrix.
Background technology
Microphone-array-based speech sound source localization has been widely studied in recent years. From the sound signals collected by microphones at different spatial positions, the source direction can ultimately be determined from the time-difference information of the speech signal together with the geometry of the microphone array. The accuracy of the time-difference estimates has a decisive effect on the performance of a localization algorithm. Noise is the main factor limiting the practical use of speech localization methods; noise in real environments comprises both non-directional diffuse noise and directional interference sources.
Existing microphone-array localization algorithms share the same framework: a set of candidate source directions is defined in advance, a "cost function" score is computed for each candidate direction, and the direction with the highest score is taken as the final estimate. When the non-directional diffuse noise is strong, the time-difference information between the microphone signals is swamped by the noise, and the spatial discrimination of the cost function degrades. When directional interference is present, the cost function tends to peak in the direction of maximum signal energy and cannot reliably distinguish speech from the interference.
Traditional localization methods fall into three main classes: methods based on high-resolution spatial spectrum estimation, methods based on steered response power, and methods based on time-delay estimation.
High-resolution spatial-spectrum methods originated in narrowband incidence-angle estimation for military, communications, sonar and radar applications in the last century. These methods perform a subspace decomposition of the array spatial correlation matrix and exploit the orthogonality between the signal subspace and the noise subspace to construct a cost function that, in theory, has an infinitely sharp peak at the true source direction. Because they are straightforward generalizations of narrowband techniques, and most of the algorithms were not designed specifically for speech sources, the intrinsic properties of speech (harmonic structure, non-stationarity, etc.) are not incorporated into their design. Since the spectral distribution of speech differs from both narrowband signals and wideband stationary signals, many of these algorithms do not transfer well to speech source localization.
Steered-response-power methods first design a beamformer that enhances the target signal from a given direction while suppressing signals from other directions, then sweep the beam over all candidate directions; the signal energy after enhancement in a candidate direction serves as the cost function score for that direction, and the direction with the peak output energy is taken as the current source direction estimate. An important assumption of these methods is that the beamformer output is maximal in the true speech direction. In real environments, particularly when a directional interference source is present, this assumption does not hold; improving performance under noise is the main problem facing this class of algorithms.
Time-delay-estimation methods have two main steps: the delays between channels are first estimated from the observed multichannel signals, and the source direction is then computed from the delay estimates and the array geometry. Compared with high-resolution spatial-spectrum and steered-response-power methods, these methods have lower computational complexity and are simple to implement, so they have received wide attention. At low sampling rates, however, only integer-sample delays between signals can be estimated, so delay-based methods cannot reach high angular resolution. Noise remains one of their main challenges; in particular, when directional noise is present, the delay estimate for the target signal is easily perturbed by the delay of the directional noise.
Summary of the invention
To address the shortcomings of the prior art, the object of the invention is to improve speech source localization performance in environments containing both non-directional and directional noise. To this end, the invention provides a speech sound source localization method based on a bispectrum-weighted spatial correlation matrix. The method comprises the following steps:
Step a: collect the multichannel noisy speech signals with a microphone array, divide each channel of noisy speech into frames, and on each frame compute the cross-bispectrum between each channel's time-domain signal and the first (reference) microphone signal;
Step b: in the bispectrum domain, compute the bispectrum phase difference between each microphone channel and the reference microphone;
Step c: in the bispectrum domain, estimate per-cell bispectrum weights from the cross-bispectrum of the reference microphone signal with itself;
Step d: define the set of candidate directions;
Step e: from the bispectrum phase differences and the bispectrum cell weights, construct the bispectrum-weighted spatial correlation matrix for the current candidate direction;
Step f: from the eigenvalues of the bispectrum-weighted spatial correlation matrix, compute the source-direction cost function for the current candidate direction;
Step g: repeat steps e-f until the candidate direction set has been traversed; the direction corresponding to the maximum of the cost function gives the estimate of the speech source direction.
Step a comprises in particular computing, from the digital observations received by each microphone channel, the original cross-bispectrum between each channel's time-domain signal and the reference microphone signal, using either the direct or the indirect bispectrum estimation technique.
Step a further comprises post-processing the original cross-bispectrum of claim 2 between each channel's time-domain signal and the reference microphone signal with a temporal smoothing strategy:

B̄_m1^(t)(Ω1,Ω2) = α·B̄_m1^(t-1)(Ω1,Ω2) + (1-α)·B_m1^(t)(Ω1,Ω2)

where B̄_m1^(t)(Ω1,Ω2) is the smoothed cross-bispectrum between the m-th channel and the reference channel at frame t, B_m1^(t)(Ω1,Ω2) is the original cross-bispectrum between the m-th channel and the reference channel at frame t, and 0 ≤ α < 1 is a smoothing factor.
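As a concrete illustration, the recursive smoothing above can be sketched in a few lines of NumPy; the function name and the toy grid dimensions are illustrative, not taken from the patent.

```python
import numpy as np

def smooth_bispectrum(prev_smoothed, raw, alpha=0.8):
    """First-order recursive smoothing across frames:
    B_bar(t) = alpha * B_bar(t-1) + (1 - alpha) * B(t)."""
    return alpha * prev_smoothed + (1.0 - alpha) * raw

# Toy check: with a constant per-frame cross-bispectrum, the smoothed
# estimate converges to that constant value.
raw = np.full((4, 4), 2.0 + 1.0j)      # pretend 4x4 grid of (Omega1, Omega2) cells
smoothed = np.zeros((4, 4), dtype=complex)
for _ in range(200):                   # 200 frames
    smoothed = smooth_bispectrum(smoothed, raw, alpha=0.8)
```

Larger α averages over more frames (lower variance, slower tracking of the non-stationary speech), which is the trade-off the smoothing factor controls.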
Step b comprises computing the bispectrum phase difference between each microphone channel and the reference microphone signal with the following equation:

I_m1(Ω1,Ω2) = B̄_m1(Ω1,Ω2)·[B̄_11(Ω1,Ω2)]* / ( |B̄_m1(Ω1,Ω2)|·|B̄_11(Ω1,Ω2)| )

where I_m1(Ω1,Ω2) is the (unit-modulus, complex) bispectrum phase difference between the m-th channel and the reference channel, [·]* is the complex conjugate, and |·| is the modulus.
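Under the reconstruction above, the phase difference is a complex number on the unit circle whose angle carries the phase information. A minimal NumPy sketch (function and variable names are illustrative):

```python
import numpy as np

def bispectrum_phase_difference(B_m1, B_11, eps=1e-12):
    """Normalised product B_m1 * conj(B_11) / (|B_m1| * |B_11|): a complex
    number on the unit circle whose angle is the bispectrum phase difference."""
    num = B_m1 * np.conj(B_11)
    return num / (np.abs(num) + eps)       # eps guards near-empty cells

B_m1 = np.array([1 + 1j, -2j, 3.0 + 0j])   # toy cross-bispectrum values
B_11 = np.array([1 + 0j, 1 + 0j, 1 - 1j])  # toy reference bispectrum values
I = bispectrum_phase_difference(B_m1, B_11)
```

Because only the phase is retained, the result is insensitive to the (direction-independent) magnitudes of the two bispectra.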
In step c, the per-cell bispectrum weights are estimated from the cross-bispectrum of the reference microphone signal with itself. Let B̄_11^(t)(Ω1,Ω2) be the cross-bispectrum of the reference microphone signal with itself at frame t; estimating the bispectrum cell weights then comprises the following steps:
Step c1: from B̄_11^(t)(Ω1,Ω2) and the estimated interference bispectral energy, estimate the bispectral a-priori SNR of the reference signal at frame t;
Step c2: compute the bispectrum cell weight as a function of the bispectral a-priori SNR and B̄_11^(t)(Ω1,Ω2).
The construction in step e of the bispectrum-weighted spatial correlation matrix for the current candidate direction comprises the following steps:
Step e1: from the bispectrum phase differences between each channel and the reference channel, form the complex bispectrum phase-difference vector;
Step e2: from the candidate direction, compute the phase-difference compensation vector;
Step e3: from the phase-difference vector and the compensation vector, compute the compensated phase-difference vector;
Step e4: from the compensated phase-difference vector and the bispectrum cell weights, construct the original bispectrum-weighted spatial correlation matrix for the current candidate direction;
Step e5: post-process the original bispectrum-weighted spatial correlation matrix with a temporal smoothing strategy.
The temporal smoothing of step e5 post-processes the original bispectrum-weighted spatial correlation matrix as:

R^(t)(θ) = β·R^(t-1)(θ) + (1-β)·R̃^(t)(θ)

where R(θ) is the bispectrum-weighted spatial correlation matrix for the current candidate direction θ, R̃(θ) is the original (unsmoothed) matrix, and 0 ≤ β < 1 is a smoothing factor.
The source-direction cost function of step f is computed as follows. Let R(θ) be the bispectrum-weighted spatial correlation matrix for the current candidate direction θ, let the microphone array contain M microphones in total, and let the M complex eigenvalues of R(θ) be ordered by decreasing modulus, |λ1(θ)| ≥ |λ2(θ)| ≥ ... ≥ |λM(θ)|. The source-direction cost function for candidate direction θ is then defined as a function of these M complex eigenvalues.
In step g, the speech source direction is estimated from the direction at which the cost function is maximal. Let all candidate directions form the candidate set Θ, and let J(θ) be the source-direction cost function for candidate direction θ; the speech source direction is then estimated as

θ̂ = argmax_{θ∈Θ} J(θ).
Beneficial effects of the invention. First, traditional localization methods cannot, even in theory, eliminate the influence of non-directional Gaussian noise. The bispectrum is one of the "higher-order statistics" of a signal, and a useful property of higher-order statistics is that those of Gaussian noise are zero; the invention is therefore theoretically robust to Gaussian noise. Second, the bispectrum phase difference carries the speech source direction cue, and it can be shown theoretically that this cue is redundant in the bispectrum domain (for the analysis see Wei Xue, Shan Liang, Wenju Liu, "Weighted Spatial Bispectrum Correlation Matrix for DOA Estimation in the Presence of Interferences," InterSpeech 2014, September 14-18, Singapore, pp. 2228-2232, 2014). Even if the direction cue is contaminated by noise in some bispectrum cells, the same direction information can still be picked up from other cells that are not severely contaminated; this property helps improve performance under directional non-Gaussian noise. Finally, the invention applies bispectrum weighting to extract the direction cues of speech-dominant bispectrum cells while suppressing the negative influence of noise, further improving robustness to noise.
Brief description of the drawings
Further features and advantages of the present invention are described below with reference to the illustrative drawings.
Fig. 1 schematically shows the flow chart of the microphone-array speech source localization method based on the bispectrum-weighted spatial correlation matrix;
Fig. 2 schematically shows the flow chart of the cross-bispectrum estimation unit;
Fig. 3 schematically shows the flow chart of the bispectrum cell weight calculation unit;
Fig. 4 schematically shows the flow chart of the bispectrum-weighted spatial correlation matrix construction;
Fig. 5 schematically shows the flow chart of the candidate-direction cost function calculation.
Detailed description of the invention
It should be understood that the following detailed description of the various examples and drawings is not intended to limit the present invention to particular illustrative embodiments; the embodiments described merely illustrate the steps of the invention, whose scope is defined by the appended claims.
The invention exploits the theoretical robustness of the bispectrum to Gaussian noise, together with the redundancy of the source direction cue in the bispectrum domain, and applies bispectrum weighting to select speech-dominant bispectrum cells, thereby improving speech source localization performance in noisy environments.
Fig. 1 shows the flow chart of the microphone-array speech source localization method based on the bispectrum-weighted spatial correlation matrix.
The system comprises a microphone array of at least two microphones 101. The microphones of the array may have different arrangements; in particular, the microphones 101 may be placed in a row, with a preset distance between adjacent microphones (for example, about 5 centimetres). The array can be installed at a suitable position according to the application environment and technical requirements.
The core computation of the invention comprises five basic units: the cross-bispectrum estimation unit 102, the bispectrum phase-difference calculation unit 103, the bispectrum cell weight calculation unit 104, the bispectrum-weighted spatial correlation matrix construction unit 106, and the candidate-direction cost function calculation unit 107. The method traverses all candidate directions in the candidate set, computes the corresponding cost function for each, and finally locates the cost function maximum 108 to determine the speech source direction.
One. Cross-bispectrum estimation unit
Let the time-domain signal of the m-th channel be x_m(k), which can be decomposed into the sum of a speech component, a directional interference component v_m(k), and a non-directional noise component n_m(k):

x_m(k) = γ_m·s_1(k − τ_m1) + v_m(k) + n_m(k)    (1)

where γ_m is the amplitude attenuation factor from the source to the m-th microphone, and τ_m1 is the time difference between microphone 1 and microphone m, which is directly related to the source direction.
Fig. 2 shows the flow chart of the cross-bispectrum estimation unit, corresponding to step 102 in Fig. 1.
In step 102, the multichannel digital observations are first divided into frames (step 201). Next, on each frame, step 202 computes the original cross-bispectrum between each channel's signal and the reference microphone signal. The original cross-bispectrum can be computed by the "direct method" or the "indirect method"; for the computation see Chrysostomos L. Nikias and Mysore R. Raghuveer, "Bispectrum estimation: A digital signal processing framework," Proceedings of the IEEE, 75(7): 869-891, 1987. Because speech is non-stationary and the frame length is limited, a temporal smoothing post-processing step is needed to reduce the variance of the cross-bispectrum estimate. Let B_m1^(t)(Ω1,Ω2) be the original cross-bispectrum between the m-th channel and the reference channel at frame t; the temporal smoothing of step 203 is

B̄_m1^(t)(Ω1,Ω2) = α1·B̄_m1^(t-1)(Ω1,Ω2) + (1-α1)·B_m1^(t)(Ω1,Ω2)

where B̄_m1^(t)(Ω1,Ω2) is the smoothed cross-bispectrum between the m-th channel and the reference channel at frame t, and 0 ≤ α1 < 1 is a smoothing factor.
The bispectrum is the third-order statistic of a signal, the simplest higher-order statistic. In theory, an excellent property of higher-order statistics is that those of Gaussian noise are zero. Non-directional noise in real environments can usually be modeled as spatially uncorrelated white Gaussian noise, so the invention can in theory eliminate the influence of white Gaussian noise. Assuming mutual independence of the speech signal, the non-directional noise and the directional interference source signal, by formula (1) the smoothed cross-bispectrum can be further expressed as

B̄_m1^(t)(Ω1,Ω2) = B̄_s,m1^(t)(Ω1,Ω2) + B̄_v,m1^(t)(Ω1,Ω2)    (2)

where B̄_s,m1^(t)(Ω1,Ω2) and B̄_v,m1^(t)(Ω1,Ω2) represent the speech and directional-interference components of the cross-bispectrum, respectively.
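A minimal sketch of a direct-method cross-bispectrum for one frame, in the spirit of the Nikias & Raghuveer reference cited above: form the frame DFTs and multiply values at the frequency pair (i, j) with the conjugate at i + j. The placement of the two channels in the triple product is an assumption here, not a formula taken from the patent.

```python
import numpy as np

def cross_bispectrum_direct(x_m, x_1, nfft=64):
    """Direct-method cross-bispectrum of one frame over the first
    nfft//2 x nfft//2 grid of frequency pairs."""
    X_m = np.fft.fft(x_m, nfft)
    X_1 = np.fft.fft(x_1, nfft)
    half = nfft // 2
    B = np.empty((half, half), dtype=complex)
    for i in range(half):
        for j in range(half):
            B[i, j] = X_m[i] * X_1[j] * np.conj(X_1[(i + j) % nfft])
    return B

rng = np.random.default_rng(0)
frame = rng.standard_normal(64)
B_auto = cross_bispectrum_direct(frame, frame)  # auto-bispectrum of one channel
```

For the auto-bispectrum the result is symmetric in its two frequency arguments, which is a quick sanity check on the indexing.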
Two. Bispectrum phase-difference calculation unit
In step 103 of Fig. 1, the bispectrum phase difference between each microphone channel and the reference microphone signal is computed with the following equation:

I_m1(Ω1,Ω2) = B̄_m1(Ω1,Ω2)·[B̄_11(Ω1,Ω2)]* / ( |B̄_m1(Ω1,Ω2)|·|B̄_11(Ω1,Ω2)| )    (3)

where I_m1(Ω1,Ω2) is the (unit-modulus, complex) bispectrum phase difference between the m-th channel and the reference channel, [·]* is the complex conjugate, and |·| is the modulus.
Substituting the expressions for B̄_s,m1^(t)(Ω1,Ω2) and B̄_v,m1^(t)(Ω1,Ω2) from formula (2), I_m1(Ω1,Ω2) can be further expressed as

I_m1(Ω1,Ω2) = k_m(Ω1,Ω2)·e^{-jΩ1·τ_m1}

where the factor e^{-jΩ1·τ_m1} depends on τ_m1, and hence on the source direction; it is therefore called the source direction cue. The factor k_m(Ω1,Ω2) depends on the bispectrum of the non-Gaussian interference source signal. Clearly, in a bispectrum cell containing pure speech there is no interference and k_m(Ω1,Ω2) = 1, so the bispectrum phase difference equals the speech source direction cue.
An important property follows: although the bispectrum phase difference is defined over bispectrum cells indexed by (Ω1,Ω2), the actual speech source direction cue is not a function of Ω2. In other words, as long as two different bispectrum cells share the same Ω1, they contain the same source direction cue in their phase differences regardless of the value of Ω2; the cue is therefore redundant in the bispectrum domain. Because speech and interference noise have different bispectral distributions, the cue may be severely contaminated by noise in some cells, yet the same cue can be rediscovered in uncontaminated cells with the same value of Ω1. This property helps improve the performance of the method in non-Gaussian noise environments.
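The redundancy property can be checked numerically in an idealized, noise-free setting: if channel m is a circular delay of channel 1, the normalised cross-bispectrum phase is the same in every cell sharing the same Ω1, whatever Ω2 is. A sketch (all names and the circular-delay idealization are illustrative):

```python
import numpy as np

n, tau = 64, 3
rng = np.random.default_rng(1)
s = rng.standard_normal(n)
x1 = s
xm = np.roll(s, tau)                 # channel m: circular delay of channel 1
X1, Xm = np.fft.fft(x1), np.fft.fft(xm)

half = n // 2
I = np.empty((half, half), dtype=complex)
for i in range(half):
    for j in range(half):
        B_m1 = Xm[i] * X1[j] * np.conj(X1[(i + j) % n])
        B_11 = X1[i] * X1[j] * np.conj(X1[(i + j) % n])
        r = B_m1 * np.conj(B_11)
        I[i, j] = r / np.abs(r)      # unit-modulus phase difference
```

Every column of I is identical: the cue e^{-j·2π·i·τ/n} depends only on the first frequency index, matching the redundancy claim above.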
Three. Bispectrum cell weight calculation unit
Fig. 3 shows the flow chart of the bispectrum cell weight calculation unit, corresponding to step 104 in Fig. 1.
In step 104, let the cross-bispectrum of the reference microphone signal with itself at frame t be B̄_11^(t)(Ω1,Ω2). Estimating the bispectrum cell weights then comprises the following sub-steps:
Step 401: from B̄_11^(t)(Ω1,Ω2) and the estimated interference bispectral energy, estimate the bispectral a-priori SNR of the reference signal at frame t;
Step 402: compute the bispectrum cell weight as a function of the bispectral a-priori SNR and B̄_11^(t)(Ω1,Ω2).
Step 401 adopts a decision-directed bispectral a-priori SNR estimator. The bispectral a-priori SNR is defined as the ratio of the speech bispectral energy to the noise bispectral energy in a bispectrum cell:

ξ^(t)(Ω1,Ω2) = E_s^(t)(Ω1,Ω2) / Ê_v(Ω1,Ω2)

where Ê_v(Ω1,Ω2) denotes the estimated interference bispectral energy. Since the speech bispectral energy E_s^(t)(Ω1,Ω2) is unknown, the a-priori SNR must itself be estimated. To this end, the bispectral a-posteriori SNR is also defined, as the ratio of the total cell bispectral energy to the noise bispectral energy:

γ^(t)(Ω1,Ω2) = |B̄_11^(t)(Ω1,Ω2)|² / Ê_v(Ω1,Ω2).
By analogy with the time-frequency-domain a-priori SNR computation, ξ^(t)(Ω1,Ω2) is estimated as

ξ^(t)(Ω1,Ω2) = α2·( Ê_s^(t-1)(Ω1,Ω2) / Ê_v(Ω1,Ω2) ) + (1-α2)·P[ γ^(t)(Ω1,Ω2) − 1 ]

where P[·] denotes half-wave rectification, which guarantees the non-negativity of the ξ^(t)(Ω1,Ω2) estimate, and 0.92 ≤ α2 ≤ 0.98 is a smoothing factor.
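The decision-directed update can be sketched for a single bispectrum cell; the function and variable names are illustrative:

```python
def decision_directed_snr(prev_speech_energy, noise_energy, posterior_snr,
                          alpha2=0.95):
    """Decision-directed a-priori SNR: convex mix of the previous frame's
    speech/noise energy ratio and the half-wave-rectified (posterior - 1)."""
    rectified = max(posterior_snr - 1.0, 0.0)   # P[.] half-wave rectifier
    return alpha2 * (prev_speech_energy / noise_energy) \
        + (1.0 - alpha2) * rectified

# With previous speech/noise ratio 2.0 and posterior SNR 3.0:
xi = decision_directed_snr(4.0, 2.0, 3.0, alpha2=0.9)   # 0.9*2 + 0.1*2 = 2.0
```

The rectifier is what keeps the estimate non-negative even when the posterior SNR drops below one in noise-only cells.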
In step 402, the bispectrum cell weight is computed as a function of ξ^(t)(Ω1,Ω2) and B̄_11^(t)(Ω1,Ω2). First, the clean-speech bispectral energy of the current frame, Ê_s^(t)(Ω1,Ω2), is computed from ξ^(t)(Ω1,Ω2) and |B̄_11^(t)(Ω1,Ω2)|². Next, the speech-dominant bispectrum cells are selected according to Ê_s^(t)(Ω1,Ω2), keeping the cells whose estimated speech energy exceeds a threshold controlled by the threshold parameter η. Finally, the cell weight w(Ω1,Ω2) is computed from Ê_s^(t)(Ω1,Ω2) over the selected cells and set to zero elsewhere.
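One way to realise step 402 is sketched below, using the Wiener-type gain ξ/(1+ξ) for the speech-energy estimate and normalising the kept weights to sum to one; both of these specific choices are assumptions consistent with, but not stated by, the text.

```python
import numpy as np

def bispectrum_cell_weights(xi, ref_energy, eta=0.2):
    """Estimate per-cell clean-speech bispectral energy, keep cells above
    eta * max (treated as speech-dominant), normalise to unit sum."""
    E_s = (xi / (1.0 + xi)) * ref_energy        # estimated speech energy
    mask = E_s >= eta * E_s.max()               # speech-dominant cells
    w = np.where(mask, E_s, 0.0)
    return w / w.sum()

xi = np.array([[9.0, 0.01], [1.0, 3.0]])            # per-cell a-priori SNR
ref_energy = np.array([[10.0, 10.0], [2.0, 4.0]])   # |B_11|^2 per cell
w = bispectrum_cell_weights(xi, ref_energy)
```

Cells with low estimated speech energy (here the low-SNR cell and the weak cell) receive zero weight, so only speech-dominant cells contribute to the correlation matrix.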
Four. Bispectrum-weighted spatial correlation matrix construction
Fig. 4 shows the flow chart of the bispectrum-weighted spatial correlation matrix construction, corresponding to step 106 in Fig. 1.
In step 106, constructing the bispectrum-weighted spatial correlation matrix for the current candidate direction comprises the following sub-steps:
Step 601: from the bispectrum phase differences between each channel and the reference channel, form the complex bispectrum phase-difference vector;
Step 602: from the candidate direction, compute the phase-difference compensation vector;
Step 603: from the phase-difference vector and the compensation vector, compute the compensated phase-difference vector;
Step 604: from the compensated phase-difference vector and the bispectrum cell weights, construct the original bispectrum-weighted spatial correlation matrix for the current candidate direction;
Step 605: post-process the original matrix with a temporal smoothing strategy.
In step 601, assume the microphone array contains M microphones in total. The complex bispectrum phase-difference vector is computed as

I(Ω1,Ω2) = [I_11(Ω1,Ω2), I_21(Ω1,Ω2), ..., I_M1(Ω1,Ω2)]^T.    (13)
In step 602, referring to the expression for the speech source direction cue, the phase-difference compensation vector is defined as a function of the candidate direction θ:

C(θ,Ω1) = [1, e^{jΩ1·τ_21(θ)}, ..., e^{jΩ1·τ_M1(θ)}]^T

where τ_m1(θ) (for m = 2, ..., M) converts the candidate direction θ into the time difference between the m-th microphone and the first microphone.
Further, in step 603, the compensated phase-difference vector is computed from the phase-difference vector of step 601 and the compensation vector of step 602:

I_c(θ,Ω1,Ω2) = I(Ω1,Ω2) ⊙ C(θ,Ω1)    (15)

where the symbol "⊙" denotes element-wise multiplication of vectors.
In step 604, the original bispectrum-weighted spatial correlation matrix R̃(θ) for the current candidate direction is constructed from the cell weights computed in step 104 and the compensated phase-difference vectors computed in step 603:

R̃(θ) = Σ_{(Ω1,Ω2)} w(Ω1,Ω2)·I_c(θ,Ω1,Ω2)·I_c(θ,Ω1,Ω2)^H

where (·)^H denotes the conjugate transpose.
To further overcome the influence of noise, step 605 applies a temporal smoothing strategy to post-process R̃(θ):

R^(t)(θ) = β·R^(t-1)(θ) + (1-β)·R̃^(t)(θ)

where R(θ) is the bispectrum-weighted spatial correlation matrix for the current candidate direction θ, R̃(θ) is the original matrix, and 0 ≤ β < 1 is a smoothing factor.
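Steps 601-604 can be condensed into one accumulation; the outer-product form Σ w·I_c·I_c^H follows the reconstruction above and is treated as an assumption. When the compensation matches the cue in every cell, every compensated vector is all-ones and the accumulated matrix has rank one, which is the property the cost function exploits:

```python
import numpy as np

def weighted_spatial_matrix(I_vecs, C_vec, w):
    """Accumulate sum_k w[k] * (I_k ⊙ C)(I_k ⊙ C)^H over bispectrum cells."""
    M = C_vec.shape[0]
    R = np.zeros((M, M), dtype=complex)
    for weight, I in zip(w, I_vecs):
        Ic = I * C_vec                         # compensated phase-difference vector
        R += weight * np.outer(Ic, Ic.conj())
    return R

M, omega1, tau = 3, 0.3, 2.0
cue = np.exp(-1j * omega1 * tau * np.arange(M))  # per-channel direction cue
I_vecs = [cue, cue, cue]                         # same cue in every cell
C_vec = np.conj(cue)                             # compensation matched to the cue
w = np.array([0.5, 0.3, 0.2])
R = weighted_spatial_matrix(I_vecs, C_vec, w)
```

With matched compensation, R equals the all-ones matrix (up to the weight normalisation) and is exactly rank one.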
Five. Candidate-direction cost function calculation
Fig. 5 shows the flowchart of the candidate direction cost function calculation, corresponding to step 107 in Fig. 1.
In step 701, eigenvalue decomposition is applied to R(θ), and its M complex eigenvalues are sorted in descending order of modulus: |λ1(θ)| ≥ |λ2(θ)| ≥ ... ≥ |λM(θ)|.
In step 702, the sound direction cost function J(θ) for the current candidate direction θ is defined as a function of the M complex eigenvalues:
From Wei Xue, Shan Liang, Wenju Liu, "Weighted Spatial Bispectrum Correlation Matrix for DOA Estimation in the Presence of Interferences," InterSpeech 2014, September 14-18, Singapore, pp. 2228-2232, 2014, it is known that when the candidate source direction equals the true source direction, the rank of R(θ) is 1, so that |λ2(θ)| = ... = |λM(θ)| = 0; therefore, J(θ) is in theory positive infinity.
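Steps 701-702 can be sketched as follows. The ratio of the dominant eigenvalue modulus to the sum of the remaining moduli is only one plausible choice of a cost that diverges as R(θ) approaches rank 1; the patent itself only requires J(θ) to be some function of the M complex eigenvalues.

```python
import numpy as np

def direction_cost(R, eps=1e-12):
    """Steps 701-702: eigen-decompose R(theta) and score the candidate.

    Returns a cost that grows without bound as |l2|, ..., |lM| -> 0,
    i.e. as R(theta) approaches rank 1 at the true source direction.
    """
    # Step 701: complex eigenvalues sorted by decreasing modulus.
    lam = np.linalg.eigvals(R)
    mags = np.sort(np.abs(lam))[::-1]  # |l1| >= |l2| >= ... >= |lM|
    # Step 702 (assumed ratio form): dominant over residual eigenvalues.
    return mags[0] / (mags[1:].sum() + eps)
```

For a rank-1 matrix the residual moduli vanish and the cost blows up, matching the "positive infinity in theory" behavior noted above.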
Six, cost function maximum position determination
In step 108 of Fig. 1, the speech source direction is estimated as the candidate direction that maximizes J(θ):
where Θ is the set formed by all candidate directions.
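Step 108 reduces to a grid search over Θ; a minimal sketch, assuming the cost J is available as a callable:

```python
import numpy as np

def estimate_direction(J, candidates):
    """Step 108: pick the candidate direction maximizing the cost J(theta)."""
    costs = [J(theta) for theta in candidates]
    return candidates[int(np.argmax(costs))]

# Usage: a 1-degree grid over the half-plane (grid resolution is a design choice).
# theta_hat = estimate_direction(J, np.arange(0.0, 180.0, 1.0))
```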
From this description, further modifications and variations of the present invention will be apparent to those skilled in the art. Accordingly, this description is to be regarded as illustrative, and its purpose is to teach those skilled in the art the general manner of carrying out the present invention. It should be understood that the forms of the invention shown and described herein are to be taken as the presently preferred embodiments.
Claims (9)
1. A speech sound source localization method based on a bispectrum weighted spatial correlation matrix, comprising the following steps:
Step a: collecting multi-channel noisy speech signals with a microphone array, dividing each channel's noisy speech signal into frames, and computing on each frame the cross-bispectrum between each channel's time-domain signal and the first microphone signal;
Step b: computing, in the bispectrum domain, the bispectrum phase difference between each microphone channel and the first microphone channel;
Step c: estimating, in the bispectrum domain, the bispectrum unit weights using the cross-bispectrum of the first microphone signal with itself;
Step d: defining the candidate direction set;
Step e: constructing the bispectrum weighted spatial correlation matrix for the current candidate direction from the bispectrum phase differences and the bispectrum unit weights;
Step f: computing the sound direction cost function for the current candidate direction from the eigenvalues of the bispectrum weighted spatial correlation matrix;
Step g: repeating steps e to f until all candidate directions in the candidate direction set have been traversed, and obtaining the speech source direction estimate as the candidate direction corresponding to the maximum of the sound direction cost function.
2. The sound source localization method of claim 1, wherein step a comprises computing the original cross-bispectrum between each channel's time-domain signal and the first microphone signal from the digital observation signals received by each microphone channel, using either the direct or the indirect bispectrum estimation technique.
3. The sound source localization method of claim 2, wherein step a comprises post-processing the original cross-bispectra between each channel's time-domain signal and the first microphone signal of claim 2 with a time-smoothing strategy:
where the smoothed quantity is the cross-bispectrum between the m-th microphone signal and the first microphone signal at frame t, obtained from the corresponding original cross-bispectrum, and 0 ≤ α < 1 is the smoothing factor.
4. The sound source localization method of claim 1, wherein step b comprises computing the bispectrum phase difference between each microphone channel and the first microphone channel using the following two equations:
where Im1(Ω1, Ω2) is the bispectrum phase difference between the m-th microphone and the first microphone signal, [·]* is the complex conjugate operator, and |·| is the modulus operator.
5. The sound source localization method of claim 1, wherein estimating the bispectrum unit weights in step c using the cross-bispectrum of the first microphone signal with itself comprises:
denoting the cross-bispectrum of the first microphone signal with itself at frame t, estimating the bispectrum unit weights then comprises the following steps:
Step c1: estimating the bispectrum a priori signal-to-noise ratio of the first microphone signal at frame t;
Step c2: computing the bispectrum unit weight as a function of the bispectrum a priori signal-to-noise ratio.
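Claim 5 leaves the weight function unspecified. A common choice in this family of methods, used here purely as an illustrative assumption and not as the patent's formula, is a Wiener-like gain driven by a decision-directed a priori SNR estimate:

```python
import numpy as np

def bispectrum_unit_weight(self_bispec, noise_bispec, xi_prev, a=0.98):
    """Sketch of steps c1-c2 under assumed decision-directed / Wiener forms.

    self_bispec  : |cross-bispectrum of channel 1 with itself| at frame t.
    noise_bispec : noise bispectrum level estimate for the same bin.
    xi_prev      : a priori SNR estimate from the previous frame.
    a            : decision-directed smoothing constant.
    """
    # Step c1 (assumed decision-directed update): a priori SNR estimate.
    gamma = self_bispec / max(noise_bispec, 1e-12)        # a posteriori SNR
    xi = a * xi_prev + (1.0 - a) * max(gamma - 1.0, 0.0)  # a priori SNR
    # Step c2 (assumed Wiener-like gain): weight in [0, 1).
    return xi / (1.0 + xi)
```

Such a weight de-emphasizes low-SNR bispectrum bins when the correlation matrix is accumulated, which is the stated purpose of the unit weights.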
6. The sound source localization method of claim 1, wherein constructing the bispectrum weighted spatial correlation matrix for the current candidate direction in step e comprises the following steps:
Step e1: computing the complex-domain bispectrum phase difference vector from the bispectrum phase differences between each microphone channel and the first microphone channel;
Step e2: computing the bispectrum phase difference compensation vector from the current candidate direction;
Step e3: computing the compensated bispectrum phase difference vector from the bispectrum phase difference vector and the bispectrum phase difference compensation vector;
Step e4: constructing the original bispectrum weighted spatial correlation matrix for the current candidate direction from the compensated bispectrum phase difference vector and the bispectrum unit weights;
Step e5: post-processing the original bispectrum weighted spatial correlation matrix with a time-smoothing strategy.
7. The sound source localization method of claim 1 or claim 6, wherein post-processing the original bispectrum weighted spatial correlation matrix with a time-smoothing strategy in step e5 comprises:
where R(θ) is the bispectrum weighted spatial correlation matrix for the current candidate direction θ, obtained by smoothing the original bispectrum weighted spatial correlation matrix, and 0 ≤ β < 1 is the smoothing factor.
8. The sound source localization method of claim 1, wherein computing the sound direction cost function for the current candidate direction in step f comprises:
letting the bispectrum weighted spatial correlation matrix for the current candidate direction θ be R(θ), the microphone array comprise M microphones in total, and the M complex eigenvalues of R(θ) be sorted in descending order of modulus, |λ1(θ)| ≥ |λ2(θ)| ≥ ... ≥ |λM(θ)|; then the sound direction cost function for the current candidate direction θ is defined as a function of the M complex eigenvalues.
9. The sound source localization method of claim 1, wherein obtaining the speech source direction estimate as the direction corresponding to the maximum of the sound direction cost function in step g comprises:
letting all candidate directions form the candidate direction set Θ, and letting the sound direction cost function for the current candidate direction θ be J(θ); then the speech source direction is estimated as the candidate direction maximizing J(θ) over Θ.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510937548.3A CN105609113A (en) | 2015-12-15 | 2015-12-15 | Bispectrum weighted spatial correlation matrix-based speech sound source localization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105609113A true CN105609113A (en) | 2016-05-25 |
Family
ID=55988996
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510937548.3A Pending CN105609113A (en) | 2015-12-15 | 2015-12-15 | Bispectrum weighted spatial correlation matrix-based speech sound source localization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105609113A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106289506A (en) * | 2016-09-06 | 2017-01-04 | 大连理工大学 | A kind of method using POD decomposition method to eliminate flow field wall microphone array noise signal |
CN106526541A (en) * | 2016-10-13 | 2017-03-22 | 杭州电子科技大学 | Sound positioning method based on distribution matrix decision |
CN107219512A (en) * | 2017-03-29 | 2017-09-29 | 北京大学 | A kind of sound localization method based on acoustic transfer function |
CN108198562A (en) * | 2018-02-05 | 2018-06-22 | 中国农业大学 | A kind of method and system for abnormal sound in real-time positioning identification animal house |
CN108398664A (en) * | 2017-02-07 | 2018-08-14 | 中国科学院声学研究所 | A kind of analytic expression space for microphone array solves aliasing method |
CN108540898A (en) * | 2017-03-03 | 2018-09-14 | 松下电器(美国)知识产权公司 | Sound source detection device and method, the recording medium for recording sound source locator |
CN109831709A (en) * | 2019-02-15 | 2019-05-31 | 杭州嘉楠耘智信息科技有限公司 | Sound source orientation method and device and computer readable storage medium |
CN110082724A (en) * | 2019-05-31 | 2019-08-02 | 浙江大华技术股份有限公司 | A kind of sound localization method, device and storage medium |
CN110133594A (en) * | 2018-02-09 | 2019-08-16 | 北京搜狗科技发展有限公司 | A kind of sound localization method, device and the device for auditory localization |
CN110580911A (en) * | 2019-09-02 | 2019-12-17 | 青岛科技大学 | beam forming method capable of inhibiting multiple unstable sub-Gaussian interferences |
CN110728988A (en) * | 2019-10-23 | 2020-01-24 | 浪潮金融信息技术有限公司 | Implementation method of voice noise reduction camera for self-service terminal equipment |
CN111352075A (en) * | 2018-12-20 | 2020-06-30 | 中国科学院声学研究所 | Underwater multi-sound-source positioning method and system based on deep learning |
Non-Patent Citations (2)
Title |
---|
WEI XUE等: ""Noise Robust Direction of Arrival Estimation for Speech Source With Weighted Bispectrum Spatial Correlation Matrix"", 《IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING》 * |
WEI XUE等: ""Weighted spatial bispectrum correlation matrix for DOA estimation in the presence of interferences"", 《INTERSPEECH 2014》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105609113A (en) | Bispectrum weighted spatial correlation matrix-based speech sound source localization method | |
US11308974B2 (en) | Target voice detection method and apparatus | |
CN112526451B (en) | Compressed beam forming and system based on microphone array imaging | |
CN108731886B (en) | A kind of more leakage point acoustic fix ranging methods of water supply line based on iteration recursion | |
CN107817465A (en) | The DOA estimation method based on mesh free compressed sensing under super-Gaussian noise background | |
CN109782231B (en) | End-to-end sound source positioning method and system based on multi-task learning | |
CN105068048A (en) | Distributed microphone array sound source positioning method based on space sparsity | |
CN110146846B (en) | Sound source position estimation method, readable storage medium and computer equipment | |
CN107102296A (en) | A kind of sonic location system based on distributed microphone array | |
CN104898086B (en) | Estimate sound source direction method suitable for the sound intensity of mini microphone array | |
CN103854660B (en) | A kind of four Mike's sound enhancement methods based on independent component analysis | |
CN101893698B (en) | Noise source test and analysis method and device | |
CN103559888A (en) | Speech enhancement method based on non-negative low-rank and sparse matrix decomposition principle | |
CN106468770A (en) | Closely optimum radar target detection method under K Distribution Clutter plus noise | |
CN106066468A (en) | A kind of based on acoustic pressure, the vector array port/starboard discrimination method of vibration velocity Mutual spectrum | |
CN108957403B (en) | Gaussian fitting envelope time delay estimation method and system based on generalized cross correlation | |
CN111948598B (en) | Method and device for detecting space domain interference signal | |
CN109407046A (en) | A kind of nested array direction of arrival angle estimation method based on variational Bayesian | |
CN104360305A (en) | Radiation source direction finding positioning method of uniting compressed sensing and signal cycle stationary characteristics | |
CN102915735B (en) | Noise-containing speech signal reconstruction method and noise-containing speech signal device based on compressed sensing | |
Wang | Multi-band multi-centroid clustering based permutation alignment for frequency-domain blind speech separation | |
CN106226729A (en) | Relatively prime array direction of arrival angular estimation method based on fourth-order cumulant | |
CN104665875A (en) | Ultrasonic Doppler envelope and heart rate detection method | |
Lemos et al. | Using matrix norms to estimate the direction of arrival of planar waves on an ULA | |
Li et al. | Noise reduction of ship-radiated noise based on noise-assisted bivariate empirical mode decomposition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20160525 |
RJ01 | Rejection of invention patent application after publication |