CN106093864A

CN106093864A - A real-time spatial sound source localization method using a microphone array

Info

Publication number: CN106093864A
Application number: CN201610391351.9A
Authority: CN
Inventors: 杨毅; 孙甲松
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2016-06-03
Filing date: 2016-06-03
Publication date: 2016-11-09
Anticipated expiration: 2036-06-03
Also published as: CN106093864B

Abstract

The invention provides a real-time spatial sound source localization method that uses a microphone array as the signal acquisition and output device. The steered response power-phase transform (SRP-PHAT) technique is first used to generate preliminary candidate points for the spatial position of the sound source. The candidate points are then screened using prior knowledge, and SRP-PHAT is used to compute the steered response power output at each candidate point. An improved stochastic region contraction redefines the search boundary, which improves the efficiency of SRP-PHAT. Finally, the steered response power of the remaining candidate points is computed, and the position with the maximum value is taken as the final estimated sound source position. The localization principle is clear and the real-time performance is good; experiments show that the positioning error in the plane can be kept on the order of centimetres, outperforming prior-art methods. The method offers high computational speed and robustness and can be applied in scenarios that require real-time sound source localization, such as smart homes and intelligent robots.

Description

A real-time spatial sound source localization method using a microphone array

Technical field

The invention belongs to the field of speech technology, and in particular relates to a real-time spatial sound source localization method using a microphone array.

Background art

Speech processing generally requires knowledge of the spatial position of the sound source, which supports interactive technologies such as voiceprint recognition and speech content recognition. For example, when a user talks with an intelligent robot, the robot must determine the person's position in space in order to turn toward and approach them; even in the dark it can determine the user's direction from sound alone, automatically move close to the speaker, and provide the necessary service and help. In video conferencing, the conference camera usually needs to be steered automatically toward the current speaker and to switch the video between different speakers.

The device commonly used for sound source localization is the microphone array, usually defined as a set of multiple (typically more than three) microphones arranged according to a specified geometry that capture sound signals in full synchronization. Common microphone array localization methods currently include: localization based on time difference of arrival (Time Delay of Arrival, TDOA), which first estimates the time differences with which the source signal reaches different array elements and then determines the source position from the array geometry; localization based on steered response power (Steered Response Power, SRP); and localization based on high-resolution spectral estimation (High-resolution Spectral Estimation).

1. Time difference of arrival (Time Delay of Arrival, TDOA) method:

This method first obtains, through time delay estimation, the time differences with which the source signal reaches different array elements, and then determines the source position from the geometry of the microphone array. The commonly used delay estimator is the generalized cross-correlation (Generalized Cross Correlation, GCC) method. Specifically, let X_i(ω) and X_j(ω) denote the Fourier transforms of the signal x_i(t) received by the i-th microphone and the signal x_j(t) received by the j-th microphone, respectively; then:

Ψ_{ij}(ω) = X_{i}(ω) X_{j}^{*}(ω)

R_{ij}(τ) = \int_{-\infty}^{\infty} W_{ij}(ω) Ψ_{ij}(ω) e^{iωτ} dω

\hat{τ}_{ij} = \arg\max_{τ} R_{ij}(τ)

Here Ψ_ij(ω) is the cross-power spectral density of the two signals, R_ij(τ) is their cross-correlation function, \hat{τ}_ij is the estimated delay of the source between the i-th and j-th microphones, and W_ij(ω) is a frequency-domain weighting coefficient; when this weighting coefficient is 1, the result is the basic cross-correlation. Other commonly used frequency-domain weighting functions include phase transform (PhAse Transform, PHAT) weighting, ROTH weighting, maximum likelihood weighting, and SCOT weighting. PHAT weighting is the most common: the delay information in generalized cross-correlation is carried by the phase of the cross-power spectrum and is independent of its amplitude, so PHAT weighting removes the influence of the amplitude and makes the peak of the correlation function sharper. The PHAT weighting function is W_ij(ω) = 1/|Ψ_ij(ω)|.
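The three GCC equations with PHAT weighting can be sketched in a few lines of NumPy. This is a minimal illustration rather than the patent's implementation: the function name `gcc_phat`, the zero-padding length, and the small constant added to avoid division by zero are assumptions made here.

```python
import numpy as np

def gcc_phat(x_i, x_j, fs):
    """Estimate the delay (seconds) of x_i relative to x_j with GCC-PHAT.

    A positive result means x_i lags x_j. Names and padding are illustrative.
    """
    n = len(x_i) + len(x_j)              # zero-pad to avoid circular wrap-around
    X_i = np.fft.rfft(x_i, n=n)
    X_j = np.fft.rfft(x_j, n=n)
    psi = X_i * np.conj(X_j)             # cross-power spectrum Psi_ij
    psi /= np.abs(psi) + 1e-12           # PHAT weighting W_ij = 1 / |Psi_ij|
    r = np.fft.irfft(psi, n=n)           # generalized cross-correlation R_ij
    max_shift = n // 2
    # Reorder so that index 0 corresponds to lag -max_shift.
    r = np.concatenate((r[-max_shift:], r[:max_shift + 1]))
    shift = int(np.argmax(r)) - max_shift
    return shift / fs
```

Because the PHAT weighting discards amplitude, the correlation peak stays narrow even for coloured signals, which is the sharpening effect described above.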

The TDOA-based localization method above is simple in principle and computationally efficient, but its delay estimation performance degrades sharply under strong noise or reverberation, so it is not suitable for low signal-to-noise ratio or real-world acoustic scenes. Other delay estimation methods also exist, for example those using joint entropy and mutual information from information theory, iterative methods, or methods that account for the asymmetry of different channels; however, their results under strong reverberation and noise are not ideal, and some of them, such as the iterative methods, are slow to compute.

2. Steered response power-phase transform (Steered Response Power-PhAse Transform, SRP-PHAT) method

The SRP-PHAT localization method searches the global space for the point with the maximum steered power output; the finer the grid, the higher the search resolution. Its localization principle is as follows:

Assume the sound source is located at s, and let τ_i denote the propagation delay from the source to the i-th microphone. The SRP-PHAT output P^PHAT(s) for the source position s is:

P^{PHAT}(s) = Σ_{i=1}^{m} Σ_{j=1}^{m} \int_{-\infty}^{\infty} \frac{Ψ_{ij}(ω)}{|Ψ_{ij}(ω)|} e^{jωτ_{ji}} dω

Here Ψ_ij(ω) = X_i(ω) X_j^*(ω) is the cross-power spectral density of the two signals, and τ_ji is the difference between the delays from the source at s to the j-th and i-th microphones. The estimated source position \hat{s} is:

\hat{s} = \arg\max_{s} P^{PHAT}(s)

The analysis above shows that the SRP-PHAT algorithm hypothesizes a source at some position in space, computes the PHAT-weighted steered power output for that position, and finally chooses the position s that maximizes P^PHAT(s) as the estimated source position. Because every hypothesized position has to be evaluated, accurate localization generally requires a grid search that traverses all grid positions in the space. Assume the search space c has length, width, and height X, Y, and Z, and the grid spacing is δ; then the number of grid points N is:

N = (\frac{X}{δ} + 1) \times (\frac{Y}{δ} + 1) \times (\frac{Z}{δ} + 1)

Assuming X = 3 m, Y = 3 m, Z = 2 m and grid spacing δ = 0.01 m, the number of grid points is N = 301 × 301 × 201 = 18210801; that is, P^PHAT(s) must be evaluated at more than 18 million points. Even in two dimensions only, 301 × 301 = 90601 evaluations are required, so the efficiency is very low.
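The grid count in this example is easy to check numerically. The helper below is a small sketch; the name `grid_point_count` is hypothetical, and it assumes each side length is a whole multiple of the grid spacing.

```python
def grid_point_count(x, y, z, delta):
    """Number of grid points in an x * y * z box sampled every delta metres."""
    # round() guards against floating-point error in the divisions.
    return (round(x / delta) + 1) * (round(y / delta) + 1) * (round(z / delta) + 1)

print(grid_point_count(3.0, 3.0, 2.0, 0.01))  # 18210801
print(grid_point_count(3.0, 3.0, 0.0, 0.01))  # 90601 (two-dimensional case)
```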

3. Stochastic region contraction method

Stochastic region contraction (Stochastic Region Contraction, SRC) based on SRP-PHAT is an algorithm proposed by Do H and Silverman H F in 2007 to improve the efficiency of SRP-PHAT. Its main idea is to sample points at random, select the subset of points with the largest PHAT steered response power, redefine the search boundary from that subset, and repeat the operation in the new region until the global maximum is found. The algorithm is robust provided that the probability of missing the peak during random sampling, that is, the probability of never sampling the source position, is almost zero. Let the source volume be V_s and the room volume be V_room. The probability that a single random sample hits the peak is p_hit = V_s/V_room, and the probability that it misses the peak is p_miss = 1 - p_hit = 1 - V_s/V_room. If the first sampling round draws M points in total, the probability that all of them miss the peak is P_miss(M) = (1 - V_s/V_room)^M. To ensure that this probability stays below a threshold p_thre, the number of samples M must satisfy P_miss(M) ≤ p_thre, which gives M ≥ ln(p_thre)/ln(1 - V_s/V_room). In general, p_thre = 0.005 or p_thre = 0.001 is sufficient for robustness. However, because the sample points in this algorithm are completely random, the associated vectors must be computed for every randomly sampled point, which costs more than computing the steered response power at that point alone.
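The bound on the number of first-round samples follows from requiring (1 - V_s/V_room)^M ≤ p_thre. A minimal sketch of that calculation is given below; the function name `min_samples` and the example volumes are hypothetical values chosen only for illustration.

```python
import math

def min_samples(v_source, v_room, p_thre):
    """Smallest M such that (1 - v_source / v_room) ** M <= p_thre."""
    p_miss = 1.0 - v_source / v_room      # miss probability of a single sample
    return math.ceil(math.log(p_thre) / math.log(p_miss))

# Hypothetical example: a 0.001 m^3 source region in an 18 m^3 room.
m_needed = min_samples(0.001, 18.0, 0.005)
print(m_needed)
```

The required M grows roughly in proportion to V_room/V_s, which is why fully random sampling becomes expensive when the source region is small relative to the room.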

4. High-resolution spectral estimation method

Sound source localization based on high-resolution spectral estimation borrows localization techniques from radar arrays. Radar signals, however, are mostly narrowband, whereas speech signals have a comparatively wide bandwidth, so this approach does not perform well.

Sound source localization is widely used in human-computer interaction, but traditional methods cannot achieve high robustness and real-time performance at the same time.

In summary, sound source localization with microphone arrays has been widely studied. Compared with other methods, localization based on steered response power resists environmental noise and reverberation, but its real-time computational performance is poor, which makes it hard to use in practical application scenarios.

Summary of the invention

To overcome the shortcomings of the prior art described above, the object of the invention is to provide a real-time spatial sound source localization method using a microphone array. Its localization principle is clear and its real-time performance is good; experiments show that the positioning error in the plane can be kept on the order of centimetres, outperforming prior-art methods. The proposed method offers high computational speed and robustness and can be applied in scenarios that require real-time sound source localization, such as smart homes and intelligent robots.

To achieve this object, the invention adopts the following technical solution:

A real-time spatial sound source localization method using a microphone array, comprising the following steps:

First, using a microphone array as the signal acquisition and output device, preliminary candidate points for the spatial position of the sound source are generated using the steered response power-phase transform (SRP-PHAT) method;

Second, the preliminary candidate points are screened using prior knowledge, and the SRP-PHAT method is used to compute the steered response power output of the candidate points;

Then, the search boundary is redefined by an improved stochastic region contraction (Stochastic Region Contraction, SRC), which improves the efficiency of the SRP-PHAT method;

Finally, the steered response power of the remaining candidate points is computed, and the position with the maximum value is taken as the final estimated sound source position.

Generating the preliminary candidate points for the spatial position of the sound source means determining all spatial candidate points for the source, as follows:

Assume the space c to be searched has length, width, and height X, Y, and Z, and the grid spacing is δ; then the number of grid points is N = (X/δ + 1) × (Y/δ + 1) × (Z/δ + 1), that is, there are N candidate points for the spatial position of the sound source.

Screening the preliminary candidate points means making a first reduction in the number of candidate points, as follows:

Assume the microphone array comprises m microphones. The vector of arrival time differences from a source at s to the microphone array is TDOA_s = [τ_1,2,s, τ_1,3,s, τ_1,4,s, τ_2,3,s, τ_2,4,s, τ_3,4,s, ...]^T, where τ_i,j,s = (d_i,s - d_j,s)/v is the time delay between the i-th and j-th microphones for a source at s, v is the speed of sound in air, d_i,s is the physical distance from the source at s to the i-th microphone, and d_j,s is the physical distance from the source at s to the j-th microphone.

The sample-number difference (Sample number Difference, SD) vector is defined as:

SD_s = [sd_1,2,s, sd_1,3,s, sd_1,4,s, sd_2,3,s, sd_2,4,s, sd_3,4,s, ...]^T

When the signal sampling frequency is fs:

sd_i,j,s = round(fs τ_i,j,s)

where round rounds each element to the nearest integer. If several candidate points yield identical sample-number difference vectors SD, only one of them (any one) is retained and the others are deleted to avoid repeated computation.

Next, the sample-number difference between every two candidate points s1 and s2 is obtained:

SD_s1,s2 = abs(SD_s1 - SD_s2)

where abs takes the element-wise absolute value. Among all candidate points that satisfy max(SD_s1,s2) ≤ threshold, only one (any one) is retained and the others are deleted to avoid repeated computation. The threshold is defined as:

threshold = (\frac{1}{5} \times \frac{λ}{v} \times fs)

where λ is the wavelength of the sound. When the sampling frequency fs is 16000 Hz, setting threshold = 1 effectively reduces the number of candidate points.

In practical applications, the TDOA vectors and sample-number difference (SD) vectors, together with the selection of candidate points, need to be computed in advance and stored in a look-up table, so that during localization they only need to be looked up by index and never recomputed. Building the look-up table in advance is also key to the speed-up.
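The screening and look-up table can be sketched as follows, starting from precomputed SD vectors. This is one possible reading of the text, not the patent's code: the source does not say which candidate of a near-identical group to keep, so the sketch greedily keeps the first one encountered; the names `prune_candidates` and `build_lookup` are hypothetical.

```python
import numpy as np

def prune_candidates(sd, threshold=1):
    """Return indices of candidates kept after SD-based screening.

    sd: (N, P) integer array, one sample-number difference vector per candidate.
    """
    # Step 1: among candidates with identical SD vectors, keep the first.
    _, first_idx = np.unique(sd, axis=0, return_index=True)
    remaining = np.sort(first_idx)

    # Step 2 (greedy, an assumption): drop a candidate when some already kept
    # candidate differs from it by at most `threshold` samples in every pair.
    kept = []
    for idx in remaining:
        if kept:
            worst = np.abs(sd[kept] - sd[idx]).max(axis=1)
            if np.any(worst <= threshold):
                continue
        kept.append(int(idx))
    return kept

def build_lookup(positions, sd, kept):
    """Look-up table: candidate index -> (position, SD vector)."""
    return {i: (positions[i], sd[i]) for i in kept}
```

The table is built once offline; at localization time only the surviving indices are visited, so the pairwise comparison cost does not affect real-time performance.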

The steered response power output of the candidate points is computed as follows:

Assume a candidate point obtained by screening is located at spatial position s. X_i(ω) and X_j(ω) denote the Fourier transforms of the signal x_i(t) received by the i-th microphone and the signal x_j(t) received by the j-th microphone, respectively; τ_i denotes the propagation delay from the source to the i-th microphone; and τ_ji denotes the difference between the delays from the source at s to the j-th and i-th microphones. By the definition of the SRP-PHAT method, the steered response power output P^PHAT(s) for the source position s is:

P^{PHAT}(s) = Σ_{i=1}^{m} Σ_{j=1}^{m} \int_{-\infty}^{\infty} \frac{Ψ_{ij}(ω)}{|Ψ_{ij}(ω)|} e^{jωτ_{ji}} dω

Here Ψ_ij(ω) = X_i(ω) X_j^*(ω) is the cross-power spectral density between the two signals and m is the total number of microphones. P^PHAT(s) is computed for every candidate point obtained by screening.
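A discrete version of P^PHAT(s) for a batch of candidate points might look like the sketch below. Several things here are assumptions rather than details from the source: the function name `srp_phat`, the speed of sound of 343 m/s, the use of a single FFT frame, and the sign of the steering delay, which is taken as τ_i - τ_j so that it cancels the phase of Ψ_ij under NumPy's FFT convention (the source does not spell out its sign convention for τ_ji).

```python
import numpy as np

def srp_phat(frames, fs, mic_pos, candidates, v=343.0):
    """Steered response power with PHAT weighting at each candidate position.

    frames:     (m, n) array, one row of time samples per microphone.
    mic_pos:    (m, 3) microphone coordinates in metres.
    candidates: (K, 3) candidate source coordinates in metres.
    """
    m, n = frames.shape
    X = np.fft.rfft(frames, axis=1)
    omega = 2 * np.pi * np.fft.rfftfreq(n, d=1.0 / fs)
    # Propagation delay tau_i from every candidate to every microphone: (K, m).
    tau = np.linalg.norm(candidates[:, None, :] - mic_pos[None, :, :], axis=2) / v
    power = np.zeros(len(candidates))
    for i in range(m):
        for j in range(m):
            psi = X[i] * np.conj(X[j])        # cross-power spectrum Psi_ij
            psi /= np.abs(psi) + 1e-12        # PHAT weighting
            steer = tau[:, i] - tau[:, j]     # steering delay (sign assumed)
            # Sum over frequency bins approximates the integral over omega.
            power += np.real(np.exp(1j * np.outer(steer, omega)) @ psi)
    return power
```

The estimated position is then `candidates[np.argmax(power)]`. In a look-up-table implementation the delays in `tau` would be read from the table rather than recomputed for every frame.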

Redefining the search boundary means searching for the global maximum with a fast stochastic region contraction algorithm, as follows:

Instead of sampling at random, the steered response power output is computed for the candidate points obtained by screening (whose TDOA vectors and sample-number difference SD vectors are available directly from the look-up table of the second step). The N largest outputs are selected, M of the corresponding candidate points are chosen at random, and the search boundary is redefined from these points. The selection and search are then repeated in the new region until the required precision is met.
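One contraction step can be sketched as below. The source does not state how the new boundary is derived from the M chosen points; the sketch assumes an axis-aligned bounding box, which is a common choice for region contraction, and the name `contract_region` is hypothetical.

```python
import numpy as np

def contract_region(points, powers, n_top, m_keep, rng):
    """One contraction step; returns (lower, upper) corners of the new box.

    points: (K, 3) candidate coordinates; powers: (K,) their SRP outputs.
    """
    top = np.argsort(powers)[-n_top:]                      # N largest outputs
    chosen = rng.choice(top, size=m_keep, replace=False)   # random M of those N
    selected = points[chosen]
    # Assumed boundary rule: bounding box of the chosen points.
    return selected.min(axis=0), selected.max(axis=0)
```

In the next round, only screened candidates that fall inside the returned box would be evaluated, so each step shrinks the number of SRP-PHAT evaluations.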

How N is chosen, and how the subsequent random sample size M is determined, depends on the specific microphone array geometry, the room size, and so on. There are many strategies for choosing M and N; the following may be used:

Mode one: N is set to a fixed value, and M is also set to a fixed value. For example, with a four-element microphone array, efficiency and accuracy are generally best at N = 100.

Mode two: in each round, based on the steered response power outputs of the previous N candidate points, the candidate points whose output is larger than the mean are selected; let their number be N'. If N - N' ≤ N', then N - N' of these candidate points are chosen at random as candidate points; if N - N' > N', all N' candidate points are retained. This guarantees that the mean increases after every region contraction. Selection and search stop when the number of steered response power evaluations for these candidate points exceeds a given threshold.
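Mode two can be read literally as the selection rule below. This is a sketch of that literal reading only; the function name `select_mode_two` is hypothetical, and the stopping criterion on the number of evaluations is left to the caller.

```python
import numpy as np

def select_mode_two(powers, rng):
    """Indices of the candidates carried into the next round under mode two.

    powers: (N,) steered response power outputs of the previous N candidates.
    """
    n = len(powers)
    above = np.flatnonzero(powers > powers.mean())   # outputs larger than the mean
    n_prime = len(above)
    if n - n_prime <= n_prime:
        # Randomly pick N - N' of the above-mean candidates.
        return rng.choice(above, size=n - n_prime, replace=False)
    # Otherwise keep all N' above-mean candidates.
    return above
```

Since only above-mean candidates survive, the mean output of the retained set cannot fall below the previous mean, which matches the stated guarantee.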

The localization result obtained in this way satisfies typical precision requirements. If more precise localization is needed, a grid search can be run over a small range around the final source estimate. Alternatively, the look-up table can be used: first obtain the regions corresponding to the several largest steered response power outputs, find all candidate points in the grids adjacent to these regions, compute the steered response power of these candidate points, and take the position with the maximum value as the final estimated sound source position.

Compared with the prior art, the invention has the following beneficial effects:

(1) The proposed method of preliminarily screening out candidate points using prior information greatly reduces the cost of computing the steered response power output at the candidate points and is applicable to a variety of scenarios;

(2) The proposed method of storing the vectors associated with candidate positions in a look-up table in advance is simple in principle and cheap to compute, and effectively improves real-time performance;

(3) The two proposed methods for a further precise search of the source position based on the candidate points further improve the resolution and precision of localization with low computational complexity, and are suitable for devices and scenarios with modest hardware configurations.

The proposed real-time spatial sound source localization method using a microphone array has a lower computational cost than the state of the art when localizing a source signal in space. It has the advantages of wide applicability and real-time response, and is suitable for scenarios such as intelligent robots and smart homes that require accurate localization by sound.

Brief description of the drawings

Fig. 1 is an overall schematic diagram of the real-time spatial sound source localization method using a microphone array.

Detailed description of the embodiments

Embodiments of the invention are described in detail below with reference to the drawing and examples.

As shown in Fig. 1, the complete computation procedure of the embodiment is as follows:

1. Determine all spatial candidate points for the sound source

Assume the space c to be searched has length, width, and height X, Y, and Z, and the grid spacing is δ; then the number of grid points N is:

N = (\frac{X}{δ} + 1) \times (\frac{Y}{δ} + 1) \times (\frac{Z}{δ} + 1) (formula 1)

That is, there are N candidate points for the spatial position of the sound source.

2. Compute the arrival time difference vectors and sample-number difference vectors

Assume the microphone array comprises m ≥ 4 microphones. The vector of arrival time differences TDOA_s from a source at s to the microphone array is:

TDOA_s = [τ_1,2,s, τ_1,3,s, τ_1,4,s, τ_2,3,s, τ_2,4,s, τ_3,4,s, ...]^T (formula 2)

where τ_i,j,s = (d_i,s - d_j,s)/v is the time delay between the i-th and j-th microphones for a source at s, v is the speed of sound in air, d_i,s is the physical distance from the source at s to the i-th microphone, and d_j,s is the physical distance from the source at s to the j-th microphone.

The sample-number difference vector SD_s is defined as:

SD_s = [sd_1,2,s, sd_1,3,s, sd_1,4,s, sd_2,3,s, sd_2,4,s, sd_3,4,s, ...]^T (formula 3)

When the signal sampling frequency is fs:

sd_i,j,s = round(fs τ_i,j,s) (formula 4)

where round rounds each element to the nearest integer and fs is the sampling frequency. The sample-number difference between every two candidate points s1 and s2 is then obtained:

SD_s1,s2 = abs(SD_s1 - SD_s2) (formula 5)

where abs takes the element-wise absolute value.
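Formulas 2 to 4 translate directly into array operations. The sketch below assumes a speed of sound of 343 m/s and uses hypothetical function names; note that `np.rint` rounds exact halves to the nearest even integer, which differs from some other rounding conventions only at ties.

```python
import itertools
import numpy as np

def tdoa_vector(s, mic_pos, v=343.0):
    """TDOA_s: delays (d_i - d_j) / v for all microphone pairs i < j (formula 2)."""
    d = np.linalg.norm(mic_pos - s, axis=1)      # distance from s to each microphone
    pairs = itertools.combinations(range(len(mic_pos)), 2)
    return np.array([(d[i] - d[j]) / v for i, j in pairs])

def sd_vector(s, mic_pos, fs, v=343.0):
    """SD_s: TDOA_s expressed in whole samples at rate fs (formulas 3 and 4)."""
    return np.rint(fs * tdoa_vector(s, mic_pos, v)).astype(int)

def sd_between(sd_s1, sd_s2):
    """SD_s1,s2: element-wise absolute difference of two SD vectors (formula 5)."""
    return np.abs(sd_s1 - sd_s2)
```

`itertools.combinations` yields the pairs in the order (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) when counted from one, which matches the element order of formula 2 for a four-microphone array.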

3. Delete some candidate points

In the computation of step 2, if several candidate points yield identical sample-number difference vectors SD under formula 4, only one of them (any one) is retained and the others are deleted to avoid repeated computation. The remaining candidate points are then compared using formula 5: among all candidate points that satisfy max(SD_s1,s2) ≤ threshold, only one (any one) is retained and the others are deleted to avoid repeated computation. The threshold is defined here as:

threshold = (\frac{1}{5} \times \frac{λ}{v} \times fs)

where λ is the wavelength of the sound.

In addition, in practical applications, the TDOA vectors and sample-number difference SD vectors, together with the selection of candidate points, need to be computed in advance and stored in a look-up table, so that during localization they only need to be looked up by index and never recomputed. Building the look-up table in advance is also key to the speed-up.

4. Compute the steered response power output of the candidate points

Assume a candidate point obtained after the above steps is located at spatial position s. X_i(ω) and X_j(ω) denote the Fourier transforms of the signal x_i(t) received by the i-th microphone and the signal x_j(t) received by the j-th microphone, respectively; τ_i denotes the propagation delay from the source to the i-th microphone; and τ_ji denotes the difference between the delays from the source at s to the j-th and i-th microphones. By the definition of the SRP-PHAT method, the steered power output P^PHAT(s) for the source position s is:

P^{PHAT}(s) = Σ_{i=1}^{m} Σ_{j=1}^{m} \int_{-\infty}^{\infty} \frac{Ψ_{ij}(ω)}{|Ψ_{ij}(ω)|} e^{jωτ_{ji}} dω

Here Ψ_ij(ω) = X_i(ω) X_j^*(ω) is the cross-power spectral density between the two signals. P^PHAT(s) is computed for all candidate points obtained by the screening in the preceding steps.

5. Determine M and N and redefine the search boundary

From the values of P^PHAT(s) computed in the above steps, the N largest are selected, M of the corresponding candidate points are chosen at random, and the search boundary is redefined from these points.

How N is chosen, and how the subsequent random sample size M is determined, depends on the specific microphone array geometry, the room size, and so on. There are many strategies for choosing M and N; the simplest is to fix both M and N. For example, with a four-element microphone array, efficiency and accuracy are generally best at N = 100. Another strategy for choosing N is as follows: in each round, based on the values of P^PHAT(s) of the previous N candidate points, the candidate points whose P^PHAT(s) is larger than the mean are selected; let their number be N'. If N - N' ≤ N', then N - N' of these candidate points are chosen at random as candidate points; if N - N' > N', all N' candidate points are retained. This guarantees that the mean increases after every region contraction. Selection and search stop when the number of P^PHAT(s) evaluations for these candidate points exceeds a given threshold.

6. Repeat the selection and search, and output the candidate points once the required precision is met

After the above steps, a new search region is obtained. The selection and search are repeated in this region until the required precision is met, and the candidate points that meet the requirement are then output. The localization result obtained in this way satisfies typical precision requirements.

7. Achieve finer localization with the grid method or the look-up table

If more precise localization is needed, a grid search can be run over a small range around the final source estimate. Alternatively, the look-up table can be used: first obtain the regions corresponding to the several largest values of P^PHAT(s), find all candidate points in the grids adjacent to these regions, compute the steered response power of these candidate points, and take the position with the maximum value as the final estimated sound source position.

Claims

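1. A real-time spatial sound source localization method using a microphone array, characterized in that it comprises the following steps:

First, using a microphone array as the signal acquisition and output device, preliminary candidate points for the spatial position of the sound source are generated using the steered response power-phase transform (SRP-PHAT) method;

Second, the preliminary candidate points are screened using prior knowledge, and the SRP-PHAT method is used to compute the steered response power output of the candidate points;

Then, the search boundary is redefined by an improved stochastic region contraction (Stochastic Region Contraction, SRC), which improves the efficiency of the SRP-PHAT method;

Finally, the steered response power of the remaining candidate points is computed, and the position with the maximum value is taken as the final estimated sound source position.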

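2. The real-time spatial sound source localization method using a microphone array according to claim 1, characterized in that generating the preliminary candidate points for the spatial position of the sound source means determining all spatial candidate points for the source, as follows:

Assume the space c to be searched has length, width, and height X, Y, and Z, and the grid spacing is δ; then the number of grid points is N = (X/δ + 1) × (Y/δ + 1) × (Z/δ + 1), that is, there are N candidate points for the spatial position of the sound source.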

3. The real-time spatial sound source localization method using a microphone array according to claim 1, characterized in that screening the preliminary candidate points means making a first reduction in the number of candidate points, as follows:

Assume the microphone array comprises m microphones. The vector of arrival time differences from a source at s to the microphone array is TDOA_s = [τ_1,2,s, τ_1,3,s, τ_1,4,s, τ_2,3,s, τ_2,4,s, τ_3,4,s, ...]^T, where τ_i,j,s = (d_i,s - d_j,s)/v is the time delay between the i-th and j-th microphones for a source at s, v is the speed of sound in air, d_i,s is the physical distance from the source at s to the i-th microphone, and d_j,s is the physical distance from the source at s to the j-th microphone;

The sample-number difference (Sample number Difference, SD) vector is defined as:

SD_s = [sd_1,2,s, sd_1,3,s, sd_1,4,s, sd_2,3,s, sd_2,4,s, sd_3,4,s, ...]^T

When the signal sampling frequency is fs:

sd_i,j,s = round(fs τ_i,j,s)

where round rounds each element to the nearest integer; if several candidate points yield identical sample-number difference vectors SD, only one of them (any one) is retained and the others are deleted to avoid repeated computation;

The sample-number difference between every two candidate points s1 and s2 is then obtained:

SD_s1,s2 = abs(SD_s1 - SD_s2)

where abs takes the element-wise absolute value; among all candidate points that satisfy max(SD_s1,s2) ≤ threshold, only one (any one) is retained and the others are deleted to avoid repeated computation, the threshold being defined as:

threshold = (\frac{1}{5} \times \frac{λ}{v} \times fs)

where λ is the wavelength of the sound.

4. The real-time spatial sound source localization method using a microphone array according to claim 3, characterized in that threshold = 1 is set when the sampling frequency fs is 16000 Hz.

5. The real-time spatial sound source localization method using a microphone array according to claim 3, characterized in that the TDOA vectors and sample-number difference SD vectors, together with the selection of candidate points, are computed in advance and stored in a look-up table, and during localization are looked up directly by index without being recomputed.

6. The real-time spatial sound source localization method using a microphone array according to claim 3, characterized in that the steered response power output of the candidate points is computed as follows:

Assume a candidate point obtained by screening is located at spatial position s; X_i(ω) and X_j(ω) denote the Fourier transforms of the signal x_i(t) received by the i-th microphone and the signal x_j(t) received by the j-th microphone, respectively; τ_i denotes the propagation delay from the source to the i-th microphone; τ_ji denotes the difference between the delays from the source at s to the j-th and i-th microphones; by the definition of the SRP-PHAT method, the steered response power output P^PHAT(s) for the source position s is:

P^{PHAT}(s) = Σ_{i=1}^{m} Σ_{j=1}^{m} \int_{-\infty}^{\infty} \frac{Ψ_{ij}(ω)}{|Ψ_{ij}(ω)|} e^{jωτ_{ji}} dω

where Ψ_ij(ω) = X_i(ω) X_j^*(ω) is the cross-power spectral density between the two signals and m is the total number of microphones; P^PHAT(s) is computed for every candidate point obtained by screening.

7. The real-time spatial sound source localization method using a microphone array according to claim 1, characterized in that redefining the search boundary means searching for the global maximum with a fast stochastic region contraction algorithm, as follows:

The steered response power output is computed for the candidate points obtained by screening; the N largest outputs are selected, M of the corresponding candidate points are chosen at random, and the search boundary is redefined from these points; the selection and search are then repeated in the new region until the required precision is met.

8. The real-time spatial sound source localization method using a microphone array according to claim 7, characterized in that the choice of N and the determination of the subsequent random sample size M depend on the specific microphone array geometry and the room size, and N is chosen in one of the following ways:

Mode one: N is set to a fixed value, and M is also set to a fixed value;

Mode two: in each round, based on the steered response power outputs of the previous N candidate points, the candidate points whose output is larger than the mean are selected, their number being N'; if N - N' ≤ N', then N - N' of these candidate points are chosen at random as candidate points; if N - N' > N', all N' candidate points are retained; selection and search stop when the number of steered response power evaluations for these candidate points exceeds a given threshold.

9. The real-time spatial sound source localization method using a microphone array according to claim 8, characterized in that, in mode one, N = 100 is selected for a four-element microphone array.

10. The real-time spatial sound source localization method using a microphone array according to claim 1, characterized in that a grid search is run over a small range around the final source estimate; or, using the look-up table, the regions corresponding to the several largest steered response power outputs are first obtained, all candidate points in the grids adjacent to these regions are found, the steered response power of these candidate points is computed, and the position with the maximum value is taken as the final estimated sound source position.