CN113702909A

CN113702909A - Sound source positioning analytic solution calculation method and device based on sound signal arrival time difference

Info

Publication number: CN113702909A
Application number: CN202111003228.2A
Authority: CN
Inventors: 李帛翰; 黄志尧; 邱奕臻; 冀海峰; 王保良
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2021-08-30
Filing date: 2021-08-30
Publication date: 2021-11-26
Anticipated expiration: 2041-08-30
Also published as: CN113702909B

Abstract

The invention discloses a method and device for computing an analytic solution for sound source localization based on the time difference of arrival of sound signals. The method uses an array of five microphones whose configuration guarantees that the coefficient matrix of the linear system for the sound source position is invertible. Five channels of sound signals are acquired simultaneously, the time delays between the observed signals are estimated with a generalized cross-correlation algorithm using PHAT as the weighting function, and an analytic solution is then obtained from the delays for the three-dimensional coordinates of the sound source and for the time from emission of the sound by the source to its reception by the microphones. By choosing the microphone array specifically, the method yields an analytic solution for the coordinates of a point sound source in three-dimensional space. It offers high accuracy, small size, low power consumption and little impact on the environment, and is suitable for scenes that satellite positioning systems cannot cover, such as indoor warehouses, and for scenes where emitting electromagnetic waves is prohibited or where strong sources of electromagnetic interference are nearby.

Description

Sound source positioning analytic solution calculation method and device based on sound signal arrival time difference

Technical Field

The present invention relates to sound source localization technologies, and in particular, to a method and an apparatus for computing a sound source localization analytic solution based on a time difference of arrival of a sound signal.

Background

As science and technology develop, increasingly complex scenes place higher demands on target positioning technology. Early active target positioning technologies transmit detection signals, such as electromagnetic or ultrasonic waves, receive the signals reflected by the environment, and then compute the position of the target in space. Compared with active target positioning, passive target positioning has the advantages of small size, low power consumption, strong concealment, wide applicability and little impact on the environment, because the measuring device does not need to emit a detection signal. Passive positioning is therefore studied intensively, and its results have been applied in fields including national defense, security, warehousing and ecological environment monitoring.

Sound source localization is a typical passive positioning technology. Its main principle is that sound receivers pick up the sound signal emitted by a sound source, and signal processing then analyzes the relevant parameters of the signal to obtain the position of the source. Existing sound source localization techniques are mainly classified into localization based on time difference of arrival (TDOA), time of arrival (TOA), direction of arrival (DOA, the signal incidence angle) and received signal strength (RSSI), among others. TDOA-based localization is one of the most widely used methods, and it proceeds in two steps: delay estimation (time difference estimation) and position solving. Delay estimation computes from the sound signals the differences in propagation time from the sound source to the different microphones, and a widely used algorithm is cross-correlation. Position solving computes the position of the sound source from the delays, and the mainstream methods include Newton iteration and spherical interpolation. However, these mainstream position-solving methods yield only a numerical solution for the source position, and the result may diverge during iteration and depends strongly on the initial value of the iteration. In addition, the common existing methods generally need a large number of microphones in the array and require a large amount of computation to obtain the numerical solution.
Therefore, a method is needed that obtains an analytic solution for the sound source position with as little computation and as few microphones as possible.

Disclosure of Invention

The invention provides a sound source positioning analytic solution calculation method and a sound source positioning analytic solution calculation device based on sound signal arrival time difference, aiming at the current situation of a sound source positioning technology based on signal arrival time difference. The invention adopts five microphones to form a microphone array to collect sound signals, selects a proper weighting function in a generalized cross-correlation algorithm to carry out time delay estimation, and obtains an analytic solution of a sound source position through time delay solving. The microphone configuration and the method have the advantages of small calculation amount, high accuracy and the like in the aspect of sound source positioning.

The technical scheme of the invention is as follows:

The invention provides a configuration of a sound source localization microphone array, comprising five microphones with coordinates M_i(x_i, y_i, z_i), i = 1, 2, ..., 5. Let microphone M₁ be the reference microphone, which may be any one of the five microphones. Given the four time differences between the arrival of the sound at each of the remaining four microphones and its arrival at the reference microphone, the condition for the three-dimensional sound source coordinates to have an analytic solution is that the matrix A is invertible, and the five microphone coordinates in the array are configured to satisfy this condition, where the expression of matrix A is as follows:

where c is the speed of sound; the reference microphone is denoted M₁(x₁, y₁, z₁) and, without loss of generality, may be any one of the five microphones; and t_i1 is the difference between the time t_i0 at which the sound signal emitted by the sound source is received by the i-th microphone M_i (that is, the time the signal needs to propagate from the source to M_i) and the time t₁₀ at which it is received by the reference microphone M₁ (that is, the time the signal needs to propagate from the source to M₁): t_i1 = t_i0 - t₁₀.

One way to determine whether matrix A is invertible is to check that the determinant of A is not equal to 0, i.e. det A ≠ 0.
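The invertibility check can be tried numerically. The sketch below assembles A and the right-hand side for an arbitrary five-microphone layout and compares a layout with one microphone out of plane against a fully coplanar one. The function names `build_system` and `delays_from_source`, the example coordinates, the row order (M₂ to M₅) and the sign convention are assumptions made for illustration and are not taken from the source.

```python
import numpy as np

def build_system(mics, delays, c):
    """Assemble A and sigma for A @ [x, y, z, t10] = sigma.

    mics   : (5, 3) microphone coordinates, row 0 is the reference M1
    delays : (4,) values t_i1 = t_i0 - t10 for i = 2..5, in seconds
    c      : speed of sound in m/s
    Row order (M2..M5) and signs are assumptions of this sketch.
    """
    mics = np.asarray(mics, dtype=float)
    delays = np.asarray(delays, dtype=float)
    ref, others = mics[0], mics[1:]
    A = np.hstack([2.0 * (others - ref), (2.0 * c**2 * delays)[:, None]])
    sigma = (others**2).sum(axis=1) - (ref**2).sum() - (c * delays) ** 2
    return A, sigma

def delays_from_source(mics, source, c):
    """Ideal (noise-free) delays t_i1 for a known source position."""
    dist = np.linalg.norm(np.asarray(mics, float) - np.asarray(source, float), axis=1)
    return (dist[1:] - dist[0]) / c

c = 343.0
source = np.array([1.2, 0.7, 0.9])

# Four coplanar microphones (concave quadrilateral) plus one out of plane.
good = np.array([[0, 0, 0], [0.4, 0, 0], [0, 0.4, 0], [0.1, 0.1, 0], [0, 0, 0.4]], float)
# All five microphones in the plane z = 0: the z column of A vanishes.
flat = good.copy()
flat[4] = [0.4, 0.4, 0.0]

A_good, _ = build_system(good, delays_from_source(good, source, c), c)
A_flat, _ = build_system(flat, delays_from_source(flat, source, c), c)
det_good = np.linalg.det(A_good)
det_flat = np.linalg.det(A_flat)
rank_flat = np.linalg.matrix_rank(A_flat)
```

With the fifth microphone moved into the plane, the third column of A is identically zero, so the determinant vanishes and the z coordinate can no longer be recovered from the linear system.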

To ensure that matrix A is invertible, to reduce the computation needed for the determinant and inverse of A, and to obtain a valid analytic solution, the optimized microphone array configuration is as follows: four microphones lie in the same plane at the four vertices of a concave quadrilateral, and the fifth microphone lies anywhere outside that plane. The invertibility of the matrix A for this configuration is proved as follows:

The coordinate system and the microphone coordinates are set up as shown in FIG. 2. Without loss of generality, microphones M₁, M₂, M₃ and M₄ lie in the X-Y plane, i.e. z₂ = z₃ = z₄ = 0, with point M₁ at (0, 0, 0). Microphone M₄ lies inside the triangle with vertices M₁, M₂, M₃ and not on any of its sides, so the four points M₁, M₂, M₃, M₄ form a concave quadrilateral. Microphone M₅ lies outside this plane, i.e. z₅ ≠ 0. Let the position vector of the i-th microphone be m_i = [x_i, y_i, z_i]^T, i = 1, 2, ..., 5, and let the sound source position vector be p = [x, y, z]^T. The distance from the sound source to microphone M_i is:

D_i = ||m_i - p||, i = 1, 2, ..., 5, where ||·|| denotes the 2-norm of a vector.

Points M₁, M₂, M₃ and M₄ all lie in the X-Y plane, and the triangle formed by M₁, M₂ and M₃ is a convex set. Since point M₄ lies inside the triangle and not on its sides, and M₁ is the origin, linear algebra and convex set theory give:

m₄＝αm₂+βm₃ (1)

x₄＝αx₂+βx₃ (2)

y₄＝αy₂+βy₃ (3)

where α > 0, β > 0 and α + β < 1. Accordingly, matrix A can be expressed as:

The determinant of matrix A is:

In the expression for det A, the speed of sound c is not 0, z₅ is not 0, and the vectors m₂ and m₃ are not collinear, i.e. x₂y₃ - x₃y₂ ≠ 0.

Thus, for this optimized configuration, proving that matrix A is invertible reduces to proving that t₄₁-αt₂₁-βt₃₁≠0.

ct₂₁＝c(t₂₀-t₁₀)＝D₂-D₁ (6)

ct₃₁＝c(t₃₀-t₁₀)＝D₃-D₁ (7)

ct₄₁＝c(t₄₀-t₁₀)＝D₄-D₁ (8)

Substituting equations (6), (7) and (8) into c(t₄₁-αt₂₁-βt₃₁), with α > 0, β > 0 and α + β < 1, gives:

c(t₄₁-αt₂₁-βt₃₁)＝D₄-D₁-α(D₂-D₁)-β(D₃-D₁)＝D₄-αD₂-βD₃-(1-α-β)D₁

＝||m₄-p||-α||m₂-p||-β||m₃-p||-(1-α-β)||m₁-p||

＝||m₄-p||-||α(m₂-p)||-||β(m₃-p)||-||-(1-α-β)p|| (9)

The triangle inequality for vectors gives:

||m₄-p||＝||α(m₂-p)+β(m₃-p)-(1-α-β)p||≤||α(m₂-p)||+||β(m₃-p)||+||-(1-α-β)p|| (10)

Equality in the above formula, i.e. ||m₄-p||-||α(m₂-p)||-||β(m₃-p)||-||-(1-α-β)p||＝0, requires the three vectors m₂-p, m₃-p and m₄-p to be parallel and co-directional. In this optimized configuration, microphone M₄ lies inside the triangle with vertices M₁, M₂, M₃ and not on any of its sides, so the vectors m₂-p, m₃-p and m₄-p cannot all be parallel and co-directional. Equality therefore cannot hold, and:

c(t₄₁-αt₂₁-βt₃₁)＝||m₄-p||-||α(m₂-p)||-||β(m₃-p)||-||-(1-α-β)p||＜0 (11)

Since the speed of sound c is positive, t₄₁-αt₂₁-βt₃₁ < 0, and hence t₄₁-αt₂₁-βt₃₁ ≠ 0.

In summary, in the configuration described, det A is not 0, i.e. matrix A is invertible.
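Inequality (11) can also be spot-checked numerically. The following sketch evaluates D₄ - αD₂ - βD₃ - (1-α-β)D₁ for many random source positions with a hypothetical coplanar layout; the coordinates, the weights α and β, and the sampling range are illustrative assumptions, and a random test of this kind supports the proof but does not replace it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coplanar layout: M1 at the origin, M4 strictly inside triangle M1 M2 M3.
m1 = np.zeros(3)
m2 = np.array([0.5, 0.0, 0.0])
m3 = np.array([0.1, 0.4, 0.0])
alpha, beta = 0.3, 0.2          # alpha > 0, beta > 0, alpha + beta < 1
m4 = alpha * m2 + beta * m3     # equation (1)

def gap(p):
    """Left-hand side of (11): D4 - alpha*D2 - beta*D3 - (1 - alpha - beta)*D1."""
    d1, d2, d3, d4 = (np.linalg.norm(m - p) for m in (m1, m2, m3, m4))
    return d4 - alpha * d2 - beta * d3 - (1.0 - alpha - beta) * d1

# Random source positions in a 10 m cube around the array.
sources = rng.uniform(-5.0, 5.0, size=(10_000, 3))
gaps = np.array([gap(p) for p in sources])
worst = gaps.max()              # closest any sample comes to violating (11)
```

Every sampled source gives a strictly negative value, although the margin shrinks as the source moves away from the array, which is one reason delay accuracy matters for distant sources.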

On this basis, to further reduce the computation needed for the determinant and inverse of matrix A, let the reference microphone M₁ be the origin (0, 0, 0), with microphone M₂ at (a, 0, 0), microphone M₃ at (0, a, 0), microphone M₄ at (b, b, 0) and microphone M₅ at (0, 0, a), where a > 2b > 0 guarantees that point M₄ lies inside the triangle formed by points M₁, M₂ and M₃. The matrix A in this configuration can be expressed as:

The corresponding inverse matrix A^-1 is:

where det A = 16a²c²(at₄₁-bt₃₁-bt₂₁). The sound source coordinates P(x, y, z) and t₁₀ then have corresponding analytic expressions.
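As a numerical sanity check of this configuration, the sketch below simulates ideal delays for a known source, solves the linear system and compares the determinant with the closed-form expression. The values of a, b, c and the source position are hypothetical, and the row order (M₂ to M₅) and column order (x, y, z, t₁₀) are assumptions; with this particular ordering the numerical determinant agrees with the stated expression in magnitude, and its sign depends on the ordering convention.

```python
import numpy as np

a, b, c = 0.4, 0.1, 343.0       # hypothetical sizes in metres (a > 2b > 0), c in m/s
mics = np.array([[0, 0, 0], [a, 0, 0], [0, a, 0], [b, b, 0], [0, 0, a]], float)
source = np.array([2.0, -1.5, 1.0])

dist = np.linalg.norm(mics - source, axis=1)
t10 = dist[0] / c                # propagation time to the reference microphone
t = (dist[1:] - dist[0]) / c     # ideal delays t21, t31, t41, t51

# Rows ordered M2..M5, columns x, y, z, t10 (an assumed ordering).
A = np.array([
    [2 * a, 0.0, 0.0, 2 * c**2 * t[0]],
    [0.0, 2 * a, 0.0, 2 * c**2 * t[1]],
    [2 * b, 2 * b, 0.0, 2 * c**2 * t[2]],
    [0.0, 0.0, 2 * a, 2 * c**2 * t[3]],
])
sigma = np.array([a**2, a**2, 2 * b**2, a**2]) - (c * t) ** 2

xi = np.linalg.solve(A, sigma)   # recovered [x, y, z, t10]
det_numeric = np.linalg.det(A)
det_closed = 16 * a**2 * c**2 * (a * t[2] - b * t[1] - b * t[0])
```

Because the system is linear, no initial guess or iteration is involved; the recovered vector `xi` matches the simulated source position and `t10` up to floating-point error.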

The invention also provides a sound source positioning device based on the arrival time difference of the sound signals, which comprises the microphone array, the preamplifier, the data acquisition module and the computer. The sound signals received by each microphone are amplified by a preamplifier, then the data acquisition module is used for carrying out sound signal acquisition and analog-to-digital conversion, the data acquisition module is communicated with a computer through a data line, and the computer is used for calculating the time delay between the microphones and then calculating to obtain the analytic solution of the position of the sound source.

The invention also provides a method for computing an analytic solution for sound source localization based on the time difference of arrival of sound signals, comprising the following steps:

Step one: The five microphones of the microphone array described above each receive the signal emitted by the sound source. Each received signal is amplified by the preamplifier and then sampled and converted from analog to digital by the data acquisition module, giving the sound signals s_i, where i is the microphone number, i = 1, 2, ..., 5;

Step two: Apply framing and windowing, mean removal, normalization and fast Fourier transform (FFT) to the sound signals s_i to obtain the framed frequency-domain sound signals S'_ij, where j = 1, 2, ..., u;

Step three: According to the microphone array configuration, take the sound signal received by microphone M₁ as the reference and use the generalized cross-correlation algorithm with PHAT as the weighting function to compute the time delays t_i1, i = 2, 3, ..., 5, of the other four sound signals relative to the signal received by M₁;

Step four: With the above microphone array configuration, use the four time delays t_i1 obtained by delay estimation to establish the equations relating them to the sound source position coordinates P(x, y, z);

Step five: With the above microphone array configuration, simplify the equations of step four into a linear system in the sound source coordinates P(x, y, z) and the time t₁₀ needed for the sound signal to propagate from the sound source to the reference microphone M₁, and solve it to obtain the analytic expressions for P(x, y, z) and t₁₀.

Preferably, in step two, the framing operation takes the same position in all five sound signals as the starting point, uses a frame length w of 10 to 30 ms and a frame shift v, and selects u frames going forward, where v is any duration greater than 0 and usually 0 < v < w.

Preferably, step two further includes a Butterworth filtering step between the normalization and the fast Fourier transform.

Preferably, the step three specifically includes:

Taking the sound signal received by microphone M₁ as the reference, the relative time delays t_i1 between the other four sound signals and the reference signal are computed with the generalized cross-correlation algorithm using PHAT as the weighting function. First, each S'_ij (i = 2, 3, ..., 5) is multiplied by the conjugate of S'_1j to obtain the cross-power spectrum. The PHAT weighting function is then applied to the cross-power spectrum, and an inverse fast Fourier transform of the weighted spectrum gives the cross-correlation function. From the cross-correlation function, the relative delay of each of the other four sound signals is obtained in sampling points, and this sample delay is converted to a relative time delay t_i1j using f_s, the sampling frequency set in the analog-to-digital conversion stage. Finally, gross errors among the t_i1j are discarded and the mean of the remaining values is taken as the time delay t_i1 of each of the four sound signals relative to the signal received by microphone M₁.
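The delay estimation described above follows the usual GCC-PHAT pattern, which can be sketched for a single frame pair as follows. The function name `gcc_phat_delay`, the zero-padding length, the small constant added before division and the synthetic test signal are assumptions of this sketch, not details given in the text.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs):
    """Estimate the delay of `sig` relative to `ref` (seconds) with GCC-PHAT.

    A positive result means `sig` arrives later than `ref`.
    """
    n = len(sig) + len(ref)                    # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)                 # cross-power spectrum
    cross /= np.abs(cross) + 1e-12             # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)              # generalized cross-correlation
    max_shift = n // 2
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift       # delay in sampling points
    return lag / fs                            # delay in seconds

fs = 48_000
rng = np.random.default_rng(1)
ref = rng.standard_normal(2048)                # stand-in for one frame from M1
true_lag = 7                                   # samples
sig = np.concatenate((np.zeros(true_lag), ref))[: len(ref)]
tau = gcc_phat_delay(sig, ref, fs)
```

A single estimate of this kind is quantized to whole sampling points, so its resolution is 1/f_s; the per-frame values would then be screened for gross errors and averaged as the text describes.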

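Preferably, step four is as follows: from the five microphone coordinates M_i(x_i, y_i, z_i) and the four time delays t_i1 of the sound signals relative to the signal received by microphone M₁, obtained by delay estimation, their relationship to the sound source position coordinates P(x, y, z) is:

√((x-x₁)²+(y-y₁)²+(z-z₁)²)＝ct₁₀ (15)

√((x-x₂)²+(y-y₂)²+(z-z₂)²)＝c(t₂₁+t₁₀) (16)

√((x-x₃)²+(y-y₃)²+(z-z₃)²)＝c(t₃₁+t₁₀) (17)

√((x-x₄)²+(y-y₄)²+(z-z₄)²)＝c(t₄₁+t₁₀) (18)

√((x-x₅)²+(y-y₅)²+(z-z₅)²)＝c(t₅₁+t₁₀) (19)

where c is the speed of sound, which is determined from T, the ambient temperature in degrees Celsius.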

The fifth step is as follows: simplifying the equation in step four to obtain the sound source coordinates P (x, y, z) and t₁₀The analytical expression of (2).

First, the two sides of equations (15) to (19) are squared and expanded to obtain:

x²-2xx₁+x₁ ²+y²-2yy₁+y₁ ²+z²-2zz₁+z₁ ²＝c²t₁₀ ² (21)

x²-2xx₂+x₂ ²+y²-2yy₂+y₂ ²+z²-2zz₂+z₂ ²＝c²t₂₁ ²+2c²t₂₁t₁₀+c²t₁₀ ² (22)

x²-2xx₃+x₃ ²+y²-2yy₃+y₃ ²+z²-2zz₃+z₃ ²＝c²t₃₁ ²+2c²t₃₁t₁₀+c²t₁₀ ² (23)

x²-2xx₄+x₄ ²+y²-2yy₄+y₄ ²+z²-2zz₄+z₄ ²＝c²t₄₁ ²+2c²t₄₁t₁₀+c²t₁₀ ² (24)

x²-2xx₅+x₅ ²+y²-2yy₅+y₅ ²+z²-2zz₅+z₅ ²＝c²t₅₁ ²+2c²t₅₁t₁₀+c²t₁₀ ² (25)

Then equation (21) is subtracted from each of equations (22) to (25), and the terms are rearranged to obtain:

These equations are written as the linear system Aξ = σ:

where ξ = [x, y, z, t₁₀]^T.

When det A ≠ 0, A is invertible, and the sound source coordinates P(x, y, z) and t₁₀ are given by the analytic expression ξ = A^-1σ.
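The algebra of this step can be written out as a sketch. Subtracting (21) from each of (22) to (25) cancels the quadratic terms x², y², z² and c²t₁₀², leaving equations that are linear in x, y, z and t₁₀. The row order (M₂ to M₅) and the placement of signs below are assumptions of this reconstruction; another convention can change the sign of det A but not whether A is invertible.

```latex
% For i = 2, ..., 5, equation (2i+18) minus equation (21):
2(x_i - x_1)\,x + 2(y_i - y_1)\,y + 2(z_i - z_1)\,z + 2c^2 t_{i1}\,t_{10}
  = x_i^2 + y_i^2 + z_i^2 - x_1^2 - y_1^2 - z_1^2 - c^2 t_{i1}^2

% Matrix form A\xi = \sigma:
A = \begin{bmatrix}
2(x_2 - x_1) & 2(y_2 - y_1) & 2(z_2 - z_1) & 2c^2 t_{21} \\
2(x_3 - x_1) & 2(y_3 - y_1) & 2(z_3 - z_1) & 2c^2 t_{31} \\
2(x_4 - x_1) & 2(y_4 - y_1) & 2(z_4 - z_1) & 2c^2 t_{41} \\
2(x_5 - x_1) & 2(y_5 - y_1) & 2(z_5 - z_1) & 2c^2 t_{51}
\end{bmatrix},
\quad
\xi = \begin{bmatrix} x \\ y \\ z \\ t_{10} \end{bmatrix},
\quad
\sigma = \begin{bmatrix}
\lVert m_2 \rVert^2 - \lVert m_1 \rVert^2 - c^2 t_{21}^2 \\
\lVert m_3 \rVert^2 - \lVert m_1 \rVert^2 - c^2 t_{31}^2 \\
\lVert m_4 \rVert^2 - \lVert m_1 \rVert^2 - c^2 t_{41}^2 \\
\lVert m_5 \rVert^2 - \lVert m_1 \rVert^2 - c^2 t_{51}^2
\end{bmatrix}

% Solution when \det A \neq 0:
\xi = A^{-1}\sigma
```

Treating t₁₀ as a fourth unknown is what makes the system linear: the unknown range to the reference microphone no longer appears under a square root.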

compared with the prior art, the invention has the following beneficial effects:

1) The invention adopts a novel microphone array configuration. With five microphones, the array uses the minimum number needed for this three-dimensional localization problem, the configuration satisfies the condition for an analytic solution to exist, and a sound source at any position in space can be located accurately from the localization expressions.

2) Position calculation in the invention obtains an analytic solution by solving a system of linear equations. This overcomes drawbacks of traditional numerical solutions, such as needing a large number of microphones in the array and a large amount of computation. The analytic method is stable, accurate and computationally light, which favors real-time localization.

3) After framing and windowing the signal, the method discards gross errors and averages the per-frame delay estimates, which improves the stability of the algorithm compared with a single delay estimate computed directly on the original signal. In addition, because the sound signal is sampled in the time domain at a fixed frequency f_s, a single delay estimate has a resolution of only 1/f_s, whereas averaging over multiple frames can yield a delay estimate of finer resolution.
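The gross-error rejection and averaging in point 3) might look like the sketch below. The text does not say which rejection rule is used, so the median and median-absolute-deviation (MAD) rule, the threshold k and the function name `robust_mean_delay` are all assumptions chosen for illustration.

```python
import numpy as np

def robust_mean_delay(frame_delays, fs, k=3.0):
    """Discard gross errors with a median/MAD rule, then average the rest.

    frame_delays : per-frame delay estimates t_i1j in seconds
    fs           : sampling frequency in Hz
    k            : rejection threshold in robust standard deviations (assumed)
    """
    d = np.asarray(frame_delays, dtype=float)
    med = np.median(d)
    mad = np.median(np.abs(d - med))
    # Never reject values within one sampling interval of the median.
    tol = max(k * 1.4826 * mad, 1.0 / fs)
    keep = np.abs(d - med) <= tol
    return d[keep].mean()

fs = 48_000
# Per-frame delays in sampling points: mostly 7 or 8, plus two gross errors.
lags = np.array([7, 7, 8, 7, -300, 8, 7, 7, 8, 512, 8, 7])
tau = robust_mean_delay(lags / fs, fs)
```

In this example the two gross errors are dropped and the remaining six 7-sample and four 8-sample estimates average to 7.4 sampling points, a value that no single estimate quantized to 1/f_s could return.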

4) In the generalized cross-correlation algorithm, the PHAT weighting function is adopted, the effect is equivalent to whitening filtering, and the peak value of the generalized cross-correlation function is sharper in the time domain.

Drawings

FIG. 1 is a schematic diagram of the sound source localization device based on sound signal arrival time difference.

FIG. 2 is the optimized spatial configuration of the microphone array used in the invention.

FIG. 3 is a further optimization of the spatial configuration of the microphone array shown in FIG. 2.

Detailed Description

The invention provides a novel microphone array configuration and a sound source positioning method aiming at the configuration on the basis of the sound source positioning problem based on the sound signal arrival time difference. Compared with the traditional method, the method has the characteristics of strong stability, high accuracy and small calculated amount.

As shown in FIG. 1, a device for sound source localization based on the time difference of arrival of sound signals includes a microphone array, a preamplifier, a data acquisition module and a computer. Five microphones receive the sound signals and are arranged in a configuration that guarantees an analytic solution for the sound source position. The sound signal received by each microphone is amplified by the preamplifier, and the microphones are connected to the data acquisition module by a flat cable, which both powers the microphones and carries the sound signals. The data acquisition module is connected to a power module, which powers the sound signal acquisition and analog-to-digital conversion circuitry, and communicates with the computer through a data line.

The microphone array comprises five microphones with coordinates M_i(x_i, y_i, z_i), i = 1, 2, ..., 5. Let microphone M₁ be the reference microphone, which may be any one of the five microphones. Given the four time differences between the arrival of the sound at each of the remaining four microphones and its arrival at the reference microphone, the condition for the three-dimensional sound source coordinates to have an analytic solution is that the matrix A is invertible, and the sound source localization microphone array is configured to satisfy this condition, where the expression of matrix A is:

where c is the speed of sound; the reference microphone is denoted M₁(x₁, y₁, z₁); and t_i1 is the difference between the time t_i0 at which the sound signal emitted by the sound source is received by the i-th microphone M_i (that is, the time the signal needs to propagate from the source to M_i) and the time t₁₀ at which it is received by the reference microphone M₁ (that is, the time the signal needs to propagate from the source to M₁): t_i1 = t_i0 - t₁₀.

One way to determine whether matrix A is invertible is to check that the determinant of A is not equal to 0:

As shown in FIG. 2, to reduce the computation needed for the determinant and inverse of matrix A and to obtain a valid analytic solution, an optimized microphone array configuration places four microphones (M₁, M₂, M₃, M₄) in the same plane at the four vertices of a concave quadrilateral, and the fifth microphone (M₅) anywhere outside that plane.

On this basis, to further reduce the computation needed for the determinant and inverse of matrix A, as shown in FIG. 3, let microphone M₁ be the origin (0, 0, 0), with microphone M₂ at (a, 0, 0), microphone M₃ at (0, a, 0), microphone M₄ at (b, b, 0) and microphone M₅ at (0, 0, a), where a > 2b > 0. The matrix A in this configuration is invertible and has the form:

The corresponding inverse matrix is:

where det A = 16a²c²(at₄₁-bt₃₁-bt₂₁). In this case, the analytic expressions for the sound source coordinates P(x, y, z) and t₁₀ are:

In implementing the method, the sound signals emitted by the sound source are first received by the microphone array and sent to the computer through the preamplifier and the data acquisition module, giving five sound signal sequences s_i (i = 1, 2, ..., 5). Framing and windowing, mean removal, normalization, Butterworth filtering and fast Fourier transform are then applied to these sequences to obtain the processed frequency-domain sound signals S'_ij (i = 1, 2, ..., 5). Next, the sound signal received by any microphone M_k can be taken as the reference, and the relative time delays of the other four sound signals are estimated with the PHAT-weighted generalized cross-correlation algorithm. This embodiment describes only the case in which the signal received by microphone M₁ is the reference, which gives the time delays t_i1 (i = 2, 3, ..., 5) of the other four sound signals relative to the signal received by M₁. Finally, the analytic solution P(x, y, z) for the sound source coordinates is computed from the delay estimates. The specific operations are as follows:

1) Frame and window the sound signals s_i (i = 1, 2, ..., 5) received by the computer:

s′_ij(n)＝hamming(n)s_i(N+n+(j-1)×r), n＝1, 2, ..., q; i＝1, 2, ..., 5; j＝1, 2, ..., u

where j denotes the j-th of the u frames, N the starting point of framing, q the frame length, r the frame shift, and hamming(n) a Hamming window of length wlen. The framing operation takes the same position in all five sound signals as the starting point. In the time domain the frame length is 10 to 30 ms and the frame shift is any duration v, usually chosen so that adjacent frames overlap, and u frames are selected going forward.

2) Apply mean removal, normalization, Butterworth filtering and fast Fourier transform to the framed and windowed signals:

where bw denotes the time-domain impulse response of the Butterworth filter.
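Operations 1) and 2) can be sketched for one channel as follows. The function name `frame_spectra`, the zero-based indexing, the peak normalization and the test tone are assumptions of this sketch, and the Butterworth filtering step is left out to keep the example short.

```python
import numpy as np

def frame_spectra(s, start, q, r, u):
    """Split one channel into u Hamming-windowed frames and return their FFTs.

    s     : 1-D array of samples from one microphone
    start : index N of the first sample of the first frame
    q     : frame length in samples
    r     : frame shift in samples
    u     : number of frames
    Butterworth filtering is omitted in this sketch.
    """
    s = np.asarray(s, dtype=float)
    if start + (u - 1) * r + q > len(s):
        raise ValueError("signal too short for the requested frames")
    # Row j holds the sample indices of frame j.
    idx = start + r * np.arange(u)[:, None] + np.arange(q)[None, :]
    frames = s[idx] * np.hamming(q)                   # framing and windowing
    frames -= frames.mean(axis=1, keepdims=True)      # mean removal
    peak = np.abs(frames).max(axis=1, keepdims=True)
    frames /= np.where(peak > 0, peak, 1.0)           # peak normalization (assumed)
    return np.fft.rfft(frames, axis=1)                # one spectrum per frame

fs = 48_000
q = int(0.020 * fs)      # 20 ms frame, within the 10 to 30 ms range
r = q // 2               # 50 % overlap, so 0 < r < q
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)     # 1 kHz test tone
S = frame_spectra(x, start=0, q=q, r=r, u=10)
```

Using the same `start`, `q`, `r` and `u` for all five channels keeps frame j aligned across microphones, which the per-frame delay estimation relies on.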

3) With microphone M₁ as the reference, calculate the cross-power spectrum:

4) Calculate the PHAT-weighted generalized cross-correlation function:

5) Calculate the time delay:

6) Calculate the speed of sound c:

where T is the known ambient Celsius temperature.
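For illustration only, a common textbook approximation for the speed of sound in dry air as a function of Celsius temperature is sketched below. The constant 331.3 m/s, the square-root form and the function name `speed_of_sound` are assumptions; they are not necessarily the expression used in this document.

```python
import math

def speed_of_sound(temp_c):
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius.

    Uses the ideal-gas relation c = c0 * sqrt(1 + T / 273.15) with
    c0 = 331.3 m/s; an assumed textbook approximation.
    """
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)

c20 = speed_of_sound(20.0)   # roughly 343 m/s at room temperature
```

Since every entry in the last column of A scales with c², a temperature error of a few degrees shifts the computed position, so measuring T at the array is worthwhile.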

7) From the five microphone coordinates M_i(x_i, y_i, z_i) and the four time delays t_i1 of the sound signals relative to the signal received by microphone M₁, obtained by delay estimation, their relationship to the sound source position coordinates P(x, y, z) is:

First, both sides of equations (15) to (19) are squared and expanded to obtain:

x²-2xx₁+x₁ ²+y²-2yy₁+y₁ ²+z²-2zz₁+z₁ ²＝c²t₁₀ ² (21)

These equations are written as the linear system Aξ = σ:

where ξ = [x, y, z, t₁₀]^T.

With five microphones, the array uses the minimum number needed for this three-dimensional localization problem, the configuration satisfies the condition for an analytic solution to exist, and a sound source at any position in space can be located accurately from the localization expressions. The microphone configuration and the analytic method for solving the sound source position from the sound signal arrival time differences enable omnidirectional sound source localization in three-dimensional space, and the analytic solution obtained is accurate and stable.

Claims

1. A configuration of a sound source localization microphone array, characterized by comprising five microphones with coordinates M_i(x_i, y_i, z_i), i = 1, 2, ..., 5; microphone M₁ is the reference microphone, which may be any one of the five microphones; given the four time differences between the arrival of the sound at each of the remaining four microphones and its arrival at the reference microphone, the condition for the three-dimensional sound source coordinates to have an analytic solution is that the matrix A is invertible, and the five microphone coordinates in the configuration of the sound source localization microphone array are configured to satisfy this condition, where the expression of matrix A is as follows:

where c is the speed of sound, the reference microphone is denoted M₁(x₁, y₁, z₁), and t_i1 is the difference between the time t_i0 at which the sound signal emitted by the sound source is received by the i-th microphone M_i and the time t₁₀ at which it is received by the reference microphone M₁, t_i1 = t_i0 - t₁₀.

2. The configuration of the sound source localization microphone array according to claim 1, wherein four microphones are located at four vertices of a concave quadrangle in the same plane, and the fifth microphone is located at any position outside the plane.

3. The configuration of the sound source localization microphone array according to claim 1, wherein microphone M₁ is the origin (0, 0, 0), microphone M₂ is at (a, 0, 0), microphone M₃ is at (0, a, 0), microphone M₄ is at (b, b, 0) and microphone M₅ is at (0, 0, a), where a > 2b > 0; the matrix A in this configuration is invertible and has the form:

The corresponding inverse matrix A^-1 is:

4. a sound source localization device based on sound signal arrival time difference is characterized by comprising the sound source localization microphone array configuration, a preamplifier, a data acquisition module and a computer, wherein the sound source localization microphone array configuration is as claimed in any one of claims 1-3; the sound signals received by each microphone are amplified by a preamplifier, and then are subjected to sound signal acquisition and signal analog-to-digital conversion by a data acquisition module, the data acquisition module is communicated with a computer through a data line, and the computer is used for calculating the analytic solution of the position of the sound source.

5. A method for computing an analytic solution for sound source localization based on sound signal arrival time difference, characterized by comprising the following steps:

Step one: the five microphones of the microphone array of claim 1 each receive the signal emitted by the sound source; each received signal is amplified by the preamplifier and then sampled and converted from analog to digital by the data acquisition module to obtain the sound signals s_i, where i is the microphone number, i = 1, 2, ..., 5;

Step three: according to the microphone array configuration, the sound signal received by microphone M₁ is taken as the reference, and the generalized cross-correlation algorithm with PHAT as the weighting function is used to compute the time delays t_i1, i = 2, 3, ..., 5, of the four sound signals relative to the signal received by microphone M₁;

Step four: from the four time delays t_i1 of the sound signals relative to the signal received by microphone M₁, obtained by delay estimation, the equations relating them to the sound source position coordinates P(x, y, z) are obtained, i.e. equations (15) to (19);

Step five: the equations of step four are simplified into a linear system, having an analytic solution, in the sound source coordinates P(x, y, z) and the time t₁₀ needed for the sound signal emitted by the sound source to propagate to the reference microphone M₁, and the linear system is solved to obtain the analytic expressions for P(x, y, z) and t₁₀.

6. The method for computing a sound source localization analytic solution based on sound signal arrival time difference according to claim 5, characterized in that: in step two, the framing operation takes the same position in all five sound signals as the starting point, uses a frame length w of 10 to 30 ms and a frame shift v, and selects u frames going forward, where v is any duration greater than 0 and usually 0 < v < w.

7. The method for calculating a sound source localization analytic solution based on sound signal arrival time difference according to claim 5, wherein the third step specifically comprises:

Taking the sound signal received by microphone M₁ as the reference, the time delays t_i1 of the four sound signals relative to the signal received by microphone M₁ are computed with the generalized cross-correlation algorithm using PHAT as the weighting function: first, each S'_ij is multiplied by the conjugate of the reference spectrum to obtain the cross-power spectrum; the PHAT weighting function is then applied to the cross-power spectrum, and an inverse fast Fourier transform of the weighted spectrum gives the cross-correlation function; the relative delay in sampling points is converted to a relative time delay t_i1j using f_s, the sampling frequency set in the analog-to-digital conversion stage; finally, the mean of the t_i1j is taken as the time delay t_i1 of each of the four sound signals relative to the signal received by microphone M₁.

8. The method for calculating a sound source localization analytic solution based on sound signal arrival time difference according to claim 5, wherein the step five specifically comprises:

With the microphone array configuration of claim 1, the matrix A in the equations is invertible for a sound source at any position in space, and the analytic expressions for the sound source coordinates P(x, y, z) and t₁₀ are obtained as: