CN109188362B

CN109188362B - Microphone array sound source positioning signal processing method

Info

Publication number: CN109188362B
Application number: CN201811019390.1A
Authority: CN
Inventors: 孟晓辉; 理华; 肖灵
Original assignee: Institute of Acoustics CAS
Current assignee: Institute of Acoustics CAS
Priority date: 2018-09-03
Filing date: 2018-09-03
Publication date: 2020-09-08
Anticipated expiration: 2038-09-03
Also published as: CN109188362A

Abstract

The invention provides a microphone array sound source positioning method comprising the following steps: step 1) dividing the candidate sound source positions in the measurement space into Q grid points, each having a three-dimensional coordinate; sampling the M microphone signals and calculating, for each grid point, the time delay differences to each pair of different microphones; step 2) collecting the current frame of data from the M microphone channels and calculating the time delay value of each microphone pair; calculating a weight w_q for the q-th grid point from these delay values and the delay differences of step 1); then calculating the SRP-PHAT value p_q of the q-th grid point and finding, among the Q grid points, the grid point corresponding to the maximum value of w_q p_q, thereby obtaining the grid point coordinates as the sound source position estimated from the frame of data.

The invention can solve the problem that the positioning accuracy of the prior-art SRP-PHAT method degrades sharply under environmental noise and reverberation.

Description

Microphone array sound source positioning signal processing method

Technical Field

The invention belongs to the technical field of audio signal processing and array signal processing, and particularly relates to a microphone array sound source positioning signal processing method.

Background

Current microphone array localization algorithms fall into three broad categories: localization based on time difference of arrival (TDOA), steered response power (SRP) methods, and algorithms based on high-resolution spectral estimation. Algorithms based on high-resolution spectral estimation were initially applied to the localization of narrowband sources and were later adapted by many researchers to the broadband source localization problem. Extending them to broadband signals requires either dividing the signal into several sub-bands in the frequency domain or applying frequency focusing so that narrowband processing can be used. These algorithms offer high localization resolution, but the conversion from broadband to narrowband greatly increases the computational load, and in practice performance drops sharply because the number of sound sources is unknown and the noise does not satisfy the ideal white Gaussian assumption.

The core of a TDOA-based localization algorithm is the accurate estimation of the acoustic propagation delay, which is generally obtained by cross-correlation or generalized cross-correlation of the signals between microphones; the position of the sound source is then determined by a geometric algorithm. TDOA-based algorithms have a relatively small computational load, good real-time performance, and low hardware cost, and are therefore widely adopted in sound source localization. In this approach the accuracy of the delay estimate determines the accuracy of the localization, and both environmental noise and indoor reverberation degrade it.

The SRP method divides the space into a grid and places a hypothetical sound source at each grid point. For each hypothetical source, the time delay difference to each pair of microphones at known positions can be calculated; the cross-correlation values at these delay differences are summed over all microphone pairs to obtain the response power, and the hypothetical source position with the maximum response power is taken as the estimate of the true sound source position. The method combining steered response power with the phase transform (SRP-PHAT) combines the inherent robustness and short-time analysis characteristics of the steered response power method with the insensitivity of the phase transform to the acoustic environment in delay estimation, which gives a sound source localization system a degree of resistance to noise and reverberation. However, the performance of the SRP-PHAT method degrades sharply in adverse environments (strong noise interference and severe reverberation).

Disclosure of Invention

The invention aims to solve the problem that the positioning accuracy of the prior-art SRP-PHAT method degrades sharply under environmental noise and reverberation.

In order to achieve the above object, the present invention discloses a microphone array sound source localization signal processing method, which includes:

step 1) dividing the candidate sound source positions in the measurement space into Q grid points, each having a three-dimensional coordinate; sampling the M microphone signals and calculating, for each grid point, the time delay differences to each pair of different microphones;

step 2) collecting the current frame of data from the M microphone channels and calculating the time delay value of each microphone pair; calculating a weight w_q for the q-th grid point from these delay values and the delay differences of step 1); then calculating the SRP-PHAT value p_q of the q-th grid point and finding, among the Q grid points, the grid point corresponding to the maximum value of w_q p_q.

As a modification of the above method, the step 1) includes:

step 1-1) setting a microphone array consisting of M microphones distributed in three-dimensional space, the coordinates of the i-th microphone being denoted m_i, i = 1, …, M;

step 1-2) dividing all possible positions of the sound source in the measurement space into Q grid points, the three-dimensional coordinates of the q-th grid point being denoted r_q, q = 1, …, Q;

step 1-3) each microphone corresponds to a channel; the sampling frequency of the signal is f_s, the sampling length of each frame per channel is L, and the sampled signal of each channel is x_i1(n), i1 = 1, …, M, n = 1, …, L; the number of Fourier transform points is 2L-1;

step 1-4) calculating the delay difference Δτ_i1i2(q) from grid point r_q to the i1-th and i2-th channels:

$$\Delta\tau_{i1i2}(q) = \frac{\lVert \mathbf{r}_q - \mathbf{m}_{i1} \rVert - \lVert \mathbf{r}_q - \mathbf{m}_{i2} \rVert}{c}$$

where i2 = 1, …, M, i2 ≠ i1, and c is the speed of sound.

As a modification of the above method, the step 2) includes:

step 2-1) calculating the (2L-1)-point fast Fourier transform of each microphone channel signal x_i1(n), i1 = 1, …, M, n = 1, …, L, to obtain X_i1(k), i1 = 1, …, M, k = 1, …, 2L-1;

step 2-2) calculating the phase transform (PHAT) cross-correlation value R_i1i2(l) of the i1-th and i2-th microphone channels:

$$R_{i1i2}(l) = \sum_{k=1}^{2L-1} \frac{X_{i1}(k)\, X_{i2}^{*}(k)}{\lvert X_{i1}(k)\rvert\, \lvert X_{i2}(k)\rvert}\; e^{\,j 2\pi (k-1) l / (2L-1)}$$

where X_i1(k) is the frequency-domain representation of the signal x_i1(n) received on the i1-th channel, i1 = 1, …, M, n = 1, …, L, computed with a (2L-1)-point fast Fourier transform (FFT); X_i2(k) is the frequency-domain representation of the signal x_i2(n) received on the i2-th channel, i2 = 1, …, M, n = 1, …, L; X_i2^*(k) is the conjugate of X_i2(k); |X_i1(k)| is the magnitude of X_i1(k); l = 1, …, L;

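step 2-3) calculating, from R_i1i2(l), the time delay value τ̂_i1i2 between the i1-th and i2-th microphone channels as the lag at which R_i1i2(l) is maximal;

step 2-4) calculating the standard deviation between Δτ_i1i2(q) and the estimated delay values τ̂_i1i2, and taking its reciprocal as the weight w_q of each grid point:

$$w_q = \left[ \frac{1}{M(M-1)} \sum_{i1=1}^{M} \sum_{\substack{i2=1 \\ i2 \neq i1}}^{M} \left( \Delta\tau_{i1i2}(q) - \hat{\tau}_{i1i2} \right)^{2} \right]^{-1/2}$$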

step 2-5) calculating the steered response power with phase transform (SRP-PHAT) value p_q of each grid point;

step 2-6) calculating the weighted SRP-PHAT value w_q p_q of the q-th grid point, finding the maximum among the Q values of w_q p_q, and obtaining the corresponding grid point;

step 2-7) taking the coordinates of the grid point corresponding to the maximum value of w_q p_q as the sound source position for the frame of data.

The invention has the advantages that:

1. the invention discloses a microphone array sound source positioning signal processing method that adopts weighted SRP-PHAT processing: the reciprocal of the standard deviation between the delays estimated from the PHAT cross-correlation values and the true delays corresponding to a search point is used as the weight of the SRP-PHAT value when calculating the response power of a spatial grid point;

2. the closer the relative delays between a candidate position and the microphone pairs are to the delays obtained by the PHAT cross-correlation method, the larger the response power value of that position;

3. the invention can solve the problem that the positioning accuracy of the prior-art SRP-PHAT method degrades sharply under environmental noise and reverberation.

Drawings

FIG. 1 is a flow chart of a signal processing method according to the present invention.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments.

Consider a microphone array composed of M microphones distributed in three-dimensional space, the coordinates of the i-th microphone being denoted m_i, i = 1, …, M.

Depending on the estimation precision required by the system, all possible positions of the sound source can be reduced to the grid points of a three-dimensional grid in the measurement space. Suppose the space is divided into Q grid points in total, with coordinates denoted r_q, q = 1, …, Q.

Let the sampling rate of the signal be f_s and the sampling length of each frame per channel be L.
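
The grid of candidate positions described above could, for instance, be built as a uniform lattice over a box-shaped search volume. The text does not specify the grid shape or spacing, so the uniform spacing, the function name `make_grid`, and the parameters `lo`, `hi`, and `step` in this minimal sketch are all assumptions.

```python
import numpy as np

def make_grid(lo, hi, step):
    """Uniform lattice of candidate source positions (illustrative only).

    lo, hi: (x, y, z) corners of the search volume in metres.
    step:   grid spacing in metres (hypothetical parameter).
    Returns an array of shape (Q, 3), one row per grid point.
    """
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    # Number of points per axis, including the lower corner.
    counts = np.floor((hi - lo) / step + 1e-9).astype(int) + 1
    axes = [lo[d] + step * np.arange(counts[d]) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)

grid = make_grid(lo=(0.0, 0.0, 0.0), hi=(1.0, 1.0, 0.5), step=0.5)
```

A finer `step` raises Q and therefore the cost of every per-frame search, which is the precision trade-off mentioned above.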

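The weighted SRP-PHAT sound source positioning method determines the estimate of the sound source position by searching the grid for the position with the maximum weighted SRP-PHAT value:

$$\hat{q} = \arg\max_{q \in \{1, \dots, Q\}} w_q\, p_q \tag{1}$$

where p_q is the SRP-PHAT value of the q-th search point, calculated by the following formula:

$$p_q = \sum_{i1=1}^{M} \sum_{\substack{i2=1 \\ i2 \neq i1}}^{M} R_{i1i2}\big(\Delta\tau_{i1i2}(q)\big) \tag{2}$$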

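wherein the PHAT cross-correlation value R_i1i2(Δτ_i1i2(q)) is the value, at the lag l corresponding to the delay difference Δτ_i1i2(q), of:

$$R_{i1i2}(l) = \sum_{k=1}^{2L-1} \frac{X_{i1}(k)\, X_{i2}^{*}(k)}{\lvert X_{i1}(k)\rvert\, \lvert X_{i2}(k)\rvert}\; e^{\,j 2\pi (k-1) l / (2L-1)} \tag{3}$$

wherein X_i1(k) is the frequency-domain representation of the signal x_i1(n) received on the i1-th channel, i1 = 1, …, M, n = 1, …, L, computed with a (2L-1)-point FFT; X_i2(k) is the frequency-domain representation of the signal x_i2(n) received on the i2-th channel, i2 = 1, …, M, n = 1, …, L; and X_i2^*(k) is the conjugate of X_i2(k);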

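wherein Δτ_i1i2(q) is the delay difference from the q-th grid point, with coordinates r_q, to the i1-th and i2-th channels, whose microphones have coordinates m_i1 and m_i2, calculated as:

$$\Delta\tau_{i1i2}(q) = \frac{\lVert \mathbf{r}_q - \mathbf{m}_{i1} \rVert - \lVert \mathbf{r}_q - \mathbf{m}_{i2} \rVert}{c} \tag{4}$$

where i2 = 1, …, M, i2 ≠ i1, and c is the speed of sound.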
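
Because the delay differences depend only on geometry, they can be tabulated once before any audio is processed. The following is a minimal sketch under stated assumptions: the function name `delay_difference_table`, the default speed of sound of 343 m/s, the sign convention (distance to the first microphone minus distance to the second), and the use of unordered pairs i1 < i2 are illustrative choices, not details taken from the text.

```python
import itertools
import numpy as np

def delay_difference_table(mics, grid, c=343.0):
    """Tabulate grid-to-pair delay differences (illustrative only).

    mics: (M, 3) microphone coordinates in metres.
    grid: (Q, 3) grid point coordinates in metres.
    c:    speed of sound in m/s (343.0 is an assumed default).
    Returns (pairs, dtau) where dtau[q, p] is the delay difference in
    seconds from grid point q to the two microphones of pairs[p].
    """
    mics = np.asarray(mics, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # dist[q, i] = distance from grid point q to microphone i.
    dist = np.linalg.norm(grid[:, None, :] - mics[None, :, :], axis=2)
    # Unordered pairs (i1 < i2); an assumption made to avoid duplicate work.
    pairs = list(itertools.combinations(range(len(mics)), 2))
    i1 = [a for a, _ in pairs]
    i2 = [b for _, b in pairs]
    dtau = (dist[:, i1] - dist[:, i2]) / c
    return pairs, dtau

pairs, dtau = delay_difference_table(
    mics=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    grid=[[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]],
)
```

Multiplying `dtau` by the sampling rate f_s converts the table from seconds to samples, which is the unit of the cross-correlation lags.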

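The weight w_q is the reciprocal of the standard deviation between the grid delay differences and the estimated delays, calculated as follows:

$$w_q = \left[ \frac{1}{M(M-1)} \sum_{i1=1}^{M} \sum_{\substack{i2=1 \\ i2 \neq i1}}^{M} \left( \Delta\tau_{i1i2}(q) - \hat{\tau}_{i1i2} \right)^{2} \right]^{-1/2} \tag{5}$$

wherein τ̂_i1i2 is the delay value estimated from the position of the maximum of R_i1i2(τ):

$$\hat{\tau}_{i1i2} = \arg\max_{\tau} R_{i1i2}(\tau) \tag{6}$$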
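
The PHAT cross-correlation of one microphone pair and the delay estimate taken from its maximum can be sketched as follows. This is an illustration rather than the patented implementation: the function names, the small constant `eps` guarding against division by zero, and the integer-lag resolution are assumptions. NumPy's inverse FFT also applies a 1/(2L-1) scale factor, which does not move the maximum.

```python
import numpy as np

def phat_cross_correlation(x1, x2, eps=1e-12):
    """PHAT-weighted cross-correlation of two length-L frames.

    Returns (lags, r) with lags = -(L-1), ..., L-1 in samples.
    A positive lag means x1 is a delayed copy of x2.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    L = len(x1)
    nfft = 2 * L - 1                  # FFT length stated in the text
    X1 = np.fft.fft(x1, nfft)
    X2 = np.fft.fft(x2, nfft)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + eps      # PHAT: discard magnitude, keep phase
    r = np.real(np.fft.ifft(cross))
    r = np.fft.fftshift(r)            # reorder so index 0 is lag -(L-1)
    lags = np.arange(-(L - 1), L)
    return lags, r

def estimate_delay(x1, x2, fs):
    """Delay of x1 relative to x2 in seconds, from the PHAT maximum."""
    lags, r = phat_cross_correlation(x1, x2)
    return lags[np.argmax(r)] / fs
```

For example, if `x1` is the same noise signal as `x2` delayed by three samples, `estimate_delay(x1, x2, fs)` returns `3 / fs`.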

Examples

Each microphone corresponds to a channel. The sampling frequency of the signal is f_s, and the sampling length of each frame per channel is L; the sampled signal is denoted x_i1(n), i1 = 1, …, M, n = 1, …, L. The number of Fourier transform points is 2L-1.

As shown in fig. 1, the signal processing method disclosed by the present invention specifically comprises the following steps:

step 1) calculating, by formula (4), the time delay difference from each grid point to each microphone pair from the position coordinates of the microphones and the coordinates of the search grid points, and storing the results for later use. This step is performed only once;

step 2) processing each frame of data to obtain that frame's estimate of the sound source position.

The specific steps of each frame of data processing are as follows:

step 2-1) calculating the (2L-1)-point fast Fourier transform (FFT) of each channel signal x_i1(n), i1 = 1, …, M, n = 1, …, L, to obtain X_i1(k), i1 = 1, …, M, k = 1, …, 2L-1;

step 2-2) calculating the PHAT cross-correlation values R_i1i2(l) for all pairs of microphone channels according to formula (3);

step 2-3) calculating the delay estimates τ̂_i1i2 between all channel pairs from the PHAT cross-correlation values R_i1i2(τ) according to formula (6);

step 2-4) calculating, according to formula (5), the standard deviation between Δτ_i1i2(q) and τ̂_i1i2 to obtain the weight w_q of each grid point;

step 2-5) calculating the SRP-PHAT value p_q of each grid point according to formula (2);

step 2-6) calculating the weighted SRP-PHAT values w_q p_q of all grid points according to formula (1) and finding the grid point corresponding to the maximum value;

step 2-7) taking the coordinates of the grid point corresponding to the maximum value of w_q p_q as the sound source position for the frame of data.

The invention discloses a weighted SRP-PHAT microphone array sound source positioning signal processing method, which uses the reciprocal of the standard deviation between the delays estimated from the PHAT cross-correlation values and the true delays corresponding to a search point as the weight of the SRP-PHAT value when calculating the response power of a spatial grid point. The guiding idea is that if a grid point is the correct sound source position, the relative delays between that grid point and the microphone pairs are closer to the delays calculated by the PHAT cross-correlation method, so the response power value of that point is larger. By adopting this method, the accuracy of sound source positioning can be further improved.
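
The weighting idea can be sketched for one frame as follows, taking the per-pair PHAT cross-correlations and the precomputed grid delay table as inputs. This is a sketch under stated assumptions, not the patented implementation: the function name `weighted_srp_phat`, working in units of samples, rounding each grid delay to the nearest integer lag, using the root-mean-square deviation as the standard deviation, and the constant `eps` that avoids division by zero are all illustrative choices.

```python
import numpy as np

def weighted_srp_phat(R, lags, dtau_samples, eps=1e-9):
    """Pick the grid point maximizing w_q * p_q (illustrative only).

    R:            (P, n_lags) PHAT cross-correlations, one row per pair.
    lags:         (n_lags,) contiguous ascending integer lags in samples.
    dtau_samples: (Q, P) grid-to-pair delay differences in samples.
    Returns (best_q, w, p).
    """
    R = np.asarray(R, dtype=float)
    lags = np.asarray(lags)
    dtau = np.asarray(dtau_samples, dtype=float)
    # Delay estimate per pair: lag of the cross-correlation maximum.
    tau_hat = lags[np.argmax(R, axis=1)]                       # (P,)
    # Weight: reciprocal of the RMS deviation between grid delays
    # and estimated delays (eps is an assumed safeguard).
    dev = np.sqrt(np.mean((dtau - tau_hat) ** 2, axis=1))      # (Q,)
    w = 1.0 / (dev + eps)
    # SRP-PHAT value: sum over pairs of R at the nearest integer lag.
    idx = np.clip(np.rint(dtau).astype(int) - lags[0], 0, len(lags) - 1)
    p = R[np.arange(R.shape[0])[None, :], idx].sum(axis=1)     # (Q,)
    return int(np.argmax(w * p)), w, p

# Toy input: two pairs whose correlations peak at lags +1 and -2.
lags = np.arange(-3, 4)
R = np.full((2, 7), 0.1)
R[0, 4] = 1.0
R[1, 1] = 1.0
dtau_samples = np.array([[1.0, -2.0], [0.0, 0.0], [-1.0, 2.0]])
best_q, w, p = weighted_srp_phat(R, lags, dtau_samples)
```

In this toy input the first grid point matches both estimated delays, so it receives both the largest weight and the largest SRP-PHAT value. The size of `eps` bounds how strongly an exact match can dominate the product and would need tuning in practice.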

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.