Microphone array sound source localization signal processing method
Technical field
The invention belongs to the field of audio signal processing and array signal processing, and in particular relates to a microphone array sound source localization signal processing method.
Background art
Current microphone array localization algorithms fall roughly into three categories: localization based on time difference of arrival (TDOA), steered response power (SRP), and algorithms based on high-resolution spectral estimation. Algorithms based on high-resolution spectral estimation were initially applied to the localization of narrowband sources and were later gradually extended by many researchers to the wideband source localization problem. When extended to wideband signal estimation, the signal must either be divided into multiple subbands in the frequency domain, or frequency focusing must be applied so that it can be processed as a narrowband signal. Such algorithms offer very high localization resolution, but the wideband-to-narrowband conversion greatly increases the computational load; in practice their performance degrades sharply because the number of sources is unknown and the noise environment does not satisfy the ideal white Gaussian noise assumption.
The core of localization algorithms based on time difference of arrival (TDOA) is the accurate estimation of the acoustic propagation delay, which is generally obtained by cross-correlation or generalized cross-correlation between microphone signals. The sound source position is then determined by applying a geometric algorithm. TDOA-based localization algorithms have relatively low computational cost, good real-time performance, and low hardware cost; they have therefore attracted wide attention and become widely used methods for sound source localization. In these methods, the accuracy of the delay estimate determines the accuracy of the localization, and ambient noise and room reverberation both affect this accuracy.
The SRP method divides the space into a grid, with a hypothetical sound source at each grid point. For each hypothetical source, the delay difference to each pair of microphones at designated positions can be computed; summing the cross-correlation values of all microphone pairs at the corresponding delay differences yields the steered response power, and the hypothetical source position at which the response power reaches its maximum is the estimate of the true source position. The sound source localization method combining steered response power and phase transform (SRP-PHAT) combines the inherent robustness and short-time analysis of the steered response power method with the insensitivity of the phase transform to the signal environment in delay estimation, giving the localization system a certain degree of noise and reverberation immunity. However, the performance of the SRP-PHAT method still degrades sharply in harsh environments (strong noise interference, severe reverberation).
Summary of the invention
The object of the present invention is to solve the problem that, in the prior art, the localization accuracy of the SRP-PHAT method is severely affected by ambient noise and reverberation conditions and degrades sharply.
To achieve the above object, the present invention discloses a microphone array sound source localization signal processing method, comprising:
Step 1) the measurement space for estimating the sound source position is divided into Q grid points, each grid point having three-dimensional coordinates (xq, yq, zq), q=1, ..., Q; the signals of M microphones are sampled, and the delay difference from each grid point to every two different microphone signals is calculated;
Step 2) the current frame of data of the M microphone channels is acquired, and the delay value of each microphone pair is calculated; the weighting value wq of the q-th grid point is calculated from these delay values and the delay differences of step 1); then the SRP-PHAT value pq of the q-th grid point is calculated, and the grid point corresponding to the maximum value of wqpq among the Q grid points is found, so as to obtain the grid point coordinates of the estimated sound source position corresponding to this frame of data.
As an improvement of the above method, step 1) includes:
Step 1-1) an array of M microphones is distributed in three-dimensional space, the coordinates of the i1-th microphone being (xi1, yi1, zi1), i1=1, ..., M;
Step 1-2) in the measurement space, all possible sound source positions are divided into Q grid points with three-dimensional coordinates (xq, yq, zq), q=1, ..., Q;
Step 1-3) each microphone corresponds to one channel; the sampling frequency of the signal is fs, the sample length of each channel per frame is L, and the sampled signal of each channel is xi1(n), i1=1, ..., M, n=1, ..., L; the number of Fourier transform points is 2L-1;
Step 1-4) calculate the delay difference Δτi1i2(q) from grid point (xq, yq, zq) to the i1-th and i2-th channels:

Δτi1i2(q) = [ √((xq-xi1)² + (yq-yi1)² + (zq-zi1)²) - √((xq-xi2)² + (yq-yi2)² + (zq-zi2)²) ] / c

wherein i2=1, ..., M, i2 ≠ i1, and c is the speed of sound.
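For illustration only, the precomputation of step 1-4) can be sketched in Python with NumPy as follows; the array names mic_pos and grid_pos and the function name grid_delay_differences are chosen here for the example and are not part of the claimed method:

```python
import numpy as np

def grid_delay_differences(mic_pos, grid_pos, c=343.0):
    """Delay difference (in seconds) from every grid point to every microphone pair.

    mic_pos:  (M, 3) array of microphone coordinates (xi1, yi1, zi1)
    grid_pos: (Q, 3) array of grid point coordinates (xq, yq, zq)
    Returns delta_tau with shape (M, M, Q), where delta_tau[i1, i2, q] = Δτi1i2(q).
    """
    # Propagation distance from each grid point to each microphone: shape (M, Q)
    dist = np.linalg.norm(mic_pos[:, None, :] - grid_pos[None, :, :], axis=2)
    # Pairwise difference of propagation times, divided by the speed of sound c
    return (dist[:, None, :] - dist[None, :, :]) / c
```

This table depends only on the array geometry and the chosen grid, so it can be computed once and stored, as described in the method.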
As an improvement of the above method, step 2) includes:
Step 2-1) calculate separately the 2L-1 point fast Fourier transform (FFT) of each microphone channel signal xi1(n), i1=1, ..., M, n=1, ..., L, obtaining Xi1(k), i1=1, ..., M, k=1, ..., 2L-1;
Step 2-2) calculate the phase transform (PHAT) cross-correlation value Ri1i2(l) of the i1-th and i2-th microphone channels:

Ri1i2(l) = Σ(k=1,...,2L-1) [ Xi1(k) X*i2(k) / ( |Xi1(k)| |Xi2(k)| ) ] e^(j2πkl/(2L-1))

wherein Xi1(k) is the frequency-domain representation of the i1-th channel received signal xi1(n), i1=1, ..., M, n=1, ..., L, computed with a 2L-1 point fast Fourier transform (FFT); Xi2(k) is the frequency-domain representation of the i2-th channel received signal xi2(n), i2=1, ..., M, n=1, ..., L; X*i2(k) is the conjugate of Xi2(k); |Xi1(k)| is the magnitude of Xi1(k); l=1, ..., L;
Step 2-3) from Ri1i2(l), calculate the delay value τ̂i1i2 between the i1-th and i2-th microphone channels as the position of the maximum of Ri1i2(l);
Step 2-4) calculate the standard deviation between Δτi1i2(q) and τ̂i1i2 over all microphone pairs, and take its inverse as the weighting value wq of each grid point;
Step 2-5) calculate the steered response power and phase transform (SRP-PHAT) value pq of each grid point;
Step 2-6) calculate the weighted steered response power and phase transform (SRP-PHAT) value wqpq of the q-th grid point, find the maximum among the Q values of wqpq, and obtain the grid point corresponding to this maximum value;
Step 2-7) obtain, from the grid point corresponding to the maximum value of wqpq, the sound source position corresponding to this frame of data.
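For illustration only, one frame of step 2) can be sketched in Python with NumPy as follows, assuming the delay-difference table of step 1) is already available; the names frames, delta_tau, and locate_frame are chosen for the example, and the circular-lag handling is one possible realization rather than the only one:

```python
import numpy as np

def locate_frame(frames, delta_tau, fs):
    """One frame of the weighted SRP-PHAT search.

    frames:    (M, L) array, current frame of every microphone channel
    delta_tau: (M, M, Q) delay differences Δτi1i2(q) in seconds (from step 1)
    fs:        sampling frequency in Hz
    Returns the index q of the grid point with the largest weighted SRP-PHAT value.
    """
    M, L = frames.shape
    nfft = 2 * L - 1
    X = np.fft.fft(frames, n=nfft, axis=1)             # step 2-1: 2L-1 point FFT

    Q = delta_tau.shape[2]
    p = np.zeros(Q)                                     # SRP-PHAT value pq per grid point
    mismatch = []                                       # Δτi1i2(q) minus estimated delay, per pair

    lags = np.fft.fftfreq(nfft) * nfft                  # signed lag (in samples) of each IFFT bin
    for i1 in range(M):
        for i2 in range(i1 + 1, M):
            cross = X[i1] * np.conj(X[i2])
            cross /= np.abs(cross) + 1e-12              # PHAT weighting
            r = np.real(np.fft.ifft(cross))             # step 2-2: PHAT cross-correlation
            tau_hat = lags[np.argmax(r)] / fs           # step 2-3: delay estimate in seconds
            mismatch.append(delta_tau[i1, i2, :] - tau_hat)
            # step 2-5: accumulate the cross-correlation at the hypothesised delay
            lag_idx = np.rint(delta_tau[i1, i2, :] * fs).astype(int) % nfft
            p += r[lag_idx]

    sigma = np.std(np.stack(mismatch), axis=0)          # step 2-4: std over all pairs
    w = 1.0 / (sigma + 1e-12)                           # weight wq = inverse standard deviation
    return int(np.argmax(w * p))                        # steps 2-6 and 2-7: weighted peak
```

The sketch sums each microphone pair only once (i2 > i1); summing over all ordered pairs, as in the text, scales every SRP-PHAT value by the same factor and does not change the location of the maximum.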
The present invention has the following advantages:
1. The present invention discloses a microphone array sound source localization signal processing method that uses a weighted SRP-PHAT localization scheme: the inverse of the standard deviation between the delay values estimated from the PHAT cross-correlation and the delay values corresponding to each search point is used as a weight on the SRP-PHAT value when computing the response power of the spatial grid points; this method can further improve the accuracy of sound source localization;
2. In the present invention, the closer the relative delay values between the sound source position and the microphones are to the delay values computed by the PHAT cross-correlation method, the larger the response power value;
3. The present invention solves the problem that, in the prior art, the localization accuracy of the SRP-PHAT method is severely affected by ambient noise and reverberation conditions and degrades sharply.
Brief description of the drawings
Fig. 1 is a flow chart of the signal processing method of the present invention.
Detailed description of the embodiments
The present invention will be described in detail below with reference to the drawings and specific embodiments.
An array of M microphones is distributed in three-dimensional space, the coordinates of the i1-th microphone being (xi1, yi1, zi1), i1=1, ..., M. According to the estimation accuracy required of the system, all possible sound source positions in the measurement space can be reduced to the lattice points of a three-dimensional grid. Assume the space is divided into Q grid points in total, with coordinates (xq, yq, zq), q=1, ..., Q. The sampling rate of the signal is fs, and the sample length of each channel per frame is L.
The weighted SRP-PHAT sound source localization method disclosed by the invention determines the estimated sound source position by searching the grid for the position with the largest weighted SRP-PHAT value:

(x̂s, ŷs, ẑs) = (xq̂, yq̂, zq̂),  q̂ = argmax(q=1,...,Q) wqpq    (1)

wherein pq is the SRP-PHAT value of search point (xq, yq, zq), calculated as follows:

pq = Σ(i1=1,...,M) Σ(i2=1,...,M, i2≠i1) Ri1i2(Δτi1i2(q))    (2)

wherein the PHAT cross-correlation value Ri1i2(Δτi1i2(q)) is the value of Ri1i2(l) at the lag corresponding to the delay difference Δτi1i2(q), calculated as follows:

Ri1i2(l) = Σ(k=1,...,2L-1) [ Xi1(k) X*i2(k) / ( |Xi1(k)| |Xi2(k)| ) ] e^(j2πkl/(2L-1))    (3)
wherein Xi1(k) is the frequency-domain representation of the i1-th channel received signal xi1(n), i1=1, ..., M, n=1, ..., L, computed with a 2L-1 point FFT; Xi2(k) is the frequency-domain representation of the i2-th channel received signal xi2(n), i2=1, ..., M, n=1, ..., L; X*i2(k) is the conjugate of Xi2(k); |Xi1(k)| is the magnitude of Xi1(k); l=1, ..., L;
wherein Δτi1i2(q) is the delay difference from grid point (xq, yq, zq) to the i1-th and i2-th channels, calculated as:

Δτi1i2(q) = [ √((xq-xi1)² + (yq-yi1)² + (zq-zi1)²) - √((xq-xi2)² + (yq-yi2)² + (zq-zi2)²) ] / c    (4)

wherein i2=1, ..., M, i2 ≠ i1, and c is the speed of sound.
The weighting value wq is calculated as follows:

wq = 1 / σq    (5)

wherein σq is the standard deviation, over all microphone pairs (i1, i2), of the differences Δτi1i2(q) - τ̂i1i2, and τ̂i1i2 is the delay value estimated from the position of the maximum of Ri1i2(τ):

τ̂i1i2 = argmax(τ) Ri1i2(τ)    (6)
Embodiment
An array of M microphones is distributed in three-dimensional space, the coordinates of the i1-th microphone being (xi1, yi1, zi1), i1=1, ..., M. According to the estimation accuracy required of the system, all possible sound source positions in the measurement space can be reduced to the lattice points of a three-dimensional grid. Assume the space is divided into Q grid points in total, with coordinates (xq, yq, zq), q=1, ..., Q. Each microphone corresponds to one channel; the sampling frequency of the signal is fs, the sample length of each channel per frame is L, and the sampled signal of each channel is denoted xi1(n), i1=1, ..., M, n=1, ..., L. The number of Fourier transform points is 2L-1.
As shown in Fig. 1, the specific steps of the signal processing method disclosed by the invention are as follows:
Step 1) according to the microphone position coordinates and the search grid point coordinates, calculate with formula (4) the delay difference from each grid point to each microphone pair, and store it for later use. This step is performed only once;
Step 2) process each frame of data to obtain the estimated sound source position for that frame.
The specific steps of processing each frame of data are as follows:
Step 2-1) calculate separately the 2L-1 point fast Fourier transform (FFT) of each channel signal xi1(n), i1=1, ..., M, n=1, ..., L, obtaining Xi1(k), i1=1, ..., M, k=1, ..., 2L-1;
Step 2-2) calculate with formula (3) the PHAT cross-correlation value Ri1i2(l) of the signals of every pair of microphone channels;
Step 2-3) calculate with formula (6), using the PHAT cross-correlation value Ri1i2(τ), the delay estimate τ̂i1i2 between every pair of channels;
Step 2-4) calculate with formula (5) the standard deviation between Δτi1i2(q) and τ̂i1i2 to obtain the weighting value wq of each grid point;
Step 2-5) calculate with formula (2) the SRP-PHAT value pq of each grid point;
Step 2-6) calculate with formula (1) the weighted SRP-PHAT value wqpq of all grid points, and find the grid point corresponding to the maximum value;
Step 2-7) obtain, from the grid point corresponding to the maximum value of wqpq, the sound source position corresponding to this frame of data.
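For illustration only, the two sketches given earlier could be combined per frame as follows; the array geometry, grid spacing, and sampling rate are arbitrary example values, and frames stands in for one frame of sampled microphone signals:

```python
import numpy as np

# Example geometry: M = 4 microphones on a small layout (coordinates in metres)
mic_pos = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                    [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]])

# Example search grid over the measurement space
xs = np.linspace(-2.0, 2.0, 21)
zs = np.linspace(0.5, 2.5, 11)
grid_pos = np.array([[x, y, z] for x in xs for y in xs for z in zs])

delta_tau = grid_delay_differences(mic_pos, grid_pos)   # step 1), performed once

fs = 16000
frames = np.random.randn(4, 1024)                        # stand-in for one frame of samples
q_hat = locate_frame(frames, delta_tau, fs)              # step 2), performed per frame
x_s, y_s, z_s = grid_pos[q_hat]                          # estimated sound source position
```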
In the weighted SRP-PHAT microphone array sound source localization signal processing method disclosed by the invention, the inverse of the standard deviation between the delay values estimated from the PHAT cross-correlation and the delay values corresponding to each search point is used as a weight on the SRP-PHAT value when computing the response power of the spatial grid points. The guiding principle is that if a grid point is the correct sound source position, its relative delay values with respect to the microphone pairs are closer to the delay values computed by the PHAT cross-correlation method, which in turn makes the response power value of that point larger. Using this method, the accuracy of sound source localization can be further improved.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention and are not limiting. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art should understand that modifications or equivalent replacements of the technical solution of the present invention that do not depart from the spirit and scope of the technical solution of the present invention are all intended to be covered by the scope of the claims of the present invention.