CN105403860B - A method for localizing multiple sparse sound sources based on dominance correlation - Google Patents
- Publication number: CN105403860B (CN201410451825.5A)
- Authority
- CN
- China
- Prior art keywords
- frequency
- sound source
- Prior art date
- Legal status: Active
Landscapes
- Circuit For Audible Band Transducer (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
Abstract
The present invention relates to a method for localizing multiple sparse sound sources based on dominance correlation, comprising: converting the sound-source signals received by a microphone array into digital audio signals; extracting the spectrum of each microphone's digital audio signal; computing, at each frequency bin, a spatial correlation matrix from the spectra of all microphones at that bin over adjacent frames; extracting the principal eigenvector of each spatial correlation matrix; determining the time-delay sets of all microphone pairs at each frequency bin; computing, by an iterative method, the azimuth of the dominant sound-source incident direction at each frequency bin; and statistically analyzing the dominant incident-direction azimuths over all frequency bins to determine the final sound-source incident directions and the number of sources. The method takes acoustic robustness into account and is suitable for real-time localization of multiple sparse sound sources.
Description
Technical field
The present invention relates to the field of sound source localization, and in particular to a method for localizing multiple sparse sound sources based on dominance correlation.
Background art
Sound source localization comprises single-source localization and multi-source localization; localization technology indicates the spatial direction of an acoustic target and provides important spatial information for subsequent information acquisition and processing.
Multi-source localization technology occupies a key position in microphone-array applications; it can be used in teleconferencing to indicate the direction on which the microphone array should focus its beam and to provide direction information for the conference camera.
In the field of sound source localization, algorithms based on source-signal classification and spatial beam scanning are the most widely used, and are well known for their robustness under noise and reverberation. These algorithms, however, share a common trait: their cost functions are neither convex nor concave, so multiple extrema exist that compete as the optimal estimate of the source position. Such methods therefore usually seek the globally optimal source position by traversing a grid of points, which makes the computational load rise steeply and makes it difficult to meet the demands of real-time localization.
Multi-source localization methods based on time-frequency sparsity overcome the above problem; these methods assume that only one source dominates each frequency bin, so that the interference of the other sources can be neglected. This assumption reduces the multi-source localization problem to a single-source localization problem on individual frequency bins and greatly reduces the computation, but the lack of acoustic robustness makes it difficult for such methods to perform well in complicated acoustic environments.
Summary of the invention
The object of the present invention is to overcome the heavy computation and lack of acoustic robustness of existing multi-source localization methods by exploiting the dominance correlation of sparse sound sources in the time-frequency domain, thereby providing a robust and efficient multi-source localization method.
To achieve the above object, the present invention proposes a method for localizing multiple sparse sound sources based on dominance correlation, comprising:
Step 101), converting the sound-source signals received by a microphone array into digital audio signals, the microphone array comprising K microphones;
Step 102), extracting the spectrum of the digital audio signal of each microphone;
Step 103), computing the spatial correlation matrix at each frequency bin from the spectra of the digital audio signals of all microphones at the same frequency bin over frames adjacent to time t;
Step 104), decomposing the spatial correlation matrix at each frequency bin of time t to obtain the principal eigenvector at each frequency bin of time t, each component of the principal eigenvector corresponding to the signal collected by one microphone;
Step 105), using the principal eigenvector at each frequency bin of time t to determine the time-delay sets of the M microphone pairs at each frequency bin of time t, where M = K(K−1)/2;
Step 106), computing iteratively, from the time-delay sets of the M microphone pairs at each frequency bin of time t, the azimuth of the dominant sound-source incident direction at each frequency bin of time t;
Step 107), statistically analyzing the azimuths of the dominant sound-source incident directions over all frequency bins of time t;
Step 108), outputting the finally determined sound-source incident directions and number of sources at time t.
In the above technical solution, in step 105), using the principal eigenvector at each frequency bin of time t to determine the time-delay sets of the M microphone pairs comprises:
At frequency bin f of time t, the principal eigenvector is written [u_{1,t,f}, u_{2,t,f}, …, u_{K,t,f}]; the time-phase difference τ̄_{m,t,f} of the m-th microphone pair (m = 1, 2, …, M), formed by the p-th and q-th microphones, is

    τ̄_{m,t,f} = ∠(u_{p,t,f} · u*_{q,t,f}) / ω_f

where ∠(·) denotes taking the complex phase and ω_f denotes the angular frequency;
At frequency bin f of time t, the constraint imposed by the spacing r_m of the m-th microphone pair gives the time-phase aliasing set L_{m,t,f}:

    L_{m,t,f} = { l ∈ ℤ : |τ̄_{m,t,f} + l·T_f| ≤ r_m / c }

where c is the speed of sound;
At frequency bin f of time t, the time-delay set of the m-th microphone pair is

    B_{m,t,f} = { τ̂_{m,t,f} = τ̄_{m,t,f} + l_{m,t,f}·T_f | l_{m,t,f} ∈ L_{m,t,f} },  m = 1, 2, …, M

where T_f = 2π/ω_f is the period of frequency bin f.
In the above technical solution, step 106) further comprises:
Step 106-1), choosing an initial sound-source incident direction γ̂_{t,f,0};
Step 106-2), choosing one time-delay value from each time-delay set;
Letting p denote the current direction estimate (initially γ̂_{t,f,0}), one time-delay value τ_{m,t,f} is chosen from each time-delay set B_{m,t,f} such that

    τ_{m,t,f} = arg min_{τ ∈ B_{m,t,f}} | p^T g_m r_m − c·τ |,  m = 1, 2, …, M

where g_m = (g_{mx}, g_{my}, g_{mz}) denotes the unit vector along the line connecting the m-th microphone pair;
Step 106-3), computing new weight coefficients w_{m,t,f}:

    w_{m,t,f} = exp(δ²_{m,t,f} / σ²) / Σ_{m=1}^{M} exp(δ²_{m,t,f} / σ²)

where
    δ_{m,t,f} = arccos(p^T g_m) − arccos(c·τ_{m,t,f} / r_m),  m = 1, 2, …, M
    σ² = Σ_{m=1}^{M} δ²_{m,t,f} / M;

Step 106-4), computing the new sound-source incident direction γ̂_{t,f} = (γ̂_{1,t,f}, γ̂_{2,t,f}, γ̂_{3,t,f}):

    (γ̂_{1,t,f}, γ̂_{2,t,f})^T = [ Σ_{m=1}^{M} w_{m,t,f} g′_m g′_m^T ]^{−1} Σ_{m=1}^{M} c·w_{m,t,f}·τ_{m,t,f}·g′_m / r_m

    γ̂_{3,t,f} = √(1 − γ̂²_{1,t,f} − γ̂²_{2,t,f})

where g′_m = (g_{mx}, g_{my});
Step 106-5), judging whether γ̂_{t,f} has converged; if so, going to step 106-6); otherwise returning to step 106-2);
Step 106-6), computing the azimuth of the dominant sound-source incident direction γ̂_{t,f}.
In the above technical solution, in step 106-1), choosing the initial sound-source incident direction γ̂_{t,f,0} comprises:
Uniformly choosing several candidate sound-source incident directions on the 360° (azimuth) × 90° (elevation) spatial grid, recorded as the vector set {ψ_1, ψ_2, …, ψ_H}, H > 8 and an integer. For each candidate sound-source incident direction ψ_h, h = 1, 2, …, H, its distance to all the time-delay sets is computed as

    b_f(ψ_h) = Σ_{m=1}^{M} [ (ψ_h^T g_m r_m − c·τ̄_{m,t,f}) % (c·T_f) ]²

where % denotes the floating-point remainder operation.
At frequency bin f of time t, the initial sound-source incident direction γ̂_{t,f,0} satisfies

    γ̂_{t,f,0} = arg min_{1≤h≤H} b_f(ψ_h).
In the above technical solution, in step 107), the statistical analysis comprises: histogram analysis and cluster analysis.
The advantages of the invention are:
1. Through the dominance similarity between adjacent time-frequency points, the information at adjacent time-frequency points is fully exploited and a reliable estimate of the spatial position is achieved;
2. A paired time-delay extraction method based on the spatial correlation matrix of a time-frequency block is proposed; the method uses signal enhancement and weight coefficients on the time delays to measure reliability, thereby achieving robust localization of multiple sparse sound sources.
Brief description of the drawings
Fig. 1 is a flow chart of the dominance-based method of the present invention for localizing multiple sparse sound sources;
Fig. 2 is a flow chart of the computation, according to the present invention, of the azimuth of the dominant sound-source incident direction at each frequency bin;
Fig. 3 is a flow chart of the histogram analysis performed by the present invention on the dominant incident-direction azimuths over all frequency bins;
Fig. 4 is a schematic diagram of the azimuth histogram analysis of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
With reference to Fig. 1, the method of the invention comprises the following steps:
Step 101), converting the sound-source signals received by the microphone array into digital audio signals; the microphone array comprises K microphones;
Step 102), preprocessing the digital audio signals and extracting the spectrum of the digital audio signal of each microphone by the fast Fourier transform (FFT);
Preprocessing the digital audio signals comprises: first zero-padding each frame of the digital audio signal to N points, N = 2^i, i an integer and i ≥ 8; then windowing or pre-emphasizing each frame of the digital audio signal, the window function being a Hamming window or a Hanning window;
Applying the fast Fourier transform to the digital audio signal at time t gives its discrete spectrum:

    Y_{k,t,f} = Σ_{n=0}^{N−1} y_{k,t,n} e^{−j2πfn/N}

where y_{k,t,n} denotes the n-th sample of the signal collected by the k-th microphone at time t, and Y_{k,t,f} (k = 1, 2, …, K; f = 0, 1, …, N−1) denotes the Fourier coefficient of the f-th frequency bin of the signal collected by the k-th microphone at time t.
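As an editorial illustration of step 102) (not part of the original disclosure; the frame length, window choice and all names below are assumptions), the framing, zero-padding, windowing and FFT can be sketched in Python:

```python
import numpy as np

def frame_spectra(frames, N=256):
    """Zero-pad each frame to N points (N = 2^i, i >= 8), apply a Hanning
    window, and return the N-point FFT coefficients Y[k, f] per microphone.

    frames: (K, L) array, one time-domain frame per microphone, L <= N.
    """
    K, L = frames.shape
    windowed = frames * np.hanning(L)     # windowing before the FFT
    padded = np.zeros((K, N))
    padded[:, :L] = windowed              # zero-pad each frame to N points
    return np.fft.fft(padded, axis=1)     # Y[k, f], f = 0 .. N-1

# Example: K = 4 microphones, 200-sample frames, N = 256 = 2^8
rng = np.random.default_rng(0)
Y = frame_spectra(rng.standard_normal((4, 200)), N=256)
print(Y.shape)  # (4, 256)
```

The result is one (K, N) array of Fourier coefficients Y_{k,t,f} per frame t.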
Step 103), computing the spatial correlation matrix at each frequency bin of time t from the spectra of the digital audio signals of all microphones at the same frequency bin over adjacent frames;
Let x_{t,f} be the complex vector formed at the f-th frequency bin of time t: x_{t,f} = {Y_{1,t,f}, Y_{2,t,f}, …, Y_{K,t,f}}; its autocorrelation matrix is x_{t,f} x_{t,f}^H, where (·)^H denotes the conjugate transpose;
The complex autocorrelation matrix R_{t,f} is expressed as the average of the autocorrelation matrices over the time-frequency points adjacent in time to (t, f):

    R_{t,f} = (1 / (2A + 1)) Σ_{a=−A}^{A} x_{t+a,f} x_{t+a,f}^H

where A denotes the number of frames adjacent to time t;
The spatial correlation matrix of x_{t,f} is R_{t,f}.
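A sketch of step 103), assuming a symmetric window of A frames on each side of t (this symmetric reading of the averaging, and all names, are illustrative):

```python
import numpy as np

def spatial_correlation(X, t, f, A=2):
    """Average the rank-one autocorrelation matrices x x^H over the
    (up to) 2A+1 frames centred on t, at the single frequency bin f.

    X: (T, K, F) complex array of Fourier coefficients Y[k, t, f].
    """
    T = X.shape[0]
    lo, hi = max(0, t - A), min(T, t + A + 1)   # clip at the signal edges
    R = np.zeros((X.shape[1], X.shape[1]), dtype=complex)
    for a in range(lo, hi):
        x = X[a, :, f]                  # x_{a,f} = (Y_1 ... Y_K) at bin f
        R += np.outer(x, x.conj())      # x x^H
    return R / (hi - lo)

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 4, 8)) + 1j * rng.standard_normal((10, 4, 8))
R = spatial_correlation(X, t=5, f=3)
print(np.allclose(R, R.conj().T))  # Hermitian by construction: True
```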
Step 104), decomposing the spatial correlation matrix at each frequency bin of time t to obtain the principal eigenvector at each frequency bin of time t; each component of the vector corresponds to the signal collected by one microphone.
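Step 104) reduces to one eigendecomposition per bin; a minimal sketch (names illustrative; NumPy's `eigh` returns eigenvalues in ascending order for Hermitian matrices):

```python
import numpy as np

def principal_eigenvector(R):
    """Eigendecompose the Hermitian spatial correlation matrix R and
    return the unit eigenvector of its largest eigenvalue; component k
    of the result corresponds to the signal of microphone k."""
    w, V = np.linalg.eigh(R)   # ascending eigenvalues for Hermitian R
    return V[:, -1]            # column for the largest eigenvalue

# For a rank-one matrix s s^H the principal eigenvector recovers s
# up to a complex scalar factor.
s = np.array([1.0 + 1j, 0.5 - 0.5j, -1.0 + 0j])
u = principal_eigenvector(np.outer(s, s.conj()))
print(np.allclose(np.abs(np.vdot(u, s)), np.linalg.norm(s)))  # True
```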
Step 105), using the principal eigenvector at each frequency bin of time t to determine the time-delay sets of the M microphone pairs at each frequency bin of time t, where M = K(K−1)/2; the detailed procedure is:
At frequency bin f of time t, the principal eigenvector is written [u_{1,t,f}, u_{2,t,f}, …, u_{K,t,f}]; the time-phase difference τ̄_{m,t,f} of the m-th microphone pair (m = 1, 2, …, M), formed by the p-th and q-th microphones, is

    τ̄_{m,t,f} = ∠(u_{p,t,f} · u*_{q,t,f}) / ω_f

where ∠(·) denotes taking the complex phase and ω_f denotes the angular frequency;
At frequency bin f of time t, the constraint imposed by the spacing r_m of the m-th microphone pair gives the time-phase aliasing set L_{m,t,f}:

    L_{m,t,f} = { l ∈ ℤ : |τ̄_{m,t,f} + l·T_f| ≤ r_m / c }

where c is the speed of sound;
At frequency bin f of time t, the time-delay set of the m-th microphone pair is

    B_{m,t,f} = { τ̂_{m,t,f} = τ̄_{m,t,f} + l_{m,t,f}·T_f | l_{m,t,f} ∈ L_{m,t,f} },  m = 1, 2, …, M

where T_f = 2π/ω_f is the period of frequency bin f.
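Step 105) for a single microphone pair and bin can be sketched as follows (a hypothetical helper; the enumeration range for the aliasing integers is an implementation detail not fixed by the text):

```python
import numpy as np

def delay_set(u_p, u_q, omega_f, r_m, c=343.0):
    """Build the time-delay set B of one microphone pair at one bin:
    base delay from the phase of u_p * conj(u_q), plus every multiple
    of the bin period T_f that keeps |tau| <= r_m / c."""
    tau_bar = np.angle(u_p * np.conj(u_q)) / omega_f  # time-phase difference
    T_f = 2.0 * np.pi / omega_f                       # period of bin f
    l_max = int(np.floor((r_m / c + abs(tau_bar)) / T_f)) + 1
    return [tau_bar + l * T_f for l in range(-l_max, l_max + 1)
            if abs(tau_bar + l * T_f) <= r_m / c]

# 1 kHz bin with 0.5 m spacing: several aliased candidates survive
B = delay_set(1.0 + 0.2j, 1.0 - 0.3j, omega_f=2 * np.pi * 1000.0, r_m=0.5)
print(all(abs(tau) <= 0.5 / 343.0 for tau in B))  # True
```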
Step 106), computing iteratively, from the time-delay sets of the M microphone pairs at each frequency bin of time t, the azimuth of the dominant sound-source incident direction at each frequency bin of time t;
With reference to Fig. 2, the concrete implementation steps are as follows:
Step 106-1), choosing an initial sound-source incident direction γ̂_{t,f,0};
Several candidate sound-source incident directions are chosen uniformly on the 360° (azimuth) × 90° (elevation) spatial grid and recorded as the vector set {ψ_1, ψ_2, …, ψ_H}, H > 8 and an integer. For each candidate sound-source incident direction ψ_h, h = 1, 2, …, H, its distance to all the time-delay sets is computed as

    b_f(ψ_h) = Σ_{m=1}^{M} [ (ψ_h^T g_m r_m − c·τ̄_{m,t,f}) % (c·T_f) ]²

where g_m = (g_{mx}, g_{my}, g_{mz}) denotes the unit vector along the line connecting the m-th microphone pair and % denotes the floating-point remainder operation.
At frequency bin f of time t, the initial sound-source incident direction γ̂_{t,f,0} satisfies

    γ̂_{t,f,0} = arg min_{1≤h≤H} b_f(ψ_h).
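Step 106-1), the grid search for the initial direction, can be sketched as below (vectorized over candidates; the toy grid and pair geometry are illustrative assumptions):

```python
import numpy as np

def initial_direction(candidates, g, r, tau_bar, omega_f, c=343.0):
    """Evaluate b_f(psi) for every candidate unit direction psi on the
    azimuth-elevation grid and return the argmin.

    candidates: (H, 3) unit vectors, g: (M, 3) pair-axis unit vectors,
    r: (M,) pair spacings, tau_bar: (M,) measured pair delays."""
    T_f = 2.0 * np.pi / omega_f
    # wrapped residual per pair, squared and summed over the M pairs
    resid = (candidates @ g.T) * r - c * tau_bar       # (H, M)
    b = np.sum(np.fmod(resid, c * T_f) ** 2, axis=1)   # b_f(psi_h)
    return candidates[np.argmin(b)]

# Toy check: delays generated from a true direction score near zero,
# so that direction is recovered from the grid.
g = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
r = np.array([0.1, 0.1])
src = np.array([1.0, 0.0, 0.0])
tau = (src @ g.T) * r / 343.0
grid = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0]])
print(initial_direction(grid, g, r, tau, 2 * np.pi * 500.0))
```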
Step 106-2), choosing one time-delay value from each time-delay set;
Let p = γ̂_{t,f,0} (and, in later iterations, the current direction estimate); one time-delay value τ_{m,t,f} is chosen from each time-delay set B_{m,t,f} such that

    τ_{m,t,f} = arg min_{τ ∈ B_{m,t,f}} | p^T g_m r_m − c·τ |,  m = 1, 2, …, M;

Step 106-3), computing new weight coefficients w_{m,t,f}:

    w_{m,t,f} = exp(δ²_{m,t,f} / σ²) / Σ_{m=1}^{M} exp(δ²_{m,t,f} / σ²)

where
    δ_{m,t,f} = arccos(p^T g_m) − arccos(c·τ_{m,t,f} / r_m),  m = 1, 2, …, M
    σ² = Σ_{m=1}^{M} δ²_{m,t,f} / M;

Step 106-4), computing the new sound-source incident direction γ̂_{t,f} = (γ̂_{1,t,f}, γ̂_{2,t,f}, γ̂_{3,t,f}):

    (γ̂_{1,t,f}, γ̂_{2,t,f})^T = [ Σ_{m=1}^{M} w_{m,t,f} g′_m g′_m^T ]^{−1} Σ_{m=1}^{M} c·w_{m,t,f}·τ_{m,t,f}·g′_m / r_m

    γ̂_{3,t,f} = √(1 − γ̂²_{1,t,f} − γ̂²_{2,t,f})

where g′_m = (g_{mx}, g_{my});
Step 106-5), judging whether γ̂_{t,f} has converged; if so, going to step 106-6); otherwise letting p = γ̂_{t,f} and returning to step 106-2);
In the present embodiment, γ̂_{t,f} is judged to have converged when ‖γ̂_{t,f} − p‖ is smaller than a threshold ε, where ε = 0.01 is taken.
Step 106-6), computing the azimuth of the dominant sound-source incident direction γ̂_{t,f}; the azimuth is obtained from the horizontal components of γ̂_{t,f}, e.g. θ_{t,f} = arctan2(γ̂_{2,t,f}, γ̂_{1,t,f}).
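Steps 106-2) to 106-5) combine into one iterative loop; the sketch below follows those steps literally, including the exp(δ²/σ²) weighting as stated in the text (a small constant guards σ² against zero; the geometry and starting point in the toy check are illustrative):

```python
import numpy as np

def iterate_direction(p0, B, g, r, c=343.0, eps=0.01, max_iter=50):
    """One bin of step 106): pick the nearest aliased delay per pair,
    weight each pair by its angular residual, solve the 2x2 weighted
    least-squares for (gamma_1, gamma_2), complete gamma_3, repeat.

    B: list of M delay-candidate lists, g: (M, 3) pair axes, r: (M,)."""
    p = np.asarray(p0, dtype=float)
    M = len(B)
    for _ in range(max_iter):
        # 106-2: delay minimising |p^T g_m r_m - c tau| in each set
        tau = np.array([min(Bm, key=lambda tt: abs(p @ g[m] * r[m] - c * tt))
                        for m, Bm in enumerate(B)])
        # 106-3: angular residuals and (as written in the text) weights
        delta = np.arccos(np.clip(p @ g.T, -1, 1)) \
                - np.arccos(np.clip(c * tau / r, -1, 1))
        sigma2 = np.sum(delta ** 2) / M + 1e-12
        w = np.exp(delta ** 2 / sigma2)
        w /= w.sum()
        # 106-4: weighted LS on the horizontal components g' = (g_x, g_y)
        gp = g[:, :2]
        Amat = sum(w[m] * np.outer(gp[m], gp[m]) for m in range(M))
        bvec = sum(c * w[m] * tau[m] * gp[m] / r[m] for m in range(M))
        g12 = np.linalg.solve(Amat, bvec)
        g3 = np.sqrt(max(0.0, 1.0 - g12 @ g12))
        p_new = np.array([g12[0], g12[1], g3])
        if np.linalg.norm(p_new - p) < eps:   # 106-5: convergence test
            return p_new
        p = p_new
    return p

# Toy check: orthogonal horizontal pairs with exact, unaliased delays
g = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
r = np.array([0.1, 0.1])
src = np.array([0.6, 0.8, 0.0])
B = [[float(src @ g[m] * r[m] / 343.0)] for m in range(2)]
p = iterate_direction([1.0, 0.0, 0.0], B, g, r)
print(np.round(p, 3))
```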
Step 107), statistically analyzing the azimuths of the dominant sound-source incident directions over all frequency bins of time t;
The statistical analysis comprises histogram analysis and cluster analysis.
In the present embodiment, histogram analysis is performed on the azimuths of the dominant sound-source incident directions over all frequency bins of time t; with reference to Fig. 3, the concrete steps are as follows:
Step 107-1), building and smoothing the azimuth histogram;
The 360-degree azimuth range is divided into 360 equal parts, a histogram is built over these parts, and the histogram is smoothed with a window function.
Step 107-2), computing the histogram threshold:

    histogram threshold = mean histogram value + (maximum histogram value − mean histogram value) × empirical coefficient

where the empirical coefficient ranges between 0 and 1; in the present embodiment the empirical coefficient is 0.3.
Step 107-3), determining the final azimuths and the number of azimuths;
With reference to Fig. 4, the histogram peaks that exceed the histogram threshold are chosen as the final azimuths; the number of such peaks is the number of azimuths.
Step 107-4), determining the final sound-source incident directions and the number of sources;
The sound-source incident direction corresponding to each chosen azimuth is obtained from that azimuth; the number of azimuths is the number of sound-source incident directions.
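Steps 107-1) to 107-3) can be sketched as a histogram-and-threshold routine (the 5-bin Hanning smoother and circular padding are illustrative choices; the empirical coefficient 0.3 is the one named above):

```python
import numpy as np

def histogram_directions(azimuths_deg, coeff=0.3, win=5):
    """Build a 360-bin azimuth histogram, smooth it with a window,
    threshold it, and return the peak azimuths (in degrees).

    azimuths_deg: per-bin dominant azimuths; coeff: empirical
    coefficient in (0, 1); win: smoothing window length (odd)."""
    hist, _ = np.histogram(np.mod(azimuths_deg, 360), bins=360,
                           range=(0, 360))
    kernel = np.hanning(win + 2)[1:-1]
    kernel /= kernel.sum()
    # circular smoothing so peaks near 0/360 degrees are not split
    smoothed = np.convolve(np.r_[hist[-win:], hist, hist[:win]],
                           kernel, mode='same')[win:-win]
    thr = smoothed.mean() + (smoothed.max() - smoothed.mean()) * coeff
    return [b for b in range(360)
            if smoothed[b] > thr
            and smoothed[b] >= smoothed[b - 1]
            and smoothed[b] >= smoothed[(b + 1) % 360]]

# Two clusters of per-bin azimuth estimates, around 40 and 220 degrees
rng = np.random.default_rng(2)
est = np.r_[40 + rng.normal(0, 2, 200), 220 + rng.normal(0, 2, 200)]
peaks = histogram_directions(est)
print(sorted(peaks))
```

The number of returned peaks is the estimated number of sources, matching step 107-4).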
Step 108), outputting the finally determined sound-source incident directions and number of sources at time t.
In other embodiments, cluster analysis may instead be performed on the azimuths of the dominant sound-source incident directions over all frequency bins of time t; the specific processing belongs to common knowledge and is not described here.
Claims (5)
1. A method for localizing multiple sparse sound sources based on dominance correlation, comprising:
Step 101), converting the sound-source signals received by a microphone array into digital audio signals, the microphone array comprising K microphones;
Step 102), extracting the spectrum of the digital audio signal of each microphone;
Step 103), computing the spatial correlation matrix at each frequency bin from the spectra of the digital audio signals of all microphones at the same frequency bin over frames adjacent to time t;
Step 104), decomposing the spatial correlation matrix at each frequency bin of time t to obtain the principal eigenvector at each frequency bin of time t, each component of the principal eigenvector corresponding to the signal collected by one microphone;
Step 105), using the principal eigenvector at each frequency bin of time t to determine the time-delay sets of the M microphone pairs at each frequency bin of time t, where M = K(K−1)/2;
Step 106), computing iteratively, from the time-delay sets of the M microphone pairs at each frequency bin of time t, the azimuth of the dominant sound-source incident direction at each frequency bin of time t;
Step 107), statistically analyzing the azimuths of the dominant sound-source incident directions over all frequency bins of time t;
Step 108), outputting the finally determined sound-source incident directions and number of sources at time t.
2. The method for localizing multiple sparse sound sources based on dominance correlation according to claim 1, wherein in step 105), using the principal eigenvector at each frequency bin of time t to determine the time-delay sets of the M microphone pairs comprises:
At frequency bin f of time t, the principal eigenvector is written [u_{1,t,f}, u_{2,t,f}, …, u_{K,t,f}]; the time-phase difference τ̄_{m,t,f} of the m-th microphone pair (m = 1, 2, …, M), formed by the p-th and q-th microphones, is

    τ̄_{m,t,f} = ∠(u_{p,t,f} · u*_{q,t,f}) / ω_f

where ∠(·) denotes taking the complex phase and ω_f denotes the angular frequency;
At frequency bin f of time t, the constraint imposed by the spacing r_m of the m-th microphone pair gives the time-phase aliasing set L_{m,t,f}:

    L_{m,t,f} = { l ∈ ℤ : |τ̄_{m,t,f} + l·T_f| ≤ r_m / c }

where c is the speed of sound;
At frequency bin f of time t, the time-delay set of the m-th microphone pair is:
    B_{m,t,f} = { τ̂_{m,t,f} = τ̄_{m,t,f} + l_{m,t,f}·T_f | l_{m,t,f} ∈ L_{m,t,f} },  m = 1, 2, …, M
where T_f denotes the period of frequency bin f.
3. The method for localizing multiple sparse sound sources based on dominance correlation according to claim 2, wherein step 106) further comprises:
Step 106-1), choosing an initial sound-source incident direction γ̂_{t,f,0}; letting p = γ̂_{t,f,0};
Step 106-2), choosing one time-delay value from each time-delay set;
One time-delay value τ_{m,t,f} is chosen from each time-delay set B_{m,t,f} such that:
    τ_{m,t,f} = arg min_{τ ∈ B_{m,t,f}} | p^T g_m r_m − c·τ |,  m = 1, 2, …, M;
where g_m = (g_{mx}, g_{my}, g_{mz}) denotes the unit vector along the line connecting the m-th microphone pair;
Step 106-3), computing new weight coefficients w_{m,t,f}:
    w_{m,t,f} = exp(δ²_{m,t,f} / σ²) / Σ_{m=1}^{M} exp(δ²_{m,t,f} / σ²),  m = 1, 2, …, M
Wherein:
δ_{m,t,f} = arccos(p^T g_m) − arccos(c·τ_{m,t,f} / r_m),  m = 1, 2, …, M
    σ² = Σ_{m=1}^{M} δ²_{m,t,f} / M
Step 106-4), computing the new sound-source incident direction γ̂_{t,f} = (γ̂_{1,t,f}, γ̂_{2,t,f}, γ̂_{3,t,f}):
    (γ̂_{1,t,f}, γ̂_{2,t,f})^T = [ Σ_{m=1}^{M} w_{m,t,f} g′_m g′_m^T ]^{−1} Σ_{m=1}^{M} c·w_{m,t,f}·τ_{m,t,f}·g′_m / r_m

    γ̂_{3,t,f} = √(1 − γ̂²_{1,t,f} − γ̂²_{2,t,f})
where g′_m = (g_{mx}, g_{my});
Step 106-5), judging whether γ̂_{t,f} has converged; if so, going to step 106-6); otherwise letting p = γ̂_{t,f} and returning to step 106-2);
Step 106-6), computing the azimuth of the dominant sound-source incident direction γ̂_{t,f}.
4. The method for localizing multiple sparse sound sources based on dominance correlation according to claim 3, wherein in step 106-1), choosing the initial sound-source incident direction γ̂_{t,f,0} comprises:
uniformly choosing several candidate sound-source incident directions on the 360° (azimuth) × 90° (elevation) spatial grid, recorded as the vector set {ψ_1, ψ_2, …, ψ_H}, H > 8 and an integer; for each candidate sound-source incident direction ψ_h, h = 1, 2, …, H, computing its distance to all the time-delay sets:
    b_f(ψ_h) = Σ_{m=1}^{M} [ (ψ_h^T g_m r_m − c·τ̄_{m,t,f}) % (c·T_f) ]²
where % denotes the floating-point remainder operation;
At frequency bin f of time t, the initial sound-source incident direction γ̂_{t,f,0} satisfies:
    γ̂_{t,f,0} = arg min_{1≤h≤H} b_f(ψ_h).
5. The method for localizing multiple sparse sound sources based on dominance correlation according to claim 1, wherein in step 107), the statistical analysis comprises: histogram analysis and cluster analysis.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410451825.5A CN105403860B (en) | 2014-08-19 | 2014-08-19 | A method for localizing multiple sparse sound sources based on dominance correlation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105403860A CN105403860A (en) | 2016-03-16 |
CN105403860B true CN105403860B (en) | 2017-10-31 |
Family
ID=55469461
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102103200A (en) * | 2010-11-29 | 2011-06-22 | 清华大学 | Acoustic source spatial positioning method for distributed asynchronous acoustic sensor |
Non-Patent Citations (2)
Title |
---|
Mengqi Ren et al.; "A Novel Multiple Sparse Source Localization Using Triangular Pyramid Microphone Array"; IEEE Signal Processing Letters; vol. 19, no. 2; Feb. 2012 * |
Qiu Tianshuang et al.; "Several basic time-delay estimation methods and their mutual relations"; Journal of Dalian University of Technology; Jul. 1996 * |
Legal Events
Code | Title |
---|---|
C06, PB01 | Publication |
C10, SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |