CN106526541A - Sound positioning method based on distribution matrix decision - Google Patents

Sound positioning method based on distribution matrix decision

Info

Publication number
CN106526541A
CN106526541A (application CN201610893331.1A)
Authority
CN
China
Prior art keywords
positioning
sound
signal
distribution matrix
frame
Prior art date
Legal status
Granted
Application number
CN201610893331.1A
Other languages
Chinese (zh)
Other versions
CN106526541B (en)
Inventor
王建中
叶凯
曹九稳
薛安克
王天磊
Current Assignee
Hangzhou Dianzi University
Hangzhou Electronic Science and Technology University
Original Assignee
Hangzhou Electronic Science and Technology University
Priority date
Filing date
Publication date
Application filed by Hangzhou Electronic Science and Technology University
Priority: CN201610893331.1A
Publication of CN106526541A
Application granted
Publication of CN106526541B
Status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a sound localization method based on a distribution-matrix decision. The method comprises the following steps: 1) preprocessing, including framing, the multi-channel sound signals collected by an acoustic array; 2) running a sound recognition algorithm on the single-channel data, each frame yielding one recognition result; 3) performing wideband sound localization on the multi-channel data, each frame yielding one localization result; 4) constructing a distribution matrix whose rows and columns are indexed by the recognition and localization result sets obtained in the preceding steps; 5) once the distribution matrix is obtained, finding the localization distribution peak of the target sound source; and 6) selecting the peak and its two adjacent angle intervals and computing the statistical mean over these three intervals. The method improves the accuracy of sound localization, an effect that is especially pronounced under strong interference and complex background environments; the localization result depends only weakly on the recognition results; and the method is broadly applicable.

Description

Sound localization method based on distribution matrix decision-making
Technical field
The invention belongs to the field of signal processing, and more particularly relates to a sound localization method based on distribution-matrix decision-making.
Background technology
Traditional sound localization algorithms suffer from the following problems:
1. Poor anti-jamming capability. Indoors, in quiet and noise-free conditions, localization accuracy is high; but outdoors, in complex environments, any noise or interference severely degrades the localization result.
2. In sound signal processing, recognition and localization algorithms are closely related and complement each other, yet conventional localization algorithms do not exploit this, lacking the advantages of information fusion.
The content of the invention
To address the problems above, the invention provides a sound localization method based on distribution-matrix decision-making, illustrated below using a cross-shaped acoustic array as an example.
To achieve these goals, the technical solution adopted by the present invention comprises the following steps:
Step 1, preprocess the four-channel acoustic signals collected by the acoustic array; preprocessing includes framing.
Step 2, perform sound recognition on the single-channel data.
Step 3, perform wideband sound localization on the multi-channel data.
Step 4, build the distribution matrix from the recognition and localization result sets obtained in steps 2 and 3.
Step 5, after obtaining the distribution matrix, find the localization distribution peak of the target sound source.
Step 6, select the peak and its two adjacent angle intervals, and compute the statistical mean over these three intervals as the final localization result.
Step 1 in detail: the live sound signal is acquired with a cross acoustic array; denote the sampling frequency by f_s. The four-channel acoustic signal is split into frames; assume the number of frames after framing is m. Each frame is then processed as follows.
Step 2 in detail: each single-channel frame after framing is passed to the recognizer.
The algorithm recognizing the single-channel signal is the LPCC+SVM algorithm.
Each frame yields one recognition result, forming a recognition result array C of length m:
C = [c(1) c(2) ... c(m)]
Step 3 in detail: each four-channel frame after framing is passed to the wideband localization algorithm.
The wideband localization algorithm applied to the four-channel signal is the wideband MUSIC algorithm.
3-1. Select the frequency band and the center frequency f_0 as needed; both must be chosen according to the frequency characteristics of the actual target signal.
3-2. Apply an FFT to each four-channel frame; the model X(f_j) of each transformed frame is expressed as
X(f_j) = A_θ(f_j)S(f_j) + N(f_j), j = 1, 2, ..., J    (Formula 1)
where A_θ(f_j) is the steering vector, and S(f_j) and N(f_j) are the source signal and the noise after the FFT, respectively.
After the transform, the selected band is decomposed into a combination of narrowband signals at the frequencies f_j.
3-3. Using the focusing matrix T, focus each narrowband at frequency f_j onto the narrowband at the center frequency f_0:
T(f_j)A(f_j)S(f_j) = A(f_0)S(f_0)    (Formula 2)
The autocorrelation matrix at the center frequency f_0, used for localization, is then obtained by Formula 3:
R(f_0) = (1/J) Σ_{j=1}^{J} T(f_j) R(f_j) T^H(f_j)    (Formula 3)
3-4. Localize the narrowband at the center frequency f_0 to obtain the localization result of this frame. Each frame yields one localization result, forming a localization result array A of length m:
A = [a(1) a(2) ... a(m)]
Step 4 in detail: from the recognition result array C obtained in step 2 and the localization result array A obtained in step 3, construct the distribution matrix M.
With the values of recognition result array C as one axis and the angular value range of localization result array A as the other, traverse the result of every frame to build the matrix M, where M(C_i, A_j) denotes the number of frames whose recognition result is C_i and whose localization result is A_j.
Step 5 in detail: after obtaining the distribution matrix, find the localization distribution peak A_top of the target sound source along recognition result C_i.
Step 6 in detail: on the localization distribution of recognition result C_i, select the peak A_top and its two adjacent values A_top-1 and A_top+1, and compute the statistical mean over the matrix cells of these three values:
FDOA_{C_i} = P · (Σ_{l=top-1}^{top+1} A_l · M(C_i, A_l)) / (Σ_{l=top-1}^{top+1} M(C_i, A_l))
where P is the angular-interval resolution of the matrix's angle axis. For example, dividing the 360-degree circle into 36 angular intervals gives a resolution of P = 10.
The present invention has the following beneficial effects:
The collected acoustic signal is simultaneously recognized and localized, a distribution matrix is built from the results, and the final result is obtained by the decision algorithm. The invention makes full use of all the recognition and localization information in a sound clip: restricted to the frames whose recognition result is the target sound, the localization results of all frames are aggregated into a distribution from which the final localization result is obtained. The advantage is that the impact of interference and noise in the acoustic signal is rejected as far as possible, the dependence on the recognition algorithm is low, and the method is broadly applicable.
Description of the drawings
Fig. 1 is the overall algorithm flowchart of the present invention.
Fig. 2 is the flowchart of the localization part.
Fig. 3 is a schematic diagram of the distribution matrix.
Fig. 4 is the structure of the 4-channel cross acoustic array in a rectangular coordinate system.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments; the description is only illustrative and does not restrict the present invention in any form.
Fig. 4 shows the structure of the 4-channel cross acoustic array in a rectangular coordinate system, where d is the spacing between two adjacent microphones, r is the radius of the cross array, s(t) is the sound source, whose direction is θ, and A, B, C, D in the figure correspond to channels 1, 2, 3 and 4, respectively. The signals of the 4 channels are then collected and denoted x_1(t), x_2(t), x_3(t), x_4(t).
The steering vector of the signals collected by the cross array can be expressed as
a(θ, f) = [e^{-jωτ_1(θ)} e^{-jωτ_2(θ)} e^{-jωτ_3(θ)} e^{-jωτ_4(θ)}]^T
where ω = 2πf, f is the signal frequency, and τ_p(θ) (p = 1, 2, 3, 4) is the time delay between the signals. The steering vector is used in the localization algorithm below.
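As a concrete illustration of this steering vector, the following minimal Python sketch computes a(θ, f) for the cross array, assuming a far-field source in the array plane and microphones at 0°, 90°, 180° and 270° on a circle of radius r; the default values r = 0.1 m and c = 340 m/s are illustrative assumptions, not taken from the text.

```python
import numpy as np

def steering_vector(theta, f, r=0.1, c=340.0):
    """a(theta, f) = [exp(-j*w*tau_p(theta))] for the 4 cross-array channels; theta in radians."""
    omega = 2 * np.pi * f
    mic_angles = np.deg2rad([0.0, 90.0, 180.0, 270.0])  # channels 1..4 (A, B, C, D)
    # Plane-wave delay of each microphone relative to the array center.
    tau = -(r / c) * np.cos(theta - mic_angles)
    return np.exp(-1j * omega * tau)  # shape (4,)
```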
Fig. 1 shows the overall flowchart of the algorithm. Following the steps in Fig. 1, after the four channel signals have been received by the four-channel acoustic array, they are preprocessed. The main preprocessing operation is framing: each of the four channel signals is framed, with a frame length of 1024 samples and a step of half the frame length (a minimal framing sketch follows). Assume the signal is divided into m frames of 1024 samples each; the algorithm then processes each of these frames.
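A minimal framing sketch under the stated parameters (1024-sample frames, 512-sample step); the helper name frame_signal is an assumption for illustration.

```python
import numpy as np

def frame_signal(x, frame_len=1024, step=512):
    """Split a 1-D signal into overlapping frames; returns shape (m, frame_len)."""
    m = (len(x) - frame_len) // step + 1
    return np.stack([x[i * step:i * step + frame_len] for i in range(m)])

# Applied per channel: frames_ch1 = frame_signal(x1), ..., frames_ch4 = frame_signal(x4)
```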
First, each single-channel frame is recognized.
Any sound recognition algorithm can be used; here we illustrate with LPCC feature extraction and SVM classification. We use 16th-order LPCC coefficients, and for the SVM kernel we choose the radial basis function (Radial Basis Function, RBF). Assume the sound types to be recognized are the five classes C1, C2, C3, C4 and C5.
The 12th-order linear prediction coefficients (Linear Prediction Coefficients, LPC) of each frame are computed; the LPC values can be solved with the Levinson-Durbin algorithm. The 16th-order LPCC values are then obtained from the correspondence between LPCC and LPC values.
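The LPC-to-LPCC step can be sketched as follows; librosa.lpc is used here as a stand-in LPC solver (it uses Burg's method rather than the Levinson-Durbin recursion named in the text), and the cepstral recursion is the standard one, so treat this as an assumed rendering rather than the patent's exact implementation.

```python
import numpy as np
import librosa

def lpcc(frame, p=12, q=16):
    """16 cepstral coefficients from a 12th-order LP model of one frame."""
    # librosa.lpc returns [1, a'_1, ..., a'_p]; negate to get predictor coefficients a_k.
    a = -librosa.lpc(frame.astype(float), order=p)[1:]
    c = np.zeros(q)
    for n in range(1, q + 1):
        # Standard LPC -> LPCC recursion: c_n = a_n + sum_k (k/n) c_k a_{n-k}.
        acc = sum((k / n) * c[k - 1] * a[n - k - 1] for k in range(max(1, n - p), n))
        c[n - 1] = (a[n - 1] if n <= p else 0.0) + acc
    return c
```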
The sound fingerprint base is built as follows:
The 16 LPCC values extracted from each frame are arranged by rows, and a label column is prepended: label '0' denotes C1, '1' denotes C2, '2' denotes C3, '3' denotes C4 and '4' denotes C5. Each row thus forms a 17-dimensional feature vector.
The SVM algorithm is implemented with the existing libsvm library, choosing the RBF as the classifier kernel. The RBF has two parameters, the penalty factor c and the parameter gamma, whose optimal values can be selected with libsvm's grid-search function opti_svm_coeff.
The training process uses the svmtrain function of the libsvm library, with four inputs: the feature vectors, namely the labelled LPCC values extracted above; the kernel type, here the RBF kernel; and the RBF kernel parameters c and gamma, determined by grid search. Calling svmtrain returns a variable named model that stores the trained model information; saving this variable yields the sound fingerprint base.
Sound recognition is realized with libsvm's svmtest: the LPCC values of each frame are classified with the svmtest function, which has three inputs. The first is the label, used for testing the recognition rate (when sounds of unknown type are recognized, this input has no practical meaning); the second is the feature vector, i.e. the variable storing the LPCC values; the third is the matching model, namely the return value of the svmtrain training step above. The return value of svmtest is the classification result, i.e. the label, from which the type of equipment producing the sound is determined.
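A hedged sketch of the training and recognition steps: the patent uses libsvm's svmtrain/svmtest interface, while this sketch substitutes scikit-learn's SVC (itself built on libsvm) with a grid search over c and gamma; the grid ranges and function names are illustrative assumptions, and labels 0-4 stand for the five classes C1-C5.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def train_fingerprint_base(features, labels):
    """features: (n_frames, 16) LPCC matrix; labels: (n_frames,) values in {0..4}."""
    grid = GridSearchCV(SVC(kernel="rbf"),                  # RBF kernel, as in the text
                        {"C": 2.0 ** np.arange(-5, 6),      # penalty factor c
                         "gamma": 2.0 ** np.arange(-8, 4)}) # RBF parameter gamma
    grid.fit(features, labels)
    return grid.best_estimator_  # the trained model, i.e. the sound fingerprint base

def recognize(model, lpcc_frame):
    return int(model.predict(lpcc_frame.reshape(1, -1))[0])  # class label 0..4
```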
In actual applications, features are extracted from the signal and compared against the established sound fingerprint base to perform recognition.
After this stage we obtain m recognition results, forming the array C:
C = [c(1) c(2) ... c(m)]
Next, the localization algorithm is applied to the four-channel signal of each frame.
Fig. 2 shows the detailed flowchart of the localization part, including the FFT of the sub-frames, the pre-estimated angle of each narrowband, and the wideband localization algorithm; here we illustrate with the MUSIC algorithm.
To compute the signal autocorrelation matrix, the four-channel frame is framed a second time, with a sub-frame length of 256 and a step of half the sub-frame length. An FFT is then applied to each sub-frame:
X(k) = Σ_{n=0}^{L-1} x(n) e^{-j2πnk/L}, k = 0, 1, ..., L-1
where L is the sub-frame length, i.e. 256.
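A sketch of this secondary framing and per-sub-frame FFT under the stated parameters (L = 256, 128-sample step); the array shapes and function name are illustrative assumptions.

```python
import numpy as np

def subframe_fft(frame_4ch, L=256, step=128):
    """frame_4ch: (4, 1024) one frame of 4-channel data -> (N, 4, L) spectra."""
    N = (frame_4ch.shape[1] - L) // step + 1  # number of sub-frames
    subs = np.stack([frame_4ch[:, i * step:i * step + L] for i in range(N)])
    return np.fft.fft(subs, n=L, axis=-1)     # one L-point spectrum per sub-frame and channel
```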
After the FFT, the data of each sub-frame can be expressed in the frequency domain bin by bin, where N is the number of sub-frames after the secondary framing.
The resulting frequency-domain signal model can be expressed as
X(f_j) = A_θ(f_j)S(f_j) + N(f_j), j = 1, 2, ..., J
where the bin frequencies f_j are determined by f_s, the sampling frequency of the signal. Since real signals are mostly wideband, a suitable wideband frequency range and a center frequency point f_0 must be chosen.
A wideband signal can be regarded as a combination of multiple narrowband signals. With the focusing matrices T_j, each narrowband can be focused onto the center frequency:
T(f_j)A(f_j)S(f_j) = A(f_0)S(f_0)
where A(f) is the steering vector used in the localization algorithm.
When computing the focusing matrices, we first run a narrowband MUSIC localization on each narrowband as a pre-estimate. The steps are as follows:
First compute the signal autocorrelation matrix R_f of each narrowband frequency and eigendecompose it:
R_f = U_S Λ_S U_S^H + U_N Λ_N U_N^H
where U_S is the subspace spanned by the eigenvectors of the large eigenvalues, namely the signal subspace, and U_N is the subspace spanned by the eigenvectors of the small eigenvalues, namely the noise subspace. The spatial spectrum estimator of the MUSIC algorithm is
P(θ) = 1 / (a^H(θ) U_N U_N^H a(θ))
where Θ denotes the observation sector.
Scanning θ over the observation sector Θ and evaluating the spectrum at each scan position, the bearing at which the function peaks is recorded as β_j, the direction estimate.
Running the MUSIC pre-estimation on every narrowband yields β = [β_1 β_2 ... β_J].
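A minimal narrowband MUSIC sketch implementing the two formulas above for one frequency bin: eigendecompose R_f, take the noise subspace U_N, and scan P(θ) over the candidate angles. It reuses the steering_vector sketch from earlier and assumes a single source; the function name is illustrative.

```python
import numpy as np

def music_doa(Rf, f, thetas, n_sources=1):
    """Peak of the MUSIC spectrum over candidate angles (radians)."""
    w, U = np.linalg.eigh(Rf)                  # eigenvalues in ascending order
    Un = U[:, : Rf.shape[0] - n_sources]       # noise subspace U_N (small eigenvalues)
    spectrum = [1.0 / np.abs(steering_vector(t, f).conj() @ Un @ Un.conj().T
                             @ steering_vector(t, f)) for t in thetas]
    return thetas[int(np.argmax(spectrum))]    # beta_j: bearing at the spectral peak
```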
Next, the focusing matrix is constructed from the pre-estimate:
T(f_j) = V(f_j)U(f_j)^H
where U(f_j) and V(f_j) are the left and right singular vectors of A(f_j, β)A^H(f_0, β), respectively. Applying the series of focusing matrices T(f_j) to the array data focuses it onto the single frequency point, giving the data autocorrelation matrix
R(f_0) = (1/J) Σ_{j=1}^{J} T(f_j) R(f_j) T^H(f_j)
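A sketch of this focusing construction: T(f_j) = V U^H from the SVD of A(f_j, β)A^H(f_0, β), followed by the focused covariance accumulation; the averaging over the J bins follows the reconstructed Formula 3 and is an assumption, and steering_vector is the earlier sketch.

```python
import numpy as np

def focusing_matrix(fj, f0, betas):
    """T(f_j) = V U^H from the SVD of A(f_j, beta) A^H(f_0, beta)."""
    A_j = np.column_stack([steering_vector(b, fj) for b in betas])
    A_0 = np.column_stack([steering_vector(b, f0) for b in betas])
    U, _, Vh = np.linalg.svd(A_j @ A_0.conj().T)
    return Vh.conj().T @ U.conj().T

def focused_covariance(R_list, freqs, f0, betas):
    """Average of T(f_j) R(f_j) T^H(f_j) over all narrowband bins."""
    R0 = np.zeros_like(R_list[0])
    for Rj, fj in zip(R_list, freqs):
        T = focusing_matrix(fj, f0, betas)
        R0 += T @ Rj @ T.conj().T
    return R0 / len(R_list)
```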
Likewise, once this autocorrelation matrix has been obtained, we run the MUSIC algorithm once more on the center-frequency narrowband to obtain the final localization result.
After this stage we obtain m localization results, forming the array A:
A = [a(1) a(2) a(3) a(4) ... a(m)]
As shown in Fig. 1, once the localization and recognition results are obtained we can build the distribution matrix M. Fig. 3 shows a schematic diagram of the distribution matrix: the horizontal axis is the range of possible localization results A, and the vertical axis represents the range of possible recognition results C. M(C_i, A_j) is the total number of frames in this data segment whose recognition result is C_i and whose localization result is A_j.
Once the distribution matrix statistics are available, the localization result of the target is obtained from the localization distribution of the target's recognition result.
The present invention selects the row whose recognition result is the target sound source, giving the target's localization distribution. It finds the peak A_top, takes the peak and its two adjacent values A_top-1 and A_top+1, and computes the statistical mean over the matrix cells of these 3 values as the final localization result.
The formula can be expressed as
FDOA_{C_i} = P · (Σ_{l=top-1}^{top+1} A_l · M(C_i, A_l)) / (Σ_{l=top-1}^{top+1} M(C_i, A_l))
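Finally, a sketch of the distribution-matrix decision itself, combining steps 4-6: count frames per (class, angle-interval) cell, find the peak of the target row, and average the peak and its two neighbours scaled by the resolution P; the simple wrap-around handling at 0°/360° is a simplifying assumption.

```python
import numpy as np

def decide_angle(C, A, target, n_classes=5, P=10):
    """C: per-frame class labels; A: per-frame angle estimates in degrees."""
    n_bins = 360 // P
    M = np.zeros((n_classes, n_bins))
    for c, a in zip(C, A):                    # one (recognition, localization) pair per frame
        M[c, int(a // P) % n_bins] += 1       # M(C_i, A_j): frame counts
    row = M[target]                           # localization distribution of the target class
    top = int(np.argmax(row))                 # A_top: the distribution peak
    ls = [top - 1, top, top + 1]              # peak and its two adjacent intervals
    num = sum(l * row[l % n_bins] for l in ls)
    den = sum(row[l % n_bins] for l in ls)
    return (P * num / den) % 360              # FDOA for the target class, in degrees
```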
Claims (7)

1. A sound localization method based on distribution-matrix decision-making, characterized by comprising the following steps:
Step 1, preprocessing the four-channel acoustic signals collected by the acoustic array;
Step 2, performing sound recognition on the single-channel data;
Step 3, performing wideband sound localization on the multi-channel data;
Step 4, building a distribution matrix from the recognition and localization result sets obtained in steps 2 and 3;
Step 5, after obtaining the distribution matrix, finding the localization distribution peak of the target sound source;
Step 6, selecting the peak and its two adjacent angle intervals and computing the statistical mean over these three intervals as the final localization result.
2. The sound localization method based on distribution-matrix decision-making according to claim 1, characterized in that in said step 1 the live sound signal is acquired with a cross acoustic array, with sampling frequency denoted f_s; the four-channel acoustic signal is framed, the number of frames after framing being m; each frame is then processed.
3. The sound localization method based on distribution-matrix decision-making according to claim 2, characterized in that in said step 2 the algorithm recognizing the single-channel signal is the LPCC+SVM algorithm;
each frame yields one recognition result, forming a recognition result array C of length m:
C = [c(1) c(2) ... c(m)].
4. The sound localization method based on distribution-matrix decision-making according to claim 3, characterized in that the algorithm performing wideband localization on the four-channel signal is the wideband MUSIC algorithm, as follows:
3-1. Select the frequency band and the center frequency f_0 as needed; both must be chosen according to the frequency characteristics of the actual target signal.
3-2. Apply an FFT to each four-channel frame; the model X(f_j) of each transformed frame is expressed as
X(f_j) = A_θ(f_j)S(f_j) + N(f_j), j = 1, 2, ..., J    (Formula 1)
where A_θ(f_j) is the steering vector, and S(f_j) and N(f_j) are the source signal and the noise after the FFT, respectively;
after the transform, the selected band is decomposed into a combination of narrowband signals at the frequencies f_j.
3-3. Using the focusing matrix T, focus each narrowband at frequency f_j onto the narrowband at the center frequency f_0:
T(f_j)A(f_j)S(f_j) = A(f_0)S(f_0)    (Formula 2)
and obtain the autocorrelation matrix at the center frequency f_0, used for localization, by Formula 3:
R(f_0) = (1/J) Σ_{j=1}^{J} T(f_j) R(f_j) T^H(f_j)    (Formula 3)
3-4. Localize the narrowband at the center frequency f_0 to obtain the localization result of the frame; each frame yields one localization result, forming a localization result array A of length m:
A = [a(1) a(2) ... a(m)].
5. The sound localization method based on distribution-matrix decision-making according to claim 4, characterized in that in said step 4 the distribution matrix M is constructed from the recognition result array C and the localization result array A obtained in steps 2 and 3:
with the values of recognition result array C as one axis and the angular value range of localization result array A as the other, the result of every frame is traversed to build the matrix M, where M(C_i, A_j) denotes the number of frames whose recognition result is C_i and whose localization result is A_j.
6. The sound localization method based on distribution-matrix decision-making according to claim 5, characterized in that in said step 5, after the distribution matrix is obtained, the localization distribution peak A_top of the target sound source is found along recognition result C_i.
7. The sound localization method based on distribution-matrix decision-making according to claim 6, characterized in that in said step 6, on the localization distribution of recognition result C_i, the peak A_top and its two adjacent values A_top-1 and A_top+1 are selected, and the statistical mean over the matrix cells of these three values is computed as
FDOA_{C_i} = P · (Σ_{l=top-1}^{top+1} A_l · M(C_i, A_l)) / (Σ_{l=top-1}^{top+1} M(C_i, A_l)).
CN201610893331.1A 2016-10-13 2016-10-13 Sound localization method based on distribution matrix decision Active CN106526541B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610893331.1A CN106526541B (en) 2016-10-13 2016-10-13 Sound localization method based on distribution matrix decision


Publications (2)

Publication Number Publication Date
CN106526541A (en) 2017-03-22
CN106526541B CN106526541B (en) 2019-01-18

Family

ID=58332047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610893331.1A Active CN106526541B (en) 2016-10-13 2016-10-13 Sound localization method based on distribution matrix decision

Country Status (1)

Country Link
CN (1) CN106526541B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102438189A (en) * 2011-08-30 2012-05-02 东南大学 Dual-channel acoustic signal-based sound source localization method
CN103439688A (en) * 2013-08-27 2013-12-11 大连理工大学 Sound source positioning system and method used for distributed microphone arrays
CN105609113A (en) * 2015-12-15 2016-05-25 中国科学院自动化研究所 Bispectrum weighted spatial correlation matrix-based speech sound source localization method
CN106023996A (en) * 2016-06-12 2016-10-12 杭州电子科技大学 Sound identification method based on cross acoustic array broadband wave beam formation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘春静 et al., "Broadband DOA Algorithm Based on Data Matrix Focusing," 《弹箭与制导学报》 (Journal of Projectiles, Rockets, Missiles and Guidance) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107493106A (en) * 2017-08-09 2017-12-19 河海大学 A kind of method of frequency and angle Combined estimator based on compressed sensing
CN107493106B (en) * 2017-08-09 2021-02-12 河海大学 Frequency and angle joint estimation method based on compressed sensing
CN112347984A (en) * 2020-11-27 2021-02-09 安徽大学 Olfactory stimulus-based EEG (electroencephalogram) acquisition and emotion recognition method and system

Also Published As

Publication number Publication date
CN106526541B (en) 2019-01-18


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant