CN112666520A

CN112666520A - Method and system for positioning time-frequency spectrum sound source with adjustable response

Info

Publication number: CN112666520A
Application number: CN202011492942.8A
Authority: CN
Inventors: 聂鹏飞; 贾彩琴; 刘宾; 王黎明; 韩焱; 余尚江; 陈晋央; 周会娟
Original assignee: North University of China
Current assignee: North University of China
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2021-04-16
Anticipated expiration: 2040-12-17
Also published as: CN112666520B

Abstract

The invention provides a method and a system for positioning a time-frequency spectrum sound source with adjustable response.

Description

Method and system for positioning time-frequency spectrum sound source with adjustable response

Technical Field

The invention relates to the technical field of sound source positioning, in particular to a method and a system for positioning a sound source with a response-adjustable time-frequency spectrum.

Background

Sound source positioning is widely applied in many fields. In terms of positioning principle, current sound source positioning methods fall into two types: two-step positioning methods based on time-of-arrival/time-difference-of-arrival estimation, and one-step positioning methods based on beamforming. The positioning accuracy of the two-step method is limited by the accuracy of the time difference estimation and by the performance of the positioning algorithm. High-accuracy time-of-arrival/time-difference estimation requires high-quality data, so the two-step method has poor interference rejection. The one-step method can make full use of the time, energy and frequency information of the signals, and offers strong anti-interference capability, high positioning accuracy and a wide application range. However, existing methods consider only the time information of the array signal and do not consider the constraint that frequency information places on positioning, which results in inaccurate positioning. Although existing one-step methods such as SRP-PHAT have a certain anti-noise performance, they use the full-band information of the signal; because of noise interference, a large number of local extrema exist in the energy distribution of the positioning space, which reduces the sound source positioning accuracy and slows convergence in the optimization solution process.

Disclosure of Invention

The invention aims to provide a method and a system for positioning a time-frequency spectrum sound source with adjustable response, which are used for overcoming the technical defects of low positioning precision and low positioning speed in the conventional positioning method and improving the positioning precision and the positioning speed.

In order to achieve the purpose, the invention provides the following scheme:

a method for tunable response time-frequency spectrum sound source localization, the localization method comprising the steps of:

calculating the cross-correlation coefficient of the vibration signals acquired by every two sensors in the sensor array;

performing time-frequency transformation on each cross-correlation coefficient to obtain a time-frequency spectrum of the cross-correlation coefficient;

obtaining positioning space energy distribution containing signal frequency information by utilizing a delay and sum beam forming principle according to the time-frequency spectrum of each cross-correlation coefficient;

determining a positioning characteristic vector corresponding to the maximum value of the energy distribution of the positioning space of the vibration signals with different frequencies in a frequency constraint range by adopting a constraint maximum likelihood estimation mode, and constructing a positioning characteristic matrix;

and clustering and weighting fusion are carried out on the positioning feature vectors in the positioning feature matrix to obtain a sound source positioning result.

Optionally, the obtaining, according to the time-frequency spectrum of each cross-correlation coefficient, a positioning space energy distribution including signal frequency information by using a principle of delay-sum beam forming specifically includes:

according to the time-frequency spectrum of each cross-correlation coefficient, the positioning space energy distribution containing signal frequency information is obtained by utilizing the principle of delay summation beam forming, and the positioning space energy distribution is as follows:

where P(x, f) denotes the positioning space energy distribution; τ(x) denotes the difference between the propagation times from the spatial position x to the ith and jth sensors; f denotes the vibration signal frequency; the summed term is the value of the time-frequency transform of the (i, j)-th cross-correlation coefficient R_i,j; s_i(t) represents the vibration signal collected by the ith sensor; the remaining term represents the vibration signal s_j(t + τ(x)) collected by the jth sensor; t represents the time at which the ith sensor acquired the vibration signal; and N represents the number of sensors in the sensor array.

Optionally, the determining, in a constrained maximum likelihood estimation manner, a positioning feature vector corresponding to a maximum value of energy distribution in a positioning space of vibration signals with different frequencies within a frequency constraint range, and constructing a positioning feature matrix specifically includes:

solving formula by using constrained maximum likelihood estimation mode

Obtaining a positioning feature matrix A = [a₁, a₂, ..., a_m, ..., a_M];

wherein the solution of the formula is the position vector of the maximum of the positioning space energy distribution when the vibration signal frequency is f_m; P(x, f_m) represents the positioning space energy distribution when the vibration signal frequency is f_m; a_m represents the positioning feature vector corresponding to the vibration signal frequency f_m; E_m represents the spatially adjustable response spectral energy corresponding to the vibration signal frequency f_m, that is, the energy at that position vector in the positioning space energy distribution at frequency f_m; [f_min, f_max] indicates the frequency band range, f_min represents the minimum effective frequency, and f_max represents the maximum effective frequency.

Optionally, the clustering and weighting fusion of the localization feature vectors in the localization feature matrix is performed to obtain a sound source localization result, and the method specifically includes:

taking the positioning feature vector with the vibration signal frequency closest to the main frequency in the positioning feature matrix as a main clustering center;

performing two-classification dynamic clustering on the positioning feature vector in the positioning feature matrix by using the main clustering center to obtain a class in which the main clustering center is located as a data fusion sample set;

using the formula

Φ = Σ_{n=1}^{L} w_n a_n

to perform weighted fusion on each positioning feature vector in the data fusion sample set to obtain the sound source positioning result;

where Φ represents the sound source positioning result, and w_n represents the weight of the nth positioning feature vector a_n in the data fusion sample set,

w_n = β_n / Σ_{l=1}^{L} β_l

β_n represents the inverse of the Euclidean distance between the nth positioning feature vector in the data fusion sample set and the main clustering center, β_l represents the same quantity for the lth positioning feature vector, and L represents the number of positioning feature vectors in the data fusion sample set.

A tunable response time-frequency spectral sound source localization system, the localization system comprising:

the cross-correlation coefficient calculation module is used for calculating the cross-correlation coefficient of the vibration signals acquired by every two sensors in the sensor array;

the time-frequency transformation module is used for respectively carrying out time-frequency transformation on each cross-correlation coefficient to obtain a time-frequency spectrum of the cross-correlation coefficient;

the positioning space energy distribution determining module is used for obtaining positioning space energy distribution containing signal frequency information by utilizing the principle of delay summation beam forming according to the cross-correlation coefficient after each time frequency transformation;

the positioning feature matrix construction module is used for determining a positioning feature vector corresponding to the maximum value of the energy distribution of the positioning space of the vibration signals with different frequencies in the frequency constraint range by adopting a constraint maximum likelihood estimation mode, and constructing a positioning feature matrix;

and the sound source positioning module is used for clustering and weighting the positioning characteristic vectors in the positioning characteristic matrix to obtain a sound source positioning result.

Optionally, the positioning spatial energy distribution determining module specifically includes:

and the positioning space energy distribution determining submodule is used for obtaining positioning space energy distribution containing signal frequency information according to the time-frequency spectrum of each cross-correlation coefficient by utilizing the principle of delay summation beam forming, and comprises the following steps:

Optionally, the module for constructing the location feature matrix specifically includes:

a positioning feature matrix construction submodule for solving the formula by adopting a constraint maximum likelihood estimation mode

Obtaining a positioning feature matrix A = [a₁, a₂, ..., a_m, ..., a_M];

wherein the solution of the formula is the position vector of the maximum of the positioning space energy distribution when the vibration signal frequency is f_m; P(x, f_m) represents the positioning space energy distribution when the vibration signal frequency is f_m; and a_m represents the positioning feature vector corresponding to the vibration signal frequency f_m.

Optionally, the sound source positioning module specifically includes:

the main clustering center determining submodule is used for taking the positioning feature vector with the vibration signal frequency closest to the main frequency in the positioning feature matrix as a main clustering center;

the data fusion sample set acquisition submodule is used for carrying out two-classification dynamic clustering on the positioning feature vector in the positioning feature matrix by using the main clustering center to obtain a class where the main clustering center is located as a data fusion sample set;

a weighted fusion submodule for utilizing the formula

Compared with the prior art, the invention has the beneficial effects that:

The invention provides a method and a system for positioning a time-frequency spectrum sound source with adjustable response, wherein the positioning method comprises the following steps: calculating the cross-correlation coefficient of the vibration signals acquired by every two sensors in the sensor array; performing time-frequency transformation on each cross-correlation coefficient to obtain a time-frequency spectrum of the cross-correlation coefficient; obtaining a positioning space energy distribution containing signal frequency information from the time-frequency spectrum of each cross-correlation coefficient by using the delay-and-sum beamforming principle; determining, by constrained maximum likelihood estimation, the positioning feature vectors corresponding to the maxima of the positioning space energy distribution of vibration signals of different frequencies within a frequency constraint range, and constructing a positioning feature matrix; and performing clustering and weighted fusion on the positioning feature vectors in the positioning feature matrix to obtain a sound source positioning result. Because the cross-correlation coefficients are time-frequency transformed before delay-and-sum beamforming is applied, the resulting positioning space energy distribution carries frequency information. Constrained maximum likelihood estimation is then used to construct the positioning feature matrix, which overcomes the technical defect of inaccurate positioning caused by ignoring the frequency information constraint. Clustering and weighted fusion further improve the positioning accuracy and speed.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow chart of a method for time-frequency-spectrum sound source localization with adjustable response according to the present invention;

FIG. 2 is a schematic diagram of the method for time-frequency-spectrum sound source localization with adjustable response according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

As shown in fig. 1 and 2, the present invention provides a method for localization of a time-frequency-spectrum sound source with adjustable response, the localization method comprising the steps of:

Step 101, calculating the cross-correlation coefficient of the vibration signals collected by every two sensors in the sensor array.

Vibration data generated by the sound source are collected by the sensor array, and the cross-correlation R_i,j(τ) of the signals of each sensor pair (i, j) is calculated:

R_i,j(τ) = ∫ s_i(t) s_j(t + τ) dt    (1)

where i and j denote the ith and jth sensors, t denotes the signal acquisition time, τ denotes the time delay, s_i(t) represents the signal collected by the ith sensor, and s_j(t + τ) represents the signal collected by the jth sensor shifted by the delay τ.
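As an illustration of step 101, the following is a minimal sketch of the pairwise cross-correlation for sampled signals. It assumes discrete signals with a common sampling rate, replaces the correlation over t by a finite sum, and applies no normalization; the function name `pairwise_cross_correlation` and its parameters are hypothetical and are not part of the described method.

```python
from itertools import combinations

import numpy as np


def pairwise_cross_correlation(signals, fs):
    """Cross-correlate every sensor pair (i, j) with i < j.

    signals: array of shape (N, T), one row per sensor (assumed layout).
    fs: sampling rate in Hz, assumed identical for all sensors.
    Returns (lags, corr): lags in seconds, and a dict mapping (i, j)
    to R_ij sampled at those lags.
    """
    signals = np.asarray(signals, dtype=float)
    n_sensors, n_samples = signals.shape
    lags = np.arange(-(n_samples - 1), n_samples) / fs
    corr = {}
    for i, j in combinations(range(n_sensors), 2):
        # np.correlate(a, v, "full")[k] = sum_t a[t + lag_k] * v[t],
        # so a = s_j and v = s_i gives sum_t s_i[t] * s_j[t + lag].
        corr[(i, j)] = np.correlate(signals[j], signals[i], mode="full")
    return lags, corr
```

With this convention, a signal that reaches sensor j later than sensor i produces a correlation peak at a positive lag.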

Step 102, performing time-frequency transformation on each cross-correlation coefficient to obtain the time-frequency spectrum of the cross-correlation coefficient.

The SRP-PHAT (steered response power with phase transform weighting) sound source positioning method, which can be derived from formula (1), is currently the positioning method with the best anti-noise and anti-multipath interference performance. Its positioning principle is:

P(x) = Σ_{i=1}^{N} Σ_{j=i+1}^{N} R_i,j(τ(x))

where P(x) represents the positioning space energy distribution, and

τ(x) = (||x − x_j||₂ − ||x − x_i||₂) / v

is the calculated difference between the propagation times from the spatial position x to the ith and jth sensors, in which x represents the spatial position of the sound source, x_i and x_j are the positions of the ith and jth sensors, ||·||₂ is the 2-norm (distance) of a vector, and v is the wave propagation velocity. The spatial position that maximizes P(x),

x̂ = argmax_x P(x)

is the location of the sound source. Although the SRP-PHAT method has a certain anti-noise performance, it uses the full-band information of the signals, and noise interference produces a large number of local extrema in P(x), which reduces the sound source positioning accuracy and slows convergence in the optimization solution process.
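To make the steered-response idea concrete, the sketch below evaluates a plain (unweighted) steered response power on a set of candidate positions and leaves the choice of the maximum to the caller. It is an illustration only: no PHAT weighting is applied, the time difference is rounded to the nearest sample, the sign convention (travel time to sensor j minus travel time to sensor i) is an assumption chosen to match a correlation of the form sum over t of s_i(t) s_j(t + τ), and the name `srp_map` is hypothetical.

```python
from itertools import combinations

import numpy as np


def srp_map(candidates, sensors, signals, fs, v):
    """Sum pairwise cross-correlations at the lag each candidate predicts.

    candidates: (K, D) candidate source positions.
    sensors: (N, D) sensor positions; signals: (N, T) sampled signals.
    fs: sampling rate in Hz; v: wave propagation velocity.
    Returns an array of K steered response power values.
    """
    candidates = np.asarray(candidates, dtype=float)
    sensors = np.asarray(sensors, dtype=float)
    signals = np.asarray(signals, dtype=float)
    n_sensors, n_samples = signals.shape
    # dist[k, i] is the distance from candidate k to sensor i.
    dist = np.linalg.norm(candidates[:, None, :] - sensors[None, :, :], axis=2)
    power = np.zeros(len(candidates))
    for i, j in combinations(range(n_sensors), 2):
        r = np.correlate(signals[j], signals[i], mode="full")
        # Predicted time difference in samples, rounded to the nearest lag.
        lag = np.rint((dist[:, j] - dist[:, i]) / v * fs).astype(int)
        idx = lag + (n_samples - 1)
        valid = (idx >= 0) & (idx < r.size)
        power[valid] += r[idx[valid]]
    return power
```

The estimated position is then `candidates[np.argmax(power)]`; with noisy data this map can contain many local maxima.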

To overcome these defects of the SRP-PHAT method, the invention first performs a time-frequency transformation on R_i,j(τ) and then obtains an adjustable spatial response spectrum by delay-and-sum beamforming in the time-frequency domain. The time-frequency transformation of R_i,j(τ) uses the S-transform:

wherein the transform applied to the signal s_j(t) is the synchrosqueezed S-transform, and the result represents the time-frequency transform of the cross-correlation R_i,j(τ).
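For orientation, the following is a minimal sketch of a plain discrete S-transform (Stockwell transform) computed in the frequency domain. It is not the synchrosqueezed variant named in the text, which would additionally reassign energy along the frequency axis. The function name `s_transform` and the choice to return only non-negative frequencies are assumptions made for illustration.

```python
import numpy as np


def s_transform(h, fs=1.0):
    """Discrete S-transform of a real sequence h.

    Returns (freqs, st): st[k, n] is the transform at frequency
    freqs[k] = k * fs / len(h) and at sample position n.
    """
    h = np.asarray(h, dtype=float)
    n = h.size
    spectrum = np.fft.fft(h)
    doubled = np.concatenate([spectrum, spectrum])  # allows circular shifts
    shift = np.fft.fftfreq(n) * n                   # signed frequency offsets
    rows = n // 2 + 1
    st = np.zeros((rows, n), dtype=complex)
    st[0, :] = h.mean()                             # zero-frequency row
    for k in range(1, rows):
        # Gaussian window whose width scales with the analysed frequency.
        window = np.exp(-2.0 * np.pi**2 * shift**2 / k**2)
        st[k, :] = np.fft.ifft(doubled[k:k + n] * window)
    freqs = np.arange(rows) * fs / n
    return freqs, st
```

When the input is a sampled cross-correlation R_i,j, the second axis of `st` corresponds to the lag τ rather than to acquisition time.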

Step 103, obtaining a positioning space energy distribution containing signal frequency information from the time-frequency spectrum of each cross-correlation coefficient by using the delay-and-sum beamforming principle.

Step 103, obtaining the positioning space energy distribution containing the signal frequency information by using the principle of delay and sum beam forming according to the cross-correlation coefficient after each time-frequency transformation, specifically including: according to the cross-correlation coefficient after each time frequency transformation, the positioning space energy distribution containing signal frequency information is obtained by utilizing the principle of delay summation beam forming, and the positioning space energy distribution is as follows:

Specifically, let the position of the ith sensor be x_i = [x_i, y_i, z_i]^T. From the principle of delay-and-sum beamforming, one can obtain:

The adjustable spatial response spectrum represented by equation (3) is a function not only of the spatial position variable x but also of the signal frequency f. Better positioning performance can therefore be obtained by constraining the signal frequency f.
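The delay-and-sum step in the time-frequency domain can be sketched as follows. The sketch assumes that each pair's time-frequency spectrum is already available as an array indexed by (frequency, lag) on a uniform lag axis, that the magnitude of the spectrum is what gets summed, and that the time difference is taken as travel time to sensor j minus travel time to sensor i. These choices, and the name `adjustable_response`, are illustrative assumptions rather than the formula of the specification.

```python
import numpy as np


def adjustable_response(x, tf_spectra, lags, sensors, v):
    """Evaluate a frequency-resolved response P(x, f) at one position x.

    tf_spectra: dict mapping (i, j) to an array of shape (F, len(lags)).
    lags: uniformly spaced lag axis in seconds.
    sensors: (N, D) sensor positions; v: wave propagation velocity.
    Returns an array of F values, one per frequency row.
    """
    x = np.asarray(x, dtype=float)
    sensors = np.asarray(sensors, dtype=float)
    step = lags[1] - lags[0]
    total = 0.0
    for (i, j), spec in tf_spectra.items():
        tau = (np.linalg.norm(x - sensors[j]) - np.linalg.norm(x - sensors[i])) / v
        # Nearest lag sample, clamped to the available lag range.
        idx = int(round((tau - lags[0]) / step))
        idx = min(max(idx, 0), len(lags) - 1)
        total = total + np.abs(spec[:, idx])
    return total
```

Evaluating this function over a grid of positions gives one energy map per frequency, so the frequency can be restricted before searching for a maximum.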

Step 104, determining, by constrained maximum likelihood estimation, the positioning feature vectors corresponding to the maxima of the positioning space energy distribution of vibration signals of different frequencies within the frequency constraint range, and constructing a positioning feature matrix.

Step 104, determining a positioning feature vector corresponding to the maximum value of the energy distribution of the positioning space of the vibration signals with different frequencies in the frequency constraint range by adopting a constraint maximum likelihood estimation mode, and constructing a positioning feature matrix, which specifically comprises the following steps: solving formula by using constrained maximum likelihood estimation mode

Obtaining a positioning feature matrix A = [a₁, a₂, ..., a_m, ..., a_M].

Specifically, when the sensor array collects signals it is often affected by the surrounding environment; in complex environments in particular, the signal-to-noise ratio of the collected signals is low. Moreover, the energy of the acquired array signals is usually concentrated in a main frequency band, and not every frequency band carries useful information. Better positioning results can therefore be obtained by constraining the signal frequency f so that only the adjustable spatial response spectral energy of the main frequency band of the signal is considered when maximizing formula (3). Spectrum analysis is performed on the array signal to obtain its frequency band range [f_min, f_max] and main frequency f₀, which are used as constraints when solving the maximum likelihood estimate of P(x, f). The maximization problem can then be converted into the constrained optimization problem

max_x P(x, f_i)   s.t. f_min ≤ f_i ≤ f_max    (4)

where f_i represents a frequency within the range [f_min, f_max]. The above constrained optimization problem can be solved by population-based intelligent optimization methods such as a genetic algorithm, yielding a series of positioning results over the frequency band range together with their corresponding features, expressed as

A = [a₁, a₂, ..., a_i, ..., a_M]    (5)

where a_i is the vector constructed from the frequency f_i within the frequency band, its corresponding positioning space position, and its corresponding spatially adjustable response spectral energy E_i, the latter being the value of P(x, f_i) at that position.

The above method yields a series of candidate samples of the sound source position, each containing spatial position information together with frequency and energy information. This is the key information available for positioning, and obtaining it provides the basis for the subsequent high-precision fusion positioning.
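The construction of the feature matrix can be sketched as below. For simplicity the sketch replaces the genetic algorithm mentioned above with an exhaustive search over a precomputed grid of candidate positions, which is a swapped-in technique and not the optimizer of the specification. The row layout [position, frequency, energy], the storage of feature vectors as rows rather than columns, and the name `build_feature_matrix` are likewise assumptions.

```python
import numpy as np


def build_feature_matrix(P, candidates, freqs, f_min, f_max):
    """Build one feature vector per in-band frequency.

    P: (K, F) array with P[k, m] = energy at candidate k and frequency m.
    candidates: (K, D) candidate positions; freqs: (F,) frequencies.
    Returns an (M, D + 2) array whose rows are [position, frequency, energy]
    for the M frequencies satisfying f_min <= f <= f_max.
    """
    P = np.asarray(P, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    in_band = np.flatnonzero((freqs >= f_min) & (freqs <= f_max))
    rows = []
    for m in in_band:
        k = int(np.argmax(P[:, m]))  # grid maximum at this frequency
        rows.append(np.concatenate([candidates[k], [freqs[m], P[k, m]]]))
    if not rows:
        return np.empty((0, candidates.shape[1] + 2))
    return np.array(rows)
```

A finer grid or a continuous optimizer could refine each maximum; the grid search is used here only because it is easy to verify.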

Step 105, performing clustering and weighted fusion on the positioning feature vectors in the positioning feature matrix to obtain a sound source positioning result.

Step 105, performing clustering and weighted fusion on the positioning feature vectors in the positioning feature matrix to obtain a sound source positioning result, which specifically includes: taking the positioning feature vector with the vibration signal frequency closest to the main frequency in the positioning feature matrix as a main clustering center; performing two-classification dynamic clustering on the positioning feature vector in the positioning feature matrix by using the main clustering center to obtain a class in which the main clustering center is located as a data fusion sample set; using formulas

Performing weighted fusion on each positioning feature vector in the data fusion sample set to obtain the sound source positioning result; where Φ represents the sound source positioning result, and w_n represents the weight of the nth positioning feature vector a_n in the data fusion sample set.

Specifically, step 104 has obtained the key information that determines positioning performance, but a high-precision positioning result cannot be obtained from it directly; data mining is required to achieve high-precision positioning. Although step 104 yields positioning results within the signal frequency band, the frequencies that best characterize positioning performance are not fully consistent across the signals collected by different sensors, because the waves travel along different propagation paths to the different sensors. In addition, because of noise interference, the spatial position corresponding to the maximum spatial response spectral energy may not be the best positioning result. For these two reasons, the invention first searches A for the sample a_m closest to the main frequency f₀ (that is, f_m is closest to f₀) and takes it as a clustering center for dynamic clustering. Two-class dynamic clustering is performed on the sample set A; the class whose clustering center is not a_m is removed, and the class whose clustering center is a_m (denoted B) is kept as the sample set for data fusion. For each sample a_n in B, the inverse β_n of its Euclidean distance to a_m is calculated:

β_n = 1 / d_nm

where d_nm = ||a_n − a_m||₂ and a_n ∈ B. The weighting coefficients w_n for weighted fusion are constructed from β_n:

w_n = β_n / Σ_{l=1}^{L} β_l

where L is the number of samples in B. Using the constructed weighting coefficients, the samples a_n in B are weighted and fused to obtain the final positioning vector Φ:

Φ = Σ_{n=1}^{L} w_n a_n

The spatial position corresponding to Φ is the final positioning result.
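A minimal sketch of the clustering and fusion step follows, under several stated assumptions. Two-class dynamic clustering is approximated by a two-center iteration in which the main center stays pinned at a_m and only the second center is updated. Because an inverse distance is undefined for a_m itself (its distance to the main center is zero), the sketch uses 1 / (d + eps) with a hypothetical regularization parameter `eps`. Feature vectors are assumed to be rows laid out as [position, frequency, energy], and any rescaling of the columns before computing distances is left to the caller.

```python
import numpy as np


def fuse_localization(A, f0, freq_col=-2, eps=None, n_iter=50):
    """Cluster feature vectors around the main-frequency sample and fuse them.

    A: (M, D + 2) rows [position, frequency, energy] (assumed layout).
    f0: main frequency; freq_col: index of the frequency column.
    Returns (phi, B): the fused vector and the retained sample set.
    """
    A = np.asarray(A, dtype=float)
    m = int(np.argmin(np.abs(A[:, freq_col] - f0)))  # sample nearest to f0
    primary = A[m]
    d_primary = np.linalg.norm(A - primary, axis=1)
    other = A[int(np.argmax(d_primary))].copy()      # initial second center
    keep = np.ones(len(A), dtype=bool)
    for _ in range(n_iter):
        keep = d_primary <= np.linalg.norm(A - other, axis=1)
        if keep.all():
            break
        updated = A[~keep].mean(axis=0)              # move the second center
        if np.allclose(updated, other):
            break
        other = updated
    B = A[keep]
    d = d_primary[keep]
    if eps is None:
        # Assumed default: mean nonzero distance, so a_m gets a finite weight.
        eps = d[d > 0].mean() if np.any(d > 0) else 1.0
    beta = 1.0 / (d + eps)
    w = beta / beta.sum()
    return w @ B, B
```

With the assumed row layout, the leading components of the returned vector give the fused position, for example `phi[:3]` for a three-dimensional positioning space.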

The invention also provides a system for time-frequency-spectrum sound source localization with adjustable response, the localization system comprising:

the positioning space energy distribution determining module is used for obtaining positioning space energy distribution containing signal frequency information by utilizing the principle of delay summation beam forming according to the time-frequency spectrum of each cross-correlation coefficient;

the positioning spatial energy distribution determining module specifically includes: and the positioning space energy distribution determining submodule is used for obtaining positioning space energy distribution containing signal frequency information according to the cross-correlation coefficient after each time-frequency transformation by utilizing the principle of delay summation beam forming:

And the positioning feature matrix construction module is used for determining a positioning feature vector corresponding to the maximum value of the energy distribution of the positioning space of the vibration signals with different frequencies in the frequency constraint range by adopting a constraint maximum likelihood estimation mode, and constructing a positioning feature matrix.

The positioning feature matrix building module specifically comprises: a positioning feature matrix construction submodule for solving the formula by adopting a constraint maximum likelihood estimation mode

Obtaining a positioning feature matrix A = [a₁, a₂, ..., a_m, ..., a_M].

The sound source positioning module specifically comprises: a main clustering center determining submodule, used for taking the positioning feature vector whose vibration signal frequency is closest to the main frequency in the positioning feature matrix as the main clustering center; a data fusion sample set acquisition submodule, used for performing two-class dynamic clustering on the positioning feature vectors in the positioning feature matrix by using the main clustering center, and taking the class in which the main clustering center is located as the data fusion sample set; and a weighted fusion submodule, used for applying the formula

Performing weighted fusion on each positioning feature vector in the data fusion sample set to obtain the sound source positioning result; where Φ represents the sound source positioning result, and w_n represents the weight of the nth positioning feature vector a_n in the data fusion sample set.

Compared with the prior art, the invention has the beneficial effects that:

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.

The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention. A person skilled in the art may, following the idea of the present invention, change the specific embodiments and the application range. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A method for locating a time-frequency-spectrum sound source with adjustable response is characterized by comprising the following steps:

2. The method according to claim 1, wherein the obtaining a localization spatial energy distribution containing signal frequency information according to the time-frequency spectrum of each cross-correlation coefficient by using the principle of delay-sum beam forming specifically comprises:

3. The method for positioning an adjustable-response time-frequency spectrum sound source according to claim 2, wherein the determining a positioning feature vector corresponding to a maximum value of energy distribution in a positioning space of vibration signals with different frequencies within a frequency constraint range by using a constraint maximum likelihood estimation method to construct a positioning feature matrix specifically comprises:

solving formula by using constrained maximum likelihood estimation mode

Obtaining a positioning feature matrix A = [a₁, a₂, ..., a_m, ..., a_M].

4. The method for positioning a sound source with an adjustable response time-frequency spectrum according to claim 1, wherein the clustering and weighted fusion of the positioning feature vectors in the positioning feature matrix is performed to obtain a sound source positioning result, and specifically comprises:

using formulas

Carrying out weighted fusion on each positioning feature vector in the data fusion sample set to obtain a sound source positioning result;

where Φ represents the sound source positioning result, and w_n represents the weight of the nth positioning feature vector a_n in the data fusion sample set.

5. A system for locating a time-frequency-spectrum sound source with adjustable response, the localization system comprising:

6. The system for locating a time-frequency-spectrum sound source with adjustable response according to claim 5, wherein the localization spatial energy distribution determining module specifically comprises:

and the positioning space energy distribution determining submodule is used for obtaining positioning space energy distribution containing signal frequency information according to the cross-correlation coefficient after each time-frequency transformation by utilizing the principle of delay summation beam forming:

7. The system for locating a time-frequency-spectrum sound source with adjustable response according to claim 5, wherein the localization feature matrix constructing module specifically comprises:

Obtaining a positioning feature matrix A = [a₁, a₂, ..., a_m, ..., a_M].

8. The system for locating a time-frequency-spectrum sound source with adjustable response according to claim 5, wherein the sound source localization module specifically comprises:

a weighted fusion submodule for utilizing the formula