CN112257484A - Multi-sound-source direction finding method and system based on deep learning - Google Patents

Multi-sound-source direction finding method and system based on deep learning

Info

Publication number
CN112257484A
CN112257484A (application CN201910661146.3A; granted as CN112257484B)
Authority
CN
China
Prior art keywords
neural network
deep neural
sound
fourier transform
sound source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910661146.3A
Other languages
Chinese (zh)
Other versions
CN112257484B (en)
Inventor
徐及 (Xu Ji)
黄兆琼 (Huang Zhaoqiong)
颜永红 (Yan Yonghong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS filed Critical Institute of Acoustics CAS
Priority to CN201910661146.3A priority Critical patent/CN112257484B/en
Publication of CN112257484A publication Critical patent/CN112257484A/en
Application granted granted Critical
Publication of CN112257484B publication Critical patent/CN112257484B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • G06F2218/10Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/14Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Optimization (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Biomedical Technology (AREA)
  • Signal Processing (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention provides a deep-learning-based multi-sound-source direction finding method and system. The method comprises: converting the sound-source signals received by an array into digital sound signals; applying a Fourier transform to the digital sound signals to obtain Fourier-transformed signals; inputting the Fourier-transformed signals into a deep neural network and training the network by back propagation until it converges; and finding, in the posterior probabilities output by the converged deep neural network, the azimuths corresponding to the peaks, which are the azimuths of the sound sources. By optimizing a criterion function, the deep neural network finds the optimal solution for the sound-source directions of arrival directly from the signals received by the array, achieving simple and efficient multi-sound-source direction finding.

Description

Multi-sound-source direction finding method and system based on deep learning
Technical Field
The invention relates to the technical field of sound source direction finding, in particular to a multi-sound-source direction finding method and system based on deep learning.
Background
The sound source direction finding technology can indicate the spatial direction of a sound source target, and provides important spatial information for subsequent information acquisition and processing.
Traditional methods mainly use modern digital signal processing techniques to estimate the azimuth of a sound source, obtaining its direction of arrival through grid-point matching search or analytical solution, at high computational cost.
In recent years, deep neural networks have been widely applied to classical signal-processing problems; however, most prior work applying deep learning to direction finding relies on shallow or conventional neural networks. A deep neural network has a stronger ability to find the optimal solution than a shallow one, and therefore offers clear advantages when used to solve for the sound-source direction of arrival.
Disclosure of Invention
The invention aims to solve the problem of multi-sound-source direction finding in real environments, and provides a deep-learning-based multi-sound-source direction finding method and system that uses a deep neural network to achieve robust and efficient multi-sound-source direction finding.
In order to achieve the above object, the present invention provides a deep learning-based multi-sound-source direction finding method, including:
converting the acoustic source signals received by the array into digital acoustic signals; performing Fourier transform on the digital sound signal to obtain a signal after Fourier transform;
inputting the signals after Fourier transform into a deep neural network, and training the deep neural network by using a back propagation method until the deep neural network is converged;
and finding the azimuth corresponding to the peak value from the posterior probability output by the converged deep neural network, wherein the azimuth is the azimuth of the sound source.
As an improvement of the above method, converting the sound-source signals received by the array into digital sound signals and performing a Fourier transform on the digital sound signals specifically comprises:
the array comprises K microphones; a Fourier transform is applied to the array signal y(t), and the real and imaginary parts of the Fourier coefficients at every frequency are concatenated into a vector X:
Y(f_i) = FFT(y(t)),
X = [real(Y(f_1)), imag(Y(f_1)), real(Y(f_2)), imag(Y(f_2)), …, real(Y(f_F)), imag(Y(f_F))]  (concatenation order reconstructed from context; the source gives it only as an equation image)
wherein Y(f_i) is the Fourier coefficient at frequency f_i; i is the frequency index and F is the number of FFT points; real(Y(f_i)) and imag(Y(f_i)) are the real and imaginary parts of Y(f_i); Y(f_i) = [Y_1(f_i), Y_2(f_i), …, Y_L(f_i)], where L is the number of snapshots.
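The feature-extraction step above can be sketched as follows; the array size, FFT length, and exact concatenation layout are illustrative assumptions, since the patent gives the concatenation equation only as an image.

```python
import numpy as np

def fft_features(y, F):
    """Build the network input X from array signals.

    y : (K, T) real array of time-domain microphone signals.
    F : number of FFT points.
    Returns the real and imaginary parts of the FFT coefficients at
    every frequency, concatenated into one real vector (the layout
    here is a hypothetical choice; the patent only states that real
    and imaginary parts are concatenated in series).
    """
    Y = np.fft.fft(y, n=F, axis=1)                    # Y[k, i] = Y_k(f_i)
    X = np.concatenate([Y.real.ravel(), Y.imag.ravel()])
    return X

K, T, F = 4, 256, 256
rng = np.random.default_rng(0)
X = fft_features(rng.standard_normal((K, T)), F)
print(X.shape)  # (2048,) — 2 parts x K microphones x F frequencies
```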
As an improvement of the above method, inputting the Fourier-transformed signal into a deep neural network and training the deep neural network by back propagation until convergence specifically comprises:
the vector X is input to the deep neural network, whose training criterion function γ is:
[equation image: the training criterion γ — it accumulates the projections Γ_{f,p} Y_l(f_i) over frequencies i, candidate directions p, and snapshots l; its exact form is not recoverable from the source]
wherein Γ_{f,p} = H(θ_p, f_i) [H^H(θ_p, f_i) H(θ_p, f_i)]^{-1} H^H(θ_p, f_i), defining the steering vector
H(θ_p, f_i) = [1, e^{-j2π·f_i·τ_2}, …, e^{-j2π·f_i·τ_K}]^T  (standard narrowband form, reconstructed from the definition of τ_k; the source gives it only as an equation image)
θ_p is the direction of arrival of the pth sound source, 1 ≤ p ≤ P, and P is the number of sound sources; τ_k is the time delay between the kth microphone and the first microphone, 1 ≤ k ≤ K; Γ_{f,p} Y_l(f_i) is the K×1 observation vector projected onto the subspace spanned by the steering vector H(θ_p, f_i);
when the training criterion is minimized, the deep neural network has converged.
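The projection matrix Γ_{f,p} described above can be sketched in a few lines; the linear-array geometry, microphone spacing, and speed of sound are illustrative assumptions, since the patent defines τ_k only abstractly as the delay between the kth and first microphone.

```python
import numpy as np

def steering_vector(theta, f, mic_pos, c=343.0):
    """Far-field steering vector H(theta, f) for a linear array.

    tau[k] is the delay of microphone k relative to the first one;
    the geometry (uniform linear array, speed of sound c in air) is
    an assumption made for illustration.
    """
    tau = (mic_pos - mic_pos[0]) * np.cos(theta) / c   # (K,) delays
    return np.exp(-2j * np.pi * f * tau)               # (K,) complex

def projection_matrix(H):
    """Gamma = H (H^H H)^{-1} H^H — projector onto span{H}."""
    H = H.reshape(-1, 1)
    return H @ np.linalg.inv(H.conj().T @ H) @ H.conj().T

mics = np.arange(4) * 0.05            # 4 mics, 5 cm spacing (assumed)
H = steering_vector(np.deg2rad(60), 1000.0, mics)
G = projection_matrix(H)
# Projecting H onto its own subspace returns H itself:
print(np.allclose(G @ H, H))  # True
```

An observation vector Y_l(f_i) multiplied by G gives its component in the direction θ_p, which is what the criterion function accumulates.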
As an improvement of the above method, the candidate directions of arrival are θ_p = p·Δθ, where Δθ is the angular grid spacing (the source defines Δθ only in an equation image; a uniform grid such as Δθ = 180°/P would be a typical choice).
As an improvement of the above method, finding the azimuths corresponding to the peaks of the posterior probabilities output by the converged deep neural network, these being the azimuths at which sound sources appear, specifically comprises:
after the deep neural network converges, its output vector is z = [z_1, z_2, …, z_P]^T, where z_p ∈ [0,1] is the posterior probability that the pth sound source is present;
a number of maxima are extracted from the posterior probabilities z_1, z_2, …, z_P;
a threshold δ is calculated:
δ = O_avg + μ(O_max − O_avg),
where O_avg and O_max are the average and maximum of the posterior probabilities and μ is an empirical parameter;
for each maximum greater than the threshold δ, the corresponding azimuth is one where a sound source may appear.
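The thresholded peak-picking rule can be sketched as follows; the grid spacing, the value of μ, and the simple local-maximum test are illustrative assumptions (the patent leaves μ empirical).

```python
import numpy as np

def pick_bearings(z, delta_theta=1.0, mu=0.5):
    """Select candidate source bearings from the posterior vector z.

    z : (P,) posteriors, z[p] in [0, 1] for direction p * delta_theta.
    mu : empirical parameter of the patent (value assumed here).
    """
    # local maxima of the posterior curve
    peaks = [p for p in range(1, len(z) - 1)
             if z[p] >= z[p - 1] and z[p] >= z[p + 1]]
    # threshold delta = O_avg + mu * (O_max - O_avg)
    delta = z.mean() + mu * (z.max() - z.mean())
    return [p * delta_theta for p in peaks if z[p] > delta]

z = np.array([0.05, 0.1, 0.9, 0.2, 0.1, 0.7, 0.15, 0.05])
print(pick_bearings(z))  # [2.0, 5.0] — the two strong peaks survive
```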
The invention also provides a deep learning-based multi-sound-source direction-finding system, which comprises:
the signal conversion module is used for converting the sound source signals received by the array into digital sound signals; performing Fourier transform on the digital sound signal to obtain a signal after Fourier transform;
the deep neural network training module is used for inputting the signals subjected to Fourier transform into a deep neural network and training the deep neural network by using a back propagation method until the deep neural network converges;
and the sound source direction finding module is used for finding the direction corresponding to the peak value from the posterior probability output by the converged deep neural network, wherein the direction is the direction in which the sound source appears.
As an improvement of the above system, the specific implementation process of the signal conversion module is as follows:
a Fourier transform is applied to the array signal y(t), and the real and imaginary parts of the Fourier coefficients at every frequency are concatenated into a vector X:
Y(f_i) = FFT(y(t)),
X = [real(Y(f_1)), imag(Y(f_1)), real(Y(f_2)), imag(Y(f_2)), …, real(Y(f_F)), imag(Y(f_F))]  (concatenation order reconstructed from context; the source gives it only as an equation image)
wherein the array comprises K microphones; Y(f_i) is the Fourier coefficient at frequency f_i; i is the frequency index and F is the number of FFT points; real(Y(f_i)) and imag(Y(f_i)) are the real and imaginary parts of Y(f_i); Y(f_i) = [Y_1(f_i), Y_2(f_i), …, Y_L(f_i)], where L is the number of snapshots.
As an improvement of the above system, the implementation process of the deep neural network training module is as follows:
the vector X is input to the deep neural network, whose training criterion function γ is:
[equation image: the training criterion γ — it accumulates the projections Γ_{f,p} Y_l(f_i) over frequencies i, candidate directions p, and snapshots l; its exact form is not recoverable from the source]
wherein Γ_{f,p} = H(θ_p, f_i) [H^H(θ_p, f_i) H(θ_p, f_i)]^{-1} H^H(θ_p, f_i), defining the steering vector
H(θ_p, f_i) = [1, e^{-j2π·f_i·τ_2}, …, e^{-j2π·f_i·τ_K}]^T  (standard narrowband form, reconstructed from the definition of τ_k; the source gives it only as an equation image)
θ_p is the direction of arrival of the pth sound source, 1 ≤ p ≤ P, and P is the number of sound sources; θ_p = p·Δθ, where Δθ is the angular grid spacing (given in the source only as an equation image); τ_k is the time delay between the kth microphone and the first microphone, 1 ≤ k ≤ K; Γ_{f,p} Y_l(f_i) is the K×1 observation vector projected onto the subspace spanned by the steering vector H(θ_p, f_i);
When the training criterion is minimized, the deep neural network has converged.
As an improvement of the above system, the sound-source direction-finding module is implemented as follows:
after the deep neural network converges, its output vector is z = [z_1, z_2, …, z_P]^T, where z_p ∈ [0,1] is the posterior probability that the pth sound source is present;
a number of maxima are extracted from the posterior probabilities z_1, z_2, …, z_P;
a threshold δ is calculated:
δ = O_avg + μ(O_max − O_avg),
where O_avg and O_max are the average and maximum of the posterior probabilities and μ is an empirical parameter;
for each maximum greater than the threshold δ, the corresponding azimuth is one where a sound source may appear.
The invention has the advantages that:
the invention provides an underwater multi-sound-source direction finding method based on deep learning, which utilizes a deep neural network to search an optimal solution of the sound source direction of arrival through an optimization criterion function.
Drawings
Fig. 1 is a flowchart of an underwater multi-sound-source direction finding method based on deep learning according to embodiment 1 of the present invention.
Detailed Description
The invention will now be further described with reference to the accompanying drawings and specific embodiments.
Example 1:
As shown in fig. 1, embodiment 1 of the present invention provides a deep-learning-based multi-sound-source direction finding method, comprising the following steps:
Step 1) converting sound source signals received by an array into digital sound signals;
the array includes K sensors (microphones).
Step 2) carrying out Fourier transform on the digital sound signal;
step 3) inputting the signals after Fourier transform into a deep neural network;
A Fourier transform is applied to the array signal y(t), and the real and imaginary parts of the Fourier coefficients at every frequency are concatenated as the network input X:
Y(f_i) = FFT(y(t)),
X = [real(Y(f_1)), imag(Y(f_1)), real(Y(f_2)), imag(Y(f_2)), …, real(Y(f_F)), imag(Y(f_F))]  (concatenation order reconstructed from context; the source gives it only as an equation image)
wherein Y(f_i) is the Fourier coefficient at frequency f_i; i is the frequency index and F is the number of FFT points; real(Y(f_i)) and imag(Y(f_i)) are the real and imaginary parts of Y(f_i);
Step 4) the neural network is trained by back propagation until convergence;
the vector X is input to the deep neural network, whose training criterion function γ is:
[equation image: the training criterion γ — it accumulates the projections Γ_{f,p} Y_l(f_i) over frequencies i, candidate directions p, and snapshots l; its exact form is not recoverable from the source]
wherein Γ_{f,p} = H(θ_p, f_i) [H^H(θ_p, f_i) H(θ_p, f_i)]^{-1} H^H(θ_p, f_i), defining the steering vector
H(θ_p, f_i) = [1, e^{-j2π·f_i·τ_2}, …, e^{-j2π·f_i·τ_K}]^T  (standard narrowband form, reconstructed from the definition of τ_k; the source gives it only as an equation image)
θ_p is the direction of arrival of the pth sound source, 1 ≤ p ≤ P, and P is the number of sound sources; θ_p = p·Δθ, where Δθ is the angular grid spacing (given in the source only as an equation image); τ_k is the time delay between the kth microphone and the first microphone, 1 ≤ k ≤ K; Γ_{f,p} Y_l(f_i) is the K×1 observation vector projected onto the subspace spanned by the steering vector H(θ_p, f_i);
when the training criterion is minimized, the deep neural network has converged.
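A minimal back-propagation training loop in the spirit of step 4) might look like the sketch below; the network size, mean-squared-error loss, and plain gradient descent are illustrative stand-ins — the patent specifies only a deep neural network trained by back propagation under its own criterion function γ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the patent's setup: map feature vectors X to a
# P-dimensional posterior vector via one hidden layer.  All sizes,
# the MSE loss, and the target posteriors are assumptions.
D, Hdim, P, N = 16, 32, 8, 64
X = rng.standard_normal((N, D))
T = rng.random((N, P))                       # fake target posteriors in [0, 1]

W1 = rng.standard_normal((D, Hdim)) * 0.1; b1 = np.zeros(Hdim)
W2 = rng.standard_normal((Hdim, P)) * 0.1;  b2 = np.zeros(P)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def forward(X):
    h = np.tanh(X @ W1 + b1)                 # hidden layer
    return h, sigmoid(h @ W2 + b2)           # posteriors z in [0, 1]

losses = []
for _ in range(200):
    h, z = forward(X)
    losses.append(np.mean((z - T) ** 2))
    # back propagation of the MSE criterion
    dz = 2 * (z - T) / (N * P) * z * (1 - z)     # dLoss/d(pre-sigmoid)
    gW2, gb2 = h.T @ dz, dz.sum(0)
    dh = (dz @ W2.T) * (1 - h ** 2)              # through tanh
    gW1, gb1 = X.T @ dh, dh.sum(0)
    for param, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        param -= 0.5 * g                         # plain gradient descent

print(losses[0] > losses[-1])  # True — the criterion decreases toward convergence
```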
Step 5) the posterior probabilities output by the converged neural network are analyzed, and the azimuths corresponding to the peaks are the azimuths of the sound sources.
After the neural network converges, the extrema of the posterior probabilities mark the azimuths where sound sources appear. A threshold is computed as
δ = O_avg + μ(O_max − O_avg),
where O_avg and O_max are the average and maximum of the posterior probabilities and μ is set empirically. Each azimuth whose posterior probability exceeds δ is extracted as an azimuth where a sound source may appear.
When the criterion function of step 4) is minimized, the posterior probabilities output by the network reflect the probability of a sound source appearing at each azimuth; after convergence, the source bearings are therefore obtained by analyzing the peaks of the posterior probabilities.
Example 2:
embodiment 2 of the present invention provides a deep learning-based multi-sound-source direction finding system, which includes:
the signal conversion module is used for converting the sound source signals received by the array into digital sound signals; performing Fourier transform on the digital sound signal to obtain a signal after Fourier transform;
the specific implementation process of the signal conversion module is as follows:
A Fourier transform is applied to the array signal y(t), and the real and imaginary parts of the Fourier coefficients at every frequency are concatenated into a vector X:
Y(f_i) = FFT(y(t)),
X = [real(Y(f_1)), imag(Y(f_1)), real(Y(f_2)), imag(Y(f_2)), …, real(Y(f_F)), imag(Y(f_F))]  (concatenation order reconstructed from context; the source gives it only as an equation image)
wherein the array comprises K microphones; Y(f_i) is the Fourier coefficient at frequency f_i; i is the frequency index and F is the number of FFT points; real(Y(f_i)) and imag(Y(f_i)) are the real and imaginary parts of Y(f_i); Y(f_i) = [Y_1(f_i), Y_2(f_i), …, Y_L(f_i)], where L is the number of snapshots.
The deep neural network training module is used for inputting the signals subjected to Fourier transform into a deep neural network and training the deep neural network by using a back propagation method until the deep neural network converges;
the specific implementation process of the deep neural network training module is as follows:
the vector X is input to the deep neural network, whose training criterion function γ is:
[equation image: the training criterion γ — it accumulates the projections Γ_{f,p} Y_l(f_i) over frequencies i, candidate directions p, and snapshots l; its exact form is not recoverable from the source]
wherein Γ_{f,p} = H(θ_p, f_i) [H^H(θ_p, f_i) H(θ_p, f_i)]^{-1} H^H(θ_p, f_i), defining the steering vector
H(θ_p, f_i) = [1, e^{-j2π·f_i·τ_2}, …, e^{-j2π·f_i·τ_K}]^T  (standard narrowband form, reconstructed from the definition of τ_k; the source gives it only as an equation image)
θ_p is the direction of arrival of the pth sound source, 1 ≤ p ≤ P, and P is the number of sound sources; θ_p = p·Δθ, where Δθ is the angular grid spacing (given in the source only as an equation image); τ_k is the time delay between the kth microphone and the first microphone, 1 ≤ k ≤ K; Γ_{f,p} Y_l(f_i) is the K×1 observation vector projected onto the subspace spanned by the steering vector H(θ_p, f_i);
When the training criterion is minimized, the deep neural network has converged.
And the sound source direction finding module is used for finding the direction corresponding to the peak value from the posterior probability output by the converged deep neural network, wherein the direction is the direction in which the sound source appears.
The sound-source direction-finding module is implemented as follows:
after the deep neural network converges, its output vector is z = [z_1, z_2, …, z_P]^T, where z_p ∈ [0,1] is the posterior probability that the pth sound source is present;
a number of maxima are extracted from the posterior probabilities z_1, z_2, …, z_P;
a threshold δ is calculated:
δ = O_avg + μ(O_max − O_avg),
where O_avg and O_max are the average and maximum of the posterior probabilities and μ is an empirical parameter;
for each maximum greater than the threshold δ, the corresponding azimuth is one where a sound source may appear.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (9)

1. A deep learning based multi-source direction finding method, the method comprising:
converting the acoustic source signals received by the array into digital acoustic signals; performing Fourier transform on the digital sound signal to obtain a signal after Fourier transform;
inputting the signals after Fourier transform into a deep neural network, and training the deep neural network by using a back propagation method until the deep neural network is converged;
and finding the azimuth corresponding to the peak value from the posterior probability output by the converged deep neural network, wherein the azimuth is the azimuth of the sound source.
2. The deep-learning-based multi-sound-source direction finding method according to claim 1, wherein converting the sound-source signals received by the array into digital sound signals and performing a Fourier transform on the digital sound signals specifically comprises:
a Fourier transform is applied to the array signal y(t), and the real and imaginary parts of the Fourier coefficients at every frequency are concatenated into a vector X:
Y(f_i) = FFT(y(t)),
X = [real(Y(f_1)), imag(Y(f_1)), real(Y(f_2)), imag(Y(f_2)), …, real(Y(f_F)), imag(Y(f_F))]  (concatenation order reconstructed from context; the source gives it only as an equation image)
wherein the array comprises K microphones; Y(f_i) is the Fourier coefficient at frequency f_i; i is the frequency index and F is the number of FFT points; real(Y(f_i)) and imag(Y(f_i)) are the real and imaginary parts of Y(f_i); Y(f_i) = [Y_1(f_i), Y_2(f_i), …, Y_L(f_i)], where L is the number of snapshots.
3. The deep-learning-based multi-sound-source direction finding method according to claim 2, wherein inputting the Fourier-transformed signal into a deep neural network and training the deep neural network by back propagation until convergence specifically comprises:
the vector X is input to the deep neural network, whose training criterion function γ is:
[equation image: the training criterion γ — it accumulates the projections Γ_{f,p} Y_l(f_i) over frequencies i, candidate directions p, and snapshots l; its exact form is not recoverable from the source]
wherein Γ_{f,p} = H(θ_p, f_i) [H^H(θ_p, f_i) H(θ_p, f_i)]^{-1} H^H(θ_p, f_i), defining the steering vector
H(θ_p, f_i) = [1, e^{-j2π·f_i·τ_2}, …, e^{-j2π·f_i·τ_K}]^T  (standard narrowband form, reconstructed from the definition of τ_k; the source gives it only as an equation image)
θ_p is the direction of arrival of the pth sound source, 1 ≤ p ≤ P; P is the number of sound sources; τ_k is the time delay between the kth microphone and the first microphone, 1 ≤ k ≤ K; Γ_{f,p} Y_l(f_i) is the K×1 observation vector projected onto the subspace spanned by the steering vector H(θ_p, f_i);
when the training criterion is minimized, the deep neural network has converged.
4. The deep-learning-based multi-sound-source direction finding method according to claim 3, wherein the directions of arrival of the sound sources are θ_p = p·Δθ, where Δθ is the angular grid spacing (given in the source only as an equation image).
5. The deep-learning-based multi-sound-source direction finding method according to claim 3 or 4, wherein finding the azimuths corresponding to the peaks of the posterior probabilities output by the converged deep neural network, these being the azimuths at which sound sources appear, specifically comprises:
after the deep neural network converges, its output vector is z = [z_1, z_2, …, z_P]^T, where z_p ∈ [0,1] is the posterior probability that the pth sound source is present;
a number of maxima are extracted from the posterior probabilities z_1, z_2, …, z_P;
a threshold δ is calculated:
δ = O_avg + μ(O_max − O_avg),
where O_avg and O_max are the average and maximum of the posterior probabilities and μ is an empirical parameter;
for each maximum greater than the threshold δ, the corresponding azimuth is one where a sound source may appear.
6. A deep learning based multi-source direction finding system, the system comprising:
the signal conversion module is used for converting the sound source signals received by the array into digital sound signals; performing Fourier transform on the digital sound signal to obtain a signal after Fourier transform;
the deep neural network training module is used for inputting the signals subjected to Fourier transform into a deep neural network and training the deep neural network by using a back propagation method until the deep neural network converges;
and the sound source direction finding module is used for finding the direction corresponding to the peak value from the posterior probability output by the converged deep neural network, wherein the direction is the direction in which the sound source appears.
7. The deep-learning-based multi-sound-source direction finding system according to claim 6, wherein the signal conversion module is implemented as follows:
a Fourier transform is applied to the array signal y(t), and the real and imaginary parts of the Fourier coefficients at every frequency are concatenated into a vector X:
Y(f_i) = FFT(y(t)),
X = [real(Y(f_1)), imag(Y(f_1)), real(Y(f_2)), imag(Y(f_2)), …, real(Y(f_F)), imag(Y(f_F))]  (concatenation order reconstructed from context; the source gives it only as an equation image)
wherein the array comprises K microphones; Y(f_i) is the Fourier coefficient at frequency f_i; i is the frequency index and F is the number of FFT points; real(Y(f_i)) and imag(Y(f_i)) are the real and imaginary parts of Y(f_i); Y(f_i) = [Y_1(f_i), Y_2(f_i), …, Y_L(f_i)], where L is the number of snapshots.
8. The deep-learning-based multi-sound-source direction finding system according to claim 7, wherein the deep neural network training module is implemented as follows:
the vector X is input to the deep neural network, whose training criterion function γ is:
[equation image: the training criterion γ — it accumulates the projections Γ_{f,p} Y_l(f_i) over frequencies i, candidate directions p, and snapshots l; its exact form is not recoverable from the source]
wherein Γ_{f,p} = H(θ_p, f_i) [H^H(θ_p, f_i) H(θ_p, f_i)]^{-1} H^H(θ_p, f_i), defining the steering vector
H(θ_p, f_i) = [1, e^{-j2π·f_i·τ_2}, …, e^{-j2π·f_i·τ_K}]^T  (standard narrowband form, reconstructed from the definition of τ_k; the source gives it only as an equation image)
θ_p is the direction of arrival of the pth sound source, 1 ≤ p ≤ P, and P is the number of sound sources; θ_p = p·Δθ, where Δθ is the angular grid spacing (given in the source only as an equation image); τ_k is the time delay between the kth microphone and the first microphone, 1 ≤ k ≤ K; Γ_{f,p} Y_l(f_i) is the K×1 observation vector projected onto the subspace spanned by the steering vector H(θ_p, f_i);
when the training criterion is minimized, the deep neural network has converged.
9. The deep-learning-based multi-sound-source direction-finding system according to claim 8, wherein the sound-source direction-finding module is implemented as follows:
after the deep neural network converges, its output vector is z = [z_1, z_2, …, z_P]^T, where z_p ∈ [0,1] is the posterior probability that the pth sound source is present;
a number of maxima are extracted from the posterior probabilities z_1, z_2, …, z_P;
a threshold δ is calculated:
δ = O_avg + μ(O_max − O_avg),
where O_avg and O_max are the average and maximum of the posterior probabilities and μ is an empirical parameter;
for each maximum greater than the threshold δ, the corresponding azimuth is one where a sound source may appear.
CN201910661146.3A 2019-07-22 2019-07-22 Multi-sound source direction finding method and system based on deep learning Active CN112257484B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910661146.3A CN112257484B (en) 2019-07-22 2019-07-22 Multi-sound source direction finding method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910661146.3A CN112257484B (en) 2019-07-22 2019-07-22 Multi-sound source direction finding method and system based on deep learning

Publications (2)

Publication Number Publication Date
CN112257484A true CN112257484A (en) 2021-01-22
CN112257484B CN112257484B (en) 2024-03-15

Family

ID=74224714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910661146.3A Active CN112257484B (en) 2019-07-22 2019-07-22 Multi-sound source direction finding method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN112257484B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105204001A (en) * 2015-10-12 2015-12-30 Tcl集团股份有限公司 Sound source positioning method and system
CN105676168A (en) * 2015-12-02 2016-06-15 江苏科技大学 Acoustic vector array DOA estimation method
CN109975762A (en) * 2017-12-28 2019-07-05 中国科学院声学研究所 A kind of underwater sound source localization method
CN109031200A (en) * 2018-05-24 2018-12-18 华南理工大学 A kind of sound source dimensional orientation detection method based on deep learning
CN109839612A (en) * 2018-08-31 2019-06-04 大象声科(深圳)科技有限公司 Sounnd source direction estimation method based on time-frequency masking and deep neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAOQIONG HUANG et al.: "Source localization using deep neural networks in a shallow water environment", The Journal of the Acoustical Society of America, 2018, pages 2922-2932 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113835065A (en) * 2021-09-01 2021-12-24 深圳壹秘科技有限公司 Sound source direction determining method, device, equipment and medium based on deep learning
CN113835065B (en) * 2021-09-01 2024-05-17 深圳壹秘科技有限公司 Sound source direction determining method, device, equipment and medium based on deep learning
CN115902774A (en) * 2022-10-13 2023-04-04 广州成至智能机器科技有限公司 Unmanned aerial vehicle sound source positioning method and device, unmanned aerial vehicle and storage medium
CN115902774B (en) * 2022-10-13 2023-11-07 广州成至智能机器科技有限公司 Unmanned aerial vehicle sound source positioning method and device, unmanned aerial vehicle and storage medium

Also Published As

Publication number Publication date
CN112257484B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
CN109800700B (en) Underwater acoustic signal target classification and identification method based on deep learning
CN110610718B (en) Method and device for extracting expected sound source voice signal
CN112526451B (en) Compressed beam forming and system based on microphone array imaging
CN109427328B (en) Multichannel voice recognition method based on filter network acoustic model
CN109975762B (en) Underwater sound source positioning method
CN112257484B (en) Multi-sound source direction finding method and system based on deep learning
CN1251194A (en) Recognition system
CN110534126B (en) Sound source positioning and voice enhancement method and system based on fixed beam forming
CN112037813B (en) Voice extraction method for high-power target signal
CN109541526A (en) A kind of ring array direction estimation method using matrixing
CN111352075B (en) Underwater multi-sound-source positioning method and system based on deep learning
CN109581291B (en) Direct positioning method based on artificial bee colony
CN111446998B (en) Direction-of-arrival estimation method based on deep learning
CN106033120B (en) A kind of asynchronous multi-frame joint detection method of multistation radar
CN110927668A (en) Sound source positioning optimization method of cube microphone array based on particle swarm
CN108269581B (en) Double-microphone time delay difference estimation method based on frequency domain coherent function
CN116631438A (en) Width learning and secondary correlation sound source positioning method based on minimum p norm
CN108269583B (en) Voice separation method based on time delay histogram
Fu et al. Iterative sound source localization for unknown number of sources
CN111308423B (en) Robust sound source positioning system and method thereof
CN110673132B (en) Real-time filtering method for trace point sequence for multi-frame joint detection and tracking
CN110632579B (en) Iterative beam forming method using subarray beam domain characteristics
CN113223552A (en) Speech enhancement method, speech enhancement device, speech enhancement apparatus, storage medium, and program
CN111948608B (en) Underwater sound multipath signal arrival time difference estimation method based on sparse modeling
CN113628634B (en) Real-time voice separation method and device guided by directional information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant