CN113189571A - Sound source passive ranging method based on tone feature extraction and deep learning - Google Patents


Info

Publication number
CN113189571A
CN113189571A
Authority
CN
China
Prior art keywords
spectral
tone
neural network
deep neural
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010037014.6A
Other languages
Chinese (zh)
Other versions
CN113189571B (en)
Inventor
肖旭
倪海燕
王同
苏林
任群言
马力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS filed Critical Institute of Acoustics CAS
Priority to CN202010037014.6A priority Critical patent/CN113189571B/en
Publication of CN113189571A publication Critical patent/CN113189571A/en
Application granted granted Critical
Publication of CN113189571B publication Critical patent/CN113189571B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 11/00 Systems for determining distance or velocity not using reflection or reradiation
    • G01S 11/14 Systems for determining distance or velocity not using reflection or reradiation using ultrasonic, sonic, or infrasonic waves
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/30 Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a passive sound source ranging method based on timbre feature extraction and deep learning, comprising the following steps: extracting time-domain features, spectral features based on the short-time Fourier transform, auditory spectral features based on equivalent rectangular bandwidth, and harmonic spectral features based on a sinusoidal harmonic model from the real-time acoustic signal; extracting a set of timbre descriptors from each feature to form a 68-dimensional timbre descriptor vector; and inputting the 68-dimensional timbre descriptor vector into a pre-trained deep neural network, which outputs a probability distribution over candidate distances, the distance with the maximum probability being taken as the predicted value. The method achieves a ranging accuracy above 95% over the 1-10 km range, with a peak accuracy of 99.54%.

Description

Sound source passive ranging method based on tone feature extraction and deep learning
Technical Field
The invention relates to the field of underwater acoustics, and in particular to a passive sound source ranging method based on timbre feature extraction and deep learning.
Background
Passive sound source ranging is a core function of sonar systems and has long been a challenge for underwater acousticians. Because the ocean is a time-varying, space-varying, complex acoustic channel, traditional matched-field methods often suffer from environmental mismatch and excessive computational cost. In recent years, deep learning, as an emerging data-driven approach, has offered a new route to underwater passive ranging thanks to its strong feature-extraction capability and its advantages in handling complex, high-dimensional, nonlinear data.
The extraction and construction of features are key links in deep-learning-based passive localization of underwater targets. The timbre of an acoustic signal carries abundant information about the source and the underwater sound field; building a ranging model from timbre features extracted from underwater acoustic signals together with a deep neural network enables effective identification of the source distance.
Disclosure of Invention
The invention aims to overcome the above technical shortcomings and provides a method for passive sound source ranging based on timbre feature extraction and deep learning. The method extracts time-domain waveform features, time-domain envelope features, short-time Fourier transform (STFT)-based spectral features, auditory spectral features based on equivalent rectangular bandwidth, and harmonic spectral features based on a sinusoidal harmonic model from the acoustic signal using MATLAB; on this basis it extracts a complete set of timbre descriptors, takes them as the model input, and estimates the source distance with a deep neural network.
To achieve the above object, the present invention provides a passive sound source ranging method based on timbre feature extraction and deep learning, comprising:
extracting time-domain features, spectral features based on the short-time Fourier transform, auditory spectral features based on equivalent rectangular bandwidth, and harmonic spectral features based on a sinusoidal harmonic model from the real-time acoustic signal;
extracting a set of timbre descriptors from each feature to form a 68-dimensional timbre descriptor vector;
inputting the 68-dimensional timbre descriptor vector into a pre-trained deep neural network, which outputs a probability distribution over candidate distances; the distance with the maximum probability is taken as the predicted value.
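As a minimal sketch of the final readout step (the 200 output nodes and the 1-10 km target range come from the text below; the uniform spacing of candidate distances is an assumption for illustration), the maximum-probability decision can be written as:

```python
import numpy as np

def predict_distance(probs, r_min_km=1.0, r_max_km=10.0):
    """Map the network's softmax outputs to a distance estimate.

    probs: array of class probabilities, one per candidate distance.
    Returns the candidate distance whose probability is maximal.
    """
    probs = np.asarray(probs, dtype=float)
    # Candidate ranges assumed uniformly spaced over [r_min, r_max]
    distances = np.linspace(r_min_km, r_max_km, len(probs))
    return distances[np.argmax(probs)]

# Example: a distribution peaked at one of 200 distance bins
p = np.zeros(200)
p[100] = 1.0
print(predict_distance(p))
```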
As an improvement of the above method, the time-domain features include time-domain waveform features and time-domain envelope features; the timbre descriptors extracted from the time-domain features include attack time, decay time, release time, log-attack time, attack slope, decrease slope, temporal centroid, effective duration, frequency modulation, amplitude modulation, zero-crossing rate, and RMS energy envelope;
the timbre descriptors extracted from the spectral features based on the short-time Fourier transform include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy;
the timbre descriptors extracted from the auditory spectral features based on equivalent rectangular bandwidth include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy;
the timbre descriptors extracted from the harmonic spectral features based on a sinusoidal harmonic model include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy.
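Most of the listed spectral descriptors are standard shape statistics of a magnitude-spectrum frame. As an illustrative sketch (not the patent's own implementation), the spectral centroid and spectral spread of one frame can be computed as:

```python
import numpy as np

def spectral_centroid_spread(freqs, mag):
    """Spectral centroid (amplitude-weighted mean frequency) and spectral
    spread (amplitude-weighted standard deviation around the centroid)
    of a single magnitude-spectrum frame."""
    freqs = np.asarray(freqs, dtype=float)
    mag = np.asarray(mag, dtype=float)
    p = mag / mag.sum()                              # normalize to a distribution
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * p))
    return centroid, spread
```

The other moment-based descriptors (skewness, kurtosis) follow the same pattern with higher-order central moments of the normalized spectrum.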
As an improvement of the above method, the input layer of the deep neural network takes the 68-dimensional timbre descriptor vector as input;
the hidden layers of the deep neural network use the hyperbolic tangent activation function;
the output layer of the deep neural network uses 200 Softmax nodes, corresponding to the probability distribution over candidate distances.
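A minimal sketch of this architecture, assuming fully connected layers; the 68-dimensional input and 200-node softmax output are stated in the text, while the hidden width of 128 is an assumption for illustration:

```python
import numpy as np

def forward(x, weights):
    """Forward pass: tanh hidden layers, softmax output layer.
    `weights` is a list of (W, b) pairs, one per layer."""
    h = np.asarray(x, dtype=float)
    for W, b in weights[:-1]:
        h = np.tanh(h @ W + b)            # hyperbolic-tangent hidden layers
    W, b = weights[-1]
    logits = h @ W + b
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

# Shape check: 68-d input -> one 128-unit hidden layer -> 200-way softmax
rng = np.random.default_rng(0)
weights = [(rng.normal(0, 0.1, (68, 128)), np.zeros(128)),
           (rng.normal(0, 0.1, (128, 200)), np.zeros(200))]
p = forward(rng.normal(size=68), weights)
print(p.shape)
```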
As an improvement of the above method, the method further comprises training the deep neural network, specifically:
establishing a training set: the transmitted signal is a broadband signal s(t); a Pekeris waveguide serves as the environment model, and the target distance range is 1-10 km; keeping the transmitting conditions and water-column conditions fixed, the receiving range is the only variable in the simulated underwater acoustic environment, and the KRAKEN sound-field model is used to obtain the corresponding received signals y_i(t), i = 1, 2, ..., N, where N is the number of signals; white Gaussian noise n(t) is added to the received signals, with the SNR ranging from 1 to 10 dB;
computing a frame-wise feature sequence for each training-set sample, calculating each timbre descriptor to form a 68-dimensional timbre descriptor vector, and inputting the vectors into the deep neural network;
iteratively minimizing the loss function via the backpropagation algorithm: the mean square error (MSE) serves as the cost function, the model parameters are updated with the Adam algorithm, a dropout strategy regularizes the network parameters, and the initial weights are drawn from a truncated normal distribution with standard deviation 0.1.
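The truncated normal initialization can be sketched as follows; the text specifies only the standard deviation of 0.1, so the 2-standard-deviation truncation bound (the convention used by common deep-learning initializers) is an assumption:

```python
import numpy as np

def truncated_normal(shape, std=0.1, rng=None):
    """Draw initial weights from a truncated normal distribution:
    values farther than 2 standard deviations from zero are redrawn."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(0.0, std, size=shape)
    bad = np.abs(w) > 2 * std
    while bad.any():                      # redraw out-of-range values only
        w[bad] = rng.normal(0.0, std, size=int(bad.sum()))
        bad = np.abs(w) > 2 * std
    return w
```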
The invention also provides a passive sound source ranging system based on timbre feature extraction and deep learning, comprising a trained deep neural network, a timbre feature extraction module, a timbre descriptor calculation module, and a distance prediction module;
the timbre feature extraction module extracts time-domain features, spectral features based on the short-time Fourier transform, auditory spectral features based on equivalent rectangular bandwidth, and harmonic spectral features based on a sinusoidal harmonic model from the real-time acoustic signal;
the timbre descriptor calculation module extracts a set of timbre descriptors from each feature to form a 68-dimensional timbre descriptor vector;
the distance prediction module inputs the 68-dimensional timbre descriptor vector into the trained deep neural network, which outputs a probability distribution over candidate distances; the distance with the maximum probability is taken as the predicted value.
As an improvement of the above system, the time-domain features include time-domain waveform features and time-domain envelope features; the timbre descriptors extracted from the time-domain features include attack time, decay time, release time, log-attack time, attack slope, decrease slope, temporal centroid, effective duration, frequency modulation, amplitude modulation, zero-crossing rate, and RMS energy envelope;
the timbre descriptors extracted from the spectral features based on the short-time Fourier transform include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy;
the timbre descriptors extracted from the auditory spectral features based on equivalent rectangular bandwidth include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy;
the timbre descriptors extracted from the harmonic spectral features based on a sinusoidal harmonic model include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy.
As an improvement of the above system, the input layer of the deep neural network takes the 68-dimensional timbre descriptor vector as input;
the hidden layers of the deep neural network use the hyperbolic tangent activation function;
the output layer of the deep neural network uses 200 Softmax nodes, corresponding to the probability distribution over candidate distances.
As an improvement of the above system, the deep neural network is trained as follows:
establishing a training set: the transmitted signal is a broadband signal s(t); a Pekeris waveguide serves as the environment model, and the target distance range is 1-10 km; keeping the transmitting conditions and water-column conditions fixed, the receiving range is the only variable in the simulated underwater acoustic environment, and the KRAKEN sound-field model is used to obtain the corresponding received signals y_i(t), i = 1, 2, ..., N, where N is the number of signals; white Gaussian noise n(t) is added to the received signals, with the SNR ranging from 1 to 10 dB;
computing a frame-wise feature sequence for each training-set sample, calculating each timbre descriptor to form a 68-dimensional timbre descriptor vector, and inputting the vectors into the deep neural network;
iteratively minimizing the loss function via the backpropagation algorithm: the mean square error (MSE) serves as the cost function, the model parameters are updated with the Adam algorithm, a dropout strategy regularizes the network parameters, and the initial weights are drawn from a truncated normal distribution with standard deviation 0.1.
The invention has the following advantages:
1. the method achieves a ranging accuracy above 95% over the 1-10 km range, with a peak accuracy of 99.54%;
2. once trained, the model completes a ranging task within 0.1 s, enabling real-time operation;
3. the method builds its model from data, avoiding theoretical sound-field modeling of an unknown environment; this minimizes the influence of environmental mismatch and improves the model's generality. The multi-dimensional perceptual features constructed in the time and frequency domains capture abundant information about the source and the underwater sound field, benefiting learning efficiency and stability; and the trained model performs only lightweight computation at prediction time, facilitating real-time processing of data.
Drawings
FIG. 1 is a schematic diagram of environmental parameters of a KRAKEN sound field model;
FIG. 2 is a schematic diagram of a portion of an acoustic signal generated by the KRAKEN model;
FIG. 3 is a schematic diagram of a feature construction and tone feature extraction process;
FIG. 4 is a schematic view of a feature space;
FIG. 5 shows the ranging accuracy of data in the training set and the test set during the network training process;
FIG. 6 is a schematic diagram illustrating the influence of signal frequency and depth on the model ranging accuracy;
FIG. 7 is a MSE curve for signals of different bandwidths at a center frequency of 500 Hz;
FIG. 8 is a MSE curve for signals of different bandwidths at a center frequency of 1000 Hz;
FIG. 9 is a MSE curve for signals of different time lengths at a center frequency of 500 Hz;
FIG. 10 shows MSE curves for signals of different time lengths at a center frequency of 1000 Hz.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a passive sound source ranging method based on timbre feature extraction and deep learning, which proceeds in the following steps.
Step 1), KRAKEN sound-field model calculation:
The transmitted signal is a broadband signal s(t), and a typical Pekeris waveguide serves as the environment model (fig. 1); the target distance is 1-10 km. Keeping the transmitting conditions and water-column conditions fixed, the receiving range is taken as the only variable in the simulated underwater acoustic environment, and the KRAKEN sound-field model is used to compute the corresponding received signals y_i(t), i = 1, 2, ..., N. White Gaussian noise n(t) is added to the received signals, with SNR from 1 to 10 dB, and the signals are split in a fixed proportion into a training set and a test set for the neural network.
Step 2), timbre feature extraction:
Time-domain waveform features, time-domain envelope features, short-time Fourier transform (STFT)-based spectral features, auditory spectral features based on equivalent rectangular bandwidth, and harmonic spectral features based on a sinusoidal harmonic model are extracted from the acoustic signal; on this basis a 68-dimensional timbre descriptor vector is extracted for each acoustic sample and used as the model input. The feature extraction flow is shown in fig. 3.
Step 3), deep neural network:
The deep neural network iteratively minimizes the loss function via the backpropagation (BP) algorithm. The mean square error is used as the cost function, the Adam optimization algorithm trains the network, and dropout regularizes the network parameters to reduce overfitting.
In the experiments, simulation data were generated with the KRAKEN sound-field tool under Pekeris waveguide environment parameters; fig. 1 depicts the environment parameters used. The simulation data comprise continuous-wave (CW) signals at 50 Hz, 150 Hz, and 300 Hz and linear frequency-modulated (LFM) signals with center frequencies of 500 Hz, 1000 Hz, and 2000 Hz, bandwidths of 100-1000 Hz, and durations of 0.2-1.0 s. Gaussian noise was added to the simulated received signals (SNR from 1 dB to 10 dB). The receiving ranges are distributed over 1-10 km and the depths over 5-145 m; the training set accounts for 80% of the total sample set (16080 samples), and the remaining 20% (4020 samples) serve as the test set. Timbre features were extracted from the generated samples; statistics of the frame-wise feature sequences were computed, and the mean and variance were taken as input features. In total, 68-dimensional timbre descriptor vectors for 20100 samples were obtained as the input features of the deep neural network, as shown in fig. 3. The extracted feature space is shown in fig. 4, and the meaning of each feature is given in Table 1:
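The mean-and-variance pooling of the frame-wise descriptor sequences described above can be sketched as:

```python
import numpy as np

def frame_stats(feature_seq):
    """Collapse a frame-wise descriptor sequence (n_frames x n_descriptors)
    into per-descriptor mean and variance, concatenated into one vector,
    as the text describes for building the fixed-length network input."""
    f = np.asarray(feature_seq, dtype=float)
    return np.concatenate([f.mean(axis=0), f.var(axis=0)])
```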
TABLE 1 timbre characteristics
The network is trained with the Adam optimization algorithm, with an initial learning rate of 0.03 and the MSE cost function; dropout regularization disables 5% of the neurons in each iteration; the initial weights are drawn from a truncated normal distribution with standard deviation 0.1; the hidden layers use the hyperbolic tangent activation function; and the output layer uses 200 Softmax nodes corresponding to the probability distribution over distances. The number of training iterations is set to 20000.
For a transmitted CW signal (f = 150 Hz, z_s = 35 m), fig. 5 shows the prediction results after 20000 iterations: the prediction accuracy of the deep neural network on the test set reaches 99.54%, enabling effective identification of the sound source distance. The experimental results show that the estimation accuracy on the test set exceeds 95% under all tested conditions, with stable prediction performance, demonstrating that the method is effective.
Comparing training efficiency across different transmitted signals shows that the algorithm's performance is robust to waveform parameters and source depth (fig. 6), and that models for transient transmitted signals with small bandwidth and long duration train more efficiently (figs. 7, 8, 9, and 10).
The invention also provides a passive sound source ranging system based on timbre feature extraction and deep learning, comprising a trained deep neural network, a timbre feature extraction module, a timbre descriptor calculation module, and a distance prediction module;
the timbre feature extraction module extracts time-domain features, spectral features based on the short-time Fourier transform, auditory spectral features based on equivalent rectangular bandwidth, and harmonic spectral features based on a sinusoidal harmonic model from the real-time acoustic signal;
the timbre descriptor calculation module extracts a set of timbre descriptors from each feature to form a 68-dimensional timbre descriptor vector;
the distance prediction module inputs the 68-dimensional timbre descriptor vector into the trained deep neural network, which outputs a probability distribution over candidate distances; the distance with the maximum probability is taken as the predicted value.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the invention. Although the invention has been described in detail with reference to the embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. A passive sound source ranging method based on timbre feature extraction and deep learning, comprising the following steps:
extracting time-domain features, spectral features based on the short-time Fourier transform, auditory spectral features based on equivalent rectangular bandwidth, and harmonic spectral features based on a sinusoidal harmonic model from the real-time acoustic signal;
extracting a set of timbre descriptors from each feature to form a 68-dimensional timbre descriptor vector;
inputting the 68-dimensional timbre descriptor vector into a pre-trained deep neural network, which outputs a probability distribution over candidate distances; the distance with the maximum probability is taken as the predicted value.
2. The passive sound source ranging method based on timbre feature extraction and deep learning as claimed in claim 1, wherein the time-domain features comprise time-domain waveform features and time-domain envelope features; the timbre descriptors extracted from the time-domain features include attack time, decay time, release time, log-attack time, attack slope, decrease slope, temporal centroid, effective duration, frequency modulation, amplitude modulation, zero-crossing rate, and RMS energy envelope;
the timbre descriptors extracted from the spectral features based on the short-time Fourier transform include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy;
the timbre descriptors extracted from the auditory spectral features based on equivalent rectangular bandwidth include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy;
the timbre descriptors extracted from the harmonic spectral features based on a sinusoidal harmonic model include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy.
3. The passive sound source ranging method based on timbre feature extraction and deep learning as claimed in claim 2, wherein the input layer of the deep neural network takes the 68-dimensional timbre descriptor vector as input;
the hidden layers of the deep neural network use the hyperbolic tangent activation function;
the output layer of the deep neural network uses 200 Softmax nodes, corresponding to the probability distribution over candidate distances.
4. The passive sound source ranging method based on timbre feature extraction and deep learning as claimed in claim 3, further comprising training the deep neural network, specifically:
establishing a training set: the transmitted signal is a broadband signal s(t); a Pekeris waveguide serves as the environment model, and the target distance range is 1-10 km; keeping the transmitting conditions and water-column conditions fixed, the receiving range is the only variable in the simulated underwater acoustic environment, and the KRAKEN sound-field model is used to obtain the corresponding received signals y_i(t), i = 1, 2, ..., N, where N is the number of signals; white Gaussian noise n(t) is added to the received signals, with the SNR ranging from 1 to 10 dB;
computing a frame-wise feature sequence for each training-set sample, calculating each timbre descriptor to form a 68-dimensional timbre descriptor vector, and inputting the vectors into the deep neural network;
iteratively minimizing the loss function via the backpropagation algorithm: the mean square error (MSE) serves as the cost function, the model parameters are updated with the Adam algorithm, a dropout strategy regularizes the network parameters, and the initial weights are drawn from a truncated normal distribution with standard deviation 0.1.
5. A passive sound source ranging system based on timbre feature extraction and deep learning, comprising a trained deep neural network, a timbre feature extraction module, a timbre descriptor calculation module, and a distance prediction module;
the timbre feature extraction module extracts time-domain features, spectral features based on the short-time Fourier transform, auditory spectral features based on equivalent rectangular bandwidth, and harmonic spectral features based on a sinusoidal harmonic model from the real-time acoustic signal;
the timbre descriptor calculation module extracts a set of timbre descriptors from each feature to form a 68-dimensional timbre descriptor vector;
the distance prediction module inputs the 68-dimensional timbre descriptor vector into the trained deep neural network, which outputs a probability distribution over candidate distances; the distance with the maximum probability is taken as the predicted value.
6. The passive sound source ranging system based on timbre feature extraction and deep learning as claimed in claim 5, wherein the time-domain features comprise time-domain waveform features and time-domain envelope features; the timbre descriptors extracted from the time-domain features include attack time, decay time, release time, log-attack time, attack slope, decrease slope, temporal centroid, effective duration, frequency modulation, amplitude modulation, zero-crossing rate, and RMS energy envelope;
the timbre descriptors extracted from the spectral features based on the short-time Fourier transform include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy;
the timbre descriptors extracted from the auditory spectral features based on equivalent rectangular bandwidth include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy;
the timbre descriptors extracted from the harmonic spectral features based on a sinusoidal harmonic model include: spectral centroid, spectral spread, spectral skewness, spectral kurtosis, spectral slope, spectral decrease, spectral roll-off, spectral flux, and spectral energy.
7. The passive sound source ranging system based on timbre feature extraction and deep learning as claimed in claim 6, wherein the input layer of the deep neural network takes the 68-dimensional timbre descriptor vector as input;
the hidden layers of the deep neural network use the hyperbolic tangent activation function;
the output layer of the deep neural network uses 200 Softmax nodes, corresponding to the probability distribution over candidate distances.
8. The sound source passive ranging system based on timbre feature extraction and deep learning as claimed in claim 7, wherein the specific process of training the deep neural network is as follows:
establishing a training set: the transmitting signal adopts a broadband signal s (t), a Pekeris waveguide is used as an environment model, and the target distance range is 1-10 km; for the transmitting signal s (t), the transmitting condition and the water body condition are kept unchanged, the receiving distance is used as the only variable in the simulated underwater acoustic environment, and the KRAKEN sound field model is used for obtaining the corresponding receiving signal yi(t), i is 1,2 … N, and N is the number of signals; introducing Gaussian white noise n (t) into a received signal, wherein the substance range of the SNR is as follows: 1-10 dB;
each sample of the training set is divided into frames to obtain a feature sequence; each timbre descriptor is computed and assembled into a 68-dimensional timbre descriptor vector, which is input to the deep neural network;
the loss function is iteratively minimized through the back-propagation algorithm: the mean square error (MSE) is taken as the cost function, the model parameters are updated with the Adam algorithm, and a drop-out strategy is used to regularize the network parameters; the initial weights are generated from a truncated normal distribution with the standard deviation set to 0.1.
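The two update rules named in the claim, the Adam parameter update and drop-out regularization, can be sketched in NumPy as follows. The learning rate, moment coefficients, and the toy scalar demo are illustrative assumptions; the patent fixes only the use of Adam, the MSE cost, drop-out, and the 0.1 truncated-normal initialization:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.02, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, then the parameter step."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def dropout(h, rate, rng, train=True):
    """Inverted drop-out: randomly zero activations during training, rescale the rest."""
    if not train or rate == 0.0:
        return h
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)

# demo: Adam minimising the scalar MSE-style loss (w - 3)^2
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    g = 2.0 * (w - 3.0)          # gradient of the loss
    w, m, v = adam_step(w, g, m, v, t)
```

In training, `dropout` would be applied to the hidden activations only in the forward pass of the training phase, and disabled (`train=False`) at inference time.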
CN202010037014.6A 2020-01-14 2020-01-14 Sound source passive ranging method based on tone feature extraction and deep learning Active CN113189571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010037014.6A CN113189571B (en) 2020-01-14 2020-01-14 Sound source passive ranging method based on tone feature extraction and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010037014.6A CN113189571B (en) 2020-01-14 2020-01-14 Sound source passive ranging method based on tone feature extraction and deep learning

Publications (2)

Publication Number Publication Date
CN113189571A true CN113189571A (en) 2021-07-30
CN113189571B CN113189571B (en) 2023-04-07

Family

ID=76972469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010037014.6A Active CN113189571B (en) 2020-01-14 2020-01-14 Sound source passive ranging method based on tone feature extraction and deep learning

Country Status (1)

Country Link
CN (1) CN113189571B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226197A (en) * 2013-04-16 2013-07-31 哈尔滨工程大学 Underwater target echo classification method based on timbre parameter model
JP2017107141A (en) * 2015-12-09 2017-06-15 日本電信電話株式会社 Sound source information estimation device, sound source information estimation method and program
CN109975816A (en) * 2019-03-11 2019-07-05 武汉理工大学 A kind of sensor data fusion method of miniature underwater robot

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LUDWIG HOUÉGNIGAN et al.: "Machine and deep learning approaches to localization and range estimation of underwater acoustic sources", 《2017 IEEE/OES ACOUSTICS IN UNDERWATER GEOSCIENCES SYMPOSIUM (RIO ACOUSTICS)》 *
WENBO WANG et al.: "Deep transfer learning for source ranging: Deep-sea experiment results", 《THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA》 *
YI-NING LIU et al.: "Source Ranging Using Ensemble Convolutional Networks in the Direct Zone of Deep Water", 《CHIN. PHYS. LETT.》 *
NIU HAIQIANG et al.: "A review of machine learning methods for underwater acoustic passive localization", 《JOURNAL OF SIGNAL PROCESSING》 *
XIAO XU et al.: "Passive ranging of acoustic sources based on multi-domain feature extraction and deep learning", 《JOURNAL OF APPLIED ACOUSTICS》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116911817A (en) * 2023-09-08 2023-10-20 浙江智加信息科技有限公司 Paperless conference record archiving method and paperless conference record archiving system
CN116911817B (en) * 2023-09-08 2023-12-01 浙江智加信息科技有限公司 Paperless conference record archiving method and paperless conference record archiving system

Also Published As

Publication number Publication date
CN113189571B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN110133599B (en) Intelligent radar radiation source signal classification method based on long-time and short-time memory model
WO2020124902A1 (en) Supervised learning auditory attention-based voice extraction method and system, and apparatuses
CN109212526A (en) Distributive array target angle measurement method for high-frequency ground wave radar
CN111239692B (en) PRI (pulse repetition index) combined intra-pulse information radiation source signal identification method based on deep learning
US10902832B2 (en) Timbre fitting method and system based on time-varying multi-segment spectrum
CN111368892A (en) Generalized S transformation and SVM electric energy quality disturbance efficient identification method
CN114201987A (en) Active interference identification method based on self-adaptive identification network
CN113189571B (en) Sound source passive ranging method based on tone feature extraction and deep learning
CN112560342A (en) DNN-based atmospheric waveguide parameter estimation method
CN112036239A (en) Radar signal working mode identification method and system based on deep learning network
CN106559146A (en) A kind of signal generator and signal generating method
CN112086100A (en) Quantization error entropy based urban noise identification method of multilayer random neural network
Cai et al. Modulation recognition of radar signal based on an improved CNN model
CN113111786A (en) Underwater target identification method based on small sample training image convolutional network
CN111289991B (en) Multi-scene-based laser ranging method and device
CN112785052A (en) Wind speed and wind direction prediction method based on particle filter algorithm
CN107765259A (en) A kind of transmission line of electricity laser ranging Signal denoising algorithm that threshold value is improved based on Lifting Wavelet
CN116826735A (en) Broadband oscillation identification method and device for new energy station
CN115932773A (en) Target angle detection method, device, equipment and medium based on spectrum shape characteristics
CN115632970A (en) Method, device and storage medium for estimating communication interference signal bandwidth under non-Gaussian noise
CN115598714A (en) Time-space coupling neural network-based ground penetrating radar electromagnetic wave impedance inversion method
CN115204237A (en) Swin-transform-based short wave protocol signal automatic identification method
CN113109795B (en) Deep sea direct sound zone target depth estimation method based on deep neural network
CN112016684B (en) Electric power terminal fingerprint identification method of deep parallel flexible transmission network
CN114358046A (en) Multi-complexity-level complex electromagnetic interference environment simulation generation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant