CN110456332B - Underwater acoustic signal enhancement method based on automatic encoder - Google Patents

Underwater acoustic signal enhancement method based on automatic encoder

Publication number: CN110456332B (granted; published earlier as application CN110456332A)
Application number: CN201910738375.0A
Authority: CN (China)
Legal status: Active
Original language: Chinese (zh)
Prior art keywords: signal, encoder, convolution, layer, automatic encoder
Inventors: 李理, 罗五雄, 殷敬伟, 郭龙祥, 于雪松, 顾师嘉, 韩笑
Original and current assignee: Harbin Engineering University

Classifications

    • G01S7/539 Details of sonar systems using analysis of echo signal for target characterisation; target signature; target cross-section
    • G06N3/045 Neural network architectures: combinations of networks
    • G06N3/084 Neural network learning methods: backpropagation, e.g. using gradient descent

Abstract

The invention discloses an underwater acoustic signal enhancement method based on an automatic encoder, belonging to the field of underwater acoustic signal processing. To address the difficulty of extracting echo-signal features in active sonar, the invention designs an automatic encoder that combines a noise-reduction (denoising) automatic encoder with a convolutional denoising automatic encoder. First, the noisy signal is preprocessed using the denoising automatic encoder's strength in reducing noise over the whole signal; then the convolutional denoising automatic encoder's optimization of local signal features is used to perform local noise reduction, realizing signal enhancement. The method can take the time-domain waveform of the received signal directly as the feature input, preserving the amplitude and phase characteristics of the signal. Experimental results show that the invention not only effectively reduces the noise component in the signal but also achieves good recovery in both the time and frequency domains.

Description

Underwater acoustic signal enhancement method based on automatic encoder
Technical Field
The invention relates to an underwater sound signal enhancement method, in particular to an underwater sound signal enhancement method based on a deep learning technology, and belongs to the field of underwater sound signal processing.
Background
In signal processing, noise makes long-range detection and weak-signal processing very difficult; noise reduction has therefore troubled researchers for a long time and is an essential step for effective signal analysis. Conventional methods reduce noise by filtering: with a linear filter, the noise in periodic and quasi-periodic signals can in principle be removed almost completely, given a sufficiently long time series, by exploiting the distribution of the signal in the frequency domain. For noise generated by a nonlinear system, however, both signal and noise appear as broadband continuous spectra, so the filtering effect of conventional methods degrades sharply, and new noise-reduction methods suited to nonlinear signals need to be explored.
With the development of computer technology, neural-network algorithms have been widely applied. Deep learning, a multilayer neural-network learning framework proposed by Hinton G et al. in 2006, generally uses only a simple nonlinear transformation in each hidden layer, yet the composition of many such layers yields highly nonlinear mappings; deep learning therefore has strong feature-learning ability and can discover the internal regularities of data. Since it was proposed, deep learning has attracted wide attention from scholars at home and abroad; it has developed continuously in theory and algorithms and is increasingly applied in practice, for example in image recognition, image denoising, speech signal processing, and human-brain simulation.
Disclosure of Invention
In order to overcome various defects in the traditional method, the invention provides a method for enhancing underwater sound signals based on a deep learning technology, which has a better noise reduction effect.
In order to solve the technical problem, the technical scheme of the invention is as follows:
an underwater sound signal enhancement method based on an automatic encoder comprises the following steps:
(1) constructing a neural network model of a noise-reduction (denoising) automatic encoder and a neural network model based on a convolutional automatic encoder; the denoising automatic encoder takes a noisy signal as input, and its output serves as the input signal of the convolutional automatic encoder; the convolutional automatic encoder has a symmetrical structure comprising an encoding part and a decoding part, where the encoding part encodes the input signal and compresses the feature information into a low-dimensional space, and the decoding part decompresses the low-dimensional feature information into a clean signal;
(2) simulating a series of echo signals which can be received according to various parameters of the linear frequency modulation signals transmitted by the active sonar, and generating signal pairs corresponding to the signals with noise and the clean signals;
(3) randomly dividing the data sample into a training sample set and a testing sample set;
(4) pre-training the parameters of the denoising automatic encoder's neural network model with the training sample set until the training-set loss function meets the preset criterion;
(5) using the output of the denoising automatic encoder as the input of the convolutional automatic encoder, and the clean signal as the target output of the convolutional automatic encoder, pre-training the parameters of the convolutional automatic encoder's neural network model until the training-set loss function meets the preset criterion;
(6) taking the noisy signal as the input of the whole neural network and the clean signal as its target output, jointly tuning and optimizing the encoder with the data set until the training-set loss function meets the preset criterion;
(7) with the encoder parameters thus set, the network model parameters are obtained; the sampled underwater acoustic signal is then fed as input to the combined denoising-plus-convolutional automatic encoder network, and the noise-reduced, enhanced signal is obtained.
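The two-stage training schedule in steps (4) to (6) can be sketched with a toy numpy model. This is not the patented network: simple linear one-hidden-layer autoencoders stand in for the DAE and CDAE, a single signal pair stands in for the training set, and the final joint fine-tuning pass of step (6) is omitted. It only illustrates the pipeline of pre-training the first encoder, feeding its output to the second, and keeping the clean signal as the target throughout.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyAE:
    """Minimal linear autoencoder standing in for the DAE / CDAE blocks."""
    def __init__(self, n_in, n_hid, rng):
        self.W1 = rng.normal(0.0, 0.1, (n_hid, n_in))   # encoder weights
        self.W2 = rng.normal(0.0, 0.1, (n_in, n_hid))   # decoder weights

    def forward(self, x):
        return self.W2 @ (self.W1 @ x)

    def step(self, x, target, lr=5e-3):
        h = self.W1 @ x
        e = self.W2 @ h - target            # reconstruction error
        gW2 = np.outer(e, h)                # grad of 0.5*||e||^2 w.r.t. W2
        gW1 = np.outer(self.W2.T @ e, x)    # grad w.r.t. W1 (uses pre-update W2)
        self.W2 -= lr * gW2
        self.W1 -= lr * gW1
        return 0.5 * float(e @ e)

n = 32
clean = np.sin(np.linspace(0.0, 4.0 * np.pi, n))   # stand-in clean echo
noisy = clean + 0.3 * rng.normal(size=n)           # stand-in noisy echo

dae, cdae = TinyAE(n, 8, rng), TinyAE(n, 8, rng)
loss0 = dae.step(noisy, clean)                     # loss before any real training
for _ in range(500):                               # phase 1: pre-train the DAE
    dae.step(noisy, clean)
z = dae.forward(noisy)                             # phase 2: DAE output feeds the CDAE,
for _ in range(500):                               #          target is still the clean signal
    cdae.step(z, clean)
out = cdae.forward(dae.forward(noisy))             # (phase 3, joint fine-tuning, omitted)
loss1 = 0.5 * float((out - clean) @ (out - clean))
```

After the two pre-training phases the chained reconstruction error is far below the untrained error, which is the behavior the staged schedule relies on.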
Further, the signal pair formed by the noisy signal and the clean signal in the step (2) is a time domain waveform, the input and the output of the noise reduction automatic encoder in the step (4) are the noisy signal and the clean signal, after the noise reduction automatic encoder is pre-trained, the network output corresponding to the noisy signal is used as the input of the convolution automatic encoder, and the clean signal is used as the output of the convolution automatic encoder.
Further, in step (6), when the loss function of the noise reduction automatic encoder is smaller than the preset threshold T and the loss function of the convolution automatic encoder is smaller than the preset threshold P, the noisy signal is used as the input of the joint encoder, and the clean signal is used as the output of the joint encoder, so as to perform joint tuning on the entire encoder network.
Furthermore, the number of the neurons of each layer of the coding part of the noise reduction automatic coder is reduced along with the increasing of the number of the layers, and the number of the neurons of each layer of the decoding part is increased along with the increasing of the number of the layers until the number of the neurons is equal to the number of the sampling points of the clean signal.
Further, the network model of the convolutional automatic encoder comprises an input layer, an output layer and a plurality of convolutional layers, the number of the convolutional layers is more than or equal to 3, and the number of channels of each convolutional layer is more than or equal to 30; in the coding part, a pooling layer is added behind each convolution layer, the pooling layer is the maximum pooling layer, and the number of channels of the pooling layer is equal to that of the previous convolution layer; in the decoding part, an up-sampling layer is added after each convolution layer, and the number of channels of the up-sampling layer is equal to that of the previous convolution layer.
Further, the received echo signals are simulated in the step (2) according to the following formula:
r0(t) = A0*x(t) + Σ_{i=1}^{L} Ai*x(t - τi) + n(t)

where r0(t) is the received echo signal, x(t) is the transmitted signal, and n(t) is the Gaussian noise interference at the receiver; the first term on the right-hand side is the direct sound and the second term is the multipath signal; the parameter L is the number of eigenrays passing through the receiving point; Ai and τi are the signal amplitude of the i-th path at the receiving point and its time delay relative to the direct sound; A0 is the direct-sound amplitude.
The invention has the following beneficial effects. By exploiting the data-driven learning ability of deep learning, it addresses the difficulty of extracting echo signals caused by channel variation in the ocean and by environmental noise, and the strong nonlinear mapping ability of the neural network reduces the nonlinear noise in the signal. The time-domain waveform of the signal is used directly as the input and output features of the neural network, with no additional transformation or feature extraction, saving considerable manual effort. The training data set is generated with the parameters of the active-sonar transmitted signal known, which both matches real active-sonar detection conditions and gives the network strong robustness: hand-crafted features often lack robustness because the ocean channel is time-varying and space-varying, whereas deep learning, as a data-driven algorithm, automatically extracts robust features from the available simulation data and can reach a high level of noise reduction.
Drawings
FIG. 1 is a block diagram of the algorithm of the present invention;
FIG. 2 is a schematic diagram of a neuron calculation process of a basic calculation unit in a deep learning convolutional neural network;
FIG. 3 is a schematic diagram of a pooling layer operation process in a deep learning convolutional neural network;
FIG. 4(a) is a schematic diagram of a fully connected layer without dropout;
FIG. 4(b) is a schematic diagram of a fully connected layer via dropout;
FIG. 5 is a schematic diagram of a neural network model of a noise reduction auto-encoder;
FIG. 6 is a schematic diagram of a neural network model of a convolutional auto-encoder;
FIG. 7(a) is a graph showing the variation of the noise reduction effect and the number of network layers of the noise reduction autoencoder;
FIG. 7(b) is a graph of the variation of noise reduction effect with the minimum dimension of compression of the noise reduction auto-encoder;
FIG. 7(c) is a graph of noise reduction versus iteration number;
FIG. 7(d) is a graph of noise reduction effect versus the use of a noise reduction auto-encoder alone and a convolution auto-encoder in combination;
FIG. 8(a) is a time domain waveform diagram of a clean signal;
FIG. 8(b) is a time domain waveform of a noisy signal with a signal-to-noise ratio of-10 dB;
FIG. 8(c) is a time domain waveform diagram of the network output after training with the noise reduction autoencoder alone;
FIG. 8(d) is a time domain waveform diagram of the network output after training of the joint convolution automatic encoder;
FIG. 9(a) is a time-frequency distribution graph of a noisy signal with a signal-to-noise ratio of-10 dB;
FIG. 9(b) is a time-frequency distribution diagram of the network output after training of the joint convolution automatic encoder.
Detailed Description
The present invention is further illustrated by the following examples, which are intended to be purely exemplary and are not intended to limit the scope of the invention, as various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the scope of the appended claims.
The embodiment of the invention discloses an underwater sound signal enhancement method based on an automatic encoder, which mainly comprises the following steps:
(1) A regression automatic-encoder neural network model with equal numbers of input and output neurons is constructed; the framework of the network is shown in fig. 1, and training uses a joint automatic encoder (DAE + CDAE) combining a denoising automatic encoder (DAE) and a convolutional denoising automatic encoder (CDAE). In the pre-training stage, noise is added to the training set (train_clean) to form the noisy set (train_noise); train_noise is taken as the input of the DAE network and train_clean as its target, and the DAE is trained by back-propagation optimization. Feeding train_noise and test_noise through the trained DAE yields a new training set (train_1) and a new test set (test_1), which ends the pre-training stage. train_1 is then taken as the input of the CDAE and train_clean as its target, and the CDAE is trained by back-propagation tuning; after training is complete, test_1 is used for the network test to obtain the final denoised signal, completing the whole training process. The specific denoising automatic encoder is shown in fig. 5 and the specific convolutional automatic encoder in fig. 6; during encoding, a pooling layer, specifically a max-pooling layer, is added after each convolution layer.
(2) Under the condition that parameters of a linear frequency modulation signal emitted by an active sonar are known, simulating a series of echo signals which can be received, generating a signal pair corresponding to a noisy signal and a clean signal, and converting the signal pair into an input form of a noise reduction automatic encoder after preprocessing; specifically, an echo signal and a noisy echo signal are simulated, the simulated echo signal and the simulated noisy signal are normalized and subjected to data division, and each M sampling points are made to serve as a sample pair of an original data sample;
(3) randomly dividing all data sample pairs into a training sample set and a testing sample set;
(4) and training the neural network of the noise reduction automatic encoder by using a training sample set, wherein the method is implemented by using a back propagation algorithm. Judging whether the loss function of the neural network of the noise reduction automatic encoder reaches a preset index (the loss function is less than or equal to a preset threshold value T), if so, keeping the parameters of the network, switching the input of the neural network of the noise reduction automatic encoder into a test data sample set, and if not, continuing to train the neural network of the noise reduction automatic encoder by using the training sample set on the basis of the original network model parameters, thereby continuously updating the parameters of the network. Judging a loss function again, and repeating the steps circularly until the loss function reaches an index;
(5) and inputting the noisy samples of the training set into the network of the trained noise reduction automatic encoder, taking the output of the network as the input of a convolution automatic encoder, and converting the input and the output into the input form of a convolution neural network.
(6) The convolutional autoencoder neural network is trained using the above data set by a back-propagation algorithm. Judging whether the loss function of the neural network of the convolutional automatic encoder reaches a preset index, if so, keeping the parameters of the network, switching the input of the neural network of the convolutional automatic encoder into a test data sample set, and if not, continuing to train the neural network of the convolutional automatic encoder by using the training sample set on the basis of the original network model parameters, thereby continuously updating the parameters of the network. Judging a loss function, and circulating until the loss function reaches an index; finally, taking the signal with noise as the input of the whole neural network, taking the clean signal as the output of the whole neural network, and adjusting the encoder until the loss function of the training set sample reaches the index (the loss function is less than or equal to a preset threshold value Q);
(7) and finishing the setting of the parameters of the encoder, obtaining network model parameters, sampling the underwater acoustic signals to be used as the input of the integral neural network of the combined noise reduction automatic encoder and the convolution automatic encoder, namely outputting the enhanced signals after noise reduction, converting the output of the convolution automatic encoder into a normal time domain waveform form, and displaying an output result.
In the above step, the signal pair formed by the noisy signal and the clean signal is a time domain waveform, the input and output of the noise reduction automatic encoder in the step (4) are the noisy signal and the clean signal, after the noise reduction automatic encoder is pre-trained, the network output corresponding to the noisy signal is used as the input of the convolution automatic encoder, and the clean signal is used as the output of the convolution automatic encoder.
In step (6), when the loss function of the noise reduction automatic encoder is smaller than T and the loss function of the convolution automatic encoder is smaller than P, the noisy signal is used as the input of the joint encoder, the clean signal is used as the output of the joint encoder, and the joint tuning is performed on the whole encoder network.
The number of neurons of each layer of the coding part of the noise reduction automatic coder is reduced along with the increasing of the number of layers, and the number of neurons of each layer of the decoding part is increased along with the increasing of the number of layers until the number of the neurons is equal to the number of sampling points of a clean signal.
The deep convolution automatic encoder network model comprises an input layer, an output layer and a plurality of convolution layers, wherein the number of the convolution layers is more than or equal to 3, and the number of channels of each convolution layer is more than or equal to 30; in the coding part, a pooling layer is added behind each convolution layer, the pooling layer is the maximum pooling layer, and the number of channels of the pooling layer is equal to that of the previous convolution layer; in the decoding part, an up-sampling layer is added after each convolution layer, and the number of channels of the up-sampling layer is equal to that of the previous convolution layer.
The following is a detailed explanation of related concepts and calculation processes related to the encoder according to the embodiments of the present invention.
A. Neurons and their calculation
As shown in fig. 2, a neuron is the basic operational unit in a neural network: a mathematical model built by imitating certain characteristics of biological nerve cells. A neuron takes several weighted inputs and combines them linearly; the sum is compared with a threshold that is a property of the neuron itself, which then stimulates the neuron's response output, and the response is generally a nonlinear function of the linear combination of the inputs. The neuron computation is therefore modeled as:

u = Σ_{i=1}^{n} wi*xi + b

y = f(u)

where xi is each input of the neuron, wi is the corresponding weight, f(·) is the nonlinear activation function, n is the number of neurons in the previous layer, and b is the bias.
Commonly used nonlinear activation functions include the sigmoid function f(u) = 1/(1 + e^(-u)), the hyperbolic tangent f(u) = tanh(u), and the rectified linear unit f(u) = max(0, u). The nonlinear activation function in the network model of the present embodiment adopts the second kind.
B. Neural network
The weights and thresholds of the neural network are generally obtained by a back propagation algorithm, and the calculation process related to the back propagation algorithm is described later.
C. Pooling layer treatment
As shown in fig. 3, because a layer with many neurons has many parameters to adjust, a pooling operation can be applied to a layer, i.e. the input data is down-sampled; this not only reduces the amount of computation but also improves generalization. As illustrated, several numbers within the pooling range are merged into one; the most common variants are max pooling, average pooling, and sum pooling. The pooling layer of this embodiment's network model uses max pooling, i.e. the pooled result is the maximum value within the pooling range. If the shape of the input data and the shape of the pooling range do not divide evenly, the input can be zero-padded appropriately before pooling.
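The max-pooling-with-zero-padding behavior described above can be sketched in a few lines of numpy; the function name and pooling size are illustrative, not from the patent:

```python
import numpy as np

def max_pool_1d(x, size):
    """Max-pool a 1-D array, zero-padding the tail when len(x) is not a multiple of size."""
    x = np.asarray(x, dtype=float)
    pad = (-len(x)) % size                       # how many zeros to append
    x = np.concatenate([x, np.zeros(pad)])
    return x.reshape(-1, size).max(axis=1)       # one max per pooling window

v = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0])
pooled = max_pool_1d(v, 2)   # windows (3,1), (4,1), (5,9), (2,0-pad)
```

Note that zero-padding is harmless for the non-negative values here; for signed data a different padding value might be preferable, a detail the patent does not address.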
D. dropout processing
As shown in fig. 4(a) and 4(b), to further improve the generalization ability of the model, Hinton proposed that during training some neurons can be dropped with a certain probability: the parameters connected to those neurons are not updated in that training pass and keep their values from the previous update, while in the final test stage all neurons participate in the computation.
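A hedged numpy sketch of dropout follows. It uses the common "inverted dropout" formulation, rescaling the surviving activations by 1/(1-p) during training so that the test-time forward pass needs no change; the patent does not specify this particular variant, so it is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(a, p, training, rng):
    """Inverted dropout: zero each unit with probability p during training,
    rescale survivors by 1/(1-p); identity at test time."""
    if not training:
        return a
    mask = rng.random(a.shape) >= p   # True = neuron kept this pass
    return a * mask / (1.0 - p)

a = np.ones(10000)
train_out = dropout(a, 0.5, True, rng)    # roughly half zeroed, rest doubled
test_out = dropout(a, 0.5, False, rng)    # unchanged
```

The rescaling keeps the expected activation equal between training and testing, which is the property that lets all neurons participate at test time without a separate correction.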
E. Back propagation algorithm
As shown in fig. 5, the DAE is an improved network structure based on the automatic encoder (AE) that performs data compression; by learning from sample data with a neural network, the compression and decompression functions are obtained automatically. The main idea of the DAE is to train an automatic encoder in which random noise is added artificially to the input layer while the output layer reconstructs the original input; the trained encoder model can then compress and decompress input data, achieving noise reduction in the process and producing better features for subsequent detection tasks. The basic network structure of a DAE with dropout is shown, where x is the original signal, x̃ is the noisy signal, y is the hidden layer, and x̂ is the output layer. Since the desired result is that the reconstruction x̂ be as close to x as possible, a squared reconstruction error function, i.e. the loss function of the DAE, is constructed and minimized to obtain the optimized parameters:

L(W, b) = (1/2) Σ_{i=1}^{N} (x̂i - xi)²

W_opt = W - α·∂L/∂W,  b_opt = b - α·∂L/∂b

where N is the number of output-layer neurons, and W, W_opt, b, b_opt are respectively the network weights before and after updating and the network biases before and after updating.
The weight parameters of the whole network are then updated by stochastic gradient descent to find the optimal solution of the objective function. The denoising automatic-encoder algorithm first initializes the weights and biases, then iteratively updates them toward the optimal solution; the specific steps are as follows:

(1) Random initialization: for all l, set ΔW^(l) = 0 and Δb^(l) = 0;
(2) Iterate for i = 1 to m:
a) use the BP algorithm to compute ∇_{W^(l)} J(W, b; x^(i), y^(i)) and ∇_{b^(l)} J(W, b; x^(i), y^(i));
b) accumulate ΔW^(l) := ΔW^(l) + ∇_{W^(l)} J(W, b; x^(i), y^(i));
c) accumulate Δb^(l) := Δb^(l) + ∇_{b^(l)} J(W, b; x^(i), y^(i));
(3) Update the weight parameters:
W^(l) := W^(l) - α[(1/m)·ΔW^(l) + λ·W^(l)]
b^(l) := b^(l) - α·(1/m)·Δb^(l)

where α is the learning rate and λ is a fixed parameter of the network. Combining this with the partial-derivative calculations for the fully connected layer and each other layer, and proceeding layer by layer according to the schematics of fig. 5 and fig. 6, the partial derivatives of each layer's parameters are derived from those of the following layer, yielding the partial derivatives of the whole deep convolutional network model; finally the parameters are updated along the negative gradient direction in parameter space, giving the back-propagation algorithm.
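The parameter-update rule in step (3) can be written directly as a small function; the argument names and the numbers in the worked example are illustrative:

```python
import numpy as np

def sgd_update(W, b, dW_sum, db_sum, m, alpha, lam):
    """Step (3) of the algorithm:
    W := W - alpha * ((1/m)*dW_sum + lam*W)   (weight decay term lam*W)
    b := b - alpha * (1/m)*db_sum             (no decay on the bias)"""
    W_new = W - alpha * (dW_sum / m + lam * W)
    b_new = b - alpha * (db_sum / m)
    return W_new, b_new

# tiny worked example: one weight, one bias, accumulated gradients over m = 2 samples
W_new, b_new = sgd_update(np.array([[1.0]]), 0.5,
                          np.array([[2.0]]), 1.0,
                          m=2, alpha=0.1, lam=0.01)
# W_new = 1 - 0.1*(2/2 + 0.01*1) = 0.899 ; b_new = 0.5 - 0.1*(1/2) = 0.45
```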
As shown in fig. 6, the CDAE mainly consists of two parts, encoding (encoder) and decoding (decoder), and the overall structure is optimized by hierarchical training. The CDAE network presents a symmetrical structure, the structure encodes input signals in the first two convolutional layers, compresses characteristic information to a low-dimensional space, and decodes the hidden layer in the last two convolutional layers, so that the low-dimensional characteristic information is decompressed into a clean signal.
The input of each convolution layer is 3-dimensional feature data, and 2-dimensional operators are used for the filters, pooling, and upsampling; in each convolution layer, the CDAE maps the input information to more abstract and robust features through learned denoising transformations. The encoder part of the CDAE consists of several convolution layers, each built from a set of filters that extract features from its input layer, an activation layer, which in the present invention is a rectified linear unit applying a nonlinearity to the feature map, and a pooling layer. For the pooling layer, the pooling function is max pooling (Max-pooling), which downsamples the activation layer by keeping the maximum value within each spatial range, generating a new feature map of reduced dimension. The decoder part consists of convolution layers, activation layers, and up-sampling layers (Up-sampling); each up-sampling layer generates a new higher-dimensional layer by up-sampling the preceding activation layer.
The CDAE loss function and the overall network loss function are defined, and the network updated, in the same way as for the DAE, and are not repeated here.
During the experiments it must be considered that the underwater acoustic channel is an inhomogeneous, doubly bounded, randomly inhomogeneous medium channel: a signal transmitted through it suffers severe time variation, space variation, and multipath spreading, distorting the received waveform. Because of this multipath spreading, the transmitted signal reaches the receiving hydrophone along sound rays of different paths, and the final received signal is the interference superposition of the signals carried by those rays. Let the transmitted signal be x(t) and the Gaussian noise interference at the receiver be n(t); the received signal r0(t) is then:

r0(t) = A0*x(t) + Σ_{i=1}^{L} Ai*x(t - τi) + n(t)

where the first term on the right-hand side is the direct sound and the second term is the multipath signal; the parameter L is the number of eigenrays passing through the receiving point; Ai and τi are the signal amplitude of the i-th path at the receiving point and its time delay relative to the direct sound; A0 is the direct-sound amplitude.
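The received-signal model above, direct sound plus delayed and attenuated multipath copies plus Gaussian noise, is straightforward to simulate in numpy. Delays are expressed in whole samples here for simplicity, and the impulse test signal is an illustration, not the patent's LFM pulse:

```python
import numpy as np

rng = np.random.default_rng(0)

def received_echo(x, A0, amps, delays, noise_std, rng):
    """r0(t) = A0*x(t) + sum_i Ai*x(t - tau_i) + n(t); delays tau_i in whole samples."""
    x = np.asarray(x, dtype=float)
    r = A0 * x.copy()
    for A_i, tau in zip(amps, delays):
        r[tau:] += A_i * x[:len(x) - tau]     # delayed, attenuated copy of the path
    return r + rng.normal(0.0, noise_std, len(x))

x = np.zeros(100)
x[0] = 1.0                                     # unit impulse as the transmit signal
r = received_echo(x, 1.0, [0.5, 0.25], [10, 30], 0.0, rng)   # noise off to inspect paths
```

With an impulse input and the noise turned off, the output is simply the channel's impulse response: the direct arrival at sample 0 and the two multipath arrivals at their delays.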
For the simulated training data, an uncontaminated LFM pulse serves as the target signal: the bandwidth is set to 2 kHz to 8 kHz, and a 10 ms LFM pulse is generated at a sampling rate of 48 kHz. The pulse is then inserted at a random time point to simulate the position where a target echo appears, producing a target signal of 1200 sampling points. A total of 200,500 target signals are generated, of which 200,000 form the training set and the remaining 500 the test set. On top of each target signal, the number of eigenrays is set to 3, with path amplitudes drawn randomly from the intervals [0.9, 0.6], [0.6, 0.3] and [0.3, 0.1] in turn, delay values of (rand 100) sampling points in turn, and Gaussian white noise added at a random signal-to-noise ratio between -20 dB and 5 dB. The time-domain waveform is used directly as the input signal, so the phase information of the input is well preserved.
When training the network, varying the number of DAE layers yields the curve of enhancement effect versus layer count shown in FIG. 7(a); varying the minimum compression dimension yields FIG. 7(b); and varying the number of iterations yields FIG. 7(c). FIG. 7(d) compares the enhancement effect of a network containing only the DAE with one containing both the DAE and the CDAE. From these results an optimal network structure is determined. After training on this structure, the enhanced result is shown in FIG. 8(d); the clean signal is shown in FIG. 8(a), the noisy signal at a signal-to-noise ratio of -10 dB in FIG. 8(b), and the signal enhanced by the DAE-only network in FIG. 8(c).
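The pulse-generation step can be sketched as follows; the function name `make_target` is an illustrative stand-in, and only the stated parameters (2-8 kHz band, 10 ms pulse, 48 kHz sampling, 1200-point frames) come from the description:

```python
import numpy as np

fs = 48_000                  # sampling rate from the description
f0, f1 = 2_000.0, 8_000.0    # 2-8 kHz band
T = 0.01                     # 10 ms pulse width
t = np.arange(int(fs * T)) / fs            # 480 samples
k = (f1 - f0) / T                          # chirp rate in Hz/s
lfm = np.sin(2 * np.pi * (f0 * t + 0.5 * k * t**2))  # linear FM pulse

def make_target(lfm, total_len=1200, rng=None):
    """Embed the LFM pulse at a random offset inside a frame of
    `total_len` samples, simulating a random echo arrival time."""
    if rng is None:
        rng = np.random.default_rng()
    frame = np.zeros(total_len)
    start = rng.integers(0, total_len - len(lfm) + 1)
    frame[start:start + len(lfm)] = lfm
    return frame

frame = make_target(lfm, rng=np.random.default_rng(1))
```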
To further verify the feasibility and effectiveness of the proposed method, an under-ice signal transmission experiment was carried out on the Songhua River. The transmitted signal is an LFM pulse with a pulse width of 10 ms and a bandwidth of 2-8 kHz. The received signal is fed into the network of the present invention to obtain the recovered signal, and time-frequency analysis of the signal before and after recovery gives the results shown in FIGS. 9(a) and 9(b).
As FIGS. 9(a) and 9(b) show, the background noise after recovery is significantly reduced, and during the target-signal interval the energy is more concentrated in the expected frequency band, consistent with the theoretical simulation results. Because the experimental signal is difficult to align with the transmitted signal, the exact signal-to-noise ratio cannot be computed; instead, a signal-free segment is taken as an approximation of the noise and a signal-bearing segment as an approximation of the target, and from the mean signal energies the signal-to-noise ratio is estimated as 3.75 dB before recovery and 17.56 dB after recovery. These results therefore verify, to some extent, the effectiveness of the deep-learning network structure designed by the present invention for the problem of active sonar signal enhancement.
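The segment-based SNR estimate described here (mean power of a signal-bearing segment over mean power of a signal-free segment) can be written out directly; the helper name `segment_snr_db` and the toy 440 Hz example are illustrative:

```python
import numpy as np

def segment_snr_db(signal_seg, noise_seg):
    """Approximate SNR in dB: mean power (energy average) of the
    signal-bearing segment over that of a signal-free segment."""
    ps = np.mean(np.asarray(signal_seg, dtype=float) ** 2)
    pn = np.mean(np.asarray(noise_seg, dtype=float) ** 2)
    return 10.0 * np.log10(ps / pn)

rng = np.random.default_rng(0)
fs = 48_000
t = np.arange(fs) / fs
noise_seg = rng.normal(0.0, 1.0, fs)                          # noise-only segment
signal_seg = 2.0 * np.sin(2 * np.pi * 440 * t) + rng.normal(0.0, 1.0, fs)
snr = segment_snr_db(signal_seg, noise_seg)
```

Note that, as in the experiment, the tone's own power is not separated from the in-band noise, so this slightly overestimates the true SNR.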

Claims (4)

1. An underwater sound signal enhancement method based on an automatic encoder is characterized in that: the method comprises the following steps:
(1) constructing a neural network model of a noise reduction automatic encoder and a neural network model based on a convolution automatic encoder; the noise reduction automatic encoder takes a signal with noise as input, and its output signal is used as the input signal of the convolution automatic encoder; the convolution automatic encoder is of a symmetrical structure comprising an encoding part and a decoding part, the encoding part encodes the input signal and compresses the characteristic information into a low-dimensional space, and the decoding part decompresses the low-dimensional characteristic information into a clean signal;
(2) simulating a series of echo signals which can be received according to various parameters of the linear frequency modulation signals transmitted by the active sonar, and generating signal pairs corresponding to the signals with noise and the clean signals;
(3) randomly dividing the data sample into a training sample set and a testing sample set;
(4) pre-training parameters of a neural network model of the noise reduction automatic encoder by using a training sample set until a sample loss function of the training set reaches an index;
(5) the output of the noise reduction automatic encoder is used as the input of the convolution automatic encoder, and the clean signal is used as the output of the convolution automatic encoder, so that the parameters of the neural network model of the convolution automatic encoder are pre-trained until the sample loss function of the training set reaches the index;
(6) taking a signal with noise as the input of the whole neural network, taking a clean signal as the output of the whole neural network, and adjusting and optimizing the encoder by using the data set until a sample loss function of the training set reaches an index;
(7) with the encoder parameters thus set, the network model parameters are obtained; the underwater acoustic signal is sampled and used as the input of the overall neural network combining the noise reduction automatic encoder and the convolution automatic encoder, and the enhanced, noise-reduced signal is obtained;
the signal pair formed by the noise-carrying signal and the clean signal in the step (2) is a time domain waveform, the input and the output of the noise-reducing automatic encoder in the step (4) are the noise-carrying signal and the clean signal, after the noise-reducing automatic encoder is pre-trained, the network output corresponding to the noise-carrying signal is used as the input of the convolution automatic encoder, and the clean signal is used as the output of the convolution automatic encoder;
in the step (6), when the loss function of the noise reduction automatic encoder is smaller than the preset threshold T and the loss function of the convolution automatic encoder is smaller than the preset threshold P, the noise-carrying signal is used as the input of the joint encoder, and the clean signal is used as the output of the joint encoder, so as to perform joint tuning on the whole encoder network.
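Steps (4)-(6) of claim 1 amount to staged training: pre-train the noise reduction automatic encoder until its loss falls below the threshold T, pre-train the convolution automatic encoder until its loss falls below P, then jointly tune the whole network. A minimal control-flow sketch, in which `train_epoch` and the geometrically decaying toy loss are stand-ins for a real gradient-descent pass, not the patent's actual optimizer:

```python
def pretrain(train_epoch, threshold, max_epochs=1000):
    """Run training epochs until the loss drops below `threshold`."""
    loss = float("inf")
    for epoch in range(max_epochs):
        loss = train_epoch()
        if loss < threshold:
            return epoch + 1, loss
    return max_epochs, loss

def make_toy_trainer(start=1.0, decay=0.7):
    """Toy stand-in for one training pass: loss decays geometrically."""
    state = {"loss": start}
    def train_epoch():
        state["loss"] *= decay
        return state["loss"]
    return train_epoch

T, P = 0.01, 0.01                                          # loss thresholds
dae_epochs, dae_loss = pretrain(make_toy_trainer(), T)     # stage (4): DAE
cdae_epochs, cdae_loss = pretrain(make_toy_trainer(), P)   # stage (5): CDAE
joint_epochs, joint_loss = pretrain(make_toy_trainer(), T) # stage (6): joint tuning
```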
2. The method of claim 1, wherein the method comprises: the number of neurons in each layer of the encoding part of the noise reduction automatic encoder decreases as the layer index increases, and the number of neurons in each layer of the decoding part increases as the layer index increases until it equals the number of sampling points of the clean signal.
3. The method of claim 1, wherein the method comprises: the network model of the convolution automatic encoder comprises an input layer, an output layer and a plurality of convolution layers, wherein the number of the convolution layers is more than or equal to 3, and the number of channels of each convolution layer is more than or equal to 30; in the coding part, a pooling layer is added behind each convolution layer, the pooling layer is the maximum pooling layer, and the number of channels of the pooling layer is equal to that of the previous convolution layer; in the decoding part, an up-sampling layer is added after each convolution layer, and the number of channels of the up-sampling layer is equal to that of the previous convolution layer.
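Claim 3's symmetric structure — a max pooling layer after each convolution in the encoding part, mirrored by an upsampling layer after each convolution in the decoding part — can be illustrated with a shape-only NumPy sketch (the two-stage depth, pool size 2 and nearest-neighbour upsampling are illustrative assumptions; the convolutions themselves are omitted):

```python
import numpy as np

def maxpool1d(x, k=2):
    """Max-pool each channel; x has shape (channels, length)."""
    c, n = x.shape
    return x[:, : n - n % k].reshape(c, -1, k).max(axis=2)

def upsample1d(x, k=2):
    """Nearest-neighbour upsampling, the shape-inverse of pooling."""
    return np.repeat(x, k, axis=1)

# 30 channels (claim 3 requires at least 30) over 1200-sample frames
x = np.random.default_rng(0).normal(size=(30, 1200))
enc = maxpool1d(maxpool1d(x))        # encoding: 1200 -> 600 -> 300
dec = upsample1d(upsample1d(enc))    # mirrored decoding: 300 -> 600 -> 1200
```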
4. The method of claim 1, wherein the method comprises: simulating the received echo signal according to the following formula in the step (2):
r_0(t) = A_0·x(t) + Σ_{i=1}^{L} A_i·x(t − τ_i) + n(t)
wherein r_0(t) is the received echo signal, x(t) is the transmitted signal, and n(t) is the Gaussian noise interference at the receiving end; the first term on the right-hand side of the equals sign is the direct sound and the second term is the multipath signal, the parameter L being the number of eigenrays reaching the receiving point; A_i and τ_i are the signal amplitude of the i-th path at the receiving point and its time delay relative to the direct sound signal, and A_0 is the direct sound signal amplitude.
CN201910738375.0A 2019-08-12 2019-08-12 Underwater acoustic signal enhancement method based on automatic encoder Active CN110456332B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910738375.0A CN110456332B (en) 2019-08-12 2019-08-12 Underwater acoustic signal enhancement method based on automatic encoder


Publications (2)

Publication Number Publication Date
CN110456332A CN110456332A (en) 2019-11-15
CN110456332B true CN110456332B (en) 2022-08-05

Family

ID=68485838


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110956957B (en) * 2019-12-23 2022-05-17 思必驰科技股份有限公司 Training method and system of speech enhancement model
CN111338878A (en) * 2020-02-21 2020-06-26 平安科技(深圳)有限公司 Anomaly detection method and device, terminal device and storage medium
CN111968614B (en) * 2020-08-24 2023-09-19 湖南工业大学 Active noise control device of vehicle global space based on convolution-fuzzy network
CN112053421B (en) * 2020-10-14 2023-06-23 腾讯科技(深圳)有限公司 Signal noise reduction processing method, device, equipment and storage medium
CN112464837B (en) * 2020-12-03 2023-04-07 中国人民解放军战略支援部队信息工程大学 Shallow sea underwater acoustic communication signal modulation identification method and system based on small data samples
CN112986950A (en) * 2020-12-25 2021-06-18 南京理工大学 Single-pulse laser radar echo feature extraction method based on deep learning
CN115118557B (en) * 2022-06-28 2023-07-25 南华大学 Underwater acoustic OFDM communication channel feedback method and system based on deep learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590778A (en) * 2017-07-31 2018-01-16 南京理工大学 A kind of own coding method based on lossless constraint noise reduction
CN108537271B (en) * 2018-04-04 2021-02-05 重庆大学 Method for defending against sample attack based on convolution denoising self-encoder
CN109389171B (en) * 2018-10-11 2021-06-25 云南大学 Medical image classification method based on multi-granularity convolution noise reduction automatic encoder technology



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant