CN109932708B - Method for classifying targets on water surface and underwater based on interference fringes and deep learning - Google Patents

Method for classifying targets on water surface and underwater based on interference fringes and deep learning

Info

Publication number
CN109932708B
Authority
CN
China
Prior art keywords
depth, layer, image, CNN, DBN
Legal status
Active
Application number
CN201910225516.9A
Other languages
Chinese (zh)
Other versions
CN109932708A (en)
Inventor
Yang Kunde (杨坤德)
Zhou Xingyue (周星月)
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date: 2019-03-25
Filing date: 2019-03-25
Publication date: 2022-09-23
Application filed by Northwestern Polytechnical University
Priority to CN201910225516.9A
Publication of CN109932708A
Application granted
Publication of CN109932708B

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a method for classifying water surface and underwater targets based on interference fringes and deep learning. A large set of sound field interference fringe images, including clear images, blurred images and images with random depth errors, is first obtained by simulation with the Gaussian beam ray acoustic model and used as the training set for a deep belief network (DBN) and a convolutional neural network (CNN). The trained DBN is then used as a front-end processing module of the CNN, a beamforming preprocessing module is added to complete the system, and the preprocessed images are finally input to the DBN, realizing autonomous classification of the input fringe images. Simulating a large number of sound field interference fringe images solves the problem of overly small training sets in deep learning; the DBN model compensates for blurring and fringe offset in the actual input images; and the CNN design enables autonomous classification of surface and underwater targets.

Description

Method for classifying targets on water surface and underwater based on interference fringes and deep learning
Technical Field
The invention belongs to the technical fields of underwater acoustic engineering and ocean engineering, and relates to a method for classifying water surface and underwater targets based on interference fringes and deep learning. The classification method, based on interference fringe images and the deep neural networks DBN and CNN, distinguishes water surface targets from underwater targets and is applicable to surface targets at depths of 20 m or less and underwater targets at depths greater than 20 m.
Background
In the field of image recognition, the classification of water surface and underwater targets has long been an urgent problem. In conventional methods, the source depth is estimated with signal-processing techniques such as beamforming and direction-of-arrival (DOA) estimation, and surface and underwater targets are then classified according to the estimated target depth. Although such methods can describe the physical characteristics of each target, such as azimuth and depth, each target signal must be processed separately by hand, with parameters tuned and optimized for each target, so the approach lacks general applicability. Moreover, conventional methods cannot process large amounts of data at once and have no capability for autonomous classification of surface and underwater targets. In addition, because data samples of underwater targets are scarce and of limited accuracy, deep learning is difficult to apply, or the learned features are not comprehensive enough; existing deep-learning approaches to surface/underwater target classification therefore struggle to achieve both high accuracy and general applicability.
At present, deep-learning target classification has mainly addressed surface ship classification based on synthetic aperture radar (SAR). The most recent approach achieves ship classification on small datasets using high-resolution SAR images and convolutional neural networks (CNN); see "Ship Classification in High-Resolution SAR Images Using Deep Learning of Small Datasets", Sensors, 2018, 18(9), first page 2929. Although surface targets can be classified this way, SAR images can only observe the movement of surface ships and cannot detect moving underwater targets, so surface and underwater targets cannot be classified simultaneously. Sound field interference fringe images, by contrast, can be used for depth localization of a target sound source; see "Acoustic-intensity striations below the critical depth: Interpretation and modeling", Journal of the Acoustical Society of America, 2017, 142(3), first page EL245. Given a known sound speed profile (SSP), a large number of sound field interference fringe images can be obtained by simulation, and no application of such images to the classification of surface and underwater targets has been reported so far.
Disclosure of Invention
Technical problem to be solved
In order to avoid the defects of the prior art, the invention provides a method for classifying targets on water surface and underwater based on interference fringes and deep learning.
Technical scheme
A method for classifying objects on water surface and underwater based on interference fringes and deep learning is characterized by comprising the following steps:
step 1: acquiring the temperature T, the depth h, the salinity S and the hydrostatic pressure P of a certain sea area; and calculating the sound velocity under the corresponding depth according to a sound velocity empirical formula:
$$C_h = 1449.2 + 4.6T - 0.055T^2 + 0.00029T^3 + (1.34 - 0.01T)(S - 35) + 0.0168P$$

wherein: h is the depth, C_h is the sound velocity at the corresponding depth h, T is the temperature, S is the salinity, and P is the hydrostatic pressure under standard atmospheric conditions;
plotting sound velocity profile (SSP): calculating a sound velocity value every 2m within the range from the sea level to the maximum depth of the mixed layer, calculating a sound velocity value every 20m within the range from the maximum depth of the mixed layer to the axial depth of the sound channel, calculating a sound velocity value every 100m within the range from the axial depth of the sound channel to the depth of the sound source, and calculating a sound velocity value every 1000m within the range from the depth of the sound source to the depth of the sea bottom;
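For illustration only, step 1 can be sketched in Python as follows (the patent specifies no implementation language); the depth-grid boundaries and the depth-to-pressure conversion `P = depth / 10` are placeholder assumptions, not part of the method:

```python
import numpy as np

def sound_speed(T, S, P):
    """Empirical sound speed (m/s) for temperature T (deg C), salinity S (ppt)
    and hydrostatic pressure P, per the formula in step 1."""
    return (1449.2 + 4.6 * T - 0.055 * T**2 + 0.00029 * T**3
            + (1.34 - 0.01 * T) * (S - 35) + 0.0168 * P)

def ssp_depth_grid(mixed_layer_max, channel_axis, source_depth, seabed):
    """Non-uniform depth grid (m) for the SSP: 2 m steps to the mixed-layer
    maximum, 20 m to the sound-channel axis, 100 m to the source depth,
    1000 m to the seabed."""
    return np.concatenate([
        np.arange(0.0, mixed_layer_max, 2.0),
        np.arange(mixed_layer_max, channel_axis, 20.0),
        np.arange(channel_axis, source_depth, 100.0),
        np.arange(source_depth, seabed + 1.0, 1000.0),
    ])

# Boundaries as in the deep-sea profile of the description: 100, 1190, 4500, 5500 m.
depths = ssp_depth_grid(100.0, 1190.0, 4500.0, 5500.0)
# T, S, P would come from XBT/CTD casts at these depths; the constants here
# are placeholders. Values are kept to four decimal places as required.
c = np.round(sound_speed(T=4.0, S=35.0, P=depths / 10.0), 4)
```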
step 2: simulating by using a Gaussian ray acoustic model BELLHOP and a known sound velocity profile (SSP) to obtain a sound field interference fringe image set; the fringe image comprises a non-interference complete image, a fuzzy image and an image with random depth errors under the conditions of different sound source depths and different distances;
step 3, training the CNN network with the sound field interference fringe images: the input image of the CNN is a grayscale image of size 256 × 256; the CNN network model comprises five convolutional layers Conv, five rectified linear units ReLU, four pooling layers Pool and an output layer, where the output layer comprises a fully connected layer FC and a Softmax layer; each convolutional layer in the CNN consists of a plurality of convolutional neural units, whose parameters are optimized by the back-propagation algorithm;
the calculation formula of each convolutional layer is as follows:
$$y_{i,j} = \sum_{u=0}^{m-1} \sum_{v=0}^{n-1} w(u,v)\, x_{i+u,\, j+v}$$

where m × n is the size of the kernel, w(u,v) is the weight of the kernel at position (u,v), x_{i,j} is the input of the convolutional layer, y_{i,j} is the output of the convolution operation, and i, j index the position of the convolution kernel;
the maximum pooling operation is used:

$$y_{i,j} = \max_{(u,v) \in R_{i,j}} x_{u,v}$$

where R_{i,j} is the pooling region assigned to output position (i, j);
the output y of each layer of the rectifying linear units ReLu is defined as follows:
$$y = \max(0,\, y_{i,j})$$
tiling all two-dimensional image features output by the previous pooling layer into fully-connected one-dimensional vectors, and then inputting the fully-connected one-dimensional vectors into a Softmax layer, wherein the calculation formula of Softmax is as follows:
$$P_i = \frac{e^{y_i}}{\sum_{j=1}^{Q} e^{y_j}}$$

where Q is the number of classes (Q = 2), i, j index the target classes, y_i is the fully connected layer's output score for class i, and P_i is the corresponding probability of each class computed from the fully connected layer;
the kernel size of each Conv layer is set to 3 and the kernel stride to 1; zero padding is 10 for the first Conv layer and 1 for all the others; the numbers of hidden neurons in the Conv layers are 64, 64, 128, 256 and 512, respectively; the stride and kernel size of each Pool layer are set to 2, with 64, 128, 256 and 512 hidden neurons, respectively; the fully connected layer has 4096 hidden neurons, the Softmax layer has 2 neurons, and the output contains only "0" and "1", i.e. the Softmax layer outputs a binary classification label: when the sound source depth is at most 20 m, the output is [0,1]; conversely, when the sound source depth exceeds 20 m, the output is [1,0]; a code sketch of this architecture is given below;
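A sketch of this architecture in PyTorch (an assumption; the patent names no framework). The interleaving of the ReLU and Pool layers with each Conv layer, and the flattened size 512 × 17 × 17 implied by the stated padding and strides, are inferred rather than stated:

```python
import torch
import torch.nn as nn

class FringeCNN(nn.Module):
    """Sketch of the described CNN: five 3x3 Conv layers (stride 1), five
    ReLUs, four 2x2 max-pool layers (stride 2), an FC layer of 4096 neurons
    and a 2-way Softmax. Padding is 10 on the first Conv layer, 1 elsewhere."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=1, padding=10), nn.ReLU(),    # 256 -> 274
            nn.MaxPool2d(2, stride=2),                               # 274 -> 137
            nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.ReLU(),    # 137
            nn.MaxPool2d(2, stride=2),                               # 137 -> 68
            nn.Conv2d(64, 128, 3, stride=1, padding=1), nn.ReLU(),   # 68
            nn.MaxPool2d(2, stride=2),                               # 68 -> 34
            nn.Conv2d(128, 256, 3, stride=1, padding=1), nn.ReLU(),  # 34
            nn.MaxPool2d(2, stride=2),                               # 34 -> 17
            nn.Conv2d(256, 512, 3, stride=1, padding=1), nn.ReLU(),  # 17
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 17 * 17, 4096),
            nn.Linear(4096, 2),
        )

    def forward(self, x):
        # x: (N, 1, 256, 256) grayscale fringe image; returns class
        # probabilities (for training, feed the logits to CrossEntropyLoss).
        return torch.softmax(self.classifier(self.features(x)), dim=1)
```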
The success rate of the CNN training is set to 94%, and when the test result is greater than 94%, the CNN training is finished;
step 4, training the DBN network with the sound field interference fringe images: the DBN network model comprises 6 layers: four stacked restricted Boltzmann machines (RBM) and a back-propagation (BP) neural network layer; the success rate of the DBN training is set to 90%, and when the test result is greater than 90%, the DBN training is finished;
step 5: the input underwater acoustic signal is processed by beamforming and then input to the DBN; the trained DBN serves as the front-end processing module of the trained CNN, and a judgment module serves as the back-end output module of the CNN;
step 6: when the CNN outputs the label [0,1], a water surface target is judged; when it outputs [1,0], an underwater target is judged.
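A minimal sketch of the decision logic in steps 5 and 6, assuming `dbn` and `cnn` are the trained networks exposed as callables (hypothetical names) and that the beamforming of step 5 has already been applied:

```python
import numpy as np

def judge_target(fringe_image, dbn, cnn):
    """Sketch of the judgment in step 6; `dbn` and `cnn` are hypothetical
    callables standing for the trained networks, and the beamforming of
    step 5 is assumed already applied to produce `fringe_image`."""
    cleaned = dbn(fringe_image)           # DBN front-end: de-blur, de-offset
    probs = cnn(cleaned)                  # 2-way Softmax output
    label = np.eye(2, dtype=int)[int(np.argmax(probs))]
    # [0,1] -> source depth <= 20 m (surface); [1,0] -> depth > 20 m (underwater)
    return "water surface target" if label.tolist() == [0, 1] else "underwater target"
```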
The sound velocity values C_h should be retained to four decimal places.
When simulating the images in step 2, the receiving depth is set close to the seabed, the target sound source depth range is 1-500 m with one source depth every 1 m, and the range is 5-30 km with one value every 100 m; the source signal is broadband, with a 50-500 Hz band and one frequency point every 1 Hz. For the target at each depth, the propagation loss (TL) of every frequency point is computed with the BELLHOP model, and the TL curves of all frequency points are spliced together, yielding the clean, complete interference fringe image at that depth.
In step 2, for a given sound source depth, the blurred images are simulated by taking the propagation loss (TL) of one frequency point every 2, 3, 4, 5, 6, 7, 8, 9 and 10 Hz within the 50-500 Hz band and splicing the TL curves computed by the ray acoustic model, yielding 9 interference fringe images of different degrees of blur.
When simulating the images with random depth errors in step 2, for a given sound source depth, the random error range is set to ±10% of the sound source depth, 450 random numbers are generated within this range, and the propagation losses (TL) of all frequency points computed by the BELLHOP model, each at its randomly perturbed source depth, are spliced together to obtain the image with random depth errors.
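All three image types above can be produced by one routine. A sketch, assuming a hypothetical `compute_tl(f, z, ranges)` wrapper around the BELLHOP model (the patent does not specify how the model is invoked):

```python
import numpy as np

def fringe_image(compute_tl, src_depth, freq_step=1, depth_jitter=0.0, seed=None):
    """Stack per-frequency propagation-loss (TL) curves into a
    frequency-range fringe image. `compute_tl(f, z, ranges)` is a
    hypothetical wrapper around the BELLHOP model returning the TL curve
    at frequency f (Hz) for a source at depth z (m) over `ranges` (m).
    freq_step=1 with depth_jitter=0 gives the clean complete image;
    freq_step in 2..10 gives the nine blurred variants; depth_jitter=0.1
    perturbs each frequency's source depth within +/-10% for the
    random-depth-error image."""
    rng = np.random.default_rng(seed)
    freqs = np.arange(50, 501, freq_step)   # 50-500 Hz band
    ranges = np.arange(5000, 30001, 100)    # 5-30 km, one value every 100 m
    rows = [compute_tl(f,
                       src_depth * (1 + rng.uniform(-depth_jitter, depth_jitter)),
                       ranges)
            for f in freqs]
    return np.stack(rows)                   # (n_freq, n_range); plot as grayscale
```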
Advantageous effects
The invention provides a method for classifying targets on water surface and underwater based on interference fringes and deep learning. A large set of sound field interference fringe images, including clear images, blurred images and images with random depth errors, is first obtained by simulation with the Gaussian beam ray acoustic model and used as the training set for a deep belief network (DBN) and a convolutional neural network (CNN). The trained DBN is then used as a front-end processing module of the CNN, a beamforming preprocessing module is added to complete the system, and the preprocessed images are finally input to the DBN, realizing autonomous classification of the input fringe images. Simulating a large number of sound field interference fringe images solves the problem of overly small training sets in deep learning; the DBN model compensates for blurring and fringe offset in the actual input images; and the CNN design enables autonomous classification of surface and underwater targets.
The beneficial effects are: simulating the sound field interference images of sound sources at different depths with the known SSP and the BELLHOP model yields a large number of simulated images, solving the deep-learning problems of scarce underwater acoustic training data and the difficulty of modeling the target; in step three, a CNN network model for classifying underwater and surface targets is designed whose loss rate after training on simulated data is only 6.42%, with a success rate of 88.68% verified on 106 experimental samples; in step four, a DBN network model for optimizing the interference fringe images is designed, and verification on the same 106 experimental samples shows that the CNN without the DBN module reaches only 82.08%, a drop of 6.6%, proving that the DBN markedly improves target classification accuracy. The method classifies water surface and underwater targets autonomously, with an accuracy of 88.68% verified on experimental data.
Drawings
FIG. 1 is a schematic flow chart of the deep-learning-based classification of water surface and underwater targets provided by the method of the invention.
FIG. 2 shows simulated frequency-range sound field interference fringe patterns for sound source depths of 5, 11, 20, 50, 150 and 300 m.
FIG. 3 is the SSP measured in a sea area of the South China Sea.
Fig. 4 is a CNN network model.
Fig. 5 is a DBN network model.
FIG. 6 compares the input and output of the DBN on test air gun data: (a) the input interference fringe image of the air gun target; (b) the output interference fringe image after DBN optimization.
Fig. 7 shows the loss rate of CNN versus the success rate of the experimental data test.
Detailed Description
The invention will now be further described with reference to the following examples, and the accompanying drawings:
the invention provides a method for simulating and obtaining thousands of sound field interference fringe images with different sound source depths by utilizing a Gaussian ray acoustic model BELLHOP and a known SSP, wherein the images comprise a non-interference complete image, a fuzzy image and an image with a random depth error. Then, taking the blurred image and the image with the random depth error as an input training set, and taking the non-interference complete image as an output training set to train the DBN; all the fringe images are used as an input training set, the image categories are used as an output training set to train CNN, and the DBN and the CNN need to be trained for thousands of times until convergence. After the DBN and the CNN are trained, the DBN is used as a front-end processing module of the CNN, meanwhile, beam forming is added to be used as a preprocessing module, and finally, a judging module is added to output the categories of underwater and water surface targets, so that the underwater and water surface targets are classified autonomously. The classification process comprises the following steps:
Step one: acquire the SSP data of the sea area by measuring the temperature, depth, salinity and hydrostatic pressure with equipment such as an expendable bathythermograph (XBT) and a conductivity-temperature-depth profiler (CTD), and calculate the sound velocity at each depth from the empirical sound speed formula:
$$C_h = 1449.2 + 4.6T - 0.055T^2 + 0.00029T^3 + (1.34 - 0.01T)(S - 35) + 0.0168P \qquad (1)$$

where h is the depth, C_h is the sound velocity at the corresponding depth h, T is the temperature, S is the salinity, and P is the hydrostatic pressure under standard atmospheric conditions.
The sound velocity C_h at each depth is calculated from formula (1). When the sound speed profile SSP is plotted, one sound velocity value is taken every 2 m from the sea surface to the maximum depth of the mixed layer, every 20 m from the maximum mixed-layer depth to the sound-channel axis depth, every 100 m from the sound-channel axis depth to the sound source depth, and every 1000 m from the sound source depth to the seabed depth; the sound velocity values C_h should be retained to four decimal places.
Step two: simulate a large number of sound field interference fringe images with the Gaussian beam ray acoustic model BELLHOP and the known SSP; the fringe images comprise clean complete images, blurred images and images with random depth errors at different sound source depths and ranges. In a given sound field, the interference phenomena differ with source depth and range, which appears as different numbers and shapes of interference fringes in the images.
When simulating the images, the receiving depth is set close to the seabed, the target sound source depth range is 1-500 m with one source depth every 1 m, the range is 5-30 km with one value every 100 m, and the source signal is broadband, with a 50-500 Hz band and one frequency point every 1 Hz. For the target at each depth, the propagation loss TL (transmission loss) of every frequency point is computed with the BELLHOP model, and the TL curves of all frequency points are spliced together, yielding the clean, complete interference fringe image at that depth.
Similarly, for a given sound source depth, the blurred images are simulated by taking the TL of one frequency point every 2, 3, 4, 5, 6, 7, 8, 9 and 10 Hz within the 50-500 Hz band and splicing the TL curves computed by the ray acoustic model, yielding 9 interference fringe images of different degrees of blur.
When simulating the images with random depth errors, for a given sound source depth the random error range is set to ±10% of the sound source depth, 450 random numbers are generated within this range, and the TL curves of all frequency points computed by the BELLHOP model, each at its randomly perturbed source depth, are spliced together to obtain the image with random depth errors.
Step three: train the CNN network with the sound field interference fringe images. To avoid classification errors caused by color differences from the actual target interference fringes, the input image of the CNN is a grayscale image of size 256 × 256. The CNN network model comprises five convolutional layers Conv (convolutional layer), five rectified linear units ReLU (rectified linear unit), four pooling layers Pool (pooling layer), and an output layer consisting of a fully connected layer FC (fully connected layer) and a Softmax layer.
Each convolutional layer in the CNN consists of a plurality of convolutional neural units, whose parameters are optimized by the back-propagation algorithm. The purpose of the convolution operation is to extract different input features: the first convolutional layer extracts only low-level features such as edges, lines and corners, while deeper layers iteratively extract more complex features from these. Each convolutional layer is computed as:
$$y_{i,j} = \sum_{u=0}^{m-1} \sum_{v=0}^{n-1} w(u,v)\, x_{i+u,\, j+v} \qquad (2)$$

where m × n is the size of the kernel, w(u,v) is the weight of the kernel at position (u,v), x_{i,j} is the input of the convolutional layer, y_{i,j} is the output of the convolution operation, and i, j index the position of the convolution kernel.
The convolutional layer usually produces features of large dimension; the pooling layer divides the features into several regions and takes the maximum or mean of each, obtaining new features of smaller dimension, i.e. down-sampling. Two pooling computations are common: max pooling and mean pooling. Because the proposed method classifies by CNN recognition of striation features, max pooling is adopted here:
$$y_{i,j} = \max_{(u,v) \in R_{i,j}} x_{u,v} \qquad (3)$$

where R_{i,j} is the pooling region assigned to output position (i, j).
the output y of each layer of the rectifying linear units ReLu is defined as follows:
$$y = \max(0,\, y_{i,j}) \qquad (4)$$
The fully connected layer combines all local features into global features and computes the final probability of each class; all two-dimensional image features output by the last pooling layer are tiled into a fully connected one-dimensional vector, which is then input to the Softmax layer. Softmax is computed as:
$$P_i = \frac{e^{y_i}}{\sum_{j=1}^{Q} e^{y_j}} \qquad (5)$$

where Q is the number of classes (Q = 2), i, j index the target classes, y_i is the fully connected layer's output score for class i, and P_i is the corresponding probability of each class computed from the fully connected layer.
The kernel size of each Conv layer is set to 3 and the kernel stride to 1; zero padding (Zero-padding) is 10 for the first Conv layer and 1 for all the others; the numbers of hidden neurons in the Conv layers are 64, 64, 128, 256 and 512, respectively. Likewise, the stride and kernel size of each Pool layer are set to 2, with 64, 128, 256 and 512 hidden neurons, respectively. The fully connected layer has 4096 hidden neurons, the Softmax layer has 2 neurons, and the output contains only "0" and "1", i.e. the Softmax layer outputs a binary classification label: when the sound source depth is at most 20 m, the output is [0,1]; conversely, when the sound source depth exceeds 20 m, the output is [1,0].
The success rate threshold of the CNN training is set to 94%; when the test result exceeds 94%, the CNN training is complete.
Step four: train the DBN network with the sound field interference fringe images. The DBN network model comprises 6 layers: four stacked restricted Boltzmann machines RBM (restricted Boltzmann machine) and a back-propagation neural network BP (back propagation) layer.
The energy function E(v,h) of the RBM is defined as:

$$E(v,h) = -\sum_{j=1}^{m} a_j v_j - \sum_{i=1}^{n} b_i h_i - \sum_{i=1}^{n}\sum_{j=1}^{m} h_i w_{ij} v_j \qquad (6)$$

where m and n are the numbers of visible-layer neurons v_j and hidden-layer neurons h_i (i = 1, 2, ..., n; j = 1, 2, ..., m), a_j and b_i are the corresponding biases, and w_{ij} is the connection weight between visible neuron v_j and hidden neuron h_i. Numbering the layers 1, 2, ..., 6, the sigmoid activation function is defined as:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (7)$$

Thus the conditional probability densities of each layer can be derived as:

$$P(h_i = 1 \mid v) = \sigma\!\left(b_i + \sum_{j} w_{ij} v_j\right), \qquad P(v_j = 1 \mid h) = \sigma\!\left(a_j + \sum_{i} w_{ij} h_i\right) \qquad (8)$$

This conditional probability density distribution, i.e. the objective function of the DBN, contains the visible layer v_j, the hidden layer h_i, the connection weights w_{ij} between the visible and hidden layers, and the biases a_j, b_i; it corresponds to exponentiating the energy function E(v,h) with base e and normalizing. Training repeatedly maximizes the log-likelihood of this function.
Each RBM network has 2 layers: the first is the visible layer, i.e. the input layer V, and the second is the hidden layer H, i.e. the feature-extraction layer. The visible layer of the DBN, i.e. the input layer V1 of the first RBM (RBM1), receives the 256 × 256 image; V1 has 4096 neurons, every hidden layer other than V1 has 512 neurons, and the BP neural network has 4096 neurons. After RBM1 is trained, its hidden layer H1 is taken as the input layer V2 of the next RBM (RBM2), a new hidden layer H2 is added after H1 as the hidden layer of RBM2, and so on until the fourth RBM (RBM4) is trained; a BP layer is then added after hidden layer H4 as the fully connected layer, and the whole DBN is fine-tuned. The BP output is reshaped back into a 256 × 256 matrix and rendered as a grayscale image for input to the CNN.
Because each RBM layer can only guarantee that its own weights optimally map its own feature vector, not the feature vectors of the whole DBN, the BP network propagates information top-down to each RBM layer and fine-tunes the whole DBN network.
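A sketch of this training scheme using one-step contrastive divergence (CD-1), the standard approximation to the maximum-likelihood objective above; the layer sizes follow the text, while the batch, learning rate and epoch count are illustrative assumptions (the BP fine-tuning stage is omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, lr=0.01, epochs=10, seed=0):
    """CD-1 (one-step contrastive divergence) training of a single RBM.
    V: (n_samples, n_visible) data in [0, 1]. Returns the parameters and
    the hidden activations that feed the next RBM in the stack."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((V.shape[1], n_hidden))  # weights w_ij
    a = np.zeros(V.shape[1])                                # visible biases a_j
    b = np.zeros(n_hidden)                                  # hidden biases b_i
    for _ in range(epochs):
        ph = sigmoid(V @ W + b)                             # P(h=1 | v), eq. (8)
        h = (rng.random(ph.shape) < ph).astype(float)       # sample hidden states
        pv = sigmoid(h @ W.T + a)                           # reconstruction P(v=1 | h)
        ph2 = sigmoid(pv @ W + b)
        W += lr * (V.T @ ph - pv.T @ ph2) / len(V)          # positive - negative phase
        a += lr * (V - pv).mean(axis=0)
        b += lr * (ph - ph2).mean(axis=0)
    return W, a, b, sigmoid(V @ W + b)

# Greedy layer-wise stacking: H1 of RBM1 becomes V2 of RBM2, and so on.
# Sizes follow the text (V1 = 4096, hidden layers 512); the BP fine-tuning
# layer (4096 neurons) is omitted from this sketch.
H = np.random.rand(32, 4096)          # placeholder batch of fringe features
for n_hid in (512, 512, 512, 512):    # RBM1 .. RBM4
    W, a, b, H = train_rbm(H, n_hid)
```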
The success rate threshold of the DBN training is set to 90%; when the test result exceeds 90%, the DBN training is complete.
Step five: the input underwater acoustic signals are processed by beamforming and then input to the DBN; the trained DBN serves as the front-end processing module of the trained CNN, and the judgment module serves as the back-end output module of the CNN.
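The beamforming preprocessing is not specified further; a minimal sketch of a conventional frequency-domain delay-and-sum beamformer for a uniform line array, with all array parameters assumed:

```python
import numpy as np

def delay_and_sum(x, fs, spacing, c, theta):
    """Frequency-domain delay-and-sum beamforming for a uniform line array,
    a conventional choice (the patent does not fix the beamformer type).
    x: (n_sensors, n_samples) time series, fs: sample rate (Hz),
    spacing: element spacing (m), c: nominal sound speed (m/s),
    theta: steering angle (rad). Returns the beamformed time series."""
    n_sensors, n_samples = x.shape
    X = np.fft.rfft(x, axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    delays = np.arange(n_sensors) * spacing * np.sin(theta) / c   # seconds
    steer = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft((X * steer).mean(axis=0), n=n_samples)
```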
Step six: the judgment module identifies the output label of the CNN; when [0,1] is output, a water surface target is judged, and when [1,0] is output, an underwater target is judged. Autonomous identification of underwater and water surface targets is thereby realized.
FIG. 1 is a schematic flow chart of the deep-learning-based surface/underwater classification provided by the method of the invention. As shown in the figure, the specific flow is: collect in advance the sound speed profile SSP of the sea area where the target is located; simulate a large number of interference fringe images at different source depths and ranges using the SSP and the ray acoustic model; train the DBN and the CNN with these fringe images as the training set; then use the trained DBN as the front-end processing module of the trained CNN and the judgment module as the back-end module of the CNN, forming an automatic target classification system; finally, the input underwater acoustic target signal is processed by beamforming and fed to the DBN, producing the autonomous underwater/surface target classification result.
FIG. 2 shows interference fringe images simulated by the method of the present invention; as shown, the target depths are 5, 11, 20, 50, 150 and 300 m, the ranges are all 5-30 km, the frequency bands are all 50-500 Hz, and the frequency resolution is 1 Hz.
FIG. 3 is a typical deep-sea sound speed profile: sound velocity values are taken every 2 m in the 0-100 m range (to the maximum depth of the mixed layer), every 20 m in the 100-1190 m range (to the depth of the sound-channel axis), every 100 m in the 1190-4500 m range (to the target source depth), and every 1000 m in the 4500-5500 m range (to the maximum sea depth), with the values kept to four decimal places.
FIG. 4 is the CNN network model adopted by the method of the present invention, in which each convolutional layer is denoted Conv and each pooling layer Pool, followed by the layer's serial number; the four parameters after the colon denote, from left to right: the number of hidden neurons, the zero padding, the kernel size, and the kernel stride.
FIG. 5 is the DBN network model adopted by the method of the present invention, where V denotes a visible layer and H a hidden layer. Since the DBN is formed by stacking four RBMs, the hidden layer H1 of RBM1 serves as the visible layer V2 of the next RBM2, the hidden layer H2 of RBM2 as the visible layer V3 of RBM3, and the hidden layer H3 of RBM3 as the visible layer V4 of RBM4; after the hidden layer H4 of RBM4, a BP neural network is added as the last layer. The parameter following each layer denotes its number of hidden neurons.
FIG. 6 compares the input and output of the DBN model proposed by the method on test air gun data, where (a) is the input interference fringe image of the air gun target and (b) is the output interference fringe image after DBN optimization. The input data are air gun data from an offshore test; the air gun depth is 11 m (a water surface target) and the receiving range is 6-20 km. Because the target radiation noise signal was collected only every 1.9 km, the measured interference fringes in FIG. 6(a) are blurred; after the DBN, the clearer striated interference fringe image of FIG. 6(b) is obtained, showing that the DBN optimizes the fringe image.
FIG. 7 shows the loss rate of the proposed CNN model trained on simulated images, together with the success rate on experimental test data. The 6000 simulated images include clean complete images, blurred images and images with random depth errors; the red line in the figure shows the loss rate falling as low as 6.42%. A total of 106 experimental images were used for testing, collected at the same location as the SSP in FIG. 3. The results show that the CNN with the DBN module correctly identifies 94 images (88.68% success), while the CNN without the DBN module correctly classifies only 87 images (82.08%), a difference of 6.6%, indicating that the DBN effectively improves the classification success rate and optimizes the images.
The method shows a clear effect in a typical embodiment. Compared with the prior art, it automatically classifies the sound field interference fringe images of surface and underwater targets by deep learning, achieving the classification of water surface and underwater targets with a success rate of 88.68% verified on measured data.

Claims (5)

1. A method for classifying objects on water surface and underwater based on interference fringes and deep learning is characterized by comprising the following steps:
step 1: obtaining the temperature T, the depth h, the salinity S and the hydrostatic pressure P of a certain sea area; and calculating the sound velocity under the corresponding depth according to an empirical formula of the sound velocity:
$$C_h = 1449.2 + 4.6T - 0.055T^2 + 0.00029T^3 + (1.34 - 0.01T)(S - 35) + 0.0168P$$

wherein: h is the depth, C_h is the sound velocity at the corresponding depth h, T is the temperature, S is the salinity, and P is the hydrostatic pressure under standard atmospheric conditions;
plotting sound velocity profile (SSP): calculating a sound velocity value every 2m within the range from the sea level to the maximum depth of the mixed layer, calculating a sound velocity value every 20m within the range from the maximum depth of the mixed layer to the sound channel axis depth, calculating a sound velocity value every 100m within the range from the sound channel axis depth to the sound source depth, and calculating a sound velocity value every 1000m within the range from the sound source depth to the seabed depth;
step 2: simulating by using a Gaussian ray acoustic model BELLHOP and a known sound velocity profile (SSP) to obtain a sound field interference fringe image set; the fringe image comprises a non-interference complete image, a fuzzy image and an image with random depth errors under the conditions of different sound source depths and different distances;
step 3, training the CNN network with the sound field interference fringe images: the input image of the CNN is a grayscale image of size 256 × 256; the CNN network model comprises five convolutional layers Conv, five rectified linear units ReLU, four pooling layers Pool and an output layer, where the output layer comprises a fully connected layer FC and a Softmax layer; each convolutional layer in the CNN consists of a plurality of convolutional neural units, whose parameters are optimized by the back-propagation algorithm;
the calculation formula of each convolutional layer is as follows:
$$y_{i,j} = \sum_{u=0}^{m-1} \sum_{v=0}^{n-1} w(u,v)\, x_{i+u,\, j+v}$$

where m × n is the size of the kernel, w(u,v) is the weight of the kernel at position (u,v), x_{i,j} is the input of the convolutional layer, y_{i,j} is the output of the convolution operation, and i, j index the position of the convolution kernel;
the maximum pooling operation is used:

$$y_{i,j} = \max_{(u,v) \in R_{i,j}} x_{u,v}$$

where R_{i,j} is the pooling region assigned to output position (i, j);
the output y of each layer of the rectifying linear units ReLu is defined as follows:
$$y = \max(0,\, y_{i,j})$$
tiling all two-dimensional image features output by the previous pooling layer into fully-connected one-dimensional vectors, and then inputting the fully-connected one-dimensional vectors into a Softmax layer, wherein the calculation formula of Softmax is as follows:
$$P_i = \frac{e^{y_i}}{\sum_{j=1}^{Q} e^{y_j}}$$

where Q represents the number of classes (Q = 2); i, j index the target classes; the input is the fully connected one-dimensional vector x_i from the previous layer; the output y is the target class, y = i or j; and P_i is the probability of each class computed from the fully connected layer;
the kernel size of each Conv layer is set to 3 and the kernel stride to 1; zero padding is 10 for the first Conv layer and 1 for all the others; the numbers of hidden neurons in the Conv layers are 64, 64, 128, 256 and 512, respectively; the stride and kernel size of each Pool layer are set to 2, with 64, 128, 256 and 512 hidden neurons, respectively; the fully connected layer has 4096 hidden neurons, the Softmax layer has 2 neurons, and the output contains only "0" and "1", i.e. the Softmax layer outputs a binary classification label: when the sound source depth is at most 20 m, the output is [0,1]; conversely, when the sound source depth exceeds 20 m, the output is [1,0];
The success rate of the CNN training is set to 94%, and when the test result is greater than 94%, the CNN training is finished;
step 4, training the DBN network with the sound field interference fringe images: the DBN network model comprises 6 layers: four stacked restricted Boltzmann machines (RBM) and a back-propagation (BP) neural network layer; the success rate of the DBN training is set to 90%, and when the test result is greater than 90%, the DBN training is finished;
step 5: the input underwater acoustic signal is processed by beamforming and then input to the DBN; the trained DBN serves as the front-end processing module of the trained CNN, and a judgment module serves as the back-end output module of the CNN;
step 6: when the CNN outputs the label [0,1], a water surface target is judged; when it outputs [1,0], an underwater target is judged.
2. The method for classifying objects above and below water based on interference fringes and deep learning as claimed in claim 1, wherein: the sound velocity values C_h should be retained to four decimal places.
3. The method for classifying objects above and below water based on interference fringes and deep learning as claimed in claim 1, wherein: when simulating the images in step 2, the receiving depth is set close to the seabed, the target sound source depth range is 1-500 m with one source depth every 1 m, and the range is 5-30 km with one value every 100 m; the source signal is broadband, with a 50-500 Hz band and one frequency point every 1 Hz; for the target at each depth, the propagation loss (TL) of every frequency point is computed with the BELLHOP model, and the TL curves of all frequency points are spliced together, yielding the clean, complete interference fringe image at that depth.
4. The method for classifying objects above and below water based on interference fringes and deep learning as claimed in claim 1, wherein: in step 2, for a given sound source depth, the blurred images are simulated by taking the propagation loss (TL) of one frequency point every 2, 3, 4, 5, 6, 7, 8, 9 and 10 Hz within the 50-500 Hz band and splicing the TL curves computed by the ray acoustic model, yielding 9 interference fringe images of different degrees of blur.
5. The method for classifying objects above and below water based on interference fringes and deep learning as claimed in claim 1, wherein: when simulating the images with random depth errors in step 2, for a given sound source depth, the random error range is set to ±10% of the sound source depth, 450 random numbers are generated within this range, and the propagation losses (TL) of all frequency points computed by the BELLHOP model, each at its randomly perturbed source depth, are spliced together to obtain the image with random depth errors.
CN201910225516.9A — filed 2019-03-25 — Method for classifying targets on water surface and underwater based on interference fringes and deep learning — Active

Publications (2)

CN109932708A, published 2019-06-25
CN109932708B, granted 2022-09-23
