CN113239809B

CN113239809B - Underwater sound target identification method based on multi-scale sparse SRU classification model

Info

Publication number: CN113239809B
Application number: CN202110530281.1A
Authority: CN
Inventors: 曾向阳; 杨爽; 薛灵芝
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2021-05-14
Filing date: 2021-05-14
Publication date: 2023-09-15
Anticipated expiration: 2041-05-14
Also published as: CN113239809A

Abstract

The invention relates to a method for identifying underwater sound targets based on a multi-scale sparse SRU classification model, which utilizes different feature expressions learned by SRUs of different levels to perform feature fusion on multi-scale features of input data, and takes the fused feature combination as the feature input of a classifier (the last layer) to complete the classification and identification task of multi-class targets. Meanwhile, jump connection is added in the classification model to accelerate model convergence and reduce training time.

Description

Underwater sound target identification method based on multi-scale sparse SRU classification model

Technical Field

The invention belongs to the field of underwater sound target passive identification under a noise mismatch condition, and particularly relates to an underwater sound target identification method based on a multi-scale sparse SRU classification model.

Background

Underwater sound target identification is one of the important research directions in the field of underwater sound signal processing. Owing to noise in the complex marine environment, a recognition method that performs very well in laboratory simulation may fall short of expectations when applied to a real scene. Suppressing noise interference and improving the robustness of the identification method are therefore critical for its practical application.

In recent years, deep learning (Deep Learning, DL) theory has attracted wide attention and offers a new approach to underwater sound target recognition. Wang Jiang et al. studied the applicability of deep learning methods to underwater acoustic target recognition, built convolutional neural network (Convolutional Neural Network, CNN) and deep belief network (Deep Belief Network, DBN) models, and recognized 3 classes of measured underwater acoustic targets. The results show that using a deep learning model for underwater sound target identification is a viable approach. Lu Anan et al. applied long short-term memory (Long Short-Term Memory, LSTM) networks to underwater acoustic target recognition. On MFCC features, the LSTM structure recognizes better than the conventional recurrent neural network (Recurrent Neural Network, RNN), CNN and deep neural network (Deep Neural Network, DNN), which indicates that a network can perform better by using more time-sequence information. However, the dependence of each time step on the previous one makes an LSTM network, like other RNNs, difficult to compute in parallel, so it is far slower than a CNN. Tao Lei et al. proposed the simple recurrent unit (Simple Recurrent Unit, SRU) on the basis of their study of the LSTM model and demonstrated its effectiveness on natural language processing (Natural Language Processing, NLP) tasks: on classification and question-answering datasets, the SRU is 5 to 9 times faster than a cuDNN-optimized LSTM.

The SRU has not yet been applied to passive underwater sound target recognition, although it has been applied to problems such as speech recognition and text recognition. Those applications complete the classification and recognition task merely by increasing the depth and the number of hidden nodes of the network, or by combining the SRU with other network structures such as CNN. They do not use a parallel sparse structure to fuse multi-scale SRU output features, and therefore do not complete the recognition task through feature fusion that enriches the recognition feature information.

Disclosure of Invention

The technical problem solved by the invention is as follows: in order to simplify the workflow of a passive underwater sound target recognition system, enrich the recognition feature information, maintain a high correct recognition rate of the classification model and improve adaptability under a noise mismatch condition, the invention introduces the SRU into underwater sound target recognition and provides a multi-scale sparse SRU classification model. Using the end-to-end processing capability of deep learning, and given that the structural characteristics of the time-domain waveform carry timbre information that can distinguish different types of underwater sound target signals, a passive underwater sound target recognition system is constructed to classify and recognize targets. The following problems are solved: (1) reliance on manually extracted features is avoided, which simplifies the construction of the passive underwater sound target recognition system; (2) SRUs with different numbers of layers learn, under supervision, multi-scale feature representations of the underwater sound target time sequence (time-domain waveform), which enriches the recognition feature information; (3) a jump connection is added to each SRU block in the model, so that the mapping between input and output converges more easily than when the model learns it directly, which reduces the training time of the model; (4) a high correct recognition rate is maintained when the noise conditions of the training samples and the test samples do not match, so the network is robust to noise.

The technical scheme of the invention is as follows: the underwater sound target identification method based on the multi-scale sparse SRU classification model is characterized by comprising the following steps of:

step 1: acquiring and preprocessing underwater sound target sample data, comprising the following three substeps:

step 1.1: defining a plurality of underwater sound targets, wherein n (n ≥ 3) types of underwater sound targets are taken as research objects and labeled, and each of the n types has m (m ≥ 15) audio files; intercepting the first t (t ≥ 5) seconds of each audio file of each type, and framing the result to obtain underwater sound target sample data;

step 1.2: dividing the obtained underwater sound target sample data into a training set, a verification set and a test set; and

step 1.3: standardizing the training set data and the verification set data, and standardizing the test set data after adding band-limited white noise at different signal-to-noise ratios (SNR) to it;

step 2: taking the time-domain waveforms of the training set, the verification set and the test set standardized in step 1.3 as the input of a multi-scale sparse SRU classification model, and performing model training and testing, comprising the following substeps:

step 2.1: constructing the multi-scale sparse SRU classification model, comprising the following two substeps:

step 2.1.1: constructing multi-scale SRU blocks on the basis of a single-layer SRU model,

wherein, the calculation of single layer SRU is as follows:

wherein x_t is the input at the t-th time step; σ is the sigmoid function, which maps its input to between 0 and 1; h_{t-1} is the hidden state at time t-1; W_f, W_r and W are parameter matrices; and v_f, v_r, b_f and b_r are parameter vectors to be learned in training. Equations (1) and (2) define the forget gate f_t and the reset gate r_t at time t, respectively; equation (3) defines the candidate hidden state at time t; equation (4) defines the final hidden state h_t at time t and contains the jump connection term; g is the nonlinear tanh function, which maps its input to between -1 and 1; and ⊙ is the element-wise product operator.

Step 2.1.2: constructing sparse connections

The SRU blocks with multiple scales obtained in the step 2.1.1 are subjected to sparse connection to obtain fusion forms of feature expressions learned by different SRU blocks, and the fused feature combinations are used as feature input of a classifier (model top layer) to finish classification and identification tasks of 3 types of targets;

step 2.2: model training is carried out on the training set and the verification set after the standardization processing, the actual output probability is calculated through forward propagation, network parameters are updated through a backward gradient propagation algorithm, and the loss value of a loss function is reduced to continuously reduce errors, so that the actual output probability of the model is more and more close to the expected output probability;

step 2.3: in order to test how well the model trained under the mismatch condition performs on an unseen data set, the test set is used as the input of the saved optimal model to further test the classification performance of the model; during testing, the F1 value is used to measure the error of the network model, the F1 value being defined as in formula (5):

F1 = 2PR / (P + R)    (5)

where P is the precision, i.e. the percentage of all targets predicted to be target i (i = 1, 2, 3) that are actually target i; R is the recall, i.e. the percentage of all targets that are actually target i that are successfully predicted as target i. The F1 value can be regarded as the harmonic mean of P and R and is used to evaluate the merits of different algorithms. In the invention, the higher the F1 value, the better the classification performance of the model.

Step 3: taking the correct recognition rate of the verification set obtained in step 2.2 as the evaluation index of the training process, and taking the F1 value of step 2.3 as the evaluation index of the testing process.

A further technical scheme of the invention is as follows: the preprocessing in step 1 frames each segment of target data, with the frame length and the frame shift specified, to obtain the overall underwater sound target sample data.

A further technical scheme of the invention is as follows: the standardization in step 1.3 is zero-mean and variance normalization.

A further technical scheme of the invention is as follows: the model in step 2.1.2 consists of an input layer, 4 SRU blocks, a multi-feature layer and 1 fully connected layer; each SRU block consists of one SRU and one Layer Normalization layer; the Layer Normalization layer is used in RNNs to normalize the target input along the channel direction; each SRU block adds a jump connection between the model input and the multi-feature layer, forming a local structure of the network.

A further technical scheme of the invention is as follows: the SRUs in the 4 SRU blocks have 1, 2, 3 and 4 layers, with 16, 32, 64 and 256 hidden nodes, respectively. Each SRU block is fully connected (non-sparse) internally, and the 4 SRU blocks are sparsely connected. The top of the model is 1 fully connected layer, which has no activation and serves directly as the decision layer to output the actual output probability.

Effects of the invention

The technical effects of the invention are as follows: for the task of classifying and identifying 3 types of measured underwater sound targets, the invention constructs a multi-scale sparse SRU classification model based on the supervised simple recurrent unit.

1) The multi-scale sparse SRU classification model provided by the invention connects SRUs with different numbers of layers in parallel to obtain multi-scale feature information, so that the network model reaches a correct recognition rate of 96.7% on the verification set.

2) The multi-scale sparse SRU classification model exploits the strength of recurrent neural networks in processing time sequences: SRUs with different numbers of layers learn, under supervision, multi-scale feature representations of the underwater sound target time sequence (time-domain waveform) (constructing multi-scale SRU blocks), and these feature representations are fused (constructing sparse connections) to complete the model.

3) In the iterative training process of step 2.2, the parameters of the multi-scale sparse SRU classification model are optimal at the 4th training round, where the optimal correct recognition rate of the verification set is obtained. By comparison, the multi-scale sparse SRU classification model without jump connections reaches its optimal correct recognition rate on the verification set only at the 21st training round, which shows that adding jump connections accelerates model convergence and reduces model training time.

4) Compared with a CNN classification model, the multi-scale sparse SRU classification model maintains a higher correct recognition rate when the noise conditions of the training samples and the test samples do not match, and is a network that is robust to noise.

Drawings

FIG. 1 is a flow chart of a hydroacoustic target recognition system based on a multi-scale sparse SRU classification model

FIG. 2 is a framework structure diagram of a multi-scale sparse SRU classification model

FIG. 3 is a diagram of a framework of a multi-layered CNN model

Detailed Description

In the description of the present invention, it should be understood that terms such as "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise" and "counterclockwise" indicate orientations or positional relationships based on those shown in the drawings. They are used merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be configured and operated in a specific orientation; they should therefore not be construed as limiting the present invention.

Referring to figs. 1-3, in order to improve the recognition performance of a classification and recognition system in a noise mismatch environment, the invention provides an underwater sound target recognition method based on a multi-scale sparse SRU classification model, inspired by the Inception structure of CNN models. The method uses the different feature expressions learned by SRUs at different levels to fuse the multi-scale features of the input data, and uses the fused feature combination as the feature input of a classifier (the last layer) to complete the classification and identification of multiple classes of targets. In addition, jump connections are added to the classification model to accelerate model convergence and reduce training time.

Step 1: Acquisition and preprocessing of underwater sound target sample data.

A plurality of underwater sound targets are defined, of which 3 types are taken as research objects and labeled; each of the 3 types has 15 audio files. The first 5 seconds of each audio file of each type are intercepted and framed to obtain underwater sound target sample data. The underwater sound target data is then strictly divided into a training set, a verification set and a test set. Finally, a noise mismatch condition is constructed for the test set.
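The noise mismatch construction can be illustrated with a minimal Python sketch. It assumes that the noise is scaled so that the ratio of signal power to noise power matches the requested SNR, and that standardization is applied per frame; neither detail is stated in the text. For brevity the sketch uses full-band white Gaussian noise rather than band-limited noise, and the function names and the toy 50 Hz tone are hypothetical.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose the gain so that 10*log10(signal_power / scaled_noise_power) == snr_db.
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(signal, noise)]


def standardize(frame):
    """Zero-mean, unit-variance normalization of one frame (assumed per frame)."""
    mean = sum(frame) / len(frame)
    var = sum((v - mean) ** 2 for v in frame) / len(frame)
    std = math.sqrt(var) or 1.0  # guard against a constant frame
    return [(v - mean) / std for v in frame]


rng = random.Random(0)
# Toy stand-in for one clean test frame: a 50 Hz tone sampled at 1 kHz.
clean = [math.sin(2 * math.pi * 50 * k / 1000) for k in range(1000)]
noisy = add_noise_at_snr(clean, snr_db=-5, rng=rng)  # noise is added first
test_frame = standardize(noisy)                      # then the frame is standardized
```

The order matters here: the training and verification data are standardized as recorded, whereas the test data is standardized only after the noise has been added.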

Step 2: taking the time domain waveforms of the training set, the verification set and the test set which are subjected to the standardization processing in the step 1 as the input of the multi-scale sparse SRU classification model created in the invention, and carrying out model training and testing, wherein the method comprises the following substeps:

Step 2.1: Constructing the multi-scale sparse SRU classification model, comprising the following two substeps:

step 2.1.1: a multi-scale SRU block is constructed.

Each SRU block is formed by combining a single-layer or multi-layer stacked SRU with a jump connection. Different SRU blocks learn feature expressions of the input data through SRUs stacked to different depths; because the number of stacked layers and the number of hidden nodes differ, the learned features differ, and they can be regarded as different feature expressions of the same input data. Shallow stacks with fewer hidden nodes capture low-level features, while deep stacks with more hidden nodes learn high-level features. Low-level features reflect the local characteristics of the data, and high-level features are abstract and invariant. Applying jump connections in a deeper network structure makes the mapping between input and output easier to converge than when a deep model learns it directly, and also addresses the vanishing-gradient problem of the SRU during training.

The single layer SRU is calculated as follows:
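Equations (1) to (4) can plausibly be written as below. This is a sketch that assumes the formulation follows the simple recurrent unit of Lei et al., combined with the parameter vectors v_f, v_r and the activation g named in the symbol definitions. The symbol \tilde{h}_t for the candidate hidden state and the use of h_{t-1} as the recurrent quantity (Lei et al. use a separate internal state c_{t-1}) are assumptions and are not taken from the source.

```latex
\begin{align}
f_t &= \sigma\left(W_f x_t + v_f \odot h_{t-1} + b_f\right) \tag{1}\\
r_t &= \sigma\left(W_r x_t + v_r \odot h_{t-1} + b_r\right) \tag{2}\\
\tilde{h}_t &= f_t \odot h_{t-1} + (1 - f_t) \odot (W x_t) \tag{3}\\
h_t &= r_t \odot g(\tilde{h}_t) + (1 - r_t) \odot x_t \tag{4}
\end{align}
```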

wherein x_t is the input at the t-th time step; σ is the sigmoid function, which maps its input to between 0 and 1; h_{t-1} is the hidden state at time t-1; W_f, W_r and W are parameter matrices; and v_f, v_r, b_f and b_r are parameter vectors to be learned in training. Equations (1) and (2) define the forget gate f_t and the reset gate r_t at time t, respectively. Equation (3) defines the candidate hidden state at time t. Equation (4) defines the final hidden state h_t at time t; it contains a jump connection term which, following the idea of the Highway Network, effectively addresses the vanishing-gradient problem that arises when training deep networks. The activation function g is the tanh function, which maps its input to between -1 and 1, and ⊙ is the element-wise product operator.
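As an illustration, one time step of such a unit can be sketched in plain Python. The sketch assumes the forget gate σ(W_f x_t + v_f ⊙ h_{t-1} + b_f), a reset gate of the same form, a candidate state f_t ⊙ h_{t-1} + (1 - f_t) ⊙ (W x_t), and an output r_t ⊙ g(candidate) + (1 - r_t) ⊙ x_t. This form is modeled on the SRU of Lei et al. and is not confirmed by the source. The function and parameter names are hypothetical, and the input and hidden sizes are taken to be equal so that the jump connection term can be added element-wise.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sru_step(x_t, h_prev, p):
    """One time step of a single-layer SRU (assumed form, see the lead-in)."""
    wf_x = matvec(p["W_f"], x_t)
    wr_x = matvec(p["W_r"], x_t)
    w_x = matvec(p["W"], x_t)
    h_t = []
    for i in range(len(x_t)):
        f = sigmoid(wf_x[i] + p["v_f"][i] * h_prev[i] + p["b_f"][i])  # forget gate
        r = sigmoid(wr_x[i] + p["v_r"][i] * h_prev[i] + p["b_r"][i])  # reset gate
        cand = f * h_prev[i] + (1.0 - f) * w_x[i]                     # candidate state
        # Gated tanh output plus the jump connection term (1 - r) * x.
        h_t.append(r * math.tanh(cand) + (1.0 - r) * x_t[i])
    return h_t


d = 2
zeros = [[0.0] * d for _ in range(d)]
params = {"W_f": zeros, "W_r": zeros, "W": zeros,
          "v_f": [0.0] * d, "v_r": [0.0] * d,
          "b_f": [0.0] * d, "b_r": [0.0] * d}
h = sru_step([1.0, -2.0], [0.0, 0.0], params)
print(h)  # [0.5, -1.0]
```

With all parameters at zero both gates equal 0.5 and the candidate state is zero, so the output is half of the input: only the jump connection term contributes, which shows how the input can pass through the unit even when the recurrent path carries nothing.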

Step 2.1.2: Constructing the sparse connection.

Inspired by the Inception structure of CNN models, the network does not increase its expressive power simply by increasing the depth and the number of hidden nodes of the model; instead, it realizes multi-scale feature expression through a non-uniform sparse structure. Such multi-scale feature expression can also be applied in RNNs. As a parallelizable RNN, the SRU offers both time-sequence modeling capability and low computational cost. Feature fusion, in turn, enriches the recognition feature information. The multi-scale SRU blocks obtained in step 2.1.1 are sparsely connected to obtain a fused form of the feature expressions learned by the different SRU blocks, and the fused feature combination is used as the feature input of the classifier (the top layer of the model) to complete the classification and identification of the 3 classes of targets.

Step 2.2: training process.

The standardized training set and verification set are used as the input of the model to train it. In each training iteration, the training set is first propagated forward to compute the actual output probability; the network parameters are then updated by the gradient back-propagation algorithm, which reduces the value of the loss function and hence the error, so that the actual output probability moves closer to the expected output probability. The expected and actual output probabilities are 3-dimensional probability tensors, and the index of the maximum value in a probability tensor is the target label i (i = 1, 2, 3). Comparing the actual target labels of the training set samples (labeled in step 1.1) with the predicted target labels (the index of the maximum value in the actual output probability tensor produced by the network) gives the correct recognition rate of the training set, i.e. the percentage of all target labels that are correctly classified. The verification set is then propagated forward to compute its actual output probabilities, and comparing the actual target labels of the verification set samples with the predicted target labels gives the correct recognition rate of the verification set. After the iterative training is complete, the trained optimal model (including the optimal model parameters) is saved.
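The label comparison described above amounts to an argmax followed by a hit count. A minimal sketch, with hypothetical function names and made-up output values, using the 1-based labels i = 1, 2, 3 from the text:

```python
def predicted_label(probabilities):
    """Index of the largest output value, reported as a 1-based target label."""
    best = max(range(len(probabilities)), key=lambda k: probabilities[k])
    return best + 1


def correct_recognition_rate(outputs, true_labels):
    """Percentage of samples whose predicted label equals the true label."""
    hits = sum(1 for out, y in zip(outputs, true_labels)
               if predicted_label(out) == y)
    return 100.0 * hits / len(true_labels)


# Illustrative 3-dimensional outputs for four samples (not measured data).
outputs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3], [0.4, 0.35, 0.25]]
labels = [1, 3, 2, 2]
rate = correct_recognition_rate(outputs, labels)
print(rate)  # 75.0
```

The same function applies unchanged to the training set and to the verification set; only the samples passed in differ.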

Step 2.3: Testing.

In order to test how well the model trained under the mismatch condition performs on an unseen data set, the test set is used as the input of the saved optimal model to further test the classification performance of the model. During testing, the F1 value is used to measure the error of the network model, the F1 value being defined as in formula (5), F1 = 2PR / (P + R), where P is the precision and R is the recall.
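A per-target F1 value can be computed directly from the true and predicted labels. The sketch below uses the usual definition F1 = 2PR / (P + R), with P and R expressed as fractions rather than percentages; the function name and the label lists are hypothetical, and how the per-target values are combined into a single figure per SNR is not stated in the text, so no averaging is shown.

```python
def f1_for_target(true_labels, predicted_labels, target):
    """F1 value of one target class from paired true and predicted labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for y, p in pairs if y == target and p == target)
    predicted = sum(1 for p in predicted_labels if p == target)
    actual = sum(1 for y in true_labels if y == target)
    precision = tp / predicted if predicted else 0.0  # P: correct among predicted
    recall = tp / actual if actual else 0.0           # R: found among actual
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Illustrative labels for six test samples (not measured data).
true_labels = [1, 1, 2, 2, 3, 3]
predicted_labels = [1, 2, 2, 2, 3, 1]
f1_values = {i: f1_for_target(true_labels, predicted_labels, i) for i in (1, 2, 3)}
```

In this toy case target 2 has P = 2/3 and R = 1, giving F1 = 0.8, while target 3 has P = 1 and R = 0.5, giving F1 of about 0.667; the measure penalizes an imbalance between precision and recall in either direction.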

Step 3: Model evaluation.

The correct recognition rate of the verification set obtained in step 2.2 is taken as the evaluation index of the training process, and the F1 value of step 2.3 as the evaluation index of the testing process. Evaluating the model with these 2 indexes, one from training and one from testing, gives a more comprehensive assessment. To demonstrate the effectiveness of the invention, a multi-layer CNN model operating on the time sequence is used for comparison: the multi-scale sparse SRU classification model constructed in step 2.1 is replaced by the multi-layer CNN model with all other steps unchanged, the same 2 evaluation indexes are obtained for training and testing, and the results are compared with those of the multi-scale sparse SRU classification model.

The underwater sound target recognition method of the invention is described in detail with reference to examples and drawings, and the flow of an underwater sound target recognition system is shown in fig. 1. The underwater sound target identification method based on the multi-scale sparse SRU classification model is realized by programming in a Python language PyTorch environment. The training and verification of the model are performed on the GPU, and the cuDNN is used for accelerating the training speed of the model.

Step 1: 3 types of labeled underwater sound targets (ships, commercial ships and a certain type of underwater target) are read; 15 audio files are selected for each type, and 5 s is intercepted from each audio file. The underwater sound target data is first preprocessed: each segment of target data is framed with a frame length of 100 ms and a frame shift of 0, i.e. consecutive frames do not overlap. Every 0.1 s of target data therefore forms one sample, giving 2250 samples in total for the 3 types of targets. The data is strictly divided into a training set, a verification set and a test set: 3/5 of the samples are used for training, 1/5 for verification and 1/5 for testing. The training and verification data of the 3 types of targets are standardized (zero-mean and variance normalization), and their time-domain waveforms are used as the input of the network model to build a time-sequence network model. For testing, in order to construct a noise mismatch condition, band-limited white noise is added to the test data of each of the 3 types of targets to generate test data with SNRs of -20 dB, -15 dB, -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB and 20 dB, and the test data is then standardized.
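The sample counts in this step can be checked with a short framing sketch. The sampling rate is not given in the text, so SAMPLE_RATE below is a hypothetical value; the frame count does not depend on it. The function name is also hypothetical, and the frame shift of 0 is read as non-overlapping frames.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sequence into frames of frame_len samples, advancing by hop samples."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


SAMPLE_RATE = 16000   # hypothetical: the source does not state the sampling rate
CLIP_SECONDS = 5      # each audio file is cut to 5 s
FRAME_MS = 100        # 100 ms frames

frame_len = SAMPLE_RATE * FRAME_MS // 1000
clip = [0.0] * (SAMPLE_RATE * CLIP_SECONDS)            # placeholder for one 5 s clip
frames = frame_signal(clip, frame_len, hop=frame_len)  # no overlap between frames

total = 3 * 15 * len(frames)                 # classes x files x frames per file
n_train, n_val = total * 3 // 5, total // 5  # 3/5 training, 1/5 verification
n_test = total - n_train - n_val             # remaining 1/5 for testing
print(len(frames), total, n_train, n_val, n_test)  # 50 2250 1350 450 450
```

Each 5 s clip yields 50 frames, so 3 types of 15 files give the 2250 samples stated above, split into 1350 training, 450 verification and 450 test samples.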

Step 2.1: Constructing the multi-scale sparse SRU classification model. The framework of the model is shown in fig. 2. The model consists of an input layer, 4 SRU blocks (inside the dashed boxes in the figure), a multi-feature layer and 1 fully connected layer. Each SRU block consists of one SRU and one Layer Normalization layer. Layer Normalization is mainly used in RNNs; it normalizes the target input along the channel direction so that the data is distributed consistently in that direction. In addition, each SRU block adds a jump connection (skip connection) between the model input and the multi-feature layer, forming a local structure of the network. Applying jump connections in a deeper network structure makes the mapping between input and output easier to converge than when a deep model learns it directly, and also addresses the vanishing-gradient problem of the SRU during training. The SRUs in the 4 SRU blocks have 1, 2, 3 and 4 layers, with 16, 32, 64 and 256 hidden nodes, respectively. Each SRU block is fully connected (non-sparse) internally, and the 4 SRU blocks are sparsely connected. The multi-feature layer contains the feature expressions learned by the SRUs of different depths. Batch Normalization is applied after the multi-feature layer; normalizing each batch of data speeds up network convergence while preventing vanishing and exploding gradients. The top of the model is 1 fully connected layer, which has no activation and serves directly as the decision layer to output the actual output (probability).
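The data flow from the 4 SRU blocks through the multi-feature layer to the decision layer can be sketched as follows. The sketch assumes that the multi-feature layer concatenates one feature vector per block, in the manner of an Inception module; the text does not state the fusion operator. Layer Normalization, Batch Normalization and the jump connections are omitted, random vectors stand in for the SRU block outputs, and all names are hypothetical.

```python
import random

HIDDEN_SIZES = [16, 32, 64, 256]  # hidden nodes of the 4 SRU blocks (from the text)
NUM_CLASSES = 3


def fuse(block_outputs):
    """Multi-feature layer, assumed here to be a plain concatenation."""
    fused = []
    for features in block_outputs:
        fused.extend(features)
    return fused


def fully_connected(features, weights, bias):
    """Linear layer with no activation: one raw score per class."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


rng = random.Random(0)
# Stand-ins for the final feature vector produced by each SRU block.
block_outputs = [[rng.uniform(-1, 1) for _ in range(n)] for n in HIDDEN_SIZES]
fused = fuse(block_outputs)

weights = [[rng.uniform(-0.1, 0.1) for _ in fused] for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES
scores = fully_connected(fused, weights, bias)
print(len(fused), len(scores))  # 368 3
```

Under the concatenation assumption the fused feature vector has 16 + 32 + 64 + 256 = 368 entries, and the decision layer maps it to one score per target class.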

Step 2.2: Model training. The time-domain waveforms of the training set and the verification set of the 3 types of targets are used as the input of the network model to train it. The training parameters are set as follows: the network is randomly initialized, the loss is computed with the sparse categorical cross-entropy loss function, the gradients are optimized with the adaptive moment estimation (Adam) algorithm, the learning rate is 0.001 and the number of training rounds is 50. After training, the optimal correct recognition rate of the verification set and the corresponding network model parameters are obtained, and the optimal model (with optimal parameters) is saved for subsequent testing.
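The sparse categorical cross-entropy loss takes the raw class scores of the decision layer and an integer class label, rather than a one-hot vector. A minimal sketch of the per-sample loss, assuming the scores are converted to probabilities by a softmax inside the loss (the usual convention when the last layer has no activation); the function name is hypothetical and the label is 0-based here:

```python
import math


def sparse_categorical_cross_entropy(scores, label):
    """Cross-entropy of raw class scores against an integer label (0-based)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum - scores[label]  # equals -log(softmax(scores)[label])


loss_uniform = sparse_categorical_cross_entropy([0.0, 0.0, 0.0], label=1)
loss_confident = sparse_categorical_cross_entropy([0.0, 8.0, 0.0], label=1)
```

Equal scores for the 3 targets give a loss of ln 3, about 1.0986, while a large score on the correct target drives the loss toward zero; training lowers this value averaged over the training samples.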

Step 2.3: Model testing. In order to test the recognition performance of the network model under the noise mismatch condition, the trained network model is applied to the underwater sound target classification and recognition task under that condition, and the F1 values of the network model at different SNRs are obtained for further analysis of the model.

Step 3: Model evaluation. The correct recognition rate of the verification set obtained in step 2.2 is the evaluation index of the training process, and the F1 value of step 2.3 is the evaluation index of the testing process. To demonstrate the effectiveness of the invention in practical verification, a multi-layer CNN model operating on the time sequence is used for comparison. The framework of the multi-layer CNN model is shown in fig. 3. The model consists of an input layer, 3 one-dimensional convolution layers, 3 one-dimensional pooling layers and 2 fully connected layers. Each convolution layer uses ReLU activation, with a Batch Normalization operation performed before the activation, and is followed by a one-dimensional pooling layer that reduces the spatial size of the data. The 3 one-dimensional convolution layers have 32, 64 and 128 convolution kernels respectively, and the one-dimensional pooling layers have a size of 3. The 1st fully connected layer uses ReLU as its activation function; the input data is batch-normalized by a Batch Normalization operation before the activation, and a dropout layer with a dropout value of 0.5 is added after the activation to prevent overfitting. The 2nd fully connected layer has no activation and serves directly as the decision layer to output the actual output (probability) for each class of underwater sound target sample.

The experimental results are as follows. During training, the correct recognition rate on the verification set is 96.0% with the multi-layer CNN model and 96.7% with the multi-scale sparse SRU classification model, which is higher. In addition, the parameters of the multi-scale sparse SRU classification model are optimal at the 4th training round, where the optimal correct recognition rate of the verification set is obtained. By comparison, the multi-scale sparse SRU classification model without jump connections reaches its optimal correct recognition rate on the verification set only at the 21st training round, which shows that adding jump connections accelerates model convergence and reduces model training time. The F1 values of the 2 models at different SNRs are shown in table 1 below. As table 1 shows, the F1 value of the multi-scale sparse SRU classification model is higher than that of the multi-layer CNN model at low SNR, which indicates that the proposed model suppresses the influence of noise mismatch and is a network that is robust to noise. These results demonstrate the effectiveness of the present invention.

Table 1: F1 values of the 2 models at different SNRs

Claims

1. The underwater sound target identification method based on the multi-scale sparse SRU classification model is characterized by comprising the following steps of:

step 1.1: defining a plurality of underwater sound targets, wherein n types of underwater sound targets are taken as research objects and marked, and n is more than or equal to 3; wherein n types of underwater sound targets are respectively provided with m audio files; wherein m is more than or equal to 15; intercepting each audio file in each type of underwater sound target for t seconds from the beginning, and then subdividing the audio file into frames to obtain underwater sound target sample data, wherein t is more than or equal to 5;

step 2.1: constructing the multi-scale sparse SRU classification model, comprising the following two substeps:

step 2.1.1: constructing multi-scale SRU blocks, wherein each SRU block is formed by combining a single-layer or multi-layer stacked SRU with a jump connection; different SRU blocks learn feature expressions of the input data through SRUs stacked to different depths; because the number of stacked layers and the number of hidden nodes differ, the learned features differ, and they can be regarded as different feature expressions of the input data; shallow stacks with fewer hidden nodes capture low-level features, and deep stacks with more hidden nodes learn high-level features; the low-level features reflect the local characteristics of the data, and the high-level features are abstract and invariant;

Wherein, the calculation of single layer SRU is as follows:

wherein x_t is the input at the t-th time step; σ is the sigmoid function, which maps its input to between 0 and 1; h_{t-1} is the hidden state at time t-1; W_f, W_r and W are parameter matrices; v_f, v_r, b_f and b_r are parameter vectors to be learned in training; equations (1) and (2) define the forget gate f_t and the reset gate r_t at time t, respectively; equation (3) defines the candidate hidden state at time t; equation (4) defines the final hidden state h_t at time t and contains the jump connection term; g is the nonlinear tanh function, which maps its input to between -1 and 1; ⊙ is the element-wise product operator;

step 2.1.2: constructing sparse connections

The SRU blocks with multiple scales obtained in the step 2.1.1 are subjected to sparse connection to obtain fusion forms of feature expressions learned by different SRU blocks, and the fused feature combinations are used as feature input of a classifier, namely a model top layer, so that classification and identification tasks of multiple targets are completed;

the model consists of an input layer, 4 SRU blocks, a multi-feature layer and 1 full-connection layer; wherein each SRU block is composed of one SRU and one Layer Normalization layer; layer Normalization layers are used in RNN networks for normalizing target inputs in the channel direction; each SRU block adds jump connection between the model input and the multi-feature layer to form a local structure of the network;

the SRUs in the 4 SRU blocks are respectively 1 layer, 2 layers, 3 layers and 4 layers; the number of hidden nodes is 16, 32, 64 and 256 respectively; the internal structure of each SRU block is fully connected and non-sparse, and 4 SRU blocks are sparsely connected; the top layer of the model is connected with 1 full-connection layer; the full connection layer is not activated and directly serves as a judging layer to output actual output probability;

where P is the precision, i.e. the percentage of all targets predicted to be target i that are actually target i, i = 1, 2, 3; R is the recall, i.e. the percentage of all targets that are actually target i that are successfully predicted as target i; the F1 value can be regarded as the harmonic mean of P and R; the F1 value is used to evaluate the merits of different algorithms; the higher the F1 value, the better the classification performance of the model;

2. The method for identifying underwater sound targets based on the multi-scale sparse SRU classification model according to claim 1, wherein the preprocessing in step 1 is to frame each segment of target data, and determine the length and frame shift of each frame, so as to obtain overall underwater sound target sample data.

3. The method for identifying the underwater sound target based on the multi-scale sparse SRU classification model according to claim 1, wherein the normalization in the step 3 is zero-mean and variance normalization.