CN113239809A - Underwater sound target identification method based on multi-scale sparse SRU classification model

Info

Publication number: CN113239809A
Application number: CN202110530281.1A
Authority: CN
Inventors: 曾向阳; 杨爽; 薛灵芝
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2021-05-14
Filing date: 2021-05-14
Publication date: 2021-08-10
Anticipated expiration: 2041-05-14
Also published as: CN113239809B

Abstract

The invention relates to an underwater acoustic target recognition method based on a multi-scale sparse SRU classification model. The method uses the different feature representations learned by SRUs of different depths to fuse multi-scale features of the input data, and feeds the fused feature combination to the classifier (the last layer) to classify and recognize multiple classes of targets. Skip connections are also added to the classification model to accelerate convergence and reduce training time.

Description

Underwater sound target identification method based on multi-scale sparse SRU classification model

Technical Field

The invention belongs to the field of passive underwater acoustic target recognition under noise-mismatch conditions, and in particular relates to an underwater acoustic target recognition method based on a multi-scale sparse SRU classification model.

Background

Underwater acoustic target recognition is one of the important research directions and challenges in underwater acoustic signal processing. Because of noise in the complex marine environment, a recognition method that performs well in laboratory simulation may fall short of expectations when it is applied to a real scene. Suppressing noise interference and improving the robustness of the recognition method are therefore critical to its practical application.

In recent years, deep learning (DL) has attracted much attention and has opened up new approaches to underwater acoustic target recognition. Wangqiang et al. studied the applicability of deep learning to underwater acoustic target recognition, built Convolutional Neural Network (CNN) and Deep Belief Network (DBN) models, and recognized 3 measured underwater acoustic targets; the results show that deep learning models are a feasible approach to underwater acoustic recognition. Lu'an et al. applied a Long Short-Term Memory (LSTM) network to underwater acoustic target recognition. On MFCC features, the LSTM structure outperformed a conventional Recurrent Neural Network (RNN), a CNN and a Deep Neural Network (DNN), which shows that a network achieves better results when it exploits more temporal information. However, the dependence of each LSTM time step on the previous one makes RNN computation difficult to parallelize, so it runs far slower than a CNN. Building on a study of the LSTM model, Tao Lei et al. proposed the Simple Recurrent Unit (SRU) and demonstrated its effectiveness on Natural Language Processing (NLP) tasks: on classification and question-answering data sets, the SRU is 5 to 9 times faster than the cuDNN-optimized LSTM.

The SRU has not yet been applied to passive underwater acoustic target recognition, although it has been used for problems such as speech recognition and text recognition. Those applications complete the classification task simply by increasing the depth and the number of hidden nodes of the network, or by combining the SRU with other network structures such as CNNs. They do not use a parallel sparse structure to fuse the output features of multi-scale SRUs, and they do not complete the recognition task by fusing features to enrich the recognition feature information.

Disclosure of Invention

The technical problem solved by the invention is as follows. To simplify the workflow of a passive underwater acoustic target recognition system, enrich the recognition feature information, and keep a high correct recognition rate while improving the adaptability of the classification model under noise mismatch, the invention introduces the SRU into underwater acoustic target recognition and proposes a multi-scale sparse SRU classification model. The method exploits the end-to-end processing capability of deep learning and the timbre information implicit in the structure of the time-domain waveform to distinguish different classes of underwater acoustic target signals, so that a passive underwater acoustic target recognition system can be built and used to classify and recognize targets. The following problems are solved: (1) dependence on manual feature extraction is avoided, which simplifies the construction of the passive underwater acoustic target recognition system; (2) SRUs of different depths learn, under supervision, multi-scale feature representations of the underwater acoustic target time series (time-domain waveform), which enriches the recognition feature information; (3) a skip connection is added to each SRU block, so that the model converges more easily than one that directly learns the mapping between input and output, which reduces training time; (4) a high correct recognition rate is maintained when the noise conditions of the training samples and the test samples do not match, i.e., the network is noise-robust.

The technical scheme of the invention is as follows: the underwater acoustic target identification method based on the multi-scale sparse SRU classification model is characterized by comprising the following steps of:

Step 1: acquiring and preprocessing underwater acoustic target sample data, comprising the following three steps:

Step 1.1: defining a plurality of underwater acoustic targets, of which n (n ≥ 3) classes are taken as research objects and labeled, each of the n classes having m (m ≥ 15) audio files; taking the first t (t ≥ 5) seconds of each audio file of each class, and then framing the excerpt to obtain the underwater acoustic target sample data;

Step 1.2: dividing the obtained underwater acoustic target sample data into a training set, a validation set and a test set;

Step 1.3: standardizing the training-set and validation-set data; adding band-limited white noise at different SNRs to the test-set data and then standardizing it;

step 2: taking the time domain waveforms of the training set, the verification set and the test set after the standardization processing in the step 1.3 as the input of the multi-scale sparse SRU classification model, and carrying out model training and testing, wherein the method comprises the following substeps:

step 2.1: constructing a multi-scale sparse SRU classification model, comprising the following two steps:

Step 2.1.1: constructing the single-layer SRU model as the basis,

wherein the single-layer SRU is calculated according to formulas (1) to (4), in which $X_t$ is the input at time step $t$; $\sigma$ is the sigmoid function, which maps its input to between 0 and 1; $h_{t-1}$ is the hidden state at time $t-1$; $W_f$, $W_r$ and $W$ are parameter matrices; and $v_f$, $v_r$, $b_f$ and $b_r$ are parameter vectors learned during training. Formulas (1) and (2) define the forget gate $f_t$ and the reset gate $r_t$ at time $t$, respectively; formula (3) defines the candidate hidden state at time $t$; and formula (4) defines the final hidden state $h_t$ at time $t$ and contains a skip-connection term. $g$ is the nonlinear tanh function, which maps its input to between -1 and 1, and the product in the formulas is the element-wise (dot-multiplication) operator.

Step 2.1.2: constructing sparse connections

The multi-scale SRU blocks obtained in step 2.1.1 are sparsely connected to fuse the feature representations learned by the different SRU blocks, and the fused feature combination is used as the feature input of the classifier (top layer of the model) to complete the classification and recognition of the 3 classes of targets;

Step 2.2: training the model on the standardized training set and validation set: forward propagation first computes the actual output probability, and the network parameters are then updated by the backpropagation algorithm, so that the value of the loss function, and hence the error, keeps decreasing and the actual output probability of the model approaches the expected output probability;

Step 2.3: to test how well the trained model performs on an unseen data set under the mismatch condition, the test set is fed to the saved optimal model and its classification performance is further tested; during testing, the network model is evaluated with the F1 value, defined as formula (5):

$$\mathrm{F1} = \frac{2PR}{P + R} \qquad (5)$$

where P is the precision, i.e., the percentage of all targets predicted as target i (i = 1, 2, 3) that are actually target i, and R is the recall, i.e., the percentage of all targets that are actually target i that are correctly predicted as target i. The F1 value is the harmonic mean of P and R and is used to compare different algorithms. In the invention, the higher the F1 value, the better the classification performance of the model.

Step 3: taking the validation-set correct recognition rate obtained in step 2.2 as the evaluation index for training, and the F1 value from step 2.3 as the evaluation index for testing.

A further technical scheme of the invention is as follows: the preprocessing in step 1 frames each segment of target data, with a specified frame length and frame shift, to obtain the total underwater acoustic target sample data.

A further technical scheme of the invention is as follows: the standardization in step 1.3 consists of zero-meaning together with variance normalization.

A further technical scheme of the invention is as follows: the model in step 2.1.2 consists of an input layer, 4 SRU blocks, a multi-feature layer and 1 fully connected layer; each SRU block consists of an SRU and a layer normalization layer; layer normalization is mainly used in RNNs and normalizes the target input along the channel dimension; each SRU block adds a skip connection between the model input and the multi-feature layer, forming a local structure of the network.

A further technical scheme of the invention is as follows: the SRUs in the 4 SRU blocks have 1, 2, 3 and 4 layers, respectively, and 16, 32, 64 and 256 hidden nodes, respectively. The structure inside each SRU block is fully connected (non-sparse), whereas the 4 SRU blocks are sparsely connected to one another. The top of the model is 1 fully connected layer, which has no activation and serves directly as the discrimination layer that outputs the actual output probability.
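As a rough illustration of the layer normalization used inside each SRU block, the sketch below normalizes one feature vector to zero mean and unit variance across its feature (channel) dimension. The function name `layer_norm`, the `eps` constant, the sample values, and the omission of the learnable scale and shift parameters are simplifying assumptions, not details taken from this description.

```python
import math


def layer_norm(features, eps=1e-5):
    """Normalize one feature vector across its feature (channel) dimension."""
    n = len(features)
    mean = sum(features) / n
    var = sum((v - mean) ** 2 for v in features) / n  # biased variance
    return [(v - mean) / math.sqrt(var + eps) for v in features]


# Hypothetical SRU output at a single time step.
hidden = [0.5, -1.0, 2.0, 3.5]
normed = layer_norm(hidden)
```

Unlike batch normalization, the statistics here are computed per sample, so the result does not depend on the other samples in the batch.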

Effects of the invention

The technical effects of the invention are as follows: for the task of classifying and recognizing 3 classes of measured underwater acoustic targets, the invention constructs a multi-scale sparse SRU classification model based on the supervised Simple Recurrent Unit.

1) The proposed multi-scale sparse SRU classification model connects SRUs of different depths in parallel to obtain multi-scale feature information, so that the network model achieves a correct recognition rate of 96.7% on the validation set.

2) The multi-scale sparse SRU classification model exploits the strength of recurrent neural networks in processing time series: SRUs of different depths learn, under supervision, multi-scale feature representations of the underwater acoustic target time series (time-domain waveform) by constructing multi-scale SRU blocks, and the model is completed by fusing these feature representations through sparse connections.

3) In the iterative training of step 2.2, the multi-scale sparse SRU classification model reaches its optimal parameters, and its best validation-set correct recognition rate, at the 4th training iteration. By comparison, the same model without skip connections reaches its best validation-set correct recognition rate only at the 21st training iteration. Adding skip connections therefore accelerates model convergence and shortens training time.

4) Comparison experiments with the CNN classification model show that the proposed multi-scale sparse SRU classification model maintains a higher correct recognition rate when the noise conditions of the training samples and the test samples do not match, i.e., it is a noise-robust network.

Drawings

FIG. 1 is a flow chart of an underwater acoustic target recognition system based on a multi-scale sparse SRU classification model

FIG. 2 is a diagram of a multi-scale sparse SRU classification model framework

FIG. 3 is a diagram of a multi-layer CNN model framework

Detailed Description

In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.

Referring to FIG. 1 to FIG. 3, in order to improve the recognition performance of the classification and recognition system in a noise-mismatch environment, the invention, inspired by the CNN model Inception, provides an underwater acoustic target recognition method based on a multi-scale sparse SRU classification model. The method uses the different feature representations learned by SRUs of different depths to fuse multi-scale features of the input data, and feeds the fused feature combination to the classifier (the last layer) to classify and recognize multiple classes of targets. Skip connections are also added to the classification model to accelerate convergence and reduce training time.

Step 1: Acquiring and preprocessing underwater acoustic target sample data.

Several underwater acoustic targets are defined, of which 3 classes are taken as research objects and labeled; each of the 3 classes has 15 audio files. The first 5 seconds of each audio file of each class are taken and framed to obtain the underwater acoustic target sample data. The sample data are strictly divided into a training set, a validation set and a test set.

Finally, a noise-mismatch condition is constructed for the test set.
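One way to construct such a noise-mismatch test set is to scale a noise sequence so that the signal-to-noise ratio hits a chosen value in dB. The sketch below is a minimal illustration only: it uses plain white Gaussian noise (the band-limiting filter is not shown), a toy sine wave in place of real recordings, and hypothetical names such as `add_noise_at_snr`. The SNR levels are the ones listed later in the embodiment.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise scaled so that 10*log10(Ps/Pn) equals snr_db."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    scaled = [scale * n for n in noise]
    noisy = [s + n for s, n in zip(signal, scaled)]
    return noisy, scaled


rng = random.Random(0)
# Toy stand-in for one test frame; real data would be a recorded waveform.
clean = [math.sin(2 * math.pi * 50 * t / 1000) for t in range(1000)]
snr_levels = [-20, -15, -10, -5, 0, 5, 10, 15, 20]

measured = {}
for snr_db in snr_levels:
    noisy, noise = add_noise_at_snr(clean, snr_db, rng)
    measured[snr_db] = 10 * math.log10(mean_power(clean) / mean_power(noise))
```

Because the noise is rescaled from its measured power, the realized SNR matches the requested value up to floating-point error; standardization would then be applied to `noisy`.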

Step 2: taking the time domain waveforms of the training set, the verification set and the test set after the standardization processing in the step 1 as the input of the multi-scale sparse SRU classification model established in the invention, and carrying out model training and testing, wherein the method comprises the following substeps:

Step 2.1.1: Constructing the multi-scale SRU blocks.

Each SRU block combines a single-layer or multi-layer stacked SRU with a skip connection. Different SRU blocks learn feature representations of the input data through SRUs stacked to different depths; because the number of stacked layers and the number of hidden nodes differ, the learned features differ, and they can be regarded as different representations of the same input data. Shallow stacks with fewer hidden nodes capture low-level features, while deep stacks with more hidden nodes learn high-level features. Low-level features reflect local characteristics of the data, whereas high-level features are abstract and invariant. Skip connections are mostly used in deeper network structures: a model with skip connections converges more easily than a deep model that directly learns the mapping between input and output, and they address the vanishing-gradient problem of the SRU during training.

The single-layer SRU is calculated according to formulas (1) to (4), in which $X_t$ is the input at time step $t$; $\sigma$ is the sigmoid function, which maps its input to between 0 and 1; $h_{t-1}$ is the hidden state at time $t-1$; $W_f$, $W_r$ and $W$ are parameter matrices; and $v_f$, $v_r$, $b_f$ and $b_r$ are parameter vectors learned during training. Formulas (1) and (2) define the forget gate $f_t$ and the reset gate $r_t$ at time $t$, respectively. Formula (3) defines the candidate hidden state at time $t$.

Formula (4) defines the final hidden state $h_t$ at time $t$ and contains a skip-connection term, which borrows the idea of the Highway Network and effectively addresses the vanishing-gradient problem that deep networks encounter during gradient-based training. The activation function $g$ is the tanh function, which maps its input to between -1 and 1, and the product in the formulas is the element-wise (dot-multiplication) operator.
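For reference, the recurrence described above can be written out in the form of the published SRU (Lei et al.). This is a reconstruction under assumptions rather than the patent's verbatim formulas: the symbol $c_t$ is introduced here for the candidate hidden state, the gates are assumed to read the previous candidate state $c_{t-1}$ (the description names $h_{t-1}$ as the previous hidden state), $\odot$ denotes the element-wise product, and the initial state is assumed to be zero.

$$f_t = \sigma(W_f X_t + v_f \odot c_{t-1} + b_f) \qquad (1)$$

$$r_t = \sigma(W_r X_t + v_r \odot c_{t-1} + b_r) \qquad (2)$$

$$c_t = f_t \odot c_{t-1} + (1 - f_t) \odot (W X_t) \qquad (3)$$

$$h_t = r_t \odot g(c_t) + (1 - r_t) \odot X_t \qquad (4)$$

The same recurrence as a minimal pure-Python sketch. The names `sru_layer` and `matvec` and the parameter dictionary are hypothetical, and the input and hidden dimensions are taken to be equal so that the skip term $(1 - r_t) \odot X_t$ is well defined.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def matvec(matrix, vec):
    """Matrix-vector product on plain nested lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sru_layer(inputs, p):
    """Run one SRU layer over a sequence of d-dimensional input vectors.

    p holds matrices W, W_f, W_r (d x d) and vectors v_f, v_r, b_f, b_r (d).
    Returns the list of hidden states h_t, one per time step.
    """
    d = len(p["b_f"])
    c = [0.0] * d  # candidate state, initialised to zero (assumption)
    outputs = []
    for x in inputs:
        wx = matvec(p["W"], x)
        wfx = matvec(p["W_f"], x)
        wrx = matvec(p["W_r"], x)
        # Both gates read the previous candidate state c_{t-1}.
        f = [sigmoid(wfx[i] + p["v_f"][i] * c[i] + p["b_f"][i]) for i in range(d)]
        r = [sigmoid(wrx[i] + p["v_r"][i] * c[i] + p["b_r"][i]) for i in range(d)]
        c = [f[i] * c[i] + (1.0 - f[i]) * wx[i] for i in range(d)]
        # Skip connection: (1 - r_t) lets the input bypass the nonlinearity.
        h = [r[i] * math.tanh(c[i]) + (1.0 - r[i]) * x[i] for i in range(d)]
        outputs.append(h)
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {"W": identity, "W_f": zeros, "W_r": zeros,
          "v_f": [0.0, 0.0], "v_r": [0.0, 0.0],
          "b_f": [0.0, 0.0], "b_r": [0.0, 0.0]}
sequence = [[1.0, -1.0], [0.5, 0.25]]
states = sru_layer(sequence, params)
```

With all gate parameters at zero, both gates equal 0.5, so for $W = I$ the first output is $0.5\tanh(0.5x) + 0.5x$ per component. A strongly negative $b_r$ drives $r_t$ toward 0, and the layer then passes its input almost unchanged through the skip term.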

Step 2.1.2: Constructing sparse connections.

Inspired by the CNN model Inception, the network does not improve its expressive power simply by increasing the model depth and the number of hidden nodes; instead, multi-scale feature representation is achieved through a non-uniform sparse structure. Such multi-scale feature representation can also be applied to RNNs. As a parallelizable RNN, the SRU combines time-series modeling capability with low computational cost, and feature fusion enriches the recognition feature information. Therefore, the multi-scale SRU blocks obtained in step 2.1.1 are sparsely connected to fuse the feature representations learned by the different SRU blocks, and the fused feature combination is used as the feature input of the classifier (top layer of the model) to complete the classification and recognition of the 3 classes of targets.

Step 2.2: Training.

The standardized training set and validation set are used as the input of the model, and the model is trained. In each training iteration, the training set is first propagated forward to compute the actual output probability; the network parameters are then updated by the backpropagation algorithm, which lowers the value of the loss function, reduces the error, and brings the actual output probability closer to the expected output probability. The expected and actual output probabilities are 3-dimensional probability tensors, and the index of the maximum value in a probability tensor is the target label i (i = 1, 2, 3). Comparing the actual target label of each training-set sample (labeled in step 1.1) with its predicted target label (the index of the maximum value in the actual output probability tensor produced by the network) gives the correct recognition rate of the training set, i.e., the percentage of target labels that are correctly classified. The validation set is then propagated forward to compute its actual output probability, and comparing the actual and predicted target labels of the validation-set samples gives the correct recognition rate of the validation set. After the iterative training is completed, the trained optimal model (including the optimal model parameters) is saved.
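The label comparison and best-model selection described in this step can be sketched as follows. All probabilities, labels and the per-iteration history are made-up illustrative values, and the helper names are hypothetical.

```python
def argmax(values):
    """Index of the largest element (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def correct_recognition_rate(probabilities, labels):
    """Fraction of samples whose arg-max class equals the ground-truth label."""
    predicted = [argmax(p) + 1 for p in probabilities]  # labels are 1, 2, 3
    hits = sum(1 for y_hat, y in zip(predicted, labels) if y_hat == y)
    return hits / len(labels)


# Hypothetical output probabilities for a tiny validation set.
val_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
val_labels = [1, 2, 3, 2]
val_rate = correct_recognition_rate(val_probs, val_labels)

# Illustrative validation rates per training iteration; the model saved is
# the one from the iteration with the highest validation rate.
history = [0.60, 0.72, 0.80, 0.78]
best_iteration = argmax(history) + 1
```

In a real training loop the model parameters would be checkpointed whenever the validation rate improves, so that the saved model corresponds to `best_iteration`.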

Step 2.3: Testing.

To test how well the trained model performs on an unseen data set under the mismatch condition, the test set is fed to the saved optimal model and its classification performance is further tested. During testing, the network model is evaluated with the F1 value, defined as formula (5):

$$\mathrm{F1} = \frac{2PR}{P + R} \qquad (5)$$

where P is the precision and R is the recall for target i.
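A minimal pure-Python sketch of the per-class F1 computation, F1 = 2PR / (P + R), is shown below. The label lists are invented for illustration, and averaging the three per-class values into a single macro F1 is an assumption; the description does not state how per-class values are combined.

```python
def f1_for_class(actual, predicted, target):
    """F1 value for one target class from paired label lists."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == target and p == target)
    fp = sum(1 for a, p in pairs if a != target and p == target)
    fn = sum(1 for a, p in pairs if a == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented test labels for targets 1, 2 and 3.
actual = [1, 1, 1, 2, 2, 2, 3, 3, 3]
predicted = [1, 1, 2, 2, 2, 3, 3, 3, 3]

scores = {i: f1_for_class(actual, predicted, i) for i in (1, 2, 3)}
macro_f1 = sum(scores.values()) / len(scores)  # macro averaging is an assumption
```

For these labels, target 1 has P = 1 and R = 2/3, target 2 has P = R = 2/3, and target 3 has P = 3/4 and R = 1.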

Step 3: Model evaluation.

The validation-set correct recognition rate obtained in step 2.2 is taken as the evaluation index for training, and the F1 value from step 2.3 is taken as the evaluation index for testing. Using both indices gives a more comprehensive evaluation of the model. To demonstrate the effectiveness of the invention, a time-series-based multi-layer CNN model is used for comparison: the multi-scale sparse SRU classification model constructed in step 2.1 is replaced by the multi-layer CNN model while all other steps remain unchanged, the 2 evaluation indices for training and testing are obtained, and they are compared with those of the multi-scale sparse SRU classification model.

The underwater acoustic target recognition method of the invention is now described in detail with reference to the embodiment and the accompanying drawings; the flow of the underwater acoustic target recognition system is shown in FIG. 1. The method is implemented in Python using PyTorch. Both training and validation of the model run on the GPU, and cuDNN is used to accelerate training.

Step 1: Read 3 classes of labeled underwater acoustic targets (ship, merchant ship, and a certain underwater target), with 15 audio files per class and the first 5 s taken from each audio file. The underwater acoustic target data are first preprocessed: each segment of target data is framed with a frame length of 100 ms and a frame shift of 0, i.e., adjacent frames do not overlap and every 0.1 second of target data is one sample, giving 2250 samples in total for the 3 classes of targets. The data are strictly divided into a training set, a validation set and a test set; in the experiment, 3/5 of the samples are used for training, 1/5 for validation and 1/5 for testing. The training and validation data of the 3 classes of targets are then standardized (zero-meaning and variance normalization), and their time-domain waveforms are used as the input of the network model to build the time-series network model. For testing, to construct the noise-mismatch condition, band-limited white noise is added to the test data of each of the 3 classes of targets to generate test data with SNRs of -20 dB, -15 dB, -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB and 20 dB, and the test data are then standardized.
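The sample counts in this step can be checked with a short sketch. The sample rate is not given in the description, so `SAMPLE_RATE` below is a hypothetical value, the clip is a toy waveform, and standardizing each frame on its own (rather than with data-set-wide statistics) is an assumption.

```python
import math


def frame_signal(samples, frame_len):
    """Split a waveform into consecutive, non-overlapping frames."""
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]


def standardize(frame):
    """Zero-mean, unit-variance scaling of one frame."""
    mean = sum(frame) / len(frame)
    var = sum((s - mean) ** 2 for s in frame) / len(frame)
    std = math.sqrt(var) or 1.0  # guard against an all-constant frame
    return [(s - mean) / std for s in frame]


SAMPLE_RATE = 1000  # Hz, hypothetical; not stated in the description
CLIP_SECONDS = 5
FRAME_MS = 100
N_CLASSES, FILES_PER_CLASS = 3, 15

frame_len = SAMPLE_RATE * FRAME_MS // 1000
clip = [math.sin(0.01 * n) for n in range(SAMPLE_RATE * CLIP_SECONDS)]  # toy clip
frames = frame_signal(clip, frame_len)

total_samples = N_CLASSES * FILES_PER_CLASS * len(frames)
n_train = total_samples * 3 // 5
n_val = total_samples // 5
n_test = total_samples - n_train - n_val
first = standardize(frames[0])
```

Each 5 s clip yields 50 frames of 100 ms, so 3 classes with 15 files each give 2250 samples, split 1350 / 450 / 450.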

Step 2.1: Constructing the multi-scale sparse SRU classification model. The framework of the model is shown in FIG. 2. The model consists of an input layer, 4 SRU blocks (inside the dashed boxes in the figure), a multi-feature layer and 1 fully connected layer. Each SRU block consists of an SRU and a layer normalization layer. Layer normalization is mostly used in RNNs; it normalizes the target input along the channel dimension so that the data are uniformly distributed along that dimension. Each SRU block also adds a skip connection between the model input and the multi-feature layer, forming a local structure of the network. Skip connections are mostly used in deeper network structures: a model with skip connections converges more easily than a deep model that directly learns the mapping between input and output, and they address the vanishing-gradient problem of the SRU during training. The SRUs in the 4 SRU blocks have 1, 2, 3 and 4 layers, respectively, and 16, 32, 64 and 256 hidden nodes, respectively. The structure inside each SRU block is fully connected (non-sparse), whereas the 4 SRU blocks are sparsely connected to one another. The multi-feature layer contains the feature representations learned by the SRUs of different depths. Batch normalization is applied after the multi-feature layer: normalizing each data batch speeds up network convergence and helps prevent vanishing and exploding gradients. The top of the model is 1 fully connected layer, which has no activation and serves directly as the discrimination layer that outputs the actual output (probability).
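The block configuration above can be captured in a small structural sketch. The description does not say how the multi-feature layer combines the block outputs, so plain concatenation is assumed here; the class and function names are hypothetical, and the SRU recurrences, normalization layers and skip connections are left out so that only the shapes are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SRUBlockSpec:
    num_layers: int    # stacked SRU layers inside the block
    hidden_nodes: int  # hidden nodes per SRU layer


BLOCKS = (
    SRUBlockSpec(1, 16),
    SRUBlockSpec(2, 32),
    SRUBlockSpec(3, 64),
    SRUBlockSpec(4, 256),
)
NUM_CLASSES = 3


def fuse(block_outputs):
    """Multi-feature layer, assumed here to be plain concatenation."""
    fused = []
    for vec in block_outputs:
        fused.extend(vec)
    return fused


def linear(features, weights, bias):
    """Fully connected layer without activation; returns raw class scores."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


# Stand-in block outputs (all zeros), used only to check the shapes.
block_outputs = [[0.0] * spec.hidden_nodes for spec in BLOCKS]
fused = fuse(block_outputs)
weights = [[0.0] * len(fused) for _ in range(NUM_CLASSES)]
scores = linear(fused, weights, [0.0] * NUM_CLASSES)
```

Under the concatenation assumption the multi-feature layer has 16 + 32 + 64 + 256 = 368 features, which the final layer maps to 3 class scores.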

Step 2.2: Model training. The time-domain waveforms of the training set and validation set of the 3 classes of targets are used as the input of the network model, and the network model is trained. For training, the network is randomly initialized, the loss is computed with the sparse categorical cross-entropy loss function, and the parameters are optimized with the adaptive moment estimation (Adam) algorithm at a learning rate of 0.001 for 50 training iterations. After training, the best validation-set correct recognition rate and the corresponding network parameters are obtained, and the optimal model (with its optimal parameters) is saved for subsequent testing.
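To make the two training ingredients concrete, the sketch below gives a pure-Python cross-entropy on integer labels and a single Adam update with the stated learning rate of 0.001. The `beta1`, `beta2` and `eps` values are the commonly used Adam defaults, not values from this description; labels are zero-based in the code; and the gradients are arbitrary numbers chosen for illustration.

```python
import math


def sparse_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and an integer class label."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


class Adam:
    """Adam update for a flat list of parameters."""

    def __init__(self, n, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = [0.0] * n, [0.0] * n, 0

    def step(self, params, grads):
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            # Exponential moving averages of the gradient and its square.
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g
            # Bias correction for the zero-initialised averages.
            m_hat = self.m[i] / (1 - self.b1 ** self.t)
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Uniform scores over 3 classes give a loss of ln(3).
loss = sparse_cross_entropy([0.0, 0.0, 0.0], label=1)

optimizer = Adam(n=2)
updated = optimizer.step([1.0, -1.0], [2.0, -0.5])  # arbitrary gradients
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate regardless of the gradient's magnitude.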

Step 2.3: Model testing. To test the recognition performance of the network model under noise mismatch, the trained network model is applied to the underwater acoustic target classification and recognition task under the noise-mismatch condition, and its F1 values at the different SNRs are obtained.

Step 3: Model evaluation. The validation-set correct recognition rate obtained in step 2.2 is the evaluation index for training, and the F1 value from step 2.3 is the evaluation index for testing. To demonstrate the effectiveness of the invention in practice, a time-series-based multi-layer CNN model is used for comparison; its framework is shown in FIG. 3. The model consists of an input layer, 3 one-dimensional convolutional layers, 3 one-dimensional pooling layers and 2 fully connected layers. Each convolutional layer is activated by ReLU, with batch normalization performed before activation, and is followed by a pooling layer that reduces the spatial size of the data. The 3 one-dimensional convolutional layers have 32, 64 and 128 convolution kernels, respectively, and the one-dimensional pooling size is 3. The 1st fully connected layer uses ReLU as its activation function; batch normalization is applied to its input data before activation, and a dropout layer with a dropout value of 0.5 is added after activation to prevent overfitting. The 2nd fully connected layer has no activation and serves directly as the discrimination layer that outputs the actual output (probability) for each class of underwater acoustic target samples.

The experimental results are as follows. During training, the validation-set correct recognition rate is 96.0% for the multi-layer CNN model and 96.7% for the multi-scale sparse SRU classification model, which is higher. In addition, the multi-scale sparse SRU classification model reaches its optimal parameters, and its best validation-set correct recognition rate, at the 4th training iteration, whereas the same model without skip connections reaches its best validation-set correct recognition rate only at the 21st training iteration. Adding skip connections therefore accelerates model convergence and shortens training time. The F1 values of the 2 models at different SNRs during testing are shown in Table 1. As Table 1 shows, the F1 value of the multi-scale sparse SRU classification model is higher than that of the multi-layer CNN model at low SNR, which indicates that the proposed model suppresses the effect of noise mismatch and is a noise-robust network. These results demonstrate the effectiveness of the invention.

Table 1: F1 values of the 2 models at different SNRs

Claims

1. The underwater acoustic target identification method based on the multi-scale sparse SRU classification model is characterized by comprising the following steps of:

step 2.1.1: the single-layer SRU model is constructed as the basis,

wherein the single layer SRU is calculated as follows:

Formula (4) defines the final hidden state $h_t$ at time $t$, wherein the product in the formulas is the element-wise (dot-multiplication) operator.

Step 2.1.2: constructing sparse connections

2. The underwater acoustic target recognition method based on the multi-scale sparse SRU classification model according to claim 1, wherein the preprocessing in step 1 frames each segment of target data, with a specified frame length and frame shift, to obtain the total underwater acoustic target sample data.

3. The underwater acoustic target recognition method based on the multi-scale sparse SRU classification model according to claim 1, wherein the standardization in step 1.3 consists of zero-meaning together with variance normalization.

4. The underwater acoustic target recognition method based on the multi-scale sparse SRU classification model according to claim 1, wherein the model in step 2.1.2 consists of an input layer, 4 SRU blocks, a multi-feature layer and 1 fully connected layer; each SRU block consists of an SRU and a layer normalization layer; layer normalization is mainly used in RNNs and normalizes the target input along the channel dimension; each SRU block adds a skip connection between the model input and the multi-feature layer, forming a local structure of the network.

5. The underwater acoustic target recognition method based on the multi-scale sparse SRU classification model according to claim 4, wherein the SRUs in the 4 SRU blocks have 1, 2, 3 and 4 layers, respectively, and 16, 32, 64 and 256 hidden nodes, respectively. The structure inside each SRU block is fully connected (non-sparse), whereas the 4 SRU blocks are sparsely connected to one another. The top of the model is 1 fully connected layer, which has no activation and serves directly as the discrimination layer that outputs the actual output probability.