CN113239809A - Underwater sound target identification method based on multi-scale sparse SRU classification model

Info

Publication number
CN113239809A
CN113239809A CN202110530281.1A CN202110530281A CN113239809A CN 113239809 A CN113239809 A CN 113239809A CN 202110530281 A CN202110530281 A CN 202110530281A CN 113239809 A CN113239809 A CN 113239809A
Authority
CN
China
Prior art keywords
sru
model
layer
target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110530281.1A
Other languages
Chinese (zh)
Other versions
CN113239809B (en)
Inventor
曾向阳
杨爽
薛灵芝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202110530281.1A
Publication of CN113239809A
Application granted
Publication of CN113239809B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction

Abstract

The invention relates to an underwater acoustic target recognition method based on a multi-scale sparse SRU classification model. The method exploits the different feature representations learned by SRUs of different depths to fuse multi-scale features of the input data, and uses the fused feature combination as the input of the classifier (the last layer) to classify and recognize multiple types of targets. In addition, skip connections are added to the classification model to accelerate model convergence and reduce training time.

Description

Underwater sound target identification method based on multi-scale sparse SRU classification model
Technical Field
The invention belongs to the field of passive underwater acoustic target recognition under noise-mismatch conditions, and particularly relates to an underwater acoustic target recognition method based on a multi-scale sparse SRU classification model.
Background
Underwater acoustic target recognition is one of the important research directions and challenges in underwater acoustic signal processing. Because of noise in the complex marine environment, a recognition method that performs well in laboratory simulation may fall short of expectations when applied to a real scene. Suppressing noise interference and improving the robustness of the recognition method are therefore critical to its practical application.
In recent years, Deep Learning (DL) theory has attracted much attention and has offered new ideas for underwater acoustic target recognition. Wangqiang et al. studied the applicability of deep learning to underwater acoustic target recognition, established Convolutional Neural Network (CNN) and Deep Belief Network (DBN) models, and recognized 3 types of measured underwater acoustic targets. The results show that recognizing underwater sound with deep learning models is feasible. Lu'an et al. applied a Long Short-Term Memory (LSTM) network to underwater acoustic target recognition. On MFCC features, the LSTM outperforms a conventional Recurrent Neural Network (RNN), a CNN and a Deep Neural Network (DNN), which shows that a network can obtain better results by exploiting more temporal information. However, the dependence of each LSTM time step on the previous one makes parallel computation difficult, so the LSTM computes far more slowly than a CNN. Tao Lei et al. proposed the Simple Recurrent Unit (SRU) based on a study of the LSTM model and demonstrated its effectiveness in Natural Language Processing (NLP) tasks. On classification and question-answering datasets, the SRU is 5 to 9 times faster than the cuDNN-optimized LSTM.
The SRU has not yet been applied to passive underwater acoustic target recognition, although it has been used in problems such as speech recognition and text recognition. Those applications complete the classification task only by increasing the depth and the number of hidden nodes of the network, or by combining the SRU with other network structures such as CNNs. They do not adopt a parallel sparse structure to fuse the output features of multi-scale SRUs, and so do not use feature fusion to enrich the feature information available for recognition.
Disclosure of Invention
The technical problem solved by the invention is as follows: in order to simplify the pipeline of a passive underwater acoustic target recognition system, enrich the recognition feature information, maintain a high correct recognition rate of the classification model and improve its adaptability under noise-mismatch conditions, the invention introduces the SRU into underwater acoustic target recognition and proposes a multi-scale sparse SRU classification model. Using the end-to-end processing capability of deep learning, the model distinguishes different types of underwater acoustic target signals from the timbre information implicit in the structure of the time-domain waveform, so that a passive underwater acoustic target recognition system can be built for classifying and recognizing targets. The invention addresses the following problems: (1) it avoids dependence on manual feature extraction and simplifies the construction of a passive underwater acoustic target recognition system; (2) it uses SRUs of different depths to learn, under supervision, multi-scale feature representations of the underwater acoustic target time series (time-domain waveform), which enriches the recognition feature information; (3) a skip connection is added to each SRU block, so that the model converges more easily than one that directly learns the mapping between input and output, which reduces training time; (4) when the noise conditions of the training samples and the test samples do not match, the model still maintains a high correct recognition rate, making it a noise-robust network.
The technical scheme of the invention is as follows: the underwater acoustic target recognition method based on the multi-scale sparse SRU classification model is characterized by comprising the following steps:
step 1: acquiring and preprocessing underwater acoustic target sample data, comprising the following three steps:
step 1.1: defining a plurality of underwater acoustic targets, wherein n (n ≥ 3) types of underwater acoustic targets are taken as research objects and labeled, and each of the n types has m (m ≥ 15) audio files; taking the first t (t ≥ 5) seconds of each audio file of each type, and then framing to obtain the underwater acoustic target sample data;
step 1.2: dividing the obtained underwater acoustic target sample data into a training set, a validation set and a test set, such that
Figure BDA0003067391140000021
step 1.3: standardizing the data of the training set and the validation set, and, for the test set, adding band-limited white noise at different SNRs and then standardizing the data;
step 2: taking the time-domain waveforms of the training set, the validation set and the test set standardized in step 1.3 as the input of the multi-scale sparse SRU classification model, and carrying out model training and testing, comprising the following substeps:
step 2.1: constructing the multi-scale sparse SRU classification model, comprising the following two steps:
step 2.1.1: constructing multi-scale SRU blocks on the basis of the single-layer SRU model,
wherein the single-layer SRU is calculated according to equations (1) to (4):
Figure BDA0003067391140000031
Figure BDA0003067391140000032
Figure BDA0003067391140000033
Figure BDA0003067391140000034
in the formulas, $x_t$ is the input at time step t; σ is the sigmoid function, which maps its input to between 0 and 1; $h_{t-1}$ is the hidden state at time t-1; $W_f$, $W_r$ and $W$ are parameter matrices, and $v_f$, $v_r$, $b_f$ and $b_r$ are parameter vectors learned during training. Equations (1) and (2) define the forget gate $f_t$ and the reset gate $r_t$ at time t, respectively; equation (3) defines the candidate hidden state at time t,
Figure BDA0003067391140000035
and equation (4) defines the final hidden state $h_t$ at time t, in which the term
Figure BDA0003067391140000036
is a skip connection, g is the nonlinear tanh function, which maps its input to between -1 and 1, and
Figure BDA0003067391140000037
is the element-wise (dot) product operator.
step 2.1.2: constructing sparse connections:
the multi-scale SRU blocks obtained in step 2.1.1 are sparsely connected to obtain a fused form of the feature representations learned by the different SRU blocks, and the fused feature combination is used as the input of the classifier (model top layer) to complete the classification and recognition task for 3 types of targets;
step 2.2: training the model on the standardized training set and validation set: forward propagation is first performed to compute the actual output probability, and the network parameters are then updated by the back-propagation algorithm, which reduces the value of the loss function and thus the error, so that the actual output probability of the model moves closer to the expected output probability;
step 2.3: in order to test how well the trained model performs on an unknown data set under mismatch conditions, the test set is used as the input of the saved optimal model to further test its classification performance; during testing, the network model is evaluated by the F1 value, defined in equation (5):
$$F_1 = \frac{2PR}{P + R} \qquad (5)$$
where P is the precision, i.e. the percentage of all targets predicted as target i (i = 1, 2, 3) that are actually target i; R is the recall, i.e. the percentage of all targets that are actually target i that are successfully predicted as target i. The F1 value is the harmonic mean of P and R and is used to compare the quality of different algorithms. In the invention, a higher F1 value indicates better classification performance of the model.
Step 3: the correct recognition rate of the validation set obtained in step 2.2 is used as the evaluation index for training, and the F1 value from step 2.3 is used as the evaluation index for testing.
The further technical scheme of the invention is as follows: the preprocessing in the step 1 is to frame each section of target data, determine the length and the frame shift of each frame, and obtain total underwater sound target sample data.
The further technical scheme of the invention is as follows: the normalization in step 3 is zero-mean while variance normalization.
The further technical scheme of the invention is as follows: the model in the step 4.1.2 consists of an input layer, 4 SRU blocks, a multi-feature layer and 1 full-connection layer; each SRU block consists of an SRU and a Layer Normalization Layer; the Layer Normalization Layer is mainly used in the RNN network and performs Normalization operation on target input in the channel direction; each SRU block adds jump connection between model input and multi-feature layer to form local structure of network.
The further technical scheme of the invention is as follows: the SRUs in the 4 SRU blocks are respectively 1 layer, 2 layers, 3 layers and 4 layers; the number of implicit nodes is 16, 32, 64 and 256, respectively. The internal structure of each SRU block is fully connected and non-sparse, and the internal structure of each SRU block is sparsely connected among 4 SRU blocks. The top layer of the model is connected with 1 full connecting layer. The full connection layer is not activated and directly used as a discrimination layer to output actual output probability.
Effects of the invention
The technical effects of the invention are as follows: for the task of classifying and recognizing 3 types of measured underwater acoustic targets, the invention constructs a multi-scale sparse SRU classification model based on the supervised Simple Recurrent Unit.
1) The proposed multi-scale sparse SRU classification model connects SRUs of different depths in parallel to obtain multi-scale feature information, and the network achieves a correct recognition rate of 96.7% on the validation set.
2) The multi-scale sparse SRU classification model exploits the strength of recurrent neural networks in processing time series: SRUs of different depths learn, under supervision, multi-scale feature representations of the underwater acoustic target time series (time-domain waveform) by constructing multi-scale SRU blocks, and the model is completed by fusing these representations through sparse connections.
3) In the iterative training process of step 2.2, the multi-scale sparse SRU classification model reaches its optimal parameters, and the best correct recognition rate on the validation set, at the 4th training epoch. By comparison, the same model without skip connections reaches its best validation-set correct recognition rate only at the 21st epoch. Adding skip connections therefore accelerates model convergence and shortens training time.
4) Comparison experiments with a CNN classification model show that the proposed multi-scale sparse SRU classification model maintains a higher correct recognition rate when the noise conditions of the training samples and the test samples do not match, making it a noise-robust network.
Drawings
FIG. 1 is a flow chart of an underwater acoustic target recognition system based on a multi-scale sparse SRU classification model
FIG. 2 is a diagram of a multi-scale sparse SRU classification model framework
FIG. 3 is a diagram of a multi-layer CNN model framework
Detailed Description
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.
Referring to FIG. 1 to FIG. 3, in order to improve the recognition performance of the classification and recognition system in a noise-mismatch environment, the invention, inspired by the CNN model Inception, provides an underwater acoustic target recognition method based on a multi-scale sparse SRU classification model. The method exploits the different feature representations learned by SRUs of different depths to fuse multi-scale features of the input data, and uses the fused feature combination as the input of the classifier (the last layer) to classify and recognize multiple types of targets. In addition, skip connections are added to the classification model to accelerate model convergence and reduce training time.
Step 1: acquiring and preprocessing underwater acoustic target sample data.
Several underwater acoustic targets are defined, of which 3 types are taken as research objects and labeled; each of the 3 types has 15 audio files. The first 5 seconds of each audio file are taken and framed to obtain the underwater acoustic target sample data. The data are then strictly divided into a training set, a validation set and a test set, namely
Figure BDA0003067391140000061
Finally, a noise-mismatch condition is constructed for the test set.
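The framing and the strict (disjoint) split can be sketched as follows. This is a minimal illustration rather than the patented implementation: the 100 ms frame length and the 3/5, 1/5, 1/5 proportions are taken from the embodiment described later in this document, while the 1000 Hz sampling rate, the synthetic audio, the function names and the choice to split at frame level are assumptions made only for the example.

```python
import random

def frame_signal(signal, frame_len):
    """Cut a 1-D signal into consecutive, non-overlapping frames of frame_len samples."""
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

def split_indices(n_samples, parts=(3, 1, 1), seed=0):
    """Shuffle sample indices once and cut them into disjoint train/val/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    total = sum(parts)
    n_train = n_samples * parts[0] // total
    n_val = n_samples * parts[1] // total
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

SAMPLE_RATE = 1000             # assumed sampling rate in Hz (not given in the text)
FRAME_LEN = SAMPLE_RATE // 10  # 100 ms per frame

# Synthetic stand-in for 3 classes x 15 audio files x 5 seconds
rng = random.Random(1)
samples, labels = [], []
for class_id in (1, 2, 3):
    for _ in range(15):
        audio = [rng.gauss(0.0, 1.0) for _ in range(5 * SAMPLE_RATE)]
        for frame in frame_signal(audio, FRAME_LEN):
            samples.append(frame)
            labels.append(class_id)

train_idx, val_idx, test_idx = split_indices(len(samples))
```

Because the three index lists are slices of one shuffled permutation, no sample can appear in more than one set, which is one way to read "strictly divided".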
Step 2: the time-domain waveforms of the training set, the validation set and the test set standardized in step 1 are used as the input of the multi-scale sparse SRU classification model established in the invention, and model training and testing are carried out, comprising the following substeps:
Step 2.1: constructing the multi-scale sparse SRU classification model, comprising the following two steps:
Step 2.1.1: constructing multi-scale SRU blocks.
Each SRU block combines a single-layer or multi-layer stacked SRU with a skip connection. Different SRU blocks learn feature representations of the input data through SRUs stacked to different depths. Because the number of stacked layers and the number of hidden nodes differ, the learned features differ, and they can be regarded as different representations of the same input data. Shallow stacks with fewer hidden nodes capture low-level features, while deep stacks with more hidden nodes learn high-level features. Low-level features reflect local characteristics of the data, whereas high-level features are more abstract and invariant. Skip connections are mostly used in deeper network structures: a model with skip connections converges more easily than a deep model that directly learns the mapping between input and output, and they mitigate the vanishing-gradient problem of the SRU during training.
The single-layer SRU is calculated according to equations (1) to (4):
Figure BDA0003067391140000071
Figure BDA0003067391140000072
Figure BDA0003067391140000073
Figure BDA0003067391140000074
where $x_t$ is the input at time step t; σ is the sigmoid function, which maps its input to between 0 and 1; $h_{t-1}$ is the hidden state at time t-1; $W_f$, $W_r$ and $W$ are parameter matrices, and $v_f$, $v_r$, $b_f$ and $b_r$ are parameter vectors learned during training. Equations (1) and (2) define the forget gate $f_t$ and the reset gate $r_t$ at time t, respectively. Equation (3) defines the candidate hidden state at time t,
Figure BDA0003067391140000075
and equation (4) defines the final hidden state $h_t$ at time t, in which the term
Figure BDA0003067391140000076
is a skip connection that borrows the idea of the Highway Network and effectively mitigates the vanishing-gradient problem that deep networks encounter during gradient-based training. The activation function g is the tanh function, which maps its input to between -1 and 1, and
Figure BDA0003067391140000077
is the element-wise (dot) product operator.
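Since equations (1) to (4) survive here only as image placeholders, the following sketch shows one plausible form of a single SRU time step. It assumes the formulation published by Lei et al. with a tanh on the internal state and the skip term (1 - r_t) ⊙ x_t; the exact equations of the patent may differ. The function names, the dictionary of parameters, and the choice to feed the previous candidate state into both gates are assumptions made for the illustration, and the input and hidden sizes are taken equal so that the skip term is well defined.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sru_step(x_t, state_prev, p):
    """One time step of an assumed SRU cell; every vector has the same length d."""
    d = len(x_t)
    wx = matvec(p["W"], x_t)
    wfx = matvec(p["W_f"], x_t)
    wrx = matvec(p["W_r"], x_t)
    # (1) forget gate and (2) reset gate, both squashed to (0, 1) by the sigmoid
    f_t = [sigmoid(wfx[i] + p["v_f"][i] * state_prev[i] + p["b_f"][i]) for i in range(d)]
    r_t = [sigmoid(wrx[i] + p["v_r"][i] * state_prev[i] + p["b_r"][i]) for i in range(d)]
    # (3) candidate state: gated mix of the previous state and the projected input
    state_t = [f_t[i] * state_prev[i] + (1.0 - f_t[i]) * wx[i] for i in range(d)]
    # (4) output: g = tanh on the state, plus the skip term (1 - r_t) * x_t
    h_t = [r_t[i] * math.tanh(state_t[i]) + (1.0 - r_t[i]) * x_t[i] for i in range(d)]
    return h_t, state_t

# Toy check with all parameters set to zero: both gates equal 0.5
d = 2
zero_matrix = [[0.0] * d for _ in range(d)]
params = {"W": zero_matrix, "W_f": zero_matrix, "W_r": zero_matrix,
          "v_f": [0.0] * d, "v_r": [0.0] * d, "b_f": [0.0] * d, "b_r": [0.0] * d}
h_t, state_t = sru_step([1.0, -2.0], [0.0, 0.0], params)
```

With zero parameters the state stays at zero and the output is half of the input, which makes the contribution of the skip term easy to see in isolation.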
Step 2.1.2: constructing sparse connections.
Inspired by the CNN model Inception, the network does not simply improve its expressive power by increasing the depth and the number of hidden nodes of the model; instead, it achieves multi-scale feature representation through a non-uniform sparse structure. Such multi-scale feature representations can also be applied in RNNs. As a parallelizable RNN, the SRU offers both time-series modeling capability and low computational cost. In addition, feature fusion enriches the feature information available for recognition. Therefore, the multi-scale SRU blocks obtained in step 2.1.1 are sparsely connected to obtain a fused form of the feature representations learned by the different SRU blocks, and the fused feature combination is used as the input of the classifier (model top layer) to complete the classification and recognition task for 3 types of targets.
Step 2.2: training.
The standardized training set and validation set are used as the input of the model, and the model is trained. In each training iteration, the training set is first propagated forward to compute the actual output probability; the network parameters are then updated by the back-propagation algorithm, which reduces the value of the loss function and thus the error, so that the actual output probability moves closer to the expected output probability. The expected output probability and the actual output probability are both 3-dimensional probability tensors, and the index of the maximum value in a probability tensor is the target label i (i = 1, 2, 3). Comparing the actual target label of each training sample (marked in step 1.1) with its predicted target label (the index of the maximum value in the actual output probability tensor produced by the network) gives the correct recognition rate of the training set, i.e. the percentage of correctly classified target labels among all target labels. The validation set is then propagated forward to compute its actual output probability, and comparing the actual target labels of the validation samples with the predicted target labels gives the correct recognition rate of the validation set. After the iterative training is completed, the trained optimal model (including the optimal model parameters) is saved.
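The label rule and the correct recognition rate described above can be sketched in a few lines. The function names and the toy probability vectors are hypothetical; labels are 1-based to match i = 1, 2, 3 in the text.

```python
def predict_labels(prob_rows):
    """For each probability vector, return the 1-based index of its maximum value."""
    return [max(range(len(p)), key=p.__getitem__) + 1 for p in prob_rows]

def correct_recognition_rate(prob_rows, true_labels):
    """Fraction of samples whose predicted label equals the actual label."""
    predicted = predict_labels(prob_rows)
    hits = sum(1 for p, t in zip(predicted, true_labels) if p == t)
    return hits / len(true_labels)

# Toy output probabilities for 4 samples and their actual labels
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.3, 0.6],
         [0.4, 0.5, 0.1],
         [0.2, 0.2, 0.6]]
truth = [1, 3, 1, 3]
rate = correct_recognition_rate(probs, truth)  # third sample is misclassified
```

The same function applies unchanged to the training set and the validation set; only the inputs differ.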
Step 2.3: testing.
In order to test how well the trained model performs on an unknown data set under mismatch conditions, the test set is used as the input of the saved optimal model to further test its classification performance. During testing, the network model is evaluated by the F1 value, defined in equation (5):
$$F_1 = \frac{2PR}{P + R} \qquad (5)$$
where P is the precision, i.e. the percentage of all targets predicted as target i (i = 1, 2, 3) that are actually target i; R is the recall, i.e. the percentage of all targets that are actually target i that are successfully predicted as target i. The F1 value is the harmonic mean of P and R and is used to compare the quality of different algorithms. In the invention, a higher F1 value indicates better classification performance of the model.
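As a worked illustration of precision, recall and F1 per target class, the sketch below uses the standard definition F1 = 2PR / (P + R). The function name and the toy labels are hypothetical, and the text does not state how the per-class values are combined into a single score, so no averaging is shown.

```python
def per_class_scores(true_labels, predicted_labels, classes=(1, 2, 3)):
    """Return {class: (precision, recall, f1)} computed from two label lists."""
    scores = {}
    for c in classes:
        true_pos = sum(1 for t, p in zip(true_labels, predicted_labels)
                       if t == c and p == c)
        n_predicted = sum(1 for p in predicted_labels if p == c)
        n_actual = sum(1 for t in true_labels if t == c)
        precision = true_pos / n_predicted if n_predicted else 0.0
        recall = true_pos / n_actual if n_actual else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores

# Toy example with 6 samples
truth = [1, 1, 2, 2, 3, 3]
predicted = [1, 2, 2, 2, 3, 1]
scores = per_class_scores(truth, predicted)
```

For class 2 in this toy example, 2 of the 3 predictions are right (P = 2/3) and both actual samples are found (R = 1), giving F1 = 0.8.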
Step 3: evaluating the model.
The correct recognition rate of the validation set obtained in step 2.2 is used as the evaluation index for training, and the F1 value from step 2.3 is used as the evaluation index for testing. The 2 indexes obtained from training and testing are used together to give a more comprehensive evaluation of the model. To demonstrate the effectiveness of the invention, a time-series-based multilayer CNN model is used for comparison: the multi-scale sparse SRU classification model constructed in step 2.1 is replaced by the multilayer CNN model while the other steps are left unchanged, the 2 evaluation indexes for training and testing are obtained, and they are compared with those of the multi-scale sparse SRU classification model.
The underwater acoustic target recognition method of the invention is now described in detail with reference to an example and the accompanying drawings; the flow of the underwater acoustic target recognition system is shown in FIG. 1. The method is implemented in Python using PyTorch. Both training and validation of the model are carried out on a GPU, and cuDNN is used to accelerate training.
Step 1: Three types of labeled underwater acoustic targets (a ship, a merchant ship and a certain underwater target) are read, with 15 audio files per type and the first 5 s taken from each audio file. The data are first preprocessed: each segment of target data is framed with a frame length of 100 ms and a frame shift of 0, that is, every 0.1 s of target data is one sample, giving 3 × 15 × 50 = 2250 samples in total for the 3 types of targets. The underwater acoustic target data are strictly divided into a training set, a validation set and a test set; in the experiment, 3/5 of the samples are used for training, 1/5 for validation and 1/5 for testing. The training data and validation data of the 3 types of targets are then standardized (zero-mean and variance normalization), and their time-domain waveforms are used as the input of the network model to build a time-series network model. For testing, in order to construct a noise-mismatch condition, band-limited white noise is added to the test data of each of the 3 types of targets to generate test data with SNRs of -20 dB, -15 dB, -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB and 20 dB, and the test data are then standardized.
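The standardization and the construction of the noise-mismatch test data can be sketched as below. Several points are assumptions: the noise here is plain Gaussian white noise (the band-limiting filter mentioned in the text is omitted), the SNR is enforced per frame, and the function names, the 50 Hz test tone and the 1000 Hz sampling rate are illustrative only.

```python
import math
import random

def standardize(frame):
    """Zero-mean, unit-variance normalization of one frame."""
    mean = sum(frame) / len(frame)
    var = sum((x - mean) ** 2 for x in frame) / len(frame)
    std = math.sqrt(var) or 1.0  # avoid division by zero for a constant frame
    return [(x - mean) / std for x in frame]

def add_white_noise(frame, snr_db, rng):
    """Add Gaussian white noise scaled so that the frame has the requested SNR in dB."""
    signal_power = sum(x * x for x in frame) / len(frame)
    noise = [rng.gauss(0.0, 1.0) for _ in frame]
    noise_power = sum(n * n for n in noise) / len(noise)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(frame, noise)]

# One 100 ms toy frame: a 50 Hz tone sampled at an assumed 1000 Hz
rng = random.Random(0)
frame = [math.sin(2.0 * math.pi * 50.0 * n / 1000.0) for n in range(100)]

# Test-set versions at the SNRs listed in the text: -20 dB to 20 dB in 5 dB steps
noisy_frames = {snr: standardize(add_white_noise(frame, snr, rng))
                for snr in range(-20, 25, 5)}
```

Noise is added before standardization, following the order given in the text, so every test frame reaches the network with zero mean and unit variance regardless of its SNR.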
Step 2: the time-domain waveforms of the training set, the validation set and the test set standardized in step 1 are used as the input of the multi-scale sparse SRU classification model established in the invention, and model training and testing are carried out, comprising the following substeps:
Step 2.1: constructing the multi-scale sparse SRU classification model. The framework of the model is shown in FIG. 2. The model consists of an input layer, 4 SRU blocks (inside the dashed box in the figure), a multi-feature layer and 1 fully connected layer. Each SRU block consists of an SRU and a Layer Normalization layer. Layer Normalization is mostly used in RNNs; it normalizes the target input along the channel direction so that the data share a common distribution in that direction. Each SRU block also adds a skip connection between the model input and the multi-feature layer, forming a local structure of the network. Skip connections are mostly used in deeper network structures: a model with skip connections converges more easily than a deep model that directly learns the mapping between input and output, and they mitigate the vanishing-gradient problem of the SRU during training. The SRUs in the 4 SRU blocks have 1, 2, 3 and 4 layers respectively, with 16, 32, 64 and 256 hidden nodes respectively. The internal structure of each SRU block is fully connected and non-sparse, while the 4 SRU blocks are sparsely connected to one another. The multi-feature layer contains the feature representations learned by the SRUs of different depths. Batch Normalization is applied after the multi-feature layer; normalizing each data batch speeds up network convergence and helps prevent vanishing and exploding gradients. The top of the model is 1 fully connected layer, which has no activation and serves directly as the discrimination layer that produces the actual output (probability).
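To make the sizes in this architecture concrete, the sketch below fuses stand-in outputs of the 4 SRU blocks and passes them through an unactivated linear layer. The block depths and hidden sizes come from the text; fusing by concatenation is an assumption, since the text does not spell out the fusion rule, and the SRU computation itself, the skip connections and Batch Normalization are left out. All names and values are hypothetical.

```python
import math

# (stacked SRU layers, hidden nodes) for the 4 blocks, as listed in the text
BLOCK_CONFIG = [(1, 16), (2, 32), (3, 64), (4, 256)]
NUM_CLASSES = 3

def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and (almost) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def fuse_block_outputs(block_outputs):
    """Concatenate the layer-normalized block outputs (assumed fusion rule)."""
    fused = []
    for out in block_outputs:
        fused.extend(layer_norm(out))
    return fused

def linear(features, weights, bias):
    """Fully connected layer without activation; returns one raw score per class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

# Stand-in block outputs; real ones would be the last hidden states of each SRU stack
block_outputs = [[float(i % 7) for i in range(hidden)] for _, hidden in BLOCK_CONFIG]
fused = fuse_block_outputs(block_outputs)

weights = [[0.01] * len(fused) for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES
scores = linear(fused, weights, bias)
```

Under the concatenation assumption the multi-feature layer has 16 + 32 + 64 + 256 = 368 features, so the final layer holds 368 × 3 weights plus 3 biases.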
Step 2.2: training the model. The time-domain waveforms of the training set and the validation set of the 3 types of targets are used as the input of the network model, and the network model is trained. The training settings are as follows: the network is randomly initialized, the loss is computed with a sparse categorical cross-entropy loss function, the parameters are optimized with the adaptive moment estimation (Adam) algorithm at a learning rate of 0.001, and the number of training epochs is 50. After training is finished, the best correct recognition rate on the validation set and the corresponding network parameters are obtained, and the optimal model (with the optimal parameters) is saved for the subsequent tests.
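The two ingredients named here, a cross-entropy loss on integer labels and an Adam update, can be sketched for a single sample and a single scalar parameter. The learning rate 0.001 comes from the text; the values beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly used Adam defaults and are an assumption, as are the function names. Applying the softmax inside the loss is also an assumption, chosen because the final layer is described as having no activation.

```python
import math

def sparse_categorical_cross_entropy(raw_scores, label):
    """Cross-entropy of softmax(raw_scores) against a 0-based integer label."""
    top = max(raw_scores)  # subtract the maximum for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in raw_scores))
    return log_sum_exp - raw_scores[label]

def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Update one scalar parameter with Adam; state holds t, m and v."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad * grad
    m_hat = state["m"] / (1.0 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1.0 - beta2 ** state["t"])  # bias-corrected second moment
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

# Uniform scores over 3 classes give a loss of ln(3)
loss = sparse_categorical_cross_entropy([0.0, 0.0, 0.0], 0)

# First Adam step moves the parameter by roughly lr, whatever the gradient size
adam_state = {"t": 0, "m": 0.0, "v": 0.0}
new_param = adam_step(1.0, 2.0, adam_state)
```

The loss takes the integer label directly rather than a one-hot vector, which is what "sparse" refers to in the loss name.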
Step 2.3: testing the model. To test the recognition performance of the network model under noise-mismatch conditions, the trained network model is applied to the underwater acoustic target classification task under those conditions, the model is analyzed further, and its F1 values at different SNRs are obtained.
And step 3: and (6) evaluating the model. The correct recognition rate of the verification set obtained in step 2.2 is an evaluation index in the training process, and the F1 value in step 2.3 is an evaluation index in the testing process. In the actual verification, in order to prove the effectiveness of the invention, a time series-based multilayer CNN model is used for comparison. The structure of the multi-layer CNN model framework is shown in figure 3. The model consists of an input layer, 3 one-dimensional convolutional layers, 3 one-dimensional pooling layers and 2 full-connected layers. In the model, each convolution layer is activated by ReLU, before activation, a Batch Normalization operation is executed, and then a pooling layer is added to reduce the space size of data. The number of the 3 one-dimensional convolutional layer convolution kernels is 32, 64 and 128 respectively, and the size of the one-dimensional pooling layer is 3. The 1 st full-connection layer uses ReLU as an activation function, Batch Normalization operation is carried out on input data before activation, a dropout layer is added after activation to prevent network overfitting, and the value of dropout is 0.5. The 2 nd full-connection layer is not activated and directly used as a discrimination layer to output the actual output (probability) of various underwater sound target samples.
The experimental results are as follows. During training, the multilayer CNN model reaches a validation-set correct recognition rate of 96.0%, while the multi-scale sparse SRU classification model reaches 96.7%, which is higher than that of the multilayer CNN model. In addition, the multi-scale sparse SRU classification model reaches its optimal parameters, and thus its best validation-set correct recognition rate, at the 4th training epoch. By comparison, the multi-scale sparse SRU classification model without skip connections reaches its best validation-set correct recognition rate only at the 21st training epoch, which shows that adding skip connections accelerates model convergence and shortens the training time. The F1 values of the 2 models at different SNRs during testing are shown in Table 1. As Table 1 shows, the F1 value of the multi-scale sparse SRU classification model is higher than that of the multilayer CNN model at low SNR, which indicates that the proposed model can suppress the influence of noise mismatch and is a noise-robust network. These results demonstrate the effectiveness of the present invention.
Table 1: F1 values of the 2 models at different SNRs
Figure BDA0003067391140000111
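As an illustration of how F1 values of this kind can be computed, the sketch below derives per-class precision, recall and F1 from true and predicted labels, using the standard definition F1 = 2PR / (P + R). The labels are 0-based here (the patent indexes targets as 1, 2, 3), the example label lists are made up, and averaging the per-class values into a single macro F1 is an assumption, since the text does not state how the classes are combined.

```python
def f1_per_class(y_true, y_pred, num_classes):
    """Return the F1 value of every class as a list."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)  # all samples predicted as c
        actual = sum(1 for t in y_true if t == c)     # all samples that really are c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores

# Made-up labels for three target classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
scores = f1_per_class(y_true, y_pred, 3)
macro_f1 = sum(scores) / len(scores)  # assumption: unweighted mean over classes
```

For these labels the per-class F1 values are 0.5, 0.8 and 2/3.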

Claims (5)

1. An underwater acoustic target identification method based on a multi-scale sparse SRU classification model, characterized by comprising the following steps:
step 1: acquisition and preprocessing of underwater acoustic target sample data, comprising the following three steps:
step 1.1: defining a plurality of underwater acoustic targets, wherein n (n ≥ 3) classes of underwater acoustic targets are taken as the research objects and labeled, and each of the n classes has m (m ≥ 15) audio files; intercepting t (t ≥ 5) seconds from the beginning of each audio file of each class of underwater acoustic target, and then framing to obtain the underwater acoustic target sample data;
step 1.2: dividing the obtained underwater acoustic target sample data into a training set, a validation set and a test set, and
Figure FDA0003067391130000015
step 1.3: standardizing the data of the training set and the validation set; adding band-limited white noise at different SNRs to the data of the test set and then standardizing it;
step 2: taking the time-domain waveforms of the training set, the validation set and the test set standardized in step 1.3 as the input of the multi-scale sparse SRU classification model, and performing model training and testing, comprising the following sub-steps:
step 2.1: constructing the multi-scale sparse SRU classification model, comprising the following two steps:
step 2.1.1: constructing a single-layer SRU model as the basis,
wherein the single-layer SRU is computed as follows:
Figure FDA0003067391130000011
Figure FDA0003067391130000012
Figure FDA0003067391130000013
Figure FDA0003067391130000014
In the formulas, X_t is the input at time step t; σ is the sigmoid function, which maps its input to between 0 and 1; h_{t-1} is the hidden state at time t-1; W_f, W_r and W are parameter matrices; and v_f, v_r, b_f and b_r are parameter vectors to be learned during training. Formulas (1) and (2) define the forget gate f_t and the reset gate r_t at time t, respectively; formula (3) defines the candidate hidden state at time t
Figure FDA0003067391130000021
Formula (4) defines the final hidden state h_t at time t, where
Figure FDA0003067391130000022
is the skip connection, g is the nonlinear tanh function, which maps its input to between -1 and 1, and
Figure FDA0003067391130000023
is the element-wise product operator.
step 2.1.2: constructing sparse connections:
the multi-scale SRU blocks obtained in step 2.1.1 are sparsely connected to obtain a fused form of the feature representations learned by the different SRU blocks, and the fused feature combination is used as the feature input of the classifier (the top layer of the model) to complete the classification and recognition task for the 3 classes of targets;
step 2.2: performing model training on the standardized training set and validation set: first performing forward propagation to compute the actual output probability, then updating the network parameters through the gradient back-propagation algorithm, reducing the value of the loss function so that the error decreases continuously and the actual output probability of the model approaches the expected output probability;
step 2.3: to test how well the trained model performs on an unseen data set under mismatch conditions, the test set is used as the input of the saved optimal model and the classification performance of the model is tested further; during testing, the error of the network model is measured with the F1 value, which is defined by formula (5):
Figure FDA0003067391130000024
where P is the precision, i.e. the percentage of targets that are actually target i among all targets predicted as target i (i = 1, 2, 3); R is the recall, i.e. the percentage of targets successfully predicted as target i among all targets that are actually target i. The F1 value can be viewed as the harmonic mean of P and R and is used to evaluate the quality of different algorithms. In the present invention, the higher the F1 value, the better the classification performance of the model.
step 3: taking the correct recognition rate of the validation set obtained in step 2.2 as the evaluation index for the training process, and the F1 value in step 2.3 as the evaluation index for the testing process.
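The single-layer SRU of step 2.1.1 can be sketched in code. The equation images are not reproduced in this text, so the recurrence below is an assumption: it follows the commonly published SRU form using the symbols named in the claim (W, W_f, W_r, v_f, v_r, b_f, b_r, sigmoid, tanh, and the skip term (1 - r_t) * x_t), and it may differ in detail from formulas (1) to (4) of the patent. The function names and the parameter dictionary layout are hypothetical, and the input and hidden dimensions are taken to be equal so that the skip connection needs no projection.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sru_step(x, c_prev, p):
    """One assumed SRU time step; x and c_prev are lists of equal length d."""
    d = len(x)
    wx, wfx, wrx = matvec(p["W"], x), matvec(p["Wf"], x), matvec(p["Wr"], x)
    # Forget gate and reset gate, each squashed into (0, 1) by the sigmoid.
    f = [sigmoid(wfx[i] + p["vf"][i] * c_prev[i] + p["bf"][i]) for i in range(d)]
    r = [sigmoid(wrx[i] + p["vr"][i] * c_prev[i] + p["br"][i]) for i in range(d)]
    # Candidate state: gated mix of the previous state and the projected input.
    c = [f[i] * c_prev[i] + (1.0 - f[i]) * wx[i] for i in range(d)]
    # Final hidden state; (1 - r) * x is the skip connection from the input.
    h = [r[i] * math.tanh(c[i]) + (1.0 - r[i]) * x[i] for i in range(d)]
    return h, c

# Toy parameters: identity W, everything else zero, so both gates equal 0.5.
zeros = [0.0, 0.0]
params = {"W": [[1.0, 0.0], [0.0, 1.0]], "Wf": [zeros, zeros], "Wr": [zeros, zeros],
          "vf": zeros, "vr": zeros, "bf": zeros, "br": zeros}
h, c = sru_step([1.0, -1.0], [0.0, 0.0], params)
```

All matrix products depend only on the current input, and the recurrence itself uses only element-wise operations; this is the property that makes SRU layers cheaper to run than LSTM or GRU layers.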
2. The underwater acoustic target identification method based on the multi-scale sparse SRU classification model according to claim 1, wherein the preprocessing in the step 1 frames each segment of target data, with the frame length and the frame shift determined, to obtain the total underwater acoustic target sample data.
3. The underwater acoustic target identification method based on the multi-scale sparse SRU classification model according to claim 1, wherein the standardization in the step 3 is zero-mean and variance normalization.
4. The underwater acoustic target identification method based on the multi-scale sparse SRU classification model according to claim 1, wherein the model in the step 4.1.2 consists of an input layer, 4 SRU blocks, a multi-feature layer and 1 fully connected layer; each SRU block consists of an SRU and a Layer Normalization layer; the Layer Normalization layer, which is mainly used in RNN networks, normalizes the target input along the channel direction; each SRU block adds a skip connection between the model input and the multi-feature layer, forming the local structure of the network.
5. The underwater acoustic target identification method based on the multi-scale sparse SRU classification model according to claim 4, wherein the SRUs in the 4 SRU blocks have 1, 2, 3 and 4 layers respectively, and 16, 32, 64 and 256 hidden nodes respectively; the interior of each SRU block is fully connected and non-sparse, while the connections among the 4 SRU blocks are sparse; the top of the model is connected to 1 fully connected layer, which has no activation and serves directly as the discrimination layer to output the actual output probability.
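The block layout in claims 4 and 5 can be written down as a small configuration sketch. The layer counts and hidden sizes are taken from claim 5; everything else is an assumption made for illustration: the class and function names are hypothetical, the multi-feature layer is assumed to concatenate one output vector per block, the skip connections are not modeled, and 3 output classes are used as in the described embodiment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SRUBlockConfig:
    num_layers: int    # stacked SRU layers inside the block
    hidden_units: int  # hidden nodes per SRU layer

# Layer counts and hidden sizes as listed in claim 5.
BLOCKS = (
    SRUBlockConfig(1, 16),
    SRUBlockConfig(2, 32),
    SRUBlockConfig(3, 64),
    SRUBlockConfig(4, 256),
)
NUM_CLASSES = 3  # the embodiment classifies 3 target classes

def fused_feature_dim(blocks):
    """Width of the multi-feature layer, assuming block outputs are concatenated."""
    return sum(b.hidden_units for b in blocks)

def dense_param_count(in_dim, out_dim):
    """Weights plus biases of one fully connected layer."""
    return in_dim * out_dim + out_dim

feature_dim = fused_feature_dim(BLOCKS)
fc_params = dense_param_count(feature_dim, NUM_CLASSES)
```

Under the concatenation assumption the multi-feature layer is 368 wide, so the final discrimination layer has 1107 trainable parameters.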
CN202110530281.1A 2021-05-14 2021-05-14 Underwater sound target identification method based on multi-scale sparse SRU classification model Active CN113239809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110530281.1A CN113239809B (en) 2021-05-14 2021-05-14 Underwater sound target identification method based on multi-scale sparse SRU classification model

Publications (2)

Publication Number Publication Date
CN113239809A true CN113239809A (en) 2021-08-10
CN113239809B CN113239809B (en) 2023-09-15

Family

ID=77134433

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110530281.1A Active CN113239809B (en) 2021-05-14 2021-05-14 Underwater sound target identification method based on multi-scale sparse SRU classification model

Country Status (1)

Country Link
CN (1) CN113239809B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116543795A (en) * 2023-06-29 2023-08-04 天津大学 Sound scene classification method based on multi-mode feature fusion

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154105A (en) * 2017-12-21 2018-06-12 深圳先进技术研究院 Aquatic organism detects and recognition methods, device, server and terminal device
US20190155619A1 (en) * 2017-11-20 2019-05-23 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method for Preloading Application, Terminal Device, and Medium
US20190311188A1 (en) * 2018-12-05 2019-10-10 Sichuan University Face emotion recognition method based on dual-stream convolutional neural network
CN110491415A (en) * 2019-09-23 2019-11-22 河南工业大学 A kind of speech-emotion recognition method based on convolutional neural networks and simple cycle unit
CN110580458A (en) * 2019-08-25 2019-12-17 天津大学 music score image recognition method combining multi-scale residual error type CNN and SRU
CN110738138A (en) * 2019-09-26 2020-01-31 哈尔滨工程大学 Underwater acoustic communication signal modulation mode identification method based on cyclic neural network
CN110807365A (en) * 2019-09-29 2020-02-18 浙江大学 Underwater target identification method based on fusion of GRU and one-dimensional CNN neural network
CN111243579A (en) * 2020-01-19 2020-06-05 清华大学 Time domain single-channel multi-speaker voice recognition method and system
CN111257341A (en) * 2020-03-30 2020-06-09 河海大学常州校区 Underwater building crack detection method based on multi-scale features and stacked full convolution network
CN111754438A (en) * 2020-06-24 2020-10-09 安徽理工大学 Underwater image restoration model based on multi-branch gating fusion and restoration method thereof
CN111985533A (en) * 2020-07-14 2020-11-24 中国电子科技集团公司第三十六研究所 Incremental underwater sound signal identification method based on multi-scale information fusion
CN112115822A (en) * 2020-09-04 2020-12-22 西北工业大学 Intelligent fusion sensing method for underwater moving target
CN112364779A (en) * 2020-11-12 2021-02-12 中国电子科技集团公司第五十四研究所 Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN112733656A (en) * 2020-12-30 2021-04-30 杭州电子科技大学 Skeleton action identification method based on multi-stream space attention diagram convolution SRU network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
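SOBHAN SHEYKHIVAND et al.: "Recognizing Emotions Evoked by Music Using CNN-LSTM Networks on EEG Signals", IEEE ACCESS, vol. 8, pages 139332 - 139345, XP011802750, DOI: 10.1109/ACCESS.2020.3011882 *
WU QIONG et al.: "Optical music score recognition method based on multi-scale residual convolutional neural network and bidirectional simple recurrent unit", Laser & Optoelectronics Progress, vol. 57, no. 8, pages 1 - 10 *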

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116543795A (en) * 2023-06-29 2023-08-04 天津大学 Sound scene classification method based on multi-mode feature fusion
CN116543795B (en) * 2023-06-29 2023-08-29 天津大学 Sound scene classification method based on multi-mode feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant