CN113239809B - Underwater sound target identification method based on multi-scale sparse SRU classification model - Google Patents


Info

Publication number
CN113239809B
Authority
CN
China
Prior art keywords
sru
model
layer
underwater sound
target
Prior art date
Legal status
Active
Application number
CN202110530281.1A
Other languages
Chinese (zh)
Other versions
CN113239809A (en)
Inventor
曾向阳 (Zeng Xiangyang)
杨爽 (Yang Shuang)
薛灵芝 (Xue Lingzhi)
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202110530281.1A
Publication of CN113239809A
Application granted
Publication of CN113239809B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention relates to an underwater sound target identification method based on a multi-scale sparse SRU classification model. The method uses the different feature representations learned by SRUs (simple recurrent units) of different depths to fuse the multi-scale features of the input data, and feeds the fused feature combination to the classifier (the last layer) to classify and identify multiple target classes. Skip connections are also added to the classification model to accelerate convergence and reduce training time.

Description

Underwater sound target identification method based on multi-scale sparse SRU classification model
Technical Field
The invention belongs to the field of passive underwater sound target identification under noise-mismatch conditions, and particularly relates to an underwater sound target identification method based on a multi-scale sparse SRU classification model.
Background
Underwater sound target identification is one of the important research directions in underwater acoustic signal processing. Because of the noise in the complex marine environment, a recognition method that performs very well in laboratory simulations may fall short of expectations when applied to real scenes. Suppressing noise interference and improving the robustness of the identification method are therefore critical for its practical application.
In recent years deep learning (DL) theory has attracted wide attention and has offered new ideas for underwater sound target recognition. Wang Jiang et al. studied the applicability of deep learning to underwater acoustic target recognition, built convolutional neural network (CNN) and deep belief network (DBN) models, and recognized 3 classes of measured underwater acoustic targets; the results show that deep learning models are a viable approach to underwater sound target identification. Lu Anan et al. applied long short-term memory (LSTM) networks to underwater acoustic target recognition. On MFCC features the LSTM achieved better recognition than a conventional recurrent neural network (RNN), a CNN and a deep neural network (DNN), indicating that the network benefits from exploiting more temporal information. However, because each LSTM time step depends on the previous one, the recurrence is hard to parallelize, so LSTMs run far slower than CNNs. Building on studies of the LSTM, Tao Lei et al. proposed the simple recurrent unit (SRU) and demonstrated its effectiveness on natural language processing (NLP) tasks: on classification and question-answering datasets, SRUs ran 5-9 times faster than a cuDNN-optimized LSTM.
SRUs have not yet been applied to passive underwater sound target recognition, although they have been used for speech recognition, text recognition and similar problems. Those applications complete the classification task only by increasing the network depth and the number of hidden nodes, or by combining the SRU with other structures such as CNNs; none of them uses a parallel sparse structure to fuse multi-scale SRU output features, that is, to complete the recognition task through feature fusion that enriches the recognition feature information.
Disclosure of Invention
Technical problem solved by the invention: to simplify the workflow of a passive underwater sound target recognition system, enrich the recognition feature information, keep the classification model's correct recognition rate high and improve its adaptability under noise-mismatch conditions, the invention introduces the SRU into underwater sound target recognition and proposes a multi-scale sparse SRU classification model. Using the end-to-end processing capability of deep learning, the model exploits the timbre information implicit in the structure of the time-domain waveform, which can distinguish different classes of underwater sound target signals; on this basis a passive underwater sound target recognition system is built to classify and recognize targets. The following problems are solved: (1) the method does not rely on manually extracted features, which simplifies building the passive recognition system; (2) SRUs of different depths learn, under supervision, multi-scale feature representations of the underwater sound target time series (time-domain waveform), which enriches the recognition feature information; (3) a skip connection is added to each SRU block, so the block converges more easily than if it had to learn the input-output mapping directly, which reduces training time; (4) the network is robust to noise, keeping a high correct recognition rate when the noise conditions of the training and test samples do not match.
Technical solution of the invention: the underwater sound target identification method based on the multi-scale sparse SRU classification model comprises the following steps:
Step 1: acquire and preprocess the underwater sound target sample data, in the following three sub-steps:
Step 1.1: define the underwater sound targets: take n (n ≥ 3) classes of underwater sound targets as the objects of study and label them, each class having m (m ≥ 15) audio files. Cut the first t (t ≥ 5) seconds of each audio file of each class and divide it into frames to obtain the underwater sound target sample data;
Step 1.2: divide the resulting sample data into a training set, a validation set and a test set;
Step 1.3: standardize the training and validation data; add band-limited white noise at different signal-to-noise ratios (SNRs) to the test data and then standardize it;
Step 2: use the time-domain waveforms of the training, validation and test sets standardized in Step 1.3 as the input of the multi-scale sparse SRU classification model, and train and test the model, in the following sub-steps:
Step 2.1: construct the multi-scale sparse SRU classification model, in the following two sub-steps:
Step 2.1.1: construct multi-scale SRU blocks on the basis of a single-layer SRU model,
where a single-layer SRU is computed as follows:

$$f_t = \sigma(W_f x_t + v_f \odot c_{t-1} + b_f) \qquad (1)$$
$$r_t = \sigma(W_r x_t + v_r \odot c_{t-1} + b_r) \qquad (2)$$
$$c_t = f_t \odot c_{t-1} + (1 - f_t) \odot (W x_t) \qquad (3)$$
$$h_t = r_t \odot g(c_t) + (1 - r_t) \odot x_t \qquad (4)$$

where $x_t$ is the input at time step t; $\sigma$ is the sigmoid function, which maps its input to (0, 1); $c_{t-1}$ is the internal (candidate hidden) state at time t-1; $W_f$, $W_r$ and $W$ are parameter matrices, and $v_f$, $v_r$, $b_f$ and $b_r$ are parameter vectors, all learned during training. Equations (1) and (2) define the forget gate $f_t$ and the reset gate $r_t$ at time t; equation (3) defines the candidate hidden state $c_t$ at time t; equation (4) defines the final hidden state $h_t$ at time t, in which $(1 - r_t) \odot x_t$ is the skip connection, $g$ is the nonlinear tanh function, which maps its input to (-1, 1), and $\odot$ denotes element-wise multiplication.
Step 2.1.2: constructing sparse connections
The multi-scale SRU blocks obtained in Step 2.1.1 are sparsely connected to fuse the feature representations learned by the different blocks, and the fused feature combination is used as the feature input of the classifier (the top layer of the model) to classify and identify the 3 classes of targets;
Step 2.2: train the model on the standardized training and validation sets: forward propagation computes the actual output probabilities, back-propagation of the gradient updates the network parameters, and the loss value is reduced step by step so that the model's actual output probabilities come ever closer to the expected output probabilities;
Step 2.3: to test, on unseen data, how well the model trained under mismatch conditions performs, feed the test set into the saved optimal model to further evaluate its classification performance. During testing the network model is evaluated with the F1 value, defined in formula (5):

$$F1 = \frac{2PR}{P + R} \qquad (5)$$

where P is the precision, i.e. the percentage of samples predicted as target i (i = 1, 2, 3) that actually are target i, and R is the recall, i.e. the percentage of samples that actually are target i that are correctly predicted as target i. The F1 value is the harmonic mean of P and R and is used to compare the merits of different algorithms; in the invention, a higher F1 value means better classification performance.
Step 3: use the validation-set correct recognition rate obtained in Step 2.2 as the evaluation index for training, and the F1 value from Step 2.3 as the evaluation index for testing.
In a further technical solution, the preprocessing in Step 1 divides each segment of target data into frames with a chosen frame length and frame shift, yielding the complete underwater sound target sample data.
In a further technical solution, the standardization in Step 1.3 is zero-mean, unit-variance normalization.
In a further technical solution, the model of Step 2.1 consists of an input layer, 4 SRU blocks, a multi-feature layer and 1 fully connected layer. Each SRU block consists of one SRU and one Layer Normalization layer; Layer Normalization is used in RNNs to normalize the input along the feature (channel) dimension. Each SRU block adds a skip connection between the model input and the multi-feature layer, forming a local structure of the network.
In a further technical solution, the SRUs in the 4 SRU blocks have 1, 2, 3 and 4 layers respectively, with 16, 32, 64 and 256 hidden nodes respectively. Inside each SRU block the connections are full (non-sparse), while the 4 SRU blocks are connected to one another sparsely. The top of the model is 1 fully connected layer with no activation function, which serves directly as the decision layer and outputs the actual output probabilities.
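The framing step can be sketched in plain Python. This is a minimal, hypothetical helper, not the authors' PyTorch code. The 1 kHz sampling rate is an assumption, since the document does not give one; the 100 ms frame length and 5 s clip length come from the embodiment, whose "frame shift of 0" is read here as non-overlapping frames (hop equal to frame length), which reproduces its total of 2250 samples.

```python
def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into frames of frame_len samples, advancing hop samples each time.

    Trailing samples that cannot fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // hop
    return [samples[i * hop : i * hop + frame_len] for i in range(n_frames)]


FS = 1000                  # hypothetical sampling rate in Hz (not stated in the document)
FRAME_LEN = FS // 10       # 100 ms frames
clip = [0.0] * (5 * FS)    # placeholder for one 5 s recording

frames = frame_signal(clip, FRAME_LEN, hop=FRAME_LEN)  # no overlap between frames
print(len(frames))             # 50 frames per 5 s clip
print(3 * 15 * len(frames))    # 3 classes x 15 files -> 2250 samples
```

Choosing a hop smaller than the frame length would give overlapping frames and more samples per clip; the embodiment's sample count implies no overlap.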
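Zero-mean, unit-variance standardization can be illustrated with a short sketch. It assumes each frame is standardized with its own mean and standard deviation; the document does not say whether the statistics are computed per frame or over the whole training set, so the per-frame choice and the function name are assumptions.

```python
import math

def standardize(frame):
    """Zero-mean, unit-variance scaling of one frame (population variance)."""
    n = len(frame)
    mean = sum(frame) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in frame) / n)
    if std == 0.0:
        return [0.0] * n  # a constant frame has no variance to normalize
    return [(v - mean) / std for v in frame]

z = standardize([1.0, 2.0, 3.0, 4.0])
print([round(v, 4) for v in z])  # [-1.3416, -0.4472, 0.4472, 1.3416]
```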
Effects of the invention
Technical effects of the invention: for the task of classifying and identifying 3 classes of measured underwater sound targets, the invention builds a multi-scale sparse SRU classification model based on supervised simple recurrent units.
1) The model connects SRUs of different depths in parallel to obtain multi-scale feature information, reaching a correct recognition rate of 96.7% on the validation set.
2) The model exploits the strength of recurrent neural networks in processing time series: SRUs of different depths learn, under supervision, multi-scale feature representations of the underwater sound target time series (time-domain waveform), which form the multi-scale SRU blocks, and these representations are fused through the sparse connection to complete the model.
3) During the iterative training of Step 2.2, the multi-scale sparse SRU classification model reached its optimal parameters, and its best validation-set correct recognition rate, at the 4th epoch, whereas the same model without skip connections did so only at the 21st epoch. Adding skip connections therefore speeds up convergence and reduces training time.
4) Compared with a CNN classification model, the multi-scale sparse SRU classification model keeps a higher correct recognition rate when the noise conditions of the training and test samples do not match, i.e. it is robust to noise.
Drawings
FIG. 1 is a flow chart of a hydroacoustic target recognition system based on a multi-scale sparse SRU classification model
FIG. 2 is a framework structure diagram of a multi-scale sparse SRU classification model
FIG. 3 is a diagram of a framework of a multi-layered CNN model
Detailed Description
In the description of the present invention, it should be understood that terms such as "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise" and "counterclockwise" indicate orientations or positional relationships based on those shown in the drawings. They are used merely for convenience and simplicity of description, do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
Referring to Figs. 1-3, to improve the recognition performance of a classification recognition system in a noise-mismatch environment, the invention, inspired by the Inception structure of CNN models, provides an underwater sound target identification method based on a multi-scale sparse SRU classification model. The method uses the different feature representations learned by SRUs of different depths to fuse the multi-scale features of the input data, and feeds the fused feature combination to the classifier (the last layer) to classify and identify multiple target classes. Skip connections are also added to the classification model to accelerate convergence and reduce training time.
Step 1: acquisition and preprocessing of underwater sound target sample data.
Define the underwater sound targets: 3 classes of underwater sound targets are taken as the objects of study and labeled, each with 15 audio files. The first 5 seconds of each audio file are cut out and divided into frames to obtain the sample data, which are then strictly divided into a training set, a validation set and a test set. Finally, a noise-mismatch condition is constructed for the test set.
Step 2: use the time-domain waveforms of the training, validation and test sets standardized in Step 1 as the input of the proposed multi-scale sparse SRU classification model, and train and test the model, in the following sub-steps:
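The strict division into training, validation and test sets can be sketched as a per-class (stratified) shuffle and cut. The 3/5, 1/5, 1/5 ratio is taken from the embodiment; stratifying by class, the fixed seed and the function name are assumptions for illustration. Splitting by audio file instead of by frame, so that frames from one recording never appear in two sets, is another reasonable reading of "strictly divided" that the document does not settle.

```python
import random
from collections import defaultdict

def stratified_split(labels, seed=0):
    """Return index lists (train, val, test) with a 3/5, 1/5, 1/5 split inside each class."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        n = len(idxs)
        n_train, n_val = n * 3 // 5, n // 5   # integer arithmetic avoids rounding surprises
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test

labels = [c for c in range(3) for _ in range(750)]   # 3 classes x (15 files x 50 frames)
tr, va, te = stratified_split(labels)
print(len(tr), len(va), len(te))   # 1350 450 450
```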
Step 2.1: construct the multi-scale sparse SRU classification model, in the following two sub-steps:
step 2.1.1: a multi-scale SRU block is constructed.
Each SRU block combines a single-layer or multi-layer stacked SRU with a skip connection. Different SRU blocks learn representations of the input data through SRUs with different numbers of stacked layers and hidden nodes, so the learned features differ and can be regarded as different representations of the same input. Shallow stacks with few hidden nodes capture low-level features, which reflect local characteristics of the data; deep stacks with many hidden nodes learn high-level features, which are more abstract and invariant. In deeper network structures the skip connection makes the block easier to train than a deep model that must learn the input-output mapping directly, and it also alleviates the vanishing-gradient problem of the SRU during training.
A single-layer SRU is computed as follows:

$$f_t = \sigma(W_f x_t + v_f \odot c_{t-1} + b_f) \qquad (1)$$
$$r_t = \sigma(W_r x_t + v_r \odot c_{t-1} + b_r) \qquad (2)$$
$$c_t = f_t \odot c_{t-1} + (1 - f_t) \odot (W x_t) \qquad (3)$$
$$h_t = r_t \odot g(c_t) + (1 - r_t) \odot x_t \qquad (4)$$

where $x_t$ is the input at time step t; $\sigma$ is the sigmoid function, which maps its input to (0, 1); $c_{t-1}$ is the internal (candidate hidden) state at time t-1; $W_f$, $W_r$ and $W$ are parameter matrices, and $v_f$, $v_r$, $b_f$ and $b_r$ are parameter vectors, all learned during training. Equations (1) and (2) define the forget gate $f_t$ and the reset gate $r_t$ at time t. Equation (3) defines the candidate hidden state $c_t$ at time t. Equation (4) defines the final hidden state $h_t$ at time t, in which $(1 - r_t) \odot x_t$ is the skip connection; following the idea of the Highway Network, it effectively mitigates the vanishing-gradient problem that arises when deep networks are trained by gradient descent. The activation function $g$ is tanh, which maps its input to (-1, 1), and $\odot$ denotes element-wise multiplication.
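The recurrence in equations (1)-(4) can be made concrete with a minimal pure-Python sketch. It assumes the standard SRU form in which both gates read the previous internal state $c_{t-1}$, a zero initial state, and equal input and hidden sizes so that the highway term $(1 - r_t) \odot x_t$ is defined; the function names are hypothetical and this is not the patent's PyTorch code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def sru_layer(xs, W, Wf, Wr, vf, vr, bf, br):
    """Run one SRU layer over a sequence xs of d-dimensional vectors (equations (1)-(4)).

    Input and hidden size are both d, so the highway term (1 - r_t) * x_t is defined.
    Returns the hidden states h_1..h_T and the final internal state c_T.
    """
    d = len(xs[0])
    c = [0.0] * d                      # c_0: hypothetical zero initial state
    hs = []
    for x in xs:
        # These products use only x_t, so a real implementation batches them over all t.
        wx, wfx, wrx = matvec(W, x), matvec(Wf, x), matvec(Wr, x)
        f = [sigmoid(wfx[i] + vf[i] * c[i] + bf[i]) for i in range(d)]        # (1) forget gate
        r = [sigmoid(wrx[i] + vr[i] * c[i] + br[i]) for i in range(d)]        # (2) reset gate
        c = [f[i] * c[i] + (1.0 - f[i]) * wx[i] for i in range(d)]            # (3) internal state
        h = [r[i] * math.tanh(c[i]) + (1.0 - r[i]) * x[i] for i in range(d)]  # (4) output + skip
        hs.append(h)
    return hs, c

d = 2
zero_m = [[0.0] * d for _ in range(d)]
zero_v = [0.0] * d
hs, c_last = sru_layer([[1.0, -2.0], [0.5, 0.0]], zero_m, zero_m, zero_m,
                       zero_v, zero_v, zero_v, zero_v)
print(hs)  # all gates are 0.5 and c stays 0, so h_t = 0.5 * x_t: [[0.5, -1.0], [0.25, 0.0]]
```

Only the element-wise updates in (1)-(4) are sequential; the three matrix products depend on $x_t$ alone, which is what lets SRUs run much faster than LSTMs on a GPU.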
Step 2.1.2: construct the sparse connection.
Inspired by the Inception module of CNNs, the network does not increase its expressive power simply by adding depth and hidden nodes; instead, it realizes multi-scale feature representation through a non-uniform sparse structure. Such multi-scale representation can also be used in RNNs. As a parallelizable RNN, the SRU combines time-series modeling capability with low computational cost, and feature fusion further enriches the recognition feature information. The multi-scale SRU blocks obtained in Step 2.1.1 are sparsely connected to fuse the feature representations learned by the different blocks, and the fused feature combination is used as the feature input of the classifier (the top layer of the model) to classify and identify the 3 target classes.
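One way to picture the sparse connection is as concatenation: each SRU block emits its own feature vector, and the multi-feature layer simply places them side by side before the fully connected decision layer. The sketch below assumes concatenation of each block's last-time-step output, uses random stand-ins for those outputs, and leaves out the skip connections, Layer Normalization and Batch Normalization; the block depths and widths come from the embodiment, while the function names and weights are hypothetical.

```python
import random

# (number of stacked SRU layers, hidden nodes) for the four blocks in the embodiment.
BLOCKS = [(1, 16), (2, 32), (3, 64), (4, 256)]

def fuse(block_outputs):
    """Multi-feature layer: concatenate the feature vectors produced by the blocks."""
    fused = []
    for h in block_outputs:
        fused.extend(h)
    return fused

def linear(x, weights, bias):
    """Fully connected layer without activation: one score per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

rng = random.Random(0)
# Stand-ins for the final hidden state each SRU block would produce for one frame.
outputs = [[rng.uniform(-1.0, 1.0) for _ in range(width)] for _, width in BLOCKS]
fused = fuse(outputs)
print(len(fused))   # 16 + 32 + 64 + 256 = 368

n_classes = 3
W = [[rng.gauss(0.0, 0.05) for _ in fused] for _ in range(n_classes)]
scores = linear(fused, W, [0.0] * n_classes)
print(len(scores))  # 3
```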
Step 2.2: training process.
The standardized training and validation sets are used as the input of the model for training. In each training iteration, the training set is first forward-propagated to compute the actual output probabilities; the network parameters are then updated by back-propagating the gradient, lowering the loss value so that the actual output probabilities move closer to the expected output probabilities. Both the expected and the actual output probabilities are 3-dimensional probability tensors, and the index of the maximum entry is the target label i (i = 1, 2, 3). Comparing the true target labels of the training samples (assigned in Step 1.1) with the predicted target labels (the index of the maximum entry of the actual output probability tensor produced by the network) gives the training-set correct recognition rate, i.e. the percentage of samples that are classified correctly. The validation set is then forward-propagated to compute its actual output probabilities, and comparing its true and predicted labels gives the validation-set correct recognition rate. When iterative training is complete, the best trained model (with its optimal parameters) is saved.
Step 2.3: testing.
To test, on unseen data, how well the model trained under mismatch conditions performs, the test set is fed into the saved optimal model to further evaluate its classification performance. During testing the network model is evaluated with the F1 value, defined in formula (5):

$$F1 = \frac{2PR}{P + R} \qquad (5)$$

where P is the precision, i.e. the percentage of samples predicted as target i (i = 1, 2, 3) that actually are target i, and R is the recall, i.e. the percentage of samples that actually are target i that are correctly predicted as target i. The F1 value is the harmonic mean of P and R and is used to compare the merits of different algorithms; in the invention, a higher F1 value means better classification performance.
Step 3: model evaluation.
The validation-set correct recognition rate obtained in Step 2.2 is the evaluation index for training, and the F1 value from Step 2.3 is the evaluation index for testing; evaluating the model with both indices gives a more complete assessment. To demonstrate the effectiveness of the invention, a time-series-based multi-layer CNN model is used for comparison: the multi-scale sparse SRU classification model constructed in Step 2.1 is replaced by the multi-layer CNN model, all other steps are left unchanged, and the two resulting training and testing indices are compared with those of the multi-scale sparse SRU classification model.
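The label bookkeeping in this training step (softmax over the outputs, cross-entropy against an integer label, predicted label as the index of the maximum, correct recognition rate as the fraction of matches) can be sketched as follows. The helper names and the toy scores are hypothetical, and labels are 0-based here, whereas the document numbers targets i = 1, 2, 3.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(scores, label):
    """Cross-entropy between softmax(scores) and an integer class label."""
    return -math.log(softmax(scores)[label])

def predict(scores):
    """Predicted label = index of the largest output."""
    return max(range(len(scores)), key=lambda i: scores[i])

def accuracy(all_scores, labels):
    """Correct recognition rate: fraction of samples whose predicted label is correct."""
    correct = sum(predict(s) == y for s, y in zip(all_scores, labels))
    return correct / len(labels)

batch = [[2.0, 0.1, -1.0], [0.0, 0.3, 0.2], [-0.5, 0.0, 1.5]]
true_labels = [0, 2, 2]
print([predict(s) for s in batch])      # [0, 1, 2]
print(accuracy(batch, true_labels))     # 2 of the 3 frames are classified correctly
```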
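Formula (5) can be checked on a small example. The sketch computes per-class precision, recall and F1 from true and predicted labels; the labels below are made up, and how the per-class values are combined into the single F1 per SNR in Table 1 is not stated in the document, so no averaging is assumed here.

```python
def per_class_f1(y_true, y_pred, classes):
    """Precision P, recall R and F1 = 2PR / (P + R) for each class, as in formula (5)."""
    result = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        result[c] = (precision, recall, f1)
    return result

y_true = [1, 1, 1, 2, 2, 3, 3, 3]
y_pred = [1, 1, 2, 2, 3, 3, 3, 1]
res = per_class_f1(y_true, y_pred, [1, 2, 3])
for c, (p, r, f1) in res.items():
    print(c, round(p, 3), round(r, 3), round(f1, 3))
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is preferred over a plain average of the two.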
The underwater sound target recognition method of the invention is described in detail with reference to examples and drawings, and the flow of an underwater sound target recognition system is shown in fig. 1. The underwater sound target identification method based on the multi-scale sparse SRU classification model is realized by programming in a Python language PyTorch environment. The training and verification of the model are performed on the GPU, and the cuDNN is used for accelerating the training speed of the model.
Step 1: 3 kinds of underwater sound targets (ships, commercial ships and certain underwater targets) with labels are read, 15 audio files are selected from each group, and each audio file is intercepted for 5s. First, the underwater sound target data is preprocessed. Each segment of target data is framed, each frame is 100ms long, and the frame shift is 0. I.e. one sample per 0.1 seconds of target data, there are 2250 samples for a total of 3 classes of targets. The underwater sound target data is strictly divided into a training set, a verification set and a test set. 3/5 of the total sample was used in the experiment as training, 1/5 as validation and 1/5 as test. And performing standardization processing (zero-mean and variance normalization) on training data and verification data of the 3-class targets, extracting time domain waveforms of the training data and the verification data as input of a network model, and constructing a time sequence network model. In the test, in order to construct a noise mismatch condition, band-limited white noise is added to 3 types of target test data respectively, target test data with SNR of-20 dB, -15dB, -10dB, -5dB, 0dB, 5dB, 10dB, 15dB and 20dB are generated respectively, and then standardized processing is carried out on the test data.
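The noise-mismatch test data can be sketched by scaling a noise sequence so that $10\log_{10}(P_s/P_n)$ equals the target SNR. This sketch uses plain Gaussian white noise; the band-limiting filter is omitted because the document does not give the band, and the 50 Hz tone at a 1 kHz sampling rate is a hypothetical stand-in for a real recording. All names are illustrative.

```python
import math
import random

def add_white_noise(signal, target_db, seed=0):
    """Return (noisy, noise) where noise is Gaussian and scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Choose scale so that p_signal / (scale**2 * p_noise) == 10 ** (target_db / 10).
    scale = math.sqrt(p_signal / (p_noise * 10 ** (target_db / 10)))
    scaled = [scale * n for n in noise]
    return [s + n for s, n in zip(signal, scaled)], scaled

def measured_snr_db(signal, noise):
    ps = sum(s * s for s in signal) / len(signal)
    pn = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(ps / pn)

# Hypothetical clean test frame: 1 s of a 50 Hz tone sampled at 1 kHz.
clean = [math.sin(2 * math.pi * 50 * k / 1000) for k in range(1000)]
for target in range(-20, 25, 5):   # -20 dB ... 20 dB in 5 dB steps, as in the embodiment
    noisy, noise = add_white_noise(clean, target)
    print(target, round(measured_snr_db(clean, noise), 6))
```

As in the embodiment, the noisy test data would then be standardized before being fed to the trained model.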
Step 2: taking the time domain waveforms of the training set, the verification set and the test set which are subjected to the standardization processing in the step 1 as the input of the multi-scale sparse SRU classification model created in the invention, and carrying out model training and testing, wherein the method comprises the following substeps:
step 2.1: and constructing a multi-scale sparse SRU classification model. A framework structure diagram of the multi-scale sparse SRU classification model is shown in fig. 2. The model consists of an input layer, 4 SRU blocks (within the dashed box in the figure), a multi-feature layer, and 1 fully connected layer. Wherein each SRU block is made up of one SRU and one Layer Normalization layer. The Layer Normalization layer is mainly used in an RNN network, and performs normalization operation on target input in the channel direction, so that data is ensured to be distributed consistently in the channel direction. Meanwhile, each SRU block adds a skip connection (skip connection) between the model input and the multi-feature layer, which forms a local structure of the network. The jump connection is applied to deeper network structures, so that the mapping between input and output is easier to converge than the mapping between input and output which are directly learned by a depth model, and meanwhile, the problem of gradient dissipation of the SRU in the training process is solved. The SRU in the 4 SRU blocks are respectively 1 layer, 2 layers, 3 layers and 4 layers; the number of hidden nodes is 16, 32, 64 and 256, respectively. The internal structure of each SRU block is fully connected and non-sparse, and 4 SRU blocks are sparsely connected. The multi-feature layer contains feature expressions learned by different layers of SRUs. Batch Normalization after application to the multi-feature layer, by normalizing the data batch, the network convergence speed is increased while preventing the gradient of the network from disappearing and gradient exploding. The top layer of the model is connected with 1 full-connection layer. The full connection layer is not activated, and is directly used as a discrimination layer to output actual output (probability).
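Layer Normalization, as used inside each SRU block, normalizes every feature vector over its own features rather than over the batch. A minimal sketch, assuming a learnable gain and bias and eps = 1e-5 with the biased variance (the same defaults as PyTorch's `torch.nn.LayerNorm`); the function name is hypothetical.

```python
import math

def layer_norm(h, gamma=None, beta=None, eps=1e-5):
    """Normalize one feature vector over its own features, then apply gain and bias."""
    d = len(h)
    gamma = [1.0] * d if gamma is None else gamma
    beta = [0.0] * d if beta is None else beta
    mean = sum(h) / d
    var = sum((v - mean) ** 2 for v in h) / d   # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(h, gamma, beta)]

# One hidden-state vector at one time step; each time step is normalized independently.
out = layer_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in out])
```

Because the statistics come from a single vector, the result does not depend on the batch size or on the sequence position, which is why Layer Normalization suits recurrent layers, whereas the Batch Normalization applied after the multi-feature layer pools statistics across the batch.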
Step 2.2: and (5) model training. And taking the time domain waveforms of the class 3 target training set and the validation set as the input of the network model, and training the network model. Setting training parameters, randomly initializing a network, calculating loss by adopting a sparse classification cross entropy loss function, optimizing gradient by adopting an adaptive moment estimation (Adam) algorithm, and obtaining the learning rate of 0.001 and the training times of 50 times. After training, obtaining the optimal correct recognition rate of the verification set and the corresponding network model parameters, storing an optimal model (parameter optimal), and carrying out subsequent tests.
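The Adam update used in training can be illustrated on a toy one-dimensional objective. The sketch uses the usual Adam defaults (β1 = 0.9, β2 = 0.999, ε = 1e-8, which are also the defaults of PyTorch's `torch.optim.Adam`) and the document's learning rate of 0.001 as the default argument; the demo raises the learning rate to 0.01 only so the toy problem converges in a few thousand steps. The function name and objective are hypothetical.

```python
import math

def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function with Adam, given a function returning its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment (uncentered variance) estimate
        m_hat = m / (1 - beta1 ** t)             # bias corrections for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective (x - 3)^2 with gradient 2(x - 3).
x_star = adam_minimize(lambda x: 2 * (x - 3), x0=0.0, lr=0.01, steps=3000)
print(round(x_star, 2))
```

Each step moves the parameter by roughly the learning rate at most, scaled per parameter by the running gradient statistics, which is why Adam is relatively insensitive to the raw gradient magnitude.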
Step 2.3: and (5) model testing. In order to test the recognition performance of the network model under the noise mismatch condition, the trained network model is applied to the underwater sound target classification recognition task under the noise mismatch condition, and the model is further analyzed to obtain F1 values of the network model under different SNR.
Step 3: and (5) evaluating a model. The correct recognition rate of the verification set obtained in the step 2.2 is an evaluation index in the training process, and the F1 value in the step 2.3 is an evaluation index in the testing process. In order to prove the effectiveness of the present invention, a time-series-based multi-layer CNN model was used for comparison when oriented to actual verification. The framework structure diagram of the multi-layer CNN model is shown in figure 3. The model consists of an input layer, 3 one-dimensional convolution layers, 3 one-dimensional pooling layers and 2 full connection layers. In the model, each convolution layer performs ReLU activation, batch Normalization operation is performed before activation, and a one-dimensional pooling layer is added to reduce the space size of data. The number of convolution kernels of the 3 one-dimensional convolution layers is 32, 64 and 128, respectively, and the size of the one-dimensional pooling layer is 3. The 1 st full connection layer uses ReLU as an activation function, input data is normalized in batches by Batch Normalization operation before activation, a dropout layer is added after activation to prevent network over fitting, and the dropout value is 0.5. The 2 nd full connection layer is not activated and is directly used as a discrimination layer to output the actual output (probability) of various underwater sound target samples.
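One convolution-ReLU-pooling stage of the comparison CNN can be sketched on a single channel. The kernel values, the absence of padding and the use of max pooling are assumptions (the document gives only the kernel counts 32/64/128 and a pooling size of 3); the pooling stride equals the window size, as in PyTorch's `MaxPool1d` default, and Batch Normalization is left out for brevity.

```python
def conv1d_valid(x, kernel, bias=0.0):
    """Single-channel 1-D cross-correlation with no padding and stride 1."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) + bias for i in range(len(x) - k + 1)]

def relu(xs):
    return [max(0.0, v) for v in xs]

def max_pool1d(x, size=3):
    """Non-overlapping max pooling (stride = size); a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

signal = [0.0, 1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 0.0, 1.0, -0.5, 0.0]
features = max_pool1d(relu(conv1d_valid(signal, kernel=[0.5, -1.0, 0.5])))
print(features)  # [4.0, 2.25, 1.5]
```

With 11 input samples, the valid convolution yields 9 values and pooling by 3 leaves 3, illustrating how each stage shrinks the sequence the next layer has to process.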
The experimental results are as follows. During training, the multi-layer CNN model reaches a correct recognition rate of 96.0% on the validation set, while the multi-scale sparse SRU classification model reaches 96.7%, which is higher. In addition, the multi-scale sparse SRU classification model reaches its optimal parameters, and thus its best validation recognition rate, at the 4th training epoch, whereas the same model without skip connections reaches its best validation recognition rate only at the 21st epoch; adding skip connections therefore accelerates model convergence and shortens training time. The F1 values of the 2 models at different SNRs are given in Table 1 below. As Table 1 shows, at low SNRs the F1 value of the multi-scale sparse SRU classification model is higher than that of the multi-layer CNN model, indicating that the proposed model can suppress the influence of noise mismatch and is robust to noise. These results demonstrate the effectiveness of the present invention.
Table 1: F1 values of the 2 models at different SNRs

Claims (3)

1. An underwater sound target identification method based on a multi-scale sparse SRU classification model, characterized by comprising the following steps:
step 1: acquiring and preprocessing the underwater sound target sample data, comprising the following three sub-steps:
step 1.1: defining a plurality of underwater sound targets, taking n classes of underwater sound targets as research objects and labeling them, where n ≥ 3; each of the n classes has m audio files, where m ≥ 15; the first t seconds of each audio file in each class are extracted and then divided into frames to obtain the underwater sound target sample data, where t ≥ 5;
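The truncation and framing of step 1.1 (together with the frame length and frame shift mentioned in claim 2) can be sketched as below. The text does not give the frame length or shift, so the values in the example are hypothetical, as is the function name; presumably each frame inherits the class label of its source file.

```python
def frame_signal(samples, sample_rate, t_seconds, frame_len, frame_shift):
    """Keep the first t seconds of a recording and split it into (possibly overlapping) frames.

    frame_len and frame_shift are given in samples; a trailing partial frame is dropped."""
    if t_seconds < 5:
        raise ValueError("the method requires t >= 5 seconds")
    head = samples[: int(t_seconds * sample_rate)]
    return [head[start:start + frame_len]
            for start in range(0, len(head) - frame_len + 1, frame_shift)]
```

For a 100-sample recording at a toy 10 Hz rate, `frame_signal(samples, 10, 5, 20, 10)` keeps the first 50 samples and returns 4 frames of 20 samples with 50% overlap.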
step 1.2: dividing the obtained underwater sound target sample data into a training set, a validation set and a test set;
step 1.3: standardizing the training set and validation set data, and standardizing the test set data after adding band-limited white noise at different SNRs (signal-to-noise ratios);
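The noise injection of step 1.3 can be sketched as scaling white Gaussian noise so that 10·log10(P_signal / P_noise) equals the target SNR. The pass band of the band-limited noise is not stated in the text, so the band-limiting filter is omitted here; the function name and the fixed default seed are assumptions.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Return signal + white Gaussian noise scaled to the requested SNR in dB.

    Band-limiting is omitted in this sketch; a band-pass filter would be applied
    to `noise` before scaling if the analysis band were known."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so noisy test sets are reproducible
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    target_p_noise = p_signal / (10 ** (snr_db / 10))
    scale = math.sqrt(target_p_noise / p_noise)  # rescale the measured noise power exactly
    return [s + scale * n for s, n in zip(signal, noise)]
```

Calling it once per SNR value yields the mismatched test sets, which are then standardized in the same way as the training data.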
step 2: taking the time-domain waveforms of the training, validation and test sets standardized in step 1.3 as the input of the multi-scale sparse SRU classification model, and performing model training and testing, comprising the following sub-steps:
step 2.1: constructing the multi-scale sparse SRU classification model, comprising the following two sub-steps:
step 2.1.1: constructing multi-scale SRU blocks, where each SRU block combines a single-layer or multi-layer stacked SRU with a skip connection; different SRU blocks learn feature representations of the input data through SRUs stacked to different depths; because the stacking depths and numbers of hidden nodes differ, the learned features differ and can be regarded as different representations of the input data; shallow stacking with fewer hidden nodes captures low-level features, while deep stacking with more hidden nodes learns high-level features; low-level features reflect the local characteristics of the data, whereas high-level features are more abstract and invariant;
wherein the single-layer SRU is computed as follows:
$$f_t = \sigma\left(W_f x_t + v_f \odot h_{t-1} + b_f\right) \tag{1}$$
$$r_t = \sigma\left(W_r x_t + v_r \odot h_{t-1} + b_r\right) \tag{2}$$
$$\tilde{h}_t = f_t \odot h_{t-1} + (1 - f_t) \odot (W x_t) \tag{3}$$
$$h_t = r_t \odot g(\tilde{h}_t) + (1 - r_t) \odot x_t \tag{4}$$
where $x_t$ is the input at the $t$-th time step; $\sigma$ is the sigmoid function, which maps its input to the interval $(0, 1)$; $h_{t-1}$ is the hidden state at time $t-1$; $W_f$, $W_r$ and $W$ are parameter matrices, and $v_f$, $v_r$, $b_f$ and $b_r$ are parameter vectors learned during training; equations (1) and (2) define the forget gate $f_t$ and the reset gate $r_t$ at time $t$, respectively; equation (3) defines the candidate hidden state $\tilde{h}_t$ at time $t$; equation (4) defines the final hidden state $h_t$ at time $t$, in which $(1 - r_t) \odot x_t$ is the skip connection, $g$ is the nonlinear tanh function, which maps its input to the interval $(-1, 1)$, and $\odot$ is the element-wise product operator;
step 2.1.2: constructing sparse connections
the multi-scale SRU blocks obtained in step 2.1.1 are sparsely connected to obtain a fused form of the feature representations learned by the different SRU blocks, and the fused feature combination is used as the feature input to the classifier, i.e., the top layer of the model, thereby completing the classification and recognition of multiple targets;
the model consists of an input layer, 4 SRU blocks, a multi-feature layer and 1 fully connected layer; each SRU block consists of one SRU and one Layer Normalization layer; Layer Normalization, commonly used in RNNs, normalizes the inputs along the channel direction; each SRU block adds a skip connection between the model input and the multi-feature layer, forming a local structure of the network;
the SRUs in the 4 SRU blocks have 1, 2, 3 and 4 layers, respectively, with 16, 32, 64 and 256 hidden nodes, respectively; the connections inside each SRU block are full (non-sparse), while the 4 SRU blocks are sparsely connected to one another; the top of the model is connected to 1 fully connected layer, which has no activation and serves directly as the discrimination layer that outputs the actual output probabilities;
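Step 2.1 can be illustrated with a minimal, dependency-free forward-pass sketch. It is not the authors' implementation. It assumes the gates are driven by $h_{t-1}$ as the text describes (the published SRU drives them with its internal cell state $c_{t-1}$), uses random untrained weights, applies layer normalization only to each block's last time step, and reads the "sparse connection" as concatenating the outputs of otherwise independent blocks. The projection `W_s`, used when the input and hidden widths differ, is hypothetical. The default depths (1 to 4 layers) and hidden sizes (16, 32, 64, 256) come from the text.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SRULayer:
    """One SRU layer (all products element-wise):
        f_t  = sigmoid(W_f x_t + v_f * h_{t-1} + b_f)    # forget gate
        r_t  = sigmoid(W_r x_t + v_r * h_{t-1} + b_r)    # reset gate
        h~_t = f_t * h_{t-1} + (1 - f_t) * (W x_t)       # candidate hidden state
        h_t  = r_t * tanh(h~_t) + (1 - r_t) * x_t        # (1 - r_t) * x_t: skip connection
    If the input width differs from the hidden width, x_t in the skip term is
    replaced by W_s x_t (W_s is a hypothetical projection)."""

    def __init__(self, in_size, hidden, rng):
        self.hidden = hidden
        self.W_f = rand_matrix(hidden, in_size, rng)
        self.W_r = rand_matrix(hidden, in_size, rng)
        self.W = rand_matrix(hidden, in_size, rng)
        self.W_s = None if in_size == hidden else rand_matrix(hidden, in_size, rng)
        self.v_f = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.v_r = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b_f = [0.0] * hidden
        self.b_r = [0.0] * hidden

    def forward(self, xs):
        h = [0.0] * self.hidden
        outputs = []
        for x in xs:
            wf, wr, wx = matvec(self.W_f, x), matvec(self.W_r, x), matvec(self.W, x)
            skip = x if self.W_s is None else matvec(self.W_s, x)
            f = [sigmoid(a + v * hp + b) for a, v, hp, b in zip(wf, self.v_f, h, self.b_f)]
            r = [sigmoid(a + v * hp + b) for a, v, hp, b in zip(wr, self.v_r, h, self.b_r)]
            h_cand = [fi * hp + (1 - fi) * w for fi, hp, w in zip(f, h, wx)]
            h = [ri * math.tanh(c) + (1 - ri) * s for ri, c, s in zip(r, h_cand, skip)]
            outputs.append(h)
        return outputs


def layer_norm(v, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def sru_block(xs, depth, hidden, rng):
    """Stack `depth` SRU layers, then layer-normalize the last time step's hidden state."""
    in_size = len(xs[0])
    for _ in range(depth):
        xs = SRULayer(in_size, hidden, rng).forward(xs)
        in_size = hidden
    return layer_norm(xs[-1])


def multi_feature_layer(xs, depths=(1, 2, 3, 4), hiddens=(16, 32, 64, 256), seed=0):
    """Run the blocks independently and concatenate their features: the blocks
    share only the input and the fused output (one reading of 'sparse connection')."""
    rng = random.Random(seed)
    fused = []
    for depth, hidden in zip(depths, hiddens):
        fused.extend(sru_block(xs, depth, hidden, rng))
    return fused
```

With the default configuration, a frame given as a list of one-element sample vectors yields 16 + 32 + 64 + 256 = 368 fused features, which the final fully connected layer would map to one score per class.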
step 2.2: training the model on the standardized training and validation sets: the actual output probabilities are computed by forward propagation, the network parameters are updated by backward gradient propagation, and the loss value is reduced so that the error keeps decreasing and the actual output probabilities of the model approach the expected output probabilities ever more closely;
step 2.3: to test how well the model trained under mismatch conditions performs on an unseen data set, the test set is fed into the saved optimal model to further test its classification performance; during testing, the F1 value is used to measure the error of the network model, and is defined as in formula (5):
$$F1 = \frac{2PR}{P + R} \tag{5}$$
where P is the precision, i.e., the percentage of samples predicted as target i that actually are target i, i = 1, 2, 3; R is the recall, i.e., the percentage of samples that actually are target i that are correctly predicted as target i; the F1 value is the harmonic mean of P and R; F1 values are used to compare the merits of different algorithms, and a higher F1 value indicates better classification performance of the model;
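Formula (5) and the correct recognition rate can be computed per class as below. The text does not say how the per-class F1 values are combined into one score, so this sketch simply reports each class separately; the function names are hypothetical.

```python
def per_class_f1(y_true, y_pred, classes):
    """Return {class: (precision, recall, F1)} with F1 = 2PR / (P + R)."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)  # samples predicted as class c
        actual = sum(1 for t in y_true if t == c)     # samples that really are class c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def accuracy(y_true, y_pred):
    """Correct recognition rate: fraction of samples whose predicted class matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For labels `[1, 1, 2, 2, 3, 3]` and predictions `[1, 2, 2, 2, 3, 1]`, class 2 has P = 2/3, R = 1 and F1 = 0.8, and the correct recognition rate is 4/6.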
step 3: taking the correct recognition rate on the validation set obtained in step 2.2 as the evaluation index during training, and the F1 value from step 2.3 as the evaluation index during testing.
2. The underwater sound target identification method based on the multi-scale sparse SRU classification model according to claim 1, wherein the preprocessing in step 1 consists of dividing each segment of target data into frames with a determined frame length and frame shift, so as to obtain the overall underwater sound target sample data.
3. The underwater sound target identification method based on the multi-scale sparse SRU classification model according to claim 1, wherein the standardization in step 1.3 is zero-mean, unit-variance normalization.
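The zero-mean, unit-variance standardization of claim 3 amounts to a z-score transform. Whether the statistics are computed per frame or over the whole training set is not stated; this sketch assumes per-frame statistics, and the small `eps` that guards against division by zero is an added assumption.

```python
import math


def standardize(frame, eps=1e-12):
    """Zero-mean, unit-variance (z-score) standardization of one frame."""
    mean = sum(frame) / len(frame)
    std = math.sqrt(sum((x - mean) ** 2 for x in frame) / len(frame))
    return [(x - mean) / (std + eps) for x in frame]
```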
CN202110530281.1A 2021-05-14 2021-05-14 Underwater sound target identification method based on multi-scale sparse SRU classification model Active CN113239809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110530281.1A CN113239809B (en) 2021-05-14 2021-05-14 Underwater sound target identification method based on multi-scale sparse SRU classification model

Publications (2)

Publication Number Publication Date
CN113239809A CN113239809A (en) 2021-08-10
CN113239809B true CN113239809B (en) 2023-09-15

Family

ID=77134433

Country Status (1)

Country Link
CN (1) CN113239809B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116543795B (en) * 2023-06-29 2023-08-29 天津大学 Sound scene classification method based on multi-mode feature fusion

Citations (12)

Publication number Priority date Publication date Assignee Title
CN108154105A (en) * 2017-12-21 2018-06-12 深圳先进技术研究院 Aquatic organism detects and recognition methods, device, server and terminal device
CN110491415A (en) * 2019-09-23 2019-11-22 河南工业大学 A kind of speech-emotion recognition method based on convolutional neural networks and simple cycle unit
CN110580458A (en) * 2019-08-25 2019-12-17 天津大学 music score image recognition method combining multi-scale residual error type CNN and SRU
CN110738138A (en) * 2019-09-26 2020-01-31 哈尔滨工程大学 Underwater acoustic communication signal modulation mode identification method based on cyclic neural network
CN110807365A (en) * 2019-09-29 2020-02-18 浙江大学 Underwater target identification method based on fusion of GRU and one-dimensional CNN neural network
CN111243579A (en) * 2020-01-19 2020-06-05 清华大学 Time domain single-channel multi-speaker voice recognition method and system
CN111257341A (en) * 2020-03-30 2020-06-09 河海大学常州校区 Underwater building crack detection method based on multi-scale features and stacked full convolution network
CN111754438A (en) * 2020-06-24 2020-10-09 安徽理工大学 Underwater image restoration model based on multi-branch gating fusion and restoration method thereof
CN111985533A (en) * 2020-07-14 2020-11-24 中国电子科技集团公司第三十六研究所 Incremental underwater sound signal identification method based on multi-scale information fusion
CN112115822A (en) * 2020-09-04 2020-12-22 西北工业大学 Intelligent fusion sensing method for underwater moving target
CN112364779A (en) * 2020-11-12 2021-02-12 中国电子科技集团公司第五十四研究所 Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN112733656A (en) * 2020-12-30 2021-04-30 杭州电子科技大学 Skeleton action identification method based on multi-stream space attention diagram convolution SRU network

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN109814937A (en) * 2017-11-20 2019-05-28 广东欧珀移动通信有限公司 Application program prediction model is established, preloads method, apparatus, medium and terminal
CN109815785A (en) * 2018-12-05 2019-05-28 四川大学 A kind of face Emotion identification method based on double-current convolutional neural networks

Non-Patent Citations (2)

Title
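Recognizing Emotions Evoked by Music Using CNN-LSTM Networks on EEG Signals; Sobhan Sheykhivand et al.; IEEE Access; Vol. 8; 139332-139345 *
Optical music score recognition method based on multi-scale residual convolutional neural network and bidirectional simple recurrent unit; Wu Qiong et al.; Laser & Optoelectronics Progress; Vol. 57 (No. 8); 1-10 *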

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant