CN117275510A - Small sample underwater sound target identification method and system based on multi-gradient flow network - Google Patents

Small sample underwater sound target identification method and system based on multi-gradient flow network

Info

Publication number
CN117275510A
CN117275510A (application CN202311062301.2A)
Authority
CN
China
Prior art keywords
feature
underwater sound
gradient flow
module
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311062301.2A
Other languages
Chinese (zh)
Inventor
唐建勋 (Tang Jianxun)
陈名松 (Chen Mingsong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202311062301.2A
Publication of CN117275510A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/18 - Speech or voice analysis techniques characterised by the extracted parameters being spectral information of each sub-band
    • G10L25/24 - Speech or voice analysis techniques characterised by the extracted parameters being the cepstrum
    • G10L25/30 - Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/213 - Pattern recognition: feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
    • G06F18/2431 - Pattern recognition: classification techniques relating to multiple classes
    • G06F18/24765 - Pattern recognition: rule-based classification
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/20 - Machine learning: ensemble learning


Abstract

The invention discloses a small-sample underwater sound target identification method and system based on a multi-gradient flow network. The method specifically comprises the following steps: S1, data acquisition, feature extraction and fusion; S2, model building and training; S3, model deployment; S4, real-time data acquisition and processing; S5, underwater sound target identification. The invention relates to the technical field of underwater sound target identification. In the method and system, the energy-domain feature enhancement and multi-feature fusion performed in the feature extraction and fusion module express the original underwater sound features more effectively, and embedding this module at the front end of the multi-gradient flow global feature enhancement network realizes end-to-end underwater sound target identification. Within the multi-gradient flow global feature enhancement network, the multi-gradient flow method and the context feature enhancement module rapidly acquire and enhance effective features while reducing the model parameters, thereby meeting the dual requirements of high recognition speed and high accuracy demanded by underwater sound target recognition.

Description

Small sample underwater sound target identification method and system based on multi-gradient flow network
Technical Field
The invention relates to the technical field of underwater sound target identification, in particular to a small sample underwater sound target identification method and system based on a multi-gradient flow network.
Background
Underwater sound target identification is a technique in which a hydrophone acquires an underwater acoustic signal carrying target information and the signal is then analyzed and processed to judge the target type. It is one of the difficult problems in the field of underwater acoustic signal processing and is of great significance for underwater environment detection, marine vessel monitoring, underwater vehicle detection, and related applications.
An existing underwater sound target recognition model mainly comprises two modules: a feature extraction module for the underwater sound target signal, and a classifier. In existing methods, feature extraction and classification are usually two relatively independent links: a feature extraction method is customized according to the characteristics of the underwater sound target signal, the original underwater sound features are extracted with this method, and finally the features are input into a specific classification model that judges their categories.
According to the differences in radiated noise among ships, existing underwater sound target feature extraction methods are mainly based on signal physical features, brain-like computation, or deep learning. Methods based on signal physical features rely on the basic characteristics of the underwater acoustic signal together with the time-varying and non-Gaussian characteristics produced by factors such as seawater absorption, refraction, scattering, seabed reflection, and sea-surface reflection. Brain-like computational features of underwater acoustic signals mainly include Mel cepstrum coefficients, which imitate the nonlinear frequency processing of the human ear, and Gammatone filtering, which imitates computation in the auditory periphery. Deep-learning-based methods mainly extract deep abstract features of the underwater acoustic signal through convolutional neural networks. Huang et al. obtained acoustic target line-spectrum features by processing raw acoustic data directly in the time domain with an autoassociative neural network (AANN), without requiring prior information about the data (Huang C., Yang K. and Yang Q. et al., "Line spectrum extraction based on autoassociative neural networks," JASA Express Letters, vol. 1, pp. 16003, 2021). Although deep-learning-based feature extraction can reach deeper abstract features than methods based on signal physical features or brain-like computation, it also incurs a large computational cost.
Existing classifier models mainly include machine-learning methods such as support vector machines (SVMs) and artificial neural networks (ANNs). Although machine-learning methods can accomplish the feature classification task, they depend on prior knowledge, generalize poorly, and keep feature extraction and classification completely independent, so they cannot meet the practical requirement of end-to-end underwater sound target recognition. The rapid development of deep-learning theory, including convolutional neural networks (CNN), generative adversarial networks (GAN), deep belief networks (DBN), the Transformer, and autoencoders, provides new solutions for the classifier model in underwater sound target recognition.
Prior technical schemes are as follows:
Wang et al. verified the feasibility of deep-learning models for underwater sound target recognition by building a convolutional neural network and a deep belief network and then testing the models on 3 measured underwater sound targets (Wang Q. and Zeng X. Y., "Deep learning methods and their applications in underwater targets recognition," Tech. Acoust., vol. 34, pp. 138-140, 2015). However, both models mainly stack convolution operations to extract the original characteristics of the underwater sound target. Because some frequencies of an underwater sound target are similar to those of ocean background noise, stacked ordinary convolutions can easily lose effective features of a target signal mixed with ocean background noise while retaining some background-noise features, which reduces the recognition accuracy of the models. Moreover, the models were not optimized for small-sample datasets, so overfitting easily occurs when they are trained on a small-sample dataset.
Li et al. increased the residual-network width on the basis of ResNet and combined a channel attention mechanism to enhance the features of different channels, thereby strengthening the model's feature-extraction capability for underwater acoustic targets (Li J., Wang B. and Cui X. et al., "Underwater Acoustic Target Recognition Based on Attention Residual Network," Entropy, vol. 24, pp. 1657, 2022).
Yang et al. improved the recognition accuracy of an underwater sound target recognition model mainly by reducing the number of ResNet residual structures and using an attention mechanism to help the model focus on important information (Yang S., Xue L. and Hong X. et al., "A Lightweight Network Model Based on an Attention Mechanism for Ship-Radiated Noise Classification," Journal of Marine Science and Engineering, vol. 11, pp. 432, 2023).
Although both models achieve good recognition accuracy, they are not optimized for small samples and overfit easily when trained on them; furthermore, their large parameter counts lead to long prediction times, so they cannot effectively balance recognition accuracy against prediction time.
From the above survey, the prior art has the following drawbacks:
1. Existing models are mainly two-stage: effective feature information is first extracted from the underwater acoustic signal, and a classifier then identifies the target category. This form cannot meet the requirement of end-to-end underwater sound target recognition;
2. Existing models mainly realize feature extraction of the underwater acoustic signal with ordinary convolution. Because marine environmental noise is mixed with part of the underwater sound target's feature information, ordinary convolution easily discards some effective target features while erroneously retaining marine environmental noise, which weakens the model's ability to extract effective features;
3. Existing models have complex structures, and their large parameter counts force recognition accuracy and recognition speed to trade off against each other, so the dual requirements of high recognition speed and high accuracy in underwater sound target recognition cannot be met;
4. Because underwater acoustic data are difficult to acquire, the per-class sample sizes in existing public underwater acoustic datasets are unbalanced, so deep-learning-based underwater sound target recognition models easily overfit during training.
Disclosure of Invention
(I) Technical problems solved
Aiming at the deficiencies of the prior art, the invention provides a small-sample underwater sound target recognition method and system based on a multi-gradient flow network. It solves the problems that the single feature extraction method of conventional underwater sound target recognition models cannot effectively express multi-dimensional underwater sound target features, cannot meet the high recognition accuracy and short prediction time required by industrial application, and easily overfits when trained on a small, class-unbalanced dataset.
(II) Technical scheme
In order to achieve the above purpose, the invention is realized by the following technical scheme: a small-sample underwater sound target identification method based on a multi-gradient flow network, specifically comprising the following steps:
S1, data acquisition, feature extraction and fusion: dividing the underwater sound audio signals of each target class in a pre-acquired underwater sound target dataset into segments of a preset equal duration, then inputting each segment into a feature extraction and fusion module and outputting a multi-dimensional feature map of the audio signal, and finally dividing the multi-dimensional feature maps of each target class into a training set, a validation set, and a test set according to a preset ratio;
S2, model building and training: building a multi-gradient flow global feature enhancement network, setting its activation function, loss-function parameters, and training parameters, inputting all multi-dimensional feature maps of the training set and validation set obtained through feature extraction and fusion in S1 into the network, training the network, and acquiring its model weights when the network converges;
S3, model deployment: deploying the feature extraction and fusion module of S1 and the model weights acquired in S2 on a development board, and acquiring the deployed feature extraction and fusion module and multi-gradient flow global feature enhancement network;
S4, real-time data acquisition and processing: collecting the radiated noise of an underwater target with a hydrophone, converting it into an underwater sound audio signal, dividing the real-time signal into segments of the preset equal duration, transmitting it frame by frame to the feature extraction and fusion module deployed in S3, and processing the received signal to output a multi-dimensional feature map of the audio signal;
S5, underwater sound target identification: transmitting the multi-dimensional feature map of S4 to the multi-gradient flow global feature enhancement network deployed in S3 and outputting the recognition result for the underwater sound target.
The invention further provides that the feature extraction and fusion module comprises constant-Q transform feature extraction, Mel cepstrum feature extraction and enhancement, and the enhancement and fusion of the constant-Q transform features and Mel cepstrum features;
the extraction of the constant-Q transform features comprises:
A1, setting the frame length to 2048, with the overlap between adjacent frames equal to 75% of the frame length;
B1, applying to each frame a Hanning window whose size equals the frame length;
C1, performing the constant-Q transform, where the constant-Q transform of a finite-length sequence x(n) is:

X_CQT(k, n) = (1/N_k) * sum_{m=0}^{N_k - 1} x_n(m) * w_{N_k}(m) * e^{-j 2 pi Q m / N_k}

where x_n(m) is the m-th sample of the n-th frame, w_{N_k}(m) is a Hamming window of length N_k, Q is the constant factor of the constant-Q transform, K is the number of constant-Q frequency bins, and the value of N_k depends on the value of k;

Q = 1 / (2^{1/b} - 1)

where b is the number of frequency bins per octave;

f_k = f_min × 2^{k/b}, k = 0, 1, ..., K-1

In the feature extraction process, f_min = 1 and f_s = 22050; the constant-Q transform information is stored in the matrix X_CQT(k, n). Since the original underwater sound signal is a 5 s segment sampled at 22050 Hz, the constant-Q transform has shape 128 × 216 (a sketch follows below).
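To make the above concrete, the following is a minimal NumPy sketch of the constant-Q transform as defined here. It is illustrative, not the patented implementation: the hop of 512 (frame length 2048 with 75% overlap) and the window clamp are assumptions, and at f_min = 1 Hz the nominal window length Q*f_s/f_k exceeds a 5 s signal, so the window is clamped to the signal length as a practical concession.

```python
import numpy as np

def naive_cqt(x, fs=22050, fmin=1.0, bins_per_octave=12, n_bins=128, hop=512):
    """Direct evaluation of X_CQT(k, n) as defined above (illustrative sketch)."""
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)             # constant-Q factor
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)  # f_k = f_min * 2^(k/b)
    n_frames = 1 + len(x) // hop                                 # 216 frames for 5 s at 22050 Hz
    xp = np.pad(x.astype(float), (0, (n_frames - 1) * hop))      # room for the longest window
    X = np.zeros((n_bins, n_frames), dtype=complex)
    for k, fk in enumerate(freqs):
        Nk = min(int(np.ceil(Q * fs / fk)), len(x))              # clamp very long low-frequency windows
        m = np.arange(Nk)
        kern = np.hamming(Nk) * np.exp(-2j * np.pi * Q * m / Nk) / Nk
        for n in range(n_frames):
            X[k, n] = np.dot(xp[n * hop:n * hop + Nk], kern)
    return X

# Example: a 5 s signal yields a (128, 216) complex matrix.
# X = naive_cqt(np.random.randn(5 * 22050))
```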
The invention further provides that the extraction and enhancement of the Mel cepstrum features comprises:
A2, setting the frame length to 2048, with the overlap between adjacent frames equal to 75% of the frame length;
B2, applying to each frame a Hanning window whose size equals the frame length;
C2, filtering noise using the short-time Fourier transform;
D2, obtaining the power spectrum as the sum of squares, the power spectrum mainly containing the two-dimensional frequency-domain and time-domain information of the ship-radiated noise;
E2, filtering each frame of information through a bank of 128 Mel filters and taking the logarithm to obtain the Mel spectrogram;
F2, applying a discrete cosine transform to the Mel spectrogram after the logarithm has fitted it to human hearing, obtaining the Mel cepstrum; since the original underwater sound signal is a 5 s segment sampled at 22050 Hz, the Mel cepstrum has shape 128 × 216;
G2, adding delta and double-delta features on the basis of the Mel cepstrum features to extract the delta Mel cepstrum and double-delta Mel cepstrum features, and converting the Mel cepstrum, delta Mel cepstrum, and double-delta Mel cepstrum features into spectrogram images with the hop length, bins per octave, and tuning set to 512, 12, and 0 respectively, each image having a preset size of 3 × 640 × 480 (a sketch of these steps follows).
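As a hedged illustration of steps A2-G2 (not the authors' exact code), the pipeline can be reproduced with librosa; the file name is a placeholder, and the rendering to 3 × 640 × 480 images is left to the fusion step described next.

```python
import librosa

# Illustrative sketch of steps A2-G2; "ship.wav" is a placeholder file name.
y, sr = librosa.load("ship.wav", sr=22050, duration=5.0)
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,  # frame length 2048
                                   hop_length=512,          # 75% overlap
                                   window="hann",           # Hanning window per frame
                                   n_mels=128)              # 128 Mel filters
log_S = librosa.power_to_db(S)                              # logarithm fits human hearing
mfcc = librosa.feature.mfcc(S=log_S, n_mfcc=128)            # DCT -> Mel cepstrum, shape (128, 216)
d_mfcc = librosa.feature.delta(mfcc)                        # delta Mel cepstrum
dd_mfcc = librosa.feature.delta(mfcc, order=2)              # double-delta Mel cepstrum
```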
The invention further provides that the enhancement and fusion of the constant-Q transform features and Mel cepstrum features comprises (see the sketch after this list):
A3, summing the three channel-dimension values of each of the constant-Q transform, delta Mel cepstrum, and double-delta Mel cepstrum spectrograms to form 640 × 480 feature maps;
B3, mapping each feature map onto the pixel-value range 0-255;
C3, stacking the constant-Q transform, delta Mel cepstrum, and double-delta Mel cepstrum from top to bottom along the channel dimension to form a fused feature of size 3 × 640 × 480.
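A minimal sketch of steps A3-C3 follows, assuming each input spectrogram has already been rendered as a 3-channel image array; OpenCV is an assumed dependency used only for resizing, and the (3, 480, 640) axis order is one possible reading of the 3 × 640 × 480 size stated above.

```python
import numpy as np
import cv2  # assumed dependency, used only for resizing

def collapse(img):
    """A3: sum the three channels of a rendered spectrogram into one 2-D map."""
    return img.sum(axis=0) if img.ndim == 3 else img

def to_uint8(m):
    """B3: map a feature map onto the 0-255 pixel range."""
    m = (m - m.min()) / (m.max() - m.min() + 1e-12)
    return (255.0 * m).astype(np.uint8)

def fuse_features(cqt_img, d_mfcc_img, dd_mfcc_img, w=640, h=480):
    """C3: stack the CQT, delta and double-delta maps along the channel dimension."""
    chans = [cv2.resize(to_uint8(collapse(m)), (w, h))
             for m in (cqt_img, d_mfcc_img, dd_mfcc_img)]
    return np.stack(chans, axis=0)  # shape (3, 480, 640)
```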
The invention further provides that the operation flow of the multi-gradient flow global feature enhancement network specifically comprises (a skeleton sketch follows this list):
A4, inputting the fused features into the multi-gradient flow global feature enhancement network;
B4, rapidly extracting the effective information in the feature map with several ordinary convolution layers while reducing the feature-map size;
C4, through the multi-gradient flow multi-head attention module, using a serial structure of several multi-head attention residual modules to acquire gradient-flow information while attending to target foreground information and ignoring background information;
D4, organizing and further extracting the effective feature information from C4 with a convolution layer;
E4, repeating C4 and D4 several times in sequence, enhancing the model's feature-extraction capability through stacking;
F4, through the context feature enhancement module, using dilated convolutions with different dilation rates to acquire underwater sound target context information over different receptive fields, so that the multi-level abstract features better represent the underwater sound target characteristics;
G4, normalizing the feature map through convolution and pooling operations;
H4, obtaining the class probability of each target category through the fully connected layer and selecting the largest as the recognition result.
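The flow A4-H4 can be summarized by the following PyTorch skeleton. Channel widths, strides, and block counts are assumptions for illustration; MGFAttentionBlock (C4) and ContextEnhance (F4) are hypothetical names whose sketches appear after their own module descriptions below.

```python
import torch.nn as nn

class MGFNet(nn.Module):
    """Illustrative skeleton of flow A4-H4; widths, strides and depths are assumed."""
    def __init__(self, n_classes=5):
        super().__init__()
        def down(ci, co):                      # B4: ordinary conv halving the map size
            return nn.Sequential(nn.Conv2d(ci, co, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(co), nn.ReLU())
        self.stem = nn.Sequential(down(3, 32), down(32, 64), down(64, 64), down(64, 64))
        self.stages = nn.Sequential(
            MGFAttentionBlock(64),             # C4: multi-gradient-flow attention
            down(64, 128),                     # D4: organizing/extracting convolution
            MGFAttentionBlock(128),            # E4: C4 + D4 repeated
            down(128, 128))
        self.context = ContextEnhance(128)     # F4: context feature enhancement
        self.head = nn.Sequential(             # G4 + H4: pool, flatten, classify
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, n_classes))

    def forward(self, x):                      # A4: x is the fused 3-channel feature
        return self.head(self.context(self.stages(self.stem(x))))
```

For a 3 × 640 × 480 fused input this yields scores for the 5 target classes, the largest of which is taken as the recognition result (H4).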
The invention further provides that the operation flow of the multi-gradient flow multi-head attention module comprises (a sketch follows the formulas below):
A5, inputting the preceding feature map and normalizing its shape through a convolution with kernel size 1;
B5, inputting the result into several serially connected multi-head attention residual modules to obtain gradient-flow information over different receptive fields;
C5, combining the feature map output by each multi-head attention residual module with the original preceding feature map normalized by the kernel-size-1 convolution to form a new feature map;
D5, modifying the shape of the feature map output by C5 through a convolution with kernel size 1, which facilitates the stacking of multi-gradient flow multi-head attention modules;
The multi-head attention calculation formula is:

MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h) W^O

where Q, K, and V denote the query, key, and value vectors respectively, h denotes the number of heads, head_i denotes the output of the i-th head, and W^O is the output transformation matrix;
the output of each head is expressed as:

head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)

where W_i^Q, W_i^K, and W_i^V are the query, key, and value transformation matrices of the i-th head, and Attention is the self-attention function:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

where d_k, the dimension of the key vectors, is used for normalization; softmax turns the similarities into a weight for each key vector, the weights are multiplied by the value vectors, and the weighted sum gives the attention output.
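The formulas above correspond to PyTorch's nn.MultiheadAttention. The following sketch realizes steps A5-D5 under stated assumptions: each spatial position is treated as a token, the head count (4) and serial block count (3) are illustrative, and the class names match the skeleton given earlier.

```python
import torch
import torch.nn as nn

class MHAResidual(nn.Module):
    """One multi-head attention residual module (B5/C5); a hedged sketch."""
    def __init__(self, ch, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU())

    def forward(self, x):
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)        # (B, H*W, C): positions as tokens
        a, _ = self.attn(t, t, t)               # softmax(QK^T / sqrt(d_k)) V per head
        a = a.transpose(1, 2).reshape(b, c, h, w)
        return x + self.conv(a)                 # residual connection

class MGFAttentionBlock(nn.Module):
    """Multi-gradient-flow multi-head attention module, steps A5-D5."""
    def __init__(self, ch, n_blocks=3):
        super().__init__()
        self.pre = nn.Conv2d(ch, ch, 1)         # A5: 1x1 conv normalizes the input
        self.blocks = nn.ModuleList(MHAResidual(ch) for _ in range(n_blocks))
        self.post = nn.Conv2d(ch * (n_blocks + 1), ch, 1)  # D5: 1x1 conv fixes the shape

    def forward(self, x):
        y = self.pre(x)
        flows, cur = [y], y
        for blk in self.blocks:                 # B5: serial residual modules
            cur = blk(cur)
            flows.append(cur)                   # C5: keep each gradient flow
        return self.post(torch.cat(flows, dim=1))
```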
The invention further provides that the operation flow of the context feature enhancement module comprises (a combined sketch appears after the cascade operation module below):
A6, splitting the pre-extracted effective feature information of the underwater acoustic signal, covering targets of all directions and sizes, into parallel dilated-convolution branches with dilation rates 1, 3, and 5, obtaining effective feature information over different receptive fields;
B6, enhancing the target feature information through the adaptive feature enhancement module and the cascade operation module respectively;
C6, performing weighted fusion of the effective features obtained from the adaptive feature enhancement module and the cascade operation module.
The invention further provides that the operation flow of the adaptive feature enhancement module comprises:
A7, compressing each preceding feature map into a single-channel feature map with a convolution of kernel size 1;
B7, splicing the feature maps along the channel dimension in order of increasing dilation rate and obtaining the weight of each channel through softmax;
C7, multiplying the softmax weights by the feature map of each branch to complete the channel-wise feature enhancement.
The invention further provides that the operation flow of the cascade operation module comprises:
A8, splicing the feature maps obtained through the dilated convolutions of different rates to form a new feature map.
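Combining flows A6-C6, A7-C7, and A8, the following PyTorch sketch is one possible reading of the context feature enhancement module; the learnable two-way fusion weights for C6 and the 1x1 projection after the cascade concatenation are assumptions, not stated in the text.

```python
import torch
import torch.nn as nn

class ContextEnhance(nn.Module):
    """Context feature enhancement module (A6-C6) with the adaptive
    enhancement (A7-C7) and cascade (A8) branches; a hedged sketch."""
    def __init__(self, ch, rates=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(                    # A6: dilated convs, rates 1/3/5
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates)
        self.squeeze = nn.ModuleList(                     # A7: 1x1 conv -> single channel
            nn.Conv2d(ch, 1, 1) for _ in rates)
        self.cascade = nn.Conv2d(ch * len(rates), ch, 1)  # A8: concat, then project back
        self.alpha = nn.Parameter(torch.zeros(2))         # C6: fusion weights (assumed form)

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        # B7/C7: softmax across branches weights each receptive field
        logits = torch.cat([s(f) for s, f in zip(self.squeeze, feats)], dim=1)
        wgt = torch.softmax(logits, dim=1)                # (B, 3, H, W)
        adaptive = sum(wgt[:, i:i + 1] * f for i, f in enumerate(feats))
        cascade = self.cascade(torch.cat(feats, dim=1))   # A8: cascade branch
        w = torch.softmax(self.alpha, dim=0)              # C6: weighted fusion
        return w[0] * adaptive + w[1] * cascade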
The invention further provides that the error between the recognition result in H4 and the true value is calculated through a dynamic joint loss function.
The invention further provides that the dynamic joint loss function calculation formula is:

L = -log(P_t) + ε_1 (1 - P_t)

where P_t is the model's predicted probability for the target's true class and ε_1 ∈ [-1, +∞). This dynamic combination alleviates the data imbalance between categories in the underwater acoustic target dataset, whereas using the cross-entropy loss or the focal loss alone tends to make the model overfit (a sketch follows below).
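Under the stated definition, a minimal PyTorch sketch of this loss is shown below; the default ε_1 value is illustrative and not taken from the text.

```python
import torch

def dynamic_joint_loss(logits, target, eps1=1.0):
    """L = -log(p_t) + eps1 * (1 - p_t), with eps1 in [-1, +inf).
    eps1 = 1.0 is an assumed default for illustration."""
    p_t = torch.softmax(logits, dim=1).gather(1, target.unsqueeze(1)).squeeze(1)
    return (-torch.log(p_t.clamp_min(1e-12)) + eps1 * (1.0 - p_t)).mean()

# Usage: loss = dynamic_joint_loss(model(x), y)
```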
The invention also discloses a small-sample underwater sound target recognition system based on a multi-gradient flow network. The system comprises a feature extraction and fusion module for extracting constant-Q transform features, extracting Mel cepstrum features, and fusing the constant-Q transform features with the Mel cepstrum features;
the system further comprises a multi-gradient flow global feature enhancement network, which comprises a multi-gradient flow multi-head attention module, a context feature enhancement module, and a dynamic joint loss function;
the multi-gradient flow multi-head attention module is used for rapidly extracting deep abstract features of the underwater sound target over different receptive fields through the gradient-flow and residual modules, and for enhancing part of the target feature information with a multi-head attention mechanism;
the context feature enhancement module is used for acquiring effective feature information over different receptive fields through branch processing and for adaptively fusing the features of the different receptive fields;
the dynamic joint loss function is used for calculating the error between the recognition result and the true value during the model training stage.
(III) Beneficial effects
The invention provides a small-sample underwater sound target identification method and system based on a multi-gradient flow network, with the following beneficial effects:
(1) The feature extraction and fusion module is embedded at the front end of the multi-gradient flow global feature enhancement network, realizing end-to-end underwater sound target identification and avoiding the problem of existing models in which feature extraction and the classifier, here the multi-gradient flow global feature enhancement network, operate independently of each other.
(2) The multi-dimensional feature enhancement module and the multi-gradient flow multi-head attention module rapidly acquire the effective information of the target while attending to underwater sound target features whose frequencies resemble the mixed-in ocean background noise, enhancing the context information of the target features and improving the recognition accuracy of the model. This avoids the problem of existing models retaining ocean-background-noise features similar to the target features when extracting deep features through stacked ordinary convolutions.
(3) Through the multi-gradient flow method, the model obtains the same receptive-field information with fewer parameters, meeting the dual requirements of high recognition speed and high accuracy in underwater sound target recognition.
(4) The dynamic joint loss function addresses the model overfitting caused, in existing models, by the small and unbalanced per-class sample sizes of the underwater acoustic datasets used for training.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of a model architecture according to the present invention;
FIG. 3 is a schematic diagram of a flow chart of the feature extraction and fusion module of the present invention;
FIG. 4 is a schematic diagram of a multi-gradient flow global feature enhancement network according to the present invention;
FIG. 5 is a schematic diagram of a multi-gradient-flow multi-head attention module according to the present invention;
FIG. 6 is a schematic diagram of a multi-head attention residual module according to the present invention;
FIG. 7 is a schematic diagram of a contextual feature enhancement module of the present invention;
FIG. 8 is a schematic diagram of object class classification in an embodiment of the invention;
FIG. 9 is a table of test results in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to fig. 1-9, the following technical solutions are provided in the embodiments of the present invention: a small sample underwater sound target identification method based on a multi-gradient flow network specifically comprises the following steps:
s1, data set preparation: using a small sample ship radiation noise data set shipsEar with unbalanced sample size of each target class, wherein the shipsEar data set comprises 90 kinds of ship radiation noise of 11 target classes and audio of natural background noise of marine environment, and combining the 11 kinds of ship radiation noise and marine background noise into 5 target classes;
s2, data preprocessing: for the accuracy of experimental effect, a shipsEar dataset was subjected to preprocessing operation, wherein the sampling rate and the data length were set to 22050Hz and 5s, respectively, and the dividing ratio of the training set, the validation set and the test set was 8:1:1, detailed object classification is shown in figure 8;
s3, model training: the divided signals are respectively used as a data input feature extraction and fusion module, and a three-dimensional fusion feature map of the audio signals is output;
The multi-gradient flow global feature enhancement network is built, its activation function is set to FReLU, and its loss-function and training parameters are set. An adaptive moment estimation (Adam) optimizer is used to suppress sample-noise interference, with the first-order momentum factor, second-order momentum factor, and fuzzy factor set to 0.9, 0.999, and 0.0000001 respectively. To accelerate training, the initial learning rate is set to 0.001, the learning rate during training is the initial learning rate scaled by a weight decay coefficient of 0.0005, and the batch_size is set to 32, for example as sketched below;
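A hedged sketch of this configuration with PyTorch's Adam optimizer follows; the stated weight decay coefficient is read here as Adam's weight_decay argument, and `model` reuses the MGFNet skeleton sketched earlier.

```python
import torch

model = MGFNet()  # the skeleton sketched earlier in this document
optimizer = torch.optim.Adam(model.parameters(),
                             lr=0.001,             # initial learning rate
                             betas=(0.9, 0.999),   # first- and second-order momentum factors
                             eps=1e-7,             # fuzzy (numerical-stability) factor
                             weight_decay=0.0005)  # weight attenuation coefficient
```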
All underwater sound feature maps of the training set and validation set are then used as input to train the built multi-gradient flow global feature enhancement network;
Experimental tests show that the model converges at 90 epochs, at which point the model weights of the multi-gradient flow global feature enhancement network are obtained; the weights are converted into the ONNX format and deployed on a development board together with the feature extraction and fusion module (a sketch of the conversion follows);
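The conversion step might look like the following sketch; the file name, opset version, and input axis order are assumptions.

```python
import torch

dummy = torch.randn(1, 3, 640, 480)             # one fused feature map (assumed axis order)
torch.onnx.export(model, dummy, "mgfnet.onnx",  # "mgfnet.onnx" is a placeholder name
                  opset_version=13,
                  input_names=["features"], output_names=["class_scores"])
```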
s4, model deployment and target identification: transmitting the underwater sound audio signals of the test set in the shipsEar to a feature extraction and fusion module in a frame unit;
the characteristic extraction and fusion module receives the underwater sound signal and then processes a three-dimensional fusion characteristic diagram of the output audio signal;
and then automatically transmitting the feature map to a deployed multi-gradient flow global feature enhancement network to output the recognition result of the underwater sound target.
FIG. 9 shows the recognition results on the ShipsEar test set for the existing underwater sound target recognition models ResNet and EfficientNet and for the small-sample underwater sound target identification method based on the multi-gradient flow network provided by the invention.
Experiments show that, on the ShipsEar dataset, the multi-gradient flow global feature enhancement network greatly improves recognition accuracy and greatly reduces inference time compared with multiple versions of ResNet and EfficientNet; the network has a simple structure and few parameters, and can meet the end-to-end, high-accuracy, low-latency requirements of the underwater sound target identification field.

Claims (10)

1. A small-sample underwater sound target identification method based on a multi-gradient flow network, characterized by specifically comprising the following steps:
S1, data acquisition, feature extraction and fusion: dividing the underwater sound audio signals of each target class in a pre-acquired underwater sound target dataset into segments of a preset equal duration, then inputting each segment into a feature extraction and fusion module and outputting a multi-dimensional feature map of the audio signal, and finally dividing the multi-dimensional feature maps of each target class into a training set, a validation set, and a test set according to a preset ratio;
S2, model building and training: building a multi-gradient flow global feature enhancement network, setting its activation function, loss-function parameters, and training parameters, inputting all multi-dimensional feature maps of the training set and validation set obtained through feature extraction and fusion in S1 into the network, training the network, and acquiring its model weights when the network converges;
S3, model deployment: deploying the feature extraction and fusion module of S1 and the model weights acquired in S2 on a development board, and acquiring the deployed feature extraction and fusion module and multi-gradient flow global feature enhancement network;
S4, real-time data acquisition and processing: collecting the radiated noise of an underwater target with a hydrophone, converting it into an underwater sound audio signal, dividing the real-time signal into segments of the preset equal duration, transmitting it frame by frame to the feature extraction and fusion module deployed in S3, and processing the received signal to output a multi-dimensional feature map of the audio signal;
S5, underwater sound target identification: transmitting the multi-dimensional feature map of S4 to the multi-gradient flow global feature enhancement network deployed in S3 and outputting the recognition result for the underwater sound target.
2. The small-sample underwater sound target identification method based on a multi-gradient flow network according to claim 1, characterized in that: the feature extraction and fusion module comprises constant-Q transform feature extraction, Mel cepstrum feature extraction and enhancement, and the enhancement and fusion of the constant-Q transform features and Mel cepstrum features;
the extraction of the constant-Q transform features comprises:
A1, setting the frame length to 2048, with the overlap between adjacent frames equal to 75% of the frame length;
B1, applying to each frame a Hanning window whose size equals the frame length;
C1, performing the constant-Q transform, where the constant-Q transform of a finite-length sequence x(n) is:

X_CQT(k, n) = (1/N_k) * sum_{m=0}^{N_k - 1} x_n(m) * w_{N_k}(m) * e^{-j 2 pi Q m / N_k}

where x_n(m) is the m-th sample of the n-th frame, w_{N_k}(m) is a Hamming window of length N_k, Q is the constant factor of the constant-Q transform, K is the number of constant-Q frequency bins, and the value of N_k depends on the value of k;

Q = 1 / (2^{1/b} - 1)

where b is the number of frequency bins per octave;

f_k = f_min × 2^{k/b}, k = 0, 1, ..., K-1

In the feature extraction process, f_min = 1 and f_s = 22050; the constant-Q transform information is stored in the matrix X_CQT(k, n), and since the original underwater sound signal is a 5 s segment sampled at 22050 Hz, the constant-Q transform has shape 128 × 216.
3. The small-sample underwater sound target identification method based on a multi-gradient flow network according to claim 2, characterized in that: the extraction and enhancement of the Mel cepstrum features comprises:
A2, setting the frame length to 2048, with the overlap between adjacent frames equal to 75% of the frame length;
B2, applying to each frame a Hanning window whose size equals the frame length;
C2, filtering noise using the short-time Fourier transform;
D2, obtaining the power spectrum as the sum of squares, the power spectrum mainly containing the two-dimensional frequency-domain and time-domain information of the ship-radiated noise;
E2, filtering each frame of information through a bank of 128 Mel filters and taking the logarithm to obtain the Mel spectrogram;
F2, applying a discrete cosine transform to the Mel spectrogram after the logarithm has fitted it to human hearing, obtaining the Mel cepstrum; since the original underwater sound signal is a 5 s segment sampled at 22050 Hz, the Mel cepstrum has shape 128 × 216;
G2, adding delta and double-delta features on the basis of the Mel cepstrum features to extract the delta Mel cepstrum and double-delta Mel cepstrum features, converting the Mel cepstrum, delta Mel cepstrum, and double-delta Mel cepstrum features into spectrogram images with the hop length, bins per octave, and tuning set to 512, 12, and 0 respectively, and performing feature enhancement in the energy domain, each image having a preset size of 3 × 640 × 480.
4. The small-sample underwater sound target identification method based on a multi-gradient flow network according to claim 2, characterized in that: the enhancement and fusion of the constant-Q transform features and Mel cepstrum features comprises:
A3, summing the three channel-dimension values of each of the constant-Q transform, delta Mel cepstrum, and double-delta Mel cepstrum spectrograms to form 640 × 480 feature maps;
B3, mapping each feature map onto the pixel-value range 0-255;
C3, stacking the constant-Q transform, delta Mel cepstrum, and double-delta Mel cepstrum from top to bottom along the channel dimension to form a fused feature of size 3 × 640 × 480.
5. The small-sample underwater sound target identification method based on a multi-gradient flow network according to claim 4, characterized in that: the operation flow of the multi-gradient flow global feature enhancement network specifically comprises:
A4, inputting the fused features into the multi-gradient flow global feature enhancement network;
B4, rapidly extracting the effective information in the feature map with several ordinary convolution layers;
C4, through the multi-gradient flow multi-head attention module, using a serial structure of several multi-head attention residual modules to acquire gradient-flow information while attending to target foreground information and ignoring background information;
D4, organizing and further extracting the effective feature information from C4 with a convolution layer;
E4, repeating C4 and D4 several times in sequence;
F4, through the context feature enhancement module, using dilated convolutions with different dilation rates to acquire rich physical, channel, and context information and to enhance the feature weights of different channels;
G4, normalizing the feature map through convolution and pooling operations;
H4, obtaining the class probability of each target category through the fully connected layer and selecting the largest as the recognition result.
6. The small-sample underwater sound target identification method based on a multi-gradient flow network according to claim 5, characterized in that: the operation flow of the multi-gradient flow multi-head attention module comprises:
A5, inputting the preceding feature map and normalizing its shape through a convolution with kernel size 1;
B5, inputting the result into several serially connected multi-head attention residual modules to obtain gradient-flow information over different receptive fields;
C5, combining the feature map output by each multi-head attention residual module with the original preceding feature map normalized by the kernel-size-1 convolution to form a new feature map;
D5, modifying the shape of the feature map output by C5 through a convolution with kernel size 1;
the multi-head attention calculation formula is:

MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h) W^O

where Q, K, and V denote the query, key, and value vectors respectively, h denotes the number of heads, head_i denotes the output of the i-th head, and W^O is the output transformation matrix;
the output head_i of each head is expressed as:

head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)

where W_i^Q, W_i^K, and W_i^V are the query, key, and value transformation matrices of the i-th head, and Attention is the self-attention function:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

where d_k, the dimension of the key vectors, is used for normalization; softmax turns the similarities into a weight for each key vector, the weights are multiplied by the value vectors, and the weighted sum gives the attention output.
7. The small-sample underwater sound target identification method based on a multi-gradient flow network according to claim 6, characterized in that: the operation flow of the context feature enhancement module comprises:
A6, splitting the pre-extracted effective feature information of the underwater acoustic signal, covering targets of all directions and sizes, into parallel dilated-convolution branches with dilation rates 1, 3, and 5, obtaining effective feature information over different receptive fields;
B6, enhancing the target feature information through the adaptive feature enhancement module and the cascade operation module respectively;
C6, finally performing weighted fusion of the effective features obtained from the adaptive feature enhancement module and the cascade operation module.
8. The small-sample underwater sound target identification method based on a multi-gradient flow network according to claim 7, characterized in that: the operation flow of the adaptive feature enhancement module comprises:
A7, compressing each preceding feature map into a single-channel feature map with a convolution of kernel size 1;
B7, splicing the feature maps along the channel dimension in order of increasing dilation rate and obtaining the weight of each channel through softmax;
C7, multiplying the softmax weights by the feature map of each branch to complete the channel-wise feature enhancement.
9. The small-sample underwater sound target identification method based on a multi-gradient flow network according to claim 7, characterized in that: the operation flow of the cascade operation module comprises:
A8, splicing the feature maps obtained through the dilated convolutions of different rates to form a new feature map;
the error between the recognition result in H4 and the true value is calculated through a dynamic joint loss function;
the dynamic joint loss function calculation formula is:

L = -log(P_t) + ε_1 (1 - P_t)

where P_t is the model's predicted probability for the target's true class and ε_1 ∈ [-1, +∞).
10. A small-sample underwater sound target recognition system based on a multi-gradient flow network, characterized in that: the system comprises a feature extraction and fusion module for extracting constant-Q transform features, extracting Mel cepstrum features, and fusing the constant-Q transform features with the Mel cepstrum features;
the system further comprises a multi-gradient flow global feature enhancement network, which comprises a multi-gradient flow multi-head attention module, a context feature enhancement module, and a dynamic joint loss function;
the multi-gradient flow multi-head attention module is used for extracting deep abstract features of the underwater sound target over different receptive fields through the gradient-flow and residual modules, and for enhancing part of the target feature information with a multi-head attention mechanism;
the context feature enhancement module is used for acquiring effective feature information over different receptive fields through branch processing and for adaptively fusing the features of the different receptive fields;
the dynamic joint loss function is used for calculating the error between the recognition result and the true value during the model training stage.
CN202311062301.2A, priority date 2023-08-22, filed 2023-08-22: Small sample underwater sound target identification method and system based on multi-gradient flow network. Status: Pending. Published as CN117275510A (en).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202311062301.2A | 2023-08-22 | 2023-08-22 | Small sample underwater sound target identification method and system based on multi-gradient flow network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202311062301.2A | 2023-08-22 | 2023-08-22 | Small sample underwater sound target identification method and system based on multi-gradient flow network

Publications (1)

Publication Number | Publication Date
CN117275510A | 2023-12-22

Family

ID=89199812

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202311062301.2A | Small sample underwater sound target identification method and system based on multi-gradient flow network | 2023-08-22 | 2023-08-22

Country Status (1)

Country | Link
CN | CN117275510A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117727307A (en) * 2024-02-18 2024-03-19 百鸟数据科技(北京)有限责任公司 Bird voice intelligent recognition method based on feature fusion
CN117727307B (en) * 2024-02-18 2024-04-16 百鸟数据科技(北京)有限责任公司 Bird voice intelligent recognition method based on feature fusion


Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination