CN113111786A - Underwater target identification method based on a small-sample-trained graph convolutional network - Google Patents

Underwater target identification method based on a small-sample-trained graph convolutional network

Info

Publication number
CN113111786A
CN113111786A
Authority
CN
China
Prior art keywords
sample
feature
point
intercepted
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110403699.6A
Other languages
Chinese (zh)
Other versions
CN113111786B (en)
Inventor
吴金建
莫周
石光明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110403699.6A priority Critical patent/CN113111786B/en
Publication of CN113111786A publication Critical patent/CN113111786A/en
Application granted granted Critical
Publication of CN113111786B publication Critical patent/CN113111786B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/30Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses an underwater target identification method based on a graph convolutional network trained with small samples. It mainly solves two problems in the prior art: information is lost when features are extracted from underwater acoustic data, and a network cannot be effectively fitted to small-sample underwater acoustic data. The method comprises the following steps: (1) generating a small-sample training set; (2) extracting the features of each sample in the training set; (3) constructing a feature matrix set; (4) constructing a knowledge graph; (5) converting the knowledge graph into a connection matrix; (6) constructing a graph convolutional network; (7) training the graph convolutional network; (8) identifying the underwater target. By extracting multiple features, the characteristics of the underwater acoustic signal are fully represented, and the features are fused using the knowledge graph, which gives the network an optimization direction when fitting small-sample data. The method has the advantages that the network is not prone to overfitting and the accuracy is high.

Description

Underwater target identification method based on a small-sample-trained graph convolutional network
Technical Field
The invention belongs to the technical field of signal processing, and further relates to an underwater target identification method based on a graph convolutional network trained with small samples, within the technical field of acoustic target identification. Aimed at practical problems such as the difficulty of obtaining underwater acoustic data and the loss of information during feature extraction, the method identifies underwater targets from small samples by embedding knowledge through a graph network to enrich prior knowledge.
Background
At present, the main approach to underwater target classification and identification, both in China and abroad, is pattern recognition combining acoustic signal processing with a classifier. The commonly adopted pipeline comprises data acquisition with sonar, data preprocessing, feature extraction, and classification decision. The two most important steps are the choice of feature extraction and of the classification method. Feature extraction generally yields a single feature vector, or several feature vectors directly spliced into a new one; the classification models are mainly classical machine learning methods and mainstream deep learning methods, such as K-nearest neighbors, clustering, support vector machines, and deep neural networks (DNN). Although these methods obtain good classification results in underwater target identification tasks, the number of extracted features is small, so information may be lost after feature extraction; the features are spliced blindly and simply, without using prior knowledge to represent the relations between them; and a large amount of data is required to train the network when deep learning is used for classification. Classification accuracy is therefore low when underwater target identification is performed on small samples.
The patent document "A multi-feature fusion underwater target identification method" (application number 202010930201.7, publication number CN112183582A), filed by the Ocean University of China, discloses a method that identifies targets by splicing end to end the short-time energy features extracted in the time domain and the Gammatone frequency cepstral coefficients (GFCC) extracted in the frequency domain into a new feature vector. The specific steps are: standardize the collected underwater acoustic signals and map the result to [0,1]; extract the short-time energy features and GFCC features of the signals in the time domain and frequency domain respectively; splice the two end to end into a new fused feature vector; and perform classification prediction with an integrated time-series network model combining a convolutional neural network (CNN) with a long short-term memory network (LSTM). By simulating the auditory perception characteristics of the human ear, the method improves the classification accuracy of underwater target identification. However, it still has disadvantages: the number of extracted features is small and cannot comprehensively represent the various characteristics of the original sound signal; the information represented by the features is not independent, the features influence one another, and although representing the relations between features with existing prior knowledge would better embed that knowledge into the input data, the method directly splices and fuses the two features, so this part of the information is lost.
Wang Shenggui et al., in the paper "Underwater target recognition method research based on deep learning" (Ship Science and Technology, 2020, 42(23): 141-), propose an underwater target identification method based on deep learning. The specific steps are: use a deep convolutional neural network for adaptive feature extraction from the target's two-dimensional time-frequency spectrogram (LOFAR), then transform the features into class space with a fully connected layer, and finally realize intelligent identification of underwater targets with a softmax function. The method effectively reduces the influence of noise. However, it still has the disadvantage that, because the network comprises three convolutional layers and a fully connected layer, a large amount of training data is required, and the network cannot be effectively fitted when the number of samples is insufficient.
Disclosure of Invention
The invention aims to provide an underwater target identification method based on a graph convolutional network trained with small samples, addressing the defects of the prior art: few extracted signal features, inability to comprehensively represent the acoustic characteristics of an underwater target, and inability to fit the network effectively when samples are insufficient.
The idea for realizing the purpose of the invention is as follows. Six features are extracted from the underwater acoustic signal to comprehensively represent its pitch, timbre, loudness, regularity and depth characteristics, solving the problem in engineering applications that, after feature extraction, only information on one or two characteristics is available while information on the other classes of characteristics is lost. The invention constructs a knowledge graph according to the physical definition of each feature and uses the knowledge graph to aggregate what the input features have in common, thereby reducing the space in which the network searches for an optimal solution and solving the problem that the network cannot be fitted effectively when data samples are insufficient.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) generating a training set of small samples:
(1a) selecting at least 10 samples from each type of underwater acoustic signal, each sample corresponding to a class label; if a sample is a multi-channel signal, only the signal of the first channel is taken; and the sampling rates of all samples are uniformly converted to 16000 Hz;
(1b) selecting one value from 32000, 48000 and 60000 as the total number of points in the Hamming window;
(1c) performing windowing and framing processing on each sample with a window function, and forming a training set from all windowed and framed intercepted samples together with their labels;
(2) extracting the characteristics of each sample in the training set:
(2a) performing the same windowing and framing processing as in step (1c) on each intercepted sample to obtain the secondary intercepted samples cut from each intercepted sample; calculating the short-time spectral roll-off point, short-time spectral centroid, short-time energy, short-time zero crossing rate and short-time autocorrelation coefficient of each secondary intercepted sample using, respectively, the spectral roll-off point generation method, the spectral centroid calculation formula, the energy calculation formula, the zero crossing rate calculation formula and the autocorrelation coefficient calculation formula; and splicing each of these five quantities over the secondary intercepted samples of an intercepted sample to obtain, respectively, the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature of that intercepted sample;
(2b) inputting each intercepted sample obtained in the step (1c) into a VGGish network, and taking the output of the network as the undescribable semantic features of the intercepted sample;
(2c) respectively carrying out principal component analysis on the spectral roll-off point characteristic, the spectral centroid characteristic, the energy characteristic, the zero-crossing rate characteristic, the autocorrelation coefficient characteristic and the undescribable semantic characteristic of each sample, and reducing the dimensionality of each characteristic of the sample to 128;
(3) constructing a feature matrix set:
(3a) sequentially splicing the spectral roll-off point characteristic, the spectral centroid characteristic, the energy characteristic, the zero-crossing rate characteristic, the autocorrelation coefficient characteristic and the undescribable semantic characteristic of each underwater acoustic signal according to rows to form a characteristic matrix of the underwater acoustic signal;
(3b) combining the feature matrixes of all the underwater acoustic signals into a feature matrix set;
(4) constructing a knowledge graph:
according to the respective physical definitions of the features, dividing the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature after dimensionality reduction among four classes: pitch, timbre, loudness and regularity, and connecting each feature with the features of the same class; classifying the undescribable semantic feature into a depth feature class; connecting the undescribable semantic feature with every feature; the connected features form the knowledge graph;
(5) converting the knowledge-graph into a connection matrix:
(5a) respectively numbering the spectral roll-off point feature, the spectral centroid feature, the energy feature, the zero-crossing rate feature, the autocorrelation coefficient feature and the undescribable semantic feature after dimensionality reduction as 1,2,3,4,5 and 6;
(5b) initializing an all-zero connection matrix of dimension 6 × 6; according to the knowledge graph the spectral centroid feature is connected with the zero crossing rate feature, so the elements in row 2, column 4 and in row 4, column 2 of the connection matrix are set to 1; and since the undescribable semantic feature is connected with the other five features, all elements of row 6 and all elements of column 6 of the connection matrix are set to 1;
(5c) setting the diagonal elements of the connection matrix to 1;
(6) constructing a graph convolution network:
a four-layer graph convolutional network is constructed with the structure: first graph convolution layer, second graph convolution layer, first fully connected layer, second fully connected layer, the four network layers connected in sequence; the sizes of the feature mapping matrices of the first and second graph convolution layers are set to 128 × 100 and 100 × 64 respectively, and the numbers of feature mapping units of the first and second fully connected layers are 384 and 3 respectively;
(7) training the graph convolutional network:
inputting the feature matrix set and the connection matrix into the graph convolutional network and iteratively updating the two feature mapping matrices and the two feature mapping units in the network; training stops when the output of the loss function is less than 0.01 or the number of training iterations reaches 350, yielding the trained graph convolutional network;
(8) identifying underwater targets:
and (3) extracting six characteristic features of the sound signal of the underwater target to be recognized by adopting the same operation as the step (2), obtaining a characteristic matrix of the sound signal by adopting the same splicing operation as the step (3a), and inputting the characteristic matrix of the sound signal and the connection matrix obtained in the step (5) into a trained graph convolution network together to obtain a recognition result of the target sound signal.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention extracts the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature, autocorrelation coefficient feature and undescribable semantic feature from the underwater acoustic signal; these six features characterize, respectively, the pitch, timbre, loudness, regularity and depth characteristics of the signal, giving a comprehensive representation of the underwater acoustic signal. This overcomes the loss of feature information when extracting the characteristics of underwater acoustic signals and improves the accuracy of underwater acoustic identification.
Secondly, the invention constructs a knowledge graph according to the actual definition of each feature of the underwater acoustic signal and uses it to set the degree of association between features, so that what the features share is fully aggregated and where they differ is amplified. This enhances the network's discriminability and lets it fit better, so that a network that does not overfit easily can be trained even when samples are insufficient.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The implementation steps of the present invention are described in further detail below with reference to FIG. 1.
Step 1, generating a training set of small samples.
At least 10 samples are selected from each type of underwater acoustic signal, each sample corresponding to a class label. If a sample is a multi-channel signal, only the signal of the first channel is taken, and the sampling rates of all samples are uniformly converted to 16000 Hz.
Any one value from 32000, 48000 and 60000 is taken as the total number of points in the Hamming window.
Each sample is windowed and framed with a window function. This is done because the sound emitted by an underwater target is non-stationary over long durations but approximately stationary over short durations, so the signal must be windowed and framed to obtain short, stationary intercepted samples. All windowed and framed intercepted samples, together with their labels, form the training set.
The windowing and framing process is as follows.
Step 1: calculate the amplitude of each point in the Hamming window according to the following formula:

ω(n) = 0.54 − 0.46 cos(2πn/(M−1)), n = 0, 1, …, M−1

where ω(n) represents the amplitude of the nth point in the Hamming window, cos represents the cosine operation, π represents the circumference ratio, and M represents the total number of points in the Hamming window.
Step 2: take the first sampling point from the left of each sample as the starting point of that sample's intercepted sample, and set it as the current intercepted-sample starting point.

Step 3: starting from the current intercepted-sample starting point of each sample, take M consecutive sampling points, M being equal to the total number of points in the Hamming window; multiply each of the M sampling points in turn by the amplitude of the corresponding point in the Hamming window; form all the products into the intercepted sample of the current iteration; and set the label of the intercepted sample to be the same as the label of the corresponding sample.

Step 4: move the current intercepted-sample starting point of each sample to the right by γ sampling points, γ = 0.5M, to obtain the updated current intercepted-sample starting point.

Step 5: repeat steps 3 and 4 until fewer than M sampling points remain between the current intercepted-sample starting point and the last sampling point of each sample, at which point all windowed and framed intercepted samples have been obtained.
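For illustration, the windowing and framing procedure above can be sketched in Python as follows. This is a minimal sketch; the function and variable names are illustrative and not from the patent, and a random signal stands in for a real underwater sound sample:

```python
import numpy as np

def hamming_frames(signal, M, hop_ratio=0.5):
    """Split a 1-D signal into Hamming-windowed intercepted samples of M points.

    The hop between consecutive intercepted samples is gamma = 0.5 * M,
    matching step 4 above; framing stops when fewer than M points remain,
    matching step 5.
    """
    n = np.arange(M)
    window = 0.54 - 0.46 * np.cos(2 * np.pi * n / (M - 1))  # step 1: Hamming amplitudes
    hop = int(hop_ratio * M)
    frames = []
    start = 0                                # step 2: first sampling point from the left
    while start + M <= len(signal):          # step 5: stop when < M points remain
        frames.append(signal[start:start + M] * window)  # step 3: pointwise products
        start += hop                         # step 4: move right by gamma points
    return np.stack(frames) if frames else np.empty((0, M))

# Example: frame a 10-second sample at 16 kHz with a 32000-point window.
fs = 16000
sample = np.random.randn(10 * fs)            # stand-in for one underwater sound sample
frames = hamming_frames(sample, M=32000)
print(frames.shape)                          # (number of intercepted samples, 32000)
```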
Step 2: extract the features of each sample in the training set.
Each intercepted sample is windowed and framed again to obtain the secondary intercepted samples cut from it. The short-time spectral roll-off point, short-time spectral centroid, short-time energy, short-time zero crossing rate and short-time autocorrelation coefficient of each secondary intercepted sample are calculated using, respectively, the spectral roll-off point generation method, the spectral centroid calculation formula, the energy calculation formula, the zero crossing rate calculation formula and the autocorrelation coefficient calculation formula; each of these five quantities is then spliced over the secondary intercepted samples of an intercepted sample to obtain, respectively, the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature of that intercepted sample. Each intercepted sample is also input into the VGGish network, and the network's output is taken as the undescribable semantic feature of the intercepted sample.
The spectral roll-off point generation method is as follows.
Step 1: apply the following discrete Fourier transform to each secondary intercepted sample cut from each intercepted sample to obtain the frequency domain sequence of each secondary intercepted sample:

h_{i,q}(m) = Σ_{k=0}^{M1−1} x_{i,q}(k) e^{−j2πmk/M1}, m = 0, 1, …, M1−1

where h_{i,q}(m) represents the frequency value of the m-th frequency point in the frequency domain sequence corresponding to c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; m represents the sequence number of the frequency point in the frequency domain sequence; M1 represents the total number of sampling points contained in c_{i,q}; Σ represents the summation operation; k represents the sequence number of a sampling point in the secondary intercepted sample; x_{i,q}(k) represents the value of the k-th sampling point in c_{i,q}; e^(·) represents the exponential operation with the natural constant e as base; j represents the imaginary unit; and π represents the circumference ratio.
Step 2: accumulate the frequency values of the frequency points in each frequency domain sequence in order; stop accumulating when the accumulated value exceeds 85% of the sum over the whole frequency domain sequence, and take the total number of frequency points accumulated at that moment as the spectral roll-off point feature value of the frequency domain sequence.
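As a non-authoritative reference, the two steps of the spectral roll-off point generation method might be coded as below; using the magnitude of the DFT is an assumption, since the text speaks only of the "frequency value" of each frequency point:

```python
import numpy as np

def spectral_rolloff(frame, threshold=0.85):
    """Count the frequency points whose accumulated magnitude first exceeds
    `threshold` of the total over the whole frequency domain sequence (step 2)."""
    spectrum = np.abs(np.fft.rfft(frame))        # step 1: DFT, magnitude assumed
    cumulative = np.cumsum(spectrum)
    # First index where the running sum exceeds 85% of the total; +1 turns the
    # 0-based index into the total number of accumulated frequency points.
    return int(np.searchsorted(cumulative, threshold * cumulative[-1]) + 1)
```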
The spectral centroid generation method is as follows.
Step 1: apply to each secondary intercepted sample cut from each intercepted sample the same discrete Fourier transform operation as above, obtaining the frequency domain sequence of each secondary intercepted sample.
Step 2: generate the spectral centroid of each frequency domain sequence using the following formula:

B_{i,q} = [Σ_{w=1}^{L} w · h_{i,q}(w)] / [Σ_{w=1}^{L} h_{i,q}(w)]

where B_{i,q} represents the spectral centroid feature value of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; L represents the length of the frequency domain sequence; w represents the sequence number of the frequency point in the frequency domain sequence; and h_{i,q}(w) represents the frequency value of the w-th frequency point in the frequency domain sequence corresponding to c_{i,q}.
The energy calculation formula is as follows:

E_{i,q} = Σ_{k=1}^{M1} x_{i,q}(k)²

where E_{i,q} represents the short-time energy of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample.
The zero crossing rate calculation formula is as follows:

Z_{i,q} = (1/2) Σ_{k=1}^{M1−1} |sgn(x_{i,q}(k+1)) − sgn(x_{i,q}(k))|

where Z_{i,q} represents the short-time zero crossing rate of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; sgn(·) denotes the sign function; and x_{i,q}(k+1) denotes the value of the (k+1)-th sampling point in the q-th secondary intercepted sample cut from the i-th intercepted sample.
The autocorrelation coefficient calculation formula is as follows:

R_{i,q}(l) = Σ_{k=1}^{M1−l} (x_{i,q}(k) − x̄_{i,q})(x_{i,q}(k+l) − x̄_{i,q}) / Σ_{k=1}^{M1} (x_{i,q}(k) − x̄_{i,q})²

where R_{i,q}(l) represents the short-time autocorrelation coefficient of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample, at lag l; x̄_{i,q} represents the mean value of the secondary intercepted sample c_{i,q}; and x_{i,q}(k+l) denotes the value of the (k+l)-th sampling point in the q-th secondary intercepted sample cut from the i-th intercepted sample.
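The remaining four short-time features follow directly from the formulas above; a sketch under the same caveats (the lag l of the autocorrelation coefficient is left as a parameter, since the patent does not fix it):

```python
import numpy as np

def spectral_centroid(frame):
    """Magnitude-weighted mean frequency-point index (spectral centroid formula)."""
    spectrum = np.abs(np.fft.rfft(frame))
    w = np.arange(1, len(spectrum) + 1)
    return (w * spectrum).sum() / spectrum.sum()

def short_time_energy(frame):
    """Sum of squared sampling-point values (energy formula)."""
    return float(np.sum(frame ** 2))

def zero_crossing_rate(frame):
    """Half the summed absolute sign differences of successive samples (ZCR formula)."""
    return float(np.abs(np.diff(np.sign(frame))).sum() / 2)

def autocorrelation_coefficient(frame, lag):
    """Mean-removed, variance-normalized autocorrelation at lag `lag`
    (autocorrelation coefficient formula; the lag is a free parameter)."""
    x = frame - frame.mean()
    return float(np.sum(x[:-lag] * x[lag:]) / np.sum(x ** 2))
```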
Principal component analysis is then performed separately on the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature, autocorrelation coefficient feature and undescribable semantic feature of each sample, reducing the dimensionality of each feature of the sample to 128.
Step 3: construct a feature matrix set.
The spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature, autocorrelation coefficient feature and undescribable semantic feature of each underwater acoustic signal are spliced row by row, in that order, to form the feature matrix of the underwater acoustic signal.
The feature matrices of all the underwater acoustic signals are combined into a feature matrix set.
Step 4: construct a knowledge graph.
According to the respective physical definitions of the features, the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature after dimensionality reduction are divided among four classes: pitch, timbre, loudness and regularity; each feature is connected with the features of the same class, and features of different classes are left unconnected. The undescribable semantic feature is classified into a depth feature class and connected with every feature. The connected features form the knowledge graph.
The reason the spectral roll-off point feature is defined as a pitch-class feature is as follows: the spectral roll-off point characterizes the proportion of the low-frequency energy of the underwater acoustic signal to its total energy, i.e. the strength of its low-frequency energy; pitch represents how high or low the frequency of the underwater acoustic signal is, so the spectral roll-off point feature is classified as a pitch-class feature.
The reason the zero crossing rate feature and the spectral centroid feature are defined as timbre-class features is as follows: the zero crossing rate feature counts the zero crossings of the underwater acoustic signal and characterizes the rate of change of its waveform; the spectral centroid feature, the center of gravity of the signal's frequency components, is one of the important physical parameters describing the timbre attribute of the signal; and timbre means that different underwater acoustic signals always show distinctive characteristics in their waveforms. The zero crossing rate feature and spectral centroid feature are therefore classified as timbre-class features.
The reason the energy feature is defined as a loudness-class feature is as follows: the energy feature characterizes the strength of the underwater acoustic signal at different moments, and loudness is the physical quantity describing the strength of the signal, so the energy feature is classified as a loudness-class feature.
The reason the autocorrelation coefficient feature is defined as a regularity-class feature is as follows: the autocorrelation coefficient feature is the similarity between two observations of the underwater acoustic signal and reflects how regular the signal is, so it is classified as a regularity-class feature.
The reason the undescribable semantic feature is defined as a depth-class feature is as follows: the undescribable semantic feature is a high-dimensional feature extracted by the VGGish network and has no clear physical meaning, so it is classified as a depth-class feature.
Step 5: convert the knowledge graph into a connection matrix.
The spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature, autocorrelation coefficient feature and undescribable semantic feature after dimensionality reduction are numbered 1, 2, 3, 4, 5 and 6 respectively.
An all-zero connection matrix of dimension 6 × 6 is initialized. According to the knowledge graph the spectral centroid feature is connected with the zero crossing rate feature, so the elements in row 2, column 4 and in row 4, column 2 of the connection matrix are set to 1; the undescribable semantic feature is connected with the other five features, so all elements of row 6 and all elements of column 6 of the connection matrix are set to 1.
The diagonal elements of the connection matrix are set to 1.
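The resulting 6 × 6 connection matrix can be written down directly (feature numbering as in the step above, shifted to 0-based indices in the code):

```python
import numpy as np

# 0 spectral roll-off, 1 spectral centroid, 2 energy,
# 3 zero crossing rate, 4 autocorrelation, 5 undescribable semantic feature
A = np.zeros((6, 6))
A[1, 3] = A[3, 1] = 1      # spectral centroid <-> zero crossing rate (timbre class)
A[5, :] = 1                # semantic feature connects to every feature
A[:, 5] = 1
np.fill_diagonal(A, 1)     # self-connections on the diagonal
print(A)
```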
Step 6: construct a graph convolutional network.
A four-layer graph convolutional network is constructed with the following structure: the first graph convolution layer, the second graph convolution layer, the first fully connected layer and the second fully connected layer, connected in sequence. The sizes of the feature mapping matrices of the first and second graph convolution layers are set to 128 × 100 and 100 × 64 respectively, and the numbers of feature mapping units of the first and second fully connected layers are 384 and 3 respectively.
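One way to realize this four-layer network is sketched below in PyTorch. It is an assumption-laden sketch: the patent does not state the propagation rule, the normalization of the connection matrix, or the activation functions, so a plain X' = AXW graph convolution with ReLU activations is used here, and the 384/3 unit counts are wired as the sizes of the two fully connected layers:

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Plain graph convolution X' = A @ X @ W (no normalization of A; assumed)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(in_dim, out_dim))

    def forward(self, x, adj):
        return adj @ x @ self.weight

class UnderwaterGCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.gc1 = GraphConv(128, 100)     # first feature mapping matrix, 128 x 100
        self.gc2 = GraphConv(100, 64)      # second feature mapping matrix, 100 x 64
        self.fc1 = nn.Linear(6 * 64, 384)  # 6 feature nodes x 64 dims, flattened
        self.fc2 = nn.Linear(384, 3)       # 3 classes of underwater sound signals
        self.relu = nn.ReLU()

    def forward(self, x, adj):
        # x: (6, 128) feature matrix of one signal; adj: (6, 6) connection matrix
        h = self.relu(self.gc1(x, adj))
        h = self.relu(self.gc2(h, adj))
        return self.fc2(self.relu(self.fc1(h.flatten())))
```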
Step 7: train the graph convolutional network.
The feature matrix set and the connection matrix are input into the graph convolutional network, and the two feature mapping matrices and two feature mapping units in the network are updated iteratively. Training stops when the output of the loss function is less than 0.01 or the number of training iterations reaches 350, yielding the trained graph convolutional network.
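A matching training-loop sketch with the stopping rule of this step (the optimizer and cross-entropy loss are assumptions; the patent names neither):

```python
import torch
import torch.nn as nn

def train(model, feature_matrices, labels, adj, max_iters=350, tol=0.01):
    """Stop when the loss falls below 0.01 or 350 iterations are reached."""
    optimizer = torch.optim.Adam(model.parameters())   # assumed optimizer
    criterion = nn.CrossEntropyLoss()                  # assumed loss function
    for _ in range(max_iters):
        optimizer.zero_grad()
        # One forward pass per signal in the feature matrix set.
        logits = torch.stack([model(x, adj) for x in feature_matrices])
        loss = criterion(logits, labels)               # labels: (N,) class indices
        loss.backward()
        optimizer.step()
        if loss.item() < tol:                          # early-stopping rule
            break
    return model
```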
Step 8: identify the underwater target.
The spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature, autocorrelation coefficient feature and undescribable semantic feature of the underwater acoustic signal to be identified are extracted and spliced row by row, in that order, into the feature matrix of the sound signal; the feature matrix and the connection matrix are then input together into the trained graph convolutional network to obtain the identification result for the target sound signal.
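Tying the sketches together, identification of a new signal would look roughly like this (stand-in tensors replace the real 6 × 128 feature matrix produced by steps 2-3; `UnderwaterGCN` and `A` come from the sketches above):

```python
import torch
import numpy as np

model = UnderwaterGCN()                       # assumed already trained (step 7)
adj = torch.tensor(A, dtype=torch.float32)    # connection matrix from step 5
feats = torch.randn(6, 128)                   # stand-in for the real feature matrix
with torch.no_grad():
    class_index = model(feats, adj).argmax().item()
print(class_index)                            # index of the recognized target class
```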
The effect of the present invention is further explained below in combination with a simulation experiment.
1. Simulation experiment conditions:
the hardware platform of the simulation experiment of the invention is as follows: the processor is an Intel i 79750H CPU, the main frequency is 2.60GHz, and the memory is 16 GB.
The software platform of the simulation experiment is: Windows 10 operating system and Python 3.6.
2. Simulation content and result analysis:
the simulation experiment of the invention is to classify the input underwater sound signals respectively by adopting the invention and a prior art (SVM classification method).
The underwater acoustic data used in the simulation experiment are three types of simulated underwater acoustic data generated by an underwater acoustic signal simulator: warship sound, civilian ship sound and submarine sound.
The effects of the present invention are further described below with reference to Table 1.

The results of classifying the three types of underwater acoustic signal data with the two methods are evaluated with two indices: the classification accuracy of each class and the overall accuracy OA. Both are calculated with the following formulas, and all results are listed in Table 1:

OA = (number of correctly classified samples / total number of samples) × 100%

accuracy of class c = (number of correctly classified samples of class c / total number of samples of class c) × 100%
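Expressed in code, the two evaluation indices are simply (a trivial sketch over predicted and true label arrays):

```python
import numpy as np

def overall_accuracy(pred, truth):
    """OA: correctly classified samples over all samples, in percent."""
    return 100.0 * np.mean(pred == truth)

def class_accuracy(pred, truth, cls):
    """Per-class accuracy: correct samples of one class over that class's total."""
    mask = truth == cls
    return 100.0 * np.mean(pred[mask] == truth[mask])
```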
TABLE 1 quantitative analysis table of classification results of the present invention and the prior art in simulation experiments
[Table 1 appears as an image in the original publication; it lists the per-class classification accuracies and the overall accuracy OA of the proposed method and of the SVM method.]
As can be seen from Table 1, the overall classification accuracy OA of the proposed method is 94.0%, higher than that of the SVM method, and its classification accuracy for each class is also higher than that of the SVM method, proving that the proposed method obtains higher classification accuracy on underwater acoustic signals.
The above simulation experiment shows that the proposed method can comprehensively express the detailed information in the underwater acoustic signal with the multiple extracted features, and that the knowledge graph designed from the feature definitions aggregates what the features share and amplifies where they differ, making the information they express more prominent. The network can thus learn the key feature information better and has a more correct optimization direction, so it is less prone to overfitting, which benefits the final classification.

Claims (7)

1. An underwater target identification method based on a small-sample-trained graph convolutional network, characterized in that six features of the sound emitted by an underwater target are extracted and a knowledge graph is constructed according to the respective physical definitions of the features; the method comprises the following specific steps:
(1) generating a training set of small samples:
(1a) selecting at least 10 samples from each type of underwater acoustic signal, each sample corresponding to a class label; if a sample is a multi-channel signal, only the signal of the first channel is taken; and the sampling rates of all samples are uniformly converted to 16000 Hz;
(1b) selecting one value from 32000, 48000 and 60000 as the total number of points in the Hamming window;
(1c) performing windowing and framing processing on each sample with a window function, and forming a training set from all windowed and framed intercepted samples together with their labels;
(2) extracting the characteristics of each sample in the training set:
(2a) performing the same windowing and framing processing as in step (1c) on each intercepted sample to obtain the secondary intercepted samples cut from each intercepted sample; calculating the short-time spectral roll-off point, short-time spectral centroid, short-time energy, short-time zero crossing rate and short-time autocorrelation coefficient of each secondary intercepted sample using, respectively, the spectral roll-off point generation method, the spectral centroid calculation formula, the energy calculation formula, the zero crossing rate calculation formula and the autocorrelation coefficient calculation formula; and splicing each of these five quantities over the secondary intercepted samples of an intercepted sample to obtain, respectively, the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature of that intercepted sample;
(2b) inputting each intercepted sample obtained in the step (1c) into a VGGish network, and taking the output of the network as the undescribable semantic features of the intercepted sample;
(2c) respectively carrying out principal component analysis on the spectral roll-off point characteristic, the spectral centroid characteristic, the energy characteristic, the zero-crossing rate characteristic, the autocorrelation coefficient characteristic and the undescribable semantic characteristic of each sample, and reducing the dimensionality of each characteristic of the sample to 128;
(3) constructing a feature matrix set:
(3a) sequentially splicing the spectral roll-off point characteristic, the spectral centroid characteristic, the energy characteristic, the zero-crossing rate characteristic, the autocorrelation coefficient characteristic and the undescribable semantic characteristic of each underwater acoustic signal according to rows to form a characteristic matrix of the underwater acoustic signal;
(3b) combining the feature matrixes of all the underwater acoustic signals into a feature matrix set;
(4) constructing a knowledge graph:
according to the respective physical definitions of the features, dividing the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature after dimensionality reduction among four classes: pitch, timbre, loudness and regularity, and connecting each feature with the features of the same class; classifying the undescribable semantic feature into a depth feature class; connecting the undescribable semantic feature with every feature; the connected features form the knowledge graph;
(5) converting the knowledge-graph into a connection matrix:
(5a) respectively numbering the spectral roll-off point feature, the spectral centroid feature, the energy feature, the zero-crossing rate feature, the autocorrelation coefficient feature and the undescribable semantic feature after dimensionality reduction as 1,2,3,4,5 and 6;
(5b) initializing an all-zero connection matrix of dimension 6 × 6; according to the knowledge graph the spectral centroid feature is connected with the zero crossing rate feature, so the elements in row 2, column 4 and in row 4, column 2 of the connection matrix are set to 1; and since the undescribable semantic feature is connected with the other five features, all elements of row 6 and all elements of column 6 of the connection matrix are set to 1;
(5c) setting the diagonal elements of the connection matrix to 1;
(6) constructing a graph convolution network:
a four-layer graph convolutional network is constructed with the structure: first graph convolution layer, second graph convolution layer, first fully connected layer, second fully connected layer, the four network layers connected in sequence; the sizes of the feature mapping matrices of the first and second graph convolution layers are set to 128 × 100 and 100 × 64 respectively, and the numbers of feature mapping units of the first and second fully connected layers are 384 and 3 respectively;
(7) training the graph convolutional network:
inputting the feature matrix set and the connection matrix into the graph convolutional network and iteratively updating the two feature mapping matrices and the two feature mapping units in the network; training stops when the output of the loss function is less than 0.01 or the number of training iterations reaches 350, yielding the trained graph convolutional network;
(8) identifying underwater targets:
and (3) extracting six characteristics of the sound signal of the underwater target to be identified by adopting the same operation as the step (2), obtaining a characteristic matrix of the sound signal by adopting the same splicing operation as the step (3a), and inputting the characteristic matrix of the sound signal and the connection matrix obtained in the step (5) into a trained graph convolution network together to obtain an identification result of the target sound signal.
2. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 1, wherein the windowing and framing processing in step (1c) comprises the following steps:
first, calculating the amplitude of each point in the Hamming window according to the following formula:

ω(n) = 0.54 − 0.46 cos(2πn/(M−1)), n = 0, 1, …, M−1

where ω(n) represents the amplitude of the nth point in the Hamming window, cos represents the cosine operation, π represents the circumference ratio, and M represents the total number of points in the Hamming window;
setting the first sampling point from the left in each sample as the initial point of the intercepted sample of the sample, and setting the initial point of the intercepted sample as the current initial point of the intercepted sample;
thirdly, starting from the current intercepted-sample starting point of each sample, taking M consecutive sampling points, M being equal to the total number of points in the Hamming window, multiplying each of the M sampling points in turn by the amplitude of the corresponding point in the Hamming window, forming all the products into the intercepted sample of the current iteration, and setting the label of the intercepted sample to be the same as the label of the corresponding sample;
fourthly, moving the current intercepted-sample starting point of each sample to the right by γ sampling points, γ = 0.5M, to obtain the updated current intercepted-sample starting point;
fifthly, repeating the third and fourth steps until fewer than M sampling points remain between the current intercepted-sample starting point and the last sampling point of each sample, thereby obtaining all intercepted samples after windowing and framing.
3. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 1, wherein the spectral roll-off point generation method in step (2a) is as follows:
in the first step, the following discrete Fourier transform is applied to each secondary intercepted sample cut from each intercepted sample to obtain the frequency domain sequence of each secondary intercepted sample:

h_{i,q}(m) = Σ_{k=0}^{M1−1} x_{i,q}(k) e^{−j2πmk/M1}, m = 0, 1, …, M1−1

where h_{i,q}(m) represents the frequency value of the m-th frequency point in the frequency domain sequence corresponding to c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; m represents the sequence number of the frequency point in the frequency domain sequence; M1 represents the total number of sampling points contained in c_{i,q}; Σ represents the summation operation; k represents the sequence number of a sampling point in the secondary intercepted sample; x_{i,q}(k) represents the value of the k-th sampling point in c_{i,q}; e^(·) represents the exponential operation with the natural constant e as base; j represents the imaginary unit; and π represents the circumference ratio;

in the second step, the frequency values of the frequency points in each frequency domain sequence are accumulated in order; accumulation stops when the accumulated value exceeds 85% of the sum over the whole frequency domain sequence, and the total number of frequency points accumulated at that moment is taken as the spectral roll-off point feature value of the frequency domain sequence.
4. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 1, wherein the spectral centroid generation method in step (2a) is as follows:
in the first step, each secondary intercepted sample cut from each intercepted sample is subjected to the same discrete Fourier transform operation as the first step of claim 3, obtaining the frequency domain sequence of each secondary intercepted sample;

in the second step, the spectral centroid of each frequency domain sequence is generated using the following formula:

B_{i,q} = [Σ_{w=1}^{L} w · h_{i,q}(w)] / [Σ_{w=1}^{L} h_{i,q}(w)]

where B_{i,q} represents the spectral centroid feature value of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; L represents the length of the frequency domain sequence; w represents the sequence number of the frequency point in the frequency domain sequence; and h_{i,q}(w) represents the frequency value of the w-th frequency point in the frequency domain sequence corresponding to c_{i,q}.
5. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 3, wherein the energy calculation formula in step (2a) is as follows:

E_{i,q} = Σ_{k=1}^{M1} x_{i,q}(k)²

where E_{i,q} represents the short-time energy of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample.
6. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 3, wherein the zero crossing rate calculation formula in step (2a) is as follows:

Z_{i,q} = (1/2) Σ_{k=1}^{M1−1} |sgn(x_{i,q}(k+1)) − sgn(x_{i,q}(k))|

where Z_{i,q} represents the short-time zero crossing rate of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; sgn(·) denotes the sign function; and x_{i,q}(k+1) denotes the value of the (k+1)-th sampling point in the q-th secondary intercepted sample cut from the i-th intercepted sample.
7. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 3, wherein the autocorrelation coefficient calculation formula in step (2a) is as follows:

R_{i,q}(l) = Σ_{k=1}^{M1−l} (x_{i,q}(k) − x̄_{i,q})(x_{i,q}(k+l) − x̄_{i,q}) / Σ_{k=1}^{M1} (x_{i,q}(k) − x̄_{i,q})²

where R_{i,q}(l) represents the short-time autocorrelation coefficient of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample, at lag l; x̄_{i,q} represents the mean value of the secondary intercepted sample c_{i,q}; and x_{i,q}(k+l) denotes the value of the (k+l)-th sampling point in the q-th secondary intercepted sample cut from the i-th intercepted sample.
CN202110403699.6A 2021-04-15 2021-04-15 Underwater target identification method based on small sample training graph convolutional network Active CN113111786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110403699.6A CN113111786B (en) 2021-04-15 2021-04-15 Underwater target identification method based on small sample training graph convolutional network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110403699.6A CN113111786B (en) 2021-04-15 2021-04-15 Underwater target identification method based on small sample training graph convolutional network

Publications (2)

Publication Number Publication Date
CN113111786A true CN113111786A (en) 2021-07-13
CN113111786B CN113111786B (en) 2024-02-09

Family

ID=76717057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110403699.6A Active CN113111786B (en) Underwater target identification method based on small sample training graph convolutional network

Country Status (1)

Country Link
CN (1) CN113111786B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019079972A1 (en) * 2017-10-24 2019-05-02 深圳和而泰智能控制股份有限公司 Specific sound recognition method and apparatus, and storage medium
WO2021042503A1 (en) * 2019-09-06 2021-03-11 平安科技(深圳)有限公司 Information classification extraction method, apparatus, computer device and storage medium
CN112364779A (en) * 2020-11-12 2021-02-12 中国电子科技集团公司第五十四研究所 Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN112488241A (en) * 2020-12-18 2021-03-12 贵州大学 Zero sample picture identification method based on multi-granularity fusion network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANG Tongtong; CHENG Jinyong; LU Wenpeng: "Target recognition based on multi-layer feature extraction of convolutional neural networks", Computer Systems & Applications, no. 12 *
BAO Kaifang; GU Junzhong; YANG Jing: "Knowledge graph completion method based on joint representation of structure and text", Computer Engineering, no. 07 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569773A (en) * 2021-08-02 2021-10-29 南京信息工程大学 Interference signal identification method based on knowledge graph and Softmax regression
CN113569773B (en) * 2021-08-02 2023-09-15 南京信息工程大学 Interference signal identification method based on knowledge graph and Softmax regression
CN116108353A (en) * 2023-04-12 2023-05-12 厦门大学 Small sample deep learning underwater sound target recognition method based on data packet

Also Published As

Publication number Publication date
CN113111786B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
Kong et al. Weakly labelled audioset tagging with attention neural networks
CN108053836B (en) Audio automatic labeling method based on deep learning
CN112364779B (en) Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
Su et al. Performance analysis of multiple aggregated acoustic features for environment sound classification
CN105488466B (en) A kind of deep-neural-network and Acoustic Object vocal print feature extracting method
CN109815892A (en) The signal recognition method of distributed fiber grating sensing network based on CNN
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN102419974A (en) Sparse representation features for speech recognition
KR20200104019A (en) Machine learning based voice data analysis method, device and program
CN113111786B (en) Underwater target identification method based on small sample training diagram convolutional network
CN112750442B (en) Crested mill population ecological system monitoring system with wavelet transformation and method thereof
CN115762536A (en) Small sample optimization bird sound recognition method based on bridge transform
Yang et al. Classification of odontocete echolocation clicks using convolutional neural network
Soliman et al. Isolated word speech recognition using convolutional neural network
CN107731235A (en) Sperm whale and the cry pulse characteristicses extraction of long fin navigator whale and sorting technique and device
CN112052880A (en) Underwater sound target identification method based on weight updating support vector machine
Li et al. Audio recognition of Chinese traditional instruments based on machine learning
Hu et al. A lightweight multi-sensory field-based dual-feature fusion residual network for bird song recognition
CN113095381B (en) Underwater sound target identification method and system based on improved DBN
Wu et al. Audio-based expansion learning for aerial target recognition
RAMO et al. A novel approach to spoken Arabic number recognition based on developed ant lion algorithm
Chun et al. Research on music classification based on MFCC and BP neural network
Joshi et al. Comparative study of Mfcc and Mel spectrogram for raga classification using CNN
CN116153337B (en) Synthetic voice tracing evidence obtaining method and device, electronic equipment and storage medium
Chiu et al. A micro-control device of soundscape collection for mixed frog call recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant