CN113111786A - Underwater target identification method based on a small-sample-trained graph convolutional network - Google Patents

Underwater target identification method based on a small-sample-trained graph convolutional network

Info

Publication number
CN113111786A
CN113111786A
Authority
CN
China
Prior art keywords
sample
feature
point
intercepted
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110403699.6A
Other languages
Chinese (zh)
Other versions
CN113111786B (en)
Inventor
吴金建
莫周
石光明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202110403699.6A priority Critical patent/CN113111786B/en
Publication of CN113111786A publication Critical patent/CN113111786A/en
Application granted granted Critical
Publication of CN113111786B publication Critical patent/CN113111786B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation
    • G06N5/022Knowledge engineering; Knowledge acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/30Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses an underwater target identification method based on a graph convolutional network trained with small samples. It mainly solves two problems in the prior art: information is lost when features are extracted from underwater acoustic data, and a network cannot be effectively fitted to small-sample underwater acoustic data. The method comprises the following steps: (1) generating a small-sample training set; (2) extracting the features of each sample in the training set; (3) constructing a feature matrix set; (4) constructing a knowledge graph; (5) converting the knowledge graph into a connection matrix; (6) constructing a graph convolutional network; (7) training the graph convolutional network; (8) identifying the underwater target. By extracting multiple features, the characteristics of the underwater acoustic signal are fully represented, and the features are fused using the knowledge graph, which gives the network an optimization direction when fitting small-sample data. The method has the advantages that the network is not prone to overfitting and the accuracy is high.

Description

Underwater target identification method based on a small-sample-trained graph convolutional network
Technical Field
The invention belongs to the technical field of signal processing, and further relates to an underwater target identification method based on a graph convolutional network trained with small samples, within the technical field of acoustic target identification. Aimed at practical problems such as the difficulty of obtaining underwater acoustic data and the loss of information during feature extraction, the method identifies underwater targets from small samples by embedding knowledge through a graph network to enrich prior knowledge.
Background
At present, the main approach to underwater target classification and identification, both in China and abroad, is pattern recognition combining acoustic signal processing with a classifier. The commonly adopted pipeline comprises data acquisition with sonar, data preprocessing, feature extraction, and classification decision. The two most important steps are the choice of feature extraction and of the classification method. Feature extraction generally yields a single feature vector, or several feature vectors directly spliced into a new one; the classification models are mainly classical machine learning methods and mainstream deep learning methods, such as K-nearest neighbors, clustering, support vector machines, and deep neural networks (DNN). Although these methods obtain good classification results in underwater target identification tasks, the number of extracted features is small, so information may be lost after feature extraction; the features are spliced blindly and simply, without using prior knowledge to represent the relations between them; and a large amount of data is required to train the network when deep learning is used for classification. Classification accuracy is therefore low when underwater target identification is performed on small samples.
The patent document "A multi-feature fusion underwater target identification method" (application number 202010930201.7, publication number CN112183582A), filed by the Ocean University of China, discloses a method that identifies targets by splicing end to end the short-time energy features extracted in the time domain and the Gammatone frequency cepstral coefficients (GFCC) extracted in the frequency domain into a new feature vector. The specific steps are: standardize the collected underwater acoustic signals and map the result to [0,1]; extract the short-time energy features and GFCC features of the signals in the time domain and frequency domain respectively; splice the two end to end into a new fused feature vector; and perform classification prediction with an integrated time-series network model combining a convolutional neural network (CNN) with a long short-term memory network (LSTM). By simulating the auditory perception characteristics of the human ear, the method improves the classification accuracy of underwater target identification. However, it still has disadvantages: the number of extracted features is small and cannot comprehensively represent the various characteristics of the original sound signal; the information represented by the features is not independent, the features influence one another, and although representing the relations between features with existing prior knowledge would better embed that knowledge into the input data, the method directly splices and fuses the two features, so this part of the information is lost.
Wang Shenggui et al., in the paper "Underwater target recognition method research based on deep learning" (Ship Science and Technology, 2020, 42(23): 141-), propose an underwater target identification method based on deep learning. The specific steps are: use a deep convolutional neural network for adaptive feature extraction from the target's two-dimensional time-frequency spectrogram (LOFAR), then transform the features into class space with a fully connected layer, and finally realize intelligent identification of underwater targets with a softmax function. The method effectively reduces the influence of noise. However, it still has the disadvantage that, because the network comprises three convolutional layers and a fully connected layer, a large amount of training data is required, and the network cannot be effectively fitted when the number of samples is insufficient.
Disclosure of Invention
The invention aims to provide an underwater target identification method based on a graph convolutional network trained with small samples, addressing the defects of the prior art: few extracted signal features, inability to comprehensively represent the acoustic characteristics of an underwater target, and inability to fit the network effectively when samples are insufficient.
The idea for realizing the purpose of the invention is as follows. Six features are extracted from the underwater acoustic signal to comprehensively represent its pitch, timbre, loudness, regularity and depth characteristics, solving the problem in engineering applications that, after feature extraction, only information on one or two characteristics is available while information on the other classes of characteristics is lost. The invention constructs a knowledge graph according to the physical definition of each feature and uses the knowledge graph to aggregate what the input features have in common, thereby reducing the space in which the network searches for an optimal solution and solving the problem that the network cannot be fitted effectively when data samples are insufficient.
In order to achieve the purpose, the technical scheme adopted by the invention comprises the following steps:
(1) generating a training set of small samples:
(1a) selecting at least 10 samples from each type of underwater acoustic signal, each sample corresponding to a class label; if a sample is a multi-channel signal, only the signal of the first channel is taken; and the sampling rates of all samples are uniformly converted to 16000 Hz;
(1b) selecting one value from 32000, 48000 and 60000 as the total number of points in the Hamming window;
(1c) performing windowing and framing processing on each sample with a window function, and forming a training set from all windowed and framed intercepted samples together with their labels;
(2) extracting the characteristics of each sample in the training set:
(2a) performing the same windowing and framing processing as in step (1c) on each intercepted sample to obtain the secondary intercepted samples cut from each intercepted sample; calculating the short-time spectral roll-off point, short-time spectral centroid, short-time energy, short-time zero crossing rate and short-time autocorrelation coefficient of each secondary intercepted sample using, respectively, the spectral roll-off point generation method, the spectral centroid calculation formula, the energy calculation formula, the zero crossing rate calculation formula and the autocorrelation coefficient calculation formula; and splicing each of these five quantities over the secondary intercepted samples of an intercepted sample to obtain, respectively, the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature of that intercepted sample;
(2b) inputting each intercepted sample obtained in the step (1c) into a VGGish network, and taking the output of the network as the undescribable semantic features of the intercepted sample;
(2c) respectively carrying out principal component analysis on the spectral roll-off point characteristic, the spectral centroid characteristic, the energy characteristic, the zero-crossing rate characteristic, the autocorrelation coefficient characteristic and the undescribable semantic characteristic of each sample, and reducing the dimensionality of each characteristic of the sample to 128;
(3) constructing a feature matrix set:
(3a) sequentially splicing the spectral roll-off point characteristic, the spectral centroid characteristic, the energy characteristic, the zero-crossing rate characteristic, the autocorrelation coefficient characteristic and the undescribable semantic characteristic of each underwater acoustic signal according to rows to form a characteristic matrix of the underwater acoustic signal;
(3b) combining the feature matrixes of all the underwater acoustic signals into a feature matrix set;
(4) constructing a knowledge graph:
according to the respective physical definitions of the features, dividing the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature after dimensionality reduction among four classes: pitch, timbre, loudness and regularity, and connecting each feature with the features of the same class; classifying the undescribable semantic feature into a depth feature class; connecting the undescribable semantic feature with every feature; the connected features form the knowledge graph;
(5) converting the knowledge-graph into a connection matrix:
(5a) respectively numbering the spectral roll-off point feature, the spectral centroid feature, the energy feature, the zero-crossing rate feature, the autocorrelation coefficient feature and the undescribable semantic feature after dimensionality reduction as 1,2,3,4,5 and 6;
(5b) initializing an all-zero connection matrix of dimension 6 × 6; according to the knowledge graph the spectral centroid feature is connected with the zero crossing rate feature, so the elements in row 2, column 4 and in row 4, column 2 of the connection matrix are set to 1; and since the undescribable semantic feature is connected with the other five features, all elements of row 6 and all elements of column 6 of the connection matrix are set to 1;
(5c) setting the diagonal elements of the connection matrix to 1;
(6) constructing a graph convolution network:
a four-layer graph convolutional network is constructed with the structure: first graph convolution layer, second graph convolution layer, first fully connected layer, second fully connected layer, the four network layers connected in sequence; the sizes of the feature mapping matrices of the first and second graph convolution layers are set to 128 × 100 and 100 × 64 respectively, and the numbers of feature mapping units of the first and second fully connected layers are 384 and 3 respectively;
(7) training the graph convolutional network:
inputting the feature matrix set and the connection matrix into the graph convolutional network and iteratively updating the two feature mapping matrices and the two feature mapping units in the network; training stops when the output of the loss function is less than 0.01 or the number of training iterations reaches 350, yielding the trained graph convolutional network;
(8) identifying underwater targets:
and (3) extracting six characteristic features of the sound signal of the underwater target to be recognized by adopting the same operation as the step (2), obtaining a characteristic matrix of the sound signal by adopting the same splicing operation as the step (3a), and inputting the characteristic matrix of the sound signal and the connection matrix obtained in the step (5) into a trained graph convolution network together to obtain a recognition result of the target sound signal.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention extracts the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature, autocorrelation coefficient feature and undescribable semantic feature from the underwater acoustic signal; these six features characterize, respectively, the pitch, timbre, loudness, regularity and depth characteristics of the signal, giving a comprehensive representation of the underwater acoustic signal. This overcomes the loss of feature information when extracting the characteristics of underwater acoustic signals and improves the accuracy of underwater acoustic identification.
Secondly, the invention constructs a knowledge graph according to the actual definition of each feature of the underwater acoustic signal and uses it to set the degree of association between features, so that what the features share is fully aggregated and where they differ is amplified. This enhances the network's discriminability and lets it fit better, so that a network that does not overfit easily can be trained even when samples are insufficient.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The implementation steps of the present invention are described in further detail below with reference to FIG. 1.
Step 1, generating a training set of small samples.
At least 10 samples are selected from each type of underwater acoustic signal, each sample corresponding to a class label. If a sample is a multi-channel signal, only the signal of the first channel is taken, and the sampling rates of all samples are uniformly converted to 16000 Hz.
Any one value from 32000, 48000 and 60000 is taken as the total number of points in the Hamming window.
Each sample is windowed and framed with a window function. This is done because the sound emitted by an underwater target is non-stationary over long durations but approximately stationary over short durations, so the signal must be windowed and framed to obtain short, stationary intercepted samples. All windowed and framed intercepted samples, together with their labels, form the training set.
The windowing and framing process is as follows.
Step 1: calculate the amplitude of each point in the Hamming window according to the following formula:

ω(n) = 0.54 − 0.46 cos(2πn/(M−1)), n = 0, 1, …, M−1

where ω(n) represents the amplitude of the nth point in the Hamming window, cos represents the cosine operation, π represents the circumference ratio, and M represents the total number of points in the Hamming window.
Step 2: take the first sampling point from the left of each sample as the starting point of that sample's intercepted sample, and set it as the current intercepted-sample starting point.

Step 3: starting from the current intercepted-sample starting point of each sample, take M consecutive sampling points, M being equal to the total number of points in the Hamming window; multiply each of the M sampling points in turn by the amplitude of the corresponding point in the Hamming window; form all the products into the intercepted sample of the current iteration; and set the label of the intercepted sample to be the same as the label of the corresponding sample.

Step 4: move the current intercepted-sample starting point of each sample to the right by γ sampling points, γ = 0.5M, to obtain the updated current intercepted-sample starting point.

Step 5: repeat steps 3 and 4 until fewer than M sampling points remain between the current intercepted-sample starting point and the last sampling point of each sample, at which point all windowed and framed intercepted samples have been obtained.
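For illustration, the windowing and framing procedure above can be sketched in Python as follows. This is a minimal sketch; the function and variable names are illustrative and not from the patent, and a random signal stands in for a real underwater sound sample:

```python
import numpy as np

def hamming_frames(signal, M, hop_ratio=0.5):
    """Split a 1-D signal into Hamming-windowed intercepted samples of M points.

    The hop between consecutive intercepted samples is gamma = 0.5 * M,
    matching step 4 above; framing stops when fewer than M points remain,
    matching step 5.
    """
    n = np.arange(M)
    window = 0.54 - 0.46 * np.cos(2 * np.pi * n / (M - 1))  # step 1: Hamming amplitudes
    hop = int(hop_ratio * M)
    frames = []
    start = 0                                # step 2: first sampling point from the left
    while start + M <= len(signal):          # step 5: stop when < M points remain
        frames.append(signal[start:start + M] * window)  # step 3: pointwise products
        start += hop                         # step 4: move right by gamma points
    return np.stack(frames) if frames else np.empty((0, M))

# Example: frame a 10-second sample at 16 kHz with a 32000-point window.
fs = 16000
sample = np.random.randn(10 * fs)            # stand-in for one underwater sound sample
frames = hamming_frames(sample, M=32000)
print(frames.shape)                          # (number of intercepted samples, 32000)
```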
Step 2: extract the features of each sample in the training set.
Each intercepted sample is windowed and framed again to obtain the secondary intercepted samples cut from it. The short-time spectral roll-off point, short-time spectral centroid, short-time energy, short-time zero crossing rate and short-time autocorrelation coefficient of each secondary intercepted sample are calculated using, respectively, the spectral roll-off point generation method, the spectral centroid calculation formula, the energy calculation formula, the zero crossing rate calculation formula and the autocorrelation coefficient calculation formula; each of these five quantities is then spliced over the secondary intercepted samples of an intercepted sample to obtain, respectively, the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature of that intercepted sample. Each intercepted sample is also input into the VGGish network, and the network's output is taken as the undescribable semantic feature of the intercepted sample.
The spectral roll-off point generation method is as follows.
Step 1: apply the following discrete Fourier transform to each secondary intercepted sample cut from each intercepted sample to obtain the frequency domain sequence of each secondary intercepted sample:

h_{i,q}(m) = Σ_{k=0}^{M1−1} x_{i,q}(k) e^{−j2πmk/M1}, m = 0, 1, …, M1−1

where h_{i,q}(m) represents the frequency value of the m-th frequency point in the frequency domain sequence corresponding to c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; m represents the sequence number of the frequency point in the frequency domain sequence; M1 represents the total number of sampling points contained in c_{i,q}; Σ represents the summation operation; k represents the sequence number of a sampling point in the secondary intercepted sample; x_{i,q}(k) represents the value of the k-th sampling point in c_{i,q}; e^(·) represents the exponential operation with the natural constant e as base; j represents the imaginary unit; and π represents the circumference ratio.
Step 2: accumulate the frequency values of the frequency points in each frequency domain sequence in order; stop accumulating when the accumulated value exceeds 85% of the sum over the whole frequency domain sequence, and take the total number of frequency points accumulated at that moment as the spectral roll-off point feature value of the frequency domain sequence.
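As a non-authoritative reference, the two steps of the spectral roll-off point generation method might be coded as below; using the magnitude of the DFT is an assumption, since the text speaks only of the "frequency value" of each frequency point:

```python
import numpy as np

def spectral_rolloff(frame, threshold=0.85):
    """Count the frequency points whose accumulated magnitude first exceeds
    `threshold` of the total over the whole frequency domain sequence (step 2)."""
    spectrum = np.abs(np.fft.rfft(frame))        # step 1: DFT, magnitude assumed
    cumulative = np.cumsum(spectrum)
    # First index where the running sum exceeds 85% of the total; +1 turns the
    # 0-based index into the total number of accumulated frequency points.
    return int(np.searchsorted(cumulative, threshold * cumulative[-1]) + 1)
```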
The spectral centroid generation method is as follows.
Step 1: apply to each secondary intercepted sample cut from each intercepted sample the same discrete Fourier transform operation as above, obtaining the frequency domain sequence of each secondary intercepted sample.
Step 2: generate the spectral centroid of each frequency domain sequence using the following formula:

B_{i,q} = [Σ_{w=1}^{L} w · h_{i,q}(w)] / [Σ_{w=1}^{L} h_{i,q}(w)]

where B_{i,q} represents the spectral centroid feature value of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; L represents the length of the frequency domain sequence; w represents the sequence number of the frequency point in the frequency domain sequence; and h_{i,q}(w) represents the frequency value of the w-th frequency point in the frequency domain sequence corresponding to c_{i,q}.
The energy calculation formula is as follows:

E_{i,q} = Σ_{k=1}^{M1} x_{i,q}(k)²

where E_{i,q} represents the short-time energy of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample.
The zero crossing rate calculation formula is as follows:

Z_{i,q} = (1/2) Σ_{k=1}^{M1−1} |sgn(x_{i,q}(k+1)) − sgn(x_{i,q}(k))|

where Z_{i,q} represents the short-time zero crossing rate of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; sgn(·) denotes the sign function; and x_{i,q}(k+1) denotes the value of the (k+1)-th sampling point in the q-th secondary intercepted sample cut from the i-th intercepted sample.
The autocorrelation coefficient calculation formula is as follows:

R_{i,q}(l) = Σ_{k=1}^{M1−l} (x_{i,q}(k) − x̄_{i,q})(x_{i,q}(k+l) − x̄_{i,q}) / Σ_{k=1}^{M1} (x_{i,q}(k) − x̄_{i,q})²

where R_{i,q}(l) represents the short-time autocorrelation coefficient of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample, at lag l; x̄_{i,q} represents the mean value of the secondary intercepted sample c_{i,q}; and x_{i,q}(k+l) denotes the value of the (k+l)-th sampling point in the q-th secondary intercepted sample cut from the i-th intercepted sample.
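The remaining four short-time features follow directly from the formulas above; a sketch under the same caveats (the lag l of the autocorrelation coefficient is left as a parameter, since the patent does not fix it):

```python
import numpy as np

def spectral_centroid(frame):
    """Magnitude-weighted mean frequency-point index (spectral centroid formula)."""
    spectrum = np.abs(np.fft.rfft(frame))
    w = np.arange(1, len(spectrum) + 1)
    return (w * spectrum).sum() / spectrum.sum()

def short_time_energy(frame):
    """Sum of squared sampling-point values (energy formula)."""
    return float(np.sum(frame ** 2))

def zero_crossing_rate(frame):
    """Half the summed absolute sign differences of successive samples (ZCR formula)."""
    return float(np.abs(np.diff(np.sign(frame))).sum() / 2)

def autocorrelation_coefficient(frame, lag):
    """Mean-removed, variance-normalized autocorrelation at lag `lag`
    (autocorrelation coefficient formula; the lag is a free parameter)."""
    x = frame - frame.mean()
    return float(np.sum(x[:-lag] * x[lag:]) / np.sum(x ** 2))
```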
Principal component analysis is then performed separately on the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature, autocorrelation coefficient feature and undescribable semantic feature of each sample, reducing the dimensionality of each feature of the sample to 128.
Step 3: construct a feature matrix set.
The spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature, autocorrelation coefficient feature and undescribable semantic feature of each underwater acoustic signal are spliced row by row, in that order, to form the feature matrix of the underwater acoustic signal.
The feature matrices of all the underwater acoustic signals are combined into a feature matrix set.
Step 4: construct a knowledge graph.
According to the respective physical definitions of the features, the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature after dimensionality reduction are divided among four classes: pitch, timbre, loudness and regularity; each feature is connected with the features of the same class, and features of different classes are left unconnected. The undescribable semantic feature is classified into a depth feature class and connected with every feature. The connected features form the knowledge graph.
The reason the spectral roll-off point feature is defined as a pitch-class feature is as follows: the spectral roll-off point characterizes the proportion of the low-frequency energy of the underwater acoustic signal to its total energy, i.e. the strength of its low-frequency energy; pitch represents how high or low the frequency of the underwater acoustic signal is, so the spectral roll-off point feature is classified as a pitch-class feature.
The reason the zero crossing rate feature and the spectral centroid feature are defined as timbre-class features is as follows: the zero crossing rate feature counts the zero crossings of the underwater acoustic signal and characterizes the rate of change of its waveform; the spectral centroid feature, the center of gravity of the signal's frequency components, is one of the important physical parameters describing the timbre attribute of the signal; and timbre means that different underwater acoustic signals always show distinctive characteristics in their waveforms. The zero crossing rate feature and spectral centroid feature are therefore classified as timbre-class features.
The reason the energy feature is defined as a loudness-class feature is as follows: the energy feature characterizes the strength of the underwater acoustic signal at different moments, and loudness is the physical quantity describing the strength of the signal, so the energy feature is classified as a loudness-class feature.
The reason the autocorrelation coefficient feature is defined as a regularity-class feature is as follows: the autocorrelation coefficient feature is the similarity between two observations of the underwater acoustic signal and reflects how regular the signal is, so it is classified as a regularity-class feature.
The reason the undescribable semantic feature is defined as a depth-class feature is as follows: the undescribable semantic feature is a high-dimensional feature extracted by the VGGish network and has no clear physical meaning, so it is classified as a depth-class feature.
Step 5: convert the knowledge graph into a connection matrix.
The spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature, autocorrelation coefficient feature and undescribable semantic feature after dimensionality reduction are numbered 1, 2, 3, 4, 5 and 6 respectively.
An all-zero connection matrix of dimension 6 × 6 is initialized. According to the knowledge graph the spectral centroid feature is connected with the zero crossing rate feature, so the elements in row 2, column 4 and in row 4, column 2 of the connection matrix are set to 1; the undescribable semantic feature is connected with the other five features, so all elements of row 6 and all elements of column 6 of the connection matrix are set to 1.
The diagonal elements of the connection matrix are set to 1.
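The resulting 6 × 6 connection matrix can be written down directly (feature numbering as in the step above, shifted to 0-based indices in the code):

```python
import numpy as np

# 0 spectral roll-off, 1 spectral centroid, 2 energy,
# 3 zero crossing rate, 4 autocorrelation, 5 undescribable semantic feature
A = np.zeros((6, 6))
A[1, 3] = A[3, 1] = 1      # spectral centroid <-> zero crossing rate (timbre class)
A[5, :] = 1                # semantic feature connects to every feature
A[:, 5] = 1
np.fill_diagonal(A, 1)     # self-connections on the diagonal
print(A)
```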
Step 6: construct a graph convolutional network.
A four-layer graph convolutional network is constructed with the following structure: the first graph convolution layer, the second graph convolution layer, the first fully connected layer and the second fully connected layer, connected in sequence. The sizes of the feature mapping matrices of the first and second graph convolution layers are set to 128 × 100 and 100 × 64 respectively, and the numbers of feature mapping units of the first and second fully connected layers are 384 and 3 respectively.
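One way to realize this four-layer network is sketched below in PyTorch. It is an assumption-laden sketch: the patent does not state the propagation rule, the normalization of the connection matrix, or the activation functions, so a plain X' = AXW graph convolution with ReLU activations is used here, and the 384/3 unit counts are wired as the sizes of the two fully connected layers:

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Plain graph convolution X' = A @ X @ W (no normalization of A; assumed)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(in_dim, out_dim))

    def forward(self, x, adj):
        return adj @ x @ self.weight

class UnderwaterGCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.gc1 = GraphConv(128, 100)     # first feature mapping matrix, 128 x 100
        self.gc2 = GraphConv(100, 64)      # second feature mapping matrix, 100 x 64
        self.fc1 = nn.Linear(6 * 64, 384)  # 6 feature nodes x 64 dims, flattened
        self.fc2 = nn.Linear(384, 3)       # 3 classes of underwater sound signals
        self.relu = nn.ReLU()

    def forward(self, x, adj):
        # x: (6, 128) feature matrix of one signal; adj: (6, 6) connection matrix
        h = self.relu(self.gc1(x, adj))
        h = self.relu(self.gc2(h, adj))
        return self.fc2(self.relu(self.fc1(h.flatten())))
```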
Step 7: train the graph convolutional network.
The feature matrix set and the connection matrix are input into the graph convolutional network, and the two feature mapping matrices and two feature mapping units in the network are updated iteratively. Training stops when the output of the loss function is less than 0.01 or the number of training iterations reaches 350, yielding the trained graph convolutional network.
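A matching training-loop sketch with the stopping rule of this step (the optimizer and cross-entropy loss are assumptions; the patent names neither):

```python
import torch
import torch.nn as nn

def train(model, feature_matrices, labels, adj, max_iters=350, tol=0.01):
    """Stop when the loss falls below 0.01 or 350 iterations are reached."""
    optimizer = torch.optim.Adam(model.parameters())   # assumed optimizer
    criterion = nn.CrossEntropyLoss()                  # assumed loss function
    for _ in range(max_iters):
        optimizer.zero_grad()
        # One forward pass per signal in the feature matrix set.
        logits = torch.stack([model(x, adj) for x in feature_matrices])
        loss = criterion(logits, labels)               # labels: (N,) class indices
        loss.backward()
        optimizer.step()
        if loss.item() < tol:                          # early-stopping rule
            break
    return model
```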
Step 8: identify the underwater target.
The spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature, autocorrelation coefficient feature and undescribable semantic feature of the underwater acoustic signal to be identified are extracted and spliced row by row, in that order, into the feature matrix of the sound signal; the feature matrix and the connection matrix are then input together into the trained graph convolutional network to obtain the identification result for the target sound signal.
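Tying the sketches together, identification of a new signal would look roughly like this (stand-in tensors replace the real 6 × 128 feature matrix produced by steps 2-3; `UnderwaterGCN` and `A` come from the sketches above):

```python
import torch
import numpy as np

model = UnderwaterGCN()                       # assumed already trained (step 7)
adj = torch.tensor(A, dtype=torch.float32)    # connection matrix from step 5
feats = torch.randn(6, 128)                   # stand-in for the real feature matrix
with torch.no_grad():
    class_index = model(feats, adj).argmax().item()
print(class_index)                            # index of the recognized target class
```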
The effect of the present invention is further explained below in combination with a simulation experiment.
1. Simulation experiment conditions:
the hardware platform of the simulation experiment of the invention is as follows: the processor is an Intel i 79750H CPU, the main frequency is 2.60GHz, and the memory is 16 GB.
The software platform of the simulation experiment is: Windows 10 operating system and Python 3.6.
2. Simulation content and result analysis:
the simulation experiment of the invention is to classify the input underwater sound signals respectively by adopting the invention and a prior art (SVM classification method).
The underwater acoustic data used in the simulation experiment are three types of simulated underwater acoustic data generated by an underwater acoustic signal simulator: warship sound, civilian ship sound and submarine sound.
The effects of the present invention are further described below with reference to Table 1.

The results of classifying the three types of underwater acoustic signal data with the two methods are evaluated with two indices: the classification accuracy of each class and the overall accuracy OA. Both are calculated with the following formulas, and all results are listed in Table 1:

OA = (number of correctly classified samples / total number of samples) × 100%

accuracy of class c = (number of correctly classified samples of class c / total number of samples of class c) × 100%
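Expressed in code, the two evaluation indices are simply (a trivial sketch over predicted and true label arrays):

```python
import numpy as np

def overall_accuracy(pred, truth):
    """OA: correctly classified samples over all samples, in percent."""
    return 100.0 * np.mean(pred == truth)

def class_accuracy(pred, truth, cls):
    """Per-class accuracy: correct samples of one class over that class's total."""
    mask = truth == cls
    return 100.0 * np.mean(pred[mask] == truth[mask])
```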
TABLE 1 quantitative analysis table of classification results of the present invention and the prior art in simulation experiments
[Table 1 appears as an image in the original publication; it lists the per-class classification accuracies and the overall accuracy OA of the proposed method and of the SVM method.]
As can be seen from Table 1, the overall classification accuracy OA of the proposed method is 94.0%, higher than that of the SVM method, and its classification accuracy for each class is also higher than that of the SVM method, proving that the proposed method obtains higher classification accuracy on underwater acoustic signals.
The above simulation experiment shows that the proposed method can comprehensively express the detailed information in the underwater acoustic signal with the multiple extracted features, and that the knowledge graph designed from the feature definitions aggregates what the features share and amplifies where they differ, making the information they express more prominent. The network can thus learn the key feature information better and has a more correct optimization direction, so it is less prone to overfitting, which benefits the final classification.

Claims (7)

1. An underwater target identification method based on a small-sample-trained graph convolutional network, characterized in that six features of the sound emitted by an underwater target are extracted and a knowledge graph is constructed according to the respective physical definitions of the features; the method comprises the following specific steps:
(1) generating a training set of small samples:
(1a) selecting at least 10 samples from each type of underwater acoustic signal, each sample corresponding to a class label; if a sample is a multi-channel signal, only the signal of the first channel is taken; and the sampling rates of all samples are uniformly converted to 16000 Hz;
(1b) selecting one value from 32000, 48000 and 60000 as the total number of points in the Hamming window;
(1c) performing windowing and framing processing on each sample with a window function, and forming a training set from all windowed and framed intercepted samples together with their labels;
(2) extracting the characteristics of each sample in the training set:
(2a) performing the same windowing and framing processing as in step (1c) on each intercepted sample to obtain the secondary intercepted samples cut from each intercepted sample; calculating the short-time spectral roll-off point, short-time spectral centroid, short-time energy, short-time zero crossing rate and short-time autocorrelation coefficient of each secondary intercepted sample using, respectively, the spectral roll-off point generation method, the spectral centroid calculation formula, the energy calculation formula, the zero crossing rate calculation formula and the autocorrelation coefficient calculation formula; and splicing each of these five quantities over the secondary intercepted samples of an intercepted sample to obtain, respectively, the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature of that intercepted sample;
(2b) inputting each intercepted sample obtained in the step (1c) into a VGGish network, and taking the output of the network as the undescribable semantic features of the intercepted sample;
(2c) respectively carrying out principal component analysis on the spectral roll-off point characteristic, the spectral centroid characteristic, the energy characteristic, the zero-crossing rate characteristic, the autocorrelation coefficient characteristic and the undescribable semantic characteristic of each sample, and reducing the dimensionality of each characteristic of the sample to 128;
(3) constructing a feature matrix set:
(3a) sequentially splicing the spectral roll-off point characteristic, the spectral centroid characteristic, the energy characteristic, the zero-crossing rate characteristic, the autocorrelation coefficient characteristic and the undescribable semantic characteristic of each underwater acoustic signal according to rows to form a characteristic matrix of the underwater acoustic signal;
(3b) combining the feature matrixes of all the underwater acoustic signals into a feature matrix set;
(4) constructing a knowledge graph:
according to the respective physical definitions of the features, dividing the spectral roll-off point feature, spectral centroid feature, energy feature, zero crossing rate feature and autocorrelation coefficient feature after dimensionality reduction among four classes: pitch, timbre, loudness and regularity, and connecting each feature with the features of the same class; classifying the undescribable semantic feature into a depth feature class; connecting the undescribable semantic feature with every feature; the connected features form the knowledge graph;
(5) converting the knowledge-graph into a connection matrix:
(5a) respectively numbering the spectral roll-off point feature, the spectral centroid feature, the energy feature, the zero-crossing rate feature, the autocorrelation coefficient feature and the undescribable semantic feature after dimensionality reduction as 1,2,3,4,5 and 6;
(5b) initializing an all-zero connection matrix of dimension 6 × 6; according to the knowledge graph the spectral centroid feature is connected with the zero crossing rate feature, so the elements in row 2, column 4 and in row 4, column 2 of the connection matrix are set to 1; and since the undescribable semantic feature is connected with the other five features, all elements of row 6 and all elements of column 6 of the connection matrix are set to 1;
(5c) setting the diagonal elements of the connection matrix to 1;
(6) constructing a graph convolution network:
a four-layer graph convolutional network is constructed with the structure: first graph convolution layer, second graph convolution layer, first fully connected layer, second fully connected layer, the four network layers connected in sequence; the sizes of the feature mapping matrices of the first and second graph convolution layers are set to 128 × 100 and 100 × 64 respectively, and the numbers of feature mapping units of the first and second fully connected layers are 384 and 3 respectively;
(7) training the graph convolutional network:
inputting the feature matrix set and the connection matrix into the graph convolutional network and iteratively updating the two feature mapping matrices and the two feature mapping units in the network; training stops when the output of the loss function is less than 0.01 or the number of training iterations reaches 350, yielding the trained graph convolutional network;
(8) identifying underwater targets:
and (3) extracting six characteristics of the sound signal of the underwater target to be identified by adopting the same operation as the step (2), obtaining a characteristic matrix of the sound signal by adopting the same splicing operation as the step (3a), and inputting the characteristic matrix of the sound signal and the connection matrix obtained in the step (5) into a trained graph convolution network together to obtain an identification result of the target sound signal.
2. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 1, wherein the windowing and framing processing in step (1c) comprises the following steps:
first, calculating the amplitude of each point in the Hamming window according to the following formula:

ω(n) = 0.54 − 0.46 cos(2πn/(M−1)), n = 0, 1, …, M−1

where ω(n) represents the amplitude of the nth point in the Hamming window, cos represents the cosine operation, π represents the circumference ratio, and M represents the total number of points in the Hamming window;
setting the first sampling point from the left in each sample as the initial point of the intercepted sample of the sample, and setting the initial point of the intercepted sample as the current initial point of the intercepted sample;
thirdly, starting from the current intercepted-sample starting point of each sample, taking M consecutive sampling points, M being equal to the total number of points in the Hamming window, multiplying each of the M sampling points in turn by the amplitude of the corresponding point in the Hamming window, forming all the products into the intercepted sample of the current iteration, and setting the label of the intercepted sample to be the same as the label of the corresponding sample;
fourthly, moving the current intercepted-sample starting point of each sample to the right by γ sampling points, γ = 0.5M, to obtain the updated current intercepted-sample starting point;
fifthly, repeating the third and fourth steps until fewer than M sampling points remain between the current intercepted-sample starting point and the last sampling point of each sample, thereby obtaining all intercepted samples after windowing and framing.
3. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 1, wherein the spectral roll-off point generation method in step (2a) is as follows:
in the first step, the following discrete Fourier transform is applied to each secondary intercepted sample cut from each intercepted sample to obtain the frequency domain sequence of each secondary intercepted sample:

h_{i,q}(m) = Σ_{k=0}^{M1−1} x_{i,q}(k) e^{−j2πmk/M1}, m = 0, 1, …, M1−1

where h_{i,q}(m) represents the frequency value of the m-th frequency point in the frequency domain sequence corresponding to c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; m represents the sequence number of the frequency point in the frequency domain sequence; M1 represents the total number of sampling points contained in c_{i,q}; Σ represents the summation operation; k represents the sequence number of a sampling point in the secondary intercepted sample; x_{i,q}(k) represents the value of the k-th sampling point in c_{i,q}; e^(·) represents the exponential operation with the natural constant e as base; j represents the imaginary unit; and π represents the circumference ratio;

in the second step, the frequency values of the frequency points in each frequency domain sequence are accumulated in order; accumulation stops when the accumulated value exceeds 85% of the sum over the whole frequency domain sequence, and the total number of frequency points accumulated at that moment is taken as the spectral roll-off point feature value of the frequency domain sequence.
4. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 1, wherein the spectral centroid generation method in step (2a) is as follows:
in the first step, each secondary intercepted sample cut from each intercepted sample is subjected to the same discrete Fourier transform operation as the first step of claim 3, obtaining the frequency domain sequence of each secondary intercepted sample;

in the second step, the spectral centroid of each frequency domain sequence is generated using the following formula:

B_{i,q} = [Σ_{w=1}^{L} w · h_{i,q}(w)] / [Σ_{w=1}^{L} h_{i,q}(w)]

where B_{i,q} represents the spectral centroid feature value of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; L represents the length of the frequency domain sequence; w represents the sequence number of the frequency point in the frequency domain sequence; and h_{i,q}(w) represents the frequency value of the w-th frequency point in the frequency domain sequence corresponding to c_{i,q}.
5. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 3, wherein the energy calculation formula in step (2a) is as follows:

E_{i,q} = Σ_{k=1}^{M1} x_{i,q}(k)²

where E_{i,q} represents the short-time energy of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample.
6. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 3, wherein the zero crossing rate calculation formula in step (2a) is as follows:

Z_{i,q} = (1/2) Σ_{k=1}^{M1−1} |sgn(x_{i,q}(k+1)) − sgn(x_{i,q}(k))|

where Z_{i,q} represents the short-time zero crossing rate of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample; sgn(·) denotes the sign function; and x_{i,q}(k+1) denotes the value of the (k+1)-th sampling point in the q-th secondary intercepted sample cut from the i-th intercepted sample.
7. The underwater target identification method based on a small-sample-trained graph convolutional network as claimed in claim 3, wherein the autocorrelation coefficient calculation formula in step (2a) is as follows:

R_{i,q}(l) = Σ_{k=1}^{M1−l} (x_{i,q}(k) − x̄_{i,q})(x_{i,q}(k+l) − x̄_{i,q}) / Σ_{k=1}^{M1} (x_{i,q}(k) − x̄_{i,q})²

where R_{i,q}(l) represents the short-time autocorrelation coefficient of c_{i,q}, the q-th secondary intercepted sample cut from the i-th intercepted sample, at lag l; x̄_{i,q} represents the mean value of the secondary intercepted sample c_{i,q}; and x_{i,q}(k+l) denotes the value of the (k+l)-th sampling point in the q-th secondary intercepted sample cut from the i-th intercepted sample.
CN202110403699.6A 2021-04-15 2021-04-15 Underwater target identification method based on small sample training graph convolutional network Active CN113111786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110403699.6A CN113111786B (en) 2021-04-15 2021-04-15 Underwater target identification method based on small sample training graph convolutional network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110403699.6A CN113111786B (en) 2021-04-15 2021-04-15 Underwater target identification method based on small sample training graph convolutional network

Publications (2)

Publication Number Publication Date
CN113111786A true CN113111786A (en) 2021-07-13
CN113111786B CN113111786B (en) 2024-02-09

Family

ID=76717057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110403699.6A Active CN113111786B (en) Underwater target identification method based on small sample training graph convolutional network

Country Status (1)

Country Link
CN (1) CN113111786B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019079972A1 (en) * 2017-10-24 2019-05-02 深圳和而泰智能控制股份有限公司 Specific sound recognition method and apparatus, and storage medium
WO2021042503A1 (en) * 2019-09-06 2021-03-11 平安科技(深圳)有限公司 Information classification extraction method, apparatus, computer device and storage medium
CN112364779A (en) * 2020-11-12 2021-02-12 中国电子科技集团公司第五十四研究所 Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN112488241A (en) * 2020-12-18 2021-03-12 贵州大学 Zero sample picture identification method based on multi-granularity fusion network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANG Tongtong; CHENG Jinyong; LU Wenpeng: "Target recognition based on multi-layer feature extraction of convolutional neural networks", Computer Systems & Applications, no. 12 *
BAO Kaifang; GU Junzhong; YANG Jing: "Knowledge graph completion method based on joint representation of structure and text", Computer Engineering, no. 07 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569773A (en) * 2021-08-02 2021-10-29 南京信息工程大学 Interference signal identification method based on knowledge graph and Softmax regression
CN113569773B (en) * 2021-08-02 2023-09-15 南京信息工程大学 Interference signal identification method based on knowledge graph and Softmax regression
CN116108353A (en) * 2023-04-12 2023-05-12 厦门大学 Small sample deep learning underwater sound target recognition method based on data packet

Also Published As

Publication number Publication date
CN113111786B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
Kong et al. Weakly labelled audioset tagging with attention neural networks
CN108053836B (en) Audio automatic labeling method based on deep learning
CN112364779B (en) Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
Su et al. Performance analysis of multiple aggregated acoustic features for environment sound classification
CN105488466B (en) A kind of deep-neural-network and Acoustic Object vocal print feature extracting method
CN109815892A (en) The signal recognition method of distributed fiber grating sensing network based on CNN
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN102419974A (en) Sparse representation features for speech recognition
KR20200104019A (en) Machine learning based voice data analysis method, device and program
CN113111786B (en) Underwater target identification method based on small sample training diagram convolutional network
CN112750442B (en) Crested mill population ecological system monitoring system with wavelet transformation and method thereof
CN115762536A (en) Small sample optimization bird sound recognition method based on bridge transform
Yang et al. Classification of odontocete echolocation clicks using convolutional neural network
Soliman et al. Isolated word speech recognition using convolutional neural network
CN107731235A (en) Sperm whale and the cry pulse characteristicses extraction of long fin navigator whale and sorting technique and device
CN112052880A (en) Underwater sound target identification method based on weight updating support vector machine
Li et al. Audio recognition of Chinese traditional instruments based on machine learning
Hu et al. A lightweight multi-sensory field-based dual-feature fusion residual network for bird song recognition
CN113095381B (en) Underwater sound target identification method and system based on improved DBN
Wu et al. Audio-based expansion learning for aerial target recognition
RAMO et al. A novel approach to spoken Arabic number recognition based on developed ant lion algorithm
Chun et al. Research on music classification based on MFCC and BP neural network
Joshi et al. Comparative study of Mfcc and Mel spectrogram for raga classification using CNN
CN116153337B (en) Synthetic voice tracing evidence obtaining method and device, electronic equipment and storage medium
Chiu et al. A micro-control device of soundscape collection for mixed frog call recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant