CN116797938A - SAR ship classification method based on contrast learning pre-training - Google Patents

Info

Publication number: CN116797938A
Application number: CN202310567081.2A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: layer, training, convolution, input, feature
Legal status: Pending
Inventors: 王英华, 张超, 刘宏伟
Original and current assignee: Xidian University
Application filed by Xidian University; priority to CN202310567081.2A; publication of CN116797938A

Classifications

    • G06V 20/13 Scenes; scene-specific elements; terrestrial scenes; satellite images
    • G06N 3/0464 Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; convolutional networks [CNN, ConvNet]
    • G06N 3/088 Computing arrangements based on biological models; neural networks; learning methods; non-supervised learning, e.g. competitive learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning; using classification, e.g. of video objects
    • G06V 10/774 Arrangements for image or video recognition or understanding using pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning; using neural networks

Abstract

The invention discloses a SAR ship classification method based on contrast learning pre-training, which comprises the following steps: constructing a feature fusion network that comprises a feature extraction module, a first feature fusion module and a second feature fusion module connected in sequence; constructing a SimCLR contrast learning network framework that uses the feature fusion network as its feature extraction network; acquiring a plurality of groups of mat-format training data sets and a plurality of groups of JPG-format picture training data sets, and respectively inputting the JPG-format picture training data sets into the SimCLR network framework for training to obtain a plurality of groups of pre-training models; loading the parameters of the pre-training models into the feature fusion network and further training it with the mat-format training data sets to obtain a trained feature fusion network; and inputting the original SAR image to be classified into the trained feature fusion network to obtain a classification result. By obtaining the pre-training model through unsupervised contrast learning and combining it with the feature fusion network, the invention improves ship classification performance.

Description

SAR ship classification method based on contrast learning pre-training
Technical Field
The invention belongs to the technical field of image target classification, and particularly relates to a SAR ship classification method based on contrast learning pre-training.
Background
Since SAR (Synthetic Aperture Radar) was developed in the 1950s, SAR image target classification techniques have developed rapidly. In machine-learning-based classification of SAR image ship targets, Ji et al. extracted from the RCS (Radar Cross Section) statistics of ships the position feature parameters (mean, variance and range), distribution feature parameters (skewness and kurtosis coefficients), distribution analysis quantities (probability density function and cumulative distribution function) and percentile probability values (10%, 50% and 90%), and used a BP neural network to classify ship images. In deep-learning-based SAR image ship target classification, features of the SAR image are extracted with a convolutional neural network for classification. Convolutional neural networks began with the LeNet-5 network in 1998, which realized handwritten digit recognition; the AlexNet network rose to prominence in the 2012 ImageNet image classification competition, and new classical networks (such as VGG, GoogLeNet and ResNet) have since been continuously applied to SAR image ship target classification.
The invention patent application CN201911238758.8, entitled "SAR ship target recognition method based on deep dense connection network and metric learning", proposes a method that, based on the idea of metric learning, acquires the depth features of same-class and different-class samples through a triplet network and, by optimizing a loss function, reduces the distance between same-class samples in the feature space while pushing apart the distance between different-class samples. The feature extraction network in the triplet network is an improved DenseNet, and the loss function is a joint loss obtained by weighting and combining the cross entropy loss, the triplet loss and a Fisher discrimination regularization term. That invention improves the feature extraction network DenseNet and the loss function, and improves the classification result to a certain extent. However, because the network has many parameters, training is time-consuming; moreover, the method only uses deep features, which are abstract and ignore features such as texture information in the image, so its results still have room for improvement.
Current SAR image ship target classification methods mainly use a convolutional neural network to extract deep features of the image for classification, and the network is usually trained from scratch. Deep features carry strong semantic information, but their resolution is very low and their perception of detail is poor; shallow features have higher resolution and contain more position and detail information. Classification using deep features alone therefore does not make full use of the available information.
Disclosure of Invention
In order to solve the above problems in the prior art, the invention provides a SAR ship classification method based on contrast learning pre-training. The technical aim of the invention is achieved by the following technical scheme:
the invention provides a SAR ship classification method based on contrast learning pre-training, which comprises the following steps:
S1: constructing a feature fusion network, wherein the feature fusion network comprises a feature extraction module, a first feature fusion module and a second feature fusion module which are connected in sequence; the feature extraction module is used for performing preliminary feature extraction on an input SAR image to obtain shallow features and deep features of the SAR image; the first feature fusion module is used for realizing fusion from the deep features to the shallow features and obtaining the fused shallow features; the second feature fusion module is used for realizing continued fusion from the fused shallow features to the deep features and outputting image category probabilities;
S2: constructing a SimCLR contrast learning network framework that takes the feature fusion network as its feature extraction network, wherein the SimCLR contrast learning network framework comprises a feature extraction network and a feature mapping network which are cascaded;
S3: acquiring a plurality of groups of mat-format training data sets and a plurality of groups of JPG-format picture training data sets, and respectively inputting the JPG-format picture training data sets into the SimCLR contrast learning network framework for training to obtain a plurality of groups of pre-training models;
S4: loading the parameters of the plurality of groups of pre-training models into the feature fusion network, and further training the feature fusion network by using the plurality of groups of mat-format training data sets to obtain a trained feature fusion network;
S5: inputting the original SAR image to be classified into the trained feature fusion network to obtain a classification result.
In one embodiment of the present invention, the feature extraction module includes, connected in sequence, a first convolution layer CO1, a first batch normalization layer B1, a first activation function layer R1, a second convolution layer CO2, a second batch normalization layer B2, a second activation function layer R2, a first dense connection and transition module M1, a second dense connection and transition module M2, a third dense connection and transition module M3 and a fourth dense connection and transition module M4.
In one embodiment of the invention, the first dense connection and transition module M1, the second dense connection and transition module M2, the third dense connection and transition module M3 and the fourth dense connection and transition module M4 have the same structure, each comprising a cascaded dense connection unit and transition unit, wherein,
the dense connection unit comprises four subunits connected in series, each subunit comprises a batch normalization layer, an activation function layer, a convolution layer and a splicing layer, the input and the output of the former subunit are spliced to be used as the input of the latter subunit, and the input and the output of the last subunit are spliced to be used as the output of the dense connection unit;
the transition unit comprises a batch normalization layer, an activation function layer, a convolution layer and a pooling layer which are sequentially connected.
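For illustration, the dense connection unit and transition unit described above map naturally onto a few small PyTorch modules. The sketch below is only an illustrative reading of this structure, not the patent's reference implementation; the padding values and the choice of average pooling are assumptions, and the growth rate (the number of channels each subunit's convolution adds) is left as a parameter.

```python
import torch
import torch.nn as nn

class DenseConnectionUnit(nn.Module):
    """Four subunits; each subunit is BN -> ReLU -> 3x3 Conv, and its input and
    output are spliced (concatenated) along the channel dimension."""
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.subunits = nn.ModuleList()
        channels = in_channels
        for _ in range(4):
            self.subunits.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, stride=1, padding=1),
            ))
            channels += growth_rate  # splicing grows the channel count
        self.out_channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for subunit in self.subunits:
            x = torch.cat([x, subunit(x)], dim=1)  # splice input and output
        return x

class TransitionUnit(nn.Module):
    """BN -> ReLU -> 3x3 Conv -> 2x2 pooling (pooling type assumed)."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1),
            nn.AvgPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

class DenseTransitionModule(nn.Module):
    """A dense connection unit cascaded with a transition unit (modules M1 to M4)."""
    def __init__(self, in_channels: int, growth_rate: int, out_channels: int):
        super().__init__()
        self.dense = DenseConnectionUnit(in_channels, growth_rate)
        self.transition = TransitionUnit(self.dense.out_channels, out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.transition(self.dense(x))
```

With growth rates of 6, 12, 18 and 24, as given in the detailed embodiment below, the four modules take 32-, 56-, 104- and 176-channel inputs and output 56, 104, 176 and 272 feature maps respectively.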
In one embodiment of the present invention, the first feature fusion module includes a first deconvolution layer T1, a first splice layer CA1, a third convolution layer CO3, a second deconvolution layer T2, a second splice layer CA2, a fourth convolution layer CO4, a third deconvolution layer T3, a third splice layer CA3, a fifth convolution layer CO5, a fourth deconvolution layer T4, a fourth splice layer CA4 and a sixth convolution layer CO6, wherein,
the input of the first deconvolution layer T1 is connected with the output of the fourth dense connection and transition module M4; the inputs of the first splice layer CA1 are connected with the outputs of the third dense connection and transition module M3 and of the first deconvolution layer T1, and the output of the first splice layer CA1 is connected with the input of the third convolution layer CO3; the input of the second deconvolution layer T2 is connected with the output of the third convolution layer CO3; the inputs of the second splice layer CA2 are connected with the outputs of the second dense connection and transition module M2 and of the second deconvolution layer T2, and the output of the second splice layer CA2 is connected with the input of the fourth convolution layer CO4; the input of the third deconvolution layer T3 is connected with the output of the fourth convolution layer CO4; the inputs of the third splice layer CA3 are connected with the outputs of the first dense connection and transition module M1 and of the third deconvolution layer T3, and the output of the third splice layer CA3 is connected with the input of the fifth convolution layer CO5; the input of the fourth deconvolution layer T4 is connected with the output of the fifth convolution layer CO5; the inputs of the fourth splice layer CA4 are connected with the outputs of the second activation function layer R2 and of the fourth deconvolution layer T4, and the output of the fourth splice layer CA4 is connected with the input of the sixth convolution layer CO6.
In one embodiment of the present invention, the second feature fusion module includes a seventh convolution layer CO7, a third batch normalization layer B3, a third activation function layer R3, a first pooling layer P1, a fifth splice layer CA5, an eighth convolution layer CO8, a fourth batch normalization layer B4, a fourth activation function layer R4, a second pooling layer P2, a sixth splice layer CA6, a ninth convolution layer CO9, a fifth batch normalization layer B5, a fifth activation function layer R5, a third pooling layer P3, a seventh splice layer CA7, a tenth convolution layer CO10, a sixth batch normalization layer B6, a sixth activation function layer R6, a fourth pooling layer P4, an eighth splice layer CA8, an eleventh convolution layer CO11, a seventh batch normalization layer B7, a seventh activation function layer R7, a fifth pooling layer P5, an eighth batch normalization layer B8, an eighth activation function layer R8, a sixth pooling layer P6, a flattening layer FL, a first fully connected layer FC1 and a second fully connected layer FC2, wherein,
the seventh convolution layer CO7, the third batch normalization layer B3, the third activation function layer R3 and the first pooling layer P1 are cascaded in sequence; the input of the seventh convolution layer CO7 is connected with the output of the sixth convolution layer CO6, and the outputs of the first pooling layer P1 and of the fifth convolution layer CO5 are connected with the input of the fifth splice layer CA5; the eighth convolution layer CO8, the fourth batch normalization layer B4, the fourth activation function layer R4 and the second pooling layer P2 are cascaded in sequence; the input of the eighth convolution layer CO8 is connected with the output of the fifth splice layer CA5, and the outputs of the second pooling layer P2 and of the fourth convolution layer CO4 are connected with the input of the sixth splice layer CA6;
the ninth convolution layer CO9, the fifth batch normalization layer B5, the fifth activation function layer R5 and the third pooling layer P3 are cascaded in sequence; the input of the ninth convolution layer CO9 is connected with the output of the sixth splice layer CA6, and the outputs of the third pooling layer P3 and of the third convolution layer CO3 are connected with the input of the seventh splice layer CA7; the tenth convolution layer CO10, the sixth batch normalization layer B6, the sixth activation function layer R6 and the fourth pooling layer P4 are cascaded in sequence; the input of the tenth convolution layer CO10 is connected with the output of the seventh splice layer CA7, and the outputs of the fourth pooling layer P4 and of the fourth dense connection and transition module M4 are connected with the input of the eighth splice layer CA8; the eleventh convolution layer CO11, the seventh batch normalization layer B7, the seventh activation function layer R7 and the fifth pooling layer P5 are connected in sequence; the input of the eleventh convolution layer CO11 is connected with the output of the eighth splice layer CA8, and the output of the fifth pooling layer P5 is connected in sequence with the eighth batch normalization layer B8, the eighth activation function layer R8, the sixth pooling layer P6, the flattening layer FL, the first fully connected layer FC1 and the second fully connected layer FC2.
In one embodiment of the present invention, the feature mapping network is a multi-layer perceptron comprising a cascaded third fully connected layer FC3, ninth activation function layer R9 and fourth fully connected layer FC4, and the input of the third fully connected layer FC3 is connected with the output of the second fully connected layer FC2.
In one embodiment of the present invention, S3 includes:
S3.1: dividing the OpenSARShip data set into a training set and a test set at a ratio of 8:2 five times to obtain 5 training sets {Φ1, Φ2, Φ3, Φ4, Φ5} and the corresponding 5 test sets {t1, t2, t3, t4, t5}; performing data expansion and cropping on the training sets {Φ1, Φ2, Φ3, Φ4, Φ5} to obtain the mat-format training sets {Φ1, Φ2, Φ3, Φ4, Φ5}; cropping the test sets {t1, t2, t3, t4, t5} to obtain the test sets {T1, T2, T3, T4, T5}; and converting the mat-format training sets {Φ1, Φ2, Φ3, Φ4, Φ5} into the JPG-format picture training sets {Φ1', Φ2', Φ3', Φ4', Φ5'};
S3.2: inputting the JPG-format picture data sets {Φ1', Φ2', Φ3', Φ4', Φ5'} into the SimCLR contrast learning framework for pre-training to obtain the pre-training models {ψ1', ψ2', ψ3', ψ4', ψ5'}, the loss function used being the contrast loss

s_{i,j} = (z_i · z_j) / (||z_i|| ||z_j||),
l(i,j) = -log( exp(s_{i,j}/τ) / Σ_{k=1, k≠i}^{2N} exp(s_{i,k}/τ) ),
L = (1/(2N)) Σ_{k=1}^{N} [ l(2k-1, 2k) + l(2k, 2k-1) ],

wherein z_i denotes the feature vector obtained by feature extraction and feature mapping of the i-th training sample in the training data set, z_j denotes the feature vector obtained by feature extraction and feature mapping of the j-th training sample, s_{i,j} denotes the similarity of the feature vectors of the i-th and j-th training samples, l(i,j) denotes the similarity after the function transformation, τ denotes a temperature coefficient, and L denotes the average value of l(i,j) over all sample pairs obtained by data enhancement of each sample in data containing N samples. During computation, the k-th sample among the N samples yields the (2k-1)-th and 2k-th samples among the 2N data-enhanced samples through two data enhancement operations, namely random cropping with scaling and Gaussian blur.
In one embodiment of the present invention, S4 includes:
loading the parameters of each pre-training model in {ψ1', ψ2', ψ3', ψ4', ψ5'} into the feature fusion network ψ, and inputting the mat-format data sets {Φ1, Φ2, Φ3, Φ4, Φ5} corresponding to the JPG-format picture data sets {Φ1', Φ2', Φ3', Φ4', Φ5'} into the feature fusion network ψ for fine-tuning training to obtain a trained feature fusion network.
Compared with the prior art, the invention has the beneficial effects that:
1. According to the SAR ship classification method based on contrast learning pre-training of the invention, the pre-training model is obtained through unsupervised contrast learning, which is realized with the simple SimCLR framework, so that the network can obtain good feature extraction capability without label information. Compared with direct training, loading the pre-trained model and fine-tuning yields better network parameters, faster network convergence and higher classification accuracy.
2. The invention combines shallow features and deep features by using a convolutional neural network with the bidirectional feature fusion structure of the path aggregation network (Path Aggregation Network, PANet), thereby improving the image feature extraction capability and classification accuracy of the convolutional neural network.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
FIG. 1 is a flow chart of a SAR ship classification method based on contrast learning pre-training provided by an embodiment of the invention;
FIG. 2 is a network frame structure diagram of a feature fusion network provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a dense connection unit according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a transition unit according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a SimCLR contrast learning network framework according to an embodiment of the present invention;
FIG. 6 is an image of three classes of SAR ships used in embodiments of the present invention;
FIG. 7 is a schematic diagram of a dense connected network based on a triplet network and Fisher criterion for use in comparative experiments in accordance with embodiments of the present invention;
FIG. 8 is a graph of the learning curves of TriDenseNet direct training and of the improved network fine-tuned from the loaded pre-training model on three classes of SAR ship images in accordance with an embodiment of the present invention.
Detailed Description
In order to further explain the technical means and effects adopted by the invention to achieve the preset aim, the SAR ship classification method based on contrast learning pre-training is described in detail below with reference to the attached drawings and the specific embodiments.
The foregoing and other features, aspects, and advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments when taken in conjunction with the accompanying drawings. The technical means and effects adopted by the present invention to achieve the intended purpose can be more deeply and specifically understood through the description of the specific embodiments, however, the attached drawings are provided for reference and description only, and are not intended to limit the technical scheme of the present invention.
It should be noted that in this document relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in an article or apparatus that comprises the element.
Referring to fig. 1, fig. 1 is a flowchart of a SAR ship classification method based on contrast learning pre-training according to an embodiment of the present invention. The SAR ship classification method comprises the following steps:
s1: constructing a feature fusion network psi combining PANet and DenseNet, wherein the feature fusion network psi comprises a feature extraction module, a first feature fusion module and a second feature fusion module which are sequentially connected, and the feature extraction module is used for carrying out preliminary feature extraction on an input SAR image to obtain shallow features and deep features of the SAR image; the first feature fusion module is used for realizing fusion from the deep features to the shallow features and obtaining the fused shallow features; the second feature fusion module is used for realizing continuous fusion from the fused shallow features to the deep features and outputting image category probabilities.
Referring to fig. 2, fig. 2 is a network frame structure diagram of a feature fusion network according to an embodiment of the present invention. The feature extraction module of this embodiment includes two convolution layers, two batch normalization layers, two activation function layers and four dense connection and transition modules; specifically, it includes, connected in sequence, a first convolution layer CO1, a first batch normalization layer B1, a first activation function layer R1, a second convolution layer CO2, a second batch normalization layer B2, a second activation function layer R2, a first dense connection and transition module M1, a second dense connection and transition module M2, a third dense connection and transition module M3 and a fourth dense connection and transition module M4.
Further, the first dense connection and transition module M1, the second dense connection and transition module M2, the third dense connection and transition module M3 and the fourth dense connection and transition module M4 have the same structure, each comprising a cascaded dense connection unit and transition unit; please refer to fig. 3 and fig. 4, where fig. 3 is a schematic structural diagram of the dense connection unit provided by the embodiment of the invention and fig. 4 is a schematic structural diagram of the transition unit according to an embodiment of the present invention. The dense connection unit comprises four subunits connected in series, each subunit comprising a batch normalization layer, an activation function layer, a convolution layer and a splice layer; the input and output of the former subunit are spliced to form the input of the latter subunit, and the input and output of the last subunit are spliced to form the output of the dense connection unit. The transition unit comprises a batch normalization layer, an activation function layer, a convolution layer and a pooling window connected in sequence.
In this embodiment, the input of the feature extraction module is the SAR picture to be classified. The inputs, outputs, parameter settings and relations of each layer are as follows:

The first convolution layer CO1 has a convolution kernel K1 with a window size of 3×3 and a sliding step S1 of 1; it outputs 32 feature maps, which serve as the input of the first batch normalization layer B1. The first batch normalization layer B1 normalizes its input as (x-μ)/√(σ²+ε), where x is a batch of feature maps input to the batch normalization layer, μ is the mean of all elements at the corresponding positions of the batch of feature maps, σ² is the variance of all elements at the corresponding positions of the batch of feature maps, and ε is a very small number that prevents the denominator from being 0; it outputs 32 feature maps, which serve as the input of the first activation function layer R1. The first activation function layer R1 uses the ReLU activation function and outputs 32 weighted feature maps, which serve as the input of the second convolution layer CO2. The second convolution layer CO2 has a convolution kernel K2 with a window size of 3×3 and a sliding step S2 of 1; it outputs 32 feature maps, which serve as the input of the second batch normalization layer B2. The second batch normalization layer B2 normalizes its input in the same way and outputs 32 feature maps, which serve as the input of the second activation function layer R2. The second activation function layer R2 uses the ReLU activation function and outputs 32 weighted feature maps, which serve as the input of the first dense connection and transition module M1.
The first dense connection and transition module M1 comprises a cascaded dense connection unit and transition unit. The dense connection unit comprises four subunits connected in series, each subunit comprising a batch normalization layer, an activation function layer, a convolution layer with a kernel size of 3×3×6 and a step length of 1, and a splice layer; the input and output of the former subunit, after being spliced by the splice layer, serve as the input of the latter subunit, and the input and output of the last subunit are spliced to form the output of the dense connection unit. The transition unit comprises a batch normalization layer, an activation function layer, a convolution layer with a kernel size of 3×3×56 and a step length of 1, and a pooling window connected in sequence. The first dense connection and transition module M1 outputs 56 feature maps, which serve as the input of the second dense connection and transition module M2.

The second dense connection and transition module M2 comprises a cascaded dense connection unit and transition unit; the dense connection unit comprises four subunits connected in series, each subunit comprising, in series, a batch normalization layer, an activation function layer, a convolution layer with a kernel size of 3×3×12 and a step length of 1, and a splice layer, and the transition unit comprises, in series, a batch normalization layer, an activation function layer, a convolution layer with a kernel size of 3×3×104 and a step length of 1, and a pooling window. The second dense connection and transition module M2 outputs 104 feature maps, which serve as the input of the third dense connection and transition module M3.

Similarly, the third dense connection and transition module M3 comprises a cascaded dense connection unit and transition unit; the dense connection unit comprises four subunits, each comprising, in series, a batch normalization layer, an activation function layer, a convolution layer with a kernel size of 3×3×18 and a step length of 1, and a splice layer, and the transition unit comprises, in series, a batch normalization layer, an activation function layer, a convolution layer with a kernel size of 3×3×176 and a step length of 1, and a pooling window. The third dense connection and transition module M3 outputs 176 feature maps, which serve as the input of the fourth dense connection and transition module M4.

The fourth dense connection and transition module M4 comprises a cascaded dense connection unit and transition unit; the dense connection unit comprises four subunits, each comprising, in series, a batch normalization layer, an activation function layer, a convolution layer with a kernel size of 3×3×24 and a step length of 1, and a splice layer, and the transition unit comprises, in series, a batch normalization layer, an activation function layer, a convolution layer with a kernel size of 3×3×272 and a step length of 1, and a pooling window. The fourth dense connection and transition module M4 outputs 272 feature maps, which serve as the input of the first feature fusion module.
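As a rough sketch of how these layers could be chained (assuming the DenseTransitionModule class from the earlier sketch is in scope; the single-channel SAR input and "same" padding are additional assumptions):

```python
import torch
import torch.nn as nn
# assumes DenseTransitionModule from the earlier sketch is in scope

def build_feature_extraction_module() -> nn.ModuleDict:
    """Stem (CO1-B1-R1, CO2-B2-R2, 32 channels) followed by modules M1..M4."""
    stem = nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),    # CO1 (1-channel input assumed)
        nn.BatchNorm2d(32), nn.ReLU(inplace=True),                # B1, R1
        nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),    # CO2
        nn.BatchNorm2d(32), nn.ReLU(inplace=True),                # B2, R2
    )
    return nn.ModuleDict({
        "stem": stem,
        "M1": DenseTransitionModule(32,  growth_rate=6,  out_channels=56),
        "M2": DenseTransitionModule(56,  growth_rate=12, out_channels=104),
        "M3": DenseTransitionModule(104, growth_rate=18, out_channels=176),
        "M4": DenseTransitionModule(176, growth_rate=24, out_channels=272),
    })

def extract_features(m: nn.ModuleDict, x: torch.Tensor):
    """Return the multi-scale feature maps that the fusion modules consume."""
    s  = m["stem"](x)   # 32 channels, full resolution
    f1 = m["M1"](s)     # 56 channels, 1/2 resolution
    f2 = m["M2"](f1)    # 104 channels, 1/4 resolution
    f3 = m["M3"](f2)    # 176 channels, 1/8 resolution
    f4 = m["M4"](f3)    # 272 channels, 1/16 resolution
    return s, f1, f2, f3, f4
```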
With continued reference to fig. 2, the first feature fusion module of this embodiment includes four deconvolution layers (transpose convolutions), four splice layers and four convolution layers. Specifically, the first feature fusion module includes a first deconvolution layer T1, a first splice layer CA1, a third convolution layer CO3, a second deconvolution layer T2, a second splice layer CA2, a fourth convolution layer CO4, a third deconvolution layer T3, a third splice layer CA3, a fifth convolution layer CO5, a fourth deconvolution layer T4, a fourth splice layer CA4 and a sixth convolution layer CO6, wherein the input of the first deconvolution layer T1 is connected with the output of the fourth dense connection and transition module M4; the inputs of the first splice layer CA1 are connected with the outputs of the third dense connection and transition module M3 and of the first deconvolution layer T1, and the output of the first splice layer CA1 is connected with the input of the third convolution layer CO3; the input of the second deconvolution layer T2 is connected with the output of the third convolution layer CO3; the inputs of the second splice layer CA2 are connected with the outputs of the second dense connection and transition module M2 and of the second deconvolution layer T2, and the output of the second splice layer CA2 is connected with the input of the fourth convolution layer CO4; the input of the third deconvolution layer T3 is connected with the output of the fourth convolution layer CO4; the inputs of the third splice layer CA3 are connected with the outputs of the first dense connection and transition module M1 and of the third deconvolution layer T3, and the output of the third splice layer CA3 is connected with the input of the fifth convolution layer CO5; the input of the fourth deconvolution layer T4 is connected with the output of the fifth convolution layer CO5; the inputs of the fourth splice layer CA4 are connected with the outputs of the second activation function layer R2 and of the fourth deconvolution layer T4, and the output of the fourth splice layer CA4 is connected with the input of the sixth convolution layer CO6.
Specifically, the input of the first feature fusion module is the output of the fourth dense connection and transition module M4, i.e. 272 feature maps (j = 1, 2, ..., 272). The inputs, outputs, parameter settings and relations of each layer are as follows:
first deconvolution layer T 1 Its convolution kernel TK 1 The window size of (2) is 4 x 4, the sliding step TS 1 2 for outputting 176 feature mapsThe layer is taken as a first splicing layer CA 1 Is input to the computer; first splice layer CA 1 The splice dimension is a channel dimension and is used for outputting 352 feature graphs +.>This layer acts as a third convolution layer CO 3 Is input to the computer; third convolution layer CO 3 Its convolution kernel K 3 The window size of (2) is 3 x 3, the sliding step S 3 1 for outputting 176 feature mapsThis layer acts as a second deconvolution layer T 2 Is input to the computer.
Second deconvolution layer T 2 Its convolution kernel TK 2 The window size of (2) is 4 x 4, the sliding step TS 2 2 for outputting 104 bitsSign mapThe layer is taken as a second splicing layer CA 2 Is input to the computer; second splice layer CA 2 The splice dimension is a channel dimension and is used for outputting 208 feature graphs +.>This layer is referred to as the fourth convolution layer CO 4 Is input to the computer; fourth convolution layer CO 4 Its convolution kernel K 4 The window size of (2) is 3 x 3, the sliding step S 4 1 for outputting 104 feature mapsThis layer is referred to as the third deconvolution layer T 3 Is input to the computer.
Third deconvolution layer T 3 Its convolution kernel TK 3 The window size of (2) is 4 x 4, the sliding step TS 3 2 for outputting 56 feature mapsThe layer is taken as a third splicing layer CA 3 Is input to the computer; third splice layer CA 3 The splice dimension is a channel dimension, and is used for outputting 112 feature graphs +.>This layer acts as a fifth convolution layer CO 5 Is input to the computer; fifth convolution layer CO 5 Its convolution kernel K 5 The window size of (2) is 3 x 3, the sliding step S 5 1 for outputting 56 feature mapsThis layer is referred to as the fourth deconvolution layer T 4 Is input to the computer.
Fourth deconvolution layer T 4 Its convolution kernel TK 4 The window size of (2) is 4 x 4, the sliding step TS 4 2 for outputting 32 feature maps The layer is taken as a fourth splicing layer CA 4 Is input to the computer; fourth splice layer CA 4 The splice dimension is a channel dimension, and is used for outputting 64 feature graphs +.>This layer acts as a sixth convolution layer CO 6 Is input to the computer; sixth convolution layer CO 6 Its convolution kernel K 6 The window size of (2) is 3 x 3, the sliding step S 6 1 for outputting 32 feature mapsThis layer serves as input to the second feature fusion module.
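A self-contained PyTorch sketch of this deep-to-shallow path is given below; each step upsamples with a 4×4 stride-2 transposed convolution, splices the result with the same-scale backbone feature map, and fuses it with a 3×3 convolution. The padding values (chosen so spatial sizes match) and the 64×64 input resolution in the shape check are assumptions.

```python
import torch
import torch.nn as nn

class TopDownFusion(nn.Module):
    """Deep-to-shallow fusion: T (deconv) -> CA (splice) -> CO (3x3 conv) per level.
    Channel counts follow the description above: 272 -> 176 -> 104 -> 56 -> 32."""
    def __init__(self):
        super().__init__()
        chans = [272, 176, 104, 56, 32]   # outputs of M4, M3, M2, M1 and the stem
        self.deconvs = nn.ModuleList()
        self.fuse_convs = nn.ModuleList()
        for deep, shallow in zip(chans[:-1], chans[1:]):
            # 4x4 transposed conv, stride 2 (padding=1 assumed so the size doubles)
            self.deconvs.append(nn.ConvTranspose2d(deep, shallow, kernel_size=4,
                                                   stride=2, padding=1))
            # splice along channels, then 3x3 conv back down to `shallow` channels
            self.fuse_convs.append(nn.Conv2d(2 * shallow, shallow, kernel_size=3,
                                             stride=1, padding=1))

    def forward(self, s, f1, f2, f3, f4):
        """s, f1..f4: stem and M1..M4 outputs (32/56/104/176/272 channels)."""
        laterals = [f3, f2, f1, s]
        x, fused = f4, []
        for deconv, conv, lateral in zip(self.deconvs, self.fuse_convs, laterals):
            x = conv(torch.cat([lateral, deconv(x)], dim=1))
            fused.append(x)
        return fused   # outputs of CO3, CO4, CO5, CO6 (176/104/56/32 channels)

if __name__ == "__main__":   # shape check with dummy maps from a 64x64 input
    s  = torch.randn(1, 32, 64, 64)
    f1 = torch.randn(1, 56, 32, 32)
    f2 = torch.randn(1, 104, 16, 16)
    f3 = torch.randn(1, 176, 8, 8)
    f4 = torch.randn(1, 272, 4, 4)
    print([tuple(o.shape) for o in TopDownFusion()(s, f1, f2, f3, f4)])
```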
Further, the second feature fusion module of this embodiment includes five convolution layers, six batch normalization layers, six activation function layers, six pooling layers, four splice layers, one flattening layer and two fully connected layers. Specifically, the second feature fusion module of this embodiment includes a seventh convolution layer CO7, a third batch normalization layer B3, a third activation function layer R3, a first pooling layer P1, a fifth splice layer CA5, an eighth convolution layer CO8, a fourth batch normalization layer B4, a fourth activation function layer R4, a second pooling layer P2, a sixth splice layer CA6, a ninth convolution layer CO9, a fifth batch normalization layer B5, a fifth activation function layer R5, a third pooling layer P3, a seventh splice layer CA7, a tenth convolution layer CO10, a sixth batch normalization layer B6, a sixth activation function layer R6, a fourth pooling layer P4, an eighth splice layer CA8, an eleventh convolution layer CO11, a seventh batch normalization layer B7, a seventh activation function layer R7, a fifth pooling layer P5, an eighth batch normalization layer B8, an eighth activation function layer R8, a sixth pooling layer P6, a flattening layer FL, a first fully connected layer FC1 and a second fully connected layer FC2.
Specifically, the seventh convolution layer CO7, the third batch normalization layer B3, the third activation function layer R3 and the first pooling layer P1 are connected in sequence; the input of the seventh convolution layer CO7 is connected with the output of the sixth convolution layer CO6, and the outputs of the first pooling layer P1 and of the fifth convolution layer CO5 are connected with the input of the fifth splice layer CA5. The eighth convolution layer CO8, the fourth batch normalization layer B4, the fourth activation function layer R4 and the second pooling layer P2 are connected in sequence; the input of the eighth convolution layer CO8 is connected with the output of the fifth splice layer CA5, and the outputs of the second pooling layer P2 and of the fourth convolution layer CO4 are connected with the input of the sixth splice layer CA6. The ninth convolution layer CO9, the fifth batch normalization layer B5, the fifth activation function layer R5 and the third pooling layer P3 are connected in sequence; the input of the ninth convolution layer CO9 is connected with the output of the sixth splice layer CA6, and the outputs of the third pooling layer P3 and of the third convolution layer CO3 are connected with the input of the seventh splice layer CA7.

The tenth convolution layer CO10, the sixth batch normalization layer B6, the sixth activation function layer R6 and the fourth pooling layer P4 are connected in sequence; the input of the tenth convolution layer CO10 is connected with the output of the seventh splice layer CA7, and the outputs of the fourth pooling layer P4 and of the fourth dense connection and transition module M4 are connected with the input of the eighth splice layer CA8. The eleventh convolution layer CO11, the seventh batch normalization layer B7, the seventh activation function layer R7 and the fifth pooling layer P5 are connected in sequence; the input of the eleventh convolution layer CO11 is connected with the output of the eighth splice layer CA8, and the output of the fifth pooling layer P5 is connected in sequence with the eighth batch normalization layer B8, the eighth activation function layer R8, the sixth pooling layer P6, the flattening layer FL, the first fully connected layer FC1 and the second fully connected layer FC2.
Specifically, the input of the second feature fusion module is the output of the sixth convolution layer CO6, i.e. 32 feature maps (j = 1, 2, ..., 32). The inputs, outputs, parameter settings and relations of each layer are as follows:

Seventh convolution layer CO7: its convolution kernel K7 has a window size of 3×3 and a sliding step S7 of 1; it outputs 32 feature maps, which serve as the input of the third batch normalization layer B3. Third batch normalization layer B3: normalizes its input as (x-μ)/√(σ²+ε) and outputs 32 feature maps, which serve as the input of the third activation function layer R3. Third activation function layer R3: the activation function is the ReLU function; it outputs 32 weighted feature maps, which serve as the input of the first pooling layer P1. First pooling layer P1: the pooling dimension is the spatial dimension, the pooling window size is 2×2 and the sliding step PS1 is 2; it outputs 32 weighted feature maps, which serve as the input of the fifth splice layer CA5. Fifth splice layer CA5: the splice dimension is the channel dimension; it outputs 88 feature maps, which serve as the input of the eighth convolution layer CO8.

Eighth convolution layer CO8: its convolution kernel K8 has a window size of 3×3 and a sliding step S8 of 1; it outputs 88 feature maps, which serve as the input of the fourth batch normalization layer B4. Fourth batch normalization layer B4: normalizes its input in the same way and outputs 88 feature maps, which serve as the input of the fourth activation function layer R4. Fourth activation function layer R4: the activation function is the ReLU function; it outputs 88 weighted feature maps, which serve as the input of the second pooling layer P2. Second pooling layer P2: the pooling dimension is the spatial dimension, the pooling window size is 2×2 and the sliding step PS2 is 2; it outputs 88 weighted feature maps, which serve as the input of the sixth splice layer CA6. Sixth splice layer CA6: the splice dimension is the channel dimension; it outputs 192 feature maps, which serve as the input of the ninth convolution layer CO9.

Ninth convolution layer CO9: its convolution kernel K9 has a window size of 3×3 and a sliding step S9 of 1; it outputs 192 feature maps, which serve as the input of the fifth batch normalization layer B5. Fifth batch normalization layer B5: normalizes its input in the same way and outputs 192 feature maps (j = 1, 2, ..., 192), which serve as the input of the fifth activation function layer R5. Fifth activation function layer R5: the activation function is the ReLU function; it outputs 192 weighted feature maps, which serve as the input of the third pooling layer P3. Third pooling layer P3: the pooling dimension is the spatial dimension, the pooling window size is 2×2 and the sliding step PS3 is 2; it outputs 192 weighted feature maps, which serve as the input of the seventh splice layer CA7. Seventh splice layer CA7: the splice dimension is the channel dimension; it outputs 368 feature maps, which serve as the input of the tenth convolution layer CO10.

Tenth convolution layer CO10: its convolution kernel K10 has a window size of 3×3 and a sliding step S10 of 1; it outputs 368 feature maps, which serve as the input of the sixth batch normalization layer B6. Sixth batch normalization layer B6: normalizes its input in the same way and outputs 368 feature maps (j = 1, 2, ..., 368), which serve as the input of the sixth activation function layer R6. Sixth activation function layer R6: the activation function is the ReLU function; it outputs 368 weighted feature maps, which serve as the input of the fourth pooling layer P4. Fourth pooling layer P4: the pooling dimension is the spatial dimension, the pooling window size is 2×2 and the sliding step PS4 is 2; it outputs 368 weighted feature maps, which serve as the input of the eighth splice layer CA8. Eighth splice layer CA8: the splice dimension is the channel dimension; it outputs 640 feature maps, which serve as the input of the eleventh convolution layer CO11.

Eleventh convolution layer CO11: its convolution kernel K11 has a window size of 3×3 and a sliding step S11 of 1; it outputs 640 feature maps, which serve as the input of the seventh batch normalization layer B7. Seventh batch normalization layer B7: normalizes its input in the same way and outputs 640 feature maps, which serve as the input of the seventh activation function layer R7. Seventh activation function layer R7: the activation function is the ReLU function; it outputs 640 weighted feature maps, which serve as the input of the fifth pooling layer P5. Fifth pooling layer P5: the pooling dimension is the spatial dimension, the pooling window size is 2×2 and the sliding step PS5 is 2; it outputs 640 weighted feature maps, which serve as the input of the eighth batch normalization layer B8.

Eighth batch normalization layer B8: normalizes its input as (x-μ)/√(σ²+ε) and outputs 640 feature maps, which serve as the input of the eighth activation function layer R8. Eighth activation function layer R8: the activation function is the ReLU function; it outputs 640 weighted feature maps, which serve as the input of the sixth pooling layer P6. Sixth pooling layer P6: the pooling dimension is the spatial dimension, the pooling window size is 2×2 and the sliding step PS6 is 2; it outputs 640 weighted feature maps, which serve as the input of the flattening layer FL. Flattening layer FL: outputs one 640-dimensional column vector X_2^28, which serves as the input of the first fully connected layer FC1. First fully connected layer FC1: it has 128 neurons and outputs one 128-dimensional column vector X_2^29, which serves as the input of the second fully connected layer FC2. Second fully connected layer FC2: it has 3 neurons and outputs one 3-dimensional column vector X_2^30.
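A compact sketch of this shallow-to-deep path and the classification head follows; the use of max pooling, "same" padding and three output classes are assumptions, while the channel counts mirror the description above.

```python
import torch
import torch.nn as nn

def conv_bn_relu_pool(in_ch: int, out_ch: int) -> nn.Sequential:
    """CO -> B -> R -> P group used throughout the bottom-up path."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

class BottomUpFusionHead(nn.Module):
    """Shallow-to-deep fusion followed by the classification head.
    Channel flow: 32 -> (cat 56) 88 -> (cat 104) 192 -> (cat 176) 368
    -> (cat 272) 640 -> 640 -> flatten -> 128 -> num_classes."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.stages = nn.ModuleList([
            conv_bn_relu_pool(32, 32),      # CO7, B3, R3, P1
            conv_bn_relu_pool(88, 88),      # CO8, B4, R4, P2
            conv_bn_relu_pool(192, 192),    # CO9, B5, R5, P3
            conv_bn_relu_pool(368, 368),    # CO10, B6, R6, P4
        ])
        self.last = conv_bn_relu_pool(640, 640)   # CO11, B7, R7, P5
        self.post = nn.Sequential(                 # B8, R8, P6
            nn.BatchNorm2d(640), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.fc1 = nn.Linear(640, 128)             # FC1
        self.fc2 = nn.Linear(128, num_classes)     # FC2

    def forward(self, co6, co5, co4, co3, m4):
        """co6..co3: fused maps from the top-down path (32/56/104/176 channels),
        m4: output of the fourth dense connection and transition module (272)."""
        x = co6
        for stage, lateral in zip(self.stages, [co5, co4, co3, m4]):
            x = torch.cat([stage(x), lateral], dim=1)   # splice layers CA5..CA8
        x = self.post(self.last(x))
        x = torch.flatten(x, 1)                         # flattening layer FL
        return self.fc2(self.fc1(x))                    # class logits
```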
S2: and constructing a SimCLR comparison learning network framework taking the feature fusion network psi as a feature extraction network, wherein the SimCLR comparison learning network framework comprises a feature extraction network and a feature mapping network which are cascaded.
Specifically, referring to fig. 5, fig. 5 is a schematic structural diagram of a SimCLR contrast learning network framework according to an embodiment of the present invention. The feature extraction network in this embodiment is the feature fusion network ψ, and the number of output nodes of the last full-connection layer of the feature fusion network ψ is modified to 128.
The feature mapping network is a multi-layer perceptron E comprising two fully connected layers and an activation function layer, namely a third fully connected layer FC3, a ninth activation function layer R9 and a fourth fully connected layer FC4 cascaded in sequence, and the input of the third fully connected layer FC3 is connected with the output of the second fully connected layer FC2.
Specifically, the input of the multi-layer perceptron E is X_2^30, the output of the second fully connected layer FC2 of the second feature fusion module. The inputs, outputs and parameter settings of each layer are as follows:

Third fully connected layer FC3: it has 128 neurons and outputs one 128-dimensional column vector X_2^31, which serves as the input of the ninth activation function layer R9. Ninth activation function layer R9: the activation function is the ReLU function; it outputs one 128-dimensional column vector X_2^32, which serves as the input of the fourth fully connected layer FC4. Fourth fully connected layer FC4: it has 128 neurons and outputs one 128-dimensional column vector X_2^33.
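For reference, the feature mapping network E is simply a two-layer perceptron on the encoder's 128-dimensional output; a minimal sketch:

```python
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    """Feature mapping network E: FC3 (128 -> 128) -> ReLU (R9) -> FC4 (128 -> 128)."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(dim, dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.net(h)   # projected vector z used by the contrast loss

# usage: z = ProjectionHead()(torch.randn(8, 128))  # 8 projected feature vectors
```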
S3: and acquiring a plurality of groups of mat-format training data sets and a plurality of groups of JPG-format picture training data sets, and respectively inputting the plurality of groups of JPG-format picture training data sets into the SimCLR comparison learning network frame for training to acquire a plurality of groups of pre-training models.
In this embodiment, step S3 includes:
s3.1: a training dataset and a test dataset are obtained.
Specifically, the OpenSARShip data set is divided into a training set and a test set at a ratio of 8:2 to obtain 5 training sets {Φ1, Φ2, Φ3, Φ4, Φ5} and the corresponding 5 test sets {t1, t2, t3, t4, t5}; data expansion and cropping are performed on the training sets {Φ1, Φ2, Φ3, Φ4, Φ5} to obtain the mat-format training sets {Φ1, Φ2, Φ3, Φ4, Φ5}, the test sets {t1, t2, t3, t4, t5} are cropped to obtain the test sets {T1, T2, T3, T4, T5}, and the mat-format training sets {Φ1, Φ2, Φ3, Φ4, Φ5} are converted into the JPG-format picture training sets {Φ1', Φ2', Φ3', Φ4', Φ5'}.
The specific operations for dividing the data set and expanding the training data in this step are as follows:
(a) The training data and test data of each of the three classes in the OpenSARShip data set are randomly divided at a ratio of 8:2, after which the training data of the three classes are collected together as the training set Φ_i of the current division and the test data of the three classes are collected together as the test set t_i of the current division. Five such divisions yield the 5 training sets {Φ1, Φ2, Φ3, Φ4, Φ5} and the 5 test sets {t1, t2, t3, t4, t5}.
(b) Data expansion is performed on the 5 training sets {Φ1, Φ2, Φ3, Φ4, Φ5}. The specific expansion operations are: (1) flipping the ship pictures in the training set, including horizontal flipping and vertical flipping; (2) rotating the ship slices by 90°, 180° and 270°; (3) randomly horizontally shifting the ship slices by 0-5 pixels; (4) adding Gaussian noise with a mean of 0 and a variance of 0.001 to the ship pictures. After data expansion, the number of training samples of each class becomes 8 times the original number, i.e. the original samples plus 7 times as many expanded samples. The expanded sample count of the class with the fewest training samples is then taken as the per-class number of training samples, and the training data of each remaining class are formed by combining all of its original training data with a randomly selected part of its expanded training data, thereby ensuring that each class has the same number of training samples. The 64×64 region at the center of each ship picture is then taken as the final mat-format training set {Φ1, Φ2, Φ3, Φ4, Φ5}; likewise, the 64×64 region at the center of each ship picture in the test sets {t1, t2, t3, t4, t5} is taken as the final test set {T1, T2, T3, T4, T5};
(c) Each SAR image in each of the 5 training sets {Φ1, Φ2, Φ3, Φ4, Φ5} is converted into a JPG-format picture, yielding the 5 training sets {Φ1', Φ2', Φ3', Φ4', Φ5'}.
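An illustrative NumPy sketch of the per-image expansion and cropping in steps (b) and (c) is shown below; the circular shift used for the horizontal translation and the clipping of the noisy image to [0, 1] are simplifying assumptions.

```python
import numpy as np

def expand_ship_chip(img: np.ndarray, rng: np.random.Generator) -> list:
    """Return the original chip plus the 7 expanded versions described above."""
    out = [img, np.fliplr(img), np.flipud(img)]                    # original + flips
    out += [np.rot90(img, k) for k in (1, 2, 3)]                   # 90/180/270 degrees
    out.append(np.roll(img, rng.integers(0, 6), axis=1))           # 0-5 pixel shift (circular)
    noisy = img + rng.normal(0.0, np.sqrt(0.001), size=img.shape)  # mean 0, variance 0.001
    out.append(np.clip(noisy, 0.0, 1.0))                           # [0, 1] range assumed
    return out

def center_crop_64(img: np.ndarray) -> np.ndarray:
    """Take the 64x64 region at the center of the ship picture."""
    h, w = img.shape[:2]
    top, left = (h - 64) // 2, (w - 64) // 2
    return img[top:top + 64, left:left + 64]
```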
S3.2: will JPG format picture data set { phi ] 1 ',Φ 2 ',Φ 3 ',Φ 4 ',Φ 5 ' is input into a SimCLR contrast learning framework omega for pretraining, and a trained pretraining model { ψ ' is obtained ' 1 ,ψ' 2 ,ψ' 3 ,ψ' 4 ,ψ' 5 }。
That is, inputting each of the 5 JPG-format picture data sets into the SimCLR contrast learning framework Ω yields a corresponding pre-training model, and the loss function used is the contrast loss:

s_{i,j} = (z_i · z_j) / (||z_i|| ||z_j||),
l(i,j) = -log( exp(s_{i,j}/τ) / Σ_{k=1, k≠i}^{2N} exp(s_{i,k}/τ) ),
L = (1/(2N)) Σ_{k=1}^{N} [ l(2k-1, 2k) + l(2k, 2k-1) ],

wherein z_i denotes the feature vector obtained by feature extraction and feature mapping of the i-th training sample in the training data set, z_j denotes the feature vector obtained by feature extraction and feature mapping of the j-th training sample, and s_{i,j} denotes the similarity of the feature vectors of the i-th and j-th training samples. l(i,j) denotes the similarity after the function transformation; τ denotes the temperature coefficient, a hyperparameter set to 0.07; L denotes the average value of l(i,j) over all sample pairs obtained by data enhancement of each sample in a mini-batch containing N samples; and k is the summation variable, denoting the k-th sample among the N samples. During computation, the k-th sample among the N samples yields the (2k-1)-th and 2k-th samples among the 2N data-enhanced samples through two data enhancement operations, namely random cropping with scaling and Gaussian blur.
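The contrast loss above is what is commonly called the SimCLR NT-Xent loss; a compact PyTorch sketch consistent with those definitions is given below. It assumes the 2N projected vectors are ordered so that rows 2k and 2k+1 (0-based) are the two augmented views of the same sample.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """z: (2N, d) projected feature vectors. Returns the average contrast loss L."""
    z = F.normalize(z, dim=1)                          # so z_i . z_j is the cosine similarity s_{i,j}
    sim = z @ z.t() / temperature                      # s_{i,j} / tau
    n2 = z.size(0)
    mask = torch.eye(n2, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))         # exclude k = i from the denominator
    positives = torch.arange(n2, device=z.device) ^ 1  # positive pairs (0,1), (2,3), ...
    return F.cross_entropy(sim, positives)             # mean of -log softmax at the positive pair
```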
S4: and loading parameters of the multiple groups of pre-training models into the feature fusion network, and further training the feature fusion network by utilizing the multiple groups of mat-format training data sets to obtain a trained feature fusion network.
Specifically, the parameters of each pre-training model in {ψ1', ψ2', ψ3', ψ4', ψ5'} are loaded into the feature fusion network ψ, and the mat-format data sets {Φ1, Φ2, Φ3, Φ4, Φ5} corresponding to the JPG-format picture data sets {Φ1', Φ2', Φ3', Φ4', Φ5'} are input into the feature fusion network ψ for fine-tuning training to obtain a trained feature fusion network. Illustratively, the parameters of the pre-training model ψ1' are loaded into the feature fusion network ψ, the mat-format data set Φ1 corresponding to the JPG-format picture data set Φ1' is input into the feature fusion network ψ for fine-tuning training, and so on.
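A hedged sketch of the parameter-loading step is shown below; the checkpoint layout (a plain state_dict of the encoder) and the use of strict=False are assumptions, since only parameters whose names and shapes match the feature fusion network can be copied over.

```python
import torch

def load_pretrained_encoder(model: torch.nn.Module, ckpt_path: str) -> None:
    """Copy matching weights from a SimCLR pre-training checkpoint into the
    feature fusion network before fine-tuning."""
    state = torch.load(ckpt_path, map_location="cpu")
    own = model.state_dict()
    matched = {k: v for k, v in state.items()
               if k in own and v.shape == own[k].shape}
    model.load_state_dict(matched, strict=False)   # unmatched layers keep their initialization
    print(f"loaded {len(matched)} of {len(own)} parameter tensors")
```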
In this embodiment, 300 rounds of training are performed, and after each round of training, a test set is used to perform a test, and an average value of the accuracy of the last 10 rounds of test is taken to obtain a final classification result. In this step, the loss function used for training is as follows:
joint loss:
J(X, W, B) = λ1·J_s(X, W, B) + (1-λ1)·J_t(X, W, B) + λ2·J_f(X, W, B) + λ3·J_w(W, B),

wherein λ1 is the cross entropy loss weight, set to 0.6, and J_s(X, W, B) is the cross entropy loss function; X is the feature vector output by the network, and W and B are the network parameter weights and biases; 1-λ1 is the triplet loss weight, set to 0.4, and J_t(X, W, B) is the triplet loss function; λ2 is the Fisher discrimination regularization term weight, set to 0.005, and J_f(X, W, B) is the Fisher discrimination term; λ3 is the L2 regularization term weight, set to 0.0005, and J_w(W, B) is the weight decay regularization term. In the cross entropy loss, y_i^c is the probability truth value that the i-th sample belongs to class c, ŷ_i^c is the predicted probability that the i-th sample belongs to class c, and N is the number of samples in a batch of data. In the triplet loss, f_i^a, f_i^p and f_i^n are the feature vectors of the anchor sample, the positive sample and the negative sample in a triplet; d(f_i^a, f_i^p) denotes the Euclidean distance between the feature vector of the anchor sample and that of the positive sample, d(f_i^a, f_i^n) denotes the Euclidean distance between the feature vector of the anchor sample and that of the negative sample, α is the Euclidean distance threshold, set to 0.2, and [x]_+ takes the value x itself when x is positive and 0 otherwise. In the Fisher discrimination term, m1 and m2 are the mean Euclidean distances in the feature space of same-class sample pairs and of different-class sample pairs respectively, and σ1² and σ2² are the corresponding variances. J_w(W, B) applies the squared F-norm to the network parameter matrices and the squared 2-norm to the network parameter vectors.
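A sketch of how the four terms could be combined in PyTorch is given below. The cross entropy and triplet terms follow the definitions above; the exact form of the Fisher discrimination term is not spelled out in the text, so the ratio of within-class variances to the squared gap between the two mean distances used here is an assumption, as is the use of squared Euclidean distances.

```python
import torch
import torch.nn.functional as F

def joint_loss(logits, targets, anchor, positive, negative, model,
               lam1=0.6, lam2=0.005, lam3=0.0005, alpha=0.2, eps=1e-8):
    """J = lam1*J_s + (1 - lam1)*J_t + lam2*J_f + lam3*J_w, with the weights
    0.6 / 0.4 / 0.005 / 0.0005 given in the text."""
    j_s = F.cross_entropy(logits, targets)                        # cross entropy loss J_s
    d_pos = (anchor - positive).pow(2).sum(dim=1)                 # anchor-positive distance
    d_neg = (anchor - negative).pow(2).sum(dim=1)                 # anchor-negative distance
    j_t = F.relu(d_pos - d_neg + alpha).mean()                    # triplet loss J_t ([x]_+ hinge)
    m1, m2 = d_pos.mean(), d_neg.mean()                           # same-/different-class means
    j_f = (d_pos.var() + d_neg.var()) / ((m2 - m1).pow(2) + eps)  # Fisher term J_f (assumed form)
    j_w = sum(p.pow(2).sum() for p in model.parameters())         # weight decay term J_w
    return lam1 * j_s + (1 - lam1) * j_t + lam2 * j_f + lam3 * j_w
```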
The effectiveness of the SAR ship classification method based on contrast learning pre-training is demonstrated below through a comparison experiment. Referring to fig. 7, fig. 7 is a schematic structural diagram of the densely connected network based on a triplet network and the Fisher criterion used in the comparison experiment of the embodiment of the invention. This network, called TriDenseNet for short, is taken from the doctoral dissertation "Research on SAR Image Ship Target Detection and Classification Methods", Xidian University, He Jinglu, 2019. The comparison experiment trains the densely connected network TriDenseNet directly.
In the TriDenseNet direct training process, the experiments were carried out with the PyTorch framework on the Ubuntu 16.04 operating system, with torch version 1.6.0 and CUDA version 10.0.130. The learning rate, momentum parameter, cross-entropy loss weight and batch size were set to 0.1, 0.9, 0.6 and 100, respectively, and the learning rate was decayed to 0.1 times its previous value at the 150th, 200th and 250th training rounds. The first 10 rounds of network training used only the cross-entropy loss; from the 10th round onward, the cross-entropy loss was trained in combination with the triplet loss and the Fisher criterion, for 300 rounds in total.
In the experiment in which the method of the invention loads a pre-training model into the feature fusion network for fine-tuning: first, the contrast learning pre-training was implemented with the PyTorch framework on the Ubuntu 16.04 operating system, with torch version 1.6.0 and CUDA version 10.0.130. In contrast learning, the learning rate, temperature and momentum parameters were set to 0.0003, 0.07 and 0.0001, respectively, and 1000 rounds of training were performed to obtain the pre-training models. During fine-tuning, the learning rate, momentum parameter, cross-entropy loss weight and batch size were set to 0.1, 0.9, 0.6 and 100, respectively, and the learning rate was decayed to 0.1 times its previous value at the 150th, 200th and 250th training rounds. The first 10 rounds of network training used only the cross-entropy loss; from the 10th round onward, the cross-entropy loss was trained in combination with the triplet loss and the Fisher criterion, for 300 rounds in total.
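The learning-rate schedule and the staged activation of the loss terms described above can be expressed against the standard PyTorch scheduler API roughly as follows; the network, data loader and loss callables are placeholders passed in by the caller:

```python
import torch.optim as optim
from torch.optim.lr_scheduler import MultiStepLR

def finetune(net, train_loader, ce_loss, joint_loss_fn, epochs=300):
    optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9)
    # Decay the learning rate to 0.1x of its previous value at rounds 150, 200 and 250.
    scheduler = MultiStepLR(optimizer, milestones=[150, 200, 250], gamma=0.1)

    for epoch in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            outputs = net(images)
            # First 10 rounds: cross-entropy only; afterwards the joint loss
            # (cross entropy + triplet loss + Fisher criterion) is used.
            loss = ce_loss(outputs, labels) if epoch < 10 else joint_loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return net
```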
TABLE 1 Test accuracy on the three classes of SAR ship images obtained with TriDenseNet and with the method of the embodiment of the invention
As can be seen from fig. 8 and Table 1, both TriDenseNet trained directly and the improved network fine-tuned from the loaded pre-training model undergo 300 rounds of training and converge on each data set; for both methods, the training and test curves converge after 200 rounds. As can further be seen from Table 1, compared with TriDenseNet, the fine-tuning mode in which the improved network loads the pre-training model achieves an accuracy 0.64 percentage points higher and a standard deviation 0.29 percentage points lower, i.e. higher classification accuracy with lower variance. This shows that adding the feature fusion module to the convolutional neural network and fine-tuning from the loaded contrast-learning pre-training model can improve the classification accuracy on SAR image ship targets.
In the SAR ship classification method based on contrast learning pre-training provided by the invention, the pre-training model is obtained through unsupervised contrast learning realized with the simple SimCLR framework, so that the network acquires good feature extraction capability without label information; compared with direct training, loading the pre-training model for fine-tuning yields better network parameters, faster network convergence and higher classification accuracy. In addition, the invention combines shallow features and deep features by using a convolutional neural network with the bidirectional feature fusion structure of a path aggregation network, thereby improving the image feature extraction capability and the classification accuracy of the convolutional neural network.
In the several embodiments provided in the present invention, it should be understood that the apparatus and method disclosed in the present invention may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in hardware plus software functional modules.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims (8)

1. The SAR ship classification method based on contrast learning pre-training is characterized by comprising the following steps of:
S1: constructing a feature fusion network, wherein the feature fusion network comprises a feature extraction module, a first feature fusion module and a second feature fusion module which are sequentially connected, and the feature extraction module is used for carrying out preliminary feature extraction on an input SAR image to obtain shallow features and deep features of the SAR image; the first feature fusion module is used for realizing fusion from the deep features to the shallow features and obtaining the fused shallow features; the second feature fusion module is used for realizing continuous fusion from the fused shallow features to the deep features and outputting image category probabilities;
S2: constructing a SimCLR contrast learning network framework taking the feature fusion network as a feature extraction network, wherein the SimCLR contrast learning network framework comprises a feature extraction network and a feature mapping network which are cascaded;
S3: acquiring a plurality of groups of mat-format training data sets and a plurality of groups of JPG-format picture training data sets, and respectively inputting the plurality of groups of JPG-format picture training data sets into the SimCLR contrast learning network framework for training to acquire a plurality of groups of pre-training models;
S4: loading parameters of the multiple groups of pre-training models into the feature fusion network, and further training the feature fusion network by utilizing the multiple groups of mat-format training data sets to obtain a trained feature fusion network;
S5: inputting the original SAR image to be classified into the trained feature fusion network to obtain a classification result.
2. The method for classifying SAR ships based on contrast learning pre-training as set forth in claim 1, wherein the feature extraction module comprises, connected in sequence, a first convolution layer CO1, a first batch normalization layer B1, a first activation function layer R1, a second convolution layer CO2, a second batch normalization layer B2, a second activation function layer R2, a first dense connection and transition module M1, a second dense connection and transition module M2, a third dense connection and transition module M3 and a fourth dense connection and transition module M4.
3. The method for classifying SAR ships based on contrast learning pre-training according to claim 2, wherein the first dense connection and transition module M1, the second dense connection and transition module M2, the third dense connection and transition module M3 and the fourth dense connection and transition module M4 have the same structure, each comprising a cascaded dense connection unit and transition unit, wherein,
the dense connection unit comprises four subunits connected in series, each subunit comprises a batch normalization layer, an activation function layer, a convolution layer and a splicing layer, the input and the output of the former subunit are spliced to be used as the input of the latter subunit, and the input and the output of the last subunit are spliced to be used as the output of the dense connection unit;
the transition unit comprises a batch normalization layer, an activation function layer, a convolution layer and a pooling layer which are sequentially connected.
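For illustration only, the dense connection unit and transition unit recited above could be sketched in PyTorch as follows, assuming 3x3 convolutions in the subunits and a 1x1 convolution with 2x2 average pooling in the transition unit; the kernel sizes and channel counts are not specified in this claim and are assumptions:

```python
import torch
from torch import nn

class DenseConnectionUnit(nn.Module):
    """Four subunits; each concatenates its input with its output (dense connections)."""
    def __init__(self, in_channels, growth=32, num_subunits=4):
        super().__init__()
        self.subunits = nn.ModuleList()
        channels = in_channels
        for _ in range(num_subunits):
            self.subunits.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth, kernel_size=3, padding=1),
            ))
            channels += growth  # concatenation grows the channel count
        self.out_channels = channels

    def forward(self, x):
        for sub in self.subunits:
            x = torch.cat([x, sub(x)], dim=1)  # splice subunit input and output
        return x

class TransitionUnit(nn.Module):
    """Batch normalization, activation, convolution and pooling, connected in sequence."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.AvgPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.block(x)
```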
4. The SAR ship classification method based on contrast learning pre-training according to claim 2, wherein the first feature fusion module comprises a first deconvolution layer T1, a first splice layer CA1, a third convolution layer CO3, a second deconvolution layer T2, a second splice layer CA2, a fourth convolution layer CO4, a third deconvolution layer T3, a third splice layer CA3, a fifth convolution layer CO5, a fourth deconvolution layer T4, a fourth splice layer CA4 and a sixth convolution layer CO6, wherein,
the input of the first deconvolution layer T1 is connected with the output of the fourth dense connection and transition module M4; the inputs of the first splice layer CA1 are respectively connected with the output of the third dense connection and transition module M3 and the output of the first deconvolution layer T1, and the output of the first splice layer CA1 is connected with the input of the third convolution layer CO3; the input of the second deconvolution layer T2 is connected with the output of the third convolution layer CO3; the inputs of the second splice layer CA2 are respectively connected with the output of the second dense connection and transition module M2 and the output of the second deconvolution layer T2, and the output of the second splice layer CA2 is connected with the input of the fourth convolution layer CO4; the input of the third deconvolution layer T3 is connected with the output of the fourth convolution layer CO4; the inputs of the third splice layer CA3 are respectively connected with the output of the first dense connection and transition module M1 and the output of the third deconvolution layer T3, and the output of the third splice layer CA3 is connected with the input of the fifth convolution layer CO5; the input of the fourth deconvolution layer T4 is connected with the output of the fifth convolution layer CO5; the inputs of the fourth splice layer CA4 are respectively connected with the output of the second activation function layer R2 and the output of the fourth deconvolution layer T4, and the output of the fourth splice layer CA4 is connected with the input of the sixth convolution layer CO6.
5. The contrast learning pretrained SAR ship classification method according to claim 4, wherein the second feature fusion module comprises a seventh convolution layer CO7, a third batch normalization layer B3, a third activation function layer R3, a first pooling layer P1, a fifth splice layer CA5, an eighth convolution layer CO8, a fourth batch normalization layer B4, a fourth activation function layer R4, a second pooling layer P2, a sixth splice layer CA6, a ninth convolution layer CO9, a fifth batch normalization layer B5, a fifth activation function layer R5, a third pooling layer P3, a seventh splice layer CA7, a tenth convolution layer CO10, a sixth batch normalization layer B6, a sixth activation function layer R6, a fourth pooling layer P4, an eighth splice layer CA8, an eleventh convolution layer CO11, a seventh batch normalization layer B7, a seventh activation function layer R7, a fifth pooling layer P5, an eighth batch normalization layer B8, an eighth activation function layer R8, a sixth pooling layer P6, a flattening layer FL, a first full connection layer FC1 and a second full connection layer FC2, wherein,
the seventh convolution layer CO7, the third batch normalization layer B3, the third activation function layer R3 and the first pooling layer P1 are sequentially cascaded, and the input of the seventh convolution layer CO7 is connected with the output of the sixth convolution layer CO6; the output of the first pooling layer P1 and the output of the fifth convolution layer CO5 are connected with the input of the fifth splice layer CA5; the eighth convolution layer CO8, the fourth batch normalization layer B4, the fourth activation function layer R4 and the second pooling layer P2 are sequentially cascaded, and the input of the eighth convolution layer CO8 is connected with the output of the fifth splice layer CA5; the output of the second pooling layer P2 and the output of the fourth convolution layer CO4 are connected with the input of the sixth splice layer CA6;
the ninth convolution layer CO9, the fifth batch normalization layer B5, the fifth activation function layer R5 and the third pooling layer P3 are sequentially cascaded, and the input of the ninth convolution layer CO9 is connected with the output of the sixth splice layer CA6; the output of the third pooling layer P3 and the output of the third convolution layer CO3 are connected with the input of the seventh splice layer CA7; the tenth convolution layer CO10, the sixth batch normalization layer B6, the sixth activation function layer R6 and the fourth pooling layer P4 are sequentially cascaded, and the input of the tenth convolution layer CO10 is connected with the output of the seventh splice layer CA7; the output of the fourth pooling layer P4 and the output of the fourth dense connection and transition module M4 are connected with the input of the eighth splice layer CA8; the eleventh convolution layer CO11, the seventh batch normalization layer B7, the seventh activation function layer R7 and the fifth pooling layer P5 are sequentially connected, and the input of the eleventh convolution layer CO11 is connected with the output of the eighth splice layer CA8; the output of the fifth pooling layer P5 is sequentially connected with the eighth batch normalization layer B8, the eighth activation function layer R8, the sixth pooling layer P6, the flattening layer FL, the first full connection layer FC1 and the second full connection layer FC2.
6. The method for classifying SAR ships based on contrast learning pre-training as set forth in claim 5, wherein the feature mapping network is a multi-layer perceptron and comprises, cascaded in sequence, a third full connection layer FC3, a ninth activation function layer R9 and a fourth full connection layer FC4, and the input of the third full connection layer FC3 is connected with the output of the second full connection layer FC2.
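As an illustration of such a feature mapping network, a SimCLR-style projection head with this fully-connected / activation / fully-connected layout might look as follows; the feature dimensions are assumptions, not values from the filing:

```python
from torch import nn

# Illustrative projection head matching the FC3 -> R9 -> FC4 layout of claim 6.
projection_head = nn.Sequential(
    nn.Linear(512, 256),    # third full connection layer FC3 (input: backbone FC2 output)
    nn.ReLU(inplace=True),  # ninth activation function layer R9
    nn.Linear(256, 128),    # fourth full connection layer FC4
)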
7. The contrast learning pretrained SAR ship classification method according to claim 1, wherein S3 comprises:
S3.1: dividing the OpenSARShip data set into a training set and a test set in a ratio of 8:2, five times, to obtain 5 training sets {Φ1, Φ2, Φ3, Φ4, Φ5} and 5 corresponding test sets {t1, t2, t3, t4, t5}; performing data expansion and clipping on the training sets {Φ1, Φ2, Φ3, Φ4, Φ5} to obtain the expanded and clipped training sets {Φ1, Φ2, Φ3, Φ4, Φ5}, and clipping the test sets {t1, t2, t3, t4, t5} to obtain the test sets {T1, T2, T3, T4, T5}; and converting the mat-format training sets {Φ1, Φ2, Φ3, Φ4, Φ5} into JPG-format picture training sets {Φ1', Φ2', Φ3', Φ4', Φ5'};
S3.2: inputting the JPG-format picture data sets {Φ1', Φ2', Φ3', Φ4', Φ5'} into the SimCLR contrast learning framework for pre-training to obtain pre-training models {ψ1', ψ2', ψ3', ψ4', ψ5'}, wherein the loss function used is the contrast loss:
wherein z_i represents the feature vector obtained by feature extraction and feature mapping of the i-th training data in the training data set, z_j represents the feature vector obtained by feature extraction and feature mapping of the j-th training data in the training data set, s_(i,j) represents the similarity between the feature vectors of the i-th and j-th training data, l(i,j) represents the similarity after function transformation, τ represents a temperature coefficient, and L represents the average value, over all sample pairs obtained by data enhancement of each sample in data containing N samples, of l(i,j); in the calculation process, the k-th sample in the data containing N samples yields, through the two data-enhancement modes of random cropping with scaling and Gaussian blur, the (2k-1)-th and 2k-th samples in the data containing 2N samples after data enhancement.
8. The contrast learning pretrained SAR ship classification method according to claim 7, wherein S4 comprises:
loading the parameters of each pre-training model in the pre-training models {ψ1', ψ2', ψ3', ψ4', ψ5'} into the feature fusion network ψ, and inputting the mat-format data sets {Φ1, Φ2, Φ3, Φ4, Φ5} corresponding to the JPG-format picture data sets {Φ1', Φ2', Φ3', Φ4', Φ5'} into the feature fusion network ψ for fine-tuning training to obtain a trained feature fusion network.