CN116895016A - SAR image ship target generation and classification method - Google Patents

SAR image ship target generation and classification method

Info

Publication number
CN116895016A
CN116895016A
Authority
CN
China
Prior art keywords: image, loss, discriminator, network, generator
Prior art date
Legal status
Pending
Application number
CN202310771564.4A
Other languages
Chinese (zh)
Inventor
徐从安
高龙
苏航
张建廷
吴俊峰
闫文君
Current Assignee
Naval Aeronautical University
Original Assignee
Naval Aeronautical University
Priority date
Filing date
Publication date
Application filed by Naval Aeronautical University
Priority to CN202310771564.4A
Publication of CN116895016A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a SAR image ship target generation and classification method, which comprises: constructing training and test sets; constructing a generator that produces generated images with class labels; a discriminator that outputs triplet features, class prediction results, and authenticity prediction results through a feature extraction network and three multi-task learning sub-networks, with the corresponding losses used to update the network parameters of the discriminator and the generator; computing the class prediction loss of real images with hard labels and the class prediction loss of generated images with smooth labels; feeding the feature vectors learned by the feature extraction network in the discriminator into a fully connected layer and optimizing the triplet features with a triplet loss to learn class-related discriminative features; constructing overall loss functions for the generator and the discriminator from the constructed losses and alternately optimizing the generator and the discriminator with them; and, after training the constructed model, classifying input test images with the discriminator. The method reduces the cost of manual labeling, prevents the algorithm from overfitting, improves the robustness and generalization ability of the classifier, and increases the separability of different classes in the classification space.

Description

SAR image ship target generation and classification method
Technical Field
The invention relates to the technical field of synthetic aperture radar (SAR) image target generation and classification, and in particular to a SAR image ship target generation and classification method.
Background
Ship classification plays an important role in various maritime activities, such as national defense intelligence, fishery monitoring, and maritime search. SAR is gradually becoming a key instrument for ship monitoring thanks to its all-day, all-weather operating capability. With the launch of new-generation synthetic aperture radar satellites, large numbers of medium- and high-resolution SAR images are becoming easier to acquire, making it possible to identify vessel types.
Conventional SAR ship classification methods typically extract low-level features, such as geometric, scattering, and statistical features, by hand and then classify ships with conventional machine learning algorithms. However, these manually extracted features struggle to fully represent ship targets and perform poorly in complex scenes.
In recent years, deep learning-based SAR ship classification methods have attracted increasing attention and become an active research topic. For example, Li et al. (Li, J.; Qu, C.; Peng, S. Ship classification for unbalanced SAR dataset based on convolutional neural network. Journal of Applied Remote Sensing, 2018) devised a dense residual network with resampling and integrated cost-sensitive learning for the class-imbalance problem in ship classification. Dechesne et al. (Dechesne, C.; Lefèvre, S.; Vadaine, R.; Hajduch, G.; Fablet, R. Ship Identification and Characterization in Sentinel-1 SAR Images with Multi-Task Deep Learning. Remote Sens. 2019, 11, 1-18, doi:10.3390/rs11242997) designed a multi-task architecture to accomplish detection, classification, and length estimation simultaneously. Firoozy et al. (Firoozy, N.; Sandirasegaram, N. Tackling SAR Imagery Ship Classification Imbalance via Deep Convolutional Generative Adversarial Network. Can. J. Remote Sens. 2021, 47, 295-308, doi:10.1080/07038992.2021.1910499) generate samples for minority classes with adversarial training to balance the training dataset. He et al. (He, J.; Wang, Y.; Liu, H. Ship Classification in Medium-Resolution SAR Images via Densely Connected Triplet CNNs Integrating Fisher Discrimination Regularized Metric Learning. IEEE Trans. Geosci. Remote Sens. 2021, 59, 3022-3039) propose a densely connected triplet neural network with a Fisher regularization term for medium-resolution SAR ship classification. Zeng et al. (Zeng, L.; Zhu, Q.; Lu, D.; Zhang, T.; Wang, H.; Yin, J.; et al. Dual-Polarized SAR Ship Grained Classification Based on CNN With Hybrid Channel Feature Loss. IEEE Geosci. Remote Sens. Lett. 2022, 19, 16-20) propose jointly exploiting the information contained in the polarization channels (VV and VH) by designing a hybrid channel feature loss for dual-polarized fine-grained SAR ship classification. Zhang et al. (Zhang, Y.; Lei, Z.; Yu, H.; Zhuang, L. Imbalanced High-Resolution SAR Ship Recognition Method Based on a Lightweight CNN. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1-4, doi:10.1109/LGRS.2021.3083262) designed a lightweight CNN classification model combining deep metric learning with stepwise balanced sampling to reduce computational complexity and address class imbalance in high-resolution SAR images. In summary, deep learning-based methods can extract high-level features directly from large-scale data and be trained end to end, and they show great potential in practical applications. However, the currently available SAR ship classification data are relatively scarce, especially for offshore military targets such as destroyers and aircraft carriers, and manual labeling is costly. Because deep learning models have numerous parameters, models trained on small-scale datasets often suffer from overfitting, which reduces their generalization ability. Furthermore, unlike optical images, SAR images carry limited information owing to imperfections in the SAR imaging mechanism, so different types of vessels can differ little in appearance while vessels of the same type can differ greatly. This high intra-class diversity and inter-class similarity make it difficult for an algorithm to learn classification boundaries, further reducing classification performance.
To solve the above problems, the invention proposes LST-ACGAN for SAR ship classification: new samples are generated by introducing an ACGAN to expand the sample distribution; overfitting is prevented by assigning smooth labels to the generated samples, improving the generalization ability of the model; and the distinguishability of samples of different classes is improved by integrating the triplet loss into the ACGAN.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a SAR image ship target generation and classification method.
The technical scheme provided by the invention is as follows. Deep learning-based SAR image ship classification achieves significant performance with large-scale training samples; however, such methods have difficulty learning the data distribution and are prone to overfitting when training data are insufficient. To address this, the invention proposes a GAN-based algorithm, called LST-ACGAN, that generates images while performing classification. The overall LST-ACGAN model comprises a generator G and a discriminator D. The generator G can generate images with class labels, and the discriminator D can verify whether an input image is real while simultaneously classifying it. By assigning smooth labels to the images generated by the generator G and proposing a smooth-label cross entropy loss (SL-CE) as the classification loss of generated images, the model is regularized during training and model collapse is avoided. The triplet loss is integrated into the classification network of the discriminator D to learn class-related discriminative features. During training, triplets are constructed by sampling anchor, positive, and negative samples from the real images and are input into the discriminator D together with the generated images; the discriminator and generator are then jointly optimized with several loss functions. After training is completed, input test samples are classified with the discriminator.
The SAR image ship target generation and classification method is characterized by comprising the following steps of:
step 1, training test set construction:
collecting SAR ship target images and dividing training and test sample sets; then augmenting the training samples so that each class has the same number of training samples as the class with the most training samples;
step 2, generator G construction:
constructing a generator G based on a DCNN to generate images x_fake with class labels consistent with the training sample distribution;
step 3, designing a network structure of the discriminator D;
the input of the discriminator D is a data block in which the real images x_real and the generated images x_fake output by the generator G in step 2 are shuffled together; after passing through a feature extraction network and three multi-task learning sub-networks, the triplet features, class prediction results, and authenticity prediction results are output, and the corresponding losses are used to update the network parameters of the discriminator D and the generator G; the three multi-task learning sub-networks are the triplet feature learning sub-network, the class prediction sub-network, and the authenticity prediction sub-network;
step 4, classification sub-network smooth label construction and loss function construction:
the input image of the discriminator D constructed in step 3 is the shuffled data block of real images x_real and generated images x_fake; the two kinds of images are input into the class prediction sub-network; for the real image x_real, the class prediction loss of the real image x_real is computed with a hard-label cross entropy loss; for the generated image x_fake, the class prediction loss of the generated image is computed with smooth labels.
Step 5, triple loss construction:
inputting the feature vectors learned by the feature extraction network of the step-3 discriminator into a fully connected layer (FC) to obtain the output f_triplets of the triplet feature learning sub-network, and optimizing the triplet features with the triplet loss to learn class-related discriminative features.
Step 6, constructing a generator and a discriminator loss function:
based on the generator and discriminator networks constructed in steps 2 and 3 and the loss functions constructed in steps 4 and 5, the overall loss functions of the generator G and the discriminator D are constructed, and the generator and discriminator are alternately optimized with the constructed overall loss functions.
Step 7, model training and testing:
training the model constructed in the steps 2 to 6, and classifying the input test sample by using a discriminator after the training is finished.
Further, in the step 2 generator G configuration described above:
generating an image x_fake consistent with the training image distribution from the input vector z'; the input vector z' is obtained from equation (1), in which embedding(y_fake) denotes the length-m vector converted from the class label y_fake; the conversion proceeds as follows: assuming the classification task contains C classes {1, ..., C}, a random matrix M of dimension C × m is first randomly generated; a class label y_fake ∈ {1, ..., C} is then randomly generated and converted into a one-hot label y_oh of dimension 1 × C; computing y_oh × M yields the length-m vector embedding(y_fake); this vector embedding(y_fake) is multiplied element-wise with a randomly generated noise vector z of length m to finally obtain the generator input vector z':

z' = embedding(y_fake) ⊙ z    (1)

where ⊙ denotes element-wise multiplication;
then the generator G takes the input vector z' as input and outputs a generated image x_fake; the generator G is constructed based on a DCNN and comprises one conversion module, two UC modules, and one generation module; for the input vector z', the conversion module converts z' from a one-dimensional vector into a three-dimensional feature map through a fully connected layer (FC), a Reshape operation, and batch normalization (BN), and then feeds it into the UC modules; each UC module first upsamples the three-dimensional feature map output by the previous module by a factor of 2 through an Upsampling layer, and then performs feature learning through a convolution layer (Conv), a batch normalization layer (BN), and an activation layer (LeakyReLU); in the convolution layer Conv, c is the number of convolution kernels, k is the kernel size, s is the stride, and p is the padding parameter; β is the slope coefficient of the LeakyReLU; finally, the generation module reduces the three-dimensional feature map output by the UC2 module through a convolution layer Conv and constrains its values to (-1, 1) through a Tanh function;
further, in the step 3, the arbiter D network structure design:
adopting a pretrained ResNet18 with the FC layer removed as the feature extraction network; b input images are fed into the feature extraction network to obtain b feature maps, which are converted into b feature vectors through a global average pooling layer (GAP) and a Reshape layer and then normalized with L2 normalization, finally yielding b normalized feature vectors; these feature vectors are then fed into the three multi-task learning sub-networks respectively;
the class prediction sub-network obtains the class prediction value of the image input to the discriminator D through a softmax function:

q_i = exp(p_i) / Σ_{j=1}^{C} exp(p_j)    (2)

where C denotes the number of classes, p_i, p_j (i, j = 1, 2, ..., C) denote the components of the classification vector input to the softmax, and q_i denotes the probability that the input image belongs to the i-th class, i.e., the class prediction value;
the authenticity prediction sub-network maps its output to (0, 1) with a sigmoid function to obtain the authenticity prediction value of the input image:

p(v) = 1 / (1 + exp(-v))    (3)

where v denotes the two-class output and p(v) denotes the probability that the input image belongs to the real class, i.e., the authenticity prediction value.
Further, in the step 4, the classifying sub-network smooth label construction and the loss function construction are as follows:
inputting the two kinds of images into the class prediction sub-network to obtain the class prediction value q_i; the losses are then calculated separately:
for the real image x_real, the loss is computed directly with a hard label, which can be expressed as:

y = [y_1, y_2, ..., y_C], y_i ∈ {0, 1}    (4)

where y_i (i = 1, 2, ..., C) denotes a one-hot encoded vector, with y_i = 1 when the input real image x_real belongs to the i-th class; the class prediction sub-network outputs the probability q_i that the real image x_real belongs to each class through the softmax function (see equation (2)), and the class prediction loss l_CE of the real image x_real is then computed with the cross entropy loss:

l_CE = -Σ_{i=1}^{C} y_i log(q_i)    (5)
For the generated image x_fake, a smooth label ŷ is computed and assigned to the generated image x_fake output by the generator G to compute the class prediction loss of the generated image; the computed smooth label ŷ is a vector of length C, ŷ = [ŷ_1, ŷ_2, ..., ŷ_C], where ŷ_i denotes the smooth label of the generated image x_fake at class i, meaning the probability that the image belongs to class i, and can be computed with equation (6):

ŷ_i = 1 - ε if i is the labeled class, ŷ_i = ε / (C - 1) otherwise    (6)

where ε is a hyperparameter and C denotes the number of classes; for the generated image x_fake, the class prediction value q_i is obtained from equation (2), and the class prediction loss l_SL-CE of the generated image can be expressed as:

l_SL-CE = -Σ_{i=1}^{C} ŷ_i log(q_i)    (7)
further, in the step 5 triplet loss construction:
let x_a be a randomly sampled real image, x_p a real image of the same class as x_a, and x_n a real image of a different class from x_a, and let f(x) be the feature vector obtained after a real image passes through the feature extraction network and the triplet feature learning sub-network; input images whose features f(x) do not satisfy the following distance constraint are selected to construct triplet samples <x_a, x_p, x_n>:

d(f(x_a), f(x_p)) + α < d(f(x_a), f(x_n))    (8)

where d(·) denotes the Euclidean distance and α is the classification margin;
for samples meeting the constraint of equation (8), the loss value is 0, and for triplet samples that do not meet equation (8), the loss value is x a And x p And x n The difference in distance of the feature vectors learned by f (x), i.e. the triplet loss l triplets
l triplets =d(f(x a ),f(x p ))-d(f(x a ),f(x n )) (9)
Further, in the step 6 generator and the construction of the arbiter loss function, the following steps are:
first, for the authenticity prediction sub-network in the discriminator D, the authenticity prediction loss l_BCE is computed with a two-class cross entropy loss function to distinguish the real image x_real from the generated image x_fake, and can be expressed as:

l_BCE = -(y_v · log(p(v)) + (1 - y_v) · log(1 - p(v)))    (10)

where y_v denotes the real/fake two-class label, with y_v = 0 when the input image is a real image and y_v = 1 when the input image is a generated image; p(v) is the authenticity prediction value of the input image computed with equation (3);
then the overall loss function L_G of the generator G is constructed; it consists of the authenticity prediction loss l_BCE and the class prediction loss l_SL-CE of the generated image:

L_G = λ_1 · l_BCE(x_fake, y_v) + λ_2 · l_SL-CE(x_fake, ŷ)    (11)

where ŷ denotes the smooth label of the generated image computed according to equation (6), and λ_1, λ_2 denote trade-off parameters;
finally, the overall loss function of the discriminator D is constructed; it is optimized with the authenticity prediction loss l_BCE of equation (10), the class prediction loss l_SL-CE of the generated image in equation (7), the class prediction loss l_CE of the real image in equation (5), and the triplet loss l_triplets of equation (9); the overall loss function L_D of the discriminator D is:

L_D = λ_3 · l_BCE(x_real, y_v) + λ_4 · l_BCE(x_fake, y_v) + λ_5 · l_CE(x_real, y_real) + λ_6 · l_SL-CE(x_fake, ŷ) + λ_7 · l_triplets(x_a, x_p, x_n)    (12)

where x_real and x_fake denote the real image and the generated image respectively, y_real denotes the hard label of the real image, ŷ denotes the smooth label of the generated image, y_v is the real/fake two-class label of equation (10), <x_a, x_p, x_n> are the triplet samples constructed with equation (8), and λ_i (i = 3, 4, ..., 7) denote trade-off parameters.
Further, in the step 7 model training and testing, the following steps are:
using Adam as an optimizer, the model constructed by steps 2 to 6 is trained as follows:
(1) b vectors z' are randomly generated according to equation (1) and input into the generator G to obtain b generated images;
(2) the b generated images are mixed with b real images and input into the discriminator D; each loss is computed according to equation (12) and the weights of the discriminator D are updated;
(3) the generator loss is computed according to equation (11) and the weights of the generator G are updated;
(4) processes (1)-(3) are repeated until the set number of training iterations is reached.
After training, the discriminator D in the model designed in steps 2-6 is retained; a test sample is input into the discriminator D, and the prediction result output by the class prediction sub-network gives the sample class.
The beneficial effects of the invention are as follows. Through steps 2 and 3, an ACGAN model is constructed to generate diverse ship target images, which reduces the cost of manual labeling and increases the diversity of the data distribution from the model's perspective, thereby improving the robustness and generalization ability of the classifier.
Through step 4, soft labels are constructed for the samples generated by the ACGAN model; training the model with soft labels prevents the algorithm from overfitting and improves its generalization ability.
Through step 5, class-discriminative features are learned better, which increases the intra-class similarity and inter-class difference of the semantic features learned by the model and improves the separability of different classes in the classification space.
Drawings
FIG. 1 is a general frame diagram of the present invention;
FIG. 2 is a network architecture diagram of the discriminator of the present invention;
FIG. 3 is a schematic representation of the effect of triplet loss in accordance with the present invention;
FIG. 4 shows the confusion matrices output by the designed model and the compared models on the three-class task: (a) ACGAN; (b) ResNet18; (c) DenseNet121; (d) Xception; (e) LST-ACGAN;
FIG. 5 shows the confusion matrices output by the designed model and the compared models on the five-class task: (a) ACGAN; (b) ResNet18; (c) DenseNet121; (d) Xception; (e) LST-ACGAN.
Detailed description of specific embodiments: specific embodiments of the invention are described in detail below with reference to the accompanying drawings:
as shown in fig. 1 to 5, the method for generating and classifying the ship targets of the SAR image specifically comprises the following steps:
and 1, constructing a training test set.
SAR ship target images are collected and uniformly resized to 1×64×64 using bicubic interpolation, and training and test sample sets are divided. The training samples are then augmented by translation and rotation, with more augmentation applied to classes with fewer samples, so that each class ends up with the same number of training samples, i.e., the same number as the class with the most training samples.
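A minimal sketch of this preprocessing follows (Python/NumPy assumed; the chips are taken to be single-channel arrays grouped by class, and the function names and the ±4-pixel translation range are illustrative choices, not taken from the patent):

```python
import numpy as np
from PIL import Image

def resize_chip(chip, size=64):
    """Resize a single-channel SAR chip to size x size with bicubic interpolation."""
    return np.asarray(Image.fromarray(chip).resize((size, size), Image.BICUBIC))

def augment(chip, rng):
    """Random translation (circular shift) and rotation by a multiple of 90 degrees."""
    dy, dx = rng.integers(-4, 5, size=2)
    shifted = np.roll(chip, (int(dy), int(dx)), axis=(0, 1))
    return np.rot90(shifted, k=int(rng.integers(0, 4)))

def balance_classes(train_sets, rng=None):
    """Oversample each class with augmented copies until all classes match the largest one."""
    rng = rng or np.random.default_rng(0)
    target = max(len(chips) for chips in train_sets.values())
    for chips in train_sets.values():
        while len(chips) < target:
            chips.append(augment(chips[int(rng.integers(len(chips)))], rng))
    return train_sets
```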
Step 2, generator G construction.
The constructed generator G is used to generate images x_fake consistent with the training image distribution from an input vector z'. The input vector z' is obtained from equation (1). embedding(y_fake) denotes the length-m vector converted from the class label y_fake; the conversion proceeds as follows: assuming the classification task contains C classes {1, ..., C}, a random matrix M of dimension C × m is first randomly generated; a class label y_fake ∈ {1, ..., C} is then randomly generated and converted into a one-hot label y_oh of dimension 1 × C; computing y_oh × M yields the length-m vector embedding(y_fake). This vector embedding(y_fake) is multiplied element-wise with a randomly generated noise vector z of length m to obtain the generator input vector z':

z' = embedding(y_fake) ⊙ z    (1)

where ⊙ denotes element-wise multiplication. In this way, the class label represented by y_fake is deeply fused with the noise vector z.
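The label-noise fusion of equation (1) can be sketched as follows (PyTorch assumed; the class count C and length m are example values):

```python
import torch

C, m = 5, 128                      # class count and embedding/noise length (example values)
M = torch.randn(C, m)              # random embedding matrix, generated once

def make_generator_input(batch_size):
    """Build z' = embedding(y_fake) ⊙ z per equation (1) and return it with the labels."""
    y_fake = torch.randint(0, C, (batch_size,))              # random class labels
    y_oh = torch.nn.functional.one_hot(y_fake, C).float()    # 1 x C one-hot rows
    emb = y_oh @ M                                           # y_oh x M -> length-m embeddings
    z = torch.randn(batch_size, m)                           # noise vectors z
    return emb * z, y_fake                                   # element-wise product z'
```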
Then the generator G takes the input vector z' as input and outputs a generated image x_fake of size 1×64×64. The generator G is constructed based on a DCNN (Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks); its specific structure is shown in Table 1 and comprises one conversion module, two UC modules, and one generation module. For the input vector z', the conversion module converts z' from a one-dimensional vector of length 128 into a three-dimensional feature map of size 128×16×16 through a fully connected layer (FC), a Reshape operation, and batch normalization (BN), and then feeds it into the UC modules. Each UC module first upsamples the feature map output by the previous module by a factor of 2 through an Upsampling layer, and then performs feature learning through a convolution layer (Conv), a batch normalization layer (BN), and an activation layer (Leaky ReLU). In the convolution layer Conv, c is the number of convolution kernels, k is the kernel size, s is the stride, and p is the padding parameter; the slope coefficient β of the Leaky ReLU is 0.2. Finally, the generation module reduces the 64×64×64 feature map output by the UC2 module to 1×64×64 through a 3×3 convolution layer Conv with one output channel, and constrains the values of the feature map to (-1, 1) through a Tanh function.
Table 1. Generator network structure
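Since Table 1 is not reproduced in this text, the following PyTorch sketch reconstructs the generator from the prose above; the UC-module channel counts (128 and 64) are assumptions chosen to be consistent with the stated 128×16×16 conversion output and 64×64×64 UC2 output:

```python
import torch.nn as nn

def uc_block(c_in, c_out):
    """UC module: 2x upsampling, 3x3 convolution, batch norm, LeakyReLU with slope 0.2."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2),
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2),
    )

class Generator(nn.Module):
    """Conversion module, two UC modules, and a generation module, as described above."""
    def __init__(self, m=128):
        super().__init__()
        self.fc = nn.Linear(m, 128 * 16 * 16)   # conversion: FC + Reshape + BN
        self.bn = nn.BatchNorm2d(128)
        self.uc1 = uc_block(128, 128)           # 128x16x16 -> 128x32x32
        self.uc2 = uc_block(128, 64)            # 128x32x32 -> 64x64x64
        self.gen = nn.Sequential(               # generation: 3x3 conv to one channel
            nn.Conv2d(64, 1, kernel_size=3, stride=1, padding=1),
            nn.Tanh(),                          # constrain values to (-1, 1)
        )

    def forward(self, z_prime):
        x = self.bn(self.fc(z_prime).view(-1, 128, 16, 16))
        return self.gen(self.uc2(self.uc1(x)))  # 1x64x64 generated image x_fake
```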
Step 3, discriminator D network structure design.
The input to the discriminator D is a data block in which the real images x_real and the generated images x_fake output by the generator G in step 2 are shuffled together. After passing through a feature extraction network and three multi-task learning sub-networks (the triplet feature learning sub-network, the class prediction sub-network, and the authenticity prediction sub-network), the outputs are the triplet features, the class prediction results, and the authenticity prediction results, and the corresponding losses are used to update the network parameters of the discriminator D and the generator G. The discriminator D adopts a multi-task learning structure to learn the triplet features, class prediction results, and authenticity prediction results; its network architecture is shown in fig. 2, where b is the batch size.
The discriminator D mainly comprises the feature extraction network and the three multi-task learning sub-networks. The invention adopts a pretrained ResNet18 with the FC layer removed as the feature extraction network. The b input images of size 1×64×64 (real images x_real and generated images x_fake in shuffled order during actual training) are fed into the feature extraction network to obtain b feature maps of size 512×4×4, which are converted into b feature vectors of length 512 through a global average pooling layer (GAP) and a Reshape layer and then normalized with L2 normalization to stabilize the learning process, finally yielding b feature vectors of length 512. These feature vectors are then fed into the three multi-task learning sub-networks respectively.
Specifically, the class prediction sub-network obtains the class prediction value of the image input to the discriminator D through a softmax function:

q_i = exp(p_i) / Σ_{j=1}^{C} exp(p_j)    (2)

where C denotes the number of classes, p_i, p_j (i, j = 1, 2, ..., C) denote the components of the classification vector input to the softmax, and q_i denotes the probability that the input image belongs to the i-th class, i.e., the class prediction value.
The authenticity prediction sub-network maps its output to (0, 1) with a sigmoid function to obtain the authenticity prediction value of the input image:

p(v) = 1 / (1 + exp(-v))    (3)

where v denotes the two-class output and p(v) denotes the probability that the input image belongs to the real class, i.e., the authenticity prediction value.
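A sketch of the discriminator under the above description (PyTorch/torchvision assumed); replacing the first convolution so the pretrained ResNet18 accepts single-channel SAR input is an adaptation assumed by this sketch, not specified in the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class Discriminator(nn.Module):
    """ResNet18 backbone (FC removed) with three multi-task heads."""
    def __init__(self, num_classes=5, triplet_dim=128):
        super().__init__()
        backbone = resnet18(weights="IMAGENET1K_V1")
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # ends with GAP
        self.cls_head = nn.Linear(512, num_classes)  # class prediction sub-network
        self.adv_head = nn.Linear(512, 1)            # authenticity prediction sub-network
        self.tri_head = nn.Linear(512, triplet_dim)  # triplet feature learning sub-network

    def forward(self, x):
        f = self.features(x).flatten(1)          # b feature vectors of length 512
        f = F.normalize(f, p=2, dim=1)           # L2 normalization
        logits = self.cls_head(f)                # softmax of equation (2) applied in the loss
        p_v = torch.sigmoid(self.adv_head(f))    # p(v) in (0, 1), equation (3)
        return logits, p_v, self.tri_head(f)     # f_triplets
```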
Step 4, classification sub-network smooth label construction and loss function construction.
The input image of the discriminator D constructed in step 3 is the shuffled data block of real images x_real and generated images x_fake. The two kinds of images are input into the class prediction sub-network to obtain the class prediction value q_i, and the losses are then calculated separately in two ways:
For the real image x_real, the loss can be computed directly with a hard label, which can be expressed as:

y = [y_1, y_2, ..., y_C], y_i ∈ {0, 1}    (4)

where y_i (i = 1, 2, ..., C) denotes a one-hot encoded vector, with y_i = 1 when the input real image x_real belongs to the i-th class. The class prediction sub-network outputs the probability q_i that the real image x_real belongs to each class through the softmax function (see equation (2)), and the class prediction loss l_CE of the real image x_real is then computed with the cross entropy (CE) loss:

l_CE = -Σ_{i=1}^{C} y_i log(q_i)    (5)
For the generated image x_fake, a smooth label method is designed to compute the classification loss. Specifically, the quality of GAN-generated images is not perfect, especially for an ACGAN conditioned on class labels: when training data are insufficient, the ACGAN struggles to generate images with accurate class labels. Hard labels with the CE loss, however, encourage the model to output predictions equal to 1 or 0 for the target and non-target classes respectively. Training on such labels pushes the target-class logit p_i toward infinity, making the model over-confident in its predictions. Thus, when training samples are insufficient, an ACGAN trained on generated images with hard labels is prone to model collapse during training. To address this, a smooth label ŷ, rather than a hard label, is computed and assigned to the generated image x_fake output by the generator G to compute the class prediction loss of the generated image. The computed smooth label ŷ is a vector of length C, ŷ = [ŷ_1, ŷ_2, ..., ŷ_C], where ŷ_i denotes the smooth label of the generated image x_fake at class i, meaning the probability that the image belongs to class i, and can be computed with equation (6):

ŷ_i = 1 - ε if i is the labeled class, ŷ_i = ε / (C - 1) otherwise    (6)

where ε is a hyperparameter and C denotes the number of classes. For the generated image x_fake, the class prediction value q_i is obtained from equation (2), and the class prediction loss l_SL-CE of the generated image can be expressed as:

l_SL-CE = -Σ_{i=1}^{C} ŷ_i log(q_i)    (7)
in this way, by adding noise to the generated image, excessive confidence of the model in classification results is avoided, and prediction differences between target classes and non-target classes are properly reduced, so that regularization is performed on the model, and collapse of the model in the training process is reduced.
Step 5, triplet loss construction
The feature vectors learned by the feature extraction network of the step-3 discriminator are input into a fully connected layer (FC) of dimension 128 to obtain the output f_triplets of the triplet feature learning sub-network. To better learn class-related discriminative features and address the problem of high intra-class diversity and inter-class similarity, the invention optimizes the triplet features with the triplet loss. The triplet loss learns discriminative features by pulling intra-class samples together and pushing inter-class samples apart (its schematic is shown in fig. 3), which conflicts with the label smoothing applied to generated images; therefore, the triplet loss is used only for real images x_real. Specifically, let x_a be a randomly sampled real image, x_p a real image of the same class as x_a, and x_n a real image of a different class from x_a, and let f(x) be the feature vector obtained after a real image passes through the feature extraction network and the triplet feature learning sub-network; input images whose features f(x) do not satisfy the following distance constraint are selected to construct triplet samples <x_a, x_p, x_n>:

d(f(x_a), f(x_p)) + α < d(f(x_a), f(x_n))    (8)

where d(·) denotes the Euclidean distance and α is the classification margin, set to 0.2 in this embodiment.
For samples satisfying the constraint of equation (8), the loss value is 0; for triplet samples that do not satisfy equation (8), the loss value is the difference between the distances from x_a to x_p and from x_a to x_n in the feature space learned by f(x), i.e., the triplet loss l_triplets:

l_triplets = d(f(x_a), f(x_p)) - d(f(x_a), f(x_n))    (9)
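A sketch of equations (8) and (9) over batches of triplet features; as stated above, only triplets violating the margin constraint contribute, so the contribution of satisfied triplets is clamped to zero:

```python
import torch

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Equations (8)-(9): loss is d(a,p) - d(a,n) for triplets violating d(a,p) + alpha < d(a,n)."""
    d_ap = torch.norm(f_a - f_p, dim=1)      # Euclidean anchor-positive distances
    d_an = torch.norm(f_a - f_n, dim=1)      # Euclidean anchor-negative distances
    violating = d_ap + alpha >= d_an         # triplets that fail the constraint of equation (8)
    return torch.where(violating, d_ap - d_an, torch.zeros_like(d_ap)).mean()
```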
Step 6, generator and discriminator loss function construction.
Based on the generator and discriminator networks constructed in steps 2 and 3 and the loss functions constructed in steps 4 and 5, the overall loss functions of the generator G and the discriminator D are constructed as follows, and the generator and discriminator are alternately optimized with the constructed overall loss functions.
First, for the authenticity prediction sub-network in the discriminator D, the authenticity prediction loss l_BCE is computed with a two-class cross entropy loss function to distinguish the real image x_real from the generated image x_fake, and can be expressed as:

l_BCE = -(y_v · log(p(v)) + (1 - y_v) · log(1 - p(v)))    (10)

where y_v denotes the real/fake two-class label, with y_v = 0 when the input image is a real image and y_v = 1 when the input image is a generated image; p(v) is the authenticity prediction value of the input image computed with equation (3).
Then the overall loss function L_G of the generator G is constructed. It consists of the authenticity prediction loss l_BCE of equation (10) and the class prediction loss l_SL-CE of the generated image in equation (7):

L_G = λ_1 · l_BCE(x_fake, y_v) + λ_2 · l_SL-CE(x_fake, ŷ)    (11)

where ŷ denotes the smooth label of the generated image computed according to equation (6), y_v is the real/fake two-class label of equation (10), and λ_1, λ_2 denote trade-off parameters, both set to 1.
Finally, the overall loss function of the discriminator D is constructed. The discriminator D aims to distinguish real images from generated images and to learn the class-discriminative features of real images; its network parameters are optimized with the authenticity prediction loss l_BCE computed by equation (10), the class prediction loss l_SL-CE of the generated image in equation (7), the class prediction loss l_CE of the real image in equation (5), and the triplet loss l_triplets of equation (9). The overall loss function L_D of the discriminator D is:

L_D = λ_3 · l_BCE(x_real, y_v) + λ_4 · l_BCE(x_fake, y_v) + λ_5 · l_CE(x_real, y_real) + λ_6 · l_SL-CE(x_fake, ŷ) + λ_7 · l_triplets(x_a, x_p, x_n)    (12)

where x_real and x_fake denote the real image and the generated image respectively, y_real denotes the hard label of the real image, ŷ denotes the smooth label of the generated image, y_v is the real/fake two-class label of equation (10), <x_a, x_p, x_n> are the triplet samples constructed with equation (8), and λ_i (i = 3, 4, ..., 7) denote trade-off parameters, all set to 1.
Step 7, model training and testing
Adam is used as the optimizer, the learning rate is set to 0.0002, the batch size b is set to 128, and the number of training epochs is set to 200. The model constructed in steps 2 to 6 is trained as follows.
(1) b vectors z' are randomly generated according to equation (1) and input into the generator G to obtain b generated images;
(2) the b generated images are mixed with b real images and input into the discriminator D; each loss is computed according to equation (12) and the weights of the discriminator D are updated;
(3) the generator loss is computed according to equation (11) and the weights of the generator G are updated;
(4) processes (1)-(3) are repeated until the set number of training iterations is reached.
After training, the discriminator D in the model designed in steps 2-6 is retained; a test sample is input into the discriminator D, and the prediction result output by the class prediction sub-network gives the sample class.
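A condensed sketch of the training loop, reusing the helpers sketched earlier (make_generator_input, smooth_labels, sl_ce_loss); for brevity the triplet term of equation (12) is omitted, real and generated batches are passed through D separately rather than shuffled together, and all trade-off parameters are 1 as stated:

```python
import torch

def train(G, D, real_loader, num_classes=5, epochs=200, lr=2e-4):
    """Alternating optimization of steps (1)-(4) with Adam, per equations (11) and (12)."""
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    bce = torch.nn.BCELoss()
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x_real, y_real in real_loader:
            b = x_real.size(0)
            z_prime, y_fake = make_generator_input(b)        # step (1), equation (1)
            x_fake = G(z_prime)
            # step (2): discriminator update (equation (12), triplet term omitted)
            logits_r, p_r, _ = D(x_real)
            logits_f, p_f, _ = D(x_fake.detach())
            loss_d = (bce(p_r.squeeze(1), torch.zeros(b))    # y_v = 0 for real images
                      + bce(p_f.squeeze(1), torch.ones(b))   # y_v = 1 for generated images
                      + ce(logits_r, y_real)                 # l_CE with hard labels
                      + sl_ce_loss(logits_f, smooth_labels(y_fake, num_classes)))
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()
            # step (3): generator update (equation (11)); G targets the "real" label y_v = 0
            logits_f, p_f, _ = D(x_fake)
            loss_g = (bce(p_f.squeeze(1), torch.zeros(b))
                      + sl_ce_loss(logits_f, smooth_labels(y_fake, num_classes)))
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```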
To demonstrate the effectiveness of the method, the algorithm is verified through experiments. The dataset used is OpenSARShip, which contains five types of vessels in total; their numbers are shown in Table 2.
Table 2. Numbers of the different types of vessels
Based on the above vessel types, two sub-datasets are reconstructed as shown in Table 3. First, 100 samples of each ship type are randomly selected as the standard test set T. Then, based on the class with the fewest samples, a class-balanced three-class training set D1 and a class-balanced five-class training set D2 are reconstructed, thereby forming two classification tasks, D1 and D2.
Table 3. Numbers of training and test samples for the two constructed classification tasks (D1 and D2)
To ensure fair comparison, the input image size is set to 1×64×64 pixels in all experiments. Adam is used as the optimizer with a learning rate of 0.0002. The training batch size is set to 128 and the number of training epochs to 200. The trade-off parameters λ_i (i = 1, 2, ..., 7) are set to 1.
To verify the effectiveness of label smoothing, ε is set to 0.1, 0.2, 0.3, 0.4, 0.5, and 0.6 in turn, and classification experiments are performed on D1; the results are shown in Table 4. The algorithm performs relatively stably under different label smoothing parameters, reaching an optimal accuracy of 88.33% at ε = 0.4. Therefore, ε is fixed at 0.4 in all subsequent experiments.
Table 4. Accuracy with different label smoothing parameters on D1.
To verify the effect of the feature dimension m of z' in equation (1), m is set to 64, 128, 256, and 512 respectively. As Table 5 shows, accuracy decreases when the feature dimension is too small or too large. This may be because features of smaller dimension have weaker representation capability, while a model with an oversized dimension may overfit the limited number of samples. Therefore, the feature dimension is fixed at 128 in the following experiments.
Table 5. Accuracy with different feature dimensions on the D1 task.
To further demonstrate the effectiveness of the label smoothing method of equation (6) and the triplet loss of equation (9), an ablation experiment tests the algorithm under different combinations of loss functions; the results are shown in Table 6. Here, "CE loss" indicates that hard labels are used for the generated images, i.e., ε = 0, in which case the smooth label of equation (6) degenerates into a hard label; "SL-CE loss" indicates that the designed label smoothing method is used for the generated images; and "Triplet loss" indicates that the triplet loss is used; following the result in Table 4, ε is set to 0.4. Embedding the SL-CE loss and the Triplet loss yields better results than the model optimized with the CE loss, and the "SL-CE loss + Triplet loss" combination achieves 88.33%, outperforming the baseline method (75.33%). Therefore, integrating the label smoothing method and the triplet loss is critical to improving ship classification performance.
Table 6. Accuracy with different combinations of loss functions for the three-class task on D1.
To demonstrate the effectiveness of the algorithm, the proposed algorithm is compared with existing classification algorithms, including ACGAN, ResNet18, DenseNet121, and Xception.
Table 7. Accuracy of different algorithms on the three-class task on D1
The three-class results of the different algorithms are shown in Table 7. The proposed LST-ACGAN achieves the best accuracy of 88.33%, which is 24.66% higher than ACGAN (63.67%), 9.66% higher than ResNet18 (78.67%), 5.33% higher than DenseNet121 (83.00%), and 7.00% higher than Xception (81.33%). Furthermore, combining the proposed method with DenseNet121 and Xception improves accuracy by 4.33% and 2.67% respectively over the original algorithms.
The confusion matrices of the different algorithms are shown in fig. 4. The classification accuracy of LST-ACGAN on these three ship types is superior to most of the previous methods, indicating the effectiveness of the designed method.
The five-class results of the different algorithms are shown in Table 8. The proposed LST-ACGAN achieves the best accuracy of 56.40%, which is 15.00% higher than ACGAN (41.40%), 2.40% higher than ResNet18 (54.00%), 5.10% higher than DenseNet121 (51.30%), and 6.32% higher than Xception (50.08%). In addition, by replacing the feature extraction network and class prediction sub-network in the discriminator D with a DenseNet121 or Xception network, the proposed method combined with DenseNet121 and Xception improves accuracy by 0.8% and 0.6% respectively over the original methods.
Table 8. Accuracy of the algorithms on the five-class task on D2
The confusion matrices of the different algorithms are shown in fig. 5. The classification accuracy of LST-ACGAN on these five ship types is superior to most of the previous methods.
It should be understood that the parts of this specification not described in detail belong to the prior art. The above examples merely illustrate preferred embodiments of the invention and are not intended to limit its scope; various modifications and improvements made by those skilled in the art to the technical solution of the invention without departing from the spirit of its design shall fall within the protection scope defined by the claims.

Claims (7)

1. The SAR image ship target generation and classification method is characterized by comprising the following steps of:
step 1, training test set construction:
collecting SAR ship target images and dividing training and test sample sets; then augmenting the training samples so that each class has the same number of training samples as the class with the most training samples;
step 2, generator G construction:
constructing a generator G based on a DCNN to generate images x_fake with class labels consistent with the training sample distribution;
step 3, designing a network structure of the discriminator D;
the input of the discriminator D is a data block in which the real images x_real and the generated images x_fake output by the generator G in step 2 are shuffled together; after passing through a feature extraction network and three multi-task learning sub-networks, the triplet features, class prediction results, and authenticity prediction results are output, and the corresponding losses are used to update the network parameters of the discriminator D and the generator G; the three multi-task learning sub-networks are the triplet feature learning sub-network, the class prediction sub-network, and the authenticity prediction sub-network;
step 4, classification sub-network smooth label construction and loss function construction:
the input image of the discriminator D constructed in step 3 is the shuffled data block of real images x_real and generated images x_fake; the two kinds of images are input into the class prediction sub-network; for the real image x_real, the class prediction loss of the real image x_real is computed with a hard-label cross entropy loss; for the generated image x_fake, the class prediction loss of the generated image is computed with smooth labels.
Step 5, triple loss construction:
inputting the feature vectors learned by the feature extraction network of the step-3 discriminator into a fully connected layer (FC) to obtain the output f_triplets of the triplet feature learning sub-network, and optimizing the triplet features with the triplet loss to learn class-related discriminative features.
Step 6, constructing a generator and a discriminator loss function:
and (3) respectively constructing the total loss functions of the generator G and the discriminator D based on the network constructed in the steps 2 and 3 and the loss functions constructed in the steps 4 and 5, and respectively performing alternating optimization on the generator and the discriminator by using the constructed total loss functions.
Step 7, model training and testing:
training the model constructed in the steps 2 to 6, and classifying the input test sample by using a discriminator after the training is finished.
2. The method for generating and classifying the ship target of the SAR image set forth in claim 1, wherein the step 2 generator G is constructed by:
generating an image x_fake consistent with the training image distribution from the input vector z'; the input vector z' is obtained from equation (1), in which embedding(y_fake) denotes the length-m vector converted from the class label y_fake; the conversion proceeds as follows: assuming the classification task contains C classes {1, ..., C}, a random matrix M of dimension C × m is first randomly generated; a class label y_fake ∈ {1, ..., C} is then randomly generated and converted into a one-hot label y_oh of dimension 1 × C; computing y_oh × M yields the length-m vector embedding(y_fake); this vector embedding(y_fake) is multiplied element-wise with a randomly generated noise vector z of length m to finally obtain the generator input vector z':

z' = embedding(y_fake) ⊙ z    (1)

where ⊙ denotes element-wise multiplication;
then the generator G takes the input vector z' as input and outputs a generated image x_fake; the generator G is constructed based on a DCNN and comprises one conversion module, two UC modules, and one generation module; for the input vector z', the conversion module converts z' from a one-dimensional vector into a three-dimensional feature map through a fully connected layer (FC), a Reshape operation, and batch normalization (BN), and then feeds it into the UC modules; each UC module first upsamples the three-dimensional feature map output by the previous module by a factor of 2 through an Upsampling layer, and then performs feature learning through a convolution layer (Conv), a batch normalization layer (BN), and an activation layer (LeakyReLU); in the convolution layer Conv, c is the number of convolution kernels, k is the kernel size, s is the stride, and p is the padding parameter; β is the slope coefficient of the LeakyReLU; finally, the generation module reduces the three-dimensional feature map output by the UC2 module through a convolution layer Conv and constrains its values to (-1, 1) through a Tanh function.
3. The method for generating and classifying the ship target of the SAR image according to claim 1, wherein in the step 3, the design of the network structure of the discriminator D is characterized in that:
adopting a pretrained ResNet18 with the FC layer removed as the feature extraction network; b input images are fed into the feature extraction network to obtain b feature maps, which are converted into b feature vectors through a global average pooling layer (GAP) and a Reshape layer and then normalized with L2 normalization, finally yielding b normalized feature vectors; these feature vectors are then fed into the three multi-task learning sub-networks respectively;
the class prediction sub-network obtains the class prediction value of the image input to the discriminator D through a softmax function:

q_i = exp(p_i) / Σ_{j=1}^{C} exp(p_j)    (2)

where C denotes the number of classes, p_i, p_j (i, j = 1, 2, ..., C) denote the components of the classification vector input to the softmax, and q_i denotes the probability that the input image belongs to the i-th class, i.e., the class prediction value;
the authenticity prediction sub-network maps its output to (0, 1) with a sigmoid function to obtain the authenticity prediction value of the input image:

p(v) = 1 / (1 + exp(-v))    (3)

where v denotes the two-class output and p(v) denotes the probability that the input image belongs to the real class, i.e., the authenticity prediction value.
4. The method for generating and classifying the ship target of the SAR image according to claim 1, wherein in the step 4, the classification sub-network smooth label construction and the loss function construction are as follows:
inputting the two kinds of images into the class prediction sub-network to obtain the class prediction value q_i; the losses are then calculated separately:
for the real image x_real, the loss is computed directly with a hard label, which can be expressed as:

y = [y_1, y_2, ..., y_C], y_i ∈ {0, 1}    (4)

where y_i (i = 1, 2, ..., C) denotes a one-hot encoded vector, with y_i = 1 when the input real image x_real belongs to the i-th class; the class prediction sub-network outputs the probability q_i that the real image x_real belongs to each class through the softmax function (see equation (2)), and the class prediction loss l_CE of the real image x_real is then computed with the cross entropy loss:

l_CE = -Σ_{i=1}^{C} y_i log(q_i)    (5)
For the generated image x_fake, a smooth label ŷ is computed and assigned to the generated image x_fake output by the generator G to compute the class prediction loss of the generated image; the computed smooth label ŷ is a vector of length C, ŷ = [ŷ_1, ŷ_2, ..., ŷ_C], where ŷ_i denotes the smooth label of the generated image x_fake at class i, meaning the probability that the image belongs to class i, and can be computed with equation (6):

ŷ_i = 1 - ε if i is the labeled class, ŷ_i = ε / (C - 1) otherwise    (6)

where ε is a hyperparameter and C denotes the number of classes; for the generated image x_fake, the class prediction value q_i is obtained from equation (2), and the class prediction loss l_SL-CE of the generated image can be expressed as:

l_SL-CE = -Σ_{i=1}^{C} ŷ_i log(q_i)    (7)
5. the method for generating and classifying the ship target of the SAR image according to claim 1, wherein in the step 5 triplet loss construction:
let x_a be a randomly sampled real image, x_p a real image of the same class as x_a, and x_n a real image of a different class from x_a, and let f(x) be the feature vector obtained after a real image passes through the feature extraction network and the triplet feature learning sub-network; input images whose features f(x) do not satisfy the following distance constraint are selected to construct triplet samples <x_a, x_p, x_n>:

d(f(x_a), f(x_p)) + α < d(f(x_a), f(x_n))    (8)

where d(·) denotes the Euclidean distance and α is the classification margin;
for samples satisfying the constraint of equation (8), the loss value is 0; for triplet samples that do not satisfy equation (8), the loss value is the difference between the distances from x_a to x_p and from x_a to x_n in the feature space learned by f(x), i.e., the triplet loss l_triplets:

l_triplets = d(f(x_a), f(x_p)) - d(f(x_a), f(x_n))    (9).
6. The method for generating and classifying the ship target of the SAR image according to claim 1, wherein in the step 6 generator and the construction of the discriminator loss function:
first, for the authenticity prediction sub-network in the discriminator D, the authenticity prediction loss l_BCE is computed with a two-class cross entropy loss function to distinguish the real image x_real from the generated image x_fake, and can be expressed as:

l_BCE = -(y_v · log(p(v)) + (1 - y_v) · log(1 - p(v)))    (10)

where y_v denotes the real/fake two-class label, with y_v = 0 when the input image is a real image and y_v = 1 when the input image is a generated image; p(v) is the authenticity prediction value of the input image computed with equation (3);
then the overall loss function L_G of the generator G is constructed; it consists of the authenticity prediction loss l_BCE and the class prediction loss l_SL-CE of the generated image:

L_G = λ_1 · l_BCE(x_fake, y_v) + λ_2 · l_SL-CE(x_fake, ŷ)    (11)

where ŷ denotes the smooth label of the generated image computed according to equation (6), and λ_1, λ_2 denote trade-off parameters;
finally, the overall loss function of the discriminator D is constructed; it is optimized with the authenticity prediction loss l_BCE of equation (10), the class prediction loss l_SL-CE of the generated image in equation (7), the class prediction loss l_CE of the real image in equation (5), and the triplet loss l_triplets of equation (9); the overall loss function L_D of the discriminator D is:

L_D = λ_3 · l_BCE(x_real, y_v) + λ_4 · l_BCE(x_fake, y_v) + λ_5 · l_CE(x_real, y_real) + λ_6 · l_SL-CE(x_fake, ŷ) + λ_7 · l_triplets(x_a, x_p, x_n)    (12)

where x_real and x_fake denote the real image and the generated image respectively, y_real denotes the hard label of the real image, ŷ denotes the smooth label of the generated image, y_v is the real/fake two-class label of equation (10), <x_a, x_p, x_n> are the triplet samples constructed with equation (8), and λ_i (i = 3, 4, ..., 7) denote trade-off parameters.
7. The method for generating and classifying the ship target of the SAR image according to claim 1, wherein in the step 7 model training and testing:
using Adam as an optimizer, the model constructed by steps 2 to 6 is trained as follows:
(1) b vectors z' are randomly generated according to equation (1) and input into the generator G to obtain b generated images;
(2) the b generated images are mixed with b real images and input into the discriminator D; each loss is computed according to equation (12) and the weights of the discriminator D are updated;
(3) the generator loss is computed according to equation (11) and the weights of the generator G are updated;
(4) processes (1)-(3) are repeated until the set number of training iterations is reached.
After training, the discriminator D in the model designed in steps 2-6 is retained; a test sample is input into the discriminator D, and the prediction result output by the class prediction sub-network gives the sample class.
CN202310771564.4A 2023-06-27 2023-06-27 SAR image ship target generation and classification method Pending CN116895016A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310771564.4A CN116895016A (en) 2023-06-27 2023-06-27 SAR image ship target generation and classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310771564.4A CN116895016A (en) 2023-06-27 2023-06-27 SAR image ship target generation and classification method

Publications (1)

Publication Number Publication Date
CN116895016A true CN116895016A (en) 2023-10-17

Family

ID=88310162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310771564.4A Pending CN116895016A (en) 2023-06-27 2023-06-27 SAR image ship target generation and classification method

Country Status (1)

Country Link
CN (1) CN116895016A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576164A (en) * 2023-12-14 2024-02-20 中国人民解放军海军航空大学 Remote sensing video sea-land movement target tracking method based on feature joint learning
CN117576164B (en) * 2023-12-14 2024-05-03 中国人民解放军海军航空大学 Remote sensing video sea-land movement target tracking method based on feature joint learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination