CN116895016A - SAR image ship target generation and classification method - Google Patents

SAR image ship target generation and classification method

Info

Publication number
CN116895016A
CN116895016A
Authority
CN
China
Prior art keywords: image, loss, discriminator, network, generator
Prior art date
Legal status
Pending
Application number
CN202310771564.4A
Other languages
Chinese (zh)
Inventor
徐从安
高龙
苏航
张建廷
吴俊峰
闫文君
Current Assignee
Naval Aeronautical University
Original Assignee
Naval Aeronautical University
Priority date
Filing date
Publication date
Application filed by Naval Aeronautical University
Priority to CN202310771564.4A
Publication of CN116895016A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a SAR image ship target generation and classification method, which comprises: constructing training and test sets; constructing a generator that produces generated images with class labels; a discriminator that outputs triplet features, class prediction results, and authenticity prediction results through a feature extraction network and three multi-task learning sub-networks, with the corresponding losses used to update the network parameters of the discriminator and the generator; computing the class prediction loss of real images with hard labels and the class prediction loss of generated images with smooth labels; feeding the feature vectors learned by the feature extraction network in the discriminator into a fully connected layer and optimizing the triplet features with a triplet loss to learn class-related discriminative features; constructing overall loss functions for the generator and the discriminator from the constructed losses and alternately optimizing the generator and the discriminator with them; and, after training the constructed model, classifying input test images with the discriminator. The method reduces the cost of manual labeling, prevents the algorithm from overfitting, improves the robustness and generalization ability of the classifier, and increases the separability of different classes in the classification space.

Description

SAR image ship target generation and classification method
Technical Field
The invention relates to the technical field of synthetic aperture radar (SAR) image target generation and classification, and in particular to a SAR image ship target generation and classification method.
Background
Ship classification plays an important role in various maritime activities, such as national defense intelligence, fishery monitoring, and maritime search. SAR is gradually becoming a key instrument for ship monitoring thanks to its all-day, all-weather operating capability. With the launch of new-generation synthetic aperture radar satellites, large numbers of medium- and high-resolution SAR images are becoming easier to acquire, making it possible to identify vessel types.
Conventional SAR ship classification methods typically extract low-level features, such as geometric, scattering, and statistical features, by hand and then classify ships with conventional machine learning algorithms. However, these manually extracted features struggle to fully represent ship targets and perform poorly in complex scenes.
In recent years, deep learning-based SAR ship classification methods have attracted increasing attention and become an active research topic. For example, Li et al. (Li, J.; Qu, C.; Peng, S. Ship classification for unbalanced SAR dataset based on convolutional neural network. Journal of Applied Remote Sensing, 2018) devised a dense residual network with resampling and integrated cost-sensitive learning for the class-imbalance problem in ship classification. Dechesne et al. (Dechesne, C.; Lefèvre, S.; Vadaine, R.; Hajduch, G.; Fablet, R. Ship Identification and Characterization in Sentinel-1 SAR Images with Multi-Task Deep Learning. Remote Sens. 2019, 11, 1-18, doi:10.3390/rs11242997) designed a multi-task architecture to accomplish detection, classification, and length estimation simultaneously. Firoozy et al. (Firoozy, N.; Sandirasegaram, N. Tackling SAR Imagery Ship Classification Imbalance via Deep Convolutional Generative Adversarial Network. Can. J. Remote Sens. 2021, 47, 295-308, doi:10.1080/07038992.2021.1910499) generate samples for minority classes with adversarial training to balance the training dataset. He et al. (He, J.; Wang, Y.; Liu, H. Ship Classification in Medium-Resolution SAR Images via Densely Connected Triplet CNNs Integrating Fisher Discrimination Regularized Metric Learning. IEEE Trans. Geosci. Remote Sens. 2021, 59, 3022-3039) propose a densely connected triplet neural network with a Fisher regularization term for medium-resolution SAR ship classification. Zeng et al. (Zeng, L.; Zhu, Q.; Lu, D.; Zhang, T.; Wang, H.; Yin, J.; et al. Dual-Polarized SAR Ship Grained Classification Based on CNN With Hybrid Channel Feature Loss. IEEE Geosci. Remote Sens. Lett. 2022, 19, 16-20) propose jointly exploiting the information contained in the polarization channels (VV and VH) by designing a hybrid channel feature loss for dual-polarized fine-grained SAR ship classification. Zhang et al. (Zhang, Y.; Lei, Z.; Yu, H.; Zhuang, L. Imbalanced High-Resolution SAR Ship Recognition Method Based on a Lightweight CNN. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1-4, doi:10.1109/LGRS.2021.3083262) designed a lightweight CNN classification model combining deep metric learning with stepwise balanced sampling to reduce computational complexity and address class imbalance in high-resolution SAR images. In summary, deep learning-based methods can extract high-level features directly from large-scale data and be trained end to end, and they show great potential in practical applications. However, the currently available SAR ship classification data are relatively scarce, especially for offshore military targets such as destroyers and aircraft carriers, and manual labeling is costly. Because deep learning models have numerous parameters, models trained on small-scale datasets often suffer from overfitting, which reduces their generalization ability. Furthermore, unlike optical images, SAR images carry limited information owing to imperfections in the SAR imaging mechanism, so different types of vessels can differ little in appearance while vessels of the same type can differ greatly. This high intra-class diversity and inter-class similarity make it difficult for an algorithm to learn classification boundaries, further reducing classification performance.
To solve the above problems, the invention proposes LST-ACGAN for SAR ship classification: new samples are generated by introducing an ACGAN to expand the sample distribution; overfitting is prevented by assigning smooth labels to the generated samples, improving the generalization ability of the model; and the distinguishability of samples of different classes is improved by integrating the triplet loss into the ACGAN.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a SAR image ship target generation and classification method.
The technical scheme provided by the invention is as follows. Deep learning-based SAR image ship classification achieves significant performance with large-scale training samples; however, such methods have difficulty learning the data distribution and are prone to overfitting when training data are insufficient. To address this, the invention proposes a GAN-based algorithm, called LST-ACGAN, that generates images while performing classification. The overall LST-ACGAN model comprises a generator G and a discriminator D. The generator G can generate images with class labels, and the discriminator D can verify whether an input image is real while simultaneously classifying it. By assigning smooth labels to the images generated by the generator G and proposing a smooth-label cross entropy loss (SL-CE) as the classification loss of generated images, the model is regularized during training and model collapse is avoided. The triplet loss is integrated into the classification network of the discriminator D to learn class-related discriminative features. During training, triplets are constructed by sampling anchor, positive, and negative samples from the real images and are input into the discriminator D together with the generated images; the discriminator and generator are then jointly optimized with several loss functions. After training is completed, input test samples are classified with the discriminator.
The SAR image ship target generation and classification method is characterized by comprising the following steps of:
step 1, training test set construction:
collecting SAR ship target images and dividing training and test sample sets; then augmenting the training samples so that each class has the same number of training samples as the class with the most training samples;
step 2, generator G construction:
constructing a generator G based on a DCNN to generate images x_fake with class labels consistent with the training sample distribution;
step 3, designing a network structure of the discriminator D;
the input of the discriminator D is a data block in which the real images x_real and the generated images x_fake output by the generator G in step 2 are shuffled together; after passing through a feature extraction network and three multi-task learning sub-networks, the triplet features, class prediction results, and authenticity prediction results are output, and the corresponding losses are used to update the network parameters of the discriminator D and the generator G; the three multi-task learning sub-networks are the triplet feature learning sub-network, the class prediction sub-network, and the authenticity prediction sub-network;
step 4, classification sub-network smooth label construction and loss function construction:
the input image of the discriminator D constructed in step 3 is the shuffled data block of real images x_real and generated images x_fake; the two kinds of images are input into the class prediction sub-network; for the real image x_real, the class prediction loss of the real image x_real is computed with a hard-label cross entropy loss; for the generated image x_fake, the class prediction loss of the generated image is computed with smooth labels.
Step 5, triple loss construction:
inputting the feature vectors learned by the feature extraction network of the step-3 discriminator into a fully connected layer (FC) to obtain the output f_triplets of the triplet feature learning sub-network, and optimizing the triplet features with the triplet loss to learn class-related discriminative features.
Step 6, constructing a generator and a discriminator loss function:
based on the generator and discriminator networks constructed in steps 2 and 3 and the loss functions constructed in steps 4 and 5, the overall loss functions of the generator G and the discriminator D are constructed, and the generator and discriminator are alternately optimized with the constructed overall loss functions.
Step 7, model training and testing:
training the model constructed in the steps 2 to 6, and classifying the input test sample by using a discriminator after the training is finished.
Further, in the step 2 generator G configuration described above:
generating an image x_fake consistent with the training image distribution from the input vector z'; the input vector z' is obtained from equation (1), in which embedding(y_fake) denotes the length-m vector converted from the class label y_fake; the conversion proceeds as follows: assuming the classification task contains C classes {1, ..., C}, a random matrix M of dimension C × m is first randomly generated; a class label y_fake ∈ {1, ..., C} is then randomly generated and converted into a one-hot label y_oh of dimension 1 × C; computing y_oh × M yields the length-m vector embedding(y_fake); this vector embedding(y_fake) is multiplied element-wise with a randomly generated noise vector z of length m to finally obtain the generator input vector z':

z' = embedding(y_fake) ⊙ z    (1)

where ⊙ denotes element-wise multiplication;
then the generator G takes the input vector z' as input and outputs a generated image x_fake; the generator G is constructed based on a DCNN and comprises one conversion module, two UC modules, and one generation module; for the input vector z', the conversion module converts z' from a one-dimensional vector into a three-dimensional feature map through a fully connected layer (FC), a Reshape operation, and batch normalization (BN), and then feeds it into the UC modules; each UC module first upsamples the three-dimensional feature map output by the previous module by a factor of 2 through an Upsampling layer, and then performs feature learning through a convolution layer (Conv), a batch normalization layer (BN), and an activation layer (LeakyReLU); in the convolution layer Conv, c is the number of convolution kernels, k is the kernel size, s is the stride, and p is the padding parameter; β is the slope coefficient of the LeakyReLU; finally, the generation module reduces the three-dimensional feature map output by the UC2 module through a convolution layer Conv and constrains its values to (-1, 1) through a Tanh function;
further, in the step 3, the arbiter D network structure design:
adopting a pretrained ResNet18 with the FC layer removed as the feature extraction network; b input images are fed into the feature extraction network to obtain b feature maps, which are converted into b feature vectors through a global average pooling layer (GAP) and a Reshape layer and then normalized with L2 normalization, finally yielding b normalized feature vectors; these feature vectors are then fed into the three multi-task learning sub-networks respectively;
the class prediction sub-network obtains the class prediction value of the image input to the discriminator D through a softmax function:

q_i = exp(p_i) / Σ_{j=1}^{C} exp(p_j)    (2)

where C denotes the number of classes, p_i, p_j (i, j = 1, 2, ..., C) denote the components of the classification vector input to the softmax, and q_i denotes the probability that the input image belongs to the i-th class, i.e., the class prediction value;
the authenticity prediction sub-network maps its output to (0, 1) with a sigmoid function to obtain the authenticity prediction value of the input image:

p(v) = 1 / (1 + exp(-v))    (3)

where v denotes the two-class output and p(v) denotes the probability that the input image belongs to the real class, i.e., the authenticity prediction value.
Further, in the step 4, the classifying sub-network smooth label construction and the loss function construction are as follows:
inputting the two kinds of images into the class prediction sub-network to obtain the class prediction value q_i; the losses are then calculated separately:
for the real image x_real, the loss is computed directly with a hard label, which can be expressed as:

y = [y_1, y_2, ..., y_C], y_i ∈ {0, 1}    (4)

where y_i (i = 1, 2, ..., C) denotes a one-hot encoded vector, with y_i = 1 when the input real image x_real belongs to the i-th class; the class prediction sub-network outputs the probability q_i that the real image x_real belongs to each class through the softmax function (see equation (2)), and the class prediction loss l_CE of the real image x_real is then computed with the cross entropy loss:

l_CE = -Σ_{i=1}^{C} y_i log(q_i)    (5)
For the generated image x_fake, a smooth label ŷ is computed and assigned to the generated image x_fake output by the generator G to compute the class prediction loss of the generated image; the computed smooth label ŷ is a vector of length C, ŷ = [ŷ_1, ŷ_2, ..., ŷ_C], where ŷ_i denotes the smooth label of the generated image x_fake at class i, meaning the probability that the image belongs to class i, and can be computed with equation (6):

ŷ_i = 1 - ε if i is the labeled class, ŷ_i = ε / (C - 1) otherwise    (6)

where ε is a hyperparameter and C denotes the number of classes; for the generated image x_fake, the class prediction value q_i is obtained from equation (2), and the class prediction loss l_SL-CE of the generated image can be expressed as:

l_SL-CE = -Σ_{i=1}^{C} ŷ_i log(q_i)    (7)
further, in the step 5 triplet loss construction:
let x_a be a randomly sampled real image, x_p a real image of the same class as x_a, and x_n a real image of a different class from x_a, and let f(x) be the feature vector obtained after a real image passes through the feature extraction network and the triplet feature learning sub-network; input images whose features f(x) do not satisfy the following distance constraint are selected to construct triplet samples <x_a, x_p, x_n>:

d(f(x_a), f(x_p)) + α < d(f(x_a), f(x_n))    (8)

where d(·) denotes the Euclidean distance and α is the classification margin;
for samples meeting the constraint of equation (8), the loss value is 0, and for triplet samples that do not meet equation (8), the loss value is x a And x p And x n The difference in distance of the feature vectors learned by f (x), i.e. the triplet loss l triplets
l triplets =d(f(x a ),f(x p ))-d(f(x a ),f(x n )) (9)
Further, in the step 6 generator and the construction of the arbiter loss function, the following steps are:
first, for the authenticity prediction sub-network in the discriminator D, the authenticity prediction loss l_BCE is computed with a two-class cross entropy loss function to distinguish the real image x_real from the generated image x_fake, and can be expressed as:

l_BCE = -(y_v · log(p(v)) + (1 - y_v) · log(1 - p(v)))    (10)

where y_v denotes the real/fake two-class label, with y_v = 0 when the input image is a real image and y_v = 1 when the input image is a generated image; p(v) is the authenticity prediction value of the input image computed with equation (3);
then the overall loss function L_G of the generator G is constructed; it consists of the authenticity prediction loss l_BCE and the class prediction loss l_SL-CE of the generated image:

L_G = λ_1 · l_BCE(x_fake, y_v) + λ_2 · l_SL-CE(x_fake, ŷ)    (11)

where ŷ denotes the smooth label of the generated image computed according to equation (6), and λ_1, λ_2 denote trade-off parameters;
finally, the overall loss function of the discriminator D is constructed; it is optimized with the authenticity prediction loss l_BCE of equation (10), the class prediction loss l_SL-CE of the generated image in equation (7), the class prediction loss l_CE of the real image in equation (5), and the triplet loss l_triplets of equation (9); the overall loss function L_D of the discriminator D is:

L_D = λ_3 · l_BCE(x_real, y_v) + λ_4 · l_BCE(x_fake, y_v) + λ_5 · l_CE(x_real, y_real) + λ_6 · l_SL-CE(x_fake, ŷ) + λ_7 · l_triplets(x_a, x_p, x_n)    (12)

where x_real and x_fake denote the real image and the generated image respectively, y_real denotes the hard label of the real image, ŷ denotes the smooth label of the generated image, y_v is the real/fake two-class label of equation (10), <x_a, x_p, x_n> are the triplet samples constructed with equation (8), and λ_i (i = 3, 4, ..., 7) denote trade-off parameters.
Further, in the step 7 model training and testing, the following steps are:
using Adam as an optimizer, the model constructed by steps 2 to 6 is trained as follows:
(1) b vectors z' are randomly generated according to equation (1) and input into the generator G to obtain b generated images;
(2) the b generated images are mixed with b real images and input into the discriminator D; each loss is computed according to equation (12) and the weights of the discriminator D are updated;
(3) the generator loss is computed according to equation (11) and the weights of the generator G are updated;
(4) processes (1)-(3) are repeated until the set number of training iterations is reached.
After training, the discriminator D in the model designed in steps 2-6 is retained; a test sample is input into the discriminator D, and the prediction result output by the class prediction sub-network gives the sample class.
The beneficial effects of the invention are as follows. Through steps 2 and 3, an ACGAN model is constructed to generate diverse ship target images, which reduces the cost of manual labeling and increases the diversity of the data distribution from the model's perspective, thereby improving the robustness and generalization ability of the classifier.
Through step 4, soft labels are constructed for the samples generated by the ACGAN model; training the model with soft labels prevents the algorithm from overfitting and improves its generalization ability.
Through step 5, class-discriminative features are learned better, which increases the intra-class similarity and inter-class difference of the semantic features learned by the model and improves the separability of different classes in the classification space.
Drawings
FIG. 1 is a general frame diagram of the present invention;
FIG. 2 is a network architecture diagram of the discriminator of the present invention;
FIG. 3 is a schematic representation of the effect of triplet loss in accordance with the present invention;
FIG. 4 shows the confusion matrices output by the designed model and the compared models on the three-class task: (a) ACGAN; (b) ResNet18; (c) DenseNet121; (d) Xception; (e) LST-ACGAN;
FIG. 5 shows the confusion matrices output by the designed model and the compared models on the five-class task: (a) ACGAN; (b) ResNet18; (c) DenseNet121; (d) Xception; (e) LST-ACGAN.
Detailed description of specific embodiments: specific embodiments of the invention are described in detail below with reference to the accompanying drawings:
as shown in fig. 1 to 5, the method for generating and classifying the ship targets of the SAR image specifically comprises the following steps:
and 1, constructing a training test set.
SAR ship target images are collected and uniformly resized to 1×64×64 using bicubic interpolation, and training and test sample sets are divided. The training samples are then augmented by translation and rotation, with more augmentation applied to classes with fewer samples, so that each class ends up with the same number of training samples, i.e., the same number as the class with the most training samples.
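A minimal sketch of this preprocessing follows (Python/NumPy assumed; the chips are taken to be single-channel arrays grouped by class, and the function names and the ±4-pixel translation range are illustrative choices, not taken from the patent):

```python
import numpy as np
from PIL import Image

def resize_chip(chip, size=64):
    """Resize a single-channel SAR chip to size x size with bicubic interpolation."""
    return np.asarray(Image.fromarray(chip).resize((size, size), Image.BICUBIC))

def augment(chip, rng):
    """Random translation (circular shift) and rotation by a multiple of 90 degrees."""
    dy, dx = rng.integers(-4, 5, size=2)
    shifted = np.roll(chip, (int(dy), int(dx)), axis=(0, 1))
    return np.rot90(shifted, k=int(rng.integers(0, 4)))

def balance_classes(train_sets, rng=None):
    """Oversample each class with augmented copies until all classes match the largest one."""
    rng = rng or np.random.default_rng(0)
    target = max(len(chips) for chips in train_sets.values())
    for chips in train_sets.values():
        while len(chips) < target:
            chips.append(augment(chips[int(rng.integers(len(chips)))], rng))
    return train_sets
```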
Step 2, generator G construction.
The constructed generator G is used to generate images x_fake consistent with the training image distribution from an input vector z'. The input vector z' is obtained from equation (1). embedding(y_fake) denotes the length-m vector converted from the class label y_fake; the conversion proceeds as follows: assuming the classification task contains C classes {1, ..., C}, a random matrix M of dimension C × m is first randomly generated; a class label y_fake ∈ {1, ..., C} is then randomly generated and converted into a one-hot label y_oh of dimension 1 × C; computing y_oh × M yields the length-m vector embedding(y_fake). This vector embedding(y_fake) is multiplied element-wise with a randomly generated noise vector z of length m to obtain the generator input vector z':

z' = embedding(y_fake) ⊙ z    (1)

where ⊙ denotes element-wise multiplication. In this way, the class label represented by y_fake is deeply fused with the noise vector z.
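The label-noise fusion of equation (1) can be sketched as follows (PyTorch assumed; the class count C and length m are example values):

```python
import torch

C, m = 5, 128                      # class count and embedding/noise length (example values)
M = torch.randn(C, m)              # random embedding matrix, generated once

def make_generator_input(batch_size):
    """Build z' = embedding(y_fake) ⊙ z per equation (1) and return it with the labels."""
    y_fake = torch.randint(0, C, (batch_size,))              # random class labels
    y_oh = torch.nn.functional.one_hot(y_fake, C).float()    # 1 x C one-hot rows
    emb = y_oh @ M                                           # y_oh x M -> length-m embeddings
    z = torch.randn(batch_size, m)                           # noise vectors z
    return emb * z, y_fake                                   # element-wise product z'
```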
Then the generator G takes the input vector z' as input and outputs a generated image x_fake of size 1×64×64. The generator G is constructed based on a DCNN (Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks); its specific structure is shown in Table 1 and comprises one conversion module, two UC modules, and one generation module. For the input vector z', the conversion module converts z' from a one-dimensional vector of length 128 into a three-dimensional feature map of size 128×16×16 through a fully connected layer (FC), a Reshape operation, and batch normalization (BN), and then feeds it into the UC modules. Each UC module first upsamples the feature map output by the previous module by a factor of 2 through an Upsampling layer, and then performs feature learning through a convolution layer (Conv), a batch normalization layer (BN), and an activation layer (Leaky ReLU). In the convolution layer Conv, c is the number of convolution kernels, k is the kernel size, s is the stride, and p is the padding parameter; the slope coefficient β of the Leaky ReLU is 0.2. Finally, the generation module reduces the 64×64×64 feature map output by the UC2 module to 1×64×64 through a 3×3 convolution layer Conv with one output channel, and constrains the values of the feature map to (-1, 1) through a Tanh function.
Table 1. Generator network structure
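Since Table 1 is not reproduced in this text, the following PyTorch sketch reconstructs the generator from the prose above; the UC-module channel counts (128 and 64) are assumptions chosen to be consistent with the stated 128×16×16 conversion output and 64×64×64 UC2 output:

```python
import torch.nn as nn

def uc_block(c_in, c_out):
    """UC module: 2x upsampling, 3x3 convolution, batch norm, LeakyReLU with slope 0.2."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2),
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.2),
    )

class Generator(nn.Module):
    """Conversion module, two UC modules, and a generation module, as described above."""
    def __init__(self, m=128):
        super().__init__()
        self.fc = nn.Linear(m, 128 * 16 * 16)   # conversion: FC + Reshape + BN
        self.bn = nn.BatchNorm2d(128)
        self.uc1 = uc_block(128, 128)           # 128x16x16 -> 128x32x32
        self.uc2 = uc_block(128, 64)            # 128x32x32 -> 64x64x64
        self.gen = nn.Sequential(               # generation: 3x3 conv to one channel
            nn.Conv2d(64, 1, kernel_size=3, stride=1, padding=1),
            nn.Tanh(),                          # constrain values to (-1, 1)
        )

    def forward(self, z_prime):
        x = self.bn(self.fc(z_prime).view(-1, 128, 16, 16))
        return self.gen(self.uc2(self.uc1(x)))  # 1x64x64 generated image x_fake
```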
Step 3, discriminator D network structure design.
The input to the discriminator D is a data block in which the real images x_real and the generated images x_fake output by the generator G in step 2 are shuffled together. After passing through a feature extraction network and three multi-task learning sub-networks (the triplet feature learning sub-network, the class prediction sub-network, and the authenticity prediction sub-network), the outputs are the triplet features, the class prediction results, and the authenticity prediction results, and the corresponding losses are used to update the network parameters of the discriminator D and the generator G. The discriminator D adopts a multi-task learning structure to learn the triplet features, class prediction results, and authenticity prediction results; its network architecture is shown in fig. 2, where b is the batch size.
The discriminator D mainly comprises the feature extraction network and the three multi-task learning sub-networks. The invention adopts a pretrained ResNet18 with the FC layer removed as the feature extraction network. The b input images of size 1×64×64 (real images x_real and generated images x_fake in shuffled order during actual training) are fed into the feature extraction network to obtain b feature maps of size 512×4×4, which are converted into b feature vectors of length 512 through a global average pooling layer (GAP) and a Reshape layer and then normalized with L2 normalization to stabilize the learning process, finally yielding b feature vectors of length 512. These feature vectors are then fed into the three multi-task learning sub-networks respectively.
Specifically, the class prediction sub-network obtains the class prediction value of the image input to the discriminator D through a softmax function:

q_i = exp(p_i) / Σ_{j=1}^{C} exp(p_j)    (2)

where C denotes the number of classes, p_i, p_j (i, j = 1, 2, ..., C) denote the components of the classification vector input to the softmax, and q_i denotes the probability that the input image belongs to the i-th class, i.e., the class prediction value.
The authenticity prediction sub-network maps its output to (0, 1) with a sigmoid function to obtain the authenticity prediction value of the input image:

p(v) = 1 / (1 + exp(-v))    (3)

where v denotes the two-class output and p(v) denotes the probability that the input image belongs to the real class, i.e., the authenticity prediction value.
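A sketch of the discriminator under the above description (PyTorch/torchvision assumed); replacing the first convolution so the pretrained ResNet18 accepts single-channel SAR input is an adaptation assumed by this sketch, not specified in the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class Discriminator(nn.Module):
    """ResNet18 backbone (FC removed) with three multi-task heads."""
    def __init__(self, num_classes=5, triplet_dim=128):
        super().__init__()
        backbone = resnet18(weights="IMAGENET1K_V1")
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # ends with GAP
        self.cls_head = nn.Linear(512, num_classes)  # class prediction sub-network
        self.adv_head = nn.Linear(512, 1)            # authenticity prediction sub-network
        self.tri_head = nn.Linear(512, triplet_dim)  # triplet feature learning sub-network

    def forward(self, x):
        f = self.features(x).flatten(1)          # b feature vectors of length 512
        f = F.normalize(f, p=2, dim=1)           # L2 normalization
        logits = self.cls_head(f)                # softmax of equation (2) applied in the loss
        p_v = torch.sigmoid(self.adv_head(f))    # p(v) in (0, 1), equation (3)
        return logits, p_v, self.tri_head(f)     # f_triplets
```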
Step 4, classification sub-network smooth label construction and loss function construction.
The input image of the discriminator D constructed in step 3 is the shuffled data block of real images x_real and generated images x_fake. The two kinds of images are input into the class prediction sub-network to obtain the class prediction value q_i, and the losses are then calculated separately in two ways:
For the real image x_real, the loss can be computed directly with a hard label, which can be expressed as:

y = [y_1, y_2, ..., y_C], y_i ∈ {0, 1}    (4)

where y_i (i = 1, 2, ..., C) denotes a one-hot encoded vector, with y_i = 1 when the input real image x_real belongs to the i-th class. The class prediction sub-network outputs the probability q_i that the real image x_real belongs to each class through the softmax function (see equation (2)), and the class prediction loss l_CE of the real image x_real is then computed with the cross entropy (CE) loss:

l_CE = -Σ_{i=1}^{C} y_i log(q_i)    (5)
For the generated image x_fake, a smooth label method is designed to compute the classification loss. Specifically, the quality of GAN-generated images is not perfect, especially for an ACGAN conditioned on class labels: when training data are insufficient, the ACGAN struggles to generate images with accurate class labels. Hard labels with the CE loss, however, encourage the model to output predictions equal to 1 or 0 for the target and non-target classes respectively. Training on such labels pushes the target-class logit p_i toward infinity, making the model over-confident in its predictions. Thus, when training samples are insufficient, an ACGAN trained on generated images with hard labels is prone to model collapse during training. To address this, a smooth label ŷ, rather than a hard label, is computed and assigned to the generated image x_fake output by the generator G to compute the class prediction loss of the generated image. The computed smooth label ŷ is a vector of length C, ŷ = [ŷ_1, ŷ_2, ..., ŷ_C], where ŷ_i denotes the smooth label of the generated image x_fake at class i, meaning the probability that the image belongs to class i, and can be computed with equation (6):

ŷ_i = 1 - ε if i is the labeled class, ŷ_i = ε / (C - 1) otherwise    (6)

where ε is a hyperparameter and C denotes the number of classes. For the generated image x_fake, the class prediction value q_i is obtained from equation (2), and the class prediction loss l_SL-CE of the generated image can be expressed as:

l_SL-CE = -Σ_{i=1}^{C} ŷ_i log(q_i)    (7)
in this way, by adding noise to the generated image, excessive confidence of the model in classification results is avoided, and prediction differences between target classes and non-target classes are properly reduced, so that regularization is performed on the model, and collapse of the model in the training process is reduced.
Step 5, triplet loss construction
The feature vectors learned by the feature extraction network of the step-3 discriminator are input into a fully connected layer (FC) of dimension 128 to obtain the output f_triplets of the triplet feature learning sub-network. To better learn class-related discriminative features and address the problem of high intra-class diversity and inter-class similarity, the invention optimizes the triplet features with the triplet loss. The triplet loss learns discriminative features by pulling intra-class samples together and pushing inter-class samples apart (its schematic is shown in fig. 3), which conflicts with the label smoothing applied to generated images; therefore, the triplet loss is used only for real images x_real. Specifically, let x_a be a randomly sampled real image, x_p a real image of the same class as x_a, and x_n a real image of a different class from x_a, and let f(x) be the feature vector obtained after a real image passes through the feature extraction network and the triplet feature learning sub-network; input images whose features f(x) do not satisfy the following distance constraint are selected to construct triplet samples <x_a, x_p, x_n>:

d(f(x_a), f(x_p)) + α < d(f(x_a), f(x_n))    (8)

where d(·) denotes the Euclidean distance and α is the classification margin, set to 0.2 in this embodiment.
For samples satisfying the constraint of equation (8), the loss value is 0; for triplet samples that do not satisfy equation (8), the loss value is the difference between the distances from x_a to x_p and from x_a to x_n in the feature space learned by f(x), i.e., the triplet loss l_triplets:

l_triplets = d(f(x_a), f(x_p)) - d(f(x_a), f(x_n))    (9)
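A sketch of equations (8) and (9) over batches of triplet features; as stated above, only triplets violating the margin constraint contribute, so the contribution of satisfied triplets is clamped to zero:

```python
import torch

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Equations (8)-(9): loss is d(a,p) - d(a,n) for triplets violating d(a,p) + alpha < d(a,n)."""
    d_ap = torch.norm(f_a - f_p, dim=1)      # Euclidean anchor-positive distances
    d_an = torch.norm(f_a - f_n, dim=1)      # Euclidean anchor-negative distances
    violating = d_ap + alpha >= d_an         # triplets that fail the constraint of equation (8)
    return torch.where(violating, d_ap - d_an, torch.zeros_like(d_ap)).mean()
```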
Step 6, generator and discriminator loss function construction.
Based on the generator and discriminator networks constructed in steps 2 and 3 and the loss functions constructed in steps 4 and 5, the overall loss functions of the generator G and the discriminator D are constructed as follows, and the generator and discriminator are alternately optimized with the constructed overall loss functions.
First, for the authenticity prediction sub-network in the discriminator D, the authenticity prediction loss l_BCE is computed with a two-class cross entropy loss function to distinguish the real image x_real from the generated image x_fake, and can be expressed as:

l_BCE = -(y_v · log(p(v)) + (1 - y_v) · log(1 - p(v)))    (10)

where y_v denotes the real/fake two-class label, with y_v = 0 when the input image is a real image and y_v = 1 when the input image is a generated image; p(v) is the authenticity prediction value of the input image computed with equation (3).
Then the overall loss function L_G of the generator G is constructed. It consists of the authenticity prediction loss l_BCE of equation (10) and the class prediction loss l_SL-CE of the generated image in equation (7):

L_G = λ_1 · l_BCE(x_fake, y_v) + λ_2 · l_SL-CE(x_fake, ŷ)    (11)

where ŷ denotes the smooth label of the generated image computed according to equation (6), y_v is the real/fake two-class label of equation (10), and λ_1, λ_2 denote trade-off parameters, both set to 1.
Finally, the overall loss function of the discriminator D is constructed. The discriminator D aims to distinguish real images from generated images and to learn the class-discriminative features of real images; its network parameters are optimized with the authenticity prediction loss l_BCE computed by equation (10), the class prediction loss l_SL-CE of the generated image in equation (7), the class prediction loss l_CE of the real image in equation (5), and the triplet loss l_triplets of equation (9). The overall loss function L_D of the discriminator D is:

L_D = λ_3 · l_BCE(x_real, y_v) + λ_4 · l_BCE(x_fake, y_v) + λ_5 · l_CE(x_real, y_real) + λ_6 · l_SL-CE(x_fake, ŷ) + λ_7 · l_triplets(x_a, x_p, x_n)    (12)

where x_real and x_fake denote the real image and the generated image respectively, y_real denotes the hard label of the real image, ŷ denotes the smooth label of the generated image, y_v is the real/fake two-class label of equation (10), <x_a, x_p, x_n> are the triplet samples constructed with equation (8), and λ_i (i = 3, 4, ..., 7) denote trade-off parameters, all set to 1.
Step 7, model training and testing
Adam is used as the optimizer, the learning rate is set to 0.0002, the batch size b is set to 128, and the number of training epochs is set to 200. The model constructed in steps 2 to 6 is trained as follows.
(1) b vectors z' are randomly generated according to equation (1) and input into the generator G to obtain b generated images;
(2) the b generated images are mixed with b real images and input into the discriminator D; each loss is computed according to equation (12) and the weights of the discriminator D are updated;
(3) the generator loss is computed according to equation (11) and the weights of the generator G are updated;
(4) processes (1)-(3) are repeated until the set number of training iterations is reached.
After training, the discriminator D in the model designed in steps 2-6 is retained; a test sample is input into the discriminator D, and the prediction result output by the class prediction sub-network gives the sample class.
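A condensed sketch of the training loop, reusing the helpers sketched earlier (make_generator_input, smooth_labels, sl_ce_loss); for brevity the triplet term of equation (12) is omitted, real and generated batches are passed through D separately rather than shuffled together, and all trade-off parameters are 1 as stated:

```python
import torch

def train(G, D, real_loader, num_classes=5, epochs=200, lr=2e-4):
    """Alternating optimization of steps (1)-(4) with Adam, per equations (11) and (12)."""
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    bce = torch.nn.BCELoss()
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x_real, y_real in real_loader:
            b = x_real.size(0)
            z_prime, y_fake = make_generator_input(b)        # step (1), equation (1)
            x_fake = G(z_prime)
            # step (2): discriminator update (equation (12), triplet term omitted)
            logits_r, p_r, _ = D(x_real)
            logits_f, p_f, _ = D(x_fake.detach())
            loss_d = (bce(p_r.squeeze(1), torch.zeros(b))    # y_v = 0 for real images
                      + bce(p_f.squeeze(1), torch.ones(b))   # y_v = 1 for generated images
                      + ce(logits_r, y_real)                 # l_CE with hard labels
                      + sl_ce_loss(logits_f, smooth_labels(y_fake, num_classes)))
            opt_d.zero_grad(); loss_d.backward(); opt_d.step()
            # step (3): generator update (equation (11)); G targets the "real" label y_v = 0
            logits_f, p_f, _ = D(x_fake)
            loss_g = (bce(p_f.squeeze(1), torch.zeros(b))
                      + sl_ce_loss(logits_f, smooth_labels(y_fake, num_classes)))
            opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```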
To demonstrate the effectiveness of the method, the algorithm is verified through experiments. The dataset used is OpenSARShip, which contains five types of vessels in total; their numbers are shown in Table 2.
Table 2. Numbers of the different types of vessels
Based on the above vessel types, two sub-datasets are reconstructed as shown in Table 3. First, 100 samples of each ship type are randomly selected as the standard test set T. Then, based on the class with the fewest samples, a class-balanced three-class training set D1 and a class-balanced five-class training set D2 are reconstructed, thereby forming two classification tasks, D1 and D2.
Table 3. Numbers of training and test samples for the two constructed classification tasks (D1 and D2)
To ensure fair comparison, the input image size is set to 1×64×64 pixels in all experiments. Adam is used as the optimizer with a learning rate of 0.0002. The training batch size is set to 128 and the number of training epochs to 200. The trade-off parameters λ_i (i = 1, 2, ..., 7) are set to 1.
To verify the effectiveness of label smoothing, ε is set to 0.1, 0.2, 0.3, 0.4, 0.5, and 0.6 in turn, and classification experiments are performed on D1; the results are shown in Table 4. The algorithm performs relatively stably under different label smoothing parameters, reaching an optimal accuracy of 88.33% at ε = 0.4. Therefore, ε is fixed at 0.4 in all subsequent experiments.
Table 4. Accuracy with different label smoothing parameters on D1.
To verify the effect of the feature dimension m of z' in equation (1), m is set to 64, 128, 256, and 512 respectively. As Table 5 shows, accuracy decreases when the feature dimension is too small or too large. This may be because features of smaller dimension have weaker representation capability, while a model with an oversized dimension may overfit the limited number of samples. Therefore, the feature dimension is fixed at 128 in the following experiments.
Table 5. Accuracy with different feature dimensions on the D1 task.
To further demonstrate the effectiveness of the label smoothing method of equation (6) and the triplet loss of equation (9), an ablation experiment tests the algorithm under different combinations of loss functions; the results are shown in Table 6. Here, "CE loss" indicates that hard labels are used for the generated images, i.e., ε = 0, in which case the smooth label of equation (6) degenerates into a hard label; "SL-CE loss" indicates that the designed label smoothing method is used for the generated images; and "Triplet loss" indicates that the triplet loss is used; following the result in Table 4, ε is set to 0.4. Embedding the SL-CE loss and the Triplet loss yields better results than the model optimized with the CE loss, and the "SL-CE loss + Triplet loss" combination achieves 88.33%, outperforming the baseline method (75.33%). Therefore, integrating the label smoothing method and the triplet loss is critical to improving ship classification performance.
Table 6. Accuracy with different combinations of loss functions for the three-class task on D1.
To demonstrate the effectiveness of the algorithm, the proposed algorithm is compared with existing classification algorithms, including ACGAN, ResNet18, DenseNet121, and Xception.
Table 7. Accuracy of different algorithms on the three-class task on D1
The three-class results of the different algorithms are shown in Table 7. The proposed LST-ACGAN achieves the best accuracy of 88.33%, which is 24.66% higher than ACGAN (63.67%), 9.66% higher than ResNet18 (78.67%), 5.33% higher than DenseNet121 (83.00%), and 7.00% higher than Xception (81.33%). Furthermore, combining the proposed method with DenseNet121 and Xception improves accuracy by 4.33% and 2.67% respectively over the original algorithms.
The confusion matrices of the different algorithms are shown in fig. 4. The classification accuracy of LST-ACGAN on these three ship types is superior to most of the previous methods, indicating the effectiveness of the designed method.
The five-class results of the different algorithms are shown in Table 8. The proposed LST-ACGAN achieves the best accuracy of 56.40%, which is 15.00% higher than ACGAN (41.40%), 2.40% higher than ResNet18 (54.00%), 5.10% higher than DenseNet121 (51.30%), and 6.32% higher than Xception (50.08%). In addition, by replacing the feature extraction network and class prediction sub-network in the discriminator D with a DenseNet121 or Xception network, the proposed method combined with DenseNet121 and Xception improves accuracy by 0.8% and 0.6% respectively over the original methods.
Table 8. Accuracy of the algorithms on the five-class task on D2
The confusion matrices of the different algorithms are shown in fig. 5. The classification accuracy of LST-ACGAN on these five ship types is superior to most of the previous methods.
It should be understood that the parts of this specification not described in detail belong to the prior art. The above examples merely illustrate preferred embodiments of the invention and are not intended to limit its scope; various modifications and improvements made by those skilled in the art to the technical solution of the invention without departing from the spirit of its design shall fall within the protection scope defined by the claims.

Claims (7)

1. The SAR image ship target generation and classification method is characterized by comprising the following steps of:
step 1, training test set construction:
collecting SAR ship target images and dividing training and test sample sets; then augmenting the training samples so that each class has the same number of training samples as the class with the most training samples;
step 2, generator G construction:
constructing a generator G based on a DCNN to generate images x_fake with class labels consistent with the training sample distribution;
step 3, designing a network structure of the discriminator D;
the input of the discriminator D is a data block in which the real images x_real and the generated images x_fake output by the generator G in step 2 are shuffled together; after passing through a feature extraction network and three multi-task learning sub-networks, the triplet features, class prediction results, and authenticity prediction results are output, and the corresponding losses are used to update the network parameters of the discriminator D and the generator G; the three multi-task learning sub-networks are the triplet feature learning sub-network, the class prediction sub-network, and the authenticity prediction sub-network;
step 4, classification sub-network smooth label construction and loss function construction:
the input image of the discriminator D constructed in step 3 is the shuffled data block of real images x_real and generated images x_fake; the two kinds of images are input into the class prediction sub-network; for the real image x_real, the class prediction loss of the real image x_real is computed with a hard-label cross entropy loss; for the generated image x_fake, the class prediction loss of the generated image is computed with smooth labels.
Step 5, triple loss construction:
inputting the feature vectors learned by the feature extraction network of the step-3 discriminator into a fully connected layer (FC) to obtain the output f_triplets of the triplet feature learning sub-network, and optimizing the triplet features with the triplet loss to learn class-related discriminative features.
Step 6, constructing a generator and a discriminator loss function:
and (3) respectively constructing the total loss functions of the generator G and the discriminator D based on the network constructed in the steps 2 and 3 and the loss functions constructed in the steps 4 and 5, and respectively performing alternating optimization on the generator and the discriminator by using the constructed total loss functions.
Step 7, model training and testing:
training the model constructed in the steps 2 to 6, and classifying the input test sample by using a discriminator after the training is finished.
2. The method for generating and classifying the ship target of the SAR image set forth in claim 1, wherein the step 2 generator G is constructed by:
generating an image x_fake consistent with the training image distribution from the input vector z'; the input vector z' is obtained from equation (1), in which embedding(y_fake) denotes the length-m vector converted from the class label y_fake; the conversion proceeds as follows: assuming the classification task contains C classes {1, ..., C}, a random matrix M of dimension C × m is first randomly generated; a class label y_fake ∈ {1, ..., C} is then randomly generated and converted into a one-hot label y_oh of dimension 1 × C; computing y_oh × M yields the length-m vector embedding(y_fake); this vector embedding(y_fake) is multiplied element-wise with a randomly generated noise vector z of length m to finally obtain the generator input vector z':

z' = embedding(y_fake) ⊙ z    (1)

where ⊙ denotes element-wise multiplication;
then the generator G takes the input vector z' as input and outputs a generated image x_fake; the generator G is constructed based on a DCNN and comprises one conversion module, two UC modules, and one generation module; for the input vector z', the conversion module converts z' from a one-dimensional vector into a three-dimensional feature map through a fully connected layer (FC), a Reshape operation, and batch normalization (BN), and then feeds it into the UC modules; each UC module first upsamples the three-dimensional feature map output by the previous module by a factor of 2 through an Upsampling layer, and then performs feature learning through a convolution layer (Conv), a batch normalization layer (BN), and an activation layer (LeakyReLU); in the convolution layer Conv, c is the number of convolution kernels, k is the kernel size, s is the stride, and p is the padding parameter; β is the slope coefficient of the LeakyReLU; finally, the generation module reduces the three-dimensional feature map output by the UC2 module through a convolution layer Conv and constrains its values to (-1, 1) through a Tanh function.
3. The method for generating and classifying the ship target of the SAR image according to claim 1, wherein in the step 3, the design of the network structure of the discriminator D is characterized in that:
adopting a pretrained ResNet18 with the FC layer removed as the feature extraction network; b input images are fed into the feature extraction network to obtain b feature maps, which are converted into b feature vectors through a global average pooling layer (GAP) and a Reshape layer and then normalized with L2 normalization, finally yielding b normalized feature vectors; these feature vectors are then fed into the three multi-task learning sub-networks respectively;
the class prediction sub-network obtains the class prediction value of the image input to the discriminator D through a softmax function:

q_i = exp(p_i) / Σ_{j=1}^{C} exp(p_j)    (2)

where C denotes the number of classes, p_i, p_j (i, j = 1, 2, ..., C) denote the components of the classification vector input to the softmax, and q_i denotes the probability that the input image belongs to the i-th class, i.e., the class prediction value;
the authenticity prediction sub-network maps its output to (0, 1) with a sigmoid function to obtain the authenticity prediction value of the input image:

p(v) = 1 / (1 + exp(-v))    (3)

where v denotes the two-class output and p(v) denotes the probability that the input image belongs to the real class, i.e., the authenticity prediction value.
4. The method for generating and classifying the ship target of the SAR image according to claim 1, wherein in the step 4, the classification sub-network smooth label construction and the loss function construction are as follows:
inputting the two kinds of images into the class prediction sub-network to obtain the class prediction value q_i; the losses are then calculated separately:
for the real image x_real, the loss is computed directly with a hard label, which can be expressed as:

y = [y_1, y_2, ..., y_C], y_i ∈ {0, 1}    (4)

where y_i (i = 1, 2, ..., C) denotes a one-hot encoded vector, with y_i = 1 when the input real image x_real belongs to the i-th class; the class prediction sub-network outputs the probability q_i that the real image x_real belongs to each class through the softmax function (see equation (2)), and the class prediction loss l_CE of the real image x_real is then computed with the cross entropy loss:

l_CE = -Σ_{i=1}^{C} y_i log(q_i)    (5)
For the generated image x_fake, a smooth label ŷ is computed and assigned to the generated image x_fake output by the generator G to compute the class prediction loss of the generated image; the computed smooth label ŷ is a vector of length C, ŷ = [ŷ_1, ŷ_2, ..., ŷ_C], where ŷ_i denotes the smooth label of the generated image x_fake at class i, meaning the probability that the image belongs to class i, and can be computed with equation (6):

ŷ_i = 1 - ε if i is the labeled class, ŷ_i = ε / (C - 1) otherwise    (6)

where ε is a hyperparameter and C denotes the number of classes; for the generated image x_fake, the class prediction value q_i is obtained from equation (2), and the class prediction loss l_SL-CE of the generated image can be expressed as:

l_SL-CE = -Σ_{i=1}^{C} ŷ_i log(q_i)    (7)
5. the method for generating and classifying the ship target of the SAR image according to claim 1, wherein in the step 5 triplet loss construction:
let x_a be a randomly sampled real image, x_p a real image of the same class as x_a, and x_n a real image of a different class from x_a, and let f(x) be the feature vector obtained after a real image passes through the feature extraction network and the triplet feature learning sub-network; input images whose features f(x) do not satisfy the following distance constraint are selected to construct triplet samples <x_a, x_p, x_n>:

d(f(x_a), f(x_p)) + α < d(f(x_a), f(x_n))    (8)

where d(·) denotes the Euclidean distance and α is the classification margin;
for samples satisfying the constraint of equation (8), the loss value is 0; for triplet samples that do not satisfy equation (8), the loss value is the difference between the distances from x_a to x_p and from x_a to x_n in the feature space learned by f(x), i.e., the triplet loss l_triplets:

l_triplets = d(f(x_a), f(x_p)) - d(f(x_a), f(x_n))    (9).
6. The method for generating and classifying the ship target of the SAR image according to claim 1, wherein in the step 6 generator and the construction of the discriminator loss function:
first, for the authenticity prediction sub-network in the discriminator D, the authenticity prediction loss l_BCE is computed with a two-class cross entropy loss function to distinguish the real image x_real from the generated image x_fake, and can be expressed as:

l_BCE = -(y_v · log(p(v)) + (1 - y_v) · log(1 - p(v)))    (10)

where y_v denotes the real/fake two-class label, with y_v = 0 when the input image is a real image and y_v = 1 when the input image is a generated image; p(v) is the authenticity prediction value of the input image computed with equation (3);
then the overall loss function L_G of the generator G is constructed; it consists of the authenticity prediction loss l_BCE and the class prediction loss l_SL-CE of the generated image:

L_G = λ_1 · l_BCE(x_fake, y_v) + λ_2 · l_SL-CE(x_fake, ŷ)    (11)

where ŷ denotes the smooth label of the generated image computed according to equation (6), and λ_1, λ_2 denote trade-off parameters;
finally, the overall loss function of the discriminator D is constructed; it is optimized with the authenticity prediction loss l_BCE of equation (10), the class prediction loss l_SL-CE of the generated image in equation (7), the class prediction loss l_CE of the real image in equation (5), and the triplet loss l_triplets of equation (9); the overall loss function L_D of the discriminator D is:

L_D = λ_3 · l_BCE(x_real, y_v) + λ_4 · l_BCE(x_fake, y_v) + λ_5 · l_CE(x_real, y_real) + λ_6 · l_SL-CE(x_fake, ŷ) + λ_7 · l_triplets(x_a, x_p, x_n)    (12)

where x_real and x_fake denote the real image and the generated image respectively, y_real denotes the hard label of the real image, ŷ denotes the smooth label of the generated image, y_v is the real/fake two-class label of equation (10), <x_a, x_p, x_n> are the triplet samples constructed with equation (8), and λ_i (i = 3, 4, ..., 7) denote trade-off parameters.
7. The method for generating and classifying the ship target of the SAR image according to claim 1, wherein in the step 7 model training and testing:
using Adam as an optimizer, the model constructed by steps 2 to 6 is trained as follows:
(1) b vectors z' are randomly generated according to equation (1) and input into the generator G to obtain b generated images;
(2) the b generated images are mixed with b real images and input into the discriminator D; each loss is computed according to equation (12) and the weights of the discriminator D are updated;
(3) the generator loss is computed according to equation (11) and the weights of the generator G are updated;
(4) processes (1)-(3) are repeated until the set number of training iterations is reached.
After training, the discriminator D in the model designed in steps 2-6 is retained; a test sample is input into the discriminator D, and the prediction result output by the class prediction sub-network gives the sample class.
CN202310771564.4A 2023-06-27 2023-06-27 SAR image ship target generation and classification method Pending CN116895016A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310771564.4A CN116895016A (en) 2023-06-27 2023-06-27 SAR image ship target generation and classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310771564.4A CN116895016A (en) 2023-06-27 2023-06-27 SAR image ship target generation and classification method

Publications (1)

Publication Number Publication Date
CN116895016A true CN116895016A (en) 2023-10-17

Family

ID=88310162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310771564.4A Pending CN116895016A (en) 2023-06-27 2023-06-27 SAR image ship target generation and classification method

Country Status (1)

Country Link
CN (1) CN116895016A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576164A (en) * 2023-12-14 2024-02-20 中国人民解放军海军航空大学 Remote sensing video sea-land movement target tracking method based on feature joint learning
CN117576164B (en) * 2023-12-14 2024-05-03 中国人民解放军海军航空大学 Remote sensing video sea-land movement target tracking method based on feature joint learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination