CN114067177B - Remote sensing image classification network robustness improving method based on self-supervision learning - Google Patents


Info

Publication number
CN114067177B
Authority
CN
China
Prior art keywords
network
remote sensing
sensing image
model
robustness
Prior art date
Legal status
Active
Application number
CN202111368092.5A
Other languages
Chinese (zh)
Other versions
CN114067177A (en)
Inventor
孙浩
徐延杰
雷琳
计科峰
匡纲要
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202111368092.5A
Publication of CN114067177A
Application granted
Publication of CN114067177B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The invention provides a method for improving the robustness of a remote sensing image classification network based on self-supervised learning. The method uses not only labeled data but also the large amount of unlabeled data available in the remote sensing field, and effectively improves the robustness of the model by mining image information through a twin network. The twin network extracts features from clean samples and adversarial samples to obtain feature vectors, and model training is completed through contrastive learning that pulls the feature vectors of the clean sample and the adversarial sample toward each other, so that an image has a stable representation in the depth remote sensing image encoder network within the online network of the twin network, thereby improving robustness. The method effectively enhances the robustness of the model against adversarial noise and natural noise, hardly affects the classification performance on clean data sets, and is convenient to apply.

Description

Remote sensing image classification network robustness improving method based on self-supervision learning
Technical Field
The invention relates to the crossing field of deep learning and remote sensing, in particular to a remote sensing image classification network robustness improving method based on self-supervision learning.
Background
In recent years, neural networks have achieved breakthroughs in many fields such as computer vision and natural language processing. In remote sensing image classification applications, a neural network inevitably operates on various unknown remote sensing data sets containing large amounts of diverse noise. Although such noise has no influence on human recognition, it can often induce a deep neural network to make wrong judgments, and these wrong judgments pose a serious security threat to the application of neural networks in remote sensing image classification.
The fact that tiny noise imperceptible to the human eye can lead a deep neural network to a completely wrong judgment highlights the importance of interpretable deep learning: on what basis does the neural network make its classification decision, and how can the stability and expressive power of the deep learning model be further improved? Training robust, interpretable deep neural networks is therefore a higher-level goal.
Meanwhile, with the rapid development of remote sensing, a large number of remote sensing data sets are continuously emerging. Manual labeling is time-consuming and labor-intensive and cannot keep up with the rapid growth of remote sensing data volume. How to use the large number of unlabeled data sets to further improve the robustness and expressive power of remote sensing image classification networks highlights the potential and importance of self-supervised learning. Therefore, in recent research, improving the overall performance of models through self-supervised learning has received considerable attention.
Against this background, in order to improve the ability of remote sensing image classification models to defend against adversarial samples, a large number of adversarial defense methods have been proposed. Some scholars proposed gradient masking, which makes the model gradient non-computable or non-differentiable so as to evade conventional gradient-based attack methods. However, gradient-masking-based methods have been shown to defend against adversarial attacks only in rather limited cases; the Backward Pass Differentiable Approximation (BPDA) attack can completely bypass gradient masking and carry out an effective attack on the network. Adversarial Training adds adversarial samples generated by a specified attack method to the training set and retrains the neural network, which improves the defense capability of the model to a certain extent, but it requires a large amount of labeled data, is unstable with respect to natural noise, and can reduce the model's ability to recognize clean samples, so it is ill-suited to the remote sensing field, where labels are expensive to produce.
In summary, a remote sensing image classification network robustness improvement method based on self-supervision learning is urgently needed to solve the problems in the prior art.
Disclosure of Invention
The invention aims to provide a method for improving the robustness of a remote sensing image classification network based on self-supervised learning, and to solve the problem that the prior art cannot fully utilize the large amount of unlabeled data in remote sensing resources to improve model performance. The specific technical scheme is as follows:
a remote sensing image classification network robustness improving method based on self-supervision learning comprises the following steps:
step S1: preprocessing the tag-free remote sensing image data, copying the preprocessed data into two parts, using a twin network model to carry out counterattack on one part to produce a counterattack sample data set, and carrying out data amplification on the other part to obtain a clean sample data set;
step S2: obtaining a characteristic vector of the clean sample data set through a target network in a twin network, obtaining a characteristic vector of the confrontation sample data set through an online network in the twin network, and then obtaining the contrast loss of the two characteristic vectors;
step S3: updating the twin network: firstly, the online network carries out gradient pass-back updating according to contrast loss, then the newly obtained online network and the original target network are used for carrying out exponential moving average, and the target network in the twin network is updated, so that the updating of the whole twin network model is realized;
step S4: iteratively executing the steps S1 to S3 to count c rounds, and finishing the training for improving the robustness of the twin network model; wherein c is a positive integer;
step S5: and extracting a depth remote sensing image encoder network in an online network in the twin network model after the c-round training, adding a full connecting layer to the depth remote sensing image encoder network to form a classification model, and then carrying out fine adjustment by using labeled data to finally obtain a robust classification model.
Preferably, in the above technical solution, the adversarial attack is carried out in one of the following two ways.
One is the PGD attack based on gradient iteration, as in formula 1):
x_{n+1}^{adv} = Π_{x,S}( x_n^{adv} + α·sign(∇_x L_{θ,ξ}) )    formula 1),
wherein x_n^{adv} is the adversarial sample obtained by iteratively moving the preprocessed data x n times along the gradient direction in the twin network, Π_{x,S}(·) denotes the projection onto the sphere with center x and radius S, ε is the perturbation limit, α is the attack step length, L_{θ,ξ} is the contrast loss of the twin network model, ξ is the target network parameter, and θ is the online network parameter;
The other is the SSP attack based on a self-supervised pseudo-gradient, as specified in formulas 2)-3):
x_{n+1}^{adv} = x_n^{adv} + α·sign( ∇_x ‖ ψ(x_n^{adv}) − ψ(x) ‖_2 )    formula 2),
‖ x_{n+1}^{adv} − x ‖_∞ ≤ ε    formula 3),
wherein ψ is the depth remote sensing image encoder network in the online network, ψ(x) is the output of the depth remote sensing image encoder network, x_n^{adv} is the adversarial sample obtained by iteratively moving the preprocessed data x n times along its gradient direction in the depth remote sensing image encoder network, α is the attack step length, n is the number of gradient iterations, and ∞ indicates that the attack is carried out under the limit of the infinite norm.
Preferably, in the above technical solution, the online network is composed of an encoder, a projector and a predictor, the target network is composed of an encoder and a projector, and initial parameters of the online network and the target network are set differently.
Preferably, in the above technical solution, the Euclidean distance between the feature vectors is used as the contrast loss, specifically as shown in formula 4):
L_{θ,ξ} = ‖ f̄_θ(x^{adv}) − q̄_ξ(x') ‖_2^2    formula 4),
wherein L_{θ,ξ} is the contrast loss, q is the target network, ξ is the target network parameter, f is the online network, θ is the online network parameter, x^{adv} is the adversarial sample data set, x' is the clean sample data set, q̄_ξ(x') is the normalized value of the clean sample feature vector, and f̄_θ(x^{adv}) is the normalized value of the adversarial sample feature vector.
Preferably, in the above technical solution, the online network is updated by gradient back-propagation according to the contrast loss, specifically as shown in formula 5):
θ ← optimizer(θ, ∇_θ L_{θ,ξ}, lr)    formula 5),
wherein optimizer denotes the optimization operation, θ is the online network parameter, ∇_θ L_{θ,ξ} is the gradient of the loss function with respect to the online network parameter, ξ is the target network parameter, L_{θ,ξ} is the contrast loss of the twin network model, and lr is the learning rate.
Preferably, in the above technical solution, the target network is updated from the online network parameters using an exponential moving average, specifically as shown in formula 6):
ξ ← τξ + (1 − τ)θ    formula 6),
wherein θ is the online network parameter, ξ is the target network parameter, and τ is the retention index, which controls the update speed of the network.
Preferably, in the above technical solution, the fine-tuning by using the labeled data specifically includes: firstly, inputting an image with label data into a depth remote sensing image encoder model, then taking the label of the data as an expected result of encoder model training, optimizing encoder model parameters through gradient feedback, limiting the maximum value of single change of the encoder model parameters in the optimization process, finally enabling the encoder model to have robust remote sensing image classification performance, and finishing the training of the classification model.
Preferably, in the above technical solution, the pretreatment includes the steps of:
step S1.1: performing cutting operation on all the images to standardize the images to a uniform size;
step S1.2: normalizing the numerical values of the images, namely compressing the pixel values of all the images to be between 0 and 1;
step S1.3: the image data is linearly transformed by a normalization operation into a data set with a mean of 0 and a variance of 1.
Preferably, batch standardization is used in the twin network to improve the model performance, specifically as shown in formula 7):
A = (a − ā) / √Var(a),  BN(a) = γ·A + β    formula 7),
wherein a is the batch data in the twin network, including the clean sample data set and the adversarial sample data set that are input at the beginning, as well as the feature vectors of the clean sample data set and of the adversarial sample data set that are later extracted through the twin network; A is the normalized transition batch data, ā is the mean of the batch data, Var(a) is the variance of the batch data a, and γ and β are parameters to be learned.
Preferably, in the above technical solution, the data amplification is performed by at least one of random clipping, color matching, graying, random flipping, random rotation, and gaussian noise adding.
The technical scheme of the invention has the following beneficial effects:
the method provided by the invention not only utilizes the labeled data, but also fully utilizes the label-free data existing in a large amount in the remote sensing field, and effectively improves the robustness of the model by mining the information of the image through the twin network. And performing feature extraction on the clean sample and the countermeasure sample by using the twin network to obtain a feature vector, and completing model training by comparing and learning the feature vectors of the approaching clean sample and the countermeasure sample, so that the image has stable expression in a depth remote sensing image encoder network in an online network in the twin network, and further the improvement of robustness is realized. The method effectively enhances the robustness of the model to resisting sample noise and natural noise, hardly influences the classification effect of a clean data set, and is convenient to apply.
The method for improving the robustness of the remote sensing image classification model is combined with the self-supervision learning process, and the comparison learning is carried out aiming at the anti-attack means and the data amplification means, so that the characteristic expression of the classifier model on the same image is more stable. Considering that the network obtained by self-supervision contrast learning is difficult to be directly applied to the classification task of the remote sensing image, a small amount of labeled data is adopted for fine adjustment, gradient truncation is carried out during fine adjustment training (namely, the maximum value of single change of the parameters of the encoder model is limited), gradient explosion is prevented, and the performance of the classification model is improved. On the basis of ensuring the classification accuracy of the model to the clean samples, the method realizes the defense to the confrontation samples and improves the robustness of the model; in addition, the method has small demand for labeled data, is very suitable for the current situation that the labeling in the remote sensing field wastes time and labor, has low cost and has high practicability.
The method of the invention is still effective without using negative samples, which is due to the implicit contrast gain brought by batch standardization, so that batch standardization is not only an effective means for improving the performance of the twin network model, but also a basis for training the twin network model.
In addition to the objects, features and advantages described above, other objects, features and advantages of the present invention are also provided. The present invention will be described in further detail below with reference to the drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method for improving robustness of a classification model of a remote sensing image provided by the invention;
FIG. 2 is a model structure diagram of the method for improving robustness of the remote sensing image classification model provided by the invention.
Detailed Description
In order that the invention may be more fully understood, a more particular description of the invention will now be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Example 1:
At present, deep learning technology has developed rapidly and has been deeply integrated with remote sensing technology, producing revolutionary results, but the vulnerability of deep learning also leaves hidden dangers for its application in the remote sensing field, where the requirements on safety and stability are extremely high. An adversarial attack can make a deep network output a result completely different from the original one by adding well-designed tiny noise to the original image, seriously threatening the security of remote sensing detection and recognition.
In remote sensing detection and recognition, a model must cope with a large amount of natural noise such as cloud and fog occlusion, defocus blur, wind, frost, rain and snow, and digital noise, as well as carefully designed man-made interference such as military camouflage, so the requirements on model robustness are high.
In contrast, the embodiment provides a method for improving robustness of a remote sensing image classification network based on self-supervised learning, as shown in fig. 1 and fig. 2, the method specifically includes the following steps:
step S1: preprocessing the tag-free remote sensing image data, copying the preprocessed data into two parts, using a twin network model to carry out counterattack on one part to produce a counterattack sample data set, and carrying out data amplification on the other part to obtain a clean sample data set;
preferably, the pretreatment comprises the following steps:
step S1.1: performing cutting operation on all the images to standardize the images to a uniform size;
step S1.2: normalizing the numerical values of the images, namely compressing the pixel values of all the images to be between 0 and 1;
step S1.3: the image data is linearly transformed by a normalization operation into a data set with a mean of 0 and a variance of 1.
In this embodiment, when preprocessing the data, each image is first resized to 256 × 256 and then center-cropped to obtain a 224 × 224 data set; the image values are then normalized to compress the data range to 0 to 1, and standardization is applied to convert each picture into approximately normally distributed data with a mean of 0 and a variance of 1. The preprocessed data is then duplicated into two copies, one for producing the adversarial sample data set and one for obtaining the clean sample data set.
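As an illustration only, the preprocessing pipeline described above could be written with torchvision as follows; the per-channel mean and standard deviation shown are the common ImageNet values, used purely as placeholder assumptions because the embodiment does not state the exact statistics.

```python
# Preprocessing sketch (assumption: torchvision; the normalization statistics
# are the usual ImageNet values, shown only as placeholders).
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),   # step S1.1: adjust the image to 256 x 256
    transforms.CenterCrop(224),      # step S1.1: center crop to 224 x 224
    transforms.ToTensor(),           # step S1.2: pixel values compressed to [0, 1]
    transforms.Normalize(            # step S1.3: standardize to ~zero mean, unit variance
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
])
```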
In this embodiment, data amplification is performed by at least one of random clipping, color matching, graying, random flipping, random rotation, and gaussian noise adding. The diversity of the distribution of clean sample data is improved under the condition that the example level label of the preprocessed data is not changed.
The choice of data amplification is not fixed and can be adjusted according to the actual task: random cropping is recommended when the size of targets in the images varies greatly, and color transformation and graying are recommended when the images have strong color variation; this can effectively improve the robustness of the model.
In addition, data amplification is not performed only once. Remote sensing data are complex and variable, a single amplification would collapse image diversity, and the training of the twin network depends heavily on data diversity, so the data amplification must be re-applied in every training round to guarantee twin network performance. Meanwhile, the choice of data amplification and of the adversarial samples is flexible and can be adapted to the characteristics of tasks such as camouflage reconnaissance and defogging.
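For illustration, one possible data amplification pipeline combining the operations listed above (random cropping, color adjustment, graying, random flipping, random rotation and Gaussian noise) is sketched below with torchvision; all probabilities and magnitudes are assumptions to be tuned to the actual task, and, as noted above, the amplification should be re-sampled in every training round.

```python
# Data amplification sketch (assumption: torchvision; operates on PIL images,
# magnitudes and probabilities are illustrative only).
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Add zero-mean Gaussian noise to a tensor image and keep values in [0, 1]."""
    def __init__(self, std=0.05):
        self.std = std
    def __call__(self, img):
        return (img + torch.randn_like(img) * self.std).clamp(0.0, 1.0)

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),   # random cropping
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),            # color adjustment
    transforms.RandomGrayscale(p=0.2),                     # graying
    transforms.RandomHorizontalFlip(p=0.5),                # random flipping
    transforms.RandomRotation(degrees=15),                 # random rotation
    transforms.ToTensor(),
    AddGaussianNoise(std=0.05),                            # Gaussian noise
])
```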
Step S2: the clean sample data set is passed through the target network of the twin network to obtain its feature vector q_ξ(x'), and the adversarial sample data set is passed through the online network of the twin network to obtain its feature vector f_θ(x^{adv}); the contrast loss between the two feature vectors is then obtained;
as shown in fig. 2, the online network is composed of an encoder, a projector and a predictor, the target network is composed of an encoder and a projector, and the two networks are not completely symmetrical; although some of the two networks have the same structure, in the embodiment, the initial parameters of the online network and the target network are set differently, so that the difference is increased, and the training is accelerated.
In this embodiment, preferably, Batch Normalization (BN) is used in the twin network to improve the model performance, as follows:
A = (a − ā) / √Var(a),  BN(a) = γ·A + β
wherein BN denotes batch normalization; a is the batch data in the twin network, including the clean sample data set and the adversarial sample data set that are input at the beginning, as well as the feature vectors of the clean sample data set and of the adversarial sample data set that are later extracted through the twin network; A is the normalized transition batch data, ā is the mean of the batch data, Var(a) is the variance of the batch data a, and γ and β are parameters to be learned, which are updated when the model is optimized.
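As a small illustrative check (not part of the claimed method), the batch standardization above corresponds to the standard BatchNorm operation; the sketch below compares a manual implementation with torch.nn.BatchNorm1d in training mode (a small epsilon is added inside the square root for numerical stability, as in the library implementation).

```python
# Sketch: the batch standardization formula versus torch.nn.BatchNorm1d.
import torch
import torch.nn as nn

a = torch.randn(32, 256)                      # a batch of 256-dimensional feature vectors
bn = nn.BatchNorm1d(256, affine=True)
bn.train()                                    # use batch statistics, as during training

out_builtin = bn(a)

# Manual form: A = (a - mean) / sqrt(Var(a) + eps); BN(a) = gamma * A + beta
A = (a - a.mean(dim=0)) / torch.sqrt(a.var(dim=0, unbiased=False) + bn.eps)
out_manual = bn.weight * A + bn.bias          # gamma = weight, beta = bias (learned)

print(torch.allclose(out_builtin, out_manual, atol=1e-5))   # expected: True
```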
The choice of twin model is not fixed and may be modified according to the amount and type of training data. The encoder is composed of a number of convolution layers; a batch standardization operation is applied to the output every two to three convolution layers, and a rectified linear unit (ReLU) activation layer is used to activate the neurons, which can effectively improve model performance.
Specifically, the encoder is a deep convolutional network with image depth feature extraction capability, in this embodiment, specifically, a backbone network of a ResNet18 network (with the last full-connection layer removed, and 17 convolutional layers in total) is used as the encoder, and the final output dimension is 25088;
the projector and the predictor both use a multilayer perceptron MLP, and the structure of the MLP is that a full connection layer with an output characteristic dimension of 256 is added, Batch Normalization (BN) is added, an activation layer (activation function ReLU) is added, so that the model has the nonlinear classification capability, and finally the full connection layer with the output characteristic dimension of 256 is connected. The activation function ReLU may be represented by the following formula:
Figure BDA0003361588890000072
where s is a parameter into the active layer.
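A sketch of one possible implementation of the twin-network components described above is given below; the encoder follows the embodiment (ResNet18 backbone without the final fully connected layer, flattened output dimension 25088), while everything else (function names, torchvision version, weight initialization) is an illustrative assumption.

```python
# Architecture sketch (assumptions: PyTorch, torchvision >= 0.13; dimensions
# follow the embodiment: flattened ResNet18 features of size 25088 for a
# 224 x 224 input, MLP hidden/output dimension 256).
import torch.nn as nn
from torchvision.models import resnet18

def make_encoder():
    """ResNet18 backbone; drop the average pooling and final FC layer,
    so a 224 x 224 input yields a flattened 512 x 7 x 7 = 25088 feature."""
    backbone = resnet18(weights=None)
    return nn.Sequential(*list(backbone.children())[:-2], nn.Flatten())

def make_mlp(in_dim, hidden_dim=256, out_dim=256):
    """Projector/predictor: FC -> BN -> ReLU -> FC, as described above."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )

class OnlineNetwork(nn.Module):
    """Encoder + projector + predictor."""
    def __init__(self):
        super().__init__()
        self.encoder = make_encoder()
        self.projector = make_mlp(25088)
        self.predictor = make_mlp(256)
    def forward(self, x):
        return self.predictor(self.projector(self.encoder(x)))

class TargetNetwork(nn.Module):
    """Encoder + projector only (no predictor); initialized separately."""
    def __init__(self):
        super().__init__()
        self.encoder = make_encoder()
        self.projector = make_mlp(25088)
    def forward(self, x):
        return self.projector(self.encoder(x))
```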
Further, to ensure the stability of training, the clean sample feature vector q_ξ(x') output by the target network and the adversarial sample feature vector f_θ(x^{adv}) output by the online network are first normalized to obtain q̄_ξ(x') and f̄_θ(x^{adv}), respectively. The Euclidean distance between the feature vectors is used as the contrast loss; specifically, the contrastive learning loss function L_{θ,ξ} is defined as the mean square error between the normalized clean sample feature vector and the normalized adversarial sample feature vector output by the online network, as follows:
L_{θ,ξ} = ‖ f̄_θ(x^{adv}) − q̄_ξ(x') ‖_2^2
wherein L_{θ,ξ} is the contrast loss, q is the target network, ξ is the target network parameter, f is the online network, θ is the online network parameter, x^{adv} is the adversarial sample data set, x' is the clean sample data set, q̄_ξ(x') is the normalized value of the clean sample feature vector, and f̄_θ(x^{adv}) is the normalized value of the adversarial sample feature vector.
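Since both feature vectors are normalized before the mean square error is taken, the contrast loss above can equivalently be computed as 2 − 2 times the cosine similarity of the unnormalized vectors; the following sketch (illustrative only) shows both forms and checks that they agree.

```python
# Sketch: the contrast loss as MSE of normalized vectors, and its equivalent
# cosine-similarity form (2 - 2*cos). Purely illustrative.
import torch
import torch.nn.functional as F

def contrast_loss_mse(f_adv, q_clean):
    f_bar = F.normalize(f_adv, dim=-1)      # normalized adversarial sample feature
    q_bar = F.normalize(q_clean, dim=-1)    # normalized clean sample feature
    return ((f_bar - q_bar) ** 2).sum(dim=-1).mean()

def contrast_loss_cosine(f_adv, q_clean):
    return (2.0 - 2.0 * F.cosine_similarity(f_adv, q_clean, dim=-1)).mean()

f = torch.randn(8, 256)
q = torch.randn(8, 256)
print(torch.allclose(contrast_loss_mse(f, q), contrast_loss_cosine(f, q), atol=1e-5))  # True
```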
Step S3: updating the twin network: firstly, the online network carries out gradient pass-back updating according to contrast loss, then the newly obtained online network and the original target network are used for carrying out exponential moving average, and the target network in the twin network is updated, so that the updating of the whole twin network model is realized;
specifically, the online network performs gradient backhaul update according to the contrast loss, and more specifically, performs gradient backhaul only on the online network, and updates the parameters of the online network by using an Adam optimizer according to the gradient obtained by backhaul, which is specifically as follows:
θ←optimizer(θ,▽ θ L θ,ξ ,lr)
wherein optimizer represents optimization operation, theta is an online network parameter + θ Is the gradient direction of the loss function on the on-line network parameter, xi is the target network parameter, L θ,ξ For twin network model contrast loss, lr is the learning rate.
Adam is different from the classical random gradient descent method. The stochastic gradient descent maintains a single learning rate (called alpha) for all weight updates, and the learning rate does not change during the training process. Adam maintains a learning rate for each network weight (parameter) and adjusts individually as learning evolves. The method calculates adaptive learning rates for different parameters from the first and second moments of the gradient. The Adam optimizer has low requirement on a memory and high calculation efficiency, and is favorable for rapid convergence of the model.
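One possible PyTorch form of this update step is sketched below; the function name is illustrative, the learning rate of 0.003 and the 0.01 cap follow the values given later in this embodiment, and clipping the gradient values is used here only as a simple proxy for limiting the single change of a parameter.

```python
# Sketch of the online-network update with Adam (assumption: PyTorch).
import torch

def update_online_network(online_net, x_adv, z_target, loss_fn, optimizer,
                          max_step=0.01):
    """One update of the online network parameters theta from the contrast loss."""
    loss = loss_fn(online_net(x_adv), z_target)          # L_{theta,xi}, target features fixed
    optimizer.zero_grad()
    loss.backward()                                       # gradient returned through theta only
    # Gradient interception: the embodiment caps a single parameter change at
    # 0.01; clipping gradient values is one simple way to bound the update.
    torch.nn.utils.clip_grad_value_(online_net.parameters(), clip_value=max_step)
    optimizer.step()                                      # theta <- optimizer(theta, grad, lr)
    return loss.item()

# Usage sketch: optimizer = torch.optim.Adam(online_net.parameters(), lr=0.003)
```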
The target network is updated from the online network parameters with an exponential moving average: using the just-updated online network, the target network parameters are fed back and updated from the online network parameters as follows:
ξ ← τξ + (1 − τ)θ
wherein θ is the online network parameter, ξ is the target network parameter, and τ is the retention index, which controls the update speed of the network.
To ensure the stability of the training, the update speed of the target network is generally controlled, and the retention index generally has a value close to 1, specifically 0.99 in this embodiment.
Step S4: iteratively executing the steps S1 to S3 until the loss function convergence is met and the iteration times reach c, and finishing the training of improving the robustness of the twin network model; wherein c is a positive integer;
in the process of iteratively executing steps S1-S3, a new countermeasure sample set is made according to the updated twin network, and the counterattack in this embodiment includes the following two ways:
one is a PGD attack based on gradient iteration, as follows:
Figure BDA0003361588890000091
wherein the content of the first and second substances,
Figure BDA0003361588890000092
for the preprocessed data x to iteratively move n times the obtained confrontation sample along the gradient direction in the twin network,
Figure BDA0003361588890000093
is shown in
Figure BDA0003361588890000094
As the projection of a random sphere with a center and a radius of S, epsilon as the perturbation limit, alpha as the attack step length, L θ,ξ Comparing loss of the twin network model, xi is a target network parameter, and theta is an online network parameter;
through a certain number of iterations, a new antagonistic sample to the twin network can be generated to maximize the similarity error with the clean sample. The method for resisting attack is low in calculation consumption, high in attack strength and good in generalization, and is the preferred method for manufacturing the resisting sample in the embodiment.
The other is the SSP attack based on a self-supervised pseudo-gradient, given by the following two formulas:
x_{n+1}^{adv} = x_n^{adv} + α·sign( ∇_x ‖ ψ(x_n^{adv}) − ψ(x) ‖_2 )
‖ x_{n+1}^{adv} − x ‖_∞ ≤ ε
wherein ψ is the depth remote sensing image encoder network in the online network, ψ(x) is the output of the depth remote sensing image encoder network, x_n^{adv} is the adversarial sample obtained by iteratively moving the preprocessed data x n times along its gradient direction in the depth remote sensing image encoder network, α is the attack step length, n is the number of gradient iterations, and ∞ indicates that the attack is carried out under the limit of the infinite norm.
This attack is similar to the PGD attack, but its target is the encoder of the online network: the adversarial sample is produced by accumulating gradient steps that maximize the Euclidean distance to the feature vector of the clean sample.
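Correspondingly, the SSP attack could be sketched as follows; it acts only on the depth remote sensing image encoder ψ of the online network and maximizes the Euclidean distance between the adversarial and clean feature vectors under the ℓ∞ limit. All names and default values are illustrative assumptions.

```python
# SSP attack sketch: maximize ||psi(x_adv) - psi(x)||_2 for the online-network
# encoder psi, under an L-infinity constraint of radius eps.
import torch

def ssp_attack(encoder_psi, x, eps=8/255, alpha=1/255, n_iter=50):
    with torch.no_grad():
        feat_clean = encoder_psi(x)                        # psi(x), kept fixed
    x_adv = x.clone().detach()

    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        feat_adv = encoder_psi(x_adv)                      # psi(x_adv_n)
        loss = (feat_adv - feat_clean).flatten(1).norm(dim=1).mean()   # feature distance
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                        # pseudo-gradient step
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)      # L-infinity limit
    return x_adv.detach()
```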
In the method provided by this embodiment, no negative samples are used and model training is therefore prone to collapse, so the training parameters must be controlled precisely: the learning rate lr is set to 0.003. The magnitude of the adversarial noise generally varies with the actual application requirements; this embodiment uses adversarial samples with a perturbation amplitude of 8/255. Meanwhile, an overly fast network update rate can make training unstable, so gradient interception is used to constrain the training process, and the maximum value of a parameter update (i.e. the maximum single change of a parameter) is set to 0.01.
The iterative process requires increased randomness, random batches of training data for each round, and re-runs for data amplification.
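Putting steps S1 to S4 together, one possible shape of the overall training loop is sketched below; attack_fn and augment_fn stand for the adversarial attack and the data amplification of step S1 (for example, the PGD and augmentation sketches above), and all names and defaults are illustrative assumptions rather than a reference implementation.

```python
# Sketch of the overall loop for steps S1-S4 (assumption: PyTorch; attack_fn
# and augment_fn are user-supplied callables).
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(target_net, online_net, tau=0.99):
    """xi <- tau * xi + (1 - tau) * theta (exponential moving average)."""
    for p_t, p_o in zip(target_net.parameters(), online_net.parameters()):
        p_t.mul_(tau).add_(p_o, alpha=1.0 - tau)

def train_twin_network(online_net, target_net, unlabeled_loader, optimizer,
                       attack_fn, augment_fn, rounds_c=100, max_step=0.01):
    for _ in range(rounds_c):                           # step S4: iterate c rounds
        for x in unlabeled_loader:                      # preprocessed unlabeled images
            x_adv = attack_fn(x)                        # step S1: adversarial copy
            x_clean = augment_fn(x)                     # step S1: amplified clean copy
            with torch.no_grad():
                q_clean = target_net(x_clean)           # step S2: target-network features
            f_adv = online_net(x_adv)                   # step S2: online-network features
            # step S2: contrast loss on the normalized feature vectors
            loss = ((F.normalize(f_adv, dim=-1)
                     - F.normalize(q_clean, dim=-1)) ** 2).sum(dim=-1).mean()
            optimizer.zero_grad()
            loss.backward()                             # step S3: gradient return (online only)
            torch.nn.utils.clip_grad_value_(online_net.parameters(), max_step)
            optimizer.step()                            # step S3: update the online network
            ema_update(target_net, online_net)          # step S3: update the target network
    return online_net                                   # step S5 extracts its encoder later
```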
Step S5: and extracting a depth remote sensing image encoder network in an online network in the twin network model after the c-round training, adding a full connecting layer to the depth remote sensing image encoder network to form a classification model, and then carrying out fine adjustment by using labeled data to finally obtain a robust classification model.
The fine adjustment using the tagged data in this embodiment specifically includes: firstly, inputting an image with label data into a depth remote sensing image encoder model, then taking the label of the data (indicating the label data) as an expected result of encoder model training, returning and optimizing encoder model parameters through gradient, limiting the maximum value of single change of the encoder model parameters, namely gradient truncation, in the optimization process, finally enabling the encoder model to have robust remote sensing image classification performance, and finishing the training of the classification model.
Specifically, firstly, a backbone network of an on-line network depth remote sensing image encoder, namely ResNet18, is extracted from a trained twin network for use by downstream tasks, and then a full connection layer is connected behind the encoder network to form a classification model. The classification model is finely adjusted by using a certain amount of labeled data, and the specific methods for fine adjustment mainly include two methods:
the first method comprises the following steps: the encoder section parameters in the classification model are fixed and only the parameters of the fully connected section are optimized. The method has the advantages of high training speed and wide application range, and is suitable for the real-time remote sensing classification task.
And the second method comprises the following steps: and optimizing the overall parameters of the classification model. Compared with the first method, the method takes longer time, but can obtain higher accuracy in the face of the original image and the confrontation sample, and is the default using method of the embodiment.
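A sketch of step S5 under the assumptions of this embodiment (45 classes as in NWPU-RESISC45, encoder output dimension 25088) is given below; the freeze_encoder flag switches between the first and the second fine-tuning method, and gradient truncation limits the single change of the parameters.

```python
# Fine-tuning sketch for step S5 (assumption: PyTorch; names are illustrative).
import torch
import torch.nn as nn

def build_classifier(pretrained_encoder, num_classes=45, feat_dim=25088):
    """Encoder extracted from the trained online network + one fully connected layer."""
    return nn.Sequential(pretrained_encoder, nn.Linear(feat_dim, num_classes))

def finetune(model, labeled_loader, epochs=10, lr=1e-3,
             freeze_encoder=False, max_step=0.01):
    if freeze_encoder:                                   # method one: train the FC part only
        for p in model[0].parameters():
            p.requires_grad_(False)
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for images, labels in labeled_loader:
            logits = model(images)
            loss = criterion(logits, labels)             # the label is the expected result
            optimizer.zero_grad()
            loss.backward()
            # Gradient truncation: bound the single change of the parameters.
            torch.nn.utils.clip_grad_value_(params, clip_value=max_step)
            optimizer.step()
    return model
```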
The embodiment also provides a specific application case of the method:
simulation conditions are as follows:
the data selected by the embodiment is an NWPU-rescisc 45 Dataset optical remote sensing Dataset, the Dataset is created by northwest industrial university, the picture pixel size is 256 × 256, and 31500 images in total cover 45 scene categories, wherein each category has 700 images. The 45 scene categories include airplanes, airports, baseball fields, basketball fields, beaches, bridges, jungles, churches, round farmlands, clouds, commercial areas, dense houses, deserts, forests, highways, golf courses, ground tracks, ports, industrial areas, intersections, islands, lakes, grasslands, medium-sized houses, mobile house parks, mountains, overpasses, palaces, parking lots, railways, train stations, rectangular farmlands, rivers, ring junctions, runways, seas, ships, snow mountains, sparse houses, stadiums, water storage tanks, tennis courts, terraces, thermal power stations, and wetlands, are high-quality data sets that cover a wide variety of remote sensing data sets and have a large data volume.
The adversarial attack selected for this embodiment is the standard PGD attack: adversarial samples are produced by performing the PGD attack on a standard classification model, and robustness is evaluated according to the classification results of each classification model on these adversarial samples. The adversarial perturbation limit ε of the PGD attack is set to 8/255, the maximum number of attack iterations n is 50, the attack step length α is 1/255, and all classification models uniformly use ResNet18.
Simulation content:
under the condition of less labeled data, the classification accuracy of the standard classification model, the confrontation training classification model and the classification accuracy of the classification model of the method on the original image and the confrontation sample of the test set are respectively considered, and the results are shown in table 1:
TABLE 1 simulation results
As can be seen from Table 1, the method greatly improves the classification accuracy of the model on adversarial samples at the cost of only a slight reduction in classification accuracy on the original images. Compared with classical adversarial training, the method has clear advantages in the classification of both the original images and the adversarial samples, and is better suited to practical remote sensing image recognition applications.
In conclusion, the training method provided here for enhancing the robustness of a remote sensing image depth classification model makes full use of the unlabeled data that is abundant in remote sensing. During training, the mutual-learning effect between the online network and the target network in the twin network forces the features of adversarial samples close to those of clean samples, making the encoder more robust; a classification model is then obtained by adding a fully connected layer, and a robust classification model is obtained after fine-tuning. The classification accuracy on adversarial samples is greatly improved while the classification accuracy on the original images decreases only slightly, so the method has high practicability.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A remote sensing image classification network robustness improving method based on self-supervision learning is characterized by comprising the following steps:
step S1: preprocessing the tag-free remote sensing image data, copying the preprocessed data into two parts, using a twin network model to carry out counterattack on one part to produce a counterattack sample data set, and carrying out data amplification on the other part to obtain a clean sample data set;
step S2: obtaining a characteristic vector of the clean sample data set through a target network in a twin network, obtaining a characteristic vector of the confrontation sample data set through an online network in the twin network, and then obtaining the contrast loss of the two characteristic vectors;
step S3: updating the twin network: firstly, the online network carries out gradient pass-back updating according to contrast loss, then the newly obtained online network and the original target network are used for carrying out exponential moving average, and the target network in the twin network is updated, so that the updating of the whole twin network model is realized;
step S4: iteratively executing the steps S1 to S3 to count c rounds, and finishing the training for improving the robustness of the twin network model; wherein c is a positive integer;
step S5: and extracting a depth remote sensing image encoder network in an online network in the twin network model after the c-round training, adding a full connecting layer to the depth remote sensing image encoder network to form a classification model, and then carrying out fine adjustment by using labeled data to finally obtain a robust classification model.
2. The remote sensing image classification network robustness improving method based on the self-supervision learning according to claim 1, characterized in that the attack resisting mode comprises the following two modes:
one is the PGD attack based on gradient iteration, as in formula 1):
x_{n+1}^{adv} = Π_{x,S}( x_n^{adv} + α·sign(∇_x L_{θ,ξ}) )    formula 1),
wherein x_n^{adv} is the adversarial sample obtained by iteratively moving the preprocessed data x n times along the gradient direction in the twin network, Π_{x,S}(·) denotes the projection onto the sphere with center x and radius S, ε is the perturbation limit, α is the attack step length, L_{θ,ξ} is the contrast loss of the twin network model, ξ is the target network parameter, and θ is the online network parameter;
the other is the SSP attack based on a self-supervised pseudo-gradient, as specified in formulas 2)-3):
x_{n+1}^{adv} = x_n^{adv} + α·sign( ∇_x ‖ ψ(x_n^{adv}) − ψ(x) ‖_2 )    formula 2),
‖ x_{n+1}^{adv} − x ‖_∞ ≤ ε    formula 3),
wherein ψ is the depth remote sensing image encoder network in the online network, ψ(x) is the output of the depth remote sensing image encoder network, x_n^{adv} is the adversarial sample obtained by iteratively moving the preprocessed data x n times along its gradient direction in the depth remote sensing image encoder network, α is the attack step length, n is the number of gradient iterations, and ∞ indicates that the attack is carried out under the limit of the infinite norm.
3. The method for improving the robustness of the remote sensing image classification network based on the self-supervision learning of claim 1, wherein the online network is composed of an encoder, a projector and a predictor, the target network is composed of an encoder and a projector, and initial parameters of the online network and the target network are set differently.
4. The method for improving the robustness of the remote sensing image classification network based on the self-supervised learning as recited in claim 1, wherein the Euclidean distance of the feature vector is used as a contrast loss, and the method is specifically represented by formula 4):
L_{θ,ξ} = ‖ f̄_θ(x^{adv}) − q̄_ξ(x') ‖_2^2    formula 4),
wherein L_{θ,ξ} is the contrast loss, q is the target network, ξ is the target network parameter, f is the online network, θ is the online network parameter, x^{adv} is the adversarial sample data set, x' is the clean sample data set, q̄_ξ(x') is the normalized value of the clean sample feature vector, and f̄_θ(x^{adv}) is the normalized value of the adversarial sample feature vector.
5. The method for improving robustness of the remote sensing image classification network based on the self-supervised learning as recited in claim 1, wherein the gradient return updating is performed on the online network according to the contrast loss, specifically as shown in formula 5):
θ ← optimizer(θ, ∇_θ L_{θ,ξ}, lr)    formula 5),
wherein optimizer denotes the optimization operation, θ is the online network parameter, ∇_θ L_{θ,ξ} is the gradient of the loss function with respect to the online network parameter, ξ is the target network parameter, L_{θ,ξ} is the contrast loss of the twin network model, and lr is the learning rate.
6. The method for improving the robustness of the remote sensing image classification network based on the self-supervision learning of claim 1, wherein the target network is updated by using the online network parameters by adopting the exponential moving average, and specifically, the method is as follows in formula 6):
ξ ← τξ + (1 − τ)θ    formula 6),
wherein θ is the online network parameter, ξ is the target network parameter, and τ is the retention index, which controls the update speed of the network.
7. The method for improving the robustness of the remote sensing image classification network based on the self-supervised learning as recited in claim 1, wherein the fine tuning by using the labeled data specifically comprises the following steps: firstly, inputting an image with label data into a depth remote sensing image encoder model, then taking the label of the data as an expected result of encoder model training, optimizing encoder model parameters through gradient feedback, limiting the maximum value of single change of the encoder model parameters in the optimization process, finally enabling the encoder model to have robust remote sensing image classification performance, and finishing the training of the classification model.
8. The remote sensing image classification network robustness improving method based on the self-supervision learning according to claim 1, wherein the preprocessing comprises the following steps:
step S1.1: performing cutting operation on all the images to standardize the images to a uniform size;
step S1.2: normalizing the numerical values of the images, namely compressing the pixel values of all the images to be between 0 and 1;
step S1.3: the image data is linearly transformed by a normalization operation into a data set with a mean of 0 and a variance of 1.
9. The remote sensing image classification network robustness improving method based on the self-supervision learning according to claim 1, characterized in that batch standardization is used in the twin network to improve the model performance, specifically as shown in formula 7):
A = (a − ā) / √Var(a),  BN(a) = γ·A + β    formula 7),
wherein a is the batch data in the twin network, including the clean sample data set and the adversarial sample data set that are input at the beginning, as well as the feature vectors of the clean sample data set and of the adversarial sample data set that are later extracted through the twin network; A is the normalized transition batch data, ā is the mean of the batch data, Var(a) is the variance of the batch data a, and γ and β are parameters to be learned.
10. The method for improving the robustness of the remote sensing image classification network based on the self-supervision learning of claim 1, is characterized in that data amplification is performed through at least one of random cutting, color matching, graying, random turning, random rotation and Gaussian noise adding operation.
CN202111368092.5A 2021-11-18 2021-11-18 Remote sensing image classification network robustness improving method based on self-supervision learning Active CN114067177B (en)


Publications (2)

Publication Number Publication Date
CN114067177A CN114067177A (en) 2022-02-18
CN114067177B true CN114067177B (en) 2022-09-20





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant