CN114067177B - Remote sensing image classification network robustness improving method based on self-supervision learning - Google Patents


Info

Publication number
CN114067177B
Authority
CN
China
Prior art keywords
network
remote sensing
sensing image
model
robustness
Prior art date
Legal status
Active
Application number
CN202111368092.5A
Other languages
Chinese (zh)
Other versions
CN114067177A (en)
Inventor
孙浩
徐延杰
雷琳
计科峰
匡纲要
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202111368092.5A
Publication of CN114067177A
Application granted
Publication of CN114067177B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Abstract

The invention provides a method for improving the robustness of a remote sensing image classification network based on self-supervised learning. The method uses not only labeled data but also the large amount of unlabeled data available in the remote sensing field, and effectively improves the robustness of the model by mining image information through a twin network. The twin network extracts features from clean samples and adversarial samples to obtain feature vectors, and model training is completed through contrastive learning that pulls the feature vectors of the clean sample and the adversarial sample toward each other, so that an image has a stable representation in the depth remote sensing image encoder network within the online network of the twin network, thereby improving robustness. The method effectively enhances the robustness of the model against adversarial noise and natural noise, hardly affects the classification performance on clean data sets, and is convenient to apply.

Description

Remote sensing image classification network robustness improving method based on self-supervision learning
Technical Field
The invention relates to the crossing field of deep learning and remote sensing, in particular to a remote sensing image classification network robustness improving method based on self-supervision learning.
Background
In recent years, neural networks have achieved breakthroughs in many fields such as computer vision and natural language processing. In remote sensing image classification applications, a neural network inevitably operates on various unknown remote sensing data sets containing large amounts of diverse noise. Although such noise has no influence on human recognition, it can often induce a deep neural network to make wrong judgments, and these wrong judgments pose a serious security threat to the application of neural networks in remote sensing image classification.
The fact that tiny noise imperceptible to the human eye can lead a deep neural network to a completely wrong judgment highlights the importance of interpretable deep learning: on what basis does the neural network make its classification decision, and how can the stability and expressive power of the deep learning model be further improved? Training robust, interpretable deep neural networks is therefore a higher-level goal.
Meanwhile, with the rapid development of remote sensing, a large number of remote sensing data sets are continuously emerging. Manual labeling is time-consuming and labor-intensive and cannot keep up with the rapid growth of remote sensing data volume. How to use the large number of unlabeled data sets to further improve the robustness and expressive power of remote sensing image classification networks highlights the potential and importance of self-supervised learning. Therefore, in recent research, improving the overall performance of models through self-supervised learning has received considerable attention.
Against this background, in order to improve the ability of remote sensing image classification models to defend against adversarial samples, a large number of adversarial defense methods have been proposed. Some scholars proposed gradient masking, which makes the model gradient non-computable or non-differentiable so as to evade conventional gradient-based attack methods. However, gradient-masking-based methods have been shown to defend against adversarial attacks only in rather limited cases; the Backward Pass Differentiable Approximation (BPDA) attack can completely bypass gradient masking and carry out an effective attack on the network. Adversarial Training adds adversarial samples generated by a specified attack method to the training set and retrains the neural network, which improves the defense capability of the model to a certain extent, but it requires a large amount of labeled data, is unstable with respect to natural noise, and can reduce the model's ability to recognize clean samples, so it is ill-suited to the remote sensing field, where labels are expensive to produce.
In summary, a remote sensing image classification network robustness improvement method based on self-supervision learning is urgently needed to solve the problems in the prior art.
Disclosure of Invention
The invention aims to provide a method for improving the robustness of a remote sensing image classification network based on self-supervised learning, and to solve the problem that the prior art cannot fully utilize the large amount of unlabeled data in remote sensing resources to improve model performance. The specific technical scheme is as follows:
a remote sensing image classification network robustness improving method based on self-supervision learning comprises the following steps:
step S1: preprocessing the tag-free remote sensing image data, copying the preprocessed data into two parts, using a twin network model to carry out counterattack on one part to produce a counterattack sample data set, and carrying out data amplification on the other part to obtain a clean sample data set;
step S2: obtaining a characteristic vector of the clean sample data set through a target network in a twin network, obtaining a characteristic vector of the confrontation sample data set through an online network in the twin network, and then obtaining the contrast loss of the two characteristic vectors;
step S3: updating the twin network: firstly, the online network carries out gradient pass-back updating according to contrast loss, then the newly obtained online network and the original target network are used for carrying out exponential moving average, and the target network in the twin network is updated, so that the updating of the whole twin network model is realized;
step S4: iteratively executing the steps S1 to S3 to count c rounds, and finishing the training for improving the robustness of the twin network model; wherein c is a positive integer;
step S5: and extracting a depth remote sensing image encoder network in an online network in the twin network model after the c-round training, adding a full connecting layer to the depth remote sensing image encoder network to form a classification model, and then carrying out fine adjustment by using labeled data to finally obtain a robust classification model.
Preferably, in the above technical solution, the adversarial attack is carried out in one of the following two ways.
One is the PGD attack based on gradient iteration, as in formula 1):
x_{n+1}^{adv} = Π_{x,S}( x_n^{adv} + α·sign(∇_x L_{θ,ξ}) )    formula 1),
wherein x_n^{adv} is the adversarial sample obtained by iteratively moving the preprocessed data x n times along the gradient direction in the twin network, Π_{x,S}(·) denotes the projection onto the sphere with center x and radius S, ε is the perturbation limit, α is the attack step length, L_{θ,ξ} is the contrast loss of the twin network model, ξ is the target network parameter, and θ is the online network parameter;
The other is the SSP attack based on a self-supervised pseudo-gradient, as specified in formulas 2)-3):
x_{n+1}^{adv} = x_n^{adv} + α·sign( ∇_x ‖ ψ(x_n^{adv}) − ψ(x) ‖_2 )    formula 2),
‖ x_{n+1}^{adv} − x ‖_∞ ≤ ε    formula 3),
wherein ψ is the depth remote sensing image encoder network in the online network, ψ(x) is the output of the depth remote sensing image encoder network, x_n^{adv} is the adversarial sample obtained by iteratively moving the preprocessed data x n times along its gradient direction in the depth remote sensing image encoder network, α is the attack step length, n is the number of gradient iterations, and ∞ indicates that the attack is carried out under the limit of the infinite norm.
Preferably, in the above technical solution, the online network is composed of an encoder, a projector and a predictor, the target network is composed of an encoder and a projector, and initial parameters of the online network and the target network are set differently.
Preferably, in the above technical solution, the Euclidean distance between the feature vectors is used as the contrast loss, specifically as shown in formula 4):
L_{θ,ξ} = ‖ f̄_θ(x^{adv}) − q̄_ξ(x') ‖_2^2    formula 4),
wherein L_{θ,ξ} is the contrast loss, q is the target network, ξ is the target network parameter, f is the online network, θ is the online network parameter, x^{adv} is the adversarial sample data set, x' is the clean sample data set, q̄_ξ(x') is the normalized value of the clean sample feature vector, and f̄_θ(x^{adv}) is the normalized value of the adversarial sample feature vector.
Preferably, in the above technical solution, the online network is updated by gradient back-propagation according to the contrast loss, specifically as shown in formula 5):
θ ← optimizer(θ, ∇_θ L_{θ,ξ}, lr)    formula 5),
wherein optimizer denotes the optimization operation, θ is the online network parameter, ∇_θ L_{θ,ξ} is the gradient of the loss function with respect to the online network parameter, ξ is the target network parameter, L_{θ,ξ} is the contrast loss of the twin network model, and lr is the learning rate.
Preferably, in the above technical solution, the target network is updated from the online network parameters using an exponential moving average, specifically as shown in formula 6):
ξ ← τξ + (1 − τ)θ    formula 6),
wherein θ is the online network parameter, ξ is the target network parameter, and τ is the retention index, which controls the update speed of the network.
Preferably, in the above technical solution, the fine-tuning by using the labeled data specifically includes: firstly, inputting an image with label data into a depth remote sensing image encoder model, then taking the label of the data as an expected result of encoder model training, optimizing encoder model parameters through gradient feedback, limiting the maximum value of single change of the encoder model parameters in the optimization process, finally enabling the encoder model to have robust remote sensing image classification performance, and finishing the training of the classification model.
Preferably, in the above technical solution, the pretreatment includes the steps of:
step S1.1: performing cutting operation on all the images to standardize the images to a uniform size;
step S1.2: normalizing the numerical values of the images, namely compressing the pixel values of all the images to be between 0 and 1;
step S1.3: the image data is linearly transformed by a normalization operation into a data set with a mean of 0 and a variance of 1.
Preferably, batch standardization is used in the twin network to improve the model performance, specifically as shown in formula 7):
A = (a − ā) / √Var(a),  BN(a) = γ·A + β    formula 7),
wherein a is the batch data in the twin network, including the clean sample data set and the adversarial sample data set that are input at the beginning, as well as the feature vectors of the clean sample data set and of the adversarial sample data set that are later extracted through the twin network; A is the normalized transition batch data, ā is the mean of the batch data, Var(a) is the variance of the batch data a, and γ and β are parameters to be learned.
Preferably, in the above technical solution, the data amplification is performed by at least one of random clipping, color matching, graying, random flipping, random rotation, and gaussian noise adding.
The technical scheme of the invention has the following beneficial effects:
the method provided by the invention not only utilizes the labeled data, but also fully utilizes the label-free data existing in a large amount in the remote sensing field, and effectively improves the robustness of the model by mining the information of the image through the twin network. And performing feature extraction on the clean sample and the countermeasure sample by using the twin network to obtain a feature vector, and completing model training by comparing and learning the feature vectors of the approaching clean sample and the countermeasure sample, so that the image has stable expression in a depth remote sensing image encoder network in an online network in the twin network, and further the improvement of robustness is realized. The method effectively enhances the robustness of the model to resisting sample noise and natural noise, hardly influences the classification effect of a clean data set, and is convenient to apply.
The method for improving the robustness of the remote sensing image classification model is combined with the self-supervision learning process, and the comparison learning is carried out aiming at the anti-attack means and the data amplification means, so that the characteristic expression of the classifier model on the same image is more stable. Considering that the network obtained by self-supervision contrast learning is difficult to be directly applied to the classification task of the remote sensing image, a small amount of labeled data is adopted for fine adjustment, gradient truncation is carried out during fine adjustment training (namely, the maximum value of single change of the parameters of the encoder model is limited), gradient explosion is prevented, and the performance of the classification model is improved. On the basis of ensuring the classification accuracy of the model to the clean samples, the method realizes the defense to the confrontation samples and improves the robustness of the model; in addition, the method has small demand for labeled data, is very suitable for the current situation that the labeling in the remote sensing field wastes time and labor, has low cost and has high practicability.
The method of the invention is still effective without using negative samples, which is due to the implicit contrast gain brought by batch standardization, so that batch standardization is not only an effective means for improving the performance of the twin network model, but also a basis for training the twin network model.
In addition to the objects, features and advantages described above, other objects, features and advantages of the present invention are also provided. The present invention will be described in further detail below with reference to the drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method for improving robustness of a classification model of a remote sensing image provided by the invention;
FIG. 2 is a model structure diagram of the method for improving robustness of the remote sensing image classification model provided by the invention.
Detailed Description
In order that the invention may be more fully understood, a more particular description of the invention will now be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Example 1:
At present, deep learning technology has developed rapidly and has been deeply integrated with remote sensing technology, producing revolutionary results, but the vulnerability of deep learning also leaves hidden dangers for its application in the remote sensing field, where the requirements on safety and stability are extremely high. An adversarial attack can make a deep network output a result completely different from the original one by adding well-designed tiny noise to the original image, seriously threatening the security of remote sensing detection and recognition.
In remote sensing detection and recognition, a model must cope with a large amount of natural noise such as cloud and fog occlusion, defocus blur, wind, frost, rain and snow, and digital noise, as well as carefully designed man-made interference such as military camouflage, so the requirements on model robustness are high.
In contrast, the embodiment provides a method for improving robustness of a remote sensing image classification network based on self-supervised learning, as shown in fig. 1 and fig. 2, the method specifically includes the following steps:
step S1: preprocessing the tag-free remote sensing image data, copying the preprocessed data into two parts, using a twin network model to carry out counterattack on one part to produce a counterattack sample data set, and carrying out data amplification on the other part to obtain a clean sample data set;
preferably, the pretreatment comprises the following steps:
step S1.1: performing cutting operation on all the images to standardize the images to a uniform size;
step S1.2: normalizing the numerical values of the images, namely compressing the pixel values of all the images to be between 0 and 1;
step S1.3: the image data is linearly transformed by a normalization operation into a data set with a mean of 0 and a variance of 1.
In this embodiment, when preprocessing the data, each image is first resized to 256 × 256 and then center-cropped to obtain a 224 × 224 data set; the image values are then normalized to compress the data range to 0 to 1, and standardization is applied to convert each picture into approximately normally distributed data with a mean of 0 and a variance of 1. The preprocessed data is then duplicated into two copies, one for producing the adversarial sample data set and one for obtaining the clean sample data set.
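As an illustration only, the preprocessing pipeline described above could be written with torchvision as follows; the per-channel mean and standard deviation shown are the common ImageNet values, used purely as placeholder assumptions because the embodiment does not state the exact statistics.

```python
# Preprocessing sketch (assumption: torchvision; the normalization statistics
# are the usual ImageNet values, shown only as placeholders).
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),   # step S1.1: adjust the image to 256 x 256
    transforms.CenterCrop(224),      # step S1.1: center crop to 224 x 224
    transforms.ToTensor(),           # step S1.2: pixel values compressed to [0, 1]
    transforms.Normalize(            # step S1.3: standardize to ~zero mean, unit variance
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
])
```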
In this embodiment, data amplification is performed by at least one of random clipping, color matching, graying, random flipping, random rotation, and gaussian noise adding. The diversity of the distribution of clean sample data is improved under the condition that the example level label of the preprocessed data is not changed.
The choice of data amplification is not fixed and can be adjusted according to the actual task: random cropping is recommended when the size of targets in the images varies greatly, and color transformation and graying are recommended when the images have strong color variation; this can effectively improve the robustness of the model.
In addition, data amplification is not performed only once. Remote sensing data are complex and variable, a single amplification would collapse image diversity, and the training of the twin network depends heavily on data diversity, so the data amplification must be re-applied in every training round to guarantee twin network performance. Meanwhile, the choice of data amplification and of the adversarial samples is flexible and can be adapted to the characteristics of tasks such as camouflage reconnaissance and defogging.
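For illustration, one possible data amplification pipeline combining the operations listed above (random cropping, color adjustment, graying, random flipping, random rotation and Gaussian noise) is sketched below with torchvision; all probabilities and magnitudes are assumptions to be tuned to the actual task, and, as noted above, the amplification should be re-sampled in every training round.

```python
# Data amplification sketch (assumption: torchvision; operates on PIL images,
# magnitudes and probabilities are illustrative only).
import torch
from torchvision import transforms

class AddGaussianNoise:
    """Add zero-mean Gaussian noise to a tensor image and keep values in [0, 1]."""
    def __init__(self, std=0.05):
        self.std = std
    def __call__(self, img):
        return (img + torch.randn_like(img) * self.std).clamp(0.0, 1.0)

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),   # random cropping
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),            # color adjustment
    transforms.RandomGrayscale(p=0.2),                     # graying
    transforms.RandomHorizontalFlip(p=0.5),                # random flipping
    transforms.RandomRotation(degrees=15),                 # random rotation
    transforms.ToTensor(),
    AddGaussianNoise(std=0.05),                            # Gaussian noise
])
```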
Step S2: the clean sample data set is passed through the target network of the twin network to obtain its feature vector q_ξ(x'), and the adversarial sample data set is passed through the online network of the twin network to obtain its feature vector f_θ(x^{adv}); the contrast loss between the two feature vectors is then obtained;
as shown in fig. 2, the online network is composed of an encoder, a projector and a predictor, the target network is composed of an encoder and a projector, and the two networks are not completely symmetrical; although some of the two networks have the same structure, in the embodiment, the initial parameters of the online network and the target network are set differently, so that the difference is increased, and the training is accelerated.
In this embodiment, preferably, Batch Normalization (BN) is used in the twin network to improve the model performance, as follows:
A = (a − ā) / √Var(a),  BN(a) = γ·A + β
wherein BN denotes batch normalization; a is the batch data in the twin network, including the clean sample data set and the adversarial sample data set that are input at the beginning, as well as the feature vectors of the clean sample data set and of the adversarial sample data set that are later extracted through the twin network; A is the normalized transition batch data, ā is the mean of the batch data, Var(a) is the variance of the batch data a, and γ and β are parameters to be learned, which are updated when the model is optimized.
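As a small illustrative check (not part of the claimed method), the batch standardization above corresponds to the standard BatchNorm operation; the sketch below compares a manual implementation with torch.nn.BatchNorm1d in training mode (a small epsilon is added inside the square root for numerical stability, as in the library implementation).

```python
# Sketch: the batch standardization formula versus torch.nn.BatchNorm1d.
import torch
import torch.nn as nn

a = torch.randn(32, 256)                      # a batch of 256-dimensional feature vectors
bn = nn.BatchNorm1d(256, affine=True)
bn.train()                                    # use batch statistics, as during training

out_builtin = bn(a)

# Manual form: A = (a - mean) / sqrt(Var(a) + eps); BN(a) = gamma * A + beta
A = (a - a.mean(dim=0)) / torch.sqrt(a.var(dim=0, unbiased=False) + bn.eps)
out_manual = bn.weight * A + bn.bias          # gamma = weight, beta = bias (learned)

print(torch.allclose(out_builtin, out_manual, atol=1e-5))   # expected: True
```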
The choice of twin model is not fixed and may be modified according to the amount and type of training data. The encoder is composed of a number of convolution layers; a batch standardization operation is applied to the output every two to three convolution layers, and a rectified linear unit (ReLU) activation layer is used to activate the neurons, which can effectively improve model performance.
Specifically, the encoder is a deep convolutional network with image depth feature extraction capability, in this embodiment, specifically, a backbone network of a ResNet18 network (with the last full-connection layer removed, and 17 convolutional layers in total) is used as the encoder, and the final output dimension is 25088;
the projector and the predictor both use a multilayer perceptron MLP, and the structure of the MLP is that a full connection layer with an output characteristic dimension of 256 is added, Batch Normalization (BN) is added, an activation layer (activation function ReLU) is added, so that the model has the nonlinear classification capability, and finally the full connection layer with the output characteristic dimension of 256 is connected. The activation function ReLU may be represented by the following formula:
Figure BDA0003361588890000072
where s is a parameter into the active layer.
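A sketch of one possible implementation of the twin-network components described above is given below; the encoder follows the embodiment (ResNet18 backbone without the final fully connected layer, flattened output dimension 25088), while everything else (function names, torchvision version, weight initialization) is an illustrative assumption.

```python
# Architecture sketch (assumptions: PyTorch, torchvision >= 0.13; dimensions
# follow the embodiment: flattened ResNet18 features of size 25088 for a
# 224 x 224 input, MLP hidden/output dimension 256).
import torch.nn as nn
from torchvision.models import resnet18

def make_encoder():
    """ResNet18 backbone; drop the average pooling and final FC layer,
    so a 224 x 224 input yields a flattened 512 x 7 x 7 = 25088 feature."""
    backbone = resnet18(weights=None)
    return nn.Sequential(*list(backbone.children())[:-2], nn.Flatten())

def make_mlp(in_dim, hidden_dim=256, out_dim=256):
    """Projector/predictor: FC -> BN -> ReLU -> FC, as described above."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.BatchNorm1d(hidden_dim),
        nn.ReLU(inplace=True),
        nn.Linear(hidden_dim, out_dim),
    )

class OnlineNetwork(nn.Module):
    """Encoder + projector + predictor."""
    def __init__(self):
        super().__init__()
        self.encoder = make_encoder()
        self.projector = make_mlp(25088)
        self.predictor = make_mlp(256)
    def forward(self, x):
        return self.predictor(self.projector(self.encoder(x)))

class TargetNetwork(nn.Module):
    """Encoder + projector only (no predictor); initialized separately."""
    def __init__(self):
        super().__init__()
        self.encoder = make_encoder()
        self.projector = make_mlp(25088)
    def forward(self, x):
        return self.projector(self.encoder(x))
```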
Further, to ensure the stability of training, the clean sample feature vector q_ξ(x') output by the target network and the adversarial sample feature vector f_θ(x^{adv}) output by the online network are first normalized to obtain q̄_ξ(x') and f̄_θ(x^{adv}), respectively. The Euclidean distance between the feature vectors is used as the contrast loss; specifically, the contrastive learning loss function L_{θ,ξ} is defined as the mean square error between the normalized clean sample feature vector and the normalized adversarial sample feature vector output by the online network, as follows:
L_{θ,ξ} = ‖ f̄_θ(x^{adv}) − q̄_ξ(x') ‖_2^2
wherein L_{θ,ξ} is the contrast loss, q is the target network, ξ is the target network parameter, f is the online network, θ is the online network parameter, x^{adv} is the adversarial sample data set, x' is the clean sample data set, q̄_ξ(x') is the normalized value of the clean sample feature vector, and f̄_θ(x^{adv}) is the normalized value of the adversarial sample feature vector.
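Since both feature vectors are normalized before the mean square error is taken, the contrast loss above can equivalently be computed as 2 − 2 times the cosine similarity of the unnormalized vectors; the following sketch (illustrative only) shows both forms and checks that they agree.

```python
# Sketch: the contrast loss as MSE of normalized vectors, and its equivalent
# cosine-similarity form (2 - 2*cos). Purely illustrative.
import torch
import torch.nn.functional as F

def contrast_loss_mse(f_adv, q_clean):
    f_bar = F.normalize(f_adv, dim=-1)      # normalized adversarial sample feature
    q_bar = F.normalize(q_clean, dim=-1)    # normalized clean sample feature
    return ((f_bar - q_bar) ** 2).sum(dim=-1).mean()

def contrast_loss_cosine(f_adv, q_clean):
    return (2.0 - 2.0 * F.cosine_similarity(f_adv, q_clean, dim=-1)).mean()

f = torch.randn(8, 256)
q = torch.randn(8, 256)
print(torch.allclose(contrast_loss_mse(f, q), contrast_loss_cosine(f, q), atol=1e-5))  # True
```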
Step S3: updating the twin network: firstly, the online network carries out gradient pass-back updating according to contrast loss, then the newly obtained online network and the original target network are used for carrying out exponential moving average, and the target network in the twin network is updated, so that the updating of the whole twin network model is realized;
specifically, the online network performs gradient backhaul update according to the contrast loss, and more specifically, performs gradient backhaul only on the online network, and updates the parameters of the online network by using an Adam optimizer according to the gradient obtained by backhaul, which is specifically as follows:
θ←optimizer(θ,▽ θ L θ,ξ ,lr)
wherein optimizer represents optimization operation, theta is an online network parameter + θ Is the gradient direction of the loss function on the on-line network parameter, xi is the target network parameter, L θ,ξ For twin network model contrast loss, lr is the learning rate.
Adam is different from the classical random gradient descent method. The stochastic gradient descent maintains a single learning rate (called alpha) for all weight updates, and the learning rate does not change during the training process. Adam maintains a learning rate for each network weight (parameter) and adjusts individually as learning evolves. The method calculates adaptive learning rates for different parameters from the first and second moments of the gradient. The Adam optimizer has low requirement on a memory and high calculation efficiency, and is favorable for rapid convergence of the model.
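One possible PyTorch form of this update step is sketched below; the function name is illustrative, the learning rate of 0.003 and the 0.01 cap follow the values given later in this embodiment, and clipping the gradient values is used here only as a simple proxy for limiting the single change of a parameter.

```python
# Sketch of the online-network update with Adam (assumption: PyTorch).
import torch

def update_online_network(online_net, x_adv, z_target, loss_fn, optimizer,
                          max_step=0.01):
    """One update of the online network parameters theta from the contrast loss."""
    loss = loss_fn(online_net(x_adv), z_target)          # L_{theta,xi}, target features fixed
    optimizer.zero_grad()
    loss.backward()                                       # gradient returned through theta only
    # Gradient interception: the embodiment caps a single parameter change at
    # 0.01; clipping gradient values is one simple way to bound the update.
    torch.nn.utils.clip_grad_value_(online_net.parameters(), clip_value=max_step)
    optimizer.step()                                      # theta <- optimizer(theta, grad, lr)
    return loss.item()

# Usage sketch: optimizer = torch.optim.Adam(online_net.parameters(), lr=0.003)
```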
The target network is updated from the online network parameters with an exponential moving average: using the just-updated online network, the target network parameters are fed back and updated from the online network parameters as follows:
ξ ← τξ + (1 − τ)θ
wherein θ is the online network parameter, ξ is the target network parameter, and τ is the retention index, which controls the update speed of the network.
To ensure the stability of the training, the update speed of the target network is generally controlled, and the retention index generally has a value close to 1, specifically 0.99 in this embodiment.
Step S4: iteratively executing the steps S1 to S3 until the loss function convergence is met and the iteration times reach c, and finishing the training of improving the robustness of the twin network model; wherein c is a positive integer;
in the process of iteratively executing steps S1-S3, a new countermeasure sample set is made according to the updated twin network, and the counterattack in this embodiment includes the following two ways:
one is a PGD attack based on gradient iteration, as follows:
Figure BDA0003361588890000091
wherein the content of the first and second substances,
Figure BDA0003361588890000092
for the preprocessed data x to iteratively move n times the obtained confrontation sample along the gradient direction in the twin network,
Figure BDA0003361588890000093
is shown in
Figure BDA0003361588890000094
As the projection of a random sphere with a center and a radius of S, epsilon as the perturbation limit, alpha as the attack step length, L θ,ξ Comparing loss of the twin network model, xi is a target network parameter, and theta is an online network parameter;
through a certain number of iterations, a new antagonistic sample to the twin network can be generated to maximize the similarity error with the clean sample. The method for resisting attack is low in calculation consumption, high in attack strength and good in generalization, and is the preferred method for manufacturing the resisting sample in the embodiment.
The other is the SSP attack based on a self-supervised pseudo-gradient, given by the following two formulas:
x_{n+1}^{adv} = x_n^{adv} + α·sign( ∇_x ‖ ψ(x_n^{adv}) − ψ(x) ‖_2 )
‖ x_{n+1}^{adv} − x ‖_∞ ≤ ε
wherein ψ is the depth remote sensing image encoder network in the online network, ψ(x) is the output of the depth remote sensing image encoder network, x_n^{adv} is the adversarial sample obtained by iteratively moving the preprocessed data x n times along its gradient direction in the depth remote sensing image encoder network, α is the attack step length, n is the number of gradient iterations, and ∞ indicates that the attack is carried out under the limit of the infinite norm.
This attack is similar to the PGD attack, but its target is the encoder of the online network: the adversarial sample is produced by accumulating gradient steps that maximize the Euclidean distance to the feature vector of the clean sample.
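Correspondingly, the SSP attack could be sketched as follows; it acts only on the depth remote sensing image encoder ψ of the online network and maximizes the Euclidean distance between the adversarial and clean feature vectors under the ℓ∞ limit. All names and default values are illustrative assumptions.

```python
# SSP attack sketch: maximize ||psi(x_adv) - psi(x)||_2 for the online-network
# encoder psi, under an L-infinity constraint of radius eps.
import torch

def ssp_attack(encoder_psi, x, eps=8/255, alpha=1/255, n_iter=50):
    with torch.no_grad():
        feat_clean = encoder_psi(x)                        # psi(x), kept fixed
    x_adv = x.clone().detach()

    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        feat_adv = encoder_psi(x_adv)                      # psi(x_adv_n)
        loss = (feat_adv - feat_clean).flatten(1).norm(dim=1).mean()   # feature distance
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                        # pseudo-gradient step
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)      # L-infinity limit
    return x_adv.detach()
```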
In the method provided by this embodiment, no negative samples are used and model training is therefore prone to collapse, so the training parameters must be controlled precisely: the learning rate lr is set to 0.003. The magnitude of the adversarial noise generally varies with the actual application requirements; this embodiment uses adversarial samples with a perturbation amplitude of 8/255. Meanwhile, an overly fast network update rate can make training unstable, so gradient interception is used to constrain the training process, and the maximum value of a parameter update (i.e. the maximum single change of a parameter) is set to 0.01.
The iterative process requires increased randomness, random batches of training data for each round, and re-runs for data amplification.
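Putting steps S1 to S4 together, one possible shape of the overall training loop is sketched below; attack_fn and augment_fn stand for the adversarial attack and the data amplification of step S1 (for example, the PGD and augmentation sketches above), and all names and defaults are illustrative assumptions rather than a reference implementation.

```python
# Sketch of the overall loop for steps S1-S4 (assumption: PyTorch; attack_fn
# and augment_fn are user-supplied callables).
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(target_net, online_net, tau=0.99):
    """xi <- tau * xi + (1 - tau) * theta (exponential moving average)."""
    for p_t, p_o in zip(target_net.parameters(), online_net.parameters()):
        p_t.mul_(tau).add_(p_o, alpha=1.0 - tau)

def train_twin_network(online_net, target_net, unlabeled_loader, optimizer,
                       attack_fn, augment_fn, rounds_c=100, max_step=0.01):
    for _ in range(rounds_c):                           # step S4: iterate c rounds
        for x in unlabeled_loader:                      # preprocessed unlabeled images
            x_adv = attack_fn(x)                        # step S1: adversarial copy
            x_clean = augment_fn(x)                     # step S1: amplified clean copy
            with torch.no_grad():
                q_clean = target_net(x_clean)           # step S2: target-network features
            f_adv = online_net(x_adv)                   # step S2: online-network features
            # step S2: contrast loss on the normalized feature vectors
            loss = ((F.normalize(f_adv, dim=-1)
                     - F.normalize(q_clean, dim=-1)) ** 2).sum(dim=-1).mean()
            optimizer.zero_grad()
            loss.backward()                             # step S3: gradient return (online only)
            torch.nn.utils.clip_grad_value_(online_net.parameters(), max_step)
            optimizer.step()                            # step S3: update the online network
            ema_update(target_net, online_net)          # step S3: update the target network
    return online_net                                   # step S5 extracts its encoder later
```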
Step S5: and extracting a depth remote sensing image encoder network in an online network in the twin network model after the c-round training, adding a full connecting layer to the depth remote sensing image encoder network to form a classification model, and then carrying out fine adjustment by using labeled data to finally obtain a robust classification model.
The fine adjustment using the tagged data in this embodiment specifically includes: firstly, inputting an image with label data into a depth remote sensing image encoder model, then taking the label of the data (indicating the label data) as an expected result of encoder model training, returning and optimizing encoder model parameters through gradient, limiting the maximum value of single change of the encoder model parameters, namely gradient truncation, in the optimization process, finally enabling the encoder model to have robust remote sensing image classification performance, and finishing the training of the classification model.
Specifically, firstly, a backbone network of an on-line network depth remote sensing image encoder, namely ResNet18, is extracted from a trained twin network for use by downstream tasks, and then a full connection layer is connected behind the encoder network to form a classification model. The classification model is finely adjusted by using a certain amount of labeled data, and the specific methods for fine adjustment mainly include two methods:
the first method comprises the following steps: the encoder section parameters in the classification model are fixed and only the parameters of the fully connected section are optimized. The method has the advantages of high training speed and wide application range, and is suitable for the real-time remote sensing classification task.
And the second method comprises the following steps: and optimizing the overall parameters of the classification model. Compared with the first method, the method takes longer time, but can obtain higher accuracy in the face of the original image and the confrontation sample, and is the default using method of the embodiment.
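A sketch of step S5 under the assumptions of this embodiment (45 classes as in NWPU-RESISC45, encoder output dimension 25088) is given below; the freeze_encoder flag switches between the first and the second fine-tuning method, and gradient truncation limits the single change of the parameters.

```python
# Fine-tuning sketch for step S5 (assumption: PyTorch; names are illustrative).
import torch
import torch.nn as nn

def build_classifier(pretrained_encoder, num_classes=45, feat_dim=25088):
    """Encoder extracted from the trained online network + one fully connected layer."""
    return nn.Sequential(pretrained_encoder, nn.Linear(feat_dim, num_classes))

def finetune(model, labeled_loader, epochs=10, lr=1e-3,
             freeze_encoder=False, max_step=0.01):
    if freeze_encoder:                                   # method one: train the FC part only
        for p in model[0].parameters():
            p.requires_grad_(False)
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for images, labels in labeled_loader:
            logits = model(images)
            loss = criterion(logits, labels)             # the label is the expected result
            optimizer.zero_grad()
            loss.backward()
            # Gradient truncation: bound the single change of the parameters.
            torch.nn.utils.clip_grad_value_(params, clip_value=max_step)
            optimizer.step()
    return model
```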
The embodiment also provides a specific application case of the method:
simulation conditions are as follows:
the data selected by the embodiment is an NWPU-rescisc 45 Dataset optical remote sensing Dataset, the Dataset is created by northwest industrial university, the picture pixel size is 256 × 256, and 31500 images in total cover 45 scene categories, wherein each category has 700 images. The 45 scene categories include airplanes, airports, baseball fields, basketball fields, beaches, bridges, jungles, churches, round farmlands, clouds, commercial areas, dense houses, deserts, forests, highways, golf courses, ground tracks, ports, industrial areas, intersections, islands, lakes, grasslands, medium-sized houses, mobile house parks, mountains, overpasses, palaces, parking lots, railways, train stations, rectangular farmlands, rivers, ring junctions, runways, seas, ships, snow mountains, sparse houses, stadiums, water storage tanks, tennis courts, terraces, thermal power stations, and wetlands, are high-quality data sets that cover a wide variety of remote sensing data sets and have a large data volume.
The adversarial attack selected for this embodiment is the standard PGD attack: adversarial samples are produced by performing the PGD attack on a standard classification model, and robustness is evaluated according to the classification results of each classification model on these adversarial samples. The adversarial perturbation limit ε of the PGD attack is set to 8/255, the maximum number of attack iterations n is 50, the attack step length α is 1/255, and all classification models uniformly use ResNet18.
Simulation content:
under the condition of less labeled data, the classification accuracy of the standard classification model, the confrontation training classification model and the classification accuracy of the classification model of the method on the original image and the confrontation sample of the test set are respectively considered, and the results are shown in table 1:
TABLE 1 simulation results
As can be seen from Table 1, the method greatly improves the classification accuracy of the model on adversarial samples at the cost of only a slight reduction in classification accuracy on the original images. Compared with classical adversarial training, the method has clear advantages in the classification of both the original images and the adversarial samples, and is better suited to practical remote sensing image recognition applications.
In conclusion, the training method provided here for enhancing the robustness of a remote sensing image depth classification model makes full use of the unlabeled data that is abundant in remote sensing. During training, the mutual-learning effect between the online network and the target network in the twin network forces the features of adversarial samples close to those of clean samples, making the encoder more robust; a classification model is then obtained by adding a fully connected layer, and a robust classification model is obtained after fine-tuning. The classification accuracy on adversarial samples is greatly improved while the classification accuracy on the original images decreases only slightly, so the method has high practicability.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A remote sensing image classification network robustness improving method based on self-supervision learning is characterized by comprising the following steps:
step S1: preprocessing the tag-free remote sensing image data, copying the preprocessed data into two parts, using a twin network model to carry out counterattack on one part to produce a counterattack sample data set, and carrying out data amplification on the other part to obtain a clean sample data set;
step S2: obtaining a characteristic vector of the clean sample data set through a target network in a twin network, obtaining a characteristic vector of the confrontation sample data set through an online network in the twin network, and then obtaining the contrast loss of the two characteristic vectors;
step S3: updating the twin network: firstly, the online network carries out gradient pass-back updating according to contrast loss, then the newly obtained online network and the original target network are used for carrying out exponential moving average, and the target network in the twin network is updated, so that the updating of the whole twin network model is realized;
step S4: iteratively executing the steps S1 to S3 to count c rounds, and finishing the training for improving the robustness of the twin network model; wherein c is a positive integer;
step S5: and extracting a depth remote sensing image encoder network in an online network in the twin network model after the c-round training, adding a full connecting layer to the depth remote sensing image encoder network to form a classification model, and then carrying out fine adjustment by using labeled data to finally obtain a robust classification model.
2. The remote sensing image classification network robustness improving method based on the self-supervision learning according to claim 1, characterized in that the attack resisting mode comprises the following two modes:
one is the PGD attack based on gradient iteration, as in formula 1):
x_{n+1}^{adv} = Π_{x,S}( x_n^{adv} + α·sign(∇_x L_{θ,ξ}) )    formula 1),
wherein x_n^{adv} is the adversarial sample obtained by iteratively moving the preprocessed data x n times along the gradient direction in the twin network, Π_{x,S}(·) denotes the projection onto the sphere with center x and radius S, ε is the perturbation limit, α is the attack step length, L_{θ,ξ} is the contrast loss of the twin network model, ξ is the target network parameter, and θ is the online network parameter;
the other is the SSP attack based on a self-supervised pseudo-gradient, as specified in formulas 2)-3):
x_{n+1}^{adv} = x_n^{adv} + α·sign( ∇_x ‖ ψ(x_n^{adv}) − ψ(x) ‖_2 )    formula 2),
‖ x_{n+1}^{adv} − x ‖_∞ ≤ ε    formula 3),
wherein ψ is the depth remote sensing image encoder network in the online network, ψ(x) is the output of the depth remote sensing image encoder network, x_n^{adv} is the adversarial sample obtained by iteratively moving the preprocessed data x n times along its gradient direction in the depth remote sensing image encoder network, α is the attack step length, n is the number of gradient iterations, and ∞ indicates that the attack is carried out under the limit of the infinite norm.
3. The method for improving the robustness of the remote sensing image classification network based on the self-supervision learning of claim 1, wherein the online network is composed of an encoder, a projector and a predictor, the target network is composed of an encoder and a projector, and initial parameters of the online network and the target network are set differently.
4. The method for improving the robustness of the remote sensing image classification network based on the self-supervised learning as recited in claim 1, wherein the Euclidean distance of the feature vector is used as a contrast loss, and the method is specifically represented by formula 4):
L_{θ,ξ} = ‖ f̄_θ(x^{adv}) − q̄_ξ(x') ‖_2^2    formula 4),
wherein L_{θ,ξ} is the contrast loss, q is the target network, ξ is the target network parameter, f is the online network, θ is the online network parameter, x^{adv} is the adversarial sample data set, x' is the clean sample data set, q̄_ξ(x') is the normalized value of the clean sample feature vector, and f̄_θ(x^{adv}) is the normalized value of the adversarial sample feature vector.
5. The method for improving robustness of the remote sensing image classification network based on the self-supervised learning as recited in claim 1, wherein the gradient return updating is performed on the online network according to the contrast loss, specifically as shown in formula 5):
θ ← optimizer(θ, ∇_θ L_{θ,ξ}, lr)    formula 5),
wherein optimizer denotes the optimization operation, θ is the online network parameter, ∇_θ L_{θ,ξ} is the gradient of the loss function with respect to the online network parameter, ξ is the target network parameter, L_{θ,ξ} is the contrast loss of the twin network model, and lr is the learning rate.
6. The method for improving the robustness of the remote sensing image classification network based on the self-supervision learning of claim 1, wherein the target network is updated by using the online network parameters by adopting the exponential moving average, and specifically, the method is as follows in formula 6):
ξ ← τξ + (1 − τ)θ    formula 6),
wherein θ is the online network parameter, ξ is the target network parameter, and τ is the retention index, which controls the update speed of the network.
7. The method for improving the robustness of the remote sensing image classification network based on the self-supervised learning as recited in claim 1, wherein the fine tuning by using the labeled data specifically comprises the following steps: firstly, inputting an image with label data into a depth remote sensing image encoder model, then taking the label of the data as an expected result of encoder model training, optimizing encoder model parameters through gradient feedback, limiting the maximum value of single change of the encoder model parameters in the optimization process, finally enabling the encoder model to have robust remote sensing image classification performance, and finishing the training of the classification model.
8. The remote sensing image classification network robustness improving method based on the self-supervision learning according to claim 1, wherein the preprocessing comprises the following steps:
step S1.1: performing cutting operation on all the images to standardize the images to a uniform size;
step S1.2: normalizing the numerical values of the images, namely compressing the pixel values of all the images to be between 0 and 1;
step S1.3: the image data is linearly transformed by a normalization operation into a data set with a mean of 0 and a variance of 1.
9. The remote sensing image classification network robustness improving method based on the self-supervision learning according to claim 1, characterized in that batch standardization is used in the twin network to improve the model performance, specifically as shown in formula 7):
A = (a − ā) / √Var(a),  BN(a) = γ·A + β    formula 7),
wherein a is the batch data in the twin network, including the clean sample data set and the adversarial sample data set that are input at the beginning, as well as the feature vectors of the clean sample data set and of the adversarial sample data set that are later extracted through the twin network; A is the normalized transition batch data, ā is the mean of the batch data, Var(a) is the variance of the batch data a, and γ and β are parameters to be learned.
10. The method for improving the robustness of the remote sensing image classification network based on the self-supervision learning of claim 1, is characterized in that data amplification is performed through at least one of random cutting, color matching, graying, random turning, random rotation and Gaussian noise adding operation.
CN202111368092.5A 2021-11-18 2021-11-18 Remote sensing image classification network robustness improving method based on self-supervision learning Active CN114067177B (en)


Publications (2)

Publication Number Publication Date
CN114067177A CN114067177A (en) 2022-02-18
CN114067177B true CN114067177B (en) 2022-09-20





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant