CN115331079A

CN115331079A - Attack resisting method for multi-mode remote sensing image classification network

Info

Publication number: CN115331079A
Application number: CN202211005572.XA
Authority: CN
Inventors: 石程; 党叶楠; 赵明华; 苗启广; 潘治文
Original assignee: Xian University of Technology
Current assignee: Xian University of Technology
Priority date: 2022-08-22
Filing date: 2022-08-22
Publication date: 2022-11-11

Abstract

The invention discloses an anti-attack method for a multi-mode remote sensing image classification network, which provides an anti-attack technology for the multi-mode remote sensing image classification network for two remote sensing image data sources, namely an optical remote sensing image TOP with three wave bands and a digital elevation model image DSM with one wave band, has more obvious attack effect and higher attack time efficiency and is used for evaluating and improving the robustness of the multi-mode remote sensing image classification network.

Description

Attack resisting method for multi-mode remote sensing image classification network

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to an anti-attack method for a multi-mode remote sensing image classification network.

Background

Remote sensing image classification remains a challenging task in the field of remote sensing. In recent years, with the rapid development of aerospace technology, a large number of remote sensing images of different sensors are generated, and effective classification of ground objects by combining a plurality of sensor images becomes a hot point of research. The multimodal remote sensing image covers the ground features with different observations, for example a TOP image, which expresses high spatial resolution information of the remote sensing scene, and a Digital Surface Model (DSM) image, which provides height information of the ground features. The multi-mode remote sensing image classification method based on the advantages of the multi-mode remote sensing data can effectively reduce the uncertainty in the classification task. In recent years, deep neural networks are widely introduced into the task of multi-modal remote sensing image classification. Compared with a single-mode remote sensing image classification method, the multi-mode remote sensing image classification method can effectively utilize the correlation between two modes and realize higher classification precision.

The vulnerability of deep neural networks has also attracted a great deal of attention in recent years. Adding a tiny perturbation that cannot be distinguished by human eyes to the original clean sample enables the deep neural network to generate wrong predictions with high confidence, the perturbed sample is called a countermeasure sample, and the process of generating the countermeasure sample is called countermeasure attack. Therefore, it is necessary to make the network more robust, and to know all risks faced by the network in advance, and to find a counterexample with more aggressive performance. Various attack methods have been proposed to generate countermeasure samples, such as Fast Gradient Sign Method (FGSM), iterative Fast Gradient Sign Method (I-FGSM), C & W algorithm based on optimized objective function, deepFool algorithm, etc., but the existing countermeasure attack technology only considers the attack effect under a single mode and lacks the attack implementation on the multi-mode classification network. When the multi-mode remote sensing image classification network is attacked, not only the success rate of the attack, the concealment of disturbance and the timeliness of the attack but also the cooperative attack capability among different modes need to be considered. Therefore, for the multi-modal remote sensing image classification network, high-quality multi-modal countermeasures need to be generated, and the robustness of the multi-modal remote sensing image classification network is further evaluated and improved.

Disclosure of Invention

The invention aims to provide an attack resisting method for a multi-mode remote sensing image classification network, which has more remarkable attack effect and higher attack time efficiency.

The invention adopts the following technical scheme: an attack resisting method for a multi-mode remote sensing image classification network comprises the following steps:

step one, constructing a multi-modal training sample set T and a test sample set S;

step two, inputting the training sample set T and the testing sample set S in the step one into a target model f to be attacked;

step three, constructing a target attack class t according to the target model f to be attacked in the step two;

step four, building a multi-mode anti-attack network; the multi-mode anti-attack network consists of a disturbance generation network and an identifier network of each mode;

step five, generating a multi-modal confrontation sample;

step six, inputting the multi-modal confrontation samples in the step five into a corresponding disturbance generation network, a discriminator network and a target model f respectively, and constructing a multi-modal generation loss function and a multi-modal discrimination loss function;

step seven, respectively training the multi-modal generation loss function and the multi-modal identification loss function in the step six alternately, updating each disturbance generation network and each identification network, and finishing the training of the multi-modal anti-attack network;

step eight, inputting the test samples into the multi-modal counter attack network in the step seven, and generating corresponding test counter samples.

Further, after the step eight, a step nine is further included, as follows: inputting the training samples into the multi-modal counterattack network to obtain the counterattack samples of the training samples, adding the counterattack samples of the training samples into the training sample set, and re-training the target model f in the second step; and repeating the third step to the eighth step.

Further, the multiple modes are optical remote sensing images and digital elevation model images.

Further, the specific process of constructing the multi-modal generation loss function and the multi-modal discrimination loss function is as follows:

step 6.1, constructing a multi-modal generating loss function, as shown in formula (C):

wherein:

respectively taking the optical remote sensing image as a multimode mode and taking the optical remote sensing image as an image of the resistance loss and the digital elevation model as a resistance loss;

the hyper-parameters alpha, beta and gamma are weight coefficients of perception loss, deception loss and cooperative attack loss;

L _f to defraud the loss; classified as t-th by error, spoofing loss L _f The definition is shown as a formula (H);

L _c the definition of the synergistic loss is shown as formula (I);

the definitions are shown in formulas (D) and (E):

wherein: d _T (x′ _T ) Representing that the generated optical remote sensing image is confronted with a sample x' _T Input to the optical remoteSensory image authentication network D _T The obtained output probability; d _D (x′ _D ) Representing the generated digital elevation model image against the sample x' _D Input to a digital elevation model image discriminator network D _D The obtained output probability;

respectively representing the perception loss of the optical remote sensing image and the perception loss of the digital elevation model image, and the definitions are shown as formulas (F) and (G):

wherein: epsilon is a hyperparameter controlling the minimum allowable disturbance intensity, | | | | computational complexity _P L representing a disturbance _P A norm;

formula (H) is:

f＝l _f (f(x′ _T ，x′ _D )，t) (H)；

wherein: f is the target model to be attacked, l _f Obtaining a target label t of the target attack for the cross entropy loss by the third step;

formula (I) is:

L _C ＝l _C (T(G _T (x _T ))，G _D (x _D )) (I)；

wherein: t is a band variation function; l _C Is a cosine similarity measure function;

step 6.2, constructing a multi-modal discrimination loss function, as shown in formula (J):

wherein:

the multi-mode discriminator loss function is an optical remote sensing image and a digital elevation model image respectively, and is defined as shown in a formula (K) and a formula (L):

wherein: d _T (x _T ) Representing training sample x for original multi-modality to optical remote sensing image _T Input to a multimode optical remote sensing image identification network D _T The obtained output probability; d _D (x _D ) Representing training sample x of original digital elevation model image _D Input to the digital elevation model image discrimination network D _D The obtained output probability; d _T (x′ _T ) Representing that the generated optical remote sensing image is confronted with a sample x' _T Input to an optical remote sensing image discrimination network D _T The obtained output probability; d _D (x′ _D ) Representing the generated digital elevation model image against the sample x' _D Input to a digital elevation model image discriminator network D _D The resulting output probability.

Further, in the second step, the target model f is a trained multi-source remote sensing image classification network.

Further, in step three, the prediction probability score P = [ P ] of the sample is obtained by forward calculation of the target model f ₁ ，p ₂ ，...，p _n ] ^T ；

Performing One-Hot encoding according to the original label and the category number of the sample to obtain an encoded vector, where if the original sample label is 1, the encoded vector is h = [1,0., 0 =] ^T The vector length is the number of classes of the sample;

obtaining an inverse mask from the encoded vector

Multiplying the reverse mask by the prediction probability score P to obtain the prediction probability value P' = [0,p ] of other categories except the sample original label category probability ₂ ，...，p _n ] ^T Performing difference processing on p ', subtracting a larger value from the position of the original label, namely the difference s = p' -h × 1e10, and taking the number of positions where the maximum value of the difference s is located as the target category;

wherein: p is a radical of _n The probability score of the output nth class is represented.

Further, in step four, the multi-modal attack-resisting network generates a network G by the disturbance of the optical remote sensing image _T Optical remote sensing image discriminator network D _T Digital elevation model image disturbance generation network G _D And a digital elevation model image discriminator network D _D And (4) forming.

Further, in step five, the multi-modal confrontation sample generation process is as follows:

training sample x of optical remote sensing image _T Input to optical remote sensing image disturbance generation network G _T In the method, optical remote sensing image disturbance is generated, and the generated optical remote sensing image disturbance is added to an input optical remote sensing image training sample to obtain a TOP confrontation sample x' _T As shown in formula (A):

x′ _T ＝x _T +G _T (x _T ) (A)；

wherein: g _T (x _T ) Representing TOP disturbance generated by an optical remote sensing image disturbance generation network.

Training sample x of digital elevation model image _D Input to the image disturbance generation network D of the digital elevation model _T Generating image disturbance of a digital elevation model, adding the generated disturbance to an input digital elevation model image training sample to obtain a countermeasure sample x 'of the digital elevation model image' _D As shown in formula (B):

x′ _D ＝x _D +G _D (x _D ) (B)；

wherein: g _D (x _D ) And representing the image disturbance of the digital elevation model generated by the image disturbance generation network of the digital elevation model.

Further, in step eight, the specific process of attacking the multimodal remote sensing image classification network by using the test sample is as follows:

testing TOP samples

Inputting the input to the trained TOP disturbance generation network, and outputting the TOP disturbance

And adding the TOP disturbance and the input TOP test sample to obtain a confrontation sample of the TOP test sample

Simultaneous DSM testing

Inputting the data to a trained DSM disturbance generation network, and outputting the DSM disturbance

And adding the DSM disturbance to the input DSM test sample to obtain a countermeasure sample of the DSM test sample

Finishing the attack resistance of the multi-mode remote sensing image classification network; the concrete expression is as follows:

further, the specific process of step nine is as follows:

training sample x of optical remote sensing image _T Is inputted intoThe trained good optical remote sensing image disturbance generates network output optical remote sensing image disturbance, the optical remote sensing image disturbance is added with the input optical remote sensing image training sample to obtain a countermeasure sample x 'of the optical remote sensing image training sample' _T ；

Training sample x of digital elevation model image _D Inputting the image disturbance of the trained digital elevation model into a disturbance generation network, outputting the image disturbance of the digital elevation model, and adding the image disturbance of the digital elevation model and the input image training sample of the digital elevation model to obtain a countermeasure sample x 'of the image training sample of the digital elevation model' _D (ii) a X' _T And x' _D Adding the training samples into the training sample set to obtain a new training sample set T', namely { x _T ，x _D ，x′ _T ，x′ _D E, T', and retraining the target model f in the step two.

The invention has the beneficial effects that: 1. the method considers that multi-source disturbance is generated by combining a plurality of modes, and the generated countermeasure sample is more real. 2. A multi-modal generation loss function and a multi-modal identification loss function are designed to establish a relation between different modal disturbances, so that the interference intensity added to each data source can be effectively reduced, and meanwhile, a higher attack success rate is kept.

Drawings

FIG. 1 is a research flow block diagram of an attack resisting method for a multi-source remote sensing image classification network according to the invention;

FIG. 2 is a data set used in the experiments of the present invention;

2a is a Potsdam dataset; 2b is a Vaihingen dataset;

FIG. 3 is a schematic diagram of the structure of the object model, disturbance generation network, discriminator network of the present invention;

FIG. 4 is a diagram of the classification before and after various approaches to confrontation training on a Potsdam dataset;

FIG. 5 is a classification chart of different classification methods on the Vaihingen dataset before and after the countertraining.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

The invention discloses an attack resisting method for a multi-mode remote sensing image classification network, which comprises the following steps as shown in figure 1:

the multi-modes are three-band optical remote sensing images (TOP) and one-band digital elevation model images (DSM).

Inputting two multi-modal data TOP and DSM, and a real class diagram corresponding to the two modalities, as shown in FIG. 2; according to each pixel on the two multi-modal data TOP and DSM, taking the pixel as the center, defining a spatial window with the size of 27 multiplied by 27 pixels to extract samples, and respectively extracting the TOP samples and the DSM samples in pairs on the TOP and the DSM to form a sample pair; forming a sample set according to all the sample pairs; selecting part of sample pairs to form a training sample set, and selecting the rest of sample pairs in the sample set to form a test sample set, wherein { x _T ，x _D }∈T,

x _T And

respectively TOP training and TOP test samples, x _D And

DSM training samples and DSM test samples, respectively.

in the second step, the target model f is a trained multi-source remote sensing image classification network. As shown in fig. 3 (a), the configuration is specifically: the method comprises the steps of inputting two data sources, wherein the data input size of a TOP data source is 28 multiplied by 3, the data input size of a DSM data source is 28 multiplied by 1, each data source is provided with 5-layer networks to extract basic features of each data source respectively, then the outputs of the two data sources are connected, and then common features of the two data sources are extracted and classified through the 4-layer networks. The network parameters of the target model f are specifically shown in table 1, the target model f is trained well in advance, and the parameters of the target model are kept unchanged in the attack process.

TABLE 1 target model f network parameters

in the third step, each pair of samples x in the training sample set T is separately paired _T ，x _D Carrying out forward calculation through a pre-trained target model f to obtain an output probability score of each sample on each class, selecting a target attack class according to the output probability score, and assuming that a training sample pair is x _T ，x _D The output probability score is P = [ P ] ₁ ，p ₂ ，...，p _n ] ^T ，p _n Representing the probability score of the output nth class, and constructing a target class according to the following principle:

obtaining a prediction probability score P = [ P ] of the sample through forward calculation of the target model f ₁ ，p ₂ ，...，p _n ] ^T ；

obtaining an inverse mask from the encoded vector

Multiplying the reverse mask by the prediction probability score P to obtain the prediction probability value P' = [0,p ] of other categories except the sample original label category probability ₂ ，...，p _n ] ^T In order to avoid the repetition of the target category label and the original category label, difference processing is performed on p ', a larger value is subtracted from the position of the original label, namely the difference s = p' -h × 1e10, and the number of positions where the maximum value of the difference s is located is taken as the number of the positions where the maximum value of the difference s is locatedA target category;

Step four, building a multi-mode anti-attack network; the multi-modal attack-resisting network consists of a disturbance generation network and a discriminator network of each mode;

in the fourth step, the multi-modal attack-resisting network generates a network G by the disturbance of the optical remote sensing image _T Optical remote sensing image discriminator network D _T Digital elevation model image disturbance generation network G _D And a digital elevation model image discriminator network D _D And (4) forming.

As shown in fig. 3 (b) (c). The disturbance generating network is a coding-decoding structure in which G _T The image input size of the layer 1 convolutional layer of (2) is 28X 3,G _D Has an input size of 28 × 28 × 1, and the number of filters in the third layer in the decoding structure depends on the number of channels in the picture, G _T Is 3,G _D To 1, in addition to this parameter setting, the disturbance generation network G _T And G _D The other parameter settings are the same, as shown in table 2; TOP discriminator network D _T Layer 1 convolutional layer input picture size 28 x3, dsm discriminator network D _D The size of the convolution layer 1 input picture is 28 × 28 × 1, and the discriminator D _T And D _D The same parameter settings are shown in table 3.

Table 2 network parameters of a disturbance generation network

TABLE 3 discriminator network parameters

Step five, generating a multi-modal confrontation sample;

in the fifth step, the multi-modal confrontation sample generation process is as follows:

x′ _T ＝x _T +G _T (x _T ) (A)；

x′ _D ＝x _D +G _D (x _D ) (B)；

in the sixth step, the specific process of constructing the multi-modal generating loss function and the multi-modal discriminating loss function is as follows:

inputting TOP countercheck samples into a TOP discriminator and a target model f respectively, and simultaneously inputting DSM countercheck samples into a DSM discriminator and a target function f respectively, and constructing a multi-modal generation loss function and a multi-modal discrimination loss function according to TOP disturbance, the TOP countercheck samples, TOP at the output of the discriminator and TOP at the output of the target model, and DSM disturbance, DSM countercheck samples, DSM at the output of the discriminator and DSM at the output of the target model; the concrete structure is as follows:

wherein:

respectively taking the optical remote sensing image as a multimode mode and taking the optical remote sensing image as an image of the resistance loss and the digital elevation model as a resistance loss; the TOP challenge samples generated with the constraints are closer to the input TOP training samples and the DSM challenge samples generated with the constraints are closer to the input DSM training samples.

L _f for spoofing losses, classified by mistake as category t, spoofing losses L _f The definition is shown as a formula (H);

L _c is a cooperative loss to realize multi-modal cooperative attack, and the definition of the cooperative loss is shown as a formula (I);

the definitions are shown in formulas (D) and (E):

wherein: d _T (x′ _T ) Representing that the generated optical remote sensing image is confronted with a sample x' _T Input to an optical remote sensing image discrimination network D _T The obtained output probability; d _D (x′ _D ) Representing the generated digital elevation model image against the sample x' _D Input to a digital elevation model image discriminator network D _D The obtained output probability;

respectively representing the optical remote sensing image perception loss and the digital elevation model image perception loss so as to respectively restrict the disturbance intensity of TOP disturbance and DSM disturbance, and the definitions are shown as formulas (F) and (G):

wherein: epsilon is a hyperparameter controlling the minimum allowable disturbance intensity, | | | | computational complexity _P L representing a disturbance _P A norm; used in the examples is _∞ The norm constrains the perturbation, forcing the generated countermeasure sample closer to the true sample.

Formula (H) is:

L _f ＝l _f (f(x′ _T ，x′ _D )，t) (H):

formula (I) is:

L _C ＝l _C (T(G _T (x _T ))，G _D (x _D )) (I)；

wherein: t is a band variation function; because the TOP data has three bands and the DSM perturbation has one band, the DSM perturbation needs to be expanded repeatedly to three bands for similarity calculation. l _C The method is a cosine similarity measurement function and is used for measuring the similarity of disturbance under different modes.

wherein:

wherein: d _T (x _T ) Representing training sample x for original multimodality to optical remote sensing image _T Input to a multimode optical remote sensing image identification network D _T The obtained output probability; d _D (x _D ) Representing training sample x of original digital elevation model image _D Input to the digital elevation model image discrimination network D _D The obtained output probability; d _T (x′ _T ) Representing that the generated optical remote sensing image is confronted with a sample x' _T Input to an optical remote sensing image discrimination network D _T The obtained output probability; d _D (x′ _D ) Representing the generated digital elevation model image against the sample x' _D Input to a digital elevation model image discriminator network D _D The resulting output probability.

Seventhly, respectively training the multi-modal generation loss function and the multi-modal identification loss function in the sixth step alternately, updating each disturbance generation network and each identification network, and finishing the training of the multi-modal anti-attack network;

alternately training by utilizing a multi-mode generator loss and multi-mode discriminator loss optimization model, respectively training a TOP disturbance generation network, a TOP discrimination network, a DSM disturbance generation network and a DSM discrimination network, and performing the following steps:

step 7.1, update TOP discriminator D _T . Health-care productDSM discriminator D _D DSM disturbance generator G _D TOP perturbation generator G _T Using a gradient descent method to minimize the formula (M) for the TOP discriminator D _T And (5) training.

Step 7.2 update TOP perturbation generator G _T . Hold TOP discriminator D _T DSM disturbance generator G _D DSM discriminator D _D The TOP disturbance generator G is subjected to minimization of the formula (N) by a gradient descent method under the condition that the parameters are not changed _T And (6) updating.

Step 7.3, update DSM discriminator D _D . Preserving DSM disturbance generator G _D TOP discriminator D _T TOP disturbance generator G _T Using gradient descent with unchanged parameters to minimize formula (O) for DSM discriminator D _D And (6) updating.

Step 7.4, update DSM disturbance Generator G _D . Holding DSM discriminator D _D TOP perturbation generator G _T TOP discriminator D _T Using gradient descent method with unchanged parameters to minimize formula (P) for DSM disturbance generator G _D And (4) updating.

In the step eight, the specific process of attacking the multi-modal remote sensing image classification network by using the test sample is as follows:

testing TOP samples

Adding the TOP disturbance and the input TOP test sample to obtain a confrontation sample of the TOP test sample

Simultaneous DSM testing

Inputting the data to a trained DSM disturbance generation network, and outputting DSM disturbance

the ninth step is as follows: inputting training samples into a multi-modal counterattack network to obtain countersamples of the training samples, adding the countersamples of the training samples into a training sample set, and re-training the target model f in the second step; and repeating the third step to the eighth step.

The method comprises the following specific steps: optical remote sensing imageTraining sample x _T Inputting the optical remote sensing image disturbance into a trained optical remote sensing image disturbance to generate a network output optical remote sensing image disturbance, and adding the optical remote sensing image disturbance and an input optical remote sensing image training sample to obtain a countermeasure sample x 'of the optical remote sensing image training sample' _T ；

Training sample x of digital elevation model image _D Inputting the image disturbance of the trained digital elevation model into a disturbance generation network, outputting the image disturbance of the digital elevation model, and adding the image disturbance of the digital elevation model and the input image training sample of the digital elevation model to obtain a countermeasure sample x 'of the image training sample of the digital elevation model' _D (ii) a X' _T And x' _D Adding the new training sample set T 'into the training sample set to obtain a new training sample set T', namely { x _T ，x _D ，x′ _T ，x′ _D E, the target model f in the step two is trained again to enhance the capability of the model to cope with attacks.

The effect of the method of the present invention can be further illustrated by the following simulation experiments:

(1) Simulation conditions are as follows:

the hardware conditions of the simulation of the invention are as follows: windows10, GPU NVIDIA GeForce RTX3060; the software platform is as follows: matlabR2016a, pycharm;

the picture sources selected for simulation are Potsdam dataset and Vaihingen dataset. The Potsdam data set comprises 28 6000-pixel unmanned aerial vehicle images in total, the resolution is 5 cm/pixel, the data set comprises image data, terrain data and label data of three different channels, and a pair of DSM data and TOP data are randomly selected as input of the multi-source remote sensing image network as shown in 2a in figure 2. The Vaihingen data set comprises 33 remote sensing images with different sizes, TOP data with 3 wave bands and DSM data with a single wave band, the resolution is 9 cm/pixel, and a pair of DSM data and TOP data are selected as the input of a multi-source remote sensing image classification network and are shown as 2b in figure 2; in the invention, 10000 pixels are randomly selected from a Potsdam data set to each class respectively to construct a training sample, and 800 pixels are randomly selected from a Vaihingen data set to each class respectively to construct a training sample.

Simulation content and results:

simulation 1, the method of the present invention and the existing three technologies are used to perform classification simulation on the two data sets shown in fig. 2, and the results are as follows:

fig. 4 (a) to (d) are FGSM, C & W, PGD and the classification effect diagram of the technology of the present invention on the target model f for the test countermeasure sample on the Potsdam image dataset, respectively, (i) is the classification effect diagram of the network on the original test sample, and can be obtained from the diagram, the countermeasure sample generated by the four technologies can make the target model generate obvious wrong scores compared with the classification diagram of the original test sample on the target model f, and the classification diagrams (a) - (d) have obvious differences with the classification diagram (i), but the classification diagram obtained by the present invention has more obvious differences with the diagram (i), and the wrong regions are more numerous, which indicates that the countermeasure sample generated by the technology of the present invention is more aggressive.

Fig. 5 (a) to (d) are graphs of the classification effect of the FGSM, C & W, PGD and the technique of the present invention on the target model f for the testing challenge sample on the Vaihingen image dataset, respectively, (i) is a graph of the classification effect of the network on the original testing sample, and fig. 5 shows the same result as fig. 4.

The numerical comparison of the classification results before and after the antagonistic training on the data set between the method of the present invention and the prior art FGSM, C & W, PGD is shown in tables 4 and 5:

table 4 is a numerical comparison of classification results before and after the countertraining on Potsdam dataset for the method of the present invention and prior art FGSM, C & W, PGD

Table 5 is a numerical comparison of the classification results of the present invention method and prior art FGSM, C & W, PGD before and after the challenge training on the Vaihingen dataset

The data in tables 4 and 5 show that the technique of the present invention has shorter test time and higher attack time efficiency. It is obvious that the method has obvious advantages in processing multi-modal data attacks.

Claims

1. An attack resisting method for a multi-mode remote sensing image classification network is characterized by comprising the following steps:

step five, generating a multi-modal confrontation sample;

step six, respectively inputting the multi-modal confrontation samples in the step five into a corresponding disturbance generation network, a discriminator network and a target model f to construct a multi-modal generation loss function and a multi-modal discrimination loss function;

2. The attack-fighting method facing the multi-modal remote sensing image classification network as claimed in claim 1, wherein after the step eight, the method further comprises a step nine as follows: inputting training samples into a multi-modal counterattack network to obtain countersamples of the training samples, adding the countersamples of the training samples into a training sample set, and re-training the target model f in the second step; and repeating the third step to the eighth step.

3. The attack-fighting method facing multi-modal remote sensing image classification network according to claim 1 or 2, characterized in that the multi-modalities are optical remote sensing images and digital elevation model images.

4. The attack resisting method for the multi-modal remote sensing image classification network as claimed in claim 3, wherein the specific process of constructing the multi-modal generation loss function and the multi-modal identification loss function is as follows:

wherein:

respectively taking the optical remote sensing images as a multimode mode to resist loss and the digital elevation model images to resist loss;

L _f to defraud the loss; classified as t-th by error, spoofing loss L _f The definition is shown as formula (H);

L _c the definition of the synergistic loss is shown as formula (I);

the definitions are shown in formulas (D) and (E):

wherein: ε is a hyperparameter controlling the minimum allowable disturbance intensity, | | | | | non-calculation _P L representing a disturbance _P A norm;

formula (H) is:

L _f ＝l _f (f(x′ _T ，x′ _D )，t) (H)；

formula (I) is:

L _C ＝l _C (T(G _T (x _T ))，G _D (x _D )) (I)；

wherein:

the multi-mode is an identifier loss function of an optical remote sensing image and a digital elevation model image respectively, and the definitions are respectively shown as a formula (K) and a formula (L):

5. The attack-resisting method facing the multi-modal remote sensing image classification network as claimed in claim 4, wherein in the second step, the target model f is a trained multi-source remote sensing image classification network.

6. The method for resisting attacks on the multi-modal remote sensing image classification network as claimed in claim 1 or 2, wherein in the third step, the predicted probability score P = [ P ] of the sample is obtained through forward calculation of the target model f ₁ ，p ₂ ，...，p _n ] ^T ；

Performing One-Hot encoding according to the original label and the category number of the sample to obtain an encoding vector, wherein if the original sample label is 1, the encoding vector is h = [1,0., 0 =] ^T The vector length is the number of classes of the sample;

obtaining an inverse mask from the encoded vector

7. The method for resisting attack to the multi-modal remote sensing image classification network as claimed in claim 1 or 2, wherein in the fourth step, the multi-modal anti-attack network generates the network G by the disturbance of the optical remote sensing image _T Optical remote sensing image discriminator network D _T Digital elevation model image disturbance generation network G _D And a digital elevation model image discriminator network D _D And (4) forming.

8. The attack resisting method facing the multi-modal remote sensing image classification network as claimed in claim 1 or 2, characterized in that in the fifth step, the multi-modal resisting sample generation process is as follows:

x′ _T ＝x _T +G _T (x _T ) (A)；

Training sample x of digital elevation model image _D Input to digital elevation model image disturbance generation network D _T Generating image disturbance of a digital elevation model, adding the generated disturbance to an input digital elevation model image training sample to obtain a countermeasure sample x 'of the digital elevation model image' _D As shown in formula (B):

x′ _D ＝x _D +G _D (x _D ) (B)；

9. The attack resisting method facing the multi-modal remote sensing image classification network as claimed in claim 1 or 2, characterized in that in the eighth step, the specific process of attacking the multi-modal remote sensing image classification network by using the test sample is as follows:

testing TOP samples

And adding the TOP disturbance and the input TOP test sample to obtainChallenge samples of TOP test samples

Simultaneous DSM testing

And adding the DSM perturbation to the input DSM test sample to obtain an antagonistic sample of the DSM test sample

。

10. the attack resisting method facing the multi-modal remote sensing image classification network as claimed in claim 2, wherein the specific process of the ninth step is as follows:

training sample x of optical remote sensing image _T The disturbance of the trained good optical remote sensing image is input to generate disturbance of the optical remote sensing image output by the network, the disturbance of the optical remote sensing image is added with the training sample of the input optical remote sensing image to obtain a countermeasure sample x 'of the training sample of the optical remote sensing image' _T ；

Training sample x of digital elevation model image _D Inputting the image disturbance of the trained digital elevation model into a disturbance generation network, outputting the image disturbance of the digital elevation model, and performing disturbance of the image of the digital elevation model and the input digital elevation modelAdding the image training samples to obtain a confrontation sample x 'of the digital elevation model image training sample' _D (ii) a X' _T And x' _D Adding the new training sample set T 'into the training sample set to obtain a new training sample set T', namely { x _T ，x _D ，x′ _T ，x′ _D E, T', and retraining the target model f in the step two.