AU2021103604A4 - Soft threshold defense method for adversarial examples of remote sensing images - Google Patents

Soft threshold defense method for adversarial examples of remote sensing images

Info

Publication number
AU2021103604A4
Authority
AU
Australia
Prior art keywords
soft threshold
adversarial
remote sensing
defense
adversarial examples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021103604A
Inventor
Li Chen
Jiale Duan
Haifeng Li
Qi Li
Mingming Lu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Application granted granted Critical
Publication of AU2021103604A4 publication Critical patent/AU2021103604A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

A soft threshold defense method for adversarial examples of remote sensing images includes: saving correctly classified remote sensing images and adversarial examples in a validation set under a same class, and removing remote sensing images that cannot be correctly classified in the validation set; re-classifying the images saved in the validation set by treating original images as positive samples and the adversarial examples as negative samples; obtaining a new dataset by combining the output confidences of classes; training a logistic regression model on the dataset; obtaining thresholds of the output confidences through decision boundaries of original images and adversarial examples; and determining whether a current input image is an adversarial example by comparing an output confidence of the current input image with the soft threshold used for defense. Attacks by adversarial examples in the scene classification problem of remote sensing images can thus be effectively defended against.

Description

[FIG. 1, described below, is a schematic flowchart of the method: an input RSI passes through the convolutional, pooling and fully connected layers of a CNN classifier; the predicted confidence is compared with the per-class soft threshold, which is learned from correctly classified RSIs (positive samples) and adversarial examples (negative samples); if the confidence exceeds the threshold the input is treated as safe, otherwise as unsafe.]
SOFT THRESHOLD DEFENSE METHOD FOR ADVERSARIAL EXAMPLES OF REMOTE SENSING IMAGES
FIELD OF THE INVENTION
[0001] The invention relates to the field of remote sensing image classification, and
more particularly to a soft threshold defense method for adversarial examples of
remote sensing images.
BACKGROUND
[0002] Due to their excellent feature extraction ability and high accuracy,
convolutional neural networks (CNNs) have become a general technique for object
recognition in the remote sensing field and are widely applied to tasks such as
disaster management, forest monitoring, and urban planning. A well-performing CNN
can bring high economic benefit. However, many studies have shown that CNNs are
vulnerable to adversarial examples, which are carefully generated, imperceptible to
human observers, and can cause the model to predict incorrect results with high
confidence. Adversarial examples fool the models into predicting wrong results
through generated perturbations, demonstrating the vulnerability of CNNs, and they
have become the most critical security concern for CNNs in real-world applications.
Recent studies also show that many CNNs applied to remote sensing image (RSI)
scene classification are still subject to adversarial example attacks.
[0003] In response to these attack algorithms, numerous defense algorithms have also
emerged. These algorithms can be divided into two types. The first type makes the
model itself robust, for example through adversarial training, gradient masking, or
input transformation; such methods improve the robustness of the model by modifying
the model structure or adding regularization terms so that generating adversarial
examples becomes more difficult. However, these algorithms require re-training the
model and are computationally intensive. The second type is detection-only. These
algorithms usually require training a new detector to extract features of the input
image in advance and determine whether the input is an adversarial example based on
those features.
SUMMARY
[0004] Through further analysis of adversarial examples of RSIs, it is found that the
misclassified classes are not random; these adversarial examples demonstrate attack
selectivity. Inspired by the attack selectivity of RSIs to adversarial examples, the
invention considers that the distribution of misclassified classes of adversarial
examples generated from the same class is stable and that they are distinguishable
from the original class by a decision boundary. Based on this finding, a soft threshold
defense method is proposed. It determines whether a current input image is an
adversarial example by comparing an output confidence with a soft threshold of a
class. Specifically, all correctly predicted images under a class are treated as positive
samples, and all adversarial examples under the class generated with various attack
algorithms are treated as negative samples. The confidences output by the model are
then used as inputs to train a logistic regression model. Decision boundaries between
the original images and adversarial examples can be obtained from the logistic
regression model, and the threshold of confidence, i.e., the soft threshold of the class
used for defense, is further obtained. Regardless of the type of attack, each class has a
soft threshold. In contrast to natural image-based defense algorithms, the algorithm
proposed in the invention is based on the response properties of RSIs to adversarial
examples. Experiments show that the algorithm performs well on a variety of models
and attack algorithms across multiple RSI datasets.
[0005] A soft threshold defense method for adversarial examples of remote sensing
images provided in the invention may specifically include the following steps of:
saving output confidences of correctly classified remote sensing images and
corresponding generated adversarial examples in a validation set under a same class,
and removing remote sensing images that cannot be correctly classified in the
validation set;
re-classifying the remote sensing images saved in the validation set by treating
original images as positive samples and the adversarial examples as negative samples;
obtaining a new dataset by combining the output confidences, wherein each input
data in the new dataset includes the output confidence of each of the remote sensing
images and label data denoting whether the remote sensing image is an adversarial
example, an image whose label data is 0 is the adversarial example, and an image
whose label data is 1 is the original image;
training a logistic regression model on the new dataset D;
obtaining thresholds of the output confidences through decision boundaries of the
original images and the adversarial examples, wherein the thresholds include a soft
threshold used for defense of each class; and
selecting the soft threshold used for defense of a corresponding class according to
a class of a current input image, and determining whether the current input image is an
adversarial example by comparing an output confidence of the current input image
with the soft threshold used for defense.
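To make the above flow concrete, the following is a minimal, hedged sketch of how the soft threshold of one class could be computed; it uses scikit-learn's LogisticRegression only as a stand-in for the gradient-descent fit detailed in the embodiments below, and all function and variable names are illustrative assumptions rather than part of the claimed method.

```python
# A minimal sketch of the per-class soft threshold fit, assuming the output
# confidences have already been collected; scikit-learn stands in for the
# gradient-descent procedure described later. Names are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def soft_threshold_for_class(clean_conf, adv_conf):
    """clean_conf: confidences of correctly classified original RSIs (label 1).
    adv_conf:   confidences of their adversarial examples (label 0).
    Returns the confidence r at which the fitted model gives p(original) = 0.5."""
    x = np.concatenate([clean_conf, adv_conf]).reshape(-1, 1)
    y = np.concatenate([np.ones(len(clean_conf)), np.zeros(len(adv_conf))])
    model = LogisticRegression().fit(x, y)
    w, b = model.coef_[0, 0], model.intercept_[0]
    return -b / w  # w * r + b = 0  <=>  sigmoid(w * r + b) = 0.5
```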
[0006] In an embodiment, the step of training a logistic regression model on the new
dataset D may include:
calculating a posterior probability of the original image by a Sigmoid function
substituting a step function between an input x and a corresponding label y;
using a maximum likelihood method to solve weights in the logistic regression
model;
calculating an average log-likelihood loss of the new dataset D; and
obtaining iteratively optimal weights under a gradient descent algorithm.
[0007] In an embodiment, for the new dataset D, the Sigmoid function may be defined
as follows:
p(x) = y = 1 / (1 + e^(-z)),
z = wx + b,
where w, b represent the weights in the logistic regression model, and p(x)
represents a probability that the input x is classified as 1 and is the posterior
probability of the original image.
[0008] In an embodiment, the probability of adversarial examples may be calculated
as follows:
P(y | x; w, b) = p(x)^y (1 - p(x))^(1-y),
where P(y | x; w, b) represents a probability for whether the input x is an
adversarial example.
[0009] In an embodiment, the maximum likelihood method may be carried out as
follows:
L(w, b) = ∏_{i=1}^{n} p(x_i)^{y_i} (1 - p(x_i))^{1-y_i}.
[0010] In an embodiment, the average log-likelihood loss may be calculated as
follows:
J(w, b) = -(1/n) log L(w, b) = -(1/n) Σ_{i=1}^{n} (y_i log p(x_i) + (1 - y_i) log(1 - p(x_i))).
[0011] In an embodiment, the optimal weights w*, b* may be calculated as follows:
w^(k+1) = w^k - α ∂J(w, b)/∂w,
b^(k+1) = b^k - α ∂J(w, b)/∂b,
where α represents the learning rate and k represents the number of iterations.
[0012] In an embodiment, the threshold r may be expressed as follows:
r = x, if p(x; w*, b*) = 0.5,
where the threshold r is the soft threshold used for defense, and each class has a
corresponding soft threshold.
[0013] The soft threshold defense method proposed in the invention also belongs to
the detection-only category, which means finding an adversarial example and
rejecting it. However, the algorithm proposed in the invention requires neither
significant computation nor retraining of the model. Furthermore, the algorithm
originates from the response property of RSIs to adversarial examples, and therefore
applies to adversarial example problems in the remote sensing field. In contrast to
natural image-based defense algorithms, the algorithm proposed in the invention is
based on properties of RSIs to adversarial examples.
[0014] The invention can effectively defend against adversarial example attacks in
the scene classification of remote sensing images. Compared with other defense
algorithms that require modifying the model structure or are computationally
complex, the algorithm proposed in the invention is simple and effective. In several
scenarios, the fooling rates of the FGSM, BIM, DeepFool and C&W attack algorithms
are reduced on average by 97.76%, 99.77%, 68.18% and 97.95%, respectively. These
results show that the soft threshold defense method can effectively defend against the
fooling of adversarial examples.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] FIG. 1 is a schematic flowchart of a soft threshold defense method for
adversarial examples of remote sensing images of the invention.
DETAILED DESCRIPTION OF EMBODIMENTS
[0016] The invention will be further described below in conjunction with the
accompanying drawings, but the invention is not restricted in any way. Any changes
or substitutions made based on the teachings of the invention belong to the protection
scope of the invention.
[0017] According to the attack selectivity of RSIs to adversarial examples, the key to
the soft threshold defense method is to obtain the confidence threshold for each class
correctly. When the output confidence is higher than that threshold, the input RSI is
considered safe; when the output confidence is lower than that threshold, the RSI may
be an adversarial example and is considered unsafe.
[0018] As shown in FIG. 1, a soft threshold defense method for adversarial examples
of remote sensing images disclosed in the invention specifically includes the
following steps:
[0019] S10: saving output confidences of correctly classified remote sensing images
and corresponding generated adversarial examples in a validation set under a same
class, and removing remote sensing images that cannot be correctly classified in the
validation set.
The output confidences of the correctly classified RSIs and the correspondingly
generated adversarial examples in the validation set under the same class are saved in
the invention. These negative samples originate from multiple attacks, and the soft
thresholds obtained for each class are independent of the type of attack. The validation
set is used in the invention because the output confidences of the images in the
training set are high, so thresholds obtained from the training set would also be high;
with such thresholds, many original RSIs would wrongly be treated as adversarial
examples. Also, the RSIs that cannot be correctly classified in the validation set are
removed in the invention, and the adversarial examples are generated from the
correctly classified RSIs in the validation set, because the incorrectly classified RSIs
already cause the model to misclassify, which is not consistent with the definition of
adversarial examples.
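As a hedged illustration of step S10, the sketch below filters a validation set down to the correctly classified RSIs and records, per class, the confidence the model assigns to the true class; the function and argument names are illustrative assumptions and softmax outputs are assumed to be available.

```python
# A minimal sketch of step S10, assuming the classifier's softmax outputs
# have been computed for the whole validation set. Names are illustrative.
import numpy as np

def collect_clean_confidences(probs, labels):
    """Keep only validation RSIs the model classifies correctly and return,
    per class, the confidence assigned to the (correct) predicted class.

    probs:  array of shape (n_images, n_classes) with softmax outputs.
    labels: array of shape (n_images,) with ground-truth class indices.
    """
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    predicted = probs.argmax(axis=1)
    correct = predicted == labels                 # drop misclassified RSIs
    confidences = {}
    for c in np.unique(labels[correct]):
        mask = correct & (labels == c)
        confidences[int(c)] = probs[mask, c]      # confidence of the true class
    return confidences
```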
[0020] S20: re-classifying the remote sensing images saved in the validation set by
treating original images as positive samples and the adversarial examples as negative
samples.
The saved results are re-classified by treating the original images as positive
samples and the adversarial examples as negative samples in the invention. Combining
their output confidences, a new dataset D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} can
be obtained, where x denotes the output confidence of each of the RSIs, y represents
label data which indicates whether the remote sensing image is an adversarial
example, and n represents the size of the dataset. y is either 0 or 1, with 0 being the
adversarial example and 1 being the original image, so this is a binary classification
problem.
[0021] S30: obtaining a new dataset by combining the output confidence, wherein
each input data in the new dataset includes the output confidence of each of the
remote sensing images and label data denoting whether the remote sensing image is
an adversarial example, an image whose label data is 0 is the adversarial example, and
an image whose label data is 1 is the original image.
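A minimal sketch of assembling the per-class dataset D described in S20 and S30 is given below; the inputs are assumed to be the confidences saved in S10, the negative samples may come from several attack algorithms as noted above, and all names are illustrative assumptions rather than part of the claimed method.

```python
# A minimal sketch of building D = {(x_i, y_i)} for one class: confidences of
# correctly classified originals get label 1, adversarial examples get label 0.
import numpy as np

def build_class_dataset(clean_confidences, adv_confidences_per_attack):
    """clean_confidences: 1-D array of confidences of correctly classified
    originals (label 1).
    adv_confidences_per_attack: dict mapping attack name (e.g. 'FGSM', 'BIM')
    to a 1-D array of confidences of the adversarial examples (label 0)."""
    adv = np.concatenate([np.asarray(v, dtype=float)
                          for v in adv_confidences_per_attack.values()])
    clean = np.asarray(clean_confidences, dtype=float)
    x = np.concatenate([clean, adv])
    y = np.concatenate([np.ones(len(clean)), np.zeros(len(adv))])
    return x, y
```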
[0022] S40: training a logistic regression model on the new dataset D.
The logistic regression model is trained on the new dataset, and the decision
boundaries are obtained through the logistic regression algorithm.
[0023] S50: obtaining thresholds of the output confidences through decision
boundaries of the original images and the adversarial examples, wherein the thresholds
include a soft threshold used for defense of each class.
The thresholds used for defense are obtained from the decision boundaries of the
original images and adversarial examples. The confidences output by the model are
treated as the input for training the logistic regression model, the decision boundaries
between the original images and adversarial examples are obtained from the trained
logistic regression model, and the threshold of confidence, i.e., the soft threshold of the
class used for defense, is further obtained. Regardless of the type of attack, each class
has a soft threshold.
[0024] S60: selecting the soft threshold used for defense of a corresponding class
according to a class of a current input image, and determining whether the current
input image is an adversarial example by comparing an output confidence of the
current input image with the soft threshold used for defense. When the model predicts
a new RSI, the input is treated as an original image if the output confidence is higher
than the soft threshold of the corresponding class; otherwise, it is treated as an
adversarial example. Specifically, all correctly predicted images under a class are
treated as positive samples, and all adversarial examples under that class generated
with various attack algorithms are treated as negative samples.
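A hedged sketch of this prediction-time check is given below; `soft_thresholds` is assumed to map each class index to the threshold r obtained in S50, and the function and variable names are illustrative assumptions.

```python
# A minimal sketch of step S60: accept the prediction only if the output
# confidence of the predicted class exceeds that class's soft threshold.
import numpy as np

def defend(probs, soft_thresholds):
    """probs: softmax output of the CNN for one input RSI (1-D array).
    soft_thresholds: mapping class index -> soft threshold r for that class.
    Returns (predicted_class, is_safe)."""
    predicted_class = int(np.argmax(probs))
    confidence = float(probs[predicted_class])
    is_safe = confidence > soft_thresholds[predicted_class]
    return predicted_class, is_safe
```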
[0025] Specifically, step S40 includes the following steps:
[0026] S401: for the dataset D, the Sigmoid function is used in place of a step function
between x and y in the invention, and the Sigmoid function is defined as follows:
p(x) = y = 1 / (1 + e^(-z)), z = wx + b, (1)
where w and b represent the weights of the logistic regression model, and p(x)
represents a probability that an input x is classified as 1, which is the posterior
probability of the original image. Therefore, the following can be obtained:
P(y = 1 | x; w, b) = p(x), (2)
P(y = 0 | x; w, b) = 1 - p(x). (3)
[0027] Combining these two cases, the following can be obtained:
P(y | x; w, b) = p(x)^y (1 - p(x))^(1-y), (4)
where equation (4) represents the probability for whether the input x is an
adversarial example.
[0028] S402: further using the maximum likelihood method to solve the weights in the
logistic regression model, the likelihood function is as follows:
L(w, b) = ∏_{i=1}^{n} p(x_i)^{y_i} (1 - p(x_i))^{1-y_i}. (5)
[0029] To solve equation (5), the logarithm is taken on both sides, giving the
log-likelihood function:
log L(w, b) = Σ_{i=1}^{n} (y_i log p(x_i) + (1 - y_i) log(1 - p(x_i))). (6)
[0030] S403: calculating an average log-likelihood loss of the dataset with the
following equation:
J(w, b) = -(1/n) log L(w, b)
        = -(1/n) Σ_{i=1}^{n} (y_i log p(x_i) + (1 - y_i) log(1 - p(x_i))). (7)
[0031] S404: obtaining iteratively the optimal weights w*, b* under a gradient descent
algorithm, as shown below:
w^(k+1) = w^k - α ∂J(w, b)/∂w,
b^(k+1) = b^k - α ∂J(w, b)/∂b, (8)
where α represents a learning rate, and k represents the number of iterations.
After obtaining w*, b*, according to equation (1), the threshold r under this class
can be found in the invention, namely:
r = x, if p(x; w*, b*) = 0.5. (9)
The threshold r is the soft threshold used for defense, and each class has a
corresponding soft threshold. When the output confidence of an RSI is below the
threshold of its class, the input RSI is regarded as an adversarial example, which
effectively reduces the risk caused by adversarial examples.
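A minimal NumPy sketch of steps S401-S404, written to follow equations (1) to (9), is given below; the learning rate, iteration count, and function names are illustrative assumptions, not part of the claimed method.

```python
# A minimal sketch of S401-S404: fit p(x) = 1 / (1 + e^(-(w x + b))) by gradient
# descent on the average log-likelihood loss J(w, b) of equation (7), then read
# off the soft threshold r where p(r; w*, b*) = 0.5 (equation (9)).
import numpy as np

def fit_soft_threshold(x, y, lr=0.1, iters=5000):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w, b = 0.0, 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))   # equation (1)
        grad_w = np.mean((p - y) * x)            # dJ/dw for the loss in (7)
        grad_b = np.mean(p - y)                  # dJ/db
        w -= lr * grad_w                         # update rule of equation (8)
        b -= lr * grad_b
    r = -b / w                                   # p(r) = 0.5  <=>  w r + b = 0
    return w, b, r
```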
[0032] The existing datasets are used as follows to verify the technical effectiveness
of the invention.
[0033] 8 CNN models are selected in the invention, which are AlexNet, VGG16,
ResNet50, InceptionV4, Inception-ResNet, ResNeXt, DenseNet121 and
PNASNet. All these models are widely used in remote sensing
applications. Considering the diversity of data types and ground objects, 6 RSI
datasets are selected, which are the AID dataset, the UC Merced Land Use Dataset
(UCM) dataset, the NWPU-RESISC45 (NWPU) dataset, the EuroSAT-MS dataset,
the MSTAR dataset, and the partial SEN1-2 dataset. Therefore, there are 48
classification scenarios in the experiment. Then, 4 attack algorithms are used,
including FGSM, BIM, DeepFool, and C&W. All attack algorithms are used to
generate adversarial examples for each of the classification scenarios. In total, 192
attack scenarios have been used to verify the effectiveness of the method in the
invention.
[0034] In addition, the effectiveness of the defense is quantified in terms of the
change in the fooling rate. The fooling rate is the proportion, among all attack images,
of adversarial examples that cause the CNNs to produce incorrect results.
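A hedged sketch of this metric is shown below; the optional rejection mask illustrates how the soft threshold defense lowers the fooling rate by discarding flagged inputs, and all names are illustrative assumptions.

```python
# A minimal sketch of the fooling rate: the fraction of attack images whose
# adversarial example still yields a wrong, unrejected prediction.
import numpy as np

def fooling_rate(true_labels, predicted_labels, rejected=None):
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    fooled = predicted_labels != true_labels
    if rejected is not None:
        # With the soft threshold defense, an adversarial example that is
        # rejected by the per-class threshold no longer counts as fooling.
        fooled &= ~np.asarray(rejected, dtype=bool)
    return float(np.mean(fooled))
```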
[0035] Inspired by the attack selectivity of RSIs to adversarial examples, a soft
threshold defense method is proposed in the invention. This defense algorithm
distinguishes adversarial examples from original input images by learning the output
threshold of each class of the CNN-based classifiers.
[0036] In the experiments, the effectiveness of the soft threshold defense method is
verified by 48 classification scenarios under 4 attack algorithms. The experimental
results indicate that the proposed method can effectively defend against adversarial examples,
and the fooling rates of the CNNs are reduced to 0 in most cases. This reduces the risk
of adversarial examples to the CNNs in the remote sensing field.
[0037] The above-mentioned embodiment is an embodiment of the invention, but the
embodiments of the invention are not limited by the above-mentioned embodiment;
any other changes, modifications, substitutions, combinations and simplifications that
do not deviate from the spirit and principle of the invention should be regarded as
equivalent replacements, and they are all included in the protection scope of the
invention.

Claims (8)

WHAT IS CLAIMED IS:
1. A soft threshold defense method for adversarial examples of remote sensing
images, comprising following steps of:
saving output confidences of correctly classified remote sensing images and
corresponding generated adversarial examples in a validation set under a same class,
and removing remote sensing images that cannot be correctly classified in the
validation set;
re-classifying the remote sensing images saved in the validation set by treating
original images as positive samples and the adversarial examples as negative samples;
obtaining a new dataset by combining the output confidences, wherein each
input data in the new dataset comprises the output confidence of each of the remote
sensing images and label data denoting whether the remote sensing image is an
adversarial example, an image whose label data is 0 is the adversarial example, and an
image whose label data is 1 is the original image;
training a logistic regression model on the new dataset D;
obtaining thresholds of the output confidences through decision boundaries of
the original images and the adversarial examples, wherein the thresholds comprise a
soft threshold used for defense of each class; and
selecting the soft threshold used for defense of a corresponding class according
to a class of a current input image, and determining whether the current input image is
an adversarial example by comparing an output confidence of the current input image
with the soft threshold used for defense.
2. The soft threshold defense method as claimed in claim 1, wherein the step of
training a logistic regression model on the new dataset D comprises:
calculating a posterior probability of the original image by a Sigmoid function
substituting a step function between an input x and a corresponding label y;
using a maximum likelihood method to solve weights in the logistic regression
model;
calculating an average log-likelihood loss of the new dataset D; and
obtaining iteratively optimal weights under a gradient descent algorithm.
3. The soft threshold defense method as claimed in claim 2, wherein for the new
dataset D, the Sigmoid function is defined as follows:
p(x) = y = 1 / (1 + e^(-z)),
z = wx + b,
where w, b represent the weights in the logistic regression model, and p(x)
represents a probability that the input x is classified as 1 and is the posterior
probability of the original image.
4. The soft threshold defense method as claimed in claim 3, wherein a
probability of an adversarial example is calculated as follows:
P(y | x; w, b) = p(x)^y (1 - p(x))^(1-y),
where P(y | x; w, b) represents a probability for whether the input x is an
adversarial example.
5. The soft threshold defense method as claimed in claim 2, wherein the
maximum likelihood method is carried out as follows:
L(w, b) = ∏_{i=1}^{n} p(x_i)^{y_i} (1 - p(x_i))^{1-y_i}.
6. The soft threshold defense method as claimed in claim 5, wherein the average
log-likelihood loss is calculated as follows:
J(w, b) = -(1/n) log L(w, b) = -(1/n) Σ_{i=1}^{n} (y_i log p(x_i) + (1 - y_i) log(1 - p(x_i))).
7. The soft threshold defense method as claimed in claim 1, wherein the optimal
weights w*, b* are calculated as follows:
w^(k+1) = w^k - α ∂J(w, b)/∂w,
b^(k+1) = b^k - α ∂J(w, b)/∂b,
where α represents a learning rate, and k represents the number of iterations.
8. The soft threshold defense method as claimed in claim 7, wherein the
threshold r is expressed as follows:
r = x, if p(x; w*, b*) = 0.5,
where the threshold r is the soft threshold used for defense, and each class has a
corresponding soft threshold.
AU2021103604A 2021-05-18 2021-06-24 Soft threshold defense method for adversarial examples of remote sensing images Ceased AU2021103604A4 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021105380702 2021-05-18
CN202110538070.2A CN113269241B (en) 2021-05-18 2021-05-18 Soft threshold defense method for remote sensing image confrontation sample

Publications (1)

Publication Number Publication Date
AU2021103604A4 true AU2021103604A4 (en) 2021-08-12

Family

ID=77195651

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021103604A Ceased AU2021103604A4 (en) 2021-05-18 2021-06-24 Soft threshold defense method for adversarial examples of remote sensing images

Country Status (2)

Country Link
CN (1) CN113269241B (en)
AU (1) AU2021103604A4 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083001A (en) * 2022-07-22 2022-09-20 北京航空航天大学 Anti-patch generation method and device based on image sensitive position positioning

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643278B (en) * 2021-08-30 2023-07-18 湖南航天远望科技有限公司 Method for generating countermeasure sample for unmanned aerial vehicle image target detection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7057767B2 (en) * 2001-03-06 2006-06-06 Hewlett-Packard Development Company, L.P. Automatic background removal method and system
CN110674938B (en) * 2019-08-21 2021-12-21 浙江工业大学 Anti-attack defense method based on cooperative multi-task training
US11222242B2 (en) * 2019-08-23 2022-01-11 International Business Machines Corporation Contrastive explanations for images with monotonic attribute functions

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083001A (en) * 2022-07-22 2022-09-20 北京航空航天大学 Anti-patch generation method and device based on image sensitive position positioning

Also Published As

Publication number Publication date
CN113269241A (en) 2021-08-17
CN113269241B (en) 2022-05-06

Similar Documents

Publication Publication Date Title
Namba et al. Robust watermarking of neural network with exponential weighting
Zhao et al. idlg: Improved deep leakage from gradients
US11972408B2 (en) Digital watermarking of machine learning models
CN110309840B (en) Risk transaction identification method, risk transaction identification device, server and storage medium
US11475130B2 (en) Detection of test-time evasion attacks
CN108960080B (en) Face recognition method based on active defense image anti-attack
Li et al. Robust detection of adversarial attacks on medical images
AU2021103604A4 (en) Soft threshold defense method for adversarial examples of remote sensing images
CN112215251A (en) System and method for defending against attacks using feature dispersion based countermeasure training
KR20170022625A (en) Method for training classifier and detecting object
Li et al. Deep unsupervised anomaly detection
Terhörst et al. Suppressing gender and age in face templates using incremental variable elimination
KR102028825B1 (en) Apparatus for processing watermarking using artificial neural network which identifies watermarking attack, method thereof and computer recordable medium storing program to perform the method
WO2016086330A1 (en) A method and a system for face recognition
Gu et al. Detecting Adversarial Examples in Deep Neural Networks using Normalizing Filters.
CN116250020A (en) Detecting an antagonism example using a potential neighborhood graph
Chen et al. Lie to me: A soft threshold defense method for adversarial examples of remote sensing images
CN113283590A (en) Defense method for backdoor attack
US20230306107A1 (en) A Method of Training a Submodule and Preventing Capture of an AI Module
Tolciu et al. Analysis of patterns and similarities in service tickets using natural language processing
CN111340057A (en) Classification model training method and device
CN112926574A (en) Image recognition method, image recognition device and image recognition system
KR102028824B1 (en) Apparatus for processing watermarking using artificial neural network which identifies objects, method thereof and computer recordable medium storing program to perform the method
Vardhan et al. ExAD: An ensemble approach for explanation-based adversarial detection
US20230376752A1 (en) A Method of Training a Submodule and Preventing Capture of an AI Module

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry