CN115861695A - Backdoor attack method, device and medium based on space transformation - Google Patents

Backdoor attack method, device and medium based on space transformation

Info

Publication number
CN115861695A
CN115861695A (application CN202211539281.9A)
Authority
CN
China
Prior art keywords
spatial transformation
data set
backdoor
samples
attack
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211539281.9A
Other languages
Chinese (zh)
Inventor
Shutao Xia (夏树涛)
Tong Xu (徐彤)
Yiming Li (李一鸣)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University
Priority to CN202211539281.9A
Publication of CN115861695A
Legal status: Pending (Current)

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a backdoor attack method, device and medium based on spatial transformation. The method comprises the following steps: randomly selecting part of the image samples from an original data set, applying a spatial transformation with set parameters to them and changing their labels to the target label, and applying spatial transformations with random parameters to the remaining benign image samples in the original data set while keeping their labels unchanged, so that the original data set is processed into a poisoned data set; performing standard training of a deep learning classification model with the poisoned data set to construct a victim model implanted with a hidden backdoor. When the victim model performs classification prediction, samples in the data set to be classified that have undergone the spatial transformation with the set parameters activate the hidden backdoor, so that these samples are wrongly predicted as the target label, while the remaining samples are correctly predicted as their real labels.

Description

Backdoor attack method, device and medium based on space transformation
Technical Field
The invention relates to the technical field of computer vision and artificial intelligence, in particular to a backdoor attack method and device based on spatial transformation and a related computer storage medium.
Background
Deep neural networks have been widely used in the field of computer vision, for example in image detection, face recognition, and autonomous driving. Their success relies heavily on large amounts of training data and powerful computing resources, but not all researchers or developers can produce their own data sets or own sufficient computing power. To reduce costs, users often turn to third-party resources, such as third-party data sets provided by companies or organizations, or outsource their training process to third-party computing platforms. When these resources are used, the training and inference processes of the model are not fully transparent to the user, which introduces potential security risks such as data poisoning, adversarial attacks, and backdoor attacks. Adversarial attacks explore the vulnerability of deep neural networks in the inference stage, but compared with inference, the training stage is more complex and covers a wider range of steps, including data collection, data preprocessing, model selection and construction, training, model storage, and model deployment. More steps mean more opportunities to be attacked and a greater security threat; therefore, backdoor attacks, a security risk arising in the training stage of the model, have gradually attracted attention from the academic community.
A backdoor attack is an attack against a deep learning model in which the attacker implants a backdoor into the model in some way during training. When the backdoor is not triggered, the attacked model performs similarly to a normal model; when the backdoor is activated, the model outputs a target label specified in advance by the attacker, achieving the attacker's malicious goal. Backdoor attacks can occur in many scenarios where the training process is not fully controlled, such as using a third-party data set, training on a third-party platform, or directly invoking a third-party model, and therefore pose a significant threat to model security.
Backdoor attackers generally pursue three main goals: effectiveness, concealment, and robustness. Effectiveness means that when a test image contains the backdoor trigger, the attacked deep learning network predicts the target label, while the network's prediction performance on benign samples (containing no trigger) is not significantly reduced. Concealment means that the backdoor trigger is relatively inconspicuous and not easily noticed by the user, and that a high attack success rate can still be achieved when the proportion of poisoned samples (i.e., the sample poisoning rate) is relatively small. Robustness means that the backdoor attack remains effective under common backdoor defenses.
Most common backdoor attacks add an extra trigger, such as a pixel patch, additive noise, or a physical object, to the original image. "BadNets: Evaluating Backdooring Attacks on Deep Neural Networks", proposed by Tianyu Gu, Kang Liu et al., is a pioneering work in the field of backdoor attacks: a 3 × 3 white pixel block is pasted onto the lower right corner of a benign image as the backdoor trigger and its label is modified, thereby realizing data poisoning. This is referred to herein as "method 1".
"Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning", by Xinyun Chen, Chang Liu et al., was the first to discuss the visibility of the backdoor trigger and its significant impact on the concealment of backdoor attacks: to better deceive the user, the poisoned image should be difficult to distinguish from a benign image. The method uses a blending strategy to construct a "stealthy" attack, fusing the backdoor trigger with the benign image by adjusting its transparency. This is referred to herein as "method 2".
Yuezun Li, invitible background attack with sample-specific triggerers, et al, suggested by Baoyuan Wu, gets inspiration from image steganography, uses a pre-trained encoder to convert a representative string of a target tag into Invisible additional noise, and embeds it into an image as a trigger, while minimizing the perceptual difference between the input and the encoded image, such that the generated trigger is sample-specific, breaking the basic assumption that the existing backdoor defense trigger is sample-agnostic, and thus can easily bypass most defenses, with great success. Referred to herein as "method 3".
However, all three of the above backdoor attack methods require an additional trigger to be introduced into the original image to complete the data poisoning. In addition, most existing backdoor attacks are designed for digital scenarios and are far removed from the real physical world.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a backdoor attack method based on spatial transformation to solve the problems that existing backdoor attacks require the introduction of an additional trigger and are far removed from the real physical world.
The invention provides the following technical scheme for solving the problems:
a backdoor attack method based on spatial transformation comprises the following steps: randomly selecting partial image samples from an original data set to perform parameter setting spatial transformation, changing labels of the partial image samples into target labels, and performing random parameter spatial transformation on the remaining benign image samples in the original data set under the condition of keeping the labels unchanged, so that the original data set is processed into a poisoning data set; performing standard training of a deep learning classification model by using the poisoning data set to construct a victim model implanted into a hidden backdoor; when the victim model is subjected to classification prediction, the hidden backdoor of the victim model can be activated by the samples subjected to the spatial transformation of the set parameters in the data set to be classified, so that the samples subjected to the spatial transformation of the set parameters in the data set to be classified are wrongly predicted as target labels, and the rest of the samples can be correctly predicted as real labels.
Further, the spatial transformation comprises a rotation or a translation.
Further, the spatial transformation with set parameters comprises rotating by a set angle or translating by a set distance; the spatial transformation with random parameters comprises rotating by a random angle or translating by a random distance.
Further, rotating by a random angle means rotating randomly within an angle range that does not include the set angle; translating by a random distance means translating randomly within a distance range that does not include the set distance.
In order to solve the foregoing problems, the present invention further provides a backdoor attack apparatus based on spatial transformation, comprising: a poisoning image generator for applying a spatial transformation with set parameters to part of the image samples randomly selected from an original data set and changing their labels to the target label to obtain a poisoned subset; a benign sample processor for applying spatial transformations with random parameters to the remaining benign image samples in the original data set while keeping their labels unchanged to obtain a benign subset; and a standard training module for performing standard training of a deep learning classification model with the poisoned data set formed by the poisoned subset and the benign subset, constructing a victim model implanted with a hidden backdoor, so that the victim model has the following characteristic: when the victim model performs classification prediction, samples in the data set to be classified that have undergone the spatial transformation with the set parameters activate the hidden backdoor, so that these samples are wrongly predicted as the target label, while the remaining samples are correctly predicted as their real labels.
To solve the foregoing problems, the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the backdoor attack method based on spatial transformation.
The invention has the following beneficial effects. The backdoor attack method based on spatial transformation does not require any additional backdoor trigger: a spatial transformation that naturally exists in the physical world is used as the backdoor trigger, and poisoned data are made by directly applying the spatial transformation to the original samples. At the same time, spatial transformations with random parameters are applied to the remaining benign samples in the original data set while keeping their labels unchanged, so that only the spatial transformation with the parameters set by the attacker can activate the backdoor hidden in the victim model. The backdoor attack carried out in this way is therefore both concealed and efficient: it is concealed because spatial transformations of images occur naturally and can escape the user's inspection, and it is efficient because the original samples can be processed directly without introducing an additional backdoor trigger. Many backdoor defenses are bypassed naturally, which greatly increases the threat of backdoor attacks and opens a new line of research for the development of artificial intelligence security.
Drawings
Fig. 1 is a schematic flow chart of a backdoor attack method based on spatial transformation according to an embodiment of the present invention.
FIG. 2 is a flow chart illustrating the construction of the poisoned data set according to an embodiment of the present invention.
FIG. 3 shows how the attack metrics change with different rotation angles used as the backdoor trigger, according to an embodiment of the present invention.
FIG. 4 shows how the attack metrics change with different translation distances used as the backdoor trigger, according to an embodiment of the present invention.
Fig. 5 shows the backdoor triggers synthesized by the Neural Cleanse method.
FIG. 6 shows Grad-CAM visualizations of samples poisoned by three different methods (method 1, invention I, invention II).
FIG. 7 illustrates the resistance of the backdoor attack to the fine-tuning defense method according to an embodiment of the present invention.
Fig. 8 illustrates the resistance of the backdoor attack proposed by the invention to the pruning defense method.
Fig. 9 shows prediction results on real photographed traffic signs.
Detailed Description
The invention is further described with reference to the following figures and detailed description of embodiments.
Research has shown that backdoor attacks are easily affected by spatial transformations (such as flipping and scaling), because a spatial transformation changes the position or appearance of the trigger in the poisoned image, and a discrepancy between the backdoor trigger at training time and at test time reduces the attack success rate. This is why existing backdoor attacks have limited impact on the real physical world: when images are captured and used in the physical world, spatial transformations are hard to avoid because the distance and angle between the camera and the target object change, so the backdoor trigger is altered and the backdoor attack fails. The invention exploits this vulnerability: it abandons the idea, used by existing backdoor attack methods, of introducing an additional backdoor trigger, and directly uses a spatial transformation as the backdoor trigger to design a more effective and less noticeable backdoor, because spatial transformations are ubiquitous in the physical world.
In view of this, the embodiment of the present invention provides a back door attack method based on spatial transformation, and a flow of the method is shown in fig. 1, and the method mainly includes three stages of attack, training, and testing.
The attack stage mainly completes the construction of the poisoned data set and, with reference to fig. 2, specifically includes: randomly selecting part of the image samples (subset Ds) from the original data set Do, applying the spatial transformation with set parameters to them, and changing their labels to the target label to obtain the poisoned subset Dm; the remaining benign image samples (Do-Ds) in the original data set are also spatially transformed, but with random parameters and with their labels kept unchanged, resulting in the benign subset Dt. Thus, the original data set Do is processed into the poisoned data set Dp, satisfying:
Dp = Dm ∪ Dt
The ratio of the number of randomly selected image samples that undergo the spatial transformation with set parameters to the size of the original data set is called the sample poisoning rate, and its value is set according to actual needs. A larger value means that a greater proportion of samples are poisoned, so the poisoned samples are more numerous and easier for the user to discover; in practice the sample poisoning rate should therefore not be too large.
Unlike existing methods, which generate a poisoned image by directly adding pixel blocks or changing pixel values, the embodiment of the invention generates poisoned samples with a spatial transformation with set parameters, a transformation that can also occur in the physical world. The embodiment mainly considers two classical spatial transformations, rotation and translation, and a set rotation angle or a set translation distance is chosen when applying the spatial transformation with set parameters. For example, the selected samples are rotated by 16°, while the remaining samples are rotated by random angles drawn from a range that does not include the set angle of 16°, such as [-10°, 10°].
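As a concrete illustration of this attack stage, the following Python sketch builds a poisoned data set with rotation as the spatial transformation. It assumes torchvision is available; the function and parameter names (make_poisoned_dataset, poison_rate, and so on) are illustrative assumptions and do not appear in the patent.

```python
# Minimal sketch of the attack stage: rotate a small, randomly chosen fraction of
# samples by the attacker-set angle and relabel them, rotate the rest by random
# angles from a range that excludes the set angle. Names are illustrative.
import random
import torchvision.transforms.functional as TF


def make_poisoned_dataset(dataset, target_label, poison_rate=0.05,
                          set_angle=16.0, random_range=(-10.0, 10.0)):
    """Return a list of (image, label) pairs forming the poisoned set Dp.

    `dataset` is assumed to yield (PIL image or image tensor, int label) pairs,
    e.g. a torchvision GTSRB/ImageFolder dataset.
    """
    n = len(dataset)
    poisoned_idx = set(random.sample(range(n), int(poison_rate * n)))

    poisoned_set = []
    for i in range(n):
        img, label = dataset[i]
        if i in poisoned_idx:
            # Dm: rotate by the attacker-chosen set angle and relabel to the target class.
            img = TF.rotate(img, angle=set_angle)
            label = target_label
        else:
            # Dt: rotate by a random angle from a range excluding the set angle,
            # keeping the original label.
            img = TF.rotate(img, angle=random.uniform(*random_range))
        poisoned_set.append((img, label))
    return poisoned_set
```

A translation trigger could be sketched the same way by replacing the rotation with an affine translation of a set number of pixels.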
In the training phase, standard training of a deep learning classification model is performed with the generated poisoned data set, and a victim model implanted with a hidden backdoor is constructed. The standard training process comprises: initializing the neural network, performing forward propagation to obtain predictions, computing the loss and back-propagating the error, and loading the optimizer to update the parameters. The learning rate is set to 0.01, and the batch size and the number of epochs are set to 128 and 30, respectively. The loss function is the cross-entropy loss (CrossEntropyLoss) commonly used for classification problems:
loss(x, class) = -log( exp(x[class]) / Σ_j exp(x[j]) ) = -x[class] + log( Σ_j exp(x[j]) )
where x is a vector whose dimension equals the number of classes, each entry being the model's predicted score for one class, and class is the index of the label class. Stochastic gradient descent (SGD) is used as the optimizer to update the model parameters from the gradient information until the set number of iterations is reached.
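The standard training stage described above can be sketched as follows, using the stated hyperparameters (learning rate 0.01, batch size 128, 30 epochs, SGD, cross-entropy loss) with a ResNet-18. It assumes `poisoned_set` is a list of (fixed-size image tensor, int label) pairs such as the one built in the attack-stage sketch above; all variable names are illustrative.

```python
# Minimal sketch of the standard-training stage on the poisoned data set.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision.models import resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18(num_classes=43).to(device)          # 43 GTSRB classes

loader = DataLoader(poisoned_set, batch_size=128, shuffle=True)
criterion = nn.CrossEntropyLoss()                     # cross-entropy loss as in the description
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(30):                               # 30 epochs as in the description
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)       # forward pass + loss
        loss.backward()                               # back-propagate the error
        optimizer.step()                              # SGD parameter update
```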
In the testing stage, the user performs classification prediction on the test data set with the trained victim model. Images in the test data set that have undergone the spatial transformation with the set parameters activate the backdoor hidden in the victim model, so that the model's prediction is misled to the target label, while samples transformed with other, random parameters are still correctly predicted as their original real labels.
In the backdoor attack method based on spatial transformation provided by the embodiment of the invention, one possible implementation environment is:
Operating system: Linux Ubuntu 18.04.3 LTS;
Programming language: Python 3.8;
Deep learning framework: PyTorch 1.8.0;
Attack scenario: scenario 1, in which the user employs a third-party data set.
Backdoor attack threats generally arise in three practical scenarios, with different scenarios corresponding to different capabilities and authorities of the attacker and the defender. (1) Scenario 1: the user employs a third-party data set. In this scenario, the attacker provides the poisoned data set to the user directly or through a network, and the user trains and deploys a model on the contaminated data set. The attacker can therefore only manipulate the data set and cannot modify the model, the training flow, or the inference flow. (2) Scenario 2: the user employs a third-party training platform. In this scenario, the user provides a benign data set, the model structure, and the training procedure to an untrusted third-party platform and borrows its computing resources to train the model. The attacker (i.e., the malicious platform) performs some tampering and poisoning during the actual training, but does not change the model structure, since doing so would very easily attract the user's attention. (3) Scenario 3: the user employs a third-party model. In this scenario, the user obtains a pre-trained model through an application programming interface (API) or a network download and uses it for inference; here the attacker can change everything except the inference flow. Across these three attack scenarios, the attacker's capability increases and more and more of the pipeline can be changed, and accordingly the likelihood of the attack occurring in the real world becomes smaller and smaller. The embodiment of the invention targets scenario (1), which is most likely to occur in actual deployment: the backdoor attacker has the authority to access and modify the training data set, but has no way to control the training and inference processes, and no way to learn other important information such as the model structure or the training loss function.
The effectiveness of the back door attack method based on spatial transformation according to the embodiment of the present invention is verified by a specific example.
(1) Experimental data set and experimental model
Experiments in the embodiment of the invention are all carried out on the GTSRB data set with a ResNet-18 network. The data set contains 43 classes of common traffic signs, comprising 39,209 training images and 12,630 test images with varying lighting conditions and rich, diverse backgrounds.
(2) Basic experimental settings
In the experiments, the sample poisoning rate is set to 5% and the target label is set to 1. When rotation is used as the spatial transformation, the spatial transformation with set parameters used by the attacker is a counterclockwise rotation of 16°, and the angle range for the spatial transformation with random parameters is [-10°, 10°] (this configuration is referred to as "invention I" below). When translation is used as the spatial transformation, the spatial transformation with set parameters used by the attacker is a translation of 6 pixels to the right, and the range of the random translation distance is [-3, 3] pixels (this configuration is referred to as "invention II" below). The effectiveness of the method is evaluated with the attack success rate (ASR) and the benign accuracy (BA). The attack success rate is the proportion of poisoned samples that the victim model predicts as the target label, and the benign accuracy is the prediction accuracy on the benign samples in the poisoned data set. The higher the ASR and BA, the better the attack effect of the method.
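A minimal sketch of how ASR and BA could be measured under these definitions is shown below, again with rotation as the trigger. The helper names, and the convention of excluding samples already belonging to the target class from the ASR computation, are assumptions for illustration; `model` is the trained victim model and `test_set` yields (image tensor, int label) pairs.

```python
# Minimal sketch of the two evaluation metrics: attack success rate (ASR) and
# benign accuracy (BA). All names are illustrative assumptions.
import random
import torch
import torchvision.transforms.functional as TF


@torch.no_grad()
def attack_success_rate(model, test_set, target_label, set_angle=16.0, device="cpu"):
    """Fraction of trigger-transformed test images predicted as the target label."""
    model.eval()
    hits, total = 0, 0
    for img, label in test_set:
        if label == target_label:          # skip samples already in the target class
            continue
        x = TF.rotate(img, angle=set_angle).unsqueeze(0).to(device)
        hits += int(model(x).argmax(dim=1).item() == target_label)
        total += 1
    return hits / max(total, 1)


@torch.no_grad()
def benign_accuracy(model, test_set, random_range=(-10.0, 10.0), device="cpu"):
    """Accuracy on benign samples transformed with random (non-trigger) parameters."""
    model.eval()
    correct = 0
    for img, label in test_set:
        x = TF.rotate(img, angle=random.uniform(*random_range)).unsqueeze(0).to(device)
        correct += int(model(x).argmax(dim=1).item() == label)
    return correct / len(test_set)
```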
(3) Feasibility and effectiveness verification of the backdoor attack based on spatial transformation
TABLE 1 Effect of different backdoor attack methods
(Table 1 is reproduced as an image in the original publication.)
CA in table 1 refers to the prediction accuracy on the normal, non-poisoned data set. "Method 4" is the backdoor attack method in "Backdoor Attack in the Physical World" by Yiming Li, Tongqing Zhai et al. As the experimental results in table 1 show, like the other classical backdoor attack methods, the two spatial transformations proposed in the invention (invention I and invention II) reduce the prediction accuracy on benign samples (non-poisoned normal samples without a trigger) by only 0.19% (97.51-97.32 = 0.19) and 0.74% (97.51-96.77 = 0.74), respectively, while the attack success rate exceeds 99%, which is a very good attack effect. In general, poisoning reduces BA relative to CA, but the goal of a backdoor attack is that poisoned samples are predicted wrongly while benign samples keep their original predictions; that is, the smaller the drop of BA relative to CA, the better. Here, the drop for both configurations used in the experiments is kept within 1%, which shows that the backdoor attack is very effective.
(4) Effect of spatial transformation parameters on the results
To further demonstrate the effectiveness of spatial transformation as a backdoor trigger, the embodiment of the invention also examines whether different spatial transformation scales (i.e., rotation angles or translation distances) maintain high performance. As shown in FIG. 3 and FIG. 4, the backdoor attack is effective over essentially the whole range of rotation angles and translation distances tested, which indicates that the backdoor attack method based on spatial transformation obtains good attack performance under almost any specified spatial transformation scale. However, when the transformation scale (rotation angle or translation distance) is too small, both rotation and translation are difficult for the neural network to recognize and the poisoned samples are treated as outliers, resulting in relatively low benign accuracy and attack success rate, although a certain proportion of samples are still attacked successfully.
(5) Effect of different target labels
TABLE 2 Effect of different target labels on the attack effect
(Table 2 is reproduced as an image in the original publication.)
The experimental setup also examines the choice of the target label, to verify that the backdoor trigger of the invention remains effective for different target labels. To ensure that, at the same sample poisoning rate, the proportion of poisoned samples within the target class is as small as possible, the three classes with the most samples in the GTSRB data set (classes "1", "2", and "12") are selected for discussion. As shown in table 2, although the performance of the two transformations fluctuates under different target labels, both maintain excellent performance, again verifying the generality of the backdoor trigger proposed in the invention.
(6) Robustness verification of the present invention against existing defenses
Robustness is also an important criterion for backdoor attacks, i.e., whether the backdoor trigger used in the invention can resist typical backdoor defenses. For better comparison and a more intuitive demonstration, the backdoor defense experiments are performed not only for invention I and invention II but also for method 1.
A. Defense based on trigger synthesis
The papers by Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao, "Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks", IEEE S&P, 2019 (denoted the Neural Cleanse method), and by Edward Chou, Florian Tramèr, and Giancarlo Pellegrino, "SentiNet: Detecting Localized Universal Attacks Against Deep Learning Systems", IEEE S&P Workshops, 2020 (denoted the SentiNet method), are both defenses based on trigger synthesis. Fig. 5 shows the backdoor triggers synthesized by the Neural Cleanse method. The defense idea of Neural Cleanse is to synthesize the trigger used for the backdoor attack; the trigger used in method 1 is the 3 × 3 white pixel block in the lower right corner, while the triggers of invention I and invention II are rotation and translation, respectively. As shown in fig. 5, for method 1 the synthesized trigger is similar to the real trigger (i.e., the white patch in the lower right corner), but the trigger synthesized for the invention is meaningless, which indicates that this backdoor defense can successfully defend against method 1 but fails against the attack proposed by the invention.
FIG. 6 shows Grad-CAM visualizations of samples poisoned by three different methods (method 1, invention I, invention II). As shown in fig. 6, the SentiNet method can very accurately locate the trigger of method 1 but cannot detect the area where the trigger of the invention is located, which indicates that the SentiNet method is also ineffective against the backdoor attack of the invention.
B. Defense based on classical model repair
The papers by Yuntao Liu, Yang Xie, and Ankur Srivastava, "Neural Trojans", ICCD, 2017 (denoted the fine-tuning method), and by Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg, "Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks", RAID, 2018 (denoted the pruning method), are both defenses based on classical model repair. The resistance of the backdoor trigger used in the invention to backdoor defenses based on model repair, whose idea is to remove the implanted hidden backdoor from the victim model, was also explored. FIG. 7 illustrates the ability of the backdoor attack to resist the fine-tuning defense method according to an embodiment of the invention. As shown in fig. 7, as the number of iterations increases, the attack success rate of method 1 decreases significantly, while the metrics of the invention are affected much less. Fig. 8 shows the resistance of the proposed backdoor attack to the pruning defense method. As can be seen from fig. 8, when the pruning rate exceeds 30%, the performance of method 1 is greatly affected, whereas the backdoor attack proposed by the invention only degrades significantly when the pruning rate exceeds 80%, again verifying the robustness of the method of the invention.
C. Defense based on advanced model repair
The defenses based on advanced model repair proposed by Pu Zhao, Pin-Yu Chen, Payel Das, Karthikeyan Natesan Ramamurthy, and Xue Lin, "Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness", ICLR, 2020, and by Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, and Xingjun Ma, "Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks", ICLR, 2021, are denoted MCR and NAD, respectively.
TABLE 3 resistance of the backdoor attack proposed by the invention to MCR and NAD
(Table 3 is reproduced as an image in the original publication.)
In addition to classical model repair, the resistance of the proposed backdoor attack to advanced model repair is also explored. As can be seen from table 3, the attack success rate of method 1 drops to a very small value after the two defenses, while the attack success rate of the method of the invention decreases somewhat but remains above 60%.
In conclusion, the backdoor attack provided by the invention has strong robustness.
(7) Verification of the backdoor attack method in the physical world
Most existing backdoor attack methods consider only the digital world, but a backdoor attack that works in a real physical scenario has more practical value. Here it is verified that the method of the invention is effective not only in the digital setting but also in the physical setting. For example, some real traffic signs were photographed at different angles with the camera of an Apple iPhone, as shown in fig. 9. When the victim model from the aforementioned experiments is used to predict the labels of the captured images, all images taken at the specific angle (the set rotation angle; images in the last column) are "erroneously" predicted as the target label (i.e., "speed limit 30"), while images taken at other angles (images in the first to third columns) are predicted as their real labels. Because the last-column images are rotated to around the 16° used in training, the network is "misled" into predicting the corresponding target label "speed limit 30", whereas the images in columns 1-3 are rotated by other angles (not 16°) and are still predicted by the network as their original labels. This shows that verification with real photographed images is also feasible: in the physical world, the backdoor hidden in the victim model can be activated simply by taking a picture.
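A minimal sketch of this physical-world check, classifying photographed signs with the trained victim model, might look as follows; the file paths, input size, and preprocessing pipeline are illustrative assumptions rather than details given in the patent.

```python
# Minimal sketch: classify real photographed traffic signs with the victim model.
import glob
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((32, 32)),   # GTSRB-style input size (assumed)
    transforms.ToTensor(),
])

@torch.no_grad()
def predict_photos(model, photo_dir, device="cpu"):
    model.eval()
    for path in sorted(glob.glob(f"{photo_dir}/*.jpg")):
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
        pred = model(x).argmax(dim=1).item()
        print(f"{path}: predicted class {pred}")
```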
In summary, the method abandons the idea, used in classical backdoor attack methods, of introducing an additional backdoor trigger by adding or modifying pixels, and directly uses a real-world spatial transformation as the backdoor trigger. It can be realized directly on the original samples, is simple and efficient, and offers high concealment, strong robustness, and a strong attack effect. The work also reveals that while spatial transformation can serve as a backdoor defense, disabling a backdoor attack by modifying its trigger, it can equally serve as a backdoor trigger to realize an attack. The experiments make extensive attempts on a benchmark data set (GTSRB) and, through comparison with different backdoor attack methods, demonstrate the effectiveness of the proposed method, which achieves high benign accuracy and an extremely high attack success rate. The factors that influence backdoor attack performance, such as the trigger scale and the choice of target label, are also analyzed separately, showing that the attack remains excellent under different settings. Beyond the digital scenario, the effectiveness of the proposed method is also verified in the real physical world, which contributes to the future deployment of backdoor attacks in real physical scenarios. The idea of the invention can be applied in fields with high security requirements, such as intelligent driving and face recognition, provides a new direction for future work on artificial intelligence security problems such as backdoor attacks and backdoor defenses, and further improves the security of neural network models.
The embodiment of the present invention further provides a backdoor attack apparatus based on spatial transformation, adapted to the backdoor attack method of the foregoing embodiments, comprising: a poisoning image generator for applying a spatial transformation with set parameters to part of the image samples randomly selected from an original data set and changing their labels to the target label to obtain a poisoned subset; a benign sample processor for applying spatial transformations with random parameters to the remaining benign image samples in the original data set while keeping their labels unchanged to obtain a benign subset; and a standard training module for performing standard training of a deep learning classification model with the poisoned data set formed by the poisoned subset and the benign subset, constructing a victim model implanted with a hidden backdoor, so that the victim model has the following characteristic: when the victim model performs classification prediction, samples in the data set to be classified that have undergone the spatial transformation with the set parameters activate the hidden backdoor, so that these samples are wrongly predicted as the target label, while the remaining samples are correctly predicted as their real labels.
Furthermore, another embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, can implement the steps of the back door attack method based on spatial transformation according to the foregoing embodiments.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. It will be apparent to those skilled in the art that various equivalent substitutions and obvious modifications can be made without departing from the spirit of the invention, and all changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (9)

1. A backdoor attack method based on spatial transformation is characterized by comprising the following steps:
randomly selecting part of the image samples from an original data set, performing a spatial transformation with set parameters on them and changing their labels to the target label, and performing spatial transformations with random parameters on the remaining benign image samples in the original data set while keeping their labels unchanged, so that the original data set is processed into a poisoned data set;
performing standard training of a deep learning classification model with the poisoned data set, and constructing a victim model implanted with a hidden backdoor; when the victim model performs classification prediction, samples in the data set to be classified that have undergone the spatial transformation with the set parameters activate the hidden backdoor, so that these samples are wrongly predicted as the target label, while the remaining samples are correctly predicted as their real labels.
2. A backdoor attack method based on spatial transformation as claimed in claim 1, characterized in that: the spatial transformation comprises a rotation or a translation.
3. A backdoor attack method based on spatial transformation as claimed in claim 1, characterized in that: the spatial transformation with set parameters comprises rotating by a set angle or translating by a set distance; the spatial transformation with random parameters comprises rotating by a random angle or translating by a random distance.
4. A backdoor attack method based on spatial transformation as claimed in claim 3, characterized in that: rotating by a random angle means rotating randomly within an angle range that does not include the set angle; translating by a random distance means translating randomly within a distance range that does not include the set distance.
5. A backdoor attack device based on spatial transformation, comprising:
a poisoning image generator for applying a spatial transformation with set parameters to part of the image samples randomly selected from an original data set and changing their labels to the target label to obtain a poisoned subset;
a benign sample processor for applying spatial transformations with random parameters to the remaining benign image samples in the original data set while keeping their labels unchanged to obtain a benign subset;
a standard training module for performing standard training of a deep learning classification model with the poisoned data set formed by the poisoned subset and the benign subset, and constructing a victim model implanted with a hidden backdoor, so that the victim model has the following characteristic: when the victim model performs classification prediction, samples in the data set to be classified that have undergone the spatial transformation with the set parameters activate the hidden backdoor, so that these samples are wrongly predicted as the target label, while the remaining samples are correctly predicted as their real labels.
6. A backdoor attack device based on spatial transformation as claimed in claim 5, wherein: the spatial transformation comprises a rotation or a translation.
7. A backdoor attack device based on spatial transformation as claimed in claim 5, wherein: the spatial transformation with set parameters comprises rotating by a set angle or translating by a set distance; the spatial transformation with random parameters comprises rotating by a random angle or translating by a random distance.
8. A backdoor attack device based on spatial transformation as claimed in claim 7, wherein: rotating by a random angle means rotating randomly within an angle range that does not include the set angle; translating by a random distance means translating randomly within a distance range that does not include the set distance.
9. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the steps of the backdoor attack method based on spatial transformation according to any one of claims 1 to 4.
CN202211539281.9A 2022-12-01 2022-12-01 Backdoor attack method, device and medium based on space transformation Pending CN115861695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211539281.9A CN115861695A (en) 2022-12-01 2022-12-01 Backdoor attack method, device and medium based on space transformation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211539281.9A CN115861695A (en) 2022-12-01 2022-12-01 Backdoor attack method, device and medium based on space transformation

Publications (1)

Publication Number Publication Date
CN115861695A true CN115861695A (en) 2023-03-28

Family

ID=85669374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211539281.9A Pending CN115861695A (en) 2022-12-01 2022-12-01 Backdoor attack method, device and medium based on space transformation

Country Status (1)

Country Link
CN (1) CN115861695A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117473489A (en) * 2023-09-28 2024-01-30 华中科技大学 Back door attack method and defending method


Similar Documents

Publication Publication Date Title
Liu et al. Reflection backdoor: A natural backdoor attack on deep neural networks
Jia et al. Adv-watermark: A novel watermark perturbation for adversarial examples
Zhong et al. Shadows can be dangerous: Stealthy and effective physical-world adversarial attack by natural phenomenon
Li et al. Deep learning backdoors
KR102304661B1 (en) Attack-less Adversarial Training Method for a Robust Adversarial Defense
Byrnes et al. Data hiding with deep learning: A survey unifying digital watermarking and steganography
Bai et al. Hardly perceptible trojan attack against neural networks with bit flips
Wu et al. Just rotate it: Deploying backdoor attacks via rotation transformation
Yang et al. Robust roadside physical adversarial attack against deep learning in lidar perception modules
CN115861695A (en) Backdoor attack method, device and medium based on space transformation
Wang et al. Rethinking the vulnerability of dnn watermarking: Are watermarks robust against naturalness-aware perturbations?
Mirsky IPatch: a remote adversarial patch
Gong et al. B3: Backdoor attacks against black-box machine learning models
Zhan et al. AMGmal: Adaptive mask-guided adversarial attack against malware detection with minimal perturbation
Shi et al. Black-box Backdoor Defense via Zero-shot Image Purification
Wu et al. Attacks in Adversarial Machine Learning: A Systematic Survey from the Life-cycle Perspective
CN115758337A (en) Back door real-time monitoring method based on timing diagram convolutional network, electronic equipment and medium
Dong et al. Mind your heart: Stealthy backdoor attack on dynamic deep neural network in edge computing
CN114021136A (en) Back door attack defense system for artificial intelligence model
Hou et al. M-to-n backdoor paradigm: A stealthy and fuzzy attack to deep learning models
Westbrook et al. Adversarial attacks on machine learning in embedded and iot platforms
Wei et al. A lightweight backdoor defense framework based on image inpainting
CN116543268B (en) Channel enhancement joint transformation-based countermeasure sample generation method and terminal
Zhang et al. Certified defense against patch attacks via mask-guided randomized smoothing
Yu et al. Improving Adversarial Robustness Against Universal Patch Attacks Through Feature Norm Suppressing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination