CN116821897A - Label consistent type back door attack method based on re-parameterized steganography trigger - Google Patents

Label consistent type back door attack method based on re-parameterized steganography trigger

Info

Publication number
CN116821897A
Authority
CN
China
Prior art keywords
image
back door
trigger
poisoning
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310705268.4A
Other languages
Chinese (zh)
Inventor
于菲
王波
王兆宁
杨子
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN202310705268.4A priority Critical patent/CN116821897A/en
Publication of CN116821897A publication Critical patent/CN116821897A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/554Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7747Organisation of the process, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a label-consistent back door attack method based on a re-parameterized steganography trigger. The method generates poisoning images, constructs a training set containing the poisoning images, and uses this training set to guide model training so that the mapping between the back door trigger and the target label is completed and a back door model is obtained. The back door model classifies clean images correctly, but for a poisoning image it converts the target label into a preset label for output, thereby realizing the back door attack while keeping the back door trigger highly concealed.

Description

Label consistent type back door attack method based on re-parameterized steganography trigger
Technical Field
The invention relates to the technical field of model security, in particular to a label-consistent back door attack method based on a re-parameterized steganography trigger.
Background
With the continuous development of artificial intelligence technology, the research results of deep learning are widely applied in fields such as natural language processing, image recognition, signal processing and industrial control. In the era of large models, the training cost of deep neural network models keeps rising, and developers tend to fine-tune and optimize publicly released pre-trained models and datasets rather than train models from scratch. However, such public pre-trained models and datasets are typically issued by untrusted third parties whose security is difficult to guarantee, and they may have back doors implanted in them. Once a back door is implanted, all kinds of applications based on deep neural network technology face serious security risks, which can cause privacy leakage and property loss and even endanger personal safety.
BadNets is the pioneering work in the field of DNN back door attacks. Its paper describes the basic steps of a back door attack: first, triggers are added to normal data to produce poisoning data; then the poisoning data are marked with the target label designated by the attacker; finally, the poisoning data and the normal data are trained together. BadNets successfully attacked datasets such as MNIST. Blend demonstrated that the back door trigger can be set arbitrarily and first put forward the concept of back door trigger concealment, making attack methods aimed at trigger concealment a popular research direction. Liao et al. proposed back door attacks that use invisible adversarially generated perturbations as triggers and employed two methods of generating perturbation back door patterns. Nguyen et al. considered that humans can recognize inconsistent parts of pictures and therefore proposed using small, content-preserving image warping as the trigger to make the poisoning image more realistic and natural. Sarkar et al. successfully implemented an invisible back door attack on a face recognition system using facial attributes or specific expressions as triggers. Recently, Zhang et al. proposed the Poison Ink attack, which takes the image structure as the target poisoning area and fills it with poison ink to generate the trigger.
In addition to the above data-level back door attacks on the training dataset, back door attacks can also be carried out at the model level, for example by directly modifying the model structure or implanting a back door into the weights. Liu et al. proposed the Trojan attack, which assumes that a trigger can activate abnormal behavior in the deep neural network, then generates a universal back door trigger through network inversion, and finally modifies the model to achieve back door implantation. The PoTrojan attack inserts a PoTrojan neuron into each layer of the AlexNet model to realize a back door attack. Rakin et al. proposed a back door attack that modifies weight bits by flipping key weight bits stored in memory. Chen et al. further reduced the number of bit flips required to embed a hidden back door.
At present, most back door attack methods focus on improving the concealment of the back door trigger while ignoring the handling of the key image-label information; that is, most existing back door attack methods rest on the premise of "label modification". Although these attack methods are very effective in terms of attack success rate, they have a fatal disadvantage: the labels of the back door images are obviously wrong. Such clearly erroneous labels are easily detected during manual inspection or a pre-classification filtering step.
In practice, therefore, a truly inconspicuous back door attack should construct a more concealed back door trigger while leaving the image labels unchanged.
Disclosure of Invention
In view of the technical problem that existing back door triggers are poorly concealed, a label-consistent back door attack method based on a re-parameterized steganography trigger is provided. The invention mainly generates poisoning images, constructs a training set containing them, and then uses the training set to guide model training so as to complete the mapping between the back door trigger and the target label and obtain a back door model. The back door model classifies clean images correctly but converts the target label of a poisoning image into a preset label for output, thereby realizing the back door attack while ensuring strong concealment of the back door trigger.
The invention adopts the following technical means:
The invention provides a label-consistent back door attack method based on a re-parameterized steganography trigger, which comprises the following steps:
obtaining a data set comprising a plurality of original images, wherein each original image carries a category label; the original images comprise specific images and clean images, the clean images comprise first clean images and second clean images, and the category label of each specific image is the target label;
adding a back door trigger to the specific image to convert the specific image into a poisoning image, wherein the poisoning image comprises a first poisoning image and a second poisoning image;
the first poisoning image and the first clean image form a training set, and the second poisoning image and the second clean image form a test set;
adopting the training set to train the baseline model to obtain a back door model, and finishing back door attack;
when the second clean image is input to the back door model, the back door model outputs the category label corresponding to the second clean image;
when the second poisoning image is input to the back door model, the back door model converts the category label corresponding to the second poisoning image into a preset label and outputs the preset label.
Further, the adding a back door trigger to the specific image, converting the specific image into a poisoning image, includes:
extracting a first feature vector from the specific image using an encoder network;
performing reparameterization sampling on the first feature vector based on Gumbel-Softmax to obtain a second feature vector;
inputting the second feature vector into a decoder network, and obtaining a reconstructed image by the decoder network according to the second feature vector;
and re-encoding the character information of the back door trigger and the reconstructed image by using a pre-trained encoder-decoder network to form the poisoning image.
Further, the performing re-parameterized sampling on the first feature vector based on Gumbel-Softmax to obtain a second feature vector includes:
realizing sampling from a multinomial distribution with Gumbel-Softmax, which comprises the following steps: generating Gumbel-distributed random numbers with the same dimension as the first feature vector, and adding the Gumbel-distributed random numbers to the corresponding dimensions of the first feature vector;
and smoothing the output of the argmax function applied to the noise-added first feature vector with a Softmax function to obtain the second feature vector.
Further, the smoothing of the argmax output of the first feature vector added with the Gumbel-distributed random numbers with a Softmax function to obtain the second feature vector includes: controlling the degree of smoothing by adjusting the temperature coefficient, and calculating as follows:
f_τ(X)_i = exp(x_i / τ) / Σ_{k=1}^{K} exp(x_k / τ)
where f_τ(X)_i is the Gumbel-Softmax output for the i-th component of the first feature vector, X is the first feature vector, x_k is the value at the k-th position of the first feature vector, K is its dimension, and τ is the temperature coefficient, with τ greater than 0.
Further, the probability density of the Gumbel-distributed random numbers is calculated as follows:
f(x; μ, η) = (1/η)·exp(−(x − μ)/η − exp(−(x − μ)/η))
where x is the random variable, μ is the location parameter, and η is the scale parameter.
Further, the inputting the second feature vector into a decoder network, where the decoder network obtains a reconstructed image according to the second feature vector, includes:
introducing a total loss function as a constraint in the process of the decoder network obtaining the reconstructed image from the second feature vector, calculated as follows:
L_all = α·L_rect + β·L_ssim + γ·L_act
where α is the first weight coefficient, β is the second weight coefficient, γ is the third weight coefficient, L_all is the total loss value, L_rect is the first loss value, L_ssim is the second loss value, and L_act is the third loss value.
Further, the first loss function value is calculated as follows:
L_rect = (1 / (m·n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} [ I(i, j) − K(i, j) ]²
where m×n is the image size, I(i, j) is the true value of the (i, j)-th pixel of the specific image, and K(i, j) is the reconstructed value of the (i, j)-th pixel of the reconstructed image.
Further, the second loss function value is calculated as follows:
L_ssim = 1 − (2·μ_x·μ_y + C_1)(2·σ_xy + C_2) / ((μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2))
where μ_x is the mean of the specific image, μ_y is the mean of the reconstructed image, σ_x is the standard deviation of the specific image, σ_y is the standard deviation of the reconstructed image, σ_xy is the covariance of the specific image and the reconstructed image, C_1 is a first constant, and C_2 is a second constant.
Further, the third loss function value is calculated as follows:
L_act = log(1 + p_j) − λ·min(0, p_j − max_{i≠j}(p_i))
where p_j is the classification confidence of the target label, p_i is the classification confidence of the other categories, and λ is a weight coefficient.
Compared with the prior art, the invention has the following advantages:
1. The label-consistent back door attack method based on the re-parameterized steganography trigger provided by the invention adopts the ideas of image reconstruction and probabilistic sampling; by perturbing the key features of the images, the classification of the poisoning images is made to depend more on the added back door trigger, which solves the problem of keeping the image labels consistent.
2. In the label-consistent back door attack method based on the re-parameterized steganography trigger provided by the invention, re-parameterized noise sampling is carried out on the image features by means of Gumbel-Softmax sampling, so that the image feature vector not only reflects the characteristics of the original image but also carries a probabilistic meaning, and certain key features are converted into a smoother distribution.
3. The label-consistent back door attack method based on the re-parameterized steganography trigger provided by the invention adopts information hiding techniques and introduces least-significant-bit image steganography: the embedding of the back door trigger is completed with a U-Net-style encoder-decoder network, in which the encoder re-encodes the re-parameterized image together with the information to be hidden and the decoder recovers the hidden information from the encoded image, thereby realizing the concealment of the back door trigger.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a framework diagram of the label-consistent back door attack method based on a re-parameterized steganography trigger provided by the invention.
Fig. 2 is a flow chart of a steganography process.
Fig. 3 is a visual comparison of poisoning images generated by different methods.
Fig. 4 is a poisoning-ratio line graph.
Fig. 5 is another poisoning-ratio line graph.
Fig. 6 is a comparison of locating key areas.
Fig. 7 is another comparison of locating key areas.
Fig. 8 is a bar graph of the defense results.
Detailed Description
In order that those skilled in the art will better understand the present invention, a detailed description of embodiments of the present invention will be provided below, with reference to the accompanying drawings, wherein it is apparent that the described embodiments are only some, but not all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, which is a framework diagram of the label-consistent back door attack method based on a re-parameterized steganography trigger provided by the present invention, a specific embodiment of the method includes:
obtaining a data set comprising a plurality of original images, wherein each original image carries a category label; the original images comprise specific images and clean images, the clean images comprise first clean images and second clean images, and the category label of each specific image is the target label;
adding a back door trigger to the specific image to convert the specific image into a poisoning image, wherein the poisoning image comprises a first poisoning image and a second poisoning image;
the first poisoning image and the first clean image form a training set, and the second poisoning image and the second clean image form a testing set;
a back door model is obtained by training the baseline model with the training set, completing the back door attack;
when a second clean image is input to the back door model, the back door model outputs the category label corresponding to the second clean image;
when a second poisoning image is input to the back door model, the back door model converts the category label corresponding to the second poisoning image into a preset label and outputs the preset label.
It can be understood that the back door trigger of the label-consistent back door attack method based on the re-parameterized steganography trigger provided by this embodiment can evade both DNN models and manual inspection, and has high concealment. The first poisoning images and the first clean images form the training set, and the second poisoning images and the second clean images form the test set; that is, the images in the training set and the test set do not overlap. Referring to fig. 1, the back door attack proceeds in three stages. The first stage is the poisoning-image generation stage, in which specific images are converted into poisoning images. The second stage is the training stage, in which the training set containing poisoning images and ordinary images guides the baseline model to train, completing the mapping from the back door trigger to the target label and yielding the back door model. In the third stage, the back door model classifies ordinary images correctly but converts the target label of a poisoning image into the preset label for output. In this way the back door attack is realized while the back door trigger remains highly concealed, the classification of poisoning images depends more on the added back door trigger, and the problem of label consistency is solved. A minimal sketch of this pipeline is given below.
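The stages above can be summarized in the following minimal PyTorch-style sketch. It is illustrative only: the helper generate_poison_image stands in for the re-parameterized reconstruction and steganographic embedding described later, and names such as build_poisoned_training_set, the batch size and the device argument are assumptions made for the sketch rather than details taken from this disclosure; only the 10% poisoning ratio, the SGD optimizer with momentum 0.9 and the cosine-annealing schedule come from the experimental settings reported below.

```python
# Illustrative sketch of the three attack stages (assumed helper names).
import torch
from torch import nn
from torch.utils.data import DataLoader


def build_poisoned_training_set(clean_train, target_label, generate_poison_image,
                                poison_ratio=0.10):
    """Poison a fraction of the target-class images while keeping their labels.

    `clean_train` is an iterable of (image_tensor, label) pairs and
    `generate_poison_image` stands for the re-parameterized reconstruction plus
    steganographic trigger embedding; labels are never changed (label-consistent).
    """
    n_poison = int(len(clean_train) * poison_ratio)
    poisoned, kept, seen = [], [], 0
    for x, y in clean_train:
        if y == target_label and seen < n_poison:
            poisoned.append((generate_poison_image(x), y))  # label stays equal to target_label
            seen += 1
        else:
            kept.append((x, y))
    return kept + poisoned  # a plain list of pairs is accepted by DataLoader


def train_backdoor_model(model, train_set, epochs=100, lr=0.1, device="cpu"):
    """Ordinary supervised training: the trigger-to-target-label mapping is learned
    implicitly because poisoned images keep their original label but have had
    their key features weakened."""
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x.to(device)), y.to(device)).backward()
            opt.step()
        sched.step()
    return model
```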
Referring to figs. 1 to 8 (fig. 2 is a flowchart of the steganography process, fig. 3 is a visual comparison of poisoning images generated by different methods, figs. 4 and 5 are poisoning-ratio line graphs, figs. 6 and 7 are comparisons of locating key areas, and fig. 8 is a bar graph of the defense results), another specific embodiment of the label-consistent back door attack method based on a re-parameterized steganography trigger provided by the present invention includes:
obtaining a data set comprising a plurality of original images, wherein each original image carries a category label; the original images comprise specific images and clean images, the clean images comprise first clean images and second clean images, and the category label of each specific image is the target label;
adding a back door trigger to the specific image to convert the specific image into a poisoning image, wherein the poisoning image comprises a first poisoning image and a second poisoning image;
wherein adding a back door trigger to the specific image converts the specific image into a poisoning image, comprising:
extracting a first feature vector from the specific image by using an encoder network;
performing reparameterization sampling on the first feature vector based on Gumbel-Softmax to obtain a second feature vector, including:
realizing sampling from a multinomial distribution with Gumbel-Softmax, which comprises: generating Gumbel-distributed random numbers with the same dimension as the first feature vector and adding them to the corresponding dimensions of the first feature vector;
smoothing the output of the argmax function applied to the noise-added first feature vector with a Softmax function to obtain the second feature vector, where the degree of smoothing is controlled by adjusting the temperature coefficient and is calculated as follows:
f_τ(X)_i = exp(x_i / τ) / Σ_{k=1}^{K} exp(x_k / τ)
where f_τ(X)_i is the Gumbel-Softmax output for the i-th component of the first feature vector X, x_k is the value at the k-th position of the noise-added first feature vector, K is its dimension, and τ is the temperature coefficient; τ is greater than 0, and the larger τ is, the smoother the generated distribution.
Optionally, the probability density of the Gumbel-distributed random numbers is calculated as follows:
f(x; μ, η) = (1/η)·exp(−(x − μ)/η − exp(−(x − μ)/η))
where x is the random variable, μ is the location parameter and also the mode of the Gumbel distribution, and η is the scale parameter; the variance of the Gumbel distribution is equal to π²·η²/6.
It can be understood that the encoder network extracts the first feature vector from the specific image, and the first feature vector is subjected to re-parameterized sampling to obtain the second feature vector, so that the second feature vector input into the decoder network both reflects the characteristics of the original image and carries a probabilistic meaning; certain key features are converted from a near one-hot form into a smoother distribution, thereby weakening those key features. A minimal sketch of this sampling step is given below.
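As a concrete illustration of the sampling step just described, the sketch below (an assumed minimal implementation, not the patented encoder itself) adds Gumbel(0, 1) noise to an encoder feature vector and smooths it with a temperature-scaled Softmax; PyTorch also provides an equivalent built-in, torch.nn.functional.gumbel_softmax.

```python
import torch
import torch.nn.functional as F


def gumbel_softmax_reparameterize(feat, tau=1.0, eps=1e-20):
    """Re-parameterized sampling of a feature vector.

    Gumbel(0, 1) noise of the same shape as `feat` is added element-wise, then a
    temperature-scaled Softmax smooths the implicit argmax into a differentiable
    distribution. `tau` > 0 controls the smoothness: the larger `tau`, the
    smoother (flatter) the resulting distribution.
    """
    u = torch.rand_like(feat)                               # U(0, 1) samples
    gumbel_noise = -torch.log(-torch.log(u + eps) + eps)    # Gumbel(0, 1) samples
    return F.softmax((feat + gumbel_noise) / tau, dim=-1)   # the second feature vector


# Example: second_feature = gumbel_softmax_reparameterize(encoder_output, tau=0.5)
```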
Inputting the second feature vector into a decoder network, the decoder network deriving a reconstructed image from the second feature vector, comprising:
in the process of the decoder network obtaining the reconstructed image from the second feature vector, a total loss function is introduced as a constraint, calculated as follows:
L_all = α·L_rect + β·L_ssim + γ·L_act
where α is the first weight coefficient, β is the second weight coefficient, γ is the third weight coefficient, L_all is the total loss value, L_rect is the first loss value, L_ssim is the second loss value, and L_act is the third loss value; α, β and γ default to 1.
Optionally, the mean square error loss, i.e. the first loss function value, is obtained by squaring the element-wise pixel differences between the reconstructed image and the specific image and then averaging, as follows:
L_rect = (1 / (m·n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} [ I(i, j) − K(i, j) ]²
where m×n is the image size, I(i, j) is the true value of the (i, j)-th pixel of the specific image, and K(i, j) is the reconstructed value of the (i, j)-th pixel of the reconstructed image.
Optionally, according to neuroscience studies, humans rely more on structural similarity when judging the distance between two images, so a second loss function value is introduced, calculated as follows:
L_ssim = 1 − (2·μ_x·μ_y + C_1)(2·σ_xy + C_2) / ((μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2))
where μ_x is the mean of the specific image, μ_y is the mean of the reconstructed image, σ_x is the standard deviation of the specific image, σ_y is the standard deviation of the reconstructed image, σ_xy is the covariance of the specific image and the reconstructed image, and C_1 and C_2 are constants used to stabilize the calculation.
Optionally, an activation loss L_act is introduced in order to weaken key features while ensuring that the reconstructed image is still classified correctly. First, the classification confidence of the reconstructed image for the class label of the specific image is obtained; the activation loss then lowers this confidence while keeping it the maximum among all classes. The activation loss value, i.e. the third loss function value, is calculated as follows:
L_act = log(1 + p_j) − λ·min(0, p_j − max_{i≠j}(p_i))
where p_j is the classification confidence of the target label, p_i is the classification confidence of the other categories, and λ is a weight coefficient, defaulting to 1. A combined sketch of the three loss terms is given after this paragraph.
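The three loss terms can be combined as in the following sketch. It assumes an external SSIM routine (for example the ssim function from the pytorch-msssim package), images normalized to [0, 1], and a hypothetical logits tensor produced by the classifier for the reconstructed images; the function and argument names are illustrative, and the weights default to 1 as stated above.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # assumed third-party SSIM implementation


def reconstruction_total_loss(recon, original, logits, target_label,
                              alpha=1.0, beta=1.0, gamma=1.0, lam=1.0):
    """L_all = alpha*L_rect + beta*L_ssim + gamma*L_act (weights default to 1).

    `recon` and `original` are (N, C, H, W) tensors in [0, 1]; `logits` are the
    classifier outputs for `recon`; `target_label` is the target-class index j.
    """
    # L_rect: mean squared error between reconstruction and original image.
    l_rect = F.mse_loss(recon, original)

    # L_ssim: structural-similarity term, written as 1 - SSIM so that
    # minimizing it drives the SSIM between the two images towards 1.
    l_ssim = 1.0 - ssim(recon, original, data_range=1.0)

    # L_act: lower the target-class confidence p_j while keeping it the maximum,
    # i.e. log(1 + p_j) - lam * min(0, p_j - max_{i != j} p_i).
    p = F.softmax(logits, dim=-1)
    p_j = p[:, target_label]
    p_others = p.clone()
    p_others[:, target_label] = -1.0          # exclude the target class from the max
    p_i_max = p_others.max(dim=-1).values
    l_act = (torch.log(1.0 + p_j) - lam * torch.clamp(p_j - p_i_max, max=0.0)).mean()

    return alpha * l_rect + beta * l_ssim + gamma * l_act
```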
The character information of the back door trigger and the reconstructed image are then re-encoded by a pre-trained encoder-decoder network to form the poisoning image.
It will be appreciated that, referring to fig. 2, concealing the character information of the back door trigger in the reconstructed image can be implemented with least significant bit steganography (Least Significant Bit Steganography, abbreviated LSB steganography), which replaces the lowest binary bit of each RGB color component of the image with a bit of the data to be concealed. Because the LSB algorithm only modifies the least significant bits of the image, the resulting color change is negligible and the human eye does not notice the difference before and after embedding, which is why LSB steganography is one of the most popular algorithms for embedding hidden data into an image without being discovered. A bit-level sketch of this operation is given below.
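The bit-level operation that LSB steganography performs can be sketched as follows. Note that in this method the actual embedding is carried out by the trained U-Net-style encoder-decoder network described next; the NumPy sketch below only illustrates the plain least-significant-bit idea the method builds on, and the function names are assumptions.

```python
import numpy as np


def lsb_embed(image_rgb, message):
    """Hide a text trigger in the least significant bits of an 8-bit RGB image.

    Each message bit replaces the lowest bit of one color byte, so every channel
    value changes by at most 1 out of 255, which is imperceptible to the eye.
    """
    flat = image_rgb.astype(np.uint8).flatten()             # flatten() returns a copy
    bits = np.unpackbits(np.frombuffer(message.encode("utf-8"), dtype=np.uint8))
    if bits.size > flat.size:
        raise ValueError("message too long for this image")
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits     # overwrite the LSBs
    return flat.reshape(image_rgb.shape)


def lsb_extract(stego_rgb, n_chars):
    """Recover an n_chars-byte message from the least significant bits."""
    bits = stego_rgb.astype(np.uint8).flatten()[: n_chars * 8] & 1
    return np.packbits(bits).tobytes().decode("utf-8", errors="ignore")
```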
Specifically, the encoder-decoder network uses a U-Net-style architecture; the encoder and decoder are trained simultaneously on a training set containing only normal images, and the encoder hides the character information of the back door trigger in the reconstructed image to produce the poisoning image. Ideally there is no perceptible difference between the poisoning image and the corresponding specific image, and the decoder can recover the hidden information from the poisoning image. As shown on the right side of the poisoning-image generation stage in fig. 1, the encoder re-encodes the re-parameterized reconstructed image together with the character information of the back door trigger to be hidden, generating the poisoning image. The character information of the back door trigger, i.e. the text trigger, can be designed arbitrarily: it can be the target label, a proper noun, or a random meaningless character string, which is not particularly limited in this embodiment. This step completes the construction of the poisoning image and at the same time realizes the concealment of the back door trigger.
The first poisoning image and the first clean image form a training set, and the second poisoning image and the second clean image form a testing set;
a back door model is obtained by training the baseline model with the training set, completing the back door attack;
when a second clean image is input to the back door model, the back door model outputs the category label corresponding to the second clean image;
when a second poisoning image is input to the back door model, the back door model converts the category label corresponding to the second poisoning image into a preset label and outputs the preset label.
It should be noted that the attack effect of the label-consistent back door attack method based on the re-parameterized steganography trigger provided in this embodiment is verified through the following experiments:
S1: setting up the experiments. The MNIST dataset and the CIFAR10 dataset are used for testing. The MNIST dataset comprises 60,000 training images and 10,000 test images divided into 10 category labels; the CIFAR10 dataset consists of 60,000 32×32 color images divided into 10 category labels, containing 50,000 training images and 10,000 test images. The encoder network and the decoder network are existing networks widely used in image reconstruction tasks; the encoder of the encoder-decoder network is a U-Net-style DNN, and its decoder uses a spatial transformer network. The baseline model, i.e. the image classification model, can be any popular network structure such as ResNet18, ResNet50, DenseNet121 or GoogLeNet, and is not limited thereto. ResNet18 is adopted by default in the experiments of this embodiment, and the other three network structures are used to demonstrate the generalization of the method. The evaluation indexes of the experiments include effectiveness indexes and concealment indexes. The effectiveness indexes are clean data accuracy (Clean Data Accuracy, CDA) and attack success rate (Attack Success Rate, ASR): CDA is the probability that normal images without the back door trigger are correctly predicted as their true category labels, and ASR is the probability that poisoning images carrying the back door trigger are predicted as the preset label designated by the attacker. For a successful back door model, the ASR should remain high while the CDA stays close to that of a clean model. The concealment indexes are local similarity (Peak Signal-to-Noise Ratio, PSNR) and global similarity (Structural Similarity, SSIM): the larger the PSNR and the closer the SSIM is to 1, the better the concealment of the back door attack.
Finally, the default poisoning ratio is 10%, and all victim classifier models use the SGD optimizer with momentum 0.9. For the MNIST dataset the initial learning rate is 1e-5, and for the CIFAR10 dataset it is 0.1; the learning rate follows a cosine annealing schedule with T_max = 100. These default settings are used throughout the experiments when comparing with other methods. A sketch of how the evaluation metrics can be computed is given below.
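The two effectiveness metrics can be computed as in the sketch below, which is an assumed evaluation loop rather than code from this disclosure; preset_label denotes the label designated by the attacker.

```python
import torch


@torch.no_grad()
def evaluate_cda_asr(model, clean_loader, poison_loader, preset_label, device="cpu"):
    """Clean Data Accuracy (CDA) and Attack Success Rate (ASR).

    CDA: fraction of clean test images predicted as their true labels.
    ASR: fraction of poisoning test images predicted as the preset label.
    """
    model.to(device).eval()
    correct = total = hits = n_poison = 0
    for x, y in clean_loader:
        pred = model(x.to(device)).argmax(dim=1).cpu()
        correct += (pred == y).sum().item()
        total += y.numel()
    for x, _ in poison_loader:
        pred = model(x.to(device)).argmax(dim=1).cpu()
        hits += (pred == preset_label).sum().item()
        n_poison += pred.numel()
    return correct / total, hits / n_poison
```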
S2: testing the effectiveness of the attack. Five classical back door attacks are selected for comparison, namely BadNets, Blend, Poison Ink, SIG and CL, grouped according to whether the labels are consistent; Ours denotes the method provided by the invention. Referring to tables 1 and 2, it can be seen that among label-consistent back door attacks, the method of this embodiment achieves a remarkable improvement in attack success rate (ASR); compared with label-inconsistent back door attacks, it also achieves comparable attack results. Furthermore, the clean data accuracy (CDA) achieves the best result on the CIFAR10 dataset. Barni et al. proved experimentally that, compared with label-inconsistent back door attacks, a label-consistent back door attack must corrupt more samples to succeed, which fully illustrates the difficulty of back door attacks under the clean-label setting; achieving results no worse than label-inconsistent back door attacks is therefore an important breakthrough of the method provided by this embodiment.
Table 1. Experimental results of attack effectiveness on the MNIST dataset (%)
Table 2. Experimental results of attack effectiveness on the CIFAR10 dataset (%)
S3: testing the concealment of the attack. Fig. 3 visually presents the poisoning images generated by the different attack methods; it can be seen that the concealment of the poisoning images generated by BadNets, Blend, SIG and CL is poor, while Poison Ink achieves the best concealment. Besides the visual effect, this embodiment also provides objective measurements of concealment, shown in tables 3 and 4. On the MNIST dataset, the method herein achieves the best results on both global similarity (SSIM) and local similarity (PSNR); on the CIFAR10 dataset the results are slightly inferior. A possible reason is that the re-parameterized sampling introduces deviations in color space that do not show up on the black-and-white MNIST dataset but are exposed on the color-rich CIFAR10 dataset. Nevertheless, referring to the visual results in fig. 3, because the embedded back door trigger is sample-specific, it is difficult for the human eye to recognize an image generated by the method of this embodiment as a poisoning image without the original specific image for comparison. Moreover, in practical manual inspection the original specific image is essentially never available, so the poisoning images generated by the method of this embodiment remain highly concealed.
Table 3. Objective measures of concealment on the MNIST dataset
Table 4. Objective measures of concealment on the CIFAR10 dataset
S4: testing the generalization. In this embodiment, ResNet18, ResNet50, DenseNet121 and GoogLeNet are adopted as the compared network structures; the specific results are given in tables 5 and 6, from which it can be seen that the attack method of this embodiment still maintains a very high attack success rate on different models without affecting their original performance.
Table 5. Generalization performance of different models on the MNIST dataset
Table 6. Generalization performance of different models on the CIFAR10 dataset
S5: verifying the importance of the re-parameterization operation. To prove the importance of the re-parameterized noise sampling in the method provided by this embodiment, an ablation experiment is carried out; the results are shown in table 7. It can be seen from table 7 that removing the re-parameterization greatly reduces the attack success rate, which means the trained back door model fails to complete the mapping from the back door trigger to the target label; the re-parameterization is therefore necessary.
Table 7. Verification of the importance of the re-parameterization operation on the CIFAR10 dataset
S6: verifying the influence of the loss functions on concealment. In constructing the reconstructed image, the invention uses three loss functions, namely the first loss value L_rect, the second loss value L_ssim and the third loss value L_act. This embodiment experimentally tests the effect of these three loss functions on the reconstructed image; the specific results are shown in table 8. As can be seen from table 8, using any single loss function alone does not give the expected reconstruction; adding the second loss value L_ssim on top of the first loss value L_rect slightly improves image quality, but the best results require using all three loss functions simultaneously.
Table 8. Influence of different loss functions on image concealment
S7: verifying the influence of the poisoning ratio on the experimental results. The default poisoning ratio of the method is 10%. Referring to figs. 4 and 5, different poisoning ratios were further explored in the experiments. Barni et al. proved experimentally that a label-inconsistent back door attack can succeed by injecting only 1%-4% back door samples, whereas a label-consistent back door attack requires a poisoning ratio of 10%-30%. As can be seen from figs. 4 and 5, although the attack success rate (ASR) of the invention drops somewhat when the poisoning ratio is below 10%, an ASR of 85% is still achieved even at a poisoning ratio of 1%.
S8: verifying that the method of this embodiment resists defense methods. To counter back door attacks, researchers have proposed many defense methods, including data-level defenses and model-level defenses. In step S8, this embodiment takes one classical method from each of the two defense types for experimental testing, as follows:
S81: the SentiNet detection method uses model interpretability and object detection technology as its detection mechanism; it uses Grad-CAM to visualize the attention map of a target image and thereby localize the back door trigger. The experimental results are shown in figs. 6 and 7: SentiNet does not detect the trigger area generated by the attack method of the invention.
S82: the Neural Cleanse method uses gradient descent to compute potential triggers for every output class of the classifier and returns an anomaly index for the classifier; a classifier is considered compromised if the index is greater than 2. The experimental results, shown in fig. 8, indicate that the back door model of this embodiment can bypass Neural Cleanse detection.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (9)

1. A label-consistent back door attack method based on a re-parameterized steganography trigger, characterized by comprising the following steps:
obtaining a data set comprising a plurality of original images, wherein each original image carries a category label; the original images comprise specific images and clean images, the clean images comprise first clean images and second clean images, and the category label of each specific image is the target label;
adding a back door trigger to the specific image to convert the specific image into a poisoning image, wherein the poisoning image comprises a first poisoning image and a second poisoning image;
the first poisoning image and the first clean image form a training set, and the second poisoning image and the second clean image form a test set;
adopting the training set to train the baseline model to obtain a back door model, and finishing back door attack;
when the second clean image is input to the back door model, the back door model outputs the category label corresponding to the second clean image;
when the second poisoning image is input to the back door model, the back door model converts the category label corresponding to the second poisoning image into a preset label and outputs the preset label.
2. The label-consistent back door attack method based on a re-parameterized steganography trigger of claim 1, wherein the adding a back door trigger to the specific image converts the specific image into a poisoning image, comprising:
extracting a first feature vector from the specific image using an encoder network;
performing reparameterization sampling on the first feature vector based on Gumbel-Softmax to obtain a second feature vector;
inputting the second feature vector into a decoder network, and obtaining a reconstructed image by the decoder network according to the second feature vector;
and re-encoding the character information of the back door trigger and the reconstructed image by using a pre-trained encoder-decoder network to form the poisoning image.
3. The label-consistent back door attack method based on a re-parameterized steganography trigger according to claim 2, wherein the performing re-parameterized sampling on the first feature vector based on Gumbel-Softmax to obtain a second feature vector includes:
realizing sampling from a multinomial distribution with Gumbel-Softmax, which comprises the following steps: generating Gumbel-distributed random numbers with the same dimension as the first feature vector, and adding the Gumbel-distributed random numbers to the corresponding dimensions of the first feature vector;
and smoothing the output of the argmax function applied to the noise-added first feature vector with a Softmax function to obtain the second feature vector.
4. The label-consistent back door attack method based on a re-parameterized steganography trigger according to claim 3, wherein the smoothing of the argmax output of the first feature vector added with the Gumbel-distributed random numbers with a Softmax function to obtain the second feature vector includes: controlling the degree of smoothing by adjusting the temperature coefficient, and calculating as follows:
f_τ(X)_i = exp(x_i / τ) / Σ_{k=1}^{K} exp(x_k / τ)
where f_τ(X)_i is the Gumbel-Softmax output for the i-th component of the first feature vector, x_k is the value at the k-th position of the first feature vector, K is its dimension, and τ is the temperature coefficient, with τ greater than 0.
5. The label-consistent back door attack method based on a re-parameterized steganography trigger according to claim 3, wherein the probability density of the Gumbel-distributed random numbers is calculated as follows:
f(x; μ, η) = (1/η)·exp(−(x − μ)/η − exp(−(x − μ)/η))
where x is the random variable, μ is the location parameter, and η is the scale parameter.
6. The label-consistent back door attack method based on a re-parameterized steganography trigger according to claim 2, wherein the inputting the second feature vector into a decoder network, the decoder network obtaining a reconstructed image from the second feature vector, comprises:
introducing a total loss function as a constraint in the process of the decoder network obtaining the reconstructed image from the second feature vector, calculated as follows:
L_all = α·L_rect + β·L_ssim + γ·L_act
where α is the first weight coefficient, β is the second weight coefficient, γ is the third weight coefficient, L_all is the total loss value, L_rect is the first loss value, L_ssim is the second loss value, and L_act is the third loss value.
7. The label-consistent back door attack method based on a re-parameterized steganography trigger according to claim 6, wherein the first loss function value is calculated as follows:
L_rect = (1 / (m·n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} [ I(i, j) − K(i, j) ]²
where m×n is the image size, I(i, j) is the true value of the (i, j)-th pixel of the specific image, and K(i, j) is the reconstructed value of the (i, j)-th pixel of the reconstructed image.
8. The label-consistent back door attack method based on a re-parameterized steganography trigger according to claim 6, wherein the second loss function value is calculated as follows:
L_ssim = 1 − (2·μ_x·μ_y + C_1)(2·σ_xy + C_2) / ((μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2))
where μ_x is the mean of the specific image, μ_y is the mean of the reconstructed image, σ_x is the standard deviation of the specific image, σ_y is the standard deviation of the reconstructed image, σ_xy is the covariance of the specific image and the reconstructed image, C_1 is a first constant, and C_2 is a second constant.
9. The label-consistent back door attack method based on a re-parameterized steganography trigger according to claim 6, wherein the third loss function value is calculated as follows:
L_act = log(1 + p_j) − λ·min(0, p_j − max_{i≠j}(p_i))
where p_j is the classification confidence of the target label, p_i is the classification confidence of the other categories, and λ is a weight coefficient.
CN202310705268.4A 2023-06-14 2023-06-14 Label consistent type back door attack method based on re-parameterized steganography trigger Pending CN116821897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310705268.4A CN116821897A (en) 2023-06-14 2023-06-14 Label consistent type back door attack method based on re-parameterized steganography trigger

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310705268.4A CN116821897A (en) 2023-06-14 2023-06-14 Label consistent type back door attack method based on re-parameterized steganography trigger

Publications (1)

Publication Number Publication Date
CN116821897A true CN116821897A (en) 2023-09-29

Family

ID=88125103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310705268.4A Pending CN116821897A (en) 2023-06-14 2023-06-14 Label consistent type back door attack method based on re-parameterized steganography trigger

Country Status (1)

Country Link
CN (1) CN116821897A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117896187A (en) * 2024-03-15 2024-04-16 东北大学 Multi-objective optimization-based federal learning multi-attacker back door attack method


Similar Documents

Publication Publication Date Title
Abdelnabi et al. Adversarial watermarking transformer: Towards tracing text provenance with data hiding
Carlini et al. Evading deepfake-image detectors with white-and black-box attacks
US11443178B2 (en) Deep neural network hardening framework
CN110929515A (en) Reading understanding method and system based on cooperative attention and adaptive adjustment
CN110110318A (en) Text Stego-detection method and system based on Recognition with Recurrent Neural Network
Srinivasan et al. Robustifying models against adversarial attacks by langevin dynamics
CN116821897A (en) Label consistent type back door attack method based on re-parameterized steganography trigger
Yin et al. Defense against adversarial attacks by low‐level image transformations
Wang et al. HidingGAN: High capacity information hiding with generative adversarial network
Sankaranarayanan et al. Semantic uncertainty intervals for disentangled latent spaces.
Mao et al. Transfer attacks revisited: A large-scale empirical study in real computer vision settings
CN112241741A (en) Self-adaptive image attribute editing model and method based on classified countermeasure network
Surabhi et al. Advancing Faux Image Detection: A Hybrid Approach Combining Deep Learning and Data Mining Techniques
CN117272113B (en) Method and system for detecting illegal behaviors based on virtual social network
Liu et al. Jacobian norm with selective input gradient regularization for interpretable adversarial defense
Kong et al. Data redaction from conditional generative models
CN118037641A (en) Multi-scale image tampering detection and positioning method based on double-flow feature extraction
Abdollahi et al. Image steganography based on smooth cycle-consistent adversarial learning
Zhao et al. Exploring Clean Label Backdoor Attacks and Defense in Language Models
CN116895089A (en) Face diversified complement method and system based on generation countermeasure network
Mira Deep learning technique for recognition of deep fake videos
CN113205044B (en) Deep fake video detection method based on characterization contrast prediction learning
Liu et al. Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model
Liu et al. Semi-supervised anomaly detection based on improved adversarial autoencoder and ensemble learning
Neekhara Synthesis and Robust Detection of AI-generated Media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination