CN115620100A - Active learning-based neural network black box attack method - Google Patents

Active learning-based neural network black box attack method


Publication number
CN115620100A
Authority
CN
China
Prior art keywords
model
attack
target
training
sample
Prior art date
Legal status
Pending
Application number
CN202211188638.3A
Other languages
Chinese (zh)
Inventor
翔云
胡晋瑄
李宇浩
陈作辉
朱家琪
宣琦
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202211188638.3A
Publication of CN115620100A
Pending legal-status Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778Active pattern-learning, e.g. online learning of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S40/00Systems for electrical power generation, transmission, distribution or end-user application management characterised by the use of communication or information technologies, or communication or information technology specific aspects supporting them
    • Y04S40/20Information technology specific aspects, e.g. CAD, simulation, modelling, system security

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

A neural network black box attack method based on active learning comprises the following steps: (S1) training a target model with target training data, and pre-training a local surrogate model with surrogate-model training data that is disjoint from the target training data; (S2) screening the non-target training data with an active learning method, and retraining the model pre-trained in S1; (S3) generating adversarial examples against the surrogate model trained in S2 with an adversarial attack method, and transferring them to the target model for the attack. The surrogate-model training method provided by the invention greatly reduces the number of queries to the target model under the gray-box setting, lowers the query cost, and improves the attack success rate against the target model under the black-box setting.

Description

Neural network black box attack method based on active learning
Technical Field
The invention relates to the field of deep learning adversarial attacks, in particular to a neural network black box attack method based on active learning.
Background
Deep neural network models are widely deployed in industry, and as their use grows, whether the security of these models meets users' requirements has become an important question. For an attacker, applying a small perturbation that is invisible to the naked eye to a sample is likely to make the neural network misclassify that sample. Adversarial attacks mainly comprise black-box attacks and white-box attacks. A white-box attack requires the attacker to have prior knowledge about the model, including the model structure, the weights, the data used to train the target model, and the classification results (prediction labels and prediction probabilities). With this information, a white-box attack can compute the gradient of the target model and craft adversarial examples with a high attack success rate against it. Although the success rate of white-box attacks is high, such prior knowledge is unrealistic in a real scenario, and black-box attacks better match real conditions. By contrast, a black-box attack has very limited knowledge of the target model. In real-world applications, DNN classification services typically do not disclose information about their systems, so as to prevent threats to network security: providers want to keep adversaries and competitors from obtaining the details of their models. An adversary can only access the target model through the exposed service interface, i.e., through queries. Therefore, gradient-based white-box attack algorithms cannot be applied directly in the black-box scenario, and the number of queries and the query cost become an important issue.
Black-box attacks can be broadly classified into query-based attacks, transfer-based attacks, and surrogate-model attacks. The main drawback of existing query-based black-box attacks is that they require too many queries and the query cost is too high. The main drawback of transfer-based attacks is that the attack success rate is low, i.e., the attack effect is poor. In a surrogate-model attack, the attacker first queries the victim model to obtain labels, and then trains a surrogate model with the queried labels. The surrogate and victim models should be as close as possible so that the adversarial examples transfer better. After proper training, subsequent attacks can reuse the obtained surrogate model without querying the victim model again. Compared with the former two approaches, a surrogate-model attack reduces the number of queries, but the surrogate-training stage still requires many queries. A recent black-box attack proposes a method that trains a surrogate model without real data (DaST: Data-free Substitute Training for Adversarial Attacks); although this method works well on some data sets and models, it has the fatal drawback that the number of queries is extremely large, and such a query cost is unrealistic in real scenarios. Knockoff Nets adaptively selects query images with reinforcement learning to steal the functionality of the victim model and trains a surrogate model that fits the target model for a surrogate-model attack. Practical black-box attacks generate a surrogate model that mimics the decision boundary of the attacked model, choosing a generic model structure according to common knowledge, e.g., a CNN for image recognition. The black-box attack method for brain-computer interface systems of publication CN112149609A trains a surrogate model by searching for samples on both sides of the surrogate model's decision boundary; it achieves the same or better attack performance with fewer queries, and the generated adversarial examples carry little perturbation, are hardly distinguishable from the original EEG signal in the time and frequency domains, and are not easily detected, which greatly improves the efficiency of black-box attacks in brain-computer interface systems. However, it does not cover the more difficult targeted attack, its application field is very limited, it only improves the success rate of the easier untargeted attack, and the generated surrogate model is only suitable for Jacobian-based attack methods. From the perspective of the decision boundary, the different data distributions and the randomness of the training process cause the surrogate model and the target model to have different decision boundaries. This shows that in query-based and transfer-based attacks the decision boundary of the surrogate model differs significantly from that of the target model. Based on this observation, the invention proposes to fit the local surrogate model to the decision boundary of the target model through active learning.
In addition, many existing black-box attacks assume that the training data of the target model and of the surrogate model are the same, which is a strong assumption for the attacker. The setting of the present technique does not need this strong assumption, and the samples used in the active learning stage are also disjoint from the training samples of the target model.
Disclosure of Invention
The invention provides a neural network black box attack method based on active learning to overcome the defects of the prior art.
The invention aims to solve the problem of the high query cost currently faced in the field of deep learning adversarial attacks, and provides a surrogate-model attack method that reduces the number of black-box attack queries and improves the attack success rate.
The technical concept of the invention is as follows: the technical difficulty of attacking in the black-box scenario in the current deep learning security field mainly lies in that the number of queries needed to train the surrogate model is too large, the query cost is too high, and the success rate of transfer attacks is poor in the targeted-attack scenario. Incorporating active learning means querying samples near the model's decision boundary, which benefits training the surrogate model to approximate the target model. The invention therefore aims to solve the problems of excessive black-box query cost and poor attack effect.
The technical solution adopted by the invention to achieve the above aim is as follows:
a neural network black box attack method based on active learning is characterized by comprising the following steps:
s1: training a target model by using target training data, and pre-training a local substitution model by using non-target training data;
s2: screening non-target training data by using an active learning method, and retraining the model pre-trained in S1;
s3: and (4) generating a countermeasure sample for the S2-trained substitute model by using a countermeasure attack method, and migrating to the target model for attack.
Preferably, the step S1 specifically includes:
S1.1: randomly shuffling the surrogate-model training data;
S1.2: taking samples of the randomly shuffled target-model training data to train the target model, ensuring that these samples do not overlap with those used later to train the local surrogate model, and saving the model;
S1.3: training the local surrogate model with the inputs and outputs obtained by querying the target model with the surrogate-model training data.
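As an illustration of steps S1.1–S1.3, the following is a minimal PyTorch-style sketch of pre-training the local surrogate model on labels queried from the black-box target model. The helper names (query_target_labels, pretrain_surrogate), the batch size and the epoch count are assumptions introduced for the example, not values fixed by the invention.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def query_target_labels(target_model, loader, device="cuda"):
    """S1.3: query the black-box target model and record its output labels."""
    target_model.eval()
    xs, ys = [], []
    with torch.no_grad():
        for x, _ in loader:                          # ground-truth labels are not used
            ys.append(target_model(x.to(device)).argmax(dim=1).cpu())
            xs.append(x)
    return torch.cat(xs), torch.cat(ys)

def pretrain_surrogate(surrogate, xs, ys, epochs=50, device="cuda"):
    """Train the local surrogate on (input, target-model label) pairs."""
    opt = torch.optim.SGD(surrogate.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=5e-4)
    loader = DataLoader(TensorDataset(xs, ys), batch_size=128, shuffle=True)
    surrogate.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(surrogate(x.to(device)), y.to(device)).backward()
            opt.step()
    return surrogate
```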
Preferably, the step S2 specifically includes:
S2.1: feeding the remaining samples of the non-target training data into the local surrogate model trained in S1, and obtaining the confidence of each sample under the surrogate model;
S2.2: setting a confidence threshold, and taking the pictures from S2.1 whose confidence is below the threshold for subsequent training;
S2.3: adding adversarial perturbation to the images below the confidence threshold, and feeding the images before and after perturbation into the target model together for querying to obtain the corresponding labels;
S2.4: retraining the local surrogate model with the inputs and outputs obtained above, and saving the model.
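A minimal sketch of the confidence screening and querying in S2.1–S2.3 follows, assuming a PyTorch surrogate and target. The one-step perturbation helper fgsm_perturb and the default threshold value are illustrative assumptions; only the idea of keeping low-confidence samples and querying the target on the clean/perturbed pairs comes from the steps above.

```python
import torch
import torch.nn.functional as F

def screen_low_confidence(surrogate, pool_loader, threshold=0.6, device="cuda"):
    """S2.1/S2.2: keep samples whose top-1 confidence under the surrogate is below the threshold."""
    surrogate.eval()
    kept = []
    with torch.no_grad():
        for x, _ in pool_loader:
            conf = F.softmax(surrogate(x.to(device)), dim=1).max(dim=1).values
            kept.append(x[(conf < threshold).cpu()])
    return torch.cat(kept)

def fgsm_perturb(surrogate, x, eps=0.03, device="cuda"):
    """One-step adversarial perturbation computed on the surrogate (illustrative choice)."""
    x = x.clone().to(device).requires_grad_(True)
    logits = surrogate(x)
    F.cross_entropy(logits, logits.argmax(dim=1)).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach().cpu()

def query_clean_and_perturbed(target, x_clean, x_adv, device="cuda"):
    """S2.3: query the target model on both versions to obtain labels for retraining."""
    with torch.no_grad():
        y_clean = target(x_clean.to(device)).argmax(dim=1).cpu()
        y_adv = target(x_adv.to(device)).argmax(dim=1).cpu()
    return torch.cat([x_clean, x_adv]), torch.cat([y_clean, y_adv])
```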
Preferably, the step S3 specifically includes:
S3.1: using the surrogate model trained in S2, attacking the samples of the randomly shuffled test-set data under different attack methods in the white-box setting of the local surrogate model, using the gradient information of the surrogate model, and saving the corresponding adversarial examples for the subsequent transfer attack;
S3.2: performing a transfer attack on the target model with the adversarial examples generated from the surrogate model, using attack methods including FGSM, PGD, BIM and CW. The BIM attack mainly uses the gradient of the model to determine the direction in which to add the perturbation, and screens for the optimal perturbation iteratively; the CW attack treats the input image as the parameter to be optimized and trains the input with the model parameters fixed so as to add the perturbation.
These attacks are defined as follows:
FGSM: $x_{adv} = x + \varepsilon \cdot \operatorname{sign}\left(\nabla_x J(\theta, x, y)\right)$
BIM: $x_{n+1} = \operatorname{clip}_{x,\varepsilon}\left\{ x_n + a \cdot \operatorname{sign}\left(\nabla_{x_n} J(\theta, x_n, y)\right) \right\}$
PGD: $x_{n+1} = \operatorname{clip}_{x,\varepsilon}\left\{ x_n + a \cdot \operatorname{sign}\left(\nabla_{x_n} J(\theta, x_n, y)\right) \right\}$, starting from a randomly perturbed $x_0$
CW: $\min_{w_n} \left\| \tfrac{1}{2}\left(\tanh(w_n)+1\right) - x_n \right\|_2^2 + c \cdot f\!\left(\tfrac{1}{2}\left(\tanh(w_n)+1\right)\right)$
where $x$ is the sample before the perturbation is added; $\varepsilon$ is an adjustment factor; $\theta$ denotes the model parameters; $\nabla_x J(\theta, x, y)$ denotes back-propagating the loss value to the input image and computing the gradient; the sign() function returns the sign of its input, taking 1 when the input is greater than 0, 0 when the input equals 0, and -1 when the input is less than 0; the clip() function constrains the perturbation within a prescribed range to prevent the attack from failing because of an excessively large gradient step; $a$ is the parameter controlling the iteration step; $w_n$ is the optimization parameter; $x_n$ denotes the clean sample.
S3.3: after the adversarial examples are generated and the transfer attack is performed, the targeted attack success rate and the untargeted attack success rate are recorded separately and compared with the results obtained without the active learning method.
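The FGSM and iterative (BIM/PGD) updates above can be written directly against the local surrogate model, for example as in the following PyTorch-style sketch. It follows the formulas and symbols defined above rather than any particular attack library, and projecting back into both the ε-ball and the [0, 1] pixel range is an assumption about the image domain.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """x_adv = x + eps * sign(grad_x J(theta, x, y))."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def iterative_attack(model, x, y, eps, a, nb_iter, random_start=False):
    """x_{n+1} = clip_{x,eps}(x_n + a * sign(grad J)); BIM when random_start=False, PGD otherwise."""
    x_adv = x.clone()
    if random_start:                                   # PGD starts from a random point in the eps-ball
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(nb_iter):
        x_adv = x_adv.detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        x_adv = x_adv + a * x_adv.grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # clip() to the eps-ball around x
        x_adv = x_adv.clamp(0, 1)                               # keep a valid image
    return x_adv.detach()
```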
The beneficial effects of the invention are as follows:
(1) The surrogate-model training method provided by the invention greatly reduces the number of queries to the target model under the gray-box setting and reduces the query cost;
(2) The surrogate-model training method provided by the invention improves, to a certain extent, the attack success rate against the target model under the black-box setting.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 shows the results of training the target models: FIG. 2(a) shows the result of training the vgg16_bn model on the selected part of the CIFAR10 dataset, FIG. 2(b) the result of training the Resnet18 model on the selected part of the CIFAR10 dataset, and FIG. 2(c) the result of training the MobileNetV2 model on the selected part of the CIFAR10 dataset, where the abscissa is the training epoch and the ordinate is the test accuracy on the selected part of the CIFAR10 test set;
FIG. 3 shows the training process in which the decision boundary of the surrogate model is fitted to that of the target model.
Detailed Description
The following detailed description of specific embodiments of the invention is provided in connection with the accompanying drawings.
Example one
Referring to fig. 1 to 3, a neural network black box attack method based on active learning includes the following steps:
s1, randomly scrambling data of a CIFAR10 training set, selecting a model structure of a target model, training the target model, and pre-training a local substitution model, wherein the training data of the substitution model and the target model are not repeated, and the training results of the target model and the substitution model are as shown in figure 2, and specifically comprise the following steps:
setting a pseudo-random seed number seed23 for a training set of a CIFAR10 to ensure that the disordering sequence of the disordering samples of the data set is consistent each time, wherein the randomly disordering samples of target training data are selected for training a target model, the structure of the target model respectively selects Vgg16_ bn, resnet18 and MobileNet V2 as a typical light weighting model, and the training parameters shared by the three models are as follows: in the SGD optimizer, the momentum coefficient is set to be 0.9, the weight attenuation coefficient weight _ decay is set to be 5e-4, and the training parameters of vgg16' u bn include: the initial learning rate is 0.001, the learning rate is reduced by ten times at 135epoch and 185epoch, and the training parameters of Resnet18 are: the initial learning rate is 0.1, the learning rate is reduced by ten times at 135epoch and 185epoch, and the training parameters of MobileNetV2 are: the initial learning rate is 0.1, the learning rate is reduced by ten times at 20epoch,50epoch and 80epoch, wherein the accuracy of vgg16_ bn reaches 88.45% on the selected 5000 CIFAR10 test sets, the accuracy of Resnet18 reaches 91.79% on the selected 5000 CIFAR10 test sets, and the accuracy of MobileNet V2 reaches 85.86% on the selected 5000 CIFAR10 test sets. And setting random seed numbers which are the same as those of the S1 to a training set of the CIFAR10, ensuring that the disorder sequence is consistent with the S1, wherein a sample of one-half proportion of randomly disordered training data of the substitution model is selected to be input into the target model trained by the S1 for query output, and an output label is obtained and recorded for subsequent training of the local substitution model.
S2: screening non-target training data by using an active learning method, retraining the model pre-trained in the S2, and performing an active learning process and a surrogate model retraining process as shown in FIG. 3, which specifically includes:
the test set of CIFAR10 is set with the same random seed number as S1, S2. Dividing a data set, taking a sample of the remaining half proportion of the random disturbed alternative model training data as non-target training data to be screened for active learning, screening the sample based on the sample close to a decision boundary with higher uncertainty, setting a confidence threshold to be 0.6, wherein in S3.1, pictures with confidence degrees lower than the threshold are samples with high training importance for the alternative model, storing the samples with important training and retraining the local alternative model, wherein the training parameters of three network structures are preferred: the initial learning rate is 0.1, the learning rate drops by ten times at 135epoch and 185epoch, and the surrogate model retraining process is as follows: inputting a local substitution model for the selected sample to check the confidence, saving the sample below 0.6, inputting the target model, and inquiring the output label by the request times to continuously fit and approximate the decision boundary of the target model, wherein the color of the triangle represents the real label of the sample, and the color of the triangle frame represents the classification of the model.
S3: generating a countermeasure sample for the substitute model trained by the S3 by using a white box attack method, and migrating the countermeasure sample to a target model for attack, wherein the method specifically comprises the following steps:
carrying out white box attack on five thousand samples after a CIFAR10 test set which is randomly disturbed by using the gradient information of a local substitution model after S3 retraining by using a PGD attack method, obtaining antagonistic samples, transferring the antagonistic samples to a corresponding target model for non-target attack and target attack with higher difficulty, counting the samples which pass the model and are classified wrongly, and calculating the success rate of the non-target attack, wherein in the target attack, the samples are classified wrongly, then the number of the samples which can be classified into a specified category under the target attack is calculated, and the success rate of the target attack is calculated, wherein the common parameters of four attacks are as follows: the loss function is a cross entropy loss function, and the attack parameters of the FGSM method are as follows: the attack step length eps is 0.26, and the attack parameters of the BIM method are as follows: the maximum perturbation eps is 0.25, the iteration number nb _ iter is 120, the attack step length eps _ iter is 0.02, the minimum value clip _ min of each input dimension is 0.0, the maximum value clip _ max of each input dimension is 1.0, and the attack parameters of the CW method are as follows: the learning rate of the attack algorithm learning _ rate is 0.45, the binary query times binding _ search _ steps are 10, the maximum iteration times max _ iterations are 120, and the attack parameters of the PGD method are as follows: the maximum perturbation eps is 0.25, the iteration number nb _ iter is 11, the attack step length eps _ iter is 0.03, the minimum value clip _ min of each input dimension is 0.0, and the maximum value clip _ max of each input dimension is 1.0.
Example two
The method for improving the robustness of the power grid system based on active learning comprises the following steps:
S1: training a power grid system model with electronic information data (e-mails, electromagnetic signals, and the like) from the data domain of the target model, and pre-training a local surrogate model with electronic information data from a non-target data domain, with no overlap between the two sets of training data;
S2: screening the electronic information data of the non-target data domain with an active learning method, feeding the data into the target model for querying, and retraining the local pre-trained model after obtaining the query results;
S3: generating adversarial examples against the surrogate model trained in S2 with an adversarial attack method and transferring them to the power grid system target model, thereby disrupting the opposing power grid system's detection algorithm so that an attacker cannot identify the opposing side's electronic information data to attack the power grid system, which improves the robustness of the power grid system.
The specific contents of each step of the present embodiment include the specific contents of each step of the previous embodiment.
The step S1 specifically includes:
s1.1: random shuffling using surrogate model training data;
s1.2: taking a sample of the randomly disturbed target model training data for training the target model, avoiding the repetition of the sample and a subsequent training local substitution model, and storing the model;
s1.3: and (4) using the alternative model training data to request the input and output obtained by the target model to train the local alternative model.
The step S2 specifically includes:
S2.1: feeding the remaining samples of the non-target training data into the local surrogate model trained in S1, and obtaining the confidence of each sample under the surrogate model;
S2.2: setting a confidence threshold, and taking the pictures from S2.1 whose confidence is below the threshold for subsequent training;
S2.3: adding adversarial perturbation to the images below the confidence threshold, and feeding the images before and after perturbation into the target model together for querying to obtain the corresponding labels;
S2.4: retraining the local surrogate model with the inputs and outputs obtained above, and saving the model.
The step S3 specifically includes:
S3.1: using the surrogate model trained in S2, attacking the samples of the randomly shuffled test-set data under different attack methods in the white-box setting of the local surrogate model, using the gradient information of the surrogate model, and saving the corresponding adversarial examples for the subsequent transfer attack;
S3.2: performing a transfer attack on the target model with the adversarial examples generated from the surrogate model, using attack methods including FGSM, PGD, BIM and CW. The BIM attack mainly uses the gradient of the model to determine the direction in which to add the perturbation, and screens for the optimal perturbation iteratively; the CW attack treats the input image as the parameter to be optimized and trains the input with the model parameters fixed so as to add the perturbation.
These attacks are defined as follows:
FGSM: $x_{adv} = x + \varepsilon \cdot \operatorname{sign}\left(\nabla_x J(\theta, x, y)\right)$
BIM: $x_{n+1} = \operatorname{clip}_{x,\varepsilon}\left\{ x_n + a \cdot \operatorname{sign}\left(\nabla_{x_n} J(\theta, x_n, y)\right) \right\}$
PGD: $x_{n+1} = \operatorname{clip}_{x,\varepsilon}\left\{ x_n + a \cdot \operatorname{sign}\left(\nabla_{x_n} J(\theta, x_n, y)\right) \right\}$, starting from a randomly perturbed $x_0$
CW: $\min_{w_n} \left\| \tfrac{1}{2}\left(\tanh(w_n)+1\right) - x_n \right\|_2^2 + c \cdot f\!\left(\tfrac{1}{2}\left(\tanh(w_n)+1\right)\right)$
where $x$ is the sample before the perturbation is added; $\varepsilon$ is an adjustment factor; $\theta$ denotes the model parameters; $\nabla_x J(\theta, x, y)$ denotes back-propagating the loss value to the input image and computing the gradient; the sign() function returns the sign of its input, taking 1 when the input is greater than 0, 0 when the input equals 0, and -1 when the input is less than 0; the clip() function constrains the perturbation within a prescribed range to prevent the attack from failing because of an excessively large gradient step; $a$ is the parameter controlling the iteration step; $w_n$ is the optimization parameter; $x_n$ denotes the clean sample.
S3.3: after the adversarial examples are generated and the transfer attack is performed, the targeted attack success rate and the untargeted attack success rate are recorded separately and compared with the results obtained without the active learning method.
The embodiments described in this specification merely illustrate implementations of the inventive concept, and the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments; it also covers equivalents that may occur to those skilled in the art based on the inventive concept.

Claims (4)

1. A power grid system robustness improvement method based on active learning, characterized by comprising the following steps:
S1: training a target model with target training data, and pre-training a local surrogate model with non-target training data;
S2: screening the non-target training data with an active learning method, and retraining the model pre-trained in S1;
S3: generating adversarial examples against the surrogate model trained in S2 with an adversarial attack method, and transferring them to the target model for the attack.
2. The surrogate model attack method based on improving the success rate of model migration attack as claimed in claim 1, wherein: the step S1 specifically includes:
S1.1: randomly shuffling the surrogate-model training data;
S1.2: taking samples of the randomly shuffled target-model training data to train the target model, ensuring that these samples do not overlap with those used later to train the local surrogate model, and saving the model;
S1.3: training the local surrogate model with the inputs and outputs obtained by querying the target model with the surrogate-model training data.
3. The surrogate model attack method based on improving the success rate of model migration attack as claimed in claim 1, wherein: the step S2 specifically includes:
S2.1: feeding the remaining samples of the non-target training data into the local surrogate model trained in S1, and obtaining the confidence of each sample under the surrogate model;
S2.2: setting a confidence threshold, and taking the pictures from S2.1 whose confidence is below the threshold for subsequent training;
S2.3: adding adversarial perturbation to the images below the confidence threshold, and feeding the images before and after perturbation into the target model together for querying to obtain the corresponding labels;
S2.4: retraining the local surrogate model with the inputs and outputs obtained above, and saving the model.
4. The surrogate model attack method based on improving the success rate of model migration attack as claimed in claim 1, wherein: the step S3 specifically includes:
S3.1: using the surrogate model trained in S2, attacking the samples of the randomly shuffled test-set data under different attack methods in the white-box setting of the local surrogate model, using the gradient information of the surrogate model, and saving the corresponding adversarial examples for the subsequent transfer attack;
S3.2: performing a transfer attack on the target model with the adversarial examples generated from the surrogate model, using attack methods including FGSM, PGD, BIM and CW; the BIM attack mainly uses the gradient of the model to determine the direction in which to add the perturbation and screens for the optimal perturbation iteratively; the CW attack treats the input image as the parameter to be optimized and trains the input with the model parameters fixed so as to add the perturbation, the attacks being defined as follows:
FGSM: $x_{adv} = x + \varepsilon \cdot \operatorname{sign}\left(\nabla_x J(\theta, x, y)\right)$
BIM: $x_{n+1} = \operatorname{clip}_{x,\varepsilon}\left\{ x_n + a \cdot \operatorname{sign}\left(\nabla_{x_n} J(\theta, x_n, y)\right) \right\}$
PGD: $x_{n+1} = \operatorname{clip}_{x,\varepsilon}\left\{ x_n + a \cdot \operatorname{sign}\left(\nabla_{x_n} J(\theta, x_n, y)\right) \right\}$, starting from a randomly perturbed $x_0$
CW: $\min_{w_n} \left\| \tfrac{1}{2}\left(\tanh(w_n)+1\right) - x_n \right\|_2^2 + c \cdot f\!\left(\tfrac{1}{2}\left(\tanh(w_n)+1\right)\right)$
where $x$ is the sample before the perturbation is added; $\varepsilon$ is an adjustment factor; $\theta$ denotes the model parameters; $\nabla_x J(\theta, x, y)$ denotes back-propagating the loss value to the input image and computing the gradient; the sign() function returns the sign of its input, taking 1 when the input is greater than 0, 0 when the input equals 0, and -1 when the input is less than 0; the clip() function constrains the perturbation within a prescribed range to prevent the attack from failing because of an excessively large gradient step; $a$ is the parameter controlling the iteration step; $w_n$ is the optimization parameter; $x_n$ denotes the clean sample;
S3.3: after the adversarial examples are generated and the transfer attack is performed, recording the targeted attack success rate and the untargeted attack success rate separately, and comparing them with the results obtained without the active learning method.
CN202211188638.3A 2022-09-28 2022-09-28 Active learning-based neural network black box attack method Pending CN115620100A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211188638.3A CN115620100A (en) 2022-09-28 2022-09-28 Active learning-based neural network black box attack method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211188638.3A CN115620100A (en) 2022-09-28 2022-09-28 Active learning-based neural network black box attack method

Publications (1)

Publication Number Publication Date
CN115620100A true CN115620100A (en) 2023-01-17

Family

ID=84860511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211188638.3A Pending CN115620100A (en) 2022-09-28 2022-09-28 Active learning-based neural network black box attack method

Country Status (1)

Country Link
CN (1) CN115620100A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523032A (en) * 2023-03-13 2023-08-01 之江实验室 Image text double-end migration attack method, device and medium
CN116523032B (en) * 2023-03-13 2023-09-29 之江实验室 Image text double-end migration attack method, device and medium

Similar Documents

Publication Publication Date Title
Agarwal et al. Image transformation-based defense against adversarial perturbation on deep learning models
KR102304661B1 (en) Attack-less Adversarial Training Method for a Robust Adversarial Defense
CN112115469B (en) Edge intelligent mobile target defense method based on Bayes-Stackelberg game
Zhu et al. Dualde: Dually distilling knowledge graph embedding for faster and cheaper reasoning
CN111866004A (en) Security assessment method, apparatus, computer system, and medium
CN112016686A (en) Antagonism training method based on deep learning model
CN115620100A (en) Active learning-based neural network black box attack method
CN113627543A (en) Anti-attack detection method
Guo et al. ELAA: An efficient local adversarial attack using model interpreters
Deng et al. Frequency-tuned universal adversarial perturbations
Hui et al. FoolChecker: A platform to evaluate the robustness of images against adversarial attacks
CN113435264A (en) Face recognition attack resisting method and device based on black box substitution model searching
CN113034332A (en) Invisible watermark image and backdoor attack model construction and classification method and system
CN115481719B (en) Method for defending against attack based on gradient
Goodman Transferability of adversarial examples to attack cloud-based image classifier service
Yin et al. Adversarial attack, defense, and applications with deep learning frameworks
WO2023142282A1 (en) Task amplification-based transfer attack method and apparatus
CN115719085A (en) Deep neural network model inversion attack defense method and equipment
CN115632843A (en) Target detection-based generation method of backdoor attack defense model
CN113159317B (en) Antagonistic sample generation method based on dynamic residual corrosion
Dong et al. Mind your heart: Stealthy backdoor attack on dynamic deep neural network in edge computing
CN112884053B (en) Website classification method, system, equipment and medium based on image-text mixed characteristics
CN113486736A (en) Black box anti-attack method based on active subspace and low-rank evolution strategy
Xie et al. GAME: Generative-based adaptive model extraction attack
Li et al. A fast two-stage black-box deep learning network attacking method based on cross-correlation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination