CN115952493A - Reverse attack method and attack device for black box model and storage medium - Google Patents


Info

Publication number
CN115952493A
Authority
CN
China
Prior art keywords
data
preset
target
training data
model
Prior art date
Legal status
Pending
Application number
CN202211689305.9A
Other languages
Chinese (zh)
Inventor
罗文坚
贾焰
叶子鹏
杨向凯
Current Assignee
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202211689305.9A
Publication of CN115952493A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiments of the application disclose a reverse attack method, an attack device and a storage medium for a black box model, applicable to the technical field of black box models. The method comprises: inputting preset training data into the black box model and into a feature extraction model, respectively, to obtain the prediction probability and the data features of the preset training data; training a mapping network based on the prediction probability and the data features of the preset training data; inputting data to be tested into the black box model to obtain a target prediction probability, and inputting the target prediction probability into the trained mapping network to obtain the data features of the data to be tested; and optimizing the hidden vector in the data features of the data to be tested based on a preset white-box model reverse attack algorithm to obtain a target hidden vector corresponding to the data to be tested, and obtaining the target data of the reverse attack from the target hidden vector. In this way, high-dimensional data regression can be achieved when the black box model is reversely attacked.

Description

Reverse attack method and attack device for black box model and storage medium
Technical Field
The embodiments of the application relate to the technical field of black box models, and in particular to a reverse attack method and device for a black box model, and a storage medium.
Background
Existing network models typically include black-box models and white-box models. Black-box models include neural networks, gradient boosting models and complex ensemble models. Black-box models are usually highly accurate, but their internal working mechanisms are difficult to understand: the importance of each feature to the prediction result cannot be estimated, and the interactions between different features are hard to interpret. White-box models, i.e., simple models such as linear regression and decision trees, usually have limited predictive power and struggle to model the inherent complexity of a dataset (e.g., feature interactions). However, such simple models are usually more interpretable, and their internal working principles are easier to explain.
Existing model reverse attack methods generally use gradient-based optimization, which succeeds only when the network structure and parameters of the target model are fully known, because they are needed to compute the gradients. However, most network models deployed in practice are black boxes to an attacker. Black box model reverse attack aims to reconstruct features of the model's training data using only the prediction outputs of the target model. When the data are images, the attack works as follows: given the label class to be attacked, an attack image is reconstructed for which the target model has extremely high prediction confidence on that class. The attacker can access the target model only through its inputs and outputs, and information about the model structure and parameters is difficult to obtain. Research on black box model reverse attack is still very limited, and the few existing studies achieve poor attack performance. To develop safe and reliable industrial intelligent systems, research on black box model reverse attack is needed.
Existing black box model reverse attacks are based on a reverse generation network: the target model (i.e., the black box model) is queried on auxiliary data, and its prediction outputs are used to train a reverse generation network that maps directly from the output space to the input space. However, this method has the following problems: the reverse generation network is difficult to train and usually memorizes only point-to-point mappings by fitting; it lacks generalization ability, and it has difficulty generating a correct image for a given class label. This is because images are high-dimensional, so high-dimensional image regression is difficult to achieve when the black box model is reversely attacked using prediction outputs alone.
Disclosure of Invention
The embodiments of the application provide a reverse attack method, an attack device and a storage medium for a black box model, which can achieve high-dimensional data regression when the black box model is reversely attacked.
The embodiment of the application provides a reverse attack method of a black box model, which comprises the following steps:
acquiring a feature extraction model trained based on preset training data;
inputting the preset training data into the black box model and the feature extraction model respectively to obtain the prediction probability of the preset training data and the data features of the preset training data;
training a mapping network based on the prediction probability and the data features of the preset training data;
inputting data to be tested into the black box model to obtain a target prediction probability, and inputting the target prediction probability into the trained mapping network to obtain the data features of the data to be tested;
and optimizing the hidden vector in the data features of the data to be tested based on a preset white-box model reverse attack algorithm to obtain a target hidden vector corresponding to the data to be tested, and obtaining the target data of the reverse attack from the target hidden vector.
Further, the preset training data comprises image training data, and the black box model comprises an image recognition model;
the step of respectively inputting the preset training data into the black box model and the feature extraction model to obtain the prediction probability of the preset training data and the data features of the preset training data comprises:
inputting the image training data into the image recognition model, wherein the image recognition model outputs the prediction probability of the image training data;
and inputting the image training data into the feature extraction model, and outputting the image features of the image training data by the feature extraction model.
Further, training the mapping network based on the prediction probability and the data features of the preset training data comprises:
taking the prediction probability of the image training data as the input of the mapping network and the image features of the image training data as the output of the mapping network, and training the mapping network on the mapping relationship between the prediction probabilities and the image features of the image training data.
Further, the data to be tested comprises image data to be tested; inputting the data to be tested into the black box model to obtain the target prediction probability, and inputting the target prediction probability into the trained mapping network to obtain the data features of the data to be tested, comprises:
inputting the image data to be tested into the image recognition model to obtain a target prediction probability corresponding to the image data to be tested;
converting the target prediction probability corresponding to the image data to be tested into a preset feature form;
and inputting the preset feature form into the trained mapping network, which outputs the target image features of the image data to be tested based on its learned mapping relationship.
Further, optimizing the hidden vector in the data features of the data to be tested based on the preset white-box model reverse attack algorithm to obtain the target hidden vector corresponding to the data to be tested comprises:
obtaining the target hidden vector $w^*$ corresponding to the data to be tested based on the preset white-box model reverse attack algorithm:
$$w^* = \arg\min_{w} \mathcal{L}\big(E(G(w)),\, M(y_c)\big)$$
where $w$ is a hidden vector in the data features of the data to be tested, $E$ is the feature extraction model, $M$ is the mapping network, $G$ is the generator corresponding to the data to be tested, $y_c$ is the target prediction probability corresponding to the data to be tested, and $\mathcal{L}$ is a loss measuring the distance between the features $E(G(w))$ of the generated data and the mapped features $M(y_c)$.
Further, after the target feature value of the data to be tested is obtained, the method further comprises:
obtaining a first fitness corresponding to the target hidden vector, and a preset test vector corresponding to the data to be tested;
performing crossover between the dimensions of the target hidden vector and the preset test vector to obtain an offspring hidden vector corresponding to the target hidden vector;
comparing the first fitness of the target hidden vector with a second fitness of the offspring hidden vector to determine the hidden vector with higher fitness;
and taking the hidden vector with higher fitness as the target hidden vector, and returning to the step of obtaining the first fitness corresponding to the target hidden vector and the preset test vector corresponding to the data to be tested, until a hidden vector meeting a preset fitness is obtained.
Further, obtaining the preset test vector corresponding to the data to be tested comprises:
obtaining the preset test vector $u^{(i)}$ corresponding to the data to be tested based on a preset mutation operator algorithm:
$$u^{(i)} = w^{(i)} + \beta_1\big(w_{\mathrm{best}} - w^{(i)}\big) + \beta_2 \sum_{k=1}^{n_v}\big(w_{r_1}^{(k)} - w_{r_2}^{(k)}\big)$$
where $w^{(i)}$ is the target hidden vector corresponding to the data to be tested, $\beta_1$ is the weight of the difference vector obtained from the hidden vector with the best fitness, $w_{\mathrm{best}}$ is the hidden vector with the highest fitness, $\beta_2$ is the weight of the difference vectors obtained from random hidden vectors, $n_v$ is the number of random difference vectors, $k$ indexes the $k$-th group of random difference vectors, and $w_{r_1}^{(k)}$ and $w_{r_2}^{(k)}$ are two arbitrary hidden vectors other than the target hidden vector.
The embodiments of the present application further provide a reverse attack apparatus for a black box model, comprising:
an acquisition unit, configured to acquire a feature extraction model trained on preset training data;
an input unit, configured to input the preset training data into the black box model and the feature extraction model, respectively, to obtain first training data and a label of the first training data;
a training unit, configured to train a mapping network based on the first training data and the label of the first training data;
an execution unit, configured to input the label of the data to be tested into the trained mapping network to obtain the feature value of the data to be tested;
and an optimization unit, configured to optimize the hidden vector in the feature value based on a preset white-box model reverse attack algorithm to obtain a target hidden vector corresponding to the data to be tested.
The embodiments of the present application further provide a reverse attack apparatus for a black box model, comprising:
a central processing unit, a memory and an input/output interface;
the memory is a transient storage memory or a persistent storage memory;
the central processing unit is configured to communicate with the memory and execute the instruction operations in the memory to perform the above-described method.
Embodiments of the present application also include a computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the above-described method.
According to the technical scheme, the embodiment of the application has the following advantages:
the method comprises: inputting preset training data into the black box model and the feature extraction model, respectively, to obtain the prediction probability and the data features of the preset training data; training a mapping network based on the prediction probability and the data features of the preset training data; inputting the data to be tested into the black box model to obtain a target prediction probability, and inputting it into the trained mapping network to obtain the data features of the data to be tested; and optimizing the hidden vector in the data features of the data to be tested based on a preset white-box model reverse attack algorithm to obtain a target hidden vector, from which the target data of the reverse attack are obtained. Because the target prediction probability is mapped by the trained mapping network into the feature space, high-dimensional data regression can be achieved during the reverse attack without obtaining the internal structural parameters of the black box model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to these drawings.
FIG. 1 is a flowchart illustrating a reverse attack of a black box model according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a reverse attack based on a mapping network disclosed in an embodiment of the present application;
FIG. 3 is a flowchart illustrating a reverse attack of another black box model disclosed in an embodiment of the present application;
FIG. 4 is a diagram of an apparatus for reverse attack of a black box model according to an embodiment of the present disclosure;
fig. 5 is a diagram of a reverse attack apparatus of another black box model disclosed in an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, references to "one embodiment" or "an embodiment" describe subsets of all possible embodiments; it is to be understood that "one embodiment" or "an embodiment" may refer to the same subset or to different subsets of all possible embodiments, and these may be combined with each other where they do not conflict. In the following description, the term "plurality" means at least two.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Existing black box model reverse attacks are based on a reverse generation network: the target model (i.e., the black box model) is queried on auxiliary data, and its prediction outputs are used to train a reverse generation network that maps directly from the output space to the input space. However, this method has the following problems: the reverse generation network is difficult to train and usually memorizes only point-to-point mappings by fitting; it lacks generalization ability, and it has difficulty generating a correct image for a given class label. This is because images are high-dimensional, so high-dimensional image regression is difficult to achieve when the black box model is reversely attacked using prediction outputs alone. Therefore, the embodiments of the present application provide a reverse attack method for a black box model that can achieve high-dimensional data regression when the black box model is reversely attacked. As shown in fig. 1, the specific steps are as follows:
101. Acquire a feature extraction model trained on the preset training data.
In this embodiment of the application, the reverse attack apparatus of the black box model may obtain a feature extraction model trained on the preset training data. The preset training data may be image data or audio data, which is not specifically limited here. The feature extraction model may be a neural network model, such as a ResNet model, or a publicly available pre-trained network model may be used directly, which is also not specifically limited here. The feature extraction model (feature extraction network) may be trained with the preset training data.
102. Input the preset training data into the black box model and the feature extraction model, respectively, to obtain the prediction probability and the data features of the preset training data.
In the embodiment of the application, after the preset training data is obtained, it can be input into the black box model and the feature extraction model, respectively, to obtain the prediction probability and the data features of the preset training data. The black box model is the target model of the reverse attack, and its structural parameters are generally unknown. Specifically, the preset training data can be input into the black box model, which outputs the prediction probability on the target label corresponding to the training data; that is, the prediction probability is the probability the black box model assigns to each class when a piece of preset training data is input. The preset training data is also input into the feature extraction model, which extracts the data features of the preset training data.
When the preset training data is image training data, the black box model may be a corresponding image recognition model. Specifically, the image training data may be input into the image recognition model, which outputs the prediction probability of the image training data; the image training data is also input into the feature extraction model, which extracts and outputs the image features of the image training data.
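As an illustration of step 102, the sketch below builds the (prediction probability, feature) pairs that are later used to train the mapping network. It is a minimal sketch under stated assumptions: the black box is modeled as any callable that returns a softmax probability vector (only its output is observed), the feature extractor as any callable that returns a feature vector, and `build_mapping_dataset`, `toy_black_box` and `toy_feature_extractor` are hypothetical names, not real image models.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax: turns raw class scores into probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_mapping_dataset(
    aux_data: Sequence[Vector],
    black_box: Callable[[Vector], Vector],
    feature_extractor: Callable[[Vector], Vector],
) -> List[Tuple[Vector, Vector]]:
    """Pair the black box's prediction probabilities with the white-box
    feature extractor's features for every preset (auxiliary) training sample."""
    pairs = []
    for x in aux_data:
        probs = black_box(x)          # query-only access: the output is all we see
        feats = feature_extractor(x)  # E is white-box, so its features are available
        pairs.append((probs, feats))
    return pairs


# Hypothetical stand-ins: a 3-class "black box" and a 2-D feature extractor.
def toy_black_box(x: Vector) -> Vector:
    return softmax([x[0], x[1], -x[0] - x[1]])


def toy_feature_extractor(x: Vector) -> Vector:
    return [x[0] + x[1], x[0] - x[1]]


dataset = build_mapping_dataset([[1.0, 0.0], [0.0, 1.0]], toy_black_box, toy_feature_extractor)
```

The only requirement placed on the black box here is that it can be called; no gradients or parameters are accessed, which mirrors the black-box setting described above.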
103. Train the mapping network based on the prediction probability and the data features of the preset training data.
After the prediction probability and the data features of the preset training data are obtained, the mapping network can be trained on them to obtain a trained mapping network. As shown in fig. 2, the auxiliary data in the figure is the preset training data, the target model is the black box model, M is the mapping network, and E is the feature extraction model. The mapping network can be understood as a reverse mapping network.
Specifically, when the preset training data is image training data, the prediction probability of the image training data may be used as the input of the mapping network and the image features of the image training data as its output, so that the mapping network learns the mapping relationship between the prediction probabilities and the image features of the image training data for subsequent use. Although the feature extraction model and the black box model may have different structures, images have similar compositions in feature space. It is therefore sufficient to map the prediction output into the feature space of the feature extraction model and then perform a white-box reverse attack on the feature extraction model, whose structure and parameters are known, to obtain the image corresponding to the reverse attack.
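A minimal sketch of training the mapping network M in step 103, assuming (for illustration only) that M is a single linear layer fitted by stochastic gradient descent on a squared-error loss; the embodiment does not fix M's architecture or loss, and `LinearMapping` and `train_mapping` are hypothetical names. Prediction probabilities go in and features come out, matching the input/output roles described above.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


class LinearMapping:
    """Hypothetical minimal mapping network M: features ~ W @ probs + b."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, p: Sequence[float]) -> Vector:
        return [sum(w_ij * p_j for w_ij, p_j in zip(row, p)) + b_i
                for row, b_i in zip(self.W, self.b)]

    def sgd_step(self, p: Sequence[float], target: Sequence[float], lr: float) -> float:
        """One SGD step on 0.5 * ||M(p) - target||^2; returns the loss before the step."""
        err = [y_hat - y for y_hat, y in zip(self(p), target)]
        for i, e in enumerate(err):
            for j, p_j in enumerate(p):
                self.W[i][j] -= lr * e * p_j  # d loss / d W_ij = e_i * p_j
            self.b[i] -= lr * e               # d loss / d b_i = e_i
        return 0.5 * sum(e * e for e in err)


def train_mapping(pairs: Sequence[Tuple[Vector, Vector]],
                  epochs: int = 500, lr: float = 0.1) -> LinearMapping:
    """Fit M on (prediction probability, feature) pairs: probability in, feature out."""
    model = LinearMapping(len(pairs[0][0]), len(pairs[0][1]))
    for _ in range(epochs):
        for p, f in pairs:
            model.sgd_step(p, f, lr)
    return model


# Toy pairs standing in for black-box probabilities and extractor features.
pairs = [([1.0, 0.0, 0.0], [2.0, 0.0]),
         ([0.0, 1.0, 0.0], [0.0, 2.0]),
         ([0.0, 0.0, 1.0], [-1.0, -1.0])]
M = train_mapping(pairs)
```

A deeper network trained with a deep-learning framework would keep the same interface (probabilities in, features out); the linear layer is chosen only so the gradient can be written by hand.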
104. Input the target prediction probability corresponding to the data to be tested into the trained mapping network to obtain the data features of the data to be tested.
After training of the mapping network is completed, the data to be tested is input into the black box model to obtain the target prediction probability, and the target prediction probability is input into the trained mapping network, which outputs the data features of the data to be tested according to the mapping relationship between prediction probabilities and data features that it has learned.
Specifically, the data to be tested may be image data to be tested. The image data to be tested is input into the image recognition model to obtain the corresponding target prediction probability, which is then converted into a preset feature form. The preset feature form may be a one-hot vector. One-hot encoding converts a categorical variable into a form that machine learning algorithms can easily use: the vector has exactly one non-zero (active) entry, and all other entries are 0. The preset feature form is input into the trained mapping network, which outputs the target image features of the image data to be tested based on its learned mapping relationship. That is, the one-hot vector corresponding to the target prediction probability is input into the mapping network M, and the output of M is the feature value of the data to be tested.
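The one-hot conversion described above can be sketched as follows. This is an illustrative helper, not the embodiment's code: `one_hot` and `probs_to_one_hot` are hypothetical names, and keeping the arg-max class (first index on ties) is an assumption about how a probability vector is reduced to a single active entry.

```python
from typing import List, Sequence


def one_hot(index: int, num_classes: int) -> List[float]:
    """Vector of length num_classes with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def probs_to_one_hot(probs: Sequence[float]) -> List[float]:
    """Hypothetical conversion: keep only the most probable class (ties -> first)."""
    top = max(range(len(probs)), key=lambda i: probs[i])
    return one_hot(top, len(probs))


y_c = probs_to_one_hot([0.1, 0.7, 0.2])  # one active entry, at class 1
```

When the attacker specifies the target class c directly, `one_hot(c, num_classes)` yields the same kind of vector without querying the black box first.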
105. Optimize the hidden vector in the data features of the data to be tested based on a preset white-box model reverse attack algorithm to obtain the target data of the reverse attack.
After the data features of the data to be tested are obtained, the hidden vector in these data features is optimized based on a preset white-box model reverse attack algorithm to obtain the target hidden vector corresponding to the data to be tested, and the target data of the reverse attack is obtained from the target hidden vector.
Optimizing the hidden vector in the data features of the data to be tested based on the preset white-box model reverse attack algorithm to obtain the target hidden vector corresponding to the data to be tested comprises obtaining the target hidden vector $w^*$ corresponding to the data to be tested based on the preset white-box model reverse attack algorithm:
$$w^* = \arg\min_{w} \mathcal{L}\big(E(G(w)),\, M(y_c)\big)$$
where $w$ is a hidden vector in the data features of the data to be tested, $E$ is the feature extraction model, $M$ is the mapping network, $G$ is the generator corresponding to the data to be tested, $y_c$ is the target prediction probability corresponding to the data to be tested, and $\mathcal{L}$ is a loss measuring the distance between the features $E(G(w))$ of the generated data and the mapped features $M(y_c)$.
The optimized hidden vector is then input into the generator corresponding to the data to be tested to obtain the target data of the reverse attack.
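The white-box stage can be sketched as plain gradient descent on the hidden vector. This is a toy under loud assumptions: the loss is taken to be the squared Euclidean distance $\|E(G(w)) - M(y_c)\|^2$, and G and E are small linear maps (matrices) so the gradient $2J^\top(Jw - t)$ with $J = EG$ can be written by hand; in practice G would be a pretrained generator, E a neural feature extractor, and gradients would come from autodiff. `invert_latent` and the toy matrices are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, x: Sequence[float]) -> Vector:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def invert_latent(G: Matrix, E: Matrix, target_feat: Sequence[float],
                  w0: Sequence[float], lr: float = 0.05, steps: int = 2000) -> Vector:
    """Gradient descent on L(w) = ||E(G(w)) - target_feat||^2 for linear toy G, E.
    With J = E @ G the exact gradient is 2 * J^T (J w - target_feat)."""
    J = matmul(E, G)
    Jt = transpose(J)
    w = list(w0)
    for _ in range(steps):
        residual = [f - t for f, t in zip(matvec(J, w), target_feat)]
        grad = [2.0 * g for g in matvec(Jt, residual)]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


G = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy generator: 2-D hidden vector -> 3-D "image"
E = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]    # toy feature extractor: 3-D -> 2-D feature
target = [2.0, 3.0]                       # stands in for M(y_c)
w_star = invert_latent(G, E, target, w0=[0.0, 0.0])
image = matvec(G, w_star)                 # G(w*): the reconstructed "target data"
```

Here $EG = [[1, 0], [1, 2]]$, so the exact minimizer is $w^* = (2, 0.5)$, which the loop approaches; the final line mirrors feeding the optimized hidden vector into the generator.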
It can be seen that the embodiments of the present application comprise: inputting preset training data into the black box model and the feature extraction model, respectively, to obtain the prediction probability and the data features of the preset training data; training the mapping network based on the prediction probability and the data features of the preset training data; inputting the data to be tested into the black box model to obtain a target prediction probability, and inputting it into the trained mapping network to obtain the data features of the data to be tested; and optimizing the hidden vector in the data features of the data to be tested based on a preset white-box model reverse attack algorithm to obtain a target hidden vector, from which the target data of the reverse attack are obtained. Because the target prediction probability is mapped by the trained mapping network into the feature space, high-dimensional data regression can be achieved during the reverse attack without obtaining the internal structural parameters of the black box model.
Further, the mapping network M is generally lossy, and the loss grows for higher-dimensional data. Therefore, the target data of the reverse attack obtained through the above steps alone may not be accurate enough. For this reason, the above steps are run for the data to be tested with multiple groups of random values as initial values, and multiple groups of optimized hidden vectors are obtained through updating. These groups of optimized hidden vectors are then used as initial values and further updated by a gradient-free swarm intelligence optimization algorithm. As shown in fig. 3, the specific steps are as follows:
301. Obtain a first fitness corresponding to the target hidden vector and a preset test vector corresponding to the data to be tested.
In the embodiment of the application, a first fitness corresponding to the target hidden vector and a preset test vector corresponding to the data to be tested can be obtained. When the data to be tested is image data, the target hidden vector is input into the generator to obtain a generated image, and the image is input into the image recognition model to obtain a prediction output; the prediction probability of this output on the target label is the fitness. Note that the fitness in the embodiments of the application can be understood as the prediction probability output by the black box model; this is not repeated below.
The preset test vector corresponding to the data to be tested may be obtained based on a preset mutation operator algorithm (current-to-best mutation operator):
$$u^{(i)} = w^{(i)} + \beta_1\big(w_{\mathrm{best}} - w^{(i)}\big) + \beta_2 \sum_{k=1}^{n_v}\big(w_{r_1}^{(k)} - w_{r_2}^{(k)}\big)$$
which yields the preset test vector $u^{(i)}$ corresponding to the data to be tested, where $w^{(i)}$ is the target hidden vector corresponding to the data to be tested, $\beta_1$ is the weight of the difference vector obtained from the hidden vector with the best fitness, $w_{\mathrm{best}}$ is the hidden vector with the highest fitness, $\beta_2$ is the weight of the difference vectors obtained from random hidden vectors, $n_v$ is the number of random difference vectors (the differences of the $n_v$ groups are summed), $k$ indexes the $k$-th group of random difference vectors, and $w_{r_1}^{(k)}$ and $w_{r_2}^{(k)}$ are two arbitrary hidden vectors other than the target hidden vector.
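The current-to-best mutation operator $u^{(i)} = w^{(i)} + \beta_1(w_{\mathrm{best}} - w^{(i)}) + \beta_2 \sum_k (w_{r_1}^{(k)} - w_{r_2}^{(k)})$ can be sketched as below. Assumptions, labeled here and in the code: the population is a list of hidden vectors with precomputed fitness values (higher is better), $r_1$ and $r_2$ are drawn as two distinct indices other than $i$, and `current_to_best_mutation` is a hypothetical name.

```python
import random
from typing import List, Sequence

Vector = List[float]


def current_to_best_mutation(pop: Sequence[Vector], fitness: Sequence[float], i: int,
                             beta1: float, beta2: float, nv: int,
                             rng: random.Random) -> Vector:
    """u_i = w_i + beta1 * (w_best - w_i) + beta2 * sum_k (w_r1^k - w_r2^k).
    Assumes higher fitness is better and the population has at least 3 vectors."""
    w_i = pop[i]
    best = pop[max(range(len(pop)), key=lambda j: fitness[j])]
    others = [j for j in range(len(pop)) if j != i]  # exclude the target hidden vector
    u = [wi + beta1 * (b - wi) for wi, b in zip(w_i, best)]
    for _ in range(nv):
        r1, r2 = rng.sample(others, 2)  # two distinct hidden vectors other than w_i
        u = [uk + beta2 * (a - c) for uk, a, c in zip(u, pop[r1], pop[r2])]
    return u


pop = [[0.0, 0.0], [2.0, 2.0], [1.0, 5.0]]
fit = [0.1, 0.9, 0.3]
u0 = current_to_best_mutation(pop, fit, 0, beta1=0.5, beta2=0.0, nv=1, rng=random.Random(0))
```

With `beta2=0.0` the random term vanishes, so `u0` moves the first vector halfway toward the best one, `[2.0, 2.0]`.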
302. Perform crossover between the dimensions of the target hidden vector and the preset test vector to obtain an offspring hidden vector corresponding to the target hidden vector.
Next, crossover is performed between the dimensions of the target hidden vector and the preset test vector to obtain the offspring hidden vector corresponding to the target hidden vector. That is, for each pair $w^{(i)}$ and $u^{(i)}$, every dimension is crossed over according to a probability to obtain the offspring hidden vector corresponding to the target hidden vector.
Specifically, a threshold $p$ may be set for each dimension of the hidden vector, usually 0.5; the values of $w^{(i)}$ and $u^{(i)}$ over all dimensions are summed and averaged, and the average is used to generate a random number $r$ uniformly distributed between 0 and 1. In each dimension, if $r$ is less than $p$, the value of $w^{(i)}$ in that dimension is selected; if $r$ is greater than $p$, the value of $u^{(i)}$ in that dimension is selected.
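A minimal sketch of the crossover rule above. It simplifies one point on purpose: the text derives the random number from an average of the two vectors, which is ambiguous, so this sketch instead draws $r$ independently from U[0, 1) for each dimension (standard uniform crossover). That substitution, and the name `crossover`, are assumptions.

```python
import random
from typing import List, Optional, Sequence


def crossover(w: Sequence[float], u: Sequence[float], p: float = 0.5,
              rng: Optional[random.Random] = None) -> List[float]:
    """Per-dimension choice between parent w and test vector u.
    ASSUMPTION: r is drawn independently per dimension; r < p keeps w's value,
    otherwise u's value is taken."""
    if rng is None:
        rng = random.Random()
    return [w_d if rng.random() < p else u_d for w_d, u_d in zip(w, u)]


child = crossover([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], p=0.5, rng=random.Random(7))
```

Because `random.random()` returns values in [0, 1), `p=1.0` always keeps the parent and `p=0.0` always takes the test vector, which makes the two extremes easy to check.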
303. Compare the first fitness of the target hidden vector with the second fitness of the offspring hidden vector to determine the hidden vector with higher fitness.
The first fitness corresponding to the target hidden vector is compared with the second fitness corresponding to the offspring hidden vector to determine the hidden vector with higher fitness. When the data to be tested is image data, the second fitness is obtained by passing the offspring hidden vector through the generator to produce image data, inputting the image data into the image recognition model (the black box model), and taking the prediction probability output by the image recognition model as the second fitness.
304. Take the hidden vector with higher fitness as the target hidden vector, and return to step 301.
The hidden vector with higher fitness is taken as the target hidden vector. The process then returns to the step of obtaining the first fitness corresponding to the target hidden vector and the preset test vector corresponding to the data to be tested, and repeats until a hidden vector meeting the preset fitness is obtained. The individual with the best fitness is then taken as the final hidden vector.
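Putting steps 301 to 304 together, a gradient-free refinement loop over a population of hidden vectors might look like the sketch below. It makes four assumptions:

- the mutation takes the current-to-best form with a single random difference vector ($n_v = 1$);
- crossover follows the r < p rule;
- selection is greedy;
- the loop stops either at a fixed generation budget or on reaching a preset fitness.

`evolve` and all its parameters are hypothetical. In practice `fitness` would wrap the generator plus a black box model query.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def evolve(
    population: List[Vector],
    fitness: Callable[[Sequence[float]], float],
    generations: int = 50,
    beta1: float = 0.5,
    beta2: float = 0.5,
    p: float = 0.5,
    target_fitness: float = 1.0,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Gradient-free refinement of hidden vectors (steps 301-304, sketched).

    The population is updated in place; it needs at least three vectors.
    """
    rng = random.Random(seed)
    scores = [fitness(w) for w in population]
    for _ in range(generations):
        best = population[max(range(len(population)), key=scores.__getitem__)]
        for i, w in enumerate(population):
            # 301: mutation -> preset test vector u (current-to-best, n_v = 1)
            a, b = rng.sample([j for j in range(len(population)) if j != i], 2)
            u = [wd + beta1 * (bd - wd) + beta2 * (ad - cd)
                 for wd, bd, ad, cd in zip(w, best, population[a], population[b])]
            # 302: crossover -> offspring v (r < p keeps the value from w)
            v = [wd if rng.random() < p else ud for wd, ud in zip(w, u)]
            # 303: compare fitness; 304: the fitter vector becomes the target
            f_v = fitness(v)
            if f_v > scores[i]:
                population[i], scores[i] = v, f_v
        # Stop early once a hidden vector meets the preset fitness.
        if max(scores) >= target_fitness:
            break
    i_best = max(range(len(population)), key=scores.__getitem__)
    return population[i_best], scores[i_best]
```

Because selection is greedy, the best fitness in the population never decreases from one generation to the next. This is what lets the second stage only improve on the image obtained in the first stage.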
The embodiments of the present application provide a two-stage model reverse (inversion) attack method that achieves high-performance model reverse attacks in a black-box scenario. In the first stage, a simple inverse mapping network M maps the prediction probability output by the target model to the feature space of an auxiliary feature extractor (the feature extraction model). The parameters of the auxiliary feature extractor are available, so once the mapped features are obtained, the task can be treated as a white-box problem. The input space of a generative adversarial network is searched to synthesize an image whose features are similar to the mapped features. However, the trained mapping network M is usually lossy. In the second stage, a gradient-free optimization algorithm therefore refines the resulting image to further improve attack performance. The method requires no information about the internal structure or parameters of the target model, so the reverse attack on the black box model is realized effectively. In summary, the method has two components:

- an easy-to-train inverse network maps the model's prediction output back to a feature space, converting the problem into white-box form;
- a gradient-free optimization strategy further improves reverse attack performance against the black box model.

Together, the two stages effectively realize model reverse attacks in the black-box scenario.
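The first stage can be illustrated with a deliberately small sketch built on linear stand-ins:

- the mapping network M is a single linear layer fitted by gradient descent on (prediction probability, feature) pairs;
- the differentiable composite $E(G(\cdot))$ is collapsed into a matrix `C`, so the gradient of the feature-matching loss can be written by hand.

The source does not specify the architecture of M. The real E and G are deep networks that would be optimized with automatic differentiation. `matvec`, `train_linear_map` and `invert_latent` are hypothetical names.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, x: Sequence[float]) -> Vector:
    """Matrix-vector product A @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def train_linear_map(probs: List[Vector], feats: List[Vector],
                     lr: float = 0.1, epochs: int = 500) -> Matrix:
    """Stage 1a (sketch): fit a linear mapping network M so that
    M(prediction probability) approximates the extractor's features,
    using full-batch gradient descent on the mean squared error."""
    n_out, n_in, n = len(feats[0]), len(probs[0]), len(probs)
    W = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        grad = [[0.0] * n_in for _ in range(n_out)]
        for y, f in zip(probs, feats):
            err = [m - t for m, t in zip(matvec(W, y), f)]
            for r in range(n_out):
                for c in range(n_in):
                    grad[r][c] += 2.0 * err[r] * y[c] / n
        W = [[W[r][c] - lr * grad[r][c] for c in range(n_in)] for r in range(n_out)]
    return W


def invert_latent(C: Matrix, target: Vector, lr: float = 0.05,
                  steps: int = 2000) -> Vector:
    """Stage 1b (sketch): white-box search for the hidden vector w that
    minimises ||C w - target||^2, where C stands in for E(G(.)) and
    target = M(y_c) is the mapped feature of the attacked prediction."""
    w = [0.0] * len(C[0])
    for _ in range(steps):
        err = [a - b for a, b in zip(matvec(C, w), target)]
        # Gradient of the squared error: 2 * C^T (C w - target).
        grad = [2.0 * sum(C[r][c] * err[r] for r in range(len(C)))
                for c in range(len(w))]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w
```

Usage: fit `W = train_linear_map(probs, feats)` on auxiliary data, compute `target = matvec(W, y_c)` for the attacked prediction, then call `invert_latent(C, target)`. The resulting hidden vector would be the starting point for the gradient-free second stage.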
The embodiment of the present application further provides a reverse attack apparatus for a black box model, as shown in fig. 4, including:
an obtaining unit 401, configured to obtain a feature extraction model trained based on preset training data;
an input unit 402, configured to input the preset training data into the black box model and the feature extraction model respectively, so as to obtain first training data and a label of the first training data;
a training unit 403, configured to train a mapping network based on the first training data and the label of the first training data;
an execution unit 404, configured to input a label of data to be detected into a trained mapping network, so as to obtain a feature value of the data to be detected;
and an optimizing unit 405, configured to optimize the hidden vector in the feature value based on a preset white-box model reverse attack algorithm, so as to obtain a target hidden vector corresponding to the to-be-detected data.
An embodiment of the present application provides a reverse attack apparatus 500 of a black box model, as shown in fig. 5, including:
a central processing unit 501, a memory 502 and an input/output interface 503;
the memory 502 is a transient storage memory or a persistent storage memory;
the central processor 501 is configured to communicate with the memory 502 and execute the instruction operations in the memory 502 to perform the above-mentioned reverse attack method.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application may be embodied in the form of a software product. This applies to the solution in essence, to the part contributing to the prior art, or to all or part of the technical solution. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Claims (10)

1. A reverse attack method of a black box model is characterized by comprising the following steps:
acquiring a feature extraction model trained based on preset training data;
inputting the preset training data into the black box model and the feature extraction model respectively to obtain the prediction probability of the preset training data and the data features of the preset training data;
training a mapping network based on the prediction probability of the preset training data and the data characteristics of the preset training data;
inputting the data to be tested into the black box model to obtain a target prediction probability, and inputting the target prediction probability into the trained mapping network to obtain the data features of the data to be tested;
and optimizing the hidden vector in the data characteristics of the data to be detected based on a preset white-box model reverse attack algorithm to obtain a target hidden vector corresponding to the data to be detected, and obtaining target data of reverse attack according to the target hidden vector.
2. A reverse attack method according to claim 1, wherein the preset training data comprises: image training data, the black box model comprising: an image recognition model;
the step of respectively inputting the preset training data into the black box model and the feature extraction model to obtain the prediction probability of the preset training data and the data features of the preset training data comprises:
inputting the image training data into the image recognition model, wherein the image recognition model outputs the prediction probability of the image training data;
and inputting the image training data into the feature extraction model, and outputting the image features of the image training data by the feature extraction model.
3. The reverse attack method according to claim 2, wherein the training the mapping network based on the predicted probability of the preset training data and the data feature of the preset training data comprises:
taking the prediction probability of the image training data as the input of the mapping network and the image features of the image training data as the output of the mapping network, and training the mapping network based on the mapping relation between the prediction probability and the image features of the image training data.
4. A reverse attack method according to claim 3, wherein the data to be tested comprises image data to be detected, and the step of inputting the data to be tested into the black box model to obtain the target prediction probability and inputting the target prediction probability into the trained mapping network to obtain the data features of the data to be tested comprises:
inputting the image data to be detected into the image recognition model to obtain a target prediction probability corresponding to the image data to be detected;
converting the target prediction probability corresponding to the image data to be detected into a preset characteristic form;
and inputting the preset characteristic form into the trained mapping network, and outputting the target image characteristic of the image data to be detected based on the mapping relation of the trained mapping network.
5. The reverse attack method according to claim 1, wherein the optimizing the hidden vector in the data feature of the data to be detected based on a preset white-box model reverse attack algorithm to obtain the target hidden vector corresponding to the data to be detected comprises:
based on a preset white-box model reverse attack algorithm:
[Formula: shown as an image in the original publication; not reproduced]
obtaining a target hidden vector $w^{*}$ corresponding to the data to be detected;
where $w$ is a hidden vector in the data features of the data to be detected, $E$ is the feature extraction model, $M$ is the mapping network, $G$ is a generator corresponding to the data to be detected, and $y_c$ is the target prediction probability corresponding to the data to be detected.
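The claim 5 formula is only available as an image. One reading is consistent with the symbols defined in the claim and with the first-stage description, which searches the generator's input space for an image whose extracted features match the mapped features. That reading is the feature-matching objective below. The squared Euclidean norm is an assumption, and the original may use a different distance or additional terms.

```latex
w^{*} = \arg\min_{w} \left\lVert E\big(G(w)\big) - M(y_c) \right\rVert_2^{2}
```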
6. The reverse attack method according to claim 1, further comprising, after obtaining the target feature value of the data to be tested:
acquiring a first fitness corresponding to the target hidden vector and a preset test vector corresponding to the data to be tested;
performing crossover on the dimensions of the target hidden vector and the preset test vector to obtain a child hidden vector corresponding to the target hidden vector;
comparing the first fitness corresponding to the target hidden vector with the second fitness corresponding to the child hidden vector to determine a hidden vector with higher fitness;
and taking the hidden vector with higher fitness as the target hidden vector, and returning to the step of obtaining the first fitness corresponding to the target hidden vector and the preset test vector corresponding to the data to be tested, until a hidden vector meeting the preset fitness is obtained.
7. The reverse attack method according to claim 6, wherein the obtaining of the preset test vector corresponding to the data to be tested comprises:
based on a preset mutation operator algorithm:
[Formula: shown as an image in the original publication; not reproduced]
obtaining a preset test vector $u^{(i)}$ corresponding to the data to be tested;
where $w^{(i)}$ is the target hidden vector corresponding to the data to be tested, $\beta_1$ is the weight of the difference vector obtained from the hidden vector with the best fitness, $\beta_2$ is the weight of the difference vectors obtained from random hidden vectors, $n_v$ is the number of random difference vectors, $k$ denotes the $k$-th group of random difference vectors, and the two hidden vectors forming each random difference vector are arbitrary hidden vectors other than the target hidden vector.
8. A reverse attack apparatus of a black box model, comprising:
the acquisition unit is used for acquiring a feature extraction model trained based on preset training data;
the input unit is used for respectively inputting the preset training data into the black box model and the feature extraction model to obtain first training data and a label of the first training data;
a training unit configured to train a mapping network based on the first training data and a label of the first training data;
the execution unit is used for inputting the label of the data to be detected into the trained mapping network to obtain the characteristic value of the data to be detected;
and the optimization unit is used for optimizing the hidden vector in the characteristic value based on a preset white box model reverse attack algorithm to obtain a target hidden vector corresponding to the data to be detected.
9. A reverse attack apparatus of a black box model, comprising:
a central processing unit, a memory, and an input/output interface;
the memory is a transient memory or a persistent memory;
the central processor is configured to communicate with the memory and execute the instructions in the memory to perform the method of any of claims 1-7.
10. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211689305.9A CN115952493A (en) 2022-12-27 2022-12-27 Reverse attack method and attack device for black box model and storage medium

Publications (1)

Publication Number Publication Date
CN115952493A true CN115952493A (en) 2023-04-11

Family

ID=87285748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211689305.9A Pending CN115952493A (en) 2022-12-27 2022-12-27 Reverse attack method and attack device for black box model and storage medium

Country Status (1)

Country Link
CN (1) CN115952493A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117176416A (en) * 2023-09-01 2023-12-05 中国信息通信研究院 Attack partner discovery method and system based on graph model
CN117176416B (en) * 2023-09-01 2024-05-24 中国信息通信研究院 Attack partner discovery method and system based on graph model

Similar Documents

Publication Publication Date Title
CN107358293B (en) Neural network training method and device
CN109961442B (en) Training method and device of neural network model and electronic equipment
US20220261659A1 (en) Method and Apparatus for Determining Neural Network
CN112560967B (en) Multi-source remote sensing image classification method, storage medium and computing device
CN113570029A (en) Method for obtaining neural network model, image processing method and device
CN110598603A (en) Face recognition model acquisition method, device, equipment and medium
CN113254716B (en) Video clip retrieval method and device, electronic equipment and readable storage medium
CN113298152B (en) Model training method, device, terminal equipment and computer readable storage medium
Bharti et al. EMOCGAN: a novel evolutionary multiobjective cyclic generative adversarial network and its application to unpaired image translation
EP4318322A1 (en) Data processing method and related device
CN115952493A (en) Reverse attack method and attack device for black box model and storage medium
CN113935489A (en) Variational quantum model TFQ-VQA based on quantum neural network and two-stage optimization method thereof
WO2020154373A1 (en) Neural network training using the soft nearest neighbor loss
CN113591892A (en) Training data processing method and device
CN116797850A (en) Class increment image classification method based on knowledge distillation and consistency regularization
CN116310642A (en) Variable dynamic discriminator differential privacy data generator based on PATE framework
CN112529772B (en) Unsupervised image conversion method under zero sample setting
CN115936926A (en) SMOTE-GBDT-based unbalanced electricity stealing data classification method and device, computer equipment and storage medium
KR20240034804A (en) Evaluating output sequences using an autoregressive language model neural network
CN113569960A (en) Small sample image classification method and system based on domain adaptation
KR20220101868A (en) Method and system for training dynamic deep neural network
JP2020030702A (en) Learning device, learning method, and learning program
CN113011446A (en) Intelligent target identification method based on multi-source heterogeneous data learning
CN113449817B (en) Image classification implicit model acceleration training method based on phantom gradient
Zheng et al. Networked synthetic dynamic PMU data generation: A generative adversarial network approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination