CN115952493A

CN115952493A - Reverse attack method and attack device for black box model and storage medium

Info

Publication number: CN115952493A
Application number: CN202211689305.9A
Authority: CN
Inventors: 罗文坚; 贾焰; 叶子鹏; 杨向凯
Original assignee: Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2022-12-27
Filing date: 2022-12-27
Publication date: 2023-04-11

Abstract

The embodiments of the application disclose a reverse attack method, an attack device and a storage medium for a black box model, used in the technical field of black box models. The method comprises the following steps: inputting preset training data into the black box model and a feature extraction model respectively to obtain the prediction probability and the data features of the preset training data; training a mapping network based on the prediction probability and the data features of the preset training data; inputting the target prediction probability, obtained by inputting the data to be tested into the black box model, into the trained mapping network to obtain the data features of the data to be tested; and optimizing the hidden vector in the data features of the data to be tested based on a preset white box model reverse attack algorithm to obtain a target hidden vector corresponding to the data to be tested, and obtaining the target data of the reverse attack from the target hidden vector. High-dimensional data regression can thereby be achieved when the black box model is reversely attacked.

Description

Reverse attack method and attack device for black box model and storage medium

Technical Field

The embodiment of the application relates to the technical field of black box models, in particular to a reverse attack method and device of a black box model and a storage medium.

Background

Existing network models typically include black-box models and white-box models. A black-box model refers to a model such as a neural network, a gradient boosting model, or a complex ensemble model. Black-box models are usually highly accurate. However, their internal working mechanism is difficult to understand, the importance of each feature to the prediction result cannot be estimated, and the interactions between different features are hard to interpret. White-box models, such as linear regression and decision trees, are simple models that are often limited in predictive power and struggle to capture the complexity inherent in a data set (e.g., feature interactions). However, such simple models are usually more interpretable, and their internal working principle is easier to explain.

Existing reverse attack methods for network models generally adopt gradient-based optimization, which can be carried out only when the network structure and parameters of the target model are completely known during gradient calculation. However, most network models deployed in practice are black box models from the attacker's point of view. A black box model reverse attack aims to reconstruct features of the model's training data using only the prediction output of the target model. When the data are images, the principle of the model reverse attack is as follows: given the label class to be attacked, an attack image is reconstructed such that the target model assigns it an extremely high prediction confidence for the given label class. The attacker can access the target model only through its inputs and outputs, and information about the model structure and parameters is difficult to obtain. Research on reverse attacks against black box models is still scarce, and the few existing studies achieve poor attack performance. To develop safe and reliable industrial intelligent systems, research on reverse attacks against black box models is needed.

The existing black box model reverse attack is a method based on a reverse generation network, which directly trains the reverse generation network from an output space to an input space through the prediction output of auxiliary data in a target model (namely, a black box model). However, this method has the following problems: the target model is difficult to train, and usually only the mapping relation from points to points can be memorized by means of fitting; the target model lacks generalization capability and it is difficult to generate the correct image for a given class label. This is because the dimensionality of an image is large, and it is difficult to realize high-dimensionality image regression when a black box model is attacked in the reverse direction only by prediction output.

Disclosure of Invention

The embodiment of the application provides a reverse attack method, an attack device and a storage medium for a black box model, which can realize high-dimensionality data regression when the black box model is reversely attacked.

The embodiment of the application provides a reverse attack method of a black box model, which comprises the following steps:

acquiring a feature extraction model trained based on preset training data;

inputting the preset training data into the black box model and the feature extraction model respectively to obtain the prediction probability of the preset training data and the data features of the preset training data;

training a mapping network based on the prediction probability of the preset training data and the data characteristics of the preset training data;

inputting the target prediction probability obtained by inputting the data to be tested into the black box model into the trained mapping network to obtain the data characteristics of the data to be tested;

and optimizing the hidden vector in the data characteristics of the data to be detected based on a preset white-box model reverse attack algorithm to obtain a target hidden vector corresponding to the data to be detected, and obtaining target data of reverse attack according to the target hidden vector.

Further, the preset training data includes: image training data, the black box model comprising: an image recognition model;

the step of respectively inputting the preset training data into the black box model and the feature extraction model to obtain the prediction probability of the preset training data and the data features of the preset training data comprises:

inputting the image training data into the image recognition model, wherein the image recognition model outputs the prediction probability of the image training data;

and inputting the image training data into the feature extraction model, and outputting the image features of the image training data by the feature extraction model.

Further, the training the mapping network based on the prediction probability of the preset training data and the data feature of the preset training data includes:

and taking the prediction probability of the image training data as the input of the mapping network, taking the image features of the image training data as the output of the mapping network, and training the mapping network based on the mapping relation between the prediction probability and the image features in the image training data.

Further, the data to be tested includes: image data to be detected; inputting the target prediction probability obtained by inputting the data to be tested into the black box model into the trained mapping network, wherein the data characteristics of the data to be tested comprise:

inputting the image data to be detected into the image recognition model to obtain a target prediction probability corresponding to the image data to be detected;

converting the target prediction probability corresponding to the image data to be detected into a preset characteristic form;

and inputting the preset characteristic form into the trained mapping network, and outputting the target image characteristic of the image data to be detected based on the mapping relation of the trained mapping network.

Further, the step of optimizing the hidden vector in the data characteristics of the data to be detected based on the preset white-box model reverse attack algorithm to obtain the target hidden vector corresponding to the data to be detected includes:

based on a preset white box model reverse attack algorithm:

obtaining a target hidden vector $w^{*}$ corresponding to the data to be tested;

wherein $w$ is a hidden vector in the data features of the data to be tested, $E$ is the feature extraction model, $M$ is the mapping network, $G$ is a generator corresponding to the data to be tested, and $y_c$ is the target prediction probability corresponding to the data to be tested.

Further, after the target characteristic value of the data to be tested is obtained, the method further comprises the following steps:

acquiring a first fitness corresponding to the target implicit vector and a preset test vector corresponding to the data to be tested;

performing crossover on the dimensions of the target hidden vector and the preset test vector to obtain a descendant hidden vector corresponding to the target hidden vector;

comparing the first fitness corresponding to the target hidden vector with the second fitness corresponding to the descendant hidden vector, and determining a hidden vector with higher fitness;

and taking the implicit vector with higher fitness as the target implicit vector, and returning to execute the step of obtaining the first fitness corresponding to the target implicit vector and the preset test vector corresponding to the data to be tested to obtain the implicit vector meeting the preset fitness.

Further, the obtaining of the preset test vector corresponding to the data to be tested includes:

based on a preset mutation operator algorithm:

obtaining a preset test vector $u^{(i)}$ corresponding to the data to be tested;

wherein $w^{(i)}$ is the target hidden vector corresponding to the data to be tested, $\beta_1$ is the weight of the difference vector obtained from the hidden vector with the best fitness, $\beta_2$ is the weight of the difference vectors obtained from random hidden vectors, $nv$ is the number of random difference vectors, $k$ denotes the k-th group of random difference vectors, and each group of random difference vectors is formed from two arbitrary hidden vectors other than the target hidden vector.

The embodiment of the present application further includes a reverse attack apparatus for a black box model, including:

the acquisition unit is used for acquiring a feature extraction model trained based on preset training data;

the input unit is used for respectively inputting the preset training data into the black box model and the feature extraction model to obtain first training data and a label of the first training data;

a training unit, configured to train a mapping network based on the first training data and a label of the first training data;

the execution unit is used for inputting the label of the data to be tested into the trained mapping network to obtain the characteristic value of the data to be tested;

and the optimization unit is used for optimizing the hidden vector in the characteristic value based on a preset white box model reverse attack algorithm to obtain a target hidden vector corresponding to the data to be detected.

The embodiment of the present application further includes a reverse attack apparatus for a black box model, including:

a central processing unit, a memory and an input/output interface;

the memory is a transient storage memory or a persistent storage memory;

the central processor is configured to communicate with the memory and execute the instruction operations in the memory to perform the above-described method.

Embodiments of the present application also include a computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the above-described method.

According to the technical scheme, the embodiment of the application has the following advantages:

the method comprises the following steps: inputting preset training data into the black box model and the feature extraction model respectively to obtain the prediction probability and the data features of the preset training data; training a mapping network based on the prediction probability and the data features of the preset training data; inputting the target prediction probability, obtained by inputting the data to be tested into the black box model, into the trained mapping network to obtain the data features of the data to be tested; and optimizing the hidden vector in the data features of the data to be tested based on a preset white box model reverse attack algorithm to obtain a target hidden vector corresponding to the data to be tested, and obtaining the target data of the reverse attack from the target hidden vector. Because the data features of the data to be tested are obtained from the target prediction probability through the trained mapping network, high-dimensional data regression can be achieved when the black box model is reversely attacked, without obtaining the internal structure or parameters of the black box model.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art according to these drawings.

FIG. 1 is a flowchart illustrating a reverse attack of a black box model according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a reverse attack based on a mapping network disclosed in an embodiment of the present application;

FIG. 3 is a flowchart illustrating a reverse attack of another black box model disclosed in an embodiment of the present application;

FIG. 4 is a diagram of an apparatus for reverse attack of a black box model according to an embodiment of the present disclosure;

FIG. 5 is a diagram of a reverse attack apparatus of another black box model disclosed in an embodiment of the present application.

Detailed Description

In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.

In the following description, references to "one embodiment" or "an embodiment" and the like describe a subset of all possible embodiments, but it is to be understood that "one embodiment" or "an embodiment" may be the same subset or a different subset of all possible embodiments, and may be combined with each other without conflict. In the following description, the term plurality refers to at least two.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.

The existing black box model reverse attack is a method based on a reverse generation network, which trains the reverse generation network from an output space to an input space directly through the prediction output of auxiliary data in a target model (namely, a black box model). However, this method has the following problems: the target model is difficult to train, and usually only the mapping relation from points to points can be memorized by means of fitting; the target model lacks generalization capability and it is difficult to generate the correct image for a given class label. This is because the dimensionality of an image is large, and it is difficult to realize high-dimensionality image regression when a black box model is attacked in the reverse direction only by prediction output. Therefore, the embodiment of the present application provides a reverse attack method for a black box model, which can implement high-dimensional data regression when the black box model is reversely attacked, and as shown in fig. 1, the specific steps are as follows:

101. Acquire a feature extraction model trained based on preset training data.

In this embodiment of the application, the reverse attack apparatus of the black box model may obtain a feature extraction model trained based on preset training data. The preset training data may be image data or audio data, which is not specifically limited here. The feature extraction model (feature extraction network) may be a neural network model, such as a ResNet network model, trained on the preset training data, or a public pre-trained network model may be used directly; this is likewise not specifically limited here.

102. Input the preset training data into the black box model and the feature extraction model respectively to obtain the prediction probability and the data features of the preset training data.

In the embodiment of the application, after the preset training data is obtained, it can be input into the black box model and the feature extraction model respectively, so as to obtain the prediction probability and the data features of the preset training data. The black box model is the target model of the reverse attack, and its structure and parameters are generally unknown. Specifically, the preset training data can be input into the black box model, which outputs the prediction probability on the target label corresponding to the training data; the prediction probability is the probability that the black box model outputs for each class after the preset training data is input. The preset training data is also input into the feature extraction model, which extracts the data features of the preset training data.

When the preset training data is image training data, the black box model may be a corresponding image recognition model. Specifically, the image training data may be input into the image recognition model, which outputs the prediction probability of the image training data; the image training data is also input into the feature extraction model, which extracts and outputs the image features of the image training data.
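Step 102 can be illustrated with a minimal sketch. The functions `black_box_predict` and `extract_features` are hypothetical stand-ins (random linear maps), not components named in the application: in practice the first would be a query to the deployed image recognition model and the second a trained network such as ResNet. All dimensions are illustrative assumptions.

```python
from typing import Tuple

import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES, FEATURE_DIM, INPUT_DIM = 5, 8, 16

# Hypothetical stand-ins: the real black box is reachable only through queries,
# and the real feature extraction model E would be a trained network.
_W_BLACK_BOX = rng.normal(size=(INPUT_DIM, NUM_CLASSES))
_W_EXTRACTOR = rng.normal(size=(INPUT_DIM, FEATURE_DIM))


def black_box_predict(x: np.ndarray) -> np.ndarray:
    """Return class probabilities (softmax) for a batch of flattened inputs."""
    logits = x @ _W_BLACK_BOX
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def extract_features(x: np.ndarray) -> np.ndarray:
    """Return data features from the (white-box) feature extraction model E."""
    return np.tanh(x @ _W_EXTRACTOR)


def build_training_pairs(aux_data: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Query both models on the preset training data (step 102)."""
    return black_box_predict(aux_data), extract_features(aux_data)


aux_data = rng.normal(size=(32, INPUT_DIM))
probs, feats = build_training_pairs(aux_data)
```

The pair `(probs, feats)` is what the mapping network is later trained on: probabilities as inputs, features as regression targets.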

103. Train the mapping network based on the prediction probability and the data features of the preset training data.

After the prediction probability and the data features of the preset training data are obtained, the mapping network can be trained on them to obtain the trained mapping network. As shown in FIG. 2, the auxiliary data in the drawing is the preset training data, the target model is the black box model, M is the mapping network, and E is the feature extraction model. The mapping network may be understood as a reverse mapping network.

Specifically, when the preset training data is image training data, the prediction probability of the image training data may be used as the input of the mapping network and the image features of the image training data as its output, and the mapping network may be trained on the mapping relationship between the prediction probability and the image features in the image training data. That is, the mapping network learns the mapping from prediction probability to image features so that this mapping can be used later. It can be understood that even though the model structures of the feature extraction model and the black box model differ, the feature composition of an image in their feature spaces is similar. It is therefore sufficient to map the image into the feature space of the feature extraction model and then launch a white box model reverse attack on the feature extraction model, whose structure and parameters are known, to obtain the image corresponding to the reverse attack.
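The training of the mapping network M can be sketched as a small regression problem from prediction probabilities to features. The architecture (one hidden ReLU layer), the mean squared error loss, plain gradient descent, and the synthetic data below are all assumptions made for illustration; the application does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the outputs of step 102: prediction probabilities (inputs
# of M) and data features (regression targets of M). Real values would come
# from the black box model and the feature extraction model E.
num_samples, num_classes, feature_dim, hidden_dim = 256, 5, 8, 32
probs = rng.dirichlet(np.ones(num_classes), size=num_samples)
true_map = rng.normal(size=(num_classes, feature_dim))
feats = np.tanh(probs @ true_map)

# Mapping network M: one hidden ReLU layer (an assumed architecture).
W1 = rng.normal(scale=0.5, size=(num_classes, hidden_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(scale=0.5, size=(hidden_dim, feature_dim))
b2 = np.zeros(feature_dim)


def mapping_network(p):
    """Forward pass of M; returns predicted features and hidden activations."""
    h = np.maximum(p @ W1 + b1, 0.0)
    return h @ W2 + b2, h


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


initial_loss = mse(mapping_network(probs)[0], feats)
lr = 0.1
for _ in range(1000):
    pred, h = mapping_network(probs)
    grad_pred = 2.0 * (pred - feats) / pred.size   # d(loss)/d(pred)
    grad_W2 = h.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_h = (grad_pred @ W2.T) * (h > 0)          # back through the ReLU
    grad_W1 = probs.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

final_loss = mse(mapping_network(probs)[0], feats)
```

Because the input has only as many dimensions as there are classes while the output lives in feature space, a network of this kind is small and comparatively easy to train, which matches the motivation given for regressing to features rather than directly to images.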

104. Input the target prediction probability corresponding to the data to be tested into the trained mapping network to obtain the data features of the data to be tested.

After the training of the mapping network is completed, the data to be tested is input into the black box model to obtain the target prediction probability, and the target prediction probability is input into the trained mapping network to obtain the data features of the data to be tested. The data features of the data to be tested are obtained according to the mapping relationship between prediction probability and data features learned by the mapping network.

Specifically, the data to be tested may be image data to be tested. The image data to be tested is input into the image recognition model to obtain the corresponding target prediction probability, which is then converted into a preset feature form. The preset feature form may be a one-hot vector. One-hot encoding converts a categorical variable into a form that machine learning algorithms can readily use: exactly one element of the vector is active (non-zero), and all other elements are 0. The preset feature form is input into the trained mapping network, which outputs the target image features of the image data to be tested based on the learned mapping relationship. That is, the one-hot vector corresponding to the target prediction probability is input into the mapping network M, and the output of M is the feature value of the data to be tested.
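The conversion to one-hot form can be sketched as follows. The text does not state which class receives the non-zero entry; this sketch assumes it is the class with the highest predicted probability, and the helper name `to_one_hot` is hypothetical.

```python
import numpy as np


def to_one_hot(target_probs: np.ndarray) -> np.ndarray:
    """Convert a probability vector into a one-hot vector for its top class."""
    one_hot = np.zeros_like(target_probs, dtype=float)
    one_hot[np.argmax(target_probs)] = 1.0
    return one_hot


# Example: a four-class prediction whose most likely class is index 2.
y_c = to_one_hot(np.array([0.05, 0.10, 0.70, 0.15]))
```

The resulting vector `y_c` has the same length as the black box output, so it can be fed to the mapping network without changing its input layer.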

105. Optimize the hidden vector in the data features of the data to be tested based on a preset white-box model reverse attack algorithm to obtain the target data of the reverse attack.

After the data characteristics of the data to be detected are obtained, the hidden vector in the data characteristics of the data to be detected can be optimized based on a preset white-box model reverse attack algorithm, and target data of reverse attack are obtained. It can be understood that hidden vectors in data features of the data to be detected can be optimized based on a preset white-box model reverse attack algorithm to obtain target hidden vectors corresponding to the data to be detected, and reverse attack target data can be obtained according to the target hidden vectors.

The method for optimizing the hidden vector in the data characteristics of the data to be detected based on a preset white-box model reverse attack algorithm to obtain the target hidden vector corresponding to the data to be detected comprises the following steps: based on a preset white box model reverse attack algorithm:

obtaining a target hidden vector $w^{*}$ corresponding to the data to be tested, where $w$ is a hidden vector in the data features of the data to be tested, $E$ is the feature extraction model, $M$ is the mapping network, $G$ is a generator corresponding to the data to be tested, and $y_c$ is the target prediction probability corresponding to the data to be tested.
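The formula of the algorithm is not reproduced in this text. One plausible form, consistent with the symbols defined here and with the later description of synthesizing an image whose features resemble the mapped features, is a feature-matching objective. The squared Euclidean norm is an assumption; the original may use a different distance or additional terms.

```latex
w^{*} = \arg\min_{w} \left\lVert E\big(G(w)\big) - M(y_c) \right\rVert_2^2
```

Under this reading, $M(y_c)$ is fixed once the mapping network has been queried, and only $w$ is optimized, which is possible with gradients because $E$ and $G$ are both white-box.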

The optimized hidden vector is then input into the generator corresponding to the data to be tested to obtain the target data of the reverse attack.
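A minimal sketch of the white-box stage, assuming a feature-matching objective (the squared distance between the features of the generated data and the mapped features) minimized by gradient descent. The generator and feature extractor are hypothetical linear stand-ins so that the gradient can be written by hand; real models would be neural networks differentiated automatically, and the application does not name a specific optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, image_dim, feature_dim = 4, 12, 6

# Hypothetical linear stand-ins: G(w) = A @ w plays the generator and
# E(x) = B @ x plays the feature extraction model.
A = rng.normal(size=(image_dim, latent_dim))
B = rng.normal(size=(feature_dim, image_dim)) / np.sqrt(image_dim)


def generator(w):
    return A @ w


def feature_extractor(x):
    return B @ x


# Pretend the mapping network M returned these features for the target y_c.
w_true = rng.normal(size=latent_dim)
target_features = feature_extractor(generator(w_true))


def loss(w):
    residual = feature_extractor(generator(w)) - target_features
    return float(residual @ residual)


J = B @ A                                        # Jacobian of E(G(w)) w.r.t. w
step = 1.0 / (2.0 * np.linalg.norm(J, 2) ** 2)   # safe step for this quadratic

w = rng.normal(size=latent_dim)
initial_loss = loss(w)
for _ in range(2000):
    residual = J @ w - target_features
    w -= step * 2.0 * (J.T @ residual)           # gradient of the squared norm
final_loss = loss(w)

reconstruction = generator(w)                    # target data of the attack
```

The last line corresponds to feeding the optimized hidden vector into the generator; no query to the black box model is needed in this stage.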

It can be seen that the embodiment of the present application includes: inputting preset training data into the black box model and the feature extraction model respectively to obtain the prediction probability and the data features of the preset training data; training the mapping network based on the prediction probability and the data features of the preset training data; inputting the target prediction probability, obtained by inputting the data to be tested into the black box model, into the trained mapping network to obtain the data features of the data to be tested; and optimizing the hidden vector in the data features of the data to be tested based on a preset white box model reverse attack algorithm to obtain a target hidden vector corresponding to the data to be tested, and obtaining the target data of the reverse attack from the target hidden vector. Because the data features of the data to be tested are obtained from the target prediction probability through the trained mapping network, high-dimensional data regression can be achieved when the black box model is reversely attacked, without obtaining the internal structure or parameters of the black box model.

Further, the mapping network M is generally lossy, and the loss is larger for higher-dimensional data input. The target data obtained by the reverse attack using only the above steps may therefore not be accurate enough. For this reason, multiple groups of random values are taken as initial values in the above steps, and multiple groups of optimized hidden vectors are obtained through updating. These groups of optimized hidden vectors are then used as initial values and updated by a gradient-free swarm intelligence optimization algorithm. As shown in FIG. 3, the specific steps are as follows:

301. Acquire a first fitness corresponding to the target hidden vector and a preset test vector corresponding to the data to be tested.

In the embodiment of the application, a first fitness corresponding to the target hidden vector and a preset test vector corresponding to the data to be tested can be obtained. When the data to be tested is image data, the target hidden vector is input into the generator to obtain a corresponding generated image, and the image is input into the image recognition model to obtain a prediction output; the prediction probability of this output on the target label is the fitness. It should be noted that the fitness in the embodiments of the application can be understood as the prediction probability output by the black box model, which is not repeated below.

Obtaining the preset test vector corresponding to the data to be tested may be: based on a preset mutation operator algorithm (current-to-best mutation operator):

obtaining a preset test vector $u^{(i)}$ corresponding to the data to be tested, where $w^{(i)}$ is the target hidden vector corresponding to the data to be tested, $\beta_1$ is the weight of the difference vector obtained from the hidden vector with the best fitness, $\beta_2$ is the weight of the difference vectors obtained from random hidden vectors, $nv$ is the number of random difference vectors (the differences of the $nv$ groups are summed), $k$ denotes the k-th group of random difference vectors, and each group of random difference vectors is formed from two arbitrary hidden vectors other than the target hidden vector.
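The operator's formula is not reproduced in this text. The standard current-to-best mutation from differential evolution matches every quantity described here, so a plausible reconstruction is given below. The names $w_{\mathrm{best}}$, $w_{r_1}^{(k)}$ and $w_{r_2}^{(k)}$ are assumed for the symbols missing from the text: the hidden vector with the best fitness and the two arbitrary hidden vectors of the k-th group.

```latex
u^{(i)} = w^{(i)} + \beta_1 \left( w_{\mathrm{best}} - w^{(i)} \right)
        + \beta_2 \sum_{k=1}^{nv} \left( w_{r_1}^{(k)} - w_{r_2}^{(k)} \right)
```

The first difference pulls each candidate toward the current best hidden vector, while the summed random differences keep the population exploring.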

302. Perform crossover on the dimensions of the target hidden vector and the preset test vector to obtain a descendant hidden vector corresponding to the target hidden vector.

Crossover is then performed on the dimensions of the target hidden vector and the preset test vector to obtain the descendant hidden vector corresponding to the target hidden vector. That is, for each pair $w^{(i)}$ and $u^{(i)}$, crossover is performed on every dimension according to a probability to obtain the descendant hidden vector corresponding to the target hidden vector.

Specifically, a threshold p may be set for each dimension of the hidden vector, usually 0.5; the vector sum of $w^{(i)}$ and $u^{(i)}$ over all dimensions is taken and averaged, and a random number r uniformly distributed between 0 and 1 is generated using the average value; in each dimension, if r is less than p, the value of $w^{(i)}$ in that dimension is selected; if r is greater than p, the value of $u^{(i)}$ in that dimension is selected.
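The per-dimension selection rule can be sketched as follows. This sketch assumes an independent uniform random number r is drawn for each dimension and does not model the averaging step mentioned in the text; the function name `crossover` is hypothetical. The text leaves the case r equal to p unspecified, and here it falls to $u^{(i)}$.

```python
import numpy as np


def crossover(w, u, p=0.5, rng=None):
    """Per-dimension crossover: keep w where r < p, otherwise take u."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.uniform(0.0, 1.0, size=w.shape)  # one random number per dimension
    return np.where(r < p, w, u)


w_i = np.zeros(8)  # stands in for the target hidden vector
u_i = np.ones(8)   # stands in for the preset test vector
v_i = crossover(w_i, u_i, p=0.5, rng=np.random.default_rng(0))
```

With p = 0.5 the descendant inherits, on average, half of its dimensions from each parent; raising p keeps the descendant closer to the current target hidden vector.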

303. Compare the first fitness corresponding to the target hidden vector with the second fitness corresponding to the descendant hidden vector to determine the hidden vector with higher fitness.

The first fitness corresponding to the target hidden vector is compared with the second fitness corresponding to the descendant hidden vector to determine the hidden vector with higher fitness. When the data to be tested is image data, the second fitness is obtained by having the generator turn the descendant hidden vector into image data and inputting that image data into the image recognition model (the black box model), which outputs the second fitness (prediction probability).

304. Take the hidden vector with higher fitness as the target hidden vector, and return to step 301.

The hidden vector with higher fitness is taken as the target hidden vector, and the step of obtaining the first fitness corresponding to the target hidden vector and the preset test vector corresponding to the data to be tested is executed again, until a hidden vector meeting the preset fitness is obtained. That is, the individual with the best fitness is taken as the final hidden vector.
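Steps 301 to 304 can be sketched together as one gradient-free loop. The mutation follows the standard current-to-best form from differential evolution, which is an assumption about the omitted formula. The fitness function is a toy surrogate: in the method it would be the black box model's predicted probability on the target label for the generated data. The fixed iteration count stands in for the "preset fitness" stopping condition, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, pop_size, nv = 6, 12, 1
beta1, beta2, p = 0.5, 0.5, 0.5

# Hypothetical fitness: a smooth toy surrogate with values in (0, 1].
optimum = rng.normal(size=dim)


def fitness(w):
    return float(np.exp(-np.sum((w - optimum) ** 2)))


def mutate(pop, fit, i):
    """Current-to-best mutation for individual i (assumed standard DE form)."""
    best = pop[np.argmax(fit)]
    u = pop[i] + beta1 * (best - pop[i])
    others = [j for j in range(len(pop)) if j != i]
    for _ in range(nv):
        r1, r2 = rng.choice(others, size=2, replace=False)
        u = u + beta2 * (pop[r1] - pop[r2])
    return u


def crossover(w, u):
    r = rng.uniform(size=w.shape)
    return np.where(r < p, w, u)


# In the method, the initial population would be the hidden vectors produced
# by the white-box stage; random vectors are used here for illustration.
population = rng.normal(size=(pop_size, dim))
scores = np.array([fitness(w) for w in population])
initial_best = scores.max()

for _ in range(100):
    for i in range(pop_size):
        child = crossover(population[i], mutate(population, scores, i))
        child_score = fitness(child)
        if child_score > scores[i]:   # keep the fitter of parent and child
            population[i], scores[i] = child, child_score

final_vector = population[np.argmax(scores)]
final_best = scores.max()
```

Because a descendant replaces its parent only when its fitness is higher, the best fitness in the population never decreases, and each fitness evaluation costs exactly one query to the black box model.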

The embodiments of the application provide a two-stage model reverse attack method that can achieve high-performance model reverse attacks in a black box scenario. Specifically, in the first stage, a simple inverse mapping network M is designed to map the prediction probability output by the target model into the feature space of the auxiliary feature extractor (the feature extraction model). Since the parameters of the auxiliary feature extractor are available, once the mapped features are obtained the problem can be treated as a white-box problem: the input space of a generative adversarial network is searched, and an image whose features are similar to the mapped features is synthesized. However, a trained mapping network M is usually lossy, so in the second stage a gradient-free optimization algorithm is used to optimize the resulting image and further improve attack performance. No information about the internal structure or parameters of the target model is needed, and the reverse attack of the black box model is effectively realized. An easy-to-train reverse network is designed to map the model's prediction output back to a feature space, converting the problem into a white box form; a gradient-free optimization strategy for the model reverse attack effectively improves the reverse attack performance on the black box model. The whole two-stage framework effectively realizes the model reverse attack in the black box scenario.

The embodiment of the present application further provides a reverse attack apparatus for a black box model, as shown in fig. 4, including:

an obtaining unit 401, configured to obtain a feature extraction model trained based on preset training data;

an input unit 402, configured to input the preset training data into the black box model and the feature extraction model respectively, so as to obtain first training data and a label of the first training data;

a training unit 403, configured to train a mapping network based on the first training data and the label of the first training data;

an execution unit 404, configured to input a label of data to be detected into a trained mapping network, so as to obtain a feature value of the data to be detected;

and an optimizing unit 405, configured to optimize the hidden vector in the feature value based on a preset white-box model reverse attack algorithm, so as to obtain a target hidden vector corresponding to the to-be-detected data.

An embodiment of the present application provides a reverse attack apparatus 500 of a black box model, as shown in fig. 5, including:

a central processing unit 501, a memory 502 and an input/output interface 503;

the memory 502 is a transient storage memory or a persistent storage memory;

the central processor 501 is configured to communicate with the memory 502 and execute the instruction operations in the memory 502 to perform the above-mentioned reverse attack method.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only a logical functional division, and other divisions are possible in actual implementation; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Claims

1. A reverse attack method of a black box model is characterized by comprising the following steps:

acquiring a feature extraction model trained based on preset training data;

2. A reverse attack method according to claim 1, wherein the preset training data comprises: image training data, the black box model comprising: an image recognition model;

3. The reverse attack method according to claim 2, wherein the training the mapping network based on the predicted probability of the preset training data and the data feature of the preset training data comprises:

4. The reverse attack method according to claim 3, wherein the data to be tested comprises: image data to be detected; and the inputting of the target prediction probability, obtained by inputting the data to be tested into the black box model, into the trained mapping network to obtain the data characteristics of the data to be tested comprises:

5. The reverse attack method according to claim 1, wherein the optimizing the hidden vector in the data feature of the data to be detected based on a preset white-box model reverse attack algorithm to obtain the target hidden vector corresponding to the data to be detected comprises:

based on a preset white box model reverse attack algorithm:

obtaining a target hidden vector w* corresponding to the data to be detected;

wherein w is a hidden vector in the data characteristics of the data to be detected, E is the feature extraction model, M is the mapping network, G is a generator corresponding to the data to be detected, and y_c is the target prediction probability corresponding to the data to be detected.

6. The reverse attack method according to claim 1, further comprising, after obtaining the target characteristic value of the data to be tested:

acquiring a first fitness corresponding to the target hidden vector and a preset test vector corresponding to the data to be tested;

performing crossover on the dimensions of the target hidden vector and the preset test vector to obtain a child hidden vector corresponding to the target hidden vector;

comparing the first fitness corresponding to the target hidden vector with the second fitness corresponding to the child hidden vector to determine a hidden vector with higher fitness;

and taking the hidden vector with the higher fitness as the target hidden vector, and returning to the step of obtaining the first fitness corresponding to the target hidden vector and the preset test vector corresponding to the data to be tested, until a hidden vector meeting the preset fitness is obtained.

7. The reverse attack method according to claim 6, wherein the obtaining of the preset test vector corresponding to the data to be tested comprises:

based on a preset mutation operator algorithm:

wherein w⁽ⁱ⁾ is the target hidden vector corresponding to the data to be tested; β₁ is the weight of the difference vector obtained from the best-fitness hidden vector, that is, the hidden vector with the highest fitness; β₂ is the weight of the difference vectors obtained from random hidden vectors; n_v is the number of random difference vectors; k denotes the k-th group of random difference vectors; and each group consists of two arbitrary hidden vectors other than the target hidden vector.

8. A reverse attack apparatus of a black box model, comprising:

a training unit configured to train a mapping network based on the first training data and a label of the first training data;

the execution unit is used for inputting the label of the data to be detected into the trained mapping network to obtain the characteristic value of the data to be detected;

9. A reverse attack apparatus of a black box model, comprising:

the memory is a transient memory or a persistent memory;

the central processor is configured to communicate with the memory and execute the instructions in the memory to perform the method of any of claims 1-7.

10. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 7.