CN114547102A - Model stealing attack method based on gradient driving data generation - Google Patents

Model stealing attack method based on gradient driving data generation

Info

Publication number
CN114547102A
Authority
CN
China
Prior art keywords
data
model
substitution
training
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210047190.7A
Other languages
Chinese (zh)
Inventor
潘丽敏
丁杨
罗森林
张荣倩
吴杭颐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202210047190.7A
Publication of CN114547102A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2455 - Query execution
    • G06F 16/24564 - Applying rules; Deductive queries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/242 - Query formulation
    • G06F 16/2425 - Iterative querying; Query formulation based on the results of a preceding query
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G06F 18/232 - Non-hierarchical techniques
    • G06F 18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a model stealing attack method based on gradient-driven data generation, and belongs to the technical field of computer and information science. First, K-means clustering is performed on a data set; next, a query data set consisting of the cluster centroids is input to the target model through its API (application programming interface) to obtain the label corresponding to each piece of data; the data-label pairs are then used as a training set to train the substitution model; finally, an objective function is constructed from the output difference between the target model and the substitution model, and data generated from the gradient of this function are used to iteratively train the substitution model. By generating data from the gradient of the output difference between the models and stealing the target model with these data, the method alleviates the low accuracy of the substitution model caused by the target model's training set being unknown, reduces the number of API calls to the target model, and improves model stealing efficiency.

Description

Model stealing attack method based on gradient driving data generation
Technical Field
The invention relates to a model stealing attack method based on gradient-driven data generation, and belongs to the technical field of computer and information science.
Background
With the rapid development of machine learning, many fields such as image classification and malware identification solve their problems by building machine learning models. However, because a machine learning model involves privacy-sensitive information from its training data and carries commercial value in practical applications, its security has long been a concern. In recent years, the emergence and development of model stealing techniques has posed a direct challenge to model security.
The problem model stealing aims to solve is: given no prior knowledge of the target model (training data, model structure, model parameters, and the like), access the model as a black box through its public interface, construct a substitution model highly similar to the target model from the queried data, returned labels, and other information, and thereby compromise the confidentiality of the target model or mount further adversarial attacks. Existing model stealing methods can generally be divided into two categories:
1. Model stealing methods based on equation solving
Because some simple models (such as support vector machines, neural network algorithms, and the like) obtain their output by mapping the input data through a function, a model similar to the target model can be constructed simply by using input data and output results to set up several groups of equations and solving for the function's parameters. However, as model structures become more and more complex and the number of model parameters grows, not only must the number of query inputs increase correspondingly, but the computation required to solve the equations also multiplies.
2. Model stealing methods based on training a substitution model
Unlabeled data are input into the target model through its public access interface to obtain the corresponding labels and other information, and the training set formed by the input data and the returned labels is used to train a substitution model whose function is similar to that of the target model. However, existing methods still require information about the original training data, such as seed samples or the data distribution, and when such information is unknown, the similarity between the substitution model and the target model can only be improved by increasing the number of data queries.
In summary, although model stealing based on equation solving can recover exact parameters, the number of query inputs and the amount of computation needed to solve the equations increase rapidly with model complexity, so the approach is only suitable for small-scale models with few parameters. Model stealing attacks based on training a substitution model, in turn, require knowledge about the training data or auxiliary data of the target deep neural network.
However, awareness of data protection continues to grow, and the training data of models in some sensitive fields are strictly protected, so an attacker cannot obtain them; in addition, existing methods rely on a large number of data queries to generate labels, while many models monitor for and defend against model stealing attacks by limiting the number of times their API may be accessed. Therefore, the invention proposes a method that generates training data from the decision boundary information of the target model and the gradient information of the substitution model, so as to improve model stealing efficiency.
Disclosure of Invention
The invention aims to solve the problems that the training data set of the target model cannot be obtained during model stealing and that the accuracy of the substitution model is low when the number of target model API (application programming interface) calls is limited, and provides a model stealing attack method based on gradient-driven data generation.
The design principle of the invention is as follows: first, the training set is clustered with a clustering algorithm; then the data at the cluster centers are input into the target model through its API (application programming interface) to obtain the label corresponding to each piece of data; next, the substitution model is trained with the newly labeled data; finally, the difference between the target model and the substitution model is measured with a cross entropy loss function. If the loss is higher than a preset threshold, new data are generated from the target model decision boundary information contained in the labeled data, adjusted with the gradient information of the substitution model, and input into the target model again, and the process repeats; if the loss is lower than the preset threshold, the substitution model at that point is taken as the final output. The specific process is shown in figure 1.
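This loop can be summarized with the following illustrative Python sketch. It is not part of the original disclosure: every helper name (build_query_set, label_and_train, models_agree, generate_boundary_data, distill_query_data) and every default value is hypothetical, and each helper is sketched next to the corresponding numbered step in the detailed description below.

```python
import torch

def steal_model(public_data, target_api, substitute, threshold=0.5, rounds=10):
    """High-level sketch of the attack loop described above; the helpers are
    hypothetical and are fleshed out alongside the detailed steps below."""
    _, test_x, query_x = build_query_set(public_data)                  # step 2: centroids as first queries
    seen_x, seen_y = [], []                                            # accumulated training set X_train
    for _ in range(rounds):
        query_y = label_and_train(query_x, target_api, substitute,
                                  seen_x, seen_y)                      # steps 3.1-3.2 (extends seen_x/seen_y)
        if models_agree(test_x, target_api, substitute, threshold):    # step 4: cross entropy below Th
            break
        new_x, d = generate_boundary_data(query_x, query_y,
                                          torch.cat(seen_x), torch.cat(seen_y))  # steps 5.1-5.2
        query_x = distill_query_data(substitute, new_x, query_y, d, test_x)      # steps 5.3-5.8
    return substitute
```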
The technical scheme of the invention is realized by the following steps:
Step 1, dividing the public data set.
Step 1.1, dividing the public data set into a training set and a test set.
Step 2, carrying out k-means clustering on the training set.
Step 2.1, initializing the number of clusters, randomly selecting the corresponding number of data points from the training set as centroids, and presetting a clustering termination threshold.
Step 2.2, calculating the Euclidean distance between each data point in the training set and each centroid, and assigning the data point to the set of its closest centroid.
Step 2.3, after all the data have been assigned, recalculating the centroid of each set.
Step 2.4, if the distance between a new centroid and the original centroid is larger than the preset threshold, repeating steps 2.2 and 2.3; otherwise, terminating the clustering and taking the data set formed by the new centroids as the query data set.
Step 3, labeling the query data set with the target model, and training the substitution model.
Step 3.1, inputting the query data set into the target model to obtain the label corresponding to each piece of data, forming data-label pairs.
Step 3.2, training the substitution model with the data set expanded by the data-label pairs.
Step 4, comparing the difference between the substitution model and the target model with a cross entropy loss function.
Step 4.1, inputting the test set into the target model and the substitution model respectively, and calculating the cross entropy loss of their outputs.
Step 4.2, comparing the cross entropy loss with a preset threshold; if it is greater than the threshold, proceeding to step 5, otherwise outputting the substitution model, which meets the expectation.
Step 5, query data generation and data distillation.
Step 5.1, calculating the Euclidean distances between the data in the data sets obtained in steps 3.1 and 3.2.
Step 5.2, generating new data between each piece of data and its nearest differently labeled data, using the decision boundary information of the target model contained in the differently labeled data.
Step 5.3, taking the generated data as the initial distillation data, and initializing a distillation model with the structure and parameters of the substitution model obtained in step 3.2.
Step 5.4, updating the distillation model parameters with the distillation data.
Step 5.5, extracting data from the test set, inputting them into the substitution model and the distillation model respectively, and taking the output difference of the two models as the loss function.
Step 5.6, updating the distillation data according to the loss function and a gradient descent formula.
Step 5.7, repeating steps 5.4-5.6 a preset number of times to obtain the distilled data.
Step 5.8, taking the distilled data set as the new query data set, and repeating step 3 and step 4.
Advantageous effects
Compared with existing model stealing methods, the invention introduces a new data generation rule that makes full use of the decision boundary information of the target model to generate the data set used to train the substitution model, improving the similarity between the decision boundaries of the substitution model and the target model. In the data distillation step, the parameters of the substitution model guide the data set distillation, and a loss function that maximizes the difference between the target model and the substitution model is constructed, so that the distilled data accelerate the training of the substitution model, reduce the number of API (application programming interface) calls, and improve model stealing efficiency.
Drawings
FIG. 1 is a schematic diagram of the model stealing attack method based on gradient-driven data generation according to the present invention.
Detailed Description
To better illustrate the objects and advantages of the present invention, embodiments of the method of the present invention are described in further detail below with reference to the accompanying drawings and examples.
The specific process is as follows:
Step 1, dividing the public data set.
Step 1.1, dividing the public data set into a training set T_train and a test set T_test at a ratio of 4:1.
Step 2, carrying out k-means clustering on the training set.
Step 2.1, initializing the number of clusters k, randomly selecting k data points from the training set T_train = {t_1, t_2, ..., t_N} as the initial centroids, denoted c_1, c_2, ..., c_k, and presetting a clustering termination threshold μ.
Step 2.2, calculating the Euclidean distance d_i,j = ||t_i - c_j||_2 between each data point t_i (i = 1, 2, ..., N) and each centroid c_j (j = 1, 2, ..., k), and assigning t_i to the set of its nearest centroid, which yields k clusters (C_1, C_2, ..., C_k).
Step 2.3, recalculating the centroid of each cluster C_i as the mean of the data it contains, c′_i = (1/|C_i|) Σ_{t∈C_i} t.
Step 2.4, if the distance between c′_i and c_i is greater than the threshold μ, taking c′_i as the new centroid and repeating steps 2.2 and 2.3; otherwise, terminating the clustering and taking the data set {c′_1, c′_2, ..., c′_k} formed by the centroids as the query data set, denoted X = {x_1, x_2, ..., x_k}, wherein x_i = c′_i.
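A minimal Python sketch of steps 1-2, assuming the public data are available as a NumPy array. Using scikit-learn's KMeans (whose internal tolerance plays the role of the termination threshold μ), the cluster count k, and the flattening of each sample are illustrative assumptions of this sketch rather than requirements of the method.

```python
import numpy as np
import torch
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

def build_query_set(public_data: np.ndarray, k: int = 100, seed: int = 0):
    """Steps 1-2 sketch: split the public data 4:1 into train/test, run
    k-means on the training split, and use the k centroids as the initial
    query set."""
    flat = public_data.reshape(len(public_data), -1)
    train, test = train_test_split(flat, test_size=0.2, random_state=seed)            # step 1.1
    centroids = KMeans(n_clusters=k, random_state=seed).fit(train).cluster_centers_   # steps 2.1-2.4
    return (torch.as_tensor(train, dtype=torch.float32),
            torch.as_tensor(test, dtype=torch.float32),
            torch.as_tensor(centroids, dtype=torch.float32))
```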
Step 3, labeling the query data set with the target model, and training the substitution model.
Step 3.1, inputting the query data set X into the target model to obtain the label y_i corresponding to each piece of data x_i, forming the data-label pairs {(x_i, y_i), i = 1, 2, ..., k}.
Step 3.2, using the data-label pairs {(x_i, y_i)} to expand the data set X_train, and then training the substitution model on X_train to obtain the model M.
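A sketch of steps 3.1-3.2 under the same assumptions; target_api is a hypothetical callable standing in for the target model's public query interface (one hard label per input), and the substitution model can be any torch.nn.Module classifier.

```python
import torch
import torch.nn.functional as F

def label_and_train(query_x, target_api, substitute, seen_x, seen_y,
                    epochs=20, lr=1e-3):
    """Steps 3.1-3.2 sketch: label the query set through the target's API,
    add the pairs to the accumulated set X_train (seen_x/seen_y), and
    (re)train the substitution model on it."""
    with torch.no_grad():
        query_y = torch.tensor([target_api(x) for x in query_x])   # step 3.1: data-label pairs
    seen_x.append(query_x)                                          # step 3.2: expand X_train
    seen_y.append(query_y)
    data, labels = torch.cat(seen_x), torch.cat(seen_y)
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(substitute(data), labels)
        loss.backward()
        opt.step()
    return query_y
```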
Step 4, comparing the difference between the substitution model and the target model with a cross entropy loss function.
Step 4.1, inputting the test set T_test into the target model and the substitution model respectively, recording the label output by the target model for the j-th sample as the true label y_j and the output probability of the substitution model as P_j, and calculating the cross entropy loss E = -(1/|T_test|) Σ_j log P_j(y_j), wherein P_j(y_j) denotes the probability that the substitution model assigns to the label y_j.
Step 4.2, comparing the cross entropy loss E with a preset threshold Th; if E is greater than the threshold, continuing with step 5, otherwise outputting the substitution model, which meets the expectation.
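A sketch of step 4; the default value 0.5 for the threshold Th is illustrative only.

```python
import torch
import torch.nn.functional as F

def models_agree(test_x, target_api, substitute, threshold=0.5):
    """Step 4 sketch: average cross entropy between the labels returned by the
    target model on the test set and the substitution model's predictions."""
    with torch.no_grad():
        y = torch.tensor([target_api(x) for x in test_x])    # step 4.1: target labels as ground truth
        loss = F.cross_entropy(substitute(test_x), y)         # cross entropy E
    return loss.item() <= threshold                            # step 4.2: compare with Th
```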
Step 5, query data generation and data distillation.
Step 5.1, calculating the Euclidean distance between each piece of labeled query data in X and each piece of data in X_train.
Step 5.2, utilizing the decision boundary information of the target model contained in the differently labeled data: for each piece of data x_i, new data x′_i is generated between x_i and its nearest differently labeled data x_j according to the formula x′_i = αx_i + (1-α)x_j, wherein x_i ∈ X, x_j ∈ X_train, α ∈ (0.5, 1); the label of x′_i is inherited from x_i, and the distance between x_i and x_j is recorded as d_i. This yields the data set X′ = {x′_1, x′_2, ..., x′_k}.
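A sketch of step 5.2; the interpolation coefficient α = 0.75 is an illustrative value inside the (0.5, 1) range required above, not a value fixed by the invention.

```python
import torch

def generate_boundary_data(query_x, query_y, train_x, train_y, alpha=0.75):
    """Step 5.2 sketch: interpolate each labeled query point toward its nearest
    differently labeled point in X_train; the new point keeps the query
    point's label because alpha > 0.5."""
    new_x, distances = [], []
    for x, y in zip(query_x, query_y):
        mask = train_y != y                                        # differently labeled data
        cand = train_x[mask]
        d = torch.norm((cand - x).reshape(len(cand), -1), dim=1)   # Euclidean distances (step 5.1)
        j = torch.argmin(d)                                        # nearest heterogeneous point x_j
        new_x.append(alpha * x + (1 - alpha) * cand[j])            # x'_i = alpha*x_i + (1-alpha)*x_j
        distances.append(d[j])                                     # d_i, reused in step 5.6
    return torch.stack(new_x), torch.stack(distances)
```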
Step 5.3, taking X′ as the initial distillation data, initializing a distillation model with the structure and parameters of the model M, and recording the parameters of the distillation model M′ as θ_0.
Step 5.4, updating the distillation model parameters by one step of gradient descent on the distillation data, θ_1 = θ_0 - η·∂l(X′, θ_0)/∂θ_0, wherein l(X′, θ_0) denotes the loss of the distillation data X′ under the parameters θ_0 and η denotes the learning rate.
Step 5.5, randomly extracting a data set T_m from the test set T_test and inputting it into the substitution model M and the distillation model M′ respectively to obtain the outputs y = f_M(T_m, θ_0) and y′ = f_M′(T_m, θ_1), wherein f_M(T_m, θ_0) denotes the output obtained after the data set T_m is input into the model M with parameters θ_0. The loss function L is then calculated from the output difference of the two models, L = ||f_M(T_m, θ_0) - f_M′(T_m, θ_1)||_2^2.
Step 5.6, updating the distillation data by gradient descent, x′_i ← x′_i - d_i·∂L/∂x′_i, wherein d_i is the distance recorded in step 5.2.
Step 5.7, repeating steps 5.4-5.6 a preset number of times to obtain the distilled data X′.
Step 5.8, taking the distilled data set X′ as the new query data set, and repeating step 3 and step 4.
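A sketch of steps 5.3-5.8, built on the helpers above. The squared output difference used for L and the use of d_i as a per-point step size follow the reconstruction of the formulas given in the text and should be read as assumptions; torch.func.functional_call is used here so that the one-step parameter update of the distillation model stays differentiable with respect to the distillation data.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def distill_query_data(substitute, x_prime, y_prime, d, test_x, eta=0.01, rounds=10):
    """Steps 5.3-5.8 sketch: refine the generated data X' so that a distillation
    model M' (one gradient step away from the substitution model M) diverges
    from M on held-out data; the refined data become the next query set."""
    x_prime = x_prime.clone().detach().requires_grad_(True)        # step 5.3: initial distillation data
    step = d.view(-1, *([1] * (x_prime.dim() - 1)))                 # broadcastable d_i
    for _ in range(rounds):                                         # step 5.7
        # steps 5.3-5.4: M' starts from M's parameters theta_0, then takes one step on X'
        theta0 = {name: p.detach().clone().requires_grad_(True)
                  for name, p in substitute.named_parameters()}
        inner = F.cross_entropy(functional_call(substitute, theta0, (x_prime,)), y_prime)
        grads = torch.autograd.grad(inner, list(theta0.values()), create_graph=True)
        theta1 = {name: p - eta * g for (name, p), g in zip(theta0.items(), grads)}
        # step 5.5: output difference between M (theta_0) and M' (theta_1) on T_m
        out_m = substitute(test_x).detach()
        out_mp = functional_call(substitute, theta1, (test_x,))
        L = ((out_m - out_mp) ** 2).sum()
        # step 5.6: move the distillation data along the gradient of L, scaled by d_i
        grad_x = torch.autograd.grad(L, x_prime)[0]
        with torch.no_grad():
            x_prime -= step * grad_x
    return x_prime.detach()                                         # new query set for step 5.8
```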
As described above, the present invention can be preferably realized.
The above detailed description is intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above detailed description is only exemplary of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (3)

1. A model stealing attack method based on gradient-driven data generation is characterized by comprising the following steps:
step 1, performing k-means clustering on the public data set: after the number of clusters and the centroids are initialized, each piece of data is assigned to a set according to its distance to the centroids; the final clusters are obtained by iteratively recalculating the centroids of the sets, and the centroids of the clusters then form the query data set;
step 2, labeling the data by using a target model, inputting the data in the query data set into the target model to obtain a label corresponding to each piece of data, and training a substitution model by using the data set consisting of data-label pairs;
step 3, comparing the difference between the substitution model and the target model by using a cross entropy loss function, respectively inputting the same samples into the target model and the substitution model, calculating cross entropy loss according to the output of the target model and the substitution model, and determining whether to continue generating query data and training the substitution model according to the size of the cross entropy loss;
and step 4, generating new data between each piece of data in this round's training set and its nearest differently labeled data, taking the new data as the initial distillation data, distilling the data by using the output difference between the substitution model and the distillation model, taking the finally distilled data as the next query data set, and repeating the processes of label query and substitution model training.
2. The gradient-driven data generation-based model stealing attack method according to claim 1, characterized in that: in step 4, the data set X′ is generated by generating, for each piece of data x_i, new data x′_i between x_i and its nearest differently labeled data x_j according to the formula x′_i = αx_i + (1-α)x_j, wherein x_i ∈ X, x_j ∈ X_train, and α ∈ (0.5, 1).
3. The gradient-driven data generation-based model stealing attack method according to claim 1, characterized in that: in step 4, the parameters θ_0 of the substitution model are used in place of randomly initialized model parameters during the distillation of the data, the loss function is calculated from the output difference of the substitution model and the distillation model as L = ||f_M(T_m, θ_0) - f_M′(T_m, θ_1)||_2^2, and the distillation data are updated according to the formula x′_i ← x′_i - d_i·∂L/∂x′_i.
CN202210047190.7A 2022-01-14 2022-01-14 Model stealing attack method based on gradient driving data generation Pending CN114547102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210047190.7A CN114547102A (en) 2022-01-14 2022-01-14 Model stealing attack method based on gradient driving data generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210047190.7A CN114547102A (en) 2022-01-14 2022-01-14 Model stealing attack method based on gradient driving data generation

Publications (1)

Publication Number Publication Date
CN114547102A 2022-05-27

Family

ID=81671168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210047190.7A Pending CN114547102A (en) 2022-01-14 2022-01-14 Model stealing attack method based on gradient driving data generation

Country Status (1)

Country Link
CN (1) CN114547102A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680727A (en) * 2023-08-01 2023-09-01 北京航空航天大学 Function stealing defense method for image classification model
CN116680727B (en) * 2023-08-01 2023-11-03 北京航空航天大学 Function stealing defense method for image classification model
CN117496118A (en) * 2023-10-23 2024-02-02 浙江大学 Method and system for analyzing steal vulnerability of target detection model

Similar Documents

Publication Publication Date Title
Farahnakian et al. A deep auto-encoder based approach for intrusion detection system
Shen et al. BBAS: Towards large scale effective ensemble adversarial attacks against deep neural network learning
Ren et al. Grnn: generative regression neural network—a data leakage attack for federated learning
CN111461155A (en) Apparatus and method for training classification model
Zhao et al. A malware detection method of code texture visualization based on an improved faster RCNN combining transfer learning
CN114547102A (en) Model stealing attack method based on gradient driving data generation
Yuan et al. IoT malware classification based on lightweight convolutional neural networks
CN112668482A (en) Face recognition training method and device, computer equipment and storage medium
Wu et al. Genetic algorithm with multiple fitness functions for generating adversarial examples
Chaaraoui et al. Human action recognition optimization based on evolutionary feature subset selection
Kenaza et al. An efficient hybrid svdd/clustering approach for anomaly-based intrusion detection
CN113656700A (en) Hash retrieval method based on multi-similarity consistent matrix decomposition
Ponce-López et al. Gesture and action recognition by evolved dynamic subgestures
KR20190028880A (en) Method and appratus for generating machine learning data for botnet detection system
Acharya et al. EfficientNet-based convolutional neural networks for malware classification
Sani et al. Learning a new distance metric to improve an SVM-clustering based intrusion detection system
Bui et al. A clustering-based shrink autoencoder for detecting anomalies in intrusion detection systems
Dong et al. Kinship classification based on discriminative facial patches
Chao et al. Research on network intrusion detection technology based on dcgan
Soliman et al. A network intrusions detection system based on a quantum bio inspired algorithm
Smirnov et al. Prototype memory for large-scale face representation learning
Li et al. Online alternate generator against adversarial attacks
CN111160077A (en) Large-scale dynamic face clustering method
Genç et al. A taxonomic survey of model extraction attacks
CN115567224A (en) Method for detecting abnormal transaction of block chain and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination