CN114547102A - Model stealing attack method based on gradient driving data generation - Google Patents
- Publication number
- CN114547102A
- Authority
- CN
- China
- Prior art keywords
- data
- model
- substitution
- training
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2425—Iterative querying; Query formulation based on the results of a preceding query
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a model stealing attack method based on gradient-driven data generation, belonging to the technical field of computer and information science. First, K-means clustering is performed on a public data set; the query data set formed by the cluster centroids is then input into the target model through its API (application programming interface) to obtain a label for each piece of data; the surrogate model is then trained on the resulting data-label pairs. Finally, an objective function is constructed from the output difference between the target model and the surrogate model, and new data are generated from the gradient of this function to iteratively train the surrogate model. By generating data from the gradient of the output difference between the models and using these data to steal the target model, the method addresses the low surrogate accuracy caused by the target model's training set being unknown, reduces the number of API calls to the target model, and improves the efficiency of model stealing.
Description
Technical Field
The invention relates to a model stealing attack method based on gradient-driven data generation, and belongs to the technical field of computer and information science.
Background
With the rapid development of machine learning, many fields such as image classification and malware identification solve their problems by building machine learning models. However, because a machine learning model involves both the privacy-sensitive information of its training data and the commercial value of its application, the security of machine learning models has long been a concern. In recent years, the emergence and development of model stealing techniques has posed a direct challenge to model security.
The problem model stealing addresses is the following: lacking prior knowledge of the target model (training data, model structure, model parameters, and so on), an attacker performs black-box access to the model through a public access interface and uses the query data, the returned labels, and other information to construct a surrogate model highly similar to the target model, thereby compromising the confidentiality of the target model or mounting further adversarial attacks. Existing model stealing methods can generally be classified into two categories:
1. Model stealing based on equation solving
Because some simple models (such as support vector machines and shallow neural networks) map input data to outputs through a function, a model similar to the target can be constructed simply by collecting several groups of inputs and outputs and solving the resulting system of equations for the function's parameters. However, as model structures grow more complex and parameter counts increase, not only must the number of queries grow accordingly, but the computation required to solve the equations multiplies.
2. Model stealing based on training a surrogate model
Unlabeled data are input into the target model through its public access interface to obtain labels and related information, and the resulting data-label training set is used to train a surrogate model functionally similar to the target model. However, existing methods still require seed samples or the distribution of the original training data, and when this information is unknown, the similarity between the surrogate model and the target model can only be improved by increasing the number of queries.
In summary, although equation-solving-based model stealing can recover exact parameters, the number of queries and the cost of solving the equations grow rapidly with model complexity, so the approach is only suitable for small models with few parameters. Model stealing attacks based on training a surrogate model require knowledge of the training data or auxiliary data of the target deep neural network.
Today, however, awareness of data protection keeps rising, and model training data in sensitive fields are strictly protected, so an attacker cannot obtain them. Moreover, existing methods must label data through a large number of queries, while many deployed models monitor and defend against model stealing attacks by limiting the number of times their API can be accessed. The invention therefore proposes a method that generates training data from the target model's decision boundary information and the surrogate model's gradient information, so as to improve model stealing efficiency.
Disclosure of Invention
The invention aims to solve two problems: the training data set of the target model cannot be obtained during model stealing, and limited target model API (application programming interface) call budgets lead to low surrogate model accuracy. It provides a model stealing attack method based on gradient-driven data generation.
The design principle of the invention is as follows: first, cluster a public training set with a clustering algorithm; then input the cluster centers into the target model through its API to obtain a label for each piece of data; next, train a surrogate model on the newly labeled data; finally, compare the target model and the surrogate model through a cross entropy loss function. If the loss is above a preset threshold, new data are generated from the target model decision boundary information contained in the labeled data, adjusted using the surrogate model's gradient information, input into the target model again, and the process repeats. If the loss is below the threshold, the current surrogate model is output as the final result. The overall process is shown in figure 1.
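The iterative loop described above can be illustrated with a minimal, self-contained toy sketch. The helper names (`attack_loop`, `train_surrogate`, and so on) and the one-dimensional threshold "target" are illustrative assumptions, not part of the invention; the sketch only shows how label querying, surrogate training, and boundary-driven data generation interleave until the loss check passes.

```python
def attack_loop(target_query, train_surrogate, generate_data, loss_between,
                query_set, test_set, threshold, max_rounds=10):
    """Sketch of the stealing loop: query labels, train a surrogate on the
    accumulated data-label pairs, stop once the output gap is small enough,
    otherwise generate new query data near the decision boundary."""
    pool, surrogate = [], None
    for _ in range(max_rounds):
        pool += [(x, target_query(x)) for x in query_set]  # black-box API queries
        surrogate = train_surrogate(pool)
        if loss_between(surrogate, test_set) <= threshold:
            break
        query_set = generate_data(pool)
    return surrogate

# Toy instantiation: the "target" is a 1-D threshold classifier at 0.37.
target_query = lambda x: int(x > 0.37)

def train_surrogate(pool):
    # place the surrogate's threshold midway between the closest opposite labels;
    # assumes both classes are present in the pool (true for the seed set below)
    neg = [x for x, y in pool if y == 0]
    pos = [x for x, y in pool if y == 1]
    t = (max(neg) + min(pos)) / 2
    return lambda x: int(x > t)

def generate_data(pool):
    # move each point 25% of the way toward its nearest differently labeled point
    new = []
    for x, y in pool:
        rivals = [x2 for x2, y2 in pool if y2 != y]
        xj = min(rivals, key=lambda r: abs(r - x))
        new.append(0.75 * x + 0.25 * xj)          # alpha = 0.75, in (0.5, 1)
    return new

def loss_between(surrogate, test_set):
    # disagreement rate with the target stands in for the cross entropy check
    return sum(surrogate(x) != target_query(x) for x in test_set) / len(test_set)

stolen = attack_loop(target_query, train_surrogate, generate_data, loss_between,
                     query_set=[0.0, 1.0], test_set=[i / 100 for i in range(100)],
                     threshold=0.02)
```

On this toy, the surrogate's threshold tightens toward 0.37 in a few rounds because each round's generated queries straddle the current decision boundary.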
The technical scheme of the invention is realized by the following steps:
Step 1.1, dividing the public data set into a training set and a testing set.
Step 2, perform k-means clustering on the training set.
Step 2.1, initialize the number of clusters, randomly select that many data points from the training set as centroids, and preset a clustering termination threshold.
Step 2.2, calculate the Euclidean distance between each data point in the training set and each centroid, and assign each point to the set of its nearest centroid.
Step 2.3, after all data are assigned, recalculate the centroid of each set.
Step 2.4, if the distance between a new centroid and the original centroid is larger than the preset threshold, repeat steps 2.2 and 2.3; otherwise terminate clustering, and take the data set formed by the new centroids as the query data set.
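Steps 2.1-2.4 can be sketched as follows; this is a minimal plain-Python version assuming data points are tuples of floats, with illustrative defaults (`mu`, `seed`, `max_iter`) that the patent does not specify.

```python
import math
import random

def kmeans(points, k, mu=1e-4, seed=0, max_iter=100):
    """Sketch of steps 2.1-2.4: cluster `points` (tuples of floats) and
    return the final centroids, which form the query data set."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # step 2.1: random initial centroids
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                           # step 2.2: assign to nearest centroid
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        new = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]   # step 2.3: recompute centroids
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= mu:                            # step 2.4: stop below threshold mu
            break
    return centroids
```

With well-separated groups, the returned centroids converge to the group means regardless of the random initialization.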
Step 3, label the query data set with the target model and train the surrogate model.
Step 3.1, input the query data set into the target model to obtain a label for each piece of data, forming data-label pairs.
Step 3.2, train the surrogate model on the data set expanded with these data-label pairs.
Step 4, compare the surrogate model with the target model using a cross entropy loss function.
Step 4.1, input the test set into the target model and the surrogate model respectively, and compute the cross entropy loss of their outputs.
Step 4.2, compare the cross entropy loss with a preset threshold; if it is greater than the threshold, go to step 5, otherwise output the surrogate model as meeting expectations.
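The stopping check of steps 4.1-4.2 can be sketched as below, assuming the target model returns hard labels and the surrogate returns class-probability vectors; the function names are illustrative.

```python
import math

def cross_entropy(target_labels, surrogate_probs, eps=1e-12):
    """Step 4.1 sketch: mean cross entropy between the target model's hard
    labels and the surrogate's predicted class-probability vectors."""
    total = 0.0
    for y, probs in zip(target_labels, surrogate_probs):
        total -= math.log(max(probs[y], eps))   # -log of the prob given to the true class
    return total / len(target_labels)

def surrogate_is_good_enough(target_labels, surrogate_probs, threshold):
    """Step 4.2 sketch: training stops once the loss drops below the threshold."""
    return cross_entropy(target_labels, surrogate_probs) <= threshold
```

The `eps` clamp only guards against `log(0)` when the surrogate assigns zero probability to the true class.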
Step 5, query data generation and data distillation.
Step 5.1, calculate the Euclidean distances among the data in the data sets obtained in steps 3.1 and 3.2.
Step 5.2, using the target model decision boundary information contained in differently labeled data, generate new data between each piece of data and its nearest differently labeled data.
Step 5.3, take the generated data as the initial distillation data, and initialize a distillation model with the structure and parameters of the surrogate model obtained in step 3.2.
Step 5.4, update the distillation model parameters using the distillation data.
Step 5.5, draw data from the test set, input them into the surrogate model and the distillation model respectively, and take the output difference of the two models as the loss function.
Step 5.6, update the distillation data according to the loss function and the gradient descent formula.
Step 5.7, repeat steps 5.4-5.6 a preset number of times to obtain the distilled data.
Step 5.8, take the distilled data set as the new query data set, and repeat steps 3 and 4.
Advantageous effects
Compared with existing model stealing methods, the invention introduces a new data generation rule that makes full use of the target model's decision boundary information to generate the data set for training the surrogate model, improving the similarity between the decision boundaries of the surrogate model and the target model. In the data distillation step, the surrogate model's parameters guide the distillation, and a loss function maximizing the difference between the target model and the surrogate model is constructed, so that the distilled data accelerate surrogate training, reduce the number of API (application programming interface) calls, and improve model stealing efficiency.
Drawings
FIG. 1 is a schematic diagram of the model stealing attack method based on gradient-driven data generation according to the present invention.
Detailed Description
To better illustrate the objects and advantages of the present invention, embodiments of the method of the present invention are described in further detail below with reference to the accompanying drawings and examples.
The specific process is as follows:
Step 1.1, divide the public data set into a training set T_train and a test set T_test at a ratio of 4:1.
Step 2, perform k-means clustering on the training set.
Step 2.1, initialize the number of clusters k, randomly select k data points from the training set T_train = {t_1, t_2, ..., t_N} as the initial centroids, denoted c_1, c_2, ..., c_k, and preset a clustering termination threshold μ.
Step 2.2, calculate the Euclidean distance d_{i,j} = ||t_i − c_j||_2 between each data point t_i (i = 1, 2, ..., N) and each centroid c_j (j = 1, 2, ..., k), and assign t_i to the set of its nearest centroid, obtaining k clusters (C_1, C_2, ..., C_k).
Step 2.3, after all data are assigned, recalculate the centroid c'_i of each cluster C_i.
Step 2.4, if the distance between c'_i and c_i is greater than the threshold μ, take c'_i as the centroid and repeat steps 2.2 and 2.3; otherwise terminate clustering and obtain the data set {c'_1, c'_2, ..., c'_k} formed by the centroids, which is taken as the query data set and denoted X = {x_1, x_2, ..., x_k}, where x_i = c'_i.
Step 3, label the query data set with the target model and train the surrogate model.
Step 3.1, input the query data set X into the target model to obtain the label y_i for each piece of data x_i, forming data-label pairs {(x_i, y_i)}.
Step 3.2, train the surrogate model M on the data set expanded with these data-label pairs.
Step 4, compare the surrogate model with the target model using a cross entropy loss function.
Step 4.1, input the test set T_test into the target model and the surrogate model respectively; denote the output of the target model as the true label y_j and the output probability of the surrogate model as P_j, and calculate the cross entropy loss E = −Σ_j y_j·log(P_j).
Step 4.2, compare the cross entropy loss E with the preset threshold Th; if E is greater than the threshold, continue with step 5, otherwise output the surrogate model as meeting expectations.
Step 5, query data generation and data distillation.
Step 5.1, calculate the Euclidean distances among the data obtained in steps 3.1 and 3.2.
Step 5.2, using the target model decision boundary information contained in differently labeled data, generate for each piece of data x_i and its nearest differently labeled data x_j new data x'_i according to the formula x'_i = α·x_i + (1 − α)·x_j, where x_j ∈ X_train and α ∈ (0.5, 1); the label of x'_i is inherited from x_i, and the distance between x_i and x_j is recorded as d_i. This yields the data set X'.
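The interpolation rule of steps 5.1-5.2 can be sketched as follows, assuming points are tuples of floats; the function name and `alpha` default are illustrative.

```python
import math

def generate_boundary_data(pairs, alpha=0.75):
    """Steps 5.1-5.2 sketch: for every labeled point (x_i, y_i), find the
    nearest differently labeled x_j and emit x'_i = alpha*x_i + (1-alpha)*x_j.
    With alpha in (0.5, 1), x'_i stays on x_i's side of the boundary, so
    x'_i inherits the label y_i."""
    assert 0.5 < alpha < 1
    generated = []
    for xi, yi in pairs:
        rivals = [xj for xj, yj in pairs if yj != yi]
        if not rivals:
            continue                                       # no opposite class observed
        xj = min(rivals, key=lambda r: math.dist(xi, r))   # step 5.1: nearest rival
        x_new = tuple(alpha * a + (1 - alpha) * b for a, b in zip(xi, xj))
        generated.append((x_new, yi))
    return generated
```

Each generated point lies between a pair of differently labeled points, i.e. in the region that must contain the target model's decision boundary.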
Step 5.3, take X' as the initial distillation data, initialize a distillation model with the structure and parameters of the surrogate model M, and denote the parameters of the distillation model M' as θ_0.
Step 5.4, update the distillation model parameters with the formula θ_1 = θ_0 − η·∇_{θ_0}ℓ(x'_i, θ_0), where ℓ(x'_i, θ_0) denotes the loss of the distillation data x'_i under the parameters θ_0 and η denotes the learning rate.
Step 5.5, randomly draw a data set T_m from the test set T_test and input it into the surrogate model M and the distillation model M' respectively, obtaining the outputs y = f_M(T_m, θ_0) and y' = f_{M'}(T_m, θ_1), where f_M(T_m, θ_0) denotes the output obtained by feeding the data set T_m into model M with parameters θ_0. Then calculate the loss function L from the output difference of the two models.
Step 5.6, update the distillation data according to the loss function L and the gradient descent formula.
Step 5.7, repeat steps 5.4-5.6 a preset number of times to obtain the distilled data X'.
Step 5.8, take the distilled data set X' as the query data set, and repeat steps 3 and 4.
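The inner distillation loop of steps 5.4-5.6 can be sketched with a one-parameter linear model and a finite-difference gradient. Everything here is an illustrative assumption — the exact loss and update formulas do not survive in this text — and in particular the choice to *maximize* the surrogate-vs-distillation output gap (so a training step on the distilled data moves the model as far as possible) follows our reading of the "maximize the difference" objective stated in the advantageous effects.

```python
def distill_data(x0, theta0, train_step, output_gap, rounds=50, lr=0.1, h=1e-5):
    """Steps 5.4-5.6 sketch: repeatedly adjust the distillation data x so that
    one training step from theta0 on x changes the model as much as possible,
    measured by the surrogate-vs-distillation output gap (step 5.5)."""
    x = list(x0)
    for _ in range(rounds):
        for i in range(len(x)):
            # finite-difference gradient of the gap w.r.t. each datum (step 5.6)
            up = output_gap(train_step(x[:i] + [x[i] + h] + x[i + 1:], theta0))
            down = output_gap(train_step(x[:i] + [x[i] - h] + x[i + 1:], theta0))
            x[i] += lr * (up - down) / (2 * h)   # gradient ascent on the gap
    return x

# Toy model f(x; theta) = theta * x, with distilled data all labeled 1.
theta0, eta = 0.5, 0.1

def train_step(xs, theta):
    # step 5.4: one SGD step on l(x', theta) = sum((theta*x - 1)^2)
    return theta - eta * sum(2 * x * (theta * x - 1) for x in xs)

def output_gap(theta1, probe=2.0):
    # step 5.5: output difference between M (theta0) and M' (theta1) on a probe input
    return abs(theta1 * probe - theta0 * probe)
```

In a real implementation the finite-difference loop would be replaced by automatic differentiation through the training step, but the structure — differentiate the output gap with respect to the data and step along that gradient — is the same.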
As described above, the present invention can be preferably realized.
The above detailed description is intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above detailed description is only exemplary of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (3)
1. A model stealing attack method based on gradient-driven data generation, characterized by comprising the following steps:
step 1, perform k-means clustering on the public data set: after initializing the number of clusters and the centroids, assign each piece of data to a set according to its distance to the centroids; obtain the final clusters by iteratively recalculating the centroids of the sets, and take the centroids of the clusters to form a query data set;
step 2, label the data with the target model: input the data in the query data set into the target model to obtain a label for each piece of data, and train a surrogate model with the data set composed of data-label pairs;
step 3, compare the surrogate model with the target model using a cross entropy loss function: input the same samples into the target model and the surrogate model respectively, calculate the cross entropy loss from their outputs, and decide from the magnitude of the loss whether to continue generating query data and training the surrogate model;
step 4, generate new data between each piece of data in the current round's training set and its nearest differently labeled data, take the new data as the initial distillation data, distill the data using the output difference between the surrogate model and the distillation model, take the finally distilled data as the next query data, and repeat the label query and surrogate model training process.
2. The model stealing attack method based on gradient-driven data generation according to claim 1, characterized in that: in step 4, for each piece of data x_i and its nearest differently labeled data x_j, new data x'_i is generated according to the formula x'_i = αx_i + (1 − α)x_j, where x_j ∈ X_train and α ∈ (0.5, 1).
3. The model stealing attack method based on gradient-driven data generation according to claim 1, characterized in that: in step 4, the parameters θ_0 of the surrogate model are used in place of randomly initialized model parameters during data distillation; the loss function is calculated from the output difference of the surrogate model and the distillation model, and the distillation data are updated from this loss function by gradient descent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210047190.7A CN114547102A (en) | 2022-01-14 | 2022-01-14 | Model stealing attack method based on gradient driving data generation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210047190.7A CN114547102A (en) | 2022-01-14 | 2022-01-14 | Model stealing attack method based on gradient driving data generation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114547102A true CN114547102A (en) | 2022-05-27 |
Family
ID=81671168
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210047190.7A Pending CN114547102A (en) | 2022-01-14 | 2022-01-14 | Model stealing attack method based on gradient driving data generation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114547102A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116680727A (en) * | 2023-08-01 | 2023-09-01 | 北京航空航天大学 | Function stealing defense method for image classification model |
CN116680727B (en) * | 2023-08-01 | 2023-11-03 | 北京航空航天大学 | Function stealing defense method for image classification model |
CN117496118A (en) * | 2023-10-23 | 2024-02-02 | 浙江大学 | Method and system for analyzing steal vulnerability of target detection model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||