CN114547102A - Model stealing attack method based on gradient driving data generation - Google Patents

Model stealing attack method based on gradient driving data generation

Info

Publication number
CN114547102A
Authority
CN
China
Prior art keywords
data
model
substitution
training
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210047190.7A
Other languages
Chinese (zh)
Inventor
潘丽敏
丁杨
罗森林
张荣倩
吴杭颐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202210047190.7A
Publication of CN114547102A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2455 - Query execution
    • G06F 16/24564 - Applying rules; Deductive queries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/242 - Query formulation
    • G06F 16/2425 - Iterative querying; Query formulation based on the results of a preceding query
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G06F 18/232 - Non-hierarchical techniques
    • G06F 18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a model stealing attack method based on gradient-driven data generation, and belongs to the technical field of computer and information science. First, K-means clustering is performed on a data set; next, a query data set consisting of the cluster centroids is input to the target model through its API (application programming interface) to obtain the label corresponding to each piece of data; the data-label pairs are then used as a training set to train the substitution model; finally, an objective function is constructed from the output difference between the target model and the substitution model, and data generated from the gradient of this function are used to iteratively train the substitution model. By generating data from the gradient of the output difference between the models and stealing the target model with these data, the method alleviates the low accuracy of the substitution model caused by the target model's training set being unknown, reduces the number of API calls to the target model, and improves model stealing efficiency.

Description

Model stealing attack method based on gradient driving data generation
Technical Field
The invention relates to a model stealing attack method based on gradient-driven data generation, and belongs to the technical field of computer and information science.
Background
With the rapid development of machine learning, many fields such as image classification and malware identification solve their problems by building machine learning models. However, because a machine learning model involves privacy-sensitive information from its training data and carries commercial value in practical applications, its security has long been a concern. In recent years, the emergence and development of model stealing techniques has posed a direct challenge to model security.
The problem model stealing aims to solve is: given no prior knowledge of the target model (training data, model structure, model parameters, and the like), access the model as a black box through its public interface, construct a substitution model highly similar to the target model from the queried data, returned labels, and other information, and thereby compromise the confidentiality of the target model or mount further adversarial attacks. Existing model stealing methods can generally be divided into two categories:
1. Model stealing methods based on equation solving
Because some simple models (such as support vector machines, neural network algorithms, and the like) obtain their output by mapping the input data through a function, a model similar to the target model can be constructed simply by using input data and output results to set up several groups of equations and solving for the function's parameters. However, as model structures become more and more complex and the number of model parameters grows, not only must the number of query inputs increase correspondingly, but the computation required to solve the equations also multiplies.
2. Model stealing methods based on training a substitution model
Unlabeled data are input into the target model through its public access interface to obtain the corresponding labels and other information, and the training set formed by the input data and the returned labels is used to train a substitution model whose function is similar to that of the target model. However, existing methods still require information about the original training data, such as seed samples or the data distribution, and when such information is unknown, the similarity between the substitution model and the target model can only be improved by increasing the number of data queries.
In summary, although model stealing based on equation solving can recover exact parameters, the number of query inputs and the amount of computation needed to solve the equations increase rapidly with model complexity, so the approach is only suitable for small-scale models with few parameters. Model stealing attacks based on training a substitution model, in turn, require knowledge about the training data or auxiliary data of the target deep neural network.
However, awareness of data protection continues to grow, and the training data of models in some sensitive fields are strictly protected, so an attacker cannot obtain them; in addition, existing methods rely on a large number of data queries to generate labels, while many models monitor for and defend against model stealing attacks by limiting the number of times their API may be accessed. Therefore, the invention proposes a method that generates training data from the decision boundary information of the target model and the gradient information of the substitution model, so as to improve model stealing efficiency.
Disclosure of Invention
The invention aims to solve the problems that the training data set of the target model cannot be obtained during model stealing and that the accuracy of the substitution model is low when the number of target model API (application programming interface) calls is limited, and provides a model stealing attack method based on gradient-driven data generation.
The design principle of the invention is as follows: first, the training set is clustered with a clustering algorithm; then the data at the cluster centers are input into the target model through its API (application programming interface) to obtain the label corresponding to each piece of data; next, the substitution model is trained with the newly labeled data; finally, the difference between the target model and the substitution model is measured with a cross entropy loss function. If the loss is higher than a preset threshold, new data are generated from the target model decision boundary information contained in the labeled data, adjusted with the gradient information of the substitution model, and input into the target model again, and the process repeats; if the loss is lower than the preset threshold, the substitution model at that point is taken as the final output. The specific process is shown in figure 1.
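This loop can be summarized with the following illustrative Python sketch. It is not part of the original disclosure: every helper name (build_query_set, label_and_train, models_agree, generate_boundary_data, distill_query_data) and every default value is hypothetical, and each helper is sketched next to the corresponding numbered step in the detailed description below.

```python
import torch

def steal_model(public_data, target_api, substitute, threshold=0.5, rounds=10):
    """High-level sketch of the attack loop described above; the helpers are
    hypothetical and are fleshed out alongside the detailed steps below."""
    _, test_x, query_x = build_query_set(public_data)                  # step 2: centroids as first queries
    seen_x, seen_y = [], []                                            # accumulated training set X_train
    for _ in range(rounds):
        query_y = label_and_train(query_x, target_api, substitute,
                                  seen_x, seen_y)                      # steps 3.1-3.2 (extends seen_x/seen_y)
        if models_agree(test_x, target_api, substitute, threshold):    # step 4: cross entropy below Th
            break
        new_x, d = generate_boundary_data(query_x, query_y,
                                          torch.cat(seen_x), torch.cat(seen_y))  # steps 5.1-5.2
        query_x = distill_query_data(substitute, new_x, query_y, d, test_x)      # steps 5.3-5.8
    return substitute
```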
The technical scheme of the invention is realized by the following steps:
Step 1, dividing the public data set.
Step 1.1, dividing the public data set into a training set and a test set.
Step 2, carrying out k-means clustering on the training set.
Step 2.1, initializing the number of clusters, randomly selecting the corresponding number of data points from the training set as centroids, and presetting a clustering termination threshold.
Step 2.2, calculating the Euclidean distance between each data point in the training set and each centroid, and assigning the data point to the set of its closest centroid.
Step 2.3, after all the data have been assigned, recalculating the centroid of each set.
Step 2.4, if the distance between a new centroid and the original centroid is larger than the preset threshold, repeating steps 2.2 and 2.3; otherwise, terminating the clustering and taking the data set formed by the new centroids as the query data set.
Step 3, labeling the query data set with the target model, and training the substitution model.
Step 3.1, inputting the query data set into the target model to obtain the label corresponding to each piece of data, forming data-label pairs.
Step 3.2, training the substitution model with the data set expanded by the data-label pairs.
Step 4, comparing the difference between the substitution model and the target model with a cross entropy loss function.
Step 4.1, inputting the test set into the target model and the substitution model respectively, and calculating the cross entropy loss of their outputs.
Step 4.2, comparing the cross entropy loss with a preset threshold; if it is greater than the threshold, proceeding to step 5, otherwise outputting the substitution model, which meets the expectation.
Step 5, query data generation and data distillation.
Step 5.1, calculating the Euclidean distances between the data in the data sets obtained in steps 3.1 and 3.2.
Step 5.2, generating new data between each piece of data and its nearest differently labeled data, using the decision boundary information of the target model contained in the differently labeled data.
Step 5.3, taking the generated data as the initial distillation data, and initializing a distillation model with the structure and parameters of the substitution model obtained in step 3.2.
Step 5.4, updating the distillation model parameters with the distillation data.
Step 5.5, extracting data from the test set, inputting them into the substitution model and the distillation model respectively, and taking the output difference of the two models as the loss function.
Step 5.6, updating the distillation data according to the loss function and a gradient descent formula.
Step 5.7, repeating steps 5.4-5.6 a preset number of times to obtain the distilled data.
Step 5.8, taking the distilled data set as the new query data set, and repeating step 3 and step 4.
Advantageous effects
Compared with existing model stealing methods, the invention introduces a new data generation rule that makes full use of the decision boundary information of the target model to generate the data set used to train the substitution model, improving the similarity between the decision boundaries of the substitution model and the target model. In the data distillation step, the parameters of the substitution model guide the data set distillation, and a loss function that maximizes the difference between the target model and the substitution model is constructed, so that the distilled data accelerate the training of the substitution model, reduce the number of API (application programming interface) calls, and improve model stealing efficiency.
Drawings
FIG. 1 is a schematic diagram of the model stealing attack method based on gradient-driven data generation according to the present invention.
Detailed Description
To better illustrate the objects and advantages of the present invention, embodiments of the method of the present invention are described in further detail below with reference to the accompanying drawings and examples.
The specific process is as follows:
Step 1, dividing the public data set.
Step 1.1, dividing the public data set into a training set T_train and a test set T_test at a ratio of 4:1.
Step 2, carrying out k-means clustering on the training set.
Step 2.1, initializing the number of clusters k, randomly selecting k data points from the training set T_train = {t_1, t_2, ..., t_N} as the initial centroids, denoted c_1, c_2, ..., c_k, and presetting a clustering termination threshold μ.
Step 2.2, calculating the Euclidean distance d_i,j = ||t_i - c_j||_2 between each data point t_i (i = 1, 2, ..., N) and each centroid c_j (j = 1, 2, ..., k), and assigning t_i to the set of its nearest centroid, which yields k clusters (C_1, C_2, ..., C_k).
Step 2.3, recalculating the centroid of each cluster C_i as the mean of the data it contains, c′_i = (1/|C_i|) Σ_{t∈C_i} t.
Step 2.4, if the distance between c′_i and c_i is greater than the threshold μ, taking c′_i as the new centroid and repeating steps 2.2 and 2.3; otherwise, terminating the clustering and taking the data set {c′_1, c′_2, ..., c′_k} formed by the centroids as the query data set, denoted X = {x_1, x_2, ..., x_k}, wherein x_i = c′_i.
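A minimal Python sketch of steps 1-2, assuming the public data are available as a NumPy array. Using scikit-learn's KMeans (whose internal tolerance plays the role of the termination threshold μ), the cluster count k, and the flattening of each sample are illustrative assumptions of this sketch rather than requirements of the method.

```python
import numpy as np
import torch
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

def build_query_set(public_data: np.ndarray, k: int = 100, seed: int = 0):
    """Steps 1-2 sketch: split the public data 4:1 into train/test, run
    k-means on the training split, and use the k centroids as the initial
    query set."""
    flat = public_data.reshape(len(public_data), -1)
    train, test = train_test_split(flat, test_size=0.2, random_state=seed)            # step 1.1
    centroids = KMeans(n_clusters=k, random_state=seed).fit(train).cluster_centers_   # steps 2.1-2.4
    return (torch.as_tensor(train, dtype=torch.float32),
            torch.as_tensor(test, dtype=torch.float32),
            torch.as_tensor(centroids, dtype=torch.float32))
```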
Step 3, labeling the query data set with the target model, and training the substitution model.
Step 3.1, inputting the query data set X into the target model to obtain the label y_i corresponding to each piece of data x_i, forming the data-label pairs {(x_i, y_i), i = 1, 2, ..., k}.
Step 3.2, using the data-label pairs {(x_i, y_i)} to expand the data set X_train, and then training the substitution model on X_train to obtain the model M.
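A sketch of steps 3.1-3.2 under the same assumptions; target_api is a hypothetical callable standing in for the target model's public query interface (one hard label per input), and the substitution model can be any torch.nn.Module classifier.

```python
import torch
import torch.nn.functional as F

def label_and_train(query_x, target_api, substitute, seen_x, seen_y,
                    epochs=20, lr=1e-3):
    """Steps 3.1-3.2 sketch: label the query set through the target's API,
    add the pairs to the accumulated set X_train (seen_x/seen_y), and
    (re)train the substitution model on it."""
    with torch.no_grad():
        query_y = torch.tensor([target_api(x) for x in query_x])   # step 3.1: data-label pairs
    seen_x.append(query_x)                                          # step 3.2: expand X_train
    seen_y.append(query_y)
    data, labels = torch.cat(seen_x), torch.cat(seen_y)
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(substitute(data), labels)
        loss.backward()
        opt.step()
    return query_y
```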
Step 4, comparing the difference between the substitution model and the target model with a cross entropy loss function.
Step 4.1, inputting the test set T_test into the target model and the substitution model respectively, recording the label output by the target model for the j-th sample as the true label y_j and the output probability of the substitution model as P_j, and calculating the cross entropy loss E = -(1/|T_test|) Σ_j log P_j(y_j), wherein P_j(y_j) denotes the probability that the substitution model assigns to the label y_j.
Step 4.2, comparing the cross entropy loss E with a preset threshold Th; if E is greater than the threshold, continuing with step 5, otherwise outputting the substitution model, which meets the expectation.
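A sketch of step 4; the default value 0.5 for the threshold Th is illustrative only.

```python
import torch
import torch.nn.functional as F

def models_agree(test_x, target_api, substitute, threshold=0.5):
    """Step 4 sketch: average cross entropy between the labels returned by the
    target model on the test set and the substitution model's predictions."""
    with torch.no_grad():
        y = torch.tensor([target_api(x) for x in test_x])    # step 4.1: target labels as ground truth
        loss = F.cross_entropy(substitute(test_x), y)         # cross entropy E
    return loss.item() <= threshold                            # step 4.2: compare with Th
```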
Step 5, query data generation and data distillation.
Step 5.1, calculating the Euclidean distance between each piece of labeled query data in X and each piece of data in X_train.
Step 5.2, utilizing the decision boundary information of the target model contained in the differently labeled data: for each piece of data x_i, new data x′_i is generated between x_i and its nearest differently labeled data x_j according to the formula x′_i = αx_i + (1-α)x_j, wherein x_i ∈ X, x_j ∈ X_train, α ∈ (0.5, 1); the label of x′_i is inherited from x_i, and the distance between x_i and x_j is recorded as d_i. This yields the data set X′ = {x′_1, x′_2, ..., x′_k}.
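A sketch of step 5.2; the interpolation coefficient α = 0.75 is an illustrative value inside the (0.5, 1) range required above, not a value fixed by the invention.

```python
import torch

def generate_boundary_data(query_x, query_y, train_x, train_y, alpha=0.75):
    """Step 5.2 sketch: interpolate each labeled query point toward its nearest
    differently labeled point in X_train; the new point keeps the query
    point's label because alpha > 0.5."""
    new_x, distances = [], []
    for x, y in zip(query_x, query_y):
        mask = train_y != y                                        # differently labeled data
        cand = train_x[mask]
        d = torch.norm((cand - x).reshape(len(cand), -1), dim=1)   # Euclidean distances (step 5.1)
        j = torch.argmin(d)                                        # nearest heterogeneous point x_j
        new_x.append(alpha * x + (1 - alpha) * cand[j])            # x'_i = alpha*x_i + (1-alpha)*x_j
        distances.append(d[j])                                     # d_i, reused in step 5.6
    return torch.stack(new_x), torch.stack(distances)
```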
Step 5.3, taking X′ as the initial distillation data, initializing a distillation model with the structure and parameters of the model M, and recording the parameters of the distillation model M′ as θ_0.
Step 5.4, updating the distillation model parameters by one step of gradient descent on the distillation data, θ_1 = θ_0 - η·∂l(X′, θ_0)/∂θ_0, wherein l(X′, θ_0) denotes the loss of the distillation data X′ under the parameters θ_0 and η denotes the learning rate.
Step 5.5, randomly extracting a data set T_m from the test set T_test and inputting it into the substitution model M and the distillation model M′ respectively to obtain the outputs y = f_M(T_m, θ_0) and y′ = f_M′(T_m, θ_1), wherein f_M(T_m, θ_0) denotes the output obtained after the data set T_m is input into the model M with parameters θ_0. The loss function L is then calculated from the output difference of the two models, L = ||f_M(T_m, θ_0) - f_M′(T_m, θ_1)||_2^2.
Step 5.6, updating the distillation data by gradient descent, x′_i ← x′_i - d_i·∂L/∂x′_i, wherein d_i is the distance recorded in step 5.2.
Step 5.7, repeating steps 5.4-5.6 a preset number of times to obtain the distilled data X′.
Step 5.8, taking the distilled data set X′ as the new query data set, and repeating step 3 and step 4.
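A sketch of steps 5.3-5.8, built on the helpers above. The squared output difference used for L and the use of d_i as a per-point step size follow the reconstruction of the formulas given in the text and should be read as assumptions; torch.func.functional_call is used here so that the one-step parameter update of the distillation model stays differentiable with respect to the distillation data.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def distill_query_data(substitute, x_prime, y_prime, d, test_x, eta=0.01, rounds=10):
    """Steps 5.3-5.8 sketch: refine the generated data X' so that a distillation
    model M' (one gradient step away from the substitution model M) diverges
    from M on held-out data; the refined data become the next query set."""
    x_prime = x_prime.clone().detach().requires_grad_(True)        # step 5.3: initial distillation data
    step = d.view(-1, *([1] * (x_prime.dim() - 1)))                 # broadcastable d_i
    for _ in range(rounds):                                         # step 5.7
        # steps 5.3-5.4: M' starts from M's parameters theta_0, then takes one step on X'
        theta0 = {name: p.detach().clone().requires_grad_(True)
                  for name, p in substitute.named_parameters()}
        inner = F.cross_entropy(functional_call(substitute, theta0, (x_prime,)), y_prime)
        grads = torch.autograd.grad(inner, list(theta0.values()), create_graph=True)
        theta1 = {name: p - eta * g for (name, p), g in zip(theta0.items(), grads)}
        # step 5.5: output difference between M (theta_0) and M' (theta_1) on T_m
        out_m = substitute(test_x).detach()
        out_mp = functional_call(substitute, theta1, (test_x,))
        L = ((out_m - out_mp) ** 2).sum()
        # step 5.6: move the distillation data along the gradient of L, scaled by d_i
        grad_x = torch.autograd.grad(L, x_prime)[0]
        with torch.no_grad():
            x_prime -= step * grad_x
    return x_prime.detach()                                         # new query set for step 5.8
```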
As described above, the present invention can be preferably realized.
The above detailed description is intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above detailed description is only exemplary of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (3)

1. A model stealing attack method based on gradient-driven data generation is characterized by comprising the following steps:
step 1, performing k-means clustering on the public data set: after the number of clusters and the centroids are initialized, each piece of data is assigned to a set according to its distance to the centroids; the final clusters are obtained by iteratively recalculating the centroids of the sets, and the centroids of the clusters then form the query data set;
step 2, labeling the data by using a target model, inputting the data in the query data set into the target model to obtain a label corresponding to each piece of data, and training a substitution model by using the data set consisting of data-label pairs;
step 3, comparing the difference between the substitution model and the target model by using a cross entropy loss function, respectively inputting the same samples into the target model and the substitution model, calculating cross entropy loss according to the output of the target model and the substitution model, and determining whether to continue generating query data and training the substitution model according to the size of the cross entropy loss;
and step 4, generating new data between each piece of data in this round's training set and its nearest differently labeled data, taking the new data as the initial distillation data, distilling the data by using the output difference between the substitution model and the distillation model, taking the finally distilled data as the next query data set, and repeating the processes of label query and substitution model training.
2. The gradient-driven data generation-based model stealing attack method according to claim 1, characterized in that: in step 4, the data set X′ is generated by generating, for each piece of data x_i, new data x′_i between x_i and its nearest differently labeled data x_j according to the formula x′_i = αx_i + (1-α)x_j, wherein x_i ∈ X, x_j ∈ X_train, and α ∈ (0.5, 1).
3. The gradient-driven data generation-based model stealing attack method according to claim 1, characterized in that: in step 4, the parameters θ_0 of the substitution model are used in place of randomly initialized model parameters during the distillation of the data, the loss function is calculated from the output difference of the substitution model and the distillation model as L = ||f_M(T_m, θ_0) - f_M′(T_m, θ_1)||_2^2, and the distillation data are updated according to the formula x′_i ← x′_i - d_i·∂L/∂x′_i.
CN202210047190.7A 2022-01-14 2022-01-14 Model stealing attack method based on gradient driving data generation Pending CN114547102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210047190.7A CN114547102A (en) 2022-01-14 2022-01-14 Model stealing attack method based on gradient driving data generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210047190.7A CN114547102A (en) 2022-01-14 2022-01-14 Model stealing attack method based on gradient driving data generation

Publications (1)

Publication Number Publication Date
CN114547102A 2022-05-27

Family

ID=81671168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210047190.7A Pending CN114547102A (en) 2022-01-14 2022-01-14 Model stealing attack method based on gradient driving data generation

Country Status (1)

Country Link
CN (1) CN114547102A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680727A (en) * 2023-08-01 2023-09-01 北京航空航天大学 Function stealing defense method for image classification model
CN116680727B (en) * 2023-08-01 2023-11-03 北京航空航天大学 Function stealing defense method for image classification model
CN117496118A (en) * 2023-10-23 2024-02-02 浙江大学 Method and system for analyzing steal vulnerability of target detection model

Similar Documents

Publication Publication Date Title
Farahnakian et al. A deep auto-encoder based approach for intrusion detection system
Shen et al. BBAS: Towards large scale effective ensemble adversarial attacks against deep neural network learning
Ren et al. Grnn: generative regression neural network—a data leakage attack for federated learning
CN111461155A (en) Apparatus and method for training classification model
Zhao et al. A malware detection method of code texture visualization based on an improved faster RCNN combining transfer learning
CN114547102A (en) Model stealing attack method based on gradient driving data generation
Yuan et al. IoT malware classification based on lightweight convolutional neural networks
CN112668482A (en) Face recognition training method and device, computer equipment and storage medium
Wu et al. Genetic algorithm with multiple fitness functions for generating adversarial examples
Chaaraoui et al. Human action recognition optimization based on evolutionary feature subset selection
Kenaza et al. An efficient hybrid svdd/clustering approach for anomaly-based intrusion detection
CN113656700A (en) Hash retrieval method based on multi-similarity consistent matrix decomposition
Ponce-López et al. Gesture and action recognition by evolved dynamic subgestures
KR20190028880A (en) Method and appratus for generating machine learning data for botnet detection system
Acharya et al. EfficientNet-based convolutional neural networks for malware classification
Sani et al. Learning a new distance metric to improve an SVM-clustering based intrusion detection system
Bui et al. A clustering-based shrink autoencoder for detecting anomalies in intrusion detection systems
Dong et al. Kinship classification based on discriminative facial patches
Chao et al. Research on network intrusion detection technology based on dcgan
Soliman et al. A network intrusions detection system based on a quantum bio inspired algorithm
Smirnov et al. Prototype memory for large-scale face representation learning
Li et al. Online alternate generator against adversarial attacks
CN111160077A (en) Large-scale dynamic face clustering method
Genç et al. A taxonomic survey of model extraction attacks
CN115567224A (en) Method for detecting abnormal transaction of block chain and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination