CN115935179A

CN115935179A - Model Stealing Detection Method Combining Training Set Data Distribution and W Distance

Info

Publication number: CN115935179A
Application number: CN202211346069.0A
Authority: CN
Inventors: 罗森林; 张辰龙; 潘丽敏; 陆永鑫; 张笈
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2022-10-31
Filing date: 2022-10-31
Publication date: 2023-04-07

Abstract

The invention relates to a model stealing detection method that combines training set data distribution with the W (Wasserstein) distance, and belongs to the technical field of computer and information science. First, the training set and the query set are reduced in dimension using a VAE. Second, the probability distribution of the query set is estimated by maximum likelihood estimation, and several groups of samples to be detected are drawn from that distribution. Then, for each group of samples to be detected, the same number of reference samples is drawn at random from the training set, and the W distance between the group and its reference samples is calculated. Finally, all W distances are combined in a weighted sum, using the ratio of the number of categories present in the reference samples to the total number of categories as the weight, and model stealing is judged to be detected when the weighted result exceeds a detection threshold. The invention thus provides a model stealing detection method tied to the training set data distribution: it considers the distribution characteristics of both the query set and the training set samples, improves the W distance calculation, and effectively improves the accuracy of model stealing detection.

Description

Model stealing detection method combining training set data distribution and W distance

Technical Field

The invention relates to a model stealing detection method combining training set data distribution and W distance, and belongs to the technical field of computers and information science.

Background

A model stealing attack is a malicious behavior that steals the functionality of a model or imitates its decision boundary. The stolen models have usually cost their owners a great deal of time and money to train and have considerable commercial value. Once a model is stolen, the rights and interests of its owner are damaged, and the stolen model also serves as a springboard for adversarial attacks. Research on high-accuracy model stealing detection therefore has important theoretical significance and practical value.

Existing model stealing detection methods are mainly based on abnormal sample detection. To improve attack efficiency, an attacker may synthesize samples from a small number of samples in the training set so that the synthesized samples lie closer to the classification boundary of the target model, or may use the prediction vectors of the target model as feedback and train a substitute model on a related data set or on randomly generated vectors. Existing research therefore focuses on mining the change in query sample distribution caused by abnormal samples. For example, on the premise that the distance between two randomly drawn points in a finite space follows a normal distribution, one approach analyzes whether the distances within a group of query samples follow a normal distribution; another uses the K-nearest-neighbor algorithm to judge whether a query sample has too many neighboring samples. However, these detection methods do not change with the model: once an attacker has successfully stolen one model, the same approach can be used to steal many others. The detection criterion is also simple, so an attacker can infer it after repeated attempts and then bypass it. The detector cannot respond to subsequent attacks, and detection accuracy suffers greatly.

In summary, existing model stealing detection methods use a simple, fixed detection criterion that is easily bypassed once an attacker has inferred it. The invention therefore provides a model stealing detection method combining training set data distribution and W distance.

Disclosure of Invention

The invention aims to provide a model stealing detection method combining training set data distribution and W distance, aiming at the problem that the detection standard is fixed and easy to infer during model stealing detection.

The design principle of the invention is as follows. First, a VAE model is trained on the training data set, and the outputs of the VAE for the training set form the dimension-reduced data set S. Second, a group of query samples is fed to the VAE model to obtain the dimension-reduced data set S' of the query samples; the probability distribution of S' is estimated by maximum likelihood estimation, and k groups of samples to be detected, each of capacity D, are drawn from that distribution. Third, for each group of data sampled from S', a reference sample group of capacity D is drawn at random from S to form a W distance calculation pair, and the W distance, namely the Wasserstein distance, between the two groups is calculated. Finally, the W distances are combined in a weighted calculation, using the ratio of the number of categories in the reference samples to the total number of categories as the weight, and the result is used to judge the query behavior.

The technical scheme of the invention is realized by the following steps:

Step 1, training a VAE model and a target model on the training data set, and obtaining the dimension-reduced data set S of the training data set with the VAE model.

Step 1.1, constructing the VAE model framework.

Step 1.2, determining the VAE model loss function.

Step 1.3, encoding the training set data with the VAE model to obtain the dimension-reduced data, which forms the data set S.

Step 2, reducing the dimension of the query data with the VAE model, calculating the probability distribution of each dimension based on maximum likelihood estimation, and sampling multiple groups of data from the probability distributions.

Step 2.1, maintaining a queue m of length D for the input query samples.

Step 2.2, reducing each input sample to dimension h with the VAE model and adding it to queue m; when m is full, the sample at the head of the queue is removed and the new sample is added to the tail.

Step 2.3, the VAE model extracts the feature information of the query samples to the greatest extent, and the dimensions of the dimension-reduced data are mutually independent. The probability density and probability distribution of each dimension of the data in queue m are therefore calculated based on maximum likelihood estimation; one group of data is obtained by sampling from the h probability distributions, and the operation is repeated k times.

Step 3, for each group of data obtained in Step 2.3, randomly sampling a data group of the same capacity from S to form a Wasserstein distance calculation pair.

Step 4, calculating the Wasserstein distance of each pair of data groups from Step 3, and computing a weighted sum of the results, weighted by the ratio of the number of classes among the reference samples of each pair to the total number of classes, to obtain the final distance W.

Step 5, judging the query behavior using the final distance W.

Advantageous effects

Compared with existing model stealing detection methods, the method combines the training set data distribution with the W distance as the measure of distance between statistical distributions. First, compared with the commonly used KL divergence and JS divergence, the W distance can still measure the distance between the distributions of two sample sets when the two sets overlap only slightly, which improves detection stability. Second, the method takes the training set distribution as the detection criterion. The training set of a model contains a large number of samples, has a complex distribution, and is not released to the public, and different models have different training sets, so the resulting detection criterion is difficult to bypass. Third, the method takes the characteristics of model stealing fully into account and assumes that the samples used in normal queries follow approximately the same distribution as the training set samples, so it can cope effectively with a variety of model stealing attacks. Fourth, after the probability distribution of the query sample data has been calculated, multiple groups of samples are drawn from it, which reflects the distribution characteristics of the query samples fully and facilitates the distribution judgment. Finally, the result of each group is weighted, which reduces the influence of large W distance values caused by too few categories and improves detection accuracy.
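The first advantage can be illustrated with a standard textbook example that is not taken from the patent itself: two point masses on the real line whose supports do not overlap. The symbols P, Q and \theta are introduced here only for illustration.

```latex
\begin{align*}
P &= \delta_0, \qquad Q = \delta_\theta, \qquad \theta \neq 0 \\
\mathrm{KL}(P \,\|\, Q) &= +\infty \\
\mathrm{JS}(P, Q) &= \log 2 \\
W(P, Q) &= |\theta|
\end{align*}
```

The KL divergence is infinite and the JS divergence is the same constant for every nonzero \theta, so neither indicates how far apart the two distributions are, whereas the W distance grows in proportion to the separation.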

Drawings

FIG. 1 is a schematic diagram of a model stealing detection method based on W distance according to the present invention.

Detailed Description

In order to better illustrate the objects and advantages of the present invention, embodiments of the method of the present invention are described in further detail below with reference to examples.

The experimental data come from several image classification data sets: the 10-class clothing data set Fashion-MNIST, the 10-class small-object data set CIFAR10, the SVHN data set of Google Street View house numbers, and the GTSRB traffic sign data set. The training set and the test set are split according to the proportion of 9.

Three attack methods, JBDA, KnockoffNets and MAZE, are used in the experiments, and each has a different emphasis. JBDA synthesizes samples from a small number of training set samples in order to approach the classification boundary of the target model. KnockoffNets uses a larger data set that may be related to the target model and completes the theft by using the output of the target model as feedback. MAZE uses no samples related to the training set; it trains a data generator with the objective of maximizing the difference between the output vectors of the clone model and those of the target model, and completes the theft in this way.

The experiments use accuracy to evaluate the model stealing detection results. Accuracy is calculated as shown in formula (1):

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

where TP is the number of stealing behaviors judged to be stealing behaviors, TN is the number of normal behaviors judged to be normal behaviors, FP is the number of normal behaviors judged to be stealing behaviors, and FN is the number of stealing behaviors judged to be normal behaviors.
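As a minimal sketch of the accuracy metric, the following Python counts how many verdicts agree with the ground truth. The function name `detection_accuracy` and the toy verdicts are hypothetical and are not experimental data from the patent.

```python
def detection_accuracy(labels, predictions):
    """Fraction of query sequences whose verdict matches the ground truth.

    labels / predictions: sequences of booleans, True meaning "model stealing".
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one labelled query sequence is required")
    # Correct verdicts are stealing flagged as stealing plus normal passed as normal.
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical verdicts for six query sequences (True = stealing).
truth = [True, True, True, False, False, False]
verdict = [True, True, False, False, False, True]
accuracy = detection_accuracy(truth, verdict)
```

In this toy case four of the six verdicts agree with the ground truth.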

The experiments are carried out on a computer and a server. The computer is configured with an Intel i7-8750H CPU at 2.20 GHz, 8 GB of memory, and 64-bit Windows 10. The server is configured with an E7-4820 v4 CPU, 256 GB of RAM, and 64-bit Ubuntu Linux.

The specific process of the experiment is as follows:

Step 1, training a VAE model and a target model on the training data set, and obtaining the dimension-reduced data of the training data set with the VAE model to form the data set S.

Step 1.1, constructing a VAE model framework.

Step 1.2, determining the VAE model loss function, as shown in formula (2) and formula (3).

$$\mathcal{L}_{rec} = \mathbb{E}_{x \in D^{train}} \left[ \lVert x - \hat{x} \rVert^2 \right] \tag{2}$$

$$\mathcal{L}_{KL} = \mathbb{E}_{x \in D^{train}} \left[ \mathrm{KL}\left( q(z \mid x) \,\|\, p(z) \right) \right] \tag{3}$$

In the formulas, D^train is the training data set and \hat{x} is the output of the VAE model for input x. \mathcal{L}_{rec} is the standard autoencoder reconstruction loss, computed as the expectation over D^train of the squared difference between a sample before and after it passes through the VAE model. \mathcal{L}_{KL} uses the KL divergence to constrain the encoded data to follow the given distribution p(z); it is computed as the expectation over D^train of the KL divergence between the encoded distribution q(z | x) of each sample and the given distribution p(z).

Step 1.3, encoding the training set data with the VAE model to form the data set S.
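The two loss terms of Step 1.2 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes a diagonal Gaussian encoder that outputs a mean `mu` and a log-variance `log_var`, and a standard normal prior, neither of which is specified in the text. The function names are hypothetical.

```python
import math


def vae_loss(x, x_reconstructed, mu, log_var):
    """Per-sample VAE loss terms: squared reconstruction error and KL term.

    Assumes a diagonal Gaussian encoder N(mu, exp(log_var)) and a standard
    normal prior N(0, I); both are assumptions of this sketch.
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.
    kl = 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
    return reconstruction, kl


def mean_vae_loss(batch):
    """Average both terms over a batch of (x, x_hat, mu, log_var) tuples."""
    terms = [vae_loss(*sample) for sample in batch]
    rec = sum(t[0] for t in terms) / len(terms)
    kl = sum(t[1] for t in terms) / len(terms)
    return rec, kl, rec + kl


# Toy sample: the encoder output already equals the prior, so the KL term is zero.
rec_term, kl_term = vae_loss([1.0, 0.0], [0.5, 0.0], [0.0, 0.0], [0.0, 0.0])
```

Averaging over the batch stands in for the expectation over the training set. How the two terms are weighted against each other is not stated in the text, so the sketch simply adds them.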

Step 2, reducing the dimension of the query data with the VAE model, calculating the probability distribution of the query samples based on maximum likelihood estimation, and sampling multiple groups of data from the probability distribution.

Step 2.2, when a query sample is input, it is reduced by the VAE model to a vector of dimension h and added to queue m. When queue m is full, the sample at the head of the queue is removed and the new sample is added to the tail. Once queue m is full, the query behavior is judged each time a sample is input, which realizes real-time detection of model stealing.

Step 2.3, the VAE model fully extracts the sample features, and the dimensions of the dimension-reduced samples are mutually independent, so the probability density can be obtained for each dimension separately. Assume that the data in each dimension follows a parametric distribution f(x; \theta), and let the sample data in that dimension be x_1, x_2, ..., x_D. The parameter \theta is taken as the value at which formula (4) reaches its maximum:

$$L(\theta) = \prod_{j=1}^{D} f(x_j; \theta) \tag{4}$$

This gives the probability density of the query samples, from which the probability distribution in each dimension of the query samples, P_1, P_2, ..., P_h, is calculated.

Step 2.4, sampling once from each P_i (i = 1, ..., h) to obtain one group of data, and repeating the sampling k times to obtain k groups of data, denoted A_1, A_2, ..., A_k.
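Step 2 can be sketched with the Python standard library as follows. The parametric family is not named in the text, so a Gaussian per dimension is assumed here. The values of `D` and `K`, the random stand-in for VAE-encoded queries, and all function names are hypothetical. Each sampled group is given D vectors, following the statement that each group has capacity D.

```python
import math
import random
from collections import deque

D = 50   # queue length (hypothetical value)
K = 10   # number of sampled groups (hypothetical value)

queue = deque(maxlen=D)  # appending to a full deque drops the oldest entry


def fit_gaussian_mle(values):
    """Maximum likelihood estimates (mean, std) of a 1-D Gaussian."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # the MLE divides by n
    return mean, math.sqrt(variance)


def sample_groups(queue, k, rng):
    """Fit one Gaussian per latent dimension, then draw k groups of len(queue) vectors."""
    h = len(queue[0])
    params = [fit_gaussian_mle([z[d] for z in queue]) for d in range(h)]
    return [
        [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(len(queue))]
        for _ in range(k)
    ]


rng = random.Random(0)
# Stand-in for VAE-encoded queries: 60 random 3-dimensional latent vectors.
for _ in range(60):
    queue.append([rng.gauss(0.0, 1.0) for _ in range(3)])

groups = sample_groups(queue, K, rng)
```

Fitting each dimension separately relies on the independence of the latent dimensions stated in Step 2.3.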

Step 3, for each A_i, random sampling is performed once in the data set S with a sample capacity of D, and the resulting data group is denoted B_i.

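Step 4, calculating the Wasserstein distance W_i between A_i and B_i according to the Wasserstein distance formula, and computing a weighted sum of the W_i.

Step 4.1, calculating the Wasserstein distance W_i between A_i and B_i according to formula (5):

$$W_i = W(A_i, B_i) = \inf_{\gamma \in \Pi(A_i, B_i)} \mathbb{E}_{(x, y) \sim \gamma} \left[ \lVert x - y \rVert \right] \tag{5}$$

where \Pi(A_i, B_i) is the set of joint distributions whose marginals are the distributions of A_i and B_i.

Step 4.2, denoting the total number of classes in the training data set as T and the number of classes contained in B_i as t_i, the weighted Wasserstein distance W is calculated according to formula (6):

$$W = \sum_{i=1}^{k} \frac{t_i}{T} W_i \tag{6}$$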
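Step 4 can be sketched as follows, with two simplifications that are assumptions of this example rather than statements of the patent. The distance between two h-dimensional groups is approximated by averaging the one-dimensional Wasserstein distances of each dimension, and the weighted result is taken to be the plain sum of the W_i multiplied by t_i / T. All function names and toy values are hypothetical.

```python
def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D samples.

    For equal-size empirical distributions in one dimension, the optimal
    transport plan matches sorted values, so W1 is the mean absolute gap.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def group_distance(group_a, group_b):
    """Average the per-dimension W1 distances (a simplification of this sketch)."""
    h = len(group_a[0])
    per_dim = [
        wasserstein_1d([v[d] for v in group_a], [v[d] for v in group_b])
        for d in range(h)
    ]
    return sum(per_dim) / h


def weighted_distance(pairs, reference_labels, total_classes):
    """Sum of W_i weighted by t_i / T, where t_i counts the classes present in B_i."""
    total = 0.0
    for (group_a, group_b), labels in zip(pairs, reference_labels):
        t_i = len(set(labels))
        total += (t_i / total_classes) * group_distance(group_a, group_b)
    return total


# Two toy pairs of 1-dimensional groups with hypothetical class labels for B_i.
pairs = [
    ([[0.0], [1.0]], [[1.0], [2.0]]),   # W_1 = 1.0
    ([[0.0], [0.0]], [[3.0], [1.0]]),   # W_2 = 2.0
]
reference_labels = [[0, 1], [2, 2]]     # t_1 = 2, t_2 = 1
W = weighted_distance(pairs, reference_labels, total_classes=4)
```

The second pair has the larger raw distance but its reference group covers only one class, so its weight is halved relative to the first pair. This is the damping of large distances from class-poor reference groups that the method relies on.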

Step 5, setting a detection threshold \delta and judging the query behavior.

Step 5.1, selecting the threshold \delta so that the detector's false positive rate on normal query behavior equals 0.5%.

Step 5.2, comparing W with \delta. When W is greater than \delta, the distribution of the query samples is considered to differ too much from the distribution of the training set samples; the query behavior does not match the characteristics of normal query behavior and is judged to be model stealing.
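One way to realize Step 5 is to calibrate the threshold on weighted distances measured for known normal query traffic. The sketch below is an assumption about how such a calibration could be done, not a procedure stated in the patent. The function names and the synthetic calibration scores are hypothetical.

```python
import math


def choose_threshold(benign_distances, false_alarm_rate=0.005):
    """Pick delta so that at most the given fraction of benign weighted
    distances lies strictly above it."""
    ordered = sorted(benign_distances)
    allowed = math.floor(false_alarm_rate * len(ordered))  # benign scores allowed above delta
    return ordered[len(ordered) - 1 - allowed]


def is_model_stealing(weighted_distance, delta):
    """Flag the query window when W exceeds the detection threshold."""
    return weighted_distance > delta


# Hypothetical calibration scores: 1000 benign windows with W = 0.001, ..., 1.000.
benign = [i / 1000 for i in range(1, 1001)]
delta = choose_threshold(benign)
false_alarms = sum(is_model_stealing(w, delta) for w in benign)
```

With these synthetic scores, 5 of the 1000 benign windows exceed the threshold, which corresponds to the 0.5% false positive rate of Step 5.1.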

Test results: the method accurately detects the three kinds of attack behavior as well as normal query behavior, and reaches a model stealing detection accuracy of 97.3%, which indicates a good detection effect.

The above detailed description is intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above detailed description is only exemplary of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. The model stealing detection method combining training set data distribution and W distance is characterized by comprising the following steps of:

Step 1, training a VAE model and a target model on the training data set, and obtaining the dimension-reduced data set S of the training data set with the VAE model,

Step 1.1, constructing the VAE model framework,

Step 1.2, determining the VAE model loss function,

Step 1.3, encoding the training set data with the VAE model to obtain the dimension-reduced data, which forms the data set S,

Step 2, reducing the dimension of the query data with the VAE model, calculating the probability distribution of each dimension based on maximum likelihood estimation, and sampling multiple groups of data from the probability distributions,

Step 2.1, maintaining a queue m of length D for the input query samples,

Step 2.2, reducing each input sample to dimension h with the VAE model and adding it to queue m, removing the sample at the head of the queue when m is full and adding the new sample to the tail,

Step 2.3, the VAE model extracts the feature information of the query samples to the greatest extent, and the dimensions of the dimension-reduced data are mutually independent; therefore, the probability density and probability distribution of each dimension of the data in queue m are calculated based on maximum likelihood estimation, one group of data is obtained by sampling from the h probability distributions, and the sampling is repeated k times to obtain k groups of samples to be detected A_1, A_2, ..., A_k,

Step 3, for each group of data A_i obtained in Step 2.3, randomly sampling a reference sample group B_i of the same capacity from S to form a W distance calculation pair, namely a Wasserstein distance calculation pair,

Step 4, calculating the Wasserstein distance of each pair of data groups from Step 3, and computing a weighted sum of the results, weighted by the ratio of the number of classes of the samples in B_i to the total number of classes, to obtain the final distance W,

Step 5, judging the query behavior using the final distance W.

2. The model stealing detection method combining training set data distribution and W distance of claim 1, wherein: in Step 2.3, the probability distribution of each dimension of the dimension-reduced query samples is calculated by maximum likelihood estimation; the data in each dimension is assumed to follow a parametric distribution f(x; \theta), the sample data is x_1, x_2, ..., x_D, and the parameter \theta is taken as the value at which the following expression reaches its maximum,

$$L(\theta) = \prod_{j=1}^{D} f(x_j; \theta)$$

the probability density of the query samples is thereby obtained, and the probability distribution of each dimension of the query samples, P_1, P_2, ..., P_h, is calculated from it;

sampling is performed once in each P_i to obtain one group of data, and the sampling is repeated k times to obtain k groups of data.

3. The model stealing detection method combining training set data distribution and W distance of claim 1, wherein: in Step 4, the Wasserstein distance is used to calculate the distribution distance between the two groups of data, and the weighted calculation uses the ratio of the number of classes of the reference samples in each pair of data groups to the total number of classes, as shown in the following formula,

$$W = \sum_{i=1}^{k} \frac{t_i}{T} W(A_i, B_i)$$

where A_i and B_i are the sampled data groups, T is the total number of classes in the data set, and t_i is the number of data classes contained in B_i.