CN118400185B - Detection method for model inversion attack - Google Patents

Detection method for model inversion attack Download PDF

Info

Publication number
CN118400185B
CN118400185B (application CN202410822970.3A)
Authority
CN
China
Prior art keywords
honey
model
point
initial
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410822970.3A
Other languages
Chinese (zh)
Other versions
CN118400185A (en)
Inventor
田志宏
胥迤潇
李默涵
刘园
方滨兴
苏申
鲁辉
仇晶
孙彦斌
张乐君
谭庆丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202410822970.3A priority Critical patent/CN118400185B/en
Publication of CN118400185A publication Critical patent/CN118400185A/en
Application granted granted Critical
Publication of CN118400185B publication Critical patent/CN118400185B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a detection method for model inversion attacks, belonging to the fields of network security and artificial-intelligence security, and comprising the following steps: randomly generate an initial honeypoint and detect, based on the behavior logic of an attacker, whether the initial honeypoint meets the requirements; if it does, output it as a deep honeypoint, and otherwise update it; deploy the deep honeypoint to obtain a protected model; monitor the state of the deep honeypoint, and detect a model inversion attack when the deep honeypoint is triggered. The detection method improves both the detection capability and the detection accuracy for model inversion attacks without noticeably affecting the function of the protected model.

Description

Detection method for model inversion attack
Technical Field
The invention relates to the fields of network security and artificial-intelligence security, and in particular to a detection method for model inversion attacks.
Background
In recent years, deep learning models have approached or even surpassed human performance on a variety of tasks and are widely used in production and daily life. With the emergence of large multimodal language-vision models, deep learning has penetrated fields such as energy and healthcare, and its application in the security field has also attracted broad public attention.
Deep learning models require large amounts of training data, which may contain high-value attack targets such as personal privacy data or trade secrets. An attacker may try various means to extract the training data from a deployed model, so protecting the training data is an important problem in the field of deep learning.
Among the many attack modes, the model inversion attack poses a serious threat to deep learning models: by iteratively querying the model, an attacker recovers characteristics of the model's training data, including but not limited to obtaining samples from the training set and reconstructing its privacy-sensitive features, thereby leaking model data.
Defending against model inversion attacks has therefore become an important research topic. Existing defenses mainly take two forms: restricting the output information or perturbing the output information. However, restriction-based defenses offer limited protection and are easily bypassed by an attacker, while perturbation-based defenses degrade the normal function of the model. A solution to these problems is therefore desirable.
Disclosure of Invention
The invention aims to provide a detection method for model inversion attacks that mitigates both problems: the ease with which output-restriction defenses are bypassed, and the damage output-perturbation defenses cause to the normal function of the model.
The invention provides a detection method for model inversion attacks, comprising the following steps:
Randomly generate an initial honeypoint and detect, based on the behavior logic of model inversion attackers, whether it meets the requirements: using several generation models, iteratively generate content guided by that behavior logic until the output of every generation model contains the initial honeypoint; when the total number of generation iterations is below a preset value, the initial honeypoint meets the requirements and is output as a deep honeypoint, and otherwise the initial honeypoint is updated;
Deploy the deep honeypoint into the original model to obtain a protected model;
When the deep honeypoint in the protected model is triggered, the protected model is under a model inversion attack.
The detection method for model inversion attacks described above has the beneficial effects that:
By characterizing the behavior logic of model inversion attackers, the invention generates deep honeypoints and deploys them into the model through fine-tuning, achieving efficient and accurate detection of model inversion attacks without noticeably affecting the usability of the model.
Optionally, the behavior logic of the attacker includes:
By iteratively accessing the model, the attacker takes the training set data as the target and, guided by gradients, reconstructs the training set data or data highly similar to it.
Optionally, when detecting the initial honey point, the method includes:
Iteratively optimizing the input variables of several generation models with the initial honeypoint as the target until every generation model generates the initial honeypoint; recording the total number of iterations over all generation models; when this total is below a preset value, outputting the initial honeypoint as a deep honeypoint, and otherwise updating it.
Optionally, iteratively optimizing the input variables of the generation model comprises:
The generation model generates content from the variables; when the generated result differs from the initial honeypoint, the distance between the result and the initial honeypoint is calculated, a first gradient of the distance is calculated, and the input variables are updated based on the first gradient and generation is repeated until the result equals the initial honeypoint.
Optionally, when the number of iterative updates reaches a preset threshold during iterative optimization of the input variables and the final result is still not the same as the initial honeypoint, a second gradient of the distance with respect to the initial honeypoint is calculated and the initial honeypoint is updated based on the second gradient; the final result becomes the new input variable of the generation model, and the updated initial honeypoint becomes the generation model's new target.
The detection method for model inversion attacks described above has the beneficial effects that:
based on the characterization of the behavior logic of model inversion attackers, whether the deep honeypoint can be generated within an effective number of iterations is checked, ensuring that the deep honeypoint is deployed on a critical path of the model inversion attack and effectively improving the rate at which the deep honeypoint detects such attacks.
Optionally, when the deep honey point is deployed, the method comprises:
Randomly sampling a small amount of data from the training set of the original model to obtain a fine-tuning dataset, and removing the sampled data from the training set to obtain an unprocessed dataset; combining the deep honeypoint with every item in the fine-tuning dataset to obtain a honeypoint dataset, and fine-tuning the original model on the honeypoint dataset together with the unprocessed dataset to obtain the protected model.
The detection method for model inversion attacks described above has the beneficial effects that:
The deep honeypoint is deployed into the protected model through fine-tuning, and the unprocessed dataset ensures that the normal performance of the model is not noticeably affected.
Optionally, detecting whether the deep honey point is triggered includes:
Recording the intermediate-layer features of the deep honeypoint in the protected model; when the protected model generates content from input information, extracting the intermediate-layer features of the input information during generation and comparing them with the intermediate-layer features of the deep honeypoint.
Optionally, comparing the intermediate-layer features of the input information with those of the deep honeypoint includes:
Calculating the cosine similarity between the intermediate-layer features of the input information and those of the deep honeypoint; when the cosine similarity exceeds a preset value, the deep honeypoint is triggered and the input information is judged to be a model inversion attack.
The detection method for model inversion attacks described above has the beneficial effects that: the state of the deep honeypoint is monitored in real time while the model generates content, and generation is stopped promptly when the deep honeypoint is triggered, ensuring the safety of the model's training set data.
Drawings
FIG. 1 is a flow chart of the detection method for model inversion attacks;
FIG. 2 is a flow chart of deploying a deep honeypoint into a protected model;
FIG. 3 is a flow chart of detecting a model inversion attack on the protected system based on the deep honeypoint.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. Unless otherwise defined, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs.
Referring to fig. 1, the detection method for model inversion attacks provided by the invention comprises the following steps:
S1, randomly generating an initial honeypoint, detecting whether it meets the requirements based on the behavior logic of an attacker, outputting it as a deep honeypoint if it does, and otherwise updating it;
S2, deploying the deep honeypoint to obtain a protected model;
S3, monitoring the state of the deep honeypoint, and detecting a model inversion attack when the deep honeypoint is triggered.
In fact, in step S1, the initial honeypoint is generated in the same format as the training set data. Taking an image classification scenario as an example: for a given original model and training dataset whose samples have size 3×255×255, the initial honeypoint is also generated with size 3×255×255; its content, i.e. the pixel values, is random, and generation of the initial honeypoint is then complete.
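The generation step above can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation; the function name and the use of uniform pixel values in [0, 1) are assumptions for this example:

```python
import numpy as np

def generate_initial_honeypoint(shape=(3, 255, 255), seed=None):
    """Generate an initial honeypoint: a random tensor whose format
    matches the training-set data (here a 3x255x255 image)."""
    rng = np.random.default_rng(seed)
    # The content of the honeypoint, i.e. the pixel values, is random.
    return rng.random(shape)

honeypoint = generate_initial_honeypoint(seed=0)
```

Passing a seed makes the honeypoint reproducible across runs, which is convenient when the same candidate must be checked against several generation models.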
In fact, in step S1, whether the initial honeypoint meets the requirements is detected based on the behavior logic of the attacker, which is: by iteratively accessing the model, the attacker takes the training set data as the target and, guided by gradients, reconstructs the training set data or data highly similar to it.
Specifically, taking an image classification scene as an example:
When a model inversion attacker attacks a generative model, the attack target is the model's training set data: the attacker attempts to reconstruct a training sample x, or to acquire data x' highly similar to x. By analyzing the reconstructed data, the attacker can obtain sensitive information contained in the training set and in the model itself, causing information leakage;
Besides the attack target being the training set data, the attack mode of the attacker also follows this behavior logic: in the initial stage of the attack, the attacker accesses the generation model and inputs an initial variable to generate content. Because the initial variable contains only the most basic generation conditions, the output produced from it often lacks sufficiently distinctive features, or contains only features common to the same class of data; for example, when generating face images, the initial result may be only a blurred generic face, or even just the general outline of a face;
After the initial content generation is complete, the attacker enters the next stage and begins to iteratively access the generation model based on the initial result. During this iterative access, conditions and details are continuously added to the input variables, inducing the generation model to produce results that are progressively more distinctive and specific. Taking face images as an example: the initial result is only a blurred generic frontal face, and as the attacker iterates, image details accumulate, adding gender, facial expression, facial features, hair color, face orientation, and other details to the initial generic face;
Finally, after multiple rounds of iterative access, the attacker guides the model to output an image whose similarity to the training set data reaches 95%, or whose key characteristics are consistent with it; at this point the training set data of the generative model has been leaked by the attacker's model inversion attack. Taking face images as an example: after multiple rounds of iterative access, the attacker guides the generation model to produce an image of a particular man that is highly similar to that man's image in the training set data, and the information in the model has then been revealed.
In fact, in step S1, the detection of the initial honeypoint includes:
Iteratively optimizing the input variables of several generation models with the initial honeypoint as the target until every generation model generates the initial honeypoint; recording the total number of iterations over all generation models; when this total is below a preset value, outputting the initial honeypoint as a deep honeypoint, and otherwise updating it.
Specifically, the generation models used to check whether the initial honeypoint is qualified can be any generation models in practical use, such as StyleGAN and WGAN. In the invention, three generation models, denoted G_A, G_B and G_C, are used for the check; content is generated with the three models in turn, where G_A, G_B and G_C are arbitrary generation models and, in general, are distinct;
The purpose of detecting the initial honeypoint with several generation models is to check whether it can be generated within a prescribed number of iterations. When all three generation models can generate the initial honeypoint within a combined total of 300 iterations (on the order of 100 per model), the honeypoint has a high probability of being generated when a model inversion attacker attacks the model. Consequently, during the attacker's iterative access, the deep honeypoint appears in the generated content before the model outputs training set data, or content carrying training-set features, so the protected model can discover that it is under a model inversion attack before the attacker reaches the attack target. Therefore, when the generation models can produce the initial honeypoint within the prescribed number of iterations, the honeypoint meets the requirements, is output as a deep honeypoint, and, once deployed in the protected model, lies on a critical path of the model inversion attack.
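The qualification check described above can be outlined as follows. This is a sketch under stated assumptions: `invert` is a hypothetical stand-in for the per-model iterative optimization, returning the number of iterations it needed or `None` on failure; the budget of 300 follows the example in the text:

```python
def qualifies_as_deep_honeypoint(honeypoint, generators, invert, budget=300):
    """Return True when every generation model reproduces the honeypoint
    and the summed iteration counts stay below the shared budget."""
    total = 0
    for g in generators:
        iters = invert(g, honeypoint)  # iterations needed, or None on failure
        if iters is None:
            return False  # one model could not generate the honeypoint at all
        total += iters
    return total < budget
```

A honeypoint that fails this check is updated (see the update rule later in step S1) and re-checked against the same models.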
In fact, in step S1, iteratively optimizing the input variables of the generation model includes:
the generation model iteratively generates content from the input variable; while the generated result differs from the initial honeypoint, the input variable is updated and generation is repeated until the result equals the initial honeypoint;
When updating the input variable, the distance between the generated result and the initial honeypoint is first calculated, the gradient of that distance is calculated, and the input variable is updated based on the gradient.
Specifically, the purpose of the gradient-based update is to drive the input variable toward the initial honeypoint, so that the generated content gradually converges to the honeypoint and, after a certain number of iterative updates, the generation model produces content identical to the initial honeypoint.
Specifically, the generation model produces an output from the input variable, the distance between the output and the initial honeypoint is calculated, the gradient of this distance is obtained, the input variable is updated based on the gradient, and generation is repeated;
The distance and its gradient are computed as:

d = ||G(z) - h||,  g = ∂d/∂z

wherein h is the initial honeypoint, z is the input variable, G(z) is the output of the generation model G for the input variable z, d is the distance between the output and the initial honeypoint, and g is the gradient of the distance with respect to the input variable;
The expression for updating the input variable based on the gradient is:

z_{t+1} = z_t - η · sign(g)

wherein η is the learning rate, taken as 0.001, sign(·) is the sign function, t is the number of iterative updates, and z_{t+1} is the input variable obtained after the (t+1)-th update; when the result generated from the updated input variable is still not equal to the initial honeypoint h, the input variable continues to be updated iteratively.
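A runnable sketch of this sign-gradient update loop. The gradient of the distance is estimated here by finite differences purely for illustration; with a real generation model it would come from backpropagation. The identity "generator" in the test is likewise only a placeholder:

```python
import numpy as np

def invert_generator(G, h, lr=0.001, max_iters=200, tol=2e-3):
    """Iteratively update the input variable z with z <- z - lr * sign(g),
    where g estimates the gradient of d = ||G(z) - h|| w.r.t. z."""
    z = np.zeros_like(h)
    eps = 1e-6
    for t in range(max_iters):
        d = np.linalg.norm(G(z) - h)
        if d < tol:
            return z, t  # generated result matches the honeypoint
        g = np.zeros_like(z)
        for i in range(z.size):  # finite-difference gradient estimate
            zp = z.copy()
            zp.ravel()[i] += eps
            g.ravel()[i] = (np.linalg.norm(G(zp) - h) - d) / eps
        z = z - lr * np.sign(g)
    return z, None  # budget exhausted without reproducing the honeypoint
```

Returning the iteration count (or `None`) lets the caller sum counts across several generation models and compare the sum with the preset value.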
In fact, in step S1, when the number of iterative updates of the input variable reaches the preset threshold T and the final output y_T is still not equal to the initial honeypoint h, the initial honeypoint does not meet the requirements; the gradient of the distance with respect to the initial honeypoint is then computed, and the initial honeypoint is updated based on it. The final output y_T serves as the input variable for the generation model's next round of iteration, and the updated initial honeypoint serves as the generation model's new generation target;
Here the threshold T is taken as 100: if after 100 iterative updates the generation model still cannot produce the initial honeypoint within the allowed number of generations, the initial honeypoint cannot fulfill the function of a deep honeypoint and needs to be updated;
The distance to the final output and its gradient with respect to the initial honeypoint are computed as:

d' = ||y_T - h||,  g' = ∂d'/∂h

wherein y_T is the output of the generation model when the number of iterative updates reaches the preset threshold T, d' is the distance between the final output and the initial honeypoint, and g' is the gradient of the distance with respect to the initial honeypoint;
The expression for updating the initial honeypoint based on the gradient is:

h' = h - η · sign(g')

wherein h' is the updated initial honeypoint, which serves as the target of the generation model's next round of iterative generation; if h' still does not meet the requirements, it continues to be updated.
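The honeypoint update can be sketched as follows, assuming y_final denotes the generation model's final output when the update budget was exhausted. Since the gradient of ||y_final - h|| with respect to h has the sign of h - y_final, the sign-step moves h toward what the model can actually produce:

```python
import numpy as np

def update_honeypoint(h, y_final, lr=0.001):
    """Move the initial honeypoint one sign-step toward the final output
    y_final of the generation model, making it easier to generate."""
    d = np.linalg.norm(y_final - h)
    if d == 0.0:
        return h.copy()                 # already identical, nothing to update
    grad_h = (h - y_final) / d          # gradient of the distance w.r.t. h
    return h - lr * np.sign(grad_h)     # equals h + lr * sign(y_final - h)
```

The updated honeypoint then becomes the target of the next round of iterative generation, and the check is repeated until the requirement is met.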
In particular, the detection process is repeated with the three generation models G_A, G_B and G_C in turn until the sum of their iteration counts is below the preset value; the initial honeypoint then meets the requirements and is output as a deep honeypoint.
In fact, referring to fig. 2, in step S2, when deploying deep honey points, it includes:
Randomly sampling a small amount of data from the training set of the original model to obtain a fine-tuning dataset, and removing the sampled data from the training set to obtain an unprocessed dataset; combining the deep honeypoint with every item in the fine-tuning dataset to obtain a honeypoint dataset, and fine-tuning the original model on the honeypoint dataset together with the unprocessed dataset to obtain the protected model.
The fine-tuning dataset accounts for 5% of the full training set, and the remaining 95% of the training data forms the unprocessed dataset. When the deep honeypoint is combined with the data in the fine-tuning dataset, it is implanted into each item so that the item carries the deep honeypoint or its features; after combination with the deep honeypoint, the items in the fine-tuning data are labeled as the protected class and reassembled into the honeypoint dataset;
The original model is fine-tuned on the honeypoint dataset together with the unprocessed dataset to obtain the protected model. The 95% unprocessed dataset ensures that the normal performance of the protected model is not noticeably affected, while the 5% honeypoint dataset deploys the deep honeypoint into the protected model; by virtue of the model's generalization, the deep honeypoint is thereby placed on a critical path of model inversion attacks;
When the training set of the protected model contains multiple classes of data, the protected class is labeled as the same class as the training set data to be protected.
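The deployment step can be sketched as follows. This is a minimal NumPy illustration; the convex blend controlled by `alpha` is an assumption, since the text does not fix how the honeypoint is "combined" with the sampled data:

```python
import numpy as np

def build_finetune_sets(train_x, train_y, honeypoint, protected_label,
                        frac=0.05, alpha=0.5, seed=0):
    """Split off ~5% of the training set, blend the deep honeypoint into
    that sample, relabel it as the protected class, and return the
    honeypoint dataset plus the untouched 95% unprocessed dataset."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(train_x))
    k = max(1, int(frac * len(train_x)))
    sampled, rest = idx[:k], idx[k:]
    # Implant the honeypoint into each sampled item via a convex blend.
    honey_x = (1.0 - alpha) * train_x[sampled] + alpha * honeypoint
    honey_y = np.full(k, protected_label)
    return (honey_x, honey_y), (train_x[rest], train_y[rest])
```

Fine-tuning on the union of the two returned datasets then corresponds to obtaining the protected model: the 95% split preserves normal performance while the 5% split carries the honeypoint features.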
In fact, referring to fig. 3, in step S3, detecting whether the deep honey point is triggered includes:
Recording the intermediate-layer features of the deep honeypoint in the protected model; when the protected model generates content from input information, extracting the intermediate-layer features of the input information during generation and comparing them with the intermediate-layer features of the deep honeypoint.
Specifically, a deep learning model learns the patterns and features of the training data through training on large amounts of data, thereby gaining the ability to accurately predict or classify inputs. In the image domain, a deep learning model treats an image as a matrix of pixel values and obtains image features by analyzing the numbers in the matrix; in the language domain, a deep learning model learns from large amounts of text so that it can understand text semantics or predict subsequent content;
During training of the protected model on the honeypoint dataset containing the deep honeypoint, the model learns the intermediate-layer features of the deep honeypoint. Taking images as an example: as the multi-layer neural network learns, the features it captures become increasingly refined, and in the course of training the network learns intermediate-layer features of the deep honeypoint, such as the structural features of the image from the whole down to structural units, gradients, and so on;
When an attacker accesses the protected model, the model receives and interprets the input information and then generates content based on the features learned by the neural network and on the input. During this process, a detection module of the protected model compares the intermediate content produced during generation with the intermediate-layer features of the deep honeypoint learned during training, computing the cosine similarity between the two.
In fact, referring to fig. 3, in step S3, comparing the intermediate-layer features of the input information with those of the deep honeypoint includes:
Calculating the cosine similarity between the intermediate-layer features of the input information and those of the deep honeypoint. When the cosine similarity exceeds the preset threshold, the deep honeypoint is triggered and the input is judged to be a model inversion attack; when it is below the threshold, the input is judged to be a normal request and content generation proceeds normally.
Specifically, when the cosine similarity between the intermediate-layer features of the input information and those of the deep honeypoint exceeds the preset value of 0.8, the input's intermediate-layer features are highly similar to those of the deep honeypoint. If the protected model were to continue the normal content-generation flow, the final result would be highly similar to training data labeled as the protected class, and the model inversion attacker would achieve the goal of the attack by obtaining data highly similar or even identical to the training set data;
Therefore, during content generation, when the detection module finds that the cosine similarity between the intermediate-layer features of the input and those of the deep honeypoint exceeds 0.8, it judges the input to be a model inversion attack, terminates content generation of the protected model, blocks the attacker from obtaining the generated content, raises an alarm, and records the attack behavior in an attack alarm log.
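The trigger check reduces to a cosine-similarity comparison against the 0.8 threshold. A minimal sketch; the hook that extracts intermediate-layer features from a running model is model-specific and omitted here:

```python
import numpy as np

def honeypoint_triggered(feat_input, feat_honeypoint, threshold=0.8):
    """Return (triggered, similarity): triggered is True when the cosine
    similarity between the input's intermediate-layer features and the
    deep honeypoint's features exceeds the preset threshold."""
    a = feat_input.ravel()
    b = feat_honeypoint.ravel()
    sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sim > threshold, sim
```

When the first return value is True, the detection module terminates generation and logs the event, as described above.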
While embodiments of the present invention have been described in detail hereinabove, it will be apparent to those skilled in the art that various modifications and variations can be made to these embodiments. It is to be understood that such modifications and variations are within the scope and spirit of the present invention as set forth in the following claims. Moreover, the invention described herein is capable of other embodiments and of being practiced or of being carried out in various ways.

Claims (5)

1. A method for detecting a model inversion attack, comprising:
Randomly generating an initial honeypoint, and detecting, based on the behavior logic of model inversion attackers, whether the initial honeypoint meets the requirements: performing iterative content generation with a plurality of generation models based on the behavior logic until the output of every generation model contains the initial honeypoint; when the sum of the iterative generation counts is below a preset value, outputting the initial honeypoint as a deep honeypoint, and otherwise updating the initial honeypoint;
Randomly sampling a small amount of data from the training set data of an original model to obtain a fine-tuning dataset, and removing the sampled data from the training set to obtain an unprocessed dataset; combining the deep honeypoint with all data in the fine-tuning dataset to obtain a honeypoint dataset, and fine-tuning the original model based on the honeypoint dataset and the unprocessed dataset to obtain a protected model;
Recording the intermediate-layer features of the deep honeypoint in the protected model; when the protected model generates content based on input information, extracting the intermediate-layer features of the input information during generation, calculating the cosine similarity between the intermediate-layer features of the input information and those of the deep honeypoint, and, when the cosine similarity exceeds a preset value, triggering the deep honeypoint and judging the input information to be a model inversion attack.
2. The method of claim 1, wherein the behavioral logic of the attacker comprises:
By iteratively accessing the model, the attacker takes the training set data as the target and, guided by gradients, reconstructs the training set data or data highly similar to it.
3. The method according to claim 1, wherein detecting the initial honey point comprises:
Iteratively optimizing the input variables of the plurality of generation models with the initial honeypoint as the target until the plurality of generation models generate the initial honeypoint; recording the sum of the iteration counts of all generation models; when this sum is below a preset value, outputting the initial honeypoint as a deep honeypoint, and otherwise updating the initial honeypoint.
4. A method according to claim 3, wherein iteratively optimizing the input variables of the generated model comprises:
And generating contents based on variables by the generation model, calculating the distance between the generation result and the initial honey point when the generation result is different from the initial honey point, calculating a first gradient of the distance about the honey point, iteratively updating the input variables based on the first gradient, and regenerating until the generation result is the initial honey point.
5. The detection method according to claim 4, wherein, when the number of iterative updates reaches a preset threshold during the iterative optimization of the input variables and the final result still differs from the initial honey point, a second gradient of the distance between the initial honey point and the final result is calculated with respect to the initial honey point, and the initial honey point is updated based on the second gradient; the final result serves as the new input variable of the generative model, and the updated initial honey point serves as the new target of the generative model.
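Under one reading of claim 5 (an assumption: the "second gradient" is the gradient of the squared distance between the candidate honey point and the generator's final output, taken with respect to the honey point itself), the update can be sketched as:

```python
import numpy as np

def update_honey_point(honey, final_output, step=0.1):
    """When the generative model cannot reach the candidate within the
    iteration budget, nudge the candidate honey point toward the model's
    final output along the gradient of ||t - y||^2 w.r.t. t.
    The step size 0.1 is a hypothetical choice."""
    honey = np.asarray(honey, dtype=float)
    final_output = np.asarray(final_output, dtype=float)
    second_gradient = 2.0 * (honey - final_output)  # d/dt ||t - y||^2
    return honey - step * second_gradient
```

The updated candidate then becomes the generator's new target, so successive rounds pull the honey point into a region the attacker's optimization can actually reach.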
CN202410822970.3A 2024-06-25 2024-06-25 Detection method for model inversion attack Active CN118400185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410822970.3A CN118400185B (en) 2024-06-25 2024-06-25 Detection method for model inversion attack


Publications (2)

Publication Number Publication Date
CN118400185A CN118400185A (en) 2024-07-26
CN118400185B true CN118400185B (en) 2024-08-23

Family

ID=92006970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410822970.3A Active CN118400185B (en) 2024-06-25 2024-06-25 Detection method for model inversion attack

Country Status (1)

Country Link
CN (1) CN118400185B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114266041A (en) * 2021-12-15 2022-04-01 Method, device and system for improving the backdoor defense capability of a model
CN115719085A (en) * 2023-01-10 2023-02-28 武汉大学 Deep neural network model inversion attack defense method and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733292B2 (en) * 2018-07-10 2020-08-04 International Business Machines Corporation Defending against model inversion attacks on neural networks
CN115860112B (en) * 2023-01-17 2023-06-30 武汉大学 Model inversion method-based countermeasure sample defense method and equipment


Also Published As

Publication number Publication date
CN118400185A (en) 2024-07-26

Similar Documents

Publication Publication Date Title
CN111914256B (en) Defense method for machine learning training data under toxic attack
Gao et al. Design and evaluation of a multi-domain trojan detection method on deep neural networks
US20230022943A1 (en) Method and system for defending against adversarial sample in image classification, and data processing terminal
CN108717680B (en) Airspace image steganalysis method based on full-dense connection network
Chen et al. Backdoor attacks and defenses for deep neural networks in outsourced cloud environments
Li et al. Privacy-preserving lightweight face recognition
Butt et al. Towards secure private and trustworthy human-centric embedded machine learning: An emotion-aware facial recognition case study
Gong et al. Deepfake forensics, an ai-synthesized detection with deep convolutional generative adversarial networks
CN116488942B (en) Back door safety assessment method for intelligent voiceprint recognition system
Jiang et al. Research progress and challenges on application-driven adversarial examples: A survey
Huang Network Intrusion Detection Based on an Improved Long‐Short‐Term Memory Model in Combination with Multiple Spatiotemporal Structures
Xiao et al. A multitarget backdooring attack on deep neural networks with random location trigger
CN113011307A (en) Face recognition identity authentication method based on deep residual error network
CN115146055B (en) Text universal countermeasure defense method and system based on countermeasure training
CN115640609A (en) Feature privacy protection method and device
CN115758337A (en) Back door real-time monitoring method based on timing diagram convolutional network, electronic equipment and medium
CN115546003A (en) Back door watermark image data set generation method based on confrontation training network
CN113297574B (en) Activation function adaptive change model stealing defense method based on reinforcement learning reward mechanism
Yang et al. A general steganographic framework for neural network models
CN114003909A (en) Back door attack defense method and system based on training process deconstruction
Jain et al. Deep perceptual hashing algorithms with hidden dual purpose: when client-side scanning does facial recognition
CN118400185B (en) Detection method for model inversion attack
CN116484274A (en) Robust training method for neural network algorithm poisoning attack
CN113159317B (en) Antagonistic sample generation method based on dynamic residual corrosion
CN114021136A (en) Back door attack defense system for artificial intelligence model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant