CN112733196A

CN112733196A - Privacy protection method and system for resisting member reasoning attack based on vector confusion

Info

Publication number: CN112733196A
Application number: CN202110358755.9A
Authority: CN
Inventors: 李红程; 华炜
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2021-04-02
Filing date: 2021-04-02
Publication date: 2021-04-30
Anticipated expiration: 2041-04-02
Also published as: CN112733196B

Abstract

The invention discloses a privacy protection method and a privacy protection system for resisting member reasoning attack based on vector confusion, which are used for sequentially carrying out confusion transformation meeting the vector availability constraint and the order-preserving requirement and disturbance transformation meeting the randomness, the vector availability constraint and the order-preserving requirement on a prediction vector output by a classification model, and returning a transformed noise vector as a model classification result. The method does not need to modify the target classification model and know the specific technical details of member reasoning attack, can be simply and quickly applied to the existing classification model, and has low cost and wide application range; vector availability constraints provide a flexible configuration scheme that balances prediction result availability and model privacy protectiveness; the added random disturbance obviously reduces the possibility that an attacker restores a prediction vector according to the noise vector, and the robustness of the method is improved; the order-preserving requirement guarantees that the model improves the member reasoning attack resistance under the condition of not reducing the prediction accuracy.

Description

Privacy protection method and system for resisting member reasoning attack based on vector confusion

Technical Field

The invention relates to the field of crossing of artificial intelligence and information security, in particular to a privacy protection method and a privacy protection system for resisting member reasoning attack based on vector confusion.

Background

At present, artificial intelligence technologies such as machine learning and deep learning are developed at a high speed, and artificial intelligence models are actively used in various fields to solve specific problems, for example, a deep learning model is used in the medical field to realize intelligent diagnosis, and a mass financial data training model is used in the financial field to realize automatic decision functions such as quantitative transaction. At present, an artificial intelligence model needs to be trained by using a large amount of data, the data often contain privacy information of a user, the training is directly performed by using the privacy information, and the model is likely to face a serious privacy disclosure risk.

The member reasoning attack is an intuitive and effective attack means for stealing privacy information from a trained target model. The prediction performance of the artificial intelligence model on the training data of the artificial intelligence model often differs from the non-training data, and the membership inference attack aims to utilize the difference to realize the function of judging whether the data sample is used for training the target model. A large number of researches show that an attacker only needs to initiate a data access request to a target model without knowing information such as a specific structure, a training mode and the like of the target model, and whether corresponding input data are used for training the model can be judged according to a prediction vector returned by the model.

Aiming at member reasoning attack, a large number of researchers think that overfitting is the main reason for attack success, and design corresponding regularization methods for reducing the generalization errors of the models, such as dropout, L2 regularization, minimum-maximum training and the like, wherein the methods need to retrain the models, the cost is high, the prediction accuracy of the models can be reduced, and many methods are only suitable for deep learning models; some other researchers directly start with the prediction vector returned by the model and process the prediction vector, aiming at eliminating the performance difference of the model on the data of the training set and the non-training set, such as limiting the prediction vector to the maximum K types or introducing antagonistic disturbance on the prediction vector, and the like, wherein the method loses the usability of the prediction vector to a certain extent; researchers also tend to facilitate differential privacy to combat membership inference attacks, however, such methods face a number of application challenges, such as the difficulty in balancing the privacy protection capabilities and the prediction capabilities of the models.

Disclosure of Invention

In order to solve the defects of the prior art and realize the purpose of privacy protection aiming at the prior member reasoning attack, the invention adopts the following technical scheme:

the privacy protection method for resisting member reasoning attack based on vector confusion comprises the following steps:

s1, inputting the data sample E into the classification model M to obtain the prediction vector of the classification model M to the data sample E

Wherein the integer K is more than or equal to 2 and represents the number of categories;

s2, setting confusion transformation T with order preserving property, applying the confusion transformation T on the predicted vector C to obtain confusion vector

The order preservation of the aliased transform T is achieved by, for any i and j (i =1,2, …, K j =1,2, … K i ≠ j), if

Then there is

If, if

Then, then

While the obfuscated transformation T satisfies the vector availability constraint D, which is a distance metric function, i.e.

Wherein d is a preset upper distance limit; the order preservation requirement guarantees that the model improves the member reasoning attack resistance under the condition of not reducing the prediction accuracy;

s3, adding a random perturbation vector to the confusion vector H

Generating a noise vector

I.e. by

Wherein

Adding random perturbation vectors

Thereafter, the relative sizes of the elements in the confusion vector H are unchanged, i.e. for arbitrary i and j (i =1,2, …, K j =1,2, … K i ≠ j), if

Then there is

I.e. by

If, if

Then, then

I.e. by

(ii) a The added random disturbance obviously reduces the possibility that an attacker restores a prediction vector according to the noise vector, and the robustness of the method is improved;

and S4, taking the noise vector N as the final result of the classification model M and outputting the final result.

Further, the step S2 includes the following steps:

s21, setting a hyper-parameter alpha of the confusion transformation T, wherein the alpha represents a confusion vector

Maximum value of

And second maximum value

Target difference therebetween, i.e.

，

Wherein, in the step (A),

，

respectively, the maximum value in the confusion vector H

And second maximum value

The subscript of (a) is,

and is and

；

s22, applying order preserving transformation to the prediction vector C to generate a confusion vector H;

setting a maximum value in an alias vector H

And the maximum value in the prediction vector C

Is delta, i.e.

Sum of elements of the prediction vector C

Dividing the maximum value in the confusion vector H

The outer elements are calculated according to the following formula:

，

due to the fact that

To obtain

And finally, obtaining elements in the confusion vector H according to the element calculation formula.

Further, the step S22 is executed because

And due to

To obtain

As can be seen from the above, the confusion transformation T implements order preserving transformation with constant vector element sums.

Further, in the step S21, a hyper-parameter beta of the confusion transform T is also set;

beta is a parameter of the vector availability constraint that confusion vector H needs to satisfy (beta ≧ 0), indicating that the maximum value in confusion vector H is allowed

And second maximum value

The difference alpha between, and the maximum value in the prediction vector C

And second maximum value

The maximum difference between the differences d, i.e. alpha, has to be satisfied

(ii) a If alpha is less than

Then alpha is set to

If alpha is greater than

Then alpha is set to

；

In step S3, the maximum value of the confusion vector H is set

And second maximum value

Applying random perturbations eps and-eps, i.e.

，

，

Representing assignment operation, and in order to ensure that the noise vector N after disturbance application still meets the vector availability constraint and order preservation, eps needs to meet the following constraint conditions：

S31, if the number of the classes output by the classification model M is more than 2, namely K is more than 2, then

For the third largest value in the confusion vector H,

as the third largest value in the confusion vector H

Subscript, obtained by solving, eps satisfies

；

S32, if the number of classes output by the classification model M is equal to 2, i.e., K =2, then

Solved to obtain eps satisfying

。

The vector availability constraints provide a flexible configuration scheme that balances predictive result availability and model privacy protections.

Further, in the step S3, a random disturbance eps is generated and updated

，

And obtaining a noise vector N.

Further, for the same model, the values of alpha and beta are fixed.

Further, the privacy protection system corresponding to the method comprises: the model prediction module, the vector confusion module, the vector perturbation module and the model output module;

the model prediction module is used for receiving data input and generating a prediction vector;

the vector confusion module is used for applying order-preserving confusion transformation meeting vector availability constraint on the prediction vector to generate a confusion vector;

the vector perturbation module is used for generating a random perturbation vector and adding the random perturbation vector to the confusion vector to form an order-preserving noise vector;

and the model output module is used for returning the noise vector as a final result of model classification.

The invention has the advantages and beneficial effects that:

the method does not need to modify the target classification model and know the specific technical details of member reasoning attack, can be simply and quickly applied to the existing classification model, and has low cost and wide application range; vector availability constraints provide a flexible configuration scheme that balances prediction result availability and model privacy protectiveness; the added random disturbance obviously reduces the possibility that an attacker restores a prediction vector according to the noise vector, and the robustness of the method is improved; the order-preserving requirement guarantees that the model improves the member reasoning attack resistance under the condition of not reducing the prediction accuracy.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.

As shown in fig. 1, the privacy protection method against member inference attack based on vector obfuscation includes the following steps:

(1) inputting the data sample E into a classification model M to obtain a classificationPrediction vector of model M to data sample E

Wherein K is the number of categories (K is more than or equal to 2 and K is an integer).

(2) Setting confusion transformation T with order preserving property, applying the confusion transformation T on the predicted vector C to obtain the confusion vector

Then there is

If, if

Then, then

At the same time, the confusion transform T needs to satisfy the vector availability constraint D, which is a distance metric function, i.e.

And d is a preset upper distance limit. The order-preserving requirement guarantees that the model improves the member reasoning attack resistance under the condition of not reducing the prediction accuracy.

The step is the core of the invention and is divided into the following substeps:

(2.1) setting the hyperparameters alpha and beta of the confusion transform T. For the same model, the values of alpha and beta are fixed.

alpha represents the confusion vector

Maximum value of

And second maximum value

Target difference therebetween, i.e.

，

. Wherein the content of the first and second substances,

，

respectively, the maximum value in the confusion vector H

And second maximum value

The subscript of (a) is,

and is and

。

beta is a parameter of the vector availability constraint that confusion vector H needs to satisfy (0 ≦ beta ≦ 1), representing allowing the maximum value in confusion vector H

And second maximum value

The difference alpha between and the maximum value in the prediction vector C

And second maximum value

. If alpha is less than

Then alpha is set to

If alpha is greater than

Then alpha is set to

。

And (2.2) applying order-preserving transformation to the prediction vector C to generate an aliasing vector H.

Assuming a maximum in the confusion vector H

And the maximum value in the prediction vector C

Is delta, i.e.

. Sum of elements of prediction vector C

. Dividing maximum value in confusion vector H

The outer elements are calculated according to the following formula, i.e.

，

. Due to the fact that

Is obtained by

Then the elements in the confusion vector H can be obtained according to the above element calculation formula.

In addition, is easy to obtain

And due to

Is easy to obtain

. As can be seen from the above, the confusion transformation T implements order preserving transformation with constant vector element sums.

(3) Adding a random perturbation vector to the confusion vector H

Generating a noise vector

I.e. by

Wherein

Adding random perturbation vectors

Then there is

I.e. by

If, if

Then, then

I.e. by

. The added random disturbance obviously reduces the possibility that an attacker restores a prediction vector according to the noise vector, and the robustness of the method is improved.

In particular, for the maximum value of the confusion vector H

And second maximum value

Applying random perturbations eps and-eps, i.e.

，

，

Representing assignment operation, and in order to ensure that the noise vector N after disturbance application still meets the vector availability constraint and the order preservation, eps needs to meet the following constraint conditions:

(3.1) if the number of the classes output by the classification model M is more than 2, namely K is more than 2, then

For the third largest value in the confusion vector H,

as the third largest value in the confusion vector H

Subscript where, solved for, eps needs to satisfy

；

(3.2) if the number of classes output by the classification model M is equal to 2, i.e., K =2, then

Solved to obtain, eps needs to satisfy

。

(4) Generating random perturbation eps meeting the requirement and updating

，

And obtaining the noise vector N. And taking the noise vector N as a final result of the classification model M and outputting the final result.

A system for satisfying a privacy protection method for resisting member reasoning attack based on vector confusion comprises the following steps: the model prediction module, the vector confusion module, the vector perturbation module and the model output module;

and the vector perturbation module is used for generating a random perturbation vector and adding the random perturbation vector to the confusion vector to form an order-preserving noise vector.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. The privacy protection method for resisting member reasoning attack based on vector confusion is characterized by comprising the following steps:

Aliasing transform T for arbitrary i and j (i =1,2, …, K j =1,2, … K i ≠ j) if

Then there is

If, if

Then, then

Wherein d is a preset upper distance limit;

s3, adding a random perturbation vector to the confusion vector H

Generating a noise vector

I.e. by

Wherein

Adding random perturbation vectors

Then there is

I.e. by

If, if

Then, then

I.e. by

；

2. The privacy protection method against membership inference attacks based on vector obfuscation as claimed in claim 1, wherein said step S2 includes the steps of:

Maximum value of

And second maximum value

Target difference therebetween, i.e.

，

Wherein, in the step (A),

，

respectively, the maximum value in the confusion vector H

And second maximum value

The subscript of (a) is,

and is and

。

3. the privacy protection method against membership inference attacks based on vector obfuscation as claimed in claim 2, wherein said step S2 further comprises the steps of:

setting a maximum value in an alias vector H

And the maximum value in the prediction vector C

Is delta, i.e.

Sum of elements of the prediction vector C

Dividing the maximum value in the confusion vector H

The outer elements are calculated according to the following formula:

，

due to the fact that

To obtain

4. The privacy protection method against member inference attack based on vector obfuscation as claimed in claim 2, wherein the step S21 is further to set a hyper-parameter beta of the obfuscation transformation T;

And second maximum value

The difference alpha between, and the maximum value in the prediction vector C

And second maximum value

(ii) a If alpha is less than

Then alpha is set to

If alpha is greater than

Then alpha is set to

。

5. The privacy protection method against membership inference attacks based on vector obfuscation as claimed in claim 4, wherein said step S3 includes the steps of:

for maximum value of confusion vector H

And second maximum value

Applying random perturbations eps and-eps, i.e.

，

，

Representing the assignment operation, eps needs to satisfy the following constraint conditions:

For the third largest value in the confusion vector H,

as the third largest value in the confusion vector H

Subscript, obtained by solving, eps satisfies

；

Solved to obtain eps satisfying

。

6. The privacy protection method against membership inference attack based on vector confusion as claimed in claim 5, wherein in step S3, random perturbation eps is generated and updated

，

And obtaining a noise vector N.

7. The privacy protection method for resisting member inference attack based on vector confusion as claimed in claim 2, wherein alpha value is fixed for the same model.

8. The privacy protection method for resisting member inference attack based on vector confusion as claimed in claim 4, wherein the beta value is fixed for the same model.

9. The privacy protection method against membership inference attack based on vector obfuscation as claimed in claim 3, wherein said step S22 is performed since

And due to

To obtain

I.e. the sum of the vector elements is unchanged.

10. The privacy protection system of claim 1, comprising: the model prediction module, the vector confusion module, the vector perturbation module and the model output module are characterized in that: