CN113536298B - Deep learning model bias poisoning attack-oriented defense method - Google Patents
- Publication number
- CN113536298B CN113536298B CN202110652511.1A CN202110652511A CN113536298B CN 113536298 B CN113536298 B CN 113536298B CN 202110652511 A CN202110652511 A CN 202110652511A CN 113536298 B CN113536298 B CN 113536298B
- Authority
- CN
- China
- Prior art keywords
- deep learning
- learning model
- training
- sub
- classifier
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses a defense method against bias poisoning attacks on deep learning models, comprising the following steps: (1) acquire an original sample data set; (2) divide the original sample data set into sub-training sets in blocks; (3) train a basic classifier on each sub-training set; (4) evaluate the input of each basic classifier and count the number of correct classifications of each basic classifier; (5) screen out the basic classifier with the highest classification accuracy and use it to train the deep learning model again, finally obtaining a newly trained deep learning model. By mapping the individual samples of the original sample data set with a hash function, the method improves the deep learning model's ability to defend against bias poisoning attacks.
Description
Technical Field
The invention belongs to the field of deep learning, and particularly relates to a defense method against bias poisoning attacks on deep learning models.
Background
A deep learning model learns from large sample data sets with wide sources, extracting intrinsic rules and abstract data features, and can use the learned experience to help people make decisions and solve many complex pattern recognition problems automatically. Deep learning technology is therefore widely applied in search engines, image recognition, anomaly detection, natural language processing, speech recognition, recommendation systems, medical treatment, credit issuance, education, and other fields, where it delivers good prediction and decision-making performance and yields substantial social and economic benefits. As researchers' study continues to deepen, the accuracy of decisions made with deep learning models keeps improving and the application scenarios of deep learning keep widening. Deep learning is gradually permeating traditional fields, and the decisions and decision suggestions obtained with it now have a non-negligible influence on people's daily production and life.
While deep learning techniques can help people obtain more accurate and detailed decision results and practical decision suggestions, recent studies have shown that because a deep learning model's decisions depend heavily on the original sample data set used to train it, the data samples associated with certain attributes in that data set (sensitive attributes such as gender) can affect the model's decisions to a large extent. If the training data set is tampered with, the deep learning model suffers a poisoning attack; further, if the attacker deliberately manipulates sensitive attribute data when tampering, the model suffers a bias poisoning attack. Poisoning attacks on deep learning models can cause many negative effects on social production and people's normal life, and as deep learning permeates more and more aspects of production and life, research on methods of defending deep learning models against bias poisoning attacks becomes especially important.
The Chinese patent document with publication number CN112905997A discloses a method, device and system for detecting poisoning attacks on deep learning models, comprising the following steps: (1) acquire a sample set and a model to be detected; (2) pre-train a benign model with the same structure as the model to be detected; (3) perform data augmentation on part of the samples to form a new sample set; (4) taking each new sample class as a target class and all remaining new samples as source classes, mount various poisoning attacks of the target class on the pre-trained benign model to obtain various poisoned models and poisoned samples; (5) obtain the detection results of the poisoned samples under all the other poisoned models, and screen and construct a poisoned-model pool and a poisoned-sample pool according to these results; (6) judge whether the deep learning model to be detected is poisoned according to the detection results of the poisoned samples on that model and on the models that did not generate them, so as to detect poisoning attacks on deep learning models quickly and accurately. However, that patent does not disclose a method of defending against bias poisoning attacks on deep learning models.
Disclosure of Invention
The invention provides a defense method against bias poisoning attacks on deep learning models. The method has good universality, ensures the objectivity and fairness of the deep learning model's decision making, and improves the model's ability to defend against bias poisoning attacks.
The technical scheme adopted is as follows:
A method of defending against bias poisoning attacks on deep learning models, comprising the following steps:
(1) Acquiring an original data set, marking sensitive attribute tags and task tags in it, and constructing an original sample data set T;
(2) Dividing the original sample data set T into k sub-training sets P_i (i ∈ {1, 2, 3, …, k}) in blocks, using a hash function h to assign each sample x to a sub-training set P_i;
(3) Training a basic classifier on each sub-training set P_i;
(4) In the inference stage after training the deep learning model, evaluating the input of each basic classifier one by one and counting the number of correct classifications of each basic classifier;
(5) Screening out the basic classifier with the highest classification accuracy according to step (4), and training the deep learning model again with this classifier, finally obtaining the newly trained deep learning model.
In step (2), the assignment rule is: P_i = { x ∈ T | h(x) ≡ i (mod k) }.
Each sub-training set P_i contains an equal number of samples.
In step (3), the training mode is as follows: a basic classifier is trained independently on each sub-training set, and each basic classifier can only access the class labels of the data in its own sub-training set.
In step (5), the screening method is as follows: a new classifier g(T, x) is defined from the number of correct classifications n_c(x) of each basic classifier: g(T, x) := argmax_c n_c(x).
Compared with the prior art, the invention has at least the following beneficial effects:
The method enhances the robustness of the deep learning model against bias poisoning attacks and has good universality. It ensures the objectivity and fairness of the deep learning model's decision making, and improves the model's ability to defend against bias poisoning attacks.
Drawings
Fig. 1 is a flow chart of a method for defending against deep learning model bias poisoning attacks according to an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments, in order to make its objects, technical solutions and advantages more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
In the training stage, a deep learning model can easily encounter poisoning attacks, in particular bias poisoning attacks caused by tampering with sensitive attribute data, causing the model to make false predictions that mislead decision makers and impairing its fairness. To solve these problems, this embodiment provides a defense method against bias poisoning attacks on deep learning models; a flow diagram is shown in Fig. 1.
(1) Definition of deep learning model fairness
The invention defines fairness of a deep learning model as making automatic decisions without bias and without influence from sensitive attributes.
(2) Definition of deep learning model bias poisoning
The invention defines bias poisoning of a deep learning model as follows: the data containing sensitive attributes in the original sample data set is tampered with, the model is trained on the tampered data set, and as a result the model produces biased predictions and decisions and its fairness is impaired. Tampering with the original sample data set includes actions that destroy the integrity of the data set, as well as actions that preserve its integrity but corrupt the attribute labels of individual data items. For example, an attacker may deliberately manipulate sensitive attribute data in the training stage so that the deep learning model is wrongly trained and makes biased decisions; the attacker may add samples to or delete samples from the original data set, or flip the class labels of individual samples. Such manipulation means the originally sampled data no longer follows an independent and identical distribution, the proportion of samples carrying a certain class label becomes too high, and the deep learning model makes biased decisions, so that its fairness is impaired.
(3) Preparation and preprocessing of data sets
This embodiment selects an image data set with multi-label classification, such as the CIFAR data set, and uses one bias attribute label B as the sensitive attribute label, for example a gender feature. Other labels in the data set are selected as one or more task labels, such as occupation labels, and the data set is preprocessed to construct the original sample data set T. The original sample data set is divided into k sub-training sets with a hash function h, ensuring an equal number of samples in each sub-training set. Because the hash function creates a fixed mapping from each data item to a sub-training set, and the hash value depends only on the value of the data item itself, the sub-training set block to which a sample is mapped cannot change, whether an attacker manipulates other data to mount a poisoning attack, changes the total number of samples by deleting or adding data, or randomly reorders the samples.
(4) Selection of hash functions
A hash function is chosen that blocks the original sample data set T into sub-data sets. The original training data set T is divided into k sub-training sets P_i (i ∈ {1, 2, 3, …, k}); the hash function h determines the sub-training set each sample x is assigned to: P_i = { x ∈ T | h(x) ≡ i (mod k) }. Each sub-training set contains an equal number of samples.
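A minimal Python sketch of this hash-based blocking follows. The concrete hash function and the sample encoding are illustrative assumptions; the patent does not fix a particular h:

```python
import hashlib

def partition(dataset, k):
    """Block T into k sub-training sets: P_i = { x in T | h(x) = i (mod k) }.
    h depends only on the sample's own value, so adding, deleting, or
    reordering OTHER samples never changes the bucket a sample maps to."""
    parts = [[] for _ in range(k)]
    for x in dataset:
        h = int(hashlib.sha256(repr(x).encode()).hexdigest(), 16)
        parts[h % k].append(x)
    return parts

# Invariance check: reversing the data set leaves every bucket's contents intact.
data = [("sample%03d" % i, i % 2) for i in range(100)]
p1 = partition(data, k=5)
p2 = partition(list(reversed(data)), k=5)
assert [sorted(b) for b in p1] == [sorted(b) for b in p2]
```

Note that a generic hash yields only approximately equal bucket sizes; the exactly equal split the patent requires would need an additional balancing step.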
(5) Training deep learning model on sub-training set
On each divided sub-training set, a basic classifier is trained independently, defined as f_i(x) := f(P_i, x). Each basic classifier can only access the class labels of the data in its own sub-training set.
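As a sketch, training one independent basic classifier f_i per sub-training set could look like the following. The nearest-centroid model is a placeholder assumption; the patent leaves the classifier architecture open:

```python
from collections import Counter

class NearestCentroid:
    """Placeholder basic classifier f; any model trainable on one partition works."""
    def fit(self, samples):  # samples: list of (feature_vector, class_label)
        sums, counts = {}, Counter()
        for vec, label in samples:
            counts[label] += 1
            acc = sums.setdefault(label, [0.0] * len(vec))
            for j, v in enumerate(vec):
                acc[j] += v
        # Centroid = mean feature vector of each class seen in THIS partition only.
        self.centroids = {lab: [v / counts[lab] for v in acc]
                          for lab, acc in sums.items()}
        return self

    def predict(self, vec):
        return min(self.centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(self.centroids[lab], vec)))

# f_i := f(P_i, .): each classifier is fit only on its own sub-training set,
# so it never sees (and cannot be biased by) labels from other partitions.
partitions = [
    [([0.0, 0.0], "a"), ([0.2, 0.1], "a"), ([1.0, 1.0], "b")],
    [([0.1, 0.0], "a"), ([0.9, 1.1], "b"), ([1.2, 0.9], "b")],
]
base_classifiers = [NearestCentroid().fit(P) for P in partitions]
```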
(6) Evaluating each basic classifier
In the inference stage of the deep learning model, the input of each basic classifier is evaluated. The k basic classifiers return classification results; letting c be the correct classification result, the number of basic classifiers classifying correctly is counted as n_c(x) := |{ i ∈ [k] | f_i(x) = c }|.
(7) Selecting the basic classifier with highest classification accuracy
A new classifier g(T, x) is defined to screen out the largest n_c(x): g(T, x) := argmax_c n_c(x). That is, this step screens out the basic classifier with the highest classification accuracy, and the deep learning model is trained again with this classifier, finally obtaining the newly trained deep learning model.
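The counting of step (6) and the screening of step (7) together amount to a vote over the k basic classifiers. A sketch, with stub classifiers standing in for the trained f_i (the stubs are hypothetical):

```python
from collections import Counter

def aggregate(base_classifiers, x):
    """g(T, x) := argmax_c n_c(x), with n_c(x) = |{ i in [k] : f_i(x) = c }|,
    i.e. return the class backed by the most basic classifiers."""
    votes = Counter(f(x) for f in base_classifiers)  # votes[c] is n_c(x)
    return votes.most_common(1)[0][0]

# With k = 5 partitions, poisoning one partition flips at most one vote,
# so the aggregated answer here is unchanged.
classifiers = [lambda x: "cat"] * 4 + [lambda x: "dog"]  # 4 clean, 1 poisoned
assert aggregate(classifiers, "any input") == "cat"
```

Seen this way, the defense bounds the influence of any single tampered sub-training set: changing one partition changes at most one of the k votes.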
This method of defending against bias poisoning attacks on deep learning models is a new way of training deep learning models. Because the hash value of the hash function depends only on the value of the sample data x, the sub-training set to which x is mapped cannot change, no matter whether an attacker manipulates data to inject poison, changes the total number of samples, or reorders the samples; the training of each sub-training set is therefore independent and unaffected by the others. Dividing the original sample data set into k sub-training sets with a hash function thus enhances the robustness of the deep learning model against bias poisoning attacks. The method also allows a simpler model structure to be chosen to reduce computation and time consumption in real production and life application scenarios, or a more complex model according to the actual situation, so it has good universality. The method further ensures the objectivity and fairness of the deep learning model's decision making, improves the model's ability to defend against bias poisoning attacks, and provides guidance for improving the robustness of deep learning models, enhancing the objectivity of their decisions, and guaranteeing the fairness of their decisions.
The foregoing detailed description of preferred embodiments and advantages is intended only to illustrate the invention; any modifications, additions, substitutions and equivalents made to those embodiments within the spirit and principle of the invention are intended to be included within its scope.
Claims (2)
1. A method for defending against deep learning model bias poisoning attacks, the method comprising the steps of:
(1) Acquiring an original data set, marking sensitive attribute tags and task tags in the original data set, and constructing an original sample data set T;
(2) Dividing the original sample data set T into k sub-training sets P_i (i ∈ {1, 2, 3, …, k}) in blocks, using a hash function h to assign each sample x to a sub-training set P_i; the assignment rule is: P_i = { x ∈ T | h(x) ≡ i (mod k) }; each sub-training set P_i contains an equal number of samples;
(3) Training a basic classifier on each sub-training set P_i; the training mode is as follows: a basic classifier is trained independently on each sub-training set, and each basic classifier can only access the class labels of the data in its own sub-training set;
(4) In the inference stage after training the deep learning model, evaluating the input of each basic classifier one by one and counting the number of correct classifications of each basic classifier;
(5) Screening out the basic classifier with the highest classification accuracy according to step (4), and training the deep learning model again with this classifier, finally obtaining the newly trained deep learning model.
2. The method for defending against deep learning model bias poisoning attacks according to claim 1, wherein in step (5) the screening method is as follows: a new classifier g(T, x) is defined from the number of correct classifications n_c(x) of each basic classifier: g(T, x) := argmax_c n_c(x).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110652511.1A CN113536298B (en) | 2021-06-11 | 2021-06-11 | Deep learning model bias poisoning attack-oriented defense method |
Publications (2)
Publication Number | Publication Date
---|---
CN113536298A | 2021-10-22
CN113536298B | 2024-04-30
Family
ID=78095881
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110652511.1A Active CN113536298B (en) | 2021-06-11 | 2021-06-11 | Deep learning model bias poisoning attack-oriented defense method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113536298B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508650A (en) * | 2018-10-23 | 2019-03-22 | 浙江农林大学 | A kind of wood recognition method based on transfer learning |
CN110414548A (en) * | 2019-06-06 | 2019-11-05 | 西安电子科技大学 | The level Bagging method of sentiment analysis is carried out based on EEG signals |
CN110737659A (en) * | 2019-09-06 | 2020-01-31 | 平安科技(深圳)有限公司 | Graph data storage and query method, device and computer readable storage medium |
CN111862260A (en) * | 2020-07-31 | 2020-10-30 | 浙江工业大学 | Bias eliminating method and device based on cross-domain dual-generation type countermeasure network |
CN112189204A (en) * | 2018-05-16 | 2021-01-05 | 国际商业机器公司 | Interpretation of artificial intelligence based suggestions |
Non-Patent Citations (1)
Title |
---|
"面向深度学习的公平性研究综述" (A survey of fairness research for deep learning); Chen Jinyin et al.; Journal of Computer Research and Development; 2021-02-08; pp. 264-280 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |