CN113536298B - Deep learning model bias poisoning attack-oriented defense method - Google Patents
- Publication number
- CN113536298B CN113536298B CN202110652511.1A CN202110652511A CN113536298B CN 113536298 B CN113536298 B CN 113536298B CN 202110652511 A CN202110652511 A CN 202110652511A CN 113536298 B CN113536298 B CN 113536298B
- Authority
- CN
- China
- Prior art keywords
- deep learning
- learning model
- training
- sub
- classifier
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses a defense method against bias poisoning attacks on deep learning models, comprising the following steps: (1) acquire an original sample data set; (2) divide the original sample data set into sub-training sets in blocks; (3) train a basic classifier on each sub-training set; (4) evaluate the input of each basic classifier and count the number of correct classifications of each basic classifier; (5) screen out the basic classifier with the highest classification accuracy and use it to train the deep learning model again, finally obtaining a newly trained deep learning model. By mapping the individual samples of the original sample data set with a hash function, the method improves the deep learning model's ability to defend against bias poisoning attacks.
Description
Technical Field
The invention belongs to the field of deep learning, and particularly relates to a defense method against bias poisoning attacks on deep learning models.
Background
A deep learning model learns from large sample data sets with wide sources, extracting intrinsic rules and abstract data features, and can use the learned experience to help people make decisions and solve many complex pattern recognition problems automatically. Deep learning technology is therefore widely applied in search engines, image recognition, anomaly detection, natural language processing, speech recognition, recommendation systems, medical treatment, credit issuance, education, and other fields, where it delivers good prediction and decision-making performance and yields substantial social and economic benefits. As researchers' study continues to deepen, the accuracy of decisions made with deep learning models keeps improving and the application scenarios of deep learning keep widening. Deep learning is gradually permeating traditional fields, and the decisions and decision suggestions obtained with it now have a non-negligible influence on people's daily production and life.
While deep learning techniques can help people obtain more accurate and detailed decision results and practical decision suggestions, recent studies have shown that because a deep learning model's decisions depend heavily on the original sample data set used to train it, the data samples associated with certain attributes in that data set (sensitive attributes such as gender) can affect the model's decisions to a large extent. If the training data set is tampered with, the deep learning model suffers a poisoning attack; further, if the attacker deliberately manipulates sensitive attribute data when tampering, the model suffers a bias poisoning attack. Poisoning attacks on deep learning models can cause many negative effects on social production and people's normal life, and as deep learning permeates more and more aspects of production and life, research on methods of defending deep learning models against bias poisoning attacks becomes especially important.
The Chinese patent document with publication number CN112905997A discloses a method, device and system for detecting poisoning attacks on deep learning models, comprising the following steps: (1) acquire a sample set and a model to be detected; (2) pre-train a benign model with the same structure as the model to be detected; (3) perform data augmentation on part of the samples to form a new sample set; (4) taking each new sample class as a target class and all remaining new samples as source classes, mount various poisoning attacks of the target class on the pre-trained benign model to obtain various poisoned models and poisoned samples; (5) obtain the detection results of the poisoned samples under all the other poisoned models, and screen and construct a poisoned-model pool and a poisoned-sample pool according to these results; (6) judge whether the deep learning model to be detected is poisoned according to the detection results of the poisoned samples on that model and on the models that did not generate them, so as to detect poisoning attacks on deep learning models quickly and accurately. However, that patent does not disclose a method of defending against bias poisoning attacks on deep learning models.
Disclosure of Invention
The invention provides a defense method against bias poisoning attacks on deep learning models. The method has good universality, ensures the objectivity and fairness of the deep learning model's decision making, and improves the model's ability to defend against bias poisoning attacks.
The technical scheme adopted is as follows:
A method of defending against bias poisoning attacks on deep learning models, comprising the following steps:
(1) Acquiring an original data set, marking sensitive attribute tags and task tags in it, and constructing an original sample data set T;
(2) Dividing the original sample data set T into k sub-training sets P_i (i ∈ {1, 2, 3, …, k}) in blocks, using a hash function h to assign each sample x to a sub-training set P_i;
(3) Training a basic classifier on each sub-training set P_i;
(4) In the inference stage after training the deep learning model, evaluating the input of each basic classifier one by one and counting the number of correct classifications of each basic classifier;
(5) Screening out the basic classifier with the highest classification accuracy according to step (4), and training the deep learning model again with this classifier, finally obtaining the newly trained deep learning model.
In step (2), the assignment rule is: P_i = { x ∈ T | h(x) ≡ i (mod k) }.
Each sub-training set P_i contains an equal number of samples.
In step (3), the training mode is as follows: a basic classifier is trained independently on each sub-training set, and each basic classifier can only access the class labels of the data in its own sub-training set.
In step (5), the screening method is as follows: a new classifier g(T, x) is defined from the number of correct classifications n_c(x) of each basic classifier: g(T, x) := argmax_c n_c(x).
Compared with the prior art, the invention has at least the following beneficial effects:
The method enhances the robustness of the deep learning model against bias poisoning attacks and has good universality. It ensures the objectivity and fairness of the deep learning model's decision making, and improves the model's ability to defend against bias poisoning attacks.
Drawings
Fig. 1 is a flow chart of a method for defending against deep learning model bias poisoning attacks according to an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments, in order to make its objects, technical solutions and advantages more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
In the training stage, a deep learning model can easily encounter poisoning attacks, in particular bias poisoning attacks caused by tampering with sensitive attribute data, causing the model to make false predictions that mislead decision makers and impairing its fairness. To solve these problems, this embodiment provides a defense method against bias poisoning attacks on deep learning models; a flow diagram is shown in Fig. 1.
(1) Definition of deep learning model fairness
The invention defines fairness of a deep learning model as making automatic decisions without bias and without influence from sensitive attributes.
(2) Definition of deep learning model bias poisoning
The invention defines bias poisoning of a deep learning model as follows: the data containing sensitive attributes in the original sample data set is tampered with, the model is trained on the tampered data set, and as a result the model produces biased predictions and decisions and its fairness is impaired. Tampering with the original sample data set includes actions that destroy the integrity of the data set, as well as actions that preserve its integrity but corrupt the attribute labels of individual data items. For example, an attacker may deliberately manipulate sensitive attribute data in the training stage so that the deep learning model is wrongly trained and makes biased decisions; the attacker may add samples to or delete samples from the original data set, or flip the class labels of individual samples. Such manipulation means the originally sampled data no longer follows an independent and identical distribution, the proportion of samples carrying a certain class label becomes too high, and the deep learning model makes biased decisions, so that its fairness is impaired.
(3) Preparation and preprocessing of data sets
This embodiment selects an image data set with multi-label classification, such as the CIFAR data set, and uses one bias attribute label B as the sensitive attribute label, for example a gender feature. Other labels in the data set are selected as one or more task labels, such as occupation labels, and the data set is preprocessed to construct the original sample data set T. The original sample data set is divided into k sub-training sets with a hash function h, ensuring an equal number of samples in each sub-training set. Because the hash function creates a fixed mapping from each data item to a sub-training set, and the hash value depends only on the value of the data item itself, the sub-training set block to which a sample is mapped cannot change, whether an attacker manipulates other data to mount a poisoning attack, changes the total number of samples by deleting or adding data, or randomly reorders the samples.
(4) Selection of hash functions
A hash function is chosen that blocks the original sample data set T into sub-data sets. The original training data set T is divided into k sub-training sets P_i (i ∈ {1, 2, 3, …, k}); the hash function h determines the sub-training set each sample x is assigned to: P_i = { x ∈ T | h(x) ≡ i (mod k) }. Each sub-training set contains an equal number of samples.
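A minimal Python sketch of this hash-based blocking follows. The concrete hash function and the sample encoding are illustrative assumptions; the patent does not fix a particular h:

```python
import hashlib

def partition(dataset, k):
    """Block T into k sub-training sets: P_i = { x in T | h(x) = i (mod k) }.
    h depends only on the sample's own value, so adding, deleting, or
    reordering OTHER samples never changes the bucket a sample maps to."""
    parts = [[] for _ in range(k)]
    for x in dataset:
        h = int(hashlib.sha256(repr(x).encode()).hexdigest(), 16)
        parts[h % k].append(x)
    return parts

# Invariance check: reversing the data set leaves every bucket's contents intact.
data = [("sample%03d" % i, i % 2) for i in range(100)]
p1 = partition(data, k=5)
p2 = partition(list(reversed(data)), k=5)
assert [sorted(b) for b in p1] == [sorted(b) for b in p2]
```

Note that a generic hash yields only approximately equal bucket sizes; the exactly equal split the patent requires would need an additional balancing step.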
(5) Training deep learning model on sub-training set
On each divided sub-training set, a basic classifier is trained independently, defined as f_i(x) := f(P_i, x). Each basic classifier can only access the class labels of the data in its own sub-training set.
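As a sketch, training one independent basic classifier f_i per sub-training set could look like the following. The nearest-centroid model is a placeholder assumption; the patent leaves the classifier architecture open:

```python
from collections import Counter

class NearestCentroid:
    """Placeholder basic classifier f; any model trainable on one partition works."""
    def fit(self, samples):  # samples: list of (feature_vector, class_label)
        sums, counts = {}, Counter()
        for vec, label in samples:
            counts[label] += 1
            acc = sums.setdefault(label, [0.0] * len(vec))
            for j, v in enumerate(vec):
                acc[j] += v
        # Centroid = mean feature vector of each class seen in THIS partition only.
        self.centroids = {lab: [v / counts[lab] for v in acc]
                          for lab, acc in sums.items()}
        return self

    def predict(self, vec):
        return min(self.centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(self.centroids[lab], vec)))

# f_i := f(P_i, .): each classifier is fit only on its own sub-training set,
# so it never sees (and cannot be biased by) labels from other partitions.
partitions = [
    [([0.0, 0.0], "a"), ([0.2, 0.1], "a"), ([1.0, 1.0], "b")],
    [([0.1, 0.0], "a"), ([0.9, 1.1], "b"), ([1.2, 0.9], "b")],
]
base_classifiers = [NearestCentroid().fit(P) for P in partitions]
```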
(6) Evaluating each basic classifier
In the inference stage of the deep learning model, the input of each basic classifier is evaluated. The k basic classifiers return classification results; letting c be the correct classification result, the number of basic classifiers classifying correctly is counted as n_c(x) := |{ i ∈ [k] | f_i(x) = c }|.
(7) Selecting the basic classifier with highest classification accuracy
A new classifier g(T, x) is defined to screen out the largest n_c(x): g(T, x) := argmax_c n_c(x). That is, this step screens out the basic classifier with the highest classification accuracy, and the deep learning model is trained again with this classifier, finally obtaining the newly trained deep learning model.
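The counting of step (6) and the screening of step (7) together amount to a vote over the k basic classifiers. A sketch, with stub classifiers standing in for the trained f_i (the stubs are hypothetical):

```python
from collections import Counter

def aggregate(base_classifiers, x):
    """g(T, x) := argmax_c n_c(x), with n_c(x) = |{ i in [k] : f_i(x) = c }|,
    i.e. return the class backed by the most basic classifiers."""
    votes = Counter(f(x) for f in base_classifiers)  # votes[c] is n_c(x)
    return votes.most_common(1)[0][0]

# With k = 5 partitions, poisoning one partition flips at most one vote,
# so the aggregated answer here is unchanged.
classifiers = [lambda x: "cat"] * 4 + [lambda x: "dog"]  # 4 clean, 1 poisoned
assert aggregate(classifiers, "any input") == "cat"
```

Seen this way, the defense bounds the influence of any single tampered sub-training set: changing one partition changes at most one of the k votes.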
This method of defending against bias poisoning attacks on deep learning models is a new way of training deep learning models. Because the hash value of the hash function depends only on the value of the sample data x, the sub-training set to which x is mapped cannot change, no matter whether an attacker manipulates data to inject poison, changes the total number of samples, or reorders the samples; the training of each sub-training set is therefore independent and unaffected by the others. Dividing the original sample data set into k sub-training sets with a hash function thus enhances the robustness of the deep learning model against bias poisoning attacks. The method also allows a simpler model structure to be chosen to reduce computation and time consumption in real production and life application scenarios, or a more complex model according to the actual situation, so it has good universality. The method further ensures the objectivity and fairness of the deep learning model's decision making, improves the model's ability to defend against bias poisoning attacks, and provides guidance for improving the robustness of deep learning models, enhancing the objectivity of their decisions, and guaranteeing the fairness of their decisions.
The foregoing detailed description of preferred embodiments and advantages is intended only to illustrate the invention; any modifications, additions, substitutions and equivalents made to those embodiments within the spirit and principle of the invention are intended to be included within its scope.
Claims (2)
1. A method for defending against deep learning model bias poisoning attacks, the method comprising the steps of:
(1) Acquiring an original data set, marking sensitive attribute tags and task tags in the original data set, and constructing an original sample data set T;
(2) Dividing the original sample data set T into k sub-training sets P_i (i ∈ {1, 2, 3, …, k}) in blocks, using a hash function h to assign each sample x to a sub-training set P_i; the assignment rule is: P_i = { x ∈ T | h(x) ≡ i (mod k) }; each sub-training set P_i contains an equal number of samples;
(3) Training a basic classifier on each sub-training set P_i; the training mode is as follows: a basic classifier is trained independently on each sub-training set, and each basic classifier can only access the class labels of the data in its own sub-training set;
(4) In the inference stage after training the deep learning model, evaluating the input of each basic classifier one by one and counting the number of correct classifications of each basic classifier;
(5) Screening out the basic classifier with the highest classification accuracy according to step (4), and training the deep learning model again with this classifier, finally obtaining the newly trained deep learning model.
2. The method for defending against deep learning model bias poisoning attacks according to claim 1, wherein in step (5) the screening method is as follows: a new classifier g(T, x) is defined from the number of correct classifications n_c(x) of each basic classifier: g(T, x) := argmax_c n_c(x).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110652511.1A CN113536298B (en) | 2021-06-11 | 2021-06-11 | Deep learning model bias poisoning attack-oriented defense method |
Publications (2)
Publication Number | Publication Date
---|---
CN113536298A | 2021-10-22
CN113536298B | 2024-04-30
Family
ID=78095881
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110652511.1A Active CN113536298B (en) | 2021-06-11 | 2021-06-11 | Deep learning model bias poisoning attack-oriented defense method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113536298B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508650A (en) * | 2018-10-23 | 2019-03-22 | 浙江农林大学 | A kind of wood recognition method based on transfer learning |
CN110414548A (en) * | 2019-06-06 | 2019-11-05 | 西安电子科技大学 | The level Bagging method of sentiment analysis is carried out based on EEG signals |
CN110737659A (en) * | 2019-09-06 | 2020-01-31 | 平安科技(深圳)有限公司 | Graph data storage and query method, device and computer readable storage medium |
CN111862260A (en) * | 2020-07-31 | 2020-10-30 | 浙江工业大学 | Bias eliminating method and device based on cross-domain dual-generation type countermeasure network |
CN112189204A (en) * | 2018-05-16 | 2021-01-05 | 国际商业机器公司 | Interpretation of artificial intelligence based suggestions |
Non-Patent Citations (1)
Title |
---|
"面向深度学习的公平性研究综述" (A survey of fairness research for deep learning); Chen Jinyin et al.; Journal of Computer Research and Development; 2021-02-08; pp. 264-280 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |