CN113536298B - Deep learning model bias poisoning attack-oriented defense method - Google Patents

Info

Publication number
CN113536298B
CN113536298B
Authority
CN
China
Prior art keywords
deep learning
learning model
training
sub
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110652511.1A
Other languages
Chinese (zh)
Other versions
CN113536298A (en)
Inventor
Chen Jinyin (陈晋音)
Chen Yiming (陈一鸣)
Chen Yipeng (陈奕芃)
Zheng Haibin (郑海斌)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202110652511.1A
Publication of CN113536298A
Application granted
Publication of CN113536298B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a defense method against bias poisoning attacks on deep learning models, comprising the following steps: (1) acquiring an original sample data set; (2) partitioning the original sample data set into sub-training sets; (3) training a base classifier on each sub-training set; (4) evaluating the input of each base classifier and counting the number of correct classifications of each base classifier; (5) screening out the base classifier with the highest classification accuracy and training the deep learning model again with it, finally obtaining a newly trained deep learning model. By mapping individual data samples in the original sample data set with a hash function, the method improves the deep learning model's ability to defend against bias poisoning attacks.

Description

Deep learning model bias poisoning attack-oriented defense method
Technical Field
The invention belongs to the field of deep learning, and particularly relates to a defense method against bias poisoning attacks on deep learning models.
Background
Deep learning models learn from large sample data sets drawn from a wide range of sources, extracting intrinsic rules and abstract data features. Using the experience they learn, they can help people make decisions and automatically solve many complex pattern recognition problems. Deep learning technology is therefore widely applied in search engines, image recognition, anomaly detection, natural language processing, speech recognition, recommendation systems, medical treatment, credit issuance, education and other fields, where it delivers good prediction and decision-making results and has produced substantial social impact and economic benefit. As research continues to deepen, the accuracy of decisions made with deep learning models keeps improving, the application scenarios of deep learning keep widening, and deep learning is gradually penetrating traditional fields; making decisions and obtaining decision suggestions with deep learning technology now has a non-negligible influence on daily production and life.
While deep learning techniques can help people obtain more accurate and detailed decision results and give practical decision suggestions, recent studies have shown that because a deep learning model's decisions depend heavily on the original sample data set used to train it, data samples associated with certain attributes in that data set, namely sensitive attributes such as gender, can affect the model's decisions to a large extent. If the sample data set used for training is tampered with, the deep learning model suffers a poisoning attack; further, if the attacker deliberately manipulates sensitive attribute data in the tampered data, the deep learning model suffers bias poisoning. Poisoning attacks on deep learning models can have many negative effects on social production and people's normal lives, and as the applications of deep learning widen, deep learning models are gradually penetrating every aspect of production and life, so research on methods of defending deep learning models against bias poisoning attacks is especially important.
Chinese patent document CN112905997A discloses a method, device and system for detecting poisoning attacks on a deep learning model, comprising the following steps: (1) acquiring a sample set and a model to be detected; (2) pre-training a benign model with the same structure as the model to be detected; (3) augmenting part of the samples to form a new sample set; (4) taking each new sample in turn as the target class and all remaining new samples as source classes, and mounting various poisoning attacks of the target class on the pre-trained benign model to obtain various poisoned models and poisoned samples; (5) obtaining the detection results of the poisoned samples under all the poisoned models, and screening these results to construct a poisoned-model pool and a poisoned-sample pool; (6) judging whether the deep learning model to be detected is poisoned according to the detection results of the poisoned samples on the model to be detected and on the poisoned models, thereby detecting poisoning attacks on deep learning models quickly and accurately. However, that patent does not disclose a method of defending against bias poisoning attacks on deep learning models.
Disclosure of Invention
The invention provides a defense method against bias poisoning attacks on deep learning models. The method has good universality, ensures the objectivity and fairness of the deep learning model's decision making, and improves the model's ability to defend against bias poisoning attacks.
The technical scheme adopted is as follows:
a method of defending against deep learning model biased poisoning attacks, the method comprising the steps of:
(1) Acquiring an original data set, marking the sensitive attribute labels and task labels in the original data set, and constructing an original sample data set T;
(2) Partitioning the original sample data set T into k sub-training sets P_i (i ∈ {1, 2, 3, …, k}), and assigning each sample x to a sub-training set P_i using a hash function h;
(3) Training a base classifier on each sub-training set P_i;
(4) In the inference stage of deep learning model training, evaluating the input of each base classifier one by one, and counting the number of correct classifications of each base classifier;
(5) Screening out the base classifier with the highest classification accuracy according to step (4), and training the deep learning model again with this base classifier, finally obtaining the newly trained deep learning model.
In step (2), the distribution rule is: P_i := {x ∈ T | h(x) ≡ i (mod k)}.
Each sub-training set P_i contains an equal number of samples.
In step (3), the training mode is as follows: a base classifier is trained independently on each sub-training set, and each base classifier can only access the class labels of the data in its own sub-training set.
In step (5), the screening method is as follows: a new classifier g(T, x) is defined over the correct-classification counts n_c(x) of the base classifiers, with g(T, x) := arg max_c n_c(x).
Compared with the prior art, the beneficial effects of the invention include at least the following:
The method enhances the robustness of the deep learning model against bias poisoning attacks, has good universality, ensures the objectivity and fairness of the deep learning model's decision making, and improves the model's ability to defend against bias poisoning attacks.
Drawings
Fig. 1 is a flow chart of a method for defending against deep learning model bias poisoning attacks according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
During the training stage, a deep learning model can easily encounter poisoning attacks, especially bias poisoning attacks caused by tampering with sensitive attribute data, which lead the model to produce false predictions that mislead decision makers and impair the model's fairness. To solve these problems, this embodiment provides a defense method against deep learning model bias poisoning attacks; a flow diagram is shown in Fig. 1.
(1) Definition of deep learning model fairness
The invention defines fairness of a deep learning model as decision making that carries no bias and is not influenced by sensitive attributes when the model makes decisions automatically.
(2) Definition of deep learning model bias poisoning
The invention defines bias poisoning of a deep learning model as the following behavior: data containing sensitive attributes in the original sample data set is tampered with, the model is trained on the tampered sample data set when making automatic decisions, and the model consequently produces biased predictions and decisions that impair its fairness. Tampering with the original sample data set includes destroying the integrity of the data set, or leaving its integrity intact while corrupting the attribute labels of individual data items. For example, an attacker may deliberately manipulate sensitive attribute data in the original sample data set during the training stage so that the deep learning model is wrongly trained and makes biased decisions; the attacker may add data to or delete data from the original sample data set, or flip the class labels of individual data items. Such manipulation causes the originally sampled data to no longer follow an independent and identical distribution, inflates the proportion of samples carrying a certain class label, and leads the deep learning model to make biased decisions, impairing its fairness.
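For illustration, the label-flipping manipulation described above can be sketched in a few lines of Python. This is a minimal sketch, assuming samples are (features, sensitive attribute, task label) tuples; the function and parameter names are illustrative, not taken from the patent.

```python
import random

def flip_labels(dataset, target_sensitive_value, flipped_label, rate, seed=0):
    # Bias poisoning via label flipping: corrupt the task label of a fraction
    # `rate` of the samples whose sensitive attribute matches the targeted
    # value, skewing the class distribution of that group while leaving the
    # features themselves untouched.
    rng = random.Random(seed)
    poisoned = []
    for features, sensitive, label in dataset:
        if sensitive == target_sensitive_value and rng.random() < rate:
            label = flipped_label
        poisoned.append((features, sensitive, label))
    return poisoned
```

After such a manipulation, the targeted group's labels no longer follow the original distribution, which is exactly the distortion the defense below is designed to contain.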
(3) Preparation and preprocessing of data sets
This embodiment selects an image data set with multi-label classification, such as the CIFAR data set, and uses one bias attribute label B as the sensitive attribute label, for example a gender feature. Other labels in the data set are selected as one or more task labels, such as occupation labels, and the data set is preprocessed to construct the original sample data set T. The original sample data set is divided into k sub-training sets using a hash function h, ensuring that each sub-training set contains an equal number of samples. Because the hash function creates a fixed mapping between individual data items and sub-training sets, and the hash value depends only on the value of the data item itself, the sub-training set to which a data sample is mapped cannot change, whether an attacker manipulates other data to mount a poisoning attack, changes the total number of samples by deleting or adding data, or randomly reorders the samples.
(4) Selection of hash functions
A hash function is chosen that partitions the original sample data set T into sub-data sets. The original training data set T is divided into k sub-training sets P_i (i ∈ {1, 2, 3, …, k}); the hash function h determines the sub-training set to which each sample x is assigned, P_i := {x ∈ T | h(x) ≡ i (mod k)}, and each sub-training set contains an equal number of samples.
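For illustration, the following Python sketch implements this partitioning, assuming each sample can be serialized to bytes; the use of SHA-256 and the helper names are assumptions, not prescribed by the patent. Note that a generic hash makes the block sizes only approximately equal, so the exactly equal sizes required above are an additional constraint on the choice of h.

```python
import hashlib

def stable_hash(sample_bytes: bytes) -> int:
    # The hash value depends only on the sample's own bytes, never on its
    # position in the data set or on any other sample.
    return int.from_bytes(hashlib.sha256(sample_bytes).digest(), "big")

def partition(dataset, k):
    # P_i = {x ∈ T | h(x) ≡ i (mod k)}: a sample always lands in the same
    # sub-training set, regardless of insertions, deletions, or reordering
    # elsewhere in the data set.
    parts = [[] for _ in range(k)]
    for sample in dataset:
        i = stable_hash(repr(sample).encode("utf-8")) % k
        parts[i].append(sample)
    return parts
```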
(5) Training the deep learning model on the sub-training sets
A base classifier is trained independently on each of the divided sub-training sets. The base classifier is defined as f_i(x) := f(P_i, x), and each base classifier can only access the class labels of the data in its own sub-training set.
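A minimal sketch of this independent training step, continuing the tuple format above and assuming an illustrative train_classifier(X, y) routine that stands in for whatever base model is chosen:

```python
def train_base_classifiers(parts, train_classifier):
    # Train one base classifier f_i := f(P_i, ·) per sub-training set; each
    # classifier sees only the features and class labels of its own partition.
    classifiers = []
    for part in parts:
        X = [features for features, sensitive, label in part]
        y = [label for features, sensitive, label in part]
        classifiers.append(train_classifier(X, y))
    return classifiers
```

Because each f_i is fit in isolation, a poisoned sample can corrupt at most the one partition it hashes into.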
(6) Evaluating each base classifier
In the inference stage of the deep learning model, the input of each base classifier is evaluated. The k base classifiers return multiple classification results; letting c denote the correct classification result, the number of base classifiers that classify correctly is counted as n_c(x) := |{i ∈ [k] | f_i(x) = c}|.
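The count n_c(x) can be sketched as follows, assuming each trained base classifier exposes a predict(x) method returning a class label (an illustrative interface, not specified by the patent):

```python
from collections import Counter

def class_counts(classifiers, x):
    # n_c(x) := |{i ∈ [k] | f_i(x) = c}| for every class c that appears.
    return Counter(clf.predict(x) for clf in classifiers)
```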
(7) Selecting the base classifier with the highest classification accuracy
A new classifier g(T, x) is defined to screen out the largest n_c(x): g(T, x) := arg max_c n_c(x). That is, this step screens out the base classifier with the highest classification accuracy; the deep learning model is trained again using this base classifier, finally obtaining the newly trained deep learning model.
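In effect, g(T, x) := arg max_c n_c(x) reduces to a majority vote over the k base classifiers, as in the minimal sketch below (same assumed predict(x) interface as above):

```python
from collections import Counter

def aggregate_decision(classifiers, x):
    # g(T, x) := arg max_c n_c(x): return the class on which the largest
    # number of base classifiers agree; ties are broken arbitrarily.
    votes = Counter(clf.predict(x) for clf in classifiers)
    return votes.most_common(1)[0][0]
```

A poisoned partition contributes at most one vote, so the aggregate decision changes only if the poisoning flips the outcome of the majority.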
The above method for defending against deep learning model bias poisoning attacks is a new way of training deep learning models. Because the hash value of the hash function depends only on the value of the sample data x, no matter whether an attacker manipulates data to poison the model, changes the total number of samples, or reorders the samples, the sub-training set to which each sample x is mapped cannot change; the training of each sub-training set is therefore independent and unaffected by the others. Partitioning the original sample data set into k sub-training sets with a hash function thus strengthens the robustness of the deep learning model against bias poisoning attacks. The method allows a simpler model structure to be chosen to reduce computation and time consumption in real production and application scenarios, and a more complex model can also be chosen according to the actual situation, so the method has good universality. The method further ensures the objectivity and fairness of the deep learning model's decision making, improves the model's ability to defend against bias poisoning attacks, and provides guidance for improving model robustness, enhancing the objectivity of model decisions, and guaranteeing their fairness.
The foregoing is a detailed description of the preferred embodiments and advantages of the invention. It should be understood that the description is merely illustrative of the presently preferred embodiments, and that any changes, additions, substitutions and equivalents made within the spirit and scope of the invention are intended to be included within its scope of protection.

Claims (2)

1. A method for defending against deep learning model bias poisoning attacks, the method comprising the steps of:
(1) Acquiring an original data set, marking sensitive attribute tags and task tags in the original data set, and constructing an original sample data set T;
(2) Partitioning the original sample data set T into k sub-training sets P_i (i ∈ {1, 2, 3, …, k}), and assigning each sample x to a sub-training set P_i using a hash function h; the distribution rule is: P_i := {x ∈ T | h(x) ≡ i (mod k)}; each sub-training set P_i contains an equal number of samples;
(3) Training a base classifier on each sub-training set P_i; the training mode is as follows: a base classifier is trained independently on each sub-training set, and each base classifier can only access the class labels of the data in its own sub-training set;
(4) In the inference stage of deep learning model training, evaluating the input of each base classifier one by one, and counting the number of correct classifications of each base classifier;
(5) Screening out the base classifier with the highest classification accuracy according to step (4), and training the deep learning model again with this base classifier, finally obtaining the newly trained deep learning model.
2. The method for defending against deep learning model bias poisoning attacks according to claim 1, wherein in step (5) the screening method is as follows: a new classifier g(T, x) is defined over the correct-classification counts n_c(x) of the base classifiers, with g(T, x) := arg max_c n_c(x).
CN202110652511.1A 2021-06-11 2021-06-11 Deep learning model bias poisoning attack-oriented defense method Active CN113536298B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110652511.1A CN113536298B (en) 2021-06-11 2021-06-11 Deep learning model bias poisoning attack-oriented defense method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110652511.1A CN113536298B (en) 2021-06-11 2021-06-11 Deep learning model bias poisoning attack-oriented defense method

Publications (2)

Publication Number Publication Date
CN113536298A CN113536298A (en) 2021-10-22
CN113536298B (en) 2024-04-30

Family

ID=78095881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110652511.1A Active CN113536298B (en) 2021-06-11 2021-06-11 Deep learning model bias poisoning attack-oriented defense method

Country Status (1)

Country Link
CN (1) CN113536298B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112189204A (en) * 2018-05-16 2021-01-05 国际商业机器公司 Interpretation of artificial intelligence based suggestions
CN109508650A (en) * 2018-10-23 2019-03-22 浙江农林大学 A kind of wood recognition method based on transfer learning
CN110414548A (en) * 2019-06-06 2019-11-05 西安电子科技大学 The level Bagging method of sentiment analysis is carried out based on EEG signals
CN110737659A (en) * 2019-09-06 2020-01-31 平安科技(深圳)有限公司 Graph data storage and query method, device and computer readable storage medium
CN111862260A (en) * 2020-07-31 2020-10-30 浙江工业大学 Bias eliminating method and device based on cross-domain dual-generation type countermeasure network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"面向深度学习的公平性研究综述";陈晋音等;《计算机研究与发展》;20210208;第264-280页 *

Also Published As

Publication number Publication date
CN113536298A (en) 2021-10-22

Similar Documents

Publication Publication Date Title
CN106815604A (en) Method for viewing points detecting based on fusion of multi-layer information
CN110991218B (en) Image-based network public opinion early warning system and method
CN112734775A (en) Image annotation, image semantic segmentation and model training method and device
CN111444873A (en) Method and device for detecting authenticity of person in video, electronic device and storage medium
Gelman et al. Gaydar and the fallacy of decontextualized measurement
CN109117885A (en) A kind of stamp recognition methods based on deep learning
Kumar et al. A survey on analysis of fake news detection techniques
CN111428511B (en) Event detection method and device
CN113761259A (en) Image processing method and device and computer equipment
CN114842343A (en) ViT-based aerial image identification method
CN110993102A (en) Campus big data-based student behavior and psychological detection result accurate analysis method and system
CN114662497A (en) False news detection method based on cooperative neural network
CN112507912A (en) Method and device for identifying illegal picture
CN106599834A (en) Information pushing method and system
CN109726703A (en) A kind of facial image age recognition methods based on improvement integrated study strategy
Dai Measuring populism in context: A supervised approach with word embedding models
CN116955707A (en) Content tag determination method, device, equipment, medium and program product
CN115309860A (en) False news detection method based on pseudo twin network
CN116028803A (en) Unbalancing method based on sensitive attribute rebalancing
CN114662586A (en) Method for detecting false information based on common attention multi-mode fusion mechanism
Younis et al. A new parallel bat algorithm for musical note recognition.
CN113536298B (en) Deep learning model bias poisoning attack-oriented defense method
Alwan et al. Cancellable face biometrics template using alexnet
CN114595329B (en) System and method for extracting few sample events of prototype network
Pham et al. Ookpik-A Collection of Out-of-Context Image-Caption Pairs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant