CN114419379A - System and method for improving fairness of a deep learning model based on adversarial perturbation - Google Patents


Info

Publication number: CN114419379A
Application number: CN202210320949.4A
Authority: CN (China)
Prior art keywords: perturbation, image, discriminator, fairness, generator
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 王志波 (Zhibo Wang), 董小威 (Xiaowei Dong), 任奎 (Kui Ren)
Current Assignee: Zhejiang University (ZJU) (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Zhejiang University (ZJU)
Application filed by Zhejiang University (ZJU)
Priority: CN202210320949.4A
Publication: CN114419379A
Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a system and method for improving the fairness of a deep learning model based on adversarial perturbation. The system comprises a deployment model, a perturbation generator, and a discriminator; the deployment model comprises a feature extractor and a label predictor, and the perturbation generator is connected to the feature extractor. The invention processes the input data of the deployment model without changing the deep learning model itself. Model fairness is improved through adversarial perturbation: a corresponding perturbation generator and discriminator are designed, the discriminator captures fairness-related sensitive attribute information and guides the training and optimization of the perturbation generator, and the generated adversarial perturbation hides the sensitive attribute information of the data while retaining information relevant to the target task. This prevents the model from extracting sensitive information from the input data during feature extraction and thereby improves prediction fairness.

Description

System and method for improving fairness of a deep learning model based on adversarial perturbation
Technical Field
The invention relates to the field of trustworthy artificial intelligence (AI), and in particular to a system and method for improving the fairness of a deep learning model based on adversarial perturbation.
Background
In recent years, deep neural networks have shown excellent performance in fields such as image processing, natural language processing, and speech recognition. Although the spread of artificial intelligence technology has transformed many fields and brought convenience to human life, research has found that some existing AI systems carry ethical risks: they contain bias and discrimination against specific groups and may even place vulnerable groups at a further disadvantage. Mitigating the bias of deep learning models and improving the fairness of their decisions is therefore an important prerequisite for the trustworthy application of AI systems. Deep learning models usually learn from data; if the data distribution across groups is unbalanced, a spurious statistical association arises between the target task label and the sensitive attribute label. The model then learns this spurious association, ties its prediction of the target label to the sensitive attribute, and becomes biased against specific groups. Existing techniques for improving the fairness of deep learning models essentially require modifying the deployed model to prevent it from learning the spurious association and to eliminate the bias against specific groups, which greatly limits the practical application of such fairness mechanisms.
Disclosure of Invention
To address the shortcoming that the prior art requires modifying the deployed deep learning model, the invention provides a system and method for improving the fairness of a deep learning model based on adversarial perturbation; fairness is improved without changing the deep learning model.
To achieve this purpose, the invention is realized by the following technical scheme:
The invention discloses a deep learning model fairness improvement system based on adversarial perturbation, comprising a deployment model, a perturbation generator, and a discriminator. The deployment model comprises a feature extractor and a label predictor; the perturbation generator is connected to the feature extractor, and the feature extractor is connected to both the label predictor and the discriminator. An image is input into the feature extractor, which produces the latent representation of the image; feeding the latent representation into the label predictor yields the prediction of the target label, and feeding it into the discriminator yields the prediction of the image's sensitive attribute.
As a further improvement, the input of the perturbation generator is an image and its output is an adversarial perturbation; the perturbation value is added to the input image and the sum is fed into the feature extractor.
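The wiring just described (generator into feature extractor, feature extractor into both the label predictor and the discriminator) can be sketched with toy stand-in modules. The linear maps below are placeholders for the trained networks, purely to make the data flow executable; all names and sizes are illustrative assumptions, not the patent's architecture.

```python
import numpy as np

# Toy stand-ins for the four modules; in practice each is a trained
# neural network. Linear maps keep the data flow executable end to end.
rng = np.random.default_rng(0)
W_f = rng.normal(size=(4, 16))   # feature extractor weights
W_h = rng.normal(size=(2, 4))    # label predictor weights
W_d = rng.normal(size=(2, 4))    # sensitive-attribute discriminator weights

def f(x): return W_f @ x                              # image -> latent z
def h(z): return W_h @ z                              # z -> target label logits
def D(z): return W_d @ z                              # z -> sensitive attr logits
def G(x): return np.clip(0.01 * x, -8 / 255, 8 / 255)  # bounded perturbation

x = rng.random(16)       # input image (flattened toy example)
z = f(x + G(x))          # perturbed image -> latent representation
y_logits = h(z)          # target label prediction
s_logits = D(z)          # sensitive attribute prediction (used only in training)
```

The discriminator branch exists only to train the generator; at deployment time only G, f, and h are evaluated.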
The invention also discloses a method for improving the fairness of a deep learning model based on adversarial perturbation, comprising the following steps:
1) adding an adversarial perturbation to the image with a perturbation generator, feeding the perturbed image into the feature extractor of the deployment model, which outputs the latent representation of the image; feeding the latent representation into the label predictor yields the prediction of the target label;
2) measuring the sensitive attribute information contained in the perturbed image: feeding the latent representation into a discriminator yields a prediction of the image's sensitive attribute; the discriminator is trained to predict the sensitive attribute from the latent representation and is updated accordingly;
3) updating the perturbation generator to generate better adversarial perturbations that fool the discriminator, so that the latent representation of the perturbed image contains as little sensitive attribute information as possible while the prediction of the target label predictor remains as accurate as possible;
4) repeating steps 2) and 3) until the generator fools the discriminator well and the target label predictor retains high accuracy; the perturbation generator is then integrated into the data preprocessing stage of the deployment model as a fairness-improvement module, adding adversarial perturbations to input images to improve fairness.
As a further improvement, the deployment model of the invention is expressed as M = h ∘ f, where f is the feature extractor, h is the target label predictor, the input image is x, the sensitive attribute is s, and the target label is y.
As a further improvement, in step 1) of the invention, a perturbation generator G adds an adversarial perturbation to the image x; the perturbed image is x' = x + G(x), and the perturbation satisfies the L∞ norm constraint ‖G(x)‖∞ ≤ ε. The perturbed image x' is input into the deployment model: the feature extractor f outputs the latent representation z = f(x') of the image, and feeding z into the label predictor yields the target label prediction ŷ = h(z).
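The L∞ constraint on the perturbation can be enforced by clipping, as in this minimal numpy sketch (the helper name and the assumed pixel range [0, 1] are illustrative): clipping the raw generator output to the ball of radius ε guarantees ‖x' − x‖∞ ≤ ε.

```python
import numpy as np

def apply_bounded_perturbation(x, delta, eps):
    """Clip a raw perturbation to the L-infinity ball of radius eps,
    add it to the image, and keep pixel values in [0, 1]."""
    delta = np.clip(delta, -eps, eps)   # enforce ||delta||_inf <= eps
    return np.clip(x + delta, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))               # toy image, values in [0, 1]
delta = rng.normal(size=(3, 8, 8))      # raw (unbounded) generator output
eps = 8 / 255
x_pert = apply_bounded_perturbation(x, delta, eps)
```

Note that the second clip (to the valid pixel range) can only move a pixel back toward its original value, so the ε bound still holds afterward.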
As a further improvement, in step 2) of the invention, the discriminator D is updated so that it accurately captures the information of the sensitive attribute s from the latent representation. The loss function of D is:
L_D = CE(D(z), s)
where CE denotes cross entropy, z = f(x') is the latent representation of the perturbed data, D(z) is the discriminator's output for the sensitive attribute, and s is the true sensitive attribute.
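As a rough numerical illustration of the loss above: the cross entropy between D's predicted sensitive-attribute probabilities and the true attribute is small when D recovers the attribute and rises to log 2 when D is reduced to a coin flip. The toy probabilities below are invented for illustration.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy between predicted class probabilities
    (shape [N, C]) and integer class labels (shape [N])."""
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

# Toy discriminator outputs over two sensitive-attribute classes.
d_out = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
s_true = np.array([0, 1])
loss_confident = cross_entropy(d_out, s_true)              # D recovers s
loss_fooled = cross_entropy(np.full((2, 2), 0.5), s_true)  # D at chance
```

Minimizing this loss over D's parameters is what step 2) calls "updating the discriminator".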
As a further improvement, in step 3) of the invention, the entropy of D's prediction on the perturbed image is increased so that D makes a random guess on the perturbed sample x'. This entropy loss term is expressed as:
L_ent = −H(D(f(x')))
where H denotes entropy. To this point, the generator G's total loss for improving fairness is expressed as L_fair = −L_D + β·L_ent, where β is a small value controlling the weight of the entropy constraint term.
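The role of the entropy term can be checked numerically: the negative entropy of D's prediction is minimized exactly when that prediction is uniform, i.e. a random guess. A toy numpy sketch (values illustrative):

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Mean Shannon entropy (in nats) of each row of class probabilities."""
    p = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.sum(p * np.log(p), axis=1)))

confident = np.array([[0.99, 0.01]])  # D still recovers the attribute
uniform = np.array([[0.50, 0.50]])    # D reduced to a random guess

# The entropy loss is the negative entropy of D's prediction, so
# minimizing it drives D's output on perturbed samples toward uniform.
l_ent_confident = -entropy(confident)
l_ent_uniform = -entropy(uniform)
```

This is why the entropy term prevents over-shooting: it rewards uncertainty rather than a confidently wrong attribute prediction.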
As a further improvement, in step 3) of the invention, besides the fairness-aware loss L_fair, the information of the target label must be retained in the latent representation so that the model's performance on target label prediction is preserved; the loss term responsible for model accuracy is:
L_acc = CE(h(f(x')), y)
where CE denotes cross entropy and h(f(x')) is the output of the model's target label predictor. During the update of G, the discriminator is fooled and target label accuracy is maintained by jointly minimizing L_fair and L_acc. The balance between L_fair and L_acc is controlled by a parameter λ: the higher λ, the better the main-task accuracy is maintained; the lower λ, the more fairness is improved. The total loss function L_G of G is designed to contain the fairness-aware loss L_fair and the accuracy-preserving loss L_acc; with it the perturbation generator learns to generate adversarial perturbations that improve model fairness while keeping target label prediction accurate:
L_G = λ·L_acc + (1 − λ)·L_fair
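Under the sign conventions reconstructed above, the generator objective can be sketched numerically as follows. This is a toy numpy illustration; λ = 0.7 and β = 0.1 are arbitrary example values, not taken from the patent.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

def entropy(probs, eps=1e-12):
    p = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.sum(p * np.log(p), axis=1)))

def generator_loss(d_probs, s, y_probs, y, lam=0.7, beta=0.1):
    """L_G = lam * L_acc + (1 - lam) * L_fair, with
    L_fair = -CE(D(z), s) - beta * H(D(z))  (fool D, push it to guess)
    L_acc  =  CE(h(z), y)                    (keep the target task)."""
    l_fair = -cross_entropy(d_probs, s) - beta * entropy(d_probs)
    l_acc = cross_entropy(y_probs, y)
    return lam * l_acc + (1.0 - lam) * l_fair

y_probs = np.array([[0.9, 0.1]])  # target label still well predicted
loss_fooled = generator_loss(np.array([[0.5, 0.5]]),    # D at chance
                             np.array([0]), y_probs, np.array([0]))
loss_exposed = generator_loss(np.array([[0.99, 0.01]]),  # D recovers s
                              np.array([0]), y_probs, np.array([0]))
```

As expected under this formulation, the generator loss is lower when the discriminator is reduced to a random guess than when it still recovers the sensitive attribute.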
as a further improvement, in step 4) of the invention, the disturbance generator
Figure 427103DEST_PATH_IMAGE007
And discriminator
Figure 328063DEST_PATH_IMAGE014
Conducting a mini-max game until the generator can fool the discriminator well and the target label predictor has a high accuracy, at which point the generator will be used
Figure 156341DEST_PATH_IMAGE007
Deployed as a model
Figure 609188DEST_PATH_IMAGE030
Adaptively generating perturbations for the input data.
As a further improvement, in the mini-max game process, the discriminator
Figure 749183DEST_PATH_IMAGE014
Maximizing prediction of sensitive attributes from feature space
Figure 329200DEST_PATH_IMAGE005
Ability of disturbance generator
Figure 871040DEST_PATH_IMAGE007
Then an attempt is made to fool as much as possible
Figure 494788DEST_PATH_IMAGE014
At the same time let
Figure 122078DEST_PATH_IMAGE003
The target label of the sample after disturbance can be predicted, and the process target function can be formalized as follows:
Figure 505786DEST_PATH_IMAGE031
Figure 292345DEST_PATH_IMAGE032
Figure 962361DEST_PATH_IMAGE010
wherein, the parameters to be updated in the objective function are
Figure 952314DEST_PATH_IMAGE014
And
Figure 998767DEST_PATH_IMAGE007
update
Figure 639833DEST_PATH_IMAGE014
Updating by maximizing (max) the above-mentioned objective function
Figure 356116DEST_PATH_IMAGE007
Minimizing (min) the above objective function, a constraint term representation generator of the objective function
Figure 957999DEST_PATH_IMAGE007
For input image
Figure 932777DEST_PATH_IMAGE004
Applying a perturbation to the image
Figure 303716DEST_PATH_IMAGE008
Disturbance satisfies
Figure 190900DEST_PATH_IMAGE009
Norm limitation
Figure 670292DEST_PATH_IMAGE010
The implicit space obtained by the data after disturbance is expressed as
Figure 58548DEST_PATH_IMAGE012
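The alternating min-max updates can be exercised end to end on a deliberately tiny example: a 1-D "feature", an identity feature extractor, a logistic-regression discriminator updated by gradient ascent on its log-likelihood, and a per-sample additive perturbation updated by gradient descent on the same objective and projected back onto the ε-ball. Everything here (the 1-D setting, learning rates, data) is an illustrative toy, not the patent's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps, lr = 200, 0.5, 0.5
s = rng.integers(0, 2, size=n)                        # sensitive attribute
x = np.where(s == 1, 1.0, -1.0) + 0.3 * rng.normal(size=n)  # correlated feature
delta = np.zeros(n)                                   # generator's perturbation
w, b = 0.0, 0.0                                       # discriminator parameters

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for _ in range(200):
    z = x + delta                     # identity feature extractor
    p = sigmoid(w * z + b)            # D's predicted P(s = 1 | z)
    # D step: gradient ASCENT on the log-likelihood of s given z.
    w += lr * float(np.mean((s - p) * z))
    b += lr * float(np.mean(s - p))
    # G step: gradient DESCENT on the same objective w.r.t. delta,
    # then projection back onto the L-infinity norm constraint.
    p = sigmoid(w * (x + delta) + b)
    delta -= lr * (s - p) * w
    delta = np.clip(delta, -eps, eps)
```

After training, the perturbation stays inside the ε-ball by construction while actively degrading the discriminator's ability to read the sensitive attribute from the feature.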
The invention has the following beneficial technical effects:
In step 1) of the technical scheme of the invention, first, an adversarial perturbation is added to the image to improve model fairness; second, a perturbation generator is introduced to produce the adversarial perturbation, so that once training of the generator is complete it can generate an adversarial perturbation for any image and improve model fairness without needing to know the image's sensitive attribute or target label.
In step 2) of the technical scheme of the invention, entropy is used in addition to cross entropy when fooling the discriminator, so that the generated adversarial perturbation increases the entropy of D's prediction on the perturbed image. This prevents the model from extracting information for the opposite sensitive attribute instead of extracting no sensitive attribute information at all: for example, if the input image is male, the goal is that the model extracts no gender information after perturbation, rather than extracting female information instead.
The invention prevents the deployment model from extracting the sensitive features of the data by modifying the input image, so fairness can be improved without changing the model. The invention processes the input data of the deployment model without changing the deep learning model. It improves model fairness based on adversarial perturbation and designs a corresponding perturbation generator and discriminator, where the perturbation generator directly generates the adversarial perturbation and the discriminator assists the training of the perturbation generator. The discriminator captures fairness-related sensitive attribute information and guides the training and optimization of the perturbation generator; the generated adversarial perturbation hides the sensitive attribute information of the data while retaining target-task-related information, preventing the model from extracting sensitive information from the input data during feature extraction and thereby improving prediction fairness.
Drawings
FIG. 1 is a block diagram of the deep learning model fairness improvement system based on adversarial perturbation.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by the following embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 is a framework diagram of the deep learning model fairness improvement system based on adversarial perturbation. The system includes a deployment model, a perturbation generator, and a discriminator. The deployment model includes a feature extractor and a label predictor; the perturbation generator is connected to the feature extractor, and the feature extractor is connected to both the label predictor and the discriminator. An image is input into the feature extractor, which produces the latent representation of the image; feeding the latent representation into the label predictor yields the prediction of the target label, and feeding it into the discriminator yields the prediction of the image's sensitive attribute.
The method comprises the following steps:
1) adding an adversarial perturbation to the input image with the perturbation generator, feeding the perturbed image into the deployment model, whose feature extractor outputs the latent representation of the image; feeding the latent representation into the label predictor yields the prediction of the target label;
2) training the sensitive attribute discriminator to predict the sensitive attribute from the latent representation, and updating the discriminator to guide the update of the perturbation generator;
3) updating the perturbation generator to generate better adversarial perturbations that fool the discriminator, so that the latent representation of the perturbed image contains as little sensitive attribute information as possible while the prediction of the label predictor remains as accurate as possible;
4) repeating steps 2) and 3) until the generator fools the discriminator well and the label predictor retains high accuracy; the generator is then integrated into the data preprocessing stage of the deployment model as a fairness-improvement module, adding adversarial perturbations to input images to improve fairness.
the deployment model may be represented as
Figure 607658DEST_PATH_IMAGE001
Wherein
Figure 308767DEST_PATH_IMAGE002
A representative feature extractor for extracting a feature of the image,
Figure 376080DEST_PATH_IMAGE003
representing a label predictor, input image is noted
Figure 721610DEST_PATH_IMAGE004
The sensitivity attribute of the image is recorded as
Figure 471304DEST_PATH_IMAGE005
Object tag is marked as
Figure 144862DEST_PATH_IMAGE006
The implicit space output in the feature extraction process is represented as
Figure 874920DEST_PATH_IMAGE034
The final output result of the model is the output of the label predictor
Figure 465171DEST_PATH_IMAGE035
. Disturbance generator
Figure 130638DEST_PATH_IMAGE007
Is inputted as
Figure 416126DEST_PATH_IMAGE004
The output is the antagonistic disturbance, the disturbance value and the input image
Figure 74509DEST_PATH_IMAGE004
The summed values are input to a feature extractor. Distinguishing device
Figure 269998DEST_PATH_IMAGE014
Connected to the output of the feature extractor, from a hidden spatial representation
Figure 496580DEST_PATH_IMAGE036
The output of the medium prediction sensitive attribute is the predicted value of the sensitive attribute
Figure 393998DEST_PATH_IMAGE037
The method specifically comprises the following steps:
1) The adversarial perturbation generation module is trained with the training data, and the module is used to modify the input data. The perturbation generator G adds an adversarial perturbation to the data x; the perturbed image is x' = x + G(x), and the perturbation satisfies the L∞ norm limit ‖G(x)‖∞ ≤ ε. The perturbed image x' is input into the deployment model: the feature extractor f outputs the latent representation z = f(x') of the image, and feeding z into the label predictor yields the target label prediction ŷ = h(z).
2) The sensitive attribute information contained in the perturbed image is measured: the discriminator D is trained to predict the sensitive attribute from the latent representation and is updated so that it better captures the information of the sensitive attribute and guides the update of the perturbation generator.
The latent representation of the perturbed data is z = f(x'), and the discriminator's output for the sensitive attribute of the perturbed image is ŝ = D(z). By updating D, the discriminator accurately captures the information of the sensitive attribute s from the latent representation; the loss function of D is:
L_D = CE(D(z), s)
where CE denotes cross entropy and s is the true sensitive attribute. The sensitive attribute discriminator D is continuously updated by minimizing its loss function L_D.
3) The perturbation generator G is updated to better generate adversarial perturbations that fool the discriminator, so that the latent representation of the data with the adversarial perturbation contains as little sensitive attribute information as possible while the prediction of the label predictor remains as accurate as possible.
The perturbation generator G must fool the discriminator D and prevent the model from extracting sensitive attribute information, thereby eliminating the association between the sensitive attribute and the target label and improving fairness without changing the model. On the one hand, L_D needs to be maximized; however, this alone would move the image in feature space to the other side of the sensitive attribute hyperplane. Therefore, the entropy of D's prediction on the perturbed image x' must also be increased so that D makes a random guess on x'; this entropy loss term can be expressed as:
L_ent = −H(D(f(x')))
where H denotes entropy. Thus, the perturbation generator G's total loss for improving fairness is expressed as L_fair = −L_D + β·L_ent, where β is a small value controlling the weight of the entropy constraint term. Besides the fairness-aware loss L_fair, the information of the target label must be kept in the latent representation and the model's performance on target label prediction preserved, so a loss term responsible for model accuracy is needed:
L_acc = CE(h(f(x')), y)
where CE denotes cross entropy and h(f(x')) is the output of the model's label predictor. During the update of G, the discriminator is fooled and target label accuracy is maintained by jointly minimizing L_fair and L_acc. The loss function L_G of G is expressed as:
L_G = λ·L_acc + (1 − λ)·L_fair
where the parameter λ controls the balance between L_fair and L_acc: the higher λ, the better the main-task accuracy is maintained; the lower λ, the more fairness is improved.
4) Steps 2) and 3) are repeated for iterative training until the generator can fool the discriminator well and the label predictor retains high accuracy; the generator is then integrated into the data preprocessing stage of the deployment model as a fairness-improvement module, adding adversarial perturbations to the input data to improve the fairness of the deployment model.
During iterative training, the perturbation generator G and the discriminator D play a min-max game until the generator fools the discriminator well and the label predictor is accurate; at that point the generator G is deployed as the preprocessing module of the model M, adaptively generating perturbations for the input data. In the min-max game, the discriminator D maximizes its ability to predict the sensitive attribute s from the feature space, while the perturbation generator G tries to fool D as much as possible and at the same time lets h still predict the target label of the perturbed image. The objective function can be formalized as:
min_G max_D E[log p_D(s | f(x + G(x)))]
s.t. x' = x + G(x), ‖G(x)‖∞ ≤ ε
where the parameters to be updated in the objective are those of D and G: D is updated by maximizing (max) the objective, and G is updated by minimizing (min) it. The constraint terms of the objective express that the generator G applies a perturbation to the input image x to obtain x' = x + G(x), that the perturbation satisfies the L∞ norm limit, and that the latent representation obtained from the perturbed data is z = f(x'). When the generator can fool the discriminator well and the accuracy of the label predictor is high, the iterative training is stopped and the generator G is deployed as the data preprocessing module of the model M; the generator G can then adaptively generate adversarial perturbations for the input image.
According to the method for improving the fairness of a deep learning model based on adversarial perturbation, a perturbation generator is trained for a given deployment model to add adversarial perturbations that prevent the model from extracting features related to the sensitive attribute, so that images with different sensitive attribute values are treated fairly and the fairness of the deployment model is improved. The invention is tested on the CelebA image dataset. In the tests, the target label y is the target task label to be predicted by the deployment model, and the sensitive attribute s is gender; the target label and the sensitive attribute both take values in {−1, 1}. To verify the improvement the method brings to models of different fairness levels, models obtained with 4 different training modes are adopted as deployment models:
1) Normally trained model: the model is trained to minimize the target label prediction loss on the dataset;
2) Adversarially trained model: a discriminator is added at the output of the model and learns to predict the sensitive attribute value; during training, the model's target label prediction loss on the dataset is minimized while the discriminator loss is maximized, so as to reduce model bias;
3) Label-flipping model: target labels of the training data are randomly flipped to amplify the bias in the dataset, and the target label prediction loss is minimized on this dataset, so that the model learns the bias present in the data;
4) Gradient-inversion model: the gradient back-propagated from the discriminator in the adversarially trained model is inverted; during training, the model's target label prediction loss on the dataset is minimized and the discriminator loss is also minimized, so as to enlarge the model's bias.
TABLE 1-1: results of the method on the normally trained model
TABLE 1-2: results of the method on the adversarially trained model
TABLE 1-3: results of the method on the label-flipping model
TABLE 1-4: results of the method on the gradient-inversion model
(The contents of Tables 1-1 to 1-4 are reproduced only as images in the original document.)
Regarding fairness improvement: the worse the original fairness of the deployed model, the larger the room for improvement, and the more pronounced the fairness gain achieved by the present method. As shown in Tables 1-1, 1-2, 1-3 and 1-4 above, each table reports results on deployment models of a different fairness level. In each table, the first column lists the target-label prediction task (Smiling, Attractive or Blond_Hair); the second column indicates whether the input is the original image or the perturbed image; the third column reports the ACC metric, which measures target-task accuracy:

$\mathrm{ACC} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}[\hat{y}_i = y_i]$

where $\mathbb{1}[\cdot]$ is the indicator function, $\hat{y}_i$ is the predicted target label of image $i$, and $y_i$ is its true target label; that is, ACC is the number of correct predictions divided by the total number of samples $N$, and a higher ACC indicates better target-task prediction. The fourth and fifth columns measure fairness. With $z$ denoting the sensitive attribute value of an image, DP computes the difference between the probabilities that groups with different sensitive attribute values are predicted as positive by the model:

$\mathrm{DP} = \left|P(\hat{y}=1 \mid z=0) - P(\hat{y}=1 \mid z=1)\right|$

and DEO measures the difference in false positive rate and false negative rate between groups with different sensitive attribute values:

$\mathrm{DEO} = \left|\mathrm{FPR}_{z=0} - \mathrm{FPR}_{z=1}\right| + \left|\mathrm{FNR}_{z=0} - \mathrm{FNR}_{z=1}\right|$

the closer both metrics are to 0, the better the fairness. Each row of a table gives the test result on the deployment model for either the original images or the images perturbed by the present invention. The experiments show that when the deployment model carries a certain bias, the method improves fairness to some extent while maintaining main-task accuracy; when the deployment model carries a large bias, the method markedly improves fairness while effectively maintaining main-task accuracy; and when the deployment model is already fairly fair, the method still improves fairness by a small margin while maintaining main-task accuracy.
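Assuming binary target labels and a binary sensitive attribute, the three metrics above can be computed as follows (a minimal NumPy sketch; the function names are our own):

```python
import numpy as np

def acc(y_pred, y_true):
    """ACC: fraction of correct target-label predictions."""
    return np.mean(y_pred == y_true)

def dp_gap(y_pred, z):
    """DP: |P(yhat=1 | z=0) - P(yhat=1 | z=1)|."""
    return abs(y_pred[z == 0].mean() - y_pred[z == 1].mean())

def deo_gap(y_pred, y_true, z):
    """DEO: sum of absolute FPR and FNR differences between the two sensitive groups."""
    gaps = 0.0
    for y_val in (0, 1):  # y=0 gives the FPR term, y=1 the (1 - TPR) = FNR term
        r0 = y_pred[(z == 0) & (y_true == y_val)].mean()
        r1 = y_pred[(z == 1) & (y_true == y_val)].mean()
        gaps += abs(r0 - r1)
    return gaps
```

Both fairness metrics are 0 for a perfectly fair predictor; the sketch assumes every (group, label) cell is non-empty.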
TABLE 2-1 Test results of the method on the Alibaba API
TABLE 2-2 Test results of the method on the Baidu API
Even when the deployment model cannot be accessed, the method still achieves a measurable fairness improvement. As shown in Tables 2-1 and 2-2, experiments were run against the smile-detection interfaces (APIs) provided by the Alibaba and Baidu vision open platforms, with the perturbation generator trained on the CelebA dataset; even when the architecture and parameters of the deployment model are unknown, the method improves fairness while largely preserving target-label prediction accuracy.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A system for improving fairness of a deep learning model based on adversarial perturbation, characterized by comprising a deployment model, a perturbation generator and a discriminator, wherein the deployment model comprises a feature extractor and a label predictor; the perturbation generator is connected to the feature extractor, and the feature extractor is connected to the label predictor and to the discriminator respectively; the feature extractor takes an image as input and maps it to a latent-space representation; inputting the latent-space representation into the label predictor yields the prediction of the target label, and inputting it into the discriminator yields the prediction of the image's sensitive attribute.
2. The system for improving fairness of a deep learning model based on adversarial perturbation according to claim 1, wherein the input of the perturbation generator is an image and its output is the adversarial perturbation; the perturbation value is added to the input image before the result is fed to the feature extractor.
3. A method for improving fairness of a deep learning model based on adversarial perturbation, characterized by comprising the following steps:
1) adding an adversarial perturbation to the image with a perturbation generator, inputting the perturbed image into the feature extractor of the deployment model, the feature extractor outputting the latent-space representation of the image, and inputting the latent-space representation into the label predictor to obtain the prediction of the target label;
2) measuring the sensitive-attribute information contained in the perturbed image: inputting the latent-space representation into a discriminator to obtain a prediction of the image's sensitive attribute, training the discriminator to predict the sensitive attribute from the latent-space representation, and updating the discriminator;
3) updating the perturbation generator so that it generates better adversarial perturbations that fool the discriminator, such that the latent-space representation of the perturbed image contains as little sensitive-attribute information as possible while the prediction of the target label predictor remains as accurate as possible;
4) repeating step 2) and step 3) until the generator can reliably fool the discriminator and the target label predictor retains high accuracy; the perturbation generator at this point is integrated into the data-preprocessing stage of the deployment model as a fairness-improvement module, adding adversarial perturbations to input images to improve fairness.
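Steps 1)–4) above amount to alternating updates of a discriminator and a perturbation generator against a frozen deployment model. The following toy PyTorch sketch shows one possible shape of that loop; every architecture, tensor shape and hyperparameter here is an illustrative assumption, not the patent's implementation:

```python
import torch
import torch.nn as nn

# Tiny stand-ins: frozen deployment model (feature extractor g + label predictor f),
# perturbation generator G, and sensitive-attribute discriminator D.
g = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16), nn.ReLU())          # feature extractor
f = nn.Linear(16, 2)                                                           # target label predictor
G = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 3 * 8 * 8), nn.Tanh())   # perturbation generator
D = nn.Linear(16, 2)                                                           # discriminator

for p in list(g.parameters()) + list(f.parameters()):
    p.requires_grad_(False)          # the deployed model is never modified

eps, lam, alpha = 0.05, 0.1, 0.5     # illustrative hyperparameters
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

def perturb(x):
    delta = eps * G(x).view_as(x)    # l_inf bound enforced via eps * tanh output
    return (x + delta).clamp(0.0, 1.0)

for step in range(3):                # toy random data; real training iterates over a dataset
    x = torch.rand(8, 3, 8, 8)
    y = torch.randint(0, 2, (8,))    # target labels
    z = torch.randint(0, 2, (8,))    # sensitive attributes

    # step 2): update D to predict z from the latent representation
    rep = g(perturb(x)).detach()
    loss_D = ce(D(rep), z)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # step 3): update G to fool D while keeping the target prediction accurate
    rep = g(perturb(x))
    probs = D(rep).softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    loss_fair = -ce(D(rep), z) - lam * entropy   # -L_D + lam * (-H)
    loss_acc = ce(f(rep), y)
    loss_G = alpha * loss_acc + (1 - alpha) * loss_fair
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```

At convergence, `perturb` alone is kept as the preprocessing module; the deployment model's weights never change.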
4. The method for improving fairness of a deep learning model based on adversarial perturbation according to claim 3, wherein the deployment model is expressed as $M = f \circ g$, where $g$ is the feature extractor and $f$ is the target label predictor; the input image is $x$, its sensitive attribute is $z$, and its target label is $y$.
5. The method for improving fairness of a deep learning model based on adversarial perturbation according to claim 4, wherein in the step 1), a perturbation generator $G$ adds an adversarial perturbation to the image $x$; the perturbed image is $\tilde{x} = x + G(x)$, and the perturbation satisfies the $\ell_\infty$ norm constraint $\|G(x)\|_\infty \le \epsilon$; the perturbed image $\tilde{x}$ is input into the deployment model, the feature extractor $g$ of the deployment model outputs the latent-space representation $g(\tilde{x})$ of the image, and inputting this representation into the label predictor yields the prediction $f(g(\tilde{x}))$ of the target label.
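One common way to enforce an $\ell_\infty$ constraint $\|G(x)\|_\infty \le \epsilon$ is to project the raw generator output onto the $\epsilon$-ball and keep pixel values valid; a minimal NumPy sketch (the function names and the $\epsilon$ default of 8/255 are assumptions, not the patent's values):

```python
import numpy as np

def bound_perturbation(raw_delta, eps=8 / 255):
    """Project a raw perturbation onto the l_inf ball of radius eps."""
    return np.clip(raw_delta, -eps, eps)

def apply_perturbation(x, raw_delta, eps=8 / 255):
    """x_tilde = x + G(x), with the perturbation l_inf-bounded and pixels kept in [0, 1]."""
    return np.clip(x + bound_perturbation(raw_delta, eps), 0.0, 1.0)
```

Because $x \in [0, 1]$, the final clip to $[0, 1]$ can only shrink the applied perturbation, so the $\ell_\infty$ bound still holds on the output.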
6. The method as claimed in claim 4, wherein in the step 2), the discriminator $D$ is updated so that it accurately captures the information of the sensitive attribute $z$ from the latent-space representation; the loss function of $D$ is:

$L_D = \mathcal{L}_{CE}(D(g(\tilde{x})), z)$

where $\mathcal{L}_{CE}$ denotes cross entropy, $g(\tilde{x})$ is the latent-space representation of the perturbed data, $D(g(\tilde{x}))$ is the output of the sensitive-attribute discriminator $D$, and $z$ is the true sensitive attribute.
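The discriminator loss $L_D$ is a standard cross entropy between the discriminator's output on the latent representation and the true sensitive attribute; a minimal NumPy sketch of that computation from raw logits (names are our own):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy L_CE over a batch, computed from raw logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# L_D = cross_entropy(D(g(x_tilde)), z): discriminator logits on the latent
# representation of the perturbed image versus the true sensitive attribute z.
```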
7. The method for improving fairness of a deep learning model based on adversarial perturbation as claimed in claim 4, 5 or 6, wherein in the step 3), the prediction entropy of $D$ on the perturbed image is increased so that $D$ makes random guesses on the perturbed sample $\tilde{x}$; the entropy loss is expressed as:

$L_E = -H(D(g(\tilde{x})))$

where $H$ denotes entropy. To this point, the total loss of the generator $G$ for improving fairness is expressed as:

$L_{fair} = -L_D + \lambda L_E$

where $\lambda$ is a small value controlling the weight of the entropy-constraint term.
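The entropy term and the combined fairness loss can be sketched as follows (NumPy; the default $\lambda$ is an illustrative small value, since the claim only requires it to be small):

```python
import numpy as np

def entropy_loss(probs):
    """L_E = -H(D(g(x_tilde))): negative mean Shannon entropy of the
    discriminator's predicted distribution; minimizing it pushes D toward
    random guessing on perturbed samples."""
    h = -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    return -h.mean()

def fairness_loss(loss_d, probs, lam=0.1):
    """L_fair = -L_D + lam * L_E, as in the claim."""
    return -loss_d + lam * entropy_loss(probs)
```

Minimizing `fairness_loss` simultaneously maximizes the discriminator's error ($-L_D$) and its prediction entropy.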
8. The method for improving fairness of a deep learning model based on adversarial perturbation of claim 7, wherein in the step 3), besides the fairness-aware loss $L_{fair}$, the information of the target label must be preserved in the latent-space representation so that the model's performance on target-label prediction is maintained; the loss term responsible for model accuracy is:

$L_{acc} = \mathcal{L}_{CE}(f(g(\tilde{x})), y)$

where $\mathcal{L}_{CE}$ denotes cross entropy and $f(g(\tilde{x}))$ is the output of the model's target label predictor. While updating $G$, decreasing $L_{fair}$ and at the same time decreasing $L_{acc}$ deceives the discriminator while preserving the accuracy of target-label prediction; $L_{fair}$ and $L_{acc}$ are balanced by a parameter $\alpha$: the higher $\alpha$ is, the better the main-task accuracy is maintained, and the lower $\alpha$ is, the more fairness can be improved. The loss function $L_G$ of $G$ is expressed as:

$L_G = \alpha L_{acc} + (1 - \alpha) L_{fair}$
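The combination in this claim is a single convex mix of the two loss terms; it can be sketched as a plain Python helper (the default $\alpha$ is an assumption):

```python
def generator_loss(loss_acc, loss_fair, alpha=0.5):
    """L_G = alpha * L_acc + (1 - alpha) * L_fair.
    A higher alpha preserves more main-task accuracy; a lower alpha
    trades accuracy for a larger fairness improvement."""
    return alpha * loss_acc + (1 - alpha) * loss_fair
```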
9. The method for improving fairness of a deep learning model based on adversarial perturbation according to claim 4 or 8, wherein in the step 4), the perturbation generator $G$ and the discriminator $D$ play a min-max game until the generator can fool the discriminator well and the target label predictor retains high accuracy, at which point the generator $G$ is deployed alongside the model $M$ to adaptively generate perturbations for the input data.
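Integrating the trained generator into the data-preprocessing stage, as described here and in step 4) of claim 3, can be sketched as a thin wrapper applied before every call to the deployed model (NumPy; the class, callable interface and parameter names are illustrative assumptions):

```python
import numpy as np

class FairnessPreprocessor:
    """Wraps a trained perturbation generator as a preprocessing step:
    every image is perturbed (l_inf-bounded, pixels kept valid) before
    being sent to the unchanged deployed model."""

    def __init__(self, generator, eps=8 / 255):
        self.generator = generator   # callable: image -> raw perturbation
        self.eps = eps

    def __call__(self, x):
        delta = np.clip(self.generator(x), -self.eps, self.eps)
        return np.clip(x + delta, 0.0, 1.0)

# usage: deployed_model(preprocessor(image)) instead of deployed_model(image)
```

The deployed model itself is untouched; only its inputs change, which is what allows the scheme to work even for black-box APIs.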
10. The method for improving fairness of a deep learning model based on adversarial perturbation of claim 9, wherein during the min-max game, the discriminator $D$ maximizes its ability to predict the sensitive attribute $z$ from the feature space, while the perturbation generator $G$ tries to fool $D$ as much as possible and at the same time lets $f$ correctly predict the target label of the perturbed sample; the objective of this process can be formalized as:

$\min_G \max_D \; \mathcal{L}_{CE}(D(g(\tilde{x})), z)$
$\text{s.t.} \quad \tilde{x} = x + G(x), \quad \|G(x)\|_\infty \le \epsilon$

where the parameters to be updated in the objective are those of $D$ and $G$: $D$ is updated to maximize (max) the objective, and $G$ is updated to minimize (min) it. The constraint terms of the objective express that the generator $G$ applies a perturbation to the input image $x$, the perturbed image is $\tilde{x} = x + G(x)$, the perturbation satisfies the $\ell_\infty$ norm constraint $\|G(x)\|_\infty \le \epsilon$, and the latent-space representation obtained from the perturbed data is $g(\tilde{x})$.
CN202210320949.4A 2022-03-30 2022-03-30 System and method for improving fairness of deep learning model based on antagonistic disturbance Pending CN114419379A (en)


Publications (1)

Publication Number: CN114419379A; Publication Date: 2022-04-29




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20220429