WO2023197927A1 - Model fairness evaluation methods and apparatus - Google Patents

Model fairness evaluation methods and apparatus

Info

Publication number
WO2023197927A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
evaluated
model
evaluation
fairness
Prior art date
Application number
PCT/CN2023/086570
Other languages
French (fr)
Chinese (zh)
Inventor
李进锋
刘翔宇
张�荣
Original Assignee
阿里巴巴(中国)有限公司
Priority date
Filing date
Publication date
Application filed by 阿里巴巴(中国)有限公司
Publication of WO2023197927A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures

Definitions

  • the embodiments of this specification relate to the field of computer technology, and in particular to two model fairness evaluation methods.
  • a model fairness assessment method including:
  • a fairness evaluation is performed on the model to be evaluated.
  • a sample processing module configured to, when the credibility detection result satisfies the non-credibility condition, perform sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample;
  • the first determination module is configured to determine the real data probability distribution of the image and/or text training samples based on the image and/or text training model;
  • the second determination module is configured to determine the credibility detection result of the sample to be evaluated based on the real data probability distribution and the generated adversarial network model;
  • a sample update module configured to perform sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample when the credibility detection result satisfies the non-credibility condition
  • the result display module is configured to obtain the fairness evaluation result of the model to be evaluated, and return the fairness evaluation result to the user.
  • a computing device including:
  • the memory is used to store computer-executable instructions
  • the processor is used to execute the computer-executable instructions, wherein when the processor executes the instructions, the steps of the above model fairness evaluation method are implemented.
  • a computer-readable storage medium which stores computer-executable instructions, wherein when the instructions are executed by a processor, the steps of the above model fairness evaluation method are implemented.
  • a computer program is provided, wherein when the computer program is executed in a computer, the computer is caused to perform the steps of the above model fairness evaluation method.
  • the model fairness evaluation method uses the image-and-text training model to model the real data probability distribution of the image-and-text training samples, performs credibility detection on the samples to be evaluated based on that distribution, and processes the non-credible samples to obtain updated evaluation samples, thereby improving the reliability and completeness of the evaluation samples in untrusted environments. This ensures the robustness of the model fairness evaluation method and the availability of the model evaluation results in both trusted and untrusted environments, and thus the accuracy of the fairness evaluation of the model to be evaluated. The evaluated model can then perform well in practical applications, both in terms of meeting regulatory compliance and of improving user experience, and the method can be applied to algorithmic governance.
  • Figure 3 is a process flow chart of a model fairness evaluation method provided by an embodiment of this specification
  • Figure 4 is a schematic structural diagram of a model fairness evaluation device provided by an embodiment of this specification.
  • Figure 5 is a flow chart of another model fairness evaluation method provided by an embodiment of this specification.
  • Figure 7 is a structural block diagram of a computing device provided by an embodiment of this specification.
  • Algorithm fairness: the automated decisions of an artificial intelligence algorithm are independent of protected sensitive attributes (natural and social attributes such as ethnicity, belief, and region). That is, with respect to protected sensitive attributes, the algorithm's decisions exhibit no bias toward or preference for individuals or groups arising from innate or acquired attributes.
  • embodiments of this specification provide a fairness evaluation system capable of evaluating the fairness of some artificial intelligence algorithm tasks (such as text classification and image classification). However, such a system considers only the fairness evaluation of natural inputs in a trusted environment, and its quantification of fairness relies entirely on statistical indicators such as accuracy, recall, and F1-score. In an uncontrolled (untrusted) environment, these statistical indicators are strongly affected by factors such as adversarial perturbation and data selection, so the fairness evaluation results produced by such a system do not accurately reflect the fairness of the algorithm (model) itself, and the validity and usability of the evaluation cannot be guaranteed.
  • Figure 1 shows a schematic diagram of a specific scenario of a model fairness assessment method provided according to an embodiment of this specification, which specifically includes the following steps.
  • Step 102: Based on the large-scale image-and-text training samples collected in the database, use self-supervised learning technology to train a large-scale pre-training model (the picture and/or text training model), and initially model the data probability distribution of the image-and-text training samples through the pre-training model; then, based on the pre-training model, use deep generation technology to construct a generative adversarial network model, and further optimize the data probability distribution of the image-and-text training samples through the generative adversarial network model.
  • a large-scale pre-training model, i.e., the picture and/or text training model.
  • graphic training samples can be understood as picture and text training samples.
  • the fairness evaluation of the model to be evaluated in the embodiments of this specification can be understood as computing model performance difference indicators across different groups defined by sensitive/protected attributes (such as ethnicity, belief, and income), for example false positive rate, statistical parity, equal opportunity, and disparate impact. The fairness of the model to be evaluated can subsequently be assessed based on these indicators.
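  • The group indicators named above are standard fairness metrics and can be computed directly from predictions, labels, and a sensitive attribute. A minimal sketch (the metric definitions are conventional; the function name and the data shapes are illustrative, not prescribed by this specification):

```python
# Group fairness indicators: statistical parity difference, equal
# opportunity difference, false positive rate gap, disparate impact.
# Inputs are parallel lists of binary labels, binary predictions, and a
# binary sensitive attribute (0/1 marking the two groups).

def rate(xs):
    return sum(xs) / len(xs) if xs else 0.0

def group_fairness(y_true, y_pred, group):
    out = {}
    for g in (0, 1):
        pred = [p for p, a in zip(y_pred, group) if a == g]
        fp = [p for p, t, a in zip(y_pred, y_true, group) if a == g and t == 0]
        tp = [p for p, t, a in zip(y_pred, y_true, group) if a == g and t == 1]
        out[g] = {
            "positive_rate": rate(pred),  # P(pred=1 | A=g)
            "fpr": rate(fp),              # P(pred=1 | y=0, A=g)
            "tpr": rate(tp),              # P(pred=1 | y=1, A=g)
        }
    return {
        "statistical_parity_diff": out[0]["positive_rate"] - out[1]["positive_rate"],
        "equal_opportunity_diff": out[0]["tpr"] - out[1]["tpr"],
        "fpr_diff": out[0]["fpr"] - out[1]["fpr"],
        "disparate_impact": (out[1]["positive_rate"] / out[0]["positive_rate"]
                             if out[0]["positive_rate"] else float("inf")),
    }
```

A model is then judged fairer the closer the difference indicators are to zero and the disparate impact ratio is to one.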
  • Step 106 Return the fairness evaluation result of the model to be evaluated to the user.
  • the model fairness evaluation method provided by the embodiment of this specification proposes a robust fairness evaluation system for image-and-text algorithms. It models the data probability distribution by combining large-scale pre-training technology and deep generation technology, performs credibility detection on the evaluation samples according to that probability distribution, and applies denoising, adversarial reconstruction, and diversity generation respectively to untrusted samples such as adversarial samples or distribution deviation samples, so as to improve the reliability and integrity of evaluation samples in untrusted environments, thereby ensuring the robustness of the fairness evaluation system in an uncontrolled environment and the availability of its evaluation results.
  • Figure 2 shows a flow chart of a model fairness assessment method provided by an embodiment of this specification, which specifically includes the following steps.
  • Step 202 Based on the image and/or text training model, determine the real data probability distribution of the image and/or text training sample.
  • pictures include but are not limited to pictures of any type, any size, and contain any content, such as pictures of animals or people; texts include but are not limited to any type, any length, and contain any content, such as academic discussions, literary articles, etc.
  • large-scale image and/or text training samples are first used to train the image and/or text training model; the trained model is then used to model the real data probability distribution of the image and/or text training samples; finally, the real data probability distribution is tuned by the generative adversarial network model to obtain the optimized real data probability distribution.
  • the specific implementation method is as follows:
  • Determining the real data probability distribution of the picture and/or text training samples based on the picture and/or text training model includes:
  • the real data probability distribution of the training sample is adjusted according to the generative adversarial network model to obtain the adjusted real data probability distribution of the training sample.
  • when the training sample is a picture, the image and/or text training model can be understood as a vision Transformer model, etc.; when the training sample is text, the model can be understood as a language model such as BERT; when the training sample is an image-and-text training sample, the model can be understood as a multi-modal fusion model combining a vision Transformer model and a BERT language model.
  • the model fairness evaluation method provided by the embodiments of this specification first trains the picture and/or text training model on large-scale picture and/or text training samples, and initially models the real data probability distribution of those samples based on the trained model;
  • the real data probability distribution of the training samples is then optimized by the generative adversarial network model constructed with deep generation technology, so as to ensure the accuracy and availability of the data probability distribution modeled for the image and/or text training samples.
  • the step of adjusting the real data probability distribution of the training sample according to the generative adversarial network model to obtain the adjusted real data probability distribution of the training sample includes:
  • the real data probability distribution of the training sample is adjusted according to the discrimination module to obtain the adjusted real data probability distribution of the training sample.
  • the construction phase of the generative adversarial network model includes two parts.
  • the first part is the construction of the generative adversarial network model
  • the second part is the training of the generative adversarial network model.
  • the generative adversarial network model consists of two parts: the discriminant module and the generation module. When constructing the generative adversarial network model, the image and/or text training model obtained by training in the above embodiment can be used as the discriminant module; as for the generation module, if image data is to be generated, a stack of upsampling deconvolution networks can be used to build it, and if text data is to be generated, a Transformer can be used as the generation module.
  • the specific implementation method is as follows:
  • the method of constructing a generative adversarial network model based on the picture and/or text training model includes:
  • the generative adversarial network model is constructed according to the discriminating module and the generating module.
  • the image and/or text training sample model obtained by training in the above embodiment is used as the discriminant module of the generative adversarial network model.
  • the discriminant module of the generative adversarial network model is initialized according to the model parameters of the image and/or text training model;
  • that is, the discriminant module is built from those model parameters, while the generation module is constructed by selecting a deconvolution network or a text generation network according to the type of data to be generated; the generative adversarial network model is then constructed from the resulting discriminant module and generation module.
  • after construction, the generative adversarial network model can be trained; specifically, the generation module and the discriminant module are trained alternately by constructing a zero-sum game adversarial loss function, so that the data produced by the generation module comes closer to the real data distribution while the discriminant module becomes better at distinguishing real data from generated data.
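  • The zero-sum objective described here is the standard GAN minimax value that the two modules compete over. A minimal sketch of that value alone (the discriminator outputs below are placeholder numbers, not outputs of any model from this specification):

```python
import math

def adversarial_loss(d_real, d_fake):
    """Zero-sum GAN objective V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    The discriminant module is trained to maximise this value; the
    generation module is trained to minimise it. d_real / d_fake are the
    discriminator's probability outputs on real and generated samples.
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A sharper discriminator (0.9 on real, 0.1 on fake) attains a higher
# objective value than an unsure one (0.6 / 0.4), which is what the
# alternating training pushes the two modules toward and against.
sharp = adversarial_loss([0.9], [0.1])
unsure = adversarial_loss([0.6], [0.4])
```

Alternating optimisation of this single value is what makes generated data drift toward the real distribution while the discriminator stays informative.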
  • the parameters of the pre-training model, i.e., the model parameters of the picture and/or text training model;
  • the discriminator, i.e., the discriminant module.
  • the model fairness evaluation method constructs a generative adversarial network model based on the picture and/or text training model and trains it on the real data probability distribution of the picture and/or text training samples. Subsequently, the trained generative adversarial network model can adjust the real data probability distribution of the picture and/or text training samples to obtain the adjusted distribution, improving the authenticity of the picture and/or text training samples.
  • Step 204 Determine the credibility detection result of the sample to be evaluated based on the real data probability distribution and the generated adversarial network model.
  • determining the credibility detection result of the sample to be evaluated based on the real data probability distribution and the generated adversarial network model includes:
  • according to the sample data probability distribution, determine the similarity of the sample to be evaluated to the real data probability distribution of the training samples;
  • according to the discriminant module of the generative adversarial network model, obtain the sample prediction result of the sample to be evaluated;
  • based on the similarity and the sample prediction result, the credibility detection result of the sample to be evaluated is determined.
  • the real data probability distribution in the following embodiments can be understood as the adjusted real data probability distribution of the picture and/or text training samples; and the sample prediction result of the sample to be evaluated can be understood as whether the sample to be evaluated is a generated sample or a real sample.
  • the model fairness evaluation method performs credibility detection on the samples to be evaluated based on the target probability distribution of the image and/or text training samples and the generative adversarial network model, thereby determining whether the evaluation samples include adversarial samples or distribution deviation samples;
  • subsequent processing of the samples to be evaluated can then be performed to improve the reliability and integrity of the samples to be evaluated in an untrustworthy environment.
  • the credibility test of the sample to be evaluated can be understood as the detection of whether the sample to be evaluated is an adversarial sample and the distribution diversity of the sample to be evaluated.
  • the specific implementation method is as follows:
  • Determining the credibility test result of the sample to be evaluated based on the similarity and the sample prediction result includes:
  • determining whether the sample to be evaluated is an adversarial sample, and determining the distribution diversity of the samples to be evaluated.
  • statistics are computed over the log-likelihoods of the samples to be evaluated under the real data distribution: a more divergent log-likelihood distribution indicates stronger distribution diversity of the samples to be evaluated, while a more concentrated one indicates weaker distribution diversity.
  • the detection of the distribution diversity of the sample to be evaluated is the detection of the distribution diversity of the entire sample to be evaluated, rather than the measurement of a single sample to be evaluated, which can be measured by variance, standard deviation, median, central tendency, etc.
  • the indicator of distribution divergence is used to detect the distribution diversity of the samples to be evaluated. The embodiments of this specification do not impose any limitations on this.
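  • As a concrete illustration of both checks, one can threshold per-sample log-likelihoods to flag suspected adversarial or off-distribution samples, and use the spread (here, standard deviation) of those log-likelihoods as the divergence indicator for the whole evaluation set. The Gaussian density standing in for the modelled real data distribution and both thresholds are illustrative assumptions, not values prescribed by this specification:

```python
import math
import statistics

def gaussian_loglik(x, mu=0.0, sigma=1.0):
    """Illustrative stand-in for the modelled real data probability
    distribution: log-density of scalar x under N(mu, sigma^2)."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))

def credibility_check(samples, loglik_threshold=-8.0, diversity_threshold=0.5):
    logliks = [gaussian_loglik(x) for x in samples]
    # Per-sample check: very low likelihood under the real data
    # distribution marks a suspected adversarial / deviating sample.
    suspected = [x for x, ll in zip(samples, logliks) if ll < loglik_threshold]
    # Whole-set check: a concentrated log-likelihood distribution
    # (small spread) indicates weak distribution diversity.
    weak_diversity = statistics.pstdev(logliks) < diversity_threshold
    return suspected, weak_diversity
```

The two outputs correspond to the two non-credibility conditions handled in Step 206: flagged samples go to adversarial reconstruction, and a weak-diversity verdict triggers diversity generation.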
  • Step 206 If the credibility detection result satisfies the non-credibility condition, perform sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample.
  • when it is determined that the sample to be evaluated is an adversarial sample, the sample can be reconstructed through denoising and adversarial reconstruction; and when it is determined that the distribution diversity of the samples to be evaluated is weak, diversity generation can be performed on the samples to be evaluated.
  • the specific implementation method is as follows:
  • sample processing is performed on the sample to be evaluated according to the credibility test result to obtain an updated evaluation sample, including:
  • the sample to be evaluated is an adversarial sample
  • sample processing is performed on the samples to be evaluated according to the second preset processing method to obtain a second updated evaluation sample.
  • through the credibility detection of the sample to be evaluated, it can be detected whether the sample is an adversarial sample and whether the distribution diversity of the samples is weak; when the sample to be evaluated is an adversarial sample, or when the distribution diversity of the evaluation samples is weak, the sample to be evaluated can be considered untrustworthy, and in that case it needs to be processed.
  • when the credibility test result of the sample to be evaluated is that the sample is an adversarial sample, it can be determined that the result satisfies the non-credibility condition; the sample can then be processed according to the first preset processing method to obtain the first updated evaluation sample. When the credibility test result is that the distribution diversity of the samples to be evaluated is weak, it can likewise be determined that the result satisfies the non-credibility condition; the samples can then be processed according to the second preset processing method to obtain the second updated evaluation sample.
  • the sample to be evaluated is processed according to the first preset processing method to obtain the first updated evaluation sample; the specific method is as follows:
  • performing sample processing on the sample to be evaluated according to a first preset processing method to obtain a first updated evaluation sample includes:
  • the sample to be evaluated is processed according to the second preset processing method, and the specific processing method for obtaining the second updated evaluation sample is as follows:
  • sample processing is performed on the samples to be evaluated according to the second preset processing method to obtain the second updated evaluation sample, including:
  • the sample data probability distribution is used to generate new evaluation samples through the generation module of the generative adversarial network model
  • adversarial reconstruction and diversity generation can be performed on the samples to be evaluated. Adversarial reconstruction can be understood as reconstructing the detected adversarial samples among the samples to be evaluated using denoising methods including but not limited to data compression, data randomization, and adversarial error correction, so as to eliminate the interference of adversarial noise. Diversity generation can be understood as generating new evaluation samples through the generation module of the generative adversarial network model, based on the sample data probability distribution of the samples to be evaluated, so as to expand the diversity of the samples to be evaluated.
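  • Of the denoising methods listed, data compression by bit-depth reduction (often called feature squeezing in the adversarial defense literature) is the simplest to sketch. Assuming 8-bit pixel values in [0, 255], quantising to fewer levels discards the low-order perturbation that small adversarial noise typically lives in; the bit depth and pixel values below are illustrative, not taken from this specification:

```python
def squeeze_bit_depth(pixels, bits=4):
    """Data-compression style denoising: quantise 8-bit pixel values
    down to `bits` bits and scale back to [0, 255], removing small
    adversarial perturbations along with the low-order detail."""
    levels = (1 << bits) - 1
    return [round(round(p / 255 * levels) / levels * 255) for p in pixels]

clean = [120, 121, 119]       # a clean image patch
perturbed = [122, 118, 121]   # the same patch with small adversarial noise
```

After squeezing, the clean and perturbed patches map to the same values, which is the sense in which the reconstruction "eliminates the interference of adversarial noise".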
  • after obtaining the log-likelihood of the sample to be evaluated under the real data distribution and the sample prediction result, the model fairness evaluation method provided by the embodiments of this specification can perform credibility detection on the sample based on them, that is, detect whether the sample to be evaluated is an adversarial sample and detect the distribution diversity of the samples to be evaluated. When the sample to be evaluated is an adversarial sample, or when the distribution diversity of the samples is weak, subsequent data processing can be performed on the untrustworthy samples to improve their reliability and integrity in an untrustworthy environment, thereby ensuring the robustness of the fairness assessment method and the availability of its measurement results in untrusted environments.
  • the quality of the new evaluation samples will also be tested by the discriminant module of the generative adversarial network model to ensure the accuracy of the newly added evaluation samples.
  • the second updated evaluation sample is obtained based on the newly added evaluation sample, including:
  • new evaluation samples that fail the quality test are deleted to obtain the second updated evaluation sample.
  • in order to ensure the accuracy of the newly added evaluation samples, the model fairness evaluation method also performs sample filtering based on the generation quality of the new evaluation samples after they are generated by the generation module of the generative adversarial network model.
  • the new evaluation samples can also be filtered based on quality evaluation indicators appropriate to the original samples to be evaluated: if the original samples are images, the new samples can be quality-filtered based on their Inception Score; if the original samples are text, quality filtering can be performed based on the fluency of the text.
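  • The Inception Score mentioned here is computed from a classifier's per-sample class probabilities as exp(E_x[KL(p(y|x) || p(y))]); higher scores indicate confident and diverse predictions, i.e. higher sample quality. A minimal sketch (the probability rows stand in for real classifier outputs, which this specification does not fix):

```python
import math

def inception_score(probs):
    """probs: list of per-sample class-probability rows p(y|x).
    IS = exp( mean_x KL( p(y|x) || p(y) ) ), where p(y) is the
    marginal class distribution over the generated samples."""
    n, k = len(probs), len(probs[0])
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    kl_sum = 0.0
    for row in probs:
        kl_sum += sum(p * math.log(p / marginal[j])
                      for j, p in enumerate(row) if p > 0)
    return math.exp(kl_sum / n)
```

A filtering step could then keep only generated batches whose score exceeds some chosen threshold; the threshold itself is a design choice left open by the specification.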
  • Step 208 Conduct a fairness assessment on the model to be evaluated based on the sample to be evaluated and the updated evaluation sample.
  • the updated evaluation sample includes the first updated evaluation sample and/or the second updated evaluation sample; and the model to be evaluated can also be understood as an image and text recognition model of the same type as the training model.
  • the fairness evaluation of the model to be evaluated based on the sample to be evaluated and the updated evaluation sample includes:
  • the mixed evaluation sample and the model to be evaluated are input into the fairness evaluation module to obtain the fairness evaluation index of the model to be evaluated, including:
  • the predicted value is the output of the model to be evaluated based on the mixed evaluation sample.
  • the mixed evaluation sample and the model to be evaluated are input into the fairness evaluation module.
  • in the fairness evaluation module, the mixed evaluation sample is input into the model to be evaluated, and the predicted values output by the model for the mixed evaluation sample are obtained; the prediction accuracy of the model to be evaluated is calculated from the predicted values and the true values of the mixed evaluation sample; after determining the prediction accuracy, the fairness evaluation module calculates indicators such as the model's false positive rate, statistical parity, equal opportunity, and disparate impact, and subsequently the user or the system can evaluate the fairness of the model to be evaluated based on these indicators.
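  • The flow of the fairness evaluation module described above can be sketched end to end: feed the mixed evaluation samples to the model to be evaluated, compute prediction accuracy against the true values, then derive a per-group performance-difference indicator. The toy threshold "model" and the data below are illustrative assumptions, not artifacts of this specification:

```python
def evaluate_fairness(model, samples, labels, groups):
    """Sketch of the fairness evaluation module: obtain the model's
    predicted values on the mixed evaluation sample, compute overall
    prediction accuracy, then per-group accuracy as one simple
    performance-difference indicator."""
    preds = [model(x) for x in samples]
    accuracy = sum(p == t for p, t in zip(preds, labels)) / len(labels)
    per_group = {}
    for g in set(groups):
        idx = [i for i, a in enumerate(groups) if a == g]
        per_group[g] = sum(preds[i] == labels[i] for i in idx) / len(idx)
    return {"accuracy": accuracy, "per_group_accuracy": per_group}

# Toy stand-in for the model to be evaluated: thresholds a scalar feature.
model = lambda x: 1 if x >= 0.5 else 0
mixed_samples = [0.2, 0.7, 0.9, 0.4, 0.6, 0.1]  # original + updated, mixed
true_values = [0, 1, 1, 1, 1, 0]
sensitive = ["a", "a", "a", "b", "b", "b"]      # sensitive attribute per sample
report = evaluate_fairness(model, mixed_samples, true_values, sensitive)
```

The gap between the per-group accuracies plays the role of the performance-difference indicators listed above; a larger gap suggests less fair behaviour across the sensitive groups.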
  • the model fairness evaluation method uses the image-and-text training model to model the real data probability distribution of the image-and-text training samples and performs credibility detection on the samples to be evaluated based on that distribution; non-credible samples are processed to obtain updated evaluation samples, improving the reliability and completeness of the evaluation samples in non-trusted environments. This ensures the robustness of the fairness evaluation method and the availability of the model evaluation results in both trusted and non-trusted environments, and thereby the accuracy of the fairness evaluation of the model to be evaluated, so that the evaluated model performs well in practical applications, both from the perspective of meeting regulatory compliance and from that of improving user experience.
  • FIG. 3 shows a process flow chart of a model fairness evaluation method provided by an embodiment of this specification, which specifically includes the following steps.
  • Step 302 Based on the large-scale unsupervised graphic and text training samples collected, use self-supervised learning technology to train a pre-training model, and use the pre-training model to initially model the real data probability distribution of the graphic and text training samples.
  • Step 304: Use deep generation technology to build a generative adversarial network model based on the pre-trained model, train the generative adversarial network model on the image-and-text training samples, and optimize the real data probability distribution of the image-and-text training samples based on the generative adversarial network model to obtain the optimized real data probability distribution.
  • Step 306 Based on the optimized real data probability distribution of the graphic training samples and the generated adversarial network model, perform a credibility test on the samples to be evaluated.
  • Step 308 Based on the credibility detection results of the samples to be evaluated, use adversarial defense technology to perform adversarial reconstruction of the samples to be evaluated, or use a generator that generates an adversarial network model to generate diversity for the samples to be evaluated.
  • Step 310: Mix the original samples to be evaluated with the adversarially reconstructed samples and/or the diversity-generated samples to obtain a mixed sample, and input the mixed sample and the recommendation model into the fairness evaluation module for evaluation to obtain the fairness evaluation result of the recommendation model.
  • the model fairness evaluation method proposes algorithm fairness evaluation technology for an uncontrolled (untrusted) environment. Based on large-scale, easily obtained unsupervised image-and-text training data, it models the probability distribution of the real data by combining large-scale pre-training technology with deep generation technology, detects untrustworthy samples among the samples to be evaluated based on the modeled probability distribution of the real data (the unsupervised image-and-text training data), and applies adversarial defense technology to perform denoising reconstruction on them, which can effectively eliminate the impact of adversarial noise on fairness evaluation.
  • the model fairness evaluation method in the embodiments of this specification also achieves diversity generation based on the data distribution of the samples to be evaluated themselves, combined with deep generation technology, so as to improve the reliability and integrity of the samples to be evaluated in untrusted environments.
  • the model fairness evaluation method provided by the embodiments of this specification requires no additional evaluation data or manual intervention, which greatly improves the intelligence of the evaluation and also reduces the evaluation cost.
  • the model fairness evaluation method not only has the ability to evaluate fairness in a trusted environment, but also ensures the robustness of the fairness evaluation and the availability of the evaluation results in an uncontrolled environment. It is therefore applicable to the fairness assessment of algorithms on e-commerce platforms, online social platforms, online media, and other platforms, including but not limited to intelligent customer service, personalized recommendation, and intelligent risk control, so as to eliminate algorithm bias, promote regulatory compliance of algorithms, and improve user experience.
  • the probability distribution determination module 402 is configured to determine the real data probability distribution of the picture and/or text training sample according to the picture and/or text training model;
  • the detection result determination module 404 is configured to determine the credibility detection result of the sample to be evaluated based on the real data probability distribution and the generated adversarial network model;
  • the sample processing module 406 is configured to perform sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample when the credibility detection result satisfies the non-credibility condition;
  • the evaluation module 408 is configured to perform a fairness evaluation on the model to be evaluated based on the sample to be evaluated and the updated evaluation sample.
  • the probability distribution determination module 402 is further configured to:
  • the real data probability distribution of the training sample is adjusted according to the generative adversarial network model to obtain the adjusted real data probability distribution of the training sample.
  • the probability distribution determination module 402 is further configured to:
  • the probability distribution determination module 402 is further configured to:
  • the generative adversarial network model is constructed according to the discriminating module and the generating module.
  • the detection result determination module 404 is further configured to: determine, based on the sample data probability distribution, the similarity of the sample to be evaluated to the real data probability distribution of the training sample; obtain the sample prediction result of the sample to be evaluated through the discriminator module of the generative adversarial network model; and determine the credibility detection result of the sample to be evaluated based on the similarity and the sample prediction result.
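The combination described above, a distribution-similarity score plus a discriminator "realness" score, can be sketched as follows. This is a minimal illustrative assumption, not the patent's implementation: the thresholds, the cosine-similarity feature comparison, and the helper name `credibility_detection` are all hypothetical.

```python
import numpy as np

def credibility_detection(sample_feat, train_feats, discriminator_score,
                          sim_threshold=0.5, disc_threshold=0.5):
    """Combine (1) similarity of the sample to the training-data
    distribution and (2) a GAN-discriminator realness score into a
    single credibility verdict. All thresholds are illustrative."""
    # (1) similarity: cosine similarity to the nearest training feature
    norms = np.linalg.norm(train_feats, axis=1) * np.linalg.norm(sample_feat)
    sims = train_feats @ sample_feat / np.clip(norms, 1e-12, None)
    similarity = float(np.max(sims))
    # (2) discriminator realness score in [0, 1], supplied by the caller
    credible = similarity >= sim_threshold and discriminator_score >= disc_threshold
    return {"similarity": similarity,
            "discriminator_score": discriminator_score,
            "credible": credible}

# toy usage: a sample close to the training features with a high realness score
train = np.array([[1.0, 0.0], [0.9, 0.1]])
result = credibility_detection(np.array([0.95, 0.05]), train, discriminator_score=0.9)
print(result["credible"])  # True
```

A sample fails the check, and is routed to sample processing, as soon as either signal falls below its threshold.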
  • the detection result determination module 404 is further configured to: determine whether the sample to be evaluated is an adversarial sample, and determine the distribution diversity of the sample to be evaluated.
  • the sample processing module 406 is further configured to: perform sample processing when the credibility detection result indicates that the sample to be evaluated is an adversarial sample.
  • the sample processing module 406 is further configured to: reconstruct the sample to be evaluated using a denoising method, such as data compression, data randomization or adversarial error correction, to obtain the first updated evaluation sample.
  • sample processing module 406 is further configured to:
  • the evaluation module 408 is further configured to:
  • the predicted value is the output of the model to be evaluated based on the mixed evaluation sample.
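Given the model's predicted values on the mixed evaluation sample, a fairness evaluation can be sketched by comparing the positive prediction rate across sensitive groups (demographic parity). The metric choice, the 0.1 tolerance, and the function name are illustrative assumptions; the patent does not prescribe a specific fairness statistic here.

```python
import numpy as np

def fairness_evaluation(predictions, groups):
    """Fairness of a model's 0/1 predicted values on the mixed evaluation
    sample: compare the positive prediction rate across sensitive groups."""
    rates = {g: predictions[groups == g].mean() for g in np.unique(groups)}
    gap = max(rates.values()) - min(rates.values())
    return {"positive_rate_per_group": rates,
            "demographic_parity_gap": gap,
            "fair_at_0.1": bool(gap <= 0.1)}

# toy usage: predictions for two sensitive groups A and B
preds = np.array([1, 0, 1, 1, 0, 1, 0, 0])
grp = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
report = fairness_evaluation(preds, grp)
print(round(report["demographic_parity_gap"], 2))  # 0.5 for this toy data
```

A gap near zero indicates that decisions are statistically independent of the sensitive attribute for this sample set.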
  • The above is a schematic description of the model fairness evaluation device of this embodiment. It should be noted that the technical solution of the model fairness evaluation device and the technical solution of the above model fairness evaluation method belong to the same concept. For details not described in the technical solution of the model fairness evaluation device, please refer to the description of the technical solution of the above model fairness evaluation method.
  • Step 502 Determine the real data probability distribution of the image and/or text training samples based on the image and/or text training model.
  • Step 504 Receive the sample to be evaluated and the model to be evaluated sent by the user.
  • Step 506 Determine the credibility detection result of the sample to be evaluated based on the real data probability distribution and the generative adversarial network model.
  • Step 508 If the credibility detection result satisfies the non-credibility condition, perform sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample.
  • Step 510 Perform a fairness evaluation on the model to be evaluated based on the sample to be evaluated and the updated evaluation sample.
  • Step 512 Obtain the fairness evaluation result of the model to be evaluated, and return the fairness evaluation result to the user.
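The platform flow above (receive the user's samples and model, check credibility, repair non-credible samples, evaluate fairness on the mixed sample set, return the result) can be sketched end to end. Every callable below is a placeholder stand-in for the components described in the embodiments, not a real platform API.

```python
def evaluate_on_platform(samples, model, credibility_check, reconstruct, fairness_metric):
    """Hypothetical end-to-end flow of the model fairness evaluation
    platform. All callables are placeholders for the components above."""
    mixed = []
    for s in samples:
        if credibility_check(s):
            mixed.append(s)               # credible: use as-is
        else:
            mixed.append(reconstruct(s))  # non-credible: repair first
    predictions = [model(s) for s in mixed]
    return fairness_metric(predictions)   # returned to the user

# toy usage with trivial stand-ins for each component
report = evaluate_on_platform(
    samples=[0.2, 5.0, 0.4],
    model=lambda s: int(s > 0.3),
    credibility_check=lambda s: s <= 1.0,  # values above 1.0 are "non-credible"
    reconstruct=lambda s: min(s, 1.0),     # clamp back into the valid range
    fairness_metric=lambda preds: sum(preds) / len(preds),
)
print(report)  # positive rate over the mixed samples
```

The key property is that the fairness metric is computed over the mixed set of original credible samples and repaired samples, so untrustworthy inputs cannot silently skew the result returned to the user.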
  • the model fairness evaluation method provided by the embodiments of this specification is applied to the model fairness evaluation platform.
  • the actual application scenario can be that if a user wants to conduct a fairness evaluation on his or her project model on the model fairness evaluation platform, the user can send the sample to be evaluated and the model to be evaluated to the model fairness evaluation platform.
  • After receiving the samples to be evaluated and the models to be evaluated sent by the user, the model fairness evaluation platform can perform adversarial reconstruction or diversity generation on the samples to be evaluated according to the above embodiments, thereby ensuring that the evaluation results given by the platform are closer to the actual situation of the model to be evaluated.
  • The model fairness evaluation method provided by the embodiments of this specification first models the training data probability distribution as the real data probability distribution, based on large-scale, easily obtained unsupervised image and text training data and a combination of large-scale pre-training and deep generation techniques.
  • The credibility of the samples to be evaluated is then tested against the real data probability distribution, which effectively mitigates the interference of untrustworthy samples with the evaluation results.
  • In addition, this solution denoises and reconstructs the evaluation samples based on adversarial defense techniques and generates diversity based on deep generative models, improving the reliability and completeness of the evaluation samples in untrusted environments.
  • This ensures the robustness of the fairness evaluation system and the availability of its evaluation results in uncontrolled environments, and effectively compensates for existing systems' lack of resistance to perturbation and of evaluation capability in uncontrolled environments.
  • Figure 6 shows a schematic structural diagram of another model fairness evaluation device provided by an embodiment of this specification. As shown in Figure 6, the device is applied to the model fairness evaluation platform and includes:
  • the first determination module 602 is configured to determine the real data probability distribution of the image and/or text training samples according to the image and/or text training model;
  • the data receiving module 604 is configured to receive samples to be evaluated and models to be evaluated sent by the user;
  • the second determination module 606 is configured to determine the credibility detection result of the sample to be evaluated based on the real data probability distribution and the generative adversarial network model;
  • the sample update module 608 is configured to perform sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample when the credibility detection result satisfies the non-credibility condition;
  • the fairness evaluation module 610 is configured to perform a fairness evaluation on the model to be evaluated based on the sample to be evaluated and the updated evaluation sample;
  • the result display module 612 is configured to obtain the fairness evaluation result of the model to be evaluated, and return the fairness evaluation result to the user.
  • The model fairness evaluation device provided by the embodiments of this specification first models the training data probability distribution as the real data probability distribution, based on large-scale, easily obtained unsupervised image and text training data and a combination of large-scale pre-training and deep generation techniques.
  • The credibility of the samples to be evaluated is then tested against the real data probability distribution, which effectively mitigates the interference of untrustworthy samples with the evaluation results.
  • In addition, this solution denoises and reconstructs the evaluation samples based on adversarial defense techniques and generates diversity based on deep generative models, improving the reliability and completeness of the evaluation samples in untrusted environments.
  • This ensures the robustness of the fairness evaluation system and the availability of its evaluation results in uncontrolled environments, and effectively compensates for existing systems' lack of resistance to perturbation and of evaluation capability in uncontrolled environments.
  • The above is a schematic description of the model fairness evaluation device of this embodiment. It should be noted that the technical solution of the model fairness evaluation device and the technical solution of the above model fairness evaluation method belong to the same concept. For details not described in the technical solution of the model fairness evaluation device, please refer to the description of the technical solution of the above model fairness evaluation method.
  • Figure 7 shows a structural block diagram of a computing device 700 provided according to an embodiment of this specification.
  • Components of the computing device 700 include, but are not limited to, memory 710 and processor 720 .
  • the processor 720 and the memory 710 are connected through a bus 730, and the database 750 is used to save data.
  • Computing device 700 also includes an access device 740 that enables computing device 700 to communicate via one or more networks 760 .
  • Such networks include the Public Switched Telephone Network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet.
  • Access device 740 may include one or more of any type of wired or wireless network interface (e.g., a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, or a Near Field Communication (NFC) interface.
  • the above-mentioned components of the computing device 700 and other components not shown in FIG. 7 may also be connected to each other, such as through a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 7 is for illustrative purposes only and does not limit the scope of this description. Those skilled in the art can add or replace other components as needed.
  • Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet computer, personal digital assistant, laptop computer, notebook computer, netbook, etc.), a mobile telephone (e.g., smartphone ), a wearable computing device (e.g., smart watch, smart glasses, etc.) or other type of mobile device, or a stationary computing device such as a desktop computer or PC.
  • Computing device 700 may also be a mobile or stationary server.
  • The processor 720 is configured to execute computer-executable instructions which, when executed by the processor, implement the steps of the above model fairness evaluation method.
  • The above is a schematic description of the computing device of this embodiment. It should be noted that the technical solution of the computing device and the technical solution of the above model fairness evaluation method belong to the same concept. For details not described in the technical solution of the computing device, please refer to the description of the technical solution of the above model fairness evaluation method.
  • An embodiment of this specification also provides a computer-readable storage medium that stores computer-executable instructions.
  • When the computer-executable instructions are executed by a processor, the steps of the above model fairness evaluation method are implemented.
  • An embodiment of this specification also provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to perform the steps of the above model fairness evaluation method.
  • the computer instructions include computer program code, which may be in the form of source code, object code, executable file or some intermediate form.
  • The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
  • The content contained in the computer-readable medium can be appropriately added or deleted according to the requirements of legislation and patent practice in the relevant jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium excludes electrical carrier signals and telecommunications signals.

Abstract

Provided in the embodiments of the present specification are model fairness evaluation methods and apparatus. One model fairness evaluation method comprises: according to a picture and/or text training model, determining real data probability distribution of a picture and/or text training sample; according to the real data probability distribution and a generative adversarial network model, determining a credibility detection result of a sample to be evaluated; when the credibility detection result meets an untrusted condition, performing sample processing on the sample to be evaluated according to the credibility detection result, so as to obtain an updated evaluation sample; and according to the sample to be evaluated and the updated evaluation sample, performing fairness evaluation on a model to be evaluated. The method can be applied to algorithm governance.

Description

Model Fairness Evaluation Methods and Apparatus
This application claims priority to the Chinese patent application filed with the China Patent Office on April 12, 2022, with application number 202210379396.X and entitled "Model Fairness Evaluation Methods and Apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this specification relate to the field of computer technology, and in particular to two model fairness evaluation methods.
Background
As basic theories and technologies of artificial intelligence continue to achieve new breakthroughs, image and text algorithms built on artificial intelligence have been widely applied in many public fields such as finance, education, medical care and security, giving rise to a series of intelligent applications such as intelligent security, intelligent customer service, medical consultation and personalized recommendation. These applications not only greatly enrich and facilitate people's daily lives, but also drive the development and progress of the social economy and of science and technology.
However, the unfairness and even discrimination present in automated decision-making by artificial intelligence have been the subject of social controversy. They have not only raised concerns and doubts about automated algorithmic decision-making, but have also gradually attracted widespread attention from society and the public. Some jurisdictions have successively issued laws and regulations on algorithmic fairness, clearly stating that the development and application of artificial intelligence algorithms must satisfy fairness constraints. Therefore, whether from the perspective of meeting regulatory compliance or of improving user experience, conducting algorithm fairness evaluation and eliminating algorithmic bias are essential steps throughout the algorithm life cycle.
Summary
In view of this, the embodiments of this specification provide two model fairness evaluation methods. One or more embodiments of this specification also relate to two model fairness evaluation apparatuses, a computing device, a computer-readable storage medium, and a computer program, so as to solve the technical deficiencies in the prior art.
According to a first aspect of the embodiments of this specification, a model fairness evaluation method is provided, including:
determining, according to a picture and/or text training model, a real data probability distribution of picture and/or text training samples;
determining a credibility detection result of a sample to be evaluated according to the real data probability distribution and a generative adversarial network model;
when the credibility detection result satisfies a non-credibility condition, performing sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample; and
performing a fairness evaluation on a model to be evaluated according to the sample to be evaluated and the updated evaluation sample.
According to a second aspect of the embodiments of this specification, a model fairness evaluation apparatus is provided, including:
a probability distribution determination module, configured to determine a real data probability distribution of picture and/or text training samples according to a picture and/or text training model;
a detection result determination module, configured to determine a credibility detection result of a sample to be evaluated according to the real data probability distribution and a generative adversarial network model;
a sample processing module, configured to, when the credibility detection result satisfies a non-credibility condition, perform sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample; and
an evaluation module, configured to perform a fairness evaluation on a model to be evaluated according to the sample to be evaluated and the updated evaluation sample.
According to a third aspect of the embodiments of this specification, a model fairness evaluation method applied to a model fairness evaluation platform is provided, including:
determining, according to a picture and/or text training model, a real data probability distribution of picture and/or text training samples;
receiving a sample to be evaluated and a model to be evaluated sent by a user;
determining a credibility detection result of the sample to be evaluated according to the real data probability distribution and a generative adversarial network model;
when the credibility detection result satisfies a non-credibility condition, performing sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample;
performing a fairness evaluation on the model to be evaluated according to the sample to be evaluated and the updated evaluation sample; and
obtaining a fairness evaluation result of the model to be evaluated, and returning the fairness evaluation result to the user.
According to a fourth aspect of the embodiments of this specification, a model fairness evaluation apparatus applied to a model fairness evaluation platform is provided, including:
a first determination module, configured to determine a real data probability distribution of picture and/or text training samples according to a picture and/or text training model;
a data receiving module, configured to receive a sample to be evaluated and a model to be evaluated sent by a user;
a second determination module, configured to determine a credibility detection result of the sample to be evaluated according to the real data probability distribution and a generative adversarial network model;
a sample update module, configured to, when the credibility detection result satisfies a non-credibility condition, perform sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample;
a fairness evaluation module, configured to perform a fairness evaluation on the model to be evaluated according to the sample to be evaluated and the updated evaluation sample; and
a result display module, configured to obtain a fairness evaluation result of the model to be evaluated, and return the fairness evaluation result to the user.
According to a fifth aspect of the embodiments of this specification, a computing device is provided, including:
a memory and a processor;
wherein the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, which, when executed by the processor, implement the steps of the above model fairness evaluation method.
According to a sixth aspect of the embodiments of this specification, a computer-readable storage medium is provided, which stores computer-executable instructions that, when executed by a processor, implement the steps of the above model fairness evaluation method.
According to a seventh aspect of the embodiments of this specification, a computer program is provided, wherein when the computer program is executed in a computer, the computer is caused to perform the steps of the above model fairness evaluation method.
One embodiment of this specification implements two model fairness evaluation methods and apparatuses. One of the model fairness evaluation methods includes: determining, according to a picture and/or text training model, a real data probability distribution of picture and/or text training samples; determining a credibility detection result of a sample to be evaluated according to the real data probability distribution and a generative adversarial network model; when the credibility detection result satisfies a non-credibility condition, performing sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample; and performing a fairness evaluation on a model to be evaluated according to the sample to be evaluated and the updated evaluation sample.
Specifically, the model fairness evaluation method models the real data probability distribution of the image and text training samples through the image and text training model, performs credibility detection on the samples to be evaluated according to this distribution, and processes non-credible samples to obtain updated evaluation samples. This improves the reliability and completeness of the evaluation samples in untrusted environments, thereby ensuring the robustness of the fairness evaluation method and the availability of its model evaluation results in both trusted and untrusted environments, and guaranteeing the accuracy of the fairness evaluation performed on the model to be evaluated. As a result, the model to be evaluated performs well in practical applications, whether from the perspective of meeting regulatory compliance or of improving user experience, and the method can be applied to algorithm governance.
Brief Description of the Drawings
Figure 1 is a schematic diagram of a specific scenario of a model fairness evaluation method provided by an embodiment of this specification;
Figure 2 is a flowchart of a model fairness evaluation method provided by an embodiment of this specification;
Figure 3 is a process flowchart of a model fairness evaluation method provided by an embodiment of this specification;
Figure 4 is a schematic structural diagram of a model fairness evaluation apparatus provided by an embodiment of this specification;
Figure 5 is a flowchart of another model fairness evaluation method provided by an embodiment of this specification;
Figure 6 is a schematic structural diagram of another model fairness evaluation apparatus provided by an embodiment of this specification;
Figure 7 is a structural block diagram of a computing device provided by an embodiment of this specification.
Detailed Description
In the following description, numerous specific details are set forth to facilitate a thorough understanding of this specification. However, this specification can be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from the meaning of this specification. Therefore, this specification is not limited by the specific implementations disclosed below.
The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit the one or more embodiments of this specification. The singular forms "a", "said" and "the" used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of this specification refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used to describe various information in one or more embodiments of this specification, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be called "second", and similarly, "second" may also be called "first". Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
First, the terms used in one or more embodiments of this specification are explained.
Image and text algorithms: the general term for picture algorithms and text algorithms, specifically including algorithms that take pictures as input, such as image classification, face recognition, object detection and image retrieval, as well as algorithms that take text as input, such as text classification, sentiment analysis, machine translation and dialogue generation.
Algorithm fairness: automated decision-making by artificial intelligence algorithms is independent of protected sensitive attributes (natural and social attributes) such as ethnicity, belief and region; that is, with respect to protected sensitive attributes, algorithmic decisions exhibit no prejudice against or preference for individuals or groups arising from their inherent or acquired attributes.
OOD data: OOD (out-of-distribution) data refers to sample data drawn from a distribution different from that of the algorithm model's training data. If the sample data distribution is the same as the training data distribution, the data is called ID (in-distribution) data.
Robustness: the ability of a computer system to continue operating normally and maintain stable performance when certain parameters (structure, size) change, when errors occur during execution, or when the algorithm encounters abnormal inputs or operations.
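The OOD notion above can be made concrete with a simple likelihood-threshold check: fit a density to in-distribution training data and flag samples that are too unlikely under it. The Gaussian fit, the z-score threshold of 3, and the function names below are illustrative assumptions, not part of this specification's method.

```python
import numpy as np

def fit_gaussian(train):
    """Fit a 1-D Gaussian to in-distribution (ID) training data."""
    return train.mean(), train.std()

def is_ood(x, mu, sigma, z_threshold=3.0):
    """Flag a sample as out-of-distribution when it lies more than
    z_threshold standard deviations from the training mean."""
    return abs(x - mu) / sigma > z_threshold

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=10_000)
mu, sigma = fit_gaussian(train)
print(is_ood(0.5, mu, sigma))   # in-distribution sample -> False
print(is_ood(10.0, mu, sigma))  # far outside the training distribution -> True
```

Real image or text distributions require far richer density models (such as the pre-trained and deep generative models discussed in this specification), but the decision rule, thresholding a likelihood-style score, is the same.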
As basic theories and technologies of artificial intelligence continue to achieve new breakthroughs, image and text algorithms built on artificial intelligence have been widely applied in many public fields such as finance, education, medical care and security, giving rise to a series of intelligent applications such as intelligent security, intelligent customer service, medical consultation and personalized recommendation. These applications not only greatly enrich and facilitate people's daily lives, but also drive the development and progress of the social economy and of science and technology.
However, as application scenarios become increasingly widespread, the legal and ethical issues and risks faced by artificial intelligence algorithms have become increasingly prominent. For example, a certain crime risk assessment algorithm may systematically discriminate against people of one ethnic group. The unfairness and even discrimination present in automated decision-making by artificial intelligence have been the subject of social controversy, raising concerns and doubts about automated algorithmic decision-making and gradually attracting widespread public attention. Some jurisdictions have successively issued laws and regulations on algorithmic fairness, clearly stating that the development and application of artificial intelligence algorithms must satisfy fairness constraints. Therefore, whether from the perspective of meeting regulatory compliance or of improving user experience, conducting algorithm fairness evaluation and eliminating algorithmic bias are essential steps throughout the algorithm life cycle.
In view of the above problems, a fairness evaluation system can evaluate the fairness of some artificial intelligence algorithm tasks (such as text classification and image classification). However, this system only considers fairness evaluation of natural inputs in a trusted environment, and its quantification of fairness relies entirely on statistical indicators such as accuracy, recall and F1-score. In an uncontrolled (non-trusted) environment, these statistical indicators are greatly affected by factors such as adversarial perturbation and data selection, so the fairness evaluation results produced by such a system cannot accurately reflect the true fairness level of the algorithm (model) itself, and the validity and usability of the evaluation cannot be guaranteed.
基于此,在本说明书中,提供了两种模型公平性评估方法。本说明书一个或者多个实施例同时涉及两种模型公平性评估装置,一种计算设备,一种计算机可读存储介质以及一种计算机程序,在下面的实施例中逐一进行详细说明。Based on this, in this specification, two model fairness evaluation methods are provided. One or more embodiments of this specification relate to two model fairness evaluation devices, a computing device, a computer-readable storage medium, and a computer program, which will be described in detail one by one in the following embodiments.
Referring to Figure 1, Figure 1 is a schematic diagram of a specific scenario of a model fairness evaluation method provided according to an embodiment of this specification, which includes the following steps.

Specifically, the model fairness evaluation method provided by the embodiments of this specification is applied to a model fairness evaluation platform.

Step 102: Based on large-scale image-text training samples collected in a database, train a large-scale pre-trained model (an image and/or text training model) using self-supervised learning, and use the pre-trained model to initially model the data probability distribution of the image-text training samples; then, based on the pre-trained model, construct a generative adversarial network (GAN) model using deep generative techniques, and further refine the data probability distribution of the image-text training samples through the GAN model.

Here, image-text training samples can be understood as image and text training samples.

Step 104: Receive a fairness evaluation request sent by a user, the request carrying samples to be evaluated and a model to be evaluated. First, perform credibility detection on the samples to be evaluated based on the data probability distribution of the image-text training samples and the GAN model. Second, based on the credibility detection results: if the samples to be evaluated are determined to include adversarial samples, apply adversarial defense techniques to denoise and adversarially reconstruct those samples; if the distribution diversity of the samples to be evaluated is determined to be weak, use the GAN model to perform diversity generation for the samples. Finally, mix the original samples to be evaluated, the adversarially reconstructed samples, and the diversity-generated samples, and feed them together with the model to be evaluated into the fairness evaluation module to obtain a fairness evaluation result for the model.

Specifically, the fairness evaluation of the model to be evaluated in the embodiments of this specification can be understood as computing model-performance difference indicators across groups divided by sensitive/protected attributes (such as ethnicity, belief, or income), for example the false positive rate, statistical parity, equal opportunity, and disparate impact. The fairness of the model to be evaluated can then be assessed based on these indicators.
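The group-difference indicators named above can be computed directly from model predictions, ground-truth labels, and group membership. The following sketch is a hypothetical helper (not part of the patented system) that computes the statistical parity difference, equal-opportunity (TPR) difference, false-positive-rate gap, and disparate-impact ratio for two groups:

```python
def rate(preds, labels, group, g, cond=None):
    """Fraction of positive predictions within group g, optionally
    restricted to samples whose true label equals cond."""
    idx = [i for i in range(len(group))
           if group[i] == g and (cond is None or labels[i] == cond)]
    if not idx:
        return 0.0
    return sum(preds[i] for i in idx) / len(idx)

def fairness_gaps(preds, labels, group, g0, g1):
    # Statistical parity: difference in positive-prediction rates.
    sp = rate(preds, labels, group, g0) - rate(preds, labels, group, g1)
    # Equal opportunity: TPR difference (positive rate among label == 1).
    eo = (rate(preds, labels, group, g0, cond=1)
          - rate(preds, labels, group, g1, cond=1))
    # False positive rate gap (positive rate among label == 0).
    fpr = (rate(preds, labels, group, g0, cond=0)
           - rate(preds, labels, group, g1, cond=0))
    # Disparate impact: ratio of positive-prediction rates.
    r1 = rate(preds, labels, group, g1)
    di = rate(preds, labels, group, g0) / r1 if r1 else float("inf")
    return {"statistical_parity": sp, "equal_opportunity": eo,
            "fpr_gap": fpr, "disparate_impact": di}
```

Values near zero (or a ratio near one for disparate impact) indicate similar model behavior across the two groups.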
Step 106: Return the fairness evaluation result of the model to be evaluated to the user.

The model fairness evaluation method provided by the embodiments of this specification proposes a robust fairness evaluation system for image-text algorithms. It models the data probability distribution by combining large-scale pre-training with deep generative techniques, and performs reliability detection on evaluation samples according to that distribution. For untrusted samples, such as adversarial samples or distribution-biased samples, it performs denoising, adversarial reconstruction, and diversity generation respectively, improving the reliability and completeness of evaluation samples in an untrusted environment, and thereby ensuring the robustness of the fairness evaluation system in uncontrolled environments and the usability of its evaluation results.
Referring to Figure 2, Figure 2 is a flowchart of a model fairness evaluation method provided by an embodiment of this specification, which includes the following steps.

Step 202: Determine the real data probability distribution of image and/or text training samples according to an image and/or text training model.

Here, images include but are not limited to images of any type, any size, and any content, for example images containing animals or people; texts include but are not limited to texts of any type, any length, and any content, for example academic treatises and literary articles.

The image and/or text training model can be understood as an image training model, a text training model, a combined image-text training model, or the like; in practical applications, the specific type of model can be determined according to actual requirements, and the embodiments of this specification impose no limitation on this.

In specific implementations, to ensure the accuracy of the real data probability distribution of the image and/or text training samples, the image and/or text training model is first trained on large-scale image and/or text training samples; the trained model is then used to model the real data probability distribution of those samples; finally, the distribution is tuned through a generative adversarial network model to obtain the tuned real data probability distribution. A specific implementation is as follows:
Determining the real data probability distribution of the image and/or text training samples according to the image and/or text training model includes:

obtaining image and/or text training samples;

training an image and/or text training model using self-supervised learning according to the training samples;

obtaining the real data probability distribution of the training samples according to the image and/or text training model;

adjusting the real data probability distribution of the training samples according to a generative adversarial network model, to obtain an adjusted real data probability distribution of the training samples.
To ensure the accuracy of the image and/or text training model, in the embodiments of this specification, large-scale image and/or text training samples are obtained for model training. When the training samples are images, the image and/or text training model can be understood as a vision Transformer model or the like; when the training samples are texts, it can be understood as a language model such as BERT; when the training samples are image-text samples, it can be understood as a multimodal fusion model combining a vision Transformer and a BERT language model.

Specifically, after the large-scale image and/or text training samples are obtained, an image and/or text training model is trained on them using self-supervised learning; the trained model then initially models the real data probability distribution of the training samples. To further optimize this distribution, it can be tuned according to the generative adversarial network model to obtain the adjusted real data probability distribution of the image and/or text training samples.

In the model fairness evaluation method provided by the embodiments of this specification, an image and/or text training model is first trained on large-scale image and/or text training samples, and the real data probability distribution of those samples is initially modeled by the trained model; the initially modeled distribution is then optimized through a generative adversarial network model constructed with deep generative techniques, thereby ensuring the accuracy and usability of the modeled distribution of the image and/or text training samples.
Before tuning the real data probability distribution of the image and/or text training samples according to the generative adversarial network model, the GAN model needs to be constructed from the image and/or text training model using deep generative techniques, so as to ensure its subsequent usability. A specific implementation is as follows:

Adjusting the real data probability distribution of the training samples according to the generative adversarial network model to obtain the adjusted real data probability distribution of the training samples includes:

constructing a generative adversarial network model according to the image and/or text training model;

training the generative adversarial network model according to the training samples, to obtain a discriminator module and a generator module of the trained generative adversarial network model;

adjusting the real data probability distribution of the training samples according to the discriminator module, to obtain the adjusted real data probability distribution of the training samples.

Specifically, a generative adversarial network model is first constructed from the image and/or text training model; the GAN model is then trained on the image and/or text training samples to obtain its trained discriminator and generator modules; finally, the initial probability distribution of the image and/or text training samples is fine-tuned according to the discriminator module of the GAN model, to obtain the fine-tuned real data probability distribution of the training samples.

In specific implementations, the construction phase of the generative adversarial network model includes two parts: the first part is constructing the GAN model, and the second part is training it. Since a GAN consists of a discriminator module and a generator module, the image and/or text training model obtained in the above embodiments can be used as the discriminator module when constructing the GAN; as for the generator module, when image data is to be generated, a generator built from multiple upsampling deconvolution (transposed convolution) networks can be used, and when text data is to be generated, a Transformer can be used as the generator module of the GAN. A specific implementation is as follows:
Constructing a generative adversarial network model according to the image and/or text training model includes:

initializing module parameters of the discriminator module of the generative adversarial network model according to model parameters of the image and/or text training model, to construct the discriminator module of the generative adversarial network model;

constructing the generator module of the generative adversarial network model according to a deconvolution network and/or a text generation network;

constructing the generative adversarial network model according to the discriminator module and the generator module.

Using the image and/or text training model obtained in the above embodiments as the discriminator module can be understood as initializing the module parameters of the discriminator from the model parameters of the image and/or text training model, so as to construct the discriminator module of the GAN; the generator module is constructed by choosing a deconvolution network or a text generation network based on the type of data to be generated; finally, the discriminator and generator modules together constitute the generative adversarial network model.

After the generative adversarial network model is constructed, it can be trained. Specifically, a zero-sum game adversarial loss function is constructed to alternately train the generator and discriminator modules of the GAN, so that the data produced by the generator comes closer to the real data distribution while, at the same time, the discriminator becomes better at distinguishing real data from generated data.

Specifically, in the embodiments of this specification, the pre-trained parameters (i.e., the model parameters of the image and/or text training model) are used to initialize the parameters of the discriminator (i.e., the discriminator module). The benefit is that, since the pre-trained model is trained on large-scale image-text samples, initializing the discriminator from it transfers the knowledge learned from those large-scale training samples into the discriminator, which is the pre-training-plus-fine-tuning technique of deep learning.
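The construction step above can be sketched structurally in plain Python with toy parameter vectors. All class and function names here are hypothetical stand-ins; a real system would use a deep-learning framework and actual network layers:

```python
import random

class PretrainedModel:
    """Stands in for the image/text model pre-trained with self-supervision."""
    def __init__(self, dim):
        self.params = [random.gauss(0.0, 0.02) for _ in range(dim)]

class Discriminator:
    def __init__(self, dim):
        self.params = [0.0] * dim

    def init_from_pretrained(self, pretrained):
        # Pre-train then fine-tune: copy the pre-trained weights so the
        # knowledge learned on large-scale samples transfers to the
        # discriminator before adversarial training begins.
        self.params = list(pretrained.params)

class Generator:
    """A deconvolution stack for images or a Transformer for text;
    here just a stub holding randomly initialized parameters."""
    def __init__(self, dim):
        self.params = [random.gauss(0.0, 0.02) for _ in range(dim)]

def build_gan(pretrained):
    d = Discriminator(len(pretrained.params))
    d.init_from_pretrained(pretrained)
    g = Generator(len(pretrained.params))
    return g, d
```

The generator starts from fresh random weights, while the discriminator inherits the pre-trained parameters and is subsequently fine-tuned by the alternating zero-sum training described above.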
In the model fairness evaluation method provided by the embodiments of this specification, a generative adversarial network model is constructed from the image and/or text training model and trained according to the real data probability distribution of the image and/or text training samples; the trained GAN can subsequently be used to tune the real data probability distribution of the training samples and obtain the adjusted distribution, improving the fidelity with which the training samples are modeled.
Step 204: Determine a credibility detection result of the samples to be evaluated according to the real data probability distribution and the generative adversarial network model.

After the adjusted real data probability distribution of the image and/or text training samples is obtained, it can be combined with the generative adversarial network model to detect the credibility of the samples to be evaluated.

Specifically, determining the credibility detection result of the samples to be evaluated according to the real data probability distribution and the generative adversarial network model includes:

obtaining a sample data probability distribution of the samples to be evaluated according to the image and/or text training model;

determining, according to the sample data probability distribution, the similarity of the samples to be evaluated with respect to the real data probability distribution of the training samples;

obtaining a sample prediction result of the samples to be evaluated according to the discriminator module of the generative adversarial network model;

determining the credibility detection result of the samples to be evaluated according to the similarity and the sample prediction result.

For ease of understanding, the real data probability distribution in the following embodiments refers to the adjusted real data probability distribution of the image and/or text training samples; the sample prediction result of a sample to be evaluated indicates whether the sample is a generated sample or a real sample.

In specific implementations, first, the sample data probability distribution of the samples to be evaluated is obtained according to the image and/or text training model, and from it the similarity (i.e., the log-likelihood) of each sample under the real data (image and/or text training sample) distribution is computed; at the same time, the sample prediction result of each sample is obtained from the discriminator module of the generative adversarial network model; the credibility detection result of the samples to be evaluated is then determined according to the similarity and the sample prediction result.
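One way to combine the two signals, the log-likelihood similarity and the discriminator's real-versus-generated prediction, is sketched below. The function and its thresholds are illustrative assumptions, not the patent's exact decision rule:

```python
def credibility_check(loglik, disc_real_prob,
                      loglik_threshold=-10.0, disc_threshold=0.5):
    """Flag a sample as non-credible when its log-likelihood under the
    training distribution is low (likely adversarial / out-of-distribution)
    or when the discriminator rates it as probably generated.  The two
    thresholds are placeholders; in practice they are chosen per sample
    set and model task."""
    is_ood = loglik < loglik_threshold
    is_generated = disc_real_prob < disc_threshold
    return {"adversarial_suspect": is_ood,
            "generated_suspect": is_generated,
            "credible": not (is_ood or is_generated)}
```

A sample passing both checks is treated as credible; either failure routes it into the sample-processing branch of step 206.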
The model fairness evaluation method provided by the embodiments of this specification performs credibility detection on the samples to be evaluated based on the target probability distribution of the image and/or text training samples and the generative adversarial network model, in order to judge whether the evaluation samples include adversarial samples or whether their diversity is weak. When a problem with the credibility of the samples to be evaluated is determined, subsequent processing can be performed on them, improving the reliability and completeness of the samples in an untrusted environment.

In practical applications, the credibility detection of the samples to be evaluated can be understood as detecting whether the samples are adversarial samples and detecting the distribution diversity of the samples. A specific implementation is as follows:

Determining the credibility detection result of the samples to be evaluated according to the similarity and the sample prediction result includes:

determining, according to the similarity and the sample prediction result, whether the samples to be evaluated are adversarial samples and the distribution diversity of the samples to be evaluated.

In practical applications, after the log-likelihood and the sample prediction result are obtained, they can be used to judge whether a sample to be evaluated is an out-of-distribution (OOD) sample, such as an adversarial sample. In theory, the smaller the log-likelihood, the greater the probability that the sample is adversarial (in practice, a log-likelihood threshold can be set according to the sample set and the model task; a value below that threshold is considered small).

As for the distribution diversity detection of the samples to be evaluated, the distribution of the log-likelihoods of those samples under the real data distribution is examined: the more dispersed this distribution, the stronger the distribution diversity of the samples; the more concentrated it is, the weaker the diversity. In practical applications, distribution diversity detection measures the whole set of samples to be evaluated rather than any single sample, and can use indicators of distribution dispersion such as the variance, standard deviation, median, and measures of central tendency; the embodiments of this specification impose no limitation on this.
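The set-level diversity check can be sketched as a dispersion statistic over the per-sample log-likelihoods; the standard-deviation threshold below is purely illustrative:

```python
from statistics import pstdev

def distribution_diversity_ok(logliks, min_std=0.5):
    """Treat the evaluation set as diverse enough when the spread of its
    log-likelihoods under the real data distribution exceeds a threshold;
    a tightly concentrated spread suggests weak coverage of the
    distribution.  Variance, median absolute deviation, or another
    dispersion measure could be substituted."""
    return pstdev(logliks) >= min_std
```

When this check fails, the diversity-generation branch of step 206 is triggered to augment the evaluation set.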
In the model fairness evaluation method provided by the embodiments of this specification, after the log-likelihood of the samples to be evaluated under the real data distribution and the sample prediction result are obtained, credibility detection can be performed on the samples, that is, detecting whether the samples are adversarial and how diverse their distribution is. When the samples are determined to be adversarial or their distribution diversity is weak, the samples to be evaluated can be determined to be untrustworthy; data processing can subsequently be performed on the untrustworthy samples to improve the reliability and completeness of the samples to be evaluated in an untrusted environment.
Step 206: When the credibility detection result satisfies a non-credibility condition, perform sample processing on the samples to be evaluated according to the credibility detection result, to obtain updated evaluation samples.

Specifically, when a sample to be evaluated is determined to be an adversarial sample, it can be reconstructed through denoising and adversarial reconstruction; when the distribution diversity of the samples to be evaluated is determined to be weak, diversity generation can be performed on the samples. A specific implementation is as follows:

Performing, when the credibility detection result satisfies the non-credibility condition, sample processing on the samples to be evaluated according to the credibility detection result to obtain updated evaluation samples includes:

when a sample to be evaluated is an adversarial sample, performing sample processing on the sample according to a first preset processing method, to obtain a first updated evaluation sample; and/or

when the distribution diversity of the samples to be evaluated satisfies a preset distribution condition, performing sample processing on the samples according to a second preset processing method, to obtain a second updated evaluation sample.

The first preset processing method and the second preset processing method can be set according to the actual application; the embodiments of this specification impose no limitation on this.

In practical applications, the credibility detection can reveal whether a sample to be evaluated is an adversarial sample and whether the distribution diversity of the samples is weak; when a sample is adversarial, or when the distribution diversity of the samples is weak, the samples can be considered untrustworthy and need to be processed.

In specific implementations, when the credibility detection result indicates that a sample to be evaluated is an adversarial sample, the result is determined to satisfy the non-credibility condition, and sample processing can be performed according to the first preset processing method to obtain a first updated evaluation sample; when the credibility detection result indicates that the distribution diversity of the samples to be evaluated is weak, the result likewise satisfies the non-credibility condition, and sample processing can be performed according to the second preset processing method to obtain a second updated evaluation sample.
When the first preset processing method includes a denoising method such as data compression, data randomization, or adversarial error correction, the specific way to perform sample processing on the sample to be evaluated according to the first preset processing method and obtain the first updated evaluation sample is as follows:

Performing, when the sample to be evaluated is an adversarial sample, sample processing on the sample according to the first preset processing method to obtain a first updated evaluation sample includes:

when the sample to be evaluated is an adversarial sample, reconstructing the sample through a denoising method such as data compression, data randomization, or adversarial error correction, to obtain the first updated evaluation sample.
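Two of the named denoising strategies can be illustrated on a toy row of pixel values: bit-depth reduction (a common form of input compression) and small random smoothing (a form of input randomization). Both are sketches of the general technique under stated assumptions, not the patent's exact procedure:

```python
import random

def bit_depth_reduce(pixels, bits=4):
    """Compress each 0-255 pixel value to `bits` of depth, squashing the
    small perturbations that adversarial noise typically relies on."""
    step = 256 // (2 ** bits)
    return [(p // step) * step for p in pixels]

def randomized_smoothing(pixels, sigma=2.0, seed=0):
    """Add small Gaussian noise and clip back to the valid range,
    randomizing away a carefully crafted perturbation."""
    rng = random.Random(seed)
    return [min(255, max(0, round(p + rng.gauss(0.0, sigma))))
            for p in pixels]
```

A text counterpart would perform an analogous operation at the token level (e.g., synonym substitution or round-trip paraphrasing); adversarial error correction would instead map the input back onto the learned data manifold.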
When the second preset processing method is to generate additional evaluation samples, processing the sample to be evaluated according to the second preset processing method to obtain the second updated evaluation sample may proceed as follows:
When the distribution diversity of the sample to be evaluated satisfies a preset distribution condition, performing sample processing on the sample to be evaluated according to the second preset processing method to obtain the second updated evaluation sample includes:
when the distribution diversity of the sample to be evaluated satisfies the preset distribution condition, generating additional evaluation samples through the generation module of the generative adversarial network model according to the sample-data probability distribution of the sample to be evaluated;
obtaining the second updated evaluation sample according to the additional evaluation samples.
Specifically, when the low credibility of the samples to be evaluated arises from adversarial samples or from weak distribution diversity, adversarial reconstruction and diversity generation can be performed on the samples. Adversarial reconstruction means reconstructing the adversarial samples detected among the samples to be evaluated using denoising methods including but not limited to data compression, data randomization, and adversarial error correction, so as to eliminate the interference of adversarial noise. Diversity generation means generating additional evaluation samples through the generation module of the generative adversarial network model based on the sample-data probability distribution of the samples to be evaluated, so as to expand the diversity of the samples to be evaluated.
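Diversity generation through a GAN generation module can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the generator here is an untrained random linear map with a tanh output, standing in for the trained generation module, and all names and dimensions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyGenerator:
    """Stand-in for the generation module of a trained GAN:
    decodes latent noise vectors into new evaluation samples."""
    def __init__(self, latent_dim=8, out_dim=32):
        self.latent_dim = latent_dim
        # A fixed random map plays the role of learned generator weights.
        self.W = rng.normal(scale=0.5, size=(latent_dim, out_dim))

    def sample(self, n):
        z = rng.normal(size=(n, self.latent_dim))   # latent noise
        return np.tanh(z @ self.W)                  # decoded samples in (-1, 1)

gen = TinyGenerator()
new_samples = gen.sample(16)   # 16 additional evaluation samples
```

In a real embodiment the latent vectors would be drawn so that the decoded samples follow the sample-data probability distribution of the samples to be evaluated, which is what lets the additional samples fill gaps in their coverage.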
In the model fairness evaluation method provided by the embodiments of this specification, after the log-likelihood of the sample to be evaluated under the real-data distribution and the sample prediction result are obtained, credibility detection can be performed on the sample based on that log-likelihood and prediction result: that is, detecting whether the sample to be evaluated is an adversarial sample and how diverse its distribution is. When the sample is determined to be an adversarial sample, or its distribution diversity is determined to be weak, the sample can be judged untrustworthy, and the untrustworthy sample can subsequently be processed. This improves the reliability and integrity of the samples to be evaluated in an untrusted environment, and thereby guarantees the robustness of the fairness evaluation method and the availability of its evaluation results in such an environment.
In addition, to ensure the accuracy of the additional evaluation samples, after the additional evaluation samples are generated through the generation module of the generative adversarial network model, their quality is further checked using the discrimination module of the generative adversarial network model. The specific implementation is as follows:
Obtaining the second updated evaluation sample according to the additional evaluation samples includes:
inputting the additional evaluation samples into the discrimination module of the generative adversarial network model to obtain prediction results for the additional evaluation samples;
pruning the additional evaluation samples according to their prediction results to obtain the second updated evaluation sample.
In the model fairness evaluation method provided by the embodiments of this specification, to ensure the accuracy of the additional evaluation samples, after they are generated through the generation module of the generative adversarial network model, they are also filtered based on their generation quality. Moreover, the additional evaluation samples can be quality-filtered according to quality metrics of the original samples to be evaluated: for example, when the original samples are images, they can be filtered by the images' Inception Score; when the original samples are text, they can be filtered by the fluency of the text.
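The discriminator-based pruning step can be sketched as follows. This is a hedged illustration only: `discriminator_score` is a hypothetical stand-in for the GAN discrimination module (a real one would be a trained network returning a realness probability), and the threshold value is an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

def discriminator_score(samples):
    """Hypothetical stand-in for the discrimination module: returns a
    'realness' score in (0, 1] per sample (here computed from sample
    norms purely so the sketch is runnable)."""
    return 1.0 / (1.0 + np.linalg.norm(samples, axis=1) / samples.shape[1])

def filter_generated(samples, threshold=0.5):
    """Prune generated samples: keep only those the discriminator
    rates as sufficiently realistic."""
    scores = discriminator_score(samples)
    return samples[scores >= threshold]

candidates = rng.normal(size=(100, 16))          # additional evaluation samples
second_updated = filter_generated(candidates)    # pruned set
```

The surviving samples form the second updated evaluation sample; unrealistic generations are discarded before they can distort the fairness metrics.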
Step 208: Perform a fairness evaluation on the model to be evaluated according to the sample to be evaluated and the updated evaluation sample.
Here, the updated evaluation sample includes the first updated evaluation sample and/or the second updated evaluation sample, and the model to be evaluated can be understood as a picture-and-text recognition model of the same type as the training model.
When the updated evaluation sample includes the first updated evaluation sample, the sample to be evaluated and the first updated evaluation sample are mixed, and after the mixed sample is generated, the fairness of the model to be evaluated is evaluated. When the updated evaluation sample includes the second updated evaluation sample, the sample to be evaluated and the second updated evaluation sample are mixed, and after the mixed sample is generated, the fairness of the model to be evaluated is evaluated. When the updated evaluation sample includes both the first updated evaluation sample and the second updated evaluation sample, the sample to be evaluated, the first updated evaluation sample, and the second updated evaluation sample are mixed, and after the mixed sample is generated, the fairness of the model to be evaluated is evaluated. The specific implementation is as follows:
Performing a fairness evaluation on the model to be evaluated according to the sample to be evaluated and the updated evaluation sample includes:
mixing the sample to be evaluated and the updated evaluation sample to obtain a mixed evaluation sample;
inputting the mixed evaluation sample and the model to be evaluated into a fairness evaluation module to obtain fairness evaluation metrics of the model to be evaluated;
performing a fairness evaluation on the model to be evaluated according to its fairness evaluation metrics.
The fairness evaluation metrics include, but are not limited to, false positive rate, statistical parity, equal opportunity, and disparate impact.
In a specific implementation, inputting the mixed evaluation sample and the model to be evaluated into the fairness evaluation module to obtain the fairness evaluation metrics of the model to be evaluated includes:
inputting the mixed evaluation sample and the model to be evaluated into the fairness evaluation module;
receiving the fairness evaluation metrics of the model to be evaluated output by the fairness evaluation module, determined according to the comparison result between the true values and the predicted values of the mixed evaluation sample,
where the predicted values are output by the model to be evaluated according to the mixed evaluation sample.
Specifically, the comparison result between the true values and the predicted values of the mixed evaluation sample can be understood as the prediction accuracy of the model to be evaluated.
In practice, the mixed evaluation sample and the model to be evaluated are input into the fairness evaluation module. Within the fairness evaluation module, the mixed evaluation sample is input into the model to be evaluated to obtain the predicted values that the model outputs for the mixed evaluation sample; the prediction accuracy of the model is then computed from these predicted values and the true values of the mixed evaluation sample. After determining the prediction accuracy, the fairness evaluation module computes metrics such as the model's false positive rate, statistical parity, equal opportunity, and disparate impact; users or the system can subsequently evaluate the fairness of the model based on these metrics.
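The metrics named above can be computed from predictions and true labels as in the following sketch. It uses the standard textbook definitions of these fairness measures; the binary protected attribute `group`, the example data, and the function name are illustrative assumptions, not part of the patent text:

```python
import numpy as np

def fairness_metrics(y_true, y_pred, group):
    """Group-wise fairness metrics for a binary classifier, where
    `group` is a binary protected attribute (0/1)."""
    m = {}
    for g in (0, 1):
        sel = group == g
        fp = np.sum((y_pred == 1) & (y_true == 0) & sel)
        tn = np.sum((y_pred == 0) & (y_true == 0) & sel)
        tp = np.sum((y_pred == 1) & (y_true == 1) & sel)
        fn = np.sum((y_pred == 0) & (y_true == 1) & sel)
        m[g] = {
            "fpr": fp / max(fp + tn, 1),        # false positive rate
            "tpr": tp / max(tp + fn, 1),        # used for equal opportunity
            "pos_rate": np.mean(y_pred[sel]),   # used for statistical parity
        }
    return {
        "fpr_gap": abs(m[0]["fpr"] - m[1]["fpr"]),
        "statistical_parity_gap": abs(m[0]["pos_rate"] - m[1]["pos_rate"]),
        "equal_opportunity_gap": abs(m[0]["tpr"] - m[1]["tpr"]),
        "disparate_impact": min(m[0]["pos_rate"], m[1]["pos_rate"])
                            / max(max(m[0]["pos_rate"], m[1]["pos_rate"]), 1e-9),
    }

# Toy mixed evaluation sample: true labels, model predictions, group flags.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 1, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
report = fairness_metrics(y_true, y_pred, group)
```

Gaps near zero and a disparate-impact ratio near one indicate that the model treats both groups similarly; large gaps flag potential unfairness.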
In the model fairness evaluation method provided by the embodiments of this specification, the real-data probability distribution of the picture-and-text training samples is modeled through the picture-and-text training model, and credibility detection is performed on the samples to be evaluated according to that distribution; untrusted samples are then processed to obtain updated evaluation samples, improving the reliability and integrity of the evaluation samples in an untrusted environment. This guarantees the robustness of the fairness evaluation method and the availability of its evaluation results in both trusted and untrusted environments, and thus the accuracy of the fairness evaluation performed on the model to be evaluated, so that the evaluated model performs well in practical applications, whether from the perspective of meeting regulatory compliance or of improving user experience.
The model fairness evaluation method is further described below with reference to Figure 3, taking its application to the fairness evaluation of a recommendation model as an example. Figure 3 shows a process flow chart of a model fairness evaluation method provided by an embodiment of this specification, which specifically includes the following steps.
Step 302: According to the collected large-scale unsupervised picture-and-text training samples, train a pre-trained model using self-supervised learning, and initially model the real-data probability distribution of the picture-and-text training samples through the pre-trained model.
Step 304: Construct a generative adversarial network model from the pre-trained model using deep generative techniques, train the generative adversarial network model with the picture-and-text training samples, and optimize the real-data probability distribution of the training samples with the generative adversarial network model to obtain the optimized real-data probability distribution of the training samples.
Step 306: Perform credibility detection on the samples to be evaluated according to the optimized real-data probability distribution of the picture-and-text training samples and the generative adversarial network model.
For the specific implementation of performing credibility detection on the samples to be evaluated according to the optimized real-data probability distribution and the generative adversarial network model, refer to the detailed description of the foregoing embodiments, which is not repeated here.
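The credibility check of Step 306 can be sketched as a two-signal test: low likelihood under the modeled real-data distribution, or rejection by the discriminator, flags a sample as untrustworthy. In this illustration the density model is a diagonal Gaussian and the discriminator is a toy function; both are hypothetical stand-ins for the trained models, and the threshold is an assumption:

```python
import numpy as np

def log_likelihood(sample, mean, var):
    """Log-likelihood of a sample under a diagonal Gaussian fit to the
    training data (standing in for the learned real-data distribution)."""
    return -0.5 * np.sum((sample - mean) ** 2 / var + np.log(2 * np.pi * var))

def credibility_check(sample, mean, var, discriminator, ll_threshold):
    """Flag a sample as a possible adversarial sample when its likelihood
    under the real-data distribution is low or the discriminator rejects it."""
    ll = log_likelihood(sample, mean, var)
    realness = discriminator(sample)
    return {"adversarial": bool(ll < ll_threshold or realness < 0.5),
            "log_likelihood": ll}

mean, var = np.zeros(4), np.ones(4)
disc = lambda s: 1.0 / (1.0 + np.linalg.norm(s))   # toy discriminator
in_dist  = credibility_check(np.zeros(4), mean, var, disc, ll_threshold=-20)
off_dist = credibility_check(np.full(4, 6.0), mean, var, disc, ll_threshold=-20)
```

The in-distribution sample passes both signals, while the far-off-distribution sample falls below the likelihood threshold and is routed to the reconstruction/generation step.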
Step 308: According to the credibility detection results of the samples to be evaluated, use adversarial defense techniques to perform adversarial reconstruction on the samples, or use the generator of the generative adversarial network model to perform diversity generation for the samples.
For the specific implementations of adversarial reconstruction and diversity generation for the samples to be evaluated, refer to the detailed description of the foregoing embodiments, which is not repeated here.
Step 310: Mix the samples to be evaluated with the adversarially reconstructed samples and/or the diversity-generated samples to obtain mixed samples, and input the mixed samples and the recommendation model into the fairness evaluation module for evaluation to obtain the fairness evaluation result of the recommendation model.
The model fairness evaluation method provided by the embodiments of this specification proposes an algorithm fairness evaluation technique for uncontrolled (untrusted) environments. Based on large-scale, easily obtained unsupervised picture-and-text training data, it models the real-data probability distribution by combining large-scale pre-training with deep generative techniques, detects untrustworthy samples among the samples to be evaluated according to the modeled real-data (unsupervised picture-and-text training data) probability distribution, and performs adversarial denoising reconstruction through adversarial defense techniques, which effectively eliminates the impact of adversarial noise on the fairness evaluation.
Regarding the problem of distribution bias in the data to be evaluated (a single, narrow sample distribution that cannot cover the entire data distribution), the model fairness evaluation method of the embodiments of this specification performs diversity generation based on the data distribution of the samples to be evaluated themselves, combined with deep generative techniques, improving the reliability and integrity of the samples to be evaluated in untrusted environments. In addition, throughout the fairness evaluation process, the method provided by the embodiments of this specification requires neither additional evaluation data nor manual intervention, which greatly increases the degree of automation of the evaluation while also reducing its cost. In summary, the model fairness evaluation method provided by the embodiments of this specification not only supports fairness evaluation in trusted environments, but also guarantees the robustness of the evaluation and the availability of its results in uncontrolled environments. It is therefore applicable to the fairness evaluation of algorithms on e-commerce platforms, online social platforms, online social media, and similar platforms, including but not limited to intelligent customer service, personalized recommendation, and intelligent risk control, so as to eliminate algorithmic bias, help algorithms meet regulatory compliance, and improve user experience.
Corresponding to the above method embodiments, this specification also provides embodiments of a model fairness evaluation apparatus. Figure 4 shows a schematic structural diagram of a model fairness evaluation apparatus provided by an embodiment of this specification. As shown in Figure 4, the apparatus includes:
a probability distribution determination module 402, configured to determine the real-data probability distribution of picture and/or text training samples according to a picture and/or text training model;
a detection result determination module 404, configured to determine the credibility detection result of the sample to be evaluated according to the real-data probability distribution and a generative adversarial network model;
a sample processing module 406, configured to, when the credibility detection result satisfies a non-credibility condition, perform sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample;
an evaluation module 408, configured to perform a fairness evaluation on the model to be evaluated according to the sample to be evaluated and the updated evaluation sample.
Optionally, the probability distribution determination module 402 is further configured to:
acquire picture and/or text training samples;
train a picture and/or text training model from the training samples using self-supervised learning;
obtain the real-data probability distribution of the training samples according to the picture and/or text training model;
adjust the real-data probability distribution of the training samples according to a generative adversarial network model to obtain the adjusted real-data probability distribution of the training samples.
Optionally, the probability distribution determination module 402 is further configured to:
construct a generative adversarial network model according to the picture and/or text training model;
train the generative adversarial network model with the training samples to obtain the discrimination module and generation module of the trained generative adversarial network model;
adjust the real-data probability distribution of the training samples according to the discrimination module to obtain the adjusted real-data probability distribution of the training samples.
Optionally, the probability distribution determination module 402 is further configured to:
initialize the module parameters of the discrimination module of the generative adversarial network model according to the model parameters of the picture and/or text training model, to construct the discrimination module of the generative adversarial network model;
construct the generation module of the generative adversarial network model according to a deconvolution network and/or a text generation network;
construct the generative adversarial network model according to the discrimination module and the generation module.
Optionally, the detection result determination module 404 is further configured to:
obtain the sample-data probability distribution of the sample to be evaluated according to the picture and/or text training model;
determine, according to the sample-data probability distribution, the similarity of the sample to be evaluated to the real-data probability distribution of the training samples;
obtain the sample prediction result of the sample to be evaluated according to the discrimination module of the generative adversarial network model;
determine the credibility detection result of the sample to be evaluated according to the similarity and the sample prediction result.
Optionally, the detection result determination module 404 is further configured to:
determine, according to the similarity and the sample prediction result, whether the sample to be evaluated is an adversarial sample, as well as the distribution diversity of the sample to be evaluated.
Optionally, the sample processing module 406 is further configured to:
when the sample to be evaluated is an adversarial sample, perform sample processing on the sample to be evaluated according to a first preset processing method to obtain a first updated evaluation sample; and/or
when the distribution diversity of the sample to be evaluated satisfies a preset distribution condition, perform sample processing on the sample to be evaluated according to a second preset processing method to obtain a second updated evaluation sample.
Optionally, the sample processing module 406 is further configured to:
when the sample to be evaluated is an adversarial sample, reconstruct the sample to be evaluated through a denoising method such as data compression, data randomization, or adversarial error correction to obtain the first updated evaluation sample.
Optionally, the sample processing module 406 is further configured to:
when the distribution diversity of the sample to be evaluated satisfies a preset distribution condition, generate additional evaluation samples through the generation module of the generative adversarial network model according to the sample-data probability distribution of the sample to be evaluated;
obtain a second updated evaluation sample according to the additional evaluation samples.
Optionally, the sample processing module 406 is further configured to:
input the additional evaluation samples into the discrimination module of the generative adversarial network model to obtain prediction results for the additional evaluation samples;
prune the additional evaluation samples according to their prediction results to obtain the second updated evaluation sample.
Optionally, the evaluation module 408 is further configured to:
mix the sample to be evaluated and the updated evaluation sample to obtain a mixed evaluation sample;
input the mixed evaluation sample and the model to be evaluated into a fairness evaluation module to obtain fairness evaluation metrics of the model to be evaluated;
perform a fairness evaluation on the model to be evaluated according to its fairness evaluation metrics.
Optionally, the evaluation module 408 is further configured to:
input the mixed evaluation sample and the model to be evaluated into the fairness evaluation module;
receive the fairness evaluation metrics of the model to be evaluated output by the fairness evaluation module, determined according to the comparison result between the true values and the predicted values of the mixed evaluation sample,
where the predicted values are output by the model to be evaluated according to the mixed evaluation sample.
In the model fairness evaluation apparatus provided by the embodiments of this specification, the real-data probability distribution of the picture-and-text training samples is modeled through the picture-and-text training model, and credibility detection is performed on the samples to be evaluated according to that distribution; untrusted samples are then processed to obtain updated evaluation samples, improving the reliability and integrity of the evaluation samples in an untrusted environment. This guarantees the robustness of the fairness evaluation method and the availability of its evaluation results in both trusted and untrusted environments, and thus the accuracy of the fairness evaluation performed on the model to be evaluated, so that the evaluated model performs well in practical applications, whether from the perspective of meeting regulatory compliance or of improving user experience.
The above is a schematic solution of the model fairness evaluation apparatus of this embodiment. It should be noted that the technical solution of the model fairness evaluation apparatus belongs to the same concept as the technical solution of the model fairness evaluation method described above; for details not described in detail in the technical solution of the apparatus, refer to the description of the technical solution of the method.
Referring to Figure 5, Figure 5 shows a flow chart of another model fairness evaluation method provided by an embodiment of this specification, which specifically includes the following steps.
Step 502: Determine the real-data probability distribution of picture and/or text training samples according to a picture and/or text training model.
Step 504: Receive the sample to be evaluated and the model to be evaluated sent by a user.
Step 506: Determine the credibility detection result of the sample to be evaluated according to the real-data probability distribution and a generative adversarial network model.
Step 508: When the credibility detection result satisfies a non-credibility condition, perform sample processing on the sample to be evaluated according to the credibility detection result to obtain an updated evaluation sample.
Step 510: Perform a fairness evaluation on the model to be evaluated according to the sample to be evaluated and the updated evaluation sample.
Step 512: Obtain the fairness evaluation result of the model to be evaluated, and return the fairness evaluation result to the user.
The model fairness evaluation method provided by this embodiment of the specification is applied to a model fairness evaluation platform.
A practical application scenario is as follows: when a user wants the model fairness evaluation platform to evaluate the fairness of a project model, the user sends the samples to be evaluated and the model to be evaluated to the platform. After receiving them, the platform performs adversarial reconstruction or diversity generation on the samples to be evaluated in the manner of the foregoing embodiments, thereby ensuring that the evaluation results given by the platform are closer to the true behavior of the model to be evaluated.
The model fairness evaluation method provided by the embodiments of this specification first models the probability distribution of the training data as the real-data probability distribution, based on large-scale, easily obtained unsupervised picture-and-text training data, by combining large-scale pre-training with deep generative techniques. Before the samples to be evaluated enter the fairness evaluation module, their reliability is checked against the real-data probability distribution, which effectively mitigates the interference of untrustworthy samples with the evaluation results. For detected untrustworthy samples, such as adversarial samples or distribution-biased samples, this solution denoises and reconstructs the evaluation samples based on adversarial defense techniques and performs diversity generation based on deep generative models, improving the reliability and integrity of the samples to be evaluated in untrusted environments. This guarantees the robustness of the fairness evaluation system and the availability of its results in uncontrolled environments, and effectively remedies the sensitivity of existing systems to adversarial perturbations and evaluation-sample distribution bias in such environments.
与上述方法实施例相对应,本说明书还提供了另一种模型公平性评估装置实施例,图6示出了本说明书一个实施例提供的另一种模型公平性评估装置的结构示意图。如图6所示,该装置应用于模型公平性评估平台,包括:Corresponding to the above method embodiment, this specification also provides another embodiment of a model fairness evaluation device. Figure 6 shows a schematic structural diagram of another model fairness evaluation device provided by an embodiment of this specification. As shown in Figure 6, the device is applied to the model fairness evaluation platform, including:
a first determination module 602, configured to determine, according to a picture and/or text training model, a real-data probability distribution of picture and/or text training samples;
a data receiving module 604, configured to receive a sample to be evaluated and a model to be evaluated sent by a user;
a second determination module 606, configured to determine a credibility detection result of the sample to be evaluated according to the real-data probability distribution and a generative adversarial network model;
a sample update module 608, configured to, when the credibility detection result satisfies a non-credibility condition, perform sample processing on the sample to be evaluated according to the credibility detection result, to obtain an updated evaluation sample;
a fairness evaluation module 610, configured to perform a fairness evaluation on the model to be evaluated according to the sample to be evaluated and the updated evaluation sample; and
a result display module 612, configured to obtain a fairness evaluation result of the model to be evaluated and return the fairness evaluation result to the user.
In the model fairness evaluation apparatus provided by the embodiments of this specification, large-scale, easily obtained unsupervised image-text training data is first used to model the probability distribution of the training data — taken as the real-data probability distribution — by combining large-scale pre-training techniques with deep generative techniques. Before a sample to be evaluated enters the fairness evaluation module, it undergoes a credibility check against the real-data probability distribution, which effectively mitigates the interference of untrusted samples with the evaluation results. For detected untrusted samples, such as adversarial samples or distribution-deviation samples, the scheme respectively applies adversarial-defense techniques to denoise and reconstruct the evaluation samples, and deep generative models to generate additional diverse samples. This improves the reliability and completeness of evaluation samples in untrusted environments, ensures the robustness of the fairness evaluation system and the usability of its results in uncontrolled environments, and effectively remedies the sensitivity of existing systems to adversarial perturbations and evaluation-sample distribution deviations in such environments.
The above is a schematic solution of the model fairness evaluation apparatus of this embodiment. It should be noted that the technical solution of the model fairness evaluation apparatus and that of the model fairness evaluation method described above belong to the same concept; for details not elaborated in the apparatus solution, refer to the description of the method solution above.
Figure 7 shows a structural block diagram of a computing device 700 provided according to an embodiment of this specification. The components of the computing device 700 include, but are not limited to, a memory 710 and a processor 720. The processor 720 and the memory 710 are connected via a bus 730, and a database 750 is used to store data.
The computing device 700 further includes an access device 740 that enables the computing device 700 to communicate via one or more networks 760. Examples of these networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 740 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so on.
In one embodiment of this specification, the above components of the computing device 700, as well as other components not shown in Figure 7, may also be connected to each other, for example, via a bus. It should be understood that the structural block diagram of the computing device shown in Figure 7 is for illustrative purposes only and does not limit the scope of this specification. Those skilled in the art may add or replace components as needed.
The computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, a personal digital assistant, a laptop computer, a notebook computer, a netbook, and the like), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smart watch, smart glasses, and the like) or another type of mobile device, or a stationary computing device such as a desktop computer or a PC. The computing device 700 may also be a mobile or stationary server.
The processor 720 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the model fairness evaluation method described above.
The above is a schematic solution of a computing device of this embodiment. It should be noted that the technical solution of the computing device and that of the model fairness evaluation method described above belong to the same concept; for details not elaborated in the computing device solution, refer to the description of the method solution above.
An embodiment of this specification further provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the model fairness evaluation method described above.
The above is a schematic solution of a computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and that of the model fairness evaluation method described above belong to the same concept; for details not elaborated in the storage medium solution, refer to the description of the method solution above.
An embodiment of this specification further provides a computer program that, when executed in a computer, causes the computer to perform the steps of the model fairness evaluation method described above.
The above is a schematic solution of a computer program of this embodiment. It should be noted that the technical solution of the computer program and that of the model fairness evaluation method described above belong to the same concept; for details not elaborated in the computer program solution, refer to the description of the method solution above.
The foregoing describes specific embodiments of this specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source-code form, object-code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be appropriately added to or removed in accordance with the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media exclude electrical carrier signals and telecommunication signals.
It should be noted that, for ease of description, each of the foregoing method embodiments is expressed as a series of action combinations. Those skilled in the art should understand, however, that the embodiments of this specification are not limited by the described order of actions, because according to the embodiments of this specification, certain steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily all required by the embodiments of this specification.
In the above embodiments, each embodiment is described with its own emphasis. For parts not described in detail in a given embodiment, refer to the relevant descriptions of other embodiments.
The preferred embodiments of this specification disclosed above serve only to help explain this specification. The optional embodiments do not describe all details exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of the content of the embodiments of this specification. These embodiments were selected and described in detail in order to better explain the principles and practical applications of the embodiments of this specification, so that those skilled in the art can well understand and make use of this specification. This specification is limited only by the claims and their full scope and equivalents.

Claims (14)

  1. A model fairness evaluation method, comprising:
    determining, according to a picture and/or text training model, a real-data probability distribution of picture and/or text training samples;
    determining a credibility detection result of a sample to be evaluated according to the real-data probability distribution and a generative adversarial network model;
    when the credibility detection result satisfies a non-credibility condition, performing sample processing on the sample to be evaluated according to the credibility detection result, to obtain an updated evaluation sample; and
    performing a fairness evaluation on a model to be evaluated according to the sample to be evaluated and the updated evaluation sample.
  2. The model fairness evaluation method according to claim 1, wherein the determining, according to a picture and/or text training model, a real-data probability distribution of picture and/or text training samples comprises:
    obtaining picture and/or text training samples;
    training on the training samples using a self-supervised learning technique, to obtain the picture and/or text training model;
    obtaining, according to the picture and/or text training model, a real-data probability distribution of the training samples; and
    adjusting the real-data probability distribution of the training samples according to a generative adversarial network model, to obtain an adjusted real-data probability distribution of the training samples.
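As a toy stand-in for the real-data probability distribution of claim 2 — which the specification models with large-scale pretraining and deep generative techniques — the sketch below fits a one-dimensional Gaussian to scalar features and scores samples by log-likelihood. The Gaussian choice and function names are illustrative assumptions, not the specification's model.

```python
import math

def fit_density(features):
    """Fit a 1-D Gaussian to scalar features as a stand-in for the
    real-data probability distribution; a real system would model the
    density over features from the pretrained picture/text encoder."""
    n = len(features)
    mu = sum(features) / n
    var = sum((f - mu) ** 2 for f in features) / n
    return mu, var

def log_likelihood(x, mu, var):
    """Log-density of x under the fitted Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
```

Samples far from the training data then receive a lower log-likelihood, which is the signal the later credibility check relies on.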
  3. The model fairness evaluation method according to claim 2, wherein the adjusting the real-data probability distribution of the training samples according to a generative adversarial network model, to obtain an adjusted real-data probability distribution of the training samples, comprises:
    constructing a generative adversarial network model according to the picture and/or text training model;
    training the generative adversarial network model on the training samples, to obtain a discrimination module and a generation module of the trained generative adversarial network model; and
    adjusting the real-data probability distribution of the training samples according to the discrimination module, to obtain the adjusted real-data probability distribution of the training samples.
  4. The model fairness evaluation method according to claim 3, wherein the constructing a generative adversarial network model according to the picture and/or text training model comprises:
    initializing module parameters of a discrimination module of the generative adversarial network model according to model parameters of the picture and/or text training model, to construct the discrimination module of the generative adversarial network model;
    constructing a generation module of the generative adversarial network model according to a deconvolution network and/or a text generation network; and
    constructing the generative adversarial network model from the discrimination module and the generation module.
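The key construction step in claim 4 — initializing the discriminator from the pretrained model's parameters rather than from scratch — can be shown structurally. The classes and dict layout below are illustrative only; a real implementation would copy network weights in a deep-learning framework and attach a deconvolution or text-generation network as the generator.

```python
class PretrainedEncoder:
    """Stand-in for the picture/text pretraining model."""
    def __init__(self, weights):
        self.weights = weights

def build_gan(encoder, latent_dim=4):
    """Assemble a toy GAN: the discriminator body is a copy of the
    pretrained parameters (claim 4's initialization) with a fresh scoring
    head; the generator dict stands in for a deconvolution / text-
    generation network. Structure and names are illustrative."""
    discriminator = {"body": list(encoder.weights), "head": [0.0]}
    generator = {"weights": [0.0] * latent_dim}
    return {"discriminator": discriminator, "generator": generator}
```

Copying (rather than sharing) the parameters matters: adversarial training then fine-tunes the discriminator without disturbing the pretrained encoder.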
  5. The model fairness evaluation method according to claim 1, wherein the determining a credibility detection result of a sample to be evaluated according to the real-data probability distribution and a generative adversarial network model comprises:
    obtaining a sample-data probability distribution of the sample to be evaluated according to the picture and/or text training model;
    determining, according to the sample-data probability distribution, a similarity of the sample to be evaluated to the real-data probability distribution of the training samples;
    obtaining a sample prediction result of the sample to be evaluated according to the discrimination module of the generative adversarial network model; and
    determining the credibility detection result of the sample to be evaluated according to the similarity and the sample prediction result.
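Claims 5 and 6 combine two signals: similarity to the real-data distribution and the discriminator's prediction. A minimal sketch, assuming a Gaussian similarity score and fixed thresholds (both assumptions for demonstration; the specification does not prescribe either):

```python
import math

def credibility_check(x, mu, var, disc_score,
                      sim_threshold=-4.0, disc_threshold=0.5):
    """Flag a sample as adversarial when the discriminator rejects it,
    and as a distribution-deviation sample when its likelihood under the
    real-data distribution is low. Thresholds are illustrative."""
    sim = -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    result = {
        "similarity": sim,
        "adversarial": disc_score < disc_threshold,  # discriminator verdict
        "deviation": sim < sim_threshold,            # off-distribution
    }
    result["credible"] = not (result["adversarial"] or result["deviation"])
    return result
```

The two flags drive the two branches of claim 7: adversarial samples go to denoising, distribution-deviation samples to generative augmentation.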
  6. The model fairness evaluation method according to claim 5, wherein the determining the credibility detection result of the sample to be evaluated according to the similarity and the sample prediction result comprises:
    determining, according to the similarity and the sample prediction result, whether the sample to be evaluated is an adversarial sample and a distribution diversity of the sample to be evaluated.
  7. The model fairness evaluation method according to claim 6, wherein the performing sample processing on the sample to be evaluated according to the credibility detection result when the credibility detection result satisfies the non-credibility condition, to obtain the updated evaluation sample, comprises:
    when the sample to be evaluated is an adversarial sample, performing sample processing on the sample to be evaluated according to a first preset processing method, to obtain a first updated evaluation sample; and/or
    when the distribution diversity of the sample to be evaluated satisfies a preset distribution condition, performing sample processing on the sample to be evaluated according to a second preset processing method, to obtain a second updated evaluation sample.
  8. The model fairness evaluation method according to claim 7, wherein the performing sample processing on the sample to be evaluated according to the first preset processing method when the sample to be evaluated is an adversarial sample, to obtain the first updated evaluation sample, comprises:
    when the sample to be evaluated is an adversarial sample, reconstructing the sample to be evaluated through a denoising method of data compression, data randomization, or adversarial error correction, to obtain the first updated evaluation sample.
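Two of claim 8's denoising routes can be approximated in a few lines: "data compression" as quantization to a coarse grid, and "data randomization" as light jitter; both aim to wash out small adversarial perturbations. The quantization level, noise magnitude, and fixed seed are illustrative assumptions, not values from the specification.

```python
import random

def reconstruct_adversarial(x, levels=16, noise=0.0, seed=0):
    """Quantize each value to 1/levels steps (compression), then add
    optional bounded jitter (randomization). Parameters illustrative."""
    rng = random.Random(seed)  # fixed seed keeps the jitter repeatable
    out = []
    for v in x:
        q = round(v * levels) / levels               # compression step
        out.append(q + rng.uniform(-noise, noise))   # randomization step
    return out
```

In a deployed system the analogous operations would be, for example, JPEG re-encoding for images or feature squeezing, with the reconstruction quality tuned to the attack model.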
  9. The model fairness evaluation method according to claim 7, wherein the performing sample processing on the sample to be evaluated according to the second preset processing method when the distribution diversity of the sample to be evaluated satisfies the preset distribution condition, to obtain the second updated evaluation sample, comprises:
    when the distribution diversity of the sample to be evaluated satisfies the preset distribution condition, generating a new evaluation sample through the generation module of the generative adversarial network model, according to the sample-data probability distribution of the sample to be evaluated; and
    obtaining the second updated evaluation sample according to the new evaluation sample.
  10. The model fairness evaluation method according to claim 9, wherein the obtaining the second updated evaluation sample according to the new evaluation sample comprises:
    inputting the new evaluation sample into the discrimination module of the generative adversarial network model, to obtain a prediction result of the new evaluation sample; and
    pruning the new evaluation sample according to the prediction result of the new evaluation sample, to obtain the second updated evaluation sample.
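The generate-then-prune loop of claims 9 and 10 reduces to: draw candidates from the generation module, then keep only those the discrimination module scores as sufficiently realistic. The callables and threshold below are placeholders for the trained GAN's modules.

```python
def augment_and_prune(generate, discriminate, latents, keep_threshold=0.5):
    """Generate candidate evaluation samples from latent codes, then
    prune those the discriminator scores below the threshold. The
    threshold value is an illustrative assumption."""
    candidates = [generate(z) for z in latents]
    return [c for c in candidates if discriminate(c) >= keep_threshold]
```

The pruning step is what keeps augmentation from re-introducing the very distribution deviation the credibility check flagged.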
  11. The model fairness evaluation method according to claim 1, wherein the performing a fairness evaluation on the model to be evaluated according to the sample to be evaluated and the updated evaluation sample comprises:
    mixing the sample to be evaluated with the updated evaluation sample, to obtain a mixed evaluation sample;
    inputting the mixed evaluation sample and the model to be evaluated into a fairness evaluation module, to obtain a fairness evaluation indicator of the model to be evaluated; and
    performing the fairness evaluation on the model to be evaluated according to the fairness evaluation indicator of the model to be evaluated.
  12. The model fairness evaluation method according to claim 11, wherein the inputting the mixed evaluation sample and the model to be evaluated into a fairness evaluation module, to obtain a fairness evaluation indicator of the model to be evaluated, comprises:
    inputting the mixed evaluation sample and the model to be evaluated into the fairness evaluation module; and
    receiving the fairness evaluation indicator of the model to be evaluated, which is output by the fairness evaluation module and determined from a comparison of true values and predicted values of the mixed evaluation sample,
    wherein the predicted values are output by the model to be evaluated from the mixed evaluation sample.
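One way to realize the true-value versus predicted-value comparison of claim 12 is to compute, per protected group, the accuracy and positive-prediction rate, then report the worst-case gaps. The specific metric choices are illustrative assumptions — the claim only requires indicators derived from the comparison.

```python
def fairness_indicators(samples, predict, group_of, label_of):
    """Per-group accuracy and positive rate, plus worst-case gaps,
    computed from true labels and the evaluated model's predictions."""
    stats = {}  # group -> (correct, total, positives)
    for s in samples:
        g = group_of(s)
        c, t, p = stats.get(g, (0, 0, 0))
        pred = predict(s)
        stats[g] = (c + (pred == label_of(s)), t + 1, p + (pred == 1))
    acc = {g: c / t for g, (c, t, p) in stats.items()}
    pos = {g: p / t for g, (c, t, p) in stats.items()}
    return {"per_group_accuracy": acc,
            "accuracy_gap": max(acc.values()) - min(acc.values()),
            "positive_rate_gap": max(pos.values()) - min(pos.values())}
```

A gap near zero on both indicators suggests the evaluated model treats the groups comparably on the mixed evaluation set; which gap matters depends on the deployment's fairness criterion.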
  13. A model fairness evaluation apparatus, comprising:
    a probability distribution determination module, configured to determine, according to a picture and/or text training model, a real-data probability distribution of picture and/or text training samples;
    a detection result determination module, configured to determine a credibility detection result of a sample to be evaluated according to the real-data probability distribution and a generative adversarial network model;
    a sample processing module, configured to, when the credibility detection result satisfies a non-credibility condition, perform sample processing on the sample to be evaluated according to the credibility detection result, to obtain an updated evaluation sample; and
    an evaluation module, configured to perform a fairness evaluation on a model to be evaluated according to the sample to be evaluated and the updated evaluation sample.
  14. A model fairness evaluation method, applied to a model fairness evaluation platform, comprising:
    determining, according to a picture and/or text training model, a real-data probability distribution of picture and/or text training samples;
    receiving a sample to be evaluated and a model to be evaluated sent by a user;
    determining a credibility detection result of the sample to be evaluated according to the real-data probability distribution and a generative adversarial network model;
    when the credibility detection result satisfies a non-credibility condition, performing sample processing on the sample to be evaluated according to the credibility detection result, to obtain an updated evaluation sample;
    performing a fairness evaluation on the model to be evaluated according to the sample to be evaluated and the updated evaluation sample; and
    obtaining a fairness evaluation result of the model to be evaluated, and returning the fairness evaluation result to the user.
PCT/CN2023/086570 2022-04-12 2023-04-06 Model fairness evaluation methods and apparatus WO2023197927A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210379396.XA CN114970670A (en) 2022-04-12 2022-04-12 Model fairness assessment method and device
CN202210379396.X 2022-04-12

Publications (1)

Publication Number Publication Date
WO2023197927A1 true WO2023197927A1 (en) 2023-10-19

Family

ID=82977853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/086570 WO2023197927A1 (en) 2022-04-12 2023-04-06 Model fairness evaluation methods and apparatus

Country Status (2)

Country Link
CN (1) CN114970670A (en)
WO (1) WO2023197927A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114970670A (en) * 2022-04-12 2022-08-30 阿里巴巴(中国)有限公司 Model fairness assessment method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200143231A1 (en) * 2018-11-02 2020-05-07 Microsoft Technology Licensing, Llc Probabilistic neural network architecture generation
CN111753918A (en) * 2020-06-30 2020-10-09 浙江工业大学 Image recognition model for eliminating sex bias based on counterstudy and application
CN112700408A (en) * 2020-12-28 2021-04-23 中国银联股份有限公司 Model training method, image quality evaluation method and device
CN113220553A (en) * 2021-05-13 2021-08-06 支付宝(杭州)信息技术有限公司 Method and device for evaluating performance of text prediction model
CN114139601A (en) * 2021-11-01 2022-03-04 国家电网有限公司大数据中心 Evaluation method and system for artificial intelligence algorithm model of power inspection scene
CN114970670A (en) * 2022-04-12 2022-08-30 阿里巴巴(中国)有限公司 Model fairness assessment method and device


Also Published As

Publication number Publication date
CN114970670A (en) 2022-08-30


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23787572

Country of ref document: EP

Kind code of ref document: A1