WO2023179055A1 - A bias assessment method, apparatus, medium, program product and electronic device - Google Patents

A bias assessment method, apparatus, medium, program product and electronic device

Info

Publication number
WO2023179055A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
evaluation
model
factors
Prior art date
Application number
PCT/CN2022/132232
Other languages
English (en)
French (fr)
Inventor
陈晓仕
张诗杰
朱森华
Original Assignee
华为云计算技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2023179055A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a bias assessment method, device, medium, program product and electronic equipment.
  • Image data set bias refers to the "spurious features" present in an image data set: features in the image data that the machine learning model is not expected to learn when the image data is used to train it.
  • For example, some images in the image data set contain information such as device labels (e.g., the model of the image acquisition device), image acquisition parameters, and human annotations. This information may become spurious features for model learning, so that the model cannot objectively and truly learn the target task as the designer expects. The trained machine learning model may then find it difficult to complete the target task as expected in the actual use environment, resulting in model bias and possible wide-scale errors in the recognition results.
  • For example, if images of seriously ill patients tend to contain labels of intensive-care-unit equipment, the model will learn those label features to infer whether a medical image depicts serious illness, and will no longer learn image features related to the lesion tissue. Furthermore, if an image of serious heart disease that contains no intensive-care-unit equipment labels is input into the trained model, the model may be unable to infer that the patient has severe heart disease, resulting in serious errors in the model's recognition results. Therefore, there is an urgent need for a convenient method to evaluate the bias of trained machine learning models, so that machine learning models can be better applied.
  • In view of this, embodiments of the present application provide a bias assessment method, apparatus, medium, program product, and electronic device, which can reduce the collection requirements for verification data sets, obtain more comprehensive bias assessment results for the model to be evaluated, and display them intuitively to users.
  • In a first aspect, embodiments of the present application provide a bias assessment method, which is applied to electronic devices.
  • The method includes: obtaining a factor to be verified that exists in multiple evaluation images used for bias assessment of the model to be evaluated; classifying the multiple evaluation images according to the factor to be verified to obtain a first target evaluation image set, where the first target evaluation image set includes a first evaluation image set that contains the factor to be verified and/or a second evaluation image set that does not contain the factor to be verified; performing style conversion on the first target evaluation image set according to the factor to be verified to obtain a second target evaluation image set; inputting the first target evaluation image set and the second target evaluation image set into the model to be evaluated for inference to obtain a target inference result; and outputting, according to the target inference result, a bias evaluation result for the model to be evaluated, where the bias evaluation result is used to characterize whether the factor to be verified causes bias in the model to be evaluated. Here, style conversion on the first evaluation image set is achieved by removing the factor to be verified, and style conversion on the second evaluation image set is achieved by adding the factor to be verified. It can be understood that the multiple evaluation images are the verification data set described below.
  • the first evaluation image set and the second evaluation image set may be different subsets obtained by dividing the verification data set below.
  • the factors to be verified may be image features, such as image features of a pacemaker.
  • For example, the image before style conversion is an X-ray image of a heart disease patient that includes the pacemaker image feature, and the image after style conversion is an X-ray image of a heart disease patient without the pacemaker image feature.
  • In this way, the bias assessment method provided by this application does not require collecting real labels for sample images, and it overcomes the practical difficulty of obtaining certain categories of images. It places lower requirements on the collection of sample images for the verification data set, which can reduce the time users spend on sample collection.
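The flow summarized above (classify by factor, style-convert, run inference on both versions, compare) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the patented implementation: `model`, `style_transfer`, and `has_factor` are hypothetical callables standing in for the model to be evaluated, the image style conversion model, and the classification by the factor to be verified.

```python
# Hypothetical sketch of the bias-assessment flow described above.
def assess_bias(images, has_factor, model, style_transfer):
    """Return the evaluation images whose inference result changes
    after the factor to be verified is removed or added."""
    # Classify evaluation images by the factor to be verified.
    set1 = [img for img in images if has_factor(img)]      # first evaluation image set
    set2 = [img for img in images if not has_factor(img)]  # second evaluation image set

    difference_images = []
    # Style-convert each image (remove the factor from set1, add it to set2),
    # run inference on both versions, and flag images whose results differ.
    for img in set1 + set2:
        converted = style_transfer(img)
        if model(img) != model(converted):
            difference_images.append(img)
    return difference_images

# Toy usage: each "image" is a (name, has_factor) pair.
images = [("img%d" % i, i % 2 == 0) for i in range(4)]
model = lambda img: img[1]               # biased toy model: predicts from the factor alone
flip = lambda img: (img[0], not img[1])  # toy "style conversion": toggles the factor
diff = assess_bias(images, lambda im: im[1], model, flip)
```

Because the toy model predicts purely from the factor, every image flips its result after conversion, so all four images are reported as difference images.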
  • In some embodiments, the above bias assessment result includes at least one of the following: information on whether the factor to be verified causes bias in the model to be assessed; the difference images in the first target evaluation image set, where a difference image is an evaluation image in the first target evaluation image set whose inference result differs from the inference result of at least one corresponding style-converted image in the second target evaluation image set, both inference results being output by the model to be evaluated; the converted images in the second target evaluation image set obtained by style conversion of each difference image; the inference result of the model to be evaluated for each difference image; the inference result of the model to be evaluated for each converted image; and the proportion of the difference images of the first target evaluation image set among the multiple evaluation images.
  • The difference images are the difference samples, or images with large differences, described below.
  • For example, a difference image in the first target evaluation image set is an X-ray image of a heart disease patient containing the image features of a pacemaker, and the corresponding style-converted image in the second target evaluation image set is the X-ray image of that heart disease patient with the pacemaker removed (i.e., the corresponding converted image).
  • the inference results of the two images are different, which can mean that the inference results are different or have a large difference.
  • the inference results of the two images by the model to be evaluated are different.
  • If the model to be evaluated is a classification model, this refers to different classification results for the two images.
  • If the model to be evaluated is a detection model, this refers to the intersection-over-union (IoU) of the rectangular boxes in the two images being lower than a set IoU threshold.
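The IoU criterion can be computed as below. This is a generic sketch, not code from the application; the corner box format `(x1, y1, x2, y2)` and the 0.5 threshold are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

# Boxes detected on the original and style-converted image: an IoU below
# the set threshold marks the two inference results as "different".
IOU_THRESHOLD = 0.5  # illustrative value
different = iou((0, 0, 10, 10), (5, 5, 15, 15)) < IOU_THRESHOLD
```

Here the two boxes overlap only slightly (IoU = 25/175), so the inference results would count as different.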
  • In some embodiments, the factor to be verified is determined based on the background and foreground of each original image in the verification data set, and the factor to be verified corresponds to an image feature in the background.
  • In some embodiments, the style conversion of images is implemented through an image style conversion model, and the style conversion model is trained based on the first evaluation image set and the second evaluation image set. The image style conversion model is used to remove the factor to be verified from images that contain it, and to add the factor to be verified to images that do not contain it, where the factor to be verified is an image feature.
  • In some embodiments, the first evaluation image set corresponds to a first classification label, the second evaluation image set corresponds to a second classification label different from the first classification label, and the image style transfer model is trained based on the images in the first evaluation image set with the first classification label and the images in the second evaluation image set with the second classification label.
  • the above method further includes: receiving the verification data set and the model to be evaluated input by the user.
  • the above method further includes: receiving the factor to be verified input by the user, where the factor to be verified is an image feature or an identifier indicating an image feature.
  • In a second aspect, embodiments of the present application provide a bias assessment apparatus, including: an acquisition module, used to obtain a factor to be verified that exists in multiple evaluation images used for bias assessment of the model to be evaluated; a classification module, used to classify the multiple evaluation images according to the factor to be verified obtained by the acquisition module, to obtain a first target evaluation image set, where the first target evaluation image set includes a first evaluation image set that contains the factor to be verified and/or a second evaluation image set that does not contain the factor to be verified; a conversion module, configured to perform style conversion on the first target evaluation image set obtained by the classification module according to the factor to be verified, to obtain a second target evaluation image set; an inference module, used to input the first target evaluation image set obtained by the classification module and the second target evaluation image set obtained by the conversion module into the model to be evaluated for inference, to obtain a target inference result; and an output module, configured to output a bias evaluation result for the model to be evaluated based on the target inference result obtained by the inference module, where the bias evaluation result is used to characterize whether the factor to be verified causes bias in the model to be evaluated. Here, style conversion on the first evaluation image set is achieved by removing the factor to be verified, and style conversion on the second evaluation image set is achieved by adding the factor to be verified.
  • In some embodiments, the bias evaluation result includes at least one of the following: information on whether the factor to be verified causes bias in the model to be evaluated; the difference images in the first target evaluation image set, where a difference image is an evaluation image in the first target evaluation image set whose inference result differs from the inference result of at least one corresponding style-converted image in the second target evaluation image set, both inference results being output by the model to be evaluated; the converted images in the second target evaluation image set obtained by style conversion of each difference image; the inference result of the model to be evaluated for each difference image; the inference result of the model to be evaluated for each converted image; and the proportion of the difference images of the first target evaluation image set among the multiple evaluation images.
  • In some embodiments, the factor to be verified is determined based on the background and foreground of each original image in the verification data set, and the factor to be verified corresponds to an image feature in the background.
  • In some embodiments, the style conversion of images is implemented through an image style conversion model, and the style conversion model is trained based on the first evaluation image set and the second evaluation image set. The image style conversion model is used to remove the factor to be verified from images that contain it, and to add the factor to be verified to images that do not contain it, where the factor to be verified is an image feature.
  • In some embodiments, the first evaluation image set corresponds to a first classification label, the second evaluation image set corresponds to a second classification label different from the first classification label, and the image style transfer model is trained based on the images in the first evaluation image set with the first classification label and the images in the second evaluation image set with the second classification label.
  • the device further includes: an input module configured to receive the verification data set and the model to be evaluated input by the user.
  • the input module is also configured to receive the factors to be verified input by the user, where the factors to be verified are image features or identifiers indicating image features.
  • the above bias assessment device can be provided in an electronic device, the above acquisition module, classification module, conversion module and output module can be implemented by a processor in the electronic device, and the above input module can be implemented through an interface unit of the electronic device.
  • In a third aspect, embodiments of the present application provide a computer-readable storage medium with instructions stored on it. When executed on an electronic device, the instructions cause the electronic device to perform the bias assessment method described in the first aspect.
  • In a fourth aspect, embodiments of the present application provide a computer program product, where the computer program product includes instructions, and the instructions are used to implement the bias assessment method described in the first aspect.
  • In a fifth aspect, embodiments of the present application provide an electronic device, including: a memory for storing instructions to be executed by one or more processors of the electronic device; and a processor, where, when the instructions are executed by the one or more processors, the processor is configured to execute the bias assessment method described in the first aspect.
  • Figure 1 is a schematic diagram of an application scenario of bias assessment provided by the embodiment of this application.
  • Figure 2 is a system architecture block diagram applied to a bias assessment method provided by an embodiment of the present application
  • Figure 3 is a schematic diagram of different categories of models to be evaluated provided by the embodiment of the present application.
  • Figure 4 is an architectural block diagram of a system for applying a bias assessment method provided by an embodiment of the present application
  • Figure 5 is a schematic flowchart of a bias assessment method provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of a display interface for bias assessment results provided by an embodiment of the present application.
  • Figure 7 is a schematic flow chart of a bias assessment method based on a medical image assessment model provided by an embodiment of the present application.
  • FIGS. 8A, 8B and 9 are schematic diagrams of a display interface for bias assessment results provided by embodiments of the present application.
  • Figure 10 is a structural block diagram of an electronic device provided by an embodiment of the present application.
  • Embodiments of the present application include, but are not limited to, a bias assessment method, media, and electronic equipment.
  • Machine learning is a science within artificial intelligence; the main research object of this field is how to improve the performance of specific algorithms through empirical learning.
  • Deep learning: a type of machine learning technology based on deep neural network algorithms. Its main feature is the use of multiple nonlinear transformation structures to process and analyze data. It is mainly used in perception and decision-making scenarios in the field of artificial intelligence, such as image and speech recognition, natural language translation, and computer games.
  • Data bias: also known as image data set bias. For a specific machine learning task, there are factors in the data that are correlated with the task but have no causal relationship to it; for example, the samples are unbalanced, or there are artificial markers in the data. Data bias can cause machine learning models to learn spurious features.
  • the following takes a machine learning model as a binary classification model as an example.
  • The evaluation parameters of the machine learning model are explained with reference to the real values and predicted values shown in Table 1. The image samples include two categories, positive samples and negative samples, where 1 represents a positive sample and 0 represents a negative sample.
  • Table 1: a real value of 1 with a predicted value of 1 is a true positive (TP); a real value of 1 with a predicted value of 0 is a false negative (FN); a real value of 0 with a predicted value of 1 is a false positive (FP); a real value of 0 with a predicted value of 0 is a true negative (TN).
  • The evaluation parameters of the machine learning model include the following parameters:
  • TP (True Positive): the real category of the sample is a positive example, and the result predicted by the model is also a positive example.
  • TN (True Negative): the real category of the sample is a negative example, and the result predicted by the model is also a negative example.
  • FP (False Positive): the real category of the sample is a negative example, but the result predicted by the model is a positive example.
  • FN (False Negative): the real category of the sample is a positive example, but the result predicted by the model is a negative example.
  • TPR (True Positive Rate), also known as sensitivity or recall: TPR = TP / (TP + FN).
  • TNR (True Negative Rate): TNR = TN / (TN + FP).
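As a minimal sketch, the parameters of Table 1 can be computed from lists of real and predicted values; the function name and the toy data below are illustrative, not from the application.

```python
def confusion_metrics(y_true, y_pred):
    """Compute the Table 1 evaluation parameters (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tpr = tp / (tp + fn)  # true positive rate = sensitivity = recall
    tnr = tn / (tn + fp)  # true negative rate
    return tp, tn, fp, fn, tpr, tnr

# Toy usage: real categories vs. model predictions for five samples.
metrics = confusion_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```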
  • Model and data set bias is a widespread problem with a huge negative impact in machine learning, especially deep learning; it is difficult to detect and easily ignored. Especially in scenarios that require model safety, models trained with biased data sets may cause serious accidents in actual use.
  • For example, when there is a pacemaker in a medical image, the machine learning model will with high probability infer that the patient has heart disease; when there is a patient intubation in the medical image, the model will with high probability infer that the patient has a respiratory disease.
  • doctors often manually mark some unique identifiers on the images. These identifiers may be related to hospitals, doctors or diseases, so that the model can infer whether the medical image is a heart disease image by learning these identifiers. In this way, the model after training is likely to infer that the patient has heart disease through the unique identifiers related to heart disease that are manually marked.
  • The unique identifiers related to heart disease in medical images can be image features of a pacemaker, text features related to heart disease, labels of specific detection equipment, and so on. If there are no unique identifiers related to heart disease in the medical image of a heart disease patient, such as the image features of a pacemaker, the model may then be unable to identify the medical image as a heart disease image from the image features of the human tissue, resulting in an incorrect recognition result.
  • Some traditional bias assessment methods artificially select some verification factors that may lead to bias, and then verify each factor one by one to determine which factors cause model bias. Specifically, such a method splits the verification data set into multiple subsets according to a verification factor, and then computes the differences in the machine learning model's inference results on each subset, for example using model evaluation parameters such as the accuracy or recall of a classification model to characterize the differences. Whether the factor currently being verified causes model bias can then be determined by judging whether the difference in the inference results is significant.
  • Model bias means that the model is biased against the image data in a subset of the verification data set split by the factor to be verified. For example, take a machine learning model performing binary classification: the verification data set is divided into two subsets based on the verification factor, and the model performs inference on the images in both subsets. If statistics show that the model's accuracy in inferring positive samples is 90% on the first subset but only 10% on the other subset, the accuracy difference between the two is significant, indicating that the factor to be verified causes model bias.
  • However, the proportion of positive samples may be inconsistent across different subsets of the verification data set. Therefore, the image data in the verification data set needs real labels, which are combined with the inference results to determine the model's differences on each subset. Moreover, certain categories of samples may be difficult to collect, such as X-ray images of normal people with pacemakers. The above method therefore places higher requirements on the verification data set.
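The traditional per-subset comparison just described can be sketched as follows, assuming real labels are available. All names and the toy data are illustrative; a significant accuracy gap between subsets suggests the splitting factor causes model bias.

```python
def subset_accuracy_gap(subsets, labels, model):
    """Accuracy of the model on each subset (requires real labels) and the gap.
    `subsets` maps subset name -> list of images; `labels` maps image -> real label."""
    acc = {}
    for name, images in subsets.items():
        correct = sum(1 for img in images if model(img) == labels[img])
        acc[name] = correct / len(images)
    return acc, max(acc.values()) - min(acc.values())

# Toy example mirroring the 90% vs 10% case above: all samples are truly
# positive, but the model only gets the "with_factor" subset right.
subsets = {"with_factor": ["a", "b"], "without_factor": ["c", "d"]}
labels = {"a": 1, "b": 1, "c": 1, "d": 1}
model = lambda img: 1 if img in ("a", "b") else 0  # biased toy model
acc, gap = subset_accuracy_gap(subsets, labels, model)
```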
  • In view of this, the embodiment of the present application proposes a bias assessment method based on image style conversion, which determines at least one factor to be verified, divides the verification data set into subsets of different categories (or classification subsets) according to these factors to be verified, and then style-converts the images in each category's subset into the styles corresponding to the other categories' subsets.
  • the style of the image can include the texture, shape, color, structure, etc. of the image.
  • The machine learning model to be evaluated (hereinafter referred to as the model to be evaluated) performs inference on the original image and on the style-converted version of the same image to obtain inference results. The differing inference results of the model to be evaluated for the same image are then evaluated to obtain the bias evaluation results of the model for at least one factor to be verified, such as the images against which the model is biased and a score for the degree of bias, which are provided to the user.
  • For example, factors such as the image features of a pacemaker, text features related to heart disease, and labels of specific detection equipment can be considered as possible causes of bias in the model to be evaluated. Taking the image feature of a cardiac pacemaker as the factor to be verified as an example, by performing style conversion on medical images we can obtain medical images of heart disease patients that do not contain the image features of a pacemaker, and medical images of normal people that do contain the image features of a pacemaker.
  • the verification factors involved in the embodiments of this application mainly refer to image features in the image, such as local image features, such as image features of a pacemaker.
  • In this way, the bias assessment method provided by this application does not require collecting real labels for sample images, and it overcomes the practical difficulty of obtaining certain categories of images. It places lower requirements on the collection of sample images for the verification data set, which can reduce the time users spend on sample collection.
  • Figure 1 is a schematic diagram of an application scenario of bias assessment provided by this application. As shown in Figure 1, it is assumed that the verification data set is divided into subset 1 of category 1 and subset 2 of category 2 according to the verification factors. The style of category 1 is recorded as style 1, and the style of category 2 is recorded as style 2.
  • As shown in Figure 1, the inference result of the model to be evaluated 10 for image 1 is positive, and its inference result for image 1′ is negative. That is, the inference results of the model to be evaluated 10 for the different styles of image 1 differ, indicating that the model to be evaluated 10 is biased against image 1, or against the images in the subset to which image 1 belongs. It is understandable that "different inference results" for two images may mean that the inference results of the two images differ considerably.
  • For example, the images in the above verification data set are medical images, where the positive samples are heart disease X-ray images and the negative samples are X-ray images of normal people.
  • In the example shown in FIG. 1, the positive inference result for image 1 means that image 1 is inferred to be a heart disease X-ray image, and the negative inference result for image 1′ means that image 1′ is inferred to be a normal X-ray image.
  • In this case, factors such as the image features of a pacemaker, text features related to heart disease, and labels of specific detection equipment can be used as factors to be verified that may cause bias in the model to be evaluated; that is, these factors may affect the judgment of medical images related to heart disease.
  • style 1 can be an image with the image features of a pacemaker
  • style 2 can be an image without a pacemaker.
  • The style conversion between style 1 and style 2 is to add or remove the image features of the pacemaker in the medical image.
  • the images in the above verification data set are medical images
  • the positive samples in the above verification data set are cell images of cervical cancer patients, and the negative samples can be cell images of normal people.
  • In the example shown in FIG. 1, the positive inference result for image 1 means that image 1 is inferred to be an image of cervical cancer cells, and the negative inference result for image 1′ means that image 1′ is inferred to be an image of normal cells.
  • In this case, factors such as the image features of cell atrophy and the model of the image acquisition device can be used as factors to be verified that may cause bias in the model to be evaluated; that is, these factors may affect the judgment of medical images related to cervical cancer.
  • style 1 can be an image with image features of cell atrophy
  • style 2 can be an image without the image features of cell atrophy, and the style conversion between style 1 and style 2 is to add or remove the image features of cell atrophy in medical images.
  • The execution subject of the bias assessment method may be an electronic device, or a central processing unit (CPU) of the electronic device, or a control module or apparatus in the electronic device used to perform the bias assessment.
  • Electronic devices suitable for this application may include, but are not limited to: mobile phones, tablet computers, camcorders, cameras, desktop computers, laptop computers, handheld computers, notebook computers, ultra-mobile personal computers (UMPC), netbooks, cellular phones, personal digital assistants (PDA), augmented reality (AR) or virtual reality (VR) devices, media players, smart TVs, smart speakers, smart watches, etc.
  • the execution subject of the bias assessment method provided by the embodiments of this application may also be a server.
  • The above server can be a cloud server, which can be a hardware server or can be embedded in a virtualized environment; for example, the server can be a virtual machine executed on a hardware server that includes one or more other virtual machines. The above server can also be an independent server, which owns all the software and hardware resources of the entire server and can allocate and implement multiple services by itself, such as executing the bias assessment method in this application.
  • an electronic device is mainly used as the execution subject to illustrate the bias assessment method provided by the embodiments of the present application.
  • FIG. 2 is a block diagram of a software system architecture in an electronic device to which the bias assessment method provided by the embodiment of the present application is applied.
  • the framework includes a model to be evaluated 10 and an evaluation device 20 .
  • The evaluation apparatus 20 takes the model 10 to be evaluated and the verification data set as input, and outputs the consistency score of each image in the verification data set as well as the images whose style has the greatest impact on the consistency score.
  • Specifically, the model 10 to be evaluated can perform inference on the original image of each image in each subset and on the versions of that image converted into the styles corresponding to the other subsets, and a consistency score is then obtained from these inference results. The consistency score is used to measure the degree to which different styles influence the inference results for an image, i.e. the difference between the inference results for the different styles of that image. The images on which style has the greatest impact on the consistency score are those whose inference results differ most between styles.
  • In some embodiments, the consistency score is a score for a subset of the verification data set, specifically derived from the differences between the inference results of the model 10 to be evaluated for the original images in that subset and its inference results for those images converted into other styles; the images whose style has the greatest impact on the consistency score are the images in the subset that has the highest proportion, among the total sample images in the verification data set, of images with significantly different inference results. In other embodiments, the consistency score is a score for an individual image in the verification data set, specifically based on whether the inference result of the model 10 to be evaluated for the original image differs greatly from its inference results for the versions of that image converted into other styles.
  • The image data in the verification data set in this application may also be called images or samples; all of these terms represent images.
  • the consistency score in this application can also be called bias score, score, etc.
  • the groups in the verification data set can also be called classifications, subsets, sets, etc.
  • the images in the validation data set can also be called evaluation images.
• the model 10 to be evaluated can be a machine learning model trained on the verification data set or other data sets, which can subsequently perform inference on the image data in the verification data set to obtain inference results.
  • the model 10 to be evaluated may be a classification model 10a, a detection model 10b or a segmentation model 10c, etc.
  • the classification model 10a can respectively output a result label for images of different styles of the same image sample.
• the result label determined by the classification model 10a is an animal label such as cat or dog.
• the evaluation device 20 can determine the image samples whose result labels are inconsistent among the different-style images corresponding to the same image in each subset, regard these image samples as differentiated samples, and then calculate the proportion of differentiated samples to the total image samples in the verification data set to obtain the consistency score corresponding to each subset.
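As an illustrative sketch only (function and data-structure names are hypothetical, not from the patent), the classification-model consistency score described above could be computed like this in Python:

```python
def classification_consistency(results_by_image, total_samples):
    """results_by_image: {image_id: [label_for_style_1, label_for_style_2, ...]},
    i.e. the result labels the classification model outputs for each style of
    the same image. Returns (differentiated_sample_ids, consistency_score),
    where the score is the proportion of differentiated samples in the
    total sample images of the verification data set."""
    differentiated = [img for img, labels in results_by_image.items()
                      if len(set(labels)) > 1]  # labels disagree across styles
    return differentiated, len(differentiated) / total_samples
```

For example, if the model labels the original image "cat" but the style-converted image "dog", that sample counts as differentiated.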
  • the detection model 10b can respectively output rectangular frames of the objects to be detected for images of different styles of the same image sample.
  • the detection model 10b is used to determine the rectangular frame in which objects such as cars, animals, and people are located in the image. At this time, the rectangular frame belongs to the object category such as cars, animals, and people.
• the evaluation device 20 can regard as differentiated samples those image samples for which, among the different-style images corresponding to the same image in each subset, the intersection-over-union (IoU) of the rectangular frames of the same object category is lower than a set IoU threshold, or the object categories of the rectangular frames are inconsistent, and then calculate the proportion of differentiated samples to the total image samples in the verification data set to obtain the consistency score corresponding to each subset.
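A minimal sketch of this detection-model criterion (box format and helper names are assumptions, not specified by the patent):

```python
def iou(box_a, box_b):
    """Boxes as (x1, y1, x2, y2). Returns intersection-over-union."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def is_differentiated(box_orig, box_conv, cls_orig, cls_conv, iou_threshold=0.5):
    # A sample counts as differentiated if the predicted classes disagree,
    # or the boxes for the same class overlap less than the IoU threshold.
    return cls_orig != cls_conv or iou(box_orig, box_conv) < iou_threshold
```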
  • the segmentation model 10c can segment different segmentation objects for images of different styles of the same image sample.
• the segmentation objects are objects such as street scenes, people, vehicles, and animals.
• the evaluation device 20 can determine the image samples whose Dice coefficients of segmentation objects, among the different-style images corresponding to the same image in each subset, are lower than a set Dice threshold, regard them as differentiated samples, and calculate the proportion of differentiated samples to the total image samples in the verification data set to obtain the consistency score corresponding to each subset.
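For the segmentation case, the Dice coefficient could be computed as follows (a toy sketch, with masks represented as pixel sets rather than arrays; names are hypothetical):

```python
def dice(mask_a, mask_b):
    """Dice coefficient between two segmentation masks, each given as a
    set of (row, col) pixel coordinates belonging to the object.
    1.0 means identical masks; below a set threshold, the sample
    would count as differentiated."""
    if not mask_a and not mask_b:
        return 1.0  # both empty: treat as perfectly consistent
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))
```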
• the evaluation device 20 in the system includes an image grouping module M1, an inter-class conversion training module M2, an inter-class conversion reasoning module M3, and a difference evaluation and visualization module M4.
  • the image grouping module M1 is used to group (or classify) the verification data set according to the factors to be verified to obtain multiple subsets of different categories, each category corresponding to a style.
• the verification data set can be divided, based on the factor to be verified, into subset 1 of the with-pacemaker class (denoted as class 1) and subset 2 of the without-pacemaker class (denoted as class 2).
• the image grouping module M1 can label the images in the subsets of different classes separately, and then provide each image together with its class label to the inter-class conversion training module M2. For example, the class label of subset 1 (the with-pacemaker class 1) is "has a pacemaker", and the class label of subset 2 (the without-pacemaker class 2) is "no pacemaker".
  • the above-mentioned factors to be verified may be artificially set factors.
• the image grouping module M1 can group the verification data set according to the categories corresponding to these artificially set factors to obtain different subsets; that is, the image grouping module M1 groups the verification data set into different subsets in response to user operations.
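The grouping step of module M1 can be sketched as follows (illustrative only; `has_factor` is a hypothetical user-supplied predicate for the factor to be verified, e.g. "has a pacemaker"):

```python
def group_by_factor(dataset, has_factor):
    """Split a verification data set into two labelled subsets according
    to one factor to be verified, mirroring module M1's grouping."""
    subset_1 = [img for img in dataset if has_factor(img)]      # class 1
    subset_2 = [img for img in dataset if not has_factor(img)]  # class 2
    return {"class 1": subset_1, "class 2": subset_2}
```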
  • the user can use known structured factors as factors to be verified.
  • artificially determined factors to be verified can be factors such as imaging machines and staining reagent models.
  • the image grouping module M1 can also automatically obtain one or more factors to be verified through the bias factor mining device, and group the verification data set according to these factors to be verified to obtain different subsets.
  • the above-mentioned bias factor mining device is a unit in the above-mentioned image grouping module M1.
  • the above-mentioned bias factor mining device is a device different from the evaluation device 20 in the electronic device. Then, the image grouping module M1 can analyze the verification data set and determine the factors to be verified that may cause bias in the evaluation process of the verification data set. Furthermore, the image grouping module M1 in the evaluation device 20 obtains the factors to be verified for the current verification data set from the bias factor mining device.
• the image features of the foreground of an image can serve as the objects to be evaluated by the model 10 to be evaluated, while the image features of the background are not the objects to be evaluated. If, during training, the model 10 to be evaluated learns too many image features of the background, it may become biased in its evaluation of the image features of the foreground.
• the background of the images in the verification data set includes information such as exclusive identifiers related to hospitals or diseases, and this information may affect the inference results of the model 10 to be evaluated on foreground image features such as those of the heart or of cervical cells; that is, this information can be used as factors to be verified.
  • the bias factor mining device can identify the foreground and background in the image, and determine the factors to be verified for bias assessment from the image features of the background image. For example, in the field of medical images, image features corresponding to information such as hospital or disease-related exclusive identifiers can be determined from the background of the image as factors to be verified.
  • the inter-class conversion training module M2 is used to train an image style conversion model through the images and category labels provided by the image grouping module M1, and obtain the weight parameters of the trained image style conversion model. Among them, the main function of the image style conversion model is to realize style conversion between images of different categories. Furthermore, the inter-class conversion training module M2 outputs the weight parameters of the image style conversion model to the inter-class conversion reasoning module M3.
• the above-mentioned image style conversion model can use Cycle Generative Adversarial Network (CycleGAN) technology to realize the conversion of images between different styles. For example, CycleGAN can achieve style conversion between horses and zebras, or between apples and oranges.
• the above image style conversion model uses CycleGAN technology to add or remove certain image features in the images for each factor to be verified, so as to test whether these factors affect the inference results of the model 10 to be evaluated.
• the image style conversion model can use CycleGAN technology to add the image features of a pacemaker to the X-ray image of a normal person, that is, convert an image of the without-pacemaker style (such as style 2) into an image of the with-pacemaker style (such as style 1), thereby achieving style conversion.
• the image style conversion model can use CycleGAN technology to remove the image features of the pacemaker from the X-ray images of heart disease patients, that is, convert an image of the with-pacemaker style (such as style 1) into an image of the without-pacemaker style (such as style 2), thereby achieving style conversion.
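CycleGAN trains two generators with adversarial and cycle-consistency losses; the toy Python sketch below (generators are stand-in functions, images are flat pixel lists, and all names are hypothetical) illustrates only the cycle-consistency term, which requires G_BA(G_AB(x)) ≈ x so that conversions between the two styles are mutually invertible:

```python
def cycle_consistency_loss(g_ab, g_ba, images_a, images_b):
    """Mean L1 cycle-consistency loss for two style generators:
    g_ab maps style A -> B, g_ba maps style B -> A. A perfect pair of
    inverse generators yields a loss of 0."""
    def l1(x, y):
        return sum(abs(p - q) for p, q in zip(x, y)) / len(x)
    loss = 0.0
    for x in images_a:
        loss += l1(g_ba(g_ab(x)), x)   # A -> B -> A should recover x
    for y in images_b:
        loss += l1(g_ab(g_ba(y)), y)   # B -> A -> B should recover y
    return loss / (len(images_a) + len(images_b))
```

In a real implementation the generators would be convolutional networks and this term would be combined with the adversarial losses.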
  • the style of the image may include local features and global features, such as the texture, shape, structure, and color difference of the image.
• for example, the pacemaker image features described above are local features, while the color of the entire image is a global feature.
• the inter-class conversion reasoning module M3 is used to convert the images in each category subset of the verification data set into the styles corresponding to the subsets of the other categories through the image style conversion model, obtain the style-converted images, and output the original images in the verification data set together with these style-converted images to the model 10 to be evaluated.
  • model to be evaluated 10 can perform inference on all the original images in the verification data set to obtain inference results, and can perform inference on the transformed images corresponding to the original images in each subset of the verification data set to obtain inference results.
  • the model to be evaluated 10 outputs all inference results to the difference evaluation and visualization module M4.
• the difference evaluation and visualization module M4 performs a difference judgment on the inference results of the different-style images corresponding to the same image, determines the differentiated samples, calculates consistency scores such as the proportion of differentiated samples to the total image samples in the verification data set, and visually outputs evaluation results such as the differentiated samples and consistency scores to the user.
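Putting the pieces together, the evaluation flow of module M4 for a two-class verification set might look like this sketch (all names and the 0.5 difference threshold are assumptions; `model` returns a positive-class confidence and `convert_style` maps an image into the style of the other subset):

```python
def evaluate_bias(model, convert_style, subsets, diff_threshold=0.5):
    """For each subset, run the model on every original image and on its
    style-converted counterpart; flag samples whose confidences differ by
    more than diff_threshold, and report the per-subset proportion of such
    differentiated samples relative to the whole verification data set."""
    total = sum(len(imgs) for imgs in subsets.values())
    report = {}
    for name, imgs in subsets.items():
        diff = [im for im in imgs
                if abs(model(im) - model(convert_style(im))) > diff_threshold]
        report[name] = {"differentiated": diff,
                        "consistency_score": len(diff) / total}
    return report
```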
  • the execution subject of the method may be an electronic device, and the method includes the following steps:
• the evaluation device 20 groups the verification data set according to at least one factor to be verified to obtain multiple subsets, and labels the images of the different subsets with different category labels.
  • the electronic device may group the verification data set through the image grouping module M1 in the evaluation device 20 shown in FIG. 4 .
  • the acquisition of at least one factor to be verified may refer to the relevant description of the image grouping module M1 above.
  • S502 For images in different subsets in the verification data set, use the evaluation device 20 to train an image style conversion model that performs style conversion between images in each subset.
• the electronic device can input the images of the different subsets in the verification data set and the category label of each image into the inter-class conversion training module M2 in the evaluation device 20 shown in Figure 4, train the image style conversion model, and thereby output the weight parameters of the model.
  • the evaluation device 20 uses the image style conversion model to convert the images into images of styles corresponding to the subsets of other categories.
• the electronic device can, through the inter-class conversion inference module M3 in the evaluation device 20 shown in FIG. 4, use the image style conversion model to convert the images in each subset of the verification data set into images in the styles corresponding to the subsets of the other categories.
  • the image 1 of style 1 can be converted into the image 1' of style 2 through the image style conversion model.
  • the evaluation device 20 can input the image 1 carrying the classification label 1 into the image conversion model, so that the evaluation device 20 performs style conversion on the image 1 and outputs the image 1' carrying the classification label 2, that is, the image 1' of the style 1 Image 1 is converted to image 1' of style 2.
• if the verification data set further contains another subset, the image style conversion model can also convert the style of image 1 into the style corresponding to the category of that other subset.
  • the evaluation device 20 can input the image 1 carrying the classification label 1 and the classification label 2 into the image conversion model, so that the evaluation device 20 performs style conversion on the image 1 and outputs the image 1' carrying the classification label 2 , that is, convert image 1 of style 1 into image 1' of style 2.
• similarly, the image style conversion model can also convert the style of image 1 into the styles corresponding to the categories of the other subsets.
  • S504 Use the model 10 to be evaluated to perform inference on the original images in the verification data set and the images after the style conversion of the original images.
  • the model 10 to be evaluated can perform inference on image 1 and image 1' respectively to obtain respective inference results.
• for example, the inference result of the model 10 to be evaluated for image 1 is positive, that is, a heart disease X-ray image, while the inference result for image 1' is negative, that is, a normal person's X-ray image.
• S505 Compare all the inference results through the evaluation device 20, output the original images in each subset whose inference results differ greatly after style transformation, and calculate the proportion of the images with large differences in each subset in the verification data set.
• the electronic device can use the difference evaluation and visualization module M4 in the evaluation device 20 to determine the images in each subset of the verification data set whose inference results differ greatly after style transformation of the original image, and then calculate the proportion of these images with large differences in the verification data set.
• the evaluation device 20 can determine that image 1 is an image whose inference result differs greatly after style transformation. Similarly, the evaluation device 20 can determine the images whose inference results differ greatly after style transformation among the other original images in the verification data set. Furthermore, the proportion of the images with large differences in each subset in the verification data set is calculated.
  • the parameters for evaluating the degree of bias calculated by the electronic device through the evaluation device 20 are not limited to the proportion of the above-mentioned images with large differences in the verification data set.
• parameters such as the total number of images with large differences, or the proportion of images with large differences in each subset relative to the total sample images in that subset, can also be calculated; this is not specifically limited.
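These alternative bias-degree parameters can be sketched as follows (illustrative names; per-subset differentiated counts and subset sizes are assumed to be available from the comparison step):

```python
def bias_metrics(diff_per_subset, subset_sizes):
    """Given {subset: number of differentiated images} and
    {subset: total images in that subset}, return the total count of
    differentiated images and the per-subset proportion relative to
    that subset's own size."""
    total_diff = sum(diff_per_subset.values())
    per_subset = {name: diff_per_subset[name] / subset_sizes[name]
                  for name in diff_per_subset}
    return total_diff, per_subset
```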
• the electronic device can use the difference evaluation and visualization module M4 in the evaluation device 20 to display on its screen bias assessment result information such as the images with large differences together with their style-converted counterparts that produced different inference results, the proportion of such images in the verification data set, the conclusion about which data the model is biased on, and the factors causing the bias.
• the electronic device can perform style conversion on part of the original images, perform inference on these original images and the style-converted images to obtain inference results, and then compare these inference results to obtain the bias assessment results.
• the bias assessment information obtained by the electronic device for the model 10 to be evaluated includes: image 1 and the confidence obtained by its inference, image 1' and the confidence obtained by its inference, and bias assessment result information such as the conclusion "the model is biased on images of category 1" and "the bias factor is: factor 1".
• the style of image 1 displayed on the screen of the electronic device in FIG. 6 is style 1, corresponding to subset 1 of category 1.
  • the style 1 is an X-ray image of a person with a pacemaker.
• the style of image 1' is style 2, corresponding to subset 2 of category 2, where style 2 is an X-ray image of a person without a pacemaker.
  • the inference result of model 10 to be evaluated on image 1 of category 1 is a heart disease X-ray image with a confidence level of 0.99, and the inference result is considered to be a heart disease patient;
• the inference result of the model 10 to be evaluated for image 1' of category 2 is that the confidence of being a heart disease X-ray image is 0.01, and the inference result is considered to be the X-ray image of a normal person.
• the conclusion in the bias evaluation result shown in FIG. 6 may be that the model 10 to be evaluated is biased with respect to the factor to be verified, namely whether the X-ray image shows a person with a pacemaker.
• the bias assessment method does not require collecting the real labels of the sample images and at the same time overcomes the difficulty of obtaining images of certain categories; it therefore has lower requirements on the collection of sample images in the verification data set and can reduce the user's time spent on sample collection.
  • the bias assessment results are more intuitive for users. Users can directly observe the impact of bias factors on the results, which is helpful to improve users' ability to analyze and understand the bias of the model.
  • this method can not only obtain the overall degree of bias of the verification data set, but also analyze which image data the model to be evaluated is biased, which is beneficial to users in analyzing the model.
  • the above-mentioned evaluation device 20 can be an application or software or system installed in an electronic device.
• the software can provide a human-computer interaction interface and support the user in importing the verification data set, the model 10 to be evaluated, model information, etc., thereby outputting bias assessment result information on the screen according to the bias assessment method above.
• the method shown in Figure 7 includes the following steps:
  • S701 Receive a pathology data set and a cell classification model uploaded by the user to the bias assessment system.
  • the pathology data set is a data set to be verified, and the cell classification model is a model to be evaluated 10.
  • the images in the above-mentioned pathology data set are medical images, where the positive samples are cell images of cervical cancer patients, and the negative samples can be cell images of normal people.
• a positive medical image refers to an image of cervical cancer cells, and a negative medical image refers to an image whose inference result is normal cells.
• FIG. 8A is a schematic diagram of uploading data to the bias assessment system displayed on an electronic device.
  • the main interface of the bias assessment system shown in Figure 8A includes a data set selection control 81, a model to be evaluated selection control 82, and a factor selection control 83.
  • the electronic device can display multiple optional data set controls including the open control 811 and data sets 812 and 813 shown in FIG. 8B . Furthermore, the user can click on any data set control to trigger the uploading of the data of the data set in the bias assessment system, such as selecting the control of data set 812, which represents the above-mentioned pathology data set. In addition, the user can click the data set opening control 811 to link and select the data set in any storage address in the electronic device.
  • the user can click the model selection control 82 to be evaluated shown in FIG. 8A to control the electronic device to select the model to be evaluated, for example, the model to be evaluated is the above-mentioned cell classification model.
  • S702 Receive user-input grouped factors to be verified, and group the pathology data set according to the factors to be verified to obtain subset 1 composed of images of category 1 and subset 2 composed of images of category 2.
  • the factor to be verified input by the user refers to the image feature represented by the factor to be verified input by the user, which may be data of the image feature, or identification information indicating the image feature.
• for example, the factor to be verified input for the cell classification model may be the image features of atrophic cells, or the text label "atrophic cells", etc.
  • the user can click the factor selection control 83 shown in FIG. 8A to set the grouping factors for the current pathology data set in the bias assessment system, which will not be described in detail.
• the cells of the elderly generally have a higher probability of atrophy, while the cells of young women (or normal women) do not atrophy.
• atrophy is not directly related to the disease, so atrophy can be used as a candidate factor to be verified for bias assessment.
• the pathology data set can be grouped into subset 1 consisting of images with the classification label atrophic (category 1), and subset 2 consisting of images with the classification label non-atrophic (category 2).
  • the images in the verification data set can be called multiple evaluation images, the above-mentioned subset 1 can also be called the first evaluation image set, and the subset 2 can be called the second evaluation image set.
• S703 Use CycleGAN technology to train the image style conversion model corresponding to the pathology data set, and use the image style conversion model to separately perform style transformation on the images in subset 1 of category 1 (atrophic) and the images in subset 2 of category 2 (non-atrophic) in the pathology data set.
• the style of these images A1 can be converted from the style corresponding to category 1 to the style corresponding to category 2 to obtain the converted images B1; that is, the images A1 are converted from category 1 to category 2. Specifically, style conversion is performed on the images with atrophic cells to eliminate the atrophy-related image features in images A1 to obtain images B1.
• the style of these images A2 can be converted from the style corresponding to category 2 to the style corresponding to category 1 to obtain the converted images B2; that is, the images A2 are converted from category 2 to category 1. Specifically, style conversion is performed on the images of non-atrophic cells to add atrophy-related image features to images A2 to obtain images B2.
  • S704 Use the cell classification model to perform inference on the original image of the pathology data set and the image after the style conversion of the original image to obtain the inference result.
  • the set 3 consisting of images in subset 1 converted into images of style 2 can be called the third evaluation image set
• the set 4 consisting of the images in subset 2 converted into images of style 1 can be called the fourth evaluation image set.
• the cell classification model performs inference on image A1 with the classification label of category 1 and obtains a positive inference result with a confidence of 0.99; it performs inference on image B1, converted into the style corresponding to category 2, and the confidence of a positive result is 0.01, that is, the inference result is negative.
• the cell classification model performs inference on image A2 with the classification label of category 2, and the confidence that the result is positive is 0.01, that is, the result is negative; it performs inference on image B2, converted into the style corresponding to category 1, and the confidence that the result is positive is 0.99, that is, the result is positive.
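The comparison underlying this example can be sketched minimally as follows (the 0.5 decision threshold and data layout are assumptions, not specified by the patent):

```python
def flips(pairs, threshold=0.5):
    """pairs: {image_id: (conf_original, conf_converted)} of positive-class
    confidences before and after style conversion. An inference 'flips'
    when the two confidences fall on opposite sides of the decision
    threshold; such samples indicate sensitivity to the converted factor."""
    return [img for img, (orig, conv) in pairs.items()
            if (orig >= threshold) != (conv >= threshold)]
```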
• the bias assessment results include the proportion of the images in each subset of the verification data set whose results differ greatly, relative to the total sample images in the verification data set.
• the inference results for the original images in subset 1 and the images in subset 3, that is, the inference results for the first evaluation image set and the third evaluation image set, can be called the first inference results.
• the inference results for the original images in subset 2 and the images in subset 4, that is, the inference results for the second evaluation image set and the fourth evaluation image set, can be called the second inference results. Therefore, the first inference results and the second inference results can be compared to obtain the samples with large differences, and then the bias assessment results can be obtained.
• the inference results of the cell classification model for the different-style images of image A1 of category 1 differ greatly, and the inference results for the different-style images of image A2 of category 2 also differ greatly, indicating that the cell classification model is biased with respect to atrophy.
• FIG. 9 shows a schematic diagram of a bias assessment result displayed by the bias assessment system on an electronic device.
  • the interface shown in Figure 9 includes images A1, B1, A2, and B2, as well as the confidence levels corresponding to the images A1, B1, A2, and B2 respectively.
  • the bias assessment system only displays some images and their confidence levels at the same time.
• by operating the "more" control 91 shown in Figure 9, the user can trigger the bias assessment system to update and display other images with large differences and their confidence levels. In this way, users can intuitively understand the bias assessment result of the current cell classification model with respect to atrophy.
  • Figure 10 shows a schematic structural diagram of an electronic device.
  • the electronic device 100 shown in FIG. 10 may include a processor 110, a power module 140, a memory 150, a mobile communication module 130, a wireless communication module 120, a display screen 160, an interface unit 170, and the like.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in the figures, or some components may be combined, some components may be separated, or some components may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
• the processor 110 may include one or more processing units, for example processing modules or processing circuits such as a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a microcontroller unit (MCU), an artificial intelligence (AI) processor, or a field-programmable gate array (FPGA).
  • processing units can be independent devices or integrated in one or more processors.
  • the processor 110 can be used to run the above-mentioned model 10 to be evaluated and the evaluation device 20 to execute the bias evaluation method provided by this application.
• the memory 150 can be used to store data, software programs, and modules, and can be a volatile memory such as random-access memory (RAM); a non-volatile memory such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); a combination of the above types of memory; or a removable storage medium such as a Secure Digital (SD) memory card.
• the memory 150 may include a program storage area (not shown) and a data storage area (not shown). Program code may be stored in the program storage area, and the program code is used to cause the processor 110 to execute the bias assessment method provided by the embodiments of this application.
  • the mobile communication module 130 may include, but is not limited to, an antenna, a power amplifier, a filter, a low noise amplifier (Low Noise Amplify, LNA), etc.
• the mobile communication module 130 can provide wireless communication solutions including 2G/3G/4G/5G applied on the electronic device 100.
  • the mobile communication module 130 can receive electromagnetic waves through an antenna, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 130 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna for radiation.
  • at least part of the functional modules of the mobile communication module 130 may be disposed in the processor 110 .
  • at least part of the functional modules of the mobile communication module 130 may be provided in the same device as at least part of the modules of the processor 110 .
  • the wireless communication module 120 may include an antenna, and implements the transmission and reception of electromagnetic waves via the antenna.
• the wireless communication module 120 can provide wireless communication solutions applied on the electronic device 100, including Wireless Local Area Network (WLAN) (such as Wireless Fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and other wireless communications.
  • the display screen 160 can be used to display relevant interfaces of the bias evaluation system mentioned above, support the user in selecting the data set to be verified and the model to be evaluated, and support the user in viewing the bias evaluation results of the model.
  • the interface unit 170 is configured to receive user input, such as the user inputting the verification data set and the input of the model to be evaluated on the interface of the bias assessment system displayed on the display screen 160 .
  • the power supply 140 is used to power the display screen 160, the processor 110 and other units in the electronic device 100.
  • the mobile communication module 130 and the wireless communication module 120 of the electronic device 100 may also be located in the same module.
  • the hardware structure shown in FIG. 10 does not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown in FIG. 10 , or combine some components, or split some components, or arrange different components.
  • Embodiments of the mechanisms disclosed in this application may be implemented in hardware, software, firmware, or a combination of these implementation methods.
  • Embodiments of the present application may be implemented as a computer program or program code executing on a programmable system including at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements) , at least one input device and at least one output device.
  • Program code may be applied to input instructions to perform the functions described herein and to generate output information.
  • Output information can be applied to one or more output devices in a known manner.
  • a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.
  • DSP digital signal processor
  • ASIC application specific integrated circuit
  • Program code may be implemented in a high-level procedural language or an object-oriented programming language to communicate with the processing system.
  • assembly language or machine language can also be used to implement program code.
  • the mechanisms described in this application are not limited to the scope of any particular programming language. In either case, the language may be a compiled or interpreted language.
  • Embodiments of the mechanisms disclosed in this application may be implemented in hardware, software, firmware, or a combination of these implementation methods.
  • Embodiments of the present application may be implemented as a computer program or program code executing on a programmable system including at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements) , at least one input device and at least one output device.
  • Program code may be applied to input instructions to perform the functions described herein and to generate output information.
  • Output information can be applied to one or more output devices in a known manner.
  • a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application specific integrated circuit (ASIC), or a microprocessor.
  • DSP digital signal processor
  • ASIC application specific integrated circuit
  • Program code may be implemented in a high-level procedural language or an object-oriented programming language to communicate with the processing system.
  • assembly language or machine language can also be used to implement program code.
  • the mechanisms described in this application are not limited to the scope of any particular programming language. In either case, the language may be a compiled or interpreted language.
  • the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof.
  • the disclosed embodiments may also be implemented as instructions carried on or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be operated by one or more processors Read and execute.
  • instructions may be distributed over a network or through other computer-readable media.
  • machine-readable media may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy disks, optical disks, optical disks, read-only memories (CD-ROMs), magnetic Optical disk, read-only memory (ROM), random-access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical card, flash memory, or Tangible machine-readable storage used to transmit information (e.g., carrier waves, infrared signals, digital signals, etc.) using electrical, optical, acoustic, or other forms of propagated signals over the Internet.
  • machine-readable media includes any type of machine-readable media suitable for storing or transmitting electronic instructions or information in a form readable by a machine (eg, computer).
  • each unit/module mentioned in each device embodiment of this application is a logical unit/module.
  • a logical unit/module can be a physical unit/module, or it can be a physical unit/module.
  • Part of the module can also be implemented as a combination of multiple physical units/modules.
  • the physical implementation of these logical units/modules is not the most important.
  • the combination of functions implemented by these logical units/modules is what solves the problem of this application. Key technical issues raised.
  • the above-mentioned equipment embodiments of this application do not introduce units/modules that are not closely related to solving the technical problems raised by this application. This does not mean that the above-mentioned equipment embodiments do not exist. Other units/modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

A bias evaluation method and apparatus, a medium, a program product, and an electronic device, applied to the field of artificial intelligence, capable of lowering the collection requirements on a verification dataset, obtaining a relatively comprehensive bias evaluation result, and presenting it intuitively to the user. The method comprises: obtaining a factor to be verified; classifying a plurality of evaluation images according to the factor to be verified to obtain a first target evaluation image set, the first target evaluation image set comprising a first evaluation image set containing the factor to be verified and/or a second evaluation image set not containing the factor to be verified; performing style transfer on the first target evaluation image set according to the factor to be verified to obtain a second target evaluation image set; inputting the first target evaluation image set and the second target evaluation image set into the model under evaluation for inference, and outputting a bias evaluation result for the model under evaluation according to the inference results; the style transfer is realized by removing or adding the factor to be verified.

Description

Bias evaluation method, apparatus, medium, program product, and electronic device

This application claims priority to Chinese patent application No. 202210281564.1, entitled "Bias evaluation method, apparatus, medium, program product, and electronic device" and filed with the China National Intellectual Property Administration on March 21, 2022, the entire contents of which are incorporated herein by reference.

Technical Field

This application relates to the field of artificial intelligence, and in particular to a bias evaluation method, apparatus, medium, program product, and electronic device.

Background

Image dataset bias refers to "spurious features" present in an image dataset, i.e., features in the image data that are not intended to be learned by a machine learning model when the data is used to train that model.

In some scenarios, some images in an image dataset carry information such as device labels (e.g., the model of the image capture device), image capture parameters, and human-made markings. Such information may become spurious features for the model to learn, so the model may fail to learn the target task objectively and faithfully as its designer intended. The trained model may then be unable to perform the target task as expected in its actual operating environment: the model is biased, and its recognition results may go wrong on a large scale. For example, consider a medical image recognition model whose target task is to identify images of severely ill patients. If the training dataset contains labels of intensive-care equipment, the model will learn the features of those labels to infer whether a medical image shows severe illness, instead of learning image features of the lesion tissue. If an X-ray image of a patient with severe heart disease, but without the intensive-care equipment label, is then fed to the trained model, the model may well fail to infer that the patient is severely ill, producing a seriously wrong recognition result. A convenient method for evaluating the bias of trained machine learning models is therefore urgently needed so that they can be applied more reliably.
Summary

In view of this, embodiments of this application provide a bias evaluation method, apparatus, medium, program product, and electronic device, which can lower the collection requirements on the verification dataset, obtain a relatively comprehensive bias evaluation result for the model under evaluation, and present it intuitively to the user.

In a first aspect, embodiments of this application provide a bias evaluation method applied to an electronic device. The method includes: obtaining a factor to be verified that is present in multiple evaluation images used for bias evaluation of a model under evaluation; classifying the multiple evaluation images according to the factor to be verified to obtain a first target evaluation image set, the first target evaluation image set including a first evaluation image set containing the factor to be verified and/or a second evaluation image set not containing the factor to be verified; performing style transfer on the first target evaluation image set according to the factor to be verified to obtain a second target evaluation image set; inputting the first target evaluation image set and the second target evaluation image set into the model under evaluation for inference to obtain target inference results; and outputting, according to the target inference results, a bias evaluation result for the model under evaluation, the bias evaluation result indicating whether the factor to be verified causes the model under evaluation to be biased. Style transfer of the first evaluation image set is realized by removing the factor to be verified, and style transfer of the second evaluation image set is realized by adding the factor to be verified. It can be understood that the multiple evaluation images constitute the verification dataset described below, and the first and second evaluation image sets may be different subsets into which the verification dataset is partitioned. The factor to be verified may be an image feature, for example the image feature of a cardiac pacemaker: before style transfer the image may be an X-ray image of a heart-disease patient containing pacemaker image features, and after style transfer it is the same X-ray image with the pacemaker image removed. In this way, the bias evaluation method provided by this application neither requires collecting ground-truth labels for sample images nor suffers from the practical difficulty that images of certain categories cannot be obtained; the collection requirements on the sample images of the verification dataset are low, reducing the time the user spends on sample collection.
In a possible implementation of the first aspect, the bias evaluation result includes at least one of the following: information on whether the factor to be verified is a factor causing the model under evaluation to be biased; the difference images in the first target evaluation image set, a difference image being an evaluation image whose inference result differs from the inference result of at least one corresponding style-transferred image in the second target evaluation image set, both inference results being output by the model under evaluation; the transferred images in the second target evaluation image set obtained by style-transferring each difference image; the inference result of the model under evaluation for each difference image; the inference result of the model under evaluation for each transferred image; and the proportion of the difference images of the first target evaluation image set among the multiple evaluation images. In this way, a relatively comprehensive bias evaluation of the model, based on the multiple evaluation images, can be presented to the user.

It can be understood that the difference images are the difference samples, or images with substantially differing results, described below. For example, a difference image in the first target evaluation image set may be an X-ray image of a heart-disease patient containing pacemaker image features, whose corresponding style-transferred image in the second target evaluation image set is the same X-ray image with the pacemaker image removed (i.e., the corresponding transferred image). In addition, two images having "different" inference results may mean results that differ or differ substantially: when the model under evaluation is a classification model this means the two images are classified differently, and when it is a detection model it means the Intersection-over-Union (IoU) of the rectangular boxes in the two images is below a set IoU threshold.

In a possible implementation of the first aspect, the factor to be verified is determined based on the background and foreground of each original image in the verification dataset, and the factor to be verified corresponds to an image feature in the background.

In a possible implementation of the first aspect, image style transfer is realized by an image style transfer model trained from the first evaluation image set and the second evaluation image set; the image style transfer model removes the verification factor from images containing it and adds the verification factor to images not containing it, the verification factor being an image feature.

In a possible implementation of the first aspect, the first evaluation image set corresponds to a first class label and the second evaluation image set corresponds to a second class label different from the first; the image style transfer model is trained from the images of the first evaluation image set with the first class label and the images of the second evaluation image set with the second class label.

In a possible implementation of the first aspect, the method further includes: receiving the verification dataset and the model under evaluation input by a user.

In a possible implementation of the first aspect, the method further includes: receiving the factor to be verified input by the user, the factor to be verified being an image feature or an identifier indicating an image feature.
In a second aspect, embodiments of this application provide a bias evaluation apparatus, including: an obtaining module, configured to obtain a factor to be verified that is present in multiple evaluation images used for bias evaluation of a model under evaluation; a classification module, configured to classify the multiple evaluation images according to the factor obtained by the obtaining module to obtain a first target evaluation image set, the first target evaluation image set including a first evaluation image set containing the factor to be verified and/or a second evaluation image set not containing it; a transfer module, configured to perform style transfer on the first target evaluation image set obtained by the classification module according to the factor to be verified to obtain a second target evaluation image set; an inference module, configured to input the first target evaluation image set obtained by the classification module and the second target evaluation image set obtained by the transfer module into the model under evaluation for inference to obtain target inference results; and an output module, configured to output, according to the target inference results obtained by the inference module, a bias evaluation result for the model under evaluation, the bias evaluation result indicating whether the factor to be verified causes the model under evaluation to be biased. Style transfer of the first evaluation image set is realized by removing the factor to be verified, and style transfer of the second evaluation image set is realized by adding the factor to be verified.

In a possible implementation of the second aspect, the bias evaluation result includes at least one of the following: information on whether the factor to be verified is a factor causing the model under evaluation to be biased; the difference images in the first target evaluation image set, a difference image being an evaluation image whose inference result differs from that of at least one corresponding style-transferred image in the second target evaluation image set, both inference results being output by the model under evaluation; the transferred images in the second target evaluation image set obtained by style-transferring each difference image; the model's inference result for each difference image; the model's inference result for each transferred image; and the proportion of the difference images of the first target evaluation image set among the multiple evaluation images.

In a possible implementation of the second aspect, the factor to be verified is determined based on the background and foreground of each original image in the verification dataset, and the factor to be verified corresponds to an image feature in the background.

In a possible implementation of the second aspect, image style transfer is realized by an image style transfer model trained from the first evaluation image set and the second evaluation image set;

moreover, the image style transfer model removes the verification factor from images containing it and adds the verification factor to images not containing it, the verification factor being an image feature.

In a possible implementation of the second aspect, the first evaluation image set corresponds to a first class label and the second evaluation image set corresponds to a second class label different from the first; the image style transfer model is trained from the images of the first evaluation image set with the first class label and the images of the second evaluation image set with the second class label.

In a possible implementation of the second aspect, the apparatus further includes: an input module, configured to receive the verification dataset and the model under evaluation input by a user.

In a possible implementation of the second aspect, the input module is further configured to receive the factor to be verified input by the user, the factor to be verified being an image feature or an identifier indicating an image feature.

For example, the bias evaluation apparatus above may be provided in an electronic device; the obtaining module, classification module, transfer module, and output module may be implemented by a processor of the electronic device, and the input module may be implemented by an interface unit of the electronic device.
In a third aspect, embodiments of this application provide a computer-readable storage medium storing instructions that, when executed on an electronic device, cause the electronic device to perform the bias evaluation method of the first aspect.

In a fourth aspect, embodiments of this application provide a computer program product including instructions for implementing the bias evaluation method of the first aspect.

In a fifth aspect, embodiments of this application provide an electronic device, including:

a memory, configured to store instructions to be executed by one or more processors of the electronic device, and

a processor which, when the instructions are executed by the one or more processors, is configured to perform the bias evaluation method of the first aspect.
Brief Description of the Drawings

Figure 1 is a schematic diagram of an application scenario of bias evaluation provided by an embodiment of this application;

Figure 2 is a block diagram of a system architecture to which a bias evaluation method provided by an embodiment of this application is applied;

Figure 3 is a schematic diagram of different categories of models under evaluation provided by an embodiment of this application;

Figure 4 is a block diagram of the architecture of a system to which a bias evaluation method provided by an embodiment of this application is applied;

Figure 5 is a flow diagram of a bias evaluation method provided by an embodiment of this application;

Figure 6 is a schematic diagram of a display interface of a bias evaluation result provided by an embodiment of this application;

Figure 7 is a flow diagram of a bias evaluation method based on a medical image evaluation model provided by an embodiment of this application;

Figures 8A, 8B, and 9 are schematic diagrams of display interfaces of bias evaluation results provided by embodiments of this application;

Figure 10 is a structural block diagram of an electronic device provided by an embodiment of this application.
Detailed Description of Embodiments

The embodiments of this application include, but are not limited to, a bias evaluation method, medium, and electronic device.

Some concepts involved in the embodiments of this application are introduced below.

(1) Machine learning: a science of artificial intelligence whose main research subject is artificial intelligence, in particular how to improve the performance of specific algorithms through learning from experience.

(2) Deep learning: a class of machine learning techniques based on deep neural network algorithms, whose main characteristic is processing and analyzing data with multiple layers of non-linear transformation structures. It is mainly applied to perception and decision scenarios in the field of artificial intelligence, such as image and speech recognition, natural language translation, and computer game playing.

(3) Data bias (also called image dataset bias): for a specific machine learning task, factors in the data that are correlated with the task but bear no causal relationship to it, e.g., sample imbalance or human-made markers in the data. Data bias may cause a machine learning model to learn spurious features.

(4) Evaluation parameters of a machine learning model

Taking a binary classification machine learning model as an example, its evaluation parameters are explained below with reference to the ground-truth and predicted values shown in Table 1.
Table 1

                     Predicted = 1      Predicted = 0
Ground truth = 1     True positive      False negative
Ground truth = 0     False positive     True negative

Image samples fall into two classes, positive examples and negative examples, where 1 denotes a positive example and 0 a negative example.
Specifically, the evaluation parameters of a machine learning model include the following:

Positive: the model predicts a positive example.

Negative: the model predicts a negative example.

True positive (TP): the true class of the sample is positive and the model also predicts it as positive, e.g., a sample in Table 1 with ground truth = 1 and prediction = 1.

True negative (TN): the true class of the sample is negative and the model predicts it as negative, e.g., an image sample in Table 1 with ground truth = 0 and prediction = 0.

False positive (FP): the true class of the sample is negative but the model predicts it as positive, e.g., an image sample in Table 1 with ground truth = 0 and prediction = 1.

False negative (FN): the true class of the sample is positive but the model predicts it as negative, e.g., an image sample in Table 1 with ground truth = 1 and prediction = 0.

Accuracy = (TP + TN) / (TP + TN + FP + FN).

Precision = TP / (TP + FP).

True positive rate (TPR), also called sensitivity or recall, where recall = TPR = TP / (TP + FN).

True negative rate (TNR) = specificity = TN / (TN + FP).

False negative rate (FNR) = missed-diagnosis rate = (1 − sensitivity) = FN / (TP + FN) = 1 − TPR.

False positive rate (FPR) = misdiagnosis rate = (1 − specificity) = FP / (FP + TN) = 1 − TNR.

Dice similarity coefficient (DSC), also called the Dice coefficient or dice, is commonly used to compute the similarity of two samples and ranges from 0 to 1, where dice = 2TP / (FP + 2TP + FN). For a segmentation task, for example, dice is 1 for the best segmentation result and 0 for the worst.

Intersection-over-Union (IoU) is the ratio of the intersection to the union of two bounding boxes (e.g., rectangular boxes), e.g., IoU = TP / (FP + TP + FN).
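As an illustrative sketch (not part of the patent text), the evaluation parameters defined above can all be derived from the four confusion-matrix counts; the function and key names here are assumptions made for the example:

```python
# Illustrative sketch: evaluation parameters derived from the confusion-matrix
# counts TP, FP, TN, FN, following the definitions above.

def metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp),
        "recall_tpr":  tp / (tp + fn),             # sensitivity / recall
        "specificity": tn / (tn + fp),             # TNR
        "fnr":         fn / (tp + fn),             # 1 - TPR, missed-diagnosis rate
        "fpr":         fp / (fp + tn),             # 1 - TNR, misdiagnosis rate
        "dice":        2 * tp / (fp + 2 * tp + fn),
        "iou":         tp / (fp + tp + fn),
    }

m = metrics(tp=8, fp=2, tn=7, fn=3)
```

Note that recall and FNR are complements by construction, as are specificity and FPR, which matches the 1 − TPR and 1 − TNR identities above.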
It should be noted that model and dataset bias is a widespread problem in machine learning, and especially in deep learning, whose negative impact is large yet hard to notice and easy to overlook. Particularly in scenarios with high demands on model safety, a model trained on a biased dataset may cause serious accidents in actual use.

As an example, in some medical image recognition scenarios, when a cardiac pacemaker is present in a medical image, a machine learning model will very likely infer that the patient has heart disease, and when a patient's intubation is present, the model will very likely infer a respiratory disease. Doctors, however, often manually mark images with dedicated identifiers that may be associated with the hospital, the doctor, or the disease, so the model may infer whether a medical image is a heart-disease image by learning those identifiers. A trained model may thus well be inferring heart disease from human-made, heart-disease-related identifiers, such as the image features of a cardiac pacemaker, text features related to heart disease, or the label of a specific examination device. If the medical image of a heart-disease patient then lacks such identifiers, e.g., contains no pacemaker image features, the model may be unable to recognize the image as a heart-disease image from the image features of the body tissue, yielding a wrong recognition result.

Some traditional bias evaluation methods manually select verification factors that may lead to bias and verify them one by one to determine which factors cause model bias. Specifically, such a method partitions the verification dataset into multiple subsets according to a verification factor and then measures the differences in the machine learning model's inference results across the subsets, e.g., for a classification model by computing evaluation parameters such as accuracy or recall to characterize the result differences. Whether the current factor causes the model to be biased is then judged by whether the result differences are significant: if they are, the factor is considered to cause model bias, i.e., the model is biased against the image data of the subsets partitioned by that factor. For example, for a binary classification model, the verification dataset is partitioned into two subsets by the verification factor; the model infers on the images of both subsets, and if its accuracy in predicting positive samples is 90% on one subset and 10% on the other, the significant difference shows that the factor causes model bias.

However, the proportion of positive examples may differ across the subsets of the verification dataset, so the image data in the verification dataset must carry ground-truth labels: only in combination with these labels can the differences in the model's inference results across the subsets be determined. Moreover, samples of certain classes can be rather hard to collect, e.g., X-ray images of healthy people fitted with cardiac pacemakers. Such methods therefore place high requirements on the verification dataset.

To solve this problem, embodiments of this application propose a bias evaluation method based on image style transfer. At least one factor to be verified is determined; the verification dataset is partitioned by these factors into subsets of different categories (classification subsets); the images of each category's subset are then style-transferred into the styles corresponding to the other categories' subsets, where an image's style may include its texture, shape, color, structure, and so on. Images of categories that are hard or impossible to obtain in practice can thus be generated. The machine learning model under evaluation (hereafter the model under evaluation) then infers separately on the original image and on the style-transferred version of the same image, and the differing inference results for the same image are evaluated to obtain the model's bias evaluation result with respect to the at least one factor to be verified, such as the images on which the model is biased and a score of the degree of bias, which is provided to the user.

For example, in a medical image recognition scenario that identifies heart-disease images, the image features of a cardiac pacemaker, text features related to heart disease, and labels of specific examination devices can serve as factors to be verified that may cause the model under evaluation to be biased. As an example, performing style transfer on medical images with respect to the pacemaker-image-feature factor yields medical images of heart-disease patients without pacemaker image features and medical images of healthy people with pacemaker image features.

It should be noted that the verification factors involved in the embodiments of this application mainly refer to image features in an image, such as local image features, e.g., the image features of a cardiac pacemaker.

In this way, the bias evaluation method provided by this application neither requires collecting ground-truth labels for sample images nor suffers from the practical difficulty that images of certain categories cannot be obtained; the collection requirements on the sample images of the verification dataset are low, reducing the time the user spends on sample collection.
Figure 1 is a schematic diagram of an application scenario of bias evaluation provided by this application. As shown in Figure 1, suppose the verification dataset is partitioned by a verification factor into subset 1 of category 1 and subset 2 of category 2, where the style of category 1 is denoted style 1 and the style of category 2 is denoted style 2.

Then, for image 1 in style 1 and image 1' in style 2 obtained by style-transferring image 1, the model under evaluation 10 infers positive for image 1 and negative for image 1'. That is, the model's inference results for the different styles of image 1 differ, showing that the model under evaluation 10 is biased against image 1, or against the images of the subset image 1 belongs to. It can be understood that two images having different inference results may mean that their inference results differ substantially.

As an example, in a medical image recognition scenario that identifies heart-disease images, the images in the verification dataset are medical images, the positive examples are heart-disease X-ray images, and the negative examples are X-ray images of healthy people. Image 1 being positive in Figure 1 means image 1 is inferred to be a heart-disease X-ray image, and image 1' being negative means image 1' is inferred to be a normal X-ray image. In this scenario, the image features of a cardiac pacemaker, text features related to heart disease, and labels of specific examination devices can serve as factors to be verified that may bias the model under evaluation, i.e., these factors may affect the judgment of heart-disease-related medical images. Taking style transfer with respect to the pacemaker-image-feature factor as an example, style 1 may be the style of images containing pacemaker image features and style 2 the style of medical images without them; style transfer between style 1 and style 2 then consists of adding or removing the pacemaker image features in the medical image.

As another example, in a medical image recognition scenario that identifies cervical cancer cell images, the images in the verification dataset are medical images, the positive examples may be cell images of cervical cancer patients, and the negative examples cell images of healthy people. Image 1 being positive in Figure 1 means image 1 is a cervical cancer cell image, and image 1' being negative means image 1' is inferred to be a normal cell image. In this scenario, factors such as the image features of cell atrophy or the model of the image capture device can serve as factors to be verified that may bias the model, i.e., they may affect the judgment of cervical-cancer-related medical images. Taking style transfer with respect to the cell-atrophy-feature factor as an example, style 1 may be the style of images with atrophy features and style 2 the style of images without them; style transfer between the two then consists of adding or removing the atrophy features in the medical image.
It should be noted that the bias evaluation method provided by the embodiments of this application may be executed by an electronic device, by the central processing unit (CPU) of the electronic device, or by a control module or apparatus in the electronic device for executing the bias evaluation method.

It can be understood that electronic devices applicable to this application may include, but are not limited to: mobile phones, tablet computers, video cameras, cameras, desktop computers, laptop computers, handheld computers, notebook computers, ultra-mobile personal computers (UMPC), netbooks, cellular phones, personal digital assistants (PDA), augmented reality (AR)/virtual reality (VR) devices, media players, smart TVs, smart speakers, smart watches, and so on.

In addition, in some other embodiments, the bias evaluation method provided by the embodiments of this application may also be executed by a server. As an example, the server may be a cloud server, which may be a hardware server or be embedded in a virtualized environment; for example, the server may be a virtual machine executing on a hardware server that includes one or more other virtual machines. The server may also be a standalone server that owns all the software and hardware resources of an entire machine and can allocate and perform various services on its own, such as executing the bias evaluation method of this application.

In the following embodiments, the bias evaluation method provided by the embodiments of this application is described mainly with an electronic device as the executing entity.

The embodiments of this application are described in further detail below with reference to the accompanying drawings.
Refer to Figure 2, a block diagram of the software system architecture in an electronic device to which the bias evaluation method provided by an embodiment of this application is applied. The framework includes a model under evaluation 10 and an evaluation apparatus 20.

The evaluation apparatus 20 takes the model under evaluation 10 and a verification dataset as input, and outputs a consistency score for each image in the verification dataset as well as the images whose consistency scores are most affected by style. It can be understood that the model under evaluation 10 can infer both on the original image of each subset and on the images obtained by transferring that original image into the styles corresponding to the other subsets; a consistency score is then derived from these inference results. The consistency score measures how strongly the different styles affect the inference results for the image, i.e., the differences among the inference results for the different styles of that image. The images whose consistency scores are most affected by style are the images whose different-style versions yield substantially differing inference results.

As an example, the consistency score may be the score of one subset of the verification dataset, specifically the proportion, among all sample images of the verification dataset, of sample images for which the inference result of the model under evaluation 10 on the original image differs substantially from the inference results on the images transferred from that original into other styles. The images whose consistency scores are most affected by style are then the images of the subset whose substantially differing images account for the highest share of the total sample images of the verification dataset.

Alternatively, the consistency score may be the score of a single image in the verification dataset, specifically the proportion, relative to the number of subsets, of style-transferred versions of that image whose inference results differ substantially from the inference result on the original image.
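The two consistency-score variants described above can be sketched as follows; the function names and the representation of inference results as plain labels are assumptions made for illustration:

```python
# Sketch of the two consistency-score variants described above.
# `results` maps an image id to its list of per-style predictions, where
# index 0 is the original image and the rest are style-transferred versions.

def subset_score(results: dict, total_images: int) -> float:
    """Subset-level score: share of images in the whole verification set
    whose prediction changes under at least one style transfer."""
    differing = sum(
        1 for preds in results.values() if any(p != preds[0] for p in preds[1:])
    )
    return differing / total_images

def image_score(preds: list) -> float:
    """Image-level score: share of a single image's style-transferred
    versions whose prediction differs from the original's."""
    flips = sum(1 for p in preds[1:] if p != preds[0])
    return flips / len(preds[1:])

subset1 = {"img1": ["pos", "neg"], "img2": ["pos", "pos"]}
score = subset_score(subset1, total_images=10)   # 1 differing image out of 10
per_image = image_score(["pos", "neg", "pos"])   # 1 flip out of 2 transfers
```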
In the embodiments of this application, for convenience of description in different places, different terms may be used for the same object without limiting its essence. For example, the image data in the verification dataset may also be called images, examples, or samples, all denoting images; the consistency score may also be called a bias score or simply a score; the groups of the verification dataset may also be called classes, subsets, or sets; and the images in the verification dataset may also be called evaluation images.

In some embodiments, the model under evaluation 10 may be a machine learning model trained on the verification dataset or another dataset, which can subsequently infer on the image data of the verification dataset to obtain inference results.

In some embodiments, refer to the schematic diagram of different categories of models under evaluation shown in Figure 3. As shown in Figure 3, the model under evaluation 10 may be a classification model 10a, a detection model 10b, a segmentation model 10c, or the like.
As an example, when the model under evaluation 10 is the classification model 10a shown in Figure 3, the classification model 10a outputs one result label for each style of the same image sample. For instance, if the classification model 10a classifies animals such as cats and dogs in images, its result labels are animal labels such as cat or dog. The evaluation apparatus 20 can then identify, within each subset, the image samples whose different-style versions of the same image receive inconsistent result labels, treat them as difference samples, and compute the proportion of difference samples among all image samples of the verification dataset to obtain the consistency score of each subset.

As an example, when the model under evaluation 10 is the detection model 10b shown in Figure 3, the detection model 10b outputs a rectangular box for each object to be detected in each style of the same image sample. For instance, the detection model 10b determines the rectangular boxes of objects such as cars, animals, and people, the boxes belonging to those object categories. The evaluation apparatus 20 can then identify, within each subset, the image samples for which the Intersection-over-Union (IoU) of the rectangular boxes of the same object category across the different-style versions is below a set IoU threshold, or for which the object categories of the boxes are inconsistent, treat them as difference samples, and compute the proportion of difference samples among all image samples of the verification dataset to obtain each subset's consistency score.

As an example, when the model under evaluation 10 is the segmentation model 10c shown in Figure 3, the segmentation model 10c segments different objects, such as street scenes, people, vehicles, and animals, from the different styles of the same image sample. The evaluation apparatus 20 can then identify, within each subset, the image samples whose segmented objects across the different-style versions have a dice below a set dice threshold, treat them as difference samples, and compute the proportion of difference samples among all image samples of the verification dataset to obtain each subset's consistency score.
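A minimal sketch of the three difference-sample criteria just described (class-label mismatch, bounding-box IoU below a threshold, segmentation Dice below a threshold); the threshold values and the (x1, y1, x2, y2) box format are assumptions for illustration:

```python
# Sketch of the per-model-type "difference sample" criteria described above.

def labels_differ(label_a: str, label_b: str) -> bool:
    # Classification model: differing result labels mark a difference sample.
    return label_a != label_b

def box_iou(a, b) -> float:
    # Boxes given as (x1, y1, x2, y2); IoU = intersection area / union area.
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def boxes_differ(a, b, iou_thresh: float = 0.5) -> bool:
    # Detection model: same-category boxes with IoU below the threshold differ.
    return box_iou(a, b) < iou_thresh

def dice_differ(dice: float, dice_thresh: float = 0.7) -> bool:
    # Segmentation model: a Dice below the threshold marks a difference sample.
    return dice < dice_thresh
```

For example, a box of (0, 0, 10, 10) against (0, 0, 10, 5) gives IoU 0.5, which is not below the default 0.5 threshold, so the pair is not counted as a difference sample.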
Next, based on the software system shown in Figure 2, the architecture of the evaluation apparatus 20 in that system is described in detail with reference to Figure 4.

In the system shown in Figure 4, the evaluation apparatus 20 includes an image grouping module M1, an inter-class transfer training module M2, an inter-class transfer inference module M3, and a difference evaluation and visualization module M4.

The image grouping module M1 groups (classifies) the verification dataset by the factors to be verified into multiple subsets of different categories, each category corresponding to one style.

As an example, when the factor to be verified is whether X-ray images contain pacemaker image features, the verification dataset can be partitioned by this factor into subset 1 of the with-pacemaker category (category 1) and subset 2 of the without-pacemaker category (category 2).

Specifically, the image grouping module M1 can attach class labels to the images of the different subsets and then provide each image with its class label to the inter-class transfer training module M2; for example, the class label of subset 1 (category 1, with pacemaker) is "pacemaker present" and that of subset 2 (category 2, without pacemaker) is "no pacemaker".

In some embodiments, the factors to be verified may be manually specified. The image grouping module M1 then groups the verification dataset into subsets according to the categories corresponding to these manually specified factors, i.e., it groups the verification dataset in response to user operations. As an example, the user may take known structured factors as factors to be verified; for pathology images, for instance, manually determined factors may include the imaging machine and the model of the staining reagent.

In other embodiments, the image grouping module M1 may also automatically obtain one or more factors to be verified through a bias-factor mining apparatus and group the verification dataset into subsets according to those factors.

As an example, the bias-factor mining apparatus is a unit within the image grouping module M1.

As another example, the bias-factor mining apparatus is an apparatus in the electronic device separate from the evaluation apparatus 20. The mining apparatus can then analyze the verification dataset and determine the factors to be verified that may give rise to bias during the evaluation of the verification dataset, and the image grouping module M1 in the evaluation apparatus 20 obtains the factors for the current verification dataset from the bias-factor mining apparatus.

Usually the image features of an image's foreground are taken as the object evaluated by the model under evaluation 10, while the image features of the background are not. If, during training, the model under evaluation 10 learns too many background image features, its evaluation of the foreground image features may become biased. For example, in the medical imaging field, the background of an image in the verification dataset may contain information such as dedicated identifiers related to the hospital or a disease, and this information may affect the model's inference on the heart image features in the foreground or on the cell image features of the cervix; such information can therefore serve as factors to be verified.

In some embodiments, for the verification dataset, the bias-factor mining apparatus can identify the foreground and background of each image and determine the factors to be verified for bias evaluation from the image features of the background. For example, in the medical imaging field, image features corresponding to hospital- or disease-related dedicated identifiers can be determined from the image background as factors to be verified.
The inter-class transfer training module M2 trains an image style transfer model using the images and class labels provided by the image grouping module M1, obtaining the weight parameters of the trained image style transfer model. The main function of this model is to realize style transfer between images of different categories; the module M2 then outputs the model's weight parameters to the inter-class transfer inference module M3.

In one embodiment, the image style transfer model may use Cycle Generative Adversarial Networks (CycleGAN) technology to transfer images between different styles. It can be understood that CycleGAN can perform style transfers such as between horses and zebras, or between apples and oranges.

It should be noted that, using CycleGAN, the image style transfer model can, for each verification factor, add or remove certain image features on images in order to test whether that factor affects the inference results of the model under evaluation 10. For example, for the factor of whether X-ray images contain pacemaker image features, the image style transfer model can use CycleGAN to add pacemaker image features to a healthy person's X-ray image, i.e., convert an image whose style (e.g., style 2) is without-pacemaker into an image whose style (e.g., style 1, the style denoted by category 1) is with-pacemaker, realizing a style transfer; likewise, it can remove the pacemaker image features from a heart-disease patient's X-ray image, i.e., convert an image whose style (e.g., style 1) is with-pacemaker into an image whose style (e.g., style 2) is without-pacemaker, realizing a style transfer.
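To make the add/remove principle concrete, here is a toy sketch that is not CycleGAN (which would require a trained network): the "style transfer" simply inserts or erases a marker pixel, and a deliberately biased classifier flips its prediction when the marker is removed, which is exactly the behavior the evaluation is designed to expose. All names and the marker representation are assumptions made for illustration.

```python
# Toy illustration of the add/remove principle. A deliberately biased
# "model" keys on a spurious marker (e.g., a pacemaker-like artifact);
# erasing the marker via a trivial stand-in "style transfer" flips its
# prediction, exposing the bias.

MARKER = 255  # intensity of the spurious marker pixel

def biased_model(image):
    # Predicts "positive" purely from the spurious marker - a biased model.
    return "positive" if any(MARKER in row for row in image) else "negative"

def remove_marker(image):
    # Stand-in for a style transfer that removes the verification factor.
    return [[0 if px == MARKER else px for px in row] for row in image]

def add_marker(image):
    # Stand-in for a style transfer that adds the verification factor.
    out = [row[:] for row in image]
    out[0][0] = MARKER
    return out

xray = [[10, 20], [MARKER, 30]]                  # "patient" image with marker
pred_before = biased_model(xray)                  # positive
pred_after = biased_model(remove_marker(xray))    # flips to negative
```

An unbiased model, by contrast, would give the same prediction before and after the marker is added or removed, since the underlying tissue content is unchanged.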
In some embodiments, an image's style may include local features and global features, such as the image's texture, shape, structure, and color difference; for example, the pacemaker image features above are a local feature, while the overall color of the whole image is a global feature.

The inter-class transfer inference module M3 uses the image style transfer model to convert the images of each category's subset in the verification dataset into the styles corresponding to the other categories' subsets, obtaining the style-transferred images, and outputs the original images of the verification dataset together with these style-transferred images to the model under evaluation 10.

The model under evaluation 10 can then infer on all original images of the verification dataset to obtain inference results, and infer on the style-transferred counterparts of the original images of each subset to obtain inference results. The model under evaluation 10 outputs all inference results to the difference evaluation and visualization module M4.

The difference evaluation and visualization module M4 performs difference judgment on the inference results of the different-style versions of the same image, thereby determining the difference samples, computes consistency scores such as the proportion of difference samples among all image samples of the verification dataset, and visually outputs evaluation results such as the difference samples and the consistency scores to the user.
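Pulling the four modules together, the M1→M4 flow might be sketched as below; the callable parameters stand in for the factor detector, the trained style-transfer generators, and the model under evaluation, and every name is an assumption made for illustration (the demo at the end uses integers as stand-in "images"):

```python
# End-to-end sketch of the M1-M4 flow: group by factor (M1), style-transfer
# each group into the other group's style (M2/M3), infer, and score the
# resulting differences (M4).

def evaluate_bias(images, has_factor, transfer, model):
    # M1: group the verification set by the factor to be verified.
    with_f = [im for im in images if has_factor(im)]
    without_f = [im for im in images if not has_factor(im)]
    # M3: transfer each image into the opposite style (transfer trained by M2).
    pairs = [(im, transfer(im, add=False)) for im in with_f]
    pairs += [(im, transfer(im, add=True)) for im in without_f]
    # M4: collect images whose inference result changes after the transfer.
    diffs = [orig for orig, conv in pairs if model(orig) != model(conv)]
    return {"difference_images": diffs, "ratio": len(diffs) / len(images)}

# Tiny demo: the "factor" is an integer being odd, the stub transfer toggles
# parity, and the deliberately biased model predicts purely from parity, so
# every image is a difference sample (ratio 1.0).
report = evaluate_bias(
    images=[1, 2, 3, 4],
    has_factor=lambda im: im % 2 == 1,
    transfer=lambda im, add: im + 1 if add == (im % 2 == 0) else im,
    model=lambda im: "pos" if im % 2 == 1 else "neg",
)
```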
Next, based on the system shown in Figure 4, Figure 5 shows a flow diagram of a bias evaluation method. As shown in Figure 5, the method may be executed by an electronic device and includes the following steps:

S501: Group the verification dataset by at least one factor to be verified into multiple subsets via the evaluation apparatus 20, and attach different class labels to the images of the different subsets.

In some embodiments, the electronic device may group the verification dataset via the image grouping module M1 of the evaluation apparatus 20 shown in Figure 4. For how the at least one factor to be verified is obtained, refer to the description of the image grouping module M1 above.

S502: For the images of the different subsets of the verification dataset, train, via the evaluation apparatus 20, an image style transfer model that transfers styles between the images of the subsets.

In some embodiments, the electronic device may input the images of the different subsets of the verification dataset and each image's class label into the inter-class transfer training module M2 of the evaluation apparatus 20 shown in Figure 4, train the image style transfer model, and output the model's weight parameters.

S503: For the images of each subset of the verification dataset, use the image style transfer model, via the evaluation apparatus 20, to convert them into images of the styles corresponding to the other categories' subsets.

In some embodiments, the electronic device may, via the inter-class transfer inference module M3 of the evaluation apparatus 20 shown in Figure 4, use the image style transfer model to convert the images of each subset of the verification dataset into images in the styles corresponding to the other categories' subsets.

For example, referring to Figure 1, the image style transfer model can convert image 1 of style 1 into image 1' of style 2.

As an example, the evaluation apparatus 20 may input image 1 carrying class label 1 into the image transfer model, so that the model style-transfers image 1 and outputs image 1' carrying class label 2, i.e., converts image 1 of style 1 into image 1' of style 2. Similarly, when the verification dataset is partitioned into subsets other than subsets 1 and 2, after image 1 is input, the image style transfer model can also convert image 1's style into the styles corresponding to the categories of those other subsets.

As another example, the evaluation apparatus 20 may input image 1 carrying class label 1 together with class label 2 into the image transfer model, so that the model style-transfers image 1 and outputs image 1' carrying class label 2, i.e., converts image 1 of style 1 into image 1' of style 2. Similarly, when the verification dataset includes subsets other than subsets 1 and 2, after image 1 and the class labels corresponding to those other subsets are input, the image style transfer model can also convert image 1's style into the styles corresponding to those other subsets.
S504: Use the model under evaluation 10 to infer separately on the original images of the verification dataset and on the style-transferred versions of the original images.

For example, referring to Figure 1, the model under evaluation 10 can infer on image 1 and image 1' separately to obtain their respective inference results. As an example, the model infers positive for image 1, i.e., a heart-disease X-ray image, and negative for image 1', i.e., a healthy person's X-ray image.

S505: Compare all inference results via the evaluation apparatus 20, output the images of each subset whose inference results differ substantially after style transfer, and compute the proportion of each subset's substantially differing images in the verification dataset.

In some embodiments, the electronic device may, via the difference evaluation and visualization module M4 of the evaluation apparatus 20, determine the images of each subset of the verification dataset whose inference results differ substantially after style transfer, and then compute the proportion of the substantially differing images in the verification dataset.

For example, when the model under evaluation 10 infers image 1 to be a heart-disease X-ray image and image 1' to be a normal X-ray image, the evaluation apparatus 20 can determine that image 1 is an image whose inference result differs substantially after style transfer. Similarly, the evaluation apparatus 20 can determine the other original images of the verification dataset whose inference results differ substantially after style transfer, and then compute the proportion of each subset's substantially differing images in the verification dataset.

Moreover, in some other embodiments, the parameters of the degree of bias computed by the electronic device via the evaluation apparatus 20 are not limited to the above proportion of substantially differing images in the verification dataset; parameters such as the total number of substantially differing images, or the proportion of each subset's substantially differing images among that subset's total sample images, may also be computed, which is not specifically limited here.

Further, in some embodiments, the electronic device may, via the difference evaluation and visualization module M4 of the evaluation apparatus 20, display on the screen of the electronic device bias evaluation result information such as the substantially differing images together with their style-transferred versions whose inference results differ, the proportion of the substantially differing images in the verification dataset, a conclusion on which data the model is biased against, and the factors causing the bias.
In addition, in some other embodiments, the electronic device may style-transfer only part of the original images of the verification dataset, infer on those original images and their style-transferred versions to obtain inference results, and then compare those results to obtain the bias evaluation result.

Refer to Figure 6, a schematic diagram of a display interface of a bias evaluation result. As shown in Figure 6, the bias evaluation information obtained by the electronic device for the model under evaluation 10 includes: image 1 with its inferred confidence, image 1' with its inferred confidence, and the bias evaluation result information "Conclusion: the model is biased against images of category 1" and "Bias factor: factor 1".

For example, in Figure 6 the style of image 1 displayed on the screen is style 1, corresponding to subset 1 of category 1, namely an X-ray image of a person with a pacemaker; correspondingly, the style of image 1' is style 2, corresponding to subset 2 of category 2, namely an X-ray image of a person without a pacemaker. The model under evaluation 10 infers image 1 of category 1 to be a heart-disease X-ray image with confidence 0.99, i.e., it deems the patient to have heart disease, and infers image 1' of category 2 to be a heart-disease X-ray image with confidence 0.01, i.e., it deems the person healthy; the two inference results clearly differ. The conclusion of the bias evaluation result shown in Figure 6 can thus be that the model under evaluation 10 is biased with respect to the factor of whether a person's X-ray image contains a pacemaker.

In this way, the bias evaluation method provided by the embodiments of this application neither requires collecting ground-truth labels for sample images nor suffers from the difficulty that images of certain categories cannot be obtained, so the collection requirements on the sample images of the verification dataset are low, reducing the time the user spends on sample collection. The bias evaluation result is also intuitive for the user, who can directly observe the influence of the bias factors on the results, which helps the user analyze and understand the model's bias. In addition, the method yields not only the overall degree of bias on the verification dataset but also an analysis of which image data the model is biased against, which facilitates the user's analysis of the model.

In addition, according to some embodiments of this application, the evaluation apparatus 20 may be an application, software, or system installed in the electronic device; the software may provide a human-computer interaction interface that lets the user import the verification dataset and the model information of the model under evaluation 10, and then output the bias evaluation result information on the screen according to the bias evaluation method above.

In some embodiments, in a scenario where the electronic device provides a bias evaluation system for completing bias evaluation through human-computer interaction, refer to the flow diagram of a bias evaluation method based on a medical image evaluation model shown in Figure 7; the method shown in Figure 7 includes the following steps:
S701: Receive a pathology dataset and a cell classification model uploaded by the user to the bias evaluation system, the pathology dataset being the dataset to be verified and the cell classification model being the model under evaluation 10.

As an example, in a medical image recognition scenario that identifies cervical cancer cell images, the images in the pathology dataset are medical images, in which the positive examples are cell images of cervical cancer patients and the negative examples may be cell images of healthy people. Positive then means the medical image is a cervical cancer cell image, and negative means the medical image is inferred to be a normal cell image.

As an example, Figure 8A is a schematic diagram of uploading data to the bias evaluation system displayed by the electronic device. The main interface of the bias evaluation system shown in Figure 8A includes a dataset selection control 81, a model-under-evaluation selection control 82, and a factor selection control 83.

After the user clicks the dataset selection control 81 shown in Figure 8A, the electronic device can display, as shown in Figure 8B, an open control 811 and multiple selectable dataset controls such as datasets 812 and 813. The user can then click any dataset control to trigger uploading that dataset's data to the bias evaluation system, e.g., select the control of dataset 812, where dataset 812 denotes the pathology dataset above. In addition, by clicking the dataset open control 811 the user can browse to and select a dataset at any storage location in the electronic device.

Similarly, the user can click the model-under-evaluation selection control 82 shown in Figure 8A to make the electronic device select the model under evaluation, e.g., select the cell classification model above as the model under evaluation.

S702: Receive the factor to be verified for grouping input by the user, and group the pathology dataset by the factor to be verified into subset 1 composed of the images of category 1 and subset 2 composed of the images of category 2.

It can be understood that the factor to be verified input by the user refers to the image feature represented by that factor, which may be the data of the image feature or identification information indicating it, e.g., for the cell classification model's verification dataset, the image feature of atrophied cells, or the text identifier "atrophied cells".

Similarly, the user can click the factor selection control 83 shown in Figure 8A to set the grouping factor for the current pathology dataset in the bias evaluation system, which is not described in detail again.

As an example, the cells of elderly people generally have a high probability of atrophy, while the cells of young (or healthy) women do not atrophy; atrophy, however, has no direct causal relation to pathological change, so atrophy can be predicted to be a factor to be verified for bias evaluation. The pathology dataset can thus be grouped into a subset of images with the class label atrophied (category 1) and a subset of images with the class label non-atrophied.

It should be noted that the images in the verification dataset may be called the multiple evaluation images; subset 1 above may also be called the first evaluation image set, and subset 2 the second evaluation image set.
S703: Train, using CycleGAN technology, an image style transfer model corresponding to the pathology dataset, and use the image style transfer model to style-transfer the images of subset 1 of category 1 (atrophied) and the images of subset 2 of category 2 (non-atrophied) of the pathology dataset respectively.

Specifically, for the images A1 of category 1, the style of these images can be converted from the style corresponding to category 1 to the style corresponding to category 2 to obtain the style-transferred images B1, i.e., images A1 are converted from category 1 to category 2; concretely, the images showing atrophied cells are style-transferred so that the atrophy-related image features are removed from images A1, yielding images B1.

Similarly, for the images A2 of category 2 (cells without atrophy), the style of these images can be converted from the style corresponding to category 2 to the style corresponding to category 1 to obtain the style-transferred images B2, i.e., images A2 are converted from category 2 to category 1; concretely, the images of non-atrophied cells are style-transferred so that atrophy-related image features are added to images A2, yielding images B2.

S704: Use the cell classification model to infer separately on the original images of the pathology dataset and on their style-transferred versions to obtain inference results.

It should be noted that the set 3 composed of the images of subset 1 converted to style 2 may be called the third evaluation image set, and the set 4 composed of the images of subset 2 converted to style 1 may be called the fourth evaluation image set.

As an example, the cell classification model infers image A1 (class label: category 1) to be positive with confidence 0.99, while for image B1 (converted to the style corresponding to category 2) the confidence of a positive result is only 0.01, i.e., the result flips toward negative; and the model infers image A2 (class label: category 2) to be positive with confidence 0.01, while for image B2 (converted to the style corresponding to category 1) the confidence of a positive result is 0.99, i.e., the result flips toward positive.
S705: Analyze all inference results of the cell classification model, determine the image samples of each subset whose results differ substantially, and present the substantially differing images and the bias evaluation result to the user through the bias evaluation system, the bias evaluation result including the proportion of each subset's substantially differing images among all sample images of the verification dataset.

It should be noted that the inference results on the images of subset 1 and set 3, i.e., the inference results on the first evaluation image set and the third evaluation image set, may be called first inference results, and the inference results on the images of subset 2 and set 4, i.e., the inference results on the second evaluation image set and the fourth evaluation image set, may be called second inference results. The first and second inference results can then be compared to obtain the substantially differing samples and, in turn, the bias evaluation result.

Referring to the example above, the cell classification model's inference results lean positive for the images rendered in the atrophied style and lean negative for the same images without atrophy features, showing that the cell classification model is biased with respect to atrophy. Clearly, the model's inference results for images A1 and A2 and their style-transferred versions differ substantially.

Figure 9 is a schematic diagram of the bias evaluation result displayed by the bias evaluation system of the electronic device. The interface shown in Figure 9 includes images A1, B1, A2, and B2 together with the confidences corresponding to each of them. In addition, when there are many substantially differing images, the bias evaluation system displays only some of the images and their confidences at any one time; by operating the "more" control 91 shown in Figure 9, the user can trigger the system to update the display with other substantially differing images and their confidences. In this way, the user can intuitively learn the evaluation result that the current cell classification model is biased with respect to atrophy.
Figure 10 shows a schematic structural diagram of an electronic device. Specifically, the electronic device 100 shown in Figure 10 may include a processor 110, a power supply module 140, a memory 150, a mobile communication module 130, a wireless communication module 120, a display screen 160, an interface unit 170, and so on.

It can be understood that the structure illustrated in the embodiments of this application does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units, for example processing modules or circuits such as a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a microcontroller (MCU), an artificial intelligence (AI) processor, or a programmable logic device (Field Programmable Gate Array, FPGA). The different processing units may be separate devices or integrated into one or more processors. For example, the processor 110 may be used to run the model under evaluation 10 and the evaluation apparatus 20 above to execute the bias evaluation method provided by this application.

The memory 150 may be used to store data, software programs, and modules, and may be a volatile memory such as a random-access memory (RAM); a non-volatile memory such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); a combination of the above kinds of memory; or a removable storage medium such as a Secure Digital (SD) memory card. Specifically, the memory 150 may include a program storage area (not shown) and a data storage area (not shown). The program storage area may store program code used to cause the processor 110, by executing that code, to perform the bias evaluation method provided by the embodiments of this application.

The mobile communication module 130 may include, but is not limited to, an antenna, a power amplifier, a filter, a low noise amplifier (LNA), and the like. The mobile communication module 130 can provide solutions for wireless communication, including 2G/3G/4G/5G, applied on the electronic device 100. The mobile communication module 130 can receive electromagnetic waves through the antenna, filter and amplify the received electromagnetic waves, and pass them to the modem processor for demodulation. It can also amplify signals modulated by the modem processor and convert them into electromagnetic waves radiated out through the antenna. In some embodiments, at least some functional modules of the mobile communication module 130 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 130 may be provided in the same device as at least some modules of the processor 110.

The wireless communication module 120 may include an antenna, and transmits and receives electromagnetic waves via the antenna. The wireless communication module 120 can provide solutions for wireless communication applied on the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite systems (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR).

The display screen 160 may be used to display the relevant interfaces of the bias evaluation system above, support the user in selecting the dataset to be verified and the model to be evaluated, and support the user in viewing the model's bias evaluation results.

The interface unit 170 is configured to receive user input, such as the user's input of the verification dataset and the model under evaluation on the interface of the bias evaluation system displayed on the display screen 160.

The power supply 140 is used to supply power to the display screen 160, the processor 110, and other units in the electronic device 100.

In some embodiments, the mobile communication module 130 and the wireless communication module 120 of the electronic device 100 may also be located in the same module.

It can be understood that the hardware structure shown in Figure 10 does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown in Figure 10, combine some components, split some components, or arrange the components differently.
The embodiments of the mechanisms disclosed in this application may be implemented in hardware, software, firmware, or a combination of these implementation methods. The embodiments of this application may be implemented as a computer program or program code executed on a programmable system that includes at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.

Program code may be applied to input instructions to perform the functions described in this application and to generate output information. The output information may be applied to one or more output devices in a known manner. For the purposes of this application, a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.

The program code may be implemented in a high-level procedural or object-oriented programming language to communicate with the processing system. The program code may also be implemented in assembly or machine language when needed. In fact, the mechanisms described in this application are not limited in scope to any particular programming language. In either case, the language may be a compiled or an interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed over a network or through other computer-readable media. A machine-readable medium may therefore include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy disks, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memory (ROM), random-access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable storage used to transmit information over the Internet by means of electrical, optical, acoustic, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).

In the drawings, some structural or method features may be shown in specific arrangements and/or orders. It should be understood, however, that such specific arrangements and/or orders may not be required; rather, in some embodiments, these features may be arranged in manners and/or orders different from those shown in the illustrative drawings. In addition, the inclusion of structural or method features in a particular figure is not meant to imply that such features are required in all embodiments; in some embodiments, these features may be omitted or may be combined with other features.

It should be noted that the units/modules mentioned in the device embodiments of this application are all logical units/modules. Physically, a logical unit/module may be a physical unit/module, a part of a physical unit/module, or a combination of multiple physical units/modules; the physical implementation of these logical units/modules is not the most important aspect, and the combination of the functions they implement is the key to solving the technical problem raised by this application. Furthermore, in order to highlight the innovative part of this application, the device embodiments above do not introduce units/modules that are not closely related to solving the technical problem raised by this application, which does not mean that those device embodiments contain no other units/modules.

It should be noted that, in the examples and the description of this patent, relational terms such as first and second are used merely to distinguish one entity or operation from another and do not necessarily require or imply any such actual relation or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.

Although this application has been illustrated and described with reference to certain preferred embodiments thereof, those of ordinary skill in the art will understand that various changes may be made in form and detail without departing from the spirit and scope of this application.

Claims (17)

  1. A bias evaluation method, applied to an electronic device, characterized by comprising:

    obtaining a factor to be verified that is present in a plurality of evaluation images used for bias evaluation of a model under evaluation;

    classifying the plurality of evaluation images according to the factor to be verified to obtain a first target evaluation image set, the first target evaluation image set comprising: a first evaluation image set containing the factor to be verified, and/or a second evaluation image set not containing the factor to be verified;

    performing style transfer on the first target evaluation image set according to the factor to be verified to obtain a second target evaluation image set;

    inputting the first target evaluation image set and the second target evaluation image set into the model under evaluation for inference to obtain target inference results;

    outputting, according to the target inference results, a bias evaluation result for the model under evaluation, the bias evaluation result being used to characterize whether the factor to be verified causes the model under evaluation to be biased;

    wherein style transfer of the first evaluation image set is realized by removing the factor to be verified, and style transfer of the second evaluation image set is realized by adding the factor to be verified.
  2. The method according to claim 1, characterized in that the bias evaluation result comprises at least one of the following:

    information on whether the factor to be verified is a factor causing the model under evaluation to be biased;

    difference images in the first target evaluation image set, a difference image being an evaluation image in the first target evaluation image set whose inference result differs from the inference result of at least one corresponding style-transferred image in the second target evaluation image set, the inference results of the difference image and of the transferred image both being inference results output by the model under evaluation;

    the transferred images comprised in the second target evaluation image set, obtained by style-transferring each of the difference images;

    the inference result of the model under evaluation for each of the difference images;

    the inference result of the model under evaluation for each of the transferred images;

    the proportion of the difference images of the first target evaluation image set among the plurality of evaluation images.
  3. The method according to claim 2, characterized in that the factor to be verified is determined based on the background and foreground of each original image in the verification dataset, and the factor to be verified corresponds to an image feature in the background.

  4. The method according to claim 3, characterized in that the style transfer of an image is realized by an image style transfer model; the style transfer model is trained from the first evaluation image set and the second evaluation image set;

    moreover, the image style transfer model is used to remove the verification factor from images containing the verification factor and to add the verification factor to images not containing the verification factor, the verification factor being an image feature.

  5. The method according to claim 4, characterized in that the first evaluation image set corresponds to a first class label and the second evaluation image set corresponds to a second class label different from the first class label, the image style transfer model being trained from the images of the first evaluation image set and the first class label, and the images of the second evaluation image set and the second class label.

  6. The method according to any one of claims 2 to 4, characterized in that the method further comprises:

    receiving the verification dataset and the model under evaluation input by a user.

  7. The method according to claim 6, characterized in that the method further comprises:

    receiving the factor to be verified input by the user, the factor to be verified being an image feature or an identifier indicating an image feature.
  8. A bias evaluation apparatus, characterized by comprising:

    an obtaining module, configured to obtain a factor to be verified that is present in a plurality of evaluation images used for bias evaluation of a model under evaluation;

    a classification module, configured to classify the plurality of evaluation images according to the factor to be verified obtained by the obtaining module to obtain a first target evaluation image set, the first target evaluation image set comprising: a first evaluation image set containing the factor to be verified, and/or a second evaluation image set not containing the factor to be verified;

    a transfer module, configured to perform style transfer on the first target evaluation image set obtained by the classification module according to the factor to be verified to obtain a second target evaluation image set;

    an inference module, configured to input the first target evaluation image set obtained by the classification module and the second target evaluation image set obtained by the transfer module into the model under evaluation for inference to obtain target inference results;

    an output module, configured to output, according to the target inference results obtained by the inference module, a bias evaluation result for the model under evaluation, the bias evaluation result being used to characterize whether the factor to be verified causes the model under evaluation to be biased;

    wherein style transfer of the first evaluation image set is realized by removing the factor to be verified, and style transfer of the second evaluation image set is realized by adding the factor to be verified.
  9. The apparatus according to claim 8, characterized in that the bias evaluation result comprises at least one of the following:

    information on whether the factor to be verified is a factor causing the model under evaluation to be biased;

    difference images in the first target evaluation image set, a difference image being an evaluation image in the first target evaluation image set whose inference result differs from the inference result of at least one corresponding style-transferred image in the second target evaluation image set, the inference results of the difference image and of the transferred image both being inference results output by the model under evaluation;

    the transferred images comprised in the second target evaluation image set, obtained by style-transferring each of the difference images;

    the inference result of the model under evaluation for each of the difference images;

    the inference result of the model under evaluation for each of the transferred images;

    the proportion of the difference images of the first target evaluation image set among the plurality of evaluation images.
  10. The apparatus according to claim 9, characterized in that the factor to be verified is determined based on the background and foreground of each original image in the verification dataset, and the factor to be verified corresponds to an image feature in the background.

  11. The apparatus according to claim 10, characterized in that the style transfer of an image is realized by an image style transfer model; the style transfer model is trained from the first evaluation image set and the second evaluation image set;

    moreover, the image style transfer model is used to remove the verification factor from images containing the verification factor and to add the verification factor to images not containing the verification factor, the verification factor being an image feature.

  12. The apparatus according to claim 11, characterized in that the first evaluation image set corresponds to a first class label and the second evaluation image set corresponds to a second class label different from the first class label, the image style transfer model being trained from the images of the first evaluation image set and the first class label, and the images of the second evaluation image set and the second class label.

  13. The apparatus according to any one of claims 9 to 11, characterized in that the apparatus further comprises:

    an input module, configured to receive the verification dataset and the model under evaluation input by a user.

  14. The apparatus according to claim 13, characterized in that

    the input module is further configured to receive the factor to be verified input by the user, the factor to be verified being an image feature or an identifier indicating an image feature.
  15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions which, when executed on an electronic device, cause the electronic device to perform the bias evaluation method according to any one of claims 1 to 7.

  16. A computer program product, characterized in that the computer program product comprises instructions for implementing the bias evaluation method according to any one of claims 1 to 7.

  17. An electronic device, characterized by comprising:

    a memory, configured to store instructions to be executed by one or more processors of the electronic device, and

    a processor which, when the instructions are executed by the one or more processors, is configured to perform the bias evaluation method according to any one of claims 1 to 7.
PCT/CN2022/132232 2022-03-21 2022-11-16 Bias evaluation method and apparatus, medium, program product, and electronic device WO2023179055A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210281564.1 2022-03-21
CN202210281564.1A CN116824198A (zh) 2022-03-21 2022-03-21 Bias evaluation method and apparatus, medium, program product, and electronic device

Publications (1)

Publication Number Publication Date
WO2023179055A1 true WO2023179055A1 (zh) 2023-09-28

Family

ID=88099738

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/132232 WO2023179055A1 (zh) 2022-03-21 2022-11-16 Bias evaluation method and apparatus, medium, program product, and electronic device

Country Status (2)

Country Link
CN (1) CN116824198A (zh)
WO (1) WO2023179055A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860133A (zh) * 2020-06-08 2020-10-30 华南师范大学 Artificial-intelligence ethics method and robot for recognizing humans without racial bias
CN113361653A (zh) * 2021-07-09 2021-09-07 浙江工业大学 Deep learning model debiasing method and apparatus based on data sample augmentation
CN113570603A (zh) * 2021-09-26 2021-10-29 国网浙江省电力有限公司电力科学研究院 CycleGAN-based method and system for generating defect data of catenary components
US20210406587A1 (en) * 2020-06-29 2021-12-30 Robert Bosch Gmbh Image classification and associated training for safety-relevant classification tasks


Also Published As

Publication number Publication date
CN116824198A (zh) 2023-09-29

Similar Documents

Publication Publication Date Title
Harangi Skin lesion classification with ensembles of deep convolutional neural networks
US11694123B2 (en) Computer based object detection within a video or image
US12056211B2 (en) Method and apparatus for determining image to be labeled and model training method and apparatus
CN108416776B (zh) 图像识别方法、图像识别装置、计算机产品和可读存储介质
WO2019105218A1 (zh) 图像特征的识别方法和装置、存储介质、电子装置
US20200074632A1 (en) Assessment of density in mammography
JP2022505775A (ja) 画像分類モデルの訓練方法、画像処理方法及びその装置、並びにコンピュータプログラム
Kisilev et al. From medical image to automatic medical report generation
CN111709485B (zh) 医学影像处理方法、装置和计算机设备
US10937143B1 (en) Fracture detection method, electronic device and storage medium
WO2021098534A1 (zh) 相似度确定、网络训练、查找方法及装置、电子装置和存储介质
US20210192365A1 (en) Computer device, system, readable storage medium and medical data analysis method
US11531851B2 (en) Sequential minimal optimization algorithm for learning using partially available privileged information
CN110796659A (zh) 一种目标检测结果的鉴别方法、装置、设备及存储介质
WO2023221697A1 (zh) 图像识别模型的训练方法、装置、设备、介质
WO2023142532A1 (zh) 一种推理模型训练方法及装置
CN114758787A (zh) 区域疫情信息处理方法、装置和系统
Lanjewar et al. Fusion of transfer learning models with LSTM for detection of breast cancer using ultrasound images
CN114399634B (zh) 基于弱监督学习的三维图像分类方法、系统、设备及介质
Tian et al. Radiomics and its clinical application: artificial intelligence and medical big data
CN114093507A (zh) 边缘计算网络中基于对比学习的皮肤病智能分类方法
CN114191665A (zh) 机械通气过程中人机异步现象的分类方法和分类装置
CN110414562A (zh) X光片的分类方法、装置、终端及存储介质
Jin et al. Metadata and image features co-aware personalized federated learning for smart healthcare
US11783165B1 (en) Generating vectors from data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22933106

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022933106

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022933106

Country of ref document: EP

Effective date: 20240913