WO2023014789A1 - System and method for pathology image analysis using a trained neural network and active learning framework - Google Patents

System and method for pathology image analysis using a trained neural network and active learning framework

Info

Publication number
WO2023014789A1
Authority
WO
WIPO (PCT)
Prior art keywords
samples
training
sample
neural network
predictive
Application number
PCT/US2022/039274
Other languages
French (fr)
Inventor
Corey ARNOLD
Wenyuan Li
Jiayun LI
William SPEIER
Original Assignee
The Regents Of The University Of California
Application filed by The Regents Of The University Of California
Publication of WO2023014789A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30024Cell structures in vitro; Tissue sections in vitro

Definitions

  • Deep neural networks (DNNs), in particular convolutional neural networks, have rapidly become a popular choice for analyzing histopathology images.
  • training these models relies heavily on a large number of samples annotated by experts.
  • noise-free expert annotations are crucial to achieve high performance. It is difficult, however, to obtain a perfect set of labels due to the variability between expert annotations.
  • obtaining enough annotations can be expensive and time-consuming for many tasks. This is especially true for histopathology image analysis.
  • histopathology image analysis the size of the collected dataset can be large, but performing annotations requires years of professional training and domain knowledge.
  • the labels provided by different pathologists can demonstrate high inter-reader variability.
  • the concordance rate of multiple pathologists can be as low as 57.9%, which results in noisy annotations.
  • DNNs are capable of fitting to noisy annotations, but they may not generalize to unseen data, which is an important component of clinical applications.
  • the lack of large and noise-free annotation sets is a significant challenge in histopathology image analysis, preventing DNNs from scaling to the size of the collected data.
  • active learning (AL) employs various sampling methods to select samples from an unlabeled set. The selected samples are then annotated by experts and used to train the model.
  • a carefully designed sampling method can reduce the overall number of labeled data points required to train the model and make the model robust to class imbalances.
  • traditional AL methods do not address the noisy label issue.
  • a method for training a neural network for analyzing pathology images includes receiving a training dataset comprising a set of labeled samples and a set of unlabeled samples, training the neural network for analyzing pathology images using the set of labeled samples until the neural network converges, identifying at least one noisy sample in the set of labeled samples and identifying at least one informative sample in the set of unlabeled samples for further expert annotation and at least one confident predictive sample in the set of unlabeled samples.
  • the method may further include generating an updated set of labeled samples for the training dataset by removing the identified at least one noisy sample from the set of labeled samples, adding the identified at least one informative sample with further expert annotation to the set of labeled samples and adding the identified at least one confident predictive sample to the set of labeled samples.
  • the method may further include training the neural network for analyzing pathology images using the updated set of labeled samples and storing the trained neural network for analyzing pathology images in a data storage.
  • a system for analyzing pathology images includes an input for receiving at least one pathology image of a subject and a neural network coupled to the input and configured to analyze the at least one pathology image.
  • the neural network may be trained using a plurality of training iterations. For at least one iteration, the neural network may be trained using a set of labeled samples generated by removing one or more noisy samples from an initial set of labeled samples in a training dataset, adding one or more informative samples with further expert annotation to the initial set of labeled samples and adding one or more confident predictive samples to the initial set of labeled samples.
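  • For illustration only, the training iteration just described might be organized as in the following minimal Python sketch. Every callable name (train_fn, select_fn, annotate_fn, pseudo_label_fn) is a placeholder supplied by the caller, not a function named in the disclosure:

```python
from typing import Any, Callable, Dict, Set

def active_learning_iteration(
    model: Any,
    labeled: Dict[Any, Any],          # sample id -> label
    unlabeled: Set[Any],
    train_fn: Callable,               # trains model on labeled set until convergence
    select_fn: Callable,              # returns (noisy, informative, confident) ids
    annotate_fn: Callable,            # expert/oracle annotation for one sample
    pseudo_label_fn: Callable,        # model-assigned label for one sample
) -> None:
    # Train on the current labeled set until the network converges.
    train_fn(model, labeled)
    # Identify the three sample groups (detailed in FIGs. 2A-6).
    noisy, informative, confident = select_fn(model, labeled, unlabeled)
    # Remove suspected noisy samples from the labeled set.
    for s in noisy:
        labeled.pop(s, None)
    # Informative samples receive further expert annotation before joining.
    for s in informative:
        labeled[s] = annotate_fn(s)
        unlabeled.discard(s)
    # Confident predictive samples join automatically with pseudo labels.
    for s in confident:
        labeled[s] = pseudo_label_fn(model, s)
        unlabeled.discard(s)
```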
  • FIG 1 is a block diagram of a system for analyzing pathology images using a deep neural network (DNN) in accordance with an embodiment
  • FIG. 2A is a block diagram of a system for training a DNN for analyzing pathology images in accordance with an embodiment
  • FIG. 2B is a block diagram of a process for generating a labeled dataset for training a DNN for analyzing pathology images in accordance with an embodiment
  • FIG. 3 illustrates a method for training a DNN for analyzing pathology images in accordance with an embodiment
  • FIG. 4 illustrates an example process for pre-processing training data in accordance with an embodiment
  • FIG. 5 illustrates a method for complexity level classification in accordance with an embodiment
  • FIG. 6 illustrates a method for determining training loss and predictive entropy of samples in a training dataset in accordance with an embodiment
  • FIG. 7 is a block diagram of an example computer system in accordance with an embodiment.
  • the present disclosure describes a system and method for training and generating a neural network for analyzing pathology images (e.g., histopathology images) using an active learning framework.
  • the neural network may be trained and configured to perform analysis tasks or functions such as, for example, classification, segmentation, disease (e.g., cancer) diagnosis, etc.
  • the active learning framework can be advantageously configured to include a sample selection technique that automatically detects, during training of the neural network, noisy (e.g., mislabeled data points) samples from labeled samples in a training dataset as well as informative and confident predictive samples from unlabeled samples in the training dataset.
  • the sample selection technique of the disclosed training process advantageously combines both uncertainty and representativeness measures to detect noisy, informative and confident predictive samples in the training dataset.
  • the identified noisy, informative and confident predictive samples may be used to reprioritize samples for training of the neural network which can result in a more robust model that requires less data to train the neural network.
  • the disclosed active learning framework or technique for training a pathology image analysis neural network may be useful in clinical applications with real-world training datasets where ground truth labels may be imperfect.
  • the training technique for a neural network for analyzing pathology images can be configured to, in each training iteration, systematically identify noisy samples in a set of labeled samples in the training dataset and exclude the noisy samples from the set of labeled samples.
  • the disclosed training technique can reduce the impact of noisy samples with noisy labels in the training dataset and improve generalization of the neural network model.
  • the training technique may also be configured to, in each training iteration, identify and select two groups of data from the set of unlabeled samples in the training dataset including 1) one "informative" sample group (e.g., a set of most uncertain samples) that requires additional expert annotation (e.g., by a domain expert or oracle), and 2) one "confident predictive" sample group (e.g., a set of highly certain samples) where each confident predictive sample includes pseudo labels generated and assigned by the neural network model for analyzing pathology images.
  • the confident predictive samples may be automatically added to the set of labeled samples in the training dataset and the informative samples may be added to the set of labeled samples in the training dataset after expert (or oracle) annotation.
  • an updated set of labeled samples for the next iteration of the training process may be generated by removing the identified noisy samples, adding the identified informative samples with additional expert annotations, and adding the confident predictive samples with pseudo labels.
  • the sample selection strategy of the disclosed training technique can improve performance of the trained neural network (e.g., classification performance) with fewer manual annotations.
  • each sample in the training dataset (labeled and unlabeled) may be classified based on complexity, for example, easy, medium, or hard.
  • the disclosed training technique may also be configured to distinguish between noisy samples and hard samples using a heuristic approach.
  • FIG 1 is a block diagram of a system for analyzing pathology images using a deep neural network in accordance with an embodiment.
  • System 100 can include an input of one or more pathology images 102 of a subject, a pre-processing module 104, a trained pathology image neural network 106, an output 108 of the pathology image neural network 106, data storage 110, 114, and a display 116.
  • the input pathology image(s) 102 of the subject may be, for example, histopathology images.
  • the input pathology image(s) may be acquired using known imaging systems and methods.
  • the input pathology image 102 may be an image of a tissue specimen of a subject (e.g., a biopsy or surgical specimen) acquired using a slide scanner.
  • the input pathology image(s) 102 may be retrieved from data storage (or memory) 110 of system 100, data storage of an imaging system, or data storage of other computer systems.
  • the pathology image(s) 102 may be provided as an input to the pathology image neural network 106.
  • the input pathology image(s) 102 may be pre-processed using the pre-processing module 104.
  • an input pathology image 102 of the subject may be processed to generate one or more input patches.
  • the pathology image neural network 106 may be trained to perform analysis tasks or functions including, but not limited to, classification, segmentation, disease (e.g., cancer) diagnosis, etc.
  • the pathology image neural network 106 may be configured to generate or predict a grade for a particular disease, to generate a segmentation map, or to predict a diagnosis for a disease (e.g., cancer).
  • the pathology image neural network can be a deep neural network (DNN) such as, for example, a convolutional neural network (CNN).
  • the pathology image neural network 106 may be implemented using known deep neural network architectures.
  • the pathology image neural network 106 may be implemented using an EfficientNet-B0 architecture.
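  • As a hedged sketch only: the disclosure names the EfficientNet-B0 architecture but no particular library; one way to instantiate such a backbone with a task-specific classification head is the torchvision implementation below (the four-class head is an assumed example, not a value from the disclosure):

```python
import torch
from torch import nn
from torchvision.models import efficientnet_b0  # torchvision >= 0.13 API

def build_pathology_classifier(num_classes: int) -> nn.Module:
    model = efficientnet_b0(weights=None)  # train from scratch on pathology patches
    # Replace the final linear layer with a head sized for the analysis task.
    in_features = model.classifier[1].in_features
    model.classifier[1] = nn.Linear(in_features, num_classes)
    return model

# Example: logits for one 256 x 256 RGB patch (patch size from the FIG. 4 discussion).
logits = build_pathology_classifier(num_classes=4)(torch.randn(1, 3, 256, 256))
```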
  • the pathology image neural network may be configured to generate an output 108 based on the analysis of the input pathology image 102 such as, for example, a classification for the input pathology image 102 or for portions of the input pathology image 102, a diagnosis, a prediction or probability of a particular classification or diagnosis, a segmentation map, an image, etc.
  • the output 108 may be displayed on a display 116 (e.g., display 718 of the computer system 700 shown in FIG. 7).
  • the output 108 may also be stored in data storage, for example, data storage 114 (e.g., device storage 716 of computer system 700 shown in FIG. 7).
  • the pathology image neural network 106 can be trained using training data (or dataset) 112.
  • the training data 112 includes a plurality of pathology images (or samples or data points) and can include pathology images both with or without manual annotations.
  • the training data 112 may be, for example, histopathology images.
  • the training data 112 may be acquired using known imaging systems and methods.
  • the training data 112 may include images of tissue specimens acquired using a slide scanner.
  • the training data 112 may be retrieved from data storage (or memory) 110 of system 100, data storage of an imaging system, or data storage of other computer systems.
  • the training data 112 may be pre-processed using the pre-processing module 104.
  • pre-processing module 104 may be configured to identify the most informative tissue areas in each sample in the training data 112 and generate a set of patches.
  • An example method for pre-processing the training data 112 is described below with respect to FIG. 4.
  • the pathology image neural network 106 may be trained to analyze the input pathology image 102 using a training method that includes an active learning framework as described further below with respect to FIGs. 2A-6.
  • the active learning framework may be advantageously configured to dynamically identify and select noisy samples, informative samples and confident predictive samples in the training data 112 during training.
  • the pre-processing module 104 and pathology image neural network 106 may be implemented on one or more processors (or processor devices) of a computer system such as, for example, any general-purpose computing system or device, such as a personal computer, workstation, cellular phone, smartphone, laptop, tablet, or the like.
  • the computer system may include any suitable hardware and component designed or capable of carrying out a variety of processing and control tasks, including, but not limited to, steps for receiving pathology image(s) of the subject, implementing the pre-processing module 104, implementing the pathology image neural network 106, providing the output 108 of the pathology image neural network 106 to a display 116 or storing the output 108 in data storage 114.
  • the computer system may include a programmable processor or combination of programmable processors, such as central processing units (CPUs), graphics processing units (GPUs), and the like.
  • the one or more processors of the computer system may be configured to execute instructions stored in non-transitory computer-readable media.
  • the computer system may be any device or system designed to integrate a variety of software, hardware, capabilities, and functionalities.
  • the computer system may be a special-purpose system or device.
  • such special purpose system or device may include one or more dedicated processing units or modules that may be configured (e.g., hardwired, or pre-programmed) to carry out steps, in accordance with aspects of the present disclosure.
  • FIG. 2A is a block diagram of a system for training a deep neural network (DNN) for analyzing pathology images in accordance with an embodiment
  • FIG. 2B is a block diagram of a process for generating a labeled dataset for training a DNN for analyzing pathology images in accordance with an embodiment.
  • the system 200 can include a training dataset (T) 202 including a set of labeled samples (Li) 204 and a set of unlabeled samples (Ui) 206, a pathology image neural network 106 including a pathology classifier 208 and a training loss and predictive entropy monitor 210, a complexity level classification module 212, and a sample selector 214.
  • the system 200 can advantageously consider three groups of samples (e.g., data points or images) from the training dataset 202, namely: 1) labeled (or annotated) samples from the labeled set 204 that are in the training dataset 202 and have a high probability of incorrect label assignment (i.e., noisy samples), denoted as the noisy sample set Ni (216); 2) unlabeled samples from the unlabeled set 206 that are the most informative to the current pathology image neural network model (i.e., informative samples), denoted as the informative sample set Ii (218); and 3) unlabeled samples from the unlabeled set 206 for which the current pathology image neural network model is confident in its predictions (i.e., confident (or confident predictive) samples), denoted as the confident (or confident predictive) sample set Ci (220).
  • the identified noisy sample set 216 (Ni), informative sample set 218 (Ii), and confident predictive set 220 (Ci) may be used to generate an updated set of labeled samples (Li+1) 222 which may be used to train the pathology image neural network 106 in the next iteration of the training process.
  • the updated set of labeled samples Li+1 may be generated by discarding the noisy samples Ni from the current set of labeled samples Li, requiring experts to annotate the informative samples Ii and then adding the informative samples with expert annotation to the current set of labeled samples Li, and automatically adding the confident predictive samples Ci to the current set of labeled samples Li, where the confident predictive samples 220 have assigned annotations (or labels) from the current pathology image neural network 106 (e.g., pathology classifier 208).
  • the pathology image neural network 106 is first trained with the current set of labeled (or annotated) samples (e.g., images) 204 from the training dataset 202.
  • the pathology image neural network 106 may be trained until the neural network 106 converges.
  • the current iteration of the trained pathology image neural network 106 is represented by the pathology classifier 208.
  • the pathology classifier 208 may be configured to perform one of various analysis tasks such as, for example, classification, segmentation, diagnosis, etc.
  • the training process may continue by utilizing the training loss and predictive entropy monitor 210 and the complexity level classification module 212 to select the noisy samples 216, informative samples 218 and confident predictive samples 220 from the training dataset 202.
  • the training loss and predictive entropy monitor 210 may be configured to force the pathology image neural network 106 to modulate between overfitting and underfitting over a plurality of epochs, and to determine and rank an average training loss for each labeled sample and an average predictive entropy for each unlabeled sample.
  • the training loss and predictive entropy monitor 210 may be configured to adjust one or more hyperparameters (e.g., learning rate) of the pathology image neural network 106 to cause the neural network to modulate between overfitting and underfitting.
  • a method for determining training loss and predictive entropy of samples in a training dataset using a training loss and predictive entropy monitor is discussed further below with respect to FIG. 6.
  • the complexity level classification module 212 may be configured to measure and classify a complexity level for each of the samples (labeled and unlabeled) in the training dataset 202. In some embodiments, each sample in the training dataset 202 may be classified with a complexity level of easy, medium or hard.
  • each sample in the training dataset 202 may be classified with a complexity level of easy, medium or hard based on their local density in feature space as discussed further below with respect to FIG. 5.
  • the complexity level classification module 212 may be implemented as a curriculum classification (CC) process.
  • the complexity level classification module 212 and the training loss and predictive entropy monitor 210 advantageously operate simultaneously. Accordingly, the average training loss and average predictive entropy may be determined simultaneously with the complexity level classification.
  • the sample selector 214 may be configured to identify and select: 1) the noisy samples 216 from the labeled samples based on the determined average training loss and complexity level classification; 2) the informative samples 218 from the unlabeled samples based on the determined predictive entropy; and 3) the confident predictive samples 220 from the unlabeled samples based on the determined average predictive entropy and complexity level classification.
  • the sample selector 214 is also configured to distinguish noisy samples from hard samples in the set of labeled samples 204.
  • the noisy samples 216 may be samples that have a large average training loss and are classified as easy by the complexity level classification module 212.
  • the informative samples 218 may be samples that have the highest average predictive entropy.
  • the confident predictive samples 220 may be samples that have the lowest average predictive entropy and are classified as easy or medium by the complexity level classification module 212.
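  • The three selection rules in the preceding bullets could be implemented as in the following sketch; the group sizes (n_noisy, n_informative, n_confident) are assumed tuning parameters, not values from the disclosure:

```python
import numpy as np

def select_samples(avg_loss, loss_level, avg_entropy, ent_level,
                   n_noisy=10, n_informative=10, n_confident=10):
    """avg_loss / loss_level describe the labeled samples; avg_entropy /
    ent_level describe the unlabeled samples; *_level entries are one of
    'easy', 'medium', 'hard' from the complexity level classification."""
    # Noisy: largest average training loss AND classified as "easy".
    by_loss_desc = np.argsort(avg_loss)[::-1]
    noisy = [i for i in by_loss_desc if loss_level[i] == "easy"][:n_noisy]

    # Informative: highest average predictive entropy (most uncertain).
    by_ent_desc = np.argsort(avg_entropy)[::-1]
    informative = list(by_ent_desc[:n_informative])

    # Confident predictive: lowest average predictive entropy AND
    # classified as "easy" or "medium".
    confident = [i for i in by_ent_desc[::-1]
                 if ent_level[i] in ("easy", "medium")][:n_confident]
    return noisy, informative, confident
```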
  • an updated set of labeled samples (Li+1) 222 for use in the next training iteration may be generated by discarding the noisy samples Ni, requesting human experts to annotate the informative samples Ii and adding the informative samples to Li after expert annotation, and adding the confident predictive samples with "pseudo labels" to Li.
  • the "pseudo-labels" may be assigned to the confident predictive samples 220 by the pathology classifier 208 of the pathology image neural network 106.
  • as training progresses, the training loss and predictive entropy monitor 210 and complexity level classification module 212 results may change. Therefore, in some embodiments, the identified noisy samples 216 may not be discarded completely and may be stored in data storage (e.g., data storage 114 shown in FIG. 1). The stored noisy samples 216 may be examined to determine if they need to be added back throughout the training process. Accordingly, system 200 may advantageously include a mechanism to correct errors made at the beginning of the training process.
  • the pathology image neural network 106 including the pathology classifier 208 and training loss and predictive entropy monitor 210, the complexity level classification module 212 and the sample selector 214 may be implemented on one or more processors (or processor devices) of a computer system such as, for example, any general-purpose computing system or device, such as a personal computer, workstation, cellular phone, smartphone, laptop, tablet, or the like.
  • the computer system may include any suitable hardware and component designed or capable of carrying out a variety of processing and control tasks, including, but not limited to, steps for receiving training dataset 202, implementing pathology image neural network 106 including the pathology classifier 208 and loss and predictive entropy monitor 210, the complexity level classification module 212 and the sample selector 214, or storing the selected samples 216, 218, 220 and updated set of labeled samples 222 in data storage (e.g., data storage 110 of system 100 shown in FIG. 1).
  • the computer system may include a programmable processor or combination of programmable processors, such as central processing units (CPUs), graphics processing units (GPUs), and the like.
  • the one or more processors of the computer system may be configured to execute instructions stored in non-transitory computer-readable media.
  • the computer system may be any device or system designed to integrate a variety of software, hardware, capabilities, and functionalities. Alternatively, and by way of particular configurations and programming, the computer system may be a special-purpose system or device. For instance, such special purpose system or device may include one or more dedicated processing units or modules that may be configured (e.g., hardwired, or pre-programmed) to carry out steps, in accordance with aspects of the present disclosure.
  • FIG. 3 illustrates a method for training a DNN for analyzing pathology images in accordance with an embodiment.
  • the process illustrated in FIG. 3 is described below as being carried out by the system 200 for training a DNN for analyzing pathology images as illustrated in FIG. 2A.
  • although the blocks of the process are illustrated in a particular order, in some embodiments one or more blocks may be executed in a different order than illustrated in FIG. 3, or may be bypassed.
  • a training dataset 202 (or training data 112 shown in FIG. 1) may be retrieved from data storage (e.g., data storage 110 shown in FIG. 1).
  • the training dataset 202 (e.g., training data 112) can include a plurality of pathology images (or samples or data points).
  • the training dataset 202 may be pre-processed.
  • the pre-processing module 104 (shown in FIG. 1) may be configured to identify the most informative tissue areas in each sample in the training data 112 and generate a set of patches. An example process for pre-processing a training dataset 202 is described below with respect to FIG. 4.
  • the training process including the disclosed adaptive learning framework can include a plurality of iterations (i).
  • the disclosed adaptive learning framework can include training the pathology image neural network 106 using the current set of labeled samples 204 in the training dataset 202 and then generating an updated set of labeled data 222 for the training dataset 202 that may be used to train the pathology image neural network 106 in the next training iteration.
  • the current set of labeled data 204 in the training dataset 202 may be used to train the neural network for analyzing pathology images 106 until, for example, the network 106 converges.
  • the pathology image neural network 106 may be trained to perform analysis tasks or functions including, but not limited to, classification, segmentation, disease (e.g., cancer) diagnosis, etc.
  • the pathology image neural network 106 may be configured to generate or predict a grade for a particular disease, to generate a segmentation map, or to predict a diagnosis for a disease (e.g., cancer).
  • the pathology image neural network can be a deep neural network (DNN) such as, for example, a convolutional neural network (CNN).
  • each sample in the training dataset 202 may be classified with one of a plurality of classification levels using, for example, the complexity level classification module 212.
  • each sample in the training dataset 202 (both labeled samples 204 and unlabeled samples 206) may be classified with a complexity level of easy, medium or hard.
  • each sample in the training dataset 202 may be classified with a complexity level of easy, medium or hard based on their local density in feature space as discussed further below with respect to FIG. 5.
  • the complexity level classification module 212 may be implemented as a curriculum classification (CC) process.
  • an average training loss for each sample in the set of labeled samples 204 may be determined and, at block 310, an average predictive entropy for each sample in the set of unlabeled samples 206 may be determined.
  • the training loss and predictive entropy monitor 210 may be configured to force the pathology image neural network 106 to modulate between overfitting and underfitting over a plurality of epochs (e.g., by adjusting one or more hyperparameters of the pathology image neural network 106), and to determine and rank an average training loss for each labeled sample and an average predictive entropy for each unlabeled sample.
  • a method for determining training loss and predictive entropy of samples in a training dataset using a training loss and predictive entropy monitor is discussed further below with respect to FIG. 6.
  • the complexity level classification of the samples in the training dataset 202 at block 306 and the determination of the average training loss at block 308 and average predictive entropy at block 310 may be performed simultaneously.
  • the complexity level classification, the determined average training loss and the determined average predictive entropy may be used, for example, by a sample selector 214, to detect and select noisy samples 216 from the set of labeled samples 204 in the training dataset 202 and to detect and select informative samples 218 and confident predictive samples 220 from the set of unlabeled samples 206 in the training dataset 202.
  • noisy samples 216 may be identified and selected from the set of labeled samples 204 based on the assigned complexity level and determined average training loss for each labeled sample.
  • noisy samples 216 may be samples that have a large average training loss and are classified as easy by the complexity level classification module 212.
  • a sample that is classified as "easy" based on its complexity but has a large average training loss variation may be more likely to be annotated (or labeled) incorrectly, i.e., there is a higher probability that the pathologist annotations for these samples contain noise.
  • informative samples may be identified and selected from the set of unlabeled samples 206 based on the determined average predictive entropy for each unlabeled sample.
  • the informative samples 218 (i.e., those requiring additional expert annotation) may be the samples that have the highest average predictive entropy.
  • the confident predictive samples 220 may be identified and selected from the set of unlabeled samples 206 based on the assigned complexity level and determined average predictive entropy for each unlabeled sample.
  • the confident predictive samples 220 may be samples that have the lowest average predictive entropy and are classified as easy or medium by the complexity level classification module 212. Accordingly, there may be a high probability or confidence that predictions (or labels) by the current pathology image neural network 106 model (e.g., the pathology classifier 208) for these samples are correct.
  • an updated set of labeled samples (Li+1) 222 for the training dataset 202 for use in the next training iteration may be generated by removing (or discarding) the noisy samples Ni, requesting human experts to annotate the informative samples Ii and adding them to Li, and adding the confident predictive samples Ci with "pseudo labels" to Li.
  • the "pseudo-labels" may be assigned to the confident predictive samples 220 by the pathology classifier 208 of the pathology image neural network 106.
  • the updated set of labeled samples (Li+i) 222 may be used to train the pathology image neural network 106, for example, until the network 106 converges.
  • the process may return to block 306 and repeat blocks 306 to 320 for each iteration. If at block 322 the training process has reached the last iteration, at block 324 the trained pathology image neural network 106 may be stored in data storage (e.g., device storage 716 of computer system 700 shown in FIG. 7).
  • each sample (e.g., image) in the training dataset 202 may be large and it may be advantageous to locate areas of concern on which to focus during the training of the pathology image neural network 106.
  • a tiling algorithm with a blue-ratio selection criterion may be used to identify the most informative tissue areas in each sample.
  • a binary tissue mask 404 of the tissue in the original image 402 may be created by setting a threshold for the average intensity in the original image 402. For example, this threshold may be set empirically to 90% of the maximum image intensity value in the original image 402.
  • the tissue mask 404 may be smoothed using morphological closing and the skeleton (or spline) 406 of the smoothed mask may then be found and branches may be removed by finding the endpoints with the maximum geodesic distance.
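  • A sketch of the masking and skeletonization steps using scikit-image; the disk radius is an assumed smoothing parameter, and the branch pruning by maximum geodesic distance is omitted for brevity:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.morphology import binary_closing, disk, skeletonize

def tissue_mask_and_skeleton(rgb_image: np.ndarray):
    gray = rgb2gray(rgb_image)
    # Tissue is darker than the bright slide background, so keep pixels
    # below 90% of the maximum image intensity.
    mask = gray < 0.9 * gray.max()
    mask = binary_closing(mask, disk(5))   # smooth small holes and gaps
    skeleton = skeletonize(mask)           # mid-line (spline) of the tissue
    return mask, skeleton
```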
  • the mid-line may be partitioned based on a patch size and overlap, tangent lines may be found at each of these locations by looking at, for example, the neighborhood of nine pixels along the mid-line, and a perpendicular line may be drawn until intersection with the mask boundary.
  • a set of patches 408 that intersect more than a predetermined percentage (e.g., 60%) with the mask 404 may be chosen to calculate their blue ratio, and the top k blue-ratio patches 410 may be selected.
  • a patch size of 256 × 256 pixels may be used, and 36 patches may be selected for each image (e.g., image of a slide).
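  • The disclosure does not give the blue-ratio formula; the sketch below uses one common definition from the histopathology literature (blue-ratio images emphasize hematoxylin-stained nuclei) to rank candidate patches and keep the top k:

```python
import numpy as np

def blue_ratio(patch: np.ndarray) -> float:
    """Mean blue ratio of an RGB uint8 patch of shape (H, W, 3)."""
    r, g, b = (patch[..., i].astype(float) for i in range(3))
    br = (100.0 * b / (1.0 + r + g)) * (256.0 / (1.0 + r + g + b))
    return float(br.mean())

def top_k_patches(patches, k=36):
    # Keep the k candidate patches with the highest blue ratio (k = 36 above).
    return sorted(patches, key=blue_ratio, reverse=True)[:k]
```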
  • FIG. 5 illustrates a method for complexity level classification in accordance with an embodiment.
  • the process illustrated in FIG. 5 is described below as being carried out by the system 200 for training a DNN for analyzing pathology images as illustrated in FIG. 2A.
  • although the blocks of the process are illustrated in a particular order, in some embodiments one or more blocks may be executed in a different order than illustrated in FIG. 5, or may be bypassed.
  • one aspect of the disclosed training process and active learning framework can be classifying each sample in the training dataset 202 (labeled 204 and unlabeled 206 samples) with a complexity level classification.
  • the complexity level classification can be performed using a complexity level classification module 212.
  • the complexity level classification is advantageously implemented using curriculum classification (or curriculum learning) to classify each sample in the training dataset 202 with a complexity level, for example, easy, medium, or hard.
  • the current pathology image network model 106 (e.g., pathology classifier 208) may be used to compute a deep representation (i.e., fully-convolutional features) for each sample (e.g., image) in the training dataset 202.
  • This step may be used to roughly map all training samples (or images) into a feature space where the underlying structure and the complexity of the images can be discovered. Each sample may then be classified into different complexity levels, ranging from easy samples with high-signal labels to difficult samples whose labels may contain noise.
  • the dimension of the deep features may first be reduced using, for example, t-distributed Stochastic Neighbor Embedding (t-SNE). With this set of reduced features, at block 506 K-means may be used to cluster the samples (and deep image features) into different groups (or subsets). In some embodiments, each group will ideally contain images with similar diagnoses. This step may be used to help the following process select representative samples covering the whole training sample space.
  • a Euclidean distance matrix D may be calculated as Dij = ||f(Ii) − f(Ij)||², where n is the number of images in the same group, Ii and Ij are two images in this group, and f(Ii), f(Ij) are the feature vectors of the two images in deep feature space. Dij indicates a similarity value between Ii and Ij. Then, at block 510, a local density (ρi) may be calculated for each image as the number of images in the group closer to Ii than a cutoff distance dc, i.e., ρi = Σj χ(Dij − dc), where χ(x) = 1 if x < 0 and 0 otherwise.
  • each sample in the training dataset 202 may be classified with a complexity level based on the local density.
  • a K-means clustering method may be used to classify each sample as easy, medium, or hard based on their local density for each group.
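  • The complexity level classification described in the preceding bullets could be sketched as follows; the number of groups and the density-cutoff percentile are assumed parameters, and computing distances on the t-SNE-reduced features is one possible reading of the text:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

def complexity_levels(deep_features, n_groups=4, cutoff_percentile=60):
    # Reduce the deep features with t-SNE, then (block 506) cluster the
    # samples into groups that ideally share similar diagnoses.
    reduced = TSNE(n_components=2).fit_transform(deep_features)
    groups = KMeans(n_clusters=n_groups, n_init=10).fit_predict(reduced)

    levels = np.empty(len(deep_features), dtype=object)
    names = np.array(["hard", "medium", "easy"])  # low -> high local density
    for g in range(n_groups):
        idx = np.flatnonzero(groups == g)
        # Squared Euclidean distance matrix within the group.
        D = pairwise_distances(reduced[idx]) ** 2
        # Block 510: local density = number of neighbours within cutoff d_c.
        d_c = np.percentile(D, cutoff_percentile)
        rho = (D < d_c).sum(axis=1)
        # Cluster the densities into three levels: denser neighbourhoods are
        # treated as easier (labels there carry a stronger signal).
        km = KMeans(n_clusters=3, n_init=10).fit(rho.reshape(-1, 1))
        order = np.argsort(km.cluster_centers_.ravel())
        rank = np.empty(3, dtype=int)
        rank[order] = np.arange(3)
        levels[idx] = names[rank[km.labels_]]
    return levels
```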
  • the complexity level classification (e.g., easy, medium, hard) for each sample (or image) may be stored in data storage (e.g., data storage 114 shown in FIG. 1).
  • FIG. 6 illustrates a method for determining training loss and predictive entropy of samples in a training dataset in accordance with an embodiment. The process illustrated in FIG. 6 is described below as being carried out by the system 200 as illustrated in FIG. 2A.
  • the training loss and predictive entropy monitor 210 may be configured to cycle training between underfitting and overfitting while observing the variation of training loss for the labeled samples 204 in the training dataset 202 and the predictive entropy for the unlabeled samples 206 in the training dataset 202.
  • the training loss and predictive entropy monitor 210 may be configured to adjust one or more hyperparameters (e.g., learning rate) of the pathology image neural network 106 to cause the neural network to modulate between overfitting and underfitting. In each iteration of the disclosed training process, this monitoring occurs after the pathology image neural network 106 converges as described above with respect to blocks 304 and 320 of FIG. 3.
  • the training loss of each labeled sample and predictive entropy of each unlabeled sample may be monitored over a plurality of epochs and used to determine an average training loss for each of the labeled samples 204 and an average predictive entropy for each of the unlabeled samples.
  • blocks 602 to 608 are performed for each epoch during an iteration of the disclosed active learning framework of the training process.
  • a hyperparameter of the pathology image neural network 106 (for example, a learning rate) may be set or adjusted using, for example, the training loss and predictive entropy monitor 210. In some embodiments, the learning rate may initially be set as a large learning rate.
  • the learning rate may be adjusted for each epoch so that the pathology image neural network 106 may transition from overfitting to underfitting cyclically.
  • the learning rate may be adjusted based on a cosine annealing function in each cyclic round (e.g., in each epoch) as lr(t) = lrmin + ½(lrmax − lrmin)(1 + cos(tπ/Tmax)), where t is the current epoch within the cycle, Tmax is the epoch for one cycle, and lrmin and lrmax are the minimum and maximum learning rates in one cycle.
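  • The reconstructed schedule above matches standard cosine annealing; as a small self-contained example:

```python
import math

def cyclic_lr(t: int, t_max: int, lr_min: float, lr_max: float) -> float:
    """Cosine-annealed learning rate at epoch t within a cycle of t_max epochs:
    lr_max at t = 0 decaying to lr_min at t = t_max."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(t * math.pi / t_max))

# Example: three 10-epoch cycles; the large rate at each cycle start pushes the
# network toward underfitting, and the small rate at the end lets it (over)fit.
schedule = [cyclic_lr(t % 10, 10, 1e-5, 1e-2) for t in range(30)]
```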
  • a training loss for each labeled sample in the set of labeled samples 204 may be determined and stored in data storage (e.g., data storage 114 shown in FIG. 1) using, for example, the training loss and predictive entropy monitor 210.
  • the training loss and predictive entropy monitor 210 may be implemented using a noise cleansing technique.
  • the pathology image neural network 106 may be updated.
  • a predictive entropy for each unlabeled sample in the set of unlabeled samples may be determined and stored in data storage (e.g., data storage 114) using, for example, the training loss and predictive entropy monitor 210.
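  • The predictive entropy itself is not defined in the text above; a standard Shannon-entropy reading over the softmax outputs, assuming a classifier that returns per-class logits, would be:

```python
import torch

@torch.no_grad()
def predictive_entropy(model, images: torch.Tensor) -> torch.Tensor:
    """H(p) = -sum_c p_c log p_c per sample; higher entropy = less certain."""
    probs = torch.softmax(model(images), dim=1)
    return -(probs * torch.log(probs.clamp_min(1e-12))).sum(dim=1)
```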
  • if the last epoch has not been reached, the process returns to block 602.
  • the number of epochs may be determined to ensure that enough training loss and predictive entropy data is collected. If the last epoch has been reached, an average training loss for each labeled sample and an average predictive entropy for each unlabeled sample may be determined.
  • an average training loss (lossi) for each labeled sample may be determined using the collected training loss data for the sample over all of the epochs and stored in data storage (e.g., data storage 114). In some embodiments, the average training loss for each sample may be a normalized average training loss.
  • an average predictive entropy (enti) for each unlabeled sample may be determined using the collected predictive entropy data for the sample over all of the epochs and stored in data storage (e.g., data storage 114). In some embodiments, the average predictive entropy for each sample may be a normalized average predictive entropy.
  • all of the labeled samples may be ranked based on the average training loss and all of the unlabeled samples may be ranked based on the average predictive entropy.
  • the ranks for the labeled samples and the ranks for the unlabeled samples may be stored in data storage.
  • the complexity level classifications and the average training loss and average predictive entropy may advantageously be used (e.g., by a sample selector 214) to identify and select noisy samples 216 from the set of labeled samples 204, and to identify and select informative samples 218 and confident predictive samples 220 from the set of unlabeled samples 206.
  • the complexity level classifications and the average training loss and average predictive entropy may also advantageously be used to distinguish between noisy and hard samples.
  • a labeled sample that is classified as "easy" based on its complexity but has a large training loss variation is more likely to be annotated incorrectly (i.e., a noisy sample); while a labeled sample that is classified as "hard" for its complexity but has a large training loss variation is more likely to be a hard sample.
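  • That heuristic reduces to a simple rule; the loss threshold below is an assumed free parameter, not a value from the disclosure:

```python
def triage_labeled_sample(complexity: str, avg_loss: float,
                          loss_threshold: float = 0.8) -> str:
    """Distinguish mislabeled ('noisy') samples from merely hard ones."""
    if avg_loss <= loss_threshold:
        return "keep"                       # the network fits this sample
    # High loss on an "easy" sample suggests a wrong label; high loss on a
    # "hard" sample suggests an atypical but correctly labeled sample.
    return "discard as noisy" if complexity == "easy" else "keep as hard"
```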
  • FIG. 7 is a block diagram of an example computer system in accordance with an embodiment.
  • Computer system 700 may be used to implement the systems and methods described herein.
  • the computer system 700 may be a workstation, a notebook computer, a tablet device, a mobile device, a multimedia device, a network server, a mainframe, one or more controllers, one or more microcontrollers, or any other general-purpose or application-specific computing device.
  • the computer system 700 may operate autonomously or semi-autonomously, or may read executable software instructions from the memory or storage device 716 or a computer-readable medium (e.g., a hard drive, a CD-ROM, flash memory), or may receive instructions via the input device 720 from a user, or any other source logically connected to a computer or device, such as another networked computer or server.
  • the computer system 700 can also include any suitable device for reading computer-readable storage media.
  • Data such as data acquired with, for example, an imaging system for acquiring pathology images, may be provided to the computer system 700 from a data storage device 716, and these data are received in a processing unit 702.
  • the processing unit 702 includes one or more processors.
  • the processing unit 702 may include one or more of a digital signal processor (DSP) 704, a microprocessor unit (MPU) 706, and a graphic processing unit (GPU) 708.
  • the processing unit 702 also includes a data acquisition unit 710 that is configured to electronically receive data to be processed.
  • the DSP 704, MPU 706, GPU 708, and data acquisition unit 710 are all coupled to a communication bus 712.
  • the communication bus 712 may be, for example, a group of wires, or hardware used for switching data between the peripherals or between any components in the processing unit 702.
  • the processing unit 702 may also include a communication port 714 in electronic communication with other devices, which may include a storage device 716, a display 718, and one or more input devices 720.
  • Examples of an input device 720 include, but are not limited to, a keyboard, a mouse, and a touch screen through which a user can provide an input.
  • the storage device 716 may be configured to store data, which may include data such as, for example, training data, acquired pathology images, classification data, segmentation data, training loss data, predictive entropy data, etc., whether these data are provided to, or processed by, the processing unit 702.
  • the display 718 may be used to display images and other information, such as patient health data, and so on.
  • the processing unit 702 can also be in electronic communication with a network 722 to transmit and receive data and other information.
  • the communication port 714 can also be coupled to the processing unit 702 through a switched central resource, for example the communication bus 712.
  • the processing unit 702 can also include temporary storage 724 and a display controller 726.
  • the temporary storage 724 is configured to store temporary information.
  • the temporary storage can be a random access memory.
  • Computer-executable instructions for training a neural network for analyzing pathology images may be stored on a form of computer readable media.
  • Computer readable media includes volatile and nonvolatile, removable, and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer readable media includes, but is not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable ROM (EEPROM), flash memory or other memory technology, compact disk ROM (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired instructions and which may be accessed by a system (e.g., a computer), including by internet or other computer network form of access.

Abstract

A method for training a neural network for analyzing pathology images includes receiving a training dataset comprising a set of labeled samples and a set of unlabeled samples, training the neural network using the set of labeled samples until the neural network converges, identifying at least one noisy sample in the set of labeled samples, identifying at least one informative sample for further expert annotation and at least one confident predictive sample in the set of unlabeled samples and generating an updated set of labeled samples for the training dataset. The updated set of labeled samples may be generated by removing the identified noisy samples, adding the identified informative samples with further expert annotation and adding the identified confident predictive samples. The method may further include training the neural network using the updated set of labeled samples and storing the trained neural network in a data storage.

Description

SYSTEM AND METHOD FOR PATHOLOGY IMAGE ANALYSIS USING A TRAINED NEURAL NETWORK AND ACTIVE LEARNING FRAMEWORK
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is based on, claims priority to, and incorporates herein by reference in its entirety U.S. Serial No. 63/228,742, filed August 3, 2021 and entitled "Active Learning Framework for Pathology Image Analysis".
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH
[0002] This invention was made with government support under Grant Number CA220352, awarded by the National Institutes of Health. The government has certain rights in the invention.
BACKGROUND
[0003] Deep neural networks (DNNs) have achieved great success in a wide variety of medical image analysis tasks. DNNs, in particular convolutional neural networks, have rapidly become a popular choice for analyzing histopathology images. However, training these models relies heavily on a large number of samples annotated by experts. In addition, noise-free expert annotations are crucial to achieve high performance. It is difficult, however, to obtain a perfect set of labels due to the variability between expert annotations. Furthermore, in medical image analysis, obtaining enough annotations can be expensive and time-consuming for many tasks. This is especially true for histopathology image analysis. In histopathology image analysis, the size of the collected dataset can be large, but performing annotations requires years of professional training and domain knowledge. In addition, the labels provided by different pathologists can demonstrate high inter-reader variability. For example, in prostate cancer grading using Gleason scoring, the concordance rate of multiple pathologists can be as low as 57.9%, which results in noisy annotations. DNNs are capable of fitting to noisy annotations, but they may not generalize to unseen data, which is an important component of clinical applications. Furthermore, it is challenging to distinguish mislabeled samples from hard samples. Mislabeled samples are samples with incorrect annotations, while hard samples have the correct label, but the samples themselves are not "typical." The lack of large and noise-free annotation sets is a significant challenge in histopathology image analysis, preventing DNNs from scaling to the size of the collected data.
[0004] Recent studies have investigated methods for dealing with annotation challenges in medical imaging. One solution is to use active learning (AL). AL aims to reduce the amount of labeled data necessary for the learning task. It employs various sampling methods to select samples from an unlabeled set. The selected samples are then annotated by experts and used to train the model. A carefully designed sampling method can reduce the overall number of labeled data points required to train the model and make the model robust to class imbalances. However, traditional AL methods do not address the noisy label issue.
[0005] Few studies have sought to detect noisy labels in training data and enhance the performance of DNNs in medical image analysis. Specifically, addressing the issue of noisy labels remains an ongoing challenge for the medical imaging analysis community. In one prior study, a noisy channel was adopted in neural networks, which models the stochastic relation between the correct label and the observed noisy label. In another prior study, an online uncertainty sampling strategy was proposed to suppress the noisy samples. However, these methods do not distinguish mislabeled samples from hard samples. Making this distinction could greatly improve histopathology image analysis tasks with noisy labels.
[0006] There is a need for systems and methods for training and generating a neural network for analyzing pathology images that includes a sample selection technique that can detect noisy samples and reduce the amount of data required to train the neural network including reducing the required number of expert annotations.
SUMMARY
[0007] In accordance with an embodiment, a method for training a neural network for analyzing pathology images includes receiving a training dataset comprising a set of labeled samples and a set of unlabeled samples, training the neural network for analyzing pathology images using the set of labeled samples until the neural network converges, identifying at least one noisy sample in the set of labeled samples and identifying at least one informative sample in the set of unlabeled samples for further expert annotation and at least one confident predictive sample in the set of unlabeled samples. The method may further include generating an updated set of labeled samples for the training dataset by removing the identified at least one noisy sample from the set of labeled samples, adding the identified at least one informative sample with further expert annotation to the set of labeled samples and adding the identified at least one confident predictive sample to the set of labeled samples. The method may further include training the neural network for analyzing pathology images using the updated set of labeled samples and storing the trained neural network for analyzing pathology images in a data storage.
[0008] In accordance with another embodiment, a system for analyzing pathology images includes an input for receiving at least one pathology image of a subject and a neural network coupled to the input and configured to analyze the at least one pathology image. The neural network may be trained using a plurality of training iterations. For at least one iteration, the neural network may be trained using a set of labeled samples generated by removing one or more noisy samples from an initial set of labeled samples in a training dataset, adding one or more informative samples with further expert annotation to the initial set of labeled samples and adding one or more confident predictive samples to the initial set of labeled samples.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The present invention will hereafter be described with reference to the accompanying drawings, wherein like reference numerals denote like elements.
[0010] FIG. 1 is a block diagram of a system for analyzing pathology images using a deep neural network (DNN) in accordance with an embodiment;
[0011] FIG. 2A is a block diagram of a system for training a DNN for analyzing pathology images in accordance with an embodiment;
[0012] FIG. 2B is a block diagram of a process for generating a labeled dataset for training a DNN for analyzing pathology images in accordance with an embodiment;
[0013] FIG. 3 illustrates a method for training a DNN for analyzing pathology images in accordance with an embodiment;
[0014] FIG. 4 illustrates an example process for pre-processing training data in accordance with an embodiment;
[0015] FIG. 5 illustrates a method for complexity level classification in accordance with an embodiment;
[0016] FIG. 6 illustrates a method for determining training loss and predictive entropy of samples in a training dataset in accordance with an embodiment; and
[0017] FIG. 7 is a block diagram of an example computer system in accordance with an embodiment.
DETAILED DESCRIPTION
[0018] The present disclosure describes a system and method for training and generating a neural network for analyzing pathology images (e.g., histopathology images) using an active learning framework. In some embodiments, the neural network may be trained and configured to perform analysis tasks or functions such as, for example, classification, segmentation, disease (e.g., cancer) diagnosis, etc. The active learning framework can be advantageously configured to include a sample selection technique that automatically detects, during training of the neural network, noisy (e.g., mislabeled data points) samples from labeled samples in a training dataset as well as informative and confident predictive samples from unlabeled samples in the training dataset. In some embodiments, the sample selection technique of the disclosed training process advantageously combines both uncertainty and representativeness measures to detect noisy, informative and confident predictive samples in the training dataset. The identified noisy, informative and confident predictive samples may be used to reprioritize samples for training of the neural network, which can result in a more robust model that requires less data to train the neural network. The disclosed active learning framework or technique for training a pathology image analysis neural network may be useful in clinical applications with real-world training datasets where ground truth labels may be imperfect.
[0019] In some embodiments, the training technique for a neural network for analyzing pathology images can be configured to, in each training iteration, systematically identify noisy samples in a set of labeled samples in the training dataset and exclude the noisy samples from the set of labeled samples. By identifying and removing the noisy samples in the set of labeled samples in each training iteration, the disclosed training technique can reduce the impact of noisy samples with noisy labels in the training dataset and improve generalization of the neural network model. In some embodiments, the training technique may also be configured to, in each training iteration, identify and select two groups of data from the set of unlabeled samples in the training dataset including 1) one "informative" sample group (e.g., a set of most uncertain samples) that requires additional expert annotation (e.g., by a domain expert or oracle), and 2) one "confident predictive" sample group (e.g., a set of highly certain samples) where each confident predictive sample includes pseudo labels generated and assigned by the neural network model for analyzing pathology images. The confident predictive samples may be automatically added to the set of labeled samples in the training dataset and the informative samples may be added to the set of labeled samples in the training dataset after expert (or oracle) annotation. Advantageously, in each iteration of the training process an updated set of labeled samples for the next iteration of the training process may be generated by removing the identified noisy samples, adding the identified informative samples with additional expert annotations, and adding the confident predictive samples with pseudo labels. Accordingly, the sample selection strategy of the disclosed training technique can improve performance of the trained neural network (e.g., classification performance) with fewer manual annotations. In some embodiments, in each iteration of the training process, each sample in the training dataset (labeled and unlabeled) may be classified based on complexity, for example, easy, medium, or hard. In some embodiments, the disclosed training technique may also be configured to distinguish between noisy samples and hard samples using a heuristic approach. Advantageously, the noisy samples can be excluded from the training dataset while the hard samples can be preserved in the training dataset to improve performance of the pathology image analysis neural network model.
[0020] FIG. 1 is a block diagram of a system for analyzing pathology images using a deep neural network in accordance with an embodiment. System 100 can include an input of one or more pathology images 102 of a subject, a pre-processing module 104, a trained pathology image neural network 106, an output 108 of the pathology image neural network 106, data storage 110, 114, and a display 116. In some embodiments, the input pathology image(s) 102 of the subject may be, for example, histopathology images. The input pathology image(s) may be acquired using known imaging systems and methods. For example, in some embodiments the input pathology image 102 may be an image of a tissue specimen of a subject (e.g., a biopsy or surgical specimen) acquired using a slide scanner. In some embodiments, the input pathology image(s) 102 may be retrieved from data storage (or memory) 110 of system 100, data storage of an imaging system, or data storage of other computer systems.
[0021] The pathology image(s) 102 may be provided as an input to the pathology image neural network 106. In some embodiments, the input pathology image(s) 102 may be pre-processed using the pre-processing module 104. For example, an input pathology image 102 of the subject may be processed to generate one or more input patches. In some embodiments, the pathology image neural network 106 may be trained to perform analysis tasks or functions including, but not limited to, classification, segmentation, disease (e.g., cancer) diagnosis, etc. For example, the pathology image neural network 106 may be configured to generate or predict a grade for a particular disease, to generate a segmentation map, or to predict a diagnosis for a disease (e.g., cancer). In some embodiments, the pathology image neural network can be a deep neural network (DNN) such as, for example, a convolutional neural network (CNN). The pathology image neural network 106 may be implemented using known deep neural network architectures. In some embodiments, the pathology image neural network 106 may be implemented using an EfficientNet-B0 architecture. The pathology image neural network may be configured to generate an output 108 based on the analysis of the input pathology image 102 such as, for example, a classification for the input pathology image 102 or for portions of the input pathology image 102, a diagnosis, a prediction or probability of a particular classification or diagnosis, a segmentation map, an image, etc. The output 108 may be displayed on a display 116 (e.g., display 718 of the computer system 700 shown in FIG. 7). The output 108 may also be stored in data storage, for example, data storage 114 (e.g., device storage 716 of computer system 700 shown in FIG. 7).
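By way of illustration, the following is a minimal sketch of a patch-level classifier embodiment of the pathology image neural network 106, built on an EfficientNet-B0 backbone with PyTorch/torchvision. The number of output classes and the mean-pooled slide-level aggregation are illustrative assumptions rather than details fixed by the disclosure; the 256 × 256 patch size and the 36 patches per image follow the pre-processing example described below with respect to FIG. 4.

    # A minimal sketch of a patch-level pathology classifier (network 106) using
    # an EfficientNet-B0 backbone. num_classes and the mean-pooling aggregation
    # are illustrative assumptions, not values fixed by the disclosure.
    import torch
    from torchvision.models import efficientnet_b0

    num_classes = 4  # hypothetical: e.g., benign plus three cancer grades
    model = efficientnet_b0(weights=None, num_classes=num_classes)
    model.eval()

    patches = torch.randn(36, 3, 256, 256)   # 36 patches from one slide image
    with torch.no_grad():
        logits = model(patches)              # (36, num_classes)
    probs = logits.softmax(dim=1)            # per-patch class probabilities
    slide_pred = probs.mean(dim=0).argmax()  # simple mean-pooled slide-level label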
[0022] In some embodiments, the pathology image neural network 106 can be trained using training data (or dataset) 112. In some embodiments, the training data 112 includes a plurality of pathology images (or samples or data points) and can include pathology images both with and without manual annotations. In some embodiments, the training data 112 may be, for example, histopathology images. The training data 112 may be acquired using known imaging systems and methods. For example, in some embodiments the training data 112 may include images of tissue specimens acquired using a slide scanner. In some embodiments, the training data 112 may be retrieved from data storage (or memory) 110 of system 100, data storage of an imaging system, or data storage of other computer systems. In some embodiments, the training data 112 may be pre-processed using the pre-processing module 104. For example, pre-processing module 104 may be configured to identify the most informative tissue areas in each sample in the training data 112 and generate a set of patches. An example method for pre-processing the training data 112 is described below with respect to FIG. 4. The pathology image neural network 106 may be trained to analyze the input pathology image 102 using a training method that includes an active learning framework as described further below with respect to FIGs. 2A-6. The active learning framework may be advantageously configured to dynamically identify and select noisy samples, informative samples, and confident predictive samples in the training data 112 during training.
[0023] In some embodiments, the pre-processing module 104 and pathology image neural network 106 may be implemented on one or more processors (or processor devices) of a computer system such as, for example, any general-purpose computing system or device, such as a personal computer, workstation, cellular phone, smartphone, laptop, tablet, or the like. As such, the computer system may include any suitable hardware and components designed for or capable of carrying out a variety of processing and control tasks, including, but not limited to, steps for receiving pathology image(s) of the subject, implementing the pre-processing module 104, implementing the pathology image neural network 106, providing the output 108 of the pathology image neural network 106 to a display 116, or storing the output 108 in data storage 114. For example, the computer system may include a programmable processor or combination of programmable processors, such as central processing units (CPUs), graphics processing units (GPUs), and the like. In some implementations, the one or more processors of the computer system may be configured to execute instructions stored in non-transitory computer-readable media. In this regard, the computer system may be any device or system designed to integrate a variety of software, hardware, capabilities, and functionalities. Alternatively, and by way of particular configurations and programming, the computer system may be a special-purpose system or device. For instance, such special-purpose system or device may include one or more dedicated processing units or modules that may be configured (e.g., hardwired, or pre-programmed) to carry out steps, in accordance with aspects of the present disclosure.
[0024] FIG. 2A is a block diagram of a system for training a deep neural network (DNN) for analyzing pathology images in accordance with an embodiment, and FIG. 2B is a block diagram of a process for generating a labeled dataset for training a DNN for analyzing pathology images in accordance with an embodiment. The system 200 can include a training dataset (T) 202 including a set of labeled samples (Li) 204 and a set of unlabeled samples (Ui) 206, a pathology image neural network 106 including a pathology classifier 208 and a training loss and predictive entropy monitor 210, a complexity level classification module 212, and a sample selector 214. As mentioned above, during each training iteration, the system 200 can advantageously consider three groups of samples (e.g., data points or images) from the training dataset 202, namely: 1) labeled (or annotated) samples from the labeled set 204 that have a high probability of incorrect label assignment (i.e., noisy samples), denoted as the noisy sample set Ni (216); 2) unlabeled samples from the unlabeled set 206 that are the most informative to the current pathology image neural network model (i.e., informative samples), denoted as the informative sample set Ii (218); and 3) unlabeled samples from the unlabeled set 206 for which the current pathology image neural network model is confident in its predictions (i.e., confident (or confident predictive) samples), denoted as the confident (or confident predictive) sample set Ci (220). The identified noisy sample set 216 (Ni), informative sample set 218 (Ii), and confident predictive set 220 (Ci) may be used to generate an updated set of labeled samples (Li+1) 222, which may be used to train the pathology image neural network 106 in the next iteration of the training process. In some embodiments, as illustrated in FIG. 2B, the updated set of labeled samples Li+1 may be generated by discarding the noisy samples Ni from the current set of labeled samples Li, requiring experts to annotate the informative samples Ii and then adding the informative samples with expert annotation to the current set of labeled samples Li, and automatically adding the confident predictive samples Ci to the current set of labeled samples Li, where the confident predictive samples 220 have annotations (or labels) assigned by the current pathology image neural network 106 (e.g., pathology classifier 208). Thus, the updated set of labeled samples 222 may be given by:
Li+1 = (Li \ Ni) ∪ Ii ∪ Ci, Eqn. 1

where Ni ⊆ Li, Ii ⊆ Ui, and Ci ⊆ Ui, and the training dataset (202) T = Li + Ui.

[0025] For each iteration of the training process, the pathology image neural network 106 is first trained with the current set of labeled (or annotated) samples (e.g., images) 204 from the training dataset 202. In some embodiments, the pathology image neural network 106 may be trained until the neural network 106 converges. In FIG. 2A, the current iteration of the trained pathology image neural network 106 is represented by the pathology classifier 208. It should be understood that the pathology classifier 208 may be configured to perform one of various analysis tasks such as, for example, classification, segmentation, diagnosis, etc. Next, the training process may continue by utilizing the training loss and predictive entropy monitor 210 and the complexity level classification module 212 to select the noisy samples 216, informative samples 218, and confident predictive samples 220 from the training dataset 202. The training loss and predictive entropy monitor 210 may be configured to force the pathology image neural network 106 to modulate between overfitting and underfitting over a plurality of epochs, and to determine and rank an average training loss for each labeled sample and an average predictive entropy for each unlabeled sample. In some embodiments, the training loss and predictive entropy monitor 210 may be configured to adjust one or more hyperparameters (e.g., learning rate) of the pathology image neural network 106 to cause the neural network to modulate between overfitting and underfitting. A method for determining training loss and predictive entropy of samples in a training dataset using a training loss and predictive entropy monitor is discussed further below with respect to FIG. 6. The complexity level classification module 212 may be configured to measure and classify a complexity level for each of the samples (labeled and unlabeled) in the training dataset 202. In some embodiments, each sample in the training dataset 202 may be classified with a complexity level of easy, medium, or hard. In some embodiments, each sample in the training dataset 202 may be classified with a complexity level of easy, medium, or hard based on its local density in feature space, as discussed further below with respect to FIG. 5. In some embodiments, the complexity level classification module 212 may be implemented as a curriculum classification (CC) process. In some embodiments, the complexity level classification module 212 and the training loss and predictive entropy monitor 210 advantageously operate simultaneously. Accordingly, the average training loss and average predictive entropy may be determined simultaneously with the complexity level classification.
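As a concrete illustration of Eqn. 1, the following is a minimal sketch of the per-iteration labeled-set update, with samples tracked by identifier. The oracle_annotate and pseudo_label callables (expert labeling and the current classifier's prediction, respectively) are hypothetical placeholders, not functions named in the disclosure.

    # A minimal sketch of the labeled-set update of Eqn. 1:
    # Li+1 = (Li \ Ni) + expert-annotated Ii + pseudo-labeled Ci.
    # `oracle_annotate` and `pseudo_label` are hypothetical placeholders.
    def update_labeled_set(labeled, unlabeled, noisy, informative, confident,
                           oracle_annotate, pseudo_label):
        updated = {s: y for s, y in labeled.items() if s not in noisy}  # Li \ Ni
        for s in informative:          # most uncertain samples -> expert labels
            updated[s] = oracle_annotate(s)
        for s in confident:            # most certain samples -> model pseudo labels
            updated[s] = pseudo_label(s)
        remaining = unlabeled - informative - confident
        return updated, remaining      # (Li+1, Ui+1)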
[0026] The sample selector 214 may be configured to identify and select: 1) the noisy samples 216 from the labeled samples based on the determined average training loss and complexity level classification; 2) the informative samples 218 from the unlabeled samples based on the determined average predictive entropy; and 3) the confident predictive samples 220 from the unlabeled samples based on the determined average predictive entropy and complexity level classification. In some embodiments, the sample selector 214 is also configured to distinguish noisy samples from hard samples in the set of labeled samples 204. In some embodiments, the noisy samples 216 may be samples that have a large average training loss and are classified as easy by the complexity level classification module 212. However, a labeled sample that is classified as "hard" for its complexity but has a large training loss variation is more likely to be a hard sample. In some embodiments, the informative samples 218 (i.e., those that require additional expert annotations) may be samples that have the highest average predictive entropy. In some embodiments, the confident predictive samples 220 may be samples that have the lowest average predictive entropy and are classified as easy or medium by the complexity level classification module 212. As mentioned above, an updated set of labeled samples (Li+1) 222 for use in the next training iteration may be generated by discarding the noisy samples Ni, requesting human experts to annotate the informative samples Ii and adding the informative samples to Li after expert annotation, and adding the confident predictive samples with "pseudo labels" to Li. In some embodiments, the "pseudo labels" may be assigned to the confident predictive samples 220 by the pathology classifier 208 of the pathology image neural network 106.
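The selection rules above can be expressed compactly. The sketch below assumes the per-sample average training losses, average predictive entropies, and complexity levels have already been computed; the selection fraction is an illustrative assumption, as the disclosure does not fix the group sizes.

    # A minimal sketch of the sample selector 214 rules. `avg_loss` and
    # `avg_entropy` map sample ids to averaged values; `complexity` maps ids to
    # "easy"/"medium"/"hard". The fraction selected per group is an assumption.
    def select_samples(avg_loss, avg_entropy, complexity, frac=0.05):
        k_lab = max(1, int(frac * len(avg_loss)))
        k_unl = max(1, int(frac * len(avg_entropy)))

        # Noisy: labeled samples with a large average loss that the complexity
        # classifier nevertheless calls "easy" (likely mislabeled, not hard).
        by_loss = sorted(avg_loss, key=avg_loss.get, reverse=True)
        noisy = {s for s in by_loss[:k_lab] if complexity[s] == "easy"}

        # Informative: unlabeled samples with the highest average entropy.
        by_ent = sorted(avg_entropy, key=avg_entropy.get, reverse=True)
        informative = set(by_ent[:k_unl])

        # Confident: lowest-entropy unlabeled samples classified easy or medium.
        confident = {s for s in by_ent[-k_unl:]
                     if complexity[s] in ("easy", "medium")}
        return noisy, informative, confident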
[0027] In some embodiments, as the pathology image neural network 106 model evolves during the training process, the outputs of the training loss and predictive entropy monitor 210 and the complexity level classification module 212 may change. Therefore, in some embodiments, the identified noisy samples 216 may not be discarded completely and may instead be stored in data storage (e.g., data storage 114 shown in FIG. 1). The stored noisy samples 216 may be examined to determine whether they should be added back at later points in the training process. Accordingly, system 200 may advantageously include a mechanism to correct errors made at the beginning of the training process.
[0028] In some embodiments, the pathology image neural network 106 including the pathology classifier 208 and training loss and predictive entropy monitor 210, the complexity level classification module 212, and the sample selector 214 may be implemented on one or more processors (or processor devices) of a computer system such as, for example, any general-purpose computing system or device, such as a personal computer, workstation, cellular phone, smartphone, laptop, tablet, or the like. As such, the computer system may include any suitable hardware and components designed for or capable of carrying out a variety of processing and control tasks, including, but not limited to, steps for receiving the training dataset 202, implementing the pathology image neural network 106 including the pathology classifier 208 and training loss and predictive entropy monitor 210, the complexity level classification module 212 and the sample selector 214, or storing the selected samples 216, 218, 220 and updated set of labeled samples 222 in data storage (e.g., data storage 110 of system 100 shown in FIG. 1). For example, the computer system may include a programmable processor or combination of programmable processors, such as central processing units (CPUs), graphics processing units (GPUs), and the like. In some implementations, the one or more processors of the computer system may be configured to execute instructions stored in non-transitory computer-readable media. In this regard, the computer system may be any device or system designed to integrate a variety of software, hardware, capabilities, and functionalities. Alternatively, and by way of particular configurations and programming, the computer system may be a special-purpose system or device. For instance, such special-purpose system or device may include one or more dedicated processing units or modules that may be configured (e.g., hardwired, or pre-programmed) to carry out steps, in accordance with aspects of the present disclosure.
[0029] FIG. 3 illustrates a method for training a DNN for analyzing pathology images in accordance with an embodiment. The process illustrated in FIG. 3 is described below as being carried out by the system 200 for training a DNN for analyzing pathology images as illustrated in FIG. 2A. Although the blocks of the process are illustrated in a particular order, in some embodiments, one or more blocks may be executed in a different order than illustrated in FIG. 3, or may be bypassed.
[0030] In some embodiments, a training dataset 202 (or training data 112 shown in FIG. 1) may be retrieved from data storage (e.g., data storage 110 shown in FIG. 1). In some embodiments, the training dataset 202 (e.g., training data 112) can include a plurality of pathology images (or samples or data points). In some embodiments, the training dataset 202 may be pre-processed. For example, the pre-processing module 104 (shown in FIG. 1) may be configured to identify the most informative tissue areas in each sample in the training data 112 and generate a set of patches. An example process for pre-processing a training dataset 202 is described below with respect to FIG. 4. In some embodiments, the training process including the disclosed active learning framework can include a plurality of iterations (i). In each iteration, the disclosed active learning framework can include training the pathology image neural network 106 using the current set of labeled samples 204 in the training dataset 202 and then generating an updated set of labeled samples 222 for the training dataset 202 that may be used to train the pathology image neural network 106 in the next training iteration.
[0031] At block 304, the current set of labeled data 204 in the training dataset 202 may be used to train the neural network for analyzing pathology images 106 until, for example, the network 106 converges. In some embodiments, the pathology image neural network 106 may be trained to perform analysis tasks or functions including, but not limited to, classification, segmentation, disease (e.g., cancer) diagnosis, etc. For example, the pathology image neural network 106 may be configured to generate or predict a grade for a particular disease, to generate a segmentation map, or to predict a diagnosis for a disease (e.g., cancer). In some embodiments, the pathology image neural network can be a deep neural network (DNN) such as, for example, a convolutional neural network (CNN). At block 306, each sample in the training dataset 202 may be classified with one of a plurality of classification levels using, for example, the complexity level classification module 212. In some embodiments, each sample in the training dataset 202 (both labeled samples 204 and unlabeled samples 206) may be classified with a complexity level of easy, medium, or hard. In some embodiments, each sample in the training dataset 202 may be classified with a complexity level of easy, medium, or hard based on its local density in feature space, as discussed further below with respect to FIG. 5. In some embodiments, the complexity level classification module 212 may be implemented as a curriculum classification (CC) process.

[0032] At block 308, an average training loss for each sample in the set of labeled samples 204 may be determined, and at block 310, an average predictive entropy for each sample in the set of unlabeled samples 206 may be determined. In some embodiments, the training loss and predictive entropy monitor 210 may be configured to force the pathology image neural network 106 to modulate between overfitting and underfitting over a plurality of epochs (e.g., by adjusting one or more hyperparameters of the pathology image neural network 106), and to determine and rank an average training loss for each labeled sample and an average predictive entropy for each unlabeled sample. A method for determining training loss and predictive entropy of samples in a training dataset using a training loss and predictive entropy monitor is discussed further below with respect to FIG. 6. In some embodiments, the complexity level classification of the samples in the training dataset 202 at block 306 and the determination of the average training loss at block 308 and average predictive entropy at block 310 may be performed simultaneously.
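For reference, the predictive entropy used at block 310 can be computed as the Shannon entropy of the model's softmax output, which is high when the predicted distribution is spread across classes (uncertain) and near zero when it is peaked. A minimal PyTorch sketch (the clamp that guards against log(0) is an implementation detail, not part of the disclosure):

    import torch

    def predictive_entropy(logits: torch.Tensor) -> torch.Tensor:
        # Shannon entropy H = -sum_c p_c * log(p_c) of the softmax distribution.
        probs = logits.softmax(dim=-1)
        return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)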
[0033] In some embodiments, the complexity level classification, the determined average training loss, and the determined average predictive entropy may be used, for example, by a sample selector 214, to detect and select noisy samples 216 from the set of labeled samples 204 in the training dataset 202 and to detect and select informative samples 218 and confident predictive samples 220 from the set of unlabeled samples 206 in the training dataset 202. At block 312, noisy samples 216 may be identified and selected from the set of labeled samples 204 based on the assigned complexity level and determined average training loss for each labeled sample. In some embodiments, noisy samples 216 may be samples that have a large average training loss and are classified as easy by the complexity level classification module 212. A sample that is classified as "easy" based on its complexity but has a large average training loss variation may be more likely to be annotated (or labeled) incorrectly, i.e., there is a higher probability that the pathologist annotations for these samples contain noise. At block 314, informative samples may be identified and selected from the set of unlabeled samples 206 based on the determined average predictive entropy for each unlabeled sample. In some embodiments, the informative samples 218 (i.e., those that require additional expert annotations) may be samples that have the highest average predictive entropy. These samples may be the most informative because they cannot be predicted confidently by the current pathology image neural network 106 model (e.g., the pathology classifier 208). At block 316, the confident predictive samples 220 may be identified and selected from the set of unlabeled samples 206 based on the assigned complexity level and determined average predictive entropy for each unlabeled sample. In some embodiments, the confident predictive samples 220 may be samples that have the lowest average predictive entropy and are classified as easy or medium by the complexity level classification module 212. Accordingly, there may be a high probability or confidence that predictions (or labels) by the current pathology image neural network 106 model (e.g., the pathology classifier 208) for these samples are correct.
[0034] At block 318, an updated set of labeled samples (Li+1) 222 for the training dataset 202 for use in the next training iteration may be generated by removing (or discarding) the noisy samples Ni, requesting human experts to annotate the informative samples Ii and adding them to Li, and adding the confident predictive samples Ci with "pseudo labels" to Li. In some embodiments, the "pseudo labels" may be assigned to the confident predictive samples 220 by the pathology classifier 208 of the pathology image neural network 106. At block 320, the updated set of labeled samples (Li+1) 222 may be used to train the pathology image neural network 106, for example, until the network 106 converges. At block 322, if the training process has not reached the final iteration, the process may return to block 306 and repeat blocks 306 to 320 for each iteration. If at block 322 the training process has reached the last iteration, at block 324 the trained pathology image neural network 106 may be stored in data storage (e.g., device storage 716 of computer system 700 shown in FIG. 7).
[0035] As mentioned above, the samples (e.g., data points or images) in the training dataset 202 (or training data 112 shown in FIG. 1) may be pre-processed, for example using a pre-processing module 104 (shown in FIG. 1). FIG. 4 illustrates an example process for pre-processing training data in accordance with an embodiment. In some embodiments, each sample (e.g., image) in the training dataset 202 may be large, and it may be advantageous to locate areas of concern on which to focus during the training of the pathology image neural network 106. In some embodiments, a tiling algorithm with a blue-ratio selection criterion may be used to identify the most informative tissue areas in each sample. First, a binary tissue mask 404 of the tissue in the original image 402 (e.g., an image of tissue on the slide) may be created by setting a threshold for the average intensity in the original image 402. For example, this threshold may be set empirically to 90% of the maximum image intensity value in the original image 402. Second, the tissue mask 404 may be smoothed using morphological closing, and the skeleton (or spline) 406 of the smoothed mask may then be found and branches may be removed by finding the endpoints with the maximum geodesic distance. Third, the mid-line may be partitioned based on a patch size and overlap, tangent lines may be found at each of these locations by looking at, for example, the neighborhood of nine pixels along the mid-line, and a perpendicular line may be drawn until intersection with the mask boundary. Finally, a set of patches 408 that intersect more than a predetermined percentage (e.g., 60%) with the mask 404 may be chosen to calculate their blue ratio, and the top k blue-ratio patches 410 may be selected. In some embodiments, a patch size of 256 × 256 pixels may be used, and 36 patches may be selected for each image (e.g., image of a slide).
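To make the patch-scoring step concrete, the following is a minimal sketch of blue-ratio ranking. The blue-ratio formula below is one common definition from the histopathology literature and is an assumption here, as the disclosure does not spell out the exact expression; it emphasizes hematoxylin-stained (nuclei-rich) regions.

    import numpy as np

    def blue_ratio(patch_rgb: np.ndarray) -> float:
        # One common blue-ratio definition (an assumption, not from the text):
        # BR = (100*B / (1+R+G)) * (256 / (1+R+G+B)), averaged over the patch.
        r, g, b = (patch_rgb[..., c].astype(np.float64) for c in range(3))
        br = (100.0 * b / (1.0 + r + g)) * (256.0 / (1.0 + r + g + b))
        return float(br.mean())

    def top_k_patches(patches, k=36):
        # Keep the k highest blue-ratio (most nuclei-rich) patches per image.
        return sorted(patches, key=blue_ratio, reverse=True)[:k]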
[0036] FIG. 5 illustrates a method for complexity level classification in accordance with an embodiment. The process illustrated in FIG. 5 is described below as being carried out by the system 200 for training a DNN for analyzing pathology images as illustrated in FIG. 2A. Although the blocks of the process are illustrated in a particular order, in some embodiments, one or more blocks may be executed in a different order than illustrated in FIG. 5, or may be bypassed.
[0037] As mentioned above, one aspect of the disclosed training process and active learning framework can be classifying each sample in the training dataset 202 (labeled 204 and unlabeled 206 samples) with a complexity level classification. The complexity level classification can be performed using a complexity level classification module 212. In some embodiments, the complexity level classification is advantageously implemented using curriculum classification or learning to classify each sample in the training dataset 202 with a complexity level, for example, easy, medium, or hard. In each training iteration, at block 502 the current pathology image network model 106 (e.g., pathology classifier 208) may be used to compute a deep representation (i.e., fully-convolutional features) for each sample (e.g., image) in the training dataset 202. This step may be used to roughly map all training samples (or images) into a feature space where the underlying structure and the complexity of the images can be discovered. Each sample may then be classified into different complexity levels, ranging from easy samples with high-signal labels to difficult samples whose labels may contain noise. To do so, in some embodiments, at block 504 the dimension of the deep features may first be reduced using, for example, t-distributed Stochastic Neighbor Embedding (t-SNE). With this set of reduced features, at block 506 K-means may be used to cluster the samples (and deep image features) into different groups (or subsets). In some embodiments, each group will ideally contain images with similar diagnoses. This step may be used to help the following process select representative samples covering the whole training sample space. At block 508, a Euclidean distance matrix D may be calculated as:
Dij = ‖f(Ii) − f(Ij)‖², Eqn. 2

where n is the number of images in the same group, Ii and Ij are two images in this group, and f(Ii), f(Ij) are the feature vectors of the two images in deep feature space. Dij indicates a similarity value between Ii and Ij. Then, at block 510, a local density (ρi) may be calculated for each image,

ρi = Σj=1..n X(Dij − dc), Eqn. 3

where

X(d) = 1 if d < 0, and X(d) = 0 otherwise. Eqn. 4
[0038] dc in Equation 3 above is a distance threshold that may be selected for determining the local density. The value dc may be selected by first sorting the n² distances from small to large values and choosing the top k%. In some embodiments, k = 60. The local density ρi counts how many samples are closer to image Ii in the deep feature space than the threshold dc. Finally, at block 512, each sample in the training dataset 202 may be classified with a complexity level based on its local density. In some embodiments, a K-means clustering method may be used to classify each sample as easy, medium, or hard based on its local density within each group. In some embodiments, it may be assumed that a group of easy images with correct labels often have similar visual characteristics, project closely to each other in the feature space, and therefore have a high ρi. By contrast, hard images often have more visual diversity, resulting in a sparse distribution with a smaller ρi. At block 514, the complexity level classification (e.g., easy, medium, hard) for each sample (or image) may be stored in data storage (e.g., data storage 114 shown in FIG. 1).
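Putting blocks 502 through 512 together, the following is a minimal sketch of the complexity-level classification, assuming per-sample deep features extracted by the current model. The use of scikit-learn's TSNE and KMeans, the number of groups, and the per-group density clustering details are illustrative assumptions; k = 60 follows the top-k% threshold given in the text.

    import numpy as np
    from sklearn.manifold import TSNE
    from sklearn.cluster import KMeans

    def complexity_levels(features, n_groups=8, k_pct=60.0):
        reduced = TSNE(n_components=2).fit_transform(features)     # block 504
        groups = KMeans(n_clusters=n_groups).fit_predict(reduced)  # block 506
        levels = np.empty(len(features), dtype=object)
        for g in range(n_groups):
            idx = np.where(groups == g)[0]
            feats = reduced[idx]
            d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)  # Eqn. 2
            d_c = np.percentile(d, k_pct)       # threshold at the top k% of distances
            rho = (d < d_c).sum(axis=1)         # local density rho_i, Eqns. 3-4
            # Cluster the densities into three levels: dense -> easy, sparse -> hard.
            lv = KMeans(n_clusters=3).fit_predict(rho.reshape(-1, 1))
            order = np.argsort([rho[lv == c].mean() for c in range(3)])[::-1]
            names = {order[0]: "easy", order[1]: "medium", order[2]: "hard"}
            levels[idx] = [names[c] for c in lv]
        return levels

The sketch assumes each group contains enough samples to form three density clusters; in practice, very small groups would need a fallback.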
[0039] As the pathology image neural network 106 model disclosed here evolves during each active learning iteration, it may be difficult to distinguish whether a sample is a noisy sample that has an incorrect label or is a complex (or hard) sample that the pathology image neural network 106 has not learned yet. Accordingly, in some embodiments, the training process may be configured to distinguish noisy samples from hard samples, to discover the most informative samples to be annotated (e.g., by an expert or oracle), and to discover confident predictive samples which are assigned pseudo labels by the current pathology image neural network.

[0040] FIG. 6 illustrates a method for determining training loss and predictive entropy of samples in a training dataset in accordance with an embodiment. The process illustrated in FIG. 6 is described below as being carried out by the system 200 for training a DNN for analyzing pathology images as illustrated in FIG. 2A. Although the blocks of the process are illustrated in a particular order, in some embodiments, one or more blocks may be executed in a different order than illustrated in FIG. 6, or may be bypassed.
[0041] In some embodiments, the training loss and predictive entropy monitor 210 may be configured to cycle training between underfitting and overfitting while observing the variation of training loss for the labeled samples 204 in the training dataset 202 and the predictive entropy for the unlabeled samples 206 in the training dataset 202. In some embodiments, the training loss and predictive entropy monitor 210 may be configured to adjust one or more hyperparameters (e.g., learning rate) of the pathology image neural network 106 to cause the neural network to modulate between overfitting and underfitting. In each iteration of the disclosed training process, after the pathology image neural network 106 converges as described above with respect to blocks 304 and 320 of FIG. 3, the training loss of each labeled sample and the predictive entropy of each unlabeled sample may be monitored over a plurality of epochs and used to determine an average training loss for each of the labeled samples 204 and an average predictive entropy for each of the unlabeled samples. In FIG. 6, blocks 602 to 608 are performed for each epoch during an iteration of the disclosed active learning framework of the training process. At block 602, a hyperparameter of the pathology image neural network 106, for example, a learning rate, may be set or adjusted. The learning rate may be set or adjusted using, for example, the training loss and predictive entropy monitor 210. In some embodiments, the learning rate may initially be set to a large value. The learning rate may be adjusted for each epoch so that the pathology image neural network 106 may transition from overfitting to underfitting cyclically. In some embodiments, the learning rate may be adjusted based on a cosine annealing function in each cyclic round (e.g., in each epoch) as:
lr = lrmin + ½ (lrmax − lrmin)(1 + cos(π · Tcur / Tmax)), Eqn. 5

where Tcur is the current epoch within a cycle, Tmax is the number of epochs in one cycle, and lrmin and lrmax are the minimum and maximum learning rates in one cycle.
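A minimal sketch of the cyclic schedule of Eqn. 5 (lr_min, lr_max, and t_max are illustrative values, not fixed by the text):

    import math

    def cyclic_lr(epoch, lr_min=1e-4, lr_max=1e-1, t_max=10):
        # Eqn. 5: cosine annealing from lr_max down to lr_min within each cycle.
        t_cur = epoch % t_max  # position within the current cycle
        return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_max))

For reference, PyTorch's torch.optim.lr_scheduler.CosineAnnealingLR implements the same annealing over a single cycle, and CosineAnnealingWarmRestarts provides a comparable repeated-cycle variant.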
[0042] At block 604, a training loss for each labeled sample in the set of labeled samples 204 may be determined and stored in data storage (e.g., data storage 114 shown in FIG. 1) using, for example, the training loss and predictive entropy monitor 210. In some embodiments, the training loss and predictive entropy monitor 210 may be implemented using a noise cleansing technique. At block 606, the pathology image neural network 106 may be updated. At block 608, a predictive entropy for each unlabeled sample in the set of unlabeled samples may be determined and stored in data storage (e.g., data storage 114) using, for example, the training loss and predictive entropy monitor 210. At block 610, it is determined whether the last epoch for the current training iteration has been reached. If the last epoch has not been reached, the process returns to block 602. In some embodiments, the number of epochs may be chosen to ensure that enough training loss and predictive entropy data is collected. If the last epoch has been reached, an average training loss for each labeled sample and an average predictive entropy for each unlabeled sample may be determined.
[0043] At block 612, an average training loss (lossi) for each labeled sample may be determined using the collected training loss data for the sample over all of the epochs and stored in data storage (e.g., data storage 114). In some embodiments, the average training loss for each sample may be a normalized average training loss. At block 614, an average predictive entropy (enti) for each unlabeled sample may be determined using the collected predictive entropy data for the sample over all of the epochs and stored in data storage (e.g., data storage 114). In some embodiments, the average predictive entropy for each sample may be a normalized average predictive entropy. At block 616, all of the labeled samples may be ranked based on the average training loss and all of the unlabeled samples may be ranked based on the average predictive entropy. At block 618, the ranks for the labeled samples and the ranks for the unlabeled samples may be stored in data storage.
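Blocks 612 through 618 reduce the per-epoch traces to a single normalized average per sample and then rank. A minimal sketch, assuming `history` maps sample ids to the per-epoch values recorded at blocks 604 and 608:

    import numpy as np

    def average_and_rank(history):
        # Average each sample's per-epoch values, min-max normalize, rank descending.
        avg = {s: float(np.mean(v)) for s, v in history.items()}
        lo, hi = min(avg.values()), max(avg.values())
        norm = {s: (a - lo) / (hi - lo + 1e-12) for s, a in avg.items()}
        return sorted(norm, key=norm.get, reverse=True)

The same routine applies once to the labeled samples' training losses and once to the unlabeled samples' predictive entropies.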
[0044] As mentioned above, the complexity level classifications and the average training loss and average predictive entropy may advantageously be used (e.g., by a sample selector 214) to identify and select noisy samples 216 from the set of labeled samples 204, and to identify and select informative samples 218 and confident predictive samples 220 from the set of unlabeled samples 206. In addition, the complexity level classifications and the average training loss and average predictive entropy may also advantageously be used to distinguish between noisy and hard samples. In some embodiments, a labeled sample that is classified as "easy" based on its complexity but has a large training loss variation is more likely to be annotated incorrectly (i.e., a noisy sample), while a labeled sample that is classified as "hard" for its complexity but has a large training loss variation is more likely to be a hard sample.
[0045] FIG. 7 is a block diagram of an example computer system in accordance with an embodiment. Computer system 700 may be used to implement the systems and methods described herein. In some embodiments, the computer system 700 may be a workstation, a notebook computer, a tablet device, a mobile device, a multimedia device, a network server, a mainframe, one or more controllers, one or more microcontrollers, or any other general-purpose or application-specific computing device. The computer system 700 may operate autonomously or semi-autonomously, or may read executable software instructions from the memory or storage device 716 or a computer-readable medium (e.g., a hard drive, a CD-ROM, flash memory), or may receive instructions via the input device 720 from a user, or any other source logically connected to a computer or device, such as another networked computer or server. Thus, in some embodiments, the computer system 700 can also include any suitable device for reading computer-readable storage media.

[0046] Data, such as data acquired with, for example, an imaging system for acquiring pathology images, may be provided to the computer system 700 from a data storage device 716, and these data are received in a processing unit 702. In some embodiments, the processing unit 702 includes one or more processors. For example, the processing unit 702 may include one or more of a digital signal processor (DSP) 704, a microprocessor unit (MPU) 706, and a graphics processing unit (GPU) 708. The processing unit 702 also includes a data acquisition unit 710 that is configured to electronically receive data to be processed. The DSP 704, MPU 706, GPU 708, and data acquisition unit 710 are all coupled to a communication bus 712. The communication bus 712 may be, for example, a group of wires, or hardware used for switching data between the peripherals or between any components in the processing unit 702.
[0047] The processing unit 702 may also include a communication port 714 in electronic communication with other devices, which may include a storage device 716, a display 718, and one or more input devices 720. Examples of an input device 720 include, but are not limited to, a keyboard, a mouse, and a touch screen through which a user can provide an input. The storage device 716 may be configured to store data, which may include data such as, for example, training data, acquired pathology images, classification data, segmentation data, training loss data, predictive entropy data, etc., whether these data are provided to, or processed by, the processing unit 702. The display 718 may be used to display images and other information, such as patient health data, and so on.
[0048] The processing unit 702 can also be in electronic communication with a network 722 to transmit and receive data and other information. The communication port 714 can also be coupled to the processing unit 702 through a switched central resource, for example the communication bus 712. The processing unit 702 can also include temporary storage 724 and a display controller 726. The temporary storage 724 is configured to store temporary information. For example, the temporary storage can be a random access memory.
[0049] Computer-executable instructions for training a neural network for analyzing pathology images according to the above-described methods may be stored on a form of computer-readable media. Computer-readable media includes volatile and nonvolatile, removable, and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable media includes, but is not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable ROM (EEPROM), flash memory or other memory technology, compact disk ROM (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired instructions and which may be accessed by a system (e.g., a computer), including by internet or other computer network forms of access.
[0050] The present disclosure has been described in terms of one or more preferred embodiments, and it should be appreciated that many equivalents, alternatives, variations, and modifications, aside from those expressly stated, are possible and within the scope of the invention.

Claims

CLAIMS:
1. A method for training a neural network for analyzing pathology images, the method comprising:
(a) receiving a training dataset comprising a set of labeled samples and a set of unlabeled samples;
(b) training the neural network for analyzing pathology images using the set of labeled samples until the neural network converges;
(c) identifying at least one noisy sample in the set of labeled samples;
(d) identifying at least one informative sample in the set of unlabeled samples for further expert annotation and at least one confident predictive sample in the set of unlabeled samples;
(e) generating an updated set of labeled samples for the training dataset by removing the identified at least one noisy sample from the set of labeled samples, adding the identified at least one informative sample with further expert annotation to the set of labeled samples and adding the identified at least one confident predictive sample to the set of labeled samples;
(f) training the neural network for analyzing pathology images using the updated set of labeled samples; and
(g) storing the trained neural network for analyzing pathology images in a data storage.
2. The method according to claim 1, wherein the training of the neural network for analyzing pathology images is performed for a plurality of iterations and each iteration comprises repeating steps (c)-(f) such that an updated set of labeled samples is generated for each iteration.
3. The method according to claim 1, further comprising: classifying each sample in the training dataset with one of a plurality of complexity levels; determining a training loss for each sample in the set of labeled samples; and determining a predictive entropy for each sample in the set of unlabeled samples.
4. The method according to claim 3, wherein classifying each sample in the training dataset comprises assigning a complexity level to each sample based on a local density of the sample in a feature space.
5. The method according to claim 1, wherein each confident predictive sample has a label generated by the neural network for analyzing pathology images.
6. The method according to claim 3, wherein the training loss is an average training loss and the predictive entropy is an average predictive entropy.
7. The method according to claim 3, wherein determining a training loss for each sample in the set of labeled samples and determining a predictive entropy for each sample in the set of unlabeled samples comprises modulating the neural network between overfitting and underfitting.
8. The method according to claim 7, wherein modulating the neural network between overfitting and underfitting includes adjusting a learning rate of the neural network for analyzing pathology images.
9. The method according to claim 3, wherein classifying each sample in the training dataset with one of a plurality of complexity levels, determining a training loss for each sample in the set of labeled samples, and determining a predictive entropy for each sample in the set of unlabeled samples are performed simultaneously.
10. The method according to claim 3, wherein identifying at least one noisy sample in the set of labeled samples is based on the complexity level classification and determined training loss.
11. The method according to claim 3, wherein identifying at least one informative sample in the set of unlabeled samples is based on the determined predictive entropy.
12. The method according to claim 3, wherein identifying at least one confident predictive sample in the set of unlabeled samples is based on the complexity level classification and the determined predictive entropy.
13. The method according to claim 1, wherein the pathology images are histopathology images.
14. The method according to claim 3, wherein the plurality of complexity levels include easy, medium and hard.
15. A system for analyzing pathology images, the system comprising: an input for receiving at least one pathology image of a subject; and a neural network coupled to the input and configured to analyze the at least one pathology image, the neural network trained using a plurality of training iterations, wherein for at least one iteration the neural network is trained using a set of labeled samples generated by removing one or more noisy samples from an initial set of labeled samples in a training dataset, adding one or more informative samples with further expert annotation to the initial set of labeled samples and adding one or more confident predictive samples to the initial set of labeled samples.
16. The system according to claim 15, wherein the at least one pathology image is a histopathology image.
17. The system according to claim 15, wherein the one or more confident predictive samples includes labels generated by the pathology image neural network.
18. The system according to claim 15, wherein the one or more noisy samples are identified from the training dataset based on a complexity level and a training loss.
19. The system according to claim 15, wherein the one or more informative samples are identified from the training dataset based on a predictive entropy.
20. The system according to claim 15, wherein the one or more confident predictive samples are identified from the training dataset based on a complexity level and a predictive entropy.
PCT/US2022/039274 2021-08-03 2022-08-03 System and method for pathology image analysis using a trained neural network and active learning framework WO2023014789A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163228742P 2021-08-03 2021-08-03
US63/228,742 2021-08-03

Publications (1)

Publication Number Publication Date
WO2023014789A1 true WO2023014789A1 (en) 2023-02-09

Family

ID=85156289

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/039274 WO2023014789A1 (en) 2021-08-03 2022-08-03 System and method for pathology image analysis using a trained neural network and active learning framework

Country Status (1)

Country Link
WO (1) WO2023014789A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984671A (en) * 2023-03-17 2023-04-18 中科慧远视觉技术(北京)有限公司 Model online updating method and device, electronic equipment and readable storage medium
CN116229333A (en) * 2023-05-08 2023-06-06 西南交通大学 Difficulty target decoupling detection method based on difficulty level self-adaptive dynamic adjustment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120184845A1 * 2010-11-11 2012-07-19 University Of Pittsburgh - Of The Commonwealth System Of Higher Education Automated macular pathology diagnosis in three-dimensional (3d) spectral domain optical coherence tomography (sd-oct) images
US20190197358A1 (en) * 2017-12-21 2019-06-27 International Business Machines Corporation Generative Adversarial Network Medical Image Generation for Training of a Classifier
US20200364587A1 (en) * 2019-05-16 2020-11-19 PAIGE.AI, Inc. Systems and methods for processing images to classify the processed images for digital pathology

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120184845A1 * 2010-11-11 2012-07-19 University Of Pittsburgh - Of The Commonwealth System Of Higher Education Automated macular pathology diagnosis in three-dimensional (3d) spectral domain optical coherence tomography (sd-oct) images
US20190197358A1 (en) * 2017-12-21 2019-06-27 International Business Machines Corporation Generative Adversarial Network Medical Image Generation for Training of a Classifier
US20200364587A1 (en) * 2019-05-16 2020-11-19 PAIGE.AI, Inc. Systems and methods for processing images to classify the processed images for digital pathology

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984671A (en) * 2023-03-17 2023-04-18 中科慧远视觉技术(北京)有限公司 Model online updating method and device, electronic equipment and readable storage medium
CN116229333A (en) * 2023-05-08 2023-06-06 西南交通大学 Difficulty target decoupling detection method based on difficulty level self-adaptive dynamic adjustment
CN116229333B (en) * 2023-05-08 2023-07-21 西南交通大学 Difficulty target decoupling detection method based on difficulty level self-adaptive dynamic adjustment

Similar Documents

Publication Publication Date Title
US20220237788A1 (en) Multiple instance learner for tissue image classification
US11901077B2 (en) Multiple instance learner for prognostic tissue pattern identification
CN108268870B (en) Multi-scale feature fusion ultrasonic image semantic segmentation method based on counterstudy
US20210118136A1 (en) Artificial intelligence for personalized oncology
WO2021009258A1 (en) Object detection and instance segmentation of 3d point clouds based on deep learning
US11087883B1 (en) Systems and methods for transfer-to-transfer learning-based training of a machine learning model for detecting medical conditions
WO2023014789A1 (en) System and method for pathology image analysis using a trained neural network and active learning framework
Alqahtani et al. Breast cancer pathological image classification based on the multiscale CNN squeeze model
Doan et al. SONNET: A self-guided ordinal regression neural network for segmentation and classification of nuclei in large-scale multi-tissue histology images
Chen et al. Diagnose like a pathologist: Weakly-supervised pathologist-tree network for slide-level immunohistochemical scoring
US11861881B2 (en) Critical component detection using deep learning and attention
US11544851B2 (en) Systems and methods for mesothelioma feature detection and enhanced prognosis or response to treatment
CN117015796A (en) Method for processing tissue images and system for processing tissue images
Chapala et al. ResNet: detection of invasive ductal carcinoma in breast histopathology images using deep learning
CN111275699A (en) Medical image processing method, device, equipment and storage medium
Qattous et al. PaCMAP-embedded convolutional neural network for multi-omics data integration
Feng et al. Trusted multi-scale classification framework for whole slide image
JP2024508095A (en) Graph construction and visualization of multiplex immunofluorescence images
Sultana et al. Infantile hemangioma detection using deep learning
Chandra et al. A Novel Framework For Brain Disease Classification Using Quantum Convolutional Neural Network
Fernandez-Martín et al. Uninformed Teacher-Student for hard-samples distillation in weakly supervised mitosis localization
Kalbhor et al. DeepCerviCancer-Deep Learning-Based Cervical Image Classification using Colposcopy and Cytology Images
Boumaraf et al. Conventional Machine Learning versus Deep Learning for Magnification Dependent Histopathological Breast Cancer Image Classification: A Comparative Study with Visual Explanation. Diagnostics, 2021; 11 (3): 528
KR20240052193A (en) Method and Apparatus for Analyzing Digital Pathological Image Based on Multi-scale Vision Transformer
Shivwanshi et al. Quantum-Enhanced Hybrid Feature Engineering in Thoracic CT Image Analysis for State-of-the-Art Nodule Classification: An Advanced Lung Cancer Assessment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22853852

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE