CN112712522A - Automatic segmentation method for oral cancer epithelial tissue region of pathological image - Google Patents
- Publication number: CN112712522A (application CN202110116993.9A)
- Authority: CN (China)
- Prior art keywords: layer, convolution, training, convolutional, unet
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0012 — Biomedical image inspection
- G06N3/045 — Neural networks; combinations of networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06T7/11 — Region-based segmentation
- G06T2207/30004 — Biomedical image processing
- G06T2207/30096 — Tumor; Lesion
Abstract
A method for automatically segmenting the oral cancer epithelial tissue region of a pathological image comprises the following steps: S100: preprocessing pathology images and pathologists' labels by extracting pixel-block patches; S200: selecting part of the preprocessed pathological images as training samples to form a training set, and using the remainder as verification samples to form a verification set; S300: constructing a convolutional neural network UNet model, then training and verifying it with the training and verification sets of step S200 to obtain the final UNet model; S400: applying the final UNet model to a multi-center external test set to automatically generate epithelial tissue regions. Unlike head and neck cancer detection approaches that train, verify, and test on each data set separately, this method trains only on TMA data and tests on a multi-center external WSI data set, making the results more convincing for unknown images.
Description
Technical Field
The disclosure belongs to the technical field of medical image processing and machine learning, and particularly relates to an automatic segmentation method for an oral cancer epithelial tissue region of a pathological image.
Background
Oral squamous cell carcinoma (OC-SCC) is the most common head and neck malignancy worldwide, with a high capacity to invade adjacent tissues and metastasize to distant organs. There is therefore a need for better diagnostic and risk-stratification tools for the personalized treatment of OC-SCC patients.
A pathologist's diagnosis and risk-factor stratification begin with determining the tumor area. These areas are typically analyzed through a light microscope using conventionally stained tissue. The entire tumor area is composed of cancer cells and their supporting tissue matrix, and the epithelial tumor regions are the most frequently evaluated component. Histomorphological analysis of these compartments has been carried out for decades, and the type and extent of a tumour can usually be identified reliably. Furthermore, the underlying morphological characteristics of OC-SCC have been found to correlate with patient prognosis. However, because marking is time-consuming, expert-dependent, and subject to intra- and inter-observer variability, and because effective marking tools are lacking, pathologists cannot reproducibly identify or quantify these histological markers.
Over the past few years, various quantitative digital pathology image analyses using machine learning (ML), and in particular deep learning (DL), have shown that sub-visual attributes of tumors can be "unlocked" from digitized hematoxylin-eosin (H&E) stained tissue sections. In this line of work, different histological primitives, including epithelium, tubules, lymphocytes, mitoses, and cancer, have been classified, detected, or segmented from H&E-stained images by the same DL network architecture, AlexNet. However, when ML/DL is applied to more specific tasks such as epithelial segmentation of WSIs, challenges arise: ground-truth labeling is tedious and time-consuming, staining is variable, and it is difficult to gather context when images are processed as pixel blocks (patches).
Advanced digital pathology analysis using computer-aided pattern-recognition tools has been shown to "unlock" sub-visual attributes and provide quantitative characterization of tumors. Automatic differentiation between epithelium and other tissues is an important prerequisite for developing automated methods for the detection and objective characterization of OC-SCC.
Disclosure of Invention
To solve the above problems, the present disclosure provides an automatic segmentation method for the oral cancer epithelial tissue region of a pathology image, comprising the following steps:
S100: preprocessing pathology images and pathologists' labels by extracting pixel-block patches;
S200: selecting part of the preprocessed pathological images as training samples to form a training set, and using the remainder as verification samples to form a verification set;
S300: constructing a convolutional neural network UNet model, then training and verifying it with the training and verification sets of step S200 to obtain the final UNet model;
S400: applying the final UNet model to a multi-center external test set to automatically generate epithelial tissue regions.
Through this technical scheme, a UNet-based deep learning framework is provided for automatically detecting epithelial tissue regions in H&E-stained tissue images. Images and labels were first preprocessed using the patch-extraction method; DL models were then trained and validated using two sets of tissue microarray (TMA) images with detailed human-expert labels at different magnifications. The locked model was then tested independently on data sets from different institutions. These independent data sets consist of whole-slide images (WSIs) and are disjoint from the training and validation phases.
This method does not train, validate, and test on each data set separately: if each data set were used for training, validation, and testing, the evaluation would rest on information already known from those data sets, and the results would be less compelling for "unknown" data. Instead, the method trains only on TMA images from 2 centers (institutions) and tests on WSI images from 3 external centers. By using different image types (TMA and WSI) for training and testing, it achieves considerable performance on the external test set and is therefore more convincing for unknown images.
Drawings
Fig. 1 is a flow chart of a method for automatic segmentation of an oral cancer epithelial tissue region of a pathology image provided in an embodiment of the present disclosure;
FIG. 2 is a depiction of a representative TMA and its epithelial tissue in accordance with an embodiment of the present disclosure;
FIG. 3 is a labeled diagram of a representative WSI and its corresponding polygon box in one embodiment of the present disclosure;
FIG. 4 is a UNet model architecture diagram for epithelial segmentation of oral cancer in one embodiment of the present disclosure;
FIG. 5 is a comparison of six representative validation TMA image results in one embodiment of the disclosure;
fig. 6 is a comparison graph of four representative test WSI image results in one embodiment of the present disclosure.
Detailed Description
In one embodiment, as shown in fig. 1, a method is disclosed for automatic segmentation of the oral cancer epithelial tissue region of a pathology image, comprising the following steps:
S100: preprocessing pathology images and pathologists' labels by extracting pixel-block patches;
S200: selecting part of the preprocessed pathological images as training samples to form a training set, and using the remainder as verification samples to form a verification set;
S300: constructing a convolutional neural network UNet model, then training and verifying it with the training and verification sets of step S200 to obtain the final UNet model;
S400: applying the final UNet model to a multi-center external test set to automatically generate epithelial tissue regions.
For this embodiment, the UNet-based deep learning framework is used to segment epithelial regions from two types of histopathology images: tissue microarrays (TMAs) and whole-slide images (WSIs). In the training phase, a total of 212 labeled TMAs from 190 patients at two different institutions were used for model training. In the testing phase, 478 WSIs from 477 OC-SCC patients at three different institutions were used for testing. Finally, the results were compared with the pathologists' annotations of the WSIs.
The label of a TMA-type image is directly a binary image. The label of a WSI-type image is drawn by a pathology expert with labeling software (e.g., QuPath): first an xml file containing the coordinate sets of the labeled-region polygons is obtained, and then the xml file is converted into a binary image.
The preprocessing in the training stage operates on the TMA images and mainly cuts large images into small ones. The TMA images we take are at 40x (i.e., captured under a microscope at 40x magnification). The procedure is as follows: 1. The 40x original image is down-sampled to 10x, for example from 8000x8000 pixels to 2000x2000 pixels (a 4-fold relationship). 2. Patches (256x256) are then taken from the 10x image without overlap; the labels (binary images) are processed in the same way. 3. Each TMA image thus yields a number of patches, which are grouped at the TMA level when the training and verification sets are divided, i.e., patches belonging to the same TMA never appear in the training and verification sets at the same time.
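The down-sampling and non-overlapping patch-cutting steps above can be sketched in plain NumPy. This is a minimal illustration, not the disclosure's actual code: `downsample_4x` and `extract_patches` are hypothetical helper names, and a real pipeline would use proper interpolated resizing (e.g., OpenCV or PIL) rather than simple striding.

```python
import numpy as np

def downsample_4x(image):
    """Naive 4x down-sampling (40x -> 10x) by striding; a real pipeline
    would use interpolation (cv2.resize / PIL) instead."""
    return image[::4, ::4]

def extract_patches(image, mask, patch_size=256):
    """Cut a 10x image and its binary label mask into non-overlapping
    patch_size x patch_size blocks; edge remainders that do not fill a
    full patch are discarded."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            img_patch = image[top:top + patch_size, left:left + patch_size]
            mask_patch = mask[top:top + patch_size, left:left + patch_size]
            patches.append((img_patch, mask_patch))
    return patches
```

The image and its label are cut with identical offsets, so each image patch stays aligned with its binary label patch.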
In another embodiment, patient selection was as follows. The participating institutions were Washington University in St. Louis (WU), The Ohio State University (OSU), the San Francisco VA Medical Center (SFVA), the University of California, San Francisco (UCSF), and Vanderbilt University Medical Center (VUMC). Patients with oral squamous cell carcinoma were identified from databases of radiation oncology and otolaryngology-head and neck surgery clinicians approved by the Human Research Protection Office. The radiation oncology database is a collection of patients approved for treatment by a radiation oncologist.
The method uses a data set consisting of five independent and well-characterized cohorts of formalin-fixed, paraffin-embedded (FFPE), H&E-stained whole-slide images (WSIs) and tissue microarrays (TMAs), representing a total of n = 667 patients. All WSI and TMA slides were digitally scanned at 40x magnification using an Aperio ScanScope XT digital scanner at a resolution of 0.25 μm per pixel. Each lesion on a TMA was assigned a code number for sharing, with the link to the actual patient data known only to the study pathologist. For TMA generation, the study pathologist selected a 2 mm central tumor punch (i.e., the punches on the slide that best represented the most tumor).
These five data sets are D1 (97 TMAs from 75 patients at OSU), D2 (115 TMAs from 115 patients at WU), D3 (95 WSIs from 94 patients at SFVA), D4 (182 WSIs from 182 patients at VUMC), and D5 (201 WSIs from 201 patients at UCSF). The corresponding clinicopathological and outcome information for patients in D1, D2, and D4 was obtained from IRB-approved retrospective chart reviews at the institutions that collected those data sets. Data sets D1 and D2 were used to train the automated epithelial tissue segmentation model, while data sets D3, D4, and D5 were used for independent testing of the trained model. Table 1 summarizes the clinical and pathological data of the patients in all five data sets.
TABLE 1
In another embodiment, the annotation was performed as follows. Data sets D1 and D2 consist of 212 TMAs, and all epithelial regions were labeled by one research pathologist using an in-house labeling tool. FIG. 2 shows representative TMAs from (a, b) D1 and (c, d) D2 together with binary images of the labeled epithelial tissue regions. Since data sets D1 and D2 were used for model training, all epithelial regions were labeled in detail, and we tried to make the labeling as accurate as possible. Each image in FIG. 2 has a width or height of 5k-7k pixels; the third and fourth columns are enlarged details of the images in the first and second columns.
To provide an accurate assessment of the trained model, all epithelial regions of the n = 478 WSIs in D3, D4, and D5 were labeled. Fig. 3 shows representative WSIs and labeled epithelial tissue regions (polygon line boxes) for (a, b, c) D3, (d, e, f) D4, and (g, h, i) D5. Because of the substantial annotation workload, some WSI annotations may not be exhaustive, but we tried to make them as accurate as possible.
In another embodiment, the training set is made up of tissue microarray TMAs and labels made by a pathologist.
For this embodiment, a set of 212 TMAs and the pathologist's annotations were used to train and validate the model to identify epithelial tissue regions. The average size of a TMA is 6000x6000 pixels. We split the training data sets D1 and D2 into training and validation sets at a 9:1 ratio. Patches from the same TMA are never placed in both the training and validation sets. The model was trained for 50 epochs, and the validation set was used at the end of each epoch to track model convergence. The final model was selected by minimizing the error on the validation set and then locked.
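The TMA-level 9:1 split described above can be sketched as follows. `split_by_tma` is an illustrative helper, not from the disclosure; it assumes patches have already been grouped by the id of the TMA they were cut from, so that no TMA contributes to both sets.

```python
import random

def split_by_tma(patches_by_tma, val_fraction=0.1, seed=0):
    """Split patches into training and validation sets at the TMA level,
    so that patches from one TMA never appear in both sets.
    `patches_by_tma` maps a TMA id to the list of patches cut from it."""
    tma_ids = sorted(patches_by_tma)
    rng = random.Random(seed)
    rng.shuffle(tma_ids)
    n_val = max(1, round(len(tma_ids) * val_fraction))
    val_ids = set(tma_ids[:n_val])
    train = [p for t in tma_ids if t not in val_ids for p in patches_by_tma[t]]
    val = [p for t in val_ids for p in patches_by_tma[t]]
    return train, val
```

Splitting by TMA id rather than by patch prevents near-duplicate patches from one slide core leaking between training and validation.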
In another embodiment, the test set is composed of whole-slide images (WSIs).
For this embodiment, the model was applied to a separate test set consisting of WSI data sets D3, D4, and D5 to automatically generate epithelial regions.
In another embodiment, the performance of the model in the independent test set is evaluated.
For this example, the following indices were used in the evaluation, for 2 classes (0: negative class, 1: positive class), where p_ij is the number of pixels predicted as class i that actually belong to class j, and TP, FP, FN, and TN denote the numbers of true-positive, false-positive, false-negative, and true-negative pixels.

Pixel accuracy (PA) — the number of correctly classified pixels divided by the total number of pixels:

PA = (Σ_i p_ii) / (Σ_i Σ_j p_ij) = (TP + TN) / (TP + TN + FP + FN)

Recall:

R = TP / (TP + FN)

Positive predictive value:

PPV = TP / (TP + FP)

Dice coefficient:

D = 2·TP / (2·TP + FP + FN)
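The four metrics can be computed directly from a pair of binary masks. The sketch below is an illustration (the helper name `segmentation_metrics` is ours, not the disclosure's), using 1 for epithelium and 0 for background:

```python
import numpy as np

def segmentation_metrics(pred, target):
    """Pixel accuracy, recall, positive predictive value, and Dice
    coefficient for binary masks (1 = epithelium, 0 = background)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    tn = np.logical_and(~pred, ~target).sum()
    pa = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return pa, recall, ppv, dice
```

The zero-denominator guards return 0 for degenerate masks instead of dividing by zero.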
In another embodiment, the convolutional neural network UNet model consists of 28 layers, each convolutional layer having a padding operation.
For this embodiment, the UNet model structure for epithelial tissue segmentation of oral cancer is shown in fig. 4. The network consists of 28 layers and contains 14,788,929 parameters. Each convolutional layer has a padding operation that keeps the output the same height and width as the input.
All images were scanned at 40x magnification (0.25 μm/pixel). We trained and validated the model using 5x, 10x, and 20x magnification images, respectively. When preparing the training samples, basic horizontal and vertical flipping operations were used for data enhancement. To focus the model on learning tissue morphology and reduce the effects of color variation, we also adjusted the brightness (1, 1.4), contrast (1, 1.4), saturation (1, 1.4), and hue (-0.5, 0.5) of the images and applied Gaussian blur {1, 2}.
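The paired flip augmentation above can be sketched as follows. This is a minimal illustration with a hypothetical helper name (`augment`); the color jitter and Gaussian blur, which apply to the image only, are omitted here and would typically come from a library such as torchvision.

```python
import random
import numpy as np

def augment(image, mask, rng):
    """Basic horizontal/vertical flip augmentation applied jointly to an
    image and its binary label mask, so the two stay aligned. Color
    jitter and Gaussian blur (image-only) are left out of this sketch."""
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]   # horizontal flip
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]   # vertical flip
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)
```

The key point is that geometric transforms must be applied identically to the image and its label mask, while photometric transforms must not touch the mask.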
In another embodiment, the convolutional neural network UNet model is composed of convolution blocks, deconvolution blocks, pooling layers, and an output layer.
In another embodiment, there are 5 groups of convolution blocks; each convolution block has 2 convolution layers, each convolution layer is followed by a batch normalization layer and a ReLU activation layer, and every convolution block except the first begins with a pooling layer. There are 4 deconvolution blocks; each comprises a deconvolution layer and 2 convolution layers, each convolution layer again followed by a batch normalization layer and a ReLU activation layer. The output layer comprises only one convolution layer.
In another embodiment, the convolution layers in the convolution and deconvolution blocks have kernel size 3, stride 1, and padding 1; the pooling layer in a convolution block has kernel size 2, stride 2, and padding 0; the deconvolution layer in a deconvolution block has kernel size 2, stride 2, and padding 0; and the convolution layer of the output layer has kernel size 1, stride 1, and padding 0.
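These layer hyper-parameters map directly onto PyTorch modules (PyTorch is implied by the BCEWithLogitsLoss and RMSprop names used later, though the disclosure does not name a framework). `conv_block` is a hypothetical helper sketching one convolution block; the pooling, deconvolution, and output layers are instantiated alongside it.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions (stride 1, padding 1), each followed by
    batch normalization and ReLU — one convolution block as described."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# 2x2 max pooling (stride 2, padding 0) heads every encoder block after
# the first; a 2x2 transposed convolution (stride 2, padding 0) opens each
# decoder block; the output layer is a single 1x1 convolution.
pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2, padding=0)
out_conv = nn.Conv2d(64, 1, kernel_size=1, stride=1, padding=0)
```

With padding 1 on the 3x3 convolutions, spatial size is preserved inside a block, so only pooling and deconvolution change the resolution.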
In another embodiment, the output of the convolutional neural network UNet model has a single channel; the channel output is fed into the BCEWithLogitsLoss loss function to obtain an epithelial tissue probability map, and the probability map is binarized at a threshold of 0.5 for comparison with the expert's labels.
For this embodiment, BCEWithLogitsLoss first passes the network output through the Sigmoid function to obtain a normalized probability map — that is, Sigmoid constrains the network output to the range 0 to 1 so that it represents a probability — and then compares that probability with the expert-labeled binary map via the cross-entropy function.
In another embodiment, the convolutional neural network UNet model is trained with the RMSprop optimization algorithm using a learning rate of 0.01 and a weight decay of 1e-8.
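The loss, optimizer, and 0.5-threshold inference described above can be sketched in PyTorch. This is an illustrative skeleton, not the disclosure's code: a 1x1 convolution stands in for the full 28-layer UNet, and `train_step` / `predict_mask` are hypothetical helper names.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, kernel_size=1)      # stand-in for the full UNet
criterion = nn.BCEWithLogitsLoss()          # applies sigmoid internally
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01, weight_decay=1e-8)

def train_step(images, targets):
    """One optimization step on a batch of images and binary label maps."""
    optimizer.zero_grad()
    logits = model(images)                  # raw single-channel logits
    loss = criterion(logits, targets)       # sigmoid + cross-entropy
    loss.backward()
    optimizer.step()
    return loss.item()

def predict_mask(images, threshold=0.5):
    """Sigmoid gives the epithelium probability map; thresholding at 0.5
    yields the binary map compared against the expert's labels."""
    with torch.no_grad():
        prob = torch.sigmoid(model(images))
    return (prob > threshold).float()
```

Note that the model outputs raw logits during training — BCEWithLogitsLoss fuses the sigmoid into the loss for numerical stability — and the explicit sigmoid is applied only at inference.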
In another embodiment, during training, the tissue microarray TMAs and their binary label maps are correspondingly sub-sampled before training.
For this example, all TMAs and their binary label maps were correspondingly sub-sampled because we compare the performance of models trained at different magnifications (i.e., 5x, 10x, and 20x).
In another embodiment, during training, the tissue microarray TMA and associated annotations are clipped to non-overlapping 256x256 pixel blocks.
For this example, each TMA is cropped and split into non-overlapping 256x256-pixel image patches, yielding 8802 training patches (from 191 TMAs) and 954 validation patches (from 21 TMAs).
In another embodiment, all pixel patches are extracted from the regions of marked epithelial tissue and non-epithelial tissue.
For this example, all patches were extracted from the annotated epithelial and non-epithelial tissue regions, with a size of 256x256 pixels.
In another embodiment, Table 2 shows the model's performance on the validation set at different magnifications, and Table 3 shows its performance on the test set at different magnifications. In the comparisons below, HEA denotes an intelligent H&E image enhancement, implemented in HistomicsTK, that can be added to our color enhancement; SE denotes the basic UNet network structure with added attention, obtained by appending an SE block to each convolutional layer in the downsampling path. On the validation set, compared with the pathologist's ground-truth labels, the UNet model at 10x magnification achieved a pixel accuracy (PA) of 88.05%, a recall (R) of 82.74%, a positive predictive value (PPV) of 86.41%, and a Dice coefficient (D) of 84.53% for epithelial tissue segmentation. Judging from our other metrics, conflicting or non-exhaustive annotations in the test set may account for the low recall there.
| Magnification | Model | Loss | PA% | R% | PPV% | D% |
| --- | --- | --- | --- | --- | --- | --- |
| 5 | UNet | 0.34 | 82.84 | 88.12 | 70.23 | 78.17 |
| 10 | UNet | 0.27 | 88.05 | 82.74 | 86.41 | 84.53 |
| 10 | UNet+HEA | 0.28 | 87.31 | 86.28 | 82.41 | 84.30 |
| 10 | UNet+SE | 0.30 | 86.28 | 91.79 | 77.57 | 84.08 |
| 20 | UNet | 0.29 | 86.71 | 81.40 | 86.24 | 83.75 |

TABLE 2
| Data set | Magnification | Model | PA% | R% | PPV% | D% |
| --- | --- | --- | --- | --- | --- | --- |
| D3 | 5 | UNet | 82.50 | 35.70 | 59.02 | 39.42 |
| D3 | 10 | UNet | 86.04 | 40.43 | 77.97 | 52.82 |
| D3 | 10 | UNet+HEA | 85.57 | 50.52 | 67.72 | 57.23 |
| D3 | 10 | UNet+SE | 81.40 | 53.47 | 52.55 | 54.57 |
| D3 | 20 | UNet | 85.19 | 31.93 | 79.77 | 46.35 |
| D4 | 5 | UNet | 80.09 | 64.50 | 46.03 | 48.71 |
| D4 | 10 | UNet | 88.77 | 51.96 | 78.72 | 58.82 |
| D4 | 10 | UNet+HEA | 88.43 | 46.71 | 81.39 | 55.57 |
| D4 | 10 | UNet+SE | 85.38 | 59.73 | 59.56 | 54.67 |
| D4 | 20 | UNet | 88.60 | 49.31 | 78.74 | 56.17 |
| D5 | 5 | UNet | 86.66 | 57.55 | 85.17 | 63.55 |
| D5 | 10 | UNet | 90.67 | 73.08 | 88.84 | 77.29 |
| D5 | 10 | UNet+HEA | 89.02 | 65.12 | 89.54 | 73.60 |
| D5 | 10 | UNet+SE | 90.28 | 78.40 | 83.02 | 75.93 |
| D5 | 20 | UNet | 90.47 | 71.63 | 88.49 | 75.20 |

TABLE 3
Six representative validation TMA images are shown in FIG. 5, where the first column shows the original TMA image, the second column the expert-annotated binary map, and the last column the result from our model. The expert annotations and the predictions are very close in overall shape, but the predictions resolve small regions and edges more cleanly. FIG. 6 shows representative test results for four WSIs from D3-D5: the first column shows the original WSI with the binary maps made by the pathologist (top) and the machine (bottom); the second column shows an enlarged detail of the original WSI marked by the rectangular box in the previous column; the third and fourth columns show binary maps of the enlarged part made by the expert and the machine; and the last column shows a heat map of the enlarged part produced by our framework. On boundaries and internal details, the machine's results are noticeably more precise than the expert's.
Although the embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to the above-described embodiments and application fields, and the above-described embodiments are illustrative, instructive, and not restrictive. Those skilled in the art, having the benefit of this disclosure, may effect numerous modifications thereto without departing from the scope of the invention as defined by the appended claims.
Claims (10)
1. A method for automatically segmenting an oral cancer epithelial tissue region of a pathological image comprises the following steps:
s100: preprocessing a pathology image and a label of a pathologist by using a method for extracting a pixel block patch;
s200: selecting a part of preprocessed pathological images as training samples to form a training set, and taking the rest of the preprocessed pathological images as verification samples to form a verification set;
s300: constructing a convolutional neural network UNet model, and training and verifying the UNet model by adopting the training set and the verification set in the step S200 to obtain a final UNet model;
s400: the final UNet model described above is applied to a multi-center external test set to automatically generate epithelial tissue regions.
2. The method of claim 1, wherein the training set is comprised of tissue microarray TMAs and pathologist-made labels.
3. The method of claim 1, wherein the multi-center external test set consists of whole-slide images (WSI).
4. The method of claim 1, wherein the convolutional neural network UNet model consists of a convolution block, a deconvolution block, a pooling layer, and an output layer.
5. The method of claim 4, wherein there are 5 groups of convolution blocks, each convolution block has 2 convolution layers, each convolution layer is followed by a batch normalization layer and a ReLU activation layer, and every convolution block except the first begins with a pooling layer; there are 4 deconvolution blocks, each comprising a deconvolution layer and 2 convolution layers, each convolution layer followed by a batch normalization layer and a ReLU activation layer; and the output layer comprises only one convolution layer.
6. The method of claim 5, wherein the convolution layers in the convolution and deconvolution blocks have kernel size 3, stride 1, and padding 1; the pooling layer in a convolution block has kernel size 2, stride 2, and padding 0; the deconvolution layer in a deconvolution block has kernel size 2, stride 2, and padding 0; and the convolution layer of the output layer has kernel size 1, stride 1, and padding 0.
7. The method of claim 1, wherein the convolutional neural network UNet model is trained with the RMSprop optimization algorithm using a learning rate of 0.01 and a weight decay of 1e-8.
8. The method of claim 2, wherein during training, the tissue microarray TMA and its binary signature map are correspondingly sub-sampled and trained.
9. The method of claim 2, during training, the tissue microarray TMA and its binary signature map are clipped to non-overlapping 256x256 pixel blocks.
10. The method of claim 1, all pixel patches are extracted from regions of marked epithelial and non-epithelial tissue to distinguish one from another.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011200003.1A CN112308840A (en) | 2020-10-30 | 2020-10-30 | Automatic segmentation method for oral cancer epithelial tissue region of pathological image |
CN2020112000031 | 2020-10-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112712522A true CN112712522A (en) | 2021-04-27 |
Family
ID=74332312
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011200003.1A Withdrawn CN112308840A (en) | 2020-10-30 | 2020-10-30 | Automatic segmentation method for oral cancer epithelial tissue region of pathological image |
CN202110116993.9A Pending CN112712522A (en) | 2020-10-30 | 2021-01-28 | Automatic segmentation method for oral cancer epithelial tissue region of pathological image |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011200003.1A Withdrawn CN112308840A (en) | 2020-10-30 | 2020-10-30 | Automatic segmentation method for oral cancer epithelial tissue region of pathological image |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN112308840A (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113116305B (en) * | 2021-04-20 | 2022-10-28 | 深圳大学 | Nasopharyngeal endoscope image processing method and device, electronic equipment and storage medium |
CN114973244B (en) * | 2022-06-12 | 2023-04-11 | 桂林电子科技大学 | System and method for automatically identifying mitosis of H & E staining pathological image of breast cancer |
- 2020-10-30: CN application CN202011200003.1A published as CN112308840A, status: not active (Withdrawn)
- 2021-01-28: CN application CN202110116993.9A published as CN112712522A, status: active (Pending)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108629768A (en) * | 2018-04-29 | 2018-10-09 | 山东省计算中心(国家超级计算济南中心) | The dividing method of epithelial tissue in a kind of oesophagus pathological image |
US20200258223A1 (en) * | 2018-05-14 | 2020-08-13 | Tempus Labs, Inc. | Determining biomarkers from histopathology slide images |
CN109308695A (en) * | 2018-09-13 | 2019-02-05 | 镇江纳兰随思信息科技有限公司 | Based on the cancer cell identification method for improving U-net convolutional neural networks model |
Non-Patent Citations (1)
Title |
---|
XU Dong; LI Hao; ZHOU Lixiao; LYU Liang: "Construction of an automatic recognition model for diabetic macular edema based on the convolutional neural network UNet", Recent Advances in Ophthalmology (眼科新进展), no. 04 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113256582A (en) * | 2021-05-21 | 2021-08-13 | 兰州兰石检测技术有限公司 | Method for identifying original austenite grain boundary in martensite metallographic phase based on U-net network |
CN113313685A (en) * | 2021-05-28 | 2021-08-27 | 太原理工大学 | Renal tubular atrophy region identification method and system based on deep learning |
CN113313685B (en) * | 2021-05-28 | 2022-11-29 | 太原理工大学 | Renal tubular atrophy region identification method and system based on deep learning |
WO2023082416A1 (en) * | 2021-11-15 | 2023-05-19 | 中国科学院深圳先进技术研究院 | Deep learning-based atrial fibrillation assessment method and apparatus |
CN114283136A (en) * | 2021-12-23 | 2022-04-05 | 西安交通大学 | Cancer infiltration detection method based on cascade network |
CN114283136B (en) * | 2021-12-23 | 2023-10-27 | 西安交通大学 | Cascade network-based cancer infiltration detection method |
Also Published As
Publication number | Publication date |
---|---|
CN112308840A (en) | 2021-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11756318B2 (en) | Convolutional neural networks for locating objects of interest in images of biological samples | |
CN112712522A (en) | Automatic segmentation method for oral cancer epithelial tissue region of pathological image | |
JP7146886B2 (en) | Systems and methods for comprehensive multi-assay tissue analysis | |
US20240119595A1 (en) | Computer supported review of tumors in histology images and post operative tumor margin assessment | |
CN111986150B | Interactive annotation refinement method for digital pathology images | |
CN102388305B (en) | Image-based risk score-a prognostic predictor of survival and outcome from digital histopathology | |
Wang et al. | Weakly supervised learning for whole slide lung cancer image classification | |
Linkon et al. | Deep learning in prostate cancer diagnosis and Gleason grading in histopathology images: An extensive study | |
EP3345161A1 (en) | Image processing systems and methods for displaying multiple images of a biological specimen | |
Nateghi et al. | A deep learning approach for mitosis detection: application in tumor proliferation prediction from whole slide images | |
Hsiao et al. | A deep learning-based precision and automatic kidney segmentation system using efficient feature pyramid networks in computed tomography images | |
WO2021076605A1 (en) | Weakly supervised multi-task learning for cell detection and segmentation | |
Wang et al. | Automatic generation of pathological benchmark dataset from hyperspectral images of double stained tissues | |
Dabass et al. | A hybrid U-Net model with attention and advanced convolutional learning modules for simultaneous gland segmentation and cancer grade prediction in colorectal histopathological images | |
Han et al. | Automatic cancer detection on digital histopathology images of mid-gland radical prostatectomy specimens | |
CN116884597A (en) | Pathological image breast cancer molecular typing method and system based on self-supervision pre-training and multi-example learning | |
EP4235599A1 (en) | Annotation refinement for segmentation of whole-slide images in digital pathology | |
CN115690056A (en) | Gastric cancer pathological image classification method and system based on HER2 gene detection | |
Tchikindas et al. | Segmentation of nodular medulloblastoma using random walker and hierarchical normalized cuts | |
Khan et al. | Automatic Segmentation and Shape, Texture-based Analysis of Glioma Using Fully Convolutional Network | |
Sharma | Medical image analysis of gastric cancer in digital histopathology: methods, applications and challenges | |
CN112200801B (en) | Automatic detection method for cell nucleus of digital pathological image | |
US20230098732A1 (en) | Systems and methods to process electronic images to selectively hide structures and artifacts for digital pathology image review | |
Speier et al. | Image-based patch selection for deep learning to improve automated Gleason grading in histopathological slides | |
Santamaria-Pang et al. | Epithelial cell segmentation via shape ranking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |