Disclosure of Invention
In the pathology-image-based dMMR subtype classification method of the present invention, a region-of-interest identification model is trained so that irrelevant factors in the whole-slide image (WSI) are removed before dMMR subtype classification of the target pathological image, which reduces the classification difficulty. The method specifically comprises the following steps:
dividing annotated pathological images to obtain a plurality of known image blocks, constructing a data set from the known image blocks, and training a deep learning network with the data set to obtain a region-of-interest identification model; acquiring a target image region of a target pathological image by means of the region-of-interest identification model; dividing the target image region into a plurality of target image blocks of equal size, dividing all the target image blocks evenly into a plurality of parts, and stacking and merging the target image blocks of each part to obtain stacked blocks; and acquiring feature representations of all the stacked blocks, and obtaining the dMMR subtype classification result of the target pathological image from all the feature representations.
In the dMMR subtype classification method of the present invention, the annotated pathological image is divided into known image blocks at a plurality of image scales under a plurality of resolutions, all the known image blocks are grouped into data sets by resolution, and region-of-interest identification models for the plurality of image scales are trained with the data sets of the corresponding resolutions; wherein the region-of-interest identification model is obtained by training a MobileNetV2 network.
In the dMMR subtype classification method of the present invention, a neural network is used to obtain the feature representations, principal component analysis is used to reduce the dimensionality of all the feature representations, and the dimension-reduced feature representations are input into a random forest classifier to obtain the dMMR subtype classification result; wherein the feature representations are obtained using a ResNet-18 network.
The invention also provides a system for dMMR subtype classification based on pathological images, comprising: a region-of-interest extraction module for dividing annotated pathological images to obtain a plurality of known image blocks, constructing a data set from the known image blocks, and training a deep learning network with the data set to obtain a region-of-interest identification model; an image processing module for acquiring a target image region of a target pathological image by means of the region-of-interest identification model, dividing the target image region into a plurality of target image blocks of equal size, dividing all the target image blocks evenly into a plurality of parts, and stacking and merging the target image blocks of each part to obtain stacked blocks; and an image classification module for acquiring feature representations of all the stacked blocks and obtaining the dMMR subtype classification result of the target pathological image from all the feature representations.
The dMMR subtype classification system of the present invention comprises a model training module, a model analysis module and a model classification module, wherein the model training module comprises a data set generation module for dividing the annotated pathological image into known image blocks at a plurality of image scales under a plurality of resolutions, grouping all the known image blocks into data sets by resolution, and training region-of-interest identification models for the plurality of image scales with the data sets of the corresponding resolutions; the model training module obtains the region-of-interest identification model by training a MobileNetV2 network.
In the dMMR subtype classification system of the present invention, the image classification module comprises: a feature acquisition module for acquiring the feature representations with a neural network and reducing the dimensionality of all the feature representations by principal component analysis; and a classification module for inputting the dimension-reduced feature representations into a random forest classifier to obtain the dMMR subtype classification result; wherein the feature acquisition module acquires the feature representations using a ResNet-18 network.
The present invention also proposes a computer-readable storage medium storing computer-executable instructions, characterized in that, when executed, the computer-executable instructions implement the method for dMMR subtype classification of pathological images as described above.
The present invention also proposes a data processing apparatus comprising the computer-readable storage medium as described above, wherein, when the processor of the data processing apparatus retrieves and executes the computer-executable instructions in the computer-readable storage medium, dMMR subtype classification is performed on the target pathological image.
Detailed Description
To make the present invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
The invention provides a dMMR subtype classification method based on pathological images, addressing the problem that pathological images are too large to be classified directly. A multi-scale region-of-interest identification network is trained on images carrying information at different scales, which effectively filters out interfering background information. The images are then classified by combining a neural network model with machine learning: the region of interest (ROI) is segmented into image blocks of uniform size, a neural network extracts a feature representation of the image blocks, and PCA reduces the dimensionality of the extracted features, which speeds up subsequent operations. All dimension-reduced features of the same pathological image are then fused and input into a random forest to obtain the final classification result.
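The PCA, fusion and random forest stages described above can be illustrated with a minimal sketch. It assumes scikit-learn is available, uses synthetic 512-dimensional vectors in place of real network features, assumes a fixed number of stacked blocks per slide so that fusion can be done by concatenation, and uses two hypothetical subtype labels; none of these choices is specified by the invention.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): 30 slides, 4 stacked blocks per slide,
# one 512-dimensional feature vector per stacked block.
n_slides, blocks_per_slide, feat_dim = 30, 4, 512
labels = np.array([0, 1] * (n_slides // 2))  # two hypothetical subtype labels
block_feats = rng.normal(size=(n_slides, blocks_per_slide, feat_dim))
block_feats += labels[:, None, None] * 0.5  # make the two classes separable

# Fit PCA on all stacked-block features pooled across slides.
n_components = 8  # illustrative value, not taken from the text
pca = PCA(n_components=n_components, random_state=0)
reduced = pca.fit_transform(block_feats.reshape(-1, feat_dim))

# Fuse: concatenate the reduced block features belonging to each slide.
slide_feats = reduced.reshape(n_slides, blocks_per_slide * n_components)

# Slide-level classification with a random forest.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(slide_feats, labels)
pred = clf.predict(slide_feats)
```

If the number of stacked blocks varies between slides, concatenation no longer yields a fixed-length vector, and some pooling of the block features would be needed instead.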
The invention performs pathological image classification by training a deep learning model and a machine learning model. FIG. 1 is a schematic diagram of the region-of-interest identification model and the dMMR subtype classification model provided by the invention. As shown in FIG. 1, the machine learning model of the present invention comprises a region-of-interest identification model and a dMMR subtype classification model.
The region-of-interest identification model is used to determine a region of interest (ROI) in a whole-slide image (WSI) of the pathological section. A WSI contains both normal tissue and lesion tissue, and region-of-interest identification first removes the influence of the normal tissue regions on the subsequent classification. In addition, a WSI is too large to be processed directly. As shown in FIG. 2, before the region-of-interest identification model is trained, the WSI is first divided into image blocks (patches) of 256 × 256 pixels to generate a training data set. During training, the data in the data set are augmented by random flipping and then input into the deep learning network, and the region-of-interest identification model is trained and optimized. When the region-of-interest identification model is tested, the WSI is first preprocessed: because a WSI contains large blank regions, a threshold segmentation method is used to remove the white background, and the trained region-of-interest identification model is then used to generate a probability heat map. Because glandular structures in colorectal cancer can be displayed completely within a 256 × 256 image at lower resolution, patches are cut at 5× and at 40× magnification, two region-of-interest identification models are trained, and the two models are integrated to serve as the final region-of-interest identification network.
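The patch cutting, white-background removal and random-flip augmentation described above can be sketched as follows. This is a minimal illustration on an in-memory RGB array; the function names, the gray-level threshold of 220 and the minimum tissue fraction of 0.5 are assumptions for the example and are not values given by the invention.

```python
import numpy as np

PATCH = 256  # patch edge length stated in the text


def tile_tissue_patches(image, white_thresh=220, min_tissue=0.5):
    """Cut an RGB array into non-overlapping 256 x 256 patches and keep
    those whose fraction of non-white pixels is at least min_tissue.
    white_thresh and min_tissue are illustrative assumptions."""
    height, width, _ = image.shape
    kept = []
    for y in range(0, height - PATCH + 1, PATCH):
        for x in range(0, width - PATCH + 1, PATCH):
            patch = image[y:y + PATCH, x:x + PATCH]
            gray = patch.mean(axis=2)  # simple channel average as gray level
            tissue_fraction = (gray < white_thresh).mean()
            if tissue_fraction >= min_tissue:
                kept.append(((y, x), patch))
    return kept


def random_flip(patch, rng):
    """Flip a patch horizontally and/or vertically, each with probability 0.5."""
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    if rng.random() < 0.5:
        patch = patch[::-1, :]
    return patch


# Usage: a white 512 x 768 canvas with one dark 256 x 256 "tissue" tile.
demo = np.full((512, 768, 3), 255, dtype=np.uint8)
demo[0:256, 0:256] = 100
tiles = tile_tissue_patches(demo)
augmented = random_flip(tiles[0][1], np.random.default_rng(0))
```

A real WSI would be read region by region at the chosen magnification rather than loaded whole into memory; the array here only stands in for one such region.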
The dMMR subtype classification model consists of two parts, a patch-level part and a WSI-level part, as shown in FIG. 3. For dMMR subtype classification of a pathological image, the target WSI is first segmented according to the probability heat map produced by the region-of-interest identification model to obtain a plurality of patches. Every N patches are stacked and merged (a Concatenate operation) to generate image blocks (patch blocks) of size N × 256 × 3, and the output of the layer preceding the fully connected layer of a classification network (for example, a ResNet-18 network) is taken as the feature of each patch block. Because each patch block feature has 512 dimensions and the number of features is large, principal component analysis (PCA) is applied to reduce the dimensionality of the patch block features. All dimension-reduced patch block features within the WSI are then fused to generate a feature map of the WSI, which is input into a random forest classifier to obtain the dMMR subtype classification result of the target WSI.
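The grouping of every N patches into one stacked block can be sketched as below. The sketch assumes the patches are stacked along a new leading axis, so that each stacked block has shape (N, height, width, 3), and that leftover patches which do not fill a complete group are dropped; the function name `stack_patches` is hypothetical, and the invention does not state how the stacking axis or the remainder is handled.

```python
import numpy as np


def stack_patches(patches, n):
    """Group consecutive patches into stacked blocks of n patches each.
    Leftover patches that do not fill a block are dropped (assumption)."""
    patches = np.asarray(patches)
    usable = (len(patches) // n) * n
    if usable == 0:
        return np.empty((0, n) + patches.shape[1:], dtype=patches.dtype)
    return patches[:usable].reshape(-1, n, *patches.shape[1:])


# Usage: 10 small dummy patches (8 x 8 instead of 256 x 256 to keep it light),
# each filled with its own index so the grouping can be checked.
dummy = np.stack([np.full((8, 8, 3), i, dtype=np.uint8) for i in range(10)])
blocks = stack_patches(dummy, 4)
```

Each stacked block would then be passed through the feature network, and the 512-dimensional output of the layer before the fully connected layer taken as its feature.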
As shown in FIG. 4, the dMMR subtype classification method of the present invention specifically includes:
Step S1, generating pathological images with a multi-resolution pyramid structure from annotated digital pathological sections, building data sets at different resolutions, and training region-of-interest identification models of different scales. When a data set is built, the WSI is divided into image blocks (patches) of uniform size, and a single WSI can yield from several thousand to several hundred thousand patches. The WSI of the present invention may be a hematoxylin-eosin (H&E) stained section digitized by a whole-slide scanner, or may be obtained from a network (e.g., GDC/global data center), but the present invention is not limited thereto;
Step S2, inputting the target WSI into the region-of-interest identification model and acquiring the ROI, so that the influence of irrelevant factors in the WSI is removed for the subsequent dMMR subtype classification, as shown in FIG. 5;
Step S3, segmenting the ROI acquired in step S2 into patches of fixed size, stacking and merging every N patches to generate a plurality of patch stacked blocks, and inputting the stacked blocks into a ResNet-18 network to obtain the feature representations of the patch blocks, as shown in FIG. 6;
Step S4, applying principal component analysis (PCA) to reduce the dimensionality of the patch block features obtained in step S3, fusing all patch block features in the target WSI, and inputting them into a random forest classifier to obtain the dMMR subtype classification result of the target WSI.
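For step S2, the text states that the two region-of-interest models are integrated but does not give the integration rule. The following minimal sketch assumes that the two per-patch probability maps have been resampled to a common grid, that they are combined by simple averaging, and that patches with a fused probability of at least 0.5 form the ROI; the function names and the threshold are hypothetical.

```python
import numpy as np


def fuse_heatmaps(prob_low, prob_high):
    """Average two probability maps of identical shape.
    Averaging is an assumed integration rule, not one stated in the text."""
    prob_low = np.asarray(prob_low, dtype=float)
    prob_high = np.asarray(prob_high, dtype=float)
    if prob_low.shape != prob_high.shape:
        raise ValueError("probability maps must share one grid")
    return (prob_low + prob_high) / 2.0


def select_roi_patches(prob_map, threshold=0.5):
    """Return (row, col) grid positions whose probability reaches the threshold."""
    rows, cols = np.nonzero(np.asarray(prob_map) >= threshold)
    return list(zip(rows.tolist(), cols.tolist()))


# Usage with a 2 x 2 grid of patch probabilities from each model.
low_mag = [[0.9, 0.2], [0.6, 0.1]]   # hypothetical 5x model output
high_mag = [[0.7, 0.4], [0.2, 0.3]]  # hypothetical 40x model output
roi = select_roi_patches(fuse_heatmaps(low_mag, high_mag))
```

The selected grid positions identify the patches that go on to stacking and feature extraction in step S3.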
The invention uses the AUC (area under the ROC curve) as the index for evaluating the performance of the classification algorithm. FIG. 7 shows the ROC curve of the classification network, with an AUC of 0.88.
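For reference, the AUC of a binary labelling can be computed directly from its rank interpretation: it is the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below uses made-up scores for illustration only; they are unrelated to the 0.88 reported for FIG. 7.

```python
import numpy as np


def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores.
    Ties between a positive and a negative score count as one half."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]
    neg = y_score[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("both classes must be present")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


# Usage: 3 of the 4 positive/negative pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```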
FIG. 8 is a schematic diagram of a data processing apparatus of the present invention. As shown in FIG. 8, an embodiment of the present invention also provides a computer-readable storage medium and a data processing apparatus. The computer-readable storage medium of the present invention stores computer-executable instructions that, when executed by a processor of the data processing apparatus, implement dMMR subtype classification of a pathological image. It will be understood by those skilled in the art that all or part of the steps of the above method may be implemented by a program instructing relevant hardware (e.g., a processor, FPGA, ASIC, etc.), and the program may be stored in a readable storage medium, such as a read-only memory, a magnetic disk or an optical disk. All or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, the modules in the above embodiments may be implemented in hardware, for example, by an integrated circuit, or in software, for example, by a processor executing programs/instructions stored in a memory. Embodiments of the invention are not limited to any specific form of hardware or software combination.
The invention has the following advantages. In region-of-interest identification, identification network models of different scales are fused, which improves identification accuracy. In dMMR subtype classification, to address the problem that a WSI is too large to be classified directly, a neural network classification model and a machine learning classification model are combined: the neural network model produces a feature representation for each patch in the WSI, and all the features are then fused and input into a random forest to determine the classification result of the WSI. This effectively relieves the workload of researchers and improves research efficiency.
Although the present invention has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made by those skilled in the art without departing from the spirit and scope of the invention.