CN114170473A - dMMR subtype classification method and system based on pathological images

Info

Publication number: CN114170473A
Application number: CN202111233264.8A
Authority: CN
Inventors: 赵娜; 吴焕文; 窦晋津; 杜保林; 王晓雯
Original assignee: Chongqing Zhijian Life Technology Co ltd; Beijing Zhijian Life Technology Co ltd
Current assignee: Chongqing Zhijian Life Technology Co ltd
Priority date: 2021-10-22
Filing date: 2021-10-22
Publication date: 2022-03-11

Abstract

The invention provides a method for classifying dMMR subtypes based on pathological images, comprising the following steps: dividing labeled pathological images into a plurality of known image blocks, constructing a data set from the known image blocks, and training a deep learning network on the data set to obtain a region-of-interest identification model; acquiring a target image area of a target pathological image through the region-of-interest identification model; dividing the target image area into a plurality of target image blocks of the same size, dividing all the target image blocks evenly into a plurality of parts, and stacking and merging the target image blocks of each part to obtain stacked blocks; and acquiring the feature representations of all the stacked blocks and obtaining the dMMR subtype classification result of the target pathological image from all the feature representations. The invention also provides a dMMR subtype classification system based on pathological images and a data processing device for performing dMMR subtype classification on target pathological images.

Description

Method and system for classifying dMMR subtypes based on pathological images

Technical Field

The invention relates to the technical field of medical pathological image processing, in particular to a pathological image classification method and system based on deep learning.

Background

In tumor pathology research, researchers usually obtain living tissue from a suspected tumor area by puncture or surgery, prepare pathological sections, and read the sections under a microscope to obtain a pathological result. This approach depends heavily on the accumulated experience of the researcher; in addition, errors in the section preparation process can affect the result. A computer preprocessing method for pathological images is therefore needed that, by normalizing pathological images and performing dMMR (deficient mismatch repair) subtype classification, reduces the workload of researchers and the deviation in pathological results caused by subjective and objective factors.

Disclosure of Invention

The pathological image-based dMMR subtype classification method of the invention trains a region-of-interest identification model so that the influence of irrelevant factors in the whole-slide image (WSI) is removed before dMMR subtype classification of the target pathological image, which reduces the classification difficulty. The method specifically comprises the following steps:

dividing labeled pathological images into a plurality of known image blocks, constructing a data set from the known image blocks, and training a deep learning network on the data set to obtain a region-of-interest identification model; acquiring a target image area of a target pathological image through the region-of-interest identification model; dividing the target image area into a plurality of target image blocks of the same size, dividing all the target image blocks evenly into a plurality of parts, and stacking and merging the target image blocks of each part to obtain stacked blocks; and acquiring the feature representations of all the stacked blocks and obtaining the dMMR subtype classification result of the target pathological image from all the feature representations.

In the dMMR subtype classification method of the invention, the labeled pathological image is divided, at a plurality of resolutions, into known image blocks of a plurality of image scales; all the known image blocks are grouped into data sets of the corresponding resolutions; and the data sets are used to train region-of-interest identification models for the plurality of image scales at the corresponding resolutions; wherein the region-of-interest identification model is obtained by training a MobileNetV2 network.

In the dMMR subtype classification method of the invention, a neural network is used to obtain the feature representations, principal component analysis is used to reduce the dimensionality of all the feature representations, and the dimensionality-reduced feature representations are input into a random forest classifier to obtain the dMMR subtype classification result; wherein the feature representations are obtained using a ResNet-18 network.

The invention also provides a system for classifying dMMR subtypes based on pathological images, comprising: a region-of-interest extraction module, used for dividing labeled pathological images into a plurality of known image blocks, constructing a data set from the known image blocks, and training a deep learning network on the data set to obtain a region-of-interest identification model; an image processing module, used for acquiring a target image area of the target pathological image through the region-of-interest identification model, dividing the target image area into a plurality of target image blocks of the same size, dividing all the target image blocks evenly into a plurality of parts, and stacking and merging the target image blocks of each part to obtain stacked blocks; and an image classification module, used for acquiring the feature representations of all the stacked blocks and obtaining the dMMR subtype classification result of the target pathological image from all the feature representations.

The dMMR subtype classification system of the invention comprises a model training module, a model analysis module and a model classification module, wherein the model training module comprises a data set generation module used for dividing the labeled pathological image, at a plurality of resolutions, into known image blocks of a plurality of image scales, grouping all the known image blocks into data sets of the corresponding resolutions, and training region-of-interest identification models for the plurality of image scales at the corresponding resolutions with the data sets; the model training module obtains the region-of-interest identification model by training a MobileNetV2 network.

In the dMMR subtype classification system of the invention, the image classification module comprises: a feature acquisition module, used for acquiring the feature representations with a neural network and reducing the dimensionality of all the feature representations by principal component analysis; and a classification module, used for inputting the dimensionality-reduced feature representations into a random forest classifier to obtain the dMMR subtype classification result; the feature acquisition module acquires the feature representations using a ResNet-18 network.

The present invention also proposes a computer-readable storage medium storing computer-executable instructions, characterized in that, when executed, the computer-executable instructions implement the pathological image-based dMMR subtype classification method described above.

The present invention also proposes a data processing apparatus comprising the computer-readable storage medium described above, wherein, when the processor of the data processing apparatus retrieves and executes the computer-executable instructions in the computer-readable storage medium, dMMR subtype classification is performed on the target pathological image.

Drawings

FIG. 1 is a schematic representation of the dMMR subtype classification according to the invention.

FIG. 2 is a schematic diagram of the dMMR subtype classification model of the present invention.

FIG. 3 is a flow chart of region of interest identification of the present invention.

FIG. 4 is a flow chart of the dMMR subtype classification method of the present invention.

FIG. 5 is a probability heatmap generated by the region of interest identification network of the present invention.

FIG. 6 is a flowchart of the patch classifier of the present invention.

FIG. 7 is a ROC curve for a classifier.

FIG. 8 is a schematic diagram of a data processing apparatus of the present invention.

Detailed Description

In order to make the present invention more comprehensible, embodiments accompanied with figures are described in detail below.

The invention provides a dMMR subtype classification method based on pathological images. To address the problem that pathological images are too large to be classified directly, the method trains a multi-scale region-of-interest identification network on images at different scales, which effectively filters out interfering background information, and classifies the images by combining a neural network model with machine learning: the region of interest (ROI) is segmented into image blocks of uniform size, a neural network extracts a feature representation for each image block, and PCA reduces the dimensionality of the extracted features to speed up subsequent operations. All the dimensionality-reduced features of the same pathological image are then fused and input into a random forest to obtain the final classification result.

The invention performs pathological image classification by training a deep learning model and a machine learning model. FIG. 1 is a schematic diagram of the region-of-interest identification model and the dMMR subtype classification model provided by the invention. As shown in FIG. 1, the machine learning model of the invention comprises a region-of-interest identification model and a dMMR subtype classification model.

The region-of-interest identification model is used to determine the region of interest (ROI) in a whole-slide image (WSI) of the pathological section. A WSI contains both normal tissue and lesion tissue, and region-of-interest identification first removes the influence of the normal tissue regions on the subsequent classification. In addition, a WSI is too large to be processed directly. As shown in FIG. 2, before the region-of-interest identification model is trained, the WSI is first divided into image blocks (patches) of size 256 × 256 to generate a training data set. During training, the data in the data set are augmented by random flipping and then input into the deep learning network to train and optimize the region-of-interest identification model. When the model is tested, the WSI is first preprocessed: because a WSI contains many blank regions, a threshold segmentation method is used to remove the white background, and the trained region-of-interest identification model then generates a probability heat map. Because the glandular structures in colorectal cancer can be shown in full in a 256 × 256 image at lower resolution, patches are cut at both 5× and 40× magnification, two region-of-interest identification models are trained, and the two models are integrated to form the final region-of-interest identification network.
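The patch generation, white-background removal, and random-flip augmentation described above can be illustrated with a minimal NumPy sketch. The function names `tile_foreground` and `random_flip`, the brightness threshold of 220, and the 80% white-fraction cutoff are hypothetical choices made for illustration; the description gives only the 256 × 256 patch size and does not specify threshold values.

```python
import numpy as np

PATCH = 256  # patch edge length stated in the description


def tile_foreground(image, patch=PATCH, white_thresh=220, max_white_frac=0.8):
    """Split an RGB array into non-overlapping patches, skipping mostly white ones.

    white_thresh and max_white_frac are illustrative assumptions, not values
    taken from the patent.
    """
    h, w, _ = image.shape
    kept = []  # list of (top, left, patch_array)
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            tile = image[top:top + patch, left:left + patch]
            # A pixel counts as background when all three channels are bright.
            white = np.all(tile >= white_thresh, axis=-1)
            if white.mean() <= max_white_frac:
                kept.append((top, left, tile))
    return kept


def random_flip(tile, rng):
    """Randomly flip a patch horizontally and/or vertically (training augmentation)."""
    if rng.random() < 0.5:
        tile = tile[:, ::-1]
    if rng.random() < 0.5:
        tile = tile[::-1, :]
    return tile


# Toy example: a 512 x 768 white "slide" with one dark tissue square.
slide = np.full((512, 768, 3), 255, dtype=np.uint8)
slide[0:256, 256:512] = 120
patches = tile_foreground(slide)
flipped = random_flip(patches[0][2], np.random.default_rng(0))
```

In practice a WSI would be read region by region at the chosen magnification rather than loaded as a single array; the in-memory array here only keeps the sketch self-contained.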

The dMMR subtype classification model consists of two parts, patch-level classification and WSI-level classification, as shown in FIG. 3. To classify the dMMR subtype of a pathological image, the target WSI is first segmented according to the probability heat map produced by the region-of-interest identification model to obtain a plurality of patches. Every N patches are stacked and merged (a Concatenate operation) to generate image blocks (patch blocks) of size N × 256 × 3, and the output of the layer preceding the fully connected layer of a classification network (such as a ResNet-18 network) is taken as the feature of each patch block. Because each patch block feature has 512 dimensions and the number of features is large, principal component analysis (PCA) is applied to reduce the dimensionality of the patch block features. All the dimensionality-reduced patch block features in the WSI are then fused to generate a feature map of the WSI, which is input into a random forest classifier to obtain the dMMR subtype classification result of the target WSI.
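The grouping of every N patches into a patch block can be sketched as follows. This is an illustration under stated assumptions: the function name `stack_patches` is hypothetical, the patches are stacked along a new leading axis (the description does not state which axis the Concatenate operation uses), and leftover patches that do not fill a block are dropped, which the description does not address.

```python
import numpy as np


def stack_patches(patches, n):
    """Group patches into consecutive sets of n and stack each set.

    patches: array of shape (num_patches, 256, 256, 3).
    Returns an array of shape (num_blocks, n, 256, 256, 3).
    Stacking on a new axis and dropping the remainder are assumptions.
    """
    patches = np.asarray(patches)
    num_blocks = len(patches) // n
    if num_blocks == 0:
        raise ValueError("need at least n patches to form one block")
    usable = patches[: num_blocks * n]
    return usable.reshape(num_blocks, n, *patches.shape[1:])


# Toy example: 10 patches with N = 4 give 2 blocks; 2 patches are left over.
patches = np.zeros((10, 256, 256, 3), dtype=np.uint8)
for i in range(10):
    patches[i] = i  # tag each patch so the ordering can be checked
blocks = stack_patches(patches, n=4)
```

Each resulting block would then be passed to the feature network; the reshaping needed to match that network's input layout depends on how the network is adapted and is not shown here.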

As shown in FIG. 4, the dMMR subtype classification method of the present invention specifically comprises:

Step S1: generating pathological images with a multi-resolution pyramid structure from the labeled digital pathological sections, building data sets at the different resolutions, and training region-of-interest identification models at different scales. When a data set is built, each WSI is divided into image blocks (patches) of uniform size; a single WSI can yield from thousands to hundreds of thousands of patches. The WSI of the present invention may be a hematoxylin-eosin (H&E) stained section digitized by a whole-slide scanner, or may be obtained from a network source (e.g., GDC/global data center), but the present invention is not limited thereto.

Step S2: inputting the target WSI into the region-of-interest identification model and acquiring the ROI, as shown in FIG. 5, so that the influence of irrelevant factors in the WSI is removed for the subsequent dMMR subtype classification;

Step S3: segmenting the ROI acquired in step S2 into patches of fixed size, stacking and merging every N patches to generate a plurality of patch stacked blocks, and inputting the stacked blocks into a ResNet-18 network to obtain the feature representations of the patch blocks, as shown in FIG. 6;

Step S4: performing PCA (principal component analysis) dimensionality reduction on the patch block features obtained in step S3, fusing all the patch block features in the target WSI, and inputting the fused features into a random forest classifier to obtain the dMMR subtype classification result of the target WSI.
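The dimensionality reduction and feature fusion of step S4 can be sketched with NumPy alone. This is a sketch under stated assumptions: the 512-dimensional block features are replaced by random numbers, the number of retained components (16) and the number of blocks per slide (8) are arbitrary, and fusion is implemented as concatenation of the reduced block features, which is only one possible reading of "fusing". The function names `fit_pca` and `reduce_and_fuse` are hypothetical.

```python
import numpy as np


def fit_pca(features, k):
    """Return (mean, components) of a k-component PCA fitted by SVD."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:k]


def reduce_and_fuse(block_features, mean, components):
    """Project one slide's block features and concatenate them into one vector."""
    reduced = (block_features - mean) @ components.T  # (num_blocks, k)
    return reduced.reshape(-1)  # concatenation is an assumed fusion rule


rng = np.random.default_rng(0)
num_slides, blocks_per_slide, dim, k = 6, 8, 512, 16
# Stand-in for ResNet-18 penultimate-layer outputs (random, illustration only).
all_features = rng.normal(size=(num_slides, blocks_per_slide, dim))

mean, components = fit_pca(all_features.reshape(-1, dim), k)
slide_vectors = np.stack(
    [reduce_and_fuse(f, mean, components) for f in all_features]
)
```

Each row of `slide_vectors` is a fixed-length slide-level feature vector that could be passed, together with the slide labels, to a random forest implementation such as scikit-learn's `RandomForestClassifier`.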

To evaluate the performance of the classification algorithm, the invention uses the AUC (area under the ROC curve) as the evaluation index. FIG. 7 shows the ROC curve of the classification network, with an AUC value of 0.88.
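For reference, the AUC of a binary classifier equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The following pure-Python sketch computes it that way; the function name `roc_auc`, the binary label encoding, and the toy labels and scores are illustrative assumptions and are unrelated to the data behind FIG. 7.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 8 of the 9 positive/negative pairs are ordered correctly.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for slide-level evaluation; a rank-based formulation would be preferable for large sample counts.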

FIG. 8 is a schematic diagram of a data processing apparatus of the present invention. As shown in FIG. 8, an embodiment of the present invention also provides a computer-readable storage medium and a data processing apparatus. The computer-readable storage medium of the present invention stores computer-executable instructions that, when executed by a processor of the data processing apparatus, implement dMMR subtype classification of a pathological image. It will be understood by those skilled in the art that all or part of the steps of the above method may be implemented by a program instructing the relevant hardware (e.g., processor, FPGA, ASIC, etc.), and the program may be stored in a readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk. All or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Accordingly, the modules in the above embodiments may be implemented in hardware, for example, by an integrated circuit, or in software, for example, by a processor executing programs/instructions stored in a memory. Embodiments of the invention are not limited to any specific combination of hardware and software.

The advantages of the method are as follows. In region-of-interest identification, identification network models at different scales are fused, which improves identification accuracy. In dMMR subtype classification, to address the problem that a WSI is too large to be classified directly, a neural network classification model and a machine learning classification model are combined: the neural network model produces a feature representation for each patch in the WSI, and all the features are then fused and input into a random forest to determine the classification result of the WSI. This effectively shares the researcher's workload and improves research efficiency.
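One possible way to fuse region-of-interest models at two scales is to average their patch-level probability maps after bringing them to a common grid. The description does not state how the 5× and 40× models are integrated, so the averaging rule, the nearest-neighbour upsampling, the 0.5 threshold, and the function names below are all assumptions made for illustration. The factor of 8 follows from the ratio of the two magnifications when both models use patches of the same size.

```python
import numpy as np


def upsample_nearest(prob_map, factor):
    """Repeat each cell `factor` times along both axes."""
    return np.repeat(np.repeat(prob_map, factor, axis=0), factor, axis=1)


def fuse_probability_maps(low_res_map, high_res_map, threshold=0.5):
    """Average two patch-level probability maps and threshold the result.

    Averaging and the 0.5 threshold are assumed fusion choices.
    """
    factor = high_res_map.shape[0] // low_res_map.shape[0]
    aligned = upsample_nearest(low_res_map, factor)
    fused = (aligned + high_res_map) / 2.0
    return fused, fused >= threshold


# Toy maps: a 2 x 2 grid at 5x corresponds to a 16 x 16 grid at 40x.
low = np.array([[0.9, 0.1],
                [0.2, 0.8]])
high = np.full((16, 16), 0.4)
high[:8, :8] = 0.7
fused, roi_mask = fuse_probability_maps(low, high)
```

Patches of the target WSI whose cell in `roi_mask` is true would be kept for the subsequent subtype classification.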

Although the present invention has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims

1. A dMMR subtype classification method based on pathological images, characterized by comprising:

dividing labeled pathological images into a plurality of known image blocks, constructing a data set from the known image blocks, and training a deep learning network with the data set to obtain a region of interest recognition model;

obtaining a target image area of a target pathological image through the region of interest recognition model; dividing the target image area into a plurality of target image blocks of the same size, dividing all the target image blocks evenly into a plurality of parts, and stacking and merging the target image blocks of each part to obtain stacked blocks;

acquiring the feature representations of all the stacked blocks, and obtaining the dMMR subtype classification result of the target pathological image according to all the feature representations.

2. The dMMR subtype classification method according to claim 1, wherein the labeled pathological image is divided into known image blocks of multiple image scales at multiple resolutions, all the known image blocks are divided into data sets of corresponding resolutions, and the data sets are used to train region of interest recognition models of the multiple image scales at the corresponding resolutions.

3. The dMMR subtype classification method according to claim 2, characterized in that the region of interest recognition model is obtained by training a MobileNetV2 network.

4. The dMMR subtype classification method as claimed in claim 1, characterized in that a neural network is used to obtain the feature representations, dimension reduction processing is performed on all the feature representations by principal component analysis, and the feature representations after dimension reduction processing are input into a random forest classifier to obtain the dMMR subtype classification result.

5. The dMMR subtype classification method according to claim 4, wherein the feature representation is obtained by using a ResNet-18 network.

6. A dMMR subtype classification system based on pathological images, characterized by comprising:

The region of interest extraction module is used to divide the labeled pathological images to obtain a plurality of known image blocks, which are constructed as a data set, and the deep learning network is trained with the data set to obtain a region of interest recognition model;

The image processing module is used for obtaining the target image area of the target pathological image through the region of interest recognition model; dividing the target image area into a plurality of target image blocks of the same size, dividing all the target image blocks evenly into a plurality of parts, and stacking and merging the target image blocks of each part to obtain stacked blocks;

The image classification module is used to obtain the feature representations of all the stacked blocks, and obtain the dMMR subtype classification result of the target pathological image according to all the feature representations.

7. The dMMR subtype classification system according to claim 6, wherein the model training module comprises a data set generation module for dividing the labeled pathological image into known image blocks of a plurality of image scales at a plurality of resolutions, dividing all the known image blocks into data sets of corresponding resolutions, and training region of interest recognition models of the plurality of image scales at the corresponding resolutions with the data sets.

8. The dMMR subtype classification system of claim 7, wherein the model training module obtains the region of interest recognition model by training a MobileNetV2 network.

9. The dMMR subtype classification system as claimed in claim 6, characterized in that the image classification module comprises:

A feature acquisition module, used for acquiring the feature representation by using a neural network, and performing dimension reduction processing on all the feature representations by principal component analysis;

The classification module is used to input the feature representation after dimensionality reduction processing into the random forest classifier to obtain the dMMR subtype classification result.

10. The dMMR subtype classification system according to claim 9, wherein the feature acquisition module uses a ResNet-18 network to acquire the feature representation.

11. A computer-readable storage medium storing computer-executable instructions, wherein, when the computer-executable instructions are executed, the pathological image-based dMMR subtype classification method according to any one of claims 1 to 6 is implemented.

12. A data processing apparatus, comprising the computer-readable storage medium as claimed in claim 11, wherein, when a processor of the data processing apparatus calls and executes the computer-executable instructions in the computer-readable storage medium, dMMR subtype classification is performed on the target pathological image.