CN118197603A - Method for predicting stomach cancer molecular subtype by using stomach cancer pathological image - Google Patents
- Publication number
- CN118197603A (application CN202410366139.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- stomach cancer
- gastric cancer
- molecular subtype
- gene expression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a method for predicting gastric cancer molecular subtypes from gastric cancer pathological images, belonging to the technical field of gastric cancer histopathological images. The method comprises the following steps: S1: processing and analyzing gastric cancer gene expression data; S2: comparative analysis of the tumor immune microenvironment; S3: preprocessing pathological images, then extracting and analyzing their features; S4: identifying the molecular subtype. The invention first preprocesses and analyzes gastric cancer gene expression data, then calculates tumor immune microenvironment data from the gene expression data and analyzes it. The collected pathological images then undergo screening, annotation, cutting, quality control, normalization and feature extraction. Finally, a focal-loss-based ResNet model predicts the gastric cancer molecular subtype. Experimental results indicate that the invention can accurately predict gastric cancer molecular subtypes from gastric cancer images, and it is expected to be applied to medical image recognition in the future.
Description
Technical Field
The invention relates to the technical field of gastric cancer histopathological images, and in particular to a method for predicting gastric cancer molecular subtypes from gastric cancer pathological images.
Background
Gastric cancer (GC) is one of the most common malignant tumors of the digestive system and one of the main causes of cancer-related death worldwide. In 2020, China recorded about 480,000 new cases and 370,000 related deaths, accounting for 44% and 48% of the global totals, respectively. Gastric cancer is heterogeneous, and different classification methods apply different criteria. According to the Lauren classification, gastric cancer can be divided by morphological characteristics into three types: diffuse, intestinal and mixed. In 2014, The Cancer Genome Atlas (TCGA) studied 295 gastric cancer samples using six molecular analysis platforms, including whole exome sequencing (WES). Based on the findings, four molecular subtypes were proposed: Epstein-Barr virus positive (EBV), microsatellite instability (MSI), chromosomal instability (CIN) and genomically stable (GS). These subtypes were subsequently included in the World Health Organization classification in 2019. Research on the pathogenesis and molecular characteristics of gastric cancer is important for early diagnosis, treatment strategy design and prognosis evaluation.
Gastric cancer varies widely in clinical manifestation and prognosis, due in part to its heterogeneity at the molecular level. In recent years, a growing number of studies have found that gastric cancer can be divided into different molecular subtypes, each with unique gene expression characteristics and biological behavior. These differences may affect both the patient's response to treatment and the prognosis. Different subtypes of gastric cancer may have different sensitivities to treatment: some may respond better to traditional therapies (such as chemotherapy or radiation therapy), while others may be more suitable for targeted therapy or immunotherapy. Furthermore, different subtypes may behave differently biologically, leading to significant differences in patient prognosis. Understanding and identifying the molecular subtypes of gastric cancer is therefore important for formulating personalized treatment regimens, improving treatment efficacy and predicting patient prognosis. While traditional clinicopathological features are helpful for diagnosis and therapy planning, these features alone may not adequately capture molecular-level heterogeneity.
In recent years, deep learning has made breakthrough progress in the medical field. Its strong pattern recognition capability and its ability to process complex data make it a powerful tool for studying and predicting the molecular characteristics of disease. In oncology, deep learning has been widely used for tumor classification, prognosis evaluation, image analysis, genomics research and similar tasks. In some medical data analysis tasks, deep learning has outperformed humans: images of lung, prostate and brain tumors can be used to predict patient survival and tumor mutations, and Kather et al. built a deep residual learning model to predict microsatellite instability (MSI) from H&E-stained histological images. For this reason, a method for predicting gastric cancer molecular subtypes from gastric cancer pathological images is proposed.
Disclosure of Invention
The technical problem to be solved by the invention is how to accurately identify the molecular subtype of gastric cancer from gastric cancer pathological images using deep learning. To this end, a method for predicting gastric cancer molecular subtypes from gastric cancer pathological images is provided.
The invention solves the above technical problem through the following technical solution, which comprises the following steps:
S1: gastric cancer gene expression data processing analysis
Obtaining gastric cancer gene expression data of a patient from TCGA, matching the patient with respective molecular subtype labels, preprocessing the data according to the gene expression difference requirement, and carrying out difference analysis among different molecular subtype groups;
S2: tumor immunity microenvironment contrast analysis
According to the matched gene expression data in the step S1, calculating tumor immune microenvironment data of a corresponding patient by using a CIBERSORT tool, carrying out statistical analysis on the immune microenvironment, and comparing and analyzing differences among different molecular subtype groups;
s3: pathological image preprocessing and feature extraction and analysis
Collecting pathological images of a corresponding patient from TCGA, cutting and normalizing the collected images to obtain pathological image blocks of the corresponding patient, extracting image block characteristics and analyzing;
s4: molecular subtype recognition
And designing and training a deep learning model for identifying stomach cancer molecular subtypes, obtaining a classification result by using a verification set, and evaluating the classification effect of the model.
Further, in step S1, the specific process is as follows:
S11: Obtaining mRNA gene expression data of gastric cancer patients from TCGA and matching them with gastric cancer molecular subtype labels;
S12: Filtering the matched gene expression data to remove low-quality data, standardizing the gene expression data, and carrying out differential gene expression statistical analysis on the standardized data.
Further, in step S2, the specific process is as follows:
S21: According to the matched gene expression data, calculating the corresponding gastric cancer immune microenvironment data using the CIBERSORT tool;
S22: Carrying out differential analysis among molecular subtype groups on each component of the immune microenvironment.
Further, in step S22, the inter-group difference analysis proceeds as follows. First, the fold change is calculated as the ratio of the mean gene expression levels of the two groups of samples, and the log fold change is obtained as $\mathrm{LFC} = \log_2(\text{fold change})$; an absolute LFC greater than 1 indicates a difference between the two groups. Next, a test statistic is calculated for each gene's expression to measure the difference between groups, and a P value is derived from the t distribution to measure the significance of that difference. A result with P < 0.05 is regarded as a significant difference between groups.
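The calculation in step S22 can be illustrated with a minimal sketch. The expression values are hypothetical, and the choice of Welch's t statistic is an assumption, since the text only says that the P value comes from the t distribution. The P value itself is not computed here; in practice it would come from a statistics library.

```python
import math
import statistics

def log2_fold_change(group_a, group_b):
    """LFC = log2(mean(group_a) / mean(group_b)) for one gene."""
    fold_change = statistics.mean(group_a) / statistics.mean(group_b)
    return math.log2(fold_change)

def welch_t_statistic(group_a, group_b):
    """Welch's t statistic for two independent samples (assumed test variant)."""
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Hypothetical expression values of one gene in two subtype groups.
ebv = [8.0, 9.0, 7.0, 8.0]
non_ebv = [2.0, 2.5, 1.5, 2.0]

lfc = log2_fold_change(ebv, non_ebv)
t_stat = welch_t_statistic(ebv, non_ebv)
# |LFC| > 1 is the threshold from the text; P < 0.05 is checked separately.
is_candidate = abs(lfc) > 1
```

Here the group means are 8 and 2, so the fold change is 4 and the LFC is 2, which passes the |LFC| > 1 threshold.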
Further, in step S3, the specific process is as follows:
S31: A pathologist pre-examines the collected pathological images, excludes blurred images that do not meet requirements, and manually annotates the tumor regions of the images that do meet requirements;
S32: Cutting the annotated pathological images: each tumor region is cut into slices of 512x512 pixels with no overlap between slices, quality control is applied to the slices, slices in which blank areas or background areas exceed 30% are discarded, and Macenko color normalization is applied to the remaining slices;
S33: Extracting the image features of each normalized slice, including color features and texture features, to obtain the image features of the corresponding pathological image; carrying out statistical analysis on the image features of different molecular subtypes; and carrying out association analysis between the image features and the tumor immune microenvironment that may influence them.
Further, in step S33, the color features are obtained as follows:
S3301: Converting the pathological image slice into HSV color space to obtain a color image slice;
S3302: Splitting the color image into separate R, G and B channels;
S3303: Converting the image to a grayscale image, and calculating the mean and variance over the R, G, B and grayscale channels respectively.
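The channel statistics of steps S3302 and S3303 can be sketched as below. The sketch starts from an RGB slice and skips the HSV conversion of S3301. The grayscale weights (ITU-R BT.601 luma) and the use of population variance are assumptions; the text does not say which conversion or variance estimator is used.

```python
import statistics

def color_features(rgb_slice):
    """Mean and variance of the R, G, B and gray channels of one slice.

    rgb_slice is a list of rows, each row a list of (r, g, b) tuples.
    """
    pixels = [px for row in rgb_slice for px in row]
    channels = {
        "R": [r for r, g, b in pixels],
        "G": [g for r, g, b in pixels],
        "B": [b for r, g, b in pixels],
        # Assumed gray conversion: ITU-R BT.601 luma weights.
        "gray": [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels],
    }
    features = {}
    for name, values in channels.items():
        features[name + "_mean"] = statistics.fmean(values)
        features[name + "_var"] = statistics.pvariance(values)
    return features

# Hypothetical two-pixel slice: one black pixel and one white pixel.
features = color_features([[(0, 0, 0), (255, 255, 255)]])
```

Each slice yields eight color features under this scheme (a mean and a variance for each of the four channels).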
Further, in step S33, the texture features are obtained as follows:
S3311: Applying a 2D wavelet packet transform to the pathological image slice to obtain four sub-images: an approximation sub-image, a horizontal detail sub-image, a vertical detail sub-image and a diagonal detail sub-image;
S3312: Quantizing the approximation, horizontal detail, vertical detail and diagonal detail sub-images to 16 gray levels, standardizing the images, and constructing a window multi-scale co-occurrence matrix (WMCM);
S3313: Extracting texture features from the window multi-scale co-occurrence matrix. The extracted texture features are: Entropy, Contrast, Regulation, Correlation, IDM, DLMSE, GLMSE, DLA, GLA, SGSDA and SGBDA.
Further, in step S33, the statistical analysis of the image features of different molecular subtypes, and the correlation analysis between the image features and the tumor immune microenvironment that may affect them, specifically include the following steps:
S3321: Using statistical analysis software, performing difference testing and P value calculation across subtype groups for all extracted image features;
S3322: Carrying out statistical analysis, difference testing and P value calculation on the immune cell content in the tumor immune microenvironments of different subtypes;
S3323: Using statistical analysis software, carrying out Spearman correlation analysis between the image features and the immune cell content in the tumor immune microenvironment, and calculating the correlation coefficients.
Further, in step S4, the specific process is as follows:
S41: Formulating a classification strategy according to the original definition of the gastric cancer molecular subtypes: first judging whether the gastric cancer is Epstein-Barr virus positive and separating out EBV-type gastric cancer, then judging whether it shows microsatellite instability, and finally distinguishing the genomically stable type from the chromosomal instability type according to the degree of copy number variation;
S42: Dividing the data set into a training set and a validation set at a set proportion, following the formulated classification strategy;
S43: The deep learning model uses ResNet as the base model, with a fully connected layer and a dropout layer added; a focal loss function replaces the cross-entropy loss function, and the model is trained on the training set;
S44: After training, obtaining classification results on the validation set and evaluating the classification performance of the model.
Further, in step S43, the focal loss function is as follows:
$$L_{fl} = -(1 - p_t)^{\gamma} \log(p_t)$$
where $p_t$ reflects the proximity of the prediction to the true class $y$: a larger $p_t$ means the prediction is closer to class $y$, i.e., the classification result is more accurate; $\gamma > 0$ is an adjustable factor.
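A small numeric sketch shows how the focal loss down-weights easy samples. It assumes the standard binary definition $p_t = p$ when $y = 1$ and $p_t = 1 - p$ otherwise, and an optional weighting coefficient `alpha` in front of the loss; the defaults below follow the values reported in the embodiment ($\gamma = 2$, $\alpha = 0.6$).

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.6):
    """Binary focal loss for one sample.

    p is the predicted probability of class 1 and y is the true label (0 or 1).
    """
    p_t = p if y == 1 else 1.0 - p
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9, 1)  # confident and correct: heavily down-weighted
hard = focal_loss(0.1, 1)  # confident and wrong: keeps most of its loss
cross_entropy = focal_loss(0.9, 1, gamma=0.0, alpha=1.0)  # reduces to -log(p_t)
```

With $\gamma = 0$ and $\alpha = 1$ the expression reduces to the ordinary cross-entropy, which makes the role of the modulating factor $(1 - p_t)^{\gamma}$ easy to see.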
Compared with the prior art, the invention has the following advantages: the method for predicting gastric cancer molecular subtypes from gastric cancer pathological images preprocesses and analyzes gastric cancer gene expression data, calculates tumor immune microenvironment data from the gene expression data and analyzes it, processes the collected pathological images through screening, annotation, cutting, quality control, normalization and feature extraction, and finally predicts the gastric cancer molecular subtype with a focal-loss-based ResNet model. Experimental results indicate that the method can accurately predict gastric cancer molecular subtypes from gastric cancer images, and it is expected to be applied to medical image recognition in the future.
Drawings
FIG. 1 is a flow chart of an interpretable method for predicting gastric cancer molecular subtypes using gastric cancer pathological images in an embodiment of the present invention;
FIG. 2a is a heat map of differential gene expression between EBV-type and non-EBV-type gastric cancers in an embodiment of the present invention;
FIG. 2b is a heat map of differential gene expression between MSI-type and non-MSI-type gastric cancers in an embodiment of the present invention;
FIG. 2c is a heat map of differential gene expression between CIN-type and non-CIN-type gastric cancers in an embodiment of the present invention;
FIG. 2d is a heat map of differential gene expression between GS-type and non-GS-type gastric cancers in an embodiment of the present invention;
FIG. 3 is a volcano plot of the differential genes of the gastric cancer molecular subtypes in an embodiment of the present invention;
FIG. 4 is a heat map of the enrichment analysis pathways of the differential genes in an embodiment of the present invention;
FIG. 5 is a violin plot of the differential analysis of tumor immune microenvironment scores (stromal score, immune score and tumor purity) across the four molecular subtypes in an embodiment of the present invention, where the P values represent the level of inter-group difference and a smaller P value indicates a more significant difference;
FIG. 6 is a box plot of immune cell content in the tumor immune microenvironment in an embodiment of the present invention;
FIG. 7 is a box plot of the differences in pathological image features between molecular subtypes in an embodiment of the present invention;
FIG. 8 is a network diagram of the correlation between pathological image features and immune cells in the tumor immune microenvironment in an embodiment of the present invention;
FIG. 9 is a flow chart of pathological image preprocessing (including a model thumbnail) in an embodiment of the present invention;
FIG. 10a is a five-fold cross-validation ROC curve of the model for EBV-type tumor classification in an embodiment of the present invention;
FIG. 10b is a five-fold cross-validation ROC curve of the model for MSI-type tumor classification in an embodiment of the present invention;
FIG. 10c is a five-fold cross-validation ROC curve of the model for CIN-type and GS-type tumor classification in an embodiment of the present invention.
Detailed Description
The examples of the present invention are described in detail below. They are implemented on the premise of the technical solution of the present invention, and detailed embodiments and specific operating procedures are given, but the scope of protection of the present invention is not limited to the following examples.
As shown in FIG. 1, this embodiment provides a technical solution: an interpretable method for predicting gastric cancer molecular subtypes using gastric cancer pathological images, comprising:
S1: Processing and analysis of gastric cancer gene expression data.
Gastric cancer mRNA gene expression data are collected from TCGA and matched with the four molecular subtype labels. Quality control is carried out on the matched data, low-quality reads are filtered out, and the data are normalized to reduce the influence of technical differences on the results. The data are then analyzed using statistical and computational biology methods; in this example, the DESeq2 differential expression analysis tool is used. Gene function analysis, including enrichment analysis, pathway analysis or gene ontology analysis, is performed on the differentially expressed genes.
S2: tumor immune microenvironment analysis.
According to the gene expression data obtained in step S1, tumor immune microenvironment data of the corresponding patient are calculated with the CIBERSORT tool. The tumor immune microenvironment consists of many kinds of immune cells, including T cells (CD4+ and CD8+), B cells, plasma cells, natural killer cells (NK cells), dendritic cells (DCs), macrophages and others. The degree and type of immune cell infiltration can present different characteristics in pathological images. For the different molecular subtypes, differential analysis is carried out on the content of each kind of immune cell.
S3: Pathological image preprocessing and feature extraction and analysis
Pathological images of the corresponding patient are collected from TCGA, the collected images are cut and normalized to obtain pathological image blocks of the corresponding patient, and the image block features are extracted and analyzed.
In this embodiment, the step S3 specifically includes the following steps:
S31: the oncologist (pathologist) pre-examines the collected pathological images, eliminates the pathological images with fuzzy unsatisfactory, and manually annotates the tumor areas of the pathological images with satisfactory; the distribution of gastric cancer molecular subtype data in the pathological images in this example is shown in table 1 below:
TABLE 1 distribution of gastric cancer molecular subtype data
S32: the annotated pathological section is segmented into tumor areas, the tumor areas are cut into small sections by taking 512x512 pixel sections as standards, no overlapping areas exist among the sections, meanwhile, quality control processing is carried out on the obtained sections, some sections contain a large number of blank areas or background areas, the existence of the areas greatly influences the accuracy of a model, and the sections with the blank areas or the background areas being more than 30% are discarded. Since the collected pathological image samples may come from different hospitals, the staining standards of the pathological images are different, and in order to avoid the influence of the staining technology on prediction, the slices are subjected to unified Macenko color normalization processing.
S33: and carrying out feature extraction on the normalized image slice, wherein the feature extraction is also feature extraction at the slice level. Firstly, transferring the image to HSV color space, disassembling the color image into a single R, G, B channel, converting the image into a gray image, and then calculating R, G, B and color characteristics such as mean and variance in the gray space respectively. Then, the texture features of the tumor tissue region slice are extracted, firstly, 2D wavelet packet transformation is carried out on the pathological image slice, and four seed images are obtained through the transformation: an approximate sub-image (Low-Low, which contains Low frequency information of the original image for representing the overall trend and smooth structure), a horizontal detail sub-image (Low-High, which captures High frequency detail information in the horizontal direction of the original image for representing texture and edge information in the horizontal direction), a vertical detail sub-image (High-Low, which captures High frequency detail information in the vertical direction of the original image for representing texture and edge information in the vertical direction), and a diagonal detail sub-image (High-High), high frequency detail information in the diagonal direction in the original image is captured for representing texture and edge information in the diagonal direction). 
Specifically, performing low-pass filtering on rows and columns of an original image to obtain an approximate sub-image; performing low-pass filtering on the rows of the image and performing high-pass filtering on the columns to obtain a horizontal detail sub-image; carrying out high-pass filtering on the image, and carrying out low-pass filtering on the columns to obtain a vertical detail sub-image; high-pass filtering is carried out on the rows and the columns of the image to obtain diagonal detail sub-images; then, the similar sub-image, the horizontal detail sub-image, the vertical detail sub-image and the diagonal detail sub-image are quantized into 16 gray levels, the images are standardized, a Window multi-scale co-occurrence matrix: WMCM is constructed, texture features are extracted based on WMCM, and the extracted texture features are as follows: entropy (entropy, measure uncertainty or randomness of pixel value distribution in an image), contrast (Contrast, quantify the change in brightness between different parts of an image, reflect the depth of texture and edges in an image), and, Regulation (uniformity, measure the smoothness or consistency of pixel values in an image across the image), corelation (Correlation, evaluate the Correlation between pixel values and their neighborhood pixel values, indicate the likelihood of predicting pixel values from surrounding pixel values), IDM (inverse moment, measure the local uniformity of an image), DLMSE (local mean square error bias, calculate variability or dispersion of pixel values in local columns, indicate uniformity of vertical texture pattern), GLMSE (global mean square error, measure the horizontal dispersion of pixel values, reflect horizontal texture pattern), DLA (local mean value bias, deviation of average pixel values in local columns), GLA (global average, average pixel values between rows to analyze horizontal smoothness or uniformity of pattern), SGSDA (gray differential integration sum, overall 
texture energy or intensity of image), SGBDA (gradient-based differential integration sum, gradient difference integration in image, describe texture complexity and edge density, reflect structure and texture changes from gradient information). And carrying out differential analysis according to molecular subtypes and image characteristics, and carrying out spearman correlation analysis on various immune cells in the image characteristics and tumor immune microenvironment to find out the correlation between the tumor immune microenvironment and the image characteristics, wherein tumor immune microenvironment components have correlation with the image characteristics, so that the tumor immune microenvironment is influenced by different subtype tumors, and the image characteristics are further influenced.
S4: and (5) identifying stomach cancer molecular subtype types. The detailed scheme is as follows:
S41: the classification strategy is formulated according to the original definition of the stomach cancer molecular subtype, whether the stomach cancer molecular subtype is positive to EB virus is firstly judged, EBV type stomach cancer is separated, whether microsatellite instability (MSI) is judged, and finally, the Genome Stability (GS) and Chromosome Instability (CIN) are distinguished according to the degree of copy number variation (the copy number variation refers to the variation of increasing or decreasing part of DNA fragments in the genome DNA of an individual compared with a reference genome). This classification strategy is also classified according to the effect of these subtypes on immunotherapy. This classification strategy is followed in the recognition of molecular subtypes.
S42: The data set is divided according to the formulated classification strategy using five-fold cross-validation: in turn, four folds are used for training and the remaining fold is held out for validation.
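A minimal sketch of such a five-fold split is given below, using only the Python standard library. The function name and the fixed random seed are assumptions; whether the split is made over patients or over image blocks is not stated in the original, so `sample_ids` is left generic.

```python
import random

def five_fold_splits(sample_ids, n_folds=5, seed=0):
    """Yield (train, validation) lists; each fold serves once as the validation set."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # seed is an assumption, used for reproducibility
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        validation = folds[k]
        train = [s for i, fold in enumerate(folds) if i != k for s in fold]
        yield train, validation
```

Iterating over the generator gives five train/validation pairs, so every sample is used for validation exactly once.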
S43: ResNet is taken as the base model; a fully connected layer is added to the model, together with a dropout layer to reduce the risk of overfitting. In view of the data distribution, the cross-entropy loss function is replaced with a focal loss function. The cross-entropy loss function is as follows:
$$L_{ce} = -\left[\, y \log(p) + (1 - y) \log(1 - p) \,\right]$$
where $p$ is the predicted probability and $y$ is the label, which takes the value 0 or 1 in binary classification. Because the classes in this data set are imbalanced, the cross-entropy loss function pays too little attention to hard-to-classify samples from classes with little data. The loss function is therefore replaced by the focal loss function:
$$L_{fl} = -(1 - p_t)^{\gamma} \log(p_t)$$
where $p_t = p$ when $y = 1$ and $p_t = 1 - p$ otherwise, so that $p_t$ reflects how close the prediction is to the ground-truth class $y$: a larger $p_t$ means the prediction is closer to class $y$, that is, the classification is more accurate. $\gamma > 0$ is an adjustable factor. In the implementation, $\gamma = 2$ and a coefficient $\alpha = 0.6$ is placed in front of the loss function. Training loads ImageNet pre-trained weights and retrains on the data using the SGD optimizer, with the learning rate set to 0.001 and the momentum set to 0.9; the learning rate is reduced by a factor of 10 every 7 epochs. The batch size is set to 128 and the model is trained for 30 epochs.
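The effect of the focal loss and of the learning rate schedule can be checked numerically with a small Python sketch. It treats a single binary sample, with `p` the predicted probability of class 1; the coefficient α is applied as a constant multiplier, which is one reading of "a coefficient added before the loss function". The function names are hypothetical, and the sketch is not the training code of the embodiment.

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy for one sample; p is the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.6):
    """Focal loss with gamma = 2 and a constant coefficient alpha = 0.6."""
    p_t = p if y == 1 else 1.0 - p
    # (1 - p_t)^gamma shrinks the loss of well-classified samples (p_t close to 1).
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

def step_lr(epoch, base_lr=0.001, step=7, factor=0.1):
    """Learning rate after dividing by 10 every `step` epochs (epochs counted from 0)."""
    return base_lr * factor ** (epoch // step)
```

For a well-classified sample with p_t = 0.9 the modulating term is 0.01, whereas for a hard sample with p_t = 0.1 it is 0.81, so hard samples dominate the total loss far more than under cross-entropy.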
In summary, the method for predicting the stomach cancer molecular subtype from stomach cancer pathology images disclosed in the above embodiment starts from the determination of the stomach cancer molecular subtypes and reveals the heterogeneity of the stomach cancer tumor microenvironment caused by gene expression. This heterogeneity of the tumor immune microenvironment is reflected in the stomach cancer pathology images, which explains why stomach cancer pathology images can be used to predict the stomach cancer molecular subtype. The focal-loss-based ResNet model achieves higher accuracy on the task of predicting molecular subtypes from stomach cancer pathology images, has good robustness, and can accurately identify the stomach cancer molecular subtypes.
While embodiments of the present invention have been shown and described above, it will be understood that these embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to them by one of ordinary skill in the art within the scope of the invention.
Claims (10)
1. A method for predicting stomach cancer molecular subtype by using stomach cancer pathological image, which is characterized by comprising the following steps:
S1: gastric cancer gene expression data processing analysis
Obtaining gastric cancer gene expression data of a patient from TCGA, matching the patient with respective molecular subtype labels, preprocessing the data according to the gene expression difference requirement, and carrying out difference analysis among different molecular subtype groups;
S2: Comparative analysis of the tumor immune microenvironment
According to the matched gene expression data in the step S1, calculating tumor immune microenvironment data of a corresponding patient by using a CIBERSORT tool, carrying out statistical analysis on the immune microenvironment, and comparing and analyzing differences among different molecular subtype groups;
S3: Pathological image preprocessing, feature extraction and analysis
Collecting pathological images of a corresponding patient from TCGA, cutting and normalizing the collected images to obtain pathological image blocks of the corresponding patient, extracting image block characteristics and analyzing;
S4: Molecular subtype recognition
Designing and training a deep learning model for identifying stomach cancer molecular subtypes, obtaining a classification result by using a verification set, and evaluating the classification effect of the model.
2. The method for predicting stomach cancer molecular subtype by using stomach cancer pathology image according to claim 1, wherein in the step S1, the specific procedure is as follows:
S11: Obtaining mRNA gene expression data of gastric cancer patients from TCGA and matching them with gastric cancer molecular subtype labels;
S12: filtering the matched gene expression data, screening out low-quality data, standardizing the gene expression data, and carrying out differential gene expression statistical analysis on the standardized data.
3. The method for predicting stomach cancer molecular subtype by using stomach cancer pathology image according to claim 1, wherein in the step S2, the specific procedure is as follows:
S21: According to the matched gene expression data, calculating the corresponding gastric cancer immune microenvironment data by using the CIBERSORT tool;
S22: Performing differential analysis among molecular subtype groups on each component of the immune microenvironment.
4. The method for predicting gastric cancer molecular subtype using gastric cancer pathology image according to claim 3, wherein in step S22, when performing the inter-group difference analysis, the fold change, i.e. the ratio of the mean gene expression levels of the two groups of samples, is calculated first; LFC is then calculated using the formula LFC = log2(fold change), an absolute value of LFC greater than 1 indicating that there is a difference between the two groups; a test statistic is then calculated for each expression level to measure the difference between the groups; and a significance P value is then calculated from the t distribution to measure the significance of the difference, a P value < 0.05 being regarded as a significant difference between the groups.
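The fold change criterion of this claim can be illustrated with a short Python sketch. The function names are hypothetical, the P value is assumed to have been computed separately by a t test, and combining the two thresholds into a single decision is an assumption made for illustration.

```python
import math

def log2_fold_change(group_a, group_b):
    """LFC = log2(mean of group A / mean of group B); expression values must be positive."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2(mean_a / mean_b)

def is_differential(lfc, p_value, lfc_cutoff=1.0, p_cutoff=0.05):
    """True when |LFC| > 1 and P < 0.05, the two thresholds named in the claim."""
    return abs(lfc) > lfc_cutoff and p_value < p_cutoff
```

An absolute LFC above 1 corresponds to more than a two-fold difference between the group means in either direction.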
5. The method for predicting stomach cancer molecular subtype by using stomach cancer pathology image according to claim 1, wherein in the step S3, the specific procedure is as follows:
S31: A pathologist pre-examines the collected pathological images, excludes images that are blurred or otherwise unsatisfactory, and manually annotates the tumor regions of the images that meet the requirements;
S32: Cutting the annotated pathological images, wherein the tumor region is cut into slices of 512x512 pixels with no overlap between slices; performing quality control on the slices by discarding slices in which the blank area exceeds 30% or the background area exceeds 30%; and applying Macenko color normalization to the remaining slices;
S33: extracting the image features of each slice after normalization processing, including color features and texture features, then obtaining image features corresponding to pathological images, carrying out statistical analysis on the image features of different molecular subtypes, and carrying out association analysis on the image features and tumor immune microenvironments which possibly influence the image features.
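The non-overlapping cutting and the 30% quality control rule of step S32 can be sketched in Python as follows. The function names and the brightness cut-off `blank_threshold` used to decide whether a pixel counts as blank are hypothetical, since the original does not say how blank or background pixels are detected; Macenko color normalization is not shown.

```python
def tile_origins(width, height, tile=512):
    """Top-left corners of non-overlapping tile x tile slices that fit inside the region."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def keep_tile(gray_pixels, blank_threshold=220, max_blank_fraction=0.30):
    """Keep a slice unless more than 30% of its pixels are blank.

    gray_pixels is a flat sequence of intensities in 0..255; blank_threshold is
    an assumed cut-off above which a pixel is treated as blank or background.
    """
    pixels = list(gray_pixels)
    blank = sum(1 for v in pixels if v >= blank_threshold)
    return blank / len(pixels) <= max_blank_fraction
```

Stepping by the tile size guarantees that neighboring slices share no pixels, and partial slices at the right and bottom edges are simply skipped in this sketch.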
6. The method for predicting stomach cancer molecular subtype using stomach cancer pathology image according to claim 5, wherein in the step S33, the color features are obtained as follows:
S3301: Converting the pathological image slice into HSV color space to obtain a color image slice;
S3302: Splitting the color image into separate R, G and B channels;
S3303: Converting the image to a grayscale image, and calculating the mean and the variance over the R, G, B and grayscale channels, respectively.
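Steps S3302 and S3303 amount to eight numbers per slice, as the following Python sketch shows. The function names are hypothetical; the grayscale conversion uses the common ITU-R BT.601 luma weights and the variance is the population variance, both of which are assumptions because the original does not specify them. The HSV conversion of step S3301 is not shown.

```python
def channel_stats(values):
    """Mean and population variance of one channel."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance

def color_features(rgb_pixels):
    """Mean and variance over the R, G, B and grayscale channels of a slice.

    rgb_pixels is a flat list of (r, g, b) tuples; the gray weights are the
    BT.601 luma coefficients (an assumed choice of grayscale conversion).
    """
    channels = {
        "R": [p[0] for p in rgb_pixels],
        "G": [p[1] for p in rgb_pixels],
        "B": [p[2] for p in rgb_pixels],
        "gray": [0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2] for p in rgb_pixels],
    }
    return {name: channel_stats(vals) for name, vals in channels.items()}
```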
7. The method according to claim 6, wherein in the step S33, the texture features are obtained as follows:
S3311: firstly, carrying out 2D wavelet packet transformation on pathological image slices to obtain four sub-images: an approximation sub-image, a horizontal detail sub-image, a vertical detail sub-image, and a diagonal detail sub-image;
S3312: Quantizing the approximation sub-image, the horizontal detail sub-image, the vertical detail sub-image and the diagonal detail sub-image into 16 gray levels, standardizing the images and constructing a window multi-scale co-occurrence matrix;
S3313: texture features are extracted based on the window multi-scale co-occurrence matrix, and the extracted texture features are as follows: entropy, contrast, regulation, correlation, IDM, DLMSE, GLMSE, DLA, GLA, SGSDA, SGBDA.
8. The method for predicting stomach cancer molecular subtypes by using stomach cancer pathology image according to claim 7, wherein in the step S33, statistical analysis is performed on image features of different molecular subtypes, and correlation analysis is performed on the image features and tumor immune microenvironment which may affect the image features, specifically comprising the following steps:
S3321: performing difference detection and P value calculation on different subtype groups by using statistical analysis software on all the extracted image features;
S3322: Carrying out statistical analysis, difference detection and P value calculation on the immune cell content in the tumor immune microenvironments of different subtypes;
S3323: Carrying out Spearman correlation analysis on the image features and the immune cell content in the tumor immune microenvironment by using statistical analysis software, and calculating a correlation coefficient.
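The Spearman correlation coefficient of step S3323 is the Pearson correlation of the ranks of the two variables. The claim relies on statistical analysis software; the following standard-library Python sketch, with hypothetical function names, only shows what that software computes for one image feature and one immune cell type.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because only the ranks enter the calculation, any monotonic relationship between a feature and an immune cell fraction gives a coefficient of +1 or -1, even when the relationship is not linear.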
9. The method for predicting stomach cancer molecular subtype using stomach cancer pathology image according to claim 8, wherein in the step S4, the specific procedure is as follows:
S41: making a classification strategy according to original definition of gastric cancer molecular subtype, wherein the classification strategy is to judge whether the gastric cancer is EB virus positive, separate out EBV type gastric cancer, judge whether the gastric cancer is microsatellite instability, and finally distinguish genome stability type and chromosome instability type according to the degree of copy number variation;
S42: Dividing the data set into a training set and a verification set according to a set proportion, following the established classification strategy;
S43: The deep learning model takes ResNet as the base model, a fully connected layer is added to the model, a dropout layer is added, a focal loss function is used in place of the cross-entropy loss function, and the training set is used to train the deep learning model;
S44: After training, obtaining a classification result by using the verification set and evaluating the classification effect of the model.
10. The method of predicting gastric cancer molecular subtype using gastric cancer pathology images according to claim 9, characterized in that in step S43 the Focal loss function is as follows:
$$L_{fl} = -(1 - p_t)^{\gamma} \log(p_t)$$
wherein $p_t$ reflects the proximity to class $y$, and a larger $p_t$ indicates a closer proximity to class $y$, i.e., a more accurate classification result, with $\gamma > 0$ being an adjustable factor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410366139.1A CN118197603A (en) | 2024-03-28 | 2024-03-28 | Method for predicting stomach cancer molecular subtype by using stomach cancer pathological image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118197603A (en) | 2024-06-14 |
Family
ID=91397773
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410366139.1A Pending CN118197603A (en) | 2024-03-28 | 2024-03-28 | Method for predicting stomach cancer molecular subtype by using stomach cancer pathological image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118197603A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |