CN115375640A - Tumor heterogeneity identification method and device, electronic equipment and storage medium

Publication number: CN115375640A
Authority: CN (China)
Legal status: Pending
Application number: CN202210964997.7A
Other languages: Chinese (zh)
Inventors: 王理, 赵红颖, 刘开来
Current assignee: Harbin Medical University
Original assignee: Harbin Medical University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763 Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10088 Magnetic resonance imaging [MRI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30096 Tumor; Lesion

Abstract

The invention belongs to the technical field of tumor identification and provides a tumor heterogeneity identification method and device, electronic equipment and a storage medium. The method first locates concordantly altered tumor risk genes and identifies the subclone-specific genes related to their expression. It then determines the subclone-specific genes whose correlation with patient survival reaches a specified degree, performs consensus clustering analysis on the sample patients to obtain classification labels, and constructs an optimal tumor prognosis model and screens optimal imaging genomic (radiogenomic) features from the sample patients' tumor MRI images and classification labels. Finally, external tumor MRI images are analyzed with the optimal tumor prognosis model and the optimal imaging genomic features. The survival time of a tumor patient can thus be predicted from tumor MRI images alone, without causing trauma to the patient, which provides an important theoretical basis and application value for precision tumor medicine.

Description

Tumor heterogeneity identification method and device, electronic equipment and storage medium
Technical Field
The invention belongs to the technical field of tumor identification, and in particular relates to a tumor heterogeneity identification method and device, electronic equipment and a storage medium.
Background
Tumors are one of the major causes of death worldwide and impose a serious social burden. Intratumoral heterogeneity refers to the presence of subpopulations of tumor cells that differ morphologically and functionally between different regions of the same primary tumor, or between the primary focus and its metastases. Because tumors are heterogeneous in both time and space, it is very difficult to select an ideal tumor marker clinically and to achieve precise treatment.
Currently, detecting tumor heterogeneity generally requires a needle biopsy that samples the diseased tissue directly, which is the gold standard for clinical diagnosis. In this approach, a tumor tissue sample is first obtained by puncture or similar means, single cells are isolated, and transcriptomic analysis is performed by single-cell RNA sequencing (scRNA-seq) to characterize the diversity of cell states and the heterogeneity of cell populations.
Given existing data resources, studies have also used the transcription profiles of tissue samples for heterogeneity analysis. In most current studies, deconvolving a patient's gene expression profile first requires deconvolving expression profiles mixed from different tumor samples to obtain a reference expression profile of the tumor; supervised deconvolution is then performed against this reference profile to analyze the patient's intratumoral heterogeneity.
However, the applicant of the present invention has found that the above technical solutions have at least the following disadvantage:
needle biopsy is traumatic to the patient, may lead to complications, and may even increase the risk of tumor cells entering the bloodstream.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a tumor heterogeneity identification method, which aims to solve the problems mentioned in the background art.
The embodiment of the invention is realized as follows. The tumor heterogeneity identification method comprises the following steps:
locating concordantly altered tumor risk genes according to tumor copy number variation data and tumor transcription profile data;
performing unsupervised deconvolution analysis on the expression profiles of the tumor risk genes, and identifying subclone-specific genes related to the expression of the tumor risk genes;
performing biological function analysis and survival analysis on the subclone-specific genes, and determining the subclone-specific genes whose correlation with patient survival reaches a specified degree;
performing consensus clustering analysis on the sample patients based on the subclone-specific genes whose correlation with survival reaches the specified degree to obtain classification labels, and constructing an optimal tumor prognosis model and screening optimal imaging genomic features according to the tumor MRI images of the sample patients and the classification labels;
and analyzing external tumor MRI images through the optimal tumor prognosis model and the optimal imaging genomic features.
Preferably, the step of locating concordantly altered tumor risk genes according to the tumor copy number variation data and the tumor transcription profile data comprises:
identifying regions of significant copy number variation that reach a specified degree according to the tumor copy number variation data;
and locating the concordantly altered tumor risk genes within the regions of significant copy number variation in combination with the tumor transcription profile data.
Preferably, the step of performing biological function analysis on the subclone-specific genes comprises:
performing gene enrichment analysis on the subclone-specific genes using a biological pathway database, and labeling the corresponding subclone-specific genes with the most significantly enriched biological pathway as their biological function.
Preferably, the step of constructing an optimal tumor prognosis model and screening optimal imaging genomic features according to the tumor MRI images of the sample patients and the classification labels comprises:
constructing tumor prognosis models and screening imaging genomic features by a supervised deconvolution algorithm, machine learning algorithms and a convolutional neural network algorithm according to the tumor MRI images of the sample patients and the classification labels, and selecting the optimal tumor prognosis model and the optimal imaging genomic features from among them according to evaluation indexes, wherein the machine learning algorithms comprise multiple data screening methods and multiple model structures.
Another object of the embodiments of the present invention is to provide a tumor heterogeneity identification device, comprising:
a tumor risk gene locating module, used for locating concordantly altered tumor risk genes according to tumor copy number variation data and tumor transcription profile data;
a subclone-specific gene identification module, used for performing unsupervised deconvolution analysis on the expression profiles of the tumor risk genes and identifying subclone-specific genes related to the expression of the tumor risk genes;
a biological function analysis and survival analysis module, used for performing biological function analysis and survival analysis on the subclone-specific genes and determining the subclone-specific genes whose correlation with patient survival reaches a specified degree;
a tumor prognosis model building module, used for performing consensus clustering analysis on the sample patients based on the subclone-specific genes whose correlation with survival reaches the specified degree to obtain classification labels, and for constructing an optimal tumor prognosis model and screening optimal imaging genomic features according to the tumor MRI images of the sample patients and the classification labels;
and a tumor image analysis module, used for analyzing external tumor MRI images through the optimal tumor prognosis model and the optimal imaging genomic features.
Preferably, the tumor risk gene locating module comprises:
a copy number significant variation region identification subunit, used for identifying regions of significant copy number variation that reach a specified degree according to the tumor copy number variation data;
and a tumor risk gene locating subunit, used for locating the concordantly altered tumor risk genes within the regions of significant copy number variation in combination with the tumor transcription profile data.
Another object of the embodiments of the present invention is to provide an electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the tumor heterogeneity identification method described in any of the above.
A further object of the embodiments of the present invention is to provide a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the tumor heterogeneity identification method described in any of the above.
The tumor heterogeneity identification method provided by the embodiments of the invention first locates concordantly altered tumor risk genes according to tumor copy number variation data and tumor transcription profile data. It then performs unsupervised deconvolution analysis on the expression profiles of the tumor risk genes to identify the subclone-specific genes related to tumor risk gene expression, and performs biological function analysis and survival analysis on those genes to determine the subclone-specific genes whose correlation with patient survival reaches a specified degree. Based on these genes, consensus clustering analysis is performed on the sample patients to obtain classification labels, and an optimal tumor prognosis model is constructed and optimal imaging genomic features are screened according to the tumor MRI images and classification labels of the sample patients. Finally, external tumor MRI images are analyzed through the optimal tumor prognosis model and the optimal imaging genomic features. Tumor heterogeneity can thus be analyzed quantitatively from tumor MRI images alone, without causing trauma to the patient, so that the survival time of the tumor patient is predicted, which provides an important theoretical basis and application value for precision tumor medicine.
Drawings
FIG. 1 is a flowchart of the steps of a tumor heterogeneity identification method according to an embodiment of the present invention;
FIG. 2 is a flowchart of the steps for locating concordantly altered tumor risk genes according to an embodiment of the present invention;
FIG. 3 is a block diagram of a tumor heterogeneity identification device according to an embodiment of the present invention;
FIG. 4 is a block diagram of a tumor risk gene locating module according to an embodiment of the present invention;
FIG. 5 is a flowchart of tumor heterogeneity identification according to an embodiment of the present invention;
FIG. 6 is a flowchart of the operation of a tumor heterogeneity identification platform according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Specific implementations of the present invention are described in detail below with reference to specific embodiments.
As shown in FIG. 1 and FIG. 5, a tumor heterogeneity identification method according to an embodiment of the present invention includes the following steps:
S100, locating concordantly altered tumor risk genes according to tumor copy number variation data and tumor transcription profile data.
Copy number refers to the number of copies of a gene in an organism's genome. Copy number variation is caused by genome rearrangement and generally refers to an increase or decrease in the copy number of large genomic fragments more than 1000 bases in length; it is an important component of genomic structural variation.
The tumor transcription profile data are obtained by transcriptome sequencing (RNA-seq), in which total RNA is extracted and sequenced using high-throughput sequencing technology to reflect the expression level of each gene and thereby help reveal its biological function.
In this embodiment, the concordantly altered tumor risk genes are located first, based on the tumor copy number variation data and the tumor transcription profile data.
S200, performing unsupervised deconvolution analysis on the expression profiles of the tumor risk genes, and identifying subclone-specific genes related to the expression of the tumor risk genes.
A subclone arises when, during the continuous proliferation of a single tumor cell, partial changes in genetic material cause the cell's progeny to differ in growth rate, invasive capacity and other properties. The tumor cell population then consists of multiple heterogeneous cell populations, that is, of multiple subclones.
Deconvolution is the inverse operation of convolution: given a known output and one known input, it solves for the other input. This form is also called supervised deconvolution. Unsupervised deconvolution solves for both inputs when only the output is known. In this embodiment, unsupervised deconvolution analysis of the expression profiles of the tumor risk genes (the output) solves for the subclone-specific genes and their proportions in each patient (the two inputs).
In this embodiment, the unsupervised deconvolution may specifically be Convex Analysis of Mixtures (CAM). CAM assumes that the gene expression measured by sequencing is a mixture of the expression of several latent subclones, that is, a sum of the subclone-specific expression profiles weighted by the subclones' proportions. A simplex scatter plot of the RNA expression profile is drawn and each vertex of the simplex is taken as a latent subclone. Differential expression analysis is performed on each gene across the subclones to identify the key subclone-specific genes significantly related to tumor risk gene expression, and the composition proportion matrix of the subclones in the tumor samples and the specific expression matrix of each subclone are calculated using standardized averaging and the least squares method.
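The mixing assumption behind this step can be illustrated with a minimal Python sketch. It is not the CAM algorithm itself: it only shows the forward model (a bulk profile as the proportion-weighted sum of subclone profiles) and, assuming the subclone profiles are already known, a least-squares recovery of the proportions for exactly two subclones. The function names, profiles and proportions are all hypothetical.

```python
def mix(proportions, signatures):
    """Forward model: each bulk profile is the proportion-weighted sum of subclone profiles."""
    n_genes = len(signatures[0])
    return [
        [sum(p * sig[g] for p, sig in zip(row, signatures)) for g in range(n_genes)]
        for row in proportions
    ]


def estimate_two_proportions(bulk, signatures):
    """Least-squares proportions for two known subclone profiles (2x2 normal equations)."""
    s1, s2 = signatures
    a11 = sum(x * x for x in s1)
    a12 = sum(x * y for x, y in zip(s1, s2))
    a22 = sum(y * y for y in s2)
    b1 = sum(x * b for x, b in zip(s1, bulk))
    b2 = sum(y * b for y, b in zip(s2, bulk))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("subclone profiles are collinear; proportions are not identifiable")
    # Clip negative solutions to zero, then rescale so the proportions sum to 1.
    p1 = max((a22 * b1 - a12 * b2) / det, 0.0)
    p2 = max((a11 * b2 - a12 * b1) / det, 0.0)
    total = p1 + p2
    if total == 0:
        raise ValueError("bulk profile is not explained by either subclone profile")
    return [p1 / total, p2 / total]


# Hypothetical expression of four genes in two subclones.
signatures = [[10.0, 0.0, 2.0, 4.0], [0.0, 8.0, 2.0, 1.0]]
# Hypothetical subclone proportions in two patients.
true_proportions = [[0.7, 0.3], [0.2, 0.8]]

bulk_profiles = mix(true_proportions, signatures)
estimated = [estimate_two_proportions(b, signatures) for b in bulk_profiles]
```

In the unsupervised setting described above, neither `signatures` nor `true_proportions` is known in advance; both must be estimated from the bulk profiles alone.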
S300, performing biological function analysis and survival analysis on the subclone-specific genes, and determining the subclone-specific genes whose correlation with patient survival reaches a specified degree.
In this embodiment, the log-rank test may be used to test the correlation between the proportions of the different subclone-specific genes and survival. The log-rank test is a statistical method for assessing the effect of a variable on patient survival. Patients are grouped according to the proportion of each subclone-specific gene across all patients, and a log-rank test is performed for each grouping. To control the errors introduced by multiple testing, the resulting p-values, which represent the significance of the association between the variable and survival, are corrected with the Benjamini-Hochberg method. The grouping threshold with the smallest corrected p-value is taken as the final grouping threshold for that subclone-specific gene, a Kaplan-Meier survival curve is drawn, and the hazard ratio (HR) and its 95% confidence interval are calculated to determine how significantly the composition proportion of the subclone-specific gene is associated with patient survival. Differential gene expression analysis is then performed between the two patient groups defined by this threshold to explain the survival differences associated with different proportions of the subclone-specific genes.
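The threshold search described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in the embodiment: it covers the two-group log-rank test, the Benjamini-Hochberg correction and the choice of the threshold with the smallest corrected p-value, and it omits the Kaplan-Meier curve and the hazard ratio. The function names and the candidate thresholds are assumptions.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({u for u, e, _ in data if e}):
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in group B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def best_grouping_threshold(proportions, times, events, candidates):
    """Pick the candidate threshold whose high/low split has the smallest adjusted p-value."""
    p_values = []
    for cut in candidates:
        high = [i for i, p in enumerate(proportions) if p >= cut]
        low = [i for i, p in enumerate(proportions) if p < cut]
        _, p = logrank_test(
            [times[i] for i in high], [events[i] for i in high],
            [times[i] for i in low], [events[i] for i in low],
        )
        p_values.append(p)
    adjusted = benjamini_hochberg(p_values)
    best = min(range(len(candidates)), key=lambda i: adjusted[i])
    return candidates[best], adjusted[best]
```

Here `events` holds 1 for an observed death and 0 for a censored observation, and `proportions` holds one subclone's composition proportion per patient.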
In this embodiment, identifying the biological functions of the different subclone-specific genes helps explain subclonal heterogeneity and its important role in tumor progression.
S400, performing consensus clustering analysis on the sample patients based on the subclone-specific genes whose correlation with survival reaches the specified degree to obtain classification labels, and constructing an optimal tumor prognosis model and screening optimal imaging genomic features according to the tumor MRI images of the sample patients and the classification labels.
In this embodiment, consensus clustering analysis is performed on the sample patients using the subclone-specific genes whose correlation with survival reaches the specified degree, and the clustering result is determined from the cumulative distribution function (CDF) curve, so that the patients are grouped and classification labels are obtained. An optimal tumor prognosis model is then constructed and optimal imaging genomic features are screened according to the tumor MRI images of the sample patients and the classification labels, thereby characterizing the relationship between the subclone-specific genes and survival and providing a basis for subsequently assessing patient survival.
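The consensus matrix and its CDF can be sketched as below. The sketch assumes that a set of clustering runs on resampled patients is already available; how those runs are produced (the clustering algorithm, the resampling rate and the number of clusters) is not specified in the text and is left out here. All names and sample data are hypothetical.

```python
from itertools import combinations


def consensus_matrix(runs, samples):
    """Fraction of runs in which two patients share a cluster, among runs containing both.

    runs: list of dicts mapping patient id -> cluster label (resampled patients only)
    """
    pairs = list(combinations(samples, 2))
    together = dict.fromkeys(pairs, 0)
    sampled = dict.fromkeys(pairs, 0)
    for labels in runs:
        for a, b in pairs:
            if a in labels and b in labels:
                sampled[(a, b)] += 1
                if labels[a] == labels[b]:
                    together[(a, b)] += 1
    return {pair: together[pair] / sampled[pair] for pair in pairs if sampled[pair] > 0}


def consensus_cdf(consensus, grid_size=11):
    """Empirical CDF of the pairwise consensus values on an even grid over [0, 1]."""
    values = list(consensus.values())
    if not values:
        raise ValueError("no pair of patients was ever drawn in the same run")
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return [(c, sum(v <= c for v in values) / len(values)) for c in grid]


# Three hypothetical clustering runs over four patients; p4 is left out of the second run.
runs = [
    {"p1": 0, "p2": 0, "p3": 1, "p4": 1},
    {"p1": 1, "p2": 1, "p3": 0},
    {"p1": 0, "p2": 0, "p3": 1, "p4": 1},
]
consensus = consensus_matrix(runs, ["p1", "p2", "p3", "p4"])
cdf = consensus_cdf(consensus)
```

A CDF that is flat between 0 and 1, as in this toy example, indicates that patient pairs are almost always either clustered together or kept apart, which is the usual sign of a stable number of clusters.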
S500, analyzing external tumor MRI images through the optimal tumor prognosis model and the optimal imaging genomic features.
In this embodiment, external tumor MRI images are analyzed through the optimal tumor prognosis model and the optimal imaging genomic features determined in step S400, so as to evaluate the prognostic performance of the imaging genomic features. Tumor heterogeneity is thus analyzed quantitatively from tumor MRI images alone, without causing trauma to the patient, so that the survival time of the tumor patient is predicted, which provides an important theoretical basis and application value for precision tumor medicine.
In this embodiment, concordantly altered tumor risk genes are first located according to tumor copy number variation data and tumor transcription profile data. Unsupervised deconvolution analysis is then performed on the expression profiles of the tumor risk genes to identify the subclone-specific genes related to tumor risk gene expression, and biological function analysis and survival analysis are performed on those genes to determine the subclone-specific genes whose correlation with patient survival reaches a specified degree. Based on these genes, consensus clustering analysis is performed on the sample patients to obtain classification labels, and an optimal tumor prognosis model is constructed and optimal imaging genomic features are screened according to the tumor MRI images and classification labels of the sample patients. Finally, external tumor MRI images are analyzed through the optimal tumor prognosis model and the optimal imaging genomic features, so that tumor heterogeneity is analyzed quantitatively from tumor MRI images alone, without causing trauma to the patient, and the survival time of the tumor patient is predicted, which provides an important theoretical basis and application value for precision tumor medicine.
In this embodiment, a pan-cancer imaging genomic database may be constructed in advance. Specifically, data for 11 cancer types, including RNA expression profiles, copy number variation profiles, patient survival information and MRI image data, are acquired from TCGA (The Cancer Genome Atlas), and resources such as the GEO (Gene Expression Omnibus) database, the TCIA (The Cancer Imaging Archive) database, the CGGA (Chinese Glioma Genome Atlas) and biological pathway databases (KEGG, Reactome, SMPDB and the like) are further integrated to complete the pan-cancer imaging genomic database.
In this embodiment, data for 11 TCGA cancer types (MRI image data, RNA-seq gene expression profiles, copy number variation profiles and clinical survival information) may be selected, as shown in the following table.
TCGA cancer type | MRI | RNA-seq | Copy number variation | Clinical
Bladder urothelial carcinoma (BLCA) | 20 | 406 | 412 | 412
Breast invasive carcinoma (BRCA) | 137 | 1095 | 1098 | 1097
Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) | 54 | 304 | 302 | 307
Glioblastoma multiforme (GBM) | 262 | 166 | 599 | 599
Head and neck squamous cell carcinoma (HNSC) | 227 | 503 | 526 | 528
Kidney renal clear cell carcinoma (KIRC) | 62 | 532 | 534 | 537
Kidney renal papillary cell carcinoma (KIRP) | 17 | 290 | 290 | 291
Brain lower grade glioma (LGG) | 199 | 514 | 515 | 515
Liver hepatocellular carcinoma (LIHC) | 40 | 371 | 376 | 337
Prostate adenocarcinoma (PRAD) | 10 | 497 | 498 | 500
Uterine corpus endometrial carcinoma (UCEC) | 8 | 557 | 558 | 548
As shown in FIG. 2, in one aspect of this embodiment, the step of locating concordantly altered tumor risk genes according to tumor copy number variation data and tumor transcription profile data comprises:
S101, identifying regions of significant copy number variation that reach a specified degree according to the tumor copy number variation data;
S102, locating the concordantly altered tumor risk genes within the regions of significant copy number variation in combination with the tumor transcription profile data.
In this embodiment, tumor copy number variation profiles of TCGA sample patients (breast cancer is taken as an example) are first obtained from the GDC platform (Genomic Data Commons Data Portal). These data are generated with the relevant hardware (e.g., the Affymetrix SNP 6.0 array) to identify repeated genomic regions, and the copy number of these regions is calculated by subsequent analysis. The GDC platform further converts the copy number to segment mean format, equal to log2(copy number / 2). Because the human genome is diploid, a region with normal copy number has a segment mean of 0, an amplified region has a positive segment mean, and a deleted region has a negative segment mean. After the copy number variation data in segment mean format are obtained, the GISTIC (Genomic Identification of Significant Targets in Cancer) 2.0 module of the GenePattern platform is used to analyze them. The module considers both the frequency and the amplitude of copy number variation, determines recurrent copy number variation regions by scoring the significance of the variation, and identifies the genes with significant copy number variation in combination with the GRCh38 human reference genome. RNA-seq data of healthy samples and breast cancer samples are then acquired and differential expression analysis is performed to identify the genes whose expression changes significantly in breast cancer. Finally, the data on significant copy number variation and significant expression change are integrated, and the genes that are increased in both or decreased in both are screened, thereby locating the concordantly altered breast cancer risk genes.
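The segment mean conversion and the final screening step can be illustrated with a short Python sketch. It assumes that the significance calls for copy number variation (for example from GISTIC 2.0) and the differential expression results are already available as simple dictionaries; the gene names, the fold-change cutoff and the function names are hypothetical.

```python
import math


def segment_mean(copy_number):
    """Segment mean as defined above: log2(copy number / 2)."""
    if copy_number <= 0:
        raise ValueError("copy number must be positive for a finite log2 ratio")
    return math.log2(copy_number / 2)


def concordant_genes(cnv_status, expression_log2fc, fc_cutoff=1.0):
    """Keep genes whose copy number and expression change in the same direction.

    cnv_status: gene -> +1 (significantly amplified) or -1 (significantly deleted)
    expression_log2fc: gene -> log2 fold change, tumor versus healthy samples
    fc_cutoff: hypothetical cutoff on the absolute log2 fold change
    """
    selected = {}
    for gene, status in cnv_status.items():
        log2fc = expression_log2fc.get(gene)
        if log2fc is None:
            continue  # no expression measurement for this gene
        if status > 0 and log2fc >= fc_cutoff:
            selected[gene] = "amplified and up-regulated"
        elif status < 0 and log2fc <= -fc_cutoff:
            selected[gene] = "deleted and down-regulated"
    return selected


# Hypothetical inputs: GENE_C is amplified but not up-regulated, so it is dropped.
cnv = {"GENE_A": 1, "GENE_B": -1, "GENE_C": 1}
log2fc = {"GENE_A": 2.3, "GENE_B": -1.7, "GENE_C": -0.4}
risk_genes = concordant_genes(cnv, log2fc)
```

A diploid region gives `segment_mean(2) == 0.0`, a four-copy region gives `1.0`, and a single-copy region gives `-1.0`, matching the sign convention described above.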
In one aspect of this embodiment, the step of performing biological function analysis on the subclone-specific genes comprises:
performing gene set enrichment analysis on the subclone-specific genes using a biological pathway database, and labeling the corresponding subclone-specific genes with the most significantly enriched biological pathway as their biological function.
In this embodiment, pathway analysis is performed on the specific genes of each subclone: the association between each subclone's specific genes and the pathways in a biological pathway database (KEGG, Reactome, SMPDB and the like may be selected) is measured by a hypergeometric test, and the underlying biological mechanism is analyzed. Gene set enrichment analysis (GSEA) is also performed on the subclone-specific genes using the biological pathway database. The identified subclone-specific genes are first ranked by degree of specificity; the positions of the genes of each pathway gene set in the database within this ranking are then examined, and an enrichment score is calculated for each gene set from those positions; finally, pathway gene sets are selected according to their enrichment scores and taken as the biological pathways enriched in the subclone. The subclone is named after its most significantly enriched pathway, which is taken as the biological function of the subclone.
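The hypergeometric test mentioned above can be sketched as follows. This covers only the over-representation test, not the rank-based GSEA enrichment score, and the pathway names, gene identifiers and function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, selected_size, overlap):
    """P(X >= overlap) when selected_size genes are drawn without replacement from the universe."""
    upper = min(pathway_size, selected_size)
    total = comb(universe_size, selected_size)
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, selected_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def most_enriched_pathway(subclone_genes, pathways, universe):
    """Return the pathway with the smallest enrichment p-value, plus all p-values."""
    p_values = {}
    for name, members in pathways.items():
        members = members & universe  # ignore pathway genes that were never measured
        overlap = len(subclone_genes & members)
        p_values[name] = hypergeom_enrichment_p(
            len(universe), len(members), len(subclone_genes), overlap
        )
    return min(p_values, key=p_values.get), p_values


# Hypothetical universe of 20 measured genes and two pathway gene sets.
universe = {f"g{i}" for i in range(1, 21)}
pathways = {
    "PATHWAY_X": {"g1", "g2", "g3", "g4"},
    "PATHWAY_Y": {"g5", "g6", "g7", "g8"},
}
subclone_genes = {"g1", "g2", "g3", "g9"}
best_pathway, p_values = most_enriched_pathway(subclone_genes, pathways, universe)
```

When many pathways are tested, the resulting p-values would normally be corrected for multiple testing before a pathway is reported as significantly enriched.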
In one aspect of this embodiment, the step of constructing an optimal tumor prognosis model and screening optimal imaging genomic features according to the tumor MRI images of the sample patients and the classification labels comprises:
constructing tumor prognosis models and screening imaging genomic features by a supervised deconvolution algorithm, machine learning algorithms and a convolutional neural network algorithm according to the tumor MRI images of the sample patients and the classification labels, and selecting the optimal tumor prognosis model and the optimal imaging genomic features from among them according to evaluation indexes, wherein the machine learning algorithms comprise multiple data screening methods and multiple model structures.
In this embodiment, the evaluation index may be the AUC, the decision curve, or the like. Specifically, after multiple tumor prognosis models are constructed and imaging genomic features are screened, ROC curves are drawn and the classification threshold is determined from the Youden index (sensitivity + specificity - 1). The classification performance of each model is evaluated by the AUC, the decision curve and similar measures, and the imaging genomic features with the best classification performance, together with the corresponding model, are taken as the final selection. After adjusting for clinical variables such as age, ER, PR, HER2 and tumor size, a multivariate Cox regression model is used to analyze whether the imaging genomic features are independent prognostic factors for overall survival or relapse-free survival. A likelihood ratio test is performed between the survival model that includes the imaging genomic features as variables and the survival model that uses only clinical variables, to evaluate how much the features improve prognostic performance.
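The two ROC-based quantities used for model selection can be computed as in the sketch below: the AUC, and the classification threshold that maximizes sensitivity + specificity - 1 (commonly known as the Youden index). The scores and labels are hypothetical, and the decision curve, Cox regression and likelihood ratio test are not covered.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case scores higher than a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_threshold(scores, labels):
    """Threshold maximizing sensitivity + specificity - 1; predict positive when score >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required to choose a threshold")
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j


# Hypothetical model scores for four patients (label 1 = high-risk group).
scores = [0.1, 0.2, 0.7, 0.9]
labels = [0, 0, 1, 1]
model_auc = auc(scores, labels)
threshold, youden_j = best_threshold(scores, labels)
```

Computing `auc` for each candidate model on held-out patients gives one way to compare the models before the best one is kept.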
In this embodiment, various data screening methods may be adopted, such as GBDT, LASSO, RF, and XGBoost, and various model structures may be adopted, such as RF, GBDT, AdaBoost, LR, NB, SVM, DT, and KNN.
In a machine learning task, both the method used to screen image genomic features and the structure of the tumor prognosis model directly affect the predictive performance of the model. To reduce the influence of the training data and the model structure as far as possible, and to improve how well the model classifies patients by gene subclone composition proportion, 4 feature screening methods (GBDT, LASSO, RF, and XGBoost) and 8 models of different structures (RF, GBDT, AdaBoost, LR, NB, SVM, DT, and KNN) are selected for training. During feature screening, image features are selected according to the variable importance rankings given by GBDT, RF, and XGBoost, while LASSO shrinks the regression coefficients of irrelevant variables to zero and retains the variables with nonzero coefficients; the image features selected in this way are used to classify patients with different subclone composition proportions. For model construction, tumor information such as size, shape, and boundary sharpness is first quantified with the PyRadiomics tool, which outputs numerical variables reflecting the tumor image information, such as gray-level matrices, first-order statistics, and second-order statistics; a machine learning model is then constructed in combination with the classification labels to classify the patients.
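The way LASSO shrinks the coefficients of irrelevant variables to exactly zero can be sketched with a small coordinate descent routine. This is an illustrative sketch only: it fits a continuous target rather than class labels, the penalty value and the toy feature matrix are assumptions, and the embodiment does not state which LASSO implementation is used.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the band become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/2n) * ||y - Xw||^2 + penalty * ||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, penalty) / (norm / n)
    return w


# Toy data: the target depends strongly on feature 0, weakly on feature 1,
# and not at all on feature 2.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.05, -1.95, 1.95, -2.05]
weights = lasso_coordinate_descent(X, y, penalty=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(weights, selected)
```

Only the features with nonzero weights would be passed on to the downstream classifiers; here the weak and the irrelevant feature are both dropped.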
With advances in computing performance and the digitization of medical images, many studies have used neural networks to build models. Compared with traditional machine learning, a convolutional neural network (CNN) can interpret and select from image data directly, which avoids errors caused by insufficient extraction of image information. A CNN does not require the image data to be converted into numerical features beforehand; it learns directly from the input image and, given a loss function, adjusts the weight of each node over many training iterations to reduce the classification error. To fully extract image genomic features that reflect tumor heterogeneity, the significance of the difference in the optimal image genomic features between patient groups can be evaluated with a t-test, which characterizes the intrinsic relationship between the optimal image genomic features and the genomic features of tumor heterogeneity. After training is completed, a heat map is drawn to visualize the regions the CNN attends to, which improves model interpretability.
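The group comparison by t-test mentioned above can be sketched as follows. The embodiment does not say which variant of the t-test is used; this sketch assumes Welch's unequal-variance form and returns only the statistic and degrees of freedom. The function name and the feature values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    var_a = variance(group_a) / len(group_a)  # squared standard error, group A
    var_b = variance(group_b) / len(group_b)  # squared standard error, group B
    t = (mean(group_a) - mean(group_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(group_a) - 1) + var_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical values of one image genomic feature in two patient groups.
feature_group_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_group_2 = [4.0, 5.0, 6.0, 7.0, 8.0]
t_stat, dof = welch_t(feature_group_1, feature_group_2)
print(t_stat, dof)
```

The p-value would then be read from a t distribution with `dof` degrees of freedom, for example with a statistics library.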
As shown in FIG. 3 and FIG. 5, a tumor heterogeneity recognition apparatus according to an embodiment of the present invention includes:
a tumor risk gene positioning module 100, configured to locate tumor risk genes with consistent changes according to tumor copy number variation data and tumor transcription profile data.
In this embodiment, the tumor risk genes with consistent changes are located first; specifically, they are located according to the tumor copy number variation data and the tumor transcription profile data.
a subclone-specific gene identification module 200, configured to perform unsupervised deconvolution analysis on the expression profile of the tumor risk genes and identify subclone-specific genes related to the expression of the tumor risk genes.
In this embodiment, the unsupervised deconvolution may specifically be unsupervised Convex Analysis of Mixtures (CAM). CAM assumes that the gene expression measured by sequencing can be regarded as the mixed expression of several latent gene subclones, that is, a weighted sum of the subclone-specific expression profiles with weights given by the subclone proportions. A simplex scatter plot of the RNA expression profile is drawn and each vertex of the simplex is taken as a latent subclone; differential expression analysis is performed on the genes of each subclone to identify key subclone-specific genes that are significantly associated with tumor risk gene expression; and the composition proportion matrix of the different subclones in the tumor samples and the specific expression matrix of each subclone are calculated using the standardized average and the least squares method.
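The mixing assumption and the least squares step can be illustrated with a minimal sketch for two subclones. It is not the CAM algorithm itself: CAM discovers the subclone profiles without supervision from the simplex vertices, whereas this sketch assumes the two profiles are already known and only recovers the proportions for one sample. All names and numbers are hypothetical.

```python
def estimate_proportions(mixed, profile_a, profile_b):
    """Least squares fit of mixed ~ pa * profile_a + pb * profile_b.

    Solves the 2x2 normal equations, clips negatives to zero, and rescales
    the two coefficients so that they sum to one.
    """
    saa = sum(a * a for a in profile_a)
    sbb = sum(b * b for b in profile_b)
    sab = sum(a * b for a, b in zip(profile_a, profile_b))
    sam = sum(a * m for a, m in zip(profile_a, mixed))
    sbm = sum(b * m for b, m in zip(profile_b, mixed))
    det = saa * sbb - sab * sab
    pa = max((sam * sbb - sbm * sab) / det, 0.0)
    pb = max((sbm * saa - sam * sab) / det, 0.0)
    total = pa + pb
    return pa / total, pb / total


# Hypothetical expression of four genes in two pure subclones.
subclone_a = [10.0, 0.0, 4.0, 2.0]
subclone_b = [0.0, 8.0, 4.0, 6.0]
# Bulk sample simulated as 25% subclone A and 75% subclone B.
bulk = [0.25 * a + 0.75 * b for a, b in zip(subclone_a, subclone_b)]
prop_a, prop_b = estimate_proportions(bulk, subclone_a, subclone_b)
print(prop_a, prop_b)
```

Repeating this fit for every sample yields one row of the composition proportion matrix per patient.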
a biological function analysis and survival analysis module 300, configured to perform biological function analysis and survival analysis on the subclone-specific genes and determine the subclone-specific genes whose association with patient survival reaches a specified degree.
In this example, the log-rank test can be used to test the association between the proportions of different subclone-specific genes and the survival function. The log-rank test is a statistical method for assessing the effect of a variable on patient survival. Patients are grouped according to the proportion of each subclone-specific gene across all patients, and a log-rank test is performed. To avoid systematic errors caused by repeated random sampling during testing, the p-values from the log-rank test, which represent the significance of the association between the variable and survival, are corrected with the Benjamini-Hochberg method. The grouping threshold with the smallest corrected p-value is taken as the final grouping threshold for that subclone-specific gene, a Kaplan-Meier survival curve is drawn, and the hazard ratio (HR) and its 95% confidence interval are calculated to determine the significance of the association between the composition proportion of the subclone-specific gene and patient survival. Differential gene expression analysis is then performed between the two patient groups defined by this threshold to explain the survival difference caused by the different proportions of the subclone-specific genes.
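The p-value correction and threshold choice described above can be sketched as follows. The candidate thresholds and raw p-values are hypothetical; in practice each raw p-value would come from a log-rank test at that grouping threshold.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical candidate grouping thresholds and their log-rank p-values.
thresholds = [0.3, 0.4, 0.5, 0.6]
raw_p = [0.04, 0.01, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
best_threshold = thresholds[adj_p.index(min(adj_p))]
print(adj_p, best_threshold)
```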
In this example, identifying the biological functions of the different subclone-specific genes explains subclonal heterogeneity and its important role in tumor progression.
a tumor prognosis model building module 400, configured to perform consensus clustering analysis on the sample patients based on the subclone-specific genes whose association with the survival function reaches the specified degree to obtain classification labels, and to construct an optimal tumor prognosis model and screen optimal image genomic features according to the tumor MRI images of the sample patients and the classification labels.
In this embodiment, consensus clustering analysis is performed on the sample patients using the subclone-specific genes whose association with the survival function reaches the specified degree, and the clustering result is determined from the cumulative distribution function (CDF) curve, so that the patients are grouped and classification labels are obtained. An optimal tumor prognosis model is then constructed and optimal image genomic features are screened according to the tumor MRI images of the sample patients and the classification labels, which characterizes the relationship between the subclone-specific genes and survival and lays the groundwork for subsequently determining the survival capacity of the patient.
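How a CDF curve can guide the choice of clustering result is sketched below. The sketch assumes that each candidate cluster number has already been clustered several times, and it omits the subsampling that consensus clustering normally applies. It summarises the CDF by the proportion of ambiguous clustering (PAC), which is one possible criterion and is not named in the embodiment. All data are hypothetical.

```python
from itertools import combinations


def consensus_values(runs):
    """Fraction of runs in which each pair of patients shares a cluster."""
    n = len(runs[0])
    return [
        sum(1 for labels in runs if labels[i] == labels[j]) / len(runs)
        for i, j in combinations(range(n), 2)
    ]


def empirical_cdf(values, c):
    """Share of consensus values that are <= c."""
    return sum(1 for v in values if v <= c) / len(values)


def pac(values, lower=0.1, upper=0.9):
    """Proportion of ambiguous clustering: CDF(upper) - CDF(lower)."""
    return empirical_cdf(values, upper) - empirical_cdf(values, lower)


# Hypothetical repeated cluster assignments of four patients.
runs_by_k = {
    2: [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]],
    3: [[0, 0, 1, 2], [0, 1, 1, 2], [0, 0, 1, 2], [0, 1, 2, 2]],
}
pac_by_k = {k: pac(consensus_values(runs)) for k, runs in runs_by_k.items()}
best_k = min(pac_by_k, key=pac_by_k.get)
print(pac_by_k, best_k)
```

A flat CDF between the two cut-offs means that pairs of patients are almost always either together or apart, which indicates a stable grouping.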
a tumor image analysis module 500, configured to analyze external tumor MRI images using the optimal tumor prognosis model and the optimal image genomic features.
In this embodiment, external tumor MRI images are analyzed using the optimal tumor prognosis model and the optimal image genomic features determined by the tumor prognosis model building module 400, in order to evaluate the prognostic performance of the image genomic features. Tumor heterogeneity can thus be analyzed quantitatively from tumor MRI images alone, without causing trauma to the patient, so that the survival time of tumor patients can be predicted; this provides an important theoretical basis and application value for precision medicine in oncology.
In this embodiment, tumor risk genes with consistent changes are first located according to the tumor copy number variation data and the tumor transcription profile data. Unsupervised deconvolution analysis is then performed on the expression profile of the tumor risk genes to identify the subclone-specific genes related to the expression of the tumor risk genes. Biological function analysis and survival analysis are performed on the subclone-specific genes to determine those whose association with patient survival reaches the specified degree. Consensus clustering analysis is then performed on the sample patients based on these subclone-specific genes to obtain classification labels, and an optimal tumor prognosis model is constructed and optimal image genomic features are screened according to the tumor MRI images of the sample patients and the classification labels. Finally, external tumor MRI images are analyzed using the optimal tumor prognosis model and the optimal image genomic features. In this way, tumor heterogeneity is analyzed quantitatively from tumor MRI images alone, without causing trauma to the patient, the survival time of tumor patients is predicted, and an important theoretical basis and application value are provided for precision medicine in oncology.
As shown in FIG. 4, in one aspect of the present embodiment, the tumor risk gene positioning module 100 comprises:
a copy number significant variation region identification subunit 101, configured to identify, according to the tumor copy number variation data, copy number significant variation regions that reach a specified degree;
a tumor risk gene locator unit 102, configured to locate, in combination with the tumor transcription profile data, tumor risk genes with consistent changes in the copy number significant variation regions.
In this example, tumor copy number variation profiles of TCGA sample patients (breast cancer is used as the example) are first obtained from the GDC platform (Genomic Data Commons Data Portal). These data are generated with dedicated hardware (e.g., Affymetrix SNP 6.0 arrays) to identify repeated genomic regions, and the copy number of these regions is calculated in subsequent analysis. The GDC platform further converts the copy number to segment mean format, equal to log2[(copy number)/2]. Because the human genome is diploid, a normal region has a segment mean of 0, an amplified region has a positive segment mean, and a deleted region has a negative segment mean. Once the copy number variation profile in segment mean format has been obtained, the GISTIC (Genomic Identification of Significant Targets in Cancer) 2.0 module of the GenePattern platform is used to analyze the tumor copy number variation data. The module considers both the frequency and the amplitude of copy number variation, determines frequently altered copy number regions by scoring the significance of the variation, and identifies the genes affected by copy number variation in combination with the GRCh38 human reference genome. RNA-seq data from healthy samples and breast cancer samples are then acquired and subjected to differential expression analysis to identify genes whose expression changes significantly in breast cancer. The data on significant copy number variation and significant expression change are integrated, and genes in which both are increased or both are decreased are screened, thereby locating breast cancer risk genes with consistent changes.
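The segment mean conversion and the final concordance screen can be sketched as follows. The cut-off values, gene names, and fold changes are assumptions for illustration only; in the embodiment the significant regions come from GISTIC 2.0 and the expression changes from differential expression analysis.

```python
import math


def segment_mean(copy_number):
    """Segment mean as defined in the text: log2(copy number / 2)."""
    return math.log2(copy_number / 2)


def concordant_genes(cnv_segment_mean, expr_log2fc, cnv_cutoff=0.3, expr_cutoff=1.0):
    """Keep genes whose copy number and expression change in the same direction.

    Both cut-offs are illustrative assumptions, not values from the embodiment.
    """
    risk = {}
    for gene, seg in cnv_segment_mean.items():
        fold_change = expr_log2fc.get(gene)
        if fold_change is None:
            continue
        if seg >= cnv_cutoff and fold_change >= expr_cutoff:
            risk[gene] = "amplified and upregulated"
        elif seg <= -cnv_cutoff and fold_change <= -expr_cutoff:
            risk[gene] = "deleted and downregulated"
    return risk


# Hypothetical genes: copy numbers 4, 1, and 3 in the tumor.
cnv = {"GENE_A": segment_mean(4), "GENE_B": segment_mean(1), "GENE_C": segment_mean(3)}
# Hypothetical log2 fold changes, tumor versus healthy tissue.
expression = {"GENE_A": 2.1, "GENE_B": -1.8, "GENE_C": -1.5}
risk_genes = concordant_genes(cnv, expression)
print(risk_genes)
```

GENE_C is amplified but downregulated, so it is discarded as discordant.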
In one aspect of the present embodiment, the biological function analysis and survival analysis module 300 includes:
a gene enrichment analysis unit, configured to perform gene enrichment analysis on the subclone-specific genes using a biological pathway database, and to label the corresponding subclone-specific genes with the most significantly enriched biological pathway as their biological function.
In this example, pathway analysis is performed on the specific genes of each gene subclone: the association between the specific genes of each subclone and a biological pathway database (KEGG, Reactome, SMPDB, or the like may be selected) is measured by a hypergeometric test, and the underlying biological mechanism is analyzed. Using the biological pathway database, gene set enrichment analysis (GSEA) is performed on the subclone-specific genes. The identified subclone-specific genes are first ranked by degree of specificity; the positions within this ranking of the genes in each pathway gene set of the database are then examined, and an enrichment score is calculated for each gene set from those positions; finally, pathway gene sets are selected according to the enrichment score and taken as the biological pathways enriched in the subclone. The subclone is named after its most significantly enriched pathway, which is taken as its biological function.
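The hypergeometric test used to measure the association between subclone-specific genes and a pathway can be sketched with the standard library alone. The gene counts are hypothetical, and the function name is an illustrative choice.

```python
from math import comb


def hypergeom_enrichment_p(population, pathway_size, selected, overlap):
    """Upper-tail probability P(X >= overlap) of the hypergeometric distribution.

    population   : number of background genes
    pathway_size : number of background genes annotated to the pathway
    selected     : number of subclone-specific genes
    overlap      : number of subclone-specific genes found in the pathway
    """
    upper = min(pathway_size, selected)
    total = comb(population, selected)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20 background genes, 5 in the pathway,
# 4 subclone-specific genes of which 3 fall in the pathway.
p_value = hypergeom_enrichment_p(population=20, pathway_size=5, selected=4, overlap=3)
print(p_value)
```

A small p-value means that the overlap is larger than would be expected if the subclone-specific genes were drawn at random from the background.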
a survival analysis unit, configured to perform survival analysis on the subclone-specific genes using patient survival information and identify the subclone-specific genes whose association with patient survival reaches a specified degree.
In this example, survival analysis is performed on the specific genes of each gene subclone, and the log-rank test is used to test the association between the proportions of different subclone-specific genes and the survival function. The log-rank test is a statistical method for assessing the effect of a variable on patient survival. Patients are grouped according to the proportion of each subclone-specific gene across all patients, and a log-rank test is performed. To avoid systematic errors caused by repeated random sampling during testing, the p-values from the log-rank test, which represent the significance of the association between the variable and survival, are corrected with the Benjamini-Hochberg method. The grouping threshold with the smallest corrected p-value is taken as the final grouping threshold for that subclone-specific gene, a Kaplan-Meier survival curve is drawn, and the hazard ratio (HR) and its 95% confidence interval are calculated to determine the significance of the association between the composition proportion of the subclone-specific gene and patient survival.
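A two-group log-rank test of the kind described above can be sketched in plain Python. The sketch assumes right-censored data, with event flag 1 for an observed death and 0 for censoring, and uses the chi-square distribution with one degree of freedom for the p-value. The follow-up times are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for ti, _, _ in data if ti >= t)
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        deaths = sum(1 for ti, e, _ in data if ti == t and e)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        observed_minus_expected += deaths_a - deaths * at_risk_a / at_risk
        if at_risk > 1:
            frac_a = at_risk_a / at_risk
            variance += deaths * frac_a * (1 - frac_a) * (at_risk - deaths) / (at_risk - 1)
    statistic = observed_minus_expected ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical follow-up (months) for low- and high-proportion groups.
low_times, low_events = [5, 8, 12, 20], [1, 1, 1, 0]
high_times, high_events = [10, 15, 22, 30], [1, 0, 1, 0]
chi2, p = logrank_test(low_times, low_events, high_times, high_events)
print(chi2, p)
```

Running this test once per candidate grouping threshold produces the set of p-values that is subsequently corrected.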
In one aspect of this embodiment, the tumor prognosis model building module 400 includes:
a model construction and feature selection unit, configured to construct tumor prognosis models and screen image genomic features by means of a supervised deconvolution algorithm, machine learning algorithms, and a convolutional neural network algorithm according to the tumor MRI images of the sample patients and the classification labels, and to select the optimal tumor prognosis model and the optimal image genomic features from the constructed models and screened features according to evaluation indexes; the machine learning algorithms comprise a plurality of data screening methods and a plurality of model structures.
In this embodiment, the evaluation index may be the AUC, a decision curve, or the like. Specifically, after multiple tumor prognosis models have been constructed and image genomic features screened, an ROC curve is drawn and the classification threshold is determined by the Youden index (sensitivity + specificity - 1). The classification performance of each model is evaluated by the AUC, decision curve analysis, and similar measures, and the image genomic features with the best classification performance, together with the corresponding model, are taken as the final selection. After adjusting for clinical variables such as age, ER, PR, HER2, and tumor size, a multivariate Cox regression model is used to analyze whether the image genomic features are independent prognostic factors for overall survival or relapse-free survival. A likelihood ratio test is performed between the survival model that uses the image genomic features as variables and the survival model that uses clinical variables only, in order to evaluate how much the image genomic features improve prognostic performance.
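The likelihood ratio comparison of the two nested survival models can be sketched as follows. The sketch assumes that the fuller model adds exactly one image genomic feature, so that the statistic follows a chi-square distribution with one degree of freedom; the log-likelihood values are hypothetical and would in practice come from the fitted Cox models.

```python
import math


def likelihood_ratio_test(loglik_clinical, loglik_with_feature):
    """Likelihood ratio test for one added variable (1 degree of freedom).

    Returns (statistic, p-value), where statistic = 2 * (llf_full - llf_reduced).
    """
    statistic = max(2.0 * (loglik_with_feature - loglik_clinical), 0.0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical partial log-likelihoods of the two fitted models.
lr_stat, lr_p = likelihood_ratio_test(loglik_clinical=-120.0, loglik_with_feature=-117.0)
print(lr_stat, lr_p)
```

A small p-value indicates that adding the image genomic feature improves the fit beyond what the clinical variables achieve alone.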
In this embodiment, various data screening methods may be adopted, such as GBDT, LASSO, RF, and XGBoost, and various model structures may be adopted, such as RF, GBDT, AdaBoost, LR, NB, SVM, DT, and KNN.
In a machine learning task, both the method used to screen image genomic features and the structure of the tumor prognosis model directly affect the predictive performance of the model. To reduce the influence of the training data and the model structure as far as possible, and to improve how well the model classifies patients by gene subclone composition proportion, 4 feature screening methods (GBDT, LASSO, RF, and XGBoost) and 8 models of different structures (RF, GBDT, AdaBoost, LR, NB, SVM, DT, and KNN) are selected for training. During feature screening, image features are selected according to the variable importance rankings given by GBDT, RF, and XGBoost, while LASSO shrinks the regression coefficients of irrelevant variables to zero and retains the variables with nonzero coefficients; the image features selected in this way are used to classify patients with different subclone composition proportions. For model construction, tumor information such as size, shape, and boundary sharpness is first quantified with the PyRadiomics tool, which outputs numerical variables reflecting the tumor image information, such as gray-level matrices, first-order statistics, and second-order statistics; a machine learning model is then constructed in combination with the classification labels to classify the patients.
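Choosing the best pairing of screening method and model structure by AUC can be sketched as follows. The validation scores are hypothetical placeholders for the outputs of trained models, and only three of the possible pairings are shown.

```python
def auc(labels, scores):
    """AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def best_combination(labels, predictions):
    """Return the (screening method, model) pair with the highest AUC."""
    return max(predictions, key=lambda combo: auc(labels, predictions[combo]))


# Hypothetical validation labels and predicted scores for three pairings.
validation_labels = [0, 0, 1, 1]
validation_scores = {
    ("LASSO", "SVM"): [0.10, 0.40, 0.35, 0.80],
    ("RF", "LR"): [0.20, 0.30, 0.60, 0.70],
    ("GBDT", "DT"): [0.50, 0.60, 0.40, 0.70],
}
auc_by_combo = {c: auc(validation_labels, s) for c, s in validation_scores.items()}
winner = best_combination(validation_labels, validation_scores)
print(auc_by_combo, winner)
```

With 4 screening methods and 8 model structures the same loop would cover up to 32 pairings, and decision curve analysis could be applied to the leading candidates as a second criterion.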
With advances in computing performance and the digitization of medical images, many studies have used neural networks to build models. Compared with traditional machine learning, a convolutional neural network (CNN) can interpret and select from image data directly, which avoids errors caused by insufficient extraction of image information. A CNN does not require the image data to be converted into numerical features beforehand; it learns directly from the input image and, given a loss function, adjusts the weight of each node over many training iterations to reduce the classification error. To fully extract image genomic features that reflect tumor heterogeneity, the significance of the difference in the optimal image genomic features between patient groups can be evaluated with a t-test, which characterizes the intrinsic relationship between the optimal image genomic features and the genomic features of tumor heterogeneity. After training is completed, a heat map is drawn to visualize the regions the CNN attends to, which improves model interpretability.
In another embodiment, a tumor heterogeneity identification platform is provided, which is constructed based on the optimal tumor prognosis model and optimal image genomic features obtained by the tumor heterogeneity identification method provided in the above embodiments.
Image genomic features with prognostic value are identified for different cancers. Data management is based on a MySQL database; Docker serves as the software runtime, with all program execution and development environments packaged into Docker containers and built into images; and the front-end web pages are written in Java, yielding a user-friendly pan-cancer platform for prognostic analysis of image genomic features. FIG. 6 illustrates the workflow of the platform. The platform mainly comprises the following analysis modules: 1) a pan-cancer MRI data repository with extensible interfaces to other disease databases; 2) a copy number variation driven pan-cancer subclone-specific gene query and analysis module; 3) a copy number variation driven pan-cancer subclone function analysis module; 4) an online real-time analysis module that uses image genomic features for prediction.
In another embodiment, with reference to fig. 7, there is provided an electronic device including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the tumor heterogeneity identification method provided in the above embodiments when executing the computer program.
In this embodiment, data is transmitted between the memory and the processor through a communication interface. The memory may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory. The processor implements the tumor heterogeneity identification method provided in the above embodiments when executing the computer program. If the memory, the processor, and the communication interface are implemented independently, they may be connected to one another and communicate via a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 7, but this does not mean that there is only one bus or one type of bus.
Optionally, in a specific implementation, if the memory, the processor, and the communication interface are integrated on a chip, the memory, the processor, and the communication interface may complete mutual communication through an internal interface.
The processor may be a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement the present embodiments.
In another embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the tumor heterogeneity identification method provided in the above embodiments.
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the sequence indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present invention, and although their description is relatively specific and detailed, they are not to be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. A method for identifying tumor heterogeneity, comprising the steps of:
locating tumor risk genes with consistent changes according to tumor copy number variation data and tumor transcription profile data;
performing unsupervised deconvolution analysis on the expression profile of the tumor risk genes, and identifying subclone-specific genes related to the expression of the tumor risk genes;
performing biological function analysis and survival analysis on the subclone-specific genes, and determining the subclone-specific genes whose association with patient survival reaches a specified degree;
performing consensus clustering analysis on sample patients based on the subclone-specific genes whose association with the survival function reaches the specified degree to obtain classification labels, and constructing an optimal tumor prognosis model and screening optimal image genomic features according to tumor MRI images of the sample patients and the classification labels;
analyzing external tumor MRI images using the optimal tumor prognosis model and the optimal image genomic features.
2. The method for tumor heterogeneity identification according to claim 1, wherein the step of locating tumor risk genes with consistent changes according to tumor copy number variation data and tumor transcription profile data comprises:
identifying copy number significant variation regions that reach a specified degree according to the tumor copy number variation data;
locating, in combination with the tumor transcription profile data, tumor risk genes with consistent changes in the copy number significant variation regions.
3. The method for tumor heterogeneity identification according to claim 1, wherein the step of performing biological function analysis on the subclone-specific genes comprises:
performing gene enrichment analysis on the subclone-specific genes using a biological pathway database, and labeling the corresponding subclone-specific genes with the most significantly enriched biological pathway as their biological function.
4. The method for tumor heterogeneity identification according to claim 1, wherein the step of constructing an optimal tumor prognosis model and screening optimal image genomic features according to the tumor MRI images of the sample patients and the classification labels comprises:
constructing tumor prognosis models and screening image genomic features by means of a supervised deconvolution algorithm, machine learning algorithms, and a convolutional neural network algorithm according to the tumor MRI images of the sample patients and the classification labels, and selecting the optimal tumor prognosis model and the optimal image genomic features from the constructed models and screened features according to evaluation indexes; the machine learning algorithms comprise a plurality of data screening methods and a plurality of model structures.
5. A tumor heterogeneity recognition apparatus, comprising:
a tumor risk gene positioning module, configured to locate tumor risk genes with consistent changes according to tumor copy number variation data and tumor transcription profile data;
a subclone-specific gene identification module, configured to perform unsupervised deconvolution analysis on the expression profile of the tumor risk genes and identify subclone-specific genes related to the expression of the tumor risk genes;
a biological function analysis and survival analysis module, configured to perform biological function analysis and survival analysis on the subclone-specific genes and determine the subclone-specific genes whose association with patient survival reaches a specified degree;
a tumor prognosis model building module, configured to perform consensus clustering analysis on sample patients based on the subclone-specific genes whose association with the survival function reaches the specified degree to obtain classification labels, and to construct an optimal tumor prognosis model and screen optimal image genomic features according to tumor MRI images of the sample patients and the classification labels;
a tumor image analysis module, configured to analyze external tumor MRI images using the optimal tumor prognosis model and the optimal image genomic features.
6. The tumor heterogeneity recognition apparatus according to claim 1, wherein the tumor risk gene positioning module comprises:
a copy number significant variation region identification subunit, configured to identify copy number significant variation regions that reach a specified degree according to the tumor copy number variation data;
a tumor risk gene locator unit, configured to locate, in combination with the tumor transcription profile data, tumor risk genes with consistent changes in the copy number significant variation regions.
7. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of tumor heterogeneity identification according to any one of claims 1 to 4 when executing the computer program.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method for tumor heterogeneity identification according to any one of claims 1 to 4.
CN202210964997.7A 2022-08-11 2022-08-11 Tumor heterogeneity identification method and device, electronic equipment and storage medium Pending CN115375640A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210964997.7A CN115375640A (en) 2022-08-11 2022-08-11 Tumor heterogeneity identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210964997.7A CN115375640A (en) 2022-08-11 2022-08-11 Tumor heterogeneity identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115375640A true CN115375640A (en) 2022-11-22

Family

ID=84065777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210964997.7A Pending CN115375640A (en) 2022-08-11 2022-08-11 Tumor heterogeneity identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115375640A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115798601A (en) * 2023-02-03 2023-03-14 北京灵迅医药科技有限公司 Tumor characteristic gene identification method, device, equipment and storage medium
CN116385441A (en) * 2023-06-05 2023-07-04 中国科学院深圳先进技术研究院 Method and system for risk stratification of oligodendroglioma based on MRI
CN116385441B (en) * 2023-06-05 2023-09-05 中国科学院深圳先进技术研究院 Method and system for risk stratification of oligodendroglioma based on MRI
CN116386903A (en) * 2023-06-06 2023-07-04 中国医学科学院肿瘤医院 Method for reading heterogeneity between tumors and in tumors of small cell lung cancer
CN116403076A (en) * 2023-06-06 2023-07-07 中国科学院深圳先进技术研究院 Method and system for risk stratification of GBM patient based on DTI sequence
CN116403076B (en) * 2023-06-06 2023-08-22 中国科学院深圳先进技术研究院 Method and system for risk stratification of GBM patient based on DTI sequence
CN116386903B (en) * 2023-06-06 2023-11-10 中国医学科学院肿瘤医院 Method for reading heterogeneity between tumors and in tumors of small cell lung cancer

Similar Documents

Publication Publication Date Title
CN115375640A (en) Tumor heterogeneity identification method and device, electronic equipment and storage medium
US11783915B2 (en) Convolutional neural network systems and methods for data classification
CN112048559B (en) Model construction and clinical application of m6A-related lncRNA network gastric cancer prognosis
US8165973B2 (en) Method of identifying robust clustering
JP6063446B2 (en) Analysis of biomarker expression in cells by product rate
US9613254B1 (en) Quantitative in situ characterization of heterogeneity in biological samples
CN109872776B (en) Screening method for potential biomarkers of gastric cancer based on weighted gene co-expression network analysis and application thereof
US20200219587A1 (en) Systems and methods for using fragment lengths as a predictor of cancer
CN112766428B (en) Tumor molecule typing method and device, terminal device and readable storage medium
CN109801680A (en) Tumour metastasis and recurrence prediction technique and system based on TCGA database
CN111440869A (en) DNA methylation marker for predicting primary breast cancer occurrence risk and screening method and application thereof
CN107463797B (en) Biological information analysis method and device for high-throughput sequencing, equipment and storage medium
Liu et al. Pathological prognosis classification of patients with neuroblastoma using computational pathology analysis
CN115881296B (en) Thyroid papillary carcinoma (PTC) risk auxiliary layering system
CN112397153A (en) Method for screening biomarker for predicting esophageal squamous cell carcinoma prognosis
KR20200109544A (en) Multi-cancer classification method by common significant genes
CN110942808A (en) Prognosis prediction method and prediction system based on gene big data
Zhang et al. Predicting IHC staining classes of NF1 using features in the hematoxylin channel
CN114974432A (en) Screening method of biomarker and related application thereof
CN113981081A (en) Breast cancer molecular marker based on RNA editing level and diagnosis model
US11796446B2 (en) Systems and methods for automated hematological abnormality detection
Batool et al. Towards Improving Breast Cancer Classification using an Adaptive Voting Ensemble Learning Algorithm
CN112382341A (en) Method for identifying biomarkers related to esophageal squamous carcinoma prognosis
Deng et al. Genopathomic Profiling Identifies Signatures for Immunotherapy Response of Lung Cancer Via Confounder-Aware Representation Learning
Fourgoux Field Cancerisation in Breast Cancer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination