CN115982644B - Esophageal squamous cell carcinoma classification model construction and data processing method - Google Patents


Info

Publication number
CN115982644B
Authority
CN
China
Prior art keywords
ddr
data
pathway
sample
gene expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310063027.4A
Other languages
Chinese (zh)
Other versions
CN115982644A (en
Inventor
刘芝华
陈洪岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cancer Hospital and Institute of CAMS and PUMC
Original Assignee
Cancer Hospital and Institute of CAMS and PUMC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cancer Hospital and Institute of CAMS and PUMC filed Critical Cancer Hospital and Institute of CAMS and PUMC
Priority to CN202310063027.4A priority Critical patent/CN115982644B/en
Publication of CN115982644A publication Critical patent/CN115982644A/en
Application granted granted Critical
Publication of CN115982644B publication Critical patent/CN115982644B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method, a system, a device and a computer-readable storage medium for constructing an esophageal squamous cell carcinoma classification model and for processing data, wherein the method comprises the following steps: acquiring sequencing data of training set samples and the survival time corresponding to each sample; extracting a DDR pathway gene set and its gene expression from the sequencing data of the training set samples; screening the DDR pathway gene set to obtain survival-related pathways and the gene expression of those pathways; the survival-related pathways comprise one or more of the following: MMR pathway, NER pathway, FA pathway, and NHEJ pathway; and performing cluster analysis on the training set samples based on the survival time to obtain different classification subtypes, and characterizing the pathways and gene expression of each classification subtype to obtain a classification model.

Description

Esophageal squamous cell carcinoma classification model construction and data processing method
Technical Field
The invention relates to the field of data analysis, in particular to a method and a system for constructing an esophageal squamous cell carcinoma classification model and for processing data with it.
Background
Esophageal squamous cell carcinoma (ESCC) is a malignant tumor that threatens human health. Five-year survival of ESCC patients is less than 20% in developed countries and less than 5% in many developing countries. Notably, some primary esophageal cancer patients relapse rapidly after esophageal resection, and the prognosis of these patients remains poor. To date, no accurate molecular biomarker can predict disease progression in these primary ESCC patients, resulting in inadequate clinical management. Thus, there is an urgent need to identify new prognostic biomarkers for primary ESCC.
A variety of synergistic repair mechanisms can rapidly and properly repair DNA damage in normal cells; DNA double strand breaks are repaired primarily by Homologous Recombination (HR) and non-homologous end joining (NHEJ), DNA single strand breaks are repaired primarily by mismatch repair (MMR) and nucleotide excision repair pathways (NER). DNA Damage Repair (DDR) defects can lead to accumulation of DNA damage and genomic instability, production of neoantigens, and up-regulation of expression of immune checkpoints, ultimately altering immune balance in the Tumor Microenvironment (TME). Interestingly, DDR deficiency becomes an important determinant of anti-tumor immune response by affecting antigenicity, adjuvanticity and responsiveness, which may contribute to the response of immunotherapy. Recent studies have revealed the potential of some DDR-based biomarkers in predicting immune therapeutic responses; however, the value of DDR-related features for prognostic evaluation and personalized immunotherapy has not yet been fully elucidated. Thus, it is crucial to reveal a link between the change in tumor DDR pathways and prognosis.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. The invention provides a method for constructing an esophageal squamous cell carcinoma classification model, which comprises screening DDR (DNA damage repair) pathway gene sets and their gene expression using sequencing data of samples, performing cluster analysis on the samples according to their survival time to obtain a DDR-active subtype and a DDR-silent subtype, and characterizing the DDR pathway gene sets and gene expression of the 2 subtypes to obtain a classification model; the method of the invention processes and analyzes the relevant data based on the classification model, is used for subtyping and prognostic evaluation of primary ESCC, and addresses the related life science problem by mining the biological patterns hidden in the data.
The first aspect of the application discloses a construction method of an esophageal squamous cell carcinoma classification model, which comprises the following steps:
Acquiring sequencing data of training set samples and the survival time corresponding to each sample;
Extracting a DDR pathway gene set and its gene expression from the sequencing data of the training set samples;
screening the DDR pathway gene set to obtain survival-related pathways and the gene expression of the survival-related pathways; the survival-related pathways comprise one or more of the following: MMR pathway, NER pathway, FA pathway, and NHEJ pathway;
and performing cluster analysis on the training set samples based on the survival time to obtain different classification subtypes, and characterizing the pathways and gene expression of each classification subtype to obtain a classification model.
The DDR pathway gene set comprises: BER, MMR, NER, FA, HR, and NHEJ pathways;
the cluster analysis method comprises: a consistency (consensus) clustering algorithm;
optionally, the screening method includes: univariate Cox regression analysis;
Optionally, the sequencing data of the training set samples includes: RNA-seq data of primary ESCC tumor tissue samples and metastatic ESCC tumor tissue samples.
The construction method further comprises the following steps: based on the gene expression of the survival-related pathways, obtaining a DDR gene set related to survival outcome and the corresponding gene expression using univariate regression analysis; processing the DDR gene set related to survival outcome and the corresponding gene expression using multivariate analysis to obtain prognosis prediction genes and their gene expression;
And performing cluster analysis on the training set samples based on the survival time to obtain different classification subtypes, and characterizing the prognosis prediction genes and gene expression of each classification subtype to obtain a classification model.
The different classification subtypes of the classification model include: DDR-active subtype and DDR-silent subtype; the DDR-active subtype corresponds to the pathway gene expression associated with a high survival rate, and the DDR-silent subtype corresponds to the pathway gene expression associated with a low survival rate;
Optionally, the prognostic prediction gene includes: BRCA1 gene and HFM1 gene; the DDR-active subtype corresponds to the high BRCA1 gene expression quantity, and the DDR-silent subtype corresponds to the high HFM1 gene expression quantity.
The second aspect of the application discloses a method for processing esophageal squamous cell carcinoma data, comprising the following steps:
acquiring sequencing data of a sample to be tested;
Inputting the sequencing data of the sample to be tested into the classification model disclosed in the first aspect of the application to obtain classification results of the DDR-active subtype and the DDR-silent subtype;
Optionally, the method further comprises: predicting the survival rate of the sample to be detected based on the classification result; outputting a result with high survival rate of the sample to be tested based on the classification result of the DDR-active subtype; and outputting a result with low survival rate of the sample to be tested based on the classification result of the DDR-silent subtype.
The third aspect of the application discloses a method for processing esophageal squamous cell carcinoma data, comprising the following steps:
Obtaining gene expression data of a sample to be tested; the gene expression data of the sample to be tested comprises the gene expression data of one or more of the following genes: BRCA1 gene, HFM1 gene;
inputting the gene expression data of the sample to be tested into the classification model disclosed in the first aspect of the application to obtain a classification result;
In a fourth aspect, the application discloses a system for processing esophageal squamous cell carcinoma data, comprising:
the acquisition unit is used for acquiring sequencing data of the sample to be tested;
The output unit is used for inputting the sequencing data of the sample to be tested into the classification model disclosed in the first aspect of the application to obtain classification results of the DDR-active subtype and the DDR-silent subtype.
In a fifth aspect, the application discloses an apparatus for processing esophageal squamous cell carcinoma data, the apparatus comprising: a memory and a processor;
The memory is used for storing program instructions; the processor is adapted to invoke program instructions which, when executed, are adapted to carry out the method of processing esophageal squamous cell carcinoma data as disclosed in the second and/or third aspect of the application.
A sixth aspect of the present application discloses a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of processing esophageal squamous cell carcinoma data disclosed in the second and/or third aspects of the present application.
The application has the following beneficial effects:
1. The application discloses a model construction method for subtyping primary ESCC according to a DDR pathway gene set and its gene expression, yielding a classification model with 2 classification results, the DDR-active subtype and the DDR-silent subtype; in the model construction process, two independent prognostic biomarkers, BRCA1 and HFM1, are also determined. The classification model can be used to predict the subsequent survival rate of primary ESCC patients, who frequently relapse rapidly and have poor prognosis; it provides a new clue and a new perspective for identifying novel DDR-based molecular subtypes in view of tumor heterogeneity, and reveals the potential clinical significance for treatment and management strategies of primary ESCC patients of the DDR-silent subtype.
2. The method creatively classifies primary ESCC patients based on the classification model, and is used for carrying out clinical prognosis evaluation on the patients based on analysis of sequencing data or prognosis prediction genes of the patients.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for processing esophageal squamous cell carcinoma data provided by a second aspect of an embodiment of the invention;
FIG. 2 is a schematic diagram of an apparatus for processing and analyzing esophageal squamous cell carcinoma data provided by an embodiment of the invention;
FIG. 3 is a schematic flow chart of a processing analysis system for esophageal squamous cell carcinoma data provided by an embodiment of the invention;
FIG. 4 is an ESCC tumor cluster analysis chart based on DDR gene map provided by the embodiment of the invention;
FIG. 5 is a graph showing the results of the BRCA1 and HFM1 mediated DNA damage in ESCC provided in the examples of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the present invention, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present invention with reference to the accompanying drawings.
In some of the flows described in the specification and claims of the present invention and in the foregoing figures, a plurality of operations occurring in a particular order are included, but it should be understood that the operations may be performed out of order or performed in parallel, with the order of operations such as 101, 102, etc., being merely used to distinguish between the various operations, the order of the operations themselves not representing any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first" and "second" herein are used to distinguish different messages, devices, modules, etc., and do not represent a sequence, and are not limited to the "first" and the "second" being different types.
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments according to the invention without any creative effort, are within the protection scope of the invention.
Fig. 1 is a schematic flow chart of a processing method of esophageal squamous cell carcinoma data provided in a second aspect of an embodiment of the invention, specifically, the method includes the following steps:
101: acquiring sequencing data of a sample to be tested;
In one embodiment, the sequencing data of the test sample is RNA-seq data of a primary ESCC patient. Primary is relative to secondary and metastatic: a disease is primary for the tissue or organ in which it first occurs. For example, primary hepatocellular carcinoma arises first in the liver, whereas secondary liver cancer is a cancer from another site that has spread to the liver through the bloodstream or the lymphatic system, so its primary site is another tissue or organ, not the liver. The primary ESCC patient in this example refers to a patient who underwent surgical resection of the primary tumor, followed by radiation therapy, with or without chemotherapy.
In one embodiment, RNA-seq, i.e., transcriptome sequencing, is a sequencing analysis using high-throughput sequencing techniques that reflects the expression level of mRNA, small RNA, non-coding RNA, etc., or some of them. In the last decade, RNA-Seq technology has evolved rapidly and has become an indispensable tool for analyzing differential gene expression and alternative splicing of mRNA at the transcriptome level. With the development of next-generation sequencing technology, the application range of RNA-Seq has become wider: firstly, in the field of RNA biology, RNA-Seq can be applied to single-cell gene expression, protein expression and RNA structure analysis; secondly, the concept of spatial transcriptomics is also growing. Long-read and direct RNA-Seq technologies, together with better data analysis and computational tools, help biologists use RNA-Seq to gain insight into RNA biology, e.g. when and where transcription starts, and how in vivo folding and intermolecular interactions influence RNA function.
A transcriptome is a collection of all transcripts produced by a particular species or cell type. Transcriptome research can research gene functions and gene structures from an overall level, reveals specific biological processes and molecular mechanisms in disease occurrence processes, and has been widely applied to the fields of basic research, clinical diagnosis, drug development and the like.
In one embodiment, the sample to be tested is a primary ESCC patient clinically used to receive a prognostic evaluation.
102: Inputting the sequencing data of the sample to be tested into a constructed classification model to obtain classification results of DDR-active subtype and DDR-silent subtype;
In one embodiment, the method further comprises: predicting the survival rate of the sample to be detected based on the classification result; outputting a result with high survival rate of the sample to be tested based on the classification result of the DDR-active subtype; outputting a result with low survival rate of the sample to be tested based on the classification result of the DDR-silent subtype;
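Steps 101 and 102 and the survival output above can be sketched as a small wrapper. This is a minimal illustration, not the patented implementation: the function name `process_sample` and the `classify` callable are hypothetical, and the source does not specify how the trained model assigns a subtype, so the classifier is passed in as an opaque function.

```python
from typing import Callable, Mapping

# Mapping stated in the text: DDR-active -> high survival, DDR-silent -> low survival.
PROGNOSIS_BY_SUBTYPE = {
    "DDR-active": "high survival rate",
    "DDR-silent": "low survival rate",
}


def process_sample(
    expression: Mapping[str, float],
    classify: Callable[[Mapping[str, float]], str],
) -> dict:
    """Classify one test sample and attach the prognosis implied by its subtype.

    `classify` stands in for the constructed classification model (assumption:
    it returns either "DDR-active" or "DDR-silent").
    """
    subtype = classify(expression)
    if subtype not in PROGNOSIS_BY_SUBTYPE:
        raise ValueError(f"unexpected subtype: {subtype!r}")
    return {"subtype": subtype, "prognosis": PROGNOSIS_BY_SUBTYPE[subtype]}
```

Any trained model can be plugged in through `classify`; the wrapper only enforces that the result is one of the two subtypes named in the text.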
In one embodiment, the method for constructing the classification model includes:
Acquiring sequencing data of training set samples and the survival time corresponding to each sample;
Extracting a DDR pathway gene set and its gene expression from the sequencing data of the training set samples; the training set samples included RNA-seq data for tumor tissue of 82 primary ESCCs and 73 ESCCs with lymph node metastasis; the patients received surgical excision of the primary tumor and lymph node dissection followed by radiotherapy, with or without chemotherapy. The data of the 155 patients are from the ESCC cohort of the Shanxi Province Tumor Hospital (SCH); the RNA-seq data of the SCH cohort are stored in Gene Expression Omnibus (GEO) under accession number GSE53625, and clinical and pathological data of 97 patients were determined through retrospective review of SCH electronic medical records, the follow-up period is finished 2019/06 8 months. RNA-seq data from the Illumina HiSeq platform were collected from UCSC Xena (https://xenabrowser.net/datapages/); the RNA-seq data are TPM levels with log2(x+1) normalization;
screening the DDR pathway gene set to obtain survival-related pathways and the gene expression of the survival-related pathways; the survival-related pathways comprise one or more of the following: MMR pathway, NER pathway, FA pathway, and NHEJ pathway;
and performing cluster analysis on the training set samples based on the survival time to obtain different classification subtypes, and characterizing the pathways and gene expression of each classification subtype to obtain a classification model.
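The log2(x+1) normalization of TPM values mentioned for the RNA-seq data can be illustrated as follows. This is a minimal sketch; the function name `log2_tpm` is hypothetical and the input values are illustrative only.

```python
import math


def log2_tpm(tpm_values):
    """Apply the log2(x + 1) transform to a list of TPM values.

    Adding 1 keeps zero expression at 0 after the transform and avoids log2(0).
    """
    if any(v < 0 for v in tpm_values):
        raise ValueError("TPM values must be non-negative")
    return [math.log2(v + 1.0) for v in tpm_values]


# Illustrative values only (not taken from the cohorts described above).
normalized = log2_tpm([0.0, 1.0, 3.0, 1023.0])
```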
In one embodiment, to characterize the DDR subtypes, Differential Expression (DE) analysis was first performed between the DDR subtypes using the R package limma (v3.50.3) to determine subtype-specific genes. A Differentially Expressed Gene (DEG) was defined by a log fold change (logFC) <= -1 or >= 1 and an adjusted P value < 0.05. Then, pathway enrichment analysis of the DEGs was performed against a curated set of hallmark pathways from MSigDB (gene set database: https://www.jianshu.com/p/99369b2f7a7d) to identify enriched pathways in the DDR subtypes, as implemented in the R package clusterProfiler (version 4.2.2).
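The DEG thresholds above (logFC <= -1 or >= 1, adjusted P < 0.05) can be sketched in a few lines. The source performs this step with limma in R; the Python below only illustrates the filtering rule. The Benjamini-Hochberg adjustment is an assumption (the text says only "adjusted P value"), and the gene labels G1 to G4 and all numbers are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed adjustment method)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(results, lfc_cutoff=1.0, alpha=0.05):
    """Keep genes with |logFC| >= lfc_cutoff and adjusted p < alpha.

    `results` maps gene -> (logFC, raw p-value).
    """
    genes = list(results)
    adj = bh_adjust([results[g][1] for g in genes])
    return {
        g: (results[g][0], q)
        for g, q in zip(genes, adj)
        if abs(results[g][0]) >= lfc_cutoff and q < alpha
    }


# Hypothetical DE results: gene -> (logFC, raw p-value).
example = {"G1": (1.5, 0.001), "G2": (-2.0, 0.01), "G3": (0.3, 0.0005), "G4": (1.2, 0.6)}
degs = select_degs(example)
```

Here G3 is dropped for its small fold change and G4 for its large adjusted p-value, so only G1 and G2 remain.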
In one embodiment, the gene expression profile of the survival-related pathway comprises the gene expression profile of one or more of the following genes: POLD1, POLD2, POLD3, POLD4, MSH2, MSH3, MSH6, MLH1, MLH3, PMS1, PMS2, MSH4, MSH5, EXO1, HMGB1, HMGB1, LIG1, PCNA, RFC2, RFC4, RFC3, RFC5, RFC1, RPA1, RPA2, RPA3, RPA4, POLD1, POLD2, POLD3, POLD4, PCNA, RFC1, RFC2, RFC3, RFC4, RFC5, POLE, POLE2, POLE3, POLE4, POLK, CUL4A, DDB1, DDB2, RBX1, CUL4A, DDB1, DDB2, RBX1, CETN2, RAD23B, XPC, POLR2A, POLR2B, POLR2C, POLR2D, POLR2E, POLR2F, POLR2G, POLR2H, POLR2I, POLR2J, POLR2K, POLR2L, CUL3, CUL5, ERCC1, ERCC4, ERCC5, LIG1, TCEB1, TCEB2, TCEB3, UVSSA, XPA, RPA1, RPA2, RPA3, RPA4, CDK7, ERCC2, ERCC3, GTF2H1, GTF2H2, GTF2H3, GTF2H4, GTF2H5, MNAT1, ERCC6, ERCC8, LIG3, RAD23A, XAB2, XRCC1, GADD45A, GADD45G, BLM, RMI2, TOP3A, TOP3B, BARD1, BRCA1, BRCA2, BRIP1, PALB2, FAAP100, FAAP24, FANCA, FANCB, FANCC, FANCE, FANCF, FANCG, FANCL, FANCM, APITD1, HES1, STRA13, UBE2T, FANCD2, FANCI, BRE, CCDC98, DNA2, FAN1, HELQ, KAT5, RAD51, RAD51C, TELO2, USP1, WDR48, APLF, ATM, MDC1, MRE11A, NBN, PARP3, RAD50, RNF168, RNF8, TP53BP1, DCLRE1C, LIG4, NHEJ1, PRKDC, XRCC5, XRCC6, LIG4, NHEJ1, XRCC4, PRKDC, XRCC5, XRCC6, PNKP, POLL, POLM, MRE11A, RAD50, DNTT, POLB, POLL, POLM, APLF, APTX, DCLRE1C, PARG, XRCC2, XRCC3;
Optionally, the DDR pathway gene set includes: BER pathway (base excision repair, n=43), MMR pathway (mismatch repair, n=27), NER pathway (nucleotide excision repair, n=70), FA pathway (fanconi anemia, n=36), HR pathway (homologous recombination, n=55) and NHEJ pathway (non-homologous end joining, n=37);
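Extracting the expression of DDR pathway genes from a sample-by-gene expression table, as described in the construction steps, might look like the sketch below. The function name is hypothetical, the pathway dictionaries list only a few genes per pathway taken from the gene list above (not the full sets of 27 and 37 genes), and the expression values are illustrative.

```python
def extract_pathway_expression(expression, pathway_genes):
    """Return, per pathway, the expression rows of the genes present in the data.

    expression:    {gene: [value for each sample]}
    pathway_genes: {pathway name: [gene, ...]}
    Genes missing from the expression table are silently skipped.
    """
    subset = {}
    for pathway, genes in pathway_genes.items():
        subset[pathway] = {g: expression[g] for g in genes if g in expression}
    return subset


# Abbreviated, illustrative pathway membership (assumption: a small subset only).
PATHWAYS = {
    "MMR": ["MSH2", "MSH6", "MLH1"],
    "NHEJ": ["LIG4", "XRCC4", "NHEJ1"],
}

# Illustrative log2(TPM + 1) values for three samples.
EXPRESSION = {
    "MSH2": [4.1, 3.9, 5.0],
    "MLH1": [3.2, 3.0, 3.8],
    "LIG4": [2.5, 2.7, 2.1],
    "GAPDH": [9.8, 9.9, 10.1],  # not a DDR gene, should be excluded
}

ddr_expression = extract_pathway_expression(EXPRESSION, PATHWAYS)
```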
In one embodiment, the method of cluster analysis is a consistency clustering algorithm. Consistency clustering, also called consensus clustering, cluster ensembles or aggregation of clusterings, is a method for aggregating the results of multiple clustering runs. Given a number of different (input) clusterings obtained for a particular dataset, the aim is to find a single (consensus) clustering that is, in some sense, a better fit than the existing clusterings. Consensus clustering is thus the problem of reconciling clustering information about the same dataset from different sources or from different runs of the same algorithm. This clustering procedure was performed using the R package ConsensusClusterPlus, with 1000 iterations and 90% resampling. The core algorithm is k-means based on Euclidean distance; a single clustering run cannot provide this consensus.
Optionally, the screening method includes: univariate Cox regression analysis;
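The resampling scheme described above (repeated k-means on 90% subsamples, then measuring how often two samples land in the same cluster) can be sketched in plain Python. This is not the ConsensusClusterPlus implementation: the farthest-point initialization is a simplification chosen here to keep the toy example stable, the function names are hypothetical, and the ten 3-gene profiles are made-up values.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_iter=20):
    """Lloyd's k-means; first centroid random, the rest by farthest-point (assumption)."""
    centroids = [points[rng.randrange(len(points))]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k=2, reps=1000, frac=0.9, seed=0):
    """Fraction of subsamples in which each pair of samples shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, round(frac * n))
    for _ in range(reps):
        idx = sorted(rng.sample(range(n), size))
        labels = kmeans([points[i] for i in idx], k, rng)
        for (a, la), (b, lb) in combinations(zip(idx, labels), 2):
            sampled[a][b] += 1
            sampled[b][a] += 1
            if la == lb:
                together[a][b] += 1
                together[b][a] += 1
    return [
        [1.0 if i == j else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Made-up expression profiles: samples 0-4 low, samples 5-9 high.
SAMPLES = [
    [1.0, 1.2, 0.9], [1.1, 0.8, 1.0], [0.9, 1.0, 1.1], [1.2, 1.1, 0.8], [1.0, 0.9, 1.2],
    [5.0, 5.2, 4.9], [5.1, 4.8, 5.0], [4.9, 5.0, 5.1], [5.2, 5.1, 4.8], [5.0, 4.9, 5.2],
]
consensus = consensus_matrix(SAMPLES, k=2, reps=200)
```

Consensus values near 1 mean two samples are stably grouped together across subsamples; values near 0 mean they are stably separated. The final subtypes are typically obtained by clustering this consensus matrix.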
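A univariate Cox regression of the kind named above fits a single coefficient beta in the model h(t | x) = h0(t) · exp(beta · x) by maximizing the partial likelihood. The sketch below uses Newton-Raphson with Breslow handling of ties; it is an illustration under those assumptions, not the software used in the source, and the survival times and expression values are made up.

```python
import math


def cox_univariate(times, events, x, n_iter=25, tol=1e-9):
    """Fit a one-covariate Cox model; returns (beta, hazard ratio, SE, Wald p-value)."""
    beta = 0.0
    info = 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for i, (t_i, e_i) in enumerate(zip(times, events)):
            if not e_i:
                continue  # censored samples contribute only through risk sets
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0               # first derivative of log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2      # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)  # information from the last Newton iteration
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided Wald test
    return beta, math.exp(beta), se, p_value


# Made-up data: higher expression tends to go with earlier death; last sample censored.
TIMES = [2, 4, 6, 8, 10, 12]
EVENTS = [1, 1, 1, 1, 1, 0]
EXPR = [3.0, 1.0, 2.5, 2.0, 0.5, 1.5]
beta, hazard_ratio, se, p_value = cox_univariate(TIMES, EVENTS, EXPR)
```

A hazard ratio above 1 marks a covariate as a risk factor and a value below 1 as a favorable factor, which is how pathways or genes would be screened for association with survival.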
Optionally, the sequencing data of the training set sample includes: RNA-seq data of primary ESCC tumor tissue samples and metastatic ESCC tumor tissue samples. By analyzing the RNA-seq data of primary and metastatic ESCC tumor tissue samples, DDR pathway analysis determined that the DDR active subtype and DDR silent subtype have independent prognostic value in primary ESCC, but not in metastatic ESCC.
In one embodiment, the building method further comprises: based on the gene expression condition of the survival rate related path, obtaining (8) DDR gene sets and corresponding gene expression conditions related to survival results by utilizing a univariate regression analysis method; the DDR gene set related to the survival result and the corresponding gene expression situation are processed by utilizing a multivariate analysis method, so that the gene expression situations of a prognosis prediction gene and a prognosis prediction gene are obtained; sex, grade, smoking history and drinking history are controlled in the multivariate analysis;
And performing cluster analysis on the training set samples based on the survival time to obtain different classification subtypes, and characterizing the prognosis prediction genes and gene expression of each classification subtype to obtain a classification model.
The different classification subtypes of the classification model include: DDR-active subtype and DDR-silent subtype; the DDR-active subtype corresponds to the pathway gene expression associated with a high survival rate, and the DDR-silent subtype corresponds to the pathway gene expression associated with a low survival rate; there is no specific threshold for the survival rate; the conclusion is drawn by statistical comparison between the DDR-active subtype and the DDR-silent subtype.
In one example, the correlation between DDR subtype and survival of ESCC with and without LNM (lymph node metastasis) was studied using stratified analysis, and the results showed that the survival rate of primary ESCC tumors of the DDR-silent subtype was the worst compared to that of the metastatic ESCC tumors of the DDR-silent subtype (log-rank p = 0.032), but no significant difference was observed in the survival rate of metastatic ESCC tumors between DDR subtypes (log-rank p = 0.34). DDR pathway analysis established that the DDR-active subtype and the DDR-silent subtype have independent prognostic value in primary ESCC, but not in metastatic ESCC.
In one embodiment, to further verify the association between DDR subtype and survival outcomes, DDR subtypes of 74 tumors in the TCGA-ESCC cohort and 117 tumors in the Chen cohort were also summarized. Consistent with the findings in this cohort, DDR subtype aided survival prediction only for primary ESCC tumors, allowing identification of subsets of patients with good or poor outcome (for the TCGA-ESCC cohort, HR = 0.075, 95% CI 0.008-0.674, log-rank p = 0.004; for the Chen cohort, HR = 0.430, 95% CI 0.186-0.995, log-rank p = 0.042), and failed to stratify survival of ESCC tumors with LNM. Multivariate Cox regression analysis showed that the DDR subtype was a powerful predictor of survival outcome independent of clinical variables, which underscores the value and robustness of the DDR subtype in predicting the survival outcome of primary ESCC patients. Stratified analysis separates the population into different strata (subgroups) according to a certain characteristic, such as gender or age, and analyzes the association between exposure and disease within each stratum separately. The objective of stratified analysis is to control confounding factors and to estimate the magnitude of their impact on the relationship between exposure and outcome. Stratified analysis addresses cases where an overall average is misleading. The TCGA-ESCC cohort contains RNA-seq data of 74 patients, collected from UCSC Xena (https://xenabrowser.net/datapages/); the Chen cohort consists of microarray data of 117 ESCC patients from the Chinese Academy of Medical Sciences and Peking Union Medical College, and clinical data were obtained from Gene Expression Omnibus (GEO, www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53624).
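The log-rank p-values quoted above compare survival curves between two groups within a stratum. A minimal two-group log-rank test can be sketched as follows; the function name and the toy survival times are hypothetical, and the p-value uses the chi-square distribution with 1 degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                    # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)       # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)              # events at t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data: group A dies early, group B late; all events observed (no censoring).
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
```

In a stratified analysis, the test would be run separately within each stratum, for example once for primary tumors and once for tumors with LNM.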
Optionally, the prognostic prediction gene includes: BRCA1 gene and HFM1 gene; the DDR-active subtype corresponds to the high BRCA1 gene expression quantity, and the DDR-silent subtype corresponds to the high HFM1 gene expression quantity; there is no specific threshold for the amount of gene expression, and it is concluded by statistical comparative analysis between the DDR-active subtype and the DDR-silent subtype.
In one embodiment, meta-analysis across 3 independent cohorts was used to evaluate the prognostic value of DDR genes in primary and metastatic ESCC tumor tissues respectively; BRCA1 and HFM1 are predictors of survival outcome in primary ESCC patients, but do not contribute to the prognosis of metastatic ESCC; BRCA1 was identified as a favorable prognostic factor, with high expression associated with improved survival (pooled HR of 0.22), while HFM1 is a risk factor, with increased expression associated with poor survival (pooled HR of 4.41).
In one embodiment, in the presence of BRCA1, cells sense and repair DNA damage, maintain genomic integrity and prevent tumorigenesis. BRCA1 deficiency can disrupt normal DDR and lead to accumulation of DNA damage. However, the role of HFM1 (an ATP-dependent DNA helicase homolog) in DDR has not been studied. To determine the role of BRCA1 and HFM1 in DDR of ESCC cells, in vitro cell models of DNA damage induced by cisplatin (DDP) and X-IR were constructed: expression of BRCA1 or HFM1 was silenced using transient siRNA transfection, and cells were treated with DDP or X-IR. The knockdown efficiency of BRCA1 and HFM1 was examined by Western blotting. To directly assess DDR, γH2AX (an established marker of DNA DSBs) was visualized by immunofluorescence. Spontaneous and DDP- or IR-induced γH2AX foci were counted and analyzed. After DDP or X-IR treatment, γH2AX accumulates. Furthermore, immunofluorescence analysis showed a significant increase in endogenous γH2AX accumulation in KYSE410 and KYSE450 cells following BRCA1 knockdown under IR and DDP treatment. In contrast, HFM1 knockdown significantly reduced the number of γH2AX foci in KYSE30 and KYSE450 cells treated with X-IR or DDP. These results indicate that the loss of BRCA1 results in DDR defects, which supports the role of BRCA1 as a favorable prognostic factor, whereas the loss of HFM1 promotes DDR, which supports the role of HFM1 as a prognostic risk factor.
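A pooled HR across cohorts is commonly obtained by inverse-variance weighting of the per-cohort log hazard ratios. The source does not state whether a fixed-effect or random-effects model was used, so the fixed-effect version below is an assumption; the function name and the input triples are hypothetical and do not reproduce the reported values.

```python
import math


def pool_hazard_ratios(studies, z=1.96):
    """Fixed-effect inverse-variance pooling of hazard ratios.

    studies: list of (HR, lower 95% CI bound, upper 95% CI bound).
    Returns (pooled HR, pooled lower bound, pooled upper bound).
    """
    weighted_sum = weight_total = 0.0
    for hr, lower, upper in studies:
        # Standard error of log(HR) recovered from the width of the 95% CI.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(hr)
        weight_total += weight
    pooled_log_hr = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (
        math.exp(pooled_log_hr),
        math.exp(pooled_log_hr - z * pooled_se),
        math.exp(pooled_log_hr + z * pooled_se),
    )


# Hypothetical per-cohort results for one gene: (HR, CI lower, CI upper).
pooled_hr, ci_low, ci_high = pool_hazard_ratios([(0.5, 0.25, 1.0), (0.5, 0.25, 1.0)])
```

Pooling two identical cohorts leaves the HR unchanged and narrows the confidence interval, which is the expected behavior of inverse-variance weighting.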
The third aspect of the application discloses a method for processing esophageal squamous cell carcinoma data, comprising the following steps:
Obtaining gene expression data of a sample to be tested; the gene expression data of the sample to be tested comprises the gene expression data of one or more of the following genes: BRCA1 gene, HFM1 gene;
inputting the gene expression data of the sample to be tested into the classification model disclosed in the first aspect of the application to obtain a classification result;
Optionally, the gene expression data of the sample to be tested is data of a primary ESCC patient.
Fig. 2 shows a device for processing and analyzing esophageal squamous cell carcinoma data provided by an embodiment of the invention, the device comprising: a memory and a processor; the memory is used for storing program instructions; the processor is used for invoking the program instructions which, when executed, carry out the above method for processing esophageal squamous cell carcinoma data.
Fig. 3 shows a system for processing and analyzing esophageal squamous cell carcinoma data provided by an embodiment of the invention, comprising:
an acquisition unit 301, configured to acquire sequencing data of a sample to be tested;
an output unit 302, configured to input the sequencing data of the sample to be tested into the classification model disclosed in the first aspect of the present application, so as to obtain a classification result of the DDR-active subtype or the DDR-silent subtype.
The system for processing and analyzing esophageal squamous cell carcinoma data provided by an embodiment of the invention comprises:
an acquisition unit, configured to acquire gene expression data of a sample to be tested, the gene expression data comprising expression data of one or more of the following genes: BRCA1 gene, HFM1 gene;
an output unit, configured to input the gene expression data of the sample to be tested into the classification model disclosed in the first aspect of the application to obtain a classification result.
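The division into an acquisition unit and an output unit described above can be sketched as two small classes wired together. This is only one possible arrangement: the class names `AcquisitionUnit` and `OutputUnit`, the dictionary-based record format, and the stand-in rule in `toy_model` are all assumptions for illustration, and the real classification model would be supplied in place of `toy_model`.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Expression = Dict[str, float]


@dataclass
class AcquisitionUnit:
    """Extracts the expression values of the required genes from a raw record."""
    genes: tuple = ("BRCA1", "HFM1")

    def acquire(self, record: Expression) -> Expression:
        found = {g: record[g] for g in self.genes if g in record}
        if not found:
            raise ValueError("record contains none of the required genes")
        return found


@dataclass
class OutputUnit:
    """Passes acquired data to a classification model and returns its label."""
    model: Callable[[Expression], str]

    def classify(self, expression: Expression) -> str:
        return self.model(expression)


def toy_model(expression: Expression) -> str:
    # Stand-in decision rule, assumed for illustration only.
    brca1 = expression.get("BRCA1", 0.0)
    hfm1 = expression.get("HFM1", 0.0)
    return "DDR-active" if brca1 > hfm1 else "DDR-silent"


acquisition = AcquisitionUnit()
output = OutputUnit(model=toy_model)
record = {"BRCA1": 6.1, "HFM1": 3.4, "TP53": 9.9}
result = output.classify(acquisition.acquire(record))
print(result)
```

Keeping the model as an injected callable reflects the statement that the units are a logical division only and may be combined or distributed differently in an actual implementation.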
A computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method of processing esophageal squamous cell carcinoma data as described above.
FIG. 4 shows the cluster analysis of ESCC tumors based on DDR gene profiles provided by an embodiment of the invention, wherein:
(A) Heat map of fold change in DDR gene expression between DDR subtypes. Red bars represent the DDR-active subtype and green bars represent the DDR-silent subtype. DDR subtypes were classified by consensus clustering. (B-D) Kaplan-Meier curves comparing the OS (log-rank test) of the DDR-active, DDR-silent, and transition subtype groups. HR and 95% CI were calculated by two-sided Wald test using univariate Cox regression.
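For readers unfamiliar with the statistics named in the FIG. 4 description, the following minimal sketch shows how a Kaplan-Meier survival curve is computed and how a Cox regression coefficient and its standard error are converted into a hazard ratio, a 95% CI, and a two-sided Wald p-value. The function names, the toy follow-up data, and the example coefficient are assumptions for illustration; fitting the Cox model itself is not shown and would normally be done with a statistics package.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if death was observed, 0 if the patient was censored
    """
    curve = []
    survival = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


def hazard_ratio_wald(beta, se, z_crit=1.959964):
    """Convert a Cox coefficient and its standard error into
    (HR, (lower, upper) 95% CI, two-sided Wald p-value)."""
    hr = math.exp(beta)
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return hr, (lower, upper), p


# Toy data (hypothetical): five patients, two of them censored.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)

# Hypothetical Cox coefficient and standard error.
hr, ci, p = hazard_ratio_wald(beta=math.log(2.0), se=0.25)
print(hr, ci, p)
```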
FIG. 5 shows the results of the BRCA1- and HFM1-mediated DNA damage response in ESCC provided by an embodiment of the invention, wherein: (A, B) KYSE410 and KYSE450 cells were transfected with BRCA1 siRNA, treated with 2 μg/ml DDP, and analyzed by Western blotting for γH2AX. (C, D) KYSE410 and KYSE450 cells were transfected with BRCA1 siRNA, exposed to IR (4 Gy), harvested at the indicated times, and analyzed by Western blotting for γH2AX. (E, F) Representative images and quantification of γH2AX foci in control and BRCA1-knockdown KYSE410 and KYSE450 cells treated with 2 μg/ml DDP for the indicated times. Data represent three independent experiments. Each dot represents one cell; 50 cells per group were counted with ImageJ. Error bars represent ± SD. P values were determined by unpaired two-sided t-test. (G, H) Representative images and quantification of γH2AX foci in control and BRCA1-knockdown KYSE410 and KYSE450 cells treated with IR (4 Gy) for the indicated times. Data represent three independent experiments. Each dot represents one cell; 50 cells per group were counted with ImageJ. Error bars represent ± SD. P values were determined by unpaired two-sided t-test. (I, J) KYSE30 and KYSE450 cells were transfected with HFM1 siRNA, treated with 2 μg/ml DDP, and analyzed by Western blotting for γH2AX. (K, L) KYSE30 and KYSE450 cells were transfected with HFM1 siRNA, exposed to IR (4 Gy), harvested at the indicated times, and analyzed by Western blotting for γH2AX. (M, N) Representative images and quantification of γH2AX foci in control and HFM1-knockdown KYSE30 and KYSE450 cells treated with 2 μg/ml DDP for the indicated times. Data represent three independent experiments. Each dot represents one cell; 50 cells per group were counted with ImageJ. Error bars represent ± SD. P values were determined by unpaired two-sided t-test. (O, P) Representative images and quantification of γH2AX foci in control and HFM1-knockdown KYSE30 and KYSE450 cells treated with IR (4 Gy) for the indicated times. Data represent three independent experiments. Each dot represents one cell; 50 cells per group were counted with ImageJ. Error bars represent ± SD. P values were determined by unpaired two-sided t-test.
The results of the verification of the present verification embodiment show that assigning an inherent weight to an indication may moderately improve the performance of the present method relative to the default settings.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and the storage medium may include: read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.
The computer device provided by the present invention has been described in detail above. Those skilled in the art will appreciate that the foregoing description is not intended to limit the invention; the scope of the invention is defined by the appended claims.

Claims (14)

1. A method for constructing an esophageal squamous cell carcinoma classification model, comprising the following steps:
acquiring sequencing data of training set samples and the survival time corresponding to each sample;
extracting a DDR pathway gene set and its gene expression from the sequencing data of the training set samples; selecting from the DDR pathway gene set to obtain pathways related to survival rate and the gene expression of the pathways related to survival rate; the pathways related to survival rate comprise one or more of the following: MMR pathway, NER pathway, FA pathway, and NHEJ pathway; based on the gene expression of the pathways related to survival rate, obtaining a DDR gene set related to survival outcome and the corresponding gene expression by a univariate regression analysis method; processing the DDR gene set related to survival outcome by a multivariate analysis method to obtain prognosis prediction genes and the gene expression of the prognosis prediction genes; the prognosis prediction genes comprise: BRCA1 gene and HFM1 gene;
performing cluster analysis on the training set samples based on the survival time to obtain different classification subtypes, and characterizing the prognosis prediction genes and the gene expression of each classification subtype to obtain the classification model.
2. The method for constructing an esophageal squamous cell carcinoma classification model according to claim 1, wherein the DDR pathway gene set comprises one or more of the following: BER, MMR, NER, FA, HR, and NHEJ pathways.
3. The method for constructing an esophageal squamous cell carcinoma classification model according to claim 1, wherein the cluster analysis is performed by a consensus clustering algorithm.
4. The method for constructing an esophageal squamous cell carcinoma classification model according to claim 1, wherein the selection from the DDR pathway gene set is performed by univariate Cox regression analysis.
5. The method for constructing an esophageal squamous cell carcinoma classification model according to claim 1, wherein the sequencing data of the training set samples comprises: RNA-seq data of primary ESCC tumor tissue samples and metastatic ESCC tumor tissue samples.
6. The method for constructing an esophageal squamous cell carcinoma classification model according to any one of claims 1-5, wherein the different classification subtypes of the classification model include: a DDR-active subtype and a DDR-silent subtype; the DDR-active subtype corresponds to the gene expression of a pathway with a high survival rate, and the DDR-silent subtype corresponds to the gene expression of a pathway with a low survival rate.
7. The method for constructing an esophageal squamous cell carcinoma classification model according to claim 6, wherein the DDR-active subtype corresponds to a high BRCA1 gene expression level and the DDR-silent subtype corresponds to a high HFM1 gene expression level.
8. A method for processing esophageal squamous cell carcinoma data, comprising:
acquiring sequencing data of a sample to be tested;
inputting the sequencing data of the sample to be tested into the classification model of any one of claims 1-7 to obtain a classification result of the DDR-active subtype or the DDR-silent subtype.
9. The method for processing esophageal squamous cell carcinoma data of claim 8, further comprising: predicting the survival rate of the sample to be tested based on the classification result; outputting a result of high survival rate for the sample to be tested when the classification result is the DDR-active subtype; and outputting a result of low survival rate for the sample to be tested when the classification result is the DDR-silent subtype.
10. A method for processing esophageal squamous cell carcinoma data, comprising:
obtaining gene expression data of a sample to be tested; the gene expression data of the sample to be tested comprises the gene expression data of one or more of the following genes: BRCA1 gene, HFM1 gene;
inputting the gene expression data of the sample to be tested into the classification model of any one of claims 1-7 to obtain a classification result.
11. The method for processing esophageal squamous cell carcinoma data of claim 10, wherein the gene expression data of the sample to be tested is data of a primary ESCC patient.
12. A system for processing esophageal squamous cell carcinoma data, comprising:
an acquisition unit, configured to acquire sequencing data of a sample to be tested;
an output unit, configured to input the sequencing data of the sample to be tested into the classification model of any one of claims 1-7 to obtain a classification result of the DDR-active subtype or the DDR-silent subtype.
13. A device for processing esophageal squamous cell carcinoma data, the device comprising: a memory and a processor;
the memory is used for storing program instructions; the processor is used for invoking the program instructions which, when executed, carry out the method for processing esophageal squamous cell carcinoma data of any one of claims 8-11.
14. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for processing esophageal squamous cell carcinoma data of any one of claims 8-11.
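By way of illustration only, and not as part of the claims, the consensus clustering named in claim 3 can be sketched as follows: samples are repeatedly subsampled, each subsample is clustered, and the fraction of runs in which two samples fall in the same cluster is recorded. The base clusterer (a small two-cluster k-means), the number of runs, the subsampling fraction, and the toy (BRCA1, HFM1) expression points are all assumptions; the text here does not state which settings were actually used.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def two_means(points, rng, iterations=20):
    """Tiny k-means with k=2 on a list of equal-length tuples; returns labels."""
    centers = rng.sample(points, 2)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min((0, 1), key=lambda c: squared_distance(pt, centers[c]))
            for pt in points
        ]
        for c in (0, 1):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, runs=50, fraction=0.8, seed=0):
    """Fraction of subsampled runs in which each pair of samples co-clusters."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(2, int(fraction * n))
    for _ in range(runs):
        idx = rng.sample(range(n), size)
        labels = two_means([points[i] for i in idx], rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical (BRCA1, HFM1) expression values for six training samples.
points = [(8.0, 2.0), (7.5, 2.4), (8.3, 1.8), (3.0, 6.0), (2.6, 6.5), (3.2, 5.7)]
matrix = consensus_matrix(points)
for row in matrix:
    print([round(v, 2) for v in row])
```

Pairs with a consensus value near 1 are stably grouped together; a final subtype assignment would typically be obtained by clustering this matrix, which is omitted here.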
CN202310063027.4A 2023-01-19 2023-01-19 Esophageal squamous cell carcinoma classification model construction and data processing method Active CN115982644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310063027.4A CN115982644B (en) 2023-01-19 2023-01-19 Esophageal squamous cell carcinoma classification model construction and data processing method

Publications (2)

Publication Number Publication Date
CN115982644A CN115982644A (en) 2023-04-18
CN115982644B CN115982644B (en) 2024-04-30

Family

ID=85960554

Country Status (1)

Country Link
CN (1) CN115982644B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant