CN114898874A - Prognosis prediction method and system for renal clear cell carcinoma patient - Google Patents

Prognosis prediction method and system for renal clear cell carcinoma patient

Info

Publication number
CN114898874A
Authority
CN
China
Prior art keywords
prognosis
data
gene
risk group
training set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210404313.8A
Other languages
Chinese (zh)
Inventor
杨泽锐
刘星云
谭凯月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Biological and Medical Engineering of Guangdong Academy of Sciences
Original Assignee
Institute of Biological and Medical Engineering of Guangdong Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Biological and Medical Engineering of Guangdong Academy of Sciences
Priority to CN202210404313.8A
Publication of CN114898874A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Abstract

The invention discloses a method and a system for predicting the prognosis of patients with renal clear cell carcinoma (KIRC). The technical scheme selects two copper death (cuproptosis) related genes (CRGs), CDKN2A and DLAT, as biomarkers, establishes a prognosis model based on these 2 CRGs, and divides KIRC patients into a high-risk group and a low-risk group. The risk score of KIRC patients in the training set was significantly correlated with overall survival (P < 0.05). ROC curve analysis showed AUCs of 0.73, 0.66 and 0.65 at 1-year, 3-year and 5-year follow-up, respectively. The predictive performance was validated in the test set. The model can effectively predict the prognosis of KIRC patients, which may help in selecting clinical treatment protocols and thereby enable early intervention and early treatment. The results provide new insight into the role of CRGs in the occurrence and development of KIRC, suggest a valuable CRG-targeted treatment approach for KIRC, and can be widely applied in the technical field of biomedicine.

Description

Prognosis prediction method and system for renal clear cell carcinoma patient
Technical Field
The invention relates to the technical field of biomedicine, in particular to a method and a system for predicting prognosis of a patient with renal clear cell carcinoma.
Background
Renal cell carcinoma (RCC) accounts for 2% to 3% of all adult cancers. Renal clear cell carcinoma (KIRC) is the most common histological subtype, accounting for 80% to 90% of RCC cases. The incidence of renal cell carcinoma has risen steadily over the past decades; in addition, renal cell carcinoma has the highest mortality rate among all urological malignancies, causing about 100,000 deaths worldwide each year. Despite advances in molecular targeted therapies, such as agents targeting vascular endothelial growth factor (VEGF) and inhibitors of the mammalian target of rapamycin (mTOR), improving overall survival (OS) and progression-free survival in patients remains a significant clinical challenge. To further improve the treatment of renal clear cell carcinoma and to develop precise treatment strategies, oncologists need to predict the prognosis of renal clear cell carcinoma patients.
KIRC is one of the most common malignancies; it threatens public health and poses a serious global health burden. Because clinical symptoms are not obvious in the early stage, it is usually diagnosed at a late stage. The TNM (tumor, node, metastasis) staging system, a classical method of predicting KIRC prognosis from clinical data, is currently the index most commonly used in clinical practice for judging the prognosis of patients with renal clear cell carcinoma. The TNM staging standard is promulgated and implemented by the Union for International Cancer Control (UICC) and is at present the most widely applied tumor staging system in the diagnosis and treatment of renal clear cell carcinoma.
However, the TNM staging system is based on only three criteria, namely the status of the primary tumor (T), regional lymph node status (N) and distant metastasis status (M), and is divided into four stages (stages I, II, III and IV). Therefore, there is a limit to the predictive ability of the TNM staging system, which does not allow accurate prognosis prediction of renal clear cell carcinoma patients.
Disclosure of Invention
In view of this, the embodiment of the invention provides a renal clear cell carcinoma patient prognosis prediction method and system, which can accurately predict the clinical prognosis result of renal clear cell carcinoma, so as to realize targeted guidance of individual treatment, and have high clinical application value.
In a first aspect, embodiments of the present invention provide a method for predicting prognosis of a patient with renal clear cell carcinoma, comprising:
acquiring a data set and copper death related gene data according to prior knowledge;
carrying out standardization processing on gene expression data according to the data set to obtain a gene expression matrix; the gene expression matrix comprises a tumor tissue gene expression matrix and a normal tissue gene expression matrix;
performing differential expression analysis according to the copper death-related gene data and the gene expression matrix to obtain differential expression profile data;
randomly distributing the data set to obtain a training set and a test set, and acquiring target data from the training set according to preset conditions;
performing a prognostic analysis to determine a prognostic target gene by combining the target data and the differential expression profile data of the training set;
and constructing a prognosis model according to the prognosis target gene, and completing prognosis prediction of the patient with renal clear cell carcinoma through the prognosis model.
Optionally, the method further comprises:
and performing risk scoring on the sample data of the data set, and further performing verification and evaluation processing on the prognosis model.
Optionally, the performing a risk scoring on sample data of the dataset to complete an evaluation process of the prognosis model includes:
calculating risk scores of all samples of the training set and the verification set according to the risk score model;
dividing the samples of the training set into a high risk group and a low risk group according to the median of the risk scores of the training set;
dividing the samples of the verification set into a high risk group and a low risk group according to the median of the risk scores of the training set;
and performing verification evaluation treatment on the prognosis model according to the high-risk group and the low-risk group of the training set and the high-risk group and the low-risk group of the verification set.
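As a minimal sketch of the grouping step, assuming risk scores have already been computed, the snippet below derives the cut-off from the training set only and applies it to both sets. It is written in Python for illustration rather than in the R software named in this document; the function name, the example scores, and the choice to place a score exactly equal to the median in the low-risk group are assumptions.

```python
from statistics import median

def assign_risk_groups(train_scores, validation_scores):
    """Label samples "high" or "low" using the training-set median as cut-off."""
    cutoff = median(train_scores)

    def label(score):
        # Assumption: a score exactly equal to the cut-off counts as low risk.
        return "high" if score > cutoff else "low"

    return (cutoff,
            [label(s) for s in train_scores],
            [label(s) for s in validation_scores])

# Hypothetical risk scores, for illustration only.
cutoff, train_groups, validation_groups = assign_risk_groups(
    [0.2, 1.5, 0.9, 2.4], [1.0, 1.3, 3.0])
```

Reusing the training-set cut-off for the verification set keeps the verification samples from influencing the threshold they are judged against.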
Optionally, the performing a verification evaluation process of the prognosis model according to the high-risk group and the low-risk group of the training set and the high-risk group and the low-risk group of the validation set includes:
performing survival analysis and drawing survival curves for the high-risk group and the low-risk group of the training set and the high-risk group and the low-risk group of the verification set through the survival package and the survminer package of the R software;
comparing the differences between the high-risk group and the low-risk group by the log-rank test.
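To make the between-group comparison concrete, the following is a hand-written two-group log-rank test in Python. It is only a sketch under stated assumptions: the document performs this step with R packages, and the function name and follow-up data here are hypothetical. At each distinct event time it compares the observed deaths in the first group with the number expected if both groups shared one hazard, then refers the statistic to a chi-square distribution with 1 degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    samples = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
               [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in samples if e})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        # Numbers still at risk in each group just before time t.
        n_a = sum(1 for ti, _, g in samples if ti >= t and g == 0)
        n_b = sum(1 for ti, _, g in samples if ti >= t and g == 1)
        # Deaths at time t, in group A and overall.
        d_a = sum(1 for ti, e, g in samples if ti == t and e and g == 0)
        d = sum(1 for ti, e, _ in samples if ti == t and e)
        n = n_a + n_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 df
    return chi2, p_value

# Hypothetical follow-up times (months) and event flags (1 = death, 0 = censored).
chi2, p_value = logrank_test([3, 5, 7, 9, 11], [1, 1, 1, 1, 1],
                             [20, 25, 30, 35, 40], [1, 0, 1, 0, 0])
```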
Optionally, the performing a verification evaluation process of the prognosis model according to the high-risk group and the low-risk group of the training set and the high-risk group and the low-risk group of the validation set further includes:
testing the prognostic performance of the biomarkers on the sample data of the data set with time-dependent ROC curves at 1 year, 3 years and 5 years, using the survival package and the timeROC package of the R software.
Optionally, the performing differential expression analysis according to the copper death-related gene data and the gene expression matrix to obtain differential expression profile data includes:
setting the cut-off for screening differential genes to a P value of less than 0.05 and an absolute fold change of more than 1.5;
performing differential expression analysis on the copper death-related gene data and the gene expression matrix according to the differential gene screening cut-off value;
and obtaining differential expression profile data according to the differential analysis.
Optionally, said determining a prognostic target gene by performing a prognostic analysis in combination with said target data and said differential expression profile data of said training set, comprising:
merging the target data and the differential expression data of the training set, and determining a survival-related gene expression profile of the renal clear cell carcinoma patients through univariate Cox regression analysis;
performing LASSO Cox regression analysis on the survival-related gene expression profile of the renal clear cell carcinoma patients in combination with survival time and survival state;
performing multivariate Cox regression analysis on the result of the LASSO Cox regression analysis to determine a prognostic gene signature;
wherein the prognostic gene signature includes the prognostic target genes, namely CDKN2A and DLAT, and their regression coefficients.
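The first of these regressions can be illustrated with a small, self-contained Python sketch of univariate Cox regression for a single gene. It is not the R routine used in this document: the coefficient is found by bisection on the score function (the derivative of the partial log-likelihood, which decreases monotonically in the coefficient), ties are handled in Breslow's manner, and the P value is a Wald test. The function names, the search interval and the data are all assumptions made for illustration.

```python
import math

def cox_score_and_information(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood
    for one covariate (Breslow handling of tied event times)."""
    score, info = 0.0, 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored samples contribute only through risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean ** 2
    return score, info

def fit_univariate_cox(times, events, x, lo=-10.0, hi=10.0, iters=100):
    """Return (beta, hazard ratio, Wald p-value).
    Assumes the maximising beta lies inside [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if cox_score_and_information(mid, times, events, x)[0] > 0:
            lo = mid  # score still positive: the maximum lies to the right
        else:
            hi = mid
    beta = (lo + hi) / 2.0
    _, info = cox_score_and_information(beta, times, events, x)
    z = beta * math.sqrt(info)  # beta / standard error, with SE = info ** -0.5
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, math.exp(beta), p_value

# Hypothetical survival times (months), event flags and expression values.
times = [5, 8, 12, 15, 20, 24, 30, 36]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expr = [2.1, 1.8, 2.4, 0.9, 1.2, 1.5, 0.4, 0.8]
beta, hazard_ratio, p_value = fit_univariate_cox(times, events, expr)
```

A gene would pass the univariate screen when its P value falls below the chosen threshold; a hazard ratio above 1 marks higher expression as a risk factor, below 1 as protective.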
Optionally, the constructing a prognosis model according to the prognosis target gene comprises:
constructing a prognosis model according to CDKN2A and DLAT, wherein the prognosis model is as follows:
risk score = β1 × (expression level of CDKN2A gene) - β2 × (expression level of DLAT gene)
wherein β1 denotes the regression coefficient of CDKN2A, and β2 denotes the regression coefficient of DLAT.
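A minimal Python sketch of evaluating this score follows. It reads the formula as β1 times the CDKN2A expression level minus β2 times the DLAT expression level, which is an interpretation of the text here; the coefficient values are placeholders, since Table 1 is not reproduced in this passage, and the function name is hypothetical.

```python
def risk_score(cdkn2a_expr, dlat_expr, beta1, beta2):
    """Linear risk score: beta1 * CDKN2A expression - beta2 * DLAT expression."""
    return beta1 * cdkn2a_expr - beta2 * dlat_expr

# Placeholder coefficients and expression levels, for illustration only.
BETA1, BETA2 = 0.5, 0.25
score_patient_a = risk_score(6.0, 4.0, BETA1, BETA2)
score_patient_b = risk_score(4.0, 8.0, BETA1, BETA2)
```

Under this reading, higher CDKN2A expression raises the score and higher DLAT expression lowers it, provided both coefficients are positive.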
Optionally, the method further comprises:
and carrying out independence analysis verification on the prognosis model.
In a second aspect, embodiments of the present invention provide a renal clear cell carcinoma patient prognosis prediction system, comprising:
the first module is used for acquiring a data set and copper death related gene data according to prior knowledge;
the second module is used for carrying out standardization processing on gene expression data according to the data set to obtain a gene expression matrix; the gene expression matrix comprises a tumor tissue gene expression matrix and a normal tissue gene expression matrix;
a third module for performing differential expression analysis according to the copper death-related gene data and the gene expression matrix to obtain differential expression profile data;
the fourth module is used for randomly distributing the data set to obtain a training set and a test set and acquiring target data from the training set according to preset conditions;
a fifth module for performing a prognostic analysis to determine a prognostic target gene by combining the target data and the differential expression profile data of the training set;
and the sixth module is used for constructing a prognosis model according to the prognosis target gene and completing prognosis prediction of the renal clear cell carcinoma patient through the prognosis model.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method according to the first aspect of the embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a program, the program being executed by a processor to implement the method according to the first aspect of the embodiment of the present invention.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
According to the embodiment of the invention, a data set and copper death-related gene data are first obtained according to prior knowledge; gene expression data are standardized according to the data set to obtain a gene expression matrix, which comprises a tumor tissue gene expression matrix and a normal tissue gene expression matrix; differential expression analysis is performed according to the copper death-related gene data and the gene expression matrix to obtain differential expression profile data; the data set is randomly allocated to obtain a training set and a test set, and target data are acquired from the training set according to preset conditions; prognostic analysis is performed by combining the target data and the differential expression profile data of the training set to determine the prognostic target genes; finally, a prognosis model is constructed according to the prognostic target genes, and prognosis prediction for the renal clear cell carcinoma patient is completed through the prognosis model. By identifying gene signatures and constructing a prognosis model, the invention stratifies patients by risk and prognosis, so that the clinical prognosis of renal clear cell carcinoma can be accurately predicted and individualized treatment can be guided in a targeted manner, which has high clinical application value.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a method for predicting prognosis of a patient with renal clear cell carcinoma according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a histogram of differential expression of CRGs according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a Kaplan-Meier survival analysis curve of KIRC patients in the high risk group and the low risk group in the training set according to the embodiment of the present invention;
FIG. 4 is a schematic diagram of model time-dependent ROC curves for 1 year, 3 years and 5 years in a training set according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a Kaplan-Meier survival analysis curve for KIRC patients in the high risk group and the low risk group of the validation set provided by the embodiment of the present invention;
FIG. 6 is a schematic diagram of model time-dependent ROC curves for 1 year, 3 years, and 5 years of validation set provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of the Cox regression analysis assessing the independent prognostic value of the risk prediction model for KIRC patients provided by the embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating the results of the ROC comparative analysis of the risk prediction model for KIRC patients against clinically relevant pathological information according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In a first aspect, referring to fig. 1, an embodiment of the present invention provides a method for predicting prognosis of a renal clear cell carcinoma patient, including:
acquiring a data set and copper death related gene data according to prior knowledge;
carrying out standardization processing on gene expression data according to the data set to obtain a gene expression matrix; the gene expression matrix comprises a tumor tissue gene expression matrix and a normal tissue gene expression matrix;
performing differential expression analysis according to the copper death-related gene data and the gene expression matrix to obtain differential expression profile data;
randomly distributing the data set to obtain a training set and a test set, and acquiring target data from the training set according to preset conditions;
performing a prognostic analysis to determine a prognostic target gene by combining the target data and the differential expression profile data of the training set;
and constructing a prognosis model according to the prognosis target gene, and finishing prognosis prediction of the renal clear cell carcinoma patient through the prognosis model.
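The random allocation and the "preset conditions" among the steps above can be sketched as follows. This is an illustrative Python stand-in, not the caret-based R procedure described later in this document: the 5:5 ratio, the 30-day survival threshold and the Dead/Alive states are taken from the worked example in the description, while the function names, the field names, the fixed seed and the sample records are assumptions.

```python
import random

def split_half(sample_ids, seed=0):
    """Randomly allocate samples to a training set and a test set at 5:5."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    half = len(ids) // 2
    return ids[:half], ids[half:]

def eligible(patients, min_days=30):
    """Keep patients who survived more than min_days and have a recorded status."""
    return [p for p in patients
            if p["days"] > min_days and p["status"] in ("Dead", "Alive")]

# 539 hypothetical tumor sample identifiers.
sample_ids = [f"sample_{i:03d}" for i in range(539)]
train_ids, test_ids = split_half(sample_ids)

# Hypothetical clinical records.
patients = [
    {"id": "sample_000", "days": 12, "status": "Dead"},
    {"id": "sample_001", "days": 410, "status": "Alive"},
    {"id": "sample_002", "days": 95, "status": None},
]
kept = eligible(patients)
```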
Optionally, the method further comprises:
and performing risk scoring on the sample data of the data set, and further performing verification and evaluation processing on the prognosis model.
Optionally, the performing a risk scoring on sample data of the dataset to complete an evaluation process of the prognosis model includes:
calculating risk scores of all samples of the training set and the verification set according to the risk score model;
dividing samples of the training set into a high risk group and a low risk group according to the median of the risk scores of the training set;
dividing the samples of the verification set into a high risk group and a low risk group according to the median of the risk scores of the training set;
and performing verification evaluation treatment on the prognosis model according to the high-risk group and the low-risk group of the training set and the high-risk group and the low-risk group of the verification set.
Optionally, the performing a verification evaluation process of the prognosis model according to the high-risk group and the low-risk group of the training set and the high-risk group and the low-risk group of the validation set includes:
performing survival analysis and drawing survival curves for the high-risk group and the low-risk group of the training set and the high-risk group and the low-risk group of the verification set through the survival package and the survminer package of the R software;
comparing the differences between the high-risk group and the low-risk group by the log-rank test.
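The survival curves in this step are Kaplan-Meier curves (see FIG. 3 and FIG. 5). As an illustration only, the estimator behind such a curve can be written in a few lines of Python; the function name and the toy follow-up data are assumptions, and the R packages named above remain the tools used in this document.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        surv *= 1.0 - deaths / at_risk  # conditional survival past time t
        curve.append((t, surv))
    return curve

# Hypothetical follow-up times and event flags (1 = death, 0 = censored).
curve = kaplan_meier([2, 4, 4, 6, 8], [1, 1, 0, 1, 0])
```

Censored patients leave the risk set without lowering the curve, which is why the estimate drops only at observed deaths.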
Optionally, the performing a verification evaluation process of the prognosis model according to the high risk group and the low risk group of the training set and the high risk group and the low risk group of the verification set further comprises:
testing the prognostic performance of the biomarkers on the sample data of the data set with time-dependent ROC curves at 1 year, 3 years and 5 years, using the survival package and the timeROC package of the R software.
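A simplified Python sketch of an AUC at fixed time horizons is given below. It is deliberately naive and is not the estimator of the timeROC package: patients who died by the horizon are cases, patients followed beyond it are controls, and patients censored before the horizon are simply dropped, whereas a dedicated package treats censoring more carefully. The function name and all data are hypothetical.

```python
def auc_at_horizon(times, events, scores, horizon):
    """Naive AUC at one horizon: cases died by the horizon, controls were
    followed beyond it; samples censored earlier are ignored."""
    cases = [s for t, e, s in zip(times, events, scores) if e and t <= horizon]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    # Fraction of case/control pairs in which the case has the higher score.
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

# Hypothetical follow-up (years), event flags and risk scores.
times = [0.5, 0.8, 2.0, 2.5, 4.0, 6.0, 7.0, 0.6]
events = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [2.5, 1.9, 1.7, 0.6, 0.4, 0.3, 0.5, 1.0]
auc_by_year = {h: auc_at_horizon(times, events, scores, h) for h in (1, 3, 5)}
```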
Optionally, the performing differential expression analysis according to the copper death-related gene data and the gene expression matrix to obtain differential expression profile data includes:
setting the cut-off for screening differential genes to a P value of less than 0.05 and an absolute fold change of more than 1.5;
performing differential expression analysis on the copper death-related gene data and the gene expression matrix according to the differential gene screening cut-off value;
and obtaining differential expression profile data according to the differential analysis.
Optionally, said determining a prognostic target gene by performing a prognostic analysis in combination with said target data and said differential expression profile data of said training set, comprising:
merging the target data and the differential expression data of the training set, and determining a survival-related gene expression profile of the renal clear cell carcinoma patients through univariate Cox regression analysis;
performing LASSO Cox regression analysis on the survival-related gene expression profile of the renal clear cell carcinoma patients in combination with survival time and survival state;
performing multivariate Cox regression analysis on the result of the LASSO Cox regression analysis to determine a prognostic gene signature;
wherein the prognostic gene signature includes the prognostic target genes, namely CDKN2A and DLAT, and their regression coefficients.
Optionally, the constructing a prognosis model according to the prognosis target gene comprises:
constructing a prognosis model according to CDKN2A and DLAT, wherein the prognosis model is as follows:
risk score = β1 × (expression level of CDKN2A gene) - β2 × (expression level of DLAT gene)
wherein β1 denotes the regression coefficient of CDKN2A, and β2 denotes the regression coefficient of DLAT.
Optionally, the method further comprises:
and carrying out independence analysis verification on the prognosis model.
In a second aspect, embodiments of the present invention provide a renal clear cell carcinoma patient prognosis prediction system, comprising:
the first module is used for acquiring a data set and copper death related gene data according to prior knowledge;
the second module is used for carrying out standardization processing on gene expression data according to the data set to obtain a gene expression matrix; the gene expression matrix comprises a tumor tissue gene expression matrix and a normal tissue gene expression matrix;
a third module for performing differential expression analysis according to the copper death-related gene data and the gene expression matrix to obtain differential expression profile data;
the fourth module is used for randomly distributing the data set to obtain a training set and a test set and acquiring target data from the training set according to preset conditions;
a fifth module for performing a prognostic analysis to determine a prognostic target gene by combining the target data and the differential expression profile data of the training set;
and the sixth module is used for constructing a prognosis model according to the prognosis target gene and completing prognosis prediction of the renal clear cell carcinoma patient through the prognosis model.
The content of the method embodiment of the present invention is applicable to the apparatus embodiment, the functions specifically implemented by the apparatus embodiment are the same as those of the method embodiment, and the beneficial effects achieved by the apparatus embodiment are also the same as those achieved by the method.
Another aspect of the embodiments of the present invention further provides an electronic device, including a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
The contents of the embodiment of the method of the present invention are all applicable to the embodiment of the electronic device, the functions specifically implemented by the embodiment of the electronic device are the same as those of the embodiment of the method, and the beneficial effects achieved by the embodiment of the electronic device are also the same as those achieved by the method.
Yet another aspect of the embodiments of the present invention provides a computer-readable storage medium, which stores a program, which is executed by a processor to implement the method as described above.
The contents of the embodiment of the method of the present invention are all applicable to the embodiment of the computer-readable storage medium, the functions specifically implemented by the embodiment of the computer-readable storage medium are the same as those of the embodiment of the method described above, and the advantageous effects achieved by the embodiment of the computer-readable storage medium are also the same as those achieved by the method described above.
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and the computer instructions executed by the processor cause the computer device to perform the foregoing method.
The present invention is further illustrated in detail below with reference to specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to make the content and technical solution of the present application more clear, the related terms and meanings are explained as follows:
Tumor markers: substances that are characteristically present in malignant tumor cells, are abnormally produced by malignant tumor cells, or are produced by the host in response to the tumor; they reflect the occurrence and development of a tumor and can be used to monitor its response to treatment.
Overall survival rate: the proportion of patients, among those with a given disease or receiving a given treatment, who are still alive after a period of follow-up (usually 1, 3 or 5 years).
Necroptosis: a novel, caspase-independent mode of cell death that plays an important role in various human diseases, such as cerebral ischemia, myocardial ischemia, acute and chronic neurodegenerative diseases, and tumors.
Prognosis model: a multi-factor model used to estimate the probability of a patient's future outcome, that is, the probability that outcomes such as recurrence, death, disability or complications occur within a given future period, given the current disease state.
Individualized treatment: a treatment scheme tailored to the individual patient by comprehensively considering factors such as pathological type, tumor stage, genotype and physical condition.
AUC value: the area under the ROC curve; the larger the AUC, the better the classifier performs.
AUC = 1: a perfect classifier; with such a prediction model there is at least one threshold that yields a perfect prediction. In most prediction scenarios, no perfect classifier exists.
0.5 < AUC < 1: better than random guessing; the classifier (model) has predictive value if the threshold is set properly.
AUC = 0.5: equivalent to random guessing (for example, tossing a coin); the model has no predictive value.
AUC < 0.5: worse than random guessing; however, inverting its predictions yields a classifier that is better than random guessing.
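These cases can be checked numerically with a short Python sketch that computes the AUC as the fraction of positive/negative pairs ranked correctly, counting ties as one half. This pairwise form is one standard way to compute the area under the ROC curve for a plain binary outcome, without censoring; the function name and the scores are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a positive outscores a negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

labels = [0, 0, 1, 1]
perfect = auc([0.1, 0.2, 0.8, 0.9], labels)            # positives always rank higher
uninformative = auc([0.5, 0.5, 0.5, 0.5], labels)      # every pair is a tie
reversed_ranking = auc([0.9, 0.8, 0.2, 0.1], labels)   # positives always rank lower
flipped_back = auc([-0.9, -0.8, -0.2, -0.1], labels)   # same scores, sign inverted
```

The last two lines show the point made for AUC < 0.5: negating the scores of a model with AUC a gives a model with AUC 1 - a.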
It should be noted that copper plays a key role in the development of cancer, so research on copper (Cu) is of great significance, and copper may become a potential target for inhibiting cancer. Research teams abroad have identified a number of copper death-related genes (CRGs); however, no research on the expression and prognostic value of CRGs in cancer exists at present. It is therefore necessary to analyze the expression and prognostic value of CRGs in renal clear cell carcinoma, so that the clinical prognosis of renal clear cell carcinoma can be predicted and individualized treatment can be guided, which has high clinical application value. The technical scheme of the invention uses 2 CRGs to construct a prognosis model with good accuracy and specificity, thereby enabling early intervention and early treatment of KIRC and providing a valuable CRG-targeted treatment approach for KIRC. The invention aims to fill the technical gap in analyzing the expression and prognostic value of copper death-related genes in KIRC, and provides a KIRC patient prognosis model and a construction method therefor, which are applied to predicting the prognosis of KIRC patients.
In order to achieve the above purpose, the technical scheme adopted by the invention specifically comprises the following steps:
1. Data download and collation
RNA sequencing data (FPKM values) and clinical information for KIRC were obtained from the TCGA database; the data set comprises 72 normal samples and 539 KIRC samples in total. Copper death-related genes were obtained from the literature, and a total of 10 CRGs were included in the subsequent analysis. Expression data for the CRGs were extracted from the TCGA expression profile data using a Perl script.
2. Differential expression analysis
Differential expression analysis of the CRGs in the TCGA data set was performed with the limma package in R. P < 0.05 and an absolute fold change (FC) > 1.5 were used as the cut-off values for screening differentially expressed CRGs. As shown in fig. 2, expression profile data for a total of 7 differentially expressed CRGs were finally obtained.
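The screening rule can be sketched as a simple filter. This is an illustration only: the gene names and numbers are hypothetical, the input is assumed to be a tumor/normal fold-change ratio with its P value (such as a limma result table might be reduced to), and "absolute fold change > 1.5" is read here as a change of more than 1.5-fold in either direction, which is an assumption about the intended meaning.

```python
def select_differential_genes(results, p_cutoff=0.05, fc_cutoff=1.5):
    """Keep genes with P < p_cutoff and a fold change beyond fc_cutoff.

    results maps gene name -> (fold_change_ratio, p_value).
    """
    selected = []
    for gene, (fold_change, p_value) in results.items():
        # Assumption: up- and down-regulation are treated symmetrically,
        # so a ratio of 0.5 counts as a 2-fold change.
        magnitude = max(fold_change, 1.0 / fold_change)
        if p_value < p_cutoff and magnitude > fc_cutoff:
            selected.append(gene)
    return sorted(selected)


# Hypothetical results, not values from the patent.
hypothetical_results = {
    "GENE_A": (2.10, 0.001),   # up-regulated and significant
    "GENE_B": (0.50, 0.020),   # down-regulated twofold and significant
    "GENE_C": (1.20, 0.0001),  # significant, but the change is too small
    "GENE_D": (3.00, 0.200),   # large change, but not significant
}

differential = select_differential_genes(hypothetical_results)
# differential == ["GENE_A", "GENE_B"]
```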
It should be noted that the P value is the probability, when the null hypothesis is true, of obtaining the observed sample result or a more extreme one. If the P value is small, such a result would be unlikely under the null hypothesis; by the small-probability principle there is then reason to reject the null hypothesis, and the smaller the P value, the stronger the grounds for rejecting it. In short, a smaller P value indicates a more significant result. However, whether a test result is "significant", "moderately significant" or "highly significant" must be judged from the magnitude of the P value together with the actual problem.
3. Single-factor Cox, LASSO Cox and multi-factor Cox regression analysis
Patients in the TCGA data set were randomly assigned to a training set and a test set at a ratio of 5:5 using the caret package in R. KIRC patients with a survival time of more than 30 days and a recorded survival status (Dead or Alive) were selected from the training set and merged with the expression profile data of the differentially expressed CRGs. Single-factor Cox analysis was performed on the training set to screen for genes related to the survival of KIRC patients; genes with P < 0.05 were considered to influence survival. The expression profiles of the survival-related genes obtained from the single-factor Cox analysis were combined with survival time and survival status, and LASSO Cox regression analysis was then performed. Finally, multi-factor Cox regression analysis was performed on the genes selected by the LASSO regression, and a prognostic gene signature was constructed as a linear combination of the multi-factor Cox regression coefficients and the mRNA expression levels, giving a risk scoring formula. For validation, the risk score of each sample in the validation set was calculated with the same formula, all samples were divided into a high-risk group and a low-risk group according to the median risk score, and Kaplan-Meier survival analysis and receiver operating characteristic (ROC) curve analysis were then carried out. In the end, 2 genes, CDKN2A and DLAT, were obtained for constructing the prognostic model; their parameters are shown in Table 1:
TABLE 1
Gene name    Regression coefficient    HR        HR.95L    HR.95H    P value
CDKN2A       0.0542                    1.0557    1.0137    1.0994    0.0087
DLAT         -0.1089                   0.8967    0.8363    0.9615    0.0021
Note: HR (hazard ratio) in the single-factor regression analysis characterizes the relative risk. An HR value greater than 1 indicates that the expression value of the corresponding gene is positively correlated with the risk score, so the corresponding regression coefficient is greater than 0; an HR value less than 1 indicates that the expression value of the corresponding gene is negatively correlated with the risk score, so the corresponding regression coefficient is less than 0. HR.95L and HR.95H are the lower and upper limits of the 95% confidence interval of the HR.
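The relation between the regression coefficient and HR described in the note can be checked numerically. In a Cox model the hazard ratio is the exponential of the regression coefficient, so a positive coefficient gives HR > 1 and a negative coefficient gives HR < 1. The sketch below uses the coefficient and HR values from Table 1; the dictionary layout and function name are illustrative only.

```python
import math

# Regression coefficients and hazard ratios as listed in Table 1.
table1 = {
    "CDKN2A": {"coef": 0.0542, "hr": 1.0557},
    "DLAT": {"coef": -0.1089, "hr": 0.8967},
}


def hazard_ratio(coef):
    """Hazard ratio of a Cox regression coefficient: HR = exp(coef)."""
    return math.exp(coef)


# Recompute HR from each coefficient; the values agree with the table to
# within the rounding of the published coefficients.
recomputed = {gene: hazard_ratio(row["coef"]) for gene, row in table1.items()}
```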
According to the risk scoring formula:
$$\mathrm{riskScore} = \sum_{i=1}^{n} \mathrm{exp}_i \times \beta_i$$
wherein riskScore represents the risk score, n represents the number of prognostic genes, i represents the i-th prognostic gene, $\mathrm{exp}_i$ represents the expression value of gene i, and $\beta_i$ represents the regression coefficient of gene i.
Further, a prognostic model of the present invention can be obtained, expressed as:
prognostic risk score = 0.0542 × CDKN2A gene expression level - 0.1089 × DLAT gene expression level
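As a minimal sketch, the prognostic model can be written as a weighted sum of the two gene expression values, using the regression coefficients from Table 1. The function name and the sample expression values are hypothetical.

```python
# Regression coefficients from Table 1.
COEFFICIENTS = {"CDKN2A": 0.0542, "DLAT": -0.1089}


def risk_score(expression):
    """Sum over the prognostic genes of coefficient x expression value."""
    return sum(beta * expression[gene] for gene, beta in COEFFICIENTS.items())


# Hypothetical expression values for one sample.
sample = {"CDKN2A": 10.0, "DLAT": 5.0}
score = risk_score(sample)  # 0.0542 * 10.0 - 0.1089 * 5.0
```

Higher CDKN2A expression raises the score and higher DLAT expression lowers it, in line with the signs of the coefficients.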
The Cox proportional hazards regression model is referred to simply as the Cox regression model. It was proposed by the British statistician D. R. Cox in 1972 and is used mainly for prognostic analysis of tumors and other chronic diseases, and also for etiological exploration in cohort studies. LASSO, first proposed by Robert Tibshirani in 1996, stands for Least Absolute Shrinkage and Selection Operator. It is a shrinkage estimation method: by constructing a penalty function it shrinks some coefficients and sets others exactly to zero, yielding a more parsimonious model. It thus retains the advantage of subset selection and is a biased estimation method for data with complex collinearity.
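How the LASSO penalty sets some coefficients exactly to zero can be illustrated with the soft-thresholding operator, which is the closed-form LASSO solution in the simplified case of orthonormal predictors and a least-squares loss. This is only an illustration of the shrinkage effect, not the LASSO Cox fit itself, which is iterative; the coefficient values and the penalty are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink a coefficient toward zero; small coefficients become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


# Hypothetical unpenalized coefficients for five candidate genes.
unpenalized = [0.80, -0.05, 0.30, 0.02, -0.60]
shrunk = [round(soft_threshold(b, 0.10), 2) for b in unpenalized]
# shrunk == [0.7, 0.0, 0.2, 0.0, -0.5]
```

The two coefficients whose magnitude is below the penalty are removed from the model, while the others are kept but reduced, which is the combined shrinkage and selection behaviour described above.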
4. Survival curve analysis
The risk score of each sample in the training set was calculated according to the risk scoring formula, and the samples were divided into a high-risk group and a low-risk group at the median. For validation, the risk score of each sample in the validation set was calculated with the same formula, and the validation samples were divided into a high-risk group and a low-risk group using the median of the training set. Survival analysis was performed on the high-risk and low-risk KIRC patients in the training set and the validation set using the R packages "survival" and "survminer"; survival curves were drawn, and the groups were compared by log-rank test. Referring to fig. 3 and 5, the results show that in the training set the survival time of patients in the high-risk group was significantly shorter than that of patients in the low-risk group. This result was also confirmed in the validation set.
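The grouping and survival-curve steps can be sketched as follows. This is not the R "survival" code used in the source: it is a plain Kaplan-Meier estimator written out for illustration, all data are hypothetical, and assigning a score exactly equal to the median to the low-risk group is an assumption, since the text does not state a tie rule.

```python
from statistics import median


def assign_groups(scores, cutoff):
    """Label each sample 'high' or 'low' risk relative to a fixed cutoff."""
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    events: 1 = death observed, 0 = censored.
    """
    survival = 1.0
    curve = []
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# The cutoff comes from the training set and is reused for validation.
train_scores = [0.1, 0.4, 0.5, 0.9]
cutoff = median(train_scores)
validation_groups = assign_groups([0.2, 0.7, 0.3], cutoff)
# validation_groups == ["low", "high", "low"]

# Hypothetical follow-up times and event indicators for one risk group.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
# survival drops to about 0.8, 0.6 and 0.3 at times 2, 3 and 5
```

Reusing the training-set median for the validation set keeps the cutoff fixed before the validation data are examined.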
5. ROC curve analysis
To evaluate the accuracy of the prognostic model in predicting KIRC prognosis, the R packages "survival" and "timeROC" were used to assess the prognostic efficacy of the biomarkers at 1 year, 3 years and 5 years with time-dependent ROC curves. Referring to fig. 4, the results show that the AUC values at 1 year, 3 years and 5 years are 0.73, 0.66 and 0.65, respectively. Referring to fig. 5 and 6, the AUC values indicate that the prognostic model consisting of 2 genes has good discriminatory performance for the prognosis of KIRC patients. Referring to fig. 6, this was also confirmed in the validation set.
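The idea behind a time-dependent AUC can be sketched in simplified form. At a given time horizon, patients who died at or before the horizon are treated as cases and patients still under follow-up after the horizon as controls; the AUC is then the share of case/control pairs in which the case has the higher risk score. This sketch simply drops patients censored before the horizon, which is a simplification: it is not the estimator of the timeROC package, which corrects for censoring. All data are hypothetical.

```python
def auc_at_time(scores, times, events, horizon):
    """Simplified cumulative/dynamic AUC at one time horizon.

    events: 1 = death observed, 0 = censored.
    """
    cases = [s for s, t, e in zip(scores, times, events)
             if e == 1 and t <= horizon]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


# Hypothetical risk scores, follow-up times in years, and event indicators.
scores = [0.9, 0.2, 0.3, 0.8, 0.6]
times = [0.5, 2.0, 4.0, 6.0, 0.8]
events = [1, 1, 0, 1, 0]

auc_1_year = auc_at_time(scores, times, events, 1.0)
auc_3_year = auc_at_time(scores, times, events, 3.0)
```

In this toy data the 1-year value is higher than the 3-year value, which shows how the same risk score can discriminate differently at different horizons.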
6. Independence analysis of risk scoring models
Clinical characteristics of KIRC patients were collected from the TCGA database, including survival time, survival status, age, histological grade (Grade), clinical stage (Stage), T stage, M stage and gender (Gender). Multi-factor Cox regression analysis was performed on the clinical data and the risk score to assess whether the prognostic value of the risk score is independent of the clinical characteristics. P < 0.05 was considered statistically significant. Referring to fig. 7, single-factor and multi-factor Cox regression analyses were performed for age (Age), histological grade (Grade), clinical stage (Stage), T stage, M stage, gender (Gender) and risk score (Risk score): fig. 7A shows the single-factor Cox regression analysis and fig. 7B shows the multi-factor Cox regression analysis, whose results show that the risk score is an independent prognostic factor for KIRC patients. The AUC value of the prognostic model (risk score, 10-year AUC = 0.693) is higher than those of the other clinical characteristics. Referring to fig. 8, where Age is patient age, Grade is histological grade, Stage is clinical stage, T represents T stage, M represents M stage, Gender represents gender and Risk score represents the risk score, the model constructed from 2 CRGs has better accuracy and specificity than TNM staging.
In conclusion, the invention selects two copper death-related genes, CDKN2A and DLAT, as biomarkers, establishes a prognosis model based on these 2 CRGs, and divides KIRC patients into high-risk and low-risk groups. The risk score of KIRC patients in the training set was significantly correlated with overall patient survival (P < 0.05). ROC curve analysis showed AUC values of 0.73, 0.66 and 0.65 at 1-year, 3-year and 5-year follow-up, respectively. The predictive performance was validated in the test set. The model can effectively predict the prognosis of KIRC patients, which may help in the selection of clinical treatment protocols and thereby enable early intervention and early treatment. The results of the invention provide new insight into the function of CRGs in the occurrence and development of KIRC and provide a valuable CRG-targeted treatment approach for KIRC.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device (e.g., a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions). For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by suitable instruction execution devices. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for predicting the prognosis of a patient with renal clear cell carcinoma, comprising:
acquiring a data set and copper death related gene data according to prior knowledge;
carrying out standardization processing on gene expression data according to the data set to obtain a gene expression matrix; the gene expression matrix comprises a tumor tissue gene expression matrix and a normal tissue gene expression matrix;
performing differential expression analysis according to the copper death-related gene data and the gene expression matrix to obtain differential expression profile data;
randomly distributing the data set to obtain a training set and a test set, and acquiring target data from the training set according to preset conditions;
performing a prognostic analysis to determine a prognostic target gene by combining the target data and the differential expression profile data of the training set;
and constructing a prognosis model according to the prognosis target gene, and finishing prognosis prediction of the renal clear cell carcinoma patient through the prognosis model.
2. The method of claim 1, further comprising:
and performing risk scoring on the sample data of the data set, and further performing verification and evaluation processing on the prognosis model.
3. The method of claim 2, wherein the performing risk scoring on the sample data of the data set and further performing verification and evaluation processing on the prognosis model comprises:
calculating risk scores of all samples of the training set and the verification set according to the risk score model;
dividing the samples of the training set into a high risk group and a low risk group according to the median of the risk scores of the training set;
dividing the samples of the verification set into a high risk group and a low risk group according to the median of the risk scores of the training set;
and performing verification evaluation treatment on the prognosis model according to the high-risk group and the low-risk group of the training set and the high-risk group and the low-risk group of the verification set.
4. The method for predicting the prognosis of a renal clear cell carcinoma patient according to claim 3, wherein the performing of the verification evaluation process of the prognosis model according to the high-risk group and the low-risk group of the training set and the high-risk group and the low-risk group of the validation set comprises:
performing survival analysis and drawing survival curves for the high-risk group and the low-risk group of the training set and the high-risk group and the low-risk group of the verification set using the survival package and the survminer package of the R software;
comparing the differences between the high-risk group and the low-risk group by log-rank test.
5. The method for predicting the prognosis of a renal clear cell carcinoma patient according to claim 3, wherein the performing the verification evaluation process of the prognosis model according to the high-risk group and the low-risk group of the training set and the high-risk group and the low-risk group of the validation set further comprises:
assessing, on the sample data of the data set, the prognostic efficacy of the biomarkers at 1 year, 3 years and 5 years with time-dependent ROC curves, using the survival package and the timeROC package of the R software.
6. The method for predicting the prognosis of a patient with renal clear cell carcinoma as claimed in claim 1, wherein the differential expression analysis based on the copper death-related gene data and the gene expression matrix to obtain differential expression profile data comprises:
determining a differential gene screening cut-off value according to the P value of less than 0.05 and the absolute value of the difference multiple of more than 1.5;
performing differential expression analysis on the copper death-related gene data and the gene expression matrix according to the differential gene screening cut-off value;
and obtaining differential expression profile data according to the differential analysis.
7. The method of claim 1, wherein the determining a prognostic target gene by performing a prognostic analysis in combination with the target data and the differential expression profile data of the training set comprises:
merging the target data and the differential expression data of the training set, and determining a survival-related gene expression profile of the renal clear cell carcinoma patient through single-factor Cox regression analysis;
performing LASSO Cox regression analysis by combining survival time and survival state according to the gene expression profile related to the survival of the renal clear cell carcinoma patient;
performing multifactor Cox regression analysis to determine a prognostic gene signature according to the result of the LASSO Cox regression analysis;
wherein the prognostic gene signature includes a prognostic target gene including CDKN2A and DLAT and a regression coefficient.
8. The method for predicting the prognosis of a patient with renal clear cell carcinoma according to claim 7, wherein the constructing a prognosis model based on the prognosis target gene comprises:
constructing a prognosis model according to CDKN2A and DLAT, wherein the prognosis model is as follows:
$$\text{risk score} = \beta_1 \times \text{expression level of CDKN2A gene} - \beta_2 \times \text{expression level of DLAT gene}$$
wherein $\beta_1$ denotes the regression coefficient of CDKN2A, and $\beta_2$ denotes the regression coefficient of DLAT.
9. The method for predicting the prognosis of a patient with renal clear cell carcinoma according to any one of claims 1 to 8, further comprising:
and carrying out independence analysis verification on the prognosis model.
10. A renal clear cell carcinoma patient prognosis prediction system, comprising:
the first module is used for acquiring a data set and copper death related gene data according to prior knowledge;
the second module is used for carrying out standardization processing on gene expression data according to the data set to obtain a gene expression matrix; the gene expression matrix comprises a tumor tissue gene expression matrix and a normal tissue gene expression matrix;
a third module for performing differential expression analysis according to the copper death-related gene data and the gene expression matrix to obtain differential expression profile data;
the fourth module is used for randomly distributing the data set to obtain a training set and a test set and acquiring target data from the training set according to preset conditions;
a fifth module for performing a prognostic analysis to determine a prognostic target gene by combining the target data and the differential expression profile data of the training set;
and the sixth module is used for constructing a prognosis model according to the prognosis target gene and completing prognosis prediction of the renal clear cell carcinoma patient through the prognosis model.
CN202210404313.8A 2022-04-18 2022-04-18 Prognosis prediction method and system for renal clear cell carcinoma patient Pending CN114898874A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210404313.8A CN114898874A (en) 2022-04-18 2022-04-18 Prognosis prediction method and system for renal clear cell carcinoma patient

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210404313.8A CN114898874A (en) 2022-04-18 2022-04-18 Prognosis prediction method and system for renal clear cell carcinoma patient

Publications (1)

Publication Number Publication Date
CN114898874A true CN114898874A (en) 2022-08-12

Family

ID=82716774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210404313.8A Pending CN114898874A (en) 2022-04-18 2022-04-18 Prognosis prediction method and system for renal clear cell carcinoma patient

Country Status (1)

Country Link
CN (1) CN114898874A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116564421A (en) * 2023-06-08 2023-08-08 苏州卫生职业技术学院 Method for constructing prognosis model related to copper death of acute myelogenous leukemia patient
CN116564421B (en) * 2023-06-08 2024-01-30 苏州卫生职业技术学院 Method for constructing prognosis model related to copper death of acute myelogenous leukemia patient

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination