CN116564421B - Method for constructing prognosis model related to copper death of acute myelogenous leukemia patient

Publication number
CN116564421B
Authority
CN
China
Legal status
Active
Application number
CN202310672840.1A
Other languages
Chinese (zh)
Other versions
CN116564421A (en)
Inventor
刘松柏 (Liu Songbai)
汤在祥 (Tang Zaixiang)
陈苏宁 (Chen Suning)
王玺超 (Wang Xichao)
孙昊 (Sun Hao)
东勇飞 (Dong Yongfei)
Current Assignee
Suzhou Vocational Health College
Original Assignee
Suzhou Vocational Health College
Application filed by Suzhou Vocational Health College
Priority to CN202310672840.1A
Publication of CN116564421A
Application granted
Publication of CN116564421B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a method for constructing a copper death-related prognosis model for acute myeloid leukemia patients, comprising the following steps: determining a new copper death-related gene set based on three data sets (GSE37642, GSE12417 and TCGA-LAML) and a plurality of known copper death-related genes; obtaining prognosis-related copper death-related genes by univariate Cox regression; reducing their dimensionality with the spike-and-slab lasso; obtaining the optimal gene combination and its regression coefficients by stepwise regression to construct a preliminary model of the copper death-related prognostic signature; and fitting this preliminary model together with a plurality of gene models under a stacking strategy to obtain an expanded stacking model. By using a stacking strategy to combine two high-quality published gene models with the constructed copper death-related prognostic signature, the invention expands the signature and improves the predictive performance of the model.

Description

Method for constructing prognosis model related to copper death of acute myelogenous leukemia patient
Technical Field
The invention relates to the technical field of bioinformatics, and in particular to a method for constructing a copper death-related prognosis model for acute myeloid leukemia patients.
Background
Acute myeloid leukemia (AML) is a molecularly and cytogenetically heterogeneous disease characterized by the clonal expansion of myeloid precursor cells. Among all leukemia subtypes, AML mortality was 44.3%. To date, chemotherapy remains the routine treatment for AML patients; however, the cure rate of conventional intensive chemotherapy is only 30-50%. As basic medical research progresses, understanding of AML has deepened, particularly regarding its underlying mechanisms, environmental and genetic risk factors, and new therapeutic approaches. New therapies (especially targeted therapies and immunotherapy) and new clinical studies are critical to improving the prognosis of AML patients, and identifying new prognostic markers is important for guiding leukemia research and advancing clinical therapy.
Copper death (cuproptosis) is a recently described form of cell death in which intracellular copper accumulation triggers aggregation of lipoylated mitochondrial proteins and destabilization of iron-sulfur cluster proteins, resulting in a distinct type of cell death. Only two studies have constructed copper death-related gene models in AML, both using the TCGA database (Zhu, Y., He, J., Li, Z. & Yang, W. Cuproptosis-related lncRNA signature for prognostic prediction in patients with acute myeloid leukemia. BMC Bioinformatics, 37, doi:10.1186/s12859-023-05148-9 (2023); Li, P. et al. A novel cuproptosis-related lncRNA signature: prognostic and therapeutic value for acute myeloid leukemia. Front Oncol 12, 966920), and both have shortcomings in study design and results. First, the modeling sample size is insufficient: the data sets used in both studies contain only about 150 samples, and even fewer are used for modeling, which may make the models poorly representative and limit their generalizability; in the study by Li et al., the model performed poorly in the validation data set. Second, neither model was validated in an external data set: the validation sets were obtained by randomly splitting the original data set, and validation on a random split of the same data set tends to give overly favorable results. Given the current state of research, further work is needed to fill this gap and to raise the quality and standards of future studies.
Stacking strategies have strong predictive power for complex problems. In recent years, stacking has been developed in the medical field and applied to clinical practice. Wang et al. built a stacked ensemble model using cfDNA fragmentomics features and achieved high sensitivity in detecting early-stage lung cancer. Carina Albuquerque et al. developed a stacking-based artificial intelligence framework for effectively detecting and localizing colonic polyps. Stacking strategies therefore have considerable potential for integrating models and important practical implications for advancing the clinical application of models, yet few studies have explored their potential in hematology.
Against this background, constructing a new copper death-related signature to predict the prognosis of AML patients and exploring the use of stacking strategies in hematology have become important research topics.
Disclosure of Invention
The invention provides a method for constructing a copper death-related prognosis model for acute myeloid leukemia patients, which uses a stacking strategy to combine two high-quality gene models with the constructed copper death-related prognostic signature, thereby expanding the signature and improving the predictive performance of the model.
In order to achieve the above object, the present invention provides the following solutions:
a method for constructing a copper death-related prognosis model of an acute myelogenous leukemia patient comprises the following steps:
constructing a preliminary model of copper death-related prognosis characteristics:
using the GSE37642 data set as the training set and the GSE12417 and TCGA-LAML cohorts as validation sets, and combining a plurality of known copper death-related genes, determining a new copper death-related gene set;
in the new copper death-related gene set, obtaining prognosis-related copper death-related genes by univariate Cox regression, reducing their dimensionality with the spike-and-slab lasso, and finally obtaining the optimal gene combination and its regression coefficients by stepwise regression to construct the preliminary model of the copper death-related prognostic signature;
extending the preliminary model of the copper death-related prognostic signature:
based on a stacking strategy, fitting the preliminary model of the copper death related prognosis characteristics with a plurality of gene models to obtain an expanded stacking model which is used as a final copper death related prognosis model of the acute myeloid leukemia patient.
Further, the plurality of known copper death-related genes includes:
CDKN2A, FDX1, DLD, DLAT, LIAS, GLS, LIPT1, MTF1, PDHA1 and PDHB.
Further, determining the new copper death-related gene set, using the GSE37642 data set as the training set and the GSE12417 and TCGA-LAML cohorts as validation sets in combination with the plurality of known copper death-related genes, comprises:
computing Spearman rank correlations between the known copper death-related genes and the genes common to the three data sets GSE37642, GSE12417 and TCGA-LAML to obtain a correlation coefficient and a P value for each gene;
taking genes whose correlation coefficient has an absolute value greater than 0.4 and whose P value is less than 0.05 as new copper death-related genes, which together form the new copper death-related gene set.
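The correlation screen can be sketched in pure Python as below. This is an illustrative sketch, not the inventors' code. It assumes expression values are supplied as a dict mapping gene names to per-sample lists, and that a candidate gene is kept if it passes both thresholds against at least one known gene (the text does not say whether "any" or "all" known genes are required). For the P value it uses the large-sample normal approximation sqrt(n - 1) * rho ~ N(0, 1) rather than the exact test a statistics package would apply. The names `average_ranks`, `spearman` and `screen_genes`, and the candidate genes in the usage lines, are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho (Pearson correlation of ranks) and an approximate two-sided P value."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    rho = cov / math.sqrt(var_x * var_y)
    # Assumed large-sample approximation: sqrt(n - 1) * rho ~ N(0, 1) under H0
    z = rho * math.sqrt(n - 1)
    p = math.erfc(abs(z) / math.sqrt(2))
    return rho, p


def screen_genes(expr, known_genes, r_min=0.4, p_max=0.05):
    """Candidate genes with |rho| > r_min and P < p_max against any known gene."""
    selected = []
    for gene, values in expr.items():
        if gene in known_genes:
            continue
        for known in known_genes:
            rho, p = spearman(expr[known], values)
            if abs(rho) > r_min and p < p_max:
                selected.append(gene)
                break
    return selected


expr = {
    "FDX1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],          # known gene, toy values
    "GENE_UP": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],   # hypothetical candidate
    "GENE_NOISE": [5, 1, 9, 3, 7, 2, 8, 4, 10, 6],     # hypothetical candidate
}
print(screen_genes(expr, ["FDX1"]))  # ['GENE_UP']
```

In the toy data, GENE_NOISE has rho of about 0.35 and is dropped by the 0.4 threshold even before the P value is considered.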
Further, obtaining prognosis-related copper death-related genes by univariate Cox regression in the new copper death-related gene set comprises:
selecting, by univariate Cox regression, genes in the new copper death-related gene set with a P value less than 0.01 as prognosis-related copper death-related genes.
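A univariate Cox screen can be sketched as follows. This is a minimal standard-library illustration, assuming the Breslow handling of tied event times and a two-sided Wald test for the P value; the text does not state which tie method or test was used, and in practice a survival package would be used instead. `cox_partial`, `univariate_cox` and `prognostic_genes` are hypothetical names, and the data are toy values.

```python
import math


def cox_partial(beta, times, events, x):
    """Breslow partial log-likelihood, score and information for one covariate."""
    loglik = score = info = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        # Risk set at this event time: everyone still under follow-up
        at_risk = [j for j in range(len(times)) if times[j] >= times[i]]
        w = [math.exp(beta * x[j]) for j in at_risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, at_risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, at_risk))
        loglik += beta * x[i] - math.log(s0)
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info


def univariate_cox(times, events, x, max_iter=100, tol=1e-10):
    """Newton-Raphson fit of h(t | x) = h0(t) * exp(beta * x); returns (beta, se, P).

    x must vary within risk sets, otherwise the information is zero.
    """
    beta = 0.0
    loglik, score, info = cox_partial(beta, times, events, x)
    for _ in range(max_iter):
        step = score / info
        while True:  # step halving keeps the log-likelihood from decreasing
            cand = cox_partial(beta + step, times, events, x)
            if cand[0] >= loglik or abs(step) < 1e-12:
                break
            step /= 2
        beta += step
        loglik, score, info = cand
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided Wald P value
    return beta, se, p


def prognostic_genes(expr, times, events, alpha=0.01):
    """Keep genes whose univariate Cox P value is below alpha (0.01 in the text)."""
    return [g for g, x in expr.items() if univariate_cox(times, events, x)[2] < alpha]


times = [1, 2, 3, 4, 5, 6, 7, 8]        # toy overall survival times
events = [1, 1, 1, 1, 1, 1, 1, 1]       # 1 = death observed
x = [3, 1, 2, 0.5, 1.5, 0, 0.2, -1]     # toy expression of a hypothetical gene
beta, se, p = univariate_cox(times, events, x)
print(beta > 0)  # True: higher expression goes with earlier death in this toy data
```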
Further, the preliminary model of the copper death-related prognostic signature is formulated as:
$$\text{Risk score} = \sum_{i=1}^{n} \mathrm{Exp}_i \times \beta_i$$
wherein $n$ is the number of genes in the final model, $\mathrm{Exp}_i$ is the expression level of gene $i$, and $\beta_i$ is the regression coefficient of gene $i$ obtained by stepwise regression.
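The risk score is a plain linear predictor, so applying it to a patient takes only a few lines. In this sketch the gene names and coefficients are hypothetical placeholders (the 14-gene coefficients are not reproduced here), and the cut-off is simply passed in, since the text does not specify how the optimal cut-off value is chosen.

```python
def risk_score(expression, coefficients):
    """Linear risk score: sum of beta_i * Exp_i over the genes in the final model."""
    missing = [g for g in coefficients if g not in expression]
    if missing:
        raise KeyError(f"expression values missing for: {missing}")
    return sum(beta * expression[g] for g, beta in coefficients.items())


def stratify(scores, cutoff):
    """Label each patient 'high' or 'low' risk relative to a chosen cut-off."""
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical genes and coefficients, for illustration only
coefficients = {"GENE_A": 0.5, "GENE_B": -0.2}
patients = [{"GENE_A": 2.0, "GENE_B": 1.0}, {"GENE_A": 0.4, "GENE_B": 3.0}]
scores = [risk_score(p, coefficients) for p in patients]
print(stratify(scores, cutoff=0.5))  # ['high', 'low']
```

Genes outside the model are ignored, while a missing model gene raises an error instead of silently contributing zero.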
Further, the plurality of gene models includes a 4-mRNA model and a 24-gene model.
Further, fitting the preliminary model of the copper death-related prognostic signature with the plurality of gene models based on a stacking strategy to obtain the expanded stacking model, which serves as the final copper death-related prognosis model of the acute myeloid leukemia patient, comprises:
first, the GSE37642 data set is used as training data and randomly divided into 10 equally sized groups, called "folds";
second, using 9 of the 10 folds, each of the three sub-models (the preliminary copper death-related prognostic signature model, the 4-mRNA model and the 24-gene model) is fitted by multivariable Cox regression, and the risk score of each sub-model is calculated in the remaining fold; this process is repeated 10 times so that every fold obtains risk scores from all three sub-models;
third, the risk scores of each sub-model are combined with the survival outcomes of the training data;
fourth, the risk scores of the sub-models are integrated with an additive linear model, in which a random survival forest is used to fit the integrated risk scores to the survival outcomes and the weight of each sub-model's risk score is obtained by constrained least squares;
fifth, the risk scores of all sub-models are recomputed by multivariable Cox regression on the full training data, and the final integrated risk score is obtained using the weights from the fourth step;
sixth, the relationship between the integrated risk score from the fifth step and the survival outcome is fitted by a random survival forest to obtain the expanded stacking model, which serves as the final copper death-related prognosis model of the acute myelogenous leukemia patient.
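The first two stacking steps (10 folds, fit on 9, score the held-out fold) can be sketched as generic scaffolding. This is a sketch under stated assumptions: the actual sub-model fitting (multivariable Cox regression) is not implemented here but passed in as hypothetical `fit`/`score` callables, and `make_folds`, `out_of_fold_scores` and `level_one_matrix` are illustrative names. Sharing one fold assignment across sub-models keeps their out-of-fold scores comparable row by row.

```python
import random


def make_folds(n, k=10, seed=0):
    """Randomly assign n samples to k folds whose sizes differ by at most one."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    fold_of = [0] * n
    for position, sample in enumerate(order):
        fold_of[sample] = position % k
    return fold_of


def out_of_fold_scores(n, fit, score, k=10, seed=0):
    """Fit on k-1 folds, score the held-out fold, and repeat for every fold."""
    fold_of = make_folds(n, k, seed)
    oof = [None] * n
    for f in range(k):
        train = [i for i in range(n) if fold_of[i] != f]
        held_out = [i for i in range(n) if fold_of[i] == f]
        model = fit(train)
        for i, s in zip(held_out, score(model, held_out)):
            oof[i] = s
    return oof


def level_one_matrix(submodels, n, k=10, seed=0):
    """One out-of-fold risk-score column per sub-model, all using the same folds."""
    return {name: out_of_fold_scores(n, fit, score, k, seed)
            for name, (fit, score) in submodels.items()}


# Toy stand-in for a sub-model: "fit" learns a mean, "score" centres on it
x = [0.3, 1.2, -0.5, 2.0, 0.7, -1.1, 0.9, 0.0, 1.5, -0.2, 0.4, 1.1]
def fit_centre(train):
    return sum(x[i] for i in train) / len(train)
def score_centre(mean, idx):
    return [x[i] - mean for i in idx]

cols = level_one_matrix({"toy": (fit_centre, score_centre)}, n=len(x))
print(all(v is not None for v in cols["toy"]))  # True
```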
Further, the method further comprises:
verifying a preliminary model of copper death-related prognosis characteristics:
the predictive performance of the preliminary model of the copper death-related prognostic signature is evaluated with time-dependent ROC curves and calibration plots; meanwhile, patients are divided into high-risk and low-risk groups according to the optimal cut-off value of the risk score, and survival is compared between the two groups, with a log-rank P value less than 0.05 taken to indicate a survival difference between the two groups.
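The two-group comparison uses the log-rank test. A standard-library sketch of the usual two-group statistic is shown below; `logrank_test` is a hypothetical name, the group labels are assumed to be coded 1 (high risk) and 0 (low risk), and the data are toy values.

```python
import math


def logrank_test(times, events, group):
    """Two-group log-rank test; group[i] is 1 or 0. Returns (chi-square, P), 1 df."""
    o_minus_e = 0.0
    variance = 0.0
    event_times = sorted({t for t, d in zip(times, events) if d})
    for t in event_times:
        at_risk = [g for ti, g in zip(times, group) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, di in zip(times, events) if ti == t and di)
        d1 = sum(1 for ti, di, g in zip(times, events, group) if ti == t and di and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy follow-up times
events = [1] * 10                         # every death observed
group = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]    # 1 = high risk, 0 = low risk
chi2, p = logrank_test(times, events, group)
print(p < 0.05)  # True
```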
Further, the method further comprises:
verifying the expanded stacking model:
comparing the discrimination of the stacking model and of each sub-model using ROC curves, and using the BeatAML cohort to verify the predictive effect of combining the risk classification of the stacking model with the European LeukemiaNet (ELN) risk classification;
dividing patients into low-risk and high-risk groups according to the optimal cut-off value of the stacking model's risk score, and, under the ELN risk classification, into adverse, intermediate and favorable groups;
combining the two grouping schemes: patients in the ELN favorable group form the low-risk group, while the remaining patients (ELN intermediate and adverse) are regrouped according to the stacking model, with those in the stacking model's low-risk group reassigned to an intermediate-risk group and those in its high-risk group assigned to the high-risk group;
comparing the survival curves of the groups with the log-rank test, where, for pairwise comparisons between groups, a log-rank P value less than 0.017 is taken to indicate a difference between two groups.
According to the specific embodiments provided, the invention achieves the following technical effects: the method uses a stacking strategy to expand the constructed preliminary copper death-related prognostic signature model by combining it with two other gene models published in high-quality journals. Stacking can combine the strengths of different algorithms and models and thus improve predictive performance, and stacking strategies have strong predictive power for complex problems. The expanded model outperforms other models in this field in predictive performance; specifically, the copper death-related prognostic signature shows higher discrimination than other models, is evaluated with a comprehensive set of metrics, and generalizes well.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for constructing a copper death-related prognosis model of an acute myeloid leukemia patient according to the present invention;
FIG. 2 shows the construction of the copper death-related prognostic signature in the GSE37642 data set according to the present invention, wherein (A) shows the optimal cut-off value of the risk score; (B) shows a Kaplan-Meier plot of patient OS differences stratified by risk score; (C) shows the 1-, 2- and 3-year ROC curves; and (D) shows the calibration plot;
FIG. 3 shows the validation of the copper death-related prognostic signature in the GSE12417 data set according to the present invention, wherein (A) shows the optimal cut-off value of the risk score; (B) shows a Kaplan-Meier plot of patient OS differences stratified by risk score; (C) shows the 1-, 2- and 3-year ROC curves; and (D) shows the calibration plot;
FIG. 4 shows the validation of the copper death-related prognostic signature in the TCGA-LAML cohort according to the present invention, wherein (A) shows the optimal cut-off value of the risk score; (B) shows a Kaplan-Meier plot of patient OS differences stratified by risk score; (C) shows the 1-, 2- and 3-year ROC curves; and (D) shows the calibration plot;
FIG. 5 compares the 1-, 2- and 3-year ROC curves of the stacking model and each sub-model, wherein (A)-(D) show the 1-, 2- and 3-year ROC curves of the copper death-related signature, the 4-mRNA model, the 24-gene model and the stacking model, respectively, in the GSE37642 data set; and (E)-(H) show the corresponding curves in the GSE12417 data set;
FIG. 6 shows Kaplan-Meier plots of an embodiment of the present invention, wherein (A) shows patient OS differences based on ELN stratification; and (B) shows patient OS differences based on the new stratification.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a method for constructing a copper death-related prognosis model for acute myeloid leukemia patients, which uses a stacking strategy to combine two high-quality gene models with the constructed copper death-related prognostic signature, thereby expanding the signature and improving the predictive performance of the model.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
As shown in FIG. 1, the method for constructing the copper death-related prognosis model of the acute myelogenous leukemia patient provided by the invention comprises the following steps:
(1) Constructing a preliminary model of copper death-related prognosis characteristics:
taking the GSE37642 data set as the training set and the GSE12417 and TCGA-LAML cohorts as validation sets, 10 copper death-related genes (CDKN2A, FDX1, DLD, DLAT, LIAS, GLS, LIPT1, MTF1, PDHA1 and PDHB) are combined, and Spearman rank correlations are computed between these 10 genes and the genes common to the three data sets (GSE37642, GSE12417 and TCGA-LAML) to obtain a correlation coefficient and P value for each gene; genes with an absolute correlation coefficient greater than 0.4 and a P value less than 0.05 are selected as new copper death-related genes. In the new copper death-related gene set, prognosis-related copper death-related genes are obtained by univariate Cox regression (P value less than 0.01). Their dimensionality is then reduced with the spike-and-slab lasso, the optimal gene combination is obtained by stepwise regression, and the preliminary model of the copper death-related prognostic signature is constructed from the genes retained by stepwise regression and their regression coefficients:
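$$\text{Risk score} = \sum_{i=1}^{n} \mathrm{Exp}_i \times \beta_i$$
wherein $n$ is the number of genes in the final model, $\mathrm{Exp}_i$ is the expression level of gene $i$, and $\beta_i$ is the regression coefficient of gene $i$ obtained by stepwise regression.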
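In the spike-and-slab lasso step, each Cox coefficient receives a two-component mixture double-exponential (Laplace) prior: a narrow "spike" that shrinks unimportant genes strongly toward zero and a wide "slab" that penalizes important genes only lightly. The display below sketches that prior in the parameterization assumed here from the spike-and-slab lasso Cox literature, where $s_0$ and $s_1$ are the scales of the spike and slab (the Examples report S0 = 0.01 and S1 = 0.1); the indicator $\gamma_j$ and inclusion probability $\theta$ are symbols introduced for illustration and are not defined in the text.

$$\beta_j \mid \gamma_j \sim \mathrm{DE}\big(0,\, S_j\big), \qquad S_j = (1-\gamma_j)\, s_0 + \gamma_j\, s_1, \qquad \gamma_j \mid \theta \sim \mathrm{Bernoulli}(\theta)$$
$$\mathrm{DE}(\beta_j \mid 0, S_j) = \frac{1}{2 S_j} \exp\!\left(-\frac{|\beta_j|}{S_j}\right), \qquad 0 < s_0 < s_1$$

Under this assumption the coefficients are estimated by maximizing the Cox partial log-likelihood penalized by the log of this prior, so genes assigned to the spike are driven to (near) zero and drop out before stepwise regression.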
(2) Verifying a preliminary model of copper death-related prognosis characteristics:
the predictive performance of the preliminary model of the copper death-related prognostic signature is evaluated with time-dependent ROC curves and calibration plots; meanwhile, patients are divided into high-risk and low-risk groups according to the optimal cut-off value of the risk score, and survival is compared between the two groups, with a log-rank P value less than 0.05 taken to indicate a survival difference between the two groups.
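The 1-, 2- and 3-year ROC curves summarize discrimination at fixed horizons. The sketch below computes a naive cumulative/dynamic AUC at one horizon: cases are patients who died at or before the horizon, controls are patients still followed after it, and patients censored before the horizon are simply dropped. This is an assumption-laden simplification; the text does not name its estimator, and weighted estimators (for example, inverse-probability-of-censoring weighting, as used by common time-dependent ROC packages) correct for that censoring. `cumulative_dynamic_auc` is a hypothetical name.

```python
def cumulative_dynamic_auc(times, events, scores, horizon):
    """Naive time-dependent AUC at `horizon`, without censoring weights."""
    cases = [s for s, t, d in zip(scores, times, events) if t <= horizon and d]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                concordant += 1.0  # higher risk score for the earlier death
            elif a == b:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


times = [0.5, 1.5, 2.5, 3.5, 4.0]  # toy follow-up in years
events = [1, 1, 1, 0, 1]           # 1 = death observed
scores = [4, 3, 2, 1, 0.5]         # toy risk scores
print([cumulative_dynamic_auc(times, events, scores, h) for h in (1, 2, 3)])  # [1.0, 1.0, 1.0]
```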
(3) Extending the preliminary model of the copper death-related prognostic signature:
based on a stacking strategy, fitting the preliminary model of the copper death related prognosis characteristics with a plurality of gene models to obtain an expanded stacking model which is used as a final copper death related prognosis model of the acute myeloid leukemia patient.
The plurality of gene models includes a 4-mRNA model and a 24-gene model, taken respectively from:
Chen, Z. et al. A novel 4-mRNA signature predicts the overall survival in acute myeloid leukemia. Am J Hematol 96, 1385-1395, doi:10.1002/ajh.26309 (2021).
Li, Z. et al. Identification of a 24-gene prognostic signature that improves the European LeukemiaNet risk classification of acute myeloid leukemia: an international collaborative study. J Clin Oncol 31, 1172-1181, doi:10.1200/jco.2012.44.3184 (2013).
The modeling genes of these two gene models were first collected: the 4-mRNA model (KLF9, ENPP4, TUBA4A and CD247) and the 24-gene model (ALS2CR8, ANGEL1, ARL6IP5, BSPRY, BTBD3, C1RL, CPT1A, DAPK1, ETFB, FGFR1, HEATR6, LAPTM4B, MAP7, NDFIP1, PBX3, PLA2G4A, PLOD3, PTP4A3, SLC25A12, SLC2A5, TMEM159, TRIM44, TRPS1 and VAV3).
On this basis, the model is expanded under the stacking framework following the technical flow shown in FIG. 1, which specifically comprises the following steps:
first, the GSE37642 data set is used as training data and randomly divided into 10 equally sized groups, called "folds";
second, using 9 of the 10 folds, each of the three sub-models (the preliminary copper death-related prognostic signature model, the 4-mRNA model and the 24-gene model) is fitted by multivariable Cox regression, and the risk score of each sub-model is calculated in the remaining fold; this process is repeated 10 times so that every fold obtains risk scores from all three sub-models;
third, the risk scores of each sub-model are combined with the survival outcomes of the training data;
fourth, the risk scores of the sub-models are integrated with an additive linear model, in which a random survival forest is used to fit the integrated risk scores to the survival outcomes and the weight of each sub-model's risk score is obtained by constrained least squares;
fifth, the risk scores of all sub-models are recomputed by multivariable Cox regression on the full training data, and the final integrated risk score is obtained using the weights from the fourth step;
sixth, the relationship between the integrated risk score from the fifth step and the survival outcome is fitted by a random survival forest to obtain the expanded stacking model, which serves as the final copper death-related prognosis model of the acute myelogenous leukemia patient.
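The fourth and fifth steps weight the sub-model risk scores by constrained least squares but do not state the constraints or the exact regression target. The sketch below assumes non-negative weights rescaled to sum to one (the non-negative least squares combination used in some super-learner implementations) and takes the target vector `y` as given; how that target is derived from the random survival forest fit is not specified in the text. `nnls_weights` and `integrated_score` are hypothetical names, and all data are toy values.

```python
def nnls_weights(Z, y, sweeps=5000):
    """Non-negative least squares by cyclic coordinate descent, rescaled to sum to 1.

    Z: one row per patient, one column per sub-model risk score; y: target vector.
    """
    m = len(Z[0])
    cols = [[row[j] for row in Z] for j in range(m)]
    norms = [sum(v * v for v in c) for c in cols]
    w = [0.0] * m
    resid = list(y)  # y - Z @ w, kept up to date
    for _ in range(sweeps):
        for j in range(m):
            if norms[j] == 0.0:
                continue
            # Exact minimiser along coordinate j, clipped at zero
            new = max(0.0, w[j] + sum(c * r for c, r in zip(cols[j], resid)) / norms[j])
            delta = new - w[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, cols[j])]
                w[j] = new
    total = sum(w)
    return [wj / total for wj in w] if total > 0 else [1.0 / m] * m


def integrated_score(sub_scores, weights):
    """Weighted sum of one patient's sub-model risk scores."""
    return sum(w * s for w, s in zip(weights, sub_scores))


z1 = [1, 2, 3, 4, 5, 6]   # toy risk scores of sub-model 1
z2 = [2, 1, 0, 1, 2, 3]   # toy risk scores of sub-model 2
z3 = [1, 0, 1, 0, 1, 0]   # toy risk scores of sub-model 3
Z = [list(row) for row in zip(z1, z2, z3)]
y = [0.7 * a + 0.3 * b for a, b in zip(z1, z2)]  # toy target
weights = nnls_weights(Z, y)
print([round(w, 3) for w in weights])  # [0.7, 0.3, 0.0]
```

Coordinate descent keeps the sketch within the standard library; a dedicated solver such as `scipy.optimize.nnls` would be the usual choice in practice.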
(4) Verifying the expanded stacking model:
comparing the discrimination of the stacking model and of each sub-model using ROC curves, and using the BeatAML cohort to verify the predictive effect of combining the risk classification of the stacking model with the European LeukemiaNet (ELN) risk classification;
dividing patients into low-risk and high-risk groups according to the optimal cut-off value of the stacking model's risk score, and, under the ELN risk classification, into adverse, intermediate and favorable groups;
combining the two grouping schemes: patients in the ELN favorable group form the low-risk group, while the remaining patients (ELN intermediate and adverse) are regrouped according to the stacking model, with those in the stacking model's low-risk group reassigned to an intermediate-risk group and those in its high-risk group assigned to the high-risk group;
comparing the survival curves of the groups with the log-rank test, where, for pairwise comparisons between groups, a log-rank P value less than 0.017 is taken to indicate a difference between two groups.
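The regrouping rule can be written as a small lookup. In this sketch the string labels and the name `combined_risk_group` are hypothetical. The constant `PAIRWISE_ALPHA` reflects an assumption: 0.017 matches 0.05/3 rounded, i.e. a Bonferroni-type correction for the three pairwise comparisons among three groups, but the text does not state where the threshold comes from.

```python
PAIRWISE_ALPHA = 0.05 / 3  # about 0.0167; assumed origin of the 0.017 threshold


def combined_risk_group(eln_group, stack_group):
    """Merge ELN (favorable/intermediate/adverse) with the stacking model (low/high)."""
    if eln_group == "favorable":
        return "low"
    if eln_group not in ("intermediate", "adverse"):
        raise ValueError(f"unknown ELN group: {eln_group}")
    if stack_group not in ("low", "high"):
        raise ValueError(f"unknown stacking-model group: {stack_group}")
    return "intermediate" if stack_group == "low" else "high"


patients = [("favorable", "high"), ("intermediate", "low"), ("adverse", "high")]
print([combined_risk_group(e, s) for e, s in patients])  # ['low', 'intermediate', 'high']
```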
Other methods (e.g., machine learning or other bioinformatics methods) may also be used to determine the final set of modeling genes.
In constructing the stacking model, this scheme selects two gene models published in high-quality journals. In practice, these sub-models can be replaced: alternatives include models based on other omics data or prediction models constructed from clinical variables. Likewise, the random survival forest used for the final model fitting can be replaced by other machine learning algorithms.
Examples
In constructing the copper death-related prognostic signature, the GSE37642 data set was used as the training set and the GSE12417 and TCGA-LAML cohorts as validation sets. Ten originally reported copper death-related genes (CDKN2A, FDX1, DLD, DLAT, LIAS, GLS, LIPT1, MTF1, PDHA1 and PDHB) were collected, and Spearman rank correlations were computed between these 10 genes and the genes common to the three data sets (GSE37642, GSE12417 and TCGA-LAML). Applying the screening criteria of an absolute correlation coefficient greater than 0.4 and a P value less than 0.05 yielded 3170 new copper death-related genes in total. On this basis, univariate Cox regression identified 122 copper death-related genes associated with patient prognosis. Dimensionality reduction with the spike-and-slab lasso (S1 = 0.1, S0 = 0.01) further narrowed these to 24 prognosis-related copper death-related genes, and stepwise regression then gave the best combination of 14 genes. The risk score was constructed from these genes and their regression coefficients as follows:
In verifying the copper death-related prognostic signature, the predictive performance of the model was evaluated with time-dependent ROC curves and calibration plots. At the same time, patients were divided into high-risk and low-risk groups according to the optimal cut-off value of the risk score, and survival was compared between the two groups (a log-rank P value less than 0.05 was taken to indicate a survival difference). As shown in FIG. 2, in the training set GSE37642, the 1-, 2- and 3-year AUCs were 0.748, 0.785 and 0.807, respectively; the calibration curve lay close to the diagonal, indicating good calibration; and survival differed between the high- and low-risk groups. As shown in FIG. 3, in the validation set GSE12417, the 1-, 2- and 3-year AUCs were 0.757, 0.745 and 0.772, respectively; calibration was good; and survival differed between the high- and low-risk groups. As shown in FIG. 4, in the TCGA-LAML validation cohort, the 1-, 2- and 3-year AUCs were 0.735, 0.758 and 0.748, respectively; calibration was good; and survival differed between the high- and low-risk groups.
In constructing the stacking model, two microarray data sets were used (GSE37642 as the training set and GSE12417 as the validation set), and two additional gene models (the 4-mRNA model and the 24-gene model) were collected as sub-models to construct the final stacking model.
In validating the stacking model, the discrimination of the stacking model and of each sub-model was compared using time-dependent ROC curves. As shown in FIG. 5, in the training set GSE37642, the 1-, 2- and 3-year AUCs were 0.748, 0.785 and 0.807 for the copper death-related prognostic signature; 0.634, 0.645 and 0.652 for the 4-mRNA model; 0.704, 0.714 and 0.744 for the 24-gene model; and 0.816, 0.843 and 0.857 for the stacking model. In the validation set GSE12417, the 1-, 2- and 3-year AUCs were 0.757, 0.745 and 0.772 for the copper death-related prognostic signature; 0.678, 0.65 and 0.638 for the 4-mRNA model; 0.65, 0.653 and 0.646 for the 24-gene model; and 0.778, 0.751 and 0.769 for the stacking model. Meanwhile, the BeatAML cohort was used to verify the predictive effect of combining the stacking model's risk classification with the European LeukemiaNet (ELN) risk classification. As shown in FIG. 6, in the BeatAML cohort, the ELN risk classification failed to distinguish the survival of AML patients in the intermediate and adverse groups (log-rank P = 0.2). After the stacking model's risk classification was integrated with the ELN risk classification, the log-rank P value between the two corresponding groups was 0.011, i.e. the new risk classification better distinguishes these two groups of patients.
The copper death-related prognostic signature model constructed by the invention outperforms other models in this field in predictive performance. Specifically, its discrimination is higher than that of the other models, it is evaluated with a comprehensive set of metrics, and it generalizes well across independent datasets.
The reasons are as follows: (1) The model's high discrimination derives from the gene-screening strategy: compared with the lasso, the spike-and-slab lasso has advantages in variable selection and parameter estimation. (2) The predictive performance of the model is evaluated from multiple dimensions: discrimination and calibration are assessed with time-dependent ROC curves and calibration plots, and patients are divided into high-risk and low-risk groups according to the optimal cut-off value of the linear predictor, with survival compared between the two groups. (3) The modeling dataset has a sufficient sample size.
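How the "optimal cut-off value" is determined is not specified here. One common choice, assumed in the sketch below, is the cut-off that maximizes the log-rank statistic between the resulting high-risk and low-risk groups (a maximally selected rank statistic). The `min_prop` parameter, which keeps each group from becoming too small, and all function names are hypothetical.

```python
def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]  # risk set at t
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        d1 = sum(g for ti, ei, g in zip(times, events, groups) if ti == t and ei == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0

def optimal_cutoff(times, events, scores, min_prop=0.1):
    """Cut-off c maximizing the log-rank statistic; high risk means score > c.

    Candidate cut-offs are the observed scores; splits leaving fewer than
    min_prop of the patients in either group are skipped.
    Returns (cut-off, chi-square statistic).
    """
    n = len(scores)
    best = None
    for c in sorted(set(scores)):
        groups = [1 if s > c else 0 for s in scores]
        n_high = sum(groups)
        if n_high < min_prop * n or n - n_high < min_prop * n:
            continue
        chi2 = logrank_chi2(times, events, groups)
        if best is None or chi2 > best[1]:
            best = (c, chi2)
    if best is None:
        raise ValueError("no admissible cut-off")
    return best

# Toy data: higher scores die earlier
print(optimal_cutoff(list(range(1, 9)), [1] * 8, [8, 7, 6, 5, 4, 3, 2, 1]))
```

Because the cut-off is chosen to maximize the statistic, the log-rank P value at that cut-off is optimistic within the dataset used to select it, which is one reason to confirm the split on independent validation sets.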
The constructed copper death-related prognostic signature was extended with a stacking strategy by combining it with two other gene models published in high-quality journals (references 1 and 2). The invention thus provides an integration strategy for reference, which can improve the predictive performance of an original model.
The reason is that stacking can combine the strengths of different algorithms and models to improve predictive performance, which makes stacking strategies well suited to complex prediction problems.
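The stacking steps listed in claim 1 (10 folds, out-of-fold sub-model risk scores, and sub-model weights obtained by constrained least squares) can be sketched as follows. This is a simplified illustration under stated assumptions: the sub-models are passed in as generic `fit`/`predict` callables rather than the multivariate Cox models of the invention, the random seed is arbitrary, and "constrained least squares" is interpreted as non-negative weights that sum to one. The regression target that the invention derives with a random survival forest is not reproduced, so `target` is a placeholder. All function names are hypothetical.

```python
import random

def assign_folds(n_samples, n_folds=10, seed=2023):
    """Shuffle samples and deal them into n_folds groups of (near-)equal size."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [0] * n_samples
    for position, sample in enumerate(order):
        folds[sample] = position % n_folds
    return folds

def out_of_fold_scores(X, y, fit, predict, n_folds=10, seed=2023):
    """Fit on n_folds - 1 folds, score the held-out fold, repeat for every fold."""
    folds = assign_folds(len(X), n_folds, seed)
    oof = [None] * len(X)
    for k in range(n_folds):
        train = [i for i in range(len(X)) if folds[i] != k]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(len(X)):
            if folds[i] == k:
                oof[i] = predict(model, X[i])
    return oof

def project_to_simplex(v):
    """Euclidean projection onto {w : w_j >= 0, sum_j w_j = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def simplex_least_squares(S, target, n_iter=5000):
    """Minimize ||S w - target||^2 over non-negative weights summing to one.

    S has one row per patient and one column per sub-model risk score.
    Solved by projected gradient descent.
    """
    m = len(S[0])
    w = [1.0 / m] * m
    # 2 * (sum of squared entries) bounds the gradient's Lipschitz constant
    step = 1.0 / (2.0 * sum(v * v for row in S for v in row))
    for _ in range(n_iter):
        residual = [sum(wj * sj for wj, sj in zip(w, row)) - t for row, t in zip(S, target)]
        grad = [2.0 * sum(r * row[j] for r, row in zip(residual, S)) for j in range(m)]
        w = project_to_simplex([wj - step * gj for wj, gj in zip(w, grad)])
    return w

def integrated_score(sub_model_scores, weights):
    """Final integrated risk score: weighted sum of the sub-model risk scores."""
    return sum(w * s for w, s in zip(weights, sub_model_scores))
```

Out-of-fold scoring matters because each patient's sub-model scores then come from models that never saw that patient, so the weights are not fitted to optimistic in-sample scores.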
The invention also provides a system for constructing a copper death-related prognosis model for acute myeloid leukemia patients, comprising the following modules:
a development module, used to construct the preliminary copper death-related prognostic signature model;
a verification module, used to verify the preliminary copper death-related prognostic signature model and the extended stacked model;
and an extension module, used to extend the preliminary copper death-related prognostic signature model.
The invention also discloses an electronic device, comprising one or more processors, a memory, and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to perform the method for constructing a copper death-related prognosis model for acute myeloid leukemia patients described above.
Those of skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The disclosed systems, modules, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a ROM, a RAM, etc.
The above description of the embodiments is intended only to help understand the method of the present invention and its core ideas. Meanwhile, those of ordinary skill in the art may, in light of the teachings of the present invention, make changes to the specific implementations and the scope of application, and such changes fall within the scope of the present invention. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (5)

1. A method for constructing a copper death-related prognosis model for acute myeloid leukemia patients, characterized by comprising the following steps:
constructing a preliminary model of copper death-related prognostic characteristics:
using the GSE37642 dataset as the training set and the GSE12417 dataset and the TCGA-LAML cohort as validation sets, and combining a plurality of known copper death-related genes, determining a new copper death-related gene set; the plurality of known copper death-related genes includes CDKN2A, FDX1, DLD, DLAT, LIAS, GLS, LIPT1, MTF1, PDHA1, and PDHB; specifically, correlation analysis is performed between the known copper death-related genes and the genes common to the three datasets GSE37642, GSE12417, and TCGA-LAML to obtain a correlation coefficient and a P value for each gene, and genes with an absolute correlation coefficient greater than 0.4 and a P value less than 0.05 are taken as new copper death-related genes, yielding the new copper death-related gene set;
in the new copper death-related gene set, obtaining prognosis-related copper death-related genes by univariate Cox regression, then reducing the dimensionality of these genes by spike-and-slab lasso, and finally obtaining the optimal gene combination and its regression coefficients by stepwise regression, thereby constructing the preliminary model of copper death-related prognostic characteristics;
extending the preliminary model of copper death-related prognostic characteristics:
fitting the preliminary model of copper death-related prognostic characteristics together with a plurality of gene models based on a stacking strategy, wherein the gene models comprise a 4-mRNA model and a 24-gene model, to obtain an extended stacked model serving as the final copper death-related prognosis model for acute myeloid leukemia patients, specifically comprising the following steps:
first, the GSE37642 dataset is used as training data and randomly divided into 10 equally sized groups, called "folds";
second, using 9 of the 10 folds, the three sub-models (the preliminary copper death-related prognostic signature model, the 4-mRNA model, and the 24-gene model) are fitted by multivariate Cox regression, and the risk score of each sub-model is calculated on the remaining fold; this process is repeated 10 times so that risk scores from all three sub-models are obtained for every fold;
third, the risk scores of each sub-model are combined with the survival outcomes of the training data;
fourth, the risk scores of the sub-models are integrated with an additive linear model, wherein a random survival forest is used to fit the combined risk scores and survival outcomes, and the weight of each sub-model's risk score is obtained by constrained least squares;
fifth, the risk scores of all sub-models are obtained again by multivariate Cox regression on the full training data, and the final integrated risk score is obtained using the weights from the fourth step;
and sixth, the relationship between the integrated risk score from the fifth step and the survival outcome is fitted with a random survival forest to obtain the extended stacked model, which serves as the final copper death-related prognosis model for acute myeloid leukemia patients.
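The gene-screening criterion of claim 1 (absolute correlation coefficient greater than 0.4 and P value less than 0.05 against the known copper death-related genes) can be sketched as below. The claim does not state which correlation coefficient is used; this sketch assumes Pearson correlation and uses the Fisher z-transformation as an approximate P value. It works on a single expression matrix already restricted to the shared genes; how results from the three datasets are combined is not specified in the claim and is left out. Function names are hypothetical.

```python
import math

KNOWN_GENES = ["CDKN2A", "FDX1", "DLD", "DLAT", "LIAS",
               "GLS", "LIPT1", "MTF1", "PDHA1", "PDHB"]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Approximate two-sided P value for r via the Fisher z-transformation."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def select_correlated_genes(expr, known_genes, r_cut=0.4, p_cut=0.05):
    """Genes with |r| > r_cut and P < p_cut against at least one known gene.

    expr : dict mapping gene symbol -> expression values over the same samples.
    """
    n = len(next(iter(expr.values())))
    selected = set()
    for known in known_genes:
        if known not in expr:
            continue  # known gene not measured in this matrix
        for gene, values in expr.items():
            if gene == known:
                continue
            r = pearson_r(expr[known], values)
            if abs(r) > r_cut and approx_p_value(r, n) < p_cut:
                selected.add(gene)
    return selected
```

The Fisher approximation is adequate for cohorts of the size used here; an exact t-distribution P value would require a statistics library.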
2. The method for constructing a copper death-related prognosis model for acute myeloid leukemia patients according to claim 1, wherein obtaining the prognosis-related copper death-related genes by univariate Cox regression in the new copper death-related gene set comprises:
selecting, by univariate Cox regression, genes with a P value less than 0.01 from the new copper death-related gene set as the prognosis-related copper death-related genes.
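The univariate Cox screening of claim 2 (retain genes with P < 0.01) can be illustrated with a one-covariate Cox proportional hazards model fitted by Newton-Raphson, with tied event times handled by the Breslow convention and significance taken from the Wald test. This is a minimal sketch for illustration; in practice a standard survival-analysis package would be used, and the function names here are hypothetical.

```python
import math

def cox_score_information(times, events, x, beta):
    """Score U(beta) and information I(beta) of the one-covariate Cox
    partial likelihood, using the Breslow convention for ties."""
    score, info = 0.0, 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei != 1:
            continue  # censored patients contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for tj, xj in zip(times, x):
            if tj >= ti:  # patient j is still at risk at time ti
                w = math.exp(beta * xj)
                s0 += w
                s1 += w * xj
                s2 += w * xj * xj
        mean = s1 / s0
        score += xi - mean
        info += s2 / s0 - mean * mean
    return score, info

def univariate_cox(times, events, x, n_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns (beta, standard error, two-sided Wald P)."""
    beta = 0.0
    for _ in range(n_iter):
        score, info = cox_score_information(times, events, x, beta)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = cox_score_information(times, events, x, beta)  # at final beta
    se = 1.0 / math.sqrt(info)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value

def select_prognostic_genes(times, events, expr, p_cut=0.01):
    """Keep genes whose univariate Cox Wald P value is below p_cut."""
    selected = []
    for gene, values in expr.items():
        if max(values) == min(values):
            continue  # a constant gene carries no prognostic information
        _, _, p = univariate_cox(times, events, values)
        if p < p_cut:
            selected.append(gene)
    return selected
```

A production implementation would also add step halving to the Newton-Raphson iteration and detect monotone likelihoods, which arise when a gene perfectly orders the event times and the coefficient diverges.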
3. The method for constructing a copper death-related prognosis model for acute myeloid leukemia patients according to claim 2, wherein the preliminary model of copper death-related prognostic characteristics is formulated as:
$$\text{Risk score} = \sum_{i=1}^{n} \beta_i \times Exp_i$$
wherein n is the number of genes in the final model, Exp_i is the expression level of gene i, and β_i is the regression coefficient of gene i obtained by stepwise regression.
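Applying the formula of claim 3 to one patient amounts to a weighted sum of expression values. The genes and coefficients in the sketch below are hypothetical placeholders; the actual genes and stepwise-regression coefficients of the signature are not given in this section.

```python
def risk_score(expression, coefficients):
    """Risk score = sum over model genes i of beta_i * Exp_i.

    expression   : dict mapping gene symbol -> expression level (Exp_i)
    coefficients : dict mapping gene symbol -> stepwise-regression coefficient (beta_i)
    Genes present in `expression` but absent from the model are ignored.
    """
    return sum(beta * expression[gene] for gene, beta in coefficients.items())

# Hypothetical two-gene signature, for illustration only
coefficients = {"GENE_X": 0.5, "GENE_Y": -0.25}
patient = {"GENE_X": 4.0, "GENE_Y": 2.0, "OTHER_GENE": 9.9}
print(risk_score(patient, coefficients))  # 0.5*4.0 - 0.25*2.0 = 1.5
```

The resulting score is the quantity that is later dichotomized at the optimal cut-off into high-risk and low-risk groups.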
4. The method for constructing a copper death-related prognosis model for an acute myeloid leukemia patient according to claim 1, wherein the method further comprises:
verifying the preliminary model of copper death-related prognostic characteristics:
the predictive performance of the preliminary model is evaluated using time-dependent ROC curves and calibration plots; meanwhile, patients are divided into high-risk and low-risk groups according to the optimal cut-off value of the risk score, survival is compared between the two groups, and a log-rank P value less than 0.05 is considered to indicate a survival difference between the two groups.
5. The method for constructing a copper death-related prognosis model for an acute myeloid leukemia patient according to claim 1, wherein the method further comprises:
verifying the extended stacked model:
comparing the discrimination of the stacked model with that of each sub-model using ROC curves, and meanwhile using the BeatAML cohort to verify the predictive value of combining the risk classification of the stacked model with the European LeukemiaNet risk classification;
patients are divided into a low-risk group and a high-risk group according to the optimal cut-off value of the stacked model's risk score, and into adverse, intermediate, and favorable groups according to the European LeukemiaNet risk classification;
the two grouping criteria are recombined into low-risk, intermediate-risk, and high-risk groups: the low-risk group is defined by the European LeukemiaNet risk classification, and the remaining patients are assigned according to the stacked model, with patients in the stacked model's low-risk group reassigned to the intermediate-risk group and patients in the stacked model's high-risk group reassigned to the high-risk group;
the log-rank test is used to compare survival curves among the groups, and in pairwise comparisons a log-rank P value less than 0.017 is considered to indicate a difference between the two groups compared.
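The recombined grouping of claim 5 can be written as a small mapping. The sketch assumes that the low-risk group corresponds to the ELN favorable category and that the stacked-model labels are "low" and "high". The threshold of 0.017 is taken from the claim; it is consistent with dividing 0.05 by three pairwise comparisons (a Bonferroni-type correction), although the claim does not state this. Names are hypothetical.

```python
PAIRWISE_ALPHA = 0.017  # threshold stated in claim 5

def combined_risk_group(eln_category, stacked_group):
    """Recombine ELN and stacked-model groups into low / intermediate / high risk.

    eln_category  : "favorable", "intermediate" or "adverse" (ELN classification)
    stacked_group : "low" or "high" (stacked model, optimal cut-off)
    """
    if eln_category == "favorable":
        return "low"  # assumed: ELN favorable patients form the low-risk group
    if eln_category not in ("intermediate", "adverse"):
        raise ValueError(f"unknown ELN category: {eln_category!r}")
    # Remaining ELN patients are split by the stacked model
    if stacked_group == "low":
        return "intermediate"
    if stacked_group == "high":
        return "high"
    raise ValueError(f"unknown stacked-model group: {stacked_group!r}")

def significant_pairs(pairwise_p):
    """Group pairs whose pairwise log-rank P value is below PAIRWISE_ALPHA.

    pairwise_p : dict mapping (group_a, group_b) -> log-rank P value
    """
    return sorted(pair for pair, p in pairwise_p.items() if p < PAIRWISE_ALPHA)
```

The pairwise P values themselves would come from a two-group log-rank test applied to each pair of recombined groups.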

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310672840.1A CN116564421B (en) 2023-06-08 2023-06-08 Method for constructing prognosis model related to copper death of acute myelogenous leukemia patient

Publications (2)

Publication Number Publication Date
CN116564421A (en) 2023-08-08
CN116564421B (en) 2024-01-30

Family

ID=87503609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310672840.1A Active CN116564421B (en) 2023-06-08 2023-06-08 Method for constructing prognosis model related to copper death of acute myelogenous leukemia patient

Country Status (1)

Country Link
CN (1) CN116564421B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409855B (en) * 2023-10-25 2024-04-26 Suzhou Vocational Health College Hepatoma patient mismatch repair related prognosis model, and construction and verification methods and application thereof
CN117789819A (en) * 2024-02-27 2024-03-29 Beijing Xieyun Qiyuan Technology Co., Ltd. Construction method of VTE risk assessment model

Citations (12)

Publication number Priority date Publication date Assignee Title
US6964868B1 (en) * 1998-01-28 2005-11-15 Nuvelo, Inc. Human genes and gene expression products II
CN102836292A (en) * 2011-06-20 2012-12-26 Suzhou Vocational Health College Research method for the effect of extracts of modified five-ingredient toxin-dispersing beverage on carrageenan-induced inflammation in mice
CN110468203A (en) * 2019-08-08 2019-11-19 Zhejiang Provincial People's Hospital Marker and detection primer sequence for predicting prognosis of glioblastoma patients, and application thereof
CN112992346A (en) * 2021-04-09 2021-06-18 Third Affiliated Hospital of Sun Yat-sen University (Sun Yat-sen University Liver Disease Hospital) Method for establishing prediction model for prognosis of severe spinal cord injury
CN113160979A (en) * 2020-12-18 2021-07-23 Beijing Zhenzhi Medical Technology Co., Ltd. Machine learning-based liver cancer patient clinical prognosis prediction method
CN113782090A (en) * 2021-09-18 2021-12-10 Third Xiangya Hospital of Central South University Iron death model construction method and application
CN114317532A (en) * 2021-12-31 2022-04-12 Guangdong Provincial People's Hospital Evaluation gene set, kit, system and application for predicting leukemia prognosis
CN114898874A (en) * 2022-04-18 2022-08-12 Institute of Biological and Medical Engineering, Guangdong Academy of Sciences Prognosis prediction method and system for renal clear cell carcinoma patient
CN115019965A (en) * 2022-05-20 2022-09-06 First Affiliated Hospital of Sun Yat-sen University Method for constructing liver cancer patient survival prediction model based on cell death related gene
CN115033758A (en) * 2022-06-30 2022-09-09 Zhengzhou KingMed Clinical Laboratory Center Co., Ltd. Application of kidney clear cell carcinoma prognosis marker gene, screening method and prognosis prediction method
CN115497562A (en) * 2022-10-27 2022-12-20 Peking Union Medical College Hospital, Chinese Academy of Medical Sciences Pancreatic cancer prognosis prediction model construction method based on copper death-related gene
CN116004815A (en) * 2022-08-02 2023-04-25 Qilu Hospital of Shandong University Endometrial cancer prognosis model based on copper death-related lncRNA and application thereof in immunotherapy


Non-Patent Citations (2)

Title
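Research progress of the RNA methyltransferase METTL3 in leukemia; Du Jiahui; Wang Xiaoxiao; Liu Songbai; Chongqing Medicine; Vol. 52 (No. 11); 1732-1737 *
Construction of a prognostic risk assessment model for bladder cancer patients based on copper death-related long non-coding RNAs; Xu Chengcheng et al.; Journal of Zhejiang University; Vol. 52 (No. 02); 139-147 *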

Also Published As

Publication number Publication date
CN116564421A (en) 2023-08-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant