CN117352066A - Construction method of mitochondrion related gene prognosis model of breast cancer - Google Patents

Info

Publication number
CN117352066A
Authority
CN
China
Prior art keywords
mitochondrial
breast cancer
prognosis
differential expression
genes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311410198.6A
Other languages
Chinese (zh)
Inventor
丁茜
裴可
张梦娜
颉丽英
蔺宝
曾泽皓
何跃腾
蔡车国
胡隽源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Beikeyuan Cell Technology Co ltd
Original Assignee
Shenzhen Beikeyuan Cell Technology Co ltd
Application filed by Shenzhen Beikeyuan Cell Technology Co ltd
Priority to CN202311410198.6A
Publication of CN117352066A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a method for constructing a mitochondria-related gene prognosis model of breast cancer, relating to the technical field of breast cancer prognosis model construction. The method comprises the following steps: step one: collecting transcriptome data of breast cancer and clinical data of patients; step two: acquiring mitochondria-related genes; step three: generating volcano plots and heat maps to visualize the differentially expressed genes; step four: performing enrichment analysis on the mitochondria-related differentially expressed genes; step five: screening the parameters used to construct the prognosis prediction model and obtaining the corresponding regression coefficients; step six: obtaining the mitochondria-related breast cancer prognosis prediction model. The method helps to improve the ability to evaluate and predict the prognosis of breast cancer patients, provides guidance for the early diagnosis and treatment of breast cancer patients, and improves the cure rate of patients.

Description

Construction method of mitochondrion related gene prognosis model of breast cancer
Technical Field
The invention relates to the technical field of methods for constructing a breast cancer prognosis model, in particular to a method for constructing a breast cancer mitochondrion-related gene prognosis model.
Background
Breast cancer remains a major contributor to the global disease burden in women; its incidence is still rising and the age of onset is falling. Because effective therapeutic targets for breast cancer are lacking, traditional therapies, including surgery, radiation therapy and chemotherapy, remain the primary treatments. At the same time, because breast cancer shows diverse clinical behaviors and biological characteristics, its prevention, diagnosis and treatment all remain challenging. Several studies have constructed prognosis models of breast cancer to predict patient survival. The main defects of the existing models are: (1) the construction method is single and the prediction effect is limited; (2) the statistical algorithms adopted are relatively simple and cannot mine complex relations in depth; (3) the sample size is small and transferability across populations is poor; (4) biological interpretation is insufficient. Furthermore, no study has so far established a prognosis model of mitochondria-related genes for breast cancer. The invention of application number 202211359084.9 discloses a method for constructing a tumor malignant cell gene prognosis risk model. The construction methods in the prior art have limited ability to evaluate and predict the prognosis of breast cancer patients; the mitochondria-related differentially expressed genes cannot be used as independent prognostic factors, and the accuracy of prognosis prediction is low; the mitochondria-related genes are not screened in a sufficiently scientific and accurate way, and the prognosis accuracy for breast cancer is low. In view of this situation, a method for constructing a mitochondria-related gene prognosis model of breast cancer is urgently needed to meet practical needs.
Disclosure of Invention
In view of the above defects of the prior art, the main objective of the present invention is to provide a method for constructing a mitochondria-related gene prognosis model of breast cancer, which helps to improve the ability to evaluate and predict the prognosis of breast cancer patients and to improve the cure rate of patients; the mitochondria-related differentially expressed genes can be used as independent prognostic factors, and the accuracy of prognosis prediction is high; the prognosis accuracy for breast cancer is further improved by a more scientific and accurate screening of the mitochondria-related genes.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A method for constructing a mitochondria-related gene prognosis model of breast cancer comprises the following steps:
step one: collecting transcriptome data of breast cancer and clinical data of patients from a plurality of databases, and, during collection, removing from the clinical data the samples with incomplete information and the samples with a total survival time of less than 30 days;
step two: acquiring mitochondria-related genes from at least one database;
step three: screening the transcriptome data obtained in step one for differentially expressed genes between normal samples and tumor samples by using the "limma" package of R, genes with |logFC| > 2 and a corrected P value < 0.05 being regarded as differentially expressed genes; taking the intersection with the mitochondria-related genes obtained in step two to obtain the mitochondria-related differentially expressed genes, namely M-DEGs; and using the "Volcano" and "hetmap" packages of R to generate volcano plots and heat maps that visualize the differentially expressed genes;
step four: performing GO enrichment analysis and KEGG enrichment analysis on the M-DEGs obtained in step three, using the clusterProfiler, org.Hs.eg.db, enrichplot and ggplot2 packages of R;
step five: screening, from the M-DEGs obtained in step three, the parameters used to construct the prognosis prediction model, and obtaining the corresponding regression coefficients, wherein the parameters comprise a plurality of genes;
step six: calculating a risk score from the expression levels of the M-DEGs screened in step three and their corresponding regression coefficients, so as to obtain the mitochondria-related breast cancer prognosis prediction model:
$$\text{Risk score} = \sum_{i=1}^{n} A_i \times B_i$$
wherein $A_i$ represents the weight coefficient of the i-th M-DEG related to breast cancer survival, $B_i$ represents the expression level of the i-th M-DEG related to breast cancer survival, and $n$ represents the number of M-DEGs related to breast cancer survival.
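The over-representation test that typically underlies a GO or KEGG enrichment analysis such as the one in step four can be sketched in a few lines. This is a minimal illustration, not the clusterProfiler implementation: the function name `hypergeom_enrichment_p` and all counts in the usage lines are hypothetical, and multiple-testing correction is omitted.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, term_size, query_size, overlap):
    """Upper-tail hypergeometric p-value for one annotation term.

    universe_size : number of background genes
    term_size     : background genes annotated to the term
    query_size    : genes in the query list (for example the M-DEGs)
    overlap       : query genes annotated to the term
    """
    upper = min(term_size, query_size)
    total = comb(universe_size, query_size)
    # Sum the probabilities of observing `overlap` or more annotated genes.
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts chosen only to exercise the function.
p_all_hit = hypergeom_enrichment_p(20, 5, 5, 5)
p_none_hit = hypergeom_enrichment_p(20, 5, 5, 0)
```

A small p-value means the term is annotated to more query genes than random sampling from the background would be expected to produce.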
As a preferred embodiment: in step three, the conditions for screening the differentially expressed genes between the normal samples and the tumor samples are set to FDR < 0.05 and |logFC| > 1, and the screening is carried out with the "limma" package of the R language.
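The threshold filter and the intersection that yield the M-DEGs can be sketched as follows. This is an illustrative sketch only: it assumes the limma results have already been exported as rows with the hypothetical keys `gene`, `logFC` and `adj_p`, and the gene names and numbers in the usage lines are placeholders, not data from the application.

```python
def select_m_degs(limma_rows, mito_gene_sets, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Return the sorted mitochondria-related differentially expressed genes.

    limma_rows     : iterable of dicts with keys "gene", "logFC", "adj_p"
    mito_gene_sets : list of sets, one per source database
    """
    # Pool the mitochondria-related genes collected from all databases.
    mito_genes = set().union(*mito_gene_sets)
    # Keep genes passing both the fold-change and the adjusted p-value cutoff.
    degs = {
        row["gene"]
        for row in limma_rows
        if abs(row["logFC"]) > lfc_cutoff and row["adj_p"] < fdr_cutoff
    }
    return sorted(degs & mito_genes)


# Placeholder rows with invented values, for illustration only.
toy_rows = [
    {"gene": "GENE_A", "logFC": 1.4, "adj_p": 0.001},
    {"gene": "GENE_B", "logFC": 2.5, "adj_p": 0.0001},
    {"gene": "GENE_C", "logFC": 0.3, "adj_p": 0.2},
    {"gene": "GENE_D", "logFC": -1.2, "adj_p": 0.01},
]
toy_mito_sets = [{"GENE_A", "GENE_C"}, {"GENE_D"}]
toy_m_degs = select_m_degs(toy_rows, toy_mito_sets)
```

The cutoffs are parameters because the text gives two settings (|logFC| > 2 in the general scheme and |logFC| > 1 in this preferred embodiment).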
As a preferred embodiment: in step five, screening the parameters used to construct the prognosis prediction model from the M-DEGs and obtaining the corresponding regression coefficients specifically comprises the following steps:
S51, subjecting the M mitochondria-related differentially expressed genes obtained in step three to univariate Cox regression analysis to obtain N mitochondria-related differentially expressed genes;
S52, performing protein-protein interaction network analysis on the N differentially expressed genes, and then screening further with Cytoscape to obtain five mitochondria-related differentially expressed genes, namely HSPE1, MRPL13, MTHFD2, APOO and TIMM17A, for constructing the prognosis model.
As a preferred embodiment: in step six, with the selected mitochondria-related differentially expressed genes as the parameters for establishing the breast cancer prognosis model, the optimal tuning parameter lambda is determined by a ridge regression algorithm with cross-validation according to the minimum variance, thereby determining the optimal variables for establishing the prediction model.
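For orientation, one common way to write a ridge-penalized survival objective is given below. The source does not state the exact loss function, so applying the ridge penalty to the Cox partial likelihood is an assumption, as are the symbols $N$ (number of patients), $\delta_j$ (event indicator of patient $j$), $x_j$ (vector of M-DEG expression levels of patient $j$) and $R(t_j)$ (set of patients still at risk at time $t_j$); $\beta_i$ plays the role of the weight coefficient $A_i$ of the risk score.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta}\left\{
  -\frac{1}{N}\sum_{j:\,\delta_j = 1}\left[ x_j^{\top}\beta
  - \log\sum_{k \in R(t_j)} \exp\left(x_k^{\top}\beta\right) \right]
  + \lambda\sum_{i=1}^{n}\beta_i^{2} \right\}
```

Under this reading, cross-validation evaluates a grid of $\lambda$ values on held-out folds and keeps the value with the smallest cross-validated error. A ridge penalty shrinks coefficients toward zero without setting them exactly to zero, so all $n$ candidate genes remain in the model.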
As a preferred embodiment: the databases used in step one for collecting the transcriptome data of breast cancer and the clinical data of patients comprise the TCGA database, the GEO database and the UCSC database.
As a preferred embodiment: the databases used in step two for acquiring the mitochondria-related genes comprise the MitoCarta database, the GSEA database and the GeneCards database.
As a preferred embodiment: the clinical data of the patients in step one comprise the age, sex, tumor stage and survival status of the patients; the transcriptome data comprise the expression level of each RNA in each sample.
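The clinical-data cleaning described in step one (dropping samples with incomplete information and samples with a total survival time under 30 days) can be sketched as follows. The field names in `REQUIRED_FIELDS` and the toy records are hypothetical; real TCGA or UCSC exports use their own column names.

```python
# Hypothetical field names; real clinical tables will differ.
REQUIRED_FIELDS = ("age", "sex", "stage", "status", "os_days")


def filter_clinical(records, min_os_days=30):
    """Keep records with complete information and overall survival >= min_os_days."""
    kept = []
    for rec in records:
        # A field counts as incomplete if it is absent, None, or an empty string.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        if float(rec["os_days"]) < min_os_days:
            continue
        kept.append(rec)
    return kept


# Invented records, for illustration only.
toy_records = [
    {"id": "P1", "age": 54, "sex": "female", "stage": "II", "status": 0, "os_days": 812},
    {"id": "P2", "age": 61, "sex": "female", "stage": "", "status": 1, "os_days": 430},
    {"id": "P3", "age": 47, "sex": "female", "stage": "III", "status": 1, "os_days": 12},
]
toy_kept = filter_clinical(toy_records)
```

A status of 0 is treated as a valid value here, so censored patients are not mistaken for incomplete records.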
As a preferred embodiment: in step two, the mitochondria-related genes are retrieved from the MitoCarta database, the GSEA database and the GeneCards database by searching with "Mitochondria" as the keyword.
As a preferred embodiment: in step S51 of step five, the 702 mitochondria-related differentially expressed genes obtained in step three are subjected to univariate Cox regression analysis to obtain 33 mitochondria-related differentially expressed genes.
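A univariate Cox regression of the kind used in S51 fits one gene at a time against survival. The sketch below is a plain-Python Newton-Raphson fit of the partial likelihood with Breslow handling of ties. It is an illustration under assumptions, not the applicant's code: the function name, the toy data and the Wald p-value are choices made here, and a production analysis would normally rely on an established survival library.

```python
from math import erfc, exp, sqrt


def univariate_cox(times, events, x, max_iter=25, tol=1e-9):
    """Fit a one-covariate Cox model.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    x      : expression level of one gene per patient
    """
    beta = 0.0
    for _ in range(max_iter):
        grad = 0.0  # first derivative of the log partial likelihood
        info = 0.0  # negative second derivative (observed information)
        for t_j, d_j, x_j in zip(times, events, x):
            if not d_j:
                continue
            s0 = s1 = s2 = 0.0
            for t_k, x_k in zip(times, x):
                if t_k >= t_j:  # patient k is still at risk at time t_j
                    w = exp(beta * x_k)
                    s0 += w
                    s1 += x_k * w
                    s2 += x_k * x_k * w
            mean = s1 / s0
            grad += x_j - mean
            info += s2 / s0 - mean * mean
        if info <= 0.0:
            raise ValueError("no variation of the covariate within the risk sets")
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / sqrt(info)
    z = beta / se
    # Two-sided Wald p-value from the standard normal distribution.
    return {"beta": beta, "hazard_ratio": exp(beta), "p_value": erfc(abs(z) / sqrt(2.0))}


# Invented toy data: higher expression tends to go with earlier death.
toy_times = [5, 8, 12, 15, 20, 25, 30, 40]
toy_events = [1, 1, 1, 1, 1, 1, 1, 1]
toy_expr = [2.0, 3.0, 1.5, 2.5, 1.0, 2.2, 0.5, 1.2]
toy_fit = univariate_cox(toy_times, toy_events, toy_expr)
```

Running such a fit for each candidate gene and keeping those below a chosen p-value cutoff is one way to narrow a list such as the 702 M-DEGs; the cutoff used by the applicant is not stated in the source.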
As a preferred embodiment: the parameters for constructing the prognosis model in step five comprise HSPE1, MRPL13, MTHFD2, APOO and TIMM17A; substituting these parameters into the prognosis score formula of step six gives the prognosis model of the breast cancer mitochondria-related differentially expressed genes: $\text{Prognosis score} = (0.0529 \times \mathrm{TIMM17A}_{exp}) + (0.0672 \times \mathrm{APOO}_{exp}) + (0.0663 \times \mathrm{HSPE1}_{exp}) + (0.1303 \times \mathrm{MRPL13}_{exp}) + (0.0585 \times \mathrm{MTHFD2}_{exp})$, wherein the subscript exp represents the expression level of the corresponding M-DEG.
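The five-gene score above is a weighted sum and can be evaluated directly. In this minimal sketch the coefficients are copied from the formula, while the function name, the input layout (a dict of expression levels per patient) and the example values are assumptions; the source does not state how the expression levels are normalized.

```python
# Coefficients taken from the prognosis score formula in the text.
COEFFICIENTS = {
    "TIMM17A": 0.0529,
    "APOO": 0.0672,
    "HSPE1": 0.0663,
    "MRPL13": 0.1303,
    "MTHFD2": 0.0585,
}


def prognosis_score(expression):
    """Return sum(A_i * B_i) over the five model genes for one patient."""
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Invented expression levels (all set to 1.0) to show the arithmetic.
unit_patient = {gene: 1.0 for gene in COEFFICIENTS}
unit_score = prognosis_score(unit_patient)
```

With every expression level equal to 1.0 the score is simply the sum of the five coefficients, 0.3752.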
Compared with the prior art, the invention has obvious advantages and beneficial effects. Specifically, according to the above technical scheme, the method for constructing a mitochondria-related gene prognosis model of breast cancer helps to improve the ability to evaluate and predict the prognosis of breast cancer patients, provides guidance for the early diagnosis and treatment of breast cancer patients, can improve the prognosis of patients to a certain extent, and improves the cure rate of patients; the mitochondria-related differentially expressed genes can be used as independent prognostic factors, and the accuracy of prognosis prediction is high; through a more scientific and accurate screening of the mitochondria-related genes, the prognosis model constructed from the finally determined mitochondria-related differentially expressed genes is suitable for predicting the prognosis of most breast cancer samples, so that the final prognosis model is broadly applicable, more breast cancer patients benefit, and the prognosis accuracy for breast cancer is further improved.
In order to more clearly illustrate the structural features and efficacy of the present invention, a detailed description thereof will be given below with reference to the accompanying drawings and examples.
Drawings
FIG. 1 is a flow chart of a method for constructing a mitochondrial-related gene prognosis model of breast cancer according to the present invention;
FIG. 2 is a heat map (A) and a volcano plot (B) of the mitochondria-related differentially expressed genes of the invention;
FIG. 3 is an enrichment analysis diagram of the mitochondria-related differentially expressed genes of the invention, in which A is the GO enrichment analysis diagram and B is the KEGG enrichment analysis diagram;
FIG. 4 is a heat map (A) of the intersection of the univariate Cox regression analysis results with the mitochondria-related differentially expressed genes, and a protein-protein interaction network analysis map (B) of the intersection genes according to the invention;
FIG. 5 is a schematic representation of the ridge regression model of the breast cancer mitochondria-related differentially expressed genes of the invention;
FIG. 6 is a schematic diagram of the result of the prediction accuracy analysis of the prognosis model of the invention;
FIG. 7 is a schematic diagram of the survival curve analysis of breast cancer patients in the high-risk and low-risk groups defined by the genes of the prognosis model of the invention.
Detailed Description
Example 1
A method for constructing a mitochondria-related gene prognosis model of breast cancer comprises the following steps:
step one: transcriptome data of breast cancer and clinical data of patients are collected from the TCGA database (https://www.cancer.gov/tcga), the GEO database (https://www.ncbi.nlm.nih.gov/geo/) and the UCSC database (http://xena.ucsc.edu); 113 normal samples and 1118 tumor samples are downloaded from the TCGA database, and the clinical information is obtained from the UCSC database; during collection, samples with incomplete information and samples with a total survival time of less than 30 days are removed from the clinical data; the clinical data of the patients comprise age, sex, tumor stage, survival status and survival time; the transcriptome data comprise the expression level of each RNA in each sample;
step two: mitochondria-related genes are acquired from the MitoCarta database, the GSEA database and the GeneCards database;
the MitoCarta database used is MitoCarta3.0, whose website is: https://www.broadinstitute.org/mitocarta30-inventory-mammalian-mitochondrial-proteins-and-pathways;
the GSEA database website is: http://www.gsea-msigdb.org/gsea/index.jsp; the GeneCards database website is: https://www.genecards.org/.
In step two, the mitochondria-related genes are retrieved from the MitoCarta database, the GSEA database and the GeneCards database by searching with "Mitochondria" as the keyword.
step three: the transcriptome data obtained in step one are screened for differentially expressed genes between normal samples and tumor samples with the "limma" package of the R language, the screening conditions being set to FDR < 0.05 and |logFC| > 1; genes with |logFC| > 2 and a corrected P value < 0.05 are regarded as differentially expressed genes; the intersection with the mitochondria-related genes obtained in step two is then taken to obtain the mitochondria-related differentially expressed genes, namely M-DEGs; the "Volcano" and "hetmap" packages of R are used to generate volcano plots and heat maps that visualize the differentially expressed genes;
step four: GO enrichment analysis and KEGG enrichment analysis are performed on the M-DEGs obtained in step three, using the clusterProfiler, org.Hs.eg.db, enrichplot and ggplot2 packages of R;
step five: the parameters used to construct the prognosis prediction model are screened from the M-DEGs obtained in step three, and the corresponding regression coefficients are obtained, wherein the parameters comprise a plurality of genes;
in step five, screening the parameters and obtaining the corresponding regression coefficients specifically comprises the following steps:
S51, the 702 mitochondria-related differentially expressed genes obtained in step three are subjected to univariate Cox regression analysis to obtain 33 mitochondria-related differentially expressed genes;
S52, protein-protein interaction network analysis is performed on the 33 differentially expressed genes, followed by further screening with Cytoscape, to obtain five mitochondria-related differentially expressed genes, namely HSPE1, MRPL13, MTHFD2, APOO and TIMM17A, for constructing the prognosis model.
step six: a risk score is calculated from the expression levels of the M-DEGs screened in step three and their corresponding regression coefficients, so as to obtain the mitochondria-related breast cancer prognosis prediction model:
$$\text{Risk score} = \sum_{i=1}^{n} A_i \times B_i$$
wherein $A_i$ represents the weight coefficient of the i-th M-DEG related to breast cancer survival, $B_i$ represents the expression level of the i-th M-DEG related to breast cancer survival, and $n$ represents the number of M-DEGs related to breast cancer survival.
In step six, with the selected mitochondria-related differentially expressed genes as the parameters for establishing the breast cancer prognosis model, the optimal tuning parameter lambda is determined by a ridge regression algorithm with cross-validation according to the minimum variance, thereby determining the optimal variables for establishing the prediction model.
In step S51, the 702 mitochondria-related differentially expressed genes obtained in step three are reduced to 33 mitochondria-related differentially expressed genes by univariate Cox regression analysis.
The parameters for constructing the prognosis model in step five comprise HSPE1, MRPL13, MTHFD2, APOO and TIMM17A; substituting these parameters into the prognosis score formula of step six gives the prognosis model of the breast cancer mitochondria-related differentially expressed genes: $\text{Prognosis score} = (0.0529 \times \mathrm{TIMM17A}_{exp}) + (0.0672 \times \mathrm{APOO}_{exp}) + (0.0663 \times \mathrm{HSPE1}_{exp}) + (0.1303 \times \mathrm{MRPL13}_{exp}) + (0.0585 \times \mathrm{MTHFD2}_{exp})$, wherein the subscript exp represents the expression level of the corresponding M-DEG.
Verification example 1
Prognosis model prediction accuracy analysis test:
To assess the accuracy of the prognosis risk model in predicting 1-year, 3-year and 5-year overall survival (OS), ROC curves were plotted based on the breast cancer samples obtained from the TCGA database. As shown in FIG. 6, the AUC values are 0.621, 0.626 and 0.614, respectively, indicating that the model has good predictive capability for the prognosis of breast cancer patients.
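How an AUC at a fixed time horizon can be computed from risk scores is sketched below. This is a simplified stand-in, not the applicant's procedure: patients who died by the horizon count as positives, patients followed beyond the horizon count as negatives, and patients censored before the horizon are dropped, whereas dedicated time-dependent ROC methods handle censoring with weights. The function name and toy values are hypothetical.

```python
def auc_at_horizon(scores, times, events, horizon):
    """AUC of risk scores for death by `horizon` (same time unit as `times`)."""
    # Positives: death observed at or before the horizon.
    pos = [s for s, t, e in zip(scores, times, events) if e and t <= horizon]
    # Negatives: still under follow-up after the horizon.
    neg = [s for s, t in zip(scores, times) if t > horizon]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative patient")
    # Fraction of positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy cohort (times in days), for illustration only.
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.5]
toy_times = [200, 300, 2000, 1500, 100]
toy_events = [1, 1, 0, 1, 0]
toy_auc_1y = auc_at_horizon(toy_scores, toy_times, toy_events, horizon=365)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.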
Verification example 2
Survival curve analysis test of breast cancer patients between high and low risk groups:
Patients are divided into a high-risk group and a low-risk group according to the median risk score, and a Kaplan-Meier (K-M) survival curve is then drawn based on the samples of the TCGA database. As shown in FIG. 7, the K-M survival curve shows that the OS of patients in the high-risk group is worse (P < 0.001).
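The median split and the Kaplan-Meier estimate can be sketched as follows. The function names and toy data are hypothetical, placing a score equal to the median in the low-risk group is an assumption, and the log-rank test behind the reported P value is not implemented here.

```python
from statistics import median


def split_by_median(scores):
    """Label each patient "high" if the score exceeds the median, else "low"."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        # Multiply by the conditional probability of surviving past time t.
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Invented toy data, for illustration only.
toy_groups = split_by_median([0.1, 0.4, 0.2, 0.9])
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Computing one curve per group and plotting both as step functions gives a comparison of the kind shown in FIG. 7.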
Mitochondria play an important role in regulating tumor cell growth, death and cell metabolism. Mitochondria are also involved in bioenergetic metabolism, including ATP production, reactive oxygen species production, apoptosis and calcium homeostasis. Mitochondria have been found to play a complex role in breast cancer, involving energy metabolism, oxidative stress, apoptosis and drug resistance. For the study and treatment of breast cancer, an in-depth understanding of the functional regulatory mechanisms of mitochondria is therefore important, as it helps to develop more effective therapeutic strategies and preventive measures. Furthermore, the present application has the following advantages: (1) a machine learning algorithm is applied to mine nonlinear relations in the data, handle multicollinearity and reduce overfitting, thereby improving the predictive power of the model; (2) training on a large multi-center sample improves the generalization ability of the model; (3) the composition of the model can be customized, which makes individualized and precise prognosis possible.
The key design points of the present invention are as follows:
1. According to the invention, breast cancer samples are obtained from the TCGA database, and differentially expressed genes are screened with the limma R package. Mitochondria-related genes are obtained from the MitoCarta3.0, GSEA and GeneCards databases, and the mitochondria-related differentially expressed genes are obtained by intersection. Based on the clinical data, univariate Cox regression analysis is used to examine the relation between the expression of these genes and survival, and enrichment analysis is used to explore their potential molecular mechanisms in breast cancer. A mitochondria-related differentially expressed gene prognosis model is then established by ridge regression analysis, and biomarkers are further defined to predict the prognosis risk of breast cancer patients. These biomarkers provide a reliable basis for assessing the prognosis of breast cancer patients, which helps to improve the ability to evaluate and predict prognosis, provides guidance for early diagnosis and treatment, can improve the prognosis of patients to a certain extent, and improves the cure rate of patients;
2. In the prognosis model constructed by the invention, the mitochondria-related differentially expressed genes can serve as independent prognostic factors, and the model predicts 1-year, 3-year and 5-year prognosis with high accuracy;
3. Compared with the prior art, the method provided by the application takes into account the significance of mitochondria-related genes in breast cancer biology and determines a mitochondria-related gene model to predict and accurately judge the prognosis of breast cancer samples; through a more scientific and accurate screening of the mitochondria-related genes, the prognosis model constructed from the finally determined mitochondria-related differentially expressed genes is suitable for predicting the prognosis of most breast cancer samples, so that the final prognosis model is broadly applicable, more breast cancer patients benefit, and the prognosis accuracy for breast cancer is further improved.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit its technical scope; any minor modification, equivalent change or adaptation made to the above embodiments according to the technical principles of the present invention still falls within the scope of the technical solutions of the present invention.

Claims (10)

1. A method for constructing a mitochondria-related gene prognosis model of breast cancer, characterized by comprising the following steps:
step one: collecting transcriptome data of breast cancer and clinical data of patients from a plurality of databases, and, during collection, removing from the clinical data the samples with incomplete information and the samples with a total survival time of less than 30 days;
step two: acquiring mitochondria-related genes from at least one database;
step three: screening the transcriptome data obtained in step one for differentially expressed genes between normal samples and tumor samples by using the "limma" package of R, genes with |logFC| > 2 and a corrected P value < 0.05 being regarded as differentially expressed genes; taking the intersection with the mitochondria-related genes obtained in step two to obtain the mitochondria-related differentially expressed genes, namely M-DEGs; and using the "Volcano" and "hetmap" packages of R to generate volcano plots and heat maps that visualize the differentially expressed genes;
step four: performing GO enrichment analysis and KEGG enrichment analysis on the M-DEGs obtained in step three, using the clusterProfiler, org.Hs.eg.db, enrichplot and ggplot2 packages of R;
step five: screening, from the M-DEGs obtained in step three, the parameters used to construct the prognosis prediction model, and obtaining the corresponding regression coefficients, wherein the parameters comprise a plurality of genes;
step six: calculating a risk score from the expression levels of the M-DEGs screened in step three and their corresponding regression coefficients, so as to obtain the mitochondria-related breast cancer prognosis prediction model:
$$\text{Risk score} = \sum_{i=1}^{n} A_i \times B_i$$
wherein $A_i$ represents the weight coefficient of the i-th M-DEG related to breast cancer survival, $B_i$ represents the expression level of the i-th M-DEG related to breast cancer survival, and $n$ represents the number of M-DEGs related to breast cancer survival.
2. The method for constructing a mitochondria-related gene prognosis model of breast cancer according to claim 1, wherein: in step three, the conditions for screening the differentially expressed genes between the normal samples and the tumor samples are set to FDR < 0.05 and |logFC| > 1, and the screening is carried out with the "limma" package of the R language.
3. The method for constructing a mitochondrial-related gene prognosis model for breast cancer according to claim 1, wherein: in the fifth step, the screening of parameters for constructing a prognosis prediction model from the mitochondrial related differential expression genes and obtaining the corresponding regression coefficients specifically comprises the following steps:
S51, subjecting the M mitochondrial-related differential expression genes obtained in step three to univariate Cox regression analysis to obtain N mitochondrial-related differential expression genes;
S52, performing protein interaction network analysis on the N differential expression genes, and then screening further with Cytoscape to obtain five mitochondrial-related differential expression genes, namely HSPE1, MRPL13, MTHFD2, APOO and TIMM17A, for constructing the prognosis model.
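As an illustration of the single-gene Cox regression in S51, the following sketch fits a one-covariate Cox model by Newton-Raphson on the partial likelihood and reports a Wald p-value. It is a minimal sketch, not the patent's implementation: Breslow handling of tied event times is an assumption, the significance cutoff used for screening is not stated in this claim, and the survival data in the usage lines are hypothetical.

```python
import math


def univariate_cox(times, events, x, iterations=25):
    """Fit h(t | x) = h0(t) * exp(beta * x); return (beta, hazard ratio, Wald p)."""
    beta = 0.0
    for _ in range(iterations):
        score, info = 0.0, 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored subjects only appear in risk sets
            # Risk set: everyone still under observation at time t_i.
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x_j for w_j, x_j in zip(w, risk))
            s2 = sum(w_j * x_j * x_j for w_j, x_j in zip(w, risk))
            score += x_i - s1 / s0                # first derivative
            info += s2 / s0 - (s1 / s0) ** 2      # negative second derivative
        if info == 0.0:
            break  # covariate carries no information
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    z = beta * math.sqrt(info) if info > 0 else 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal tail
    return beta, math.exp(beta), p_value


# Hypothetical data: follow-up time, event flag (1 = death, 0 = censored),
# and the expression level of one candidate gene.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]
expression = [2.0, 3.0, 1.0, 2.0, 0.0, 1.0]
beta, hazard_ratio, p_value = univariate_cox(times, events, expression)
print(beta, hazard_ratio, p_value)
```

Running this once per gene and keeping the genes whose p-value passes a chosen cutoff reproduces the shape of the M-to-N reduction described in S51. The sketch has no safeguard against monotone likelihoods, where the estimate diverges.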
4. The method for constructing a mitochondrial-related gene prognosis model for breast cancer according to claim 1, wherein: in step six, with the screened mitochondrial-related differential expression genes as parameters for establishing the breast cancer prognosis model, the optimal tuning parameter λ is determined by a ridge regression algorithm and cross-validation according to the minimum variance, thereby determining the optimal variables for establishing the prediction model.
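The idea of choosing λ by cross-validation can be sketched as follows. This is a deliberately simplified illustration: it uses ordinary ridge regression on a continuous outcome with squared error, whereas a survival model would score each λ with a Cox partial-likelihood criterion, and it reads "minimum variance" as "minimum cross-validated error", which is our assumption. All function names and the candidate grid are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes `a` is non-singular (true for X'X + lam*I whenever lam > 0).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^-1 X'y, without intercept."""
    p = len(X[0])
    xtx = [
        [sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def cv_error(X, y, lam, k=3):
    """Mean squared prediction error over k interleaved folds."""
    n, total = len(y), 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in held_out:
            prediction = sum(b * v for b, v in zip(beta, X[i]))
            total += (y[i] - prediction) ** 2
    return total / n


def best_lambda(X, y, grid, k=3):
    """Return the candidate lambda with the smallest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(X, y, lam, k))
```

Larger values of λ shrink the coefficients more strongly; cross-validation picks the amount of shrinkage that predicts held-out samples best.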
5. The method for constructing a mitochondrial-related gene prognosis model for breast cancer according to claim 1, wherein: the database for collecting transcriptome data of breast cancer and clinical data of patients in the first step comprises a TCGA database, a GEO database and a UCSC database.
6. The method for constructing a mitochondrial-related gene prognosis model for breast cancer according to claim 1, wherein: the database for acquiring the mitochondria-related genes in the second step comprises a MitoCarta database, a GSEA database and a GeneCards database.
7. The method for constructing a mitochondrial-related gene prognosis model for breast cancer according to claim 1, wherein: the clinical data of the patient in the step one comprises the age, sex, tumor stage and survival of the patient; the transcriptome data includes the expression level of each RNA in the respective sample.
8. The method for constructing a mitochondrial-related gene prognosis model for breast cancer according to claim 6, wherein: in the second step, mitochondrial related genes are obtained from a MitoCarta database, a GSEA database and a GeneCards database, and are searched by using 'Mitochondria' as a keyword.
9. The method for constructing a mitochondrial-related gene prognosis model for breast cancer according to claim 3, wherein: in S51 of step five, the 702 mitochondrial-related differential expression genes obtained in step three are subjected to univariate Cox regression analysis to obtain 33 mitochondrial-related differential expression genes.
10. The method for constructing a mitochondrial-related gene prognosis model for breast cancer according to claim 9, wherein: the parameters for constructing the prognosis model in step five comprise HSPE1, MRPL13, MTHFD2, APOO and TIMM17A; substituting these parameters into the prognosis score formula of step six gives the prognosis model of the breast cancer mitochondrial-related differential expression genes: prognosis score = (0.0529 × TIMM17A_exp) + (0.0672 × APOO_exp) + (0.0663 × HSPE1_exp) + (0.1303 × MRPL13_exp) + (0.0585 × MTHFD2_exp), wherein the subscript exp represents the expression level of the corresponding M-DEG.
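The prognosis score of this claim is a weighted sum and can be computed as in the sketch below. The five coefficients are those stated in the claim; the function name and the sample expression values are hypothetical, and the unit or normalization of the expression levels is not specified here.

```python
# Regression coefficients as listed in claim 10.
COEFFICIENTS = {
    "TIMM17A": 0.0529,
    "APOO": 0.0672,
    "HSPE1": 0.0663,
    "MRPL13": 0.1303,
    "MTHFD2": 0.0585,
}


def prognosis_score(expression):
    """Sum of coefficient * expression level over the five model genes."""
    missing = sorted(COEFFICIENTS.keys() - expression.keys())
    if missing:
        raise KeyError(f"missing expression values for: {', '.join(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression levels for one patient sample.
sample = {"TIMM17A": 4.2, "APOO": 3.1, "HSPE1": 6.0, "MRPL13": 5.5, "MTHFD2": 2.8}
print(prognosis_score(sample))
```

All five coefficients are positive, so under this model a higher expression of any of the five genes raises the score.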
CN202311410198.6A 2023-10-27 2023-10-27 Construction method of mitochondrion related gene prognosis model of breast cancer Pending CN117352066A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311410198.6A CN117352066A (en) 2023-10-27 2023-10-27 Construction method of mitochondrion related gene prognosis model of breast cancer

Publications (1)

Publication Number Publication Date
CN117352066A true CN117352066A (en) 2024-01-05

Family

ID=89370895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311410198.6A Pending CN117352066A (en) 2023-10-27 2023-10-27 Construction method of mitochondrion related gene prognosis model of breast cancer

Country Status (1)

Country Link
CN (1) CN117352066A (en)

Similar Documents

Publication Publication Date Title
CN112289455A (en) Artificial intelligence neural network learning model construction system and construction method
Milanez-Almeida et al. Cancer prognosis with shallow tumor RNA sequencing
Yin et al. A fusion decision system to identify and grade malnutrition in cancer patients: machine learning reveals feasible workflow from representative real-world data
CN109872776A (en) A kind of screening technique and its application based on weighted gene coexpression network analysis to gastric cancer potential source biomolecule marker
CN113450869A (en) Construction and clinical application of colorectal cancer prognosis model based on m 6A-related lncRNA network
CN113362894A (en) Method for predicting syndromal cancer driver gene
Yang et al. Identification of key genes in coronary artery disease: an integrative approach based on weighted gene co-expression network analysis and their correlation with immune infiltration
CN115482880A (en) Head and neck squamous carcinoma glycolysis related gene prognosis model, construction method and application
Yao et al. Potential role of a three-gene signature in predicting diagnosis in patients with myocardial infarction
WO2022156610A1 (en) Prediction tool for determining sensitivity of liver cancer to drug and long-term prognosis of liver cancer on basis of genetic testing, and application thereof
Panagoulias et al. Towards personalized nutrition applications with nutritional biomarkers and machine learning
CN117079716B (en) Deep learning prediction method of tumor drug administration scheme based on gene detection
CN112037863B (en) Early NSCLC prognosis prediction system
Liu et al. Gut microbiome-based machine learning for diagnostic prediction of liver fibrosis and cirrhosis: a systematic review and meta-analysis
Bi et al. Bioinformatics analysis of key genes and miRNAs associated with Stanford type A aortic dissection
CN106415563A (en) Systems and methods for predicting a smoking status of an individual
CN117352066A (en) Construction method of mitochondrion related gene prognosis model of breast cancer
Long et al. Landscape of co-expressed genes between the myocardium and blood in sepsis and ceRNA network construction: a bioinformatic approach
CN110111890A (en) A kind of accurate health-preserving method of individual based on gene sequencing technology
CN112863604B (en) Method for predicting tumor interstitial mechanism and treatment sensitivity
Dong et al. [Retracted] Identification of Signature Genes and Construction of an Artificial Neural Network Model of Prostate Cancer
Wang et al. Identification of cancer trait genes and association analysis under pan-cancer
Wu et al. Primary tumor surgery improves survival of cancer patients with synchronous solitary bone metastasis: a large population-based study
Yuan et al. A model to predict a risk of allergic rhinitis based on mitochondrial DNA copy number
CN115631797B (en) Prediction method for predicting laryngeal squamous cell carcinoma prognosis based on autophagy related genes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination