LU103183B1 - Method for building prognosis model of lung adenocarcinoma based on cuproptosis-related genes - Google Patents

Method for building prognosis model of lung adenocarcinoma based on cuproptosis-related genes

Info

Publication number
LU103183B1
Authority
LU
Luxembourg
Prior art keywords
lung adenocarcinoma
prognosis model
genes
survival
prognosis
Prior art date
Application number
LU103183A
Other languages
German (de)
Inventor
Zemao Zheng
Jing Ge
Yahui Hu
Original Assignee
Nanfang Hospital Southern Medical Univ
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanfang Hospital Southern Medical Univ
Application granted
Publication of LU103183B1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/40: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20: Supervised data analysis
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00: ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Primary Health Care (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Pathology (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The application provides a method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes. The method includes: acquiring single cell data of lung adenocarcinoma from a comprehensive database of gene expression; analyzing the single cell data of the lung adenocarcinoma according to the single cell analysis to obtain the cell subgroup with the highest activity of the copper death pathway; screening differentially expressed genes between the cell subgroup with the highest activity of the copper death pathway and other cell subgroups; screening the differentially expressed genes to obtain survival prediction genes for lung adenocarcinoma; establishing the prognosis model of lung adenocarcinoma according to the survival prediction gene of lung adenocarcinoma, wherein the survival prediction gene of lung adenocarcinoma includes ANKRD29, RHOV, TLE1 and NPAS2.

Description

S166P6LU-3026-00001LU 21.07.2023
METHOD FOR BUILDING PROGNOSIS MODEL OF LUNG ADENOCARCINOMA BASED ON CUPROPTOSIS-RELATED GENES
TECHNICAL FIELD
[0001] The application relates to the technical field of health management, in particular to a method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes.
BACKGROUND
[0002] Lung adenocarcinoma (LUAD) is a subtype of lung cancer with high incidence and mortality worldwide, posing a serious threat to human health. At present, lung adenocarcinoma is often difficult to detect. Most patients are at an advanced stage at the time of diagnosis, and the distant spread of cancer cells leads to serious health consequences. Chemotherapy, radiotherapy, and surgery are routine treatments for lung cancer, but treatment effects vary among patients. Some patients have a poor prognosis and require timely intervention. Therefore, there is an urgent need to identify prognostic features that predict the long-term survival of LUAD patients, in order to provide a basis for personalized treatment plans and improved prognosis.
[0003] Cell death is a common phenomenon in life and a hot topic in life science research. Cells can die in different ways, including apoptosis, necrosis and ferroptosis. Metal ions play an important role in cellular function, and copper is an essential trace element for the human body. However, exposing cells to too much or too little copper may lead to cell death. In mammalian cells, copper content is usually low, and exceeding the steady-state threshold required by cells can trigger cytotoxic reactions. Recent research has discovered a novel mode of cell death that depends on copper ion regulation, known as copper-induced cell death, or cuproptosis. However, the role of cuproptosis-related genes in lung adenocarcinoma remains unclear.
SUMMARY
[0004] In view of this, it is necessary to provide a method to construct a prognosis model of lung adenocarcinoma based on cuproptosis-related genes, which can overcome at least one of the above shortcomings.
[0005] First, an embodiment of the application provides a method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes. The method includes: acquiring single cell data of lung adenocarcinoma from a comprehensive database of gene expression; analyzing the single cell data of the lung adenocarcinoma according to single cell analysis to obtain the cell subgroup with the highest activity of the copper death pathway; screening differentially expressed genes between the cell subgroup with the highest activity of the copper death pathway and other cell subgroups; screening the differentially expressed genes to obtain survival prediction genes for lung adenocarcinoma; and establishing the prognosis model of lung adenocarcinoma according to the survival prediction genes of lung adenocarcinoma, wherein the survival prediction genes of lung adenocarcinoma include ANKRD29, RHOV, TLE1 and NPAS2.
[0006] According to an embodiment of the present application, the cell subgroup with the highest activity of the copper death pathway is the epithelial cell subgroup.
[0007] According to an embodiment of the application, the screening of the differentially expressed genes to obtain survival prediction genes for lung adenocarcinoma includes: applying a univariate Cox survival analysis to the differentially expressed genes to obtain prognosis-related genes; and applying a random forest survival algorithm to screen the prognosis-related genes to obtain the survival prediction genes of the lung adenocarcinoma.
[0008] According to an embodiment of the present application, the method further comprises: screening the differentially expressed genes between the epithelial cell subpopulation and other cell subpopulations based on the FindMarkers algorithm.
[0009] According to an embodiment of the application, establishing the prognosis model of lung adenocarcinoma according to the survival prediction genes of lung adenocarcinoma includes: obtaining the transcriptional map and clinical information of lung adenocarcinoma patients from The Cancer Genome Atlas; and establishing the prognosis model of lung adenocarcinoma according to the survival prediction genes of the lung adenocarcinoma, the transcriptional map and the clinical information.
[0010] According to an embodiment of the application, the acquisition of single cell data of lung adenocarcinoma from the comprehensive database of gene expression also includes: excluding the clinical information of patients whose survival time is less than 30 days.
[0011] According to an embodiment of the application, the prognosis model of lung adenocarcinoma has a risk coefficient equal to 0.0595 * NPAS2 + 0.1717 * TLE1 + 0.1217 * RHOV + (-0.073) * ANKRD29.
[0012] According to an embodiment of the application, the lung adenocarcinoma prognosis model is used to predict the 1-year, 3-year and 5-year survival rates of lung adenocarcinoma patients.
[0013] According to an embodiment of the present application, the method further comprises: establishing a training group and a validation group based on the transcription map and clinical information; training the lung adenocarcinoma prognosis model according to the training group; validating the prediction results of the lung adenocarcinoma prognosis model according to the validation group.
[0014] According to an embodiment of the application, the method further includes: inputting the data of the training group into the lung adenocarcinoma prognosis model to obtain the median of the risk coefficient; and dividing the patients with lung adenocarcinoma into a high-risk group and a low-risk group according to the median.
[0015] The method for building the prognosis model of lung adenocarcinoma based on cuproptosis-related genes provided in the embodiments of the application can more accurately assess the prognosis risk of patients and provide a reference for individualized treatment decisions. The prognosis model supports the development of more effective treatment plans based on the patient's gene expression characteristics and survival information, with the aim of improving the patient's survival rate and quality of life.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] Figure 1 is a flow diagram of the method for building the prognosis model of lung adenocarcinoma based on cuproptosis-related genes provided by an embodiment of the application.
[0017] Figure 2 is a schematic diagram of the activity of the copper death pathway provided in an embodiment of the present application.
[0018] Figure 3 is a schematic diagram of the activity of the copper death pathway in a cell subpopulation provided in an embodiment of the present application.
[0019] Figure 4 is a schematic diagram of the gene prediction capability provided by an embodiment of the present application.
[0020] Figure 5a is a schematic diagram of K-M survival analysis provided in an embodiment of the present application.
[0021] Figure 5b is a schematic diagram of K-M survival analysis provided in an embodiment of the present application.
[0022] Figure 6 is a schematic diagram of the survival rate of lung adenocarcinoma patients provided by an embodiment of the application.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0023] To better expound the objectives, the technical solution, and the advantages of an embodiment of the application, a detailed description of a specific technical solution of the application will be provided below, with reference to the drawings of the embodiment of the application. The embodiment provided below is only for illustrating the application and is not intended to limit the scope of the application.
[0024] In the embodiment of the application, terms such as “first” and “second” as used herein are adopted only for the purposes of description and should not be interpreted as indicating or implying relative importance or implicitly suggesting the quantity of a technical feature indicated thereby. Thus, a feature defined with “first” or “second” may explicitly or implicitly include one or more such features. In the description of the embodiment of the application, “multiple” means two or more, unless otherwise described.
[0025] In the embodiment of the application, terms such as “comprise” and “include”, or any other variations thereof, as used herein are intended to indicate non-exclusive inclusion, such that a process, a method, an article, or a device that comprises a series of elements not only includes those elements, but also includes other elements that are not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase “comprising one ...” does not exclude the presence of additional similar elements in the process, method, article, or device including that element.
[0026] In the embodiment of the application, terms such as “illustrative” or “for example” as used herein indicate being provided as an example, illustration or explanation. In the embodiment of the application, any embodiment or design solution that is described as being “illustrative” or “for example” should not be interpreted as being better than or superior to other embodiments or design solutions. Specifically, the use of the terms “illustrative” or “for example” aims to provide a concrete representation of an abstract idea.
[0027] Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the scope of protection of the application.
[0028] Lung adenocarcinoma (LUAD) is a subtype of lung cancer with high incidence and mortality worldwide, posing a serious threat to human health. At present, lung adenocarcinoma is often difficult to detect. Most patients are at an advanced stage at the time of diagnosis, and the distant spread of cancer cells leads to serious health consequences. Chemotherapy, radiotherapy, and surgery are routine treatments for lung cancer, but treatment effects vary among patients. Some patients have a poor prognosis and require timely intervention. Therefore, there is an urgent need to identify prognostic features that predict the long-term survival of LUAD patients, in order to provide a basis for personalized treatment plans and improved prognosis.
[0029] Cell death is a common phenomenon in life and a hot topic in life science research. Cells can die in different ways, including apoptosis, necrosis and ferroptosis. Metal ions play an important role in cellular function, and copper is an essential trace element for the human body. However, exposing cells to too much or too little copper may lead to cell death. In mammalian cells, copper content is usually low, and exceeding the steady-state threshold required by cells can trigger cytotoxic reactions. Recent research has discovered a novel mode of cell death that depends on copper ion regulation, known as copper-induced cell death, or cuproptosis. However, the role of cuproptosis-related genes in lung adenocarcinoma remains unclear.
[0030] Therefore, the embodiment of this application provides a method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes, which can more accurately assess the prognosis risk of patients and provide a reference for individualized treatment decisions. The prognosis model supports the development of more effective treatment plans based on the patient's gene expression characteristics and survival information, with the aim of improving the patient's survival rate and quality of life.
[0031] Some embodiments of the application are explained in detail below with reference to the accompanying drawings. Where no conflict arises, the following embodiments and the features in the embodiments can be combined with each other.
[0032] Figure 1 is a flow diagram of a method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes provided by an embodiment of the application. As shown in Figure 1, the method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes at least includes the following steps:
S100: acquiring single cell data of lung adenocarcinoma from the comprehensive database of gene expression;
S200: analyzing the single cell data of lung adenocarcinoma according to single cell analysis to obtain the cell subgroup with the highest activity of the copper death pathway;
S300: screening differentially expressed genes between the cell subgroup with the highest activity of the copper death pathway and other cell subgroups;
S400: screening the differentially expressed genes to obtain survival prediction genes of lung adenocarcinoma;
S500: establishing the prognosis model of lung adenocarcinoma according to the survival prediction genes of lung adenocarcinoma.
[0033] S100: Obtaining the single cell data of lung adenocarcinoma from the comprehensive database of gene expression.
[0034] It can be understood that data collection is a crucial step in constructing gene expression based models. In the method of building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes, data were collected from two major databases, namely, dataset GSE131907 in the Gene Expression Omnibus (GEO) database and The Cancer Genome Atlas (TCGA) LUAD database. The GEO database is a public data resource that contains a wide range of gene expression data and related experimental information. The TCGA project aims to systematically study the genomic characteristics of multiple cancer types and provides large-scale cancer patient samples and clinical data.
[0035] Specifically, in step S100, 11 single cell datasets of lung adenocarcinoma were obtained from GSE131907 in the GEO database and downloaded for subsequent analysis. These single cell datasets provide high-resolution gene expression information for individual cells, which can be used to analyze the cell heterogeneity and molecular characteristics of lung adenocarcinoma.
[0036] Specifically, in step S100, the transcriptional map and corresponding clinical information of lung adenocarcinoma patients were subsequently extracted from the TCGA-LUAD database. The transcriptional map reflects the gene expression level in lung adenocarcinoma tissue. The clinical information includes age, gender, disease stage, survival period, etc., and can be used to further analyze the prognostic characteristics and survival rate of lung adenocarcinoma patients.
[0037] It can be understood that, by collecting these data sets, the method provided in the application can use these rich resources to carry out molecular analysis of lung adenocarcinoma and build prognostic models. This will help to understand the pathogenesis of lung adenocarcinoma, find the gene expression characteristics related to prognosis, and provide useful information for individualized treatment and prognosis evaluation.
[0038] S200: Analyzing the single cell data of lung adenocarcinoma according to single cell analysis to obtain the cell subgroup with the highest activity of the copper death pathway.
[0039] It can be understood that in step S200, the single cell data of lung adenocarcinoma were analyzed according to single cell analysis to obtain the cell subgroup with the highest activity of the copper death pathway. Specifically, single cell data can be divided into different subpopulations using dimensionality reduction and clustering methods (such as PCA, t-SNE, k-means, etc.), and each subpopulation represents a cell population with similar transcriptome characteristics. Subsequently, the activity level of the copper death pathway in each cell subgroup was calculated based on the relevant genes or gene sets of the copper death pathway. Finally, by comparing the activity levels of the copper death pathway in different cell subgroups, it can be determined which subgroup has the highest activity.
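The comparison in paragraph [0039] can be illustrated with a minimal sketch. It assumes that each cell already carries a subgroup label and a pathway activity score; the function name, the labels and the toy scores are hypothetical and are not taken from the application, and the use of the mean as the per-subgroup summary is likewise an assumption.

```python
from collections import defaultdict
from statistics import mean


def most_active_subgroup(cell_labels, cell_scores):
    """Return (label, mean score) of the subgroup with the highest mean activity."""
    by_group = defaultdict(list)
    for label, score in zip(cell_labels, cell_scores):
        by_group[label].append(score)
    # Summarize each subgroup by its mean activity (an assumed summary statistic).
    group_means = {label: mean(scores) for label, scores in by_group.items()}
    best = max(group_means, key=group_means.get)
    return best, group_means[best]


# Hypothetical toy input: one subgroup label and one activity score per cell.
labels = ["epithelial", "T cell", "epithelial", "B cell", "T cell"]
scores = [0.08, 0.02, 0.06, 0.01, 0.03]
best_label, best_mean = most_active_subgroup(labels, scores)
print(best_label, round(best_mean, 3))
```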
[0040] It can be understood that in the embodiment of this application, the cell subgroup with the highest activity of the copper death pathway is the epithelial cell subgroup.
[0041] It can be understood that in the embodiment of the application, the analysis of single cell data of lung adenocarcinoma based on single cell analysis to obtain the cell subpopulation with the highest activity of the copper death pathway also includes applying a univariate Cox survival analysis to the differentially expressed genes to obtain prognosis-related genes, and applying a random forest survival algorithm to screen the prognosis-related genes to obtain the lung adenocarcinoma survival prediction genes.
[0042] It can be understood that in the method for building the lung adenocarcinoma prognosis model based on cuproptosis-related genes provided in this application, obtaining the cell subpopulation with the highest activity of the copper death pathway helps to understand the importance of the copper death pathway in lung adenocarcinoma and the related cell types and functions.
[0043] S300: Screening differentially expressed genes between the cell subgroup with the highest activity of copper death pathway and other cell subgroups.
[0044] It can be understood that in step S300, the differentially expressed genes between the cell subgroup with the highest activity of the copper death pathway and other cell subgroups were screened. Specifically, statistical methods were used to perform differential analysis of gene expression data between the cell subpopulation with the highest activity of the copper death pathway and other cell subpopulations to identify upregulated genes. Subsequently, functional annotation and enrichment analysis were performed on the differentially expressed genes to understand their biological functional differences between cell subpopulations. Finally, based on the results of the functional annotation, the differences between the cell subpopulation with the highest activity of the copper death pathway and other cell subpopulations were explained.
[0045] It can be understood that in the embodiment of this application, the FindMarkers algorithm can be used to screen the differentially expressed genes between the epithelial cell subgroup and other cell subgroups.
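As an illustration of the kind of two-group comparison described for step S300, the following minimal sketch computes a rank-sum (Mann-Whitney U) statistic and a log2 fold change for one gene. It is not the FindMarkers implementation; the function names, the pseudocount and the toy expression values are assumptions made only for this example.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_u(target, others):
    """Mann-Whitney U statistic of the target group against the other cells."""
    ranks = average_ranks(list(target) + list(others))
    rank_sum_target = sum(ranks[: len(target)])
    return rank_sum_target - len(target) * (len(target) + 1) / 2


def log2_fold_change(target, others, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount avoids division by zero."""
    mean_target = sum(target) / len(target)
    mean_others = sum(others) / len(others)
    return math.log2((mean_target + pseudocount) / (mean_others + pseudocount))


# Hypothetical expression of one gene in epithelial cells and in all other cells.
epithelial = [5.0, 7.0, 6.0, 8.0]
other_cells = [1.0, 2.0, 0.0, 3.0]
u_stat = rank_sum_u(epithelial, other_cells)
lfc = log2_fold_change(epithelial, other_cells)
print(u_stat, round(lfc, 3))
```

A U statistic equal to the product of the two group sizes means that every target cell ranks above every other cell, and a positive log2 fold change marks the gene as upregulated in the target subgroup.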
[0046] S400: Screening differentially expressed genes to obtain survival prediction genes for lung adenocarcinoma.
[0047] Understandably, in step S400, differentially expressed genes are screened to obtain survival prediction genes for lung adenocarcinoma. Specifically, correlation analysis is conducted between the screened differentially expressed genes and the survival time and event information in the clinical data. For example, Kaplan-Meier survival analysis and Cox proportional hazards regression analysis are used to calculate the significance of survival prediction genes. The genes related to the survival of lung adenocarcinoma patients are then determined according to statistical significance.
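The Kaplan-Meier analysis mentioned in paragraph [0047] can be sketched as follows. This is a minimal standard-library illustration of the product-limit estimator, not the analysis code of the application; the function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times (months) and event indicators.
times = [5, 8, 12, 12, 20, 24]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```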
[0048] S500: The prognosis model of lung adenocarcinoma was established according to the survival prediction gene of lung adenocarcinoma.
[0049] Understandably, in step S500, a prognosis model of lung adenocarcinoma is established according to the survival prediction genes of lung adenocarcinoma. Specifically, the dataset is divided into a training set and a testing set, a survival prediction model is constructed using the training set, and the performance and predictive ability of the model are evaluated using the testing set.
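The division of the dataset in step S500 can be illustrated with a minimal sketch. The application does not state the split ratio or the randomization procedure, so the 70/30 ratio, the fixed seed, the function name and the patient identifiers below are all assumptions.

```python
import random


def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs reproducibly and split them into two lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # a fixed seed keeps the split repeatable
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical patient identifiers.
patients = [f"P{i:03d}" for i in range(10)]
train_ids, test_ids = split_patients(patients)
print(len(train_ids), len(test_ids))
```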
[0050] It can be understood that in the embodiment of the application, the transcriptional map and clinical information of lung adenocarcinoma patients can be obtained from The Cancer Genome Atlas. A prognosis model of lung adenocarcinoma can be established according to the survival prediction genes, transcriptional maps and clinical information of lung adenocarcinoma. Specifically, obtaining the patient clinical information also includes excluding clinical information with a survival time of less than 30 days.
[0051] It can be understood that in the embodiment of this application, the risk coefficient is equal to 0.0595 * NPAS2 + 0.1717 * TLE1 + 0.1217 * RHOV + (-0.073) * ANKRD29. Specifically, the prognosis model of lung adenocarcinoma can be used to predict the 1-year, 3-year and 5-year survival rates of lung adenocarcinoma patients.
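The risk coefficient of paragraph [0051] can be evaluated as in the following minimal sketch. The coefficients are those stated above; treating each gene symbol as the expression value of that gene, the function name and the toy expression values are assumptions made for illustration.

```python
# Coefficients as stated in paragraph [0051].
COEFFICIENTS = {
    "NPAS2": 0.0595,
    "TLE1": 0.1717,
    "RHOV": 0.1217,
    "ANKRD29": -0.073,
}


def risk_coefficient(expression):
    """Weighted sum of the four gene expression values."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression values for one patient.
patient = {"NPAS2": 2.0, "TLE1": 1.0, "RHOV": 3.0, "ANKRD29": 4.0}
score = risk_coefficient(patient)
print(round(score, 4))
```

The negative coefficient of ANKRD29 means that higher expression of this gene lowers the risk coefficient, while higher expression of the other three genes raises it.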
[0052] It can be understood that in the embodiment of the application, the method for building the prognosis model of lung adenocarcinoma based on cuproptosis-related genes establishes a training group and a validation group according to the transcriptional map and clinical information. Then, the prognosis model of lung adenocarcinoma is trained according to the training group. Finally, the prediction results of the prognosis model of lung adenocarcinoma are verified according to the validation group. Specifically, the data of the training group are input into the prognosis model of lung adenocarcinoma to obtain the median risk coefficient; according to the median, patients with lung adenocarcinoma are divided into a high-risk group and a low-risk group.
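The median-based grouping of paragraph [0052] can be sketched as follows. The application does not state how a risk coefficient exactly equal to the median is handled, so assigning it to the low-risk group is an assumption, as are the function name and the toy values.

```python
from statistics import median


def assign_risk_group(score, cutoff):
    """Label a patient by comparing the risk coefficient with the cutoff.

    Assumption: a score equal to the cutoff is placed in the low-risk group.
    """
    return "high-risk" if score > cutoff else "low-risk"


# Hypothetical risk coefficients of the training group.
training_scores = [0.10, 0.45, 0.30, 0.80]
cutoff = median(training_scores)
groups = [assign_risk_group(s, cutoff) for s in training_scores]
print(cutoff, groups)
```

The same cutoff, derived from the training group, can then be applied unchanged to the risk coefficients of the validation group.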
[0053] It can be understood that the method for building the lung adenocarcinoma prognosis model based on cuproptosis-related genes provided by the embodiment of the application can predict the survival period of lung adenocarcinoma patients, identify potential therapeutic targets, and support individualized treatment decisions for lung adenocarcinoma patients.
[0054] The following describes the method for building the prognosis model of lung adenocarcinoma based on cuproptosis-related genes provided in the application with an exemplary embodiment.
[0055] Firstly, 11 single cell datasets of lung adenocarcinoma were obtained from dataset GSE131907 in the Gene Expression Omnibus (GEO) database, and transcriptional profiles and corresponding clinical information of lung adenocarcinoma patients were extracted from The Cancer Genome Atlas (TCGA) LUAD database. After excluding patients with a survival time of less than 30 days, the information of 503 patients diagnosed with lung adenocarcinoma was retained to construct the prognosis model of lung adenocarcinoma.
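The exclusion rule of paragraph [0055] can be illustrated with a minimal sketch. The record layout, the field names `patient_id` and `survival_days`, and the toy records are hypothetical; only the 30-day threshold comes from the application.

```python
def exclude_short_survival(records, min_days=30):
    """Keep only patients whose survival time is at least `min_days` days."""
    return [r for r in records if r["survival_days"] >= min_days]


# Hypothetical clinical records.
records = [
    {"patient_id": "A", "survival_days": 12},
    {"patient_id": "B", "survival_days": 30},
    {"patient_id": "C", "survival_days": 845},
]
kept = exclude_short_survival(records)
print([r["patient_id"] for r in kept])
```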
[0056] Subsequently, data standardization was performed on the lung adenocarcinoma single cell data set, as well as on the transcriptional map and corresponding clinical information of lung adenocarcinoma patients. Specifically, limma 3.52.2 can be used for the data standardization.
[0057] Next, different types of cells are grouped using markers. For example, the marker CD79A corresponds to B lymphocytes; RAMP2, VWF, and ACKR1 correspond to endothelial cells; LUM, COL3A1, and DCN correspond to fibroblasts; TPSAB1 and CPA3 correspond to mast cells; CD8A, CD8B, and CD3D correspond to T lymphocytes; LYZ and C1QB correspond to myeloid cells; S100A2 and SFN correspond to epithelial cells; NKG7 corresponds to NK cells.
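The marker-based grouping of paragraph [0057] can be illustrated with a minimal sketch. The marker table follows the paragraph above; the scoring rule (average marker expression per cluster), the function name and the toy cluster values are assumptions, not the procedure of the application.

```python
# Marker genes per cell type, as listed in paragraph [0057].
MARKERS = {
    "B lymphocytes": ["CD79A"],
    "endothelial cells": ["RAMP2", "VWF", "ACKR1"],
    "fibroblasts": ["LUM", "COL3A1", "DCN"],
    "mast cells": ["TPSAB1", "CPA3"],
    "T lymphocytes": ["CD8A", "CD8B", "CD3D"],
    "myeloid cells": ["LYZ", "C1QB"],
    "epithelial cells": ["S100A2", "SFN"],
    "NK cells": ["NKG7"],
}


def annotate_cluster(mean_expression):
    """Pick the cell type whose markers have the highest average expression.

    Assumed rule for illustration; genes missing from the input count as 0.
    """
    def marker_score(genes):
        return sum(mean_expression.get(g, 0.0) for g in genes) / len(genes)

    return max(MARKERS, key=lambda cell_type: marker_score(MARKERS[cell_type]))


# Hypothetical mean expression of a few genes in one cluster.
cluster_means = {"S100A2": 4.2, "SFN": 3.9, "LYZ": 0.3, "CD3D": 0.1}
label = annotate_cluster(cluster_means)
print(label)
```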
[0058] Next, the AUCell R package is used to calculate the activity of the copper death (cuproptosis) pathway for the cell samples in each cell subgroup. AUCell evaluates the enrichment of a known gene set in each cell sample, thereby determining the activity of the copper death pathway. The cell subgroup with the highest correlation is then determined: based on the activity results of the copper death pathway, the cell subgroup with the highest activity value or the most significant enrichment level can be selected as the subgroup most related to the copper death pathway.
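To make the idea of a per-cell gene set activity score concrete, the following Python sketch computes a normalized area under the gene-set recovery curve for one cell. It is written in the spirit of AUCell but is not the AUCell implementation: the function name `gene_set_auc`, the `top_fraction` parameter, the normalization, and the toy data are all assumptions made for illustration.

```python
def gene_set_auc(expression, gene_set, top_fraction=0.05):
    """Score one cell by how early the gene set appears in its expression ranking.

    expression: dict gene -> expression value for a single cell.
    gene_set: iterable of gene symbols forming the pathway gene set.
    top_fraction: share of the ranking that is scanned (assumed parameter).
    Returns a value in [0, 1]; 1 means all scannable members are ranked first.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    max_rank = max(1, round(top_fraction * len(ranked)))
    members = set(gene_set) & set(ranked)
    if not members:
        return 0.0
    hits = 0
    area = 0
    for gene in ranked[:max_rank]:
        if gene in members:
            hits += 1
        area += hits  # height of the recovery curve at this rank
    k = min(len(members), max_rank)
    best = k * (k + 1) // 2 + k * (max_rank - k)  # area if members were ranked first
    return area / best


# Toy cells with 20 genes each; the values are made up.
gene_set = ["FDX1", "DLAT"]
active_cell = {"FDX1": 9.0, "DLAT": 8.0, **{f"G{i}": 0.1 * i for i in range(18)}}
quiet_cell = {"FDX1": 0.0, "DLAT": 0.0, **{f"G{i}": 1.0 + 0.1 * i for i in range(18)}}
print(gene_set_auc(active_cell, gene_set, top_fraction=0.25))
print(gene_set_auc(quiet_cell, gene_set, top_fraction=0.25))
```

Averaging such scores within each cell subgroup gives one simple way to compare pathway activity across subgroups.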
[0059] Specifically, the known gene set can include commonly upregulated genes, such as CDKN2A, FDX1, DLD, DLAT, LIAS, GLS, LIPT1, MTF1, PDHA1, and PDHB.
[0060] Figure 2 is a schematic diagram of the activity of the copper death pathway provided in an embodiment of the present application.
[0061] It can be understood that the higher the AUC value, the stronger the activity of the copper death pathway. As shown in Figure 2, 32011 cells have AUC values greater than 0.034. By analyzing the cells with AUC values greater than 0.034, a subset of cells with an active copper death pathway can be obtained.
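The thresholding step can be sketched as follows. The cutoff 0.034 comes from the text; the function names, the per-subgroup summary, and the toy AUC values are hypothetical and serve only to illustrate selecting cells with an active pathway.

```python
AUC_THRESHOLD = 0.034  # cutoff reported in the text


def active_cells(auc_by_cell, threshold=AUC_THRESHOLD):
    """Return the IDs of cells whose AUC value is strictly above the threshold."""
    return [cell for cell, auc in auc_by_cell.items() if auc > threshold]


def active_fraction_by_type(auc_by_cell, type_by_cell, threshold=AUC_THRESHOLD):
    """Return, for each cell type, the fraction of its cells above the threshold."""
    totals, active = {}, {}
    for cell, auc in auc_by_cell.items():
        cell_type = type_by_cell[cell]
        totals[cell_type] = totals.get(cell_type, 0) + 1
        active[cell_type] = active.get(cell_type, 0) + (auc > threshold)
    return {t: active[t] / totals[t] for t in totals}


# Toy AUC values and annotations; not taken from the dataset.
toy_auc = {"c1": 0.050, "c2": 0.040, "c3": 0.010, "c4": 0.034, "c5": 0.020}
toy_type = {"c1": "epithelial", "c2": "epithelial", "c3": "T", "c4": "T", "c5": "fibroblast"}
print(active_cells(toy_auc))
print(active_fraction_by_type(toy_auc, toy_type))
```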
[0062] Figure 3 is a schematic diagram of the activity of the copper death pathway in a cell subpopulation provided in an embodiment of the present application.
[0063] It can be understood that t-Distributed Stochastic Neighbor Embedding (t-SNE) is used to analyze the activity of the copper death pathway in different cell subpopulations. t-SNE is a dimensionality reduction technique for visualizing high-dimensional data: it maps high-dimensional data into 2D or 3D space for easy observation and analysis. Specifically, t-SNE constructs the low-dimensional representation from the similarity between data points. It uses gradient descent to minimize the difference between the pairwise similarities of data points in the high-dimensional space and those in the low-dimensional space. During dimensionality reduction, t-SNE pays particular attention to preserving the local structure of the data, that is, similar data points remain close after dimensionality reduction.
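The quantity that t-SNE minimizes can be illustrated with a short Python sketch. The standard t-SNE objective is the Kullback-Leibler divergence between pairwise similarities in the original space (Gaussian kernel) and in the embedding (Student-t kernel). The sketch below only evaluates that cost for a given embedding and does not perform the optimization. As a simplifying assumption it uses one fixed bandwidth `sigma` for all points, whereas t-SNE calibrates a bandwidth per point from the perplexity; all function names and data are illustrative.

```python
import math


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _normalize(weights):
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def high_dim_similarities(points, sigma=1.0):
    """Gaussian pairwise similarities, normalized to sum to 1 (fixed sigma: assumption)."""
    n = len(points)
    weights = [[0.0 if i == j else
                math.exp(-_sq_dist(points[i], points[j]) / (2 * sigma ** 2))
                for j in range(n)] for i in range(n)]
    return _normalize(weights)


def low_dim_similarities(points):
    """Student-t (one degree of freedom) pairwise similarities, normalized to sum to 1."""
    n = len(points)
    weights = [[0.0 if i == j else 1.0 / (1.0 + _sq_dist(points[i], points[j]))
                for j in range(n)] for i in range(n)]
    return _normalize(weights)


def tsne_cost(high, low, sigma=1.0):
    """Kullback-Leibler divergence KL(P || Q) between the two similarity matrices."""
    p = high_dim_similarities(high, sigma)
    q = low_dim_similarities(low)
    return sum(p_ij * math.log(p_ij / q_ij)
               for p_row, q_row in zip(p, q)
               for p_ij, q_ij in zip(p_row, q_row) if p_ij > 0)


# Two tight clusters in 3D, and two candidate 2D embeddings.
high = [(0, 0, 0), (0.1, 0, 0), (5, 5, 5), (5.1, 5, 5)]
good = [(0, 0), (0.1, 0), (5, 5), (5.1, 5)]   # keeps neighbors together
bad = [(0, 0), (5, 5), (0.1, 0), (5.1, 5)]    # separates neighbors
print(tsne_cost(high, good), tsne_cost(high, bad))
```

An embedding that keeps neighboring points together has a lower cost than one that separates them, which is the sense in which t-SNE preserves local structure.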
[0064] As shown in Figure 3, the copper death pathway is clearly more active in epithelial cells than in the other cell subpopulations. Subsequently, the FindMarkers algorithm was used to search for differentially expressed genes (DEGs) between the epithelial cell subpopulation and the other subpopulations.
[0065] It can be understood that the FindMarkers algorithm (provided by the Seurat R package) is a commonly used method for screening differentially expressed genes. It is based on statistical testing: it compares the gene expression levels of the cell subpopulations and identifies the genes whose expression differs significantly between them.
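As a rough illustration of this kind of screening, the sketch below tests each gene with a rank-sum (Mann-Whitney) test between one subpopulation and the rest and keeps genes that pass a p-value cutoff and a fold-change cutoff. It is not the FindMarkers implementation: the function names, the normal approximation without tie correction, the pseudocount of 1, both cutoffs, and the toy data are assumptions.

```python
import math


def rank_sum_p_value(a, b):
    """Two-sided Mann-Whitney (rank-sum) p-value using a normal approximation."""
    n1, n2 = len(a), len(b)
    # U counts pairs where the value from a exceeds the value from b (ties count 0.5).
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (simplification)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def find_markers_sketch(expr, in_group, alpha=0.05, min_log2fc=0.25):
    """Return genes that differ between cells in the group and all other cells.

    expr: dict gene -> list of expression values, one per cell.
    in_group: list of booleans, True for cells in the subpopulation of interest.
    alpha and min_log2fc are hypothetical cutoffs.
    """
    markers = []
    for gene, values in expr.items():
        a = [v for v, g in zip(values, in_group) if g]
        b = [v for v, g in zip(values, in_group) if not g]
        # Fold change of group means with a pseudocount of 1 (assumption).
        log2fc = math.log2((sum(a) / len(a) + 1) / (sum(b) / len(b) + 1))
        if abs(log2fc) >= min_log2fc and rank_sum_p_value(a, b) < alpha:
            markers.append(gene)
    return markers


# Toy data: 6 epithelial cells followed by 6 other cells; gene names are placeholders.
in_epithelial = [True] * 6 + [False] * 6
toy_expr = {
    "GENE_A": [5, 6, 7, 8, 9, 10, 0, 1, 1, 2, 0, 1],
    "GENE_B": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
}
markers = find_markers_sketch(toy_expr, in_epithelial)
print(markers)
```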
[0066] It can be understood that the differentially expressed genes screened between the epithelial cell subpopulation and the other subpopulations can be further annotated and analyzed using functional annotation tools such as clusterProfiler. Specifically, when clusterProfiler is used to annotate the functions of these DEGs, different databases and annotation resources can be used, such as Gene Ontology (GO) and biological pathway databases (such as KEGG). In the embodiment of the application, KEGG pathway enrichment analysis shows that these genes are mainly related to multiple pathways, such as focal adhesion, tight junction, and the Hippo signaling pathway.
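Pathway enrichment of this kind is commonly assessed with a hypergeometric (over-representation) test: given a background of genes, how likely is it to see at least the observed overlap between the DEG list and a pathway by chance? The following Python sketch shows that calculation. The function name and the numbers are illustrative assumptions and do not reproduce the analysis in the embodiment.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_degs, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_background: number of genes in the background universe.
    n_pathway: number of background genes annotated to the pathway.
    n_degs: number of differentially expressed genes drawn from the background.
    n_overlap: number of DEGs that are annotated to the pathway.
    """
    upper = min(n_pathway, n_degs)
    favorable = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_degs - i)
        for i in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_background, n_degs)


# Toy numbers: 20 background genes, 5 in the pathway, 5 DEGs, 3 of them in the pathway.
print(enrichment_p_value(20, 5, 5, 3))
```

When many pathways are tested, the resulting p-values are normally adjusted for multiple testing before pathways are reported as enriched.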
[0067] In addition, in the embodiment of the application, Gene Ontology analysis is also carried out. This analysis shows that the DEGs are mainly related to the Gene Ontology terms GO:0005911 and GO:0015629. In the Gene Ontology database, GO:0005911 denotes the cell-cell junction, which is involved in cell-cell adhesion, while GO:0015629 denotes the actin cytoskeleton.
[0068] In the embodiment of this application, the tinyarray package can be used to evaluate the correlation between the DEGs and prognosis by univariate Cox survival analysis. According to the set threshold, a gene is considered to be related to prognosis when its P value is less than 0.01. In the implementation example of this application, a total of 112 prognosis-related genes were selected by screening for genes with a P value less than 0.01.
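The univariate Cox screening step can be sketched in plain Python as follows. The sketch fits a one-covariate Cox model by Newton-Raphson on the partial likelihood and returns a Wald p-value, which would then be compared with the 0.01 threshold from the text. It does not use or reproduce the tinyarray package: the function name, the Breslow handling of ties, the step clamp, the iteration count, and the toy data are assumptions.

```python
import math


def univariate_cox(times, events, x, n_iter=25):
    """Fit a Cox model with a single covariate; return (beta, Wald p-value).

    times: follow-up time per patient.
    events: 1 if the patient died at that time, 0 if censored.
    x: expression value of one gene per patient.
    """
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        score, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored patients only contribute to risk sets
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(w_j * x[j] for w_j, j in zip(w, risk))
            s2 = sum(w_j * x[j] ** 2 for w_j, j in zip(w, risk))
            score += x[i] - s1 / s0               # first derivative of the log partial likelihood
            info += s2 / s0 - (s1 / s0) ** 2      # negative second derivative
        step = score / info
        beta += max(-1.0, min(1.0, step))         # clamp the step for stability (assumption)
    z = beta * math.sqrt(info)                    # Wald statistic: beta / standard error
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, p_value


# Toy cohort: higher expression tends to go with earlier death; values are made up.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
expression = [3, 4, 2, 3, 1, 2, 0, 1]
beta, p_value = univariate_cox(times, events, expression)
print(beta, p_value, p_value < 0.01)
```

A positive coefficient indicates that higher expression is associated with a higher hazard; repeating the fit for every DEG and keeping those below the P value threshold gives the candidate list.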
[0069] Subsequently, a random forest analysis algorithm was used to screen, from the 112 prognosis-related genes with a P value less than 0.01, the genes most predictive of the survival period of lung adenocarcinoma patients.
[0070] Figure 4 is a schematic diagram of the gene prediction capability provided by an embodiment of the application.
[0071] Specifically, as shown in Figure 4, the four genes with the strongest prediction ability screened by the random forest analysis algorithm are ANKRD29, RHOV, TLE1 and NPAS2.
[0072] Understandably, these prognosis-related genes may play an important role in the survival and prognosis of patients with lung adenocarcinoma. They may be related to the development, progression, and treatment response of tumors, providing valuable information for further clinical research and individualized treatment.
[0073] Subsequently, the prognosis model of lung adenocarcinoma can be constructed from ANKRD29, RHOV, TLE1 and NPAS2, and can be used to calculate the prognostic risk coefficient.
[0074] Specifically, the risk coefficient is equal to 0.0595 * NPAS2 + 0.1717 * TLE1 + 0.1217 * RHOV + (-0.073) * ANKRD29.
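The risk coefficient is a weighted sum and can be computed as in the following Python sketch. The coefficients are those stated above; the function name, the interpretation of each gene symbol as that gene's expression value for one patient, and the toy expression values are assumptions.

```python
# Coefficients as stated in the text.
COEFFICIENTS = {
    "NPAS2": 0.0595,
    "TLE1": 0.1717,
    "RHOV": 0.1217,
    "ANKRD29": -0.073,
}


def risk_coefficient(expression):
    """Weighted sum of the four gene expression values for one patient."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Toy patient with made-up expression values.
toy_patient = {"NPAS2": 2.0, "TLE1": 1.0, "RHOV": 3.0, "ANKRD29": 4.0}
risk = risk_coefficient(toy_patient)
print(round(risk, 4))
```

Because ANKRD29 has a negative coefficient, higher ANKRD29 expression lowers the risk coefficient, while higher expression of the other three genes raises it.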
[0075] Figure 5a is a schematic diagram of K-M survival analysis provided in an embodiment of the present application. Figure 5b is a schematic diagram of K-M survival analysis provided in an embodiment of the present application.
[0076] It can be understood that lung adenocarcinoma patients are divided into a high-risk group and a low-risk group according to the median of the risk coefficient, and are analyzed with the Kaplan-Meier survival curve. As shown in Figure 5a and Figure 5b, the survival period of lung adenocarcinoma patients in the low-risk group is significantly longer than that of lung adenocarcinoma patients in the high-risk group.
[0077] It can be understood that the K-M survival curve usually takes time as the horizontal axis and survival rate (or probability of survival) as the vertical axis. The curve represents the probability of survival at different time points and can display the survival differences between different groups or subgroups. The survival curve decreases over time, reflecting the accumulation of events such as death. The K-M survival curve is widely used in clinical and biomedical research, especially in evaluating treatment efficacy, predicting patient survival, and studying prognostic factors. By comparing the survival curves of different groups or subgroups, it can be determined whether there are survival differences and how significant they are.
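The median split and the Kaplan-Meier estimate can be sketched in Python as follows. The product-limit formula is the standard Kaplan-Meier estimator; the function names, the rule that patients exactly at the median go to the low-risk group, and the toy data are assumptions.

```python
from statistics import median


def split_by_median(risk_by_patient):
    """Split patients into (high_risk, low_risk) lists at the median risk coefficient.

    Patients exactly at the median are placed in the low-risk group (assumption).
    """
    cutoff = median(risk_by_patient.values())
    high = [p for p, r in risk_by_patient.items() if r > cutoff]
    low = [p for p, r in risk_by_patient.items() if r <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time per patient.
    events: 1 if the patient died at that time, 0 if censored.
    """
    curve, survival = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk  # product-limit update
        curve.append((t, survival))
    return curve


# Toy data with made-up values.
toy_risk = {"p1": 0.9, "p2": 0.1, "p3": 0.5, "p4": 0.3}
high_risk, low_risk = split_by_median(toy_risk)
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
print(high_risk, low_risk)
print(curve)
```

Computing one curve per risk group and comparing them, for example with a log-rank test, gives the kind of comparison shown in the K-M figures.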
[0078] Specifically, in the K-M survival curve, the survival curve of the low-risk group may exhibit a higher survival rate, and its rate of decline may be slower over time. In contrast, the survival curve of the high-risk group may exhibit a lower survival rate, and its rate of decline may be faster. Understandably, this result is very important for predicting the prognosis of lung adenocarcinoma patients and guiding treatment decisions. Lung adenocarcinoma patients in the low-risk group may be suited to less intensive treatment or to monitoring, while lung adenocarcinoma patients in the high-risk group may need more active and individualized treatment strategies. Therefore, according to the prognostic risk stratification of lung adenocarcinoma patients, more accurate individualized treatment can be provided for lung adenocarcinoma patients to improve their survival and prognosis.
[0079] Figure 6 is a schematic diagram of the survival rate of a patient with lung adenocarcinoma provided by an embodiment of the application.
[0080] In the embodiment of the application, Figure 6 shows the 1-year, 3-year and 5-year survival rates of patients with lung adenocarcinoma predicted by the prognosis model of lung adenocarcinoma. Specifically, the area under the receiver operating characteristic curve (AUC) for the 1-year survival rate of lung adenocarcinoma patients is 0.67, the AUC for the 3-year survival rate is 0.69, and the AUC for the 5-year survival rate is 0.64.
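How an AUC at a fixed time horizon can be obtained from risk coefficients is sketched below in Python. This is a simplified illustration, not the time-dependent ROC procedure used to produce Figure 6: patients censored before the horizon are simply dropped, and the function names and toy data are assumptions.

```python
def horizon_labels(times, events, horizon):
    """Label each patient at a time horizon.

    Returns 1 if the patient died by the horizon, 0 if known to be alive
    after it, and None if censored before the horizon (status unknown).
    """
    labels = []
    for t, e in zip(times, events):
        if t <= horizon and e:
            labels.append(1)
        elif t > horizon:
            labels.append(0)
        else:
            labels.append(None)
    return labels


def roc_auc(scores, labels):
    """Probability that a patient who died has a higher score than one who survived."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy cohort (times in years) with made-up risk coefficients.
times = [0.5, 0.8, 2.0, 3.0, 0.6, 4.0]
events = [1, 1, 1, 0, 0, 1]
risk = [0.9, 0.4, 0.5, 0.2, 0.7, 0.1]
labels = horizon_labels(times, events, horizon=1.0)
auc_1_year = roc_auc(risk, labels)
print(labels, auc_1_year)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which gives context for the reported values of 0.67, 0.69 and 0.64.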
[0081] It can be understood that the prognosis model of lung adenocarcinoma constructed by using the method for building the prognosis model of lung adenocarcinoma based on cuproptosis-related genes provided by the application examples can accurately predict the survival period of patients, and can help doctors identify high-risk patients as early as possible and take corresponding intervention measures.
[0082] The method for building the prognosis model of lung adenocarcinoma based on cuproptosis-related genes provided by the embodiment of the application can more accurately assess the prognosis risk of patients and provide a reference for individualized treatment decisions. The prognosis model supports the development of more effective treatment plans based on the patient's gene expression characteristics and survival information, with the aim of improving the patient's survival rate and quality of life.
[0083] The sequence of the above-discussed embodiments of the application is adopted for illustration purposes and does not indicate the superiority of any embodiment. The above provides only the preferred embodiments of the application and is not intended to limit the scope of the claims of the application. Equivalent structures or equivalent process variations based on the contents of the description and the drawings of the application, or direct or indirect application thereof to other related fields of technology, are all considered to be included in the scope of protection of the claims of the application.

Claims (10)

1. A method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes, characterized in that the method includes: obtaining single cell data of lung adenocarcinoma from a comprehensive database of gene expression; analyzing the single cell data of lung adenocarcinoma according to the single cell analysis to obtain a cell subgroup with the highest activity of the copper death pathway; screening differentially expressed genes between the cell subgroup with the highest activity of the copper death pathway and other cell subgroups; screening the differentially expressed genes to obtain survival prediction genes of lung adenocarcinoma; establishing the prognosis model of lung adenocarcinoma according to the survival prediction gene of lung adenocarcinoma, wherein the survival prediction gene of lung adenocarcinoma includes ANKRD29, RHOV, TLE1 and NPAS2.
2. The method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes as described in claim 1, characterized in that: the cell subgroup with the highest activity of the copper death pathway is the epithelial cell subgroup.
3. The method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes according to claim 2, characterized in that screening the differentially expressed genes to obtain survival prediction genes of lung adenocarcinoma includes: applying a univariate Cox survival analysis algorithm to analyze the differentially expressed genes to obtain prognosis-related genes; and applying the random forest survival algorithm to screen the prognosis-related genes to obtain the survival prediction genes of the lung adenocarcinoma.
4. The method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes as described in claim 2, characterized in that the method further includes: filtering the differentially expressed genes between the epithelial cell subpopulation and other cell subpopulations according to a FindMarkers algorithm.
5. The method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes according to claim 1, characterized in that the method for establishing a prognosis model of lung adenocarcinoma based on the survival prediction gene of lung adenocarcinoma includes: obtaining the transcriptional map and clinical information of lung adenocarcinoma patients according to the Cancer Genome Atlas; and establishing the prognosis model of the lung adenocarcinoma according to the survival prediction gene of the lung adenocarcinoma, the transcriptional map and the clinical information.
6. The method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes according to claim 5, characterized in that the single cell data of lung adenocarcinoma obtained from the gene expression comprehensive database further includes: excluding clinical information where the patient's survival time is less than 30 days.
7. The method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes according to claim 6, characterized in that the prognosis model of lung adenocarcinoma has a risk coefficient which is equal to 0.0595 * NPAS2 + 0.1717 * TLE1 + 0.1217 * RHOV + (-0.073) * ANKRD29.
8. The method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes as claimed in claim 7, characterized in that the lung adenocarcinoma prognosis model is used to predict 1-year, 3-year, and 5-year survival rates of lung adenocarcinoma patients.
9. The method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes as claimed in claim 7, characterized in that the method further includes: establishing a training group and a validation group based on the transcriptome and clinical information; training the lung adenocarcinoma prognosis model according to the training group; and validating the prediction results of the lung adenocarcinoma prognosis model according to the validation group.
10. The method for building a prognosis model of lung adenocarcinoma based on cuproptosis-related genes according to claim 9, characterized in that the method further includes: inputting the data of the training group into the lung adenocarcinoma prognosis model to obtain the median of the risk coefficient; and dividing the patients with lung adenocarcinoma into a high-risk group and a low-risk group according to the median.
LU103183A 2023-06-16 2023-07-31 Method for building prognosis model of lung adenocarcinoma based on cuproptosis-related genes LU103183B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310725068.5A CN116758986A (en) 2023-06-16 2023-06-16 Construction method of lung adenocarcinoma prognosis model based on copper death related gene

Publications (1)

Publication Number Publication Date
LU103183B1 true LU103183B1 (en) 2024-01-31

Family

ID=87947411

Family Applications (1)

Application Number Title Priority Date Filing Date
LU103183A LU103183B1 (en) 2023-06-16 2023-07-31 Method for building prognosis model of lung adenocarcinoma based on cuproptosis-related genes

Country Status (2)

Country Link
CN (1) CN116758986A (en)
LU (1) LU103183B1 (en)

Also Published As

Publication number Publication date
CN116758986A (en) 2023-09-15
