CN112908470B - Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof - Google Patents
Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof
- Publication number
- CN112908470B (application CN202110172416.1A)
- Authority
- CN
- China
- Prior art keywords
- hepatocellular carcinoma
- score
- binding protein
- gene
- scoring system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/10—Ploidy or copy number detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Data Mining & Analysis (AREA)
- Public Health (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Primary Health Care (AREA)
- Chemical & Material Sciences (AREA)
- Databases & Information Systems (AREA)
- Genetics & Genomics (AREA)
- Pathology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Computational Biology (AREA)
- Epidemiology (AREA)
- Evolutionary Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Analytical Chemistry (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention provides a hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes and an application thereof. The input variables of the scoring system comprise a Gini coefficient and a score for each hepatocellular carcinoma-associated RNA-binding protein gene in a data set, where the score is determined from the mRNA expression level and the hazard ratio of that gene in the data set. The output variable of the scoring system is the sum of the scores of the hepatocellular carcinoma-associated RNA-binding protein genes weighted by their Gini coefficients. The scoring system can effectively evaluate and predict patient prognosis, is cross-platform and broadly applicable, correlates significantly with several clinical characteristics of patients, and has broad prospects for clinical application.
Description
Technical Field
The invention belongs to the technical field of oncology and relates to a hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes and an application thereof.
Background
Hepatocellular carcinoma (HCC) has a 5-year survival rate below 20% because its onset is insidious and early diagnosis is difficult. In recent years, surgery, radiotherapy, chemotherapy, targeted therapy and immunotherapy have advanced significantly and brought new hope to HCC patients, but the efficacy of treatment for intermediate and advanced HCC remains very limited. Many molecular mechanisms of HCC development are still unclear, so it is necessary to complete the picture of these mechanisms and identify key molecules that can assist early diagnosis, personalized treatment and prognosis of HCC.
RNA-binding proteins (RBPs) are important proteins involved in post-transcriptional regulation. Through diverse RNA-binding domains and flexible structures, they dynamically control many aspects of RNA metabolism, including RNA splicing, localization, transport and stability. It is now established that RBPs can participate in tumor initiation and progression through post-transcriptional regulation and related mechanisms; their target genes include oncogenes, cell cycle and apoptosis regulator genes, autophagy regulator genes and inflammatory factor genes. Some RBPs are differentially expressed between cancer tissue and paracancerous tissue, and their expression correlates with patient prognosis and clinical characteristics. Recent analyses of TCGA high-throughput data suggest that RBP expression is altered in many cancers, particularly HCC. Many studies have identified RBPs associated with the development and progression of HCC. For example, Zhao et al. found that the RNA-binding protein RPS3 promotes proliferation of HCC cells through post-transcriptional regulation of SIRT1 (Zhao L, Cao J, Hu K, Wang P, Li G, He X, et al. RNA-binding protein RPS3 contributes to hepatocarcinogenesis by post-transcriptionally up-regulating SIRT1. Nucleic Acids Res 2019;47(4):2011-28), and Dong et al. found that the RNA-binding protein RBM3 promotes the growth of HCC cells by regulating the production of the circular RNA SCD-circRNA 2 (Dong W, Dai ZH, Liu FC, Guo XG, Ge CM, Ding J, et al. The RNA-binding protein RBM3 promotes cell proliferation in hepatocellular carcinoma by regulating circular RNA SCD-circRNA 2 production. EBioMedicine 2019;45:155-67).
However, most previous studies focused on the function and mechanism of a single RBP in HCC cells in vitro, and the expression pattern of the same gene can differ markedly between data sets, sometimes with completely opposite results. When a study includes only a small number of cohorts, its results are often limited, are not broadly applicable and may not reflect the actual situation. The prior art lacks both a systematic retrospective study of RBPs and studies of their clinical application. Although some studies have addressed the relationship between RBPs and clinical prognosis, many of them are based on a single data set, so their results are inconsistent and sometimes contradictory.
It is therefore necessary to obtain consistent and clinically useful RBP-related data from multiple cohorts and large sample sizes.
Disclosure of Invention
To address the shortcomings of the prior art and meet practical needs, the invention provides a hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes and an application thereof. The scoring system can effectively predict the overall survival (OS) and disease-free survival (DFS) of HCC patients and has high clinical application value.
To achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the invention provides a hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes, wherein the input variables of the scoring system comprise the Gini coefficient and the score of each hepatocellular carcinoma-associated RNA-binding protein gene in a data set;
the score is determined from the mRNA expression level and the hazard ratio of the hepatocellular carcinoma-associated RNA-binding protein gene in the data set.
Based on the results of Cox survival analysis and a random forest model, the invention constructs a hepatocellular carcinoma prognosis scoring system, RBP-score, from the mRNA expression values of 10 key HCC-associated RBP genes whose importance is related to the prognosis and clinical characteristics of HCC patients. Across different HCC data sets, patients with different RBP-scores differ significantly in overall survival and disease-free survival: the higher the RBP-score, the worse both are. RBP-score is an effective, cross-platform and broadly applicable tool for evaluating patient prognosis; its prognostic performance is no weaker than that of clinical TNM (tumor-node-metastasis) staging, and it has practical clinical value.
Preferably, the hazard ratio (HR) is determined with a Cox proportional hazards model from the overall survival associated with the hepatocellular carcinoma-associated RNA-binding protein genes in the data set, the data set preferably being GSE14520, TCGA-LIHC and ICGC-LIRI-JP. A univariate Cox proportional hazards model is used, and the software reports the hazard ratio, i.e. the HR value, when the model is fitted.
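For orientation, the univariate Cox proportional hazards model referred to above is conventionally written as follows. The symbols h_0, beta and x are standard textbook notation and are not taken from the patent text; x is assumed here to be the mRNA expression of a single gene (or a high/low indicator derived from it).

```latex
h(t \mid x) = h_0(t)\,\exp(\beta x), \qquad \mathrm{HR} = \exp(\beta)
```

Under this notation, HR > 1 indicates that higher expression is associated with a higher hazard (worse overall survival), and HR < 1 indicates a protective association.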
Preferably, the score is 0 or 1;
when the mRNA expression level of a hepatocellular carcinoma-associated RNA-binding protein gene is greater than or equal to the median expression level and the hazard ratio is >1, or when the mRNA expression level is below the median expression level and the hazard ratio is <1, the score is 1; otherwise the score is 0.
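The scoring rule above can be sketched as a small Python function. This is a minimal illustration only; the function name, argument names and example values are hypothetical and do not come from the patent.

```python
def gene_score(expression: float, median_expression: float, hazard_ratio: float) -> int:
    """Return 1 when expression level and hazard ratio both point to higher risk, else 0.

    Hypothetical helper: a gene scores 1 if it is a risk gene (HR > 1) expressed
    at or above the median, or a protective gene (HR < 1) expressed below it.
    """
    high_expression = expression >= median_expression
    if high_expression and hazard_ratio > 1:
        return 1
    if not high_expression and hazard_ratio < 1:
        return 1
    return 0


# Illustrative values only (log2 expression, cohort median, integrated HR)
example = gene_score(expression=8.0, median_expression=7.5, hazard_ratio=1.4)
```

Note that a hazard ratio of exactly 1 yields a score of 0 under this reading, because neither condition of the rule is met.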
Preferably, the hepatocellular carcinoma-associated RNA-binding protein gene comprises PRPF3, SLBP, CPEB3, PPARGC1A, IGF BP3, SF3B4, ILF2, CSTF2, ACO1, and FBL.
Preferably, the data set comprises any one or a combination of at least two of the Gene Expression Omnibus (GEO) hepatocellular carcinoma cohorts, The Cancer Genome Atlas liver hepatocellular carcinoma data (TCGA-LIHC), or the International Cancer Genome Consortium Japanese liver cancer data (ICGC-LIRI-JP).
By integrating mRNA expression data from multiple large, cross-platform HCC cohorts, the invention identifies 30 hepatocellular carcinoma-associated RNA-binding protein genes that are consistently differentially expressed in HCC tissue, and then screens out 10 key genes with the random forest machine learning algorithm, which gives better accuracy and higher application value.
Preferably, the GEO hepatocellular carcinoma cohorts comprise any one or a combination of at least two of GSE14520, GSE22058, GSE25097, GSE36376, GSE45436, GSE64041, GSE76427, GSE54236 or GSE63898.
Preferably, the data set comprises GSE14520, TCGA-LIHC and ICGC-LIRI-JP.
Preferably, the output variable of the hepatocellular carcinoma prognosis scoring system is the sum of the scores of the hepatocellular carcinoma-associated RNA-binding protein genes weighted by their Gini coefficients.
Preferably, the calculation formula of the hepatocellular carcinoma prognosis scoring system is as follows:
RBP-score=∑(Gene_score×Gene_Weight)
where Gene_Weight is the Gini coefficient and Gene_score is the score, which takes the value 0 or 1.
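The formula can be illustrated with a short Python sketch. The weights below are hypothetical placeholders chosen for the example; the patent text does not list the actual Gini coefficients of the 10 genes, and only three gene names are used to keep the example short.

```python
from typing import Dict


def rbp_score(gene_scores: Dict[str, int], gene_weights: Dict[str, float]) -> float:
    """Compute RBP-score = sum(Gene_score x Gene_Weight) over the weighted genes."""
    missing = set(gene_weights) - set(gene_scores)
    if missing:
        raise KeyError(f"missing Gene_score for: {sorted(missing)}")
    return sum(gene_scores[gene] * weight for gene, weight in gene_weights.items())


# Hypothetical weights and 0/1 scores, for illustration only
weights = {"CSTF2": 6.0, "SF3B4": 5.5, "CPEB3": 5.25}
scores = {"CSTF2": 1, "SF3B4": 0, "CPEB3": 1}
total = rbp_score(scores, weights)  # 6.0 + 5.25
```

Because each Gene_score is 0 or 1, the RBP-score is simply the sum of the weights of the genes whose expression pattern indicates higher risk.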
The RBP-score scoring system, constructed on the basis of the random forest algorithm, has high potential value for indicating and predicting patient prognosis. RBP-score can effectively predict the OS and DFS of HCC patients and correlates with other clinical characteristics of the patient, namely TNM stage, AFP and metastasis risk. Its clinical value has been verified in data sets from different platforms, and Cox analysis indicates that RBP-score is an independent risk factor for poor prognosis in HCC patients.
Specifically, Gene_score is determined according to the following table: if a gene has an integrated HR >1 and mRNA expression greater than or equal to the median, or an integrated HR <1 and mRNA expression below the median, the Gene_score of that gene is 1; otherwise the Gene_score is 0. In the table, high expression is defined as an expression value greater than or equal to the median and low expression as an expression value below the median. The integrated HR values are obtained from the GSE14520, TCGA-LIHC and ICGC-LIRI-JP data with a univariate Cox proportional hazards model.
In a second aspect, the invention provides a hepatocellular carcinoma-associated RNA binding protein gene marker comprising PRPF3, SLBP, CPEB3, PPARGC1A, IGF BP3, SF3B4, ILF2, CSTF2, ACO1, and FBL.
Preferably, the hepatocellular carcinoma-associated RNA binding protein gene markers further comprise POLR2G, MBNL2, PUF60, RALY, LSM4, CASC3, ZFP36, LARP1, SNRPC, TSNAX, RBM34, IGF2BP2, ABCF1, NSUN6, RBMS3, NONO, LSM2, SNRPB, ZGPAT, and XPO5.
The overall technical roadmap of the invention is shown in FIG. 1. First, in the gene chip data of 9 hepatocellular carcinoma cohorts from the Gene Expression Omnibus, the robust rank aggregation (RRA) algorithm is used to screen, from 430 RBPs with well-defined functions, 30 RBPs whose expression patterns are consistent across the 9 HCC cohorts. RRA can effectively integrate microarray data from multiple platforms. Screening HCC-associated RBPs of good specificity with RRA reduces the dimensionality of the data, narrowing the scope of study from 430 RBPs to 30 and greatly reducing the research burden. The integration results were then verified in the RNA sequencing data of TCGA-LIHC and ICGC-LIRI-JP, where the expression patterns of the 30 HCC-associated RBPs were fully consistent with those in the 9 mRNA gene chip data sets.
The invention further uses a random forest algorithm to calculate the importance of the 30 HCC-associated RBP genes in determining 5-year survival, and screens out the 10 most important genes.
In a third aspect, the present invention provides a screening method of the hepatocellular carcinoma-associated RNA-binding protein gene marker of the second aspect, comprising:
screening RNA-binding protein genes with consistent differential expression from the GEO hepatocellular carcinoma cohort data to obtain initial hepatocellular carcinoma-associated RNA-binding protein genes.
Preferably, 30 RNA-binding protein genes with consistent differential expression are screened from the 9 GEO hepatocellular carcinoma cohorts with the robust rank aggregation (RRA) algorithm and further verified in the TCGA-LIHC and ICGC-LIRI-JP cohorts. The 30 initial RNA-binding protein genes were analyzed for copy number variation, single-nucleotide mutation and promoter methylation, and some of them were found to be risk factors or protective factors for patient prognosis.
Preferably, the screening method further comprises:
taking TCGA-LIHC as the training set, dividing the training samples into patients who survived 5 years and patients who did not, and establishing a random forest classification model;
and evaluating the initial hepatocellular carcinoma-associated RNA-binding protein genes with the random forest classification model to obtain the key hepatocellular carcinoma-associated RNA-binding protein genes.
In a fourth aspect, the present invention provides the use of the hepatocellular carcinoma prognosis scoring system of the first aspect for preparing a hepatocellular carcinoma prognosis monitoring product.
Preferably, the hepatocellular carcinoma prognosis monitoring product comprises a hepatocellular carcinoma prognosis monitoring kit and/or a hepatocellular carcinoma prognosis monitoring medical device.
Compared with the prior art, the invention has the following beneficial effects:
(1) By integrating and analyzing mRNA expression data from multiple large, cross-platform HCC cohorts, the invention obtains an expression profile of HCC-associated RBP genes whose differential expression is highly consistent, and constructs a hepatocellular carcinoma prognosis scoring system with the random forest machine learning model. The scoring system is a simple and powerful prognosis evaluation tool that is cross-platform and suitable for different subgroups of HCC patients;
(2) The HCC-related RNA binding protein gene identified by the invention can be used as a novel hepatocellular carcinoma diagnosis and treatment target and used for HCC prognosis prediction and evaluation.
Drawings
FIG. 1 is a general technical roadmap;
FIG. 2A shows the identification, by the robust rank aggregation algorithm, of 30 HCC-associated RBPs with highly consistent expression across the HCC cohorts; FIG. 2B shows the expression of the 30 HCC-associated RBPs in the 9 HCC chip data cohorts; FIG. 2C shows their expression in the TCGA-LIHC RNA sequencing data; FIG. 2D shows their expression in the ICGC-LIRI-JP RNA sequencing data; FIG. 2E shows their expression across TNM stages of HCC patients in the TCGA-LIHC cohort; FIG. 2F shows their expression across TNM stages of HCC patients in the ICGC-LIRI-JP cohort; FIG. 2G and FIG. 2H show that the expression profile of the 30 HCC-associated RBP genes can effectively distinguish HCC tissue from normal tissue in the TCGA-LIHC and ICGC-LIRI-JP cohorts, respectively;
FIG. 3A shows the overall-survival hazard of the 30 HCC-associated RBPs in TCGA-LIHC calculated with the Cox model; FIG. 3B shows Kaplan-Meier curves of the RBPs associated with overall survival in TCGA-LIHC; FIG. 3C shows Kaplan-Meier curves of the RBPs associated with disease-free survival in TCGA-LIHC; FIG. 3D shows the integrated survival analysis of the 30 HCC-associated RBPs across the three HCC data sets TCGA-LIHC, GSE14520 and ICGC-LIRI-JP;
FIG. 4A shows the overall survival of TCGA-LIHC patients calculated with Kaplan-Meier curves; FIG. 4B shows the overall survival of GSE14520 patients calculated with Kaplan-Meier curves; FIG. 4C shows the overall survival of ICGC-LIRI-JP patients calculated with Kaplan-Meier curves; FIG. 4D evaluates, with ROC curves, the accuracy and specificity for 1-year, 3-year and 5-year overall survival of TCGA-LIHC patients; FIG. 4E evaluates the same for GSE14520 patients; FIG. 4F evaluates the same for ICGC-LIRI-JP patients; FIG. 4G shows the disease-free survival analysis of TCGA-LIHC patients; FIG. 4H shows the disease-free survival analysis of GSE14520 patients;
FIG. 5A shows the prognostic results of RBP-score in each subgroup of HCC patients after stratification by gender; FIG. 5B shows the results after stratification by age; FIG. 5C shows the results after stratification by TNM stage; FIG. 5D shows the results after stratification by AFP level; FIG. 5E shows the results after stratification by HBV infection; FIG. 5F shows the results after stratification by HCV infection; FIG. 5G shows the results after stratification by liver cirrhosis.
Detailed Description
The technical means adopted by the invention and the effects thereof are further described below with reference to the examples and the attached drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof.
Where specific techniques or conditions are not indicated in the examples, they follow those described in the literature in this field or the product specifications. Reagents or instruments for which no manufacturer is indicated are conventional products commercially available through regular channels.
Example 1
This example first obtained the mRNA chip data of 9 HCC cohorts, GSE14520, GSE22058, GSE25097, GSE36376, GSE45436, GSE64041, GSE76427, GSE54236 and GSE63898, from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). It then collected The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data, including mRNA expression data, from the NIH GDC Data Portal (https://portal.gdc.cancer.gov/), and the International Cancer Genome Consortium Japanese liver cancer (ICGC-LIRI-JP) data from the ICGC DCC (https://dcc.icgc.org/releases).
The gene identifiers of the HCC cohort data sets were converted to the latest HUGO gene symbols, and the mRNA expression data were log2-transformed.
In the following examples, statistical analysis was performed with R software (version 3.6.1). Differences between two groups were analyzed with the independent-samples t-test for normally distributed data and with the Wilcoxon test for non-normally distributed data. The relationships of individual HCC-associated RBP genes and of RBP-score with patient OS and DFS were analyzed by Kaplan-Meier survival analysis with the log-rank test, and the individual HCC-associated RBP genes, RBP-score and other clinical indicators affecting OS were identified by univariate and multivariate Cox analysis. The relationship between RBP-score and clinical characteristics such as TNM stage and AFP level was analyzed with the chi-square test. P <0.05 was considered statistically significant.
Example 2
This example uses the robust rank aggregation (RRA) algorithm (Kolde R, Laur S, Adler P, Vilo J. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics 2012;28(4):573-80) to integrate the 9 mRNA chip data sets and obtain mRNAs with consistent expression patterns; the RRA algorithm was run in R software (version 3.6.1). Genes with a fold change in expression between cancer and normal tissue of >1.5 or <-1.5 and P <0.05 were selected to create the HCC RRA list, which contains 1326 significantly differentially expressed genes that are consistently up-regulated or down-regulated across the 9 HCC cohorts.
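The selection criteria above can be illustrated with a deliberately simplified Python sketch. It is not the RRA algorithm, which aggregates ranked gene lists statistically; it only applies the stated thresholds (fold change >1.5 or <-1.5, P <0.05) and keeps genes that change in the same direction in every cohort. The data layout, function name and gene labels are hypothetical.

```python
from typing import Dict, List, Tuple


def consistent_genes(
    cohorts: List[Dict[str, Tuple[float, float]]],
    fc_cutoff: float = 1.5,
    p_cutoff: float = 0.05,
) -> Dict[str, str]:
    """Map gene -> 'up' or 'down' for genes significant in the same direction in all cohorts.

    Each cohort maps a gene name to (fold_change, p_value). Simplified stand-in
    for rank aggregation, shown only to make the thresholds concrete.
    """
    result: Dict[str, str] = {}
    if not cohorts:
        return result
    shared = set(cohorts[0]).intersection(*cohorts[1:])
    for gene in sorted(shared):
        directions = set()
        for cohort in cohorts:
            fold_change, p_value = cohort[gene]
            if p_value < p_cutoff and fold_change > fc_cutoff:
                directions.add("up")
            elif p_value < p_cutoff and fold_change < -fc_cutoff:
                directions.add("down")
            else:
                directions.add("not significant")
        if directions == {"up"} or directions == {"down"}:
            result[gene] = directions.pop()
    return result


# Hypothetical per-cohort results: gene -> (fold change, P value)
cohort_a = {"GENE_A": (2.1, 0.001), "GENE_B": (-1.8, 0.01), "GENE_C": (1.7, 0.2)}
cohort_b = {"GENE_A": (1.9, 0.003), "GENE_B": (-2.2, 0.02), "GENE_C": (1.6, 0.01)}
selected = consistent_genes([cohort_a, cohort_b])
```

In this toy input GENE_C is dropped because it fails the P-value threshold in one cohort, which mirrors the requirement that a gene behave consistently in all cohorts.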
A list of 430 human RBP genes was compiled by searching the database of RNA-binding specificities (RBPDB, http://rbpdb.ccbr.utoronto.ca/) and consulting the work of Gerstberger et al. (Gerstberger S, Hafner M, Tuschl T. A census of human RNA-binding proteins. Nat Rev Genet 2014;15(12):829-45). The translation products of the genes in this list are functionally characterized proteins capable of binding RNA.
As shown in FIG. 2A, intersecting the HCC RRA list with the RBP gene list identified 30 RBP mRNAs with consistent expression in the chip data sets of the 9 HCC cohorts; these were defined as HCC-associated RBP genes. The differences in expression of these 30 RBP mRNAs between cancer tissue and normal tissue in the 9 HCC cohorts are shown in FIG. 2B: 8 RBP mRNAs showed low expression in HCC tissue (P <0.05) and 22 RBP mRNAs showed high expression in HCC tissue (P <0.05).
Next, the expression of the HCC-associated RBP genes was verified in the RNA sequencing data of TCGA-LIHC and ICGC-LIRI-JP. As shown in FIG. 2C and FIG. 2D, the expression of the 30 HCC-associated RBP genes in TCGA-LIHC and ICGC-LIRI-JP was highly consistent with that in the 9 HCC cohorts.
The expression of the 30 HCC-associated RBP genes was further analyzed in tissues of different TNM stages. As shown in FIG. 2E and FIG. 2F, some RNA-binding proteins, such as XPO5 and CPEB3, differed significantly between early-stage and late-stage tissues, and this difference followed the same trend as the change between normal and tumor tissue, so these RNA-binding proteins are likely to exert a pro-cancer or anti-cancer effect. The PCA results for the 30 HCC-associated RBP genes in TCGA-LIHC and ICGC-LIRI-JP are shown in FIG. 2G and FIG. 2H. PCA shows that the mRNA expression profile of the 30 HCC-associated RBP genes can effectively distinguish tumor tissue from normal tissue, indicating that the identified genes are highly related to HCC and merit further study.
Example 3
This example further explores the clinical value of the 30 HCC-associated RBP genes by analyzing the correlation between the mRNA expression of each RBP gene and prognosis in the 3 HCC cohorts GSE14520, TCGA-LIHC and ICGC-LIRI-JP.
The survival analysis of the 30 RBP genes based on the Cox proportional hazards model is shown in FIG. 3A. With the median mRNA expression as the cut-off, Kaplan-Meier curves of overall survival and disease-free survival for each HCC-associated RBP gene are shown in FIG. 3B and FIG. 3C (only results with log-rank test P <0.05 are shown). The integrated analysis of the three data sets is shown in FIG. 3D.
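The median cut-off and the Kaplan-Meier estimate mentioned above can be sketched in plain Python. This is a minimal illustration under the usual product-limit definition, not the R workflow used in the examples; the function names and the toy follow-up data are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def split_by_median(expression: List[float]) -> List[bool]:
    """Return True for samples at or above the median expression (high group)."""
    cut_off = median(expression)
    return [value >= cut_off for value in expression]


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, survival probability) at each event time.

    events uses 1 for an observed death and 0 for a censored observation.
    """
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Toy data: follow-up in years and event indicator for four patients
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
high_group = split_by_median([5.0, 7.0, 6.0, 8.0])
```

In practice one curve would be estimated per expression group and the two curves compared with a log-rank test, as described in Example 1.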
Most of the 30 RBP genes are associated with the survival of HCC patients, suggesting that they may exert a pro-cancer or anti-cancer effect. After integrating the three data sets TCGA-LIHC, GSE14520 and ICGC-LIRI-JP, the genes CSTF2, SF3B4, PPARGC1A and RALY were found to behave consistently in all three. Taken together, the evidence shows that some of the 30 HCC-associated RBP genes are closely related to the survival of HCC patients.
Example 4
In order to obtain RBPs gene markers containing a smaller number of genes, the present example further screens key HCC-related RBPs genes using a random forest algorithm. The method comprises the following steps:
The TCGA-LIHC data set was used as the training set, and all patients were divided into those who survived 5 years and those who did not. A random forest classification model was established with the mRNA expression data of the HCC-associated RBP genes as input variables (ntree = 500). After stratified 10-fold cross-validation (cv = 10), 10 key HCC-associated RBP genes were selected according to their mean Gini importance value (cut-off = 5.1) to construct the RBP gene marker.
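The Gini-based importance used for this selection can be made concrete with a small Python sketch. It assumes that the "Mean Gini value" refers to the mean decrease in Gini impurity commonly reported by random forest software; the patent text does not state this explicitly. The sketch computes the impurity decrease of a single split, which a random forest accumulates over all splits and trees; the function names and toy data are hypothetical.

```python
from typing import List


def gini_impurity(labels: List[int]) -> float:
    """Gini impurity of binary labels (1 = survived 5 years, 0 = did not)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)  # equals 1 - p^2 - (1 - p)^2 for two classes


def gini_decrease(labels: List[int], feature: List[float], threshold: float) -> float:
    """Impurity decrease obtained by splitting samples at feature < threshold."""
    left = [y for y, x in zip(labels, feature) if x < threshold]
    right = [y for y, x in zip(labels, feature) if x >= threshold]
    n = len(labels)
    weighted_child = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted_child


# Toy example: one gene's expression separates the two outcome classes perfectly
decrease = gini_decrease([0, 0, 1, 1], [1.0, 2.0, 3.0, 4.0], threshold=2.5)
```

A gene whose expression yields large impurity decreases across many trees receives a high importance value and is more likely to pass a cut-off such as the 5.1 stated above.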
In order to apply HCC-related RBPs genes to clinic, this example constructs an HCC prognosis scoring system (prognostic score system, RBP-score) using the screened 10 key HCC-related RBPs genes, and the calculation formula of RBP-score is as follows:
RBP-score=∑(Gene_score×Gene_Weight)
where Gene_Weight is the Gini coefficient from the random forest model, and Gene_score is determined from the mRNA expression level of each of the 10 key HCC-associated RBP genes and its integrated hazard ratio (integrated HR). The integrated HR is determined from the integrated results of the overall survival Cox regression analysis of the 10 key genes in the 3 cohorts GSE14520, TCGA-LIHC and ICGC-LIRI-JP. If a gene has an integrated HR >1 and mRNA expression at or above the median expression level, or an integrated HR <1 and mRNA expression below the median expression level, its Gene_score = 1; otherwise Gene_score = 0.
Next, the value of RBP-score for patient prognosis was examined in the 3 cohorts GSE14520, TCGA-LIHC and ICGC-LIRI-JP. The RBP-score of each patient's HCC tissue was calculated with the above formula, and patients were divided into four groups, Q1, Q2, Q3 and Q4, using the lower quartile, median and upper quartile of RBP-score as cut-offs, so that RBP-score Q1 < RBP-score Q2 < RBP-score Q3 < RBP-score Q4. The overall survival (OS) and disease-free survival (DFS) of each group were then analyzed.
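The quartile grouping can be sketched with the Python standard library. The quantile method and the handling of scores that fall exactly on a cut-off are assumptions made for this example; the patent text does not specify them, and the input scores are hypothetical.

```python
from statistics import quantiles
from typing import List


def quartile_groups(scores: List[float]) -> List[str]:
    """Assign each RBP-score to Q1..Q4 using the lower quartile, median and upper quartile."""
    # method="inclusive" treats the data as the full population (an assumption)
    lower, mid, upper = quantiles(scores, n=4, method="inclusive")
    groups = []
    for score in scores:
        if score <= lower:
            groups.append("Q1")
        elif score <= mid:
            groups.append("Q2")
        elif score <= upper:
            groups.append("Q3")
        else:
            groups.append("Q4")
    return groups


# Hypothetical RBP-scores for eight patients
groups = quartile_groups([1, 2, 3, 4, 5, 6, 7, 8])
```

Because RBP-score is a sum of a limited number of weights, ties at a cut-off are possible in real data, so the tie-handling rule chosen here can shift patients between adjacent groups.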
As shown in FIG. 4A, FIG. 4B and FIG. 4C, patient OS clearly decreases as RBP-score increases. As shown in FIG. 4D, FIG. 4E and FIG. 4F, ROC analysis showed that the accuracy of RBP-score in predicting 1-year, 3-year and 5-year OS exceeded 65% in every data set. As shown in FIG. 4G and FIG. 4H, a higher RBP-score implies worse DFS in GSE14520 and TCGA-LIHC.
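How a score can be evaluated with an ROC curve may be illustrated by computing the area under the curve for a binary outcome at a fixed time horizon. This sketch ignores censoring, whereas a time-dependent ROC analysis of survival data would account for it; the patent text does not say which procedure was used, and the function name and data are hypothetical.

```python
from typing import List


def roc_auc(scores: List[float], outcomes: List[int]) -> float:
    """Area under the ROC curve: probability that a patient with the event
    (outcome 1, e.g. death within 5 years) has a higher score than one without."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical RBP-scores and 5-year death indicators for four patients
auc = roc_auc([1, 3, 2, 4], [0, 0, 1, 1])
```

An area of 0.5 corresponds to a score with no discriminative ability and 1.0 to perfect separation of the two outcome groups.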
Chi-square analysis of RBP-score against other clinical features of HCC patients (TCGA-LIHC and GSE14520) showed that, among patients with a higher RBP-score, a larger proportion had AFP >300 ng/mL, advanced TNM stage (III-IV), advanced CLIP stage (>3), tumor size >5 cm and vascular invasion.
In TCGA-LIHC and GSE14520, Cox proportional hazards analysis combined with other clinical characteristics of the patients showed that RBP-score is an independent risk factor for poor overall survival in HCC patients (HR in TCGA-LIHC = 2.57, HR in GSE14520 = 1.66, P <0.05).
Example 5
This example performs a subgroup survival analysis. HCC patients were classified into subgroups according to 7 clinical parameters: gender, age, TNM stage, alpha-fetoprotein (AFP) level, HBV status, HCV status and cirrhosis. The patients in each subgroup were further divided into a high RBP-score group and a low RBP-score group (cut-off = average RBP-score), and the overall survival (OS) and disease-free survival (DFS) of each group were analyzed.
As shown in fig. 5A, 5B, 5C, 5D, 5E, 5F and 5G, with the median value as cut-off, the RBP-score effectively predicts OS in each subgroup of TCGA-LIHC. Similar subgroup survival analyses were also performed in GSE14520 and ICGC-LIRI-JP; except for subgroups with a biased distribution, the RBP-score was indicative of OS in most subgroups. It should be noted that the RBP-score also effectively indicates OS among patients of the same clinical stage. It can be seen that the HCC molecular prognosis scoring system based on RBP genes is generally applicable.
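The subgroup split used in this example can be sketched as follows. This is a hedged illustration only: the function name split_subgroups, the record fields (id, gender, rbp_score) and the patient values are hypothetical. The cut-off function is left as a parameter because the text mentions both the average and the median; computing the cut-off over the whole cohort rather than within each subgroup is an assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Sequence


def split_subgroups(
    patients: Sequence[dict],
    parameter: str,
    cutoff_fn: Callable[[List[float]], float] = mean,
) -> Dict[str, Dict[str, List[str]]]:
    """Group patients by one clinical parameter, then by high/low RBP-score."""
    # Assumption: a single cut-off is computed over the whole cohort.
    cutoff = cutoff_fn([p["rbp_score"] for p in patients])
    result = defaultdict(lambda: {"high": [], "low": []})
    for p in patients:
        label = "high" if p["rbp_score"] > cutoff else "low"
        result[p[parameter]][label].append(p["id"])
    return dict(result)


# Placeholder cohort of four hypothetical patients.
cohort = [
    {"id": "P1", "gender": "M", "rbp_score": 1.0},
    {"id": "P2", "gender": "M", "rbp_score": 3.0},
    {"id": "P3", "gender": "F", "rbp_score": 0.5},
    {"id": "P4", "gender": "F", "rbp_score": 3.5},
]
by_gender = split_subgroups(cohort, "gender")
```

Passing statistics.median as cutoff_fn gives the median-based split instead; the high and low groups of each subgroup would then be compared with a survival analysis of choice.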
In summary, the hepatocellular carcinoma prognosis scoring system constructed based on the hepatocellular carcinoma-associated RNA-binding protein gene is a simple and powerful prognosis evaluation tool, which is suitable for different subsets of HCC patients and has cross-platform characteristics, and the identified hepatocellular carcinoma-associated RNA-binding protein gene can be used as a novel HCC diagnosis and treatment target.
The applicant states that the detailed method of the present invention is illustrated by the above examples, but the present invention is not limited to the detailed method described above, i.e. it does not mean that the present invention must be practiced in dependence upon the detailed method described above. It should be apparent to those skilled in the art that any modification of the present invention, equivalent substitution of raw materials for the product of the present invention, addition of auxiliary components, selection of specific modes, etc., falls within the scope of the present invention and the scope of disclosure.
Claims (6)
1. A hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes, wherein the input variables of the hepatocellular carcinoma prognosis scoring system comprise a Gini coefficient and a score of each hepatocellular carcinoma-associated RNA-binding protein gene in a dataset;
the score is determined based on the mRNA expression level and the hazard ratio of the hepatocellular carcinoma-associated RNA-binding protein gene in the dataset;
the hepatocellular carcinoma-associated RNA-binding protein genes comprise PRPF3, SLBP, CPEB3, PPARGC1A, IGF2BP3, SF3B4, ILF2, CSTF2, ACO1 and FBL;
the dataset comprises GSE14520, The Cancer Genome Atlas hepatocellular carcinoma data (TCGA-LIHC), and the International Cancer Genome Consortium Japanese liver cancer data (ICGC-LIRI-JP);
the output variable of the hepatocellular carcinoma prognosis scoring system is a weighted sum of the Gini coefficients and the scores of the hepatocellular carcinoma-associated RNA-binding protein genes;
the calculation formula of the hepatocellular carcinoma prognosis scoring system is as follows:
RBP-score=∑(Gene_score×Gene_Weight)
wherein RBP-score is the score output by the hepatocellular carcinoma prognosis scoring system, Gene_Weight is the Gini coefficient, Gene_score is the score of the gene, and Gene_score takes a value of 0 or 1.
2. The hepatocellular carcinoma prognosis scoring system according to claim 1, wherein the hazard ratio is determined from the overall survival associated with the hepatocellular carcinoma-associated RNA-binding protein gene in the dataset, based on a univariate Cox proportional hazards model.
3. The hepatocellular carcinoma prognosis scoring system of claim 1, wherein the score is 0 or 1;
when the mRNA expression level of the hepatocellular carcinoma-associated RNA-binding protein gene is greater than or equal to the average expression level and the hazard ratio is greater than 1, or the mRNA expression level of the hepatocellular carcinoma-associated RNA-binding protein gene is less than the average expression level and the hazard ratio is less than 1, the score is 1; otherwise, the score is 0.
4. A method of screening for hepatocellular carcinoma-associated RNA-binding protein gene markers, the method comprising:
screening out RNA-binding protein genes with consistent differential expression from an integrated database of hepatocellular carcinoma cohort gene expression to obtain initial hepatocellular carcinoma-related RNA-binding protein genes;
the screening method further comprises the steps of:
taking The Cancer Genome Atlas hepatocellular carcinoma data (TCGA-LIHC) as a training set, dividing the training set samples into patients who survived 5 years and patients who did not survive 5 years, and establishing a random forest classification model;
classifying the initial hepatocellular carcinoma-related RNA-binding protein genes by using the random forest classification model to obtain key hepatocellular carcinoma-related RNA-binding protein genes;
the RNA binding protein genes include PRPF3, SLBP, CPEB3, PPARGC1A, IGF2BP3, SF3B4, ILF2, CSTF2, ACO1, and FBL.
5. A method of preparing a hepatocellular carcinoma prognosis monitoring product, characterized in that the method employs the hepatocellular carcinoma prognosis scoring system of any one of claims 1-3.
6. The method of claim 5, wherein the hepatocellular carcinoma prognosis monitoring product comprises a hepatocellular carcinoma prognosis monitoring kit and/or a hepatocellular carcinoma prognosis monitoring medical device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110172416.1A CN112908470B (en) | 2021-02-08 | 2021-02-08 | Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112908470A (en) | 2021-06-04 |
CN112908470B (en) | 2023-10-03 |
Family
ID=76122735
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110172416.1A Active CN112908470B (en) | 2021-02-08 | 2021-02-08 | Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112908470B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113416786A (en) * | 2021-08-09 | 2021-09-21 | 深圳市人民医院 | Biomarker combination for hepatocellular carcinoma prognosis evaluation and screening method and application thereof |
CN113611363B (en) * | 2021-08-09 | 2023-11-28 | 上海基绪康生物科技有限公司 | Method for identifying cancer driving gene by using consensus prediction result |
CN115920006B (en) * | 2022-09-19 | 2023-09-05 | 山东大学 | Application of ABCF1 or agonist thereof in preparation of anti-DNA virus preparation |
CN115807089A (en) * | 2022-11-14 | 2023-03-17 | 石河子大学 | Hepatocellular carcinoma prognosis biomarker and application thereof |
CN116844685B (en) * | 2023-07-03 | 2024-04-12 | 广州默锐医药科技有限公司 | Immunotherapeutic effect evaluation method, device, electronic equipment and storage medium |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1852974A (en) * | 2003-06-09 | 2006-10-25 | 密歇根大学董事会 | Compositions and methods for treating and diagnosing cancer |
CN101622348A (en) * | 2006-12-08 | 2010-01-06 | 奥斯瑞根公司 | Gene and the approach regulated as the miR-20 of targets for therapeutic intervention |
CN101801419A (en) * | 2007-06-08 | 2010-08-11 | 米尔纳疗法公司 | Gene and path as the miR-34 regulation and control for the treatment of the target of intervening |
CN104271033A (en) * | 2012-05-03 | 2015-01-07 | 曼迪奥研究有限公司 | Methods and systems of evaluating a risk of a gastrointestinal cancer |
CN106771200A (en) * | 2016-11-22 | 2017-05-31 | 陈静 | Application and kit of the IGF2BP3 and AFP joint marks in liver cancer diagnosis and treatment assessment kit is prepared |
CN107657149A (en) * | 2017-09-12 | 2018-02-02 | 中国人民解放军军事医学科学院生物医学分析中心 | System for predicting liver cancer patient prognosis |
CN107922973A (en) * | 2015-07-07 | 2018-04-17 | 远见基因组系统公司 | Method and system for the modification detection based on sequencing |
CN108410984A (en) * | 2018-02-11 | 2018-08-17 | 中山大学 | RBMS3 is as tumor drug resistance detection, treatment and the application of prognosis molecule target spot |
CN108603230A (en) * | 2015-10-09 | 2018-09-28 | 南安普敦大学 | The screening of the adjusting of gene expression and protein expression imbalance |
CN109593848A (en) * | 2018-11-08 | 2019-04-09 | 浙江大学 | A kind of tumour correlated series, long-chain non-coding RNA and its application |
CN110070915A (en) * | 2017-11-10 | 2019-07-30 | 首尔大学医院 | The next generation utilizes the Prognosis in Breast Cancer prediction technique and forecasting system based on machine learning of base sequence analysis |
KR20190090939A (en) * | 2018-01-26 | 2019-08-05 | 충남대학교산학협력단 | Biomarker composition comprising human cytochrome P450 4A for diagnosis or predicting prognosis of hepatocellular carcinoma |
CN110993106A (en) * | 2019-12-11 | 2020-04-10 | 深圳市华嘉生物智能科技有限公司 | Liver cancer postoperative recurrence risk prediction method combining pathological image and clinical information |
CN110996990A (en) * | 2017-06-02 | 2020-04-10 | 亚利桑那州立大学董事会 | Universal cancer vaccines and methods of making and using the same |
CN111132682A (en) * | 2017-07-28 | 2020-05-08 | 雷莫内克斯生物制药有限公司 | Pharmaceutical composition for preventing or treating liver cancer |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201320061D0 (en) * | 2013-11-13 | 2013-12-25 | Electrophoretics Ltd | Materials nad methods for diagnosis and prognosis of liver cancer |
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1852974A (en) * | 2003-06-09 | 2006-10-25 | 密歇根大学董事会 | Compositions and methods for treating and diagnosing cancer |
CN101622348A (en) * | 2006-12-08 | 2010-01-06 | 奥斯瑞根公司 | Gene and the approach regulated as the miR-20 of targets for therapeutic intervention |
CN101627121A (en) * | 2006-12-08 | 2010-01-13 | 奥斯瑞根公司 | As the miRNA regulatory gene and the path for the treatment of the target of intervening |
CN101801419A (en) * | 2007-06-08 | 2010-08-11 | 米尔纳疗法公司 | Gene and path as the miR-34 regulation and control for the treatment of the target of intervening |
CN104271033A (en) * | 2012-05-03 | 2015-01-07 | 曼迪奥研究有限公司 | Methods and systems of evaluating a risk of a gastrointestinal cancer |
CN107922973A (en) * | 2015-07-07 | 2018-04-17 | 远见基因组系统公司 | Method and system for the modification detection based on sequencing |
CN108603230A (en) * | 2015-10-09 | 2018-09-28 | 南安普敦大学 | The screening of the adjusting of gene expression and protein expression imbalance |
CN106771200A (en) * | 2016-11-22 | 2017-05-31 | 陈静 | Application and kit of the IGF2BP3 and AFP joint marks in liver cancer diagnosis and treatment assessment kit is prepared |
CN110996990A (en) * | 2017-06-02 | 2020-04-10 | 亚利桑那州立大学董事会 | Universal cancer vaccines and methods of making and using the same |
CN111132682A (en) * | 2017-07-28 | 2020-05-08 | 雷莫内克斯生物制药有限公司 | Pharmaceutical composition for preventing or treating liver cancer |
CN107657149A (en) * | 2017-09-12 | 2018-02-02 | 中国人民解放军军事医学科学院生物医学分析中心 | System for predicting liver cancer patient prognosis |
CN110070915A (en) * | 2017-11-10 | 2019-07-30 | 首尔大学医院 | The next generation utilizes the Prognosis in Breast Cancer prediction technique and forecasting system based on machine learning of base sequence analysis |
KR20190090939A (en) * | 2018-01-26 | 2019-08-05 | 충남대학교산학협력단 | Biomarker composition comprising human cytochrome P450 4A for diagnosis or predicting prognosis of hepatocellular carcinoma |
CN108410984A (en) * | 2018-02-11 | 2018-08-17 | 中山大学 | RBMS3 is as tumor drug resistance detection, treatment and the application of prognosis molecule target spot |
CN109593848A (en) * | 2018-11-08 | 2019-04-09 | 浙江大学 | A kind of tumour correlated series, long-chain non-coding RNA and its application |
CN110993106A (en) * | 2019-12-11 | 2020-04-10 | 深圳市华嘉生物智能科技有限公司 | Liver cancer postoperative recurrence risk prediction method combining pathological image and clinical information |
Non-Patent Citations (3)
Title |
---|
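"Prognostic potential of PRPF3 in hepatocellular carcinoma"; Liu, YL et al.; AGING-US; 2020-01-15; vol. 12, no. 1; pp. 912-930 * |
"Screening and functional prediction of liver cancer-related differentially expressed long non-coding RNAs in the TCGA database"; Sun Jinqi et al.; Chinese Journal of Gastroenterology and Hepatology; vol. 28, no. 2; pp. 147-153 * |
"Analysis of the role of SNRPB in liver cancer development based on a protein interaction network"; Li Kangzhi et al.; Genomics and Applied Biology; 2019-10-31; vol. 38, no. 10; pp. 4673-4679 * |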
Also Published As
Publication number | Publication date |
---|---|
CN112908470A (en) | 2021-06-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||