CN112908470B - Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof - Google Patents

Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof

Info

Publication number
CN112908470B
Authority
CN
China
Prior art keywords
hepatocellular carcinoma
score
binding protein
gene
scoring system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110172416.1A
Other languages
Chinese (zh)
Other versions
CN112908470A (en
Inventor
刘利平
张强弩
刘权
严巧婷
张育森
孙哲
鲍世韵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Peoples Hospital
Original Assignee
Shenzhen Peoples Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Peoples Hospital
Priority to CN202110172416.1A
Publication of CN112908470A
Application granted
Publication of CN112908470B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10 Ploidy or copy number detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Primary Health Care (AREA)
  • Chemical & Material Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Genetics & Genomics (AREA)
  • Pathology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Epidemiology (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Analytical Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes and application thereof. The input variables of the scoring system comprise the Gini coefficient and the score of each hepatocellular carcinoma-associated RNA-binding protein gene in a data set, where the score is determined from the mRNA expression level and the hazard ratio of that gene in the data set. The output variable of the scoring system is the weighted sum of the Gini coefficients and the scores of the hepatocellular carcinoma-associated RNA-binding protein genes. The scoring system can effectively evaluate and predict patient prognosis, and the score is significantly associated with some clinical characteristics of the patients. It is cross-platform and generally applicable, and has broad prospects for clinical application.

Description

Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof
Technical Field
The invention belongs to the technical field of oncology, and relates to a hepatocellular carcinoma prognosis scoring system based on an RNA binding protein gene and application thereof.
Background
Hepatocellular carcinoma (HCC) has a 5-year survival rate below 20% because its onset is insidious and early diagnosis is difficult. In recent years, surgery, radiotherapy, chemotherapy, targeted therapy and immunotherapy have advanced significantly and brought new hope to HCC patients, but the efficacy of treatment for middle- and late-stage HCC remains very limited. Many molecular mechanisms in the development of HCC are still unclear, so it is necessary to complete the picture of these mechanisms and to find key molecules that can assist early diagnosis, personalized treatment and prognosis of HCC.
RNA-binding proteins (RBPs) are important proteins involved in post-transcriptional regulation. By virtue of diverse RNA-binding domains and flexible structures, they dynamically control many aspects of RNA metabolism, including RNA splicing, localization, transport and stability. It is now established that RBPs can participate in tumor initiation and progression through post-transcriptional regulation and related mechanisms; their target genes include oncogenes, cell cycle/apoptosis regulator genes, autophagy regulator genes, inflammatory factor genes and others. Some RBPs are differentially expressed between cancer tissue and paracancerous tissue, and this differential expression correlates with patient prognosis and clinical characteristics. Recent TCGA-based high-throughput data analyses suggest that RBP expression is altered in a variety of cancers, particularly HCC. Many studies have identified RBPs associated with the development and progression of HCC. For example, Zhao et al. found that the RNA-binding protein RPS3 promotes proliferation of HCC cells through post-transcriptional regulation of SIRT1 (Zhao L, Cao J, Hu K, Wang P, Li G, He X, et al. RNA-binding protein RPS3 contributes to hepatocarcinogenesis by post-transcriptionally up-regulating SIRT1. Nucleic Acids Res 2019;47(4):2011-28.); Dong et al. found that the RNA-binding protein RBM3 promotes the growth of HCC cells by regulating the production of the circular RNA SCD-circRNA 2 (Dong W, Dai ZH, Liu FC, Guo XG, Ge CM, Ding J, et al. The RNA-binding protein RBM3 promotes cell proliferation in hepatocellular carcinoma by regulating circular RNA SCD-circRNA 2 production. EBioMedicine 2019;45:155-67).
However, most previous studies focused on the function and mechanism of a single RBP in HCC cells in vitro, and the expression pattern of the same gene may vary significantly between data sets, sometimes with completely opposite results. If a study includes only a small number of cohorts, its results are often limited, lack general applicability and cannot reflect the actual situation. The prior art lacks both systematic retrospective studies of RBPs and studies of their clinical application. Although some studies have addressed the relationship between RBPs and clinical prognosis, many of them are based on a single data set, so their results are inconsistent and contradictory.
Therefore, it is necessary to obtain consistent, clinically applicable RBP-related data based on multiple cohorts and large-scale samples.
Disclosure of Invention
In view of the shortcomings of the prior art and of practical needs, the invention provides a hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes and application thereof. The scoring system can effectively predict the overall survival (OS) and disease-free survival (DFS) of HCC patients and has high clinical application value.
To achieve this purpose, the invention adopts the following technical scheme:
In a first aspect, the present invention provides a hepatocellular carcinoma prognosis scoring system based on RNA-binding protein genes, the input variables of the hepatocellular carcinoma prognosis scoring system comprising the Gini coefficient and the score of the hepatocellular carcinoma-associated RNA-binding protein gene in a data set;
the score is determined based on the mRNA expression level and the hazard ratio of the hepatocellular carcinoma-associated RNA-binding protein gene in the data set.
Based on the results of Cox survival analysis and a random forest model, the invention constructs a hepatocellular carcinoma prognosis scoring system, RBP-score, from the mRNA expression values of 10 key HCC-related RBPs genes. These 10 genes are associated with the prognosis and clinical characteristics of HCC patients. In different HCC data sets, patients with different RBP-scores differ significantly in overall survival and disease-free survival: the higher the RBP-score, the worse both outcomes. RBP-score is an effective, cross-platform and generally applicable tool for evaluating patient prognosis, with prognostic ability not inferior to clinical TNM (tumor-node-metastasis) staging and with practical clinical value.
Preferably, the hazard ratio (HR) is determined with a Cox proportional hazards model from the overall survival associated with the hepatocellular carcinoma-associated RNA-binding protein genes in the data set. The data sets are preferably GSE14520, TCGA-LIHC and ICGC-LIRI-JP; HR is obtained with a univariate Cox proportional hazards model, and the software reports the hazard ratio (the HR value) automatically when the model is fitted.
Preferably, the score is 0 or 1;
if the mRNA expression level of the hepatocellular carcinoma-associated RNA-binding protein gene is ≥ the median expression level and the hazard ratio is >1, or the mRNA expression level of the hepatocellular carcinoma-associated RNA-binding protein gene is < the median expression level and the hazard ratio is <1, the score is 1; otherwise the score is 0.
Preferably, the hepatocellular carcinoma-associated RNA-binding protein gene comprises PRPF3, SLBP, CPEB3, PPARGC1A, IGF BP3, SF3B4, ILF2, CSTF2, ACO1, and FBL.
Preferably, the data set comprises any one or a combination of at least two of the Gene Expression Omnibus (GEO) hepatocellular carcinoma cohort data, The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data, or the International Cancer Genome Consortium Japanese liver cancer (ICGC-LIRI-JP) data.
By integrating mRNA expression data from multiple large-scale, cross-platform HCC cohorts, the invention identifies 30 hepatocellular carcinoma-associated RNA-binding protein genes with consistent expression differences in HCC tissue, and screens out 10 key hepatocellular carcinoma-associated RNA-binding protein genes among them using random forest, a machine learning algorithm, which gives better accuracy and higher application value.
Preferably, the GEO hepatocellular carcinoma cohort data comprise any one or a combination of at least two of GSE14520, GSE22058, GSE25097, GSE36376, GSE45436, GSE64041, GSE76427, GSE54236 or GSE63898.
Preferably, the data set comprises GSE14520, TCGA-LIHC, and ICGC-LIRI-JP data.
Preferably, the output variable of the hepatocellular carcinoma prognosis scoring system is a weighted sum of the Gini coefficient and the score of the hepatocellular carcinoma-associated RNA-binding protein gene.
Preferably, the calculation formula of the hepatocellular carcinoma prognosis scoring system is as follows:
RBP-score=∑(Gene_score×Gene_Weight)
wherein Gene_Weight is the Gini coefficient, and Gene_score is the score, which takes the value 0 or 1.
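The formula above is a plain weighted sum, which can be sketched as follows. This is a minimal illustration only: the function name `rbp_score` and the numeric weights are hypothetical placeholders, not values disclosed in the patent.

```python
def rbp_score(gene_scores, gene_weights):
    """Return sum(Gene_score * Gene_Weight) over all genes in the signature.

    gene_scores  -- dict mapping gene symbol to 0 or 1 (Gene_score)
    gene_weights -- dict mapping gene symbol to its weight (Gene_Weight)
    """
    missing = set(gene_weights) - set(gene_scores)
    if missing:
        raise KeyError(f"missing Gene_score for: {sorted(missing)}")
    for gene in gene_weights:
        if gene_scores[gene] not in (0, 1):
            raise ValueError(f"Gene_score of {gene} must be 0 or 1")
    return sum(gene_scores[g] * w for g, w in gene_weights.items())


# Hypothetical weights for three of the ten genes (illustration only).
weights = {"PRPF3": 6.5, "SLBP": 5.5, "CPEB3": 5.25}
scores = {"PRPF3": 1, "SLBP": 0, "CPEB3": 1}
print(rbp_score(scores, weights))  # 11.75
```

Because Gene_score is binary, the RBP-score of a patient is simply the sum of the weights of the genes whose score is 1.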
In the invention, the hepatocellular carcinoma prognosis scoring system RBP-score, constructed on the basis of the random forest algorithm, has high potential value for indicating and predicting patient prognosis. RBP-score effectively predicts the OS and DFS of HCC patients and correlates to some extent with other clinical characteristics such as TNM stage, AFP and metastasis risk. Its clinical value has been verified in data sets from different platforms, and Cox analysis indicates that RBP-score is an independent risk factor for poor prognosis in HCC patients.
Specifically, Gene_score is determined according to the following table: if a gene has an integrated HR >1 and mRNA expression ≥ the median value, or an integrated HR <1 and mRNA expression < the median value, the Gene_score of the gene is 1; otherwise, Gene_score is 0. In the table, high expression is defined as an expression value ≥ the median value and low expression as an expression value < the median value; the integrated HR value is obtained from the GSE14520, TCGA-LIHC and ICGC-LIRI-JP data using a univariate Cox proportional hazards model.
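The Gene_score rule can be written as a small function. This is a sketch under stated assumptions: the name `gene_score` and its parameters are illustrative, and the case of an integrated HR exactly equal to 1, which the text does not cover, is assumed here to give 0.

```python
def gene_score(expression, median_expression, integrated_hr):
    """Binary Gene_score from expression level and integrated hazard ratio.

    High expression means expression >= median; low means expression < median.
    The score is 1 for a risk gene (HR > 1) that is highly expressed or a
    protective gene (HR < 1) that is lowly expressed, and 0 otherwise.
    """
    high = expression >= median_expression
    if integrated_hr > 1 and high:
        return 1
    if integrated_hr < 1 and not high:
        return 1
    return 0  # also covers HR == 1, an assumption not stated in the source


print(gene_score(8.0, 7.5, 1.4))  # risk gene, high expression -> 1
print(gene_score(8.0, 7.5, 0.6))  # protective gene, high expression -> 0
```

In both scoring branches the expression pattern points toward a worse expected outcome, which is why a higher total score corresponds to poorer survival.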
In a second aspect, the invention provides a hepatocellular carcinoma-associated RNA binding protein gene marker comprising PRPF3, SLBP, CPEB3, PPARGC1A, IGF BP3, SF3B4, ILF2, CSTF2, ACO1, and FBL.
Preferably, the hepatocellular carcinoma-associated RNA binding protein gene markers further comprise POLR2G, MBNL2, PUF60, RALY, LSM4, CASC3, ZFP36, LARP1, SNRPC, TSNAX, RBM34, IGF2BP2, ABCF1, NSUN6, RBMS3, NONO, LSM2, SNRPB, ZGPAT, and XPO5.
In the invention, the general technical roadmap is shown in FIG. 1. First, from the GEO gene chip expression data of 9 hepatocellular carcinoma cohorts, the robust rank aggregation (RRA) algorithm is used to screen, from 430 RBPs with defined functions, 30 RBPs whose expression patterns are consistent across the 9 HCC cohorts. RRA can effectively integrate microarray data from multiple different platforms and yields HCC-associated RBPs with good specificity. It also reduces the dimensionality of the data: the scope of study shrinks from 430 RBPs to 30 RBPs, which greatly lightens the research burden. Subsequently, the integration results were verified in the RNA sequencing data of TCGA-LIHC and ICGC-LIRI-JP, where the expression patterns of these 30 HCC-associated RBPs were fully consistent with those in the 9 mRNA gene chip data sets.
In the invention, the importance of 30 HCC-related RBPs genes in determining the 5-year survival period of a patient is calculated by further utilizing a random forest algorithm, and the most important 10 HCC-related RBPs genes are screened.
In a third aspect, the present invention provides a screening method of the hepatocellular carcinoma-associated RNA-binding protein gene marker of the second aspect, comprising:
screening out RNA-binding protein genes with consistent expression differences from the GEO hepatocellular carcinoma cohort data to obtain initial hepatocellular carcinoma-associated RNA-binding protein genes.
Preferably, 30 RNA-binding protein genes with consistent expression differences are screened from the GEO data of the 9 hepatocellular carcinoma cohorts using the robust rank aggregation (RRA) algorithm and further verified in the TCGA-LIHC and ICGC-LIRI-JP cohorts; the 30 initial RNA-binding protein genes with consistent expression differences are analyzed for copy number variation, single nucleotide mutation and methylation of the promoter region, and some of these RNA-binding protein genes are risk or protective factors for patient prognosis.
Preferably, the screening method further comprises:
taking the TCGA-LIHC data as a training set, dividing the training set samples into 5-year survivors and 5-year non-survivors, and establishing a random forest classification model;
and screening the initial hepatocellular carcinoma-associated RNA-binding protein genes by using the random forest classification model to obtain the key hepatocellular carcinoma-associated RNA-binding protein genes.
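The division of training samples into 5-year survival and 5-year non-survival patients can be sketched as below. The source does not say how patients whose follow-up ended before 5 years without a death were handled; excluding them (returning `None`) is an assumption made here, as are the function name and the use of 365 days per year.

```python
FIVE_YEARS_DAYS = 5 * 365  # assumption: follow-up is recorded in days


def five_year_label(followup_days, died):
    """Return 1 for a 5-year survivor, 0 for a 5-year non-survivor,
    or None when the outcome at 5 years is unknown (censored early)."""
    if followup_days >= FIVE_YEARS_DAYS:
        return 1  # known to be alive at the 5-year mark
    if died:
        return 0  # died before the 5-year mark
    return None  # alive at last contact, but followed for under 5 years


# Hypothetical patients: (follow-up in days, death observed?)
patients = [(400, True), (2000, False), (900, False)]
labels = [five_year_label(days, died) for days, died in patients]
print(labels)  # [0, 1, None]
```

Only patients with a label of 0 or 1 would then enter the classifier as training samples.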
In a fourth aspect, the present invention provides the use of the hepatocellular carcinoma prognosis scoring system of the first aspect for preparing a hepatocellular carcinoma prognosis monitoring product.
Preferably, the hepatocellular carcinoma prognosis monitoring product comprises a hepatocellular carcinoma prognosis monitoring kit and/or a hepatocellular carcinoma prognosis monitoring medical device.
Compared with the prior art, the invention has the following beneficial effects:
(1) By integrating and analyzing mRNA expression data from multiple large-scale, cross-platform HCC cohorts, the invention obtains an expression profile of HCC-associated RBPs genes with highly consistent expression differences, and constructs a hepatocellular carcinoma prognosis scoring system using random forest, a machine learning model. The scoring system is a simple and powerful prognosis evaluation tool, is cross-platform, and is suitable for HCC patients of different subgroups;
(2) The HCC-related RNA binding protein gene identified by the invention can be used as a novel hepatocellular carcinoma diagnosis and treatment target and used for HCC prognosis prediction and evaluation.
Drawings
FIG. 1 is a general technical roadmap;
FIG. 2A shows the identification, by the robust rank aggregation algorithm, of 30 HCC-associated RBPs with highly consistent expression characteristics in the HCC cohorts, FIG. 2B shows the expression of the 30 HCC-associated RBPs in the 9 different HCC chip data cohorts, FIG. 2C shows their expression in the TCGA-LIHC RNA sequencing data, FIG. 2D shows their expression in the ICGC-LIRI-JP RNA sequencing data, FIG. 2E shows their expression across the TNM stages of HCC patients in the TCGA-LIHC cohort, FIG. 2F shows their expression across the TNM stages of HCC patients in the ICGC-LIRI-JP cohort, FIG. 2G shows that the 30 HCC-associated RBPs effectively distinguish HCC tissue from normal tissue in the TCGA-LIHC cohort, and FIG. 2H shows that they effectively distinguish HCC tissue from normal tissue in the ICGC-LIRI-JP cohort;
FIG. 3A shows the overall survival hazard analysis of the 30 HCC-associated RBPs in TCGA-LIHC using the Cox model, FIG. 3B shows the Kaplan-Meier curves of the RBPs associated with overall survival in TCGA-LIHC, FIG. 3C shows the Kaplan-Meier curves of the RBPs associated with disease-free survival in TCGA-LIHC, and FIG. 3D shows the integrated survival analysis results of the 30 HCC-associated RBPs in the three HCC data sets TCGA-LIHC, GSE14520 and ICGC-LIRI-JP;
FIG. 4A shows the overall survival of TCGA-LIHC patients calculated with Kaplan-Meier curves, FIG. 4B shows the overall survival of GSE14520 patients calculated with Kaplan-Meier curves, FIG. 4C shows the overall survival of ICGC-LIRI-JP patients calculated with Kaplan-Meier curves, FIG. 4D shows the ROC-curve evaluation of the accuracy and specificity of predicting 1-year, 3-year and 5-year overall survival of TCGA-LIHC patients, FIG. 4E shows the same ROC-curve evaluation for GSE14520 patients, FIG. 4F shows the same ROC-curve evaluation for ICGC-LIRI-JP patients, FIG. 4G shows the disease-free survival of TCGA-LIHC patients, and FIG. 4H shows the disease-free survival of GSE14520 patients;
FIG. 5A shows the prognostic results of RBP-score in each subgroup of HCC patients after stratification by gender, FIG. 5B shows the results after stratification by age, FIG. 5C shows the results after stratification by TNM stage, FIG. 5D shows the results after stratification by AFP level, FIG. 5E shows the results after stratification by HBV infection, FIG. 5F shows the results after stratification by HCV infection, and FIG. 5G shows the results after stratification by liver cirrhosis.
Detailed Description
The technical means adopted by the invention and the effects thereof are further described below with reference to the examples and the attached drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof.
Where specific techniques or conditions are not indicated in the examples, they follow those described in the literature of the field or the product specifications. Reagents and instruments for which no manufacturer is noted are conventional products commercially available through regular channels.
Example 1
This example first obtained the mRNA chip data of 9 HCC cohorts (GSE14520, GSE22058, GSE25097, GSE36376, GSE45436, GSE64041, GSE76427, GSE54236 and GSE63898) from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/), collected The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data, including mRNA expression data, from the NIH GDC Data Portal (https://portal.gdc.cancer.gov/), and collected the International Cancer Genome Consortium Japanese liver cancer (ICGC-LIRI-JP) data from the ICGC DCC (https://dcc.icgc.org/releases).
The gene identifiers of the HCC cohort data sets are the latest HUGO gene names, and the mRNA expression data are log2-normalized.
In the following examples, statistical analysis was performed using R software (version 3.6.1). Differences between two groups were analyzed with the independent-samples t-test for normally distributed data and with the Wilcoxon test for non-normally distributed data. The relationships of individual HCC-associated RBPs genes and of RBP-score with patient OS and DFS were analyzed by Kaplan-Meier survival analysis with the log-rank test. Individual HCC-associated RBPs genes, RBP-score and other clinical indexes affecting OS were identified by univariate and multivariate Cox analysis. The relationships between RBP-score and clinical characteristics such as TNM stage and AFP level were analyzed with the chi-square test. P <0.05 was defined as statistically significant.
Example 2
This example uses the robust rank aggregation (RRA) algorithm (Kolde R, Laur S, Adler P, Vilo J. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics 2012;28(4):573-80) to integrate the 9 mRNA chip data sets and obtain mRNAs with consistent expression patterns; the RRA algorithm was run in R software (version 3.6.1). Genes differentially expressed between cancer and normal tissues with a fold change >1.5 or <-1.5 and P <0.05 were selected to create the HCC RRA list, which contains 1326 significantly differentially expressed genes showing consistent up- or down-regulation in the 9 HCC cohorts.
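The selection thresholds in this step can be illustrated with a short sketch. It is not an implementation of RRA, which aggregates ranked gene lists statistically; it only applies the stated cut-offs (fold change >1.5 or <-1.5, P <0.05) and then keeps genes that change in the same direction in every cohort, a stricter stand-in chosen for simplicity. All function names, gene names and numbers are hypothetical.

```python
def significant_genes(results, fold_cutoff=1.5, p_cutoff=0.05):
    """Map each gene passing the cut-offs to its direction ('up' or 'down').

    results -- dict: gene -> (signed fold change, P value), tumor vs normal
    """
    kept = {}
    for gene, (fold_change, p_value) in results.items():
        if p_value < p_cutoff and abs(fold_change) > fold_cutoff:
            kept[gene] = "up" if fold_change > 0 else "down"
    return kept


def consistent_genes(cohorts):
    """Genes significant in every cohort with the same direction of change."""
    filtered = [significant_genes(c) for c in cohorts]
    if not filtered:
        return {}
    common = set(filtered[0])
    for f in filtered[1:]:
        common &= set(f)
    return {g: filtered[0][g] for g in sorted(common)
            if len({f[g] for f in filtered}) == 1}


# Two hypothetical cohorts with placeholder gene names.
cohort_a = {"G1": (2.0, 0.01), "G2": (-1.8, 0.03), "G3": (1.2, 0.001), "G4": (3.0, 0.2)}
cohort_b = {"G1": (1.7, 0.02), "G2": (1.9, 0.01), "G3": (2.5, 0.01)}
print(consistent_genes([cohort_a, cohort_b]))  # {'G1': 'up'}
```

In this toy input G2 is dropped because its direction flips between cohorts, G3 because its fold change is too small in one cohort, and G4 because its P value fails the threshold.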
A list of 430 human RBPs genes was constructed by searching the database of RNA-binding specificities (RBPDB, http://rbpdb.ccbr.utoronto.ca/) and referring to the work of Gerstberger et al. (Gerstberger S, Hafner M, Tuschl T. A census of human RNA-binding proteins. Nat Rev Genet 2014;15(12):829-45); the translation products of the genes in this list are functionally characterized proteins capable of binding RNA.
As shown in FIG. 2A, intersecting the HCC RRA list with the RBPs gene list identified 30 RBP mRNAs with consistent expression in the chip data sets of the 9 HCC cohorts; these were defined as HCC-associated RBPs genes. The differences in expression of these 30 RBP mRNAs in the 9 HCC cohorts (cancer tissue vs normal tissue) are shown in FIG. 2B: 8 RBP mRNAs showed low expression in HCC tissue (P < 0.05) and 22 RBP mRNAs showed high expression in HCC tissue (P < 0.05).
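The intersection of the two lists is a set operation and can be sketched as follows. The function name and the toy inputs are hypothetical; `GENE_X` and `GENE_Y` are placeholders rather than genes named in the patent, and the directions assigned here are for illustration only.

```python
def hcc_associated_rbps(hcc_rra_list, rbp_gene_list):
    """Keep the genes of the HCC RRA list that also appear in the RBP list.

    hcc_rra_list  -- dict: gene -> direction of change in HCC ('up'/'down')
    rbp_gene_list -- iterable of known RBP gene symbols
    """
    rbp_set = set(rbp_gene_list)
    return {g: d for g, d in sorted(hcc_rra_list.items()) if g in rbp_set}


# Toy inputs; directions are illustrative, not taken from the patent.
rra = {"SF3B4": "up", "CPEB3": "down", "GENE_X": "up"}
rbps = ["CPEB3", "SF3B4", "GENE_Y"]
selected = hcc_associated_rbps(rra, rbps)
print(selected)  # {'CPEB3': 'down', 'SF3B4': 'up'}
print(sum(d == "down" for d in selected.values()), "down-regulated")
```

Keeping the direction alongside each gene makes it easy to count up- and down-regulated RBPs afterwards, as the text does for the 8 and 22 genes.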
Next, the expression of the HCC-related RBPs genes was verified using the RNA sequencing data of TCGA-LIHC and ICGC-LIRI-JP. As shown in FIG. 2C and FIG. 2D, the expression of the 30 HCC-related RBPs genes in TCGA-LIHC and ICGC-LIRI-JP was highly consistent with that in the 9 HCC cohorts.
The expression of the 30 HCC-associated RBPs genes in tissues of different TNM stages was analyzed further. As shown in FIG. 2E and FIG. 2F, some RNA-binding proteins, such as XPO5 and CPEB3, differed significantly between early-stage and late-stage tissues, and this difference followed the same trend as the change between normal and tumor tissues, so these RNA-binding proteins are likely to exert a pro-cancer or anti-cancer effect. The PCA results for the 30 HCC-related RBPs genes in TCGA-LIHC and ICGC-LIRI-JP are shown in FIG. 2G and FIG. 2H. PCA shows that the mRNA expression profile of the 30 HCC-related RBPs genes effectively distinguishes tumor tissue from normal tissue, indicating that the identified genes are highly related to HCC and merit further study.
Example 3
The present example further explored the clinical utility value of 30 HCC-related RBPs genes, analyzing the correlation of mRNA expression data of individual RBP genes with prognosis in 3 HCC cohorts GSE14520, TCGA-LIHC and ICGC-LIRI-JP.
The survival analysis results for the 30 RBP genes based on the Cox proportional hazards model are shown in FIG. 3A; with the median mRNA expression value as cut-off, the Kaplan-Meier curves of overall survival and disease-free survival for each HCC-associated RBP gene are shown in FIG. 3B and FIG. 3C (only results with log-rank test P <0.05 are shown); the results of the integrated analysis of the three data sets are shown in FIG. 3D.
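The median cut-off and the Kaplan-Meier estimate used here can be sketched in plain Python. This is a textbook product-limit estimator with hypothetical function names and toy data, not the R code used in the examples, and it omits the log-rank test.

```python
from statistics import median


def split_by_median(expression):
    """Return (high, low) index lists using the median as cut-off."""
    cut = median(expression)
    high = [i for i, v in enumerate(expression) if v >= cut]
    low = [i for i, v in enumerate(expression) if v < cut]
    return high, low


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up time per patient
    events -- 1 if the event (death or recurrence) was observed, 0 if censored
    Returns [(time, survival probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients leave the risk set
    return curve


print(split_by_median([1.0, 3.0, 2.0, 4.0]))      # ([1, 3], [0, 2])
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))   # [(1, 0.75), (3, 0.375), (4, 0.0)]
```

A curve would be computed separately for the high-expression and low-expression groups and the two curves compared.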
Most of the 30 RBPs genes are associated with the survival of HCC patients, suggesting that these RBPs genes may exert a cancer-promoting or cancer-suppressing effect. After integrating the three data sets TCGA-LIHC, GSE14520 and ICGC-LIRI-JP, the genes CSTF2, SF3B4, PPARGC1A and RALY were found to behave consistently in all three. Taken together, this evidence demonstrates that some of the 30 HCC-associated RBPs genes are closely related to the survival of HCC patients.
Example 4
In order to obtain RBPs gene markers containing a smaller number of genes, the present example further screens key HCC-related RBPs genes using a random forest algorithm. The method comprises the following steps:
The TCGA-LIHC data set was taken as the training set, and all patients were divided into 5-year survival patients and 5-year non-survival patients. A random forest classification model was established with the mRNA expression data of the HCC-related RBPs genes as input variables (ntree=500). After stratification by 10-fold cross-validation (cv=10), 10 key HCC-related RBPs genes were selected on the basis of Mean Gini value importance (cut-off=5.1) to construct the RBPs gene marker.
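The importance-based selection can be illustrated as follows. The sketch assumes that the "Mean Gini value" corresponds to the mean decrease in Gini impurity that random forest implementations commonly report per variable; the source does not confirm this. It shows the impurity decrease of a single binary split and a cut-off filter. All names and importance values are hypothetical, and whether the cut-off of 5.1 is inclusive is also an assumption.

```python
def gini_impurity(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 2 * p1 * (1 - p1)  # equals 1 - p0**2 - p1**2 for two classes


def gini_decrease(labels, goes_left):
    """Impurity decrease achieved by one split of a node into two children."""
    left = [y for y, side in zip(labels, goes_left) if side]
    right = [y for y, side in zip(labels, goes_left) if not side]
    n = len(labels)
    return (gini_impurity(labels)
            - len(left) / n * gini_impurity(left)
            - len(right) / n * gini_impurity(right))


def select_key_genes(importance, cutoff=5.1):
    """Genes whose importance reaches the cut-off, most important first."""
    kept = [g for g, v in importance.items() if v >= cutoff]
    return sorted(kept, key=lambda g: -importance[g])


# A perfect split of two survivors and two non-survivors removes all impurity.
print(gini_decrease([0, 0, 1, 1], [True, True, False, False]))  # 0.5
# Hypothetical importance values for placeholder genes.
print(select_key_genes({"GENE_A": 6.2, "GENE_B": 5.0, "GENE_C": 5.1}))
```

A forest sums such decreases over all splits that use a given gene and averages them over trees, so genes that separate survivors from non-survivors well accumulate high importance.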
In order to apply HCC-related RBPs genes to clinic, this example constructs an HCC prognosis scoring system (prognostic score system, RBP-score) using the screened 10 key HCC-related RBPs genes, and the calculation formula of RBP-score is as follows:
RBP-score=∑(Gene_score×Gene_Weight)
where Gene_Weight is the Gini coefficient from the random forest model, and Gene_score is determined from the mRNA expression level of each of the 10 key HCC-related RBPs genes and its integrated hazard ratio (Integrated HR). Integrated HR is obtained by integrating the overall survival Cox regression results for the 10 key HCC-related RBPs genes across the three cohorts GSE14520, TCGA-LIHC and ICGC-LIRI-JP. If a gene has Integrated HR >1 and mRNA expression ≥ the median expression level, or Integrated HR <1 and mRNA expression < the median expression level, its Gene_score=1; otherwise Gene_score=0.
Next, the ability of RBP-score to indicate patient prognosis was examined in the three cohorts GSE14520, TCGA-LIHC and ICGC-LIRI-JP. The RBP-score of each patient's HCC tissue was calculated according to the above formula, and patients were divided into four groups, Q1, Q2, Q3 and Q4, using the lower quartile, median and upper quartile of RBP-score as cut-offs (RBP-score Q1 < RBP-score Q2 < RBP-score Q3 < RBP-score Q4). The overall survival (OS) and disease-free survival (DFS) of each group were then analyzed.
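The quartile grouping can be sketched with the Python standard library. The interpolation method used for the quartiles and the assignment of scores that fall exactly on a cut-off to the lower group are assumptions; the source specifies neither. The function name and scores are hypothetical.

```python
from statistics import quantiles


def quartile_groups(scores):
    """Assign each RBP-score to Q1..Q4 using the three quartile cut-offs."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

    def group(score):
        if score <= q1:
            return "Q1"
        if score <= q2:
            return "Q2"
        if score <= q3:
            return "Q3"
        return "Q4"

    return [group(s) for s in scores]


# Hypothetical RBP-scores for eight patients.
print(quartile_groups([1, 2, 3, 4, 5, 6, 7, 8]))
# ['Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q3', 'Q4', 'Q4']
```

Because RBP-score is a sum of a small number of weights, ties at the cut-offs are likely in practice, so the tie rule can noticeably change group sizes.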
As shown in fig. 4A, 4B and 4C, patient OS clearly decreases as RBP-score increases; as shown in fig. 4D, 4E and 4F, ROC analysis shows that the prediction accuracy of RBP-score for 1-year, 3-year and 5-year OS was > 65% in each dataset; as shown in fig. 4G and 4H, a higher RBP-score implies a worse DFS in GSE14520 and TCGA-LIHC.
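For readers unfamiliar with ROC analysis, the following sketch computes the area under the ROC curve as the fraction of (event, non-event) patient pairs in which the patient with the event has the higher score. It is a simplification: it treats death within a fixed horizon as a binary label and ignores censoring, which a time-dependent ROC analysis of survival data would have to handle. The function name and the data are hypothetical, and the text does not state which ROC procedure was actually used.

```python
def roc_auc(scores, events):
    """AUC as the probability that an event case outranks a non-event case."""
    positives = [s for s, e in zip(scores, events) if e == 1]
    negatives = [s for s, e in zip(scores, events) if e == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    # Ties between a positive and a negative count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical scores; event = 1 means death within the chosen horizon.
toy_auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
print(toy_auc)
```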
Chi-square analysis of RBP-score against other clinical features of HCC patients (TCGA-LIHC and GSE14520) showed that patients with a higher RBP-score had a higher proportion of AFP >300 ng/mL, advanced TNM stage (III-IV), advanced CLIP stage (>3), tumor size >5 cm, and vascular invasion.
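A chi-square test of this kind compares how a clinical feature is distributed between high and low RBP-score patients. The sketch below computes the Pearson chi-square statistic for a 2x2 table without continuity correction; the counts are hypothetical and are not taken from the cohorts, and whether a correction was applied in the analysis is not stated in the text.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain observations")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts: rows = high / low RBP-score,
# columns = feature present / absent (for example AFP > 300 ng/mL).
toy_statistic = chi_square_2x2(30, 20, 10, 40)

# 3.841 is the critical value for 1 degree of freedom at alpha = 0.05.
toy_significant = toy_statistic > 3.841
print(toy_statistic, toy_significant)
```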
In TCGA-LIHC and GSE14520, Cox proportional hazards analysis combined with other clinical characteristics of the patients showed that RBP-score is an independent risk factor for poor overall survival in HCC patients (HR in TCGA-LIHC = 2.57, HR in GSE14520 = 1.66, p < 0.05).
Example 5
This example performs a subgroup survival analysis. HCC patients were classified into subgroups according to 7 clinical parameters: gender, age, TNM stage, alpha-fetoprotein (AFP) level, HBV status, HCV status and cirrhosis. The patients in each subgroup were further divided into a high RBP-score group and a low RBP-score group (cut-off = average RBP-score), and the overall survival (OS) and disease-free survival (DFS) of each group were analyzed.
As shown in fig. 5A, 5B, 5C, 5D, 5E, 5F and 5G, RBP-score can effectively predict OS in each TCGA-LIHC subgroup with the median value as cut-off. A similar subgroup survival analysis was also performed in GSE14520 and ICGC-LIRI-JP; except in subgroups with a biased distribution, RBP-score could indicate OS in most subgroups. Notably, RBP-score also effectively indicates OS among patients of the same clinical stage. It can be seen that the HCC molecular prognosis scoring system based on RBPs genes has general applicability.
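The subgroup comparison can be illustrated with a minimal sketch that splits one subgroup into high and low RBP-score groups and estimates a survival curve for each. It assumes a Kaplan-Meier estimator, which is the usual choice for such curves but is not named in the text; the split uses the mean as cut-off, and scores equal to the cut-off are placed in the low group by assumption. All patient data are hypothetical.

```python
from statistics import mean


def split_high_low(scores):
    """Split patients at the mean score; ties go to the low group (assumption)."""
    cutoff = mean(scores.values())
    high = [p for p, s in scores.items() if s > cutoff]
    low = [p for p, s in scores.items() if s <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with a death.

    events: 1 = death observed, 0 = censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical subgroup: RBP-score, follow-up time (years), death indicator.
toy_scores = {"P1": 2.0, "P2": 4.0, "P3": 6.0, "P4": 8.0}
toy_high, toy_low = split_high_low(toy_scores)
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(toy_high, toy_low)
print(toy_curve)
```

A log-rank test would normally follow to compare the two curves; it is omitted here to keep the sketch short.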
In summary, the hepatocellular carcinoma prognosis scoring system constructed from the hepatocellular carcinoma-associated RNA-binding protein genes is a simple and powerful prognosis evaluation tool that is suitable for different subgroups of HCC patients and works across platforms, and the identified hepatocellular carcinoma-associated RNA-binding protein genes can be used as novel HCC diagnosis and treatment targets.
The applicant states that the detailed method of the present invention is illustrated by the above examples, but the present invention is not limited to the detailed method described above, i.e. it does not mean that the present invention must be practiced in dependence upon the detailed method described above. It should be apparent to those skilled in the art that any modification of the present invention, equivalent substitution of raw materials for the product of the present invention, addition of auxiliary components, selection of specific modes, etc., falls within the scope of the present invention and the scope of disclosure.

Claims (6)

1. A hepatocellular carcinoma prognosis scoring system based on an RNA-binding protein gene, wherein the input variables of the hepatocellular carcinoma prognosis scoring system comprise a Gini coefficient and a score of the hepatocellular carcinoma-associated RNA-binding protein gene in a dataset;
the score is determined based on mRNA expression levels and risk ratios of the hepatocellular carcinoma-associated RNA-binding protein gene in the dataset;
the hepatocellular carcinoma-associated RNA-binding protein gene comprises PRPF3, SLBP, CPEB3, PPARGC1A, IGF2BP3, SF3B4, ILF2, CSTF2, ACO1 and FBL;
the dataset comprises GSE14520, the Cancer Genome Atlas hepatocellular carcinoma dataset (TCGA-LIHC), and the International Cancer Genome Consortium Japanese liver cancer dataset (ICGC-LIRI-JP);
the output variable of the hepatocellular carcinoma prognosis scoring system is a weighted sum of the Gini coefficient and the score of the hepatocellular carcinoma-associated RNA-binding protein gene;
the calculation formula of the hepatocellular carcinoma prognosis scoring system is as follows:
RBP-score=∑(Gene_score×Gene_Weight)
wherein RBP-score is the score in the hepatocellular carcinoma prognosis scoring system, Gene_Weight is the Gini coefficient, Gene_score is the score, and Gene_score takes a value of 0 or 1.
2. The hepatocellular carcinoma prognosis scoring system according to claim 1, wherein the risk ratio is determined by a univariate Cox proportional hazards model of overall survival for the hepatocellular carcinoma-associated RNA-binding protein gene in the dataset.
3. The hepatocellular carcinoma prognosis scoring system of claim 1, wherein the score is 0 or 1;
when the mRNA expression level of the hepatocellular carcinoma-associated RNA-binding protein gene is greater than or equal to the average expression level and the risk ratio is greater than 1, or the mRNA expression level of the hepatocellular carcinoma-associated RNA-binding protein gene is less than the average expression level and the risk ratio is less than 1, the score is 1; otherwise, the score is 0.
4. A method of screening for a gene marker for a hepatocellular carcinoma-associated RNA binding protein, the method comprising:
screening out RNA-binding protein genes with consistent differential expression from an integrated database of hepatocellular carcinoma cohort gene expression to obtain initial hepatocellular carcinoma-related RNA-binding protein genes;
the screening method further comprises the steps of:
taking the Cancer Genome Atlas hepatocellular carcinoma dataset (TCGA-LIHC) as a training set, dividing the training set samples into 5-year survival patients and 5-year non-survival patients, and establishing a random forest classification model;
classifying the initial hepatocellular carcinoma-related RNA binding protein genes by using the random forest classification model to obtain key hepatocellular carcinoma-related RNA binding protein genes;
the RNA binding protein genes include PRPF3, SLBP, CPEB3, PPARGC1A, IGF2BP3, SF3B4, ILF2, CSTF2, ACO1, and FBL.
5. A method of preparing a hepatocellular carcinoma prognosis monitoring product, characterized in that the method employs the hepatocellular carcinoma prognosis scoring system of any one of claims 1-3.
6. The method of claim 5, wherein the hepatocellular carcinoma prognosis monitoring product comprises a hepatocellular carcinoma prognosis monitoring kit and/or a hepatocellular carcinoma prognosis monitoring medical device.
CN202110172416.1A 2021-02-08 2021-02-08 Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof Active CN112908470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110172416.1A CN112908470B (en) 2021-02-08 2021-02-08 Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof

Publications (2)

Publication Number Publication Date
CN112908470A CN112908470A (en) 2021-06-04
CN112908470B true CN112908470B (en) 2023-10-03

Family

ID=76122735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110172416.1A Active CN112908470B (en) 2021-02-08 2021-02-08 Hepatocellular carcinoma prognosis scoring system based on RNA binding protein gene and application thereof

Country Status (1)

Country Link
CN (1) CN112908470B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113416786A (en) * 2021-08-09 2021-09-21 深圳市人民医院 Biomarker combination for hepatocellular carcinoma prognosis evaluation and screening method and application thereof
CN113611363B (en) * 2021-08-09 2023-11-28 上海基绪康生物科技有限公司 Method for identifying cancer driving gene by using consensus prediction result
CN115920006B (en) * 2022-09-19 2023-09-05 山东大学 Application of ABCF1 or agonist thereof in preparation of anti-DNA virus preparation
CN115807089A (en) * 2022-11-14 2023-03-17 石河子大学 Hepatocellular carcinoma prognosis biomarker and application thereof
CN116844685B (en) * 2023-07-03 2024-04-12 广州默锐医药科技有限公司 Immunotherapeutic effect evaluation method, device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201320061D0 (en) * 2013-11-13 2013-12-25 Electrophoretics Ltd Materials and methods for diagnosis and prognosis of liver cancer

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1852974A (en) * 2003-06-09 2006-10-25 密歇根大学董事会 Compositions and methods for treating and diagnosing cancer
CN101622348A (en) * 2006-12-08 2010-01-06 奥斯瑞根公司 Gene and the approach regulated as the miR-20 of targets for therapeutic intervention
CN101627121A (en) * 2006-12-08 2010-01-13 奥斯瑞根公司 As the miRNA regulatory gene and the path for the treatment of the target of intervening
CN101801419A (en) * 2007-06-08 2010-08-11 米尔纳疗法公司 Gene and path as the miR-34 regulation and control for the treatment of the target of intervening
CN104271033A (en) * 2012-05-03 2015-01-07 曼迪奥研究有限公司 Methods and systems of evaluating a risk of a gastrointestinal cancer
CN107922973A (en) * 2015-07-07 2018-04-17 远见基因组系统公司 Method and system for the modification detection based on sequencing
CN108603230A (en) * 2015-10-09 2018-09-28 南安普敦大学 The screening of the adjusting of gene expression and protein expression imbalance
CN106771200A (en) * 2016-11-22 2017-05-31 陈静 Application and kit of the IGF2BP3 and AFP joint marks in liver cancer diagnosis and treatment assessment kit is prepared
CN110996990A (en) * 2017-06-02 2020-04-10 亚利桑那州立大学董事会 Universal cancer vaccines and methods of making and using the same
CN111132682A (en) * 2017-07-28 2020-05-08 雷莫内克斯生物制药有限公司 Pharmaceutical composition for preventing or treating liver cancer
CN107657149A (en) * 2017-09-12 2018-02-02 中国人民解放军军事医学科学院生物医学分析中心 System for predicting liver cancer patient prognosis
CN110070915A (en) * 2017-11-10 2019-07-30 首尔大学医院 The next generation utilizes the Prognosis in Breast Cancer prediction technique and forecasting system based on machine learning of base sequence analysis
KR20190090939A (en) * 2018-01-26 2019-08-05 충남대학교산학협력단 Biomarker composition comprising human cytochrome P450 4A for diagnosis or predicting prognosis of hepatocellular carcinoma
CN108410984A (en) * 2018-02-11 2018-08-17 中山大学 RBMS3 is as tumor drug resistance detection, treatment and the application of prognosis molecule target spot
CN109593848A (en) * 2018-11-08 2019-04-09 浙江大学 A kind of tumour correlated series, long-chain non-coding RNA and its application
CN110993106A (en) * 2019-12-11 2020-04-10 深圳市华嘉生物智能科技有限公司 Liver cancer postoperative recurrence risk prediction method combining pathological image and clinical information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
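"Prognostic potential of PRPF3 in hepatocellular carcinoma"; Liu, YL et al.; AGING-US; 20200115; Vol. 12, No. 1; pp. 912-930 *
"Screening and functional prediction of liver cancer-related differentially expressed long non-coding RNAs in the TCGA database"; Sun Jinqi (孙金旗) et al.; Chinese Journal of Gastroenterology and Hepatology (《胃肠病学和肝病学杂志》); Vol. 28, No. 2; pp. 147-153 *
"Analysis of the role of SNRPB in liver cancer development based on the protein interaction network"; Li Kangzhi (李康智) et al.; Genomics and Applied Biology (《基因组学与应用生物学》); 20191031; Vol. 38, No. 10; pp. 4673-4679 *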

Also Published As

Publication number Publication date
CN112908470A (en) 2021-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant