AU2021104371A4 - Method for constructing model for predicting survival period of hepatocellular carcinoma based on RNA binding protein - Google Patents
- Publication number
- AU2021104371A4
- Authority
- AU
- Australia
- Prior art keywords
- rbp
- survival period
- hepatocellular carcinoma
- genes
- rbp genes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/118—Prognosis of disease development
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/178—Oligonucleotides characterized by their use miRNA, siRNA or ncRNA
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a method for constructing a model for predicting the survival period of hepatocellular carcinoma based on RNA binding protein, which comprises the following steps: acquiring mRNA sequencing data of hepatocellular carcinoma tissue and normal liver tissue, and identifying differentially expressed RBP genes between the two sets of mRNA sequencing data based on human RBP genes; constructing a protein interaction network, and performing GO and KEGG analysis on modules related to the differentially expressed RBP genes in the protein interaction network; based on the results of the GO and KEGG analyses, screening out the RBP genes related to survival period prediction by univariate Cox regression analysis, further screening them by Least Absolute Shrinkage and Selection Operator regression (LASSO regression), and using stepwise multiple regression analysis to obtain the key RBP genes related to survival period prediction; and constructing a model for predicting the survival period of hepatocellular carcinoma based on the key RBP genes. The invention can quickly and accurately predict the survival period of patients with hepatocellular carcinoma.
FIGURES
Figure 1: Flowchart of the method, steps S1 to S6 (S1: obtain mRNA sequencing data and human RBP genes; S2: identify differentially expressed RBP genes; S3: construct the protein interaction network and perform GO and KEGG functional enrichment analysis; S4: screen RBP genes by univariate Cox regression analysis; S5: further screen by LASSO regression and stepwise multiple regression analysis; S6: construct the model for predicting survival period of hepatocellular carcinoma).
Description
Method for constructing model for predicting survival period of hepatocellular
carcinoma based on RNA binding protein
The invention relates to the technical field of predicting the survival period of hepatocellular carcinoma, in particular to a method for constructing a model for predicting survival period of hepatocellular carcinoma based on RNA binding protein.
Liver cancer is the fifth most common malignant tumor, with high morbidity and high mortality, and has become the second leading cause of cancer deaths worldwide. Hepatocellular Carcinoma (HCC), a type of liver cancer, is also a malignant neoplasm with a high mortality rate. At present, the main methods for treating hepatocellular carcinoma include systemic pharmacological treatment, surgical resection, transplantation, ablation, Transcatheter Arterial Chemoembolization (TACE) and radiotherapy. Despite significant advances in diagnosis and treatment, the prognosis of patients with HCC remains poor due to the high complexity and heterogeneity of hepatocellular carcinoma, while the incidence and mortality of HCC worldwide have been increasing in recent decades. Therefore, it is essential to identify prognostic biomarkers and to develop novel and accurate predictive models to predict the prognosis of patients with HCC and guide clinical treatment.
RNA-Binding Proteins (RBPs) play a vital role in post-transcriptional gene regulation. Studies have shown that, in addition to affecting the growth and proliferation of cells and contributing to the occurrence and development of tumors, abnormal expression of certain RBPs is also significantly correlated with the malignancy and clinical prognosis of cancer patients. For example, the RNA-binding proteins Musashi-1 and Musashi-2 were found to be overexpressed in colorectal cancer. Negative elongation factor E (NELFE) was found to promote the metastasis of pancreatic cancer by activating the Wnt/β-catenin signaling pathway and reducing the stability of NDRG2 mRNA. The human Ribosomal Protein S3 (RPS3) was found to be up-regulated in HCC and closely related to the prognosis of patients with HCC. RPS3 was found to stabilize SIRT1 mRNA by binding to the 3'UTR of SIRT1 mRNA, thereby maintaining HCC progression, and somatic copy number alterations in NELFE were found to enhance MYC signaling and promote HCC cell proliferation. Therefore, it is particularly necessary to provide a method for constructing a model for predicting survival period of hepatocellular carcinoma based on RNA binding protein.
The invention aims to provide a method for constructing a model for predicting survival period of hepatocellular carcinoma based on RNA binding protein to solve the problems of the prior art, which can rapidly and accurately predict the survival period of a hepatocellular carcinoma patient.
In order to achieve this purpose, the invention provides the following scheme: a method for constructing a model for predicting survival period of hepatocellular carcinoma based on RNA binding protein, which comprises the following steps:
S1: Obtain mRNA sequencing data of a plurality of hepatocellular carcinoma tissues and a plurality of normal liver tissues, and obtain a plurality of human RNA binding protein (RBP) genes;
S2: Identify a plurality of differentially expressed RBP genes between the mRNA sequencing data of the plurality of hepatocellular carcinoma tissues and the plurality of normal liver tissues based on the human RBP genes;
S3: Construct a protein interaction network based on the plurality of differentially expressed RBP genes, and perform GO and KEGG functional enrichment analysis on modules related to the differentially expressed RBP genes in the protein interaction network to obtain biological functions and signal transduction pathways of the differentially expressed RBP genes in the protein interaction network;
S4: Based on the obtained biological functions and signal transduction pathways of the RBP genes differentially expressed in the protein interaction network, screen out the RBP genes related to the survival period prediction by adopting univariate Cox regression analysis;
S5: Further screen the selected RBP genes related to the survival period prediction by adopting Least Absolute Shrinkage and Selection Operator regression (LASSO regression), and obtain key RBP genes related to the survival period prediction by adopting a stepwise multiple regression analysis;
S6: Construct a model for predicting survival period of hepatocellular carcinoma based on the obtained key RBP genes related to survival period prediction, wherein the model for predicting survival period of hepatocellular carcinoma is used for survival period prediction of patients with hepatocellular carcinoma.
Preferably, in S2, the plurality of differentially expressed RBP genes are identified between the mRNA sequencing data of the plurality of hepatocellular carcinoma tissues and the plurality of normal liver tissues by the Wilcoxon test.
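As an illustration of the test named above, the following is a minimal Python sketch of a two-sided Wilcoxon rank-sum test for a single gene, comparing its expression in tumor samples with that in normal samples. It is an assumption-laden sketch and not the implementation used in the invention: the function name `rank_sum_test` is hypothetical, the p-value uses the plain normal approximation, and it does not reproduce the exact p-values, tie corrections or continuity corrections that statistical packages may apply.

```python
import math


def rank_sum_test(tumor, normal):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic of the tumor group, p-value).
    Ties receive average ranks; no tie or continuity correction is applied.
    """
    n1, n2 = len(tumor), len(normal)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must contain at least one sample")

    # Pool the values, remembering the group of each (0 = tumor, 1 = normal).
    pooled = sorted([(v, 0) for v in tumor] + [(v, 1) for v in normal])

    # Assign 1-based ranks, averaging over runs of tied values.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    rank_sum_tumor = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_tumor = rank_sum_tumor - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_tumor - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_tumor, p_value
```

In practice the test would be run once per RBP gene, and the resulting p-values would then be corrected for multiple testing.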
Preferably, in S2, the RBP genes with |log2FC| > 0.5 and a corrected P value < 0.05 are used as differentially expressed RBP genes, wherein FC (fold change) is the ratio of the expression level of an RBP gene between the hepatocellular carcinoma tissue and the normal liver tissue.
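The selection rule above can be sketched in Python as follows. The thresholds (0.5 and 0.05) come from the text; everything else is an assumption made for illustration: the function names are hypothetical, the fold change is computed from group means with a small pseudocount to avoid division by zero, and the Benjamini-Hochberg procedure is used as the P value correction because the text does not state which correction is applied.

```python
import math


def log2_fold_change(tumor, normal, pseudocount=1e-9):
    """log2 of the ratio of mean tumor expression to mean normal expression."""
    mean_tumor = sum(tumor) / len(tumor)
    mean_normal = sum(normal) / len(normal)
    return math.log2((mean_tumor + pseudocount) / (mean_normal + pseudocount))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_de_genes(genes, log2fc, adjusted_p, fc_cutoff=0.5, p_cutoff=0.05):
    """Keep genes with |log2FC| > fc_cutoff and adjusted p-value < p_cutoff."""
    return [
        gene
        for gene, fc, p in zip(genes, log2fc, adjusted_p)
        if abs(fc) > fc_cutoff and p < p_cutoff
    ]
```

A gene must pass both conditions at once, so a large fold change with a non-significant corrected P value is rejected, and vice versa.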
Preferably, in the S3, the method for constructing the protein interaction network based on a plurality of differentially expressed RBP genes comprises the following steps: constructing a protein-protein interaction network among the differentially expressed RBP genes, namely the protein interaction network, using the STRING database; visualizing the protein interaction network with Cytoscape software; and detecting modules related to the differentially expressed RBP genes in the protein interaction network using the Molecular Complex Detection plug-in of Cytoscape.
Preferably, in the S3, the clusterProfiler package of the R language is used to perform GO and KEGG functional enrichment analysis on modules related to the differentially expressed RBP genes in the protein interaction network, and the result of the GO functional enrichment analysis is visualized by the GOplot package of the R language.
Preferably, in the S3, the GO functional enrichment analysis includes biological process
analysis, cellular component analysis, and molecular function analysis.
Preferably, in the S5, based on the screened RBP genes related to the survival period
prediction, the RBP genes with a P value less than 0.01 are further screened using
LASSO regression.
Preferably, in the S5, LASSO regression is implemented using the glmnet software package of the R language, and the glmnet software package further screens the RBP genes with a P value less than 0.01 using a 10-fold cross-validation method.
Preferably, in the S5, the key RBP genes related to the survival period prediction include
six RBP genes, namely CNOT6, UPF3B, MRPL54, ZC3H13, IFIT5 and PPARGC1A.
Preferably, in the S6, the method of constructing a model for predicting survival period of
hepatocellular carcinoma comprises:
Based on the key RBP genes related to survival period prediction, construct the risk score
signatures of the key RBP genes related to survival period prediction by multivariate Cox
regression as shown in the following formula:
Risk score = Expression of gene 1 × Coefficient of gene 1 + Expression of gene 2 × Coefficient of gene 2 + ... + Expression of gene N × Coefficient of gene N;
In the formula, N represents the number of differentially expressed RBP genes;
Based on the correlation between the risk score signature, the clinical characteristics and
the survival period, a model for predicting survival period of hepatocellular carcinoma is
constructed.
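The risk score formula above is a weighted sum and can be sketched in Python as follows. The gene names and coefficient values in the example are hypothetical placeholders, not the coefficients obtained in the invention, and the function name `risk_score` is likewise an assumption.

```python
def risk_score(expression, coefficients):
    """Sum of expression * coefficient over the genes in the signature.

    expression:   dict mapping gene name -> expression value for one patient
    coefficients: dict mapping gene name -> regression coefficient
    """
    missing = [gene for gene in coefficients if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(expression[gene] * coefficients[gene] for gene in coefficients)


# Hypothetical two-gene signature, for illustration only.
example_coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
example_expression = {"GENE_A": 2.0, "GENE_B": 4.0}
example_score = risk_score(example_expression, example_coefficients)
```

Genes with a positive coefficient raise the score as their expression increases, and genes with a negative coefficient lower it.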
The invention has the following technical effect:
According to the invention, differentially expressed RBP genes are determined between the hepatocellular carcinoma tissue and the normal liver tissue, the RBP genes related to the survival period prediction are further screened out from the differentially expressed RBP genes, and a model for predicting survival period of hepatocellular carcinoma is then constructed based on the screened RBP genes related to the survival period prediction. The survival period of a hepatocellular carcinoma patient can therefore be rapidly and accurately predicted by the constructed model.
In order to explain the embodiments of the present invention or the technical schemes in the prior art more clearly, the figures needed in the embodiments are briefly introduced below. Obviously, the figures in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other figures from them without creative labor.
Figure 1 is a flowchart of a method for constructing a model for predicting survival
period of hepatocellular carcinoma based on RNA binding protein in an embodiment of
the present invention.
The technical solutions in the embodiments of the present invention will be clearly and
completely described below in conjunction with the accompanying figures in the
embodiments of the present invention. Obviously, the described embodiments are only a
part of the embodiments of the present invention, rather than all the embodiments. Based
on the embodiments of the present invention, all other embodiments obtained by those of
ordinary skill in the art without creative work shall fall within the protection scope of the
present invention.
In order to make the above-mentioned objects, features and advantages of the present
invention more obvious and understandable, the present invention will be further described in detail below with reference to the accompanying figures and specific embodiments.
Referring to Figure 1, the embodiment provides a method for constructing a model for predicting survival period of hepatocellular carcinoma based on RNA binding protein, comprising:
S1: Obtained mRNA sequencing data of a plurality of hepatocellular carcinoma tissues and a plurality of normal liver tissues, and obtained a plurality of human RNA binding protein (RBP) genes;
In this embodiment, standardized RNA-seq data (Fragments Per Kilobase of exon model
per Million mapped fragments, FPKM) and corresponding clinical data containing 374
HCC samples and 50 normal liver tissue samples were downloaded from the TCGA
database; 1542 human RBP genes have been identified.
S2: Identified a plurality of differentially expressed RBP genes between mRNA
sequencing data of a plurality of hepatocellular carcinoma tissues and a plurality of
normal liver tissues based on the human RBP gene;
In this step, differential analysis was performed by the Wilcoxon test, and differentially expressed RBP genes between HCC and normal tissues were identified. RBP genes with |log2FC| > 0.5 and an adjusted P value (adj.P.value) < 0.05 were taken as differentially expressed RBP genes. Here FC, short for fold change, refers to the ratio of RBP gene expression between hepatocellular carcinoma tissue and normal liver tissue, and log2FC is its base-2 logarithm. In this embodiment, a total of 330 differentially expressed RBP genes, including 208 up-regulated RBP genes and 122 down-regulated RBP genes, were identified among 374 HCC samples and 50 normal liver tissue samples according to the Wilcoxon test.
RBP genes play a vital role in RNA processing and protein synthesis, and their abnormal expression can promote the carcinogenesis and progression of many tumors.
S3: Constructed a protein interaction network based on a plurality of differentially
expressed RBP genes, and performed GO and KEGG functional enrichment analysis on
modules related to the differentially expressed RBP genes in the protein interaction
network to obtain biological functions and signal transduction pathways of the
differentially expressed RBP genes in the protein interaction network;
In this step, a Protein-Protein Interaction (PPI) network among the differentially expressed RBP genes, i.e., the protein interaction network, was constructed using the STRING database and visualized with Cytoscape software. Modules related to the differentially expressed RBP genes in the PPI network were detected using the Molecular Complex Detection (MCODE) plug-in of Cytoscape, and GO and KEGG analyses were performed to further study their molecular functions in hepatocellular carcinoma. The MCODE plug-in detects important modules in the PPI network by clustering the constructed network. In this embodiment, a PPI network consisting of 163 nodes and 1047 edges was constructed using the STRING database and the Cytoscape software.
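To make the node and edge counts concrete, the following Python sketch builds an undirected interaction network from a list of scored protein pairs and reports its size. It is only an illustration under stated assumptions: the input format (pairs of gene names with a confidence score), the `min_score` threshold and both function names are hypothetical, and the sketch does not query the STRING database or reproduce the MCODE clustering.

```python
from collections import defaultdict


def build_network(edges, min_score=0.0):
    """Build an undirected adjacency map from (gene_a, gene_b, score) triples.

    Self-interactions and pairs scoring below min_score are skipped;
    duplicate pairs are merged because neighbours are stored in sets.
    """
    adjacency = defaultdict(set)
    for gene_a, gene_b, score in edges:
        if gene_a == gene_b or score < min_score:
            continue
        adjacency[gene_a].add(gene_b)
        adjacency[gene_b].add(gene_a)
    return adjacency


def network_size(adjacency):
    """Return (number of nodes, number of undirected edges)."""
    n_nodes = len(adjacency)
    # Each undirected edge appears in the neighbour sets of both endpoints.
    n_edges = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    return n_nodes, n_edges
```

Genes that have no interaction above the threshold never enter the adjacency map, which is one reason a network can contain fewer nodes than the number of input genes.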
In order to explore the main biological functions and signal transduction pathways of the
differentially expressed RBP genes, in this embodiment, the clusterProfiler package of R
language was used to perform GO and KEGG functional enrichment analysis on modules
related to the differentially expressed RBP genes in the protein interaction network, and
the GO functional enrichment analysis results were visualized by the GOplot package of R
language. GO functional enrichment analysis included Biological Process (BP), Cellular
Component (CC) and Molecular Function (MF) analysis.
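Functional enrichment of a gene module against a GO term or KEGG pathway is commonly scored as a hypergeometric tail probability (over-representation analysis). The following Python sketch shows that calculation in isolation; it is not the clusterProfiler implementation, and the function name and the counts in the example are hypothetical.

```python
from math import comb

def enrichment_pvalue(n_universe, n_term, n_module, n_overlap):
    """P(X >= n_overlap) when n_module genes are drawn from n_universe genes,
    of which n_term are annotated to the term (hypergeometric upper tail)."""
    upper = min(n_term, n_module)
    total = comb(n_universe, n_module)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 10 background genes, 5 annotated to the term,
# a module of 5 genes, all 5 of which carry the annotation.
p_example = enrichment_pvalue(10, 5, 5, 5)
```

A small p-value indicates that the module contains more annotated genes than expected by chance; in practice the p-values across all tested terms would also be adjusted for multiple testing.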
S4: Based on the biological functions and the signal transduction pathways of the RBP
genes differentially expressed in the protein interaction network, screened the RBP genes
related to survival period prediction by adopting univariate Cox regression analysis.
In this embodiment, 37 RBP genes related to survival period prediction were identified
among the differentially expressed RBPs (De-RBPs) by univariate Cox regression analysis.
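A univariate Cox screen fits, for each gene separately, a proportional hazards model with that gene's expression as the only covariate. The Python sketch below fits such a one-covariate model by Newton-Raphson on the partial likelihood, with Breslow-style handling of tied event times. It is a simplified illustration under those assumptions, not the routine used in this embodiment; the function name and the toy survival data are hypothetical.

```python
import math

def cox_univariate(times, events, x, iters=50, tol=1e-9):
    """Fit a one-covariate Cox model; returns (beta, hazard_ratio)."""
    beta = 0.0
    n = len(times)
    for _ in range(iters):
        grad, info = 0.0, 0.0
        for i in range(n):
            if not events[i]:
                continue  # censored subjects contribute only via risk sets
            # Risk set: subjects still under observation at this event time.
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            grad += x[i] - s1 / s0               # score contribution
            info += s2 / s0 - (s1 / s0) ** 2     # information contribution
        if info == 0:
            break
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)

# Hypothetical toy data: higher expression tends to go with earlier death.
toy_times = [2, 3, 5, 7, 8, 11, 13, 14]
toy_events = [1, 1, 1, 0, 1, 1, 0, 1]   # 1 = death observed, 0 = censored
toy_expr = [2.1, 1.0, 1.8, 0.4, 1.5, 0.9, 0.2, 1.1]
beta, hazard_ratio = cox_univariate(toy_times, toy_events, toy_expr)
```

A gene would be retained by the screen when the test of beta = 0 is significant; the significance test itself is omitted here for brevity.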
S5: Adopted Least Absolute Shrinkage and Selection Operator (LASSO) regression to
further screen the selected RBP genes related to survival period prediction, and adopted
stepwise multiple regression analysis to obtain key RBP genes related to survival period
prediction.
In this step, after further screening for RBP genes with P value less than 0.01 using
LASSO regression, stepwise multiple regression analysis was performed to screen out
key RBP genes related to survival period prediction and obtain the standardized
regression coefficients. LASSO regression was implemented using the glmnet package of
R language, which used a 10-fold cross-validation method to further reduce the number
of RBP genes related to survival period prediction. The key RBP genes related to survival
period prediction selected by stepwise multiple regression analysis included CNOT6,
UPF3B, MRPL54, ZC3H13, IFIT5, and PPARGC1A.
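Two ingredients of the LASSO step can be illustrated in a few lines of Python: the soft-thresholding operator, which is what drives small coefficients exactly to zero and so removes genes, and the partition of samples into 10 folds for cross-validation. This is a conceptual sketch only and does not reproduce glmnet, where the shrinkage is applied inside a penalized fit of the Cox model; the function names, the penalty value and the coefficients are hypothetical.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Hypothetical unpenalized coefficients for four genes and a penalty of 0.25:
# the second gene is dropped because its coefficient shrinks to zero.
shrunk = [soft_threshold(b, 0.25) for b in [0.9, -0.05, 0.3, -0.6]]

# Hypothetical cohort of 100 samples split into 10 folds.
folds = k_fold_indices(100, k=10)
```

In cross-validation, each fold is held out once while the model is fitted on the other nine, and the penalty giving the best held-out performance is kept.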
In this embodiment, verification of the six key RBP genes using the cBioPortal online
database showed that 39 of 366 patients with HCC (11%) had genetic variation (mutation
and copy number variation) in the six key RBP genes, with ZC3H13 showing the highest
frequency of variation. Meanwhile, the six key RBP genes were analyzed in this embodiment
using the Kaplan-Meier curve analysis method to further verify their survival period
prediction value in the TCGA cohort. The results showed that patients with HCC with low
expression of UPF3B and CNOT6 had longer overall survival (OS), while patients with high
expression of IFIT5, MRPL54, PPARGC1A and ZC3H13 had a higher survival rate. OS refers
to the time from randomization to death due to any cause in patients in clinical trials. It
is considered the best efficacy endpoint in tumor clinical trials, and the preferred
endpoint when patients' survival can be fully evaluated.
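The Kaplan-Meier estimate behind such survival curves multiplies, at each observed death time, the fraction of at-risk patients who survive that time. A minimal Python sketch follows; the function name and the four-patient example are hypothetical, and how patients are divided into high- and low-expression groups is not shown.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if censored
    """
    curve, surv = [], 1.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical group of four patients; the second is censored at time 2.
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Computing one curve per expression group and comparing them is what allows statements such as "low expression of UPF3B had longer OS" to be read off the plot.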
S6: Constructed a model for predicting survival period of hepatocellular carcinoma based
on the key RBP gene related to survival period prediction, wherein the model for
predicting survival period of hepatocellular carcinoma was used for survival period
prediction of patients with hepatocellular carcinoma.
In this step, firstly, based on the key RBP genes related to survival period prediction, a
risk score signature of the key RBP genes related to survival period prediction was
constructed by multivariate Cox regression on the TCGA data using the survival package
of R language. The risk score is given by the following formula:
Risk score = Expression of gene 1 × Coefficient of gene 1 + Expression of gene 2 ×
Coefficient of gene 2 + ... + Expression of gene N × Coefficient of gene N;
Secondly, based on the correlation between the risk score signature of the key RBP
genes and the clinical features and OS, a model for predicting survival period of
hepatocellular carcinoma was constructed.
In this embodiment, based on the six key RBP genes associated with survival period
prediction, the established risk score signature was shown in the following formula:
Risk score = (0.34900 × CNOT6 Exp) + (0.50277 × UPF3B Exp) + (-0.43143 × MRPL54 Exp)
+ (-0.21809 × ZC3H13 Exp) + (-0.46413 × IFIT5 Exp) + (-0.19919 × PPARGC1A Exp)
Among the six key RBP genes associated with survival period prediction, CNOT6 and
UPF3B were high risk factors (HR>1). MRPL54, ZC3H13, IFIT5 and PPARGC1A were
protective factors (HR<1).
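The risk score is a weighted sum, so it can be computed directly from a patient's expression values. The Python sketch below uses the six coefficients stated in this embodiment; the function names and the example expression values are hypothetical. In a Cox model the hazard ratio of a covariate is the exponential of its coefficient, which is why positive coefficients correspond to HR>1 and negative coefficients to HR<1.

```python
import math

# Coefficients of the six key RBP genes as given in this embodiment.
COEFFICIENTS = {
    "CNOT6": 0.34900,
    "UPF3B": 0.50277,
    "MRPL54": -0.43143,
    "ZC3H13": -0.21809,
    "IFIT5": -0.46413,
    "PPARGC1A": -0.19919,
}

def risk_score(expression):
    """Sum of (coefficient x expression) over the six key RBP genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())

def hazard_ratios():
    """Hazard ratio implied by each coefficient (HR = exp(coefficient))."""
    return {gene: math.exp(coef) for gene, coef in COEFFICIENTS.items()}

# Hypothetical patient with every gene expressed at 1.0 (arbitrary units).
example_score = risk_score({gene: 1.0 for gene in COEFFICIENTS})
hr = hazard_ratios()
```

A higher score corresponds to a higher predicted risk; any threshold used to separate high-risk from low-risk patients would have to be chosen separately.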
The above-mentioned embodiments are only for describing the preferred embodiments of
the present invention and are not intended to limit the scope of the present invention. On
the premise of not departing from the design spirit of the present invention, various
modifications and improvements made to the technical scheme of the present invention
by those of ordinary skill in the art shall fall within the protection scope determined by
the claims of the present invention.
Claims (10)
1. A method for constructing a model for predicting survival period of hepatocellular
carcinoma based on RNA binding protein, which is characterized in that it comprises:
S1: Obtain mRNA sequencing data of a plurality of hepatocellular carcinoma tissues and
a plurality of normal liver tissues, and obtain a plurality of human RNA binding protein
(RBP) genes;
S2: Identify a plurality of differentially expressed RBP genes between mRNA sequencing
data of a plurality of hepatocellular carcinoma tissues and a plurality of normal liver
tissues based on the human RBP genes;
S3: Construct a protein interaction network based on a plurality of differentially
expressed RBP genes, and perform GO and KEGG functional enrichment analysis on
modules related to the differentially expressed RBP genes in the protein interaction
network to obtain biological functions and signal transduction pathways of the
differentially expressed RBP genes in the protein interaction network;
S4: Based on the obtained biological functions and signal transduction pathways of the
RBP genes differentially expressed in the protein interaction network, screen out the RBP
genes related to survival period prediction by adopting univariate Cox regression
analysis;
S5: Further screen the selected RBP genes related to survival period prediction by
adopting Least Absolute Shrinkage and Selection Operator (LASSO) regression, and
obtain key RBP genes related to survival period prediction by adopting a stepwise
multiple regression analysis;
S6: Construct a model for predicting survival period of hepatocellular carcinoma based
on the obtained key RBP genes related to survival period prediction, wherein the
model for predicting survival period of hepatocellular carcinoma is used for survival
period prediction of patients with hepatocellular carcinoma.
2. The method for constructing a model for predicting survival period of hepatocellular
carcinoma based on RNA binding protein according to claim 1, wherein in the S2, a
plurality of differentially expressed RBP genes are identified between mRNA sequencing
data of a plurality of hepatocellular carcinoma tissues and a plurality of normal liver
tissues by the Wilcoxon test.
3. The method for constructing a model for predicting survival period of hepatocellular
carcinoma based on RNA binding protein according to claim 2, wherein in the S2, after
the Wilcoxon test, RBP genes with |log2FC|>0.5 and adj.P.value<0.05 are used as the
differentially expressed RBP genes, wherein FC (fold change) represents the ratio of the
expression level of an RBP gene between the hepatocellular carcinoma tissue and the
normal liver tissue.
4. The method for constructing a model for predicting survival period of hepatocellular
carcinoma based on RNA binding protein according to claim 1, wherein in the S3, the
method for constructing a protein interaction network based on a plurality of
differentially expressed RBP genes comprises the steps of: based on the plurality of
differentially expressed RBP genes, using the STRING database to construct a
protein-protein interaction network of the differentially expressed RBP genes, that is, the
protein interaction network; visualizing the protein interaction network with Cytoscape
software; and detecting modules related to the differentially expressed RBP genes in the
protein interaction network by using the Molecular Complex Detection plug-in of
Cytoscape.
5. The method for constructing a model for predicting survival period of hepatocellular
carcinoma based on RNA binding protein according to claim 1, wherein in the S3, the
modules related to the differentially expressed RBP genes in the protein interaction
network are subjected to GO and KEGG functional enrichment analysis by using the
clusterProfiler package of R language, and the results of the GO functional enrichment
analysis are visualized by using the GOplot package of R language.
6. The method for constructing a model for predicting survival period of hepatocellular
carcinoma based on RNA binding protein according to claim 5, wherein in the S3, GO
functional enrichment analysis comprises biological process analysis, cellular component
analysis and molecular function analysis.
7. The method for constructing a model for predicting survival period of hepatocellular
carcinoma based on RNA binding protein according to claim 1, wherein in the S5, based
on the selected RBP genes related to survival period prediction, LASSO regression is used
to further screen RBP genes with P value less than 0.01.
8. The method for constructing a model for predicting survival period of hepatocellular
carcinoma based on RNA binding protein according to claim 7, wherein in the S5, a
glmnet software package of R language is used to implement LASSO regression, and the
glmnet software package further screens RBP genes having a P value less than 0.01 using
a 10-fold cross-validation method.
9. The method for constructing a model for predicting survival period of hepatocellular
carcinoma based on RNA binding protein according to claim 1, wherein in the S5, the key
RBP genes related to survival period prediction are six RBP genes: CNOT6, UPF3B,
MRPL54, ZC3H13, IFIT5, and PPARGC1A.
10. The method for constructing a model for predicting survival period of hepatocellular
carcinoma based on RNA binding protein according to claim 1, wherein in the S6, the
method for constructing a model for predicting survival period of hepatocellular
carcinoma comprises:
Based on the key RBP genes related to survival period prediction, use Cox regression to
construct the risk score signature of key RBP genes related to survival period prediction
as follows:
Risk score = Expression of gene 1 × Coefficient of gene 1 + Expression of gene 2 ×
Coefficient of gene 2 + ... + Expression of gene N × Coefficient of gene N;
In the formula, N is the number of differentially expressed RBP genes;
A model for predicting survival period of hepatocellular carcinoma is then constructed
based on the correlation between the risk score signature and the clinical characteristics
and the survival period.
Figure 1
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021104371A AU2021104371A4 (en) | 2021-07-21 | 2021-07-21 | Method for constructing model for predicting survival period of hepatocellular carcinoma based on RNA binding protein |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2021104371A4 true AU2021104371A4 (en) | 2021-09-16 |
Family
ID=77666576
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2021104371A Ceased AU2021104371A4 (en) | 2021-07-21 | 2021-07-21 | Method for constructing model for predicting survival period of hepatocellular carcinoma based on RNA binding protein |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2021104371A4 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115148294A (en) * | 2022-06-30 | 2022-10-04 | 上海美吉生物医药科技有限公司 | Analysis method, device and application for performing functional enrichment analysis based on multiple sets of mathematical data |
CN116805513A (en) * | 2023-08-23 | 2023-09-26 | 成都信息工程大学 | Cancer driving gene prediction and analysis method based on isomerism map transducer framework |
CN116805513B (en) * | 2023-08-23 | 2023-10-31 | 成都信息工程大学 | Cancer driving gene prediction and analysis method based on isomerism map transducer framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |