CN113832227A - Construction and application of prognosis prediction model of hepatocellular carcinoma patient

Info

Publication number
CN113832227A
Authority
CN
China
Prior art keywords
lncrna
model
prognosis
hepatocellular carcinoma
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111092690.4A
Other languages
Chinese (zh)
Other versions
CN113832227B (en)
Inventor
黄凤婷
张世能
颛孙永勋
屈均池
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen Memorial Hospital Sun Yat Sen University
Original Assignee
Sun Yat Sen Memorial Hospital Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen Memorial Hospital Sun Yat Sen University
Priority to CN202111092690.4A
Publication of CN113832227A
Application granted
Publication of CN113832227B
Legal status: Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/178 Oligonucleotides characterized by their use miRNA, siRNA or ncRNA

Abstract

The invention provides a group of molecular markers for predicting the prognosis of patients with hepatocellular carcinoma. The markers are lncRNAs, comprising MKLN1-AS, LNCSRLR, POLH-AS1, AC145207.5, LINC01063, AL161937.2 and AC105345.1. From these 7 lncRNAs, the invention constructs a risk score prediction model and a nomogram that can accurately predict the prognosis of hepatocellular carcinoma patients. The risk score prediction model and the nomogram have good universality and comparatively high accuracy and sensitivity, and the AUC for 1-year, 3-year and 5-year prognosis prediction is greater than 0.7.

Description

Construction and application of prognosis prediction model of hepatocellular carcinoma patient
Technical Field
The invention belongs to the technical field of medicine, and particularly relates to construction and application of a prognosis prediction model of a hepatocellular carcinoma patient.
Background
Primary liver cancer is one of the most common malignant tumors worldwide, accounting for 4.7% of cancer incidence and 8.3% of cancer mortality worldwide. Hepatocellular carcinoma (HCC) is the main type and accounts for 75%-85% of primary liver cancer. Current treatment of HCC is mainly surgery, supplemented by local treatment, and HCC is not sensitive to radiotherapy or chemotherapy. However, even after radical surgery, the 5-year survival rates after resection and transplantation are only 30% and 60%, respectively, for patients in the middle and advanced stages, and the prognosis is not optimistic. There is therefore an urgent clinical need to evaluate the prognosis of HCC patients accurately and to establish a simple HCC prognosis prediction model that is convenient for practical clinical use, so as to help clinicians assess the survival prognosis of patients, provide a valuable reference for treatment decisions, and manage the treatment of patients more precisely.
Many factors affect the prognosis of HCC patients, including the tumor itself and the general condition of the patient. The systems currently used in the clinic to evaluate the prognosis of HCC patients are mainly clinicopathological staging systems and alpha-fetoprotein (AFP). Several clinical staging systems for liver cancer are in use worldwide, including Barcelona Clinic Liver Cancer (BCLC) staging, AJCC TNM staging, Asian Pacific Association for the Study of the Liver (APASL) staging, Cancer of the Liver Italian Program (CLIP) staging and Japan Society of Hepatology (JSH) staging. Each staging method has advantages and disadvantages; each has its own emphasis when predicting patient prognosis and has certain limitations in survival-related prognosis prediction. In addition, serum AFP is the most widely used tumor marker for HCC and is used for early screening of patients and for predicting prognosis. However, the serum AFP level is influenced by various factors, such as liver cirrhosis, chronic/active hepatitis, pregnancy and germ cell tumors; about 30% of hepatocellular carcinoma patients show no obvious AFP abnormality; and the sensitivity and specificity of AFP for prognosis prediction are low.
Long non-coding RNA (lncRNA) is RNA at least 200 nucleotides in length that is not, or is only rarely, involved in protein coding, and it plays an important role in cell growth, differentiation, chromatin regulation, regulation of gene expression and similar processes. A growing number of studies show that lncRNA is closely related to the prognosis of HCC patients. lncRNA can play a key role in processes such as chromatin regulation, alternative splicing regulation and transcriptional regulation, and can also act as competing endogenous RNA (ceRNA) to regulate the expression of downstream miRNA, ultimately affecting mRNA stability, translational regulation and so on. This regulation at the epigenetic, transcriptional and post-transcriptional levels through various mechanisms suggests that lncRNA plays an important role in the occurrence and development of HCC and can serve as an independent factor for predicting patient prognosis.
Disclosure of Invention
To solve the above technical problems, the invention provides a method for constructing a prognosis prediction model for hepatocellular carcinoma patients, yielding a model that can accurately predict the prognosis of such patients.
The invention adopts the following scheme to achieve this purpose:
In a first aspect, the present invention provides a set of lncRNAs for predicting the prognosis of a patient with hepatocellular carcinoma, the lncRNAs comprising: MKLN1-AS, LNCSRLR, POLH-AS1, AC145207.5, LINC01063, AL161937.2 and AC105345.1.
In a second aspect, the present invention provides a kit for predicting the prognosis of a patient with hepatocellular carcinoma, the kit comprising a set of biomacromolecules with molecular markers, the molecular markers comprising a set of lncRNAs, the lncRNAs comprising: MKLN1-AS, LNCSRLR, POLH-AS1, AC145207.5, LINC01063, AL161937.2 and AC105345.1.
Preferably, the kit also comprises sequences with the nucleotides shown as SEQ ID NO: 1-14.
The invention provides a method for constructing a prognosis prediction model of a patient with hepatocellular carcinoma, which comprises the following steps:
S1: downloading, from the TCGA database, lncRNA expression data of cancer tissue and paracancerous normal tissue samples of hepatocellular carcinoma patients and the clinical information data of the corresponding patients;
S2: screening the cancer tissue samples and paracancerous normal tissue samples that meet the requirements according to the inclusion and exclusion criteria;
S3: performing differential analysis on the lncRNA and mRNA expression profiles of the S2 samples separately, and screening lncRNAs whose expression differs significantly between normal tissue and cancer tissue, with the differential screening criteria set to |logFC| > 1 and p < 0.05;
S4: performing univariate Cox regression analysis on the differential lncRNAs obtained in S3 combined with the survival data of the patients, with the screening condition set to p < 0.01, to screen out the differential lncRNAs significantly related to the prognosis of hepatocellular carcinoma, and then constructing a multivariate Cox regression model to obtain the prognosis prediction model for hepatocellular carcinoma patients, the model being the following calculation formula:
$$\text{Risk Score} = \sum_{i=1}^{n} c_i \times \text{Exp}_i$$
wherein Risk Score is the risk score of each hepatocellular carcinoma sample, $\text{Exp}_i$ is the expression level of each lncRNA in the model, $c_i$ is the calculated risk coefficient of each lncRNA, and $n$ is the number of lncRNAs in the model.
Preferably, the differential lncRNAs significantly related to the prognosis of hepatocellular carcinoma specifically include: MKLN1-AS, LNCSRLR, POLH-AS1, AC145207.5, LINC01063, AL161937.2 and AC105345.1.
Preferably, the prognostic prediction model is specifically as follows:
Risk Score = -4.13484 × AC105345.1 + 0.439897 × AL161937.2 + 0.569063 × LINC01063 + 0.598674 × AC145207.5 + 0.922016 × POLH-AS1 + 0.630741 × LNCSRLR + 1.369449 × MKLN1-AS.
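The formula above can be evaluated directly. The following is a minimal Python sketch (Python is used here only for illustration; the analysis described in this document was done in R and Perl). The seven coefficients are copied from the formula; the function name `risk_score` and the example expression values are hypothetical, and expression values are assumed to be on the same scale as the data used to fit the model.

```python
# Coefficients copied from the Risk Score formula above.
COEFFICIENTS = {
    "AC105345.1": -4.13484,
    "AL161937.2": 0.439897,
    "LINC01063": 0.569063,
    "AC145207.5": 0.598674,
    "POLH-AS1": 0.922016,
    "LNCSRLR": 0.630741,
    "MKLN1-AS": 1.369449,
}


def risk_score(expression):
    """Return sum(c_i * Exp_i) over the seven lncRNAs of the model.

    `expression` maps lncRNA name -> expression level (hypothetical input format).
    """
    missing = [gene for gene in COEFFICIENTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values: {missing}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression values, for illustration only: with every
# expression level equal to 1 the score is simply the sum of the coefficients.
example = {gene: 1.0 for gene in COEFFICIENTS}
print(round(risk_score(example), 6))  # prints 0.395
```

Because AC105345.1 carries the only negative coefficient, higher expression of it lowers the score, while higher expression of the other six lncRNAs raises it.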
Preferably, S1 is implemented as follows: in the Repository search mode, selecting liver and intrahepatic bile ducts as the primary site, TCGA-LIHC as the project, transcriptome profiling as the data category, Gene Expression Quantification as the data type, and HTSeq-FPKM as the workflow type.
Preferably, the inclusion criteria are: the gene expression data are complete and of high quality, the clinical data are complete, and the follow-up time is not less than 30 days; the exclusion criteria are: survival time and survival status are missing, AJCC TNM stage is missing, or pathological grade is missing or unknown.
Preferably, the expression profiles of S2 are obtained as follows: downloading from the Ensembl official website the correspondence set of human Ensembl IDs and gene symbols and the annotation set of human coding/non-coding genes, and using Perl to convert gene names and separate lncRNA from mRNA.
Preferably, the construction method further comprises: combining the independent clinical factors obtained from Cox regression analysis with the lncRNA risk scoring model to draw a nomogram.
The invention has the beneficial effects that: compared with the traditional prediction method, the risk score prediction model and the nomogram in the patent have good universality and higher accuracy and sensitivity.
Drawings
FIG. 1 is a flow chart of the construction of the prediction model of the present invention.
FIG. 2 shows the expression profiles of lncRNA in hepatocellular carcinoma tissue in the TCGA database. A. Volcano plot of differentially expressed lncRNAs; B. heatmap of differentially expressed lncRNAs (only the first 50 lncRNAs are shown). In the Type grouping, N represents paracancerous normal tissue and T represents cancer tissue.
FIG. 3 shows model evaluation in the training group. A. Overall risk curve of the samples; B. overall survival status distribution of the samples; C. expression heatmap of the 7 lncRNAs in the high/low risk score groups: except for the protective lncRNA AC105345.1, all lncRNAs were expressed significantly higher in the high-risk group than in the low-risk group; D. survival analysis of the high/low risk score groups: the prognosis of the high-risk group is poor; E. time-dependent ROC curves of the model at 1 year, 3 years and 5 years.
FIG. 4 shows model verification in the verification group. A. Overall risk curve of the samples; B. overall survival status distribution of the samples; C. expression heatmap of the 7 lncRNAs; D. survival analysis curves of the high/low risk groups; E. time-dependent ROC curves of the model at 1 year, 3 years and 5 years.
FIG. 5 shows model verification in the complete data set. A. Overall risk curve of the samples; B. overall survival status distribution of the samples; C. expression heatmap of the 7 lncRNAs; D. survival analysis curves of the high/low risk groups; E. time-dependent ROC curves of the model at 1 year, 3 years and 5 years.
FIG. 6 shows the validation, in clinical samples, of the 7 lncRNAs in the prediction model of the present invention. T: tumor; N: normal; p < 0.05.
FIG. 7 shows the screening of independent clinical risk factors. A. Forest plot of the contribution of each clinical factor to prognosis in the training group; B. forest plot of the contribution of each clinical factor to prognosis in the complete data set.
FIG. 8 is the nomogram constructed by combining the risk scoring model and TNM staging. The vertical axis lists the screened independent clinical risk factors, and the horizontal scale gives the corresponding score.
FIG. 9 shows the evaluation of the nomogram. A. Decision curve analysis incorporating all clinical factors, the model and the nomogram; B. time-dependent ROC curves of the nomogram at 1 year, 3 years and 5 years.
FIG. 10 shows the calibration curves of the nomogram. A. Calibration curve for predicted 1-year survival; B. calibration curve for predicted 3-year survival; C. calibration curve for predicted 5-year survival.
FIG. 11 shows time-dependent ROC curves. A. Time-dependent ROC curves of TNM staging at 1 year, 3 years and 5 years; B. comparison of the 1-year time-dependent ROC curves of TNM staging, the lncRNA risk score model and the nomogram; C. comparison of the 3-year time-dependent ROC curves of TNM staging, the lncRNA risk score model and the nomogram; D. comparison of the 5-year time-dependent ROC curves of TNM staging, the lncRNA risk score model and the nomogram.
Detailed Description
To present the technical solutions, purposes and advantages of the present invention more concisely and clearly, the technical solutions of the present invention are described in detail below with reference to specific embodiments. Unless otherwise specified, the reagents used in the examples of the present invention are all commercially available products.
Example 1 Construction of the lncRNA risk scoring model. The specific steps are as follows:
① Data acquisition and preliminary processing
To obtain transcriptome expression profile data of hepatocellular carcinoma and the corresponding patient clinical information data, the TCGA database (https://portal.gdc.cancer.gov/) was accessed. In the Repository search mode, liver and intrahepatic bile ducts was selected as the primary site, TCGA-LIHC as the project, transcriptome profiling as the data category, Gene Expression Quantification as the data type, and HTSeq-FPKM (data after official correction) as the workflow type. In this way, the lncRNA expression data of 424 hepatocellular carcinoma cancer tissue/paracancerous normal tissue samples in the TCGA database, comprising 374 cancer tissues and 50 paracancerous normal tissues, and the clinical information data of the corresponding patients were downloaded. The general flow of the invention is shown in FIG. 1.
Inclusion and exclusion criteria were then determined. Inclusion criteria: 1. the gene expression data are complete and of high quality; 2. the clinical data are complete; 3. the follow-up time is not less than 30 days. Exclusion criteria: 1. survival time and survival status are missing; 2. AJCC TNM stage is missing; 3. pathological grade is missing or unknown. After screening, 338 hepatocellular carcinoma cancer tissue samples were included in the present invention, and the 50 paracancerous normal tissue samples were used as the control group for differential analysis.
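The clinical part of these criteria can be expressed as a simple record filter. The following Python sketch is illustrative only: the field names (`follow_up_days`, `vital_status`, `tnm_stage`, `grade`), the set of strings treated as missing, and the example patients are all hypothetical and do not correspond to actual TCGA column names. The expression-data quality criterion is not modeled.

```python
# Hypothetical field names; real TCGA clinical tables use different columns.
REQUIRED_FIELDS = ("follow_up_days", "vital_status", "tnm_stage", "grade")
# Strings treated as "missing or unknown" (an assumption for this sketch).
MISSING_VALUES = {"", "unknown", "not reported"}


def is_missing(value):
    """True if a clinical field is absent or carries a placeholder value."""
    return value is None or str(value).strip().lower() in MISSING_VALUES


def is_eligible(patient):
    """Apply the exclusion criteria first, then the 30-day follow-up rule."""
    if any(is_missing(patient.get(field)) for field in REQUIRED_FIELDS):
        return False
    return float(patient["follow_up_days"]) >= 30


# Hypothetical patients, for illustration only.
patients = [
    {"id": "P1", "follow_up_days": 400, "vital_status": "alive", "tnm_stage": "II", "grade": "G2"},
    {"id": "P2", "follow_up_days": 12, "vital_status": "dead", "tnm_stage": "III", "grade": "G3"},
    {"id": "P3", "follow_up_days": 900, "vital_status": "dead", "tnm_stage": None, "grade": "G3"},
    {"id": "P4", "follow_up_days": 250, "vital_status": "alive", "tnm_stage": "I", "grade": "unknown"},
]
included = [p["id"] for p in patients if is_eligible(p)]
print(included)  # prints ['P1']
```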
The screened TCGA transcriptome expression data and clinical data were extracted in batches and integrated using Perl (Practical Extraction and Report Language, version 5.30.2.1). The correspondence set of human Ensembl IDs and gene symbols and the annotation set of human coding/non-coding genes were downloaded from the Ensembl official website (http://asia.ensembl.org/index.html), and Perl was used to convert gene names and separate lncRNA from mRNA, giving the lncRNA expression profiles of 338 cancer tissues and 50 paracancerous normal tissues for the subsequent differential analysis.
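The gene-name conversion and the lncRNA/mRNA separation amount to a lookup against the annotation table. The original work used Perl; the following is a Python sketch of the same idea under stated assumptions: the annotation is already loaded as a mapping from Ensembl gene ID to (symbol, biotype), IDs in the expression table may carry a version suffix after a dot, and the biotype labels are `lncRNA` and `protein_coding`. The IDs, symbols and values in the example are made up.

```python
def split_by_biotype(expression, annotation):
    """Rename Ensembl IDs to gene symbols and split rows into lncRNA and mRNA.

    expression: {ensembl_id: [values per sample]}
    annotation: {ensembl_id_without_version: (symbol, biotype)}
    Rows without an annotation entry, or with another biotype, are dropped.
    """
    lncrna, mrna = {}, {}
    for ensembl_id, values in expression.items():
        gene_id = ensembl_id.split(".")[0]  # strip an optional version suffix
        info = annotation.get(gene_id)
        if info is None:
            continue
        symbol, biotype = info
        if biotype == "lncRNA":
            lncrna[symbol] = values
        elif biotype == "protein_coding":
            mrna[symbol] = values
    return lncrna, mrna


# Made-up identifiers and values, for illustration only.
annotation = {
    "ENSG_DEMO_0001": ("DEMO-LNC1", "lncRNA"),
    "ENSG_DEMO_0002": ("DEMO-GENE2", "protein_coding"),
    "ENSG_DEMO_0003": ("DEMO-PSEUDO3", "pseudogene"),
}
expression = {
    "ENSG_DEMO_0001.5": [1.2, 0.4],
    "ENSG_DEMO_0002.11": [8.0, 7.5],
    "ENSG_DEMO_0003.2": [0.0, 0.1],
    "ENSG_DEMO_0099.1": [3.3, 3.1],
}
lncrna_profile, mrna_profile = split_by_biotype(expression, annotation)
print(sorted(lncrna_profile), sorted(mrna_profile))
```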
② differential analysis and lncRNA model screening
The subsequent data analysis of the invention was carried out in the R language (version 3.6.3) using RStudio (version 1.2.1335). Differential analysis of the lncRNA and mRNA expression profiles of the 338 cancer tissues and 50 paracancerous normal tissues was performed separately by calling the "limma" package, with the differential screening criteria set to |logFC| > 1 and p < 0.05; 3401 differential lncRNAs were screened out of 14143 lncRNAs. The "pheatmap" package was called to draw the heatmap and the "ggplot2" package to draw the volcano plot (FIG. 2). Clinical data and transcriptome expression data were combined using the merge() function for the subsequent analysis.
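The screening criteria themselves are a two-condition filter on the per-gene results of the differential analysis. The Python sketch below illustrates only that filter; it does not reproduce the limma model. The input format (gene name mapped to a log fold change and a p value) and the example numbers are hypothetical, and the text does not state whether the p value is adjusted, so the sketch simply takes whatever p value is supplied.

```python
def select_differential(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Keep genes with |logFC| > lfc_cutoff and p < p_cutoff.

    results: {gene: (log_fold_change, p_value)}
    Returns {gene: "up" or "down"} for the genes that pass both cutoffs.
    """
    selected = {}
    for gene, (log_fc, p_value) in results.items():
        if abs(log_fc) > lfc_cutoff and p_value < p_cutoff:
            selected[gene] = "up" if log_fc > 0 else "down"
    return selected


# Hypothetical differential-analysis results, for illustration only.
results = {
    "lnc_A": (2.3, 0.001),    # passes, up in cancer tissue
    "lnc_B": (-1.8, 0.020),   # passes, down in cancer tissue
    "lnc_C": (0.6, 0.0001),   # fails: fold change too small
    "lnc_D": (3.1, 0.200),    # fails: not significant
    "lnc_E": (1.0, 0.010),    # fails: |logFC| must be strictly greater than 1
}
print(select_differential(results))
```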
The 338 hepatocellular carcinoma cancer tissue samples were randomly divided into a training group and a verification group at a ratio of 7:3 by calling the createDataPartition() function in the "caret" package, giving 238 cases in the training group and 100 cases in the verification group. createDataPartition() performs stratified random sampling on the class labels, which keeps the distribution of the labels in the training set and the test set consistent with that of the whole sample. Then, by calling the "survival" and "survminer" packages, univariate Cox regression analysis was performed on the 3401 differential lncRNAs combined with the survival data of the patients in the training group, with the screening condition set to p < 0.01, and a total of 39 differential lncRNAs significantly related to the prognosis of hepatocellular carcinoma were screened out. Based on these 39 lncRNAs, a multivariate Cox regression model was constructed (stepwise regression, backward method) by calling the "glmnet" package. The goodness of fit and the accuracy of each model in the stepwise regression were evaluated by the AIC value and the LR value, respectively, and the model with the lowest AIC value and the highest LR value was selected, finally giving a prognosis model containing 7 lncRNAs. Based on this model, the Risk Score of each of the 338 hepatocellular carcinoma samples was calculated with the following formula:
$$\text{Risk Score} = \sum_{i=1}^{n} c_i \times \text{Exp}_i$$
wherein Risk Score is the risk score of each hepatocellular carcinoma sample, $\text{Exp}_i$ is the expression level of each lncRNA in the model, $c_i$ is the calculated risk coefficient of each lncRNA (Table 1), and $n$ is the number of lncRNAs in the model.
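The 7:3 stratified split described in this step can be illustrated without R. The Python sketch below is a simplified stand-in for caret's createDataPartition(), not a reimplementation of it: it shuffles the samples within each label and cuts each label at the requested fraction. The label used for stratification (here a 0/1 status), the sample IDs and the seed are assumptions made for the example, so the resulting group sizes need not match the 238/100 split reported above.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample IDs so that each label keeps about the same train fraction.

    labels: {sample_id: label}
    Returns (train_ids, test_ids).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample_id, label in labels.items():
        by_label[label].append(sample_id)

    train, test = [], []
    for label in sorted(by_label):
        ids = sorted(by_label[label])  # fixed order so the seed is reproducible
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.extend(ids[:cut])
        test.extend(ids[cut:])
    return train, test


# Hypothetical labels for 338 samples (200 with status 0, 138 with status 1).
labels = {f"S{i:03d}": (0 if i < 200 else 1) for i in range(338)}
train_ids, test_ids = stratified_split(labels)
print(len(train_ids), len(test_ids))
```

Splitting within each label, rather than over the pooled samples, is what keeps the label proportions of the two groups close to those of the whole cohort.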
Table 1: 7 lncRNA information incorporated into the model
Figure RE-GDA0003364276010000071
③ Verification of the lncRNA risk scoring model
The model is evaluated mainly with the time-dependent receiver operating characteristic curve (time-dependent ROC curve) and the survival curve. The time-dependent ROC curve is an extension of the classical ROC curve. Classical ROC analysis requires that the survival status of the study subjects remain unchanged, but in medical follow-up studies the survival status of patients changes during follow-up, and censored data inevitably arise because of compliance problems. In this case time-dependent ROC analysis can be used: it combines survival analysis with the ROC curve, can include censored data, and reflects both the prediction accuracy of the model and how its predictive ability changes across time points. In time-dependent ROC analysis, the ROC curve at each time point is drawn from the survival status of the study subjects at that time point, and the area under the ROC curve (AUC) is calculated. Commonly used time points are 1 year, 3 years and 5 years. It is generally accepted that an AUC of 0.6-0.7 indicates low accuracy, 0.7-0.9 medium accuracy, and 0.9-1.0 high accuracy, although overfitting may be present in the last case.
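The idea of an AUC evaluated at a fixed time point can be sketched in a few lines of Python. This is a deliberately naive version and is not the estimator used by any particular R package: patients who died on or before the time point are treated as cases, patients followed beyond it as controls, and patients censored before it are simply dropped, whereas proper time-dependent ROC estimators account for them through weighting. The function name and the example data are hypothetical.

```python
def naive_time_auc(times, events, scores, horizon):
    """AUC at `horizon` from cases (death by horizon) and controls (followed beyond it).

    times: follow-up times; events: 1 = death observed, 0 = censored;
    scores: risk scores, higher meaning higher predicted risk.
    Patients censored on or before `horizon` are ignored (naive simplification).
    """
    cases = [s for t, e, s in zip(times, events, scores) if e == 1 and t <= horizon]
    controls = [s for t, e, s in zip(times, events, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half
    return wins / (len(cases) * len(controls))


# Hypothetical follow-up data in years, for illustration only.
times = [0.5, 0.8, 2.0, 4.0, 6.0, 0.7]
events = [1, 1, 1, 0, 0, 0]
scores = [2.5, 1.9, 0.2, 0.4, 0.3, 1.0]
print(naive_time_auc(times, events, scores, horizon=1.0))
print(round(naive_time_auc(times, events, scores, horizon=3.0), 4))
```

In this toy data the patient who died at 2.0 years has a low risk score, so the AUC at 3 years is lower than the AUC at 1 year, which illustrates how predictive ability can differ between time points.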
According to the risk scores obtained from the model, the samples of the training group, the verification group and the complete data set were each divided into a high-risk group and a low-risk group at the median. Survival analysis was then carried out on the training group, the verification group and the complete data set according to this grouping, and the risk curve, survival status plot, heatmap, survival curve and time-dependent ROC curve were drawn. The clinical data of the patients in the training group, the verification group and the complete data set are shown in Table 2. The survival analysis curves of the training group are used to evaluate the accuracy and predictive ability of the model, and those of the verification group are used to verify its predictive ability and applicability. The risk scoring model shows good accuracy in predicting the prognosis of hepatocellular carcinoma patients in the training group (FIG. 3), and its predictive ability is confirmed in the verification group (FIG. 4) and the complete data set (FIG. 5).
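The median grouping can be sketched as follows in Python. Two details are assumptions of this sketch, because the text does not specify them: the median is computed within the group being split, and a score exactly equal to the median is assigned to the low-risk group. The patient IDs and scores are hypothetical.

```python
from statistics import median


def split_by_median(risk_scores):
    """Divide samples into high- and low-risk groups at the median risk score.

    risk_scores: {sample_id: risk_score}
    Returns (cutoff, high_risk_ids, low_risk_ids); ties go to the low-risk group.
    """
    cutoff = median(risk_scores.values())
    high = sorted(s for s, score in risk_scores.items() if score > cutoff)
    low = sorted(s for s, score in risk_scores.items() if score <= cutoff)
    return cutoff, high, low


# Hypothetical risk scores, for illustration only.
demo_scores = {"P1": 0.2, "P2": 1.4, "P3": 0.9, "P4": 2.1}
cutoff, high_risk, low_risk = split_by_median(demo_scores)
print(cutoff, high_risk, low_risk)
```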
TABLE 2 Baseline profiles for training and validation sets
Figure RE-GDA0003364276010000072
Figure RE-GDA0003364276010000081
Example 2 validation of lncRNA risk scoring model, the specific steps are as follows:
(1) Specimen collection and tissue RNA extraction
Fresh specimens of cancer tissue and paracancerous tissue were collected from 9 hepatocellular carcinoma patients at Sun Yat-sen Memorial Hospital, Sun Yat-sen University, from March to April 2021. All specimens were placed in liquid nitrogen for preservation immediately after surgical excision. Inclusion criteria: 1. primary tumor of the liver; 2. hepatocellular carcinoma confirmed by pathological section; 3. newly diagnosed patients who had not received anti-cancer treatment such as radiotherapy, chemotherapy or immunotherapy. Exclusion criteria: 1. missing follow-up data; 2. combined with other malignant tumors. All patients signed informed consent before specimen collection. The study was approved by the ethics committee of Sun Yat-sen Memorial Hospital, Sun Yat-sen University.
(2) RNA extraction step:
1. The tissue was taken out of liquid nitrogen and cut into pieces about the size of a mung bean (about 0.5 cm × 0.5 cm × 0.5 cm); 1 ml of Trizol solution was added, and the tissue was thoroughly ground with a pestle and mortar and left to stand for 5 min, all on ice.
2. Chloroform (200 μl) was added, the cap was closed immediately, and the mixture was shaken vigorously for 15 s and then left to stand at room temperature for 5 min.
3. The mixture was centrifuged at 4 °C and 12000 rpm for 15 min; the supernatant is the RNA layer.
4. The supernatant was transferred to a new tube, half a volume of isopropanol was added, and the tube was mixed and left to stand for 10 min.
5. The tube was centrifuged at 4 °C and 12000 rpm for 10 min, and the supernatant was discarded.
6. 500 μl of precooled 75% ethanol was added to wash the pellet, followed by centrifugation at 4 °C for 5 min (rpm < 7500).
7. The supernatant was discarded and step 6 was repeated once.
8. The supernatant was discarded, and the EP tube was air-dried at room temperature until the pellet became translucent.
9. 20 μl of DEPC water was added, and the RNA was dissolved in a 60 °C water bath for 10 min.
10. The concentration and purity of the RNA were measured with a NanoDrop, and the RNA was stored at -70 °C until use.
(3) Quantitative PCR
All primers of the present invention were designed and synthesized by Beijing Rui Boxing Ke Biotechnology Co., Ltd.; the primer sequences are shown in Table 3 below:
Table 3: Primer sequences
Figure RE-GDA0003364276010000091
The specific PCR steps are as follows:
1. Prepare EP tubes, a reverse transcription kit, a PCR kit, an ice box, pipette tips and other items.
2. Calculate the reaction system according to the instructions of the reverse transcription and PCR kits.
3. Synthesize cDNA on ice according to the reverse transcription kit instructions.
4. Mix the primers with the SYBR mix, and mix the cDNA with water.
5. Load the samples, mix well by shaking, and run on the instrument.
The results are shown in FIG. 6: AC105345.1 was expressed at a low level in cancer tissue and at a high level in normal tissue, whereas AL161937.2, LINC01063, AC145207.5, POLH-AS1, LNCSRLR and MKLN1-AS were all expressed at high levels in cancer tissue and at low levels in normal tissue.
Example 3 Evaluation of the lncRNA risk scoring model
① Screening of independent clinical risk factors
To find possible independent clinical factors influencing the prognosis of hepatocellular carcinoma, Cox regression analysis was carried out on the training group and the complete data set by calling the "survival" package, and forest plots were drawn for visualization (FIG. 7). The clinical data comprise age, gender, pathological grade and AJCC TNM stage. Then, based on the results of the Cox regression analysis, the complete data set was divided into subgroups by the independent clinical factors, and survival curves were drawn for each subgroup to evaluate the predictive ability of the model within the subgroups. The forest plots show that, in both the training group and the complete data set, the p values of Risk Score, Stage and T are all less than 0.001, while the p values of the remaining clinical factors are all greater than 0.05. Since the T stage is included in Stage (i.e., AJCC TNM stage), Risk Score and AJCC TNM stage are considered to be independent clinical risk factors influencing the prognosis of hepatocellular carcinoma.
② Construction and evaluation of the nomogram
Based on the independent clinical factors obtained from the Cox regression analysis in the previous step, combined with the lncRNA risk scoring model, a nomogram that visually reflects patient prognosis was drawn using the "rms" package (FIG. 8). The predictive power and accuracy of the nomogram were then verified using the decision curve analysis (FIG. 9), the time-dependent ROC curve (FIG. 9), the calibration curve (FIG. 10) and the C-index. The calibration curve is obtained by the Bootstrap self-sampling method (one of the most common resampling methods, in which new sample groups are obtained by repeatedly drawing from the original sample group with replacement). Its basic idea is to take the actual survival rate as the ordinate and the predicted survival rate as the abscissa. The survival rate of every patient at a specified time point is predicted with the nomogram model, the patients are divided into several groups in order of increasing predicted survival rate, and the mean predicted survival rate and the mean actual survival rate of each group are calculated. Connecting all calibration points with a smooth curve gives the prediction curve, while the standard curve reflects the actual survival; the more closely the prediction curve matches the standard curve, the higher the prediction accuracy of the model. The C index (C-index) is also obtained by Bootstrap self-sampling and is used to evaluate whether the model suffers from overfitting bias. It is interpreted in a similar way to the AUC value of an ROC curve: values between 0.6 and 0.7 indicate modest accuracy, values between 0.7 and 0.9 indicate moderate accuracy, and values between 0.9 and 1.0 usually indicate overfitting bias.
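The C-index and the resampling-with-replacement idea described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the invention (which relied on the R "rms" package): the function names `concordance_index` and `bootstrap_c_index` and the toy patient data are hypothetical, and the sketch only resamples the C-index rather than reproducing any bias correction.

```python
import random


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable patient pairs in which the
    patient with the higher risk score has the shorter survival time."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # (events[i] == 1) strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_c_index(times, events, risks, n_boot=200, seed=0):
    """Draw patients with replacement and return the C-index of each resample."""
    rng = random.Random(seed)
    n = len(times)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            values.append(concordance_index(
                [times[k] for k in idx],
                [events[k] for k in idx],
                [risks[k] for k in idx],
            ))
        except ValueError:
            continue  # skip resamples that contain no comparable pair
    return values


# Hypothetical toy data: survival time (days), event flag (1 = death), risk score.
times = [100, 250, 400, 600, 900]
events = [1, 1, 0, 1, 0]
risks = [2.1, 0.8, 0.9, 1.2, 0.3]

c = concordance_index(times, events, risks)
boot = bootstrap_c_index(times, events, risks, n_boot=50, seed=1)
```

The spread of the values in `boot` gives a rough sense of how stable the C-index is under resampling; a percentile interval over these values is one simple way to summarize it.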
The decision curve analysis method can evaluate the clinical practicability of the model by calculating the Net Benefit (NB) of the model under different threshold values, and the calculation formula of the Net Benefit is as follows:
Figure RE-GDA0003364276010000111
where tp is the number of true positives, fp is the number of false positives, n is the total number of patients, and pt is the threshold probability. Net benefit refers to the ability of the model to identify patients who are truly at high risk of death while avoiding false positives and false negatives as far as possible. In general, within a specified threshold range, the higher the net benefit of a model, the greater its clinical utility; decision curve analysis can therefore be used to compare the clinical value of different prognostic factors intuitively and accurately.
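As an illustration of the net benefit calculation, the sketch below assumes the standard decision-curve form NB = tp/n - (fp/n) × pt/(1 - pt), which is consistent with the variables defined above; the formula in the original is available only as a figure, so this form is an assumption. The function names and the toy outcome data are hypothetical.

```python
def net_benefit(tp, fp, n, pt):
    """Net benefit at threshold probability pt (standard decision-curve form,
    assumed here because the original formula is only given as a figure)."""
    if not 0 < pt < 1:
        raise ValueError("pt must lie strictly between 0 and 1")
    return tp / n - (fp / n) * (pt / (1 - pt))


def decision_curve(y_true, y_prob, thresholds):
    """Net benefit of flagging patients whose predicted risk is >= each threshold."""
    n = len(y_true)
    curve = []
    for pt in thresholds:
        # Count flagged patients who did (tp) or did not (fp) have the event.
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
        curve.append((pt, net_benefit(tp, fp, n, pt)))
    return curve


# Hypothetical toy data: 1 = death observed, with predicted death probabilities.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.8, 0.5, 0.05]

curve = decision_curve(y_true, y_prob, [0.2, 0.5])
```

Plotting the net benefit against the threshold for each candidate predictor, on the same axes, gives the kind of comparison described for the decision curve analysis.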
The calibration curves show that the predicted values of the nomogram agree closely with the actual values at 1 year, 3 years and 5 years, indicating that the nomogram has good accuracy. The decision curve analysis shows that the net benefit to patients of the nomogram built on RiskScore and Stage is superior to that of RiskScore or Stage alone and clearly superior to that of the other single clinical factors, so the nomogram is a predictor that avoids false positives and false negatives to the greatest extent. The 1-year, 3-year and 5-year AUC values of the time-dependent ROC curve were 0.796, 0.811 and 0.795, respectively, and the C index of the nomogram was 0.696 (95% CI: 0.644-0.767, p < 0.001), indicating that the nomogram (FIG. 8) has good prognostic power.
③ Comparing the predictive value of different prediction modes
To compare the different prediction modes involved in the invention, time-dependent ROC curves of the TNM stage were drawn for the 338 liver cancer samples in the TCGA database (FIG. 11A), the area under the curve (AUC) values were calculated, and the curves were compared with the time-dependent ROC curves of the lncRNA risk score model and of the nomogram (FIG. 11B, C and D). The 1-year and 3-year curves of the model and of the nomogram cross one another and have similar predictive value, and both are generally superior to the TNM stage. To determine whether the AUC values of the model, the nomogram and the TNM stage differ statistically in pairwise comparison, a normality test was performed first; the results satisfied a normal distribution (p values: 0.187, 0.107 and 0.328, respectively). A paired t-test was then used to compare the mean AUC values pairwise (Table 3). The results show that the mean AUC values differ significantly between the model and the TNM stage and between the nomogram and the TNM stage, whereas the mean AUC values of the model and the nomogram do not differ significantly; the mean AUC values of the model and of the nomogram are both greater than that of the TNM stage. The prognostic prediction ability of the model and of the nomogram is therefore considered to be no worse than that of the AJCC TNM stage, and the model and the nomogram are similar to each other in prognostic prediction ability.
Table 3: AUC value comparison results of the three prognosis modes
Figure RE-GDA0003364276010000121
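The pairwise comparison of mean AUC values can be sketched as a paired t statistic computed with the Python standard library. This is an illustrative sketch only: the function name `paired_t_statistic` is hypothetical, the AUC values below are made-up placeholders rather than the values in Table 3, and the software actually used for the test is not stated in the source.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a paired t-test on two matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("samples must be paired and contain at least two values")
    # Work on the per-pair differences, as a paired test requires.
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Placeholder AUC values at matched time points (hypothetical, not from Table 3).
auc_method_a = [0.80, 0.82, 0.79]
auc_method_b = [0.70, 0.71, 0.72]

t_value, dof = paired_t_statistic(auc_method_a, auc_method_b)
```

The resulting t value is compared against the t distribution with `dof` degrees of freedom to obtain a p value; with only a few matched time points the test has little power, which is worth keeping in mind when reading a non-significant difference.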
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention and not for limiting the protection scope of the present invention, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A set of molecular markers for predicting the prognosis of a patient with hepatocellular carcinoma, wherein the molecular markers are lncRNAs, and the lncRNAs comprise: MKLN1-AS, LNCSRLR, POLH-AS1, AC145207.5, LINC01063, AL161937.2 and AC105345.1.
2. A kit for predicting the prognosis of a patient with hepatocellular carcinoma, the kit comprising biological macromolecules of a set of molecular markers, wherein the molecular markers comprise a set of lncRNAs: MKLN1-AS, LNCSRLR, POLH-AS1, AC145207.5, LINC01063, AL161937.2 and AC105345.1.
3. The kit of claim 2, further comprising nucleotide sequences as shown in SEQ ID NO: 1-14.
4. A method for constructing a prognosis prediction model for patients with hepatocellular carcinoma, characterized by comprising the following steps:
S1: downloading, from the TCGA database, lncRNA expression data of cancer tissue or paracancerous normal tissue samples of patients with hepatocellular carcinoma, together with the clinical information of the corresponding patients;
S2: screening cancer tissue samples and paracancerous normal tissue samples that meet the requirements according to the inclusion and exclusion criteria;
S3: performing differential analysis on the lncRNA and mRNA expression profiles of the samples from S2, and screening for lncRNAs whose expression differs significantly between normal tissue and cancer tissue, with the differential screening criteria set to |logFC| > 1 and p < 0.05;
S4: performing univariate Cox regression analysis on the differential lncRNAs obtained in S3 in combination with patient survival data, with the screening condition set to p < 0.01, to screen out differential lncRNAs significantly related to the prognosis of hepatocellular carcinoma, and then constructing a multivariate Cox regression model to obtain the prognosis prediction model for patients with hepatocellular carcinoma, the model being the following calculation formula:
Figure FDA0003267940950000011
wherein Risk Score is the risk score of each hepatocellular carcinoma sample, Exp_i is the expression level of each lncRNA in the model, and c_i is the calculated risk coefficient of each lncRNA.
5. The construction method according to claim 4, wherein the lncRNAs significantly related to the prognosis of hepatocellular carcinoma specifically comprise: MKLN1-AS, LNCSRLR, POLH-AS1, AC145207.5, LINC01063, AL161937.2 and AC105345.1.
6. The construction method according to claim 5, wherein the prognosis prediction model is specifically as follows:
Risk Score = -4.13484 × AC105345.1 + 0.439897 × AL161937.2 + 0.569063 × LINC01063 + 0.598674 × AC145207.5 + 0.922016 × POLH-AS1 + 0.630741 × LNCSRLR + 1.369449 × MKLN1-AS.
7. The construction method according to claim 4, wherein S1 is carried out as follows: selecting the Repository search mode, selecting liver and intrahepatic bile ducts as the tumor site, selecting TCGA-LIHC as the tumor pathology type, selecting transcriptome profiling as the data category, selecting Gene Expression Quantification as the data type, and selecting HTSeq-FPKM as the data processing format.
8. The construction method according to claim 4, wherein the inclusion criteria are: complete, high-quality gene expression data, complete clinical data, and a follow-up time of not less than 30 days; and the exclusion criteria are: missing survival time or survival status, missing AJCC TNM stage, and missing or unknown pathological stage.
9. The construction method according to claim 4, wherein the expression profiles of the S2 samples are obtained by: downloading, from the official Ensembl website, the correspondence between human Ensembl IDs and gene symbols together with a set of human coding/non-coding gene annotations, and using the Perl language to convert gene names and to separate lncRNA from mRNA.
10. The method of claim 4, wherein the independent clinical factors obtained from the Cox regression analysis are combined with the lncRNA risk score model to generate a nomogram.
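
The risk score formula recited in claim 6 can be sketched in Python as follows. The coefficients are copied from claim 6; the function name `risk_score`, the dictionary-based input and the sample expression values are hypothetical, and the sketch assumes the expression values are on the same scale (for example FPKM) that was used when the coefficients were fitted.

```python
# Coefficients c_i taken from the formula in claim 6.
COEFFICIENTS = {
    "AC105345.1": -4.13484,
    "AL161937.2": 0.439897,
    "LINC01063": 0.569063,
    "AC145207.5": 0.598674,
    "POLH-AS1": 0.922016,
    "LNCSRLR": 0.630741,
    "MKLN1-AS": 1.369449,
}


def risk_score(expression):
    """Return sum(c_i * Exp_i) over the seven lncRNAs for one sample.

    `expression` maps each lncRNA name to its expression level in the sample.
    """
    missing = [name for name in COEFFICIENTS if name not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[name] for name, coef in COEFFICIENTS.items())


# Hypothetical sample in which every lncRNA has an expression level of 1.0.
sample = {name: 1.0 for name in COEFFICIENTS}
score = risk_score(sample)
```

Because AC105345.1 carries the only negative coefficient, a higher expression of that lncRNA lowers the score, while higher expression of the other six raises it.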
CN202111092690.4A 2021-09-17 2021-09-17 Construction and application of prognosis prediction model of hepatocellular carcinoma patient Active CN113832227B (en)

Publications (2)

Publication Number Publication Date
CN113832227A true CN113832227A (en) 2021-12-24
CN113832227B CN113832227B (en) 2022-07-05

Family

ID=78959830

Country Status (1)

Country Link
CN (1) CN113832227B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106893784A (en) * 2017-05-02 2017-06-27 北京泱深生物信息技术有限公司 LncRNA marks for predicting prognosis in hcc

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
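TOMCZAK K et al.: "The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge", 《CONTEMP ONCOL(POZN)》 *
张诚胜 et al.: "Analysis and evaluation of TCGA database information on the association between lncRNA and hepatocellular carcinoma", 《中华肿瘤防治杂志》 *
王沐淇 et al.: "Construction of a prognosis prediction model of autophagy-related lncRNAs in liver cancer", 《西安交通大学学报(医学版)》 *
肖金荣 et al.: "Mining a long non-coding RNA molecular signature related to the prognosis of hepatocellular carcinoma based on public databases", 《中华流行病学杂志》 *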

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114107511A (en) * 2022-01-10 2022-03-01 深圳市龙华区人民医院 Marker combination for predicting liver cancer prognosis and application thereof
CN114107511B (en) * 2022-01-10 2023-10-20 深圳市龙华区人民医院 Marker combination for predicting prognosis of liver cancer and application thereof
CN114457161A (en) * 2022-03-07 2022-05-10 江西省肿瘤医院(江西省第二人民医院、江西省癌症中心) Application of lncRNA AC145207.5 in colorectal cancer diagnosis, treatment and drug sensitivity improvement
CN114457161B (en) * 2022-03-07 2023-12-19 江西省肿瘤医院(江西省第二人民医院、江西省癌症中心) Application of lncRNA AC145207.5 in colorectal cancer diagnosis, treatment and drug sensitivity improvement
CN116844685A (en) * 2023-07-03 2023-10-03 广州默锐医药科技有限公司 Immunotherapeutic effect evaluation method, device, electronic equipment and storage medium
CN116844685B (en) * 2023-07-03 2024-04-12 广州默锐医药科技有限公司 Immunotherapeutic effect evaluation method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113832227B (en) 2022-07-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant