CN113355426B - Evaluation gene set and kit for predicting liver cancer prognosis - Google Patents
- Publication number
- CN113355426B (application CN202110916132.9A)
- Authority
- CN
- China
- Prior art keywords
- gene
- prognosis
- risk score
- patient
- genes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/574—Immunoassay; Biospecific binding assay; Materials therefor for cancer
- G01N33/57407—Specifically defined cancers
- G01N33/57438—Specifically defined cancers of liver, pancreas or kidney
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/574—Immunoassay; Biospecific binding assay; Materials therefor for cancer
- G01N33/57484—Immunoassay; Biospecific binding assay; Materials therefor for cancer involving compounds serving as markers for tumor, cancer, neoplasia, e.g. cellular determinants, receptors, heat shock/stress proteins, A-protein, oligosaccharides, metabolites
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/118—Prognosis of disease development
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
Abstract
The invention relates to an evaluation gene set for liver cancer prognosis. Specifically, the invention uses a gene set consisting of 62 specific immune-related genes to predict and evaluate the prognosis of liver cancer patients and to provide a scientific basis for medical decisions. The invention also relates to a kit, a computing device and a storage medium for predicting liver cancer prognosis.
Description
Technical Field
The present invention relates to a method for assessing the prognosis of a patient with liver cancer using a specific set of immune-related genes. Specifically, the invention relates to a characteristic gene set of 62 immune-related genes, which can be used to evaluate the prognosis of a liver cancer patient and to provide a scientific basis for medical decisions.
Background
Malignant tumors are often incurable, so the goal of treatment is well defined: to prolong the patient's survival after diagnosis. Liver cancer is one of the most common cancer types in China, with high morbidity and mortality. Treatment generally combines surgery, radiotherapy, chemotherapy and traditional Chinese medicine. Nevertheless, the recurrence rate and the rate of poor outcomes remain high, and the 5-year recurrence-free survival rate and overall survival rate remain low. Existing clinical prognostic indicators, such as alpha-fetoprotein level and TNM staging, neither reflect tumor characteristics comprehensively nor judge a patient's survival risk accurately. There is therefore an urgent need for reliable prognostic indicators with which to build reliable prognostic models, make accurate survival predictions, and guide targeted therapeutic intervention based on the corresponding molecular characteristics. Providing liver cancer prognosis information to doctors and patients early in clinical practice is an urgent problem; solving it would help in formulating individualized treatment plans and has important clinical significance for improving postoperative survival and achieving precise treatment of liver cancer.
The activities of tumor cells and immune cells in the tumor microenvironment are involved in the generation and development of tumors, so tumor immunology has attracted attention. Tumor-infiltrating immune cells are key cellular components of the host immune response and are important members of the tumor microenvironment. Many studies have demonstrated that tumor-infiltrating immune cells are associated with therapeutic response and prognosis in a variety of cancers.
The advantage of using the expression values of a characteristic gene set to evaluate patient prognosis is objectivity: the result is free of the researcher's subjective bias. The disadvantage is that the observation time is long and the occurrence of all events, i.e. the death of all patients, must be recorded. Published immune-related gene markers usually involve only a single immune gene or a small number of immune cell types. However, the development of an immune response in vivo involves multiple immune cell types, so evaluating prognosis with a single immune gene or a small number of immune cell types is incomplete. Therefore, there remains a need for more accurate and efficient models that can predict the prognosis of cancer patients.
Disclosure of Invention
The method is based on TCGA hepatocellular carcinoma samples. The samples are randomly divided into a training set and a test set, and, by combining gene expression value data with the screening of immune-related genes, an evaluation gene set is selected that can predict the prognosis of hepatocellular carcinoma from gene expression values.
In a first aspect, the present invention relates to an evaluation gene set for predicting the prognosis of a liver cancer patient, the evaluation gene set comprising 62 genes, the genes being listed in Table 1 below:
Table 1: Genes of the selected characteristic evaluation gene set
In another aspect, the present invention also relates to a kit for predicting prognosis of a patient with liver cancer, comprising a reagent that can specifically detect a gene expression value; wherein the genes are 62 genes in Table 1.
In this context, the terms "expression level" and "expression value" of a gene are used interchangeably to refer to the value of a parameter that measures the degree of expression of a given gene. The expression value can be determined by measuring the level of mRNA encoded by the gene of interest or by measuring the amount of protein encoded by the gene.
In some embodiments, the kit comprises one or more of nucleic acid extraction reagents, PCR reagents, genome/transcriptome sequencing reagents, gene-specific primers or probes, antibodies specific for gene expression products.
In some embodiments, the reagent is any reagent known in the art that can be used to detect the level of gene expression; in particular embodiments, the reagents are reagents for performing one or more of the following methods: real-time fluorescent quantitative PCR, northern blotting, western blotting, genome sequencing, transcriptome sequencing, biological mass spectrometry or specific antibody detection.
In some embodiments, the kit further comprises sample processing reagents, such as sample lysis reagents, sample purification reagents, and nucleic acid extraction reagents, among others.
Transcriptome sequencing can rapidly and comprehensively obtain almost all transcripts and gene sequences of a specific cell or tissue of a certain species in a certain state through a second-generation sequencing platform, and can be used for researching gene expression quantity, gene function, structure, alternative splicing, prediction of new transcripts and the like. In addition, by designing appropriate primers, the transcription expression level of a gene can be determined by PCR such as reverse transcription PCR. The protein expression level of each gene can also be measured by an immunoassay such as immunohistochemistry, ELISA, or the like using an antibody specific to the gene protein.
Preferably, the gene expression value is a value obtained by annotating transcriptome sequencing data.
In another aspect, the present invention also relates to a method for predicting the prognosis of a patient with liver cancer, comprising the steps of:
a) sample collection and data detection: collecting a sample of the patient and determining the expression values of the 62 genes of the evaluation gene set in Table 1;
b) calculating the risk score: calculating the total expression value of the 62 genes of the evaluation gene set for the liver cancer patient, namely the Risk Score; the risk score calculation formula is as follows:
$$
\text{Risk Score} = \sum_{i=1}^{n} E_i \times \mathrm{Coef}_i
$$
wherein $E_i$ is the expression value of each gene, $\mathrm{Coef}_i$ is the weight coefficient of each gene, and $n$ is the number of genes in the characteristic gene set, namely 62;
c) predicting the prognosis of the patient according to the calculated risk score of the liver cancer patient: the lower the risk score of the patient, the better the prognosis; the risk score is compared with a defined value, and if the risk score is higher than the defined value the prognosis is predicted to be poor, whereas if the risk score is lower than the defined value the prognosis is predicted to be good.
In some embodiments, the defined value is about 4.14.
In some embodiments, the patient sample is from a tissue of the patient, including tumor tissue, which is a primary lesion or a metastatic lesion.
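Steps b) and c) can be illustrated with a minimal Python sketch. The gene names, weight coefficients and expression values below are made up for illustration and are not the genes or weights of the invention; only the cutoff of 4.14 comes from the text. Treating a score exactly equal to the cutoff as "good" is an assumption, since the text only covers scores above or below the defined value.

```python
from typing import Dict

# Hypothetical genes and weights, for illustration only.
EXAMPLE_COEFS: Dict[str, float] = {"GENE_A": 0.50, "GENE_B": -0.25, "GENE_C": 1.00}

CUTOFF = 4.14  # the "defined value" stated in the text


def risk_score(expression: Dict[str, float], coefs: Dict[str, float]) -> float:
    """Sum of E_i * Coef_i over every gene in the coefficient table."""
    missing = [gene for gene in coefs if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(expression[gene] * weight for gene, weight in coefs.items())


def predict_prognosis(score: float, cutoff: float = CUTOFF) -> str:
    """Scores above the cutoff are read as poor prognosis, otherwise good.

    A score equal to the cutoff is treated as good (an assumption).
    """
    return "poor" if score > cutoff else "good"


# Made-up expression values for one patient.
patient = {"GENE_A": 6.0, "GENE_B": 2.0, "GENE_C": 2.0}
score = risk_score(patient, EXAMPLE_COEFS)
print(score, predict_prognosis(score))
```

With the real gene set, the coefficient table would hold the 62 weight coefficients and the expression dictionary the 62 measured expression values.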
As used herein, "about" when used in reference to a numerical value indicates that the calculation or measurement allows the value to encompass some approximation of the exact numerical value, or a reasonably close numerical value; "about" herein means at least the variation in value that can result from the usual methods of measuring or using such parameters; it should be understood that the presence or absence of "about" does not affect the interpretation of its numerical value; preferably, all values within the range of plus or minus 10% of the subsequent value are indicated. Those skilled in the art will appreciate that all or part of the functions of the above-described method steps may be implemented by hardware, or may be implemented by a computer program.
When all or part of the functions of the above method steps are implemented by means of a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
In another aspect, the invention also relates to a system for predicting prognosis of a patient with liver cancer, comprising the following modules:
a) a data collection module: collecting a sample of the patient, determining the expression values of the 62 genes of the evaluation gene set in Table 1, and inputting the expression value data of each gene to a model calculation module;
b) a model calculation module: calculating the total expression value of the 62 genes of the evaluation gene set for the liver cancer patient, namely the Risk Score; the risk score calculation formula is as described above;
c) an output prediction module: predicting the prognosis of the patient according to the risk score data of the liver cancer patient, wherein the lower the risk score of the patient, the better the prognosis; the risk score data are compared with a defined value, and if the risk score is higher than the defined value a poor prognosis is output, whereas if the risk score is lower than the defined value a good prognosis is output.
In some embodiments, the defined value is about 4.14.
In some embodiments, the patient sample is from a tissue of the patient, including tumor tissue, which is a primary lesion or a metastatic lesion.
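One way to picture the three modules is as stages of a small pipeline. The Python sketch below is only an illustration under stated assumptions: the class name `PrognosisSystem`, the `collect` callable standing in for the data collection module, and the sample values are all hypothetical, and a score equal to the defined value is treated as a good prognosis.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PrognosisSystem:
    """Three-stage pipeline: data collection -> model calculation -> output."""

    collect: Callable[[str], Dict[str, float]]  # hypothetical: sample ID -> expression values
    coefs: Dict[str, float]                     # weight coefficient per gene
    cutoff: float = 4.14                        # defined value from the text

    def model_calculation(self, expression: Dict[str, float]) -> float:
        # Risk score: sum of expression value times weight coefficient.
        return sum(expression[gene] * weight for gene, weight in self.coefs.items())

    def output_prediction(self, score: float) -> str:
        return "poor prognosis" if score > self.cutoff else "good prognosis"

    def run(self, sample_id: str) -> str:
        expression = self.collect(sample_id)
        return self.output_prediction(self.model_calculation(expression))


# Stand-in for the data collection module: a lookup table of made-up values.
FAKE_MEASUREMENTS = {"sample-1": {"GENE_A": 2.0, "GENE_B": 4.0}}
system = PrognosisSystem(
    collect=FAKE_MEASUREMENTS.__getitem__,
    coefs={"GENE_A": 1.0, "GENE_B": 0.5},
)
print(system.run("sample-1"))
```

Keeping the measurement step behind a callable means the same calculation and output stages could be fed from sequencing data, PCR results or any other source of expression values.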
In another aspect, the present invention also relates to the use of the reagent for detecting the expression value of the genes described in table 1 in the preparation of kits and systems for predicting liver cancer prognosis.
In some embodiments, the kit or system is the kit or system of the invention as described above.
In some embodiments, the reagent for detecting expression values is selected from one or more of nucleic acid extraction reagents, PCR reagents, genome/transcriptome sequencing reagents, gene-specific primers or probes, and antibodies specific for gene expression products. In some embodiments, the reagent is a reagent for performing one or more of the following: real-time fluorescent quantitative PCR, northern blotting, western blotting, genome sequencing, transcriptome sequencing, biological mass spectrometry or specific antibody detection.
In another aspect, the invention also relates to a computing device comprising:
at least one processing unit; and
at least one memory coupled to the processing unit and storing instructions for execution by the processing unit, the instructions, when executed, enabling the device to predict the prognosis of a liver cancer patient, the prediction comprising the steps of:
a) calculating a risk score for the patient based on the collected and determined expression values for 62 genes in the evaluation set of genes in table 1 for the patient sample; the risk score calculation formula is as described above;
b) predicting the prognosis condition of the patient according to the risk score data of the liver cancer patient, wherein the lower the risk score of the patient is, the better the prognosis is; comparing the risk score data with a defined value, if the risk score data is higher than the defined value, the prognosis is predicted to be poor, and if the risk score data is lower than the defined value, the prognosis is predicted to be good.
Preferably, the defined value is about 4.14.
In another aspect, the present invention also relates to a computer readable storage medium storing a computer program executable by a machine to perform the steps of predicting a prognosis for a patient with liver cancer, the steps comprising:
a) calculating a risk score for the patient based on the collected and determined expression values for 62 genes in the evaluation set of genes in table 1 for the patient sample; the risk score calculation formula is as described above;
b) predicting the prognosis condition of the patient according to the risk score data of the liver cancer patient, wherein the lower the risk score of the patient is, the better the prognosis is; comparing the risk score data with a defined value, if the risk score data is higher than the defined value, the prognosis is predicted to be poor, and if the risk score data is lower than the defined value, the prognosis is predicted to be good.
Preferably, the defined value is about 4.14.
The invention has the beneficial effects that:
The invention provides an evaluation gene set for liver cancer prognosis prediction and a corresponding kit, which can be applied more reliably in clinical practice. The characteristic gene set comprises 62 immune-related genes covering 22 types of immune cells, and its prediction performance was verified in the test set. Compared with methods that correlate prognosis with the mutation of a single gene, the method of the invention is less limited by mutation frequency in the population and by the effect of the collected samples on the stability of the survival analysis result, and can predict the prognosis of a liver cancer patient more accurately; the method can be applied to clinical tests and provides a scientific basis for medical decisions.
Drawings
FIG. 1: Flow chart for screening the characteristic gene set.
FIG. 2: Results for the selected characteristic gene set. Patients are grouped into a high risk group (high group) and a low risk group (low group) according to the median of their risk scores for the 62 selected genes; FIG. 2 shows the survival probability of the high risk group and the low risk group in the training set.
FIG. 3: Results for the selected characteristic gene set. Patients are grouped into a high risk group (high group) and a low risk group (low group) according to the median of their risk scores for the 62 selected genes; FIG. 3 shows the survival probability of the high risk group and the low risk group in the test set.
FIG. 4: Results for a randomly selected characteristic gene set. Patients are grouped into a high risk group (high group) and a low risk group (low group) according to the median of their risk scores for 62 randomly selected genes; FIG. 4 shows the survival probability of the high risk group and the low risk group in the training set.
FIG. 5: Results for a randomly selected characteristic gene set. Patients are grouped into a high risk group (high group) and a low risk group (low group) according to the median of their risk scores for 62 randomly selected genes; FIG. 5 shows the survival probability of the high risk group and the low risk group in the test set.
Detailed Description
The following describes embodiments of the present invention with reference to the drawings. For the specific methods or materials used in the embodiments, those skilled in the art can make routine alternatives according to the existing technologies based on the technical idea of the present invention, and not limited to the specific embodiments of the present invention.
Example 1: Establishing a model by LASSO regression to obtain the selected characteristic gene set
Data processing and screening of immune genes related to liver cancer prognosis
Gene expression data for hepatocellular carcinoma, together with clinical data such as overall survival time and survival endpoint of the patients, were downloaded from The Cancer Genome Atlas (TCGA). The gene expression data comprise 363 hepatocellular carcinoma samples and 60483 genes. In order to construct a liver cancer prognosis prediction model, 547 immune-related genes were selected from the 60483 genes for the subsequent screening of a gene set for predicting patient prognosis.
Construction of liver cancer prognosis model
The 363 liver cancer patient samples in the TCGA dataset were randomly divided, with reference to clinical staging, into a training set of 80% (290 samples) and a test set of 20% (73 samples). Using the training set samples and the 547 immune-related genes, Least Absolute Shrinkage and Selection Operator (LASSO) regression analysis was performed in the training set.
The LASSO regression analysis and the establishment of the multi-risk prediction model were completed with the R package glmnet. The cv.glmnet function was applied to the training set, with a LASSO regression model and a Cox model selected and the C-index used as the evaluation index of the model, to obtain the penalty coefficient for screening the characteristic gene set. The penalty coefficient is 0.033. The model was validated using 20-fold cross-validation. Finally, the 62 genes whose weight values are not 0 were selected as the final characteristic gene set. A multi-risk prediction model was thus established to predict patient prognosis, see Table 2.
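For orientation, the LASSO-penalized Cox model fitted in this step can be written in a standard textbook form. The notation is an assumption introduced here and is not taken from the patent: $x_i$ is the vector of expression values of the 547 immune-related genes for patient $i$, $t_i$ and $\delta_i$ are the survival time and event indicator, $R(t_i)$ is the set of patients still at risk at time $t_i$, $N$ is the number of training samples, and $c > 0$ is a constant scaling factor that differs between software implementations. Ties in event times are ignored in this form.

$$
\hat{\beta}(\lambda) = \arg\min_{\beta} \left[ -\frac{c}{N}\,\ell(\beta) + \lambda \sum_{j=1}^{547} |\beta_j| \right],
\qquad
\ell(\beta) = \sum_{i:\,\delta_i = 1} \left( x_i^{\top}\beta - \log \sum_{k \in R(t_i)} e^{x_k^{\top}\beta} \right)
$$

Read this way, the reported penalty coefficient corresponds to $\lambda = 0.033$, and the 62 genes of the characteristic gene set are those whose fitted $\beta_j$ is not 0; these $\beta_j$ then serve as the weight coefficients in the risk score.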
Table 2: The 62 genes of the selected characteristic evaluation gene set and their weight coefficients
The weight coefficient of each gene was used to calculate the risk score of the characteristic gene set, i.e. its total expression value, which is the sum of the products of the expression values of the 62 genes and their respective weights.
The total expression value of the characteristic gene set, namely the Risk Score, is calculated as follows:
$$
\text{Risk Score} = \sum_{i=1}^{n} E_i \times \mathrm{Coef}_i
$$
i.e. the sum of the products of the individual gene expression values and the individual weight coefficients, wherein $E_i$ is the expression value of each gene, $\mathrm{Coef}_i$ is the weight coefficient of each gene, and $n$ is the number of genes in the characteristic gene set, which is 62 in the invention; the weight value corresponding to each gene is shown in Table 2.
Verifying model accuracy
After model training was finished, the established model and the selected gene set were used, via a prediction function, to make predictions on the test set and thereby test their predictive ability on the test set data.
Following the formula above, the total expression values (risk scores) of the patients in the training set were calculated and sorted by magnitude, and the patients in the training set and the test set were divided by the median value, which is 4.14, into a high risk group (high group) and a low risk group (low group). Survival curves were plotted with patient survival time (days) as the abscissa and survival probability as the ordinate to compare the survival probability of the high group and low group patients.
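The grouping and the survival curves described above can be sketched in Python. This is a minimal illustration rather than the procedure actually used: the Kaplan-Meier estimator is assumed here as the way the survival probability is computed (the text does not name the estimator), all scores, times and event flags are made up, and a patient whose score equals the median is placed in the low group by assumption.

```python
from statistics import median
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps; events: 1 = death, 0 = censored."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up risk scores, survival times in days, and event indicators.
scores = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
days = [900, 800, 700, 300, 200, 100]
died = [0, 1, 0, 1, 1, 1]

cutoff = median(scores)  # the text reports a training-set median of 4.14
high = [i for i, s in enumerate(scores) if s > cutoff]
low = [i for i, s in enumerate(scores) if s <= cutoff]

high_curve = kaplan_meier([days[i] for i in high], [died[i] for i in high])
low_curve = kaplan_meier([days[i] for i in low], [died[i] for i in low])
print("high group:", high_curve)
print("low group:", low_curve)
```

Plotting each list of steps with time on the abscissa and survival probability on the ordinate gives curves of the kind shown in the figures.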
Multi-risk model prediction was performed with the coxph function in the R package survival. The function inputs were the patient group together with patient survival time and status. The results were then examined with the log-rank test. For the training set, the p value was less than 0.0001 and the hazard ratio of the low risk group was 0.11 (95% CI 0.066-0.18). For the test set, the p value was 0.006 and the hazard ratio of the low risk group was 0.3 (95% CI 0.13-0.75).
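The analysis described here was run in R. Purely as an illustration of what a two-group log-rank comparison computes, the following is a pure-Python sketch. The function name `logrank_test` and the toy data are hypothetical, and the p value relies on the usual chi-square approximation with one degree of freedom.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    first = labels[0]
    observed_minus_expected = 0.0
    variance = 0.0
    # Walk through each distinct time at which at least one event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)  # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == first)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == first)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data: the 'high' group dies early, the 'low' group late (1 = event).
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = ["high"] * 4 + ["low"] * 4
chi2, p_value = logrank_test(times, events, groups)
```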
Fig. 2 shows the survival probability of the high risk group and the low risk group of the training set. The two groups differ significantly: the high risk group has a significantly lower survival probability than the low risk group (P < 0.0001).
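Survival curves of this kind are commonly estimated with the Kaplan-Meier method. The text does not name the estimator, so that choice is an assumption of the sketch below, and the function name and data are hypothetical. Each group would be passed through the function separately to obtain its curve.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    survival = 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        # Multiply by the fraction of at-risk patients surviving past t.
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy group: event value 0 marks a censored patient (alive at last follow-up).
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
```

The censored patient at time 2 produces no step in the curve but still reduces the number at risk for later times.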
The C-index of the training set was calculated to be 0.83. The C-index, or concordance index, is used to evaluate the predictive power of the model; it is the proportion, among all patient pairs, of pairs in which the predicted result is consistent with the actual result.
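To make the pair-counting definition concrete, here is a minimal sketch of Harrell's concordance index for right-censored data. It assumes a pair is usable only when the patient with the shorter time had an observed event, counts tied scores as half concordant, and skips pairs with tied times; these conventions and all names and data are assumptions of the sketch, not details given in the text.

```python
def concordance_index(times, events, scores):
    """Fraction of usable patient pairs ranked correctly by risk score."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must have the shorter time and an observed event.
            if times[i] < times[j] and events[i]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0  # higher risk died earlier
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data: the patient at time 15 is censored (event = 0).
c_index = concordance_index(
    times=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    scores=[4.0, 3.0, 1.0, 3.5],
)
```

In this toy data four of the five usable pairs are ordered correctly; a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.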
Test set data verification of liver cancer prognosis model
To verify the constructed liver cancer prognosis model, the expression values (risk scores) of the liver cancer patients in the test set were calculated by the same process, using the same formula and weight coefficients, and the test set was likewise divided into a high group and a low group using the same cutoff value, so as to verify the accuracy of the liver cancer prognosis model based on the 62-gene evaluation gene set. Fig. 3 shows the survival probability of the high risk group and the low risk group of the test set. As can be seen from Fig. 3, the survival probability of the high risk group is significantly lower than that of the low risk group (p = 0.006); that is, the test set data verify that the prognostic model is highly reliable. The C-index of the test set was calculated to be 0.7.
Example 2: predictive power comparison of selected signature gene sets to random gene sets
To further verify the validity of the selected evaluation gene set of 62 genes, another 62 genes were randomly selected from the 547 genes (excluding the 62 genes selected above) to form a "random gene set", which was compared with the selected "evaluation gene set"; the genes of the random gene set and their weight coefficients are shown in Table 3.
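Drawing such a control set can be sketched as sampling without replacement from the candidate pool after removing the selected genes. The gene identifiers below are placeholders, the seed is arbitrary, and the helper name is hypothetical; how the 547 candidates relate to the 62 selected genes is taken from the text as stated.

```python
import random


def draw_random_gene_set(candidate_genes, selected_genes, size, seed=None):
    """Sample `size` distinct genes from the candidates, excluding selected ones."""
    pool = sorted(set(candidate_genes) - set(selected_genes))
    if size > len(pool):
        raise ValueError("not enough candidate genes left to sample from")
    return random.Random(seed).sample(pool, size)


# Placeholder identifiers standing in for the 547 candidate genes.
candidates = [f"G{i:03d}" for i in range(1, 548)]
selected = candidates[:62]  # stand-in for the 62 evaluation genes
random_set = draw_random_gene_set(candidates, selected, size=62, seed=1)
```

Sorting the pool before sampling keeps the draw reproducible for a given seed regardless of set iteration order.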
Table 3: The 62 genes of the random gene set and their weight coefficients
As in Example 1, the patients were divided into a training set (80%) and a test set (20%), and the risk score of each test-set patient under the random model was calculated from the genes of the random gene set and their weight coefficients in Table 3. The risk score of the random gene set was calculated in the same way as in Example 1. The C-index was calculated for the training set and the test set: training set C-index 0.86, test set C-index 0.55.
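The 80%/20% division can be sketched as a shuffled split of patient identifiers. The text does not say how patients were assigned, so the random shuffle, the seed, the identifiers, and the function name are all assumptions for illustration.

```python
import random


def split_patients(patient_ids, train_fraction=0.8, seed=0):
    """Shuffle patient IDs and split them into training and test lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Placeholder patient identifiers.
patients = [f"P{i:03d}" for i in range(100)]
train_ids, test_ids = split_patients(patients)
```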
The training set patients were likewise divided into a high risk group and a low risk group by the median risk score of the training set (1.66 by computational analysis). Fig. 4 shows the survival probability of the high risk group and the low risk group of the training set of the "random gene set". The two groups of the training set differ significantly, with the high risk group having a significantly lower survival probability than the low risk group (p < 0.0001).
The same risk score formula and weight coefficients were then used to calculate the risk scores of the liver cancer patients in the test set of the random gene set, and the test set was likewise divided into high and low groups by the median value (1.66) obtained from the training set above, to verify the accuracy of the liver cancer prognosis model of the 62-gene "random gene set". Fig. 5 shows the survival probability of the high risk group and the low risk group of the test set of the "random gene set"; the survival probability of the high risk group is not significantly different from that of the low risk group (p = 0.48). The test set verification shows that the random gene set cannot effectively predict the prognosis of liver cancer patients.
As the comparison between the selected gene set and the random gene set shows, the evaluation gene set of 62 specific genes constructed by the invention can effectively predict the prognosis of liver cancer patients, whereas a randomly selected gene set cannot.
To accurately predict the prognosis risk of liver cancer patients, the invention identifies 62 immune-related genes for predicting liver cancer prognosis. These genes effectively distinguish the high risk group from the low risk group of liver cancer patients and can be developed into potential in-vitro diagnostic products for predicting and detecting the prognosis of liver cancer patients, thereby enabling preventive medication or treatment and providing an accurate basis for further adjuvant treatment decisions.
The above embodiments express only several embodiments of the present invention; their description is specific and detailed but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (11)
2. A kit for predicting prognosis of a patient with liver cancer, comprising a reagent for detecting the expression level of a gene; wherein the genes are the 62 genes in the evaluation gene set of claim 1.
3. The kit of claim 2, further comprising one or more of nucleic acid extraction reagents, PCR reagents, genomic/transcriptome sequencing reagents, gene-specific primers or probes, antibodies specific for gene expression products.
4. A system for predicting prognosis in a patient with liver cancer, comprising the following modules:
a) a data collection module: collecting the sample of the patient, measuring the gene expression value of the sample, and outputting the expression value data of each gene to a model calculation module; wherein the genes are 62 genes in the evaluation gene set of claim 1;
b) a model calculation module: calculating the total expression value of 62 genes of the liver cancer patient, namely a Risk Score (Risk Score); the risk score calculation formula is as follows:
$$\text{Risk Score} = \sum_{i=1}^{n} E_i \times \mathrm{Coef}_i$$
wherein $E_i$ is the expression value of each gene, $\mathrm{Coef}_i$ is the weight coefficient corresponding to each gene, and n is the total number of genes, namely 62;
wherein, each gene and the corresponding weight coefficient are as follows:
c) an output prediction module: predicting the prognosis of the patient according to the risk score data of the liver cancer patient, wherein the lower the risk score of the patient is, the better the prognosis is; and comparing the risk score with a defined value, and if the risk score is higher than the defined value, outputting that the prognosis is not good, and if the risk score is lower than the defined value, outputting that the prognosis is good.
5. The system of claim 4, the defined value being 4.14.
6. A computing device, comprising:
at least one processing unit; and
at least one memory coupled to the processing unit and storing instructions for execution by the processing unit, the instructions, when executed, enabling the device to predict the prognosis of a liver cancer patient, the prediction comprising the steps of:
a) calculating a risk score for the patient based on the collected and determined expression values of the genes in the patient sample; the genes are 62 genes in the evaluation gene set of claim 1; the risk score calculation formula is as follows:
$$\text{Risk Score} = \sum_{i=1}^{n} E_i \times \mathrm{Coef}_i$$
wherein $E_i$ is the expression value of each gene, $\mathrm{Coef}_i$ is the weight coefficient corresponding to each gene, and n is the total number of genes, namely 62; wherein each gene and its corresponding weight coefficient are as shown in claim 4;
b) predicting the prognosis condition of the patient according to the risk score of the liver cancer patient, wherein the lower the risk score of the patient is, the better the prognosis is; and comparing the risk score with a defined value, if the risk score is higher than the defined value, the prognosis is predicted to be poor, and if the risk score is lower than the defined value, the prognosis is predicted to be good.
7. The computing device of claim 6, wherein the defined value is 4.14.
8. A computer-readable storage medium storing a computer program executable by a machine to perform steps for predicting prognosis of a patient with liver cancer, the steps comprising:
a) calculating a risk score for the patient based on the collected and determined gene expression values for the patient sample; wherein the genes are 62 genes in the evaluation gene set of claim 1; the risk score calculation formula is as follows:
$$\text{Risk Score} = \sum_{i=1}^{n} E_i \times \mathrm{Coef}_i$$
wherein $E_i$ is the expression value of each gene, $\mathrm{Coef}_i$ is the weight coefficient corresponding to each gene, and n is the total number of genes, namely 62; wherein each gene and its corresponding weight coefficient are as shown in claim 4;
b) predicting the prognosis condition of the patient according to the risk score data of the liver cancer patient, wherein the lower the risk score of the patient is, the better the prognosis is; comparing the risk score data with a defined value, if the risk score data is higher than the defined value, the prognosis is predicted to be poor, and if the risk score data is lower than the defined value, the prognosis is predicted to be good.
9. The computer-readable storage medium of claim 8, wherein the defined value is 4.14.
10. Use of a reagent for detecting gene expression levels in the preparation of a kit or system for predicting prognosis in a patient with liver cancer; wherein the genes are 62 genes in the evaluation gene set of claim 1.
11. The use according to claim 10, wherein the kit is a kit according to claim 2 or 3; the system is the system of claim 4 or 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110916132.9A CN113355426B (en) | 2021-08-11 | 2021-08-11 | Evaluation gene set and kit for predicting liver cancer prognosis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113355426A (en) | 2021-09-07 |
CN113355426B (en) | 2021-11-09 |
Family
ID=77522945
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110916132.9A Active CN113355426B (en) | 2021-08-11 | 2021-08-11 | Evaluation gene set and kit for predicting liver cancer prognosis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113355426B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114317532B (en) * | 2021-12-31 | 2024-01-19 | 广东省人民医院 | Evaluation gene set, kit, system and application for predicting leukemia prognosis |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101457254A (en) * | 2008-10-09 | 2009-06-17 | 北京大学人民医院 | Liver cancer prognosis |
CN102206710A (en) * | 2011-04-12 | 2011-10-05 | 复旦大学附属中山医院 | Real time polymerase chain reaction (PCR) microarray kit for predicting postoperative recurrence and metastasis of liver cancer after operation |
CN107657149A (en) * | 2017-09-12 | 2018-02-02 | 中国人民解放军军事医学科学院生物医学分析中心 | System for predicting liver cancer patient prognosis |
JP2019006678A (en) * | 2015-11-09 | 2019-01-17 | 国立大学法人東北大学 | Anti-phosphorylate bach2 antibody and anti-tumor immune activator screening method |
CN112011616A (en) * | 2020-09-02 | 2020-12-01 | 复旦大学附属中山医院 | Immune gene prognosis model for predicting hepatocellular carcinoma tumor immune infiltration and postoperative survival time |
Non-Patent Citations (3)
Title |
---|
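"Novel biomarkers in hepatocellular carcinoma"; Felice De Stefano et al.; 《Digestive and Liver Disease》; 20180824; vol. 50; pp. 1115-1123 * |
"Prognostic significance of liver cancer miRNAs and their target genes, analyzed based on the TCGA database"; 刘芳远 et al.; 《包头医学院学报》; 20201231; vol. 36, no. 9; pp. 76-80, 96 * |
"Analysis of immune-related prognostic markers in liver cancer"; 鞠铭伊 et al.; 《郑州大学学报(医学版)》; 20201130; vol. 55, no. 6; pp. 779-785 * |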
Also Published As
Publication number | Publication date |
---|---|
CN113355426A (en) | 2021-09-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Evaluation gene set and kit for predicting liver cancer prognosis; Granted publication date: 20211109; Pledgee: Agricultural Bank of China Limited Shanghai Free Trade Zone Branch; Pledgor: Shanghai Zhiben medical laboratory Co.,Ltd. / ORIGIMED TECHNOLOGY (SHANGHAI) Co.,Ltd. / Zhiben medical technology (Chongqing) Co.,Ltd.; Registration number: Y2024980012757 |