CN109988708B - System for typing a patient suffering from colorectal cancer - Google Patents

Info

Publication number
CN109988708B
CN109988708B CN201910106934.6A CN201910106934A CN109988708B CN 109988708 B CN109988708 B CN 109988708B CN 201910106934 A CN201910106934 A CN 201910106934A CN 109988708 B CN109988708 B CN 109988708B
Authority
CN
China
Prior art keywords
genes listed
genes
recurrence
biomarker
patients
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910106934.6A
Other languages
Chinese (zh)
Other versions
CN109988708A (en)
Inventor
孙恬
吴汝嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Carbon Logic Biotechnology Foshan Co ltd
Original Assignee
Carbon Logic Biotechnology Zhongshan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Carbon Logic Biotechnology Zhongshan Co ltd
Priority to CN201910106934.6A
Publication of CN109988708A
Application granted
Publication of CN109988708B

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/118: Prognosis of disease development

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention provides a system for typing patients with colorectal cancer. The system shows significant performance: the 5-year recurrence-free survival rates of patients typed as high recurrence risk and of patients typed as low recurrence risk differ significantly.

Description

System for typing a patient suffering from colorectal cancer
Technical Field
The present invention relates to a system for typing a patient suffering from colorectal cancer.
Background
Colorectal cancer is the third most common cancer in the world (Ferlay et al, 2013). Approximately one third of new cases are diagnosed as stage 2 disease. The 5-year survival rate for stage 2 colorectal cancer patients after surgical resection alone is approximately 60-80%, so only approximately 25% of stage 2 colorectal cancer patients require adjuvant therapy (Labianca et al, 2013). Identifying stage 2 colorectal cancer patients with a high risk of recurrence after surgery can be used to select the subset of patients who may benefit from postoperative adjuvant therapy. However, the clinicopathological parameters currently in wide clinical use, such as number of lymph nodes, histological grade, depth of tumor invasion and invasion of adjacent organs, neither predict prognosis accurately nor accurately select the high-risk patients who should receive adjuvant therapy. The present invention thus provides a method by which tumor samples from colorectal cancer patients can be typed to distinguish patients with a high risk of recurrence from patients with a low risk of recurrence.
Disclosure of Invention
The object of the present invention is to provide a system for typing patients suffering from colorectal cancer. The system classifies such patients into patients with a high risk of recurrence and patients with a low risk of recurrence; the patients with a high risk of recurrence form the subgroup that should be treated post-operatively.
To achieve this object, the technical solution is as follows: a system for typing a patient having colorectal cancer, comprising
A data input module for inputting a first similarity value between the RNA expression level of a first biomarker in a tissue sample of a patient with colorectal cancer and the RNA expression level of the first biomarker in a tissue sample of a patient with disease relapse within five years and low cell cycle activity, and a second similarity value between the RNA expression level of the first biomarker in a tissue sample of a patient with colorectal cancer and the RNA expression level of the first biomarker in a tissue sample of a patient with no disease relapse within five years into a model calculation module, the first biomarker comprising at least 3 genes listed in table 1; inputting into the model calculation module a third similarity value between the RNA expression level of a second biomarker in a tissue sample of the patient with colorectal cancer and the RNA expression level of the second biomarker in a tissue sample of a patient with disease relapse within five years and non-low cell cycle activity, and a fourth similarity value between the RNA expression level of the second biomarker in a tissue sample of the patient with colorectal cancer and the RNA expression level of the second biomarker in a tissue sample of a patient with no disease relapse within five years, the second biomarker comprising at least 3 genes listed in table 2;
a model calculation module comprising a typing model for calculating a first and a second typing score value for a patient having colorectal cancer based on the first, second, third and fourth similarity values and the typing model; the first typing score value = first similarity value - second similarity value, and the second typing score value = third similarity value - fourth similarity value;
a result output module for judging whether the patient has a poor prognosis or a good prognosis according to the first typing score value and the second typing score value of the patient with colorectal cancer: when the first typing score value is greater than or equal to the first similarity threshold, and/or the second typing score value is greater than or equal to the second similarity threshold, the patient is classified as high recurrence risk with a poor prognosis; when the first typing score value < the first similarity threshold and the second typing score value < the second similarity threshold, the patient is classified as low recurrence risk with a good prognosis.
Preferably, the first similarity threshold =0.155 and the second similarity threshold =0.076.
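The scoring and decision rule of the model calculation and result output modules can be sketched as follows. This is a minimal illustration, not the patented implementation: the function and class names are hypothetical, the four similarity values are assumed to have been computed already, and only the two preferred thresholds come from the text.

```python
from dataclasses import dataclass

# Preferred thresholds quoted in the text.
FIRST_THRESHOLD = 0.155
SECOND_THRESHOLD = 0.076


@dataclass
class TypingResult:
    """Hypothetical container for the outcome of one typing run."""
    first_score: float
    second_score: float
    high_risk: bool


def type_patient(sim1, sim2, sim3, sim4,
                 t1=FIRST_THRESHOLD, t2=SECOND_THRESHOLD):
    """Apply the rule described above.

    sim1/sim2: similarity of the first biomarker to the relapse
               (low cell cycle activity) and no-relapse references.
    sim3/sim4: similarity of the second biomarker to the relapse
               (non-low cell cycle activity) and no-relapse references.
    """
    first_score = sim1 - sim2
    second_score = sim3 - sim4
    # High recurrence risk if either score reaches its threshold.
    high_risk = first_score >= t1 or second_score >= t2
    return TypingResult(first_score, second_score, high_risk)


if __name__ == "__main__":
    # Illustrative similarity values only; they are not patient data.
    print(type_patient(0.40, 0.20, 0.10, 0.30))
```

The thresholds are keyword arguments so that the same function can be reused with a different pair of thresholds.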
The present invention provides a method for typing a patient suffering from colorectal cancer, comprising the steps of:
(1) Providing an RNA sample extracted from a tumor sample of the patient, the tumor sample comprising colorectal cancer cells;
determining the RNA expression level of a biomarker in the RNA sample, wherein the biomarker comprises at least 3 genes listed in table 1;
determining a similarity value a between the RNA expression level of the biomarker and the RNA expression level of the biomarker for patients with disease relapse and low cell cycle activity within 5 years;
determining a similarity value b between the RNA expression level of the biomarker and the RNA expression level of the biomarker in patients without disease relapse within 5 years;
determining the difference e between the similarity value a and the similarity value b;
(2) Determining the RNA expression level of a biomarker in the RNA sample, wherein the biomarker comprises at least 3 genes listed in table 2;
determining a similarity value c between the RNA expression level of the biomarker and the RNA expression level of the biomarker for patients with disease relapse and non-low cell cycle activity within 5 years;
determining a similarity value d between the RNA expression level of the biomarker and the RNA expression level of the biomarker in patients without disease relapse within 5 years;
determining the difference f between the similarity value c and the similarity value d;
(3) Classifying the patient as having a poor prognosis if the difference e is above its similarity threshold and/or the difference f is above its similarity threshold; classifying the patient as having a good prognosis if the difference e is below its similarity threshold and the difference f is below its similarity threshold.
The present invention provides a system for typing a patient suffering from colorectal cancer comprising
A data input module for inputting into the model calculation module a fifth similarity value between the RNA expression level of a third biomarker in a tissue sample of a patient with colorectal cancer and the RNA expression level of the third biomarker in a tissue sample of a patient with disease relapse within five years and low cell cycle activity, and a sixth similarity value between the RNA expression level of the third biomarker in the tissue sample of the patient with colorectal cancer and the RNA expression level of the third biomarker in a tissue sample of a patient with no disease relapse within five years, the third biomarker comprising at least 3 genes listed in table 3; and for inputting into the model calculation module a seventh similarity value between the RNA expression level of a fourth biomarker in the tissue sample of the patient with colorectal cancer and the RNA expression level of the fourth biomarker in a tissue sample of a patient with disease relapse within five years and non-low cell cycle activity, and an eighth similarity value between the RNA expression level of the fourth biomarker in the tissue sample of the patient with colorectal cancer and the RNA expression level of the fourth biomarker in a tissue sample of a patient with no disease relapse within five years, the fourth biomarker comprising at least 3 genes listed in table 4;
a model calculation module comprising a typing model for calculating a third and a fourth typing score value for a patient with colorectal cancer based on the fifth, sixth, seventh and eighth similarity values and the typing model; the third typing score value = fifth similarity value - sixth similarity value, and the fourth typing score value = seventh similarity value - eighth similarity value;
a result output module for judging whether the patient has a poor prognosis or a good prognosis according to the third typing score value and the fourth typing score value of the patient with colorectal cancer: when the third typing score value is greater than or equal to the third similarity threshold, and/or the fourth typing score value is greater than or equal to the fourth similarity threshold, the patient is classified as high recurrence risk with a poor prognosis; when the third typing score value < the third similarity threshold and the fourth typing score value < the fourth similarity threshold, the patient is classified as low recurrence risk with a good prognosis.
Preferably, the third similarity threshold =0.198 and the fourth similarity threshold = -0.003.
The present invention provides a method for typing a patient suffering from colorectal cancer, using a stable cell core gene profile, comprising the steps of:
(1) Providing an RNA sample extracted from a tumor sample of the patient, the tumor sample comprising colorectal cancer cells;
determining the RNA expression level of a biomarker in the RNA sample; wherein the biomarker comprises at least 3 genes listed in table 3;
determining a similarity value a between the RNA expression level of the biomarker and the RNA expression level of the biomarker for patients with disease relapse and low cell cycle activity within 5 years;
determining a similarity value b between the RNA expression level of the biomarker and the RNA expression level of the biomarker in patients with no disease relapse within 5 years;
determining the difference e between the similarity value a and the similarity value b;
(2) Determining the RNA expression level of a biomarker in the RNA sample; wherein the biomarker comprises at least 3 genes listed in table 4;
determining a similarity value c between the RNA expression level of the biomarker and the RNA expression level of the biomarker for patients with disease relapse and non-low cell cycle activity within 5 years;
determining a similarity value d between the RNA expression level of the biomarker and the RNA expression level of the biomarker in patients with no disease relapse within 5 years;
determining the difference f between the similarity value c and the similarity value d;
(3) Classifying the patient as having a poor prognosis if the difference e is above its similarity threshold and/or the difference f is above its similarity threshold; classifying the patient as having a good prognosis if the difference e is below its similarity threshold and the difference f is below its similarity threshold.
Preferably, the colorectal cancer comprises TNM stage II cancer according to the TNM staging system.
Preferably, the tissue sample comprises a clinically relevant sample of colorectal cancer cells or expression products of nucleic acids from colorectal cancer cells.
Preferably, methods for determining the RNA expression level include blot hybridization, quantitative PCR, RNAseq sequencing and microarray analysis. Methods for measuring the first, second, third, fourth, fifth, sixth, seventh or eighth similarity value include, but are not limited to, Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, normalized Euclidean distance, Mahalanobis distance, cosine of the included angle, Hamming distance, Jaccard distance, correlation coefficient and information entropy. Preferably, the first, second, third, fourth, fifth, sixth, seventh or eighth similarity value is measured as a Pearson correlation coefficient.
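Three of the similarity measures named above can be sketched in plain Python. This is an illustrative sketch only: the function names and the expression vectors in the usage section are hypothetical, and each vector is assumed to hold normalized expression levels of the same genes in the same order.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("constant profile: correlation is undefined")
    return sum(a * b for a, b in zip(dx, dy)) / den


def cosine(x, y):
    """Cosine of the angle between two expression profiles."""
    den = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if den == 0:
        raise ValueError("zero vector: cosine is undefined")
    return sum(a * b for a, b in zip(x, y)) / den


def euclidean(x, y):
    """Euclidean distance; note that a larger value means LESS similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


if __name__ == "__main__":
    # Hypothetical 4-gene profiles, for illustration only.
    patient = [7.1, 5.3, 9.8, 6.0]
    relapse_reference = [7.4, 5.0, 9.5, 6.2]
    no_relapse_reference = [6.0, 6.5, 7.0, 7.5]
    a = pearson(patient, relapse_reference)
    b = pearson(patient, no_relapse_reference)
    print("typing score:", a - b)
```

Distance measures such as the Euclidean distance run in the opposite direction to correlation measures, so a sign convention would be needed before they are subtracted as similarity values.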
Normalization methods to correct for systematic variation of at least three genes listed in table 1 include, but are not limited to, the gene chip fRMA method, the gene chip RMA method, the RNAseq sequencing CPM method, the RNAseq sequencing FPKM method.
Normalization methods to correct for systematic variation of at least three genes listed in table 2 include, but are not limited to, the gene chip fRMA method, the gene chip RMA method, the RNAseq sequencing CPM method, the RNAseq sequencing FPKM method.
Normalization methods to correct for systematic variation of at least three genes listed in table 3 include, but are not limited to, the gene chip fRMA method, the gene chip RMA method, the RNAseq sequencing CPM method, the RNAseq sequencing FPKM method.
Normalization methods to correct for systematic variation of at least three genes listed in table 4 include, but are not limited to, the gene chip fRMA method, the gene chip RMA method, the RNAseq sequencing CPM method, the RNAseq sequencing FPKM method.
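For the two RNAseq normalizations named above, the standard definitions are counts per million (CPM) and fragments per kilobase of transcript per million mapped fragments (FPKM). The sketch below applies those definitions to a dictionary of raw counts; the gene names, counts and lengths are hypothetical, and the gene chip methods (fRMA, RMA) are not sketched here.

```python
def cpm(counts):
    """Counts per million: scale each gene's count by library size."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("library has no mapped reads")
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def fpkm(counts, lengths_bp):
    """FPKM: counts scaled by library size (millions) and gene length (kb)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("library has no mapped reads")
    return {
        gene: c * 1e9 / (total * lengths_bp[gene])
        for gene, c in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical raw counts and gene lengths, for illustration only.
    raw = {"geneA": 100, "geneB": 300, "geneC": 600}
    lengths = {"geneA": 2000, "geneB": 1000, "geneC": 4000}
    print(cpm(raw))
    print(fpkm(raw, lengths))
```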
Preferably, the patient with disease recurrence and low cell cycle activity within five years is a patient in whom disease recurrence and low cell cycle activity were observed during five years of follow-up, and the patient with no disease recurrence within five years is a patient in whom no disease recurrence was observed during five years of follow-up.
The threshold is any value that can distinguish between an RNA sample from a patient with a high risk of cancer recurrence and an RNA sample from a patient with a low risk of cancer recurrence.
Preferably, the first biomarker comprises at least 4 genes listed in table 1; more preferably, the first biomarker comprises at least 5 genes listed in table 1; more preferably, the first biomarker comprises at least 6 genes listed in table 1; more preferably, the first biomarker comprises at least 7 genes listed in table 1; more preferably, the first biomarker comprises at least 8 genes listed in table 1; more preferably, the first biomarker comprises at least 9 genes listed in table 1; more preferably, the first biomarker comprises at least 10 genes listed in table 1; more preferably, the first biomarker comprises at least 15 genes listed in table 1; more preferably, the first biomarker comprises at least 20 genes listed in table 1; more preferably, the first biomarker comprises at least 25 genes listed in table 1; more preferably, the first biomarker comprises at least 30 genes listed in table 1; more preferably, the first biomarker comprises at least 40 genes listed in table 1; more preferably, the first biomarker comprises at least 50 genes listed in table 1; more preferably, the first biomarker comprises at least 60 genes listed in table 1; more preferably, the first biomarker comprises at least 70 genes listed in table 1; more preferably, the first biomarker comprises at least 80 genes listed in table 1; more preferably, the first biomarker comprises at least 90 genes listed in table 1; more preferably, the first biomarker comprises at least 100 genes listed in table 1; more preferably, the first biomarker comprises at least 150 genes listed in table 1; more preferably, the first biomarker comprises at least 200 genes listed in table 1; most preferably, the first biomarker comprises all of the genes listed in table 1;
the second biomarker comprises at least 4 genes listed in table 2; more preferably, the second biomarker comprises at least 5 genes listed in table 2; more preferably, the second biomarker comprises at least 6 genes listed in table 2; more preferably, the second biomarker comprises at least 7 genes listed in table 2; more preferably, the second biomarker comprises at least 8 genes listed in table 2; more preferably, the second biomarker comprises at least 9 genes listed in table 2; more preferably, the second biomarker comprises at least 10 genes listed in table 2; more preferably, the second biomarker comprises at least 15 genes listed in table 2; more preferably, the second biomarker comprises at least 20 genes listed in table 2; more preferably, the second biomarker comprises at least 25 genes listed in table 2; more preferably, the second biomarker comprises at least 30 genes listed in table 2; more preferably, the second biomarker comprises at least 40 genes listed in table 2; more preferably, the second biomarker comprises at least 50 genes listed in table 2; more preferably, the second biomarker comprises at least 60 genes listed in table 2; more preferably, the second biomarker comprises at least 70 genes listed in table 2; more preferably, the second biomarker comprises at least 80 genes listed in table 2; more preferably, the second biomarker comprises at least 90 genes listed in table 2; most preferably, the second biomarker comprises all genes listed in table 2.
Preferably, the third biomarker comprises at least 4 genes listed in table 3; more preferably, the third biomarker comprises at least 5 genes listed in table 3; more preferably, the third biomarker comprises at least 6 genes listed in table 3; more preferably, the third biomarker comprises at least 7 genes listed in table 3; more preferably, the third biomarker comprises at least 8 genes listed in table 3; more preferably, the third biomarker comprises at least 9 genes listed in table 3; more preferably, the third biomarker comprises at least 10 genes listed in table 3; more preferably, the third biomarker comprises at least 15 genes listed in table 3; more preferably, the third biomarker comprises at least 20 genes listed in table 3; more preferably, the third biomarker comprises at least 25 genes listed in table 3; more preferably, the third biomarker comprises at least 30 genes listed in table 3; more preferably, the third biomarker comprises at least 40 genes listed in table 3; more preferably, the third biomarker comprises at least 50 genes listed in table 3; more preferably, the third biomarker comprises at least 60 genes listed in table 3; more preferably, the third biomarker comprises at least 70 genes listed in table 3; more preferably, the third biomarker comprises at least 80 genes listed in table 3; more preferably, the third biomarker comprises at least 90 genes listed in table 3; more preferably, the third biomarker comprises at least 100 genes listed in table 3; most preferably, the third biomarker comprises all genes listed in table 3;
the fourth biomarker comprises at least 4 genes listed in table 4; more preferably, the fourth biomarker comprises at least 5 genes listed in table 4; more preferably, the fourth biomarker comprises at least 6 genes listed in table 4; more preferably, the fourth biomarker comprises at least 7 genes listed in table 4; more preferably, the fourth biomarker comprises at least 8 genes listed in table 4; more preferably, the fourth biomarker comprises at least 9 genes listed in table 4; more preferably, the fourth biomarker comprises at least 10 genes listed in table 4; more preferably, the fourth biomarker comprises at least 15 genes listed in table 4; more preferably, the fourth biomarker comprises at least 20 genes listed in table 4; more preferably, the fourth biomarker comprises at least 25 genes listed in table 4; more preferably, the fourth biomarker comprises at least 30 genes listed in table 4; most preferably, the fourth biomarker comprises all genes listed in table 4.
Preferably, model stacking is used to increase the stability of the final typing score value. The genes in table 1 and table 2 are divided into 8 groups according to gene function: (1) cell division genes (table 1 gene function, cell cycle), (2) DNA repair genes (table 1 gene function, dnarepair), (3) epithelial-mesenchymal transition genes (table 1 gene function, emt), (4) cell movement genes (table 1 gene function, movement), (5) T cell-associated genes (table 1 gene function, tcell), (6) the 60 genes most statistically significant for recurrence with low cell cycle activity (table 1 gene function, top), (7) Wnt signal transduction genes (table 2 gene function, wnt), and (8) the 60 genes most statistically significant for recurrence with non-low cell cycle activity (table 2 gene function, top). A typing score value is calculated for each of the 8 gene groups using the nearest centroid classification method, and the 8 typing score values are then fused into 2 final typing score values using a K-nearest neighbor regression model (Hechenbichler and Schliep, 2004). Model stacking methods include, but are not limited to, K-nearest neighbor regression, nearest centroid classification and neural networks. For each sample, if the 1st final typing score exceeds the first similarity threshold, or the 2nd final typing score exceeds the second similarity threshold, the sample is considered a high recurrence risk sample; otherwise it is a low recurrence risk sample.
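The two stacking stages can be sketched as follows, under several assumptions that the text does not settle: each group score is taken as the Pearson similarity to the relapse centroid minus the similarity to the no-relapse centroid, the fusion step is an unweighted K-nearest neighbor regression (the cited method is a weighted variant), and the regression target is left to the caller. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("constant profile: correlation is undefined")
    return sum(a * b for a, b in zip(dx, dy)) / den


def centroid(profiles):
    """Per-gene mean expression over a list of reference profiles."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]


def group_score(sample, relapse_centroid, no_relapse_centroid):
    """Stage 1: nearest-centroid style score for ONE gene group."""
    return (pearson(sample, relapse_centroid)
            - pearson(sample, no_relapse_centroid))


def knn_regress(train_scores, train_targets, query, k=3):
    """Stage 2: fuse group scores by averaging the targets of the
    k training samples closest to `query` in group-score space
    (unweighted simplification)."""
    if not 1 <= k <= len(train_scores):
        raise ValueError("k must be between 1 and the training set size")
    order = sorted(
        range(len(train_scores)),
        key=lambda i: math.dist(train_scores[i], query),
    )
    return sum(train_targets[i] for i in order[:k]) / k
```

One plausible arrangement, again an assumption, is to fuse the six table 1 group scores into the 1st final typing score and the two table 2 group scores into the 2nd.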
The invention also provides a method of prescribing treatment to a patient having colorectal cancer, comprising classifying the patient as having a high risk of recurrence or a low risk of recurrence according to the system of the invention, and prescribing adjuvant therapy if the patient is classified as having a high risk of recurrence. Chemotherapeutic compounds include, but are not limited to, oxaliplatin, formyltetrahydrofolic acid (folinic acid), 5-FU, capecitabine, eritinib and topoisomerase 1 inhibitors. Antibody treatments include, but are not limited to, bevacizumab, cetuximab, the PD-1 inhibitor Opdivo and TGF-beta receptor inhibitors.
The invention provides the use of at least three genes listed in table 1, a reagent that detects the RNA expression level of at least three genes listed in table 1, at least three genes listed in table 2, or a reagent that detects the RNA expression level of at least three genes listed in table 2, in the preparation of a reagent or kit for typing a patient having colorectal cancer.
The invention provides the use of a combination of at least three genes listed in table 1 and at least three genes listed in table 2 or a combination of a reagent that detects the RNA expression level of at least three genes listed in table 1 and a reagent that detects the RNA expression level of at least three genes listed in table 2 in the preparation of a reagent or kit for typing a patient having colorectal cancer.
The invention provides the use of at least three genes listed in table 3, a reagent that detects the RNA expression level of at least three genes listed in table 3, at least three genes listed in table 4, or a reagent that detects the RNA expression level of at least three genes listed in table 4, in the preparation of a reagent or kit for typing a patient having colorectal cancer.
The invention provides the use of a combination of at least three genes listed in table 3 and at least three genes listed in table 4 or a combination of an agent that detects the RNA expression levels of at least three genes listed in table 3 and an agent that detects the RNA expression levels of at least three genes listed in table 4 in the preparation of a reagent or kit for typing a patient suffering from colorectal cancer.
Advantageous effects:
The invention provides a system for typing patients suffering from colorectal cancer. When used to type such patients into a high recurrence risk type and a low recurrence risk type, the system shows remarkable performance: the 5-year recurrence-free survival rates of the two types differ significantly, and the typing result is superior to the clinical risk assessment parameters recommended by currently used medical manuals.
Drawings
FIG. 1 is a flow chart of the generation of the typing gene sets using 4 rounds of iterative supervised machine learning and gene function annotation.
FIG. 2 shows time-to-recurrence survival curves for group 1 of 416 stage II colorectal cancer patients, typed using the genes in tables 1 and 2 and the system of the invention; the 5-year recurrence-free survival rates of the high and low recurrence risk types differ significantly.
FIG. 3 shows time-to-recurrence survival curves for group 2 of 41 stage II colorectal cancer patients, typed using the genes in tables 1 and 2 and the system of the invention; the 5-year recurrence-free survival rates of the high and low recurrence risk types differ significantly.
FIG. 4 shows time-to-recurrence survival curves for group 1 of 416 stage II colorectal cancer patients, typed using the stable cell core gene profiles in tables 3 and 4 and the system of the invention; the 5-year recurrence-free survival rates of the high and low recurrence risk types differ significantly.
FIG. 5 shows time-to-recurrence survival curves for group 2 of 41 stage II colorectal cancer patients, typed using the stable cell core gene profiles in tables 3 and 4 and the system of the invention; the 5-year recurrence-free survival rates of the high and low recurrence risk types differ significantly.
FIG. 6 is a plot of the recurrence risk ratio obtained with the typing method and system of the present invention using any combination of 3 to 50 genes from Table 1.
FIG. 7 is a plot of the recurrence risk ratio obtained with the typing method and system of the present invention using any combination of 3 to 50 genes from Table 2.
FIG. 8 is a plot of the recurrence risk ratio obtained with the typing method and system of the present invention using any combination of 3 to 50 genes from Table 3.
FIG. 9 is a plot of the recurrence risk ratio obtained with the typing method and system of the present invention using any combination of 3 to 50 genes from Table 4.
Detailed Description
Example 1: Generation of the typing systems and methods of the invention, and selection of patients with disease recurrence and low cell cycle activity and of patients with disease recurrence and non-low cell cycle activity
Stage II colorectal cancer tumor specimens were used in this study (n = 416: 84 high recurrence risk specimens from patients with cancer recurrence recorded at follow-up, and 332 low recurrence risk specimens from patients with no cancer recurrence recorded at follow-up). Clinical data included TNM staging of tumors, patient survival and 5-year follow-up of relapse. Gene expression data from the tumor specimens were normalized using the fRMA method; other conventional normalization procedures may also be used.
Using gene expression data for all genes, a 4-round iterative supervised machine learning approach was applied (see FIG. 1). In each round, a subset of the high recurrence risk specimens was compared with all low recurrence risk specimens. The high recurrence risk specimens used differed from round to round, and the selection criteria for the high recurrence risk specimens used in each round are listed in FIG. 1.
Rounds 1 and 3 of supervised machine learning are used to group high recurrence risk specimens with similar characteristics; the statistical criterion is a p-value < 0.05 in Cox proportional hazards regression over all specimens in the round. Rounds 2 and 4 of supervised machine learning are used to select the final gene sets of the method, and 3 statistical criteria are used to select genes: (1) the gene has a p-value < 0.05 in the Cox proportional hazards model over all specimens in the round; (2) the median value of the gene across all high recurrence risk specimens in the round differs from its median value across all low recurrence risk specimens in the round by a factor of at least 1.2; (3) the p-value of the gene in the Cox proportional hazards model is recorded in 200 rounds of 10-fold cross-validation using all specimens, and the gene meets the p-value criterion in more than 180 of these cross-validations. Genes selected by these statistical methods were annotated for their function and subcellular localization using GO terms (The Gene Ontology Consortium, 2017).
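The three gene selection criteria for rounds 2 and 4 can be sketched as a single filter. This is a hedged illustration only: the Cox regression itself is not shown, so the p-values are assumed to be precomputed; expression values are assumed to be on a linear, positive scale so that a ratio of medians is meaningful; and criterion (3) is read here as "p < 0.05 in more than 180 of the 200 cross-validation rounds", which is one interpretation of the text. The function name and all parameters are hypothetical.

```python
from statistics import median


def passes_filters(high_expr, low_expr, cox_p, cv_p_values,
                   p_cut=0.05, min_fold=1.2, min_cv_hits=180):
    """Return True if a gene meets all three selection criteria.

    high_expr / low_expr: expression of the gene in the high and low
        recurrence risk specimens of the round (linear scale assumed).
    cox_p: precomputed Cox p-value over all specimens in the round.
    cv_p_values: precomputed Cox p-values, one per cross-validation round.
    """
    # Criterion 1: significant in the full-round Cox model.
    if cox_p >= p_cut:
        return False
    # Criterion 2: medians differ by a factor of at least min_fold,
    # in either direction.
    m_high, m_low = median(high_expr), median(low_expr)
    if m_high <= 0 or m_low <= 0:
        return False
    fold = max(m_high / m_low, m_low / m_high)
    if fold < min_fold:
        return False
    # Criterion 3: significant in more than min_cv_hits CV rounds.
    hits = sum(1 for p in cv_p_values if p < p_cut)
    return hits > min_cv_hits
```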
In the 2nd round of supervised machine learning, 65 stage II patients with recurrence were used whose cancer cells showed reduced expression of cell cycle activity genes relative to the 332 stage II patients without recurrence, i.e. relatively low cell cycle activity; this group is referred to as patients with disease recurrence within five years and low cell cycle activity. As shown in Table 1, 6 groups of genes were selected in this round: (1) cell division genes (cell cycle), (2) DNA repair genes (dnarepair), (3) epithelial-mesenchymal transition genes (emt), (4) cell migration genes (movement), (5) T cell-associated genes (tcell), and (6) the top 60 genes (top) most statistically significantly associated with recurrence and low cell cycle activity. To increase the stability of the method, the cell division gene group and the DNA repair gene group comprise only genes encoding nuclear proteins.
In the 4th round of supervised machine learning, 17 stage II patients with recurrence were used whose cancer cells showed slightly increased expression of cell cycle activity genes relative to the 332 stage II patients without recurrence, i.e. without relatively low cell cycle activity; this group is referred to as patients with disease recurrence within five years and non-low cell cycle activity. As shown in Table 2, 2 groups of genes were selected in this round: (7) Wnt signaling genes (Wnt) and (8) the top 60 genes (top) most statistically significantly associated with recurrence and non-low cell cycle activity.
To increase the stability of the final typing score, model stacking was used. For each of the 8 gene groups, a typing score was calculated separately using the nearest centroid classification method. The 8 typing scores were then fused into 2 final typing scores using a K-nearest neighbor regression model (Hechenbichler and Schliep, 2004). The optimal classification thresholds for the similarity values were determined by optimizing sensitivity and specificity. A sample is considered a high recurrence risk sample if its 1st final typing score exceeds the first similarity threshold of 0.155 or its 2nd final typing score exceeds the second similarity threshold of 0.076; otherwise it is a low recurrence risk sample.
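The per-group score and the final decision rule can be sketched as follows. Only the two thresholds and the "either score exceeds its threshold" rule come from the text. The score definition (Pearson correlation to the high-risk centroid minus correlation to the low-risk centroid) is an assumed, commonly used form of nearest centroid scoring, all function names are hypothetical, and the K-nearest neighbor fusion of the 8 scores into 2 final scores is omitted.

```python
from math import sqrt

FIRST_THRESHOLD = 0.155   # threshold for the 1st final typing score (from the text)
SECOND_THRESHOLD = 0.076  # threshold for the 2nd final typing score (from the text)


def centroid(samples):
    """Per-gene mean expression over a list of samples (each a list of floats)."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def group_typing_score(sample, high_centroid, low_centroid):
    """ASSUMED score: similarity to the high-risk centroid minus the low-risk one."""
    return pearson(sample, high_centroid) - pearson(sample, low_centroid)


def classify_recurrence_risk(final_score_1, final_score_2):
    """High risk if either final typing score exceeds its threshold."""
    if final_score_1 > FIRST_THRESHOLD or final_score_2 > SECOND_THRESHOLD:
        return "high"
    return "low"
```

Because the rule is a logical OR, a sample is typed as high recurrence risk when it resembles either of the two high-risk patterns (low or non-low cell cycle activity), which matches the two-branch design of rounds 2 and 4.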
Example 2: Performance evaluation of the typing system and method of the invention
The 657 identified genes, shown in Tables 1 and 2, were used to construct a colorectal cancer prognostic classification. For each stage II colorectal cancer patient, the tumor specimen can be classified into one of 2 types: high recurrence risk or low recurrence risk. In the validation on the 416 stage II colorectal cancer patients of group 1 (n = 416: 84 patients at high risk of recurrence, with cancer recurrence recorded during follow-up, and 332 patients at low risk of recurrence, with no cancer recurrence recorded during follow-up), both recurrence and follow-up time were definitively recorded for this group of patients. Survival curve analysis showed that the 5-year survival rate was 97.8% [95% CI, 95.7%-100%] for patients classified as the low recurrence risk type and 57.7% [95% CI, 50.7%-65.7%] for patients classified as the high recurrence risk type. The hazard ratio of the high recurrence risk type relative to the low recurrence risk type was HR = 23.3 (p-value < 0.0001) (see FIG. 2).
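The survival rates above come from survival curve analysis. As a minimal sketch of how such a 5-year rate can be estimated, the following assumes the Kaplan-Meier estimator; the text does not name the estimator, and the function names and the toy data in the usage note are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (time, survival_probability).

    times  -- follow-up time of each patient
    events -- 1 if recurrence was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # recurrences at time t
        leaving = 0   # all patients leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed:
            # Multiply by the conditional probability of remaining event-free.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function, 1.0 before first event)."""
    probability = 1.0
    for event_time, value in curve:
        if event_time > t:
            break
        probability = value
    return probability
```

For example, `survival_at(kaplan_meier(times, events), 5)` gives the estimated 5-year rate for one risk group when `times` is measured in years; computing it separately for the high and low recurrence risk groups yields the two curves being compared.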
TABLE 1 significantly related genes
Figure GDA0003905762910000091
Figure GDA0003905762910000101
Figure GDA0003905762910000111
Figure GDA0003905762910000121
Figure GDA0003905762910000131
Figure GDA0003905762910000141
Figure GDA0003905762910000151
Figure GDA0003905762910000161
Figure GDA0003905762910000171
Figure GDA0003905762910000181
Figure GDA0003905762910000191
Figure GDA0003905762910000201
Figure GDA0003905762910000211
Figure GDA0003905762910000221
TABLE 2 significantly related genes
Figure GDA0003905762910000231
Figure GDA0003905762910000241
In the validation on the 41 stage II colorectal cancer patients of group 2 (n = 41: 10 patients at high risk of recurrence, with cancer recurrence recorded during follow-up, and 31 patients at low risk of recurrence, with no cancer recurrence recorded during follow-up), recurrence was definitively recorded for this group of patients, with the follow-up time being an expected value within 10 years. Survival curve analysis showed that the 5-year survival rate was 93.8% [95% CI, 82.6%-100%] for patients classified as the low recurrence risk type and 50.5% [95% CI, 32.0%-79.8%] for patients classified as the high recurrence risk type. The hazard ratio of the high recurrence risk type relative to the low recurrence risk type was HR = 11.3 (p-value = 0.00371) (see FIG. 3).
Typing with the system and method of the present invention showed significant performance in both groups of patients: recurrence-free survival within 5 years differed significantly between the high and low recurrence risk types, and the results were superior to the clinical risk-assessment parameters recommended by current medical guidelines.
Example 3: Performance evaluation of the typing system and method of the invention using stable cell core gene profiles
The 250 stable cell core gene profiles, comprising 150 cell division genes, 50 DNA repair genes, and the top 50 most statistically significant genes, as shown in Tables 3 and 4, were used to construct a colorectal cancer prognostic classification. For each stage II colorectal cancer patient, the tumor specimen can be classified into one of 2 types: high recurrence risk or low recurrence risk. In the validation on the 416 stage II colorectal cancer patients of group 1, both recurrence and follow-up time were definitively recorded for this group of patients. Survival curve analysis showed that the recurrence-free survival rate within 5 years was 93.5% [95% CI, 89.7%-97.5%] for patients classified as the low recurrence risk type and 65.1% [95% CI, 58.7%-72.3%] for patients classified as the high recurrence risk type. The hazard ratio of the high recurrence risk type relative to the low recurrence risk type was HR = 6.15 (p-value < 0.0001) (see FIG. 4).
TABLE 3 significantly related genes
Figure GDA0003905762910000251
Figure GDA0003905762910000261
Figure GDA0003905762910000271
Figure GDA0003905762910000281
Figure GDA0003905762910000291
TABLE 4 significantly related genes
Figure GDA0003905762910000292
Figure GDA0003905762910000301
In the validation on the 41 stage II colorectal cancer patients of group 2, recurrence was definitively recorded for this group of patients, with the follow-up time being an expected value within 10 years. Survival curve analysis showed that the recurrence-free survival rate within 5 years was 87.8% [95% CI, 73.4%-100%] for patients classified as the low recurrence risk type and 57.2% [95% CI, 38.5%-85.0%] for patients classified as the high recurrence risk type. The hazard ratio of the high recurrence risk type relative to the low recurrence risk type was HR = 4.38 (p-value = 0.0415) (see FIG. 5).
Using the 250 stable cell core gene profiles, the typing system and method of the present invention showed significant performance in both groups of patients: recurrence-free survival within 5 years differed significantly between the high and low recurrence risk types, and the results were superior to the clinical risk-assessment parameters recommended by current medical guidelines.
Example 4: Results of the typing system and method using a minimum of 3 genes
The number of genes used started at 2, increased by 1 each time, and ended at 50. For each number of genes (from 2 to 50), 200 rounds of simulation were performed. In each round, a random combination of that number of genes was drawn from the gene set, i.e. 200 random combinations of 2 genes, 200 random combinations of 3 genes, 200 random combinations of 4 genes, 200 random combinations of 5 genes, and so on up to 200 random combinations of 50 genes. For each simulated combination, the typing score of the typing method and system and the recurrence risk ratio between the high and low recurrence risk types in the patient specimens were calculated. The recurrence risk ratio reported for each number of genes is the average recurrence risk ratio over the 200 rounds of simulation. The results show that a combination of any 3 genes is the minimum required.
As can be seen from FIG. 6, a typing method constructed from any 3 to 50 of the genes in Table 1 can achieve a recurrence risk ratio.
As can be seen from FIG. 7, a typing method constructed from any 3 to 50 of the genes in Table 2 can achieve a recurrence risk ratio.
As can be seen from FIG. 8, a typing method constructed from any 3 to 50 of the genes in Table 3 can achieve a recurrence risk ratio.
As can be seen from FIG. 9, a typing method constructed from any 3 to 50 of the genes in Table 4 can achieve a recurrence risk ratio.
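The simulation procedure of Example 4 can be sketched as a loop over gene-set sizes. This is an illustrative outline only: `hazard_ratio_fn` is a hypothetical callback standing in for the full pipeline (building the typing method from the drawn genes, typing the specimens, and computing the recurrence risk ratio), and the function name, the seed parameter, and uniform sampling without replacement are assumptions.

```python
import random
from statistics import mean


def simulate_gene_counts(gene_pool, hazard_ratio_fn,
                         min_genes=2, max_genes=50, rounds=200, seed=0):
    """Average recurrence risk ratio for each gene-set size.

    gene_pool       -- list of gene identifiers (e.g. the genes of one table)
    hazard_ratio_fn -- HYPOTHETICAL callback: takes a list of genes and returns
                       the recurrence risk ratio obtained with those genes
    """
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    averages = {}
    for size in range(min_genes, max_genes + 1):
        ratios = []
        for _ in range(rounds):
            # Draw one random combination of `size` distinct genes.
            genes = rng.sample(gene_pool, size)
            ratios.append(hazard_ratio_fn(genes))
        # The value reported per size is the mean over all rounds.
        averages[size] = mean(ratios)
    return averages
```

The returned mapping from gene count to average ratio corresponds to the kind of curve plotted in FIGS. 6 to 9, one call per table.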
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention and not for limiting the protection scope of the present invention, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (4)

1. Use of at least 15 genes listed in table 1, an agent that detects the RNA expression level of at least 15 genes listed in table 1, at least 15 genes listed in table 2, or an agent that detects the RNA expression level of at least 15 genes listed in table 2 in the preparation of a reagent or kit for typing a patient suffering from colorectal cancer.
2. Use of a combination of at least 15 genes listed in table 1 and at least 15 genes listed in table 2 or a combination of an agent that detects the RNA expression level of at least 15 genes listed in table 1 and an agent that detects the RNA expression level of at least 15 genes listed in table 2 in the preparation of a reagent or kit for typing a patient having colorectal cancer.
3. Use of at least 15 genes listed in table 3, an agent that detects the RNA expression level of at least 15 genes listed in table 3, at least 15 genes listed in table 4, or an agent that detects the RNA expression level of at least 15 genes listed in table 4 in the preparation of a reagent or kit for typing a patient having colorectal cancer.
4. Use of a combination of at least 15 genes listed in table 3 and at least 15 genes listed in table 4 or a combination of reagents that detect the RNA expression levels of at least 15 genes listed in table 3 and reagents that detect the RNA expression levels of at least 15 genes listed in table 4 in the preparation of a reagent or kit for typing a patient having colorectal cancer.
CN201910106934.6A 2019-02-01 2019-02-01 System for typing a patient suffering from colorectal cancer Active CN109988708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910106934.6A CN109988708B (en) 2019-02-01 2019-02-01 System for typing a patient suffering from colorectal cancer

Publications (2)

Publication Number Publication Date
CN109988708A CN109988708A (en) 2019-07-09
CN109988708B true CN109988708B (en) 2022-12-09

Family

ID=67129810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910106934.6A Active CN109988708B (en) 2019-02-01 2019-02-01 System for typing a patient suffering from colorectal cancer

Country Status (1)

Country Link
CN (1) CN109988708B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113774130A (en) * 2020-06-09 2021-12-10 碳逻辑生物科技(香港)有限公司 Biomarkers and methods for selecting a group of chemotherapy responsive patients and uses thereof
CN112062827B (en) * 2020-09-16 2022-10-21 中国人民解放军军事科学院军事医学研究院 Application of CEP55 protein in regulation of cilia de-assembly and preparation of cilia-related disease model
CN113436741B (en) * 2021-07-16 2023-02-28 四川大学华西医院 Lung cancer recurrence prediction method based on tissue specific enhancer region DNA methylation
CN116870139A (en) * 2023-09-06 2023-10-13 暨南大学附属第一医院(广州华侨医院) Application of LASP1 protein in preparation of medicine for treating spinal cord injury repair

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8105777B1 (en) * 2008-02-13 2012-01-31 Nederlands Kanker Instituut Methods for diagnosis and/or prognosis of colon cancer

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009002175A1 (en) * 2007-06-28 2008-12-31 Agendia B.V. A method of typing a sample comprising colorectal cancer cells
EP2195658A2 (en) * 2007-09-28 2010-06-16 Royal College of Surgeons in Ireland A method of assessing colorectal cancer status in an individual
AU2015202173A1 (en) * 2007-10-05 2015-05-14 Pacific Edge Limited Proliferation signature and prognosis for gastrointestinal cancer
GB0720113D0 (en) * 2007-10-15 2007-11-28 Cambridge Cancer Diagnostics L Diagnostic, prognostic and predictive testing for cancer
EP2202320A1 (en) * 2008-12-24 2010-06-30 Agendia B.V. Methods and means for typing a sample comprising colorectal cancer cells
US20140031251A1 (en) * 2010-11-03 2014-01-30 H. Lee Moffitt Cancer Center And Research Institute, Inc. Methods of classifying human subjects with regard to cancer prognosis
EP3009842B1 (en) * 2014-09-26 2019-09-04 Sysmex Corporation Method for supporting diagnosis of risk of colorectal cancer recurrence, program and computer system
CN109938695A (en) * 2019-03-08 2019-06-28 度特斯(大连)实业有限公司 A kind of human body diseases Risk Forecast Method and equipment based on heterogeneous degree index

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Sun Tian

Inventor after: Wu Rujia

Inventor before: Sun Tian

GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
Address after: A133, Building B, No. 1 Huanzhen East Road South, Beijiao Community, Beijiao Town, Shunde District, Foshan City, Guangdong Province, 528000 (Residence Application)

Patentee after: Carbon Logic Biotechnology (Foshan) Co.,Ltd.

Address before: 528,400 Card 502, Floor 5, Shumao Building, No. 6, Xiangxing Road, Torch Development Zone, Zhongshan City, Guangdong Province

Patentee before: Carbon Logic Biotechnology (Zhongshan) Co.,Ltd.