CN117947166A - Marker group, product, system and application thereof for prognosis of intestinal cancer - Google Patents
- Publication number: CN117947166A
- Application number: CN202410127289.7A
- Authority: CN (China)
- Legal status: Pending
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Provided herein are marker sets, products, systems, and uses thereof for the prognosis of intestinal cancer. In particular, provided herein is a product or set of products for assessing the prognosis of bowel cancer in a subject, comprising reagents for detecting each marker of a marker set in a biological sample from the subject, the marker set having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4. The disclosure provides a low-cost, efficient, and high-quality means of evaluating the prognosis of bowel cancer.
Description
Technical Field
The present invention relates to the biomedical field; in particular, it relates to marker sets, products, systems and uses thereof for assessing prognosis of bowel cancer.
Background
Colorectal cancer (CRC) has become the third most common malignancy in China. For surgically treated colorectal cancer, postoperative pathological assessment is the most important prognostic indicator and the main basis for postoperative treatment. Patients with advanced CRC should receive postoperative adjuvant therapy, but adjuvant therapy decisions for patients with stage II/III disease remain largely controversial.
Retrospective studies have found that about 10-20% of stage II patients experience postoperative recurrence and metastasis, and high-risk stage II patients may benefit from adjuvant chemotherapy. The current assessment of whether patients with stage II colon cancer need adjuvant chemotherapy remains mainly at the histological level: depth of tumor invasion, degree of differentiation, presence or absence of lymphatic invasion, presence or absence of perineural invasion, total and positive lymph node counts, and resection margins. At the molecular level, studies suggest that patients with MSI-H (microsatellite instability-high) or dMMR (mismatch repair deficient) tumors derive reduced benefit from 5-FU chemotherapy. For patients with stage III CRC, the duration of chemotherapy remains controversial. The latest results of the IDEA study show that the difference in 5-year survival between 3 and 6 months of adjuvant chemotherapy for stage III patients is small: for low-risk stage III patients, 5-year survival with 4 cycles of XELOX is superior to that with 8 cycles, and for high-risk stage III patients, 5-year survival with 4 cycles is only 1% lower than with 8 cycles, while toxicity and side effects are greatly reduced. Furthermore, current clinical outcomes indicate that it is not only high-risk stage II/III patients who can benefit from adjuvant chemotherapy. The crux of these problems is the lack of specific molecular markers to define or determine "high risk." Therefore, how to screen for specific molecular markers, perform accurate molecular characterization and subtype analysis of stage II/III CRC patients, and identify the truly "high-risk" patients who will benefit from adjuvant chemotherapy remains an unsolved clinical problem.
Disclosure of Invention
To solve the above problems, through long-term and intensive research and analysis, the inventors have identified and refined a marker panel that can be used to assess the prognosis of intestinal cancer in a subject. With this marker panel, accurate molecular characterization and subtype analysis can be performed on patients with intestinal cancer (e.g., colorectal cancer), greatly improving both the identification of high-risk patients and the accuracy of clinical prognosis.
In one aspect, provided herein is a marker panel for assessing the prognosis of bowel cancer in a subject, having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
In another aspect, provided herein is a product or set of products for assessing the prognosis of bowel cancer in a subject, comprising reagents for detecting each marker of a marker set in a biological sample from the subject, the marker set having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
In another aspect, provided herein is a system for assessing the prognosis of bowel cancer in a subject, comprising a memory and one or more processors;
wherein the memory comprises:
Expression data for each gene of a marker panel from a biological sample of a subject,
CMS typing feature genome expression template data, and
One or more processor-executable instructions;
Wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4; and
Wherein the one or more processor-executable instructions are configured to:
(a) Obtaining gene expression data for the marker set from a biological sample of a subject;
(b) Calculating cosine distances between the gene expression data in (a) and the CMS typing feature genome expression templates;
(c) Determining CMS typing based on the computed cosine distances of (b); and
(d) Determining a prognosis of the bowel cancer in the subject based on the CMS type obtained in (c).
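The processor-executable steps (a) to (d) can be illustrated with a minimal nearest-template sketch. It is not the claimed implementation and rests on stated assumptions: the panel expression values are taken to be already gene-wise centered (for example against a reference cohort); each CMS template is assumed to be a 0/1 indicator vector over the 14 CMS14 genes, using the gene-to-subtype assignments described for the CMS14 marker set in this document; and the step (d) lookup simply restates the prognosis given in the subtype definitions. The names `CMS14_TEMPLATE`, `PROGNOSIS`, and `classify` are hypothetical.

```python
import math

# Hypothetical CMS14 template (assumption): marker gene -> CMS type it helps distinguish.
CMS14_TEMPLATE = {
    "IFIT3": "CMS1-Inflammatory", "CXCL13": "CMS1-Inflammatory",
    "STAT1": "CMS1-Inflammatory", "CXCL9": "CMS1-Inflammatory",
    "CA4": "CMS2-Enterocyte", "AQP8": "CMS2-Enterocyte", "SLC4A4": "CMS2-Enterocyte",
    "EREG": "CMS2-Transit Amplifying", "AREG": "CMS2-Transit Amplifying",
    "SPINK4": "CMS3-Goblet like", "REG4": "CMS3-Goblet like",
    "SFRP2": "CMS4-Stem like", "ZEB2": "CMS4-Stem like", "SFRP4": "CMS4-Stem like",
}

# Hypothetical step (d) lookup restating the prognosis given for each subtype.
PROGNOSIS = {
    "CMS1-Inflammatory": "better", "CMS2-Enterocyte": "better",
    "CMS2-Transit Amplifying": "poor", "CMS3-Goblet like": "poor",
    "CMS4-Stem like": "poor",
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def classify(expression, template=CMS14_TEMPLATE):
    """Return (CMS type, prognosis label, distance to each CMS template)."""
    genes = list(template)                                  # fixed gene order
    sample = [expression[g] for g in genes]                 # (a) panel expression
    distances = {}
    for cms in sorted(set(template.values())):
        indicator = [1.0 if template[g] == cms else 0.0 for g in genes]
        distances[cms] = cosine_distance(sample, indicator)  # (b) cosine distance
    best = min(distances, key=distances.get)                 # (c) nearest template
    return best, PROGNOSIS[best], distances                  # (d) prognosis lookup
```

For example, a centered profile in which IFIT3, CXCL13, STAT1, and CXCL9 are high and the other ten genes are low is nearest to the CMS1-Inflammatory template. The 0/1 template encoding is a simplifying assumption.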
In another aspect, provided herein is a computer-readable medium comprising:
Expression level data for each gene of a marker panel from a biological sample of a subject, wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4;
CMS typing feature genome expression template data, and
Instructions for performing a method comprising:
(a) Obtaining gene expression level data of a marker set of a biological sample from a subject;
(b) Calculating cosine distances between the gene expression level data in (a) and the CMS typing feature genome expression templates;
(c) Determining CMS typing based on the computed cosine distances of (b); and
(d) Determining a prognosis of the bowel cancer in the subject based on the CMS type obtained in (c).
In another aspect, provided herein is an electronic device loaded with the computer-readable medium.
In another aspect, provided herein is the use of a reagent for detecting each marker of a marker panel of a biological sample from a subject in the manufacture of a product or product panel for assessing the prognosis of bowel cancer in a subject, wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
In another aspect, provided herein is a use of a product or set of products comprising reagents for detecting each marker of a set of markers from a biological sample of a subject for assessing the prognosis of bowel cancer in a subject, wherein the set of markers has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
Drawings
The invention is further described below with reference to the accompanying drawings. These drawings merely illustrate exemplary embodiments of the invention and are not intended to limit its scope.
FIG. 1 is an exemplary flow chart of a general method of screening for a prognostic marker of bowel cancer according to one of the exemplary embodiments herein;
FIG. 2 is a volcano plot of differentially expressed genes in Example 1 herein;
FIG. 3 is a matrix plot of correlation coefficients between the CMS40 markers in Example 1 herein;
FIG. 4 shows the results of survival analysis of CMS molecular typing data obtained with the CMS40 marker set in Example 1 herein;
FIG. 5 shows the results of survival analysis of CMS molecular typing data obtained with the CMS20 marker set in Example 1 herein;
FIG. 6 shows the results of survival analysis of CMS molecular typing data obtained with the CMS14 marker set in Example 1 herein;
FIG. 7 shows the results of three-class relapse-free survival analysis of CMS molecular typing data obtained with the CMS40 marker set in Example 3 herein.
Detailed Description
The meaning of technical terms in the present application is consistent with the general understanding of those skilled in the art unless otherwise indicated. In the present application, singular forms such as "a" include both singular and plural meanings unless specifically stated otherwise. In the present application, when a plurality of values, ranges of values, or combinations thereof are given for the same parameter or variable, this is equivalent to specifically disclosing each value, each range endpoint, and every range formed by any combination thereof. Any numerical value, whether or not it bears a modifier such as "about," is intended to cover the approximate range understood by one of ordinary skill in the art, e.g., plus or minus 10% or 5%. Each "embodiment" herein equally refers to and encompasses embodiments of the methods and systems of the present application. In the present application, one or more technical features of any embodiment may be freely combined with one or more technical features of any one or more other embodiments, and the resulting embodiments are also included in the present disclosure.
Some terms used in the embodiments of the present invention are enumerated below. Within the scope of the present description and claims, the relevant terms are defined as follows. Other terms not listed are defined as commonly used in the art, the meaning of which is well known to those skilled in the art.
The term "cancer" as used herein refers to the presence of cells that have characteristics typical of oncogenic cells, such as uncontrolled proliferation, immortality, metastatic potential, rapid growth and proliferation rate, as well as certain characteristic morphological features known in the art. In one embodiment, the "cancer" may be a bowel cancer or a liver cancer. In one embodiment, "cancer" may include premalignant cancer and malignant cancer.
In one embodiment, as those skilled in the art will appreciate, the methods described herein do not involve steps performed by a physician. Thus, the results obtained by the methods described herein must be combined with clinical data and other clinical manifestations before a physician can provide a final diagnosis to the subject. The final diagnosis of whether a subject has bowel cancer is within the physician's purview and is not considered part of the present disclosure. Accordingly, the terms "determining," "detecting," and "diagnosing" as used herein refer to identifying the probability or likelihood that a subject has a disease (e.g., bowel cancer) at any stage of development, or determining a subject's susceptibility to developing the disease. In one embodiment, "diagnosing," "determining," or "detecting" is performed before symptoms manifest. In one embodiment, "diagnosing," "determining," or "detecting" allows a clinician (in combination with other clinical manifestations) to confirm bowel cancer in a subject suspected of having bowel cancer.
As used herein, the term "sample" or "biological sample" means a sample taken from a subject in which the type and amount of intestinal cancer markers are detected. The sample may or may not be from the circulatory system (i.e., blood). It may be any sample suitable for detecting intestinal cancer markers, including washes, with sources including tissue, whole blood, bone marrow, pleural fluid, peritoneal fluid, cerebrospinal fluid, milk, urine, tears, sweat, saliva, organ secretions, the bronchi, the nasal cavity, the throat, and the like. In some embodiments, the biological sample is selected from fresh tissue samples, frozen tissue samples, or paraffin-embedded tissue samples (e.g., FFPE samples).
As used herein, the term "prognosis" refers to an estimate of the likely outcome (with or without treatment) over a future period, based on the subject's current condition. The outcome is generally expressed as a probability (%), such as a probability of cure, recurrence, or death.
As used herein, the term "correlation analysis (Correlation Analysis)" refers to a statistical method for studying whether a relationship exists between two or more random variables. Its primary purpose is to determine whether there is a statistical correlation or dependence between the variables and to quantify the degree and form of that correlation. Basic methods of correlation analysis include linear correlation, rank correlation, distance correlation, and the like.
As used herein, the term "correlation coefficient" refers to a statistic used in correlation analysis to quantify how closely variables are related, reflecting the strength of the linear correlation between two variables. The larger its absolute value, the stronger the linear correlation between the two variables; a value near 0 indicates weak correlation. Common examples are the Pearson and Spearman correlation coefficients. The correlation coefficient used in this study is the Pearson correlation coefficient.
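A gene-by-gene correlation matrix such as the one described for the CMS40 markers can be sketched with the Pearson formula r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² Σ(y - ȳ)²). This is a minimal standard-library sketch; the function names and the input layout (gene name mapped to a list of values across samples) are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(expression):
    """expression: gene -> values across samples; returns gene -> gene -> r."""
    genes = list(expression)
    return {g: {h: pearson(expression[g], expression[h]) for h in genes}
            for g in genes}
```

The matrix is symmetric with 1.0 on the diagonal, so a plot of it is usually drawn as a heat map.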
As used herein, the term "quantile normalization" refers to a commonly used method for processing sequencing data. Quantile normalization is an important preprocessing step in RNA-seq analysis that effectively removes systematic differences between samples, making them comparable. The main idea is as follows. First, the expression values within each sample are sorted, and the mean expression value at each rank position (i.e., each quantile, such as 1%, 5%, or 25%) is calculated across all samples. Then, in each sample, each gene's original expression value is replaced with the mean value of the quantile at which that value ranks. This is repeated for all genes in all samples. Through this quantile transformation, the expression distributions of different batches and samples become identical, eliminating the effects of differing sequencing depths and technical errors between samples. The advantages of quantile normalization are that it is simple, effective, and easy to implement, requires no reference sample, is robust to missing values and outliers, and preserves the rank order of expression values within each sample.
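The sort, average-by-rank, and substitute steps of quantile normalization can be sketched as follows. This is a minimal standard-library illustration, not the pipeline used in this document; it assumes every sample is a list of expression values in the same gene order, and it breaks ties by gene order, a simplification (implementations differ in how they handle ties).

```python
def quantile_normalize(samples):
    """samples: list of per-sample lists, all with the same gene order.

    Returns new lists in which every sample has the same value distribution.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must have the same number of genes")
    sorted_samples = [sorted(s) for s in samples]
    # Mean of the values sitting at each rank position across all samples.
    rank_means = [sum(s[r] for s in sorted_samples) / len(samples)
                  for r in range(n_genes)]
    normalized = []
    for s in samples:
        # Gene indices ordered by their value in this sample (ties: gene order).
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_idx in enumerate(order):
            out[gene_idx] = rank_means[rank]  # replace value by its rank's mean
        normalized.append(out)
    return normalized
```

For three toy samples [5, 2, 3, 4], [4, 1, 4, 2], and [3, 4, 6, 8], every normalized sample contains the same four values (2, 3, 14/3, 17/3), each placed at the position of the gene that held that rank.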
As used herein, the term "consensus molecular subtype (Consensus Molecular Subtypes)" or "CMS" refers to a gene expression-based molecular classification of colorectal cancer. CMS typing is determined by calculating the similarity between the gene expression of each sample and the four subtype patterns, and assigning the sample to a type based on cosine similarity and a classification model. CMS typing is associated with patient prognosis and treatment efficacy and can guide precision treatment. Compared with single biomarkers, CMS typing makes comprehensive use of genome-wide expression information and can more fully reflect the biological characteristics of a tumor. It is currently an important tool for precision medicine in intestinal cancer.
The term "CMS1-inflammatory type (CMS1-Inflammatory)" refers to a consensus molecular subtype of intestinal cancer representing inflammatory intestinal cancer. Its main characteristics are: increased immune cell infiltration, particularly of T lymphocytes and macrophages; up-regulation of inflammation-associated genes, such as the inflammatory cytokines IL-6 and IL-8; increased expression of immune checkpoints such as PD-L1; activation of genes related to natural killer T cells and TH1-type T cells; frequent high microsatellite instability; and a better prognosis with higher sensitivity to immunotherapy such as anti-PD-1.
The term "CMS2-transit-amplifying type (CMS2-Transit Amplifying)" refers to a consensus molecular subtype of intestinal cancer representing a hyperproliferative, poorly differentiated subtype. Its main characteristics are: high expression of genes associated with intestinal epithelial cell proliferation and transit, such as cyclins and proliferation-associated antigens; down-regulation of genes associated with intestinal epithelial cell differentiation; aberrant activation of the WNT pathway; mutations in tumorigenic driver genes such as APC, TP53, and KRAS; poorly differentiated adenocarcinoma on pathology; a poor prognosis; and sensitivity to standard chemotherapy.
The term "CMS2-enterocyte type (CMS2-Enterocyte)" refers to a consensus molecular subtype of intestinal cancer representing a differentiated, intestinal epithelium-like subtype associated with the differentiation and absorptive functions of normal intestinal epithelial cells. Its main characteristics include: high expression of genes related to intestinal epithelial cell differentiation and absorption, such as alkaline phosphatase and intestinal alkaline phosphatase; up-regulation of cell adhesion-related proteins such as E-cadherin; down-regulation of the WNT pathway; driver genes including APC, KRAS, and TP53; well-differentiated adenocarcinoma as the pathological type; a better prognosis; and sensitivity to standard first-line chemotherapy.
The term "CMS3-goblet-like type (CMS3-Goblet like)" refers to a consensus molecular subtype of intestinal cancer representing a goblet cell-like subtype. Its main characteristics are: high expression of genes related to goblet cell differentiation, such as MUC2 and TFF3; activation of mucus production-related pathways; frequent activation of the RAS pathway, with KRAS mutations being more common; mucinous adenocarcinoma as the usual pathological type; less immune infiltration; more frequent occurrence in the right-sided colon; poor sensitivity to standard first-line chemotherapy; and a poor prognosis.
The term "CMS4-stem-like type (CMS4-Stem like)" refers to a consensus molecular subtype of intestinal cancer representing a stem cell-like subtype. Its main characteristics are: high expression of stem cell marker genes, such as LGR5 and EPHB2; up-regulation of EMT-related genes; activation of the WNT and Notch pathways; high tumor heterogeneity and poor differentiation; signet-ring cell carcinoma and mucinous carcinoma as the main pathological types; expansion of tumor stem cells and resistance to treatment; a poor prognosis; and a high risk of tumor recurrence.
As used herein, a "p-value" is the probability, assuming the null hypothesis of a hypothesis test is true, of obtaining data equal to or more extreme than the observed data. In general, if the p-value is very small, e.g., less than 0.01, the result is highly unlikely to be a chance event under the null hypothesis, so the null hypothesis is rejected and the result is considered statistically significant. If the p-value is large, e.g., greater than 0.05, the null hypothesis cannot be rejected, i.e., the result is not statistically significant. The smaller the p-value, the stronger the statistical significance of the result. Common significance thresholds are 0.05 and 0.01. The p-value is therefore an important basis for judging whether a hypothesis test result is significant.
As used herein, the term "t-test" refers to a statistical method for testing whether the means of two samples differ significantly. The basic idea of the t-test is to state a hypothesis, compute the t statistic from the observed data, determine the p-value from the t distribution, and finally judge from the p-value whether to reject the null hypothesis.
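The t-test steps (compute t, then read a p-value off the t distribution) can be sketched with only the standard library. This sketch assumes Student's equal-variance two-sample form, which the text does not specify, and obtains the two-sided p-value by integrating the t density with Simpson's rule instead of calling a statistics package; in practice a tested library routine such as SciPy's `scipy.stats.ttest_ind` would normally be used. Function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def student_t_test(a, b):
    """Two-sample t-test with pooled variance; returns (t, df, two-sided p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / se
    return t, df, two_sided_p(t, df)
```

For the toy samples [1, 2, 3, 4, 5] and [2, 3, 4, 5, 6], the pooled standard error is 1, so t = -1 with 8 degrees of freedom, and the two-sided p-value is roughly 0.35, i.e., not significant at 0.05.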
As used herein, the term "survival analysis" refers to the analysis of time-to-event data. It requires survival time (time), event status (status), and other feature data, where status generally indicates with 0 (no) or 1 (yes) whether an event (e.g., death) has occurred.
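How the time and status columns feed a survival curve can be illustrated with the Kaplan-Meier estimator. The text does not say which survival method was used for the figures, so this is only an assumed, minimal standard-library example; the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, status):
    """Kaplan-Meier survival estimate.

    times: follow-up time per subject; status: 1 = event occurred, 0 = censored.
    Returns a list of (event_time, survival_probability) steps.
    """
    if len(times) != len(status):
        raise ValueError("times and status must have the same length")
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        # Group all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= 1 - events / n_at_risk  # conditional survival at t
            curve.append((t, survival))
        n_at_risk -= events + censored  # censored subjects leave the risk set
    return curve
```

With times [1, 2, 2, 3] and status [1, 0, 1, 1], the curve steps to 0.75 at time 1, 0.5 at time 2 (one event among three at risk; the censored subject still counts as at risk), and 0 at time 3.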
As used herein, the terms "subject" and "patient" are used interchangeably and generally refer to a mammal, such as a bovine, equine, ovine, porcine, canine, feline, rodent, or primate, e.g., a human or a non-human mammal.
A. Marker panel for assessing the prognosis of intestinal cancer
In one aspect, provided herein is a marker panel for assessing the prognosis of bowel cancer in a subject, having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
Herein, a marker set having the 14 gene markers described above may also be equivalently referred to as "CMS14" or "CMS14 marker set".
Herein, a "product set" refers to two or more products that may be provided together (e.g., in the same kit or package) or separately (e.g., not in the same kit or package), which are used in combination and cannot be used alone (e.g., for assessing a subject's prognosis of bowel cancer).
In some embodiments, the bowel cancer is colorectal cancer. In some embodiments, the intestinal cancer is stage II/III colorectal cancer. In some embodiments, the bowel cancer may be selected from rectal cancer, left-sided colon cancer, or right-sided colon cancer.
In some embodiments, the expression (e.g., amount of expression) of each gene marker in the marker set can be used to determine CMS typing of a subject.
In some embodiments, the comparison of the expression (e.g., amount of expression) of each genetic marker in the marker set to the CMS typing feature genome expression template can be used to determine CMS typing of a subject to further assess the prognosis of the subject's bowel cancer.
As used herein, the term "CMS typing feature genome expression template" refers to a preset classification template, e.g., a preset classification template data table. The classification template may be a set of genes, each labeled with one of the CMS types. If a given gene is labeled with a specific type, its expected expression is higher in samples belonging to that type than in samples not belonging to it. In one exemplary embodiment, the classification template data table may include at least two sets (e.g., two columns) of information: probes (e.g., Entrez IDs) and categories (e.g., CMS types). In yet another exemplary embodiment, the classification template data table may include three sets (e.g., three columns) of information: probes (e.g., Entrez IDs), categories (e.g., CMS types), and gene symbols.
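The three-column template table can be turned into one indicator vector per CMS type, as in the sketch below. Assumptions: instead of Entrez IDs, the probe column uses the Ensembl gene IDs listed for the CMS14 genes in this document; the category labels follow the CMS14 gene-to-subtype assignments described in this document; the table text and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical template table (probe, category, symbol), tab-separated.
TEMPLATE_TABLE = """probe\tcategory\tsymbol
ENSG00000119917\tCMS1-Inflammatory\tIFIT3
ENSG00000156234\tCMS1-Inflammatory\tCXCL13
ENSG00000115415\tCMS1-Inflammatory\tSTAT1
ENSG00000138755\tCMS1-Inflammatory\tCXCL9
ENSG00000167434\tCMS2-Enterocyte\tCA4
ENSG00000103375\tCMS2-Enterocyte\tAQP8
ENSG00000080493\tCMS2-Enterocyte\tSLC4A4
ENSG00000124882\tCMS2-Transit Amplifying\tEREG
ENSG00000109321\tCMS2-Transit Amplifying\tAREG
ENSG00000122711\tCMS3-Goblet like\tSPINK4
ENSG00000134193\tCMS3-Goblet like\tREG4
ENSG00000145423\tCMS4-Stem like\tSFRP2
ENSG00000169554\tCMS4-Stem like\tZEB2
ENSG00000106483\tCMS4-Stem like\tSFRP4
"""


def load_template(text):
    """Parse the tab-separated table into category -> list of probes."""
    lines = text.strip().splitlines()
    header = lines[0].split("\t")
    by_category = defaultdict(list)
    for line in lines[1:]:
        row = dict(zip(header, line.split("\t")))
        by_category[row["category"]].append(row["probe"])
    return dict(by_category)


def indicator_templates(by_category, probe_order):
    """One 0/1 vector per category: 1 where the probe carries that label."""
    result = {}
    for category, members in by_category.items():
        member_set = set(members)
        result[category] = [1 if p in member_set else 0 for p in probe_order]
    return result
```

Because each probe carries exactly one label in this table, every position is 1 in exactly one of the five vectors.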
In some embodiments, the comparison of the expression (e.g., amount of expression) of each gene marker in the marker set to the CMS-typed feature genome expression template is expressed as the cosine distance of the expression (e.g., amount of expression) of each gene marker in the CMS14 marker set to the CMS-typed feature genome expression template.
The term "cosine distance" is defined herein as the feature distance d, which is the default feature distance in the nearest template prediction model. The smallest d identifies the CMS type whose feature genome expression template the sample is closest to, i.e., the most likely type. To test the statistical significance of the assigned type, a p-value is obtained by a random permutation test: the feature genes are randomly resampled (1000 times by default) to generate a random distribution of feature distances, the distance between the tested sample and the type's feature template is compared with this random distribution, and the result is corrected for the false discovery rate (FDR). A smaller p-value indicates stronger statistical significance of the shortest cosine feature distance and therefore a more reliable predicted CMS type (the significance threshold is typically p < 0.05).
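The permutation test and FDR correction can be sketched as follows. This is an assumption-laden illustration, not the reference procedure: instead of resampling feature genes from a larger gene pool, it shuffles the sample's own panel values across genes (a common variant), counts how often a shuffled profile lies at least as close to the template, and then applies the Benjamini-Hochberg method as one standard FDR correction across samples. All names are hypothetical.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity; raises on zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


def permutation_p_value(sample, template, n_perm=1000, seed=0):
    """Share of gene-shuffled profiles at least as close to the template."""
    rng = random.Random(seed)
    observed = cosine_distance(sample, template)
    shuffled = list(sample)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the gene-to-value pairing
        if cosine_distance(shuffled, template) <= observed:
            hits += 1
    # Adding 1 to numerator and denominator keeps the estimate above 0.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A sample whose high values fall exactly on the template genes gets a small p-value, because only rare shuffles reproduce that alignment; a sample whose high values avoid the template genes gets a p-value near 1.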
In some embodiments, CMS typing may include one or more of the following: CMS1-inflammatory type (CMS1-Inflammatory), CMS2-transit-amplifying type (CMS2-Transit Amplifying), CMS2-enterocyte type (CMS2-Enterocyte), CMS3-goblet-like type (CMS3-Goblet like), and CMS4-stem-like type (CMS4-Stem like).
In a specific embodiment, the IFIT3 gene has a nucleotide structure as shown in ENSG00000119917, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CXCL13 gene has a nucleotide structure as shown in ENSG00000156234, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the STAT1 gene has a nucleotide structure as shown in ENSG00000115415, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CXCL9 gene has a nucleotide structure as shown in ENSG00000138755, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CA4 gene has a nucleotide structure as shown in ENSG00000167434, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the AQP8 gene has a nucleotide structure as shown in ENSG00000103375, or a nucleotide sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.
In a specific embodiment, the SLC4A4 gene has a nucleotide structure as shown in ENSG00000080493, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the EREG gene has a nucleotide structure as shown in ENSG00000124882, or a nucleotide sequence that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the AREG gene has a nucleotide structure as shown in ENSG00000109321, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the REG4 gene has a nucleotide structure as shown in ENSG00000134193, or a nucleotide sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.
In a specific embodiment, the SPINK4 gene has a nucleotide structure as shown in ENSG00000122711, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the SFRP2 gene has a nucleotide structure as shown in ENSG00000145423, or a nucleotide sequence that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the ZEB2 gene has a nucleotide structure as shown in ENSG00000169554, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the SFRP4 gene has a nucleotide structure as shown in ENSG00000106483, or a nucleotide sequence that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
Without intending to be bound by a particular theory, it was found that, within the CMS14 marker set, IFIT3, CXCL13, STAT1, and CXCL9 help to specifically distinguish the CMS1-inflammatory type; CA4, AQP8, and SLC4A4 help to specifically distinguish the CMS2-enterocyte type; EREG and AREG help to specifically distinguish the CMS2-transit-amplifying type; SPINK4 and REG4 help to specifically distinguish the CMS3-goblet-like type; and SFRP2, ZEB2, and SFRP4 help to specifically distinguish the CMS4-stem-like type.
In some embodiments, the marker set for assessing the prognosis of intestinal cancer in a subject may further comprise, in addition to the CMS14 marker set, the following 6 gene markers: CA1, CLCA4, MS4A12, CLDN8, MUC2, and ZEB1. The resulting marker set with 20 gene markers may also be equivalently referred to herein as "CMS20" or "CMS20 marker set".
In some embodiments, the marker panel for assessing the prognosis of bowel cancer in a subject has the following 20 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2, and SFRP4.
In a specific embodiment, the CA1 gene has a nucleotide structure as shown in ENSG00000133742, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CLCA4 gene has a nucleotide structure as shown in ENSG00000016602, or a nucleotide sequence that is at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the MS4A12 gene has a nucleotide structure as shown in ENSG00000071203, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CLDN8 gene has a nucleotide structure as shown in ENSG00000156284, or a nucleotide sequence having at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity thereto.
In a specific embodiment, the MUC2 gene has a nucleotide structure as shown in ENSG00000198788, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the ZEB1 gene has a nucleotide structure as shown in ENSG00000148516, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
Without intending to be bound by a particular theory, it was found that, within the CMS20 marker set, IFIT3, CXCL13, STAT1 and CXCL9 help to specifically distinguish the CMS1-inflammatory type; CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8 and SLC4A4 help to specifically distinguish the CMS2-enterocyte type; EREG and AREG help to specifically distinguish the CMS2-transit-amplifying type; SPINK4, REG4 and MUC2 help to specifically distinguish the CMS3-goblet-like type; and SFRP2, ZEB1, ZEB2 and SFRP4 help to specifically distinguish the CMS4-stem-like type.
In some embodiments, the marker set for assessing the prognosis of intestinal cancer in a subject may comprise, in addition to the CMS20 marker set, the following 20 gene markers: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1. The resulting marker set of 40 gene markers is also referred to herein as "CMS40" or the "CMS40 marker set".
In some embodiments, the marker set for assessing the prognosis of intestinal cancer in a subject has the following 40 gene markers: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1.
In a specific embodiment, the CXCL10 gene has the nucleotide sequence shown in ENSG00000169245, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the AIM2 gene has the nucleotide sequence shown in ENSG00000163568, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the GBP5 gene has the nucleotide sequence shown in ENSG00000154451, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CXCL11 gene has the nucleotide sequence shown in ENSG00000169248, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the KRT20 gene has the nucleotide sequence shown in ENSG00000171431, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the SLC26A3 gene has the nucleotide sequence shown in ENSG00000091138, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CA2 gene has the nucleotide sequence shown in ENSG00000104267, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the ASCL2 gene has the nucleotide sequence shown in ENSG00000183734, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the VAV3 gene has the nucleotide sequence shown in ENSG00000134215, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CELP gene has the nucleotide sequence shown in ENSG00000170827, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the RNF43 gene has the nucleotide sequence shown in ENSG00000108375, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the MLPH gene has the nucleotide sequence shown in ENSG00000115648, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the TFF3 gene has the nucleotide sequence shown in ENSG00000160180, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the AQP3 gene has the nucleotide sequence shown in ENSG00000165272, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the COL3A1 gene has the nucleotide sequence shown in ENSG00000168542, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the SNAI2 gene has the nucleotide sequence shown in ENSG00000019549, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the CCDC80 gene has the nucleotide sequence shown in ENSG00000091986, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the AEBP1 gene has the nucleotide sequence shown in ENSG00000106624, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the TIMP2 gene has the nucleotide sequence shown in ENSG00000035862, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
In a specific embodiment, the TWIST1 gene has the nucleotide sequence shown in ENSG00000122691, or a nucleotide sequence at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical thereto.
Without intending to be bound by a particular theory, it was found that, within the CMS40 marker set, CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11 and CXCL9 help to specifically distinguish the CMS1-inflammatory type; CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4 and CA2 help to specifically distinguish the CMS2-enterocyte type; ASCL2, VAV3, CELP, EREG, RNF43 and AREG help to specifically distinguish the CMS2-transit-amplifying type; SPINK4, REG4, MLPH, TFF3, MUC2 and AQP3 help to specifically distinguish the CMS3-goblet-like type; and COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1 help to specifically distinguish the CMS4-stem-like type.
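As a consistency check, the nesting of the marker sets described above (CMS14 within CMS20 within CMS40) and their subtype assignments can be written down as a small lookup table. The Python sketch below transcribes the gene lists from this section. The dictionary keys and the helper names `restrict` and `flatten` are illustrative only. It also assumes that each CMS14 gene keeps the subtype it is assigned in the CMS20 set, because subtype assignments for CMS14 are not stated separately.

```python
# Subtype assignments of the CMS40 marker set, transcribed from the text.
CMS40 = {
    "CMS1_Inflammatory": ["CXCL10", "AIM2", "IFIT3", "CXCL13", "STAT1", "GBP5", "CXCL11", "CXCL9"],
    "CMS2_Enterocyte": ["CA4", "CA1", "CLCA4", "KRT20", "MS4A12", "AQP8", "CLDN8",
                        "SLC26A3", "SLC4A4", "CA2"],
    "CMS2_Transit_Amplifying": ["ASCL2", "VAV3", "CELP", "EREG", "RNF43", "AREG"],
    "CMS3_Goblet_like": ["SPINK4", "REG4", "MLPH", "TFF3", "MUC2", "AQP3"],
    "CMS4_Stem_like": ["COL3A1", "SFRP2", "ZEB1", "SNAI2", "ZEB2", "SFRP4",
                       "CCDC80", "AEBP1", "TIMP2", "TWIST1"],
}

# The 20 genes that CMS40 adds to CMS20, and the 6 genes that CMS20 adds to CMS14.
CMS40_ADDED = {"CXCL10", "AIM2", "GBP5", "CXCL11", "KRT20", "SLC26A3", "CA2", "ASCL2",
               "VAV3", "CELP", "RNF43", "MLPH", "TFF3", "AQP3", "COL3A1", "SNAI2",
               "CCDC80", "AEBP1", "TIMP2", "TWIST1"}
CMS20_ADDED = {"CA1", "CLCA4", "MS4A12", "CLDN8", "MUC2", "ZEB1"}


def restrict(panel, drop):
    """Return a copy of `panel` with the genes in `drop` removed from every subtype."""
    return {subtype: [g for g in genes if g not in drop] for subtype, genes in panel.items()}


def flatten(panel):
    """All genes of a panel as one list, in subtype order."""
    return [g for genes in panel.values() for g in genes]


# Derive the smaller panels by removal instead of retyping them.
CMS20 = restrict(CMS40, CMS40_ADDED)
CMS14 = restrict(CMS20, CMS20_ADDED)
```

Deriving CMS20 and CMS14 by removal gives 20 and 14 genes respectively, and the derived CMS14 genes match the 14 markers listed for the CMS14 marker set. A check of this kind would also catch a list in which a gene has been repeated or swapped for another.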
B. Products for assessing prognosis of intestinal cancer
In another aspect, provided herein is a product or set of products for assessing the prognosis of a bowel cancer in a subject, comprising reagents for detecting each marker of a set of markers from a biological sample of the subject, the set of markers having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4 (i.e., "CMS14 marker panel").
The term "set of products" as used herein means a combination of two or more products, e.g., two, three, four or more products. The reagents for the gene markers of a marker set described herein may be distributed across different products of the set, and the products of the set can be used together to assess the prognosis of intestinal cancer in a subject.
In some embodiments, the marker set may comprise, in addition to the CMS14 marker set, the following 6 gene markers: CA1, CLCA4, MS4A12, CLDN8, MUC2 and ZEB1 (i.e., forming the "CMS20 marker set"). In some such embodiments, the marker set has the following 20 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2 and SFRP4.
In some embodiments, the marker set may comprise, in addition to the CMS20 marker set, the following 20 gene markers: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1 (i.e., forming the "CMS40 marker set"). In some such embodiments, the marker set has the following 40 gene markers: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1.
In some embodiments, the product may be a reagent product or a kit. In some embodiments, the set of products may be a combination of reagent products and/or kits.
In some embodiments, the reagents for detecting each marker of the marker set in the biological sample from the subject may be packaged in a container included in the product. In some embodiments, the container is a sealed container (e.g., a capped, sealed container).
In some embodiments, the reagent for detecting each marker of a set of markers in a biological sample from a subject is a reagent that facilitates detection of expression of each gene marker in the set of markers in the biological sample.
Herein, "expression" includes the production of mRNA from a gene or gene portion, the production of a protein encoded by that RNA or gene portion, and the presence of a detectable substance associated with expression. For example, the binding of a binding ligand (e.g., an antibody) to a gene or other oligonucleotide, a protein or a protein fragment, and the visualization of the binding ligand (e.g., through a chromogenic moiety), fall within the scope of the term "expression"; likewise, an increase in the half-tone density of a band on an immunoblot such as a Western blot falls within the term "expression" as applied to biological molecules.
In some embodiments, the reagent is capable of detecting the mRNA level of each marker. Such reagents are well known in the art and include, but are not limited to, nucleic acid probes that specifically bind to a target sequence, primers that amplify a target sequence, non-specific fluorescent dyes (e.g., SYBR Green I), or combinations thereof.
In some embodiments, the nucleic acid probe may be a singly labeled probe, such as a probe labeled with a radionuclide (e.g., 32P, 3H or 35S), biotin, horseradish peroxidase, digoxigenin or a fluorophore (e.g., FITC, FAM, TET, HEX, TAMRA, Cy3 or Cy5); the nucleic acid probe may also be a doubly labeled probe, such as a TaqMan probe, a molecular beacon, a displacement probe, a Scorpion primer-probe, a QUAL probe or a FRET probe.
In some embodiments, the reagent is capable of detecting the protein level of each marker. In some embodiments, the reagents for detecting the protein level of the marker comprise the reagents required for an immunoassay selected from ELISA, ELISpot, Western blot and surface plasmon resonance. Reagents required for immunoassays are well known in the art and include, but are not limited to, antibodies and targeting polypeptides capable of specifically binding to the protein encoded by at least one gene marker of the marker set.
In some embodiments, the reagent carries a detectable label, such as an enzyme (e.g., horseradish peroxidase or alkaline phosphatase), a radionuclide (e.g., 3H, 125I, 35S, 14C or 32P), a fluorescent dye (e.g., FITC, TRITC, PE, Texas Red, quantum dots, Cy7 or Alexa Fluor 750), an acridinium ester compound, magnetic beads, colloidal gold, or colored glass or plastic (e.g., polystyrene, polypropylene or latex) beads, or biotin for binding to avidin (e.g., streptavidin) modified with one of the labels described above.
In some embodiments, the product may further comprise a reagent for pre-treating the sample. In some embodiments, reagents for pre-treating a sample include, but are not limited to, the following: a diluent (e.g., phosphate buffer or physiological saline) for diluting the sample; anticoagulants for preventing blood coagulation (e.g., heparin).
In some embodiments, the product further comprises equipment (e.g., tools and/or instruments) for detecting the gene expression levels of the subject.
In some embodiments, the product further comprises reagents and/or equipment (e.g., tools and/or instruments) for detecting other disease markers.
C. System for assessing the prognosis of intestinal cancer
In another aspect, provided herein is a system for assessing the prognosis of bowel cancer in a subject, comprising a memory and one or more processors;
wherein the memory comprises:
gene expression data for each marker of a marker set in a biological sample from the subject,
expression template data for the CMS signature gene sets, and
one or more processor-executable instructions;
wherein the marker set has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2 and SFRP4 (i.e., the "CMS14 marker set"); and
wherein the one or more processor-executable instructions are configured to:
(a) obtain the gene expression data for the CMS14 marker set from the biological sample of the subject;
(b) calculate the cosine distance between the gene expression data of (a) and each CMS signature gene set expression template;
(c) determine the CMS type based on the cosine distances calculated in (b); and
(d) determine the prognosis of intestinal cancer in the subject based on the CMS type obtained in (c).
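Steps (a) to (c) follow the nearest template prediction scheme described in Example 1. The Python sketch below is one minimal reading of those steps, under two assumptions that this section does not specify. First, each CMS template is taken to be a binary vector over the marker genes (1 for genes assigned to that subtype, 0 otherwise). Second, the expression values passed in are taken to be already normalized and centred. The function names are hypothetical. Step (d) is not sketched, because the mapping from CMS type to prognosis is not given here.

```python
import math


def cosine_distance(x, y):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)


def build_templates(panel):
    """Hypothetical binary templates: 1.0 where a gene marks the subtype, else 0.0."""
    genes = [g for subtype_genes in panel.values() for g in subtype_genes]
    templates = {
        subtype: [1.0 if g in subtype_genes else 0.0 for g in genes]
        for subtype, subtype_genes in panel.items()
    }
    return genes, templates


def nearest_template(profile, panel):
    """Steps (a)-(c): return the closest subtype and the distance to every template.

    profile maps gene symbol -> normalized, centred expression value.
    """
    genes, templates = build_templates(panel)
    x = [profile[g] for g in genes]                        # (a) marker-set expression vector
    distances = {s: cosine_distance(x, t) for s, t in templates.items()}  # (b)
    predicted = min(distances, key=distances.get)          # (c) smallest distance wins
    return predicted, distances


toy_panel = {"A": ["G1", "G2"], "B": ["G3", "G4"]}  # hypothetical two-type panel
predicted, distances = nearest_template({"G1": 1.2, "G2": 0.8, "G3": -0.5, "G4": -1.0}, toy_panel)
print(predicted)  # prints A
```

With the real panels, `panel` would map each CMS type to its marker genes (for example, the CMS14 genes grouped by subtype). Binary templates are the simplest encoding compatible with the text. A stored template of per-gene reference values would plug into `cosine_distance` in the same way.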
In some embodiments, the marker set may comprise, in addition to the CMS14 marker set, the following 6 gene markers: CA1, CLCA4, MS4A12, CLDN8, MUC2 and ZEB1 (i.e., forming the "CMS20 marker set"). In some such embodiments, the marker set has the following 20 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2 and SFRP4.
In some embodiments, the marker set may comprise, in addition to the CMS20 marker set, the following 20 gene markers: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1 (i.e., forming the "CMS40 marker set"). In some such embodiments, the marker set has the following 40 gene markers: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1.
In some embodiments, the intestinal cancer is colorectal cancer. In some embodiments, the intestinal cancer is stage II/III colorectal cancer. In some embodiments, the intestinal cancer may be selected from rectal cancer, left-sided colon cancer and right-sided colon cancer.
In some embodiments, the gene expression data for each marker of the marker set in the biological sample from the subject are derived from RNA sequencing data of the sample tissue.
Sequencing may include any suitable sequencing technique known to those skilled in the art. In some embodiments, sequencing comprises high throughput sequencing.
In some embodiments, the gene expression data for each marker of the marker set in the biological sample from the subject are obtained by normalizing RNA sequencing data of the sample tissue.
Normalization may include any suitable normalization method known to those skilled in the art. In some embodiments, the normalization comprises quantile normalization. In some embodiments, quantile normalization comprises: calculating the log2 values of the raw sequenced molecule counts for each sample; sorting the log2 values within each sample; calculating, for each rank position, the arithmetic mean of the log2 values of all samples at that rank; and replacing each log2 value with the mean for its rank, to form a normalized, corrected molecule count matrix.
In some embodiments, the CMS types may include one or more of the following: CMS1-inflammatory, CMS2-transit-amplifying, CMS2-enterocyte, CMS3-goblet-like and CMS4-stem-like.
In some embodiments, the system comprises the following modules:
Sequencing library construction module: constructs a sequencing library from sample RNA;
Quantification and sequencing module: quantifies and sequences the sequencing library;
Data normalization module: normalizes the quantification and sequencing results; and
CMS molecular typing module: performs CMS molecular typing on the normalized data.
D. Computer readable medium
In another aspect, provided herein is a computer-readable medium comprising:
gene expression level data for each marker of a marker set in a biological sample from a subject, wherein the marker set has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2 and SFRP4 (i.e., the "CMS14 marker set");
expression template data for the CMS signature gene sets; and
instructions for performing a method comprising:
(a) obtaining the gene expression level data for the marker set in the biological sample from the subject;
(b) calculating the cosine distance between the gene expression level data of (a) and each CMS signature gene set expression template;
(c) determining the CMS type based on the cosine distances calculated in (b); and
(d) determining the prognosis of intestinal cancer in the subject based on the CMS type obtained in (c).
In some embodiments, the marker set may comprise, in addition to the CMS14 marker set, the following 6 gene markers: CA1, CLCA4, MS4A12, CLDN8, MUC2 and ZEB1 (i.e., forming the "CMS20 marker set"). In some such embodiments, the marker set has the following 20 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2 and SFRP4.
In some embodiments, the marker set may comprise, in addition to the CMS20 marker set, the following 20 gene markers: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1 (i.e., forming the "CMS40 marker set"). In some such embodiments, the marker set has the following 40 gene markers: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1.
In some embodiments, the intestinal cancer is colorectal cancer. In some embodiments, the intestinal cancer is stage II/III colorectal cancer. In some embodiments, the intestinal cancer may be selected from rectal cancer, left-sided colon cancer and right-sided colon cancer.
In some embodiments, the gene expression data for each marker of the marker set in the biological sample from the subject are derived from RNA sequencing data of the sample tissue.
Sequencing may include any suitable sequencing technique known to those skilled in the art. In some embodiments, sequencing comprises high throughput sequencing.
In some embodiments, the gene expression data for each marker of the marker set in the biological sample from the subject are obtained by normalizing RNA sequencing data of the sample tissue.
Normalization may include any suitable normalization method known to those skilled in the art. In some embodiments, the normalization comprises quantile normalization. In some embodiments, quantile normalization comprises: calculating the log2 values of the raw sequenced molecule counts for each sample; sorting the log2 values within each sample; calculating, for each rank position, the arithmetic mean of the log2 values of all samples at that rank; and replacing each log2 value with the mean for its rank, to form a normalized, corrected molecule count matrix.
In some embodiments, the CMS types may include one or more of the following: CMS1-inflammatory, CMS2-transit-amplifying, CMS2-enterocyte, CMS3-goblet-like and CMS4-stem-like.
In another aspect, provided herein is an electronic device loaded with the computer-readable medium.
E. Uses
In another aspect, provided herein is the use of a reagent for detecting each marker of a marker panel of a biological sample from a subject in the manufacture of a product or product panel for assessing the prognosis of bowel cancer in a subject, wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4 (i.e., "CMS14 marker panel").
In another aspect, provided herein is a use of a product or set of products comprising reagents for detecting each marker of a set of markers from a biological sample of a subject for assessing the prognosis of bowel cancer in a subject, wherein the set of markers has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4 (i.e., "CMS14 marker panel").
In some embodiments, the marker set may comprise, in addition to the CMS14 marker set, the following 6 gene markers: CA1, CLCA4, MS4A12, CLDN8, MUC2 and ZEB1 (i.e., forming the "CMS20 marker set"). In some such embodiments, the marker set has the following 20 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2 and SFRP4.
In some embodiments, the marker set may comprise, in addition to the CMS20 marker set, the following 20 gene markers: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1 (i.e., forming the "CMS40 marker set"). In some such embodiments, the marker set has the following 40 gene markers: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1.
Through long-term, in-depth research involving extensive screening and analysis, the inventors determined a list of genes related to CMS typing. They then performed differential expression analysis on the expression data to obtain the differentially expressed genes of each CMS type, selected the overexpressed genes of each CMS type, and refined them to obtain the specific marker sets described herein. The inventors surprisingly found that the marker sets described herein achieve better CMS typing with fewer markers and can be used to accurately assess the prognosis of intestinal cancer. Accordingly, a low-cost, efficient and high-quality means of assessing the prognosis of intestinal cancer is provided herein.
Examples
The invention is further elucidated below in connection with specific exemplary embodiments. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. Appropriate modifications and variations of the invention may be made by those skilled in the art, and are within the scope of the invention.
Example 1: Screening of prognostic markers for intestinal cancer
This study used 1,116 colorectal cancer patient samples from 3 hospitals in Shanghai to screen for and analyze prognostic markers. FIG. 1 shows the general workflow for screening prognostic markers for intestinal cancer, which comprises the following steps:
(I) Determining a list of genes associated with CMS typing
1) Data conversion and normalization: RNA sequencing data from fresh tissue and FFPE tissue were preprocessed by quantile normalization, as follows. Fresh and FFPE tissues differ technically in sample preparation and sequencing, so the distributions of their raw sequencing molecule counts also differ markedly. To eliminate technical bias introduced by sample source (e.g., fresh versus FFPE tissue), data from samples of different sources must be normalized to make them comparable. First, for each sample, the log2 value of the raw sequencing molecule count of each gene is calculated, which makes the value distribution more symmetric. Because zero counts occur, a small value of 0.25 is added before taking the logarithm so that zero counts do not become NA (missing) values. The log2 values within each sample are then sorted, and the mean of the log2 values of all samples at each rank position is calculated. For each gene in each sample, the original log2 value is replaced by the mean for its rank. Samples from different sources are thus processed by the same normalization method, and their resulting values follow the same distribution. This removes technical bias due to sample source, so that differential analyses between groups performed on the normalized data yield more reliable conclusions.
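The normalization steps above can be sketched in plain Python as follows. This is an illustrative reading, not the authors' code. It assumes that the 0.25 offset is added to every raw count, although the text states only that the offset keeps zero counts from becoming missing values. Ties within a sample are broken by gene order rather than averaged.

```python
import math


def quantile_normalize(counts, offset=0.25):
    """Quantile-normalize log2 counts so that every sample shares one distribution.

    counts: list of samples; each sample is a list of raw molecule counts over the
    same genes in the same order. Returns log2-scale normalized values.
    """
    n_genes = len(counts[0])
    if any(len(sample) != n_genes for sample in counts):
        raise ValueError("all samples must cover the same genes")
    # log2 transform; the offset keeps zero counts finite (assumed added to all counts).
    logs = [[math.log2(c + offset) for c in sample] for sample in counts]
    # Mean log2 value at each rank position across samples.
    sorted_logs = [sorted(sample) for sample in logs]
    rank_means = [sum(s[k] for s in sorted_logs) / len(sorted_logs) for k in range(n_genes)]
    normalized = []
    for sample in logs:
        # Gene indices ordered by ascending value; ties keep gene order (stable sort).
        order = sorted(range(n_genes), key=lambda i: sample[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two samples x three genes; counts chosen so that count + 0.25 is a power of two.
print(quantile_normalize([[0, 3.75, 7.75], [15.75, 0, 1.75]]))
# prints [[-2.0, 1.5, 3.5], [3.5, -2.0, 1.5]]
```

After normalization, both toy samples contain exactly the same set of values (-2.0, 1.5, 3.5), each assigned to genes according to that sample's own ranking. This is the "same distribution" property described above.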
2) Construction of CMS templates: The CMS typing model uses the expression changes (e.g., up- or down-regulation) of the signature gene set of each CMS type, which reflect gene pathway activity, signaling and cellular biological processes, to determine whether a test sample exhibits the characteristic pattern of a particular CMS type. CMS typing herein is based on the 786 genes reported by Sadanandam et al. (Sadanandam, A. et al. (2013) Nat Med 19(5): 619-625) and their related information (including updated information for these genes in databases such as NCBI), together with subsequently discovered related gene sets. From these, gene sequencing panels were comprehensively screened and assembled to train the typing model. The typing algorithm uses Nearest Template Prediction (NTP) to calculate, for each sample, the cosine distance between the distribution of normalized molecule counts of the signature gene set and each CMS signature gene set expression template.
The cosine distance is used as the feature distance d, which is the default feature distance of the nearest template prediction model. The smallest d indicates that the sample is closest to the signature gene set expression template of that CMS type, i.e., the sample most likely belongs to that type. The statistical significance of each typing call (the typing P value) is assessed by a random permutation test. Signature-sized gene sets are drawn at random (1000 times by default) to generate a random distribution of feature distances. The distance between the test sample and the typing template is compared against this random distribution, and the resulting P values are corrected for the false discovery rate (FDR). A smaller P value indicates stronger statistical significance of the shortest cosine feature distance and thus a more reliable predicted CMS type (the significance threshold is typically P < 0.05).
The marker genes associated with CMS typing are predefined. Without loss of generality, assume that group A genes (nA genes) are up-regulated in CMS type A but unchanged or down-regulated in CMS type B. Likewise, group B genes (nB genes) are up-regulated in CMS type B but unchanged or down-regulated in CMS type A. Together, the group A and group B genes form the signature gene set templates A and B, which represent the gene expression patterns of CMS types A and B. The normalized values of the nA+nB signature genes are extracted from the N genes of the test sample and compared with template A and template B, respectively, to calculate the cosine feature distance to each template. The type of the closest template becomes the predicted CMS type. To assess the statistical significance of the feature distance, nA+nB genes are drawn at random from the N genes, repeated 1000 times, to produce a null distribution of the feature distance d. The corrected P value is then calculated by comparing the feature distance of the test sample against this null distribution. In the heatmap, red and blue represent up- and down-regulated gene expression, respectively. The CMS types of all samples are classified into the following five types: CMS1-inflammatory, CMS2-enterocyte, CMS2-transit-amplifying, CMS3-goblet-like and CMS4-stem-like.
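The significance test described above can be sketched as follows. The sketch makes several assumptions that the text does not fix:

- The null distance for each random gene draw is measured against the template of the predicted type.
- Random draws are made without replacement.
- The P value uses the common (hits + 1) / (permutations + 1) estimator.
- The FDR correction uses the Benjamini-Hochberg procedure, because the text does not name the procedure it uses.

The function names are hypothetical.

```python
import math
import random


def cosine_distance(x, y):
    """1 - cosine similarity; a zero-norm vector is treated as uninformative (distance 1.0)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def permutation_p_value(all_values, signature_values, template, n_perm=1000, seed=0):
    """Empirical P value of the observed template distance for one sample.

    all_values: normalized values of all N genes in the sample.
    signature_values: values of the nA + nB signature genes, in template order.
    template: template vector of the predicted CMS type (same length as signature_values).
    """
    if len(signature_values) != len(template):
        raise ValueError("signature_values and template must have the same length")
    rng = random.Random(seed)
    observed = cosine_distance(signature_values, template)
    hits = 0
    for _ in range(n_perm):
        draw = rng.sample(all_values, len(template))  # random genes, without replacement
        if cosine_distance(draw, template) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In a cohort, one P value would be computed per sample with `permutation_p_value`, and the list would then be passed to `benjamini_hochberg`. A call is considered reliable when its adjusted value falls below the chosen threshold (typically 0.05).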
(II) Differential expression analysis of the expression data to obtain the differentially expressed genes of each CMS type
Using 838 samples from a tertiary hospital in Shanghai as the training data set for screening the prognostic marker set, differentially expressed gene (DEG) analysis was performed with the R package limma using the eBayes algorithm. The differentially expressed genes of each CMS type were obtained through the following steps: reading the expression data, setting up the expression data matrix, transforming the data with the voom function, fitting linear models with lmFit, moderating variances and P values with eBayes, extracting results with topTable, and drawing mean-difference (MD) plots and volcano plots (e.g., FIG. 2).
(III) Selection of overexpressed genes for each CMS type to obtain the CMS40 marker set
Selection of overexpressed genes for each CMS type: genes with a fold change greater than 1.3 were first selected from the differentially expressed genes of each CMS type, with at most 10 genes selected per CMS type. Gene Ontology annotation analysis was then used to finalize 40 genes representing the 5 CMS types, named the "CMS40" marker set. As shown in Table 1, the CMS40 marker set comprises: CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11, CXCL9, CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4, CA2, ASCL2, VAV3, CELP, EREG, RNF43, AREG, SPINK4, REG4, MLPH, TFF3, MUC2, AQP3, COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1. Among them, CXCL10, AIM2, IFIT3, CXCL13, STAT1, GBP5, CXCL11 and CXCL9 specifically distinguish the CMS1-inflammatory (CMS1_Inflammatory) type; CA4, CA1, CLCA4, KRT20, MS4A12, AQP8, CLDN8, SLC26A3, SLC4A4 and CA2 specifically distinguish the CMS2-enterocyte (CMS2_Enterocyte) type; ASCL2, VAV3, CELP, EREG, RNF43 and AREG specifically distinguish the CMS2-transit-amplifying (CMS2_Transit_Amplifying) type; SPINK4, REG4, MLPH, TFF3, MUC2 and AQP3 specifically distinguish the CMS3-goblet-like (CMS3_Goblet_like) type; and COL3A1, SFRP2, ZEB1, SNAI2, ZEB2, SFRP4, CCDC80, AEBP1, TIMP2 and TWIST1 specifically distinguish the CMS4-stem-like (CMS4_Stem_like) type.
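The first, fold-change-based filtering step can be sketched as below. The subsequent Gene Ontology curation is a knowledge-based step and is not modeled. The text does not say how the at most 10 genes per type are chosen when more genes pass the threshold, so ranking by fold change is an assumption. The input layout, a list of (gene, fold change) pairs per CMS type, is also hypothetical.

```python
def select_overexpressed(deg_results, fc_threshold=1.3, max_per_type=10):
    """Keep genes with fold change > fc_threshold, at most max_per_type per CMS type.

    deg_results: {cms_type: [(gene, fold_change), ...]} from a differential-expression
    analysis (hypothetical layout). Genes are ranked by fold change (an assumption).
    """
    selected = {}
    for cms_type, rows in deg_results.items():
        passing = [(gene, fc) for gene, fc in rows if fc > fc_threshold]
        passing.sort(key=lambda row: row[1], reverse=True)  # largest fold change first
        selected[cms_type] = [gene for gene, _ in passing[:max_per_type]]
    return selected


# Fold changes of the CMS1 genes listed in Table 2, used purely as illustrative input.
cms1_rows = [("IFIT3", 1.933130837), ("CXCL13", 2.529919558),
             ("STAT1", 2.518828709), ("CXCL9", 2.398874858)]
print(select_overexpressed({"CMS1_Inflammatory": cms1_rows}, max_per_type=3))
# prints {'CMS1_Inflammatory': ['CXCL13', 'STAT1', 'CXCL9']}
```

The cap of 3 in the usage line only demonstrates truncation. With the default cap of 10, all four genes would be kept.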
Table 1 List of the 40 significantly differentially expressed genes with fold change greater than 1.3
(IV) Further validating the screened marker set by intersection to obtain the CMS20 marker set
We intersected the marker combination obtained by the above screening (40 candidate genes) with validated intestinal cancer molecular typing genes (38 genes serving as a positive-control marker set), and the intersection contained 20 gene markers. These 20 consensus genes are supported as colorectal cancer-associated markers by both algorithmic prediction and empirical validation, and are therefore regarded as more accurate and reliable than markers supported by either source alone. They were designated the high-confidence intestinal cancer molecular typing marker set, named the "CMS20" marker set. Specifically, the CMS20 marker set comprises: IFIT3, CXCL13, STAT1, CXCL9, CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8, SLC4A4, EREG, AREG, SPINK4, REG4, MUC2, SFRP2, ZEB1, ZEB2 and SFRP4. Of these, IFIT3, CXCL13, STAT1 and CXCL9 specifically distinguish the CMS1 inflammatory (CMS1_Inflammatory) type; CA4, CA1, CLCA4, MS4A12, AQP8, CLDN8 and SLC4A4 specifically distinguish the CMS2 enterocyte (CMS2_Enterocyte) type; EREG and AREG specifically distinguish the CMS2 transit-amplifying (CMS2_Transit.amplifying) type; SPINK4, REG4 and MUC2 specifically distinguish the CMS3 goblet-like (CMS3_Goblet.like) type; SFRP2, ZEB1, ZEB2 and SFRP4 specifically distinguish the CMS4 stem-like (CMS4_Stem.like) type.
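The intersection step amounts to keeping candidates that also appear in the validated list. A minimal sketch follows; the 38-gene positive-control list is not reproduced in this section, so the example uses placeholder names, and the case-insensitive matching (so that spellings such as "MS4a12" and "MS4A12" agree) is an assumption rather than part of the described method.

```python
from typing import Iterable, List


def consensus_markers(candidates: Iterable[str], validated: Iterable[str]) -> List[str]:
    """Genes found in both lists, in the order of the candidate list."""
    validated_set = {gene.upper() for gene in validated}
    return [gene for gene in candidates if gene.upper() in validated_set]


# Placeholder lists for illustration only
print(consensus_markers(["IFIT3", "CXCL10", "CA4", "MS4A12"], ["ca4", "IFIT3", "MS4a12", "GENE_X"]))
# ['IFIT3', 'CA4', 'MS4A12']
```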
Table 2 CMS20 marker set
CMS type | Gene | Entrez ID | Fold change | t | P value | adj.P.Val |
---|---|---|---|---|---|---|
CMS1_Inflammatory | IFIT3 | 3437 | 1.933130837 | 9.657039362 | 1.90086E-21 | 1.81612E-20 |
CMS1_Inflammatory | CXCL13 | 10563 | 2.529919558 | 14.07397641 | 2.38204E-42 | 3.77791E-40 |
CMS1_Inflammatory | STAT1 | 6772 | 2.518828709 | 13.92543994 | 1.51279E-41 | 1.49955E-39 |
CMS1_Inflammatory | CXCL9 | 4283 | 2.398874858 | 13.11158448 | 2.89751E-37 | 2.08884E-35 |
CMS2_Enterocyte | CA4 | 762 | 2.366377764 | 17.35527122 | 1.23907E-61 | 9.82581E-59 |
CMS2_Enterocyte | CA1 | 759 | 2.340091454 | 17.08073666 | 6.50317E-60 | 2.57851E-57 |
CMS2_Enterocyte | CLCA4 | 22802 | 2.249182095 | 16.21269278 | 1.34659E-54 | 3.55949E-52 |
CMS2_Enterocyte | MS4A12 | 54860 | 2.215540592 | 15.90077477 | 9.83181E-53 | 1.55933E-50 |
CMS2_Enterocyte | AQP8 | 343 | 2.197753239 | 15.6196028 | 4.47165E-51 | 5.91003E-49 |
CMS2_Enterocyte | CLDN8 | 9073 | 2.118084287 | 14.70039096 | 8.33649E-46 | 8.26355E-44 |
CMS2_Enterocyte | SLC4A4 | 8671 | 2.091970501 | 14.62326613 | 2.25184E-45 | 1.98412E-43 |
CMS2_Transit.amplifying | EREG | 2069 | 2.058542252 | 14.52587039 | 7.85432E-45 | 1.03808E-42 |
CMS2_Transit.amplifying | AREG | 374 | 2.004373762 | 13.90426091 | 1.96663E-41 | 1.94942E-39 |
CMS3_Goblet.like | SPINK4 | 27290 | 2.530018071 | 12.07638181 | 4.11256E-32 | 3.26126E-29 |
CMS3_Goblet.like | REG4 | 83998 | 2.50208751 | 11.93404367 | 1.97628E-31 | 7.83596E-29 |
CMS3_Goblet.like | MUC2 | 4583 | 2.110217066 | 9.715236529 | 1.11207E-21 | 2.20468E-19 |
CMS4_Stem.like | SFRP2 | 6423 | 1.649504053 | 9.299433542 | 4.81886E-20 | 4.19929E-19 |
CMS4_Stem.like | ZEB1 | 6935 | 1.570443115 | 8.203515337 | 4.96341E-16 | 3.0992E-15 |
CMS4_Stem.like | ZEB2 | 9839 | 1.587685106 | 8.452358663 | 6.65993E-17 | 4.51395E-16 |
CMS4_Stem.like | SFRP4 | 6424 | 2.205499829 | 15.19508982 | 1.29815E-48 | 1.71572E-46 |
(V) Using correlation analysis to find substitution relationships between prognostic markers
Correlation analysis was used to find substitution relationships among the 40 prognostic markers in CMS40 and thereby further refine the marker set: Pearson correlation coefficients between the markers were computed in R and plotted with the corrplot function as a correlation coefficient matrix (see Figure 3). The following conclusions can be drawn from the correlation matrix:
The correlation coefficient between CA1 and CA4 is 0.74, so CA1 and CA4 can substitute for each other;
The correlation coefficient between CA1 and CA2 is 0.69, so CA1 and CA2 can substitute for each other;
The correlation coefficient between CA1 and CLCA4 is 0.69, so CA1 and CLCA4 can substitute for each other;
The correlation coefficient between CA1 and MS4A12 is 0.75, so CA1 and MS4A12 can substitute for each other;
The correlation coefficient between CA1 and CLDN8 is 0.71, so CA1 and CLDN8 can substitute for each other;
The correlation coefficient between CA4 and MS4A12 is 0.69, so CA4 and MS4A12 can substitute for each other;
The correlation coefficient between CA4 and CLCA4 is 0.67, so CA4 and CLCA4 can substitute for each other;
The correlation coefficient between CA2 and MS4A12 is 0.70, so CA2 and MS4A12 can substitute for each other;
The correlation coefficient between CLCA4 and MS4A12 is 0.70, so CLCA4 and MS4A12 can substitute for each other;
The correlation coefficient between REG4 and SPINK4 is 0.69, so REG4 and SPINK4 can substitute for each other;
The correlation coefficient between REG4 and MUC2 is 0.67, so REG4 and MUC2 can substitute for each other;
The correlation coefficient between SPINK4 and MUC2 is 0.75, so SPINK4 and MUC2 can substitute for each other;
The correlation coefficient between EREG and AREG is 0.73, so EREG and AREG can substitute for each other;
The pairwise correlation coefficients among CXCL9, CXCL10 and CXCL11 are all at least 0.69, so the three can substitute for one another.
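The pairwise screen above can be sketched in plain Python: compute Pearson's r for every gene pair across samples and report the pairs at or above a cut-off. The 0.65 threshold is a hypothetical choice (the smallest coefficient reported above is 0.67), and the toy expression values are invented for illustration.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def substitutable_pairs(
    expr: Dict[str, List[float]], threshold: float = 0.65
) -> List[Tuple[str, str, float]]:
    """Gene pairs whose correlation across samples reaches the (hypothetical) threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r >= threshold:
            pairs.append((g1, g2, round(r, 2)))
    return pairs


# Invented toy data: gene -> expression across four samples
toy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [4, 1, 3, 2]}
print(substitutable_pairs(toy))
```

Only positively correlated pairs pass the cut-off, which matches the idea that one gene can stand in for another that rises and falls with it.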
(VI) Further refining the marker set based on the substitution relationships between markers to obtain the CMS14 marker set
Starting from CMS20, the marker set was further refined by combining the inter-gene correlation results with the differential expression results, giving a list of 14 genes named the "CMS14" marker set. Specifically, the CMS14 marker set comprises: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2 and SFRP4. Of these, IFIT3, CXCL13, STAT1 and CXCL9 specifically distinguish the CMS1 inflammatory (CMS1_Inflammatory) type; CA4, AQP8 and SLC4A4 specifically distinguish the CMS2 enterocyte (CMS2_Enterocyte) type; AREG and EREG specifically distinguish the CMS2 transit-amplifying (CMS2_Transit.amplifying) type; SPINK4 and REG4 specifically distinguish the CMS3 goblet-like (CMS3_Goblet.like) type; SFRP2, ZEB2 and SFRP4 specifically distinguish the CMS4 stem-like (CMS4_Stem.like) type.
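Because CMS20 was obtained by intersecting CMS40 with a validated list and CMS14 by pruning CMS20, the three panels should be nested. The sketch below uses the gene lists given in this section to check that nesting and to recover the genes that separate the panels; the helper name panel_difference is hypothetical.

```python
from typing import List

CMS40 = ["CXCL10", "AIM2", "IFIT3", "CXCL13", "STAT1", "GBP5", "CXCL11", "CXCL9",
         "CA4", "CA1", "CLCA4", "KRT20", "MS4A12", "AQP8", "CLDN8", "SLC26A3", "SLC4A4", "CA2",
         "ASCL2", "VAV3", "CELP", "EREG", "RNF43", "AREG",
         "SPINK4", "REG4", "MLPH", "TFF3", "MUC2", "AQP3",
         "COL3A1", "SFRP2", "ZEB1", "SNAI2", "ZEB2", "SFRP4", "CCDC80", "AEBP1", "TIMP2", "TWIST1"]
CMS20 = ["IFIT3", "CXCL13", "STAT1", "CXCL9", "CA4", "CA1", "CLCA4", "MS4A12", "AQP8", "CLDN8",
         "SLC4A4", "EREG", "AREG", "SPINK4", "REG4", "MUC2", "SFRP2", "ZEB1", "ZEB2", "SFRP4"]
CMS14 = ["IFIT3", "CXCL13", "STAT1", "CXCL9", "CA4", "AQP8", "SLC4A4",
         "AREG", "EREG", "SPINK4", "REG4", "SFRP2", "ZEB2", "SFRP4"]


def panel_difference(larger: List[str], smaller: List[str]) -> List[str]:
    """Genes in the larger panel but not the smaller, after checking that the panels are nested."""
    if not set(smaller) <= set(larger):
        raise ValueError("panels are not nested")
    return sorted(set(larger) - set(smaller))


print(panel_difference(CMS20, CMS14))  # ['CA1', 'CLCA4', 'CLDN8', 'MS4A12', 'MUC2', 'ZEB1']
```

The six genes returned for CMS20 versus CMS14 are those added in claim 6, and the 20 genes separating CMS40 from CMS20 are those added in claim 7.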
Table 3 List of CMS14 genes
(VII) Evaluating the CMS molecular typing ability of the marker sets with all 1,116 colorectal cancer samples (including 278 samples from two other hospitals)
Based on the data of the 1,116 samples, all samples were assigned CMS molecular subtypes with the CMS40, CMS20 and CMS14 marker sets, respectively. Relapse-free survival data, including relapse status and time, were also collected for the 1,116 samples.
Recurrence-free survival analysis was performed using the CMS molecular typing results obtained with the CMS40 marker set together with the recurrence status and time of all samples. The results are shown in Figure 4, where the abscissa is recurrence-free survival time in months and the ordinate is the recurrence-free survival rate. The light green curve represents CMS1_Inflammatory, the yellow curve represents CMS2_Enterocyte, and the blue curve represents CMS2_Transit.amplifying. The five curves lie close together while the survival rate is near 1 and diverge increasingly as the survival rate falls toward 0, indicating clear prognostic differences between the groups. The overall difference among the five curves was significant (p < 0.0001), suggesting that the CMS40 signature genes can serve as important biomarkers for accurately predicting the survival of colorectal cancer patients.
Recurrence-free survival analysis was performed using the CMS molecular typing results obtained with the CMS20 marker set together with the recurrence status and time of all samples. The results are shown in Figure 5, where the abscissa is recurrence-free survival time in months and the ordinate is the recurrence-free survival rate. The light green curve represents CMS1_Inflammatory, the yellow curve represents CMS2_Enterocyte, and the blue curve represents CMS2_Transit.amplifying. The five curves lie close together while the survival rate is near 1 and diverge increasingly as the survival rate falls toward 0, indicating clear prognostic differences between the groups. The overall difference among the five curves was significant (p < 0.017), indicating that the CMS20 signature genes can serve as important biomarkers for accurately predicting the survival of colorectal cancer patients.
Disease-free survival analysis was performed using the CMS molecular typing results obtained with the CMS14 marker set together with the disease status and time of all samples. The results are shown in Figure 6, where the abscissa is disease-free survival time in months and the ordinate is the disease-free survival rate. The red curve represents CMS1_Inflammatory, the blue curve CMS2_Enterocyte, the green curve CMS2_Transit.amplifying, the orange curve CMS3_Goblet.like, and the purple curve CMS4_Stem.like. The five curves lie close together while the survival rate is near 1 and diverge increasingly as the survival rate falls toward 0, indicating clear prognostic differences between the groups. The overall difference among the five curves was significant (p = 0.0015), indicating that the CMS14 signature genes can serve as important biomarkers for accurately predicting the survival of colorectal cancer patients.
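The section does not name the estimator behind Figures 4 to 6; survival curves of this kind are commonly Kaplan-Meier estimates, so the following is a minimal sketch under that assumption. Each subject contributes a follow-up time in months and an event flag (1 = recurrence or disease event, 0 = censored); the data in the usage line are invented.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival) steps of the Kaplan-Meier estimate, starting at (0, 1)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations (events and censorings) that share this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone observed at t leaves the risk set afterwards
        at_risk -= removed
    return curve


steps = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

A subject censored at the same time as an event is still counted as at risk for that event, which is the usual convention.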
Example 2 Molecular typing to assess the prognosis of intestinal cancer
(I) Sequencing library construction
(1) Sample RNA extraction
RNA was extracted and purified from FFPE samples using an RNA extraction kit according to the manufacturer's instructions. The extracted RNA was precisely quantified (a Qubit fluorometer is recommended) and stored at -70 ℃.
(2) Sequencing library construction
1) Reverse transcription of RNA into cDNA: the RNA sample is converted into complementary DNA (cDNA) by a reverse transcriptase reaction.
2) Adding molecular tags: a unique molecular barcode is added to each cDNA molecule. The molecular barcode is a short sequence tag that marks each original template molecule during amplification, so that expression levels can later be counted accurately.
3) Purification: the barcoded cDNA product is purified from the reaction system by magnetic-bead or silica-column purification to remove impurities such as proteins.
4) First-round PCR: a first round of polymerase chain reaction (PCR) is performed, using gene-specific primers to amplify the marker genes and obtain sufficient template.
5) PCR product purification: the first-round PCR product is purified again so that impurities do not affect subsequent reactions; magnetic-bead or silica-column purification can be used.
6) Second-round adapter PCR: a second round of PCR adds sample multiplexing indexes and the sequencing platform's universal adapter sequences, which distinguish the samples and allow binding to the sequencing chip.
7) Sequencing library purification: a final magnetic-bead purification yields the final sequencing library.
8) Sequencing library quantification: the library is precisely quantified and the amount loaded onto the sequencer is controlled so that each sample reaches the required sequencing depth. Common quantification methods include fluorescence quantitative PCR and chip electrophoresis.
(II) Quantitative data acquisition
Detection technologies for acquiring gene expression data of the intestinal cancer prognostic marker combination include, but are not limited to, real-time fluorescence quantitative PCR (qPCR), gene chips, and high-throughput whole-transcriptome (or targeted transcriptome) sequencing. This example illustrates targeted gene sequencing.
The workflow comprises the following steps:
1. Select any one of the CMS14, CMS20 and CMS40 marker combinations as the sequencing target, according to the screening results for intestinal cancer prognostic markers.
2. Extract total RNA from tumor samples using an RNA extraction kit, and assess RNA quality and concentration.
3. Obtain a cDNA library of the target gene set by reverse transcription and PCR amplification, adding unique molecular barcodes during amplification for calculating expression levels.
4. Perform single-end or paired-end sequencing of each sample on a high-throughput sequencing platform such as Illumina.
5. Align the reads to a reference genome and obtain the expression matrix of each gene from the unique molecular tag counts.
6. Analyze the expression matrices of multiple samples together to build a combined prognostic model.
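Step 5 turns aligned reads into molecule counts by collapsing reads that carry the same barcode. A minimal sketch, assuming each aligned read has already been reduced to a hypothetical (sample, gene, UMI) tuple; it counts exact UMI matches only, whereas production pipelines usually also merge UMIs that differ by sequencing errors.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def umi_count_matrix(reads: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, int]]:
    """Collapse aligned reads into a gene -> sample -> molecule count matrix.

    Reads sharing the same sample, gene and UMI are treated as PCR copies of
    one original template molecule and are counted once.
    """
    molecules: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for sample, gene, umi in reads:
        molecules[(sample, gene)].add(umi)
    matrix: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (sample, gene), umis in molecules.items():
        matrix[gene][sample] = len(umis)
    return dict(matrix)


# Invented reads: two PCR copies of one CA4 molecule in S1 count as one
reads = [("S1", "CA4", "AAC"), ("S1", "CA4", "AAC"), ("S1", "CA4", "GTT"),
         ("S2", "CA4", "AAC"), ("S1", "REG4", "TTT")]
print(umi_count_matrix(reads))  # {'CA4': {'S1': 2, 'S2': 1}, 'REG4': {'S1': 1}}
```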
(III) Data normalization: to eliminate technical bias caused by sample source (e.g., fresh tissue or FFPE), data from samples of different sources must be normalized to make them comparable. First, the log2 value of the raw sequencing molecule count of each sample is calculated to make the value distribution more symmetric. Because zero counts occur, a small constant of 0.25 is added before taking the logarithm so that zeros do not become NA (missing) values. The log2 values within each sample are then sorted, and the mean log2 value across all samples at each rank position is calculated. Within each sample, each gene's original log2 value is replaced by the mean for the rank position it occupies. Samples from different sources are thus processed by the same normalization method, and the resulting values follow the same distribution. This removes technical bias due to sample source, so that differential analyses between groups based on the normalized data yield more reliable conclusions.
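The procedure above is quantile normalization on log2(count + 0.25) values. A minimal plain-Python sketch follows; the dictionary layout (sample mapped to counts, all samples listing genes in the same order) is an assumption, and ties are broken by gene order, a simplification the text does not specify.

```python
import math
from typing import Dict, List


def quantile_normalize_log2(counts: Dict[str, List[float]], offset: float = 0.25) -> Dict[str, List[float]]:
    """Quantile-normalize log2(count + offset) values across samples."""
    samples = list(counts)
    # log2 transform with a small offset so that zero counts stay finite
    logs = {s: [math.log2(c + offset) for c in counts[s]] for s in samples}
    n_genes = len(logs[samples[0]])
    # Reference distribution: mean of the k-th smallest value across samples
    sorted_cols = [sorted(logs[s]) for s in samples]
    reference = [sum(col[k] for col in sorted_cols) / len(samples) for k in range(n_genes)]
    normalized = {}
    for s in samples:
        # Rank genes within this sample, then substitute the reference value for each rank
        order = sorted(range(n_genes), key=lambda g: logs[s][g])
        values = [0.0] * n_genes
        for rank, g in enumerate(order):
            values[g] = reference[rank]
        normalized[s] = values
    return normalized


# Invented counts for three genes in two samples
result = quantile_normalize_log2({"S1": [0, 3.75, 15.75], "S2": [7.75, 0, 1.75]})
print(result)  # {'S1': [-2.0, 1.5, 3.5], 'S2': [3.5, -2.0, 1.5]}
```

After normalization both samples contain exactly the same set of values; only their assignment to genes differs, which is what makes the samples comparable.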
(IV) CMS molecular typing: based on the normalized expression matrix, the CMS14, CMS20 or CMS40 marker set is used to calculate the cosine similarity of each sample to the 5 characteristic expression patterns of CMS typing (CMS1 inflammatory, CMS2 enterocyte, CMS2 transit-amplifying, CMS3 goblet-like, CMS4 stem-like). Each sample is assigned the CMS subtype with the maximum cosine similarity.
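Nearest-template assignment by cosine similarity can be sketched as below. The template vectors are hypothetical (the actual CMS signature templates are not given in this section), the sample and templates must list genes in the same order, and because the text does not say whether expression values are centered before comparison, no centering is applied. The claims state the same rule as minimum cosine distance (1 minus cosine similarity), which selects the same subtype.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_cms(sample: List[float], templates: Dict[str, List[float]]) -> str:
    """Return the CMS subtype whose template is most cosine-similar to the sample."""
    return max(templates, key=lambda cms: cosine(sample, templates[cms]))


# Hypothetical three-gene templates for illustration only
templates = {
    "CMS1_Inflammatory": [1.0, 0.0, 0.0],
    "CMS2_Enterocyte": [0.0, 1.0, 0.0],
    "CMS4_Stem.like": [0.0, 0.0, 1.0],
}
print(assign_cms([0.2, 0.1, 0.9], templates))  # CMS4_Stem.like
```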
Example 3 Study of the correlation between CMS40 and survival of colon cancer patients
(I) Sample source: FFPE samples from 229 patients with stage III colon cancer at a Grade-A tertiary hospital in Shanghai.
(II) Clinical sample data: progression-free survival data were collected for the 229 samples.
(III) Obtaining gene expression data:
Sequencing library construction and quantitative data acquisition were performed as described in Example 2, yielding expression data for all 40 signature genes of the CMS40 marker set in the 229 clinical samples.
(IV) Data normalization: all data were first log2-transformed, and the expression data were then normalized by quantile normalization.
(V) CMS molecular typing: based on the normalized expression matrix, the CMS40 marker set was used to calculate the cosine similarity of each sample to the 5 characteristic expression patterns of CMS typing (CMS1 inflammatory, CMS2 enterocyte, CMS2 transit-amplifying, CMS3 goblet-like, CMS4 stem-like). Each sample was assigned the CMS subtype with the maximum cosine similarity.
(VI) Survival analysis:
All samples were divided into three groups according to the 5 CMS molecular subtypes:
① Low-risk group: samples of the CMS1 inflammatory (CMS1_Inflammatory) and CMS2 transit-amplifying (CMS2_Transit.amplifying) subtypes;
② Medium-risk group: samples of the CMS2 enterocyte (CMS2_Enterocyte) and CMS3 goblet-like (CMS3_Goblet.like) subtypes;
③ High-risk group: samples of the CMS4 stem-like (CMS4_Stem.like) subtype.
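The three-way grouping is a fixed lookup from subtype to risk level. A small sketch using the subtype labels of Table 2; the sample identifiers are hypothetical, and an unrecognized label raises KeyError rather than being silently dropped.

```python
from typing import Dict, List

# Subtype-to-risk mapping as listed in Example 3
RISK_GROUP = {
    "CMS1_Inflammatory": "low",
    "CMS2_Transit.amplifying": "low",
    "CMS2_Enterocyte": "medium",
    "CMS3_Goblet.like": "medium",
    "CMS4_Stem.like": "high",
}


def group_by_risk(assignments: Dict[str, str]) -> Dict[str, List[str]]:
    """Turn a sample -> CMS subtype mapping into risk group -> list of samples."""
    groups: Dict[str, List[str]] = {"low": [], "medium": [], "high": []}
    for sample, subtype in assignments.items():
        groups[RISK_GROUP[subtype]].append(sample)
    return groups


# Hypothetical patients
print(group_by_risk({"P1": "CMS4_Stem.like", "P2": "CMS1_Inflammatory", "P3": "CMS3_Goblet.like"}))
# {'low': ['P2'], 'medium': ['P3'], 'high': ['P1']}
```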
The results of the progression-free survival analysis are shown in Figure 7, where the abscissa is disease-free survival time in months and the ordinate is the disease-free survival rate. The red curve represents the low-risk group, consisting of samples of the CMS1_Inflammatory and CMS2_Transit.amplifying subtypes; the blue curve represents the medium-risk group, consisting of samples of the CMS2_Enterocyte and CMS3_Goblet.like subtypes; the green curve represents the high-risk group, consisting of samples of the single CMS4_Stem.like subtype. The three curves lie close together while the survival rate is near 1 and diverge increasingly as the survival rate falls toward 0, indicating clear prognostic differences between the groups. The difference between the survival curves of the low-risk group (CMS1_Inflammatory and CMS2_Transit.amplifying) and the medium-risk group (CMS2_Enterocyte and CMS3_Goblet.like) was significant (p = 0.0069), and the difference between the low-risk group and the high-risk group (CMS4_Stem.like) was highly significant (p < 0.0001), indicating that the CMS40 gene marker set can serve as an important biomarker set for accurately predicting the survival of colorectal cancer patients.
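The pairwise p values above compare two survival curves, but the section does not name the test. A common choice is the two-sample log-rank test, sketched here under that assumption in plain Python; the input format of (time, event flag) tuples and the toy groups in the usage line are invented.

```python
import math
from typing import List, Tuple


def logrank_p(group_a: List[Tuple[float, int]], group_b: List[Tuple[float, int]]) -> float:
    """Two-sample log-rank test p value (chi-square with 1 degree of freedom).

    Each entry is (follow-up time, event flag) with 1 = event and 0 = censored.
    """
    data = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in data if tt >= t)  # at risk just before t
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)
        d = sum(e for tt, e, _ in data if tt == t)  # events at t
        d_a = sum(e for tt, e, g in data if tt == t and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-A event count at time t
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    statistic = o_minus_e ** 2 / variance
    # Survival function of chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(statistic / 2))


early = [(t, 1) for t in [1, 2, 3, 4, 5]]
late = [(t, 1) for t in [10, 11, 12, 13, 14]]
p = logrank_p(early, late)
```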
The foregoing description is only of preferred embodiments of the invention and is not intended to limit the invention in any form or substance. It should be noted that those skilled in the art may make several modifications and additions without departing from the method of the invention, and such modifications and additions should also be considered within the scope of the invention. Equivalent embodiments of the invention will be apparent to those skilled in the art in light of the technical content disclosed above, without departing from the spirit and scope of the invention; likewise, any equivalent changes, modifications and evolutions of the above embodiments made according to the essential technology of the invention still fall within the scope of the technical solution of the invention.
In addition, this application uses specific terms to describe its embodiments. References to "one embodiment," "an embodiment" and/or "some embodiments" mean that a particular feature, structure or characteristic is associated with at least one embodiment of the application. It should therefore be emphasized and appreciated that two or more references to "an embodiment," "one embodiment" or "an alternative embodiment" at different places in this specification do not necessarily refer to the same embodiment. Furthermore, certain features, structures or characteristics of one or more embodiments of the application may be combined as appropriate.
In some embodiments, numbers describing quantities of components and attributes are used; it should be understood that such numbers used in describing the embodiments are, in some examples, qualified by the modifiers "about," "approximately," or "substantially." Unless otherwise stated, "about," "approximately," or "substantially" indicates that the stated number allows a 20% variation. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending on the properties required by individual embodiments. In some embodiments, numerical parameters should take the specified significant digits into account and be rounded in the usual way. Although the numerical ranges and parameters used to define the breadth of ranges in some embodiments are approximations, in specific embodiments such values are set as precisely as feasible.
Claims (10)
1. A product or set of products for assessing the prognosis of intestinal cancer in a subject, comprising reagents for detecting each marker of a set of markers from a biological sample of the subject, the set of markers having the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
2. A system for assessing a prognosis of bowel cancer in a subject comprising a memory and one or more processors;
wherein the memory comprises:
Expression data for each gene of a marker panel from a biological sample of the subject,
CMS typing signature gene-set expression template data, and
One or more processor-executable instructions;
Wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4; and
Wherein the one or more processor-executable instructions are configured to:
(a) Obtaining gene expression data for the marker set from a biological sample of a subject;
(b) Calculating cosine distances between the gene expression data in (a) and the CMS typing signature gene-set expression templates;
(c) Determining CMS typing based on the computed cosine distances of (b); and
(d) Determining a prognosis of the bowel cancer in the subject based on the CMS type obtained in (c).
3. A computer-readable medium, comprising:
Expression level data for each gene of a marker panel of a biological sample from a subject, wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2 and SFRP4,
CMS typing signature gene-set expression template data, and
Instructions for performing a method comprising:
(a) Obtaining gene expression level data of a marker set of a biological sample from a subject;
(b) Calculating cosine distances between the gene expression level data in step (a) and the CMS typing signature gene-set expression templates;
(c) Determining CMS typing based on the computed cosine distances of (b); and
(d) Determining a prognosis of the bowel cancer in the subject based on the CMS type obtained in (c).
4. An electronic device loaded with the computer-readable medium of claim 3.
5. Use of a reagent for detecting each marker of a marker panel of a biological sample from a subject in the manufacture of a product or product panel for assessing the prognosis of bowel cancer in the subject, wherein the marker panel has the following 14 gene markers: IFIT3, CXCL13, STAT1, CXCL9, CA4, AQP8, SLC4A4, AREG, EREG, SPINK4, REG4, SFRP2, ZEB2, and SFRP4.
6. The product or group of products of claim 1, the system of claim 2, the computer readable medium of claim 3, the electronic device of claim 4 or the use of claim 5, wherein the set of markers further has the following 6 gene markers: CA1, CLCA4, MS4A12, CLDN8, MUC2 and ZEB1.
7. The product or set of products, system, computer readable medium, electronic device or use of claim 6, wherein said set of markers further has the following 20 gene markers: CXCL10, AIM2, GBP5, CXCL11, KRT20, SLC26A3, CA2, ASCL2, VAV3, CELP, RNF43, MLPH, TFF3, AQP3, COL3A1, SNAI2, CCDC80, AEBP1, TIMP2 and TWIST1.
8. The product or group of products of claim 1, the system of claim 2, the computer readable medium of claim 3, the electronic device of claim 4 or the use of claim 5, wherein the bowel cancer is colorectal cancer; specifically, the bowel cancer is stage II/III colorectal cancer.
9. The system of claim 2, comprising the following modules:
a sequencing library construction module: the module is used for constructing a sequencing library from sample RNA;
a quantitative sequencing module: the module is used for quantifying and sequencing the sequencing library;
a data normalization module: the module is used for normalizing the quantification and sequencing results; and
a CMS molecular typing module: the module is used for molecular typing of the normalized data.
10. The product or set of products of claim 1, or the use of claim 5, wherein the reagent is a reagent for detecting expression of each gene marker in the set of markers in the biological sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410127289.7A CN117947166A (en) | 2024-01-30 | 2024-01-30 | Marker group, product, system and application thereof for prognosis of intestinal cancer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117947166A true CN117947166A (en) | 2024-04-30 |
Family
ID=90803393
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410127289.7A Pending CN117947166A (en) | 2024-01-30 | 2024-01-30 | Marker group, product, system and application thereof for prognosis of intestinal cancer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117947166A (en) |
- 2024-01-30 CN CN202410127289.7A patent/CN117947166A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||