US20210388449A1 - Detecting cancer cell of origin - Google Patents
- Publication number
- US20210388449A1 US17/284,310
- Authority
- US
- United States
- Prior art keywords
- sample
- classifier
- coca
- biomarker
- subtype
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 206010028980 Neoplasm Diseases 0.000 title claims abstract description 509
- 201000011510 cancer Diseases 0.000 title claims abstract description 244
- 239000000090 biomarker Substances 0.000 claims abstract description 701
- 230000014509 gene expression Effects 0.000 claims abstract description 415
- 238000000034 method Methods 0.000 claims abstract description 339
- 108090000623 proteins and genes Proteins 0.000 claims description 139
- 210000001519 tissue Anatomy 0.000 claims description 96
- 206010006187 Breast cancer Diseases 0.000 claims description 92
- 208000026310 Breast neoplasm Diseases 0.000 claims description 92
- 210000004027 cell Anatomy 0.000 claims description 89
- 150000007523 nucleic acids Chemical class 0.000 claims description 83
- 239000002299 complementary DNA Substances 0.000 claims description 79
- 102000039446 nucleic acids Human genes 0.000 claims description 78
- 108020004707 nucleic acids Proteins 0.000 claims description 78
- 238000003556 assay Methods 0.000 claims description 73
- 238000004458 analytical method Methods 0.000 claims description 65
- 238000001514 detection method Methods 0.000 claims description 64
- 238000009396 hybridization Methods 0.000 claims description 64
- 238000012549 training Methods 0.000 claims description 55
- 201000006585 gastric adenocarcinoma Diseases 0.000 claims description 53
- 206010039491 Sarcoma Diseases 0.000 claims description 52
- 238000003199 nucleic acid amplification method Methods 0.000 claims description 52
- 230000003321 amplification Effects 0.000 claims description 51
- 208000029742 colonic neoplasm Diseases 0.000 claims description 51
- 208000024770 Thyroid neoplasm Diseases 0.000 claims description 50
- 208000005017 glioblastoma Diseases 0.000 claims description 50
- 201000002510 thyroid cancer Diseases 0.000 claims description 50
- 206010052747 Adenocarcinoma pancreas Diseases 0.000 claims description 49
- 201000010897 colon adenocarcinoma Diseases 0.000 claims description 49
- 201000002094 pancreatic adenocarcinoma Diseases 0.000 claims description 49
- 208000008732 thymoma Diseases 0.000 claims description 49
- 208000032320 Germ cell tumor of testis Diseases 0.000 claims description 48
- 201000010240 chromophobe renal cell carcinoma Diseases 0.000 claims description 48
- 201000010302 ovarian serous cystadenocarcinoma Diseases 0.000 claims description 48
- 201000001281 rectum adenocarcinoma Diseases 0.000 claims description 48
- 208000002918 testicular germ cell tumor Diseases 0.000 claims description 48
- 206010027406 Mesothelioma Diseases 0.000 claims description 44
- 201000010915 Glioblastoma multiforme Diseases 0.000 claims description 42
- 201000005969 Uveal melanoma Diseases 0.000 claims description 42
- 208000020990 adrenal cortex carcinoma Diseases 0.000 claims description 42
- 208000007128 adrenocortical carcinoma Diseases 0.000 claims description 42
- 210000004185 liver Anatomy 0.000 claims description 42
- 208000030173 low grade glioma Diseases 0.000 claims description 42
- 201000003701 uterine corpus endometrial carcinoma Diseases 0.000 claims description 42
- 201000005825 prostate adenocarcinoma Diseases 0.000 claims description 41
- 238000003559 RNA-seq method Methods 0.000 claims description 35
- 230000004083 survival effect Effects 0.000 claims description 33
- 238000002493 microarray Methods 0.000 claims description 30
- 210000001124 body fluid Anatomy 0.000 claims description 27
- 238000003196 serial analysis of gene expression Methods 0.000 claims description 27
- 208000000102 Squamous Cell Carcinoma of Head and Neck Diseases 0.000 claims description 26
- 201000000459 head and neck squamous cell carcinoma Diseases 0.000 claims description 26
- 238000010240 RT-PCR analysis Methods 0.000 claims description 25
- 210000003734 kidney Anatomy 0.000 claims description 25
- 238000012163 sequencing technique Methods 0.000 claims description 23
- 201000009030 Carcinoma Diseases 0.000 claims description 21
- 208000010507 Adenocarcinoma of Lung Diseases 0.000 claims description 20
- 201000005249 lung adenocarcinoma Diseases 0.000 claims description 20
- 201000005243 lung squamous cell carcinoma Diseases 0.000 claims description 19
- 238000012896 Statistical algorithm Methods 0.000 claims description 18
- 206010005084 bladder transitional cell carcinoma Diseases 0.000 claims description 17
- 201000001528 bladder urothelial carcinoma Diseases 0.000 claims description 17
- 210000004369 blood Anatomy 0.000 claims description 14
- 239000008280 blood Substances 0.000 claims description 14
- 210000001808 exosome Anatomy 0.000 claims description 14
- 206010073071 hepatocellular carcinoma Diseases 0.000 claims description 14
- 239000008188 pellet Substances 0.000 claims description 14
- 210000003296 saliva Anatomy 0.000 claims description 14
- 210000002700 urine Anatomy 0.000 claims description 14
- 208000030808 Clear cell renal carcinoma Diseases 0.000 claims description 13
- 208000031671 Large B-Cell Diffuse Lymphoma Diseases 0.000 claims description 13
- 206010030155 Oesophageal carcinoma Diseases 0.000 claims description 13
- 206010036790 Productive cough Diseases 0.000 claims description 13
- 206010073251 clear cell renal cell carcinoma Diseases 0.000 claims description 13
- 206010012818 diffuse large B-cell lymphoma Diseases 0.000 claims description 13
- 231100000844 hepatocellular carcinoma Toxicity 0.000 claims description 13
- 210000002865 immune cell Anatomy 0.000 claims description 13
- 210000003802 sputum Anatomy 0.000 claims description 13
- 208000024794 sputum Diseases 0.000 claims description 13
- 208000017897 Carcinoma of esophagus Diseases 0.000 claims description 12
- 238000000636 Northern blotting Methods 0.000 claims description 12
- 201000006612 cervical squamous cell carcinoma Diseases 0.000 claims description 12
- 208000006990 cholangiocarcinoma Diseases 0.000 claims description 12
- 201000005619 esophageal carcinoma Diseases 0.000 claims description 12
- 239000012530 fluid Substances 0.000 claims description 12
- 101710163270 Nuclease Proteins 0.000 claims description 11
- 238000003633 gene expression assay Methods 0.000 claims description 11
- 239000003814 drug Substances 0.000 claims description 10
- 201000001441 melanoma Diseases 0.000 claims description 10
- 239000003155 DNA primer Substances 0.000 claims description 9
- 206010033128 Ovarian cancer Diseases 0.000 claims description 8
- 206010014733 Endometrial cancer Diseases 0.000 claims description 7
- 206010014759 Endometrial neoplasm Diseases 0.000 claims description 7
- 206010060862 Prostate cancer Diseases 0.000 claims description 7
- 208000000236 Prostatic Neoplasms Diseases 0.000 claims description 7
- 210000000481 breast Anatomy 0.000 claims description 7
- 210000001072 colon Anatomy 0.000 claims description 7
- 208000030381 cutaneous melanoma Diseases 0.000 claims description 7
- 201000003708 skin melanoma Diseases 0.000 claims description 7
- 208000036764 Adenocarcinoma of the esophagus Diseases 0.000 claims description 6
- 206010030137 Oesophageal adenocarcinoma Diseases 0.000 claims description 6
- 208000007571 Ovarian Epithelial Carcinoma Diseases 0.000 claims description 6
- 206010061332 Paraganglion neoplasm Diseases 0.000 claims description 6
- 206010038019 Rectal adenocarcinoma Diseases 0.000 claims description 6
- 208000034254 Squamous cell carcinoma of the cervix uteri Diseases 0.000 claims description 6
- 201000006598 bladder squamous cell carcinoma Diseases 0.000 claims description 6
- 208000011892 carcinosarcoma of the corpus uteri Diseases 0.000 claims description 6
- 230000004663 cell proliferation Effects 0.000 claims description 6
- 201000003683 endocervical adenocarcinoma Diseases 0.000 claims description 6
- 208000028653 esophageal adenocarcinoma Diseases 0.000 claims description 6
- 208000024312 invasive carcinoma Diseases 0.000 claims description 6
- 208000019420 lymphoid neoplasm Diseases 0.000 claims description 6
- 208000007312 paraganglioma Diseases 0.000 claims description 6
- 208000028591 pheochromocytoma Diseases 0.000 claims description 6
- 201000005290 uterine carcinosarcoma Diseases 0.000 claims description 6
- 229940124597 therapeutic agent Drugs 0.000 claims description 5
- 239000003596 drug target Substances 0.000 claims description 3
- 239000000203 mixture Substances 0.000 abstract description 27
- 238000002560 therapeutic procedure Methods 0.000 abstract description 26
- 238000009169 immunotherapy Methods 0.000 abstract description 23
- 230000004044 response Effects 0.000 abstract description 14
- 239000000523 sample Substances 0.000 description 440
- 108020004635 Complementary DNA Proteins 0.000 description 74
- 238000010804 cDNA synthesis Methods 0.000 description 74
- 230000004547 gene signature Effects 0.000 description 60
- 206010005003 Bladder cancer Diseases 0.000 description 53
- 208000007097 Urinary Bladder Neoplasms Diseases 0.000 description 52
- 201000005112 urinary bladder cancer Diseases 0.000 description 52
- 108020004999 messenger RNA Proteins 0.000 description 46
- 238000012360 testing method Methods 0.000 description 45
- 102000004169 proteins and genes Human genes 0.000 description 44
- 238000004422 calculation algorithm Methods 0.000 description 40
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 39
- 239000000047 product Substances 0.000 description 35
- 239000004037 angiogenesis inhibitor Substances 0.000 description 28
- 229940121369 angiogenesis inhibitor Drugs 0.000 description 28
- 101150026888 84 gene Proteins 0.000 description 26
- 238000012512 characterization method Methods 0.000 description 26
- 239000013615 primer Substances 0.000 description 26
- 238000011282 treatment Methods 0.000 description 26
- VEEGZPWAAPPXRB-BJMVGYQFSA-N (3e)-3-(1h-imidazol-5-ylmethylidene)-1h-indol-2-one Chemical compound O=C1NC2=CC=CC=C2\C1=C/C1=CN=CN1 VEEGZPWAAPPXRB-BJMVGYQFSA-N 0.000 description 22
- 230000035755 proliferation Effects 0.000 description 22
- 108020004414 DNA Proteins 0.000 description 20
- 229940076838 Immune checkpoint inhibitor Drugs 0.000 description 20
- 239000012274 immune-checkpoint protein inhibitor Substances 0.000 description 20
- 238000001959 radiotherapy Methods 0.000 description 20
- 238000006243 chemical reaction Methods 0.000 description 17
- 238000003752 polymerase chain reaction Methods 0.000 description 17
- 108091034117 Oligonucleotide Proteins 0.000 description 16
- 230000000295 complement effect Effects 0.000 description 14
- 201000010099 disease Diseases 0.000 description 14
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 14
- -1 CDCl20 Proteins 0.000 description 13
- 239000002955 immunomodulating agent Substances 0.000 description 13
- 238000001727 in vivo Methods 0.000 description 13
- 239000013074 reference sample Substances 0.000 description 13
- WSFSSNUMVMOOMR-UHFFFAOYSA-N Formaldehyde Chemical compound O=C WSFSSNUMVMOOMR-UHFFFAOYSA-N 0.000 description 12
- 239000000427 antigen Substances 0.000 description 12
- 102000036639 antigens Human genes 0.000 description 12
- 108091007433 antigens Proteins 0.000 description 12
- 230000005934 immune activation Effects 0.000 description 12
- 230000008569 process Effects 0.000 description 12
- 230000001105 regulatory effect Effects 0.000 description 12
- 239000005557 antagonist Substances 0.000 description 11
- 238000003745 diagnosis Methods 0.000 description 11
- 108010038512 Platelet-Derived Growth Factor Proteins 0.000 description 10
- 102000010780 Platelet-Derived Growth Factor Human genes 0.000 description 10
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 10
- 239000012634 fragment Substances 0.000 description 10
- 230000036039 immunity Effects 0.000 description 10
- 239000003112 inhibitor Substances 0.000 description 10
- 108010073929 Vascular Endothelial Growth Factor A Proteins 0.000 description 9
- 102000005789 Vascular Endothelial Growth Factors Human genes 0.000 description 9
- 108010019530 Vascular Endothelial Growth Factors Proteins 0.000 description 9
- 238000002790 cross-validation Methods 0.000 description 9
- PCHJSUWPFVWCPO-UHFFFAOYSA-N gold Chemical compound [Au] PCHJSUWPFVWCPO-UHFFFAOYSA-N 0.000 description 9
- 108010050904 Interferons Proteins 0.000 description 8
- 108091028043 Nucleic acid sequence Proteins 0.000 description 8
- 238000013459 approach Methods 0.000 description 8
- 230000001413 cellular effect Effects 0.000 description 8
- 230000005931 immune cell recruitment Effects 0.000 description 8
- 239000000126 substance Substances 0.000 description 8
- 238000012706 support-vector machine Methods 0.000 description 8
- 238000010200 validation analysis Methods 0.000 description 8
- 102100039498 Cytotoxic T-lymphocyte protein 4 Human genes 0.000 description 7
- 102000014150 Interferons Human genes 0.000 description 7
- 206010027476 Metastases Diseases 0.000 description 7
- 102100040678 Programmed cell death protein 1 Human genes 0.000 description 7
- 230000027455 binding Effects 0.000 description 7
- 230000008512 biological response Effects 0.000 description 7
- 238000005516 engineering process Methods 0.000 description 7
- 229940079322 interferon Drugs 0.000 description 7
- 230000009401 metastasis Effects 0.000 description 7
- 238000010208 microarray analysis Methods 0.000 description 7
- 239000003607 modifier Substances 0.000 description 7
- 238000010606 normalization Methods 0.000 description 7
- 239000012188 paraffin wax Substances 0.000 description 7
- ISWRGOKTTBVCFA-UHFFFAOYSA-N pirfenidone Chemical compound C1=C(C)C=CC(=O)N1C1=CC=CC=C1 ISWRGOKTTBVCFA-UHFFFAOYSA-N 0.000 description 7
- 108090000765 processed proteins & peptides Proteins 0.000 description 7
- 238000004393 prognosis Methods 0.000 description 7
- 108010074328 Interferon-gamma Proteins 0.000 description 6
- OKKJLVBELUTLKV-UHFFFAOYSA-N Methanol Chemical compound OC OKKJLVBELUTLKV-UHFFFAOYSA-N 0.000 description 6
- 101150044878 US18 gene Proteins 0.000 description 6
- 230000004075 alteration Effects 0.000 description 6
- 239000012472 biological sample Substances 0.000 description 6
- 239000003795 chemical substances by application Substances 0.000 description 6
- 230000002596 correlated effect Effects 0.000 description 6
- 238000011161 development Methods 0.000 description 6
- 238000000338 in vitro Methods 0.000 description 6
- 230000015654 memory Effects 0.000 description 6
- 230000036961 partial effect Effects 0.000 description 6
- 102000040430 polynucleotide Human genes 0.000 description 6
- 108091033319 polynucleotide Proteins 0.000 description 6
- 239000002157 polynucleotide Substances 0.000 description 6
- 230000002441 reversible effect Effects 0.000 description 6
- 210000003491 skin Anatomy 0.000 description 6
- 229940021747 therapeutic vaccine Drugs 0.000 description 6
- 210000004881 tumor cell Anatomy 0.000 description 6
- 102100039339 Atrial natriuretic peptide receptor 1 Human genes 0.000 description 5
- 102100023830 Homeobox protein EMX2 Human genes 0.000 description 5
- 101000961044 Homo sapiens Atrial natriuretic peptide receptor 1 Proteins 0.000 description 5
- 101001048970 Homo sapiens Homeobox protein EMX2 Proteins 0.000 description 5
- 102000008070 Interferon-gamma Human genes 0.000 description 5
- 108700011259 MicroRNAs Proteins 0.000 description 5
- 108700026495 N-Myc Proto-Oncogene Proteins 0.000 description 5
- 108091008606 PDGF receptors Proteins 0.000 description 5
- 102000011653 Platelet-Derived Growth Factor Receptors Human genes 0.000 description 5
- 230000004721 adaptive immunity Effects 0.000 description 5
- 230000008901 benefit Effects 0.000 description 5
- 210000001175 cerebrospinal fluid Anatomy 0.000 description 5
- 238000007405 data analysis Methods 0.000 description 5
- 230000004069 differentiation Effects 0.000 description 5
- 238000011156 evaluation Methods 0.000 description 5
- 238000000126 in silico method Methods 0.000 description 5
- 230000015788 innate immune response Effects 0.000 description 5
- 229960003130 interferon gamma Drugs 0.000 description 5
- 239000002679 microRNA Substances 0.000 description 5
- 239000011325 microbead Substances 0.000 description 5
- 102000004196 processed proteins & peptides Human genes 0.000 description 5
- 238000012545 processing Methods 0.000 description 5
- 238000003753 real-time PCR Methods 0.000 description 5
- 102000005962 receptors Human genes 0.000 description 5
- 108020003175 receptors Proteins 0.000 description 5
- 230000035945 sensitivity Effects 0.000 description 5
- 238000007619 statistical method Methods 0.000 description 5
- 229960005486 vaccine Drugs 0.000 description 5
- 239000002525 vasculotropin inhibitor Substances 0.000 description 5
- WVYWICLMDOOCFB-UHFFFAOYSA-N 4-methyl-2-pentanol Chemical compound CC(C)CC(C)O WVYWICLMDOOCFB-UHFFFAOYSA-N 0.000 description 4
- 208000031261 Acute myeloid leukaemia Diseases 0.000 description 4
- 102000052594 Anaphase-Promoting Complex-Cyclosome Apc2 Subunit Human genes 0.000 description 4
- 102000004392 Aquaporin 5 Human genes 0.000 description 4
- 108090000976 Aquaporin 5 Proteins 0.000 description 4
- 102100026292 Asialoglycoprotein receptor 1 Human genes 0.000 description 4
- 102100021896 Bcl-2-like protein 15 Human genes 0.000 description 4
- 102100039848 Beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase 3 Human genes 0.000 description 4
- 102100032312 Brevican core protein Human genes 0.000 description 4
- 102100025805 Cadherin-1 Human genes 0.000 description 4
- 102100025475 Carcinoembryonic antigen-related cell adhesion molecule 5 Human genes 0.000 description 4
- 102100025473 Carcinoembryonic antigen-related cell adhesion molecule 6 Human genes 0.000 description 4
- 102100032215 Cathepsin E Human genes 0.000 description 4
- 102100038216 Charged multivesicular body protein 4c Human genes 0.000 description 4
- 101150055427 Chmp4c gene Proteins 0.000 description 4
- 102100038447 Claudin-4 Human genes 0.000 description 4
- 102100033885 Collagen alpha-2(XI) chain Human genes 0.000 description 4
- 102100033635 Collectrin Human genes 0.000 description 4
- 206010009944 Colon cancer Diseases 0.000 description 4
- 102100031096 Cubilin Human genes 0.000 description 4
- 102100025177 Dimethylglycine dehydrogenase, mitochondrial Human genes 0.000 description 4
- 102100035275 E3 ubiquitin-protein ligase CBL-C Human genes 0.000 description 4
- 102000012804 EPCAM Human genes 0.000 description 4
- 101150084967 EPCAM gene Proteins 0.000 description 4
- 102100035079 ETS-related transcription factor Elf-3 Human genes 0.000 description 4
- 102100038595 Estrogen receptor Human genes 0.000 description 4
- 102100030279 G-protein coupled receptor 35 Human genes 0.000 description 4
- 102000017700 GABRP Human genes 0.000 description 4
- 102000000805 Galectin 4 Human genes 0.000 description 4
- 108010001515 Galectin 4 Proteins 0.000 description 4
- 102100041003 Glutamate carboxypeptidase 2 Human genes 0.000 description 4
- 102100032558 Glypican-2 Human genes 0.000 description 4
- 102100034227 Grainyhead-like protein 2 homolog Human genes 0.000 description 4
- 102100034629 Hemopexin Human genes 0.000 description 4
- 102100022057 Hepatocyte nuclear factor 1-alpha Human genes 0.000 description 4
- 102100022373 Homeobox protein DLX-5 Human genes 0.000 description 4
- 102100028092 Homeobox protein Nkx-3.1 Human genes 0.000 description 4
- 101000785944 Homo sapiens Asialoglycoprotein receptor 1 Proteins 0.000 description 4
- 101000971075 Homo sapiens Bcl-2-like protein 15 Proteins 0.000 description 4
- 101000887635 Homo sapiens Beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase 3 Proteins 0.000 description 4
- 101000731086 Homo sapiens Brevican core protein Proteins 0.000 description 4
- 101000888580 Homo sapiens Calcium-activated chloride channel regulator 2 Proteins 0.000 description 4
- 101000914324 Homo sapiens Carcinoembryonic antigen-related cell adhesion molecule 5 Proteins 0.000 description 4
- 101000914326 Homo sapiens Carcinoembryonic antigen-related cell adhesion molecule 6 Proteins 0.000 description 4
- 101000869031 Homo sapiens Cathepsin E Proteins 0.000 description 4
- 101000882890 Homo sapiens Claudin-4 Proteins 0.000 description 4
- 101000710619 Homo sapiens Collagen alpha-2(XI) chain Proteins 0.000 description 4
- 101000945075 Homo sapiens Collectrin Proteins 0.000 description 4
- 101000889276 Homo sapiens Cytotoxic T-lymphocyte protein 4 Proteins 0.000 description 4
- 101001005618 Homo sapiens Dimethylglycine dehydrogenase, mitochondrial Proteins 0.000 description 4
- 101000737269 Homo sapiens E3 ubiquitin-protein ligase CBL-C Proteins 0.000 description 4
- 101000877379 Homo sapiens ETS-related transcription factor Elf-3 Proteins 0.000 description 4
- 101000882584 Homo sapiens Estrogen receptor Proteins 0.000 description 4
- 101001009545 Homo sapiens G-protein coupled receptor 35 Proteins 0.000 description 4
- 101000822394 Homo sapiens Gamma-aminobutyric acid receptor subunit pi Proteins 0.000 description 4
- 101001014664 Homo sapiens Glypican-2 Proteins 0.000 description 4
- 101001069929 Homo sapiens Grainyhead-like protein 2 homolog Proteins 0.000 description 4
- 101001045751 Homo sapiens Hepatocyte nuclear factor 1-alpha Proteins 0.000 description 4
- 101000901627 Homo sapiens Homeobox protein DLX-5 Proteins 0.000 description 4
- 101000578249 Homo sapiens Homeobox protein Nkx-3.1 Proteins 0.000 description 4
- 101000998020 Homo sapiens Keratin, type I cytoskeletal 18 Proteins 0.000 description 4
- 101001007027 Homo sapiens Keratin, type II cuticular Hb1 Proteins 0.000 description 4
- 101001056452 Homo sapiens Keratin, type II cytoskeletal 6A Proteins 0.000 description 4
- 101001056445 Homo sapiens Keratin, type II cytoskeletal 6B Proteins 0.000 description 4
- 101000975496 Homo sapiens Keratin, type II cytoskeletal 8 Proteins 0.000 description 4
- 101000993838 Homo sapiens Keratinocyte differentiation factor 1 Proteins 0.000 description 4
- 101000663639 Homo sapiens Kunitz-type protease inhibitor 2 Proteins 0.000 description 4
- 101001038505 Homo sapiens Ly6/PLAUR domain-containing protein 1 Proteins 0.000 description 4
- 101001005714 Homo sapiens MARVEL domain-containing protein 3 Proteins 0.000 description 4
- 101000623900 Homo sapiens Mucin-13 Proteins 0.000 description 4
- 101000623901 Homo sapiens Mucin-16 Proteins 0.000 description 4
- 101000972286 Homo sapiens Mucin-4 Proteins 0.000 description 4
- 101000581940 Homo sapiens Napsin-A Proteins 0.000 description 4
- 101001136592 Homo sapiens Prostate stem cell antigen Proteins 0.000 description 4
- 101001001272 Homo sapiens Prostatic acid phosphatase Proteins 0.000 description 4
- 101001062790 Homo sapiens Protein FAM171A2 Proteins 0.000 description 4
- 101000726113 Homo sapiens Protein crumbs homolog 3 Proteins 0.000 description 4
- 101000632467 Homo sapiens Pulmonary surfactant-associated protein D Proteins 0.000 description 4
- 101001100101 Homo sapiens Retinoic acid-induced protein 3 Proteins 0.000 description 4
- 101000711466 Homo sapiens SAM pointed domain-containing Ets transcription factor Proteins 0.000 description 4
- 101000740178 Homo sapiens Sal-like protein 4 Proteins 0.000 description 4
- 101000688930 Homo sapiens Signaling threshold-regulating transmembrane adapter 1 Proteins 0.000 description 4
- 101000740162 Homo sapiens Sodium- and chloride-dependent transporter XTRP3 Proteins 0.000 description 4
- 101000847107 Homo sapiens Tetraspanin-8 Proteins 0.000 description 4
- 101000819111 Homo sapiens Trans-acting T-cell-specific transcription factor GATA-3 Proteins 0.000 description 4
- 101000891300 Homo sapiens Transcription elongation factor A protein-like 5 Proteins 0.000 description 4
- 101000652324 Homo sapiens Transcription factor SOX-17 Proteins 0.000 description 4
- 101000808126 Homo sapiens Uroplakin-3b Proteins 0.000 description 4
- 101000760251 Homo sapiens Zinc finger protein 578 Proteins 0.000 description 4
- 101000723641 Homo sapiens Zinc finger protein 695 Proteins 0.000 description 4
- 101000772560 Homo sapiens Zinc finger transcription factor Trps1 Proteins 0.000 description 4
- 206010021143 Hypoxia Diseases 0.000 description 4
- 229940123038 Integrin antagonist Drugs 0.000 description 4
- 102100033421 Keratin, type I cytoskeletal 18 Human genes 0.000 description 4
- 102100028340 Keratin, type II cuticular Hb1 Human genes 0.000 description 4
- 102100025656 Keratin, type II cytoskeletal 6A Human genes 0.000 description 4
- 102100025655 Keratin, type II cytoskeletal 6B Human genes 0.000 description 4
- 102100023972 Keratin, type II cytoskeletal 8 Human genes 0.000 description 4
- 102100031728 Keratinocyte differentiation factor 1 Human genes 0.000 description 4
- 102100039020 Kunitz-type protease inhibitor 2 Human genes 0.000 description 4
- 102100030931 Ladinin-1 Human genes 0.000 description 4
- 102100040284 Ly6/PLAUR domain-containing protein 1 Human genes 0.000 description 4
- 102100025080 MARVEL domain-containing protein 3 Human genes 0.000 description 4
- 102100022430 Melanocyte protein PMEL Human genes 0.000 description 4
- 102100034256 Mucin-1 Human genes 0.000 description 4
- 102100023124 Mucin-13 Human genes 0.000 description 4
- 102100023123 Mucin-16 Human genes 0.000 description 4
- 102100022693 Mucin-4 Human genes 0.000 description 4
- 208000033776 Myeloid Acute Leukemia Diseases 0.000 description 4
- 102100030124 N-myc proto-oncogene protein Human genes 0.000 description 4
- 102100027343 Napsin-A Human genes 0.000 description 4
- 102100035486 Nectin-4 Human genes 0.000 description 4
- 102100027341 Neutral and basic amino acid transport protein rBAT Human genes 0.000 description 4
- 108060006580 PRAME Proteins 0.000 description 4
- 102000036673 PRAME Human genes 0.000 description 4
- 102100035278 Pendrin Human genes 0.000 description 4
- 102100024216 Programmed cell death 1 ligand 1 Human genes 0.000 description 4
- 101710089372 Programmed cell death protein 1 Proteins 0.000 description 4
- 102100036735 Prostate stem cell antigen Human genes 0.000 description 4
- 102100030535 Protein FAM171A2 Human genes 0.000 description 4
- 102100023075 Protein Niban 2 Human genes 0.000 description 4
- 102100027316 Protein crumbs homolog 3 Human genes 0.000 description 4
- 102100027845 Pulmonary surfactant-associated protein D Human genes 0.000 description 4
- 238000002123 RNA extraction Methods 0.000 description 4
- 101710100969 Receptor tyrosine-protein kinase erbB-3 Proteins 0.000 description 4
- 102100029986 Receptor tyrosine-protein kinase erbB-3 Human genes 0.000 description 4
- 102100038453 Retinoic acid-induced protein 3 Human genes 0.000 description 4
- 108091058557 SILV Proteins 0.000 description 4
- 108091006464 SLC25A23 Proteins 0.000 description 4
- 108091006507 SLC26A4 Proteins 0.000 description 4
- 108091006311 SLC3A1 Proteins 0.000 description 4
- 108091007568 SLC45A3 Proteins 0.000 description 4
- 102100037192 Sal-like protein 4 Human genes 0.000 description 4
- 102100024453 Signaling threshold-regulating transmembrane adapter 1 Human genes 0.000 description 4
- 102100037253 Solute carrier family 45 member 3 Human genes 0.000 description 4
- 101150057140 TACSTD1 gene Proteins 0.000 description 4
- 102100032802 Tetraspanin-8 Human genes 0.000 description 4
- 108060008245 Thrombospondin Proteins 0.000 description 4
- 102000002938 Thrombospondin Human genes 0.000 description 4
- 102100021386 Trans-acting T-cell-specific transcription factor GATA-3 Human genes 0.000 description 4
- 102100040422 Transcription elongation factor A protein-like 5 Human genes 0.000 description 4
- 102100030243 Transcription factor SOX-17 Human genes 0.000 description 4
- 108060008682 Tumor Necrosis Factor Proteins 0.000 description 4
- 102000000852 Tumor Necrosis Factor-alpha Human genes 0.000 description 4
- 102100027881 Tumor protein 63 Human genes 0.000 description 4
- 102100038850 Uroplakin-3b Human genes 0.000 description 4
- 102100035140 Vitronectin Human genes 0.000 description 4
- 101100022813 Zea mays MEG3 gene Proteins 0.000 description 4
- 102100024722 Zinc finger protein 578 Human genes 0.000 description 4
- 102100027855 Zinc finger protein 695 Human genes 0.000 description 4
- 102100030619 Zinc finger transcription factor Trps1 Human genes 0.000 description 4
- 230000003044 adaptive effect Effects 0.000 description 4
- 238000003491 array Methods 0.000 description 4
- 238000013528 artificial neural network Methods 0.000 description 4
- 238000012093 association test Methods 0.000 description 4
- RITAVMQDGBJQJZ-FMIVXFBMSA-N axitinib Chemical compound CNC(=O)C1=CC=CC=C1SC1=CC=C(C(\C=C\C=2N=CC=CC=2)=NN2)C2=C1 RITAVMQDGBJQJZ-FMIVXFBMSA-N 0.000 description 4
- 239000003153 chemical reaction reagent Substances 0.000 description 4
- 238000002512 chemotherapy Methods 0.000 description 4
- 238000002405 diagnostic procedure Methods 0.000 description 4
- 238000009826 distribution Methods 0.000 description 4
- 229940079593 drug Drugs 0.000 description 4
- 239000000975 dye Substances 0.000 description 4
- 238000010195 expression analysis Methods 0.000 description 4
- 230000006870 function Effects 0.000 description 4
- 230000007954 hypoxia Effects 0.000 description 4
- 238000003018 immunoassay Methods 0.000 description 4
- 238000005259 measurement Methods 0.000 description 4
- 238000012544 monitoring process Methods 0.000 description 4
- 229950008499 plitidepsin Drugs 0.000 description 4
- UUSZLLQJYRSZIS-LXNNNBEUSA-N plitidepsin Chemical compound CN([C@H](CC(C)C)C(=O)N[C@@H]1C(=O)N[C@@H]([C@H](CC(=O)O[C@H](C(=O)[C@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N2CCC[C@H]2C(=O)N(C)[C@@H](CC=2C=CC(OC)=CC=2)C(=O)O[C@@H]1C)C(C)C)O)[C@@H](C)CC)C(=O)[C@@H]1CCCN1C(=O)C(C)=O UUSZLLQJYRSZIS-LXNNNBEUSA-N 0.000 description 4
- 108010049948 plitidepsin Proteins 0.000 description 4
- 239000002987 primer (paints) Substances 0.000 description 4
- 238000000746 purification Methods 0.000 description 4
- 229960002633 ramucirumab Drugs 0.000 description 4
- 230000002829 reductive effect Effects 0.000 description 4
- 239000007787 solid Substances 0.000 description 4
- 238000011477 surgical intervention Methods 0.000 description 4
- 238000001356 surgical procedure Methods 0.000 description 4
- 102100025674 Angiopoietin-related protein 4 Human genes 0.000 description 3
- 102100021663 Baculoviral IAP repeat-containing protein 5 Human genes 0.000 description 3
- 108010021064 CTLA-4 Antigen Proteins 0.000 description 3
- 229940045513 CTLA4 antagonist Drugs 0.000 description 3
- 102100039532 Calcium-activated chloride channel regulator 2 Human genes 0.000 description 3
- 108091007854 Cdh1/Fizzy-related Proteins 0.000 description 3
- 102000016289 Cell Adhesion Molecules Human genes 0.000 description 3
- 108010067225 Cell Adhesion Molecules Proteins 0.000 description 3
- 102100026139 DNA damage-inducible transcript 4 protein Human genes 0.000 description 3
- 230000007067 DNA methylation Effects 0.000 description 3
- 238000002965 ELISA Methods 0.000 description 3
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 3
- 102100030421 Fatty acid-binding protein 5 Human genes 0.000 description 3
- 108090000369 Glutamate Carboxypeptidase II Proteins 0.000 description 3
- 101000922080 Homo sapiens Cubilin Proteins 0.000 description 3
- 101000912753 Homo sapiens DNA damage-inducible transcript 4 protein Proteins 0.000 description 3
- 101001062855 Homo sapiens Fatty acid-binding protein 5 Proteins 0.000 description 3
- 101001133056 Homo sapiens Mucin-1 Proteins 0.000 description 3
- 101000601664 Homo sapiens Paired box protein Pax-8 Proteins 0.000 description 3
- 101001117317 Homo sapiens Programmed cell death 1 ligand 1 Proteins 0.000 description 3
- 101000611936 Homo sapiens Programmed cell death protein 1 Proteins 0.000 description 3
- 101000979748 Homo sapiens Protein NDRG1 Proteins 0.000 description 3
- 101000686225 Homo sapiens Ras-related GTP-binding protein D Proteins 0.000 description 3
- 102000006992 Interferon-alpha Human genes 0.000 description 3
- 108010047761 Interferon-alpha Proteins 0.000 description 3
- 102000004034 Kelch-Like ECH-Associated Protein 1 Human genes 0.000 description 3
- 108090000484 Kelch-Like ECH-Associated Protein 1 Proteins 0.000 description 3
- 239000002147 L01XE04 - Sunitinib Substances 0.000 description 3
- 239000003798 L01XE11 - Pazopanib Substances 0.000 description 3
- 239000002137 L01XE24 - Ponatinib Substances 0.000 description 3
- 206010058467 Lung neoplasm malignant Diseases 0.000 description 3
- 108700012912 MYCN Proteins 0.000 description 3
- 101150022024 MYCN gene Proteins 0.000 description 3
- 101000686934 Mus musculus Prolactin-7D1 Proteins 0.000 description 3
- 241001467552 Mycobacterium bovis BCG Species 0.000 description 3
- 102000007561 NF-E2-Related Factor 2 Human genes 0.000 description 3
- 108010071382 NF-E2-Related Factor 2 Proteins 0.000 description 3
- 238000012408 PCR amplification Methods 0.000 description 3
- 108091059809 PVRL4 Proteins 0.000 description 3
- 102100037502 Paired box protein Pax-8 Human genes 0.000 description 3
- 102000035195 Peptidases Human genes 0.000 description 3
- 108091005804 Peptidases Proteins 0.000 description 3
- 206010035226 Plasma cell myeloma Diseases 0.000 description 3
- 102100026651 Pro-adrenomedullin Human genes 0.000 description 3
- 102100035703 Prostatic acid phosphatase Human genes 0.000 description 3
- 102100024980 Protein NDRG1 Human genes 0.000 description 3
- 102100025002 Ras-related GTP-binding protein D Human genes 0.000 description 3
- 208000006265 Renal cell carcinoma Diseases 0.000 description 3
- 201000000582 Retinoblastoma Diseases 0.000 description 3
- 102100034018 SAM pointed domain-containing Ets transcription factor Human genes 0.000 description 3
- 108091006601 SLC16A3 Proteins 0.000 description 3
- 108010002687 Survivin Proteins 0.000 description 3
- 108010005246 Tissue Inhibitor of Metalloproteinases Proteins 0.000 description 3
- 102000005876 Tissue Inhibitor of Metalloproteinases Human genes 0.000 description 3
- 101710140697 Tumor protein 63 Proteins 0.000 description 3
- 108010005656 Ubiquitin Thiolesterase Proteins 0.000 description 3
- 102000005918 Ubiquitin Thiolesterase Human genes 0.000 description 3
- 239000002671 adjuvant Substances 0.000 description 3
- 108010081667 aflibercept Proteins 0.000 description 3
- 150000001413 amino acids Chemical group 0.000 description 3
- 229960003005 axitinib Drugs 0.000 description 3
- 229960000190 bacillus calmette–guérin vaccine Drugs 0.000 description 3
- 239000011324 bead Substances 0.000 description 3
- 238000001574 biopsy Methods 0.000 description 3
- 230000015572 biosynthetic process Effects 0.000 description 3
- 239000007795 chemical reaction product Substances 0.000 description 3
- 238000007796 conventional method Methods 0.000 description 3
- 230000000875 corresponding effect Effects 0.000 description 3
- 230000003247 decreasing effect Effects 0.000 description 3
- 230000003828 downregulation Effects 0.000 description 3
- 238000002651 drug therapy Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 239000000835 fiber Substances 0.000 description 3
- 239000000499 gel Substances 0.000 description 3
- 230000002068 genetic effect Effects 0.000 description 3
- 238000003365 immunocytochemistry Methods 0.000 description 3
- 238000007834 ligase chain reaction Methods 0.000 description 3
- 201000005202 lung cancer Diseases 0.000 description 3
- 208000020816 lung neoplasm Diseases 0.000 description 3
- 229950003968 motesanib Drugs 0.000 description 3
- RAHBGWKEPAQNFF-UHFFFAOYSA-N motesanib Chemical compound C=1C=C2C(C)(C)CNC2=CC=1NC(=O)C1=CC=CN=C1NCC1=CC=NC=C1 RAHBGWKEPAQNFF-UHFFFAOYSA-N 0.000 description 3
- 230000035772 mutation Effects 0.000 description 3
- 208000002154 non-small cell lung carcinoma Diseases 0.000 description 3
- 239000002773 nucleotide Substances 0.000 description 3
- 125000003729 nucleotide group Chemical group 0.000 description 3
- 238000011275 oncology therapy Methods 0.000 description 3
- 230000037361 pathway Effects 0.000 description 3
- CUIHSIWYWATEQL-UHFFFAOYSA-N pazopanib Chemical compound C1=CC2=C(C)N(C)N=C2C=C1N(C)C(N=1)=CC=NC=1NC1=CC=C(C)C(S(N)(=O)=O)=C1 CUIHSIWYWATEQL-UHFFFAOYSA-N 0.000 description 3
- 108091005981 phosphorylated proteins Proteins 0.000 description 3
- 229960003073 pirfenidone Drugs 0.000 description 3
- 229920001184 polypeptide Polymers 0.000 description 3
- PHXJVRSECIGDHY-UHFFFAOYSA-N ponatinib Chemical compound C1CN(C)CCN1CC(C(=C1)C(F)(F)F)=CC=C1NC(=O)C1=CC=C(C)C(C#CC=2N3N=CC=CC3=NC=2)=C1 PHXJVRSECIGDHY-UHFFFAOYSA-N 0.000 description 3
- 238000003498 protein array Methods 0.000 description 3
- 230000010076 replication Effects 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 210000002966 serum Anatomy 0.000 description 3
- 230000011664 signaling Effects 0.000 description 3
- 239000000243 solution Substances 0.000 description 3
- 238000003860 storage Methods 0.000 description 3
- 230000001225 therapeutic effect Effects 0.000 description 3
- 230000004797 therapeutic response Effects 0.000 description 3
- 206010044412 transitional cell carcinoma Diseases 0.000 description 3
- 239000000107 tumor biomarker Substances 0.000 description 3
- 208000029729 tumor suppressor gene on chromosome 11 Diseases 0.000 description 3
- 230000003827 upregulation Effects 0.000 description 3
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 2
- AKNNEGZIBPJZJG-MSOLQXFVSA-N (-)-noscapine Chemical compound CN1CCC2=CC=3OCOC=3C(OC)=C2[C@@H]1[C@@H]1C2=CC=C(OC)C(OC)=C2C(=O)O1 AKNNEGZIBPJZJG-MSOLQXFVSA-N 0.000 description 2
- NXUWTKIOMJSLSV-DEEZXRHXSA-N (2S)-2-[[(2S)-2-[[(2S)-2-[[2-[[(2S)-2-[[(2S)-5-amino-2-[[(2S)-2-[[(2S)-2-[[(2S)-2-[[(2S)-2-[[(2S)-6-amino-2-[[(2S)-2-[[(2S)-2-[[(2S)-2-[[(2S)-5-amino-2-[[(2S)-2-[[(2S)-2-[[(2S)-2-[[(2S)-2-[[(2S)-2-[[(2S)-2-[[(2S)-2-amino-3-(4-hydroxyphenyl)propanoyl]amino]propanoyl]amino]-5-carbamimidamidopentanoyl]amino]propanoyl]amino]propanoyl]amino]propanoyl]amino]-5-carbamimidamidopentanoyl]amino]-5-oxopentanoyl]amino]propanoyl]amino]-5-carbamimidamidopentanoyl]amino]propanoyl]amino]hexanoyl]amino]propanoyl]amino]-4-methylpentanoyl]amino]propanoyl]amino]-5-carbamimidamidopentanoyl]amino]-5-oxopentanoyl]amino]-4-methylpentanoyl]amino]acetyl]amino]-3-methylbutanoyl]amino]propanoyl]amino]propanoic acid Chemical compound CC(C)C[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](C)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](C)NC(=O)[C@H](CCCCN)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](C)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](C)NC(=O)[C@H](C)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](C)NC(=O)[C@@H](N)Cc1ccc(O)cc1)C(=O)NCC(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(O)=O NXUWTKIOMJSLSV-DEEZXRHXSA-N 0.000 description 2
- SSOORFWOBGFTHL-OTEJMHTDSA-N (4S)-5-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-6-amino-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[2-[(2S)-2-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-6-amino-1-[[(2S)-1-[[(2S)-1-[[(2S,3S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-6-amino-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-1-[[(2S)-5-amino-1-[[(2S)-1-[[(2S)-1-[[(2S)-6-amino-1-[[(2S)-6-amino-1-[[(2S)-1-[[(2S)-1-[[(2S)-5-amino-1-[[(2S)-5-carbamimidamido-1-[[(2S)-5-carbamimidamido-1-[[(1S)-4-carbamimidamido-1-carboxybutyl]amino]-1-oxopentan-2-yl]amino]-1-oxopentan-2-yl]amino]-1,5-dioxopentan-2-yl]amino]-5-carbamimidamido-1-oxopentan-2-yl]amino]-5-carbamimidamido-1-oxopentan-2-yl]amino]-1-oxohexan-2-yl]amino]-1-oxohexan-2-yl]amino]-5-carbamimidamido-1-oxopentan-2-yl]amino]-4-methyl-1-oxopentan-2-yl]amino]-1,5-dioxopentan-2-yl]amino]-4-methyl-1-oxopentan-2-yl]amino]-3-hydroxy-1-oxopropan-2-yl]amino]-3-hydroxy-1-oxopropan-2-yl]amino]-3-hydroxy-1-oxopropan-2-yl]amino]-1-oxopropan-2-yl]amino]-1-oxohexan-2-yl]amino]-3-hydroxy-1-oxopropan-2-yl]amino]-1-oxo-3-phenylpropan-2-yl]amino]-3-methyl-1-oxopentan-2-yl]amino]-3-methyl-1-oxobutan-2-yl]amino]-5-carbamimidamido-1-oxopentan-2-yl]amino]-1-oxohexan-2-yl]amino]-3-methyl-1-oxobutan-2-yl]amino]-5-carbamimidamido-1-oxopentan-2-yl]amino]-3-methyl-1-oxobutan-2-yl]amino]-4-methyl-1-oxopentan-2-yl]amino]-1-oxopropan-2-yl]amino]-5-carbamimidamido-1-oxopentan-2-yl]carbamoyl]pyrrolidin-1-yl]-2-oxoethyl]amino]-3-(1H-indol-3-yl)-1-oxopropan-2-yl]amino]-4-methyl-1-oxopentan-2-yl]amino]-1-oxo-3-phenylpropan-2-yl]amino]-5-carbamimidamido-1-oxopentan-2-yl]amino]-1-oxohexan-2-yl]amino]-3-methyl-1-oxobutan-2-yl]amino]-5-carbamimidamido-1-oxopentan-2-yl]amino]-4-methyl-1-oxopentan-2-yl]amino]-1-oxo-3-phenylpropan-2-yl]amino]-3-(1H-imidazol-4-yl)-1-oxopropan-2-yl]amino]-3-methyl-1-oxobutan-2-yl]amino]-4-methyl-1-oxopentan-2-yl]amino]-4-[[(2S)-2-[[(2S)-2-[[(2S)-2,6-diaminohexanoyl]amino]-3-methylbutanoyl]amino]propanoyl]amino]-5-oxopentanoic acid Chemical compound CC[C@H](C)[C@H](NC(=O)[C@@H](NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@@H]1CCCN1C(=O)CNC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](Cc1c[nH]cn1)NC(=O)[C@@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](NC(=O)[C@@H](N)CCCCN)C(C)C)C(C)C)C(C)C)C(C)C)C(C)C)C(C)C)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O SSOORFWOBGFTHL-OTEJMHTDSA-N 0.000 description 2
- 108091064702 1 family Proteins 0.000 description 2
- 101150084750 1 gene Proteins 0.000 description 2
- SPMVMDHWKHCIDT-UHFFFAOYSA-N 1-[2-chloro-4-[(6,7-dimethoxy-4-quinolinyl)oxy]phenyl]-3-(5-methyl-3-isoxazolyl)urea Chemical compound C=12C=C(OC)C(OC)=CC2=NC=CC=1OC(C=C1Cl)=CC=C1NC(=O)NC=1C=C(C)ON=1 SPMVMDHWKHCIDT-UHFFFAOYSA-N 0.000 description 2
- 101150000874 11 gene Proteins 0.000 description 2
- 101150078635 18 gene Proteins 0.000 description 2
- 101150112497 26 gene Proteins 0.000 description 2
- ALYNCZNDIQEVRV-UHFFFAOYSA-N 4-aminobenzoic acid Chemical compound NC1=CC=C(C(O)=O)C=C1 ALYNCZNDIQEVRV-UHFFFAOYSA-N 0.000 description 2
- PLIKAWJENQZMHA-UHFFFAOYSA-N 4-aminophenol Chemical compound NC1=CC=C(O)C=C1 PLIKAWJENQZMHA-UHFFFAOYSA-N 0.000 description 2
- 101150096316 5 gene Proteins 0.000 description 2
- 101150054149 ANGPTL4 gene Proteins 0.000 description 2
- 108010082162 AZX 100 Proteins 0.000 description 2
- 208000024893 Acute lymphoblastic leukemia Diseases 0.000 description 2
- 208000014697 Acute lymphocytic leukaemia Diseases 0.000 description 2
- 108010048154 Angiopoietin-1 Proteins 0.000 description 2
- 102000009088 Angiopoietin-1 Human genes 0.000 description 2
- 102100034608 Angiopoietin-2 Human genes 0.000 description 2
- 108010048036 Angiopoietin-2 Proteins 0.000 description 2
- 108700042530 Angiopoietin-Like Protein 4 Proteins 0.000 description 2
- 102400000068 Angiostatin Human genes 0.000 description 2
- 108010079709 Angiostatins Proteins 0.000 description 2
- 108091023037 Aptamer Proteins 0.000 description 2
- 241000045403 Astragalus propinquus Species 0.000 description 2
- 208000037260 Atherosclerotic Plaque Diseases 0.000 description 2
- 208000032791 BCR-ABL1 positive chronic myelogenous leukemia Diseases 0.000 description 2
- 101100208237 Bos taurus THBS2 gene Proteins 0.000 description 2
- 101710155857 C-C motif chemokine 2 Proteins 0.000 description 2
- 102100025248 C-X-C motif chemokine 10 Human genes 0.000 description 2
- 102100031168 CCN family member 2 Human genes 0.000 description 2
- 102100029968 Calreticulin Human genes 0.000 description 2
- 108090000549 Calreticulin Proteins 0.000 description 2
- 101800000626 Canstatin Proteins 0.000 description 2
- 102400000730 Canstatin Human genes 0.000 description 2
- 102100025064 Cellular tumor antigen p53 Human genes 0.000 description 2
- 102100031219 Centrosomal protein of 55 kDa Human genes 0.000 description 2
- 101710092479 Centrosomal protein of 55 kDa Proteins 0.000 description 2
- 102000000018 Chemokine CCL2 Human genes 0.000 description 2
- 102000019034 Chemokines Human genes 0.000 description 2
- 108010012236 Chemokines Proteins 0.000 description 2
- 102100031186 Chromogranin-A Human genes 0.000 description 2
- 208000010833 Chronic myeloid leukaemia Diseases 0.000 description 2
- 102100031162 Collagen alpha-1(XVIII) chain Human genes 0.000 description 2
- 108010039419 Connective Tissue Growth Factor Proteins 0.000 description 2
- 229940046168 CpG oligodeoxynucleotide Drugs 0.000 description 2
- 102000004127 Cytokines Human genes 0.000 description 2
- 108090000695 Cytokines Proteins 0.000 description 2
- 238000001712 DNA sequencing Methods 0.000 description 2
- 108010079505 Endostatins Proteins 0.000 description 2
- 102000016970 Follistatin Human genes 0.000 description 2
- 108010014612 Follistatin Proteins 0.000 description 2
- 108091006027 G proteins Proteins 0.000 description 2
- 102100032340 G2/mitotic-specific cyclin-B1 Human genes 0.000 description 2
- 102000030782 GTP binding Human genes 0.000 description 2
- 108091000058 GTP-Binding Proteins 0.000 description 2
- 102100028501 Galanin peptides Human genes 0.000 description 2
- 229940126043 Galectin-3 inhibitor Drugs 0.000 description 2
- NYHBQMYGNKIUIF-UUOKFMHZSA-N Guanosine Chemical class C1=NC=2C(=O)NC(N)=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O NYHBQMYGNKIUIF-UUOKFMHZSA-N 0.000 description 2
- 101000695054 Homo sapiens Bombesin receptor subtype-3 Proteins 0.000 description 2
- 101000776612 Homo sapiens Cilia- and flagella-associated protein 73 Proteins 0.000 description 2
- 101000868643 Homo sapiens G2/mitotic-specific cyclin-B1 Proteins 0.000 description 2
- 101000860415 Homo sapiens Galanin peptides Proteins 0.000 description 2
- 101001112162 Homo sapiens Kinetochore protein NDC80 homolog Proteins 0.000 description 2
- 101000590482 Homo sapiens Kinetochore protein Nuf2 Proteins 0.000 description 2
- 101001012669 Homo sapiens Melanoma inhibitory activity protein 2 Proteins 0.000 description 2
- 101000690940 Homo sapiens Pro-adrenomedullin Proteins 0.000 description 2
- 101000595904 Homo sapiens Procollagen-lysine,2-oxoglutarate 5-dioxygenase 1 Proteins 0.000 description 2
- 101000945496 Homo sapiens Proliferation marker protein Ki-67 Proteins 0.000 description 2
- 101000575639 Homo sapiens Ribonucleoside-diphosphate reductase subunit M2 Proteins 0.000 description 2
- 101001087372 Homo sapiens Securin Proteins 0.000 description 2
- 101000809797 Homo sapiens Thymidylate synthase Proteins 0.000 description 2
- 101000807354 Homo sapiens Ubiquitin-conjugating enzyme E2 C Proteins 0.000 description 2
- 101150103227 IFN gene Proteins 0.000 description 2
- 108090000467 Interferon-beta Proteins 0.000 description 2
- 102100023890 Kinetochore protein NDC80 homolog Human genes 0.000 description 2
- 102100032431 Kinetochore protein Nuf2 Human genes 0.000 description 2
- 239000005517 L01XE01 - Imatinib Substances 0.000 description 2
- 239000002118 L01XE12 - Vandetanib Substances 0.000 description 2
- 239000002176 L01XE26 - Cabozantinib Substances 0.000 description 2
- 108010064548 Lymphocyte Function-Associated Antigen-1 Proteins 0.000 description 2
- 206010025323 Lymphomas Diseases 0.000 description 2
- 102000004137 Lysophosphatidic Acid Receptors Human genes 0.000 description 2
- 108090000642 Lysophosphatidic Acid Receptors Proteins 0.000 description 2
- 108091054438 MHC class II family Proteins 0.000 description 2
- 102000043131 MHC class II family Human genes 0.000 description 2
- 108010083015 MMI-0100 Proteins 0.000 description 2
- 208000000172 Medulloblastoma Diseases 0.000 description 2
- 102100029778 Melanoma inhibitory activity protein 2 Human genes 0.000 description 2
- 241001465754 Metazoa Species 0.000 description 2
- 102100025276 Monocarboxylate transporter 4 Human genes 0.000 description 2
- 101100407308 Mus musculus Pdcd1lg2 gene Proteins 0.000 description 2
- 201000003793 Myelodysplastic syndrome Diseases 0.000 description 2
- 208000033761 Myelogenous Chronic BCR-ABL Positive Leukemia Diseases 0.000 description 2
- 208000014767 Myeloproliferative disease Diseases 0.000 description 2
- 201000007224 Myeloproliferative neoplasm Diseases 0.000 description 2
- 208000015914 Non-Hodgkin lymphomas Diseases 0.000 description 2
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 2
- 108091005461 Nucleic proteins Proteins 0.000 description 2
- 108010081689 Osteopontin Proteins 0.000 description 2
- 102000004264 Osteopontin Human genes 0.000 description 2
- 108010089484 PRM-151 Proteins 0.000 description 2
- 102000004211 Platelet factor 4 Human genes 0.000 description 2
- 108090000778 Platelet factor 4 Proteins 0.000 description 2
- 208000006664 Precursor Cell Lymphoblastic Leukemia-Lymphoma Diseases 0.000 description 2
- 102100035202 Procollagen-lysine,2-oxoglutarate 5-dioxygenase 1 Human genes 0.000 description 2
- 108700030875 Programmed Cell Death 1 Ligand 2 Proteins 0.000 description 2
- 102100024213 Programmed cell death 1 ligand 2 Human genes 0.000 description 2
- 102000003946 Prolactin Human genes 0.000 description 2
- 108010057464 Prolactin Proteins 0.000 description 2
- 102100034836 Proliferation marker protein Ki-67 Human genes 0.000 description 2
- 238000010802 RNA extraction kit Methods 0.000 description 2
- 108091030071 RNAI Proteins 0.000 description 2
- 102100026006 Ribonucleoside-diphosphate reductase subunit M2 Human genes 0.000 description 2
- 108010005173 SERPIN-B5 Proteins 0.000 description 2
- 101000757182 Saccharomyces cerevisiae Glucoamylase S2 Proteins 0.000 description 2
- 241001072909 Salvia Species 0.000 description 2
- 235000007154 Salvia chinensis Nutrition 0.000 description 2
- 240000006079 Schisandra chinensis Species 0.000 description 2
- 235000008422 Schisandra chinensis Nutrition 0.000 description 2
- 108010086019 Secretin Proteins 0.000 description 2
- 102100037505 Secretin Human genes 0.000 description 2
- 102100033004 Securin Human genes 0.000 description 2
- 102100030333 Serpin B5 Human genes 0.000 description 2
- 210000001744 T-lymphocyte Anatomy 0.000 description 2
- 102100038618 Thymidylate synthase Human genes 0.000 description 2
- 102000004887 Transforming Growth Factor beta Human genes 0.000 description 2
- 108090001012 Transforming Growth Factor beta Proteins 0.000 description 2
- 108010009583 Transforming Growth Factors Proteins 0.000 description 2
- 102000009618 Transforming Growth Factors Human genes 0.000 description 2
- 102100037256 Ubiquitin-conjugating enzyme E2 C Human genes 0.000 description 2
- 108091008605 VEGF receptors Proteins 0.000 description 2
- 108010000134 Vascular Cell Adhesion Molecule-1 Proteins 0.000 description 2
- 208000014070 Vestibular schwannoma Diseases 0.000 description 2
- 208000008383 Wilms tumor Diseases 0.000 description 2
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 2
- 230000001594 aberrant effect Effects 0.000 description 2
- 239000002253 acid Substances 0.000 description 2
- 208000004064 acoustic neuroma Diseases 0.000 description 2
- 208000009956 adenocarcinoma Diseases 0.000 description 2
- AKNNEGZIBPJZJG-UHFFFAOYSA-N alpha-noscapine Natural products CN1CCC2=CC=3OCOC=3C(OC)=C2C1C1C2=CC=C(OC)C(OC)=C2C(=O)O1 AKNNEGZIBPJZJG-UHFFFAOYSA-N 0.000 description 2
- 229960004050 aminobenzoic acid Drugs 0.000 description 2
- 239000002246 antineoplastic agent Substances 0.000 description 2
- 230000006907 apoptotic process Effects 0.000 description 2
- FZCSTZYAHCUGEM-UHFFFAOYSA-N aspergillomarasmine B Natural products OC(=O)CNC(C(O)=O)CNC(C(O)=O)CC(O)=O FZCSTZYAHCUGEM-UHFFFAOYSA-N 0.000 description 2
- 235000006533 astragalus Nutrition 0.000 description 2
- 102000012740 beta Adrenergic Receptors Human genes 0.000 description 2
- 108010079452 beta Adrenergic Receptors Proteins 0.000 description 2
- 229960001292 cabozantinib Drugs 0.000 description 2
- ONIQOQHATWINJY-UHFFFAOYSA-N cabozantinib Chemical compound C=12C=C(OC)C(OC)=CC2=NC=CC=1OC(C=C1)=CC=C1NC(=O)C1(C(=O)NC=2C=CC(F)=CC=2)CC1 ONIQOQHATWINJY-UHFFFAOYSA-N 0.000 description 2
- 208000002458 carcinoid tumor Diseases 0.000 description 2
- 208000021668 chronic eosinophilic leukemia Diseases 0.000 description 2
- 238000010367 cloning Methods 0.000 description 2
- 230000002860 competitive effect Effects 0.000 description 2
- 238000004590 computer program Methods 0.000 description 2
- 238000011109 contamination Methods 0.000 description 2
- 238000012937 correction Methods 0.000 description 2
- 238000010219 correlation analysis Methods 0.000 description 2
- 229940127089 cytotoxic agent Drugs 0.000 description 2
- POZRVZJJTULAOH-LHZXLZLDSA-N danazol Chemical compound C1[C@]2(C)[C@H]3CC[C@](C)([C@](CC4)(O)C#C)[C@@H]4[C@@H]3CCC2=CC2=C1C=NO2 POZRVZJJTULAOH-LHZXLZLDSA-N 0.000 description 2
- 229960000766 danazol Drugs 0.000 description 2
- 238000012217 deletion Methods 0.000 description 2
- 230000037430 deletion Effects 0.000 description 2
- 210000004443 dendritic cell Anatomy 0.000 description 2
- LOKCTEFSRHRXRJ-UHFFFAOYSA-I dipotassium trisodium dihydrogen phosphate hydrogen phosphate dichloride Chemical compound P(=O)(O)(O)[O-].[K+].P(=O)(O)([O-])[O-].[Na+].[Na+].[Cl-].[K+].[Cl-].[Na+] LOKCTEFSRHRXRJ-UHFFFAOYSA-I 0.000 description 2
- 238000002224 dissection Methods 0.000 description 2
- 230000003511 endothelial effect Effects 0.000 description 2
- 229940017733 esbriet Drugs 0.000 description 2
- 239000000834 fixative Substances 0.000 description 2
- 108020001507 fusion proteins Proteins 0.000 description 2
- 102000037865 fusion proteins Human genes 0.000 description 2
- 206010017758 gastric cancer Diseases 0.000 description 2
- 230000009368 gene silencing by RNA Effects 0.000 description 2
- 239000011521 glass Substances 0.000 description 2
- 239000003102 growth factor Substances 0.000 description 2
- 230000002489 hematologic effect Effects 0.000 description 2
- 229960002411 imatinib Drugs 0.000 description 2
- KTUFNOKKBVMGRW-UHFFFAOYSA-N imatinib Chemical compound C1CN(C)CCN1CC1=CC=C(C(=O)NC=2C=C(NC=3N=C(C=CN=3)C=3C=NC=CC=3)C(C)=CC=2)C=C1 KTUFNOKKBVMGRW-UHFFFAOYSA-N 0.000 description 2
- 230000001900 immune effect Effects 0.000 description 2
- 239000000367 immunologic factor Substances 0.000 description 2
- 238000001114 immunoprecipitation Methods 0.000 description 2
- 108090000237 interleukin-24 Proteins 0.000 description 2
- 102000003898 interleukin-24 Human genes 0.000 description 2
- 208000032839 leukemia Diseases 0.000 description 2
- 206010024627 liposarcoma Diseases 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 230000036210 malignancy Effects 0.000 description 2
- 206010027191 meningioma Diseases 0.000 description 2
- 238000010197 meta-analysis Methods 0.000 description 2
- 230000001394 metastatic effect Effects 0.000 description 2
- 206010061289 metastatic neoplasm Diseases 0.000 description 2
- 238000002156 mixing Methods 0.000 description 2
- 201000000050 myeloid neoplasm Diseases 0.000 description 2
- PLPRGLOFPNJOTN-UHFFFAOYSA-N narcotine Natural products COc1ccc2C(OC(=O)c2c1OC)C3Cc4c(CN3C)cc5OCOc5c4OC PLPRGLOFPNJOTN-UHFFFAOYSA-N 0.000 description 2
- 229930014626 natural product Natural products 0.000 description 2
- 229960004708 noscapine Drugs 0.000 description 2
- 239000002853 nucleic acid probe Substances 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 201000008968 osteosarcoma Diseases 0.000 description 2
- 238000012567 pattern recognition method Methods 0.000 description 2
- 229960000639 pazopanib Drugs 0.000 description 2
- 229960002621 pembrolizumab Drugs 0.000 description 2
- 239000002953 phosphate buffered saline Substances 0.000 description 2
- 229920000642 polymer Polymers 0.000 description 2
- 229960001131 ponatinib Drugs 0.000 description 2
- 238000000513 principal component analysis Methods 0.000 description 2
- 229940097325 prolactin Drugs 0.000 description 2
- 230000001902 propagating effect Effects 0.000 description 2
- 210000002307 prostate Anatomy 0.000 description 2
- 235000019833 protease Nutrition 0.000 description 2
- 238000011002 quantification Methods 0.000 description 2
- 229940044551 receptor antagonist Drugs 0.000 description 2
- 239000002464 receptor antagonist Substances 0.000 description 2
- FNHKPVJBJVTLMP-UHFFFAOYSA-N regorafenib Chemical compound C1=NC(C(=O)NC)=CC(OC=2C=C(F)C(NC(=O)NC=3C=C(C(Cl)=CC=3)C(F)(F)F)=CC=2)=C1 FNHKPVJBJVTLMP-UHFFFAOYSA-N 0.000 description 2
- 238000010839 reverse transcription Methods 0.000 description 2
- 238000012216 screening Methods 0.000 description 2
- 229960002101 secretin Drugs 0.000 description 2
- OWMZNFCDEHGFEP-NFBCVYDUSA-N secretin human Chemical compound C([C@@H](C(=O)N[C@H](C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C(C)C)C(N)=O)[C@@H](C)O)NC(=O)[C@@H](NC(=O)CNC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC=1NC=NC=1)[C@@H](C)O)C1=CC=CC=C1 OWMZNFCDEHGFEP-NFBCVYDUSA-N 0.000 description 2
- RGYQPQARIQKJKH-UHFFFAOYSA-N setanaxib Chemical compound CN(C)C1=CC=CC(C2=C3C(=O)N(C=4C(=CC=CC=4)Cl)NC3=CC(=O)N2C)=C1 RGYQPQARIQKJKH-UHFFFAOYSA-N 0.000 description 2
- PEGQOIGYZLJMIB-UHFFFAOYSA-N setogepram Chemical compound CCCCCC1=CC=CC(CC(O)=O)=C1 PEGQOIGYZLJMIB-UHFFFAOYSA-N 0.000 description 2
- 150000003384 small molecules Chemical group 0.000 description 2
- 206010041823 squamous cell carcinoma Diseases 0.000 description 2
- 238000010186 staining Methods 0.000 description 2
- 238000010561 standard procedure Methods 0.000 description 2
- 239000007858 starting material Substances 0.000 description 2
- 239000000758 substrate Substances 0.000 description 2
- 229960001796 sunitinib Drugs 0.000 description 2
- WINHZLLDWRZWRT-ATVHPVEESA-N sunitinib Chemical compound CCN(CC)CCNC(=O)C1=C(C)NC(\C=C/2C3=CC(F)=CC=C3NC\2=O)=C1C WINHZLLDWRZWRT-ATVHPVEESA-N 0.000 description 2
- 208000011580 syndromic disease Diseases 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 238000002626 targeted therapy Methods 0.000 description 2
- ZRKFYGHZFMAOKI-QMGMOQQFSA-N tgfbeta Chemical compound C([C@H](NC(=O)[C@H](C(C)C)NC(=O)CNC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CC(C)C)NC(=O)CNC(=O)[C@H](C)NC(=O)[C@H](CO)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](C)NC(=O)[C@H](C)NC(=O)[C@@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CCSC)C(C)C)[C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](C)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N1[C@@H](CCC1)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O)C1=CC=C(O)C=C1 ZRKFYGHZFMAOKI-QMGMOQQFSA-N 0.000 description 2
- 229960000940 tivozanib Drugs 0.000 description 2
- 238000013518 transcription Methods 0.000 description 2
- 230000035897 transcription Effects 0.000 description 2
- 230000002103 transcriptional effect Effects 0.000 description 2
- 229940030325 tumor cell vaccine Drugs 0.000 description 2
- VBEQCZHXXJYVRD-GACYYNSASA-N uroanthelone Chemical compound C([C@@H](C(=O)N[C@H](C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CS)C(=O)N[C@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)NCC(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CS)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(O)=O)C(C)C)[C@@H](C)O)NC(=O)[C@H](CO)NC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CO)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@@H](NC(=O)[C@H](CC=1NC=NC=1)NC(=O)[C@H](CCSC)NC(=O)[C@H](CS)NC(=O)[C@@H](NC(=O)CNC(=O)CNC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CS)NC(=O)[C@H](CC=1C=CC(O)=CC=1)NC(=O)CNC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CC=1C=CC(O)=CC=1)NC(=O)[C@H](CO)NC(=O)[C@H](CO)NC(=O)[C@H]1N(CCC1)C(=O)[C@H](CS)NC(=O)CNC(=O)[C@H]1N(CCC1)C(=O)[C@H](CC=1C=CC(O)=CC=1)NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC(N)=O)C(C)C)[C@@H](C)CC)C1=CC=C(O)C=C1 VBEQCZHXXJYVRD-GACYYNSASA-N 0.000 description 2
- 208000023747 urothelial carcinoma Diseases 0.000 description 2
- UHTHHESEBZOYNR-UHFFFAOYSA-N vandetanib Chemical compound COC1=CC(C(/N=CN2)=N/C=3C(=CC(Br)=CC=3)F)=C2C=C1OCC1CCN(C)CC1 UHTHHESEBZOYNR-UHFFFAOYSA-N 0.000 description 2
- 108010060757 vasostatin Proteins 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 238000001262 western blot Methods 0.000 description 2
- 239000011701 zinc Substances 0.000 description 2
- 229910052725 zinc Inorganic materials 0.000 description 2
- WCWUXEGQKLTGDX-LLVKDONJSA-N (2R)-1-[[4-[(4-fluoro-2-methyl-1H-indol-5-yl)oxy]-5-methyl-6-pyrrolo[2,1-f][1,2,4]triazinyl]oxy]-2-propanol Chemical compound C1=C2NC(C)=CC2=C(F)C(OC2=NC=NN3C=C(C(=C32)C)OC[C@H](O)C)=C1 WCWUXEGQKLTGDX-LLVKDONJSA-N 0.000 description 1
- LOGFVTREOLYCPF-KXNHARMFSA-N (2s,3r)-2-[[(2r)-1-[(2s)-2,6-diaminohexanoyl]pyrrolidine-2-carbonyl]amino]-3-hydroxybutanoic acid Chemical compound C[C@@H](O)[C@@H](C(O)=O)NC(=O)[C@H]1CCCN1C(=O)[C@@H](N)CCCCN LOGFVTREOLYCPF-KXNHARMFSA-N 0.000 description 1
- KCOYQXZDFIIGCY-CZIZESTLSA-N (3e)-4-amino-5-fluoro-3-[5-(4-methylpiperazin-1-yl)-1,3-dihydrobenzimidazol-2-ylidene]quinolin-2-one Chemical compound C1CN(C)CCN1C1=CC=C(N\C(N2)=C/3C(=C4C(F)=CC=CC4=NC\3=O)N)C2=C1 KCOYQXZDFIIGCY-CZIZESTLSA-N 0.000 description 1
- MZOFCQQQCNRIBI-VMXHOPILSA-N (3s)-4-[[(2s)-1-[[(2s)-1-[[(1s)-1-carboxy-2-hydroxyethyl]amino]-4-methyl-1-oxopentan-2-yl]amino]-5-(diaminomethylideneamino)-1-oxopentan-2-yl]amino]-3-[[2-[[(2s)-2,6-diaminohexanoyl]amino]acetyl]amino]-4-oxobutanoic acid Chemical compound OC[C@@H](C(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CC(O)=O)NC(=O)CNC(=O)[C@@H](N)CCCCN MZOFCQQQCNRIBI-VMXHOPILSA-N 0.000 description 1
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 1
- LFKQSJNCVRGFCC-UHFFFAOYSA-N 1-(2,4-difluorophenyl)-3-[4-[(6,7-dimethoxy-4-quinolinyl)oxy]-2-fluorophenyl]urea Chemical compound C=12C=C(OC)C(OC)=CC2=NC=CC=1OC(C=C1F)=CC=C1NC(=O)NC1=CC=C(F)C=C1F LFKQSJNCVRGFCC-UHFFFAOYSA-N 0.000 description 1
- YAHLWSGIQJATGG-UHFFFAOYSA-N 1-(4-chlorophenyl)cyclopropane-1-carboxylic acid Chemical compound C=1C=C(Cl)C=CC=1C1(C(=O)O)CC1 YAHLWSGIQJATGG-UHFFFAOYSA-N 0.000 description 1
- DEEOXSOLTLIWMG-UHFFFAOYSA-N 1-[2-[5-(2-methoxyethoxy)-1-benzimidazolyl]-8-quinolinyl]-4-piperidinamine Chemical compound C1=NC2=CC(OCCOC)=CC=C2N1C(N=C12)=CC=C1C=CC=C2N1CCC(N)CC1 DEEOXSOLTLIWMG-UHFFFAOYSA-N 0.000 description 1
- VPBYZLCHOKSGRX-UHFFFAOYSA-N 1-[2-chloro-4-(6,7-dimethoxyquinazolin-4-yl)oxyphenyl]-3-propylurea Chemical compound C1=C(Cl)C(NC(=O)NCCC)=CC=C1OC1=NC=NC2=CC(OC)=C(OC)C=C12 VPBYZLCHOKSGRX-UHFFFAOYSA-N 0.000 description 1
- UEJJHQNACJXSKW-UHFFFAOYSA-N 2-(2,6-dioxopiperidin-3-yl)-1H-isoindole-1,3(2H)-dione Chemical compound O=C1C2=CC=CC=C2C(=O)N1C1CCC(=O)NC1=O UEJJHQNACJXSKW-UHFFFAOYSA-N 0.000 description 1
- 102000008490 2-Oxoglutarate 5-Dioxygenase Procollagen-Lysine Human genes 0.000 description 1
- 108010020504 2-Oxoglutarate 5-Dioxygenase Procollagen-Lysine Proteins 0.000 description 1
- KPGXRSRHYNQIFN-UHFFFAOYSA-N 2-oxoglutaric acid Chemical compound OC(=O)CCC(=O)C(O)=O KPGXRSRHYNQIFN-UHFFFAOYSA-N 0.000 description 1
- NHFDRBXTEDBWCZ-ZROIWOOFSA-N 3-[2,4-dimethyl-5-[(z)-(2-oxo-1h-indol-3-ylidene)methyl]-1h-pyrrol-3-yl]propanoic acid Chemical compound OC(=O)CCC1=C(C)NC(\C=C/2C3=CC=CC=C3NC\2=O)=C1C NHFDRBXTEDBWCZ-ZROIWOOFSA-N 0.000 description 1
- QFCXANHHBCGMAS-UHFFFAOYSA-N 4-[[4-(4-chloroanilino)furo[2,3-d]pyridazin-7-yl]oxymethyl]-n-methylpyridine-2-carboxamide Chemical compound C1=NC(C(=O)NC)=CC(COC=2C=3OC=CC=3C(NC=3C=CC(Cl)=CC=3)=NN=2)=C1 QFCXANHHBCGMAS-UHFFFAOYSA-N 0.000 description 1
- XXLPVQZYQCGXOV-UHFFFAOYSA-N 4-amino-5-fluoro-3-[6-(4-methylpiperazin-1-yl)-1H-benzimidazol-2-yl]-1H-quinolin-2-one 2-hydroxypropanoic acid Chemical compound CC(O)C(O)=O.CC(O)C(O)=O.CN1CCN(CC1)c1ccc2nc([nH]c2c1)-c1c(N)c2c(F)cccc2[nH]c1=O XXLPVQZYQCGXOV-UHFFFAOYSA-N 0.000 description 1
- OQRXBXNATIHDQO-UHFFFAOYSA-N 6-chloropyridine-3,4-diamine Chemical compound NC1=CN=C(Cl)C=C1N OQRXBXNATIHDQO-UHFFFAOYSA-N 0.000 description 1
- JGEBLDKNWBUGRZ-HXUWFJFHSA-N 9-[[[(2r)-1,4-dioxan-2-yl]methyl-methylsulfamoyl]amino]-2-(1-methylpyrazol-4-yl)-11-oxobenzo[1,2]cyclohepta[2,4-b]pyridine Chemical compound C=1C=C2C=CC3=NC=C(C4=CN(C)N=C4)C=C3C(=O)C2=CC=1NS(=O)(=O)N(C)C[C@@H]1COCCO1 JGEBLDKNWBUGRZ-HXUWFJFHSA-N 0.000 description 1
- 102100022900 Actin, cytoplasmic 1 Human genes 0.000 description 1
- 108010085238 Actins Proteins 0.000 description 1
- 201000003076 Angiosarcoma Diseases 0.000 description 1
- 108020005544 Antisense RNA Proteins 0.000 description 1
- 102000004411 Antithrombin III Human genes 0.000 description 1
- 108090000935 Antithrombin III Proteins 0.000 description 1
- 206010073360 Appendix cancer Diseases 0.000 description 1
- 108010002913 Asialoglycoproteins Proteins 0.000 description 1
- 206010003571 Astrocytoma Diseases 0.000 description 1
- 108090001008 Avidin Proteins 0.000 description 1
- 208000010839 B-cell chronic lymphocytic leukemia Diseases 0.000 description 1
- 108010074708 B7-H1 Antigen Proteins 0.000 description 1
- MLDQJTXFUGDVEO-UHFFFAOYSA-N BAY-43-9006 Chemical compound C1=NC(C(=O)NC)=CC(OC=2C=CC(NC(=O)NC=3C=C(C(Cl)=CC=3)C(F)(F)F)=CC=2)=C1 MLDQJTXFUGDVEO-UHFFFAOYSA-N 0.000 description 1
- 206010004146 Basal cell carcinoma Diseases 0.000 description 1
- 108010027344 Basic Helix-Loop-Helix Transcription Factors Proteins 0.000 description 1
- 102000018720 Basic Helix-Loop-Helix Transcription Factors Human genes 0.000 description 1
- 241000212384 Bifora Species 0.000 description 1
- 206010004593 Bile duct cancer Diseases 0.000 description 1
- 241000283690 Bos taurus Species 0.000 description 1
- 108700020472 CDC20 Proteins 0.000 description 1
- 102100035632 Calcyphosin Human genes 0.000 description 1
- 101710085913 Calcyphosin Proteins 0.000 description 1
- 102100025570 Cancer/testis antigen 1 Human genes 0.000 description 1
- 241000282472 Canis lupus familiaris Species 0.000 description 1
- 206010007275 Carcinoid tumour Diseases 0.000 description 1
- 102000014914 Carrier Proteins Human genes 0.000 description 1
- 108010078791 Carrier Proteins Proteins 0.000 description 1
- 101150023302 Cdc20 gene Proteins 0.000 description 1
- 102100038099 Cell division cycle protein 20 homolog Human genes 0.000 description 1
- 206010008342 Cervix carcinoma Diseases 0.000 description 1
- 108010008978 Chemokine CXCL10 Proteins 0.000 description 1
- 208000005243 Chondrosarcoma Diseases 0.000 description 1
- 201000009047 Chordoma Diseases 0.000 description 1
- 208000006332 Choriocarcinoma Diseases 0.000 description 1
- 102100024484 Codanin-1 Human genes 0.000 description 1
- 108010001463 Collagen Type XVIII Proteins 0.000 description 1
- 102000047200 Collagen Type XVIII Human genes 0.000 description 1
- 108010071942 Colony-Stimulating Factors Proteins 0.000 description 1
- 208000001333 Colorectal Neoplasms Diseases 0.000 description 1
- 208000009798 Craniopharyngioma Diseases 0.000 description 1
- 102100028908 Cullin-3 Human genes 0.000 description 1
- CMSMOCZEIVJLDB-UHFFFAOYSA-N Cyclophosphamide Chemical compound ClCCN(CCCl)P1(=O)NCCCO1 CMSMOCZEIVJLDB-UHFFFAOYSA-N 0.000 description 1
- PMATZTZNYRCHOR-CGLBZJNRSA-N Cyclosporin A Chemical compound CC[C@@H]1NC(=O)[C@H]([C@H](O)[C@H](C)C\C=C\C)N(C)C(=O)[C@H](C(C)C)N(C)C(=O)[C@H](CC(C)C)N(C)C(=O)[C@H](CC(C)C)N(C)C(=O)[C@@H](C)NC(=O)[C@H](C)NC(=O)[C@H](CC(C)C)N(C)C(=O)[C@H](C(C)C)NC(=O)[C@H](CC(C)C)N(C)C(=O)CN(C)C1=O PMATZTZNYRCHOR-CGLBZJNRSA-N 0.000 description 1
- 108010036949 Cyclosporine Proteins 0.000 description 1
- 108010015742 Cytochrome P-450 Enzyme System Proteins 0.000 description 1
- 102000003849 Cytochrome P450 Human genes 0.000 description 1
- 102000053602 DNA Human genes 0.000 description 1
- 108020003215 DNA Probes Proteins 0.000 description 1
- 238000000018 DNA microarray Methods 0.000 description 1
- 239000003298 DNA probe Substances 0.000 description 1
- 230000004568 DNA-binding Effects 0.000 description 1
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 1
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 1
- 101710088194 Dehydrogenase Proteins 0.000 description 1
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 1
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 1
- 108010053770 Deoxyribonucleases Proteins 0.000 description 1
- 102000016911 Deoxyribonucleases Human genes 0.000 description 1
- 102000016680 Dioxygenases Human genes 0.000 description 1
- 108010028143 Dioxygenases Proteins 0.000 description 1
- 101800001224 Disintegrin Proteins 0.000 description 1
- 201000009051 Embryonal Carcinoma Diseases 0.000 description 1
- 102000004190 Enzymes Human genes 0.000 description 1
- 108090000790 Enzymes Proteins 0.000 description 1
- 206010014950 Eosinophilia Diseases 0.000 description 1
- 206010014967 Ependymoma Diseases 0.000 description 1
- 102400001368 Epidermal growth factor Human genes 0.000 description 1
- 101800003838 Epidermal growth factor Proteins 0.000 description 1
- 241000283086 Equidae Species 0.000 description 1
- 208000000461 Esophageal Neoplasms Diseases 0.000 description 1
- 108700039887 Essential Genes Proteins 0.000 description 1
- 208000032027 Essential Thrombocythemia Diseases 0.000 description 1
- 108090000371 Esterases Proteins 0.000 description 1
- 208000006168 Ewing Sarcoma Diseases 0.000 description 1
- 108060002716 Exonuclease Proteins 0.000 description 1
- 241000282326 Felis catus Species 0.000 description 1
- 102000003974 Fibroblast growth factor 2 Human genes 0.000 description 1
- 108090000379 Fibroblast growth factor 2 Proteins 0.000 description 1
- 229940123256 Fibroblast growth factor antagonist Drugs 0.000 description 1
- 102100023600 Fibroblast growth factor receptor 2 Human genes 0.000 description 1
- 101710182389 Fibroblast growth factor receptor 2 Proteins 0.000 description 1
- 102100027842 Fibroblast growth factor receptor 3 Human genes 0.000 description 1
- 101710182396 Fibroblast growth factor receptor 3 Proteins 0.000 description 1
- 201000008808 Fibrosarcoma Diseases 0.000 description 1
- 229940032072 GVAX vaccine Drugs 0.000 description 1
- 208000032612 Glial tumor Diseases 0.000 description 1
- 206010018338 Glioma Diseases 0.000 description 1
- 102100036646 Glutamyl-tRNA(Gln) amidotransferase subunit A, mitochondrial Human genes 0.000 description 1
- SXRSQZLOMIGNAQ-UHFFFAOYSA-N Glutaraldehyde Chemical compound O=CCCCC=O SXRSQZLOMIGNAQ-UHFFFAOYSA-N 0.000 description 1
- 102100031181 Glyceraldehyde-3-phosphate dehydrogenase Human genes 0.000 description 1
- 102000004269 Granulocyte Colony-Stimulating Factor Human genes 0.000 description 1
- 108010017080 Granulocyte Colony-Stimulating Factor Proteins 0.000 description 1
- 108010017213 Granulocyte-Macrophage Colony-Stimulating Factor Proteins 0.000 description 1
- 102100039620 Granulocyte-macrophage colony-stimulating factor Human genes 0.000 description 1
- 208000001258 Hemangiosarcoma Diseases 0.000 description 1
- 208000002250 Hematologic Neoplasms Diseases 0.000 description 1
- 108010061414 Hepatocyte Nuclear Factor 1-beta Proteins 0.000 description 1
- 102100022123 Hepatocyte nuclear factor 1-beta Human genes 0.000 description 1
- 208000017604 Hodgkin disease Diseases 0.000 description 1
- 208000021519 Hodgkin lymphoma Diseases 0.000 description 1
- 208000010747 Hodgkins lymphoma Diseases 0.000 description 1
- 101000693076 Homo sapiens Angiopoietin-related protein 4 Proteins 0.000 description 1
- 101000858088 Homo sapiens C-X-C motif chemokine 10 Proteins 0.000 description 1
- 101000856237 Homo sapiens Cancer/testis antigen 1 Proteins 0.000 description 1
- 101000980888 Homo sapiens Codanin-1 Proteins 0.000 description 1
- 101000916238 Homo sapiens Cullin-3 Proteins 0.000 description 1
- 101001072655 Homo sapiens Glutamyl-tRNA(Gln) amidotransferase subunit A, mitochondrial Proteins 0.000 description 1
- 101000576802 Homo sapiens Mesothelin Proteins 0.000 description 1
- 101000669513 Homo sapiens Metalloproteinase inhibitor 1 Proteins 0.000 description 1
- 101000645296 Homo sapiens Metalloproteinase inhibitor 2 Proteins 0.000 description 1
- 101000831266 Homo sapiens Metalloproteinase inhibitor 4 Proteins 0.000 description 1
- 101000904196 Homo sapiens Pancreatic secretory granule membrane major glycoprotein GP2 Proteins 0.000 description 1
- 101001012157 Homo sapiens Receptor tyrosine-protein kinase erbB-2 Proteins 0.000 description 1
- 101000742859 Homo sapiens Retinoblastoma-associated protein Proteins 0.000 description 1
- 101000655352 Homo sapiens Telomerase reverse transcriptase Proteins 0.000 description 1
- 101000830596 Homo sapiens Tumor necrosis factor ligand superfamily member 15 Proteins 0.000 description 1
- 101000987003 Homo sapiens Tumor protein 63 Proteins 0.000 description 1
- 101001052849 Homo sapiens Tyrosine-protein kinase Fer Proteins 0.000 description 1
- 101000760337 Homo sapiens Urokinase plasminogen activator surface receptor Proteins 0.000 description 1
- 206010048643 Hypereosinophilic syndrome Diseases 0.000 description 1
- 206010061218 Inflammation Diseases 0.000 description 1
- 201000003803 Inflammatory myofibroblastic tumor Diseases 0.000 description 1
- 206010067917 Inflammatory myofibroblastic tumour Diseases 0.000 description 1
- 108010064600 Intercellular Adhesion Molecule-3 Proteins 0.000 description 1
- 101710148794 Intercellular adhesion molecule 2 Proteins 0.000 description 1
- 102100037872 Intercellular adhesion molecule 2 Human genes 0.000 description 1
- 102100037871 Intercellular adhesion molecule 3 Human genes 0.000 description 1
- 102100026720 Interferon beta Human genes 0.000 description 1
- 102100037850 Interferon gamma Human genes 0.000 description 1
- 102000003996 Interferon-beta Human genes 0.000 description 1
- 102000003777 Interleukin-1 beta Human genes 0.000 description 1
- 108090000193 Interleukin-1 beta Proteins 0.000 description 1
- 108010002350 Interleukin-2 Proteins 0.000 description 1
- 108090000978 Interleukin-4 Proteins 0.000 description 1
- 108010063738 Interleukins Proteins 0.000 description 1
- 102000015696 Interleukins Human genes 0.000 description 1
- 102000011845 Iodide peroxidase Human genes 0.000 description 1
- 108010036012 Iodide peroxidase Proteins 0.000 description 1
- 208000008839 Kidney Neoplasms Diseases 0.000 description 1
- 108010062028 L-BLP25 Proteins 0.000 description 1
- JZKXXXDKRQWDET-QMMMGPOBSA-N L-m-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=CC(O)=C1 JZKXXXDKRQWDET-QMMMGPOBSA-N 0.000 description 1
- 239000002138 L01XE21 - Regorafenib Substances 0.000 description 1
- 239000002139 L01XE22 - Masitinib Substances 0.000 description 1
- 208000018142 Leiomyosarcoma Diseases 0.000 description 1
- 241000186779 Listeria monocytogenes Species 0.000 description 1
- 208000007433 Lymphatic Metastasis Diseases 0.000 description 1
- 239000004472 Lysine Substances 0.000 description 1
- 108700018351 Major Histocompatibility Complex Proteins 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 208000025205 Mantle-Cell Lymphoma Diseases 0.000 description 1
- 208000007054 Medullary Carcinoma Diseases 0.000 description 1
- 208000002030 Merkel cell carcinoma Diseases 0.000 description 1
- 102000005741 Metalloproteases Human genes 0.000 description 1
- 108010006035 Metalloproteases Proteins 0.000 description 1
- 102100039364 Metalloproteinase inhibitor 1 Human genes 0.000 description 1
- 102100026262 Metalloproteinase inhibitor 2 Human genes 0.000 description 1
- 102100026261 Metalloproteinase inhibitor 3 Human genes 0.000 description 1
- 102100024289 Metalloproteinase inhibitor 4 Human genes 0.000 description 1
- 102000000562 Monocarboxylic Acid Transporters Human genes 0.000 description 1
- 108010041817 Monocarboxylic Acid Transporters Proteins 0.000 description 1
- 108010008707 Mucin-1 Proteins 0.000 description 1
- 108010063954 Mucins Proteins 0.000 description 1
- 102000015728 Mucins Human genes 0.000 description 1
- 208000034578 Multiple myelomas Diseases 0.000 description 1
- 241000187644 Mycobacterium vaccae Species 0.000 description 1
- FFDGPVCHZBVARC-UHFFFAOYSA-N N,N-dimethylglycine Chemical compound CN(C)CC(O)=O FFDGPVCHZBVARC-UHFFFAOYSA-N 0.000 description 1
- FOFDIMHVKGYHRU-UHFFFAOYSA-N N-(1,3-benzodioxol-5-ylmethyl)-4-(4-benzofuro[3,2-d]pyrimidinyl)-1-piperazinecarbothioamide Chemical compound C12=CC=CC=C2OC2=C1N=CN=C2N(CC1)CCN1C(=S)NCC1=CC=C(OCO2)C2=C1 FOFDIMHVKGYHRU-UHFFFAOYSA-N 0.000 description 1
- 125000003047 N-acetyl group Chemical group 0.000 description 1
- 206010029260 Neuroblastoma Diseases 0.000 description 1
- 102100028762 Neuropilin-1 Human genes 0.000 description 1
- 108090000772 Neuropilin-1 Proteins 0.000 description 1
- 101100202932 Neurospora crassa (strain ATCC 24698 / 74-OR23-1A / CBS 708.71 / DSM 1257 / FGSC 987) tsp-4 gene Proteins 0.000 description 1
- 101100202938 Neurospora crassa (strain ATCC 24698 / 74-OR23-1A / CBS 708.71 / DSM 1257 / FGSC 987) tsp-5 gene Proteins 0.000 description 1
- CTQNGGLPUBDAKN-UHFFFAOYSA-N O-Xylene Chemical compound CC1=CC=CC=C1C CTQNGGLPUBDAKN-UHFFFAOYSA-N 0.000 description 1
- 201000010133 Oligodendroglioma Diseases 0.000 description 1
- 108700020796 Oncogene Proteins 0.000 description 1
- 241000283973 Oryctolagus cuniculus Species 0.000 description 1
- 206010061535 Ovarian neoplasm Diseases 0.000 description 1
- 108010091640 PAX8 Transcription Factor Proteins 0.000 description 1
- 102000018549 PAX8 Transcription Factor Human genes 0.000 description 1
- 238000010222 PCR analysis Methods 0.000 description 1
- NVRXTLZYXZNATH-UHFFFAOYSA-N PP121 Chemical compound N1=C(C=2C=C3C=CNC3=NC=2)C=2C(N)=NC=NC=2N1C1CCCC1 NVRXTLZYXZNATH-UHFFFAOYSA-N 0.000 description 1
- 206010061902 Pancreatic neoplasm Diseases 0.000 description 1
- 102100024019 Pancreatic secretory granule membrane major glycoprotein GP2 Human genes 0.000 description 1
- 241001494479 Pecora Species 0.000 description 1
- 102000002508 Peptide Elongation Factors Human genes 0.000 description 1
- 108010068204 Peptide Elongation Factors Proteins 0.000 description 1
- 101710177166 Phosphoprotein Proteins 0.000 description 1
- 208000007641 Pinealoma Diseases 0.000 description 1
- 108010051742 Platelet-Derived Growth Factor beta Receptor Proteins 0.000 description 1
- 102100030485 Platelet-derived growth factor receptor alpha Human genes 0.000 description 1
- 101710148465 Platelet-derived growth factor receptor alpha Proteins 0.000 description 1
- 102100026547 Platelet-derived growth factor receptor beta Human genes 0.000 description 1
- 108091036407 Polyadenylation Proteins 0.000 description 1
- 239000004365 Protease Substances 0.000 description 1
- 102100027378 Prothrombin Human genes 0.000 description 1
- 108010094028 Prothrombin Proteins 0.000 description 1
- 108010018070 Proto-Oncogene Proteins c-ets Proteins 0.000 description 1
- 102000004053 Proto-Oncogene Proteins c-ets Human genes 0.000 description 1
- 102100036286 Purine nucleoside phosphorylase Human genes 0.000 description 1
- 108010066717 Q beta Replicase Proteins 0.000 description 1
- 101100208249 Rattus norvegicus Thbs4 gene Proteins 0.000 description 1
- 102100030086 Receptor tyrosine-protein kinase erbB-2 Human genes 0.000 description 1
- 208000015634 Rectal Neoplasms Diseases 0.000 description 1
- 206010038389 Renal cancer Diseases 0.000 description 1
- 102400001051 Restin Human genes 0.000 description 1
- 102100038042 Retinoblastoma-associated protein Human genes 0.000 description 1
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 1
- 208000004337 Salivary Gland Neoplasms Diseases 0.000 description 1
- 206010061934 Salivary gland cancer Diseases 0.000 description 1
- 101100010298 Schizosaccharomyces pombe (strain 972 / ATCC 24843) pol2 gene Proteins 0.000 description 1
- 229940123578 Selectin antagonist Drugs 0.000 description 1
- 201000010208 Seminoma Diseases 0.000 description 1
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 description 1
- XUIMIQQOPSSXEZ-UHFFFAOYSA-N Silicon Chemical compound [Si] XUIMIQQOPSSXEZ-UHFFFAOYSA-N 0.000 description 1
- 206010041067 Small cell lung cancer Diseases 0.000 description 1
- 208000021712 Soft tissue sarcoma Diseases 0.000 description 1
- 208000005718 Stomach Neoplasms Diseases 0.000 description 1
- 238000000692 Student's t-test Methods 0.000 description 1
- 241000282887 Suidae Species 0.000 description 1
- 201000008736 Systemic mastocytosis Diseases 0.000 description 1
- 230000006052 T cell proliferation Effects 0.000 description 1
- 108010034610 TG4010 Proteins 0.000 description 1
- 208000024313 Testicular Neoplasms Diseases 0.000 description 1
- 206010057644 Testis cancer Diseases 0.000 description 1
- 108010046722 Thrombospondin 1 Proteins 0.000 description 1
- 201000009365 Thymic carcinoma Diseases 0.000 description 1
- 108010034949 Thyroglobulin Proteins 0.000 description 1
- 102000009843 Thyroglobulin Human genes 0.000 description 1
- 108010031429 Tissue Inhibitor of Metalloproteinase-3 Proteins 0.000 description 1
- 102000002689 Toll-like receptor Human genes 0.000 description 1
- 108020000411 Toll-like receptor Proteins 0.000 description 1
- 108010048999 Transcription Factor 3 Proteins 0.000 description 1
- 108010048992 Transcription Factor 4 Proteins 0.000 description 1
- 102100023489 Transcription factor 4 Human genes 0.000 description 1
- 102100038313 Transcription factor E2-alpha Human genes 0.000 description 1
- 102000004357 Transferases Human genes 0.000 description 1
- 108090000992 Transferases Proteins 0.000 description 1
- 102000011117 Transforming Growth Factor beta2 Human genes 0.000 description 1
- 101800000304 Transforming growth factor beta-2 Proteins 0.000 description 1
- 102000013394 Troponin I Human genes 0.000 description 1
- 108010065729 Troponin I Proteins 0.000 description 1
- 108700025716 Tumor Suppressor Genes Proteins 0.000 description 1
- 102000044209 Tumor Suppressor Genes Human genes 0.000 description 1
- 108010078814 Tumor Suppressor Protein p53 Proteins 0.000 description 1
- 102100024537 Tyrosine-protein kinase Fer Human genes 0.000 description 1
- 108090000848 Ubiquitin Proteins 0.000 description 1
- 102400000757 Ubiquitin Human genes 0.000 description 1
- 102100024689 Urokinase plasminogen activator surface receptor Human genes 0.000 description 1
- 208000006105 Uterine Cervical Neoplasms Diseases 0.000 description 1
- 208000002495 Uterine Neoplasms Diseases 0.000 description 1
- 108010053096 Vascular Endothelial Growth Factor Receptor-1 Proteins 0.000 description 1
- 102000009484 Vascular Endothelial Growth Factor Receptors Human genes 0.000 description 1
- 102100033178 Vascular endothelial growth factor receptor 1 Human genes 0.000 description 1
- 102100033177 Vascular endothelial growth factor receptor 2 Human genes 0.000 description 1
- 229940029042 WT1 peptide vaccine Drugs 0.000 description 1
- 238000002835 absorbance Methods 0.000 description 1
- 208000017733 acquired polycythemia vera Diseases 0.000 description 1
- 229940099550 actimmune Drugs 0.000 description 1
- 201000005188 adrenal gland cancer Diseases 0.000 description 1
- 208000024447 adrenal gland neoplasm Diseases 0.000 description 1
- 229960002833 aflibercept Drugs 0.000 description 1
- 238000007818 agglutination assay Methods 0.000 description 1
- SHGAZHPCJJPHSC-YCNIQYBTSA-N all-trans-retinoic acid Chemical compound OC(=O)\C=C(/C)\C=C\C=C(/C)\C=C\C1=C(C)CCCC1(C)C SHGAZHPCJJPHSC-YCNIQYBTSA-N 0.000 description 1
- 229940030664 allogeneic tumor cell vaccine Drugs 0.000 description 1
- 229950001537 amatuximab Drugs 0.000 description 1
- 229950009545 amuvatinib Drugs 0.000 description 1
- 239000012491 analyte Substances 0.000 description 1
- 210000003484 anatomy Anatomy 0.000 description 1
- 230000033115 angiogenesis Effects 0.000 description 1
- 230000000692 anti-sense effect Effects 0.000 description 1
- 229960005348 antithrombin iii Drugs 0.000 description 1
- 208000021780 appendiceal neoplasm Diseases 0.000 description 1
- 239000007864 aqueous solution Substances 0.000 description 1
- 229960003852 atezolizumab Drugs 0.000 description 1
- 230000002238 attenuated effect Effects 0.000 description 1
- 229950002916 avelumab Drugs 0.000 description 1
- 210000003719 b-lymphocyte Anatomy 0.000 description 1
- 238000003705 background correction Methods 0.000 description 1
- 229950007843 bavituximab Drugs 0.000 description 1
- 238000013398 bayesian method Methods 0.000 description 1
- 230000006399 behavior Effects 0.000 description 1
- 229960000397 bevacizumab Drugs 0.000 description 1
- 201000007180 bile duct carcinoma Diseases 0.000 description 1
- 201000009036 biliary tract cancer Diseases 0.000 description 1
- 208000020790 biliary tract neoplasm Diseases 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 235000020958 biotin Nutrition 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- 201000001531 bladder carcinoma Diseases 0.000 description 1
- 201000000053 blastoma Diseases 0.000 description 1
- 201000000220 brain stem cancer Diseases 0.000 description 1
- 208000003362 bronchogenic carcinoma Diseases 0.000 description 1
- 201000005200 bronchus cancer Diseases 0.000 description 1
- 239000000872 buffer Substances 0.000 description 1
- 239000008366 buffered solution Substances 0.000 description 1
- 210000004900 c-terminal fragment Anatomy 0.000 description 1
- AIYUHDOJVYHVIT-UHFFFAOYSA-M caesium chloride Chemical compound [Cl-].[Cs+] AIYUHDOJVYHVIT-UHFFFAOYSA-M 0.000 description 1
- 229940056434 caprelsa Drugs 0.000 description 1
- 210000000845 cartilage Anatomy 0.000 description 1
- 230000021164 cell adhesion Effects 0.000 description 1
- 239000006285 cell suspension Substances 0.000 description 1
- 210000003169 central nervous system Anatomy 0.000 description 1
- 201000007455 central nervous system cancer Diseases 0.000 description 1
- 201000010881 cervical cancer Diseases 0.000 description 1
- 229960005395 cetuximab Drugs 0.000 description 1
- 230000005929 chemotherapeutic response Effects 0.000 description 1
- 238000000546 chi-square test Methods 0.000 description 1
- 210000000349 chromosome Anatomy 0.000 description 1
- 229960001265 ciclosporin Drugs 0.000 description 1
- 238000007635 classification algorithm Methods 0.000 description 1
- 238000007621 cluster analysis Methods 0.000 description 1
- 238000012875 competitive assay Methods 0.000 description 1
- 239000003184 complementary RNA Substances 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 238000000205 computational method Methods 0.000 description 1
- 239000013068 control sample Substances 0.000 description 1
- 229950009240 crenolanib Drugs 0.000 description 1
- DYNHJHQFHQTFTP-UHFFFAOYSA-N crenolanib Chemical compound C=1C=C2N(C=3N=C4C(N5CCC(N)CC5)=CC=CC4=CC=3)C=NC2=CC=1OCC1(C)COC1 DYNHJHQFHQTFTP-UHFFFAOYSA-N 0.000 description 1
- 238000013211 curve analysis Methods 0.000 description 1
- 208000017763 cutaneous neuroendocrine carcinoma Diseases 0.000 description 1
- 229960004397 cyclophosphamide Drugs 0.000 description 1
- 229930182912 cyclosporin Natural products 0.000 description 1
- 108010038764 cytoplasmic linker protein 170 Proteins 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 229950007998 demcizumab Drugs 0.000 description 1
- 229940029030 dendritic cell vaccine Drugs 0.000 description 1
- 108010017271 denileukin diftitox Proteins 0.000 description 1
- 229960002923 denileukin diftitox Drugs 0.000 description 1
- 238000000432 density-gradient centrifugation Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 235000014113 dietary fatty acids Nutrition 0.000 description 1
- 108700003601 dimethylglycine Proteins 0.000 description 1
- 239000002934 diuretic Substances 0.000 description 1
- 229950005778 dovitinib Drugs 0.000 description 1
- 229950009791 durvalumab Drugs 0.000 description 1
- 239000012636 effector Substances 0.000 description 1
- 201000008184 embryoma Diseases 0.000 description 1
- 229940088598 enzyme Drugs 0.000 description 1
- 229940116977 epidermal growth factor Drugs 0.000 description 1
- 208000037828 epithelial carcinoma Diseases 0.000 description 1
- 210000002919 epithelial cell Anatomy 0.000 description 1
- 201000004101 esophageal cancer Diseases 0.000 description 1
- 102000013165 exonuclease Human genes 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 238000011985 exploratory data analysis Methods 0.000 description 1
- 238000002710 external beam radiation therapy Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 229930195729 fatty acid Natural products 0.000 description 1
- 239000000194 fatty acid Substances 0.000 description 1
- 150000004665 fatty acids Chemical class 0.000 description 1
- 239000007850 fluorescent dye Substances 0.000 description 1
- 201000003444 follicular lymphoma Diseases 0.000 description 1
- 239000012520 frozen sample Substances 0.000 description 1
- 238000007306 functionalization reaction Methods 0.000 description 1
- 230000005021 gait Effects 0.000 description 1
- 210000004475 gamma-delta t lymphocyte Anatomy 0.000 description 1
- 150000002270 gangliosides Chemical class 0.000 description 1
- 208000010749 gastric carcinoma Diseases 0.000 description 1
- 208000015419 gastrin-producing neuroendocrine tumor Diseases 0.000 description 1
- 201000000052 gastrinoma Diseases 0.000 description 1
- 201000011243 gastrointestinal stromal tumor Diseases 0.000 description 1
- 238000012817 gel-diffusion technique Methods 0.000 description 1
- 238000011223 gene expression profiling Methods 0.000 description 1
- 238000007429 general method Methods 0.000 description 1
- 238000003205 genotyping method Methods 0.000 description 1
- 125000002566 glucosaminyl group Chemical group 0.000 description 1
- 108020004445 glyceraldehyde-3-phosphate dehydrogenase Proteins 0.000 description 1
- 201000010536 head and neck cancer Diseases 0.000 description 1
- 208000014829 head and neck neoplasm Diseases 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 201000002222 hemangioblastoma Diseases 0.000 description 1
- 201000005787 hematologic cancer Diseases 0.000 description 1
- 208000024200 hematopoietic and lymphoid system neoplasm Diseases 0.000 description 1
- 230000001744 histochemical effect Effects 0.000 description 1
- 230000002962 histologic effect Effects 0.000 description 1
- 229940049235 iclusig Drugs 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 229960003685 imatinib mesylate Drugs 0.000 description 1
- YLMAHDNUQAMNNX-UHFFFAOYSA-N imatinib methanesulfonate Chemical compound CS(O)(=O)=O.C1CN(C)CCN1CC1=CC=C(C(=O)NC=2C=C(NC=3N=C(C=CN=3)C=3C=NC=CC=3)C(C)=CC=2)C=C1 YLMAHDNUQAMNNX-UHFFFAOYSA-N 0.000 description 1
- 230000000951 immunodiffusion Effects 0.000 description 1
- 238000002991 immunohistochemical analysis Methods 0.000 description 1
- 230000002055 immunohistochemical effect Effects 0.000 description 1
- 238000003364 immunohistochemistry Methods 0.000 description 1
- 238000012308 immunohistochemistry method Methods 0.000 description 1
- 229960001438 immunostimulant agent Drugs 0.000 description 1
- 238000007901 in situ hybridization Methods 0.000 description 1
- 238000010348 incorporation Methods 0.000 description 1
- 230000006698 induction Effects 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 230000004054 inflammatory process Effects 0.000 description 1
- 229940005319 inlyta Drugs 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 108010042414 interferon gamma-1b Proteins 0.000 description 1
- 229960001388 interferon-beta Drugs 0.000 description 1
- 229960005386 ipilimumab Drugs 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- 238000012729 kappa analysis Methods 0.000 description 1
- 210000002510 keratinocyte Anatomy 0.000 description 1
- 201000010982 kidney cancer Diseases 0.000 description 1
- 229960004942 lenalidomide Drugs 0.000 description 1
- GOTYRUGSSMKFNF-UHFFFAOYSA-N lenalidomide Chemical compound C1C=2C(N)=CC=CC=2C(=O)N1C1CCC(=O)NC1=O GOTYRUGSSMKFNF-UHFFFAOYSA-N 0.000 description 1
- 230000023404 leukocyte cell-cell adhesion Effects 0.000 description 1
- 238000009092 lines of therapy Methods 0.000 description 1
- MPVGZUGXCQEXTM-UHFFFAOYSA-N linifanib Chemical compound CC1=CC=C(F)C(NC(=O)NC=2C=CC(=CC=2)C=2C=3C(N)=NNC=3C=CC=2)=C1 MPVGZUGXCQEXTM-UHFFFAOYSA-N 0.000 description 1
- 239000002502 liposome Substances 0.000 description 1
- 201000007270 liver cancer Diseases 0.000 description 1
- 208000014018 liver neoplasm Diseases 0.000 description 1
- 238000007477 logistic regression Methods 0.000 description 1
- 210000004072 lung Anatomy 0.000 description 1
- 208000037829 lymphangioendotheliosarcoma Diseases 0.000 description 1
- 208000012804 lymphangiosarcoma Diseases 0.000 description 1
- 230000001926 lymphatic effect Effects 0.000 description 1
- 210000003810 lymphokine-activated killer cell Anatomy 0.000 description 1
- 238000010841 mRNA extraction Methods 0.000 description 1
- 230000003211 malignant effect Effects 0.000 description 1
- 208000015486 malignant pancreatic neoplasm Diseases 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 229960004655 masitinib Drugs 0.000 description 1
- WJEOLQLKVOPQFV-UHFFFAOYSA-N masitinib Chemical compound C1CN(C)CCN1CC1=CC=C(C(=O)NC=2C=C(NC=3SC=C(N=3)C=3C=NC=CC=3)C(C)=CC=2)C=C1 WJEOLQLKVOPQFV-UHFFFAOYSA-N 0.000 description 1
- 229950008001 matuzumab Drugs 0.000 description 1
- 208000023356 medullary thyroid gland carcinoma Diseases 0.000 description 1
- 239000012528 membrane Substances 0.000 description 1
- 238000007431 microscopic evaluation Methods 0.000 description 1
- 230000000394 mitotic effect Effects 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 238000010369 molecular cloning Methods 0.000 description 1
- ZLVYMPOQNJTFSG-QMMMGPOBSA-N monoiodotyrosine Chemical compound OC(=O)[C@@H](NI)CC1=CC=C(O)C=C1 ZLVYMPOQNJTFSG-QMMMGPOBSA-N 0.000 description 1
- 230000000877 morphologic effect Effects 0.000 description 1
- 230000003562 morphometric effect Effects 0.000 description 1
- 238000013425 morphometry Methods 0.000 description 1
- 210000000214 mouth Anatomy 0.000 description 1
- 201000005962 mycosis fungoides Diseases 0.000 description 1
- 206010028537 myelofibrosis Diseases 0.000 description 1
- 208000001611 myxosarcoma Diseases 0.000 description 1
- ONDPWWDPQDCQNJ-UHFFFAOYSA-N n-(3,3-dimethyl-1,2-dihydroindol-6-yl)-2-(pyridin-4-ylmethylamino)pyridine-3-carboxamide;phosphoric acid Chemical compound OP(O)(O)=O.OP(O)(O)=O.C=1C=C2C(C)(C)CNC2=CC=1NC(=O)C1=CC=CN=C1NCC1=CC=NC=C1 ONDPWWDPQDCQNJ-UHFFFAOYSA-N 0.000 description 1
- LBWFXVZLPYTWQI-IPOVEDGCSA-N n-[2-(diethylamino)ethyl]-5-[(z)-(5-fluoro-2-oxo-1h-indol-3-ylidene)methyl]-2,4-dimethyl-1h-pyrrole-3-carboxamide;(2s)-2-hydroxybutanedioic acid Chemical compound OC(=O)[C@@H](O)CC(O)=O.CCN(CC)CCNC(=O)C1=C(C)NC(\C=C/2C3=CC(F)=CC=C3NC\2=O)=C1C LBWFXVZLPYTWQI-IPOVEDGCSA-N 0.000 description 1
- OHDXDNUPVVYWOV-UHFFFAOYSA-N n-methyl-1-(2-naphthalen-1-ylsulfanylphenyl)methanamine Chemical compound CNCC1=CC=CC=C1SC1=CC=CC2=CC=CC=C12 OHDXDNUPVVYWOV-UHFFFAOYSA-N 0.000 description 1
- 230000001452 natriuretic effect Effects 0.000 description 1
- 210000000822 natural killer cell Anatomy 0.000 description 1
- 230000009826 neoplastic cell growth Effects 0.000 description 1
- 210000000933 neural crest Anatomy 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 208000007538 neurilemmoma Diseases 0.000 description 1
- 201000002120 neuroendocrine carcinoma Diseases 0.000 description 1
- 201000011519 neuroendocrine tumor Diseases 0.000 description 1
- 238000007481 next generation sequencing Methods 0.000 description 1
- 229960004378 nintedanib Drugs 0.000 description 1
- XZXHXSATPCNXJR-ZIADKAODSA-N nintedanib Chemical compound O=C1NC2=CC(C(=O)OC)=CC=C2\C1=C(C=1C=CC=CC=1)\NC(C=C1)=CC=C1N(C)C(=O)CN1CCN(C)CC1 XZXHXSATPCNXJR-ZIADKAODSA-N 0.000 description 1
- 229960003301 nivolumab Drugs 0.000 description 1
- 230000036963 noncompetitive effect Effects 0.000 description 1
- 201000011330 nonpapillary renal cell carcinoma Diseases 0.000 description 1
- 108010009099 nucleoside phosphorylase Proteins 0.000 description 1
- 238000002966 oligonucleotide array Methods 0.000 description 1
- 238000002515 oligonucleotide synthesis Methods 0.000 description 1
- 238000001543 one-way ANOVA Methods 0.000 description 1
- 201000002528 pancreatic cancer Diseases 0.000 description 1
- 208000008443 pancreatic carcinoma Diseases 0.000 description 1
- 201000002530 pancreatic endocrine carcinoma Diseases 0.000 description 1
- 229960001972 panitumumab Drugs 0.000 description 1
- 208000004019 papillary adenocarcinoma Diseases 0.000 description 1
- 201000010198 papillary carcinoma Diseases 0.000 description 1
- 230000001575 pathological effect Effects 0.000 description 1
- 230000007170 pathology Effects 0.000 description 1
- 239000013610 patient sample Substances 0.000 description 1
- 229950010966 patritumab Drugs 0.000 description 1
- MQHIQUBXFFAOMK-UHFFFAOYSA-N pazopanib hydrochloride Chemical compound Cl.C1=CC2=C(C)N(C)N=C2C=C1N(C)C(N=1)=CC=NC=1NC1=CC=C(C)C(S(N)(=O)=O)=C1 MQHIQUBXFFAOMK-UHFFFAOYSA-N 0.000 description 1
- 102000014187 peptide receptors Human genes 0.000 description 1
- 108010011903 peptide receptors Proteins 0.000 description 1
- 229940023041 peptide vaccine Drugs 0.000 description 1
- 208000029255 peripheral nervous system cancer Diseases 0.000 description 1
- 210000003800 pharynx Anatomy 0.000 description 1
- 208000024724 pineal body neoplasm Diseases 0.000 description 1
- 201000004123 pineal gland cancer Diseases 0.000 description 1
- 239000013612 plasmid Substances 0.000 description 1
- 229940012957 plasmin Drugs 0.000 description 1
- 208000037244 polycythemia vera Diseases 0.000 description 1
- 229960000688 pomalidomide Drugs 0.000 description 1
- UVSMNLNDYGZFPF-UHFFFAOYSA-N pomalidomide Chemical compound O=C1C=2C(N)=CC=CC=2C(=O)N1C1CCC(=O)NC1=O UVSMNLNDYGZFPF-UHFFFAOYSA-N 0.000 description 1
- 238000002360 preparation method Methods 0.000 description 1
- 238000004321 preservation Methods 0.000 description 1
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 description 1
- 208000003476 primary myelofibrosis Diseases 0.000 description 1
- 235000019419 proteases Nutrition 0.000 description 1
- 229940039716 prothrombin Drugs 0.000 description 1
- 238000002661 proton therapy Methods 0.000 description 1
- 238000004445 quantitative analysis Methods 0.000 description 1
- 229950011613 racotumomab Drugs 0.000 description 1
- 238000003127 radioimmunoassay Methods 0.000 description 1
- 238000007637 random forest analysis Methods 0.000 description 1
- 108091008598 receptor tyrosine kinases Proteins 0.000 description 1
- 102000027426 receptor tyrosine kinases Human genes 0.000 description 1
- 238000010188 recombinant method Methods 0.000 description 1
- 238000002278 reconstructive surgery Methods 0.000 description 1
- 206010038038 rectal cancer Diseases 0.000 description 1
- 201000001275 rectum cancer Diseases 0.000 description 1
- 208000019465 refractory cytopenia of childhood Diseases 0.000 description 1
- 229960004836 regorafenib Drugs 0.000 description 1
- 210000003289 regulatory T cell Anatomy 0.000 description 1
- 229930002330 retinoic acid Natural products 0.000 description 1
- 201000009410 rhabdomyosarcoma Diseases 0.000 description 1
- 229950003238 rilotumumab Drugs 0.000 description 1
- 229920002477 rna polymer Polymers 0.000 description 1
- 238000005096 rolling process Methods 0.000 description 1
- 108091008601 sVEGFR Proteins 0.000 description 1
- 229950000143 sacituzumab govitecan Drugs 0.000 description 1
- ULRUOUDIQPERIJ-PQURJYPBSA-N sacituzumab govitecan Chemical compound N([C@@H](CCCCN)C(=O)NC1=CC=C(C=C1)COC(=O)O[C@]1(CC)C(=O)OCC2=C1C=C1N(C2=O)CC2=C(C3=CC(O)=CC=C3N=C21)CC)C(=O)COCC(=O)NCCOCCOCCOCCOCCOCCOCCOCCOCCN(N=N1)C=C1CNC(=O)C(CC1)CCC1CN1C(=O)CC(SC[C@H](N)C(O)=O)C1=O ULRUOUDIQPERIJ-PQURJYPBSA-N 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 206010039667 schwannoma Diseases 0.000 description 1
- 201000008407 sebaceous adenocarcinoma Diseases 0.000 description 1
- 239000002412 selectin antagonist Substances 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 230000007727 signaling mechanism Effects 0.000 description 1
- 239000010703 silicon Substances 0.000 description 1
- 229910052710 silicon Inorganic materials 0.000 description 1
- 208000000649 small cell carcinoma Diseases 0.000 description 1
- 208000000587 small cell lung carcinoma Diseases 0.000 description 1
- 102000009076 src-Family Kinases Human genes 0.000 description 1
- 108010087686 src-Family Kinases Proteins 0.000 description 1
- 238000010972 statistical evaluation Methods 0.000 description 1
- 229940090374 stivarga Drugs 0.000 description 1
- 201000011549 stomach cancer Diseases 0.000 description 1
- 201000000498 stomach carcinoma Diseases 0.000 description 1
- 230000020382 suppression by virus of host antigen processing and presentation of peptide antigen via MHC class I Effects 0.000 description 1
- 229940034785 sutent Drugs 0.000 description 1
- 201000010965 sweat gland carcinoma Diseases 0.000 description 1
- 208000024891 symptom Diseases 0.000 description 1
- 206010042863 synovial sarcoma Diseases 0.000 description 1
- 210000002437 synoviocyte Anatomy 0.000 description 1
- 238000001308 synthesis method Methods 0.000 description 1
- 230000009897 systematic effect Effects 0.000 description 1
- 108010010186 talactoferrin alfa Proteins 0.000 description 1
- UMHMOGFSSCTSNY-XSJWQOAGSA-N tecemotide Chemical compound CCCCCCCCCCCCCCCC(=O)NCCCC[C@@H](C(=O)NCC(O)=O)NC(=O)[C@@H]1CCCN1C(=O)[C@H]1N(C(=O)[C@H](C)NC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)CNC(=O)[C@H]2N(CCC2)C(=O)[C@H](C)NC(=O)[C@H]2N(CCC2)C(=O)[C@H](CCCNC(N)=N)NC(=O)[C@@H](NC(=O)[C@H](CC(O)=O)NC(=O)[C@H]2N(CCC2)C(=O)[C@H](C)NC(=O)[C@H](CO)NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)CNC(=O)[C@H](CC=2NC=NC=2)NC(=O)[C@H](C)NC(=O)[C@H]2N(CCC2)C(=O)[C@H]2N(CCC2)C(=O)[C@H](C)NC(=O)[C@@H](NC(=O)[C@@H](N)CO)[C@@H](C)O)C(C)C)[C@@H](C)O)[C@@H](C)O)[C@@H](C)O)CCC1 UMHMOGFSSCTSNY-XSJWQOAGSA-N 0.000 description 1
- 229950001399 tecemotide Drugs 0.000 description 1
- 229950004186 telatinib Drugs 0.000 description 1
- 201000003120 testicular cancer Diseases 0.000 description 1
- 229960003433 thalidomide Drugs 0.000 description 1
- 229940022511 therapeutic cancer vaccine Drugs 0.000 description 1
- 229960002175 thyroglobulin Drugs 0.000 description 1
- 229940044655 toll-like receptor 9 agonist Drugs 0.000 description 1
- 229960000575 trastuzumab Drugs 0.000 description 1
- 238000011269 treatment regimen Methods 0.000 description 1
- 229950007217 tremelimumab Drugs 0.000 description 1
- 230000005747 tumor angiogenesis Effects 0.000 description 1
- 238000007492 two-way ANOVA Methods 0.000 description 1
- 208000010570 urinary bladder carcinoma Diseases 0.000 description 1
- 206010046766 uterine cancer Diseases 0.000 description 1
- 229960000241 vandetanib Drugs 0.000 description 1
- 230000002792 vascular Effects 0.000 description 1
- 229940069559 votrient Drugs 0.000 description 1
- 239000008096 xylene Substances 0.000 description 1
- 229940036061 zaltrap Drugs 0.000 description 1
- 229960002760 ziv-aflibercept Drugs 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/112—Disease subtyping, staging or classification
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
Definitions
- the present invention relates to methods for determining an integrated, pan-cancer subtype and for predicting the prognosis of a patient afflicted with said integrated subtype of cancer.
- Cancers are typically classified using pathologic criteria that rely heavily on the tissue site of origin. Recently, large-scale genomics projects spearheaded by The Cancer Genome Atlas (TCGA) have been undertaken in order to provide a detailed molecular characterization of thousands of tumors, thereby making a systematic molecular-based taxonomy of cancer possible (see, for example, The_Cancer_Genome_Atlas_Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008; 455:1061-1068; The_Cancer_Genome_Atlas_Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011; 474:609-615; The_Cancer_Genome_Atlas_Network.
- TCGA Cancer Genome Atlas
- the ‘omic’ platforms used in the studies for the COCA analysis included whole-exome DNA sequence (Illumina HiSeq and GAII), DNA methylation (Illumina 450,000-feature microarrays), genome-wide mRNA levels (Illumina mRNA-seq), microRNA levels (Illumina microRNA-seq), and protein levels and/or phosphorylated proteins (Reverse Phase Protein Arrays; RPPA).
- the present disclosure addresses the limitations of the current methods and other needs in the field for an efficient method for pan-cancer tumor classification that may inform prognosis and patient management based on underlying genomic and biologic tumor characteristics shared across tumor samples from multiple tissues of origin.
- the methods disclosed herein include determination of a cell of origin subtype, treatment of cancer based on a cell of origin subtype, prediction of overall survival of patients based on a cell of origin subtype, and application of an algorithm to gene expression data for one or a plurality of classifier biomarkers for categorization of a tumor sample into one of 21 clustering of cluster assignments (COCA) subtypes: C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26
- the algorithm can be a classification to the nearest centroid (CLaNC) algorithm.
- the C1 COCA subtype can indicate that a tumor sample is substantially similar to or is adrenocortical carcinoma.
- the C2 COCA subtype can indicate that a tumor sample is substantially similar to or is glioblastoma.
- the C3 COCA subtype can indicate that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer).
- the C4 COCA subtype can indicate that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder.
- the C6 COCA subtype can indicate that a tumor sample is substantially similar to or is lung adenocarcinoma.
- the C8 COCA subtype can indicate that a tumor sample is substantially similar to or is pancreatic adenocarcinoma.
- the C9 COCA subtype can indicate that a tumor sample is substantially similar to or is uterine carcinosarcoma.
- the C10 COCA subtype can indicate that a tumor sample is substantially similar to or is the basal subtype of breast cancer.
- the C12 COCA subtype can indicate that a tumor sample is substantially similar to or is uterine corpus endometrial cancer.
- the C14 COCA subtype can indicate that a tumor sample is substantially similar to or is prostate cancer.
- the C15 COCA subtype can indicate that a tumor sample is substantially similar to or is non-squamous cervical cancer.
- the C16 COCA subtype can indicate that a tumor sample is substantially similar to or is a bladder urothelial carcinoma.
- the C17 COCA subtype can indicate that a tumor sample is substantially similar to or is a testicular germ cell tumor.
- the C19 COCA subtype can indicate that a tumor sample is substantially similar to or is a colon, rectal, esophageal or stomach adenocarcinoma.
- the C20 COCA subtype can indicate that a tumor sample is substantially similar to or is a sarcoma.
- the C21 COCA subtype can indicate that a tumor sample is substantially similar to or is a kidney chromophobe, kidney renal papillary cell carcinoma or kidney renal clear cell carcinoma.
- the C22 COCA subtype can indicate that a tumor sample is substantially similar to or is liver hepatocellular carcinoma.
- the C24 COCA subtype can indicate that a tumor sample is substantially similar to or is the luminal subtype of breast cancer.
- the C25 COCA subtype can indicate that a tumor sample is substantially similar to or is thymoma.
- the C26 COCA subtype can indicate that a tumor sample is substantially similar to or is melanoma.
- the C28 COCA subtype can indicate that a tumor sample is substantially similar to or is thyroid cancer.
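The subtype-to-tumor-type statements above can be collected into a simple lookup table. The following is a minimal sketch: the descriptions are condensed from the statements above, while the names `COCA_SUBTYPES` and `describe` are hypothetical and chosen only for illustration.

```python
# Lookup table condensed from the COCA subtype statements above.
# COCA_SUBTYPES and describe() are illustrative names, not part of the source.
COCA_SUBTYPES = {
    "C1": "adrenocortical carcinoma",
    "C2": "glioblastoma",
    "C3": "ovarian serous cystadenocarcinoma",
    "C4": "squamous cell carcinoma of the lung, head and neck, or bladder",
    "C6": "lung adenocarcinoma",
    "C8": "pancreatic adenocarcinoma",
    "C9": "uterine carcinosarcoma",
    "C10": "basal subtype of breast cancer",
    "C12": "uterine corpus endometrial cancer",
    "C14": "prostate cancer",
    "C15": "non-squamous cervical cancer",
    "C16": "bladder urothelial carcinoma",
    "C17": "testicular germ cell tumor",
    "C19": "colon, rectal, esophageal or stomach adenocarcinoma",
    "C20": "sarcoma",
    "C21": "kidney chromophobe, renal papillary or renal clear cell carcinoma",
    "C22": "liver hepatocellular carcinoma",
    "C24": "luminal subtype of breast cancer",
    "C25": "thymoma",
    "C26": "melanoma",
    "C28": "thyroid cancer",
}


def describe(subtype: str) -> str:
    """Return the tumor type a COCA subtype call is taken to indicate."""
    try:
        return COCA_SUBTYPES[subtype]
    except KeyError:
        raise ValueError(f"unknown COCA subtype: {subtype!r}") from None
```

Note that the numbering is not contiguous (there is no C5, C7, C11, and so on), so a dictionary keyed by label is safer than a list indexed by number.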
- a method for determining a clustering of cluster assignments (COCA) subtype of a tumor cancer sample obtained from a patient comprising detecting an expression level of at least one classifier biomarker of Table 1, wherein the detection of the expression level of the classifier biomarker specifically identifies a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype.
- the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one
- the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm.
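The comparing step above (correlating a sample's expression data with training-set data, then assigning a subtype) can be sketched as a correlation-based nearest-centroid call. This is an illustration under stated assumptions, not the applicants' implementation: it uses Pearson correlation, the function names are hypothetical, and the toy expression values are invented. It covers only the assignment step and omits any gene-selection step.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def build_centroids(training):
    """Map each subtype to the per-gene mean of its reference samples.

    training: {subtype: [expression vector, ...]}, all vectors in the same gene order.
    """
    return {
        subtype: [sum(col) / len(samples) for col in zip(*samples)]
        for subtype, samples in training.items()
    }


def classify(sample, centroids):
    """Assign the subtype whose centroid correlates best with the sample."""
    scores = {s: pearson(sample, c) for s, c in centroids.items()}
    return max(scores, key=scores.get), scores


# Invented toy data: two subtypes, four hypothetical classifier genes.
training = {
    "C3": [[9.0, 2.0, 1.0, 5.0], [8.0, 3.0, 1.5, 5.5]],
    "C14": [[1.0, 8.0, 7.0, 2.0], [2.0, 9.0, 6.0, 2.5]],
}
centroids = build_centroids(training)
subtype, scores = classify([8.5, 2.5, 1.0, 5.0], centroids)
```

Returning the full score dictionary alongside the call makes it possible to flag samples whose best correlation is weak or nearly tied with the runner-up.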
- the expression level of the classifier biomarker is detected at the nucleic acid level.
- the nucleic acid level is RNA or cDNA.
- the detecting an expression level comprises performing a quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarray analysis, gene chips, an nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
- the expression level is detected by performing RNAseq.
- the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1.
- the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, a fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
- the bodily fluid is blood or fractions thereof (i.e., serum or plasma), urine, saliva, or sputum.
- the at least one classifier biomarker comprises a plurality of classifier biomarkers.
- the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 4 classifier biomarkers, at least 6 classifier biomarkers, at least 8 classifier biomarkers, at least 10 classifier biomarkers, at least 12 classifier biomarkers, at least 14 classifier biomarkers, at least 16 classifier biomarkers, at least 18 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- the at least one classifier biomarker comprises, consists essentially of or consists of all the classifier biomarkers of Table 1.
- a method of detecting a biomarker in a tumor sample obtained from a patient comprising measuring the expression level of a plurality of classifier biomarker nucleic acids selected from Table 1 using an amplification, hybridization and/or sequencing assay.
- the patient is suffering from or is suspected of suffering from kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum a
- the amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction(s) (qRT-PCR), RNAseq, microarray analysis, gene chips, nCounter Gene Expression Assay(s), Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
- the expression level is detected by performing RNAseq.
- the detection of the expression level comprises using at least one pair of oligonucleotide primers per each of the plurality of biomarker nucleic acids selected from Table 1.
- the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, a fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
- the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.
- the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- the plurality of biomarker nucleic acids comprises, consists essentially of or consists of all the classifier biomarker nucleic acids of Table 1.
- a method of treating cancer in a subject comprising: measuring the expression level of at least one biomarker nucleic acid in a tumor sample obtained from the subject, wherein the at least one biomarker nucleic acid is selected from a set of biomarkers listed in Table 1, wherein the presence, absence and/or level of the at least one biomarker indicates a COCA subtype of the cancer; and administering a therapeutic agent based on the COCA subtype of the cancer.
- the at least one biomarker nucleic acid selected from the set of biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- the method further comprises measuring the expression of at least one biomarker from an additional set of biomarkers.
- the additional set of biomarkers comprises at least an immune cell signature, a cell proliferation signature, or drug target genes.
- the measuring the expression level is conducted using an amplification, hybridization and/or sequencing assay.
- the amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction(s) (qRT-PCR), RNAseq, microarray analysis, gene chips, nCounter Gene Expression Assay(s), Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
- the expression level is detected by performing RNAseq.
- the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, a fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
- the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.
- the subject's COCA subtype is selected from C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28.
- a method of predicting overall survival in a cancer patient comprising detecting an expression level of at least one classifier biomarker of Table 1 in a tumor sample obtained from a patient, wherein the detection of the expression level of the at least one classifier biomarker specifically identifies a COCA subtype, and wherein identification of the COCA subtype is predictive of the overall survival in the patient.
- the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one
- the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm.
- the expression level of the classifier biomarker is detected at the nucleic acid level.
- the nucleic acid level is RNA or cDNA.
- the detecting an expression level comprises performing quantitative real time reverse transcriptase polymerase chain reaction(s) (qRT-PCR), RNAseq, microarray analysis, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
- the expression level is detected by performing RNAseq.
- the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1.
- the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, a fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
- the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.
- the at least one classifier biomarker comprises a plurality of classifier biomarkers.
- the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- the at least one classifier biomarker comprises, consists essentially of or consists of all the classifier biomarkers of Table 1.
- FIG. 1 shows a cross-tabulation of the TCGA tumor type and COCA subtype from Hoadley et al., Cell. 2018 Apr. 5; 173(2):291-304 for samples with qualifying expression data as described in Example 1.
- FIG. 1 also provides the integrated tumor subtypes provided herein.
- genes with low variance and/or low mean expression were filtered out, while genes with mean variance and mean expression values greater than 4 were kept, resulting in gene expression data for 2190 genes.
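The filtering step above can be sketched as follows. The source wording leaves open exactly which quantities the cutoff of 4 applies to, so this sketch assumes (as a labeled assumption) that both the mean and the sample variance of a gene must exceed a threshold, and exposes both thresholds as parameters. The gene names and values are invented.

```python
from statistics import mean, variance


def filter_genes(expr, min_mean=4.0, min_var=4.0):
    """Keep genes whose mean AND sample variance both exceed the thresholds.

    expr: {gene: [expression value per sample, ...]}
    The default thresholds of 4 are an assumption based on the text above.
    """
    return {
        gene: values
        for gene, values in expr.items()
        if mean(values) > min_mean and variance(values) > min_var
    }


# Invented toy data with hypothetical gene names.
expr = {
    "GENE_A": [2.0, 9.0, 12.0, 5.0],  # high mean, high variance: kept
    "GENE_B": [6.0, 6.1, 5.9, 6.0],   # high mean, low variance: dropped
    "GENE_C": [0.0, 0.5, 8.0, 0.1],   # low mean: dropped
}
kept = filter_genes(expr)
```

Low-variance genes carry little information for separating subtypes, and low-mean genes tend to be dominated by measurement noise, which is the usual motivation for this kind of prefilter.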
- FIG. 4 illustrates agreement and disagreement between the GS subtype (rows) and the subtype based on the 84-gene subtyper (columns) (left panel) for the test set described in Example 1.
- the right panel shows agreement for each COCA subtype listed. Overall agreement was 90%. Overall agreement with COCA on the training set was 91%.
- FIG. 5 shows the proportion of COCA subtypes in the test set that were called correctly by the 84-gene typer developed in Example 1.
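- The agreement figures described above can be illustrated with a short sketch that cross-tabulates gold-standard calls against predicted calls and reports the overall and per-subtype proportion called correctly. The function name and the five toy samples are hypothetical and are not data from Example 1.

```python
from collections import Counter


def agreement_summary(gold, predicted):
    """Cross-tabulate gold vs. predicted subtype calls and compute agreement."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Count of samples for every (gold subtype, predicted subtype) pair.
    crosstab = Counter(zip(gold, predicted))
    # Fraction of all samples where both calls are identical.
    overall = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    # Fraction of each gold subtype that was called correctly.
    totals = Counter(gold)
    per_subtype = {s: crosstab[(s, s)] / n for s, n in totals.items()}
    return crosstab, overall, per_subtype


# Hypothetical calls for five samples.
gold = ["C3", "C3", "C14", "C14", "C28"]
pred = ["C3", "C14", "C14", "C14", "C28"]
crosstab, overall, per_subtype = agreement_summary(gold, pred)
print(overall, per_subtype)
```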
- the methods and compositions provided herein can utilize conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and immunology, which are within the skill of the art.
- Such conventional techniques include polymer array synthesis, hybridization, ligation, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the example herein below. However, other equivalent conventional procedures can, of course, also be used.
- Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols.
- Computer software products for use herein typically include a computer readable medium having computer-executable instructions for performing the logic steps of any of the methods provided herein.
- Suitable computer readable media include floppy disk, CD-ROM/DVD/DVD-ROM, hard-disk drive, flash memory, ROM/RAM, magnetic tapes, etc.
- the computer-executable instructions may be written in a suitable computer language or combination of several languages.
- compositions provided herein may also make use of various computer program products and software for a variety of purposes, such as probe design, management of data, analysis, and instrument operation. See, U.S. Pat. Nos. 5,593,839, 5,795,716, 5,733,729, 5,974,164, 6,066,454, 6,090,555, 6,185,561, 6,188,783, 6,223,127, 6,229,911 and 6,308,170.
- Computer methods related to genotyping using high density microarray analysis may also be used in the present methods, see, for example, US Patent Pub. Nos. 20050250151, 20050244883, 20050108197, 20050079536 and 20050042654.
- the present disclosure may have preferred embodiments that include methods for providing genetic information over networks such as the Internet as shown in U.S. Patent Pub. Nos. 20030097222, 20020183936, 20030100995, 20030120432, 20040002818, 20040126840, and 20040049354.
- the terms “individual,” “patient,” and “subject” can refer to any single animal, more preferably a mammal (including such non-human animals as, for example, dogs, cats, horses, rabbits, zoo animals, cows, pigs, sheep, and non-human primates) for which treatment is desired.
- the individual or patient herein is a human.
- the term “healthy” as used herein, is relative to cancer status, as the term “healthy” cannot be defined to correspond to any absolute evaluation or status.
- an individual defined as healthy with reference to any specified disease or disease criterion can in fact be diagnosed with any other one or more diseases, or exhibit any other one or more disease criterion, including one or more other cancers.
- tumor can refer to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.
- cancer can refer to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues.
- the terms “cancer,” “cancerous,” and “tumor” are not mutually exclusive and can be used interchangeably.
- detection can include any means of detecting, including direct and indirect detection.
- Substantially similar products, items (e.g., type of cancer, nucleic acid complement), services or methods are at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5% similar or the same as a product, item (e.g., type of cancer, nucleic acid complement), service or method recited herein.
- kits, compositions and methods for identifying, determining, detecting or diagnosing integrated, pan-cancer clustering of cluster assignment (COCA) subtypes are provided herein. That is, the methods can be useful for molecularly defining subsets of cancer regardless of tissue of origin.
- the methods provide a pan-cancer classification of a tumor sample obtained from a subject that can be prognostic and predictive for therapeutic response.
- the therapeutic response can include chemotherapy, immunotherapy, angiogenesis inhibitor therapy, surgical intervention and/or radiotherapy.
- the methods can also provide a prognosis of overall survival for cancer patients according to their pan-cancer, integrated COCA subtype.
- the kits, compositions and methods provided herein can be used to classify a tumor sample as being any type of COCA subtype known in the art.
- the COCA subtype determined or diagnosed by the methods and compositions provided herein is selected from C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA).
- the COCA subtype determined using the kits, compositions or methods provided herein can indicate or disclose the cell or tissue of origin of a tumor sample obtained from a subject.
- the C1 COCA subtype can indicate that a tumor sample is substantially similar to or is adrenocortical carcinoma
- the C2 COCA subtype can indicate that a tumor sample is substantially similar to or is glioblastoma
- the C3 COCA subtype can indicate that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer)
- the C4 COCA subtype can indicate that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder
- the C6 COCA subtype can indicate that a tumor sample is substantially similar to or is lung adenocarcinoma
- the C8 COCA subtype can indicate that a tumor sample is substantially similar to or is pancreatic adenocarcinoma
- Determining a COCA subtype can include, for example, diagnosing or detecting the presence, sub-type and cell-of-origin of a cancer, monitoring the progression of the disease, and identifying or detecting cells or samples that are indicative of said pan-cancer subtypes.
- the COCA subtype is assessed or determined through the evaluation of expression patterns, or profiles, of one or a plurality of classifier biomarkers or biomarkers in one or more subject samples.
- subject, or subject sample may refer to an individual regardless of health and/or disease status.
- a subject can be a subject, a study participant, a test subject, a control subject, a screening subject, or any other class of individual from whom a sample is obtained and assessed in the context of the methods and compositions provided herein.
- a subject can be previously diagnosed with one type of a myriad of cancers, can present with one or more symptoms of said type of cancer, or a predisposing factor, such as a family (genetic) or medical history (medical) factor for said type of cancer, can be undergoing treatment or therapy for said cancer, or the like.
- a subject can be healthy as defined herein with respect to any of the aforementioned factors or criteria.
- the myriad of cancers from which a subject may be suffering from or suspected of suffering from can be any cancer known in the art.
- the classifier biomarkers provided herein (e.g., the classifier biomarkers of Table 1) and methods of using said classifier biomarkers can be used to determine an integrated, pan-cancer COCA subtype of the cancer that said subject may be or is suspected of suffering from.
- the cancer can include, but is not limited to, carcinoma, lymphoma, blastoma (including medulloblastoma and retinoblastoma), sarcoma (including liposarcoma and synovial cell sarcoma), neuroendocrine tumors (including carcinoid tumors, gastrinoma, and islet cell cancer), mesothelioma, schwannoma (including acoustic neuroma), meningioma, adenocarcinoma, melanoma, and leukemia or lymphoid malignancies.
- a cancer can also include, but are not limited to, a lung cancer (e.g., a non-small cell lung cancer (NSCLC) or small cell lung cancer), a kidney cancer (e.g., a kidney urothelial carcinoma or RCC), a bladder cancer (e.g., a bladder urothelial (transitional cell) carcinoma (e.g., locally advanced or metastatic urothelial cancer, including 1L or 2L+locally advanced or metastatic urothelial carcinoma)), a breast cancer, a colorectal cancer (e.g., a colon adenocarcinoma), an ovarian cancer, a pancreatic cancer, a gastric carcinoma, an esophageal cancer, a mesothelioma, a melanoma (e.g., a skin melanoma), a head and neck cancer (e.g., a head and neck squamous cell carcinoma (HNSCC)), a thyroid
- the cancer is selected from kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma
- an “expression profile” or an “expression pattern” or a “biomarker profile” or a “gene signature” can comprise one or more values corresponding to a measurement of the relative abundance, level, presence, or absence of expression of a discriminative or classifier biomarker or biomarker.
- An expression profile can be derived from a subject prior to or subsequent to a diagnosis of a type of cancer, can be derived from a biological sample collected from a subject at one or more time points prior to or following treatment or therapy, can be derived from a biological sample collected from a subject at one or more time points during which there is no treatment or therapy (e.g., to monitor progression of disease or to assess development of disease in a subject diagnosed with or at risk for a type of cancer), or can be collected from a healthy subject.
- the term subject can be used interchangeably with patient.
- the patient can be a human patient.
- the one or a plurality of classifier biomarkers that can make up an expression profile as provided herein can be selected from one or more biomarkers of Table 1 and/or any additional set of biomarker classifiers disclosed herein.
- “determining an expression level” or “determining an expression profile” or “detecting an expression level” or “detecting an expression profile” as used in reference to a biomarker or classifier can mean the application of a biomarker specific reagent such as a probe, primer or antibody and/or a method applied to a sample, for example a sample of the subject or patient and/or a control sample, for ascertaining or measuring quantitatively, semi-quantitatively or qualitatively the amount of a biomarker or biomarkers, for example the amount of biomarker polypeptide or mRNA (or cDNA derived therefrom).
- the level of a biomarker as provided herein can be determined by any number of methods known in the art and/or provided herein.
- the methods can include for example immunoassays including for example immunohistochemistry, ELISA, Western blot, immunoprecipitation and the like, where a biomarker detection agent such as an antibody for example, a labeled antibody, specifically binds the biomarker and permits for example relative or absolute ascertaining of the amount of polypeptide biomarker, hybridization and PCR protocols where a probe or primer or primer set are used to ascertain the amount of nucleic acid biomarker, including for example probe based and amplification based methods including for example microarray analysis, RT-PCR such as quantitative RT-PCR (qRT-PCR), serial analysis of gene expression (SAGE), Northern Blot, digital molecular barcoding technology, for example Nanostring Counter Analysis, and TaqMan quantitative PCR assays.
- mRNA in situ hybridization in formalin-fixed, paraffin-embedded (FFPE) tissue samples or cells can also be applied.
- This technology is currently offered by the QuantiGene ViewRNA (Affymetrix), which uses probe sets for each mRNA that bind specifically to an amplification system to amplify the hybridization signals; these amplified signals can be visualized using a standard fluorescence microscope or imaging system.
- This system for example can detect and measure transcript levels in heterogeneous samples; for example, if a sample has normal and tumor cells present in the same tissue section.
- TaqMan probe-based gene expression analysis can also be used for measuring gene expression levels in tissue samples, and this technology has been shown to be useful for measuring mRNA levels in FFPE samples.
- TaqMan probe-based assays utilize a probe that hybridizes specifically to the mRNA target. This probe contains a quencher dye and a reporter dye (fluorescent molecule) attached to each end, and fluorescence is emitted only when specific hybridization to the mRNA target occurs.
- the exonuclease activity of the polymerase enzyme causes the quencher and the reporter dyes to be detached from the probe, and fluorescence emission can occur. This fluorescence emission is recorded and signals are measured by a detection system; these signal intensities are used to calculate the abundance of a given transcript (gene expression) in a sample.
- the “expression profile” or a “biomarker profile” or “gene signature” associated with the classifier biomarkers described herein can be useful for distinguishing between normal and tumor samples.
- the tumor samples are one type of cancer as determined based on tissue of origin.
- the one type of cancer can be any type of cancer known in the art and/or provided herein.
- the cancer can be further classified as a specific clustering of cluster assignment (COCA) subtype based upon an expression profile of one or more classifier biomarkers (e.g., Table 1) determined using the methods provided herein.
- the specific COCA subtype can be any COCA subtype as described in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304.
- the specific COCA subtype can be selected from C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA.
- Expression profiles using the classifier biomarkers disclosed herein can provide valuable molecular tools for specifically identifying COCA subtypes, and for treating a cancer based on its COCA subtype. Accordingly, provided herein are methods for screening and classifying a subject for pan-cancer COCA subtypes.
- a single classifier biomarker or a plurality of classifier biomarkers provided herein is capable of identifying COCA subtypes of cancer with a predictive success of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, inclusive of all ranges and subranges therebetween.
- a single classifier biomarker or a plurality of classifier biomarkers as provided herein is capable of determining COCA subtypes of cancer with a sensitivity or specificity of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, inclusive of all ranges and subranges therebetween.
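- For a multi-class subtyper, sensitivity and specificity are commonly evaluated one subtype at a time against all others. The sketch below assumes that one-vs-rest convention, which the source does not specify; the function name and the toy calls are hypothetical.

```python
def sensitivity_specificity(gold, predicted, subtype):
    """One-vs-rest sensitivity and specificity for a single subtype."""
    tp = fn = fp = tn = 0
    for g, p in zip(gold, predicted):
        if g == subtype and p == subtype:
            tp += 1  # true subtype, called as subtype
        elif g == subtype:
            fn += 1  # true subtype, called as something else
        elif p == subtype:
            fp += 1  # other subtype, wrongly called as subtype
        else:
            tn += 1  # other subtype, not called as subtype
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical calls for five samples.
gold_calls = ["C3", "C3", "C14", "C14", "C28"]
pred_calls = ["C3", "C14", "C14", "C14", "C28"]
sens_c14, spec_c14 = sensitivity_specificity(gold_calls, pred_calls, "C14")
print(sens_c14, spec_c14)
```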
- Also encompassed herein is a system capable of distinguishing various COCA subtypes of cancer not detectable using current methods.
- This system can be capable of processing a large number of subjects and subject variables such as expression profiles and other diagnostic criteria.
- the methods for determining a COCA subtype as provided herein using one or a plurality of classifier biomarkers as provided herein can be part of a system capable of distinguishing various COCA subtypes that also utilizes data accumulated from other diagnostic methods.
- the other diagnostic methods can include additional genome-wide molecular assays or platforms, histochemical, immunohistochemical, cytologic, immunocytologic, visual diagnostic methods including histologic or morphometric evaluation of cancer or tumor tissue or any combination thereof.
- the additional genome-wide molecular assays or platforms can be selected from whole-exome DNA sequencing assays (e.g., Illumina HiSeq and GAII), DNA copy-number variation assays (e.g., Affymetrix 6.0 microarrays), DNA methylation assays (e.g., Illumina 450,000-feature microarrays), genome-wide mRNA level assays (e.g., Illumina mRNA-seq), microRNA level assays (e.g., Illumina microRNA-seq), and protein level assays for proteins and/or phosphorylated proteins (e.g., Reverse Phase Protein Arrays; RPPA).
- the expression profile derived from a subject is compared to a reference expression profile.
- a “reference expression profile” or “control expression profile” can be a profile derived from the subject prior to treatment or therapy; can be a profile produced from the subject sample at a particular time point (usually prior to or following treatment or therapy, but can also include a particular time point prior to or following diagnosis of a type of cancer); or can be derived from a healthy individual or a pooled reference from healthy individuals.
- a reference expression profile can be specific to different COCA subtypes of cancer.
- the COCA reference expression profile can be from any tissues from which a specific COCA has been found.
- the specific COCA subtype can be any COCA subtype as described in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304.
- the specific COCA subtype can be selected from a C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype.
- the reference expression profile can be compared to a test expression profile or vice versa.
- a “test expression profile” can be derived from the same subject as the reference expression profile except at a subsequent time point (e.g., one or more days, weeks or months following collection of the reference expression profile) or can be derived from a different subject.
- any test expression profile of a subject can be compared to a previously collected profile from a subject that has a specific COCA subtype.
- the specific COCA subtype can be any COCA subtype as described in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al.
- the specific COCA subtype can be selected from a C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype.
- the classifier biomarkers provided herein for use in the methods, compositions or kits provided herein can include nucleic acids (RNA, cDNA, and DNA) and proteins, and variants and fragments thereof.
- Such biomarkers can include DNA comprising the entire or partial sequence of the nucleic acid sequence encoding the biomarker, or the complement of such a sequence.
- the biomarkers described herein can include RNA comprising the entire or partial sequence of any of the nucleic acid sequences of interest, or their non-natural cDNA products, obtained synthetically in vitro in a reverse transcription reaction.
- the biomarker nucleic acids can also include any expression product or portion thereof of the nucleic acid sequences of interest.
- a biomarker protein can be a protein encoded by or corresponding to a DNA biomarker provided herein.
- a biomarker protein can comprise the entire or partial amino acid sequence of any of the biomarker proteins or polypeptides.
- the biomarker nucleic acid can be extracted from a bodily fluid (e.g., blood or fractions thereof, urine, saliva, CSF, etc.), a cell or can be cell free or extracted from an extracellular vesicular entity such as an exosome.
- a “classifier biomarker” or “biomarker” or “classifier gene” can be any nucleic acid (DNA, RNA or cDNA) or protein whose level of expression in a tissue or cell is altered compared to that of a normal or healthy cell or tissue or any other reference or control as provided herein.
- a “classifier biomarker” or “biomarker” or “classifier gene” can be any nucleic acid (DNA, RNA or cDNA) or protein whose level of expression in a tissue or cell is altered in a specific COCA subtype. The detection of the biomarkers provided herein can permit the determination of the specific COCA subtype.
- the “classifier biomarker” or “biomarker” or “classifier gene” may be one that is up-regulated (e.g. expression is increased) or down-regulated (e.g. expression is decreased) relative to a reference or control as provided herein.
- the reference or control can be any reference or control as provided herein.
- the expression values of nucleic acids (DNA, RNA or cDNA) that are up-regulated or down-regulated in a particular COCA subtype of cancer can be pooled into one gene signature.
- the overall expression level in each gene signature is referred to herein as the “expression profile” and is used to classify a test sample (i.e., a sample obtained from a subject suffering from or suspected of suffering from cancer) according to the COCA subtype of cancer.
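- One simple way to pool up-regulated and down-regulated genes into a single signature value is sketched below. The scoring rule (mean of the up-regulated genes minus mean of the down-regulated genes) is an assumption made for illustration and is not stated in the source; the gene names and values are hypothetical.

```python
from statistics import mean


def signature_score(sample, up_genes, down_genes):
    """Pool a sample's expression values into one signature score.

    sample maps gene name to expression value. The score is the mean of the
    up-regulated genes minus the mean of the down-regulated genes; this
    pooling rule is an illustrative assumption, not the patent's method.
    """
    up = [sample[g] for g in up_genes if g in sample]
    down = [sample[g] for g in down_genes if g in sample]
    if not up and not down:
        raise ValueError("no signature genes measured in this sample")
    up_mean = mean(up) if up else 0.0
    down_mean = mean(down) if down else 0.0
    return up_mean - down_mean


# Hypothetical sample with two up-regulated and two down-regulated genes.
sample = {"GENE_U1": 9.0, "GENE_U2": 7.0, "GENE_D1": 2.0, "GENE_D2": 4.0}
score = signature_score(sample, ["GENE_U1", "GENE_U2"], ["GENE_D1", "GENE_D2"])
print(score)
```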
- independent evaluation of expression for each of the genes disclosed herein can be used to classify tumor subtypes without the need to group up-regulated and down-regulated genes into one or more gene signatures.
- a total of 84 biomarkers can be used for COCA subtype determination.
- expression of 4 of the 84 biomarkers of Table 1 can have altered expression that is correlated therewith.
- the correlation of the 4 of the 84 biomarkers of Table 1 with the specific COCA subtype can be positive, negative or a combination thereof.
- the classifier biomarkers for use in the methods provided herein can include any nucleic acid (DNA, RNA or cDNA) or protein that is selectively expressed in COCA subtypes of cancer, as defined herein above. Sample biomarker genes are listed in Table 1 below.
- the 84-gene gene signature for COCA subtyping is found in Table 1.
- the relative gene expression levels as represented by nearest centroid coefficients of the classifier biomarkers for the 84-gene pan-cancer subtyper of Table 1 are shown in Table 2.
- a subset of one or more of the 84 genes of Table 1 can be used to classify or determine the COCA subtype of a tumor sample. In one embodiment, all 84 genes of Table 1 can be used to classify or determine the COCA subtype of a tumor sample.
- the up-regulation of a classifier biomarker e.g. expression is increased
- the down-regulation of a classifier biomarker e.g. expression is decreased
- a classifier biomarker may have no specific effects on a certain COCA subtype when the expression level equals to zero.
- determining integrated, pan-cancer COCA subtypes can further include measuring the expression of at least one biomarker from an additional set of biomarker classifiers.
- an additional set of biomarker classifiers can include measuring gene signatures related to cell proliferation.
- the gene signatures related to cell proliferation for use in the methods provided herein can include the 11 gene signature comprising BIRC5, CCNB1, CDC20, CDCA1, CEP55, KNTC2, MKI67, PTTG1, RRM2, TYMS, and UBE2C found in Martin M. et al., Breast Cancer Res Treat, 138: 457-466 (2013), the 18 gene signature found in US 20160115551 and/or the 26 gene signature found in 62/789,668 filed Jan.
- an additional set of biomarker classifiers can include a 5 gene signature comprising tumor driver genes such as TP53 and RB1, and receptor tyrosine kinases including FGFR2, FGFR3, and ERBB2.
- the 5 gene signature is related to the signature of tumor driver genes.
- the biomarker classifiers can also include immune cell signatures that are known in the art (Bindea G. et al., Immunity, 39(4): 782-95 (2013); Faruki H. et al., JTO, 12(6): 943-953 (2017); Charoentong P.
- an additional set of biomarker classifiers can include assessing tumor purity ABSOLUTE derived from the TCGA supplementary data.
- the additional set of biomarkers can be gene signatures known in the art for specific types of cancer.
- the cancer is lung cancer and the gene signature is selected from the gene signatures found in WO2017/201165, WO2017/201164, US20170114416 or U.S. Pat. No. 8,822,153, each of which is herein incorporated by reference in their entirety.
- the cancer is head and neck squamous cell carcinoma (HNSCC) and the gene signature is selected from the gene signatures found in PCT/US18/45522 or PCT/US18/48862, each of which is herein incorporated by reference in their entirety.
- the cancer is breast cancer and the gene signature is the PAM50 sub-typer found in Parker J S et al., (2009) Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27:1160-1167, which is herein incorporated by reference in its entirety.
- the cancer is bladder cancer and the gene signature can include the bladder cancer biomarker signature described in Gene Expression Omnibus (GEO) dataset: GSE87304, Seiler R. et al., Eur Urol, 72(4):544-554 (2017); Gene Expression Omnibus (GEO) dataset: GSE32894, Sjodahl G.
- the cancer is bladder cancer (e.g., MIBC) and the gene signature can include the bladder cancer biomarker signatures described in 62/629,975 filed Feb. 13, 2018, which is herein incorporated by reference.
- the cancer is bladder cancer (e.g., MIBC) and the gene signature can include the bladder cancer biomarker signature described in The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature volume 507, pages 315-322 (2014), or Robertson, A G, et al., Cell, 171(3): 540-556 (2017), each of which is herein incorporated by reference.
- determining integrated, pan-cancer COCA subtypes can further include assessing tumor mutation burden (TMB) and/or TMB rate.
- TMB value and/or rate can be calculated from RNA (e.g., via transcriptome profiling or RNA sequencing) as provided in U.S. 62/771,702 filed Nov. 27, 2018 and U.S. 62/743,257 filed Oct. 9, 2018, each of which is incorporated by reference herein.
- the expression levels of the at least one of the classifier biomarkers (such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein) determined, measured or detected from the sample obtained from the subject can then be compared to reference expression levels of the at least one of the classifier biomarkers (such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein) from at least one sample training set.
- the at least one sample training set can comprise (i) expression levels of the at least one biomarker from a sample that overexpresses the at least one biomarker or (ii) expression levels from a reference sample for a specific COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) and classifying the sample obtained from the subject sample as a specific
- the comparing step can comprise applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample obtained from the subject and the expression data from the at least one training set(s); and classifying the sample obtained from the subject as a specific COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) based on the results of the statistical algorithm.
- the statistical algorithm for the comparing step can be an algorithm that comprises determining a correlation between the expression data obtained from the tumor sample obtained from the subject (i.e., test sample) and centroids constructed from the expression levels or profiles measured or detected for the at least one classifier biomarkers (such as the classifier biomarkers of Table 1 or subsets thereof or any additional set of biomarker classifiers or subsets thereof as disclosed herein) from the at least one training set.
- the COCA subtype for the tumor sample i.e., test sample
- the centroids can be constructed using any method known in the art for generating centroids such as, for example, those found in Mullins et al. (2007) Clin Chem. 53(7):1273-9 or Dabney (2005) Bioinformatics 21(22):4148-4154
- the COCA subtype can then be assigned to the tumor sample obtained from the subject based on the use of a classification to the nearest centroid (CLaNC) algorithm as applied to the expression data generated or measured from the tumor sample and the centroid(s) constructed for the at least one training set.
- the CLaNC algorithm for use in the methods, compositions and kits provided herein can be the CLaNC algorithm implemented by the ClaNC software found in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123, or equivalents or derivatives thereof.
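- By way of non-limiting illustration, the centroid construction and correlation-based assignment described above can be sketched as follows. This is a simplified nearest-centroid example and not the ClaNC software itself: the function names, the use of Pearson correlation, the toy expression values and the two subtype labels are assumptions made for illustration only.

```python
import numpy as np

def build_centroids(train_expr, train_labels):
    """Mean expression profile per subtype label.

    train_expr: (n_samples, n_biomarkers) array of expression values.
    train_labels: one subtype label per training sample.
    """
    labels = np.asarray(train_labels)
    return {str(lab): train_expr[labels == lab].mean(axis=0)
            for lab in np.unique(labels)}

def classify_nearest_centroid(sample, centroids):
    """Assign the label whose centroid correlates best with the sample."""
    corrs = {lab: float(np.corrcoef(sample, c)[0, 1])
             for lab, c in centroids.items()}
    best = max(corrs, key=corrs.get)
    return best, corrs

# Hypothetical toy training set: 4 samples x 5 biomarkers, two subtypes.
train = np.array([
    [9.0, 8.5, 1.0, 1.2, 5.0],
    [8.8, 9.1, 1.3, 0.9, 5.2],
    [1.1, 0.8, 9.2, 8.7, 5.1],
    [0.9, 1.2, 8.9, 9.3, 4.9],
])
labels = ["C3", "C3", "C14", "C14"]

centroids = build_centroids(train, labels)
test_sample = np.array([8.0, 9.0, 2.0, 1.0, 5.0])
subtype, scores = classify_nearest_centroid(test_sample, centroids)
print(subtype, scores)
```

In practice a distance metric or a rank-based correlation could be substituted for Pearson correlation, and biomarker selection would precede centroid construction.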
- the methods and compositions provided herein allow for the differentiation or diagnosis of a sample obtained from a subject as being a specific COCA subtype.
- the COCA subtype can be one of 21 integrated, pan-cancer COCA subtypes of cancer selected from C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA).
- the differentiation, detection or diagnosis of the sample obtained from the subject as being a COCA subtype as provided herein can be accomplished by measuring or detecting the presence and/or level of one or more classifier biomarkers from a publicly available pan-cancer dataset and/or a pan-cancer dataset provided herein (e.g., Table 1).
- the measuring can be at the nucleic acid or protein level.
- a sample for use in any of the methods and compositions provided herein can be a tumor sample obtained from a subject or patient suffering from or suspected of suffering from a type of cancer.
- the type of cancer can be any type of cancer provided herein and/or known in the art.
- the tumor sample used for the detection or differentiation methods described herein can be a sample previously determined or diagnosed as a type of cancer sample using traditional tissue-of-origin methods. The previous diagnosis can be based on a histological analysis. The histological analysis can be performed by one or more pathologists.
- the sample can be any sample (e.g., tumor) isolated from the subject or patient.
- the subject or patient is a human subject or patient.
- the analysis is performed on biopsies that are embedded in paraffin wax.
- the sample can be a fresh frozen tissue sample.
- the sample can be a bodily fluid obtained from the patient.
- the bodily fluid can be blood or fractions thereof (i.e., serum, plasma), urine, saliva, sputum or cerebrospinal fluid (CSF).
- the sample can contain cellular as well as extracellular sources of nucleic acid or protein for use in the methods provided herein.
- the extracellular sources can be cell-free DNA and/or exosomes.
- the sample can be a cell pellet or a wash.
- This aspect of the methods provided herein provides a means to improve current diagnostics by accurately identifying the major histological types, even from small biopsies.
- the methods provided herein, including the RT-PCR methods, are sensitive, precise and have multi-analyte capability for use with paraffin-embedded samples. See, for example, Cronin et al. (2004) Am. J Pathol. 164(1):35-42, herein incorporated by reference.
- Formalin fixation and tissue embedding in paraffin wax is a universal approach for tissue processing prior to light microscopic evaluation.
- a major advantage afforded by formalin-fixed paraffin-embedded (FFPE) specimens is the preservation of cellular and architectural morphologic detail in tissue sections.
- the standard buffered formalin fixative in which biopsy specimens are processed is typically an aqueous solution containing 37% formaldehyde and 10-15% methyl alcohol.
- Formaldehyde is a highly reactive dipolar compound that results in the formation of protein-nucleic acid and protein-protein crosslinks in vitro (Clark et al. (1986) J Histochem Cytochem 34:1509-1512; McGhee and von Hippel (1975) Biochemistry 14:1281-1296, each incorporated by reference herein).
- the sample used herein is obtained from an individual, and comprises formalin-fixed paraffin-embedded (FFPE) tissue.
- other tissue and sample types are amenable for use herein.
- the other tissue and sample types can be fresh frozen tissue, wash fluids, cell pellets, or the like.
- the sample can be a bodily fluid obtained from the individual.
- the bodily fluid can be blood or fractions thereof (e.g., serum, plasma), urine, sputum, saliva or cerebrospinal fluid (CSF).
- a biomarker nucleic acid as provided herein can be extracted from a cell, can be cell free or extracted from an extracellular vesicular entity such as an exosome.
- total RNA can be isolated from FFPE tissues as described by Bibikova et al. (2004) American Journal of Pathology 165:1799-1807, herein incorporated by reference.
- the High Pure RNA Paraffin Kit (Roche) can be used. Paraffin is removed by xylene extraction followed by ethanol wash.
- RNA can be isolated from sectioned tissue blocks using the MasterPure Purification kit (Epicentre, Madison, Wis.); a DNase I treatment step is included.
- RNA can be extracted from frozen samples using Trizol reagent according to the supplier's instructions (Invitrogen Life Technologies, Carlsbad, Calif.).
- Samples with measurable residual genomic DNA can be resubjected to DNase I treatment and assayed for DNA contamination. All purification, DNase treatment, and other steps can be performed according to the manufacturer's protocol. After total RNA isolation, samples can be stored at −80° C. until use.
- RNA isolation can be performed using a purification kit, a buffer set and protease from commercial manufacturers, such as Qiagen (Valencia, Calif.), according to the manufacturer's instructions.
- RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns.
- Other commercially available RNA isolation kits include the MasterPure™ Complete DNA and RNA Purification Kit (Epicentre, Madison, Wis.) and the Paraffin Block RNA Isolation Kit (Ambion, Austin, Tex.).
- Total RNA from tissue samples can be isolated, for example, using RNA Stat-60 (Tel-Test, Friendswood, Tex.).
- RNA prepared from a tumor can be isolated, for example, by cesium chloride density gradient centrifugation.
- large numbers of tissue samples can readily be processed using techniques well known to those of skill in the art, such as, for example, the single-step RNA isolation process of Chomczynski (U.S. Pat. No. 4,843,155, incorporated by reference in its entirety for all purposes).
- a sample comprises cells harvested from a tumor sample.
- Cells can be harvested from a biological sample using standard techniques known in the art. For example, in one embodiment, cells are harvested by centrifuging a cell sample and resuspending the pelleted cells. The cells can be resuspended in a buffered solution such as phosphate-buffered saline (PBS). After centrifuging the cell suspension to obtain a cell pellet, the cells can be lysed to extract nucleic acid, e.g., messenger RNA. All samples obtained from a subject, including those subjected to any sort of further processing, are considered to be obtained from the subject.
- the sample, in one embodiment, is further processed before the detection of the biomarker levels of the combination of biomarkers set forth herein.
- mRNA in a cell or tissue sample can be separated from other components of the sample.
- the sample can be concentrated and/or purified to isolate mRNA in its non-natural state, as the mRNA is not in its natural environment.
- studies have indicated that the higher order structure of mRNA in vivo differs from the in vitro structure of the same sequence (see, e.g., Rouskin et al. (2014). Nature 505, pp. 701-705, incorporated herein in its entirety for all purposes).
- mRNA from the sample, in one embodiment, is hybridized to a synthetic DNA probe, which in some embodiments includes a detection moiety (e.g., detectable label, capture sequence, barcode reporting sequence). Accordingly, in these embodiments, a non-natural mRNA-cDNA complex is ultimately made and used for detection of the biomarker.
- mRNA from the sample is directly labeled with a detectable label, e.g., a fluorophore.
- the non-natural labeled-mRNA molecule is hybridized to a cDNA probe and the complex is detected.
- cDNA-mRNA hybrids are synthetic and do not exist in vivo.
- cDNA is necessarily different than mRNA, as it includes deoxyribonucleic acid and not ribonucleic acid.
- the cDNA is then amplified, for example, by the polymerase chain reaction (PCR) or other amplification method known to those of ordinary skill in the art.
- Other amplification methods include the ligase chain reaction (LCR) (Genomics 4:560 (1989); Landegren et al., Science, 241:1077 (1988)), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA, 86:1173 (1989), incorporated by reference in its entirety for all purposes), self-sustained sequence replication (Guatelli et al., Proc. Nat. Acad. Sci.), RNA based sequence amplification and nucleic acid based sequence amplification (NASBA).
- the product of this amplification reaction, i.e., amplified cDNA, is also necessarily a non-natural product.
- cDNA is a non-natural molecule.
- the amplification process serves to create hundreds of millions of cDNA copies for every individual cDNA molecule of starting material. The numbers of copies generated are far removed from the number of copies of mRNA that are present in vivo.
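- As a rough arithmetic illustration of the copy numbers referred to above, the following sketch assumes an idealized exponential model in which each cycle multiplies the template by (1 + efficiency). The function names and the efficiency parameter are assumptions for illustration; actual reactions plateau and rarely sustain perfect doubling.

```python
import math

def copies_after_cycles(start_copies, cycles, efficiency=1.0):
    """Idealized copy number after a given number of amplification cycles.

    efficiency=1.0 corresponds to perfect doubling in every cycle.
    """
    return start_copies * (1.0 + efficiency) ** cycles

def cycles_to_reach(target_fold, efficiency=1.0):
    """Smallest whole number of cycles giving at least target_fold amplification."""
    return math.ceil(math.log(target_fold) / math.log(1.0 + efficiency))

# One starting cDNA molecule, perfect doubling for 28 cycles.
fold_28 = copies_after_cycles(1, 28)

# Cycles needed for a 100-million-fold increase under perfect doubling.
cycles_needed = cycles_to_reach(1e8)

print(f"{fold_28:,.0f} copies after 28 cycles; {cycles_needed} cycles for 1e8-fold")
```

Under this model, a single molecule exceeds one hundred million copies within about 27 cycles, which is consistent with the order of magnitude stated above.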
- cDNA is amplified with primers that introduce an additional DNA sequence (e.g., adapter, reporter, capture sequence or moiety, barcode) onto the fragments (e.g., with the use of adapter-specific primers), or mRNA or cDNA biomarker sequences are hybridized directly to a cDNA probe comprising the additional sequence (e.g., adapter, reporter, capture sequence or moiety, barcode).
- Amplification and/or hybridization of mRNA to a cDNA probe therefore serves to create non-natural double stranded molecules from the non-natural single stranded cDNA, or the mRNA, by introducing additional sequences and forming non-natural hybrids.
- amplification procedures have error rates associated with them. Therefore, amplification introduces further modifications into the cDNA molecules.
- a detectable label is added to single strand cDNA molecules.
- Amplification therefore also serves to create DNA complexes that do not occur in nature, at least because (i) cDNA does not exist in vivo, (ii) adapter sequences are added to the ends of cDNA molecules to make DNA sequences that do not exist in vivo, (iii) the error rate associated with amplification further creates DNA sequences that do not exist in vivo, (iv) the disparate structure of the cDNA molecules as compared to what exists in nature, and (v) the chemical addition of a detectable label to the cDNA molecules.
- the expression of a biomarker of interest is detected at the nucleic acid level via detection of non-natural cDNA molecules.
- the biomarkers described herein can include RNA comprising the entire or partial sequence of any of the nucleic acid sequences of interest, or their non-natural cDNA product, obtained synthetically in vitro in a reverse transcription reaction.
- the term "fragment" is intended to refer to a portion of the polynucleotide that generally comprises at least 10, 15, 20, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 800, 900, 1,000, 1,200, or 1,500 contiguous nucleotides, or up to the number of nucleotides present in a full-length biomarker polynucleotide disclosed herein.
- a fragment of a biomarker polynucleotide will generally encode at least 15, 25, 30, 50, 100, 150, 200, or 250 contiguous amino acids, or up to the total number of amino acids present in a full-length biomarker protein as provided herein.
- Isolated mRNA can be used in hybridization or amplification assays that include, but are not limited to, Southern or Northern analyses, PCR analyses, probe arrays and NanoString assays.
- One method for the detection of mRNA levels involves contacting the isolated mRNA or synthesized cDNA with a nucleic acid molecule (probe) that can hybridize to the mRNA encoded by the gene being detected.
- the nucleic acid probe can be, for example, a cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 50, 100, 250, or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to the non-natural cDNA or mRNA biomarker provided herein.
- the measuring or detecting step in any method provided herein is at the nucleic acid level by performing RNA-seq, a reverse transcriptase polymerase chain reaction (RT-PCR) or a hybridization assay with oligonucleotides that are substantially complementary to portions of cDNA molecules of the at least one classifier biomarker (such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein) under conditions suitable for RNA-seq, RT-PCR or hybridization and obtaining expression levels of the at least one classifier biomarkers based on the detecting step.
- the method for COCA subtyping includes not only detecting expression levels of a classifier biomarker set in a sample obtained from a subject, but can further comprise detecting expression levels of said classifier biomarker set in one or more control or reference samples.
- the one or more control or reference samples can be selected from a normal or cancer-free sample, a cancer sample of a specific COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) or any combination thereof.
- the detecting includes all of the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein at the nucleic acid level or protein level. In some embodiments, the detecting includes all of the classifier biomarkers of Table 1 at the nucleic acid level or protein level.
- a single biomarker, a subset or a plurality of the classifier biomarkers of Table 1 is detected in a method to determine the COCA subtype, for example, from about 1 to about 4, from about 4 to about 8, from about 8 to about 12, from about 12 to about 16, from about 16 to about 20, from about 20 to about 24, from about 24 to about 28, from about 28 to about 32, from about 32 to about 36, from about 36 to about 40, from about 40 to about 44, from about 44 to about 48, from about 48 to about 52, from about 52 to about 56, from about 56 to about 60, from about 60 to about 64, from about 64 to about 68, from about 68 to about 72, from about 72 to about 76, or from about 76 to about 80 of the biomarkers in Table 1.
- each of the biomarkers from Table 1 is detected in a method to determine the COCA subtype.
- any of 84 of the biomarkers from Table 1 are selected as the gene signatures for a specific COCA subtype.
- the detecting can be performed by any suitable technique including, but not limited to, RNA-seq, a reverse transcriptase polymerase chain reaction (RT-PCR), a microarray hybridization assay, or another hybridization assay, e.g., a NanoString assay, with primers and/or probes specific to the classifier biomarkers, and/or the like.
- the primers useful for the amplification methods are any forward and reverse primers suitable for binding to a classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein.
- once the mRNA is obtained from a sample (e.g., from a subject suffering from or suspected of suffering from cancer or a control subject), it is converted to complementary DNA (cDNA) in a hybridization reaction.
- Conversion of the mRNA to cDNA can be performed with oligonucleotides or primers comprising sequence that is complementary to a portion of a specific mRNA.
- Conversion of the mRNA to cDNA can be performed with oligonucleotides or primers comprising random sequence.
- Conversion of the mRNA to cDNA can be performed with oligonucleotides or primers comprising sequence that is complementary to the poly(A) tail of an mRNA.
- cDNA does not exist in vivo and therefore is a non-natural molecule.
- the cDNA is then amplified, for example, by the polymerase chain reaction (PCR) or other amplification method known to those of ordinary skill in the art.
- PCR can be performed with the forward and/or reverse primers comprising sequence complementary to at least a portion of a classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein.
- the product of this amplification reaction, i.e., amplified cDNA, is necessarily a non-natural product.
- cDNA is a non-natural molecule.
- the amplification process serves to create hundreds of millions of cDNA copies for every individual cDNA molecule of starting material. The number of copies generated is far removed from the number of copies of mRNA that are present in vivo.
- cDNA is amplified with primers that introduce an additional DNA sequence (adapter sequence) onto the fragments (with the use of adapter-specific primers).
- the adaptor sequence can be a tail, wherein the tail sequence is not complementary to the cDNA.
- the forward and/or reverse primers comprising sequence complementary to at least a portion of a classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein can comprise tail sequence.
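- The construction of primers carrying a non-complementary tail can be sketched as follows, purely as an illustration. The target sequence, the tail sequences, the fixed binding length and the function names are hypothetical, and the sketch does not check melting temperature, specificity, or that the tail is in fact non-complementary to the cDNA.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def tailed_primers(target_cdna, primer_len, fwd_tail, rev_tail):
    """Return (forward, reverse) primers, each written 5'->3' as tail + binding region.

    The forward binding region matches the 5' end of the target strand;
    the reverse binding region is the reverse complement of its 3' end.
    """
    target = target_cdna.upper()
    if 2 * primer_len > len(target):
        raise ValueError("target too short for the requested primer length")
    forward = fwd_tail + target[:primer_len]
    reverse = rev_tail + reverse_complement(target[-primer_len:])
    return forward, reverse

# Hypothetical 36-nt target and 10-nt tails, for illustration only.
target = "ATGGCCTTAGCGTACGATCGGATCCTAGGCTAACGT"
fwd, rev = tailed_primers(target, 10, "ACACTCTTTC", "GTGACTGGAG")
print(fwd)
print(rev)
```

Each amplicon produced with such primers carries the tail sequences at its ends, so the amplified product differs in sequence from the starting cDNA.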
- Amplification therefore serves to create non-natural double stranded molecules from the non-natural single stranded cDNA, by introducing barcode, adapter and/or reporter sequences onto the already non-natural cDNA.
- Amplification therefore also serves to create DNA complexes that do not occur in nature, at least because (i) cDNA does not exist in vivo, (ii) adapter sequences are added to the ends of cDNA molecules to make DNA sequences that do not exist in vivo, (iii) the error rate associated with amplification further creates DNA sequences that do not exist in vivo, (iv) the disparate structure of the cDNA molecules as compared to what exists in nature, and (v) the chemical addition of a detectable label to the cDNA molecules.
- the synthesized cDNA (for example, amplified cDNA) is immobilized on a solid surface via hybridization with a probe, e.g., via a microarray.
- cDNA products are detected via real-time polymerase chain reaction (PCR) via the introduction of fluorescent probes that hybridize with the cDNA products.
- biomarker detection is assessed by quantitative fluorogenic RT-PCR (e.g., with TaqMan® probes).
- For PCR analysis, well known methods are available in the art for the determination of primer sequences for use in the analysis.
- the measuring or detecting step in any method provided herein is performed via a hybridization assay that comprises probing the levels of at least one of the classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein, at the nucleic acid level, in a tumor sample obtained from the patient.
- the probing step comprises mixing the sample with one or more oligonucleotides that are substantially complementary to portions of cDNA molecules of the at least one classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein under conditions suitable for hybridization of the one or more oligonucleotides to their complements or substantial complements; detecting whether hybridization occurs between the one or more oligonucleotides to their complements or substantial complements; and obtaining hybridization values of the at least one classifier biomarkers based on the detecting step.
- the hybridization values of the at least one classifier biomarkers are then compared to reference hybridization value(s) from at least one sample training set.
- the tumor sample is classified, for example, as a COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) based on the results of the comparing step.
- the hybridization values of the tumor sample can be compared to centroid(s) constructed from the hybridization values of the training set.
- the hybridization reaction utilized in methods provided herein employs a capture probe and/or a reporter probe.
- the hybridization probe is a probe derivatized to a solid surface such as a bead, glass or silicon substrate.
- the capture probe is present in solution and mixed with the patient's sample, followed by attachment of the hybridization product to a surface, e.g., via a biotin-avidin interaction (e.g., where biotin is a part of the capture probe and avidin is on the surface).
- the hybridization assay employs both a capture probe and a reporter probe.
- the reporter probe can hybridize to either the capture probe or the biomarker nucleic acid.
- Reporter probes, for example, are then counted and detected to determine the level of biomarker(s) in the sample.
- the capture and/or reporter probe, in one embodiment, contains a detectable label and/or a group that allows functionalization to a surface.
- the nCounter gene analysis system (see, e.g., Geiss et al. (2008) Nat. Biotechnol. 26, pp. 317-325, incorporated by reference in its entirety for all purposes) is amenable for use with the methods provided herein.
- Hybridization assays described in U.S. Pat. Nos. 7,473,767 and 8,492,094, the disclosures of which are incorporated by reference in their entireties for all purposes, are amenable for use with the methods provided herein, i.e., to detect the biomarkers and biomarker combinations described herein.
- Biomarker levels may be monitored using a membrane blot (such as used in hybridization analysis such as Northern, Southern, dot, and the like), or microwells, sample tubes, gels, beads, or fibers (or any solid support comprising bound nucleic acids). See, for example, U.S. Pat. Nos. 5,770,722, 5,874,219, 5,744,305, 5,677,195 and 5,445,934, each incorporated by reference in their entireties.
- microarrays are used to detect biomarker levels. Microarrays are particularly well suited for this purpose because of the reproducibility between different experiments. DNA microarrays provide one method for the simultaneous measurement of the expression levels of large numbers of genes. Each array consists of a reproducible pattern of capture probes attached to a solid support. Labeled RNA or DNA is hybridized to complementary probes on the array and then detected by laser scanning. Hybridization intensities for each probe on the array are determined and converted to a quantitative value representing relative gene expression levels. See, for example, U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135, 6,033,860 and 6,344,316, each incorporated by reference in their entireties. High-density oligonucleotide arrays are particularly useful for determining the gene expression profile for a large number of RNAs in a sample.
- arrays can be nucleic acids (or peptides) on beads, gels, polymeric surfaces, fibers (such as fiber optics), glass, or any other appropriate substrate. See, for example, U.S. Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992, each incorporated by reference in their entireties.
- Arrays can be packaged in such a manner as to allow for diagnostics or other manipulation of an all-inclusive device. See, for example, U.S. Pat. Nos. 5,856,174 and 5,922,591, each incorporated by reference in their entireties.
- Serial analysis of gene expression (SAGE), in one embodiment, is employed in the methods described herein.
- SAGE is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript.
- a short sequence tag (about 10-14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript.
- many transcripts are linked together to form long serial molecules, that can be sequenced, revealing the identity of the multiple tags simultaneously.
- the expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. See, Velculescu et al. Science 270:484-87, 1995; Cell 88:243-51, 1997, incorporated by reference in its entirety.
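- The tag-counting step described above can be illustrated with the following simplified sketch. It assumes that tags of fixed length follow a four-base anchoring site (CATG is used here as an assumed example), that all tags are read in one orientation, and that a lookup table from tag to gene is available. The gene names and sequences are hypothetical, and ditag handling used in actual SAGE protocols is omitted.

```python
from collections import Counter

def count_tags(concatemer, anchor="CATG", tag_len=10):
    """Count fixed-length tags that immediately follow each anchor site."""
    tags = []
    pos = concatemer.find(anchor)
    while pos != -1:
        start = pos + len(anchor)
        tag = concatemer[start:start + tag_len]
        if len(tag) == tag_len:  # ignore a truncated tag at the end of the read
            tags.append(tag)
        # Resume the search after the tag so the tag body is not rescanned.
        pos = concatemer.find(anchor, start + tag_len)
    return Counter(tags)

def tag_abundance_by_gene(tag_counts, tag_to_gene):
    """Aggregate tag counts per gene; unknown tags are grouped as 'unassigned'."""
    gene_counts = Counter()
    for tag, n in tag_counts.items():
        gene_counts[tag_to_gene.get(tag, "unassigned")] += n
    return gene_counts

# Hypothetical lookup table and sequenced concatemer.
tag_to_gene = {"AAACCCGGGT": "GENE_A", "TTTGGGCCCA": "GENE_B"}
concatemer = "CATGAAACCCGGGTCATGTTTGGGCCCACATGAAACCCGGGT"

tag_counts = count_tags(concatemer)
gene_counts = tag_abundance_by_gene(tag_counts, tag_to_gene)
print(dict(gene_counts))
```

The relative abundance of each gene's tags then serves as the quantitative estimate of that transcript's expression.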
- the measuring or detecting step in any method provided herein is performed via an amplification assay.
- the amplification assay can be coupled with a sequencing method.
- a method of biomarker level analysis at the nucleic acid level as provided herein utilizes an amplification reaction coupled with a sequencing method such as, for example, RNAseq, next generation sequencing, and massively parallel signature sequencing (MPSS) as described by Brenner et al. (Nat. Biotech. 18:630-34, 2000, incorporated by reference in its entirety).
- MPSS is a sequencing approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads.
- a microbead library of DNA templates is constructed by in vitro cloning. This is followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3.0×10⁶ microbeads/cm²). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method has been shown to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from a yeast cDNA library.
- the expression level values of the at least one classifier biomarkers obtained from the amplification and/or sequencing assay are then compared to reference expression level value(s) from at least one sample training set.
- the tumor sample is classified, for example, as a COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) based on the results of the comparing step.
- RNA amplification method such as, for example, RT-PCR or quantitative RT-PCR (qRT-PCR).
- Methods for determining the level of biomarker mRNA in a sample may involve the process of nucleic acid amplification, e.g., by RT-PCR (the experimental embodiment set forth in Mullis, 1987, U.S. Pat. No. 4,683,202), ligase chain reaction (Barany (1991) Proc. Natl. Acad. Sci. USA 88:189-193), self-sustained sequence replication (Guatelli et al. (1990) Proc. Natl. Acad. Sci.
- a target polynucleotide sequence is amplified by reaction with at least one oligonucleotide primer or pair of oligonucleotide primers.
- the primer(s) hybridize to a complementary region of the target nucleic acid and a DNA polymerase extends the primer(s) to amplify the target sequence.
- a nucleic acid fragment of one size dominates the reaction products (the target polynucleotide sequence which is the amplification product).
- the amplification cycle is repeated to increase the concentration of the single target polynucleotide sequence.
- the reaction can be performed in any thermocycler commonly used for PCR.
- Quantitative RT-PCR (also referred to as real-time RT-PCR) is preferred under some circumstances because it provides not only a quantitative measurement, but also reduced time and contamination.
- quantitative PCR or “real time qRT-PCR” refers to the direct monitoring of the progress of a PCR amplification as it is occurring without the need for repeated sampling of the reaction products.
- the reaction products may be monitored via a signaling mechanism (e.g., fluorescence) as they are generated and are tracked after the signal rises above a background level but before the reaction reaches a plateau.
- a DNA binding dye e.g., SYBR green
- a labeled probe can be used to detect the extension product generated by PCR amplification. Any probe format utilizing a labeled probe comprising the sequences provided herein may be used.
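- The monitoring of a signal as it rises above a background level can be illustrated by estimating the cycle at which fluorescence first crosses a fixed threshold. The sketch below uses linear interpolation between adjacent cycles, which is an assumption made for simplicity. The function name, the readings and the threshold value are hypothetical, and instrument software may use other fitting approaches.

```python
def threshold_cycle(fluorescence, threshold):
    """Fractional cycle at which the signal first crosses `threshold`.

    fluorescence[i] is the reading taken after cycle i + 1.
    Returns None if the signal never crosses the threshold.
    """
    for i in range(1, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]
        if lo < threshold <= hi:
            # Reading i - 1 belongs to cycle i, reading i to cycle i + 1.
            frac = (threshold - lo) / (hi - lo)
            return i + frac
    return None

# Hypothetical background-subtracted readings for ten cycles.
readings = [0.01, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.0, 1.1]
ct = threshold_cycle(readings, threshold=0.24)
print(ct)
```

A lower threshold cycle indicates more starting template, so threshold cycles for a biomarker can be compared across samples once normalized.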
- Immunohistochemistry methods are also suitable for detecting the levels of the biomarkers provided herein.
- Samples can be frozen for later preparation or immediately placed in a fixative solution.
- Tissue samples can be fixed by treatment with a reagent, such as formalin, glutaraldehyde, methanol, or the like and embedded in paraffin.
- COCA subtypes can be evaluated using levels of protein expression of one or more of the classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein.
- the level of protein expression can be measured using an immunological detection method.
- Immunological detection methods which can be used herein include, but are not limited to, competitive and non-competitive assay systems using techniques such as Western blots, radioimmunoassays, ELISA (enzyme linked immunosorbent assay), “sandwich” immunoassays, immunoprecipitation assays, precipitin reactions, gel diffusion precipitin reactions, immunodiffusion assays, agglutination assays, complement-fixation assays, immunoradiometric assays, fluorescent immunoassays, protein A immunoassays, and the like.
- antibodies specific for biomarker proteins are utilized to detect the expression of a biomarker protein in a sample (e.g., tumor sample).
- the method comprises obtaining a sample from a patient or a subject, contacting the sample with at least one antibody directed to a biomarker that is selectively expressed in cancer cells, and detecting antibody binding to determine if the biomarker is expressed in the patient sample.
- an immunocytochemistry technique for diagnosing COCA subtypes.
- One of skill in the art will recognize that the immunocytochemistry method described herein below may be performed manually or in an automated fashion.
- the expression level of a classifier biomarker(s) is determined by normalization to the level of reference nucleic acid(s) (e.g., RNA transcripts) or their expression products (e.g., proteins), which can be all measured nucleic acids (e.g., transcripts (or their products)) in the sample or a particular reference set of nucleic acids (e.g., RNA transcripts (or their non-natural cDNA products)).
- Normalization is performed to correct for or normalize away both differences in the amount of nucleic acid (e.g., RNA or cDNA) assayed and variability in the quality of the nucleic acid (e.g., RNA or cDNA) used. Therefore, an assay typically measures and incorporates the expression of certain normalizing genes, including well-known housekeeping genes, such as, for example, GAPDH and/or β-actin. Alternatively, normalization can be based on the mean or median signal of all of the assayed biomarkers or a large subset thereof (global normalization approach).
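The two normalization approaches described above can be illustrated with a minimal Python sketch. It assumes expression values are already on a log2 scale, so that normalization is a subtraction; the function names, the gene labels `GENE_A` and `GENE_B`, and all numeric values are hypothetical and are not taken from the disclosure.

```python
from statistics import mean, median

def normalize_to_reference(expression, reference_genes):
    """Subtract the mean log2 level of the reference (housekeeping) genes."""
    ref_level = mean(expression[g] for g in reference_genes)
    return {gene: value - ref_level for gene, value in expression.items()}

def normalize_global(expression):
    """Subtract the median log2 level of all assayed genes (global approach)."""
    centre = median(expression.values())
    return {gene: value - centre for gene, value in expression.items()}

# Hypothetical log2 expression values for one sample.
sample = {"GAPDH": 12.0, "ACTB": 14.0, "GENE_A": 9.0, "GENE_B": 15.0}

by_housekeeping = normalize_to_reference(sample, ["GAPDH", "ACTB"])
by_global = normalize_global(sample)
```

In this toy sample both approaches happen to use the same offset (13.0), so the normalized values agree; on real data they generally differ.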
- the levels of the biomarkers provided herein are normalized against the expression levels of all RNA transcripts or their non-natural cDNA expression products, or protein products in the sample, or of a reference set of RNA transcripts or a reference set of their non-natural cDNA expression products, or a reference set of their protein products in the sample.
- the methods set forth herein provide a method for determining the COCA subtype of a patient.
- the biomarker levels e.g., Table 1 or any other gene signature provided herein
- the biomarker levels are compared to reference values or a reference sample as provided herein, for example with the use of statistical methods or direct comparison of detected levels, to make a determination of the COCA subtype.
- the patient's tumor sample is classified, e.g., as a specific COCA subtype (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA).
- expression level values of the at least one classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 are compared to reference expression level value(s) from at least one sample training set, wherein the at least one sample training set comprises expression level values from a reference sample(s).
- the at least one sample training set comprises expression level values of the at least one classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein from a specific COCA subtype (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA) or a combination thereof.
- hybridization values of the at least one classifier biomarkers provided herein are compared to reference hybridization value(s) from at least one sample training set, wherein the at least one sample training set comprises hybridization values from a reference sample(s).
- the at least one sample training set comprises hybridization values of the at least one classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein from a specific COCA subtype (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA) or a combination thereof.
- Methods for comparing detected levels of biomarkers to reference values and/or reference samples are provided herein. Based on this comparison, in one embodiment a correlation between the biomarker levels obtained from the subject's sample and the reference values is obtained. An assessment of the COCA subtype is then made.
- biomarker levels obtained from the patient and reference biomarker levels for example, from at least one sample training set.
- a supervised pattern recognition method is employed.
- supervised pattern recognition methods can include, but are not limited to, the nearest centroid methods (Dabney (2005) Bioinformatics 21(22):4148-4154 and Tibshirani et al. (2002) Proc. Natl. Acad. Sci.
- the classifier for identifying COCA subtypes based on gene expression data is used in a centroid based method as described in Mullins et al. (2007) Clin Chem. 53(7):1273-9, which is incorporated herein by reference in its entirety.
- the classifier for identifying tumor subtypes based on gene expression data is used in a nearest centroid based method as described in Dabney (2005) Bioinformatics 21(22):4148-4154, which is incorporated herein by reference in its entirety.
- the nearest centroid based method can be performed using ClaNC software as described in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123 or equivalents or derivatives thereof.
- an unsupervised training approach is employed, and therefore, no training set is used.
- a sample training set(s) can include expression data of a plurality or all of the classifier biomarkers (e.g., all the classifier biomarkers of Table 1) from a specific COCA subtype sample.
- the plurality of classifier biomarkers can comprise at least 4 classifier biomarkers, at least 8 classifier biomarkers, at least 12 classifier biomarkers, at least 16 classifier biomarkers, at least 20 classifier biomarkers, at least 24 classifier biomarkers, at least 28 classifier biomarkers, at least 32 classifier biomarkers, at least 36 classifier biomarkers, at least 40 classifier biomarkers, at least 44 classifier biomarkers, at least 48 classifier biomarkers, at least 52 classifier biomarkers, at least 56 classifier biomarkers, at least 60 classifier biomarkers, at least 64 classifier biomarkers, at least 68 classifier biomarkers, at least 72 classifier biomarkers, at least 76 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- the plurality of classifier biomarkers comprises all 84 biomarkers of Table 1.
- the sample training set(s) are normalized to remove sample-to-sample variation.
- comparing can include applying a statistical algorithm, such as, for example, any suitable multivariate statistical analysis model, which can be parametric or non-parametric.
- applying the statistical algorithm can include determining a correlation between the expression data obtained from the tumor sample obtained from the subject suffering from or suspected of suffering from cancer (i.e., the test subject) and the expression data from the COCA subtyping training set(s).
- cross-validation is performed, such as, for example, leave-one-out cross-validation (LOOCV).
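As an illustration of leave-one-out cross-validation, the following minimal Python sketch holds out each training sample in turn, rebuilds a simple nearest-centroid classifier on the remaining samples, and reports the fraction of held-out samples classified correctly. The labels `A` and `B`, the two-gene profiles, and the function names are hypothetical; a Euclidean nearest-centroid rule stands in for whichever classifier is actually being validated.

```python
import math
from statistics import mean

def nearest_centroid(train, profile):
    """Return the label whose mean training profile is closest (Euclidean)."""
    by_label = {}
    for label, vec in train:
        by_label.setdefault(label, []).append(vec)
    centroids = {l: [mean(c) for c in zip(*vs)] for l, vs in by_label.items()}
    return min(centroids, key=lambda l: math.dist(profile, centroids[l]))

def loocv_accuracy(samples):
    """Hold out each sample once, train on the rest, count correct calls."""
    correct = 0
    for i, (label, vec) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        correct += nearest_centroid(rest, vec) == label
    return correct / len(samples)

# Hypothetical (label, expression profile) pairs.
samples = [
    ("A", [1.0, 1.0]), ("A", [1.2, 0.8]), ("A", [0.9, 1.1]),
    ("B", [5.0, 5.0]), ("B", [5.2, 4.9]), ("B", [4.8, 5.1]),
]
accuracy = loocv_accuracy(samples)
```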
- integrative correlation is performed.
- Spearman correlation is performed.
- a centroid based method based on gene expression data is employed for the statistical algorithm.
- the centroids can be constructed using any method known in the art for generating centroids such as, for example, those found in Mullins et al. (2007) Clin Chem. 53(7):1273-9 or the nearest centroid method found in Dabney (2005) Bioinformatics 21(22):4148-4154, which is herein incorporated by reference in its entirety.
- a correlation analysis is performed on the expression data obtained from the tumor sample and the centroid(s) constructed from the expression data from the COCA training set(s).
- the correlation analysis can be a Spearman correlation or a Pearson correlation.
- a distance measure analysis e.g., Euclidean distance
- Results of the gene expression analysis performed on a sample from a subject may be compared to a biological sample(s) or data derived from a biological sample(s) (e.g., expression data or levels from at least one classifier biomarker provided herein, e.g., Table 1) that is known or suspected to be normal (“reference sample” or “normal sample”, e.g., non-cancer sample).
- a reference sample or reference gene expression data (e.g., expression data or levels from at least one classifier biomarker provided herein, e.g., Table 1) is obtained or derived from an individual known to have a particular COCA subtype of cancer, e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA.
- the gene expression levels or profile measured for the at least one classifier biomarkers from Table 1 measured or detected in the test sample may be compared to centroids constructed from the gene expression performed on the reference or normal sample or training set and classification can be based on determining which is the nearest centroid based on distance measure such as, for example, a Euclidean distance or a correlation.
- the centroids can be constructed using any of the methods provided herein such as, for example, using the ClaNC software described in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123 or equivalents or derivatives related thereto.
- Classification or determination of the subtype of the test sample can then be ascertained by determining the nearest centroid from the reference or normal sample to which the expression levels or profile from said test sample is nearest based on a distance measure or correlation.
- the distance measure can be a Euclidean distance.
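The nearest-centroid classification described above can be sketched in Python as follows. This is an illustrative sketch, not the ClaNC implementation: centroids are plain per-gene means of each subtype's training profiles, and the test profile is assigned either to the centroid at the smallest Euclidean distance or to the centroid with the highest Spearman correlation. The four-gene profiles and all function names are hypothetical.

```python
import math
from statistics import mean

def build_centroids(training):
    """Average each subtype's training profiles gene by gene."""
    return {subtype: [mean(gene) for gene in zip(*profiles)]
            for subtype, profiles in training.items()}

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant vectors."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def spearman(a, b):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def classify(profile, centroids, metric="euclidean"):
    """Assign the profile to the nearest centroid under the chosen metric."""
    if metric == "euclidean":
        return min(centroids, key=lambda s: math.dist(profile, centroids[s]))
    if metric == "spearman":
        return max(centroids, key=lambda s: spearman(profile, centroids[s]))
    raise ValueError("metric must be 'euclidean' or 'spearman'")

# Hypothetical training profiles (four genes) for two subtypes.
training = {
    "C3 OV": [[8.0, 2.0, 5.0, 1.0], [9.0, 3.0, 6.0, 2.0]],
    "C14 PRAD": [[1.0, 7.0, 2.0, 9.0], [2.0, 8.0, 3.0, 10.0]],
}
centroids = build_centroids(training)
test_profile = [7.5, 3.0, 5.0, 2.5]
call_by_distance = classify(test_profile, centroids)
call_by_correlation = classify(test_profile, centroids, metric="spearman")
```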
- the reference sample may be assayed at the same time, or at a different time from the sample obtained from the test subject (i.e., test sample).
- the biomarker level information from a reference sample may be stored in a database or other means for access at a later date.
- the biomarker level results of an assay on the test sample may be compared to the results of the same assay on a reference sample.
- the results of the assay on the reference sample are from a database, or a reference value(s).
- the results of the assay on the reference sample are a known or generally accepted value or range of values by those skilled in the art.
- the comparison is qualitative. In other cases, the comparison is quantitative.
- qualitative or quantitative comparisons may involve but are not limited to one or more of the following: comparing expression levels of a test sample to gene centroids constructed from expression level data from a reference sample (e.g., constructed from expression level data for one or a plurality of genes from Table 1), fluorescence values, spot intensities, absorbance values, chemiluminescent signals, histograms, critical threshold values, statistical significance values, expression levels of the genes described herein, mRNA copy numbers.
- an odds ratio is calculated for each biomarker level panel measurement.
- the OR is a measure of association between the measured biomarker values for the patient and an outcome, e.g., COCA subtype. For example, see, J. Can. Acad. Child Adolesc. Psychiatry 2010; 19(3): 227-229, which is incorporated by reference in its entirety for all purposes.
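For a single biomarker dichotomized into "high" and "low" levels, an odds ratio and an approximate 95% confidence interval can be computed from a 2×2 table. The Python sketch below uses the standard log-odds (Woolf) interval; the dichotomization, the counts, and the function names are assumptions made for illustration and are not taken from the disclosure.

```python
import math

def odds_ratio(a, b, c, d):
    """a, b: in-subtype / other samples with a high level;
    c, d: in-subtype / other samples with a low level."""
    return (a * d) / (b * c)

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval on the log-odds scale (Woolf method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts: 30 in-subtype and 10 other samples show a high level,
# 20 in-subtype and 40 other samples show a low level.
estimate = odds_ratio(30, 10, 20, 40)
lower, upper = odds_ratio_ci(30, 10, 20, 40)
```

An interval that excludes 1 suggests an association between the biomarker level and the subtype at roughly the 5% level.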
- a specified statistical confidence level may be determined in order to provide a confidence level regarding the COCA subtype. For example, it may be determined that a confidence level of greater than 90% may be a useful predictor of the COCA subtype. In other embodiments, more or less stringent confidence levels may be chosen. For example, a confidence level of about or at least about 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, 99.5%, or 99.9% may be chosen. The confidence level provided may in some cases be related to the quality of the sample, the quality of the data, the quality of the analysis, the specific methods used, and/or the number of gene expression values (i.e., the number of genes) analyzed.
- the specified confidence level for providing the likelihood of response may be chosen on the basis of the expected number of false positives or false negatives.
- Methods for choosing parameters for achieving a specified confidence level or for identifying markers with diagnostic power include but are not limited to Receiver Operating Characteristic (ROC) curve analysis, binomial ROC, principal component analysis, odds ratio analysis, partial least squares analysis, singular value decomposition, least absolute shrinkage and selection operator analysis, least angle regression, and the threshold gradient directed regularization method.
- Determining the COCA subtype in some cases can be improved through the application of algorithms designed to normalize and/or improve the reliability of the gene expression data.
- the data analysis utilizes a computer or other device, machine or apparatus for application of the various algorithms described herein due to the large number of individual data points that are processed.
- a “machine learning algorithm” refers to a computational-based prediction methodology, also known to persons skilled in the art as a “classifier,” employed for characterizing a gene expression profile or profiles, e.g., to determine the COCA subtype.
- the biomarker levels are in one embodiment subjected to the algorithm in order to classify the profile.
- Supervised learning generally involves “training” a classifier to recognize the distinctions among COCA subtypes such as, for example, C1 ACC/PCPG positive, C2 GBM/LGG positive, C3 OV positive, C4 Squamous-like positive, C6 LUAD-Enriched positive, C8 PAAD/some STAD positive, C9 UCS positive, C10 BRCA/Basal positive, C12 UCEC positive, C14 PRAD positive, C15 CESC (subset of cervical) positive, C16 BLCA positive, C17 TGCT positive, C19 COAD/READ positive, C20 SARC/MESO positive, C21 KIRC/KICH/KIRP positive, C22 Liver positive, C24 BRCA/Luminal positive, C25 THYM positive, C26
- the classifier can be used to predict, for example, the class (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA) to which the samples belong.
- the machine learning algorithm can be a ClaNC algorithm as provided herein.
- a robust multi-array average (RMA) method may be used to normalize raw data.
- the RMA method begins by computing background-corrected intensities for each matched cell on a number of microarrays.
- the background corrected values are restricted to positive values as described by Irizarry et al. (2003). Biostatistics April 4 (2): 249-64, incorporated by reference in its entirety for all purposes. After background correction, the base-2 logarithm of each background corrected matched-cell intensity is then obtained.
- the background corrected, log-transformed, matched intensity on each microarray is then normalized using the quantile normalization method in which, for each input array and each probe value, the array percentile probe value is replaced with the average of all array percentile points. This method is more completely described by Bolstad et al. Bioinformatics 2003, incorporated by reference in its entirety.
- the normalized data may then be fit to a linear model to obtain an intensity measure for each probe on each microarray.
- Tukey's median polish algorithm (Tukey, J. W., Exploratory Data Analysis. 1977, incorporated by reference in its entirety for all purposes) may then be used to determine the log-scale intensity level for the normalized probe set data.
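The log transformation and quantile normalization steps of the RMA procedure outlined above can be sketched in Python as follows. The sketch assumes background-corrected, strictly positive intensities and omits the background correction, linear model fit, and median polish summarization; ties are broken by probe order rather than averaged. The intensities and function names are hypothetical.

```python
import math

def log2_transform(arrays):
    """Base-2 logarithm of each background-corrected intensity."""
    return [[math.log2(v) for v in array] for array in arrays]

def quantile_normalize(arrays):
    """Replace each value with the mean, across arrays, of the values
    that share its rank, so every array gets the same distribution."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays)
                  for r in range(n_probes)]
    normalized = []
    for array in arrays:
        order = sorted(range(n_probes), key=lambda i: array[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

# Hypothetical background-corrected intensities: two arrays, three probes.
raw = [[4.0, 16.0, 64.0], [32.0, 2.0, 8.0]]
normalized = quantile_normalize(log2_transform(raw))
```

After normalization both arrays contain the same set of values (1.5, 3.5, 5.5), each assigned according to the probe's original rank within its array.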
- Various other software programs may be implemented.
- feature selection and model estimation may be performed by logistic regression with lasso penalty using glmnet (Friedman et al. (2010). Journal of statistical software 33(1): 1-22, incorporated by reference in its entirety).
- Raw reads may be aligned using TopHat (Trapnell et al. (2009). Bioinformatics 25(9): 1105-11, incorporated by reference in its entirety).
- top features N ranging from 10 to 200
- SVM linear support vector machine
- Confidence intervals are computed using the pROC package (Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC bioinformatics 2011; 12: 77, incorporated by reference in its entirety).
- data may be filtered to remove data that may be considered suspect.
- data derived from microarray probes that have fewer than about 4, 5, 6, 7 or 8 guanosine+cytosine nucleotides may be considered to be unreliable due to their aberrant hybridization propensity or secondary structure issues.
- data deriving from microarray probes that have more than about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 guanosine+cytosine nucleotides may in one embodiment be considered unreliable due to their aberrant hybridization propensity or secondary structure issues.
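The GC-content filter described above can be sketched in Python as follows. The cut-offs of 6 and 17 guanosine+cytosine nucleotides are illustrative picks from within the ranges listed above, and the 25-nucleotide probe sequences and function names are hypothetical.

```python
def gc_count(sequence):
    """Number of guanosine and cytosine nucleotides in a probe sequence."""
    return sum(1 for base in sequence.upper() if base in "GC")

def keep_probe(sequence, min_gc=6, max_gc=17):
    """Keep a probe only if its G+C count lies within [min_gc, max_gc]."""
    return min_gc <= gc_count(sequence) <= max_gc

# Hypothetical 25-mer probes.
probes = {
    "low_gc": "AT" * 12 + "A",        # 0 G+C nucleotides
    "mid_gc": "ATGC" * 6 + "A",       # 12 G+C nucleotides
    "high_gc": "GC" * 12 + "G",       # 25 G+C nucleotides
}
retained = [name for name, seq in probes.items() if keep_probe(seq)]
```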
- data from probe-sets may be excluded from analysis if they are not identified at a detectable level (above background).
- probe-sets that exhibit no, or low variance may be excluded from further analysis.
- Low-variance probe-sets are excluded from the analysis via a Chi-Square test.
- a probe-set is considered to be low-variance if its transformed variance is to the left of the 99 percent confidence interval of the Chi-Squared distribution with (N − 1) degrees of freedom: (N − 1) × Probe-set Variance / (Gene Probe-set Variance) ~ Chi-Sq(N − 1), where N is the number of input CEL files, (N − 1) is the degrees of freedom for the Chi-Squared distribution, and the “probe-set variance for the gene” is the average of probe-set variances across the gene.
- probe-sets for a given mRNA or group of mRNAs may be excluded from further analysis if they contain less than a minimum number of probes that pass through the previously described filter steps for GC content, reliability, variance and the like.
- probe-sets for a given gene or transcript cluster may be excluded from further analysis if they contain less than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or less than about 20 probes.
- Methods of biomarker level data analysis in one embodiment further include the use of a feature selection algorithm as provided herein.
- feature selection is provided by use of the LIMMA software package (Smyth, G. K. (2005). Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor, R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds.), Springer, New York, pages 397-420, incorporated by reference in its entirety for all purposes).
- Methods of biomarker level data analysis include the use of a pre-classifier algorithm.
- a pre-classifier algorithm may use a specific molecular fingerprint to pre-classify the samples according to their composition and then apply a correction/normalization factor. This data/information may then be fed in to a final classification algorithm which would incorporate that information to aid in the final diagnosis.
- Methods of biomarker level data analysis further include the use of a classifier algorithm as provided herein.
- a diagonal linear discriminant analysis, k-nearest neighbor algorithm, support vector machine (SVM) algorithm, linear support vector machine, random forest algorithm, or a probabilistic model-based method or a combination thereof is provided for classification of microarray data.
- identified markers that distinguish samples e.g., of varying biomarker level profiles, and/or varying COCA subtypes of cancer are selected based on statistical significance of the difference in biomarker levels between classes of interest. In some cases, the statistical significance is adjusted by applying a Benjamini-Hochberg or another correction for false discovery rate (FDR).
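A false discovery rate correction of this kind can be sketched in Python as follows. The function computes adjusted p-values using the standard step-up procedure of Benjamini and Hochberg; the p-values and the 0.05 FDR threshold are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return FDR-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-biomarker p-values for a between-class difference.
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
selected = [i for i, q in enumerate(adjusted) if q <= 0.05]
```

With these values only the first biomarker remains significant at an FDR of 0.05, although three of the four raw p-values are below 0.05.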
- the classifier algorithm may be supplemented with a meta-analysis approach such as that described by Fishel and Kaufman et al. 2007 Bioinformatics 23(13): 1599-606, incorporated by reference in its entirety for all purposes. In some cases, the classifier algorithm may be supplemented with a meta-analysis approach such as a repeatability analysis.
- posterior probabilities may be used in the methods provided herein to rank the markers provided by the classifier algorithm.
- a statistical evaluation of the results of the biomarker level profiling may provide a quantitative value or values indicative of one or more of the following: COCA subtype of cancer; the likelihood of the success of a particular therapeutic intervention, e.g., angiogenesis inhibitor therapy, chemotherapy, or immunotherapy.
- the data is presented directly to the physician in its most useful form to guide patient care, or is used to define patient populations in clinical trials or a patient population for a given medication.
- results of the molecular profiling can be statistically evaluated using a number of methods known to the art including, but not limited to: Student's t-test, the two-sided t-test, Pearson rank sum analysis, hidden Markov model analysis, analysis of q-q plots, principal component analysis, one-way ANOVA, two-way ANOVA, LIMMA and the like.
- accuracy may be determined by tracking the subject over time to determine the accuracy of the original diagnosis. In other cases, accuracy may be established in a deterministic manner or using statistical methods. For example, receiver operator characteristic (ROC) analysis may be used to determine the optimal assay parameters to achieve a specific level of accuracy, specificity, positive predictive value, negative predictive value, and/or false discovery rate.
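A receiver operating characteristic analysis of a binary (one subtype versus the rest) classifier score can be sketched in Python as follows. The sketch computes the (false positive rate, true positive rate) pair at each distinct score threshold and the area under the resulting curve by the trapezoidal rule; the scores, labels, and function names are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) at each distinct threshold, from strictest to loosest."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores; label 1 marks samples of the subtype.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```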
- the results of the biomarker level profiling assays are entered into a database for access by representatives or agents of a molecular profiling business, the individual, a medical provider, or insurance provider.
- assay results include sample classification, identification, or diagnosis by a representative, agent or consultant of the business, such as a medical professional.
- a computer or algorithmic analysis of the data is provided automatically.
- the molecular profiling business may bill the individual, insurance provider, medical provider, researcher, or government entity for one or more of the following: molecular profiling assays performed, consulting services, data analysis, reporting of results, or database access.
- the results of the biomarker level profiling assays are presented as a report on a computer screen or as a paper record.
- the report may include, but is not limited to, such information as one or more of the following: the levels of biomarkers (e.g., as reported by copy number or fluorescence intensity, etc.) as compared to the reference sample or reference value(s); the likelihood the subject will respond to a particular therapy, based on the biomarker level values and the COCA subtype and proposed therapies.
- the results of the gene expression profiling may be classified into one or more of the following: C1 ACC/PCPG positive, C2 GBM/LGG positive, C3 OV positive, C4 Squamous-like positive, C6 LUAD-Enriched positive, C8 PAAD/some STAD positive, C9 UCS positive, C10 BRCA/Basal positive, C12 UCEC positive, C14 PRAD positive, C15 CESC (subset of cervical) positive, C16 BLCA positive, C17 TGCT positive, C19 COAD/READ positive, C20 SARC/MESO positive, C21 KIRC/KICH/KIRP positive, C22 Liver positive, C24 BRCA/Luminal positive, C25 THYM positive, C26 SKCM/UVM positive or C28 THCA positive, C1 ACC/PCPG negative, C2 GBM/LGG negative, C3 OV negative, C4 Squamous-like negative, C6 LUAD-Enriched negative, C8 PAAD/some
- results are classified using a trained algorithm.
- Trained algorithms provided herein include algorithms that have been developed using a reference set of known gene expression values and/or normal samples, for example, samples from individuals diagnosed with a particular molecular COCA subtype of cancer.
- a reference set of known gene expression values are obtained from individuals who have been diagnosed with a particular COCA subtype of cancer.
- a reference set of known gene expression values are obtained from individuals who have been diagnosed with a particular COCA subtype of cancer, and are also known to possess certain immune cell signature.
- a reference set of known gene expression values are obtained from individuals who have been diagnosed with a particular COCA subtype of cancer, and are also known to have certain expression of tumor driver genes.
- Algorithms suitable for categorization of samples include but are not limited to k-nearest neighbor algorithms, support vector machines, linear discriminant analysis, centroid algorithms (e.g., ClaNC), diagonal linear discriminant analysis, updown, naive Bayesian algorithms, neural network algorithms, hidden Markov model algorithms, genetic algorithms, or any combination thereof.
- When a binary classifier is compared with actual true values (e.g., values from a biological sample), there are typically four possible outcomes. If the outcome from a prediction is p (where “p” is a positive classifier output, such as the presence of a deletion or duplication syndrome) and the actual value is also p, then it is called a true positive (TP); however, if the actual value is n then it is said to be a false positive (FP). Conversely, a true negative has occurred when both the prediction outcome and the actual value are n (where “n” is a negative classifier output, such as no deletion or duplication syndrome), and a false negative is when the prediction outcome is n while the actual value is p.
- the positive predictive value is the proportion of subjects with positive test results who are correctly diagnosed as likely or unlikely to respond, or diagnosed with the correct COCA subtype, or a combination thereof. It reflects the probability that a positive test reflects the underlying condition being tested for. Its value does however depend on the prevalence of the disease, which may vary. In one example, the following characteristics are provided: FP (false positive); TN (true negative); TP (true positive); FN (false negative).
- False positive rate (α) = FP/(FP+TN) = 1 − specificity
- False negative rate (β) = FN/(TP+FN) = 1 − sensitivity
- Likelihood-ratio positive = sensitivity/(1 − specificity)
- Likelihood-ratio negative = (1 − sensitivity)/specificity.
- the negative predictive value (NPV) is the proportion of subjects with negative test results who are correctly diagnosed.
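The rates, likelihood ratios, and predictive values defined above follow directly from the four outcome counts. The Python sketch below computes them for a hypothetical set of counts; the function name and the numbers are illustrative only.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Derive standard binary-classifier metrics from the four counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": fp / (fp + tn),   # equals 1 - specificity
        "false_negative_rate": fn / (tp + fn),   # equals 1 - sensitivity
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

# Hypothetical counts for one subtype-versus-rest call.
metrics = diagnostic_metrics(tp=80, fp=10, tn=90, fn=20)
```

Unlike sensitivity and specificity, the predictive values change with the proportion of true positives in the tested population, which is why the same assay can show different positive predictive values in different cohorts.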
- the results of the biomarker level analysis of the subject methods provide a statistical confidence level that a given diagnosis is correct.
- such statistical confidence level is at least about, or more than about 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or more.
- the method further includes classifying the tumor tissue sample as a particular COCA subtype based on the comparison of biomarker levels in the sample and reference biomarker levels, for example present in at least one training set.
- the tumor tissue sample is classified as a particular subtype if the results of the comparison meet one or more criteria such as, for example, a minimum percent agreement, a value of a statistic calculated based on the percentage agreement such as, for example, a kappa statistic, a minimum correlation (e.g., Pearson's correlation) and/or the like.
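Percent agreement and a kappa statistic between predicted and reference subtype calls can be computed as in the following Python sketch. Cohen's kappa is used here as one common choice of kappa statistic; the labels `A` and `B`, the calls, and the function names are hypothetical.

```python
from collections import Counter

def percent_agreement(predicted, reference):
    """Fraction of samples on which the two sets of calls agree."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)

def cohen_kappa(predicted, reference):
    """Agreement corrected for the agreement expected by chance."""
    n = len(reference)
    observed = percent_agreement(predicted, reference)
    pred_counts, ref_counts = Counter(predicted), Counter(reference)
    labels = set(pred_counts) | set(ref_counts)
    expected = sum(pred_counts[l] * ref_counts[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical reference and predicted subtype calls for ten samples.
reference = ["A"] * 5 + ["B"] * 5
predicted = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "A"]
agreement = percent_agreement(predicted, reference)
kappa = cohen_kappa(predicted, reference)
```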
- Hardware modules may include, for example, a general-purpose processor, a field programmable gate array (FPGA), and/or an application specific integrated circuit (ASIC).
- Software modules (executed on hardware) can be expressed in a variety of software languages (e.g., computer code), including Unix utilities, C, C++, Java™, Ruby, SQL, SAS®, the R programming language/software environment, Visual Basic™, and other object-oriented, procedural, or other programming language and development tools.
- Examples of computer code include, but are not limited to, micro-code or micro-instructions, machine instructions, such as produced by a compiler, code used to produce a web service, and files containing higher-level instructions that are executed by a computer using an interpreter. Additional examples of computer code include, but are not limited to, control signals, encrypted code, and compressed code.
- Non-transitory computer-readable medium also can be referred to as a non-transitory processor-readable medium or memory
- the computer-readable medium is non-transitory in the sense that it does not include transitory propagating signals per se (e.g., a propagating electromagnetic wave carrying information on a transmission medium such as space or a cable).
- the media and computer code also can be referred to as code
- non-transitory computer-readable media include, but are not limited to: magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as Compact Disc/Digital Video Discs (CD/DVDs), Compact Disc-Read Only Memories (CD-ROMs), and holographic devices; magneto-optical storage media such as optical disks; carrier wave signal processing modules; and hardware devices that are specially configured to store and execute program code, such as Application-Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), Read-Only Memory (ROM) and Random-Access Memory (RAM) devices.
- ASICs Application-Specific Integrated Circuits
- PLDs Programmable Logic Devices
- ROM Read-Only Memory
- RAM Random-Access Memory
- Other embodiments described herein relate to a computer program product, which can include, for example, the instructions and/or computer code discussed herein.
- a single biomarker or from about 1 to about 4, from about 4 to about 8, from about 8 to about 12, from about 12 to about 16, from about 16 to about 20, from about 20 to about 24, from about 24 to about 30, from about 34 to about 38, from about 38 to about 42, from about 42 to about 46, from about 46 to about 50, from about 50 to about 54, from about 54 to about 58, from about 58 to about 62, from about 62 to about 66, from about 66 to about 72, from about 72 to about 76, from about 76 to about 80, from about 80 to about 84 (e.g., as disclosed in Table 1) is capable of classifying COCA subtypes of cancer with a predictive success of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at
- any combination of biomarkers disclosed herein can be used to obtain a predictive success of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, and all values in between.
- a single biomarker or from about 1 to about 4, from about 4 to about 8, from about 8 to about 12, from about 12 to about 16, from about 16 to about 20, from about 20 to about 24, from about 24 to about 30, from about 34 to about 38, from about 38 to about 42, from about 42 to about 46, from about 46 to about 50, from about 50 to about 54, from about 54 to about 58, from about 58 to about 62, from about 62 to about 66, from about 66 to about 72, from about 72 to about 76, from about 76 to about 80, from about 80 to about 84 (e.g., as disclosed in Table 1) is capable of classifying COCA subtypes of cancer with a sensitivity or specificity of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%
- any combination of biomarkers disclosed herein can be used to obtain a sensitivity or specificity of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, and all values in between.
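- The predictive success, sensitivity, and specificity figures above can be computed from paired subtype calls. The following is a minimal sketch; it assumes that "predictive success" means overall accuracy and that sensitivity and specificity are evaluated one subtype at a time against all other subtypes. The function names and example calls are hypothetical.

```python
def predictive_success(true_calls, predicted_calls):
    """Overall fraction of samples assigned to the correct subtype."""
    correct = sum(t == p for t, p in zip(true_calls, predicted_calls))
    return correct / len(true_calls)


def sensitivity_specificity(true_calls, predicted_calls, subtype):
    """One-vs-rest sensitivity and specificity for a single subtype."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_calls, predicted_calls):
        if truth == subtype and pred == subtype:
            tp += 1  # subtype correctly detected
        elif truth == subtype:
            fn += 1  # subtype missed
        elif pred == subtype:
            fp += 1  # another subtype miscalled as this one
        else:
            tn += 1  # another subtype correctly excluded
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical reference calls and classifier calls for five samples.
truth = ["C3", "C3", "C4", "C4", "C4"]
calls = ["C3", "C4", "C4", "C4", "C3"]
print(predictive_success(truth, calls))
print(sensitivity_specificity(truth, calls, "C3"))
```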
- the methods and compositions provided herein are useful for determining the clustering of cluster assignments (COCA) subtype of a sample (e.g., tumor sample) from a patient by analyzing the expression of a set of biomarkers, whereby use of the set of biomarkers in detecting a COCA subtype comprises use of a fewer number of biomarkers from a single genome-wide platform as compared to methods known in the art for molecularly classifying a cell of origin cancer subtype (e.g., Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304, and Hoadley et al.
- COCA cluster assignments
- the set of biomarkers is less than 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 150, 100 or 90 biomarkers. In some cases, the set of biomarkers is between 4 and 84 biomarkers. In some cases, the set of biomarkers is the set of 84 biomarkers listed in Table 1.
- the set of biomarkers is a sub-set of biomarkers listed in Table 1 such as, for example 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80 or 82 of the biomarkers listed in Table 1.
- the biomarkers or classifier biomarkers useful in the methods and compositions provided herein can be selected from one or more cancer datasets from one or more databases.
- the cancers can be any cancer known in the art.
- the cancers can include hematologic and lymphatic malignancies, solid tumor types, cancers of the central nervous system, cancers from neural-crest-derived tissues, and melanocytic cancers of the skin.
- the cancers for use in the methods herein can be the cancers studied in The Cancer Genome Atlas (TCGA) or a subset thereof.
- the cancers for use in the method provided herein can be those cancers listed herein.
- the databases can be public databases.
- classifier biomarkers e.g., one or more genes listed in Table 1
- classifier biomarkers useful in the methods and compositions provided herein for detecting or diagnosing subtypes were selected from a large data set of potential classifier biomarkers.
- classifier biomarkers useful for the methods and compositions provided herein such as those in Table 1 are selected by subjecting a large set of classifier biomarkers to an in silico based process in order to determine the minimum number of genes whose expression profile can be used to determine a pan-cancer COCA subtype of a subject from a sample obtained from said subject.
- the large set of classifier biomarkers can be a pan-cancer dataset such as, for example, the mRNA expression data (i.e., RNA-seq data) from TCGA found at gdc.cancer.gov/about-data/publications/pancanatlas.
- the large set of classifier biomarkers can be the genes derived from the mRNA expression profile data derived from more than 10,000 tumors across more than 30 tumor types as described in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no.
- the in silico process for selecting a gene signature as provided herein (e.g., Table 1 and 2) for determining a COCA subtype of a sample from a patient can comprise applying or using a Classification to Nearest Centroid (CLaNC) algorithm on the pan-cancer mRNA expression data (i.e., RNA-seq data) from TCGA to choose a minimum number of correlated genes for each subtype.
- CLaNC Classification to Nearest Centroid
- the process can further comprise performing a 5-fold cross validation using the TCGA pan-cancer dataset following application of the CLaNC algorithm as provided herein to produce cross-validation curves to test different numbers of correlated genes as shown in FIG. 3 in order to determine the minimum number of correlated genes needed per subtype.
- the method can further comprise applying the CLaNC algorithm to the entire TCGA mRNA expression pan-cancer dataset.
- the CLaNC software used in the methods provided herein can be as found in or derived from Alan R. Dabney; ClaNC: point-and-click software for classifying microarrays to nearest centroids, Bioinformatics, Volume 22, Issue 1, 1 Jan. 2006, Pages 122-123.
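- The nearest-centroid classification and 5-fold cross-validation described above can be sketched as follows. This is a simplified illustration and not the ClaNC software: it omits ClaNC's gene-selection step, assumes plain squared Euclidean distance to each subtype centroid, and uses invented two-gene toy data. All function names are hypothetical.

```python
from collections import defaultdict


def fit_centroids(samples, labels):
    """Compute the mean expression vector (centroid) of each subtype."""
    sums, counts = {}, defaultdict(int)
    for vec, lab in zip(samples, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, vec):
    """Assign a sample to the subtype with the nearest centroid."""
    def sq_dist(centroid):
        return sum((c - v) ** 2 for c, v in zip(centroid, vec))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def cross_validate(samples, labels, k=5):
    """Return k-fold cross-validated accuracy using interleaved folds."""
    n = len(samples)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [samples[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        centroids = fit_centroids(train_x, train_y)
        correct += sum(predict(centroids, samples[i]) == labels[i] for i in test_idx)
    return correct / n


# Invented expression values for two genes across two subtypes.
toy_samples = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1], [1.2, 0.8],
               [5.0, 5.1], [4.8, 5.2], [5.2, 4.9], [5.1, 5.0], [4.9, 4.8]]
toy_labels = ["C1"] * 5 + ["C2"] * 5
print(cross_validate(toy_samples, toy_labels, k=5))
```

- Repeating the cross-validation while varying the number of genes retained per subtype would yield accuracy curves of the kind used to pick a minimum gene count; the gene-ranking step needed for that is not shown here.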
- the method further comprises validating the gene classifiers.
- Validation can comprise testing the expression of the classifiers in a test set of samples and comparing the COCA subtype determined using the signature of Table 1 with the COCA subtype determined using the gold standard COCA subtyper method described in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304.
- the test set of samples can be any sample type provided herein such as, for example, fresh frozen or archived formalin-fixed paraffin-embedded (FFPE) cancer samples.
- FFPE formalin-fixed paraffin-embedded
- validation can comprise testing the expression of the classifiers in several fresh frozen publicly available array and/or RNAseq datasets and calling the subtype based on said expression levels and subsequently comparing the COCA subtype determined using the signature of Table 1 with the COCA subtype determined using the gold standard COCA subtyper method described in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304.
- validation can comprise calling the subtypes of the several fresh frozen publicly available array and RNAseq test datasets using their expression levels and the CLaNC algorithm as described herein and comparing the subtype calls with the gold standard subtype calls as defined in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304.
- Final validation of the gene signature (e.g., Table 1) can then be performed in a newly collected dataset of archived formalin-fixed paraffin-embedded (FFPE) cancer samples to assure comparable performance in the FFPE samples.
- the classifier biomarkers of Table 1 were selected based on the in silico CLaNC process described herein.
- the in silico CLaNC process can entail use of the CLaNC process described in Dabney (2005) Bioinformatics 21(22):4148-4154.
- the in silico CLaNC process can entail use of CLaNC software described in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123 or equivalents or derivatives related thereto.
- the methods provided herein require the detection of the expression level of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 18, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30, at least 32, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least
- the methods provided herein require the detection of the expression level of a total of at least 2, at least 4, at least 6, at least 8, at least 10, at least 12, at least 14, at least 16, at least 18, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30, at least 32, at least 34, at least 36, at least 38, at least 40, at least 42, at least 44, at least 46, at least 48, at least 50, at least 52, at least 54, at least 56, at least 58, at least 60, at least 62, at least 64, at least 66, at least 68, at least 70, at least 72, at least 74, at least 76, at least 78, at least 80, at least 82 or up to 84 classifier biomarkers out of the 84 gene biomarkers of Table 1 in a cancer cell sample obtained from a patient in order to identify a COCA cancer subtype.
- the methods provided herein require the detection of the expression level of a total of at least 4, at least 8, at least 12, at least 16, at least 20, at least 24, at least 28, at least 32, at least 36, at least 40, at least 44, at least 48, at least 52, at least 56, at least 60, at least 64, at least 68, at least 72, at least 76, at least 80 or up to 84 classifier biomarkers out of the 84 gene biomarkers of Table 1 in a cancer cell sample obtained from a patient in order to identify a COCA cancer subtype.
- classifier biomarker expression datasets as provided herein.
- the expression level of one or more classifier biomarkers of Table 1 can be altered in a specific COCA subtype as detected in a sample obtained from a subject as described in any of the methods provided herein.
- the alteration of the expression level can be an “up-regulation” or “down-regulation” of the one or more classifier biomarkers of Table 1.
- the alteration in expression levels of the more than one classifier biomarkers can either be an up-regulation, a down-regulation or any combination thereof.
- the alteration of the expression level can be relative to or compared to a sample isolated from a healthy subject as defined herein.
- the sample obtained from the healthy subject can be from the same anatomical area of the body. The same applies for other classifier biomarker expression datasets as provided herein.
- the expression level of an “up-regulated” biomarker as provided herein is increased by about 0.2-fold, about 0.5-fold, about 1-fold, about 1.5-fold, about 2-fold, about 2.5-fold, about 3-fold, about 3.5-fold, about 4-fold, about 4.5-fold, about 5-fold, and any values in between.
- the expression level of a “down-regulated” biomarker as provided herein is decreased by about 0.2-fold, about 0.5-fold, about 1-fold, about 1.5-fold, about 2-fold, about 2.5-fold, about 3-fold, about 3.5-fold, about 4-fold, about 4.5-fold, about 5-fold, and any values in between.
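- The comparison of a biomarker's expression level against a healthy reference can be sketched as a simple ratio test. This sketch assumes that fold change is expressed as the ratio of sample level to reference level and that a symmetric 2-fold cutoff is used; both the convention and the cutoff are illustrative assumptions, and the function name is hypothetical.

```python
def regulation(sample_level, reference_level, min_fold=2.0):
    """Label a biomarker by its fold change relative to a healthy reference.

    min_fold is an assumed cutoff: a ratio at or above it is called
    up-regulated, and a ratio at or below its reciprocal is called
    down-regulated.
    """
    if reference_level <= 0 or sample_level < 0:
        raise ValueError("expression levels must be positive")
    ratio = sample_level / reference_level
    if ratio >= min_fold:
        return "up-regulated"
    if ratio <= 1.0 / min_fold:
        return "down-regulated"
    return "unchanged"


# Hypothetical normalized expression levels (tumor vs. healthy tissue).
print(regulation(8.0, 2.0))
print(regulation(1.0, 4.0))
print(regulation(3.0, 2.0))
```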
- genes useful in classifying the COCA subtypes of cancer include those that are independently capable of distinguishing between normal versus tumor, or between different classes or grades of cancer.
- a gene is considered to be capable of reliably distinguishing between COCA subtypes if the area under the receiver operator characteristic (ROC) curve is approximately 1.
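- The area under the ROC curve for a single gene can be computed directly from expression values, since it equals the probability that a randomly chosen sample of the subtype of interest has a higher value than a randomly chosen sample outside that subtype. The sketch below uses this pairwise formulation; the function name and the expression values are hypothetical.

```python
def roc_auc(scores_in_subtype, scores_other):
    """Area under the ROC curve via pairwise comparison of two groups.

    Each (in-subtype, other) pair scores 1 if the in-subtype value is
    higher, 0.5 for a tie, and 0 otherwise; the AUC is the mean score.
    """
    wins = 0.0
    for s in scores_in_subtype:
        for o in scores_other:
            if s > o:
                wins += 1.0
            elif s == o:
                wins += 0.5
    return wins / (len(scores_in_subtype) * len(scores_other))


# Hypothetical expression of one gene in two groups of samples.
print(roc_auc([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]))  # fully separated groups
print(roc_auc([3.0, 5.0], [4.0, 1.0]))            # partially overlapping groups
```

- An AUC near 1 indicates that the gene separates the two groups almost perfectly, while a value near 0.5 indicates no discriminating ability.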
- molecular platforms that generate data that can be useful in classifying the COCA subtypes of cancer can include genome-wide platforms such as, for example, whole-exome DNA sequencing assays (e.g., Illumina HiSeq and GAII), DNA copy-number variation assays (e.g., Affymetrix 6.0 microarrays), DNA methylation assays (e.g., Illumina 450,000-feature microarrays), genome-wide mRNA level assays (e.g., Illumina mRNA-seq), microRNA level assays (e.g., Illumina microRNA-seq), and protein level assays for proteins and/or phosphorylated proteins (e.g., Reverse Phase Protein Arrays; RPPA).
- a method for determining a disease outcome or prognosis for a patient suffering from cancer.
- the cancer can be any cancer known in the art and/or provided herein.
- the disease outcome or prognosis can be measured by examining the overall survival for a period of time or intervals (e.g., 0 to 36 months or 0 to 60 months).
- survival is analyzed as a function of COCA subtype.
- survival is analyzed as a function of COCA subtype across tissue of origin tumor types.
- survival is analyzed as a function of COCA subtype within a tissue of origin tumor type (see, for example, FIGS. 6-8 ).
- the COCA subtype can be determined using the methods provided herein such as, for example, determining the expression of all or subsets of the genes in Table 1. Relapse-free and overall survival can be assessed using standard Kaplan-Meier plots as well as Cox proportional hazards modeling.
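- The Kaplan-Meier analysis of survival as a function of COCA subtype can be sketched as below. This is a minimal product-limit estimator, assuming follow-up times in months and an event flag of 1 for an observed death and 0 for a censored observation; the function names and the toy follow-up data are hypothetical, and Cox proportional hazards modeling is not shown.

```python
from collections import defaultdict


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def kaplan_meier_by_subtype(times, events, subtypes):
    """Estimate one survival curve per subtype label."""
    groups = defaultdict(lambda: ([], []))
    for t, e, s in zip(times, events, subtypes):
        groups[s][0].append(t)
        groups[s][1].append(e)
    return {s: kaplan_meier(ts, es) for s, (ts, es) in groups.items()}


# Hypothetical follow-up (months) for five patients of a single subtype.
print(kaplan_meier([6, 12, 12, 24, 36], [1, 1, 0, 1, 0]))
```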
- the methods and compositions as provided herein for determining a COCA subtype of a patient suffering or suspected of suffering from cancer are used to determine whether or not said patient is a candidate for treatment with a specific type or types of cancer therapy.
- the sample can be any type of sample obtained from the patient as provided herein.
- the cancer can be any type of cancer known in the art and/or provided herein.
- determining the COCA subtype is one of a number of methods that can be employed to characterize the sample obtained from the patient, such that determining the COCA subtype alone or in combination with one or more of the number of methods can be used to determine whether or not said patient is a candidate for treatment with a specific type or types of cancer therapy.
- the number of methods for characterizing the sample can entail determining a proliferation score, the tumor mutation burden (TMB), the tissue of origin subtype, the level of immune activation or any combination thereof.
- one or all of the methods for characterizing the sample can be performed on RNA sequencing data obtained from the sample.
- in addition to assessing the COCA subtype as provided herein, the characterization entails determining proliferation or a proliferation score.
- proliferation or the proliferation score is determined using any method known in the art such as, for example, as provided in U.S. 62/789,668 filed Jan. 8, 2019, which is incorporated by reference herein.
- in addition to determining the COCA subtype as provided herein, the characterization entails calculating a TMB value and/or rate.
- the TMB value and/or rate can be calculated using any method known in the art.
- the TMB value and/or rate can be calculated from RNA (e.g., via transcriptome profiling or RNA sequencing) as provided in U.S. 62/771,702 filed Nov. 27, 2018 and U.S. 62/743,257 filed Oct. 9, 2018, each of which is incorporated by reference herein.
- the determination of whether or not said patient is a candidate for treatment with a specific type or types of cancer therapy can be based on the COCA subtype alone or in combination with other methods known in the art for characterizing a sample obtained from a patient suffering from or suspected of suffering from cancer.
- the other methods for characterizing said sample can be histologically based methods, gene expression based methods or a combination thereof.
- the histologically based methods can include histological cancer subtyping by one or more trained pathologists as well as the histological based methods of assessing proliferation such as, for example, determining the mitotic activity index.
- the gene expression based methods can include subtyping, assessment of TMB, assessment of tissue of origin subtype, immune subtyping or any combination thereof.
- the gene expression based methods can be assessed from DNA, RNA or a combination thereof.
- the characterization of the sample obtained from the patient suffering from or suspected of suffering from cancer is performed on RNA obtained or isolated from the sample.
- the gene expression based tissue of origin cancer subtyping can be determined using gene signatures known in the art for specific types of cancer.
- the tissue of origin of the cancer is the lung and the gene signature is selected from the gene signatures found in WO2017/201165, WO2017/201164, US20170114416 or U.S. Pat. No. 8,822,153, each of which is herein incorporated by reference in its entirety.
- the tissue of origin cancer is head and neck squamous cell carcinoma (HNSCC) and the gene signature is selected from the gene signatures found in PCT/US18/45522 or PCT/US18/48862, each of which is herein incorporated by reference in its entirety.
- HNSCC head and neck squamous cell carcinoma
- the tissue of origin cancer is breast cancer and the gene signature is the PAM50 subtyper found in Parker J S et al., (2009) Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27:1160-1167, which is herein incorporated by reference in its entirety.
- the tissue of origin cancer is bladder cancer (e.g., MIBC) and the gene signature is selected from the gene signatures found in 62/629,975 filed Feb. 13, 2018, which is herein incorporated by reference in its entirety.
- the tissue of origin cancer is bladder cancer (e.g., MIBC) and the gene signature is selected from the gene signature found in The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature volume 507, pages 315-322 (2014), or Robertson, A G, et al., Cell, 171(3): 540-556 (2017), each of which is herein incorporated by reference in its entirety.
- the gene expression based immune subtyping or immune cell activation can be determined using immune expression signatures known in the art such as, for example, the gene signatures found in Thorsson, V., Gibbs, D. L., Brown, S. D., Wolf, D., Bortone, D. S., Yang, T. H. O., Porta-Pardo, E., Gao, G. F., Plaisier, C. L., Eddy, J. A. and Ziv, E., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830, which is herein incorporated by reference in its entirety.
- immune cell activation is determined by monitoring the immune cell signatures of Bindea et al (Immunity 2013; 39(4); 782-795), the contents of which are herein incorporated by reference in their entirety.
- the method further comprises measuring single gene immune biomarkers, such as, for example, CTLA4, PDCD1 and CD274 (PD-L1), PDCD1LG2 (PD-L2) and/or IFN gene signatures.
- the level of immune cell activation is determined by measuring gene expression signatures of immunomarkers.
- the immunomarkers can be measured in the same and/or different sample used to determine the COCA subtype as described herein.
- the immunomarkers can be those found in WO2017/201165 and WO2017/201164, each of which is herein incorporated by reference in its entirety.
- the gene expression based method for calculating a TMB value and/or rate can be any method known in the art.
- the TMB value and/or rate can be calculated from RNA (e.g., via transcriptome profiling or RNA sequencing) as provided in U.S. 62/771,702 filed Nov. 27, 2018 and U.S. 62/743,257 filed Oct. 9, 2018, each of which is incorporated by reference herein.
- upon determining a patient's COCA subtype (e.g., by measuring the expression of all or subsets of the genes in Table 1), the patient is selected for suitable therapy, for example, radiotherapy (radiation therapy), surgical intervention, targeted therapy, chemotherapy or drug therapy with an angiogenesis inhibitor or immunotherapy or combinations thereof.
- the suitable treatment can be any treatment or therapeutic method that can be used for a cancer patient.
- upon determining a patient's COCA subtype, the patient is administered a suitable therapeutic agent, for example chemotherapeutic agent(s) or an angiogenesis inhibitor or immunotherapeutic agent(s).
- the therapy is immunotherapy.
- the immunotherapeutic agent is a checkpoint inhibitor, monoclonal antibody, biological response modifier, therapeutic vaccine or cellular immunotherapy.
- the determination of a suitable treatment can identify treatment responders.
- the determination of a suitable treatment can identify treatment non-responders.
- upon determining a patient's COCA subtype, the cancer patient can be selected for any combination of suitable therapies, for example, chemotherapy or drug therapy with a radiotherapy, a tumor dissection with an immunotherapy, or a chemotherapeutic agent with a radiotherapy.
- immunotherapy, or immunotherapeutic agent can be a checkpoint inhibitor, monoclonal antibody, biological response modifier, therapeutic vaccine or cellular immunotherapy.
- the methods provided herein are also useful for evaluating clinical response to therapy, as well as for endpoints in clinical trials for efficacy of new therapies.
- the extent to which sequential diagnostic expression profiles move towards normal can be used as one measure of the efficacy of the candidate therapy.
- the methods provided herein also find use in predicting response to different lines of therapies based on the COCA subtype of cancer alone or in combination with other characterization methods as described herein (e.g., tissue of origin cancer subtype, immune subtype, proliferation and/or TMB status). For example, chemotherapeutic response can be improved by more accurately assigning tumor cell of origin subtypes.
- treatment regimens can be formulated based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., tissue of origin cancer subtype, immune subtype, proliferation and/or TMB status).
- provided herein is a method for determining whether a cancer patient is likely to respond to immunotherapy by determining the COCA subtype of cancer of a sample obtained from the patient and, based on the COCA subtype, assessing whether the patient is likely to respond to immunotherapy.
- a method of selecting a patient suffering from cancer for immunotherapy by determining a COCA subtype of a sample from the patient and, based on the COCA subtype, selecting the patient for immunotherapy.
- the determination of the COCA subtype of the sample obtained from the patient can be performed using any method for COCA subtyping known in the art.
- the determination of the COCA subtype of the sample obtained from the patient can be performed using any method for COCA subtyping provided herein.
- the sample obtained from the patient has been previously diagnosed as being a particular type of cancer, and the methods provided herein are used to determine the COCA subtype of the sample.
- the previous diagnosis can be based on a histological analysis.
- the histological analysis can be performed by one or more pathologists.
- the COCA subtyping is performed via gene expression analysis of a set or panel of biomarkers or subsets thereof in order to generate an expression profile.
- the gene expression analysis can be performed on a tumor sample obtained from a patient in order to determine the presence, absence or level of expression of one or more biomarkers selected from a publicly available pan-cancer database described herein and/or Table 1 provided herein.
- the COCA subtype can be selected from the group consisting of C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA).
- the immunotherapy can be any immunotherapy provided herein.
- the immunotherapy comprises administering one or more checkpoint inhibitors.
- the checkpoint inhibitors can be any checkpoint inhibitor provided herein such as, for example, a checkpoint inhibitor that targets PD-1, PD-L1 or CTLA4.
- the biomarker panels, or subsets thereof, can be those disclosed in any publicly available pan-cancer gene expression dataset or datasets.
- the biomarker panel or subset thereof is, for example, the cancer genome atlas pan-cancer mRNA expression dataset.
- the biomarker panel or subset thereof is, for example, the pan-cancer mRNA expression dataset disclosed in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no.
- the biomarker panel or subset thereof is, for example, the gene expression signature disclosed in Table 1 in combination with one or more biomarkers from a publicly available pan-cancer expression dataset.
- each of the biomarkers from any one of the pan-cancer gene expression datasets provided herein (including, for example, Table 1) is detected in a tumor sample in a method to determine the COCA subtype as provided herein.
- the methods provided herein further comprise determining the presence, absence or level of immune activation in a COCA subtype.
- the presence or level of immune cell activation can be determined by creating an expression profile or detecting the expression of one or more biomarkers associated with innate immune cells and/or adaptive immune cells associated with each COCA subtype in a sample obtained from a patient.
- immune cell activation associated with a COCA subtype of cancer is determined by monitoring the immune cell signatures of Thorsson, V. et al., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830, Bindea et al (Immunity 2013; 39(4); 782-795) Faruki H.
- the method further comprises measuring single gene immune biomarkers, such as, for example, CTLA4, PDCD1 and CD274 (PD-L1), PDCD1LG2 (PD-L2) and/or IFN gene signatures.
- the presence or a detectable level of immune activation (innate and/or adaptive) associated with a COCA subtype can indicate or predict that a patient with said COCA subtype may be amenable to immunotherapy.
- the immunotherapy can be treatment with a checkpoint inhibitor as provided herein.
- a method provided herein for detecting the expression of at least one classifier biomarker provided herein in a sample (e.g., tumor sample) obtained from a patient further comprises administering an immunotherapeutic agent following detection of immune activation as provided herein in said sample.
- the method comprises determining a COCA subtype of a tumor sample and subsequently determining a level of immune cell activation of said sub-type.
- the subtype is determined by determining the expression levels of one or more classifier biomarkers at the nucleic acid level using sequencing (e.g., RNASeq), amplification (e.g., qRT-PCR) or hybridization assays (e.g., microarray analysis) as described herein.
- the one or more biomarkers can be selected from a publicly available database (e.g., TCGA pan-cancer mRNA expression datasets or any other publicly available pan-cancer gene expression datasets provided herein).
- the biomarkers of Table 1 can be used to specifically determine the COCA subtype of a tumor sample obtained from a patient.
- the level of immune cell activation is determined by measuring gene expression signatures of immunomarkers.
- the immunomarkers can be measured in the same and/or different sample used to subtype the tumor sample as described herein.
- the immunomarkers that can be measured can comprise, consist of, or consist essentially of innate immune cell (IIC) and/or adaptive immune cell (AIC) gene signatures, interferon (IFN) gene signatures, individual immunomarkers, major histocompatibility complex class II (MHC class II) genes or a combination thereof.
- IIC innate immune cell
- AIC adaptive immune cell
- IFN interferon
- MHC class II major histocompatibility complex class II
- the gene expression signatures for IICs, AICs, IFN and MHC class II can be any known gene signatures for said cell types or genes known in the art.
- the immune gene signatures can be those from Bindea et al. (Immunity 2013; 39(4); 782-795), Faruki H. et al., JTO, 12(6): 943-953 (2017), Charoentong P. et al., Cell reports, 18, 248-262 (2017) and/or WO2017/201165 and WO2017/201164.
- the individual immunomarkers can be CTLA4, PDCD1 and CD274 (PD-L1).
- immune subtyping or immune cell activation can be determined using the gene signatures found in Thorsson, V., Gibbs, D. L., Brown, S. D., Wolf, D., Bortone, D. S., Yang, T. H. O., Porta-Pardo, E., Gao, G. F., Plaisier, C. L., Eddy, J. A. and Ziv, E., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830.
- upon determining a patient's COCA cancer subtype using any of the methods and classifier biomarker panels or subsets thereof as provided herein, the patient is selected for treatment with or administered an immunotherapeutic agent.
- the immunotherapeutic agent can be a checkpoint inhibitor, monoclonal antibody, biological response modifier, therapeutic vaccine or cellular immunotherapy.
- the immunotherapeutic agent is a checkpoint inhibitor.
- a method for determining the likelihood of response to one or more checkpoint inhibitors is provided.
- the checkpoint inhibitor is a PD-1/PD-L1 checkpoint inhibitor.
- the PD-1/PD-L1 checkpoint inhibitor can be nivolumab, pembrolizumab, atezolizumab, durvalumab, lambrolizumab, or avelumab.
- the checkpoint inhibitor is a CTLA-4 checkpoint inhibitor.
- the CTLA-4 checkpoint inhibitor can be ipilimumab or tremelimumab.
- the checkpoint inhibitor is a combination of checkpoint inhibitors such as, for example, one or more PD-1/PD-L1 checkpoint inhibitors used in combination with one or more CTLA-4 checkpoint inhibitors.
- the immunotherapeutic agent is a monoclonal antibody.
- a method for determining the likelihood of response to one or more monoclonal antibodies is provided.
- the monoclonal antibody can be directed against tumor cells or directed against tumor products.
- the monoclonal antibody can be panitumumab, matuzumab, necitumumab, trastuzumab, amatuximab, bevacizumab, ramucirumab, bavituximab, patritumab, rilotumumab, cetuximab, IMMU-132, or demcizumab.
- the immunotherapeutic agent is a therapeutic vaccine.
- a method for determining the likelihood of response to one or more therapeutic vaccines is provided.
- the therapeutic vaccine can be a peptide or tumor cell vaccine.
- the vaccine can target MAGE-3 antigens, NY-ESO-1 antigens, p53 antigens, survivin antigens, or MUC1 antigens.
- the therapeutic cancer vaccine can be GVAX (GM-CSF gene-transfected tumor cell vaccine), belagenpumatucel-L (allogeneic tumor cell vaccine made with four irradiated NSCLC cell lines modified with TGF-beta2 antisense plasmid), MAGE-A3 vaccine (composed of MAGE-A3 protein and adjuvant AS15), (1)-BLP-25 anti-MUC-1 (targets MUC-1 expressed on tumor cells), CimaVax EGF (vaccine composed of human recombinant Epidermal Growth Factor (EGF) conjugated to a carrier protein), WT1 peptide vaccine (composed of four Wilms' tumor suppressor gene analogue peptides), CRS-207 (live-attenuated Listeria monocytogenes vector encoding human mesothelin), Bec2/BCG (induces anti-GD3 antibodies), GV1001 (targets the human telomerase reverse transcriptase), TG4010 (targets the MUC1 antigen
- the immunotherapeutic agent is a biological response modifier.
- a method for determining the likelihood of response to one or more biological response modifiers is provided.
- the biological response modifier can trigger inflammation such as, for example, PF-3512676 (CpG 7909) (a toll-like receptor 9 agonist), CpG-ODN 2006 (downregulates Tregs), Bacillus Calmette-Guerin (BCG), or Mycobacterium vaccae (SRL172) (nonspecific immune stimulants now often tested as adjuvants).
- the biological response modifier can be cytokine therapy such as, for example, IL-2+tumor necrosis factor alpha (TNF-alpha) or interferon alpha (induces T-cell proliferation), interferon gamma (induces tumor cell apoptosis), or Mda-7 (IL-24) (Mda-7/IL-24 induces tumor cell apoptosis and inhibits tumor angiogenesis).
- the biological response modifier can be a colony-stimulating factor such as, for example, granulocyte colony-stimulating factor.
- the biological response modifier can be a multi-modal effector such as, for example, multi-target VEGFR: thalidomide and analogues such as lenalidomide and pomalidomide, cyclophosphamide, cyclosporine, denileukin diftitox, talactoferrin, trabectedin or all-trans-retinoic acid.
- the immunotherapy is cellular immunotherapy.
- a method for determining the likelihood of response to one or more cellular therapeutic agents is provided. The cellular immunotherapeutic agent can be dendritic cells (DCs) (ex vivo generated DC-vaccines loaded with tumor antigens), T-cells (ex vivo generated lymphokine-activated killer cells; cytokine-induced killer cells; activated T-cells; gamma delta T-cells), or natural killer cells.
- specific COCA subtypes of cancer have different levels of immune activation (e.g., innate immunity and/or adaptive immunity) such that COCA subtypes with elevated or detectable immune activation (e.g., innate immunity and/or adaptive immunity) are selected for treatment with one or more immunotherapeutic agents described herein.
- specific COCA subtypes of cancer have high or elevated levels of immune activation.
- COCA subtypes with low levels of or no immune activation are not selected for treatment with one
- upon determining a patient's or subject's COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), the patient is selected for drug therapy with an angiogenesis inhibitor.
- the angiogenesis inhibitor is a vascular endothelial growth factor (VEGF) inhibitor, a VEGF receptor inhibitor, a platelet derived growth factor (PDGF) inhibitor or a PDGF receptor inhibitor.
- the method comprises determining a COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) and probing a sample from the patient for the levels of at least five hypoxia biomarkers selected from the group consisting of RRAGD, FABP5, UCHL1, GAL, PLOD, DDIT4, VEGF, ADM, ANGPTL4, NDRG1, NP, SLC16A3, and C14ORF58 (see Table A) at the nucleic acid level.
- the probing step comprises mixing the sample with five or more oligonucleotides that are substantially complementary to portions of nucleic acid molecules of the at least five biomarkers under conditions suitable for hybridization of the five or more oligonucleotides to their complements or substantial complements, detecting whether hybridization occurs between the five or more oligonucleotides to their complements or substantial complements; and obtaining hybridization values of the sample based on the detecting steps.
- the hybridization values of the sample are then compared to reference hybridization value(s) from at least one sample training set, wherein the at least one sample training set comprises (i) hybridization value(s) of the at least five biomarkers from a sample that overexpresses the at least five biomarkers, or overexpresses a subset of the at least five biomarkers, (ii) hybridization values of the at least five biomarkers from a reference cancer of COCA subtype specific sample, or (iii) hybridization values of the at least five biomarkers from a control or healthy sample.
- a determination of whether the patient is likely to respond to angiogenesis inhibitor therapy, or a selection of the patient for angiogenesis inhibitor is then made based upon (i) the patient's COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) and (ii) the results of comparison.
- The aforementioned set of thirteen biomarkers, or a subset thereof, is also referred to herein as a “hypoxia profile”.
- the method provided herein includes determining the levels of at least five biomarkers, at least six biomarkers, at least seven biomarkers, at least eight biomarkers, at least nine biomarkers, or at least ten biomarkers, or five to thirteen, six to thirteen, seven to thirteen, eight to thirteen, nine to thirteen or ten to thirteen biomarkers selected from RRAGD, FABP5, UCHL1, GAL, PLOD, DDIT4, VEGF, ADM, ANGPTL4, NDRG1, NP, SLC16A3, and C14ORF58 in a sample obtained from a subject.
- Biomarker expression in some instances may be normalized against the expression levels of all RNA transcripts or their expression products in the sample, or against a reference set of RNA transcripts or their expression products.
- the reference set as explained throughout, may be an actual sample that is tested in parallel with the sample, or may be a reference set of values from a database or stored dataset.
- Levels of expression, in one embodiment, are reported in number of copies, relative fluorescence value or detected fluorescence value.
- the level of expression of the biomarkers of the hypoxia profile together with the COCA subtype alone or in combination with other characterization methods as described herein can be used in the methods described herein to determine whether a patient is likely to respond to angiogenesis inhibitor therapy.
- the levels of expression of the thirteen biomarkers are normalized against the expression levels of all RNA transcripts or their non-natural cDNA expression products, or protein products in the sample, or of a reference set of RNA transcripts or a reference set of their non-natural cDNA expression products, or a reference set of their protein products in the sample.
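- The normalization against a reference set of transcripts described above can be sketched as follows. This is a minimal illustration only: the log2 transform, the pseudocount, the function name `normalize_to_reference`, the reference names `REF_A` and `REF_B`, and all numeric values are assumptions made for the example and are not taken from the disclosure.

```python
import math
from statistics import mean


def normalize_to_reference(counts, biomarkers, reference_genes, pseudocount=1.0):
    """Return each biomarker's log2 level minus the mean log2 level of the
    reference transcripts measured in the same sample.

    The log2 scale and the pseudocount are assumptions of this sketch.
    """
    log2_levels = {gene: math.log2(value + pseudocount) for gene, value in counts.items()}
    reference_level = mean(log2_levels[gene] for gene in reference_genes)
    return {gene: log2_levels[gene] - reference_level for gene in biomarkers}


# Hypothetical copy numbers for a single sample (illustrative values only).
sample_counts = {"VEGF": 255.0, "ADM": 63.0, "REF_A": 15.0, "REF_B": 63.0}
normalized = normalize_to_reference(sample_counts, ["VEGF", "ADM"], ["REF_A", "REF_B"])
```

- In this sketch the reference level is computed per sample, so the returned values are relative to that sample's own reference transcripts; a stored dataset of reference values could be substituted for `reference_level`.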
- angiogenesis inhibitor treatments include, but are not limited to, an integrin antagonist, a selectin antagonist, an adhesion molecule antagonist, an antagonist of intercellular adhesion molecule (ICAM)-1, ICAM-2, ICAM-3, platelet endothelial adhesion molecule (PCAM), vascular cell adhesion molecule (VCAM), lymphocyte function-associated antigen 1 (LFA-1), a basic fibroblast growth factor antagonist, a vascular endothelial growth factor (VEGF) modulator, or a platelet derived growth factor (PDGF) modulator (e.g., a PDGF antagonist).
- the integrin antagonist is a small molecule integrin antagonist, for example, an antagonist described by Paolillo et al. (Mini Rev Med Chem, 2009, volume 12, pp. 1439-1446, incorporated by reference in its entirety), or a leukocyte adhesion-inducing cytokine or growth factor antagonist (e.g., tumor necrosis factor-α (TNF-α), interleukin-1β (IL-1β), monocyte chemotactic protein-1 (MCP-1) and a vascular endothelial growth factor (VEGF)), as described in U.S. Pat. No. 6,524,581, incorporated by reference in its entirety herein.
- interferon gamma 1β (Actimmune®) with pirfenidone, ACUHTR028, αVβ5, aminobenzoate potassium, amyloid P, ANG1122, ANG1170, ANG3062, ANG3281, ANG3298, ANG4011, anti-CTGF RNAi, Aplidin, Astragalus membranaceus extract with salvia and Schisandra chinensis, atherosclerotic plaque blocker, Azol, AZX100, BB3, connective tissue growth factor antibody, CT140, danazol, Esbriet, EXC001, EXC002, EXC003, EXC004, EXC005, F647, FG3019, Fibrocorin, Follistatin, FT011, a galectin-3 inhibitor, GKT13783
- a method for determining whether a subject is likely to respond to one or more endogenous angiogenesis inhibitors.
- the endogenous angiogenesis inhibitor is endostatin (a 20 kDa C-terminal fragment derived from type XVIII collagen), angiostatin (a 38 kDa fragment of plasmin), or a member of the thrombospondin (TSP) family of proteins.
- the angiogenesis inhibitor is TSP-1, TSP-2, TSP-3, TSP-4 or TSP-5.
- a soluble VEGF receptor, e.g., soluble VEGFR-1 and neuropilin 1 (NPR1), angiopoietin-1, angiopoietin-2, vasostatin, calreticulin, platelet factor-4, a tissue inhibitor of metalloproteinase (TIMP) (e.g., TIMP1, TIMP2, TIMP3, TIMP4), cartilage-derived angiogenesis inhibitor (e.g., peptide troponin I and chondromodulin I), a disintegrin and metalloproteinase with thrombospondin motif 1, an interferon (IFN) (e.g., IFN-α, IFN-β, IFN-γ), a chemokine, e.g., a chemokine having the C-X-C motif (e.g., CXCL10, also known as interferon gam
- a method for determining the likelihood of response to one or more of the following angiogenesis inhibitors is angiopoietin-1, angiopoietin-2, angiostatin, endostatin, vasostatin, thrombospondin, calreticulin, platelet factor-4, TIMP, CDAI, interferon ⁇ , interferon ⁇ , vascular endothelial growth factor inhibitor (VEGI) meth-1, meth-2, prolactin, VEGI, SPARC, osteopontin, maspin, canstatin, proliferin-related protein (PRP), restin, TSP-1, TSP-2, interferon gamma 1 ⁇ , ACUHTR028, ⁇ V ⁇ 5, aminobenzoate potassium, amyloid P, ANG1122, ANG1170, ANG3062, ANG3281, ANG3298, ANG4011, anti-CTGF RNAi, Aplidin, Astragalus membranaceus extract with salvia
- the angiogenesis inhibitor can include pazopanib (Votrient), sunitinib (Sutent), sorafenib (Nexavar), axitinib (Inlyta), ponatinib (Iclusig), vandetanib (Caprelsa), cabozantinib (Cometriq), ramucirumab (Cyramza), regorafenib (Stivarga), ziv-aflibercept (Zaltrap), motesanib, or a combination thereof.
- the angiogenesis inhibitor is a VEGF inhibitor.
- the VEGF inhibitor is axitinib, cabozantinib, aflibercept, brivanib, tivozanib, ramucirumab or motesanib.
- the angiogenesis inhibitor is motesanib.
- the methods provided herein relate to determining a subject's likelihood of response to an antagonist of a member of the platelet derived growth factor (PDGF) family, for example, a drug that inhibits, reduces or modulates the signaling and/or activity of PDGF-receptors (PDGFR).
- the PDGF antagonist, in one embodiment, is an anti-PDGF aptamer, an anti-PDGF antibody or fragment thereof, an anti-PDGFR antibody or fragment thereof, or a small molecule antagonist.
- the PDGF antagonist is an antagonist of PDGFR-α or PDGFR-β.
- the PDGF antagonist is the anti-PDGF-β aptamer E10030, sunitinib, axitinib, sorafenib, imatinib, imatinib mesylate, nintedanib, pazopanib HCl, ponatinib, MK-2461, dovitinib, pazopanib, crenolanib, PP-121, telatinib, imatinib, KRN 633, CP 673451, TSU-68, Ki8751, amuvatinib, tivozanib, masitinib, motesanib diphosphate, dovitinib dilactic acid, or linifanib (ABT-869).
- upon making a determination of whether a patient is likely to respond to angiogenesis inhibitor therapy, or selecting a patient for angiogenesis inhibitor therapy, in one embodiment, the patient is administered the angiogenesis inhibitor.
- the angiogenesis inhibitor can be any of the angiogenesis inhibitors described herein.
- a method for determining whether a patient is likely to respond to radiotherapy by determining the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample obtained from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), assessing whether the patient is likely to respond to or benefit from radiotherapy.
- a method of selecting a patient suffering from cancer for radiotherapy by determining a COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), selecting the patient for radiotherapy.
- the radiotherapy can include, but is not limited to, proton therapy and external-beam radiation therapy. In some embodiments, the radiotherapy can include any type or form of treatment that is suitable for patients with specific types of cancer.
- a patient with a specific type of cancer can have or display resistance to radiotherapy.
- Radiotherapy resistance in any cancer or subtype thereof can be determined by measuring or detecting the expression levels of one or more genes known in the art and/or provided herein associated with or related to the presence of radiotherapy resistance.
- Genes associated with radiotherapy resistance can include NFE2L2, KEAP1 and CUL3.
- radiotherapy resistance can be associated with alterations of the KEAP1 (Kelch-like ECH-associated protein 1)/NRF2 (nuclear factor E2-related factor 2) pathway. Association of a particular gene with radiotherapy resistance can be determined by examining expression of said gene in one or more patients known to be radiotherapy non-responders and comparing it with expression of said gene in one or more patients known to be radiotherapy responders.
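- One way the responder versus non-responder comparison could be carried out is sketched below. The disclosure does not name a statistical test; Welch's t-statistic is used here purely as an assumption, and the function name `welch_t` and the expression values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two groups with possibly unequal variances.

    A positive value means the gene is expressed at a higher level in group_a.
    """
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical log2 expression of one gene (illustrative values only).
non_responders = [9.0, 10.0, 11.0]
responders = [5.0, 6.0, 7.0]
t_statistic = welch_t(non_responders, responders)
```

- A large positive statistic would suggest higher expression in non-responders; whether that is meaningful would still depend on cohort size and multiple-testing correction, neither of which is addressed in this sketch.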
- a method for determining whether a cancer patient is likely to respond to surgical intervention by determining the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample obtained from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), assessing whether the patient is likely to respond to or benefit from surgery.
- a method of selecting a patient suffering from cancer for surgery by determining a COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), selecting the patient for surgery.
- the surgery can include laser technology, excision, dissection, and reconstructive surgery.
- the present disclosure provides methods for predicting overall survival rate for a cancer patient.
- the prediction of overall survival rate can involve obtaining a tumor sample from a cancer patient.
- the cancer patients can have various stages of cancers.
- the overall survival rate can be determined by detecting the expression level of at least one subtype classifier of a publicly available pan-cancer database or dataset.
- an overall survival rate can be determined by detecting the expression level (e.g., protein and/or nucleic acid) of any subtype classifiers that are relevant across many types of cancer, for example, subtype classifiers relevant to cell of origin.
- the subtype classifiers can be all or a subset of classifiers from Table 1.
- the identification of the cell of origin (COCA) subtype is indicative of the overall survival in the patient.
- the COCA subtype is selected from C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA.
- the present disclosure provides methods for predicting nodal metastasis for a cancer patient.
- the prediction of nodal metastasis can involve obtaining a tumor sample from a patient.
- the patients can have various stages of cancers.
- the nodal metastasis can be determined by detecting the expression level of at least one subtype classifier from a pan-cancer gene set.
- the pan-cancer gene set can be a publicly available pan-cancer database or a gene set provided herein (e.g., Table 1) or a combination thereof.
- the publicly available pan-cancer gene set can be a TCGA pan-cancer gene set.
- nodal metastasis of cancer can be determined by detecting the expression level of all the subtype classifiers or subsets thereof of the classifiers found in Table 1.
- the C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype can be at least about 0.1 times, at least about 0.2 times, at least about 0.3 times, at least about 0.4 times, at least about 0.5 times, at least about 0.6 times, at least about 0.7 times, at least about 0.8 times, at least about 0.9 times, at least about 1 time, at least about 1.2 times, at least about 1.5 times, at least about
- the methods and compositions provided herein allow for the detection of at least one biomarker in a tumor sample obtained from a subject.
- the at least one biomarker can be a classifier biomarker provided herein.
- the detection can be at the nucleic acid level or protein level. In one embodiment, the detection is at the nucleic acid level and the detection can be by using any amplification, hybridization and/or sequencing assay disclosed herein.
- the at least one biomarker detected using the methods and compositions provided herein is selected from Table 1. Further to the above embodiment, the detection of the at least one biomarker selected from Table 1 is at the nucleic acid level.
- the methods of detecting the biomarker(s) (e.g., classifier biomarkers) in the tumor sample obtained from the subject comprise, consist essentially of, or consist of measuring the expression level of at least one or a plurality of biomarkers using any of the methods provided herein.
- the biomarkers can be selected from Table 1.
- the plurality of biomarker nucleic acids comprises, consists essentially of or consists of at least 4 biomarkers, at least 8 biomarkers, at least 12 biomarkers, at least 16 biomarkers, at least 20 biomarkers, at least 24 biomarkers, at least 28 biomarkers, at least 32 biomarkers, at least 36 biomarkers, at least 40 biomarkers, at least 44 biomarkers, at least 48 biomarkers, at least 52 biomarkers, at least 56 biomarkers, at least 60 biomarkers, at least 64 biomarkers, at least 68 biomarkers, at least 72 biomarkers, at least 76 biomarkers, at least 80 biomarkers or all 84 biomarkers of Table 1.
- the plurality of biomarkers comprises, consists essentially of or consists of at least 8 biomarkers, at least 16 biomarkers, at least 24 biomarkers, at least 32 biomarkers, at least 40 biomarkers, at least 48 biomarkers, at least 56 biomarkers, at least 64 biomarkers, at least 72 biomarkers, at least 80 biomarkers or all 84 biomarkers of Table 1.
- the methods and compositions provided herein allow for the detection of at least one or a plurality of biomarkers selected from the biomarkers listed in Table 1 in combination with the detection of at least one or a plurality of biomarkers from one or more additional sets of biomarkers in a tumor sample obtained from a subject.
- the tumor sample can be any type of sample provided herein.
- the subject can be suffering from or suspected of suffering from cancer.
- the cancer can be any type of cancer provided herein.
- the detection can be at the nucleic acid level or protein level. In one embodiment, the detection is at the nucleic acid level and the detection can be by using any amplification, hybridization and/or sequencing assay disclosed herein.
- the one or more additional sets of biomarkers can be selected from a set of biomarkers whose presence, absence and/or level of expression is indicative of immune activation, proliferation, a tissue of origin cancer subtype, or any combination thereof.
- the additional set of biomarkers for indicating immune activation can be gene expression signatures of Adaptive Immune Cells (AIC) and/or Innate Immune Cells (IIC), individual immune biomarkers, interferon genes, major histocompatibility complex, class II (MHC II) genes or a combination thereof.
- the gene expression signatures of both IIC and AIC can be any gene signatures known in the art such as, for example, the gene signatures listed in Thorsson, V. et al., 2018, The immune landscape of cancer.
- the additional set of biomarkers for indicating proliferation can be gene expression signatures that include the 11 gene signature comprising BIRC5, CCNB1, CDC20, CDCA1, CEP55, KNTC2, MKI67, PTTG1, RRM2, TYMS, and UBE2C found in Martin M.
- the additional set of biomarkers for determining tissue of origin cancer subtypes can be any gene signature found in the art for subtyping specific tissue of origin cancers.
- the additional set of biomarkers for determining tissue of origin cancer subtypes is the adenocarcinoma lung cancer subtyping gene expression signatures found in WO2017/201165, US20170114416 or U.S. Pat. No. 8,822,153.
- the additional set of biomarkers for determining tissue of origin cancer subtypes is the squamous cell carcinoma lung cancer subtyping gene expression signatures found in WO2017/201164, US20170114416 or U.S. Pat. No. 8,822,153.
- the additional set of biomarkers for determining tissue of origin cancer subtypes is the breast cancer subtyping gene expression signatures found in Parker J S et al., (2009) Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27:1160-1167, which is herein incorporated by reference in its entirety.
- the additional set of biomarkers for determining tissue of origin cancer subtypes is the bladder cancer subtyping gene expression signatures found in 62/629,975 filed Feb. 13, 2018. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is the bladder cancer subtyping gene expression signatures found in The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature volume 507, pages 315-322 (2014), or Robertson, A G, et al., Cell, 171(3): 540-556 (2017), each of which is herein incorporated by reference.
- the additional set of biomarkers for determining tissue of origin cancer subtypes is a head and neck squamous cell carcinoma (HNSCC) subtyping gene expression signatures selected from PCT/US18/45522 or PCT/US18/48862.
- the methods and compositions provided herein further comprise determining tumor mutation burden (TMB) and/or TMB rate of the tumor sample.
- TMB and/or TMB rate can be determined or calculated using any method known in the art.
- the TMB and/or TMB rate is determined from RNA as described in 62/743,257 filed on Oct. 9, 2018 and 62/771,702 filed on Nov. 27, 2018.
- Kits for practicing the methods provided herein can be further provided.
- the term “kit” can encompass any manufacture (e.g., a package or a container) comprising at least one reagent, e.g., an antibody, a nucleic acid probe or primer, etc., for specifically detecting the expression of a biomarker provided herein.
- the kit may be promoted, distributed, or sold as a unit for performing the methods provided herein. Additionally, the kits may contain a package insert describing the kit and methods for its use.
- kits for practicing the methods provided herein are provided. Such kits are compatible with both manual and automated immunocytochemistry techniques (e.g., cell staining). These kits comprise at least one antibody directed to a biomarker of interest, chemicals for the detection of antibody binding to the biomarker, a counterstain, and, optionally, a bluing agent to facilitate identification of positive staining cells. Any chemicals that detect antigen-antibody binding may be used in the practice of the methods provided herein.
- the kits may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more antibodies for use in the methods provided herein.
- kits for practicing the methods provided herein comprise at least one primer pair directed to a biomarker of interest, chemicals for the detection of amplification of the biomarker of interest, and, optionally, any agent necessary for quantifying the detection level of the biomarker of interest. Any chemicals that detect amplification products may be used in the practice of the methods provided herein.
- the kits may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more primer pairs for use in the methods provided herein.
- kits for practicing the methods provided herein comprise at least one probe directed to a biomarker of interest, chemicals for the detection of hybridization of the probe to the biomarker of interest, and, optionally, any agent necessary for quantifying the level of the biomarker of interest. Any chemicals that detect hybridization products may be used in the practice of the methods provided herein.
- the kits may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more probes for use in the methods provided herein.
- COCA pan-cancer cluster of cluster assignment
- C1 ACC/PCPG
- C2 GBM/LGG
- C3 OV
- C4 Squamous-like
- C6 LUAD-Enriched
- C8 PAAD/some STAD
- C9 UCS
- C10 BRCA/Basal
- C12 UCEC
- C14 PRAD
- C15 CESC (subset of cervical)
- C16 BLCA
- C17 TGCT
- C19 COAD/READ
- C20 SARC/MESO
- C21 KIRC/KICH/KIRP
- C22 Liver
- C24 BRCA/Luminal
- C25 THYM
- C26 SKCM/UVM
- C28 THCA
- the gene signature developed in this example can be used in diagnostic methods that include evaluation of gene expression subtypes and application of an algorithm for categorization of a tumor sample obtained from a subject into one of 21 COCA subtypes: C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA).
- kidney renal papillary cell carcinoma KIRP
- breast invasive carcinoma BRCA
- thyroid cancer THCA
- bladder urothelial carcinoma BLCA
- prostate adenocarcinoma PRAD
- kidney chromophobe KICH
- cervical squamous cell carcinoma and endocervical adenocarcinoma CESC
- kidney renal clear cell carcinoma KIRC
- liver hepatocellular carcinoma LIHC
- low grade glioma LGG
- sarcoma SARC
- lung adenocarcinoma LUAD
- colon adenocarcinoma COAD
- head and neck squamous cell carcinoma HNSC
- uterine corpus endometrial carcinoma UCEC
- glioblastoma multiforme GBM
- esophageal carcinoma ESCA
- stomach adenocarcinoma STAD
- ovarian serous cystadenocarcinoma OV
- rectum adenocarcinoma READ
- COCA subtypes (i.e., COCA_Sample_Assignment_n9759.csv) from Hoadley et al (Cell. 2018 Apr. 5; 173(2):291-304) were then assigned to the 8545 samples from the TCGA data described above, excluding COCA subtypes with 30 or fewer samples.
- FIG. 1 shows the cross-tabulation of the TCGA tumor type and COCA subtype from the Hoadley et al, 2018 paper for samples with qualifying expression data as described herein.
- FIG. 1 also provides the integrated COCA subtypes and their designations as provided herein.
- Gene expression values were log2 transformed, and genes with low variance and/or low mean were filtered out, while genes with mean variance and mean expression values greater than 4 were kept, resulting in gene expression data for 2190 genes (see graph in FIG. 2).
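The filtering step above can be sketched roughly as follows. This is only an illustration under stated assumptions: the pseudocount of 1, the function names, and the reading that a cutoff of 4 applies to both the per-gene mean and the per-gene variance of the log2 values are not taken from the source, and the toy gene names are hypothetical.

```python
import math
from statistics import mean, pvariance


def log2_transform(values, pseudocount=1.0):
    """Return log2(x + pseudocount); the pseudocount is an assumption that keeps zeros finite."""
    return [math.log2(v + pseudocount) for v in values]


def filter_genes(expression, min_mean=4.0, min_variance=4.0):
    """Keep genes whose log2 mean and log2 variance both exceed the cutoffs.

    expression maps a gene name to its raw expression values across samples.
    The cutoffs of 4 are one possible reading of the text, not a confirmed setting.
    """
    kept = {}
    for gene, raw in expression.items():
        logged = log2_transform(raw)
        if mean(logged) > min_mean and pvariance(logged) > min_variance:
            kept[gene] = logged
    return kept


# Hypothetical toy data: three genes measured in four samples.
toy = {
    "GENE_A": [15, 255, 4095, 65535],  # high mean, high variance after log2
    "GENE_B": [1, 1, 3, 3],            # low mean
    "GENE_C": [255, 255, 255, 255],    # zero variance
}
filtered = filter_genes(toy)
print(sorted(filtered))
```

Only the gene that is both well expressed and variable across samples survives the filter, which is the property a subtype classifier needs from its candidate genes.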
- The ClaNC software package (see Dabney, 2006) was used on the entire training set to calculate t-statistics, and 84 genes were selected based on the ranks of the strongest t-statistics (i.e., both negatively and positively correlated genes for each COCA subtype can be and were selected) (see Table 1). An ordinary nearest centroid classifier was then fit using the 21 COCA classes and 84 genes.
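A minimal sketch of this kind of workflow, not the ClaNC package itself: genes are ranked per class by the absolute value of a simplified class-versus-overall t-like statistic, the top-ranked genes per class are pooled, and an ordinary nearest centroid classifier (Euclidean distance) is fit on them. The statistic, the `genes_per_class` parameter, and the toy data are assumptions made for illustration.

```python
import math
from collections import defaultdict


def class_means(samples, labels):
    """Per-class mean expression vector (the class centroid)."""
    grouped = defaultdict(list)
    for row, label in zip(samples, labels):
        grouped[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def select_genes(samples, labels, genes_per_class):
    """Pool the top genes per class, ranked by |t-like statistic| (simplified)."""
    n = len(samples)
    n_genes = len(samples[0])
    centroids = class_means(samples, labels)
    overall = [sum(col) / n for col in zip(*samples)]
    # Pooled within-class sum of squares per gene.
    pooled = [0.0] * n_genes
    for row, label in zip(samples, labels):
        for g in range(n_genes):
            pooled[g] += (row[g] - centroids[label][g]) ** 2
    dof = max(n - len(centroids), 1)
    sd = [math.sqrt(p / dof) or 1e-9 for p in pooled]  # guard against zero spread
    chosen = set()
    for label, centroid in centroids.items():
        scale = math.sqrt(1.0 / labels.count(label) + 1.0 / n)
        t = [(centroid[g] - overall[g]) / (sd[g] * scale) for g in range(n_genes)]
        # Absolute value keeps both positively and negatively associated genes.
        ranked = sorted(range(n_genes), key=lambda g: abs(t[g]), reverse=True)
        chosen.update(ranked[:genes_per_class])
    return sorted(chosen)


def fit(samples, labels, genes):
    """Fit centroids using only the selected gene indices."""
    reduced = [[row[g] for g in genes] for row in samples]
    return class_means(reduced, labels)


def predict(centroids, genes, sample):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    x = [sample[g] for g in genes]
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


# Hypothetical toy training set: 3 genes, 2 classes; gene index 2 is uninformative.
train = [[9.0, 2.0, 5.0], [8.0, 3.0, 5.1], [2.0, 9.0, 5.0], [3.0, 8.0, 4.9]]
train_labels = ["C4", "C4", "C16", "C16"]
genes = select_genes(train, train_labels, genes_per_class=2)
model = fit(train, train_labels, genes)
print(genes, predict(model, genes, [8.5, 2.0, 5.0]))
```

Ranking on the absolute statistic is what lets a gene that is strongly down in one subtype be chosen alongside genes that are strongly up.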
- Subtypes provide potential biomarkers for targeted and immunotherapy response.
- The data demonstrate differences in prognosis that may be meaningful to therapeutic management.
- Example 2: Examination of the Use of the COCA Subtype Signature as a Prognostic Indicator
- This example describes the examination of the 84-gene COCA subtyper developed in Example 1 and found in Table 1 as a prognostic indicator for overall survival. Overall, the goal of the studies in this example was to determine if the 84-gene COCA signature has prognostic value across a myriad of tumor types.
- BLCA tumors were classified into 8 predicted subtype categories (C10, C15, C16, C20, C25, C4, C8, C9; see FIG. 6 ) but 92% (345/375) were in two of them (C16 and C4), and only these categories were analyzed.
- COCA subtypes can be associated with overall survival.
- the C4 COCA subtype was significantly associated with worse overall survival in BLCA (association test p-value for C4 subtype as determined using Table 1 gene signature was 0.0204, while the Hazard ratio was 1.53 (i.e., second column); FIG. 6 ), while the C8 COCA subtype in STAD (association test p-value for C8 subtype as determined using Table 1 gene signature was 0.00689, while the Hazard ratio was 1.67; FIG. 8 ) samples was also associated with worse overall survival.
- the C24 COCA subtype in the BRCA sample had better overall survival (association test p-value was 0.00013, while the Hazard ratio was 0.37; FIG. 7 ).
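As a reading aid for the hazard ratios reported above, the short sketch below converts each ratio into a percent change in hazard relative to the reference group. The helper name is hypothetical, and the comparison group behind each association test is not spelled out in this excerpt, so the output should be read as an interpretation of the reported numbers rather than a reanalysis.

```python
def hazard_change_percent(hazard_ratio):
    """Percent change in hazard relative to the reference group (HR of 1 means no change)."""
    return (hazard_ratio - 1.0) * 100.0


# Hazard ratios as reported in the text for FIG. 6, FIG. 8 and FIG. 7.
reported = {
    ("BLCA", "C4"): 1.53,
    ("STAD", "C8"): 1.67,
    ("BRCA", "C24"): 0.37,
}

for (tumor, subtype), hr in reported.items():
    change = hazard_change_percent(hr)
    direction = "higher" if change > 0 else "lower"
    print(f"{subtype} in {tumor}: hazard about {abs(change):.0f}% {direction} than the reference group")
```

A ratio above 1 (C4 in BLCA, C8 in STAD) corresponds to worse overall survival, and a ratio below 1 (C24 in BRCA) to better overall survival.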
- a method for determining a clustering of cluster assignments (COCA) subtype of a tumor cancer sample obtained from a patient comprising detecting an expression level of at least one classifier biomarker of Table 1, wherein the detection of the expression level of the classifier biomarker specifically identifies a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype.
- the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample,
- the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm.
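One way the correlation-based comparing step could look is sketched below. It assumes Pearson correlation and one averaged reference profile per subtype; the text only says "a correlation", so both choices, along with the function names and toy values, are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sx * sy)


def classify_by_correlation(sample, reference_profiles):
    """Return the subtype whose reference profile correlates best with the sample."""
    scores = {
        subtype: pearson(sample, profile)
        for subtype, profile in reference_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference profiles over four classifier biomarkers.
reference_profiles = {
    "C4": [9.0, 2.0, 7.0, 1.0],
    "C16": [2.0, 9.0, 1.0, 7.0],
    "C24": [5.0, 5.0, 9.0, 9.0],
}
subtype, scores = classify_by_correlation([8.0, 3.0, 6.5, 1.5], reference_profiles)
print(subtype)
```

Correlation compares the shape of the expression profile rather than its absolute level, which can make the call less sensitive to platform-wide shifts in scale.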
- the C1 COCA subtype indicates that a tumor sample is substantially similar to or is adrenocortical carcinoma; the C2 COCA subtype indicates that a tumor sample is substantially similar to or is glioblastoma; the C3 COCA subtype indicates that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer); the C4 COCA subtype indicates that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder; the C6 COCA subtype indicates that a tumor sample is substantially similar to or is lung adenocarcinoma; the C8 COCA subtype indicates that a tumor sample is substantially similar to or is pancreatic adenocarcinoma; the C9 COCA subtype indicates that a tumor sample is substantially similar to or is uterine carcinosarcoma; the C10 COCA subtype indicates that a tumor sample is substantially similar
- nucleic acid level is RNA or cDNA.
- the method of embodiment 5 or 6, wherein the detecting an expression level comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
- qRT-PCR quantitative real time reverse transcriptase polymerase chain reaction
- sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, a fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
- FFPE formalin-fixed, paraffin-embedded
- the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 4 classifier biomarkers, at least 6 classifier biomarkers, at least 8 classifier biomarkers, at least 10 classifier biomarkers, at least 12 classifier biomarkers, at least 14 classifier biomarkers, at least 16 classifier biomarkers, at least 18 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- a method of detecting a biomarker in a tumor sample obtained from a patient comprising measuring the expression level of a plurality of classifier biomarker nucleic acids selected from Table 1 using an amplification, hybridization and/or sequencing assay.
- KIRP kidney renal papillary cell carcinoma
- BRCA breast invasive carcinoma
- BLCA bladder urothelial carcinoma
- PRAD prostate adenocarcinoma
- KICH kidney chromophobe
- CESC cervical squamous cell carcinoma and endocervical adenocarcinoma
- LIHC liver hepatocellular carcinoma
- LGG low grade glioma
- SARC sarcoma
- LUAD lung adenocarcinoma
- COAD colon adenocarcinoma
- HNSC head and neck squamous cell carcinoma
- UCEC uterine corpus endometrial carcinoma
- GBM glioblastoma multiforme
- ESCA esophageal carcinoma
- STAD stomach adenocarcinoma
- OV ovarian serous cystadenocarcinoma
- amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
- sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
- the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- a method of treating cancer in a subject comprising:
- the at least one biomarker nucleic acid is selected from a set of biomarkers listed in Table 1, wherein the presence, absence and/or level of the at least one biomarker indicates a COCA subtype of the cancer; and administering a therapeutic agent based on the COCA subtype of the cancer.
- the at least one biomarker nucleic acid selected from the set of biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
- sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
- the C1 COCA subtype indicates that a tumor sample is substantially similar to or is adrenocortical carcinoma
- the C2 COCA subtype indicates that a tumor sample is substantially similar to or is glioblastoma
- the C3 COCA subtype indicates that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer)
- the C4 COCA subtype indicates that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder
- the C6 COCA subtype indicates that a tumor sample is substantially similar to or is lung adenocarcinoma
- the C8 COCA subtype indicates that a tumor sample is substantially similar to or is pancreatic adenocarcinoma
- the C9 COCA subtype indicates that a tumor sample is substantially similar to or is uterine carcinosarcoma
- the C10 COCA subtype indicates that a tumor sample is substantially similar to or is the
- a method of predicting overall survival in a cancer patient comprising detecting an expression level of at least one classifier biomarker of Table 1 in a tumor sample obtained from a patient, wherein the detection of the expression level of the at least one classifier biomarker specifically identifies a COCA subtype, and wherein identification of the COCA subtype is predictive of the overall survival in the patient.
- the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10
- the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm.
- nucleic acid level is RNA or cDNA.
- detecting an expression level comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
- the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
Abstract
Description
- This application claims the benefit of priority to U.S. Provisional Application No. 62/743,256 filed Oct. 9, 2018 and U.S. Provisional Application No. 62/819,893 filed Mar. 18, 2019, each of which is incorporated by reference herein in its entirety for all purposes.
- The present invention relates to methods for determining an integrated, pan-cancer subtype and for predicting the prognosis of a patient afflicted with said integrated subtype of cancer.
- The Sequence Listing associated with this application is provided in text format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The name of the text file containing the Sequence Listing is GNCN_016_01WO_SeqList_ST25.txt. The text file is ≈433 KB, was created on Oct. 8, 2019, and is being submitted electronically via EFS-Web.
- Cancers are typically classified using pathologic criteria that rely heavily on the tissue site of origin. Recently, large-scale genomics projects spearheaded by The Cancer Genome Atlas (TCGA) have been undertaken in order to provide a detailed molecular characterization of thousands of tumors, thereby making a systematic molecular-based taxonomy of cancer possible (see, for example, The_Cancer_Genome_Atlas_Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008; 455:1061-1068; The_Cancer_Genome_Atlas_Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011; 474:609-615; The_Cancer_Genome_Atlas_Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012a;489:519-525; The_Cancer_Genome_Atlas_Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012b; 487:330-337; The_Cancer_Genome_Atlas_Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012c; 490:61-70; The_Cancer_Genome_Atlas_Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature. 2013a; 499:43-49; The_Cancer_Genome_Atlas_Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. The New England journal of medicine. 2013b; 368:2059-2074; The_Cancer_Genome_Atlas_Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature. 2014; 507:315-322; each of which is herein incorporated by reference). These large-scale genomics projects have shown that each single-tissue cancer type can be further divided into three to four molecular subtypes and meaningful differences in clinical behavior can often be correlated with the single-tissue tumor types. In fact, in a few cases, single-tissue subtype identification has led to therapies that target the driving subtype-specific molecular alteration(s). 
EGFR-mutant lung adenocarcinomas and ERBB2-amplified breast cancer are two well-established examples.
- Building off these projects, more recent studies have undertaken multi-platform integrative analysis of thousands of cancers from numerous tumor types in The Cancer Genome Atlas (TCGA) project in order to determine whether tissue-of-origin categories split into sub-types based upon multi-platform genomic analyses, what molecular alterations are shared across cancers arising from different tissues and if previously recognized disease subtypes in fact span multiple tissues of origin (see Hoadley et al., Cell. 2014 Aug. 14; 158(4):929-944 and Hoadley et al., Cell. 2018 Apr. 5; 173(2):291-304, each of which is herein incorporated by reference). While these studies have helped to elucidate a molecular taxonomy of cancer with newly defined integrated subtypes that can provide a significant increase in the accuracy for the prediction of clinical outcomes, they have relied on performing a second-level cluster analysis (i.e., clustering of cluster assignments (COCA)) using as input data from five ‘omic’ platforms. The ‘omic’ platforms used in the studies for the COCA analysis included whole-exome DNA sequence (Illumina HiSeq and GAII), DNA methylation (Illumina 450,000-feature microarrays), genome-wide mRNA levels (Illumina mRNA-seq), microRNA levels (Illumina microRNA-seq), and protein levels and/or phosphorylated proteins (Reverse Phase Protein Arrays; RPPA).
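To make the idea of a second-level cluster analysis concrete, the sketch below shows only the encoding step: each platform's cluster assignment becomes a set of binary indicator columns, and the resulting sample-by-indicator matrix is what a second round of clustering would take as input. The platform names, cluster labels, and the simple shared-assignment score are hypothetical, and the final clustering itself is not shown.

```python
def indicator_matrix(platform_assignments):
    """Encode {platform: {sample: cluster}} as a binary sample-by-(platform, cluster) matrix."""
    samples = sorted({s for a in platform_assignments.values() for s in a})
    columns = sorted(
        {(p, c) for p, a in platform_assignments.items() for c in a.values()}
    )
    matrix = [
        [1 if platform_assignments[p].get(s) == c else 0 for (p, c) in columns]
        for s in samples
    ]
    return samples, columns, matrix


def shared_fraction(row_a, row_b, n_platforms):
    """Fraction of platforms on which two samples fall in the same cluster."""
    return sum(a & b for a, b in zip(row_a, row_b)) / n_platforms


# Hypothetical first-level cluster calls from three platforms for three samples.
assignments = {
    "mRNA": {"s1": "m1", "s2": "m1", "s3": "m2"},
    "methylation": {"s1": "d1", "s2": "d1", "s3": "d2"},
    "miRNA": {"s1": "r1", "s2": "r2", "s3": "r2"},
}
samples, columns, matrix = indicator_matrix(assignments)
for name, row in zip(samples, matrix):
    print(name, row)
```

Samples that agree on most platforms end up with similar indicator rows, so clustering this matrix groups tumors by the consensus of the platform-level results rather than by any single data type.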
- While the benefits of such a pan-cancer analysis from a clinical standpoint are clear, performing said analysis can be laborious, time-consuming and expensive. Accordingly, there is a need in the art for methods and resources for molecularly characterizing tumor samples in a rapid, efficient and reliable manner regardless of tissue of origin.
- The present disclosure addresses the limitations of the current methods and other needs in the field for an efficient method for pan-cancer tumor classification that may inform prognosis and patient management based on underlying genomic and biologic tumor characteristics shared across tumor samples from multiple tissues of origin.
- The methods disclosed herein include determination of a cell of origin subtype, treatment of cancer based on a cell of origin subtype, prediction of overall survival of patients based on a cell of origin subtype, and application of an algorithm to gene expression data for one or a plurality of classifier biomarkers for categorization of a tumor sample into one of 21 clustering of cluster assignments (COCA) subtypes (C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) such that the COCA subtype is indicative of the cell of origin of the tumor sample regardless of the anatomical location of said tumor sample. The algorithm can be a classification to the nearest centroid (ClaNC) algorithm. The C1 COCA subtype can indicate that a tumor sample is substantially similar to or is adrenocortical carcinoma. The C2 COCA subtype can indicate that a tumor sample is substantially similar to or is glioblastoma. The C3 COCA subtype can indicate that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer). The C4 COCA subtype can indicate that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder. The C6 COCA subtype can indicate that a tumor sample is substantially similar to or is lung adenocarcinoma. The C8 COCA subtype can indicate that a tumor sample is substantially similar to or is pancreatic adenocarcinoma. The C9 COCA subtype can indicate that a tumor sample is substantially similar to or is uterine carcinosarcoma. The C10 COCA subtype can indicate that a tumor sample is substantially similar to or is the basal subtype of breast cancer. 
The C12 COCA subtype can indicate that a tumor sample is substantially similar to or is uterine corpus endometrial cancer. The C14 COCA subtype can indicate that a tumor sample is substantially similar to or is prostate cancer. The C15 COCA subtype can indicate that a tumor sample is substantially similar to or is non-squamous cervical cancer. The C16 COCA subtype can indicate that a tumor sample is substantially similar to or is a bladder urothelial carcinoma. The C17 COCA subtype can indicate that a tumor sample is substantially similar to or is a testicular germ cell tumor. The C19 COCA subtype can indicate that a tumor sample is substantially similar to or is a colon, rectal, esophageal or stomach adenocarcinoma. The C20 COCA subtype can indicate that a tumor sample is substantially similar to or is a sarcoma. The C21 COCA subtype can indicate that a tumor sample is substantially similar to or is a kidney chromophobe, kidney renal papillary cell carcinoma or kidney renal clear cell carcinoma. The C22 COCA subtype can indicate that a tumor sample is substantially similar to or is liver hepatocellular carcinoma. The C24 COCA subtype can indicate that a tumor sample is substantially similar to or is the luminal subtype of breast cancer. The C25 COCA subtype can indicate that a tumor sample is substantially similar to or is thymoma. The C26 COCA subtype can indicate that a tumor sample is substantially similar to or is melanoma. The C28 COCA subtype can indicate that a tumor sample is substantially similar to or is thyroid cancer.
- In one aspect, provided herein is a method for determining a clustering of cluster assignments (COCA) subtype of a tumor cancer sample obtained from a patient, the method comprising detecting an expression level of at least one classifier biomarker of Table 1, wherein the detection of the expression level of the classifier biomarker specifically identifies a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype. In some cases, the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C12 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C14 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C15 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C16 sample, expression data of the at least one classifier biomarker 
of Table 1 from a reference C17 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C19 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C20 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C21 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C22 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C24 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C25 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C26 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C28 sample or a combination thereof; and classifying the sample as the C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the comparing step. In some cases, the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm. In some cases, the expression level of the classifier biomarker is detected at the nucleic acid level. In some cases, the nucleic acid level is RNA or cDNA. 
In some cases, the detecting an expression level comprises performing a quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarray analysis, gene chips, an nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques. In some cases, the expression level is detected by performing RNAseq. In some cases, the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1. In some cases, the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, a fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient. In some cases, the bodily fluid is blood or fractions thereof (i.e., serum or plasma), urine, saliva, or sputum. In some cases, the at least one classifier biomarker comprises a plurality of classifier biomarkers. In some cases, the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 4 classifier biomarkers, at least 6 classifier biomarkers, at least 8 classifier biomarkers, at least 10 classifier biomarkers, at least 12 classifier biomarkers, at least 14 classifier biomarkers, at least 16 classifier biomarkers, at least 18 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1. In some cases, the at least one classifier biomarker comprises, consists essentially of or consists of all the classifier biomarkers of Table 1.
- In another aspect, provided herein is a method of detecting a biomarker in a tumor sample obtained from a patient, the method comprising measuring the expression level of a plurality of classifier biomarker nucleic acids selected from Table 1 using an amplification, hybridization and/or sequencing assay. In some cases, the patient is suffering from or is suspected of suffering from kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma (READ); adrenocortical carcinoma (ACC); uveal melanoma (UVM); mesothelioma (MESO); pheochromocytoma and paraganglioma (PCPG); skin cutaneous melanoma (SKCM); uterine carcinosarcoma (UCS); lung squamous cell carcinoma (LUSC); testicular germ cell tumors (TGCT); cholangiocarcinoma (CHOL); pancreatic adenocarcinoma (PAAD); thymoma (THYM); or Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC). In some cases, the amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction(s) (qRT-PCR), RNAseq, microarray analysis, gene chips, nCounter Gene Expression Assay(s), Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques. In some cases, the expression level is detected by performing RNAseq. 
In some cases, the detection of the expression level comprises using at least one pair of oligonucleotide primers per each of the plurality of biomarker nucleic acids selected from Table 1. In some cases, the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient. In some cases, the bodily fluid is blood or fractions thereof, urine, saliva, or sputum. In some cases, the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1. In some cases, the plurality of biomarker nucleic acids comprises, consists essentially of or consists of all the classifier biomarker nucleic acids of Table 1.
- In yet another aspect, provided herein is a method of treating cancer in a subject, the method comprising: measuring the expression level of at least one biomarker nucleic acid in a tumor sample obtained from the subject, wherein the at least one biomarker nucleic acid is selected from a set of biomarkers listed in Table 1, wherein the presence, absence and/or level of the at least one biomarker indicates a COCA subtype of the cancer; and administering a therapeutic agent based on the COCA subtype of the cancer. In some cases, the at least one biomarker nucleic acid selected from the set of biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1. In some cases, the method further comprises measuring the expression of at least one biomarker from an additional set of biomarkers. In some cases, the additional set of biomarkers comprises at least an immune cell signature, a cell proliferation signature, or drug target genes. In some cases, the measuring the expression level is conducted using an amplification, hybridization and/or sequencing assay. In some cases, the amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction(s) (qRT-PCR), RNAseq, microarray analysis, gene chips, nCounter Gene Expression Assay(s), Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques. In some cases, the expression level is detected by performing RNAseq. 
In some cases, the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient. In some cases, the bodily fluid is blood or fractions thereof, urine, saliva, or sputum. In some cases, the subject's COCA subtype is selected from C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28.
- In still another aspect, provided herein is a method of predicting overall survival in a cancer patient, the method comprising detecting an expression level of at least one classifier biomarker of Table 1 in a tumor sample obtained from a patient, wherein the detection of the expression level of the at least one classifier biomarker specifically identifies a COCA subtype, and wherein identification of the COCA subtype is predictive of the overall survival in the patient. In some cases, the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C12 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C14 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C15 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C16 sample, expression data of the at least one classifier biomarker 
of Table 1 from a reference C17 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C19 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C20 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C21 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C22 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C24 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C25 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C26 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C28 sample or a combination thereof; and classifying the sample as the C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the comparing step. In some cases, the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm. In some cases, the expression level of the classifier biomarker is detected at the nucleic acid level. In some cases, the nucleic acid level is RNA or cDNA. 
In some cases, the detecting an expression level comprises performing quantitative real time reverse transcriptase polymerase chain reaction(s) (qRT-PCR), RNAseq, microarray analysis, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques. In some cases, the expression level is detected by performing RNAseq. In some cases, the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1. In some cases, the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient. In some cases, the bodily fluid is blood or fractions thereof, urine, saliva, or sputum. In some cases, the at least one classifier biomarker comprises a plurality of classifier biomarkers. In some cases, the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1. In some cases, the at least one classifier biomarker comprises, consists essentially of or consists of all the classifier biomarkers of Table 1.
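- The comparing step described above (correlating a sample's expression data with training-set expression data for each reference subtype, then assigning the best-matching subtype) can be illustrated with a minimal sketch. The sketch assumes each reference subtype is summarized by the mean profile (centroid) of its training samples and that Pearson correlation is the similarity measure; the function names, the three subtypes shown and all expression values are hypothetical and are not taken from Table 1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def build_centroids(training):
    """Average the reference profiles of each subtype into one centroid."""
    centroids = {}
    for subtype, profiles in training.items():
        n_genes = len(profiles[0])
        centroids[subtype] = [
            sum(p[g] for p in profiles) / len(profiles) for g in range(n_genes)
        ]
    return centroids


def classify(sample, centroids):
    """Return the subtype whose centroid correlates best with the sample."""
    scores = {s: pearson(sample, c) for s, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy training set: 4 genes, 3 subtypes (values are invented).
TRAINING = {
    "C1": [[9.0, 2.0, 1.0, 1.5], [8.5, 2.5, 1.2, 1.0]],
    "C2": [[1.0, 8.0, 2.0, 1.0], [1.5, 9.0, 1.5, 1.2]],
    "C3": [[1.0, 1.5, 9.0, 7.5], [1.2, 1.0, 8.0, 8.0]],
}
CENTROIDS = build_centroids(TRAINING)
subtype, scores = classify([1.2, 8.4, 1.9, 1.1], CENTROIDS)
print(subtype, {s: round(v, 3) for s, v in scores.items()})
```

In practice the centroids would be built from all 21 reference subtypes and the selected classifier biomarkers; other similarity measures (for example Spearman correlation or Euclidean distance) could be substituted without changing the structure of the sketch.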
-
FIG. 1 shows a cross-tabulation of the TCGA tumor type and COCA subtype from Hoadley et al., Cell. 2018 Apr. 5; 173(2):291-304 for samples with qualifying expression data as described in Example 1. FIG. 1 also provides the integrated tumor subtypes provided herein. -
FIG. 2 illustrates how the TCGA samples were divided into a training set (⅔ of the data set; n=5696) and test set (⅓ of the data set), balancing for uniform tumor type of origin distributions for development of the 84-gene subtyper described herein (see the Table in FIG. 2). As illustrated in the graph in FIG. 2, using the training set, genes with low variance and/or low mean were filtered out, while genes with mean variance and mean expression values greater than 4 were kept, resulting in gene expression data for 2190 genes. -
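The two steps summarized for FIG. 2 (a ⅔/⅓ split balanced by tumor type, followed by removal of low-mean and low-variance genes using the training set only) can be sketched as follows. This is an illustrative reading, not the procedure actually used: it assumes the filter keeps genes whose training-set mean and variance both exceed a cutoff of 4, and the function names, sample identifiers and expression values are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean, pvariance


def stratified_split(sample_types, train_fraction=2 / 3, seed=0):
    """Split sample ids into train/test, preserving tumor-type proportions."""
    by_type = defaultdict(list)
    for sample_id, tumor_type in sample_types.items():
        by_type[tumor_type].append(sample_id)
    rng = random.Random(seed)
    train, test = [], []
    for tumor_type in sorted(by_type):
        ids = sorted(by_type[tumor_type])
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.extend(ids[:cut])
        test.extend(ids[cut:])
    return train, test


def filter_genes(expression, train_ids, min_mean=4.0, min_variance=4.0):
    """Keep genes whose training-set mean and variance both exceed the cutoffs."""
    kept = []
    for gene, values_by_sample in expression.items():
        values = [values_by_sample[s] for s in train_ids]
        if mean(values) > min_mean and pvariance(values) > min_variance:
            kept.append(gene)
    return kept


# Hypothetical samples: 6 BRCA and 3 LUAD tumors.
SAMPLE_TYPES = {f"brca_{i}": "BRCA" for i in range(6)}
SAMPLE_TYPES.update({f"luad_{i}": "LUAD" for i in range(3)})
train_ids, test_ids = stratified_split(SAMPLE_TYPES)

# Hypothetical expression values for three genes in three training samples.
EXPRESSION = {
    "GENE_HIGH": {"s1": 2.0, "s2": 12.0, "s3": 7.0},  # high mean, high variance
    "GENE_FLAT": {"s1": 8.0, "s2": 8.1, "s3": 7.9},   # high mean, low variance
    "GENE_LOW": {"s1": 0.1, "s2": 0.5, "s3": 6.0},    # low mean
}
kept_genes = filter_genes(EXPRESSION, ["s1", "s2", "s3"])
print(len(train_ids), len(test_ids), kept_genes)
```

Computing the filter statistics on the training samples only keeps the test set untouched until the final evaluation.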
FIG. 3 illustrates five-fold cross validation curves using classification to the nearest centroid (ClaNC) on the TCGA-2018 training dataset (n=408) to guide the selection of the number of genes per subtype to include in the signature for COCA subtyping provided herein. -
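The use of five-fold cross validation to compare candidate gene sets can be sketched as below. The sketch does not implement ClaNC itself; as a simplifying assumption it uses a plain nearest-centroid classifier with squared Euclidean distance and scores a caller-supplied list of gene indices. All names and data are hypothetical.

```python
def k_folds(n_samples, k=5):
    """Assign sample indices to k folds in round-robin order."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def nearest_centroid_fit(X, y):
    """Compute one mean expression vector (centroid) per class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Return the label of the centroid closest to the row."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def cv_error(X, y, gene_idx, k=5):
    """Cross-validated misclassification rate using only the genes in gene_idx."""
    Xs = [[row[j] for j in gene_idx] for row in X]
    errors = 0
    for fold in k_folds(len(Xs), k):
        held_out = set(fold)
        train_X = [r for i, r in enumerate(Xs) if i not in held_out]
        train_y = [lab for i, lab in enumerate(y) if i not in held_out]
        centroids = nearest_centroid_fit(train_X, train_y)
        errors += sum(
            nearest_centroid_predict(centroids, Xs[i]) != y[i] for i in fold
        )
    return errors / len(Xs)


# Hypothetical data: genes 0 and 2 separate the classes, gene 1 is noise.
X = [
    [9.0, 5.0, 1.0], [1.0, 5.1, 9.0], [8.5, 4.9, 1.5], [1.5, 5.0, 8.5],
    [9.5, 5.2, 0.5], [0.5, 4.8, 9.5], [8.8, 5.1, 1.2], [1.2, 4.9, 8.8],
    [9.2, 5.0, 0.8], [0.8, 5.0, 9.2],
]
y = ["C4", "C6"] * 5

for genes in ([1], [0], [0, 2]):
    print(genes, cv_error(X, y, genes))
```

Plotting the cross-validated error against the number of genes per subtype gives a curve of the kind FIG. 3 describes, from which a signature size can be chosen.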
FIG. 4 illustrates agreement and disagreement between the GS subtype (rows) and the subtype based on the 84-gene subtyper (columns) (left panel) for the test set described in Example 1. The right panel shows agreement for each COCA subtype listed. Overall agreement was 90%. Overall agreement with COCA on the training set was 91%. -
FIG. 5 shows the proportion of COCA subtypes in the test set that were called correctly by the 84-gene typer developed in Example 1. -
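The agreement figures reported for FIG. 4 and FIG. 5 (a cross-tabulation of gold-standard versus predicted subtype, overall agreement, and the proportion of each subtype called correctly) can be computed as in the following sketch. The function name and the ten example calls are hypothetical and do not reproduce the data behind the figures.

```python
from collections import Counter


def agreement_summary(gold, predicted):
    """Cross-tabulate gold-standard vs. predicted subtypes and score agreement."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # table[(gold_subtype, predicted_subtype)] -> number of samples
    table = Counter(zip(gold, predicted))
    overall = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    totals = Counter(gold)
    per_subtype = {s: table[(s, s)] / totals[s] for s in totals}
    return table, overall, per_subtype


# Hypothetical calls for ten test-set samples.
gold = ["C4", "C4", "C4", "C24", "C24", "C8", "C8", "C8", "C8", "C16"]
pred = ["C4", "C4", "C16", "C24", "C24", "C8", "C8", "C19", "C8", "C16"]
table, overall, per_subtype = agreement_summary(gold, pred)
print(overall, per_subtype)
```

Rows of the cross-tabulation correspond to the gold-standard subtype and columns to the subtyper's call, matching the layout described for the left panel of FIG. 4.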
FIG. 6 shows results of within cancer-type survival analysis for bladder cancer (BLCA) via testing for association of COCA subtypes from BLCA sample with overall survival. p=0.0204 for COCA subtype C4 as determined using the 84 gene COCA subtyper provided herein. -
FIG. 7 shows results of within cancer-type survival analysis for breast cancer (BRCA) via testing for association of COCA subtypes from BRCA sample with overall survival. p=0.00013 for COCA subtype C24 as determined using the 84 gene COCA subtyper provided herein. -
FIG. 8 shows results of within cancer-type survival analysis for stomach adenocarcinoma (STAD) via testing for association of COCA subtypes from STAD sample with overall survival. p=0.00689 for COCA subtype C8 as determined using the 84 gene COCA subtyper provided herein.
- While the following terms are believed to be well understood by one of ordinary skill in the art, the following definitions are set forth to facilitate explanation of the presently disclosed subject matter.
- As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Additionally, the use of “or” is intended to include “and/or” unless the context clearly indicates otherwise. Furthermore, to the extent that the terms “including”, “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description and/or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising”. The term “about” as used herein can refer to a range that is 15%, 10%, 8%, 6%, 4%, or 2% plus or minus from a stated numerical value.
- Unless the context requires otherwise, throughout the present specification and claims, the word “comprise” and variations thereof, such as, “comprises” and “comprising” are to be construed in an open, inclusive sense that is as “including, but not limited to”. The use of the alternative (e.g., “or”) should be understood to mean either one, both, or any combination thereof of the alternatives. As used herein, the terms “about” and “consisting essentially of” mean +/−20% of the indicated range, value, or structure, unless otherwise indicated.
- Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification may not necessarily all be referring to the same embodiment. It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination.
- Throughout this disclosure, various aspects of the methods and compositions provided herein can be presented in a range format. It should be understood that the description in range format is merely for convenience and brevity and should not be construed as an inflexible limitation on the scope of the invention. Accordingly, the description of a range should be considered to have specifically disclosed all the possible subranges as well as individual numerical values within that range. For example, description of a range such as from 1 to 6 should be considered to have specifically disclosed subranges such as from 1 to 3, from 1 to 4, from 1 to 5, from 2 to 4, from 2 to 6, from 3 to 6 etc., as well as individual numbers within that range, for example, 1, 2, 3, 4, 5, and 6. This applies regardless of the breadth of the range.
- Unless otherwise indicated, the methods and compositions provided herein can utilize conventional techniques and descriptions of organic chemistry, polymer technology, molecular biology (including recombinant techniques), cell biology, biochemistry, and immunology, which are within the skill of the art. Such conventional techniques include polymer array synthesis, hybridization, ligation, and detection of hybridization using a label. Specific illustrations of suitable techniques can be had by reference to the example herein below. However, other equivalent conventional procedures can, of course, also be used. Such conventional techniques and descriptions can be found in standard laboratory manuals such as Genome Analysis: A Laboratory Manual Series (Vols. I-IV), Using Antibodies: A Laboratory Manual, Cells: A Laboratory Manual, PCR Primer: A Laboratory Manual, and Molecular Cloning: A Laboratory Manual (all from Cold Spring Harbor Laboratory Press), Gait, “Oligonucleotide Synthesis: A Practical Approach” 1984, IRL Press, London, Nelson and Cox (2000), Lehninger et al., (2008) Principles of Biochemistry 5th Ed., W.H. Freeman Pub., New York, N.Y. and Berg et al. (2006) Biochemistry, 6th Ed., W.H. Freeman Pub., New York, N.Y., all of which are herein incorporated in their entirety by reference for all purposes.
- Conventional software and systems may also be used in the methods and compositions provided herein. Computer software products for use herein typically include computer readable media having computer-executable instructions for performing the logic steps of any of the methods provided herein. Suitable computer readable media include floppy disk, CD-ROM/DVD/DVD-ROM, hard-disk drive, flash memory, ROM/RAM, magnetic tapes, etc. The computer-executable instructions may be written in a suitable computer language or combination of several languages. Basic computational biology methods are described in, for example, Setubal and Meidanis et al., Introduction to Computational Biology Methods (PWS Publishing Company, Boston, 1997); Salzberg, Searles, Kasif, (Ed.), Computational Methods in Molecular Biology, (Elsevier, Amsterdam, 1998); Rashidi and Buehler, Bioinformatics Basics: Application in Biological Science and Medicine (CRC Press, London, 2000) and Ouellette and Baxevanis Bioinformatics: A Practical Guide for Analysis of Gene and Proteins (Wiley & Sons, Inc., 2nd ed., 2001). See U.S. Pat. No. 6,420,108.
- The methods and compositions provided herein may also make use of various computer program products and software for a variety of purposes, such as probe design, management of data, analysis, and instrument operation. See, U.S. Pat. Nos. 5,593,839, 5,795,716, 5,733,729, 5,974,164, 6,066,454, 6,090,555, 6,185,561, 6,188,783, 6,223,127, 6,229,911 and 6,308,170. Computer methods related to genotyping using high density microarray analysis may also be used in the present methods, see, for example, US Patent Pub. Nos. 20050250151, 20050244883, 20050108197, 20050079536 and 20050042654.
- Additionally, the present disclosure may have preferred embodiments that include methods for providing genetic information over networks such as the Internet as shown in U.S. Patent Pub. Nos. 20030097222, 20020183936, 20030100995, 20030120432, 20040002818, 20040126840, and 20040049354.
- As used herein, the terms “individual,” “patient,” and “subject” can refer to any single animal, more preferably a mammal (including such non-human animals as, for example, dogs, cats, horses, rabbits, zoo animals, cows, pigs, sheep, and non-human primates) for which treatment is desired. In particular embodiments, the individual or patient herein is a human.
- It will be appreciated that the term “healthy” as used herein, is relative to cancer status, as the term “healthy” cannot be defined to correspond to any absolute evaluation or status. Thus, an individual defined as healthy with reference to any specified disease or disease criterion, can in fact be diagnosed with any other one or more diseases, or exhibit any other one or more disease criterion, including one or more other cancers.
- The term “tumor,” as used herein, can refer to all neoplastic cell growth and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells and tissues. The terms “cancer,” “cancerous,” and “tumor” are not mutually exclusive and can be used interchangeably.
- The term “detection” can include any means of detecting, including direct and indirect detection.
- The terms “substantially” or “substantial” as used herein can mean substantially similar in function or capability or otherwise competitive to the products, items (e.g., type of cancer, nucleic acid complement), services or methods recited herein. Substantially similar products, items (e.g., type of cancer, nucleic acid complement), services or methods are at least 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 99.5% similar or the same as a product, item (e.g., type of cancer, nucleic acid complement), service or method recited herein.
- Provided herein are kits, compositions and methods for identifying, determining, detecting or diagnosing integrated, pan-cancer clustering of cluster assignment (COCA) subtypes. That is, the methods can be useful for molecularly defining subsets of cancer regardless of tissue of origin. The methods provide a pan-cancer classification of a tumor sample obtained from a subject that can be prognostic and predictive for therapeutic response. The therapeutic response can include chemotherapy, immunotherapy, angiogenesis inhibitor therapy, surgical intervention and/or radiotherapy. The methods can also provide a prognosis of overall survival for cancer patients according to their pan-cancer, integrated COCA subtype. The kits, compositions and methods provided herein can be used to classify a tumor sample as being any type of COCA subtype known in the art. In one embodiment, the COCA subtype determined or diagnosed by the methods and compositions provided herein is selected from C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA).
- The COCA subtype determined using the kits, compositions or methods provided herein can indicate or disclose the cell or tissue of origin of a tumor sample obtained from a subject. For example, the C1 COCA subtype can indicate that a tumor sample is substantially similar to or is adrenocortical carcinoma; the C2 COCA subtype can indicate that a tumor sample is substantially similar to or is glioblastoma; the C3 COCA subtype can indicate that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer); the C4 COCA subtype can indicate that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder; the C6 COCA subtype can indicate that a tumor sample is substantially similar to or is lung adenocarcinoma; the C8 COCA subtype can indicate that a tumor sample is substantially similar to or is pancreatic adenocarcinoma; the C9 COCA subtype can indicate that a tumor sample is substantially similar to or is uterine carcinosarcoma; the C10 COCA subtype can indicate that a tumor sample is substantially similar to or is the basal subtype of breast cancer; the C12 COCA subtype can indicate that a tumor sample is substantially similar to or is uterine corpus endometrial cancer; the C14 COCA subtype can indicate that a tumor sample is substantially similar to or is prostate cancer; the C15 COCA subtype can indicate that a tumor sample is substantially similar to or is non-squamous cervical cancer; the C16 COCA subtype can indicate that a tumor sample is substantially similar to or is a bladder urothelial carcinoma; the C17 COCA subtype can indicate that a tumor sample is substantially similar to or is a testicular germ cell tumor; the C19 COCA subtype can indicate that a tumor sample is substantially similar to or is a colon, rectal, esophageal or stomach adenocarcinoma; the C20 COCA subtype can indicate that a tumor sample is substantially similar to or is a 
sarcoma; the C21 COCA subtype can indicate that a tumor sample is substantially similar to or is a kidney chromophobe, kidney renal papillary cell carcinoma or kidney renal clear cell carcinoma; the C22 COCA subtype can indicate that a tumor sample is substantially similar to or is liver hepatocellular carcinoma; the C24 COCA subtype can indicate that a tumor sample is substantially similar to or is the luminal subtype of breast cancer; the C25 COCA subtype can indicate that a tumor sample is substantially similar to or is thymoma; the C26 COCA subtype can indicate that a tumor sample is substantially similar to or is melanoma; or the C28 COCA subtype can indicate that a tumor sample is substantially similar to or is thyroid cancer.
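- For software implementations, the 21 COCA subtype codes and their short labels listed above can be held in a simple lookup table. The following is a minimal sketch; the labels are taken from the list above (with the kidney abbreviation written as KIRC), while the names `COCA_SUBTYPES` and `describe_subtype` are hypothetical.

```python
# Subtype code -> short label, as enumerated in the description above.
COCA_SUBTYPES = {
    "C1": "ACC/PCPG",
    "C2": "GBM/LGG",
    "C3": "OV",
    "C4": "Squamous-like",
    "C6": "LUAD-Enriched",
    "C8": "PAAD/some STAD",
    "C9": "UCS",
    "C10": "BRCA/Basal",
    "C12": "UCEC",
    "C14": "PRAD",
    "C15": "CESC (subset of cervical)",
    "C16": "BLCA",
    "C17": "TGCT",
    "C19": "COAD/READ",
    "C20": "SARC/MESO",
    "C21": "KIRC/KICH/KIRP",
    "C22": "Liver",
    "C24": "BRCA/Luminal",
    "C25": "THYM",
    "C26": "SKCM/UVM",
    "C28": "THCA",
}


def describe_subtype(code):
    """Return 'CODE (label)' for a listed COCA subtype code."""
    key = code.strip().upper()
    if key not in COCA_SUBTYPES:
        raise KeyError(f"{code!r} is not one of the listed COCA subtypes")
    return f"{key} ({COCA_SUBTYPES[key]})"


print(describe_subtype("c24"))
```

Rejecting unlisted codes (for example C5 or C7, which do not appear in the list) guards against silently reporting a subtype the classifier was never trained on.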
- “Determining a COCA subtype” can include, for example, diagnosing or detecting the presence, sub-type and cell-of-origin of a cancer, monitoring the progression of the disease, and identifying or detecting cells or samples that are indicative of said pan-cancer subtypes.
- In one embodiment, the COCA subtype is assessed or determined through the evaluation of expression patterns, or profiles, of one or a plurality of classifier biomarkers or biomarkers in one or more subject samples. The term subject, or subject sample, may refer to an individual regardless of health and/or disease status. A subject can be a subject, a study participant, a test subject, a control subject, a screening subject, or any other class of individual from whom a sample is obtained and assessed in the context of the methods and compositions provided herein. Accordingly, a subject can be previously diagnosed with one type of a myriad of cancers, can present with one or more symptoms of said type of cancer, or a predisposing factor, such as a family (genetic) or medical history (medical) factor for said type of cancer, can be undergoing treatment or therapy for said cancer, or the like. Alternatively, a subject can be healthy as defined herein with respect to any of the aforementioned factors or criteria.
- The myriad of cancers from which a subject may be suffering or suspected of suffering can be any cancer known in the art. The classifier biomarkers provided herein (e.g., the classifier biomarkers of Table 1) and methods of using said classifier biomarkers can be used to determine an integrated, pan-cancer COCA subtype of the cancer that said subject may be or is suspected of suffering from. Further to any of the embodiments provided herein, the cancer can include, but is not limited to, carcinoma, lymphoma, blastoma (including medulloblastoma and retinoblastoma), sarcoma (including liposarcoma and synovial cell sarcoma), neuroendocrine tumors (including carcinoid tumors, gastrinoma, and islet cell cancer), mesothelioma, schwannoma (including acoustic neuroma), meningioma, adenocarcinoma, melanoma, and leukemia or lymphoid malignancies. Examples of a cancer can also include, but are not limited to, a lung cancer (e.g., a non-small cell lung cancer (NSCLC) or small cell lung cancer), a kidney cancer (e.g., a kidney urothelial carcinoma or RCC), a bladder cancer (e.g., a bladder urothelial (transitional cell) carcinoma (e.g., locally advanced or metastatic urothelial cancer, including 1L or 2L+ locally advanced or metastatic urothelial carcinoma)), a breast cancer, a colorectal cancer (e.g., a colon adenocarcinoma), an ovarian cancer, a pancreatic cancer, a gastric carcinoma, an esophageal cancer, a mesothelioma, a melanoma (e.g., a skin melanoma), a head and neck cancer (e.g., a head and neck squamous cell carcinoma (HNSCC)), a thyroid cancer, a sarcoma (e.g., a soft-tissue sarcoma, a fibrosarcoma, a myxosarcoma, a liposarcoma, an osteogenic sarcoma, an osteosarcoma, a chondrosarcoma, an angiosarcoma, an endotheliosarcoma, a lymphangiosarcoma, a lymphangioendotheliosarcoma, a leiomyosarcoma, or a rhabdomyosarcoma), a prostate cancer, a glioblastoma, a cervical cancer, a thymic carcinoma, a leukemia (e.g., an acute lymphocytic leukemia (ALL), an acute 
myelocytic leukemia (AML), a chronic myelocytic leukemia (CML), a chronic eosinophilic leukemia, or a chronic lymphocytic leukemia (CLL)), a lymphoma (e.g., a Hodgkin lymphoma or a non-Hodgkin lymphoma (NHL)), a myeloma (e.g., a multiple myeloma (MM)), a mycosis fungoides, a Merkel cell cancer, a hematologic malignancy, a cancer of hematological tissues, a B cell cancer, a bronchus cancer, a stomach cancer, a brain or central nervous system cancer, a peripheral nervous system cancer, a uterine or endometrial cancer, a cancer of the oral cavity or pharynx, a liver cancer, a testicular cancer, a biliary tract cancer, a small bowel or appendix cancer, a salivary gland cancer, an adrenal gland cancer, an adenocarcinoma, an inflammatory myofibroblastic tumor, a gastrointestinal stromal tumor (GIST), a colon cancer, a myelodysplastic syndrome (MDS), a myeloproliferative disorder (MPD), a polycythemia vera, a chordoma, a synovioma, a Ewing's tumor, a squamous cell carcinoma, a basal cell carcinoma, a sweat gland carcinoma, a sebaceous gland carcinoma, a papillary carcinoma, a papillary adenocarcinoma, a medullary carcinoma, a bronchogenic carcinoma, a renal cell carcinoma, a hepatoma, a bile duct carcinoma, a choriocarcinoma, a seminoma, an embryonal carcinoma, a Wilms' tumor, a bladder carcinoma, an epithelial carcinoma, a glioma, an astrocytoma, a medulloblastoma, a craniopharyngioma, an ependymoma, a pinealoma, a hemangioblastoma, an acoustic neuroma, an oligodendroglioma, a meningioma, a neuroblastoma, a retinoblastoma, a follicular lymphoma, a diffuse large B-cell lymphoma, a mantle cell lymphoma, a hepatocellular carcinoma, a thyroid cancer, a small cell cancer, an essential thrombocythemia, an agnogenic myeloid metaplasia, a hypereosinophilic syndrome, a systemic mastocytosis, a familial hypereosinophilia, a neuroendocrine cancer, or a carcinoid tumor.
- In one embodiment, the cancer is selected from kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma (READ); adrenocortical carcinoma (ACC); uveal melanoma (UVM); mesothelioma (MESO); pheochromocytoma and paraganglioma (PCPG); skin cutaneous melanoma (SKCM); uterine carcinosarcoma (UCS); lung squamous cell carcinoma (LUSC); testicular germ cell tumors (TGCT); cholangiocarcinoma (CHOL); pancreatic adenocarcinoma (PAAD); thymoma (THYM); Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC); and Acute Myeloid Leukemia (LAML). In another embodiment, the cancer is selected from kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma (READ); adrenocortical carcinoma (ACC); uveal melanoma (UVM); mesothelioma (MESO); pheochromocytoma and paraganglioma (PCPG); skin cutaneous melanoma (SKCM); uterine carcinosarcoma (UCS); lung squamous cell carcinoma (LUSC); testicular germ cell tumors (TGCT); cholangiocarcinoma (CHOL); pancreatic adenocarcinoma (PAAD); thymoma (THYM); and Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC).
- As used herein, an “expression profile” or an “expression pattern” or a “biomarker profile” or a “gene signature” can comprise one or more values corresponding to a measurement of the relative abundance, level, presence, or absence of expression of a discriminative or classifier biomarker or biomarker. An expression profile can be derived from a subject prior to or subsequent to a diagnosis of a type of cancer, can be derived from a biological sample collected from a subject at one or more time points prior to or following treatment or therapy, can be derived from a biological sample collected from a subject at one or more time points during which there is no treatment or therapy (e.g., to monitor progression of disease or to assess development of disease in a subject diagnosed with or at risk for a type of cancer), or can be collected from a healthy subject. The term subject can be used interchangeably with patient. The patient can be a human patient. The one or a plurality of classifier biomarkers that can make up an expression profile as provided herein can be selected from one or more biomarkers of Table 1 and/or any additional set of biomarker classifiers disclosed herein.
- As used herein, the term “determining an expression level” or “determining an expression profile” or “detecting an expression level” or “detecting an expression profile” as used in reference to a biomarker or classifier can mean the application of a biomarker specific reagent such as a probe, primer or antibody and/or a method applied to a sample, for example a sample of the subject or patient and/or a control sample, for ascertaining or measuring quantitatively, semi-quantitatively or qualitatively the amount of a biomarker or biomarkers, for example the amount of biomarker polypeptide or mRNA (or cDNA derived therefrom). The level of a biomarker as provided herein can be determined by any number of methods known in the art and/or provided herein. The methods can include for example immunoassays including for example immunohistochemistry, ELISA, Western blot, immunoprecipitation and the like, where a biomarker detection agent such as an antibody for example, a labeled antibody, specifically binds the biomarker and permits for example relative or absolute ascertaining of the amount of polypeptide biomarker, hybridization and PCR protocols where a probe or primer or primer set are used to ascertain the amount of nucleic acid biomarker, including for example probe based and amplification based methods including for example microarray analysis, RT-PCR such as quantitative RT-PCR (qRT-PCR), serial analysis of gene expression (SAGE), Northern Blot, digital molecular barcoding technology, for example Nanostring Counter Analysis, and TaqMan quantitative PCR assays. Other methods of mRNA detection and quantification can be applied, such as mRNA in situ hybridization in formalin-fixed, paraffin-embedded (FFPE) tissue samples or cells. 
This technology is currently offered as QuantiGene ViewRNA (Affymetrix), which uses probe sets for each mRNA that bind specifically to an amplification system to amplify the hybridization signals; these amplified signals can be visualized using a standard fluorescence microscope or imaging system. This system can, for example, detect and measure transcript levels in heterogeneous samples, such as a tissue section in which both normal and tumor cells are present. As mentioned, TaqMan probe-based gene expression analysis (PCR-based) can also be used for measuring gene expression levels in tissue samples, and this technology has been shown to be useful for measuring mRNA levels in FFPE samples. In brief, TaqMan probe-based assays utilize a probe that hybridizes specifically to the mRNA target. This probe contains a reporter dye (fluorescent molecule) and a quencher dye attached at opposite ends, and fluorescence is emitted only when specific hybridization to the mRNA target occurs. During the amplification step, the exonuclease activity of the polymerase enzyme causes the quencher and the reporter dyes to be detached from the probe, and fluorescence emission can occur. This fluorescence emission is recorded and signals are measured by a detection system; these signal intensities are used to calculate the abundance of a given transcript (gene expression) in a sample.
- In one embodiment, the “expression profile” or a “biomarker profile” or “gene signature” associated with the classifier biomarkers described herein (e.g., Table 1 and/or any additional set of biomarker classifiers as disclosed herein) can be useful for distinguishing between normal and tumor samples. In another embodiment, the tumor samples are one type of cancer as determined based on tissue of origin. The one type of cancer can be any type of cancer known in the art and/or provided herein. In another embodiment, the cancer can be further classified as a specific clustering of cluster assignment (COCA) subtype based upon an expression profile of one or more classifier biomarkers (e.g., Table 1) determined using the methods provided herein. The specific COCA subtype can be any COCA subtype as described in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304. In one embodiment, the specific COCA subtype can be selected from C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA. Expression profiles using the classifier biomarkers disclosed herein (e.g., Table 1, Table 2 and any additional set of biomarker classifiers as disclosed herein) can provide valuable molecular tools for specifically identifying COCA subtypes, and for treating a cancer based on its COCA subtype. Accordingly, provided herein are methods for screening and classifying a subject for pan-cancer COCA subtypes.
- In some instances, a single classifier biomarker or a plurality of classifier biomarkers provided herein (e.g., from Table 1) is capable of identifying COCA subtypes of cancer with a predictive success of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, inclusive of all ranges and subranges therebetween.
- In some instances, a single classifier biomarker or a plurality of classifier biomarkers as provided herein (e.g., from Table 1) is capable of determining COCA subtypes of cancer with a sensitivity or specificity of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, inclusive of all ranges and subranges therebetween.
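- By way of non-limiting illustration, the sensitivity, specificity and overall predictive success referred to above could be computed from predicted and known subtype calls roughly as follows. This is a minimal sketch under stated assumptions: the function names, the one-vs-rest treatment of each subtype, the reading of predictive success as the fraction of correct calls, and the label lists are all hypothetical and are not data from this disclosure.

```python
def sensitivity_specificity(true_labels, predicted_labels, subtype):
    """One-vs-rest sensitivity and specificity for a single subtype."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == subtype and p == subtype)
    fn = sum(1 for t, p in pairs if t == subtype and p != subtype)
    tn = sum(1 for t, p in pairs if t != subtype and p != subtype)
    fp = sum(1 for t, p in pairs if t != subtype and p == subtype)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted subtype matches the known subtype."""
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)


# Hypothetical subtype calls for eight samples (illustration only).
true_labels = ["C3", "C3", "C14", "C14", "C22", "C22", "C22", "C3"]
predicted_labels = ["C3", "C14", "C14", "C14", "C22", "C22", "C3", "C3"]

sens_c3, spec_c3 = sensitivity_specificity(true_labels, predicted_labels, "C3")
overall = accuracy(true_labels, predicted_labels)
print(f"C3 sensitivity={sens_c3:.3f} specificity={spec_c3:.3f} accuracy={overall:.3f}")
```

Each subtype is scored against all remaining subtypes in turn, so a multi-class classifier yields one sensitivity/specificity pair per COCA subtype.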
- Also encompassed herein is a system capable of distinguishing various COCA subtypes of cancer not detectable using current methods. This system can be capable of processing a large number of subjects and subject variables such as expression profiles and other diagnostic criteria. In one embodiment, the methods for determining a COCA subtype as provided herein using one or a plurality of classifier biomarkers as provided herein (e.g., Table 1) can be part of a system capable of distinguishing various COCA subtypes that also utilizes data accumulated from other diagnostic methods. The other diagnostic methods can include additional genome-wide molecular assays or platforms, histochemical, immunohistochemical, cytologic, immunocytologic, visual diagnostic methods including histologic or morphometric evaluation of cancer or tumor tissue or any combination thereof. The additional genome-wide molecular assays or platforms can be selected from whole-exome DNA sequencing assays (e.g., Illumina HiSeq and GAII), DNA copy-number variation assays (e.g., Affymetrix 6.0 microarrays), DNA methylation assays (e.g., Illumina 450,000-feature microarrays), genome-wide mRNA level assays (e.g., Illumina mRNA-seq), microRNA level assays (e.g., Illumina microRNA-seq), and protein level assays for proteins and/or phosphorylated proteins (e.g., Reverse Phase Protein Arrays; RPPA).
- In various embodiments, the expression profile derived from a subject (e.g., from a sample obtained from said subject) is compared to a reference expression profile. A “reference expression profile” or “control expression profile” can be a profile derived from the subject prior to treatment or therapy; can be a profile produced from the subject sample at a particular time point (usually prior to or following treatment or therapy, but can also include a particular time point prior to or following diagnosis of a type of cancer); or can be derived from a healthy individual or a pooled reference from healthy individuals. A reference expression profile can be specific to different COCA subtypes of cancer. The COCA reference expression profile can be from any tissue in which a specific COCA subtype has been found. As provided herein, in one embodiment, the specific COCA subtype can be any COCA subtype as described in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304. In one embodiment, the specific COCA subtype can be selected from a C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype.
- The reference expression profile can be compared to a test expression profile or vice versa. A “test expression profile” can be derived from the same subject as the reference expression profile except at a subsequent time point (e.g., one or more days, weeks or months following collection of the reference expression profile) or can be derived from a different subject. In summary, any test expression profile of a subject can be compared to a previously collected profile from a subject that has a specific COCA subtype. The specific COCA subtype can be any COCA subtype as described in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304. In one embodiment, the specific COCA subtype can be selected from a C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype.
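- By way of non-limiting illustration, the comparison of a test expression profile against previously collected reference expression profiles could be sketched as follows. The use of Pearson correlation as the similarity measure is an assumption made for this example only (the passage above does not prescribe a particular measure), and the gene names, reference names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def most_similar_reference(test_profile, reference_profiles):
    """Return the best-matching reference name and all similarity scores."""
    genes = sorted(test_profile)  # fixed gene order shared by every vector
    test_vec = [test_profile[g] for g in genes]
    scores = {
        name: pearson(test_vec, [ref[g] for g in genes])
        for name, ref in reference_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical expression values (illustration only).
reference_profiles = {
    "reference_A": {"gene_1": 10.0, "gene_2": 9.0, "gene_3": 1.0, "gene_4": -1.0},
    "reference_B": {"gene_1": -1.0, "gene_2": 0.0, "gene_3": 8.0, "gene_4": 7.0},
}
test_profile = {"gene_1": 5.0, "gene_2": 4.0, "gene_3": 0.5, "gene_4": -0.5}

best, scores = most_similar_reference(test_profile, reference_profiles)
print(best, scores)
```

The same routine applies whether the reference profile comes from the same subject at an earlier time point or from a different subject with a known COCA subtype; only the contents of the hypothetical `reference_profiles` mapping change.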
- The classifier biomarkers provided herein (e.g., Table 1) for use in the methods, compositions or kits provided herein can include nucleic acids (RNA, cDNA, and DNA) and proteins, and variants and fragments thereof. Such biomarkers can include DNA comprising the entire or partial sequence of the nucleic acid sequence encoding the biomarker, or the complement of such a sequence. The biomarkers described herein can include RNA comprising the entire or partial sequence of any of the nucleic acid sequences of interest, or their non-natural cDNA products, obtained synthetically in vitro in a reverse transcription reaction. The biomarker nucleic acids can also include any expression product or portion thereof of the nucleic acid sequences of interest. A biomarker protein can be a protein encoded by or corresponding to a DNA biomarker provided herein. A biomarker protein can comprise the entire or partial amino acid sequence of any of the biomarker proteins or polypeptides. The biomarker nucleic acid can be extracted from a bodily fluid (e.g., blood or fractions thereof, urine, saliva, CSF, etc.), a cell or can be cell free or extracted from an extracellular vesicular entity such as an exosome.
- A “classifier biomarker” or “biomarker” or “classifier gene” can be any nucleic acid (DNA, RNA or cDNA) or protein whose level of expression in a tissue or cell is altered compared to that of a normal or healthy cell or tissue or any other reference or control as provided herein. For example, a “classifier biomarker” or “biomarker” or “classifier gene” can be any nucleic acid (DNA, RNA or cDNA) or protein whose level of expression in a tissue or cell is altered in a specific COCA subtype. The detection of the biomarkers provided herein can permit the determination of the specific COCA subtype. The “classifier biomarker” or “biomarker” or “classifier gene” may be one that is up-regulated (e.g., expression is increased) or down-regulated (e.g., expression is decreased) relative to a reference or control as provided herein. The reference or control can be any reference or control as provided herein. In some embodiments, the expression values of nucleic acids (DNA, RNA or cDNA) that are up-regulated or down-regulated in a particular COCA subtype of cancer can be pooled into one gene signature. The overall expression level in each gene signature is referred to herein as the “expression profile” and is used to classify a test sample (i.e., a sample obtained from a subject suffering from or suspected of suffering from cancer) according to the COCA subtype of cancer. However, it is understood that independent evaluation of expression for each of the genes disclosed herein can be used to classify tumor subtypes without the need to group up-regulated and down-regulated genes into one or more gene signatures. In some cases, as shown in Tables 1 and 2, a total of 84 biomarkers can be used for COCA subtype determination. For a specific COCA subtype, for example, 4 of the 84 biomarkers of Table 1 can have altered expression that is correlated therewith. Further, the correlation of the 4 of the 84 biomarkers of Table 1 with the specific COCA subtype can be positive, negative or a combination thereof.
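- By way of non-limiting illustration, pooling up-regulated and down-regulated biomarkers into a single gene signature value could be sketched as follows. The scoring rule used here (mean expression of the up-regulated genes minus mean expression of the down-regulated genes) is an assumption chosen for the example and is not stated in this disclosure; the gene names and values are hypothetical.

```python
from statistics import mean


def signature_score(expression, up_genes, down_genes):
    """Pool a signature into one value: mean(up-regulated) - mean(down-regulated)."""
    up = mean(expression[g] for g in up_genes)
    down = mean(expression[g] for g in down_genes)
    return up - down


# Hypothetical four-gene signature for one subtype (illustration only):
# two biomarkers positively and two negatively correlated with the subtype.
expression = {"gene_up_1": 8.0, "gene_up_2": 6.0, "gene_down_1": 2.0, "gene_down_2": 1.0}

score = signature_score(
    expression,
    up_genes=["gene_up_1", "gene_up_2"],
    down_genes=["gene_down_1", "gene_down_2"],
)
print(score)
```

A higher score indicates that the sample follows the expected direction of change for that signature; evaluating each gene independently, as the passage above also contemplates, would skip this pooling step.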
- The classifier biomarkers for use in the methods provided herein can include any nucleic acid (DNA, RNA or cDNA) or protein that is selectively expressed in COCA subtypes of cancer, as defined herein above. Sample biomarker genes are listed in Table 1 below.
- In one embodiment, the 84-gene signature for COCA subtyping is found in Table 1. The relative gene expression levels as represented by nearest centroid coefficients of the classifier biomarkers for the 84-gene pan-cancer subtyper of Table 1 are shown in Table 2.
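- By way of non-limiting illustration, nearest centroid classification using coefficients of the kind listed in Table 2 could be sketched as follows. The centroid values are the Table 2 coefficients for four genes and four subtypes, rounded to three decimal places; the choice of Euclidean distance, the assumption that the sample is measured on the same scale as the centroids, the function name and the sample values are all hypothetical and are made only for this example.

```python
import math

# Table 2 coefficients (rounded) for four of the 84 genes and four subtypes.
CENTROIDS = {
    "C2 (GBM/LGG)": {"BCAN": 11.976, "PAX8": -2.702, "KRT6A": -3.978, "NAPSA": -0.628},
    "C3 (OV)": {"BCAN": -1.054, "PAX8": 6.132, "KRT6A": 3.986, "NAPSA": -0.350},
    "C4 (Squamous-like)": {"BCAN": -0.662, "PAX8": -0.110, "KRT6A": 13.048, "NAPSA": 0.122},
    "C6 (LUAD enriched)": {"BCAN": -0.729, "PAX8": -0.508, "KRT6A": 1.835, "NAPSA": 11.535},
}


def nearest_centroid(sample, centroids):
    """Return the subtype whose centroid is closest to the sample (Euclidean)."""
    distances = {
        subtype: math.sqrt(sum((sample[g] - c[g]) ** 2 for g in c))
        for subtype, c in centroids.items()
    }
    return min(distances, key=distances.get), distances


# Hypothetical normalized expression values for one test sample.
sample = {"BCAN": -0.5, "PAX8": 0.2, "KRT6A": 1.0, "NAPSA": 10.0}

predicted, distances = nearest_centroid(sample, CENTROIDS)
print(predicted)
```

In practice all 84 genes and all subtype centroids would enter the distance calculation; the four-gene subset here only keeps the example short.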
TABLE 1. 84 Gene Classifier Biomarker Signature for Pan-Cancer COCA subtyping.
SEQ ID NO. | Gene Symbol | Gene Name | GenBank Accession Number*
1 | A1BG | Alpha-1-B Glycoprotein | NM_130786.3
2 | ACPP | Acid Phosphatase, Prostate | NM_001099.5
3 | APC2 | APC2, WNT Signaling Pathway Regulator | NM_001351273.1
4 | AQP5 | Aquaporin 5 | NM_001651.4
5 | ASGR1 | asialoglycoprotein receptor 1 | NM_001671.5
6 | BCAN | brevican | NM_021948.5
7 | BCL2L15 | BCL2 like 15 | NM_001010922.3
8 | C1orf172 | keratinocyte differentiation factor 1 | NM_152365.3
9 | CAPS | calcyphosine | NM_004058.5
10 | CBLC | Cbl proto-oncogene C | NM_012116.4
11 | CDH1 | cadherin 1 | NM_004360.5
12 | CEACAM5 | carcinoembryonic antigen related cell adhesion molecule 5 | NM_004363.5
13 | CEACAM6 | carcinoembryonic antigen related cell adhesion molecule 6 | NM_002483.7
14 | CHMP4C | multivesicular body protein 4C | NM_152284.4
15 | CLCA2 | chloride channel accessory 2 | NM_006536.7
16 | CLDN4 | claudin 4 | NM_001305.4
17 | COL11A2 | collagen type XI alpha 2 chain | NM_080680.2
18 | CRB3 | crumbs cell polarity complex component 3 | NM_139161.5
19 | CTSE | cathepsin E | NM_001910.4
20 | CUBN | cubilin | NM_001081.3
21 | CYP2B7P1 | cytochrome P450 family 2 subfamily B member 7, pseudogene | NR_001278.1
22 | DLX5 | distal-less homeobox 5 | NM_005221.6
23 | DMGDH | dimethylglycine dehydrogenase | NM_013391.3
24 | ELF3 | E74 like ETS transcription factor 3 | NM_004433.5
25 | EMX2 | empty spiracles homeobox 2 | NM_004098.4
26 | EMX2OS | EMX2 opposite strand/antisense RNA | NR_002791.2
27 | EPCAM | epithelial cell adhesion molecule | NM_002354.2
28 | ERBB3 | erb-b2 receptor tyrosine kinase 3 | NM_001982.3
29 | ESR1 | estrogen receptor 1 | NM_000125.3
30 | FAM171A2 | family with sequence similarity 171 member A2 | NM_198475.2
31 | FOLH1 | folate hydrolase 1 | NM_004476.3
32 | GABRP | gamma-aminobutyric acid type A receptor pi subunit | NM_014211.3
33 | GATA3 | GATA binding protein 3 | NM_001002295.2
34 | GCNT3 | glucosaminyl (N-acetyl) transferase 3, mucin type | NM_004751.3
35 | GPC2 | glypican 2 | NM_152742.3
36 | GPR35 | G protein-coupled receptor 35 | NM_001195381.1
37 | GPRC5A | G protein-coupled receptor class C group 5 member A | NM_003979.3
38 | GRHL2 | grainyhead like transcription factor 2 | NM_024915.3
39 | HNF1A | HNF1 homeobox A | NM_000545.6
40 | HPX | hemopexin | NM_000613.3
41 | IYD | iodotyrosine deiodinase | NM_203395.2
42 | KRT18 | keratin 18 | NM_000224.3
43 | KRT6A | keratin 6A | NM_005554.4
44 | KRT6B | keratin 6B | NM_005555.4
45 | KRT81 | keratin 81 | NM_002281.3
46 | KRT8 | keratin 8 | NM_002273.3
47 | LAD1 | ladinin 1 | NM_005558.3
48 | LCK | LCK proto-oncogene, Src family tyrosine kinase | NM_005356.5
49 | LGALS4 | galectin 4 | NM_006149.4
50 | LYPD1 | LY6/PLAUR domain containing 1 | NM_144586.6
51 | MARVELD3 | MARVEL domain containing 3 | NM_052858.5
52 | MEG3 | maternally expressed 3 | NR_046473.1
53 | MUC13 | mucin 13, cell surface associated | NM_033049.4
54 | MUC16 | mucin 16, cell surface associated | NM_024690.2
55 | MUC4 | mucin 4, cell surface associated | NM_018406.7
56 | MYCN | MYCN proto-oncogene, bHLH transcription factor | NM_005378.6
57 | NAPSA | napsin A aspartic peptidase | NM_004851.3
58 | NKX3-1 | NK3 homeobox 1 | NM_006167.4
59 | NPR1 | natriuretic peptide receptor 1 | NM_000906.4
60 | PAX8 | paired box 8 | NM_003466.4
61 | PRAME | preferentially expressed antigen in melanoma | NM_206956.3
62 | PSCA | prostate stem cell antigen | NM_005672.5
63 | PVRL4 | nectin cell adhesion molecule 4 | NM_030916.3
64 | S100P | calcium binding protein P | NM_005980.3
65 | SALL4 | spalt like transcription factor 4 | NM_020436.5
66 | SFTPD | surfactant protein D | NM_003019.5
67 | SILV | premelanosome protein | NM_006928.4
68 | SIT1 | signaling threshold regulating transmembrane adaptor 1 | NM_014450.3
69 | SLC26A4 | solute carrier family 26 member 4 | NM_000441.1
70 | SLC3A1 | solute carrier family 3 member 1 | NM_000341.3
71 | SLC45A3 | solute carrier family 45 member 3 | NM_033102.3
72 | SOX17 | SRY-box 17 | NM_022454.4
73 | SPDEF | SAM pointed domain containing ETS transcription factor | NM_012391.3
74 | SPINT2 | serine peptidase inhibitor, Kunitz type 2 | NM_021102.4
75 | TCEAL5 | transcription elongation factor A like 5 | NM_001012979.3
76 | TG | thyroglobulin | NM_003235.5
77 | TMEM27 | collectrin, amino acid transport regulator | NM_020665.6
78 | TP63 | tumor protein p63 | NM_003722.5
79 | TRPS1 | transcriptional repressor GATA binding 1 | NM_001330599.1
80 | TSPAN8 | tetraspanin 8 | NM_004616.3
81 | UPK3B | uroplakin 3B | NM_001347684.1
82 | VTN | vitronectin | NM_000638.4
83 | ZNF578 | zinc finger protein 578 | NM_001099694.2
84 | ZNF695 | zinc finger protein 695 | NM_020394.5
*Each GenBank Accession Number is a representative or exemplary GenBank Accession Number for the listed gene and is herein incorporated by reference in its entirety for all purposes. Further, each listed representative or exemplary accession number should not be construed to limit the claims to the specific accession number.
TABLE 2 Nearest centroid classifier coefficients of 84 Gene Classifier Biomarker Signature for Pan-Cancer COCA subtyping. C4 C6 C8 Gene C1 C2 C3 (Squamous- (LUAD (PAAD/some C9 # Symbol (ACC/PCPG) (GBM/LGG) (OV) like) enriched) STAD) (UCS) 1 A1BG 1.591560699 0.190424 0.486501004 −0.428197254 0.412635759 −0.184279 2.001627705 2 ACPP −3.165733781 −3.12929 1.422642856 1.55810748 1.220541761 0.3613543 −0.523609534 3 APC2 5.927921166 9.535164 −0.596869926 0.426709368 −0.235550248 0.211678867 0.296218394 4 AQP5 0.913265915 −1.5756 6.077199618 0.038435948 3.968116521 5.439757901 4.689537595 5 ASGR1 0.200382941 2.23723 −0.270715575 −0.385421722 0.9311113 0.377002269 1.767020071 6 BCAN 3.407338299 11.97624 −1.053982755 −0.662093738 −0.729179033 0.649031299 0.667042943 7 BCL2L15 −0.658510708 0.077946 0.865164587 −0.382856173 4.273114175 5.648586443 −0.38038599 8 C1orf172 −7.367511726 −8.11012 0.639328401 0.482516142 0.16674242 −0.401651641 −2.035170988 9 CAPS −0.328076695 −0.34918 2.841784698 −1.13544035 1.075257629 0.923666447 0.259472216 10 CBLC −8.155167351 −8.15517 0.559363299 1.283503926 0.054904876 0.803955968 −1.909451133 11 CDH1 −11.31993378 −6.63507 −0.611897157 0.320830787 0.259254437 0.32354087 −2.417624306 12 CEACAM5 −4.263447619 −4.26345 −2.162761329 5.453807243 8.01040042 8.25154716 −1.643025427 13 CEACAM6 −6.692665202 −6.69267 −1.411084364 3.660636032 8.117243572 7.181029674 −2.853743681 14 CHMP4C −4.851564145 −6.7881 0.477705202 0.092903499 0.246039693 0.425978288 −1.619930594 15 CLCA2 −1.916026013 −0.55212 −2.135469493 10.56468928 0.196047759 −0.982688048 0.230995571 16 CLDN4 −7.769800248 −8.52437 1.293661429 −0.21590352 0.923225733 0.698989001 −2.196157068 17 COL11A2 0.994719726 3.794411 0.981227344 −0.355884432 −0.290171902 −0.134024256 5.747006193 18 CRB3 −6.855921321 −6.69387 0.314038459 −0.051924366 0.42596628 0.230134355 −2.12978152 19 CTSE −2.769179309 −2.76918 0.690060563 −0.068850571 8.211338748 10.27849106 −1.795164025 20 CUBN 1.417595067 1.109969 
−1.06751464 −1.098119051 0.200954954 0.214406457 1.001364945 21 CYP2B7P1 −0.494004573 0.493388 −0.20904464 −0.38882331 7.020240242 1.642138931 0.554846851 22 DLX5 0.646764837 −0.46515 −0.543489219 3.043312263 −1.037762982 −0.615048606 3.664621768 23 DMGDH 1.35983288 1.246526 −0.621678415 −1.957477961 0.403167757 0.050041264 −0.042151268 24 ELF3 −9.499613685 −8.35834 1.621031579 0.270463158 1.261978364 1.664172061 −1.774337463 25 EMX2 −1.445515678 4.196057 7.930390253 −0.743521137 −3.08024833 −1.20337606 5.823681705 26 EMX2OS −1.40599953 4.680959 6.731039756 −1.182024547 −3.142205278 −1.044827924 5.399413819 27 EPCAM −4.140528206 −8.15621 1.301641135 −0.969949548 1.349981921 1.210777667 −0.808127169 28 ERBB3 −6.795539466 −2.20761 −0.171425783 −0.850309305 0.543205553 0.488891046 −1.571378197 29 ESR1 −1.563757872 −2.7205 4.824366357 −0.395784527 0.80814913 −0.646508959 −0.041535695 30 FAM171A2 2.31912146 3.133851 2.782191887 −0.20635555 0.23640564 −0.553185081 4.13232199 31 FOLH1 0.35530613 1.32629 0.070865602 −0.437336881 −0.404293953 −0.516227658 0.984459412 32 GABRP −3.114382282 −3.80051 2.138616034 2.054140007 −0.13455569 5.495459292 0.876924899 33 GATA3 3.645314335 −4.46319 1.041365419 0.137891422 −0.30939716 0.245236202 −0.485018337 34 GCNT3 −3.715677872 −4.43208 −0.425219053 1.24405515 3.332011974 6.140028092 −2.627063776 35 GPC2 0.327681714 3.748559 0.567306982 0.383564177 0.020301437 −0.493055636 4.27187925 36 GPR35 −1.123275158 0.288748 0.479762937 −0.401845781 0.612377482 3.964562506 1.02539215 37 GPRC5A −5.029113731 −6.47264 −1.03988769 0.471927094 2.504432044 2.440959491 −1.902607145 38 GRHL2 −9.186320721 −9.01009 0.20905294 0.759772508 0.222957527 −0.501711577 −2.025188838 39 HNF1A −0.226398309 0.326606 −0.566429337 −0.634513541 −0.056397283 3.532952512 0.664824374 40 HPX 0.285105569 −0.08725 −0.761181725 −0.99545754 0.077064824 0.031571949 0.655698572 41 IYD −3.48501457 −3.48501 −2.700731814 −1.942508868 2.764256302 4.516203307 −2.338163874 42 KRT18 
−6.139551755 −9.93225 0.824379787 −1.150398992 0.366193169 0.669054426 −1.559549225 43 KRT6A −3.978012535 −3.97801 3.985705153 13.04837042 1.834983617 2.826759232 0.495449701 44 KRT6B −3.679513879 −3.67951 −0.284781052 10.86983161 −0.095874456 4.031758881 −0.319655407 45 KRT81 −1.52156723 −2.24431 1.219674513 0.808360415 0.685287495 0.157557306 1.332452992 46 KRT8 −9.333378281 −12.0127 0.923200159 −0.888982445 0.25864778 1.104030262 −1.049517897 47 LAD1 −9.54391772 −9.93659 0.152727678 2.525900274 1.096406409 2.02745431 −1.338565911 48 LCK −2.653249024 −4.15782 −0.449932121 0.595456972 1.114873169 1.294851611 −0.905250193 49 LGALS4 −1.069860082 −0.88856 −0.804776299 −0.636711502 0.902648195 9.957840161 0.300844381 50 LYPD1 0.161356715 4.620573 6.06218977 0.619464566 0.866852785 1.287462226 2.488687449 51 MARVELD3 −6.499693064 −1.92762 −0.006137317 0.236649367 0.110533207 0.419749056 −1.459194862 52 MEG3 6.987769361 4.00401 0.481443128 −0.367973396 0.187829444 2.037867641 5.549924389 53 MUC13 −1.096164161 −1.549 −0.857929123 −0.216081121 3.60927036 9.342999034 −0.087884353 54 MUC16 −2.940429889 −2.94043 8.971152269 3.553030115 5.922759027 2.391002348 2.159619147 55 MUC4 −2.659938287 −1.76141 1.013937899 4.360400601 6.293886393 4.455995593 0.8936279 56 MYCN 2.635001351 3.722476 3.48370589 −0.476428956 −0.4996297 −0.139261692 4.259965299 57 NAPSA −1.134449647 −0.6277 −0.350262749 0.121656227 11.53466505 −0.206023725 −0.551300842 58 NKX3-1 0.643122217 −0.8988 −1.012928267 0.87446207 0.131533121 0.091873999 −0.013147747 59 NPR1 1.562673445 −1.70035 4.826134025 −0.898468302 0.426172792 0.791217063 0.909336394 60 PAX8 −1.207977403 −2.70163 6.131772035 −0.109570392 −0.507568017 0.425575927 0.567460764 61 PRAME −2.720513358 −3.0116 7.417738065 4.107800576 1.634537806 −0.732218434 8.835740874 62 PSCA −2.62692522 −1.51466 1.088832807 2.403172672 0.853737331 6.468730165 −0.867506614 63 PVRL4 −7.123332103 −7.84158 0.298515276 2.072952358 1.11395964 0.537458726 −2.175556637 64 
S100P −4.176354266 −4.39339 −2.039708839 2.60206496 3.54339421 5.588006332 −0.835327459 65 SALL4 −0.350139755 −1.98283 0.992610931 0.242440795 0.631851425 1.50741502 2.172219002 66 SFTPD 1.229072592 0.156327 1.095837733 0.844728495 6.616682345 −0.138899031 −0.528759079 67 SILV −1.601355906 −3.16131 0.436716938 −0.17041022 0.318281992 0.27070394 0.049533868 68 SIT1 −2.339171989 −3.35217 −0.160872017 0.621892503 1.468402409 0.74432008 −1.226672618 69 SLC26A4 −0.01413008 0.420072 −1.783069972 −0.415510022 0.74827946 −0.100896769 −0.464161483 70 SLC3A1 1.225854746 0.996711 −0.32436489 −1.197295399 −0.603148658 5.542018189 0.717943528 71 SLC45A3 −2.005759994 −0.41185 −0.428424528 −0.241045139 0.557739049 1.726290953 0.128310421 72 SOX17 0.824116164 −0.22888 6.125476978 −1.080390132 −0.166935984 0.604218197 1.286142427 73 SPDEF −2.615781968 −2.0345 3.94966981 −0.755535616 4.925193925 4.243866593 0.699764031 74 SPINT2 −2.997432839 −4.83916 1.007795827 0.294659358 0.15758716 0.166037725 −0.979250422 75 TCEAL5 4.349410995 5.379822 1.642558611 −0.898540151 −0.528496105 0.788010053 4.759050684 76 TG 2.696748103 −0.10465 −1.217878931 0.390892921 −0.793389805 −0.297668711 1.286415669 77 TMEM27 −0.42619294 −0.29365 −0.1091435 −0.496636878 0.747255703 0.386605101 −0.460703305 78 TP63 −2.443322255 −2.69429 −1.072539715 8.079773017 1.080093521 −0.122917429 0.715521461 79 TRPS1 −0.827302587 0.82757 1.115573024 0.379838983 −0.553191739 −0.163032265 1.067422295 80 TSPAN8 −1.517176876 −1.38543 1.264805902 −0.971215985 4.123187886 8.120119283 1.88608684 81 UPK3B −1.800031107 −1.79259 6.496391778 2.591465189 1.916362767 1.31370249 1.253864936 82 VTN 4.532732542 0.962046 −0.35391519 −0.827839727 0.374371855 3.646375202 0.21090879 83 ZNF578 1.940365745 2.606116 1.274215935 −2.128937852 −0.532541888 −0.12176826 1.417239081 84 ZNF695 −2.395893789 −0.97465 2.29727236 1.015039672 −0.170693901 −0.682198412 2.909908974 Gene C10 C12 C14 C15 C16 C17 C19 # Symbol (BRCA/Basal) (UCEC) (PAAD) (CESC) 
(BLCA) (TGCT) (COAD/READ) 1 A1BG 0.142304769 −0.093163359 −1.141696682 −1.152290675 −1.29740042 0.256130124 −1.788698924 2 ACPP −1.398401725 −0.082101813 10.45064724 1.42121 0.266257162 0.477828173 0.992457174 3 APC2 −0.572616388 −0.977273763 0.32549264 0.054659498 0.405219188 1.45040744 0.15536217 4 AQP5 3.702869943 6.247684679 −1.136288477 8.250686508 −2.413179516 2.019598373 −0.061400955 5 ASGR1 −0.374333329 −0.307671908 −1.422479203 −0.69544287 −0.284590427 0.811900301 0.284656903 6 BCAN −0.843138597 −0.389425497 −2.101665722 0.113052513 −0.411742591 2.451102537 1.398031108 7 BCL2L15 −0.509343675 2.21556825 −0.416331352 5.642847062 −0.076320616 −0.539990486 5.956925607 8 C1orf172 0.377167732 0.541078744 0.28000935 0.529249282 0.78721152 −0.331669012 0.205165485 9 CAPS 1.236614303 3.401999923 1.140742546 3.129590389 3.356392461 −1.598575877 −1.041036365 10 CBLC 0.528896861 0.860866464 0.983689368 1.366858965 2.004650922 −1.590854285 1.462118274 11 CDH1 0.147729378 0.016506631 0.816575199 0.187211685 0.578678097 −1.440668807 0.796827631 12 CEACAM5 −0.578531195 −0.226926702 0.223892364 8.216176042 2.697562315 −3.025546808 11.26786682 13 CEACAM6 0.378171192 −0.861664584 −1.150176451 5.321201782 1.780316958 0.990910873 7.545948521 14 CHMP4C 0.932335566 0.332852097 0.654936052 −0.061505461 0.701272543 −2.076344669 0.955706036 15 CLCA2 1.252518338 −0.870513832 0.834994065 0.083375317 5.899414972 0.277783481 −1.434642506 16 CLDN4 0.596506064 0.783384706 0.695706592 0.872264084 1.666131843 −4.130304775 1.506307594 17 COL11A2 1.152615114 1.094477267 −0.512589565 0.34882784 −0.277012497 0.921823497 −0.972050734 18 CRB3 −0.252190917 0.447730499 0.314262173 0.485822293 0.422819651 −1.015012817 0.63102577 19 CTSE −1.579272399 −0.27402142 −0.123593233 4.329326574 3.964690014 0.131319517 6.509851516 20 CUBN −0.042341755 0.692484506 −0.083311834 −1.016397158 −1.291707376 −0.060942034 −1.2605976 21 CYP2B7P1 −0.793782659 −1.135804901 −0.048029795 2.920741779 −1.769335752 
−1.106667891 0.57204011 22 DLX5 1.065680224 6.63897818 1.203219318 0.244733336 2.095413211 −0.874548488 −2.278703058 23 DMGDH −1.016515828 −0.716060074 0.747670367 −1.492260644 −2.335607422 −0.511890494 −1.983626926 24 ELF3 0.867636291 1.089943222 −0.33693125 2.161246514 1.979945601 −2.024525261 1.906780054 25 EMX2 0.398485049 7.622270699 −1.139800617 3.497783132 2.434679878 −0.070065196 −2.337175682 26 EMX2OS 0.35047227 6.672513444 −0.771169469 2.651126602 1.918119816 0.003917034 −2.749389229 27 EPCAM 1.070959088 1.65345969 0.656584363 1.491420181 −0.23366156 0.118017267 2.502744305 28 ERBB3 0.156681689 −0.354419176 0.85480606 0.70559009 0.245664699 −1.125976575 1.142628311 29 ESR1 −0.114307986 4.969542469 1.603982283 2.334162784 −0.866991852 −1.616321139 −2.643912362 30 FAM171A2 1.110078033 2.742226868 −0.598712372 1.23539718 0.196442464 2.52966238 −2.299101465 31 FOLH1 1.342626401 1.95199662 7.506581427 −1.68931756 −1.160417237 −1.194199537 −0.990382338 32 GABRP 8.062957771 3.497605248 3.292130436 7.257298478 −1.104614063 −2.409859347 1.866998203 33 GATA3 2.883744656 −1.322536343 0.362300268 0.172939031 5.863126341 0.347938903 −1.305257494 34 GCNT3 −2.229182519 1.453042898 −1.612817806 4.45653155 −2.101854264 −1.198296139 5.978745374 35 GPC2 1.844529239 1.87662419 −0.31412767 −0.281008532 1.248099969 3.603718475 −0.446098297 36 GPR35 −0.201147094 1.145430196 −0.633581001 3.403228537 −0.074926063 0.037914492 4.997511501 37 GPRC5A 0.136594075 −0.047037042 −1.729389157 1.713890286 0.947387044 1.099444707 2.341085357 38 GRHL2 0.887932851 0.608466746 1.846078681 0.587744821 0.927526179 −4.112841993 0.296403053 39 HNF1A −0.757149534 0.998266857 −0.224121716 3.083804118 −0.876897763 −0.071449343 4.816960704 40 HPX 0.083502266 0.467027224 1.898436273 −0.011129604 −0.443467655 1.739428473 −0.713289667 41 IYD −1.986600368 −1.163865741 1.126629676 1.43730223 0.098371764 −3.48501457 5.821174506 42 KRT18 −0.774406938 0.534341861 0.455271746 1.038944818 0.900515088 
−0.471025112 1.117607068
43 KRT6A 3.285148104 2.483810663 −0.073242302 4.424176683 4.698929835 −3.375318162 0.482454829
44 KRT6B 6.929849448 −0.029392703 −0.881504967 3.541676869 2.041513217 −3.679513879 2.619439855
45 KRT81 3.704809399 1.125852117 −1.180411884 1.855047045 −0.224822743 −1.682654755 −1.136304837
46 KRT8 0.117987473 0.541422454 −0.071910723 1.371770589 1.338626534 −0.317434922 1.536661174
47 LAD1 1.117718225 0.900829355 −0.184554726 1.845819432 2.28378527 −1.0294741 2.021237694
48 LCK 0.323828061 −0.2698455 −0.135093489 0.809447978 −0.427656591 1.876095595 0.766125516
49 LGALS4 −1.049805056 −0.658445291 0.33823429 2.270150332 1.671819995 1.025358005 10.63263886
50 LYPD1 0.318704537 2.561660067 −1.46291243 0.106783622 −0.846380974 0.579452651 −1.457927861
51 MARVELD3 0.594064846 0.59864298 0.746088507 0.673220178 0.379141989 −1.349340309 0.960362538
52 MEG3 −0.760697048 −0.559506246 0.124435896 −0.790286218 0.176547542 7.212428882 −0.102624496
53 MUC13 −0.415482063 2.373448458 3.347855433 9.337472272 −1.260341969 −1.137285921 11.16914163
54 MUC16 5.749271478 7.257302838 −0.380117549 9.47128499 −0.667829704 2.503910209 −1.68279375
55 MUC4 −0.533649289 1.580638241 1.704056599 7.6185769 1.358543899 3.416007758 4.848907641
56 MYCN 1.167011131 1.749261819 −3.028046397 0.244478213 0.693171089 6.57493164 0.586544197
57 NAPSA −0.568068159 0.272590794 −0.267650347 0.491103798 −0.068387887 0.784909839 −0.516593433
58 NKX3-1 0.636142588 −0.165152927 8.791444726 1.213527947 0.111824365 0.401176282 −1.098810432
59 NPR1 −0.108559874 −0.014674711 −0.783482295 −1.047763786 −1.086165265 −0.51156436 −1.013472417
60 PAX8 −1.679043287 6.175626455 −1.300676688 3.829761838 −0.264941996 0.019241642 0.199759138
61 PRAME 5.753805128 8.637405593 −1.641025038 2.240303364 −1.86413095 6.249324024 0.732324816
62 PSCA 2.380934424 0.822962284 5.024448953 5.074590237 9.409030075 −1.378514517 0.823284273
63 PVRL4 1.983169736 0.54330796 0.274629626 1.209370824 2.956273849 −1.709650966 −0.443057331
64 S100P 3.387292842 0.722588023 0.264728445 6.102225171 7.587401555 −0.745010036 6.222215668
65 SALL4 −0.212011363 −0.183118884 −0.428196415 1.736697609 1.763666611 6.22003894 1.160887388
66 SFTPD −0.864741869 0.18743199 0.85637944 −0.169964303 −1.953003098 0.810342471 −3.059355397
67 SILV −0.445152541 0.023561013 −0.906600385 1.276911935 0.671250555 −0.419750375 0.364343543
68 SIT1 0.404484797 −0.245179126 −0.698370063 −0.228329389 −0.700949754 1.862224753 0.160038551
69 SLC26A4 −0.242562803 −0.736362299 4.542023228 0.049925703 −1.244893781 −1.393279547 −0.773795962
70 SLC3A1 −0.811991759 1.503175223 0.81337641 0.685335393 −0.830561409 0.02998455 4.234950007
71 SLC45A3 0.50351858 −0.257430773 7.798304016 0.259038112 0.761165507 1.057588337 1.253011444
72 SOX17 −0.436621464 6.590489885 0.258734252 0.190327448 −0.537074063 4.1051736 −0.639722737
73 SPDEF 2.428928058 4.878948809 9.396200085 5.896810841 0.785079512 −1.183433789 3.426972043
74 SPINT2 0.199553149 0.669655069 0.202002754 0.813009857 0.747461864 −0.67968284 0.224352664
75 TCEAL5 −0.580215651 0.283236555 1.291973501 −0.314854034 −0.207630749 1.239038231 −1.642797685
76 TG −1.043977276 −1.109552249 1.299196722 −1.24118987 −0.588195791 −0.92548737 0.175292614
77 TMEM27 0.248102359 0.030206129 0.611157159 −0.754191803 −0.750158332 1.251273614 −1.235570332
78 TP63 1.282401189 −1.071684619 3.203409462 −1.13139572 6.245170227 −0.345850774 −2.304303836
79 TRPS1 3.153356243 1.248382334 0.226726961 0.003939999 −2.35803211 −1.170459499 −1.794640325
80 TSPAN8 −1.77985797 0.74692949 6.457395474 4.704465483 −0.385074926 −0.435573001 8.97062099
81 UPK3B 0.43781618 1.273683944 −0.408516441 2.151943812 7.898733759 −0.735658973 −0.139912799
82 VTN −1.022686104 −1.195484585 −0.273276881 −1.471670906 0.728314508 3.181634913 −0.582351194
83 ZNF578 −0.128482728 −0.291752675 0.60523606 −2.466912276 −1.115858745 5.469695435 −1.515460348
84 ZNF695 2.994868611 2.77909196 −1.075168344 2.286781576 1.644923103 3.609635955 1.952089723

# Gene Symbol C20 (SARC/MESO) C21 (KIRK/KICH/KIRP) C22 (Liver) C24 (BRCA/Luminal) C25 (THYM) C26 (SKCM/UVM) C28 (THCA)
1 A1BG 0.070192984 −0.940840591 8.703413543 0.936369381 −1.706237879 1.914831468 0.703529067
2 ACPP −2.032541038 −2.305500467 −4.40459131 −1.024439493 −2.066553063 −3.741468299 0.699426376
3 APC2 0.552360013 −0.679752407 −0.062564922 −0.527174725 −0.177801985 1.332087657 −1.406046015
4 AQP5 −1.419094969 −2.627502329 −2.379706191 −0.280015058 0.67166607 −1.438430189 3.948999547
5 ASGR1 0.462676231 −0.369411855 9.174063668 −1.440166184 −0.048667321 −1.316415138 1.003468362
6 BCAN −0.209671799 0.222422862 0.598814814 −0.685014857 −2.971866954 7.441867064 −2.689173959
7 BCL2L15 −0.946597411 0.308006633 −0.847993525 −0.339415371 0.44529515 −1.396188006 −1.219332417
8 C1orf172 −7.714313141 −2.053892696 −1.961055619 0.097896284 −1.121520119 −5.883858505 0.420348593
9 CAPS −0.499052174 −1.362494555 −0.691670099 0.575620059 0.228128949 −0.326249828 1.883752499
10 CBLC −8.155167351 −3.648064102 0.283666077 0.197383744 −8.155167351 −8.155167351 −4.683902455
11 CDH1 −7.229104847 −1.902809546 −0.803408059 0.453092886 −1.794492653 −0.380697254 1.07759058
12 CEACAM5 −4.263447619 −4.263447619 −4.263447619 3.43408992 −4.263447619 −3.826859325 −3.766911393
13 CEACAM6 −6.003097658 −6.22642964 −6.020466183 2.793954656 −5.749231419 −6.088214272 −1.029763418
14 CHMP4C −6.433893852 −0.448980057 −0.477565154 0.166274869 −3.416609246 −6.4170014 0.423295456
15 CLCA2 1.090083657 −1.916425099 −2.448293094 2.423949605 0.511588318 −0.16703996 −0.923829634
16 CLDN4 −6.877306455 −0.800472122 −4.850938463 −0.062787487 −6.750496629 −8.258170243 1.193098579
17 COL11A2 0.407171494 −0.522028403 −1.162577547 −1.391989324 2.822133446 4.012255459 0.998530701
18 CRB3 −6.899881762 0.409378715 0.082599609 −0.202979549 −3.405323461 −5.927493049 0.442041688
19 CTSE −2.104241431 0.366273475 −1.243260786 −2.250040026 −2.122740123 −1.826195711 5.586773138
20 CUBN 0.713604848 7.281399905 −1.384529654 −0.145536797 1.410829201 3.566340978 3.20938295
21 CYP2B7P1 −0.654635147 −1.447176918 5.139802834 6.622628205 −1.379336644 −1.212199304 −0.964049728
22 DLX5 0.462490967 0.017699215 −3.265329736 −0.745686562 −2.117303943 −0.829299281 −0.068606829
23 DMGDH 0.028856242 6.060504719 6.702437958 −0.060898439 −0.46134749 −1.489004211 2.38214753
24 ELF3 −7.606527581 −0.555341599 −0.334664939 0.302735788 −5.416983065 −8.778305188 −1.61788093
25 EMX2 2.802771238 7.074822861 −3.08024833 1.035384246 −1.557188268 −0.213597469 −1.488833995
26 EMX2OS 2.520378612 7.353718721 −3.593640608 1.294980889 −1.893334068 −0.333268626 −1.117234841
27 EPCAM −8.94327619 −1.582907817 −6.179887918 0.150227096 −3.89203662 −9.927578427 1.469515171
28 ERBB3 −7.397006842 0.362674201 0.676372657 1.354976731 −7.751611174 1.393383686 −0.864197462
29 ESR1 −0.204263771 0.353961842 −0.001186471 7.045755877 −0.917732083 −0.567345455 1.658056187
30 FAM171A2 0.843521911 −1.147803639 −1.94488491 −0.67709113 −0.913518602 −0.123166538 2.177241264
31 FOLH1 −0.285087483 2.107020634 2.114399746 −0.44262496 −3.693628465 −1.90866141 −0.126630866
32 GABRP −2.303619388 −1.113604356 −2.60527593 2.880519078 −2.969804846 −0.477412964 −2.451544871
33 GATA3 −0.479072713 −0.694342037 −2.42000097 7.039537351 2.449239776 −2.642455615 1.305924613
34 GCNT3 −3.369932214 3.012084991 0.745283917 −3.199989256 0.677330491 −4.065793513 −0.452261295
35 GPC2 −0.655639972 −1.345174813 −2.097850016 −0.520643983 2.326152106 −0.721335705 −2.085120953
36 GPR35 −0.15384199 0.320806564 −0.470378252 −0.642499168 1.115849375 −1.434965852 −1.068214302
37 GPRC5A −1.978637673 −3.708321529 −7.617162596 1.869297678 −7.799087331 −2.526958538 1.836642695
38 GRHL2 −8.760984555 −8.300732853 −8.41380076 0.977699908 −2.733003314 −9.346544321 −0.260490678
39 HNF1A −0.460004648 5.003097445 5.740038488 −0.419983585 −0.254057708 −0.06916107 −1.339783101
40 HPX 0.428825996 −0.644318213 12.34302566 4.13452407 −0.715914255 −1.121106629 −0.927626931
41 IYD −3.48501457 1.83359512 5.111462206 0.63980155 −3.48501457 −3.48501457 9.864317761
42 KRT18 −5.393634178 −0.214336662 0.659378292 0.645722621 −3.361767124 −6.292360598 0.141599862
43 KRT6A −2.750394666 −3.593741642 −3.978012535 −1.007177282 1.775950277 −1.756754105 −1.199299426
44 KRT6B −2.809563532 −3.679513879 −3.007677051 3.038039173 1.074967474 −0.771604139 −2.767720312
45 KRT81 −0.708340478 −0.549017438 −1.474487037 1.719278698 0.293233459 −2.832035518 1.29934714
46 KRT8 −6.585518291 −0.445328958 0.292357177 0.583940663 −1.211875599 −7.214684995 0.11941998
47 LAD1 −6.366889981 −3.487703983 −0.03314457 −0.937855879 −2.032988069 −4.586601984 −0.415235244
48 LCK −0.077998068 0.355214635 −0.747932263 −0.312039768 5.221543649 −1.853755715 −0.526016141
49 LGALS4 −0.883963009 1.366073433 9.773901918 −1.137042557 0.240641913 −1.598817352 −0.203473823
50 LYPD1 0.387124296 −0.528951531 0.547288234 −0.876913408 0.13379984 0.661428164 −0.897772519
51 MARVELD3 −5.995262907 −0.510050567 −0.675659361 0.322410532 −2.735366823 −6.383666653 −0.153513769
52 MEG3 2.478280919 −2.166933932 −0.320863598 0.160100014 2.853324939 −2.730675804 −3.166451005
53 MUC13 −0.981854002 1.997077234 6.771194467 −1.618112428 −1.772933448 −2.419188288 −1.948580943
54 MUC16 −1.427646768 −1.678538069 −2.940429889 1.60919 −1.668765178 −2.940429889 1.33450999
55 MUC4 −3.199831606 −0.551376679 −2.460727016 −2.202007274 −1.442945771 −4.022799483 −1.386650033
56 MYCN −1.115153774 −1.072643704 −0.572987836 −0.563222061 2.256763699 −1.121103788 0.383429385
57 NAPSA −0.441224683 5.357444831 −0.330311961 −0.680826272 1.54889432 −1.10948967 3.230552348
58 NKX3-1 0.34424304 −0.695607881 −0.636037275 1.380598103 −0.358807355 −1.225980043 −1.347772078
59 NPR1 3.611141372 2.936608674 0.739290647 −0.02018339 −0.04911732 −1.692369146 0.560520251
60 PAX8 −0.438408776 5.138704009 −0.439224734 −1.115572722 −0.742610195 −1.098989328 7.330352514
61 PRAME −2.28601297 3.648546027 −1.904681098 −1.204180981 3.601719603 9.250663624 −3.172944285
62 PSCA −1.664873567 −2.372135296 −2.32137749 1.204841172 −1.580925941 0.391748243 −3.189626014
63 PVRL4 −4.983285402 −6.308620694 −6.330798801 1.140190077 −3.872718208 −6.359272034 0.126583059
64 S100P −4.27172475 −4.04897439 0.499095434 2.122747189 −4.765549062 −4.663901764 −4.574712189
65 SALL4 −0.701854731 −2.670674063 −1.068969101 0.308484856 −1.113633609 2.219699773 0.019104844
66 SFTPD −2.015504097 −0.059979562 −1.883528068 −0.664564777 −1.784683096 −1.174451747 2.77077955
67 SILV −0.404015329 0.403352323 1.696442392 0.346443524 −1.809580739 12.3265592 −0.064184971
68 SIT1 0.560522874 0.56641542 −0.448541127 0.083905789 6.077538122 −1.010014644 −0.588029822
69 SLC26A4 0.245073029 0.600404783 −1.143209401 0.077251179 0.024544315 0.276126125 7.424523266
70 SLC3A1 −0.360884127 10.50265557 1.764194349 −0.750224836 −0.620498156 −0.303626634 −0.228977476
71 SLC45A3 −1.119761006 0.021163292 0.733358084 −0.480014382 −2.00381626 −1.453023157 −2.254242818
72 SOX17 0.817442439 0.837037779 0.345260582 0.26054328 −0.203888143 −1.596069152 0.733884158
73 SPDEF 2.697057213 −2.838870874 −2.854924213 7.609282727 −3.084262875 −2.745456716 −2.779572185
74 SPINT2 −3.920290094 −0.542550342 −5.841694183 0.163004523 −0.712680222 −5.966064445 0.738611694
75 TCEAL5 0.881390897 −0.79947069 −2.066499307 0.638825329 0.319737925 −0.114766242 0.788194285
76 TG 0.984073587 0.60666684 −1.221557274 −1.111088643 1.124956459 −0.727984197 15.10003804
77 TMEM27 −1.803900196 6.875082323 0.633259141 −0.517663186 −0.292695792 −0.470425257 1.189669219
78 TP63 −0.665339268 −2.068568045 −1.9563835 1.974020464 5.386578658 −2.022259539 −0.159043543
79 TRPS1 0.01584306 −0.129451899 −2.232072347 4.425365059 −1.181551745 −1.947966145 −0.277029596
80 TSPAN8 −1.889160765 −1.739590705 5.986779156 −2.397233332 −1.9296585 −4.08731649 −2.677540765
81 UPK3B −0.660628051 −0.514196839 −0.477166533 −0.755620892 −0.069584588 −1.061067781 −0.712380093
82 VTN 2.923525274 −0.52452731 14.37513361 −0.702306293 −0.029929938 1.884788007 −0.627880077
83 ZNF578 1.22614863 0.540308545 −1.87596215 −0.083197077 0.429479376 −0.209295458 2.232739055
84 ZNF695 −0.449132017 −2.051634999 −3.038221841 0.76346101 0.74872153 −0.970490477 −1.99162398

- In one embodiment, a subset of one or more of the 84 genes of Table 1 can be used to classify or determine the COCA subtype of a tumor sample. In one embodiment, all 84 genes of Table 1 can be used to classify or determine the COCA subtype of a tumor sample. In some embodiments, the up-regulation of a classifier biomarker (e.g., expression is increased) can refer to an expression value that is positive (i.e., higher than zero) relative to a reference or control as provided herein. In some embodiments, the down-regulation of a classifier biomarker (e.g., expression is decreased) can refer to an expression value that is negative (i.e., lower than zero) relative to a reference or control as provided herein. In some embodiments, a classifier biomarker may have no specific effect on a certain COCA subtype when the expression level equals zero.
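- The sign convention described above can be illustrated with a minimal sketch. The three numeric values below are copied from Table 1; the function name `regulation`, the dictionary layout, and the zero-valued entry are assumptions introduced only for illustration and are not part of the disclosed method.

```python
def regulation(value: float) -> str:
    """Interpret a Table 1 value by its sign (illustrative helper, not from the source)."""
    if value > 0:
        return "up-regulated"
    if value < 0:
        return "down-regulated"
    return "no specific effect"


# (gene symbol, COCA subtype) -> value; the last entry is hypothetical.
examples = {
    ("TG", "C28 (THCA)"): 15.10003804,
    ("EPCAM", "C26 (SKCM/UVM)"): -9.927578427,
    ("ESR1", "C24 (BRCA/Luminal)"): 7.045755877,
    ("HYPOTHETICAL_GENE", "C22 (Liver)"): 0.0,
}

labels = {key: regulation(value) for key, value in examples.items()}

for (gene, subtype), label in labels.items():
    print(f"{gene} in {subtype}: {label}")
```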
- In some embodiments, determining integrated, pan-cancer COCA subtypes can further include measuring the expression of at least one biomarker from an additional set of biomarker classifiers. In one embodiment, an additional set of biomarker classifiers can include gene signatures related to cell proliferation. The gene signatures related to cell proliferation for use in the methods provided herein can include the 11 gene signature comprising BIRC5, CCNB1, CDC20, CDCA1, CEP55, KNTC2, MKI67, PTTG1, RRM2, TYMS, and UBE2C found in Martin M. et al., Breast Cancer Res Treat, 138: 457-466 (2013), the 18 gene signature found in US 20160115551 and/or the 26 gene signature found in 62/789,668 filed Jan. 8, 2019, each of which is herein incorporated by reference. In one embodiment, an additional set of biomarker classifiers can include a 5 gene signature comprising tumor driver genes such as TP53 and RB1, and receptor tyrosine kinases including FGFR2, FGFR3, and ERBB2. In one embodiment, the 5 gene signature is related to the signature of tumor driver genes. In one embodiment, the biomarker classifiers can also include immune cell signatures that are known in the art (Bindea G. et al., Immunity, 39(4): 782-95 (2013); Faruki H. et al., JTO, 12(6): 943-953 (2017); Charoentong P. et al., Cell Reports, 18: 248-262 (2017); Thorsson V. et al., Immunity, 48(4): 812-830 (2018); and/or WO2017/201165 and WO2017/201164, each of which is herein incorporated by reference). In one embodiment, an additional set of biomarker classifiers can include assessing tumor purity using ABSOLUTE estimates derived from the TCGA supplementary data. In one embodiment, the additional set of biomarkers can be gene signatures known in the art for specific types of cancer. In one embodiment, the cancer is lung cancer and the gene signature is selected from the gene signatures found in WO2017/201165, WO2017/201164, US20170114416 or U.S. Pat. No. 8,822,153, each of which is herein incorporated by reference in its entirety. In one embodiment, the cancer is head and neck squamous cell carcinoma (HNSCC) and the gene signature is selected from the gene signatures found in PCT/US18/45522 or PCT/US18/48862, each of which is herein incorporated by reference in its entirety. In one embodiment, the cancer is breast cancer and the gene signature is the PAM50 sub-typer found in Parker J S et al., (2009) Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27:1160-1167, which is herein incorporated by reference in its entirety. In one embodiment, the cancer is bladder cancer and the gene signature can include the bladder cancer biomarker signatures described in Gene Expression Omnibus (GEO) dataset GSE87304 (Seiler R. et al., Eur Urol, 72(4):544-554 (2017)) or GEO dataset GSE32894 (Sjodahl G. et al., Clin Cancer Res, 18(12):3377-86 (2012)), each of which is herein incorporated by reference. In one embodiment, the cancer is bladder cancer (e.g., MIBC) and the gene signature can include the bladder cancer biomarker signatures described in 62/629,975 filed Feb. 13, 2018, which is herein incorporated by reference. In one embodiment, the cancer is bladder cancer (e.g., MIBC) and the gene signature can include the bladder cancer biomarker signature described in The Cancer Genome Atlas Research Network, Comprehensive molecular characterization of urothelial bladder carcinoma, Nature, 507: 315-322 (2014), or Robertson, A G, et al., Cell, 171(3): 540-556 (2017), each of which is herein incorporated by reference.
- In some embodiments, determining integrated, pan-cancer COCA subtypes can further include assessing tumor mutation burden (TMB) and/or TMB rate. In one embodiment, the TMB value and/or rate can be calculated from RNA (e.g., via transcriptome profiling or RNA sequencing) as provided in U.S. 62/771,702 filed Nov. 27, 2018 and U.S. 62/743,257 filed Oct. 9, 2018, each of which is incorporated by reference herein.
- As provided herein, the expression levels of the at least one of the classifier biomarkers (such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein) determined, measured or detected from the sample obtained from the subject can then be compared to reference expression levels of the at least one of the classifier biomarkers (such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein) from at least one sample training set. The at least one sample training set can comprise (i) expression levels of the at least one biomarker from a sample that overexpresses the at least one biomarker or (ii) expression levels from a reference sample for a specific COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)). The sample obtained from the subject can then be classified as a specific COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) based on the results of the comparing step. 
In one embodiment, the comparing step can comprise applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample obtained from the subject and the expression data from the at least one training set(s); and classifying the sample obtained from the subject as a specific COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) based on the results of the statistical algorithm. The statistical algorithm can be any statistical algorithm found in the art and/or provided herein.
- In one embodiment, the statistical algorithm for the comparing step can be an algorithm that comprises determining a correlation between the expression data obtained from the tumor sample obtained from the subject (i.e., test sample) and centroids constructed from the expression levels or profiles measured or detected for the at least one classifier biomarker (such as the classifier biomarkers of Table 1 or subsets thereof or any additional set of biomarker classifiers or subsets thereof as disclosed herein) from the at least one training set. The COCA subtype for the tumor sample (i.e., test sample) can then be assigned by finding the nearest of the centroids constructed from the expression data from the at least one training set, using any distance measure, e.g., Euclidean distance or correlation. The centroids can be constructed using any method known in the art for generating centroids such as, for example, those found in Mullins et al. (2007) Clin Chem. 53(7):1273-9 or Dabney (2005) Bioinformatics 21(22):4148-4154. The COCA subtype can then be assigned to the tumor sample obtained from the subject based on the use of a classification to the nearest centroid (CLaNC) algorithm as applied to the expression data generated or measured from the tumor sample and the centroid(s) constructed for the at least one training set. The CLaNC algorithm for use in the methods, compositions and kits provided herein can be the CLaNC algorithm implemented by the CLaNC software found in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123 or equivalents or derivatives thereof.
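- The nearest-centroid assignment described above can be sketched as follows. This is a plain nearest-centroid illustration and not the ClaNC software itself, which additionally performs gene selection. The function names `build_centroids` and `classify`, the four-gene panel, and all expression values are hypothetical; centroids are assumed here to be simple per-subtype means of the training profiles.

```python
import numpy as np


def build_centroids(train_expr, train_labels):
    """Return (subtype names, centroid matrix); each centroid is the mean
    training profile of one subtype (rows = samples, columns = biomarkers)."""
    names = sorted(set(train_labels))
    label_arr = np.array(train_labels)
    centroids = np.vstack(
        [train_expr[label_arr == name].mean(axis=0) for name in names]
    )
    return names, centroids


def classify(sample, names, centroids, metric="euclidean"):
    """Assign the subtype whose centroid is nearest to the test sample."""
    if metric == "euclidean":
        dist = np.linalg.norm(centroids - sample, axis=1)
    elif metric == "correlation":
        # Correlation distance: 1 minus the Pearson correlation coefficient.
        dist = np.array([1.0 - np.corrcoef(sample, c)[0, 1] for c in centroids])
    else:
        raise ValueError(f"unknown metric: {metric}")
    return names[int(np.argmin(dist))]


# Hypothetical training set; columns stand for ESR1, GATA3, TG, SILV.
train_expr = np.array([
    [7.0, 6.8, -1.0, 0.2],    # C24 (BRCA/Luminal)
    [6.5, 7.2, -1.2, 0.5],    # C24 (BRCA/Luminal)
    [1.5, 1.2, 14.8, 0.0],    # C28 (THCA)
    [1.8, 1.4, 15.3, -0.2],   # C28 (THCA)
    [-0.5, -2.5, -0.8, 12.0], # C26 (SKCM/UVM)
    [-0.7, -2.8, -0.6, 12.6], # C26 (SKCM/UVM)
])
train_labels = ["C24", "C24", "C28", "C28", "C26", "C26"]

subtype_names, centroids = build_centroids(train_expr, train_labels)

test_sample = np.array([6.0, 6.5, -0.5, 1.0])  # hypothetical test profile
call_euclidean = classify(test_sample, subtype_names, centroids, "euclidean")
call_correlation = classify(test_sample, subtype_names, centroids, "correlation")
print(call_euclidean, call_correlation)
```

- In this toy example both distance measures select the same centroid; with real data the two measures can disagree, so the choice of distance measure is itself a modeling decision.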
- The methods and compositions provided herein allow for the differentiation or diagnosis of a sample obtained from a subject as being a specific COCA subtype. The COCA subtype can be one of 21 integrated, pan-cancer COCA subtypes of cancer selected from C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA). The differentiation, detection or diagnosis of the sample obtained from the subject as being a COCA subtype as provided herein can be accomplished by measuring or detecting the presence and/or level of one or more classifier biomarkers from a publicly available pan-cancer dataset and/or a pan-cancer dataset provided herein (e.g., Table 1). The measuring can be at the nucleic acid or protein level.
- A sample for use in any of the methods and compositions provided herein can be a tumor sample obtained from a subject or patient suffering from or suspected of suffering from a type of cancer. The type of cancer can be any type of cancer provided herein and/or known in the art. The tumor sample used for the detection or differentiation methods described herein can be a sample previously determined or diagnosed as a type of cancer sample using traditional tissue-of-origin methods. The previous diagnosis can be based on a histological analysis. The histological analysis can be performed by one or more pathologists.
- The sample (e.g., tumor sample) can be any sample (e.g., tumor) isolated from the subject or patient. In one embodiment, the subject or patient is a human subject or patient. For example, in one embodiment, the analysis is performed on biopsies that are embedded in paraffin wax. In one embodiment, the sample can be a fresh frozen tissue sample. In another embodiment, the sample can be a bodily fluid obtained from the patient. The bodily fluid can be blood or fractions thereof (e.g., serum, plasma), urine, saliva, sputum or cerebrospinal fluid (CSF). The sample can contain cellular as well as extracellular sources of nucleic acid or protein for use in the methods provided herein. The extracellular sources can be cell-free DNA and/or exosomes. In one embodiment, the sample can be a cell pellet or a wash. This aspect of the methods provided herein provides a means to improve current diagnostics by accurately identifying the major histological types, even from small biopsies. The methods provided herein, including the RT-PCR methods, are sensitive, precise and have multi-analyte capability for use with paraffin-embedded samples. See, for example, Cronin et al. (2004) Am. J. Pathol. 164(1):35-42, herein incorporated by reference.
- Formalin fixation and tissue embedding in paraffin wax is a universal approach for tissue processing prior to light microscopic evaluation. A major advantage afforded by formalin-fixed paraffin-embedded (FFPE) specimens is the preservation of cellular and architectural morphologic detail in tissue sections. (Fox et al. (1985) J Histochem Cytochem 33:845-853). The standard buffered formalin fixative in which biopsy specimens are processed is typically an aqueous solution containing 37% formaldehyde and 10-15% methyl alcohol. Formaldehyde is a highly reactive dipolar compound that results in the formation of protein-nucleic acid and protein-protein crosslinks in vitro (Clark et al. (1986) J Histochem Cytochem 34:1509-1512; McGhee and von Hippel (1975) Biochemistry 14:1281-1296, each incorporated by reference herein).
- In one embodiment, the sample used herein is obtained from an individual, and comprises formalin-fixed paraffin-embedded (FFPE) tissue. However, other tissue and sample types are amenable for use herein. In one embodiment, the other tissue and sample types can be fresh frozen tissue, wash fluids, cell pellets, or the like. In one embodiment, the sample can be a bodily fluid obtained from the individual. The bodily fluid can be blood or fractions thereof (e.g., serum, plasma), urine, sputum, saliva or cerebrospinal fluid (CSF). A biomarker nucleic acid as provided herein can be extracted from a cell, can be cell free or extracted from an extracellular vesicular entity such as an exosome.
- Methods are known in the art for the isolation of nucleic acid (e.g., RNA) from FFPE tissue. In one embodiment, total RNA can be isolated from FFPE tissues as described by Bibikova et al. (2004) American Journal of Pathology 165:1799-1807, herein incorporated by reference. Likewise, the High Pure RNA Paraffin Kit (Roche) can be used. Paraffin is removed by xylene extraction followed by ethanol wash. RNA can be isolated from sectioned tissue blocks using the MasterPure Purification kit (Epicentre, Madison, Wis.); a DNase I treatment step is included. RNA can be extracted from frozen samples using Trizol reagent according to the supplier's instructions (Invitrogen Life Technologies, Carlsbad, Calif.). Samples with measurable residual genomic DNA can be resubjected to DNase I treatment and assayed for DNA contamination. All purification, DNase treatment, and other steps can be performed according to the manufacturer's protocol. After total RNA isolation, samples can be stored at −80° C. until use.
- General methods for mRNA extraction are well known in the art and are disclosed in standard textbooks of molecular biology, including Ausubel et al., ed., Current Protocols in Molecular Biology, John Wiley & Sons, New York 1987-1999. Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker (Lab Invest. 56:A67, 1987) and De Andres et al. (Biotechniques 18:42-44, 1995). In particular, RNA isolation can be performed using a purification kit, a buffer set and protease from commercial manufacturers, such as Qiagen (Valencia, Calif.), according to the manufacturer's instructions. For example, total RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns. Other commercially available RNA isolation kits include the MasterPure™ Complete DNA and RNA Purification Kit (Epicentre, Madison, Wis.) and the Paraffin Block RNA Isolation Kit (Ambion, Austin, Tex.). Total RNA from tissue samples can be isolated, for example, using RNA Stat-60 (Tel-Test, Friendswood, Tex.). RNA prepared from a tumor can be isolated, for example, by cesium chloride density gradient centrifugation. Additionally, large numbers of tissue samples can readily be processed using techniques well known to those of skill in the art, such as, for example, the single-step RNA isolation process of Chomczynski (U.S. Pat. No. 4,843,155, incorporated by reference in its entirety for all purposes).
- In one embodiment, a sample comprises cells harvested from a tumor sample. Cells can be harvested from a biological sample using standard techniques known in the art. For example, in one embodiment, cells are harvested by centrifuging a cell sample and resuspending the pelleted cells. The cells can be resuspended in a buffered solution such as phosphate-buffered saline (PBS). After centrifuging the cell suspension to obtain a cell pellet, the cells can be lysed to extract nucleic acid, e.g., messenger RNA. All samples obtained from a subject, including those subjected to any sort of further processing, are considered to be obtained from the subject.
- The sample, in one embodiment, is further processed before the detection of the biomarker levels of the combination of biomarkers set forth herein. For example, mRNA in a cell or tissue sample can be separated from other components of the sample. The sample can be concentrated and/or purified to isolate mRNA in its non-natural state, as the mRNA is not in its natural environment. For example, studies have indicated that the higher order structure of mRNA in vivo differs from the in vitro structure of the same sequence (see, e.g., Rouskin et al. (2014) Nature 505, pp. 701-705, incorporated herein in its entirety for all purposes).
- mRNA from the sample, in one embodiment, is hybridized to a synthetic DNA probe, which in some embodiments includes a detection moiety (e.g., detectable label, capture sequence, barcode reporting sequence). Accordingly, in these embodiments, a non-natural mRNA-cDNA complex is ultimately made and used for detection of the biomarker. In another embodiment, mRNA from the sample is directly labeled with a detectable label, e.g., a fluorophore. In a further embodiment, the non-natural labeled-mRNA molecule is hybridized to a cDNA probe and the complex is detected.
- In one embodiment, once the mRNA is obtained from a sample, it is converted to complementary DNA (cDNA) prior to the hybridization reaction or is used in a hybridization reaction together with one or more cDNA probes. cDNA does not exist in vivo and therefore is a non-natural molecule. Furthermore, cDNA-mRNA hybrids are synthetic and do not exist in vivo. Besides cDNA not existing in vivo, cDNA is necessarily different from mRNA, as it includes deoxyribonucleic acid and not ribonucleic acid. The cDNA is then amplified, for example, by the polymerase chain reaction (PCR) or other amplification method known to those of ordinary skill in the art. For example, other amplification methods that may be employed include the ligase chain reaction (LCR) (Wu and Wallace, Genomics, 4:560 (1989); Landegren et al., Science, 241:1077 (1988), each incorporated by reference in its entirety for all purposes), transcription amplification (Kwoh et al., Proc. Natl. Acad. Sci. USA, 86:1173 (1989), incorporated by reference in its entirety for all purposes), self-sustained sequence replication (Guatelli et al., Proc. Natl. Acad. Sci. USA, 87:1874 (1990), incorporated by reference in its entirety for all purposes), and nucleic acid sequence-based amplification (NASBA). Guidelines for selecting primers for PCR amplification are known to those of ordinary skill in the art. See, e.g., McPherson et al., PCR Basics: From Background to Bench, Springer-Verlag, 2000, incorporated by reference in its entirety for all purposes. The product of this amplification reaction, i.e., amplified cDNA, is also necessarily a non-natural product. First, as mentioned above, cDNA is a non-natural molecule. Second, in the case of PCR, the amplification process serves to create hundreds of millions of cDNA copies for every individual cDNA molecule of starting material. The number of copies generated is far removed from the number of copies of mRNA that are present in vivo.
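- The scale of PCR amplification noted above can be checked with a short calculation. The sketch below assumes the standard exponential model N = N0 × (1 + E)^n, where E is the per-cycle efficiency and n the number of cycles; the function name `copies_after_pcr`, the cycle counts, and the efficiency value are illustrative assumptions and are not taken from the source.

```python
def copies_after_pcr(start_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after PCR under the model N = N0 * (1 + E) ** n.

    efficiency = 1.0 corresponds to perfect doubling in every cycle.
    """
    return start_copies * (1.0 + efficiency) ** cycles


# One starting cDNA molecule, 28 cycles of ideal doubling (assumed cycle count).
ideal_28 = copies_after_pcr(1, 28)
print(f"28 ideal cycles: {ideal_28:,.0f} copies")

# The same model with a sub-ideal, assumed efficiency of 90% over 30 cycles.
realistic_30 = copies_after_pcr(1, 30, efficiency=0.9)
print(f"30 cycles at 90% efficiency: {realistic_30:,.0f} copies")
```

- Under these assumptions a single template exceeds one hundred million copies within roughly 27 to 30 cycles, which is consistent with the order of magnitude stated above.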
- In one embodiment, cDNA is amplified with primers that introduce an additional DNA sequence (e.g., adapter, reporter, capture sequence or moiety, barcode) onto the fragments (e.g., with the use of adapter-specific primers), or mRNA or cDNA biomarker sequences are hybridized directly to a cDNA probe comprising the additional sequence (e.g., adapter, reporter, capture sequence or moiety, barcode). Amplification and/or hybridization of mRNA to a cDNA probe therefore serves to create non-natural double stranded molecules from the non-natural single stranded cDNA, or the mRNA, by introducing additional sequences and forming non-natural hybrids. Further, as known to those of ordinary skill in the art, amplification procedures have error rates associated with them. Therefore, amplification introduces further modifications into the cDNA molecules. In one embodiment, during amplification with the adapter-specific primers, a detectable label, e.g., a fluorophore, is added to single strand cDNA molecules. Amplification therefore also serves to create DNA complexes that do not occur in nature, at least because (i) cDNA does not exist in vivo, (ii) adapter sequences are added to the ends of cDNA molecules to make DNA sequences that do not exist in vivo, (iii) the error rate associated with amplification further creates DNA sequences that do not exist in vivo, (iv) the disparate structure of the cDNA molecules as compared to what exists in nature, and (v) the chemical addition of a detectable label to the cDNA molecules.
- In some embodiments, the expression of a biomarker of interest is detected at the nucleic acid level via detection of non-natural cDNA molecules.
- The biomarkers described herein can include RNA comprising the entire or partial sequence of any of the nucleic acid sequences of interest, or their non-natural cDNA product, obtained synthetically in vitro in a reverse transcription reaction. The term “fragment” is intended to refer to a portion of the polynucleotide that generally comprises at least 10, 15, 20, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 800, 900, 1,000, 1,200, or 1,500 contiguous nucleotides, or up to the number of nucleotides present in a full-length biomarker polynucleotide disclosed herein. A fragment of a biomarker polynucleotide will generally encode at least 15, 25, 30, 50, 100, 150, 200, or 250 contiguous amino acids, or up to the total number of amino acids present in a full-length biomarker protein as provided herein.
- Isolated mRNA can be used in hybridization or amplification assays that include, but are not limited to, Southern or Northern analyses, PCR analyses, probe arrays, and NanoString assays. One method for the detection of mRNA levels involves contacting the isolated mRNA or synthesized cDNA with a nucleic acid molecule (probe) that can hybridize to the mRNA encoded by the gene being detected. The nucleic acid probe can be, for example, a cDNA, or a portion thereof, such as an oligonucleotide of at least 7, 15, 30, 50, 100, 250, or 500 nucleotides in length and sufficient to specifically hybridize under stringent conditions to the non-natural cDNA or mRNA biomarker provided herein.
- In one embodiment, the measuring or detecting step in any method provided herein is at the nucleic acid level by performing RNA-seq, a reverse transcriptase polymerase chain reaction (RT-PCR) or a hybridization assay with oligonucleotides that are substantially complementary to portions of cDNA molecules of the at least one classifier biomarker (such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein) under conditions suitable for RNA-seq, RT-PCR or hybridization and obtaining expression levels of the at least one classifier biomarkers based on the detecting step.
- In some embodiments, the method for COCA subtyping includes not only detecting expression levels of a classifier biomarker set in a sample obtained from a subject, but can further comprise detecting expression levels of said classifier biomarker set in one or more control or reference samples. The one or more control or reference samples can be selected from a normal or cancer-free sample, a cancer sample of a specific COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) or any combination thereof. In some embodiments, the detecting includes all of the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein at the nucleic acid level or protein level. In some embodiments, the detecting includes all of the classifier biomarkers of Table 1 at the nucleic acid level or protein level. In another embodiment, a single or a subset or a plurality of the classifier biomarkers of Table 1 are detected, for example, from about 1 to about 4, from about 4 to about 8, from about 8 to about 12, from about 12 to about 16, from about 16 to about 20, from about 20 to about 24, from about 24 to about 28, from about 28 to about 32, from about 32 to about 36, from about 36 to about 40, from about 40 to about 44, from about 44 to about 48, from about 48 to about 52, from about 52 to about 56, from about 56 to about 60, from about 60 to about 64, from about 64 to about 68, from about 68 to about 72, from about 72 to about 76, from about 76 to about 80 of the biomarkers in Table 1 are detected in a method to determine the COCA subtype. 
In another embodiment, each of the biomarkers from Table 1 is detected in a method to determine the COCA subtype. In another embodiment, any of 84 of the biomarkers from Table 1 are selected as the gene signatures for a specific COCA subtype. The detecting can be performed by any suitable technique including, but not limited to, RNA-seq, a reverse transcriptase polymerase chain reaction (RT-PCR), a microarray hybridization assay, or another hybridization assay, e.g., a NanoString assay, with primers and/or probes specific to the classifier biomarkers, and/or the like. In some cases, the primers useful for the amplification methods (e.g., RT-PCR or qRT-PCR) are any forward and reverse primers suitable for binding to a classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein.
- As explained above, in one embodiment, once the mRNA is obtained from a sample (e.g., from a subject suffering from or suspected of suffering from cancer or a control subject), it is converted to complementary DNA (cDNA) in a hybridization reaction. Conversion of the mRNA to cDNA can be performed with oligonucleotides or primers comprising sequence that is complementary to a portion of a specific mRNA. Conversion of the mRNA to cDNA can be performed with oligonucleotides or primers comprising random sequence. Conversion of the mRNA to cDNA can be performed with oligonucleotides or primers comprising sequence that is complementary to the poly(A) tail of an mRNA. cDNA does not exist in vivo and therefore is a non-natural molecule. In a further embodiment, the cDNA is then amplified, for example, by the polymerase chain reaction (PCR) or other amplification method known to those of ordinary skill in the art. PCR can be performed with the forward and/or reverse primers comprising sequence complementary to at least a portion of a classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein. The product of this amplification reaction, i.e., amplified cDNA, is necessarily a non-natural product. First, as mentioned above, cDNA is a non-natural molecule. Second, in the case of PCR, the amplification process serves to create hundreds of millions of cDNA copies for every individual cDNA molecule of starting material. The number of copies generated is far removed from the number of copies of mRNA that are present in vivo.
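- By way of non-limiting illustration only, the scale of amplification referred to above can be checked with simple arithmetic. The following Python sketch assumes an idealized reaction in which a fixed fraction of templates is duplicated in every cycle; the function name `copies_after_cycles`, the `efficiency` parameter, and the cycle counts are hypothetical and are not taken from the disclosure.

```python
def copies_after_cycles(start_copies, cycles, efficiency=1.0):
    """Idealized copy number after a given number of PCR cycles.

    efficiency is the assumed fraction of templates duplicated per cycle
    (1.0 means perfect doubling); real reactions fall below this.
    """
    return start_copies * (1.0 + efficiency) ** cycles


# One starting cDNA molecule after 28 and 30 cycles of perfect doubling.
copies_28 = copies_after_cycles(1, 28)
copies_30 = copies_after_cycles(1, 30)
print(f"28 cycles: {copies_28:.0f} copies; 30 cycles: {copies_30:.0f} copies")
```

- Under these assumptions, 28 cycles already yield about 2.7×10⁸ copies per starting molecule, which is consistent with the order of magnitude stated above.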
- In one embodiment, cDNA is amplified with primers that introduce an additional DNA sequence (adapter sequence) onto the fragments (with the use of adapter-specific primers). The adaptor sequence can be a tail, wherein the tail sequence is not complementary to the cDNA. For example, the forward and/or reverse primers comprising sequence complementary to at least a portion of a classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers as disclosed herein can comprise tail sequence. Amplification therefore serves to create non-natural double stranded molecules from the non-natural single stranded cDNA, by introducing barcode, adapter and/or reporter sequences onto the already non-natural cDNA. In one embodiment, during amplification with the adapter-specific primers, a detectable label, e.g., a fluorophore, is added to single strand cDNA molecules. Amplification therefore also serves to create DNA complexes that do not occur in nature, at least because (i) cDNA does not exist in vivo, (ii) adapter sequences are added to the ends of cDNA molecules to make DNA sequences that do not exist in vivo, (iii) the error rate associated with amplification further creates DNA sequences that do not exist in vivo, (iv) the disparate structure of the cDNA molecules as compared to what exists in nature, and (v) the chemical addition of a detectable label to the cDNA molecules.
- In one embodiment, the synthesized cDNA (for example, amplified cDNA) is immobilized on a solid surface via hybridization with a probe, e.g., via a microarray. In another embodiment, cDNA products are detected via real-time polymerase chain reaction (PCR) via the introduction of fluorescent probes that hybridize with the cDNA products. For example, in one embodiment, biomarker detection is assessed by quantitative fluorogenic RT-PCR (e.g., with TaqMan® probes). For PCR analysis, well known methods are available in the art for the determination of primer sequences for use in the analysis.
- In one embodiment, the measuring or detecting step in any method provided herein is performed via a hybridization assay that comprises probing the levels of at least one of the classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein, at the nucleic acid level, in a tumor sample obtained from the patient. The probing step, in one embodiment, comprises mixing the sample with one or more oligonucleotides that are substantially complementary to portions of cDNA molecules of the at least one classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein under conditions suitable for hybridization of the one or more oligonucleotides to their complements or substantial complements; detecting whether hybridization occurs between the one or more oligonucleotides to their complements or substantial complements; and obtaining hybridization values of the at least one classifier biomarkers based on the detecting step. The hybridization values of the at least one classifier biomarkers are then compared to reference hybridization value(s) from at least one sample training set. The tumor sample is classified, for example, as a COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) based on the results of the comparing step. In one embodiment, the hybridization values of the tumor sample can be compared to centroid(s) constructed from the hybridization values of the training set.
- In one embodiment, the hybridization reaction utilized in methods provided herein employs a capture probe and/or a reporter probe. For example, the hybridization probe is a probe derivatized to a solid surface such as a bead, glass or silicon substrate. In another embodiment, the capture probe is present in solution and mixed with the patient's sample, followed by attachment of the hybridization product to a surface, e.g., via a biotin-avidin interaction (e.g., where biotin is a part of the capture probe and avidin is on the surface). The hybridization assay, in one embodiment, employs both a capture probe and a reporter probe. The reporter probe can hybridize to either the capture probe or the biomarker nucleic acid. Reporter probes, e.g., are then counted and detected to determine the level of biomarker(s) in the sample. The capture and/or reporter probe, in one embodiment, contains a detectable label, and/or a group that allows functionalization to a surface.
- For example, the nCounter gene analysis system (see, e.g., Geiss et al. (2008) Nat. Biotechnol. 26, pp. 317-325, incorporated by reference in its entirety for all purposes) is amenable for use with the methods provided herein.
- Hybridization assays described in U.S. Pat. Nos. 7,473,767 and 8,492,094, the disclosures of which are incorporated by reference in their entireties for all purposes, are amenable for use with the methods provided herein, i.e., to detect the biomarkers and biomarker combinations described herein.
- Biomarker levels may be monitored using a membrane blot (such as used in hybridization analysis such as Northern, Southern, dot, and the like), or microwells, sample tubes, gels, beads, or fibers (or any solid support comprising bound nucleic acids). See, for example, U.S. Pat. Nos. 5,770,722, 5,874,219, 5,744,305, 5,677,195 and 5,445,934, each incorporated by reference in their entireties.
- In one embodiment, microarrays are used to detect biomarker levels. Microarrays are particularly well suited for this purpose because of the reproducibility between different experiments. DNA microarrays provide one method for the simultaneous measurement of the expression levels of large numbers of genes. Each array consists of a reproducible pattern of capture probes attached to a solid support. Labeled RNA or DNA is hybridized to complementary probes on the array and then detected by laser scanning. Hybridization intensities for each probe on the array are determined and converted to a quantitative value representing relative gene expression levels. See, for example, U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135, 6,033,860, and 6,344,316, each incorporated by reference in their entireties. High-density oligonucleotide arrays are particularly useful for determining the gene expression profile for a large number of RNAs in a sample.
- Techniques for the synthesis of these arrays using mechanical synthesis methods are described in, for example, U.S. Pat. No. 5,384,261. Although a planar array surface is generally used, the array can be fabricated on a surface of virtually any shape or even a multiplicity of surfaces. Arrays can be nucleic acids (or peptides) on beads, gels, polymeric surfaces, fibers (such as fiber optics), glass, or any other appropriate substrate. See, for example, U.S. Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992, each incorporated by reference in their entireties. Arrays can be packaged in such a manner as to allow for diagnostics or other manipulation of an all-inclusive device. See, for example, U.S. Pat. Nos. 5,856,174 and 5,922,591, each incorporated by reference in their entireties.
- Serial analysis of gene expression (SAGE) in one embodiment is employed in the methods described herein. SAGE is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript. First, a short sequence tag (about 10-14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript. Then, many transcripts are linked together to form long serial molecules, that can be sequenced, revealing the identity of the multiple tags simultaneously. The expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. See, Velculescu et al. Science 270:484-87, 1995; Cell 88:243-51, 1997, incorporated by reference in its entirety.
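- By way of non-limiting illustration only, the tag-counting step of SAGE described above can be sketched in Python as follows. The sketch is deliberately simplified: it assumes that each tag is the fixed-length sequence immediately following an anchoring site in the sequenced serial molecule, and it ignores ditag structure and sequencing errors. The anchor sequence `CATG`, the tag length of 10, the tag-to-gene map, and all function names are assumptions made for illustration and are not part of the disclosure.

```python
from collections import Counter

ANCHOR = "CATG"  # anchoring-enzyme site assumed for this sketch
TAG_LEN = 10     # assumed tag length, within the "about 10-14 bp" range


def extract_tags(concatemer, anchor=ANCHOR, tag_len=TAG_LEN):
    """Return the tag_len bases that follow each anchor site."""
    tags = []
    pos = concatemer.find(anchor)
    while pos != -1:
        start = pos + len(anchor)
        tag = concatemer[start:start + tag_len]
        if len(tag) == tag_len:  # drop truncated tags at the read end
            tags.append(tag)
        pos = concatemer.find(anchor, start + tag_len)
    return tags


def tag_abundance(concatemers, tag_to_gene):
    """Count tags across reads and sum the counts per gene."""
    tag_counts = Counter()
    for read in concatemers:
        tag_counts.update(extract_tags(read))
    gene_counts = Counter()
    for tag, n in tag_counts.items():
        gene_counts[tag_to_gene.get(tag, "unassigned")] += n
    return gene_counts


# Hypothetical tags and gene names, for illustration only.
TAG_TO_GENE = {"A" * 10: "GENE_A", "C" * 10: "GENE_B"}
reads = [
    "CATG" + "A" * 10 + "CATG" + "C" * 10 + "CATG" + "A" * 10,
    "CATG" + "G" * 10,
]
gene_counts = tag_abundance(reads, TAG_TO_GENE)
print(dict(gene_counts))
```

- Tags that do not appear in the map are kept under `unassigned` rather than discarded, so that the total tag count remains available for any subsequent normalization.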
- In another embodiment, the measuring or detecting step in any method provided herein is performed via an amplification assay. The amplification assay can be coupled with a sequencing method. In one embodiment, a method of biomarker level analysis at the nucleic acid level as provided herein utilizes an amplification reaction coupled with a sequencing method such as, for example, RNAseq, next generation sequencing, and massively parallel signature sequencing (MPSS) as described by Brenner et al. (Nat. Biotech. 18:630-34, 2000, incorporated by reference in its entirety). MPSS is a sequencing approach that combines non-gel-based signature sequencing with in vitro cloning of millions of templates on separate 5 μm diameter microbeads. First, a microbead library of DNA templates is constructed by in vitro cloning. This is followed by the assembly of a planar array of the template-containing microbeads in a flow cell at a high density (typically greater than 3.0×10⁶ microbeads/cm²). The free ends of the cloned templates on each microbead are analyzed simultaneously, using a fluorescence-based signature sequencing method that does not require DNA fragment separation. This method has been shown to simultaneously and accurately provide, in a single operation, hundreds of thousands of gene signature sequences from a yeast cDNA library.
- The expression level values of the at least one classifier biomarkers obtained from the amplification and/or sequencing assay are then compared to reference expression level value(s) from at least one sample training set. The tumor sample is classified, for example, as a COCA subtype (e.g., C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA)) based on the results of the comparing step. In one embodiment, the expression level values of the tumor sample can be compared to centroid(s) constructed from the expression level values obtained from the training set.
- Another method of biomarker level analysis at the nucleic acid level for use in any method provided herein is the use of an amplification method such as, for example, RT-PCR or quantitative RT-PCR (qRT-PCR). Methods for determining the level of biomarker mRNA in a sample may involve the process of nucleic acid amplification, e.g., by RT-PCR (the experimental embodiment set forth in Mullis, 1987, U.S. Pat. No. 4,683,202), ligase chain reaction (Barany (1991) Proc. Natl. Acad. Sci. USA 88:189-193), self-sustained sequence replication (Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874-1878), transcriptional amplification system (Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86:1173-1177), Q-Beta Replicase (Lizardi et al. (1988) Bio/Technology 6:1197), rolling circle replication (Lizardi et al., U.S. Pat. No. 5,854,033) or any other nucleic acid amplification method, followed by the detection of the amplified molecules using techniques well known to those of skill in the art. Numerous different PCR or qRT-PCR protocols are known in the art and can be directly applied or adapted for use using the presently described compositions for the detection and/or quantification of expression of discriminative genes in a sample. See, for example, Fan et al. (2004) Genome Res. 14:878-885, herein incorporated by reference. Generally, in PCR, a target polynucleotide sequence is amplified by reaction with at least one oligonucleotide primer or pair of oligonucleotide primers. The primer(s) hybridize to a complementary region of the target nucleic acid and a DNA polymerase extends the primer(s) to amplify the target sequence. Under conditions sufficient to provide polymerase-based nucleic acid amplification products, a nucleic acid fragment of one size dominates the reaction products (the target polynucleotide sequence which is the amplification product). The amplification cycle is repeated to increase the concentration of the single target polynucleotide sequence. 
The reaction can be performed in any thermocycler commonly used for PCR.
- Quantitative RT-PCR (qRT-PCR) (also referred to as real-time RT-PCR) is preferred under some circumstances because it provides not only a quantitative measurement, but also reduced time and contamination. As used herein, “quantitative PCR” (or “real time qRT-PCR”) refers to the direct monitoring of the progress of a PCR amplification as it is occurring without the need for repeated sampling of the reaction products. In quantitative PCR, the reaction products may be monitored via a signaling mechanism (e.g., fluorescence) as they are generated and are tracked after the signal rises above a background level but before the reaction reaches a plateau. The number of cycles required to achieve a detectable or “threshold” level of fluorescence depends directly on the concentration of amplifiable targets at the beginning of the PCR process (the higher the starting concentration, the fewer cycles are required), enabling a measure of signal intensity to provide a measure of the amount of target nucleic acid in a sample in real time. A DNA binding dye (e.g., SYBR green) or a labeled probe can be used to detect the extension product generated by PCR amplification. Any probe format utilizing a labeled probe comprising the sequences provided herein may be used.
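- By way of non-limiting illustration only, threshold cycle values can be converted into a relative expression level as sketched below in Python. The sketch uses the widely used 2^(-ΔΔCt) calculation, which is not recited above and which assumes that the target and reference amplicons both amplify with approximately 100% efficiency. The function names and the threshold cycle values are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """Threshold cycle of the target relative to a reference gene."""
    return ct_target - ct_reference


def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression (test versus control) as 2 ** (-ddCt).

    Assumes perfect doubling per cycle for target and reference alike.
    """
    ddct = (delta_ct(ct_target_test, ct_ref_test)
            - delta_ct(ct_target_ctrl, ct_ref_ctrl))
    return 2.0 ** (-ddct)


# Hypothetical threshold cycles: the target crosses the threshold two
# cycles earlier in the test sample, at equal reference-gene signal.
example_fold = fold_change(24.0, 20.0, 26.0, 20.0)
print(example_fold)
```

- A threshold cycle that is lower by two cycles corresponds, under the stated assumption, to roughly four times as much starting target.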
- Immunohistochemistry methods are also suitable for detecting the levels of the biomarkers provided herein. Samples can be frozen for later preparation or immediately placed in a fixative solution. Tissue samples can be fixed by treatment with a reagent, such as formalin, glutaraldehyde, methanol, or the like and embedded in paraffin. Methods for preparing slides for immunohistochemical analysis from formalin-fixed, paraffin-embedded tissue samples are well known in the art.
- In one embodiment, COCA subtypes can be evaluated using levels of protein expression of one or more of the classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein. The level of protein expression can be measured using an immunological detection method. Immunological detection methods which can be used herein include, but are not limited to, competitive and non-competitive assay systems using techniques such as Western blots, radioimmunoassays, ELISA (enzyme linked immunosorbent assay), “sandwich” immunoassays, immunoprecipitation assays, precipitin reactions, gel diffusion precipitin reactions, immunodiffusion assays, agglutination assays, complement-fixation assays, immunoradiometric assays, fluorescent immunoassays, protein A immunoassays, and the like. Such assays are routine and well known in the art (see, e.g., Ausubel et al, eds, 1994, Current Protocols in Molecular Biology, Vol. I, John Wiley & Sons, Inc., New York, which is incorporated by reference herein in its entirety).
- In one embodiment, antibodies specific for biomarker proteins are utilized to detect the expression of a biomarker protein in a sample (e.g., tumor sample). The method comprises obtaining a sample from a patient or a subject, contacting the sample with at least one antibody directed to a biomarker that is selectively expressed in cancer cells, and detecting antibody binding to determine if the biomarker is expressed in the patient sample. Also provided herein is an immunocytochemistry technique for diagnosing COCA subtypes. One of skill in the art will recognize that the immunocytochemistry method described herein below may be performed manually or in an automated fashion.
- In some embodiments, the expression level of a classifier biomarker(s) (e.g., from Table 1) as determined using any methods or compositions provided herein or its expression product, is determined by normalization to the level of reference nucleic acid(s) (e.g., RNA transcripts) or their expression products (e.g., proteins), which can be all measured nucleic acids (e.g., transcripts (or their products)) in the sample or a particular reference set of nucleic acids (e.g., RNA transcripts (or their non-natural cDNA products)). Normalization is performed to correct for or normalize away both differences in the amount of nucleic acid (e.g., RNA or cDNA) assayed and variability in the quality of the nucleic acid (e.g., RNA or cDNA) used. Therefore, an assay typically measures and incorporates the expression of certain normalizing genes, including well known housekeeping genes, such as, for example, GAPDH and/or β-Actin. Alternatively, normalization can be based on the mean or median signal of all of the assayed biomarkers or a large subset thereof (global normalization approach).
- In one embodiment, the levels of the biomarkers provided herein, such as the classifier biomarkers of Table 1 (or subsets thereof, for example, 1 to 4, 4 to 8, 8 to 12, 12 to 16, 16 to 20, 20 to 24, 24 to 28, 28 to 32, 32 to 36, 36 to 40, 40 to 44, 44 to 48, 48 to 52, 52 to 56, 56 to 60, 60 to 64, 64 to 68, 68 to 72, 72 to 76, 76 to 80, 80 to 84 of the classifier biomarkers) are normalized against the expression levels of all RNA transcripts or their non-natural cDNA expression products, or protein products in the sample, or of a reference set of RNA transcripts or a reference set of their non-natural cDNA expression products, or a reference set of their protein products in the sample. In one embodiment, the levels of the biomarkers provided herein, such as any of the additional set of classifier biomarkers disclosed herein are normalized against the expression levels of all RNA transcripts or their non-natural cDNA expression products, or protein products in the sample, or of a reference set of RNA transcripts or a reference set of their non-natural cDNA expression products, or a reference set of their protein products in the sample.
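- By way of non-limiting illustration only, the two normalization approaches described above (normalization to a reference set of genes, and global normalization to the median signal of all assayed biomarkers) can be sketched in Python as follows. The sketch assumes that expression values are already on a log scale, so that normalization is a subtraction. The function names, the gene symbols `GENE_A` and `GENE_B`, and all numeric values are hypothetical; `GAPDH` and `ACTB` (β-actin) stand in for the housekeeping genes mentioned above.

```python
from statistics import mean, median


def normalize_to_reference(log_expr, reference_genes):
    """Subtract the mean log signal of the reference genes.

    Returns normalized values for the non-reference genes only.
    """
    ref_level = mean(log_expr[g] for g in reference_genes)
    return {g: v - ref_level
            for g, v in log_expr.items() if g not in reference_genes}


def normalize_global_median(log_expr):
    """Subtract the median log signal of all assayed genes."""
    center = median(log_expr.values())
    return {g: v - center for g, v in log_expr.items()}


# Hypothetical log2 expression values for a single sample.
sample = {"GAPDH": 10.0, "ACTB": 12.0, "GENE_A": 14.0, "GENE_B": 9.0}
by_reference = normalize_to_reference(sample, {"GAPDH", "ACTB"})
by_median = normalize_global_median(sample)
print(by_reference)
print(by_median)
```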
- As provided throughout, the methods set forth herein provide a method for determining the COCA subtype of a patient. Once the biomarker levels (e.g., Table 1 or any other gene signature provided herein) are determined, for example by measuring non-natural cDNA biomarker levels or non-natural mRNA-cDNA biomarker complexes, the biomarker levels are compared to reference values or a reference sample as provided herein, for example with the use of statistical methods or direct comparison of detected levels, to make a determination of the COCA subtype. Based on the comparison, the patient's tumor sample is classified, e.g., as a specific COCA subtype (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRK/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA).
- In one embodiment, expression level values of the at least one classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 are compared to reference expression level value(s) from at least one sample training set, wherein the at least one sample training set comprises expression level values from a reference sample(s). In a further embodiment, the at least one sample training set comprises expression level values of the at least one classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein from a specific COCA subtype (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRK/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA) or a combination thereof.
- In a separate embodiment, for methods provided herein employing a hybridization assay, hybridization values of the at least one classifier biomarkers provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein are compared to reference hybridization value(s) from at least one sample training set, wherein the at least one sample training set comprises hybridization values from a reference sample(s). In a further embodiment, the at least one sample training set comprises hybridization values of the at least one classifier biomarker provided herein, such as the classifier biomarkers of Table 1 or any additional set of biomarker classifiers disclosed herein from a specific COCA subtype (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRK/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA) or a combination thereof. Methods for comparing detected levels of biomarkers to reference values and/or reference samples are provided herein. Based on this comparison, in one embodiment a correlation between the biomarker levels obtained from the subject's sample and the reference values is obtained. An assessment of the COCA subtype is then made.
- Various statistical methods can be used to aid in the comparison of the biomarker levels obtained from the patient and reference biomarker levels, for example, from at least one sample training set.
- In one embodiment, a supervised pattern recognition method is employed. Examples of supervised pattern recognition methods can include, but are not limited to, the nearest centroid methods (Dabney (2005) Bioinformatics 21(22):4148-4154 and Tibshirani et al. (2002) Proc. Natl. Acad. Sci. USA 99(10):6567-6572); soft independent modeling of class analysis (SIMCA) (see, for example, Wold, 1976); partial least squares analysis (PLS) (see, for example, Wold, 1966; Joreskog, 1982; Frank, 1984; Bro, R., 1997); linear discriminant analysis (LDA) (see, for example, Nilsson, 1965); K-nearest neighbor analysis (KNN) (see, for example, Brown et al., 1996); artificial neural networks (ANN) (see, for example, Wasserman, 1989; Anker et al., 1992; Hare, 1994); probabilistic neural networks (PNNs) (see, for example, Parzen, 1962; Bishop, 1995; Specht, 1990; Broomhead et al., 1988; Patterson, 1996); rule induction (RI) (see, for example, Quinlan, 1986); and Bayesian methods (see, for example, Bretthorst, 1990a, 1990b, 1988). In one embodiment, the classifier for identifying COCA subtypes based on gene expression data is used in a centroid based method as described in Mullins et al. (2007) Clin Chem. 53(7):1273-9, which is incorporated herein by reference in its entirety. In another embodiment, the classifier for identifying tumor subtypes based on gene expression data is used in a nearest centroid based method as described in Dabney (2005) Bioinformatics 21(22):4148-4154, which is incorporated herein by reference in its entirety. The nearest centroid based method can be performed using ClaNC software as described in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123 or equivalents or derivatives thereof.
- In other embodiments, an unsupervised training approach is employed, and therefore, no training set is used.
- Referring to sample training sets for supervised learning approaches again, in some embodiments, a sample training set(s) can include expression data of a plurality or all of the classifier biomarkers (e.g., all the classifier biomarkers of Table 1) from a specific COCA subtype sample. The plurality of classifier biomarkers can comprise at least 4 classifier biomarkers, at least 8 classifier biomarkers, at least 12 classifier biomarkers, at least 16 classifier biomarkers, at least 20 classifier biomarkers, at least 24 classifier biomarkers, at least 28 classifier biomarkers, at least 32 classifier biomarkers, at least 36 classifier biomarkers, at least 40 classifier biomarkers, at least 44 classifier biomarkers, at least 48 classifier biomarkers, at least 52 classifier biomarkers, at least 56 classifier biomarkers, at least 60 classifier biomarkers, at least 64 classifier biomarkers, at least 68 classifier biomarkers, at least 72 classifier biomarkers, at least 76 classifier biomarkers or at least 80 classifier biomarkers of Table 1. In some embodiments, the plurality of classifier biomarkers comprises all 84 biomarkers of Table 1. In some embodiments, the sample training set(s) are normalized to remove sample-to-sample variation.
- In some embodiments, comparing can include applying a statistical algorithm, such as, for example, any suitable multivariate statistical analysis model, which can be parametric or non-parametric. In some embodiments, applying the statistical algorithm can include determining a correlation between the expression data obtained from the tumor sample obtained from the subject suffering from or suspected of suffering from cancer (i.e., the test subject) and the expression data from the COCA subtyping training set(s). In some embodiments, cross-validation is performed, such as, for example, leave-one-out cross-validation (LOOCV). In some embodiments, integrative correlation is performed. In some embodiments, a Spearman correlation is performed. In some embodiments, a centroid based method based on gene expression data is employed for the statistical algorithm. The centroids can be constructed using any method known in the art for generating centroids such as, for example, those found in Mullins et al. (2007) Clin Chem. 53(7):1273-9 or the nearest centroid method found in Dabney (2005) Bioinformatics 21(22):4148-4154, which is herein incorporated by reference in its entirety. In one embodiment, a correlation analysis is performed on the expression data obtained from the tumor sample and the centroid(s) constructed from the expression data from the COCA training set(s). The correlation analysis can be a Spearman correlation or a Pearson correlation. In one embodiment, a distance measure analysis (e.g., Euclidean distance) is performed on the expression data obtained from the tumor sample and the centroid(s) constructed on the expression data from the COCA training set(s).
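- By way of non-limiting illustration only, a correlation-based comparison of a tumor sample to training-set centroids, together with leave-one-out cross-validation, can be sketched in Python as follows. The sketch assumes that each centroid is the per-gene mean of the training samples of one subtype, which is only one of several ways to construct centroids and is not asserted to be the method of Mullins et al. or Dabney. The subtype labels C3 and C14 are taken from the list above; the four-gene expression vectors and all function names are hypothetical.

```python
from statistics import mean


def rank(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def build_centroids(training):
    """training: list of (subtype_label, expression_vector) pairs."""
    by_label = {}
    for label, vec in training:
        by_label.setdefault(label, []).append(vec)
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()}


def classify(sample, centroids):
    """Return the label of the most highly correlated centroid."""
    return max(centroids, key=lambda lab: spearman(sample, centroids[lab]))


def loocv_accuracy(training):
    """Hold out each training sample in turn and re-classify it."""
    hits = 0
    for i, (label, vec) in enumerate(training):
        rest = training[:i] + training[i + 1:]
        hits += classify(vec, build_centroids(rest)) == label
    return hits / len(training)


# Hypothetical training data: two samples per subtype, four genes each.
TRAINING = [
    ("C3", [9.0, 8.0, 2.0, 1.0]),
    ("C3", [8.5, 7.0, 1.5, 2.0]),
    ("C14", [1.0, 2.0, 8.0, 9.0]),
    ("C14", [2.0, 1.5, 9.5, 8.0]),
]
centroids = build_centroids(TRAINING)
predicted = classify([7.0, 9.0, 3.0, 2.5], centroids)
accuracy = loocv_accuracy(TRAINING)
print(predicted, accuracy)
```

- Replacing `spearman` with `pearson` in `classify` gives the Pearson variant mentioned above without any other change.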
- Results of the gene expression analysis performed on a sample from a subject (test sample) may be compared to a biological sample(s) or data derived from a biological sample(s) (e.g., expression data or levels from at least one classifier biomarker provided herein, e.g., Table 1) that is known or suspected to be normal (“reference sample” or “normal sample”, e.g., non-cancer sample). In some embodiments, a reference sample or reference gene expression data (e.g., expression data or levels from at least one classifier biomarker provided herein, e.g., Table 1) is obtained or derived from an individual known to have a particular COCA subtype of cancer, e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA. In one embodiment, the gene expression levels or profile measured or detected for the at least one classifier biomarker from Table 1 in the test sample (i.e., tumor sample obtained from the subject) may be compared to centroids constructed from the gene expression analysis performed on the reference or normal sample or training set, and classification can be based on determining which is the nearest centroid based on a distance measure such as, for example, a Euclidean distance or a correlation. The centroids can be constructed using any of the methods provided herein such as, for example, using the ClaNC software described in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123 or equivalents or derivatives related thereto. Classification or determination of the subtype of the test sample can then be ascertained by determining the centroid from the reference or normal sample to which the expression levels or profile from said test sample is nearest, based on a distance measure or correlation. The distance measure can be a Euclidean distance.
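- By way of a non-limiting illustration, the nearest-centroid comparison described above can be sketched as follows. The sketch is written in Python and is not the ClaNC software; the function names, the four-gene expression vectors, and the two subtype labels in the example are hypothetical and serve only to show centroid construction followed by assignment by Euclidean distance or by Spearman correlation.

```python
import math

def build_centroids(training):
    """training: dict mapping subtype label -> list of expression vectors
    (all vectors use the same gene order). Returns per-gene means."""
    return {label: [sum(col) / len(samples) for col in zip(*samples)]
            for label, samples in training.items()}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    # Assumes neither vector is constant (non-zero variance).
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(average_ranks(a), average_ranks(b))

def classify(sample, centroids, method="euclidean"):
    """Return the label of the nearest centroid."""
    if method == "euclidean":
        return min(centroids, key=lambda k: euclidean(sample, centroids[k]))
    return max(centroids, key=lambda k: spearman(sample, centroids[k]))

# Hypothetical toy training data: two subtypes, four genes.
training = {
    "C3 OV": [[8.0, 1.0, 2.0, 7.0], [9.0, 2.0, 1.0, 6.0]],
    "C14 PRAD": [[1.0, 8.0, 7.0, 2.0], [2.0, 9.0, 6.0, 1.0]],
}
centroids = build_centroids(training)
test_sample = [7.0, 2.0, 3.0, 8.0]
print(classify(test_sample, centroids, "euclidean"))
print(classify(test_sample, centroids, "spearman"))
```

In this sketch a smaller Euclidean distance and a larger Spearman correlation both mean "nearer", which is why one branch takes the minimum and the other the maximum.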
- The reference sample may be assayed at the same time, or at a different time from the sample obtained from the test subject (i.e., test sample). Alternatively, the biomarker level information from a reference sample may be stored in a database or other means for access at a later date.
- The biomarker level results of an assay on the test sample may be compared to the results of the same assay on a reference sample. In some cases, the results of the assay on the reference sample are from a database, or a reference value(s). In some cases, the results of the assay on the reference sample are a known or generally accepted value or range of values by those skilled in the art. In some cases, the comparison is qualitative. In other cases, the comparison is quantitative. In some cases, qualitative or quantitative comparisons may involve but are not limited to one or more of the following: comparing expression levels of a test sample to gene centroids constructed from expression level data from a reference sample (e.g., constructed from expression level data for one or a plurality of genes from Table 1), fluorescence values, spot intensities, absorbance values, chemiluminescent signals, histograms, critical threshold values, statistical significance values, expression levels of the genes described herein, mRNA copy numbers.
- In one embodiment, an odds ratio (OR) is calculated for each biomarker level panel measurement. Here, the OR is a measure of association between the measured biomarker values for the patient and an outcome, e.g., COCA subtype. For example, see, J. Can. Acad. Child Adolesc. Psychiatry 2010; 19(3): 227-229, which is incorporated by reference in its entirety for all purposes.
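- As a hedged illustration of the odds ratio described above, the following Python sketch computes an OR and an approximate 95% confidence interval (the standard log-OR normal approximation) from a 2x2 table. Dichotomizing a biomarker into "high" and "low" and the counts shown are assumptions made only for the example.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """2x2 table (hypothetical layout):
    a = marker high & in subtype,  b = marker high & not in subtype,
    c = marker low  & in subtype,  d = marker low  & not in subtype.
    Returns (OR, lower, upper) using the log-OR normal approximation."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    or_value = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log)
    upper = math.exp(math.log(or_value) + z * se_log)
    return or_value, lower, upper

# Hypothetical counts for one biomarker and one COCA subtype.
value, lower, upper = odds_ratio(20, 5, 10, 25)
print(f"OR = {value:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

An OR above 1 whose interval excludes 1 would suggest that high marker levels are associated with the subtype in this toy table.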
- In one embodiment, a specified statistical confidence level may be determined in order to provide a confidence level regarding the COCA subtype. For example, it may be determined that a confidence level of greater than 90% may be a useful predictor of the COCA subtype. In other embodiments, more or less stringent confidence levels may be chosen. For example, a confidence level of about or at least about 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, 99.5%, or 99.9% may be chosen. The confidence level provided may in some cases be related to the quality of the sample, the quality of the data, the quality of the analysis, the specific methods used, and/or the number of gene expression values (i.e., the number of genes) analyzed. The specified confidence level for providing the likelihood of response may be chosen on the basis of the expected number of false positives or false negatives. Methods for choosing parameters for achieving a specified confidence level or for identifying markers with diagnostic power include but are not limited to Receiver Operating Characteristic (ROC) curve analysis, binomial ROC, principal component analysis, odds ratio analysis, partial least squares analysis, singular value decomposition, least absolute shrinkage and selection operator analysis, least angle regression, and the threshold gradient directed regularization method.
- Determining the COCA subtype in some cases can be improved through the application of algorithms designed to normalize and/or improve the reliability of the gene expression data. In some embodiments, the data analysis utilizes a computer or other device, machine or apparatus for application of the various algorithms described herein due to the large number of individual data points that are processed. A “machine learning algorithm” refers to a computational-based prediction methodology, also known to persons skilled in the art as a “classifier,” employed for characterizing a gene expression profile or profiles, e.g., to determine the COCA subtype. The biomarker levels, determined by, e.g., microarray-based hybridization assays, sequencing assays, NanoString assays, etc., are in one embodiment subjected to the algorithm in order to classify the profile. Supervised learning generally involves “training” a classifier to recognize the distinctions among COCA subtypes such as, for example, C1 ACC/PCPG positive, C2 GBM/LGG positive, C3 OV positive, C4 Squamous-like positive, C6 LUAD-Enriched positive, C8 PAAD/some STAD positive, C9 UCS positive, C10 BRCA/Basal positive, C12 UCEC positive, C14 PRAD positive, C15 CESC (subset of cervical) positive, C16 BLCA positive, C17 TGCT positive, C19 COAD/READ positive, C20 SARC/MESO positive, C21 KIRC/KICH/KIRP positive, C22 Liver positive, C24 BRCA/Luminal positive, C25 THYM positive, C26 SKCM/UVM positive and C28 THCA positive, and then “testing” the accuracy of the classifier on an independent test set. Therefore, for new, unknown samples the classifier can be used to predict, for example, the class (e.g., C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRC/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA) to which the samples belong. The machine learning algorithm can be a CLaNC algorithm as provided herein.
- In some embodiments, a robust multi-array average (RMA) method may be used to normalize raw data. The RMA method begins by computing background-corrected intensities for each matched cell on a number of microarrays. In one embodiment, the background corrected values are restricted to positive values as described by Irizarry et al. (2003). Biostatistics April 4 (2): 249-64, incorporated by reference in its entirety for all purposes. After background correction, the base-2 logarithm of each background corrected matched-cell intensity is then obtained. The background corrected, log-transformed, matched intensity on each microarray is then normalized using the quantile normalization method, in which, for each input array and each probe value, the array percentile probe value is replaced with the average of all array percentile points. This method is more completely described by Bolstad et al. Bioinformatics 2003, incorporated by reference in its entirety. Following quantile normalization, the normalized data may then be fit to a linear model to obtain an intensity measure for each probe on each microarray. Tukey's median polish algorithm (Tukey, J. W., Exploratory Data Analysis. 1977, incorporated by reference in its entirety for all purposes) may then be used to determine the log-scale intensity level for the normalized probe set data.
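- The log-transformation and quantile normalization steps described above can be sketched as follows. This Python sketch is illustrative only: it omits background correction and median polish, it does not average tied values, and the function names and the intensities are hypothetical.

```python
import math

def log2_transform(arrays):
    """arrays: list of microarrays, each a list of positive,
    background-corrected probe intensities in the same probe order."""
    return [[math.log2(v) for v in array] for array in arrays]

def quantile_normalize(arrays):
    """Replace each value with the mean, across arrays, of the values
    that occupy the same rank (percentile) within their own array."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[i] for s in sorted_arrays) / len(arrays)
                  for i in range(n_probes)]
    normalized = []
    for array in arrays:
        order = sorted(range(n_probes), key=lambda i: array[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized

# Hypothetical intensities for two arrays and three probes.
raw = [[32.0, 4.0, 8.0], [16.0, 2.0, 64.0]]
print(quantile_normalize(log2_transform(raw)))
```

After normalization every array has the same set of values, so remaining differences between arrays reflect only the ordering of probes within each array.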
- Various other software programs may be implemented. In certain methods, feature selection and model estimation may be performed by logistic regression with lasso penalty using glmnet (Friedman et al. (2010). Journal of statistical software 33(1): 1-22, incorporated by reference in its entirety). Raw reads may be aligned using TopHat (Trapnell et al. (2009). Bioinformatics 25(9): 1105-11, incorporated by reference in its entirety). In methods, top features (N ranging from 10 to 200) are used to train a linear support vector machine (SVM) (Suykens J A K, Vandewalle J. Least Squares Support Vector Machine Classifiers. Neural Processing Letters 1999; 9(3): 293-300, incorporated by reference in its entirety) using the e1071 library (Meyer D. Support vector machines: the interface to libsvm in package e1071. 2014, incorporated by reference in its entirety). Confidence intervals, in one embodiment, are computed using the pROC package (Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC bioinformatics 2011; 12: 77, incorporated by reference in its entirety).
- In addition, data may be filtered to remove data that may be considered suspect. In one embodiment, data derived from microarray probes that have fewer than about 4, 5, 6, 7 or 8 guanosine+cytosine nucleotides may be considered to be unreliable due to their aberrant hybridization propensity or secondary structure issues. Similarly, data deriving from microarray probes that have more than about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 guanosine+cytosine nucleotides may in one embodiment be considered unreliable due to their aberrant hybridization propensity or secondary structure issues.
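- A minimal sketch of the guanosine+cytosine filter described above is given below in Python. The thresholds (6 and 17) are arbitrary picks from the ranges recited above, and the function names and probe sequences are hypothetical.

```python
def gc_count(sequence):
    """Number of G or C nucleotides in a probe sequence."""
    return sum(1 for base in sequence.upper() if base in "GC")

def passes_gc_filter(sequence, min_gc=6, max_gc=17):
    """Keep a probe only if its G+C count lies within [min_gc, max_gc].
    Both thresholds are illustrative assumptions."""
    return min_gc <= gc_count(sequence) <= max_gc

# Hypothetical 25-mer probes.
probes = {
    "probe_at_rich": "ATATATATATATATATATATATATA",
    "probe_gc_rich": "GCGCGCGCGCGCGCGCGCGCGCGCG",
    "probe_balanced": "ATGCATGCATGCATGCATGCATGCA",
}
kept = [name for name, seq in probes.items() if passes_gc_filter(seq)]
print(kept)
```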
- In some embodiments, data from probe-sets may be excluded from analysis if they are not identified at a detectable level (above background).
- In some embodiments, probe-sets that exhibit no, or low variance may be excluded from further analysis. Low-variance probe-sets are excluded from the analysis via a Chi-Square test. In one embodiment, a probe-set is considered to be low-variance if its transformed variance is to the left of the 99 percent confidence interval of the Chi-Squared distribution with (N−1) degrees of freedom, where the transformed variance is (N−1) × (probe-set variance)/(gene probe-set variance) ~ Chi-Sq(N−1), N is the number of input CEL files, (N−1) is the degrees of freedom for the Chi-Squared distribution, and the “probe-set variance for the gene” is the average of probe-set variances across the gene. In some embodiments, probe-sets for a given mRNA or group of mRNAs may be excluded from further analysis if they contain less than a minimum number of probes that pass through the previously described filter steps for GC content, reliability, variance and the like. For example, in some embodiments, probe-sets for a given gene or transcript cluster may be excluded from further analysis if they contain less than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or less than about 20 probes.
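- The Chi-Square variance filter described above can be sketched in Python as follows. Two assumptions are made and should be read as such: the left edge of the 99 percent interval is taken to be the 0.5th percentile of the Chi-Squared distribution, and that percentile is obtained with the Wilson-Hilferty approximation so that only the standard library is needed (an exact quantile function, such as `scipy.stats.chi2.ppf`, could be substituted). The function names and variances are hypothetical.

```python
from statistics import NormalDist

def chi2_quantile_approx(p, df):
    """Wilson-Hilferty approximation to the chi-squared quantile.
    Reasonable for moderate df; it is not exact."""
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * h ** 0.5) ** 3

def is_low_variance(probe_set_var, gene_var, n_cel, tail=0.005):
    """True if (N-1) * probe-set variance / gene probe-set variance
    falls below the lower-tail chi-squared cutoff with N-1 df."""
    df = n_cel - 1
    statistic = df * probe_set_var / gene_var
    return statistic < chi2_quantile_approx(tail, df)

# Hypothetical values: 10 CEL files, gene-level variance 0.5.
print(is_low_variance(0.01, 0.5, 10))
print(is_low_variance(0.50, 0.5, 10))
```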
- Methods of biomarker level data analysis, in one embodiment, further include the use of a feature selection algorithm as provided herein. In some embodiments, feature selection is provided by use of the LIMMA software package (Smyth, G. K. (2005). Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor, R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds.), Springer, New York, pages 397-420, incorporated by reference in its entirety for all purposes).
- Methods of biomarker level data analysis, in one embodiment, include the use of a pre-classifier algorithm. For example, an algorithm may use a specific molecular fingerprint to pre-classify the samples according to their composition and then apply a correction/normalization factor. This data/information may then be fed in to a final classification algorithm which would incorporate that information to aid in the final diagnosis.
- Methods of biomarker level data analysis, in one embodiment, further include the use of a classifier algorithm as provided herein. In one embodiment, a diagonal linear discriminant analysis, k-nearest neighbor algorithm, support vector machine (SVM) algorithm, linear support vector machine, random forest algorithm, or a probabilistic model-based method or a combination thereof is provided for classification of microarray data. In some embodiments, identified markers that distinguish samples (e.g., of varying biomarker level profiles and/or varying COCA subtypes of cancer) are selected based on statistical significance of the difference in biomarker levels between classes of interest. In some cases, the statistical significance is adjusted by applying a Benjamini-Hochberg or another correction for false discovery rate (FDR).
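- For illustration, the Benjamini-Hochberg adjustment mentioned above can be sketched in Python as follows. The function name and the p-values are hypothetical; the procedure itself is the standard step-up adjustment.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.
    adjusted[i] = min over ranks >= rank(i) of p * m / rank, capped at 1."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical per-biomarker p-values.
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
selected = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(adjusted, selected)
```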
- In some cases, the classifier algorithm may be supplemented with a meta-analysis approach such as that described by Fishel and Kaufman et al. 2007 Bioinformatics 23(13): 1599-606, incorporated by reference in its entirety for all purposes. In some cases, the classifier algorithm may be supplemented with a meta-analysis approach such as a repeatability analysis.
- Methods for deriving and applying posterior probabilities to the analysis of biomarker level data are known in the art and have been described for example in Smyth, G. K. 2004 Stat. Appl. Genet. Mol. Biol. 3: Article 3, incorporated by reference in its entirety for all purposes. In some cases, the posterior probabilities may be used in the methods provided herein to rank the markers provided by the classifier algorithm.
- A statistical evaluation of the results of the biomarker level profiling may provide a quantitative value or values indicative of one or more of the following: COCA subtype of cancer; the likelihood of the success of a particular therapeutic intervention, e.g., angiogenesis inhibitor therapy, chemotherapy, or immunotherapy. In one embodiment, the data is presented directly to the physician in its most useful form to guide patient care, or is used to define patient populations in clinical trials or a patient population for a given medication. The results of the molecular profiling can be statistically evaluated using a number of methods known in the art including, but not limited to: the Student's t-test, the two-sided t-test, Pearson rank sum analysis, hidden Markov model analysis, analysis of q-q plots, principal component analysis, one-way ANOVA, two-way ANOVA, LIMMA and the like.
- In some cases, accuracy may be determined by tracking the subject over time to determine the accuracy of the original diagnosis. In other cases, accuracy may be established in a deterministic manner or using statistical methods. For example, receiver operator characteristic (ROC) analysis may be used to determine the optimal assay parameters to achieve a specific level of accuracy, specificity, positive predictive value, negative predictive value, and/or false discovery rate.
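- The ROC analysis mentioned above can be illustrated with the following Python sketch, which computes ROC points and the area under the curve for a one-versus-rest call (one COCA subtype against all others). Treating the problem as one-versus-rest, the function names, and the scores are assumptions made for the example.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) at each distinct
    threshold, from the most to the least stringent.
    labels: 1 = sample belongs to the subtype, 0 = it does not."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(scores, labels):
    """Area under the ROC curve as the fraction of positive/negative
    pairs ranked correctly (ties count one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores and true membership.
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(roc_points(scores, labels))
print(auc(scores, labels))
```

Each ROC point corresponds to one candidate score threshold, so the list can be scanned to find a threshold that meets a required sensitivity or specificity.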
- In some cases, the results of the biomarker level profiling assays are entered into a database for access by representatives or agents of a molecular profiling business, the individual, a medical provider, or insurance provider. In some cases, assay results include sample classification, identification, or diagnosis by a representative, agent or consultant of the business, such as a medical professional. In other cases, a computer or algorithmic analysis of the data is provided automatically. In some cases, the molecular profiling business may bill the individual, insurance provider, medical provider, researcher, or government entity for one or more of the following: molecular profiling assays performed, consulting services, data analysis, reporting of results, or database access.
- In some embodiments, the results of the biomarker level profiling assays are presented as a report on a computer screen or as a paper record. In some embodiments, the report may include, but is not limited to, such information as one or more of the following: the levels of biomarkers (e.g., as reported by copy number or fluorescence intensity, etc.) as compared to the reference sample or reference value(s); the likelihood the subject will respond to a particular therapy, based on the biomarker level values and the COCA subtype and proposed therapies.
- In one embodiment, the results of the gene expression profiling may be classified into one or more of the following: C1 ACC/PCPG positive, C2 GBM/LGG positive, C3 OV positive, C4 Squamous-like positive, C6 LUAD-Enriched positive, C8 PAAD/some STAD positive, C9 UCS positive, C10 BRCA/Basal positive, C12 UCEC positive, C14 PRAD positive, C15 CESC (subset of cervical) positive, C16 BLCA positive, C17 TGCT positive, C19 COAD/READ positive, C20 SARC/MESO positive, C21 KIRC/KICH/KIRP positive, C22 Liver positive, C24 BRCA/Luminal positive, C25 THYM positive, C26 SKCM/UVM positive, C28 THCA positive, C1 ACC/PCPG negative, C2 GBM/LGG negative, C3 OV negative, C4 Squamous-like negative, C6 LUAD-Enriched negative, C8 PAAD/some STAD negative, C9 UCS negative, C10 BRCA/Basal negative, C12 UCEC negative, C14 PRAD negative, C15 CESC (subset of cervical) negative, C16 BLCA negative, C17 TGCT negative, C19 COAD/READ negative, C20 SARC/MESO negative, C21 KIRC/KICH/KIRP negative, C22 Liver negative, C24 BRCA/Luminal negative, C25 THYM negative, C26 SKCM/UVM negative or C28 THCA negative, or a combination thereof.
- In some embodiments, results are classified using a trained algorithm. Trained algorithms provided herein include algorithms that have been developed using a reference set of known gene expression values and/or normal samples, for example, samples from individuals diagnosed with a particular molecular COCA subtype of cancer. In some cases, a reference set of known gene expression values is obtained from individuals who have been diagnosed with a particular COCA subtype of cancer. In some cases, a reference set of known gene expression values is obtained from individuals who have been diagnosed with a particular COCA subtype of cancer and are also known to possess a certain immune cell signature. In some cases, a reference set of known gene expression values is obtained from individuals who have been diagnosed with a particular COCA subtype of cancer and are also known to have certain expression of tumor driver genes.
- Algorithms suitable for categorization of samples include but are not limited to k-nearest neighbor algorithms, support vector machines, linear discriminant analysis, centroid algorithms (e.g., CLaNC), diagonal linear discriminant analysis, updown, naive Bayesian algorithms, neural network algorithms, hidden Markov model algorithms, genetic algorithms, or any combination thereof.
- When a binary classifier is compared with actual true values (e.g., values from a biological sample), there are typically four possible outcomes. If the outcome from a prediction is p (where “p” is a positive classifier output, such as the presence of a deletion or duplication syndrome) and the actual value is also p, then it is called a true positive (TP); however, if the actual value is n then it is said to be a false positive (FP). Conversely, a true negative has occurred when both the prediction outcome and the actual value are n (where “n” is a negative classifier output, such as no deletion or duplication syndrome), and a false negative is when the prediction outcome is n while the actual value is p. In one embodiment, consider a test that seeks to determine whether a person is likely or unlikely to respond to angiogenesis inhibitor therapy. A false positive in this case occurs when the person tests positive, but actually does not respond. A false negative, on the other hand, occurs when the person tests negative, suggesting they are unlikely to respond, when they actually are likely to respond. The same holds true for classifying a COCA subtype.
- The positive predictive value (PPV), or precision rate, or post-test probability of disease, is the proportion of subjects with positive test results who are correctly diagnosed as likely or unlikely to respond, or diagnosed with the correct COCA subtype, or a combination thereof. It reflects the probability that a positive test reflects the underlying condition being tested for. Its value does, however, depend on the prevalence of the disease, which may vary. In one example, the following characteristics are provided: FP (false positive); TN (true negative); TP (true positive); FN (false negative). False positive rate (α) = FP/(FP+TN) = 1 − specificity; False negative rate (β) = FN/(TP+FN) = 1 − sensitivity; Power = sensitivity = 1 − β; Likelihood-ratio positive = sensitivity/(1 − specificity); Likelihood-ratio negative = (1 − sensitivity)/specificity. The negative predictive value (NPV) is the proportion of subjects with negative test results who are correctly diagnosed.
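- The quantities defined above can be computed from the four outcome counts as in the following Python sketch. The function name and the counts are hypothetical, and the sketch assumes that no denominator is zero (for example, that specificity is less than 1 when the positive likelihood ratio is computed).

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Summary statistics for a binary call (e.g., one COCA subtype
    versus all others) from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": fp / (fp + tn),   # equals 1 - specificity
        "false_negative_rate": fn / (tp + fn),   # equals 1 - sensitivity
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

# Hypothetical counts.
for name, value in diagnostic_metrics(tp=90, fp=20, tn=180, fn=10).items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on how many positives and negatives are in the tested population, the same classifier can show different predictive values in cohorts with different subtype prevalence.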
- In some embodiments, the results of the biomarker level analysis of the subject methods provide a statistical confidence level that a given diagnosis is correct. In some embodiments, such statistical confidence level is at least about, or more than about 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or more.
- In some embodiments, the method further includes classifying the tumor tissue sample as a particular COCA subtype based on the comparison of biomarker levels in the sample and reference biomarker levels, for example present in at least one training set. In some embodiments, the tumor tissue sample is classified as a particular subtype if the results of the comparison meet one or more criterion such as, for example, a minimum percent agreement, a value of a statistic calculated based on the percentage agreement such as (for example) a kappa statistic, a minimum correlation (e.g., Pearson's correlation) and/or the like.
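- As one hedged reading of the agreement criteria above, the following Python sketch computes percent agreement and Cohen's kappa between subtype calls and reference labels and checks them against minimum values. The function names, the labels, and the thresholds (0.8 and 0.7) are assumptions chosen for the example.

```python
from collections import Counter

def percent_agreement(predicted, reference):
    matches = sum(1 for p, r in zip(predicted, reference) if p == r)
    return matches / len(reference)

def cohen_kappa(predicted, reference):
    """Kappa = (observed - expected agreement) / (1 - expected),
    where expected agreement comes from the label frequencies."""
    n = len(reference)
    observed = percent_agreement(predicted, reference)
    predicted_counts = Counter(predicted)
    reference_counts = Counter(reference)
    expected = sum(predicted_counts[label] * reference_counts[label]
                   for label in reference_counts) / (n * n)
    return (observed - expected) / (1.0 - expected)

def meets_criteria(predicted, reference, min_agreement=0.8, min_kappa=0.7):
    """Both thresholds are illustrative, not values from the text."""
    return (percent_agreement(predicted, reference) >= min_agreement
            and cohen_kappa(predicted, reference) >= min_kappa)

# Hypothetical calls for six samples.
reference = ["C3 OV", "C3 OV", "C14 PRAD", "C14 PRAD", "C16 BLCA", "C16 BLCA"]
predicted = ["C3 OV", "C3 OV", "C14 PRAD", "C16 BLCA", "C16 BLCA", "C16 BLCA"]
print(percent_agreement(predicted, reference), cohen_kappa(predicted, reference))
print(meets_criteria(predicted, reference))
```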
- It is intended that the methods described herein can be performed by software (stored in memory and/or executed on hardware), hardware, or a combination thereof. Hardware modules may include, for example, a general-purpose processor, a field programmable gate array (FPGA), and/or an application specific integrated circuit (ASIC). Software modules (executed on hardware) can be expressed in a variety of software languages (e.g., computer code), including Unix utilities, C, C++, Java™, Ruby, SQL, SAS®, the R programming language/software environment, Visual Basic™, and other object-oriented, procedural, or other programming language and development tools. Examples of computer code include, but are not limited to, micro-code or micro-instructions, machine instructions, such as produced by a compiler, code used to produce a web service, and files containing higher-level instructions that are executed by a computer using an interpreter. Additional examples of computer code include, but are not limited to, control signals, encrypted code, and compressed code.
- Some embodiments described herein relate to devices with a non-transitory computer-readable medium (also can be referred to as a non-transitory processor-readable medium or memory) having instructions or computer code thereon for performing various computer-implemented operations and/or methods disclosed herein. The computer-readable medium (or processor-readable medium) is non-transitory in the sense that it does not include transitory propagating signals per se (e.g., a propagating electromagnetic wave carrying information on a transmission medium such as space or a cable). The media and computer code (also can be referred to as code) may be those designed and constructed for the specific purpose or purposes. Examples of non-transitory computer-readable media include, but are not limited to: magnetic storage media such as hard disks, floppy disks, and magnetic tape; optical storage media such as Compact Disc/Digital Video Discs (CD/DVDs), Compact Disc-Read Only Memories (CD-ROMs), and holographic devices; magneto-optical storage media such as optical disks; carrier wave signal processing modules; and hardware devices that are specially configured to store and execute program code, such as Application-Specific Integrated Circuits (ASICs), Programmable Logic Devices (PLDs), Read-Only Memory (ROM) and Random-Access Memory (RAM) devices. Other embodiments described herein relate to a computer program product, which can include, for example, the instructions and/or computer code discussed herein.
- In some embodiments, a single biomarker, or from about 1 to about 4, from about 4 to about 8, from about 8 to about 12, from about 12 to about 16, from about 16 to about 20, from about 20 to about 24, from about 24 to about 30, from about 34 to about 38, from about 38 to about 42, from about 42 to about 46, from about 46 to about 50, from about 50 to about 54, from about 54 to about 58, from about 58 to about 62, from about 62 to about 66, from about 66 to about 72, from about 72 to about 76, from about 76 to about 80, from about 80 to about 84 (e.g., as disclosed in Table 1) is capable of classifying COCA subtypes of cancer with a predictive success of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, and all values in between.
In some embodiments, any combination of biomarkers disclosed herein (e.g., in Table 1) can be used to obtain a predictive success of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, and all values in between.
- In some embodiments, a single biomarker, or from about 1 to about 4, from about 4 to about 8, from about 8 to about 12, from about 12 to about 16, from about 16 to about 20, from about 20 to about 24, from about 24 to about 30, from about 34 to about 38, from about 38 to about 42, from about 42 to about 46, from about 46 to about 50, from about 50 to about 54, from about 54 to about 58, from about 58 to about 62, from about 62 to about 66, from about 66 to about 72, from about 72 to about 76, from about 76 to about 80, from about 80 to about 84 (e.g., as disclosed in Table 1) is capable of classifying COCA subtypes of cancer with a sensitivity or specificity of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, and all values in between. 
In some embodiments, any combination of biomarkers disclosed herein can be used to obtain a sensitivity or specificity of at least about 70%, at least about 71%, at least about 72%, at least about 73%, at least about 74%, at least about 75%, at least about 76%, at least about 77%, at least about 78%, at least about 79%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, up to 100%, and all values in between.
- In one embodiment, the methods and compositions provided herein are useful for determining the clustering of cluster assignments (COCA) subtype of a sample (e.g., tumor sample) from a patient by analyzing the expression of a set of biomarkers, whereby use of the set of biomarkers in detecting a COCA subtype comprises use of fewer biomarkers from a single genome-wide platform as compared to methods known in the art for molecularly classifying a cell of origin cancer subtype (e.g., Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304, and Hoadley et al. “Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.” Cell 158, no. 4 (2014): 929-944, both of which are herein incorporated by reference). In some cases, the set of biomarkers is less than 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 150, 100 or 90 biomarkers. In some cases, the set of biomarkers is between 4 and 84 biomarkers. In some cases, the set of biomarkers is the set of 84 biomarkers listed in Table 1. In some cases, the set of biomarkers is a sub-set of biomarkers listed in Table 1 such as, for example, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80 or 82 of the biomarkers listed in Table 1. The biomarkers or classifier biomarkers useful in the methods and compositions provided herein can be selected from one or more cancer datasets from one or more databases. The cancers can be any cancer known in the art. The cancers can include hematologic and lymphatic malignancies, solid tumor types, cancers of the central nervous system, cancers from neural-crest-derived tissues, and melanocytic cancers of the skin. The cancers for use in the methods herein can be the cancers studied in The Cancer Genome Atlas (TCGA) or a subset thereof. The cancers for use in the method provided herein can be those cancers listed herein. The databases can be public databases.
- In one embodiment, classifier biomarkers (e.g., one or more genes listed in Table 1) useful in the methods and compositions provided herein for detecting or diagnosing subtypes were selected from a large data set of potential classifier biomarkers. In one embodiment, classifier biomarkers useful for the methods and compositions provided herein such as those in Table 1 are selected by subjecting a large set of classifier biomarkers to an in silico based process in order to determine the minimum number of genes whose expression profile can be used to determine a pan-cancer COCA subtype of a subject from a sample obtained from said subject.
In some cases, the large set of classifier biomarkers can be a pan-cancer dataset such as, for example, the mRNA expression data (i.e., RNA-seq data) from TCGA found at gdc.cancer.gov/about-data/publications/pancanatlas. In some cases, the large set of classifier biomarkers can be the genes derived from the mRNA expression profile data derived from more than 10,000 tumors across more than 30 tumor types as described in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.”
Cell 173, no. 2 (2018): 291-304, which comprised one of several genome-wide molecular platforms that together can serve to define the gold standard (GS) COCA subtyper. The in silico process for selecting a gene signature as provided herein (e.g., Tables 1 and 2) for determining a COCA subtype of a sample from a patient can comprise applying or using a Classification to Nearest Centroid (CLaNC) algorithm on the pan-cancer mRNA expression data (i.e., RNA-seq data) from TCGA to choose a minimum number of correlated genes for each subtype. For determination of the optimal number of genes (e.g., 84 genes as shown in Table 1) to include in the signature, the process can further comprise performing a 5-fold cross validation using the TCGA pan-cancer dataset following application of the CLaNC algorithm as provided herein to produce cross-validation curves to test different numbers of correlated genes as shown in FIG. 3 in order to determine the minimum number of correlated genes needed per subtype. To get the final list of gene classifiers, the method can further comprise applying the CLaNC algorithm to the entire TCGA mRNA expression pan-cancer dataset. The CLaNC software used in the methods provided herein can be as found in or derived from Alan R. Dabney; ClaNC: point-and-click software for classifying microarrays to nearest centroids, Bioinformatics, Volume 22, Issue
- In one embodiment, the method further comprises validating the gene classifiers. Validation can comprise testing the expression of the classifiers in a test set of samples and comparing the COCA subtype determined using the signature of Table 1 with the COCA subtype determined using the gold standard COCA subtyper method described in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304.
The test set of samples can be any sample type provided herein such as, for example, fresh frozen or archived formalin-fixed paraffin-embedded (FFPE) cancer samples. In one embodiment, validation can comprise testing the expression of the classifiers in several fresh frozen publicly available array and/or RNAseq datasets and calling the subtype based on said expression levels and subsequently comparing the COCA subtype determined using the signature of Table 1 with the COCA subtype determined using the gold standard COCA subtyper method described in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.”
Cell 173, no. 2 (2018): 291-304. In other words, validation can comprise calling the subtypes of the several fresh frozen publicly available array and RNAseq test datasets using their expression levels and the CLaNC algorithm as described herein and comparing the subtype calls with the gold standard subtype calls as defined in Hoadley et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304. Final validation of the gene signature (e.g., Table 1) can then be performed in a newly collected dataset of archived formalin-fixed paraffin-embedded (FFPE) cancer samples to assure comparable performance in the FFPE samples. In one embodiment, the classifier biomarkers of Table 1 were selected based on the in silico CLaNC process described herein. The gene symbols and official gene names are listed in Table 1. Further to the above embodiments, the in silico CLaNC process can entail use of the CLaNC process described in Dabney (2005) Bioinformatics 21(22):4148-4154. In one embodiment, the in silico CLaNC process can entail use of CLaNC software described in Dabney A R. ClaNC: Point-and-click software for classifying microarrays to nearest centroids. Bioinformatics. 2006; 22: 122-123 or equivalents or derivatives related thereto.
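As a rough illustration of the nearest-centroid subtype calling, cross-validation and gold-standard comparison described above, the following Python sketch uses plain Euclidean nearest centroids on toy data. It is not the ClaNC software: ClaNC's gene selection and distance standardization are omitted, and the expression values, the two-gene panel and the function names (`train_centroids`, `classify`, `cross_validate`, `concordance`) are assumptions made purely for illustration.

```python
import math

def train_centroids(samples, labels):
    """Average the expression vectors of each subtype into one centroid per subtype."""
    sums, counts = {}, {}
    for vec, lab in zip(samples, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def classify(vec, centroids):
    """Call the subtype whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))

def cross_validate(samples, labels, k=5):
    """k-fold cross-validation accuracy; fold f holds out samples f, f+k, f+2k, ..."""
    n = len(samples)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [samples[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        centroids = train_centroids(train_x, train_y)
        correct += sum(classify(samples[i], centroids) == labels[i] for i in test_idx)
    return correct / n

def concordance(calls, gold_standard):
    """Fraction of samples whose call matches the gold-standard subtype."""
    return sum(c == g for c, g in zip(calls, gold_standard)) / len(gold_standard)

# Toy expression values for two hypothetical genes (illustrative only).
toy_samples = [
    [5.0, 1.0], [5.2, 0.8], [4.8, 1.2], [5.1, 0.9], [4.9, 1.1],
    [1.0, 5.0], [0.8, 5.2], [1.2, 4.8], [0.9, 5.1], [1.1, 4.9],
]
toy_labels = ["C2"] * 5 + ["C14"] * 5

toy_centroids = train_centroids(toy_samples, toy_labels)
toy_cv_accuracy = cross_validate(toy_samples, toy_labels, k=5)
```

In this sketch, `concordance` stands in for the comparison between signature-based calls and gold-standard calls; repeating `cross_validate` for different panel sizes would be one way to draw curves of the kind the text attributes to FIG. 3.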
- In one embodiment, the methods provided herein require the detection of the expression level of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 18, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30, at least 32, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83 or up to 84 classifier biomarkers (e.g., from Table 1) in a cancer sample obtained from a patient whose expression is altered in order to identify a COCA cancer subtype. The same applies for other classifier biomarker expression datasets as provided herein.
- In another embodiment, the methods provided herein require the detection of the expression level of a total of at least 2, at least 4, at least 6, at least 8, at least 10, at least 12, at least 14, at least 16, at least 18, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30, at least 32, at least 34, at least 36, at least 38, at least 40, at least 42, at least 44, at least 46, at least 48, at least 50, at least 52, at least 54, at least 56, at least 58, at least 60, at least 62, at least 64, at least 66, at least 68, at least 70, at least 72, at least 74, at least 76, at least 78, at least 80, at least 82 or up to 84 classifier biomarkers out of the 84 gene biomarkers of Table 1 in a cancer cell sample obtained from a patient in order to identify a COCA cancer subtype. In another embodiment, the methods provided herein require the detection of the expression level of a total of at least 4, at least 8, at least 12, at least 16, at least 20, at least 24, at least 28, at least 32, at least 36, at least 40, at least 44, at least 48, at least 52, at least 56, at least 60, at least 64, at least 68, at least 72, at least 76, at least 80 or up to 84 classifier biomarkers out of the 84 gene biomarkers of Table 1 in a cancer cell sample obtained from a patient in order to identify a COCA cancer subtype. The same applies for other classifier biomarker expression datasets as provided herein.
- In one embodiment, the expression level of one or more classifier biomarkers of Table 1 can be altered in a specific COCA subtype as detected in a sample obtained from a subject as described in any of the methods provided herein. The alteration of the expression level can be an “up-regulation” or “down-regulation” of the one or more classifier biomarkers of Table 1. In one embodiment, at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 18, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30, at least 32, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83 or up to 84 classifier biomarkers out of the 84 gene biomarkers of Table 1 are “up-regulated” in a specific COCA subtype of cancer. 
In another embodiment, at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 18, at least 20, at least 22, at least 24, at least 26, at least 28, at least 30, at least 32, at least 34, at least 35, at least 36, at least 37, at least 38, at least 39, at least 40, at least 41, at least 42, at least 43, at least 44, at least 45, at least 46, at least 47, at least 48, at least 49, at least 50, at least 51, at least 52, at least 53, at least 54, at least 55, at least 56, at least 57, at least 58, at least 59, at least 60, at least 61, at least 62, at least 63, at least 64, at least 65, at least 66, at least 67, at least 68, at least 69, at least 70, at least 71, at least 72, at least 73, at least 74, at least 75, at least 76, at least 77, at least 78, at least 79, at least 80, at least 81, at least 82, at least 83 or up to 84 classifier biomarkers out of the 84 gene biomarkers of Table 1 are “down-regulated” in a specific COCA subtype of cancer. In a still further embodiment, in methods provided herein utilizing more than one classifier biomarker (e.g., more than one classifier biomarker from Table 1) to determine a COCA subtype, the alteration in expression levels of the more than one classifier biomarkers can either be an up-regulation, a down-regulation or any combination thereof. Further to any of the above embodiments, the alteration of the expression level can be relative to or compared to a sample isolated from a healthy subject as defined herein. The sample obtained from the healthy subject can be from the same anatomical area of the body. The same applies for other classifier biomarker expression datasets as provided herein.
- In one embodiment, the expression level of an “up-regulated” biomarker as provided herein is increased by about 0.2-fold, about 0.5-fold, about 1-fold, about 1.5-fold, about 2-fold, about 2.5-fold, about 3-fold, about 3.5-fold, about 4-fold, about 4.5-fold, about 5-fold, and any values in between. In another embodiment, the expression level of a “down-regulated” biomarker as provided herein is decreased by about 0.2-fold, about 0.5-fold, about 1-fold, about 1.5-fold, about 2-fold, about 2.5-fold, about 3-fold, about 3.5-fold, about 4-fold, about 4.5-fold, about 5-fold, and any values in between.
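The fold-change language above can be illustrated with a small Python sketch that compares a biomarker's expression in a tumor sample against a healthy reference. The text does not define how fold change is computed, so the log2 ratio convention, the pseudocount, the threshold of one log2 unit and the function names (`log2_fold_change`, `call_regulation`) are all assumptions chosen for illustration.

```python
import math

def log2_fold_change(tumor, reference, pseudocount=1.0):
    """log2 ratio of tumor to reference expression; the pseudocount avoids division by zero."""
    return math.log2((tumor + pseudocount) / (reference + pseudocount))

def call_regulation(tumor, reference, min_abs_log2fc=1.0):
    """Label a biomarker as up-regulated, down-regulated or unchanged versus the reference."""
    lfc = log2_fold_change(tumor, reference)
    if lfc >= min_abs_log2fc:
        return "up-regulated"
    if lfc <= -min_abs_log2fc:
        return "down-regulated"
    return "unchanged"
```

A symmetric log2 threshold treats a doubling and a halving equally, which is one common design choice; other conventions for "x-fold" changes would need a different formula.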
- It is recognized that additional genes or proteins or molecular platforms can be used in the practice of the methods provided herein. In general, genes useful in classifying the COCA subtypes of cancer include those that are independently capable of distinguishing between normal versus tumor, or between different classes or grades of cancer. A gene is considered to be capable of reliably distinguishing between COCA subtypes if the area under the receiver operator characteristic (ROC) curve is approximately 1. Further, in general, molecular platforms that generate data that can be useful in classifying the COCA subtypes of cancer can include genome-wide platforms such as, for example, whole-exome DNA sequencing assays (e.g., Illumina HiSeq and GAII), DNA copy-number variation assays (e.g., Affymetrix 6.0 microarrays), DNA methylation assays (e.g., Illumina 450,000-feature microarrays), genome-wide mRNA level assays (e.g., Illumina mRNA-seq), microRNA level assays (e.g., Illumina microRNA-seq), and protein level assays for proteins and/or phosphorylated proteins (e.g., Reverse Phase Protein Arrays; RPPA).
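To make the ROC criterion above concrete, the following Python sketch computes the area under the ROC curve for a single gene as the fraction of (in-subtype, other) sample pairs in which the in-subtype sample has the higher expression, counting ties as one half. This pairwise formulation is a standard equivalent of the AUC; the function name `roc_auc` and its arguments are hypothetical.

```python
def roc_auc(scores_in_subtype, scores_other):
    """AUC as the probability that an in-subtype sample scores above another sample."""
    wins = 0.0
    for p in scores_in_subtype:
        for n in scores_other:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_in_subtype) * len(scores_other))
```

An AUC near 1 means the gene's expression separates the subtype from the remaining samples almost perfectly, while a value near 0.5 means it carries no discriminating information.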
- In one embodiment, a method is provided herein for determining a disease outcome or prognosis for a patient suffering from cancer. In some cases, the cancer can be any cancer known in the art and/or provided herein. The disease outcome or prognosis can be measured by examining the overall survival for a period of time or intervals (e.g., 0 to 36 months or 0 to 60 months). In one embodiment, survival is analyzed as a function of COCA subtype. In one embodiment, survival is analyzed as a function of COCA subtype across tissue of origin tumor types. In one embodiment, survival is analyzed as a function of COCA subtype within a tissue of origin tumor type (see, for example, FIGS. 6-8). The COCA subtype can be determined using the methods provided herein such as, for example, determining the expression of all or subsets of the genes in Table 1. Relapse-free and overall survival can be assessed using standard Kaplan-Meier plots as well as Cox proportional hazards modeling.
- In one embodiment, the methods and compositions as provided herein for determining a COCA subtype of a patient suffering or suspected of suffering from cancer are used to determine whether or not said patient is a candidate for treatment with a specific type or types of cancer therapy. The sample can be any type of sample obtained from the patient as provided herein. The cancer can be any type of cancer known in the art and/or provided herein. In one embodiment, determining the COCA subtype is one of a number of methods that can be employed to characterize the sample obtained from the patient such that determining the COCA subtype alone or in combination with one or more of the number of methods can be used to determine whether or not said patient is a candidate for treatment with a specific type or types of cancer therapy. In addition to assessing or determining a COCA subtype, the number of methods for characterizing the sample can entail determining a proliferation score, the tumor mutation burden (TMB), the tissue of origin subtype, the level of immune activation or any combination thereof. In one embodiment, one or all of the methods for characterizing the sample can be performed on RNA sequencing data obtained from the sample.
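The Kaplan-Meier survival assessment mentioned above can be sketched in Python as follows. This is a minimal product-limit estimator for one group of patients (for example, one COCA subtype); the function name `kaplan_meier`, the month-based follow-up times and the event coding (1 for an observed death, 0 for a censored patient) are assumptions for illustration, and no Cox modeling is attempted.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    times  -- follow-up time per patient (e.g., months)
    events -- 1 if the event (death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both deaths and censored patients leave the risk set
    return curve

# Toy follow-up data for a single hypothetical subtype (illustrative only).
toy_curve = kaplan_meier([6, 12, 12, 24, 36], [1, 1, 0, 1, 0])
```

Running the estimator once per subtype and plotting the resulting step functions would give per-subtype survival curves; comparing them statistically would need a further test that this sketch does not include.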
- In one embodiment, in addition to assessing the COCA subtype as provided herein, the characterization entails determining proliferation or a proliferation score. In one embodiment, proliferation or the proliferation score is determined using any method known in the art such as, for example, as provided in U.S. 62/789,668 filed Jan. 8, 2019, which is herein incorporated by reference.
- In one embodiment, in addition to determining the COCA subtype as provided herein, the characterization entails calculating a TMB value and/or rate. The TMB value and/or rate can be calculated using any method known in the art. In one embodiment, the TMB value and/or rate can be calculated from RNA (e.g., via transcriptome profiling or RNA sequencing) as provided in U.S. 62/771,702 filed Nov. 27, 2018 and U.S. 62/743,257 filed Oct. 9, 2018, each of which is herein incorporated by reference.
- The determination of whether or not said patient is a candidate for treatment with a specific type or types of cancer therapy can be based on the COCA subtype alone or in combination with other methods known in the art for characterizing a sample obtained from a patient suffering from or suspected of suffering from cancer. The other methods for characterizing said sample can be histologically based methods, gene expression based methods or a combination thereof. The histologically based methods can include histological cancer subtyping by one or more trained pathologists as well as the histological based methods of assessing proliferation such as, for example, determining the mitotic activity index. The gene expression based methods can include subtyping, assessment of TMB, assessment of tissue of origin subtype, immune subtyping or any combination thereof. The gene expression based methods can be assessed from DNA, RNA or a combination thereof. In one embodiment, the characterization of the sample obtained from the patient suffering from or suspected of suffering from cancer is performed on RNA obtained or isolated from the sample.
- The gene expression based tissue of origin cancer subtyping can be determined using gene signatures known in the art for specific types of cancer. In one embodiment, the tissue of origin of the cancer is the lung and the gene signature is selected from the gene signatures found in WO2017/201165, WO2017/201164, US20170114416 or U.S. Pat. No. 8,822,153, each of which is herein incorporated by reference in its entirety. In one embodiment, the tissue of origin cancer is head and neck squamous cell carcinoma (HNSCC) and the gene signature is selected from the gene signatures found in PCT/US18/45522 or PCT/US18/48862, each of which is herein incorporated by reference in its entirety. In one embodiment, the tissue of origin cancer is breast cancer and the gene signature is the PAM50 subtyper found in Parker J S et al., (2009) Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27:1160-1167, which is herein incorporated by reference in its entirety. In one embodiment, the tissue of origin cancer is bladder cancer (e.g., MIBC) and the gene signature is selected from the gene signatures found in 62/629,975 filed Feb. 13, 2018, which is herein incorporated by reference in its entirety. In one embodiment, the tissue of origin cancer is bladder cancer (e.g., MIBC) and the gene signature is selected from the gene signatures found in The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature volume 507, pages 315-322 (2014), or Robertson, A G, et al., Cell, 171(3): 540-556 (2017), each of which is herein incorporated by reference in its entirety.
- The gene expression based immune subtyping or immune cell activation can be determined using immune expression signatures known in the art such as, for example, the gene signatures found in Thorsson, V., Gibbs, D. L., Brown, S. D., Wolf, D., Bortone, D. S., Yang, T. H. O., Porta-Pardo, E., Gao, G. F., Plaisier, C. L., Eddy, J. A. and Ziv, E., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830, which is herein incorporated by reference in its entirety. In one embodiment, immune cell activation is determined by monitoring the immune cell signatures of Bindea et al. (Immunity 2013; 39(4); 782-795), the contents of which are herein incorporated by reference in their entirety. In one embodiment, the method further comprises measuring single gene immune biomarkers, such as, for example, CTLA4, PDCD1 and CD274 (PD-L1), PDCD1LG2 (PD-L2) and/or IFN gene signatures. In one embodiment, the level of immune cell activation is determined by measuring gene expression signatures of immunomarkers. The immunomarkers can be measured in the same and/or different sample used to determine the COCA subtype as described herein. The immunomarkers can be those found in WO2017/201165 and WO2017/201164, each of which is herein incorporated by reference in its entirety.
- The gene expression based method for calculating a TMB value and/or rate can be any method known in the art. In one embodiment, the TMB value and/or rate can be calculated from RNA (e.g., via transcriptome profiling or RNA sequencing) as provided in U.S. 62/771,702 filed Nov. 27, 2018 and U.S. 62/743,257 filed Oct. 9, 2018, each of which is herein incorporated by reference.
- In one embodiment, upon determining a patient's COCA subtype (e.g., by measuring the expression of all or subsets of the genes in Table 1), the patient is selected for suitable therapy, for example, radiotherapy (radiation therapy), surgical intervention, targeted therapy, chemotherapy or drug therapy with an angiogenesis inhibitor or immunotherapy or combinations thereof. In some embodiments, the suitable treatment can be any treatment or therapeutic method that can be used for a cancer patient. In one embodiment, upon determining a patient's COCA subtype, the patient is administered a suitable therapeutic agent, for example chemotherapeutic agent(s) or an angiogenesis inhibitor or immunotherapeutic agent(s). In one embodiment, the therapy is immunotherapy, and the immunotherapeutic agent is a checkpoint inhibitor, monoclonal antibody, biological response modifier, therapeutic vaccine or cellular immunotherapy. In some embodiments, the determination of a suitable treatment can identify treatment responders. In some embodiments, the determination of a suitable treatment can identify treatment non-responders. In some embodiments, upon determining a patient's COCA subtype, the cancer patient can be selected for any combination of suitable therapies, for example, chemotherapy or drug therapy with a radiotherapy, a tumor dissection with an immunotherapy or a chemotherapeutic agent with a radiotherapy. In some embodiments, the immunotherapy or immunotherapeutic agent can be a checkpoint inhibitor, monoclonal antibody, biological response modifier, therapeutic vaccine or cellular immunotherapy.
- The methods provided herein are also useful for evaluating clinical response to therapy, as well as for endpoints in clinical trials for efficacy of new therapies. The extent to which sequential diagnostic expression profiles move towards normal can be used as one measure of the efficacy of the candidate therapy.
- In one embodiment, the methods provided herein also find use in predicting response to different lines of therapies based on the COCA subtype of cancer alone or in combination with other characterization methods as described herein (e.g., tissue of origin cancer subtype, immune subtype, proliferation and/or TMB status). For example, chemotherapeutic response can be improved by more accurately assigning tumor cell of origin subtypes. Likewise, treatment regimens can be formulated based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., tissue of origin cancer subtype, immune subtype, proliferation and/or TMB status).
- Immunotherapy
- In one embodiment, provided herein is a method for determining whether a cancer patient is likely to respond to immunotherapy by determining the COCA subtype of cancer of a sample obtained from the patient and, based on the COCA subtype, assessing whether the patient is likely to respond to immunotherapy. In another embodiment, provided herein is a method of selecting a patient suffering from cancer for immunotherapy by determining a COCA subtype of a sample from the patient and, based on the COCA subtype, selecting the patient for immunotherapy. The determination of the COCA subtype of the sample obtained from the patient can be performed using any method for COCA subtyping known in the art. The determination of the COCA subtype of the sample obtained from the patient can be performed using any method for COCA subtyping provided herein. In one embodiment, the sample obtained from the patient has been previously diagnosed as being a particular type of cancer, and the methods provided herein are used to determine the COCA subtype of the sample. The previous diagnosis can be based on a histological analysis. The histological analysis can be performed by one or more pathologists. In one embodiment, the COCA subtyping is performed via gene expression analysis of a set or panel of biomarkers or subsets thereof in order to generate an expression profile. The gene expression analysis can be performed on a tumor sample obtained from a patient in order to determine the presence, absence or level of expression of one or more biomarkers selected from a publically available pan-cancer database described herein and/or Table 1 provided herein. 
The COCA subtype can be selected from the group consisting of C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA). The immunotherapy can be any immunotherapy provided herein. In one embodiment, the immunotherapy comprises administering one or more checkpoint inhibitors. The checkpoint inhibitors can be any checkpoint inhibitor provided herein such as, for example, a checkpoint inhibitor that targets PD-1, PD-L1 or CTLA4.
- As disclosed herein, the biomarker panels, or subsets thereof, can be those disclosed in any publically available pan-cancer gene expression dataset or datasets. In one embodiment, the biomarker panel or subset thereof is, for example, The Cancer Genome Atlas pan-cancer mRNA expression dataset. In one embodiment, the biomarker panel or subset thereof is, for example, the pan-cancer mRNA expression dataset disclosed in Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304, the contents of which are herein incorporated by reference in their entirety. In one embodiment, the biomarker panel or subset thereof is, for example, the gene expression signature disclosed in Table 1 in combination with one or more biomarkers from a publically available pan-cancer expression dataset.
- In one embodiment, from about 1 to about 4, about 4 to about 8, from about 4 to about 12, from about 4 to about 16, from about 4 to about 20, from about 4 to about 24, from about 4 to about 28, from about 4 to about 32, from about 4 to about 36, from about 4 to about 40, from about 4 to about 44, from about 4 to about 48, from about 4 to about 52, from about 4 to about 56, from about 4 to about 60, from about 4 to about 64, from about 4 to about 68, from about 4 to about 72, from about 4 to about 76, from about 4 to about 80 or from about 4 to about 84 of the biomarkers in any of the pan-cancer gene expression datasets provided herein, including, for example, Table 1 for a tumor sample are detected in a method to determine the COCA subtype as provided herein. In another embodiment, each of the biomarkers from any one of the pan-cancer gene expression datasets provided herein, including, for example, Table 1 for a tumor sample are detected in a method to determine the COCA subtype as provided herein.
- In one embodiment, the methods provided herein further comprise determining the presence, absence or level of immune activation in a COCA subtype. The presence or level of immune cell activation can be determined by creating an expression profile or detecting the expression of one or more biomarkers associated with innate immune cells and/or adaptive immune cells associated with each COCA subtype in a sample obtained from a patient. In one embodiment, immune cell activation associated with a COCA subtype of cancer is determined by monitoring the immune cell signatures of Thorsson, V. et al., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830, Bindea et al. (Immunity 2013; 39(4); 782-795), Faruki H. et al., JTO, 12(6): 943-953 (2017), Charoentong P. et al., Cell Reports, 18, 248-262 (2017) and/or WO2017/201165 and WO2017/201164, the contents of each of which are herein incorporated by reference in their entirety. In one embodiment, the method further comprises measuring single gene immune biomarkers, such as, for example, CTLA4, PDCD1 and CD274 (PD-L1), PDCD1LG2 (PD-L2) and/or IFN gene signatures. The presence or a detectable level of immune activation (innate and/or adaptive) associated with a COCA subtype can indicate or predict that a patient with said COCA subtype may be amenable to immunotherapy. The immunotherapy can be treatment with a checkpoint inhibitor as provided herein. In one embodiment, a method provided herein for detecting the expression of at least one classifier biomarker provided herein in a sample (e.g., tumor sample) obtained from a patient further comprises administering an immunotherapeutic agent following detection of immune activation as provided herein in said sample.
- In one embodiment, the method comprises determining a COCA subtype of a tumor sample and subsequently determining a level of immune cell activation of said subtype. In one embodiment, the subtype is determined by determining the expression levels of one or more classifier biomarkers at the nucleic acid level using sequencing (e.g., RNASeq), amplification (e.g., qRT-PCR) or hybridization assays (e.g., microarray analysis) as described herein. The one or more biomarkers can be selected from a publically available database (e.g., TCGA pan-cancer mRNA expression datasets or any other publically available pan-cancer gene expression datasets provided herein). In some embodiments, the biomarkers of Table 1 can be used to specifically determine the COCA subtype of a tumor sample obtained from a patient. In one embodiment, the level of immune cell activation is determined by measuring gene expression signatures of immunomarkers. The immunomarkers can be measured in the same and/or different sample used to subtype the tumor sample as described herein. The immunomarkers that can be measured can comprise, consist of, or consist essentially of innate immune cell (IIC) and/or adaptive immune cell (AIC) gene signatures, interferon (IFN) gene signatures, individual immunomarkers, major histocompatibility complex class II (MHC class II) genes or a combination thereof. The gene expression signatures for IICs, AICs, IFN and MHC class II can be any known gene signatures for said cell types or genes known in the art. For example, the immune gene signatures can be those from Bindea et al. (Immunity 2013; 39(4); 782-795), Faruki H. et al., JTO, 12(6): 943-953 (2017), Charoentong P. et al., Cell Reports, 18, 248-262 (2017) and/or WO2017/201165 and WO2017/201164. The individual immunomarkers can be CTLA4, PDCD1 and CD274 (PD-L1). In one embodiment, immune subtyping or immune cell activation can be determined using the gene signatures found in Thorsson, V., Gibbs, D. L., Brown, S. D., Wolf, D., Bortone, D. S., Yang, T. H. O., Porta-Pardo, E., Gao, G. F., Plaisier, C. L., Eddy, J. A. and Ziv, E., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830.
- In one embodiment, upon determining a patient's COCA cancer subtype using any of the methods and classifier biomarker panels or subsets thereof as provided herein, the patient is selected for treatment with or administered an immunotherapeutic agent. The immunotherapeutic agent can be a checkpoint inhibitor, monoclonal antibody, biological response modifier, therapeutic vaccine or cellular immunotherapy.
- In another embodiment, the immunotherapeutic agent is a checkpoint inhibitor. In some cases, a method for determining the likelihood of response to one or more checkpoint inhibitors is provided. In one embodiment, the checkpoint inhibitor is a PD-1/PD-L1 checkpoint inhibitor. The PD-1/PD-L1 checkpoint inhibitor can be nivolumab, pembrolizumab, atezolizumab, durvalumab, lambrolizumab, or avelumab. In one embodiment, the checkpoint inhibitor is a CTLA-4 checkpoint inhibitor. The CTLA-4 checkpoint inhibitor can be ipilimumab or tremelimumab. In one embodiment, the checkpoint inhibitor is a combination of checkpoint inhibitors such as, for example, a combination of one or more PD-1/PD-L1 checkpoint inhibitors used in combination with one or more CTLA-4 checkpoint inhibitors.
- In one embodiment, the immunotherapeutic agent is a monoclonal antibody. In some cases, a method for determining the likelihood of response to one or more monoclonal antibodies is provided. The monoclonal antibody can be directed against tumor cells or directed against tumor products. The monoclonal antibody can be panitumumab, matuzumab, necitumumab, trastuzumab, amatuximab, bevacizumab, ramucirumab, bavituximab, patritumab, rilotumumab, cetuximab, IMMU-132, or demcizumab.
- In yet another embodiment, the immunotherapeutic agent is a therapeutic vaccine. In some cases, a method for determining the likelihood of response to one or more therapeutic vaccines is provided. The therapeutic vaccine can be a peptide or tumor cell vaccine. The vaccine can target MAGE-3 antigens, NY-ESO-1 antigens, p53 antigens, survivin antigens, or MUC1 antigens. The therapeutic cancer vaccine can be GVAX (GM-CSF gene-transfected tumor cell vaccine), belagenpumatucel-L (allogeneic tumor cell vaccine made with four irradiated NSCLC cell lines modified with TGF-beta2 antisense plasmid), MAGE-A3 vaccine (composed of MAGE-A3 protein and adjuvant AS15), (1)-BLP-25 anti-MUC-1 (targets MUC-1 expressed on tumor cells), CimaVax EGF (vaccine composed of human recombinant Epidermal Growth Factor (EGF) conjugated to a carrier protein), WT1 peptide vaccine (composed of four Wilms' tumor suppressor gene analogue peptides), CRS-207 (live-attenuated Listeria monocytogenes vector encoding human mesothelin), Bec2/BCG (induces anti-GD3 antibodies), GV1001 (targets the human telomerase reverse transcriptase), TG4010 (targets the MUC1 antigen), racotumomab (anti-idiotypic antibody which mimics the NGcGM3 ganglioside that is expressed on multiple human cancers), tecemotide (liposomal BLP25; liposome-based vaccine made from tandem repeat region of MUC1) or DRibbles (a vaccine made from nine cancer antigens plus TLR adjuvants).
- In one embodiment, the immunotherapeutic agent is a biological response modifier. In some cases, a method for determining the likelihood of response to one or more biological response modifiers is provided. The biological response modifier can trigger inflammation such as, for example, PF-3512676 (CpG 7909) (a toll-like receptor 9 agonist), CpG-ODN 2006 (downregulates Tregs), Bacillus Calmette-Guerin (BCG), or Mycobacterium vaccae (SRL172) (nonspecific immune stimulants now often tested as adjuvants). The biological response modifier can be cytokine therapy such as, for example, IL-2+tumor necrosis factor alpha (TNF-alpha) or interferon alpha (induces T-cell proliferation), interferon gamma (induces tumor cell apoptosis), or Mda-7 (IL-24) (Mda-7/IL-24 induces tumor cell apoptosis and inhibits tumor angiogenesis). The biological response modifier can be a colony-stimulating factor such as, for example, granulocyte colony-stimulating factor. The biological response modifier can be a multi-modal effector such as, for example, multi-target VEGFR: thalidomide and analogues such as lenalidomide and pomalidomide, cyclophosphamide, cyclosporine, denileukin diftitox, talactoferrin, trabectedin or all-trans-retinoic acid.
- In one embodiment, the immunotherapy is cellular immunotherapy. In some cases, a method for determining the likelihood of response to one or more cellular therapeutic agents is provided. The cellular immunotherapeutic agent can be dendritic cells (DCs) (ex vivo generated DC-vaccines loaded with tumor antigens), T-cells (ex vivo generated lymphokine-activated killer cells; cytokine-induced killer cells; activated T-cells; gamma delta T-cells), or natural killer cells.
- In some cases, specific COCA subtypes of cancer have different levels of immune activation (e.g., innate immunity and/or adaptive immunity) such that COCA subtypes with elevated or detectable immune activation (e.g., innate immunity and/or adaptive immunity) are selected for treatment with one or more immunotherapeutic agents described herein. In some cases, specific COCA subtypes of cancer have high or elevated levels of immune activation. In some cases, the C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and/or C28 (THCA) subtype has elevated levels of immune activation (e.g., innate immunity and/or adaptive immunity) as compared to other COCA subtypes. In some cases, the C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRK/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and/or C28 (THCA) subtype has reduced levels of immune activation (e.g., innate immunity and/or adaptive immunity) as compared to other COCA subtypes. In one embodiment, COCA subtypes with low levels of or no immune activation (e.g., innate immunity and/or adaptive immunity) are not selected for treatment with one or more immunotherapeutic agents described herein.
- In one embodiment, upon determining a patient's or subject's COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), the patient is selected for drug therapy with an angiogenesis inhibitor.
- In one embodiment, the angiogenesis inhibitor is a vascular endothelial growth factor (VEGF) inhibitor, a VEGF receptor inhibitor, a platelet derived growth factor (PDGF) inhibitor or a PDGF receptor inhibitor.
- In general, methods of determining whether a patient is likely to respond to angiogenesis inhibitor therapy, or methods of selecting a patient for angiogenesis inhibitor therapy are provided herein. In one embodiment, the method comprises determining a COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) and probing a sample from the patient for the levels of at least five hypoxia biomarkers selected from the group consisting of RRAGD, FABP5, UCHL1, GAL, PLOD, DDIT4, VEGF, ADM, ANGPTL4, NDRG1, NP, SLC16A3, and C14ORF58 (see Table A) at the nucleic acid level. In a further embodiment, the probing step comprises mixing the sample with five or more oligonucleotides that are substantially complementary to portions of nucleic acid molecules of the at least five biomarkers under conditions suitable for hybridization of the five or more oligonucleotides to their complements or substantial complements, detecting whether hybridization occurs between the five or more oligonucleotides to their complements or substantial complements; and obtaining hybridization values of the sample based on the detecting steps. The hybridization values of the sample are then compared to reference hybridization value(s) from at least one sample training set, wherein the at least one sample training set comprises (i) hybridization value(s) of the at least five biomarkers from a sample that overexpresses the at least five biomarkers, or overexpresses a subset of the at least five biomarkers, (ii) hybridization values of the at least five biomarkers from a reference cancer of COCA subtype specific sample, or (iii) hybridization values of the at least five biomarkers from a control or healthy sample. 
A determination of whether the patient is likely to respond to angiogenesis inhibitor therapy, or a selection of the patient for angiogenesis inhibitor is then made based upon (i) the patient's COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) and (ii) the results of comparison.
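- By way of a non-limiting sketch, the comparison of a sample's hybridization values to reference hybridization values from a training set could be carried out as a nearest-reference lookup. The example below assumes Euclidean distance as the similarity measure and uses invented values for five of the hypoxia biomarkers; the function name `nearest_reference`, the reference labels, and all numbers are hypothetical and serve only to show the shape of the comparison.

```python
from math import dist


def nearest_reference(sample_values, reference_profiles):
    """Return (label, distances) for the reference profile closest to the sample.

    Each profile lists hybridization values in the same marker order as the
    sample. Euclidean distance is an assumption of this sketch.
    """
    for label, profile in reference_profiles.items():
        if len(profile) != len(sample_values):
            raise ValueError(f"profile {label!r} has a different marker count")
    distances = {
        label: dist(sample_values, profile)
        for label, profile in reference_profiles.items()
    }
    best_label = min(distances, key=distances.get)
    return best_label, distances


# Marker order shared by every profile below.
markers = ["RRAGD", "FABP5", "UCHL1", "GAL", "PLOD"]

# Hypothetical reference hybridization values (arbitrary log2 units).
reference_profiles = {
    "overexpressing": [9.0, 8.5, 9.2, 8.8, 9.1],
    "subtype_reference": [7.0, 6.8, 7.2, 6.9, 7.1],
    "control": [5.0, 4.8, 5.3, 5.1, 4.9],
}

# Hypothetical values obtained from the patient sample.
sample_values = [8.6, 8.1, 9.0, 8.4, 8.7]

label, distances = nearest_reference(sample_values, reference_profiles)
print(label)
```

In practice the reference values would come from the training sets described above, and a correlation-based or model-based classifier could replace the distance measure.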
-
TABLE A. Biomarkers for hypoxia profile

Abbreviation | Name | GenBank Accession No.
RRAGD | Ras-related GTP binding D | BC003088
FABP5 | fatty acid binding protein 5 | M94856
UCHL1 | ubiquitin carboxyl-terminal esterase L1 | NM_004181
GAL | Galanin | BC030241
PLOD | procollagen-lysine, 2-oxoglutarate 5-dioxygenase lysine hydroxylase | M98252
DDIT4 | DNA-damage-inducible transcript 4 | NM_019058
VEGF | vascular endothelial growth factor | M32977
ADM | Adrenomedullin | NM_001124
ANGPTL4 | angiopoietin-like 4 | AF202636
NDRG1 | N-myc downstream regulated gene 1 | NM_006096
NP | nucleoside phosphorylase | NM_000270
SLC16A3 | solute carrier family 16 monocarboxylic acid transporters, member 3 | NM_004207
C14ORF58 | chromosome 14 open reading frame 58 | AK000378

- The aforementioned set of thirteen biomarkers, or a subset thereof, is also referred to herein as a “hypoxia profile”.
- In one embodiment, the method provided herein includes determining the levels of at least five biomarkers, at least six biomarkers, at least seven biomarkers, at least eight biomarkers, at least nine biomarkers, or at least ten biomarkers, or five to thirteen, six to thirteen, seven to thirteen, eight to thirteen, nine to thirteen or ten to thirteen biomarkers selected from RRAGD, FABP5, UCHL1, GAL, PLOD, DDIT4, VEGF, ADM, ANGPTL4, NDRG1, NP, SLC16A3, and C14ORF58 in a sample obtained from a subject. Biomarker expression in some instances may be normalized against the expression levels of all RNA transcripts or their expression products in the sample, or against a reference set of RNA transcripts or their expression products. The reference set as explained throughout, may be an actual sample that is tested in parallel with the sample, or may be a reference set of values from a database or stored dataset. Levels of expression, in one embodiment, are reported in number of copies, relative fluorescence value or detected fluorescence value. The level of expression of the biomarkers of the hypoxia profile together with the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) as determined using the methods provided herein can be used in the methods described herein to determine whether a patient is likely to respond to angiogenesis inhibitor therapy.
- In one embodiment, the levels of expression of the thirteen biomarkers (or subsets thereof, as described above, e.g., five or more, from about five to about 13), are normalized against the expression levels of all RNA transcripts or their non-natural cDNA expression products, or protein products in the sample, or of a reference set of RNA transcripts or a reference set of their non-natural cDNA expression products, or a reference set of their protein products in the sample.
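- One possible reading of normalization against a reference set of RNA transcripts is sketched below: each biomarker's log2 expression is expressed relative to the mean log2 expression of the reference transcripts measured in the same sample. The function name `normalize_to_reference`, the placeholder reference transcript names `REF_A` and `REF_B`, and all values are assumptions for illustration; the disclosure does not specify a particular normalization formula.

```python
from statistics import mean


def normalize_to_reference(log2_expression, reference_transcripts):
    """Subtract the mean reference level from each non-reference transcript.

    `log2_expression` maps transcript name -> log2 expression value.
    `reference_transcripts` names the transcripts used as the reference set.
    """
    reference_transcripts = set(reference_transcripts)
    missing = reference_transcripts - log2_expression.keys()
    if missing:
        raise KeyError(f"reference transcripts not measured: {sorted(missing)}")
    reference_level = mean(log2_expression[t] for t in reference_transcripts)
    return {
        name: value - reference_level
        for name, value in log2_expression.items()
        if name not in reference_transcripts
    }


# Hypothetical log2 expression values for three hypoxia biomarkers and two
# placeholder reference transcripts.
sample = {"VEGF": 10.0, "ADM": 8.0, "NDRG1": 9.0, "REF_A": 7.0, "REF_B": 9.0}
normalized = normalize_to_reference(sample, ["REF_A", "REF_B"])
print(normalized)
```

Normalizing against all RNA transcripts in the sample would follow the same pattern, with the reference level computed over every measured transcript.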
- In one embodiment, angiogenesis inhibitor treatments include, but are not limited to, an integrin antagonist, a selectin antagonist, an adhesion molecule antagonist, an antagonist of intercellular adhesion molecule (ICAM)-1, ICAM-2, ICAM-3, platelet endothelial adhesion molecule (PCAM), vascular cell adhesion molecule (VCAM), or lymphocyte function-associated antigen 1 (LFA-1), a basic fibroblast growth factor antagonist, a vascular endothelial growth factor (VEGF) modulator, or a platelet derived growth factor (PDGF) modulator (e.g., a PDGF antagonist).
- In one embodiment of determining whether a subject is likely to respond to an integrin antagonist, the integrin antagonist is a small molecule integrin antagonist, for example, an antagonist described by Paolillo et al. (Mini Rev Med Chem, 2009, volume 12, pp. 1439-1446, incorporated by reference in its entirety), or a leukocyte adhesion-inducing cytokine or growth factor antagonist (e.g., tumor necrosis factor-α (TNF-α), interleukin-1β (IL-1β), monocyte chemotactic protein-1 (MCP-1) and a vascular endothelial growth factor (VEGF)), as described in U.S. Pat. No. 6,524,581, incorporated by reference in its entirety herein.
- The methods provided herein are also useful for determining whether a subject is likely to respond to one or more of the following angiogenesis inhibitors: interferon gamma 1β, interferon gamma 1β (Actimmune®) with pirfenidone, ACUHTR028, αVβ5, aminobenzoate potassium, amyloid P, ANG1122, ANG1170, ANG3062, ANG3281, ANG3298, ANG4011, anti-CTGF RNAi, Aplidin, Astragalus membranaceus extract with salvia and Schisandra chinensis, atherosclerotic plaque blocker, Azol, AZX100, BB3, connective tissue growth factor antibody, CT140, danazol, Esbriet, EXC001, EXC002, EXC003, EXC004, EXC005, F647, FG3019, Fibrocorin, Follistatin, FT011, a galectin-3 inhibitor, GKT137831, GMCT01, GMCT02, GRMD01, GRMD02, GRN510, Heberon Alfa R, interferon α-2β, ITMN520, JKB119, JKB121, JKB122, KRX168, LPA1 receptor antagonist, MGN4220, MIA2, microRNA 29a oligonucleotide, MMI0100, noscapine, PBI4050, PBI4419, PDGFR inhibitor, PF-06473871, PGN0052, Pirespa, Pirfenex, pirfenidone, plitidepsin, PRM151, Px102, PYN17, PYN22 with PYN17, Relivergen, rhPTX2 fusion protein, RXI109, secretin, STX100, TGF-β Inhibitor, transforming growth factor, β-receptor 2 oligonucleotide, VA999260, XV615 or a combination thereof.
- In another embodiment, a method is provided for determining whether a subject is likely to respond to one or more endogenous angiogenesis inhibitors. In a further embodiment, the endogenous angiogenesis inhibitor is endostatin (a 20 kDa C-terminal fragment derived from type XVIII collagen), angiostatin (a 38 kDa fragment of plasmin), or a member of the thrombospondin (TSP) family of proteins. In a further embodiment, the angiogenesis inhibitor is TSP-1, TSP-2, TSP-3, TSP-4 or TSP-5. Methods for determining the likelihood of response to one or more of the following angiogenesis inhibitors are also provided: a soluble VEGF receptor, e.g., soluble VEGFR-1 and neuropilin 1 (NRP1), angiopoietin-1, angiopoietin-2, vasostatin, calreticulin, platelet factor-4, a tissue inhibitor of metalloproteinase (TIMP) (e.g., TIMP1, TIMP2, TIMP3, TIMP4), cartilage-derived angiogenesis inhibitor (e.g., peptide troponin I and chondromodulin I), a disintegrin and metalloproteinase with thrombospondin motif 1, an interferon (IFN) (e.g., IFN-α, IFN-β, IFN-γ), a chemokine, e.g., a chemokine having the C-X-C motif (e.g., CXCL10, also known as interferon gamma-induced protein 10 or small inducible cytokine B10), an interleukin cytokine (e.g., IL-4, IL-12, IL-18), prothrombin, antithrombin III fragment, prolactin, the protein encoded by the TNFSF15 gene, osteopontin, maspin, canstatin, and proliferin-related protein.
- In one embodiment, a method is provided for determining the likelihood of response to one or more of the following angiogenesis inhibitors: angiopoietin-1, angiopoietin-2, angiostatin, endostatin, vasostatin, thrombospondin, calreticulin, platelet factor-4, TIMP, CDAI, interferon α, interferon β, vascular endothelial growth factor inhibitor (VEGI), meth-1, meth-2, prolactin, VEGI, SPARC, osteopontin, maspin, canstatin, proliferin-related protein (PRP), restin, TSP-1, TSP-2, interferon gamma 1β, ACUHTR028, αVβ5, aminobenzoate potassium, amyloid P, ANG1122, ANG1170, ANG3062, ANG3281, ANG3298, ANG4011, anti-CTGF RNAi, Aplidin, Astragalus membranaceus extract with salvia and Schisandra chinensis, atherosclerotic plaque blocker, Azol, AZX100, BB3, connective tissue growth factor antibody, CT140, danazol, Esbriet, EXC001, EXC002, EXC003, EXC004, EXC005, F647, FG3019, Fibrocorin, Follistatin, FT011, a galectin-3 inhibitor, GKT137831, GMCT01, GMCT02, GRMD01, GRMD02, GRN510, Heberon Alfa R, interferon α-2β, ITMN520, JKB119, JKB121, JKB122, KRX168, LPA1 receptor antagonist, MGN4220, MIA2, microRNA 29a oligonucleotide, MMI0100, noscapine, PBI4050, PBI4419, PDGFR inhibitor, PF-06473871, PGN0052, Pirespa, Pirfenex, pirfenidone, plitidepsin, PRM151, Px102, PYN17, PYN22 with PYN17, Relivergen, rhPTX2 fusion protein, RXI109, secretin, STX100, TGF-β Inhibitor, transforming growth factor, β-receptor 2 oligonucleotide, VA999260, XV615 or a combination thereof.
- In yet another embodiment, the angiogenesis inhibitor can include pazopanib (Votrient), sunitinib (Sutent), sorafenib (Nexavar), axitinib (Inlyta), ponatinib (Iclusig), vandetanib (Caprelsa), cabozantinib (Cometriq), ramucirumab (Cyramza), regorafenib (Stivarga), ziv-aflibercept (Zaltrap), motesanib, or a combination thereof. In another embodiment, the angiogenesis inhibitor is a VEGF inhibitor. In a further embodiment, the VEGF inhibitor is axitinib, cabozantinib, aflibercept, brivanib, tivozanib, ramucirumab or motesanib. In yet a further embodiment, the angiogenesis inhibitor is motesanib.
- In one embodiment, the methods provided herein relate to determining a subject's likelihood of response to an antagonist of a member of the platelet derived growth factor (PDGF) family, for example, a drug that inhibits, reduces or modulates the signaling and/or activity of PDGF-receptors (PDGFR). For example, the PDGF antagonist, in one embodiment, is an anti-PDGF aptamer, an anti-PDGF antibody or fragment thereof, an anti-PDGFR antibody or fragment thereof, or a small molecule antagonist. In one embodiment, the PDGF antagonist is an antagonist of the PDGFR-α or PDGFR-β. In one embodiment, the PDGF antagonist is the anti-PDGF-β aptamer E10030, sunitinib, axitinib, sorafenib, imatinib, imatinib mesylate, nintedanib, pazopanib HCl, ponatinib, MK-2461, dovitinib, pazopanib, crenolanib, PP-121, telatinib, KRN 633, CP 673451, TSU-68, Ki8751, amuvatinib, tivozanib, masitinib, motesanib diphosphate, dovitinib dilactic acid, or linifanib (ABT-869).
- Upon making a determination of whether a patient is likely to respond to angiogenesis inhibitor therapy, or selecting a patient for angiogenesis inhibitor therapy, in one embodiment, the patient is administered the angiogenesis inhibitor. The angiogenesis inhibitor can be any of the angiogenesis inhibitors described herein.
- In one embodiment, provided herein is a method for determining whether a patient is likely to respond to radiotherapy by determining the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample obtained from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), assessing whether the patient is likely to respond to or benefit from radiotherapy. In another embodiment, provided herein is a method of selecting a patient suffering from cancer for radiotherapy by determining a COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), selecting the patient for radiotherapy.
- In some embodiments, the radiotherapy can include, but is not limited to, proton therapy and external-beam radiation therapy. In some embodiments, the radiotherapy can include any type or form of treatment that is suitable for patients with specific types of cancer.
- In some embodiments, a patient with a specific type of cancer can have or display resistance to radiotherapy. Radiotherapy resistance in any cancer or subtype thereof can be determined by measuring or detecting the expression levels of one or more genes known in the art and/or provided herein associated with or related to the presence of radiotherapy resistance. Genes associated with radiotherapy resistance can include NFE2L2, KEAP1 and CUL3. In some embodiments, radiotherapy resistance can be associated with the alterations of KEAP1 (Kelch-like ECH-associated protein 1)/NRF2 (nuclear factor E2-related factor 2) pathway. Association of a particular gene to radiotherapy resistance can be determined by examining expression of said gene in one or more patients known to be radiotherapy non-responders and comparing expression of said gene in one or more patients known to be radiotherapy responders.
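- The comparison of a gene's expression in known radiotherapy non-responders against known responders could, as one hedged illustration, be summarized by a difference in mean log2 expression together with a Welch t-statistic. The function name `compare_groups`, the choice of statistic, and the expression values below are assumptions made for this sketch; the disclosure does not name a specific statistical test.

```python
from math import sqrt
from statistics import mean, variance


def compare_groups(non_responders, responders):
    """Return (log2 difference, Welch t-statistic) between two groups.

    Inputs are log2 expression values of one gene, one value per patient.
    A positive difference means higher expression in non-responders.
    """
    if len(non_responders) < 2 or len(responders) < 2:
        raise ValueError("each group needs at least two patients")
    difference = mean(non_responders) - mean(responders)
    standard_error = sqrt(
        variance(non_responders) / len(non_responders)
        + variance(responders) / len(responders)
    )
    if standard_error == 0:
        raise ValueError("no variation within either group")
    return difference, difference / standard_error


# Hypothetical log2 expression of one candidate gene in each patient group.
non_responder_values = [8.2, 8.6, 8.4, 8.8]
responder_values = [6.1, 6.5, 6.3, 6.7]

difference, t_statistic = compare_groups(non_responder_values, responder_values)
print(round(difference, 2), round(t_statistic, 1))
```

A large positive or negative statistic would flag the gene as a candidate for association with resistance, subject to appropriate multiple-testing control when many genes are screened.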
- In one embodiment, provided herein is a method for determining whether a cancer patient is likely to respond to surgical intervention by determining the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample obtained from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), assessing whether the patient is likely to respond to or benefit from surgery. In another embodiment, provided herein is a method of selecting a patient suffering from cancer for surgery by determining a COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.) of a sample from the patient and, based on the COCA subtype alone or in combination with other characterization methods as described herein (e.g., determining tissue of origin cancer subtype, proliferation signature or score, immune subtype and/or TMB status, etc.), selecting the patient for surgery. In some embodiments, the surgery can include laser technology, excision, dissection, and reconstructive surgery.
- The present disclosure provides methods for predicting overall survival rate for a cancer patient. In some embodiments, the prediction of overall survival rate can involve obtaining a tumor sample from a cancer patient. In some embodiments, the cancer patients can have various stages of cancers. In some embodiments, the overall survival rate can be determined by detecting the expression level of at least one subtype classifier of a publicly available pan-cancer database or dataset. In some embodiments, an overall survival rate can be determined by detecting the expression level (e.g., protein and/or nucleic acid) of any subtype classifiers that are relevant across many types of cancer, for example, subtype classifiers relevant to cell of origin. In one embodiment, the subtype classifiers can be all or a subset of classifiers from Table 1. In some embodiments, the identification of the cell of origin (COCA) subtype is indicative of the overall survival in the patient. In some embodiments, the COCA subtype is selected from C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRK/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM and C28 THCA.
- The present disclosure provides methods for predicting nodal metastasis for a cancer patient. In some embodiments, the prediction of nodal metastasis can involve obtaining a tumor sample from a patient. In some embodiments, the patients can have various stages of cancers. In some embodiments, the nodal metastasis can be determined by detecting the expression level of at least one subtype classifier from a pan-cancer gene set. The pan-cancer gene set can be a publicly available pan-cancer database or a gene set provided herein (e.g., Table 1) or a combination thereof. The publicly available pan-cancer gene set can be a TCGA pan-cancer gene set. In one embodiment, nodal metastasis of cancer can be determined by detecting the expression level of all the subtype classifiers or subsets thereof of the classifiers found in Table 1.
- In some embodiments, the C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRK/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype can be more likely to be associated with nodal metastasis compared with other subtypes. In some embodiments, the C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRK/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype can be most likely associated with positive lymph node metastasis compared with other subtypes. In some embodiments, the C1 ACC/PCPG, C2 GBM/LGG, C3 OV, C4 Squamous-like, C6 LUAD-Enriched, C8 PAAD/some STAD, C9 UCS, C10 BRCA/Basal, C12 UCEC, C14 PRAD, C15 CESC (subset of cervical), C16 BLCA, C17 TGCT, C19 COAD/READ, C20 SARC/MESO, C21 KIRK/KICH/KIRP, C22 Liver, C24 BRCA/Luminal, C25 THYM, C26 SKCM/UVM or C28 THCA COCA subtype can be at least about 0.1 times, at least about 0.2 times, at least about 0.3 times, at least about 0.4 times, at least about 0.5 times, at least about 0.6 times, at least about 0.7 times, at least about 0.8 times, at least about 0.9 times, at least about 1 time, at least about 1.2 times, at least about 1.5 times, at least about 1.7 times, at least about 2.0 times, at least about 2.2 times, at least about 2.5 times, at least about 2.7 times, at least about 3.0 times, at least about 3.2 times, at least about 3.5 times, at least about 3.7 times, at least about 4.0 times, at least about 4.2 times, at least about 4.5 times, at least about 4.7 times, at least about 5.0 times, inclusive of all ranges and subranges therebetween, more likely to have occult nodal metastasis compared to other COCA subtypes.
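- To make the "times more likely" language concrete, the relative likelihood of nodal metastasis in one COCA subtype compared with all other subtypes can be illustrated as a simple ratio of proportions. The function name `relative_likelihood` and the patient counts below are hypothetical and are not data from this disclosure.

```python
def relative_likelihood(subtype_positive, subtype_total, other_positive, other_total):
    """Ratio of the node-positive proportion in one subtype to that in all others."""
    if subtype_total <= 0 or other_total <= 0:
        raise ValueError("group sizes must be positive")
    if other_positive == 0:
        raise ValueError("ratio is undefined when the comparison group has no events")
    subtype_rate = subtype_positive / subtype_total
    other_rate = other_positive / other_total
    return subtype_rate / other_rate


# Hypothetical counts: 20 of 80 patients in the subtype of interest are
# node-positive, versus 50 of 400 patients across the remaining subtypes.
ratio = relative_likelihood(20, 80, 50, 400)
print(ratio)
```

With these invented counts the subtype of interest would be described as about 2.0 times more likely to show nodal metastasis than the other subtypes.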
- In one embodiment, the methods and compositions provided herein allow for the detection of at least one biomarker in a tumor sample obtained from a subject. The at least one biomarker can be a classifier biomarker provided herein. The detection can be at the nucleic acid level or protein level. In one embodiment, the detection is at the nucleic acid level and the detection can be by using any amplification, hybridization and/or sequencing assay disclosed herein. In one embodiment, the at least one biomarker detected using the methods and compositions provided herein is selected from Table 1. Further to the above embodiment, the detection of the at least one biomarker selected from Table 1 is at the nucleic acid level. In one embodiment, the methods of detecting the biomarker(s) (e.g., classifier biomarkers) in the tumor sample obtained from the subject comprises, consists essentially of, or consists of measuring the expression level of at least one or a plurality of biomarkers using any of the methods provided herein. The biomarkers can be selected from Table 1. In one embodiment, the plurality of biomarker nucleic acids comprises, consists essentially of or consists of at least 4 biomarkers, at least 8 biomarkers, at least 12 biomarkers, at least 16 biomarkers, at least 20 biomarkers, at least 24 biomarkers, at least 28 biomarkers, at least 32 biomarkers, at least 36 biomarkers, at least 40 biomarkers, at least 44 biomarkers, at least 48 biomarkers, at least 52 biomarkers, at least 56 biomarkers, at least 60 biomarkers, at least 64 biomarkers, at least 68 biomarkers, at least 72 biomarkers, at least 76 biomarkers, at least 80 biomarkers or all 84 biomarkers of Table 1. 
In another embodiment, the plurality of biomarkers comprises, consists essentially of or consists of at least 8 biomarkers, at least 16 biomarkers, at least 24 biomarkers, at least 32 biomarkers, at least 40 biomarkers, at least 48 biomarkers, at least 56 biomarkers, at least 64 biomarkers, at least 72 biomarkers, at least 80 biomarkers or all 84 biomarkers of Table 1.
- In another embodiment, the methods and compositions provided herein allow for the detection of at least one or a plurality of biomarkers selected from the biomarkers listed in Table 1 in combination with the detection of at least one or a plurality of biomarkers from one or more additional sets of biomarkers in a tumor sample obtained from a subject. The tumor sample can be any type of sample provided herein. The subject can be suffering from or suspected of suffering from cancer. The cancer can be any type of cancer provided herein. The detection can be at the nucleic acid level or protein level. In one embodiment, the detection is at the nucleic acid level and the detection can be by using any amplification, hybridization and/or sequencing assay disclosed herein. The one or more additional sets of biomarkers can be selected from a set of biomarkers whose presence, absence and/or level of expression is indicative of immune activation, proliferation, a tissue of origin cancer subtype, or any combination thereof. The additional set of biomarkers for indicating immune activation can be gene expression signatures of Adaptive Immune Cells (AIC) and/or Innate Immune Cells (IIC), individual immune biomarkers, interferon genes, major histocompatibility complex, class II (MHC II) genes or a combination thereof. The gene expression signatures of both IIC and AIC can be any gene signatures known in the art such as, for example, the gene signatures listed in Thorsson, V. et al., 2018, The immune landscape of cancer. Immunity, 48(4), pp. 812-830, Bindea et al. (Immunity 2013; 39(4); 782-795), Faruki H. et al., JTO, 12(6): 943-953 (2017), Charoentong P. et al., Cell Reports, 18, 248-262 (2017) or WO2017/201165 and WO2017/201164, each of which is herein incorporated by reference in its entirety.
The additional set of biomarkers for indicating proliferation can be gene expression signatures that include the 11 gene signature comprising BIRC5, CCNB1, CDC20, CDCA1, CEP55, KNTC2, MKI67, PTTG1, RRM2, TYMS, and UBE2C found in Martin M. et al., Breast Cancer Res Treat, 138: 457-466 (2013), the 18 gene signature found in US 20160115551 and/or the 26 gene signature found in 62/789,668 filed Jan. 8, 2019. The additional set of biomarkers for determining tissue of origin cancer subtypes can be any gene signature found in the art for subtyping specific tissue of origin cancers. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is the adenocarcinoma lung cancer subtyping gene expression signatures found in WO2017/201165, US20170114416 or U.S. Pat. No. 8,822,153. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is the squamous cell carcinoma lung cancer subtyping gene expression signatures found in WO2017/201164, US20170114416 or U.S. Pat. No. 8,822,153. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is the breast cancer subtyping gene expression signatures found in Parker J S et al., (2009) Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol 27:1160-1167, which is herein incorporated by reference in its entirety. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is the bladder cancer subtyping gene expression signatures found in 62/629,975 filed Feb. 13, 2018. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is the bladder cancer subtyping gene expression signatures found in The Cancer Genome Atlas Research Network, Comprehensive molecular characterization of urothelial bladder carcinoma, Nature volume 507, pages 315-322 (2014), or Robertson, A G, et al., Cell, 171(3): 540-556 (2017), each of which is herein incorporated by reference. In one embodiment, the additional set of biomarkers for determining tissue of origin cancer subtypes is a head and neck squamous cell carcinoma (HNSCC) subtyping gene expression signature selected from PCT/US18/45522 or PCT/US18/48862. Further to any of the above embodiments, the methods and compositions provided herein further comprise determining tumor mutation burden (TMB) and/or TMB rate of the tumor sample. The TMB and/or TMB rate can be determined or calculated using any method known in the art. In one embodiment, the TMB and/or TMB rate is determined from RNA as described in 62/743,257 filed on Oct. 9, 2018 and 62/771,702 filed on Nov. 27, 2018.
- Kits for practicing the methods provided herein can be further provided. A “kit” can encompass any manufacture (e.g., a package or a container) comprising at least one reagent, e.g., an antibody, a nucleic acid probe or primer, etc., for specifically detecting the expression of a biomarker provided herein. The kit may be promoted, distributed, or sold as a unit for performing the methods provided herein. Additionally, the kits may contain a package insert describing the kit and methods for its use.
- In one embodiment, kits for practicing the methods provided herein are provided. Such kits are compatible with both manual and automated immunocytochemistry techniques (e.g., cell staining). These kits comprise at least one antibody directed to a biomarker of interest, chemicals for the detection of antibody binding to the biomarker, a counterstain, and, optionally, a bluing agent to facilitate identification of positive staining cells. Any chemicals that detect antigen-antibody binding may be used in the practice of the methods provided herein. The kits may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more antibodies for use in the methods provided herein.
- In one embodiment, the kits for practicing the methods provided herein comprise at least one primer pair directed to a biomarker of interest, chemicals for the detection of amplification of the biomarker of interest, and, optionally, any agent necessary for quantifying the detection level of the biomarker of interest. Any chemicals that detect amplification products may be used in the practice of the methods provided herein. The kits may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more primer pairs for use in the methods provided herein.
- In one embodiment, the kits for practicing the methods provided herein comprise at least one probe directed to a biomarker of interest, chemicals for the detection of hybridization of the probe to the biomarker of interest, and, optionally, any agent necessary for quantifying the level of the biomarker of interest. Any chemicals that detect hybridization products may be used in the practice of the methods provided herein. The kits may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, or more probes for use in the methods provided herein.
- The present invention is further illustrated by reference to the following Examples. However, it should be noted that these Examples, like the embodiments described above, are illustrative and are not to be construed as restricting the scope of the invention in any way.
- Recent genomic analyses of pathologically-defined tumor types have identified disease subtypes within a tissue. The extent to which genomic signatures are shared across tumorous tissues remains unclear.
- Provided within this example is the development and validation of an 84-gene signature that can be used in a method for classifying a tumor sample obtained from a patient as one of 21 possible integrated, pan-cancer cluster of cluster assignment (COCA) subtypes, thereby providing valuable insight into tumor biology and potential therapeutic response. The 21 COCA subtypes that can be determined using the gene signature developed herein alone are listed in FIG. 1 and are designated as C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA).
- This example was initiated to address the need for an efficient method for improved tumor classification based on cell-of-origin that could inform prognosis, drug response and patient management based on underlying genomic and biologic tumor characteristics. Using the data associated with the 2018 TCGA Pan-cancer publications (https://gdc.cancer.gov/about-data/publications/pancanatlas) and comparing to the multi-platform cluster of cluster assignment (COCA) analysis performed in Hoadley et al, Cell. 2018 Apr. 5; 173(2):291-304 (hereinafter referred to as the “Gold Standard” for COCA subtyping), a pan-cancer COCA subtyping signature was developed. The gene signature developed in this example can be used in diagnostic methods that include evaluation of gene expression subtypes and application of an algorithm for categorization of a tumor sample obtained from a subject into one of 21 COCA subtypes: C1 (ACC/PCPG), C2 (GBM/LGG), C3 (OV), C4 (Squamous-like), C6 (LUAD-Enriched), C8 (PAAD/some STAD), C9 (UCS), C10 (BRCA/Basal), C12 (UCEC), C14 (PRAD), C15 (CESC (subset of cervical)), C16 (BLCA), C17 (TGCT), C19 (COAD/READ), C20 (SARC/MESO), C21 (KIRC/KICH/KIRP), C22 (Liver), C24 (BRCA/Luminal), C25 (THYM), C26 (SKCM/UVM) and C28 (THCA).
- To develop the aforementioned pan-cancer COCA subtyper, data associated with the 2018 TCGA Pan-cancer publications (https://gdc.cancer.gov/about-data/publications/pancanatlas) was downloaded. In particular, the expression data from primary solid tumor samples (n=8545; primary solid tumor per TCGA barcode) that had expression data from the “EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2” platform (i.e., EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2-v2.geneExp.tsv) from the TCGA dataset was used, as were the merged sample quality annotations (i.e., merged_sample_quality_annotations.tsv). Data from “do_not_use=False” specified in the sample quality file (merged_sample_quality_annotations.tsv) as well as data from samples from the pilot study (designated tumor type=“FFFP”) were excluded. The 8545 samples were from 32 tumor types. The 32 tumor types were kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma (READ); adrenocortical carcinoma (ACC); uveal melanoma (UVM); mesothelioma (MESO); pheochromocytoma and paraganglioma (PCPG); skin cutaneous melanoma (SKCM); uterine carcinosarcoma (UCS); lung squamous cell carcinoma (LUSC); testicular germ cell tumors (TGCT); cholangiocarcinoma (CHOL); pancreatic adenocarcinoma (PAAD); thymoma (THYM); and Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC).
- The COCA subtypes (i.e., COCA_Sample_Assignment_n9759.csv) from Hoadley et al, (Cell. 2018 Apr. 5; 173(2):291-304) were then assigned to the 8545 samples from the TCGA data described above, excluding COCA subtypes with 30 or fewer samples.
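The subtype assignment and small-subtype exclusion described above can be illustrated with a minimal Python sketch. It is not the code used in this example; the function names, the dictionary-based inputs and the toy sample identifiers are hypothetical and chosen only for illustration.

```python
from collections import Counter

def drop_small_subtypes(assignments, max_excluded_size=30):
    """Keep only samples whose subtype has more than max_excluded_size samples.

    assignments: dict mapping sample id -> COCA subtype label (hypothetical layout).
    """
    counts = Counter(assignments.values())
    return {sample: subtype for sample, subtype in assignments.items()
            if counts[subtype] > max_excluded_size}

def cross_tabulate(tumor_types, assignments):
    """Count samples for each (tumor type, COCA subtype) pair."""
    table = Counter()
    for sample, subtype in assignments.items():
        table[(tumor_types[sample], subtype)] += 1
    return table

# Toy data (not TCGA data); the threshold is lowered so the effect is visible.
toy_assignments = {"s1": "C1", "s2": "C1", "s3": "C5"}
toy_tumor_types = {"s1": "ACC", "s2": "PCPG", "s3": "LGG"}
toy_kept = drop_small_subtypes(toy_assignments, max_excluded_size=1)
toy_table = cross_tabulate(toy_tumor_types, toy_kept)
```

With the default threshold of 30, a subtype is retained only if it has at least 31 samples, which matches the exclusion of subtypes with 30 or fewer samples.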
FIG. 1 shows the cross-tabulation of the TCGA tumor type and COCA subtype from the Hoadley et al, 2018 paper for samples with qualifying expression data as described herein. FIG. 1 also provides the integrated COCA subtypes and their designations as provided herein.
- To develop the reduced and clinically applicable pan-cancer COCA subtyper, the 8545 samples from the TCGA dataset described above (and the RNA-seq expression data associated therewith) were divided into a training set (⅔ of the data set; n=5696) and a test set (⅓ of the data set; n=2849), balancing for uniform tumor type of origin distributions (see the Table in FIG. 2). Gene expression values were log2 transformed and genes with low variance and/or low mean were filtered out, while genes with mean variance and mean expression values greater than 4 were kept, resulting in gene expression data for 2190 genes (see graph in FIG. 2). It should be noted that samples that were found to have a COCA subtype 5 (C5; n=41) using the gold standard COCA subtyper described in Hoadley et al, 2018 were excluded from the training set due to the presence of a small number of samples that were not well differentiated by gene expression. As a result, the training set subsequently used to generate the COCA subtyper via cross-validation and classification to the nearest centroid (ClaNC (Dabney, 2006)) had an n of 5655 samples.
- As mentioned, a Classification to Nearest Centroid (ClaNC) algorithm (see Alan R. Dabney; ClaNC: point-and-click software for classifying microarrays to nearest centroids, Bioinformatics, Volume 22 (2006)) was used on the training set to generate candidate gene sets (see FIG. 3) that were subsequently tested using 5-fold cross-validation (CV) to find the minimum number of genes that would be required to provide differentiation of the aforementioned COCA subtypes with sufficient agreement with the previously developed gold standard (i.e., COCA analysis on multiplatform ‘omic’ data as described in Hoadley et al, 2018). As shown in FIG. 3, said 5-fold cross-validation suggested that 4 genes per subtype for a total of 84 genes (i.e., for the 21 COCA subtypes described herein) would achieve sufficient agreement between the classifier prediction and COCA subtype as determined using the gold standard method from Hoadley et al. 2018.
- Regarding selection of the final 84 genes (i.e., 4 genes/COCA subtype) to be included in the 21 class COCA subtyper, the ClaNC software package (see Dabney, 2006) used on the entire training set calculated t-statistics and 84 genes were selected based on the ranks of the strongest t-statistics (i.e., both negatively and positively correlated genes for each COCA subtype can be and were selected) (see Table 1). Then an ordinary nearest centroid classifier was fit using the 21 COCA classes and 84 genes.
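The gene selection and nearest centroid steps described above can be illustrated with the following minimal Python sketch. It is not the ClaNC software and does not reproduce its exact statistics: as a stated assumption, it ranks genes per class by the absolute value of a one-vs-rest Welch t-statistic, assigns each gene to at most one class, and classifies by squared Euclidean distance to the class centroids. All function names and the toy expression values are hypothetical.

```python
import math
from statistics import mean, variance

def one_vs_rest_t(values_in, values_out):
    """Welch t-statistic comparing one class against all remaining samples."""
    denom = math.sqrt(variance(values_in) / len(values_in)
                      + variance(values_out) / len(values_out))
    return 0.0 if denom == 0 else (mean(values_in) - mean(values_out)) / denom

def select_genes(expr, labels, genes_per_class):
    """Pick, per class, the unused genes with the largest |t| (either direction).

    expr: dict mapping gene -> list of log2 expression values, one per sample.
    labels: list of class labels in the same sample order.
    """
    selected = []
    for cls in sorted(set(labels)):
        scores = []
        for gene, values in expr.items():
            if gene in selected:
                continue  # each gene is used for one class only
            inside = [v for v, lab in zip(values, labels) if lab == cls]
            outside = [v for v, lab in zip(values, labels) if lab != cls]
            scores.append((abs(one_vs_rest_t(inside, outside)), gene))
        scores.sort(reverse=True)
        selected.extend(gene for _, gene in scores[:genes_per_class])
    return selected

def fit_centroids(expr, labels, genes):
    """Compute the per-class mean expression of each selected gene."""
    return {cls: [mean(v for v, lab in zip(expr[g], labels) if lab == cls)
                  for g in genes]
            for cls in set(labels)}

def predict(centroids, sample):
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda cls: sum((x - m) ** 2
                                   for x, m in zip(sample, centroids[cls])))

# Toy data: 4 genes, 6 samples, 3 classes (values are illustrative only).
toy_expr = {"G1": [8, 9, 1, 2, 1, 2], "G2": [1, 2, 8, 9, 1, 1],
            "G3": [1, 1, 2, 1, 9, 8], "G4": [5, 5, 5, 5, 5, 5]}
toy_labels = ["A", "A", "B", "B", "C", "C"]
toy_genes = select_genes(toy_expr, toy_labels, genes_per_class=1)
toy_centroids = fit_centroids(toy_expr, toy_labels, toy_genes)
```

In this sketch, a setting of `genes_per_class=4` with 21 classes would yield 84 genes, mirroring the arithmetic of the signature described in this example; a new sample is classified by passing its expression values, ordered like `toy_genes`, to `predict`.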
- Validation of the reduced gene signature was performed by applying the 84-gene nearest centroid classifier of Table 1 to the test set (n=2849) and comparing the COCA subtypes as determined by the gold standard vs. the 84-gene classifier or signature (i.e., Table 1). As shown in FIG. 4, the test set showed an overall agreement of 90%, which was similar to the agreement with gold standard (GS) COCA subtyping of 91% for the training set. FIG. 5 showed that the 84-gene nearest centroid classifier called a vast majority of the COCA subtypes in the test set correctly.
- Development and validation of an 84-gene signature for COCA subtyping was described. The resulting 84-gene signature maintains high concordance rates with the gold standard COCA subtyper as described in the art.
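The agreement figures reported above are proportions of samples for which the classifier call matches the gold standard call. A minimal Python sketch of that calculation follows; the function names and the five toy calls are hypothetical and are not data from this example.

```python
from collections import Counter

def overall_agreement(gold, predicted):
    """Fraction of samples whose predicted subtype equals the gold standard."""
    matches = sum(g == p for g, p in zip(gold, predicted))
    return matches / len(gold)

def per_subtype_agreement(gold, predicted):
    """Fraction of correctly called samples within each gold standard subtype."""
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {subtype: hits[subtype] / totals[subtype] for subtype in totals}

# Toy calls (illustrative only).
toy_gold = ["C4", "C4", "C16", "C16", "C24"]
toy_pred = ["C4", "C16", "C16", "C16", "C24"]
toy_overall = overall_agreement(toy_gold, toy_pred)
toy_by_subtype = per_subtype_agreement(toy_gold, toy_pred)
```

The per-subtype breakdown corresponds to reading a cross-tabulation of gold standard versus predicted subtypes row by row.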
- Subtypes provide potential biomarkers for targeted and immunotherapy response. The data demonstrate differences in prognosis that may be meaningful to therapeutic management.
- This example describes the examination of the 84 gene COCA subtyper developed in Example 1 and found in Table 1 as a prognostic indicator for overall survival. Overall, the goal of the studies in this example was to determine if the 84-gene COCA signature has prognostic value across a myriad of tumor types.
- In order to determine if the 84-gene signature of Table 1 has prognostic utility, associations between overall survival and the 84-gene COCA signature were examined within specific tumor types (i.e., BLCA, BRCA and STAD). Associations between overall survival and the 84-gene signature were examined separately within each tumor type by fitting Cox models adjusted for age at diagnosis and stage, with overall survival as the outcome and classifier subtype as the predictor, reporting hazard ratios for classifier subtype, and testing (Wald's test) whether the coefficient for classifier subtype was different from zero. It should be noted that the association tests used only subtype categories having many samples. For example, BLCA tumors were classified into 8 predicted subtype categories (C10, C15, C16, C20, C25, C4, C8, C9; see FIG. 6) but 92% (345/375) were in two of them (C16 and C4), and only these categories were analyzed.
- As shown in FIGS. 6-8, specific COCA subtypes can be associated with overall survival. For example, as shown in FIG. 6, the C4 COCA subtype was significantly associated with worse overall survival in BLCA (association test p-value for the C4 subtype as determined using the Table 1 gene signature was 0.0204, while the hazard ratio was 1.53 (i.e., second column); FIG. 6), while the C8 COCA subtype in STAD samples (association test p-value for the C8 subtype as determined using the Table 1 gene signature was 0.00689, while the hazard ratio was 1.67; FIG. 8) was also associated with worse overall survival. In contrast, the C24 COCA subtype in the BRCA samples had better overall survival (association test p-value was 0.00013, while the hazard ratio was 0.37; FIG. 7).
- The following references are incorporated by reference in their entireties for all purposes.
- Hoadley, Katherine A., Christina Yau, Toshinori Hinoue, Denise M. Wolf, Alexander J. Lazar, Esther Drill, Ronglai Shen et al. “Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer.” Cell 173, no. 2 (2018): 291-304.
- Hoadley, Katherine A., Christina Yau, Denise M. Wolf, Andrew D. Cherniack, David Tamborero, Sam Ng, Max D M Leiserson et al. “Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin.” Cell 158, no. 4 (2014): 929-944.
- Alan R. Dabney; ClaNC: point-and-click software for classifying microarrays to nearest centroids, Bioinformatics, Volume 22.
- Alan R. Dabney; Classification of microarrays to nearest centroids, Bioinformatics, Volume 21.
- Other subject matter contemplated by the present disclosure is set out in the following numbered embodiments:
- 1. A method for determining a clustering of cluster assignments (COCA) subtype of a tumor cancer sample obtained from a patient, the method comprising detecting an expression level of at least one classifier biomarker of Table 1, wherein the detection of the expression level of the classifier biomarker specifically identifies a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype.
- 2. The method of embodiment 1, wherein the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C12 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C14 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C15 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C16 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C17 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C19 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C20 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C21 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C22 sample, expression data of the 
at least one classifier biomarker of Table 1 from a reference C24 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C25 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C26 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C28 sample or a combination thereof; and classifying the sample as the C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the comparing step.
- 3. The method of embodiment 2, wherein the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm.
- 4. The method of any one of embodiments 1-3, wherein the C1 COCA subtype indicates that a tumor sample is substantially similar to or is adrenocortical carcinoma; the C2 COCA subtype indicates that a tumor sample is substantially similar to or is glioblastoma; the C3 COCA subtype indicates that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer); the C4 COCA subtype indicates that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder; the C6 COCA subtype indicates that a tumor sample is substantially similar to or is lung adenocarcinoma; the C8 COCA subtype indicates that a tumor sample is substantially similar to or is pancreatic adenocarcinoma; the C9 COCA subtype indicates that a tumor sample is substantially similar to or is uterine carcinosarcoma; the C10 COCA subtype indicates that a tumor sample is substantially similar to or is the basal subtype of breast cancer; the C12 COCA subtype indicates that a tumor sample is substantially similar to or is uterine corpus endometrial cancer; the C14 COCA subtype indicates that a tumor sample is substantially similar to or is prostate cancer; the C15 COCA subtype can indicate that a tumor sample is substantially similar to or is non-squamous cervical cancer; the C16 COCA subtype indicates that a tumor sample is substantially similar to or is a bladder urothelial carcinoma; the C17 COCA subtype indicates that a tumor sample is substantially similar to or is a 
testicular germ cell tumor; the C19 COCA subtype indicates that a tumor sample is substantially similar to or is a colon, rectal, esophageal or stomach adenocarcinoma; the C20 COCA subtype indicates that a tumor sample is substantially similar to or is a sarcoma; the C21 COCA subtype indicates that a tumor sample is substantially similar to or is a kidney chromophobe, kidney renal papillary cell carcinoma or kidney renal clear cell carcinoma; the C22 COCA subtype indicates that a tumor sample is substantially similar to or is liver hepatocellular carcinoma; the C24 COCA subtype indicates that a tumor sample is substantially similar to or is the luminal subtype of breast cancer; the C25 COCA subtype indicates that a tumor sample is substantially similar to or is thymoma; the C26 COCA subtype indicates that a tumor sample is substantially similar to or is melanoma; or the C28 COCA subtype indicates that a tumor sample is substantially similar to or is thyroid cancer.
- 5. The method of any one of embodiments 1-4, wherein the expression level of the classifier biomarker is detected at the nucleic acid level.
- 6. The method of embodiment 5, wherein the nucleic acid level is RNA or cDNA.
- 7. The method embodiment
- 8. The method of embodiment 7, wherein the expression level is detected by performing RNAseq.
- 9. The method of embodiment 8, wherein the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1.
- 10. The method of any one of embodiments 1-9, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, a fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
- 11. The method of embodiment 10, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.
- 12. The method of any one of embodiments 1-11, wherein the at least one classifier biomarker comprises a plurality of classifier biomarkers.
- 13. The method of embodiment 12, wherein the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 4 classifier biomarkers, at least 6 classifier biomarkers, at least 8 classifier biomarkers, at least 10 classifier biomarkers, at least 12 classifier biomarkers, at least 14 classifier biomarkers, at least 16 classifier biomarkers, at least 18 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- 14. The method of any one of embodiments 1-13, wherein the at least one classifier biomarker comprises, consists essentially of or consists of all the classifier biomarkers of Table 1.
- 15. A method of detecting a biomarker in a tumor sample obtained from a patient, the method comprising measuring the expression level of a plurality of classifier biomarker nucleic acids selected from Table 1 using an amplification, hybridization and/or sequencing assay.
- 16. The method of embodiment 15, wherein the patient is suffering from or is suspected of suffering from kidney renal papillary cell carcinoma (KIRP); breast invasive carcinoma (BRCA); thyroid cancer (THCA); bladder urothelial carcinoma (BLCA); prostate adenocarcinoma (PRAD); kidney chromophobe (KICH); cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC); kidney renal clear cell carcinoma (KIRC); liver hepatocellular carcinoma (LIHC); low grade glioma (LGG); sarcoma (SARC); lung adenocarcinoma (LUAD); colon adenocarcinoma (COAD); head and neck squamous cell carcinoma (HNSC); uterine corpus endometrial carcinoma (UCEC); glioblastoma multiforme (GBM); esophageal carcinoma (ESCA); stomach adenocarcinoma (STAD); ovarian serous cystadenocarcinoma (OV); rectum adenocarcinoma (READ); adrenocortical carcinoma (ACC); uveal melanoma (UVM); mesothelioma (MESO); pheochromocytoma and paraganglioma (PCPG); skin cutaneous melanoma (SKCM); uterine carcinosarcoma (UCS); lung squamous cell carcinoma (LUSC); testicular germ cell tumors (TGCT); cholangiocarcinoma (CHOL); pancreatic adenocarcinoma (PAAD); thymoma (THYM); or Lymphoid Neoplasm Diffuse Large B-cell Lymphoma (DLBC).
- 17. The method of embodiment
- 18. The method of embodiment 17, wherein the expression level is detected by performing RNAseq.
- 19. The method of embodiment 18, wherein the detection of the expression level comprises using at least one pair of oligonucleotide primers per each of the plurality of biomarker nucleic acids selected from Table 1.
- 20. The method of any one of embodiments 15-19, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
- 21. The method of embodiment 20, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.
- 22. The method of any one of embodiments 15-21, wherein the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- 23. The method of any one of embodiments 15-22, wherein the plurality of biomarker nucleic acids comprises, consists essentially of or consists of all the classifier biomarker nucleic acids of Table 1.
- 24. A method of treating cancer in a subject, the method comprising:
- measuring the expression level of at least one biomarker nucleic acid in a tumor sample obtained from the subject, wherein the at least one biomarker nucleic acid is selected from a set of biomarkers listed in Table 1, wherein the presence, absence and/or level of the at least one biomarker indicates a COCA subtype of the cancer; and administering a therapeutic agent based on the COCA subtype of the cancer.
- 25. The method of embodiment 24, wherein the at least one biomarker nucleic acid selected from the set of biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- 26. The method of embodiment 24 or 25, further comprising measuring the expression of at least one biomarker from an additional set of biomarkers.
- 27. The method of embodiment 26, wherein the additional set of biomarkers comprises at least an immune cell signature, a cell proliferation signature, or drug target genes.
- 28. The method of any one of embodiments 24-27, wherein the measuring the expression level is conducted using an amplification, hybridization and/or sequencing assay.
- 29. The method of embodiment 28, wherein the amplification, hybridization and/or sequencing assay comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
- 30. The method of embodiment 29, wherein the expression level is detected by performing RNAseq.
- 31. The method of any one of embodiments 24-30, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
- 32. The method of embodiment 31, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.
- 33. The method of any one of embodiments 24-32, wherein the subject's COCA subtype is selected from C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28.
- 34. The method of embodiment 33, wherein the C1 COCA subtype indicates that a tumor sample is substantially similar to or is adrenocortical carcinoma; the C2 COCA subtype indicates that a tumor sample is substantially similar to or is glioblastoma; the C3 COCA subtype indicates that a tumor sample is substantially similar to or is an ovarian serous cystadenocarcinoma (epithelial ovarian cancer); the C4 COCA subtype indicates that a tumor sample is substantially similar to or is squamous cell carcinoma of the lung, the head and neck or the bladder; the C6 COCA subtype indicates that a tumor sample is substantially similar to or is lung adenocarcinoma; the C8 COCA subtype indicates that a tumor sample is substantially similar to or is pancreatic adenocarcinoma; the C9 COCA subtype indicates that a tumor sample is substantially similar to or is uterine carcinosarcoma; the C10 COCA subtype indicates that a tumor sample is substantially similar to or is the basal subtype of breast cancer; the C12 COCA subtype indicates that a tumor sample is substantially similar to or is uterine corpus endometrial cancer; the C14 COCA subtype indicates that a tumor sample is substantially similar to or is prostate cancer; the C15 COCA subtype can indicate that a tumor sample is substantially similar to or is non-squamous cervical cancer; the C16 COCA subtype indicates that a tumor sample is substantially similar to or is a bladder urothelial carcinoma; the C17 COCA subtype indicates that a tumor sample is substantially similar to or is a testicular germ cell tumor; the C19 COCA subtype indicates that a tumor sample is substantially similar to or is a colon, rectal, esophageal or stomach adenocarcinoma; the C20 COCA subtype indicates that a tumor sample is substantially similar to or is a sarcoma; the C21 COCA subtype indicates that a tumor sample is substantially similar to or is a kidney chromophobe, kidney renal papillary cell carcinoma or kidney renal clear cell carcinoma; the C22 
COCA subtype indicates that a tumor sample is substantially similar to or is liver hepatocellular carcinoma; the C24 COCA subtype indicates that a tumor sample is substantially similar to or is the luminal subtype of breast cancer; the C25 COCA subtype indicates that a tumor sample is substantially similar to or is thymoma; the C26 COCA subtype indicates that a tumor sample is substantially similar to or is melanoma; or the C28 COCA subtype indicates that a tumor sample is substantially similar to or is thyroid cancer.
- 35. A method of predicting overall survival in a cancer patient, the method comprising detecting an expression level of at least one classifier biomarker of Table 1 in a tumor sample obtained from a patient, wherein the detection of the expression level of the at least one classifier biomarker specifically identifies a COCA subtype, and wherein identification of the COCA subtype is predictive of the overall survival in the patient.
- 36. The method of embodiment 35, wherein the method further comprises comparing the detected levels of expression of the at least one classifier biomarker of Table 1 to the expression of the at least one classifier biomarker of Table 1 in at least one sample training set(s), wherein the at least one sample training set(s) comprises expression data of the at least one classifier biomarker of Table 1 from a reference C1 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C2 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C3 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C4 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C6 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C8 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C9 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C10 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C12 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C14 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C15 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C16 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C17 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C19 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C20 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C21 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C22 sample, expression data of 
the at least one classifier biomarker of Table 1 from a reference C24 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C25 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C26 sample, expression data of the at least one classifier biomarker of Table 1 from a reference C28 sample or a combination thereof; and classifying the sample as the C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the comparing step.
- 37. The method of embodiment 36, wherein the comparing step comprises applying a statistical algorithm which comprises determining a correlation between the expression data obtained from the sample and the expression data from the at least one training set(s); and classifying the sample as a C1, C2, C3, C4, C6, C8, C9, C10, C12, C14, C15, C16, C17, C19, C20, C21, C22, C24, C25, C26 or C28 COCA subtype based on the results of the statistical algorithm.
- 38. The method of any one of the embodiments 35-37, wherein the expression level of the classifier biomarker is detected at the nucleic acid level.
- 39. The method of embodiment 38, wherein the nucleic acid level is RNA or cDNA.
- 40. The method of any one of embodiments 35-39, wherein the detecting an expression level comprises performing quantitative real time reverse transcriptase polymerase chain reaction (qRT-PCR), RNAseq, microarrays, gene chips, nCounter Gene Expression Assay, Serial Analysis of Gene Expression (SAGE), Rapid Analysis of Gene Expression (RAGE), nuclease protection assays, Northern blotting, or any other equivalent gene expression detection techniques.
- 41. The method of embodiment 40, wherein the expression level is detected by performing RNAseq.
- 42. The method of embodiment 35, wherein the detection of the expression level comprises using at least one pair of oligonucleotide primers specific for at least one classifier biomarker of Table 1.
- 43. The method of any one of embodiments 35-42, wherein the sample is a formalin-fixed, paraffin-embedded (FFPE) tissue sample, fresh or a frozen tissue sample, an exosome, wash fluids, cell pellets, or a bodily fluid obtained from the patient.
- 44. The method of embodiment 43, wherein the bodily fluid is blood or fractions thereof, urine, saliva, or sputum.
- 45. The method of any one of embodiments 35-44, wherein the at least one classifier biomarker comprises a plurality of classifier biomarkers.
- 46. The method of embodiment 45, wherein the plurality of classifier biomarkers comprises, consists essentially of or consists of at least 2 classifier biomarkers, at least 5 classifier biomarkers, at least 10 classifier biomarkers, at least 20 classifier biomarkers, at least 30 classifier biomarkers, at least 40 classifier biomarkers, at least 50 classifier biomarkers, at least 60 classifier biomarkers, at least 70 classifier biomarkers or at least 80 classifier biomarkers of Table 1.
- 47. The method of any one of embodiments 35-46, wherein the at least one classifier biomarker comprises, consists essentially of or consists of all the classifier biomarkers of Table 1.
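The comparing and classifying steps of embodiments 36 and 37 can be illustrated with a minimal sketch. It is not the applicants' implementation: the embodiments name neither the correlation measure nor how training-set data are summarized, so this sketch assumes Pearson correlation against a single reference expression profile per COCA subtype. The function names, the four placeholder biomarkers, and all expression values are hypothetical, and only three of the 21 subtype labels are shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def classify_by_correlation(sample, reference_profiles):
    """Return (best_label, scores) for the most strongly correlated reference.

    sample: expression values for the classifier biomarkers, in a fixed order.
    reference_profiles: dict mapping a subtype label to a reference profile
    measured over the same biomarkers in the same order (an assumption here).
    """
    scores = {
        label: pearson(sample, profile)
        for label, profile in reference_profiles.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical toy data: 4 placeholder biomarkers, 3 subtype labels.
reference_profiles = {
    "C1": [8.0, 2.0, 5.0, 1.0],
    "C2": [1.0, 7.0, 2.0, 6.0],
    "C3": [4.0, 4.0, 9.0, 3.0],
}
sample = [7.5, 2.5, 5.5, 1.5]

label, scores = classify_by_correlation(sample, reference_profiles)
print(label, {k: round(v, 3) for k, v in scores.items()})
```

In this toy data the sample tracks the C1 reference most closely, so the sketch assigns it to C1. A rank-based correlation or a different summary of the training set could be substituted without changing the overall structure.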
- The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary, to employ concepts of the various patents, applications and publications to provide yet further embodiments.
- These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.
Claims (47)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/284,310 US20210388449A1 (en) | 2018-10-09 | 2019-10-09 | Detecting cancer cell of origin |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862743256P | 2018-10-09 | 2018-10-09 | |
US201962819893P | 2019-03-18 | 2019-03-18 | |
PCT/US2019/055318 WO2020076897A1 (en) | 2018-10-09 | 2019-10-09 | Detecting cancer cell of origin |
US17/284,310 US20210388449A1 (en) | 2018-10-09 | 2019-10-09 | Detecting cancer cell of origin |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2019/055318 A-371-Of-International WO2020076897A1 (en) | 2018-10-09 | 2019-10-09 | Detecting cancer cell of origin |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/820,546 Continuation US11851715B2 (en) | 2018-10-09 | 2022-08-17 | Detecting cancer cell of origin |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210388449A1 (en) | 2021-12-16 |
Family
ID=70165239
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/284,310 Pending US20210388449A1 (en) | 2018-10-09 | 2019-10-09 | Detecting cancer cell of origin |
US17/820,546 Active US11851715B2 (en) | 2018-10-09 | 2022-08-17 | Detecting cancer cell of origin |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/820,546 Active US11851715B2 (en) | 2018-10-09 | 2022-08-17 | Detecting cancer cell of origin |
Country Status (4)
Country | Link |
---|---|
US (2) | US20210388449A1 (en) |
EP (1) | EP3864165A4 (en) |
CA (1) | CA3115922A1 (en) |
WO (1) | WO2020076897A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115851955A (en) * | 2022-12-16 | 2023-03-28 | 柳州市人民医院 | Marker for predicting necrotic apoptosis of lung squamous cell carcinoma and risk assessment model |
US11851715B2 (en) | 2018-10-09 | 2023-12-26 | Genecentric Therapeutics, Inc. | Detecting cancer cell of origin |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021257630A2 (en) * | 2020-06-15 | 2021-12-23 | Cutler Richelle | Drugs and methods for reducing body odor and sweat |
Family Cites Families (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4683202A (en) | 1985-03-28 | 1987-07-28 | Cetus Corporation | Process for amplifying nucleic acid sequences |
US4843155A (en) | 1987-11-19 | 1989-06-27 | Piotr Chomczynski | Product and process for isolating RNA |
US6040138A (en) | 1995-09-15 | 2000-03-21 | Affymetrix, Inc. | Expression monitoring by hybridization to high density oligonucleotide arrays |
US5143854A (en) | 1989-06-07 | 1992-09-01 | Affymax Technologies N.V. | Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof |
US5800992A (en) | 1989-06-07 | 1998-09-01 | Fodor; Stephen P.A. | Method of detecting nucleic acids |
US5744101A (en) | 1989-06-07 | 1998-04-28 | Affymax Technologies N.V. | Photolabile nucleoside protecting groups |
ATE148889T1 (en) | 1991-09-18 | 1997-02-15 | Affymax Tech Nv | METHOD FOR SYNTHESIS OF VARIOUS COLLECTIONS OF OLIGOMERS |
US5384261A (en) | 1991-11-22 | 1995-01-24 | Affymax Technologies N.V. | Very large scale immobilized polymer synthesis using mechanically directed flow paths |
EP0916396B1 (en) | 1991-11-22 | 2005-04-13 | Affymetrix, Inc. (a Delaware Corporation) | Combinatorial strategies for polymer synthesis |
US6090555A (en) | 1997-12-11 | 2000-07-18 | Affymetrix, Inc. | Scanned image alignment systems and methods |
US5571639A (en) | 1994-05-24 | 1996-11-05 | Affymax Technologies N.V. | Computer-aided engineering system for design of sequence arrays and lithographic masks |
US5795716A (en) | 1994-10-21 | 1998-08-18 | Chee; Mark S. | Computer-aided visualization and analysis system for sequence evaluation |
US5556752A (en) | 1994-10-24 | 1996-09-17 | Affymetrix, Inc. | Surface-bound, unimolecular, double-stranded DNA |
US5545531A (en) | 1995-06-07 | 1996-08-13 | Affymax Technologies N.V. | Methods for making a device for concurrently processing multiple biological chip assays |
US5856174A (en) | 1995-06-29 | 1999-01-05 | Affymetrix, Inc. | Integrated nucleic acid diagnostic device |
US5733729A (en) | 1995-09-14 | 1998-03-31 | Affymetrix, Inc. | Computer-aided probability base calling for arrays of nucleic acid probes on chips |
US5854033A (en) | 1995-11-21 | 1998-12-29 | Yale University | Rolling circle replication reporter systems |
EP0880598A4 (en) | 1996-01-23 | 2005-02-23 | Affymetrix Inc | Nucleic acid analysis techniques |
US6420108B2 (en) | 1998-02-09 | 2002-07-16 | Affymetrix, Inc. | Computer-aided display for comparative gene expression |
WO1999005574A1 (en) | 1997-07-25 | 1999-02-04 | Affymetrix, Inc. | Method and system for providing a probe array chip design database |
ATE280246T1 (en) | 1997-08-15 | 2004-11-15 | Affymetrix Inc | POLYMORPHISM DETECTION USING CLUSTER ANALYSIS |
DE69829402T2 (en) | 1997-10-31 | 2006-04-13 | Affymetrix, Inc. (a Delaware Corp.), Santa Clara | EXPRESSION PROFILES IN ADULT AND FETAL ORGANS |
US6020135A (en) | 1998-03-27 | 2000-02-01 | Affymetrix, Inc. | P53-regulated genes |
US6185561B1 (en) | 1998-09-17 | 2001-02-06 | Affymetrix, Inc. | Method and apparatus for providing and expression data mining database |
US6670321B1 (en) | 1998-12-30 | 2003-12-30 | The Children's Medical Center Corporation | Prevention and treatment for retinal ischemia and edema |
US20030097222A1 (en) | 2000-01-25 | 2003-05-22 | Craford David M. | Method, system, and computer software for providing a genomic web portal |
US6988040B2 (en) | 2001-01-11 | 2006-01-17 | Affymetrix, Inc. | System, method, and computer software for genotyping analysis and identification of allelic imbalance |
US20020183936A1 (en) | 2001-01-24 | 2002-12-05 | Affymetrix, Inc. | Method, system, and computer software for providing a genomic web portal |
US20030120432A1 (en) | 2001-01-29 | 2003-06-26 | Affymetrix, Inc. | Method, system and computer software for online ordering of custom probe arrays |
US6804679B2 (en) | 2001-03-12 | 2004-10-12 | Affymetrix, Inc. | System, method, and user interfaces for managing genomic data |
US7473767B2 (en) | 2001-07-03 | 2009-01-06 | The Institute For Systems Biology | Methods for detection and quantification of analytes in complex mixtures |
US20030100995A1 (en) | 2001-07-16 | 2003-05-29 | Affymetrix, Inc. | Method, system and computer software for variant information via a web portal |
US6872529B2 (en) | 2001-07-25 | 2005-03-29 | Affymetrix, Inc. | Complexity management of genomic DNA |
US20030120431A1 (en) | 2001-12-21 | 2003-06-26 | Affymetrix, Inc. | Method and computer software product for genomic alignment and assessment of the transcriptome |
US20040002818A1 (en) | 2001-12-21 | 2004-01-01 | Affymetrix, Inc. | Method, system and computer software for providing microarray probe data |
US20040126840A1 (en) | 2002-12-23 | 2004-07-01 | Affymetrix, Inc. | Method, system and computer software for providing genomic ontological data |
US20040049354A1 (en) | 2002-04-26 | 2004-03-11 | Affymetrix, Inc. | Method, system and computer software providing a genomic web portal for functional analysis of alternative splice variants |
US20050042654A1 (en) | 2003-06-27 | 2005-02-24 | Affymetrix, Inc. | Genotyping methods |
WO2008151110A2 (en) | 2007-06-01 | 2008-12-11 | The University Of North Carolina At Chapel Hill | Molecular diagnosis and typing of lung cancer variants |
WO2011116380A2 (en) * | 2010-03-19 | 2011-09-22 | H. Lee Moffitt Cancer Center And Research Institute, Inc. | Hybrid model for the classification of carcinoma subtypes |
DK3435084T3 (en) * | 2012-08-16 | 2023-05-30 | Mayo Found Medical Education & Res | PROSTATE CANCER PROGNOSIS USING BIOMARKERS |
WO2014149629A1 (en) * | 2013-03-15 | 2014-09-25 | Htg Molecular Diagnostics, Inc. | Subtyping lung cancers |
JP2016519935A (en) | 2013-05-13 | 2016-07-11 | ナノストリング テクノロジーズ,インコーポレイティド | A method to predict the risk of recurrence in nodule-positive early breast cancer |
CN107208131A (en) * | 2014-05-30 | 2017-09-26 | 基因中心治疗公司 | Method for lung cancer parting |
US11041214B2 (en) | 2016-05-17 | 2021-06-22 | Genecentric Therapeutics, Inc. | Methods for subtyping of lung squamous cell carcinoma |
WO2017201165A1 (en) | 2016-05-17 | 2017-11-23 | Genecentric Diagnostics, Inc. | Methods for subtyping of lung adenocarcinoma |
EP3665199A4 (en) | 2017-08-07 | 2021-08-11 | Genecentric Therapeutics, Inc. | Methods for subtyping of head and neck squamous cell carcinoma |
WO2019046585A1 (en) | 2017-08-30 | 2019-03-07 | Genecentric Therapeutics, Inc. | Gene expression subtype analysis of head and neck squamous cell carcinoma for treatment management |
CA3115922A1 (en) | 2018-10-09 | 2020-04-16 | Genecentric Therapeutics, Inc. | Detecting cancer cell of origin |
2019
- 2019-10-09 CA CA3115922A patent/CA3115922A1/en active Pending
- 2019-10-09 WO PCT/US2019/055318 patent/WO2020076897A1/en unknown
- 2019-10-09 EP EP19871947.8A patent/EP3864165A4/en active Pending
- 2019-10-09 US US17/284,310 patent/US20210388449A1/en active Pending
2022
- 2022-08-17 US US17/820,546 patent/US11851715B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
EP3864165A1 (en) | 2021-08-18 |
CA3115922A1 (en) | 2020-04-16 |
WO2020076897A1 (en) | 2020-04-16 |
EP3864165A4 (en) | 2022-08-03 |
US11851715B2 (en) | 2023-12-26 |
US20230037765A1 (en) | 2023-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7241353B2 (en) | Methods for Subtyping Lung Adenocarcinoma | |
JP7241352B2 (en) | Methods for subtyping lung squamous cell carcinoma | |
US11851715B2 (en) | Detecting cancer cell of origin | |
US20220243283A1 (en) | Methods for typing of lung cancer | |
US20230395263A1 (en) | Gene expression subtype analysis of head and neck squamous cell carcinoma for treatment management | |
US10829819B2 (en) | Methods for typing of lung cancer | |
WO2019032525A1 (en) | Methods for subtyping of head and neck squamous cell carcinoma | |
US20210054464A1 (en) | Methods for subtyping of bladder cancer | |
US11739386B2 (en) | Methods for determining response to PARP inhibitors | |
EP4313314A1 (en) | Methods for assessing proliferation and anti-folate therapeutic response | |
US20230243813A1 (en) | Methods for selecting and treating cancer with fgfr3 inhibitors | |
US20240182984A1 (en) | Methods for assessing proliferation and anti-folate therapeutic response | |
US12006554B2 (en) | Methods for subtyping of head and neck squamous cell carcinoma | |
WO2023164595A2 (en) | Methods for subtyping and treatment of head and neck squamous cell carcinoma |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION UNDERGOING PREEXAM PROCESSING |
| AS | Assignment | Owner name: GENECENTRIC THERAPEUTICS, INC., NORTH CAROLINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LAI-GOLDMAN, MYLA;FARUKI, HAWAZIN;MAYHEW, GREG;SIGNING DATES FROM 20211209 TO 20220107;REEL/FRAME:059036/0681 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION RETURNED BACK TO PREEXAM |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| AS | Assignment | Owner name: THE UNIVERSITY OF NORTH CAROLINA AT CHAPEL HILL, NORTH CAROLINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PEROU, CHARLES M.;PARKER, JOEL;SIGNING DATES FROM 20221130 TO 20230203;REEL/FRAME:062630/0162 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |