NZ612471B2 - Colon cancer gene expression signatures and methods of use
- Publication number
- NZ612471B2
- Authority
- NZ
- New Zealand
- Prior art keywords
- colon cancer
- transcripts
- nucleic acid
- sample
- expression level
- 230000014509 gene expression Effects 0.000 title claims abstract description 236
- 206010009944 Colon cancer Diseases 0.000 title claims abstract description 235
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 141
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 122
- -1 PTK2 Proteins 0.000 claims abstract description 40
- 238000003745 diagnosis Methods 0.000 claims abstract description 19
- 102100016838 AXIN2 Human genes 0.000 claims abstract description 12
- 101700047552 AXIN2 Proteins 0.000 claims abstract description 12
- 102100000143 CHD2 Human genes 0.000 claims abstract description 12
- 101700048510 CHD2 Proteins 0.000 claims abstract description 12
- 101700042715 DSCAM Proteins 0.000 claims abstract description 12
- 102100009831 EPHB4 Human genes 0.000 claims abstract description 12
- 102100012155 ITGA6 Human genes 0.000 claims abstract description 12
- 101710006669 ITGA6 Proteins 0.000 claims abstract description 12
- 102100018200 MMP1 Human genes 0.000 claims abstract description 12
- 101700019781 MMP1 Proteins 0.000 claims abstract description 12
- 102100013951 MUC2 Human genes 0.000 claims abstract description 12
- 102100013038 RUNX1 Human genes 0.000 claims abstract description 12
- 102100004239 SOD2 Human genes 0.000 claims abstract description 12
- 108010045815 superoxide dismutase 2 Proteins 0.000 claims abstract description 12
- 101710024842 BMPR1A Proteins 0.000 claims abstract description 11
- 102100014323 BMPR1A Human genes 0.000 claims abstract description 11
- 102100000790 DSP Human genes 0.000 claims abstract description 11
- 101710011613 DSP Proteins 0.000 claims abstract description 11
- 108010055323 EphB4 Receptor Proteins 0.000 claims abstract description 11
- 102100006631 GRB7 Human genes 0.000 claims abstract description 11
- 108010022046 GRB7 Adaptor Protein Proteins 0.000 claims abstract description 11
- 101700036633 MUC2 Proteins 0.000 claims abstract description 11
- 101700025439 RUNX1 Proteins 0.000 claims abstract description 11
- 235000019800 disodium phosphate Nutrition 0.000 claims abstract description 11
- 102000017256 epidermal growth factor-activated receptor activity proteins Human genes 0.000 claims abstract description 11
- 108040009258 epidermal growth factor-activated receptor activity proteins Proteins 0.000 claims abstract description 11
- 102100003816 CTSD Human genes 0.000 claims abstract 11
- 101700048360 CTSD Proteins 0.000 claims abstract 11
- 102100005377 BMP2 Human genes 0.000 claims abstract 10
- 101700000123 BMP2 Proteins 0.000 claims abstract 10
- 102100007271 BUB3 Human genes 0.000 claims abstract 10
- 101700050671 BUB3 Proteins 0.000 claims abstract 10
- 102100020540 CCN2 Human genes 0.000 claims abstract 10
- 101700026049 CCN2 Proteins 0.000 claims abstract 10
- 101700068002 DIAP Proteins 0.000 claims abstract 10
- 102100011173 EFNA3 Human genes 0.000 claims abstract 10
- 101700047531 EFNA3 Proteins 0.000 claims abstract 10
- 101710030892 FLT1 Proteins 0.000 claims abstract 10
- 102100006565 FLT1 Human genes 0.000 claims abstract 10
- 102100005434 FOXO3 Human genes 0.000 claims abstract 10
- 108010009307 Forkhead Box Protein O3 Proteins 0.000 claims abstract 10
- 102100017773 GBP1 Human genes 0.000 claims abstract 10
- 101700001333 GBP1 Proteins 0.000 claims abstract 10
- 102100003042 HIF1A Human genes 0.000 claims abstract 10
- 101700000053 HIF1A Proteins 0.000 claims abstract 10
- 102100009498 HMGB1 Human genes 0.000 claims abstract 10
- 108010014739 HMGB1 Protein Proteins 0.000 claims abstract 10
- 102100004115 ICAM1 Human genes 0.000 claims abstract 10
- 101700051176 ICAM1 Proteins 0.000 claims abstract 10
- 102100008565 ID2 Human genes 0.000 claims abstract 10
- 101700024635 ID2 Proteins 0.000 claims abstract 10
- 102100014231 IGF1 Human genes 0.000 claims abstract 10
- 101700074337 IGF1 Proteins 0.000 claims abstract 10
- 102100005117 IGF2 Human genes 0.000 claims abstract 10
- 101700070236 IGF2 Proteins 0.000 claims abstract 10
- 102100014741 IGFBP2 Human genes 0.000 claims abstract 10
- 101710031092 IGFBP2 Proteins 0.000 claims abstract 10
- 102100005660 KLF6 Human genes 0.000 claims abstract 10
- 101700054481 KLF6 Proteins 0.000 claims abstract 10
- 102100012126 NOTCH2 Human genes 0.000 claims abstract 10
- 101710036046 NOTCH2 Proteins 0.000 claims abstract 10
- 102100020134 NOTCH2NLA Human genes 0.000 claims abstract 10
- 101710025980 NOTCH2NLA Proteins 0.000 claims abstract 10
- 102100006771 PKM Human genes 0.000 claims abstract 10
- 101710039160 PKM Proteins 0.000 claims abstract 10
- 101700062224 RTEL1 Proteins 0.000 claims abstract 10
- 102100017844 RTEL1 Human genes 0.000 claims abstract 10
- 102100019657 STAT1 Human genes 0.000 claims abstract 10
- 108010044012 STAT1 Transcription Factor Proteins 0.000 claims abstract 10
- 102100008887 TCF7L2 Human genes 0.000 claims abstract 10
- 101710044572 TCF7L2 Proteins 0.000 claims abstract 10
- 230000037115 Vdr Effects 0.000 claims abstract 10
- 102000009310 vitamin D receptors Human genes 0.000 claims abstract 10
- 108050000156 vitamin D receptors Proteins 0.000 claims abstract 10
- 102100004047 CYP1B1 Human genes 0.000 claims abstract 9
- 101710036800 CYP1B1 Proteins 0.000 claims abstract 9
- 102100000883 RALBP1 Human genes 0.000 claims abstract 9
- 101710003865 RALBP1 Proteins 0.000 claims abstract 9
- 102100012984 XPC Human genes 0.000 claims abstract 9
- 108060009523 XPC Proteins 0.000 claims abstract 9
- 102100009560 TNFRSF6B Human genes 0.000 claims abstract 8
- 101710022317 TNFRSF6B Proteins 0.000 claims abstract 8
- 102100011839 CEACAM6 Human genes 0.000 claims abstract 6
- 101710043949 CEACAM6 Proteins 0.000 claims abstract 6
- 210000001519 tissues Anatomy 0.000 claims description 111
- 229920000160 (ribonucleotides)n+m Polymers 0.000 claims description 110
- 230000000692 anti-sense Effects 0.000 claims description 100
- 210000001072 Colon Anatomy 0.000 claims description 37
- 230000004083 survival Effects 0.000 claims description 35
- 238000004393 prognosis Methods 0.000 claims description 34
- 238000002493 microarray Methods 0.000 claims description 28
- 230000000875 corresponding Effects 0.000 claims description 27
- 238000002512 chemotherapy Methods 0.000 claims description 26
- 229920002676 Complementary DNA Polymers 0.000 claims description 22
- 239000002299 complementary DNA Substances 0.000 claims description 22
- 230000004044 response Effects 0.000 claims description 21
- 102100001819 SLC2A3 Human genes 0.000 claims description 16
- 108091006278 SLC2A3 Proteins 0.000 claims description 16
- 239000002671 adjuvant Substances 0.000 claims description 16
- 239000002253 acid Substances 0.000 claims description 15
- 230000000240 adjuvant Effects 0.000 claims description 15
- 101700061752 ARSD Proteins 0.000 claims description 13
- 102100003309 ARSD Human genes 0.000 claims description 13
- 238000010195 expression analysis Methods 0.000 claims description 12
- 102100010668 OLFM4 Human genes 0.000 claims description 11
- 101700072644 OLFM4 Proteins 0.000 claims description 11
- 102100013952 MUC3A Human genes 0.000 claims description 9
- 101700075415 MUC3A Proteins 0.000 claims description 9
- 102100016254 RNF39 Human genes 0.000 claims description 9
- 101700061704 RNF39 Proteins 0.000 claims description 9
- 102100011371 BCL9L Human genes 0.000 claims description 8
- 101700051346 BCL9L Proteins 0.000 claims description 8
- 102100009686 CXCL9 Human genes 0.000 claims description 8
- 101700052645 CXCL9 Proteins 0.000 claims description 8
- 102100015539 FCGBP Human genes 0.000 claims description 8
- 101700004502 FCGBP Proteins 0.000 claims description 8
- 102100014792 PCLO Human genes 0.000 claims description 8
- 101700077883 PCLO Proteins 0.000 claims description 8
- 239000012188 paraffin wax Substances 0.000 claims description 7
- 238000001574 biopsy Methods 0.000 claims description 6
- 238000002271 resection Methods 0.000 claims description 6
- 229940035295 Ting Drugs 0.000 claims description 5
- WSFSSNUMVMOOMR-UHFFFAOYSA-N formaldehyde Chemical compound O=C WSFSSNUMVMOOMR-UHFFFAOYSA-N 0.000 claims description 5
- 102100007653 SLC2A14 Human genes 0.000 claims description 4
- 108091006282 SLC2A14 Proteins 0.000 claims description 4
- 102100013274 IRF4 Human genes 0.000 claims description 3
- 108060006838 PWWP3A Proteins 0.000 claims description 3
- 108060008652 TRS85 Proteins 0.000 claims description 3
- 102100014739 SIGMAR1 Human genes 0.000 claims 2
- 101710029616 SIGMAR1 Proteins 0.000 claims 2
- 102100006463 SULT1C2 Human genes 0.000 claims 1
- 101710019003 SULT1C2 Proteins 0.000 claims 1
- 101710019008 SULT1C4 Proteins 0.000 claims 1
- 239000000523 sample Substances 0.000 description 152
- 206010028980 Neoplasm Diseases 0.000 description 78
- 201000011510 cancer Diseases 0.000 description 60
- 238000000034 method Methods 0.000 description 48
- 210000004027 cells Anatomy 0.000 description 47
- 102000004169 proteins and genes Human genes 0.000 description 44
- 108090000623 proteins and genes Proteins 0.000 description 44
- 235000018102 proteins Nutrition 0.000 description 41
- 108020004999 Messenger RNA Proteins 0.000 description 36
- 229920000272 Oligonucleotide Polymers 0.000 description 35
- 229920002106 messenger RNA Polymers 0.000 description 35
- 229920003013 deoxyribonucleic acid Polymers 0.000 description 34
- 238000003752 polymerase chain reaction Methods 0.000 description 33
- 238000004458 analytical method Methods 0.000 description 27
- 201000010099 disease Diseases 0.000 description 26
- 239000002773 nucleotide Substances 0.000 description 26
- 125000003729 nucleotide group Chemical group 0.000 description 26
- 230000003321 amplification Effects 0.000 description 25
- 230000000295 complement Effects 0.000 description 25
- 238000009396 hybridization Methods 0.000 description 25
- 238000003199 nucleic acid amplification method Methods 0.000 description 25
- 201000011231 colorectal cancer Diseases 0.000 description 22
- 238000010200 validation analysis Methods 0.000 description 22
- 238000004166 bioassay Methods 0.000 description 21
- 239000000463 material Substances 0.000 description 21
- 239000000203 mixture Substances 0.000 description 21
- 229920001850 Nucleic acid sequence Polymers 0.000 description 19
- 238000003908 quality control method Methods 0.000 description 19
- 230000027455 binding Effects 0.000 description 18
- 229920000023 polynucleotide Polymers 0.000 description 18
- 238000001356 surgical procedure Methods 0.000 description 18
- 238000001514 detection method Methods 0.000 description 17
- 239000002157 polynucleotide Substances 0.000 description 17
- 238000010837 poor prognosis Methods 0.000 description 17
- 238000011160 research Methods 0.000 description 17
- 238000002790 cross-validation Methods 0.000 description 16
- 239000007787 solid Substances 0.000 description 16
- 238000003757 reverse transcription PCR Methods 0.000 description 14
- 238000011161 development Methods 0.000 description 13
- 230000018109 developmental process Effects 0.000 description 13
- 230000015556 catabolic process Effects 0.000 description 12
- 230000004059 degradation Effects 0.000 description 12
- 238000006731 degradation reaction Methods 0.000 description 12
- 239000003814 drug Substances 0.000 description 12
- 239000000758 substrate Substances 0.000 description 12
- 229940079593 drugs Drugs 0.000 description 11
- 238000005516 engineering process Methods 0.000 description 11
- 238000007901 in situ hybridization Methods 0.000 description 11
- 150000001875 compounds Chemical class 0.000 description 10
- 230000001965 increased Effects 0.000 description 10
- 210000000056 organs Anatomy 0.000 description 10
- 238000002560 therapeutic procedure Methods 0.000 description 10
- 230000034994 death Effects 0.000 description 9
- 229920001155 polypropylene Polymers 0.000 description 9
- 210000004881 tumor cells Anatomy 0.000 description 9
- 238000000018 DNA microarray Methods 0.000 description 8
- 238000006243 chemical reaction Methods 0.000 description 8
- 230000001809 detectable Effects 0.000 description 8
- 230000004547 gene signature Effects 0.000 description 8
- 150000002500 ions Chemical class 0.000 description 8
- 238000010606 normalization Methods 0.000 description 8
- 238000003499 nucleic acid array Methods 0.000 description 8
- 238000003753 real-time PCR Methods 0.000 description 8
- 206010027476 Metastasis Diseases 0.000 description 7
- 239000004743 Polypropylene Substances 0.000 description 7
- 108090001123 antibodies Proteins 0.000 description 7
- 102000004965 antibodies Human genes 0.000 description 7
- 230000002068 genetic Effects 0.000 description 7
- 102000004190 Enzymes Human genes 0.000 description 6
- 108090000790 Enzymes Proteins 0.000 description 6
- 239000003153 chemical reaction reagent Substances 0.000 description 6
- 230000002596 correlated Effects 0.000 description 6
- 239000011521 glass Substances 0.000 description 6
- 238000004519 manufacturing process Methods 0.000 description 6
- 230000035945 sensitivity Effects 0.000 description 6
- 239000000126 substance Substances 0.000 description 6
- 206010061819 Disease recurrence Diseases 0.000 description 5
- 229920002459 Intron Polymers 0.000 description 5
- 210000001165 Lymph Nodes Anatomy 0.000 description 5
- 238000002123 RNA extraction Methods 0.000 description 5
- 230000002159 abnormal effect Effects 0.000 description 5
- 150000007513 acids Chemical class 0.000 description 5
- 230000004913 activation Effects 0.000 description 5
- 150000001413 amino acids Chemical group 0.000 description 5
- 238000004422 calculation algorithm Methods 0.000 description 5
- 239000003795 chemical substances by application Substances 0.000 description 5
- 238000007374 clinical diagnostic method Methods 0.000 description 5
- 230000003834 intracellular Effects 0.000 description 5
- 238000010369 molecular cloning Methods 0.000 description 5
- 230000011664 signaling Effects 0.000 description 5
- 230000035897 transcription Effects 0.000 description 5
- 229940088598 Enzyme Drugs 0.000 description 4
- 102100019178 GSTO2 Human genes 0.000 description 4
- 101700024122 GSTO2 Proteins 0.000 description 4
- UYTPUPDQBNUYGX-UHFFFAOYSA-N Guanine Chemical group O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 4
- 102100013291 MORC3 Human genes 0.000 description 4
- 101700037430 MORC3 Proteins 0.000 description 4
- 102100013925 MUC6 Human genes 0.000 description 4
- 238000000636 Northern blotting Methods 0.000 description 4
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 4
- 108020005187 Oligonucleotide Probes Proteins 0.000 description 4
- 108020004418 Ribosomal RNA Proteins 0.000 description 4
- 102100005008 SPDYE2 Human genes 0.000 description 4
- 101710011862 SPDYE2 Proteins 0.000 description 4
- 102100012047 TLCD2 Human genes 0.000 description 4
- 101700060560 TLCD2 Proteins 0.000 description 4
- XOAAWQZATWQOTB-UHFFFAOYSA-N Taurine Chemical compound NCCS(O)(=O)=O XOAAWQZATWQOTB-UHFFFAOYSA-N 0.000 description 4
- 108020004566 Transfer RNA Proteins 0.000 description 4
- 229920001949 Transfer RNA Polymers 0.000 description 4
- 239000002246 antineoplastic agent Substances 0.000 description 4
- 238000002869 basic local alignment search tool Methods 0.000 description 4
- 239000000090 biomarker Substances 0.000 description 4
- 230000003197 catalytic Effects 0.000 description 4
- 230000010261 cell growth Effects 0.000 description 4
- 230000001419 dependent Effects 0.000 description 4
- 238000009826 distribution Methods 0.000 description 4
- 238000011156 evaluation Methods 0.000 description 4
- 230000012010 growth Effects 0.000 description 4
- 238000007850 in situ PCR Methods 0.000 description 4
- 238000011065 in-situ storage Methods 0.000 description 4
- 230000002401 inhibitory effect Effects 0.000 description 4
- 238000002372 labelling Methods 0.000 description 4
- 229920001239 microRNA Polymers 0.000 description 4
- 239000002679 microRNA Substances 0.000 description 4
- 239000002853 nucleic acid probe Substances 0.000 description 4
- 239000002751 oligonucleotide probe Substances 0.000 description 4
- 230000036961 partial Effects 0.000 description 4
- 230000037361 pathway Effects 0.000 description 4
- 238000000746 purification Methods 0.000 description 4
- 238000011084 recovery Methods 0.000 description 4
- 230000004043 responsiveness Effects 0.000 description 4
- 238000010839 reverse transcription Methods 0.000 description 4
- 229920002973 ribosomal RNA Polymers 0.000 description 4
- 238000003786 synthesis reaction Methods 0.000 description 4
- 230000001225 therapeutic Effects 0.000 description 4
- 238000004450 types of analysis Methods 0.000 description 4
- 230000003827 upregulation Effects 0.000 description 4
- 238000005406 washing Methods 0.000 description 4
- 101710040078 C1orf116 Proteins 0.000 description 3
- 101700079768 MUC6 Proteins 0.000 description 3
- 210000002381 Plasma Anatomy 0.000 description 3
- 238000007792 addition Methods 0.000 description 3
- 230000006399 behavior Effects 0.000 description 3
- 239000011127 biaxially oriented polypropylene Substances 0.000 description 3
- 230000015572 biosynthetic process Effects 0.000 description 3
- 238000010804 cDNA synthesis Methods 0.000 description 3
- 150000001768 cations Chemical class 0.000 description 3
- 238000004132 cross linking Methods 0.000 description 3
- 230000003247 decreasing Effects 0.000 description 3
- 230000029578 entry into host Effects 0.000 description 3
- 238000001914 filtration Methods 0.000 description 3
- 238000003364 immunohistochemistry Methods 0.000 description 3
- 238000000338 in vitro Methods 0.000 description 3
- 230000002757 inflammatory Effects 0.000 description 3
- 230000000670 limiting Effects 0.000 description 3
- 238000005259 measurement Methods 0.000 description 3
- 239000012528 membrane Substances 0.000 description 3
- 238000000386 microscopy Methods 0.000 description 3
- 230000001537 neural Effects 0.000 description 3
- 229920000620 organic polymer Polymers 0.000 description 3
- 230000002018 overexpression Effects 0.000 description 3
- 238000003068 pathway analysis Methods 0.000 description 3
- 229920001184 polypeptide Polymers 0.000 description 3
- OZAIFHULBGXAKX-UHFFFAOYSA-N precursor Substances N#CC(C)(C)N=NC(C)(C)C#N OZAIFHULBGXAKX-UHFFFAOYSA-N 0.000 description 3
- 238000002360 preparation method Methods 0.000 description 3
- 230000002285 radioactive Effects 0.000 description 3
- 230000001105 regulatory Effects 0.000 description 3
- 238000003196 serial analysis of gene expression Methods 0.000 description 3
- 230000002194 synthesizing Effects 0.000 description 3
- 230000001131 transforming Effects 0.000 description 3
- 229920002483 18S ribosomal RNA Polymers 0.000 description 2
- 102100007754 ANGPTL6 Human genes 0.000 description 2
- 101710043740 ANGPTL6 Proteins 0.000 description 2
- 101710032417 ARHGAP26 Proteins 0.000 description 2
- 102100011375 ARHGAP26 Human genes 0.000 description 2
- 102100003265 ASPH Human genes 0.000 description 2
- 101700044969 ASPH Proteins 0.000 description 2
- 101710036216 ATEG_03556 Proteins 0.000 description 2
- 102000008102 Ankyrins Human genes 0.000 description 2
- 108010049777 Ankyrins Proteins 0.000 description 2
- 102100007328 BIRC6 Human genes 0.000 description 2
- 101700009450 BIRC6 Proteins 0.000 description 2
- 102100009557 BLCAP Human genes 0.000 description 2
- 108060000953 BLCAP Proteins 0.000 description 2
- 102000001893 Bone Morphogenetic Protein Receptors Human genes 0.000 description 2
- 108010040422 Bone Morphogenetic Protein Receptors Proteins 0.000 description 2
- 102100019530 CCND2 Human genes 0.000 description 2
- 101700059002 CCND2 Proteins 0.000 description 2
- 102100005190 CDC42SE2 Human genes 0.000 description 2
- 101710005843 CDC42SE2 Proteins 0.000 description 2
- 102100011842 CEACAM5 Human genes 0.000 description 2
- 102000017589 Chromo domain Human genes 0.000 description 2
- 108050005811 Chromo domain Proteins 0.000 description 2
- 108020004635 Complementary DNA Proteins 0.000 description 2
- 102100012104 DLG5 Human genes 0.000 description 2
- 101700037918 DLG5 Proteins 0.000 description 2
- 241000255581 Drosophila <fruit fly, genus> Species 0.000 description 2
- 102000002312 EC 4.2.1.47 Human genes 0.000 description 2
- 108010062427 EC 4.2.1.47 Proteins 0.000 description 2
- 102100004909 EEF2K Human genes 0.000 description 2
- 229940110715 ENZYMES FOR TREATMENT OF WOUNDS AND ULCERS Drugs 0.000 description 2
- 102000033147 ERVK-25 Human genes 0.000 description 2
- 229920000665 Exon Polymers 0.000 description 2
- 102100008658 FN1 Human genes 0.000 description 2
- 102100001838 FNDC3B Human genes 0.000 description 2
- 101710009383 FNDC3B Proteins 0.000 description 2
- 108010067306 Fibronectins Proteins 0.000 description 2
- 101710037135 GAPC2 Proteins 0.000 description 2
- 101710037116 GAPC3 Proteins 0.000 description 2
- 101710025049 GAPDG Proteins 0.000 description 2
- 101710008404 GAPDH Proteins 0.000 description 2
- 102100006425 GAPDH Human genes 0.000 description 2
- 101700014779 GLB1 Proteins 0.000 description 2
- 102100011343 GLB1 Human genes 0.000 description 2
- 102100007059 GRHL2 Human genes 0.000 description 2
- 101700023305 GRHL2 Proteins 0.000 description 2
- 101710010461 Gapdh1 Proteins 0.000 description 2
- 102100013217 HELZ Human genes 0.000 description 2
- 101700054499 HELZ Proteins 0.000 description 2
- 229940088597 Hormone Drugs 0.000 description 2
- 102100008986 IL32 Human genes 0.000 description 2
- 101700016780 IL32 Proteins 0.000 description 2
- 102100013528 ITPRID2 Human genes 0.000 description 2
- 101710021997 ITPRID2 Proteins 0.000 description 2
- 102000004218 Insulin-like growth factor I Human genes 0.000 description 2
- 108090000723 Insulin-like growth factor I Proteins 0.000 description 2
- 102000000426 Integrin alpha6 Human genes 0.000 description 2
- 108010041100 Integrin alpha6 Proteins 0.000 description 2
- 108020004391 Introns Proteins 0.000 description 2
- 241000229754 Iva xanthiifolia Species 0.000 description 2
- 102100002585 KANK1 Human genes 0.000 description 2
- 101700044750 KANK1 Proteins 0.000 description 2
- 101700055281 KIF24 Proteins 0.000 description 2
- 102100000907 KIF24 Human genes 0.000 description 2
- 102100009466 LRRC37B Human genes 0.000 description 2
- 101710012476 LRRC37B Proteins 0.000 description 2
- 206010024324 Leukaemias Diseases 0.000 description 2
- 102000003960 Ligases Human genes 0.000 description 2
- 108090000364 Ligases Proteins 0.000 description 2
- 108020005198 Long Noncoding RNA Proteins 0.000 description 2
- 101710025050 MK0970 Proteins 0.000 description 2
- 102100020447 MTRES1 Human genes 0.000 description 2
- 101710002783 MTRES1 Proteins 0.000 description 2
- 101700069245 MYO1E Proteins 0.000 description 2
- 102100002153 MYO1E Human genes 0.000 description 2
- 210000003097 Mucus Anatomy 0.000 description 2
- 102100003497 NAA50 Human genes 0.000 description 2
- 101700028098 NAA50 Proteins 0.000 description 2
- 102100007464 NDUFAF6 Human genes 0.000 description 2
- 108060005247 NDUFAF6 Proteins 0.000 description 2
- 108020002397 NR6 subfamily Proteins 0.000 description 2
- 102000005665 Neurotransmitter Transport Proteins Human genes 0.000 description 2
- 108010084810 Neurotransmitter Transport Proteins Proteins 0.000 description 2
- 101710003000 ORF1/ORF2 Proteins 0.000 description 2
- 108090000854 Oxidoreductases Proteins 0.000 description 2
- 102000004316 Oxidoreductases Human genes 0.000 description 2
- 101700019061 POSTN Proteins 0.000 description 2
- 102100013748 POSTN Human genes 0.000 description 2
- 102100015171 PPFIBP1 Human genes 0.000 description 2
- 108091005771 Peptidases Proteins 0.000 description 2
- 101700030467 Pol Proteins 0.000 description 2
- 239000004698 Polyethylene (PE) Substances 0.000 description 2
- 239000004365 Protease Substances 0.000 description 2
- 102100001311 RBM47 Human genes 0.000 description 2
- 101700011548 RBM47 Proteins 0.000 description 2
- 102100007707 RHBDD1 Human genes 0.000 description 2
- 101710032579 RHBDD1 Proteins 0.000 description 2
- 108020004412 RNA 3' Polyadenylation Signals Proteins 0.000 description 2
- 102100006261 RNF145 Human genes 0.000 description 2
- 101710030999 RNF145 Proteins 0.000 description 2
- 102100016246 RNF43 Human genes 0.000 description 2
- 101700048059 RNF43 Proteins 0.000 description 2
- 101700087122 SATB2 Proteins 0.000 description 2
- 102100010823 SATB2 Human genes 0.000 description 2
- 102100004497 SH3D19 Human genes 0.000 description 2
- 101710016254 SH3D19 Proteins 0.000 description 2
- 102000037151 SLC-Transporter Human genes 0.000 description 2
- 108091006187 SLC-Transporter Proteins 0.000 description 2
- 102000005039 SLC6A6 Human genes 0.000 description 2
- 108060007765 SLC6A6 Proteins 0.000 description 2
- 108020004688 Small Nuclear RNA Proteins 0.000 description 2
- 229920001985 Small interfering RNA Polymers 0.000 description 2
- 229920000632 Small nucleolar RNA Polymers 0.000 description 2
- 102100008888 TFAM Human genes 0.000 description 2
- 101700056260 TFAM Proteins 0.000 description 2
- 101710014814 TRABD2A Proteins 0.000 description 2
- 102100002987 TRABD2A Human genes 0.000 description 2
- 102100008220 TTC39B Human genes 0.000 description 2
- 101710022843 TTC39B Proteins 0.000 description 2
- 229960003080 Taurine Drugs 0.000 description 2
- 229920003003 Telomeric non-coding RNA Polymers 0.000 description 2
- 102100013671 ZHX2 Human genes 0.000 description 2
- 101700069422 ZHX2 Proteins 0.000 description 2
- 102100001636 ZNF418 Human genes 0.000 description 2
- 101710014848 ZNF418 Proteins 0.000 description 2
- 102100013610 ZNF75A Human genes 0.000 description 2
- 101710021462 ZNF75A Proteins 0.000 description 2
- 102100001626 ZNF814 Human genes 0.000 description 2
- 101710017697 ZNF814 Proteins 0.000 description 2
- 102100013921 ZXDC Human genes 0.000 description 2
- 101700004575 ZXDC Proteins 0.000 description 2
- 235000001014 amino acid Nutrition 0.000 description 2
- 125000000539 amino acid group Chemical group 0.000 description 2
- 102000024070 binding proteins Human genes 0.000 description 2
- 108091007650 binding proteins Proteins 0.000 description 2
- 239000012472 biological sample Substances 0.000 description 2
- UIIMBOGNXHQVGW-UHFFFAOYSA-M buffer Substances [Na+].OC([O-])=O UIIMBOGNXHQVGW-UHFFFAOYSA-M 0.000 description 2
- 101710025091 cbbGC Proteins 0.000 description 2
- 230000004663 cell proliferation Effects 0.000 description 2
- 238000005313 chemometric Methods 0.000 description 2
- 230000002860 competitive Effects 0.000 description 2
- 230000003828 downregulation Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 230000002708 enhancing Effects 0.000 description 2
- 230000002349 favourable Effects 0.000 description 2
- 230000037320 fibronectin Effects 0.000 description 2
- 238000010230 functional analysis Methods 0.000 description 2
- 101710025070 gapdh-2 Proteins 0.000 description 2
- 102000009543 guanyl-nucleotide exchange factor activity proteins Human genes 0.000 description 2
- 108040001860 guanyl-nucleotide exchange factor activity proteins Proteins 0.000 description 2
- 230000003118 histopathologic Effects 0.000 description 2
- 239000005556 hormone Substances 0.000 description 2
- 230000001976 improved Effects 0.000 description 2
- 238000010348 incorporation Methods 0.000 description 2
- 238000002955 isolation Methods 0.000 description 2
- 239000003446 ligand Substances 0.000 description 2
- 238000007834 ligase chain reaction Methods 0.000 description 2
- 230000031700 light absorption Effects 0.000 description 2
- 238000010841 mRNA extraction Methods 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 230000036210 malignancy Effects 0.000 description 2
- 230000003211 malignant Effects 0.000 description 2
- 238000010208 microarray analysis Methods 0.000 description 2
- 230000002438 mitochondrial Effects 0.000 description 2
- 238000000491 multivariate analysis Methods 0.000 description 2
- 238000002515 oligonucleotide synthesis Methods 0.000 description 2
- 239000003960 organic solvent Substances 0.000 description 2
- 230000001575 pathological Effects 0.000 description 2
- 238000003909 pattern recognition Methods 0.000 description 2
- 238000001558 permutation test Methods 0.000 description 2
- 230000004557 prognostic gene signature Effects 0.000 description 2
- 238000004445 quantitative analysis Methods 0.000 description 2
- 101700002861 ralR Proteins 0.000 description 2
- 230000000754 repressing Effects 0.000 description 2
- 101710004466 rgy Proteins 0.000 description 2
- 101710030364 rgy1 Proteins 0.000 description 2
- 101710030359 rgy2 Proteins 0.000 description 2
- 239000004055 small Interfering RNA Substances 0.000 description 2
- 238000007619 statistical method Methods 0.000 description 2
- 230000001629 suppression Effects 0.000 description 2
- 238000005382 thermal cycling Methods 0.000 description 2
- 230000005026 transcription initiation Effects 0.000 description 2
- 230000004614 tumor growth Effects 0.000 description 2
- 238000011179 visual inspection Methods 0.000 description 2
- ZROHGHOFXNOHSO-BNTLRKBRSA-L (1R,2R)-cyclohexane-1,2-diamine;oxalate;platinum(2+) Chemical compound [H][N]([C@@H]1CCCC[C@H]1[N]1([H])[H])([H])[Pt]11OC(=O)C(=O)O1 ZROHGHOFXNOHSO-BNTLRKBRSA-L 0.000 description 1
- 108020004463 18S Ribosomal RNA Proteins 0.000 description 1
- PIGCSKVALLVWKU-UHFFFAOYSA-N 2-Aminoacridone Chemical compound C1=CC=C2C(=O)C3=CC(N)=CC=C3NC2=C1 PIGCSKVALLVWKU-UHFFFAOYSA-N 0.000 description 1
- 101700009143 ACO1 Proteins 0.000 description 1
- 102100017481 ACTN4 Human genes 0.000 description 1
- 101700014681 ACTN4 Proteins 0.000 description 1
- 102100001041 APBB2 Human genes 0.000 description 1
- 101700001962 APBB2 Proteins 0.000 description 1
- 101710038341 ARHGEF2 Proteins 0.000 description 1
- 102100004840 ARHGEF2 Human genes 0.000 description 1
- 101710039330 ATP2B4 Proteins 0.000 description 1
- 102100003157 ATP2B4 Human genes 0.000 description 1
- 102000034451 ATPases Human genes 0.000 description 1
- 108091006096 ATPases Proteins 0.000 description 1
- 240000005020 Acaciella glauca Species 0.000 description 1
- 102000010825 Actinin Human genes 0.000 description 1
- 108010063503 Actinin Proteins 0.000 description 1
- 102000007469 Actins Human genes 0.000 description 1
- 108010085238 Actins Proteins 0.000 description 1
- 208000009956 Adenocarcinoma Diseases 0.000 description 1
- 108010082126 Alanine Transaminase Proteins 0.000 description 1
- 101710017117 An07g00800 Proteins 0.000 description 1
- 241000271566 Aves Species 0.000 description 1
- 241000212384 Bifora Species 0.000 description 1
- 210000004369 Blood Anatomy 0.000 description 1
- 210000000481 Breast Anatomy 0.000 description 1
- 235000003197 Byrsonima crassifolia Nutrition 0.000 description 1
- 240000001546 Byrsonima crassifolia Species 0.000 description 1
- 101710038196 CAMK1D Proteins 0.000 description 1
- 102100012401 CAMK1D Human genes 0.000 description 1
- 102100019237 CCSER1 Human genes 0.000 description 1
- 101710043734 CCSER1 Proteins 0.000 description 1
- 101710043956 CEACAM5 Proteins 0.000 description 1
- 102100018684 COMMD10 Human genes 0.000 description 1
- 101710030647 COMMD10 Proteins 0.000 description 1
- 102100018537 CPEB2 Human genes 0.000 description 1
- 101700055576 CPEB2 Proteins 0.000 description 1
- 101710023929 CSNK1A1 Proteins 0.000 description 1
- 102100000723 CSNK1A1 Human genes 0.000 description 1
- 102100011223 CTBP2 Human genes 0.000 description 1
- 101700041283 CTBP2 Proteins 0.000 description 1
- AIYUHDOJVYHVIT-UHFFFAOYSA-M Caesium chloride Chemical compound [Cl-].[Cs+] AIYUHDOJVYHVIT-UHFFFAOYSA-M 0.000 description 1
- 102000004631 Calcineurin Human genes 0.000 description 1
- 108010042955 Calcineurin Proteins 0.000 description 1
- 108010022366 Carcinoembryonic Antigen Proteins 0.000 description 1
- 108010022830 Cetuximab Proteins 0.000 description 1
- 102000010991 Chaperonin Cpn60 Human genes 0.000 description 1
- 108050001186 Chaperonin Cpn60 Proteins 0.000 description 1
- 229920001405 Coding region Polymers 0.000 description 1
- ACTIUHUUMQJHFO-UPTCCGCDSA-N Coenzyme Q10 Chemical compound COC1=C(OC)C(=O)C(C\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CC\C=C(/C)CCC=C(C)C)=C(C)C1=O ACTIUHUUMQJHFO-UPTCCGCDSA-N 0.000 description 1
- 102000020504 Collagenase family Human genes 0.000 description 1
- 108060005980 Collagenase family Proteins 0.000 description 1
- 208000001333 Colorectal Neoplasms Diseases 0.000 description 1
- 206010053138 Congenital aplastic anaemia Diseases 0.000 description 1
- 229920000453 Consensus sequence Polymers 0.000 description 1
- 108010043471 Core Binding Factor Alpha 2 Subunit Proteins 0.000 description 1
- 102000004127 Cytokines Human genes 0.000 description 1
- 108090000695 Cytokines Proteins 0.000 description 1
- 210000000805 Cytoplasm Anatomy 0.000 description 1
- 229920001705 Cytoplasmic polyadenylation element Polymers 0.000 description 1
- MTCFGRXMJLQNBG-UWTATZPHSA-N D-serine Chemical compound OC[C@@H](N)C(O)=O MTCFGRXMJLQNBG-UWTATZPHSA-N 0.000 description 1
- 230000004544 DNA amplification Effects 0.000 description 1
- 239000003298 DNA probe Substances 0.000 description 1
- 102000005768 DNA-Activated Protein Kinase Human genes 0.000 description 1
- 108010006124 DNA-Activated Protein Kinase Proteins 0.000 description 1
- 241000252212 Danio rerio Species 0.000 description 1
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 1
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 1
- 102000030898 Desmoplakins Human genes 0.000 description 1
- 108091000075 Desmoplakins Proteins 0.000 description 1
- 108010016626 Dipeptides Proteins 0.000 description 1
- 206010061818 Disease progression Diseases 0.000 description 1
- 108020004461 Double-Stranded RNA Proteins 0.000 description 1
- 102000002578 Dystrophin-Associated Proteins Human genes 0.000 description 1
- 108010093446 Dystrophin-Associated Proteins Proteins 0.000 description 1
- 101700019751 ECE1 Proteins 0.000 description 1
- 102100017635 ECE1 Human genes 0.000 description 1
- 101710005330 EEF2K Proteins 0.000 description 1
- 102100010782 EGFR Human genes 0.000 description 1
- 101700039191 EGFR Proteins 0.000 description 1
- 238000002965 ELISA Methods 0.000 description 1
- 101700032546 EPHB4 Proteins 0.000 description 1
- 108010016831 Elongation Factor 2 Kinase Proteins 0.000 description 1
- 102000002045 Endothelin Human genes 0.000 description 1
- 108050009340 Endothelin Proteins 0.000 description 1
- ZUBDGKVDJUIMQQ-UBFCDGJISA-N Endothelin-1 Chemical compound C([C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(O)=O)NC(=O)[C@H]1NC(=O)[C@H](CC=2C=CC=CC=2)NC(=O)[C@@H](CC=2C=CC(O)=CC=2)NC(=O)[C@H](C(C)C)NC(=O)[C@H]2CSSC[C@@H](C(N[C@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@H](CC(C)C)C(=O)N[C@@H](CCSC)C(=O)N[C@H](CC(O)=O)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCC(O)=O)C(=O)N2)=O)NC(=O)[C@@H](CO)NC(=O)[C@H](N)CSSC1)C1=CNC=N1 ZUBDGKVDJUIMQQ-UBFCDGJISA-N 0.000 description 1
- 240000006775 Enicostema verticillatum Species 0.000 description 1
- 229920002760 Expressed sequence tag Polymers 0.000 description 1
- 229920002016 Extrachromosomal DNA Polymers 0.000 description 1
- 210000003414 Extremities Anatomy 0.000 description 1
- 102000013601 Fanconi Anemia Complementation Group D2 Protein Human genes 0.000 description 1
- 108010026653 Fanconi Anemia Complementation Group D2 Protein Proteins 0.000 description 1
- 201000004939 Fanconi anemia Diseases 0.000 description 1
- 201000000142 Fanconi anemia complementation group D2 Diseases 0.000 description 1
- 108090000045 G-protein coupled receptors Proteins 0.000 description 1
- 102000003688 G-protein coupled receptors Human genes 0.000 description 1
- 102100017764 GCC2 Human genes 0.000 description 1
- 101700062385 GCC2 Proteins 0.000 description 1
- 102100018345 GFPT1 Human genes 0.000 description 1
- 101700069341 GFPT1 Proteins 0.000 description 1
- 101700019698 GNL1 Proteins 0.000 description 1
- 102100003550 GNL1 Human genes 0.000 description 1
- 102100010966 GPT Human genes 0.000 description 1
- 102100010971 GPT2 Human genes 0.000 description 1
- 101700047294 GPT2 Proteins 0.000 description 1
- 102000009465 Growth Factor Receptors Human genes 0.000 description 1
- 108010009202 Growth Factor Receptors Proteins 0.000 description 1
- XKMLYUALXHKNFT-UUOKFMHZSA-N Guanosine-5'-triphosphate Chemical compound C1=2NC(N)=NC(=O)C=2N=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)[C@H]1O XKMLYUALXHKNFT-UUOKFMHZSA-N 0.000 description 1
- 101710035962 HB1 Proteins 0.000 description 1
- 102100004061 HNRNPL Human genes 0.000 description 1
- 101710031714 HNRNPL Proteins 0.000 description 1
- 102100003681 HSPD1 Human genes 0.000 description 1
- 101710013836 HSPD1 Proteins 0.000 description 1
- 241000282619 Hylobates lar Species 0.000 description 1
- 102100016015 IGLL5 Human genes 0.000 description 1
- 101700014447 IGLL5 Proteins 0.000 description 1
- 102100008239 INPP4B Human genes 0.000 description 1
- 101710031804 INPP4B Proteins 0.000 description 1
- LFVLUOAHQIVABZ-UHFFFAOYSA-N Iodofenphos Chemical compound COP(=S)(OC)OC1=CC(Cl)=C(I)C=C1Cl LFVLUOAHQIVABZ-UHFFFAOYSA-N 0.000 description 1
- UWKQSNNFCGGAFS-XIFFEERXSA-N Irinotecan Chemical compound C1=C2C(CC)=C3CN(C(C4=C([C@@](C(=O)OC4)(O)CC)C=4)=O)C=4C3=NC2=CC=C1OC(=O)N(CC1)CCC1N1CCCCC1 UWKQSNNFCGGAFS-XIFFEERXSA-N 0.000 description 1
- 102100011815 KCNK1 Human genes 0.000 description 1
- 101700008915 KCNK1 Proteins 0.000 description 1
- 102100012491 KIAA0319L Human genes 0.000 description 1
- 101710040130 KIAA0319L Proteins 0.000 description 1
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 description 1
- GSDSWSVVBLHKDQ-UHFFFAOYSA-N Levofloxacin Chemical compound FC1=CC(C(C(C(O)=O)=C2)=O)=C3N2C(C)COC3=C1N1CCN(C)CC1 GSDSWSVVBLHKDQ-UHFFFAOYSA-N 0.000 description 1
- 108050001513 Liprin-beta-1 Proteins 0.000 description 1
- 210000004072 Lung Anatomy 0.000 description 1
- 102100003102 MAVS Human genes 0.000 description 1
- 101700018430 MAVS Proteins 0.000 description 1
- 102100008498 MEF2A Human genes 0.000 description 1
- 101700023310 MEF2A Proteins 0.000 description 1
- 102100012740 MMP25 Human genes 0.000 description 1
- 101700026411 MMP25 Proteins 0.000 description 1
- 102100010075 MYO10 Human genes 0.000 description 1
- 101700060969 MYO10 Proteins 0.000 description 1
- 229920002521 Macromolecule Polymers 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 229920001776 Mature messenger RNA Polymers 0.000 description 1
- 102000008840 Melanoma-associated antigen 1 Human genes 0.000 description 1
- 108050000731 Melanoma-associated antigen 1 Proteins 0.000 description 1
- 206010061289 Metastatic neoplasm Diseases 0.000 description 1
- 108010008705 Mucin-2 Proteins 0.000 description 1
- 108010008692 Mucin-6 Proteins 0.000 description 1
- 102000006746 NADH Dehydrogenase Human genes 0.000 description 1
- 108010086428 NADH Dehydrogenase Proteins 0.000 description 1
- 102100007447 NCAPD2 Human genes 0.000 description 1
- 101710032688 NCAPD2 Proteins 0.000 description 1
- 102100014831 NUBPL Human genes 0.000 description 1
- 101700006473 NUBPL Proteins 0.000 description 1
- 101700080605 NUC1 Proteins 0.000 description 1
- 102000014119 Nibrin Human genes 0.000 description 1
- 108050003990 Nibrin Proteins 0.000 description 1
- 101700069236 PGA60 Proteins 0.000 description 1
- 102100012533 PPDPF Human genes 0.000 description 1
- 108060006547 PPDPF Proteins 0.000 description 1
- 101710003741 PPFIBP1 Proteins 0.000 description 1
- 102100012554 PPP1R21 Human genes 0.000 description 1
- 101710008685 PPP1R21 Proteins 0.000 description 1
- 101710017358 PPP3CA Proteins 0.000 description 1
- 102100002707 PPP3CA Human genes 0.000 description 1
- 101700061400 PRKDC Proteins 0.000 description 1
- 102100003225 PRKDC Human genes 0.000 description 1
- 101700053720 PRP40 Proteins 0.000 description 1
- 102100019032 PRPF40A Human genes 0.000 description 1
- 101710008689 PRPF40A Proteins 0.000 description 1
- 101700049963 PSY1 Proteins 0.000 description 1
- 102100017873 PTK2 Human genes 0.000 description 1
- 101700086523 PTK2 Proteins 0.000 description 1
- 102100001035 PTP4A1 Human genes 0.000 description 1
- 101710039073 PTP4A1 Proteins 0.000 description 1
- 101700055577 PTPRF Proteins 0.000 description 1
- 229920000776 Poly(Adenosine diphosphate-ribose) polymerase Polymers 0.000 description 1
- 229920000795 Polyadenylation Polymers 0.000 description 1
- 239000005062 Polybutadiene Substances 0.000 description 1
- 229920001748 Polybutylene Polymers 0.000 description 1
- 229920002367 Polyisobutene Polymers 0.000 description 1
- 241000702619 Porcine parvovirus Species 0.000 description 1
- 206010036807 Progressive multifocal leukoencephalopathy Diseases 0.000 description 1
- 102000001253 Protein Kinases Human genes 0.000 description 1
- 108060006633 Protein Kinases Proteins 0.000 description 1
- 108010078762 Protein Precursors Proteins 0.000 description 1
- 102000014961 Protein Precursors Human genes 0.000 description 1
- 108010026552 Proteome Proteins 0.000 description 1
- 229940076788 Pyruvate Drugs 0.000 description 1
- 101710015417 RABGAP1 Proteins 0.000 description 1
- 102100010218 RABGAP1 Human genes 0.000 description 1
- 101700028023 RALA Proteins 0.000 description 1
- 210000000664 Rectum Anatomy 0.000 description 1
- 108010083644 Ribonucleases Proteins 0.000 description 1
- 102000006382 Ribonucleases Human genes 0.000 description 1
- 102000004389 Ribonucleoproteins Human genes 0.000 description 1
- 108010081734 Ribonucleoproteins Proteins 0.000 description 1
- 229920001914 Ribonucleotide Polymers 0.000 description 1
- 102100000630 SAMD4B Human genes 0.000 description 1
- 101710023320 SAMD4B Proteins 0.000 description 1
- 102100007633 SELENOP Human genes 0.000 description 1
- 101710022233 SELENOP Proteins 0.000 description 1
- 102000000395 SH3 domain Human genes 0.000 description 1
- 108050008861 SH3 domain Proteins 0.000 description 1
- 101710005259 SH3GLB1 Proteins 0.000 description 1
- 102100000978 SH3GLB1 Human genes 0.000 description 1
- 102100000973 SINHCAF Human genes 0.000 description 1
- 108060007549 SINHCAF Proteins 0.000 description 1
- 101710013399 SIPA1L3 Proteins 0.000 description 1
- 102100018005 SIPA1L3 Human genes 0.000 description 1
- 102100017033 SLC35G3 Human genes 0.000 description 1
- 101710015330 SLC35G3 Proteins 0.000 description 1
- 102100017035 SLC35G5 Human genes 0.000 description 1
- 101710015331 SLC35G5 Proteins 0.000 description 1
- 102100000675 SMURF2 Human genes 0.000 description 1
- 101710043278 SMURF2 Proteins 0.000 description 1
- 102100018873 SNTB2 Human genes 0.000 description 1
- 101700026322 SNTB2 Proteins 0.000 description 1
- 102100005109 SRSF1 Human genes 0.000 description 1
- 101700016511 SRSF1 Proteins 0.000 description 1
- 210000003296 Saliva Anatomy 0.000 description 1
- 108020004682 Single-Stranded DNA Proteins 0.000 description 1
- 102000007374 Smad Proteins Human genes 0.000 description 1
- 108010007945 Smad Proteins Proteins 0.000 description 1
- 108020003224 Small Nucleolar RNA Proteins 0.000 description 1
- 102000009329 Sterile alpha motif domain Human genes 0.000 description 1
- 108050000172 Sterile alpha motif domain Proteins 0.000 description 1
- 241000282890 Sus Species 0.000 description 1
- 102000004402 Syntrophin Human genes 0.000 description 1
- 108090000916 Syntrophin Proteins 0.000 description 1
- 102100009530 TMEM87A Human genes 0.000 description 1
- 101710013195 TMEM87A Proteins 0.000 description 1
- 102100001067 TMPRSS4 Human genes 0.000 description 1
- 101710044691 TMPRSS4 Proteins 0.000 description 1
- 102100009364 TRIM5 Human genes 0.000 description 1
- 101700058064 TRIM5 Proteins 0.000 description 1
- 101700035809 TRIO Proteins 0.000 description 1
- 102100012958 TRIO Human genes 0.000 description 1
- 102100012734 TRPS1 Human genes 0.000 description 1
- 101700014835 TRPS1 Proteins 0.000 description 1
- 210000001550 Testis Anatomy 0.000 description 1
- 102000009332 Tetraspanins Human genes 0.000 description 1
- 108050000196 Tetraspanins Proteins 0.000 description 1
- 102000006612 Transducin Human genes 0.000 description 1
- 108010087042 Transducin Proteins 0.000 description 1
- 229920001437 U6 spliceosomal RNA Polymers 0.000 description 1
- 101710007061 UAC1 Proteins 0.000 description 1
- 102100015852 UBTD1 Human genes 0.000 description 1
- 101700027525 UBTD1 Proteins 0.000 description 1
- 101710011502 UMAG_10076 Proteins 0.000 description 1
- 229940035936 Ubiquinone Drugs 0.000 description 1
- 108090000848 Ubiquitin Proteins 0.000 description 1
- 102400000757 Ubiquitin Human genes 0.000 description 1
- 210000002700 Urine Anatomy 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 102100000548 WSB1 Human genes 0.000 description 1
- 101700025246 WSB1 Proteins 0.000 description 1
- 102100019742 YJEFN3 Human genes 0.000 description 1
- 101710006760 YJEFN3 Proteins 0.000 description 1
- 102100018774 YLPM1 Human genes 0.000 description 1
- 101700035284 YLPM1 Proteins 0.000 description 1
- 102100013265 YPEL5 Human genes 0.000 description 1
- 101700062365 YPEL5 Proteins 0.000 description 1
- 102100013588 ZFAND3 Human genes 0.000 description 1
- 101710013138 ZFAND3 Proteins 0.000 description 1
- ASCUXPQGEXGEMJ-GPLGTHOPSA-N [(2R,3S,4S,5R,6S)-3,4,5-triacetyloxy-6-[[(2R,3R,4S,5R,6R)-3,4,5-triacetyloxy-6-(4-methylanilino)oxan-2-yl]methoxy]oxan-2-yl]methyl acetate Chemical compound CC(=O)O[C@@H]1[C@@H](OC(C)=O)[C@@H](OC(C)=O)[C@@H](COC(=O)C)O[C@@H]1OC[C@@H]1[C@@H](OC(C)=O)[C@H](OC(C)=O)[C@@H](OC(C)=O)[C@H](NC=2C=CC(C)=CC=2)O1 ASCUXPQGEXGEMJ-GPLGTHOPSA-N 0.000 description 1
- JSSIMHFZXSNQMQ-UHFFFAOYSA-N [5-(2,4-dioxopyrimidin-1-yl)-3-hydroxyoxolan-2-yl] [hydroxy(phosphonooxy)phosphoryl] hydrogen phosphate;2-(3-hydroxy-6-oxoxanthen-9-yl)benzoic acid Chemical compound OC(=O)C1=CC=CC=C1C1=C2C=CC(=O)C=C2OC2=CC(O)=CC=C21.O1C(OP(O)(=O)OP(O)(=O)OP(O)(O)=O)C(O)CC1N1C(=O)NC(=O)C=C1 JSSIMHFZXSNQMQ-UHFFFAOYSA-N 0.000 description 1
- 238000010521 absorption reaction Methods 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 238000009098 adjuvant therapy Methods 0.000 description 1
- 238000007605 air drying Methods 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 238000000540 analysis of variance Methods 0.000 description 1
- 239000012491 analyte Substances 0.000 description 1
- 210000004102 animal cell Anatomy 0.000 description 1
- 239000000427 antigen Substances 0.000 description 1
- 108091007172 antigens Proteins 0.000 description 1
- 102000038129 antigens Human genes 0.000 description 1
- 238000000376 autoradiography Methods 0.000 description 1
- 230000033228 biological regulation Effects 0.000 description 1
- 229920001222 biopolymer Polymers 0.000 description 1
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 235000020958 biotin Nutrition 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- 230000000903 blocking Effects 0.000 description 1
- 239000008280 blood Substances 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 102000008395 cell adhesion mediator activity proteins Human genes 0.000 description 1
- 108040002558 cell adhesion mediator activity proteins Proteins 0.000 description 1
- 238000004113 cell culture Methods 0.000 description 1
- 230000024245 cell differentiation Effects 0.000 description 1
- 230000032823 cell division Effects 0.000 description 1
- 230000001413 cellular Effects 0.000 description 1
- 229940098124 cesium chloride Drugs 0.000 description 1
- 229960005395 cetuximab Drugs 0.000 description 1
- 239000005081 chemiluminescent agent Substances 0.000 description 1
- 238000000546 chi-square test Methods 0.000 description 1
- 230000002759 chromosomal Effects 0.000 description 1
- 230000001684 chronic Effects 0.000 description 1
- 238000003776 cleavage reaction Methods 0.000 description 1
- 235000017471 coenzyme Q10 Nutrition 0.000 description 1
- 229960002424 collagenase Drugs 0.000 description 1
- 201000010897 colon adenocarcinoma Diseases 0.000 description 1
- 238000009096 combination chemotherapy Methods 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 229920001577 copolymer Polymers 0.000 description 1
- 230000001808 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 101700045109 crtB Proteins 0.000 description 1
- 101710035354 crtB/uppS3 Proteins 0.000 description 1
- 101700026457 crtY Proteins 0.000 description 1
- 238000005520 cutting process Methods 0.000 description 1
- 238000000432 density-gradient centrifugation Methods 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 238000006073 displacement reaction Methods 0.000 description 1
- 239000000890 drug combination Substances 0.000 description 1
- 238000003379 elimination reaction Methods 0.000 description 1
- 239000012149 elution buffer Substances 0.000 description 1
- 239000003623 enhancer Substances 0.000 description 1
- 230000002449 erythroblastic Effects 0.000 description 1
- 125000004185 ester group Chemical group 0.000 description 1
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 1
- QHZOMAXECYYXGP-UHFFFAOYSA-N ethene;prop-2-enoic acid Chemical compound C=C.OC(=O)C=C QHZOMAXECYYXGP-UHFFFAOYSA-N 0.000 description 1
- 210000003527 eukaryotic cell Anatomy 0.000 description 1
- 238000001704 evaporation Methods 0.000 description 1
- 230000003203 everyday Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 238000011049 filling Methods 0.000 description 1
- 238000000684 flow cytometry Methods 0.000 description 1
- 238000000799 fluorescence microscopy Methods 0.000 description 1
- 101700077283 glnB Proteins 0.000 description 1
- 239000004220 glutamic acid Substances 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 230000003862 health status Effects 0.000 description 1
- 238000010438 heat treatment Methods 0.000 description 1
- 101710007446 het-s Proteins 0.000 description 1
- 230000002962 histologic Effects 0.000 description 1
- 230000002390 hyperplastic Effects 0.000 description 1
- 230000001900 immune effect Effects 0.000 description 1
- 238000003018 immunoassay Methods 0.000 description 1
- 238000003365 immunocytochemistry Methods 0.000 description 1
- 230000002055 immunohistochemical Effects 0.000 description 1
- 230000002779 inactivation Effects 0.000 description 1
- 238000001764 infiltration Methods 0.000 description 1
- 229960004768 irinotecan Drugs 0.000 description 1
- 230000001788 irregular Effects 0.000 description 1
- 210000002429 large intestine Anatomy 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 239000012139 lysis buffer Substances 0.000 description 1
- 108009000345 mRNA Processing Proteins 0.000 description 1
- 210000004962 mammalian cells Anatomy 0.000 description 1
- 102000003986 matrix metalloproteinase 25 Human genes 0.000 description 1
- 108090000440 matrix metalloproteinase 25 Proteins 0.000 description 1
- 238000002844 melting Methods 0.000 description 1
- 230000001394 metastastic Effects 0.000 description 1
- 200000000023 metastatic cancer Diseases 0.000 description 1
- CERQOIWHTDAKMF-UHFFFAOYSA-N methacrylic acid Chemical compound CC(=C)C(O)=O CERQOIWHTDAKMF-UHFFFAOYSA-N 0.000 description 1
- 239000004005 microsphere Substances 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000006011 modification reaction Methods 0.000 description 1
- 101700054071 myo1 Proteins 0.000 description 1
- 230000001613 neoplastic Effects 0.000 description 1
- 230000000683 nonmetastatic Effects 0.000 description 1
- 101700006494 nucA Proteins 0.000 description 1
- 238000007899 nucleic acid hybridization Methods 0.000 description 1
- 238000001216 nucleic acid method Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 210000000963 osteoblast Anatomy 0.000 description 1
- 230000000790 osteoblast Effects 0.000 description 1
- 229960001756 oxaliplatin Drugs 0.000 description 1
- 230000002093 peripheral Effects 0.000 description 1
- 239000011886 peripheral blood Substances 0.000 description 1
- 230000035699 permeability Effects 0.000 description 1
- 230000002974 pharmacogenomic Effects 0.000 description 1
- 239000002831 pharmacologic agent Substances 0.000 description 1
- 239000012071 phase Substances 0.000 description 1
- 238000000206 photolithography Methods 0.000 description 1
- 238000000053 physical method Methods 0.000 description 1
- 230000035479 physiological effects, processes and functions Effects 0.000 description 1
- 235000002949 phytic acid Nutrition 0.000 description 1
- 239000004033 plastic Substances 0.000 description 1
- 229920003023 plastic Polymers 0.000 description 1
- 229920000553 poly(phenylenevinylene) Polymers 0.000 description 1
- 229920002401 polyacrylamide Polymers 0.000 description 1
- 230000001402 polyadenylating Effects 0.000 description 1
- 229920002857 polybutadiene Polymers 0.000 description 1
- 229920000573 polyethylene Polymers 0.000 description 1
- 229920001195 polyisoprene Polymers 0.000 description 1
- 229920000642 polymer Polymers 0.000 description 1
- 229920000306 polymethylpentene Polymers 0.000 description 1
- 239000011116 polymethylpentene Substances 0.000 description 1
- 229920001343 polytetrafluoroethylene Polymers 0.000 description 1
- 229920000131 polyvinylidene Polymers 0.000 description 1
- 230000002980 postoperative Effects 0.000 description 1
- 102000004257 potassium channel family Human genes 0.000 description 1
- 108020001213 potassium channel family Proteins 0.000 description 1
- 238000007781 pre-processing Methods 0.000 description 1
- 230000003133 prior Effects 0.000 description 1
- 230000001737 promoting Effects 0.000 description 1
- 201000004681 psoriasis Diseases 0.000 description 1
- LCTONWCANYUPML-UHFFFAOYSA-M pyruvate Chemical compound CC(=O)C([O-])=O LCTONWCANYUPML-UHFFFAOYSA-M 0.000 description 1
- 238000011002 quantification Methods 0.000 description 1
- 238000001959 radiotherapy Methods 0.000 description 1
- 101700029718 ral Proteins 0.000 description 1
- 238000007637 random forest analysis Methods 0.000 description 1
- 239000011541 reaction mixture Substances 0.000 description 1
- 102000005962 receptors Human genes 0.000 description 1
- 108020003175 receptors Proteins 0.000 description 1
- 230000000306 recurrent Effects 0.000 description 1
- 235000003499 redwood Nutrition 0.000 description 1
- 230000022983 regulation of cell cycle Effects 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 230000003248 secreting Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 238000002864 sequence alignment Methods 0.000 description 1
- 230000035939 shock Effects 0.000 description 1
- 238000004513 sizing Methods 0.000 description 1
- 150000003384 small molecules Chemical class 0.000 description 1
- 239000007790 solid phase Substances 0.000 description 1
- 238000010186 staining Methods 0.000 description 1
- 239000007858 starting material Substances 0.000 description 1
- 210000000130 stem cell Anatomy 0.000 description 1
- 230000000638 stimulation Effects 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 239000000725 suspension Substances 0.000 description 1
- ATJFFYVFTNAWJD-UHFFFAOYSA-N tin hydride Chemical compound [Sn] ATJFFYVFTNAWJD-UHFFFAOYSA-N 0.000 description 1
- 230000002588 toxic Effects 0.000 description 1
- 231100000331 toxic Toxicity 0.000 description 1
- 230000005029 transcription elongation Effects 0.000 description 1
- 230000002103 transcriptional Effects 0.000 description 1
- 108091006091 transcriptional repressors Proteins 0.000 description 1
- 230000014621 translational initiation Effects 0.000 description 1
- 230000004222 uncontrolled growth Effects 0.000 description 1
- 238000007473 univariate analysis Methods 0.000 description 1
- 230000003612 virological Effects 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
- 230000036642 wellbeing Effects 0.000 description 1
- HCHKCACWOHOZIP-UHFFFAOYSA-N zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 1
- 239000011701 zinc Substances 0.000 description 1
- 229910052725 zinc Inorganic materials 0.000 description 1
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61P—SPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
- A61P35/00—Antineoplastic agents
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/112—Disease subtyping, staging or classification
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/118—Prognosis of disease development
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
- G01N33/574—Immunoassay; Biospecific binding assay; Materials therefor for cancer
- G01N33/57407—Specifically defined cancers
- G01N33/57419—Specifically defined cancers of colon
Abstract
Disclosed is a method for diagnosing colon cancer in a sample obtained from a subject, comprising: detecting an expression level of at least 100 colon cancer-related nucleic acid molecules listed in Table 6 of the specification, in a sample comprising nucleic acids obtained from a subject; comparing the combined expression level of the at least 100 colon cancer-related nucleic acid molecules, or a decision score derived therefrom, to a control threshold indicative of a diagnosis of colon cancer, wherein the expression level, or a decision score derived therefrom, on the same side of the threshold indicates a diagnosis of colon cancer, thereby diagnosing colon cancer in the sample obtained from the subject; and wherein the at least 100 colon cancer-related nucleic acid molecules do not include EFNA3, IGF1, CTSD, PTK2, AXIN2, CTGF, CHD2, ITGA6, RUNX1, ID2, HMGB1, VDR, EPHB4, PKM2, SOD2, IGFBP2, GRB7, DSP, ICAM1, BMP2, BMPR1A, CYP1B1, IGF2, NOTCH2, NOTCH2NL, BUB3, MMP1, JUN, TCF7L2, FLT1, CEACAM6, GBP1, XPC, RALBP1, STAT1, FOXO3, RTEL1, TNFRSF6B, EGFR, HIF1A, KLF6, or MUC2.
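The comparison step recited in the abstract can be illustrated with a minimal sketch. It is not the method of the specification: the abstract does not say how the combined expression level or decision score is computed, so the weighted linear sum, the function names `decision_score` and `diagnose`, the transcript identifiers, and the convention that a score at or above the threshold indicates a diagnosis are all assumptions made for illustration. Only the minimum of 100 transcripts and the list of excluded genes are taken from the abstract.

```python
from typing import Mapping

# Genes that the abstract says the signature must not include.
EXCLUDED = frozenset(
    "EFNA3 IGF1 CTSD PTK2 AXIN2 CTGF CHD2 ITGA6 RUNX1 ID2 HMGB1 VDR EPHB4 "
    "PKM2 SOD2 IGFBP2 GRB7 DSP ICAM1 BMP2 BMPR1A CYP1B1 IGF2 NOTCH2 NOTCH2NL "
    "BUB3 MMP1 JUN TCF7L2 FLT1 CEACAM6 GBP1 XPC RALBP1 STAT1 FOXO3 RTEL1 "
    "TNFRSF6B EGFR HIF1A KLF6 MUC2".split()
)


def decision_score(expression: Mapping[str, float],
                   weights: Mapping[str, float],
                   bias: float = 0.0) -> float:
    """Combine per-transcript expression levels into one score.

    Assumption: a weighted linear sum. The abstract only says that a score
    is "derived" from the combined expression level.
    """
    return bias + sum(w * expression[t] for t, w in weights.items())


def diagnose(expression: Mapping[str, float],
             weights: Mapping[str, float],
             threshold: float,
             min_transcripts: int = 100) -> bool:
    """Return True when the score falls on the diagnostic side of the threshold.

    Assumption: "the same side of the threshold" is modeled as score >= threshold.
    """
    if len(weights) < min_transcripts:
        raise ValueError(f"signature needs at least {min_transcripts} transcripts")
    overlap = EXCLUDED & set(weights)
    if overlap:
        raise ValueError(f"signature contains excluded genes: {sorted(overlap)}")
    return decision_score(expression, weights) >= threshold


# Hypothetical signature of 100 placeholder transcripts with equal weights.
weights = {f"T{i:03d}": 1.0 for i in range(100)}
sample = {t: 0.5 for t in weights}
print(diagnose(sample, weights, threshold=40.0))
```

In practice the weights and the threshold would be fitted on training samples with known outcomes; the values here are placeholders chosen only so that the sketch runs.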
Description
/022594
COLON CANCER GENE EXPRESSION SIGNATURES
AND METHODS OF USE
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of U.S. Provisional Application No.
,922, filed on January 25, 2011, which is incorporated herein by reference in its
entirety.
FIELD OF THE INVENTION
The present disclosure relates to gene expression profiling in colon tissues,
such as colon cancer tissues. In particular, the present disclosure concerns sensitive
methods to measure mRNA levels in biopsied colon tumor tissues, including archived
paraffin-embedded biopsy material. In addition, the disclosure provides sets of
expressed transcripts forming gene expression signatures for the prognosis, diagnosis
and treatment of colon cancer.
BACKGROUND OF THE INVENTION
Approximately 30% of all colon cancer patients are diagnosed with stage II
disease. (Jemal et al., CA Cancer J. Clin., 2004). The 5-year survival for patients with
stage II colon cancer treated by surgery is approximately , demonstrating that
the majority of patients are cured by surgery alone. (Benson, The Oncologist, 2006;
Nauta et al., Arch. Surg., 1989.) Nevertheless, approximately 20-25% of these
patients will develop recurrent disease within their lifetime. (Benson, The Oncologist,
2006; Gill et al., J. Clin. Oncol., 2004). In theory, these patients should benefit from
adjuvant chemotherapy. However, only around 3-4% of patients have an absolute
improvement in survival at 5 years with the use of adjuvant chemotherapy in stage II
colon cancer. (Benson, The Oncologist, 2006; Andre et al., Annals of Surgical
Oncology 2006). As a consequence, the American Society of Clinical Oncology
guidelines recommend that these patients should not be routinely treated with
adjuvant chemotherapy. (Benson et al., J. Clin. Oncol., 2004). Despite this, it is clear
that approximately 20% of stage II colon cancer patients, at higher risk of relapse,
may be candidates for adjuvant treatment. (Benson, The Oncologist, 2006; Nauta et
al., Arch. Surg., 1989; Gill et al., J. Clin. Oncol., 2004; Andre et al., Annals of
Surgical Oncology 2006.)
In diseases such as colon cancer, the first treatment is often the most important
and offers the greatest chance of success, so there exists a need to use the treatment
most effective for a patient’s particular stage of colon cancer as the first treatment.
This has traditionally been impossible because no method was available for predicting
which drug treatment would be the most effective for a particular individual’s
physiology. Many times patients would needlessly undergo toxic drug therapy. For
example, in Stage II tumor node metastasis (TNM) colon cancer, there has been no
method of determining which patients will respond to adjuvant chemotherapy after
surgery. Only one third of the 20% of stage II patients at risk for relapse after surgery
derive any benefit from chemotherapy. This means that prescribing adjuvant
chemotherapy exposes some patients to treatment that is unnecessary. Alternatively, a
decision to withhold adjuvant chemotherapy at this stage will expose some patients
to a higher risk of cancer relapse.
Currently, diagnostic tests used in clinical practice are based on a single
analyte test, and therefore do not capture the potential value of knowing relationships
between dozens of different markers. Moreover, diagnostic tests are frequently not
quantitative, relying on immunohistochemistry. This method often yields different
results in different laboratories, in part because the reagents are not standardized, and
in part because the interpretations can be subjective and may not be easily quantified.
RNA-based tests have not often been used because of the problem of RNA
degradation over time and the fact that it is difficult to obtain fresh tissue samples
from patients for analysis. Fixed paraffin-embedded tissue is more readily available
and methods have been established to detect RNA in fixed tissue. However, these
methods typically do not allow for the study of large numbers of genes (DNA or
RNA) from small amounts of material. Thus, traditionally fixed tissue has been rarely
used other than for immunohistochemical detection of proteins.
Recently, several groups have published studies concerning the classification
of various cancer types by microarray gene expression analysis (see, e.g., Golub et al.,
Science 1 537 (1999); Bhattacharjee et al., Proc. Natl. Acad. Sci. USA
98:13790-13795 (2001); Chen-Hsiang et al., Bioinformatics 17 (Suppl. 1):S316-S322
(2001); Ramaswamy et al., Proc. Natl. Acad. Sci. USA 98:15149-15154 (2001);
Salazar et al., Journal of Clinical Oncology 29:17-24 (2010); O’Connell et al.,
Journal of Clinical Oncology 28:3937-3944 (2010); and Kerr et al., Journal of
Clinical Oncology 27 (suppl) 15s (2009)). However, these studies mostly focus on
improving and refining the already established classification of various types of
cancer, and generally do not provide new insights into the relationships of the
differentially expressed genes, and do not link the findings to treatment strategies in
order to improve the clinical outcome of cancer therapy. In addition, cancer treatment
and colon cancer clinical trials are still being pursued on the basis of the availability
of new active compounds rather than the integrated approach of pharmacogenomics,
which utilizes the genetic makeup of the tumor and the genotype of the patient to
establish a personalized medication regime.
Although modern molecular biology and biochemistry have revealed more
than 100 genes whose activities influence the behavior of tumor cells, state of their
differentiation, and their sensitivity or resistance to certain therapeutic drugs, with a
few exceptions, the status of these genes has not been exploited for the purpose of
routinely making clinical decisions about drug treatments.
SUMMARY OF THE INVENTION
There is a need to identify biomarkers useful for predicting prognosis of
patients with colon cancer. The ability to classify patients as high risk (poor
prognosis) or low risk (favorable prognosis) would enable selection of appropriate
therapies for these patients. For example, high-risk patients are likely to benefit from
aggressive therapy, whereas therapy may have no significant advantage for low risk
patients. However, in spite of this need, a solution to this problem has not been
available.
Therefore, microarray-based prognostic technologies are needed that provide a
physician with information on the likelihood of recovery or relapse following
administration of a particular treatment regimen, such as resection with or without
chemotherapy. Technologies are also needed that can accurately diagnose a colon
disease, particularly the diagnosis of a particular stage of colon cancer, or can predict
a colon disease patient’s response to a particular therapy. Specific knowledge
regarding a tumor in a cancer patient would be extremely useful in prolonging
remission, increasing the quality of patient life, and reducing healthcare costs. Such
technologies may also be used to screen patient candidates for clinical trials for novel
therapeutic compounds and methods to facilitate the regulatory approval process.
Disclosed are expression signatures from colon cancer that meet these needs.
The disclosed signatures can be used for applications in prognosis of colon cancer,
diagnosis of colon cancer and classifying patient groups. In some embodiments, these
results permit assessment of genomic evidence of the efficacy of surgery alone, or in
combination with adjuvant chemotherapy for treatment of colon cancer. The
signatures described herein may be significant in, and capable of, discriminating
between two diagnoses or prognostic outcomes. An important aspect of the present
disclosure is to use the measured expression of certain genes in colon cancer tissue to
match patients to the most appropriate treatment, and to provide prognostic
information. Thus, disclosed are methods of using such colon cancer signatures. The
disclosed methods include detecting an expression level of at least 2 colon cancer-
related nucleic acid molecules listed in Table 6 in a sample comprising nucleic acids
obtained from a subject and comparing the expression level of the at least 2 colon
cancer-related nucleic acid molecules, or a decision score derived therefrom, to a
control threshold. Depending on the prediction requested, the control threshold can be
indicative of a diagnosis of colon cancer, indicative of a known classification of colon
cancer, indicative of a known response to treatment, indicative of having a history of
long term survival, indicative of a history of recurrence and the like.
In various embodiments, RNA is isolated from a colon tissue sample, and used
for preparing a gene expression profile. In certain embodiments involving prognosis
of cancer, the sample is a colorectal tumor specimen, such as a colon cancer sample.
In certain embodiments, the gene expression profile involves detecting the expression
of at least 50 transcripts listed in Table 6, and which may also be listed in Table 1
and/or Table 2. The total number of transcripts detected in the gene expression profile
can vary. For example, in some embodiments the total number of transcripts detected
in the profile is from about 200 to about 1000, or from about 400 to about 800, or in
other embodiments, the number of transcripts is from about 500 to about 700, or from
about 550 to about 650. In various embodiments, at least about 50, at least about 100,
at least about 200, at least about 300, at least about 400, at least about 500, at least
about 600, or all transcripts, listed in Table 6 are detected as part of the total number
of transcripts. Where additional transcripts are detected (in addition to those of Table
6), they may be optionally selected from signal or expression level controls, and in
some embodiments, are transcripts known to be expressed in colon cancer, such as
those determined by Colorectal Cancer DSA™. In certain embodiments, the
additional transcripts may also be indicative of colon cancer prognosis.
The patient’s expression profile is scored against an expression signature
based on expression levels of the transcripts listed in Table 6 in high risk and low risk
patient groups, such as patients with a high or low risk of clinical relapse, and the
results may be used to determine a course of treatment. For example, a patient
determined to be a high risk patient may be treated with adjuvant chemotherapy after
surgery. For a patient deemed to be a low risk patient, adjuvant chemotherapy may be
withheld after surgery. Accordingly, the invention provides, in certain aspects, a
method for preparing a gene expression profile of a colon cancer tumor that is
indicative of risk of recurrence.
The disclosure further provides a method for prognosing colon cancer. The
method according to this aspect comprises preparing a gene expression profile of a
colon cancer specimen (e.g., as described herein). The gene expression profile is then
classified or scored against a gene expression signature described herein. In various
embodiments, the gene expression signature is based on the expression level of at
least 50 transcripts listed in Table 6, and which may also be listed in Table 1 and/or
Table 2. In some embodiments, the total number of transcripts on which the signature
is based is less than about 800, less than about 700, less than about 600, less than
about 500, less than about 400, less than about 300, less than about 200, or less than
about 100 transcripts, and which includes transcripts from Table 6. For example, the
signature may be based on the expression levels of at least about 400, at least about
500, or at least about 600 transcripts from Table 6. Optionally, the transcripts from
Table 6 include the transcripts listed in Table 1.
Also disclosed are methods of preparing a personalized colon cancer genomics
profile for a subject. The methods include detecting an expression level of at least 2
colon cancer-related nucleic acid molecules listed in Table 6 in a sample comprising
nucleic acids obtained from a subject and creating a report summarizing the data
obtained by the gene expression analysis.
In some examples of the disclosed methods, expression levels are determined
from nucleic acids obtained from the subject that comprise RNA and/or cDNA
transcribed from RNA extracted from a sample of colorectal tissue obtained from the
subject, such as a colon cancer sample.
Also disclosed are nucleic acid probes and primers (as well as sets of such
probes and primers) for detecting a gene expression signature for colon cancer. In
some examples the probes are part of an array for use in the detection of a colon
cancer signature.
The foregoing and other features and advantages of the disclosure will become
more apparent from the following detailed description of several embodiments, which
proceeds with reference to the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
provides a flow chart showing an exemplary procedure used to derive a
colon cancer transcript expression signature.
provides a flow chart showing an exemplary outline of the stage II
colon cancer prognostic signature generation and validation, using the Colorectal
Cancer DSA™.
provides a graph of the receiver operating characteristic (ROC) curve
of the 636 transcript prognostic signature in the training set.
provides a Kaplan-Meier plot of recurrence from training data from
the candidate model.
provides a graph of the receiver operating characteristic (ROC) curve
of the 636 transcript prognostic signature in the validation set.
provides a Kaplan-Meier plot of recurrence from validation data from
the candidate model.
provides a Kaplan-Meier plot of overall survival from validation data
from the candidate model.
is Table 3 as described below.
is Table 6 as described below.
BRIEF DESCRIPTION OF THE TABLES
Table 1 provides a list of 10 candidate transcripts included in a core colon
signature. These transcripts have been identified as having the highest impact on the
classification of samples into poor and good prognosis groups.
Table 2 provides a list of 178 unique transcripts included in the colon signature.
This table includes the weight rank of the transcript in the 636 transcript signature as
well as the orientation of the transcript expressed in colon tissue.
Table 3 provides key patient and tumor characteristics in the study to identify
the 636 transcript signature.
Table 4 provides performance metrics for the cross-validated training set and
validation set used to identify the transcript signature.
Table 5 provides results of the statistical analysis showing Hazards Ratio for
patient age, patient gender, pT-stage, tumor grade, tumor location and mucinous/non-
mucinous subtype status.
Table 6 provides a list of the transcripts included in the 636-transcript colon
signature.
SEQUENCE LISTING
The nucleic and amino acid sequences listed in the accompanying sequence
listing are shown using standard letter abbreviations for nucleotide bases, as defined
in 37 C.F.R. §1.822. Only one strand of each nucleic acid sequence is shown, but the
complementary strand is understood as included by any reference to the displayed
strand. In the accompanying sequence listing:
SEQ ID NOs: 1-636 are oligonucleotide transcripts from human colon cancer.
The Sequence Listing is submitted as an ASCII text file in the form of the file
named ADL-0311_Sequence_Listing.txt, which was created on January 25, 2012, and
is 232,154 bytes, which is incorporated by reference herein.
DETAILED DESCRIPTION
I. Summary of Terms
Unless defined otherwise, technical and scientific terms used herein have the
same meaning as commonly understood by one of ordinary skill in the art to which
this disclosure belongs. Definitions of common terms in molecular biology may be
found in Benjamin Lewin, Genes IX, published by Jones and Bartlett, 2008 (ISBN
0763752223); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology,
published by Blackwell Science Ltd., 1994 (ISBN 0632021829); Robert A. Meyers
(ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference,
published by VCH Publishers, Inc., 1995 (ISBN 9780471185710); Singleton et al.,
Dictionary of Microbiology and Molecular Biology 2nd ed., J. Wiley & Sons (New
York, N.Y. 1994), and March, Advanced Organic Chemistry Reactions, Mechanisms
and Structure 4th ed., John Wiley & Sons (New York, N.Y. 1992).
The singular terms “a,” “an,” and “the” include plural referents unless context
clearly indicates otherwise. Similarly, the word “or” is intended to include “and”
unless the context clearly indicates otherwise. The term “comprises” means
“includes.” In case of conflict, the present specification, including explanations of
terms, will control.
To facilitate review of the various embodiments of this disclosure, the
following explanations of terms are provided:
Amplifying a nucleic acid molecule: To increase the number of copies of a
nucleic acid molecule, such as a gene or fragment of a gene, for example a transcript
shown in Table 6. The resulting products are called amplification products.
An example of in vitro amplification is the polymerase chain reaction (PCR).
Other examples of in vitro amplification techniques include quantitative real-time
PCR, strand displacement amplification (see U.S. Patent No. 5,744,311);
transcription-free isothermal amplification (see U.S. Patent No. 6,033,881); repair
chain reaction amplification (see International Patent Publication No. WO 90/01069);
ligase chain reaction amplification (see EP-A-320 308); gap filling ligase chain
reaction amplification (see U.S. Patent No. 5,427,930); coupled ligase detection and
PCR (see U.S. Patent No. 6,027,889); and NASBA™ RNA transcription-free
amplification (see U.S. Patent No. 6,025,134).
Array: An arrangement of molecules, such as biological macromolecules
(such as nucleic acid molecules) or biological samples (such as tissue sections), in
addressable locations on or in a substrate. In some examples an array is an array of
polynucleotide probes (such as probes that hybridize to the nucleic acid sequences
shown in Table 6, or the complement thereof), bound to a solid substrate so as not to
be substantially dislodged during a hybridization procedure. A “microarray” is an
array that is miniaturized so as to require or be aided by microscopic examination for
evaluation or analysis. Arrays are sometimes called DNA chips or biochips.
The array of molecules (“features”) makes it possible to carry out a very large
number of analyses on a sample at one time. In certain example arrays, one or more
molecules (such as an oligonucleotide probe) will occur on the array a plurality of
times (such as twice), for instance to provide internal controls.
In particular examples, an array includes nucleic acid molecules, such as
oligonucleotide sequences. The polynucleotides used on an array may be cDNAs
("cDNA arrays") that are typically about 500 to 5000 bases long, although shorter or
longer cDNAs can also be used. Alternatively, the polynucleotides can be
oligonucleotides, which are typically about 20 to 80 bases long, although shorter and
longer oligonucleotides are also suitable. In one example, the molecule includes
oligonucleotides attached to the array via their 5’- or 3’-end.
Within an array, each arrayed sample is addressable, in that its location can be
reliably and consistently determined within the at least two dimensions of the array.
The number of addressable locations on the array can vary, for example from at least
four, to at least 9, at least 10, at least 14, at least 15, at least 20, at least 30, at least 50,
at least 75, at least 100, at least 150, at least 200, at least 300, at least 500, at least 550,
at least 600, at least 800, at least 1000, at least 10,000, or more. The sample
application location on an array can assume different shapes. For example, the array
can be regular (such as arranged in uniform rows and columns) or irregular. Thus, in
ordered arrays the location of each sample is assigned to the sample at the time when
it is applied to the array, and a key may be provided in order to correlate each location
with the appropriate target or feature position. Often, ordered arrays are arranged in a
symmetrical grid pattern, but samples could be arranged in other patterns (such as in
radially distributed lines, spiral lines, or ordered clusters). Addressable arrays usually
are computer readable, in that a computer can be programmed to correlate a particular
address on the array with information about the sample at that position (such as
hybridization or binding data, including for instance signal intensity). In some
examples of computer readable formats, the individual features in the array are
arranged regularly, for instance in a Cartesian grid pattern, which can be correlated to
address information by a computer.
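The addressability described above can be illustrated with a minimal sketch. It assumes a regular Cartesian grid, and the probe names, addresses, and intensity values are hypothetical placeholders for illustration only, not data from this disclosure.

```python
# Minimal sketch of an addressable array: each (row, column) address maps to
# the feature spotted there, so a signal read at an address can be correlated
# with the probe at that position. All names and values are hypothetical.

array_key = {
    (0, 0): "probe_A",
    (0, 1): "probe_B",
    (1, 0): "probe_A",  # replicate feature, e.g. to serve as an internal control
    (1, 1): "probe_C",
}

def signals_by_probe(key, intensities):
    """Group measured intensities by the probe located at each address."""
    grouped = {}
    for address, intensity in intensities.items():
        probe = key[address]
        grouped.setdefault(probe, []).append(intensity)
    return grouped

def mean_signal(grouped):
    """Average replicate features so that each probe has a single value."""
    return {probe: sum(vals) / len(vals) for probe, vals in grouped.items()}

measured = {(0, 0): 100.0, (0, 1): 40.0, (1, 0): 120.0, (1, 1): 10.0}
per_probe = mean_signal(signals_by_probe(array_key, measured))
```

Here the key plays the role of the lookup that correlates each location with its feature; an irregular layout would only change the form of the address, not the lookup itself.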
Binding or stable binding: An association between two substances or
molecules, such as the association of a nucleic acid to another nucleic acid (such as
the binding of a probe to a transcript shown in Table 6 or its complement), or the
association of a protein with another protein or nucleic acid molecule. Binding can be
detected by any procedure known to one skilled in the art, for example in the case of a
nucleic acid, such as by physical or functional properties of the target:oligonucleotide
complex.
Physical methods of detecting the binding of complementary strands of
nucleic acid molecules include, but are not limited to, such methods as DNase I or
chemical footprinting, gel shift and affinity cleavage assays, Northern blotting, dot
blotting and light absorption detection procedures. For example, one method involves
observing a change in light absorption of a solution containing an oligonucleotide (or
an analog) and a target nucleic acid at 220 to 300 nm as the temperature is slowly
increased. If the oligonucleotide or analog has bound to its target, there is a sudden
increase in absorption at a characteristic temperature as the oligonucleotide (or
analog) and target disassociate from each other, or melt. In another example, the
method involves detecting a signal, such as a detectable label, present on one or both
nucleic acid molecules (or antibody or protein as appropriate).
The binding between an oligomer and its target nucleic acid is frequently
characterized by the temperature (Tm) at which 50% of the oligomer is melted from its
target. A higher (Tm) means a stronger or more stable complex relative to a complex
with a lower (Tm).
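As a rough illustration of how Tm tracks complex stability, the sketch below uses the Wallace rule, a common rule of thumb for short oligonucleotides in which Tm is approximated as 2 °C per A or T plus 4 °C per G or C. This rule is not taken from the present disclosure, it ignores salt and oligomer concentration, and the example sequences are hypothetical.

```python
def wallace_tm(sequence):
    """Approximate Tm in degrees C for a short DNA oligo (Wallace rule).

    Assumption: the rule of thumb Tm = 2*(A+T) + 4*(G+C), which is only a
    coarse estimate and is generally applied to oligos of roughly 14-20 bases.
    """
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc

# Hypothetical 16-mers: the GC-rich oligo gets the higher estimated Tm,
# i.e. the more stable complex with its target.
mixed_oligo = "ATGCATGCATGCATGC"
gc_rich_oligo = "GCGCGCGCGCGCGCGC"
more_stable = max([mixed_oligo, gc_rich_oligo], key=wallace_tm)
```

For probe design in practice, nearest-neighbor thermodynamic models are normally preferred over this estimate.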
cDNA (complementary DNA): A piece of DNA lacking internal, non-coding
segments (introns) and regulatory sequences which determine transcription. cDNA can
be synthesized by reverse transcription from messenger RNA (mRNA) extracted from
cells and/or tissue samples, such as colon samples, including colon cancer samples.
Clinical outcome: Refers to the health status of a patient following treatment
for a disease or disorder, or in the absence of treatment. Clinical outcomes include,
but are not limited to, an increase in the length of time until death, a decrease in the
length of time until death, an increase in the chance of survival, an increase in the risk
of death, survival, disease-free survival, chronic disease, metastasis, advanced or
aggressive disease, disease recurrence, death, and favorable or poor response to
therapy.
Colon cancer: Cancer that forms in the tissues of the colon (the longest part
of the large intestine). Most colon cancers are adenocarcinomas (cancers that begin in
cells that line internal organs and have gland-like properties). Cancer
progression is characterized by stages, or the extent of cancer in the body. Staging is
usually based on the size of the tumor, whether lymph nodes contain cancer, and
whether the cancer has spread from the original site to other parts of the body. Stages
of colon cancer include stage I, stage II, stage III and stage IV. Unless otherwise
specified, the term colon cancer refers to colon cancer at Stage 0, Stage I, Stage II
(including Stage IIA or IIB), Stage III (including Stage IIIA, IIIB or IIIC), or Stage
IV. In some embodiments herein, the colon cancer is from any stage. In other
embodiments, the colon cancer is a stage II colon cancer.
Chemotherapeutic agents: Any chemical agent with therapeutic usefulness
in the treatment of diseases characterized by abnormal cell growth. Such diseases
include tumors, neoplasms, and cancer as well as diseases characterized by
hyperplastic growth such as psoriasis. In one embodiment, a chemotherapeutic agent
is an agent of use in treating colon cancer. In one embodiment, a chemotherapeutic
agent is a radioactive compound. One of skill in the art can readily identify a
chemotherapeutic agent of use (see for example, Slapak and Kufe, Principles of
Cancer Therapy, Chapter 86 in Harrison's Principles of Internal Medicine, 14th
edition; Perry et al., Chemotherapy, Ch. 17 in Abeloff, Clinical Oncology 2nd ed.,
2000 Churchill Livingstone, Inc; Baltzer and Berkery. (eds): Oncology Pocket Guide
to Chemotherapy, 2nd ed. St. Louis, Mosby-Year Book, 1995; Fischer, Knobf, and
Durivage (eds): The Cancer Chemotherapy Handbook, 4th ed. St. Louis, Mosby-Year
Book, 1993). Chemotherapeutic agents used for treating colon cancer include small
molecules such as 5-fluorouracil, leucovorin, irinotecan, oxaliplatin, and capecitabine,
and antibodies such as bevacizumab and cetuximab. Combination chemotherapy is the
administration of more than one agent to treat cancer.
Contacting: Placement in direct physical association; includes both in solid
and liquid form. Contacting includes contact between one molecule and another
molecule, for example; contacting a sample with a nucleic acid probe, such as a probe
for any of the sequences shown in Table 6.
Control: A “control” refers to a sample or standard used for comparison with
an experimental sample, such as a tumor sample obtained from a patient with colon
cancer. In some embodiments, the control is a sample obtained from a healthy patient
or a non-cancerous tissue sample obtained from a patient diagnosed with colon
cancer, such as a non-cancerous tissue sample from the same organ in which the
tumor resides (e.g., non-cancerous colon tissue can serve as a control for a colon
cancer). In some embodiments, the control is a historical control or standard value
(i.e., a previously tested control sample or group of samples that represent baseline or
normal values).
Controls or standards for comparison to a sample, for the determination of
differential expression, include samples believed to be normal (in that they are not
altered for the desired characteristic, for example a sample from a subject who does
not have colon cancer) as well as laboratory values, even though possibly arbitrarily
set. Laboratory standards and values may be set based on a known or determined
population value and can be supplied in the format of a graph or table that permits
comparison of measured, experimentally determined values.
Detecting expression: Determining a level of expression in either a
qualitative or quantitative manner, for example by detecting nucleic acid. Exemplary methods include
microarray analysis, RT-PCR, and Northern blot. In some examples, detecting
expression includes detecting the expression of one or more of the transcripts in Table
Differential expression or altered expression: A difference, such as an
increase or decrease, in the conversion of the information encoded in a gene (such as
any of the genes from Table 1, 2, and/or nucleic acid transcripts in Table 6) into
messenger RNA, the conversion of mRNA to a protein, or both. In some examples,
the difference is relative to a control or reference value, such as an amount of
expression of a nucleic acid transcript in tissue not affected by a disease, such as
colon cancer, from the same subject, or an amount expected in a different subject who
does not have colon cancer. The difference can also be in a non-cancerous tissue from
a subject (that has the cancer in the same organ) as compared to tissue from a
different subject not afflicted with colon cancer. Detecting differential expression can
include measuring a change in gene or protein expression, such as a change in
expression of one or more of the genes listed in Table 1, 2, and/or the expression of one
or more transcripts shown in Table 6.
Downregulated or decreased: When used in reference to the expression of a
nucleic acid molecule, refers to any process that results in a decrease in production of
the nucleic acid. A gene product can be RNA (such as mRNA, rRNA, tRNA, and
structural RNA) or protein. Therefore, gene downregulation or deactivation includes
processes that decrease transcription of a gene or translation of mRNA.
Gene downregulation includes any detectable decrease in the production of a
gene product. In certain examples, production of a gene product decreases by at least
1.2-fold, such as at least 2-fold, at least 3-fold or at least 4-fold, as compared to a
control (such as an amount of gene expression, such as a normalized gene expression in
a normal cell). In several examples, a control is a relative amount of gene expression
or protein expression in one or more subjects who do not have colon cancer, such as
the relative amount of gene expression or protein expression in “cancer-free” subjects
who do not have any known cancer.
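The fold-decrease comparison against a control can be sketched as follows. This is a minimal illustration that assumes both expression values are already normalized and on a linear (not log) scale; the function names, the default cutoff, and the example values are hypothetical and are not taken from the disclosure.

```python
def fold_decrease(control_level, sample_level):
    """Return how many fold lower the sample is than the control.

    Assumes normalized, linear-scale, strictly positive expression values.
    A result of 4.0 means the sample level is one quarter of the control.
    """
    if control_level <= 0 or sample_level <= 0:
        raise ValueError("expression levels must be positive")
    return control_level / sample_level

def is_downregulated(control_level, sample_level, min_fold=2.0):
    """True if the sample is decreased by at least `min_fold` vs. the control."""
    return fold_decrease(control_level, sample_level) >= min_fold

# Hypothetical values: a transcript at 8.0 in control tissue and 2.0 in a tumor
# sample is decreased 4-fold, which meets 2-fold, 3-fold and 4-fold cutoffs.
example_fold = fold_decrease(8.0, 2.0)
meets_4_fold = is_downregulated(8.0, 2.0, min_fold=4.0)
```

If the data were log2-transformed, as is common for microarray intensities, the same comparison would instead be a difference of log values checked against log2 of the cutoff.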
Exon: In theory, a segment of an interrupted gene that is represented in the
messenger RNA product. In theory the term "intron" refers to any segment of DNA
that is transcribed but removed from within the transcript by splicing together the
exons on either side of it. Operationally, exon sequences occur in the mRNA
sequence of a gene as defined by Ref. Seq ID numbers. Operationally, intron
sequences are the intervening sequences within the genomic DNA of a gene,
bracketed by exon sequences and having GT and AG splice consensus sequences at
their 5' and 3' boundaries.
Expression: The process by which the coded information of a gene is
converted into an operational, non-operational, or structural part of a cell, such as the
synthesis of nucleic acid or a protein. Gene expression can be influenced by external
signals. For instance, exposure of a cell to a hormone may stimulate expression of a
hormone-induced gene. Different types of cells can respond differently to an identical
signal. Expression of a gene also can be regulated anywhere in the pathway from
DNA to RNA to protein. Regulation can include controls on transcription, translation,
RNA transport and processing, degradation of intermediary molecules such as
mRNA, or through activation, inactivation, compartmentalization or degradation of
specific protein molecules after they are produced.
The expression of a nucleic acid molecule can be altered, for example relative
to expression in a normal (e.g., non-cancerous) sample. An alteration in gene
expression, such as differential expression, includes but is not limited to: (1)
overexpression; (2) underexpression; or (3) suppression of expression. Alterations in
the expression of a nucleic acid molecule can be associated with, and in fact cause, a
change in expression of the corresponding protein. “Expression” and/or “relative
expression” can be considered the expression value after normalization of a specific
transcript with respect to a threshold value, which is defined in the context of the
expression of all other transcripts in an expression signature, such as a colon cancer
expression signature. The overall expression data for a given sample is normalized
using methods known to those skilled in the art in order to correct for differing
amounts of starting material, varying efficiencies of the extraction and amplification
reactions etc. Using a linear classifier on the normalized data to make a diagnostic or
prognostic call (e.g. good or poor prognosis) effectively means to split the data space,
i.e., all possible combinations of expression values for all genes in the signature, into
two disjoint halves by means of a separating hyperplane. This split is empirically
derived on a large set of training examples, for example from patients with good and
poor prognosis. Without loss of generality, one can assume a certain fixed set of
values for all but one gene, which would automatically define a threshold value for
this remaining gene where the decision would change from, for example, good to poor
prognosis. Expression values above this dynamic threshold would then either indicate
good (for a gene with a negative weight) or poor prognosis (for a gene with a positive
weight). The precise value of this threshold depends on the actual measured
expression profile of all other genes within the signature, but the general indication of
certain genes remains fixed, i.e., high values or “relative over-expression” always
contribute to either a poor prognosis decision (genes with a positive weight) or good
prognosis decision (genes with a negative weight). Therefore, in the context of the
overall gene expression signature relative expression can indicate if either up- or
down-regulation of a certain transcript is indicative of good or poor prognosis.
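The separating-hyperplane argument can be made concrete with a small sketch. It assumes a generic linear classifier whose decision score is the weighted sum of normalized expression values plus a bias, with a positive score read as poor prognosis; the transcript names, weights, bias, and expression values are hypothetical and are not the weights of the signature described here.

```python
def decision_score(weights, expression, bias=0.0):
    """Weighted sum of normalized expression values plus a bias term."""
    return sum(weights[g] * expression[g] for g in weights) + bias

def classify(weights, expression, bias=0.0):
    """Assumed convention: positive score -> 'poor', otherwise 'good'."""
    return "poor" if decision_score(weights, expression, bias) > 0 else "good"

def single_gene_threshold(weights, expression, gene, bias=0.0):
    """Expression value of `gene` at which the call flips, holding the
    measured values of all other genes fixed (the dynamic threshold)."""
    rest = sum(weights[g] * expression[g] for g in weights if g != gene) + bias
    return -rest / weights[gene]

# Hypothetical three-transcript signature.
weights = {"T1": 2.0, "T2": -1.0, "T3": 0.5}
bias = -1.0
profile = {"T1": 1.0, "T2": 2.0, "T3": 4.0}

score = decision_score(weights, profile, bias)
call = classify(weights, profile, bias)

# T1 has a positive weight: values above its threshold push toward 'poor'.
t1_threshold = single_gene_threshold(weights, profile, "T1", bias)
# T2 has a negative weight: values above its threshold push toward 'good'.
t2_threshold = single_gene_threshold(weights, profile, "T2", bias)
call_if_t2_high = classify(weights, {**profile, "T2": 4.0}, bias)
```

Changing any other value in `profile` moves each per-gene threshold, while the direction associated with each weight sign stays fixed, which is the behavior the paragraph above describes.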
Gene amplification: A process by which multiple copies of a gene or gene
fragment are formed in a particular cell or cell line. The duplicated region (a stretch of
amplified DNA) is often referred to as an "amplicon." Usually, the amount of the
messenger RNA (mRNA) produced, i.e., the level of gene expression, also increases
in proportion to the number of copies made of the particular gene expressed.
Expression profile (or fingerprint or signature): A pattern of gene
expression, which is characteristic of, or correlated with, a specific disease stage or a
specific prognostic outcome. The gene expression signature may be represented by a
set of informative genes, or transcripts thereof, coding or non-coding or both. The
expression levels of the transcripts within the signatures can be evaluated to make a
prognostic determination with, but not limited to, the methods provided herein. Gene
expression levels may be used to distinguish between two clinical conditions or
outcomes such as normal and diseased tissue for diagnosis, or responsiveness
compared to non-responsiveness for prognostic methods and recurring compared to
non-recurring for predictive methods. Differential or altered gene expression can be
detected by changes in the detectable amount of gene expression (such as cDNA or
mRNA) or by changes in the detectable amount of proteins expressed by those genes.
A distinct or identifiable pattern of gene expression, for instance a pattern of high and
low expression of a defined set of genes or gene-indicative nucleic acids such as
ESTs; in some examples, as few as one or two genes provides a profile, but more
genes can be used in a profile, for example at least 2, at least 3, at least 4, at least 5, at
least 6, at least 7, at least 9, at least 10 or at least 11 and so on. In some embodiments,
the profile comprises at least about 200 genes (or “transcripts”) and up to about 1000
transcripts, such as from about 400 transcripts to about 800 transcripts, or about 500
transcripts to about 700 transcripts. The profile comprises transcripts from Table 6
(e.g., at least 100, at least 200, at least 300, at least 400, at least 500, or at least 600
transcripts from Table 6), including in some embodiments the 636 transcripts listed in
Table 6. As used herein, the term “gene” refers to an expressed transcript, which may
be a characterized gene, or may be an expressed transcript such as an EST. In some
embodiments, the detection platform is a microarray, and each probe is considered as
determining the expression of a separate “gene” or “transcript.”
A gene expression profile (also referred to as a fingerprint or signature) can
be linked to a tissue or cell type (such as colon tissue), to a particular stage of normal
tissue growth or disease progression (such as colon cancer), or to any other distinct or
identifiable condition that influences gene expression in a predictable way. Gene
expression profiles can include relative as well as absolute expression levels of
specific genes, and can be viewed in the context of a test sample compared to a
baseline or control sample profile (such as a sample from a subject who does not have
colon cancer). In one example, a gene expression profile in a subject is read on an
array (such as a nucleic acid array).
Hybridization: To form base pairs between complementary regions of two
strands of DNA, RNA, or between DNA and RNA, thereby forming a duplex
molecule, for example a duplex formed between a probe and any of the nucleic acid
sequences shown in Table 6 or the complement thereof. Hybridization conditions
resulting in particular degrees of stringency will vary depending upon the nature of
the hybridization method and the composition and length of the hybridizing nucleic
acid sequences. Generally, the temperature of hybridization and the ionic strength
(such as the Na+ concentration) of the hybridization buffer will determine the
stringency of hybridization. Calculations regarding hybridization conditions for
attaining particular degrees of stringency are discussed in Sambrook et al., (1989)
Molecular Cloning, second edition, Cold Spring Harbor Laboratory, Plainview, NY
(chapters 9 and 11). The following is an exemplary set of hybridization conditions
and is not limiting:
Very High Stringency (detects sequences that share at least 90% identity)
Hybridization: 5x SSC at 65°C for 16 hours
Wash twice: 2x SSC at room temperature (RT) for 15 minutes each
Wash twice: 0.5x SSC at 65°C for 20 minutes each
High Stringency (detects sequences that share at least 80% identity)
Hybridization: 5x-6x SSC at 65°C-70°C for 16-20 hours
Wash twice: 2x SSC at RT for 5-20 minutes each
Wash twice: 1x SSC at 55°C-70°C for 30 minutes each
Low Stringency (detects sequences that share at least 60% identity)
Hybridization: 6x SSC at RT to 55°C for 16-20 hours
Wash at least twice: 2x-3x SSC at RT to 55°C for 20-30 minutes each
Isolated: An “isolated” biological component (such as a nucleic acid
molecule, protein, or cell) has been substantially separated or purified away from
other biological components in the cell of the organism, or the organism itself, in
which the component naturally occurs, such as other chromosomal and extra-
chromosomal DNA and RNA, proteins and cells. The term also embraces nucleic acid
molecules prepared by recombinant expression in a host cell as well as chemically
synthesized nucleic acid molecules. For example, an isolated cell, such as a colon
cancer cell, is one that is substantially separated from other types of cells.
Label: An agent capable of detection, for example by ELISA,
spectrophotometry, flow cytometry, or microscopy or other visual techniques. For
example, a label can be attached to a nucleic acid molecule or protein, thereby
permitting detection of the nucleic acid molecule or protein. For example, the label
can be attached to a nucleic acid molecule or an antibody that specifically binds to a
target molecule, such as a target nucleic acid molecule. Examples of labels include, but are not limited to,
radioactive isotopes, enzyme substrates, co-factors, ligands, chemiluminescent agents,
fluorophores, haptens, enzymes, and combinations thereof. Methods for labeling and
guidance in the choice of labels appropriate for various purposes are discussed for
example in Sambrook et al. (Molecular Cloning: A Laboratory Manual, Cold Spring
Harbor, New York, 1989) and Ausubel et al. (In Current Protocols in Molecular
Biology, John Wiley & Sons, New York, 1998).
Long term survival: Disease-free survival for at least 3 years, more
preferably for at least 5 years, even more preferably for at least 8 years following
surgery or other treatment (e.g., chemotherapy) for colon cancer.
More aggressive: As used herein, a “more aggressive” form of a colon cancer
is a colon cancer with a relatively increased risk of metastasis or recurrence (such as
following surgical removal of the tumor). A “more aggressive” colon cancer can also
refer to a colon cancer that confers an increased likelihood of death, or a decrease in
the time until death, upon a subject with the colon cancer. A subject having a “more
aggressive” form of a colon cancer is considered high risk (poor prognosis).
Nucleic acid molecules representing genes: Any nucleic acid, for example
DNA (intron or exon or both), cDNA, or RNA (such as mRNA), of any length
suitable for use as a probe or other indicator molecule, and that is informative about
the corresponding gene, such as those listed in Tables 1, or 2, for example the
transcripts listed in Table 6.
Oligonucleotide: A relatively short polynucleotide, including, without
limitation, single-stranded deoxyribonucleotides, single- or double-stranded
ribonucleotides, RNA:DNA hybrids and double-stranded DNAs. Oligonucleotides,
such as single-stranded DNA probe oligonucleotides, are often synthesized by
chemical methods, for example using automated oligonucleotide synthesizers that are
commercially available. However, oligonucleotides can be made by a variety of other
methods, including in vitro recombinant DNA-mediated techniques and by expression
of DNAs in cells and organisms.
Patient: As used herein, the term “patient” includes human and non-human
animals. The preferred patient for treatment is a human. “Patient” and “subject” are
used interchangeably herein.
Patient response: can be assessed using any endpoint indicating a benefit to
the patient, including, without limitation, (1) inhibition, to some extent, of tumor
growth, including slowing down and complete growth arrest; (2) reduction in the
number of tumor cells; (3) reduction in tumor size; (4) inhibition (i.e., reduction,
slowing down or complete stopping) of tumor cell infiltration into adjacent peripheral
organs and/or tissues; (5) inhibition (i.e., reduction, slowing down or complete
stopping) of metastasis; (6) enhancement of anti-tumor immune response, which may,
but does not have to, result in the regression or rejection of the tumor; (7) relief, to
some extent, of one or more symptoms associated with the tumor; (8) increase in the
length of survival following treatment; and/or (9) decreased mortality at a given point
of time following treatment.
Polynucleotide: When used in singular or plural, generally refers to any
polyribonucleotide or polydeoxyribonucleotide, which may be unmodified RNA or
DNA or modified RNA or DNA, or even combinations thereof. Thus, for instance,
polynucleotides as defined herein include, without limitation, single- and double-
stranded DNA, DNA including single- and double-stranded regions, single- and
double-stranded RNA, and RNA including single- and double-stranded regions,
hybrid molecules comprising DNA and RNA that may be single-stranded or, more
typically, double-stranded or include single- and double-stranded regions. The term
"polynucleotide" also includes DNAs and RNAs that contain one or more modified
bases. Thus, DNAs or RNAs with backbones modified for stability or for other
reasons are "polynucleotides" as that term is intended herein. Moreover, DNAs or
RNAs comprising unusual bases, such as inosine, or modified bases, such as tritiated
bases, are included within the term "polynucleotides" as defined herein. In general,
the term "polynucleotide" embraces all chemically, enzymatically and/or
metabolically modified forms of unmodified polynucleotides, as well as the chemical
forms of DNA and RNA characteristic of viruses and cells, including simple and
complex cells.
Probes and primers: A probe comprises an isolated nucleic acid capable of
hybridizing to a target nucleic acid (such as one of the nucleic acid sequences shown in
Table 6 or the complement thereof). A detectable label or reporter molecule can be
attached to a probe. Typical labels include radioactive isotopes, enzyme substrates,
co-factors, ligands, chemiluminescent or fluorescent agents, haptens, and enzymes.
Methods for preparing and using nucleic acid probes and primers are described, for
example, in Sambrook et al. (In Molecular Cloning: A Laboratory Manual, CSHL,
New York, 1989), Ausubel et al. (ed.) (In Current Protocols in Molecular Biology,
John Wiley & Sons, New York, 1998), and Innis et al. (PCR Protocols, A Guide to
Methods and Applications, Academic Press, Inc., San Diego, CA, 1990). Methods for
labeling and guidance in the choice of labels appropriate for various purposes are
discussed, for example in Sambrook et al. (In Molecular Cloning: A Laboratory
Manual, CSHL, New York, 1989) and Ausubel et al. (In Current Protocols in
Molecular Biology, John Wiley & Sons, New York, 1998).
Probes are generally at least 12 nucleotides in length, such as at least 12, at least
13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least
21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at
least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36,
at least 37, at least 38, at least 39, at least 40, at least 45, at least 50, or more contiguous
nucleotides complementary to the target nucleic acid molecule, such as a primer of 15-
50 nucleotides, 20-50 nucleotides, or 15-30 nucleotides. In some examples, a probe is
even longer, such as a cDNA probe, which can be from about 500 to more than 5000
nucleotides in length.
Primers are short nucleic acid molecules, for instance DNA oligonucleotides 10
nucleotides or more in length, which can be annealed to a complementary target nucleic
acid molecule by nucleic acid hybridization to form a hybrid between the primer and
the target nucleic acid strand. A primer can be extended along the target nucleic acid
molecule by a polymerase enzyme. Therefore, primers can be used to amplify a target
nucleic acid molecule (such as a nucleic acid sequence shown in Table 6).
The specificity of a primer and/or a probe increases with its length. Thus, for
example, a primer that includes 30 consecutive nucleotides will anneal to a target
sequence with a higher specificity than a corresponding primer of only 15 nucleotides.
Thus, to obtain greater specificity, probes and primers can be selected that include at
least 15, 20, 25, 30, 35, 40, 45, 50 or more consecutive nucleotides. In particular
examples, a primer is at least 15 nucleotides in length, such as at least 15 contiguous
nucleotides complementary to a target nucleic acid molecule. Particular lengths of
primers that can be used to practice the methods of the present disclosure include
primers having at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at
least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least
28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at
least 36, at least 37, at least 38, at least 39, at least 40, at least 45, at least 50, or more
contiguous nucleotides complementary to the target nucleic acid molecule to be
amplified, such as a primer of 15-50 nucleotides, 20-50 nucleotides, or 15-30
nucleotides. The most important factors considered in PCR primer design include
primer length, melting temperature (Tm), GC content, specificity, complementary
primer sequences, and 3'-end sequence. In general, optimal PCR primers are
17-30 bases in length and contain about 20-80%, such as, for example, about 50-60%,
G+C bases. Tm's between 50°C and 80°C, e.g., about 50°C to 70°C, are typically
preferred.
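The length, G+C content and Tm ranges given above can be checked programmatically. The following is a minimal sketch only: the melting temperature is estimated with the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a coarse approximation for short oligonucleotides that is swapped in here for illustration and is not stated in this disclosure. The primer sequence and function names are hypothetical.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Coarse Tm estimate in degrees C using the Wallace rule (assumption, see lead-in)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def check_primer(seq: str) -> dict:
    """Compare a primer against the general ranges quoted in the text."""
    return {
        "length_17_30": 17 <= len(seq) <= 30,
        "gc_20_80_percent": 20.0 <= gc_percent(seq) <= 80.0,
        "tm_50_80_C": 50 <= wallace_tm(seq) <= 80,
    }


# Hypothetical 20-mer used only to exercise the checks.
example_primer = "AGCGTACCTGATCGGATCCA"
report = check_primer(example_primer)
```

A primer passes when every entry in `report` is true. Specificity, primer complementarity and 3'-end sequence, which the text also lists, are not covered by this sketch and would need sequence comparison against the target and the partner primer.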
Primer pairs can be used for amplification of a nucleic acid sequence, for
example, by PCR, real-time PCR, or other nucleic-acid amplification methods known in
the art. An “upstream” or “forward” primer is a primer 5' to a reference point on a
nucleic acid sequence. A “downstream” or “reverse” primer is a primer 3' to a reference
point on a nucleic acid sequence. In general, at least one forward and one reverse
primer are included in an amplification reaction.
Nucleic acid probes and primers can be readily prepared based on the nucleic
acid molecules provided herein, for example, by using computer programs intended
for that purpose such as Primer (Version 0.5, © 1991, Whitehead Institute for
Biomedical Research, Cambridge, MA) or PRIMER EXPRESS® Software (Applied
Biosystems, AB, Foster City, CA).
Further guidelines for PCR primer and probe design may be found in
Dieffenbach et al. General Concepts for PCR Primer Design in: PCR Primer, A
Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 1995, pp. 133-
155; Innis and Gelfand, Optimization of PCRs in: PCR Protocols, A Guide to
Methods and Applications, CRC Press, London, 1994, pp. 5-11; and Plasterer,
Primerselect: Primer and probe design. Methods Mol. Biol. 70:520-527, 1997.
Prognosis: The likelihood of the clinical outcome for a subject afflicted with a
specific disease or disorder. With regard to cancer, the prognosis is a representation of
the likelihood (probability) that the subject will survive (such as for one, two, three,
four or five years) and/or the likelihood (probability) that the tumor will metastasize.
The term "prediction" is used herein to refer to the likelihood that a patient will
respond either favorably or unfavorably to a drug or set of drugs, and also the extent
of those responses. The predictive methods of the present invention can be used
clinically to make treatment decisions by choosing the most appropriate treatment
modalities for any particular patient. The predictive methods of the present disclosure
are valuable tools in predicting if a patient is likely to respond favorably to a
treatment regimen, such as surgical intervention, chemotherapy with a given drug or
drug combination, and/or radiation therapy.
Purified: The term "purified" does not require absolute purity; rather, it is
intended as a relative term. Thus, for example, a purified oligonucleotide preparation is
one in which the oligonucleotide is more pure than in an environment including a
complex mixture of oligonucleotides.
Sample: A biological specimen containing genomic DNA, RNA (including
mRNA and microRNA), protein, or combinations thereof, obtained from a subject.
Examples include, but are not limited to, peripheral blood, urine, saliva, tissue biopsy,
aspirate, surgical specimen, and autopsy material, and includes fixed and/or paraffin
embedded samples. In one example, a sample includes a biopsy of a colon (such as
colon cancer tumor), a sample of noncancerous tissue, or a sample of normal tissue
(from a subject not afflicted with a known disease or disorder, such as a cancer-free
subject).
Sequence identity/similarity: The identity/similarity between two or more
nucleic acid sequences, or two or more amino acid sequences, is expressed in terms of
the identity or similarity between the sequences. Sequence identity can be measured in
terms of percentage identity; the higher the percentage, the more identical the sequences
are. Sequence similarity can be measured in terms of percentage similarity (which takes
into account conservative amino acid substitutions); the higher the percentage, the more
similar the sequences are.
Methods of alignment of sequences for comparison are well known in the art.
Various programs and alignment algorithms are described in: Smith & Waterman, Adv.
Appl. Math. 2:482, 1981; Needleman & Wunsch, J. Mol. Biol. 48:443, 1970; Pearson &
Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988; Higgins & Sharp, Gene, 73:237-44,
1988; Higgins & Sharp, CABIOS 5:151-3, 1989; Corpet et al., Nuc. Acids Res.
16:10881-90, 1988; Huang et al. Computer Appls. in the Biosciences 8, 155-65, 1992;
and Pearson et al., Meth. Mol. Bio. 24:307-31, 1994. Altschul et al., J. Mol. Biol.
215:403-10, 1990, presents a detailed consideration of sequence alignment methods and
homology calculations.
The NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., J.
Mol. Biol. 215:403-10, 1990) is available from several sources, including the National
Center for Biotechnology (NCBI, National Library of Medicine, Building 38A, Room
8N805, Bethesda, MD 20894) and on the Internet, for use in connection with the
sequence analysis programs blastp, blastn, blastx, tblastn and tblastx. Additional
information can be found at the NCBI web site.
BLASTN is used to compare nucleic acid sequences, while BLASTP is used
to compare amino acid sequences. If the two compared sequences share homology,
then the designated output file will present those regions of homology as aligned
sequences. If the two compared sequences do not share homology, then the
designated output file will not present aligned sequences.
Once aligned, the number of matches is determined by counting the number of
positions where an identical nucleotide or amino acid residue is presented in both
sequences. The percent sequence identity is determined by dividing the number of
matches either by the length of the sequence set forth in the identified sequence, or by
an articulated length (such as 100 consecutive nucleotides or amino acid residues
from a sequence set forth in an identified sequence), followed by multiplying the
resulting value by 100. For example, a nucleic acid sequence that has 1166 matches
when aligned with a test sequence having 1554 nucleotides is 75.0 percent identical to
the test sequence (1166÷1554*100=75.0). The percent sequence identity value is
rounded to the nearest tenth. For example, 75.11, 75.12, 75.13, and 75.14 are rounded
down to 75.1, while 75.15, 75.16, 75.17, 75.18, and 75.19 are rounded up to 75.2. The
length value will always be an integer. In another example, a target sequence
containing a 20-nucleotide region that aligns with 20 consecutive nucleotides from an
identified sequence as follows contains a region that shares 75 percent sequence
identity to that identified sequence (that is, 15÷20*100=75).
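The percent identity arithmetic above, including the stated rounding to the nearest tenth with values ending in 5 rounded up, can be sketched as follows. The function names are hypothetical, and decimal arithmetic is used here as an illustrative choice so that the half-up rounding described in the text is applied exactly.

```python
from decimal import Decimal, ROUND_HALF_UP


def count_matches(aligned_a: str, aligned_b: str) -> int:
    """Count aligned positions holding the same residue; gap characters never match."""
    return sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")


def round_to_tenth(value: Decimal) -> Decimal:
    """Round to the nearest tenth, with 75.15 going up to 75.2 as in the text."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def percent_identity(matches: int, length: int) -> Decimal:
    """Matches divided by an integer length, multiplied by 100 and rounded to a tenth."""
    if length <= 0 or not 0 <= matches <= length:
        raise ValueError("length must be positive and matches must lie in [0, length]")
    return round_to_tenth(Decimal(matches) * 100 / Decimal(length))


# The two worked examples from the text.
full_length_example = percent_identity(1166, 1554)
region_example = percent_identity(15, 20)
```

Both worked examples from the text evaluate to 75.0 under this sketch. The `length` argument can be either the full length of the identified sequence or an articulated length such as 100 consecutive residues, as the definition allows.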
One indication that two nucleic acid molecules are closely related is that the two
molecules hybridize to each other under stringent conditions, as described above.
Nucleic acid sequences that do not show a high degree of identity may nevertheless
encode identical or similar (conserved) amino acid sequences, due to the degeneracy of
the genetic code. Changes in a nucleic acid sequence can be made using this degeneracy
to produce multiple nucleic acid molecules that all encode substantially the same
protein. Such homologous nucleic acid sequences can, for example, possess at least
about 60%, 70%, 80%, 90%, 95%, 98%, or 99% sequence identity to a molecule listed
in Table 6 determined by this method.
One of skill in the art will appreciate that the particular sequence identity ranges
are provided for guidance only; it is possible that strongly significant homologs could
be obtained that fall outside the ranges provided.
Splicing or RNA splicing: An RNA processing that removes introns and joins
exons to produce mature mRNA with continuous coding sequence that moves into the
cytoplasm of a eukaryotic cell.
Transcript or gene product: An RNA molecule that is generated or derived
through the process of transcription from its corresponding DNA or a cDNA template.
Transcripts include coding and non-coding RNA molecules such as, but not limited
to, messenger RNAs (mRNA), alternatively spliced mRNAs, ribosomal RNA
(rRNA), transfer RNAs (tRNAs) in addition to a large range of other transcripts,
which are not translated into protein such as small nuclear RNAs (snRNAs), antisense
molecules such as short interfering RNA (siRNA) and microRNA (miRNA) and other
RNA transcripts of unknown function. In some embodiments, a transcript is a nucleic
acid sequence shown in Table 6.
Therapeutic: A generic term that includes both diagnosis and treatment.
Treatment: Includes both therapeutic treatment and prophylactic or
preventative measures, wherein the object is to prevent or slow down (lessen) the
targeted pathologic condition or disorder. Those in need of treatment include those
already with the disorder as well as those prone to have the disorder or those in whom
the disorder is to be prevented. In tumor (e.g. cancer) treatment, a treatment such as
surgery, chemotherapy or radiation may directly decrease the pathology of tumor
cells, or render the tumor cells more susceptible to further treatment.
Tumor, neoplasia, malignancy or cancer: Neoplastic cell growth and
proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells
and tissues and the result of abnormal and uncontrolled growth of cells. The terms
"cancer" and "cancerous" refer to or describe the physiological condition in mammals
that is typically characterized by unregulated cell growth. Neoplasia, malignancy,
cancer and tumor are often used interchangeably and refer to abnormal growth of a
tissue or cells that results from excessive cell division. The amount of a tumor in an
individual is the “tumor burden” which can be measured as the number, volume, or
weight of the tumor. A tumor that does not metastasize is referred to as “benign.” A
tumor that invades the surrounding tissue and/or can metastasize is referred to as
“malignant.” A “non-cancerous tissue” is a tissue from the same organ wherein the
malignant neoplasm formed, but does not have the characteristic pathology of the
neoplasm. Generally, noncancerous tissue appears histologically normal. A “normal
tissue” is tissue from an organ, wherein the organ is not affected by cancer or another
disease or disorder of that organ. A “cancer-free” subject has not been diagnosed
with a cancer of that organ and does not have detectable cancer.
The "pathology" of cancer includes all phenomena that compromise the well-
being of the patient. This includes, without limitation, abnormal or uncontrollable cell
growth, metastasis, interference with the normal functioning of neighboring cells,
release of cytokines or other secretory products at abnormal levels, suppression or
aggravation of inflammatory or immunological response, neoplasia, premalignancy,
malignancy, invasion of surrounding or distant tissues or organs, such as lymph
nodes, etc.
Tumor-Node-Metastasis (TNM): The TNM classification of malignant
tumors is a cancer staging system for describing the extent of cancer in a patient’s
body. T describes the size of the primary tumor and whether it has invaded nearby
tissue; N describes any lymph nodes that are involved; and M describes metastasis.
TNM is developed and maintained by the International Union Against Cancer to
achieve consensus on one globally recognized standard for classifying the extent of
spread of cancer.
Upregulated or activation: When used in reference to the expression of a
nucleic acid molecule, refers to any process that results in an increase in production of
a gene product. A gene product can be RNA (such as mRNA, rRNA, tRNA, and
structural RNA) or protein. Therefore, gene upregulation or activation includes
processes that increase transcription of a gene or translation of mRNA, such as an
inflammatory gene.
Examples of processes that increase transcription include those that facilitate
formation of a transcription initiation complex, those that increase transcription
initiation rate, those that increase transcription elongation rate, those that increase
processivity of transcription and those that relieve transcriptional repression (for
example by blocking the binding of a transcriptional repressor). Gene upregulation
can include inhibition of repression as well as stimulation of expression above an
existing level. Examples of processes that increase translation include those that
increase translational initiation, those that increase translational elongation and those
that increase mRNA stability.
Gene upregulation includes any detectable increase in the production of a gene
product, such as an inflammatory gene. In certain examples, production of a gene
product increases by at least 1.2 fold, such as at least 2-fold, at least 3-fold, at least 4-
fold, at least , at least 8-fold, at least 10-fold, or at least d, as compared to
a control (such as an amount of gene expression and/or normalized gene expression in a
normal cell).
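As a simple illustration of the fold-increase comparison described above, the sketch below divides a sample measurement by a control measurement and compares the ratio with a chosen minimum. It assumes linear-scale (not log-transformed) normalized expression values; the function names and example numbers are hypothetical.

```python
def fold_change(sample_level: float, control_level: float) -> float:
    """Ratio of sample expression to control expression on a linear scale."""
    if control_level <= 0 or sample_level < 0:
        raise ValueError("control must be positive and sample non-negative")
    return sample_level / control_level


def is_upregulated(sample_level: float, control_level: float, min_fold: float = 1.2) -> bool:
    """True when production is increased by at least min_fold relative to the control."""
    return fold_change(sample_level, control_level) >= min_fold


# Hypothetical normalized expression values for one gene product.
tumor_level = 6.0
normal_level = 2.0
observed_fold = fold_change(tumor_level, normal_level)
```

With these illustrative numbers the gene product is increased 3-fold, which satisfies the 1.2-fold and 2-fold minimums mentioned in the text. For log2-transformed data the ratio would instead be computed as a difference of log values.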
Weight: With reference to the gene signatures disclosed herein, refers to the
relative importance of an item in a statistical calculation, for example the relative
importance of a Transcript in Table 6. The weight of each transcript in a gene
expression signature may be determined on a data set of patient samples using
analytical methods known in the art. Exemplary procedures are described below.
Suitable methods and materials for the practice or testing of this disclosure are
described below. Such methods and materials are illustrative only and are not
intended to be limiting. Other methods and materials similar or equivalent to those
described herein can be used. For example, conventional methods well known in the
art to which this disclosure pertains are described in various general and more specific
references, including, for example, Sambrook et al., Molecular Cloning: A Laboratory
Manual, 2d ed., Cold Spring Harbor Laboratory Press, 1989; Sambrook et al.,
Molecular Cloning: A Laboratory Manual, 3d ed., Cold Spring Harbor Press, 2001;
Ausubel et al., Current Protocols in Molecular Biology, Greene Publishing
Associates, 1992 (and Supplements to 2000); Ausubel et al., Short Protocols in
Molecular Biology: A Compendium of Methods from Current Protocols in Molecular
Biology, 4th ed., Wiley & Sons, 1999; Harlow and Lane, Antibodies: A Laboratory
Manual, Cold Spring Harbor Laboratory Press, 1990; Harlow and Lane, Using
Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1999;
Oligonucleotide Synthesis (M. J. Gait, ed., 1984); Animal Cell Culture (Freshney,
ed., 1987); Methods in Enzymology (Academic Press, Inc.); Handbook of
Experimental Immunology, 4th ed. (D. M. Weir & C. C. Blackwell, eds.,
Blackwell Science Inc., 1987); Gene Transfer Vectors for Mammalian Cells (J. M.
Miller & M. P. Calos, eds., 1987); and PCR: The Polymerase Chain Reaction (Mullis
et al., eds., 1994). In addition, the materials, methods, and examples are illustrative
only and not intended to be limiting.
II. Description of Several Embodiments
A. Colon Cancer Expression Signature and Methods of Use
Disclosed herein are expression signatures from colon cancer. The disclosed
signatures can be used for applications in prognosis of colon cancer, diagnosis of
colon cancer and classifying patient groups. In some embodiments, a sample obtained
from a subject, such as a patient, is processed into a set of polynucleotide binding
targets that represent transcripts expressed in the tissue sample. The polynucleotide
binding targets are probed with complementary polynucleotide probes representing,
or corresponding to the signatures described herein in order to obtain information on
expression levels of the transcripts. A decision score is optionally calculated that
represents the expression levels of the transcripts in the signature. The decision score
is then compared to a control, such as a patient population, and genetically similar
samples are correlated with known patient response or clinical outcomes. For
example, sensitive methods are also provided to predict patient response to, and
prognosis after, treatment for colon cancer, such as surgical resection and/or
chemotherapy. Generally, historical patient population data and tissue samples are
used to create genetic profiles for patients having a past history of colon cancer.
In some embodiments, the genetic profile of a patient sample is converted to a
decision score. The clinical outcomes of each patient are correlated to the genetic
profile, or decision score derived mathematically from the genetic profile for each
patient’s individual cancer.
In some embodiments, a mathematical algorithm is generated using the known
historical patient data and applied to the predictive methods for new patients with
colon cancer. In some embodiments, the algorithm defines a threshold that separates
two groups of patients depending on selection criteria, for example patient outcome,
response to therapy and recurrence, and the like. In some examples, the mathematical
algorithm or threshold is validated using further historical patient population data
before being used in the predictive methods described herein. The mathematical
algorithm or threshold may then be used as a reference, for example as a control, to
compare decision scores derived from genetic profiling of patients desirous of
predictive methods of colon cancer. In some embodiments, these results permit
assessment of genomic evidence of the efficacy of surgery alone, or in combination
with adjuvant chemotherapy for treatment of colon cancer.
The signatures described herein may be significant in, and capable of,
discriminating between two diagnoses or prognostic outcomes. An important aspect
of the present disclosure is to use the measured expression of certain genes in colon
cancer tissue to match patients to the most appropriate treatment, and to provide
prognostic information.
In some embodiments, the signatures are developed using a colorectal cancer-
focused microarray research tool. In a specific embodiment, this research tool is a
colorectal cancer transcriptome-focused research array developed by Almac
Diagnostics, Ltd. (Almac Diagnostics, Ltd., N. Ireland) capable of delivering accurate
expression data.
The Colorectal Cancer DSA™ research tool contains 61,528 probe sets and
encodes 52,306 transcripts confirmed as being expressed in colon cancer and normal
tissue. Comparing the Colorectal Cancer DSA™ research tool against the National
Center for Biotechnology Information (NCBI) human Reference Sequence (RefSeq)
RNA database (available on the world wide web at ncbi.nlm.nih.gov/RefSeq/) using
BLAST analysis, 21,968 (42%) transcripts are present and 26,676 (51%) of
transcripts are absent from the human RefSeq database. Furthermore 7% of the
content represents expressed antisense transcripts to annotated genes. (Johnston et al.,
J. Clin. Oncol. 24: 3519, 2006; Pruitt et al., Nucleic Acids Research 33: D501-D504,
2005). In addition, probe-level analysis of the Colorectal Cancer DSATM ed
with leading generic arrays, highlighted that approximately 20,000 (40%) transcripts
are not contained on the leading generic rray platform (Affymetrix) and are
unique to the ctal Cancer DSATM. Thus, the Colorectal Cancer DSATM
research tool includes ripts that have not been available in hitherto performed
gene expression s.
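The percentages quoted above can be checked from the transcript counts. The following is a small sketch of that arithmetic in Python; it assumes, for illustration only, that the RefSeq-present, RefSeq-absent, and remaining transcripts partition the 52,306 transcripts, which the text does not state explicitly.

```python
# Counts quoted in the text for the Colorectal Cancer DSA research tool.
TOTAL_TRANSCRIPTS = 52_306
IN_REFSEQ = 21_968
ABSENT_FROM_REFSEQ = 26_676


def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Transcripts not covered by the two quoted counts (assumption: the three
# categories partition the full transcript set).
remainder = TOTAL_TRANSCRIPTS - IN_REFSEQ - ABSENT_FROM_REFSEQ

shares = {
    "present in RefSeq": percent(IN_REFSEQ, TOTAL_TRANSCRIPTS),
    "absent from RefSeq": percent(ABSENT_FROM_REFSEQ, TOTAL_TRANSCRIPTS),
    "remainder": percent(remainder, TOTAL_TRANSCRIPTS),
}
```

Under that assumption the remainder is 3,662 transcripts, which rounds to the 7% figure given for antisense content.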
In some embodiments, the expression of a transcript in a gene expression
signature is considered informative if expression levels are increased or decreased
between the conditions of interest. Increases or decreases in gene expression can be
assessed by methods known to those skilled in the art that include, but are not limited
to, using fold changes, t-tests, F-tests, Wilcoxon rank-sum tests, ANOVA (Cui et al.,
Genome Biology 4:210, 2003), or dedicated methods for detecting differential
expression such as Significance Analysis of Microarrays (Tusher et al., Proc. Natl.
Acad. Sci. USA 98:5116-21, 2001) or LIMMA (Smyth, Stat. Appl. Genet. Mol. Biol.
3:Art.3, 2004).
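Two of the simpler measures named above, a fold change and a t-test statistic, can be sketched in a few lines of Python. This is an illustration only: the expression values are hypothetical, the Welch (unequal variance) form of the t statistic is chosen here as one common variant, and no p-value or multiple-testing correction is computed.

```python
import math
import statistics


def log2_fold_change(group_a, group_b):
    """log2 ratio of mean expression in group_a over group_b (linear-scale inputs)."""
    return math.log2(statistics.mean(group_a) / statistics.mean(group_b))


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


# Hypothetical expression values for one transcript in two outcome groups.
poor_prognosis = [8.0, 9.0, 10.0, 9.0]
good_prognosis = [4.0, 5.0, 4.0, 5.0]

lfc = log2_fold_change(poor_prognosis, good_prognosis)
t_stat = welch_t(poor_prognosis, good_prognosis)
```

A transcript with a large absolute fold change and a large absolute t statistic would be a candidate informative transcript in the sense described above.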
In some embodiments, the transcripts in the signature are used to form a
weighted sum of their signals, where individual weights can be positive or negative.
The resulting sum ("decision function") is compared with a predetermined reference
point. The comparison with the reference point may be used to diagnose, or predict a
clinical condition or outcome.
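The weighted sum and its comparison with a reference point can be sketched as follows. The weights, the reference point, and the sample values are hypothetical and chosen only to make the arithmetic easy to follow; the gene names are borrowed from Table 1 purely as labels.

```python
def decision_score(expression, weights):
    """Weighted sum of transcript signals; weights may be positive or negative."""
    return sum(weights[t] * expression[t] for t in weights)


def classify(expression, weights, threshold):
    """Report which side of the reference point the decision score falls on."""
    score = decision_score(expression, weights)
    return "above" if score > threshold else "at_or_below"


# Hypothetical weights and reference point (not values from the disclosure).
weights = {"ARSD": 0.5, "CXCL9": -0.25, "RNF39": 1.0}
threshold = 2.0

# Hypothetical normalized expression values for one patient sample.
sample = {"ARSD": 4.0, "CXCL9": 8.0, "RNF39": 1.5}

score = decision_score(sample, weights)
side = classify(sample, weights, threshold)
```

Here the score is 0.5 x 4.0 - 0.25 x 8.0 + 1.0 x 1.5 = 1.5, which falls at or below the reference point of 2.0.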
One of ordinary skill in the art will appreciate that the transcripts included in
the signature provided in Table 1, 2, and/or 6 will carry unequal weights in a signature
for diagnosis or prognosis of colon cancer. Therefore, while as few as 1 sequence may
be used to diagnose or predict an outcome, the specificity and sensitivity or diagnosis
or prediction accuracy may increase using more sequences. Table 6 ranks the
transcripts in order of decreasing weight in the signature, defined as the rank of the
average weight in the compound decision score function measured under cross-
validation. The weight rank also corresponds to the SEQ ID NO: in the accompanying
sequence listing; thus the transcript with the greatest weight is SEQ ID NO: 1.
In some embodiments, a signature includes at least 2, such as at least 3, at
least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at
least 12, at least 13, at least 14, at least 15, at least 20, at least 25, at least 30, at least
35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at
least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least 125, at
least 150, at least 175, at least 200, at least 225, at least 250, at least 275, at least 300,
at least 325, at least 350, at least 375, at least 400, at least 425, at least 450, at least
475, at least 500, at least 525, at least 550, at least 575, at least 600, at least 634, or
even all 636 of the transcripts in Table 6 that carry the greatest weight, defined as the
rank of the average weight in the compound decision score function measured under
cross-validation, and still have prognostic value. In some embodiments, a signature
includes the top 10 weighted transcripts, the second top 10 weighted transcripts, the
third top 10 weighted transcripts, the fourth top 10 weighted transcripts, the fifth top
10 weighted transcripts, the sixth top 10 weighted transcripts, the seventh top 10
weighted transcripts, the eighth top 10 weighted transcripts, the ninth top 10 weighted
transcripts, or the tenth top 10 weighted transcripts listed in Table 6. In yet further
embodiments, a signature includes the 636, 634, 620, 610, 600, 590, 580, 570, 560,
550, 540, 530, 520, 510, 500, 490, 480, 470, 460, 450, 440, 430, 420, 410, 400, 390,
380, 370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220,
210, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30,
20, or 10 transcripts having the greatest weight listed in Table 6. In some
embodiments, the signature is based on expression levels of from about 200 to about
1000 transcripts, such as from about 400 to about 800 transcripts, such as from about
500 to about 700 transcripts, or in some embodiments, from about 550 to about 650
transcripts, including those from Table 6 (e.g., at least about 50, at least about 100, at
least about 200, at least about 300, at least about 400, at least about 500, or at least
about 600, or all transcripts from Table 6) as described above.
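Because the weight rank doubles as the SEQ ID NO:, choosing a sub-signature by weight reduces to slicing a rank-ordered list. The sketch below assumes that "the second top 10 weighted transcripts" means ranks 11 to 20, and so on; that reading, and the helper names, are illustrative assumptions rather than definitions from the disclosure.

```python
def top_n(ranked, n):
    """First n transcripts of a list already sorted by decreasing weight."""
    return ranked[:n]


def kth_block(ranked, k, size=10):
    """k-th consecutive block (1-based) of `size` transcripts; k=2 gives ranks 11-20."""
    start = (k - 1) * size
    return ranked[start:start + size]


# Rank-ordered identifiers: weight rank i corresponds to SEQ ID NO: i.
ranked = [f"SEQ ID NO: {i}" for i in range(1, 637)]

top_two = top_n(ranked, 2)
second_ten = kth_block(ranked, 2)
```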
In one embodiment, a specific signature may be used for the methods disclosed herein
that includes transcripts for MUM1 and SIGMAR1. In another embodiment, a
signature may be used for the methods disclosed herein that includes transcripts for
MUM1, SIGMAR1, ARSD, SULT1C2 and PPFIBP1. In yet another embodiment, a
signature may be used for the methods disclosed herein that includes transcripts for
ARSD, CXCL9, PCLO, SLC2A3, FCGBP, SLC2A14, SLC2A3, BCL9L and
antisense sequences of MUC3A, OLFM4 and RNF39. This signature is represented
by Table 1 below.
Table 1: 10 candidate core transcripts within the 636 transcript signature
Gene Name | Weight Rank in 636 transcript signature | ΔAUC (Univariate) | Orientation
ARSD | 3 | -0.0109 | Sense
CXCL9 | 24 | -0.0103 | Sense
PCLO | 272 | -0.0095 | Sense
SLC2A3 | 23 | -0.0087 | Sense
FCGBP | 416 | -0.0062 | Sense
SLC2A14 /// SLC2A3 | 55 | -0.0061 | Sense
BCL9L | 175 | -0.0059 | Sense
MUC3A | 112 | -0.0084 | AntiSense
OLFM4 | 61 | -0.0083 | AntiSense
RNF39 | 14 | -0.0064 | AntiSense
In some embodiments, a core set of gene transcripts in the colon cancer signature
is provided that is identified through a separate study to determine the contribution
that each of the 636 probesets makes to the performance of the signature. In this
embodiment, ten probesets from the 636 probeset signature were removed and a new
signature was created based on 636 probesets, using the training dataset. The new
signature was then used to predict the validation dataset (without threshold) and the
AUC was measured. The difference in AUC from the 636 probeset signature was
recorded. This process was repeated 0.5 million times and the average difference in
AUC that occurred for signatures lacking said probeset was recorded. The probesets
with the largest negative ΔAUC are recorded in Table 1. In this embodiment, this set
of 10 transcripts represents a candidate core set of genes whose absence from the
signature significantly impairs the predictive performance of the signature. Thus in
certain embodiments, the transcripts representing the genes in Table 1 are included in
a colon cancer signature. In Table 1, the ΔAUC represents the drop in validation
AUC if this transcript is omitted from the signature. The orientation describes the
orientation of the transcript expressed in colon tissue. Three transcripts in this
signature are expressed as antisense transcripts of MUC3A, OLFM4 and RNF39.
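The removal study described above (remove a random subset of probesets, retrain, measure validation AUC, and average the change per probeset) can be sketched as below. This is a toy illustration under stated assumptions: the `train` function is a stand-in that weights each feature by the difference of class means, since the actual training procedure is not given here; the data, feature names, and iteration count are hypothetical.

```python
import random


def auc(scores, labels):
    """Probability that a positive sample outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def train(samples, labels, features):
    """Stand-in trainer (assumption): weight = difference of class means per feature."""
    weights = {}
    for f in features:
        pos = [s[f] for s, y in zip(samples, labels) if y == 1]
        neg = [s[f] for s, y in zip(samples, labels) if y == 0]
        weights[f] = sum(pos) / len(pos) - sum(neg) / len(neg)
    return weights


def validation_auc(train_set, valid_set, features):
    """Train on the training set, then measure AUC of the scores on the validation set."""
    (x_tr, y_tr), (x_va, y_va) = train_set, valid_set
    w = train(x_tr, y_tr, features)
    scores = [sum(w[f] * s[f] for f in features) for s in x_va]
    return auc(scores, y_va)


def mean_delta_auc(train_set, valid_set, features, n_remove, n_iter, seed=0):
    """Average change in validation AUC over the iterations in which each feature was removed."""
    rng = random.Random(seed)
    full = validation_auc(train_set, valid_set, features)
    totals = {f: 0.0 for f in features}
    counts = {f: 0 for f in features}
    for _ in range(n_iter):
        removed = set(rng.sample(features, n_remove))
        kept = [f for f in features if f not in removed]
        delta = validation_auc(train_set, valid_set, kept) - full
        for f in removed:
            totals[f] += delta
            counts[f] += 1
    return {f: totals[f] / counts[f] for f in features if counts[f]}


# Toy data: feature "A" separates the classes, "B" and "C" carry no signal.
features = ["A", "B", "C"]
x = [{"A": 1.0, "B": 2.0, "C": 3.0}, {"A": 2.0, "B": 2.0, "C": 3.0},
     {"A": 5.0, "B": 2.0, "C": 3.0}, {"A": 6.0, "B": 2.0, "C": 3.0}]
y = [0, 0, 1, 1]
result = mean_delta_auc((x, y), (x, y), features, n_remove=1, n_iter=200)
```

In this toy case removing "A" drops the AUC from 1.0 to 0.5, so its average change is -0.5, while removing "B" or "C" changes nothing. The study in the text removes ten probesets at a time from 636 and repeats 0.5 million times; the same function would be called with `n_remove=10` and a correspondingly larger `n_iter`.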
In some embodiments, the signature includes a combination of 626-636
transcripts from Table 6, that include ARSD, CXCL9, PCLO, SLC2A3, FCGBP,
SLC2A14, SLC2A3, BCL9L, MUC3A, OLFM4 and RNF39. In yet another
embodiment, the signature includes transcripts , 10-50, 50-636, 100-636, listed
in Table 6 which includes ARSD, CXCL9, PCLO, SLC2A3, FCGBP, SLC2A14,
SLC2A3, BCL9L, MUC3A, OLFM4 and RNF39 where the transcript orientation is
noted in Table 6.
Notably, 176 transcripts have been identified as being unrepresented by the
leading generic array by probe-level analysis (i.e., they are "unique" to the Colorectal
Cancer DSA™ tool described above). This group of 176 transcripts listed in Table 2
are described herein as transcripts that are unique to the colon gene signatures and
methods of use herein. Sequence-level homology searches have identified these
transcripts as not being contained on the leading generic array (Affymetrix) (i.e., they
are "unique" to the Colorectal Cancer DSA™ research tool described above). A
number of these transcripts are antisense transcripts not previously reported to be
expressed. These 176 transcripts are presented in Table 2 below, where the weight
rank corresponds to the numbers shown in Table 6. Thus the sequence of these unique
transcripts can be found in Table 6.
Table 2: Unique transcripts in 636 transcript signature
Weight Rank in 636 Transcript Signature | Gene Symbol | Orientation | Gene Description
424 | AC068491.1 (Clone_based_vega_gene) | Sense | non-protein coding RNA 152 (NCRNA00152), transcript variant 2, non-coding RNA [Source:RefSeq DNA;Acc:NR_024205]
214 | AC004968.2 (Clone_based_ensembl_gene) | Sense | Known long non-coding RNA.
50 | AC010522.1 (Clone-based (Ensembl)) /// ZNF418 /// ZNF814 | AntiSense | cDNA FLJ52732, moderately similar to Zinc finger protein 418 [Source:UniProtKB/TrEMBL;Acc:B4DR41] /// zinc finger protein 418 [Source:HGNC Symbol;Acc:20647] /// zinc finger protein 814 [Source:HGNC Symbol;Acc:33258]
13 | AC018359.1 (Clone_based_Vega_gene) /// AC123023.1 (Clone_based_Vega_gene) | AntiSense | Novel processed transcript /// Putative processed transcript.
559 | AC069513.3 (Clone_based gene) | AntiSense | Novel processed transcript.
242 | AC130352.2 (Clone_based_ensembl_gene) | nse | Novel miRNA.
593 | AC138128.1 (Clone_based_ensembl_gene) | Sense | Novel long non-coding RNA.
488 | AC138128.1 (Clone_based_ensembl_gene) | Sense | Novel long non-coding RNA.
177 | ACTN4 | AntiSense | actinin, alpha 4 [Source:HGNC Symbol;Acc:166]
427 | 22.1 (Clone_based_ensembl_gene) /// AC145212.2 (Clone_based_ensembl_gene) | Sense | Putative uncharacterized protein ENSP00000383640 [Source:UniProtKB/TrEMBL;Acc:B7WNX9] /// Known protein coding.
290 | AL604028.2 (Clone_based_ensembl_gene) | Sense | Known protein coding.
498 | AMAC1L1 | Sense | acyl-malonyl condensing enzyme 1-like 1 [Source:HGNC Symbol;Acc:31043]
93 | ANGPTL6 | Sense | angiopoietin-like 6 [Source:HGNC Symbol;Acc:23140]
73 | APBB2 | AntiSense | amyloid beta (A4) precursor protein-binding, family B, member 2 [Source:HGNC Symbol;Acc:582]
250 | ARHGAP26 | AntiSense | Rho GTPase activating protein 26 [Source:HGNC Symbol;Acc:17073]
81 | ARHGEF1 | AntiSense | Rho guanine nucleotide exchange factor (GEF) 1 [Source:HGNC Symbol;Acc:681]
326 | ARHGEF2 /// RP11-336K24.6 (Clone_based_Vega_gene) | Sense | Rho/Rac guanine nucleotide exchange factor (GEF) 2 [Source:HGNC Symbol;Acc:682] /// Known noncoding transcript with no ORF.
391 | ASPH | AntiSense | aspartate beta-hydroxylase [Source:HGNC Symbol;Acc:757]
435 | ATP2B4 | Sense | ATPase, Ca++ transporting, plasma membrane 4 [Source:HGNC Symbol;Acc:817]
108 | AXIN2 | AntiSense | axin 2 [Source:HGNC Symbol;Acc:904]
217 | BIRC6 | Sense | baculoviral IAP repeat-containing 6 [Source:HGNC Symbol;Acc:13516]
224 | BLCAP /// RP11-425M5.5 (Clone_based_Vega_gene) | AntiSense | bladder cancer associated protein [Source:HGNC Symbol;Acc:1055] /// Putative processed transcript.
373 | BMPR1A | AntiSense | bone morphogenetic protein receptor, type IA [Source:HGNC Symbol;Acc:1076]
384 | BMPR1A | AntiSense | bone morphogenetic protein receptor, type IA [Source:HGNC Symbol;Acc:1076]
552 | C2orf89 | Sense | UPF0632 protein C2orf89 Precursor [Source:UniProtKB/Swiss-Prot;Acc:Q86V40]
486 | C6orf203 | Sense | Uncharacterized protein C6orf203 [Source:UniProtKB/Swiss-Prot;Acc:Q9POP8]
436 | C8orf38 | Sense | UPF0551 protein C8orf38, mitochondrial Precursor (Putative phytoene synthase) [Source:UniProtKB/Swiss-Prot;Acc:Q330K2]
256 | CAMK1D | Sense | calcium/calmodulin-dependent protein kinase ID [Source:HGNC Symbol;Acc:19341]
400 | CAPN12 | Sense | calpain 12 [Source:HGNC Symbol;Acc:13249]
87 | CCND2 | AntiSense | cyclin D2 [Source:HGNC Symbol;Acc:1583]
82 | CCND2 | AntiSense | cyclin D2 [Source:HGNC Symbol;Acc:1583]
289 | CD200 | Sense | CD200 molecule [Source:HGNC Symbol;Acc:7203]
281 | CDC42SE2 | AntiSense | CDC42 small effector 2 [Source:HGNC Symbol;Acc:18547]
510 | CEACAM5 | AntiSense | carcinoembryonic antigen-related cell adhesion molecule 5 [Source:HGNC Symbol;Acc:1817]
159 | CHD2 | AntiSense | chromodomain helicase DNA binding protein 2 [Source:HGNC Symbol;Acc:1917]
309 | CHD2 | Sense | chromodomain helicase DNA binding protein 2 [Source:HGNC Symbol;Acc:1917]
531 | COMMD10 | AntiSense | COMM domain containing 10 [Source:HGNC Symbol;Acc:30201]
429 | CPEB2 | Sense | cytoplasmic polyadenylation element binding protein 2 [Source:HGNC Symbol;Acc:21745]
505 | CSNK1A1 | Sense | casein kinase 1, alpha 1 [Source:HGNC Symbol;Acc:2451]
466 | CTBP2 | AntiSense | C-terminal binding protein 2 [Source:HGNC Symbol;Acc:2495]
522 | DDX17 | AntiSense | DEAD (Asp-Glu-Ala-Asp) box polypeptide 17 [Source:HGNC Symbol;Acc:2740]
63 | DEDD | Sense | death effector domain containing [Source:HGNC Symbol;Acc:2755]
407 | DHRS11 | AntiSense | dehydrogenase/reductase (SDR family) member 11 [Source:HGNC Symbol;Acc:28639]
300 | DLG5 | Sense | discs, large homolog 5 (Drosophila) [Source:HGNC Symbol;Acc:2904]
344 | DSP | AntiSense | desmoplakin [Source:HGNC Symbol;Acc:3052]
378 | ECE1 | Sense | endothelin converting enzyme 1 [Source:HGNC Symbol;Acc:3146]
151 | EEF2K | AntiSense | eukaryotic elongation factor-2 kinase [Source:HGNC Symbol;Acc:24615]
587 | EGFR | AntiSense | epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian) [Source:HGNC Symbol;Acc:3236]
274 | EPHB4 | AntiSense | EPH receptor B4 [Source:HGNC Symbol;Acc:3395]
588 | FAM190A /// SEPP1 /// UBTD1 /// UTY | AntiSense | family with sequence similarity 190, member A [Source:HGNC Symbol;Acc:29349] /// selenoprotein P, plasma, 1 [Source:HGNC Symbol;Acc:10751] /// ubiquitin domain containing 1 [Source:HGNC Symbol;Acc:25683] /// ubiquitously transcribed tetratricopeptide repe
192 | FAM60A | AntiSense | family with sequence similarity 60, member A [Source:HGNC Symbol;Acc:30702]
229 | FANCD2 | Sense | Fanconi anemia, complementation group D2 [Source:HGNC Symbol;Acc:3585]
625 | FAT1 | AntiSense | FAT tumor suppressor homolog 1 (Drosophila) [Source:HGNC Symbol;Acc:3595]
12 | FNDC3B | Sense | fibronectin type III domain containing 3B [Source:HGNC Symbol;Acc:24670]
446 | FNDC3B | Sense | fibronectin type III domain containing 3B [Source:HGNC Symbol;Acc:24670]
183 | GCC2 | Sense | GRIP and coiled-coil domain containing 2 [Source:HGNC Symbol;Acc:23218]
395 | GFPT1 | AntiSense | glutamine--fructose-6-phosphate transaminase 1 [Source:HGNC Symbol;Acc:4241]
623 | GLB1 | AntiSense | galactosidase, beta 1 [Source:HGNC Symbol;Acc:4298]
332 | GMDS | Sense | GDP-mannose 4,6-dehydratase [Source:HGNC Symbol;Acc:4369]
363 | GNL1 | Sense | guanine nucleotide binding protein-like 1 [Source:HGNC Symbol;Acc:4413] /// Guanine nucleotide-binding protein-like 1 (GTP-binding protein HSR1) [Source:UniProtKB/Swiss-Prot;Acc:P36915]
512 | GPRC5A | AntiSense | G protein-coupled receptor, family C, group 5, member A [Source:HGNC Symbol;Acc:9836]
206 | GPT2 | AntiSense | glutamic pyruvate transaminase (alanine aminotransferase) 2 [Source:HGNC Symbol;Acc:18062]
341 | GRB7 | Sense | growth factor receptor-bound protein 7 [Source:HGNC Symbol;Acc:4567]
62 | GRHL2 | Sense | grainyhead-like 2 (Drosophila) [Source:HGNC Symbol;Acc:2799]
49 | GSTO2 | Sense | glutathione S-transferase omega 2 [Source:HGNC Symbol;Acc:23064]
56 | GSTO2 | Sense | glutathione S-transferase omega 2 [Source:HGNC Symbol;Acc:23064]
533 | HELZ | Sense | helicase with zinc finger [Source:HGNC Symbol;Acc:16878]
412 | HNRNPL | AntiSense | heterogeneous nuclear ribonucleoprotein L [Source:HGNC Symbol;Acc:5045]
198 | HSPD1 | Sense | heat shock 60kDa protein 1 (chaperonin) [Source:HGNC Symbol;Acc:5261]
114 | IGLL5 | Sense | immunoglobulin lambda-like polypeptide 5 [Source:HGNC Symbol;Acc:38476]
495 | IL32 | AntiSense | interleukin 32 [Source:HGNC Symbol;Acc:16830]
394 | INPP4B | Sense | inositol polyphosphate-4-phosphatase, type II, 105kDa [Source:HGNC Symbol;Acc:6075]
165 | ITGA6 | AntiSense | integrin, alpha 6 [Source:HGNC Symbol;Acc:6142]
166 | ITGA6 | AntiSense | integrin, alpha 6 [Source:HGNC Symbol;Acc:6142]
287 | KANK1 | AntiSense | KN motif and ankyrin repeat domains 1 [Source:HGNC Symbol;Acc:19309]
226 | KANK1 | AntiSense | KN motif and ankyrin repeat domains 1 [Source:HGNC Symbol;Acc:19309]
179 | KCNK1 | AntiSense | potassium channel, subfamily K, member 1 [Source:HGNC Symbol;Acc:6272]
513 | KIAA0319L | Sense | KIAA0319-like [Source:HGNC Symbol;Acc:30071]
126 | KIF24 | Sense | kinesin family member 24 [Source:HGNC Symbol;Acc:19916]
278 | KLRAQ1 | AntiSense | KLRAQ motif containing 1 [Source:HGNC Symbol;Acc:30595]
237 | LRRC37B | AntiSense | leucine rich repeat containing 37B [Source:HGNC Symbol;Acc:29070]
519 | MACC1 | Sense | metastasis associated in colon cancer 1 [Source:HGNC Symbol;Acc:30215]
301 | MACF1 | AntiSense | microtubule-actin crosslinking factor 1 [Source:HGNC Symbol;Acc:13664]
238 | MAVS | AntiSense | mitochondrial antiviral signaling protein [Source:HGNC Symbol;Acc:29233]
336 | MEF2A | Sense | myocyte enhancer factor 2A [Source:HGNC Symbol;Acc:6993]
567 | MIR612 | nse | hsa-mir-612 [Source:miRBase;Acc:MI0003625]
431 | MMP1 | AntiSense | matrix metallopeptidase 1 (interstitial collagenase) [Source:HGNC Symbol;Acc:7155]
397 | MMP25 | Sense | matrix metallopeptidase 25 [Source:HGNC Symbol;Acc:14246]
72 | MORC3 | Sense | MORC family CW-type zinc finger 3 [Source:HGNC Symbol;Acc:23572]
506 | MORC3 | Sense | MORC family CW-type zinc finger 3 [Source:HGNC Symbol;Acc:23572]
634 | MUC2 | AntiSense | mucin 2, oligomeric mucus/gel-forming [Source:HGNC Symbol;Acc:7512]
110 | MUC6 | Sense | mucin 6, oligomeric mucus/gel-forming [Source:HGNC Symbol;Acc:7517]
133 | MUC6 | Sense | mucin 6, oligomeric mucus/gel-forming [Source:HGNC Symbol;Acc:7517]
1 | MUM1 | Sense | melanoma associated antigen (mutated) 1 [Source:HGNC Symbol;Acc:29641]
385 | MYO10 | Sense | myosin X [Source:HGNC Symbol;Acc:7593]
599 | MYO1E | AntiSense | myosin IE [Source:HGNC Symbol;Acc:7599]
448 | N/A | N/A | No Transcript match
57 | N/A | N/A | No Transcript match
628 | N/A | N/A | No Transcript match
370 | N/A | Sense | non-protein coding RNA 152 (NCRNA00152), transcript variant 2, non-coding RNA [Source:RefSeq DNA;Acc:NR_024205]
6 | N/A | N/A | No Transcript match
66 | N/A | N/A | No Genome match
610 | N/A | N/A | No Transcript match
308 | N/A | N/A | No Transcript match
439 | N/A | N/A | No Genome match
131 | N/A | N/A | No Genome match
359 | NAA50 | AntiSense | N(alpha)-acetyltransferase 50, NatE catalytic subunit [Source:HGNC Symbol;Acc:29533]
333 | NAA50 | AntiSense | N(alpha)-acetyltransferase 50, NatE catalytic subunit [Source:HGNC Symbol;Acc:29533]
356 | NBN | AntiSense | nibrin [Source:HGNC Symbol;Acc:7652]
137 | NCAPD2 | Sense | non-SMC condensin I complex, subunit D2 [Source:HGNC Symbol;Acc:24305]
348 | Ol8 8 | AntiSense | small nucleolar RNA, C/D box 65 [Source:HGNC Symbol;Acc:32726]
297 | NCRNA00262 | No Transcript match | non-protein coding RNA 262 [Source:HGNC Symbol;Acc:26785]
606 | NDUFA13 /// YJEFN3 | AntiSense | NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 13 [Source:HGNC Symbol;Acc:17194] /// YjeF N-terminal domain containing 3 [Source:HGNC Symbol;Acc:24785]
502 | NR6A1 | AntiSense | nuclear receptor subfamily 6, group A, member 1 [Source:HGNC Symbol;Acc:7985]
420 | NR6A1 | AntiSense | nuclear receptor subfamily 6, group A, member 1 [Source:HGNC Symbol;Acc:7985]
61 | OLFM4 | AntiSense | olfactomedin 4 [Source:HGNC Symbol;Acc:17190]
129 | PARP14 | AntiSense | poly (ADP-ribose) polymerase family, member 14 [Source:HGNC Symbol;Acc:29232]
515 | PML | AntiSense | promyelocytic leukemia [Source:HGNC Symbol;Acc:9113]
323 | POSTN | nse | periostin, osteoblast specific factor [Source:HGNC Symbol;Acc:16953]
554 | PPDPF | Sense | pancreatic progenitor cell differentiation and proliferation factor homolog (zebrafish) [Source:HGNC Symbol;Acc:16142]
381 | PPFIBP1 | Sense | PTPRF interacting protein, binding protein 1 (liprin beta 1) [Source:HGNC Symbol;Acc:9249]
504 | PPP3CA | Sense | protein phosphatase 3, catalytic subunit, alpha isozyme [Source:HGNC Symbol;Acc:9314]
382 | PRKDC | AntiSense | protein kinase, DNA-activated, catalytic polypeptide [Source:HGNC Symbol;Acc:9413]
450 | PRPF40A | AntiSense | PRP40 pre-mRNA processing factor 40 homolog A (S. cerevisiae) [Source:HGNC Symbol;Acc:16463]
525 | PTK2 | AntiSense | PTK2 protein tyrosine kinase 2 [Source:HGNC Symbol;Acc:9611]
215 | PTP4A1 | AntiSense | protein tyrosine phosphatase type IVA, member 1 [Source:HGNC Symbol;Acc:9634]
298 | RABGAP1 | AntiSense | RAB GTPase activating protein 1 [Source:HGNC Symbol;Acc:17155]
194 | RBM47 | Sense | RNA binding motif protein 47 [Source:HGNC Symbol;Acc:30358]
461 | RERE | AntiSense | arginine-glutamic acid dipeptide (RE) repeats [Source:HGNC Symbol;Acc:9965]
355 | RHBDD1 | Sense | rhomboid domain containing 1 [Source:HGNC Symbol;Acc:23081]
454 | RNF145 | AntiSense | ring finger protein 145 [Source:HGNC Symbol;Acc:20853]
171 | RNF43 | Sense | ring finger protein 43 [Source:HGNC Symbol;Acc:18505]
496 | RP11-357H14.7 (Clone_based_Vega_gene) | Sense | Novel processed transcript.
573 | RP11-460N11.2 (Clone_based_Vega_gene) | AntiSense | Known pseudogene.
172 | RP11-460N11.2 (Clone_based_Vega_gene) | AntiSense | Known pseudogene.
155 | RP11-460N11.2 (Clone_based_Vega_gene) | AntiSense | Known pseudogene.
247 | RP11-706O15.1 (Clone_based_Vega_gene) | AntiSense | 1372, isoform CRA_c Novel protein; [Source:UniProtKB/TrEMBL;Acc:B1B108]
251 | RP11-761E20.1 (Clone_based_Vega_gene) | Sense | Novel processed transcript.
307 | RP11-86H7.1 (Clone_based gene) | Sense | Novel processed transcript.
575 | RP4-717I23.3 (Clone_based_Vega_gene) | AntiSense | Novel processed transcript.
209 | RUNX1 | AntiSense | runt-related transcription factor 1 [Source:HGNC Symbol;Acc:10471]
95 | SAMD4B | AntiSense | sterile alpha motif domain containing 4B [Source:HGNC Symbol;Acc:25492]
17 | SATB2 | AntiSense | SATB homeobox 2 [Source:HGNC Symbol;Acc:21637]
264 | SH3D19 | AntiSense | SH3 domain containing 19 [Source:HGNC Symbol;Acc:30418]
235 | SH3GLB1 | AntiSense | SH3-domain GRB2-like endophilin B1 [Source:HGNC Symbol;Acc:10833]
388 | SIPA1L3 | Sense | signal-induced proliferation-associated 1 like 3 [Source:HGNC Symbol;Acc:23801]
157 | SLC6A6 | Sense | solute carrier family 6 (neurotransmitter transporter, taurine), member 6 [Source:HGNC Symbol;Acc:11052]
259 | SLC6A6 | Sense | solute carrier family 6 (neurotransmitter transporter, taurine), member 6 [Source:HGNC Symbol;Acc:11052]
462 | SMURF2 | Sense | SMAD specific E3 ubiquitin protein ligase 2 [Source:HGNC Symbol;Acc:16809]
377 | SND1 | Sense | staphylococcal nuclease and tudor domain containing 1 [Source:HGNC Symbol;Acc:30646]
335 | SNTB2 | AntiSense | syntrophin, beta 2 (dystrophin-associated protein A1, 59kDa, basic component 2) [Source:HGNC Symbol;Acc:11169]
329 | SOD2 | AntiSense | superoxide dismutase 2, mitochondrial [Source:HGNC Symbol;Acc:11180]
263 | SP100 | Sense | SP100 nuclear antigen [Source:HGNC Symbol;Acc:11206]
243 | SPDYE2 | AntiSense | speedy homolog E2 (Xenopus laevis) [Source:HGNC Symbol;Acc:33841]
594 | SPDYE2 | AntiSense | speedy homolog E2 (Xenopus laevis) [Source:HGNC Symbol;Acc:33841]
636 | SRSF1 | AntiSense | serine/arginine-rich splicing factor 1 [Source:HGNC Symbol;Acc:10780]
561 | SSFA2 | Sense | sperm specific antigen 2 [Source:HGNC Symbol;Acc:11319]
369 | TBL1XR1 | AntiSense | transducin (beta)-like 1 X-linked receptor 1 [Source:HGNC Symbol;Acc:29529]
605 | TEX10 | AntiSense | testis expressed 10 [Source:HGNC Symbol;Acc:25988]
453 | TFAM | AntiSense | transcription factor A, mitochondrial [Source:HGNC Symbol;Acc:11741]
629 | TLCD2 | AntiSense | TLC domain containing 2 [Source:HGNC Symbol;Acc:33522]
470 | TLCD2 | AntiSense | TLC domain containing 2 [Source:HGNC Symbol;Acc:33522]
 | TMEM87A | Sense | transmembrane protein 87A [Source:HGNC Symbol;Acc:24522]
624 | TMPRSS4 | AntiSense | transmembrane protease, serine 4 [Source:HGNC Symbol;Acc:11878]
102 | TRIM5 | Sense | tripartite motif containing 5 [Source:HGNC Symbol;Acc:16276]
44 | TRPS1 | AntiSense | trichorhinophalangeal syndrome I [Source:HGNC Symbol;Acc:12340]
221 | TSPAN1 | Sense | tetraspanin 1 [Source:HGNC Symbol;Acc:20657]
543 | TTC39B | nse | tetratricopeptide repeat domain 39B [Source:HGNC Symbol;Acc:23704]
342 | U6 (RFAM) | Sense | U6 spliceosomal RNA [Source:RFAM;Acc:RF00026]
288 | WSB1 | nse | WD repeat and SOCS box-containing 1 [Source:HGNC Symbol;Acc:19221]
523 | YLPM1 | AntiSense | YLP motif containing 1 [Source:HGNC Symbol;Acc:17798]
386 | YPEL5 | AntiSense | yippee-like 5 (Drosophila) [Source:HGNC Symbol;Acc:18329]
612 | ZFAND3 | Sense | zinc finger, AN1-type domain 3 [Source:HGNC Symbol;Acc:18019]
76 | ZHX2 | AntiSense | zinc fingers and homeoboxes 2 [Source:HGNC Symbol;Acc:18513]
161 | ZNF75A | AntiSense | zinc finger protein 75a [Source:HGNC Symbol;Acc:13146]
409 | ZXDC | AntiSense | ZXD family zinc finger C [Source:HGNC Symbol;Acc:28160]
In some embodiments, a signature includes at least 2, such as at least 3, at
least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at
least 12, at least 13, at least 14, at least 15, at least 20, at least 25, at least 30, at least
35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, at
least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least 125, at
least 150, or even all 176 of the transcripts listed in Table 2, for example those that
carry the greatest weight, defined as the rank of the average weight in the compound
decision score function measured under cross-validation, and still have prognostic
value. In some embodiments, a signature includes the top 10 weighted transcripts, the
second top 10 weighted transcripts, the third top 10 weighted transcripts, the fourth
top 10 weighted transcripts, the fifth top 10 weighted transcripts, the sixth top 10
weighted transcripts, the seventh top 10 weighted transcripts, the eighth top 10
weighted transcripts, the ninth top 10 weighted transcripts, or the tenth top 10
weighted transcripts listed in Table 2. In yet further embodiments, a signature
includes the 176, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or
10 transcripts having the greatest weight listed in Table 2.
In some embodiments, the methods described herein include subjecting RNA
isolated from a patient to gene expression profiling. Thus, the gene expression profile
may be completed for a set of genes that includes at least two of the transcripts listed
in Table 6, which in some examples are normalized as described below. In particular
embodiments of the methods disclosed herein, the expression level of at least 2, such
as at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10,
at least 11, at least 12, at least 13, at least 14, at least 15, at least 20, at least 25, at
least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least
65, at least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100,
at least 125, at least 150, at least 175, at least 200, at least 225, at least 250, at least
275, at least 300, at least 325, at least 350, at least 375, at least 400, at least 425, at
least 450, at least 474, at least 500, at least 525, at least 550, at least 575, at least 600,
at least 634, or even all 636 of the transcripts in Table 6 or their expression products,
and/or complement is determined, for example the transcripts in Table 6 that carry the
greatest weight, defined as the rank of the average weight in the compound decision
score function measured under cross-validation, and still have prognostic value. In
some embodiments of this method, the expression level of at least 2, such as at
least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at
least 11, at least 12, at least 13, at least 14, at least 15, at least 20, at least 25, at least
30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at
least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least
125, at least 150, or even all 176 of the transcripts in Table 2 or their expression
products, and/or complement is determined, for example those that carry the greatest
weight, defined as the rank of the average weight in the compound decision score
function measured under cross-validation, and still have prognostic value. In the
methods described herein, the combination of transcripts may be referred to as a
signature or expression signature.
The relative expression levels of transcripts in a colon tissue are measured to
form a gene expression profile. In one embodiment, the gene expression profile of a
set of transcripts from a patient tissue sample is summarized in the form of a
compound decision score and compared to a control threshold, such as a threshold that is
mathematically derived from a training set of patient data. The threshold separates a
patient group based on different characteristics such as, but not limited to, good/poor
prognosis, responsiveness/non-responsiveness to treatment, cancer
detection/diagnosis and cancer classification. The patient training set data is
preferably derived from colon tissue samples having been characterized by prognosis,
likelihood of recurrence, or long term survival, diagnosis, cancer classification,
personalized genomics profile, clinical outcome, or treatment response. Expression
profiles, and corresponding decision scores from patient samples may be correlated
with the characteristics of patient samples in the training set that are on the same side
of the mathematically derived decision threshold. In this embodiment, the threshold of
the linear classifier compound decision score was optimized to maximize the sum of
sensitivity and specificity under cross-validation applied within the training dataset.
These methods are also useful for determining prognosis of colon cancer and, in a
particular embodiment, the prognosis of a patient with stage II colon cancer. In some examples, the
disclosed methods are predictive of poor clinical outcome, which can be measured,
for example, in terms of shortened survival or increased risk of cancer recurrence, e.g.
following surgical removal of the cancer, or following surgical removal of the cancer
in combination with adjuvant chemotherapy.
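Choosing a threshold that maximizes the sum of sensitivity and specificity can be sketched as a scan over candidate cut points. The example below is a simplified illustration: it scans midpoints between consecutive distinct scores on a single hypothetical dataset and omits the cross-validation that the text applies within the training dataset; the scores and labels are invented.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Treat score > threshold as a positive call; labels are 1 (event) or 0 (no event)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def best_threshold(scores, labels):
    """Scan midpoints between consecutive distinct scores; keep the one maximizing sens + spec."""
    ordered = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    return max(candidates, key=lambda t: sum(sensitivity_specificity(scores, labels, t)))


# Hypothetical decision scores and outcomes (1 = recurrence, 0 = no recurrence).
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
labels = [0, 0, 0, 1, 0, 1, 1]

threshold = best_threshold(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold)
```

For these toy values the selected threshold is 3.5, giving sensitivity 1.0 and specificity 0.75. A new sample would then be assigned to whichever side of 3.5 its decision score falls on.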
Methods are provided for diagnosing colon cancer in a sample obtained from a
subject. Such methods include detecting the expression level of at least 2 colon
cancer-related nucleic acid molecules listed in Table 6 in a sample comprising nucleic
acids obtained from the subject and comparing the expression level of the at least 2
colon cancer-related nucleic acid molecules, or a decision score derived therefrom, to
a control threshold indicative of a diagnosis of colon cancer, wherein the expression
level, or a decision score derived therefrom, on the same side of the threshold
indicates a diagnosis of colon cancer. In some examples, a control threshold is a
threshold derived from corresponding transcripts from colon cancer-related nucleic
acid molecules listed in Table 6 in a known colon cancer sample (or samples).
Methods are provided for classifying a colon cancer sample. Such methods
include detecting the expression level of at least 2 colon cancer-related nucleic acid
molecules listed in Table 6 in a sample comprising nucleic acids obtained from a
subject and comparing the expression level of the at least 2 colon cancer-related
nucleic acid molecules, or a decision score derived therefrom, to a control threshold
indicative of known classification, wherein the expression level, or a decision score
derived therefrom, on the same side of the threshold permits classification of the
colon cancer sample. In some examples, a control threshold is a threshold derived
from corresponding transcripts from colon cancer-related nucleic acid molecules
listed in Table 6 in a colon cancer sample (or samples) of known classification. In
some examples, the colon cancer sample is classified as stage I, stage II, stage III or
stage IV. In some examples the method further includes selecting a treatment plan
that will be effective for the classified colon cancer, for example surgical resection,
chemotherapy, radiation or any combination thereof.
Methods are provided for predicting a response to a treatment for colon
cancer, such as in a subject with stage II colon cancer. Such methods include detecting
the expression level of at least 2 colon cancer-related nucleic acid molecules listed in
Table 6 in a sample comprising nucleic acids obtained from a subject and comparing
the expression level of the at least 2 colon cancer-related nucleic acid molecules, or a
decision score derived therefrom, to a control threshold indicative of a known
response to treatment, wherein the expression level, or a decision score derived
therefrom, on the same side of the threshold indicates a similar response to treatment,
thereby predicting response to treatment. In some examples, a control threshold is a
threshold derived from corresponding transcripts from colon cancer-related nucleic
acid molecules listed in Table 6 in a colon cancer sample (or samples) having a
known response to treatment. In some embodiments, the method is a method of
predicting response from surgical resection, chemotherapy, radiation or any
combination thereof.
Methods are provided for predicting long term survival of a subject with colon
cancer, such as a subject diagnosed with stage II colon cancer. These methods include
detecting the expression level of at least 2 colon cancer-related nucleic acid molecules
listed in Table 6 in a sample comprising nucleic acids obtained from a subject and
comparing the expression level of the at least 2 colon cancer-related nucleic acid
molecules, or a decision score derived therefrom, to a control threshold indicative of
having a history of long term survival, wherein the expression level, or a decision
score derived therefrom, on the same side of the threshold indicates long term
survival of the subject, thereby predicting long term survival of a subject. In some
examples, the control threshold is a threshold derived from corresponding transcripts
from colon cancer-related nucleic acid molecules listed in Table 6 in a colon cancer
sample (or samples) obtained from a subject (or subjects) having a history of long
term survival.
Also provided are methods for predicting recurrence of colon cancer in a
subject, such as a subject diagnosed as having stage II colon cancer. These methods
include detecting the expression level of at least 2 colon cancer-related nucleic acid
molecules listed in Table 6 in a sample comprising nucleic acids obtained from a
subject and comparing the expression level of the at least 2 colon cancer-related
nucleic acid molecules, or a decision score derived therefrom, to a control threshold
indicative of a history of recurrence, wherein the expression level, or a decision score
derived therefrom, on the same side of the threshold indicates a recurrence in the
subject. In some examples, a control threshold is a threshold derived from
corresponding transcripts from colon cancer-related nucleic acid molecules listed in
Table 6 in a colon cancer sample (or samples) having a history of recurrence.
Methods are provided for preparing a personalized colon cancer genomics
profile for a subject. The methods include detecting an expression level of at least 2
colon cancer-related nucleic acid molecules listed in Table 6 in a sample comprising
nucleic acids obtained from a subject and creating a report summarizing the data
obtained by the gene expression analysis.
In particular embodiments of the methods disclosed herein, the expression
levels for at least 2, such as at least 3, at least 4, at least 5, at least 6, at least 7, at least
8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at
least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least
55, at least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at
least 95, at least 100, at least 125, at least 150, at least 175, at least 200, at least 225,
at least 250, at least 275, at least 300, at least 325, at least 350, at least 375, at least
400, at least 425, at least 450, at least 474, at least 500, at least 525, at least 550, at
least 575, at least 600, at least 634, or even all 636 of the transcripts in Table 6 or
their expression products are determined and compared with a control threshold. In
other embodiments of these methods the expression levels for MUM1 and SIGMAR1
or their expression products are determined and compared with a control threshold. In
another embodiment, the expression levels for MUM1, SIGMAR1, ARSD, SULT1C2
and PPFIBP1 or their expression products are determined and compared with the
control threshold. In additional embodiments, the expression levels for ARSD,
CXCL9, PCLO, SLC2A3, FCGBP, SLC2A14, SLC2A3, BCL9L and antisense
sequences of MUC3A, OLFM4 and RNF39 or their expression products are
determined and compared with a control threshold. In still other embodiments,
expression levels for substantially all the transcripts listed in one of Tables 1, 2,
and/or 6 are determined in step and compared with a control threshold.
In some embodiments of the disclosed methods, the RNA levels are corrected
for (normalize away) both differences in the amount of RNA assayed and variability
in the quality of the RNA used. Control transcripts may be included in assays as
positive or negative controls and to normalize readings and ensure reliable
measurement data, but are preferably omitted for performing the actual prognosis.
The exact identity of the former is typically unimportant and a very broad variety of
transcripts could be envisaged for all of the purposes disclosed herein. For the
normalization controls, a broad variety of transcripts could be envisaged, although
they have to fulfill the basic requirements of approximately constant and stable
expression between a broad variety of subjects or conditions for the target tissue of
interest, in particular between the prognostic groups under consideration. Similarly
the RNA degradation controls have to show intensity behavior, suitable for indicating
(overly) degraded RNA. This may or may not include RNA controls, which show a
stable intensity regardless of the overall RNA degradation of a sample as positive
controls. In relation to these controls the intensity pattern for suitable other RNA
controls would be analyzed for which an intensity dependency on the RNA
degradation stage is expected. This may or may not include specific analyses
depending on differing positions of probe sequences with respect to the 3’ end of a
transcript.
In some embodiments of the disclosed methods, where a microarray is used
for quantifying gene expression, one or more of the following controls can be used:
(a) Alignment controls, which are specific transcripts spiked in labeled
form, which bind to specific positions on an array and ensure a proper grid alignment
in the image processing of a scanned array.
(b) Amplification controls, which are specific unlabeled transcripts, e.g.
poly-A control transcripts, spiked in before any amplification is performed, so
undergoing the same processing as the sample mRNA to ensure an appropriate
performance of the cDNA synthesis and subsequent amplification reactions.
(c) Labeling and hybridization controls, which are specific controls spiked
in before the labeling and hybridization to the chip for controlling the efficiency of
these two steps separately from the prior amplification reaction.
(d) Background controls, which are probe sequences on the microarray for
which no corresponding target sequences should be available in the sample. Thus, in
principle no specific target binding should occur. These controls are used to establish
background or cross-hybridization intensities. They would potentially be
characterized by different GC-contents and a suitable spatial distribution over an
entire microarray.
(e) Normalization controls, which are probe sequences detecting
specifically chosen target sequences from the sample which are used to correct for
varying input mRNA amounts, varying yield of amplification reactions and varying
overall sensitivity of the measurement device. They are used to correct the measured
intensity values and would thus ensure an increased analytical precision of the overall
measurement device including the preparatory laboratory steps.
(f) RNA quality and degradation controls, which are probe sequences from
various positions with respect to the 3’ position of their respective genes designed to
evaluate the RNA quality and detect RNA degradation. Corresponding probes or
probe sets from multiple genes might represent differing RNA degradation behavior
from different RNA species.
Whereas controls (a) to (d) can be derived purely from sequence
considerations and should not be naturally present in the tissue and condition of
interest, controls (e) and (f) can be chosen by suitable analyses of prior patient data.
This may or may not be the same training data on which the prognostic gene signature
has been derived.
It should be understood that the above controls are provided only as examples
and that other embodiments of this disclosure could be envisaged (such as qPCR) in
which different controls with similar functionality would be used.
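The role of the normalization controls described in (e) can be illustrated with a minimal sketch. It assumes log2-scale intensities and assumes that normalization is done by subtracting the mean intensity of the control probes within each sample; the disclosure does not prescribe this particular formula, and all probe names and values below are hypothetical.

```python
from statistics import mean

def normalize_sample(log2_intensities, control_ids):
    """Subtract the mean log2 intensity of the normalization controls.

    log2_intensities: dict mapping probe id -> log2 intensity for one sample.
    control_ids: probe ids assumed to be stably expressed (hypothetical names).
    """
    offset = mean(log2_intensities[c] for c in control_ids)
    return {probe: value - offset for probe, value in log2_intensities.items()}

# Hypothetical sample: two control probes and two signature probes.
sample = {"ctrl_1": 10.0, "ctrl_2": 12.0, "gene_a": 13.0, "gene_b": 9.0}
normalized = normalize_sample(sample, ["ctrl_1", "ctrl_2"])
```

After this step each signature probe is expressed relative to the control level of its own sample, which is one way to correct for varying input mRNA amounts.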
B. Probes, Primers and Arrays
Disclosed are probes and primers specific for the disclosed colon cancer gene
signatures. Also disclosed are arrays, which include probes for the disclosed colon
cancer signatures. In some embodiments, a probe specific for the disclosed colon
cancer gene signature includes a nucleic acid sequence that specifically hybridizes to
one of SEQ ID NOs: 1-636 or the complement thereof. In some embodiments, a probe
set for a disclosed colon cancer signature includes probes that specifically hybridize
to at least 2, such as at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at
least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least
20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at
least 60, at least 65, at least 70, at least 75, at least 80, at least 85, at least 90, at least
95, at least 100, at least 125, at least 150, at least 175, at least 200, at least 225, at
least 250, at least 275, at least 300, at least 325, at least 350, at least 375, at least 400,
at least 425, at least 450, at least 474, at least 500, at least 525, at least 550, at least
575, at least 600, at least 634, or even all 636 of the transcripts in Table 6, that carry
the greatest weight, defined as the rank of the average weight in the compound
decision score function measured under cross-validation, and still have prognostic
value, such as a probe that specifically hybridizes to any one of SEQ ID NOs: 1-636
or the complement thereof. In some embodiments, a probe set for a disclosed colon
cancer signature includes probes that specifically hybridize to the top 10 weighted
transcripts, the second top 10 weighted transcripts, the third top 10 weighted
transcripts, the fourth top 10 weighted transcripts, the fifth top 10 weighted
transcripts, the sixth top 10 weighted transcripts, the seventh top 10 weighted
transcripts, the eighth top 10 weighted transcripts, the ninth top 10 weighted
transcripts, or the tenth top 10 weighted transcripts listed in Table 6. In yet further
embodiments, a probe set for a disclosed colon cancer signature includes probes that
specifically hybridize to 636, 634, 620, 610, 600, 590, 580, 570, 560, 550, 540, 530,
520, 510, 500, 490, 480, 470, 460, 450, 440, 430, 420, 410, 400, 390, 380, 370, 360,
350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220, 210, 200, 190,
180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10
transcripts having the greatest weight listed in Table 6 or the complement thereof. In
some embodiments, a probe set for a disclosed cancer signature comprises about 200
to about 1000 probes, such as from about 400 to about 800 probes, such as from about
500 to about 700 probes, such as from about 550 to about 650 probes, where the
probes detect transcripts from Table 6. The additional probes may be optionally
selected from those that detect transcripts that are expressed in colon cancer, or which
function as signal controls or expression level controls. Such optional probes can be
selected from those included on the Colorectal Cancer DSA™ tool.
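The selection of the transcripts "having the greatest weight" can be sketched as follows. This is only an illustration: it assumes that each transcript has one weight per cross-validation fold and that ranking is by the absolute value of the average weight (the use of the absolute value is an assumption). The transcript identifiers and weights are invented and are not taken from Table 6.

```python
from statistics import mean

def top_weighted(weights_per_fold, n):
    """Return the n transcript ids with the largest average weight magnitude.

    weights_per_fold: dict mapping transcript id -> list of weights, one per
    cross-validation fold (illustrative values only).
    """
    avg = {t: mean(w) for t, w in weights_per_fold.items()}
    ranked = sorted(avg, key=lambda t: abs(avg[t]), reverse=True)
    return ranked[:n]

# Hypothetical weights from two cross-validation folds.
example_weights = {"t1": [0.5, 0.7], "t2": [-0.9, -1.1], "t3": [0.1, 0.0]}
top_two = top_weighted(example_weights, 2)
```

A probe set for the "top 10 weighted transcripts" would correspond to calling the function with n set to 10 on the full weight table.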
In some embodiments, a probe set for a disclosed colon cancer signature
includes probes that specifically hybridize to transcripts for MUM1 and SIGMAR1.
In other embodiments, a probe set for a disclosed colon cancer signature includes
probes that specifically hybridize to transcripts for MUM1, SIGMAR1, ARSD,
SULT1C2 and PPFIBP1. In yet other embodiments, a probe set for a disclosed colon
cancer signature includes probes that specifically hybridize to transcripts for ARSD,
CXCL9, PCLO, SLC2A3, FCGBP, SLC2A14, SLC2A3, BCL9L and antisense
sequences of MUC3A, OLFM4 and RNF39. A set of probes or primers can be
prepared that is substantially representative of the gene expression signature.
“Substantially representative of the gene expression signature” refers to probe sets
that specifically hybridize to at least 50%, 60%, 70%, 80%, 90%, 95%, 99%, or 100%
of the coding or non-coding transcripts in the gene expression signature, for example
at least 50%, 60%, 70%, 80%, 90%, 95%, 99%, or 100% of the coding or non-coding
transcripts in the gene expression signatures shown in Table 1, 2, or 6 or the
complement thereof.
It is advantageous to use probes which bind to the 3’ regions of transcripts in
the gene expression signature, specifically where the patient tissue to be analyzed for
gene expression is RNA extracted from paraffin embedded tissue. Typically each
probe will be capable of hybridizing to a complementary sequence in the respective
transcript, which occurs within 1kb, or 500bp, or 300bp, or 200bp, or 100bp of the 3’
end of the transcript. In the case of mRNA, the “3’ end of the transcript” is defined
herein as the polyadenylation site, not including the poly(A) tail.
In one embodiment, a pool of probes making up 30% of the total absolute
weight of the signature is used. In alternate embodiments, a pool of probes making up
40%, 60%, 70%, 80%, 90%, 95% or 100% of the total absolute weight of the
signature is used in the methods described herein. The basis for inclusion of markers,
as well as the clinical significance of mRNA level variations with respect to the
reference set, is indicated below. In some embodiments, the disclosed probes are part
of an array, for example the probes are bound to a solid substrate. Exemplary nucleic
acid arrays and methods of making such arrays are discussed in Section D below.
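A pool of probes "making up 30% of the total absolute weight of the signature" can be illustrated with a short sketch. It assumes that the pool is built greedily from the largest absolute weights downward until the requested fraction is reached; this selection order is an assumption, and the probe names and weights are hypothetical.

```python
def pool_for_weight_fraction(weights, fraction):
    """Return the smallest top-ranked pool covering a fraction of total |weight|.

    weights: dict mapping probe id -> signed weight (illustrative values).
    fraction: target share of the total absolute weight, between 0 and 1.
    """
    total = sum(abs(w) for w in weights.values())
    ranked = sorted(weights, key=lambda p: abs(weights[p]), reverse=True)
    pool, covered = [], 0.0
    for probe in ranked:
        if covered >= fraction * total:
            break  # target share already reached
        pool.append(probe)
        covered += abs(weights[probe])
    return pool

# Hypothetical signature with a total absolute weight of 10.
signature = {"p1": 4.0, "p2": -3.0, "p3": 2.0, "p4": -1.0}
pool_30 = pool_for_weight_fraction(signature, 0.3)
pool_50 = pool_for_weight_fraction(signature, 0.5)
```

Negative weights count toward the pool by magnitude, since a transcript that lowers the decision score carries as much information as one that raises it.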
In some embodiments, a probe specific for the disclosed colon cancer gene
signature is part of a nucleic acid array, such as a microarray. In some examples, such
arrays include a nucleic acid sequence that specifically hybridizes to one of SEQ ID
NOs: 1-636 or the complement thereof. In some embodiments, a nucleic acid array,
such as a microarray, includes probes that specifically hybridize to at least 2, such as
at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at
least 11, at least 12, at least 13, at least 14, at least 15, at least 20, at least 25, at least
30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at
least 70, at least 75, at least 80, at least 85, at least 90, at least 95, at least 100, at least
125, at least 150, at least 175, at least 200, at least 225, at least 250, at least 275, at
least 300, at least 325, at least 350, at least 375, at least 400, at least 425, at least 450,
at least 474, at least 500, at least 525, at least 550, at least 575, at least 600, at least
634, or even all 636 of the transcripts in Table 6. In some embodiments, a nucleic acid
array for a disclosed colon cancer signature includes probes that specifically hybridize
to the top 10 weighted transcripts, the second top 10 weighted transcripts, the third
top 10 weighted transcripts, the fourth top 10 weighted transcripts, the fifth top 10
weighted transcripts, the sixth top 10 weighted transcripts, the seventh top 10
weighted transcripts, the eighth top 10 weighted transcripts, the ninth top 10 weighted
transcripts, or the tenth top 10 weighted transcripts listed in Table 6. In yet further
embodiments, a nucleic acid array for a disclosed colon cancer signature includes
probes that specifically hybridize to 636, 634, 620, 610, 600, 590, 580, 570, 560, 550,
540, 530, 520, 510, 500, 490, 480, 470, 460, 450, 440, 430, 420, 410, 400, 390, 380,
370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250, 240, 230, 220, 210,
200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70, 60, 50, 40, 30, 20,
or 10 transcripts having the greatest weight listed in Table 6 or the complement
thereof. In some embodiments, a nucleic acid array for a disclosed colon cancer
signature comprises about 200 to about 1000 probes, such as from about 400 to about
800 probes, such as from about 500 to about 700 probes, such as from about 550 to
about 650 probes, where the probes detect transcripts from Table 6. The additional
probes may be optionally selected from those that detect transcripts that are expressed
in colon cancer, or which function as signal controls or expression level controls.
Such optional probes can be selected from those included on the Colorectal Cancer
DSA™ tool. In some embodiments, a nucleic acid array for a disclosed colon cancer
signature comprises more than about 1000 probes.
Also disclosed are primer pairs for the amplification of a gene expression
signature for colon cancer nucleic acid. In some examples a primer pair includes a
forward primer 15 to 40 nucleotides in length comprising a nucleic acid sequence that
specifically hybridizes to any one of the nucleic acid sequences set forth as SEQ ID
NOs: 1-636 or its complement and a reverse primer 15 to 40 nucleotides in length
comprising a nucleic acid sequence that specifically hybridizes to any one of the
nucleic acid sequences set forth as SEQ ID NOs: 1-636 or its complement, wherein
the set of primers is capable of directing the amplification of the nucleic acid.
Sets of primer pairs for the amplification of a gene expression signature for
colon cancer nucleic acids are also disclosed. In some embodiments, a primer set for a
disclosed colon cancer signature includes primers that specifically hybridize to and
are capable of amplifying at least 2, such as at least 3, at least 4, at least 5, at least 6,
at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least
14, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at
least 50, at least 55, at least 60, at least 65, at least 70, at least 75, at least 80, at least
85, at least 90, at least 95, at least 100, at least 125, at least 150, at least 175, at least
200, at least 225, at least 250, at least 275, at least 300, at least 325, at least 350, at
least 375, at least 400, at least 425, at least 450, at least 474, at least 500, at least 525,
at least 550, at least 575, at least 600, at least 634, or even all 636 of the transcripts in
Table 6 that carry the greatest weight, defined as the rank of the average weight in the
compound decision score function measured under cross-validation, and still have
prognostic value, such as primers that specifically hybridize to and are capable of
amplifying any one of SEQ ID NOs: 1-636 or the complement thereof. In some
embodiments, a primer set for a disclosed colon cancer signature includes primers that
specifically hybridize to and are capable of amplifying the top 10 weighted
transcripts, the second top 10 weighted transcripts, the third top 10 weighted
transcripts, the fourth top 10 weighted transcripts, the fifth top 10 weighted
transcripts, the sixth top 10 weighted transcripts, the seventh top 10 weighted
transcripts, the eighth top 10 weighted transcripts, the ninth top 10 weighted
transcripts, or the tenth top 10 weighted transcripts listed in Table 6. In yet further
embodiments, a primer set for a disclosed colon cancer signature includes primers that
specifically hybridize to and are capable of amplifying 636, 634, 620, 610, 600, 590,
580, 570, 560, 550, 540, 530, 520, 510, 500, 490, 480, 470, 460, 450, 440, 430, 420,
410, 400, 390, 380, 370, 360, 350, 340, 330, 320, 310, 300, 290, 280, 270, 260, 250,
240, 230, 220, 210, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80, 70,
60, 50, 40, 30, 20, or 10 transcripts having the greatest weight listed in Table 6 or the
complement thereof.
In some embodiments, a primer set for a disclosed colon cancer signature
includes primers that specifically hybridize to and are capable of amplifying
transcripts for MUM1 and SIGMAR1. In another embodiment, a primer set for a
disclosed colon cancer signature includes primers that specifically hybridize to and
are capable of amplifying transcripts for MUM1, SIGMAR1, ARSD, SULT1C2 and
PPFIBP1. In yet another embodiment, a probe set for a disclosed colon cancer
signature includes probes that specifically hybridize to transcripts for ARSD, CXCL9,
PCLO, SLC2A3, FCGBP, SLC2A14, SLC2A3, BCL9L and antisense sequences of
MUC3A, OLFM4 and RNF39. A set of probes or primers can be prepared that is
substantially representative of the gene expression signature. “Substantially
representative of the gene expression signature” refers to probe sets that specifically
hybridize to at least 50%, 60%, 70%, 80%, 90%, 95%, 99%, or 100% of the coding or
non-coding transcripts in the gene expression signature, for example at least 50%,
60%, 70%, 80%, 90%, 95%, 99%, or 100% of the coding or non-coding transcripts in
the gene expression signatures shown in Table 1, 2, or 6 or the complement thereof.
C. Statistical Determination of Colon Cancer Signatures
The disclosed colon cancer signatures can be evaluated by statistical methods.
In some embodiments, the gene expression profile of a patient tissue sample is
evaluated by a linear classifier. As used herein, a linear classifier refers to a weighted
sum of the individual gene intensities into a compound decision score (“decision
function”). The decision score is then compared to a pre-defined cut-off threshold,
corresponding to a certain set point in terms of sensitivity and specificity, which
indicates if a sample is above the threshold (decision function positive) or below
(decision function negative).
Effectively, this means that the data space, i.e. the set of all possible
combinations of gene expression values, is split into two mutually exclusive halves
corresponding to different clinical classifications or predictions, e.g. one
corresponding to good prognosis and the other to poor prognosis. In the context of the
overall signature, relative over-expression of a certain gene can either increase the
decision score (positive weight) or reduce it (negative weight) and thus contribute to
an overall decision of, for example, either poor or good prognosis.
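The linear classifier described above can be sketched in a few lines. The example assumes pre-processed intensities, an optional bias term, and that scores strictly above the threshold are called "positive"; the gene names, weights and threshold are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def decision_score(intensities, weights, bias=0.0):
    """Weighted sum of gene intensities (the compound decision score)."""
    return bias + sum(weights[g] * intensities[g] for g in weights)

def classify(intensities, weights, threshold, bias=0.0):
    """Compare the decision score with a pre-defined cut-off threshold."""
    score = decision_score(intensities, weights, bias)
    return "positive" if score > threshold else "negative"

# Hypothetical two-gene signature: one positive and one negative weight.
weights = {"gene_a": 0.5, "gene_b": -1.0}
sample = {"gene_a": 8.0, "gene_b": 2.0}
score = decision_score(sample, weights)   # 0.5 * 8.0 - 1.0 * 2.0
call = classify(sample, weights, threshold=1.0)
```

Which side of the threshold corresponds to good or poor prognosis is fixed during training; the sketch only reports the side on which the sample falls.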
The interpretation of this quantity, i.e. the cut-off threshold for good versus
poor prognosis, is derived in the development phase (“training”) from a set of patients
with known outcome. The corresponding weights and the good/poor prognosis cut-off
threshold for the decision score are fixed a priori from training data by methods
known to those of ordinary skill in the art. In a preferred embodiment,
Partial Least Squares Discriminant Analysis (PLS-DA) is used for
determining the weights. (Stahle, J. Chemom. 1:185-196, 1987; Nguyen and Rocke,
Bioinformatics 18:39-50, 2002). Other methods for performing the classification,
known to those skilled in the art, may also be used with the methods described herein when
applied to the transcripts of a colon cancer signature.
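As a rough illustration of how weights might be obtained from training data, the following sketch computes the coefficients of a one-component partial least squares regression on class labels coded as +1 and -1. It is a simplified stand-in, not the PLS-DA procedure of the cited references: it uses a single latent component, performs no scaling or cross-validation, and assumes the training intensities are not all identical. The data are invented.

```python
def pls1_weights(X, y):
    """Per-gene coefficients of a one-component PLS regression of y on X.

    X: list of samples, each a list of gene intensities.
    y: list of class labels coded as +1 / -1 (an assumed coding).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximal covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores of each sample on the single latent component.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress y on the scores, then map back to per-gene coefficients.
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    coef = [w[j] * q for j in range(p)]
    return coef, x_mean, y_mean

# Hypothetical training set: gene 0 rises and gene 1 falls in class +1.
X_train = [[1.0, 5.0], [2.0, 4.0], [4.0, 2.0], [5.0, 1.0]]
y_train = [-1, -1, 1, 1]
coef, x_mean, y_mean = pls1_weights(X_train, y_train)
```

The signs of the resulting coefficients mirror the description above: a gene that is over-expressed in the +1 class receives a positive weight, and a gene that is under-expressed receives a negative weight.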
Different methods can be used to convert quantitative data measured on these
genes or their products into a prognosis or other predictive use. These methods
include, but are not limited to, pattern recognition (Duda et al. Pattern Classification, 2nd
ed., John Wiley, New York 2001), machine learning (Scholkopf et al. Learning with
Kernels, MIT Press, Cambridge 2002, Bishop, Neural Networks for Pattern
Recognition, Clarendon Press, Oxford 1995), statistics (Hastie et al. The Elements of
Statistical Learning, Springer, New York 2001), bioinformatics (Dudoit et al., J. Am.
Stat. Assoc. 97:77-87, 2002; Tibshirani et al., Proc. Natl. Acad. Sci. USA 99:6567-
6572, 2002) or chemometrics (Vandeginste, et al., Handbook of Chemometrics and
Qualimetrics, Part B, Elsevier, Amsterdam 1998).
In some embodiments, in a training step a set of patient samples for both good
and poor prognosis cases are measured and the prediction method is optimised using
the inherent information from this training data to optimally predict the training set or
a future sample set. In this training step the used method is trained or parameterised to
predict from a specific intensity pattern to a specific prognostic call. Suitable
transformation or pre-processing steps might be performed with the measured data
before it is subjected to the prognostic method or algorithm.
In some embodiments, a weighted sum of the pre-processed intensity values
for each transcript is formed and compared with a threshold value optimised on the
training set (Duda et al. Pattern Classification, 2nd ed., John Wiley, New York 2001).
The weights can be derived by a multitude of linear classification methods, including
but not limited to Partial Least Squares (PLS, (Nguyen et al., Bioinformatics 18:39-50,
2002)) or Support Vector Machines (SVM, (Scholkopf et al. Learning with
Kernels, MIT Press, Cambridge 2002)).
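The optimisation of a threshold value on the training set can be sketched as follows. The criterion used here, maximizing the sum of sensitivity and specificity over candidate cut-offs placed midway between neighbouring training scores, is an assumption made for illustration; the disclosure only requires that the threshold correspond to a chosen set point in terms of sensitivity and specificity. Scores and labels are invented.

```python
def best_threshold(scores, labels):
    """Pick the cut-off maximizing sensitivity + specificity on training data.

    scores: decision scores of the training samples.
    labels: True for the class expected above the threshold (assumed here
    to be poor prognosis), False otherwise.
    """
    candidates = sorted(set(scores))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best, best_sum = None, -1.0
    for low, high in zip(candidates, candidates[1:]):
        cut = (low + high) / 2  # midpoint between neighbouring scores
        tp = sum(1 for s, l in zip(scores, labels) if s > cut and l)
        tn = sum(1 for s, l in zip(scores, labels) if s <= cut and not l)
        total = tp / n_pos + tn / n_neg
        if total > best_sum:
            best, best_sum = cut, total
    return best

# Hypothetical training scores with known outcome.
train_scores = [1.0, 2.0, 3.0, 4.0]
train_labels = [False, False, True, True]
cutoff = best_threshold(train_scores, train_labels)
```

A different set point, for example one favouring sensitivity over specificity, would simply replace the criterion inside the loop.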
In some embodiments, the data is transformed non-linearly before applying a
weighted sum, for example as described above. This non-linear transformation might
include increasing the dimensionality of the data. The non-linear transformation and
weighted summation might also be performed implicitly, e.g. through the use of a
kernel function. (Scholkopf et al. Learning with Kernels, MIT Press, Cambridge
2002).
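The idea that a non-linear transformation and a weighted summation can be performed implicitly through a kernel function can be checked numerically. The sketch below uses a degree-2 polynomial kernel on two-dimensional data, chosen purely as an example, and compares it with the dot product in the explicitly enlarged three-dimensional feature space. The input vectors are arbitrary.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel: the squared dot product of x and z."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def feature_map(x):
    """Explicit feature map matching poly_kernel for 2-dimensional input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 0.5)
implicit = poly_kernel(x, z)
explicit = sum(a * b for a, b in zip(feature_map(x), feature_map(z)))
```

Both routes give the same value up to rounding, but the kernel never builds the higher-dimensional representation, which matters when the number of transcripts is large.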
In some embodiments, a new data sample is compared with two or more class
prototypes, being either real measured training samples or artificially created
prototypes. This comparison is performed using suitable similarity measures, for
example but not limited to Euclidean distance (Duda et al. Pattern Classification, 2nd
ed., John Wiley, New York 2001), correlation coefficient (van’t Veer, et al., Nature
415:530, 2002) etc. A new sample is then assigned to the prognostic group with the
closest prototype or the highest number of prototypes in the vicinity.
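Assignment to the group with the closest prototype can be sketched with Euclidean distance as the similarity measure. The prototypes below are hypothetical two-gene profiles, one per prognostic group; a real application would use one value per transcript in the signature.

```python
import math

def nearest_prototype(sample, prototypes):
    """Return the label of the prototype closest to the sample.

    sample: list of pre-processed intensities.
    prototypes: dict mapping group label -> prototype intensity list.
    """
    return min(prototypes, key=lambda label: math.dist(sample, prototypes[label]))

# Hypothetical class prototypes (for example, per-class mean profiles).
prototypes = {"good": [1.0, 1.0], "poor": [5.0, 5.0]}
call = nearest_prototype([2.0, 1.5], prototypes)
```

Replacing `math.dist` with one minus a correlation coefficient would give the correlation-based variant mentioned above.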
In some embodiments, decision trees (Hastie et al. The Elements of Statistical
Learning, Springer, New York 2001) or random forests (Breiman, Random
Forests, Machine Learning 45:5) are used to make a prognostic call from the
measured intensity data for the transcript set or their products.
In some embodiments, neural networks (Bishop, Neural Networks for Pattern
Recognition, Clarendon Press, Oxford 1995) are used to make a prognostic call from
the measured intensity data for the transcript set or their products.
In some embodiments, discriminant analysis (Duda et al. Pattern
Classification, 2nd ed., John Wiley, New York 2001), comprising but not limited to
linear, diagonal linear, quadratic and logistic discriminant analysis, is used to make a
prognostic call from the measured intensity data for the transcript set or their
products.
In some embodiments, Prediction Analysis for Microarrays (PAM, (Tibshirani
et al., Proc. Natl. Acad. Sci. USA 99:6567-6572, 2002)) is used to make a prognostic
call from the measured intensity data for the transcript set or their products.
In some embodiments, Soft Independent Modelling of Class Analogy
(SIMCA, (Wold, 1976, Pattern Recogn. 8: 127-139)) is used to make a prognosis from
the measured intensity data for the transcript set or their products.
D. Methods for detection of mRNA
Gene expression can be evaluated by detecting mRNA encoding the gene of
interest. Thus, the disclosed methods can include evaluating mRNA. RNA can be
isolated from a sample of a tumor (for example, a colon cancer tumor) from a subject,
a sample of adjacent non-tumor tissue from the subject, a sample of tumor-free tissue
from a normal (healthy) subject, or combinations thereof, using methods well known
to one of ordinary skill in the art, including commercially available kits.
General methods for mRNA extraction are well known in the art and are
disclosed in standard textbooks of molecular biology, including Ausubel et al.,
Current Protocols of Molecular Biology, John Wiley and Sons (1997). Methods for
RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp
and Locker, Biotechniques 6:56-60, 1988, and De Andres et al., Biotechniques 18:42-
44, 1995. In one example, RNA isolation can be performed using a purification kit,
buffer set and protease from commercial manufacturers, such as QIAGEN®
(Valencia, CA), according to the manufacturer's instructions. For example, total RNA
from cells in culture (such as those obtained from a subject) can be isolated using
QIAGEN® RNeasy® mini-columns. Other commercially available RNA isolation
kits include MASTERPURE® Complete DNA and RNA Purification Kit
(EPICENTRE® Madison, Wis.), and Paraffin Block RNA Isolation Kit (Ambion,
Inc.). Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test).
RNA prepared from tumor or other biological sample can be isolated, for example, by
cesium chloride density gradient centrifugation.
The present signatures and methods described herein accommodate the use of
archived paraffin-embedded biopsy material for assay of all markers in the set, and
therefore are compatible with the most widely available type of biopsy material. The
expression level of transcripts in a colon tissue sample may be determined using RNA
obtained from a formalin-fixed, paraffin-embedded tissue sample, fresh frozen tissue
or fresh tissue that has been stored in solutions such as RNAlater®. The isolation of
RNA can, for example, be carried out following any of the procedures described
above or throughout the application, or by any other method known in the art. While
all techniques of gene expression profiling, as well as proteomics techniques, are
suitable for use in performing the methods described herein, the gene expression
levels are often determined by DNA microarray technology.
If the source of the tissue is a formalin-fixed, paraffin-embedded tissue
sample, the RNA may be fragmented, resulting in loss of information. The signatures
provided herein are derived from pools of transcripts sequenced from their 3’ end
thereby providing an accurate representation of the transcriptome of the tissue. Thus
the signatures provided herein are useful for both fresh frozen and fixed paraffin-
embedded tissues.
In some embodiments, RNA samples used in the methods described herein
may be isolated from a fixed, wax-embedded colon tissue specimen, by using one or
more of the following steps, such as all of the following steps:
(a) deparaffinizing using conventional methods and with multiple wash steps
in organic solvent;
(b) air drying and treating with protease to break inter- and intracellular bonds,
resulting in the release of RNA from the tissue;
(c) removing contaminating genomic DNA;
(d) washing in organic solvent; and eluting in a suitable RNase-free elution
buffer.
The RNA-extraction methods may also include incubation of the tissue in a
highly denaturing lysis buffer, which has the additional function of reversing much of
the formalin crosslinking that occurs in tissues preserved this way to improve RNA
yield and quality for performance in downstream assays.
Following RNA recovery, the RNA may optionally be further purified
resulting in RNA that is substantially free from contaminating DNA or proteins.
Further RNA purification may be accomplished by any of the aforementioned
techniques for RNA recovery or with the use of commercially available RNA cleanup
kits, such as RNeasy® MinElute® Cleanup Kit (QIAGEN®). The tissue specimen
may, for example, be obtained from a tumor, and the RNA may be obtained from a
microdissected portion of the tissue specimen enriched for tumor cells.
Methods of gene expression profiling include methods based on hybridization
analysis of polynucleotides and methods based on sequencing of polynucleotides. In
some examples, mRNA expression in a sample is quantified using Northern blotting
or in situ hybridization (Parker & Barnes, Methods in Molecular Biology 106:247-
283, 1999); RNAse protection assays (Hod, Biotechniques 13:852-4, 1992); and PCR-
based methods, such as reverse transcription polymerase chain reaction (RT-PCR)
(Weis et al., Trends in Genetics 8:263-4, 1992). Alternatively, antibodies can be
employed that can recognize specific duplexes, including DNA duplexes, RNA
duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. Representative
methods for sequencing-based gene expression analysis include Serial Analysis of
Gene Expression (SAGE), and gene expression analysis by massively parallel
signature sequencing (MPSS). In one example, RT-PCR can be used to compare
mRNA levels in different samples, to characterize patterns of gene expression, to
discriminate between closely related mRNAs, and to analyze RNA structure. In
specific examples, the disclosed colon cancer signatures are analyzed by nucleic acid
microarray techniques, PCR techniques or combinations thereof.
1. Gene Expression Profiling With Microarray Methods
In some embodiments, the expression profile of colon cancer-associated genes
and/or transcripts, such as those shown in Table 6, can be measured in either fresh or
paraffin-embedded tumor tissue, using microarray technology. In this method,
polynucleotide sequences of interest, such as polynucleotide sequences that
specifically hybridize to the nucleic acid sequences shown in Table 6 or a
complement thereof, are plated, or arrayed, on a microchip substrate. The arrayed
sequences are then hybridized with nucleic acids from cells or tissues of interest.
Just as in RT-PCR methods (see below), the source of mRNA typically is total
RNA isolated from human tumors or tumor cell lines, and corresponding normal
tissues or cell lines. Thus RNA can be isolated from a variety of primary tumors or
tumor cell lines. If the source of mRNA is a primary tumor, mRNA can be extracted,
for example, from frozen or archived paraffin-embedded and/or fixed (e.g. formalin-
fixed) tissue samples, which are routinely prepared and preserved in everyday clinical
practice.
In specific embodiments of the microarray technique, PCR amplified inserts of
cDNA clones or oligonucleotides are applied to a substrate in a dense array. Short
oligonucleotides may also be synthesized directly on a substrate using, for example, a
combination of semiconductor-based photolithography and solid phase chemical
synthesis technologies (Affymetrix, Inc., Santa Clara, CA). In one embodiment, at
least 10,000 nucleotide sequences are present on the substrate. The microarrayed
transcripts, immobilized on the substrate, are suitable for hybridization under stringent
conditions. Fluorescently labeled nucleotide probes may be generated through
incorporation of fluorescent nucleotides by reverse transcription of RNA extracted
from tissues of interest. Labeled probes applied to the array hybridize with specificity
to each nucleotide on the array. After washing to remove non-specifically bound
probes, the array is scanned by confocal laser microscopy or by another detection
method, such as a CCD camera. Quantitation of hybridization of each arrayed element
allows for assessment of corresponding transcript abundance.
With dual color fluorescence, separately labeled nucleotide probes generated
from two sources may be hybridized pairwise to the array. The miniaturized scale of
the hybridization affords a convenient and rapid evaluation of the expression pattern
for large numbers of genes. Such methods have been shown to have the sensitivity
required to detect rare transcripts, which are expressed at a few copies per cell, and to
reproducibly detect at least approximately two-fold differences in the expression
levels (Schena et al., Proc. Natl. Acad. Sci. USA 93(2):106-149 (1996)). Microarray
analysis can also be performed by commercially available equipment, following
manufacturer's protocols, such as by using the Affymetrix GeneChip® technology
(Affymetrix, Inc., Santa Clara, CA), or Agilent microarray technology (Agilent
Technologies, Inc., Santa Clara, CA).
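By way of non-limiting illustration, the dual color comparison described above can be sketched as a log2 ratio of the two channel intensities for each arrayed element, with a two-fold cutoff corresponding to the approximately two-fold differences noted above. The function names, the intensity floor, and the example intensities are assumptions made for illustration only and are not taken from any manufacturer's protocol.

```python
import math

def log2_ratio(tumor_signal, normal_signal, floor=1.0):
    """Return log2(tumor / normal) after flooring both intensities.

    The floor (an assumed value) avoids division by zero for empty spots.
    """
    tumor = max(tumor_signal, floor)
    normal = max(normal_signal, floor)
    return math.log2(tumor / normal)

def call_two_fold(tumor_signal, normal_signal):
    """Label a spot 'up', 'down' or 'unchanged' at a two-fold cutoff."""
    ratio = log2_ratio(tumor_signal, normal_signal)
    if ratio >= 1.0:       # at least two-fold higher in the tumor channel
        return "up"
    if ratio <= -1.0:      # at least two-fold lower in the tumor channel
        return "down"
    return "unchanged"

# Hypothetical intensities for three arrayed elements.
for tumor, normal in [(400.0, 100.0), (100.0, 250.0), (150.0, 100.0)]:
    print(tumor, normal, call_two_fold(tumor, normal))
```

In practice the raw intensities would typically be background corrected and normalized between channels before ratios are taken; that step is omitted from this sketch.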
The development of microarray methods for large-scale analysis of gene
expression makes it possible to search systematically for molecular markers of cancer
classification and outcome prediction in a variety of tumor types, such as colon cancer
tumors.
In particular embodiments provided herein, arrays can be used to evaluate a
colon cancer gene expression profile, for example to prognose or diagnose a patient
with colon cancer. When describing an array that consists essentially of probes or
primers specific for the genes listed in Table 1, Table 2, and/or the transcripts listed in
Table 6, such an array includes probes or primers specific for these colon cancer
associated genes, and can further include control probes (for example to confirm the
incubation conditions are sufficient). Exemplary control probes include GAPDH, β-
actin, and 18S RNA.
i. Array substrates
The solid support of the array can be formed from inorganic material (such as
glass) or an organic polymer. Suitable materials for the solid support include, but are
not limited to: polypropylene, polyethylene, polybutylene, polyisobutylene,
polybutadiene, polyisoprene, polyvinylpyrrolidine, polytetrafluoroethylene,
polyvinylidene difluoride, polyfluoroethylene-propylene, polyethylenevinyl alcohol,
polymethylpentene, polychlorotrifluoroethylene, polysulfones, hydroxylated
biaxially oriented polypropylene, aminated biaxially oriented polypropylene, thiolated
biaxially oriented polypropylene, ethyleneacrylic acid, ethylene methacrylic acid, and
blends of copolymers thereof (see U.S. Patent No. 5,985,567).
In general, suitable characteristics of the material that can be used to form the
solid support surface include: being amenable to surface activation such that upon
activation, the surface of the support is capable of covalently attaching a biomolecule
such as an oligonucleotide thereto; amenability to “in situ” synthesis of biomolecules;
being chemically inert such that the areas on the support not occupied by the
oligonucleotides are not amenable to non-specific binding, or when non-specific
binding occurs, such materials can be readily removed from the surface without
removing the oligonucleotides.
In another example, a surface activated organic polymer is used as the solid
support surface. One example of a surface activated organic polymer is a
polypropylene material aminated via radio frequency plasma discharge. Other
reactive groups can also be used, such as carboxylated, hydroxylated, thiolated, or
active ester groups.
ii. Array formats
A wide variety of array formats can be employed in accordance with the
present disclosure. One example includes a linear array of oligonucleotide bands,
generally referred to in the art as a dipstick. Another suitable format includes a two-
dimensional pattern of discrete cells (such as 4096 squares in a 64 by 64 array). As is
appreciated by those skilled in the art, other array formats including, but not limited to
slot (rectangular) and circular arrays are equally suitable for use (see U.S. Patent No.
185). In some examples, the array is a multi-well plate. In one example, the
array is formed on a polymer medium, which is a thread, membrane or film. An
example of an organic polymer medium is a polypropylene sheet having a thickness
on the order of about 1 mil (0.001 inch) to about 20 mil, although the thickness of the
film is not critical and can be varied over a fairly broad range. The array can include
biaxially oriented polypropylene (BOPP) films, which in addition to their durability,
exhibit low background fluorescence.
The array formats of the present disclosure can be included in a variety of
different types of formats. A “format” includes any format to which the solid support
can be affixed, such as microtiter plates (e.g., multi-well plates), test tubes, inorganic
sheets, dipsticks, and the like. For example, when the solid support is a polypropylene
thread, one or more polypropylene threads can be affixed to a plastic dipstick-type
device; polypropylene membranes can be affixed to glass slides. The particular format
is, in and of itself, unimportant. All that is necessary is that the solid support can be
affixed thereto without affecting the functional behavior of the solid support or any
biopolymer absorbed thereon, and that the format (such as the dipstick or slide) is
stable to any materials into which the device is introduced (such as clinical samples
and hybridization solutions).
The arrays of the present disclosure can be prepared by a variety of
approaches. In one example, oligonucleotide or protein sequences are synthesized
separately and then attached to a solid support (see U.S. Patent No. 789). In
another example, sequences are synthesized directly onto the support to provide the
desired array (see U.S. Patent No. 5,554,501). Suitable methods for covalently
coupling oligonucleotides and proteins to a solid support and for directly synthesizing
the oligonucleotides or proteins onto the support are known to those working in the
field; a summary of suitable methods can be found in Matson et al., Anal. Biochem.
217:306-10, 1994. In one example, the oligonucleotides are synthesized onto the
support using conventional chemical techniques for preparing oligonucleotides on
solid supports (such as PCT applications WO 85/01051 and WO 89/10977, or U.S.
Patent No. 5,554,501).
A suitable array can be produced using automated means to synthesize
oligonucleotides in the cells of the array by laying down the precursors for the four
bases in a predetermined pattern. Briefly, a multiple-channel automated chemical
delivery system is employed to create oligonucleotide probe populations in parallel
rows (corresponding in number to the number of channels in the delivery system)
across the substrate. Following completion of oligonucleotide synthesis in a first
direction, the substrate can then be rotated by 90° to permit synthesis to proceed
within a second set of rows that are now perpendicular to the first set. This process
creates a multiple-channel array whose intersection generates a plurality of discrete
cells.
The oligonucleotides can be bound to the polypropylene support by either the
3' end of the oligonucleotide or by the 5' end of the oligonucleotide. In one example,
the oligonucleotides are bound to the solid support by the 3' end. However, one of
skill in the art can determine whether the use of the 3' end or the 5' end of the
oligonucleotide is suitable for bonding to the solid support. In general, the internal
complementarity of an oligonucleotide probe in the region of the 3' end and the 5' end
determines binding to the support.
In particular examples, the oligonucleotide probes on the array include one or
more labels that permit detection of oligonucleotide probe:target sequence
hybridization complexes.
2. Gene Expression Profiling With Microarray Methods
One of the most sensitive and most flexible quantitative methods is RT-PCR,
which can be used to compare mRNA levels in different sample populations, in
normal and tumor tissues, with or without drug treatment, to characterize patterns of
gene expression, to discriminate between closely related mRNAs, and to analyze
RNA structure.
The first step is the isolation of RNA from a target sample such as human
tumors or tumor cell lines, and corresponding normal tissues or cell lines,
respectively. If the source of RNA is a primary tumor, RNA can be extracted, for
example, from frozen or archived paraffin-embedded and/or fixed (e.g. formalin-fixed)
tissue samples.
A variation of RT-PCR is real time quantitative RT-PCR, which measures
PCR product accumulation through a dual-labeled fluorogenic probe (e.g., TaqMan®
probe). Real time PCR is compatible both with quantitative competitive PCR, where
an internal competitor for each target sequence is used for normalization, and with
quantitative comparative PCR using a normalization gene contained within the
sample, or a housekeeping gene for RT-PCR (see Heid et al., Genome Research
6:986-994, 1996). Quantitative PCR is also described in U.S. Pat. No. 5,538,848.
Related probes and quantitative amplification procedures are described in U.S. Pat.
No. 5,716,784 and U.S. Pat. No. 5,723,591. Instruments for carrying out quantitative
PCR in microtiter plates are available from PE Applied Biosystems (Foster City, CA).
In other examples, mRNA levels are measured using TaqMan® RT-PCR
technology. TaqMan® RT-PCR can be performed using commercially available
equipment. The system can include a thermocycler, laser, charge-coupled device
(CCD) camera, and computer. In some examples, the system amplifies samples in a
96-well format on a thermocycler. During amplification, laser-induced fluorescent
signal is collected in real-time through fiber optics cables for all 96 wells, and
detected at the CCD. The system includes software for running the instrument and for
analyzing the data.
To minimize errors and the effect of sample-to-sample variation, RT-PCR can
be performed using an internal standard. The ideal internal standard is expressed at a
constant level among different tissues, and is unaffected by an experimental
treatment. RNAs commonly used to normalize patterns of gene expression are
mRNAs for the housekeeping genes GAPDH, β-actin, and 18S ribosomal RNA.
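As a non-limiting sketch of normalization against an internal standard, a target threshold cycle (Ct) can be expressed relative to the mean Ct of one or more housekeeping genes and then compared between a sample and a control. The comparative Ct calculation shown here, the assumption of perfect doubling per cycle, and all Ct values are illustrative assumptions and are not specified in the present disclosure.

```python
def delta_ct(target_ct, reference_cts):
    """Target Ct minus the mean Ct of the housekeeping (reference) genes."""
    reference_mean = sum(reference_cts) / len(reference_cts)
    return target_ct - reference_mean

def relative_expression(target_ct_sample, reference_cts_sample,
                        target_ct_control, reference_cts_control):
    """Fold change of the sample versus the control.

    Assumes the amount of product doubles each cycle (100% efficiency),
    which is a simplifying assumption of this sketch.
    """
    ddct = (delta_ct(target_ct_sample, reference_cts_sample)
            - delta_ct(target_ct_control, reference_cts_control))
    return 2.0 ** (-ddct)

# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the tumor sample than in normal tissue, at equal reference Ct values.
fold = relative_expression(24.0, [18.0, 20.0], 26.0, [18.0, 20.0])
print(fold)
```

Averaging several housekeeping genes, rather than relying on one, reduces the influence of any single reference gene that varies between tissues.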
The steps of a representative protocol for quantitating gene expression using
fixed, paraffin-embedded tissues as the RNA source, including mRNA isolation,
purification, primer extension and amplification are given in various published journal
articles (see Godfrey et al., J. Mol. Diag. 2:84-91, 2000; Specht et al., Am. J. Pathol.
158:419-29, 2001). Briefly, a representative process starts with cutting about 10 µm
thick sections of paraffin-embedded tumor tissue samples. The RNA is then extracted,
and protein and DNA are removed. Alternatively, RNA is isolated directly from a
tumor sample or other tissue sample. After analysis of the RNA concentration, RNA
repair and/or amplification steps can be included, if necessary, and RNA is reverse
transcribed using gene specific primers followed by RT-PCR and/or hybridization
to a nucleic acid array.
In alternate embodiments, commonly used methods known in the art for the
quantification of mRNA expression in a sample may be used with the colon
cancer signatures provided herein. Such methods include, but are not limited to, northern
blotting and in situ hybridization (Parker & Barnes, Methods in Molecular Biology
106:247-283 (1999)) and RNase protection assays (Hod, Biotechniques 13:852-854
(1992)). Alternatively, antibodies may be employed that can recognize specific
duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes
or DNA-protein duplexes.
Further PCR-based techniques include, for example, differential display
(Liang and Pardee, Science 257:967-971 (1992)); amplified fragment length
polymorphism (iAFLP) (Kawamoto et al., Genome Res. 12:1305-1312 (1999));
BeadArray™ technology (Illumina, San Diego, Calif.; Oliphant et al., Discovery of
Markers for Disease (Supplement to Biotechniques), June 2002; Ferguson et al.,
Analytical Chemistry 72:5618 (2000)); BeadsArray for Detection of Gene Expression
(BADGE), using the commercially available Luminex100 LabMAP system and
multiple color-coded microspheres (Luminex Corp., Austin, Tex.) in a rapid assay for
gene expression (Yang et al., Genome Res. 11:1888-1898 (2001)); Competitive PCR
and MassARRAY (Oeth et al., 2004, SEQUENOM Application Note); and high
coverage expression profiling (HiCEP) analysis (Fukumura et al., Nucl. Acids. Res.
31(16) e94 (2003)).
The primers used for the amplification are selected so as to amplify a unique
segment of the gene of interest (such as the genes listed in Table 1, Table 2, and Table 6).
Primers that can be used to amplify these are commercially available or can be designed and
synthesized according to well known methods using the sequences of these genes as
available for example in GENBANK®.
An alternative quantitative nucleic acid amplification procedure is described in
U.S. Pat. No. 5,219,727. In this procedure, the amount of a target sequence in a
sample is determined by simultaneously amplifying the target sequence and an
internal standard nucleic acid segment. The amount of amplified DNA from each
segment is determined and compared to a standard curve to determine the amount of
the target nucleic acid segment that was present in the sample prior to amplification.
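One way such a comparison to a standard curve could be carried out is sketched below, purely for illustration. The sketch assumes that the ratio of target signal to internal standard signal is log-linear in the starting amount of target, fits a straight line to calibration points by least squares, and inverts the line for an unknown sample. The calibration values and function names are hypothetical and are not taken from the cited patent.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def build_standard_curve(known_amounts, signal_ratios):
    """Fit log10(target/standard signal) against log10(known amount)."""
    xs = [math.log10(a) for a in known_amounts]
    ys = [math.log10(r) for r in signal_ratios]
    return fit_line(xs, ys)

def estimate_amount(target_signal, standard_signal, curve):
    """Invert the standard curve for one co-amplified sample."""
    slope, intercept = curve
    y = math.log10(target_signal / standard_signal)
    return 10.0 ** ((y - intercept) / slope)

# Hypothetical calibration: 10, 100 and 1000 copies of target, each
# co-amplified with the same amount of internal standard.
curve = build_standard_curve([10.0, 100.0, 1000.0], [0.1, 1.0, 10.0])
print(estimate_amount(5.0, 1.0, curve))   # about 500 copies
```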
In some examples, gene expression is identified or confirmed using the
microarray technique. Thus, the expression profile can be measured in either fresh or
paraffin-embedded tumor tissue, using microarray technology. In this method, colon
cancer signature nucleic acid sequences of interest (including cDNAs and
oligonucleotides) are plated, or arrayed, on a microchip substrate. The arrayed
sequences are then hybridized with isolated nucleic acids (such as cDNA or mRNA)
from cells or tissues of interest. Just as in the RT-PCR method, the source of mRNA
typically is total RNA isolated from human tumors, and optionally from
corresponding noncancerous tissue and normal tissues or cell lines.
In a specific embodiment of the microarray technique, PCR amplified inserts
of cDNA clones are applied to a substrate in a dense array. In some examples, the
array includes probes specific to at least two of the colon cancer signature genes in
Tables 1, 2, and 6. The microarrayed nucleic acids are suitable for hybridization under
stringent conditions. Fluorescently labeled cDNA probes may be generated through
incorporation of fluorescent nucleotides by reverse transcription of RNA extracted
from tissues of interest. Labeled cDNA probes applied to the chip hybridize with
specificity to each spot of DNA on the array. After stringent washing to remove non-
specifically bound probes, the chip is scanned by confocal laser microscopy or by
another detection method, such as a CCD camera. Quantitation of hybridization of
each arrayed element allows for assessment of corresponding mRNA abundance.
With dual color fluorescence, separately labeled cDNA probes generated from two
sources of RNA are hybridized pairwise to the array. The relative abundance of the
transcripts from the two sources corresponding to each specified gene is thus
determined simultaneously. The miniaturized scale of the hybridization affords a
convenient and rapid evaluation of the expression pattern for colon cancer signature
genes in Tables 1, 2, and 6. Microarray analysis can be performed by commercially
available equipment, following manufacturer's protocols, such as are supplied with
Affymetrix GeneChip® technology (Affymetrix, Santa Clara, CA), or Agilent’s
microarray technology (Agilent Technologies, Santa Clara, CA).
3. Additional Methods of Gene Expression Analysis
Serial analysis of gene expression (SAGE) is another method that allows the
simultaneous and quantitative analysis of a large number of gene transcripts, without
the need of providing an individual hybridization probe for each transcript. First, a
short sequence tag (about 10-14 base pairs) is generated that contains sufficient
information to uniquely identify a transcript, provided that the tag is obtained from a
unique position within each transcript. Then, many transcripts are linked together to
form long serial molecules that can be sequenced, revealing the identity of the
multiple tags simultaneously. The expression pattern of any population of transcripts
can be quantitatively evaluated by determining the abundance of individual tags, and
identifying the gene corresponding to each tag (see, for example, Velculescu et al.,
Science 270:484-7, 1995; and Velculescu et al., Cell 88:243-51, 1997).
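The tag counting step of SAGE can be illustrated, in simplified and non-limiting form, as splitting each sequenced serial molecule into fixed-length tags and tallying the tags per gene. The sketch below assumes tags of 10 bases joined end to end with no linker or anchoring-enzyme sequence between them, which is a simplification of the actual method; the tag sequences and gene names are hypothetical.

```python
from collections import Counter

def split_tags(concatemer, tag_length=10):
    """Cut a sequenced serial molecule into consecutive fixed-length tags."""
    last_start = len(concatemer) - tag_length
    return [concatemer[i:i + tag_length]
            for i in range(0, last_start + 1, tag_length)]

def tag_abundance(concatemers, tag_to_gene, tag_length=10):
    """Count tags per gene; tags with no known gene are pooled."""
    counts = Counter()
    for concatemer in concatemers:
        for tag in split_tags(concatemer, tag_length):
            counts[tag_to_gene.get(tag, "unassigned")] += 1
    return counts

# Hypothetical tag-to-gene lookup and two sequenced serial molecules.
lookup = {"AAAAAAAAAA": "geneA", "CCCCCCCCCC": "geneB"}
reads = ["AAAAAAAAAACCCCCCCCCC", "AAAAAAAAAAGGGGGGGGGG"]
print(tag_abundance(reads, lookup))
```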
In situ hybridization (ISH) is another method for detecting and comparing
expression of genes of interest. ISH applies and extrapolates the technology of nucleic
acid hybridization to the single cell level, and, in combination with the art of
cytochemistry, immunocytochemistry and immunohistochemistry, permits the
maintenance of morphology and the identification of cellular markers to be
maintained and identified, and allows the localization of sequences to specific cells
within populations, such as tissues and blood samples. ISH is a type of hybridization
that uses a complementary nucleic acid to localize one or more specific nucleic acid
sequences in a portion or section of tissue (in situ), or, if the tissue is small enough, in
the entire tissue (whole mount ISH). RNA ISH can be used to assay expression
patterns in a tissue, such as the expression of cancer survival factor-associated genes.
Sample cells or tissues are treated to increase their permeability to allow a
probe, such as a cancer survival factor-associated gene-specific probe, to enter the
cells. The probe is added to the treated cells, allowed to hybridize at pertinent
temperature, and excess probe is washed away. A complementary probe is labeled so
that the probe’s location and quantity in the tissue can be determined, for example,
using autoradiography, fluorescence microscopy or immunoassay. The sample may be
any sample as herein described, such as a non-tumor sample or a breast or lung tumor
sample. Since the sequences of the cancer survival factor-associated genes of interest
are known, probes can be designed accordingly such that the probes specifically bind
the gene of interest.
In situ PCR is the PCR-based amplification of the target nucleic acid
sequences prior to ISH. For detection of RNA, an intracellular reverse transcription
step is introduced to generate complementary DNA from RNA templates prior to in
situ PCR. This enables detection of low copy RNA sequences.
Prior to in situ PCR, cells or tissue samples are fixed and permeabilized to
preserve morphology and permit access of the PCR reagents to the intracellular
sequences to be amplified. PCR amplification of target sequences is next performed
either in intact cells held in suspension or directly in cytocentrifuge preparations or
tissue sections on glass slides. In the former approach, fixed cells suspended in the
PCR reaction mixture are thermally cycled using conventional thermal cyclers. After
PCR, the cells are cytocentrifuged onto glass slides with visualization of intracellular
PCR products by ISH or immunohistochemistry. In situ PCR on glass slides is
performed by overlaying the samples with the PCR mixture under a coverslip, which
is then sealed to prevent evaporation of the reaction mixture. Thermal cycling is
achieved by placing the glass slides either directly on top of the heating block of a
conventional or specially designed thermal cycler or by using thermal cycling ovens.
Detection of intracellular PCR products is generally achieved by one of two
different techniques, indirect in situ PCR by ISH with PCR-product specific probes,
or direct in situ PCR without ISH through direct detection of labeled nucleotides
(such as digoxigenin-11-dUTP, fluorescein-dUTP, 3H-CTP or biotin-16-dUTP),
which have been incorporated into the PCR products during thermal cycling.
In some embodiments of the detection methods, the expression of one or more
“housekeeping” genes or “internal controls” can also be evaluated. These terms
include any constitutively or globally expressed gene (or protein, as discussed below)
whose presence enables an assessment of cancer survival factor-associated gene (or
protein) levels. Such an assessment includes a determination of the overall
constitutive level of gene transcription and a control for variations in RNA (or
protein) recovery.
The disclosure is further illustrated by the following non-limiting Examples.
EXAMPLES
Example 1
This example describes the generation and validation of an exemplary
predictive tool for the categorization of colon cancer samples using the methods and
reagents disclosed herein. This example includes materials published by inventors of
the subject technology in Kennedy et al, J. Clin. Oncol., 29(35) 4620-4626, 2011,
which is specifically incorporated herein by reference in its entirety.
A colorectal cancer transcriptome focused research array was developed
(Colorectal Cancer DSA™ (Almac Diagnostics, N. Ireland; which can be found on
the world wide web at almac-diagnostics.com)) capable of delivering accurate
expression data from FFPE derived RNA (Johnston et al., J Clin. Oncol. 24: 3519,
2006).
The Colorectal Cancer DSA™ research tool contains 61,528 probe sets and
encodes 52,306 transcripts confirmed as being expressed in colon cancer and normal
tissue. Comparing the Colorectal Cancer DSA™ research tool against the National
Center for Biotechnology Information (NCBI) human Reference Sequence (RefSeq)
RNA database (which can be found on the world wide web at
ncbi.nlm.nih.gov/RefSeq/) using BLAST analysis, 21,968 (42%) transcripts are
present and 26,676 (51%) of transcripts are absent from the human RefSeq database.
Furthermore 7% of the content represents expressed antisense transcripts to annotated
genes (Johnston et al., J Clin. Oncol. 24: 3519, 2006; Pruitt et al., Nucleic Acids
Research 33: D501-D504, 2005). In addition, probe-level analysis of the Colorectal
Cancer DSA™ compared with leading generic arrays, highlighted that approximately
,000 (40%) transcripts are not contained on the leading generic microarray platform
(Affymetrix) and are unique to the Colorectal Cancer DSA™. Thus, the Colorectal
Cancer DSA™ research tool includes transcripts that have not been available in
hitherto performed gene expression studies. Finally, because the transcript
information used to design the Colorectal Cancer DSA™ was generated in part by a
high throughput sequencing approach, it has been possible to generate probes closer
to the 3' end of the transcripts than are contained on other generic microarrays. The
combination of relevant disease specific content and 3' based probe design has
yielded a unique product capable of robust profiling from FFPE derived RNA.
The aim of this study was to assess the use of the Colorectal Cancer DSA™
research array, using FFPE derived tumor material to generate and independently
validate a prognostic gene signature capable of accurately classifying stage II colon
cancer patients as being at low or high risk of relapse, post surgery. Stage II colon
cancer as used in this example is AJCC T3 or T4 node negative (N0) non metastatic
(M0) colon cancer.
METHODS
Sample Selection.
Samples were collected retrospectively with the following eligibility criteria:
stage II colon adenocarcinoma only, with no evidence of residual disease; patient age
45 years or older at time of primary surgery; six or more regional lymph nodes
assessed; a minimum of 50% tumor cells present in the tissue section; no family
history of colon cancer; no preoperative or postoperative cancer therapy within 1 year
of surgery (although therapy given after recurrence was acceptable); and minimum
patient follow-up of 5 years for low-risk patients. Low-risk patients were defined as
those with no cancer recurrence within 5 years of primary surgery. High-risk patients
were defined as those with metastatic cancer recurrence within 5 years of primary
surgery. Patients with local disease recurrence were excluded because this recurrence
may have been a result of local residual disease after surgery rather than metastatic
tumor. Samples were collected from 12 centers. All samples underwent independent
histopathologic review by a pathologist. The data set was compared with the
Surveillance, Epidemiology, and End Results database to ensure it represented a
general population with stage II colon cancer. Key patient and tumor characteristics
are given in Table 3 (see .
Gene Expression Profiling From FFPE Samples.
Total RNA was extracted from FFPE tumor samples using the Roche High
Pure RNA Paraffin Kit (Roche, Basel, Switzerland). Amplified cDNA targets were
prepared using the Nugen WT-Ovation® FFPE System v2 in combination with the
Nugen FL-Ovation® cDNA Biotin Module v2 and were performed in accordance
with manufacturer’s instructions. Hybridization, washing, staining and scanning of
fragmented, labelled cDNA was carried out according to standard Affymetrix
protocols. Between 3.0 and 3.5 µg fragmented, labelled cDNA was hybridized to the
Colorectal Cancer DSA™ microarray (Almac, Craigavon, United Kingdom) on the
Affymetrix 7G scanner (Affymetrix, Santa Clara, CA). A sample profile scheduling
strategy was used that involved the stratification of samples into batches that were
randomized against targeted clinical and sample property factors in addition to
operators, reagent, and material lots. Quality control criteria were applied, and
biologic and technical factors were balanced between low- and high-risk samples.
This was performed in order to minimise systematic bias and diffuse any residual
technical bias into technical variation.
Classifier Model Identification.
Model development started with 5,014 probe sets identified as stable and/or
having comparable longitudinal stability under FFPE fixation to avoid the issue of
differential degradation of probe sets. Signature generation was subsequently
performed using the partial least squares classification method with selection of
important features based on recursive feature elimination (RFE) during 10 repeats of
five-fold cross validation. All aspects of the model development were appropriately
nested within the cross validation, including an initial filtering to remove 50% of the
probe sets with the lowest variance and intensity, reference-based robust multichip
averaging (RefRMA) normalization and summarization, and RFE discarding the least
important 10% of probe sets at each iteration. The total number of features to include
in the final model was determined by the feature length with the highest average area
under the receiver operating characteristics curve (AUC) under cross validation. The
threshold for dichotomization of the predictions from each model was selected based
on the maximum of the sum of sensitivity and specificity (maximum of the Youden J
statistic (Youden, Cancer 3:32-35, 1950)) from cross-validated training data. In the
case of multiple thresholds with largely identical performance, the hazard ratio (HR)
from Cox proportional hazards regression was used as a tiebreaker to favor higher HR
values.
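For illustration only, threshold selection of this kind can be sketched as a scan over candidate cut-offs that keeps the one giving the largest sum of sensitivity and specificity, which is the same as the largest value of J = sensitivity + specificity - 1. The scores and labels below are hypothetical, the hazard ratio tiebreaker is not implemented, and the sketch simply keeps the lowest threshold among ties.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing sensitivity + specificity - 1.

    labels: 1 = high risk, 0 = low risk. A sample is predicted high risk
    when its score is greater than or equal to the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_threshold, best_j = None, -2.0   # J always lies in [-1, 1]
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels)
                       if s >= threshold and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels)
                       if s < threshold and y == 0)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:                    # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

# Hypothetical cross-validated scores for 4 low-risk and 5 high-risk samples.
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.55, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(youden_threshold(scores, labels))
```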
The precision of the predictions was evaluated by predicting technical
replicates of a colorectal cancer cell line (HCT116) embedded in FFPE, which was
profiled concurrently with the clinical samples. The repeated technical measurements
of this sample were not included in model development but were predicted by all 50
cross-validation training subsets as an independent test set with a view to select
models with high repeatability and reproducibility. Additionally, a permutation test
was performed where the true class labels were reshuffled randomly 100 times
followed by complete model development. This was done to assess what
classification performance one can expect by chance from a data set with these
characteristics and to reveal any bias in the signature generation procedure.
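A permutation test of this general kind can be sketched as follows, by way of non-limiting example. In the study the complete model development was repeated for each reshuffling of the class labels; in this sketch a simple stand-in statistic (the absolute difference in group means of a single hypothetical feature) takes the place of the full procedure, and the add-one p-value estimate and fixed random seed are assumptions of the sketch.

```python
import random

def mean_gap(values, labels):
    """Stand-in statistic: absolute difference between class means."""
    high = [v for v, y in zip(values, labels) if y == 1]
    low = [v for v, y in zip(values, labels) if y == 0]
    return abs(sum(high) / len(high) - sum(low) / len(low))

def permutation_p_value(values, labels, stat_fn, n_perm=100, seed=0):
    """Fraction of label reshufflings scoring at least the observed value."""
    rng = random.Random(seed)
    observed = stat_fn(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)             # reshuffle the true class labels
        if stat_fn(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)      # add-one estimate, never zero

# Hypothetical feature values for six low-risk and six high-risk samples.
values = [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16]
labels = [0] * 6 + [1] * 6
print(permutation_p_value(values, labels, mean_gap))
```

A performance obtained with the true labels that is rarely matched under reshuffling suggests that the signature generation procedure is not merely fitting noise.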
The independence of the final model in the context of known clinical factors
was evaluated using univariate and multivariate Cox proportional hazards regression.
The input used was the predicted dichotomized class labels together with tumor stage,
patient tumor grade, tumor location, patient age, patient sex, mucinous/nonmucinous
subtype, and number of lymph nodes retrieved. Microsatellite instability was not
included as a factor because this information was not available for the majority of the
samples. Gene Ontology annotation and enrichment of Gene Ontology biologic
processes and molecular functions were performed using an internally developed tool
based on the genes in the final signature. The hypergeometric distribution with false
discovery rate multiple testing correction was used to determine functional classes of
genes significantly enriched. The pathway analysis was generated through the use of
Ingenuity Pathway Analysis (Ingenuity Systems, Redwood City, CA).
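The internally developed enrichment tool is not described in detail. Purely as an assumed sketch of the underlying calculation, an upper-tail hypergeometric probability can be computed for each functional class and the resulting p-values adjusted for multiple testing. The Benjamini-Hochberg procedure is used here as one common false discovery rate correction; whether it matches the correction actually applied is an assumption, and all counts are hypothetical.

```python
from math import comb

def hypergeom_p(k, annotated, drawn, total):
    """P(X >= k): at least k annotated genes among `drawn` signature genes,
    when `annotated` of the `total` background genes carry the annotation."""
    denominator = comb(total, drawn)
    numerator = sum(comb(annotated, i) * comb(total - annotated, drawn - i)
                    for i in range(k, min(annotated, drawn) + 1))
    return numerator / denominator

def benjamini_hochberg(pvalues):
    """Return FDR-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest p to smallest
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical example: 5 of 20 background genes carry an annotation and
# 3 of the 4 signature genes carry it.
print(hypergeom_p(3, 5, 4, 20))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```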
Balancing, randomisation and Quality Control (QC) of samples.
Target population: The population used to train the assay was matched to
reflect the general population properties from the SEER and CRUK databases. The
following properties were considered:
• Gender. The gender prevalence in the Stage II population is
approximately between 50% and 60% male (56% in the UK and 57% in the US).
• Tumour location (distal/proximal). The prevalence in the Stage II population
is approximately 55%-65% proximal and 35%-45% distal.
• Patient age. According to NCI’s SEER Cancer Statistics Review Colon and
Rectum Section, from 2001-2005, 0.1% of patients were diagnosed under age
20; 1.0% between 20 and 34; 3.7% between 35 and 44; 11.6% between 45 and
54; 18.3% between 55 and 64; 25.1% between 65 and 74; 28.2% between 75
and 84 and 12.2% 85+ years of age.
• Recurrence-free survival rate. The rate of recurrence-free survival in the
Stage II population is reported to be between 13%-22% (Gattaj et al, European
Journal of Cancer, 2006) and ~30% from the SEER database.
Pre-balancing: Pre-balancing was performed so that the sample set put
forward for randomization was balanced with respect to selected clinical covariates
whilst maintaining the general population statistics presented above. This excludes
recurrence-free survival, which was intentionally enriched to increase the power of
the biomarker discovery. The training set did not contain any samples with events
after 5 years, whereas this was not a constraint in the validation set. The rationale for
not using samples that recur after 5 years for signature generation (i.e. in the training
set) is to avoid introducing additional heterogeneity in the sample population when
performing the biomarker discovery.
The main aim of the balancing procedure was to reduce the association (if
any) between the endpoint (high/low risk ented as a binary variable) and any of
the factors listed below. Any ation between these factors and the ow risk
endpoint would introduce a confounding that could limit the clinical utility of the
assay. 603 colorectal s were subjected to the pre-balancing in order to reduce
strong associations between prognosis and any of the ing s: Gender;
Tumor location within the bowel; t age; Contributing Centre; FFPE block age
(date of surgery); Tumor content; and RNA quality.
Continuous parameters were tested using a Kolmogorov-Smirnov test and
categorical parameters were tested using a chi-squared test. A p-value ≥ 0.4 for all
parameters was required to achieve balancing. 504 samples remained after balancing
(335 low risk and 169 high risk) and were put forward to array profiling.
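The balancing criterion above can be illustrated with a minimal sketch. It is not the procedure actually used in the study: the function names are hypothetical, the Kolmogorov-Smirnov p-value uses the standard asymptotic series (approximate for small groups), and the chi-squared test is restricted to a 2x2 table such as gender by risk group.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, p) for a two-sample Kolmogorov-Smirnov test.

    The p-value comes from the asymptotic Kolmogorov series and is
    therefore only approximate for small groups.
    """
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # D is the largest vertical gap between the two empirical CDFs.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d  # common small-sample correction
    if lam < 0.2:  # the series is numerically 1 in this range
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def chi2_2x2(table):
    """Return (statistic, p) for a 2x2 contingency table (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 df.
    return stat, math.erfc(math.sqrt(stat / 2.0))


def is_balanced(p_values, minimum=0.4):
    """Balancing is achieved only if every covariate has p >= minimum."""
    return all(p >= minimum for p in p_values)
```

In use, one p-value would be computed per covariate (for example patient age via `ks_two_sample`, gender via `chi2_2x2`), and samples would be removed and the tests repeated until `is_balanced` returns true.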
Randomization of samples during array profiling: Randomization of samples
was performed to avoid confounding between known technical and biological factors,
primarily the endpoint of interest (prognosis). In this study operator,
hybridization-wash-stain (HWS) kit lot, array lot and array batch were considered
together with the contributing center and the prognosis. Samples were first randomized
into array batches such that each array batch had the same proportion of prognosis and
contributing center. Operators were then assigned to each array batch according to
availability. Each array batch was then assigned a HWS kit, ensuring that each
operator used the same proportion of each kit. Array lots were allocated to each array
batch, ensuring that they were evenly distributed amongst the array batches.
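The first randomization step, in which each array batch receives the same proportion of prognosis and contributing center, can be sketched as a stratified round-robin assignment. This is an illustrative assumption about how such a step might be implemented; the field names `id`, `prognosis` and `center` and the function `assign_batches` are hypothetical, and the later steps (operators, HWS kits, array lots) are not covered.

```python
import random
from collections import defaultdict


def assign_batches(samples, n_batches, seed=0):
    """Deal samples into batches so each stratum is spread evenly.

    A stratum is a (prognosis, center) pair. Samples are shuffled within
    each stratum and then dealt round-robin across the batches.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for sample in samples:
        strata[(sample["prognosis"], sample["center"])].append(sample["id"])

    batches = [[] for _ in range(n_batches)]
    slot = 0  # carried across strata so batch sizes stay within one of each other
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)
        for sample_id in ids:
            batches[slot % n_batches].append(sample_id)
            slot += 1
    return batches
```

Within every stratum the per-batch counts differ by at most one, so no batch is dominated by a single center or prognosis group.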
Quality control of the training data: QC procedures were applied on the
resulting arrays, primarily based on values in the Affymetrix RPT files that contain
various quality-related parameters. Limits were calculated based on visual inspection
of the distribution for each parameter for all samples: % present calls (≥ 20%
required); Image artefacts were identified to remove arrays with noticeable blotches;
Outliers were detected from principal component analysis (PCA) based on the Q
residuals and Hotelling's T².
Assessment of gender genes was used to determine if observed expression
levels matched the known gender in clinical information.
The following Affymetrix quality parameters were also considered during the
visual inspection of the distributions, broadly categorised as follows: RNA Quality;
Signal Quality & Detection call; Background & Noise; and Background Homogeneity.
A total of 319 colorectal samples passed the QC procedure. Due to
preliminary results suggesting a heterogeneity introduced by the rectal samples, the
rectal samples were removed to form a 249 colon-only set, which was put forward for
final (post-QC) balancing.
Final post-QC balancing: The 249 colon samples passing QC were balanced
using the same principles as the initial pre-balancing, with the addition of criteria for
the % present call distribution to be similar in both low risk and high risk groups (this
information is only available after hybridisation). A final set of 215 samples remained
after QC and balancing.
The final colon set with 215 samples has the following properties compared to
the known population distribution: Gender: 53% male (50-60% in population); Tumor
location (distal/proximal): 62% proximal (55-65% proximal in population); Patient
age: Closely follows the continuous distribution of the population; and
Recurrence-free survival rate: 34% poor prognosis (high risk), intentionally enriched
compared to the population figure of around 15-20%.
Quality control of the validation set and future sample sets: Using a
tailor-made QC procedure on the training set is an important step in order to facilitate
the identification of biomarkers from a high-quality data set. However, for prediction of
future samples, QC has to be applied on a one-sample-at-a-time basis. Also, the QC
procedure cannot be too specific to the data set and the system where the data has
been generated. For this purpose a separate evaluation was performed using 40
samples replicated across two systems and scanners to identify QC parameters that
are stable across systems. The AvgSigA parameter (average signal of the absent probe
sets) was determined to be the most stable parameter across the different systems and
hence the best candidate for a system-independent QC procedure. For this parameter,
higher values imply lower quality and lower values imply higher quality. The
AvgSigA values are strongly negatively correlated to the % present call parameter,
which is a commonly used QC parameter and was the primary QC parameter used on
the training set. The lower acceptance value of % present calls from the training set
was set to 20%, which corresponds approximately to an upper acceptance value of 43
for the AvgSigA parameter for this data set. To accommodate younger FFPE samples,
it was decided not to introduce a lower threshold on the AvgSigA (which will allow
inclusion of higher-quality samples). The final inclusion range derived from this study
was hence AvgSigA ≤ 43, which was the QC metric applied to the independent
validation set and is the QC that will be applied to future samples.
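Because this acceptance rule is applied one sample at a time, it reduces to a single comparison. The sketch below is illustrative only; the names `AVG_SIG_A_MAX` and `passes_qc` are hypothetical, and the value 43 is the upper acceptance value stated above.

```python
AVG_SIG_A_MAX = 43.0  # upper acceptance value stated in the text; no lower bound


def passes_qc(avg_sig_a, upper=AVG_SIG_A_MAX):
    """Accept a single array if its average absent-probe-set signal is <= upper."""
    return avg_sig_a <= upper
```

Leaving out a lower bound mirrors the decision not to exclude higher-quality (lower AvgSigA) samples such as younger FFPE blocks.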
Identifying probe sets that are stable over FFPE block age: It was hypothesized
that mRNA transcripts are likely to degrade at different rates and to different levels in
FFPE samples, which could result in a signature generated from old material not
performing as expected on fresh FFPE material. Therefore two independent
longitudinal studies were performed to identify probe sets that are stable over FFPE
block age. In the first study, 9 FFPE blocks were serially sectioned and analyzed by
DNA microarray at seven time points in a 16-week timeframe following fixation.
These samples were supplemented by a second longitudinal study at three 6 month
intervals in a one year timeframe, in which 8 FFPE blocks ranging from 6 months to 4
years of age were serially sectioned and analyzed by DNA microarray, resulting
in 113 individual samples for analysis. 5014 transcripts were identified that either did
not undergo further degradation with time or decayed at an equivalent rate following
fixation. This list of probe sets was subsequently used for signature generation. A
separate manuscript for presenting the details from this study is in preparation.
Estimating the precision of the classifier during model development: The
ability of a classifier to consistently produce the same output from technical replicates
is an important aspect of an assay when used in a test setting. For this purpose, a set
of 39 reference samples, which are technical replicates of the same colorectal cancer
cell line (HCT114), were hybridized together with the clinical samples. During model
development, this set was predicted as an external test set during cross-validation in
order to estimate the relative variance at each step in the model development process.
No information was shared between the training set and the 39 sample reference set
during cross-validation. The standard deviation of the predicted signature scores
was calculated and visualized as the average with 95% confidence limits. The
variability is low for longer signatures and then gradually increases over the
feature selection procedure, which is also reflected in lower accuracy (AUC) for the
shorter signatures. At the selected signature length (634 probe sets), the model shows
both high precision and accuracy.
Permutation analysis of the classification performance: Permutation analysis
was performed to evaluate what classification performance one can expect by chance
from a data set with similar properties. This was performed by randomly reshuffling
the true class labels (i.e. the true prognosis) and subsequently repeating the entire
model development process (with filtering, normalization, feature selection and
classification). The signature performance is significantly better than chance at longer
signature lengths and specifically at the selected one where the number of probe sets
is 634. Additionally, the permutation test reveals any underlying bias in the data set
and/or the methodology used to develop the classifier. The median AUC over the
random labels is 0.5, denoting chance, which indicates that there is no evident bias in
the procedure used.
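The idea of the permutation analysis can be illustrated with a simplified sketch. Unlike the procedure described above, which repeats the entire model development for every reshuffling, this example only reshuffles the class labels against a fixed set of scores; the function names and the choice of 200 permutations are assumptions made for illustration.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve: fraction of (high-risk, low-risk) pairs
    in which the high-risk sample has the higher score (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_test(scores, labels, n_perm=200, seed=0):
    """Return (observed AUC, median AUC under shuffled labels, p-value)."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between score and label
        null.append(auc(scores, shuffled))
    p_value = (1 + sum(a >= observed for a in null)) / (1 + n_perm)
    return observed, statistics.median(null), p_value
```

A median null AUC close to 0.5 is what an unbiased procedure should give; a median clearly above 0.5 would point to information leaking from the labels into the model development.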
RESULTS
Development of a Stage II Colon Cancer Prognostic Signature From FFPE
Tissues. Disease-free survival at 5 years was used as the primary end point for this
study. After balancing for clinical factors and applying quality control criteria to the
initial data set, a training set of 215 patients (142 low-risk and 73 high-risk patients)
was identified. Fifty percent variance-intensity filtering, RefRMA normalization, RFE
feature selection, and partial least squares classification were performed under 10
repeats of five-fold cross validation for estimation of the classification performance.
Cross validation indicated a 634-transcript signature to be optimal for prognostic
classification. A receiver operating characteristic curve with an AUC of 0.68 (P <
.001) was generated, indicating a significant association between signature score and
prognosis. The observed AUC was significantly higher than random in the
permutation analysis and displayed a low variance in the evaluation of the precision
from technical replicates. A threshold of 0.465 for dichotomization of the signature
prediction scores was established from the Youden J statistic, yielding an HR of 2.62
(P < .001). Table 4 contains a summary of the classification performance
over the signature generation during cross validation.
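Selecting a dichotomization threshold from the Youden J statistic (J = sensitivity + specificity - 1) can be sketched as follows. This is a generic illustration rather than the study's implementation; it assumes that a score at or above the threshold is called high risk, and the function name `youden_threshold` is hypothetical.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    labels: 1 = high risk (event), 0 = low risk.
    A sample is called high risk when score >= threshold (an assumption).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # keep the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j
```

The threshold chosen this way on the training set is then held fixed when the signature is applied to independent samples.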
Table 4. Classification Performance of the Training and Independent Validation Sets
                 AUC            Sensitivity    Specificity    NPV            PPV            HR
Training         0.682          0.478          0.791          0.858          0.365          2.618
(95% CI)         (0.643-0.720)  (0.4[...]9)    (0.737-0.845)  (0.845-0.872)  (0.317-0.413)  (2.041-3.195)
Validation       0.684          0.718          0.559          0.8[...]       0.331          2.526
(95% CI)         (0.594-0.761)  (0.617-0.811)  (0.423-0.673)  (0.828-0.900)  (0.250-0.434)  (1.536-4.154)
The 95% CIs are ± 2 standard deviations from cross validation (training set) or
bootstrapping with 1,000 repeats (validation set); 80% and 20% priors have been used
when calculating the NPVs and PPVs, respectively. The threshold t = 0.465 was used
for dichotomization of the signature score. Abbreviations: AUC, area under the
receiver operating characteristics curve; HR, hazard ratio; NPV, negative predictive
value (negative is low risk); PPV, positive predictive value (positive is high risk).
Independent Validation of the Stage II Colon Cancer Prognostic Signature:
The prognostic signature was applied to an independent validation set of 144 patients
enriched for recurrence (85 low-risk and 59 high-risk patients) using the threshold
score identified in the training set. The sample analysis was run separately and at a
later time to the training set. The signature predicted disease recurrence with an HR of
2.53 (P < .001) in the high-risk group (Table 4). The signature also
predicted cancer-related death with an HR of 2.21 (P < .0084) in the high-risk group.
The fact that the signature described herein was developed from FFPE derived
tumor material facilitates a large scale validation strategy based on retrospective
analysis of existing FFPE tumor banks.
The hazard ratio is an expression of the hazard or chance of events occurring
in the stage II colon cancer patients identified by the classifier as high risk as a ratio
of the hazard of the events occurring in the patients identified by the classifier as low
risk. There was a significantly lower probability of recurrence for the group predicted
to have good prognosis compared to those predicted to have poor prognosis, within 5
years post surgery. The negative predictive value is the proportion of patients with
negative test results who are correctly diagnosed (predicted negative). In a prognostic
setting, the NPV is dependent on the prevalence of disease recurrence. The positive
predictive value is the proportion of patients with positive test results who are
correctly diagnosed (predicted positive). In a prognostic setting, the PPV is
dependent on the prevalence of disease recurrence. Based on a population prevalence
of 20% poor prognosis samples, this would imply that patients with a predicted poor
prognosis have a 33% probability of recurrence whereas patients with a predicted
good prognosis have a 13% probability of recurrence within 5 years.
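The dependence of the predictive values on prevalence follows from Bayes' rule, as the sketch below illustrates. It is offered as an assumption-laden example: the function name is hypothetical, and the inputs 0.478 and 0.791 are taken from the training row of Table 4 on the assumption that they are the sensitivity and specificity, with a 20% prior for poor prognosis.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    tp = sensitivity * prevalence                   # true positives
    fp = (1.0 - specificity) * (1.0 - prevalence)   # false positives
    tn = specificity * (1.0 - prevalence)           # true negatives
    fn = (1.0 - sensitivity) * prevalence           # false negatives
    return tp / (tp + fp), tn / (tn + fn)


# Training-row values from Table 4 (assumed sensitivity and specificity).
ppv, npv = predictive_values(0.478, 0.791, 0.20)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under these assumptions the result agrees, to rounding, with the training-set PPV and NPV reported in Table 4; raising the prevalence increases the PPV and lowers the NPV for the same sensitivity and specificity.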
Assessment of Signature Independence From Known Prognostic Factors:
For a prognostic assay to be useful, it must perform independently from known
prognostic factors used in the clinic. Therefore the independence of the assay was
assessed in both a univariate and multivariate analysis (Table 5).
Table 5. Comparison of Transcript Signature to Standard Pathologic Parameters in
the Independent Validation Set
                                      Univariate                       Multivariate
Parameter                             HR     CI             p         HR     CI            p
Tumor Stage (T4 vs T3)                1.23   0.667-2.269    0.5067    1.617  0.84-3.110    0.1501
Patient Age                           1.039  1.01-1.069     0.0086    1.046  1.014-1.078   0.0041
Tumor Grade (level not legible)       0.815  0.456-1.456    0.4895    1.274  0.48-3.383    0.6265
Tumor Grade (level not legible)       1.326  [...]39-[...]  0.434     2.161  0.636-7.339   0.2169
Tumor Location (Proximal vs Distal)   1.766  1.075-2.901    0.0248    2.158  1.224-3.804   0.0078
Gender                                1.165  0.713-1.901    0.5426    0.971  0.549-1.720   0.9204
Mucinous subtype                      0.825  0.418-1.627    0.5787    0.896  0.433-1.856   0.7682
No. of Nodes retrieved                1.007  0.983-1.032    0.5678    1.014  0.988-1.041   0.2824
Prognostic Signature                  2.526  1.536-4.154    < 0.001   2.551  1.471-4.423   < 0.001
Both the univariate and multivariate analyses have been performed using Cox
proportional hazards regression with P values coming from a log-likelihood test. For
tumor grade, grade 1 has been used as the reference point for calculating the HR.
Patient age and number of nodes retrieved are analyzed as continuous factors. The
interpretation of the HR of patient age is the increased risk for a change in 1 year of
age, and correspondingly, the interpretation of the HR of number of nodes retrieved is
the increased risk for an increase of one retrieved node. Abbreviation: HR, hazard
ratio.
The prediction of prognosis was significant in both the univariate (P < .001)
and multivariate (P < .001) analysis, demonstrating that the signature provided
prognostic information in addition to conventional risk factors. Furthermore, the
independence of the signature was assessed with the addition of lymphovascular
invasion in the samples where this had been recorded (100 of 144 samples in the
validation set). The signature performed independently in the univariate (P < .001)
and multivariate analysis (P < .001).
Functional Analysis of the Genes in the Prognostic Signature: Next it was
asked if the assay detected biologic processes known to be relevant to colon cancer
recurrence. The 634 probe sets were analyzed using Ingenuity Pathway Analysis, and
a list of statistically significant pathways was identified, the most significant of
which was IGF-1 signaling.
DISCUSSION
As disclosed herein, a DNA microarray-based assay was developed that
identifies patients at higher risk of recurrence after surgery for stage II colon cancer.
Specifically, the signature identified a high-risk cohort with an HR of recurrence of
2.53 and an HR of cancer-related death of 2.21 in an independent validation set.
Validation of a prognostic assay using a completely separate set is necessary to avoid
overestimations of the performance of the signature from the training set. The HR of
2.53 for recurrence compares favorably with histologic factors currently used to make
decisions in the clinic, which typically have an HR of approximately 1.5 or less.
Moreover, the signature does not require individual interpretation and may offer a
more standardized approach than conventional histopathologic factors. Importantly,
the assay is performed on FFPE tissue and, therefore, is easily applied in current
medical practice.
Although several DNA microarray-based prognostic tests in several cancer
types have been published, only one has been introduced into clinical practice, and to
date, none is used in colon cancer. This may be a result of two major factors. First,
many of the signatures have been developed from fresh or frozen tissue. Second,
inappropriate study methodology has resulted in a failure to validate the test in an
independent data set.
Regarding the use of frozen tissue samples, although this tissue type provides
excellent microarray data, a test generated from this tissue is unlikely to perform
adequately in FFPE tissue. This can create difficulty in collecting enough samples to
develop and independently validate a prognostic test. In addition, implementation of
fresh tissue-based assays requires a change in clinical practice, because samples need
to be collected at the time of surgery.
FFPE is the standard for tumor archiving, and numerous tumor banks already
exist for assay development. Importantly, no change in sample collection and
processing is required for the development and clinical implementation of
FFPE-based assays.
The disclosed methods were developed to work with FFPE tissue but using a
DNA microarray platform, thereby vastly increasing the number of detectable mRNA
transcripts and biologic processes relative to quantitative polymerase chain reaction
technology. As a result of using FFPE material with a microarray platform, several
methodologic issues needed to be considered. Formalin fixation results in the
degradation of mRNA transcripts through the cross linking of RNA to protein. Most
of this degradation occurs immediately, but some transcripts continue to degrade with
time. The DNA microarray platform used for the study has probe sets designed to the
3' end of mRNA transcripts to enhance the ability to detect degraded transcripts. In
addition, a separate set of colon cancer samples was analyzed over time to ensure we
did not incorporate probe sets that detected unstable or differentially stable mRNA
transcripts as part of the signature.
The predictive value of the signature is above and beyond known prognostic
clinical covariates. This performance can largely be attributed to the initial balancing
of prognosis against biologic and technical factors that was performed as part of
establishing a suitable training set. Biologic factors considered include known
prognostic factors such as pT stage and grade, as well as other nonprognostic factors
that may have affected gene expression including tumor location, patient age, and sex.
Technical factors such as FFPE block age and the contributing center were also
balanced between high- and low-risk groups in the training set. In addition,
randomization of operators and reagent kits was performed to avoid confounding
between technical factors and known clinical factors. This minimized the risk that the
assay was dependent on the operator or relied on the use of samples from specific
centers or the use of specific batches of reagents. Because the assay was developed to
be independent from known prognostic factors, we believe that it may be possible to
develop a multiparametric test that incorporates several factors to give an even more
accurate prognostic indicator.
Functional analysis of the gene signature revealed that IGF-1 signaling,
TGF-β signaling, and HMGB1 signaling were among the most significant pathways
identified. All of these have been previously reported to confer a poor prognosis in
colon cancer through promoting tumor growth, invasion, and metastasis and
preventing apoptosis. In conclusion, disclosed herein is a validated and robust
prognostic DNA microarray signature for stage II colon cancer from FFPE stored
tumor tissue.
The disclosed signature can help physicians to make more informed clinical
decisions regarding the risk of relapse and the potential to benefit from adjuvant
chemotherapy. (Andre et al., Annals of Surgical Oncology 13:887-898, 2006;
Diaz-Rubio et al., Clin. Transl. Oncol. 7: 3-11, 2005; Monga et al., Ann. Surg. Oncol. 13:
1021-1134, 2006; Sobrero, Lancet Oncol. 7: 515-516, 2006). Furthermore, many
patients want to know their likelihood of cure and the risks/benefits of treatment. (Gill
et al., J. Clin. Oncol. 22: 1797-1806, 2004; Kinney et al., Cancer 91: 57-65, 2001;
Carney et al., Ann. R. Coll. Surg. Engl. 88: 447-449, 2006; Salkeld, Health Expect 7:
104-1014, 2004). Being able to predict the patient's prognosis provides the physician
and the patient with a better assessment of the risks/benefits and the choice of therapy.
The ability to offer individualized patient care will hopefully result in improved
survival and quality of life for these patients.
In the past, many studies have implicated sample size as the primary reason
for lack of convincing statistical significance and point to larger trials being required to
prove the benefit of adjuvant treatment. Using validated prognostic markers, such as
the gene signature generated in this study, stage II patients can be stratified into high
and low risk sub-populations. This approach may assist in improved clinical trial
design by focusing on those patients at high risk of recurrence and therefore more
likely to derive a benefit from adjuvant therapy. Thus, the Colorectal Cancer DSA™
may be a useful research tool for stratifying patients for inclusion in clinical trials, for
decision-making regarding adjuvant and neo-adjuvant treatment, and for the
identification of novel pathways or molecular targets for additional drug development.
The prognostic signature reported in Table 6 accurately predicted relapse
for stage II colon cancer and is evaluated on an independent FFPE validation set. The
overall accuracy for prediction of recurrence was substantial for this heterogeneous
disease. Based on a population prevalence of 20% poor prognosis samples, this would
imply that patients with a predicted poor prognosis have a 33% probability of
recurrence whereas patients with a predicted good prognosis have a 13% probability
of recurrence within 5 years. One of the major advantages of the current approach is
that it is based on expression profiling from FFPE tissue, which is the preferred
method of storage for the majority of available tissue banks (...ovitz, Proteome
Sci. 4:5, 2006). RNA extracted from FFPE tissue samples tends to have a shorter
median length due to degradation and formalin-induced modification, which makes it
difficult for generic arrays to detect. When defining the colon cancer transcriptome, a
3'-based sequencing approach was employed, facilitating design of probe sets to the 3'
extremity of each transcript. This approach ensures a much higher detection rate and is
thus optimally designed to detect RNA transcripts from both fresh frozen and FFPE
tissue samples. The results from the current study showed that the Almac Diagnostics
Colorectal Cancer DSA™ research tool is capable of producing biologically
meaningful and reproducible data from FFPE derived tissue.
Example 2
Prognosis of Cancer
This example describes particular methods that can be used to prognose a
subject diagnosed with colon cancer. However, one skilled in the art will appreciate
that methods that deviate from these specific methods can also be used to successfully
provide the prognosis of a subject with colon cancer.
A tumor sample and adjacent non-tumor sample are obtained from the subject.
Approximately 1-100 µg of tissue is obtained for each sample type, for example using
a fine needle aspirate. RNA and/or protein is isolated from the tumor and non-tumor
tissues using routine methods (for example using a commercial kit).
In one example, the prognosis of a colon cancer tumor is determined by
detecting expression levels of 2 or more of the transcripts in Tables 1, 2, and/or 6 in a
tumor sample obtained from a subject by microarray analysis or real-time quantitative
PCR. For example, the disclosed gene signature can be utilized. The relative
expression level in the tumor sample is compared to the control (e.g., RNA isolated
from adjacent non-tumor tissue from the subject). In other cases, the control is a
reference value, such as the relative amount of such molecules present in non-tumor
samples obtained from a group of healthy subjects or cancer subjects.
In view of the many possible embodiments to which the principles of the
disclosure may be applied, it should be recognized that the illustrated embodiments.
Claims (52)
1. A method for diagnosing colon cancer in a sample obtained from a subject, comprising: detecting an expression level of at least 100 colon cancer-related nucleic acid molecules listed in Table 6 in a sample comprising nucleic acids obtained from a subject; comparing the combined expression level of the at least 100 colon cancer-related nucleic acid molecules, or a decision score derived therefrom to a control threshold indicative of a diagnosis of colon cancer, wherein the expression level, or a decision score derived therefrom, on the same side of the threshold indicates a diagnosis of colon cancer, thereby diagnosing colon cancer in the sample obtained from the subject; and wherein the at least 100 colon cancer-related nucleic acid molecules does not include EFNA3, IGF1, CTSD, PTK2, AXIN2, CTGF, CHD2, ITGA6, RUNX1, ID2, HMGB1, VDR, EPHB4, PKM2, SOD2, IGFBP2, GRB7, DSP, ICAM1, BMP2, BMPR1A, CYP1B1, IGF2, NOTCH2, NOTCH2NL, BUB3, MMP1, JUN, TCF7L2, FLT1, CEACAM6, GBP1, XPC, RALBP1, STAT1, FOXO3, RTEL1, TNFRSF6B, EGFR, HIF1A, KLF6, or MUC2.
2. The method of claim 1, wherein the control threshold comprises: a threshold derived from corresponding transcripts from colon cancer-related nucleic acid molecules listed in Table 6 in a known colon cancer sample (or samples), wherein the expression level, or a decision score derived therefrom, on the same side of the threshold as a known colon cancer group indicates a diagnosis of colon cancer.
3. A method for classifying a colon cancer sample, comprising: detecting an expression level of at least 100 colon cancer-related nucleic acid molecules listed in Table 6 in a sample comprising nucleic acids obtained from a subject; comparing the combined expression level of the at least 100 colon cancer-related nucleic acid molecules, or a decision score derived therefrom to a control threshold indicative of known classification, wherein the expression level, or a decision score derived therefrom, on the same side of the threshold permits classification of the colon cancer sample; and wherein the at least 100 colon cancer-related nucleic acid molecules does not include EFNA3, IGF1, CTSD, PTK2, AXIN2, CTGF, CHD2, ITGA6, RUNX1, ID2, HMGB1, VDR, EPHB4, PKM2, SOD2, IGFBP2, GRB7, ICAM1, BMP2, BMPR1A, CYP1B1, IGF2, NOTCH2, NOTCH2NL, BUB3, MMP1, DSP, JUN, TCF7L2, FLT1, CEACAM6, GBP1, XPC, RALBP1, STAT1, FOXO3, RTEL1, TNFRSF6B, EGFR, HIF1A, KLF6, or MUC2.
4. The method of claim 3, wherein the control threshold comprises: a threshold derived from corresponding transcripts from colon cancer-related nucleic acid molecules listed in Table 6 in a colon cancer sample (or samples) of known classification, wherein the expression level, or a decision score derived therefrom, on the same side of the threshold as a colon cancer sample (or samples) of known classification permits classification of the colon cancer sample.
5. The method of any one of claims 3 or 4, wherein the colon cancer sample is classified as stage I, stage II, stage III and stage IV.
6. The method of any one of claims 3-5, further comprising ng a plan designating a future treatment that will be effective for the classified colon cancer.
7. The method of claim 6, wherein the treatment is surgical resection, chemotherapy, radiation or any combination thereof.
8. A method for predicting a response to a treatment for colon cancer, comprising: detecting an expression level of at least 100 colon cancer-related nucleic acid molecules listed in Table 6 in a sample comprising nucleic acids obtained from a subject; comparing the combined expression level of the at least 100 colon cancer-related nucleic acid molecules, or a decision score derived therefrom to a control threshold indicative of a known response to treatment, wherein the expression level, or a decision score derived therefrom, on the same side of the threshold indicates a similar response to treatment, thereby predicting response to treatment; and wherein the at least 100 colon cancer-related nucleic acid molecules does not include EFNA3, IGF1, CTSD, PTK2, AXIN2, CTGF, CHD2, ITGA6, RUNX1, ID2, HMGB1, VDR, EPHB4, PKM2, SOD2, IGFBP2, GRB7, ICAM1, BMP2, BMPR1A, CYP1B1, IGF2, NOTCH2, NOTCH2NL, BUB3, MMP1, DSP, JUN, TCF7L2, FLT1, CEACAM6, GBP1, XPC, RALBP1, STAT1, FOXO3, RTEL1, TNFRSF6B, EGFR, HIF1A, KLF6, or MUC2.
9. The method of claim 8, wherein the control threshold comprises: a threshold derived from corresponding transcripts from colon cancer-related nucleic acid molecules listed in Table 6 in a colon cancer sample (or samples) having a known response to treatment, wherein the expression level, or a decision score derived therefrom, on the same side of the threshold as a colon cancer sample (or samples) having a known response to treatment indicates a similar response to treatment, thereby predicting response to treatment.
10. The method of any one of claims 8 or 9, wherein the treatment is surgical resection.
11. The method of any one of claims 8-10, wherein the treatment is chemotherapy and/or radiation.
12. A method for predicting long term survival of a subject with colon cancer, comprising: detecting an expression level of at least 100 colon cancer-related nucleic acid molecules listed in Table 6 in a sample comprising nucleic acids obtained from a subject; comparing the combined expression level of the at least 100 colon cancer-related nucleic acid molecules, or a decision score derived therefrom to a control threshold indicative of having a history of long term survival, wherein the expression level, or a decision score derived therefrom, on the same side of the threshold indicates long term survival of the subject, thereby predicting long term survival of a subject; and wherein the at least 100 colon cancer-related nucleic acid molecules does not include EFNA3, IGF1, CTSD, PTK2, AXIN2, CTGF, CHD2, ITGA6, RUNX1, ID2, HMGB1, VDR, EPHB4, PKM2, SOD2, IGFBP2, GRB7, ICAM1, BMP2, BMPR1A, CYP1B1, IGF2, NOTCH2, NOTCH2NL, BUB3, MMP1, DSP, JUN, TCF7L2, FLT1, CEACAM6, GBP1, XPC, RALBP1, STAT1, FOXO3, RTEL1, TNFRSF6B, EGFR, HIF1A, KLF6, or MUC2.
13. The method of claim 12, wherein the control threshold comprises: a threshold derived from corresponding transcripts from colon cancer-related nucleic acid molecules listed in Table 6 in a colon cancer sample (or samples) obtained from a subject (or subjects) having a history of long term survival, wherein the expression level, or a decision score derived therefrom, on the same side of the threshold as a colon cancer sample (or samples) obtained from a subject (or subjects) having a history of long term survival indicates long term survival of the subject, thereby predicting long term survival of a subject.
14. The method of claim 13, wherein long term survival comprises at least 5 year survival.
15. A method for predicting recurrence of colon cancer in a subject, comprising: detecting an expression level of at least 100 colon cancer-related nucleic acid molecules listed in Table 6 in a sample comprising nucleic acids obtained from a subject; comparing the combined expression level of the at least 100 colon cancer-related nucleic acid molecules, or a decision score derived therefrom to a control threshold indicative of a history of recurrence, wherein the expression level, or a decision score derived therefrom, on the same side of the threshold indicates a recurrence in the subject; and wherein the at least 100 colon cancer-related nucleic acid molecules does not include EFNA3, IGF1, CTSD, PTK2, AXIN2, CTGF, CHD2, ITGA6, RUNX1, ID2, HMGB1, VDR, EPHB4, PKM2, SOD2, IGFBP2, GRB7, ICAM1, BMP2, BMPR1A, CYP1B1, IGF2, NOTCH2, NOTCH2NL, BUB3, MMP1, DSP, JUN, TCF7L2, FLT1, CEACAM6, GBP1, XPC, RALBP1, STAT1, FOXO3, RTEL1, TNFRSF6B, EGFR, HIF1A, KLF6, or MUC2.
16. The method of claim 15, wherein the control threshold comprises: a threshold derived from corresponding transcripts from colon cancer-related nucleic acid molecules listed in Table 6 in a colon cancer sample (or samples) having a history of recurrence, wherein the expression level, or a decision score derived therefrom, on the same side of the threshold as a colon cancer sample (or samples) of known history of recurrence, indicates a recurrence in the subject.
17. A method of preparing a personalized colon cancer genomics profile for a subject, comprising: detecting an expression level of at least 100 colon cancer-related nucleic acid molecules listed in Table 6 in a sample comprising nucleic acids obtained from a subject; creating a report summarizing the data obtained by the gene expression analysis; and wherein the at least 100 colon cancer-related nucleic acid molecules does not include EFNA3, IGF1, CTSD, PTK2, AXIN2, CTGF, CHD2, ITGA6, RUNX1, ID2, HMGB1, VDR, EPHB4, PKM2, SOD2, IGFBP2, GRB7, ICAM1, BMP2, BMPR1A, CYP1B1, IGF2, NOTCH2, NOTCH2NL, BUB3, MMP1, DSP, JUN, TCF7L2, FLT1, CEACAM6, GBP1, XPC, RALBP1, STAT1, FOXO3, RTEL1, TNFRSF6B, EGFR, HIF1A, KLF6, or MUC2.
18. The method of any one of claims 1-17, wherein the nucleic acids obtained from the subject comprise RNA and/or cDNA transcribed from RNA extracted from a sample of colorectal tissue obtained from the subject.
19. The method of claim 18, wherein the sample is a biopsy sample.
20. The method of any one of claims 18 or 19, wherein the sample is a fixed and/or paraffin embedded sample.
21. The method of any one of claims 1-20, wherein the expression level is normalized against a control gene or genes.
22. The method of any one of claims 1-21, wherein the level of expression is determined with PCR and/or microarray-based methods.
23. The method of any one of claims 1-22, wherein detecting the expression level of at least 100 colon cancer-related nucleic acid molecules listed in Table 6 comprises detecting the expression levels for MUM1 and SIGMAR1 transcripts.
24. The method of any one of claims 1-23, wherein detecting the expression level of at least 100 colon cancer-related nucleic acid molecules listed in Table 6 comprises detecting the expression levels for MUM1, SIGMAR1, ARSD, SULT1C2 and 1 transcripts.
25. The method of any one of claims 1-24, wherein detecting the expression level of at least 2 colon cancer-related nucleic acid molecules listed in Table 6 comprises detecting the expression levels for ARSD, CXCL9, PCLO, SLC2A3, FCGBP, SLC2A14, SLC2A3, BCL9L and antisense sequences of MUC3A, OLFM4 and RNF39 transcripts.
26. The method of any one of claims 1-25, wherein detecting the expression level of at least 100 colon cancer-related nucleic acid molecules listed in Table 6 comprises detecting the expression levels for the transcripts listed in Table 1.
27. The method of any one of claims 1-26, wherein detecting the expression level of at least 100 colon cancer-related nucleic acid molecules listed in Table 6 comprises detecting the expression levels for the transcripts listed in Table 2.
28. A method for preparing a gene expression profile indicative of colon cancer prognosis, comprising: detecting the expression level of at least 100 transcripts in a sample comprising RNA isolated from a colon cancer specimen, wherein at least 100 transcripts listed in Table 6 are detected and wherein the at least 100 transcripts does not include EFNA3, IGF1, CTSD, PTK2, AXIN2, CTGF, CHD2, ITGA6, RUNX1, ID2, HMGB1, VDR, EPHB4, PKM2, SOD2, IGFBP2, GRB7, ICAM1, BMP2, BMPR1A, CYP1B1, IGF2, NOTCH2, NOTCH2NL, BUB3, MMP1, DSP, JUN, TCF7L2, FLT1, CEACAM6, GBP1, XPC, RALBP1, STAT1, FOXO3, RTEL1, TNFRSF6B, EGFR, HIF1A, KLF6, or MUC2.
29. The method of claim 28, wherein from 400 to 800 transcript expression levels are detected.
30. The method of claim 28 or 29, wherein from 500 to 700 transcript expression levels are detected.
31. The method of any one of claims 28-30, wherein at least 200 of the total number of transcripts that are detected are transcripts from Table 6.
32. The method of any one of claims 28-31, wherein at least 300 of the total number of transcripts that are detected are transcripts from Table 6.
33. The method of any one of claims 28-32, wherein at least 400 of the total number of transcripts that are detected are transcripts from Table 6.
34. The method of any one of claims 28-33, wherein at least 500 of the total number of transcripts that are detected are transcripts from Table 6.
35. The method of any one of claims 28-34, wherein at least 550 of the total number of transcripts that are detected are transcripts from Table 6.
36. The method of any one of claims 28-35, wherein the transcripts set forth as SEQ ID NOs: 1-33, 35-82, 84-85, 87-97, 99-107, 109-144, 146-158, 160-164, 167-208, 210-212, 214-226, 8, 250-273, 275-279, 8, 310-328, 339-339, 342-343, 346-350, 352-370, 372, 374-383, 385-389, 391-405, 407-409, 8, 420-430, 432-439, 441, 443-448, 450-454, 456-459, 461-468, 470-477, 3, 495-502, 504-524, 526-538, 540-445, 547, 548, 7, 5, 577-586, 588-596, 598-608, 610, 612-633, 635 and 636 are detected.
37. The method of any one of claims 28-36, wherein the colon cancer specimen is a formalin fixed paraffin-embedded tissue sample.
38. The method of any one of claims 28-37, further comprising, scoring the expression level of the transcripts listed in Table 6, or a decision score derived therefrom, against corresponding levels or scores for high risk and low risk patient populations.
39. The method of claim 38, further comprising, choosing a plan designating adjuvant chemotherapy where the patient is determined to be in the high risk group.
40. A method for prognosing colon cancer, comprising: preparing a gene expression profile for a colon cancer specimen comprising isolated RNA; and classifying the specimen based on the expression levels of at least 100 transcripts listed in Table 6, or a decision score derived therefrom, in a low risk or high risk group and wherein the at least 100 transcripts listed in Table 6 does not include EFNA3, IGF1, CTSD, PTK2, AXIN2, CTGF, CHD2, ITGA6, RUNX1, ID2, HMGB1, VDR, EPHB4, PKM2, SOD2, IGFBP2, GRB7, ICAM1, BMP2, BMPR1A, CYP1B1, IGF2, NOTCH2, NOTCH2NL, BUB3, MMP1, DSP, JUN, TCF7L2, FLT1, CEACAM6, GBP1, XPC, RALBP1, STAT1, FOXO3, RTEL1, TNFRSF6B, CTSD, EGFR, HIF1A, KLF6, or MUC2.
41. The method of claim 40, wherein the level of less than 1000 transcripts are detected in the gene expression profile.
42. The method of claim 40 or 41, wherein the level of less than 800 transcripts are detected in the gene expression profile.
43. The method of any one of claims 40-42, wherein the level of less than 700 transcripts are detected in the gene expression profile.
44. The method of any one of claims 40-43, wherein the specimen is classified based on the expression level of at least 100 transcripts from Table 6.
45. The method of any one of claims 40-44, wherein the specimen is classified based on the expression level of at least 200 transcripts from Table 6.
46. The method of any one of claims 40-45, wherein the specimen is classified based on the expression level of at least 300 transcripts from Table 6.
47. The method of any one of claims 40-46, wherein the specimen is classified based on the expression level of at least 400 transcripts from Table 6.
48. The method of any one of claims 40-47, wherein the specimen is classified based on the expression level of at least 500 transcripts from Table 6.
49. The method of any one of claims 40-48, wherein the specimen is classified based on the expression level of at least 550 transcripts from Table 6.
50. The method of any one of claims 40-49, wherein the expression level of the transcripts are used to classify the specimen.
51. The method of any one of claims 40-50, wherein the colon cancer specimen is a formalin fixed paraffin-embedded tissue sample.
52. The method of any one of claims 40-51, further comprising, choosing a plan designating adjuvant chemotherapy where the patient is determined to be in the high risk group.
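The scoring steps recited in the claims above (normalizing expression against control genes, deriving a decision score, and comparing it to a threshold to assign a low risk or high risk group) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the linear weighted-sum form of the score, the weights, the threshold, the expression values, and the control gene names `CTRL1` and `CTRL2` are all hypothetical and are not taken from the specification. Only three of the transcripts named in the claims are used for brevity, whereas the claims require at least 100 transcripts from Table 6.

```python
from statistics import mean


def normalize(expression, control_genes):
    """Subtract the mean control-gene level from every non-control transcript.

    `expression` maps transcript name -> log2 expression level.
    """
    baseline = mean(expression[g] for g in control_genes)
    return {g: v - baseline for g, v in expression.items() if g not in control_genes}


def decision_score(normalized, weights, bias=0.0):
    """Hypothetical linear decision score: bias plus a weighted sum of levels."""
    return bias + sum(w * normalized[g] for g, w in weights.items())


def classify(score, threshold):
    """Assign a risk group; the direction of the comparison is an assumption."""
    return "high risk" if score >= threshold else "low risk"


# Invented log2 expression values for one specimen (illustration only).
sample = {"MUM1": 8.0, "SIGMAR1": 6.0, "ARSD": 7.0, "CTRL1": 5.0, "CTRL2": 7.0}
# Invented weights; a real signature would be trained on specimens of known outcome.
weights = {"MUM1": 0.5, "SIGMAR1": -1.0, "ARSD": 0.25}

normalized = normalize(sample, control_genes={"CTRL1", "CTRL2"})
score = decision_score(normalized, weights)
group = classify(score, threshold=1.0)
print(score, group)
```

In this sketch the threshold would be derived from specimens with a known history of recurrence, as in claim 16; here it is simply set to 1.0 to show the comparison.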
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201161435922P | 2011-01-25 | 2011-01-25 | |
US61/435,922 | 2011-01-25 | ||
PCT/US2012/022594 WO2012103250A2 (en) | 2011-01-25 | 2012-01-25 | Colon cancer gene expression signatures and methods of use |
Publications (2)
Publication Number | Publication Date |
---|---|
NZ612471A (en) | 2015-11-27 |
NZ612471B2 (en) | 2016-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10196691B2 (en) | Colon cancer gene expression signatures and methods of use | |
JP4938672B2 (en) | Methods, systems, and arrays for classifying cancer, predicting prognosis, and diagnosing based on association between p53 status and gene expression profile | |
JP6404304B2 (en) | Prognosis prediction of melanoma cancer | |
JP5940517B2 (en) | Methods for predicting breast cancer recurrence under endocrine therapy | |
CN103459597A (en) | Marker for predicting stomach cancer prognosis and method for predicting stomach cancer prognosis | |
JP2009528825A (en) | Molecular analysis to predict recurrence of Dukes B colorectal cancer | |
WO2011086174A2 (en) | Diagnostic gene expression platform | |
KR20180009762A (en) | Methods and compositions for diagnosing or detecting lung cancer | |
EP2982986B1 (en) | Method for manufacturing gastric cancer prognosis prediction model | |
EP2419540B1 (en) | Methods and gene expression signature for assessing ras pathway activity | |
KR20100120657A (en) | Molecular staging of stage ii and iii colon cancer and prognosis | |
EP3472361A1 (en) | Compositions and methods for diagnosing lung cancers using gene expression profiles | |
WO2013109613A1 (en) | Gene signature is associated with early stage rectal cancer recurrence | |
US20210079479A1 (en) | Compostions and methods for diagnosing lung cancers using gene expression profiles | |
EP1683862B1 (en) | Microarray for assessing neuroblastoma prognosis and method of assessing neuroblastoma prognosis | |
US20200010909A1 (en) | Gene panel to predict response to androgen deprivation in prostate cancer | |
NZ612471B2 (en) | Colon cancer gene expression signatures and methods of use | |
CN110079601B (en) | Diagnosis and treatment marker for radioactivity related diseases and application thereof |