US20220251665A1 - Cancer detection and classification using methylome analysis - Google Patents
- Publication number
- US20220251665A1 (application US17/668,314)
- Authority
- US
- United States
- Prior art keywords
- dna
- cancer
- cell
- subject
- methylated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
- C12Q1/6804—Nucleic acid analysis using immunogens
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
- C12Q2522/00—Reaction characterised by the use of non-enzymatic proteins
- C12Q2522/10—Nucleic acid binding proteins
- C12Q2537/00—Reactions characterised by the reaction format or use of a specific feature
- C12Q2537/10—Reactions characterised by the reaction format or use of a specific feature the purpose or use of
- C12Q2537/164—Methylation detection other than bisulfite or methylation sensitive restriction endonucleases
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/154—Methylation markers
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
- G16B5/20—Probabilistic models
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
Definitions
- the invention relates to cancer detection and classification and more particularly to the use of methylome analysis for the same.
- circulating cell-free DNA (cfDNA)
- Use of DNA methylation mapping of cfDNA as a biomarker could have a significant impact in the field of liquid biopsy, as it could allow for the identification of the tissue-of-origin[2], allow for cancer type and subtype classification, and stratify cancer patients in a minimally invasive fashion[3].
- using genome-wide DNA methylation mapping of cfDNA could overcome a critical sensitivity problem in detecting circulating tumor DNA (ctDNA) in patients with early-stage cancer with no radiographic evidence of disease.
- ctDNA detection methods are based on sequencing mutations and have limited sensitivity in part due to the limited number of recurrent mutations available to distinguish between tumor and normal circulating cfDNA[4, 5].
- genome-wide DNA methylation mapping leverages large numbers of epigenetic alterations that may be used to distinguish circulating tumor DNA (ctDNA) from normal circulating cell-free DNA (cfDNA).
- some tumor types such as ependymomas, can have extensive DNA methylation aberrations without any significant recurrent somatic mutations[6].
- a method of detecting the presence of DNA from cancer cells in a subject comprising: providing a sample of cell-free DNA from a subject; subjecting the sample to library preparation to permit subsequent sequencing of the cell-free methylated DNA; adding a first amount of filler DNA to the sample, wherein at least a portion of the filler DNA is methylated, then optionally denaturing the sample; capturing cell-free methylated DNA using a binder selective for methylated polynucleotides; sequencing the captured cell-free methylated DNA; comparing the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals.
- a method of detecting the presence of DNA from cancer cells and identifying a cancer subtype comprising: receiving sequencing data of cell-free methylated DNA from a subject sample; comparing the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals; and if DNA from cancer cells is identified, further identifying the cancer cell tissue of origin and cancer subtype based on the comparison.
- a computer-implemented method of detecting the presence of DNA from cancer cells and identifying a cancer subtype comprising: receiving, at at least one processor, sequencing data of cell-free methylated DNA from a subject sample; comparing, at the at least one processor, the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying, at the at least one processor, the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNA sequences from cancerous individuals; and if DNA from cancer cells is identified, further identifying the cancer cell tissue of origin and cancer subtype based on the comparison.
- a computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method described herein.
- a computer readable medium having stored thereon a data structure for storing the computer program product described herein.
- a device for detecting the presence of DNA from cancer cells and identifying a cancer subtype comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: receive sequencing data of cell-free methylated DNA from a subject sample; compare the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identify the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNA sequences from cancerous individuals; and if DNA from cancer cells is identified, further identify the cancer cell tissue of origin and cancer subtype based on the comparison.
- a method of detecting the presence of DNA from cancer cells and determining the location of the cancer from which the cancer cells arose from two or more possible organs comprising: providing a sample of cell-free DNA from a subject; capturing cell-free methylated DNA from said sample, using a binder selective for methylated polynucleotides; sequencing the captured cell-free methylated DNA; comparing the sequence patterns of the captured cell-free methylated DNA to DNAs sequence patterns of two or more population(s) of control individuals, each of said two or more populations having localized cancer in a different organ; determining as to which organ the cancer cells arose on the basis of a statistically significant similarity between the pattern of methylation of the cell-free DNA and one of said two or more populations.
- FIG. 1 shows methylome analysis of cfDNA is a highly sensitive approach to enrich and detect ctDNA in low amounts of input DNA.
- FIG. 1A shows a computer simulation of the probability to detect at least one epimutation as a function of the concentration of ctDNA (columns), number of DMRs being investigated (rows), and the sequencing depth (x-axis).
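- As a rough illustration of the kind of simulation described for FIG. 1A, the sketch below assumes a simple model that is not taken from the source: every read covering a DMR is treated as an independent draw that is tumor-derived with probability equal to the ctDNA fraction. Under that assumption the probability of seeing at least one epimutation is 1 - (1 - f)^(depth x number of DMRs). The function names and the Monte Carlo check are hypothetical.

```python
import random


def detection_probability(ctdna_fraction, n_dmrs, depth):
    """Analytic P(at least one tumor-derived read).

    Assumption (not from the source): reads are independent and each one
    is tumor-derived with probability `ctdna_fraction`.
    """
    n_reads = n_dmrs * depth
    return 1.0 - (1.0 - ctdna_fraction) ** n_reads


def simulate_detection(ctdna_fraction, n_dmrs, depth, trials=2000, seed=0):
    """Monte Carlo estimate of the same probability under the same assumption."""
    rng = random.Random(seed)
    n_reads = n_dmrs * depth
    hits = 0
    for _ in range(trials):
        # A trial counts as a detection if any single read is tumor-derived.
        if any(rng.random() < ctdna_fraction for _ in range(n_reads)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n_dmrs in (1, 10, 100):
        p = detection_probability(ctdna_fraction=0.001, n_dmrs=n_dmrs, depth=10)
        print(f"DMRs={n_dmrs:>3}  depth=10  P(detect)={p:.4f}")
```

- Under this toy model, raising the number of DMRs has the same effect as raising the depth, which is one way to read the claim that many epigenetic alterations can compensate for a low ctDNA fraction.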
- FIG. 1B shows genome-wide Pearson correlation between DNA methylation signal for 1 to 100 ng of input DNA from HCT116 cell line fragmented to mimic plasma cfDNA. Each concentration has two biological replicates.
- FIG. 1C shows a DNA methylation profile obtained from cfMeDIP-seq from different concentrations of input DNA from HCT116 (Green Tracks) plus RRBS (Reduced Representation Bisulfite Sequencing) HCT116 data obtained from ENCODE (ENCSR000DFS) and WGBS (Whole-Genome Bisulfite Sequencing) HCT116 data obtained from GEO (GSM1465024).
- In the RRBS track, yellow means methylated, blue means unmethylated, and gray means no coverage.
- FIG. 1D and FIG. 1E show results of serial dilution of the CRC cell line HCT116 into the Multiple Myeloma (MM) cell line MM1.S.
- FIG. 1D the observed versus expected
- FIG. 1E the DNA methylation signal
- FIG. 1F illustrates that in the same dilution series, known somatic mutations are only detectable at 1/100 allele fraction by ultra-deep (>10,000×) targeted sequencing, above the background sequencer and polymerase error rate. Shown are the fractions of reads containing each base or an insertion/deletion at the site of each mutation in the CRC cell line.
- FIG. 1G depicts a bar graph showing frequency of ctDNA (human) as a percentage of total cfDNA (human+mice) in the plasma of mice harboring patient-derived xenograft (PDX) from two colorectal cancer patients.
- FIG. 2 shows the methylome analysis of plasma cfDNA allows tumor classification.
- FIG. 2A illustrates a schematic demonstrating the approach of machine learning classifier construction for cancer classification.
- Hierarchical clustering method: Ward.
- FIG. 2C shows 2D visualizations by tSNE (t-Distributed Stochastic Neighbor Embedding) of the cancer-type associated DMRs identified in 10% or 25% of models.
- FIG. 2D depicts a plot showing metrics for the plasma cfDNA methylation-based multi-cancer classifier. Area under the receiver operating characteristic curve (auROC) is shown on the y-axis for each cancer type and healthy donors following 50-fold generation of elastic net machine learning classifiers.
- FIG. 3 shows validation of the multi-cancer classifier on independent cohorts.
- LUC lung cancer
- FIG. 4 shows the methylome analysis of plasma cfDNA allows tumor subtype classification.
- FIG. 4A shows 2D visualizations by tSNE (t-Distributed Stochastic Neighbor Embedding) of cancer subtype associated DMRs.
- Breast cancer subtypes show ability to distinguish between patients harboring tumors with distinct gene expression pattern and transcription factor activity (ER status) as well as distinct tumor copy number aberrations (HER2 status).
- AML subtypes show ability to distinguish between patients harboring tumors with distinct rearrangements (FLT3 status).
- Glioblastoma multiforme (GBM) subtypes show ability to distinguish between patients harboring tumors with distinct point mutations (IDH gene mutational status).
- FIG. 4B depicts a heatmap showing the top DMRs that allow accurate discrimination of the three breast cancer subtypes in breast cancer plasma samples.
- FIG. 4C depicts a heatmap showing the top DMRs that allow accurate discrimination of the FLT3-ITD status in AML patient plasma samples.
- FIG. 4D depicts a heatmap showing the top DMRs that allow accurate discrimination of the IDH gene mutational status in glioblastoma multiforme (GBM) patient plasma samples.
- FIG. 4E depicts a heatmap showing the top DMRs that allow accurate discrimination of the three lung cancer histologies in lung cancer plasma samples.
- FIG. 5 shows a suitably configured computer device, and associated communications networks, devices, software and firmware to provide a platform for enabling one or more embodiments as described herein.
- FIG. 6 shows sequencing saturation analysis and quality controls.
- FIG. 6A , FIG. 6B , FIG. 6C , FIG. 6D , and FIG. 6E show the results of the saturation analysis from the Bioconductor package MEDIPS analyzing cfMeDIP-seq data from each replicate for each input concentration from the HCT116 DNA fragmented to mimic plasma cfDNA.
- FIG. 6F is a graph showing the results of the protocol tested in two replicates of four starting DNA concentrations (100, 10, 5, and 1 ng) of HCT116 cell line. Specificity of the reaction was calculated using methylated and unmethylated spiked-in A. thaliana DNA.
- FIG. 6G depicts a bar graph of CpG Enrichment Scores of the sequenced samples, which show a robust enrichment of CpGs within the genomic regions from the immunoprecipitated samples compared to the input control.
- the CpG Enrichment Score was obtained by dividing the relative frequency of CpGs of the regions by the relative frequency of CpGs of the human genome. Error bars represent ±1 s.e.m.
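- The ratio described for the CpG Enrichment Score can be sketched as follows. This is a minimal illustration and not the MEDIPS implementation: it assumes that "relative frequency of CpGs" means the number of CG dinucleotides per base, and the function names are hypothetical.

```python
def cpg_relative_frequency(sequences):
    """CpG dinucleotides per base across a collection of sequences.

    Assumption: relative frequency is taken as (number of 'CG') / (number of bases).
    """
    total_bases = sum(len(seq) for seq in sequences)
    if total_bases == 0:
        raise ValueError("no sequence data supplied")
    total_cpgs = sum(seq.upper().count("CG") for seq in sequences)
    return total_cpgs / total_bases


def cpg_enrichment_score(region_sequences, genome_sequences):
    """Relative CpG frequency of the covered regions divided by that of the genome."""
    genome_freq = cpg_relative_frequency(genome_sequences)
    if genome_freq == 0:
        raise ValueError("reference contains no CpG dinucleotides")
    return cpg_relative_frequency(region_sequences) / genome_freq


if __name__ == "__main__":
    regions = ["CGCGCGAT"]          # toy CpG-rich captured region
    genome = ["ATATCGATATATATAT"]   # toy CpG-poor reference
    print(cpg_enrichment_score(regions, genome))
```

- A score above 1 indicates that the captured regions are richer in CpGs than the reference, which is the behaviour expected of immunoprecipitated samples relative to the input control.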
- DNA methylation profiles are cell-type specific and are disrupted in cancer.
- DMRs Differentially Methylated Regions
- Methylome analysis of cfDNA is highly sensitive and suitable for detecting circulating tumor DNA (ctDNA) in early stage patients.
- a machine-learning derived classifier using cfDNA methylomes was able to correctly classify 196 plasma samples from patients with 5 cancer types and healthy donors based on cross-validation.
- the classifier was able to correctly classify AML, lung cancer, and healthy donors, as well as both early and late stage lung cancer. Therefore, methylome analysis of cfDNA can be used for non-invasive early stage detection of ctDNA and robustly classify cancer types.
- a method of detecting the presence of DNA from cancer cells in a subject comprising: providing a sample of cell-free DNA from a subject; subjecting the sample to library preparation to permit subsequent sequencing of the cell-free methylated DNA; adding a first amount of filler DNA to the sample, wherein at least a portion of the filler DNA is methylated, then optionally denaturing the sample; capturing cell-free methylated DNA using a binder selective for methylated polynucleotides; sequencing the captured cell-free methylated DNA; comparing the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals.
- Cancer has been traditionally classified by tissue of origin—for instance, colorectal cancer, breast cancer, lung cancer, etc.
- therapeutic decisions often hinge on the precise subtype of cancer, and it may be necessary for clinicians to identify the subtype prior to initiation of therapy.
- cancer subtyping that may influence therapeutic decisions include (but are not limited to) stage (e.g., early stage lung cancer treated with surgery vs late stage lung cancer treated with chemotherapy), histology (e.g., small cell carcinoma vs adenocarcinoma vs squamous cell carcinoma in lung cancer), gene expression pattern or transcription factor activity (e.g., ER status in breast cancer), copy number aberrations (e.g., HER2 status in breast cancer), specific rearrangements (e.g., FLT3 in AML), specific gene point mutational status (e.g., IDH gene point mutations), and DNA methylation patterns (e.g., MGMT gene promoter methylation in brain cancer).
- the methods described herein are applicable to a wide variety of cancers, including but not limited to adrenal cancer, anal cancer, bile duct cancer, bladder cancer, bone cancer, brain/cns tumors, breast cancer, castleman disease, cervical cancer, colon/rectum cancer, endometrial cancer, esophagus cancer, ewing family of tumors, eye cancer, gallbladder cancer, gastrointestinal carcinoid tumors, gastrointestinal stromal tumor (gist), gestational trophoblastic disease, hodgkin disease, kaposi sarcoma, kidney cancer, laryngeal and hypopharyngeal cancer, leukemia (acute lymphocytic, acute myeloid, chronic lymphocytic, chronic myeloid, chronic myelomonocytic), liver cancer, lung cancer (non-small cell, small cell, lung carcinoid tumor), lymphoma, lymphoma of the skin, malignant mesothelioma, multiple myeloma, myel
- next-generation sequencing (NGS) technologies include Illumina (Solexa) sequencing, Roche 454 sequencing, Ion Torrent: Proton/PGM sequencing, and SOLiD sequencing.
- NGS allows for the sequencing of DNA and RNA much more quickly and cheaply than the previously used Sanger sequencing.
- said sequencing is optimized for short read sequencing.
- subject refers to any member of the animal kingdom, preferably a human being and most preferably a human being that has, has had, or is suspected of having prostate cancer.
- Cell-free methylated DNA is DNA that is circulating freely in the blood stream and is methylated at various known regions of the DNA. Samples, for example plasma samples, can be taken to analyze cell-free methylated DNA. Accordingly, in some embodiments, the sample is the subject's blood or plasma.
- library preparation includes end-repair, A-tailing, adapter ligation, or any other preparation performed on the cell-free DNA to permit subsequent sequencing of DNA.
- filler DNA can be noncoding DNA or it can consist of amplicons.
- DNA samples may be denatured, for example, using sufficient heat.
- the comparison step is based on fit using a statistical classifier.
- Statistical classifiers using DNA methylation data can be used for assigning a sample to a particular disease state, such as cancer type or subtype.
- a classifier would consist of one or more DNA methylation variables (i.e., features) within a statistical model, and the output of the statistical model would have one or more threshold values to distinguish between distinct disease states.
- the particular feature(s) and threshold value(s) that are used in the statistical classifier can be derived from prior knowledge of the cancer types or subtypes, from prior knowledge of the features that are likely to be most informative, from machine learning, or from a combination of two or more of these approaches.
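- The idea of a statistical model built from DNA methylation features, whose output is compared against a threshold, can be sketched as below. The logistic form, the feature names, the weights and the 0.5 threshold are all illustrative assumptions and are not values from the source.

```python
import math


def classifier_score(features, weights, intercept):
    """Logistic score from methylation features (one possible statistical model)."""
    linear = intercept + sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-linear))


def assign_state(score, threshold=0.5):
    """Map the model output to a disease state using a single threshold."""
    return "cancer" if score >= threshold else "healthy"


if __name__ == "__main__":
    # Hypothetical weights for two DMR features; real values would be derived
    # from prior knowledge, machine learning, or both.
    weights = {"DMR_1": 2.0, "DMR_2": -1.5}
    sample = {"DMR_1": 1.0, "DMR_2": 0.0}
    score = classifier_score(sample, weights, intercept=-0.5)
    print(assign_state(score))
```

- A classifier distinguishing more than two states would use several outputs or several thresholds, but the structure (features, model, threshold) stays the same.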
- the classifier is machine learning-derived.
- the classifier is an elastic net classifier, lasso, support vector machine, random forest, or neural network.
- the genomic space that is analyzed can be genome-wide, or preferably restricted to regulatory regions (i.e., FANTOM5 enhancers, CpG Islands, CpG shores and CpG Shelves).
- the percentage of spike-in methylated DNA recovered is included as a covariate to control for pulldown efficiency variation.
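- One way to carry the spike-in recovery into a model as a covariate is sketched below. It assumes the recovered and input amounts of methylated spike-in DNA are measured in the same units; the function and key names are hypothetical.

```python
def spike_in_recovery_percent(recovered_amount, input_amount):
    """Percentage of methylated spike-in DNA recovered after capture."""
    if input_amount <= 0:
        raise ValueError("input_amount must be positive")
    return 100.0 * recovered_amount / input_amount


def add_recovery_covariate(features, recovered_amount, input_amount):
    """Return a copy of a sample's feature row with the recovery percentage appended."""
    row = dict(features)
    row["spike_in_recovery_pct"] = spike_in_recovery_percent(
        recovered_amount, input_amount
    )
    return row


if __name__ == "__main__":
    row = add_recovery_covariate({"DMR_1": 0.8}, recovered_amount=5.0, input_amount=20.0)
    print(row)
```

- Fitting the model with this extra column lets differences in pulldown efficiency between samples be absorbed by the covariate instead of being mistaken for methylation signal.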
- For a classifier capable of distinguishing multiple cancer types (or subtypes) from one another, the classifier would preferably consist of differentially methylated regions from pairwise comparisons of each type (or subtype) of interest.
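- Collecting features from pairwise comparisons can be sketched as follows. The DMR caller here is a deliberately crude placeholder (a difference in mean signal above a cutoff) standing in for a proper statistical test; the function names, the 0.2 cutoff and the toy data are assumptions.

```python
from itertools import combinations
from statistics import mean


def call_dmrs(samples_a, samples_b, min_delta=0.2):
    """Toy DMR caller: regions whose mean signal differs by at least min_delta.

    Placeholder only; a real analysis would use a statistical test.
    """
    regions = samples_a[0].keys()
    return {
        region
        for region in regions
        if abs(mean(s[region] for s in samples_a) - mean(s[region] for s in samples_b))
        >= min_delta
    }


def pairwise_dmr_features(samples_by_class, min_delta=0.2):
    """Union of DMRs over every pair of classes (types or subtypes)."""
    features = set()
    for class_a, class_b in combinations(sorted(samples_by_class), 2):
        features |= call_dmrs(
            samples_by_class[class_a], samples_by_class[class_b], min_delta
        )
    return sorted(features)


if __name__ == "__main__":
    data = {
        "AML": [{"r1": 0.9, "r2": 0.1, "r3": 0.5}],
        "LUC": [{"r1": 0.1, "r2": 0.9, "r3": 0.5}],
        "healthy": [{"r1": 0.1, "r2": 0.1, "r3": 0.5}],
    }
    print(pairwise_dmr_features(data))
```

- A region that never differs between any pair of classes (r3 in the toy data) is left out of the feature set.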
- control cell-free methylated DNAs sequences from healthy and cancerous individuals are comprised in a database of Differentially Methylated Regions (DMRs) between healthy and cancerous individuals.
- control cell-free methylated DNA sequences from healthy and cancerous individuals are limited to those control cell-free methylated DNA sequences which are differentially methylated as between healthy and cancerous individuals in DNA derived from cell-free DNA from bodily fluids, such as from blood serum, cerebral spinal fluid, urine, stool, sputum, pleural fluid, ascites, tears, sweat, pap smear fluid, endoscopy brushings fluid, etc., preferably from blood plasma.
- the sample has less than 100 ng, 75 ng, or 50 ng of cell-free DNA.
- the first amount of filler DNA comprises about 5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% methylated filler DNA with remainder being unmethylated filler DNA, and preferably between 5% and 50%, between 10%-40%, or between 15%-30% methylated filler DNA.
- the first amount of filler DNA is from 20 ng to 100 ng, preferably 30 ng to 100 ng, more preferably 50 ng to 100 ng.
- the cell-free DNA from the sample and the first amount of filler DNA together comprises at least 50 ng of total DNA, preferably at least 100 ng of total DNA.
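- The filler DNA arithmetic in the preceding items can be sketched as a small helper. The default target of 100 ng total DNA and the default 20% methylated fraction are picked from within the stated ranges purely for illustration; the function name is hypothetical.

```python
def filler_dna_amounts(cfdna_ng, target_total_ng=100.0, methylated_fraction=0.2):
    """Amount of filler DNA needed to top a sample up to a target total.

    Returns the total filler amount and its methylated/unmethylated split (ng).
    """
    if cfdna_ng < 0:
        raise ValueError("cfdna_ng must be non-negative")
    if not 0.0 <= methylated_fraction <= 1.0:
        raise ValueError("methylated_fraction must be between 0 and 1")
    filler_ng = max(target_total_ng - cfdna_ng, 0.0)
    methylated_ng = filler_ng * methylated_fraction
    return {
        "filler_ng": filler_ng,
        "methylated_ng": methylated_ng,
        "unmethylated_ng": filler_ng - methylated_ng,
    }


if __name__ == "__main__":
    # A 10 ng cfDNA sample topped up to 100 ng of total DNA.
    print(filler_dna_amounts(10.0))
```

- With 10 ng of cfDNA, this helper suggests 90 ng of filler, which falls inside the 20 ng to 100 ng range given for the first amount of filler DNA.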
- the filler DNA is 50 bp to 800 bp long, preferably 100 bp to 600 bp long, and more preferably 200 bp to 600 bp long.
- the filler DNA is double stranded.
- the filler DNA can be junk DNA.
- the filler DNA may also be endogenous or exogenous DNA.
- the filler DNA is non-human DNA, and in preferred embodiments, λ DNA.
- λ DNA refers to Enterobacteria phage λ DNA.
- the filler DNA has no alignment to human DNA.
- the binder is a protein comprising a Methyl-CpG-binding domain.
- MBD2 protein is one example of a protein comprising a Methyl-CpG-binding domain.
- Human proteins MECP2, MBD1, MBD2, MBD3, and MBD4 comprise a family of nuclear proteins related by the presence in each of a methyl-CpG-binding domain (MBD). Each of these proteins, with the exception of MBD3, is capable of binding specifically to methylated DNA.
- the binder is an antibody and capturing cell-free methylated DNA comprises immunoprecipitating the cell-free methylated DNA using the antibody.
- immunoprecipitation refers to a technique of precipitating an antigen (such as polypeptides and nucleotides) out of solution using an antibody that specifically binds to that particular antigen. This process can be used to isolate and concentrate a particular protein or DNA from a sample and requires that the antibody be coupled to a solid substrate at some point in the procedure.
- the solid substrate includes, for example, beads, such as magnetic beads. Other types of beads and solid substrates are known in the art.
- One exemplary antibody is 5-MeC antibody.
- the method described herein further comprises the step of adding a second amount of control DNA to the sample.
- the method further comprises the step of adding a second amount of control DNA to the sample for confirming the immunoprecipitation reaction.
- control may comprise both positive and negative control, or at least a positive control.
- the method further comprises the step of adding a second amount of control DNA to the sample for confirming the capture of cell-free methylated DNA.
- identifying the presence of DNA from cancer cells further includes identifying the cancer cell tissue of origin.
- tumor tissue sampling may be challenging or carry significant risks, in which case diagnosing and/or subtyping the cancer without the need for tumor tissue sampling may be desired.
- lung tumor tissue sampling may require invasive procedures such as mediastinoscopy, thoracotomy, or percutaneous needle biopsy; these procedures may result in a need for hospitalization, chest tube, mechanical ventilation, antibiotics, or other medical interventions.
- Some individuals may not undergo the invasive procedures needed for tumor tissue sampling either because of medical comorbidities or due to preference.
- the actual procedure for tumor tissue procurement may depend on the suspected cancer subtype.
- cancer subtype may evolve over time within the same individual; serial assessment with invasive tumor tissue sampling procedures is often impractical and not well tolerated by patients.
- non-invasive cancer subtyping via blood test could have many advantageous applications in the practice of clinical oncology.
- identifying the cancer cell tissue of origin further includes identifying a cancer subtype.
- the cancer subtype differentiates the cancer based on stage (e.g., early stage lung cancer treated with surgery vs late stage lung cancer treated with chemotherapy), histology (e.g., small cell carcinoma vs adenocarcinoma vs squamous cell carcinoma in lung cancer), gene expression pattern or transcription factor activity (e.g., ER status in breast cancer), copy number aberrations (e.g., HER2 status in breast cancer), specific rearrangements (e.g., FLT3 in AML), specific gene point mutational status (e.g., IDH gene point mutations), and DNA methylation patterns (e.g., MGMT gene promoter methylation in brain cancer).
- comparison in step (f) is carried out genome-wide.
- the comparison in step (f) is restricted from genome-wide to specific regulatory regions, such as, but not limited to, FANTOM5 enhancers, CpG Islands, CpG shores, CpG Shelves, or any combination of the foregoing.
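- Restricting the comparison from genome-wide to regulatory regions amounts to keeping only the genomic windows that overlap an annotated region. The sketch below assumes half-open (chromosome, start, end) intervals and a simple scan without an index; the names and coordinates are hypothetical and no real annotation file is read.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def restrict_to_regions(windows, regulatory_regions):
    """Keep only the windows that overlap at least one regulatory region."""
    return [
        window
        for window in windows
        if any(overlaps(window, region) for region in regulatory_regions)
    ]


if __name__ == "__main__":
    windows = [("chr1", 0, 300), ("chr1", 300, 600), ("chr2", 0, 300)]
    # Stand-in for, e.g., a CpG island or enhancer annotation.
    regions = [("chr1", 250, 350)]
    print(restrict_to_regions(windows, regions))
```

- For genome-scale annotations a sorted sweep or an interval tree would be a more efficient choice than this quadratic scan.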
- certain steps are carried out by a computer processor.
- a method of detecting the presence of DNA from cancer cells and identifying a cancer subtype comprising: receiving sequencing data of cell-free methylated DNA from a subject sample; comparing the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals; and if DNA from cancer cells is identified, further identifying the cancer cell tissue of origin and cancer subtype based on the comparison step.
- a method of detecting the presence of DNA from cancer cells and determining the location of the cancer from which the cancer cells arose from two or more possible organs comprising: providing a sample of cell-free DNA from a subject; capturing cell-free methylated DNA from said sample, using a binder selective for methylated polynucleotides; sequencing the captured cell-free methylated DNA; comparing the sequence patterns of the captured cell-free methylated DNA to DNAs sequence patterns of two or more population(s) of control individuals, each of said two or more populations having localized cancer in a different organ; determining as to which organ the cancer cells arose on the basis of a statistically significant similarity between the pattern of methylation of the cell-free DNA and one of said two or more populations.
- FIG. 5 shows a generic computer device 100 that may include a central processing unit (“CPU”) 102 connected to a storage unit 104 and to a random access memory 106 .
- the CPU 102 may process an operating system 101 , application program 103 , and data 123 .
- the operating system 101 , application program 103 , and data 123 may be stored in storage unit 104 and loaded into memory 106 , as may be required.
- Computer device 100 may further include a graphics processing unit (GPU) 122 which is operatively connected to CPU 102 and to memory 106 to offload intensive image processing calculations from CPU 102 and run these calculations in parallel with CPU 102 .
- An operator 107 may interact with the computer device 100 using a video display 108 connected by a video interface 105 , and various input/output devices such as a keyboard 115 , mouse 112 , and disk drive or solid state drive 114 connected by an I/O interface 109 .
- the mouse 112 may be configured to control movement of a cursor in the video display 108 , and to operate various graphical user interface (GUI) controls appearing in the video display 108 with a mouse button.
- the disk drive or solid state drive 114 may be configured to accept computer readable media 116 .
- the computer device 100 may form part of a network via a network interface 111 , allowing the computer device 100 to communicate with other suitably configured data processing systems (not shown).
- One or more different types of sensors 135 may be used to receive input from various sources.
- the present system and method may be practiced on virtually any manner of computer device including a desktop computer, laptop computer, tablet computer or wireless handheld.
- the present system and method may also be implemented as a computer-readable/useable medium that includes computer program code to enable one or more computer devices to implement each of the various process steps in a method in accordance with the present invention.
- the computer devices are networked to distribute the various steps of the operation.
- the terms computer-readable medium or computer useable medium comprise one or more of any type of physical embodiment of the program code.
- the computer-readable/useable medium can comprise program code embodied on one or more portable storage articles of manufacture (e.g. an optical disc, a magnetic disk, a tape, etc.), on one or more data storage portions of a computing device, such as memory associated with a computer and/or a storage system.
- a computer-implemented method of detecting the presence of DNA from cancer cells and identifying a cancer subtype comprising: receiving, at at least one processor, sequencing data of cell-free methylated DNA from a subject sample; comparing, at the at least one processor, the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying, at the at least one processor, the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals; and if DNA from cancer cells is identified, further identifying the cancer cell tissue of origin and cancer subtype based on the comparison step.
- a computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method described herein.
- a computer readable medium having stored thereon a data structure for storing the computer program product described herein.
- a device for detecting the presence of DNA from cancer cells and identifying a cancer subtype comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: receive sequencing data of cell-free methylated DNA from a subject sample; compare the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identify the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals; and if DNA from cancer cells is identified, further identify the cancer cell tissue of origin and cancer subtype based on the comparison step.
- processor may be any type of processor, such as, for example, any type of general-purpose microprocessor or microcontroller (e.g., an Intel™ x86, PowerPC™, ARM™ processor, or the like), a digital signal processing (DSP) processor, an integrated circuit, a field programmable gate array (FPGA), or any combination thereof.
- memory may include a suitable combination of any type of computer memory that is located either internally or externally such as, for example, random-access memory (RAM), read-only memory (ROM), compact disc read-only memory (CDROM), electro-optical memory, magneto-optical memory, erasable programmable read-only memory (EPROM), and electrically-erasable programmable read-only memory (EEPROM), or the like.
- RAM random-access memory
- ROM read-only memory
- CDROM compact disc read-only memory
- electro-optical memory magneto-optical memory
- EPROM erasable programmable read-only memory
- EEPROM electrically-erasable programmable read-only memory
- “computer readable storage medium” (also referred to as a machine-readable medium, a processor-readable medium, or a computer usable medium having a computer-readable program code embodied therein) is a medium capable of storing data in a format readable by a computer or machine.
- the machine-readable medium can be any suitable tangible, non-transitory medium, including magnetic, optical, or electrical storage medium including a diskette, compact disk read only memory (CD-ROM), memory device (volatile or non-volatile), or similar storage mechanism.
- the computer readable storage medium can contain various sets of instructions, code sequences, configuration information, or other data, which, when executed, cause a processor to perform steps in a method according to an embodiment of the disclosure.
- a data structure is a particular way of organizing data in a computer so that it can be used efficiently.
- Data structures can implement one or more particular abstract data types (ADT), which specify the operations that can be performed on a data structure and the computational complexity of those operations.
- ADT abstract data types
- a data structure is a concrete implementation of the specification provided by an ADT.
- CRC, breast cancer, and GBM samples were obtained from the University Health Network BioBank; AML samples were obtained from the University Health Network Leukemia BioBank; healthy controls were recruited through the Family Medicine Centre at Mount Sinai Hospital (MSH) in Toronto, Canada. All samples were collected with patient consent and obtained with institutional approval from the Research Ethics Boards of University Health Network and Mount Sinai Hospital in Toronto, Canada.
- EDTA and ACD plasma samples were obtained from the BioBanks and from the Family Medicine Centre at Mount Sinai Hospital (MSH) in Toronto, Canada. All samples were stored either at −80° C. or in vapour-phase liquid nitrogen until use.
- Cell-free DNA was extracted from 0.5-3.5 ml of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen). The extracted DNA was quantified through Qubit prior to use.
- Human colorectal tumor tissue, obtained with patient consent from the University Health Network Biobank as approved by the Research Ethics Board at University Health Network, was digested to single cells using collagenase A. Single cells were subcutaneously injected into 4-6 week old male NOD/SCID mice. Mice were euthanized by CO2 inhalation prior to blood collection by cardiac puncture, and the blood was stored in EDTA tubes. From the collected blood samples, the plasma was isolated and stored at −80° C. Cell-free DNA was extracted from 0.3-0.7 ml of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen). All animal work was carried out in compliance with the ethical regulations approved by the Animal Care Committee at University Health Network.
- cfMeDIP-seq protocol: a schematic representation of the cfMeDIP-seq protocol is shown in WO2017/190215.
- Prior to cfMeDIP, the DNA samples were subjected to library preparation using the Kapa Hyper Prep Kit (Kapa Biosystems). The manufacturer protocol was followed with some modifications. Briefly, the DNA of interest was added to a 0.2 mL PCR tube and subjected to end-repair and A-tailing. Adapter ligation followed, using the NEBNext adapter (from the NEBNext Multiplex Oligos for Illumina kit, New England Biolabs) at a final concentration of 0.181 µM, with incubation at 20° C. for 20 min and purification with AMPure XP beads. The eluted library was digested using the USER enzyme (New England Biolabs Canada) followed by purification with the Qiagen MinElute PCR Purification Kit prior to MeDIP.
- the prepared libraries were combined with the pooled methylated/unmethylated PCR product to a final DNA amount of 100 ng and subjected to MeDIP using the protocol from Taiwo et al. 2012[7] with some modifications. Briefly, for MeDIP, the Diagenode MagMeDIP kit (Cat #C02010021) was used following the manufacturer's protocol with some modifications. After the addition of 0.3 ng of the control methylated and 0.3 ng of the control unmethylated A. thaliana DNA, the filler DNA (to complete the total amount of DNA [cfDNA+Filler+Controls] to 100 ng) and the buffers to the PCR tubes containing the adapter ligated DNA, the samples were heated to 95° C.
- each sample was partitioned into two 0.2 mL PCR tubes: one for the 10% input control and the other one for the sample to be subjected to immunoprecipitation.
- the included 5-mC monoclonal antibody 33D3 (Cat #C15200081) from the MagMeDIP kit was diluted 1:15 prior to generating the diluted antibody mix and added to the sample. Washed magnetic beads (following manufacturer instructions) were also added prior to incubation at 4° C. for 17 hours.
- the samples were purified using the Diagenode iPure Kit and eluted in 50 µl of Buffer C.
- the success of the reaction was validated through qPCR to detect the presence of the spiked-in A. thaliana DNA, ensuring a % recovery of unmethylated spiked-in DNA <1% and a % specificity of the reaction >99% (as calculated by 1 − [recovery of spiked-in unmethylated control DNA over recovery of spiked-in methylated control DNA]), prior to proceeding to the next step.
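The spike-in quality check above reduces to a small calculation. The following is a minimal sketch, assuming the two recoveries are already available as percentages from qPCR; the function name `medip_qc` and its arguments are illustrative and do not come from the source.

```python
def medip_qc(unmeth_recovery_pct, meth_recovery_pct):
    """Return (specificity_pct, passed) for a spike-in qPCR check.

    Specificity follows the formula quoted in the text:
    1 - (recovery of unmethylated control / recovery of methylated control),
    expressed here as a percentage.
    """
    if meth_recovery_pct <= 0:
        raise ValueError("methylated spike-in recovery must be positive")
    specificity_pct = (1.0 - unmeth_recovery_pct / meth_recovery_pct) * 100.0
    # Thresholds taken from the text: unmethylated recovery <1%, specificity >99%.
    passed = unmeth_recovery_pct < 1.0 and specificity_pct > 99.0
    return specificity_pct, passed


if __name__ == "__main__":
    # Hypothetical recoveries, for illustration only.
    print(medip_qc(0.1, 50.0))
    print(medip_qc(2.0, 50.0))
```

A library would proceed to amplification only when `passed` is true; how the recoveries themselves are derived from qPCR Ct values is outside this sketch.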
- the optimal number of cycles to amplify each library was determined through the use of qPCR, after which the samples were amplified using the KAPA HiFi Hotstart Mastermix and the NEBNext multiplex oligos added to a final concentration of 0.3 µM.
- the PCR settings used to amplify the libraries were as follows: activation at 95° C.
- the amplified libraries were purified using MinElute PCR purification column and then gel size selected with 3% Nusieve GTG agarose gel to remove any adapter dimers.
- DNA sequencing libraries were constructed from 83 ng of fragmented DNA using the KAPA Hyper Prep Kit (Kapa Biosystems, Wilmington, Mass.) utilizing NEXTflex-96 DNA Barcode adapters (Bio Scientific, Austin, Tex.).
- the barcoded libraries were pooled and then applied to the custom hybrid capture library following manufacturer's instructions (IDT xGEN Lockdown protocol version 2.1). These fragments were sequenced to >10,000× read coverage using an Illumina HiSeq 2000 instrument. Resulting reads were aligned using bwa-mem and mutation
- cfDNA MeDIP profiles were quantified using the MEDIPS R package[8], converted to RPKMs, and afterwards transformed into log2 counts-per-million. Subsequently, a linear model was fit using limma-trend[9] on a matrix of features that mapped to FANTOM5 enhancers, CpG islands, CpG shores and CpG shelves, with the percentage of spike-in methylated DNA recovered included as a covariate to control for pulldown efficiency variation. Pairwise contrasts were evaluated for each pair of tissue types and the top 150 and the bottom 150 DMRs were selected for elastic net classifier training and validation of cancer-type specificity. Performance metrics were derived by majority class votes on out-of-fold calls from the model with the highest Kappa value in cross-validation, a heuristic previously employed in Chakravarthy et al.[10].
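The feature-selection step for one pairwise contrast can be sketched as follows. This is an illustration only: it assumes each region already carries a signed test statistic from the linear model (for example a moderated t-statistic, which the text does not specify), and the name `select_dmrs` is hypothetical.

```python
def select_dmrs(stats, k=150):
    """Pick the k highest- and k lowest-ranked regions for one contrast.

    stats: dict mapping a region identifier to a signed statistic, where
    positive values mean higher methylation signal in the first class.
    Returns (top, bottom) as two lists of region identifiers.
    """
    ranked = sorted(stats, key=stats.get, reverse=True)
    # Never let the two selections overlap when few regions are available.
    k = min(k, len(ranked) // 2)
    return ranked[:k], ranked[len(ranked) - k:]


if __name__ == "__main__":
    # Toy statistics, for illustration only.
    toy = {"r1": 3.0, "r2": -2.5, "r3": 0.1, "r4": -0.4}
    print(select_dmrs(toy, k=1))
```

The union of regions selected across all pairwise contrasts would then form the feature matrix passed to classifier training.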
- CV 10-Fold Cross-Validation
- We bioinformatically simulated mixtures with different proportions of ctDNA, from 0.001% to 10% (FIG. 1A, column facets). We also simulated scenarios where the ctDNA had 1, 10, 100, 1000, or 10000 DMRs (Differentially Methylated Regions) as compared to normal cfDNA (FIG. 1A, row facets). Reads were then sampled at varying sequencing depths at each locus (10×, 100×, 1000×, and 10000×) (FIG. 1A, x-axis). We found an increasing probability of detecting at least 1 cancer-specific event (FIG. 1A) as the number of DMRs increased, even at low abundance of cancer ctDNA and shallow coverage.
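The trend described above can be reproduced with a simple closed-form approximation. The sketch below is not the simulation used in the source, which sampled reads; it assumes that every read at every DMR independently derives from ctDNA with probability equal to the ctDNA fraction, and the function name is illustrative.

```python
def detection_probability(ctdna_fraction, n_dmrs, depth):
    """Probability of observing at least one ctDNA-derived read.

    Assumes n_dmrs loci, each covered by `depth` independent reads, and
    that each read comes from ctDNA with probability `ctdna_fraction`.
    """
    total_reads = n_dmrs * depth
    return 1.0 - (1.0 - ctdna_fraction) ** total_reads


if __name__ == "__main__":
    # ctDNA at 0.001% (1e-5) and 10x depth, for growing numbers of DMRs.
    for n in (1, 10, 100, 1000, 10000):
        print(n, round(detection_probability(1e-5, n, 10), 4))
```

Under these assumptions the probability depends only on the product of DMR count and depth, which is why interrogating many regions can compensate for shallow coverage.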
- pan-cancer data from The Cancer Genome Atlas shows large numbers of DMRs between tumor and normal tissues across virtually all tumor types[11]. Therefore, these findings highlighted that an assay that successfully recovered cancer-specific DNA methylation alterations from ctDNA could serve as a very sensitive tool to detect, classify, and monitor malignant disease with low sequencing-associated costs.
- cfMeDIP-seq cell-free Methylated DNA Immunoprecipitation and high-throughput sequencing
- the filler DNA consisted of amplicons similar in size to an adapter-ligated cfDNA library and was composed of unmethylated and in vitro methylated DNA at different CpG densities. The addition of this filler DNA also serves a practical use, as different patients will yield different amounts of cfDNA, allowing for the normalization of input DNA amount to 100 ng. This ensures that the downstream protocol remains exactly the same for all samples regardless of the amount of available cfDNA.
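The input normalization can be written as a one-line calculation. This is a minimal sketch, assuming the 0.3 ng of methylated and 0.3 ng of unmethylated spike-in control described in the protocol text and a 100 ng total; the function name and defaults are illustrative.

```python
def filler_amount_ng(cfdna_ng, control_ng_each=0.3, total_ng=100.0):
    """Filler DNA (ng) needed so that cfDNA + filler + controls = total_ng.

    Two spike-in controls (methylated and unmethylated) are assumed,
    each added at control_ng_each nanograms.
    """
    filler = total_ng - cfdna_ng - 2 * control_ng_each
    if filler < 0:
        raise ValueError("cfDNA plus controls already exceeds the target input")
    return filler


if __name__ == "__main__":
    # Starting amounts matching the input series tested in the document.
    for cfdna in (1.0, 5.0, 10.0):
        print(cfdna, filler_amount_ng(cfdna))
```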
- the libraries were sequenced to saturation ( FIGS. 6A-6E ) at around 30 to 70 million reads per library (Supplementary Table 1).
- the raw reads were aligned to both the human genome and the λ genome, and virtually no alignment was found to the λ genome (Supplementary Table 1). Therefore, the addition of the exogenous λ DNA as filler DNA did not interfere with the generation of sequencing data.
- the CpG Enrichment Score as a quality control measure for the immunoprecipitation step[8]. All the libraries showed similar enrichment for CpGs while the input control, as expected, showed no enrichment (FIG. 6G), validating our immunoprecipitations even at extremely low inputs (1 ng).
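The document's figure description defines the CpG Enrichment Score as the relative frequency of CpGs in the covered regions divided by the relative frequency of CpGs in the human genome. The sketch below illustrates that ratio on plain strings. It is an assumption-laden toy: "relative frequency" is taken here to mean CpG dinucleotides per dinucleotide position, which may differ from the MEDIPS implementation, and the function names are hypothetical.

```python
def cpg_relative_frequency(sequences):
    """CpG dinucleotides per dinucleotide position, pooled over sequences."""
    cpgs = 0
    positions = 0
    for seq in sequences:
        seq = seq.upper()
        cpgs += seq.count("CG")
        positions += max(len(seq) - 1, 0)
    if positions == 0:
        raise ValueError("no dinucleotide positions to evaluate")
    return cpgs / positions


def cpg_enrichment_score(region_seqs, genome_seq):
    """Ratio of CpG relative frequency in regions to that in the genome."""
    genome_freq = cpg_relative_frequency([genome_seq])
    if genome_freq == 0:
        raise ValueError("reference sequence contains no CpGs")
    return cpg_relative_frequency(region_seqs) / genome_freq


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    print(cpg_enrichment_score(["CGCGCG"], "ACGTACGTAA"))
```

A score near 1 would correspond to no enrichment, as expected for an input control, while immunoprecipitated libraries should score well above 1.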
- CRC Colorectal Cancer
- MM Multiple Myeloma
- S cell line DNA both sheared to mimic cfDNA sizes.
- CRC DNA was diluted from 100% to 10%, 1%, 0.1%, 0.01%, 0.001%, and 0%, and cfMeDIP-seq was performed on each of these dilutions.
- Cancer DNA is frequently hypermethylated at CpG-rich regions[17]. Since cfMeDIP-seq specifically targets methylated CpG-rich sequences, we hypothesized that ctDNA would be preferentially enriched during the immunoprecipitation procedure. To test this, we generated patient-derived xenografts (PDXs) from two colorectal cancer patients and collected the mouse plasma. Tumor-derived human cfDNA was present at less than 1% frequency within the total cfDNA pool in the input samples and at 2-fold greater abundance following immunoprecipitation ( FIG. 1G ; Supplementary Table 3). These results suggest that through biased sequencing of ctDNA, the cfMeDIP procedure could further increase ctDNA detection sensitivity.
- PDXs patient-derived xenografts
- Circulating Plasma cfDNA Methylation Profile can Distinguish Between Multiple Cancer Types and Healthy Donors
- DNA methylation patterns are tissue-specific, and have been used to stratify cancer patients into clinically relevant disease subgroups in glioblastoma[18], ependymomas[6], colorectal[19], and breast[20, 21], among many other cancer types.
- cfDNA associated profiles could be used to identify tissues-of-origin for multiple tumor types.
- cfDNA MeDIP profiles were evaluated to discriminate between disease subtypes in five distinct cases—gene expression pattern (ER status in breast cancer), copy number aberration (HER2 status in breast cancer), rearrangement (FLT3 ITD status in AML), point mutation (IDH mutation in GBM), and finally histology in lung cancer.
- linear models were used to select and rank features as described earlier.
- hierarchical clustering was used to evaluate the grouping of samples. Density clustering based on t-Distributed Stochastic Neighbor Embedding (tSNE)[22] based on the methylation status of selected features revealed distinct clustering of samples based on each of these five distinct examples of cancer subtype classification.
- tSNE t-Distributed Stochastic Neighbor Embedding
- Performance was then evaluated using AUROC (area under the receiver operating characteristic curve) derived from test set samples (held out during the DMR selection and the subsequent GLMnet training/tuning steps). This process was repeated with 50 different splits of the discovery cohort into training and test sets to mitigate the influence of training-set biases. This culminated in a collection of 50 models for each one-vs-other-classes comparison (480 models in total). Hereafter, we refer to this collection of models as E50.
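The AUROC metric used for each held-out test set can be computed directly from class labels and classifier scores. The following is a minimal pure-Python sketch based on the pairwise-ranking definition of AUROC; it does not reproduce the source's GLMnet pipeline, and the function name is illustrative.

```python
def auroc(labels, scores):
    """Area under the ROC curve for binary labels (1 = class of interest).

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in the test set")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy one-vs-other-classes scores, for illustration only.
    print(auroc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]))
```

Repeating this calculation over each of the 50 train/test splits would give a distribution of AUROC values per class rather than a single estimate.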
- cfDNA methylation patterns to accurately represent tissue-of-origin also overcomes limitations of mutation-based assays, wherein specificity for tissues-of-origin may be low due to the recurrent nature of many potential driver mutations across cancers in different tissues[23].
- Mutation based assays may also be rendered insensitive by the clonal structure of tumors, where subclonal drivers may be harder to detect by virtue of lower abundance in ctDNA[24].
- Mutation based ctDNA approaches are also vulnerable to potential confounding by driver mutations in benign tissues, which have been observed[25], and documented to display evidence of positive selection[26].
- cfMeDIP-seq as an efficient and cost-effective tool with the potential to influence management of cancer and early detection.
- the accuracy and versatility of cfMeDIP-seq may be useful to inform therapeutic decisions in settings where resistance is correlated to epigenetic alterations, such as sensitivity to androgen receptor inhibition in prostate cancer[27].
- the potential opportunities for early diagnosis and screening may be particularly evident in lung cancer, a disease in which screening has already shown clinical utility but for which existing screening tests (i.e., low dose CT scanning) have significant limitations such as ionizing radiation exposure and a high false positive rate.
Abstract
Description
- This application is a continuation of U.S. patent application Ser. No. 16/630,299 filed Jan. 10, 2020, which is a 371 Application of International Application No. PCT/CA2018/000141, filed Jul. 11, 2018, which claims priority to U.S. Provisional Patent Application No. 62/531,527 filed Jul. 12, 2017, each of which is hereby incorporated by reference in its entirety.
- The invention relates to cancer detection and classification and more particularly to the use of methylome analysis for the same.
- The use of circulating cell-free DNA (cfDNA) as a source of biomarkers is rapidly gaining momentum in oncology[1]. Use of DNA methylation mapping of cfDNA as a biomarker could have a significant impact in the field of liquid biopsy, as it could allow for the identification of the tissue-of-origin[2], allow for cancer type and subtype classification, and stratify cancer patients in a minimally invasive fashion[3]. Furthermore, using genome-wide DNA methylation mapping of cfDNA could overcome a critical sensitivity problem in detecting circulating tumor DNA (ctDNA) in patients with early-stage cancer with no radiographic evidence of disease. Existing ctDNA detection methods are based on sequencing mutations and have limited sensitivity in part due to the limited number of recurrent mutations available to distinguish between tumor and normal circulating cfDNA[4, 5]. On the other hand, genome-wide DNA methylation mapping leverages large numbers of epigenetic alterations that may be used to distinguish circulating tumor DNA (ctDNA) from normal circulating cell-free DNA (cfDNA). For example, some tumor types, such as ependymomas, can have extensive DNA methylation aberrations without any significant recurrent somatic mutations[6].
- Certain methods of capturing cell-free methylated DNA are described in WO 2017/190215, which is incorporated by reference.
- In an aspect, there is provided a method of detecting the presence of DNA from cancer cells in a subject comprising: providing a sample of cell-free DNA from a subject; subjecting the sample to library preparation to permit subsequent sequencing of the cell-free methylated DNA; adding a first amount of filler DNA to the sample, wherein at least a portion of the filler DNA is methylated, then optionally denaturing the sample; capturing cell-free methylated DNA using a binder selective for methylated polynucleotides; sequencing the captured cell-free methylated DNA; comparing the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals.
- In an aspect, there is provided a method of detecting the presence of DNA from cancer cells and identifying a cancer subtype, the method comprising: receiving sequencing data of cell-free methylated DNA from a subject sample; comparing the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals; and if DNA from cancer cells is identified, further identifying the cancer cell tissue of origin and cancer subtype based on the comparison.
- In an aspect, there is provided a computer-implemented method of detecting the presence of DNA from cancer cells and identifying a cancer subtype, the method comprising: receiving, at at least one processor, sequencing data of cell-free methylated DNA from a subject sample; comparing, at the at least one processor, the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying, at the at least one processor, the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNA sequences from cancerous individuals and if DNA from cancer cells is identified, further identifying the cancer cell tissue of origin and cancer subtype based on the comparison.
- In an aspect, there is provided a computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method described herein.
- In an aspect, there is provided a computer readable medium having stored thereon a data structure for storing the computer program product described herein.
- In an aspect, there is provided a device for detecting the presence of DNA from cancer cells and identifying a cancer subtype, the device comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: receive sequencing data of cell-free methylated DNA from a subject sample; compare the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identify the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNA sequences from cancerous individuals and if DNA from cancer cells is identified, further identify the cancer cell tissue of origin and cancer subtype based on the comparison.
- In an aspect, there is provided a method of detecting the presence of DNA from cancer cells and determining the location of the cancer from which the cancer cells arose from two or more possible organs, the method comprising: providing a sample of cell-free DNA from a subject; capturing cell-free methylated DNA from said sample, using a binder selective for methylated polynucleotides; sequencing the captured cell-free methylated DNA; comparing the sequence patterns of the captured cell-free methylated DNA to DNAs sequence patterns of two or more population(s) of control individuals, each of said two or more populations having localized cancer in a different organ; determining as to which organ the cancer cells arose on the basis of a statistically significant similarity between the pattern of methylation of the cell-free DNA and one of said two or more populations.
- These and other features of the preferred embodiments of the invention will become more apparent in the following detailed description in which reference is made to the appended drawings wherein:
- FIG. 1 shows that methylome analysis of cfDNA is a highly sensitive approach to enrich and detect ctDNA in low amounts of input DNA. FIG. 1A shows a computer simulation of the probability to detect at least one epimutation as a function of the concentration of ctDNA (columns), number of DMRs being investigated (rows), and the sequencing depth (x-axis). FIG. 1B shows genome-wide Pearson correlation between DNA methylation signal for 1 to 100 ng of input DNA from HCT116 cell line fragmented to mimic plasma cfDNA. Each concentration has two biological replicates. FIG. 1C shows a DNA methylation profile obtained from cfMeDIP-seq from different concentrations of input DNA from HCT116 (Green Tracks) plus RRBS (Reduced Representation Bisulfite Sequencing) HCT116 data obtained from ENCODE (ENCSR000DFS) and WGBS (Whole-Genome Bisulfite Sequencing) HCT116 data obtained from GEO (GSM1465024). For the heatmap (RRBS track), yellow means methylated, blue means unmethylated and gray means no coverage. FIG. 1D and FIG. 1E show results of serial dilution of the CRC cell line HCT116 into the Multiple Myeloma (MM) cell line MM1.S. cfMeDIP-seq was performed in pure HCT116 DNA (100% CRC), pure MM1.S DNA (100% MM) and 10%, 1%, 0.1%, 0.01%, and 0.001% CRC DNA diluted into MM DNA. All DNA was fragmented to mimic plasma cfDNA. We observed an almost perfect linear correlation (r²=0.99, p<0.0001) between the observed versus expected (FIG. 1D) numbers of DMRs and (FIG. 1E) the DNA methylation signal (in RPKM) within those DMRs. FIG. 1F illustrates that in the same dilution series, known somatic mutations are only detectable at 1/100 allele fraction by ultra-deep (>10,000×) targeted sequencing, above the background sequencer and polymerase error rate. Shown are the fractions of reads containing each base or an insertion/deletion at the site of each mutation in the CRC cell line. FIG. 1G depicts a bar graph showing frequency of ctDNA (human) as a percentage of total cfDNA (human+mice) in the plasma of mice harboring patient-derived xenograft (PDX) from two colorectal cancer patients.
- FIG. 2 shows that methylome analysis of plasma cfDNA allows tumor classification. FIG. 2A illustrates a schematic demonstrating the approach of machine learning classifier construction for cancer classification. FIG. 2B depicts a heatmap of DMRs contained within the multi-class elastic net machine learning classifiers. The classifiers were trained on plasma DNA samples from healthy donors (n=24), lung cancer (n=25), breast cancer (n=25), colorectal cancer (n=23), acute myelogenous leukemia (AML) (n=28), and glioblastoma multiforme (GBM) (n=71). Hierarchical clustering method: Ward. FIG. 2C shows 2D visualizations by tSNE (t-Distributed Stochastic Neighbor Embedding) of the cancer-type associated DMRs identified in 10% or 25% of models. FIG. 2D depicts a plot showing metrics for the plasma cfDNA methylation-based multi-cancer classifier. Area under the receiver operator curve (auROC) is shown on the y-axis for each cancer type and healthy donors following 50-fold generation of elastic net machine learning classifiers.
- FIG. 3 shows validation of the multi-cancer classifier on independent cohorts. In FIG. 3A, ROC curves are shown for independent validation of the multi-cancer classifier on cohorts of lung cancer (LUC) (n=55 LUC vs n=97 other), AML (n=35 AML vs n=117 other), and healthy donors (n=62 healthy donors vs n=90 other). In FIG. 3B, ROC curves are shown for independent validation of the multi-cancer classifier on early stage LUC (n=32 stage I-II LUC vs n=97 other) and late stage LUC (n=23 stage III-IV LUC vs n=97 other).
- FIG. 4 shows that methylome analysis of plasma cfDNA allows tumor subtype classification. FIG. 4A shows 2D visualizations by tSNE (t-Distributed Stochastic Neighbor Embedding) of cancer subtype associated DMRs. Breast cancer subtypes show ability to distinguish between patients harboring tumors with distinct gene expression pattern and transcription factor activity (ER status) as well as distinct tumor copy number aberrations (HER2 status). AML subtypes show ability to distinguish between patients harboring tumors with distinct rearrangements (FLT3 status). Glioblastoma multiforme (GBM) subtypes show ability to distinguish between patients harboring tumors with distinct point mutations (IDH gene mutational status). Lung cancer subtypes show ability to distinguish between patients harboring tumors with distinct histologies that have prognostic and therapeutic implications (adenocarcinoma vs. squamous carcinoma vs. small cell carcinoma). FIG. 4B depicts a heatmap showing the top DMRs that allow accurate discrimination of the three breast cancer subtypes in breast cancer plasma samples. FIG. 4C depicts a heatmap showing the top DMRs that allow accurate discrimination of the FLT3-ITD status in AML patient plasma samples. FIG. 4D depicts a heatmap showing the top DMRs that allow accurate discrimination of the IDH gene mutational status in glioblastoma multiforme (GBM) patient plasma samples. FIG. 4E depicts a heatmap showing the top DMRs that allow accurate discrimination of the three lung cancer histologies in lung cancer plasma samples.
- FIG. 5 shows a suitably configured computer device, and associated communications networks, devices, software and firmware to provide a platform for enabling one or more embodiments as described herein.
- FIG. 6 shows sequencing saturation analysis and quality controls. FIG. 6A, FIG. 6B, FIG. 6C, FIG. 6D, and FIG. 6E show the results of the saturation analysis from the Bioconductor package MEDIPS analyzing cfMeDIP-seq data from each replicate for each input concentration from the HCT116 DNA fragmented to mimic plasma cfDNA. FIG. 6F is a graph showing the results of the protocol tested in two replicates of four starting DNA concentrations (100, 10, 5, and 1 ng) of HCT116 cell line. Specificity of the reaction was calculated using methylated and unmethylated spiked-in A. thaliana DNA. Fold enrichment ratio was calculated using genomic regions of the fragmented HCT116 DNA (Primers for methylated testis-specific H2B, TSH2B0 and unmethylated human DNA region (GAPDH promoter)). The horizontal dotted line indicates a fold-enrichment ratio threshold of 25. Error bars represent ±1 s.e.m. FIG. 6G depicts a bar graph of CpG Enrichment Scores of the sequenced samples, showing a robust enrichment of CpGs within the genomic regions from the immunoprecipitated samples compared to the input control. The CpG Enrichment Score was obtained by dividing the relative frequency of CpGs of the regions by the relative frequency of CpGs of the human genome. Error bars represent ±1 s.e.m.
- In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. However, it is understood that the invention may be practiced without these specific details.
- DNA methylation profiles are cell-type specific and are disrupted in cancer. Using a robust and sensitive method designed for methylome analysis of minute amounts of circulating cell-free DNA (cfDNA), we identified thousands of Differentially Methylated Regions (DMRs) that distinguish multiple tumor types from each other and from healthy individuals. Methylome analysis of cfDNA is highly sensitive and suitable for detecting circulating tumor DNA (ctDNA) in early stage patients. A machine-learning derived classifier using cfDNA methylomes was able to correctly classify 196 plasma samples from patients with 5 cancer types and healthy donors based on cross-validation. In an independent validation, using the same DMRs identified in the plasma cfDNA, the classifier was able to correctly classify AML, lung cancer, and healthy donors, as well as both early and late stage lung cancer. Therefore, methylome analysis of cfDNA can be used for non-invasive early stage detection of ctDNA and robustly classify cancer types.
- In an aspect, there is provided a method of detecting the presence of DNA from cancer cells in a subject comprising: providing a sample of cell-free DNA from a subject; subjecting the sample to library preparation to permit subsequent sequencing of the cell-free methylated DNA; adding a first amount of filler DNA to the sample, wherein at least a portion of the filler DNA is methylated, then optionally denaturing the sample; capturing cell-free methylated DNA using a binder selective for methylated polynucleotides; sequencing the captured cell-free methylated DNA; comparing the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals.
- Applicant's co-owned applications U.S. Provisional Patent Application No. 62/331,070 filed on May 3, 2016 and International Patent Application No. PCT/CA2017/000108 filed on May 3, 2017 describe methods for capturing cell-free methylated DNA and are incorporated herein by reference.
- Cancer has been traditionally classified by tissue of origin—for instance, colorectal cancer, breast cancer, lung cancer, etc. In the modern practice of clinical oncology, it is becoming increasingly important to be able to distinguish subtypes of cancer by various molecular, developmental, and functional underpinnings. Therapeutic decisions often hinge on the precise subtype of cancer, and it may be necessary for clinicians to identify the subtype prior to initiation of therapy. Examples of cancer subtyping that may influence therapeutic decisions include (but are not limited to) stage (e.g., early stage lung cancer treated with surgery vs late stage lung cancer treated with chemotherapy), histology (e.g., small cell carcinoma vs adenocarcinoma vs squamous cell carcinoma in lung cancer), gene expression pattern or transcription factor activity (e.g., ER status in breast cancer), copy number aberrations (e.g., HER2 status in breast cancer), specific rearrangements (e.g., FLT3 in AML), specific gene point mutational status (e.g., IDH gene point mutations), and DNA methylation patterns (e.g., MGMT gene promoter methylation in brain cancer).
- The methods described herein are applicable to a wide variety of cancers, including but not limited to adrenal cancer, anal cancer, bile duct cancer, bladder cancer, bone cancer, brain/cns tumors, breast cancer, castleman disease, cervical cancer, colon/rectum cancer, endometrial cancer, esophagus cancer, ewing family of tumors, eye cancer, gallbladder cancer, gastrointestinal carcinoid tumors, gastrointestinal stromal tumor (gist), gestational trophoblastic disease, hodgkin disease, kaposi sarcoma, kidney cancer, laryngeal and hypopharyngeal cancer, leukemia (acute lymphocytic, acute myeloid, chronic lymphocytic, chronic myeloid, chronic myelomonocytic), liver cancer, lung cancer (non-small cell, small cell, lung carcinoid tumor), lymphoma, lymphoma of the skin, malignant mesothelioma, multiple myeloma, myelodysplastic syndrome, nasal cavity and paranasal sinus cancer, nasopharyngeal cancer, neuroblastoma, non-hodgkin lymphoma, oral cavity and oropharyngeal cancer, osteosarcoma, ovarian cancer, penile cancer, pituitary tumors, prostate cancer, retinoblastoma, rhabdomyosarcoma, salivary gland cancer, sarcoma—adult soft tissue cancer, skin cancer (basal and squamous cell, melanoma, merkel cell), small intestine cancer, stomach cancer, testicular cancer, thymus cancer, thyroid cancer, uterine sarcoma, vaginal cancer, vulvar cancer, waldenstrom macroglobulinemia, wilms tumor.
- Various sequencing techniques are known to the person skilled in the art, such as polymerase chain reaction (PCR) followed by Sanger sequencing. Also available are next-generation sequencing (NGS) techniques, also known as high-throughput sequencing, which include various sequencing technologies such as Illumina (Solexa) sequencing, Roche 454 sequencing, Ion Torrent (Proton/PGM) sequencing, and SOLiD sequencing. NGS allows for the sequencing of DNA and RNA much more quickly and cheaply than the previously used Sanger sequencing. In some embodiments, said sequencing is optimized for short read sequencing.
- The term “subject” as used herein refers to any member of the animal kingdom, preferably a human being and most preferably a human being that has, has had, or is suspected of having prostate cancer.
- Cell-free methylated DNA is DNA that circulates freely in the bloodstream and is methylated at various known regions of the DNA. Samples, for example plasma samples, can be taken to analyze cell-free methylated DNA. Accordingly, in some embodiments, the sample is the subject's blood or plasma.
- As used herein, “library preparation” includes end-repair, A-tailing, adapter ligation, or any other preparation performed on the cell-free DNA to permit subsequent sequencing of DNA.
- As used herein, “filler DNA” can be noncoding DNA or it can consist of amplicons.
- DNA samples may be denatured, for example, using sufficient heat.
- In some embodiments, the comparison step is based on fit using a statistical classifier. Statistical classifiers using DNA methylation data can be used for assigning a sample to a particular disease state, such as cancer type or subtype. For the purpose of cancer type or subtype classification, a classifier would consist of one or more DNA methylation variables (i.e., features) within a statistical model, and the output of the statistical model would have one or more threshold values to distinguish between distinct disease states. The particular feature(s) and threshold value(s) that are used in the statistical classifier can be derived from prior knowledge of the cancer types or subtypes, from prior knowledge of the features that are likely to be most informative, from machine learning, or from a combination of two or more of these approaches.
- In some embodiments, the classifier is machine learning-derived. Preferably, the classifier is an elastic net classifier, lasso, support vector machine, random forest, or neural network.
- The genomic space that is analyzed can be genome-wide, or preferably restricted to regulatory regions (i.e., FANTOM5 enhancers, CpG Islands, CpG shores and CpG Shelves).
- Preferably, the percentage of spike-in methylated DNA recovered is included as a covariate to control for pulldown efficiency variation.
- For a classifier capable of distinguishing multiple cancer types (or subtypes) from one another, the classifier would preferably consist of differentially methylated regions from pairwise comparisons of each type (or subtype) of interest.
- In some embodiments, the control cell-free methylated DNAs sequences from healthy and cancerous individuals are comprised in a database of Differentially Methylated Regions (DMRs) between healthy and cancerous individuals.
- In some embodiments, the control cell-free methylated DNA sequences from healthy and cancerous individuals are limited to those control cell-free methylated DNA sequences which are differentially methylated as between healthy and cancerous individuals in DNA derived from cell-free DNA from bodily fluids, such as blood serum, cerebral spinal fluid, urine, stool, sputum, pleural fluid, ascites, tears, sweat, pap smear fluid, endoscopy brushings fluid, etc., preferably from blood plasma.
- In some embodiments, the sample has less than 100 ng, 75 ng, or 50 ng of cell-free DNA.
- In some embodiments, the first amount of filler DNA comprises about 5%, 10%, 15%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% methylated filler DNA with remainder being unmethylated filler DNA, and preferably between 5% and 50%, between 10%-40%, or between 15%-30% methylated filler DNA.
- In some embodiments, the first amount of filler DNA is from 20 ng to 100 ng, preferably 30 ng to 100 ng, more preferably 50 ng to 100 ng.
- In some embodiments, the cell-free DNA from the sample and the first amount of filler DNA together comprises at least 50 ng of total DNA, preferably at least 100 ng of total DNA.
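As a rough illustration of the arithmetic implied by the filler DNA embodiments above, the following sketch computes how much filler DNA would top a sample up to a target total amount, and how that filler would split into methylated and unmethylated portions. The function name, its parameters, and the default values (a 100 ng target and 20% methylated filler) are assumptions chosen for illustration from within the ranges recited above; they are not prescribed by the method.

```python
def filler_amounts(cfdna_ng, target_total_ng=100.0, methylated_fraction=0.20):
    """Return (methylated_ng, unmethylated_ng) of filler DNA to add.

    Hypothetical helper: tops the sample up to target_total_ng and splits
    the filler according to methylated_fraction (a value between 0 and 1).
    """
    if cfdna_ng < 0 or not 0.0 <= methylated_fraction <= 1.0:
        raise ValueError("invalid input amounts")
    # No filler is needed if the sample already meets the target.
    filler_ng = max(target_total_ng - cfdna_ng, 0.0)
    methylated_ng = filler_ng * methylated_fraction
    return methylated_ng, filler_ng - methylated_ng
```

For instance, a 10 ng cfDNA sample under these assumed defaults would call for 90 ng of filler, of which one fifth is methylated.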
- In some embodiments, the filler DNA is 50 bp to 800 bp long, preferably 100 bp to 600 bp long, and more preferably 200 bp to 600 bp long.
- In some embodiments, the filler DNA is double stranded. For example, the filler DNA can be junk DNA. The filler DNA may also be endogenous or exogenous DNA. For example, the filler DNA is non-human DNA, and in preferred embodiments, λ DNA. As used herein, “λ DNA” refers to Enterobacteria phage λ DNA. In some embodiments, the filler DNA has no alignment to human DNA.
- In some embodiments, the binder is a protein comprising a Methyl-CpG-binding domain. One such exemplary protein is MBD2 protein. As used herein, “Methyl-CpG-binding domain (MBD)” refers to certain domains of proteins and enzymes that are approximately 70 residues long and bind to DNA that contains one or more symmetrically methylated CpGs. The MBD of MeCP2, MBD1, MBD2, MBD4 and BAZ2 mediates binding to DNA, and in cases of MeCP2, MBD1 and MBD2, preferentially to methylated CpG. Human proteins MECP2, MBD1, MBD2, MBD3, and MBD4 comprise a family of nuclear proteins related by the presence in each of a methyl-CpG-binding domain (MBD). Each of these proteins, with the exception of MBD3, is capable of binding specifically to methylated DNA.
- In other embodiments, the binder is an antibody and capturing cell-free methylated DNA comprises immunoprecipitating the cell-free methylated DNA using the antibody. As used herein, “immunoprecipitation” refers to a technique of precipitating an antigen (such as polypeptides and nucleotides) out of solution using an antibody that specifically binds to that particular antigen. This process can be used to isolate and concentrate a particular protein or DNA from a sample and requires that the antibody be coupled to a solid substrate at some point in the procedure. The solid substrate includes, for example, beads, such as magnetic beads. Other types of beads and solid substrates are known in the art.
- One exemplary antibody is 5-MeC antibody. For the immunoprecipitation procedure, in some embodiments at least 0.05 μg of the antibody is added to the sample; while in more preferred embodiments at least 0.16 μg of the antibody is added to the sample. To confirm the immunoprecipitation reaction, in some embodiments the method described herein further comprises the step of adding a second amount of control DNA to the sample.
- In some embodiments, the method further comprises the step of adding a second amount of control DNA to the sample for confirming the immunoprecipitation reaction.
- As used herein, the “control” may comprise both positive and negative control, or at least a positive control.
- In some embodiments, the method further comprises the step of adding a second amount of control DNA to the sample for confirming the capture of cell-free methylated DNA.
- In some embodiments, identifying the presence of DNA from cancer cells further includes identifying the cancer cell tissue of origin.
- In some instances, tumor tissue sampling may be challenging or carry significant risks, in which case diagnosing and/or subtyping the cancer without the need for tumor tissue sampling may be desired. For example, lung tumor tissue sampling may require invasive procedures such as mediastinoscopy, thoracotomy, or percutaneous needle biopsy; these procedures may result in a need for hospitalization, chest tube, mechanical ventilation, antibiotics, or other medical interventions. Some individuals may not undergo the invasive procedures needed for tumor tissue sampling either because of medical comorbidities or due to preference. In some instances, the actual procedure for tumor tissue procurement may depend on the suspected cancer subtype. In other instances, cancer subtype may evolve over time within the same individual; serial assessment with invasive tumor tissue sampling procedures is often impractical and not well tolerated by patients. Thus, non-invasive cancer subtyping via blood test could have many advantageous applications in the practice of clinical oncology.
- Accordingly, in some embodiments, identifying the cancer cell tissue of origin further includes identifying a cancer subtype. Preferably, the cancer subtype differentiates the cancer based on stage (e.g., early stage lung cancer treated with surgery vs late stage lung cancer treated with chemotherapy), histology (e.g., small cell carcinoma vs adenocarcinoma vs squamous cell carcinoma in lung cancer), gene expression pattern or transcription factor activity (e.g., ER status in breast cancer), copy number aberrations (e.g., HER2 status in breast cancer), specific rearrangements (e.g., FLT3 in AML), specific gene point mutational status (e.g., IDH gene point mutations), and DNA methylation patterns (e.g., MGMT gene promoter methylation in brain cancer).
- In some embodiments, the comparison in step (f) is carried out genome-wide.
- In other embodiments, the comparison in step (f) is restricted from genome-wide to specific regulatory regions, such as, but not limited to, FANTOM5 enhancers, CpG Islands, CpG shores, CpG Shelves, or any combination of the foregoing.
- In some embodiments, certain steps are carried out by a computer processor.
- In an aspect, there is provided a method of detecting the presence of DNA from cancer cells and identifying a cancer subtype, the method comprising: receiving sequencing data of cell-free methylated DNA from a subject sample; comparing the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals; and if DNA from cancer cells is identified, further identifying the cancer cell tissue of origin and cancer subtype based on the comparison step.
- In an aspect, there is provided a method of detecting the presence of DNA from cancer cells and determining the location of the cancer from which the cancer cells arose from two or more possible organs, the method comprising: providing a sample of cell-free DNA from a subject; capturing cell-free methylated DNA from said sample, using a binder selective for methylated polynucleotides; sequencing the captured cell-free methylated DNA; comparing the sequence patterns of the captured cell-free methylated DNA to DNAs sequence patterns of two or more population(s) of control individuals, each of said two or more populations having localized cancer in a different organ; determining as to which organ the cancer cells arose on the basis of a statistically significant similarity between the pattern of methylation of the cell-free DNA and one of said two or more populations.
- The present system and method may be practiced in various embodiments. A suitably configured computer device, and associated communications networks, devices, software and firmware may provide a platform for enabling one or more embodiments as described above. By way of example, FIG. 5 shows a generic computer device 100 that may include a central processing unit (“CPU”) 102 connected to a storage unit 104 and to a random access memory 106. The CPU 102 may process an operating system 101, application program 103, and data 123. The operating system 101, application program 103, and data 123 may be stored in storage unit 104 and loaded into memory 106, as may be required. Computer device 100 may further include a graphics processing unit (GPU) 122 which is operatively connected to CPU 102 and to memory 106 to offload intensive image processing calculations from CPU 102 and run these calculations in parallel with CPU 102. An operator 107 may interact with the computer device 100 using a video display 108 connected by a video interface 105, and various input/output devices such as a keyboard 115, mouse 112, and disk drive or solid state drive 114 connected by an I/O interface 109. In known manner, the mouse 112 may be configured to control movement of a cursor in the video display 108, and to operate various graphical user interface (GUI) controls appearing in the video display 108 with a mouse button. The disk drive or solid state drive 114 may be configured to accept computer readable media 116. The computer device 100 may form part of a network via a network interface 111, allowing the computer device 100 to communicate with other suitably configured data processing systems (not shown). One or more different types of sensors 135 may be used to receive input from various sources.
- The present system and method may be practiced on virtually any manner of computer device including a desktop computer, laptop computer, tablet computer or wireless handheld. The present system and method may also be implemented as a computer-readable/useable medium that includes computer program code to enable one or more computer devices to implement each of the various process steps in a method in accordance with the present invention. In case of more than one computer device performing the entire operation, the computer devices are networked to distribute the various steps of the operation. It is understood that the terms computer-readable medium or computer useable medium comprise one or more of any type of physical embodiment of the program code. In particular, the computer-readable/useable medium can comprise program code embodied on one or more portable storage articles of manufacture (e.g. an optical disc, a magnetic disk, a tape, etc.), on one or more data storage portions of a computing device, such as memory associated with a computer and/or a storage system.
- In an aspect, there is provided a computer-implemented method of detecting the presence of DNA from cancer cells and identifying a cancer subtype, the method comprising: receiving, at at least one processor, sequencing data of cell-free methylated DNA from a subject sample; comparing, at the at least one processor, the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identifying, at the at least one processor, the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals and if DNA from cancer cells is identified, further identifying the cancer cell tissue of origin and cancer subtype based on the comparison step.
- In an aspect, there is provided a computer program product for use in conjunction with a general-purpose computer having a processor and a memory connected to the processor, the computer program product comprising a computer readable storage medium having a computer mechanism encoded thereon, wherein the computer program mechanism may be loaded into the memory of the computer and cause the computer to carry out the method described herein.
- In an aspect, there is provided a computer readable medium having stored thereon a data structure for storing the computer program product described herein.
- In an aspect, there is provided a device for detecting the presence of DNA from cancer cells and identifying a cancer subtype, the device comprising: at least one processor; and electronic memory in communication with the at least one processor, the electronic memory storing processor-executable code that, when executed at the at least one processor, causes the at least one processor to: receive sequencing data of cell-free methylated DNA from a subject sample; compare the sequences of the captured cell-free methylated DNA to control cell-free methylated DNAs sequences from healthy and cancerous individuals; identify the presence of DNA from cancer cells if there is a statistically significant similarity between one or more sequences of the captured cell-free methylated DNA and cell-free methylated DNAs sequences from cancerous individuals and if DNA from cancer cells is identified, further identify the cancer cell tissue of origin and cancer subtype based on the comparison step.
- As used herein, “processor” may be any type of processor, such as, for example, any type of general-purpose microprocessor or microcontroller (e.g., an Intel™ x86, PowerPC™, ARM™ processor, or the like), a digital signal processing (DSP) processor, an integrated circuit, a field programmable gate array (FPGA), or any combination thereof.
- As used herein “memory” may include a suitable combination of any type of computer memory that is located either internally or externally such as, for example, random-access memory (RAM), read-only memory (ROM), compact disc read-only memory (CDROM), electro-optical memory, magneto-optical memory, erasable programmable read-only memory (EPROM), and electrically-erasable programmable read-only memory (EEPROM), or the like. Portions of memory 102 may be organized using a conventional filesystem, controlled and administered by an operating system governing overall operation of a device.
- As used herein, “computer readable storage medium” (also referred to as a machine-readable medium, a processor-readable medium, or a computer usable medium having a computer-readable program code embodied therein) is a medium capable of storing data in a format readable by a computer or machine. The machine-readable medium can be any suitable tangible, non-transitory medium, including magnetic, optical, or electrical storage medium including a diskette, compact disk read only memory (CD-ROM), memory device (volatile or non-volatile), or similar storage mechanism. The computer readable storage medium can contain various sets of instructions, code sequences, configuration information, or other data, which, when executed, cause a processor to perform steps in a method according to an embodiment of the disclosure. Those of ordinary skill in the art will appreciate that other instructions and operations necessary to implement the described implementations can also be stored on the computer readable storage medium. The instructions stored on the computer readable storage medium can be executed by a processor or other suitable processing device, and can interface with circuitry to perform the described tasks.
- As used herein, “data structure” refers to a particular way of organizing data in a computer so that it can be used efficiently. Data structures can implement one or more particular abstract data types (ADT), which specify the operations that can be performed on a data structure and the computational complexity of those operations. In comparison, a data structure is a concrete implementation of the specification provided by an ADT.
- The advantages of the present invention are further illustrated by the following examples. The examples and their particular details set forth herein are presented for illustration only and should not be construed as a limitation on the claims of the present invention.
- Methods and Materials
- Donor Recruitment and Sample Acquisition
- CRC, breast cancer, and GBM samples were obtained from the University Health Network BioBank; AML samples were obtained from the University Health Network Leukemia BioBank; lastly, healthy controls were recruited through the Family Medicine Centre at Mount Sinai Hospital (MSH) in Toronto, Canada. All samples were collected with patient consent and were obtained with institutional approval from the Research Ethics Boards of University Health Network and Mount Sinai Hospital in Toronto, Canada.
- Specimen Processing—cfDNA
- EDTA and ACD plasma samples were obtained from the BioBanks and from the Family Medicine Centre at Mount Sinai Hospital (MSH) in Toronto, Canada. All samples were stored either at −80° C. or in vapour phase liquid nitrogen until use. Cell-free DNA was extracted from 0.5-3.5 ml of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen). The extracted DNA was quantified through Qubit prior to use.
- Specimen Processing—PDX cfDNA
- Human colorectal tumor tissue, obtained with patient consent from the University Health Network Biobank as approved by the Research Ethics Board at University Health Network, was digested to single cells using collagenase A. Single cells were subcutaneously injected into 4-6 week old NOD/SCID male mice. Mice were euthanized by CO2 inhalation prior to blood collection by cardiac puncture, and the blood was stored in EDTA tubes. From the collected blood samples, the plasma was isolated and stored at −80° C. Cell-free DNA was extracted from 0.3-0.7 ml of plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen). All animal work was carried out in compliance with the ethical regulations approved by the Animal Care Committee at University Health Network.
- cfMeDIP-seq
- A schematic representation of the cfMeDIP-seq protocol is shown in WO2017/190215. Prior to cfMeDIP, the DNA samples were subjected to library preparation using the Kapa Hyper Prep Kit (Kapa Biosystems). The manufacturer protocol was followed with some modifications. Briefly, the DNA of interest was added to a 0.2 mL PCR tube and subjected to end-repair and A-tailing. Adapter ligation followed, using NEBNext adapter (from the NEBNext Multiplex Oligos for Illumina kit, New England Biolabs) at a final concentration of 0.181 μM, with incubation at 20° C. for 20 mins and purification with AMPure XP beads. The eluted library was digested using the USER enzyme (New England Biolabs Canada) followed by purification with Qiagen MinElute PCR Purification Kit prior to MeDIP.
- The prepared libraries were combined with the pooled methylated/unmethylated PCR product to a final DNA amount of 100 ng and subjected to MeDIP using the protocol from Taiwo et al. 2012[7] with some modifications. Briefly, for MeDIP, the Diagenode MagMeDIP kit (Cat #C02010021) was used following the manufacturer's protocol with some modifications. After the addition of 0.3 ng of the control methylated and 0.3 ng of the control unmethylated A. thaliana DNA, the filler DNA (to complete the total amount of DNA [cfDNA+Filler+Controls] to 100 ng) and the buffers to the PCR tubes containing the adapter ligated DNA, the samples were heated to 95° C. for 10 mins, then immediately placed into an ice water bath for 10 mins. Each sample was partitioned into two 0.2 mL PCR tubes: one for the 10% input control and the other one for the sample to be subjected to immunoprecipitation. The included 5-mC monoclonal antibody 33D3 (Cat #C15200081) from the MagMeDIP kit was diluted 1:15 prior to generating the diluted antibody mix and added to the sample. Washed magnetic beads (following manufacturer instructions) were also added prior to incubation at 4° C. for 17 hours. The samples were purified using the Diagenode iPure Kit and eluted in 50 μl of Buffer C. The success of the reaction (QC1) was validated through qPCR to detect the presence of the spiked-in A. thaliana DNA, ensuring a % recovery of unmethylated spiked-in DNA <1% and the % specificity of the reaction >99% (as calculated by 1−[recovery of spiked-in unmethylated control DNA over recovery of spiked-in methylated control DNA]), prior to proceeding to the next step. The optimal number of cycles to amplify each library was determined through the use of qPCR, after which the samples were amplified using the KAPA HiFi Hotstart Mastermix and the NEBNext multiplex oligos added to a final concentration of 0.3 μM. The PCR settings used to amplify the libraries were as follows: activation at 95° C. for 3 min, followed by predetermined cycles of 98° C. for 20 sec, 65° C. for 15 sec and 72° C. for 30 sec and a final extension of 72° C. for 1 min. The amplified libraries were purified using MinElute PCR purification column and then gel size selected with 3% Nusieve GTG agarose gel to remove any adapter dimers. Prior to submission for sequencing, the fold enrichment of a methylated human DNA region (testis-specific H2B, TSH2B) and an unmethylated human DNA region (GAPDH promoter) was determined for the MeDIP-seq and cfMeDIP-seq libraries generated from the HCT116 cell line DNA sheared to mimic cell free DNA (Cell line obtained from ATCC, mycoplasma free). The final libraries were submitted for BioAnalyzer analysis prior to sequencing at the UHN Princess Margaret Genomic Centre on an Illumina HiSeq 2000.
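The QC1 criterion described above can be sketched as a small check. This is a minimal illustration only: the function name and parameter names are hypothetical, the inputs are assumed to be percent recoveries of the spiked-in controls already computed from the qPCR data, and the step that derives those recoveries from qPCR measurements is not shown.

```python
def qc1_passes(recovery_methylated_pct, recovery_unmethylated_pct):
    """Apply the two QC1 thresholds stated in the text.

    Returns (passed, specificity_pct), where specificity is
    1 - (unmethylated recovery / methylated recovery), expressed in percent.
    """
    if recovery_methylated_pct <= 0:
        # No methylated control recovered: the reaction cannot pass.
        return False, 0.0
    specificity_pct = 100.0 * (
        1.0 - recovery_unmethylated_pct / recovery_methylated_pct
    )
    passed = recovery_unmethylated_pct < 1.0 and specificity_pct > 99.0
    return passed, specificity_pct
```

Under these assumptions, a reaction recovering 50% of the methylated control and 0.1% of the unmethylated control would satisfy both thresholds, whereas one recovering 10% and 0.5% respectively would fail on specificity.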
- Ultra-Deep Targeted Sequencing for Point Mutation Detection
- We used the Qiagen Circulating Nucleic Acid kit to isolate cell-free DNA from ˜20 mL of plasma (4-5×10 mL EDTA blood tubes) from patients with matched tumor tissue molecular profiling data generated prior to enrolment in early phase clinical trials at the Princess Margaret Cancer Centre. DNA was extracted from cell lines (dilution of CRC and MM cell lines) using the PureGene Gentra kit, fragmented to ˜180 bp using a Covaris sonicator, and larger size fragments excluded using Ampure beads to mimic the fragment size of cell-free DNA. DNA sequencing libraries were constructed from 83 ng of fragmented DNA using the KAPA Hyper Prep Kit (Kapa Biosystems, Wilmington, Mass.) utilizing NEXTflex-96 DNA Barcode adapters (Bio Scientific, Austin, Tex.). To isolate DNA fragments containing known mutations, we designed biotinylated DNA capture probes (xGen Lockdown Custom Probes Mini Pool, Integrated DNA Technologies, Coralville, Iowa) targeting mutation hotspots from 48 genes tested by the clinical laboratory using the Illumina TruSeq Amplicon Cancer Panel. The barcoded libraries were pooled and then applied to the custom hybrid capture library following manufacturer's instructions (IDT xGEN Lockdown protocol version 2.1). These fragments were sequenced to >10,000× read coverage using an Illumina HiSeq 2000 instrument. Resulting reads were aligned using bwa-mem and mutations detected using samtools and muTect version 1.1.4.
- Modelling Relationships Between Number of Tumor-Specific Features and Probability of Detection by Sequencing Depth
- We created 145,000 simulated genomes, with the proportion of cancer-specific methylated DMRs set to 0.001%, 0.01%, 0.1%, 1%, and 10% and consisting of 1, 10, 100, 1000 and 10000 independent DMRs respectively. We sampled 14,500 diploid genomes (representing 100 ng of DNA) from these original mixtures and further sampled 10, 100, 1000, and 10000 reads per locus to represent sequencing coverage at those depths. This process was repeated 100 times for each combination of coverage, abundance, and number of features. We estimated the frequency of successful detection of at least 1 DMR for each combination of parameters and plotted probability curves (FIG. 1A) to visually evaluate the influence of the number of features on the probability of successful detection conditional on sequencing depths.
- Derivation of Tissue-Distinctive Features, Development of a Multi-Tissue Classifier and Validation in 450 k Data
- cfDNA MeDIP profiles were quantified using the MEDIPS R package[8], converted to RPKMs, and afterwards transformed into log2 counts-per-million. Subsequently, a linear model was fit using limma-trend[9] on a matrix of features that mapped to FANTOM5 enhancers, CpG Islands, CpG shores and CpG Shelves, with the percentage of spike-in methylated DNA recovered included as a covariate to control for pulldown efficiency variation. Pairwise contrasts were evaluated for each pair of tissue types and the top 150 and the bottom 150 DMRs were selected for elastic net classifier training and validation of cancer-type specificity. Performance metrics were derived by majority class votes on out-of-fold calls from the model with the highest Kappa value in cross-validation, a heuristic previously employed in Chakravarthy et al[10].
- Machine learning analyses for evaluation of classification accuracy
- Model Training and Evaluation on the Discovery Cohort
- In order to evaluate the performance of cfMeDIP data in tumor classification without high computational cost, we reduced the initial set of possible candidate features to windows encompassing CpG Islands, shores, shelves and FANTOM5 enhancers (hereafter labelled “regulatory features”), yielding a matrix of 196 samples and 505,027 features. We then used the caret R package to partition the discovery cohort data into 50 independent training and test sets in an 80%-20% manner (FIG. 2A). The splits were performed while class proportions across the discovery cohort were maintained. Then, we selected the top 300 DMRs by moderated t-statistic (150 hypermethylated, 150 hypomethylated) on the training data partition using limma-trend for each class versus other classes. A binomial GLMnet was then trained using these DMRs (up to 300 DMRs×7 other classes=2100 features) with the use of 3 iterations of 10-Fold Cross-Validation (CV) to optimize values of the mixing parameter (alpha, values=0, 0.2, 0.5, 0.8 and 1) and the penalty (lambda, values=0-0.05 in increments of 0.01) using Cohen's Kappa as the performance metric. For each training set, this yielded a collection of 6 one-class vs-other-classes binomial classifiers.
- We then estimated classification performance on the held-out test set using the AUROC (area under the receiver operating characteristic curve). These estimates represent unbiased measures of classification, as the held-out test set samples were not used for either DMR pre-selection or GLMnet training and tuning. The 50 independent training and test sets also permitted minimization of optimistic estimates due to training-set bias.
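The repeated, class-proportion-preserving partitioning described above can be sketched in a few lines. The following is a stand-in written with the Python standard library, not a port of the caret routine used in the study; the function name, the seed handling, and the rounding rule for class sizes are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_splits(labels, n_splits=50, train_fraction=0.8, seed=0):
    """Yield (train_indices, test_indices) pairs preserving class proportions.

    Each class is shuffled and cut separately, so every split keeps roughly
    the same class mix as the full cohort.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    for _ in range(n_splits):
        train, test = [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            cut = round(train_fraction * len(shuffled))
            train.extend(shuffled[:cut])
            test.extend(shuffled[cut:])
        yield sorted(train), sorted(test)
```

In keeping with the text, any DMR pre-selection and model tuning would be carried out on the training indices of each split only, so that the corresponding test indices remain untouched until evaluation.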
- Model Evaluation on the Validation Cohort
- For each validation cohort cfMeDIP sample, we estimated class probabilities for the AML, LUC and normal one-vs-all binomial classifiers trained on the 50 different training sets within the discovery cohort. The probabilities from the 50 models were averaged to produce a single score that was then used for AUROC estimation. We also evaluated whether disease stage affected performance by estimating the AUROC of the one-vs-all classifier when either early-stage (Stages I and II) or late-stage (Stages III and IV) LUC samples were left out.
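- The scoring step above, averaging per-model probabilities and then estimating AUROC, can be sketched as below. This is a minimal sketch, not the code used in the study: `mean_probability` and `auroc` are hypothetical helper names, the probabilities are invented for illustration, and AUROC is computed through its rank interpretation (the chance that a randomly chosen positive sample scores above a randomly chosen negative one, with ties counted as one half).

```python
def mean_probability(per_model_probs):
    """Average one sample's class probability across all models in the ensemble."""
    return sum(per_model_probs) / len(per_model_probs)


def auroc(scores, is_positive):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical probabilities: one row per sample, one column per model.
probs = [[0.9, 0.8, 0.95], [0.2, 0.1, 0.3], [0.6, 0.7, 0.5], [0.4, 0.3, 0.2]]
truth = [True, False, True, False]
scores = [mean_probability(row) for row in probs]
print(auroc(scores, truth))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of a few hundred samples; a sort-based rank computation would be preferable for larger sets.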
- Results and Discussion
- We bioinformatically simulated mixtures with different proportions of ctDNA, from 0.001% to 10% (FIG. 1A, column facets). We also simulated scenarios where the ctDNA had 1, 10, 100, 1000, or 10000 DMRs (Differentially Methylated Regions) as compared to normal cfDNA (FIG. 1A, row facets). Reads were then sampled at varying sequencing depths at each locus (10×, 100×, 1000×, and 10000×) (FIG. 1A, x-axis). We found that the probability of detecting at least 1 cancer-specific event increased with the number of DMRs, even at low abundance of ctDNA and shallow coverage (FIG. 1A).
- Moreover, pan-cancer data from The Cancer Genome Atlas (TCGA) show large numbers of DMRs between tumor and normal tissues across virtually all tumor types[11]. Therefore, these findings highlighted that an assay that successfully recovered cancer-specific DNA methylation alterations from ctDNA could serve as a very sensitive tool to detect, classify, and monitor malignant disease with low sequencing-associated costs.
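- The intuition behind the simulation above can be illustrated with a simple closed-form calculation. The sketch below is not the simulation that produced FIG. 1A, whose details are not given here. It assumes that every read at a DMR comes from ctDNA independently with probability equal to the ctDNA fraction f, that every such read is recognizable as cancer-specific, and that loci are independent, so that the probability of at least one cancer-specific read among N = (number of DMRs) × (depth) reads is 1 − (1 − f)^N. The function name `detection_probability` is hypothetical.

```python
import math


def detection_probability(ctdna_fraction, n_dmrs, depth):
    """P(at least one tumor-derived read) over n_dmrs loci at `depth` reads each."""
    if not 0.0 <= ctdna_fraction <= 1.0:
        raise ValueError("ctdna_fraction must be between 0 and 1")
    if n_dmrs < 0 or depth < 0:
        raise ValueError("n_dmrs and depth must be non-negative")
    total_reads = n_dmrs * depth
    if total_reads == 0:
        return 0.0
    if ctdna_fraction == 1.0:
        return 1.0
    # 1 - (1 - f)^N, written with log1p/expm1 to stay accurate for tiny f.
    return -math.expm1(total_reads * math.log1p(-ctdna_fraction))


# DMR counts and depths quoted in the text; 0.001% ctDNA is the lowest
# proportion mentioned (expressed here as a fraction).
fraction = 0.00001
for n_dmrs in (1, 10, 100, 1000, 10000):
    row = [detection_probability(fraction, n_dmrs, d) for d in (10, 100, 1000, 10000)]
    print(n_dmrs, ["%.4f" % p for p in row])
```

Under these assumptions the number of DMRs and the depth enter only through their product, which is one way to see why many informative loci can compensate for shallow coverage.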
- However, genome-wide mapping of DNA methylation in plasma cfDNA is challenging due to the very low quantities and fragmentation of DNA in circulation[12]. As a result, previous efforts at methylation profiling of cfDNA have mainly been restricted to locus-specific PCR-based assays[2, 3], such as an FDA-approved SEPT9 methylation assay for colorectal cancer screening[13]. While recent efforts have been made to perform whole-genome bisulfite sequencing (WGBS) of fragmented cfDNA[14-16], the low genome-wide abundance of CpGs is likely to reduce the amount of useful methylation-related information available from sequencing. Therefore, the main issues with WGBS on plasma DNA are the high cost, low efficiency, and DNA losses associated with bisulfite conversion. On the other hand, a method that selectively enriches for CpG-rich features prone to methylation is likely to maximize the amount of useful information available per read, decrease the cost, and decrease the DNA losses.
- A Genome-Wide Method Suitable for cfDNA Methylation Mapping
- We developed a new method termed cfMeDIP-seq (cell-free Methylated DNA Immunoprecipitation and high-throughput sequencing) to perform genome-wide DNA methylation mapping using cell-free DNA. The cfMeDIP-seq method described here was developed through the modification of an existing low input MeDIP-seq protocol[7] that in our experience is very robust down to 100 ng of input DNA. However, the majority of plasma samples yield much less than 100 ng of DNA. To overcome this challenge, we added exogenous λ DNA (filler DNA) to the adapter-ligated cfDNA library in order to artificially inflate the amount of starting DNA to 100 ng. This minimizes the amount of non-specific binding by the antibody and also minimizes the amount of DNA lost due to binding to plasticware. The filler DNA consisted of amplicons similar in size to an adapter-ligated cfDNA library and was composed of unmethylated and in vitro methylated DNA at different CpG densities. The addition of this filler DNA also serves a practical use, as different patients will yield different amounts of cfDNA, allowing for the normalization of input DNA amount to 100 ng. This ensures that the downstream protocol remains exactly the same for all samples regardless of the amount of available cfDNA.
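- The input normalization described above amounts to a simple top-up calculation, sketched below. The 100 ng target is taken from the text; the function name `filler_dna_needed` and the example cfDNA yields are assumptions for illustration, and the split of the filler between methylated and unmethylated amplicons is not modelled because the text does not specify it.

```python
TARGET_INPUT_NG = 100.0  # total DNA input stated in the text


def filler_dna_needed(cfdna_ng, target_ng=TARGET_INPUT_NG):
    """Mass of filler (lambda) DNA, in ng, needed to bring the input up to target_ng."""
    if cfdna_ng < 0:
        raise ValueError("cfDNA mass cannot be negative")
    # Samples already at or above the target need no filler.
    return max(0.0, target_ng - cfdna_ng)


# Hypothetical cfDNA yields (ng) from three plasma samples.
for cfdna_ng in (1.0, 5.0, 10.0):
    print(cfdna_ng, "ng cfDNA ->", filler_dna_needed(cfdna_ng), "ng filler")
```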
- We first validated the cfMeDIP-seq protocol using DNA from human colorectal cancer cell line HCT116, sheared to a fragment size similar to that observed in cfDNA. HCT116 was chosen because of the availability of public DNA methylation data. We simultaneously performed the gold standard MeDIP-seq protocol[7] using 100 ng of sheared cell line DNA and the cfMeDIP-seq protocol using 10 ng, 5 ng, and 1 ng of the same sheared cell line DNA. This was performed in two biological replicates. For all the conditions, we obtained more than 99% specificity of the reaction (1−[recovery of spiked-in unmethylated control DNA over recovery of spiked-in methylated control DNA]), and a very high enrichment of a known methylated region over an unmethylated region (TSH2B0 and GAPDH, respectively) (
FIG. 6F ). - The libraries were sequenced to saturation (
FIGS. 6A-6E) at around 30 to 70 million reads per library (Supplementary Table 1). The raw reads were aligned to both the human genome and the λ genome, and virtually no alignment to the λ genome was found (Supplementary Table 1). Therefore, the addition of the exogenous λ DNA as filler DNA did not interfere with the generation of sequencing data. Finally, we calculated the CpG Enrichment Score as a quality control measure for the immunoprecipitation step[8]. All the libraries showed similar enrichment for CpGs while the input control, as expected, showed no enrichment (FIG. 6G), validating our immunoprecipitations even at extremely low inputs (1 ng).
- Genome-wide correlation estimates comparing different input DNA levels show that both MeDIP-seq (100 ng) and cfMeDIP-seq (10, 5, and 1 ng) methods were very robust, with Pearson correlation of at least 0.94 between any two biological replicates (
FIG. 1B). The analysis also demonstrates that cfMeDIP-seq at 5 and 10 ng of input DNA can robustly recapitulate the methylation profile obtained by traditional MeDIP-seq at 100 ng (pairwise Pearson correlation of at least 0.9) (FIG. 1B). The performance of cfMeDIP-seq at 1 ng of input DNA is reduced compared to MeDIP-seq at 100 ng but still shows a strong Pearson correlation of >0.7 (FIG. 1B). We also observed that the cfMeDIP-seq protocol recapitulates the DNA methylation profile of HCT116 obtained using gold-standard RRBS (Reduced Representation Bisulfite Sequencing) and WGBS (Whole-Genome Bisulfite Sequencing) (FIG. 1C). Altogether, our data suggest that cfMeDIP-seq is a robust protocol for genome-wide methylation mapping of fragmented and low-input DNA material, such as circulating cfDNA.
- cfMeDIP-Seq Displays High Sensitivity for Detection of Tumor-Derived ctDNA
- To evaluate the sensitivity of the cfMeDIP-seq protocol, we performed a serial dilution of Colorectal Cancer (CRC) HCT116 cell line DNA into Multiple Myeloma (MM) MM1.S cell line DNA, both sheared to mimic cfDNA sizes. We diluted the CRC DNA from 100% through 10%, 1%, 0.1%, 0.01% and 0.001% to 0% and performed cfMeDIP-seq on each of these dilutions. We also performed ultra-deep (10,000× median coverage) targeted sequencing for detection of three point mutations in the same samples. The observed number of DMRs identified at each CRC dilution point versus the pure MM DNA, using a 5% false discovery rate (FDR) threshold, was almost perfectly linear (r²=0.99, p<0.0001) with the expected number of DMRs based on the dilution factor (FIG. 1D) down to a 0.001% dilution. Moreover, the DNA methylation signal within these DMRs also shows almost perfect linearity (r²=0.99, p<0.0001) between the observed and expected signal (FIG. 1E; Supplementary Table 2B). In comparison, beyond the 1% dilution, ultra-deep targeted sequencing could not reliably distinguish between the CRC-specific variants and the spurious variants due to PCR or sequencing errors (FIG. 1F; Supplementary Table 2A). Thus, cfMeDIP-seq displays excellent sensitivity for the detection of cancer-derived DNA, exceeding the performance of variant detection by ultra-deep targeted sequencing using a standard protocol.
- Cancer DNA is frequently hypermethylated at CpG-rich regions[17]. Since cfMeDIP-seq specifically targets methylated CpG-rich sequences, we hypothesized that ctDNA would be preferentially enriched during the immunoprecipitation procedure. To test this, we generated patient-derived xenografts (PDXs) from two colorectal cancer patients and collected the mouse plasma. Tumor-derived human cfDNA was present at less than 1% frequency within the total cfDNA pool in the input samples and at 2-fold greater abundance following immunoprecipitation (
FIG. 1G; Supplementary Table 3). These results suggest that, through biased sequencing of ctDNA, the cfMeDIP procedure could further increase ctDNA detection sensitivity.
- Circulating Plasma cfDNA Methylation Profile can Distinguish Between Multiple Cancer Types and Healthy Donors
- DNA methylation patterns are tissue-specific, and have been used to stratify cancer patients into clinically relevant disease subgroups in glioblastoma[18], ependymomas[6], colorectal[19], and breast[20, 21], among many other cancer types. We asked if cfDNA associated profiles could be used to identify tissues-of-origin for multiple tumor types. To this end, we profiled 196 samples from 5 different tumor types and normal controls from early and late stage tumors. We used linear modeling to identify the top 300 DMRs mapping to CpG shores, shelves, islands and FANTOM5 enhancers for each pairwise comparison, leading to a total of 2,100 unique DMRs (
FIG. 2A). Density clustering based on t-Distributed Stochastic Neighbor Embedding (tSNE)[22] of the 196 plasma samples, using the methylation status of these features, revealed distinct clustering of samples by tissue-of-origin and tumor type (FIG. 2B, C). Using an elastic net multi-cancer classifier fit with these features (FIG. 2A), we observed highly accurate discrimination between different tumor types (FIG. 2D).
- Discrimination of Disease Subtypes
- We evaluated the ability of cfDNA MeDIP profiles to discriminate between disease subtypes in five distinct cases: gene expression pattern (ER status in breast cancer), copy number aberration (HER2 status in breast cancer), rearrangement (FLT3 ITD status in AML), point mutation (IDH mutation in GBM), and histology in lung cancer. In each case, linear models were used to select and rank features as described earlier, and hierarchical clustering was used to evaluate the grouping of samples. Density clustering based on t-Distributed Stochastic Neighbor Embedding (tSNE)[22] of the methylation status of the selected features revealed distinct clustering of samples in each of these five examples of cancer subtype classification.
- Detection of Cancers and Classification of Cancer Types Using Machine Learning
- In order to rigorously evaluate the ability of cfMeDIP profiles to detect cancers and further classify cancer types, we then conducted a set of machine learning analyses on our discovery cohort. To allow for accelerated computational analysis, we initially reduced our cfMeDIP discovery cohort to features mapping to CpG islands, shores, shelves and FANTOM5 enhancers (n=505,027 windows). We then implemented a strategy on our discovery cohort samples to derive unbiased estimates of performance, while accounting for training-set biases.
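- The feature-reduction step described above keeps only those genomic windows that overlap an annotated regulatory feature. A minimal sketch of that filter follows. The interval representation (chromosome, start, end; half-open), the helper names `overlaps` and `keep_regulatory_windows`, and all coordinates are assumptions for illustration, and the naive all-pairs comparison stands in for the indexed interval operations a genome-scale analysis would normally use.

```python
def overlaps(window, feature):
    """True if two half-open (chrom, start, end) intervals share at least one base."""
    return (window[0] == feature[0]
            and window[1] < feature[2]
            and feature[1] < window[2])


def keep_regulatory_windows(windows, features):
    """Keep only the windows that overlap at least one regulatory feature."""
    return [w for w in windows if any(overlaps(w, f) for f in features)]


# Hypothetical windows and one hypothetical CpG island annotation.
windows = [("chr1", 0, 300), ("chr1", 300, 600), ("chr1", 600, 900), ("chr2", 0, 300)]
features = [("chr1", 250, 350)]
print(keep_regulatory_windows(windows, features))
```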
- Herein, we split the discovery cohort into balanced training and test sets (80% training set, 20% test set). Using only the samples in the training set, we selected the top 300 DMRs for each class (sample type) versus other classes, based on limma-trend test statistics, and trained a series of one-versus-other-classes GLMnets using these features on the training set data. The training procedure consisted of 3 rounds of 10-Fold Cross-Validation (CV) across a grid of values for alpha and lambda with optimisation for Cohen's Kappa. The use of multiple rounds of 10-Fold CV was motivated by a desire to leverage additional randomisation for more generalisable model tuning.
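- Cohen's Kappa, the metric optimized during tuning above, measures agreement between predicted and true labels after correcting for the agreement expected by chance, which makes it less flattering than raw accuracy when one class dominates, as in one-versus-other-classes problems. A minimal sketch of the statistic is given below; the function name `cohens_kappa` and the example labels are illustrative, and the handling of the degenerate single-class case is a choice made for this sketch.

```python
from collections import Counter


def cohens_kappa(truth, predicted):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)
    if expected == 1.0:
        # Only one class present in both vectors: kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical one-vs-other calls: 1 = class of interest, 0 = any other class.
truth = [1, 1, 1, 0]
predicted = [1, 1, 0, 0]
print(cohens_kappa(truth, predicted))
```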
- Performance was then evaluated using AUROC (area under the receiver operating characteristic curve) derived from test set samples (held out during the DMR selection and the subsequent GLMnet training/tuning steps). This process was repeated with 50 different splits of the discovery cohort into training and test sets to mitigate the influence of training-set biases. This culminated in a collection of 50 models for each one-vs-other-classes comparison (480 models in total). Hereafter, we refer to this collection of models as E50.
- Subsequently, we evaluated performance across batches by generating a validation cohort of an additional 152 plasma samples: AML (n=35), lung cancer (n=55) and healthy control (n=62) samples. For each class, we averaged the class probabilities output by the models in E50, and estimated AUROC for the one class vs. all other classes (FIG. 3A). The classifiers showed high AUROC values for the classification of AML vs others (0.993), LUC vs others (0.943) and normal vs others (1.000). This further confirmed the ability of cfMeDIP-seq coupled with a machine learning approach to accurately detect and classify tumor type. Finally, we observed that the classifiers were as accurate in early-stage samples (0.950) as in late-stage samples (0.934) (FIG. 3B), suggesting that this approach is applicable to the detection of cancer at both early and late stages.
- Additional Advantages of cfDNA Methylome Profiling with cfMeDIP-Seq
- The ability of cfDNA methylation patterns to accurately represent tissue-of-origin also overcomes limitations of mutation-based assays, wherein specificity for tissues-of-origin may be low due to the recurrent nature of many potential driver mutations across cancers in different tissues[23]. Mutation-based assays may also be rendered insensitive by the clonal structure of tumors, where subclonal drivers may be harder to detect by virtue of lower abundance in ctDNA[24]. Mutation-based ctDNA approaches are also vulnerable to potential confounding by driver mutations in benign tissues, which have been observed[25] and documented to display evidence of positive selection[26].
- Taken together, our findings, based on the largest collection of cancer cfDNA methylomes derived to date, establish cfMeDIP-seq as an efficient and cost-effective tool with the potential to influence the management and early detection of cancer. The accuracy and versatility of cfMeDIP-seq may be useful to inform therapeutic decisions in settings where resistance is correlated with epigenetic alterations, such as sensitivity to androgen receptor inhibition in prostate cancer[27]. The potential opportunities for early diagnosis and screening may be particularly evident in lung cancer, a disease in which screening has already shown clinical utility but for which existing screening tests (i.e., low-dose CT scanning) have significant limitations such as ionizing radiation exposure and a high false-positive rate.
- In conclusion, our findings underscore the utility of cfDNA methylation profiles as a basis for non-invasive, cost-effective, sensitive, highly accurate early tumor detection, multi-cancer classification, and cancer subtype classification.
-
TABLE 1 Number of reads and mapping efficiency of sequenced MeDIP-seq (100 ng, Rep 1 and Rep 2) and cfMeDIP-seq (10 ng, 5 ng and 1 ng, Rep 1 and Rep 2) libraries, prepared using various starting inputs of HCT116 cell line DNA sheared to mimic cfDNA, after alignment to the human (Hg19) genome and the λ genome. Two biological replicates were used for each starting input of DNA. For starting inputs less than 100 ng, the samples were topped up with exogenous λ DNA to artificially increase the starting amount to 100 ng prior to MeDIP.
Sample | # of raw reads | # of aligned reads to human genome (Hg19) | Mapping efficiency to human genome (Hg19) (%) | # of aligned reads to λ genome | Mapping efficiency to λ genome (%) |
---|---|---|---|---|---|
Input | 74,504,053 | 71,343,168 | 95.76 | 12 | 0.00 |
100 ng Replicate 1 | 55,396,238 | 50,472,273 | 91.11 | 0 | 0.00 |
100 ng Replicate 2 | 66,569,209 | 60,770,277 | 91.29 | 1 | 0.00 |
10 ng Replicate 1 | 70,054,607 | 64,020,441 | 91.39 | 0 | 0.00 |
10 ng Replicate 2 | 58,297,539 | 53,308,777 | 91.44 | 0 | 0.00 |
5 ng Replicate 1 | 65,845,430 | 60,540,743 | 91.94 | 1 | 0.00 |
5 ng Replicate 2 | 64,750,879 | 59,358,412 | 91.67 | 0 | 0.00 |
1 ng Replicate 1 | 35,102,361 | 32,258,451 | 91.90 | 0 | 0.00 |
1 ng Replicate 2 | 33,881,118 | 31,194,711 | 92.07 | 0 | 0.00 |
TABLE 2A Mean coverage of ultra-deep targeted variant sequencing using the dilution series of CRC cell line HCT116 DNA into MM cell line MM1.S DNA
Dilution (% of CRC DNA) | Uncollapsed reads mean target coverage | SSCS (single strand consensus sequences) mean target coverage | DCS (duplex consensus sequences) mean target coverage |
---|---|---|---|
100 | 155,964 | 4284 | 655 |
10 | 154,657 | 4877 | 654 |
1 | 154,419 | 4890 | 654 |
0.1 | 183,271 | 5674 | 887 |
0.01 | 238,291 | 8068 | 1602 |
0.001 | 199,766 | 7337 | 1299 |
0.0001 | 187,695 | 6891 | 1181 |
0 | 216,434 | 7721 | 1412 |
TABLE 2B Resultant observed DMRs and DNA methylation signal from the dilution series of CRC cell line HCT116 DNA into MM cell line MM1.S DNA
Dilution (% of CRC DNA) | Observed number of DMRs | Observed DNA methylation signal (sum of RPKMs within DMRs) |
---|---|---|
100 | 111,472 | 645,683.90 |
10 | 1,597 | 8,775.61 |
1 | 692 | 4,521.60 |
0.1 | 12 | 75.71 |
0.01 | 8 | 79.73 |
0.001 | 2 | 22.42 |
TABLE 3 Number of reads and mapping efficiency of cfMeDIP-seq libraries of PDX and Input Control samples after aligning to human (Hg19) genome
Sample | # of raw reads | # of aligned reads to human genome (Hg19) | Mapping efficiency to human genome (%) |
---|---|---|---|
Input Control 1 | 45,857,633 | 389,073 | 0.83 |
Input Control 2 | 35,658,454 | 283,799 | 0.80 |
PDX 1 | 49,997,949 | 1,080,277 | 2.16 |
PDX 2 | 34,802,767 | 614,988 | 1.77 |
- Although preferred embodiments of the invention have been described herein, it will be understood by those skilled in the art that variations may be made thereto without departing from the spirit of the invention or the scope of the appended claims. All documents disclosed herein, including those in the following reference list, are incorporated by reference.
-
- 1. Diaz, L. A., Jr. and A. Bardelli, Liquid biopsies: genotyping circulating tumor DNA. J Clin Oncol, 2014. 32(6): p. 579-86.
- 2. Lehmann-Werman, R., et al., Identification of tissue-specific cell death using methylation patterns of circulating DNA. Proc Natl Acad Sci USA, 2016. 113(13): p. E1826-34.
- 3. Visvanathan, K., et al., Monitoring of Serum DNA Methylation as an Early Independent Marker of Response and Survival in Metastatic Breast Cancer: TBCRC 005 Prospective Biomarker Study. J Clin Oncol, 2016: p. JCO2015662080.
- 4. Newman, A. M., et al., An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med, 2014. 20(5): p. 548-54.
- 5. Aravanis, A. M., M. Lee, and R. D. Klausner, Next-Generation Sequencing of Circulating Tumor DNA for Early Cancer Detection. Cell, 2017. 168(4): p. 571-574.
- 6. Mack, S. C., et al., Epigenomic alterations define lethal CIMP-positive ependymomas of infancy. Nature, 2014. 506(7489): p. 445-50.
- 7. Taiwo, O., et al., Methylome analysis using MeDIP-seq with low DNA concentrations. Nat Protoc, 2012. 7(4): p. 617-36.
- 8. Lienhard, M., et al., MEDIPS: genome-wide differential coverage analysis of sequencing data derived from DNA enrichment experiments. Bioinformatics, 2014. 30(2): p. 284-6.
- 9. Law, C. W., et al., voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol, 2014. 15(2): p. R29.
- 10. Chakravarthy, A., et al., Human Papillomavirus Drives Tumor Development Throughout the Head and Neck: Improved Prognosis Is Associated With an Immune Response Largely Restricted to the Oropharynx. J Clin Oncol, 2016. 34(34): p. 4132-4141.
- 11. Hoadley, K. A., et al., Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell, 2014. 158(4): p. 929-44.
- 12. Fleischhacker, M. and B. Schmidt, Circulating nucleic acids (CNAs) and cancer—a survey. Biochim Biophys Acta, 2007. 1775(1): p. 181-232.
- 13. Potter, N. T., et al., Validation of a real-time PCR-based qualitative assay for the detection of methylated SEPT9 DNA in human plasma. Clin Chem, 2014. 60(9): p. 1183-91.
- 14. Legendre, C., et al., Whole-genome bisulfite sequencing of cell free DNA identifies signature associated with metastatic breast cancer. Clin Epigenetics, 2015. 7: p. 100.
- 15. Sun, K., et al., Plasma DNA tissue mapping by genome-wide methylation sequencing for noninvasive prenatal, cancer, and transplantation assessments. Proc Natl Acad Sci USA, 2015. 112(40): p. E5503-12.
- 16. Chan, K. C., et al., Noninvasive detection of cancer-associated genome-wide hypomethylation and copy number aberrations by plasma DNA bisulfite sequencing. Proc Natl Acad Sci USA, 2013. 110(47): p. 18761-8.
- 17. Sharma, S., T. K. Kelly, and P. A. Jones, Epigenetics in cancer. Carcinogenesis, 2010. 31(1): p. 27-36.
- 18. Sturm, D., et al., Hotspot mutations in H3F3A and IDH1 define distinct epigenetic and biological subgroups of glioblastoma. Cancer Cell, 2012. 22(4): p. 425-37.
- 19. Hinoue, T., et al., Genome-scale analysis of aberrant DNA methylation in colorectal cancer. Genome Res, 2012. 22(2): p. 271-82.
- 20. Stirzaker, C., et al., Methylome sequencing in triple-negative breast cancer reveals distinct methylation clusters with prognostic value. Nat Commun, 2015. 6: p. 5899.
- 21. Fang, F., et al., Breast cancer methylomes establish an epigenomic foundation for metastasis. Sci Transl Med, 2011. 3(75): p. 75ra25.
- 22. van der Maaten, L. and G. Hinton, Visualizing Data using t-SNE. Journal of Machine Learning Research, 2008. 9: p. 2579-2605.
- 23. Kandoth, C., et al., Mutational landscape and significance across 12 major cancer types. Nature, 2013. 502(7471): p. 333-9.
- 24. McGranahan, N., et al., Clonal status of actionable driver events and the timing of mutational processes in cancer evolution. Sci Transl Med, 2015. 7(283): p. 283ra54.
- 25. Zauber, P., S. Marotta, and M. Sabbath-Solitare, KRAS gene mutations are more common in colorectal villous adenomas and in situ carcinomas than in carcinomas. Int J Mol Epidemiol Genet, 2013. 4(1): p. 1-10.
- 26. Martincorena, I., et al., Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science, 2015. 348(6237): p. 880-6.
- 27. Beltran, H., et al., Divergent clonal evolution of castration-resistant neuroendocrine prostate cancer. Nat Med, 2016. 22(3): p. 298-305.
Claims (27)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/668,314 US20220251665A1 (en) | 2017-07-12 | 2022-02-09 | Cancer detection and classification using methylome analysis |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201762531527P | 2017-07-12 | 2017-07-12 | |
PCT/CA2018/000141 WO2019010564A1 (en) | 2017-07-12 | 2018-07-11 | Cancer detection and classification using methylome analysis |
US202016630299A | 2020-01-10 | 2020-01-10 | |
US17/668,314 US20220251665A1 (en) | 2017-07-12 | 2022-02-09 | Cancer detection and classification using methylome analysis |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/630,299 Continuation US20200308651A1 (en) | 2017-07-12 | 2018-07-11 | Cancer detection and classification using methylome analysis |
PCT/CA2018/000141 Continuation WO2019010564A1 (en) | 2017-07-12 | 2018-07-11 | Cancer detection and classification using methylome analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220251665A1 true US20220251665A1 (en) | 2022-08-11 |
Family
ID=65000926
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/630,299 Pending US20200308651A1 (en) | 2017-07-12 | 2018-07-11 | Cancer detection and classification using methylome analysis |
US17/668,314 Abandoned US20220251665A1 (en) | 2017-07-12 | 2022-02-09 | Cancer detection and classification using methylome analysis |
Country Status (8)
Country | Link |
---|---|
US (2) | US20200308651A1 (en) |
EP (1) | EP3652741A4 (en) |
JP (2) | JP2020537487A (en) |
KR (2) | KR102628878B1 (en) |
CN (1) | CN111094590A (en) |
BR (1) | BR112020000681A2 (en) |
CA (1) | CA3069754A1 (en) |
WO (1) | WO2019010564A1 (en) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109415763A (en) * | 2016-05-03 | 2019-03-01 | 大学健康网络 | Capture the cell-free method of methylate DNA and application thereof |
IL302912A (en) | 2016-12-22 | 2023-07-01 | Guardant Health Inc | Methods and systems for analyzing nucleic acid molecules |
US20230026916A1 (en) * | 2018-07-05 | 2023-01-26 | Active Genomes Expressed Diagnostics, Inc | Viral Oncogene Influences and Gene Expression Patterns as Indicators of Early Tumorigenesis |
CN109097439A (en) * | 2018-09-04 | 2018-12-28 | 上海交通大学 | A method of detecting a small amount of sample complete genome DNA methylation |
CN114616343A (en) | 2019-09-30 | 2022-06-10 | 夸登特健康公司 | Compositions and methods for analyzing cell-free DNA in methylation partition assays |
JP2022551926A (en) * | 2019-10-11 | 2022-12-14 | グレイル リミテッド ライアビリティ カンパニー | Cancer classification by thresholding tissue of origin |
GB2609715A (en) * | 2019-11-06 | 2023-02-15 | Univ Health Network | Synthetic spike-in controls for cell-free MeDIP sequencing and methods of using same |
CN111154846A (en) * | 2020-01-13 | 2020-05-15 | 四川大学华西医院 | Detection method of methylated nucleic acid |
WO2021227950A1 (en) * | 2020-05-09 | 2021-11-18 | 广州燃石医学检验所有限公司 | Cancer prognostic method |
WO2021253138A1 (en) * | 2020-06-19 | 2021-12-23 | University Health Network | Multimodal analysis of circulating tumor nucleic acid molecules |
CN112382342A (en) * | 2020-11-24 | 2021-02-19 | 山西三友和智慧信息技术股份有限公司 | Cancer methylation data classification method based on integrated feature selection |
CN112820407B (en) * | 2021-01-08 | 2022-06-17 | 清华大学 | Deep learning method and system for detecting cancer by using plasma free nucleic acid |
CN114507737A (en) * | 2022-03-22 | 2022-05-17 | 杭州医学院 | Detection primer combination and kit for methylation marker of asbestos-related disease and construction method of amplicon sequencing library |
KR102491322B1 (en) * | 2022-03-29 | 2023-01-27 | 주식회사 아이엠비디엑스 | Preparation Method Using Multi-Feature Prediction Model for Cancer Diagnosis |
WO2023230289A1 (en) * | 2022-05-25 | 2023-11-30 | Adela, Inc. | Methods and systems for cell-free nucleic acid processing |
CN115274124B (en) * | 2022-07-22 | 2023-11-14 | 江苏先声医学诊断有限公司 | Dynamic optimization method of tumor early screening targeting Panel and classification model based on data driving |
US20240055073A1 (en) * | 2022-07-25 | 2024-02-15 | Grail, Llc | Sample contamination detection of contaminated fragments with cpg-snp contamination markers |
CN115376616B (en) * | 2022-10-24 | 2023-04-28 | 臻和(北京)生物科技有限公司 | Multi-classification method and device based on cfDNA multiunit science |
WO2024091028A1 (en) * | 2022-10-28 | 2024-05-02 | 주식회사 클리노믹스 | System and method for health and disease management using cell-free dna |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2370813A4 (en) * | 2008-12-04 | 2012-05-23 | Univ California | Materials and methods for determining diagnosis and prognosis of prostate cancer |
WO2014043763A1 (en) | 2012-09-20 | 2014-03-27 | The Chinese University Of Hong Kong | Non-invasive determination of methylome of fetus or tumor from plasma |
TWI813141B (en) * | 2014-07-18 | 2023-08-21 | 香港中文大學 | Methylation pattern analysis of tissues in a dna mixture |
WO2016094330A2 (en) * | 2014-12-08 | 2016-06-16 | 20/20 Genesystems, Inc | Methods and machine learning systems for predicting the liklihood or risk of having cancer |
US9984201B2 (en) * | 2015-01-18 | 2018-05-29 | Youhealth Biotech, Limited | Method and system for determining cancer status |
CN109415763A (en) * | 2016-05-03 | 2019-03-01 | 大学健康网络 | Capture the cell-free method of methylate DNA and application thereof |
-
2018
- 2018-07-11 US US16/630,299 patent/US20200308651A1/en active Pending
- 2018-07-11 KR KR1020207004066A patent/KR102628878B1/en active IP Right Grant
- 2018-07-11 CN CN201880059089.5A patent/CN111094590A/en active Pending
- 2018-07-11 BR BR112020000681-5A patent/BR112020000681A2/en unknown
- 2018-07-11 JP JP2020501564A patent/JP2020537487A/en active Pending
- 2018-07-11 EP EP18832886.8A patent/EP3652741A4/en active Pending
- 2018-07-11 KR KR1020247002257A patent/KR20240018667A/en active Application Filing
- 2018-07-11 CA CA3069754A patent/CA3069754A1/en active Pending
- 2018-07-11 WO PCT/CA2018/000141 patent/WO2019010564A1/en unknown
-
2022
- 2022-02-09 US US17/668,314 patent/US20220251665A1/en not_active Abandoned
-
2023
- 2023-07-21 JP JP2023119203A patent/JP2023139162A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
JP2020537487A (en) | 2020-12-24 |
KR20240018667A (en) | 2024-02-13 |
CN111094590A (en) | 2020-05-01 |
JP2023139162A (en) | 2023-10-03 |
BR112020000681A2 (en) | 2020-07-14 |
EP3652741A4 (en) | 2021-04-21 |
WO2019010564A1 (en) | 2019-01-17 |
KR20200032127A (en) | 2020-03-25 |
CA3069754A1 (en) | 2019-01-17 |
US20200308651A1 (en) | 2020-10-01 |
KR102628878B1 (en) | 2024-01-23 |
EP3652741A1 (en) | 2020-05-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SINAI HEALTH SYSTEM, CANADA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SHEN, SHU YI;REEL/FRAME:059422/0801 Effective date: 20200108 Owner name: UNIVERSITY HEALTH NETWORK, CANADA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SHEN, SHU YI;REEL/FRAME:059422/0801 Effective date: 20200108 Owner name: UNIVERSITY HEALTH NETWORK, CANADA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DINIZ DE CARVALHO, DANIEL;BRATMAN, SCOTT VICTOR;SINGHANIA, RAJAT;AND OTHERS;SIGNING DATES FROM 20181211 TO 20181217;REEL/FRAME:059422/0652 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STCB | Information on status: application discontinuation |
Free format text: EXPRESSLY ABANDONED -- DURING EXAMINATION |