US11427874B1 - Methods and systems for detection of prostate cancer by DNA methylation analysis
- Publication number
- US11427874B1 (application US16/995,180)
- Authority
- US
- United States
- Prior art keywords
- dna
- methylation
- prostate cancer
- dna molecules
- subject
- Prior art date
- Legal status
- Active
Links
- 208000000236 Prostatic Neoplasms Diseases 0.000 title claims abstract description 320
- 206010060862 Prostate cancer Diseases 0.000 title claims abstract description 319
- 238000000034 method Methods 0.000 title claims abstract description 84
- 230000007067 DNA methylation Effects 0.000 title claims abstract description 71
- 238000001514 detection method Methods 0.000 title abstract description 3
- 238000004458 analytical method Methods 0.000 title description 12
- 108020004414 DNA Proteins 0.000 claims abstract description 238
- 230000011987 methylation Effects 0.000 claims abstract description 121
- 238000007069 methylation reaction Methods 0.000 claims abstract description 121
- 239000012634 fragment Substances 0.000 claims abstract description 78
- 239000012472 biological sample Substances 0.000 claims abstract description 72
- 238000012545 processing Methods 0.000 claims abstract description 59
- 238000013467 fragmentation Methods 0.000 claims abstract description 33
- 238000006062 fragmentation reaction Methods 0.000 claims abstract description 33
- 108091029430 CpG site Proteins 0.000 claims abstract description 26
- 102000053602 DNA Human genes 0.000 claims description 191
- 239000000523 sample Substances 0.000 claims description 106
- 150000007523 nucleic acids Chemical class 0.000 claims description 73
- 102000039446 nucleic acids Human genes 0.000 claims description 69
- 108020004707 nucleic acids Proteins 0.000 claims description 69
- 238000011282 treatment Methods 0.000 claims description 66
- 206010028980 Neoplasm Diseases 0.000 claims description 53
- 230000003321 amplification Effects 0.000 claims description 48
- 238000003199 nucleic acid amplification method Methods 0.000 claims description 48
- 238000003753 real-time PCR Methods 0.000 claims description 44
- 238000003556 assay Methods 0.000 claims description 43
- 238000003752 polymerase chain reaction Methods 0.000 claims description 39
- 238000012360 testing method Methods 0.000 claims description 34
- 238000012163 sequencing technique Methods 0.000 claims description 30
- 230000001225 therapeutic effect Effects 0.000 claims description 30
- 210000001519 tissue Anatomy 0.000 claims description 19
- 230000029087 digestion Effects 0.000 claims description 17
- LSNNMFCWUKXFEE-UHFFFAOYSA-M Bisulfite Chemical compound OS([O-])=O LSNNMFCWUKXFEE-UHFFFAOYSA-M 0.000 claims description 16
- 108091008146 restriction endonucleases Proteins 0.000 claims description 16
- 238000001369 bisulfite sequencing Methods 0.000 claims description 15
- 210000004369 blood Anatomy 0.000 claims description 13
- 239000008280 blood Substances 0.000 claims description 13
- 238000007855 methylation-specific PCR Methods 0.000 claims description 11
- 238000012070 whole genome sequencing analysis Methods 0.000 claims description 11
- 238000004949 mass spectrometry Methods 0.000 claims description 10
- 210000002700 urine Anatomy 0.000 claims description 10
- 102100040141 Aminopeptidase O Human genes 0.000 claims description 9
- 101150010890 Asb3 gene Proteins 0.000 claims description 9
- 101000889627 Homo sapiens Aminopeptidase O Proteins 0.000 claims description 9
- 101000739160 Homo sapiens Secretoglobin family 3A member 1 Proteins 0.000 claims description 9
- 101000944311 Homo sapiens Uncharacterized protein C5orf49 Proteins 0.000 claims description 9
- 102100037268 Secretoglobin family 3A member 1 Human genes 0.000 claims description 9
- 230000002829 reductive effect Effects 0.000 claims description 9
- 238000012175 pyrosequencing Methods 0.000 claims description 8
- 102100032389 Ankyrin repeat and death domain-containing protein 1B Human genes 0.000 claims description 7
- 101000797935 Homo sapiens Ankyrin repeat and death domain-containing protein 1B Proteins 0.000 claims description 7
- 102100033120 Uncharacterized protein C5orf49 Human genes 0.000 claims description 7
- 238000002512 chemotherapy Methods 0.000 claims description 6
- 238000002271 resection Methods 0.000 claims description 6
- 108091033409 CRISPR Proteins 0.000 claims description 5
- 238000010354 CRISPR gene editing Methods 0.000 claims description 5
- 108090000652 Flap endonucleases Proteins 0.000 claims description 5
- 102000004150 Flap endonucleases Human genes 0.000 claims description 5
- LSNNMFCWUKXFEE-UHFFFAOYSA-N Sulfurous acid Chemical group OS(O)=O LSNNMFCWUKXFEE-UHFFFAOYSA-N 0.000 claims description 5
- 210000001808 exosome Anatomy 0.000 claims description 5
- 238000009169 immunotherapy Methods 0.000 claims description 5
- 238000001959 radiotherapy Methods 0.000 claims description 5
- 238000002626 targeted therapy Methods 0.000 claims description 5
- 210000002966 serum Anatomy 0.000 claims description 4
- 230000001419 dependent effect Effects 0.000 claims description 3
- 210000005267 prostate cell Anatomy 0.000 claims 2
- 208000023958 prostate neoplasm Diseases 0.000 claims 2
- 125000003729 nucleotide group Chemical group 0.000 description 76
- 239000002773 nucleotide Substances 0.000 description 71
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 62
- 201000010099 disease Diseases 0.000 description 39
- 238000012549 training Methods 0.000 description 25
- 239000000090 biomarker Substances 0.000 description 24
- 208000035475 disorder Diseases 0.000 description 23
- 239000003550 marker Substances 0.000 description 23
- 201000011510 cancer Diseases 0.000 description 22
- 108090000623 proteins and genes Proteins 0.000 description 22
- 230000015654 memory Effects 0.000 description 20
- 239000013615 primer Substances 0.000 description 20
- 229920002477 rna polymer Polymers 0.000 description 20
- 230000035945 sensitivity Effects 0.000 description 20
- 238000003860 storage Methods 0.000 description 19
- 238000003745 diagnosis Methods 0.000 description 14
- 238000006243 chemical reaction Methods 0.000 description 13
- 102000007066 Prostate-Specific Antigen Human genes 0.000 description 12
- 108010072866 Prostate-Specific Antigen Proteins 0.000 description 12
- 230000009471 action Effects 0.000 description 12
- 238000001574 biopsy Methods 0.000 description 12
- 230000008569 process Effects 0.000 description 12
- 238000004422 calculation algorithm Methods 0.000 description 9
- 238000004891 communication Methods 0.000 description 9
- 230000001613 neoplastic effect Effects 0.000 description 9
- 230000000875 corresponding effect Effects 0.000 description 7
- 238000007847 digital PCR Methods 0.000 description 7
- 238000004393 prognosis Methods 0.000 description 7
- 238000002591 computed tomography Methods 0.000 description 6
- 238000009396 hybridization Methods 0.000 description 6
- 239000000203 mixture Substances 0.000 description 6
- 238000012544 monitoring process Methods 0.000 description 6
- 230000000295 complement effect Effects 0.000 description 5
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical group NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 5
- 230000003287 optical effect Effects 0.000 description 5
- 238000010839 reverse transcription Methods 0.000 description 5
- -1 ANKDD1B Proteins 0.000 description 4
- 206010005003 Bladder cancer Diseases 0.000 description 4
- 206010005949 Bone cancer Diseases 0.000 description 4
- 208000018084 Bone neoplasm Diseases 0.000 description 4
- 208000003174 Brain Neoplasms Diseases 0.000 description 4
- 206010006187 Breast cancer Diseases 0.000 description 4
- 208000026310 Breast neoplasm Diseases 0.000 description 4
- 206010008342 Cervix carcinoma Diseases 0.000 description 4
- 206010009944 Colon cancer Diseases 0.000 description 4
- 208000001333 Colorectal Neoplasms Diseases 0.000 description 4
- 206010014733 Endometrial cancer Diseases 0.000 description 4
- 206010014759 Endometrial neoplasm Diseases 0.000 description 4
- 208000000461 Esophageal Neoplasms Diseases 0.000 description 4
- 206010058467 Lung neoplasm malignant Diseases 0.000 description 4
- 206010030155 Oesophageal carcinoma Diseases 0.000 description 4
- 206010033128 Ovarian cancer Diseases 0.000 description 4
- 206010061535 Ovarian neoplasm Diseases 0.000 description 4
- 206010061902 Pancreatic neoplasm Diseases 0.000 description 4
- 208000005718 Stomach Neoplasms Diseases 0.000 description 4
- 208000007097 Urinary Bladder Neoplasms Diseases 0.000 description 4
- 208000006105 Uterine Cervical Neoplasms Diseases 0.000 description 4
- 230000008901 benefit Effects 0.000 description 4
- 201000010881 cervical cancer Diseases 0.000 description 4
- 201000004101 esophageal cancer Diseases 0.000 description 4
- 206010017758 gastric cancer Diseases 0.000 description 4
- 201000010536 head and neck cancer Diseases 0.000 description 4
- 208000014829 head and neck neoplasm Diseases 0.000 description 4
- 208000032839 leukemia Diseases 0.000 description 4
- 201000007270 liver cancer Diseases 0.000 description 4
- 208000014018 liver neoplasm Diseases 0.000 description 4
- 201000005202 lung cancer Diseases 0.000 description 4
- 208000020816 lung neoplasm Diseases 0.000 description 4
- 208000015486 malignant pancreatic neoplasm Diseases 0.000 description 4
- 238000005259 measurement Methods 0.000 description 4
- 201000001441 melanoma Diseases 0.000 description 4
- 230000004048 modification Effects 0.000 description 4
- 238000012986 modification Methods 0.000 description 4
- 201000002528 pancreatic cancer Diseases 0.000 description 4
- 208000008443 pancreatic carcinoma Diseases 0.000 description 4
- 210000002307 prostate Anatomy 0.000 description 4
- 238000000746 purification Methods 0.000 description 4
- 238000012216 screening Methods 0.000 description 4
- 239000004055 small Interfering RNA Substances 0.000 description 4
- 201000011549 stomach cancer Diseases 0.000 description 4
- 201000005112 urinary bladder cancer Diseases 0.000 description 4
- LRSASMSXMSNRBT-UHFFFAOYSA-N 5-methylcytosine Chemical compound CC1=CNC(=O)N=C1N LRSASMSXMSNRBT-UHFFFAOYSA-N 0.000 description 3
- 108091093088 Amplicon Proteins 0.000 description 3
- 241001244729 Apalis Species 0.000 description 3
- 102000004190 Enzymes Human genes 0.000 description 3
- 108090000790 Enzymes Proteins 0.000 description 3
- 238000012408 PCR amplification Methods 0.000 description 3
- 238000003559 RNA-seq method Methods 0.000 description 3
- 230000002159 abnormal effect Effects 0.000 description 3
- 238000000137 annealing Methods 0.000 description 3
- 230000005540 biological transmission Effects 0.000 description 3
- 238000009534 blood test Methods 0.000 description 3
- 210000001124 body fluid Anatomy 0.000 description 3
- 210000004027 cell Anatomy 0.000 description 3
- 230000032823 cell division Effects 0.000 description 3
- 238000013500 data storage Methods 0.000 description 3
- 230000003247 decreasing effect Effects 0.000 description 3
- 238000009826 distribution Methods 0.000 description 3
- 108010030074 endodeoxyribonuclease MluI Proteins 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 238000010348 incorporation Methods 0.000 description 3
- 238000011002 quantification Methods 0.000 description 3
- 210000003296 saliva Anatomy 0.000 description 3
- 210000004243 sweat Anatomy 0.000 description 3
- RYVNIFSIEDRLSJ-UHFFFAOYSA-N 5-(hydroxymethyl)cytosine Chemical compound NC=1NC(=O)N=CC=1CO RYVNIFSIEDRLSJ-UHFFFAOYSA-N 0.000 description 2
- 102100032388 Apical junction component 1 homolog Human genes 0.000 description 2
- 102100028266 Brain-specific angiogenesis inhibitor 1-associated protein 2-like protein 2 Human genes 0.000 description 2
- 102100025488 CUGBP Elav-like family member 4 Human genes 0.000 description 2
- 102100025659 Cadherin EGF LAG seven-pass G-type receptor 1 Human genes 0.000 description 2
- 208000005623 Carcinogenesis Diseases 0.000 description 2
- 102100023510 Chloride intracellular channel protein 3 Human genes 0.000 description 2
- 102100038446 Claudin-5 Human genes 0.000 description 2
- 208000035473 Communicable disease Diseases 0.000 description 2
- 102100025620 Cytochrome b-245 light chain Human genes 0.000 description 2
- 238000001712 DNA sequencing Methods 0.000 description 2
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 2
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 2
- 102000015968 Dact3 Human genes 0.000 description 2
- 108050004246 Dact3 Proteins 0.000 description 2
- 102100037832 Docking protein 1 Human genes 0.000 description 2
- 238000002965 ELISA Methods 0.000 description 2
- 102100035218 Epidermal growth factor receptor kinase substrate 8-like protein 2 Human genes 0.000 description 2
- 102100020855 Forkhead box protein E3 Human genes 0.000 description 2
- 102100033043 G-protein coupled receptor 62 Human genes 0.000 description 2
- 102100022626 Glutamate receptor ionotropic, NMDA 2D Human genes 0.000 description 2
- 102100031341 Golgi apparatus membrane protein TVP23 homolog A Human genes 0.000 description 2
- 102100035786 Guanine nucleotide-binding protein G(I)/G(S)/G(O) subunit gamma-7 Human genes 0.000 description 2
- 102100034052 Heat shock factor protein 5 Human genes 0.000 description 2
- 101000797924 Homo sapiens Apical junction component 1 homolog Proteins 0.000 description 2
- 101000935881 Homo sapiens Brain-specific angiogenesis inhibitor 1-associated protein 2-like protein 2 Proteins 0.000 description 2
- 101000914306 Homo sapiens CUGBP Elav-like family member 4 Proteins 0.000 description 2
- 101000914155 Homo sapiens Cadherin EGF LAG seven-pass G-type receptor 1 Proteins 0.000 description 2
- 101000906641 Homo sapiens Chloride intracellular channel protein 3 Proteins 0.000 description 2
- 101000882896 Homo sapiens Claudin-5 Proteins 0.000 description 2
- 101000856723 Homo sapiens Cytochrome b-245 light chain Proteins 0.000 description 2
- 101000805172 Homo sapiens Docking protein 1 Proteins 0.000 description 2
- 101001024566 Homo sapiens Ecto-ADP-ribosyltransferase 4 Proteins 0.000 description 2
- 101000876686 Homo sapiens Epidermal growth factor receptor kinase substrate 8-like protein 2 Proteins 0.000 description 2
- 101000931489 Homo sapiens Forkhead box protein E3 Proteins 0.000 description 2
- 101000871128 Homo sapiens G-protein coupled receptor 62 Proteins 0.000 description 2
- 101000972840 Homo sapiens Glutamate receptor ionotropic, NMDA 2D Proteins 0.000 description 2
- 101000795972 Homo sapiens Golgi apparatus membrane protein TVP23 homolog A Proteins 0.000 description 2
- 101001073247 Homo sapiens Guanine nucleotide-binding protein G(I)/G(S)/G(O) subunit gamma-7 Proteins 0.000 description 2
- 101001016871 Homo sapiens Heat shock factor protein 5 Proteins 0.000 description 2
- 101000960337 Homo sapiens Intercellular adhesion molecule 5 Proteins 0.000 description 2
- 101001026236 Homo sapiens Intermediate conductance calcium-activated potassium channel protein 4 Proteins 0.000 description 2
- 101001047043 Homo sapiens Kelch repeat and BTB domain-containing protein 11 Proteins 0.000 description 2
- 101000613958 Homo sapiens Lysine-specific demethylase 2A Proteins 0.000 description 2
- 101001043354 Homo sapiens Lysyl oxidase homolog 3 Proteins 0.000 description 2
- 101000634545 Homo sapiens Neuronal PAS domain-containing protein 3 Proteins 0.000 description 2
- 101000992104 Homo sapiens Obscurin-like protein 1 Proteins 0.000 description 2
- 101001067170 Homo sapiens Plexin-B2 Proteins 0.000 description 2
- 101001064864 Homo sapiens Polyunsaturated fatty acid lipoxygenase ALOX12 Proteins 0.000 description 2
- 101001047090 Homo sapiens Potassium voltage-gated channel subfamily H member 2 Proteins 0.000 description 2
- 101001088739 Homo sapiens Probable inactive ribonuclease-like protein 12 Proteins 0.000 description 2
- 101000958299 Homo sapiens Protein lyl-1 Proteins 0.000 description 2
- 101001076721 Homo sapiens RNA-binding protein 38 Proteins 0.000 description 2
- 101001099922 Homo sapiens Retinoic acid-induced protein 1 Proteins 0.000 description 2
- 101000733264 Homo sapiens Rho guanine nucleotide exchange factor 33 Proteins 0.000 description 2
- 101000581125 Homo sapiens Rho-related GTP-binding protein RhoF Proteins 0.000 description 2
- 101000616406 Homo sapiens SH2B adapter protein 2 Proteins 0.000 description 2
- 101000880777 Homo sapiens SH3 and cysteine-rich domain-containing protein 2 Proteins 0.000 description 2
- 101000653757 Homo sapiens Sphingosine 1-phosphate receptor 4 Proteins 0.000 description 2
- 101000664940 Homo sapiens Synaptogyrin-3 Proteins 0.000 description 2
- 101000626390 Homo sapiens Synaptotagmin-15 Proteins 0.000 description 2
- 101000714762 Homo sapiens Transmembrane protein 176A Proteins 0.000 description 2
- 101000714756 Homo sapiens Transmembrane protein 176B Proteins 0.000 description 2
- 101000644251 Homo sapiens Urotensin-2 receptor Proteins 0.000 description 2
- 101000976643 Homo sapiens Zinc finger protein ZIC 2 Proteins 0.000 description 2
- 102100039919 Intercellular adhesion molecule 5 Human genes 0.000 description 2
- 102100037441 Intermediate conductance calcium-activated potassium channel protein 4 Human genes 0.000 description 2
- 102100022827 Kelch repeat and BTB domain-containing protein 11 Human genes 0.000 description 2
- 102100040598 Lysine-specific demethylase 2A Human genes 0.000 description 2
- 102100021949 Lysyl oxidase homolog 3 Human genes 0.000 description 2
- 108091036059 MESTIT1 (gene) Proteins 0.000 description 2
- 241000124008 Mammalia Species 0.000 description 2
- 241001465754 Metazoa Species 0.000 description 2
- 108010018525 NFATC Transcription Factors Proteins 0.000 description 2
- 102000002673 NFATC Transcription Factors Human genes 0.000 description 2
- 102100029051 Neuronal PAS domain-containing protein 3 Human genes 0.000 description 2
- 102100031914 Obscurin-like protein 1 Human genes 0.000 description 2
- 108091034117 Oligonucleotide Proteins 0.000 description 2
- 208000002193 Pain Diseases 0.000 description 2
- 102100034383 Plexin-B2 Human genes 0.000 description 2
- 102100031949 Polyunsaturated fatty acid lipoxygenase ALOX12 Human genes 0.000 description 2
- 102100022807 Potassium voltage-gated channel subfamily H member 2 Human genes 0.000 description 2
- 102100038231 Protein lyl-1 Human genes 0.000 description 2
- 102100025859 RNA-binding protein 38 Human genes 0.000 description 2
- 102100038470 Retinoic acid-induced protein 1 Human genes 0.000 description 2
- 102100033205 Rho guanine nucleotide exchange factor 33 Human genes 0.000 description 2
- 102100027608 Rho-related GTP-binding protein RhoF Human genes 0.000 description 2
- 102100021789 SH2B adapter protein 2 Human genes 0.000 description 2
- 102100037722 SH3 and cysteine-rich domain-containing protein 2 Human genes 0.000 description 2
- 108091006595 SLC15A3 Proteins 0.000 description 2
- 108091027967 Small hairpin RNA Proteins 0.000 description 2
- 108020004459 Small interfering RNA Proteins 0.000 description 2
- 102100021485 Solute carrier family 15 member 3 Human genes 0.000 description 2
- 102100029803 Sphingosine 1-phosphate receptor 4 Human genes 0.000 description 2
- 102100038648 Synaptogyrin-3 Human genes 0.000 description 2
- 102100024613 Synaptotagmin-15 Human genes 0.000 description 2
- 102100036380 Transmembrane protein 176A Human genes 0.000 description 2
- 102100036387 Transmembrane protein 176B Human genes 0.000 description 2
- 102100020942 Urotensin-2 receptor Human genes 0.000 description 2
- 102100023492 Zinc finger protein ZIC 2 Human genes 0.000 description 2
- 230000001594 aberrant effect Effects 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 2
- 238000004364 calculation method Methods 0.000 description 2
- 230000036952 cancer formation Effects 0.000 description 2
- 231100000504 carcinogenesis Toxicity 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000011976 chest X-ray Methods 0.000 description 2
- 238000009109 curative therapy Methods 0.000 description 2
- 238000005520 cutting process Methods 0.000 description 2
- 238000004925 denaturation Methods 0.000 description 2
- 230000036425 denaturation Effects 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 238000002405 diagnostic procedure Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 230000014509 gene expression Effects 0.000 description 2
- 230000006607 hypermethylation Effects 0.000 description 2
- 208000026278 immune system disease Diseases 0.000 description 2
- 208000015181 infectious disease Diseases 0.000 description 2
- 238000007477 logistic regression Methods 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 238000002595 magnetic resonance imaging Methods 0.000 description 2
- 230000036210 malignancy Effects 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 108020004999 messenger RNA Proteins 0.000 description 2
- 125000002496 methyl group Chemical group [H]C([H])([H])* 0.000 description 2
- 108091043220 miR-3132 stem-loop Proteins 0.000 description 2
- 108091070501 miRNA Proteins 0.000 description 2
- 239000002679 microRNA Substances 0.000 description 2
- 230000009826 neoplastic cell growth Effects 0.000 description 2
- 238000007481 next generation sequencing Methods 0.000 description 2
- 230000002093 peripheral effect Effects 0.000 description 2
- 238000002600 positron emission tomography Methods 0.000 description 2
- KMUONIBRACKNSN-UHFFFAOYSA-N potassium dichromate Chemical compound [K+].[K+].[O-][Cr](=O)(=O)O[Cr]([O-])(=O)=O KMUONIBRACKNSN-UHFFFAOYSA-N 0.000 description 2
- 239000000092 prognostic biomarker Substances 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 238000006467 substitution reaction Methods 0.000 description 2
- 238000012706 support-vector machine Methods 0.000 description 2
- 230000004083 survival effect Effects 0.000 description 2
- 208000024891 symptom Diseases 0.000 description 2
- 239000000107 tumor biomarker Substances 0.000 description 2
- 230000005760 tumor suppression Effects 0.000 description 2
- 238000002604 ultrasonography Methods 0.000 description 2
- 102000017907 ADRA1D Human genes 0.000 description 1
- 208000000044 Amnesia Diseases 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- 102100021390 C-terminal-binding protein 1 Human genes 0.000 description 1
- 108090000994 Catalytic RNA Proteins 0.000 description 1
- 102000053642 Catalytic RNA Human genes 0.000 description 1
- 108091026890 Coding region Proteins 0.000 description 1
- RYGMFSIKBFXOCR-UHFFFAOYSA-N Copper Chemical compound [Cu] RYGMFSIKBFXOCR-UHFFFAOYSA-N 0.000 description 1
- 108091029523 CpG island Proteins 0.000 description 1
- 230000004544 DNA amplification Effects 0.000 description 1
- 239000003155 DNA primer Substances 0.000 description 1
- 230000033616 DNA repair Effects 0.000 description 1
- 102100031780 Endonuclease Human genes 0.000 description 1
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 1
- 108700024394 Exon Proteins 0.000 description 1
- 241000233866 Fungi Species 0.000 description 1
- 208000031448 Genomic Instability Diseases 0.000 description 1
- 102000000340 Glucosyltransferases Human genes 0.000 description 1
- 108010055629 Glucosyltransferases Proteins 0.000 description 1
- 102100031019 Helicase with zinc finger domain 2 Human genes 0.000 description 1
- 241000282412 Homo Species 0.000 description 1
- 101000689696 Homo sapiens Alpha-1D adrenergic receptor Proteins 0.000 description 1
- 101001083766 Homo sapiens Helicase with zinc finger domain 2 Proteins 0.000 description 1
- 101000663187 Homo sapiens Scavenger receptor class F member 2 Proteins 0.000 description 1
- 101000651178 Homo sapiens Striated muscle preferentially expressed protein kinase Proteins 0.000 description 1
- 208000026350 Inborn Genetic disease Diseases 0.000 description 1
- 108091092195 Intron Proteins 0.000 description 1
- 208000026139 Memory disease Diseases 0.000 description 1
- 206010027476 Metastases Diseases 0.000 description 1
- 108060004795 Methyltransferase Proteins 0.000 description 1
- 102000016397 Methyltransferase Human genes 0.000 description 1
- 241001529936 Murinae Species 0.000 description 1
- 206010028813 Nausea Diseases 0.000 description 1
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 1
- 108091028043 Nucleic acid sequence Proteins 0.000 description 1
- 108700020796 Oncogene Proteins 0.000 description 1
- 206010036790 Productive cough Diseases 0.000 description 1
- 239000013614 RNA sample Substances 0.000 description 1
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 1
- 108091028733 RNTP Proteins 0.000 description 1
- 208000035977 Rare disease Diseases 0.000 description 1
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 102100037076 Scavenger receptor class F member 2 Human genes 0.000 description 1
- 101100495925 Schizosaccharomyces pombe (strain 972 / ATCC 24843) chr3 gene Proteins 0.000 description 1
- 108020004682 Single-Stranded DNA Proteins 0.000 description 1
- DWAQJAXMDSEUJJ-UHFFFAOYSA-M Sodium bisulfite Chemical compound [Na+].OS([O-])=O DWAQJAXMDSEUJJ-UHFFFAOYSA-M 0.000 description 1
- 102100027659 Striated muscle preferentially expressed protein kinase Human genes 0.000 description 1
- 108020004566 Transfer RNA Proteins 0.000 description 1
- 108700025716 Tumor Suppressor Genes Proteins 0.000 description 1
- 102000044209 Tumor Suppressor Genes Human genes 0.000 description 1
- 108010040002 Tumor Suppressor Proteins Proteins 0.000 description 1
- 102000001742 Tumor Suppressor Proteins Human genes 0.000 description 1
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical group O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 1
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 208000027418 Wounds and injury Diseases 0.000 description 1
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 1
- 238000000246 agarose gel electrophoresis Methods 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 239000011324 bead Substances 0.000 description 1
- JJWKPURADFRFRB-UHFFFAOYSA-N carbonyl sulfide Chemical compound O=C=S JJWKPURADFRFRB-UHFFFAOYSA-N 0.000 description 1
- 230000010261 cell growth Effects 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 210000000349 chromosome Anatomy 0.000 description 1
- 238000003759 clinical diagnosis Methods 0.000 description 1
- 238000010367 cloning Methods 0.000 description 1
- 239000002299 complementary DNA Substances 0.000 description 1
- 230000021615 conjugation Effects 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/112—Disease subtyping, staging or classification
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/154—Methylation markers
Definitions
- Prostate cancer is the second most common form of cancer in men worldwide, accounting for about 15% of the cancers diagnosed in men.
- PSA prostate-specific antigen
- PSA is not a cancer-specific biomarker, and its level often increases abnormally in prostate-benign patients.
- only a small minority of men with an elevated PSA level are actually found to have prostate cancer when a biopsy is performed. Therefore, there is scant evidence to establish that PSA screening for prostate cancer can save lives.
- Biopsy is the only method for prostate cancer diagnosis, but its high false-negative rate can leave a significant percentage of men undiagnosed after the first biopsy.
- Markers of prostate cancer, such as tumor stage, Gleason score, and PSA level, cannot accurately identify the individuals who will ultimately fail treatment.
- the present disclosure provides methods, systems, and kits for detecting prostate cancers by processing nucleic acids from biological samples (e.g., tissue samples and/or bodily fluid samples) obtained from or derived from a subject.
- Biological samples obtained from subjects may be analyzed to measure a presence, absence, or relative assessment of the prostate cancer.
- the analysis may be performed at a set of genomic regions, such as a panel of DNA methylation biomarker regions.
- the subjects may include subjects with prostate cancer (e.g., prostate cancer patients) and subjects without prostate cancer (e.g., normal or healthy controls).
- the present disclosure provides a method for processing or analyzing a plurality of deoxyribonucleic acid (DNA) molecules from a biological sample of a subject, comprising: (a) providing a first set of DNA fragments derived from a first portion of said plurality of DNA molecules upon subjecting said first portion of said plurality of DNA molecules to fragmentation conditions sufficient to fragment at least a subset of said first portion of said plurality of DNA molecules at one or more CpG sites, wherein at least a subset of said first set of DNA fragments comprises methylated nucleic acid bases; (b) providing a second set of DNA fragments derived from a second portion of said plurality of DNA molecules, wherein said second portion of said plurality of DNA molecules is not subjected to fragmentation conditions; (c) for a genomic region, processing (i) said first set of DNA fragments or derivatives thereof to yield a first quantitative measure of DNA methylation and (ii) said second set of DNA fragments or derivatives thereof to yield a second quantitative measure of DNA methyl
- said biological sample is obtained or derived from a tissue sample, a blood sample, a plasma sample, a serum sample, an exosome sample, a urine sample, a sweat sample, or a saliva sample.
- the method further comprises performing an assay selected from the group consisting of methylation-sensitive restriction enzyme (MSRE) digestion, polymerase chain reaction (PCR), quantitative PCR (qPCR), nucleic acid sequencing, target capture, mass spectrometry-based target fragmentation assay, flap endonuclease-based assay, CRISPR-based assay, methylation-specific assay comprising bisulfite treatment, methylation-specific PCR, targeted sequencing, targeted bisulfite sequencing, pyrosequencing, mass spectroscopy-based bisulfite sequencing (EpiTYPER), reduced representation bisulfite sequencing (RRBS), whole genome sequencing (WGS), and a combination thereof.
- MSRE methylation-sensitive restriction enzyme
- PCR polymerase chain reaction
- qPCR quantitative PCR
- said fragmentation conditions comprise MSRE digestion of said first portion of said plurality of DNA molecules to fragment said at least said subset of said first portion of said plurality of DNA molecules at said one or more CpG sites.
- said MSRE is selected from the group consisting of AatII, Acc65I, AccI, AciI, AclI, AfeI, AgeI, ApaI, ApaLI, AscI, AsiSI, AvaI, AvaII, AoxI, BaeI, BanI, BbeI, BceAI, BcgI, BfuCI, BglI, BisI, BlsI, BmgBI, BsaAI, BsaBI, BsaHI, BsaI, BseYI, BsiEI, BsiWI, BslI, BsmAI, BsmBI, BsmFI, BspDI, BsrBI, BsrFI, BssHII, BssKI, BstAPI, B
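The principle behind MSRE fragmentation can be illustrated in silico. The sketch below is a simplified model, not the disclosed protocol: it assumes a single enzyme with the recognition site CCGG (the well-known site of HpaII, chosen here only as an example), assumes cleavage is blocked whenever the cytosine of the site's CpG is methylated, and uses hypothetical function and parameter names.

```python
def msre_digest(sequence, methylated_positions, site="CCGG", cut_offset=1):
    """Return the fragments left after a simulated MSRE digestion.

    sequence             -- DNA sequence (one strand, 5'->3')
    methylated_positions -- 0-based indices of methylated cytosines
    site                 -- recognition site (assumed to contain one CpG)
    cut_offset           -- cut position relative to the site start (assumption)
    """
    seq = sequence.upper()
    blocked = set(methylated_positions)
    cpg_offset = site.index("CG")  # offset of the CpG cytosine within the site

    cuts = []
    start = seq.find(site)
    while start != -1:
        # The enzyme cuts only when the CpG cytosine is unmethylated.
        if start + cpg_offset not in blocked:
            cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)

    fragments = []
    previous = 0
    for cut in cuts:
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


# Toy sequence with two CCGG sites; the CpG cytosines sit at indices 3 and 9.
toy = "AACCGGTTCCGGAA"
unmethylated = msre_digest(toy, methylated_positions=[])
partly_methylated = msre_digest(toy, methylated_positions=[9])
fully_methylated = msre_digest(toy, methylated_positions=[3, 9])
```

In this model a fully methylated region survives digestion intact and remains amplifiable, whereas an unmethylated region is cut, which is what makes the digested versus undigested comparison informative.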
- processing said first set of DNA fragments or derivatives thereof in (c) (i) comprises subjecting said first set of DNA fragments or derivatives thereof to amplification, and wherein processing said second set of DNA fragments or derivatives thereof in (c) (ii) comprises subjecting said second set of DNA fragments or derivatives thereof to said amplification.
- said amplification comprises targeted quantitative polymerase chain reaction (qPCR) at said genomic region.
- processing said first set of DNA fragments or derivatives thereof in (c) (i) comprises determining a first cycle threshold (Ct) value for said amplification of said first set of DNA fragments or derivatives thereof at said genomic region, and wherein processing said second set of DNA fragments or derivatives thereof in (c) (ii) comprises determining a second cycle threshold (Ct) value for said amplification of said second set of DNA fragments or derivatives thereof at said genomic region.
- (c) comprises determining a reference Ct value for said amplification of said first set of DNA fragments or derivatives thereof and said second set of DNA fragments or derivatives thereof at a reference genomic region, and normalizing said first quantitative measure and said second quantitative measure using said reference Ct value.
- said normalizing comprises subtracting said reference Ct value from said first quantitative measure and said second quantitative measure.
- processing said first quantitative measure with said second quantitative measure in (d) comprises calculating an intensity ratio of said first quantitative measure and said second quantitative measure at said genomic region.
- calculating said intensity ratio comprises determining a difference between said first quantitative measure and said second quantitative measure at said genomic region.
- calculating said intensity ratio comprises determining an exponentiation of a base value and said determined difference at said genomic region.
- said base value is 2.
- calculating said intensity ratio comprises determining a reciprocal of said determined exponentiation at said genomic region.
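The Ct-based steps listed above (normalization against a reference locus, difference, exponentiation with base 2, reciprocal) can be sketched as follows. This is a minimal illustration under stated assumptions: the difference is taken as the normalized Ct of the digested portion minus that of the undigested portion, each portion has its own reference Ct, and all names are hypothetical rather than taken from the disclosure.

```python
def normalized_ct(ct, reference_ct):
    """Normalize a Ct value by subtracting the reference-locus Ct."""
    return ct - reference_ct


def intensity_ratio(ct_digested, ct_undigested,
                    ref_ct_digested, ref_ct_undigested, base=2.0):
    """Intensity ratio = 1 / base ** (first measure - second measure).

    first measure  -- normalized Ct of the digested (first) portion
    second measure -- normalized Ct of the undigested (second) portion
    """
    first = normalized_ct(ct_digested, ref_ct_digested)
    second = normalized_ct(ct_undigested, ref_ct_undigested)
    difference = first - second
    exponentiation = base ** difference
    return 1.0 / exponentiation


# Made-up Ct values for illustration only (not data from the disclosure).
# Digestion delays amplification by two cycles, so about 1/4 of the template
# at this locus survived digestion under this model.
ratio_partial = intensity_ratio(27.0, 25.0, 24.0, 24.0)
# No delay: the locus was fully protected from digestion.
ratio_protected = intensity_ratio(25.0, 25.0, 24.0, 24.0)
```

Under these assumptions a ratio near 1 suggests the region resisted digestion (consistent with methylation), and a ratio near 0 suggests it was largely cleaved.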
- the method further comprises subjecting said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof, to conditions sufficient to permit said methylated nucleic acid bases to be distinguished from unmethylated nucleic acid bases.
- subjecting said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof, to said conditions comprises performing bisulfite treatment on said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof.
- the method further comprises processing said methylation profile with one or more reference methylation profiles.
- said one or more reference methylation profiles are obtained from reference biological samples of one or more additional subjects.
- said one or more additional subjects comprise healthy subjects.
- said one or more additional subjects comprise subjects having a disease or disorder.
- said disease or disorder is a cancer.
- said disease or disorder is a cancer selected from the group consisting of pancreatic cancer, liver cancer, lung cancer, colorectal cancer, leukemia, bladder cancer, bone cancer, brain cancer, breast cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, melanoma, ovarian cancer, and prostate cancer.
- said cancer is prostate cancer.
- said genomic region comprises one or more CpG sites. In some embodiments, said genomic region comprises a plurality of CpG sites. In some embodiments, said plurality of CpG sites comprises at least about 10 CpG sites.
- said genomic region comprises one or more genes selected from the group consisting of SCGB3A1, ANKDD1B, C5orf49, C9orf3, and GPR75-ASB3. In some embodiments, said genomic region is selected from Table 1, Table 2, or SEQ ID NO:1-SEQ ID NO:276.
- said genomic region comprises at least about 10, at least about 20, at least about 30, at least about 40, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 120, at least about 140, at least about 160, at least about 180, at least about 200, at least about 220, at least about 240, or at least about 260 distinct genomic regions selected from Table 1, Table 2, or SEQ ID NO:1-SEQ ID NO:276. In some embodiments, said genomic region is selected from Table 2.
- the method further comprises electronically outputting a report indicative of said methylation profile.
- the method further comprises processing said methylation profile to generate a likelihood of said subject as having or being suspected of having a disease or disorder.
- said disease or disorder is a cancer.
- said disease or disorder is a cancer selected from the group consisting of pancreatic cancer, liver cancer, lung cancer, colorectal cancer, leukemia, bladder cancer, bone cancer, brain cancer, breast cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, melanoma, ovarian cancer, and prostate cancer.
- said cancer is prostate cancer.
- said likelihood is generated with a sensitivity of at least about 80%. In some embodiments, said likelihood is generated with a sensitivity of at least about 90%. In some embodiments, said likelihood is generated with a specificity of at least about 90%. In some embodiments, said likelihood is generated with a specificity of at least about 95%. In some embodiments, said likelihood is generated with an accuracy of at least about 90%. In some embodiments, said likelihood is generated with an accuracy of at least about 95%. In some embodiments, said likelihood is generated with an area under the curve (AUC) of at least about 0.90.
- AUC area under the curve
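For reference, the performance measures named above (sensitivity, specificity, accuracy, AUC) can be computed as in the sketch below. It is a generic illustration, not the evaluation procedure of the disclosure: labels are assumed to be 1 for prostate cancer and 0 for controls, both classes are assumed to be present, the threshold is arbitrary, and the function names are hypothetical.

```python
def classification_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity and accuracy at a fixed score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(labels),
    }


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up labels and likelihood scores, for illustration only.
example_labels = [1, 1, 1, 0, 0]
example_scores = [0.9, 0.8, 0.4, 0.5, 0.1]
example_metrics = classification_metrics(example_labels, example_scores)
example_auc = roc_auc(example_labels, example_scores)
```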
- said first set of DNA fragments and said second set of DNA fragments each comprises a first amount of external DNA molecules, wherein said external DNA molecules do not contain CpG sites.
- the present disclosure provides a method for processing or analyzing a plurality of deoxyribonucleic acid (DNA) molecules from a biological sample of a subject, comprising: (a) providing a first set of DNA fragments derived from a first portion of said plurality of DNA molecules upon subjecting said first portion of said plurality of DNA molecules to fragmentation conditions sufficient to fragment at least a subset of said first portion of said plurality of DNA molecules at one or more CpG sites, wherein at least a subset of said first set of DNA fragments comprises methylated nucleic acid bases; (b) providing a second set of DNA fragments derived from a second portion of said plurality of DNA molecules, wherein said second portion has a substantially equal amount of DNA as said first portion; (c) for a genomic region, processing (i) said first set of DNA fragments or derivatives thereof to yield a first quantitative measure of DNA methylation and (ii) said second set of DNA fragments or derivatives thereof to yield a second quantitative measure of DNA methylation; and (
- said biological sample is obtained or derived from a tissue sample, a blood sample, a plasma sample, a serum sample, an exosome sample, a urine sample, a sweat sample, or a saliva sample.
- the method further comprises performing an assay selected from the group consisting of methylation-sensitive restriction enzyme (MSRE) digestion, polymerase chain reaction (PCR), quantitative PCR (qPCR), nucleic acid sequencing, target capture, mass spectrometry-based target fragmentation assay, flap endonuclease-based assay, CRISPR-based assay, methylation-specific assay comprising bisulfite treatment, methylation-specific PCR, targeted sequencing, targeted bisulfite sequencing, pyrosequencing, mass spectroscopy-based bisulfite sequencing (EpiTYPER), reduced representation bisulfite sequencing (RRBS), whole genome sequencing (WGS), and a combination thereof.
- said fragmentation conditions comprise MSRE digestion of said first portion of said plurality of DNA molecules to fragment said at least said subset of said first portion of said plurality of DNA molecules at said one or more CpG sites.
- said MSRE is selected from the group consisting of AatII, Acc65I, AccI, AciI, AclI, AfeI, AgeI, ApaI, ApaLI, AscI, AsiSI, AvaI, AvaII, AoxI, BaeI, BanI, BbeI, BceAI, BcgI, BfuCI, BglI, BisI, BlsI, BmgBI, BsaAI, BsaBI, BsaHI, BsaI, BseYI, BsiEI, BsiWI, BslI, BsmAI, BsmBI, BsmFI, BspDI, BsrBI, BsrFI, BssHII, BssKI, Bst
- processing said first set of DNA fragments or derivatives thereof in (c) (i) comprises subjecting said first set of DNA fragments or derivatives thereof to amplification, and wherein processing said second set of DNA fragments or derivatives thereof in (c) (ii) comprises subjecting said second set of DNA fragments or derivatives thereof to said amplification.
- said amplification comprises targeted quantitative polymerase chain reaction (qPCR) at said genomic region.
- processing said first set of DNA fragments or derivatives thereof in (c) (i) comprises determining a first cycle threshold (Ct) value for said amplification of said first set of DNA fragments or derivatives thereof at said genomic region, and wherein processing said second set of DNA fragments or derivatives thereof in (c) (ii) comprises determining a second cycle threshold (Ct) value for said amplification of said second set of DNA fragments or derivatives thereof at said genomic region.
- (c) comprises determining a reference Ct value for said amplification of said first set of DNA fragments or derivatives thereof and said second set of DNA fragments or derivatives thereof at a reference genomic region, and normalizing said first quantitative measure and said second quantitative measure using said reference Ct value.
- said normalizing comprises subtracting said reference Ct value from said first quantitative measure and said second quantitative measure.
- processing said first quantitative measure with said second quantitative measure in (d) comprises calculating an intensity ratio of said first quantitative measure and said second quantitative measure at said genomic region.
- calculating said intensity ratio comprises determining a difference between said first quantitative measure and said second quantitative measure at said genomic region.
- calculating said intensity ratio comprises determining an exponentiation of a base value and said determined difference at said genomic region.
- said base value is 2.
- calculating said intensity ratio comprises determining a reciprocal of said determined exponentiation at said genomic region.
- the method further comprises subjecting said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof, to conditions sufficient to permit said methylated nucleic acid bases to be distinguished from unmethylated nucleic acid bases.
- subjecting said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof, to said conditions comprises performing bisulfite treatment on said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof.
- the method further comprises processing said methylation profile with one or more reference methylation profiles.
- said one or more reference methylation profiles are obtained from reference biological samples of one or more additional subjects.
- said one or more additional subjects comprise healthy subjects.
- said one or more additional subjects comprise subjects having a disease or disorder.
- said disease or disorder is a cancer.
- said disease or disorder is a cancer selected from the group consisting of pancreatic cancer, liver cancer, lung cancer, colorectal cancer, leukemia, bladder cancer, bone cancer, brain cancer, breast cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, melanoma, ovarian cancer, and prostate cancer.
- said cancer is prostate cancer.
- said genomic region comprises one or more CpG sites. In some embodiments, said genomic region comprises a plurality of CpG sites. In some embodiments, said plurality of CpG sites comprises at least about 10 CpG sites.
- said genomic region comprises one or more genes selected from the group consisting of SCGB3A1, ANKDD1B, C5orf49, C9orf3, and GPR75-ASB3. In some embodiments, said genomic region is selected from Table 1, Table 2, or SEQ ID NO:1-SEQ ID NO:276.
- said genomic region comprises at least about 10, at least about 20, at least about 30, at least about 40, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 120, at least about 140, at least about 160, at least about 180, at least about 200, at least about 220, at least about 240, or at least about 260 distinct genomic regions selected from Table 1, Table 2, or SEQ ID NO:1-SEQ ID NO:276. In some embodiments, said genomic region is selected from Table 2.
- the method further comprises electronically outputting a report indicative of said methylation profile.
- the method further comprises processing said methylation profile to generate a likelihood of said subject as having or being suspected of having a disease or disorder.
- said disease or disorder is a cancer.
- said disease or disorder is a cancer selected from the group consisting of pancreatic cancer, liver cancer, lung cancer, colorectal cancer, leukemia, bladder cancer, bone cancer, brain cancer, breast cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, melanoma, ovarian cancer, and prostate cancer.
- said cancer is prostate cancer.
- said first set of DNA fragments and said second set of DNA fragments each comprises a first amount of external DNA molecules, wherein said external DNA molecules do not contain CpG sites.
- the present disclosure provides a method for identifying prostate cancer of a subject, comprising: (a) using a methylation assay to process a plurality of deoxyribonucleic acid (DNA) molecules from a biological sample of the subject to determine quantitative measures of methylation at each of one or more genes, thereby generating a DNA methylation signature of said biological sample of said subject, wherein said one or more genes comprise genes selected from the group consisting of SCGB3A1, ANKDD1B, C5orf49, C9orf3, and GPR75-ASB3; (b) comparing said DNA methylation signature with one or more reference DNA methylation signatures; and (c) based at least in part on the comparing in (b), identifying the prostate cancer of said subject.
- DNA deoxyribonucleic acid
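The comparison of a DNA methylation signature with reference signatures can be sketched as a nearest-reference lookup. This is only one possible comparison rule, assumed here for illustration (mean absolute difference across the five named genes); the disclosure does not specify this rule, the function names are hypothetical, and the numeric values are arbitrary placeholders, not measured data.

```python
GENES = ("SCGB3A1", "ANKDD1B", "C5orf49", "C9orf3", "GPR75-ASB3")


def mean_abs_distance(signature, reference):
    """Mean absolute difference between two signatures over GENES."""
    return sum(abs(signature[g] - reference[g]) for g in GENES) / len(GENES)


def closest_reference(signature, references):
    """Return the label of the reference signature nearest to `signature`."""
    return min(
        references,
        key=lambda label: mean_abs_distance(signature, references[label]),
    )


# Placeholder reference signatures (arbitrary values, for illustration only).
references = {
    "prostate_cancer": {g: 0.8 for g in GENES},
    "healthy": {g: 0.2 for g in GENES},
}
sample_signature = {g: 0.7 for g in GENES}
assigned_label = closest_reference(sample_signature, references)
```

A trained classifier over many regions could replace the distance rule; the lookup above only makes the "compare, then identify" structure of steps (b) and (c) concrete.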
- said biological sample is obtained or derived from a tissue sample, a blood sample, or a urine sample.
- said methylation assay comprises one or more assays selected from the group consisting of: methylation-sensitive restriction enzyme (MSRE) digestion, polymerase chain reaction (PCR), quantitative PCR (qPCR), nucleic acid sequencing, target capture, mass spectrometry-based target fragmentation assay, flap endonuclease-based assay, CRISPR-based assay, methylation-specific assay comprising bisulfite treatment, methylation-specific PCR, targeted sequencing, targeted bisulfite sequencing, pyrosequencing, mass spectroscopy-based bisulfite sequencing (EpiTYPER), reduced representation bisulfite sequencing (RRBS), whole genome sequencing (WGS), and a combination thereof.
- Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
- Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto.
- the computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
- FIG. 1 illustrates a flow-chart for a method 100 of prostate cancer identification in a subject, in accordance with disclosed embodiments.
- FIGS. 2A and 2B illustrate an example of quantitative polymerase chain reaction (qPCR) amplification plots for a control locus and two restriction loci tested in a healthy (prostate normal) sample (“N1-digested” and “N1-undigested”) and a prostate cancer sample (“T1-digested” and “T1-undigested”), respectively, in accordance with disclosed embodiments.
- ROC receiver operating characteristic
- FIG. 4 illustrates a computer system that is programmed or otherwise configured to implement methods provided herein.
- the term “a nucleic acid” includes a plurality of nucleic acids, including mixtures thereof.
- the term “nucleic acid” generally refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides (dNTPs) or ribonucleotides (rNTPs), or analogs thereof. Nucleic acids may have any three-dimensional structure, and may perform any function, known or unknown.
- dNTPs deoxyribonucleotides
- rNTPs ribonucleotides
- Non-limiting examples of nucleic acids include DNA, RNA, coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant nucleic acids, branched nucleic acids, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
- a nucleic acid may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be made before or after assembly of the nucleic acid.
- the sequence of nucleotides of a nucleic acid may be interrupted by non-nucleotide components.
- a nucleic acid may be further modified after polymerization, such as by conjugation or binding with a reporter agent.
- the term “target nucleic acid” generally refers to a nucleic acid molecule in a starting population of nucleic acid molecules having a nucleotide sequence whose presence, amount, and/or sequence, or changes in one or more of these, are desired to be determined.
- a target nucleic acid may be any type of nucleic acid, including DNA, RNA, and analogs thereof.
- a “target ribonucleic acid (RNA)” generally refers to a target nucleic acid that is RNA.
- a “target deoxyribonucleic acid (DNA)” generally refers to a target nucleic acid that is DNA.
- the term “target” generally refers to a genomic region within a marker gene or marker region.
- the term “reference” generally refers to a sample obtained or derived from a subject who is diagnosed with prostate cancer (prostate cancer patient) or who has received a negative clinical indication of prostate cancer (e.g., a healthy or control subject without prostate cancer).
- the terms “locus” and “region” are generally interchangeable and refer to a specific genomic region on the genome represented by chromosome number, start position, and end position.
- the term “subject” generally refers to an entity or a medium that has testable or detectable genetic information.
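A locus defined by chromosome, start position, and end position maps naturally onto a small record type. The sketch below is an illustration with assumptions not stated in the source: coordinates are treated as 0-based and half-open, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """A genomic region: chromosome plus start and end positions.

    Coordinates are assumed to be 0-based, half-open [start, end).
    """
    chromosome: str
    start: int
    end: int

    def __post_init__(self):
        if self.start < 0 or self.end <= self.start:
            raise ValueError("require 0 <= start < end")

    def length(self):
        """Number of bases covered by the locus."""
        return self.end - self.start

    def overlaps(self, other):
        """True if both loci share at least one base on the same chromosome."""
        return (
            self.chromosome == other.chromosome
            and self.start < other.end
            and other.start < self.end
        )


# Arbitrary coordinates, for illustration only.
region_a = Locus("chr5", 100, 200)
region_b = Locus("chr5", 150, 250)
region_c = Locus("chr9", 150, 250)
```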
- a subject can be a person or individual, such as a patient.
- a subject can be a vertebrate, such as, for example, a mammal.
- Non-limiting examples of mammals include murines, simians, humans, farm animals, sport animals, and pets.
- the term “sample” generally refers to a bodily sample or part(s) of a subject, which is obtained and analyzed to measure or to determine the character of the whole, such as a specimen of tissue, blood, or urine.
- the term “tumor suppressor genes” generally refers to a group of genes that direct the production of proteins that regulate cell division.
- the tumor suppressor protein can play a role in keeping cell division in check.
- when mutated or inactivated, a tumor suppressor gene may become unable to control cell division, leading to uncontrolled cell growth, an important mechanism in tumorigenesis.
- the term “biomarker” generally refers to any substance, structure, or process that can be measured in a subject's body or its products and be used to influence or predict a clinical outcome or disease with or without treatment, select an appropriate treatment (or predict whether treatment would be effective), or monitor a current treatment and potentially change the treatment.
- the term “methylation” refers to 5-methylcytosine (5mC) or 5-hydroxymethylcytosine (5hmC), including cytosine residues that are part of the sequence CG, also denoted as CpG dinucleotides (cytosine residues that are part of other sequences are not methylated). Some CG dinucleotides in the human genome are methylated, and others are not.
- methylation can be cell-specific and tissue-specific, such that a specific CG dinucleotide can be methylated in a certain cell and at the same time unmethylated in a different cell, or methylated in a certain tissue and at the same time unmethylated in different tissues.
- DNA methylation can be an important regulator of gene transcription. Aberrant DNA methylation patterns, both hypermethylation and hypomethylation, as compared to normal tissue, may be associated with a large number of human malignancies.
- 5hmC residues of a sequence may be subjected to glucosylation prior to subsequent bisulfite treatment and MSRE digestion. For example, the glucosylation may be performed using a glucosyltransferase.
- the terms “methylation state,” “methylation status,” and “methylation profile” generally refer to the presence or absence of one or more methylated nucleotide bases in the nucleic acid molecule.
- a nucleic acid molecule (e.g., a DNA molecule) containing a methylated cytosine is considered methylated (e.g., the methylation state of the nucleic acid molecule is methylated).
- a nucleic acid molecule that does not contain any methylated nucleotides is considered unmethylated.
- bisulfite treatment generally refers to the treatment of DNA with bisulfite that converts cytosine residues to uracil residues, but leaves 5-methylcytosine residues unaffected. Therefore, DNA that has been treated with bisulfite may retain only methylated cytosines.
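- The conversion rule described above can be sketched in code. This is a minimal illustration only, assuming a single DNA strand given as a string and a hypothetical set of 0-based positions that carry 5-methylcytosine; the function name and inputs are assumptions made for the example and are not part of the disclosure.

```python
def bisulfite_convert(sequence: str, methylated_positions: set) -> str:
    """Simulate bisulfite conversion on one DNA strand.

    Unmethylated cytosines become uracil ('U'); cytosines whose 0-based
    index is listed in `methylated_positions` are left as 'C'.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("U")  # unmethylated C is converted
        else:
            converted.append(base)  # 5mC and all other bases are unchanged
    return "".join(converted)


if __name__ == "__main__":
    # Hypothetical strand: the C at index 2 (in a CG context) is methylated.
    print(bisulfite_convert("ACCGTCGA", {2}))
```

- In this toy strand only the cytosine marked as methylated survives as C, which is what lets downstream PCR or sequencing distinguish the two states.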
- the term “pyrosequencing” generally refers to a sequencing-by-synthesis method that quantitatively monitors the real-time incorporation of nucleotides through the enzymatic conversion of released pyrophosphate into a proportional light signal.
- Analysis of DNA methylation patterns by pyrosequencing may combine a simple reaction protocol with reproducible and accurate measures of the degree of methylation at several CpGs in close proximity with high quantitative resolution. After bisulfite treatment and PCR, the degree of methylation at each CpG position in a sequence may be determined from the ratio of T and C. The process of purification and sequencing can be repeated for the same template to analyze other CpGs in the same amplification product.
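- As a worked illustration of the T and C ratio, the sketch below assumes that after bisulfite treatment and PCR a methylated cytosine is read as C and an unmethylated cytosine is read as T, so the methylated fraction at one CpG is C / (C + T). The function name and the example signal values are hypothetical.

```python
def methylation_fraction(c_signal: float, t_signal: float) -> float:
    """Return the methylated fraction at one CpG from C and T signal levels."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("C and T signals must sum to a positive value")
    return c_signal / total


if __name__ == "__main__":
    # Hypothetical peak heights at a single CpG position.
    print(methylation_fraction(c_signal=30.0, t_signal=70.0))
```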
- the terms “amplifying” and “amplification” are used interchangeably and generally refer to generating one or more copies or “amplified product” of a nucleic acid.
- the term “DNA amplification” generally refers to generating one or more copies of a DNA molecule or “amplified DNA product.”
- the term “reverse transcription amplification” generally refers to the generation of deoxyribonucleic acid (DNA) from a ribonucleic acid (RNA) template via the action of a reverse transcriptase. Amplification may be performed by polymerase chain reaction (PCR), which is based on using DNA polymerase to synthesize new strands of DNA complementary to the initial template strands.
- PCR polymerase chain reaction
- This process for amplifying the target sequence may comprise introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase.
- the two primers may be complementary to their respective strands of the double-stranded target sequence.
- the mixture may be denatured and the primers may be annealed to their complementary sequences within the target molecule.
- the primers may be extended with a polymerase so as to form a new pair of complementary strands.
- the steps of denaturation, primer annealing, and polymerase extension can be repeated many times (e.g., denaturation, annealing and extension constitute one “cycle”; there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired target sequence.
- the length of the amplified segment of the desired target sequence may be determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter.
- the method is referred to as “polymerase chain reaction” (PCR). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified” and are “PCR products” or “amplicons.”
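- The effect of repeated cycling can be summarized numerically. The sketch below assumes idealized amplification in which each cycle multiplies the target copy number by (1 + efficiency), where an efficiency of 1.0 means perfect doubling; the function name and the efficiency parameter are illustrative assumptions, and real reactions plateau at later cycles.

```python
def copies_after_cycles(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: initial_copies * (1 + efficiency) ** cycles."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Ten starting copies after three perfect doubling cycles.
    print(copies_after_cycles(10, 3))
```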
- DNA template generally refers to the sample DNA that contains the target sequence.
- high temperature is applied to the original double-stranded DNA molecule to separate the strands from each other.
- primer generally refers to a short piece of single-stranded DNA that is complementary to the DNA template.
- the polymerase begins synthesizing new DNA from the end of the primer.
- Ct value generally refers to the number of cycles required for the fluorescent signal to cross a given cycle threshold (e.g., at which the signal exceeds a background level). Ct levels may be inversely proportional to the amount of target nucleic acid in a sample (e.g., the lower the Ct level of a given sample, the greater the amount of target nucleic acid in the sample).
- restriction enzyme generally refers to an enzyme that cuts DNA at or near specific recognition nucleotide sequences (e.g., restriction sites).
- methylation-sensitive restriction enzyme generally refers to a restriction endonuclease that cleaves its recognition sequence only if it is unmethylated (leaving methylated sites intact).
- the DNA cutting intensity of a “methylation-sensitive” restriction enzyme may depend on the methylation level of the specific sequence, where higher methylation levels lead to less digestion.
- control generally refers to a sequence from a human genome that does not contain the specific sequences required for methylation-sensitive restriction enzymes to cut.
- the term “external control” generally refers to a sequence from a non-human genome that does not contain a CG site.
- MSP methylation-specific PCR
- This assay may require modification of the genomic DNA by sodium bisulfite and two independent primer sets for PCR amplification, one pair designed to recognize the methylated versions of the bisulfite-modified sequence and the other pair designed to recognize the unmethylated versions of the bisulfite-modified sequence.
- the amplicons may be visualized using ethidium bromide staining following agarose gel electrophoresis. Amplicons of the expected size produced from either primer pair may be indicative of the presence of DNA in the original sample with the respective methylation status.
- Reduced representation bisulfite sequencing generally refers to an efficient and high-throughput technique for analyzing the genome-wide methylation profiles on a single nucleotide level.
- Reduced representation bisulfite sequencing may combine restriction enzymes and bisulfite sequencing to enrich for areas of the genome with a high CpG content.
- RRBS can reduce the number of nucleotides that need to be sequenced to 1% of the genome.
- the fragments that comprise the reduced genome may still include the majority of promoters, as well as regions such as repeated sequences that are difficult to profile using conventional bisulfite sequencing approaches.
- targeted (bisulfite) sequencing generally refers to an accurate, efficient, and economical technology for DNA methylation analysis of target regions, which may include a hybridization-based step on platforms containing pre-designed oligonucleotides (oligos) that capture the CpG islands, gene promoters, and other significant methylated regions, or a PCR-based step to amplify multiple bisulfite-converted DNA regions in a single reaction. Specific primers may be designed to capture the region of interest and evaluate site-specific DNA methylation changes.
- sensitivity generally refers to the percentage of a set of samples that report a DNA methylation value above a threshold value that distinguishes between neoplastic (e.g., prostate cancer) and non-neoplastic (e.g., healthy or control) samples.
- a positive is defined as a histology-confirmed neoplasia that reports a DNA methylation value above a threshold value (e.g., the range associated with disease)
- a false negative is defined as a histology-confirmed neoplasia that reports a DNA methylation value below the threshold value (e.g., the range associated with no disease).
- the value of sensitivity may reflect the probability that a DNA methylation measurement for a given marker obtained from a diseased sample falls in the range of disease-associated measurements.
- the clinical relevance of the calculated sensitivity value may represent an estimation of the probability that a given marker can detect or predict the presence of a clinical condition when applied to a subject having the clinical condition.
- the term “specificity” generally refers to the percentage of non-neoplastic samples that report a DNA methylation value below a threshold value that distinguishes between neoplastic and non-neoplastic samples.
- a negative is defined as a histology-confirmed non-neoplastic sample that reports a DNA methylation value below the threshold value (e.g., the range associated with no disease) and a false positive is defined as a histology-confirmed non-neoplastic sample that reports a DNA methylation value above the threshold value (e.g., the range associated with disease).
- the value of specificity may reflect the probability that a DNA methylation measurement for a given marker obtained from a non-neoplastic (e.g., healthy or control) sample falls in the range of non-disease associated measurements.
- the clinical relevance of the calculated specificity value may represent an estimation of the probability that a given marker can detect or predict the absence of a clinical condition when applied to a subject not having the clinical condition.
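- The definitions of sensitivity and specificity above can be sketched as a short calculation. This is an illustration under stated assumptions: the inputs are DNA methylation values from histology-confirmed neoplastic and non-neoplastic samples, a value strictly above the threshold is called positive, and a value at or below the threshold is called negative (the tie rule is an assumption). All names and example values are hypothetical.

```python
def sensitivity_specificity(neoplastic_values, non_neoplastic_values, threshold):
    """Return (sensitivity, specificity) as fractions for a given threshold."""
    if not neoplastic_values or not non_neoplastic_values:
        raise ValueError("both sample groups must be non-empty")
    # Neoplastic samples reporting a value above the threshold.
    true_positives = sum(1 for v in neoplastic_values if v > threshold)
    # Non-neoplastic samples reporting a value at or below the threshold.
    true_negatives = sum(1 for v in non_neoplastic_values if v <= threshold)
    sensitivity = true_positives / len(neoplastic_values)
    specificity = true_negatives / len(non_neoplastic_values)
    return sensitivity, specificity


if __name__ == "__main__":
    cancer = [0.9, 0.8, 0.4, 0.7]           # hypothetical methylation values
    control = [0.1, 0.2, 0.6, 0.3, 0.05]    # hypothetical methylation values
    print(sensitivity_specificity(cancer, control, threshold=0.5))
```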
- the term “AUC” or “AUROC” generally refers to an abbreviation for the area under a Receiver Operating Characteristic (ROC) curve.
- the ROC curve may be a plot of the true positive rate (TPR) against the false positive rate (FPR) for a plurality of different possible thresholds or cut points of a diagnostic test, thereby illustrating the trade-off between sensitivity and specificity depending on the selected cut point (e.g., any increase in sensitivity is accompanied by a decrease in specificity).
- the area under an ROC curve (AUC) can be a measure for the accuracy of a diagnostic test (e.g., the larger the area, the more accurate the diagnosis), with an optimal value of 1.
- a random test may have an ROC curve lying on the diagonal with an AUC of 0.5 (e.g., representing a random or worthless test).
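- The AUC can be computed without drawing the curve. The sketch below uses the rank-based identity that the area under the ROC curve equals the probability that a randomly chosen diseased sample scores higher than a randomly chosen non-diseased sample, with ties counted as one half. The function name and example scores are hypothetical, and this pairwise form is chosen for clarity rather than speed.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0   # positive sample ranked above negative sample
            elif p == n:
                wins += 0.5   # tie contributes half
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    print(auc_from_scores([0.9, 0.8, 0.4], [0.1, 0.5, 0.3]))
```

- A perfectly separating marker gives 1.0 and a marker whose scores are identical in both groups gives 0.5, matching the optimal and random cases described above.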
- Prostate cancer is the second most common form of cancer in men worldwide, accounting for about 15% of the cancers diagnosed in men.
- PSA prostate-specific antigen
- PSA is not a cancer-specific biomarker, and its level often increases abnormally in prostate-benign patients.
- only a small minority of men with an elevated PSA level are actually found to have prostate cancer when a biopsy is performed. Therefore, there is scant evidence to establish that PSA screening for prostate cancer can save lives.
- biopsy is the only method for prostate cancer diagnosis, but the high false negative rate of biopsy can lead to a significant percentage of men remaining undiagnosed after the first biopsy.
- markers of prostate cancer such as tumor stage, Gleason score, and PSA level cannot accurately identify the individuals who will ultimately fail treatment.
- DNA methylation can occur when DNA methyltransferase adds a methyl group to a DNA molecule at a cytosine-phosphate-guanine (CpG) site without changing the sequence of the DNA molecule.
- CpG cytosine-phosphate-guanine
- DNA methylation may be an early event during tumorigenesis, and global abnormal DNA methylation may be observed in different tumor types.
- cancer can be characterized by global hypomethylation (resulting in increased oncogene expression and genomic instability) and by gene-specific promoter hypermethylation resulting in suppressed DNA repair and other tumor-suppressive functions.
- DNA methylation may be stable in fixed samples over time and may be detectable in various bodily fluids and tissue. DNA methylation may also be cell-type specific. Further, various techniques for measuring DNA methylation can be performed. In light of all these characteristics, DNA methylation may be a promising target for the development of powerful diagnostic, prognostic, and predictive biomarkers for cancers.
- the present disclosure provides methods, systems, and kits for detecting prostate cancer in a subject by analyzing nucleic acids from biological samples (e.g., tissue samples and/or bodily fluid samples) obtained from or derived from the subject for abnormal methylation profiles (e.g., relative to reference samples or methylation profiles).
- Biological samples obtained from subjects may be analyzed to measure a presence, absence, or relative assessment of the prostate cancer.
- the analysis may be performed at a set of genomic regions, such as DNA methylation marker regions.
- the subjects may include subjects with prostate cancer (e.g., prostate cancer patients) and subjects without prostate cancer (e.g., normal or healthy controls).
- sample DNA obtained or derived from a test subject can be digested with at least one methylation-sensitive restriction enzyme.
- biomarkers of the present disclosure may include genomic loci that contain at least one specific MSRE recognized sequence (recognition site).
- the sample DNA can be cut (digested) according to its methylation level, where higher methylation results in less digestion by the enzyme. For example, if a DNA sample from a healthy subject is less methylated than another DNA sample from a cancer patient for the CpGs on the recognition sequence, it will be cut more extensively.
- a control locus is designed to be without MSRE cutting sites. In some embodiments, a fixed proportion of control DNA is added into the sample DNA for all test subjects. In some embodiments, at least one pair of qPCR primers is designed for each target genomic region of a biomarker. For each patient, two qPCR reactions are run independently on the same qPCR target: a first qPCR reaction is run on a first portion of the sample DNA that contains MSRE-digested DNA template, and a second qPCR reaction is run on a second portion of the sample DNA that contains undigested DNA templates. The undigested template may be used to represent the fully methylated DNA.
- the same amount of DNA may be used for the digested and undigested templates.
- the signal intensity of the qPCR reaction may be generated from the cycle threshold (Ct) values.
- Ct cycle threshold
- the Ct difference (delta Ct) between the first qPCR reaction (run on the digested DNA template) and the second qPCR reaction (run on the undigested DNA template) is calculated and used to indicate the DNA methylation level of the subject.
- the delta Ct value can represent the subject's DNA methylation level for the target region.
- the undigested DNA may have low Ct values, while the digested DNA from a normal individual may have high Ct values, thereby resulting in large absolute delta Ct values. In contrast, the delta Ct values from a prostate cancer patient may be small (e.g., close to 0).
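- The delta Ct comparison can be illustrated with a small calculation. The sketch assumes delta Ct is taken as the Ct of the MSRE-digested reaction minus the Ct of the undigested reaction for the same target region; the function name and the Ct values are hypothetical and are not measurements from the disclosure.

```python
def delta_ct(ct_digested: float, ct_undigested: float) -> float:
    """Ct of the MSRE-digested template minus Ct of the undigested template."""
    return ct_digested - ct_undigested


if __name__ == "__main__":
    # Hypothetical low-methylation sample: digestion removes most template,
    # so the digested reaction crosses the threshold many cycles later.
    print(delta_ct(ct_digested=35.0, ct_undigested=25.0))
    # Hypothetical highly methylated sample: the template resists digestion,
    # so both reactions cross the threshold at nearly the same cycle.
    print(delta_ct(ct_digested=25.5, ct_undigested=25.0))
```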
- prostate cancer can be accurately detected using a non-user-dependent assay with high sensitivity and specificity in prostate tissue samples.
- the blood-based assay can use a set of biomarkers that accurately distinguish prostate cancer samples from control samples across all stages of prostate cancer. Further, the blood-based assay may offer high specificity, thereby facilitating the non-invasive application of prostate cancer associated biomarkers for treatment monitoring of prostate cancer patients.
- the use of methods, systems, and kits of the present disclosure for prostate cancer detection based on analysis of aberrant methylation profiles may comprise the following steps:
- the present disclosure provides a method for identifying or monitoring prostate cancer in a subject by processing or analyzing DNA molecules from a biological sample of the subject.
- the method may comprise providing a first set of DNA fragments derived from a first portion of the DNA molecules upon subjecting the first portion to CpG site fragmentation conditions.
- DNA molecules of a urine sample may be split into two sub-samples, and the first DNA sub-sample may be MSRE-digested to fragment the DNA molecules at CpG sites.
- the two sub-samples may be of equal or substantially equal size (e.g., amount or volume).
- the method may comprise providing a second set of DNA fragments derived from a second portion of the DNA molecules, wherein the second portion is not subjected to fragmentation conditions.
- the method may comprise, for a genomic region, processing (i) the first set of DNA fragments or derivatives thereof to yield a first quantitative measure of DNA methylation and (ii) the second set of DNA fragments or derivatives thereof to yield a second quantitative measure of DNA methylation.
- the method may comprise processing the first quantitative measure with the second quantitative measure to yield a third quantitative measure of DNA methylation at the genomic region, to generate a methylation profile of the plurality of DNA molecules at the genomic region.
- FIG. 1 illustrates a flow-chart for a method 100 of prostate cancer identification in a subject, in accordance with disclosed embodiments.
- the method 100 may comprise obtaining a biological sample (e.g., tissue, blood, and/or urine sample) from a subject (e.g., a patient) (as in operation 102 ).
- DNA molecules may be extracted from the biological sample (as in operation 104 ).
- at least a first portion of the extracted DNA molecules may be subjected to CpG site fragmentation conditions, such as digestion with methylation-sensitive restriction enzymes (MSREs), while a second portion of the extracted DNA molecules may not be subjected to such fragmentation conditions (as in operation 106 ).
- qPCR amplification of at least one biomarker locus and an internal control locus may be performed (e.g., using qPCR primers) (as in operation 108 ).
- cycle threshold (Ct) values may be obtained for each amplified region of a set of genomic regions (e.g., prostate cancer associated biomarkers) and normalized based on the internal control (as in operation 110 ).
- a probability score may be calculated, which reflects the correlation between the biomarker signal intensity in the subject and tumor references and/or the correlation between the biomarker signal intensity in the subject and normal references (as in operation 114 ).
- the biological samples may be obtained (as in operation 102 ) or derived from a tissue sample, a blood sample, a plasma sample, a serum sample, a saliva sample, a sputum sample, a urine sample, a stool sample, a sweat sample, a Pap smear sample, or an exosome sample from a human subject.
- the biological samples may be stored in a variety of storage conditions before processing, such as different temperatures (e.g., at room temperature, under refrigeration or freezer conditions, at 4° C., at −18° C., at −20° C., or at −80° C.) or different preservatives (e.g., alcohol, formaldehyde, or potassium dichromate).
- the biological sample may be obtained from a subject with a disease or disorder, from a subject that is suspected of having the disease or disorder, or from a subject that does not have or is not suspected of having the disease or disorder.
- the disease or disorder may be an infectious disease, an immune disorder or disease, a cancer, a genetic disease, a degenerative disease, a lifestyle disease, an injury, a rare disease or an age related disease.
- the infectious disease may be caused by bacteria, viruses, fungi, and/or parasites.
- the cancer may be a prostate cancer.
- the sample may be taken before and/or after treatment of a subject with a disease or disorder. Samples may be taken before and/or after a treatment. Samples may be taken during a treatment or a treatment regime. Multiple samples may be taken from a subject to monitor the effects of the treatment over time.
- the sample may be taken from a subject known or suspected of having a prostate cancer for which a definitive positive or negative diagnosis is not available via clinical tests.
- the sample may be taken from a subject suspected of having a disease or a disorder.
- the sample may be taken from a subject experiencing unexplained symptoms, such as fatigue, nausea, weight loss, aches and pains, weakness, or memory loss.
- the sample may be taken from a subject having explained symptoms.
- the sample may be taken from a subject at risk of developing a disease or disorder due to factors such as familial history, age, environmental exposure, lifestyle risk factors, or presence of other known risk factors.
- the biological sample obtained from the subject may be assayed to generate methylation data indicative of a presence, absence, or relative assessment of a prostate cancer of a subject.
- a presence, absence, or relative assessment of nucleic acid molecules of the biological sample at a panel of prostate cancer-associated genomic loci e.g., quantitative measures of methylation at a plurality of prostate cancer-associated genomic loci
- the biological samples obtained or derived from the subject may be processed by (i) subjecting the biological sample to conditions that are sufficient to isolate, enrich, or extract a plurality of nucleic acid molecules (e.g., DNA molecules), and (ii) assaying the plurality of nucleic acid molecules to generate a methylation profile of the nucleic acid molecules at the panel of prostate cancer-associated genomic loci.
- a plurality of nucleic acid molecules may be extracted from the biological sample (as in operation 104 ) and subjected to further assaying (e.g., sequencing to generate a plurality of sequencing reads).
- the nucleic acid molecules may comprise deoxyribonucleic acid (DNA) or ribonucleic acid (RNA).
- the nucleic acid molecules (e.g., DNA or RNA) may be extracted from the biological sample by a variety of methods, such as a FastDNA Kit protocol from MP Biomedicals or a DNeasy Blood & Tissue Kit from QIAGEN.
- the extraction method may extract all DNA molecules from a sample.
- the extraction method may selectively extract a portion of DNA molecules from a sample. Extracted RNA molecules from a sample may be converted to DNA molecules by reverse transcription (RT).
- the method may comprise a variety of assays suitable for assessing the presence of DNA methylation (e.g., at one or more CpG sites) at the prostate cancer-specific markers in a biological sample.
- the DNA molecules may be assayed using an assay including, for example, methylation-sensitive restriction enzyme (MSRE) digestion, polymerase chain reaction (PCR), quantitative PCR (qPCR), digital PCR (dPCR), nucleic acid sequencing, target capture, mass spectrometry-based target fragmentation assay, flap endonuclease-based assay, CRISPR-based assay, methylation-specific assay comprising bisulfite treatment, methylation-specific PCR (MSP), COLD-PCR, targeted sequencing, targeted bisulfite sequencing, pyrosequencing, mass spectroscopy-based bisulfite sequencing (EpiTYPER), reduced representation bisulfite sequence (RRBS), whole genome sequencing (WGS), amplification fragment length polymorphism, amplification fragment length polymorphism (AF
- the assay may comprise restriction landmark genomic scanning and/or methylation-sensitive restriction enzyme (MSRE) digestion followed by quantitative PCR.
- the assay may utilize bisulfite treatment of DNA as a step of methylation analysis. After the bisulfite treatment, subsequent steps can include methylation-specific PCR (MSP), targeted sequencing, pyrosequencing, EpiTYPER, reduced representation sequencing, whole genome sequencing, whole genome bisulfite sequencing (WGBS), or a combination thereof. All these methods are prevented from being used to measure DNA methylation of a single/multiple CpG sites in current invented region on the human genome.
- the methylation-sensitive restriction enzyme is selected from the group consisting of AatII, Acc65I, AccI, AciI, AclI, AfeI, AgeI, ApaI, ApaLI, AscI, AsiSI, AvaI, AvaII, AoxI, BaeI, BanI, BbeI, BceAI, BcgI, BfuCI, BglI, BisI, BlsI, BmgBI, BsaAI, BsaBI, BsaHI, BsaI, BseYI, BsiEI, BsiWI, BslI, BsmAI, BsmBI, BsmFI, BspDI, BsrBI, BsrFI, BssHII, BssKI, BstAPI, BstBI, BstUI, BstZ17I, Cac8I, ClaI, DpnI, DrdI, EaeI, E
- the nucleic acid sequencing may be performed by any suitable sequencing method, such as massively parallel sequencing, paired-end sequencing, high-throughput sequencing, next-generation sequencing (NGS), shotgun sequencing, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, pyrosequencing, sequencing-by-synthesis (SBS), sequencing-by-ligation, sequencing-by-hybridization, and RNA-Seq (Illumina).
- the sequencing may comprise nucleic acid amplification (e.g., of DNA or RNA molecules).
- the nucleic acid amplification is polymerase chain reaction (PCR).
- a suitable number of rounds of PCR may be performed to sufficiently amplify an initial amount of nucleic acid (e.g., DNA) to a desired input quantity for subsequent sequencing.
- the PCR may be used for global amplification of nucleic acids. This may comprise using adapter sequences that may be first ligated to different molecules followed by PCR amplification using universal primers.
- PCR may be performed using any of a number of commercial kits, e.g., provided by Life Technologies, Affymetrix, Promega, Qiagen, etc.
- target nucleic acids within a population of nucleic acids may be amplified (e.g., one or more of the panel of prostate cancer biomarkers or prostate cancer-associated genomic loci).
- Specific primers, possibly in conjunction with adapter ligation, may be used to selectively amplify certain targets for downstream sequencing.
- the PCR may comprise targeted amplification of one or more genomic loci, such as genomic loci associated with prostate cancer (e.g., listed in databases such as TCGA or COSMIC).
- the sequencing may comprise use of simultaneous reverse transcription (RT) and polymerase chain reaction (PCR), such as a OneStep RT-PCR kit protocol by Qiagen, NEB, Thermo Fisher Scientific, or Bio-Rad.
- DNA or RNA molecules may be tagged, e.g., with identifiable tags, to allow for multiplexing of a plurality of samples. Any number of DNA or RNA samples may be multiplexed.
- a multiplexed reaction may contain DNA or RNA from at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more than 100 initial samples.
- a plurality of samples may be tagged with sample barcodes such that each DNA molecule may be traced back to the sample (and the subject) from which the DNA molecule originated.
- Such tags may be attached to DNA or RNA molecules by ligation or by PCR amplification with primers.
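- Tracing reads back to their sample of origin can be sketched as a simple demultiplexing step. The example assumes, purely for illustration, that each read begins with a fixed-length sample barcode that must match a known barcode exactly; the function name, barcode layout, and sequences are hypothetical, and real workflows typically tolerate mismatches and may place barcodes elsewhere.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using an exact-match barcode at the read start."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]  # sequence remaining after the tag
        sample = barcode_to_sample.get(barcode)
        if sample is not None:          # reads with unknown barcodes are dropped
            by_sample[sample].append(insert)
    return dict(by_sample)


if __name__ == "__main__":
    reads = ["ACGTTTGA", "TGCAGGCC", "ACGTCCAA", "NNNNAAAA"]
    barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
    print(demultiplex(reads, barcodes, barcode_length=4))
```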
- sequence reads may be aligned to one or more reference genomes (e.g., a genome of one or more species such as a human genome).
- the aligned sequence reads may be quantified at a panel of genomic loci to generate the data indicative of a distribution of the presence, absence, or relative assessment of the prostate cancer.
- quantification of sequences corresponding to a panel of genomic loci associated with prostate cancer may generate the methylation data indicative of the presence, absence, or relative assessment of the prostate cancer.
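- Quantifying aligned reads at a panel of loci can be sketched as an interval-overlap count. The example assumes each aligned read and each locus is described by a chromosome name and half-open start and end coordinates; the function name, locus names, and coordinates are hypothetical placeholders and do not correspond to the loci of Table 1.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def count_reads_per_locus(reads: List[Interval], loci: Dict[str, Interval]) -> Dict[str, int]:
    """Count aligned reads that overlap each locus in the panel."""
    counts = {name: 0 for name in loci}
    for chrom, start, end in reads:
        for name, (locus_chrom, locus_start, locus_end) in loci.items():
            # Two half-open intervals overlap when each starts before the other ends.
            if chrom == locus_chrom and start < locus_end and end > locus_start:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    reads = [("chr1", 100, 150), ("chr1", 140, 190), ("chr2", 100, 150), ("chr1", 500, 550)]
    panel = {"locus_A": ("chr1", 120, 160), "locus_B": ("chr2", 0, 100)}
    print(count_reads_per_locus(reads, panel))
```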
- the prostate cancer may be identified or monitored in the subject by using probes configured to selectively enrich nucleic acid (e.g., DNA or RNA) molecules corresponding to the panel of genomic loci (e.g., prostate cancer-associated genomic loci).
- the probes may be nucleic acid primers.
- the probes may have sequence complementarity with nucleic acid sequences (e.g., about 5 nucleotides, about 10 nucleotides, about 15 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 35 nucleotides, about 40 nucleotides, about 45 nucleotides, about 50 nucleotides, about 55 nucleotides, about 60 nucleotides, about 65 nucleotides, about 70 nucleotides, about 75 nucleotides, about 80 nucleotides, about 85 nucleotides, about 90 nucleotides, about 95 nucleotides, about 100 nucleotides, or more than about 100 nucleotides) from one or more of the individual genomic loci (e.g., prostate cancer-associated genomic loci).
- the one or more genomic loci may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, or more than about 100 distinct genomic loci (e.g., prostate cancer-associated genomic loci).
- the panel of genomic loci comprises one or more prostate cancer-associated genomic loci listed in Table 1.
- the biological sample may be processed without any nucleic acid extraction.
- the processing may comprise assaying the biological sample using probes that are selected for the panel of genomic loci (e.g., prostate cancer-associated genomic loci).
- the panel of genomic loci may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, or more than about 100 distinct genomic loci (e.g., prostate cancer-associated genomic loci).
- the panel of genomic loci comprises one or more prostate cancer-associated genomic loci listed in Table 1.
- the processing may comprise assaying the biological sample using probes that are selective for the one or more genomic loci (e.g., prostate cancer-associated genomic loci) among other genomic loci in the biological sample.
- the probes may be nucleic acid molecules (e.g., DNA or RNA) having sequence complementarity with nucleic acid sequences (e.g., about 5 nucleotides, about 10 nucleotides, about 15 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 35 nucleotides, about 40 nucleotides, about 45 nucleotides, about 50 nucleotides, about 55 nucleotides, about 60 nucleotides, about 65 nucleotides, about 70 nucleotides, about 75 nucleotides, about 80 nucleotides, about 85 nucleotides, about 90 nucleotides, about 95 nucleotides, about 100 nucleotides, or more than about 100
- nucleic acid molecules may be primers or enrichment sequences.
- the assaying of the biological sample using probes that are selected for the one or more genomic loci may comprise use of array hybridization, polymerase chain reaction (PCR), or nucleic acid sequencing (e.g., DNA sequencing or RNA sequencing).
- the assay readouts may be quantified at one or more of the panel of genomic loci (e.g., prostate cancer-associated genomic loci) to generate the methylation data indicative of a presence, absence, or relative assessment of the prostate cancer.
- quantification of array hybridization or polymerase chain reaction (PCR) corresponding to the panel of prostate cancer-associated genomic loci may generate methylation data at the panel of prostate cancer-associated genomic loci in the biological sample.
- Assay readouts may comprise quantitative PCR (qPCR) values, digital PCR (dPCR) values, digital droplet PCR (ddPCR) values, fluorescence values, etc.
- Cycle threshold (Ct) values may be obtained for each amplified region of a set of genomic regions (e.g., prostate cancer associated biomarkers) and normalized based on the external control locus (as in operation 112 ). For example, a first cycle threshold (Ct) value may be determined for the amplification of a set of digested DNA fragments or derivatives thereof at one or more genomic regions, and a second cycle threshold (Ct) value may be determined for the amplification of a set of undigested DNA molecules or derivatives thereof at the one or more genomic regions.
- the undigested template may be used to represent the fully methylated DNA. After the purification of the MSRE digestion, the same amount of DNA may be used for the qPCR analysis of the digested and undigested templates.
- Reference Ct values may be generated based on the external control locus for (i) the amplification of the set of digested DNA fragments or derivatives thereof at the one or more genomic regions, and for (ii) the amplification of the set of undigested DNA molecules or derivatives thereof at the one or more genomic regions. Then, the first Ct value (for amplification of digested DNA) and the second Ct value (for amplification of undigested DNA) can be normalized using the difference of the internal control gene's Ct values before and after the digestion (delta Ct c ). The normalization may comprise subtracting the delta Ct c value from the difference between the first Ct value (for amplification of digested DNA) and the second Ct value (for amplification of undigested DNA).
- the Ct difference (delta Ct) between the first qPCR reaction (run on the digested DNA template) and the second qPCR reaction (run on the undigested DNA template) is calculated and used to indicate the DNA methylation level of the subject.
- the delta Ct value can represent the subject's DNA methylation level for the target region.
- the undigested DNA may have low Ct values, while the digested DNA from a normal individual may have high Ct values, thereby resulting in large absolute delta Ct values. In contrast, the delta Ct values from a prostate cancer patient may be small (e.g., close to 0).
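The delta Ct normalization described above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function and argument names are hypothetical, the Ct values are invented for the example, and the sign convention for delta Ct c (digested minus undigested) is an assumption, since the text only says "before and after the digestion."

```python
def normalized_delta_ct(ct_digested, ct_undigested, ctrl_digested, ctrl_undigested):
    """Return the control-normalized delta Ct for one target region.

    All names are hypothetical. Sign convention (an assumption): every
    difference is taken as digested minus undigested.
    """
    # Difference between the first Ct (digested) and the second Ct (undigested).
    delta_ct = ct_digested - ct_undigested
    # Control locus Ct shift across digestion (delta Ct c).
    delta_ct_c = ctrl_digested - ctrl_undigested
    # Subtract the control shift from the target shift.
    return delta_ct - delta_ct_c


# Invented example values, for illustration only.
large_shift = normalized_delta_ct(33.0, 25.0, 26.5, 26.0)   # large delta Ct
small_shift = normalized_delta_ct(25.75, 25.0, 26.5, 26.0)  # delta Ct near 0
print(large_shift, small_shift)
```

Under the description above, a large value would correspond to the pattern attributed to a normal individual and a value near 0 to the pattern attributed to a prostate cancer patient.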
- the qPCR signal intensity may be calculated for the biomarker region from the cycle threshold (Ct) values (as in operation 114 ).
- the signal intensity can be given by $2^{[Ct_{\text{biomarker restriction locus}} - Ct_{\text{internal control locus}}]}$.
- An intensity ratio may be calculated using the first Ct value (for amplification of digested DNA) and the second Ct value (for amplification of undigested DNA), such as by determining the reciprocal of an exponentiation of (i) a base value (e.g., 2, 10, or e) and (ii) a difference between the first Ct value and the second Ct value.
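The two quantities above can be sketched in Python as follows. This is a hedged illustration: the function names are hypothetical, the Ct values are invented, and the signal intensity assumes the exponent is the biomarker Ct minus the control Ct.

```python
import math


def signal_intensity(ct_biomarker, ct_control, base=2.0):
    """Signal intensity as base ** (Ct biomarker - Ct control); names are hypothetical."""
    return base ** (ct_biomarker - ct_control)


def intensity_ratio(ct_digested, ct_undigested, base=2.0):
    """Reciprocal of base raised to (first Ct - second Ct)."""
    return 1.0 / (base ** (ct_digested - ct_undigested))


# Invented Ct values, for illustration only.
print(signal_intensity(27.0, 25.0))          # exponent of +2 with base 2
print(intensity_ratio(28.0, 25.0))           # digested template is 3 cycles later
print(intensity_ratio(25.0, 25.0))           # no Ct shift after digestion
print(intensity_ratio(26.0, 25.0, math.e))   # same calculation with base e
```

With base 2 and the common assumption that template roughly doubles each cycle, the intensity ratio can be read as the approximate fraction of template that survived digestion; that reading is an interpretation, not a statement from the text.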
- a likelihood (e.g., a probability score) may be calculated, which reflects the correlation between the biomarker signal intensity in the subject and tumor references and/or the correlation between the biomarker signal intensity in the subject and normal references (as in operation 116 ).
- a likelihood or probability score may be determined using a classifier, as described herein.
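One way to picture a correlation-based score is sketched below. The text does not specify how the correlations are combined, so everything here is an assumption made for illustration: the use of Pearson correlation, the reference profiles (one invented intensity value per locus), and the hypothetical `tumor_likeness` mapping onto the range 0 to 1.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def tumor_likeness(subject, tumor_ref, normal_ref):
    """Hypothetical score in [0, 1]: higher when the subject's per-locus
    intensities correlate with the tumor reference more than with the normal one."""
    r_tumor = pearson(subject, tumor_ref)
    r_normal = pearson(subject, normal_ref)
    # Each correlation lies in [-1, 1], so the difference lies in [-2, 2].
    return (r_tumor - r_normal + 2.0) / 4.0


# Invented per-locus intensities, for illustration only.
subject = [0.9, 0.8, 0.1, 0.7]
tumor_reference = [0.8, 0.9, 0.2, 0.6]
normal_reference = [0.1, 0.2, 0.9, 0.3]
score = tumor_likeness(subject, tumor_reference, normal_reference)
print(round(score, 3))
```

A trained classifier, as mentioned in the text, could replace this hand-built mapping.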
- kits for identifying or monitoring a prostate cancer in a subject may comprise probes for identifying a presence, absence, or relative amount of sequences at the panel of prostate cancer-associated genomic loci in a biological sample of the subject, which may be indicative of a prostate cancer.
- the probes may be selective for the sequences at the panel of prostate cancer-associated genomic loci in the biological sample.
- a kit may comprise instructions for using the probes to process the biological sample to generate methylation data at the panel of prostate cancer-associated genomic loci in a biological sample of the subject.
- the probes in the kit may be selective for the sequences at the plurality of prostate cancer-associated genomic loci in the biological sample.
- the probes in the kit may be configured to selectively enrich nucleic acid (e.g., DNA or RNA) molecules corresponding to the panel of prostate cancer-associated genomic loci.
- the probes in the kit may be nucleic acid primers.
- the probes in the kit may have sequence complementarity with nucleic acid sequences from one or more of the prostate cancer-associated genomic loci.
- the one or more genomic loci may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, or more than about 100 distinct genomic loci (e.g., prostate cancer-associated genomic loci).
- the one or more genomic loci comprise one or more prostate cancer-associated genomic loci listed in Table 1.
- the instructions in the kit may comprise instructions to assay the biological sample using the probes that are selective for the sequences at the panel of prostate cancer-associated genomic loci in the biological sample.
- the probes may be nucleic acid molecules (e.g., DNA or RNA) having sequence complementarity with nucleic acid sequences (e.g., about 5 nucleotides, about 10 nucleotides, about 15 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 35 nucleotides, about 40 nucleotides, about 45 nucleotides, about 50 nucleotides, about 55 nucleotides, about 60 nucleotides, about 65 nucleotides, about 70 nucleotides, about 75 nucleotides, about 80 nucleotides, about 85 nucleotides, about 90 nucleotides, about 95 nucleotides, about 100 nucleotides, or more than about 100 nucleotides) from
- the instructions to assay the biological sample may comprise instructions to perform array hybridization, polymerase chain reaction (PCR), or nucleic acid sequencing (e.g., DNA sequencing or RNA sequencing) to process the biological sample to generate methylation data indicative of a presence, absence, or relative amount of sequences at the panel of prostate cancer-associated genomic loci in the biological sample, which may be indicative of a prostate cancer.
- the instructions in the kit may comprise instructions to measure and interpret assay readouts, which may be quantified at one or more of the panel of prostate cancer-associated genomic loci to generate the methylation data indicative of a presence, absence, or relative amount of sequences at the panel of prostate cancer-associated genomic loci in the biological sample.
- quantification of array hybridization or polymerase chain reaction (PCR) corresponding to the panel of prostate cancer-associated genomic loci may generate methylation data at the panel of prostate cancer-associated genomic loci in the biological sample.
- Assay readouts may comprise quantitative PCR (qPCR) values, digital PCR (dPCR) values, digital droplet PCR (ddPCR) values, fluorescence values, etc., or normalized values thereof.
- a classifier may be used to process the methylation data at the panel of prostate cancer-associated genomic loci to classify the biological sample, thereby identifying or assessing a prostate cancer of the subject.
- the classifier may be configured to identify the prostate cancer with an accuracy of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than 99% for at least about 25, at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, at least about 500, or more than about 500 independent samples.
- the classifier may comprise a supervised machine learning algorithm or an unsupervised machine learning algorithm.
- the classifier may comprise a classification and regression tree (CART) algorithm.
- the classifier may comprise, for example, a support vector machine (SVM), a linear regression, a logistic regression, a nonlinear regression, a neural network, a Random Forest, a deep learning algorithm, or a naïve Bayes classifier.
- the classifier may be configured to accept a plurality of input variables and to produce one or more output values based on the plurality of input variables.
- the plurality of input variables may comprise data indicative of a presence, absence, or relative amount of sequences or methylated residues at each of the plurality of prostate cancer-associated genomic loci.
- an input variable may comprise a number of sequences or methylated residues corresponding to or aligning to each of the plurality of prostate cancer-associated genomic loci.
- the classifier may have one or more possible output values, each comprising one of a fixed number of possible values (e.g., a linear classifier, a logistic regression classifier, etc.) indicating a classification of the biological sample by the classifier.
- the classifier may comprise a binary classifier, such that each of the one or more output values comprises one of two values (e.g., {0, 1}, {positive, negative}, or {cancerous, non-cancerous}) indicating a classification of the biological sample by the classifier.
- the classifier may be another type of classifier, such that each of the one or more output values comprises one of more than two values (e.g., {0, 1, 2}, {positive, negative, or indeterminate}, or {cancerous, non-cancerous, or indeterminate}) indicating a classification of the biological sample by the classifier.
- the output values may comprise descriptive labels, numerical values, or a combination thereof. Some of the output values may comprise descriptive labels. Such descriptive labels may provide an identification or indication of the disease state of the subject, and may comprise, for example, positive, negative, cancerous, non-cancerous, or indeterminate.
- Such descriptive labels may provide an identification of a treatment for the subject's disease state, and may comprise, for example, a therapeutic intervention, a duration of the therapeutic intervention, and/or a dosage of the therapeutic intervention.
- Such descriptive labels may provide an identification of secondary clinical tests that may be appropriate to perform on the subject, and may comprise, for example, a biopsy, a blood test, a computed tomography (CT) scan, a magnetic resonance imaging (MRI) scan, an ultrasound scan, a chest X-ray, a positron emission tomography (PET) scan, or a PET-CT scan.
- Such descriptive labels may provide a prognosis of the disease state of the subject.
- Some descriptive labels may be mapped to numerical values, for example, by mapping “positive” to 1 and “negative” to 0.
- Some of the output values may comprise numerical values, such as binary, integer, or continuous values. Such binary output values may comprise, for example, {0, 1}. Such integer output values may comprise, for example, {0, 1, 2}. Such continuous output values may comprise, for example, a probability value of at least 0 and no more than 1. Such continuous output values may comprise, for example, an un-normalized probability value of at least 0. Such continuous output values may indicate a prognosis of the disease or disorder state of the subject and may comprise, for example, an indication of an expected or average progression-free survival (PFS) or overall survival (OS) of the subject.
- Such continuous output values may indicate a prediction of the course of treatment to treat the disease or disorder state of the subject and may comprise, for example, an indication of an expected duration of efficacy of the course of treatment.
- Some numerical values may be mapped to descriptive labels, for example, by mapping 1 to “positive” and 0 to “negative.”
- Some of the output values may be assigned based on one or more cutoff values. For example, a binary classification of samples may assign an output value of “positive” or 1 if the sample indicates that the subject has at least a 50% probability of being diseased. For example, a binary classification of samples may assign an output value of “negative” or 0 if the sample indicates that the subject has less than a 50% probability of being diseased. In this case, a single cutoff value of 50% is used to classify samples into one of the two possible binary output values.
- Examples of single cutoff values may include about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, about 98%, and about 99%.
- a classification of samples may assign an output value of “positive” or 1 if the sample indicates that the subject has a probability of being diseased of at least 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%.
- the classification of samples may assign an output value of “positive” or 1 if the sample indicates that the subject has a probability of being diseased of more than 50%, more than 55%, more than 60%, more than 65%, more than 70%, more than 75%, more than 80%, more than 85%, more than 90%, more than 95%, more than 98%, or more than 99%.
- the classification of samples may assign an output value of “negative” or 0 if the sample indicates that the subject has a probability of being diseased of less than 50%, less than 45%, less than 40%, less than 35%, less than 30%, less than 25%, less than 20%, less than 10%, less than 5%, less than 2%, or less than 1%.
- the classification of samples may assign an output value of “negative” or 0 if the sample indicates that the subject has a probability of being diseased of no more than 50%, no more than 45%, no more than 40%, no more than 35%, no more than 30%, no more than 25%, no more than 20%, no more than 10%, no more than 5%, no more than 2%, or no more than 1%.
- the classification of samples may assign an output value of “indeterminate” or 2 if the sample has not been classified as “positive,” “negative,” 1, or 0. In this case, a set of two cutoff values is used to classify samples into one of the three possible output values.
- sets of cutoff values may include {1%, 99%}, {2%, 98%}, {5%, 95%}, {10%, 90%}, {15%, 85%}, {20%, 80%}, {25%, 75%}, {30%, 70%}, {35%, 65%}, {40%, 60%}, and {45%, 55%}.
- sets of n cutoff values may be used to classify samples into one of n+1 possible output values, where n is any positive integer.
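The cutoff scheme above, where n cutoff values yield n+1 possible output values, can be sketched in Python with a sorted list of cutoffs. This is a minimal illustration: the function name and labels are hypothetical, and the treatment of a probability exactly equal to a cutoff (assigned to the higher class, matching the "at least 50%" example) is an assumption for cutoffs other than the single 50% case.

```python
from bisect import bisect_right


def classify_by_cutoffs(probability, cutoffs, labels):
    """Map a probability to one of len(cutoffs) + 1 labels.

    cutoffs must be sorted in ascending order; labels are ordered from the
    lowest-probability class to the highest. A probability equal to a cutoff
    falls into the class above that cutoff.
    """
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("cutoffs must be sorted in ascending order")
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    return labels[bisect_right(cutoffs, probability)]


# Single cutoff of 50%: two possible output values.
binary_labels = ["negative", "positive"]
print(classify_by_cutoffs(0.50, [0.5], binary_labels))
print(classify_by_cutoffs(0.49, [0.5], binary_labels))

# Cutoff set {20%, 80%}: three possible output values.
ternary_labels = ["negative", "indeterminate", "positive"]
print(classify_by_cutoffs(0.10, [0.2, 0.8], ternary_labels))
print(classify_by_cutoffs(0.55, [0.2, 0.8], ternary_labels))
print(classify_by_cutoffs(0.93, [0.2, 0.8], ternary_labels))
```

Binary search keeps the lookup simple for any number of cutoffs, and the two validation checks guard against unsorted cutoffs or a mismatched label list.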
- the classifier may be trained with a plurality of independent training samples.
- Each of the independent training samples may comprise a biological sample from a subject, associated data obtained by processing the biological sample (as described elsewhere herein), and one or more known output values corresponding to the biological sample (e.g., a clinical diagnosis, prognosis, treatment efficacy, or absence of a disease or disorder such as a prostate cancer of the subject).
- Independent training samples may comprise biological samples and associated data and outputs obtained from a plurality of different subjects.
- Independent training samples may comprise biological samples and associated data and outputs obtained at a plurality of different time points from the same subject (e.g., before, after, and/or during a course of treatment to treat a disease or disorder of the subject).
- Independent training samples may be associated with presence of the prostate cancer (e.g., training samples comprising biological samples and associated data and outputs obtained from a plurality of subjects known to have the prostate cancer). Independent training samples may be associated with absence of the prostate cancer (e.g., training samples comprising biological samples and associated data and outputs obtained from a plurality of subjects who are known to not have a previous diagnosis of the prostate cancer, or otherwise who are asymptomatic for the prostate cancer).
- the classifier may be trained with at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, or at least about 500 independent training samples.
- the independent training samples may comprise samples associated with presence of the prostate cancer and/or samples associated with absence of the prostate cancer.
- the classifier may be trained with no more than 500, no more than 450, no more than 400, no more than 350, no more than 300, no more than 250, no more than 200, no more than 150, no more than 100, or no more than 50 independent training samples associated with presence of the prostate cancer.
- the biological sample is independent of samples used to train the classifier.
- the classifier may be trained with a first number of independent training samples associated with presence of the prostate cancer and a second number of independent training samples associated with absence of the prostate cancer.
- the first number of independent training samples associated with presence of the prostate cancer may be no more than the second number of independent training samples associated with absence of the prostate cancer.
- the first number of independent training samples associated with presence of the prostate cancer may be equal to the second number of independent training samples associated with absence of the prostate cancer.
- the first number of independent training samples associated with presence of the prostate cancer may be greater than the second number of independent training samples associated with absence of the prostate cancer.
- the classifier may be configured to identify the prostate cancer with an accuracy of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%, for at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, or more than about 300 independent samples.
- the accuracy of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples (e.g., subjects having the prostate cancer, or apparently healthy subjects with negative clinical test results for the prostate cancer) that are correctly identified or classified as having or not having the prostate cancer, respectively.
- the classifier may be configured to identify the prostate cancer with a positive predictive value (PPV) of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.
- the PPV of identifying the prostate cancer by the classifier may be calculated as the percentage of biological samples identified or classified as having the
- the classifier may be configured to identify the prostate cancer with a negative predictive value (NPV) of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.
- the NPV of identifying the prostate cancer by the classifier may be calculated as the percentage of biological samples identified or classified as not having
- the classifier may be configured to identify the prostate cancer with a clinical sensitivity of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.
- the clinical sensitivity of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples associated with presence of the prostate cancer (e.g., subjects known to have the prostate cancer) that are correctly identified or classified as having the prostate cancer.
- a clinical sensitivity may also be referred to as a recall.
- the classifier may be configured to identify the prostate cancer with a clinical specificity of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.
- the clinical specificity of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples associated with absence of the prostate cancer (e
- the classifier may be configured to identify the prostate cancer with an Area-Under-Curve (AUC) of at least about 0.50, at least about 0.55, at least about 0.60, at least about 0.65, at least about 0.70, at least about 0.75, at least about 0.80, at least about 0.81, at least about 0.82, at least about 0.83, at least about 0.84, at least about 0.85, at least about 0.86, at least about 0.87, at least about 0.88, at least about 0.89, at least about 0.90, at least about 0.91, at least about 0.92, at least about 0.93, at least about 0.94, at least about 0.95, at least about 0.96, at least about 0.97, at least about 0.98, at least about 0.99, or more than about 0.99.
- the AUC may be calculated as an integral of the Receiver Operator Characteristic (ROC) curve (e.g., the area under the ROC curve) associated with the classifier in classifying biological samples as having or not having the prostate cancer.
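The performance measures named above can be sketched in Python from a set of labeled test samples. This is an illustration under stated assumptions: the function names and the example labels and scores are invented, PPV and NPV follow their standard textbook definitions, and the AUC is computed with the pairwise-ranking formulation, which equals the area under the ROC curve.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, PPV, NPV, sensitivity, and specificity for labels coded 1/0.

    Assumes every denominator is nonzero (both classes are present and both
    classes are predicted at least once).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented test set: 1 = prostate cancer present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # single 50% cutoff

metrics = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Unlike the other measures, the AUC does not depend on the chosen cutoff, because it is computed from the ranking of the continuous scores.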
- the classifier may be adjusted or tuned to improve the performance, accuracy, PPV, NPV, clinical sensitivity, clinical specificity, or AUC of identifying the prostate cancer.
- the classifier may be adjusted or tuned by adjusting parameters of the classifier (e.g., a set of cutoff values used to classify a sample as described elsewhere herein, or weights of a neural network).
- the classifier may be adjusted or tuned continuously during the training process or after the training process has completed.
- a subset of the inputs may be identified as most influential or most important to be included for making high-quality classifications.
- a subset of the plurality of prostate cancer-associated genomic loci may be identified as most influential or most important to be included for making high-quality classifications or identifications of prostate cancer.
- the plurality of prostate cancer-associated genomic loci or a subset thereof may be ranked based on metrics indicative of each genomic locus's influence or importance toward making high-quality classifications or identifications of prostate cancer.
- Such metrics may be used to reduce, in some cases significantly, the number of input variables (e.g., predictor variables) that may be used to train the classifier to a desired performance level (e.g., based on a desired minimum accuracy, PPV, NPV, clinical sensitivity, clinical specificity, or AUC).
- for example, if training the training algorithm with a plurality comprising several dozen or hundreds of input variables in the classifier results in an accuracy of classification of more than 99%, then training the training algorithm instead with only a selected subset of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, no more than about 10, no more than about 15, no more than about 20, no more than about 25, no more than about 30, no more than about 35, no more than about 40, no more than about 45, no more than about 50, or no more than about 100 such most influential or most important input variables (e.g., marker genes, marker regions, or other genomic loci) among the plurality may result in decreased but still acceptable accuracy of classification (e.g., at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, or at least about 98%).
- the subset may be selected by rank-ordering the entire plurality of input variables and selecting a predetermined number (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, no more than about 10, no more than about 15, no more than about 20, no more than about 25, no more than about 30, no more than about 35, no more than about 40, no more than about 45, no more than about 50, or no more than about 100) of input variables with the best metrics.
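The rank-and-select step above can be sketched in Python as follows. The text does not name the ranking metric, so this sketch assumes one for illustration: the absolute difference in mean methylation value between cancer and non-cancer training samples. The function name, locus names, and values are all hypothetical.

```python
def select_top_loci(samples, labels, k):
    """Rank loci by an assumed importance metric and keep the best k.

    samples: list of dicts mapping locus name -> methylation value.
    labels:  list of 1 (cancer) / 0 (non-cancer), one per sample.
    Metric (an assumption): absolute difference in class means per locus.
    """
    def mean(values):
        return sum(values) / len(values)

    scores = {}
    for locus in sorted(samples[0]):
        cancer = [s[locus] for s, y in zip(samples, labels) if y == 1]
        normal = [s[locus] for s, y in zip(samples, labels) if y == 0]
        scores[locus] = abs(mean(cancer) - mean(normal))

    # Rank-order all loci, best metric first, then keep a predetermined number.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Invented training data with hypothetical locus names.
training_samples = [
    {"locus_A": 0.9, "locus_B": 0.5, "locus_C": 0.2},
    {"locus_A": 0.8, "locus_B": 0.4, "locus_C": 0.6},
    {"locus_A": 0.1, "locus_B": 0.5, "locus_C": 0.3},
    {"locus_A": 0.2, "locus_B": 0.4, "locus_C": 0.1},
]
training_labels = [1, 1, 0, 0]
print(select_top_loci(training_samples, training_labels, k=2))
```

Other importance metrics, such as those reported by a trained classifier, could be substituted for the class-mean difference without changing the rank-and-truncate structure.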
- the selected subset of the influential or most important input variables comprises one or more genomic loci listed in Table 2.
- a quantitative measure indicative of the presence, absence, or relative assessment of the prostate cancer may be determined (e.g., likelihood or probability of prostate cancer), and the prostate cancer may be identified or a progression or regression of the prostate cancer may be monitored in the subject based at least in part on the quantitative measure (e.g., likelihood or probability of prostate cancer).
- the prostate cancer may be identified in the subject with an accuracy of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.
- the accuracy of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples (e.g., subjects having the prostate cancer, or apparently healthy subjects with negative clinical test results for the prostate cancer) that are correctly identified or classified as having or not having the prostate cancer, respectively.
- the prostate cancer may be identified in the subject with a positive predictive value (PPV) of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.
- the PPV of identifying the prostate cancer by the classifier may be calculated as the percentage of biological samples identified or classified as having the prostate cancer
- the prostate cancer may be identified in the subject with a negative predictive value (NPV) of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.
- the NPV of identifying the prostate cancer by the classifier may be calculated as the percentage of biological samples identified or classified as not having the prostate
- the prostate cancer may be identified in the subject with a clinical sensitivity of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.
- the clinical sensitivity of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples associated with presence of the prostate cancer (e.g., subjects having the prostate cancer) that are correctly identified or classified as having the prostate cancer.
- a clinical sensitivity may also be referred to as a recall.
- the prostate cancer may be identified in the subject with a clinical specificity of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%.
- the clinical specificity of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples associated with absence of the prostate cancer (e.g
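As a non-limiting illustration, the performance measures discussed above (clinical sensitivity, clinical specificity, PPV, and NPV) can be computed from the counts of correctly and incorrectly classified test samples. The sketch below uses the standard confusion-matrix definitions; the names `ConfusionCounts`, `sensitivity`, `specificity`, `ppv`, and `npv`, as well as the example counts, are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of independent test samples, grouped by true status and classifier call."""
    tp: int  # cancer samples classified as having the cancer
    fp: int  # non-cancer samples classified as having the cancer
    tn: int  # non-cancer samples classified as not having the cancer
    fn: int  # cancer samples classified as not having the cancer


def _percentage(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rejecting empty groups."""
    if denominator == 0:
        raise ValueError("metric is undefined when the denominator is zero")
    return 100.0 * numerator / denominator


def sensitivity(c: ConfusionCounts) -> float:
    """Percentage of cancer samples correctly classified (also called recall)."""
    return _percentage(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Percentage of non-cancer samples correctly classified."""
    return _percentage(c.tn, c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Percentage of positive calls that correspond to true cancer samples."""
    return _percentage(c.tp, c.tp + c.fp)


def npv(c: ConfusionCounts) -> float:
    """Percentage of negative calls that correspond to true non-cancer samples."""
    return _percentage(c.tn, c.tn + c.fn)


# Hypothetical counts, chosen only to exercise the functions.
counts = ConfusionCounts(tp=45, fp=5, tn=40, fn=10)
print(f"sensitivity={sensitivity(counts):.1f}%  specificity={specificity(counts):.1f}%")
print(f"PPV={ppv(counts):.1f}%  NPV={npv(counts):.1f}%")
```

Because PPV and NPV depend on the proportion of cancer samples in the test set, values computed this way apply only to a cohort with a comparable prevalence.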
- a stage of the prostate cancer (e.g., stage I, stage II, stage III, or stage IV) may further be identified.
- the stage of the prostate cancer may be determined based at least in part on the methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci).
- the subject may be provided with a therapeutic intervention (e.g., prescribing an appropriate course of treatment to treat the prostate cancer of the subject).
- the therapeutic intervention may comprise a surgical tumor resection, an effective dose of chemotherapy, an effective dose of radiotherapy, an effective dose of targeted therapy, an effective dose of immunotherapy, or any combination thereof.
- the therapeutic intervention may comprise a subsequent different course of treatment (e.g., to increase treatment efficacy due to tumor resistance, tumor recurrence, or non-response to the current course of treatment).
- the therapeutic intervention may comprise recommending the subject for a secondary clinical test to confirm a diagnosis of the prostate cancer.
- This secondary clinical test may comprise a biopsy, an imaging test, a blood test, a computed tomography (CT) scan, a magnetic resonance imaging (MRI) scan, an ultrasound scan, a chest X-ray, a positron emission tomography (PET) scan, a PET-CT scan, or any combination thereof.
- the subject may be treated upon identifying the subject as having the prostate cancer. Treating the subject may comprise administering an appropriate therapeutic intervention to treat the prostate cancer of the subject.
- the therapeutic intervention may comprise a surgical tumor resection, an effective dose of chemotherapy, an effective dose of radiotherapy, an effective dose of targeted therapy, an effective dose of immunotherapy, or any combination thereof.
- the administered therapeutic intervention may comprise a subsequent different course of treatment (e.g., to increase treatment efficacy due to tumor resistance, tumor recurrence, or non-response to the current course of treatment).
- the methylation data at the panel of prostate cancer-associated genomic loci may be assessed over a duration of time to monitor a patient (e.g., subject who has prostate cancer or who is being treated for prostate cancer).
- the quantitative measures of methylation at the prostate cancer-associated genomic loci of the patient may change during the course of treatment.
- the quantitative measures of methylation at the prostate cancer-associated genomic loci of a patient whose prostate cancer is regressing due to an effective treatment e.g., chemotherapy or surgical resection
- the quantitative measures of methylation at the prostate cancer-associated genomic loci of a patient whose prostate cancer is progressing due to an ineffective treatment may shift toward the methylation profile or distribution of a subject with more advanced stage prostate cancer.
- the progression or regression of the prostate cancer in the subject may be monitored by monitoring a course of treatment for treating the prostate cancer in the subject.
- the monitoring may comprise assessing the prostate cancer in the subject at two or more time points.
- the assessing may be based at least on the methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined at each of the two or more time points.
- a difference in methylation data at a panel of prostate cancer-associated genomic loci may be indicative of one or more clinical indications, such as (i) a diagnosis of the prostate cancer in the subject, (ii) a prognosis of the prostate cancer in the subject, (iii) a progression of the prostate cancer in the subject, (iv) a regression of the prostate cancer in the subject, (v) an efficacy of the course of treatment for treating the prostate cancer in the subject, and (vi) a resistance of the prostate cancer toward the course of treatment for treating the prostate cancer in the subject.
- a difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of a diagnosis of the prostate cancer in the subject. For example, if the prostate cancer was not detected in the subject at an earlier time point but was detected in the subject at a later time point, then the difference is indicative of a diagnosis of the prostate cancer in the subject.
- a clinical action or decision may be made based on this indication of diagnosis of the prostate cancer in the subject, e.g., prescribing a new therapeutic intervention for the subject.
- a difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of a prognosis of the prostate cancer in the subject.
- a difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of a progression of the prostate cancer in the subject.
- the difference may be indicative of a progression (e.g., increased tumor load, tumor burden, or tumor size) of the prostate cancer in the subject.
- a clinical action or decision may be made based on this indication of the progression, e.g., prescribing a new therapeutic intervention or switching therapeutic interventions (e.g., ending a current treatment and prescribing a new treatment) for the subject.
- a difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of a regression of the prostate cancer in the subject.
- the difference may be indicative of a regression (e.g., decreased tumor load, tumor burden, or tumor size) of the prostate cancer in the subject.
- a clinical action or decision may be made based on this indication of the regression, e.g., continuing or ending a current therapeutic intervention for the subject.
- a difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of an efficacy of the course of treatment for treating the prostate cancer in the subject. For example, if the prostate cancer was detected in the subject at an earlier time point but was not detected in the subject at a later time point, then the difference may be indicative of an efficacy of the course of treatment for treating the prostate cancer in the subject.
- a clinical action or decision may be made based on this indication of the efficacy of the course of treatment for treating the prostate cancer in the subject, e.g., continuing or ending a current therapeutic intervention for the subject.
- a difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of a resistance of the prostate cancer toward the course of treatment for treating the prostate cancer in the subject.
- the difference may be indicative of a resistance (e.g., increased or constant tumor load, tumor burden, or tumor size) of the prostate cancer toward the course of treatment for treating the prostate cancer in the subject.
- a clinical action or decision may be made based on this indication of the resistance of the prostate cancer toward the course of treatment, e.g., ending a current therapeutic intervention and/or switching to (e.g., prescribing) a different therapeutic intervention for the subject.
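As a non-limiting illustration, the comparison of methylation data between two time points can be sketched as a per-locus difference. The function names `methylation_differences` and `mean_shift`, the locus labels, and the example values are hypothetical; the disclosure does not specify thresholds for interpreting such differences, so none are assumed here.

```python
from typing import Dict


def methylation_differences(earlier: Dict[str, float],
                            later: Dict[str, float]) -> Dict[str, float]:
    """Return later-minus-earlier methylation measures for loci present at both time points."""
    shared_loci = earlier.keys() & later.keys()
    return {locus: later[locus] - earlier[locus] for locus in sorted(shared_loci)}


def mean_shift(differences: Dict[str, float]) -> float:
    """Average the per-locus differences into a single summary value."""
    if not differences:
        raise ValueError("no loci were measured at both time points")
    return sum(differences.values()) / len(differences)


# Hypothetical quantitative measures of methylation (fractions between 0 and 1).
time_point_1 = {"locus_A": 0.25, "locus_B": 0.5}
time_point_2 = {"locus_A": 0.75, "locus_B": 0.5, "locus_C": 0.125}

diffs = methylation_differences(time_point_1, time_point_2)
print(diffs)              # locus_C is skipped: it has no earlier measurement
print(mean_shift(diffs))
```

Restricting the comparison to loci measured at both time points avoids treating a missing measurement as a change in methylation.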
- a report may be electronically outputted that identifies or provides an indication of the identification, prognosis, progression, or regression of the prostate cancer in the subject.
- the subject may not display signs of a benign prostate condition or of the prostate cancer (e.g., the subject is asymptomatic).
- the report may be presented on a graphical user interface (GUI) of an electronic device of a user.
- the user may be the subject, a caretaker, a physician, a nurse, or another health care worker.
- the report may include one or more clinical indications such as (i) a diagnosis of the prostate cancer in the subject, (ii) a prognosis of the prostate cancer in the subject, (iii) a progression of the prostate cancer in the subject, (iv) a regression of the prostate cancer in the subject, (v) an efficacy of the course of treatment for treating the prostate cancer in the subject, and (vi) a resistance of the prostate cancer toward the course of treatment for treating the prostate cancer in the subject.
- the report may include one or more clinical actions or decisions made based on these one or more clinical indications.
- a clinical indication of a diagnosis of the prostate cancer in the subject may be accompanied with a clinical action of prescribing a new therapeutic intervention for the subject.
- a clinical indication of a progression of the prostate cancer in the subject may be accompanied with a clinical action of prescribing a new therapeutic intervention or switching therapeutic interventions (e.g., ending a current treatment and prescribing a new treatment) for the subject.
- a clinical indication of a regression of the prostate cancer in the subject may be accompanied with a clinical action of continuing or ending a current therapeutic intervention for the subject.
- a clinical indication of an efficacy of the course of treatment for treating the prostate cancer in the subject may be accompanied with a clinical action of continuing or ending a current therapeutic intervention for the subject.
- a clinical indication of a resistance of the prostate cancer toward the course of treatment may be accompanied with a clinical action of ending a current therapeutic intervention and/or switching to (e.g., prescribing) a different therapeutic intervention for the subject.
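As a non-limiting illustration, the pairing of clinical indications with clinical actions described above can be represented as a lookup table from which a text report is assembled. The names `Indication`, `ACTIONS`, and `build_report`, the report layout, and the subject identifier are hypothetical; only the indication-to-action pairings follow the text above, and no action is listed for a prognosis because none is stated.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class Indication(Enum):
    DIAGNOSIS = "diagnosis"
    PROGNOSIS = "prognosis"
    PROGRESSION = "progression"
    REGRESSION = "regression"
    EFFICACY = "efficacy of the course of treatment"
    RESISTANCE = "resistance toward the course of treatment"


# Candidate clinical actions that may accompany each indication.
ACTIONS: Dict[Indication, Tuple[str, ...]] = {
    Indication.DIAGNOSIS: ("prescribe a new therapeutic intervention",),
    Indication.PROGNOSIS: (),
    Indication.PROGRESSION: (
        "prescribe a new therapeutic intervention",
        "switch therapeutic interventions",
    ),
    Indication.REGRESSION: (
        "continue the current therapeutic intervention",
        "end the current therapeutic intervention",
    ),
    Indication.EFFICACY: (
        "continue the current therapeutic intervention",
        "end the current therapeutic intervention",
    ),
    Indication.RESISTANCE: (
        "end the current therapeutic intervention",
        "switch to a different therapeutic intervention",
    ),
}


def build_report(subject_id: str, indications: Iterable[Indication]) -> str:
    """Assemble a plain-text report listing each indication and its candidate actions."""
    lines = [f"Report for subject {subject_id}"]
    for indication in indications:
        lines.append(f"- indication: {indication.value}")
        for action in ACTIONS.get(indication, ()):
            lines.append(f"  - possible action: {action}")
    return "\n".join(lines)


report = build_report("subject-001", [Indication.PROGRESSION, Indication.PROGNOSIS])
print(report)
```

The listed actions are candidates for a clinician to consider, not decisions made by the software.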
- FIG. 4 shows a computer system 401 that is programmed or otherwise configured to, for example, determine quantitative measures of DNA methylation to generate methylation profiles of DNA molecules at genomic regions; determine cycle threshold (Ct) values for amplification of DNA fragments or derivatives thereof at genomic regions; calculate intensity ratio values of first quantitative measures and second quantitative measures at genomic regions; determine a quantitative measure indicative of a presence, absence, or relative assessment of prostate cancer of a subject; identify or provide an indication of the prostate cancer of the subject; and electronically output a report that identifies or provides an indication of the prostate cancer of the subject.
- the computer system 401 can regulate various aspects of analysis, calculation, and generation of the present disclosure, such as, for example, determining quantitative measures of DNA methylation to generate methylation profiles of DNA molecules at genomic regions; determining cycle threshold (Ct) values for amplification of DNA fragments or derivatives thereof at genomic regions; calculating intensity ratio values of first quantitative measures and second quantitative measures at genomic regions; determining a quantitative measure indicative of a presence, absence, or relative assessment of prostate cancer of a subject; identifying or providing an indication of the prostate cancer of the subject; and electronically outputting a report that identifies or provides an indication of the prostate cancer of the subject.
- the computer system 401 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device.
- the electronic device can be a mobile electronic device.
- the computer system 401 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 405, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
- the computer system 401 also includes memory or memory location 410 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 415 (e.g., hard disk), communication interface 420 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 425, such as cache, other memory, data storage and/or electronic display adapters.
- the memory 410, storage unit 415, interface 420, and peripheral devices 425 are in communication with the CPU 405 through a communication bus (solid lines), such as a motherboard.
- the storage unit 415 can be a data storage unit (or data repository) for storing data.
- the computer system 401 can be operatively coupled to a computer network (“network”) 430 with the aid of the communication interface 420 .
- the network 430 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
- the network 430 in some cases is a telecommunication and/or data network.
- the network 430 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
- one or more computer servers may enable cloud computing over the network 430 (“the cloud”) to perform various aspects of analysis, calculation, and generation of the present disclosure, such as, for example, determining quantitative measures of DNA methylation to generate methylation profiles of DNA molecules at genomic regions; determining cycle threshold (Ct) values for amplification of DNA fragments or derivatives thereof at genomic regions; calculating intensity ratio values of first quantitative measures and second quantitative measures at genomic regions; determining a quantitative measure indicative of a presence, absence, or relative assessment of prostate cancer of a subject; identifying or providing an indication of the prostate cancer of the subject; and electronically outputting a report that identifies or provides an indication of the prostate cancer of the subject.
- cloud computing may be provided by cloud computing platforms such as, for example, Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, and IBM Cloud.
- the network 430, in some cases with the aid of the computer system 401, can implement a peer-to-peer network, which may enable devices coupled to the computer system 401 to behave as a client or a server.
- the CPU 405 may comprise one or more computer processors and/or one or more graphics processing units (GPUs).
- the CPU 405 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
- the instructions may be stored in a memory location, such as the memory 410 .
- the instructions can be directed to the CPU 405 , which can subsequently program or otherwise configure the CPU 405 to implement methods of the present disclosure. Examples of operations performed by the CPU 405 can include fetch, decode, execute, and writeback.
- the CPU 405 can be part of a circuit, such as an integrated circuit.
- One or more other components of the system 401 can be included in the circuit.
- the circuit is an application specific integrated circuit (ASIC).
- the storage unit 415 can store files, such as drivers, libraries and saved programs.
- the storage unit 415 can store user data, e.g., user preferences and user programs.
- the computer system 401 in some cases can include one or more additional data storage units that are external to the computer system 401 , such as located on a remote server that is in communication with the computer system 401 through an intranet or the Internet.
- the computer system 401 can communicate with one or more remote computer systems through the network 430 .
- the computer system 401 can communicate with a remote computer system of a user.
- examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smartphones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
- the user can access the computer system 401 via the network 430 .
- Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 401 , such as, for example, on the memory 410 or electronic storage unit 415 .
- the machine executable or machine readable code can be provided in the form of software.
- the code can be executed by the processor 405 .
- the code can be retrieved from the storage unit 415 and stored on the memory 410 for ready access by the processor 405 .
- the electronic storage unit 415 can be precluded, and machine-executable instructions are stored on memory 410 .
- the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime.
- the code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
- aspects of the systems and methods provided herein can be embodied in programming.
- Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium.
- Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk.
- “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
- another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
- Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings.
- Volatile storage media include dynamic memory, such as main memory of such a computer platform.
- Tangible transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise a bus within a computer system.
- Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data.
- Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
- the computer system 401 can include or be in communication with an electronic display 435 that comprises a user interface (UI) 440 for providing, for example, a visual display of data indicative of a presence, absence, or relative assessment of prostate cancer of a subject, a determined presence, absence, or relative assessment of prostate cancer of a subject, an identification of a subject as having prostate cancer, or an electronic report that identifies or provides an indication of the prostate cancer of the subject.
- Examples of UIs include, without limitation, a graphical user interface (GUI) and a web-based user interface.
- An algorithm can be implemented by way of software upon execution by the central processing unit 405 .
- the algorithm can, for example, determine quantitative measures of DNA methylation to generate methylation profiles of DNA molecules at genomic regions; determine cycle threshold (Ct) values for amplification of DNA fragments or derivatives thereof at genomic regions; calculate intensity ratio values of first quantitative measures and second quantitative measures at genomic regions; determine a quantitative measure indicative of a presence, absence, or relative assessment of prostate cancer of a subject; identify or provide an indication of the prostate cancer of the subject; and electronically output a report that identifies or provides an indication of the prostate cancer of the subject.
- FIGS. 2A and 2B illustrate an example of quantitative polymerase chain reaction (qPCR) amplification plots for the control locus and two restriction loci tested in the healthy (prostate normal) sample (“N1-digested” and “N1-undigested”) and the prostate cancer sample (“T1-digested” and “T1-undigested”), respectively, in accordance with disclosed embodiments.
- Table 1 provides a list of marker regions, genomic coordinates, and strands for a set of prostate cancer-specific biomarkers.
- Marker regions may include marker genes or intergenic regions. Panels of one or more prostate cancer-specific genomic loci can be selected from this list.
- the genomic coordinates may comprise portions of genes or intergenic regions.
- Table 2 provides a list of selected marker regions that are observed to be most influential or most important toward classification of samples for prostate cancer assessment.
- Table 3 provides performance data (including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic (ROC) curve (AUC)) for this set of selected marker regions that are observed to be most influential or most important toward classification of samples for prostate cancer assessment.
- Table 4 provides signal ratio values (mean ratio and standard deviation) measured for this set of selected marker regions that are observed to be most influential or most important toward classification of samples for prostate cancer assessment, across the set of normal samples and the set of tumor samples.
Abstract
The present disclosure provides methods and systems directed to detection of prostate cancer. A method for processing or analyzing DNA molecules from a biological sample of a subject may comprise (a) providing a first set of DNA fragments derived from a first portion of the DNA molecules upon subjecting the first portion to CpG site fragmentation conditions; (b) providing a second set of DNA fragments derived from a second portion of the DNA molecules, the second portion not subjected to fragmentation conditions; (c) for a genomic region, processing the first and the second sets of DNA fragments or derivatives thereof to yield first and second quantitative measures of DNA methylation; and (d) processing the first quantitative measure with the second quantitative measure to yield a third quantitative measure of DNA methylation at the genomic region, thereby generating a methylation profile of the DNA molecules at the genomic region.
Description
This application claims the benefit of U.S. Provisional Patent Application No. 62/891,673, filed Aug. 26, 2019, which is incorporated by reference herein in its entirety.
INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS A TEXT FILE
A Sequence Listing is provided herewith as a text file, “55223-701_201_SL.txt” created on Feb. 9, 2022, and having a size of 87,479 bytes. The contents of the text file are hereby incorporated by reference herein in their entirety.
Prostate cancer is the second most common form of cancer in men worldwide, accounting for about 15% of the cancers diagnosed in men. Currently, the only available blood-based molecular screening method for prostate cancer is the prostate-specific antigen (PSA) test, which aims to detect prostate cancer at an early stage, when the disease is amenable to curative treatment, and thereby reduce overall disease-specific mortality. However, PSA is not a cancer-specific biomarker, and its level often increases abnormally in patients with benign prostate conditions. Further, only a small minority of men with an elevated PSA level are actually found to have prostate cancer when a biopsy is performed. Therefore, there is scant evidence that PSA screening for prostate cancer saves lives. Currently, biopsy is the only method for prostate cancer diagnosis, but the high false negative rate of biopsy can leave a significant percentage of men undiagnosed after the first biopsy. Further, markers of prostate cancer such as tumor stage, Gleason score, and PSA level cannot accurately identify the individuals whose treatment will ultimately fail. Thus, there is an urgent need to develop high-performance assays with panels of good biomarkers for tissue and blood tests, both to reduce the need for repeated biopsies for prostate cancer diagnosis and to monitor treated patients for recurrence and metastasis.
The present disclosure provides methods, systems, and kits for detecting prostate cancers by processing nucleic acids from biological samples (e.g., tissue samples and/or bodily fluid samples) obtained from or derived from a subject. Biological samples obtained from subjects may be analyzed to measure a presence, absence, or relative assessment of the prostate cancer. The analysis may be performed at a set of genomic regions, such as a panel of DNA methylation biomarker regions. The subjects may include subjects with prostate cancer (e.g., prostate cancer patients) and subjects without prostate cancer (e.g., normal or healthy controls).
In an aspect, the present disclosure provides a method for processing or analyzing a plurality of deoxyribonucleic acid (DNA) molecules from a biological sample of a subject, comprising: (a) providing a first set of DNA fragments derived from a first portion of said plurality of DNA molecules upon subjecting said first portion of said plurality of DNA molecules to fragmentation conditions sufficient to fragment at least a subset of said first portion of said plurality of DNA molecules at one or more CpG sites, wherein at least a subset of said first set of DNA fragments comprises methylated nucleic acid bases; (b) providing a second set of DNA fragments derived from a second portion of said plurality of DNA molecules, wherein said second portion of said plurality of DNA molecules is not subjected to fragmentation conditions; (c) for a genomic region, processing (i) said first set of DNA fragments or derivatives thereof to yield a first quantitative measure of DNA methylation and (ii) said second set of DNA fragments or derivatives thereof to yield a second quantitative measure of DNA methylation; and (d) processing said first quantitative measure with said second quantitative measure to yield a third quantitative measure of DNA methylation at said genomic region, thereby generating a methylation profile of said plurality of DNA molecules at said genomic region.
In some embodiments, said biological sample is obtained or derived from a tissue sample, a blood sample, a plasma sample, a serum sample, an exosome sample, a urine sample, a sweat sample, or a saliva sample.
In some embodiments, the method further comprises performing an assay selected from the group consisting of methylation-sensitive restriction enzyme (MSRE) digestion, polymerase chain reaction (PCR), quantitative PCR (qPCR), nucleic acid sequencing, target capture, mass spectrometry-based target fragmentation assay, flap endonuclease-based assay, CRISPR-based assay, methylation-specific assay comprising bisulfite treatment, methylation-specific PCR, targeted sequencing, targeted bisulfite sequencing, pyrosequencing, mass spectroscopy-based bisulfite sequencing (EpiTYPER), reduced representation bisulfite sequencing (RRBS), whole genome sequencing (WGS), and a combination thereof.
In some embodiments, said fragmentation conditions comprise MSRE digestion of said first portion of said plurality of DNA molecules to fragment at least said subset of said first portion of said plurality of DNA molecules at said one or more CpG sites. In some embodiments, said MSRE is selected from the group consisting of AatII, Acc65I, AccI, AciI, ACII, AfeI, AgeI, ApaI, ApaLI, AscI, AsiSI, AvaI, AvaII, AoxI, BaeI, BanI, BbeI, BceAI, BegI, BfuCI, BglI, BisI, BisI, BmgBI, BsaAI, BsaBI, BsaHI, BsaI, BseYI, BsiEI, BsiWI, BslI, BsmAI, BsmBI, BsmFI, BspDI, BsrBI, BsrFI, BssHII, BssKI, BstAPI, BstBI, BstUI, BstZ17I, Cac8I, ClaI, DpnI, DrdI, EaeI, EagI, EagI-HF, EciI, EcoRI, EcoRI-HF, FauI, Fnu4HI, FseI, FspI, GlaI, GluI, HaeII, HgaI, HhaI, HincII, HincII, HinfI, HinP1I, HpaI, HpaII, Hpy166II, Hpy188III, Hpy99I, HpyCH4IV, KasI, KroI, MalI, MluI, MmeI, MspA1I, MteI, MwoI, NaeI, NacI, NgoMIV, NheI-HF, NheI, NlaIV, NotI, NotI-HF, NruI, Nt.BbvCI, Nt.BsmAI, Nt.CviPII, PaeR7I, PleI, PmeI, PmlI, PcsI, PkrI, PshAI, PspOMI, PvuI, RsaI, RsrII, SacII, SalI, SalI-HF, Sau3AI, Sau96I, ScrFI, SfiI, SfoI, SgrAI, SmaI, SnaBI, TfiI, TscI, TseI, TspMI, and ZraI.
In some embodiments, processing said first set of DNA fragments or derivatives thereof in (c) (i) comprises subjecting said first set of DNA fragments or derivatives thereof to amplification, and wherein processing said second set of DNA fragments or derivatives thereof in (c) (ii) comprises subjecting said second set of DNA fragments or derivatives thereof to said amplification. In some embodiments, said amplification comprises targeted quantitative polymerase chain reaction (qPCR) at said genomic region. In some embodiments, processing said first set of DNA fragments or derivatives thereof in (c) (i) comprises determining a first cycle threshold (Ct) value for said amplification of said first set of DNA fragments or derivatives thereof at said genomic region, and wherein processing said second set of DNA fragments or derivatives thereof in (c) (ii) comprises determining a second cycle threshold (Ct) value for said amplification of said second set of DNA fragments or derivatives thereof at said genomic region. In some embodiments, (c) comprises determining a reference Ct value for said amplification of said first set of DNA fragments or derivatives thereof and said second set of DNA fragments or derivatives thereof at a reference genomic region, and normalizing said first quantitative measure and said second quantitative measure using said reference Ct value. In some embodiments, said normalizing comprises subtracting said reference Ct value from said first quantitative measure and said second quantitative measure. In some embodiments, processing said first quantitative measure with said second quantitative measure in (d) comprises calculating an intensity ratio of said first quantitative measure and said second quantitative measure at said genomic region. In some embodiments, calculating said intensity ratio comprises determining a difference between said first quantitative measure and said second quantitative measure at said genomic region. 
In some embodiments, calculating said intensity ratio comprises determining an exponentiation of a base value and said determined difference at said genomic region. In some embodiments, said base value is 2. In some embodiments, calculating said intensity ratio comprises determining a reciprocal of said determined exponentiation at said genomic region.
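By way of non-limiting illustration, the intensity ratio calculation described above may be sketched as follows. The sketch assumes that the first and second quantitative measures are the Ct values of the MSRE-digested and undigested portions, respectively, and that a separate reference Ct value is measured for each portion; the function and parameter names are hypothetical and are not taken from the disclosure.

```python
def intensity_ratio(ct_digested, ct_undigested,
                    ref_ct_digested=0.0, ref_ct_undigested=0.0, base=2.0):
    """Sketch of one reading of the intensity ratio embodiments.

    Assumptions (not stated in the source): one reference Ct per portion,
    and the difference is taken as (first measure - second measure).
    """
    # Normalize each measure by subtracting its reference Ct value.
    first = ct_digested - ref_ct_digested
    second = ct_undigested - ref_ct_undigested
    # Difference between the first and second quantitative measures.
    difference = first - second
    # Reciprocal of the exponentiation of the base value and the difference.
    return 1.0 / (base ** difference)


# Digestion delays the target by 2 cycles: about one quarter of the
# template survived digestion under this idealized model.
ratio = intensity_ratio(27.0, 25.0)
```

Under this reading, a ratio near 1 would suggest that most copies of the region resisted digestion, and a ratio near 0 would suggest that most copies were cleaved.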
In some embodiments, the method further comprises subjecting said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof, to conditions sufficient to permit said methylated nucleic acid bases to be distinguished from unmethylated nucleic acid bases. In some embodiments, subjecting said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof, to said conditions comprises performing bisulfite treatment on said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof.
In some embodiments, the method further comprises processing said methylation profile with one or more reference methylation profiles. In some embodiments, said one or more reference methylation profiles are obtained from reference biological samples of one or more additional subjects. In some embodiments, said one or more additional subjects comprise healthy subjects. In some embodiments, said one or more additional subjects comprise subjects having a disease or disorder. In some embodiments, said disease or disorder is a cancer. In some embodiments, said disease or disorder is a cancer selected from the group consisting of pancreatic cancer, liver cancer, lung cancer, colorectal cancer, leukemia, bladder cancer, bone cancer, brain cancer, breast cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, melanoma, ovarian cancer, and prostate cancer. In some embodiments, said cancer is prostate cancer.
In some embodiments, said genomic region comprises one or more CpG sites. In some embodiments, said genomic region comprises a plurality of CpG sites. In some embodiments, said plurality of CpG sites comprises at least about 10 CpG sites.
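As a non-limiting illustration of the CpG site counts recited above, the following sketch counts CpG dinucleotides on one strand of a region's sequence. The function name and the example sequence are hypothetical.

```python
def count_cpg_sites(sequence):
    """Count CpG dinucleotides (a C immediately followed by a G) on one strand."""
    # "CG" cannot overlap itself, so a non-overlapping count is exact.
    return sequence.upper().count("CG")


example = "ACGTCGCGA"          # hypothetical sequence, for illustration only
n_sites = count_cpg_sites(example)
meets_minimum = n_sites >= 10  # e.g., the "at least about 10 CpG sites" case
```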
In some embodiments, said genomic region comprises one or more genes selected from the group consisting of SCGB3A1, ANKDD1B, C5orf49, C9orf3, and GPR75-ASB3. In some embodiments, said genomic region is selected from Table 1, Table 2, or SEQ ID NO:1-SEQ ID NO:276. In some embodiments, said genomic region comprises at least about 10, at least about 20, at least about 30, at least about 40, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 120, at least about 140, at least about 160, at least about 180, at least about 200, at least about 220, at least about 240, or at least about 260 distinct genomic regions selected from Table 1, Table 2, or SEQ ID NO:1-SEQ ID NO:276. In some embodiments, said genomic region is selected from Table 2.
In some embodiments, the method further comprises electronically outputting a report indicative of said methylation profile. In some embodiments, the method further comprises processing said methylation profile to generate a likelihood of said subject as having or being suspected of having a disease or disorder. In some embodiments, said disease or disorder is a cancer. In some embodiments, said disease or disorder is a cancer selected from the group consisting of pancreatic cancer, liver cancer, lung cancer, colorectal cancer, leukemia, bladder cancer, bone cancer, brain cancer, breast cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, melanoma, ovarian cancer, and prostate cancer. In some embodiments, said cancer is prostate cancer.
In some embodiments, said likelihood is generated with a sensitivity of at least about 80%. In some embodiments, said likelihood is generated with a sensitivity of at least about 90%. In some embodiments, said likelihood is generated with a specificity of at least about 90%. In some embodiments, said likelihood is generated with a specificity of at least about 95%. In some embodiments, said likelihood is generated with an accuracy of at least about 90%. In some embodiments, said likelihood is generated with an accuracy of at least about 95%. In some embodiments, said likelihood is generated with an area under the curve (AUC) of at least about 0.90.
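For illustration only, the performance measures named above (sensitivity, specificity, accuracy, and AUC) may be computed from labeled results as sketched below. The sketch assumes binary labels (1 for a subject with the disease, 0 for a subject without it) and that both classes are present; the function names and example values are hypothetical and do not represent data from the disclosure.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels and calls."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A likelihood score may be converted to a positive or negative call by comparing it with a cutoff before computing sensitivity, specificity, and accuracy, whereas the AUC is computed from the scores directly.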
In some embodiments, said first set of DNA fragments and said second set of DNA fragments each comprises a first amount of external DNA molecules, wherein said external DNA molecules do not contain CpG sites.
In another aspect, the present disclosure provides a method for processing or analyzing a plurality of deoxyribonucleic acid (DNA) molecules from a biological sample of a subject, comprising: (a) providing a first set of DNA fragments derived from a first portion of said plurality of DNA molecules upon subjecting said first portion of said plurality of DNA molecules to fragmentation conditions sufficient to fragment at least a subset of said first portion of said plurality of DNA molecules at one or more CpG sites, wherein at least a subset of said first set of DNA fragments comprises methylated nucleic acid bases; (b) providing a second set of DNA fragments derived from a second portion of said plurality of DNA molecules, wherein said second portion has a substantially equal amount of DNA as said first portion; (c) for a genomic region, processing (i) said first set of DNA fragments or derivatives thereof to yield a first quantitative measure of DNA methylation and (ii) said second set of DNA fragments or derivatives thereof to yield a second quantitative measure of DNA methylation; and (d) processing said first quantitative measure with said second quantitative measure to yield a third quantitative measure of DNA methylation at said genomic region, thereby generating a methylation profile of said plurality of DNA molecules at said genomic region.
In some embodiments, said biological sample is obtained or derived from a tissue sample, a blood sample, a plasma sample, a serum sample, an exosome sample, a urine sample, a sweat sample, or a saliva sample.
In some embodiments, the method further comprises performing an assay selected from the group consisting of methylation-sensitive restriction enzyme (MSRE) digestion, polymerase chain reaction (PCR), quantitative PCR (qPCR), nucleic acid sequencing, target capture, mass spectrometry-based target fragmentation assay, flap endonuclease-based assay, CRISPR-based assay, methylation-specific assay comprising bisulfite treatment, methylation-specific PCR, targeted sequencing, targeted bisulfite sequencing, pyrosequencing, mass spectroscopy-based bisulfite sequencing (EpiTYPER), reduced representation bisulfite sequencing (RRBS), whole genome sequencing (WGS), and a combination thereof.
In some embodiments, said fragmentation conditions comprise MSRE digestion of said first portion of said plurality of DNA molecules to fragment said at least said subset of said first portion of said plurality of DNA molecules at said one or more CpG sites. In some embodiments, said MSRE is selected from the group consisting of AatII, Acc65I, AccI, AciI, AclI, AfeI, AgeI, ApaI, ApaLI, AscI, AsiSI, AvaI, AvaII, Aox I, BaeI, BanI, BbeI, BceAI, BcgI, BfuCI, BglI, BisI, BlsI, BmgBI, BsaAI, BsaBI, BsaHI, BsaI, BseYI, BsiEI, BsiWI, BslI, BsmAI, BsmBI, BsmFI, BspDI, BsrBI, BsrFI, BssHII, BssKI, BstAPI, BstBI, BstUI, BstZ17I, Cac8I, ClaI, DpnI, DrdI, EaeI, EagI, EagI-HF, EciI, EcoRI, EcoRI-HF, FauI, Fnu4HI, FseI, FspI, Gla I, Glu I, HaeII, HgaI, HhaI, HincII, HincII, HinfI, HinP1I, HpaI, HpaII, Hpy166II, Hpy188III, Hpy99I, HpyCH4IV, KasI, Kro I, Mal I, MluI, MmeI, MspA1I, Mte I, MwoI, NaeI, NacI, NgoNIV, NheI-HF, NheI, NlaIV, NotI, NotI-HF, NruI, Nt.BbvCI, Nt.BsmAI, Nt.CviPII, PaeR7I, PleI, PmeI, PmlI, Pcs I, Pkr I, PshAI, PspOMI, PvuI, RsaI, RsrII, SacII, SalI, SalI-HF, Sau3AI, Sau96I, ScrFI, SfiI, SfoI, SgrAI, SmaI, SnaBI, TfiI, TscI, TseI, TspMI, and ZraI.
In some embodiments, processing said first set of DNA fragments or derivatives thereof in (c) (i) comprises subjecting said first set of DNA fragments or derivatives thereof to amplification, and wherein processing said second set of DNA fragments or derivatives thereof in (c) (ii) comprises subjecting said second set of DNA fragments or derivatives thereof to said amplification. In some embodiments, said amplification comprises targeted quantitative polymerase chain reaction (qPCR) at said genomic region. In some embodiments, processing said first set of DNA fragments or derivatives thereof in (c) (i) comprises determining a first cycle threshold (Ct) value for said amplification of said first set of DNA fragments or derivatives thereof at said genomic region, and wherein processing said second set of DNA fragments or derivatives thereof in (c) (ii) comprises determining a second cycle threshold (Ct) value for said amplification of said second set of DNA fragments or derivatives thereof at said genomic region. In some embodiments, (c) comprises determining a reference Ct value for said amplification of said first set of DNA fragments or derivatives thereof and said second set of DNA fragments or derivatives thereof at a reference genomic region, and normalizing said first quantitative measure and said second quantitative measure using said reference Ct value. In some embodiments, said normalizing comprises subtracting said reference Ct value from said first quantitative measure and said second quantitative measure. In some embodiments, processing said first quantitative measure with said second quantitative measure in (d) comprises calculating an intensity ratio of said first quantitative measure and said second quantitative measure at said genomic region. In some embodiments, calculating said intensity ratio comprises determining a difference between said first quantitative measure and said second quantitative measure at said genomic region. 
In some embodiments, calculating said intensity ratio comprises determining an exponentiation of a base value and said determined difference at said genomic region. In some embodiments, said base value is 2. In some embodiments, calculating said intensity ratio comprises determining a reciprocal of said determined exponentiation at said genomic region.
In some embodiments, the method further comprises subjecting said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof, to conditions sufficient to permit said methylated nucleic acid bases to be distinguished from unmethylated nucleic acid bases. In some embodiments, subjecting said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof, to said conditions comprises performing bisulfite treatment on said first set of DNA fragments and said second set of DNA fragments, or derivatives thereof.
In some embodiments, the method further comprises processing said methylation profile with one or more reference methylation profiles. In some embodiments, said one or more reference methylation profiles are obtained from reference biological samples of one or more additional subjects. In some embodiments, said one or more additional subjects comprise healthy subjects. In some embodiments, said one or more additional subjects comprise subjects having a disease or disorder. In some embodiments, said disease or disorder is a cancer. In some embodiments, said disease or disorder is a cancer selected from the group consisting of pancreatic cancer, liver cancer, lung cancer, colorectal cancer, leukemia, bladder cancer, bone cancer, brain cancer, breast cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, melanoma, ovarian cancer, and prostate cancer. In some embodiments, said cancer is prostate cancer.
In some embodiments, said genomic region comprises one or more CpG sites. In some embodiments, said genomic region comprises a plurality of CpG sites. In some embodiments, said plurality of CpG sites comprises at least about 10 CpG sites.
In some embodiments, said genomic region comprises one or more genes selected from the group consisting of SCGB3A1, ANKDD1B, C5orf49, C9orf3, and GPR75-ASB3. In some embodiments, said genomic region is selected from Table 1, Table 2, or SEQ ID NO:1-SEQ ID NO:276. In some embodiments, said genomic region comprises at least about 10, at least about 20, at least about 30, at least about 40, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 120, at least about 140, at least about 160, at least about 180, at least about 200, at least about 220, at least about 240, or at least about 260 distinct genomic regions selected from Table 1, Table 2, or SEQ ID NO:1-SEQ ID NO:276. In some embodiments, said genomic region is selected from Table 2.
In some embodiments, the method further comprises electronically outputting a report indicative of said methylation profile. In some embodiments, the method further comprises processing said methylation profile to generate a likelihood of said subject as having or being suspected of having a disease or disorder. In some embodiments, said disease or disorder is a cancer. In some embodiments, said disease or disorder is a cancer selected from the group consisting of pancreatic cancer, liver cancer, lung cancer, colorectal cancer, leukemia, bladder cancer, bone cancer, brain cancer, breast cancer, cervical cancer, endometrial cancer, esophageal cancer, gastric cancer, head and neck cancer, melanoma, ovarian cancer, and prostate cancer. In some embodiments, said cancer is prostate cancer.
In some embodiments, said first set of DNA fragments and said second set of DNA fragments each comprises a first amount of external DNA molecules, wherein said external DNA molecules do not contain CpG sites.
In another aspect, the present disclosure provides a method for identifying prostate cancer of a subject, comprising: (a) using a methylation assay to process a plurality of deoxyribonucleic acid (DNA) molecules from a biological sample of the subject to determine quantitative measures of methylation at each of one or more genes, thereby generating a DNA methylation signature of said biological sample of said subject, wherein said one or more genes comprise genes selected from the group consisting of SCGB3A1, ANKDD1B, C5orf49, C9orf3, and GPR75-ASB3; (b) comparing said DNA methylation signature with one or more reference DNA methylation signatures; and (c) based at least in part on the comparing in (b), identifying the prostate cancer of said subject.
In some embodiments, said biological sample is obtained or derived from a tissue sample, a blood sample, or a urine sample.
In some embodiments, said methylation assay comprises one or more assays selected from the group consisting of: methylation-sensitive restriction enzyme (MSRE) digestion, polymerase chain reaction (PCR), quantitative PCR (qPCR), nucleic acid sequencing, target capture, mass spectrometry-based target fragmentation assay, flap endonuclease-based assay, CRISPR-based assay, methylation-specific assay comprising bisulfite treatment, methylation-specific PCR, targeted sequencing, targeted bisulfite sequencing, pyrosequencing, mass spectroscopy-based bisulfite sequencing (EpiTYPER), reduced representation bisulfite sequencing (RRBS), whole genome sequencing (WGS), and a combination thereof.
Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure” and “FIG.” herein), of which:
While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
As used in the specification and claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. For example, the term “a nucleic acid” includes a plurality of nucleic acids, including mixtures thereof.
As used herein, the term “nucleic acid” generally refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides (dNTPs) or ribonucleotides (rNTPs), or analogs thereof. Nucleic acids may have any three-dimensional structure, and may perform any function, known or unknown. Non-limiting examples of nucleic acids include DNA, RNA, coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant nucleic acids, branched nucleic acids, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A nucleic acid may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be made before or after assembly of the nucleic acid. The sequence of nucleotides of a nucleic acid may be interrupted by non-nucleotide components. A nucleic acid may be further modified after polymerization, such as by conjugation or binding with a reporter agent.
As used herein, the term “target nucleic acid” generally refers to a nucleic acid molecule in a starting population of nucleic acid molecules having a nucleotide sequence whose presence, amount, and/or sequence, or changes in one or more of these, are desired to be determined. A target nucleic acid may be any type of nucleic acid, including DNA, RNA, and analogs thereof. As used herein, a “target ribonucleic acid (RNA)” generally refers to a target nucleic acid that is RNA. As used herein, a “target deoxyribonucleic acid (DNA)” generally refers to a target nucleic acid that is DNA.
As used herein, the term “target” generally refers to a genomic region within a marker gene or marker region. As used herein, the term “reference” generally refers to a sample obtained or derived from a subject who is diagnosed with prostate cancer (prostate cancer patient) or who has received a negative clinical indication of prostate cancer (e.g., a healthy or control subject without prostate cancer).
As used herein, the terms “locus” or “region” are generally interchangeable and refer to a specific genomic region on the genome represented by chromosome number, start position, and end position.
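As a non-limiting illustration, a locus represented by chromosome number, start position, and end position may be modeled as sketched below. The class name, the 0-based half-open coordinate convention, and the example coordinates are assumptions made for illustration and are not specified by the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicRegion:
    chromosome: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive

    def __post_init__(self):
        if self.start < 0 or self.end <= self.start:
            raise ValueError("expected 0 <= start < end")

    @property
    def length(self):
        """Number of bases covered by the region."""
        return self.end - self.start

    def overlaps(self, other):
        """True if both regions share at least one base on the same chromosome."""
        return (self.chromosome == other.chromosome
                and self.start < other.end
                and other.start < self.end)


# Hypothetical coordinates, for illustration only.
region_a = GenomicRegion("chr5", 100, 200)
region_b = GenomicRegion("chr5", 150, 250)
```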
As used herein, the term “subject” generally refers to an entity or a medium that has testable or detectable genetic information. A subject can be a person or individual, such as a patient. A subject can be a vertebrate, such as, for example, a mammal. Non-limiting examples of mammals include murines, simians, humans, farm animals, sport animals, and pets.
As used herein, the term “sample” generally refers to a bodily sample or part(s) of a subject, which is obtained and analyzed to measure or to determine the character of the whole, such as a specimen of tissue, blood, or urine.
As used herein, the term “tumor suppression genes” generally refers to a group of genes directing the production of the protein that regulates cell division. The tumor suppressor protein can play a role in keeping cell division in check. When mutated, a tumor suppressor gene may become unable to control cell division and lead to uncontrolled cell growth, an important mechanism in tumorigenesis.
As used herein, the term “biomarker” generally refers to any substance, structure, or process that can be measured in a subject's body or its products and be used to influence or predict a clinical outcome or disease with or without treatment, select an appropriate treatment (or predict whether treatment would be effective), or monitor a current treatment and potentially change the treatment.
As used herein, the term “methylation” refers to 5-methylcytosine (5mC) or 5-hydroxymethylcytosine (5hmC), including cytosine residues that are part of the sequence CG, also denoted as CpG dinucleotides (cytosine residues that are part of other sequences are not methylated). Some CG dinucleotides in the human genome are methylated, and others are not. In addition, methylation can be cell-specific and tissue-specific, such that a specific CG dinucleotide can be methylated in a certain cell and at the same time unmethylated in a different cell, or methylated in a certain tissue and at the same time unmethylated in different tissues. DNA methylation can be an important regulator of gene transcription. Aberrant DNA methylation patterns, both hypermethylation and hypomethylation, as compared to normal tissue, may be associated with a large number of human malignancies. In some embodiments, 5hmC residues of a sequence may be subjected to glucosylation prior to subsequent bisulfite treatment and MSRE digestion. For example, the glucosylation may be performed using a glucosyltransferase.
As used herein, the terms “methylation state,” “methylation status,” and “methylation profile” generally refer to the presence or absence of one or more methylated nucleotide bases in the nucleic acid molecule. For example, a nucleic acid molecule (e.g., DNA molecule) containing a methylated cytosine is considered methylated (e.g., the methylation state of the nucleic acid molecule is methylated). A nucleic acid molecule that does not contain any methylated nucleotides is considered unmethylated.
As used herein, the term “bisulfite treatment” generally refers to the treatment of DNA with bisulfite that converts cytosine residues to uracil residues, but leaves 5-methylcytosine residues unaffected. Therefore, DNA that has been treated with bisulfite may retain only methylated cytosines.
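By way of non-limiting illustration, the effect of bisulfite treatment on a single DNA strand may be modeled as sketched below. The sketch assumes complete conversion and represents methylation as a set of 0-based positions; the function name and the example sequence are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Idealized bisulfite conversion of one strand.

    Unmethylated cytosines become uracil (U); cytosines at the given
    0-based positions are treated as 5-methylcytosine and left unchanged.
    Assumes 100% conversion efficiency, which real reactions only approach.
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated:
            converted.append("U")
        else:
            converted.append(base)
    return "".join(converted)


# The cytosine at position 1 is methylated and survives; the others convert.
treated = bisulfite_convert("ACGTCCG", {1})
```

After subsequent PCR, uracil is copied as thymine, so unmethylated cytosines are read as T while methylated cytosines are read as C.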
As used herein, the term “pyrosequencing” generally refers to a sequencing-by-synthesis method that quantitatively monitors the real-time incorporation of nucleotides through the enzymatic conversion of released pyrophosphate into a proportional light signal. Analysis of DNA methylation patterns by pyrosequencing may combine a simple reaction protocol with reproducible and accurate measures of the degree of methylation at several CpGs in close proximity with high quantitative resolution. After bisulfite treatment and PCR, the degree of methylation at each CpG position in a sequence may be determined from the ratio of T and C. The process of purification and sequencing can be repeated for the same template to analyze other CpGs in the same amplification product.
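For illustration only, the degree of methylation at a CpG position may be estimated from the C and T signals as sketched below. The sketch assumes that, after bisulfite treatment and PCR, the C signal reflects methylated templates and the T signal reflects unmethylated templates; the function name and signal values are hypothetical.

```python
def methylation_fraction(c_signal, t_signal):
    """Fraction of templates methylated at one CpG position: C / (C + T)."""
    total = c_signal + t_signal
    if c_signal < 0 or t_signal < 0 or total == 0:
        raise ValueError("signals must be non-negative and not both zero")
    return c_signal / total


# Hypothetical peak heights: 30 units of C and 70 units of T.
fraction = methylation_fraction(30.0, 70.0)
```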
As used herein, the terms “amplifying” and “amplification” are used interchangeably and generally refer to generating one or more copies or “amplified product” of a nucleic acid. The term “DNA amplification” generally refers to generating one or more copies of a DNA molecule or “amplified DNA product.” The term “reverse transcription amplification” generally refers to the generation of deoxyribonucleic acid (DNA) from a ribonucleic acid (RNA) template via the action of a reverse transcriptase. Amplification may be performed by polymerase chain reaction (PCR), which is based on using DNA polymerase to synthesize new strands of DNA complementary to the initial template strands.
As used herein, the term “polymerase chain reaction (PCR)” generally refers to a method for increasing the concentration of a segment of a target sequence in a mixture of genomic DNA without cloning or purification. This process for amplifying the target sequence may comprise introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers may be complementary to their respective strands of the double-stranded target sequence. To perform amplification, the mixture may be denatured and the primers may be annealed to their complementary sequences within the target molecule. Following annealing, the primers may be extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing, and polymerase extension can be repeated many times (e.g., denaturation, annealing and extension constitute one “cycle”; there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired target sequence. The length of the amplified segment of the desired target sequence may be determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. By virtue of the repeating aspect of the process, the method is referred to as “polymerase chain reaction” (PCR). Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified” and are “PCR products” or “amplicons.”
As used herein, the term “DNA template” generally refers to the sample DNA that contains the target sequence. At the beginning of the reaction, high temperature is applied to the original double-stranded DNA molecule to separate the strands from each other.
As used herein, the term “primer” generally refers to a short piece of single-stranded DNA that is complementary to the DNA template. The polymerase begins synthesizing new DNA from the end of the primer.
As used herein, the term “Ct value” generally refers to the number of cycles required for the fluorescent signal to cross a given cycle threshold (e.g., at which the signal exceeds a background level). Ct levels may be inversely proportional to the amount of target nucleic acid in a sample (e.g., the lower the Ct level of a given sample, the greater the amount of target nucleic acid in the sample).
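As a non-limiting illustration of the inverse relationship between Ct and the amount of target nucleic acid, the following sketch reads a Ct value from an idealized amplification curve. It assumes perfect doubling in every cycle with no plateau and takes Ct as the first whole cycle at which the signal exceeds the threshold; actual instruments typically apply baseline correction and interpolation. All names and values are hypothetical.

```python
def simulated_curve(initial_amount, cycles=40, efficiency=1.0):
    """Idealized signal after each cycle (no plateau, no background)."""
    return [initial_amount * (1.0 + efficiency) ** cycle
            for cycle in range(1, cycles + 1)]


def ct_value(fluorescence, threshold):
    """First 1-based cycle whose signal exceeds the threshold, else None."""
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal > threshold:
            return cycle
    return None


# Eight-fold more starting template crosses the threshold 3 cycles earlier.
ct_low_input = ct_value(simulated_curve(1.0), threshold=1000.0)
ct_high_input = ct_value(simulated_curve(8.0), threshold=1000.0)
```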
As used herein, the term “restriction enzyme” generally refers to an enzyme that cuts DNA at or near specific recognition nucleotide sequences (e.g., restriction sites).
As used herein, the term “methylation-sensitive” restriction enzyme (MSRE) generally refers to a restriction endonuclease that cleaves its recognition sequence only if it is unmethylated (leaving methylated sites intact). The DNA cutting intensity of a “methylation-sensitive” restriction enzyme may depend on the methylation level of the specific sequence, where higher methylation levels lead to less digestion.
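By way of non-limiting illustration, MSRE digestion may be modeled as sketched below, using HpaII as an example. The sketch assumes the commonly described behavior of HpaII (recognition sequence CCGG, cleavage after the first C, blocked when the cytosine of the internal CpG is methylated) and considers one strand only; the function name, parameters, and example sequence are hypothetical.

```python
def msre_digest(sequence, methylated_positions,
                site="CCGG", cut_offset=1, cpg_offset=1):
    """Cut a sequence at each unmethylated occurrence of a recognition site.

    cut_offset: index within the site after which the cut falls.
    cpg_offset: index within the site of the CpG cytosine whose methylation
                (given as 0-based positions in the sequence) blocks cleavage.
    """
    methylated = set(methylated_positions)
    fragments = []
    fragment_start = 0
    position = sequence.find(site)
    while position != -1:
        if position + cpg_offset not in methylated:
            cut = position + cut_offset
            fragments.append(sequence[fragment_start:cut])
            fragment_start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[fragment_start:])
    return fragments


# Two CCGG sites; the CpG cytosine of the second (position 9) is methylated,
# so only the first site is cleaved.
pieces = msre_digest("AACCGGTTCCGGAA", {9})
```

In this model, a fully methylated region yields a single intact fragment that remains available for amplification, consistent with higher methylation levels leading to less digestion.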
As used herein, the term “internal control” generally refers to a sequence from a human genome that does not contain the specific sequences required for methylation-sensitive restriction enzymes to cut.
As used herein, the term “external control” generally refers to a sequence from a non-human genome that does not contain a CG site.
As used herein, the term “methylation-specific PCR (MSP)” generally refers to a tool for qualitative DNA methylation analysis. MSP may have advantages such as ease of design and execution, sensitivity in the ability to detect small quantities of methylated DNA, and the ability to rapidly screen a large number of samples without expensive laboratory equipment. This assay may require modification of the genomic DNA by sodium bisulfite and two independent primer sets for PCR amplification, one pair designed to recognize the methylated versions of the bisulfite-modified sequence and the other pair designed to recognize the unmethylated versions of the bisulfite-modified sequence. The amplicons may be visualized using ethidium bromide staining following agarose gel electrophoresis. Amplicons of the expected size produced from either primer pair may be indicative of the presence of DNA in the original sample with the respective methylation status.
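The bisulfite modification that MSP relies on can be sketched in silico. Sodium bisulfite converts unmethylated cytosine to uracil, which is read as thymine after PCR, while methylated cytosine is left unchanged. The function name `bisulfite_convert`, the single-strand model, and the example sequence are assumptions made for illustration only.

```python
from typing import AbstractSet


def bisulfite_convert(sequence: str, methylated_positions: AbstractSet[int]) -> str:
    """Return the sequence as read after bisulfite treatment and PCR.

    Unmethylated C becomes T; a C whose 0-based index is listed in
    `methylated_positions` is protected and stays C.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical template with CpG cytosines at indices 1 and 4.
template = "ACGTCGA"
print(bisulfite_convert(template, {1, 4}))   # fully methylated template
print(bisulfite_convert(template, set()))    # fully unmethylated template
```

Because the two converted sequences differ at each CpG cytosine, one primer pair can be designed against the methylated version and another against the unmethylated version.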
As used herein, the term “reduced representation bisulfite sequencing (RRBS)” generally refers to an efficient and high-throughput technique for analyzing genome-wide methylation profiles at a single-nucleotide level. Reduced representation bisulfite sequencing (RRBS) may combine restriction enzymes and bisulfite sequencing to enrich for areas of the genome with a high CpG content. RRBS can reduce the number of nucleotides that must be sequenced to 1% of the genome. The fragments that comprise the reduced genome may still include the majority of promoters, as well as regions such as repeated sequences that are difficult to profile using conventional bisulfite sequencing approaches.
As used herein, the term “targeted (bisulfite) sequencing” generally refers to an accurate, efficient, and economical technology for DNA methylation analysis of target regions, which may include a hybridization-based step on platforms containing pre-designed oligonucleotides (oligos) that capture the CpG islands, gene promoters, and other significant methylated regions, or a PCR-based step to amplify multiple bisulfite-converted DNA regions in a single reaction. Specific primers may be designed to capture the region of interest and evaluate site-specific DNA methylation changes.
As used herein, the term “sensitivity” generally refers to the percentage of a set of samples that report a DNA methylation value above a threshold value that distinguishes between neoplastic (e.g., prostate cancer) and non-neoplastic (e.g., healthy or control) samples. In some embodiments, a positive is defined as a histology-confirmed neoplasia that reports a DNA methylation value above a threshold value (e.g., the range associated with disease), and a false negative is defined as a histology-confirmed neoplasia that reports a DNA methylation value below the threshold value (e.g., the range associated with no disease). The value of sensitivity may reflect the probability that a DNA methylation measurement for a given marker obtained from a diseased sample falls in the range of disease-associated measurements. The clinical relevance of the calculated sensitivity value may represent an estimation of the probability that a given marker can detect or predict the presence of a clinical condition when applied to a subject having the clinical condition.
As used herein, the term “specificity” generally refers to the percentage of non-neoplastic samples that report a DNA methylation value below a threshold value that distinguishes between neoplastic and non-neoplastic samples. In some embodiments, a negative is defined as a histology-confirmed non-neoplastic sample that reports a DNA methylation value below the threshold value (e.g., the range associated with no disease) and a false positive is defined as a histology-confirmed non-neoplastic sample that reports a DNA methylation value above the threshold value (e.g., the range associated with disease). The value of specificity may reflect the probability that a DNA methylation measurement for a given marker obtained from a non-neoplastic (e.g., healthy or control) sample falls in the range of non-disease associated measurements. The clinical relevance of the calculated specificity value may represent an estimation of the probability that a given marker can detect or predict the absence of a clinical condition when applied to a subject not having the clinical condition.
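The definitions of sensitivity and specificity above can be sketched as a short calculation. A value above the threshold is treated as a positive call, as in the definitions; treating a value exactly equal to the threshold as a negative call is an assumption of this sketch, as are the function name and the example values.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    methylation_values: Sequence[float],
    is_neoplastic: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a single methylation threshold."""
    tp = fn = tn = fp = 0
    for value, diseased in zip(methylation_values, is_neoplastic):
        positive_call = value > threshold
        if diseased and positive_call:
            tp += 1      # confirmed neoplasia above the threshold
        elif diseased:
            fn += 1      # confirmed neoplasia below the threshold
        elif positive_call:
            fp += 1      # non-neoplastic sample above the threshold
        else:
            tn += 1      # non-neoplastic sample below the threshold
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one neoplastic and one non-neoplastic sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical methylation values for three confirmed cancers and three controls.
values = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
labels = [True, True, True, False, False, False]
print(sensitivity_specificity(values, labels, threshold=0.5))
```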
As used herein, the term “AUC” or “AUROC” generally refers to an abbreviation for the area under a Receiver Operating Characteristic (ROC) curve. The ROC curve may be a plot of the true positive rate (TPR) against the false positive rate (FPR) for a plurality of different possible thresholds or cut points of a diagnostic test, thereby illustrating the trade-off between sensitivity and specificity depending on the selected cut point (e.g., any increase in sensitivity is accompanied by a decrease in specificity). The area under an ROC curve (AUC) can be a measure for the accuracy of a diagnostic test (e.g., the larger the area, the more accurate the diagnosis), with an optimal value of 1. In comparison, a random test may have an ROC curve lying on the diagonal with an AUC of 0.5 (e.g., representing a random or worthless test).
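As an illustrative sketch, the AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative sample, with ties counted as one half. The function name `roc_auc` and the example scores are assumptions.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, label in zip(scores, is_positive) if label]
    negatives = [s for s, label in zip(scores, is_positive) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: three cancers followed by three controls.
scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
labels = [True, True, True, False, False, False]
print(roc_auc(scores, labels))
```

A test whose scores separate the two groups perfectly yields 1, and a test that assigns every sample the same score yields 0.5, matching the optimal and random cases described above.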
Prostate cancer is the second most common form of cancer in men worldwide, accounting for about 15% of the cancers diagnosed in men. Currently, the only available blood molecular screening method for prostate cancer is a prostate-specific antigen (PSA) test, which aims to detect prostate cancer at an early stage when the disease is amenable to curative treatment and to reduce the overall disease-specific mortality. However, PSA is not a cancer-specific biomarker, and its level often increases abnormally in patients with benign prostate conditions. Further, only a small minority of men with an elevated PSA level are actually found to have prostate cancer when a biopsy is performed. Therefore, there is scant evidence to establish that PSA screening for prostate cancer can save lives. Currently, biopsy is the only method for prostate cancer diagnosis, but the high false negative rate of biopsy can lead to a significant percentage of men remaining undiagnosed after the first biopsy. Further, markers of prostate cancer such as the tumor stage, Gleason score, and PSA level cannot accurately identify the individuals who will ultimately fail a treatment. Thus, there is an urgent need to develop high performance assays with panels of good biomarkers to reduce the need for repeated biopsies for prostate cancer diagnosis and to direct treatment strategies for prostate cancer.
Inactivation of tumor suppressor genes is an important event contributing to the development of neoplastic malignancies. Alteration of gene promoter DNA methylation may often be correlated with changes in gene expression level. DNA methylation can occur when DNA methyltransferase adds a methyl group to a DNA molecule at a cytosine-phosphate-guanine (CpG) site without changing the sequence of the DNA molecule. DNA methylation may be an early event during tumorigenesis, and global abnormal DNA methylation may be observed in different tumor types. In general, cancer can be characterized by global hypomethylation (resulting in increased oncogene expression and genomic instability) and by gene-specific promoter hypermethylation resulting in suppressed DNA repair and other tumor-suppressive functions. DNA methylation may be stable in fixed samples over time and may be detectable in various bodily fluids and tissue. DNA methylation may also be cell-type specific. Further, various techniques for measuring DNA methylation can be performed. In light of all these characteristics, DNA methylation may be a promising target for the development of powerful diagnostic, prognostic, and predictive biomarkers for cancers.
The present disclosure provides methods, systems, and kits for detecting prostate cancer in a subject by analyzing nucleic acids from biological samples (e.g., tissue samples and/or bodily fluid samples) obtained from or derived from the subject for abnormal methylation profiles (e.g., relative to reference samples or methylation profiles). Biological samples obtained from subjects may be analyzed to measure a presence, absence, or relative assessment of the prostate cancer. The analysis may be performed at a set of genomic regions, such as DNA methylation marker regions. The subjects may include subjects with prostate cancer (e.g., prostate cancer patients) and subjects without prostate cancer (e.g., normal or healthy controls).
Methods and systems of the present disclosure may use methylation-sensitive restriction enzymes (MSRE) to analyze the methylation status of cytosine residues in CpG sequences. The enzymes may be unable to cleave methylated-cytosine residues and therefore leave methylated DNA intact. In some embodiments, sample DNA obtained or derived from a test subject can be digested with at least one methylation-sensitive restriction enzyme. For example, biomarkers of the present disclosure may include genomic loci that contain at least one specific MSRE recognized sequence (recognition site). The sample DNA can be cut (digested) according to its methylation level, where higher methylation results in less digestion by the enzyme. For example, if a DNA sample from a healthy subject is less methylated than another DNA sample from a cancer patient for the CpGs on the recognition sequence, it will be cut more extensively.
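The digestion logic described above can be sketched in silico. The example uses HpaII, which recognizes CCGG and is blocked when the internal CpG cytosine is methylated. The sketch is a deliberate simplification: it assumes complete digestion, looks at one strand only, and treats a template as amplifiable only if every recognition site on it is protected. The function name and the example sequence are illustrative assumptions.

```python
from typing import AbstractSet


def survives_digestion(
    sequence: str,
    methylated_positions: AbstractSet[int],
    site: str = "CCGG",
    cpg_offset: int = 1,
) -> bool:
    """Return True if the template stays intact after MSRE digestion.

    `cpg_offset` is the 0-based offset, within the recognition site, of the
    cytosine whose methylation blocks cleavage (the second C of CCGG).
    """
    seq = sequence.upper()
    start = seq.find(site)
    while start != -1:
        if (start + cpg_offset) not in methylated_positions:
            return False  # an unmethylated site is cut, so the template is lost
        start = seq.find(site, start + 1)
    return True


# Hypothetical amplicon with two CCGG sites (CpG cytosines at indices 4 and 11).
amplicon = "AAACCGGTTTCCGGAA"
print(survives_digestion(amplicon, {4, 11}))  # both sites methylated
print(survives_digestion(amplicon, {4}))      # one site unmethylated
print(survives_digestion("AAATTTGGGAAA", set()))  # no site, as for a control locus
```

Under these assumptions, a heavily methylated sample keeps more intact template for amplification than a weakly methylated one, which is the behavior the paragraph above describes.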
In some embodiments, a control locus is designed to be without MSRE cutting sites. In some embodiments, a fixed proportion of control DNA is added into the sample DNA for all test subjects. In some embodiments, at least one pair of qPCR primers is designed for each target genomic region of a biomarker. For each patient, two qPCR reactions are run independently on the same qPCR target: a first qPCR reaction is run on a first portion of the sample DNA that contains MSRE-digested DNA template, and a second qPCR reaction is run on a second portion of the sample DNA that contains undigested DNA templates. The undigested template may be used to represent the fully methylated DNA. After the purification of the MSRE digestion, the same amount of DNA may be used for the digested and undigested templates. The signal intensity of the qPCR reaction may be generated from the cycle threshold (Ct) values. For each locus of a given subject, the Ct difference (delta Ct) between the first qPCR reaction (run on the digested DNA template) and the second qPCR reaction (run on the undigested DNA template) is calculated and used to indicate the DNA methylation level of the subject. Thus, the delta Ct value can represent the subject's DNA methylation level for the target region. For example, the undigested DNA may have low Ct values, while the digested DNA from a normal individual may have high Ct values, thereby resulting in large absolute delta Ct values. Otherwise, the delta Ct values from a prostate cancer patient may be small (e.g., close to 0).
Using methods and systems of the present disclosure, prostate cancer can be accurately detected using a non-user-dependent assay with high sensitivity and specificity in prostate tissue samples. The blood-based assay can use a set of biomarkers that accurately distinguish prostate cancer samples from control samples across all stages of prostate cancer. Further, the blood-based assay may offer high specificity, thereby facilitating the non-invasive application of prostate cancer associated biomarkers for treatment monitoring of prostate cancer patients.
The use of methods, systems, and kits of the present disclosure for prostate cancer detection based on analysis of aberrant methylation profiles (e.g., containing abnormal DNA methylation over a panel of predetermined biomarker genomic regions) may comprise the following steps:
1) Extracting DNA molecules from a biological sample (e.g., tissue, blood, urine, or exosome) of a test subject;
2) Preparing two sub-samples of equal or substantially equal size (e.g., amount and/or volume) obtained or derived from the same subject;
3) Digesting the first sub-sample of the DNA mixture with at least one MSRE to produce DNA fragments, and purifying the digested DNA fragments;
4) Performing qPCR amplification of the digested DNA fragments and the undigested DNA molecules for a panel of one or more target genes (e.g., prostate cancer biomarker genes or biomarker regions) and an internal control locus;
5) Normalizing the cycle threshold (Ct) values by subtracting the control Ct from the target Ct measured for the same subject;
6) Calculating the intensity ratio given by: 1/2^(Ct of normalized MSRE-digested DNA − Ct of normalized undigested DNA) to produce an intensity ratio of the test sample; and
7) Comparing the intensity ratio of the test sample to the intensity ratio of reference samples obtained or derived from one or more prostate cancer patients and one or more control (e.g., healthy or normal) subjects.
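Steps 5) and 6) above can be sketched as a short calculation. The function name, argument names, and Ct readings below are hypothetical; the arithmetic follows the steps as written, with base 2.

```python
def intensity_ratio(
    target_ct_digested: float,
    control_ct_digested: float,
    target_ct_undigested: float,
    control_ct_undigested: float,
) -> float:
    """Intensity ratio of one target region for one subject.

    Step 5: normalize each target Ct by subtracting the control Ct measured
    on the same template. Step 6: take 1 / 2^(difference of normalized Cts).
    """
    normalized_digested = target_ct_digested - control_ct_digested
    normalized_undigested = target_ct_undigested - control_ct_undigested
    return 1.0 / 2.0 ** (normalized_digested - normalized_undigested)


# Hypothetical readings: digestion barely shifts the target Ct (methylated-like).
print(intensity_ratio(25.5, 24.0, 25.0, 24.0))
# Hypothetical readings: digestion shifts the target Ct by six cycles
# (unmethylated-like), giving a ratio of 1/64.
print(intensity_ratio(31.0, 24.0, 25.0, 24.0))
```

A ratio near 1 means the target was largely protected from digestion, and a ratio near 0 means most of the target was cut. The comparison against reference samples in step 7) is not covered by this sketch.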
Processing Biological Samples
In an aspect, the present disclosure provides a method for identifying or monitoring prostate cancer in a subject by processing or analyzing DNA molecules from a biological sample of the subject. The method may comprise providing a first set of DNA fragments derived from a first portion of the DNA molecules upon subjecting the first portion to CpG site fragmentation conditions. For example, DNA molecules of a urine sample may be split into two sub-samples, and the first DNA sub-sample may be MSRE-digested to fragment the DNA molecules at CpG sites. The two sub-samples may be of equal or substantially equal size (e.g., amount or volume). Next, the method may comprise providing a second set of DNA fragments derived from a second portion of the DNA molecules, wherein the second portion is not subjected to fragmentation conditions. For example, after the DNA molecules of a urine sample are split into two sub-samples, the second DNA sub-sample may not be subjected to fragmentation conditions such as MSRE digestion. Next, the method may comprise, for a genomic region, processing (i) the first set of DNA fragments or derivatives thereof to yield a first quantitative measure of DNA methylation and (ii) the second set of DNA fragments or derivatives thereof to yield a second quantitative measure of DNA methylation. Next, the method may comprise processing the first quantitative measure with the second quantitative measure to yield a third quantitative measure of DNA methylation at the genomic region, to generate a methylation profile of the plurality of DNA molecules at the genomic region.
The biological samples may be obtained (as in operation 102) or derived from a tissue sample, a blood sample, a plasma sample, a serum sample, a saliva sample, a sputum sample, a urine sample, a stool sample, a sweat sample, a Pap smear sample, or an exosome sample from a human subject. The biological samples may be stored in a variety of storage conditions before processing, such as different temperatures (e.g., at room temperature, under refrigeration or freezer conditions, at 4° C., at −18° C., −20° C., or at −80° C.) or different preservatives (e.g., alcohol, formaldehyde, or potassium dichromate).
The biological sample may be obtained from a subject with a disease or disorder, from a subject that is suspected of having the disease or disorder, or from a subject that does not have or is not suspected of having the disease or disorder. The disease or disorder may be an infectious disease, an immune disorder or disease, a cancer, a genetic disease, a degenerative disease, a lifestyle disease, an injury, a rare disease, or an age-related disease. The infectious disease may be caused by bacteria, viruses, fungi, and/or parasites. The cancer may be a prostate cancer. The sample may be taken before and/or after treatment of a subject with a disease or disorder. Samples may be taken during a treatment or a treatment regime. Multiple samples may be taken from a subject to monitor the effects of the treatment over time. The sample may be taken from a subject known or suspected of having a prostate cancer for which a definitive positive or negative diagnosis is not available via clinical tests.
The sample may be taken from a subject suspected of having a disease or a disorder. The sample may be taken from a subject experiencing unexplained symptoms, such as fatigue, nausea, weight loss, aches and pains, weakness, or memory loss. The sample may be taken from a subject having explained symptoms. The sample may be taken from a subject at risk of developing a disease or disorder due to factors such as familial history, age, environmental exposure, lifestyle risk factors, or presence of other known risk factors.
After obtaining a biological sample from the subject, the biological sample obtained from the subject may be assayed to generate methylation data indicative of a presence, absence, or relative assessment of a prostate cancer of a subject. For example, a presence, absence, or relative assessment of nucleic acid molecules of the biological sample at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at a plurality of prostate cancer-associated genomic loci) may be indicative of prostate cancer of the subject. The biological samples obtained or derived from the subject may be processed by (i) subjecting the biological sample to conditions that are sufficient to isolate, enrich, or extract a plurality of nucleic acid molecules (e.g., DNA molecules), and (ii) assaying the plurality of nucleic acid molecules to generate a methylation profile of the nucleic acid molecules at the panel of prostate cancer-associated genomic loci.
A plurality of nucleic acid molecules may be extracted from the biological sample (as in operation 104) and subjected to further assaying (e.g., sequencing to generate a plurality of sequencing reads). The nucleic acid molecules may comprise deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The nucleic acid molecules (e.g., DNA or RNA) may be extracted from the biological sample by a variety of methods, such as a FastDNA Kit protocol from MP Biomedicals or a DNeasy Blood & Tissue Kit from QIAGEN. The extraction method may extract all DNA molecules from a sample. Alternatively, the extraction method may selectively extract a portion of DNA molecules from a sample. Extracted RNA molecules from a sample may be converted to DNA molecules by reverse transcription (RT).
The method may comprise a variety of assays suitable for assessing the presence of DNA methylation (e.g., at one or more CpG sites) at the prostate cancer-specific markers in a biological sample. The DNA molecules may be assayed using an assay including, for example, methylation-sensitive restriction enzyme (MSRE) digestion, polymerase chain reaction (PCR), quantitative PCR (qPCR), digital PCR (dPCR), nucleic acid sequencing, target capture, mass spectrometry-based target fragmentation assay, flap endonuclease-based assay, CRISPR-based assay, methylation-specific assay comprising bisulfite treatment, methylation-specific PCR (MSP), COLD-PCR, targeted sequencing, targeted bisulfite sequencing, pyrosequencing, mass spectroscopy-based bisulfite sequencing (EpiTYPER), reduced representation bisulfite sequencing (RRBS), whole genome sequencing (WGS), amplification fragment length polymorphism (AFLP), enzyme-linked immunosorbent assay (ELISA), luminometric methylation assay (LUMA), methyl-sensitive cut counting (MSCC), high-performance liquid chromatography (HPLC), microarray, bead array, or a combination thereof. For example, the assay may comprise restriction landmark genomic scanning and/or methylation-sensitive restriction enzyme (MSRE) digestion followed by quantitative PCR. The assay may utilize bisulfite treatment of DNA as a step of methylation analysis. After the bisulfite treatment, subsequent steps can include methylation-specific PCR (MSP), targeted sequencing, pyrosequencing, EpiTYPER, reduced representation sequencing, whole genome sequencing, whole genome bisulfite sequencing (WGBS), or a combination thereof. All these methods are prevented from being used to measure DNA methylation of single or multiple CpG sites in the currently invented region of the human genome.
In some embodiments, the methylation-sensitive restriction enzyme (MSRE) is selected from the group consisting of AatII, Acc65I, AccI, AciI, AclI, AfeI, AgeI, ApaI, ApaLI, AscI, AsiSI, AvaI, AvaII, Aox I, BaeI, BanI, BbeI, BceAI, BcgI, BfuCI, BglI, BisI, BlsI, BmgBI, BsaAI, BsaBI, BsaHI, BsaI, BseYI, BsiEI, BsiWI, BslI, BsmAI, BsmBI, BsmFI, BspDI, BsrBI, BsrFI, BssHII, BssKI, BstAPI, BstBI, BstUI, BstZ17I, Cac8I, ClaI, DpnI, DrdI, EaeI, EagI, EagI-HF, EciI, EcoRI, EcoRI-HF, FauI, Fnu4HI, FseI, FspI, Gla I, Glu I, HaeII, HgaI, HhaI, HincII, HinfI, HinP1I, HpaI, HpaII, Hpy166II, Hpy188III, Hpy99I, HpyCH4IV, KasI, Kro I, Mal I, MluI, MmeI, MspA1I, Mte I, MwoI, NaeI, NarI, NgoMIV, NheI-HF, NheI, NlaIV, NotI, NotI-HF, NruI, Nt.BbvCI, Nt.BsmAI, Nt.CviPII, PaeR7I, PleI, PmeI, PmlI, Pcs I, Pkr I, PshAI, PspOMI, PvuI, RsaI, RsrII, SacII, SalI, SalI-HF, Sau3AI, Sau96I, ScrFI, SfiI, SfoI, SgrAI, SmaI, SnaBI, TfiI, TscI, TseI, TspMI, and ZraI.
The nucleic acid sequencing may be performed by any suitable sequencing methods, such as massively parallel sequencing, paired-end sequencing, high-throughput sequencing, next-generation sequencing (NGS), shotgun sequencing, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, pyrosequencing, sequencing-by-synthesis (SBS), sequencing-by-ligation, sequencing-by-hybridization, and RNA-Seq (Illumina). The sequencing may comprise nucleic acid amplification (e.g., of DNA or RNA molecules). In some embodiments, the nucleic acid amplification is polymerase chain reaction (PCR). A suitable number of rounds of PCR (e.g., PCR, qPCR, reverse-transcriptase PCR, digital PCR, etc.) may be performed to sufficiently amplify an initial amount of nucleic acid (e.g., DNA) to a desired input quantity for subsequent sequencing. In some cases, the PCR may be used for global amplification of nucleic acids. This may comprise using adapter sequences that may be first ligated to different molecules followed by PCR amplification using universal primers. PCR may be performed using any of a number of commercial kits, e.g., provided by Life Technologies, Affymetrix, Promega, Qiagen, etc. In other cases, only certain target nucleic acids within a population of nucleic acids may be amplified (e.g., one or more of the panel of prostate cancer biomarkers or prostate cancer-associated genomic loci). Specific primers, possibly in conjunction with adapter ligation, may be used to selectively amplify certain targets for downstream sequencing. The PCR may comprise targeted amplification of one or more genomic loci, such as genomic loci associated with prostate cancer (e.g., listed in databases such as TCGA or COSMIC). The sequencing may comprise use of simultaneous reverse transcription (RT) and polymerase chain reaction (PCR), such as a OneStep RT-PCR kit protocol by Qiagen, NEB, Thermo Fisher Scientific, or Bio-Rad.
DNA or RNA molecules may be tagged, e.g., with identifiable tags, to allow for multiplexing of a plurality of samples. Any number of DNA or RNA samples may be multiplexed. For example, a multiplexed reaction may contain DNA or RNA from at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, or more than 100 initial samples. For example, a plurality of samples may be tagged with sample barcodes such that each DNA molecule may be traced back to the sample (and the subject) from which the DNA molecule originated. Such tags may be attached to DNA or RNA molecules by ligation or by PCR amplification with primers.
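Tracing reads back to their samples by barcode can be sketched as follows. The sketch assumes that each read begins with a fixed-length sample barcode and that only exact barcode matches are accepted; the function name, barcode sequences, and sample names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(
    reads: Iterable[str],
    barcode_to_sample: Dict[str, str],
    barcode_length: int,
) -> Dict[str, List[str]]:
    """Group reads by sample, stripping the leading barcode from each read.

    Reads whose barcode is not in `barcode_to_sample` are discarded.
    """
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is not None:
            by_sample[sample].append(insert)
    return dict(by_sample)


# Hypothetical 4-nucleotide barcodes for two subjects.
barcodes = {"ACGT": "subject_1", "TTAG": "subject_2"}
pooled = ["ACGTCCGGAATT", "TTAGGGCCTTAA", "ACGTTTTTCCGG", "GGGGAAAACCCC"]
print(demultiplex(pooled, barcodes, barcode_length=4))
```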
After subjecting the nucleic acid molecules to sequencing, suitable bioinformatics processes may be performed on the sequence reads to generate the methylation data indicative of the presence, absence, or relative assessment of the prostate cancer. For example, the sequence reads may be aligned to one or more reference genomes (e.g., a genome of one or more species such as a human genome). The aligned sequence reads may be quantified at a panel of genomic loci to generate the data indicative of a distribution of the presence, absence, or relative assessment of the prostate cancer. For example, quantification of sequences corresponding to a panel of genomic loci associated with prostate cancer may generate the methylation data indicative of the presence, absence, or relative assessment of the prostate cancer.
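Quantifying aligned reads at a panel of loci can be sketched as a simple overlap count. The sketch assumes that alignment has already been performed by a separate tool and that each read and each locus is given as a half-open interval (chromosome, start, end); the locus names and coordinates are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def count_reads_per_locus(
    aligned_reads: Iterable[Interval],
    panel: Dict[str, Interval],
) -> Dict[str, int]:
    """Count the aligned reads that overlap each locus of the panel."""
    counts = {name: 0 for name in panel}
    for chrom, start, end in aligned_reads:
        for name, (locus_chrom, locus_start, locus_end) in panel.items():
            # Two half-open intervals overlap when each starts before the other ends.
            if chrom == locus_chrom and start < locus_end and locus_start < end:
                counts[name] += 1
    return counts


# Hypothetical panel and read alignments.
panel = {"locus_A": ("chr1", 100, 200), "locus_B": ("chr2", 500, 600)}
reads = [("chr1", 90, 120), ("chr1", 150, 180), ("chr1", 200, 250), ("chr2", 590, 640)]
print(count_reads_per_locus(reads, panel))
```

The nested loop is adequate for a small panel; a larger panel would normally use an interval index instead.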
The prostate cancer may be identified or monitored in the subject by using probes configured to selectively enrich nucleic acid (e.g., DNA or RNA) molecules corresponding to the panel of genomic loci (e.g., prostate cancer-associated genomic loci). The probes may be nucleic acid primers. The probes may have sequence complementarity with nucleic acid sequences (e.g., about 5 nucleotides, about 10 nucleotides, about 15 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 35 nucleotides, about 40 nucleotides, about 45 nucleotides, about 50 nucleotides, about 55 nucleotides, about 60 nucleotides, about 65 nucleotides, about 70 nucleotides, about 75 nucleotides, about 80 nucleotides, about 85 nucleotides, about 90 nucleotides, about 95 nucleotides, about 100 nucleotides, or more than about 100 nucleotides) from one or more of the individual genomic loci (e.g., prostate cancer-associated genomic loci). The one or more genomic loci (e.g., prostate cancer-associated genomic loci) may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, or more than about 100 distinct genomic loci (e.g., prostate cancer-associated genomic loci). In some embodiments, the panel of genomic loci comprises one or more prostate cancer-associated genomic loci listed in Table 1.
The biological sample may be processed without any nucleic acid extraction. For example, the processing may comprise assaying the biological sample using probes that are selected for the panel of genomic loci (e.g., prostate cancer-associated genomic loci). The panel of genomic loci (e.g., prostate cancer-associated genomic loci) may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, or more than about 100 distinct genomic loci (e.g., prostate cancer-associated genomic loci). In some embodiments, the panel of genomic loci comprises one or more prostate cancer-associated genomic loci listed in Table 1.
The processing may comprise assaying the biological sample using probes that are selective for the one or more genomic loci (e.g., prostate cancer-associated genomic loci) among other genomic loci in the biological sample. The probes may be nucleic acid molecules (e.g., DNA or RNA) having sequence complementarity with nucleic acid sequences (e.g., about 5 nucleotides, about 10 nucleotides, about 15 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 35 nucleotides, about 40 nucleotides, about 45 nucleotides, about 50 nucleotides, about 55 nucleotides, about 60 nucleotides, about 65 nucleotides, about 70 nucleotides, about 75 nucleotides, about 80 nucleotides, about 85 nucleotides, about 90 nucleotides, about 95 nucleotides, about 100 nucleotides, or more than about 100 nucleotides) from one or more of the individual genomic loci (e.g., prostate cancer-associated genomic loci). These nucleic acid molecules may be primers or enrichment sequences. The assaying of the biological sample using probes that are selected for the one or more genomic loci (e.g., prostate cancer-associated genomic loci) may comprise use of array hybridization, polymerase chain reaction (PCR), or nucleic acid sequencing (e.g., DNA sequencing or RNA sequencing).
The assay readouts may be quantified at one or more of the panel of genomic loci (e.g., prostate cancer-associated genomic loci) to generate the methylation data indicative of a presence, absence, or relative assessment of the prostate cancer. For example, quantification of array hybridization or polymerase chain reaction (PCR) corresponding to the panel of prostate cancer-associated genomic loci may generate methylation data at the panel of prostate cancer-associated genomic loci in the biological sample. Assay readouts may comprise quantitative PCR (qPCR) values, digital PCR (dPCR) values, digital droplet PCR (ddPCR) values, fluorescence values, etc.
Cycle threshold (Ct) values may be obtained for each amplified region of a set of genomic regions (e.g., prostate cancer associated biomarkers) and normalized based on the external control locus (as in operation 112). For example, a first cycle threshold (Ct) value may be determined for the amplification of a set of digested DNA fragments or derivatives thereof at one or more genomic regions, and a second cycle threshold (Ct) value may be determined for the amplification of a set of undigested DNA molecules or derivatives thereof at the one or more genomic regions. The undigested template may be used to represent the fully methylated DNA. After the purification of the MSRE digestion, the same amount of DNA may be used for the qPCR analysis of the digested and undigested templates. Reference Ct values may be generated based on the external control locus for (i) the amplification of the set of digested DNA fragments or derivatives thereof at the one or more genomic regions, and for (ii) the amplification of the set of undigested DNA molecules or derivatives thereof at the one or more genomic regions. Then, the first Ct value (for amplification of digested DNA) and the second Ct value (for amplification of undigested DNA) can be normalized using the difference of the internal control gene's Ct values before and after the digestion (delta Ctc). The normalization may comprise subtracting the delta Ctc value from the difference between the first Ct value (for amplification of digested DNA) and the second Ct value (for amplification of undigested DNA).
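The normalization described above can be written out as a short worked derivation. The subscripts "dig" and "undig" (digested and undigested template) and the symbol for the normalized difference are notation introduced here for illustration; the subscript c denotes the control locus, as in delta Ctc.

```latex
\begin{align*}
\Delta Ct &= Ct_{\text{dig}} - Ct_{\text{undig}}, \\
\Delta Ct_{c} &= Ct_{c,\text{dig}} - Ct_{c,\text{undig}}, \\
\Delta Ct_{\text{norm}} &= \Delta Ct - \Delta Ct_{c}
  = \left(Ct_{\text{dig}} - Ct_{c,\text{dig}}\right)
  - \left(Ct_{\text{undig}} - Ct_{c,\text{undig}}\right).
\end{align*}
```

The last equality shows that subtracting delta Ctc from the digested-minus-undigested difference gives the same value as first subtracting the control Ct from each target Ct and then taking the difference. As a hypothetical numeric example, target Ct values of 31 (digested) and 25 (undigested) with control Ct values of 24.5 and 24 give a delta Ct of 6, a delta Ctc of 0.5, and a normalized difference of 5.5.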
For each locus of a given subject, the Ct difference (delta Ct) between the first qPCR reaction (run on the digested DNA template) and the second qPCR reaction (run on the undigested DNA template) is calculated and used to indicate the DNA methylation level of the subject. Thus, the delta Ct value can represent the subject's DNA methylation level for the target region. For example, the undigested DNA may have low Ct values, while the digested DNA from a normal individual may have high Ct values, thereby resulting in large absolute delta Ct values. Otherwise, the delta Ct values from a prostate cancer patient may be small (e.g., close to 0).
Next, the qPCR signal intensity may be calculated for the biomarker region from the cycle threshold (Ct) values (as in operation 114). For example, the signal intensity can be given by 2^[Ct, biomarker restriction locus − Ct, internal control locus]. An intensity ratio may be calculated using the first Ct value (for amplification of digested DNA) and the second Ct value (for amplification of undigested DNA), such as by determining the reciprocal of an exponentiation of (i) a base value (e.g., 2, 10, or e) and (ii) a difference between the first Ct value and the second Ct value.
Next, a likelihood (e.g., a probability score) may be calculated, which reflects the correlation between the biomarker signal intensity in the subject and tumor references and/or the correlation between the biomarker signal intensity in the subject and normal references (as in operation 116). Such a likelihood or probability score may be determined using a classifier, as described herein.
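One simple way to turn such correlations into a score is sketched below. This is an illustrative assumption and not the classifier of the disclosure: it computes the Pearson correlation of the subject's signal intensities across the panel with a tumor reference profile and with a normal reference profile, and maps the difference of the two correlations onto the interval from 0 to 1. The function names and reference profiles are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def tumor_likeness(
    subject: Sequence[float],
    tumor_reference: Sequence[float],
    normal_reference: Sequence[float],
) -> float:
    """Score in [0, 1]; higher means closer to the tumor reference profile."""
    r_tumor = pearson(subject, tumor_reference)
    r_normal = pearson(subject, normal_reference)
    # The difference of two correlations lies in [-2, 2]; rescale it to [0, 1].
    return (r_tumor - r_normal + 2.0) / 4.0


# Hypothetical intensity profiles over a four-locus panel.
tumor_ref = [0.9, 0.7, 0.8, 0.2]
normal_ref = [0.1, 0.3, 0.2, 0.8]
print(tumor_likeness([0.85, 0.75, 0.7, 0.3], tumor_ref, normal_ref))
```

The score is a heuristic and not a calibrated probability; a trained classifier would replace it in practice.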
Kits
The present disclosure provides kits for identifying or monitoring a prostate cancer in a subject. A kit may comprise probes for identifying a presence, absence, or relative amount of sequences at the panel of prostate cancer-associated genomic loci in a biological sample of the subject, which may be indicative of a prostate cancer. The probes may be selective for the sequences at the panel of prostate cancer-associated genomic loci in the biological sample. A kit may comprise instructions for using the probes to process the biological sample to generate methylation data at the panel of prostate cancer-associated genomic loci in a biological sample of the subject.
The probes in the kit may be selective for the sequences at the plurality of prostate cancer-associated genomic loci in the biological sample. The probes in the kit may be configured to selectively enrich nucleic acid (e.g., DNA or RNA) molecules corresponding to the panel of prostate cancer-associated genomic loci. The probes in the kit may be nucleic acid primers. The probes in the kit may have sequence complementarity with nucleic acid sequences from one or more of the prostate cancer-associated genomic loci. The one or more genomic loci (e.g., prostate cancer-associated genomic loci) may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, or more than about 100 distinct genomic loci (e.g., prostate cancer-associated genomic loci). In some embodiments, the one or more genomic loci comprise one or more prostate cancer-associated genomic loci listed in Table 1.
The instructions in the kit may comprise instructions to assay the biological sample using the probes that are selective for the sequences at the panel of prostate cancer-associated genomic loci in the biological sample. The probes may be nucleic acid molecules (e.g., DNA or RNA) having sequence complementarity with nucleic acid sequences (e.g., about 5 nucleotides, about 10 nucleotides, about 15 nucleotides, about 20 nucleotides, about 25 nucleotides, about 30 nucleotides, about 35 nucleotides, about 40 nucleotides, about 45 nucleotides, about 50 nucleotides, about 55 nucleotides, about 60 nucleotides, about 65 nucleotides, about 70 nucleotides, about 75 nucleotides, about 80 nucleotides, about 85 nucleotides, about 90 nucleotides, about 95 nucleotides, about 100 nucleotides, or more than about 100 nucleotides) from one or more of the individual genomic loci (e.g., prostate cancer-associated genomic loci). These nucleic acid molecules may be primers or enrichment sequences. The instructions to assay the biological sample may comprise instructions to perform array hybridization, polymerase chain reaction (PCR), or nucleic acid sequencing (e.g., DNA sequencing or RNA sequencing) to process the biological sample to generate methylation data indicative of a presence, absence, or relative amount of sequences at the panel of prostate cancer-associated genomic loci in the biological sample, which may be indicative of a prostate cancer.
The instructions in the kit may comprise instructions to measure and interpret assay readouts, which may be quantified at one or more of the panel of prostate cancer-associated genomic loci to generate the methylation data indicative of a presence, absence, or relative amount of sequences at the panel of prostate cancer-associated genomic loci in the biological sample. For example, quantification of array hybridization or polymerase chain reaction (PCR) corresponding to the panel of prostate cancer-associated genomic loci may generate methylation data at the panel of prostate cancer-associated genomic loci in the biological sample. Assay readouts may comprise quantitative PCR (qPCR) values, digital PCR (dPCR) values, digital droplet PCR (ddPCR) values, fluorescence values, etc., or normalized values thereof.
Classifiers
After processing the biological sample from the subject, a classifier may be used to process the methylation data at the panel of prostate cancer-associated genomic loci to classify the biological sample, thereby identifying or assessing a prostate cancer of the subject. The classifier may be configured to identify the prostate cancer with an accuracy of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than 99% for at least about 25, at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, at least about 500, or more than about 500 independent samples.
The classifier may comprise a supervised machine learning algorithm or an unsupervised machine learning algorithm. The classifier may comprise a classification and regression tree (CART) algorithm. The classifier may comprise, for example, a support vector machine (SVM), a linear regression, a logistic regression, a nonlinear regression, a neural network, a Random Forest, a deep learning algorithm, or a naïve Bayes classifier.
The classifier may be configured to accept a plurality of input variables and to produce one or more output values based on the plurality of input variables. The plurality of input variables may comprise data indicative of a presence, absence, or relative amount of sequences or methylated residues at each of the plurality of prostate cancer-associated genomic loci. For example, an input variable may comprise a number of sequences or methylated residues corresponding to or aligning to each of the plurality of prostate cancer-associated genomic loci.
The classifier may have one or more possible output values, each comprising one of a fixed number of possible values (e.g., a linear classifier, a logistic regression classifier, etc.) indicating a classification of the biological sample by the classifier. The classifier may comprise a binary classifier, such that each of the one or more output values comprises one of two values (e.g., {0, 1}, {positive, negative}, or {cancerous, non-cancerous}) indicating a classification of the biological sample by the classifier. The classifier may be another type of classifier, such that each of the one or more output values comprises one of more than two values (e.g., {0, 1, 2}, {positive, negative, or indeterminate}, or {cancerous, non-cancerous, or indeterminate}) indicating a classification of the biological sample by the classifier. The output values may comprise descriptive labels, numerical values, or a combination thereof. Some of the output values may comprise descriptive labels. Such descriptive labels may provide an identification or indication of the disease state of the subject, and may comprise, for example, positive, negative, cancerous, non-cancerous, or indeterminate. Such descriptive labels may provide an identification of a treatment for the subject's disease state, and may comprise, for example, a therapeutic intervention, a duration of the therapeutic intervention, and/or a dosage of the therapeutic intervention. Such descriptive labels may provide an identification of secondary clinical tests that may be appropriate to perform on the subject, and may comprise, for example, a biopsy, a blood test, a computed tomography (CT) scan, a magnetic resonance imaging (MRI) scan, an ultrasound scan, a chest X-ray, a positron emission tomography (PET) scan, or a PET-CT scan. Such descriptive labels may provide a prognosis of the disease state of the subject. 
Some descriptive labels may be mapped to numerical values, for example, by mapping “positive” to 1 and “negative” to 0.
Some of the output values may comprise numerical values, such as binary, integer, or continuous values. Such binary output values may comprise, for example, {0, 1}. Such integer output values may comprise, for example, {0, 1, 2}. Such continuous output values may comprise, for example, a probability value of at least 0 and no more than 1. Such continuous output values may comprise, for example, an un-normalized probability value of at least 0. Such continuous output values may indicate a prognosis of the disease or disorder state of the subject and may comprise, for example, an indication of an expected or average progression-free survival (PFS) or overall survival (OS) of the subject. Such continuous output values may indicate a prediction of the course of treatment to treat the disease or disorder state of the subject and may comprise, for example, an indication of an expected duration of efficacy of the course of treatment. Some numerical values may be mapped to descriptive labels, for example, by mapping 1 to “positive” and 0 to “negative.”
Some of the output values may be assigned based on one or more cutoff values. For example, a binary classification of samples may assign an output value of “positive” or 1 if the sample indicates that the subject has at least a 50% probability of being diseased. For example, a binary classification of samples may assign an output value of “negative” or 0 if the sample indicates that the subject has less than a 50% probability of being diseased. In this case, a single cutoff value of 50% is used to classify samples into one of the two possible binary output values. Examples of single cutoff values may include about 1%, about 2%, about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95%, about 98%, and about 99%.
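The single-cutoff case described above may be sketched as follows. The function name and the default cutoff of 0.5 (i.e., 50%) are illustrative assumptions.

```python
def classify_binary(probability, cutoff=0.5):
    """Return 1 ("positive") if probability is at least the cutoff, else 0 ("negative")."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1 if probability >= cutoff else 0


# A sample exactly at the cutoff is classified as positive ("at least 50%").
print(classify_binary(0.50), classify_binary(0.49))
```

Raising the cutoff would typically trade sensitivity for specificity, and lowering it would do the reverse.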
As another example, a classification of samples may assign an output value of “positive” or 1 if the sample indicates that the subject has a probability of being diseased of at least 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 98%, or at least about 99%. The classification of samples may assign an output value of “positive” or 1 if the sample indicates that the subject has a probability of being diseased of more than 50%, more than 55%, more than 60%, more than 65%, more than 70%, more than 75%, more than 80%, more than 85%, more than 90%, more than 95%, more than 98%, or more than 99%. The classification of samples may assign an output value of “negative” or 0 if the sample indicates that the subject has a probability of being diseased of less than 50%, less than 45%, less than 40%, less than 35%, less than 30%, less than 25%, less than 20%, less than 10%, less than 5%, less than 2%, or less than 1%. The classification of samples may assign an output value of “negative” or 0 if the sample indicates that the subject has a probability of being diseased of no more than 50%, no more than 45%, no more than 40%, no more than 35%, no more than 30%, no more than 25%, no more than 20%, no more than 10%, no more than 5%, no more than 2%, or no more than 1%. The classification of samples may assign an output value of “indeterminate” or 2 if the sample has not been classified as “positive,” “negative,” 1, or 0. In this case, a set of two cutoff values is used to classify samples into one of the three possible output values. Examples of sets of cutoff values may include {1%, 99%}, {2%, 98%}, {5%, 95%}, {10%, 90%}, {15%, 85%}, {20%, 80%}, {25%, 75%}, {30%, 70%}, {35%, 65%}, {40%, 60%}, and {45%, 55%}. 
Similarly, sets of n cutoff values may be used to classify samples into one of n+1 possible output values, where n is any positive integer.
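As a non-limiting illustration, n sorted cutoff values partition the probability range into n+1 intervals, and a sample may be assigned the label of the interval into which its probability falls. The sketch below uses the Python standard-library `bisect` module; the function name, the label ordering, and the example cutoffs {20%, 80%} are assumptions for illustration.

```python
from bisect import bisect_right


def classify_with_cutoffs(probability, cutoffs, labels):
    """Map a probability to one of len(cutoffs) + 1 labels.

    cutoffs must be sorted in ascending order; labels[i] is assigned to
    probabilities in the i-th interval (a value equal to a cutoff falls in
    the upper interval).
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("cutoffs must be sorted in ascending order")
    return labels[bisect_right(cutoffs, probability)]


# Two cutoffs give three possible output values.
cutoffs = [0.20, 0.80]
labels = ["negative", "indeterminate", "positive"]
for p in (0.10, 0.50, 0.80):
    print(p, classify_with_cutoffs(p, cutoffs, labels))
```

In this sketch a probability below 20% yields "negative", a probability of at least 80% yields "positive", and anything in between yields "indeterminate".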
The classifier may be trained with a plurality of independent training samples. Each of the independent training samples may comprise a biological sample from a subject, associated data obtained by processing the biological sample (as described elsewhere herein), and one or more known output values corresponding to the biological sample (e.g., a clinical diagnosis, prognosis, treatment efficacy, or absence of a disease or disorder such as a prostate cancer of the subject). Independent training samples may comprise biological samples and associated data and outputs obtained from a plurality of different subjects. Independent training samples may comprise biological samples and associated data and outputs obtained at a plurality of different time points from the same subject (e.g., before, after, and/or during a course of treatment to treat a disease or disorder of the subject). Independent training samples may be associated with presence of the prostate cancer (e.g., training samples comprising biological samples and associated data and outputs obtained from a plurality of subjects known to have the prostate cancer). Independent training samples may be associated with absence of the prostate cancer (e.g., training samples comprising biological samples and associated data and outputs obtained from a plurality of subjects who are known to not have a previous diagnosis of the prostate cancer, or otherwise who are asymptomatic for the prostate cancer).
The classifier may be trained with at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, at least about 400, at least about 450, or at least about 500 independent training samples. The independent training samples may comprise samples associated with presence of the prostate cancer and/or samples associated with absence of the prostate cancer. The classifier may be trained with no more than 500, no more than 450, no more than 400, no more than 350, no more than 300, no more than 250, no more than 200, no more than 150, no more than 100, or no more than 50 independent training samples associated with presence of the prostate cancer. In some embodiments, the biological sample is independent of samples used to train the classifier.
The classifier may be trained with a first number of independent training samples associated with presence of the prostate cancer and a second number of independent training samples associated with absence of the prostate cancer. The first number of independent training samples associated with presence of the prostate cancer may be no more than the second number of independent training samples associated with absence of the prostate cancer. The first number of independent training samples associated with presence of the prostate cancer may be equal to the second number of independent training samples associated with absence of the prostate cancer. The first number of independent training samples associated with presence of the prostate cancer may be greater than the second number of independent training samples associated with absence of the prostate cancer.
The classifier may be configured to identify the prostate cancer with an accuracy of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%, for at least about 50, at least about 100, at least about 150, at least about 200, at least about 250, at least about 300, or more than about 300 independent samples. The accuracy of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples (e.g., subjects having the prostate cancer, or apparently healthy subjects with negative clinical test results for the prostate cancer) that are correctly identified or classified as having or not having the prostate cancer, respectively.
The classifier may be configured to identify the prostate cancer with a positive predictive value (PPV) of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The PPV of identifying the prostate cancer by the classifier may be calculated as the percentage of biological samples identified or classified as having the prostate cancer that correspond to subjects that truly have the prostate cancer. A PPV may also be referred to as a precision.
The classifier may be configured to identify the prostate cancer with a negative predictive value (NPV) of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The NPV of identifying the prostate cancer by the classifier may be calculated as the percentage of biological samples identified or classified as not having the prostate cancer that correspond to subjects that truly do not have the prostate cancer.
The classifier may be configured to identify the prostate cancer with a clinical sensitivity of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The clinical sensitivity of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples associated with presence of the prostate cancer (e.g., subjects known to have the prostate cancer) that are correctly identified or classified as having the prostate cancer. A clinical sensitivity may also be referred to as a recall.
The classifier may be configured to identify the prostate cancer with a clinical specificity of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The clinical specificity of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples associated with absence of the prostate cancer (e.g., apparently healthy subjects with negative clinical test results for the prostate cancer) that are correctly identified or classified as not having the prostate cancer.
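The accuracy, PPV, NPV, clinical sensitivity, and clinical specificity defined above may all be derived from the four cells of a confusion matrix. The following is a minimal sketch, assuming labels encoded as 1 (prostate cancer present) and 0 (absent); the function name and the example labels are hypothetical.

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute accuracy, PPV, NPV, sensitivity, and specificity as fractions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Return None when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),          # also called precision
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical test set: 4 subjects with the cancer, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = diagnostic_metrics(y_true, y_pred)
```

Multiplying each fraction by 100 gives the percentages used in the ranges above.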
The classifier may be configured to identify the prostate cancer with an Area-Under-Curve (AUC) of at least about 0.50, at least about 0.55, at least about 0.60, at least about 0.65, at least about 0.70, at least about 0.75, at least about 0.80, at least about 0.81, at least about 0.82, at least about 0.83, at least about 0.84, at least about 0.85, at least about 0.86, at least about 0.87, at least about 0.88, at least about 0.89, at least about 0.90, at least about 0.91, at least about 0.92, at least about 0.93, at least about 0.94, at least about 0.95, at least about 0.96, at least about 0.97, at least about 0.98, at least about 0.99, or more than about 0.99. The AUC may be calculated as an integral of the Receiver Operator Characteristic (ROC) curve (e.g., the area under the ROC curve) associated with the classifier in classifying biological samples as having or not having the prostate cancer.
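The area under the ROC curve is equal to the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative sample, with ties counted as one half. The sketch below computes the AUC through that pairwise formulation rather than by numerically integrating the curve; the function name and the example scores are illustrative assumptions.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative sample scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for two negative and two positive samples.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of samples, which is adequate for test sets of a few hundred samples; a rank-based computation would be preferable for much larger sets.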
The classifier may be adjusted or tuned to improve the performance, accuracy, PPV, NPV, clinical sensitivity, clinical specificity, or AUC of identifying the prostate cancer. The classifier may be adjusted or tuned by adjusting parameters of the classifier (e.g., a set of cutoff values used to classify a sample as described elsewhere herein, or weights of a neural network). The classifier may be adjusted or tuned continuously during the training process or after the training process has completed.
After the classifier is initially trained, a subset of the inputs may be identified as most influential or most important to be included for making high-quality classifications. For example, a subset of the plurality of prostate cancer-associated genomic loci may be identified as most influential or most important to be included for making high-quality classifications or identifications of prostate cancer. The plurality of prostate cancer-associated genomic loci or a subset thereof may be ranked based on metrics indicative of each genomic locus's influence or importance toward making high-quality classifications or identifications of prostate cancer. Such metrics may be used to reduce, in some cases significantly, the number of input variables (e.g., predictor variables) that may be used to train the classifier to a desired performance level (e.g., based on a desired minimum accuracy, PPV, NPV, clinical sensitivity, clinical specificity, or AUC). For example, if training the classifier with a plurality comprising several dozen or hundreds of input variables results in an accuracy of classification of more than 99%, then training the classifier instead with only a selected subset of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, no more than about 10, no more than about 15, no more than about 20, no more than about 25, no more than about 30, no more than about 35, no more than about 40, no more than about 45, no more than about 50, or no more than about 100 such most influential or most important input variables (e.g., marker genes, marker regions, or other genomic loci) among the plurality may result in decreased but still acceptable accuracy of classification (e.g., at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 96%, at least about 97%, or at least about 98%).
The subset may be selected by rank-ordering the entire plurality of input variables and selecting a predetermined number (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, no more than about 10, no more than about 15, no more than about 20, no more than about 25, no more than about 30, no more than about 35, no more than about 40, no more than about 45, no more than about 50, or no more than about 100) of input variables with the best metrics. In some embodiments, the selected subset of the influential or most important input variables comprises one or more genomic loci listed in Table 2.
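The rank-ordering and selection step may be sketched as follows. The importance metric itself is not specified here, so the sketch simply assumes that a higher value indicates a more influential locus; the function name, locus names, and metric values are hypothetical.

```python
def select_top_loci(importance_by_locus, k):
    """Rank loci by descending importance metric and keep the top k."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    ranked = sorted(importance_by_locus.items(),
                    key=lambda item: item[1], reverse=True)
    return [locus for locus, _ in ranked[:k]]


# Hypothetical importance metrics (e.g., from a previously trained classifier).
importance = {"locus_A": 0.10, "locus_B": 0.50, "locus_C": 0.30, "locus_D": 0.05}
selected = select_top_loci(importance, k=2)
```

The classifier could then be retrained on the selected loci only and its performance compared against the desired minimum level.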
Identifying or Monitoring a Prostate Cancer
After using a classifier to process the methylation data at the panel of prostate cancer-associated genomic loci to classify the biological sample, a quantitative measure indicative of the presence, absence, or relative assessment of the prostate cancer may be determined (e.g., likelihood or probability of prostate cancer), and the prostate cancer may be identified or a progression or regression of the prostate cancer may be monitored in the subject based at least in part on the quantitative measure (e.g., likelihood or probability of prostate cancer).
The prostate cancer may be identified in the subject with an accuracy of at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The accuracy of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples (e.g., subjects having the prostate cancer, or apparently healthy subjects with negative clinical test results for the prostate cancer) that are correctly identified or classified as having or not having the prostate cancer, respectively.
The prostate cancer may be identified in the subject with a positive predictive value (PPV) of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The PPV of identifying the prostate cancer by the classifier may be calculated as the percentage of biological samples identified or classified as having the prostate cancer that correspond to subjects that truly have the prostate cancer. A PPV may also be referred to as a precision.
The prostate cancer may be identified in the subject with a negative predictive value (NPV) of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The NPV of identifying the prostate cancer by the classifier may be calculated as the percentage of biological samples identified or classified as not having the prostate cancer that correspond to subjects that truly do not have the prostate cancer.
The prostate cancer may be identified in the subject with a clinical sensitivity of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The clinical sensitivity of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples associated with presence of the prostate cancer (e.g., subjects having the prostate cancer) that are correctly identified or classified as having the prostate cancer. A clinical sensitivity may also be referred to as a recall.
The prostate cancer may be identified in the subject with a clinical specificity of at least about 5%, at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or more than about 99%. The clinical specificity of identifying the prostate cancer by the classifier may be calculated as the percentage of independent test samples associated with absence of the prostate cancer (e.g., apparently healthy subjects with negative clinical test results for the prostate cancer) that are correctly identified or classified as not having the prostate cancer.
After the prostate cancer is identified in a subject, a stage of the prostate cancer (e.g., stage I, stage II, stage III, or stage IV) may further be identified. The stage of the prostate cancer may be determined based at least in part on the methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci).
Upon identifying the subject as having the prostate cancer, the subject may be provided with a therapeutic intervention (e.g., prescribing an appropriate course of treatment to treat the prostate cancer of the subject). The therapeutic intervention may comprise a surgical tumor resection, an effective dose of chemotherapy, an effective dose of radiotherapy, an effective dose of targeted therapy, an effective dose of immunotherapy, or any combination thereof. If the subject is currently being treated for the prostate cancer with a course of treatment, the therapeutic intervention may comprise a subsequent different course of treatment (e.g., to increase treatment efficacy due to tumor resistance, tumor recurrence, or non-response to the current course of treatment).
The therapeutic intervention may comprise recommending the subject for a secondary clinical test to confirm a diagnosis of the prostate cancer. This secondary clinical test may comprise a biopsy, an imaging test, a blood test, a computed tomography (CT) scan, a magnetic resonance imaging (MRI) scan, an ultrasound scan, a chest X-ray, a positron emission tomography (PET) scan, a PET-CT scan, or any combination thereof.
The subject may be treated upon identifying the subject as having the prostate cancer. Treating the subject may comprise administering an appropriate therapeutic intervention to treat the prostate cancer of the subject. The therapeutic intervention may comprise a surgical tumor resection, an effective dose of chemotherapy, an effective dose of radiotherapy, an effective dose of targeted therapy, an effective dose of immunotherapy, or any combination thereof. If the subject is currently being treated for the prostate cancer with a course of treatment, the administered therapeutic intervention may comprise a subsequent different course of treatment (e.g., to increase treatment efficacy due to tumor resistance, tumor recurrence, or non-response to the current course of treatment).
The methylation data at the panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) may be assessed over a duration of time to monitor a patient (e.g., subject who has prostate cancer or who is being treated for prostate cancer). In such cases, the quantitative measures of methylation at the prostate cancer-associated genomic loci of the patient may change during the course of treatment. For example, the quantitative measures of methylation at the prostate cancer-associated genomic loci of a patient whose prostate cancer is regressing due to an effective treatment (e.g., chemotherapy or surgical resection) may shift toward the methylation profile or distribution of a healthy subject. Conversely, for example, the quantitative measures of methylation at the prostate cancer-associated genomic loci of a patient whose prostate cancer is progressing due to an ineffective treatment (e.g., when the tumor becomes resistant) may shift toward the methylation profile or distribution of a subject with more advanced stage prostate cancer.
The progression or regression of the prostate cancer in the subject may be monitored by monitoring a course of treatment for treating the prostate cancer in the subject. The monitoring may comprise assessing the prostate cancer in the subject at two or more time points. The assessing may be based at least on the methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined at each of the two or more time points.
A difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of one or more clinical indications, such as (i) a diagnosis of the prostate cancer in the subject, (ii) a prognosis of the prostate cancer in the subject, (iii) a progression of the prostate cancer in the subject, (iv) a regression of the prostate cancer in the subject, (v) an efficacy of the course of treatment for treating the prostate cancer in the subject, and (vi) a resistance of the prostate cancer toward the course of treatment for treating the prostate cancer in the subject.
A difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of a diagnosis of the prostate cancer in the subject. For example, if the prostate cancer was not detected in the subject at an earlier time point but was detected in the subject at a later time point, then the difference is indicative of a diagnosis of the prostate cancer in the subject. A clinical action or decision may be made based on this indication of diagnosis of the prostate cancer in the subject, e.g., prescribing a new therapeutic intervention for the subject.
A difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of a prognosis of the prostate cancer in the subject.
A difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of a progression of the prostate cancer in the subject. For example, if the prostate cancer was detected in the subject both at an earlier time point and at a later time point, and if the difference is a negative difference (e.g., the presence, absence, or relative assessment of methylation at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) increased from the earlier time point to the later time point), then the difference may be indicative of a progression (e.g., increased tumor load, tumor burden, or tumor size) of the prostate cancer in the subject. A clinical action or decision may be made based on this indication of the progression, e.g., prescribing a new therapeutic intervention or switching therapeutic interventions (e.g., ending a current treatment and prescribing a new treatment) for the subject.
A difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of a regression of the prostate cancer in the subject. For example, if the prostate cancer was detected in the subject both at an earlier time point and at a later time point, and if the difference is a positive difference (e.g., the presence, absence, or relative assessment of methylation at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) decreased from the earlier time point to the later time point), then the difference may be indicative of a regression (e.g., decreased tumor load, tumor burden, or tumor size) of the prostate cancer in the subject. A clinical action or decision may be made based on this indication of the regression, e.g., continuing or ending a current therapeutic intervention for the subject.
A difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of an efficacy of the course of treatment for treating the prostate cancer in the subject. For example, if the prostate cancer was detected in the subject at an earlier time point but was not detected in the subject at a later time point, then the difference may be indicative of an efficacy of the course of treatment for treating the prostate cancer in the subject. A clinical action or decision may be made based on this indication of the efficacy of the course of treatment for treating the prostate cancer in the subject, e.g., continuing or ending a current therapeutic intervention for the subject.
A difference in methylation data at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) determined between the two or more time points may be indicative of a resistance of the prostate cancer toward the course of treatment for treating the prostate cancer in the subject. For example, if the prostate cancer was detected in the subject both at an earlier time point and at a later time point, and if the difference is a negative or zero difference (e.g., the presence, absence, or relative assessment of methylation at a panel of prostate cancer-associated genomic loci (e.g., quantitative measures of methylation at the prostate cancer-associated genomic loci) increased or remained at a constant level from the earlier time point to the later time point), and if an efficacious treatment was indicated at an earlier time point, then the difference may be indicative of a resistance (e.g., increased or constant tumor load, tumor burden, or tumor size) of the course of treatment for treating the prostate cancer in the subject. A clinical action or decision may be made based on this indication of the resistance of the course of treatment for treating the prostate cancer in the subject, e.g., ending a current therapeutic intervention and/or switching to (e.g., prescribing) a different new therapeutic intervention for the subject.
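As a non-limiting illustration, the mapping from a pair of time points to the clinical indications described above can be sketched as follows. The names `TimePoint`, `interpret_change`, `tolerance`, and `prior_efficacy`, and the reduction of the panel to a single aggregate score, are assumptions introduced for this sketch. The sign convention (difference taken as earlier minus later, so that a negative difference corresponds to an increase) is inferred from the examples above rather than stated explicitly.

```python
from dataclasses import dataclass


@dataclass
class TimePoint:
    cancer_detected: bool
    methylation_measure: float  # hypothetical aggregate score over the panel


def interpret_change(earlier, later, tolerance=0.0, prior_efficacy=False):
    """Map two assessments to the clinical indications described in the text."""
    if not earlier.cancer_detected and later.cancer_detected:
        return ["diagnosis"]
    if earlier.cancer_detected and not later.cancer_detected:
        return ["efficacy"]
    if not (earlier.cancer_detected and later.cancer_detected):
        return []  # not detected at either time point
    # Assumed sign convention: difference = earlier - later, so a negative
    # difference means the measure increased over time.
    difference = earlier.methylation_measure - later.methylation_measure
    if difference > tolerance:
        return ["regression"]
    indications = []
    if difference < -tolerance:
        indications.append("progression")
    # A negative or zero difference after a previously efficacious
    # treatment may indicate resistance.
    if prior_efficacy:
        indications.append("resistance")
    return indications


print(interpret_change(TimePoint(True, 0.3), TimePoint(True, 0.6), prior_efficacy=True))
```

The `tolerance` parameter is one possible way to treat small fluctuations as a zero difference; an actual threshold would depend on assay variability.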
Outputting a Report of the Prostate Cancer
After the prostate cancer is identified or a progression or regression of the prostate cancer is monitored in the subject, a report may be electronically outputted that identifies or provides an indication of the identification, prognosis, progression, or regression of the prostate cancer in the subject. The subject may not display symptoms of the prostate cancer (e.g., is asymptomatic of the prostate cancer). The report may be presented on a graphical user interface (GUI) of an electronic device of a user. The user may be the subject, a caretaker, a physician, a nurse, or another health care worker.
The report may include one or more clinical indications such as (i) a diagnosis of the prostate cancer in the subject, (ii) a prognosis of the prostate cancer in the subject, (iii) a progression of the prostate cancer in the subject, (iv) a regression of the prostate cancer in the subject, (v) an efficacy of the course of treatment for treating the prostate cancer in the subject, and (vi) a resistance of the prostate cancer toward the course of treatment for treating the prostate cancer in the subject. The report may include one or more clinical actions or decisions made based on these one or more clinical indications.
For example, a clinical indication of a diagnosis of the prostate cancer in the subject may be accompanied with a clinical action of prescribing a new therapeutic intervention for the subject. As another example, a clinical indication of a progression of the prostate cancer in the subject may be accompanied with a clinical action of prescribing a new therapeutic intervention or switching therapeutic interventions (e.g., ending a current treatment and prescribing a new treatment) for the subject. As another example, a clinical indication of a regression of the prostate cancer in the subject may be accompanied with a clinical action of continuing or ending a current therapeutic intervention for the subject. As another example, a clinical indication of an efficacy of the course of treatment for treating the prostate cancer in the subject may be accompanied with a clinical action of continuing or ending a current therapeutic intervention for the subject. As another example, a clinical indication of a resistance of the course of treatment for treating the prostate cancer in the subject may be accompanied with a clinical action of ending a current therapeutic intervention and/or switching to (e.g., prescribing) a different new therapeutic intervention for the subject.
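As a non-limiting illustration, a report pairing clinical indications with clinical actions can be sketched as follows. The JSON layout, the field names, the `ACTIONS` table, and the `build_report` function are assumptions introduced for this sketch; the disclosure does not specify a report format.

```python
import json

# Clinical actions paired with each indication in the examples above.
ACTIONS = {
    "diagnosis": ["prescribe a new therapeutic intervention"],
    "progression": ["prescribe a new therapeutic intervention",
                    "switch therapeutic interventions"],
    "regression": ["continue or end the current therapeutic intervention"],
    "efficacy": ["continue or end the current therapeutic intervention"],
    "resistance": ["end the current therapeutic intervention",
                   "switch to a different therapeutic intervention"],
}


def build_report(subject_id, indications):
    """Return a JSON report listing each indication with its candidate actions."""
    entries = [
        # Indications without a listed action (e.g., prognosis) get an empty list.
        {"indication": name, "actions": ACTIONS.get(name, [])}
        for name in indications
    ]
    return json.dumps({"subject_id": subject_id, "clinical_indications": entries}, indent=2)


print(build_report("subject-001", ["progression"]))
```

The resulting string could be rendered on a GUI or transmitted over a network; the actions remain candidates for a clinician to consider rather than automated decisions.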
Computer Systems
The present disclosure provides computer systems that are programmed to implement methods of the disclosure. FIG. 4 shows a computer system 401 that is programmed or otherwise configured to, for example, determine quantitative measures of DNA methylation to generate methylation profiles of DNA molecules at genomic regions; determine cycle threshold (Ct) values for amplification of DNA fragments or derivatives thereof at genomic regions; calculate intensity ratio values of first quantitative measures and second quantitative measures at genomic regions; determine a quantitative measure indicative of a presence, absence, or relative assessment of prostate cancer of a subject; identify or provide an indication of the prostate cancer of the subject; and electronically output a report that identifies or provides an indication of the prostate cancer of the subject.
The computer system 401 can regulate various aspects of analysis, calculation, and generation of the present disclosure, such as, for example, determining quantitative measures of DNA methylation to generate methylation profiles of DNA molecules at genomic regions; determining cycle threshold (Ct) values for amplification of DNA fragments or derivatives thereof at genomic regions; calculating intensity ratio values of first quantitative measures and second quantitative measures at genomic regions; determining a quantitative measure indicative of a presence, absence, or relative assessment of prostate cancer of a subject; identifying or providing an indication of the prostate cancer of the subject; and electronically outputting a report that identifies or provides an indication of the prostate cancer of the subject. The computer system 401 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device.
The computer system 401 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 405, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 401 also includes memory or memory location 410 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 415 (e.g., hard disk), communication interface 420 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 425, such as cache, other memory, data storage and/or electronic display adapters. The memory 410, storage unit 415, interface 420, and peripheral devices 425 are in communication with the CPU 405 through a communication bus (solid lines), such as a motherboard. The storage unit 415 can be a data storage unit (or data repository) for storing data. The computer system 401 can be operatively coupled to a computer network (“network”) 430 with the aid of the communication interface 420. The network 430 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
The network 430 in some cases is a telecommunication and/or data network. The network 430 can include one or more computer servers, which can enable distributed computing, such as cloud computing. For example, one or more computer servers may enable cloud computing over the network 430 (“the cloud”) to perform various aspects of analysis, calculation, and generation of the present disclosure, such as, for example, determining quantitative measures of DNA methylation to generate methylation profiles of DNA molecules at genomic regions; determining cycle threshold (Ct) values for amplification of DNA fragments or derivatives thereof at genomic regions; calculating intensity ratio values of first quantitative measures and second quantitative measures at genomic regions; determining a quantitative measure indicative of a presence, absence, or relative assessment of prostate cancer of a subject; identifying or providing an indication of the prostate cancer of the subject; and electronically outputting a report that identifies or provides an indication of the prostate cancer of the subject. Such cloud computing may be provided by cloud computing platforms such as, for example, Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, and IBM cloud. The network 430, in some cases with the aid of the computer system 401, can implement a peer-to-peer network, which may enable devices coupled to the computer system 401 to behave as a client or a server.
The CPU 405 may comprise one or more computer processors and/or one or more graphics processing units (GPUs). The CPU 405 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 410. The instructions can be directed to the CPU 405, which can subsequently program or otherwise configure the CPU 405 to implement methods of the present disclosure. Examples of operations performed by the CPU 405 can include fetch, decode, execute, and writeback.
The CPU 405 can be part of a circuit, such as an integrated circuit. One or more other components of the system 401 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).
The storage unit 415 can store files, such as drivers, libraries and saved programs. The storage unit 415 can store user data, e.g., user preferences and user programs. The computer system 401 in some cases can include one or more additional data storage units that are external to the computer system 401, such as located on a remote server that is in communication with the computer system 401 through an intranet or the Internet.
The computer system 401 can communicate with one or more remote computer systems through the network 430. For instance, the computer system 401 can communicate with a remote computer system of a user. Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PC's (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 401 via the network 430.
Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 401, such as, for example, on the memory 410 or electronic storage unit 415. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 405. In some cases, the code can be retrieved from the storage unit 415 and stored on the memory 410 for ready access by the processor 405. In some situations, the electronic storage unit 415 can be precluded, and machine-executable instructions are stored on memory 410.
The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
Aspects of the systems and methods provided herein, such as the computer system 401, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.
Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
The computer system 401 can include or be in communication with an electronic display 435 that comprises a user interface (UI) 440 for providing, for example, a visual display of data indicative of a presence, absence, or relative assessment of prostate cancer of a subject, a determined presence, absence, or relative assessment of prostate cancer of a subject, an identification of a subject as having prostate cancer, or an electronic report that identifies or provides an indication of the prostate cancer of the subject. Examples of UI's include, without limitation, a graphical user interface (GUI) and web-based user interface.
Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 405. The algorithm can, for example, determine quantitative measures of DNA methylation to generate methylation profiles of DNA molecules at genomic regions; determine cycle threshold (Ct) values for amplification of DNA fragments or derivatives thereof at genomic regions; calculate intensity ratio values of first quantitative measures and second quantitative measures at genomic regions; determine a quantitative measure indicative of a presence, absence, or relative assessment of prostate cancer of a subject; identify or provide an indication of the prostate cancer of the subject; and electronically output a report that identifies or provides an indication of the prostate cancer of the subject.
A healthy (prostate normal) sample and a prostate cancer sample were each processed using methods of the present disclosure, including quantitative polymerase chain reaction (qPCR) amplification at a control locus and two restriction loci. FIGS. 2A and 2B illustrate an example of quantitative polymerase chain reaction (qPCR) amplification plots for the control locus and two restriction loci tested in the healthy (prostate normal) sample (“N1-digested” and “N1-undigested”) and the prostate cancer sample (“T1-digested” and “T1-undigested”), respectively, in accordance with disclosed embodiments.
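As a non-limiting illustration, one common way to turn qPCR cycle threshold (Ct) values into a signal at a restriction locus relative to the control locus is sketched below. The formula, the function name `relative_signal`, the `efficiency` parameter, and the Ct values are assumptions introduced for this sketch. It is not asserted to be the ratio calculation used to generate the data reported herein.

```python
def relative_signal(ct_locus, ct_control, efficiency=2.0):
    """Signal at a locus relative to the control locus, from Ct values.

    Assumes every cycle multiplies the template by `efficiency`
    (2.0 = perfect doubling), so a locus that crosses the threshold one
    cycle later than the control started with half as much template.
    """
    return efficiency ** (ct_control - ct_locus)


# Hypothetical Ct values for one digested sample.
ct_control = 24.0
ct_restriction_loci = {"locus_1": 26.0, "locus_2": 24.0}

for name, ct in ct_restriction_loci.items():
    print(name, relative_signal(ct, ct_control))
```

Under these assumptions, a restriction locus that crosses the threshold two cycles after the control locus yields a relative signal of 0.25, and a locus with the same Ct as the control yields 1.0.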
Table 1 provides a list of marker regions, genomic coordinates, and strands for a set of prostate cancer-specific biomarkers. Marker regions may include marker genes or intergenic regions. Panels of one or more prostate cancer-specific genomic loci can be selected from this list. The genomic coordinates may comprise portions of genes or intergenic regions.
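As a non-limiting illustration, genomic coordinate strings of the form used in Table 1 can be parsed into a structured record as sketched below. The names `Region`, `span`, and `parse_region` are assumptions introduced for this sketch. Whether the listed coordinates are 0-based or 1-based, and which genome build they refer to, is not stated in the table and is not assumed here.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def span(self):
        # Difference of the two listed coordinates; the interval convention
        # (0- or 1-based, open or closed) is not assumed.
        return self.end - self.start


# Tolerates stray whitespace around the colon.
_COORD = re.compile(r"^(chr[0-9XYM]+)\s*:\s*(\d+)-(\d+)$")


def parse_region(text):
    match = _COORD.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate string: {text!r}")
    chrom, start, end = match.groups()
    region = Region(chrom, int(start), int(end))
    if region.start >= region.end:
        raise ValueError(f"start must be below end: {text!r}")
    return region


foxe3 = parse_region("chr1:47882821-47883001")
print(foxe3, foxe3.span)
```

A panel could then be represented as a dictionary from marker region name to `Region`, which makes overlap checks and length filters straightforward.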
Table 2 provides a list of selected marker regions that are observed to be most influential or most important toward classification of samples for prostate cancer assessment. Table 3 provides performance data (including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the ROC (AUC)) for this set of selected marker regions that are observed to be most influential or most important toward classification of samples for prostate cancer assessment. Table 4 provides signal ratio values (mean ratio and standard deviation) measured for this set of selected marker regions that are observed to be most influential or most important toward classification of samples for prostate cancer assessment, across the set of normal samples and the set of tumor samples.
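As a non-limiting illustration, the performance measures named above can be computed as sketched below. The function names, the confusion-matrix counts, and the scores are hypothetical and are not the data underlying Table 3. The sketch also assumes that a higher score indicates a tumor sample; an assay whose signal runs in the opposite direction would negate the scores first.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def auc(tumor_scores, normal_scores):
    """Probability that a random tumor sample outscores a random normal sample.

    Ties count as one half. This pairwise form equals the area under the
    empirical ROC curve.
    """
    wins = 0.0
    for t in tumor_scores:
        for n in normal_scores:
            wins += 1.0 if t > n else 0.5 if t == n else 0.0
    return wins / (len(tumor_scores) * len(normal_scores))


# Hypothetical counts and scores, not the data behind Table 3.
print(confusion_metrics(tp=8, fp=1, tn=9, fn=2))
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```

The pairwise AUC loop is quadratic in the number of samples, which is acceptable for sample sets of the size described here.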
| TABLE 1 |
| Differentially methylated regions (DMRs) with genomic coordinates |
| Marker Region Name | Genomic Coordinates | Strand |
| FOXE3 | chr1:47882821-47883001 | + |
| INTchr1_178456064_178456230 | chr1:178456064-178456230 | NA |
| SYT15 | chr10:46970108-46970295 | −;− |
| EPS8L2 | chr11:721572-721764 | + |
| SLC15A3 | chr11:60718471-60718618 | − |
| KDM2A | chr11:66885273-66885367 | + |
| INT.chr11_69452031_69452146 | chr11:69452031-69452146 | NA |
| INT.chr11_69452505_69452633 | chr11:69452505-69452633 | NA |
| BIN2 | chr12:51718739-51718869 | − |
| RHOF | chr12:122231231-122231336 | − |
| ZIC2 | chr13:100637718-100637764 | + |
| NPAS3 | chr14:34269667-34269714 | + |
| INT.chr14_54413694_54413930 | chr14:54413694-54413930 | NA |
| AMN | chr14:103394658-103394809 | + |
| SYNGR3 | chr16:2042521-2042624 | + |
| FAM18A | chr16:10912373-10912435 | − |
| INT.chr16_88700552_88700661 | chr16:88700552-88700661 | NA |
| CYBA | chr16:88717186-88717443 | − |
| ALOX12;LOC100506713 | chr17:6898966-6899129 | +;− |
| RAI1 | chr17:17603974-17604067 | + |
| STAC2 | chr17:37381086-37381291 | − |
| HSF5 | chr17:56564955-56565159 | − |
| UTS2R | chr17:80332893-80332988 | + |
| CELF4 | chr18:34833673-34833820 | − |
| NFATC1 | chr18:77280255-77280394 | +;+ |
| INT.chr18_77377808_77378030 | chr18:77377808-77378030 | NA |
| GNG7 | chr19:2561966-2562267 | − |
| S1PR4 | chr19:3178378-3178763 | + |
| S1PR4 | chr19:3179843-3180439 | + |
| ICAM5 | chr19:10398645-10398816 | +;+ |
| INT.chr19_13209661_13209782 | chr19:13209661-13209782 | NA |
| LYL1 | chr19:13210312-13210610 | − |
| KCNN4 | chr19:44278646-44278776 | − |
| DACT3 | chr19:47152943-47153124 | − |
| GRIN2D | chr19:48946038-48946234 | + |
| ARHGEF33;LOC375196 | chr2:39186942-39187340 | −;+ |
| GPR75-ASB3 | chr2:54086834-54087146 | − |
| LOXL3;DOK1 | chr2:74782170-74782343 | −;+;+ |
| SPEG | chr2:220299387-220299671 | + |
| OBSL1;MIR3132 | chr2:220416365-220416499 | −;− |
| ADRA1D | chr20:4202298-4202568 | − |
| MIR5095;RBM38 | chr20:55965671-55965845 | +;+ |
| PRIC285 | chr20:62200059-62200211 | − |
| INT.chr21_44819085_44819433 | chr21:44819085-44820000 | NA |
| CLDN5 | chr22:19511041-19511173 | − |
| SCARF2 | chr22:20783492-20784226 | − |
| BCR | chr22:23523796-23524384 | + |
| BAIAP2L2 | chr22:38485113-38485188 | − |
| CELSR1 | chr22:46932291-46932543 | − |
| PLXNB2 | chr22:50738284-50738549 | − |
| GPR62 | chr3:51989765-51989989 | + |
| CTBP1 | chr4:1210457-1211157 | − |
| INT.chr4_55015512_55015839 | chr4:55015512-55015839 | NA |
| SCGB3A1 | chr5:180017902-180018673 | − |
| C5orf49 | chr5:7849801-7850443 | − |
| MCI | chr5:54516403-54516681 | − |
| ANKDD1B | chr5:74907331-74907750 | + |
| SH2B2 | chr7:101961701-101962105 | + |
| MEST;MESTIT1 | chr7:130131962-130132110 | +;+;+;+;− |
| TMEM176B;TMEM176A | chr7:150497959-150498298 | −;+ |
| KCNH2 | chr7:150655309-150655540 | −;− |
| KBTBD11 | chr8:1950518-1950729 | + |
| INT.chr8_38508331_38508694 | chr8:38508331-38508694 | NA |
| INT.chr8_48675655_48676143 | chr8:48675655-48676143 | NA |
| C9orf3 | chr9:97807476-97807681 | + |
| C9orf172 | chr9:139740230-139740378 | + |
| CLIC3 | chr9:139889657-139889955 | − |
| TABLE 2 |
| Selected top marker regions |
| Marker Region Name | Genomic Coordinates | Strand |
| SCGB3A1 | chr5:180017902-180018673 | + |
| ANKDD1B | chr5:74907443-74907561 | + |
| C5orf49 | chr5:7850160-7850286 | − |
| C9orf3 | chr9:97807476-97807681 | + |
| GPR75-ASB3 | chr2:54086834-54087017 | − |
| TABLE 3 |
| Performance data for the selected top marker regions tested |
| in a set of 52 normal and 47 tumor patient tissue samples |
| Locus | Sensitivity | Specificity | PPV | NPV | AUC |
| SEQ ID NO:127 | 78.80% | 98.20% | 97.10% | 86.10% | 0.909 |
| SEQ ID NO:201 | 88.10% | 94.70% | 92.50% | 91.50% | 0.943 |
| SEQ ID NO:134 | 73.80% | 89.50% | 83.80% | 83.90% | 0.883 |
| SEQ ID NO:43 | 88.10% | 91.20% | 88.10% | 91.20% | 0.924 |
| SEQ ID NO:96 | 81.00% | 86.00% | 81.00% | 86.00% | 0.921 |
| TABLE 4 |
| Signal ratio of the selected top marker regions tested |
| in a set of 52 normal and 47 tumor patient tissue samples |
| Locus | Normal Mean Ratio | Normal Standard Deviation | Cancer Mean Ratio | Cancer Standard Deviation |
| SEQ ID NO:127 | 78.2 | 2.8 | 5.1 | 3.6 |
| SEQ ID NO:201 | 52.0 | 2.9 | 3.0 | 3.1 |
| SEQ ID NO:134 | 3.0 | 2 | 2.5 | 2.2 |
| SEQ ID NO:43 | 34.3 | 3.7 | 4.1 | 4.3 |
| SEQ ID NO:96 | 4.2 | 1.8 | 1.8 | 1.7 |
| TABLE 5 |
| Area under an ROC curve (AUC) with respect to identification |
| of prostate cancer using combinations of one or more loci |
| Loci | Sensitivity | Specificity | AUC |
| {SEQ ID NO:134} | 80.9% | 82.4% | 0.883 |
| {SEQ ID NO:134, SEQ ID NO:127} | 76.2% | 87.7% | 0.895 |
| {SEQ ID NO:134, SEQ ID NO:127, SEQ ID NO:43} | 78.6% | 98.2% | 0.904 |
| {SEQ ID NO:134, SEQ ID NO:127, SEQ ID NO:43, SEQ ID NO:96} | 80.9% | 98.2% | 0.908 |
| {SEQ ID NO:134, SEQ ID NO:127, SEQ ID NO:43, SEQ ID NO:96, SEQ ID NO:201} | 83.3% | 98.2% | 0.917 |
| TABLE 6 |
| Sequences of the DMRs in Table 1 |
| Marker Region Name | Sequence | SEQ ID NO: |
| FOXE3 | ggcccgctgcccgctgagcccctcctggccttggccGGGCCGGC | 277 |
| AGCCGCTCTCGGCCCGCTCAGCCCTGGGGAG | ||
| GCCTACCTGAGGCAGCCGGGCTTCGCGTCGG | ||
| GGCTGGAGCGCTACCTGTGAGCCTGCGCCGC | ||
| GCGGGCAGGCACCTGTGCGACCTGTGCCCCG | ||
| GACCTGCGGCGC | ||
| INTchr1_178456064_178456230 | GGAGGCGGAAGCGCGCGAGTAGGAGGTGCG | 278 |
| GAGGTCGGGCTCGCGGGGCTCCGGGCTGCCC | ||
| CTCTGAGTGAGCCGCGCTGCTGAAGCCGGGC | ||
| CCTGCGAGGCGCCCACGGGGCCGGTGCTGGT | ||
| CCCTAGGGCCAGAGAGAAGACTTCTGTGGGG | ||
| TCCGCTGCGCCC | ||
| SYT15 | GGGGGTGCTCAGACGCTGGGTTCCAACCGCT | 279 |
| GGCCACCTGGGGCGGGCCAAAAAGGTGCCTC | ||
| CCTTAGGGTGACGTGCGGCCGCGGGGCATTC | ||
| AGGTCTCAGGGATCTGCACTGGGTGGGGTGG | ||
| TGAGAAGGCCGGACCCCCCACACCTCCTAAG | ||
| CCGCAACTGACCGCGAAGAGCGGGCCTCAGCG | ||
| EPS8L2 | CAACTGCGCCCTGGACGACATCGAGTGGTTT | 280 |
| GTGGCCCGGCTGCAGAAGGCAGCCGAGGCTT | ||
| TCAAGCAGCTGAACCAGCGGAAAAAGGGGA | ||
| AGAAGAAGGGCAAGAAGGCGCCAGCAGGTG | ||
| CAggggacagggacggggccggcaggtgcaggggacggggccagc | ||
| aggtgcaggggacagggacggggac | ||
| SLC15A3 | GGCACCGAAGGAGGTGAGGTTGCTCCGGACG | 281 |
| GAGCTGGCGGCCAGGCCGAGTAGCAGCAGGC | ||
| CCGCGTAGAGGACGGGCGCGCAGTAGGGGCT | ||
| GGGCGAGGAGCGCGGGCAGCCGGCCGAGGG | ||
| GCAGGCAGGTCCCAGCGGCGACGC | ||
| KDM2A | CCCGGATCGCGGCTGGGCTGCTCGCATGGCA | 282 |
| CTGCTCGGGTACCTCCGGCCGGGCTCCGTCGA | ||
| CGTTCGGAGCCTGCTGGCCCGTCGGGCAGCT | ||
| INT.chr11_69452031_69452146 | GGAGTGGGGCATGCCGTGGGAGCCCACGAGG | 283 |
| GCCTCAGCGCGGATCCTCCGCCGGAAAACCG | ||
| GCTCCCGCGAGCCGCCGCCGCAGGTTTCCTA | ||
| GGCCCCGCGAGTCCCGCAGCGA | ||
| INT.chr11_69452505_69452633 | CCCCTACCTGTTGGGTTTGCGTTTTAACTCCA | 284 |
| GCGCACACCTTGCCGGCAGCCCTCGGAGCTA | ||
| GGGGAGGGGTCTCGTTTCCCCGCAGCCCGCC | ||
| GGACAGACGACTGGGGCACGGGAGGGGCGG | ||
| TGGC | ||
| BIN2 | aaataaataaataaataaataaataaatGAGGAACAACTAAG | 285 |
| CTGGAGATAGAAACAGGGTAGGGGGCTGGTT | ||
| CTTAGGCAAGAGAATGATCACATTGAAAAAA | ||
| GGCTGAGGAGGATAGTATGGACGCCC | ||
| RHOF | GTCCCGGATCAGCCCCCCCTCACCCCGCTGGG | 286 |
| CTCTGGAATTCCCGAGGGGGCGCCCCGGGGT | ||
| GGCGGCCGCCTGTCCGTGCTCGGGACGCTGG | ||
| GGACTGAGGGT | ||
| ZIC2 | cggcggcggctgcggcggcggcggccgcggtgtccgcggtgcaccg | 287 |
| NPAS3 | cccggcgccgacggcgcggccgcccgcAAGACTCAGTTCG | 288 |
| GCGCCTC | ||
| INT.chr14_54413694_54413930 | CGCCGCGGACGCGCCGAGCCCTCTCAGTGTG | 289 |
| GCGCTGCCCGGCGGCGAGGGGGGTGTGGAAC | ||
| GAAGCACGGTCAAGACAGAAAACAAAGTCA | ||
| GCAGGTCACCTGGCAGGTTCTGGGCGAATTA | ||
| TGCAACGAAAGCAGGGGAATGTTTGATGCGT | ||
| CCCACTCCACACCCCCCCAACCTTTTTTTTTTT | ||
| TTTTTAAGCTCCTAGGAAGCCGGTTCCAGTTT | ||
| AAGGGTTGGGTAGGGAT | ||
| AMN | GTGCACAGGGTCTCGGCTTCTCGTCCCAGGG | 290 |
| GACTGGGGGCGGGGTGGGCGCGGAGCAGGC | ||
| CCGGACCCCCGCGTGGCGCCGCCTCAGCCCG | ||
| TGTCTCTTGCAGCTCCTGCCGCTGGATGGGGA | ||
| ACTCGTCCTGGCTTCAGGAGCCGGATT | ||
| SYNGR3 | GGCGAGCCCAGGCGAGGCGCCCCAAGCCTCG | 291 |
| GGCCCACCGACCTTTCCTCCTCCGGGCGAGGC | ||
| CGCCGTGGGCCACCGCGTGGAGCGTCGCCCT | ||
| GACGCGCCG | ||
| FAM18A | CGCCCAGGCCCGGGGCTCCAGCTCCGCCCGT | 292 |
| CGCCGCTGAAGGGGTCGGACGCCGGGCGGGC | ||
| INT.chr16_88700552_88700661 | CATGACGATTTTGCACCCCCGCCAGCAGTGCC | 293 |
| CTCGGCTGGACACACCTGCGTTGTGGGCAGC | ||
| CGGCAGCGCGTGACCCACTGTGCACCTGCGC | ||
| CTTGATGTAGGGGGG | ||
| CYBA | CCAGCCCGCGGCCTGAGGGTCCCGCCCCCGT | 294 |
| TCCCCGCACCCTCCTGGCCTGACCCCGGCCCG | ||
| GCCTGGCCCGCCTGGCGCCCCACTTCCCCACC | ||
| CTGTAAGTAGCCCGAGGTCCCGGCTGGGGTC | ||
| TTGGGACACCCCTCCAGGCTGCAGCCTCCACC | ||
| GTCCCTGACGTGCACTCACTCAGGCCGGACG | ||
| CCAGCGCCTGTTCGTTGGCCCACATGGCCCAC | ||
| TCGATCTGCCCCATGGCGACACGAACCCGGC | ||
| TGGGA | ||
| ALOX12; LOC100506713 | GGGTCCCAGCCCAGAAAAGCGGAGACCTTCC | 295 |
| CCTCCGTTTGGAAGATGGACTCATCCCCTACT | ||
| TCCTCCTGGCTCCCCAGAAGACCTGGAGACCT | ||
| CGGGAAGTGTTCTCATCTATGTTCGCTCCAAC | ||
| CCGGGGGCTCAGGCCCACCTCGATCCCGCCTT | ||
| CCCG | ||
| RAI1 | GGGGCCTCCGCGAGGCCGGGCCTCGTCCGCC | 296 |
| GAGGCCGCAGTCCGGCCCCTCTCGTCCTCGG | ||
| GGCGCGGCGCGTTCGTTTCATTAATTTATCG | ||
| STAC2 | GGCCGCGGCCTCGCGTCCCCGAGCCCAACAG | 297 |
| GGCTAGGAGCGGGGAATGCAGGACTGCGGA | ||
| GGGGCAGGGAGAGACGCCCTGAGACGCGGA | ||
| GATAAACACCCGGAGACGCCGGGAGAGACG | ||
| GGGAGAGACGCACACAGAGACACCAAGACA | ||
| CAGACACGCAGGTTGTAGAGACAAATTCAGA | ||
| GACACGAGCGAGGATAGAGGCGC | ||
| HSF5 | GCCAATGGGTCCCCGGCGCCTTTCCGACGCCC | 298 |
| ATAGCGTGAGGACGTGCGAAAATGCGCCCTC | ||
| CAGCGGCGGTTGCTCCCCGCCGCCCATGTGCC | ||
| CACTGTGGGCGAGGGCACGGGCAGTCCGGAC | ||
| TCACCGTGCGGCTcgggccggggccccgcgggcggcggcg | ||
| gcTGCTGGTGCTGCAGTGGCGCGGTGGCGGCG | ||
| GAGGCC | ||
| UTS2R | GCGCGCCTGGCCCGCGCCTACCGCCGCTCGC | 299 |
| AGCGCGCCTCCTTCAAgcgggcccggcggccgggggcg | ||
| cgcgcgcTGCGCCTGGTGCTGGGCAT | ||
| CELF4 | GGTCCGTCTGGTTCCCTCCCAACCCCCGTccccg | 300 |
| cgccccggccgcccccggTTACCTGTGCGAGTCCTGGT | ||
| CTCCCCCGGGGGACGCTCCCGCCGGCGCTCA | ||
| GTACGGGCGATTGGCGTCTTTGGGCCGCTTCA | ||
| GCTGCACCTTGA | ||
| NFATC1 | GCAGCCCTCGCGTGGCGTTGCTGCCTCTGCAG | 301 |
| ATGCCTCCGGAGGGACTCGGTGTGTCTCGGT | ||
| GTTGATTTCAGAACAGGGGAGATGCTGACGT | ||
| GTCCCGGTGGCTCTCACAGCACAGTGGAAGG | ||
| CGAGTGGCAGCTCC | ||
| INT.chr18_77377808_77378030 | CCCGCGCGGACAAGCAGCTTCCCAGAGGCCT | 302 |
| CAGGAAGCCCCGCCCGAGGGTGTCAGCTCCA | ||
| GCTCTGAGCGGGTCCCGCAAACGCCCCAGCG | ||
| TGTTCCCACCGGTGACCCCGACACCCCAACA | ||
| CCCCAACGCCCCGCACCGCCCTCAGCAGCCG | ||
| CGCCTTGGCCAGCGGGTGCCCCGGTGCCTGC | ||
| GGCCTCTGACATAGAAAACGAGGAAGGAGGc | ||
| gggcg | ||
| GNG7 | CGGGGCGACCCGAGAAACAGGAAACCCTGTT | 303 |
| TCTGAGCTTCGCAGGCTCTTCTGGGAGACCAG | ||
| CGGGTAATCCCCTTCCTCCGACATTTCTCTGA | ||
| GAAGTCTCTCTGCGTTCTGCTTCTCAGAAAGA | ||
| AACCAGGGTCCGGGGCAGGCATTCACGCCCT | ||
| CCACCCACTCAGGGGTTTGCAGTAACACCCTT | ||
| GGGATCTGCAGGTTCACACAGAGCCAGAGCC | ||
| GTGAGTCACCGCAGCCCCGGAGCTGCCGGGG | ||
| TCCCCACCTGCTCAGCCCCAGGAACACACAT | ||
| CTACAGCGGGTTCCTTTT | ||
| ICAM5 | CGCCTTCTGCTCCCAGCTTGGAGCCCCGCGCC | 304 |
| CACAGCTTTGGCCTCCGGTTCCATCGCTGCCC | ||
| TTGTAGGGATCCTCCTCACTGTGGGCGCTGCG | ||
| TACCTATGCAAGTGCCTAGCTATGAAGTCCCA | ||
| GGCGTAAAGGGGGATGTTCTATGCCGGCTGA | ||
| GCGAGAAAAAGA | ||
| LYL1 | CCCGGGAGGGGTGGGGCCTGCGGCCAGAGCT | 305 |
| GCGGCTTGGTCGCGCAGCAGCCGCACCAGGA | ||
| AGCCGATGTACTTCATGGCTAGGCGGAGCAC | ||
| CTCGTTCTTGCTCAGCTTCCGGTCGGGCGGGT | ||
| GCGTCGGCAGCAGCTTCCTCAGCTCGGCGAA | ||
| GGCGCCGTTAACGTTCTGCTGCCGCCAGCGCT | ||
| CCCGGCTGTTGGTGAACACGCGCCGGGCCAC | ||
| CTTCTGGGGCTGGTGCCCTGTGGACAAGGAG | ||
| GGCCGGGTTGGTGCCATGGCCCAAAGGGCGG | ||
| CCCCTCCTGCCCTCCCG | ||
| KCNN4 | CACGGCGGGCCCCGCACGGGCGCCGGGTGCA | 306 |
| GCCCACACACCACCAGCTCCAGCACGATCTG | ||
| CGCCGCCTGCCGCCCGGTCAGCGCCACGCGC | ||
| CAGTCCCGCAGCCCGTTGTCGGTCATGAACA | ||
| GCTGCC | ||
| DACT3 | GGCTGCGCATCGCCACGGCGTGCAGGGGGCT | 307 |
| GGGCGTCAGAAAgggcccggcgcgggcccgccgcTCCG | ||
| CCGAGGAGCAGGCCTCCGGGCCGGCGGACCC | ||
| CCCTGCCGTCGGGTAGGGCGCTGAGAAGGAC | ||
| CGCGGCACCGCTGCCCGCGCGCCCACCACCT | ||
| CCGGAGCGCTGGGACTGGC | ||
| GRIN2D | ggggggcgcgggccTGGCCGACGGCTTCCACCGCTA | 308 |
| CTACGGCCCCATCGAGCCGCAGGGCCTAGGC | ||
| CTCGGCCTGGGCGAAgcgcgcgcggcaccgcggggcgca | ||
| gccgggcgcccgctgtccccgccggccgcTCAGCCCCCGCA | ||
| GAAGCCGCCGCCCTCCTATTTCGCCATCGTAC | ||
| GCGACAAGGAGCCAGCC | ||
| ARHGEF33; LOC375196 | GTAAGCACAGCTCTTTTGTACTCTGTTTTCCC | 309 |
| CCTAAAGACATCTGATGCCCCCAGTGAAGAA | ||
| AAGCCAACAGCAGCAAAGCCTGATGGAGAGC | ||
| ATGCAGCCCGGGAAGCCCAGTGACTGGGAGC | ||
| TGGAGGGCAGGAAGCACGAGCGGCCCGAGA | ||
| GCCTTCTGGCACCGACGCAGTTCTGCGCGGCC | ||
| GAGCAGGACGTGAAGGCGCTGGCCGGGCCCC | ||
| TGCAGGCCATCCCGGAGATGGACTTCGAGTC | ||
| CTCTCCGGCGGAGCCGCTGGGCAACGTGGAG | ||
| CGCTCCCTGCGCGCCCCGGCCGAGCTCCTGCC | ||
| CGATGCCCGCGGCTTCGTGCCCGCGGCCTAC | ||
| GAAGAGTTCGAGTACGGCGGCGAGATCTTCG | ||
| CGCTGCCCGCGCCCTACGACGAGG | ||
| GPR75-ASB3 | CACCAGGAAGACAGGTACGCGGAGCCGGGCC | 310 |
| TGGCCCAGCGCAGCCGCGCTCCTCGCTATCCC | ||
| GCCAGCCTCCGGGAGCCGTCTCCGGCATCGT | ||
| GGGGTTGTCCTCCTCCAGGGGCCCGCGGCCTC | ||
| TCACCTGCCGGGTGGCCGCAGCGCCGCCCCT | ||
| CCTCCATCTCGCAGTCCGGACCCCAGCTCCGC | ||
| CTGCCGCTCTGGATGATGCAGGACTAGAGGC | ||
| ATCATCGCCATCGCCACCGCCTCCGCGCATCC | ||
| CGGGAGCCGCGGCAAGACGCGGGCGCAGAG | ||
| GCGCAGTCACGGAGACGCCGAGGGCACCGC | ||
| LOXL3; DOK1 | TTTGGAAGCCCCAGATCCCAAATCGACTTGC | 311 |
| GCCGCAACCTCCTTCCCCGTCGGGACCCGGG | ||
| CCGCCTGCGCACGCCACTCCCTCTCGAGCACT | ||
| CTCTCTCTCTCCCTAGAGGTGGAGGAAGACCT | ||
| GGGCCGTGCTCTACCCGGCCAGTCCCCACGG | ||
| CGTAGCGCGGCTCGAG | ||
| OBSL1; MIR3132 | CATCTCATACTTATCTCCCGGGCACAGCTCGG | 312 |
| CCCCCTCCCGCAGCCAGCACACGTGGCCCCC | ||
| CGAGCGGGACACAGTCACCTCCAGCACCGCC | ||
| CGGCGGCCCACCAGAACGGTCTTCTCGCGAG | ||
| GGGGGTGGC | ||
| MIR5095; RBM38 | CCTTTCTATGGCTGGGGGCCGGATGAGGGAC | 313 |
| CCGGGTCTGCTCTTACTCGCCCCAAGGGTCCC | ||
| TGACACCAGCCCAGAACGGGGTGGAGCTGGA | ||
| AAGAGCCCACACCTGCTTCCTCTGCCCACCTC | ||
| ATCTCCCGCGGGGCCTCTGAGACCGCCCGGG | ||
| ACCCGCTTCTATCGCGG | ||
| PRIC285 | GCTGCAGCCCCAGGGCCAAGCAGCAGCGGGC | 314 |
| CGGAAGCAGCAGCCACAGCGCCTGCTCTGAG | ||
| CTGGCCCGCCTCTCCAGCCGCACCTCGAACAC | ||
| CGTATTGTCGGGTGCAGGTACAGGGGCCACC | ||
| AGGGCTGTGCTGACCGCCCGGCCCAGC | ||
| CLDN5 | GGTCCATGCGGGGCTCCCCAGGCTTATCCAA | 315 |
| CGCCTCGCAGGCGTGGCTGGCAGGAGGGGCC | ||
| CGGCCGTGCCCAGCGCCCTCAGACGTAGTTCT | ||
| TCTTGTCGTAGTCGCCGGTGGCCGTGGGCCGC | ||
| CGCGGC | ||
| BCR | ACGGCGCGGGCTCGAGCGTGGGGGATGCATC | 316 |
| CAGGCCCCCTTACCGGGGACGCTCCTCGGAG | ||
| AGCAGCTGCGGCGTCGACGGCGACTACGAGG | ||
| ACGCCGAGTTGAACCCCCGCTTCCTGAAGGA | ||
| CAACCTGATCGACGCCAATGGCGGTAGCAGG | ||
| CCCCCTTGGCCGCCCCTGGAGTACCAGCCCTA | ||
| CCAGAGCATCTACGTCGGGGGCATGATGGAA | ||
| GGGGAGGGCAAGGGCCCGCTCCTGCGCAGCC | ||
| AGAGCACCTCTGAGCAGGAGAAGCGCCTTAC | ||
| CTGGCCCCGCAGGTCCTACTCCCCCCGGAGTT | ||
| TTGAGGATTGCGGAGGCGGCTATACCCCGGA | ||
| CTGCAGCTCCAATGAGAACCTCACCTCCAGC | ||
| GAGGAGGACTTCTCCTCTGGCCAGTCCAGCC | ||
| GCGTGTCCCCAAGCCCCACCACCTACCGCAT | ||
| GTTCCGGGACAAAAGCCGCTCTCCCTCGCAG | ||
| AACTCGCAACAGTCCTTCGACAGCAGCAGTC | ||
| CCCCCACGCCGCAGTGCCATAAGCGGCACCG | ||
| GCACTGCCCGGTTGTCGTGTCCGAGGCCACC | ||
| ATCGTGGGCGTCCGCAAGACCGGGCAGA | ||
| BAIAP2L2 | GTGCGGGGCAGGGAGCGACGGTCTGGCTCTA | 317 |
| GCTGGGACGCGGGCCTCGCGTCGGGCTCGGT | ||
| GCCGTAGGAGCCG | ||
| CELSR1 | GGTTCGTTCTCAAACAACGCCACCTGGTAGTT | 318 |
| GGGCATCGGAAACTTCAGGCTCCCTCTGCCG | ||
| CTCGTgccccgccgggcccgtcgcgccggccccgcccgggcTT | ||
| CGGGCAAGTTCGGCGGCAGGGGCGGCGATGG | ||
| GGATGGCGACGCGGAGGGCGTCCCCGCGGTG | ||
| GCGGCCTCCAGCGCCAGTCCCACCCGGACGG | ||
| CGCCAGCCGCGCGCCGCAGGGCGCACAGCAG | ||
| ACGCAGGCGGACCGAGCCGCCC | ||
| PLXNB2 | GGCCTGTGTGGAGCGCCCTGGACTATTCCTCG | 319 |
| CAGGCCGACCCAGGTGGCACAGCCCCTCCCC | ||
| CGGCGCCGGCACCGCCAGACTCCCCGGAGGG | ||
| CGCAGAACGGTTGCCCGGGAGCCAGGGGCAA | ||
| AGCGCGCCCGGGGCCAGGAAGCGCAGGGACT | ||
| AGGCCCGCGCCTCCTCGGCGCCGCCCACTGC | ||
| CCCCCGCGAGCCCAAGCTCCACGGCCACCGC | ||
| CCGCGCCCTCCCGGGGACTCCGGCGCCCCGT | ||
| CCGCCCCTCGGCCTCG | ||
| GPR62 | GCGCGCTGCTGGTCGTGGTGCTGCGCACGCC | 320 |
| GGGACTGCGCGACGCGCTCTACCTGGCGCAC | ||
| CTGTGCGTCGTGGACCTGCTGGCGGCCGCCTC | ||
| CATCATGCCGCTGGGCCTGCTGGCCGCACCG | ||
| CCGCCCGGGCTGGGCCGCGTGCGCCTGGGCC | ||
| CCGCGCCATGCCGCGCCGCTCGCTTCCTCTCC | ||
| GCCGCTCTGCTGCCGGCCTGCACGCTCGGGGT | ||
| GGCC | ||
| INT.chr4_55015512_55015839 | ccgcggagcagggggtggggagggggcggggcggcggggctccgg | 321 |
| ggctcgcgcatggcgggctgccagtcgccagccatgggagtcggggga | ||
| gccgggggaggaaggcagctgaggcccgacgagaattcgagcgccga | ||
| cccggtgggccagcactgctgagggacccggcgcaccctctgcagctgc | ||
| tggcccgggtgctaagcccctcactgactggtgcggtggcgccggccgg | ||
| ccgctccaagtgcggggcccgccgagcccacgcccatccggaactcgc | ||
| gctggccggcgagcacggggcacggccccggttcccgcccg | ||
| C5orf49 | GACTCGCAGATCCCATCCCAGGAAATGCACG | 322 |
| GGCCGGTGTGGCCAGGACAGAACAGAGGGA | ||
| CCACTCTCAGTCCAGCCCTCCCTGCAGGCGGC | ||
| TCGCCAGCTATTCCTTCTGGTCTTCGGTTTGC | ||
| AGGAGGGCAGAGGGTTTCCCGCGCCCTGGAC | ||
| CATCCGGGCGTAGTCCCGGCAGCAAGGCCTT | ||
| CTTTCCTTGCTAGCCTGGGCCTGCCGCAGACA | ||
| GACCCCAGAGGGAGCCGCGCCCAGCCCGCTG | ||
| GGCGGCCCCGGCTTCCCGCGACCCCCTCCAG | ||
| ACCCTGGGCAGAAAGAGCGCCCTGCTGTCCC | ||
| GACAGAGCCACTGTGCTTTTGAGGGATCCTG | ||
| ACACCTAGTGGCTCCCGCTCCCTTCTCCGAAG | ||
| AGCACCGGGTCCTATCTGAGCATTCCCGCGA | ||
| CTCCCAGCCCCTGATCGCAGCTAAGACACCC | ||
| ATTCGCGCACCCGGCTTCTCCCACATCCTCGT | ||
| CCCAGGGGTTCAGCTGACACTGGTAGTCGCC | ||
| TGAGCTGTACTCTTTGGGGCCCAGGCGCCTTG | ||
| GCGGGAGCTCACCCTCCCTGTCTCCCCAGCTG | ||
| ACCCTGCCGCGCCCCCTTCATCTCCGCACGCT | ||
| CCCACCCGGCCCCCTCCACAGGCTGTCCAGCC | ||
| CCGCCCCTCGGAAC | ||
| MCI | GGGCCTGGTGTCAGTATTCGCGGGCTCTGGC | 323 |
| AGCAGTCGGGGCCGCTTGGGATCGCGGCTCT | ||
| GAAGGGCTTCATCGCAGCGCTCGGAAATCTC | ||
| CCTCAGGATGGCGTCCACTTCCGCGCAATCCT | ||
| GCCCCGCAGCGCTGACCAACTCCTCCAGGCT | ||
| CCTTTTGGCCTTCGCCTTGAGCAGGAAGGGCT | ||
| CGGCCGCCGCCCCACAATCCCGGGACTGTGT | ||
| GATCATCAGCTTCTACGAAAGACAGGGAAGA | ||
| GCGCCGCCCTGGGGCCTGCCGGACCCTC | ||
| ANKDD1B | GGCTGACCTGCCTGCGTCCAGCCCCCGCGCCC | 324 |
| TGGGCCTGCctgggtctggatctgtgtccgagtctgggtctggatct | ||
| gggtccgagtctgggtctggCCCTGCGCTCAGGGCCCGC | ||
| GGAGGAGACTATGGAccccgccgggcgcgcccggggcc | ||
| AAGGGGCCACGGCAGGGGGGCTGCTGCTCCG | ||
| GGCTGCTGCGGCCGCCAAGGGTCTCAGGGAA | ||
| GACCTGTGGGGCGCGGCCGCCCTGCCTTGGA | ||
| GGAGCCTGTCCCGGATCCCGAAGCGGGAGGG | ||
| TCTTGGAGAGGAGGACACAGCAGTTGCCGGA | ||
| CACGAGCTCCGTGAGTCCCGGGACGAGGTCT | ||
| CAGAAAATCAGGCTGCGGGGCGGGCTGGGTC | ||
| GAAGGGACTTGAGAAGGTACCCAGAGCAGCA | ||
| CCTCCGGACGCTGCA | ||
| SCGB3A1 | GCGCCCCCGCGGGAGGCGCCCAGGAACCGTC | 325 |
| GCGCCCTGCCCGGCTCCCCGACCGCCCCTCCC | ||
| TCCTGCGCCGAGGCCTGCCAGGTGCGAGCCC | ||
| CCGGGACACAGGCGGGTCTGGGGAGGCGGCC | ||
| CCGCCAGGAGACGCTGCAGGGTCACCGGAGT | ||
| GGCCTGAGGGTGGCGGAAGGACCGGTGAACT | ||
| CTGTGCAGGGTCCGGGACAGGCCCCCAAGGG | ||
| AGGGGACACTCGCGCTGCGCCTTGCAGGATG | ||
| AGGAGCCGGTCTCCAGACGGGGGGCAGACGG | ||
| GTGTCCCCAGGCCAGGGGCGGCCTCCATCCC | ||
| GGCACGAGGCTGGAGACAGCCCTGAGAGGG | ||
| GGAGGCCGCGGGCTGCAGGCGCGGGGCCCCG | ||
| GGGTGGCGGAGCCCTCTGGGCGCCGGGCGAG | ||
| GCTGGAAGGACCTGGGATCCACGATCGGCGC | ||
| AGGCAGCGGCGGGGGCGCAGCGGGCGCCGA | ||
| GGCCTCAGGCCCCACCGTGCGCGCCAGGAGC | ||
| CCGGGGCGCTCACCGGAGCTGCAGGACAGGG | ||
| CCACGCAGAGCCCCAGGAGGGCGGCGAGCTT | ||
| CATGGCGCGGGGGCTCGGGGCGCGCGGGGAA | ||
| CCTGCGGCTGCCCGGGCAAGGCCACGAGGCT | ||
| TCTTATACCCGGTCCTCGCCCCTCCAGCGCCG | ||
| GCCTCGCCCGCGCTCCTGAGAAAGCCCTGCC | ||
| CGCTCCGCTCACGGCCGTGCCCTGGCCAACTT | ||
| CCTGCTGCGGCCGGCGGGCCCTGGGAAGCCC | ||
| GTGCCCCCTTCCCTGCCCGGGCCTCG | ||
| SH2B2 | GGCCTCCCTGCAGGATGTGGCCAGCCCAGGT | 326 |
| CCCCACTCACGCCCTGCCGTCGCCTTGTTGCA | ||
| GAGCCGGGCCCCACGCCCCCTGCCGCGCCCG | ||
| CGTCCCCGGCCTGCTGGAGCGACTCGCCCGG | ||
| CCAGCACTACTTCTCCAGCCTCGCCGCGGCCG | ||
| CCTGCCCGCCTGCCTCGCCCTCCGACGCCGCC | ||
| GGCGCCTCCTCGTCTTCCGCCTCGTCGTCCTC | ||
| TGCCGCGTcggggcccgcccccccgcgccccgTCGAGGG | ||
| CCAGCTCAGCGCGCGGAGCCGCAGCAACAGC | ||
| GCCGAGCGCCTGCTGGAGGCCGTGGCCGCCA | ||
| CCGCCGCCGAGGAgcccccggaggccgcgcccggccgcgc | ||
| gcgcgccgTGGAGAACCAGTACTCCTTCTACTAG | ||
| CCCGCGGC | ||
| MEST; MESTIT1 | GGCAGCTGCGCCTCGCAAGCGCAGTGCCGCA | 327 |
| GCGCACGCCGGAGTGGCTGTAGCTgcccggcgcgg | ||
| cgccgccctgcgcgggcTGTGGGCTGCGGGCTGCGCC | ||
| CCCGCTGCTGGCCAGCTCTGCACGGCTGCGG | ||
| GCTCTGCGGCGCCC | ||
| TMEM176B; TMEM176A | GCACGCACCCCGAGCTGCCTCCGCACAGTTG | 328 |
| GAGGAGCGTAGGAGGGACCCCCACCCAGGG | ||
| ATGACACTCCAGGAAGGGGACTGCAGAGGAA | ||
| GCCAGGTGCGGCCCCGGCTTTTGACCTACCTC | ||
| CGCACCGCAGCGCGGTCCTTCACGGGGCAGG | ||
| GGCGGCGTGAACCCGTCGGGCGTGAGCAGCA | ||
| GTCGGTGGAGCGGGAGGTCGGCGGTGGCGGG | ||
| GATGGGGGTATCCGGAGCGCAGCCGGGGCGC | ||
| AGCTGCTGGCACAGGAGCTCCACAGGCAGCC | ||
| AAGGACTCGGTCCTGTCCCAGAGCCTGCGGA | ||
| CTGTGGAGGGGAGGCCGCAGGAAGAGCCC | ||
| KCNH2 | GGGCGATGGGAGCTGGCCGGGCGCGCTGCGG | 329 |
| GGCGGAGAGCCGGGACCCACCAGCGCACGCC | ||
| GCTCCTCCGCGGGCCCGAGCCCTGCCACGTG | ||
| GTTGTCCATGGCTGTCACTTCGTCCAGGGCCA | ||
| GCGACTCGCTGCTGGGTGCCGCGGGCGTCAG | ||
| GTCCACGTCCACCACCAcggcccccggggcgcccgcgc | ||
| cgcccgcgccgcccgACCGCACCGACGACTCCCGGGC | ||
| KBTBD11 | GGCCACGGACAGCTGGAGCGCCGTGAGGCCC | 330 |
| CTGCGCCAGGCGCGCTCGCAGCTGCGGCTGC | ||
| TGGCCCTGGACGGTCACCTCTACGCCGTGGG | ||
| CGGCGAGTGCCTGCTCAGCGTGGAGCGCTAC | ||
| GAcccgcgcgccgaccgctgggcccccgtggcgccgctgccccggg | ||
| gcgccTTCGCCGTGGCGCATGAGGCCACCACCT | ||
| GCCACGGC | ||
| INT.chr8_38508331_38508694 | GGCCGCCAGCCCCAGAACAAATGGCGGCTTT | 331 |
| CCCGCTGTATTCAGCTAGTCAGCGTTCCCCGG | ||
| TTAAAAGGCGCTGGGGCAGGAACGGCCGGGG | ||
| CCTTCGGGGGCGCGACGCGGCGACGCCCAGC | ||
| CTGGGAAGGGGCGCGGGGCCCGTGTTGGCCG | ||
| CGGTGGGTCCCGGCTCCCTGGAGGCTGAGCC | ||
| CCGGGCGCTCTTTCCTCGCGGCGCTGCCGTGG | ||
| GGTGGCCGGGAGGGCAGAACGAGGGGCTGC | ||
| GGGACGGTGTTCGGAAGAAAATCGTGCGAGT | ||
| TTAAAAACATCCAAAGTGAGCCGAGCTGGGC | ||
| CCCAAGCCTCGGCCTCGCGCACTCGCCAGGC | ||
| CCAGGAGGCGGAGCAGGCGTC | ||
| INT.chr8_48675655_48676143 | GGGAATTAGCGTTTTAGTCTGTTCTGCTAACA | 332 |
| ATCGGTGTTTCTCAGTGCCAACCAGTCACAGC | ||
| ACGCGCGGCGCCCAGCCCCTGGTTTCTAAAC | ||
| CCCATTCTGTCGTCCCGGGACGAATGTCTTGG | ||
| GATGTCTGGCTGGGTCCAGGGCAAACATCCT | ||
| GCAGTGACAAAATGTGTGAAAGTGAAGTTGA | ||
| GCAACAAGAGGAACAGCATAGCAGCAGATTG | ||
| TAACCCAACAGATAAAGACATTGTCTGAGTC | ||
| AATGCGTACAGACCCATTATTCTCCGTGAGG | ||
| GAAGTGCGCGTCTTCCCTAGCCGCTACTGTCC | ||
| TCTTTCTGCGCCGCCGGGCTGGGCTCAGTCCG | ||
| CAGTGACCCAGTCTCGTGTAGGTGGGACCAG | ||
| CATCTTCACCGGCCGAGGAGACGCCCCTGGA | ||
| CCACCTGCGTGGGCAGGATCCAGGCAGGCGA | ||
| CAGGGCTCAGGGCTGCAGCCAAGTCCCCCAG | ||
| AAACAGTCCCCCTGGGAC | ||
| C9orf3 | CTGTGCTTGGAGGAAAGGACGACAGGTTTTA | 333 |
| AAGAGGGAGTGTCATCGCCTGCAGCGGGCGG | ||
| ACCCGTGTCCCGGCAGTGCAGCTCCGCAGGC | ||
| GCAGGAGGGATGCGGGCTGGGGACGCCTTGG | ||
| GGCGGCTGCAGGCTGGGGCGCGCCCCTGCAC | ||
| CCGGCGGGCCTCCGCTGCGTCCACTGCGGAC | ||
| AGGGGTCGGTGAGAGGCCC | ||
| C9orf172 | GGGGCCGCGCCGAGAAGACCCGTTGGGCCGC | 334 |
| GGCCGCAGCTACGAGAACCTGCTGGGGCGCG | ||
| AGGTGCGGGAGCCGCGAGGCGTGTCCCCCGA | ||
| AGGCCGGCGCCCGCCCGTCGTCGTGAACCTG | ||
| TCCACCTCTCCCAGACGCTACGCC | ||
| CLIC3 | GGGTTCTTGATGAACGCGGAGAACTTGTGGA | 335 |
| AAACGTCGTTGCCGGCGGTGTTGGACTCCCTG | ||
| TAACGAGGCGCCAGGCTGGGGAAGCTGCGGG | ||
| ATGAGGGGGTGGGACTCCATTAGACTGGGGG | ||
| CAGCCCCGTCCCGGCCCCACAGTACCCCCAC | ||
| AGCGCCTTCCTGGGCTCTGTCTTGCGCGCGTC | ||
| CTCCCTGGGCCCCCCTCTTTTCCCCTCCCACC | ||
| CTGCCGGGGCTCTCACTCGGGCGGCCCCAGC | ||
| GTCTCCTCCAGAAAGTCCTCGATCTGCAGCGT | ||
| GTCTGTCTTGGCGTC | ||
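
As an illustrative aside, the sequences in Table 6 can be scanned for CpG dinucleotides and for HpaII recognition sites (CCGG), HpaII being the methylation-sensitive restriction enzyme named in the claims. The following Python sketch is not part of the patent disclosure: the function names `count_cpg` and `hpaii_sites` are hypothetical, and the example input is the ZIC2 entry (SEQ ID NO: 287) as it appears in the table.

```python
def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G)."""
    s = seq.upper()  # the table mixes upper- and lower-case bases
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")


def hpaii_sites(seq: str) -> list[int]:
    """Return 0-based start positions of HpaII recognition sites (CCGG)."""
    s = seq.upper()
    return [i for i in range(len(s) - 3) if s[i:i + 4] == "CCGG"]


# ZIC2 region, SEQ ID NO: 287, copied from Table 6 (illustrative input only)
zic2 = "cggcggcggctgcggcggcggcggccgcggtgtccgcggtgcaccg"
print("CpG count:", count_cpg(zic2))
print("HpaII site positions:", hpaii_sites(zic2))
```

A region with no CCGG site would not be cut by HpaII regardless of methylation status, so a check of this kind is one plausible way to see which listed regions could respond to HpaII digestion.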
While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
Claims (20)
1. A method for analyzing a plurality of deoxyribonucleic acid (DNA) molecules from prostate cells in a biological sample obtained from a test subject, and providing a treatment to said test subject, said method comprising:
(a) providing a first portion of said plurality of DNA molecules, wherein said first portion comprises DNA molecules with methylated and unmethylated cytosines in CpG dinucleotides, and generating fragments by subjecting said first portion to methylation status-dependent fragmentation conditions at one or more CpG sites sufficient to produce fragments of at least a subset of said DNA molecules of said first portion;
(b) providing a second portion of said plurality of DNA molecules, wherein said second portion is not subjected to said fragmentation conditions;
(c) for one or more genomic regions, processing: (i) said first portion of said plurality of DNA molecules subjected to said fragmentation conditions, or derivatives thereof, to yield a first quantitative measure of DNA methylation, and (ii) said second portion of said plurality of DNA molecules not subjected to said fragmentation conditions, or derivatives thereof, to yield a second quantitative measure of DNA methylation, wherein said one or more genomic regions include genomic coordinates chr5:74907443-74907561 of the ANKDD1B locus;
(d) processing said first quantitative measure of DNA methylation with said second quantitative measure of DNA methylation to yield a third quantitative measure of DNA methylation at said one or more genomic regions in said plurality of DNA molecules, thereby generating a methylation profile of said plurality of DNA molecules at said one or more genomic regions;
(e) processing said methylation profile to indicate a likelihood of said test subject having prostate cancer, wherein processing said methylation profile indicates increased methylation in said genomic region that is chr5:74907443-74907561 in said plurality of DNA molecules as compared to methylation in chr5:74907443-74907561 in a sample obtained from a control subject without prostate cancer, wherein processing said methylation profile indicates an increased likelihood of said test subject having prostate cancer; and
(f) administering to said test subject a therapeutic intervention for treatment of prostate cancer, wherein said therapeutic intervention comprises a surgical prostate tumor resection, a chemotherapy, a radiotherapy, a targeted therapy, or an immunotherapy.
2. The method of claim 1, wherein said biological sample is obtained or derived from a tissue sample, a blood sample, a plasma sample, a serum sample, an exosome sample, or a urine sample.
3. The method of claim 1, further comprising performing an assay selected from the group consisting of methylation-sensitive restriction enzyme (MSRE) digestion, polymerase chain reaction (PCR), quantitative PCR (qPCR), nucleic acid sequencing, target capture, mass spectrometry-based target fragmentation assay, flap endonuclease-based assay, CRISPR-based assay, methylation-specific assay comprising bisulfite treatment, methylation-specific PCR, targeted sequencing, targeted bisulfite sequencing, pyrosequencing, mass spectroscopy-based bisulfite sequencing (EpiTYPER), reduced representation bisulfite sequencing (RRBS), whole genome sequencing (WGS), and a combination thereof.
4. The method of claim 1, wherein said fragmentation conditions comprise MSRE digestion at said one or more CpG sites.
5. The method of claim 4, wherein said MSRE comprises HpaII.
6. The method of claim 1, wherein:
processing said first portion of said plurality of DNA molecules subjected to said fragmentation conditions, or derivatives thereof, in (c) (i) comprises amplification; and
processing said second portion of said plurality of DNA molecules not subjected to said fragmentation conditions, or derivatives thereof, in (c) (ii) comprises amplification.
7. The method of claim 6, wherein:
said amplification comprises targeted quantitative polymerase chain reaction (qPCR) at said one or more genomic regions;
processing said first portion of said plurality of DNA molecules subjected to said fragmentation conditions, or derivatives thereof, in (c) (i) comprises determining a first cycle threshold (Ct) value for amplification of said one or more genomic regions; and
processing said second portion of said plurality of DNA molecules not subjected to said fragmentation conditions, or derivatives thereof, in (c) (ii) comprises determining a second cycle threshold (Ct) value for amplification of said one or more genomic regions.
8. The method of claim 7, wherein (c) comprises:
determining a reference Ct value for amplification of one or more reference genomic regions in said first portion of said plurality of DNA molecules subjected to said fragmentation conditions, or derivatives thereof, and in said second portion of said plurality of DNA molecules not subjected to said fragmentation conditions, or derivatives thereof; and
normalizing said first quantitative measure of DNA methylation and said second quantitative measure of DNA methylation using said reference Ct value.
9. The method of claim 7, wherein processing said first quantitative measure of DNA methylation with said second quantitative measure of DNA methylation in (d) comprises calculating an intensity ratio of said first quantitative measure of DNA methylation and said second quantitative measure of DNA methylation at said one or more genomic regions.
10. The method of claim 1, further comprising subjecting said first portion of said plurality of DNA molecules and said second portion of said plurality of DNA molecules to conditions sufficient to permit methylated nucleic acid bases to be distinguished from unmethylated nucleic acid bases.
11. The method of claim 10, wherein said conditions sufficient to permit methylated nucleic acid bases to be distinguished from unmethylated nucleic acid bases comprise performing bisulfite treatment.
12. The method of claim 1, wherein each of said one or more genomic regions comprises one or more CpG sites.
13. The method of claim 1, further comprising processing said methylation profile with one or more reference methylation profiles obtained from reference biological samples of one or more additional subjects, wherein said one or more additional subjects comprise subjects having prostate cancer.
14. A method for analyzing a plurality of deoxyribonucleic acid (DNA) molecules from prostate cells in a biological sample obtained from a test subject, and providing a treatment to said test subject, said method comprising:
(a) providing a first portion of said plurality of DNA molecules, wherein said first portion comprises DNA molecules with methylated and unmethylated cytosines in CpG dinucleotides, and generating fragments by subjecting said first portion to methylation status-dependent fragmentation conditions at one or more CpG sites sufficient to produce fragments of at least a subset of said DNA molecules of said first portion;
(b) providing a second portion of said plurality of DNA molecules, wherein said second portion is not subjected to said fragmentation conditions, wherein said second portion has a substantially equal amount of DNA molecules as said first portion;
(c) for one or more genomic regions, processing: (i) said first portion of said plurality of DNA molecules subjected to said fragmentation conditions, or derivatives thereof, to yield a first quantitative measure of DNA methylation, and (ii) said second portion of said plurality of DNA molecules not subjected to said fragmentation conditions, or derivatives thereof, to yield a second quantitative measure of DNA methylation, wherein said one or more genomic regions include genomic coordinates chr5:74907443-74907561 of the ANKDD1B locus;
(d) processing said first quantitative measure of DNA methylation with said second quantitative measure of DNA methylation to yield a third quantitative measure of DNA methylation at said one or more genomic regions in said plurality of DNA molecules, thereby generating a methylation profile of said plurality of DNA molecules at said one or more genomic regions;
(e) processing said methylation profile to indicate a likelihood of said test subject having prostate cancer, wherein processing said methylation profile indicates increased methylation in said genomic region that is chr5:74907443-74907561 in said plurality of DNA molecules as compared to methylation in chr5:74907443-74907561 in a sample obtained from a control subject without prostate cancer, wherein processing said methylation profile indicates an increased likelihood of said test subject having prostate cancer; and
(f) administering to said test subject a therapeutic intervention for said treatment of prostate cancer, wherein said therapeutic intervention comprises a surgical prostate tumor resection, a chemotherapy, a radiotherapy, a targeted therapy, or an immunotherapy.
15. A method for processing or analyzing a plurality of deoxyribonucleic acid (DNA) molecules from a biological sample of a subject, comprising:
(a) providing a first portion of said plurality of DNA molecules, wherein said first portion comprises DNA fragments generated upon subjecting at least a subset of said plurality of DNA molecules to fragmentation conditions sufficient to fragment at least a subset of said plurality of DNA molecules at one or more CpG sites, wherein at least a subset of said DNA fragments comprises methylated nucleic acid bases;
(b) providing a second portion of said plurality of DNA molecules, wherein said second portion has a substantially equal amount of DNA as said first portion;
(c) for one or more genomic regions, processing (i) said first portion of said plurality of DNA molecules or derivatives thereof to yield a first quantitative measure of DNA methylation, and (ii) said second portion of said plurality of DNA molecules or derivatives thereof to yield a second quantitative measure of DNA methylation;
(d) processing said first quantitative measure of DNA methylation with said second quantitative measure of DNA methylation to yield a third quantitative measure of DNA methylation at said one or more genomic regions, thereby generating a methylation profile of said plurality of DNA molecules at said one or more genomic regions;
(e) processing said methylation profile to generate a likelihood of said subject having or being suspected of having prostate cancer; and
(f) providing a subject identified, based on said likelihood, as having or being suspected of having prostate cancer with a therapeutic intervention, wherein said therapeutic intervention comprises a surgical tumor resection, an effective dose of chemotherapy, an effective dose of radiotherapy, an effective dose of targeted therapy, or an effective dose of immunotherapy,
wherein said one or more genomic regions comprise ANKDD1B with genomic coordinate of chr5:74907443-74907561.
16. The method of claim 1, wherein said one or more genomic regions further comprise genomic coordinates chr5:180017902-180018673 of the SCGB3A1 locus.
17. The method of claim 1, wherein said one or more genomic regions further comprise genomic coordinates chr5:7850160-7850286 of the C5orf49 locus.
18. The method of claim 1, wherein said one or more genomic regions further comprise genomic coordinates chr9:97807476-97807681 of the C9orf3 locus.
19. The method of claim 1, wherein said one or more genomic regions further comprise genomic coordinates chr2:54086834-54087017 of the GPR75-ASB3 locus.
20. The method of claim 1, wherein processing said methylation profile in (e) comprises using a classifier.
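
Claims 7 to 9 describe deriving a methylation measure from qPCR cycle threshold (Ct) values of the digested and undigested portions, optionally normalized with a reference region. The Python sketch below shows one common way such a ratio could be computed, a ΔΔCt-style calculation. It is an assumption for illustration and not the claimed method: the formula, the default of perfect doubling per cycle (`efficiency=2.0`), and the function name `methylation_ratio` do not come from the claims.

```python
def methylation_ratio(ct_digested, ct_undigested,
                      ref_ct_digested=0.0, ref_ct_undigested=0.0,
                      efficiency=2.0):
    """Estimate the fraction of target template that survives digestion.

    Each target Ct is first normalized by the reference-region Ct measured
    in the same portion; the difference between the two normalized values
    is then converted to a ratio, assuming the amount of product is
    multiplied by `efficiency` in every cycle (assumed value, not from
    the claims).
    """
    delta_digested = ct_digested - ref_ct_digested
    delta_undigested = ct_undigested - ref_ct_undigested
    return efficiency ** -(delta_digested - delta_undigested)


# Hypothetical Ct values: the digested portion crosses threshold one cycle later.
ratio = methylation_ratio(ct_digested=26.0, ct_undigested=25.0,
                          ref_ct_digested=20.0, ref_ct_undigested=20.0)
print(ratio)  # 0.5
```

With an enzyme such as HpaII, which does not cut its site when the CpG is methylated, a ratio closer to 1 would suggest that more of the target region resisted digestion. Comparing the ratio against one obtained from a control sample, as in step (e), would then be a separate step.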
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/995,180 US11427874B1 (en) | 2019-08-26 | 2020-08-17 | Methods and systems for detection of prostate cancer by DNA methylation analysis |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962891673P | 2019-08-26 | 2019-08-26 | |
| US16/995,180 US11427874B1 (en) | 2019-08-26 | 2020-08-17 | Methods and systems for detection of prostate cancer by DNA methylation analysis |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US11427874B1 (en) | 2022-08-30 |
Family
ID=83007668
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/995,180 Active US11427874B1 (en) | 2019-08-26 | 2020-08-17 | Methods and systems for detection of prostate cancer by DNA methylation analysis |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US11427874B1 (en) |
Citations (32)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030124600A1 (en) | 2001-11-16 | 2003-07-03 | David Sidransky | Method of detection of prostate cancer |
| EP1342794A1 (en) | 2002-03-05 | 2003-09-10 | Epigenomics AG | Method and device for determination of tissue specificity of free floating DNA in bodily fluids |
| EP1423533A2 (en) | 2001-06-22 | 2004-06-02 | Epigenomics AG | Method for high sensitivity detection of cytosine-methylation |
| WO2005040399A2 (en) * | 2003-10-21 | 2005-05-06 | Orion Genomics Llc | Methods for quantitative determination of methylation density in a dna locus |
| WO2005085477A1 (en) * | 2004-03-02 | 2005-09-15 | Orion Genomics Llc | Differential enzymatic fragmentation by whole genome amplification |
| US20060051768A1 (en) | 2003-03-25 | 2006-03-09 | John Wayne Cancer Institute | DNA markers for management of cancer |
| US20060194208A1 (en) | 2003-08-15 | 2006-08-31 | Reimo Tetzner | Method for the detection of cytosine methylations in dna with the aid of scorpion |
| WO2007018601A1 (en) | 2005-08-02 | 2007-02-15 | Rubicon Genomics, Inc. | Compositions and methods for processing and amplification of dna, including using multiple enzymes in a single reaction |
| US20070178506A1 (en) | 2002-06-26 | 2007-08-02 | Cold Spring Harbor Laboratory And Washington University | Methods and compositions for determining methylation profiles |
| US20090170088A1 (en) | 2007-02-02 | 2009-07-02 | Orion Genomics Llc | Gene methylation in cancer diagnosis |
| US20090305234A1 (en) | 2005-03-11 | 2009-12-10 | Sven Olek | Specific DNAS for Epigenetic Characterisation of Cells and Tissues |
| US20100144836A1 (en) | 2007-01-09 | 2010-06-10 | Oncomethylome Sciences Sa | Methods for Detecting Epigenetic Modifications |
| EP2313341A2 (en) | 2008-07-07 | 2011-04-27 | Nanunanu Ltd. | Inorganic nanotubes |
| WO2011070441A2 (en) | 2009-12-11 | 2011-06-16 | Adam Wasserstrom | Categorization of dna samples |
| WO2011101728A2 (en) | 2010-02-19 | 2011-08-25 | Nucleix | Identification of source of dna samples |
| WO2011132061A2 (en) | 2010-04-20 | 2011-10-27 | Nucleix | Methylation profiling of dna samples |
| US20120122088A1 (en) | 2010-11-15 | 2012-05-17 | Hongzhi Zou | Methylation assay |
| WO2012070037A2 (en) | 2010-11-22 | 2012-05-31 | Rosetta Genomics Ltd. | Methods and materials for classification of tissue of origin of tumor samples |
| US20120322058A1 (en) | 2011-02-09 | 2012-12-20 | Bio-Rad Laboratories | Analysis of nucleic acids |
| US20130078626A1 (en) | 2009-12-11 | 2013-03-28 | Nucleix | Categorization of dna samples |
| US20130224740A1 (en) | 2010-09-03 | 2013-08-29 | Centre National De La Recherche Scientifique(Cnrs) | Analytical methods for cell free nucleic acids and applications |
| US20140057259A1 (en) | 2008-03-15 | 2014-02-27 | Hologic, Inc. | Methods for analysis of nucleic acid molecules during amplification reactions |
| US20140227700A1 (en) | 2006-11-24 | 2014-08-14 | Epigenomics Ag | Methods and nucleic acids for the analysis of gene expression associated with the development of prostate cell proliferative disorders |
| US20140322707A1 (en) | 2011-04-06 | 2014-10-30 | The University Of Chicago | COMPOSITION AND METHODS RELATED TO MODIFICATION OF 5-METHYLCYTOSINE (5-mC) |
| US20140363815A1 (en) | 2011-12-13 | 2014-12-11 | Oslo Universitetssykehus Hf | Methods and kits for detection of methylation status |
| EP2971170A1 (en) | 2013-03-14 | 2016-01-20 | HudsonAlpha Institute For Biotechnology | Differential methylation level of cpg loci that are determinative of a biochemical reoccurrence of prostate cancer |
| EP3034628A1 (en) | 2010-09-13 | 2016-06-22 | Clinical Genomics Pty Ltd | Diagnosis of cancer by means of methylation marker |
| US20160201142A1 (en) | 2015-01-13 | 2016-07-14 | The Chinese University Of Hong Kong | Using size and number aberrations in plasma dna for detecting cancer |
| CN106795562A (en) | 2014-07-18 | 2017-05-31 | 香港中文大学 | Analysis of tissue methylation patterns in DNA mixtures |
| US20170233820A1 (en) | 2014-12-19 | 2017-08-17 | Epigenomics Ag | METHODS FOR DETECTING CpG METHYLATION AND FOR DIAGNOSING CANCER |
| US20180100196A1 (en) | 2013-01-23 | 2018-04-12 | The Johns Hopkins University | Dna methylation markers for metastatic prostate cancer |
| WO2019068082A1 (en) | 2017-09-29 | 2019-04-04 | Arizona Board Of Regents On Behalf Of The University Of Arizona | DNA METHYLATION BIOMARKERS FOR THE DIAGNOSIS OF CANCER |
- 2020-08-17: US application US16/995,180, patent US11427874B1, status Active
Patent Citations (34)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP1423533A2 (en) | 2001-06-22 | 2004-06-02 | Epigenomics AG | Method for high sensitivity detection of cytosine-methylation |
| US20030124600A1 (en) | 2001-11-16 | 2003-07-03 | David Sidransky | Method of detection of prostate cancer |
| EP1342794A1 (en) | 2002-03-05 | 2003-09-10 | Epigenomics AG | Method and device for determination of tissue specificity of free floating DNA in bodily fluids |
| US20070178506A1 (en) | 2002-06-26 | 2007-08-02 | Cold Spring Harbor Laboratory And Washington University | Methods and compositions for determining methylation profiles |
| US20060051768A1 (en) | 2003-03-25 | 2006-03-09 | John Wayne Cancer Institute | DNA markers for management of cancer |
| US20060194208A1 (en) | 2003-08-15 | 2006-08-31 | Reimo Tetzner | Method for the detection of cytosine methylations in dna with the aid of scorpion |
| WO2005040399A2 (en) * | 2003-10-21 | 2005-05-06 | Orion Genomics Llc | Methods for quantitative determination of methylation density in a dna locus |
| US20050153316A1 (en) | 2003-10-21 | 2005-07-14 | Orion Genomics Llc | Methods for quantitative determination of methylation density in a DNA locus |
| WO2005085477A1 (en) * | 2004-03-02 | 2005-09-15 | Orion Genomics Llc | Differential enzymatic fragmentation by whole genome amplification |
| US20050272065A1 (en) | 2004-03-02 | 2005-12-08 | Orion Genomics Llc | Differential enzymatic fragmentation by whole genome amplification |
| US20090305234A1 (en) | 2005-03-11 | 2009-12-10 | Sven Olek | Specific DNAS for Epigenetic Characterisation of Cells and Tissues |
| WO2007018601A1 (en) | 2005-08-02 | 2007-02-15 | Rubicon Genomics, Inc. | Compositions and methods for processing and amplification of dna, including using multiple enzymes in a single reaction |
| US20140227700A1 (en) | 2006-11-24 | 2014-08-14 | Epigenomics Ag | Methods and nucleic acids for the analysis of gene expression associated with the development of prostate cell proliferative disorders |
| US20100144836A1 (en) | 2007-01-09 | 2010-06-10 | Oncomethylome Sciences Sa | Methods for Detecting Epigenetic Modifications |
| US20090170088A1 (en) | 2007-02-02 | 2009-07-02 | Orion Genomics Llc | Gene methylation in cancer diagnosis |
| US20140057259A1 (en) | 2008-03-15 | 2014-02-27 | Hologic, Inc. | Methods for analysis of nucleic acid molecules during amplification reactions |
| EP2313341A2 (en) | 2008-07-07 | 2011-04-27 | Nanunanu Ltd. | Inorganic nanotubes |
| WO2011070441A2 (en) | 2009-12-11 | 2011-06-16 | Adam Wasserstrom | Categorization of dna samples |
| US20130078626A1 (en) | 2009-12-11 | 2013-03-28 | Nucleix | Categorization of dna samples |
| WO2011101728A2 (en) | 2010-02-19 | 2011-08-25 | Nucleix | Identification of source of dna samples |
| WO2011132061A2 (en) | 2010-04-20 | 2011-10-27 | Nucleix | Methylation profiling of dna samples |
| US20130224740A1 (en) | 2010-09-03 | 2013-08-29 | Centre National De La Recherche Scientifique(Cnrs) | Analytical methods for cell free nucleic acids and applications |
| EP3034628A1 (en) | 2010-09-13 | 2016-06-22 | Clinical Genomics Pty Ltd | Diagnosis of cancer by means of methylation marker |
| US20120122088A1 (en) | 2010-11-15 | 2012-05-17 | Hongzhi Zou | Methylation assay |
| WO2012070037A2 (en) | 2010-11-22 | 2012-05-31 | Rosetta Genomics Ltd. | Methods and materials for classification of tissue of origin of tumor samples |
| US20120322058A1 (en) | 2011-02-09 | 2012-12-20 | Bio-Rad Laboratories | Analysis of nucleic acids |
| US20140322707A1 (en) | 2011-04-06 | 2014-10-30 | The University Of Chicago | COMPOSITION AND METHODS RELATED TO MODIFICATION OF 5-METHYLCYTOSINE (5-mC) |
| US20140363815A1 (en) | 2011-12-13 | 2014-12-11 | Oslo Universitetssykehus Hf | Methods and kits for detection of methylation status |
| US20180100196A1 (en) | 2013-01-23 | 2018-04-12 | The Johns Hopkins University | Dna methylation markers for metastatic prostate cancer |
| EP2971170A1 (en) | 2013-03-14 | 2016-01-20 | HudsonAlpha Institute For Biotechnology | Differential methylation level of cpg loci that are determinative of a biochemical reoccurrence of prostate cancer |
| CN106795562A (en) | 2014-07-18 | 2017-05-31 | 香港中文大学 | Analysis of tissue methylation patterns in DNA mixtures |
| US20170233820A1 (en) | 2014-12-19 | 2017-08-17 | Epigenomics Ag | METHODS FOR DETECTING CpG METHYLATION AND FOR DIAGNOSING CANCER |
| US20160201142A1 (en) | 2015-01-13 | 2016-07-14 | The Chinese University Of Hong Kong | Using size and number aberrations in plasma dna for detecting cancer |
| WO2019068082A1 (en) | 2017-09-29 | 2019-04-04 | Arizona Board Of Regents On Behalf Of The University Of Arizona | DNA METHYLATION BIOMARKERS FOR THE DIAGNOSIS OF CANCER |
Non-Patent Citations (6)
| Title |
|---|
| Cheow et al., Multiplexed locus-specific analysis of DNA methylation in single cells (2015) Nature Protocols vol. 10, No. 4 pp. 619-631 (Year: 2015). * |
| Christensen et al. Aging and Environmental Exposures Alter Tissue-Specific DNA Methylation Dependent upon CpG Island Context (2009) PLoS Genetics vol. 5, No. 8, 13 pages (Year: 2009). * |
| Costello et al. Graded Methylation in the Promoter and Body of the O6-Methylguanine DNA Methyltransferase (MGMT) Gene Correlates with MGMT Expression in Human Glioma Cells (1994) Journal of Biological Chemistry vol. 269, No. 25, pp. 17228-17237 (Year: 1994). * |
| Goni et al The qPCR data statistical analysis Integromics White Paper—Sep. 2009, available at https://gene-quantification.de/integromics-qpcr-statistics-white-paper.pdf (Year: 2009). * |
| Murat Bioinformatics analysis of epigenetic variants associated with melanoma. Diss. University of Bradford, Jul. 30, 2018, available at https://bradscholars.brad.ac.uk/handle/10454/17220 (Year: 2018). * |
| NCBI Reference Sequence: NM_001276713.2 (Dec. 15, 2000), available at https://www.ncbi.nlm.nih.gov/nuccore/NM_001276713.2?report=genbank (Year: 2000). * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12410480B2 (en) | | Methods and systems for detecting colorectal cancer via nucleic acid methylation analysis |
| US20240084397A1 (en) | | Methods and systems for detecting cancer via nucleic acid methylation analysis |
| US20240274232A1 (en) | | Cell-free detection of methylated breast tumor |
| JP2020103298A (en) | | Systems and methods for detecting rare mutations and copy number polymorphisms |
| US10457988B2 (en) | | MiRNAs as diagnostic markers |
| KR20210023804A (en) | | Tissue specific methylation marker |
| WO2022261039A2 (en) | | Cancer detection method, kit, and system |
| JP2024507174A (en) | | Cell-free DNA methylation test |
| US20220213558A1 (en) | | Methods and systems for urine-based detection of urologic conditions |
| US11427874B1 (en) | | Methods and systems for detection of prostate cancer by DNA methylation analysis |
| JP2024519082A (en) | | DNA methylation biomarkers for hepatocellular carcinoma |
| WO2025179073A1 (en) | | Methods and systems for tissue informed differentially methylated region analysis |
| JP2025535077A (en) | | Systems and methods for multi-analyte detection of cancer |
| WO2025213034A1 (en) | | Systems and methods for multiple biomarker analysis in cancer |
| WO2024159118A1 (en) | | Methods of hyper- and hypo-methylation analysis for disease detection |
| WO2024155681A1 (en) | | Methods and systems for detecting and assessing liver conditions |
| WO2025024670A1 (en) | | Systems and methods for methylation analysis of liver disease |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | CC | Certificate of correction | |