WO2023147445A2 - Cell-free RNA biomarkers for the detection of cancer or predisposition to cancer
- Publication number
- WO2023147445A2 (PCT/US2023/061410)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- seq
- cfrna
- cancer
- primer pair
- analysis
Links
- 206010028980 Neoplasm Diseases 0.000 title claims abstract description 196
- 108091092259 cell-free RNA Proteins 0.000 title claims abstract description 192
- 201000011510 cancer Diseases 0.000 title claims abstract description 163
- 239000000090 biomarker Substances 0.000 title claims abstract description 121
- 238000001514 detection method Methods 0.000 title description 26
- 238000000034 method Methods 0.000 claims abstract description 180
- -1 ZRANB2-AS2 Proteins 0.000 claims abstract description 135
- 230000014509 gene expression Effects 0.000 claims abstract description 83
- 239000012472 biological sample Substances 0.000 claims abstract description 52
- 102100030991 Nucleolar and spindle-associated protein 1 Human genes 0.000 claims abstract description 40
- 102100025832 Centromere-associated protein E Human genes 0.000 claims abstract description 38
- 102100038614 Hemoglobin subunit gamma-1 Human genes 0.000 claims abstract description 35
- 102100038617 Hemoglobin subunit gamma-2 Human genes 0.000 claims abstract description 35
- 102100040035 Interferon-induced transmembrane protein 3 Human genes 0.000 claims abstract description 30
- 101000755749 Homo sapiens Axin interactor, dorsalization-associated protein Proteins 0.000 claims abstract description 28
- 101000601441 Homo sapiens Serine/threonine-protein kinase Nek2 Proteins 0.000 claims abstract description 28
- 102100029470 Apolipoprotein E Human genes 0.000 claims abstract description 27
- 102100031953 Protein 4.2 Human genes 0.000 claims abstract description 27
- 102100037703 Serine/threonine-protein kinase Nek2 Human genes 0.000 claims abstract description 27
- 101001048702 Homo sapiens RNA polymerase II elongation factor ELL2 Proteins 0.000 claims abstract description 26
- 102100036201 Oxygen-dependent coproporphyrinogen-III oxidase, mitochondrial Human genes 0.000 claims abstract description 26
- 102100035890 Delta(24)-sterol reductase Human genes 0.000 claims abstract description 25
- 102100023750 RNA polymerase II elongation factor ELL2 Human genes 0.000 claims abstract description 25
- 101100396941 Citrobacter freundii cai gene Proteins 0.000 claims abstract description 24
- 101100135884 Mus musculus Pdia4 gene Proteins 0.000 claims abstract description 24
- 101150114014 cagA gene Proteins 0.000 claims abstract description 24
- 101000915292 Homo sapiens Cytoplasmic dynein 1 intermediate chain 2 Proteins 0.000 claims abstract description 23
- 101000677891 Homo sapiens Iron-sulfur clusters transporter ABCB7, mitochondrial Proteins 0.000 claims abstract description 23
- 101000736088 Homo sapiens PC4 and SFRS1-interacting protein Proteins 0.000 claims abstract description 23
- 102100026679 Carboxypeptidase Q Human genes 0.000 claims abstract description 22
- 102100028523 Cytoplasmic dynein 1 intermediate chain 2 Human genes 0.000 claims abstract description 22
- 102100031758 Extracellular matrix protein 1 Human genes 0.000 claims abstract description 22
- 102100039271 Histone H2A type 1-H Human genes 0.000 claims abstract description 22
- 101000900939 Homo sapiens Abnormal spindle-like microcephaly-associated protein Proteins 0.000 claims abstract description 22
- 101000935548 Homo sapiens Cytoplasmic tyrosine-protein kinase BMX Proteins 0.000 claims abstract description 22
- 101000825726 Homo sapiens Structural maintenance of chromosomes protein 4 Proteins 0.000 claims abstract description 22
- 101000844217 Homo sapiens Thioredoxin domain-containing protein 16 Proteins 0.000 claims abstract description 22
- 101000804798 Homo sapiens Werner syndrome ATP-dependent helicase Proteins 0.000 claims abstract description 22
- 102100021504 Iron-sulfur clusters transporter ABCB7, mitochondrial Human genes 0.000 claims abstract description 22
- 102100028130 N-formyl peptide receptor 3 Human genes 0.000 claims abstract description 22
- 102100036220 PC4 and SFRS1-interacting protein Human genes 0.000 claims abstract description 22
- 102100025352 Serine/threonine-protein kinase MRCK alpha Human genes 0.000 claims abstract description 22
- 102100035253 Transmembrane protein 150C Human genes 0.000 claims abstract description 22
- 238000010195 expression analysis Methods 0.000 claims abstract description 22
- 102100022117 Abnormal spindle-like microcephaly-associated protein Human genes 0.000 claims abstract description 21
- 102100027907 Cytoplasmic tyrosine-protein kinase BMX Human genes 0.000 claims abstract description 21
- 101000576901 Homo sapiens Serine/threonine-protein kinase MRCK alpha Proteins 0.000 claims abstract description 21
- 102100021464 Kinetochore scaffold 1 Human genes 0.000 claims abstract description 21
- 102100022842 Structural maintenance of chromosomes protein 4 Human genes 0.000 claims abstract description 21
- 102100032034 Thioredoxin domain-containing protein 16 Human genes 0.000 claims abstract description 21
- 102100035336 Werner syndrome ATP-dependent helicase Human genes 0.000 claims abstract description 21
- 102000014817 CACNA1A Human genes 0.000 claims abstract description 20
- 101000935117 Homo sapiens Voltage-dependent P/Q-type calcium channel subunit alpha-1A Proteins 0.000 claims abstract description 20
- 101000914247 Homo sapiens Centromere-associated protein E Proteins 0.000 claims abstract 6
- 101001031977 Homo sapiens Hemoglobin subunit gamma-1 Proteins 0.000 claims abstract 6
- 101001031961 Homo sapiens Hemoglobin subunit gamma-2 Proteins 0.000 claims abstract 6
- 101001034846 Homo sapiens Interferon-induced transmembrane protein 3 Proteins 0.000 claims abstract 6
- 101000991410 Homo sapiens Nucleolar and spindle-associated protein 1 Proteins 0.000 claims abstract 6
- 239000013256 coordination polymer Substances 0.000 claims abstract 6
- 102100030690 Histone H2B type 1-C/E/F/G/I Human genes 0.000 claims abstract 5
- 101000910846 Homo sapiens Carboxypeptidase Q Proteins 0.000 claims abstract 5
- 101000866526 Homo sapiens Extracellular matrix protein 1 Proteins 0.000 claims abstract 5
- 101001036100 Homo sapiens Histone H2A type 1-H Proteins 0.000 claims abstract 5
- 101001084682 Homo sapiens Histone H2B type 1-C/E/F/G/I Proteins 0.000 claims abstract 5
- 101000971521 Homo sapiens Kinetochore scaffold 1 Proteins 0.000 claims abstract 5
- 101001059802 Homo sapiens N-formyl peptide receptor 3 Proteins 0.000 claims abstract 5
- 101000596302 Homo sapiens Transmembrane protein 150C Proteins 0.000 claims abstract 5
- 101000781944 Homo sapiens Zinc finger CCCH domain-containing protein 6 Proteins 0.000 claims abstract 5
- 102100036581 Zinc finger CCCH domain-containing protein 6 Human genes 0.000 claims abstract 5
- ZPCCSZFPOXBNDL-ZSTSFXQOSA-N [(4r,5s,6s,7r,9r,10r,11e,13e,16r)-6-[(2s,3r,4r,5s,6r)-5-[(2s,4r,5s,6s)-4,5-dihydroxy-4,6-dimethyloxan-2-yl]oxy-4-(dimethylamino)-3-hydroxy-6-methyloxan-2-yl]oxy-10-[(2r,5s,6r)-5-(dimethylamino)-6-methyloxan-2-yl]oxy-5-methoxy-9,16-dimethyl-2-oxo-7-(2-oxoe Chemical compound O([C@H]1/C=C/C=C/C[C@@H](C)OC(=O)C[C@H]([C@@H]([C@H]([C@@H](CC=O)C[C@H]1C)O[C@H]1[C@@H]([C@H]([C@H](O[C@@H]2O[C@@H](C)[C@H](O)[C@](C)(O)C2)[C@@H](C)O1)N(C)C)O)OC)OC(C)=O)[C@H]1CC[C@H](N(C)C)[C@@H](C)O1 ZPCCSZFPOXBNDL-ZSTSFXQOSA-N 0.000 claims abstract 5
- 101150037123 APOE gene Proteins 0.000 claims abstract 4
- 101000755748 Escherichia coli AIDA-I autotransporter Proteins 0.000 claims abstract 4
- 101000929877 Homo sapiens Delta(24)-sterol reductase Proteins 0.000 claims abstract 4
- 101001021103 Homo sapiens Oxygen-dependent coproporphyrinogen-III oxidase, mitochondrial Proteins 0.000 claims abstract 4
- 101000920625 Homo sapiens Protein 4.2 Proteins 0.000 claims abstract 4
- 239000000523 sample Substances 0.000 claims description 125
- 238000004458 analytical method Methods 0.000 claims description 115
- 206010035226 Plasma cell myeloma Diseases 0.000 claims description 97
- 206010073071 hepatocellular carcinoma Diseases 0.000 claims description 84
- 231100000844 hepatocellular carcinoma Toxicity 0.000 claims description 84
- 201000000050 myeloid neoplasm Diseases 0.000 claims description 66
- 201000005328 monoclonal gammopathy of uncertain significance Diseases 0.000 claims description 54
- 208000010190 Monoclonal Gammopathy of Undetermined Significance Diseases 0.000 claims description 44
- 210000004027 cell Anatomy 0.000 claims description 36
- 238000007637 random forest analysis Methods 0.000 claims description 35
- 210000002381 plasma Anatomy 0.000 claims description 34
- 201000007270 liver cancer Diseases 0.000 claims description 33
- 208000014018 liver neoplasm Diseases 0.000 claims description 32
- 208000034578 Multiple myelomas Diseases 0.000 claims description 31
- 108020004635 Complementary DNA Proteins 0.000 claims description 30
- 102100031752 Fibrinogen alpha chain Human genes 0.000 claims description 29
- 102100028313 Fibrinogen beta chain Human genes 0.000 claims description 29
- 210000001519 tissue Anatomy 0.000 claims description 29
- 208000019425 cirrhosis of liver Diseases 0.000 claims description 27
- 238000010804 cDNA synthesis Methods 0.000 claims description 26
- 210000004369 blood Anatomy 0.000 claims description 25
- 239000008280 blood Substances 0.000 claims description 25
- 239000002299 complementary DNA Substances 0.000 claims description 25
- 108010044853 histidine-rich proteins Proteins 0.000 claims description 25
- 102100024783 Fibrinogen gamma chain Human genes 0.000 claims description 23
- 201000005787 hematologic cancer Diseases 0.000 claims description 20
- 208000024200 hematopoietic and lymphoid system neoplasm Diseases 0.000 claims description 19
- 206010016654 Fibrosis Diseases 0.000 claims description 18
- 150000007523 nucleic acids Chemical class 0.000 claims description 18
- 238000006243 chemical reaction Methods 0.000 claims description 17
- 230000007882 cirrhosis Effects 0.000 claims description 17
- 102000039446 nucleic acids Human genes 0.000 claims description 16
- 108020004707 nucleic acids Proteins 0.000 claims description 16
- 238000003752 polymerase chain reaction Methods 0.000 claims description 16
- 101000789523 Homo sapiens Sodium/potassium-transporting ATPase subunit beta-1 Proteins 0.000 claims description 12
- 102100028844 Sodium/potassium-transporting ATPase subunit beta-1 Human genes 0.000 claims description 11
- 210000002966 serum Anatomy 0.000 claims description 11
- 238000013507 mapping Methods 0.000 claims description 9
- 238000003757 reverse transcription PCR Methods 0.000 claims description 9
- 238000003786 synthesis reaction Methods 0.000 claims description 8
- 210000002700 urine Anatomy 0.000 claims description 8
- 238000003753 real-time PCR Methods 0.000 claims description 7
- 230000015572 biosynthetic process Effects 0.000 claims description 6
- 210000003296 saliva Anatomy 0.000 claims description 6
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 claims description 5
- 238000003066 decision tree Methods 0.000 claims description 5
- 210000000582 semen Anatomy 0.000 claims description 5
- 210000001138 tear Anatomy 0.000 claims description 5
- 230000002550 fecal effect Effects 0.000 claims description 4
- 210000004251 human milk Anatomy 0.000 claims description 4
- 235000020256 human milk Nutrition 0.000 claims description 4
- 101150071539 AS2 gene Proteins 0.000 claims description 2
- 102100031181 Glyceraldehyde-3-phosphate dehydrogenase Human genes 0.000 claims description 2
- 101150098212 LBD6 gene Proteins 0.000 claims description 2
- 210000001175 cerebrospinal fluid Anatomy 0.000 claims description 2
- 108020004445 glyceraldehyde-3-phosphate dehydrogenase Proteins 0.000 claims description 2
- 101000846244 Homo sapiens Fibrinogen alpha chain Proteins 0.000 claims 1
- 101000917163 Homo sapiens Fibrinogen beta chain Proteins 0.000 claims 1
- 101001052043 Homo sapiens Fibrinogen gamma chain Proteins 0.000 claims 1
- 102100022661 Pro-neuregulin-1, membrane-bound isoform Human genes 0.000 claims 1
- 108090000623 proteins and genes Proteins 0.000 description 141
- 238000012163 sequencing technique Methods 0.000 description 122
- 102000053602 DNA Human genes 0.000 description 66
- 108020004414 DNA Proteins 0.000 description 66
- 229920002477 rna polymer Polymers 0.000 description 50
- 102000040430 polynucleotide Human genes 0.000 description 46
- 108091033319 polynucleotide Proteins 0.000 description 46
- 239000002157 polynucleotide Substances 0.000 description 46
- 108010031379 centromere protein E Proteins 0.000 description 32
- 108010075016 Ceruloplasmin Proteins 0.000 description 30
- 238000003199 nucleic acid amplification method Methods 0.000 description 30
- 102100023321 Ceruloplasmin Human genes 0.000 description 29
- 101710137044 Fibrinogen alpha chain Proteins 0.000 description 29
- 101710170765 Fibrinogen beta chain Proteins 0.000 description 29
- 230000003321 amplification Effects 0.000 description 29
- 208000006994 Precancerous Conditions Diseases 0.000 description 26
- 238000012360 testing method Methods 0.000 description 25
- 101710095339 Apolipoprotein E Proteins 0.000 description 24
- 102100022414 Axin interactor, dorsalization-associated protein Human genes 0.000 description 23
- 102100027619 Histidine-rich glycoprotein Human genes 0.000 description 23
- 238000003559 RNA-seq method Methods 0.000 description 22
- 125000003729 nucleotide group Chemical group 0.000 description 21
- 238000004422 calculation algorithm Methods 0.000 description 19
- 239000002773 nucleotide Substances 0.000 description 18
- 101710091943 N-formyl peptide receptor 3 Proteins 0.000 description 17
- 230000008569 process Effects 0.000 description 17
- 108091035707 Consensus sequence Proteins 0.000 description 15
- 238000011529 RT qPCR Methods 0.000 description 15
- 238000013145 classification model Methods 0.000 description 15
- 210000004185 liver Anatomy 0.000 description 15
- 239000000047 product Substances 0.000 description 15
- 102000004169 proteins and genes Human genes 0.000 description 15
- 238000010200 validation analysis Methods 0.000 description 15
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 14
- 238000003762 quantitative reverse transcription PCR Methods 0.000 description 14
- 238000011282 treatment Methods 0.000 description 14
- 201000010099 disease Diseases 0.000 description 13
- 238000001574 biopsy Methods 0.000 description 12
- 238000003745 diagnosis Methods 0.000 description 12
- 238000005516 engineering process Methods 0.000 description 11
- 230000035772 mutation Effects 0.000 description 11
- 239000013598 vector Substances 0.000 description 11
- 108091034117 Oligonucleotide Proteins 0.000 description 10
- 238000003556 assay Methods 0.000 description 10
- 239000012634 fragment Substances 0.000 description 10
- 238000013139 quantization Methods 0.000 description 10
- 238000011528 liquid biopsy Methods 0.000 description 9
- 108020004999 messenger RNA Proteins 0.000 description 9
- 238000012544 monitoring process Methods 0.000 description 9
- 238000012545 processing Methods 0.000 description 9
- 239000000463 material Substances 0.000 description 7
- 238000005259 measurement Methods 0.000 description 7
- 238000012175 pyrosequencing Methods 0.000 description 7
- 108700024394 Exon Proteins 0.000 description 6
- 241001465754 Metazoa Species 0.000 description 6
- 238000009534 blood test Methods 0.000 description 6
- 210000001185 bone marrow Anatomy 0.000 description 6
- 238000009396 hybridization Methods 0.000 description 6
- 238000003384 imaging method Methods 0.000 description 6
- 238000003908 quality control method Methods 0.000 description 6
- 230000002441 reversible effect Effects 0.000 description 6
- 238000012216 screening Methods 0.000 description 6
- 238000000926 separation method Methods 0.000 description 6
- 208000005443 Circulating Neoplastic Cells Diseases 0.000 description 5
- 102000004190 Enzymes Human genes 0.000 description 5
- 108090000790 Enzymes Proteins 0.000 description 5
- 239000011324 bead Substances 0.000 description 5
- 230000006037 cell lysis Effects 0.000 description 5
- 230000000295 complement effect Effects 0.000 description 5
- 230000036541 health Effects 0.000 description 5
- 230000000670 limiting effect Effects 0.000 description 5
- 238000002360 preparation method Methods 0.000 description 5
- 238000000746 purification Methods 0.000 description 5
- 239000013074 reference sample Substances 0.000 description 5
- 238000011160 research Methods 0.000 description 5
- 238000012552 review Methods 0.000 description 5
- RYVNIFSIEDRLSJ-UHFFFAOYSA-N 5-(hydroxymethyl)cytosine Chemical compound NC=1NC(=O)N=CC=1CO RYVNIFSIEDRLSJ-UHFFFAOYSA-N 0.000 description 4
- AOJJSUZBOXZQNB-TZSSRYMLSA-N Doxorubicin Chemical compound O([C@H]1C[C@@](O)(CC=2C(O)=C3C(=O)C=4C=CC=C(C=4C(=O)C3=C(O)C=21)OC)C(=O)CO)[C@H]1C[C@H](N)[C@H](O)[C@H](C)O1 AOJJSUZBOXZQNB-TZSSRYMLSA-N 0.000 description 4
- 230000001413 cellular effect Effects 0.000 description 4
- 210000000349 chromosome Anatomy 0.000 description 4
- 238000004891 communication Methods 0.000 description 4
- 238000002790 cross-validation Methods 0.000 description 4
- 230000009089 cytolysis Effects 0.000 description 4
- 230000002068 genetic effect Effects 0.000 description 4
- 238000012165 high-throughput sequencing Methods 0.000 description 4
- 230000003211 malignant effect Effects 0.000 description 4
- 238000007481 next generation sequencing Methods 0.000 description 4
- 238000004393 prognosis Methods 0.000 description 4
- 108020004418 ribosomal RNA Proteins 0.000 description 4
- 230000035945 sensitivity Effects 0.000 description 4
- 239000004055 small Interfering RNA Substances 0.000 description 4
- 238000003860 storage Methods 0.000 description 4
- 230000004083 survival effect Effects 0.000 description 4
- 238000002560 therapeutic procedure Methods 0.000 description 4
- 238000009966 trimming Methods 0.000 description 4
- 208000000419 Chronic Hepatitis B Diseases 0.000 description 3
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 3
- 102100034343 Integrase Human genes 0.000 description 3
- 101710085938 Matrix protein Proteins 0.000 description 3
- 101710127721 Membrane protein Proteins 0.000 description 3
- 101710182831 Nucleolar and spindle-associated protein 1 Proteins 0.000 description 3
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 3
- 238000000692 Student's t-test Methods 0.000 description 3
- 230000008901 benefit Effects 0.000 description 3
- 210000001124 body fluid Anatomy 0.000 description 3
- 238000012512 characterization method Methods 0.000 description 3
- 239000003153 chemical reaction reagent Substances 0.000 description 3
- 230000001419 dependent effect Effects 0.000 description 3
- 238000011161 development Methods 0.000 description 3
- 230000018109 developmental process Effects 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 210000003238 esophagus Anatomy 0.000 description 3
- 239000007850 fluorescent dye Substances 0.000 description 3
- 208000002672 hepatitis B Diseases 0.000 description 3
- 238000002955 isolation Methods 0.000 description 3
- 210000003734 kidney Anatomy 0.000 description 3
- 230000004807 localization Effects 0.000 description 3
- 238000010606 normalization Methods 0.000 description 3
- 238000011275 oncology therapy Methods 0.000 description 3
- 238000001558 permutation test Methods 0.000 description 3
- 239000002243 precursor Substances 0.000 description 3
- 238000000513 principal component analysis Methods 0.000 description 3
- 239000004065 semiconductor Substances 0.000 description 3
- 238000007841 sequencing by ligation Methods 0.000 description 3
- 241000894007 species Species 0.000 description 3
- 238000001228 spectrum Methods 0.000 description 3
- 239000000126 substance Substances 0.000 description 3
- 208000024891 symptom Diseases 0.000 description 3
- 230000009885 systemic effect Effects 0.000 description 3
- 238000012353 t test Methods 0.000 description 3
- 238000012549 training Methods 0.000 description 3
- 230000007704 transition Effects 0.000 description 3
- 230000032258 transport Effects 0.000 description 3
- 238000012176 true single molecule sequencing Methods 0.000 description 3
- 210000004981 tumor-associated macrophage Anatomy 0.000 description 3
- 101150084750 1 gene Proteins 0.000 description 2
- 101150017083 AIDA gene Proteins 0.000 description 2
- 108700028369 Alleles Proteins 0.000 description 2
- 241000283690 Bos taurus Species 0.000 description 2
- 206010006187 Breast cancer Diseases 0.000 description 2
- 208000026310 Breast neoplasm Diseases 0.000 description 2
- OYPRJOBELJOOCE-UHFFFAOYSA-N Calcium Chemical compound [Ca] OYPRJOBELJOOCE-UHFFFAOYSA-N 0.000 description 2
- 108010028780 Complement C3 Proteins 0.000 description 2
- 102000016918 Complement C3 Human genes 0.000 description 2
- 230000007067 DNA methylation Effects 0.000 description 2
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 2
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 2
- 206010061818 Disease progression Diseases 0.000 description 2
- 108010049003 Fibrinogen Proteins 0.000 description 2
- 102000008946 Fibrinogen Human genes 0.000 description 2
- 238000000585 Mann–Whitney U test Methods 0.000 description 2
- 108091028043 Nucleic acid sequence Proteins 0.000 description 2
- 238000002123 RNA extraction Methods 0.000 description 2
- 239000013614 RNA sample Substances 0.000 description 2
- 108091027967 Small hairpin RNA Proteins 0.000 description 2
- 108020004459 Small interfering RNA Proteins 0.000 description 2
- 208000004346 Smoldering Multiple Myeloma Diseases 0.000 description 2
- 108020004566 Transfer RNA Proteins 0.000 description 2
- 102000004338 Transferrin Human genes 0.000 description 2
- 108090000901 Transferrin Proteins 0.000 description 2
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 2
- 241000700605 Viruses Species 0.000 description 2
- 238000001772 Wald test Methods 0.000 description 2
- 230000002159 abnormal effect Effects 0.000 description 2
- 230000006907 apoptotic process Effects 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 230000033228 biological regulation Effects 0.000 description 2
- 239000010839 body fluid Substances 0.000 description 2
- 239000000872 buffer Substances 0.000 description 2
- 239000006227 byproduct Substances 0.000 description 2
- 229910052791 calcium Inorganic materials 0.000 description 2
- 239000011575 calcium Substances 0.000 description 2
- 238000005251 capillar electrophoresis Methods 0.000 description 2
- 230000001364 causal effect Effects 0.000 description 2
- 230000033366 cell cycle process Effects 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000002512 chemotherapy Methods 0.000 description 2
- HVYWMOMLDIMFJA-DPAQBDIFSA-N cholesterol Chemical compound C1C=C2C[C@@H](O)CC[C@]2(C)[C@@H]2[C@@H]1[C@@H]1CC[C@H]([C@H](C)CCCC(C)C)[C@@]1(C)CC2 HVYWMOMLDIMFJA-DPAQBDIFSA-N 0.000 description 2
- 210000003040 circulating cell Anatomy 0.000 description 2
- 108091092240 circulating cell-free DNA Proteins 0.000 description 2
- DQLATGHUWYMOKM-UHFFFAOYSA-L cisplatin Chemical compound N[Pt](N)(Cl)Cl DQLATGHUWYMOKM-UHFFFAOYSA-L 0.000 description 2
- 229960004316 cisplatin Drugs 0.000 description 2
- 238000004590 computer program Methods 0.000 description 2
- 230000002380 cytological effect Effects 0.000 description 2
- 238000007405 data analysis Methods 0.000 description 2
- 230000003247 decreasing effect Effects 0.000 description 2
- 239000003599 detergent Substances 0.000 description 2
- 230000005750 disease progression Effects 0.000 description 2
- 229960004679 doxorubicin Drugs 0.000 description 2
- 229940079593 drug Drugs 0.000 description 2
- 239000003814 drug Substances 0.000 description 2
- 230000009977 dual effect Effects 0.000 description 2
- 238000001839 endoscopy Methods 0.000 description 2
- 230000007613 environmental effect Effects 0.000 description 2
- 230000002255 enzymatic effect Effects 0.000 description 2
- 210000001808 exosome Anatomy 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 210000003608 fece Anatomy 0.000 description 2
- 230000012953 feeding on blood of other organism Effects 0.000 description 2
- 108010048325 fibrinopeptides gamma Proteins 0.000 description 2
- 238000007672 fourth generation sequencing Methods 0.000 description 2
- 238000013467 fragmentation Methods 0.000 description 2
- 238000006062 fragmentation reaction Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000007429 general method Methods 0.000 description 2
- 210000002216 heart Anatomy 0.000 description 2
- 238000003505 heat denaturation Methods 0.000 description 2
- 210000004754 hybrid cell Anatomy 0.000 description 2
- 238000000338 in vitro Methods 0.000 description 2
- 208000015181 infectious disease Diseases 0.000 description 2
- 238000002372 labelling Methods 0.000 description 2
- 238000007834 ligase chain reaction Methods 0.000 description 2
- 238000012317 liver biopsy Methods 0.000 description 2
- 208000019423 liver disease Diseases 0.000 description 2
- 238000009593 lumbar puncture Methods 0.000 description 2
- 210000004072 lung Anatomy 0.000 description 2
- 230000036210 malignancy Effects 0.000 description 2
- 238000007726 management method Methods 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 230000001404 mediated effect Effects 0.000 description 2
- 239000012528 membrane Substances 0.000 description 2
- 210000004914 menses Anatomy 0.000 description 2
- 230000001394 metastastic effect Effects 0.000 description 2
- 206010061289 metastatic neoplasm Diseases 0.000 description 2
- 230000011987 methylation Effects 0.000 description 2
- 238000007069 methylation reaction Methods 0.000 description 2
- 108091070501 miRNA Proteins 0.000 description 2
- 108091064355 mitochondrial RNA Proteins 0.000 description 2
- 230000000394 mitotic effect Effects 0.000 description 2
- 239000000203 mixture Substances 0.000 description 2
- 230000017074 necrotic cell death Effects 0.000 description 2
- 238000013188 needle biopsy Methods 0.000 description 2
- 208000025402 neoplasm of esophagus Diseases 0.000 description 2
- 230000005257 nucleotidylation Effects 0.000 description 2
- 210000000056 organ Anatomy 0.000 description 2
- 239000012071 phase Substances 0.000 description 2
- 230000033885 plasminogen activation Effects 0.000 description 2
- 210000004180 plasmocyte Anatomy 0.000 description 2
- BASFCYQUMIYNBI-UHFFFAOYSA-N platinum Chemical compound [Pt] BASFCYQUMIYNBI-UHFFFAOYSA-N 0.000 description 2
- 230000035935 pregnancy Effects 0.000 description 2
- 230000001105 regulatory effect Effects 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 238000005070 sampling Methods 0.000 description 2
- 238000007790 scraping Methods 0.000 description 2
- 238000013515 script Methods 0.000 description 2
- 238000010008 shearing Methods 0.000 description 2
- 210000003491 skin Anatomy 0.000 description 2
- 208000010721 smoldering plasma cell myeloma Diseases 0.000 description 2
- 239000007787 solid Substances 0.000 description 2
- 238000002626 targeted therapy Methods 0.000 description 2
- 239000012581 transferrin Substances 0.000 description 2
- 238000002604 ultrasonography Methods 0.000 description 2
- 238000012070 whole genome sequencing analysis Methods 0.000 description 2
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 1
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 1
- 101150028074 2 gene Proteins 0.000 description 1
- UEJJHQNACJXSKW-UHFFFAOYSA-N 2-(2,6-dioxopiperidin-3-yl)-1H-isoindole-1,3(2H)-dione Chemical compound O=C1C2=CC=CC=C2C(=O)N1C1CCC(=O)NC1=O UEJJHQNACJXSKW-UHFFFAOYSA-N 0.000 description 1
- 108010003692 3beta-hydroxysterol delta24-reductase Proteins 0.000 description 1
- 101150039504 6 gene Proteins 0.000 description 1
- 102100040149 Adenylyl-sulfate kinase Human genes 0.000 description 1
- 108020005544 Antisense RNA Proteins 0.000 description 1
- 241000271566 Aves Species 0.000 description 1
- MLDQJTXFUGDVEO-UHFFFAOYSA-N BAY-43-9006 Chemical compound C1=NC(C(=O)NC)=CC(OC=2C=CC(NC(=O)NC=3C=C(C(Cl)=CC=3)C(F)(F)F)=CC=2)=C1 MLDQJTXFUGDVEO-UHFFFAOYSA-N 0.000 description 1
- 101150071258 C3 gene Proteins 0.000 description 1
- GAGWJHPBXLXJQN-UORFTKCHSA-N Capecitabine Chemical compound C1=C(F)C(NC(=O)OCCCCC)=NC(=O)N1[C@H]1[C@H](O)[C@H](O)[C@@H](C)O1 GAGWJHPBXLXJQN-UORFTKCHSA-N 0.000 description 1
- GAGWJHPBXLXJQN-UHFFFAOYSA-N Capecitabine Natural products C1=C(F)C(NC(=O)OCCCCC)=NC(=O)N1C1C(O)C(O)C(C)O1 GAGWJHPBXLXJQN-UHFFFAOYSA-N 0.000 description 1
- OKTJSMMVPCPJKN-UHFFFAOYSA-N Carbon Chemical compound [C] OKTJSMMVPCPJKN-UHFFFAOYSA-N 0.000 description 1
- 101710093167 Carboxypeptidase Q Proteins 0.000 description 1
- 201000009030 Carcinoma Diseases 0.000 description 1
- 108090000994 Catalytic RNA Proteins 0.000 description 1
- 102000053642 Catalytic RNA Human genes 0.000 description 1
- 108091006146 Channels Proteins 0.000 description 1
- 108091026890 Coding region Proteins 0.000 description 1
- RYGMFSIKBFXOCR-UHFFFAOYSA-N Copper Chemical compound [Cu] RYGMFSIKBFXOCR-UHFFFAOYSA-N 0.000 description 1
- 108010094074 Coproporphyrinogen oxidase Proteins 0.000 description 1
- CMSMOCZEIVJLDB-UHFFFAOYSA-N Cyclophosphamide Chemical compound ClCCN(CCCl)P1(=O)NCCCO1 CMSMOCZEIVJLDB-UHFFFAOYSA-N 0.000 description 1
- IGXWBGJHJZYPQS-SSDOTTSWSA-N D-Luciferin Chemical compound OC(=O)[C@H]1CSC(C=2SC3=CC=C(O)C=C3N=2)=N1 IGXWBGJHJZYPQS-SSDOTTSWSA-N 0.000 description 1
- HMFHBZSHGGEWLO-SOOFDHNKSA-N D-ribofuranose Chemical compound OC[C@H]1OC(O)[C@H](O)[C@@H]1O HMFHBZSHGGEWLO-SOOFDHNKSA-N 0.000 description 1
- 108010017826 DNA Polymerase I Proteins 0.000 description 1
- 102000004594 DNA Polymerase I Human genes 0.000 description 1
- 238000001712 DNA sequencing Methods 0.000 description 1
- CYCGRDQQIOGCKX-UHFFFAOYSA-N Dehydro-luciferin Natural products OC(=O)C1=CSC(C=2SC3=CC(O)=CC=C3N=2)=N1 CYCGRDQQIOGCKX-UHFFFAOYSA-N 0.000 description 1
- 101710154532 Delta(24)-sterol reductase Proteins 0.000 description 1
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 1
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 1
- 108010053770 Deoxyribonucleases Proteins 0.000 description 1
- 102000016911 Deoxyribonucleases Human genes 0.000 description 1
- 206010059866 Drug resistance Diseases 0.000 description 1
- 241000196324 Embryophyta Species 0.000 description 1
- 102100030013 Endoribonuclease Human genes 0.000 description 1
- 101710199605 Endoribonuclease Proteins 0.000 description 1
- 241000283086 Equidae Species 0.000 description 1
- 208000000461 Esophageal Neoplasms Diseases 0.000 description 1
- 108010007577 Exodeoxyribonuclease I Proteins 0.000 description 1
- 108060002716 Exonuclease Proteins 0.000 description 1
- 102100029075 Exonuclease 1 Human genes 0.000 description 1
- 101710127949 Extracellular matrix protein 1 Proteins 0.000 description 1
- CWYNVVGOOAEACU-UHFFFAOYSA-N Fe2+ Chemical compound [Fe+2] CWYNVVGOOAEACU-UHFFFAOYSA-N 0.000 description 1
- VTLYFUHAOXGGBS-UHFFFAOYSA-N Fe3+ Chemical compound [Fe+3] VTLYFUHAOXGGBS-UHFFFAOYSA-N 0.000 description 1
- BJGNCJDXODQBOB-UHFFFAOYSA-N Fivefly Luciferin Natural products OC(=O)C1CSC(C=2SC3=CC(O)=CC=C3N=2)=N1 BJGNCJDXODQBOB-UHFFFAOYSA-N 0.000 description 1
- GHASVSINZRGABV-UHFFFAOYSA-N Fluorouracil Chemical compound FC1=CNC(=O)NC1=O GHASVSINZRGABV-UHFFFAOYSA-N 0.000 description 1
- 230000010337 G2 phase Effects 0.000 description 1
- 108090000288 Glycoproteins Proteins 0.000 description 1
- 102000003886 Glycoproteins Human genes 0.000 description 1
- 208000002250 Hematologic Neoplasms Diseases 0.000 description 1
- 101710195291 Hemoglobin subunit gamma-1 Proteins 0.000 description 1
- 101710195285 Hemoglobin subunit gamma-2 Proteins 0.000 description 1
- 229920002971 Heparan sulfate Polymers 0.000 description 1
- HTTJABKRGRZYRN-UHFFFAOYSA-N Heparin Chemical compound OC1C(NC(=O)C)C(O)OC(COS(O)(=O)=O)C1OC1C(OS(O)(=O)=O)C(O)C(OC2C(C(OS(O)(=O)=O)C(OC3C(C(O)C(O)C(O3)C(O)=O)OS(O)(=O)=O)C(CO)O2)NS(O)(=O)=O)C(C(O)=O)O1 HTTJABKRGRZYRN-UHFFFAOYSA-N 0.000 description 1
- 238000012752 Hepatectomy Methods 0.000 description 1
- 101710132518 Histone H2A type 1-H Proteins 0.000 description 1
- 108010033040 Histones Proteins 0.000 description 1
- 241000282412 Homo Species 0.000 description 1
- 101000874281 Homo sapiens Bublin coiled-coil protein Proteins 0.000 description 1
- 101000915738 Homo sapiens Zinc finger Ran-binding domain-containing protein 2 Proteins 0.000 description 1
- 206010061218 Inflammation Diseases 0.000 description 1
- 101710087316 Interferon-induced transmembrane protein 3 Proteins 0.000 description 1
- 108091092195 Intron Proteins 0.000 description 1
- 101710192250 Kinetochore scaffold 1 Proteins 0.000 description 1
- 239000005511 L01XE05 - Sorafenib Substances 0.000 description 1
- 239000002138 L01XE21 - Regorafenib Substances 0.000 description 1
- 239000002176 L01XE26 - Cabozantinib Substances 0.000 description 1
- 102000003960 Ligases Human genes 0.000 description 1
- 108090000364 Ligases Proteins 0.000 description 1
- 108090001030 Lipoproteins Proteins 0.000 description 1
- 102000004895 Lipoproteins Human genes 0.000 description 1
- 108060001084 Luciferase Proteins 0.000 description 1
- 239000005089 Luciferase Substances 0.000 description 1
- DDWFXDSYGUXRAY-UHFFFAOYSA-N Luciferin Natural products CCc1c(C)c(CC2NC(=O)C(=C2C=C)C)[nH]c1Cc3[nH]c4C(=C5/NC(CC(=O)O)C(C)C5CC(=O)O)CC(=O)c4c3C DDWFXDSYGUXRAY-UHFFFAOYSA-N 0.000 description 1
- 206010058467 Lung neoplasm malignant Diseases 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 108010063312 Metalloproteins Proteins 0.000 description 1
- 102000010750 Metalloproteins Human genes 0.000 description 1
- 102000029749 Microtubule Human genes 0.000 description 1
- 108091022875 Microtubule Proteins 0.000 description 1
- 206010060880 Monoclonal gammopathy Diseases 0.000 description 1
- 108020004485 Nonsense Codon Proteins 0.000 description 1
- 238000000636 Northern blotting Methods 0.000 description 1
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 1
- 206010061882 Oesophageal neoplasm Diseases 0.000 description 1
- 108700020796 Oncogene Proteins 0.000 description 1
- 241000237502 Ostreidae Species 0.000 description 1
- 102000008212 P-Selectin Human genes 0.000 description 1
- 108010035766 P-Selectin Proteins 0.000 description 1
- 238000012408 PCR amplification Methods 0.000 description 1
- 108010051456 Plasminogen Proteins 0.000 description 1
- 102000013566 Plasminogen Human genes 0.000 description 1
- 208000005107 Premature Birth Diseases 0.000 description 1
- 241000288906 Primates Species 0.000 description 1
- 229940079156 Proteasome inhibitor Drugs 0.000 description 1
- 101710196267 Protein 4.2 Proteins 0.000 description 1
- 102000001253 Protein Kinase Human genes 0.000 description 1
- 108010026552 Proteome Proteins 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- PYMYPHUHKUWMLA-LMVFSUKVSA-N Ribose Natural products OC[C@@H](O)[C@@H](O)[C@@H](O)C=O PYMYPHUHKUWMLA-LMVFSUKVSA-N 0.000 description 1
- 102000038631 SUMO E3 ligases Human genes 0.000 description 1
- 108091007904 SUMO E3 ligases Proteins 0.000 description 1
- 238000012300 Sequence Analysis Methods 0.000 description 1
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 description 1
- 101710113029 Serine/threonine-protein kinase Proteins 0.000 description 1
- 101710112530 Serine/threonine-protein kinase MRCK alpha Proteins 0.000 description 1
- 108020004682 Single-Stranded DNA Proteins 0.000 description 1
- 238000012167 Small RNA sequencing Methods 0.000 description 1
- 229930182558 Sterol Natural products 0.000 description 1
- 108010090804 Streptavidin Proteins 0.000 description 1
- 241000282887 Suidae Species 0.000 description 1
- 108010022348 Sulfate adenylyltransferase Proteins 0.000 description 1
- 108010006785 Taq Polymerase Proteins 0.000 description 1
- 101710164521 Transmembrane protein 150C Proteins 0.000 description 1
- 239000007983 Tris buffer Substances 0.000 description 1
- 108700025716 Tumor Suppressor Genes Proteins 0.000 description 1
- 102000044209 Tumor Suppressor Genes Human genes 0.000 description 1
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 1
- 238000001793 Wilcoxon signed-rank test Methods 0.000 description 1
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 1
- 102100028956 Zinc finger Ran-binding domain-containing protein 2 Human genes 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 230000033289 adaptive immune response Effects 0.000 description 1
- IRLPACMLTUPBCL-FCIPNVEPSA-N adenosine-5'-phosphosulfate Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@@H](CO[P@](O)(=O)OS(O)(=O)=O)[C@H](O)[C@H]1O IRLPACMLTUPBCL-FCIPNVEPSA-N 0.000 description 1
- 150000003838 adenosines Chemical class 0.000 description 1
- HMFHBZSHGGEWLO-UHFFFAOYSA-N alpha-D-Furanose-Ribose Natural products OCC1OC(O)C(O)C1O HMFHBZSHGGEWLO-UHFFFAOYSA-N 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 239000012491 analyte Substances 0.000 description 1
- 230000000692 anti-sense effect Effects 0.000 description 1
- 239000000427 antigen Substances 0.000 description 1
- 108091007433 antigens Proteins 0.000 description 1
- 102000036639 antigens Human genes 0.000 description 1
- 239000002246 antineoplastic agent Substances 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 229960003852 atezolizumab Drugs 0.000 description 1
- QVGXLLKOCUKJST-UHFFFAOYSA-N atomic oxygen Chemical compound [O] QVGXLLKOCUKJST-UHFFFAOYSA-N 0.000 description 1
- 230000015097 attachment of spindle microtubules to kinetochore Effects 0.000 description 1
- 229960002707 bendamustine Drugs 0.000 description 1
- YTKUWDBFDASYHO-UHFFFAOYSA-N bendamustine Chemical compound ClCCN(CCCl)C1=CC=C2N(C)C(CCCC(O)=O)=NC2=C1 YTKUWDBFDASYHO-UHFFFAOYSA-N 0.000 description 1
- 238000003339 best practice Methods 0.000 description 1
- 229960000397 bevacizumab Drugs 0.000 description 1
- 230000004071 biological effect Effects 0.000 description 1
- 239000012620 biological material Substances 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 229960002685 biotin Drugs 0.000 description 1
- 239000011616 biotin Substances 0.000 description 1
- 230000023555 blood coagulation Effects 0.000 description 1
- 238000004820 blood count Methods 0.000 description 1
- 230000036765 blood level Effects 0.000 description 1
- 210000004204 blood vessel Anatomy 0.000 description 1
- 210000000988 bone and bone Anatomy 0.000 description 1
- 229960001467 bortezomib Drugs 0.000 description 1
- GXJABQQUPOEUTA-RDJZCZTQSA-N bortezomib Chemical compound C([C@@H](C(=O)N[C@@H](CC(C)C)B(O)O)NC(=O)C=1N=CC=NC=1)C1=CC=CC=C1 GXJABQQUPOEUTA-RDJZCZTQSA-N 0.000 description 1
- 210000004556 brain Anatomy 0.000 description 1
- 210000000481 breast Anatomy 0.000 description 1
- 229960001292 cabozantinib Drugs 0.000 description 1
- ONIQOQHATWINJY-UHFFFAOYSA-N cabozantinib Chemical compound C=12C=C(OC)C(OC)=CC2=NC=CC=1OC(C=C1)=CC=C1NC(=O)C1(C(=O)NC=2C=CC(F)=CC=2)CC1 ONIQOQHATWINJY-UHFFFAOYSA-N 0.000 description 1
- 229960004117 capecitabine Drugs 0.000 description 1
- 229910052799 carbon Inorganic materials 0.000 description 1
- 229960002438 carfilzomib Drugs 0.000 description 1
- 108010021331 carfilzomib Proteins 0.000 description 1
- BLMPQMFVWMYDKT-NZTKNTHTSA-N carfilzomib Chemical compound C([C@@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CC(C)C)C(=O)[C@]1(C)OC1)NC(=O)CN1CCOCC1)CC1=CC=CC=C1 BLMPQMFVWMYDKT-NZTKNTHTSA-N 0.000 description 1
- 239000000969 carrier Substances 0.000 description 1
- 230000015556 catabolic process Effects 0.000 description 1
- 238000006555 catalytic reaction Methods 0.000 description 1
- 230000022131 cell cycle Effects 0.000 description 1
- 230000030833 cell death Effects 0.000 description 1
- 230000007910 cell fusion Effects 0.000 description 1
- 210000003793 centrosome Anatomy 0.000 description 1
- 210000003679 cervix uteri Anatomy 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 230000010109 chemoembolization Effects 0.000 description 1
- 229940044683 chemotherapy drug Drugs 0.000 description 1
- 208000006990 cholangiocarcinoma Diseases 0.000 description 1
- 235000012000 cholesterol Nutrition 0.000 description 1
- 230000024321 chromosome segregation Effects 0.000 description 1
- 230000001684 chronic effect Effects 0.000 description 1
- 229940054315 ciltacabtagene autoleucel Drugs 0.000 description 1
- 238000003776 cleavage reaction Methods 0.000 description 1
- 210000001072 colon Anatomy 0.000 description 1
- 239000003184 complementary RNA Substances 0.000 description 1
- 238000012790 confirmation Methods 0.000 description 1
- 230000021615 conjugation Effects 0.000 description 1
- 239000000470 constituent Substances 0.000 description 1
- 229910052802 copper Inorganic materials 0.000 description 1
- 239000010949 copper Substances 0.000 description 1
- 239000003246 corticosteroid Substances 0.000 description 1
- 229960001334 corticosteroids Drugs 0.000 description 1
- 238000009223 counseling Methods 0.000 description 1
- 230000001351 cycling effect Effects 0.000 description 1
- 229960004397 cyclophosphamide Drugs 0.000 description 1
- 235000013365 dairy product Nutrition 0.000 description 1
- 230000006378 damage Effects 0.000 description 1
- 229960002204 daratumumab Drugs 0.000 description 1
- 238000013480 data collection Methods 0.000 description 1
- 238000004925 denaturation Methods 0.000 description 1
- 230000036425 denaturation Effects 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 208000001335 desmosterolosis Diseases 0.000 description 1
- 229960003957 dexamethasone Drugs 0.000 description 1
- UREBDLICKHMUKA-CXSFZGCWSA-N dexamethasone Chemical compound C1CC2=CC(=O)C=C[C@]2(C)[C@]2(F)[C@@H]1[C@@H]1C[C@@H](C)[C@@](C(=O)CO)(O)[C@@]1(C)C[C@@H]2O UREBDLICKHMUKA-CXSFZGCWSA-N 0.000 description 1
- 238000002059 diagnostic imaging Methods 0.000 description 1
- 238000002405 diagnostic procedure Methods 0.000 description 1
- 238000010790 dilution Methods 0.000 description 1
- 239000012895 dilution Substances 0.000 description 1
- XPPKVPWEQAFLFU-UHFFFAOYSA-J diphosphate(4-) Chemical compound [O-]P([O-])(=O)OP([O-])([O-])=O XPPKVPWEQAFLFU-UHFFFAOYSA-J 0.000 description 1
- 235000011180 diphosphates Nutrition 0.000 description 1
- 208000035475 disorder Diseases 0.000 description 1
- 239000006185 dispersion Substances 0.000 description 1
- 238000006073 displacement reaction Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 239000000975 dye Substances 0.000 description 1
- 238000001493 electron microscopy Methods 0.000 description 1
- 229960004137 elotuzumab Drugs 0.000 description 1
- 239000000839 emulsion Substances 0.000 description 1
- 210000002889 endothelial cell Anatomy 0.000 description 1
- 239000003623 enhancer Substances 0.000 description 1
- 238000010201 enrichment analysis Methods 0.000 description 1
- 230000001973 epigenetic effect Effects 0.000 description 1
- 239000000262 estrogen Substances 0.000 description 1
- 229940011871 estrogen Drugs 0.000 description 1
- VJJPUSNTGOMMGY-MRVIYFEKSA-N etoposide Chemical compound COC1=C(O)C(OC)=CC([C@@H]2C3=CC=4OCOC=4C=C3[C@@H](O[C@H]3[C@@H]([C@@H](O)[C@@H]4O[C@H](C)OC[C@H]4O3)O)[C@@H]3[C@@H]2C(OC3)=O)=C1 VJJPUSNTGOMMGY-MRVIYFEKSA-N 0.000 description 1
- 229960005420 etoposide Drugs 0.000 description 1
- 238000007387 excisional biopsy Methods 0.000 description 1
- 230000005284 excitation Effects 0.000 description 1
- 102000013165 exonuclease Human genes 0.000 description 1
- 208000010706 fatty liver disease Diseases 0.000 description 1
- 230000008175 fetal development Effects 0.000 description 1
- 229940012952 fibrinogen Drugs 0.000 description 1
- 230000020764 fibrinolysis Effects 0.000 description 1
- 230000004761 fibrosis Effects 0.000 description 1
- 230000005669 field effect Effects 0.000 description 1
- LIYGYAHYXQDGEP-UHFFFAOYSA-N firefly oxyluciferin Natural products Oc1csc(n1)-c1nc2ccc(O)cc2s1 LIYGYAHYXQDGEP-UHFFFAOYSA-N 0.000 description 1
- 239000012530 fluid Substances 0.000 description 1
- 238000001917 fluorescence detection Methods 0.000 description 1
- 229960002949 fluorouracil Drugs 0.000 description 1
- 210000000232 gallbladder Anatomy 0.000 description 1
- 239000007789 gas Substances 0.000 description 1
- SDUQYLNIPVEERB-QPPQHZFASA-N gemcitabine Chemical compound O=C1N=C(N)C=CN1[C@H]1C(F)(F)[C@H](O)[C@@H](CO)O1 SDUQYLNIPVEERB-QPPQHZFASA-N 0.000 description 1
- 229960005277 gemcitabine Drugs 0.000 description 1
- 108091008053 gene clusters Proteins 0.000 description 1
- 210000003780 hair follicle Anatomy 0.000 description 1
- 230000002489 hematologic effect Effects 0.000 description 1
- 210000003917 human chromosome Anatomy 0.000 description 1
- 230000028993 immune response Effects 0.000 description 1
- 239000002955 immunomodulating agent Substances 0.000 description 1
- 238000009169 immunotherapy Methods 0.000 description 1
- 238000000126 in silico method Methods 0.000 description 1
- 238000001727 in vivo Methods 0.000 description 1
- 238000011065 in-situ storage Methods 0.000 description 1
- 230000002779 inactivation Effects 0.000 description 1
- 238000007386 incisional biopsy Methods 0.000 description 1
- 238000010348 incorporation Methods 0.000 description 1
- 229910052738 indium Inorganic materials 0.000 description 1
- 230000004054 inflammatory process Effects 0.000 description 1
- 238000002347 injection Methods 0.000 description 1
- 239000007924 injection Substances 0.000 description 1
- 230000015788 innate immune response Effects 0.000 description 1
- 238000007689 inspection Methods 0.000 description 1
- 239000000543 intermediate Substances 0.000 description 1
- 210000000936 intestine Anatomy 0.000 description 1
- 208000014899 intrahepatic bile duct cancer Diseases 0.000 description 1
- 238000011835 investigation Methods 0.000 description 1
- 150000002500 ions Chemical class 0.000 description 1
- 230000002427 irreversible effect Effects 0.000 description 1
- 229950007752 isatuximab Drugs 0.000 description 1
- 238000011901 isothermal amplification Methods 0.000 description 1
- 229960004942 lenalidomide Drugs 0.000 description 1
- GOTYRUGSSMKFNF-UHFFFAOYSA-N lenalidomide Chemical compound C1C=2C(N)=CC=CC=2C(=O)N1C1CCC(=O)NC1=O GOTYRUGSSMKFNF-UHFFFAOYSA-N 0.000 description 1
- 229960003784 lenvatinib Drugs 0.000 description 1
- WOSKHXYHFSIKNG-UHFFFAOYSA-N lenvatinib Chemical compound C=12C=C(C(N)=O)C(OC)=CC2=NC=CC=1OC(C=C1Cl)=CC=C1NC(=O)NC1CC1 WOSKHXYHFSIKNG-UHFFFAOYSA-N 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 210000005228 liver tissue Anatomy 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 201000005202 lung cancer Diseases 0.000 description 1
- 208000020816 lung neoplasm Diseases 0.000 description 1
- 238000007403 mPCR Methods 0.000 description 1
- 210000002540 macrophage Anatomy 0.000 description 1
- 238000002595 magnetic resonance imaging Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000010339 medical test Methods 0.000 description 1
- 229960001924 melphalan Drugs 0.000 description 1
- SGDBTWWWUNNDEQ-LBPRGKRZSA-N melphalan Chemical compound OC(=O)[C@@H](N)CC1=CC=C(N(CCCl)CCCl)C=C1 SGDBTWWWUNNDEQ-LBPRGKRZSA-N 0.000 description 1
- 238000002493 microarray Methods 0.000 description 1
- 230000000813 microbial effect Effects 0.000 description 1
- 239000011859 microparticle Substances 0.000 description 1
- 210000004688 microtubule Anatomy 0.000 description 1
- 230000002438 mitochondrial effect Effects 0.000 description 1
- 229960001156 mitoxantrone Drugs 0.000 description 1
- KKZJGLLVHKMTCM-UHFFFAOYSA-N mitoxantrone Chemical compound O=C1C2=C(O)C=CC(O)=C2C(=O)C2=C1C(NCCNCCO)=CC=C2NCCNCCO KKZJGLLVHKMTCM-UHFFFAOYSA-N 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000003990 molecular pathway Effects 0.000 description 1
- 238000011242 molecular targeted therapy Methods 0.000 description 1
- 239000000178 monomer Substances 0.000 description 1
- 210000003205 muscle Anatomy 0.000 description 1
- 229960003301 nivolumab Drugs 0.000 description 1
- 206010053219 non-alcoholic steatohepatitis Diseases 0.000 description 1
- 108091027963 non-coding RNA Proteins 0.000 description 1
- 102000042567 non-coding RNA Human genes 0.000 description 1
- 208000002154 non-small cell lung carcinoma Diseases 0.000 description 1
- 239000002853 nucleic acid probe Substances 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
- DWAFYCQODLXJNR-BNTLRKBRSA-L oxaliplatin Chemical compound O1C(=O)C(=O)O[Pt]11N[C@@H]2CCCC[C@H]2N1 DWAFYCQODLXJNR-BNTLRKBRSA-L 0.000 description 1
- 229960001756 oxaliplatin Drugs 0.000 description 1
- JJVOROULKOMTKG-UHFFFAOYSA-N oxidized Photinus luciferin Chemical compound S1C2=CC(O)=CC=C2N=C1C1=NC(=O)CS1 JJVOROULKOMTKG-UHFFFAOYSA-N 0.000 description 1
- 229910052760 oxygen Inorganic materials 0.000 description 1
- 239000001301 oxygen Substances 0.000 description 1
- 235000020636 oyster Nutrition 0.000 description 1
- 210000000496 pancreas Anatomy 0.000 description 1
- 230000007170 pathology Effects 0.000 description 1
- 239000013610 patient sample Substances 0.000 description 1
- 229960002621 pembrolizumab Drugs 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 238000005502 peroxidation Methods 0.000 description 1
- 239000000546 pharmaceutical excipient Substances 0.000 description 1
- 239000013612 plasmid Substances 0.000 description 1
- 229910052697 platinum Inorganic materials 0.000 description 1
- 229920000642 polymer Polymers 0.000 description 1
- 238000006116 polymerization reaction Methods 0.000 description 1
- 102000054765 polymorphisms of proteins Human genes 0.000 description 1
- 208000007232 portal hypertension Diseases 0.000 description 1
- 244000144977 poultry Species 0.000 description 1
- 230000002265 prevention Effects 0.000 description 1
- 230000002947 procoagulating effect Effects 0.000 description 1
- 238000011321 prophylaxis Methods 0.000 description 1
- 210000002307 prostate Anatomy 0.000 description 1
- 239000003207 proteasome inhibitor Substances 0.000 description 1
- 108060006633 protein kinase Proteins 0.000 description 1
- 238000007388 punch biopsy Methods 0.000 description 1
- 238000001303 quality assessment method Methods 0.000 description 1
- 230000005855 radiation Effects 0.000 description 1
- 230000010110 radioembolization Effects 0.000 description 1
- 238000007674 radiofrequency ablation Methods 0.000 description 1
- 238000001959 radiotherapy Methods 0.000 description 1
- 229960002633 ramucirumab Drugs 0.000 description 1
- 102000005962 receptors Human genes 0.000 description 1
- 108020003175 receptors Proteins 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 229960004836 regorafenib Drugs 0.000 description 1
- FNHKPVJBJVTLMP-UHFFFAOYSA-N regorafenib Chemical compound C1=NC(C(=O)NC)=CC(OC=2C=C(F)C(NC(=O)NC=3C=C(C(Cl)=CC=3)C(F)(F)F)=CC=2)=C1 FNHKPVJBJVTLMP-UHFFFAOYSA-N 0.000 description 1
- 230000004043 responsiveness Effects 0.000 description 1
- 238000010839 reverse transcription Methods 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 108091092562 ribozyme Proteins 0.000 description 1
- 150000003839 salts Chemical class 0.000 description 1
- 230000007017 scission Effects 0.000 description 1
- 230000028327 secretion Effects 0.000 description 1
- 238000002864 sequence alignment Methods 0.000 description 1
- 238000007389 shave biopsy Methods 0.000 description 1
- 238000007390 skin biopsy Methods 0.000 description 1
- 210000002460 smooth muscle Anatomy 0.000 description 1
- 239000007790 solid phase Substances 0.000 description 1
- 229960003787 sorafenib Drugs 0.000 description 1
- 238000007447 staining method Methods 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 238000000528 statistical test Methods 0.000 description 1
- 150000003432 sterols Chemical class 0.000 description 1
- 235000003702 sterols Nutrition 0.000 description 1
- 230000000638 stimulation Effects 0.000 description 1
- CCEKAJIANROZEO-UHFFFAOYSA-N sulfluramid Chemical group CCNS(=O)(=O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F CCEKAJIANROZEO-UHFFFAOYSA-N 0.000 description 1
- 229910052717 sulfur Inorganic materials 0.000 description 1
- 238000001356 surgical procedure Methods 0.000 description 1
- 210000004243 sweat Anatomy 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 229960003433 thalidomide Drugs 0.000 description 1
- 230000001225 therapeutic effect Effects 0.000 description 1
- 210000001685 thyroid gland Anatomy 0.000 description 1
- 238000003325 tomography Methods 0.000 description 1
- 101150037438 tpm gene Proteins 0.000 description 1
- 238000013518 transcription Methods 0.000 description 1
- 230000035897 transcription Effects 0.000 description 1
- 238000011222 transcriptome analysis Methods 0.000 description 1
- 238000013520 translational research Methods 0.000 description 1
- 238000011269 treatment regimen Methods 0.000 description 1
- LENZDBCJOHFCAS-UHFFFAOYSA-N tris Chemical compound OCC(N)(CO)CO LENZDBCJOHFCAS-UHFFFAOYSA-N 0.000 description 1
- 239000000107 tumor biomarker Substances 0.000 description 1
- 229940035893 uracil Drugs 0.000 description 1
- 210000003932 urinary bladder Anatomy 0.000 description 1
- 108700026220 vif Genes Proteins 0.000 description 1
- 239000002569 water oil cream Substances 0.000 description 1
- 238000007482 whole exome sequencing Methods 0.000 description 1
- 239000011701 zinc Substances 0.000 description 1
- 229910052725 zinc Inorganic materials 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/16—Primer sets for multiplex assays
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
Definitions
- This disclosure relates generally to the field of biotechnology and in particular to utilizing measurement of cell-free RNA (cfRNA) profiles as biomarkers to diagnose cancer and related products and uses thereof.
- cfRNA cell-free RNA
- MM multiple myeloma, a cancer of antibody-producing plasma cells
- MGUS monoclonal gammopathy of undetermined significance
- HCC Hepatocellular carcinoma
- Cirr liver cirrhosis
- Circulating cell-free RNA (cfRNA) in blood is released from cells by active secretion or through apoptosis and necrosis [40, 41].
- Plasma cfRNA has the potential to reflect the systemic response to growing tumors and provide information about the tissue of tumor origin specifically by cancer type.
- Previous work has demonstrated that global cfRNA profiles indicate temporal changes of organ-specific transcripts. Further analysis of these transcripts facilitated the prediction of pregnancy delivery, preterm birth, and distinction of cancer from healthy controls [42-46].
- an ideal method for distinguishing cancers and their pre-malignant conditions would include measuring the level of cfRNA profiles in a sample from a subject.
- methods including analyzing (such as measuring) a level of one or more cell-free RNA (cfRNA) biomarkers selected from a group comprising: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in a biological sample.
- a differential expression analysis is performed comparing the level of each cfRNA biomarker selected to a corresponding control value (CV).
- the disclosed materials and methods are useful for diagnosing, in a subject, cancer or a predisposition for cancer.
- An exemplary method is useful as a method for detecting cancer or a predisposition for cancer utilizing a biological sample obtained from a subject.
- the exemplary method comprises analyzing (such as measuring) a level of one or more cell-free RNA (cfRNA) biomarkers selected from a group comprising: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in the biological sample.
- a differential expression analysis is performed comparing the level of each cfRNA biomarker selected to a corresponding control value (CV).
- the differential expression shown by the differential expression analysis between the selected cfRNA biomarkers and corresponding CVs indicates cancer or a predisposition for cancer in the subject.
- the one or more cfRNA biomarkers are selected to indicate blood cancer or a predisposition to blood cancer.
- the methods include analyzing or measuring a level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates blood cancer or a predisposition to blood cancer.
- the one or more cfRNA biomarkers are selected to indicate multiple myeloma (MM).
- one or more of CENPE, HBG1, HBG2, and NUSAP1 are selected to indicate multiple myeloma.
- the methods include analyzing or measuring a level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination of two or more thereof, in cfRNA in a sample from a subject, wherein differential expression of one or more indicates multiple myeloma.
- the methods include analyzing or measuring a level of one or more (such as 1, 2, 3, or all) of CENPE, HBG1, HBG2, and NUSAP1 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, or all) indicates multiple myeloma.
- the one or more cfRNA biomarkers FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate monoclonal gammopathy of undetermined significance (MGUS).
- the methods include analyzing or measuring a level of one or more of FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates MGUS.
- the one or more cfRNA biomarkers are selected to indicate liver cancer or a predisposition to liver cancer.
- the methods include analyzing or measuring a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates liver cancer or a predisposition to liver cancer.
- the one or more cfRNA biomarkers are selected to indicate hepatocellular carcinoma (HCC).
- the methods include analyzing or measuring a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or a combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates HCC.
- the methods include analyzing or measuring a level of one or more (such as 1, 2, 3, 4, or all) of C3, CP, FGA, FGB, and IFITM3 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, 4, or all) indicates HCC.
- the one or more cfRNA biomarkers ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate cirrhosis.
- the methods include analyzing or measuring a level of one or more of ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination of two or more thereof, wherein differential expression of one or more indicates liver cirrhosis.
- Figs. 1A and 1B show PCA analyses using the top 500 genes with largest variance across, respectively, (a) non-cancer and multiple myeloma samples and (b) non-cancer and liver cancer samples;
- Fig. 1C shows Linear Discriminant Analysis (LDA) using DE genes with padj < 0.01 and the top 10 most important genes identified by LVQ analysis. P-value was derived from the Wilcoxon test.
- Figs. 2A and 2B show ROC curves of, respectively, LDA and Random Forest (RF) classification models with two feature sets, DE and LVQ;
- Fig. 2C shows a LOOCV with the two models, LDA and RF, with two feature sets, DE and LVQ.
- Fig. 3 shows cfRNA biomarkers and classification models validated in an independent sample cohort, in which cfRNA profiles distinguish between non-cancer, MGUS and multiple myeloma donors.
- box plots of a representative top 10 most significant genes resulting from the LVQ analysis for MM versus NC, and an LDA plot using 10 genes from pairwise analysis across NC - MGUS and NC - MM pairs using the LVQ method. P-value was calculated for each pair by the t-test.
- Fig. 3 shows a LOOCV using 2 models (LDA and RF) with top 10 LVQ genes to discriminate MGUS and NC, MM vs MGUS and three groups NC, MGUS and MM.
- Fig. 4 is a correlation plot analysis showing that qRT-PCR of cfRNA biomarkers was concordant with RNA-sequencing data.
- the correlation plot compares qRT-PCR data with RNA-sequencing data for the cfRNA biomarkers and shows that the two are concordant. P-value was calculated by t-test.
- Fig. 5 provides box plots showing qRT-PCR Ct values of top 4 LVQ genes identified from MM versus NC and top 5 LVQ genes identified from HCC versus NC.
- Fig. 6 and Fig. 7 provide box plots showing that cfRNA profiles distinguish between non-cancer, MGUS and multiple myeloma donors; the box plots represent the top 10 most significant genes resulting from learning vector quantization analysis for multiple myeloma versus non-cancer;
- Fig. 8 is an LDA plot using 10 genes from pairwise analysis across non-cancer - MGUS and non-cancer - multiple myeloma samples using the learning vector quantization method; Fig. 8 shows a LOOCV using 2 models (LDA and RF) with the top 10 LVQ genes to discriminate MGUS and non-cancer, multiple myeloma vs MGUS, and three groups: non-cancer, MGUS and multiple myeloma.
- Fig. 9 and Fig. 10 provide box plots representative of the top 10 most significant genes from the LVQ analysis for HCC vs. NC. P-value was calculated for each pair by the t-test.
- Fig. 11 is an LDA plot using the top 10 genes identified from each pairwise analysis between NC - Cirr and NC - HCC samples using the LVQ method.
- Fig. 12 and Fig. 13 show volcano plots of false discovery rate (FDR) against fold change for all genes in pairwise comparisons between non-cancer (NC) donors and multiple myeloma (MM) or liver cancer (HCC) donors, analyzed by DESeq2, together with histograms of the number of significant genes differentiating two groups under random permutation of samples across non-cancer donors and multiple myeloma or liver cancer. Differential expression analysis was performed using DESeq2 with the Wald test and an adjusted p-value cutoff of 0.01.
- Fig. 14 and Fig. 15 illustrate cfRNA biomarkers showing stage-dependent discrimination in pilot and validation sample sets.
- Fig. 14 shows that Linear Discriminant Analysis using the top 10 LVQ genes and a model trained in the pilot cohort yields significant discrimination and classification by stage in both HCC and MM.
- Fig. 15 shows that when classifying the independent validation cohort with these same models, stage-dependent classification for both HCC and MM was seen. P-value for each pair was calculated by the Wilcoxon rank sum test.
- Fig. 16 and Fig. 17 show box and whisker plots illustrating how cfRNA biomarkers for HCC show discrimination between various etiologies.
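The leave-one-out cross-validation (LOOCV) referred to in the figure descriptions can be illustrated with a minimal sketch. This is not the disclosed pipeline: a simple nearest-centroid classifier is swapped in for the LDA and RF models so that the example needs only the Python standard library, and the expression values and function names are hypothetical.

```python
from statistics import mean


def nearest_centroid_predict(train_x, train_y, sample):
    """Return the label whose class centroid is closest to `sample`.

    Stand-in classifier (assumption); the source uses LDA and Random Forest.
    """
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(column) for column in zip(*rows)]

    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # sorted() makes tie-breaking deterministic
    return min(sorted(centroids), key=lambda lab: squared_distance(centroids[lab], sample))


def loocv_accuracy(x, y):
    """Hold out each sample once, train on the rest, and score the held-out call."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


# Hypothetical two-gene expression values for three NC and three MM donors.
expression = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [3.0, 3.1], [2.9, 3.3], [3.2, 2.8]]
labels = ["NC", "NC", "NC", "MM", "MM", "MM"]
accuracy = loocv_accuracy(expression, labels)
```

With real data, the feature columns would be the selected DE or LVQ genes, and the stand-in classifier would be replaced by the chosen model.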
- the term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is analyzed, measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed.
- polynucleotide refers to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof.
- a polynucleotide may constitute a deoxyribonucleic acid (DNA) molecule or a ribonucleic acid (RNA) molecule.
- Polynucleotides may have any three dimensional structure, and may perform any function, known or unknown.
- the following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, cell-free RNA (cfRNA), messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), mitochondrial RNA (mtRNA), ribozymes, complementary DNA (cDNA), recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers.
- a polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.
- cDNA refers to DNA synthesized from a single-stranded template in an enzymatically catalyzed reaction.
- an expressed cfRNA biomarker may be reverse transcribed by a reverse transcriptase to produce a cDNA template.
- Skilled persons will understand that creation of cDNA template libraries facilitates the characterization of expressed RNA by sequencing methods (see, for example, Nat. Rev. Genet. 2009 Jan;10(1):57-63; “RNA-Seq: a revolutionary tool for transcriptomics”).
- a variety of methods of amplifying polynucleotides (e.g., DNA and/or RNA) are available, some examples of which are described herein.
- Amplification may be linear, exponential, or involve both linear and exponential phases in a multi-phase amplification process.
- Amplification methods may involve changes in temperature, such as a heat denaturation step, or may be isothermal processes that do not require heat denaturation.
- some polynucleotides are “preferentially” treated, such as preferentially manipulating RNA in a sample comprising both RNA and DNA.
- preferentially refers to treatment that affects a greater proportion of the polynucleotide of the indicated type.
- preferentially treating RNA indicates that of the polynucleotides affected by the treatment, at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more of the affected polynucleotides in a reaction are RNA molecules.
- preferentially treating RNA refers to the use of a particular treatment or reagent known in the art to have a degree of specificity for RNA over DNA.
- reverse transcriptase is an enzyme typically used in reverse transcription reactions to transcribe RNA into cDNA, and is known to have specificity for using RNA, rather than DNA, as a template.
- RNA can be preferentially treated using reagents that react with elements that are typically found in RNA and not DNA (e.g. the ribose sugar backbone, or the presence of uracil).
- preferential treatment of RNA comprises use of enzymes that are not specific to RNA, but whose activity is preferentially directed to polynucleotides derived from RNA (e.g. cDNA) by virtue of one or more previous steps.
- single-stranded DNA ligases may preferentially ligate oligonucleotides to cDNA in samples where cDNA is produced and rendered single-stranded in the presence of other DNA species that are predominantly double-stranded.
- biomarker refers to a measurable substance (e.g., protein or polynucleotide) in an organism whose presence is indicative of some phenomenon such as disease (e.g., liver cancer or blood cancer), infection, or environmental exposure.
- a biomarker may include a gene, a gene fragment, or any other form of polynucleotide such as cell-free RNA (cfRNA).
- gene refers to a distinct sequence of polynucleotides forming part of a chromosome.
- a cfRNA biomarker may include the entirety or any portion of a polynucleotide expressed as a gene product by a cell.
- selecting an AIDA gene for analysis would include analyzing the level of RNA transcript expressed from the AIDA gene.
- As used herein, the terms “cell-free,” “circulating,” and “extracellular” as applied to polynucleotides (e.g., “cell-free DNA” and “cell-free RNA”) are used interchangeably to refer to polynucleotides present in a biological sample or portion thereof that can be isolated or otherwise manipulated without applying a lysis step to intact cells in the biological sample (e.g., as in extraction from cells or viruses).
- Cell-free polynucleotides may be encapsulated (e.g., in exosomes) or unencapsulated or “free” from the cells or viruses from which they originate, even before a sample of the subject is collected.
- cell-free polynucleotides may be isolated from a non-cellular fraction of blood (e.g. serum or plasma), from other bodily fluids (e.g. urine), or from non-cellular fractions of other types of samples. Notwithstanding, since cfRNA polynucleotide originates from within a cell, cell-free polynucleotides may be produced as a byproduct of cell death (e.g. apoptosis or necrosis), cell lysis, or cell shedding, releasing polynucleotides into surrounding body fluids or into circulation. Moreover, cell-free polynucleotides may be produced as a by-product of applying a lysis step to the biological sample.
- a lysis step may include applying detergent, heat, mechanical shearing, or any combination thereof, to lyse an intact cell or a membrane encapsulated structure.
- a lysis step may be applied to induce release of polynucleotides from other membrane structures such as exosomes, or vesicles.
- sequencing refers to any of a number of technologies used to determine the sequence (e.g., the identity and order of monomer units) of a biomolecule, e.g., a nucleic acid such as DNA or RNA.
- Exemplary sequencing methods include, but are not limited to, targeted sequencing, single molecule real-time sequencing, exon or exome sequencing, intron sequencing, electron microscopy-based sequencing, panel sequencing, transistor-mediated sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, capillary electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, co-amplification at lower denaturation temperature-PCR (COLD-PCR), multiplex PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, and a combination thereof.
- next generation sequencing refers to sequencing technologies having increased throughput as compared to traditional Sanger- and capillary electrophoresis-based approaches, for example, with the ability to generate hundreds of thousands of relatively small sequence reads at a time.
- next generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization.
- subject refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant. More specifically, a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human. Animals include farm animals (e.g., production cattle, dairy cattle, poultry, horses, pigs, and the like), sport animals, and companion animals (e.g., pets or support animals).
- a subject can be a healthy individual, an individual that has or is suspected of having a disease or a predisposition to the disease, or an individual that is in need of therapy or suspected of needing therapy.
- the terms “individual” or “patient” are intended to be interchangeable with “subject.”
- reference sample or “reference cfRNA sample” refers to a sample of known composition and/or having or known to have or lack specific properties (e.g., known nucleic acid variant(s), known cellular origin, known tumor fraction, known coverage, and/or the like) that is analyzed along with or compared to test samples in order to evaluate the accuracy of an analytical procedure, classify the test samples, and/or the like.
- a reference sample dataset typically includes from at least about 25 to at least about 30,000 or more reference samples.
- the reference sample dataset includes about 50, 75, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,500, 5,000, 7,500, 10,000, 15,000, 20,000, 25,000, 50,000, 100,000, 1,000,000, or more reference samples.
- a reference sample is used as a corresponding control for each biomarker to provide a control value (CV).
- a reference sample providing an AIDA CV corresponds to an AIDA cfRNA biomarker
- a CA1 CV corresponds to a CA1 cfRNA biomarker
- a CV may include a level, or range of levels, indicative of a normal subject’s cfRNA biomarker level or range of levels, whereby a differential expression analysis may be used to detect cfRNA biomarker level or levels that differ, or fall outside of, the level or range of levels indicated by the CV and, thus, detect cancer or a predisposition to cancer.
- a cfRNA biomarker level showing a higher expression than its corresponding CV is indicative of cancer or a predisposition to cancer.
- a combination of one or more cfRNA biomarker levels showing higher expression than their respective corresponding CVs is indicative of cancer or a predisposition to cancer.
- a cfRNA biomarker level may be less than its corresponding CV.
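The comparison of each selected biomarker level against its corresponding CV, where the CV is a range of levels, can be sketched as follows. This is a minimal illustration only: the `ControlValue` type, the function names, and all numeric levels are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlValue:
    """Range of levels regarded as normal for one biomarker (hypothetical units)."""
    low: float
    high: float


def compare_to_cv(level: float, cv: ControlValue) -> str:
    """Classify a measured level as higher than, lower than, or within its CV range."""
    if level > cv.high:
        return "higher"
    if level < cv.low:
        return "lower"
    return "within"


def differential_calls(levels: dict, cvs: dict) -> dict:
    """Return only the biomarkers whose level falls outside the corresponding CV."""
    calls = {}
    for gene, level in levels.items():
        call = compare_to_cv(level, cvs[gene])
        if call != "within":
            calls[gene] = call
    return calls


# Hypothetical CVs and sample levels, for illustration only.
control_values = {
    "CENPE": ControlValue(2.0, 5.0),
    "HBG1": ControlValue(10.0, 20.0),
    "NUSAP1": ControlValue(1.0, 3.0),
}
sample_levels = {"CENPE": 9.5, "HBG1": 15.0, "NUSAP1": 0.4}
calls = differential_calls(sample_levels, control_values)
```

An empty result would correspond to no differential expression for the selected biomarkers; how a non-empty result is interpreted depends on the analysis method chosen.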
- panel refers to a predetermined group of medical tests or assays used in the diagnosis and treatment of disease.
- “test” or “assay” refers to a process of analyzing a substance to determine its composition or quality.
- a panel may be designed as a single-plex, duplex, or multiplex where the panel tests or screens for, respectively, one, two, or three or more biomarkers in a single test.
- a blood cancer panel may include one or more cfRNA biomarkers selected from the group of: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, to indicate blood cancer or a predisposition to blood cancer.
- a liver cancer panel may include one or more cfRNA biomarkers selected from a group of: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, HIST1H2AH, or any combination thereof, to indicate liver cancer or a predisposition to liver cancer.
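The panel definitions above can be expressed as a small data structure. The sketch below groups the listed biomarkers by indication and labels a chosen subset as single-plex, duplex, or multiplex according to how many biomarkers it screens. The function names are hypothetical, and the symbols are written as CA1, ATP1B1 and C9orf16 on the assumption that these are the intended gene symbols.

```python
# Biomarker groups as listed in the text, keyed by the condition they indicate.
BLOOD_CANCER_PANEL = {
    "MM": ("AIDA", "CA1", "CENPE", "CPOX", "ELL2", "EPB42", "HBG1", "HBG2", "NEK2", "NUSAP1"),
    "MGUS": ("FPR3", "SMC4", "TXNDC16", "ASPM", "WRN", "ZRANB2-AS2", "BMX", "CDC42BPA",
             "KNL1", "CACNA1A"),
}
LIVER_CANCER_PANEL = {
    "HCC": ("APOE", "C3", "CP", "DHCR24", "FGA", "FGB", "FGG", "HRG", "IFITM3", "ATP1B1"),
    "cirrhosis": ("ABCB7", "HIST1H2BF", "PSIP1", "TMEM150C", "ZC3H6", "C9orf16", "CPQ",
                  "DYNC1I2", "ECM1", "HIST1H2AH"),
}


def plex_type(biomarkers):
    """Name the panel format from the number of distinct biomarkers tested."""
    count = len(set(biomarkers))
    if count == 0:
        raise ValueError("a panel needs at least one biomarker")
    if count == 1:
        return "single-plex"
    if count == 2:
        return "duplex"
    return "multiplex"


def build_panel(panel, biomarkers):
    """Check that every requested biomarker belongs to the panel, then describe it."""
    allowed = {gene for group in panel.values() for gene in group}
    unknown = sorted(set(biomarkers) - allowed)
    if unknown:
        raise ValueError(f"not part of this panel: {unknown}")
    return {"biomarkers": sorted(set(biomarkers)), "format": plex_type(biomarkers)}


mm_panel = build_panel(BLOOD_CANCER_PANEL, ["CENPE", "HBG1", "HBG2", "NUSAP1"])
```

Here `mm_panel` describes a four-biomarker multiplex panel built from the multiple myeloma group.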
- “predisposition” and “premalignancy” are used interchangeably and refer to a condition that may (or is likely to) become cancer.
- a predisposition may derive from genetic or environmental etiologies relevant to the subject and generally indicates a pre-cancerous stage of disease.
- monoclonal gammopathy of undetermined significance (MGUS) and cirrhosis are premalignant conditions known in the art to have a likelihood of becoming, respectively, blood cancer and liver cancer.
- Skilled persons will understand that a variety of staging systems exist for determining if a condition is cancerous. For example, the American Joint Committee on Cancer (633 N. St.
- a subject with elevated levels of one or more cfRNA biomarkers: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, HIST1H2AH, or any combination thereof, relative to one or more of the corresponding CVs may indicate a predisposition to liver cancer if no tumor meeting Stage IA’s requirements is detected.
- the disclosed materials and methods relate to a method for detecting cancer or a predisposition for cancer in a biological sample obtained from a subject.
- a level of one or more cell-free RNA (cfRNA) biomarkers selected from a group comprising: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in the biological sample is analyzed.
- a differential expression analysis comparing the level of each cfRNA biomarker selected to a corresponding control value (CV) is performed.
- the differential expression shown by the differential expression analysis between the selected cfRNA biomarkers and corresponding CVs indicates cancer or a predisposition for cancer in the subject.
- axin interactor, dorsalization associated gene (AIDA) is selected (for example, analyzed or measured).
- carbonic anhydrase 1 gene (CA1) is selected (for example, analyzed or measured).
- centromere protein E gene (CENPE) is selected (for example, analyzed or measured).
- coproporphyrinogen oxidase gene (CPOX) is selected (for example, analyzed or measured).
- elongation factor for RNA polymerase II 2 gene (ELL2) is selected (for example, analyzed or measured).
- erythrocyte membrane protein band 4.2 gene (EPB42) is selected (for example, analyzed or measured).
- hemoglobin subunit gamma 1 gene (HBG1) is selected (for example, analyzed or measured).
- hemoglobin subunit gamma 2 gene (HBG2) is selected (for example, analyzed or measured).
- NIMA related kinase 2 gene (NEK2) is selected (for example, analyzed or measured).
- nucleolar and spindle associated protein 1 gene (NUSAP1) is selected (for example, analyzed or measured).
- apolipoprotein E gene (APOE) is selected (for example, analyzed or measured).
- complement component C3 gene (C3) is selected (for example, analyzed or measured).
- ceruloplasmin gene (CP) is selected (for example, analyzed or measured).
- 24-dehydrocholesterol reductase gene (DHCR24) is selected (for example, analyzed or measured).
- fibrinogen alpha chain gene (FGA) is selected (for example, analyzed or measured).
- fibrinogen beta chain gene (FGB) is selected (for example, analyzed or measured).
- fibrinogen gamma chain gene (FGG) is selected (for example, analyzed or measured).
- histidine rich glycoprotein gene (HRG) is selected (for example, analyzed or measured).
- interferon induced transmembrane protein 3 gene (IFITM3) is selected (for example, analyzed or measured).
- ATPase Na+/K+ transporting subunit beta 1 gene (ATP1B1) is selected (for example, analyzed or measured).
- N-formyl peptide receptor 3 gene (FPR3) is selected (for example, analyzed or measured).
- structural maintenance of chromosomes 4 gene (SMC4) is selected (for example, analyzed or measured).
- thioredoxin domain containing 16 gene (TXNDC16) is selected (for example, analyzed or measured).
- assembly factor for spindle microtubules gene (ASPM) is selected (for example, analyzed or measured).
- WRN recQ like helicase gene (WRN) is selected (for example, analyzed or measured).
- ZRANB2 antisense RNA 2 gene (ZRANB2-AS2) is selected (for example, analyzed or measured).
- BMX non-receptor tyrosine kinase gene (BMX) is selected (for example, analyzed or measured).
- serine/threonine kinase MRCK alpha gene (CDC42BPA) is selected (for example, analyzed or measured).
- kinetochore scaffold 1 gene (KNL1) is selected (for example, analyzed or measured).
- calcium voltage-gated channel subunit alpha 1 gene (CACNA1A) is selected (for example, analyzed or measured).
- ATP binding cassette subfamily B member 7 gene (ABCB7) is selected (for example, analyzed or measured).
- histone cluster 1 H2bf gene (HIST1H2BF) is selected (for example, analyzed or measured).
- PC4 and SFRS1 interacting protein 1 gene (PSIP1) is selected (for example, analyzed or measured).
- transmembrane protein 150C gene (TMEM150C) is selected (for example, analyzed or measured).
- zinc finger CCCH-type containing protein 6 gene (ZC3H6) is selected (for example, analyzed or measured).
- chromosome 9 open reading frame 16 gene (C9orf16) is selected (for example, analyzed or measured).
- carboxypeptidase Q gene (CPQ) is selected (for example, analyzed or measured).
- dynein cytoplasmic 1 intermediate chain 2 gene (DYNC1I2) is selected (for example, analyzed or measured).
- extracellular matrix protein 1 gene (ECM1) is selected (for example, analyzed or measured).
- histone H2A type 1-H gene (HIST1H2AH) is selected (for example, analyzed or measured).
- any combination thereof is selected (for example, analyzed or measured).
- one or more of the above biomarkers are not selected (for example, are not analyzed or measured).
- the one or more cfRNA biomarkers are selected to indicate blood cancer or a predisposition to blood cancer.
- a level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination of two or more thereof are analyzed or measured in cfRNA in a sample from a subject, and differential expression of one or more indicates blood cancer or a predisposition to a blood cancer.
- the blood cancer is multiple myeloma (MM).
- the predisposition to blood cancer is monoclonal gammopathy of undetermined significance (MGUS).
- the one or more cfRNA biomarkers are selected to indicate multiple myeloma (MM).
- one or more of CENPE, HBG1, HBG2, and NUSAP1 are selected to indicate multiple myeloma.
- the methods include analyzing or measuring a level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination of two or more thereof, in cfRNA in a sample from a subject, wherein differential expression of one or more indicates multiple myeloma.
- the methods include measuring a level of one or more (such as 1, 2, 3, or all) of CENPE, HBG1, HBG2, and NUSAP1 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, or all) indicates multiple myeloma.
- an increase in expression level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or a combination of any two or more thereof (including, but not limited to, each of CENPE, HBG1, HBG2, and NUSAP1) compared to a control indicates multiple myeloma.
- the differential expression is an increase of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 2-fold, at least about 2.5-fold, at least about 3-fold, or more compared to the control.
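The increase thresholds listed above mix percentages and fold values, which are two ways of writing the same ratio of sample level to control level. The sketch below makes the conversion explicit. It assumes levels on a linear scale (not Ct values or log-transformed counts), and the function names are hypothetical.

```python
def fold_change(sample_level: float, control_level: float) -> float:
    """Ratio of the sample level to the control level (linear scale assumed)."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return sample_level / control_level


def percent_increase(sample_level: float, control_level: float) -> float:
    """Increase over the control, in percent; a 2-fold level is a 100% increase."""
    return (fold_change(sample_level, control_level) - 1.0) * 100.0


def meets_increase_threshold(sample_level: float, control_level: float,
                             min_percent: float = 10.0) -> bool:
    """True when the increase over the control is at least `min_percent`."""
    return percent_increase(sample_level, control_level) >= min_percent


# Hypothetical levels: 30 units in the sample against 10 units in the control.
example_fold = fold_change(30.0, 10.0)
example_percent = percent_increase(15.0, 10.0)
```

On this reading, "at least about 3-fold" corresponds to an increase of at least about 200% over the control.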
- the one or more cfRNA biomarkers FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate monoclonal gammopathy of undetermined significance (MGUS).
- the methods include analyzing or measuring a level of one or more of FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates MGUS.
- the one or more cfRNA biomarkers are selected to indicate liver cancer or a predisposition to liver cancer.
- a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination of two or more thereof are analyzed or measured in cfRNA in a sample from a subject, and differential expression of one or more indicates liver cancer or a predisposition to a liver cancer.
- the liver cancer is hepatocellular carcinoma (HCC).
- the predisposition to liver cancer is cirrhosis.
- the one or more cfRNA biomarkers are selected to indicate hepatocellular carcinoma (HCC).
- the methods include analyzing or measuring a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or a combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates HCC.
- the methods include analyzing or measuring a level of one or more (such as 1, 2, 3, 4, or all) of C3, CP, FGA, FGB, and IFITM3 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, 4, or all) indicates HCC.
- an increase in expression level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or a combination of any two or more thereof (including, but not limited to, an increase in expression level of each of C3, CP, FGA, FGB, and IFITM3) compared to a control indicates HCC.
- the differential expression is an increase of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 2-fold, at least about 2.5-fold, at least about 3-fold, or more compared to the control.
- the one or more cfRNA biomarkers ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate liver cirrhosis.
- the methods include analyzing or measuring a level of one or more of ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or a combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates liver cirrhosis.
- the one or more cfRNA biomarkers are selected to determine the efficacy of a prophylactic treatment for preventing the development of cancer in subjects having a predisposition to cancer.
- the differential expression analysis is selected from the group of: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.
- a lack of differential expression between the selected one or more cfRNA biomarkers and a corresponding CV will generally indicate a lack of cancer (e.g., “non-cancer”) or a lack of predisposition to cancer in the subject.
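Where DESeq2 is the chosen analysis, the figure descriptions refer to an adjusted p-value cutoff of 0.01. DESeq2 adjusts p-values by the Benjamini-Hochberg procedure by default, and the sketch below reimplements that adjustment in plain Python to show how the cutoff separates differentially expressed genes from the rest. It is an illustration rather than a replacement for DESeq2; the gene names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from a per-gene test.
raw = {"gene_1": 0.001, "gene_2": 0.008, "gene_3": 0.039, "gene_4": 0.041, "gene_5": 0.042}
names = list(raw)
padj = benjamini_hochberg([raw[name] for name in names])

# Genes passing the adjusted p-value cutoff of 0.01.
significant = [name for name, q in zip(names, padj) if q < 0.01]
```

Genes that do not pass the cutoff would be treated as showing no differential expression relative to the control.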
- the level of the one or more cfRNA biomarkers is analyzed by a method selected from the group of: a polymerase chain reaction (PCR), a quantitative PCR (qPCR), a reverse transcription PCR (rt-PCR), a complementary DNA (cDNA) synthesis, or a real-time PCR, or any combination thereof.
- Skilled persons will understand that polynucleotide amplification (e.g., PCR) may require a primer pair designed to amplify a specific gene target.
- a primer pair is selected to amplify a specific cfRNA gene target (as shown in Table 17).
- a primer pair selected from the group of: SEQ ID NO: 1 and SEQ ID NO: 2; SEQ ID NO: 3 and SEQ ID NO: 4; SEQ ID NO: 5 and SEQ ID NO: 6; SEQ ID NO: 7 and SEQ ID NO: 8; SEQ ID NO: 9 and SEQ ID NO: 10; SEQ ID NO: 11 and SEQ ID NO: 12; SEQ ID NO: 13 and SEQ ID NO: 14; SEQ ID NO: 15 and SEQ ID NO: 16; SEQ ID NO: 17 and SEQ ID NO: 18; SEQ ID NO: 19 and SEQ ID NO: 20; SEQ ID NO: 21 and SEQ ID NO: 22; SEQ ID NO: 23 and SEQ ID NO: 24; SEQ ID NO: 25 and SEQ ID NO: 26; SEQ ID NO: 27 and SEQ ID NO: 28; SEQ ID NO: 29 and SEQ ID NO: 30; SEQ ID NO: 31 and SEQ ID NO: 32; SEQ ID NO: 33 and SEQ ID NO: 34; SEQ ID NO: 10
- the level of the one or more cfRNA biomarkers is detected using RT-qPCR.
- the methods include a step utilizing a pool of two or more pairs of primers to pre-amplify a plurality of cDNAs of interest (for example generated by RT-PCR of cfRNA), followed by a step including two or more individual amplification reactions, each utilizing a single pair of primers to amplify a single cDNA of interest from the pre-amplification step (for example, using quantitative real-time PCR).
- the pre-amplification method includes performing a RT-PCR reaction comprising primer pairs for amplifying two or more of the cfRNA biomarkers described herein, producing a pre-amplified pool of cDNAs and digesting the pre-amplified pool of cDNAs to remove single-stranded nucleic acids.
- the methods include amplifying the one or more cfRNA biomarkers utilizing one or more primer pairs selected from the primer pair of SEQ ID NO: 23 and SEQ ID NO: 24, the primer pair of SEQ ID NO: 25 and SEQ ID NO: 26, the primer pair of SEQ ID NO: 27 and SEQ ID NO: 28, the primer pair of SEQ ID NO: 29 and SEQ ID NO: 30, the primer pair of SEQ ID NO: 31 and SEQ ID NO: 32, the primer pair of SEQ ID NO: 33 and SEQ ID NO: 34, the primer pair of SEQ ID NO: 35 and SEQ ID NO: 36, the primer pair of SEQ ID NO: 37 and SEQ ID NO: 38, the primer pair of SEQ ID NO: 39 and SEQ ID NO: 40, the primer pair of SEQ ID NO: 41 and SEQ ID NO: 42, or any combination thereof, for example for methods of detecting or identifying multiple myeloma.
- the one or more primer pairs include each of the primer pair of SEQ ID NO: 25 and SEQ ID NO: 26, the primer pair of SEQ ID NO: 27 and SEQ ID NO: 28, the primer pair of SEQ ID NO: 29 and SEQ ID NO: 30, and the primer pair of SEQ ID NO: 31 and SEQ ID NO: 32 for methods of detecting or identifying multiple myeloma.
- the methods include amplifying the one or more cfRNA biomarkers utilizing one or more primer pairs selected from the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6, the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, the primer pair of SEQ ID NO: 9 and SEQ ID NO: 10, the primer pair of SEQ ID NO: 11 and SEQ ID NO: 12, the primer pair of SEQ ID NO: 13 and SEQ ID NO: 14, the primer pair of SEQ ID NO: 15 and SEQ ID NO: 16, the primer pair of SEQ ID NO: 17 and SEQ ID NO: 18, the primer pair of SEQ ID NO: 19 and SEQ ID NO: 20, the primer pair of SEQ ID NO: 21 and SEQ ID NO: 22, or any combination thereof for methods of detecting or identifying HCC.
- the one or more primer pairs include each of the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6, the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, the primer pair of SEQ ID NO: 9 and SEQ ID NO: 10, the primer pair of SEQ ID NO: 11 and SEQ ID NO: 12 and the primer pair of SEQ ID NO: 13 and SEQ ID NO: 14 for methods of detecting or identifying HCC.
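The primer-pair groupings above, together with the pooled pre-amplification followed by individual reactions, can be sketched as a small lookup. This is only an illustrative data structure: the names `consecutive_pairs`, `PRIMER_PANELS`, and `preamp_plan` are hypothetical, the integers stand for SEQ ID NOs, and the mapping of each pair to a gene target (Table 17) is not reproduced here.

```python
def consecutive_pairs(first_id, last_id):
    """Return (n, n + 1) SEQ ID NO pairs for n = first_id, first_id + 2, ... up to last_id - 1."""
    return [(n, n + 1) for n in range(first_id, last_id, 2)]

# Hypothetical lookup: indication -> primer pairs listed for that indication.
PRIMER_PANELS = {
    "multiple myeloma": consecutive_pairs(23, 42),  # SEQ ID NO: 23/24 ... 41/42
    "HCC": consecutive_pairs(5, 22),                # SEQ ID NO: 5/6 ... 21/22
}

def preamp_plan(indication):
    """Sketch a two-stage plan: one pooled pre-amplification, then one reaction per pair."""
    pairs = PRIMER_PANELS[indication]
    return {
        "pooled_preamplification": pairs,
        "individual_reactions": [[pair] for pair in pairs],
    }

plan = preamp_plan("HCC")
```

Any subset of a panel (for example, only the pairs named as a preferred combination) could be passed through the same structure.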
- the biological sample is selected from the group of: a blood sample, a serum sample, a plasma sample, a urine sample, a tear sample, a saliva sample, a breast milk sample, a semen sample, a fecal sample, a cerebrospinal fluid sample, a tissue sample, or a cell sample.
- the subject is a human who has, or is suspected of having cancer or a predisposition to cancer.
- a subject can be an individual who has been diagnosed with having a cancer, is going to receive a cancer therapy, and/or has received at least one cancer therapy. The subject can be in remission of a cancer.
- a subject can be an individual who has a family history of cancer and therefore is predisposed to cancer.
- a subject can be an individual who was exposed to an environmental agent and therefore is predisposed to cancer.
- “biological sample” and “sample” are used interchangeably and may include, but are not limited to, a blood sample, a serum sample, a plasma sample, a urine sample, a tear sample, a saliva sample, a breast milk sample, a semen sample, a fecal sample, a tissue sample, or a cell sample.
- a biological sample may be material obtained from cells or derived from cells of a subject.
- the biological sample may be a heterogeneous or homogeneous population of cells or tissues.
- the biological sample may be obtained using any method known to the art that can provide a sample suitable for the analytical methods described herein.
- the sample may be obtained by non-invasive methods including but not limited to: drawing blood, scraping of the skin or cervix, swabbing of the cheek, saliva collection, urine collection, feces collection, collection of menses, tears, or semen.
- the biological sample is obtained by biopsy. In other embodiments the biological sample is obtained by swabbing, endoscopy, scraping, phlebotomy, lumbar puncture (spinal tap) or any other methods known in the art. In some cases, the biological sample may be obtained, stored, or transported using components of a kit of the disclosed methods. In some cases, multiple samples, such as multiple blood samples may be obtained for diagnosis by the methods described herein. In some cases, longitudinal studies relying on multiple samples collected at different times may be performed by the methods described herein.
- multiple samples such as one or more samples from one tissue type (for example esophagus) and one or more samples from another specimen (for example serum) may be obtained for diagnosis by the methods.
- multiple samples such as one or more samples from one tissue type (e.g., esophagus) and one or more samples from another specimen (e.g., serum) may be obtained at the same or different times. Samples obtained at different times may be stored and/or analyzed by different methods. For example, a sample may be obtained and analyzed by routine staining methods or any other cytological analysis methods.
- the biological sample may be obtained by a physician, nurse, or other medical professional such as a medical technician, endocrinologist, cytologist, phlebotomist, radiologist, or a pulmonologist.
- the medical professional may indicate the appropriate test or assay to perform on the sample.
- a molecular profiling business may consult on which assays or tests are most appropriately indicated.
- the patient or subject may obtain a biological sample for testing without the assistance of a medical professional, such as obtaining a whole blood sample, a urine sample, a fecal sample, a buccal sample, or a saliva sample.
- the biological sample is obtained by an invasive procedure including but not limited to: biopsy, needle aspiration, endoscopy, or phlebotomy.
- the method of needle aspiration may further include fine needle aspiration, core needle biopsy, vacuum assisted biopsy, or large core biopsy.
- multiple samples may be obtained by the methods herein to ensure a sufficient amount of biological material.
- the sample is a fine needle aspirate of an esophageal or a suspected esophageal tumor or neoplasm.
- the fine needle aspirate sampling procedure may be guided by the use of an ultrasound, X-ray, or other imaging device.
- the methods for obtaining a biological sample from a subject may include methods of biopsy such as fine needle aspiration, core needle biopsy, vacuum assisted biopsy, incisional biopsy, excisional biopsy, punch biopsy, shave biopsy or skin biopsy.
- the biological sample is obtained from a biopsy from liver tissue by any of the biopsy methods previously mentioned.
- the biological sample may be obtained from any of the tissues provided herein that include but are not limited to non-cancerous or cancerous tissue from the serum, gall bladder, mucosal, skin, heart, lung, breast, pancreas, blood, liver, muscle, kidney, smooth muscle, bladder, colon, intestine, brain, prostate, esophagus, or thyroid tissue.
- the sample may be obtained from any other source including but not limited to blood, plasma, serum, urine, breastmilk, semen, sweat, hair follicle, buccal tissue, tears, menses, feces, saliva, or cells.
- any medical professional such as a doctor, nurse or medical technician may obtain a biological sample for testing.
- the biological sample can be obtained without the assistance of a medical professional.
- the biological sample may be obtained from a subject directly, from a medical professional, from a third party, or from a kit provided by a molecular profiling business or a third party.
- the biological sample may be obtained by the molecular profiling business after the subject, a medical professional, or a third party acquires and sends the biological sample to the molecular profiling business.
- the molecular profiling business may provide suitable containers, and excipients for storage and transport of the biological sample to the molecular profiling business.
- a medical professional need not be involved in the initial diagnosis or biological sample acquisition.
- a subject may alternatively provide a biological sample through the use of an over the counter (OTC) kit.
- an OTC kit may contain a means for providing the biological sample as described herein, a means for storing the biological sample for inspection, and instructions for proper use of the OTC kit.
- molecular profiling services are included in the price for purchase of the kit. In other cases, the molecular profiling services are billed separately.
- a biological sample suitable for use by the molecular profiling business may contain tissues, cells, nucleic acids, genes, gene fragments, expression products, gene expression products, or gene expression product fragments of a subject.
- the subject may be referred to a specialist such as an oncologist, surgeon, or endocrinologist.
- the specialist may likewise obtain a biological sample for testing or refer the individual to a testing center or laboratory for submission of the biological sample.
- the medical professional may refer the subject to a testing center or laboratory for submission of the biological sample.
- the subject may provide the biological sample.
- a molecular profiling business may obtain the biological sample.
- the level of the one or more cell-free (cfRNA) biomarkers is a gene expression level.
- the methods disclosed herein include measuring expression of coding and/or noncoding cfRNA genes.
- the expression of coding and/or noncoding RNA or DNA is analyzed. Measurement of expression can be done by a number of processes known in the art. The process of measuring expression may begin by isolating or extracting RNA from a biological sample (e.g., tissue sample, blood sample, plasma sample, etc.).
- isolation or extraction of cfRNA does not require applying a cell lysis step.
- a cell lysis step may be applied to induce release of polynucleotide from the cell.
- cell lysis may be induced by applying detergent, mechanical shearing, heat, or any other method known in the art for lysing a cell.
- one or more commercially available kits may be used for isolation of cfRNA. Examples include kits from Qiagen (e.g., QIAamp Circulating Nucleic Acid kit), Thermo Fisher Scientific (e.g., MagMAX Cell-Free Total Nucleic Acid kit), Zymo Research (e.g., Quick-cfRNA Serum & Plasma kit).
- a skilled person can select appropriate kits and methods for isolating or extracting cfRNA.
- the level of the one or more cfRNA biomarkers is analyzed or measured by hybridization (for example, by means of Northern blot analysis or DNA or RNA arrays (microarrays) after converting RNA into labeled complementary DNA (cDNA)) and/or amplification by means of an enzymatic chain reaction.
- quantitative or semi-quantitative enzymatic amplification methods such as polymerase chain reaction (PCR) or quantitative real-time RT-PCR or semi-quantitative RT-PCR techniques may be used.
- other amplification methods include ligase chain reaction (LCR), transcription-mediated amplification (TMA), strand displacement amplification (SDA), and nucleic acid sequence based amplification (NASBA).
- “primer” refers to a single-stranded polynucleotide configured to hybridize with a complementary polynucleotide strand and define a region or locus of the polynucleotide where amplification will initiate.
- a “primer pair” refers to two primers configured to hybridize with a polynucleotide and define a region or locus that will be amplified.
- a typical PCR reaction relies on a “forward” primer and a “reverse” primer, used conjunctively as a primer pair, to hybridize to, respectively, the antisense and sense strands of a double-stranded polynucleotide (e.g., DNA).
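As a minimal sketch of how a forward and a reverse primer bracket the amplified region: the forward primer anneals to the antisense strand, so its sequence appears directly in the sense strand, while the reverse primer anneals to the sense strand, so its reverse complement appears there. All sequences below are made up for illustration and are not SEQ ID NOs from Table 17; the function names are hypothetical, and real primer matching would also account for mismatches and melting temperature.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_amplicon(sense, forward, reverse):
    """Return the sense-strand region bounded by the primer pair, or None if either site is absent."""
    start = sense.find(forward)
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    end = sense.find(reverse_site, start + len(forward))
    if end < 0:
        return None
    return sense[start:end + len(reverse_site)]

# Illustrative template assembled from flanks, primer sites, and an insert.
forward_primer = "ATGGCTAGC"
reverse_site_on_sense = "CGATCGTACG"
reverse_primer = reverse_complement(reverse_site_on_sense)
template = "GGATCC" + forward_primer + "TTGAC" + reverse_site_on_sense + "GAATTC"

amplicon = find_amplicon(template, forward_primer, reverse_primer)
```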
- use of a primer pair means use of a primer pair configured to amplify a specific region or locus, such as a selected cfRNA biomarker.
- primer pairs are selected to amplify one or more cfRNA biomarkers (see Table 17).
- the method uses any of: SEQ ID NO: 1 and SEQ ID NO: 2 as a primer pair; SEQ ID NO: 3 and SEQ ID NO: 4 as a primer pair; SEQ ID NO: 5 and SEQ ID NO: 6 as a primer pair; SEQ ID NO: 7 and SEQ ID NO: 8 as a primer pair; SEQ ID NO: 9 and SEQ ID NO: 10 as a primer pair; SEQ ID NO: 11 and SEQ ID NO: 12 as a primer pair; SEQ ID NO: 13 and SEQ ID NO: 14 as a primer pair; SEQ ID NO: 15 and SEQ ID NO: 16 as a primer pair; SEQ ID NO: 17 and SEQ ID NO: 18 as a primer pair; SEQ ID NO: 19 and SEQ ID NO: 20 as a primer pair; SEQ ID NO: 21 and SEQ ID NO: 22 as a primer pair;
- each method herein uses each individual primer pair previously mentioned. For instance, one embodiment for each method uses the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6 and another embodiment for each method uses the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, and so on.
- gene expression levels of the one or more cfRNA biomarkers may also be analyzed by RNA sequencing methods known in the art.
- RNA sequencing methods may include cfRNA-seq, total RNA-seq, targeted RNA-seq, small RNA-seq, single-cell RNA-seq, ultra-low-input RNA-seq, RNA exome capture sequencing, and ribosome profiling. Sequencing data may be processed and aligned using methods known in the art.
- a method for analyzing one or more cfRNA biomarkers by sequencing comprises: (a) isolating a set of one or more cfRNA biomarkers from the biological sample; (b) analyzing the set of one or more cfRNA biomarkers isolated in Step (a) to produce a set of one or more sequence reads; and (c) performing a differential expression analysis on the set of one or more sequence reads to a corresponding consensus sequence (CS) to measure the level of at least one cell-free RNA (cfRNA) biomarker selected from the group consisting of AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, or any combination thereof.
- a differential expression shown between the set of one or more sequence reads aligning with a corresponding CS indicates cancer or a predisposition for cancer in
- the analysis used to obtain sequencing reads of Step (b) is: Maxam-Gilbert sequencing, chain-termination sequencing, pyrosequencing, or massive parallel sequencing, or any combination thereof.
- the differential expression analysis is selected from the group of: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.
- one or more primer pairs selected from the group of: SEQ ID NO: 1 and SEQ ID NO: 2; SEQ ID NO: 3 and SEQ ID NO: 4; SEQ ID NO: 5 and SEQ ID NO: 6; SEQ ID NO: 7 and SEQ ID NO: 8; SEQ ID NO: 9 and SEQ ID NO: 10; SEQ ID NO: 11 and SEQ ID NO: 12; SEQ ID NO: 13 and SEQ ID NO: 14; SEQ ID NO: 15 and SEQ ID NO: 16; SEQ ID NO: 17 and SEQ ID NO: 18; SEQ ID NO: 19 and SEQ ID NO: 20; SEQ ID NO: 21 and SEQ ID NO: 22; SEQ ID NO: 23 and SEQ ID NO: 24; SEQ ID NO: 25 and SEQ ID NO: 26; SEQ ID NO: 27 and SEQ ID NO: 28; SEQ ID NO: 29 and SEQ ID NO: 30; SEQ ID NO: 31 and SEQ ID NO: 32; SEQ ID NO: 33 and SEQ ID NO: 10; SEQ ID NO:
- one or more cfRNA biomarkers from the group of: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected or utilized to indicate blood cancer or a predisposition to blood cancer.
- one or more cfRNA biomarkers are selected or utilized to indicate multiple myeloma (MM).
- the cfRNA biomarkers CENPE, HBG1, HBG2, and NUSAP1 are selected or utilized to indicate MM.
- one or more cfRNA biomarkers from the group of: FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof are selected to indicate monoclonal gammopathy of undetermined significance (MGUS).
- one or more cfRNA biomarkers from the group of: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected or utilized to indicate liver cancer or a predisposition to liver cancer.
- one or more cfRNA biomarkers from the group of: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or any combination thereof, are selected or utilized to indicate hepatocellular carcinoma (HCC).
- the cfRNA biomarkers C3, CP, FGA, FGB, and IFITM3 are selected or utilized to indicate HCC.
- one or more cfRNA biomarkers from the group of: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected or utilized to indicate cirrhosis.
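The biomarker-to-indication groupings above lend themselves to a simple lookup. The sketch below encodes only the two smaller panels named above (CENPE, HBG1, HBG2, NUSAP1 for MM; C3, CP, FGA, FGB, IFITM3 for HCC) and reports which panel genes appear in a list of differentially expressed genes. The names `CORE_PANELS` and `panel_hits` are hypothetical, and the text does not specify how many panel genes must be differentially expressed before an indication is called, so the function reports overlap only.

```python
# Hypothetical lookup built from the two smaller panels named in the text.
CORE_PANELS = {
    "multiple myeloma": {"CENPE", "HBG1", "HBG2", "NUSAP1"},
    "hepatocellular carcinoma": {"C3", "CP", "FGA", "FGB", "IFITM3"},
}

def panel_hits(differential_genes):
    """Map each indication to the sorted panel genes found among the differentially expressed genes."""
    observed = set(differential_genes)
    hits = {}
    for indication, panel in CORE_PANELS.items():
        overlap = panel & observed
        if overlap:
            hits[indication] = sorted(overlap)
    return hits

# ACTB is included only as an example of a gene outside both panels.
example_hits = panel_hits(["FGA", "C3", "ACTB"])
```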
- the sequencing reads of Step (b) are obtained by: Maxam-Gilbert sequencing, chain-termination sequencing, pyrosequencing, massive parallel sequencing, or any combination thereof.
- the differential expression analysis is selected from the group of: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.
- a CV may be of a gene for which the expression level does not differ across sample types, for example a gene that is constitutively expressed in all types of cells.
- a CV may be of a gene for which the expression level indicates a non-cancerous state in the subject.
- a known amount of a control RNA may be added to the sample(s) and the value analyzed for the level of the RNA of interest may be normalized to the value analyzed for the known amount of the control RNA.
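Normalization to a spiked-in control RNA can be sketched as a simple ratio. This assumes, for illustration only, that measured signal scales linearly with RNA amount for both the control and the RNA of interest; the function name and the units are hypothetical.

```python
def normalize_to_spike_in(target_signal, spike_signal, spike_amount):
    """Estimate the target amount in the same units as the known spike-in amount.

    Assumes a linear relationship between signal and amount for both RNAs.
    """
    if spike_signal <= 0:
        raise ValueError("spike-in signal must be positive")
    return target_signal / spike_signal * spike_amount

# Example: the target reads 4x the signal of a 10-unit spike-in control.
estimated_amount = normalize_to_spike_in(200.0, 50.0, 10.0)
```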
- Normalization for some methods may comprise calculating the reads per kilobase of transcript per million mapped reads (RPKM) for a gene of interest, or may comprise calculating the fragments per kilobase of transcript per million mapped reads (FPKM) for a gene of interest. Normalization methods may comprise calculating the log2-transformed count per million (log-CPM). Skilled persons will understand that any method of normalization that accurately calculates the expression value of an RNA for comparison between samples may be used.
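The normalization quantities named above can be written out directly. This is a sketch using the standard definitions: RPKM divides a gene's read count by its length in kilobases and by the number of mapped reads in millions, and FPKM uses the same formula with fragments in place of reads. The `prior_count` pseudocount in `log_cpm` is an assumption added here to avoid taking the logarithm of zero; its value is a free choice.

```python
import math

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Same formula as RPKM, counting fragments (e.g., read pairs) instead of reads."""
    return rpkm(fragment_count, gene_length_bp, total_mapped_fragments)

def log_cpm(read_count, total_mapped_reads, prior_count=0.5):
    """log2 of counts per million, with an assumed pseudocount added to the count."""
    cpm = (read_count + prior_count) * 1e6 / total_mapped_reads
    return math.log2(cpm)

# Example: 500 reads on a 2,000 bp transcript in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
```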
- the CV is a reference expression level.
- reference expression level refers to a value used as a reference for the values/data obtained from samples obtained from a subject.
- the reference level can be an absolute value, a relative value, a value which has an upper and/or lower limit, a series of values, an average value, a median, a mean value, or a value expressed by reference to a control or reference value.
- a reference level can be based on the value obtained from an individual sample, such as, for example, a value obtained from a sample from the subject but obtained at a previous point in time.
- the reference level can be based on a high number of samples, such as the levels obtained in a cohort of subjects having a particular characteristic.
- the reference level may be defined as the mean level of the patients in the cohort.
- a reference level can be based on the expression levels of the biomarkers obtained from samples from subjects who do not have a disease state or a particular phenotype. Skilled persons will understand that the particular reference expression level can vary depending on the specific method to be performed.
- Some embodiments include determining that an analyzed expression level is higher than, lower than, increased relative to, decreased relative to, equal to, or within a predetermined amount of a reference expression level.
- a higher, lower, increased, or decreased expression level is at least 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 50, 100, 150, 200, 250, 500, or 1000 fold (or any derivable range therein) or at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, or 900% different than the reference level, or any derivable range therein.
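A comparison against a reference level in terms of fold change or percent difference can be sketched as follows. The 2-fold default threshold is simply one of the values listed above, picked for illustration; the function names are hypothetical, and levels are assumed to be positive values on a linear (not log) scale.

```python
def fold_change(level, reference):
    """Ratio of the analyzed level to the reference level."""
    return level / reference

def percent_difference(level, reference):
    """Absolute difference from the reference, expressed as a percentage of the reference."""
    return abs(level - reference) / reference * 100

def is_differential(level, reference, min_fold=2.0):
    """True if the level is at least min_fold higher or lower than the reference."""
    ratio = fold_change(level, reference)
    return ratio >= min_fold or ratio <= 1 / min_fold
```

For example, a level of 30 against a reference of 10 is a 3-fold (200%) increase and would be flagged at the 2-fold threshold, whereas a level of 12 would not.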
- a level of expression may be qualified as “low” or “high,” which indicates the patient expresses a certain gene or cfRNA at a level relative to a reference level or a level with a range of reference levels that are determined from multiple samples meeting particular criteria.
- the level or range of levels in multiple control samples is an example of this.
- that certain level or a predetermined threshold value is at, below, or above 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
- a threshold level may be derived from a cohort of individuals meeting particular criteria.
- the number in the cohort may be, be at least, or be at most 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370,
- An analyzed expression level can be considered equal to a reference expression level if it is within a certain amount of the reference expression level, and such amount may be an amount that is predetermined.
- the predetermined amount may be within 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, or 50% of the reference level, or any range derivable therein.
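Treating an analyzed level as equal to a reference level when it falls within a predetermined percentage can be sketched as a tolerance check. The 10% default is one of the listed values, chosen arbitrarily for illustration, and the function name is hypothetical.

```python
def is_equal_to_reference(level, reference, tolerance_percent=10.0):
    """True if the level lies within tolerance_percent of the reference level."""
    allowed = abs(reference) * tolerance_percent / 100.0
    return abs(level - reference) <= allowed
```

With the default tolerance, a level of 105 against a reference of 100 counts as equal, while a level of 120 does not.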
- a comparison of cfRNA gene expression levels to a reference expression level is to be made on a gene-by-gene basis. For example, if the expression levels of gene A, gene B, and gene X, as reflected in a patient’s cfRNA levels, are analyzed, a comparison to mean expression levels as reflected in cfRNA from a cohort of patients would involve: comparing the expression level of gene A in the patient’s cfRNA with the mean expression level of gene A reflected in cfRNA from the cohort of patients, comparing the expression level of gene B reflected in the patient’s cfRNA with the mean expression level of gene B in cfRNA from the cohort of patients, and comparing the expression level of gene X in cfRNA from the patient with the mean expression level of gene X in cfRNA from the cohort of patients.
- genes A, B, and X may be selected from any one of: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH for comparison. Comparisons that involve determining whether the expression level analyzed in cfRNA from a patient is within a predetermined amount of a mean expression level or reference expression level are similarly done on a gene-by-gene basis, as applicable.
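The gene-by-gene comparison against cohort means can be sketched as follows. The expression values are invented for illustration, the gene names are taken from the list above, and `compare_gene_by_gene` is a hypothetical helper, not a method named in the text.

```python
from statistics import mean

def compare_gene_by_gene(patient, cohort):
    """Compare each patient gene level with the cohort mean for that same gene.

    patient: dict mapping gene -> expression level
    cohort:  list of dicts with the same genes, one dict per cohort member
    """
    results = {}
    for gene, level in patient.items():
        reference = mean(sample[gene] for sample in cohort)
        results[gene] = {"reference": reference, "fold": level / reference}
    return results

# Made-up levels for two cohort members and one patient.
cohort_levels = [
    {"CENPE": 10.0, "HBG1": 40.0},
    {"CENPE": 30.0, "HBG1": 60.0},
]
patient_levels = {"CENPE": 60.0, "HBG1": 50.0}
comparison = compare_gene_by_gene(patient_levels, cohort_levels)
```

Each gene is evaluated only against its own reference, so a large change in one gene does not affect the result for another.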
- a differential expression analysis is performed comparing the level of each cfRNA biomarker that is analyzed or utilized to a corresponding control value (CV). Differential expression shown by the differential expression analysis between the cfRNA biomarkers and corresponding CVs indicates cancer or a predisposition for cancer in the subject.
- the differential expression analysis comprises: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.
- the method measures the level of one or more cfRNA biomarker levels by Maxam-Gilbert sequencing, chain-termination sequencing, pyrosequencing, or massive parallel sequencing.
- DNA from the biological sample, cDNA derived from RNA from the biological sample, and/or amplification products of any of these are sequenced to produce sequencing reads identifying the order of nucleotides present in the sequenced polynucleotides or the complements thereof.
- a variety of suitable sequencing techniques are available.
- the method comprises: (a) collecting a biological sample from the subject; (b) isolating a set of one or more cfRNA molecules from the biological sample collected in Step (a); (c) sequencing the set of one or more cfRNA molecules isolated in Step (b) to produce a set of one or more sequence reads; and (d) performing a differential expression analysis on the set of one or more sequence reads to a corresponding consensus sequence (CS) to measure the level of at least one cell-free RNA (cfRNA) biomarker selected from the group consisting of AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, or any combination thereof, in the biological sample.
- Differential expression between the set of one or more sequence reads aligning with a corresponding CS indicates cancer or a predisposition for cancer
- sequencing comprises massively parallel sequencing of about, or at least about 10,000, 100,000, 500,000, 1,000,000, or more DNA or cDNA molecules using a high-throughput sequencing by synthesis process, such as Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry (e.g. as described in Bentley et al., Nature 6:53-59 (2009)).
- Illumina's sequencing process comprises attachment of template DNA to a planar, optically transparent surface on which oligonucleotide anchors are bound.
- template DNA may include cDNA.
- Template DNA is end-repaired to generate 5'-phosphorylated blunt ends, and the polymerase activity of Klenow fragment is used to add a single A base to the 3' end of the blunt phosphorylated DNA. This addition prepares the DNA for ligation to oligonucleotide adapters, which optionally have an overhang of a single T base at their 3' end to increase ligation efficiency.
- the adapter oligonucleotides are complementary to the flow-cell anchor oligos. Under limiting-dilution conditions, adapter-modified, single-stranded template DNA is added to the flow cell and immobilized by hybridization to the anchor oligos.
- Attached DNA fragments are extended and bridge amplified to create an ultra-high-density sequencing flow cell with hundreds of millions of clusters, each containing about 1,000 copies of the same template.
- the template DNA is amplified using PCR before it is subjected to cluster amplification, such as in a process described above.
- the templates are sequenced using a robust four-color DNA sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes. High-sensitivity fluorescence detection is achieved using laser excitation and total internal reflection optics. Short sequence reads of about tens to a few hundred base pairs are aligned against a reference genome, and unique mappings of the short sequence reads to the reference genome are identified using specially developed data analysis pipeline software. After completion of the first read, the templates can be regenerated in situ to enable a second read from the opposite end of the fragments. Thus, either single-end or paired-end sequencing of the DNA fragments can be used.
- Another non-limiting example sequencing process is the Helicos True Single Molecule Sequencing (tSMS) technology (e.g., as described in Harris T. D. et al., Science 320: 106-109 (2008)).
- a DNA sample is cleaved into, or otherwise provided as strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3' end of each DNA strand.
- Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide.
- the DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites that are immobilized to the flow cell surface.
- the templates are at a density of about 100 million templates/cm².
- the flow cell is then loaded into an instrument, e.g., a HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template.
- a CCD camera can map the position of the templates on the flow cell surface.
- the template fluorescent label is then cleaved and washed away.
- the sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide.
- the oligo-T nucleic acid serves as a primer.
- the polymerase incorporates the labeled nucleotides to the primer in a template directed manner.
- the polymerase and unincorporated nucleotides are removed.
- the templates that have directed incorporation of the fluorescently labeled nucleotide are discerned by imaging the flow cell surface.
- a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step.
- Whole genome sequencing by single molecule sequencing technologies excludes or typically obviates PCR-based amplification in the preparation of the sequencing libraries.
- 454 sequencing typically involves two steps. In the first step, DNA is sheared into fragments of, or otherwise provided (e.g. as naturally occurring cfDNA molecules, or cDNA from naturally short RNA molecules) as DNA having sizes of approximately 300-800 base pairs, and the polynucleotides are blunt-ended. Oligonucleotide adapters are then ligated to the ends of the DNA. The adapters serve as primers for amplification and sequencing of the DNA.
- the DNA can be attached to capture beads, e.g., streptavidin-coated beads using, e.g., adapter B, which contains a 5'-biotin tag.
- the DNA attached to the beads is PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA molecules on each bead.
- the beads are captured in wells (e.g., picoliter-sized wells). Pyrosequencing is performed on each DNA molecule in parallel.
- PPi pyrophosphate
- ATP adenosine triphosphate
- Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is measured and analyzed.
- Non-limiting examples include sequencing by ligation technologies (e.g., SOLiD™ sequencing of Applied Biosystems), single-molecule real-time sequencing (e.g., Pacific Biosciences sequencing platforms utilizing zero-mode waveguide detectors), nanopore sequencing (e.g. as described in Soni G V and Meller A. Clin Chem 53: 1996-2001 (2007)), sequencing using a chemical-sensitive field effect transistor (e.g., as described in U.S. Patent Application Publication No. 20090026082), sequencing platforms by Ion Torrent (pairing semiconductor technology with sequencing chemistry to directly translate chemically encoded information (A, C, G, T) into digital information (0, 1) on a semiconductor chip), and sequencing by hybridization. Additional illustrative details regarding sequencing technologies can be found in, e.g., U.S. Patent Application Publication No. 2016/0319345.
- UMIs unique molecular identifiers
- multiple sequence reads having the same UMI(s) are collapsed to obtain one or more consensus sequences, which are then used to determine the sequence of a source DNA polynucleotide.
- Multiple distinct reads may be generated from distinct instances of the same source DNA polynucleotide, and these reads may be compared to produce a consensus sequence.
- the instances may be generated by amplifying a source DNA molecule prior to sequencing, such that distinct sequencing operations are performed on distinct amplification products, each sharing the source DNA polynucleotide's sequence.
- amplification may introduce errors such that the sequences of the distinct amplification products have differences.
- a source DNA molecule or an amplification product thereof forms a cluster of DNA molecules linked to a region of a flow cell.
- the molecules of the cluster collectively provide a read.
- at least two reads are required to provide a consensus sequence.
- Sequencing depths of 100, 1000, and 10,000 are examples of sequencing depths useful in the disclosed embodiments for creating consensus reads for low allele frequencies (e.g., about 1% or less).
- nucleotides that are consistent across 100% of the reads sharing a UMI or combination of UMIs are included in the consensus sequence.
- consensus criterion can be lower than 100%.
- a 90% consensus criterion may be used, which means that base pairs that exist in 90% or more of the reads in the group are included in the consensus sequence.
- the consensus criterion may be set at about, or more than about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, or about 100%.
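- The UMI-based collapsing and consensus criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `umi_consensus`, the toy reads, and the choice to report a position that fails the criterion as `N` are all hypothetical.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, criterion=0.9):
    """Collapse (umi, sequence) pairs into one consensus string per UMI.

    A base is kept when at least `criterion` of the reads in the group agree.
    Assumption: positions that fail the criterion are reported as 'N'.
    Groups with fewer than two reads are skipped.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < 2:
            continue  # at least two reads are needed for a consensus
        length = min(len(s) for s in seqs)  # assumption: compare the shared prefix only
        called = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            called.append(base if count / len(seqs) >= criterion else "N")
        consensus[umi] = "".join(called)
    return consensus


# Toy data: three reads share one UMI, one of them carries an amplification error.
reads = [
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGAAC"),
    ("CCTA", "TTGCAA"),  # singleton group, no consensus
]
print(umi_consensus(reads, criterion=0.6))  # {'AAGT': 'ACGTAC'}
print(umi_consensus(reads, criterion=0.9))  # {'AAGT': 'ACGNAC'}
```

- With a 60% criterion the majority base survives at the disputed position, while a 90% criterion leaves that position uncalled, which shows how the threshold trades yield against stringency.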
- sequencing reads are identified as originating from an RNA molecule in the source sample if the tag sequence (or the complement thereof) forms part of the sequence read (optionally, at an expected position, and/or adjacent to other expected sequence element(s)), and otherwise are identified as originating from a DNA molecule in the source sample if the tag sequence (or the complement thereof) is absent.
- RNA sequencing reads and DNA sequencing reads can be produced in a single sequencing reaction, but analyzed separately, and optionally compared to one another.
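- The tag-based separation of RNA-derived and DNA-derived reads can be sketched as below. The tag sequence `GATTACA`, the function names, and the use of the reverse complement as "the complement thereof" are assumptions made for illustration; the optional positional check is omitted.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def classify_read(read, tag):
    """Label a read 'RNA' if the tag or its reverse complement occurs in it, else 'DNA'."""
    if tag in read or reverse_complement(tag) in read:
        return "RNA"
    return "DNA"


TAG = "GATTACA"  # hypothetical tag oligonucleotide sequence
example_reads = ["TTGATTACAGG", "CCTGTAATCAA", "ACGTACGT"]
print([classify_read(r, TAG) for r in example_reads])  # ['RNA', 'RNA', 'DNA']
```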
- a processor is used to group RNA-derived sequences separately from DNA-derived sequences. For example, in some embodiments, a mutation relative to an internal reference (e.g. overlapping reads) or an external reference (e.g. a reference genome) is only designated as accurately representing the original molecule (e.g. a DNA molecule of the sample) if the same mutation is identified in one or more reads corresponding to an original molecule of the other type (e.g. an RNA molecule of the sample).
- This is particularly helpful for increasing sequencing accuracy in cases where no UMIs are used, and can further increase sequencing accuracy when used in combination with UMIs.
- one or more sequences corresponding to features known not to be present in the source polynucleotides (e.g. sequences known to originate from tag oligonucleotides, RT primers, TSOs, or amplification primers) are computationally ignored (e.g. filtered out of the reads prior to alignment).
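- One simple way to ignore known non-source sequences before alignment is to strip them from the start of each read. The sketch below assumes the unwanted sequence sits at the 5' end of the read; the oligonucleotide sequences and the function name are hypothetical placeholders.

```python
def strip_known_sequences(read, known_sequences):
    """Remove a known non-source sequence from the 5' end of a read, if present."""
    for oligo in known_sequences:
        if read.startswith(oligo):
            return read[len(oligo):]
    return read


# Hypothetical primer/TSO sequences; real ones come from the library preparation design.
KNOWN = ["AAGCAGTGGT", "GGCCGG"]
cleaned = [strip_known_sequences(r, KNOWN) for r in ["AAGCAGTGGTACGTTT", "TTTACGT"]]
print(cleaned)  # ['ACGTTT', 'TTTACGT']
```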
- sequencing reads are localized (mapped) by aligning the reads to a known reference genome.
- localization is realized by k-mer sharing and read-read alignment.
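- Localization by k-mer sharing can be illustrated with a small voting scheme: each k-mer that a read shares with a target sequence (a reference or another read) votes for the offset it implies, and the offset with the most votes wins. This is one possible reading of the approach, with hypothetical function names and a toy sequence.

```python
from collections import Counter, defaultdict


def build_kmer_index(target, k):
    """Map every k-mer in `target` to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(target) - k + 1):
        index[target[pos:pos + k]].append(pos)
    return index


def locate_read(read, index, k):
    """Return the target offset supported by the most shared k-mers, or None."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        for pos in index.get(read[i:i + k], ()):
            votes[pos - i] += 1  # offset of the read start implied by this k-mer
    if not votes:
        return None
    return votes.most_common(1)[0][0]


target = "TTGACCGTAGGCTAACGTTAGC"  # toy sequence, not a real reference
index = build_kmer_index(target, k=4)
print(locate_read(target[5:15], index, k=4))  # 5
```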
- the reference genome sequence is GRCh37/hg19 or GRCh38, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway.
- GenBank
- dbEST the database of expressed sequence tags
- EMBL the European Molecular Biology Laboratory
- DDBJ the DNA Databank of Japan
- a number of computer algorithms are available for aligning sequences, including without limitation BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), or ELAND (Illumina, Inc., San Diego, Calif., USA).
- one end of clonally expanded copies of plasma polynucleotide molecules is sequenced and processed by bioinformatics alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software.
- ELAND Efficient Large-Scale Alignment of Nucleotide Databases
- the mutation creates a premature stop codon in a tumor suppressor gene
- the source polynucleotide originated from a cancer cell, particularly if a statistically significant number of cancer-associated markers are detected in the sequencing reads.
- one or more causal genetic variants are sequence variants associated with a particular type or stage of cancer, or of cancer having a particular characteristic (e.g. metastatic potential, drug resistance, drug responsiveness).
- causal variant refers to genetic variants responsible for an associated signal at a locus, such as biological effect on the phenotype of the subject.
- the disclosure provides methods for the determination of prognosis, such as where certain mutations or other genetic characteristics are known to be associated with patient outcomes.
- methods of the present disclosure comprise treating a subject based on RNA and DNA polynucleotide biomarkers analyzed in a sample from the subject.
- methods disclosed herein can be used in making therapeutic decisions, guidance and monitoring, as well as development and clinical trials of cancer therapies.
- treatment efficacy can be monitored by comparing an individual’s DNA and RNA in samples from before, during, and after treatment with particular therapies such as molecular targeted therapies (monoclonal drugs), chemotherapeutic drugs, radiation protocols, etc. or combinations of these.
- the subject is identified as having MM using the methods provided herein and is treated with one or more of immunotherapy (such as elotuzumab, daratumumab, or isatuximab), corticosteroids (such as dexamethasone), immunomodulating agents (such as thalidomide, lenalidomide, or pomalidomide), proteasome inhibitors (such as bortezomib, carfilzomib, or ixazomib), chemotherapy (such as cisplatin, doxorubicin, cyclophosphamide, etoposide, melphalan, and/or bendamustine), CAR-T therapy (such as idecabtagene vicleucel and/or ciltacabtagene autoleucel), and bone marrow transplant.
- immunotherapy such as elotuzumab, daratumumab, or isatuximab
- corticosteroids such as dexamethasone
- the subject is identified as having HCC using the methods provided herein and is treated with one or more of surgery (such as hepatectomy), radiation therapy, radiofrequency ablation, percutaneous ethanol injection, radioembolization, chemoembolization, immunotherapy (such as bevacizumab, atezolizumab, ramucirumab, pembrolizumab, and/or nivolumab), targeted therapy (such as sorafenib, lenvatinib, cabozantinib, and/or regorafenib), chemotherapy (such as doxorubicin, gemcitabine, oxaliplatin, cisplatin, 5-fluorouracil, capecitabine, and/or mitoxantrone), and/or liver transplant.
- surgery such as hepatectomy
- immunotherapy such as bevacizumab, atezolizumab, ramucirumab, pembrolizumab, and/or nivolumab
- a skilled clinician can select appropriate treatment regimen(s) based on the subject, disease being treated, stage of disease, condition of the subject, and other factors.
- a series of samples collected over time from a single subject may be monitored to see if certain mutations, expression levels, or other phenotypic changes occur without treatment (e.g., longitudinal testing to monitor cancer staging from non-cancer to pre-malignancy or pre-malignancy to cancer).
- cell-free polynucleotides are monitored to see if certain mutations, expression levels, or other features of DNA or RNA increase or decrease, or new mutations appear, after treatment, which can allow a physician to alter a treatment (continue, stop or change treatment, for example) in a much shorter period of time than afforded by methods of monitoring that track traditional patient symptoms.
- a subject identified as having a predisposition to cancer (such as MM or HCC) is monitored at intervals such as every 3 months, every 6 months, annually, every 2 years, or more to identify if progression to cancer has occurred or is occurring.
- the subject has a predisposition to MM (for example, has MGUS) and the monitoring may include one or more of a second (or more) screen with the methods provided herein, blood tests (such as to detect M protein and/or β2-microglobulin, blood cell counts, and/or calcium levels), urine tests (such as to detect M protein), bone marrow biopsy, and/or imaging tests.
- the subject has a predisposition to HCC (for example, has liver cirrhosis) and the monitoring may include one or more of a second (or more) screening with the methods provided herein, diagnostic imaging, and/or liver biopsy.
- a method further comprises the step of diagnosing an individual based on the RNA-derived sequences and DNA-derived sequences, such as diagnosing the subject with a particular stage or type of cancer associated with a detected sequence variant, or reporting a likelihood that the patient has or will develop such cancer.
- the present disclosure provides systems, such as computer systems, for implementing methods described herein, including with respect to any of the various other aspects of this disclosure. It should be understood that it is not practical, or even possible in most cases, for an unaided human being to perform computational operations involved in some embodiments of methods disclosed herein. For example, mapping a single 30 bp read from a sample to any one of the human chromosomes might require years of effort without the assistance of a computational apparatus. Of course, the challenge of unaided sequence analysis and alignment is compounded in cases where reliable calls of low allele frequency mutations require mapping thousands (e.g., at least about 10,000) or even millions of reads to one or more chromosomes.
- the disclosure provides tangible and/or non-transitory computer readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations.
- Examples of computer-readable media include, but are not limited to, semiconductor memory devices, magnetic media such as disk drives, magnetic tape, optical media such as CDs, magneto-optical media, and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM).
- ROM read-only memory devices
- RAM random access memory
- the computer readable media may be directly controlled by an end user or the media may be indirectly controlled by the end user. Examples of directly controlled media include the media located at a user facility and/or media that are not shared with other entities.
- Examples of indirectly controlled media include media that is indirectly accessible to the user via an external network and/or via a service providing shared resources such as the "cloud."
- Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
- the data or information employed in methods and systems disclosed herein are provided in an electronic format.
- data or information include, but are not limited to, sequencing reads derived from a nucleic acid sample, reference sequences (including reference sequences providing solely or primarily polymorphisms), sequences of one or more oligonucleotides used in the preparation of the sequencing reads (including portions thereof, and/or complements thereof), calls such as cancer diagnosis calls, counseling recommendations, diagnoses, and the like.
- data or other information provided in electronic format is available for storage on a machine and transmission between machines.
- data in electronic format is provided digitally and may be stored as bits and/or bytes in various data structures, lists, databases, etc. The data may be embodied electronically, optically, etc.
- a computer program product for generating an output indicating the sequences of DNA and RNA in a test sample.
- the computer product may contain instructions for performing any one or more of the above-described methods for determining DNA and RNA sequences.
- the computer product may include a non-transitory and/or tangible computer readable medium having a computer executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to determine a sequence of interest.
- the computer product includes a computer readable medium having a computer executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to diagnose a condition and/or determine a nucleic acid sequence of interest.
- methods described herein are performed using a computer processing system which is adapted or configured to perform a method for determining the sequence of polynucleotides derived from DNA and RNA of a sample, such as one or more sequences of interest (e.g. an expressed gene or portion thereof).
- a computer processing system is adapted or configured to perform a method as described herein.
- the system includes a sequencing device adapted or configured for sequencing polynucleotides to obtain the type of sequence information described elsewhere herein, such as with regard to any of the various aspects described herein.
- the apparatus includes components for processing the sample, such as liquid handlers and sequencing systems, comprising modules for implementing one or more steps of any of the various methods described herein (e.g. sample processing, polynucleotide purification, and various reactions, such as RT reactions, amplification reactions, and sequencing reactions).
- sequence or other data is input into a computer or stored on a computer readable medium either directly or indirectly.
- a computer system is directly coupled to a sequencing device that reads and/or analyzes sequences of nucleic acids from samples. Sequences or other information from such tools are provided via an interface in the computer system. Alternatively, the sequences processed by the system are provided from a sequence storage source such as a database or other repository.
- a memory device or mass storage device buffers or stores, at least temporarily, sequences of the nucleic acids.
- the memory device may store read counts for various chromosomes or genomes, etc.
- the memory may also store various routines and/or programs for analyzing the sequence or mapped data.
- the programs/routines include programs for performing statistical analyses.
- a user provides a polynucleotide sample into a sequencing apparatus.
- Data is collected and/or analyzed by the sequencing apparatus which is connected to a computer.
- Software on the computer allows for data collection and/or analysis.
- Data can be stored, displayed (via a monitor or other similar device), and/or sent to another location.
- the computer may be connected to the internet, which is used to transmit data to a handheld device utilized by a remote user (e.g., a physician, scientist or analyst). It is understood that the data can be stored and/or analyzed prior to transmittal.
- raw data is collected and sent to a remote user or apparatus that will analyze and/or store the data. Transmittal can occur via the internet, but can also occur via satellite or other connection.
- data can be stored on a computer-readable medium and the medium can be shipped to an end user (e.g., via mail).
- the remote user can be in the same or a different geographical location including, but not limited to a building, city, state, country or continent.
- the methods comprise collecting data regarding a plurality of polynucleotide sequences (e.g., reads, consensus sequences, and/or reference chromosome sequences) and sending the data to a computer or other computational system.
- the computer can be connected to laboratory equipment, e.g., a sample collection apparatus, a nucleotide amplification apparatus, a nucleotide sequencing apparatus, or a hybridization apparatus.
- the computer can then collect applicable data gathered by the laboratory device.
- the data can be stored on a computer at any step, e.g., while collected in real time, prior to the sending, during or in conjunction with the sending, or following the sending.
- the data can be stored on a computer-readable medium that can be extracted from the computer.
- The data collected or stored can be transmitted from the computer to a remote location, e.g., via a local network or a wide area network such as the internet. At the remote location various operations can be performed on the transmitted data.
- reads obtained by sequencing nucleic acids, consensus sequences based on the reads, the reference genome or sequence, thresholds for calling a test sample as either affected, non-affected, or no call, the actual calls of medical conditions related to the sequence of interest, diagnoses (clinical condition associated with the calls), recommendations for further tests derived from the calls and/or diagnoses, treatment and/or monitoring plans derived from the calls and/or diagnoses.
- these various types of data are obtained, stored, transmitted, analyzed, and/or manipulated at one or more locations using distinct apparatus.
- the processing options span a wide spectrum.
- the sample is obtained at one location, it is processed and optionally sequenced at a different location, reads are aligned and calls are made at one or more different locations, and diagnoses, recommendations, and/or plans are prepared at still another location (which may be a location where the sample was obtained).
- kits that may be used in connection with the disclosed methods and systems.
- the kits include one or more primer pairs (such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more primer pairs) for analyzing or measuring the level of one or more of the disclosed cfRNA biomarkers.
- the kits include up to 10 primer pairs selected from SEQ ID NOs: 23-42, for example for use in methods of diagnosing or treating MM.
- the kits include 4 or more primer pairs, including SEQ ID NOs: 25-32, for example, for use in methods of diagnosing or treating MM.
- the kits include up to 9 primer pairs selected from SEQ ID NOs: 5-22, for example, for methods of diagnosing or treating HCC.
- kits include 5 or more primer pairs, including SEQ ID NOs: 5-14, for example, for use in diagnosing or treating HCC.
- the kits may further include additional components for use in connection with the disclosed methods, such as one or more buffers, enzymes (such as a reverse transcriptase and/or a DNA polymerase), salts, or other reaction components.
- the kits may include reagents for one or more controls, such as primers for amplification of one or more control cfRNAs.
- the kits include one or more control primer pairs selected from the pair of SEQ ID NO: 1 and SEQ ID NO: 2 or the pair of SEQ ID NO: 3 and SEQ ID NO: 4.
- Cell-free RNA (cfRNA) in plasma reflects phenotypic alterations of both localized sites of cancer and the systemic host response.
- the present disclosure provides methods for utilizing cfRNA sequencing to identify messenger RNA (mRNA) signatures in plasma with the tissue of origin specific to cancer types and pre-cancerous conditions.
- mRNA messenger RNA
- Total cfRNA was sequenced from plasma samples of hepatocellular carcinoma (HCC) and multiple myeloma (MM) patients, their respective pre-cancerous conditions and non-cancer donors to explore the diagnostic potential. Distinct gene sets were identified and classification models were built using the random forest and linear discriminant analysis algorithms that could distinguish cancer patients from premalignant conditions and non-cancer individuals with high accuracy. Sequencing data was cross-validated by quantitative reverse transcription PCR and cfRNA biomarkers were validated in independent sample sets with AUC higher than 0.86.
- cfRNA biomarker panels were sequenced from plasma samples of patients with liver cancer (HCC) and multiple myeloma (MM) and their pre-cancerous conditions including liver cirrhosis (Cirr) and MGUS, and non-cancer donors.
- HCC liver cancer
- MM multiple myeloma
- Potential cfRNA biomarkers were identified using plasma cfRNA-sequencing of a pilot sample set and were then validated in an independent sample set.
- the sequencing data were then cross-validated using orthogonal measurement by quantitative reverse transcription PCR. Feature selection and classification models were built to explore the potential of cfRNA profiles in differentiating malignant from pre-malignant conditions.
- Table 1 Detailed Clinical Information of Pilot Set.
- Table 2 Detailed Clinical Information of Validation set.
- RNAs were protein coding with a mean fraction of 82% with a range from 65% to 89% (shown in Tables 3 and 4). The fraction of reads mapping to exons and the distribution of read depths were uniform across all sample groups.
- Table 3 Pilot Set Quality Control Data
- Figs. 1A and 1B show the results of an unbiased Principal Component Analysis (PCA) using the top 500 genes where the largest variance across all samples through pairwise comparison showed separation of HCC and MM cfRNA profiles from that of non-cancer donors.
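- The gene-selection step that precedes such a PCA, namely ranking genes by variance across samples and keeping the top N, can be sketched as follows. The gene names, expression values, and function name are hypothetical, and the PCA itself is not shown.

```python
from statistics import variance


def top_variable_genes(expression, n):
    """Return the n gene names with the largest sample variance.

    `expression` maps a gene name to its per-sample expression values.
    """
    ranked = sorted(expression, key=lambda gene: variance(expression[gene]), reverse=True)
    return ranked[:n]


# Toy expression matrix with three hypothetical genes across four samples.
expression = {
    "GENE_A": [1.0, 1.0, 1.0, 1.0],    # no variance
    "GENE_B": [0.0, 10.0, 0.0, 10.0],  # high variance
    "GENE_C": [4.0, 5.0, 4.0, 5.0],    # low variance
}
print(top_variable_genes(expression, n=2))  # ['GENE_B', 'GENE_C']
```

- In the analysis described above N would be 500; the selected genes would then form the columns of the matrix passed to the PCA.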
- PCA Principal Component Analysis
- a differential expression (DE) analysis of pairwise comparison between individual cancer types with respect to NC donors using DESeq2 yielded 110 and 12 differentiating genes (adjusted p-value < 0.01) for MM and HCC, respectively (shown below in Tables 5-8 and Fig. 12).
- Table 5 Top DE genes Pairwise HCC vs. Healthy Donor (HD)
- Table 7 Top DE Genes Pairwise Cirr vs. Healthy Donor (HD)
- Table 8 Top DE Genes Pairwise MGUS vs. Healthy Donor (HD)
- LDA Linear Discriminant Analysis
- RF Random Forest
- Table 11 List of Genes Used for Linear Discriminant Analysis shown in Figs. 1C and 2A; Top 10 Genes Differentiating HCC and MM from NC.
- Table 12 List of Genes Used for Linear Discriminant Analysis shown in Figs. 6-8 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between MGUS and NC Determined Using Learning Vector Quantization Algorithm.
- Table 13 List of Genes Used for Linear Discriminant Analysis shown in Figs. 6-8 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between MM and NC Determined Using Learning Vector Quantization Algorithm.
- Table 14 List of Genes Used for Linear Discriminant Analysis shown in Figs. 6-8 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between MM and NC, and between MGUS and NC, Determined Using Learning Vector Quantization Algorithm.
- Table 15 List of Genes Used for Linear Discriminant Analysis shown in Figs. 9-11 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between Cirr. and NC Determined Using Learning Vector Quantization Algorithm.
- Table 16 List of Genes Used for Linear Discriminant Analysis shown in Figs. 9-11 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between HCC and NC Determined Using Learning Vector Quantization Algorithm.
- Table 17 List of Genes Used for Linear Discriminant Analysis shown in Figs. 9-11 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between HCC and NC as well as between Cirr. and NC Determined Using Learning Vector Quantization Algorithm.
- LOOCV leave-one-out cross validation
- HCC was correctly differentiated from NC donors with accuracies of 100% (28/28) and 93% (26/28) when using the LDA method or 96% (27/28) and 96% (27/28) when using the RF method with LVQ and DE feature sets, respectively.
- The LOOCV test confirmed that the biomarker sets determined by DESeq2 and LVQ methods, combined with our classification models using LDA and RF algorithms, are statistically significant. LVQ gene sets yielded higher accuracy for both cancer types and were used as the feature sets for further validation.
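- The leave-one-out procedure itself can be sketched independently of the classifier. In the sketch below a nearest-centroid rule stands in for the LDA and RF models named above (a deliberate simplification to keep the example dependency-free), and the two-gene feature vectors are invented toy values.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (assumption): predict the label of the closest class centroid."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], sample))


def loocv_accuracy(features, labels, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy data: two hypothetical gene-level features per sample.
features = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.0], [3.0, 3.2], [3.1, 2.9], [2.9, 3.0]]
labels = ["NC", "NC", "NC", "HCC", "HCC", "HCC"]
print(loocv_accuracy(features, labels))  # 1.0
```

- Any classifier with the same `predict(train_x, train_y, sample)` shape could be passed in place of the stand-in.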
- Amplification Parameters for a RT-qPCR assay were configured to pre-amplify products using SEQ ID NO: 1 through SEQ ID NO: 42.
- Template RNA was mixed with Superscript III One-step RT-PCR system with Platinum Taq DNA polymerase kit (Invitrogen Corp.; 1600 Faraday Ave., PO Box 6482, Carlsbad, CA, 92008, USA; Cat. No. 12574026) and SEQ ID NOs: 1-42 to generate cDNA according to the kit’s product-insert protocol.
- PCR amplification products were treated with Exonuclease I to digest single stranded primers at 37°C for 30 min followed by inactivation of enzymes at 80°C for 15 min.
- cDNA from the preamplification was diluted 1:80 and set up in 96-well plates with SsoFast EvaGreen supermix (BioRad, Inc.; 1000 Alfred Nobel Dr., Hercules, CA, 94547, USA; Cat. No. 1725200) with low ROX with the individual primer pairs at 10 µM each.
- QuantStudio 7 Flex (Applied Biosystems, LLC; 180 Oyster Point Blvd., San Francisco, CA, 94080, USA; Cat. No. 4485701) was used to run RT-qPCR assay according to manufacturer’s recommended cycling conditions.
- the delta Ct of a target gene was calculated by subtracting the Ct of a control gene (such as either GAPDH or ACTB).
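- The delta Ct calculation is a simple subtraction, sketched below with invented Ct values. The conversion to a relative level as 2 to the power of minus delta Ct is a common convention that assumes roughly 100% amplification efficiency; it is included as an assumption and is not stated in the text above.

```python
def delta_ct(ct_target, ct_control):
    """Delta Ct: Ct of the target gene minus Ct of the control gene (e.g. GAPDH or ACTB)."""
    return ct_target - ct_control


def relative_level(d_ct):
    """Assumption: relative abundance as 2^(-delta Ct), valid for ~100% PCR efficiency."""
    return 2 ** (-d_ct)


# Hypothetical Ct values for one target gene and one control gene in a single sample.
d = delta_ct(ct_target=28.5, ct_control=18.0)
print(d)  # 10.5
print(relative_level(d))
```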
- RT-qPCR results from the pilot sample set were consistent with the sequencing data with a Pearson correlation coefficient > 0.77 and a p-value of 2.2 × 10⁻¹⁶ (as shown in Fig. 3). It was confirmed that the differential level of cfRNA transcripts of genes identified by the LVQ algorithm (HBG1, HBG2, and NUSAP1 for MM; C3, CP, FGA, and FGB for HCC) from RNAseq was also observed with RT-qPCR (as shown in Fig. 3).
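- The Pearson correlation coefficient used to compare the two measurement platforms can be computed as in the sketch below. The function name and the paired values are illustrative only, and the p-value calculation is not shown.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical paired measurements for the same genes (e.g. sequencing vs. RT-qPCR).
sequencing = [2.0, 4.1, 6.2, 7.9, 10.1]
rt_qpcr = [1.8, 4.4, 5.9, 8.3, 9.7]
print(round(pearson_r(sequencing, rt_qpcr), 3))
```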
- cfRNA Profiles Distinguished Multiple Myeloma from Its Premalignant Condition, MGUS, and MGUS from Non-cancer
- Disclosed herein are methods of utilizing cfRNA to distinguish MM from MGUS, MM from non-cancer, and MGUS from non-cancer in individuals. It was next examined whether cfRNA profiles were able to recapitulate the transition from a pre-cancerous condition to a cancerous one, and distinguish between them. The hypothesis was tested on multiple myeloma (MM) as it has a well-defined pre-cancerous condition: MGUS.
- the top ten most significant genes that discriminate MM from non-cancer donors as identified by LVQ displayed a gradual transition in cfRNA level from the non-cancer donors through MGUS to MM. Among these ten most significant genes, seven genes (CA1, EPB42, HBG1, HBG2, CENPE, CPOX, NEK2 and NUSAP1) have higher expression in bone marrow, where cancerous plasma cells accumulate, compared to other tissue and cell types in publicly available data from the Human Protein Atlas [47, 48].
- Centromere protein E a kinesin-like motor protein that accumulates in the G2 phase of the cell cycle and is highly expressed in bone marrow [49, 50]
- Serine/threonine-protein kinase NEK2
- Nucleolar and spindle associated protein 1 NUSAP1
- An LDA plot using a combination of the top 10 LVQ genes from pairwise comparisons MM - NC, and MGUS - NC displayed the separation of all three groups (shown in Fig.
- a RF model using the top 10 most important LVQ genes from MGUS - NC pairwise comparison yielded an accuracy of 88.6% (20/20 non-cancer donors and 6/9 MGUS patients).
- Classification of MM from MGUS yielded an accuracy of 89.5% (8/9 MGUS and 9/10 MM) using LOOCV with the RF classification method using the top 10 most important genes from LVQ analysis of MM versus NC comparison as a feature set.
- the 3-group classification resulted in an accuracy of 82% (19/20 NC, 4/9 MGUS and 9/10 MM) defined by LOOCV using the RF method with the feature set composed of the combination of the top 10 LVQ genes from the comparison MM versus non-cancer and MGUS versus non-cancer donors.
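- The overall accuracies quoted for the MM versus MGUS and three-group classifications follow directly from the per-class counts, as the short worked example below shows. Only the counts come from the text; the function and variable names are illustrative.

```python
def overall_accuracy(correct_by_class):
    """Overall accuracy from {class: (correctly classified, total)} counts."""
    correct = sum(c for c, _ in correct_by_class.values())
    total = sum(t for _, t in correct_by_class.values())
    return correct / total


mm_vs_mgus = {"MGUS": (8, 9), "MM": (9, 10)}
three_group = {"NC": (19, 20), "MGUS": (4, 9), "MM": (9, 10)}

print(f"{overall_accuracy(mm_vs_mgus):.1%}")   # 17/19 -> 89.5%
print(f"{overall_accuracy(three_group):.1%}")  # 32/39 -> 82.1%
```

- Breaking the three-group result down by class shows that most misclassifications fall in the MGUS group (4 of 9 correct), while NC and MM are classified at 95% and 90%, respectively.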
- Apolipoprotein E binds to specific liver and peripheral cell receptors and is essential for normal catabolism of triglyceride-rich lipoprotein constituents [53]
- Complement C3 (C3) is synthesized in the liver and secreted to the plasma and is involved in both innate and adaptive immune responses [54]
- Ceruloplasmin (CP) is a secreted plasma metalloprotein from the liver that binds copper in the plasma and is involved in the peroxidation of Fe(II) transferrin to Fe(III) transferrin [55]
- 24-dehydrocholesterol reductase (DHCR24) catalyzes the reduction of sterol intermediates [56]
- Fibrinogen Alpha Chain FGA
- Fibrinogen Beta Chain FGB
- Fibrinogen Gamma Chain FGG
- Disclosed herein are methods of utilizing cfRNA to distinguish HCC from Cirr and Cirr from NC individuals.
- RF methods using the top 10 important genes from Cirr - NC pairwise comparisons yielded 100% accuracy in classifying Cirr from NC samples using LOOCV (shown in Figs. 9-11).
- Classification of HCC from Cirr also yielded 100% accuracy using LOOCV with RF (as shown in Figs. 9-11). It was attempted to classify three classes including NC, Cirr, and HCC in one model. The 3-group classification resulted in 90.6% accuracy using LOOCV with RF (as shown in Figs. 9-11).
- cfRNA was sequenced from patients having two cancer types: one solid (HCC), and the other hematologic (MM) and their respective pre-cancerous conditions: Cirr and MGUS, respectively, and from NC donors. Both cancer types can be distinguished from non-cancer controls and pre-cancerous conditions using their cfRNA profiles.
- HCC solid
- MM hematologic
- MGUS hematologic
- To differentiate each cancer type from non-cancer individuals, the combination of ten genes identified by learning vector quantization (LVQ) analysis in each pairwise comparison yields higher accuracy compared to the use of a larger set of differentiating genes as evaluated by leave-one-out cross validation (LOOCV).
- LDA linear discriminant analysis
- RF random forest
- RT-qPCR confirmation for a panel of selected biomarkers was consistent with the sequencing data.
- Plasma cfRNA biomarkers identified from the sequencing data were further validated in an independent sample cohort.
- use of a small gene panel potentially enables a cost-effective assay for pan-cancer detection that might be performed in a clinical environment, such as a doctor’s office, that can be useful in broad clinical applications, including the detection and diagnosis of cancer or a predisposition to cancer.
- cfRNA profiles can recapitulate the transition from a pre-cancerous condition to cancer, including for both solid and hematologic cancers.
- the disclosed method comprises cfRNA panels containing a small number of genes, which may be useful for distinguishing cancers from pre-malignant conditions and precursors from healthy individuals, thus facilitating cost-effective screening strategies for early cancer detection during routine exams in high-risk patients within the general population.
- liver and bone marrow have been reported to contribute heavily to the abundance of cell-free nucleic acids in plasma [42, 45, 46]. This may explain the source of cfRNA biomarkers found in these cancer types.
- HCC eight out of the top ten genes used in the classification model are specifically synthesized in the liver and encode secreted proteins found in blood that mediate plasminogen activation and fibrinolysis processes.
- MM seven out of ten genes among the cfRNA biomarkers have relatively high expression in bone marrow compared to other tissue and cell types and are related to cell cycle processes.
- the disclosed method may be used to profile cell-free mRNA to establish a platform for longitudinal monitoring of disease progression (e.g., monitoring a pre-malignant condition as it progresses to cancer) across multiple cancers.
- the disclosed method may be used as a panel or assay that measures transcript levels of mRNA in plasma for a small panel of genes that can differentiate cancer from pre-malignant conditions and otherwise healthy donors.
- organ-specific mRNA transcripts were identified as biomarkers that indicate the tissue of origin for the tumor.
- detecting the level of these cell-free plasma RNA biomarkers in a sample from a subject by the disclosed method may be combined with other nucleic acids-based and protein-based approaches for potentially increased diagnostic sensitivity and specificity.
- abnormal liver enzyme levels detected in the blood combined with measurement of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, or any combination thereof, may increase the diagnostic sensitivity and specificity of diagnosing cirrhosis.
- M protein monoclonal protein
- a urine sample indicative of kidney damage related to MGUS
- cfRNA biomarkers AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination thereof, may increase the diagnostic sensitivity and specificity of diagnosing MGUS.
- RNA purification was performed using the plasma/serum circulating and exosomal RNA purification kit (Norgen Biotek) from 3 ml of human plasma according to the manufacturer’s protocol. To digest trace amounts of contaminating DNA, RNA was treated with 10X Baseline-ZERO DNase. DNase I treated RNA samples were purified and further concentrated using RNA clean and concentrator-5 (Zymo Research) according to the manufacturer’s manual. Final eluted RNA was stored immediately at -80°C.
- RNA-Seq libraries were prepared using the Clontech SMARTer stranded total RNA-seq kit v2-pico input mammalian (Takara Bio) according to the manufacturer’s instructions. For cDNA synthesis, option 2 (without fragmentation) was used, starting from highly degraded RNA. An input of 7 µl of each RNA sample was used to generate cDNA libraries suitable for next-generation sequencing. For addition of adapters and indexes, the SMARTer RNA unique dual index kit - 96U was employed. SMARTer RNA unique dual indexes on the 5' and 3' PCR primers were added to each sample to distinguish pooled libraries from each other.
- the amplified RNA-seq library was purified by immobilization onto AMPure XP PCR purification system (Beckman Coulter).
- the library fragments originating from rRNA and mitochondrial rRNA were treated with ZapR v2 and R-Probes according to the manufacturer’s protocols.
- 16 cycles of PCR were performed and, following purification of the amplified RNA-seq library, a final volume of 20 µl was eluted in Tris buffer.
- the amplified RNA-seq library was stored at -20°C prior to sequencing.
- Each sample was sequenced to more than 20 million paired-end reads using an Illumina NextSeq or HiSeq sequencer. Adapter sequences were trimmed using the sickle tool [60]. After trimming, the quality of the reads was checked using FastQC (v0.11.7) [61, 62] and RSeQC (v2.6.4) [63]. Reads were aligned to the hg38 human genome using the STAR aligner (v2.5.3a) [64] with the two-pass mode flag. Duplicated reads were removed using the picard tool (v1.119) [65]. Read counts for each gene were calculated using the htseq-count tool (v0.11.2) [66] in intersection-strict mode. The number of reads mapped to each gene was normalized to the total number of reads in the whole transcriptome (Reads Per Million - RPM).
- For each sample, exon, intron, intergenic fractions and protein coding fractions (CDS exons) were calculated using RSeQC [67]. Samples with an exon fraction larger than 0.35 were kept for further analysis.
- LDA and RF Cancer Type Classification
- DESeq2 and LVQ methods Two methods were used to build models for classifying cancer types using feature sets identified from pairwise comparison using DESeq2 and LVQ methods.
- LDA models were built using the R package MASS (v7.3-51.4) [71]
- Random Forest models were built using the R package randomForest (v4.6-14) [72]
- Statistical Consideration Permutation Test and Leave One Out Cross Validation
- Tissue Specificity of LVQ Feature Sets Using Publicly-Available Databases To evaluate whether the LVQ gene sets were tissue specific to the tissue-of-origin (TOO), publicly available average tissue-level expression values (transcripts per million; TPMs) were downloaded from the Human Protein Atlas (ref: www.proteinatlas.org/about/download). The methodology used to normalize and calculate average expression values can be found here: www.proteinatlas.org/about/assays+annotation#hpa_ma.
- This matrix of count values was then subset to the two gene sets (top 10 LVQ for MM versus non-cancer, and top 10 LVQ for HCC versus non-cancer), and a z-score was calculated across tissue types to evaluate which tissue types the genes were enriched in.
- a heatmap of this transformed matrix was generated using ComplexHeatmap (v2.4.3).
- Table 19 Linear Discriminant Analysis results for MGUS versus MM.
- Table 20 Linear Discriminant Analysis results for NC versus MGUS versus
- Table 21 Linear Discriminant Analysis results for NC versus Cirr.
- Table 22 Linear Discriminant Analysis results for Cirr. versus HCC.
- Table 23 Linear Discriminant Analysis results for NC versus Cirr. versus HCC.
- [00168] [1] SEER Cancer Stat Facts: Liver and Intrahepatic Bile Duct Cancer. National Cancer Institute. Bethesda, MD. 2018; [2] Howlader N, N.A., Krapcho M, Miller D, Brest A, Yu M, Ruhl J, Tatalovich Z, Marietta A, Lewis DR, Chen HS, Feuer EJ, Cronin KA SEER Cancer Statistics Review, 1975-2016, National Cancer Institute. Bethesda, MD; [3] Kyle, R.A. and S.V. Rajkumar, Management of monoclonal gammopathy of undetermined significance (MGUS) and smoldering multiple myeloma (SMM). Oncology (Williston Park), 2011.
- MGUS monoclonal gammopathy of undetermined significance
- SMM smoldering multiple myeloma
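The normalization and quality-control steps listed above (reads per million, and keeping only samples with an exon fraction larger than 0.35) can be sketched as follows. This is a minimal illustration, not the pipeline used in the disclosure: the function names, the toy counts, and the choice of the per-sample sum of gene counts as the "total number of reads" denominator are all assumptions made for the example.

```python
from typing import Dict

EXON_FRACTION_CUTOFF = 0.35  # threshold stated in the text


def reads_per_million(gene_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale each gene's read count to reads per million (RPM).

    Assumption: the denominator is the sum of counts over all genes in the
    sample, standing in for "total reads in the whole transcriptome".
    """
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: count * 1_000_000 / total for gene, count in gene_counts.items()}


def passes_exon_qc(exon_fraction: float, cutoff: float = EXON_FRACTION_CUTOFF) -> bool:
    """Keep a sample only if its exon fraction is strictly larger than the cutoff."""
    return exon_fraction > cutoff


# Hypothetical toy sample: the counts below are invented for illustration only.
toy_counts = {"FGA": 150, "FGB": 50, "HBG1": 300}
toy_rpm = reads_per_million(toy_counts)
```

In this sketch the exon fraction is taken as an input; in the described workflow it would come from a tool such as RSeQC rather than being computed here.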
Abstract
Methods of detecting or treating cancer or predisposition to cancer are provided, the methods including analyzing a level of one or more cell-free RNA (cfRNA) biomarkers selected from AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in the biological sample; and performing a differential expression analysis comparing the level of each of the one or more cfRNA biomarkers to a corresponding control value (CV); in which differential expression shown by the differential expression analysis between the one or more cfRNA biomarkers and corresponding CVs indicates cancer or a predisposition for cancer in the subject.
Description
CELL-FREE RNA BIOMARKERS FOR THE DETECTION OF CANCER OR PREDISPOSITION TO CANCER
Copyright Notice
[0001] © 2023 Oregon Health & Science University. A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. 37 CFR § 1.71(d).
Cross-Reference to Related Applications
[0002] This application claims the benefit of U.S. Provisional Application No. 63/303,970, filed January 27, 2022, and of U.S. Provisional Application No. 63/426,258 filed November 17, 2022, which are both incorporated by reference in their entirety.
Technical Field
[0003] This disclosure relates generally to the field of biotechnology and in particular to utilizing measurement of cell-free RNA (cfRNA) profiles as biomarkers to diagnose cancer and related products and uses thereof.
Background
[0004] Although recent advances in cancer research offer new methods to treat cancer, the early detection of malignancy still confers the highest chance of improving long-term patient survival. Currently, only 2.4% of metastatic liver cancer patients survive for more than 5 years [1]. Early detection of liver cancer, which has the most rapidly increasing incidence in the United States, would extend 5-year survival rates to 33% with current treatment options. Even with hematologic malignancies like multiple myeloma (MM), 95% of patients are diagnosed when the cancer has already spread systemically, resulting in at least a 20% decrease in 5-year survival rates compared to detection at earlier stages [2]. Noninvasive, low-cost and reliable cancer diagnostic assays could greatly benefit patients by facilitating accessibility to early cancer screening.
[0005] In many cancers, there are disease states known to be precursors of malignant disease. For example, MM, a cancer of antibody-producing plasma cells, is often preceded by monoclonal gammopathy of undetermined significance (MGUS), which is characterized by lower levels of abnormal antibodies. The prevalence of MGUS is about 3% in the Caucasian population, and the conversion rate from MGUS to multiple myeloma is approximately 1% per year [3, 4]. Hepatocellular carcinoma (HCC), the most common form of liver cancer, is often preceded by liver cirrhosis (Cirr), characterized by irreversible fibrosis of the liver. The prevalence of cirrhosis is between 4.5% and 9.5% of the global population [5-7]. The risk of developing de novo HCC in patients with liver cirrhosis ranges between 1% and 5% per year, depending on the etiology of the cirrhosis [5-11]. Most early cancer detection studies to date have focused on distinguishing cancer from healthy controls, rather than discriminating between cancer and common premalignant conditions. Therefore, there is an unmet clinical need for a simple blood test that can identify patients with premalignant conditions who require further intervention due to a higher likelihood of cancer being present.
[0006] With current clinical practices, cancer diagnosis is primarily initiated based upon costly imaging studies or invasive screening procedures. Alternatively, some cancers may only come to attention with clinical symptoms that present at more advanced stages. Liquid biopsy, a minimally invasive method for sampling and analyzing biomarkers in various body fluids, has the potential to improve cancer diagnosis and prognosis [12-15]. Several blood-based analytes have been explored for use in liquid biopsies for cancer detection, such as circulating cells (Circulating Tumor Cells (CTCs), Circulating Hybrid Cells (CHCs), Tumor Associated Macrophages (TAMs)) [16-21], circulating tumor DNA (ctDNA) [22-24], platelets [25-27] and protein panels [28]. However, ctDNA and circulating cells are present at low levels, have varied characteristics between patients, and only weakly correlate with phenotypic changes in cancer [17, 29, 30]. Epigenetic features of ctDNA such as DNA methylation and 5-hydroxymethylcytosine signatures, or ctDNA protected patterns, may provide information about the tissue of origin for pan-cancer detection [31-38]. However, these methods may require a large sequencing coverage to be effective and may have inadequate sensitivity and specificity. Recent transcriptome analysis of tumor-educated platelets has shown promise for pan-cancer detection [25-27], but platelets are fragile, can be easily activated in vitro, and have highly variable characteristics depending on their preparation, which make them challenging to utilize with existing clinical blood tests [39]. There is thus a need for robust liquid biopsy technology that can overcome these challenges in a safe, reliable and cost-effective manner.
[0007] Circulating cell-free RNA (cfRNA) in blood is released from cells by active secretion or through apoptosis and necrosis [40, 41]. Plasma cfRNA has the potential to reflect the systemic response to growing tumors and provide information about the tissue of tumor origin specifically by cancer type. Previous work has demonstrated that global cfRNA profiles indicate temporal changes of organ-specific transcripts. Further analysis of these transcripts facilitated the prediction of pregnancy delivery, preterm birth, and distinction of cancer from healthy controls [42-46]. Thus, an ideal method for distinguishing cancers and their pre-malignant conditions would include measuring the level of cfRNA profiles in a sample from a subject.
Summary of the Invention
[0008] Provided herein are methods including analyzing (such as measuring) a level of one or more cell-free RNA (cfRNA) biomarkers selected from a group comprising: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in a biological sample. A differential expression analysis is performed comparing the level of each cfRNA biomarker selected to a corresponding control value (CV). In some examples, the disclosed materials and methods are useful for diagnosing, in a subject, cancer or a predisposition for cancer. An exemplary method is useful as a method for detecting cancer or a predisposition for cancer utilizing a biological sample obtained from a subject. The exemplary method comprises analyzing (such as measuring) a level of one or more cell-free RNA (cfRNA) biomarkers selected from a group comprising: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in the biological sample. A differential expression analysis is performed comparing the level of each cfRNA biomarker selected to a corresponding control value (CV). The differential expression shown by the differential expression analysis between the cfRNA biomarkers selected and the corresponding CVs indicates cancer or a predisposition for cancer in the subject.
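As a rough illustration of comparing each selected biomarker's level to its corresponding control value (CV), the sketch below computes a log2 fold change per gene and flags genes whose change exceeds a threshold. It is only a simplified stand-in: the pseudocount, the threshold of 1.0, the function names, and the toy levels are assumptions for the example, and it does not reproduce a statistical differential expression analysis such as the DESeq2 Wald test referred to elsewhere in this document.

```python
import math
from typing import Dict


def log2_fold_change(level: float, control_value: float, pseudocount: float = 1.0) -> float:
    """Log2 ratio of a measured level to its control value (CV).

    Assumption: a pseudocount is added to both terms to avoid division by zero.
    """
    return math.log2((level + pseudocount) / (control_value + pseudocount))


def differentially_expressed(
    levels: Dict[str, float],
    control_values: Dict[str, float],
    min_abs_log2fc: float = 1.0,  # hypothetical threshold, not taken from the source
) -> Dict[str, float]:
    """Return genes whose absolute log2 fold change versus the CV meets the threshold."""
    flagged = {}
    for gene, level in levels.items():
        if gene not in control_values:
            continue  # no CV available for this gene, so it cannot be compared
        lfc = log2_fold_change(level, control_values[gene])
        if abs(lfc) >= min_abs_log2fc:
            flagged[gene] = lfc
    return flagged


# Hypothetical toy values (arbitrary units), invented for illustration only.
toy_levels = {"FGA": 7.0, "C3": 3.0}
toy_cvs = {"FGA": 1.0, "C3": 3.0}
toy_flagged = differentially_expressed(toy_levels, toy_cvs)
```

A real assay would derive the CVs from a reference population and would typically attach a significance estimate to each comparison rather than relying on a fold-change cutoff alone.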
[0009] In some embodiments, the one or more cfRNA biomarkers: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate blood cancer or a predisposition to blood cancer. In some examples, the methods include analyzing or measuring a level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates blood cancer or a predisposition to blood cancer.
[0010] In some embodiments, the one or more cfRNA biomarkers: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination thereof, are selected to indicate multiple myeloma (MM). In some examples, one or more of CENPE, HBG1, HBG2, and NUSAP1 are selected to indicate multiple myeloma. In some examples, the methods include analyzing or measuring a level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination of two or more thereof, in cfRNA in a sample from a subject, wherein differential expression of one or more indicates multiple myeloma. In other examples, the methods include analyzing or measuring a level of one or more (such as 1, 2, 3, or all) of CENPE, HBG1, HBG2, and NUSAP1 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, or all) indicates multiple myeloma.
[0011] In some embodiments, the one or more cfRNA biomarkers: FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate monoclonal gammopathy of undetermined significance (MGUS). In some examples, the methods include analyzing or measuring a level of one or more of FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or a combination of any two or more thereof, in cfRNA in a sample from a subject, wherein differential expression of one or more indicates MGUS.
[0012] In some embodiments, the one or more cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate liver cancer or a predisposition to liver cancer. In some examples, the methods include analyzing or measuring a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates liver cancer or a predisposition to liver cancer.
[0013] In some embodiments, the one or more cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or any combination thereof, are selected to indicate hepatocellular carcinoma (HCC). In some examples, the methods include analyzing or measuring a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or a combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates HCC. In other examples, the methods include analyzing or measuring a level of one or more (such as 1, 2, 3, 4, or all) of C3, CP, FGA, FGB, and IFITM3 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, 4, or all) indicates HCC.
[0014] In some embodiments, the one or more cfRNA biomarkers: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate cirrhosis. In some examples, the methods include analyzing or measuring a level of one or more of ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination of two or more thereof, wherein differential expression of one or more indicates liver cirrhosis.
[0015] Additional aspects and advantages will be apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Brief Description of the Drawings
[0016] Figs. 1A and 1B show PCA analyses using the top 500 genes with the largest variance across non-cancer samples and, respectively, (a) multiple myeloma samples or (b) liver cancer samples; Fig. 1C shows Linear Discriminant Analysis (LDA) using DE genes with padj < 0.01 and the top 10 most important genes identified by LVQ analysis. P-value was derived from the Wilcoxon test.
[0017] Figs. 2A and 2B show ROC curves of, respectively, LDA and Random Forest (RF) classification models with two feature sets, DE and LVQ; Fig. 2C shows a LOOCV with the two models, LDA and RF, with the two feature sets, DE and LVQ.
[0018] Fig. 3 shows cfRNA biomarkers and classification models validated in an independent sample cohort, in which cfRNA profiles distinguish between non-cancer, MGUS and multiple myeloma donors. Fig. 3 shows box plots of the representative top 10 most significant genes resulting from the LVQ analysis for MM versus NC, and an LDA plot using 10 genes from pairwise analysis across NC - MGUS and NC - MM pairs using the LVQ method. P-value was calculated for each pair by the t-test. Fig. 3 also shows a LOOCV using 2 models (LDA and RF) with the top 10 LVQ genes to discriminate MGUS vs NC, MM vs MGUS, and the three groups NC, MGUS and MM.
[0019] Fig. 4 is a correlation plot analysis showing that qRT-PCR of cfRNA biomarkers was concordant with RNA-sequencing data, based on a comparison of the qRT-PCR data with the RNA-sequencing data. P-value was calculated by t-test.
[0020] Fig. 5 provides box plots showing qRT-PCR Ct values of top 4 LVQ genes identified from MM versus NC and top 5 LVQ genes identified from HCC versus NC.
[0021] Fig. 6 and Fig. 7 provide box plots showing that cfRNA profiles distinguish between non-cancer, MGUS and multiple myeloma donors; the box plots represent the top 10 most significant genes resulting from learning vector quantization analysis for multiple myeloma versus non-cancer.
[0022] Fig. 8 is an LDA plot using 10 genes from pairwise analysis across non-cancer - MGUS and non-cancer - multiple myeloma samples using the learning vector quantization method; Fig. 8 shows a LOOCV using 2 models (LDA and RF) with the top 10 LVQ genes to discriminate MGUS vs non-cancer, multiple myeloma vs MGUS, and three groups: non-cancer, MGUS and multiple myeloma.
[0023] Fig. 9 and Fig. 10 provide box plots representative of the top 10 most significant genes from the LVQ analysis for HCC vs. NC. P-value was calculated for each pair by the t-test.
[0024] Fig. 11 is an LDA plot using the top 10 genes identified from each pairwise analysis between NC - Cirr and NC - HCC samples using the LVQ method.
[0025] Fig. 12 and Fig. 13 show volcano plots of false discovery rate (FDR) versus fold change for all genes in pairwise comparisons between non-cancer (NC) donors and multiple myeloma (MM) or liver cancer (HCC), analyzed by DESeq2, together with histograms of the number of significant genes differentiating the two groups obtained from random permutation between samples across non-cancer donors and multiple myeloma or liver cancer. Differential expression analysis was performed using DESeq2 with the Wald test and an adjusted p-value cutoff of 0.01.
[0026] Fig. 14 and Fig. 15 illustrate cfRNA biomarkers showing stage-dependent discrimination in pilot and validation sample sets. Fig. 14 shows that Linear Discriminant Analysis using the top 10 LVQ genes and a model trained in the pilot cohort yields significant discrimination and classification by stage in both HCC and MM. Fig. 15 shows that when classifying the independent validation cohort with these same models, stage-dependent classification for both HCC and MM was seen. P-value for each pair was calculated by the Wilcoxon rank sum test.
[0027] Fig. 16 and Fig. 17 show box and whisker plots illustrating how cfRNA biomarkers for HCC show discrimination between various etiologies. As shown in Figs. 16 and 17, a Linear Discriminant Analysis trained on the pilot cohort with the top 10 LVQ genes showed significant discrimination between NC and HCC on the background of NASH, HCV+ and other etiologies in the pilot cohort and the validation cohort. P-value for each pair was calculated by the Wilcoxon rank sum test.
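Several of the figures described above report leave one out cross validation (LOOCV) of classification models. The sketch below shows the LOOCV procedure itself: each sample is held out once, a classifier is fit on the remaining samples, and accuracy is the fraction of held-out samples predicted correctly. To stay dependency-free it uses a simple nearest-centroid classifier as a stand-in; it is not the LDA or random forest models named in the figures, and the function names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def nearest_centroid_predict(train: List[Sample], x: Sequence[float]) -> str:
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    if not train:
        raise ValueError("training set is empty")
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = "", float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(samples: List[Sample]) -> float:
    """Hold out each sample once, train on the rest, and return overall accuracy."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the held-out one
        if nearest_centroid_predict(train, features) == label:
            correct += 1
    return correct / len(samples)


# Hypothetical two-gene expression values, invented for illustration only.
toy_samples: List[Sample] = [
    ([1.0, 1.2], "NC"), ([0.9, 1.0], "NC"), ([1.1, 0.8], "NC"),
    ([3.0, 3.1], "MM"), ([2.8, 3.3], "MM"), ([3.2, 2.9], "MM"),
]
toy_accuracy = loocv_accuracy(toy_samples)
```

The same loop applies unchanged to any classifier that can be refit on each training subset; only the prediction function would be swapped out.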
Detailed Description of the Invention
[0028] As used in the specification and claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise.
[0029] The term "about" or "approximately" means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is analyzed, measured or determined, i.e., the limitations of the measurement system. For example, "about" can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, "about" can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term "about" meaning within an acceptable error range for the particular value should be assumed.
[0030] The terms "polynucleotide", "nucleotide", "nucleic acid," and "oligonucleotide" are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. For example, a polynucleotide may constitute a deoxyribonucleic acid (DNA) molecule or a ribonucleic acid (RNA) molecule. Polynucleotides may have any three dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, cell-free RNA (cfRNA), messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), mitochondrial RNA (mtRNA), ribozymes, complementary DNA (cDNA), recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.
[0031] As used herein, “complementary DNA” or “cDNA” refers to DNA synthesized from a single-stranded template in an enzymatically catalyzed reaction. For example, an expressed cfRNA biomarker may be reverse transcribed by a reverse transcriptase to produce a cDNA template. Skilled persons will understand that creation of cDNA template libraries facilitates the characterization of expressed RNA by sequencing methods (see, for example, Nat. Rev. Genet. 2009 Jan;10(1):57-63; “RNA-Seq: a revolutionary tool for transcriptomics”).
[0032] The terms "amplify," "amplifies," "amplified," and "amplification," as used herein, generally refer to any process by which one or more copies are made of a target polynucleotide or a portion thereof. A variety of methods of amplifying polynucleotides (e.g. DNA and/or RNA) are available, some examples of which are described herein. Amplification may be linear, exponential, or involve both linear and exponential phases in a multi-phase amplification process. Amplification methods may involve changes in temperature, such as a heat denaturation step, or may be isothermal processes that do not require heat denaturation.
[0033] In some of the various embodiments, some polynucleotides are "preferentially" treated, such as preferentially manipulating RNA in a sample comprising both RNA and DNA. In this context, "preferentially" refers to treatment that affects a greater proportion of the polynucleotide of the indicated type. In some embodiments, preferentially treating RNA indicates that of the polynucleotides affected by the treatment, at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or more of the affected polynucleotides in a reaction are RNA molecules. In some embodiments, preferentially treating RNA refers to the use of a particular treatment or reagent known in the art to have a degree of specificity for RNA over DNA. For example, reverse transcriptase is an enzyme typically used in reverse transcription reactions to transcribe RNA into cDNA, and is known to have specificity for using RNA, rather than DNA, as a template. As a further example, RNA can be preferentially treated using reagents that react with elements that are typically found in RNA and not DNA (e.g. the ribose sugar backbone, or the presence of uracil). In some embodiments, preferential treatment of RNA comprises use of enzymes that are not specific to RNA, but whose activity is preferentially directed to polynucleotides derived from RNA (e.g. cDNA) by virtue of one or more previous steps. For example, single-stranded DNA ligases may preferentially ligate oligonucleotides to cDNA in samples where cDNA is produced and rendered single-stranded in the presence of other DNA species that are predominantly double-stranded.
[0034] As used herein, “biomarker” refers to a measurable substance (e.g., protein or polynucleotide) in an organism whose presence is indicative of some phenomenon such as disease (e.g., liver cancer or blood cancer), infection, or environmental exposure. A biomarker may include a gene, a gene fragment, or any other form of polynucleotide such as cell-free RNA (cfRNA). As used herein, “gene” refers to a distinct sequence of polynucleotides forming part of a chromosome. In some embodiments, a cfRNA biomarker may include the entirety or any portion of a polynucleotide expressed as a gene product by a cell. Thus, in some embodiments, for example, selecting an AIDA gene for analysis would include analyzing the level of RNA transcript expressed from the AIDA gene.
[0035] As used herein, the terms "cell-free," "circulating," and "extracellular" as applied to polynucleotides (e.g. "cell-free DNA" and "cell-free RNA") are used interchangeably to refer to polynucleotides present in a biological sample or portion thereof that can be isolated or otherwise manipulated without applying a lysis step to intact cells in the biological sample (e.g., as in extraction from cells or viruses). Cell-free polynucleotides may be encapsulated (e.g., exosomes) or unencapsulated or "free" from the cells or viruses from which they originate, even before a sample of the subject is collected. Accordingly, cell-free polynucleotides may be isolated from a non-cellular fraction of blood (e.g. serum or plasma), from other bodily fluids (e.g. urine), or from non-cellular fractions of other types of samples. Notwithstanding, since cfRNA polynucleotide originates from within a cell, cell-free polynucleotides may be produced as a byproduct of cell death (e.g. apoptosis or necrosis), cell lysis, or cell shedding, releasing polynucleotides into surrounding body fluids or into circulation. Moreover, cell-free polynucleotides may be produced as a by-product of applying a lysis step to the biological sample. Skilled persons will understand that a lysis step may include applying detergent, heat, mechanical shearing, or any combination thereof, to lyse an intact cell or a membrane encapsulated structure. In some embodiments, a lysis step may be applied to induce release of polynucleotides from other membrane structures such as exosomes, or vesicles.
[0036] As used herein, “sequencing” refers to any of a number of technologies used to determine the sequence (e.g., the identity and order of monomer units) of a biomolecule, e.g., a nucleic acid such as DNA or RNA. Exemplary sequencing methods include, but are not limited to, targeted sequencing, single molecule real-time sequencing, exon or exome sequencing, intron sequencing, electron microscopy-based sequencing, panel sequencing, transistor-mediated sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, capillary electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, co-amplification at lower denaturation temperature-PCR (COLD-PCR), multiplex PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, and a combination thereof. In some embodiments, sequencing can be performed by a gene analyzer such as, for example, gene analyzers commercially available from Illumina, Inc., Pacific Biosciences, Inc., or Applied Biosystems/Thermo Fisher Scientific, among many others.
[0037] As used herein, “next generation sequencing” or “NGS” refers to sequencing technologies having increased throughput as compared to traditional Sanger- and capillary electrophoresis-based approaches, for example, with the ability to generate hundreds of thousands of relatively small sequence reads at a time. Some examples of next generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization.
[0038] As used herein, “subject” or “test subject” refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant. More specifically, a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human. Animals include farm animals (e.g., production cattle, dairy cattle, poultry, horses, pigs, and the like), sport animals, and companion animals (e.g., pets or support animals). A subject can be a healthy individual, an individual that has or is suspected of having a disease or
a predisposition to the disease, or an individual that is in need of therapy or suspected of needing therapy. The terms “individual” or “patient” are intended to be interchangeable with “subject.”
[0039] As used herein, “reference sample” or “reference cfRNA sample” refers to a sample of known composition and/or having or known to have or lack specific properties (e.g., known nucleic acid variant(s), known cellular origin, known tumor fraction, known coverage, and/or the like) that is analyzed along with or compared to test samples in order to evaluate the accuracy of an analytical procedure, classify the test samples, and/or the like. A reference sample dataset typically includes from at least about 25 to at least about 30,000 or more reference samples. In some embodiments, the reference sample dataset includes about 50, 75, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,500, 5,000, 7,500, 10,000, 15,000, 20,000, 25,000, 50,000, 100,000, 1,000,000, or more reference samples.
[0040] In an exemplary embodiment, a reference sample is used as a corresponding control for each biomarker to provide a control value (CV). For example, a reference sample providing an AIDA CV corresponds to an AIDA cfRNA biomarker, a CA1 CV corresponds to a CA1 cfRNA biomarker, and so forth. In some embodiments, a CV may include a level, or range of levels, indicative of a normal subject’s cfRNA biomarker level or range of levels, whereby a differential expression analysis may be used to detect a cfRNA biomarker level or levels that differ from, or fall outside of, the level or range of levels indicated by the CV and, thus, detect cancer or a predisposition to cancer. In some cases, a cfRNA biomarker level showing higher expression than its corresponding CV is indicative of cancer or a predisposition to cancer. In some cases, a combination of one or more cfRNA biomarker levels showing higher expression than their respective corresponding CVs is indicative of cancer or a predisposition to cancer. In some cases, a cfRNA biomarker level may be less than its corresponding CV.
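The comparison of a measured cfRNA biomarker level against its corresponding CV can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed analysis: the `ControlValue` class, the function names, and the numeric levels are hypothetical, and the CV is assumed to be a simple inclusive range of normal levels in arbitrary units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlValue:
    """Hypothetical CV: an inclusive range of levels seen in normal subjects."""
    low: float
    high: float


def compare_to_cv(level: float, cv: ControlValue) -> str:
    """Return 'higher', 'lower', or 'within' for one biomarker level."""
    if level > cv.high:
        return "higher"
    if level < cv.low:
        return "lower"
    return "within"


def differential_biomarkers(levels: dict, control_values: dict) -> dict:
    """Keep only biomarkers whose level falls outside the corresponding CV."""
    calls = {gene: compare_to_cv(level, control_values[gene])
             for gene, level in levels.items()}
    return {gene: call for gene, call in calls.items() if call != "within"}


# Illustrative numbers only (arbitrary units), not data from the disclosure.
control_values = {"AIDA": ControlValue(2.0, 8.0), "CENPE": ControlValue(3.0, 7.0)}
levels = {"AIDA": 12.0, "CENPE": 5.0}
flagged = differential_biomarkers(levels, control_values)
```

In this sketch only AIDA is flagged, because its level lies above the assumed normal range while the CENPE level lies inside its range. A real analysis would likely rely on one of the statistical methods named in the disclosure rather than a fixed range check.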
[0041] As used herein, “panel” refers to a predetermined group of medical tests or assays used in the diagnosis and treatment of disease. As used herein, “test” or “assay” refers to a process of analyzing a substance to determine its composition or quality. A panel may be designed as a single-plex, duplex, or multiplex where the panel tests or screens for, respectively, one, two, or three or more biomarkers in a single test. For example, a blood cancer panel may include one or more cfRNA biomarkers selected from the group of: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, to indicate blood cancer or a predisposition to blood cancer. In another example, a liver cancer panel may include one or more cfRNA biomarkers selected from a group of: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, HIST1H2AH, or any combination thereof, to indicate liver cancer or a predisposition to liver cancer.
[0042] As used herein, “predisposition” or “premalignancy” are used interchangeably and refer to a condition that may (or is likely to) become cancer. A predisposition may derive from genetic or environmental etiologies relevant to the subject and generally indicates a pre-cancerous stage of disease. For example, monoclonal gammopathy of undetermined significance (MGUS) and cirrhosis are premalignant conditions known in the art to have a likelihood of becoming, respectively, blood cancer and liver cancer. Skilled persons will understand that a variety of staging systems exist for determining if a condition is cancerous. For example, the American Joint Committee on Cancer (633 N. St. Clair St., Chicago, IL 60611-3211) defines “Stage IA” liver cancer as a single tumor 2 cm (4/5 inch) or smaller that hasn’t grown into blood vessels. (See: cancer.org/cancer/liver-cancer/detection-diagnosis-staging/staging.html). Thus, for example, in some cases elevated levels in a subject of one or more cfRNA biomarkers: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, HIST1H2AH, or any combination thereof, relative to one or more of the corresponding CVs may indicate a predisposition to liver cancer if no tumor meeting Stage IA’s requirements is detected.
[0043] In an exemplary embodiment, the disclosed materials and methods relate to a method for detecting cancer or a predisposition for cancer in a biological sample obtained from a subject. In the exemplary embodiment, a level of one or more cell-free RNA (cfRNA) biomarkers selected from a group comprising: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in the biological sample is analyzed or measured. A differential expression analysis comparing the level of each cfRNA biomarker selected to a corresponding control value (CV) is performed. The differential expression shown by the differential expression analysis between the selected cfRNA biomarkers and corresponding CVs indicates cancer or a predisposition for cancer in the subject.
[0044] In some embodiments, axin interactor, dorsalization associated gene (AIDA) is selected (for example, analyzed or measured). In some embodiments, carbonic anhydrase 1 gene (CA1) is selected (for example, analyzed or measured). In some embodiments, centromere protein E gene (CENPE) is selected (for example, analyzed or measured). In some embodiments, coproporphyrinogen oxidase gene (CPOX) is selected (for example, analyzed or measured). In some embodiments, elongation factor for RNA Polymerase II 2 gene (ELL2) is selected (for example, analyzed or measured). In some embodiments, erythrocyte membrane protein band 4.2 gene (EPB42) is selected (for example, analyzed or measured). In some embodiments, hemoglobin subunit gamma 1 gene (HBG1) is selected (for example, analyzed or measured). In some embodiments, hemoglobin subunit gamma 2 gene (HBG2) is selected (for example, analyzed or measured). In some embodiments, NIMA related kinase 2 gene (NEK2) is selected (for example, analyzed or measured). In some embodiments, nucleolar and spindle associated protein 1 gene (NUSAP1) is selected (for example, analyzed or measured). In some embodiments, apolipoprotein E gene (APOE) is selected (for example, analyzed or measured). In some embodiments, complement component C3 gene (C3) is selected (for example, analyzed or measured). In some embodiments, ceruloplasmin gene (CP) is selected (for example, analyzed or measured). In some embodiments, 24-dehydrocholesterol reductase gene (DHCR24) is selected (for example, analyzed or measured). In some embodiments, fibrinogen alpha chain gene (FGA) is selected (for example, analyzed or measured). In some embodiments, fibrinogen beta chain gene (FGB) is selected (for example, analyzed or measured). In some embodiments, fibrinogen gamma chain gene (FGG) is selected (for example, analyzed or measured). In some embodiments, histidine rich glycoprotein gene (HRG) is selected (for example, analyzed or measured).
In some embodiments, interferon induced transmembrane protein 3 gene (IFITM3) is selected (for example, analyzed or measured). In some embodiments, ATPase Na+/K+ transporting subunit beta 1 gene (ATP1B1) is selected (for example, analyzed or measured). In some embodiments, N-formyl peptide receptor 3 gene (FPR3) is selected (for example, analyzed or measured). In some embodiments, structural maintenance of chromosomes 4 gene (SMC4) is selected (for example, analyzed or measured). In some embodiments, thioredoxin domain containing 16 gene (TXNDC16) is selected (for example, analyzed or measured). In some embodiments, assembly factor for spindle microtubules gene (ASPM) is selected (for example, analyzed or measured). In some embodiments, WRN recQ like helicase gene (WRN) is selected (for example, analyzed or measured). In some embodiments, ZRANB2 antisense RNA 2 gene (ZRANB2-AS2) is selected (for example, analyzed or measured). In some embodiments, BMX non-receptor tyrosine kinase gene (BMX) is selected (for example, analyzed or measured). In some embodiments, Serine/Threonine kinase MRCK alpha gene (CDC42BPA) is selected (for example, analyzed or measured). In some embodiments, kinetochore scaffold 1 gene (KNL1) is selected (for example, analyzed or measured). In some embodiments, Calcium voltage-gated channel subunit alpha 1 gene (CACNA1A) is selected (for example, analyzed or measured). In some embodiments, ATP binding cassette subfamily B member 7 gene (ABCB7) is selected (for example, analyzed or measured). In some embodiments, histone cluster 1 H2bf gene (HIST1H2BF) is selected (for example, analyzed or measured). In some embodiments, PC4 and SFRS1 interacting protein 1 gene (PSIP1) is selected (for example, analyzed or measured). In some embodiments, transmembrane protein 150C gene (TMEM150C) is selected (for example, analyzed or measured). In some embodiments, Zinc Finger CCCH-type containing protein 6 gene (ZC3H6) is selected (for example, analyzed or measured). In some embodiments, chromosome 9 open reading frame 16 gene (C9orf16) is selected (for example, analyzed or measured). In some embodiments, carboxypeptidase Q gene (CPQ) is selected (for example, analyzed or measured). In some embodiments, dynein cytoplasmic 1 intermediate chain 2 gene (DYNC1I2) is selected (for example, analyzed or measured). In some embodiments, extracellular matrix protein 1 gene (ECM1) is selected (for example, analyzed or measured). In some embodiments, histone H2A type 1-H gene (HIST1H2AH) is selected (for example, analyzed or measured). In certain embodiments, any combination thereof is selected (for example, analyzed or measured). In certain embodiments, one or more of the above biomarkers are not selected (for example, are not analyzed or measured).
[0045] In some embodiments, the one or more cfRNA biomarkers: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate blood cancer or a predisposition to blood cancer. In some examples, a level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination of two or more thereof is analyzed or measured in cfRNA in a sample from a subject, and differential expression of one or more indicates blood cancer or a predisposition to a blood cancer. In some embodiments, the blood cancer is multiple myeloma (MM). In some embodiments, the predisposition to blood cancer is monoclonal gammopathy of undetermined significance (MGUS).
[0046] In some embodiments, the one or more cfRNA biomarkers: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination thereof, are selected to indicate multiple myeloma (MM). In some examples, one or more of CENPE, HBG1, HBG2, and NUSAP1 are selected to indicate multiple myeloma. In some examples, the methods include analyzing or measuring a level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination of two or more thereof, in cfRNA in a sample from a subject, wherein differential expression of one or more indicates multiple myeloma.
[0047] In some examples, the methods include measuring a level of one or more (such as 1, 2, 3, or all) of CENPE, HBG1, HBG2, and NUSAP1 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, or all) indicates multiple myeloma. In some examples, an increase in expression level of one or more of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or a combination of any two or more thereof (including, but not limited to, each of CENPE, HBG1, HBG2, and NUSAP1) compared to a control indicates multiple myeloma. In some examples, the differential expression is an increase of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 2-fold, at least about 2.5-fold, at least about 3-fold, or more compared to the control.
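The percentage and fold thresholds listed above can be placed on a single scale by expressing each as a ratio of the measured level to the control level. The following Python sketch is illustrative only: the function names are hypothetical, an "N-fold" increase is assumed to mean a level equal to N times the control, and the tolerance implied by "about" is not modeled.

```python
def percent_to_fold(percent_increase: float) -> float:
    """Convert a percent increase over control into a ratio (50% -> 1.5)."""
    return 1.0 + percent_increase / 100.0


def fold_change(level: float, control: float) -> float:
    """Ratio of the measured level to the control level."""
    if control <= 0:
        raise ValueError("control level must be positive")
    return level / control


def is_increased(level: float, control: float, min_fold: float) -> bool:
    """True when the level reaches the chosen minimum ratio over control."""
    return fold_change(level, control) >= min_fold


# Illustrative levels in arbitrary units, not data from the disclosure.
meets_50_percent = is_increased(15.0, 10.0, percent_to_fold(50))   # ratio 1.5
meets_10_percent = is_increased(10.5, 10.0, percent_to_fold(10))   # ratio 1.05
meets_3_fold = is_increased(30.0, 10.0, 3.0)                       # ratio 3.0
```

Working in ratios keeps the "at least about 10%" through "at least about 3-fold" thresholds comparable, since each corresponds to one minimum ratio (1.1 through 3.0 under the assumption stated above).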
[0048] In some embodiments, the one or more cfRNA biomarkers: FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate monoclonal gammopathy of undetermined significance (MGUS). In some examples, the methods include analyzing or measuring a level of one or more of FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates MGUS.
[0049] In some embodiments, the one or more cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate liver cancer or a predisposition to liver cancer. In some examples, a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination of two or more thereof is analyzed or measured in cfRNA in a sample from a subject, and differential expression of one or more indicates liver cancer or a predisposition to a liver cancer. In some embodiments, the liver cancer is hepatocellular carcinoma (HCC). In some embodiments, the predisposition to liver cancer is cirrhosis.
[0050] In some embodiments, the one or more cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or any combination thereof, are selected to indicate hepatocellular carcinoma (HCC). In some examples, the methods include analyzing or measuring a level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or a combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates HCC. In other examples, the methods include analyzing or measuring a level of one or more (such as 1, 2, 3, 4, or all) of C3, CP, FGA, FGB, and IFITM3 in cfRNA in a sample from a subject, wherein differential expression of one or more (such as 1, 2, 3, 4, or all) indicates HCC. In some examples, an increase in expression level of one or more of APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or a combination of any two or more thereof (including, but not limited to, an increase in expression level of each of C3, CP, FGA, FGB, and IFITM3) compared to a control indicates HCC. In some examples, the differential expression is an increase of at least about 10%, at least about 20%, at least about 30%, at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 2-fold, at least about 2.5-fold, at least about 3-fold, or more compared to the control.
[0051] In some embodiments, the one or more cfRNA biomarkers: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate liver cirrhosis. In some examples, the methods include analyzing or measuring a level of one or more of ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or a combination of any two or more thereof in cfRNA in a sample from a subject, wherein differential expression of one or more indicates liver cirrhosis.
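The four biomarker groups described above (multiple myeloma, MGUS, hepatocellular carcinoma, and liver cirrhosis) can be represented as a simple lookup structure. The Python sketch below is a hypothetical illustration, not part of the disclosed methods: the `PANELS` dictionary and the `group_by_indication` helper are invented names, and gene symbols are written in their standard forms (for example CA1, ATP1B1, and C9orf16), which is an assumption about the intended symbols.

```python
# Hypothetical lookup of the four biomarker groups named in the text.
PANELS = {
    "MM": {"AIDA", "CA1", "CENPE", "CPOX", "ELL2",
           "EPB42", "HBG1", "HBG2", "NEK2", "NUSAP1"},
    "MGUS": {"FPR3", "SMC4", "TXNDC16", "ASPM", "WRN",
             "ZRANB2-AS2", "BMX", "CDC42BPA", "KNL1", "CACNA1A"},
    "HCC": {"APOE", "C3", "CP", "DHCR24", "FGA",
            "FGB", "FGG", "HRG", "IFITM3", "ATP1B1"},
    "cirrhosis": {"ABCB7", "HIST1H2BF", "PSIP1", "TMEM150C", "ZC3H6",
                  "C9orf16", "CPQ", "DYNC1I2", "ECM1", "HIST1H2AH"},
}


def group_by_indication(differential_genes: set) -> dict:
    """Map each indication to the differentially expressed genes in its group."""
    grouped = {}
    for indication, genes in PANELS.items():
        hits = sorted(genes & differential_genes)
        if hits:  # omit indications with no differentially expressed biomarker
            grouped[indication] = hits
    return grouped


# Illustrative input: genes assumed to have been flagged by an upstream analysis.
result = group_by_indication({"CENPE", "NUSAP1", "FGA"})
```

Here `result` groups CENPE and NUSAP1 under the multiple myeloma set and FGA under the HCC set. Keeping the groups as disjoint sets makes it straightforward to build the combined blood cancer panel (MM plus MGUS) or liver cancer panel (HCC plus cirrhosis) by set union.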
[0052] In some embodiments, the one or more cfRNA biomarkers are selected to determine the efficacy of a prophylactic treatment for preventing the development of cancer in subjects having a predisposition to cancer.
[0053] In some embodiments, the differential expression analysis is selected from the group of: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof. Skilled persons will understand that a lack of differential expression between the selected one or more cfRNA biomarkers and a corresponding CV will generally indicate a lack of cancer (e.g., “non-cancer”) or a lack of predisposition to cancer in the subject.
[0054] In some embodiments, the level of the one or more cfRNA biomarkers is analyzed by a method selected from the group of: a polymerase chain reaction (PCR), a quantitative PCR (qPCR), a reverse transcription PCR (RT-PCR), a complementary DNA (cDNA) synthesis, or a real-time PCR, or any combination thereof. Skilled persons will understand that polynucleotide amplification (e.g. PCR) may require a primer pair designed to amplify a specific gene target. In some embodiments, a primer pair is selected to amplify a specific cfRNA gene target (as shown in Table 17). In some embodiments, a primer pair, selected from the group of: SEQ ID NO: 1 and SEQ ID NO: 2; SEQ ID NO: 3 and SEQ ID NO: 4; SEQ ID NO: 5 and SEQ ID NO: 6; SEQ ID NO: 7 and SEQ ID NO: 8; SEQ ID NO: 9 and SEQ ID NO: 10; SEQ ID NO: 11 and SEQ ID NO: 12; SEQ ID NO: 13 and SEQ ID NO: 14; SEQ ID NO: 15 and SEQ ID NO: 16; SEQ ID NO: 17 and SEQ ID NO: 18; SEQ ID NO: 19 and SEQ ID NO: 20; SEQ ID NO: 21 and SEQ ID NO: 22; SEQ ID NO: 23 and SEQ ID NO: 24; SEQ ID NO: 25 and SEQ ID NO: 26; SEQ ID NO: 27 and SEQ ID NO: 28; SEQ ID NO: 29 and SEQ ID NO: 30; SEQ ID NO: 31 and SEQ ID NO: 32; SEQ ID NO: 33 and SEQ ID NO: 34; SEQ ID NO: 35 and SEQ ID NO: 36; SEQ ID NO: 37 and SEQ ID NO: 38; SEQ ID NO: 39 and SEQ ID NO: 40; SEQ ID NO: 41 and SEQ ID NO: 42; or any combination thereof, is used to analyze the one or more cfRNA biomarkers in the biological sample.
[0055] In some examples, the level of the one or more cfRNA biomarkers is detected using RT-qPCR. In some examples, the methods include a step utilizing a pool of two or more pairs of primers to pre-amplify a plurality of cDNAs of interest (for example generated by RT-PCR of cfRNA), followed by a step including two or more individual amplification reactions, each utilizing a single pair of primers to amplify a single cDNA of interest from the pre-amplification step (for example, using quantitative real-time PCR). In some examples, the pre-amplification method includes performing a RT-PCR reaction comprising primer pairs for amplifying two or more of the cfRNA biomarkers described herein, producing a pre-amplified pool of cDNAs and digesting the pre-amplified pool of cDNAs to remove single-stranded nucleic acids.
[0056] In some examples, the methods include amplifying the one or more cfRNA biomarkers utilizing one or more primer pairs selected from the primer pair of SEQ ID NO: 23 and SEQ ID NO: 24, the primer pair of SEQ ID NO: 25 and SEQ ID NO: 26, the primer pair of SEQ ID NO: 27 and SEQ ID NO: 28, the primer pair of SEQ ID NO: 29 and SEQ ID NO: 30, the primer pair of SEQ ID NO: 31 and SEQ ID NO: 32, the primer pair of SEQ ID NO: 33 and SEQ ID NO: 34, the primer pair of SEQ ID NO: 35 and SEQ ID NO: 36, the primer pair of SEQ ID NO: 37 and SEQ ID NO: 38, the primer pair of SEQ ID NO: 39 and SEQ ID NO: 40, the primer pair of SEQ ID NO: 41 and SEQ ID NO: 42, or any combination thereof, for example for methods of detecting or identifying multiple myeloma. In some examples, the one or more primer pairs include each of the primer pair of SEQ ID NO: 25 and SEQ ID NO: 26, the primer pair of SEQ ID NO: 27 and SEQ ID NO: 28, the primer pair of SEQ ID NO: 29 and SEQ ID NO: 30, and the primer pair of SEQ ID NO: 31 and SEQ ID NO: 32 for methods of detecting or identifying multiple myeloma. In other examples, the methods include amplifying the one or more cfRNA biomarkers utilizing one or more primer pairs selected from the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6, the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, the primer pair of SEQ ID NO: 9 and SEQ ID NO: 10, the primer pair of SEQ ID NO: 11 and SEQ ID NO: 12, the primer pair of SEQ ID NO: 13 and SEQ ID NO: 14, the primer pair of SEQ ID NO: 15 and SEQ ID NO: 16, the primer pair of SEQ ID NO: 17 and SEQ ID NO: 18, the primer pair of SEQ ID NO: 19 and SEQ ID NO: 20, the primer pair of SEQ ID NO: 21 and SEQ ID NO: 22, or any combination thereof for methods of detecting or identifying HCC. 
In further examples, the one or more primer pairs include each of the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6, the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, the primer pair of SEQ ID NO: 9 and SEQ ID NO: 10, the primer pair of SEQ ID NO: 11 and SEQ ID NO: 12 and the primer pair of SEQ ID NO: 13 and SEQ ID NO: 14 for methods of detecting or identifying HCC.
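The primer pairs recited above follow a regular pattern in which each pair consists of an odd-numbered SEQ ID NO and the next even-numbered SEQ ID NO. As a minimal sketch of how the recited sets could be enumerated, the Python below uses a hypothetical helper, `primer_pairs`, and hypothetical set names; it encodes only the SEQ ID NO numbering and makes no assumption about which gene each pair amplifies.

```python
def primer_pairs(first: int, last: int) -> list:
    """Return SEQ ID NO pairs (first, first + 1), ..., (last - 1, last)."""
    if first % 2 == 0 or last % 2 == 1 or first >= last:
        raise ValueError("expected an odd first and a larger even last SEQ ID NO")
    return [(n, n + 1) for n in range(first, last, 2)]


# Sets recited for multiple myeloma (SEQ ID NOs: 23-42) and HCC (SEQ ID NOs: 5-22).
MM_PAIRS = primer_pairs(23, 42)
HCC_PAIRS = primer_pairs(5, 22)

# Narrower sets in which each listed pair is used together.
MM_EACH = primer_pairs(25, 32)
HCC_EACH = primer_pairs(5, 14)
```

Under this numbering the multiple myeloma set contains ten pairs and the HCC set nine, and the narrower "each of" sets (four and five pairs, respectively) are subsets of them.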
[0057] In some embodiments, the biological sample is selected from the group of: a blood sample, a serum sample, a plasma sample, a urine sample, a tear sample, a saliva sample, a breast milk sample, a semen sample, a fecal sample, a cerebrospinal fluid sample, a tissue sample, or a cell sample.
[0058] In some embodiments, the subject is a human who has, or is suspected of having, cancer or a predisposition to cancer. For example, a subject can be an individual who has been diagnosed with having a cancer, is going to receive a cancer therapy, and/or has received at least one cancer therapy. The subject can be in remission of a cancer. In another example, a subject can be an individual who has a family history of having a cancer and therefore is predisposed to cancer. In yet another example, a subject can be an individual who was exposed to an environmental agent and therefore is predisposed to cancer.
[0059] As disclosed herein, “biological sample” and “sample” are used interchangeably and may include, but are not limited to, a blood sample, a serum sample, a plasma sample, a urine sample, a tear sample, a saliva sample, a breast milk sample, a semen sample, a fecal sample, a tissue sample, or a cell sample. A biological sample may be material obtained from cells or derived from cells of a subject. The biological sample may be a heterogeneous or homogeneous population of cells or tissues. The biological sample may be obtained using any method known in the art that can provide a sample suitable for the analytical methods described herein. The sample may be obtained by non-invasive methods including but not limited to: drawing blood, scraping of the skin or cervix, swabbing of the cheek, saliva collection, urine collection, feces collection, collection of menses, tears, or semen.
[0060] In certain embodiments the biological sample is obtained by biopsy. In other embodiments the biological sample is obtained by swabbing, endoscopy, scraping, phlebotomy, lumbar puncture (spinal tap) or any other methods known in the art. In some cases, the biological sample may be obtained, stored, or transported using components of a kit of the disclosed methods. In some cases, multiple samples, such as multiple blood samples, may be obtained for diagnosis by the methods described herein. In some cases, longitudinal studies relying on multiple samples collected at different times may be performed by the methods described herein. In other cases, multiple samples, such as one or more samples from one tissue type (for example esophagus) and one or more samples from another specimen (for example serum), may be obtained for diagnosis by the methods. In some cases, multiple samples such as one or more samples from one tissue type (e.g. esophagus) and one or more samples from another specimen (e.g. serum) may be obtained at the same or different times. Samples obtained at different times may be stored and/or analyzed by different methods. For example, a sample may be obtained and analyzed by routine staining methods or any other cytological analysis methods.
[0061] In some embodiments the biological sample may be obtained by a physician, nurse, or other medical professional such as a medical technician, endocrinologist, cytologist, phlebotomist, radiologist, or a pulmonologist. The medical professional may indicate the appropriate test or assay to perform on the sample. In certain aspects a molecular profiling business may consult on which assays or tests are most appropriately indicated. In further aspects of the disclosed methods, the patient or subject may obtain a biological sample for testing without the assistance of a medical professional, such as obtaining a whole blood sample, a urine sample, a fecal sample, a buccal sample, or a saliva sample.
[0062] In other cases, the biological sample is obtained by an invasive procedure including but not limited to: biopsy, needle aspiration, endoscopy, or phlebotomy. The method of needle aspiration may further include fine needle aspiration, core needle biopsy, vacuum assisted biopsy, or large core biopsy. In some embodiments, multiple samples may be obtained by the methods herein to ensure a sufficient amount of biological material.
[0063] General methods for obtaining biological samples are also known in the art. Publications such as Ramzy, Ibrahim Clinical Cytopathology and Aspiration Biopsy 2001, which is herein incorporated by reference in its entirety, describe general methods for biopsy and cytological methods. In one embodiment, the sample is a fine needle aspirate of an esophageal or a suspected esophageal tumor or neoplasm. In some cases, the fine needle aspirate sampling procedure may be guided by the use of an ultrasound, X-ray, or other imaging device.
[0064] In certain aspects, the methods for obtaining a biological sample from a subject may include methods of biopsy such as fine needle aspiration, core needle biopsy, vacuum assisted biopsy, incisional biopsy, excisional biopsy, punch biopsy, shave biopsy or skin biopsy. In certain embodiments the biological sample is obtained from a biopsy from liver tissue by any of the biopsy methods previously mentioned. In other embodiments the biological sample may be obtained from any of the tissues provided herein that include but are not limited to non-cancerous or cancerous tissue and non-cancerous or cancerous tissue from the serum, gall bladder, mucosal, skin, heart, lung, breast, pancreas, blood, liver, muscle, kidney, smooth muscle, bladder, colon, intestine, brain, prostate, esophagus, or thyroid tissue. Alternatively, the sample may be obtained from any other source including but not limited to blood, plasma, serum, urine, breastmilk, semen, sweat, hair follicle, buccal tissue, tears, menses, feces, saliva, or cells. In certain aspects of the disclosed methods, any medical professional such as a doctor, nurse or medical technician may obtain a biological sample for testing. Yet further, the biological sample can be obtained without the assistance of a medical professional.
[0065] In some embodiments, the biological sample may be obtained from the subject directly, from a medical professional, from a third party, or from a kit provided by a molecular profiling business or a third party. In some cases, the biological sample may be obtained by the molecular profiling business after the subject, a medical professional, or a third party acquires and sends the biological sample to the molecular profiling business. In some cases, the molecular profiling business may provide suitable containers and excipients for storage and transport of the biological sample to the molecular profiling business.
[0066] In some embodiments, a medical professional need not be involved in the initial diagnosis or biological sample acquisition. A subject may alternatively provide a biological sample through the use of an over the counter (OTC) kit. An OTC kit may contain a means for providing the biological sample as described herein, a means for storing the biological sample for inspection, and instructions for proper use of the OTC kit. In some cases, molecular profiling services are included in the price for purchase of the kit. In other cases, the molecular profiling services are billed separately. A biological sample suitable for use by the molecular profiling business may contain tissues, cells, nucleic acids, genes, gene fragments, expression products, gene expression products, or gene expression product fragments of a subject.
[0067] In some embodiments, the subject may be referred to a specialist such as an oncologist, surgeon, or endocrinologist. The specialist may likewise obtain a biological sample for testing or refer the individual to a testing center or laboratory for submission of the biological sample. In some cases the medical professional may refer the subject to a testing center or laboratory for submission of the biological sample. In other cases, the subject may provide the biological sample. In some cases, a molecular profiling business may obtain the biological sample.
[0068] In an exemplary embodiment, the level of the one or more cell-free RNA (cfRNA) biomarkers is a gene expression level. The methods disclosed herein include measuring expression of coding and/or noncoding cfRNA genes. In some embodiments, the expression of coding and/or noncoding RNA or DNA is analyzed. Measurement of expression can be done by a number of processes known in the art. The process of measuring expression may begin by isolating or extracting RNA from a biological sample (e.g., tissue sample, blood sample, plasma sample, etc.). In an exemplary embodiment, isolation or extraction of cfRNA does not require applying a cell lysis step. In some embodiments, a cell lysis step may be applied to induce release of polynucleotide from the cell. Skilled persons will understand that cell lysis may be induced by applying detergent, mechanical shearing, heat, or any other methods known in the art used to lyse a cell. In some examples, one or more commercially available kits may be used for isolation of cfRNA. Examples include kits from Qiagen (e.g., QIAamp Circulating Nucleic Acid kit), Thermo Fisher Scientific (e.g., MagMAX Cell-Free Total Nucleic Acid kit), and Zymo Research (e.g., Quick-cfRNA Serum & Plasma kit). A skilled person can select appropriate kits and methods for isolating or extracting cfRNA.
[0069] In some embodiments, the level of the one or more cfRNA biomarkers is analyzed or measured by hybridization (for example, by means of Northern blot analysis or DNA or RNA arrays (microarrays) after converting RNA into labeled complementary DNA (cDNA)) and/or amplification by means of an enzymatic chain reaction. In some embodiments, quantitative or semi-quantitative enzymatic amplification methods such as polymerase chain reaction (PCR) or quantitative real-time RT-PCR or semi-quantitative RT-PCR techniques may be used. Other suitable amplification methods may include ligase chain reaction (LCR), transcription-mediated amplification (TMA), strand displacement amplification (SDA), isothermal amplification of nucleic acids, and nucleic acid sequence based amplification (NASBA).
[0070] As used herein, “primer” refers to a single-stranded polynucleotide configured to hybridize with a complementary polynucleotide strand and define a region or locus of the polynucleotide where amplification will initiate. As used herein, a “primer pair” refers to two primers configured to hybridize with a polynucleotide and define a region or locus that will be amplified. For example, a typical PCR reaction relies on a “forward” primer and a “reverse” primer, used conjunctively as a primer pair, to hybridize to, respectively, the antisense and sense strands of a double-stranded polynucleotide (e.g., DNA). Thus, use as a primer pair constitutes using a primer pair configured to amplify a specific region or locus, such as a selected cfRNA biomarker.
[0071] In an exemplary embodiment, primer pairs are selected to amplify one or more cfRNA biomarkers (see Table 17). In some embodiments, the method uses any of: SEQ ID NO: 1 and SEQ ID NO: 2 as a primer pair; SEQ ID NO: 3 and SEQ ID NO: 4 as a primer pair; SEQ ID NO: 5 and SEQ ID NO: 6 as a primer pair; SEQ ID NO: 7 and SEQ ID NO: 8 as a primer pair; SEQ ID NO: 9 and SEQ ID NO: 10 as a primer pair; SEQ ID NO: 11 and SEQ ID NO: 12 as a primer pair; SEQ ID NO: 13 and SEQ ID NO: 14 as a primer pair; SEQ ID NO: 15 and SEQ ID NO: 16 as a primer pair; SEQ ID NO: 17 and SEQ ID NO: 18 as a primer pair; SEQ ID NO: 19 and SEQ ID NO: 20 as a primer pair; SEQ ID NO: 21 and SEQ ID NO: 22 as a primer pair; SEQ ID NO: 23 and SEQ ID NO: 24 as a primer pair; SEQ ID NO: 25 and SEQ ID NO: 26 as a primer pair; SEQ ID NO: 27 and SEQ ID NO: 28 as a primer pair; SEQ ID NO: 29 and SEQ ID NO: 30 as a primer pair; SEQ ID NO: 31 and SEQ ID NO: 32 as a primer pair; SEQ ID NO: 33 and SEQ ID NO: 34 as a primer pair; SEQ ID NO: 35 and SEQ ID NO: 36 as a primer pair; SEQ ID NO: 37 and SEQ ID NO: 38 as a primer pair; SEQ ID NO: 39 and SEQ ID NO: 40 as a primer pair; SEQ ID NO: 41 and SEQ ID NO: 42 as a primer pair; or any combination thereof, to analyze the one or more cfRNA biomarkers in the biological sample.
[0072] It is understood that additional separate embodiments are contemplated wherein each method herein uses each individual primer pair previously mentioned. For instance, one embodiment for each method uses the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6 and another embodiment for each method uses the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, and so on.
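By way of non-limiting illustration, the pairing pattern recited above (each odd-numbered SEQ ID NO paired with the next even-numbered SEQ ID NO, through SEQ ID NO: 42) can be enumerated as in the following minimal sketch. The function name `primer_pairs` is hypothetical, and the sketch does not state which biomarker each pair targets; that assignment is given in Table 17.

```python
def primer_pairs(n_pairs=21):
    """Return the SEQ ID NO pairs (1, 2), (3, 4), ..., (41, 42).

    Hypothetical helper: it only reproduces the numbering pattern of the
    recited primer pairs, not the sequences or their target biomarkers.
    """
    return [(2 * k - 1, 2 * k) for k in range(1, n_pairs + 1)]


pairs = primer_pairs()
for first, second in pairs:
    # Each tuple corresponds to one separately contemplated embodiment.
    print(f"SEQ ID NO: {first} and SEQ ID NO: {second} as a primer pair")
```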
[0073] In some embodiments, gene expression levels of the one or more cfRNA biomarkers may also be analyzed by RNA sequencing methods known in the art. RNA sequencing methods may include cfRNA-seq, total RNA-seq, targeted RNA-seq, small RNA-seq, single-cell RNA-seq, ultra-low-input RNA-seq, RNA exome capture sequencing, and ribosome profiling. Sequencing data may be processed and aligned using methods known in the art.
[0074] In some embodiments, a method for analyzing one or more cfRNA biomarkers by sequencing comprises: (a) isolating a set of one or more cfRNA biomarkers from the biological sample; (b) analyzing the set of one or more cfRNA biomarkers isolated in Step (a) to produce a set of one or more sequence reads; and (c) performing a differential expression analysis on the set of one or more sequence reads to a corresponding consensus sequence (CS) to measure the level of at least one cell-free RNA (cfRNA) biomarker selected from the group consisting of AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, or any combination thereof. A differential expression shown between the set of one or more sequence reads aligning with a corresponding CS indicates cancer or a predisposition for cancer in the subject.
[0075] In some embodiments, the analysis used to obtain sequencing reads of Step (b) is: Maxam-Gilbert sequencing, chain-termination sequencing, pyrosequencing, massively parallel sequencing, or any combination thereof.
[0076] In some embodiments, the differential expression analysis is selected from the group of: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.
[0077] In some embodiments, one or more primer pairs, selected from the group of: SEQ ID NO: 1 and SEQ ID NO: 2; SEQ ID NO: 3 and SEQ ID NO: 4; SEQ ID NO: 5 and SEQ ID NO: 6; SEQ ID NO: 7 and SEQ ID NO: 8; SEQ ID NO: 9 and SEQ ID NO: 10; SEQ ID NO: 11 and SEQ ID NO: 12; SEQ ID NO: 13 and SEQ ID NO: 14; SEQ ID NO: 15 and SEQ ID NO: 16; SEQ ID NO: 17 and SEQ ID NO: 18; SEQ ID NO: 19 and SEQ ID NO: 20; SEQ ID NO: 21 and SEQ ID NO: 22; SEQ ID NO: 23 and SEQ ID NO: 24; SEQ ID NO: 25 and SEQ ID NO: 26; SEQ ID NO: 27 and SEQ ID NO: 28; SEQ ID NO: 29 and SEQ ID NO: 30; SEQ ID NO: 31 and SEQ ID NO: 32; SEQ ID NO: 33 and SEQ ID NO: 34; SEQ ID NO: 35 and SEQ ID NO: 36; SEQ ID NO: 37 and SEQ ID NO: 38; SEQ ID NO: 39 and SEQ ID NO: 40; SEQ ID NO: 41 and SEQ ID NO: 42; or any combination thereof, are used to generate cDNA useful for producing sequencing reads of the one or more cfRNA biomarkers. In some embodiments, one or more cfRNA biomarkers from the group of: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected or utilized to indicate blood cancer or a predisposition to blood cancer. In some embodiments, one or more cfRNA biomarkers: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination thereof, are selected or utilized to indicate multiple myeloma (MM). In further examples, the cfRNA biomarkers CENPE, HBG1, HBG2, and NUSAP1 are selected or utilized to indicate MM. In some embodiments, one or more cfRNA biomarkers from the group of: FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate monoclonal gammopathy of undetermined significance (MGUS). 
In some embodiments, one or more cfRNA biomarkers from the group of: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected or utilized to indicate liver cancer or a predisposition to liver cancer. In some embodiments, one or more cfRNA biomarkers from the group of: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, or any combination thereof, are selected or utilized to indicate hepatocellular carcinoma (HCC). In further examples, the cfRNA biomarkers C3, CP, FGA, FGB, and IFITM3 are selected or utilized to indicate HCC. In some embodiments, one or more cfRNA biomarkers from the group of: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected or utilized to indicate cirrhosis.
[0078] In some embodiments, the sequencing reads of Step (b) are obtained by: Maxam-Gilbert sequencing, chain-termination sequencing, pyrosequencing, massively parallel sequencing, or any combination thereof.
[0079] In some embodiments, the differential expression analysis is selected from the group of: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.
[0080] To normalize the expression values of one gene among different samples, the cfRNA level of interest in the samples from the subject may be compared with a control value (CV). In some embodiments, a CV may be of a gene for which the expression level does not differ across sample types, for example a gene that is constitutively expressed in all types of cells. In some embodiments, a CV may be of a gene for which the expression level indicates a non-cancerous state in the subject. In some embodiments, a known amount of a control RNA may be added to the sample(s) and the value analyzed for the level of the RNA of interest may be normalized to the value analyzed for the known amount of the control RNA. Normalization for some methods, such as for sequencing, may comprise calculating the reads per kilobase of transcript per million mapped reads (RPKM) for a gene of interest, or may comprise calculating the fragments per kilobase of transcript per million mapped reads (FPKM) for a gene of interest. Normalization methods may comprise calculating the log2-transformed counts per million (log-CPM). Skilled persons will understand that any method of normalization that accurately calculates the expression value of an RNA for comparison between samples may be used.
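By way of non-limiting illustration, the normalizations named above can be sketched as follows. The function names, the example counts, and the pseudocount used to avoid taking the logarithm of zero are assumptions introduced for this sketch; packages such as edgeR compute log-CPM with their own prior-count scheme, which is not reproduced here.

```python
import math


def per_kilobase_per_million(count, transcript_length_bp, total_mapped):
    """RPKM when `count` is a read count, FPKM when it is a fragment count."""
    return count * 1e9 / (transcript_length_bp * total_mapped)


def log2_cpm(count, total_mapped, pseudocount=0.5):
    """log2-transformed counts per million.

    The pseudocount is an assumption of this sketch; it keeps the
    logarithm defined when a gene has zero counts.
    """
    return math.log2((count + pseudocount) / total_mapped * 1e6)


def spike_in_normalized(target_value, control_value):
    """Level of the RNA of interest relative to a known amount of control RNA."""
    return target_value / control_value


# Illustrative numbers only: 500 reads on a 2,000 bp transcript,
# 10 million mapped reads in the library.
rpkm = per_kilobase_per_million(500, 2_000, 10_000_000)
print(rpkm)  # 25.0
print(log2_cpm(500, 10_000_000))
```

Values normalized in this way are comparable between samples with different sequencing depths, which is the purpose of the normalization step described above.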
[0081] In some embodiments, the CV is a reference expression level. As used herein, the term "reference expression level" (or “reference level”) refers to a value used as a reference for the values/data obtained from samples obtained from a subject. The reference level can be an absolute value, a relative value, a value which has an upper and/or lower limit, a series of values, an average value, a median, a mean value, or a value expressed by reference to a control or reference value. A reference level can be based on the value obtained from an individual sample, such as, for example, a value obtained from a sample from the subject but obtained at a previous point in time. The reference level can be based on a high number of samples, such as the levels obtained in a cohort of subjects having a particular characteristic. The reference level may be defined as the mean level of the patients in the cohort. A reference level can be based on the expression levels of the biomarkers obtained from samples from subjects who do not have a disease state or a particular phenotype. Skilled persons will understand that the particular reference expression level can vary depending on the specific method to be performed.
[0082] Some embodiments include determining that an analyzed expression level is higher than, lower than, increased relative to, decreased relative to, equal to, or within a predetermined amount of a reference expression level. In some embodiments, a higher, lower, increased, or decreased expression level is at least 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 50, 100, 150, 200, 250, 500, or 1000 fold (or any derivable range therein) or at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, or 900% different than the reference level, or any derivable range therein. These values may represent a predetermined threshold level, and some embodiments include determining that the analyzed expression level is higher by a predetermined amount or lower by a predetermined amount than a reference level. In some embodiments, a level of expression may be qualified as “low” or “high,” which indicates the patient expresses a certain gene or cfRNA at a level relative to a reference level or a level with a range of reference levels that are determined from multiple samples meeting particular criteria. The level or range of levels in multiple control samples is an example of this. In some embodiments, that certain level or a predetermined threshold value is at, below, or above 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100 percentile, or any range derivable therein. Moreover, a threshold level may be derived from a cohort of individuals meeting particular criteria. The number in the cohort may be, be at least, or be at most 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 441, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000 or more (or any range derivable therein). An analyzed expression level can be considered equal to a reference expression level if it is within a certain amount of the reference expression level, and such amount may be an amount that is predetermined. The predetermined amount may be within 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, or 50% of the reference level, or any range derivable therein.
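By way of non-limiting illustration, the comparison of an analyzed expression level with a reference expression level can be sketched as follows. The function name, the returned labels, and the particular defaults (a 2-fold threshold and a 10% equality window, both taken from the ranges listed above) are assumptions chosen for this sketch; any of the recited thresholds could be substituted.

```python
def compare_to_reference(level, reference, fold_threshold=2.0, equal_within_pct=10.0):
    """Classify an expression level relative to a reference level.

    Returns "equal" if the level is within `equal_within_pct` percent of the
    reference, "higher" or "lower" if it differs by at least `fold_threshold`
    fold, and "unclassified" otherwise. Labels are hypothetical.
    """
    if reference <= 0 or level < 0:
        raise ValueError("reference must be positive and level non-negative")
    pct_diff = abs(level - reference) / reference * 100.0
    if pct_diff <= equal_within_pct:
        return "equal"
    if level >= fold_threshold * reference:
        return "higher"
    if level * fold_threshold <= reference:
        return "lower"
    return "unclassified"


# Illustrative values only, in arbitrary normalized units.
for value in (105.0, 250.0, 40.0, 150.0):
    print(value, compare_to_reference(value, reference=100.0))
```

The "unclassified" outcome covers levels that differ from the reference by more than the equality window but by less than the fold threshold.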
[0083] In some embodiments, a comparison of cfRNA gene expression levels to a reference level is to be made on a gene-by-gene basis. For example, if the expression levels of gene A, gene B, and gene X, as reflected in a patient’s cfRNA levels, are analyzed, a comparison to mean expression levels as reflected in cfRNA from a cohort of patients would involve: comparing the expression level of gene A in the patient’s cfRNA with the mean expression level of gene A reflected in cfRNA from the cohort of patients, comparing the expression level of gene B reflected in the patient’s cfRNA with the mean expression level of gene B in cfRNA from the cohort of patients, and comparing the expression level of gene X in cfRNA from the patient with the mean expression level of gene X in cfRNA from the cohort of patients. In the above example, genes A, B, and X may be selected from any one of: AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH for comparison. Comparisons that involve determining whether the expression level analyzed in cfRNA from a patient is within a predetermined amount of a mean expression level or reference expression level are similarly done on a gene-by-gene basis, as applicable.
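By way of non-limiting illustration, the gene-by-gene comparison described in the example above can be sketched as follows. The function name, the data layout (dictionaries keyed by gene), and all numeric values are placeholders introduced for this sketch and do not represent measured data.

```python
from statistics import mean


def compare_gene_by_gene(patient_levels, cohort_levels):
    """Return, per gene, the ratio of the patient's level to the cohort mean.

    `patient_levels` maps gene -> level; `cohort_levels` is a list of such
    mappings, one per cohort member. Each gene is compared only with the
    cohort mean of that same gene.
    """
    ratios = {}
    for gene, level in patient_levels.items():
        cohort_mean = mean([member[gene] for member in cohort_levels])
        ratios[gene] = level / cohort_mean
    return ratios


# Placeholder values in arbitrary normalized units.
patient = {"gene A": 30.0, "gene B": 5.0, "gene X": 12.0}
cohort = [
    {"gene A": 10.0, "gene B": 4.0, "gene X": 12.0},
    {"gene A": 20.0, "gene B": 6.0, "gene X": 12.0},
]
ratios = compare_gene_by_gene(patient, cohort)
print(ratios)
```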
[0084] In an exemplary embodiment, a differential expression analysis is performed comparing the level of each cfRNA biomarker that is analyzed or utilized to a corresponding control value (CV). Differential expression shown by the differential expression analysis between the cfRNA biomarkers and corresponding CVs indicates cancer or a predisposition for cancer in the subject.
[0085] In some embodiments, the differential expression analysis comprises: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.
[0086] In some embodiments, the method measures the level of one or more cfRNA biomarkers by Maxam-Gilbert sequencing, chain-termination sequencing, pyrosequencing, or massively parallel sequencing.
[0087] In some embodiments, DNA from the biological sample, cDNA derived from RNA from the biological sample, and/or amplification products of any of these are sequenced to produce sequencing reads identifying the order of nucleotides present in the sequenced polynucleotides or the complements thereof. A variety of suitable sequencing techniques are available.
[0088] In some embodiments, the method comprises: (a) collecting a biological sample from the subject; (b) isolating a set of one or more cfRNA molecules from the biological sample collected in Step (a); (c) sequencing the set of one or more cfRNA molecules isolated in Step (b) to produce a set of one or more sequence reads; and (d) performing a differential expression analysis on the set of one or more sequence reads to a corresponding consensus sequence (CS) to measure the level of at least one cell-free RNA (cfRNA) biomarker selected from the group consisting of AIDA, CAI, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, or any combination thereof, in the biological sample. Differential expression between the set of one or more sequence reads aligning with a corresponding CS indicates cancer or a predisposition for cancer in the subject.
[0089] In some embodiments, sequencing comprises massively parallel sequencing of about, or at least about 10,000, 100,000, 500,000, 1,000,000, or more DNA or cDNA molecules using a high-throughput sequencing by synthesis process, such as Illumina's sequencing-by-synthesis and reversible terminator-based sequencing chemistry (e.g. as described in Bentley et al., Nature 6:53-59 (2009)). In some embodiments, particularly when cfDNA is included among the polynucleotides to be sequenced, DNA is not fragmented prior to sequencing. Typically, Illumina's sequencing process comprises attachment of template DNA to a planar, optically transparent surface on which oligonucleotide anchors are bound. (In some embodiments, template DNA may include cDNA.) Template DNA is end-repaired to generate 5'-phosphorylated blunt ends, and the polymerase activity of Klenow fragment is used to add a single A base to the 3' end of the blunt phosphorylated DNA. This addition prepares the DNA for ligation to oligonucleotide adapters, which optionally have an overhang of a single T base at their 3' end to increase ligation efficiency. The adapter oligonucleotides are complementary to the flow-cell anchor oligos. Under limiting-dilution conditions, adapter-modified, single-stranded template DNA is added to the flow cell and immobilized by hybridization to the anchor oligos. Attached DNA fragments are extended and bridge amplified to create an ultra-high density sequencing flow cell with hundreds of millions of clusters, each containing about 1,000 copies of the same template. In one embodiment, the template DNA is amplified using PCR before it is subjected to cluster amplification, such as in a process described above. In some applications, the templates are sequenced using a robust four-color DNA sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes. High-sensitivity fluorescence detection is achieved using laser excitation and total internal reflection optics. Short sequence reads of about tens to a few hundred base pairs are aligned against a reference genome, and unique mappings of the short sequence reads to the reference genome are identified using specially developed data analysis pipeline software. After completion of the first read, the templates can be regenerated in situ to enable a second read from the opposite end of the fragments. Thus, either single-end or paired-end sequencing of the DNA fragments can be used.
[0090] Another non-limiting example sequencing process is the single molecule sequencing technology of the Helicos True Single Molecule Sequencing (tSMS) technology (e.g. as described in Harris T. D. et al., Science 320: 106-109 (2008)). In a typical tSMS process, a DNA sample is cleaved into, or otherwise provided as strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3' end of each DNA strand. Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide. The DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites that are immobilized to the flow cell surface. In some embodiments, the templates are at a density of about 100 million templates/cm². The flow cell is then loaded into an instrument, e.g., HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template. A CCD camera can map the position of the templates on the flow cell surface. The template fluorescent label is then cleaved and washed away. The sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide. The oligo-T nucleic acid serves as a primer. The polymerase incorporates the labeled nucleotides to the primer in a template directed manner. The polymerase and unincorporated nucleotides are removed. The templates that have directed incorporation of the fluorescently labeled nucleotide are discerned by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step. Whole genome sequencing by single molecule sequencing technologies excludes or typically obviates PCR-based amplification in the preparation of the sequencing libraries.
[0091] Another illustrative, but non-limiting example sequencing process is pyrosequencing, such as in the 454 sequencing platform (Roche) (e.g. as described in Margulies, M. et al. Nature 437:376-380 (2005)). 454 sequencing typically involves two steps. In the first step, DNA is sheared into fragments of, or otherwise provided (e.g. as naturally occurring cfDNA molecules, or cDNA from naturally short RNA molecules) as DNA having sizes of approximately 300-800 base pairs, and the polynucleotides are blunt-ended. Oligonucleotide adapters are then ligated to the ends of the DNA. The adapters serve as primers for amplification and sequencing of the DNA. The DNA can be attached to capture beads, e.g., streptavidin-coated beads using, e.g., adapter B, which contains a 5'-biotin tag. The DNA attached to the beads is PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA molecules on each bead. In the second step, the beads are captured in wells (e.g., picoliter-sized wells). Pyrosequencing is performed on each DNA molecule in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated. Pyrosequencing makes use of pyrophosphate (PPi) which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5' phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is measured and analyzed.
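By way of non-limiting illustration, because the pyrosequencing light signal is proportional to the number of nucleotides incorporated in each nucleotide flow, a series of normalized signals can be converted to a base sequence as in the following minimal sketch. The function name, the cyclic flow order "TACG", the normalization of one incorporation to a signal of about 1.0, and the example signals are assumptions of this sketch; instrument software applies more elaborate signal correction.

```python
def decode_flowgram(signals, flow_order="TACG"):
    """Convert normalized per-flow light signals into a base sequence.

    Each signal is rounded to the nearest whole number of incorporated
    nucleotides; a signal near 0 means the flowed nucleotide was not
    incorporated, and a signal near 2 indicates a two-base homopolymer.
    """
    bases = []
    for index, signal in enumerate(signals):
        nucleotide = flow_order[index % len(flow_order)]
        n_incorporated = int(signal + 0.5)  # round half up; signals are non-negative
        bases.append(nucleotide * n_incorporated)
    return "".join(bases)


# Illustrative signals for the flows T, A, C, G, T.
sequence = decode_flowgram([1.02, 0.05, 2.10, 0.90, 0.96])
print(sequence)
```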
[0092] Further high-throughput sequencing processes are available. Non-limiting examples include sequencing by ligation technologies (e.g., SOLiD™ sequencing of Applied Biosystems), single-molecule real-time sequencing (e.g., Pacific Biosciences sequencing platforms utilizing zero-mode waveguide detectors), nanopore sequencing (e.g. as described in Soni G V and Meller A. Clin Chem 53: 1996-2001 (2007)), sequencing using a chemical-sensitive field effect transistor (e.g., as described in U.S. Patent Application Publication No. 20090026082), sequencing platforms by Ion Torrent (pairing semiconductor technology with sequencing chemistry to directly translate chemically encoded information (A, C, G, T) into digital information (0, 1) on a semiconductor chip), and sequencing by hybridization. Additional illustrative details regarding sequencing technologies can be found in, e.g., U.S. Patent Application Publication No. 2016/0319345.
[0093] In some embodiments using unique molecular identifiers (UMIs), multiple sequence reads having the same UMI(s) are collapsed to obtain one or more consensus sequences, which are then used to determine the sequence of a source DNA polynucleotide. Multiple distinct reads may be generated from distinct instances of the same source DNA polynucleotide, and these reads may be compared to produce a consensus sequence. The instances may be generated by amplifying a source DNA molecule prior to sequencing, such that distinct sequencing operations are performed on distinct amplification products, each sharing the source DNA polynucleotide's sequence. Of course, amplification may introduce errors such that the sequences of the distinct amplification products have differences. In the context of some sequencing technologies, such as an embodiment of Illumina's sequencing-by-synthesis, a source DNA molecule or an amplification product thereof forms a cluster of DNA molecules linked to a region of a flow cell. The molecules of the cluster collectively provide a read. Typically, at least two reads are required to provide a consensus sequence. Sequencing depths of 100, 1000, and 10,000 are examples of sequencing depths useful in the disclosed embodiments for creating consensus reads for low allele frequencies (e.g., about 1% or less). In some embodiments, nucleotides that are consistent across 100% of the reads sharing a UMI or combination of UMIs are included in the consensus sequence. In some embodiments, the consensus criterion can be lower than 100%. For instance, a 90% consensus criterion may be used, which means that bases that exist in 90% or more of the reads in the group are included in the consensus sequence. In some embodiments, the consensus criterion may be set at about, or more than about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 95%, or about 100%.
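By way of non-limiting illustration, collapsing reads that share a UMI into a consensus sequence under a consensus criterion can be sketched as follows. The function name, the input layout, the assumption that reads in a UMI group are already aligned and of equal length, and the use of "N" for positions that fail the criterion are all choices made for this sketch; the example reads are placeholders.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, criterion=0.9, min_reads=2):
    """Collapse (umi, sequence) pairs into one consensus sequence per UMI.

    A base is included at a position only if it occurs in at least
    `criterion` of the reads in the group; otherwise "N" is written
    (an assumption of this sketch). Groups with fewer than `min_reads`
    reads yield no consensus.
    """
    groups = defaultdict(list)
    for umi, sequence in reads:
        groups[umi].append(sequence)

    consensus = {}
    for umi, sequences in groups.items():
        if len(sequences) < min_reads:
            continue
        called = []
        for column in zip(*sequences):  # one column per read position
            base, count = Counter(column).most_common(1)[0]
            called.append(base if count / len(sequences) >= criterion else "N")
        consensus[umi] = "".join(called)
    return consensus


# Placeholder reads: nine agree and one differs at the last position.
reads = [("AAC", "ACGT")] * 9 + [("AAC", "ACGA")]
reads += [("GGT", "ACGT"), ("GGT", "ACCT")]  # disagreement at position 3
reads += [("TTA", "ACGT")]                   # single read, no consensus
result = umi_consensus(reads)
print(result)
```

With a 90% criterion, the first group still resolves every position because nine of ten reads agree, whereas the two-read group cannot resolve the position at which its reads differ.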
[0094] In some embodiments, sequencing reads (or consensus sequences thereof) are identified as originating from an RNA molecule in the source sample if the tag sequence (or the complement thereof) forms part of the sequence read (optionally, at an expected position, and/or adjacent to other expected sequence element(s)), and otherwise is identified as originating from a DNA molecule in the source sample if the tag sequence (or the complement thereof) is absent. In this way, RNA sequencing reads and DNA sequencing reads can be produced in a single sequencing reaction, but analyzed separately, and optionally compared to one another. In some embodiments, a processor is used to group RNA-derived sequences separately from DNA-derived sequences. For example, in some embodiments, a mutation relative to an internal reference (e.g. overlapping reads) or an external reference (e.g. a reference genome) is only designated as accurately representing the original molecule (e.g. a DNA molecule of the sample) if the same mutation is identified in one or more reads corresponding to an original molecule of the other type (e.g. an RNA molecule of the sample). This is particularly helpful for increasing sequencing accuracy in cases where no UMIs are used, and can further increase sequencing accuracy when used in combination with UMIs. In some embodiments, for the purposes of alignment among sequencing reads and/or between sequencing reads and a reference sequence, one or more sequences corresponding to features known not to be present in the source polynucleotides (e.g. sequences known to originate from tag oligonucleotides, RT primers, TSOs, or amplification primers) are computationally ignored (e.g. filtered out of the reads prior to alignment).
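By way of non-limiting illustration, grouping reads as RNA-derived or DNA-derived according to the presence of a tag sequence can be sketched as follows. The tag sequence "GATTACA", the function names, and the example reads are hypothetical. The sketch also interprets "the complement thereof" as the reverse complement, which is how a tag would appear in a read taken from the opposite strand; that interpretation is an assumption.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def classify_read(read, tag, expected_pos=None):
    """Label a read "RNA" if it carries the tag, otherwise "DNA".

    With `expected_pos` set, the tag must start at that 0-based position;
    otherwise the tag or its reverse complement may occur anywhere.
    """
    if expected_pos is not None:
        return "RNA" if read.startswith(tag, expected_pos) else "DNA"
    if tag in read or reverse_complement(tag) in read:
        return "RNA"
    return "DNA"


TAG = "GATTACA"  # hypothetical tag sequence
reads = ["GATTACAACGTACGT", "ACGTACGT", "CCCCTGTAATC"]
labels = [classify_read(read, TAG) for read in reads]
print(labels)
```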
[0095] In some embodiments, sequencing reads (or consensus sequence thereof) are localized (mapped) by aligning the reads to a known reference genome. In some embodiments, localization is realized by k-mer sharing and read-read alignment. In some embodiments, the reference genome sequence is the NCBI36/hg18 sequence, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway?org=Human&db=hg18&hgsid=166260105. In some embodiments, the reference genome sequence is the GRCh37/hg19 or GRCh38 sequence, which is available on the World Wide Web at genome.ucsc.edu/cgi-bin/hgGateway. Other sources of public sequence information include GenBank, dbEST, dbSTS, EMBL (the European Molecular Biology Laboratory), and the DDBJ (the DNA Databank of Japan). A number of computer algorithms are available for aligning sequences, including without limitation BLAST (Altschul et al., 1990), BLITZ (MPsrch) (Sturrock & Collins, 1993), FASTA (Pearson & Lipman, 1988), BOWTIE (Langmead et al., Genome Biology 10:R25.1-R25.10 [2009]), or ELAND (Illumina, Inc., San Diego, Calif., USA). In some embodiments, one end of clonally expanded copies of plasma polynucleotide molecules (or amplification products thereof) is sequenced and processed by bioinformatics alignment analysis for the Illumina Genome Analyzer, which uses the Efficient Large-Scale Alignment of Nucleotide Databases (ELAND) software. By aligning reads to a reference genome, the genomic locations of mutations relative to the reference sequence can be identified. In some cases, alignment will facilitate inferring an effect of the mutation and/or a property of the cell from which it originated. For example, if the mutation creates a premature stop codon in a tumor suppressor gene, it may be inferred that the source polynucleotide originated from a cancer cell, particularly if a statistically significant number of cancer-associated markers are detected in the sequencing reads.
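By way of non-limiting illustration, checking whether a single-nucleotide substitution creates a premature stop codon, as in the example above, can be sketched as follows. The function name and the short coding sequence are hypothetical, and the sketch assumes an in-frame coding sequence written as DNA whose length is a multiple of three.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def creates_premature_stop(coding_sequence, position, alt_base):
    """Return True if substituting `alt_base` at the 0-based `position`
    turns a non-stop codon into a stop codon before the final codon."""
    if len(coding_sequence) % 3 != 0 or not 0 <= position < len(coding_sequence):
        raise ValueError("expected an in-frame sequence and a valid position")
    mutated = coding_sequence[:position] + alt_base + coding_sequence[position + 1:]
    codon_start = (position // 3) * 3
    old_codon = coding_sequence[codon_start:codon_start + 3]
    new_codon = mutated[codon_start:codon_start + 3]
    is_final_codon = codon_start == len(coding_sequence) - 3
    return (
        new_codon in STOP_CODONS
        and old_codon not in STOP_CODONS
        and not is_final_codon
    )


# Hypothetical coding sequence: ATG CAG TAA (start, glutamine, stop).
cds = "ATGCAGTAA"
print(creates_premature_stop(cds, 3, "T"))  # CAG -> TAG
print(creates_premature_stop(cds, 5, "A"))  # CAG -> CAA
```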
[0096] In some embodiments, one or more causal genetic variants are sequence variants associated with a particular type or stage of cancer, or of cancer having a particular characteristic (e.g. metastatic potential, drug resistance, drug responsiveness). As used herein, “causal variant” refers to genetic variants responsible for an associated signal at a locus, such as biological effect on the phenotype of the subject. In some embodiments, the disclosure provides methods for the determination of prognosis, such as where certain mutations or other genetic characteristics are known to be associated with patient outcomes. For example, circulating tumor DNA (ctDNA) has been shown to be a better biomarker for breast cancer prognosis than the traditional cancer antigen 53 (CA-53) and enumeration of circulating tumor cells (see e.g. Dawson, et al., N Engl J Med 368: 1199 (2013))n some embodiments, methods of the present disclosure comprise treating a subject based on RNA and DNA polynucleotide biomarkers analyzed in a sample from the subject. By way of non-limiting example, methods disclosed herein can be used in making therapeutic decisions, guidance and monitoring, as well as development and clinical trials of cancer therapies. For example, treatment efficacy can be monitored by comparing an individual’s DNA and RNA in samples from before, during, and after treatment with particular therapies such as molecular targeted therapies (monoclonal drugs), chemotherapeutic drugs, radiation protocols, etc. or combinations of these. 
In some examples, the subject is identified as having MM using the methods provided herein and is treated with one or more of immunotherapy (such as elotuzumab, daratumumab, or isatuximab), corticosteroids (such as dexamethasone), immunomodulating agents (such as thalidomide, lenalidomide, or pomalidomide), proteasome inhibitors (such as bortezomib, carfilzomib, or ixazomib), chemotherapy (such as cisplatin, doxorubicin, cyclophosphamide, etoposide, melphalan, and/or bendamustine), CAR-T therapy (such as idecabtagene vicleucel and/or ciltacabtagene autoleucel), and bone marrow transplant. In other examples, the subject is identified as having HCC using the methods provided herein and is treated with one or more of surgery (such as hepatectomy), radiation therapy, radiofrequency ablation, percutaneous ethanol
injection, radioembolization, chemoembolization, immunotherapy (such as bevacizumab, atezolizumab, ramucirumab, pembrolizumab, and/or nivolumab), targeted therapy (such as sorafenib, lenvatinib, cabozantinib, and/or regorafenib), chemotherapy (such as doxorubicin, gemcitabine, oxaliplatin, cisplatin, 5-fluorouracil, capecitabine, and/or mitoxantrone), and/or liver transplant. A skilled clinician can select appropriate treatment regimen(s) based on the subject, disease being treated, stage of disease, condition of the subject, and other factors.
[0097] In some embodiments, a series of samples collected over time from a single subject may be monitored to see if certain mutations, expression levels, or other phenotypic changes occur without treatment (e.g., longitudinal testing to monitor cancer staging from non-cancer to pre-malignancy or pre-malignancy to cancer). In some embodiments, cell-free polynucleotides are monitored to see if certain mutations, expression levels, or other features of DNA or RNA increase or decrease, or new mutations appear, after treatment, which can allow a physician to alter a treatment (continue, stop or change treatment, for example) in a much shorter period of time than afforded by methods of monitoring that track traditional patient symptoms. In some examples, a subject identified as having a predisposition to cancer (such as MM or HCC) using the disclosed methods is monitored at intervals (such as every 3 months, every 6 months, annually, every 2 years, or more) to identify if progression to cancer has occurred or is occurring. In some examples, the subject has a predisposition to MM (for example, has MGUS) and the monitoring may include one or more of a second (or more) screen with the methods provided herein, blood tests (such as to detect M protein and/or β2-microglobulin, blood cell counts, and/or calcium levels), urine tests (such as to detect M protein), bone marrow biopsy, and/or imaging tests.
In other examples, the subject has a predisposition to HCC (such as liver cirrhosis) and the monitoring may include one or more of a second (or more) screening with the methods provided herein, diagnostic imaging, and/or liver biopsy. In some embodiments, a method further comprises the step of diagnosing an individual based on the RNA-derived sequences and DNA-derived sequences, such as diagnosing the subject with a particular stage or type of cancer associated with a detected sequence variant, or reporting a likelihood that the patient has or will develop such cancer.
[0098] In one aspect, the present disclosure provides systems, such as computer systems, for implementing methods described herein, including with respect to any of the various other aspects of this disclosure. It should be understood that it is not practical, or even possible in most cases, for an unaided human being to perform computational operations involved in some
embodiments of methods disclosed herein. For example, mapping a single 30 bp read from a sample to any one of the human chromosomes might require years of effort without the assistance of a computational apparatus. Of course, the challenge of unaided sequence analysis and alignment is compounded in cases where reliable calls of low allele frequency mutations require mapping thousands (e.g., at least about 10,000) or even millions of reads to one or more chromosomes. Accordingly, some embodiments of methods described herein are not capable of being performed in the human mind alone, or with mere pencil and paper, but rather necessitate the use of a computational system, such as a system comprising one or more processors programmed to implement one or more analytical processes.
[0099] In some embodiments, the disclosure provides tangible and/or non-transitory computer readable media or computer program products that include program instructions and/or data (including data structures) for performing various computer-implemented operations. Examples of computer-readable media include, but are not limited to, semiconductor memory devices, magnetic media such as disk drives, magnetic tape, optical media such as CDs, magneto-optical media, and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The computer readable media may be directly controlled by an end user or the media may be indirectly controlled by the end user. Examples of directly controlled media include the media located at a user facility and/or media that are not shared with other entities. Examples of indirectly controlled media include media that is indirectly accessible to the user via an external network and/or via a service providing shared resources such as the "cloud." Examples of program instructions include both machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter.
[00100] In some embodiments, the data or information employed in methods and systems disclosed herein are provided in an electronic format. Examples of such data or information include, but are not limited to, sequencing reads derived from a nucleic acid sample, reference sequences (including reference sequences providing solely or primarily polymorphisms), sequences of one or more oligonucleotides used in the preparation of the sequencing reads (including portions thereof, and/or complements thereof), calls such as cancer diagnosis calls, counseling recommendations, diagnoses, and the like. As used herein, data or other information provided in electronic format is available for storage on a machine and transmission between machines. Conventionally, data in electronic format is provided digitally and may be stored as
bits and/or bytes in various data structures, lists, databases, etc. The data may be embodied electronically, optically, etc.
[00101] In some embodiments, provided herein is a computer program product for generating an output indicating the sequences of DNA and RNA in a test sample. The computer product may contain instructions for performing any one or more of the above-described methods for determining DNA and RNA sequences. As explained, the computer product may include a non-transitory and/or tangible computer readable medium having a computer executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to determine a sequence of interest. In one example, the computer product includes a computer readable medium having a computer executable or compilable logic (e.g., instructions) recorded thereon for enabling a processor to diagnose a condition and/or determine a nucleic acid sequence of interest.
[00102] In some embodiments, methods described herein (or portions thereof) are performed using a computer processing system which is adapted or configured to perform a method for determining the sequence of polynucleotides derived from DNA and RNA of a sample, such as one or more sequences of interest (e.g. an expressed gene or portion thereof). In some embodiments, a computer processing system is adapted or configured to perform a method as described herein. In one embodiment, the system includes a sequencing device adapted or configured for sequencing polynucleotides to obtain the type of sequence information described elsewhere herein, such as with regard to any of the various aspects described herein. In some embodiments, the apparatus includes components for processing the sample, such as liquid handlers and sequencing systems, comprising modules for implementing one or more steps of any of the various methods described herein (e.g. sample processing, polynucleotide purification, and various reactions (e.g. RT reactions, amplification reactions, and sequencing reactions)).
[00103] In some embodiments, sequence or other data is input into a computer or stored on a computer readable medium either directly or indirectly. In one embodiment, a computer system is directly coupled to a sequencing device that reads and/or analyzes sequences of nucleic acids from samples. Sequences or other information from such tools are provided via an interface in the computer system. Alternatively, the sequences processed by the system are provided from a sequence storage source such as a database or other repository. Once available to the processing apparatus, a memory device or mass storage device buffers or stores, at least
temporarily, sequences of the nucleic acids. In addition, the memory device may store read counts for various chromosomes or genomes, etc. The memory may also store various routines and/or programs for analyzing the sequence or mapped data. In some embodiments, the programs/routines include programs for performing statistical analyses.
[00104] In one example, a user provides a polynucleotide sample into a sequencing apparatus. Data is collected and/or analyzed by the sequencing apparatus which is connected to a computer. Software on the computer allows for data collection and/or analysis. Data can be stored, displayed (via a monitor or other similar device), and/or sent to another location. The computer may be connected to the internet, which is used to transmit data to a handheld device utilized by a remote user (e.g., a physician, scientist or analyst). It is understood that the data can be stored and/or analyzed prior to transmittal. In some embodiments, raw data is collected and sent to a remote user or apparatus that will analyze and/or store the data. Transmittal can occur via the internet, but can also occur via satellite or other connection. Alternately, data can be stored on a computer-readable medium and the medium can be shipped to an end user (e.g., via mail). The remote user can be in the same or a different geographical location including, but not limited to a building, city, state, country or continent.
[00105] In some embodiments, the methods comprise collecting data regarding a plurality of polynucleotide sequences (e.g., reads, consensus sequences, and/or reference chromosome sequences) and sending the data to a computer or other computational system. For example, the computer can be connected to laboratory equipment, e.g., a sample collection apparatus, a nucleotide amplification apparatus, a nucleotide sequencing apparatus, or a hybridization apparatus. The computer can then collect applicable data gathered by the laboratory device. The data can be stored on a computer at any step, e.g., while collected in real time, prior to the sending, during or in conjunction with the sending, or following the sending. The data can be stored on a computer-readable medium that can be extracted from the computer. The data collected or stored can be transmitted from the computer to a remote location, e.g., via a local network or a wide area network such as the internet. At the remote location various operations can be performed on the transmitted data.
[00106] Among the types of electronically formatted data that may be stored, transmitted, analyzed, and/or manipulated in systems, apparatus, and methods disclosed herein are the following: reads obtained by sequencing nucleic acids, consensus sequences based on the reads, the reference genome or sequence, thresholds for calling a test sample as either affected, non-affected, or no call, the actual calls of medical conditions related to the sequence of interest, diagnoses (clinical condition associated with the calls), recommendations for further tests derived from the calls and/or diagnoses, and treatment and/or monitoring plans derived from the calls and/or diagnoses. In some embodiments, these various types of data are obtained, stored, transmitted, analyzed, and/or manipulated at one or more locations using distinct apparatus. The processing options span a wide spectrum. At one end of the spectrum, all or much of this information is stored and used at the location where the test sample is processed, e.g., a doctor's office or other clinical setting. At the other end of the spectrum, the sample is obtained at one location, it is processed and optionally sequenced at a different location, reads are aligned and calls are made at one or more different locations, and diagnoses, recommendations, and/or plans are prepared at still another location (which may be a location where the sample was obtained).
[00107] Also provided herein are kits that may be used in connection with the disclosed methods and systems. In some examples, the kits include one or more primer pairs (such as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more primer pairs) for analyzing or measuring the level of one or more of the disclosed cfRNA biomarkers. In some examples, the kits include up to 10 primer pairs selected from SEQ ID NOs: 23-42, for example for use in methods of diagnosing or treating MM. In other examples, the kits include 4 or more primer pairs, including SEQ ID NOs: 25-32, for example, for use in methods of diagnosing or treating MM. In further examples, the kits include up to 9 primer pairs selected from SEQ ID NOs: 5-22, for example, for methods of diagnosing or treating HCC. In some examples, the kits include 5 or more primer pairs, including SEQ ID NOs: 5-14, for example, for use in diagnosing or treating HCC.
[00108] The kits may further include additional components for use in connection with the disclosed methods, such as one or more buffers, enzymes (such as a reverse transcriptase and/or a DNA polymerase), salts, or other reaction components. In additional examples, the kits may include reagents for one or more controls, such as primers for amplification of one or more control cfRNAs. In some examples, the kits include one or more control primer pairs selected from the pair of SEQ ID NO: 1 and SEQ ID NO: 2 or the pair of SEQ ID NO: 3 and SEQ ID NO: 4.
EXAMPLES
[00109] As disclosed herein, total cfRNA was sequenced from plasma samples of patients with HCC or MM, patients with the corresponding pre-cancerous conditions (liver cirrhosis and MGUS), and non-cancer (NC) donors (also referred to herein as healthy donors (HD)). Potential cfRNA biomarkers were identified using plasma cfRNA-sequencing of a pilot sample set and then validated in an independent sample set. The sequencing data were further validated using orthogonal measurement by quantitative reverse transcription PCR. Feature selection and classification models were built to explore the potential of cfRNA profiles in differentiating malignant from pre-malignant conditions.
[00110] Cell-free RNA (cfRNA) in plasma reflects phenotypic alterations of both localized sites of cancer and the systemic host response. In one aspect, the present disclosure provides methods for utilizing cfRNA sequencing to identify messenger RNA (mRNA) signatures in plasma with the tissue of origin specific to cancer types and pre-cancerous conditions. Total cfRNA were sequenced from plasma samples of hepatocellular carcinoma (HCC) and multiple myeloma (MM) patients, their respective pre-cancerous conditions and non-cancer donors to explore the diagnostic potential. Distinct gene sets were identified and classification models were built using the random forest and linear discriminant analysis algorithms that could distinguish cancer patients from premalignant conditions and non-cancer individuals with high accuracy. Sequencing data was cross-validated by quantitative reverse transcription PCR and cfRNA biomarkers were validated in independent sample sets with AUC higher than 0.86.
Distinction of multiple myeloma from its pre-cancerous condition, monoclonal gammopathy of undetermined significance (MGUS), yielded an accuracy of 90% (17/19). Detection of primary liver cancer from its premalignant condition cirrhosis yielded an accuracy of 100% (12/12). This work demonstrates the potential of using mRNA transcripts in plasma with a small panel of genes for monitoring pre-malignant disease progression from cirrhosis to HCC and MGUS to MM.
[00111] Disclosed herein are methods for analyzing cfRNA biomarker panels to distinguish cancers and their pre-malignant conditions. Total plasma cfRNA was sequenced from plasma samples of patients with liver cancer (HCC) and multiple myeloma (MM) and their pre-cancerous conditions including liver cirrhosis (Cirr) and MGUS, and non-cancer donors. Potential cfRNA biomarkers were identified using plasma cfRNA-sequencing of a pilot sample set and then validated in an independent sample set. The sequencing data were then cross-validated using orthogonal measurement by quantitative
reverse transcription PCR. Feature selection and classification models were built to explore the potential of cfRNA profiles in differentiating malignant from pre-malignant conditions.
1. Plasma cfRNA Biomarkers Identified by Sequencing
[00112] To identify cfRNA transcripts which potentially distinguish cancer patients from healthy individuals, blood samples were prospectively collected from the following sample sets: a pilot set of 10 MM patients and 8 HCC patients; 13 patients with pre-malignant conditions including 9 MGUS and 4 Cirr; and 20 age and gender matched non-cancer donors. Table 1 and Table 2 show detailed clinical information of, respectively, the pilot set and validation set.
[00115] Samples were randomly shuffled for RNA extraction, library preparation and sequencing in Illumina flow cells. Libraries were sequenced to saturation with a mean of 33.8M raw reads with a range of 27.7M to 52.3M (as shown below in Tables 3 and 4). After selecting for reads that mapped uniquely to the human genome, the cfRNA libraries had an average read depth of 14M with a range from 2.3M to 43M. On average, 80% of reads mapped to exons (shown in Tables 3 and 4). A total of 39,374 annotated features were detected with at least 1 mapped read across all samples. The majority of detected RNAs were protein coding with a mean fraction of 82% with a range from 65% to 89% (shown in Tables 3 and 4). The fraction of reads mapping to exons and the distribution of read depths were uniform across all sample groups.
[00116] Table 3: Pilot Set Quality Control Data
[00118] It was determined whether cfRNA profiles can distinguish HCC and MM from NC donors. Figs. 1A and 1B show the results of an unbiased Principal Component Analysis (PCA) of the 500 genes with the largest variance across all samples, in which pairwise comparison showed separation of HCC and MM cfRNA profiles from that of non-cancer
donors. A differential expression (DE) analysis of pairwise comparison between individual cancer types with respect to NC donors using DESeq2 yielded 110 and 12 differentiating genes (adjusted p-value < 0.01) for MM and HCC, respectively (shown below in Tables 5-8 and Fig. 12).
[00119] Table 5: Top DE genes Pairwise HCC vs. Healthy Donor (HD)
[00122] Table 8: Top DE Genes Pairwise MGUS vs. Healthy Donor (HD)
[00123] To confirm the significance of the differential expression results for each pairwise comparison of cancer to NC donors, a permutation test was performed in which differential expression analysis was repeated between two groups of randomly relabeled samples. Permutations of random sample shuffling in each pair with 500 rounds resulted in zero significant differentiating genes (padj < 0.01) in more than 95% and 94% of permutations for each pair comparing MM and HCC to non-cancer donors, respectively (shown below in Tables 9A-C, 10A-C, and Fig. 13).
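The label-shuffling idea behind such a permutation test can be sketched in a few lines. The following is a minimal illustration only, assuming a simple difference-of-means statistic for a single gene; it is not the DESeq2-based procedure described above, and the function name `permutation_p_value` and the toy expression values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, rounds=500, seed=0):
    """Estimate a two-sided p-value for a difference in group means
    by randomly reassigning samples to the two groups.

    Simplified stand-in statistic (assumption): absolute difference of means.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)  # random relabeling of samples
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (rounds + 1)


# Hypothetical normalized expression values for one gene.
cancer = [10, 11, 12, 13, 14]
non_cancer = [1, 2, 3, 4, 5]
p = permutation_p_value(cancer, non_cancer)
```

With 500 rounds, as in the text, a true group difference should rarely be matched by shuffled labels, so the estimated p-value stays small; groups with no difference give a value near 1.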
[00124] Gene ontology analysis revealed that MM up-regulated genes were enriched for oxygen transport and gas transport. In HCC, the up-regulated gene set was enriched for
plasminogen activation. These data collectively indicate the separation of cfRNA profiles in HCC and MM compared to NC donors.
2. Validation of cfRNA Biomarkers
[00125] To further explore the potential of cell-free RNA for cancer detection, Linear Discriminant Analysis (LDA) and a Random Forest (RF) algorithm were applied to find combinations of discriminating genes to separate cancer from non-cancer individuals. Two independent methods were used to identify specific input gene lists for the classifying algorithms. First, discriminating genes using DESeq2 analysis with False Discovery Rate (FDR/adjusted p-value) < 0.01 (shown in Tables 5-8) were used as one feature set (DE gene set). Second, the learning vector quantization (LVQ) method was implemented to find the most important features that distinguished the two groups, and the top 10 were selected as another feature set (LVQ gene set) (shown below as Tables 11-17).
[00126] Table 11: List of Genes Used for Linear Discriminant Analysis shown in Figs. 1C and 2A; Top 10 Genes Differentiating HCC and MM from NC.
[00127] Table 12: List of Genes Used for Linear Discriminant Analysis shown in Figs. 6-8 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between MGUS and NC Determined Using Learning Vector Quantization Algorithm.
[00128] Table 13: List of Genes Used for Linear Discriminant Analysis shown in Figs. 6-8 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between MM and NC Determined Using Learning Vector Quantization Algorithm.
[00129] Table 14: List of Genes Used for Linear Discriminant Analysis shown in Figs. 6-8 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between MM and NC, and between MGUS and NC, Determined Using Learning Vector Quantization Algorithm.
[00130] Table 15: List of Genes Used for Linear Discriminant Analysis shown in Figs. 9-11 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between Cirr. and NC Determined Using Learning Vector Quantization Algorithm.
[00131] Table 16: List of Genes Used for Linear Discriminant Analysis shown in Figs. 9-11 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between HCC and NC Determined Using Learning Vector Quantization Algorithm.
[00132] Table 16: List of Genes Used for Linear Discriminant Analysis shown in Figs. 9-11 and LOOCV; Top 10 Genes Differentiating Pairwise Comparisons between HCC and NC as well as between Cirr. and NC Determined Using Learning Vector Quantization Algorithm.
[00133] The linear combination for each gene set by LDA showed significant separation between HCC and MM from NC donors with p-values of 6.7x10^-8, 6.7x10^-10 and 6.4x10^-7, 6.4x10^-7 using the DE and top 10 LVQ gene sets, respectively (as shown in Fig. 1C). The Random Forest (RF) method was further employed to develop orthogonal classification models. The area under the receiver operating characteristic (ROC) curve (AUC) was higher than 0.92 in both LDA and RF models with both DE and LVQ feature sets for the two cancer types (as shown in Figs. 2A and 2B).
[00134] The leave-one-out cross validation (LOOCV) method was employed to evaluate the significance and accuracy of the classification models. Briefly, in LOOCV, one sample was iteratively removed for testing, with the remaining samples used for training by the LDA or RF algorithms to create a classifying model. LDA or RF algorithms classified each left out sample based on these training models. The test was repeated until all individual samples were classified and cross-validated. Both LDA and RF algorithms were trained on the described DE and LVQ gene sets, resulting in four classification models (as shown in Figs. 2A-2C). Classifying MM from non-cancer donors yielded greater than 90% accuracy (27/30) for all four models tested. HCC was correctly differentiated from NC donors with accuracies of 100%
(28/28) and 93% (26/28) when using the LDA method or 96% (27/28) and 96% (27/28) when using the RF method with LVQ and DE feature sets, respectively. Overall, the LOOCV test confirmed that the biomarker sets determined by DESeq2 and LVQ methods, combined with our classification models using LDA and RF algorithms, are statistically significant. LVQ gene sets yielded higher accuracy for both cancer types and were used as the feature sets for further validation.
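The leave-one-out procedure described above can be illustrated with a short sketch. This is a hedged example only: a nearest-centroid classifier stands in for the LDA and RF models (an assumption made to keep the example dependency-free), and the function names and toy two-gene expression values are hypothetical.

```python
def fit_centroids(features, labels):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for row, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def loocv_accuracy(features, labels):
    """Hold out each sample once, train on the rest, and score the held-out call."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit_centroids(train_x, train_y)
        if predict(model, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical expression values for two genes in six samples.
X = [[1.0, 1.2], [0.8, 1.1], [1.1, 0.9], [5.0, 5.2], [4.8, 5.1], [5.1, 4.9]]
y = ["NC", "NC", "NC", "MM", "MM", "MM"]
accuracy = loocv_accuracy(X, y)
```

Because every sample is classified by a model that never saw it, the resulting accuracy is less optimistic than one measured on the training data, which is why it suits small cohorts such as the pilot set.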
[00135] A primer panel for amplifying the LVQ genes was designed to validate the sequencing data by quantitative reverse transcription PCR (RT-qPCR). RT-qPCR results from the pilot sample set were consistent with the sequencing data with a Pearson correlation coefficient > 0.77 and a p-value of 2.2x10^-16 (as shown in Fig. 4). It was confirmed that the differential level of cfRNA transcripts of genes identified by the LVQ algorithm (HBG1, HBG2, NUSAP1, for MM and C3, CP, FGA, FGB for HCC) from RNA-sequencing was also observed with RT-qPCR (as shown in Fig. 5). Table 17 provides forward and reverse primers delineated by LVQ gene target.
[00136] Table 17: List of Forward and Reverse Primer Sequences for Amplifying LVQ
[00137] Amplification parameters for an RT-qPCR assay were configured to pre-amplify products using SEQ ID NO: 1 through SEQ ID NO: 42. Template RNA was mixed with Superscript III One-step RT-PCR system with Platinum Taq DNA polymerase kit (Invitrogen Corp.; 1600 Faraday Ave., PO Box 6482, Carlsbad, CA, 92008, USA; Cat. No. 12574026) and SEQ ID NOs: 1-42 to generate cDNA according to the kit’s product-insert protocol. PCR amplification products were treated with Exonuclease I to digest single stranded primers at 37°C for 30 min followed by inactivation of enzymes at 80°C for 15 min. For RT-qPCR, cDNA from the preamplification was diluted 1:80 and set up in 96-well plates with SsoFast EvaGreen supermix (BioRad, Inc.; 1000 Alfred Nobel Dr., Hercules, CA, 94547, USA; Cat. No. 1725200) with low ROX with the individual primer pairs at 10 pM each. QuantStudio 7 Flex (Applied Biosystems, LLC; 180 Oyster Point Blvd., San Francisco, CA, 94080, USA; Cat. No. 4485701) was used to run the RT-qPCR assay according to the manufacturer’s recommended cycling conditions.
The delta Ct of a target gene was calculated by subtracting the Ct of a control gene (such as either GAPDH or ACTB) from the Ct of the target gene.
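As a worked illustration of the delta Ct calculation, the following minimal sketch subtracts the control-gene Ct from each target-gene Ct. The Ct values and the helper name `delta_ct` are hypothetical and are not taken from the reported data.

```python
def delta_ct(ct_target, ct_control):
    """Delta Ct: target-gene Ct minus control-gene Ct.

    A lower delta Ct indicates a higher transcript level relative to the control.
    """
    return ct_target - ct_control


# Hypothetical Ct values for one sample; GAPDH serves as the control gene.
ct_values = {"HBG1": 27.5, "NUSAP1": 31.0, "GAPDH": 22.0}
control = "GAPDH"
delta = {
    gene: delta_ct(ct, ct_values[control])
    for gene, ct in ct_values.items()
    if gene != control
}
```

Normalizing to a control gene in this way removes sample-to-sample differences in input RNA amount, so delta Ct values can be compared across samples.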
[00138] RT-qPCR results from the pilot sample set were consistent with the sequencing data with a Pearson correlation coefficient > 0.77 and a p-value of 2.2x10^-16 (as shown in Fig. 3). It was confirmed that the differential level of cfRNA transcripts of genes identified by the LVQ algorithm (HBG1, HBG2, NUSAP1, for MM and C3, CP, FGA, FGB for HCC) from RNAseq was also observed with RT-qPCR (as shown in Fig. 3).
[00139] To confirm that the feature sets and classification models defined in the pilot cohort were robust and generalizable, a set of independent validation samples was collected from 10 NC controls, 9 MM patients, and 20 HCC patients (shown in Table 1 and Table 2). cfRNA biomarkers identified from the pilot set in silico were validated by measuring the classification accuracy of this independent sample set on the models trained on the pilot dataset using the LVQ gene sets. The linear combination by LDA identified in the pilot cohort of the LVQ feature set showed significant separation in the validation sample set between MM and HCC from NC donors, consistent with the previous findings (shown in Fig.
3). Furthermore, both LDA and RF models trained on the pilot cohort with this same feature set were able to classify cancer from NC controls from the validation cohort, with an AUC > 0.86 and 0.9 when classifying non-cancer donors from MM and HCC, respectively (shown in Fig. 3).
[00140] This cfRNA classification model performed well for early and late stages in the pilot set. In the validation sample set, the model displayed a stage-dependent discrimination. It was validated with an AUC of 0.74 for stage A in HCC (see Figs. 14 and 15) and an AUC of 0.64 for stage I in MM (see Fig. 15). For later stages, the model achieved a higher AUC of 0.91 for stages B and C in HCC (see Fig. 14) and 0.83 for stages II and III in MM (see Fig. 15) in the validation sample set. This stepwise increase in discrimination suggests that these biomarkers become more prevalent with cancer progression. HCC classification also showed significant discrimination compared to NC for different etiologies, and both HCC and MM showed discrimination for males and females (as shown in Figs. 16 and 17) and are not age-dependent (as shown in Figs. 16 and 17) in our pilot and validation sample sets.
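The AUC values discussed above summarize how often a classifier scores a cancer sample above a non-cancer sample. A minimal sketch of that calculation follows, using the pairwise-ranking (Mann-Whitney) form of the AUC; the classifier scores and the stage groupings shown are hypothetical and do not reproduce the reported results.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores (higher means more cancer-like).
nc_scores = [0.10, 0.20, 0.35, 0.40]
early_stage_scores = [0.30, 0.45, 0.60]
late_stage_scores = [0.55, 0.80, 0.90]

early_auc = roc_auc(early_stage_scores, nc_scores)  # stratified by stage
late_auc = roc_auc(late_stage_scores, nc_scores)
```

Computing the AUC separately for each stage against the same non-cancer group, as sketched here, is one way to expose a stage-dependent trend like the one described above.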
3. cfRNA Profiles Distinguish Multiple Myeloma from Its Pre-Malignant Condition, MGUS, and MGUS from Non-cancer
[00141] Disclosed herein are methods of utilizing cfRNA to distinguish MM from MGUS, MM from non-cancer, and MGUS from non-cancer in individuals. It was next examined whether cfRNA profiles were able to recapitulate the transition from a pre-cancerous condition to a cancerous one, and distinguish between them. The hypothesis was tested on multiple myeloma (MM) as it has a well-defined pre-cancerous condition: MGUS. The top ten most significant genes that discriminate MM from non-cancer donors as identified by LVQ displayed a gradual transition in cfRNA level from the non-cancer donors through MGUS to MM. Among these ten most significant genes, seven genes (CA1, EPB42, HBG1, HBG2, CENPE, CPOX, EPB42, NEK2 and NUSAP1) have higher expression in bone marrow, where cancerous plasma cells accumulate, compared to other tissue and cell types in publicly available data from the Human Protein Atlas [47, 48]. Three out of the ten most important genes resulting from the LVQ analysis are related to cell cycle processes: Centromere protein E (CENPE), a kinesin-like motor protein that accumulates in the G2 phase of the cell cycle and is highly expressed in bone marrow [49, 50]; Serine/threonine-protein kinase (NEK2), which is involved in mitotic regulation [50, 51]; and Nucleolar and spindle associated protein 1 (NUSAP1), a nucleolar-spindle-associated protein that plays a role in spindle microtubule organization [52].
[00142] An LDA plot using a combination of the top 10 LVQ genes from pairwise comparisons MM - NC, and MGUS - NC displayed the separation of all three groups (shown in Fig. 8). An RF model using the top 10 most important LVQ genes from MGUS - NC pairwise comparison yielded an accuracy of 88.6% (20/20 non-cancer donors and 6/9 MGUS patients).
Classification of MM from MGUS yielded an accuracy of 89.5% (8/9 MGUS and 9/10 MM) using LOOCV with the RF classification method using the top 10 most important genes from LVQ analysis of MM versus NC comparison as a feature set. The 3-group classification resulted in an accuracy of 82% (19/20 NC, 4/9 MGUS and 9/10 MM) defined by LOOCV using the RF method with the feature set composed of the combination of the top 10 LVQ genes from the comparison MM versus non-cancer and MGUS versus non-cancer donors.
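The overall accuracies above follow directly from the per-group counts of correctly classified samples. The short sketch below reproduces that arithmetic for the two-group (MM versus MGUS) and three-group results; the helper name `accuracy_summary` is hypothetical.

```python
def accuracy_summary(correct, total):
    """Return overall accuracy and per-class accuracy from per-class counts."""
    per_class = {label: correct[label] / total[label] for label in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_class


# MM versus MGUS: 8/9 MGUS and 9/10 MM correctly classified.
two_group, _ = accuracy_summary({"MGUS": 8, "MM": 9}, {"MGUS": 9, "MM": 10})

# Three groups: 19/20 NC, 4/9 MGUS and 9/10 MM correctly classified.
three_group, by_class = accuracy_summary(
    {"NC": 19, "MGUS": 4, "MM": 9},
    {"NC": 20, "MGUS": 9, "MM": 10},
)
# two_group is 17/19 (about 89.5%); three_group is 32/39 (about 82%).
```

The per-class values make clear that most of the three-group errors fall in the MGUS group (4 of 9 correct), which the single overall figure does not show.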
4. cfRNA Profiles Distinguish Liver Cancer from Its Pre-Malignant Condition, Cirrhosis, and Cirrhosis from Non-cancer
[00143] Next it was asked if a solid tumor such as HCC could be distinguished from its pre-cancerous condition, Cirr. Among the top ten most important genes that discriminate HCC from NC identified by the LVQ analysis, five genes also significantly differentiate HCC from
Cirr. Interestingly, 8 out of the top 10 genes are expressed specifically in the liver and the corresponding proteins are secreted into the blood [47, 48]. Apolipoprotein E (APOE) binds to specific liver and peripheral cell receptors and is essential for normal catabolism of triglyceride-rich lipoprotein constituents [53]. Complement C3 (C3) is synthesized in the liver and secreted to the plasma and is involved in both innate and adaptive immune responses [54]. Ceruloplasmin (CP) is a secreted plasma metalloprotein from the liver that binds copper in the plasma and is involved in the peroxidation of Fe(II) transferrin to Fe(III) transferrin [55]. 24-dehydrocholesterol reductase (DHCR24) catalyzes the reduction of sterol intermediates [56]. Fibrinogen Alpha Chain (FGA), Fibrinogen Beta Chain (FGB) and Fibrinogen Gamma Chain (FGG) encode the coagulation factor fibrinogen, which is a component of blood clotting [57]. Histidine Rich Glycoprotein (HRG) is a plasma glycoprotein that binds heparin sulfate on the surface of the liver, lung, kidney and heart endothelial cells [58].
[00144] Skilled persons will understand that current practices for HCC surveillance include screening of Cirr patients using imaging techniques, such as ultrasound, computerized tomography (CT) and magnetic resonance imaging (MRI). These methods are expensive and can have limited accessibility [5]. In addition, detection of Cirr is mostly based on clinical symptoms which are often from complications displayed at later stages of the disease [59]. Therefore, easy-to-use, reliable and specific biomarkers with accompanying prediction models are needed to improve detection of both HCC and Cirr.
[00145] Disclosed herein are methods of utilizing cfRNA to distinguish HCC from Cirr and Cirr from NC individuals. An LDA plot using a feature set that combines the top 10 LVQ genes identified for the pairwise comparisons of HCC - NC and Cirr - NC shows a distinct separation between these groups (shown in Fig. 11). RF methods using the top 10 important genes from the Cirr - NC pairwise comparison yielded 100% accuracy in classifying Cirr from NC samples using LOOCV (shown in Figs. 9-11). Classification of HCC from Cirr also yielded 100% accuracy using LOOCV with RF (as shown in Figs. 9-11). A single model was also used to classify all three classes (NC, Cirr, and HCC). This 3-group classification resulted in 90.6% accuracy using LOOCV with RF (as shown in Figs. 9-11).
5. Discussion
[00146] cfRNA was sequenced from patients having two cancer types, one solid (HCC) and the other hematologic (MM), and their respective pre-cancerous conditions, Cirr and MGUS, and from NC donors. Both cancer types can be distinguished from non-cancer controls and pre-cancerous conditions using their cfRNA profiles. To differentiate each cancer type from non-cancer individuals, the combination of ten genes identified by learning vector quantization (LVQ) analysis in each pairwise comparison yields higher accuracy than a larger set of differentiating genes, as evaluated by leave-one-out cross validation (LOOCV). Two classification models, built on linear discriminant analysis (LDA) and the random forest (RF) algorithm, resulted in similar classification performance in each pairwise comparison of cancer to healthy donors. RT-qPCR confirmation for a panel of selected biomarkers was consistent with the sequencing data. Plasma cfRNA biomarkers identified from the sequencing data were further validated in an independent sample cohort. In some embodiments, use of a small gene panel potentially enables a cost-effective assay for pan-cancer detection that might be performed in a clinical environment, such as a doctor’s office, and that can be useful in broad clinical applications, including the detection and diagnosis of cancer or a predisposition to cancer.
[00147] To date, most investigations into the potential of blood-based methods for cancer detection have focused only on distinguishing cancers from healthy controls [15, 22, 25, 26, 28, 36]. However, many cancer types have etiologies associated with precursor states, such as MGUS for MM and Cirr for HCC. Disclosed herein is that cfRNA profiles can recapitulate the transition from a pre-cancerous condition to cancer for both solid and hematologic cancers. In some embodiments, the disclosed method comprises cfRNA panels containing a small number of genes that may be useful for distinguishing cancers from pre-malignant conditions and precursors from healthy individuals, thus facilitating cost-effective screening strategies for early cancer detection during routine exams in high-risk patients within the general population.
[00148] Liver and bone marrow have been reported to contribute heavily to the abundance of cell-free nucleic acids in plasma [42, 45, 46]. This may explain the source of cfRNA biomarkers found in these cancer types. In HCC, eight out of the top ten genes used in the classification model are specifically synthesized in the liver and encode secreted proteins found in blood that mediate plasminogen activation and fibrinolysis processes. In MM, seven out of ten genes among the cfRNA biomarkers have relatively high expression in bone marrow compared to other tissue and cell types and are related to cell cycle processes. These findings indicate that the identified cfRNA biomarkers likely originate from the tissue of origin of the tumor.
[00149] In some embodiments, the disclosed method may be used to profile cell-free mRNA to establish a platform for longitudinal monitoring of disease progression (e.g., monitoring a pre-malignant condition as it progresses to cancer) across multiple cancers. In some embodiments, the disclosed method may be used as a panel or assay that measures transcript levels of mRNA in plasma for a small panel of genes that can differentiate cancer from pre-malignant conditions and otherwise healthy donors. As disclosed herein, organ-specific mRNA transcripts were identified as biomarkers that indicate the tissue of origin for the tumor. In some embodiments, detecting the level of these cell-free plasma RNA biomarkers in a sample from a subject by the disclosed method may be combined with other nucleic acid-based and protein-based approaches for potentially increased diagnostic sensitivity and specificity. For example, abnormal liver enzyme levels detected in the blood (indicative of cirrhosis) combined with measurement of AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, or any combination thereof, may increase the diagnostic sensitivity and specificity of diagnosing cirrhosis. In another example, elevated levels of monoclonal protein (M protein) detected in a urine sample (indicative of kidney damage related to MGUS) combined with measurement of the cfRNA biomarkers AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination thereof, may increase the diagnostic sensitivity and specificity of diagnosing MGUS.
6. Methods
[00150] Patient Samples — Blood samples from non-cancer donors and patients with monoclonal gammopathy of undetermined significance (MGUS), multiple myeloma, liver cirrhosis, and liver cancer were obtained from Oregon Health and Science University (OHSU) by Knight Cancer Institute Biolibrary and Oregon Clinical and Translational Research Institute (OCTRI). All samples were collected under institutional review board (IRB) approved protocols with informed consent from all participants for research use. Individuals who had no recorded previous history of cancer were considered to be non-cancer donors.
[00151] All samples were collected and processed using a uniform protocol by the same staff at Oregon Health and Science University. Samples for analysis were matched between cancer and control groups with respect to age and gender of participants. The clinical information regarding study participants is given in Table 1 and Table 2.
[00152] Processing of Whole Blood — For all cohorts, whole blood samples were collected in EDTA-anticoagulated vacutainers. Within 2 hours of collection, blood samples were first centrifuged at 1,000g for 10 minutes at 4°C followed by 15,000g for 10 min at 4°C. Plasma was then stored at -80°C until RNA isolation.
[00153] cfRNA Isolation — Total RNA was purified from 3 ml of human plasma using the plasma/serum circulating and exosomal RNA purification kit (Norgen Biotek) according to the manufacturer’s protocol. To digest trace amounts of contaminating DNA, RNA was treated with 10X Baseline-ZERO DNase. DNase-treated RNA samples were purified and further concentrated using RNA Clean and Concentrator-5 (Zymo Research) according to the manufacturer’s instructions. Final eluted RNA was stored immediately at -80°C.
[00154] Library Preparation — Stranded RNA-Seq libraries were prepared using the Clontech SMARTer stranded total RNA-seq kit v2 - pico input mammalian (Takara Bio) according to the manufacturer’s instructions. For cDNA synthesis, option 2 (without fragmentation, starting from highly degraded RNA) was used. An input of 7 µl of each RNA sample was used to generate cDNA libraries suitable for next-generation sequencing. For addition of adapters and indexes, the SMARTer RNA unique dual index kit - 96U was employed. SMARTer RNA unique dual indexes on the 5′ and 3′ PCR primers were added to each sample to distinguish pooled libraries from each other. The amplified RNA-seq library was purified by immobilization onto the AMPure XP PCR purification system (Beckman Coulter). Library fragments originating from rRNA and mitochondrial rRNA were treated with ZapR v2 and R-Probes according to the manufacturer’s protocols. For final RNA-seq library amplification, 16 cycles of PCR were performed, and a final volume of 20 µl was eluted in Tris buffer following purification of the amplified RNA-seq library. The amplified RNA-seq library was stored at -20°C prior to sequencing.
[00155] Sequencing Data Processing and Quality Control — Each sample was sequenced to more than 20 million paired-end reads using an Illumina NextSeq or HiSeq sequencer. Adapter sequences were trimmed using the sickle tool [60]. After trimming, the quality of the reads was checked using FastQC (v0.11.7) [61, 62] and RSeQC (v2.6.4) [63]. Reads were aligned to the hg38 human genome using the STAR aligner (v2.5.3a) [64] with the two-pass mode flag. Duplicated reads were removed using the picard tool (v1.119) [65]. Read counts for each gene were calculated using the htseq-count tool (v0.11.2) [66] in intersection-strict mode. The number of reads mapped to each gene was normalized to the total number of reads in the whole transcriptome (Reads Per Million, RPM). For each sample, exon, intron, and intergenic fractions and protein-coding fractions (CDS exons) were calculated using RSeQC [67]. Samples with an exon fraction larger than 0.35 were kept for further analysis.
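The RPM normalization and the exon-fraction filter described above can be illustrated with a minimal Python sketch. It is not the in-house pipeline. It assumes that the normalization total is the sum of the per-gene counts for a sample, and the function names and example counts are hypothetical.

```python
def reads_per_million(gene_counts):
    """Scale raw per-gene read counts to reads per million (RPM).

    Assumption: the denominator is the sum of counts over all genes
    in the sample, standing in for the whole-transcriptome read total.
    """
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: count * 1_000_000 / total for gene, count in gene_counts.items()}


def passes_exon_filter(exon_fraction, threshold=0.35):
    """Keep a sample only if its exon fraction is larger than the threshold."""
    return exon_fraction > threshold


# Hypothetical counts for illustration only.
sample_counts = {"APOE": 150, "FGA": 50, "HBG1": 300}
sample_rpm = reads_per_million(sample_counts)
keep_sample = passes_exon_filter(0.42)
```

After normalization the RPM values of a sample sum to one million, so samples sequenced to different depths can be compared on a common scale.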
[00156] Identification of cfRNA Biomarkers (DESeq2, LVQ, and GO analysis) — Two independent methods were applied to select cfRNA features for building classification models. In the first method, differentiating genes for all pairwise comparisons were identified with the R package DESeq2 (v1.24.0) using the Wald test [68]. The second method for feature selection used the LVQ algorithm built into the R package caret (v6.0-84), with 10-fold cross validation repeated 3 times [69]. The top 10 most important features were selected by ranking the varImp parameter. GO analysis was implemented on the top differentiating genes from the DESeq2 analysis with padj < 0.01 using the package topGO (v2.37.0) and a Fisher statistical test to measure significant enrichment of each Gene Ontology term [70].
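For orientation, the padj values reported by DESeq2 are p-values adjusted for multiple testing, by default with the Benjamini-Hochberg procedure. The following Python sketch shows that adjustment and a cutoff filter. It is an illustration only: it omits the independent filtering that DESeq2 also applies, and the function names, gene p-values, and cutoff argument are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def genes_below_cutoff(gene_pvalues, cutoff):
    """Return the genes whose adjusted p-value is below the cutoff."""
    genes = list(gene_pvalues)
    padj = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return [g for g, q in zip(genes, padj) if q < cutoff]


# Hypothetical raw p-values for illustration only.
example_pvalues = {"geneA": 0.001, "geneB": 0.01, "geneC": 0.03, "geneD": 0.04}
example_padj = benjamini_hochberg(list(example_pvalues.values()))
example_hits = genes_below_cutoff(example_pvalues, cutoff=0.01)
```

Because each raw p-value is scaled by the number of tests divided by its rank, a gene needs a raw p-value well below the cutoff to pass when many genes are tested.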
[00157] Cancer Type Classification (LDA and RF) — Two methods were used to build models for classifying cancer types using feature sets identified from pairwise comparisons using the DESeq2 and LVQ methods. LDA models were built using the R package MASS (v7.3-51.4) [71]. Random Forest models were built using the R package randomForest (v4.6-14) [72].
[00158] Statistical Consideration (Permutation Test and Leave One Out Cross Validation) — To test whether the difference in each pairwise comparison between a cancer type and healthy controls was specific, a permutation test was performed in which differential expression analysis using the DESeq2 package was run between two groups of randomized samples. For each pair, 500 permutations of random shuffling were performed, and the number of differentiating genes with padj < 0.01 was recorded to build a histogram and compared to the number of significant genes (padj < 0.01) for the groups with correct labeling. To determine the significance and accuracy of the classification models, the LOOCV method was employed. Briefly, in LOOCV, the LDA or RF algorithm classified each sample based on a model trained on all other samples (the total number of samples in each pair minus the test sample). The test was repeated until every individual sample had been classified and cross-validated.
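The LOOCV procedure can be sketched in a few lines of Python. To keep the sketch self-contained, a simple nearest-centroid classifier is swapped in for the LDA and RF models named in the text. The classifier, the function names, and the toy expression values are assumptions for illustration and are not the disclosed models or data.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to the sample.

    Stand-in classifier only; the text uses LDA and random forest.
    """
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], sample))

    return min(sorted(sums), key=squared_distance)


def loocv_accuracy(x, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


# Hypothetical two-gene expression values for illustration only.
toy_x = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [5.0, 5.2], [4.8, 5.1], [5.2, 4.9]]
toy_y = ["NC", "NC", "NC", "HCC", "HCC", "HCC"]
toy_accuracy = loocv_accuracy(toy_x, toy_y)
```

Any function with the same `(train_x, train_y, sample)` signature can be passed as `predict`, so a different classifier could be evaluated with the same loop.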
[00159] Tissue Specificity of LVQ Feature Sets Using Publicly Available Databases — To evaluate whether the LVQ gene sets were specific to the tissue of origin (TOO), publicly available average tissue-level expression values (transcripts per million; TPMs) were downloaded from the Human Protein Atlas (ref: www.proteinatlas.org/about/download). The methodology used to normalize and calculate average expression values can be found here: www.proteinatlas.org/about/assays+annotation#hpa_ma. This matrix of expression values was then subset to the two gene sets (top 10 LVQ for MM versus non-cancer, and top 10 LVQ for HCC versus non-cancer), and a z-score was calculated across tissue types to evaluate which tissue types the genes were enriched in. Next, a heatmap of this transformed matrix was generated using ComplexHeatmap (v2.4.3).
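The per-gene z-score across tissue types can be sketched as follows in Python. The text does not state which standard deviation was used, so the sample standard deviation is an assumption here, and the function name and TPM values are hypothetical.

```python
from statistics import mean, stdev


def zscore_across_tissues(expression):
    """Convert {gene: {tissue: TPM}} into {gene: {tissue: z-score}}.

    Each gene is standardized across its own tissues. Assumption: the
    sample standard deviation (n - 1 denominator) is used.
    """
    result = {}
    for gene, by_tissue in expression.items():
        values = list(by_tissue.values())
        mu = mean(values)
        sigma = stdev(values)
        if sigma == 0:
            # A gene with identical expression everywhere has no enrichment.
            result[gene] = {tissue: 0.0 for tissue in by_tissue}
        else:
            result[gene] = {t: (v - mu) / sigma for t, v in by_tissue.items()}
    return result


# Hypothetical TPM values for illustration only.
toy_tpm = {
    "FGA": {"liver": 90.0, "bone marrow": 0.0, "lung": 0.0},
    "HBG1": {"liver": 2.0, "bone marrow": 2.0, "lung": 2.0},
}
toy_z = zscore_across_tissues(toy_tpm)
top_tissue = max(toy_z["FGA"], key=toy_z["FGA"].get)
```

A high positive z-score in one tissue indicates that the gene is expressed there well above its own cross-tissue average, which is the pattern a tissue-of-origin heatmap would highlight.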
[00160] Data Availability — cfRNA sequencing data have been deposited in the Sequence Read Archive (SRA).
[00161] Code Availability — In-house scripts used in this manuscript, which include data processing, downstream analysis, and figure generation scripts, are publicly available on GitHub: github.com/ohsu-cedar-comp-hub/cfRNA-seq-pipeline-Ngo-manuscript-2019
[00162] Table 18: Linear Discriminant Analysis results for MGUS versus NC.
[00163] Table 19: Linear Discriminant Analysis results for MGUS versus MM.
[00164] Table 20: Linear Discriminant Analysis results for NC versus MGUS versus
[00165] Table 21: Linear Discriminant Analysis results for NC versus Cirr.
[00166] Table 22: Linear Discriminant Analysis results for Cirr versus HCC.
REFERENCES
[00168] [1] SEER Cancer Stat Facts: Liver and Intrahepatic Bile Duct Cancer. National Cancer Institute. Bethesda, MD. 2018; [2] Howlader N, Noone AM, Krapcho M, Miller D, Brest A, Yu M, Ruhl J, Tatalovich Z, Mariotto A, Lewis DR, Chen HS, Feuer EJ, Cronin KA, SEER Cancer Statistics Review, 1975-2016, National Cancer Institute. Bethesda, MD; [3] Kyle, R.A. and S.V. Rajkumar, Management of monoclonal gammopathy of undetermined significance (MGUS) and smoldering multiple myeloma (SMM). Oncology (Williston Park), 2011. 25(7): p. 578-86; [4] Dhodapkar, M.V., MGUS to myeloma: a mysterious gammopathy of underexplored significance. Blood, 2016. 128(23): p. 2599; [5] Llovet, J.M., et al., Hepatocellular carcinoma. Nat Rev Dis Primers, 2016. 2: p. 16018; [6] Fateen, W. and S.D. Ryder, Screening for hepatocellular carcinoma: patient selection and perspectives. J Hepatocell Carcinoma, 2017. 4: p. 71-79; [7] Starr, S.P. and D. Raines, Cirrhosis: diagnosis, management, and prevention. Am Fam Physician, 2011. 84(12): p. 1353-9; [8] Laursen, L., A preventable cancer. Nature, 2014. 516: p. S2; [9] Goh, G.B., P.E. Chang, and C.K. Tan, Changing epidemiology of hepatocellular carcinoma in Asia. Best Pract Res Clin Gastroenterol, 2015. 29(6): p. 919-28; [10] Wong, V.W., et al., Clinical scoring system to predict hepatocellular carcinoma in chronic hepatitis B carriers. J Clin Oncol, 2010. 28(10): p. 1660-5; [11] Yang, H.I., et al., Risk estimation for hepatocellular carcinoma in chronic hepatitis B (REACH-B): development and validation of a predictive score. Lancet Oncol, 2011. 12(6): p. 568-74; [12] Bai, Y. and H. Zhao, Liquid biopsy in tumors: opportunities and challenges. Ann Transl Med, 2018. 6(Suppl 1): p. S89; [13] Palmirotta, R., et al., Liquid biopsy of cancer: a multimodal diagnostic tool in clinical oncology. Ther Adv Med Oncol, 2018. 10: p. 1758835918794630; [14] Marrugo-Ramirez, J., M. Mir, and J. Samitier, Blood-Based Cancer Biomarkers in Liquid Biopsy: A Promising Non-Invasive Alternative to Tissue Biopsy. Int J Mol Sci, 2018. 19(10); [15] Esposito, A., et al., Liquid biopsies for solid tumors: Understanding tumor heterogeneity and real time monitoring of early resistance to targeted therapies. Pharmacol Ther, 2016. 157: p. 120-4; [16] Sundling, K.E. and A.C. Lowe, Circulating Tumor Cells: Overview and Opportunities in Cytology. Adv Anat Pathol, 2019. 26(1): p. 56-63; [17] Millner, L.M., M.W. Linder, and R. Valdes, Jr., Circulating tumor cells: a review of present methods and the need to identify heterogeneous phenotypes. Ann Clin Lab Sci, 2013. 43(3): p. 295-304; [18] Thiele, J.A., et al., Circulating Tumor Cells: Fluid Surrogates of Solid Tumors. Annual Review of Pathology: Mechanisms of Disease, 2017. 12(1): p. 419-447; [19] Liu, Y. and X. Cao, The origin and function of tumor-associated macrophages. Cellular And Molecular Immunology, 2014. 12: p. 1; [20] Adams, D.L., et al., Circulating giant macrophages as a potential biomarker of solid tumors. Proceedings of the National Academy of Sciences, 2014. 111(9): p. 3514; [21] Gast, C.E., et al., Cell fusion potentiates tumor heterogeneity and reveals circulating hybrid cells that correlate with stage and survival. Science Advances, 2018. 4(9): p. eaat7828; [22] Newman, A.M., et al., An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med, 2014. 20(5): p. 548-54; [23] Corcoran, R.B. and B.A. Chabner, Application of Cell-free DNA Analysis to Cancer Treatment. N Engl J Med, 2018. 379(18): p. 1754-1765; [24] Abbosh, C., et al., Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution. Nature, 2017. 545(7655): p. 446-451; [25] Best, M.G., et al., RNA-Seq of Tumor-Educated Platelets Enables Blood-Based Pan-Cancer, Multiclass, and Molecular Pathway Cancer Diagnostics. Cancer Cell, 2015. 28(5): p. 666-676; [26] Best, M.G., P. Wesseling, and T. Wurdinger, Tumor-Educated Platelets as a Noninvasive Biomarker Source for Cancer Detection and Progression Monitoring. Cancer Res, 2018. 78(13): p. 3407-3412; [27] In 't Veld, S.G.J.G. and T. Wurdinger, Tumor-educated platelets. Blood, 2019: p. blood-2018-12-852830; [28] Cohen, J.D., et al., Detection and localization of surgically resectable cancers with a multi-analyte blood test. Science, 2018. 359(6378): p. 926; [29] Abbosh, C., N.J. Birkbak, and C. Swanton, Early stage NSCLC - challenges to implementing ctDNA-based screening and MRD detection. Nat Rev Clin Oncol, 2018. 15(9): p. 577-586; [30] Haque, I.S. and O. Elemento, Challenges in Using ctDNA to Achieve Early Detection of Cancer. bioRxiv, 2017: p. 237578; [31] Salta, S., et al., A DNA Methylation-Based Test for Breast Cancer Detection in Circulating Cell-Free DNA. J Clin Med, 2018. 7(11); [32] Xu, R.-h., et al., Circulating tumour DNA methylation markers for diagnosis and prognosis of hepatocellular carcinoma. Nature Materials, 2017. 16: p. 1155; [33] Song, C.-X., et al., 5-Hydroxymethylcytosine signatures in cell-free DNA provide information about tumor types and stages. Cell Research, 2017. 27: p. 1231; [34] Shen, S.Y., et al., Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature, 2018. 563(7732): p. 579-583; [35] Moss, J., et al., Comprehensive human cell-type methylation atlas reveals origins of circulating cell-free DNA in health and disease. Nature Communications, 2018. 9(1): p. 5068; [36] Cristiano, S., et al., Genome-wide cell-free DNA fragmentation in patients with cancer. Nature, 2019. 570(7761): p. 385-389; [37] Liu, M.C., et al., Sensitive and specific multi-cancer detection and localization using methylation signatures in cell-free DNA. Ann Oncol, 2020. 31(6): p. 745-759; [38] Chen, X., et al., Non-invasive early detection of cancer four years before conventional diagnosis using a blood test. Nature Communications, 2020. 11(1): p. 3475; [39] Gemmell, C.H., Activation of platelets by in vitro whole blood contact with materials: increases in microparticle, procoagulant activity, and soluble P-selectin blood levels. J Biomater Sci Polym Ed, 2001. 12(8): p. 933-43; [40] Heitzer, E., et al., Current and future perspectives of liquid biopsies in genomics-driven oncology. Nature Reviews Genetics, 2019. 20(2): p. 71-88; [41] Wan, J.C.M., et al., Liquid biopsies come of age: towards implementation of circulating tumour DNA. Nature Reviews Cancer, 2017. 17: p. 223; [42] Koh, W., et al., Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proceedings of the National Academy of Sciences, 2014: p. 201405528; [43] Pan, W., et al., Simultaneously Monitoring Immune Response and Microbial Infections During Pregnancy through Plasma cfRNA Sequencing. Clinical Chemistry, 2016: p. clinchem.2017.273888; [44] Ngo, T.T.M., et al., Noninvasive blood tests for fetal development predict gestational age and preterm delivery. Science, 2018. 360(6393): p. 1133; [45] Larson, M.H., et al., A comprehensive characterization of the cell-free transcriptome reveals tissue- and subtype-specific biomarkers for cancer detection. Nature Communications, 2021. 12(1): p. 2357; [46] Ibarra, A., et al., Non-invasive characterization of human bone marrow stimulation and reconstitution by cell-free messenger RNA sequencing. Nature Communications, 2020. 11(1): p. 400; [47] The Genotype-Tissue Expression (GTEx) project. Nat Genet, 2013. 45(6): p. 580-5; [48] The Human Protein Atlas; [49] Sardar, H.S. and S.P. Gilbert, Microtubule capture by mitotic kinesin centromere protein E (CENP-E). J Biol Chem, 2012. 287(30): p. 24894-904; [50] Uhlen, M., et al., Proteomics. Tissue-based map of the human proteome. Science, 2015. 347(6220): p. 1260419; [51] Fry, A.M., The Nek2 protein kinase: a novel regulator of centrosome structure. Oncogene, 2002. 21(40): p. 6184-6194; [52] Mills, C.A., et al., Nucleolar and spindle-associated protein 1 (NUSAP1) interacts with a SUMO E3 ligase complex during chromosome segregation. J Biol Chem, 2017. 292(42): p. 17178-17189; [53] Srivastava, R.A., N. Bhasin, and N. Srivastava, Apolipoprotein E gene expression in various tissues of mouse and regulation by estrogen. Biochem Mol Biol Int, 1996. 38(1): p. 91-101; [54] Jia, Q., et al., Association between complement C3 and prevalence of fatty liver disease in an adult population: a cross-sectional study from the Tianjin Chronic Low-Grade Systemic Inflammation and Health (TCLSIHealth) cohort study. PLoS One, 2015. 10(4): p. e0122026; [55] Zeng, D.W., et al., Serum ceruloplasmin levels correlate negatively with liver fibrosis in males with chronic hepatitis B: a new noninvasive model for predicting liver fibrosis in HBV-related liver disease. PLoS One, 2013. 8(10): p. e77942; [56] Waterham, H.R., et al., Mutations in the 3beta-hydroxysterol Delta24-reductase gene cause desmosterolosis, an autosomal recessive disorder of cholesterol biosynthesis. Am J Hum Genet, 2001. 69(4): p. 685-94; [57] Fort, A., et al., A liver enhancer in the fibrinogen gene cluster. Blood, 2011. 117(1): p. 276-82; [58] Gram, J., et al., Plasma histidine-rich glycoprotein and plasminogen in patients with liver disease. Thromb Res, 1985. 39(4): p. 411-7; [59] Goodman, Z.D., Liver Biopsy Diagnosis of Cirrhosis, in Diagnostic Methods for Cirrhosis and Portal Hypertension, A. Berzigotti and J. Bosch, Editors. 2018, Springer International Publishing: Cham. p. 17-31; [60] Joshi NA, Fass JN, Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files. Available at github.com/najoshi/sickle, 2011; [61] Leggett, R.M., et al., Sequencing quality assessment tools to enable data-driven informatics for high throughput genomics. Frontiers in Genetics, 2013. 4: p. 288; [62] Andrews, S., FastQC: a quality control tool for high throughput sequence data. 2010; [63] Wang, L., S. Wang, and W. Li, RSeQC: quality control of RNA-seq experiments. Bioinformatics, 2012. 28(16): p. 2184-5; [64] Dobin, A., et al., STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 2013. 29(1): p. 15-21; [65] Van der Auwera, G.A., et al., From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics, 2013. 43: p. 11.10.1-33; [66] Anders, S., P.T. Pyl, and W. Huber, HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics, 2015. 31(2): p. 166-9; [67] Wang, L., S. Wang, and W. Li, RSeQC: quality control of RNA-seq experiments. Bioinformatics, 2012. 28(16): p. 2184-2185; [68] Love, M.I., W. Huber, and S. Anders, Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol, 2014. 15(12): p. 550; [69] Kuhn, M., Building Predictive Models in R Using the caret Package. Journal of Statistical Software, 2008. 28(5); [70] Alexa A, Rahnenfuhrer J, topGO: Enrichment Analysis for Gene Ontology. R package version 2.36.0., 2019; [71] Venables, W.N. and B.D. Ripley, Modern Applied Statistics with S. 2002; and, [72] Liaw, A. and M. Wiener, Classification and Regression by randomForest. R News, 2002. 2(3): p. 18-22.
[00169] It will be understood by those having skill in the art that many changes may be made to the details of the above-described embodiments without departing from the underlying principles of the invention. The scope of the present invention should, therefore, be determined only by the following claims.
Claims
1. A method for detecting cancer or a predisposition for cancer in a biological sample obtained from a subject, the method comprising:
(a) analyzing a level of one or more cell-free RNA (cfRNA) biomarkers selected from a group comprising: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, in the biological sample; and
(b) performing a differential expression analysis comparing the level of each cfRNA biomarker selected in Step (a) to a corresponding control value (CV); in which differential expression shown by the differential expression analysis between the cfRNA biomarkers selected in Step (a) and corresponding CVs indicates cancer or a predisposition for cancer in the subject.
2. The method of claim 1, wherein one or more of the cfRNA biomarkers: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate blood cancer or a predisposition to blood cancer.
3. The method of claim 1, in which one or more of the cfRNA biomarkers: AIDA, CA1, CENPE, CPOX, ELL2, EPB42, HBG1, HBG2, NEK2, NUSAP1, or any combination thereof, are selected to indicate multiple myeloma (MM).
4. The method of claim 3, wherein one or more of CENPE, HBG1, HBG2, NUSAP1, or any combination thereof are selected to indicate MM.
5. The method of claim 4, wherein CENPE, HBG1, HBG2, and NUSAP1 are selected to indicate MM.
6. The method of claim 1, in which one or more of the cfRNA biomarkers: FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2-AS2, BMX, CDC42BPA, KNL1, CACNA1A, or any combination thereof, are selected to indicate monoclonal gammopathy of undetermined significance (MGUS).
7. The method of claim 6, wherein FPR3, SMC4, TXNDC16, ASPM, WRN, ZRANB2- AS2, BMX, CDC42BPA, KNL1, and CACNA1A are selected to indicate MGUS.
8. The method of claim 1, in which one or more of the cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, ATP1B1, ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate liver cancer or a predisposition to liver cancer.
9. The method of claim 1, in which one or more of the cfRNA biomarkers: APOE, C3, CP, DHCR24, FGA, FGB, FGG, HRG, IFITM3, and ATP1B1, or any combination thereof, are selected to indicate hepatocellular carcinoma (HCC).
10. The method of claim 9, wherein one or more of the cfRNA biomarkers C3, CP, FGA, FGB, IFITM3, or any combination thereof are selected to indicate HCC.
11. The method of claim 10, wherein C3, CP, FGA, FGB, and IFITM3 are selected to indicate HCC.
12. The method of claim 1, in which one or more of the cfRNA biomarkers: ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH, or any combination thereof, are selected to indicate cirrhosis.
13. The method of claim 12, wherein ABCB7, HIST1H2BF, PSIP1, TMEM150C, ZC3H6, C9orf16, CPQ, DYNC1I2, ECM1, and HIST1H2AH are selected to indicate cirrhosis.
14. The method of claim 1, in which the differential expression analysis is selected from the group of: (i) read-mapping analysis, (ii) Random Forest analysis, (iii) decision tree analysis, (iv) Linear Discriminant Analysis, (v) DESeq2 analysis, (vi) Bayesian modeling framework analysis, (vii) EMDomics analysis, (viii) Monocle2 analysis, (ix) Discrete distributional differential expression (D3E) analysis, (x) SINCERA analysis, (xi) edgeR analysis, (xii) DEsingle analysis, (xiii) SigEMD analysis, or any combination thereof.
15. The method of claim 1, in which the level of the one or more cfRNA biomarkers is measured by a method selected from the group of: a polymerase chain reaction (PCR), a quantitative PCR (qPCR), a reverse transcription PCR (rt-PCR), a complementary DNA (cDNA) synthesis, or a real-time PCR, or any combination thereof.
16. The method of claim 15, wherein the level of the one or more cfRNA biomarkers is measured by: a. performing a RT-PCR reaction comprising primer pairs for amplifying two or more of the cfRNA biomarkers, producing a pre-amplified pool of cDNAs; b. digesting the pre-amplified pool of cDNAs to remove single-stranded nucleic acids; and c. performing two or more qPCR reactions each comprising a single primer pair for amplifying a single cfRNA biomarker.
17. The method of Claim 1, further comprising analyzing a level of one or more of GAPDH, ACTB, or a combination thereof.
18. The method of claim 17, wherein the method uses the primer pair of SEQ ID NO: 1 and SEQ ID NO: 2, the primer pair of SEQ ID NO: 3 and SEQ ID NO: 4, or a combination thereof.
19. The method of any of Claims 1, in which the method uses one or more primer pairs selected from the primer pair of SEQ ID NO: 23 and SEQ ID NO: 24, the primer pair of SEQ ID NO: 25 and SEQ ID NO: 26, the primer pair of SEQ ID NO: 27 and SEQ ID NO: 28, the primer pair of SEQ ID NO: 29 and SEQ ID NO: 30, the primer pair of SEQ ID NO: 31 and SEQ ID NO: 32, the primer pair of SEQ ID NO: 33 and SEQ ID NO: 34, the primer pair of SEQ ID NO: 35 and SEQ ID NO: 36, the primer pair of SEQ ID NO: 37 and SEQ ID NO: 38, the primer pair of SEQ ID NO: 39 and SEQ ID NO: 40, the primer pair of SEQ ID NO: 41 and SEQ ID NO: 42, or any combination thereof. 0. The method of claim 1, in which the one or more primer pairs comprise the primer pair of SEQ ID NO: 25 and SEQ ID NO: 26, the primer pair of SEQ ID NO: 27 and SEQ ID NO: 28, the primer pair of SEQ ID NO: 29 and SEQ ID NO: 30, and the primer pair of SEQ ID NO: 31 and SEQ ID NO: 32.
21. The method of Claim 1, in which the method uses one or more primer pairs selected from the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6, the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, the primer pair of SEQ ID NO: 9 and SEQ ID NO: 10, the primer pair of SEQ ID NO: 11 and SEQ ID NO: 12, the primer pair of SEQ ID NO: 13 and SEQ ID NO: 14, the primer pair of SEQ ID NO: 15 and SEQ ID NO: 16, the primer pair of SEQ ID NO: 17 and SEQ ID NO: 18, the primer pair of SEQ ID NO: 19 and SEQ ID NO: 20, the primer pair of SEQ ID NO: 21 and SEQ ID NO: 22, or any combination thereof.
22. The method of claim 1, in which the one or more primer pairs comprise the primer pair of SEQ ID NO: 5 and SEQ ID NO: 6, the primer pair of SEQ ID NO: 7 and SEQ ID NO: 8, the primer pair of SEQ ID NO: 9 and SEQ ID NO: 10, the primer pair of SEQ ID NO: 11 and SEQ ID NO: 12, and the primer pair of SEQ ID NO: 13 and SEQ ID NO: 14.
23. The method of claim 1, in which the biological sample is selected from the group of: a blood sample, a serum sample, a plasma sample, a urine sample, a tear sample, a saliva sample, a breast milk sample, a semen sample, a fecal sample, a cerebral spinal fluid sample, a tissue sample, or a cell sample.
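The claims recite qPCR measurement of cfRNA biomarkers together with analysis of GAPDH and ACTB levels, but they do not state how those levels are combined. As an illustration only, the following Python sketch assumes the common delta-Ct approach, in which each biomarker Ct is normalized to the mean Ct of the reference transcripts and converted to a relative level of 2^(-delta-Ct) under an assumed PCR efficiency of about 100%. The function names, the choice of NUSAP1 and CENPE as example biomarkers, and all Ct values are hypothetical and are not taken from the application.

```python
from statistics import mean

# Reference transcripts named in claim 17; using them for normalization
# is an assumption of this sketch, not something the claims specify.
REFERENCE_GENES = ("GAPDH", "ACTB")


def delta_ct(ct_values, references=REFERENCE_GENES):
    """Return {biomarker: Ct(biomarker) - mean Ct of the reference genes present}."""
    ref_cts = [ct_values[g] for g in references if g in ct_values]
    if not ref_cts:
        raise ValueError("no reference gene Ct values present")
    ref_mean = mean(ref_cts)
    return {
        gene: ct - ref_mean
        for gene, ct in ct_values.items()
        if gene not in references
    }


def relative_level(dct):
    """Convert delta-Ct to 2^(-delta-Ct), assuming roughly 100% PCR efficiency."""
    return {gene: 2.0 ** (-value) for gene, value in dct.items()}


# Hypothetical Ct values for one sample (illustrative numbers only).
sample = {"GAPDH": 20.0, "ACTB": 22.0, "NUSAP1": 24.0, "CENPE": 21.0}
dct = delta_ct(sample)
levels = relative_level(dct)
```

A lower delta-Ct corresponds to a higher relative level; in practice the resulting values would still need comparison against a control cohort or threshold, which this sketch does not attempt.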
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263303970P | 2022-01-27 | 2022-01-27 | |
US63/303,970 | 2022-01-27 | ||
US202263426258P | 2022-11-17 | 2022-11-17 | |
US63/426,258 | 2022-11-17 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2023147445A2 (en) | 2023-08-03 |
WO2023147445A3 (en) | 2023-10-19 |
Family
ID=87472701
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2023/061410 WO2023147445A2 (en) | 2022-01-27 | 2023-01-26 | Cell-free rna biomarkers for the detection of cancer or predisposition to cancer |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023147445A2 (en) |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040053248A1 (en) * | 2000-12-22 | 2004-03-18 | Tang Y. Tom | Novel nucleic acids and polypeptides |
CA2440747A1 (en) * | 2001-03-15 | 2002-09-26 | Hyseq, Inc. | Novel nucleic acids and polypeptides |
EP1583820A4 (en) * | 2003-01-14 | 2007-07-18 | Bristol Myers Squibb Co | Polynucleotides and polypeptides associated with the nf-kb pathway |
JP2006014723A (en) * | 2004-06-01 | 2006-01-19 | Sumitomo Chemical Co Ltd | Common marmoset-derived glyceraldehyde-3-phosphate dehydrogenase gene and use thereof |
US8586310B2 (en) * | 2008-09-05 | 2013-11-19 | Washington University | Method for multiplexed nucleic acid patch polymerase chain reaction |
EP2949760B1 (en) * | 2013-01-22 | 2019-04-24 | Otsuka Pharmaceutical Co., Ltd. | Quantification method for expression level of wt1 mrna |
EP3218503A4 (en) * | 2014-11-10 | 2018-06-06 | Murdoch Childrens Research Institute | Vectors and methods for targeted integration in loci comprising constitutively expressed genes |
WO2020092259A1 (en) * | 2018-10-29 | 2020-05-07 | Molecular Stethoscope, Inc. | Characterization of bone marrow using cell-free messenger-rna |
AU2021292521A1 (en) * | 2020-06-16 | 2022-12-08 | Grail, Llc | Methods for analysis of cell-free RNA |
Also Published As
Publication number | Publication date |
---|---|
WO2023147445A3 (en) | 2023-10-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Johansson et al. | Considerations and quality controls when analyzing cell-free tumor DNA | |
US20220033915A1 (en) | Gene expression panel for prognosis of prostate cancer recurrence | |
CN105518151B (en) | Identification and use of circulating nucleic acid tumor markers | |
JP2022023159A (en) | Urine biomarker cohorts, gene expression signatures, and methods of use thereof | |
US11015213B2 (en) | Method of preparing cell free nucleic acid molecules by in situ amplification | |
US20170298427A1 (en) | Nucleic acids and methods for detecting methylation status | |
KR20210014111A (en) | Size-tagged preferred end and orientation-cognition assays to determine the properties of cell-free mixtures | |
WO2018151601A1 (en) | Swarm intelligence-enhanced diagnosis and therapy selection for cancer using tumor- educated platelets | |
WO2014071279A2 (en) | Gene fusions and alternatively spliced junctions associated with breast cancer | |
CN113785076A (en) | Methods and compositions for predicting cancer prognosis | |
US20210087638A1 (en) | Next-generation sequencing assay for genomic characterization and minimal residual disease detection in the bone marrow, peripheral blood, and urine of multiple myeloma and smoldering myeloma patients | |
JP2022163076A (en) | Methods for cancer detection | |
Parsons et al. | Circulating plasma tumor DNA | |
Pisapia et al. | Next generation sequencing for liquid biopsy based testing in non-small cell lung cancer in 2021 | |
EP4004238A1 (en) | Systems and methods for determining tumor fraction | |
WO2009021338A1 (en) | Alternative splicing gene variants in cancer detection | |
Koessler et al. | Reliability of liquid biopsy analysis: an inter-laboratory comparison of circulating tumor DNA extraction and sequencing with different platforms | |
WO2019174004A1 (en) | System and method for determining lung cancer | |
JP6543253B2 (en) | Methods and kits for determining the quality of a library of DNA sequences obtained by genomic integrity and / or deterministic restriction enzyme site whole genome amplification | |
WO2014159425A1 (en) | Bladder cancer detection and monitoring | |
WO2014171800A1 (en) | Automatic system for early predicting and diagnosing prognosis of breast cancer | |
JPWO2021092476A5 (en) | ||
US11845993B2 (en) | Methods for identifying prostate cancer | |
Beaver et al. | Circulating cell-free DNA for molecular diagnostics and therapeutic monitoring |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23747871; Country of ref document: EP; Kind code of ref document: A2 |