WO2024168196A1 - Systems and methods for the enzymatic synthesis of polynucleotides containing non-standard nucleotide base pairs - Google Patents
- Publication number
- WO2024168196A1 (application PCT/US2024/015068)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- standard
- base
- standard nucleotide
- nucleotide
- dna
Concepts (machine-extracted)
- 125000003729 nucleotide group Chemical group 0.000 title claims abstract description 287
- 239000002773 nucleotide Substances 0.000 title claims abstract description 286
- 238000000034 method Methods 0.000 title claims abstract description 160
- 102000040430 polynucleotide Human genes 0.000 title claims description 45
- 108091033319 polynucleotide Proteins 0.000 title claims description 45
- 239000002157 polynucleotide Substances 0.000 title claims description 45
- 230000015572 biosynthetic process Effects 0.000 title description 45
- 238000003786 synthesis reaction Methods 0.000 title description 41
- 230000002255 enzymatic effect Effects 0.000 title description 13
- 238000012163 sequencing technique Methods 0.000 claims abstract description 62
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 claims abstract description 43
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 claims abstract description 43
- 108020004414 DNA Proteins 0.000 claims description 241
- 238000006243 chemical reaction Methods 0.000 claims description 158
- 102000053602 DNA Human genes 0.000 claims description 101
- 238000007672 fourth generation sequencing Methods 0.000 claims description 51
- 238000010801 machine learning Methods 0.000 claims description 38
- 229910052796 boron Inorganic materials 0.000 claims description 33
- 108091008146 restriction endonucleases Proteins 0.000 claims description 32
- 230000000295 complement effect Effects 0.000 claims description 30
- 239000001226 triphosphate Substances 0.000 claims description 30
- 229910052739 hydrogen Inorganic materials 0.000 claims description 26
- 239000001257 hydrogen Substances 0.000 claims description 26
- 238000003860 storage Methods 0.000 claims description 25
- 235000011178 triphosphate Nutrition 0.000 claims description 23
- 229910052718 tin Inorganic materials 0.000 claims description 22
- 108010017826 DNA Polymerase I Proteins 0.000 claims description 20
- 102000004594 DNA Polymerase I Human genes 0.000 claims description 20
- 229920001184 polypeptide Polymers 0.000 claims description 16
- 102000004196 processed proteins & peptides Human genes 0.000 claims description 16
- 108090000765 processed proteins & peptides Proteins 0.000 claims description 16
- 239000002243 precursor Substances 0.000 claims description 14
- UNXRWKVEANCORM-UHFFFAOYSA-N triphosphoric acid Chemical compound OP(O)(=O)OP(O)(=O)OP(O)(O)=O UNXRWKVEANCORM-UHFFFAOYSA-N 0.000 claims description 13
- 238000007385 chemical modification Methods 0.000 claims description 12
- 238000012549 training Methods 0.000 claims description 9
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 claims description 8
- RYVNIFSIEDRLSJ-UHFFFAOYSA-N 5-(hydroxymethyl)cytosine Chemical compound NC=1NC(=O)N=CC=1CO RYVNIFSIEDRLSJ-UHFFFAOYSA-N 0.000 claims description 8
- VQAYFKKCNSOZKM-IOSLPCCCSA-N N(6)-methyladenosine Chemical compound C1=NC=2C(NC)=NC=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O VQAYFKKCNSOZKM-IOSLPCCCSA-N 0.000 claims description 8
- XYFCBTPGUUZFHI-UHFFFAOYSA-N Phosphine Chemical compound P XYFCBTPGUUZFHI-UHFFFAOYSA-N 0.000 claims description 8
- 230000004049 epigenetic modification Effects 0.000 claims description 8
- 229910052723 transition metal Inorganic materials 0.000 claims description 8
- 150000003624 transition metals Chemical class 0.000 claims description 8
- 238000004422 calculation algorithm Methods 0.000 claims description 6
- 239000005547 deoxyribonucleotide Substances 0.000 claims description 5
- 125000002637 deoxyribonucleotide group Chemical group 0.000 claims description 5
- COHVJBUINVIGOI-UHFFFAOYSA-N 4-amino-4-methyl-1,3-dihydropyrimidin-2-one Chemical group CC1(N)NC(=O)NC=C1 COHVJBUINVIGOI-UHFFFAOYSA-N 0.000 claims description 4
- LRSASMSXMSNRBT-UHFFFAOYSA-N 5-methylcytosine Chemical compound CC1=CNC(=O)N=C1N LRSASMSXMSNRBT-UHFFFAOYSA-N 0.000 claims description 4
- OGHAROSJZRTIOK-KQYNXXCUSA-O 7-methylguanosine Chemical compound C1=2N=C(N)NC(=O)C=2[N+](C)=CN1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O OGHAROSJZRTIOK-KQYNXXCUSA-O 0.000 claims description 4
- VQAYFKKCNSOZKM-UHFFFAOYSA-N NSC 29409 Natural products C1=NC=2C(NC)=NC=NC=2N1C1OC(CO)C(O)C1O VQAYFKKCNSOZKM-UHFFFAOYSA-N 0.000 claims description 4
- DPOPAJRDYZGTIR-UHFFFAOYSA-N Tetrazine Chemical compound C1=CN=NN=N1 DPOPAJRDYZGTIR-UHFFFAOYSA-N 0.000 claims description 4
- 150000001336 alkenes Chemical group 0.000 claims description 4
- 150000001345 alkine derivatives Chemical group 0.000 claims description 4
- 150000001350 alkyl halides Chemical class 0.000 claims description 4
- 241000617156 archaeon Species 0.000 claims description 4
- 238000013528 artificial neural network Methods 0.000 claims description 4
- 150000001540 azides Chemical class 0.000 claims description 4
- 229960002685 biotin Drugs 0.000 claims description 4
- 235000020958 biotin Nutrition 0.000 claims description 4
- 239000011616 biotin Substances 0.000 claims description 4
- ZPWOOKQUDFIEIX-UHFFFAOYSA-N cyclooctyne Chemical compound C1CCCC#CCC1 ZPWOOKQUDFIEIX-UHFFFAOYSA-N 0.000 claims description 4
- 230000007717 exclusion Effects 0.000 claims description 4
- 230000002209 hydrophobic effect Effects 0.000 claims description 4
- 230000003993 interaction Effects 0.000 claims description 4
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 claims description 4
- 229910000073 phosphorus hydride Inorganic materials 0.000 claims description 4
- 230000000306 recurrent effect Effects 0.000 claims description 4
- 239000002336 ribonucleotide Substances 0.000 claims description 4
- 230000006403 short-term memory Effects 0.000 claims description 4
- 150000003573 thiols Chemical class 0.000 claims description 4
- 125000002485 formyl group Chemical class [H]C(*)=O 0.000 claims 1
- 150000007523 nucleic acids Chemical class 0.000 abstract description 17
- 102000039446 nucleic acids Human genes 0.000 abstract description 16
- 108020004707 nucleic acids Proteins 0.000 abstract description 16
- 108091028043 Nucleic acid sequence Proteins 0.000 abstract description 9
- 238000007481 next generation sequencing Methods 0.000 abstract description 7
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 abstract description 2
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 abstract description 2
- 239000000047 product Substances 0.000 description 146
- OKKJLVBELUTLKV-UHFFFAOYSA-N Methanol Chemical compound OC OKKJLVBELUTLKV-UHFFFAOYSA-N 0.000 description 65
- XEKOWRVHYACXOJ-UHFFFAOYSA-N Ethyl acetate Chemical compound CCOC(C)=O XEKOWRVHYACXOJ-UHFFFAOYSA-N 0.000 description 56
- IAZDPXIOMUYVGZ-WFGJKAKNSA-N Dimethyl sulfoxide Chemical compound [2H]C([2H])([2H])S(=O)C([2H])([2H])[2H] IAZDPXIOMUYVGZ-WFGJKAKNSA-N 0.000 description 46
- 239000000203 mixture Substances 0.000 description 45
- JUJWROOIHBZHMG-UHFFFAOYSA-N Pyridine Chemical compound C1=CC=NC=C1 JUJWROOIHBZHMG-UHFFFAOYSA-N 0.000 description 43
- 239000000243 solution Substances 0.000 description 43
- YMWUJEATGCHHMB-UHFFFAOYSA-N Dichloromethane Chemical compound ClCCl YMWUJEATGCHHMB-UHFFFAOYSA-N 0.000 description 42
- 108060002716 Exonuclease Proteins 0.000 description 42
- HEDRZPFGACZZDS-MICDWDOJSA-N Trichloro(2H)methane Chemical compound [2H]C(Cl)(Cl)Cl HEDRZPFGACZZDS-MICDWDOJSA-N 0.000 description 42
- 102000013165 exonuclease Human genes 0.000 description 42
- 238000010200 validation analysis Methods 0.000 description 38
- 108091034117 Oligonucleotide Proteins 0.000 description 37
- 238000012360 testing method Methods 0.000 description 37
- 102000012410 DNA Ligases Human genes 0.000 description 36
- 108010061982 DNA Ligases Proteins 0.000 description 36
- 238000005160 1H NMR spectroscopy Methods 0.000 description 32
- 102000003960 Ligases Human genes 0.000 description 32
- 108090000364 Ligases Proteins 0.000 description 32
- 239000000872 buffer Substances 0.000 description 29
- 238000003556 assay Methods 0.000 description 27
- 235000019439 ethyl acetate Nutrition 0.000 description 27
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 26
- WEVYAHXRMPXWCK-UHFFFAOYSA-N Acetonitrile Chemical compound CC#N WEVYAHXRMPXWCK-UHFFFAOYSA-N 0.000 description 23
- 238000007792 addition Methods 0.000 description 23
- 239000011541 reaction mixture Substances 0.000 description 23
- 239000000499 gel Substances 0.000 description 22
- 239000007787 solid Substances 0.000 description 22
- 238000003780 insertion Methods 0.000 description 21
- 230000037431 insertion Effects 0.000 description 20
- 239000007858 starting material Substances 0.000 description 20
- UMJSCPRVCHMLSP-UHFFFAOYSA-N pyridine Natural products COC1=CC=CN=C1 UMJSCPRVCHMLSP-UHFFFAOYSA-N 0.000 description 19
- 238000010898 silica gel chromatography Methods 0.000 description 19
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N Silicium dioxide Chemical compound O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 18
- 238000004458 analytical method Methods 0.000 description 17
- 238000001644 13C nuclear magnetic resonance spectroscopy Methods 0.000 description 16
- KDCGOANMDULRCW-UHFFFAOYSA-N 7H-purine Chemical compound N1=CNC2=NC=NC2=C1 KDCGOANMDULRCW-UHFFFAOYSA-N 0.000 description 16
- 239000013642 negative control Substances 0.000 description 16
- 239000000758 substrate Substances 0.000 description 16
- 230000002068 genetic effect Effects 0.000 description 15
- 238000002360 preparation method Methods 0.000 description 15
- 241000276427 Poecilia reticulata Species 0.000 description 14
- 230000029087 digestion Effects 0.000 description 14
- 238000005516 engineering process Methods 0.000 description 14
- 230000002779 inactivation Effects 0.000 description 14
- 238000013507 mapping Methods 0.000 description 14
- 230000015654 memory Effects 0.000 description 14
- 102000004190 Enzymes Human genes 0.000 description 13
- 108090000790 Enzymes Proteins 0.000 description 13
- VHYFNPMBLIVWCW-UHFFFAOYSA-N 4-Dimethylaminopyridine Chemical compound CN(C)C1=CC=NC=C1 VHYFNPMBLIVWCW-UHFFFAOYSA-N 0.000 description 12
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 12
- 239000003480 eluent Substances 0.000 description 12
- 239000006260 foam Substances 0.000 description 12
- 238000007781 pre-processing Methods 0.000 description 12
- RYHBNJHYFVUHQT-UHFFFAOYSA-N 1,4-Dioxane Chemical compound C1COCCO1 RYHBNJHYFVUHQT-UHFFFAOYSA-N 0.000 description 11
- 238000010276 construction Methods 0.000 description 11
- 230000006870 function Effects 0.000 description 11
- 239000002777 nucleoside Substances 0.000 description 11
- QTBSBXVTEAMEQO-UHFFFAOYSA-N Acetic acid Chemical compound CC(O)=O QTBSBXVTEAMEQO-UHFFFAOYSA-N 0.000 description 10
- WFDIJRYMOXRFFG-UHFFFAOYSA-N Acetic anhydride Chemical compound CC(=O)OC(C)=O WFDIJRYMOXRFFG-UHFFFAOYSA-N 0.000 description 10
- 239000000463 material Substances 0.000 description 10
- VLKZOEOYAKHREP-UHFFFAOYSA-N n-Hexane Chemical class CCCCCC VLKZOEOYAKHREP-UHFFFAOYSA-N 0.000 description 10
- 239000013641 positive control Substances 0.000 description 10
- 238000012545 processing Methods 0.000 description 10
- QKNYBSVHEMOAJP-UHFFFAOYSA-N 2-amino-2-(hydroxymethyl)propane-1,3-diol;hydron;chloride Chemical compound Cl.OCC(N)(CO)CO QKNYBSVHEMOAJP-UHFFFAOYSA-N 0.000 description 9
- CSCPPACGZOOCGX-UHFFFAOYSA-N Acetone Chemical compound CC(C)=O CSCPPACGZOOCGX-UHFFFAOYSA-N 0.000 description 9
- 125000002924 primary amino group Chemical group [H]N([H])* 0.000 description 9
- 238000012216 screening Methods 0.000 description 9
- 239000002904 solvent Substances 0.000 description 9
- 238000004704 ultra performance liquid chromatography Methods 0.000 description 9
- RTZKZFJDLAIYFH-UHFFFAOYSA-N Diethyl ether Chemical compound CCOCC RTZKZFJDLAIYFH-UHFFFAOYSA-N 0.000 description 8
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 8
- CZPWVGJYEJSRLH-UHFFFAOYSA-N Pyrimidine Chemical compound C1=CN=CN=C1 CZPWVGJYEJSRLH-UHFFFAOYSA-N 0.000 description 8
- 230000000694 effects Effects 0.000 description 8
- 238000011534 incubation Methods 0.000 description 8
- -1 nucleoside monophosphate Chemical class 0.000 description 8
- 239000012044 organic layer Substances 0.000 description 8
- 230000011218 segmentation Effects 0.000 description 8
- 125000002264 triphosphate group Chemical class [H]OP(=O)(O[H])OP(=O)(O[H])OP(=O)(O[H])O* 0.000 description 8
- HEDRZPFGACZZDS-UHFFFAOYSA-N Chloroform Chemical compound ClC(Cl)Cl HEDRZPFGACZZDS-UHFFFAOYSA-N 0.000 description 7
- 239000007832 Na2SO4 Substances 0.000 description 7
- PMZURENOXWZQFD-UHFFFAOYSA-L Sodium Sulfate Chemical compound [Na+].[Na+].[O-]S([O-])(=O)=O PMZURENOXWZQFD-UHFFFAOYSA-L 0.000 description 7
- 108010006785 Taq Polymerase Proteins 0.000 description 7
- 239000000370 acceptor Substances 0.000 description 7
- 239000011543 agarose gel Substances 0.000 description 7
- 238000013459 approach Methods 0.000 description 7
- 238000002474 experimental method Methods 0.000 description 7
- 238000004128 high performance liquid chromatography Methods 0.000 description 7
- 238000000746 purification Methods 0.000 description 7
- 239000000377 silicon dioxide Substances 0.000 description 7
- 229910052938 sodium sulfate Inorganic materials 0.000 description 7
- 238000003756 stirring Methods 0.000 description 7
- 229960000549 4-dimethylaminophenol Drugs 0.000 description 6
- 229910019142 PO4 Inorganic materials 0.000 description 6
- 230000004888 barrier function Effects 0.000 description 6
- 238000001816 cooling Methods 0.000 description 6
- 239000012149 elution buffer Substances 0.000 description 6
- BDAGIHXWWSANSR-UHFFFAOYSA-N methanoic acid Natural products OC=O BDAGIHXWWSANSR-UHFFFAOYSA-N 0.000 description 6
- 239000011780 sodium chloride Substances 0.000 description 6
- 238000001228 spectrum Methods 0.000 description 6
- 230000002194 synthesizing effect Effects 0.000 description 6
- 108010067770 Endopeptidase K Proteins 0.000 description 5
- ZMXDDKWLCZADIW-UHFFFAOYSA-N N,N-dimethylformamide Substances CN(C)C=O ZMXDDKWLCZADIW-UHFFFAOYSA-N 0.000 description 5
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 5
- 230000000692 anti-sense effect Effects 0.000 description 5
- 238000004364 calculation method Methods 0.000 description 5
- RGWHQCVHVJXOKC-SHYZEUOFSA-J dCTP(4-) Chemical compound O=C1N=C(N)C=CN1[C@@H]1O[C@H](COP([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O)[C@@H](O)C1 RGWHQCVHVJXOKC-SHYZEUOFSA-J 0.000 description 5
- 238000009826 distribution Methods 0.000 description 5
- 239000003814 drug Substances 0.000 description 5
- 125000001301 ethoxy group Chemical group [H]C([H])([H])C([H])([H])O* 0.000 description 5
- 230000006872 improvement Effects 0.000 description 5
- 150000002500 ions Chemical class 0.000 description 5
- 238000005259 measurement Methods 0.000 description 5
- 230000004048 modification Effects 0.000 description 5
- 238000012986 modification Methods 0.000 description 5
- 230000007935 neutral effect Effects 0.000 description 5
- 150000003833 nucleoside derivatives Chemical class 0.000 description 5
- 238000005457 optimization Methods 0.000 description 5
- 230000002441 reversible effect Effects 0.000 description 5
- GEHJYWRUCIMESM-UHFFFAOYSA-L sodium sulfite Chemical compound [Na+].[Na+].[O-]S([O-])=O GEHJYWRUCIMESM-UHFFFAOYSA-L 0.000 description 5
- JBWYRBLDOOOJEU-UHFFFAOYSA-N 1-[chloro-(4-methoxyphenyl)-phenylmethyl]-4-methoxybenzene Chemical compound C1=CC(OC)=CC=C1C(Cl)(C=1C=CC(OC)=CC=1)C1=CC=CC=C1 JBWYRBLDOOOJEU-UHFFFAOYSA-N 0.000 description 4
- BVOITXUNGDUXRW-UHFFFAOYSA-N 2-chloro-1,3,2-benzodioxaphosphinin-4-one Chemical compound C1=CC=C2OP(Cl)OC(=O)C2=C1 BVOITXUNGDUXRW-UHFFFAOYSA-N 0.000 description 4
- 238000004679 31P NMR spectroscopy Methods 0.000 description 4
- ZCYVEMRRCGMTRW-UHFFFAOYSA-N 7553-56-2 Chemical compound [I] ZCYVEMRRCGMTRW-UHFFFAOYSA-N 0.000 description 4
- QGZKDVFQNNGYKY-UHFFFAOYSA-N Ammonia Chemical compound N QGZKDVFQNNGYKY-UHFFFAOYSA-N 0.000 description 4
- VHUUQVKOLVNVRT-UHFFFAOYSA-N Ammonium hydroxide Chemical compound [NH4+].[OH-] VHUUQVKOLVNVRT-UHFFFAOYSA-N 0.000 description 4
- BAVYZALUXZFZLV-UHFFFAOYSA-N Methylamine Chemical compound NC BAVYZALUXZFZLV-UHFFFAOYSA-N 0.000 description 4
- 238000012408 PCR amplification Methods 0.000 description 4
- 239000008118 PEG 6000 Substances 0.000 description 4
- 229920002584 Polyethylene Glycol 6000 Polymers 0.000 description 4
- ZMANZCXQSJIPKH-UHFFFAOYSA-N Triethylamine Chemical compound CCN(CC)CC ZMANZCXQSJIPKH-UHFFFAOYSA-N 0.000 description 4
- 239000007795 chemical reaction product Substances 0.000 description 4
- 238000004891 communication Methods 0.000 description 4
- 230000001351 cycling effect Effects 0.000 description 4
- 238000013461 design Methods 0.000 description 4
- 238000001514 detection method Methods 0.000 description 4
- JXTHNDFMNIQAHM-UHFFFAOYSA-N dichloroacetic acid Chemical compound OC(=O)C(Cl)Cl JXTHNDFMNIQAHM-UHFFFAOYSA-N 0.000 description 4
- 125000002147 dimethylamino group Chemical group [H]C([H])([H])N(*)C([H])([H])[H] 0.000 description 4
- 239000000706 filtrate Substances 0.000 description 4
- 238000003818 flash chromatography Methods 0.000 description 4
- 238000004108 freeze drying Methods 0.000 description 4
- 239000007789 gas Substances 0.000 description 4
- TWYVVGMYFLAQMU-UHFFFAOYSA-N gelgreen Chemical compound [I-].[I-].C1=C(N(C)C)C=C2[N+](CCCCCC(=O)NCCCOCCOCCOCCCNC(=O)CCCCC[N+]3=C4C=C(C=CC4=CC4=CC=C(C=C43)N(C)C)N(C)C)=C(C=C(C=C3)N(C)C)C3=CC2=C1 TWYVVGMYFLAQMU-UHFFFAOYSA-N 0.000 description 4
- RAXXELZNTBOGNW-UHFFFAOYSA-N imidazole Natural products C1=CNC=N1 RAXXELZNTBOGNW-UHFFFAOYSA-N 0.000 description 4
- 229910052740 iodine Inorganic materials 0.000 description 4
- 239000011630 iodine Substances 0.000 description 4
- 238000004255 ion exchange chromatography Methods 0.000 description 4
- 239000011159 matrix material Substances 0.000 description 4
- 125000000325 methylidene group Chemical group [H]C([H])=* 0.000 description 4
- WRMXOVHLRUVREB-UHFFFAOYSA-N phosphono phosphate;tributylazanium Chemical compound OP(O)(=O)OP([O-])([O-])=O.CCCC[NH+](CCCC)CCCC.CCCC[NH+](CCCC)CCCC WRMXOVHLRUVREB-UHFFFAOYSA-N 0.000 description 4
- 150000008300 phosphoramidites Chemical class 0.000 description 4
- BWHMMNNQKKPAPP-UHFFFAOYSA-L potassium carbonate Chemical compound [K+].[K+].[O-]C([O-])=O BWHMMNNQKKPAPP-UHFFFAOYSA-L 0.000 description 4
- 230000035484 reaction time Effects 0.000 description 4
- 230000008439 repair process Effects 0.000 description 4
- 238000011160 research Methods 0.000 description 4
- 238000002864 sequence alignment Methods 0.000 description 4
- 235000011152 sodium sulphate Nutrition 0.000 description 4
- 239000000126 substance Substances 0.000 description 4
- 238000006467 substitution reaction Methods 0.000 description 4
- 239000000725 suspension Substances 0.000 description 4
- IMFACGCPASFAPR-UHFFFAOYSA-N tributylamine Chemical compound CCCCN(CCCC)CCCC IMFACGCPASFAPR-UHFFFAOYSA-N 0.000 description 4
- HNSDLXPSAYFUHK-UHFFFAOYSA-N 1,4-bis(2-ethylhexyl) sulfosuccinate Chemical compound CCCCC(CC)COC(=O)CC(S(O)(=O)=O)C(=O)OCC(CC)CCCC HNSDLXPSAYFUHK-UHFFFAOYSA-N 0.000 description 3
- OSWFIVFLDKOXQC-UHFFFAOYSA-N 4-(3-methoxyphenyl)aniline Chemical compound COC1=CC=CC(C=2C=CC(N)=CC=2)=C1 OSWFIVFLDKOXQC-UHFFFAOYSA-N 0.000 description 3
- ZKHQWZAMYRWXGA-UHFFFAOYSA-N Adenosine triphosphate Natural products C1=NC=2C(N)=NC=NC=2N1C1OC(COP(O)(=O)OP(O)(=O)OP(O)(O)=O)C(O)C1O ZKHQWZAMYRWXGA-UHFFFAOYSA-N 0.000 description 3
- USFZMSVCRYTOJT-UHFFFAOYSA-N Ammonium acetate Chemical compound N.CC(O)=O USFZMSVCRYTOJT-UHFFFAOYSA-N 0.000 description 3
- 239000005695 Ammonium acetate Substances 0.000 description 3
- ATRRKUHOCOJYRX-UHFFFAOYSA-N Ammonium bicarbonate Chemical compound [NH4+].OC([O-])=O ATRRKUHOCOJYRX-UHFFFAOYSA-N 0.000 description 3
- 108091023037 Aptamer Proteins 0.000 description 3
- 108010008286 DNA nucleotidylexotransferase Proteins 0.000 description 3
- 102100033215 DNA nucleotidylexotransferase Human genes 0.000 description 3
- 238000001712 DNA sequencing Methods 0.000 description 3
- 101100364969 Dictyostelium discoideum scai gene Proteins 0.000 description 3
- 240000000594 Heliconia bihai Species 0.000 description 3
- 241001546602 Horismenus Species 0.000 description 3
- 238000007476 Maximum Likelihood Methods 0.000 description 3
- 101100364971 Mus musculus Scai gene Proteins 0.000 description 3
- 150000001299 aldehydes Chemical class 0.000 description 3
- 235000019257 ammonium acetate Nutrition 0.000 description 3
- 229940043376 ammonium acetate Drugs 0.000 description 3
- 239000001099 ammonium carbonate Substances 0.000 description 3
- 230000008901 benefit Effects 0.000 description 3
- 239000006227 byproduct Substances 0.000 description 3
- 230000003197 catalytic effect Effects 0.000 description 3
- 238000012512 characterization method Methods 0.000 description 3
- 238000004587 chromatography analysis Methods 0.000 description 3
- SUYVUBYJARFZHO-RRKCRQDMSA-N dATP Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 SUYVUBYJARFZHO-RRKCRQDMSA-N 0.000 description 3
- SUYVUBYJARFZHO-UHFFFAOYSA-N dATP Natural products C1=NC=2C(N)=NC=NC=2N1C1CC(O)C(COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 SUYVUBYJARFZHO-UHFFFAOYSA-N 0.000 description 3
- 238000013480 data collection Methods 0.000 description 3
- 238000001914 filtration Methods 0.000 description 3
- 235000019253 formic acid Nutrition 0.000 description 3
- 238000010348 incorporation Methods 0.000 description 3
- 238000007169 ligase reaction Methods 0.000 description 3
- 230000007774 longterm Effects 0.000 description 3
- 238000010606 normalization Methods 0.000 description 3
- 239000011148 porous material Substances 0.000 description 3
- 239000002244 precipitate Substances 0.000 description 3
- 108090000623 proteins and genes Proteins 0.000 description 3
- 150000003212 purines Chemical class 0.000 description 3
- 150000003230 pyrimidines Chemical class 0.000 description 3
- 239000011535 reaction buffer Substances 0.000 description 3
- 238000004007 reversed phase HPLC Methods 0.000 description 3
- 238000004088 simulation Methods 0.000 description 3
- 235000010265 sodium sulphite Nutrition 0.000 description 3
- 238000007671 third-generation sequencing Methods 0.000 description 3
- AVBGNFCMKJOFIN-UHFFFAOYSA-N triethylammonium acetate Chemical compound CC(O)=O.CCN(CC)CC AVBGNFCMKJOFIN-UHFFFAOYSA-N 0.000 description 3
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 2
- ZXSQEZNORDWBGZ-UHFFFAOYSA-N 1,3-dihydropyrrolo[2,3-b]pyridin-2-one Chemical compound C1=CN=C2NC(=O)CC2=C1 ZXSQEZNORDWBGZ-UHFFFAOYSA-N 0.000 description 2
- DGMOBVGABMBZSB-UHFFFAOYSA-N 2-methylpropanoyl chloride Chemical compound CC(C)C(Cl)=O DGMOBVGABMBZSB-UHFFFAOYSA-N 0.000 description 2
- 108020005098 Anticodon Proteins 0.000 description 2
- 241001408449 Asca Species 0.000 description 2
- KWIUHFFTVRNATP-UHFFFAOYSA-N Betaine Natural products C[N+](C)(C)CC([O-])=O KWIUHFFTVRNATP-UHFFFAOYSA-N 0.000 description 2
- 108020004705 Codon Proteins 0.000 description 2
- 102100029921 Dipeptidyl peptidase 1 Human genes 0.000 description 2
- 108010007577 Exodeoxyribonuclease I Proteins 0.000 description 2
- 101000793922 Homo sapiens Dipeptidyl peptidase 1 Proteins 0.000 description 2
- 101001122938 Homo sapiens Lysosomal protective protein Proteins 0.000 description 2
- 101000650854 Homo sapiens Small glutamine-rich tetratricopeptide repeat-containing protein alpha Proteins 0.000 description 2
- VEXZGXHMUGYJMC-UHFFFAOYSA-N Hydrochloric acid Chemical compound Cl VEXZGXHMUGYJMC-UHFFFAOYSA-N 0.000 description 2
- 102000009617 Inorganic Pyrophosphatase Human genes 0.000 description 2
- 108010009595 Inorganic Pyrophosphatase Proteins 0.000 description 2
- 102100028524 Lysosomal protective protein Human genes 0.000 description 2
- KWIUHFFTVRNATP-UHFFFAOYSA-O N,N,N-trimethylglycinium Chemical compound C[N+](C)(C)CC(O)=O KWIUHFFTVRNATP-UHFFFAOYSA-O 0.000 description 2
- UGJBHEZMOKVTIM-UHFFFAOYSA-N N-formylglycine Chemical compound OC(=O)CNC=O UGJBHEZMOKVTIM-UHFFFAOYSA-N 0.000 description 2
- 102100035593 POU domain, class 2, transcription factor 1 Human genes 0.000 description 2
- 101710084414 POU domain, class 2, transcription factor 1 Proteins 0.000 description 2
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 2
- 102100027722 Small glutamine-rich tetratricopeptide repeat-containing protein alpha Human genes 0.000 description 2
- 241000205101 Sulfolobus Species 0.000 description 2
- 108020004566 Transfer RNA Proteins 0.000 description 2
- 239000002253 acid Substances 0.000 description 2
- 235000012501 ammonium carbonate Nutrition 0.000 description 2
- 235000011114 ammonium hydroxide Nutrition 0.000 description 2
- 230000003321 amplification Effects 0.000 description 2
- 229960003237 betaine Drugs 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 2
- 239000012267 brine Substances 0.000 description 2
- 230000015556 catabolic process Effects 0.000 description 2
- 239000003795 chemical substances by application Substances 0.000 description 2
- 238000005520 cutting process Methods 0.000 description 2
- NHVNXKFIZYSCEB-XLPZGREQSA-N dTTP Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C1 NHVNXKFIZYSCEB-XLPZGREQSA-N 0.000 description 2
- 238000007405 data analysis Methods 0.000 description 2
- 230000007423 decrease Effects 0.000 description 2
- 238000006731 degradation reaction Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 238000002405 diagnostic procedure Methods 0.000 description 2
- 229960005215 dichloroacetic acid Drugs 0.000 description 2
- 201000010099 disease Diseases 0.000 description 2
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 2
- 238000001035 drying Methods 0.000 description 2
- 239000012634 fragment Substances 0.000 description 2
- 229920000140 heteropolymer Polymers 0.000 description 2
- 230000028993 immune response Effects 0.000 description 2
- 238000005304 joining Methods 0.000 description 2
- 238000002898 library design Methods 0.000 description 2
- 238000004895 liquid chromatography mass spectrometry Methods 0.000 description 2
- 238000011068 loading method Methods 0.000 description 2
- 230000014759 maintenance of location Effects 0.000 description 2
- 238000004949 mass spectrometry Methods 0.000 description 2
- 238000001819 mass spectrum Methods 0.000 description 2
- 239000006199 nebulizer Substances 0.000 description 2
- 238000003199 nucleic acid amplification method Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- YJVFFLUZDVXJQI-UHFFFAOYSA-L palladium(ii) acetate Chemical compound [Pd+2].CC([O-])=O.CC([O-])=O YJVFFLUZDVXJQI-UHFFFAOYSA-L 0.000 description 2
- 235000015320 potassium carbonate Nutrition 0.000 description 2
- 229910000027 potassium carbonate Inorganic materials 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 102000004169 proteins and genes Human genes 0.000 description 2
- 238000011002 quantification Methods 0.000 description 2
- 238000007086 side reaction Methods 0.000 description 2
- 229910001958 silver carbonate Inorganic materials 0.000 description 2
- LKZMBDSASOBTPN-UHFFFAOYSA-L silver carbonate Substances [Ag].[O-]C([O-])=O LKZMBDSASOBTPN-UHFFFAOYSA-L 0.000 description 2
- 239000011734 sodium Substances 0.000 description 2
- UIIMBOGNXHQVGW-UHFFFAOYSA-M sodium bicarbonate Substances [Na+].OC([O-])=O UIIMBOGNXHQVGW-UHFFFAOYSA-M 0.000 description 2
- LPXPTNMVRIOKMN-UHFFFAOYSA-M sodium nitrite Chemical compound [Na+].[O-]N=O LPXPTNMVRIOKMN-UHFFFAOYSA-M 0.000 description 2
- HPALAKNZSZLMCH-UHFFFAOYSA-M sodium;chloride;hydrate Chemical compound O.[Na+].[Cl-] HPALAKNZSZLMCH-UHFFFAOYSA-M 0.000 description 2
- 238000000564 temperature-controlled scanning calorimetry Methods 0.000 description 2
- 239000003053 toxin Substances 0.000 description 2
- 231100000765 toxin Toxicity 0.000 description 2
- 108700012359 toxins Proteins 0.000 description 2
- 238000013518 transcription Methods 0.000 description 2
- 230000035897 transcription Effects 0.000 description 2
- BPLUKJNHPBNVQL-UHFFFAOYSA-N triphenylarsine Chemical compound C1=CC=CC=C1[As](C=1C=CC=CC=1)C1=CC=CC=C1 BPLUKJNHPBNVQL-UHFFFAOYSA-N 0.000 description 2
- 238000011144 upstream manufacturing Methods 0.000 description 2
- ASGMFNBUXDJWJJ-JLCFBVMHSA-N (1R,3R)-3-[[3-bromo-1-[4-(5-methyl-1,3,4-thiadiazol-2-yl)phenyl]pyrazolo[3,4-d]pyrimidin-6-yl]amino]-N,1-dimethylcyclopentane-1-carboxamide Chemical compound BrC1=NN(C2=NC(=NC=C21)N[C@H]1C[C@@](CC1)(C(=O)NC)C)C1=CC=C(C=C1)C=1SC(=NN=1)C ASGMFNBUXDJWJJ-JLCFBVMHSA-N 0.000 description 1
- GHYOCDFICYLMRF-UTIIJYGPSA-N (2S,3R)-N-[(2S)-3-(cyclopenten-1-yl)-1-[(2R)-2-methyloxiran-2-yl]-1-oxopropan-2-yl]-3-hydroxy-3-(4-methoxyphenyl)-2-[[(2S)-2-[(2-morpholin-4-ylacetyl)amino]propanoyl]amino]propanamide Chemical compound C1(=CCCC1)C[C@@H](C(=O)[C@@]1(OC1)C)NC([C@H]([C@@H](C1=CC=C(C=C1)OC)O)NC([C@H](C)NC(CN1CCOCC1)=O)=O)=O GHYOCDFICYLMRF-UTIIJYGPSA-N 0.000 description 1
- IUSARDYWEPUTPN-OZBXUNDUSA-N (2r)-n-[(2s,3r)-4-[[(4s)-6-(2,2-dimethylpropyl)spiro[3,4-dihydropyrano[2,3-b]pyridine-2,1'-cyclobutane]-4-yl]amino]-3-hydroxy-1-[3-(1,3-thiazol-2-yl)phenyl]butan-2-yl]-2-methoxypropanamide Chemical compound C([C@H](NC(=O)[C@@H](C)OC)[C@H](O)CN[C@@H]1C2=CC(CC(C)(C)C)=CN=C2OC2(CCC2)C1)C(C=1)=CC=CC=1C1=NC=CS1 IUSARDYWEPUTPN-OZBXUNDUSA-N 0.000 description 1
- STBLNCCBQMHSRC-BATDWUPUSA-N (2s)-n-[(3s,4s)-5-acetyl-7-cyano-4-methyl-1-[(2-methylnaphthalen-1-yl)methyl]-2-oxo-3,4-dihydro-1,5-benzodiazepin-3-yl]-2-(methylamino)propanamide Chemical compound O=C1[C@@H](NC(=O)[C@H](C)NC)[C@H](C)N(C(C)=O)C2=CC(C#N)=CC=C2N1CC1=C(C)C=CC2=CC=CC=C12 STBLNCCBQMHSRC-BATDWUPUSA-N 0.000 description 1
- HUWSZNZAROKDRZ-RRLWZMAJSA-N (3r,4r)-3-azaniumyl-5-[[(2s,3r)-1-[(2s)-2,3-dicarboxypyrrolidin-1-yl]-3-methyl-1-oxopentan-2-yl]amino]-5-oxo-4-sulfanylpentane-1-sulfonate Chemical compound OS(=O)(=O)CC[C@@H](N)[C@@H](S)C(=O)N[C@@H]([C@H](C)CC)C(=O)N1CCC(C(O)=O)[C@H]1C(O)=O HUWSZNZAROKDRZ-RRLWZMAJSA-N 0.000 description 1
- SVJQCVOKYJWUBC-OWOJBTEDSA-N (e)-3-(2,3,4,5-tetrabromophenyl)prop-2-enoic acid Chemical compound OC(=O)\C=C\C1=CC(Br)=C(Br)C(Br)=C1Br SVJQCVOKYJWUBC-OWOJBTEDSA-N 0.000 description 1
- ZYZCALPXKGUGJI-DDVDASKDSA-M (e,3r,5s)-7-[3-(4-fluorophenyl)-2-phenyl-5-propan-2-ylimidazol-4-yl]-3,5-dihydroxyhept-6-enoate Chemical compound C=1C=C(F)C=CC=1N1C(\C=C\[C@@H](O)C[C@@H](O)CC([O-])=O)=C(C(C)C)N=C1C1=CC=CC=C1 ZYZCALPXKGUGJI-DDVDASKDSA-M 0.000 description 1
- QILCUDCYZVIAQH-UHFFFAOYSA-N 1-$l^{1}-oxidanyl-2,2,5,5-tetramethylpyrrole-3-carboxylic acid Chemical compound CC1(C)C=C(C(O)=O)C(C)(C)N1[O] QILCUDCYZVIAQH-UHFFFAOYSA-N 0.000 description 1
- UNILWMWFPHPYOR-KXEYIPSPSA-M 1-[6-[2-[3-[3-[3-[2-[2-[3-[[2-[2-[[(2r)-1-[[2-[[(2r)-1-[3-[2-[2-[3-[[2-(2-amino-2-oxoethoxy)acetyl]amino]propoxy]ethoxy]ethoxy]propylamino]-3-hydroxy-1-oxopropan-2-yl]amino]-2-oxoethyl]amino]-3-[(2r)-2,3-di(hexadecanoyloxy)propyl]sulfanyl-1-oxopropan-2-yl Chemical compound O=C1C(SCCC(=O)NCCCOCCOCCOCCCNC(=O)COCC(=O)N[C@@H](CSC[C@@H](COC(=O)CCCCCCCCCCCCCCC)OC(=O)CCCCCCCCCCCCCCC)C(=O)NCC(=O)N[C@H](CO)C(=O)NCCCOCCOCCOCCCNC(=O)COCC(N)=O)CC(=O)N1CCNC(=O)CCCCCN\1C2=CC=C(S([O-])(=O)=O)C=C2CC/1=C/C=C/C=C/C1=[N+](CC)C2=CC=C(S([O-])(=O)=O)C=C2C1 UNILWMWFPHPYOR-KXEYIPSPSA-M 0.000 description 1
- XGLVDUUYFKXKPL-UHFFFAOYSA-N 2-(2-methoxyethoxy)-n,n-bis[2-(2-methoxyethoxy)ethyl]ethanamine Chemical compound COCCOCCN(CCOCCOC)CCOCCOC XGLVDUUYFKXKPL-UHFFFAOYSA-N 0.000 description 1
- KSTJOICDZAFYTD-UHFFFAOYSA-N 2-amino-1h-imidazo[1,2-a][1,3,5]triazin-4-one Chemical compound O=C1N=C(N)N=C2NC=CN21 KSTJOICDZAFYTD-UHFFFAOYSA-N 0.000 description 1
- YZEUHQHUFTYLPH-UHFFFAOYSA-N 2-nitroimidazole Chemical compound [O-][N+](=O)C1=NC=CN1 YZEUHQHUFTYLPH-UHFFFAOYSA-N 0.000 description 1
- ASPDJZINBYYZRU-UHFFFAOYSA-N 5-amino-2-chlorobenzotrifluoride Chemical compound NC1=CC=C(Cl)C(C(F)(F)F)=C1 ASPDJZINBYYZRU-UHFFFAOYSA-N 0.000 description 1
- OZFPSOBLQZPIAV-UHFFFAOYSA-N 5-nitro-1h-indole Chemical compound [O-][N+](=O)C1=CC=C2NC=CC2=C1 OZFPSOBLQZPIAV-UHFFFAOYSA-N 0.000 description 1
- WERABQRUGJIMKQ-UHFFFAOYSA-N 6-chloro-3-nitropyridin-2-amine Chemical compound NC1=NC(Cl)=CC=C1[N+]([O-])=O WERABQRUGJIMKQ-UHFFFAOYSA-N 0.000 description 1
- FQFSWKVTOZFBOZ-UHFFFAOYSA-N 6-chloro-5-iodo-3-nitropyridin-2-amine Chemical compound NC1=NC(Cl)=C(I)C=C1[N+]([O-])=O FQFSWKVTOZFBOZ-UHFFFAOYSA-N 0.000 description 1
- ZKHQWZAMYRWXGA-KQYNXXCUSA-J ATP(4-) Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](COP([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O)[C@@H](O)[C@H]1O ZKHQWZAMYRWXGA-KQYNXXCUSA-J 0.000 description 1
- 229920000936 Agarose Polymers 0.000 description 1
- 102100039239 Amidophosphoribosyltransferase Human genes 0.000 description 1
- 108010039224 Amidophosphoribosyltransferase Proteins 0.000 description 1
- 229910000013 Ammonium bicarbonate Inorganic materials 0.000 description 1
- 241000203069 Archaea Species 0.000 description 1
- 241000726103 Atta Species 0.000 description 1
- MFYMBIZGFDNLPT-UHFFFAOYSA-N CTBT Chemical compound N1=NC2=NN=NN2C2=CC=C(Cl)C=C21 MFYMBIZGFDNLPT-UHFFFAOYSA-N 0.000 description 1
- VEXZGXHMUGYJMC-UHFFFAOYSA-M Chloride anion Chemical compound [Cl-] VEXZGXHMUGYJMC-UHFFFAOYSA-M 0.000 description 1
- 108020004635 Complementary DNA Proteins 0.000 description 1
- 229940127007 Compound 39 Drugs 0.000 description 1
- 108010001132 DNA Polymerase beta Proteins 0.000 description 1
- 102000001996 DNA Polymerase beta Human genes 0.000 description 1
- 230000033616 DNA repair Effects 0.000 description 1
- 108091027757 Deoxyribozyme Proteins 0.000 description 1
- 102100029075 Exonuclease 1 Human genes 0.000 description 1
- 101710134582 Geranylgeranyl transferase type-2 subunit alpha Proteins 0.000 description 1
- OOFLZRMKTMLSMH-UHFFFAOYSA-N H4atta Chemical compound OC(=O)CN(CC(O)=O)CC1=CC=CC(C=2N=C(C=C(C=2)C=2C3=CC=CC=C3C=C3C=CC=CC3=2)C=2N=C(CN(CC(O)=O)CC(O)=O)C=CC=2)=N1 OOFLZRMKTMLSMH-UHFFFAOYSA-N 0.000 description 1
- 229910004003 H5IO6 Inorganic materials 0.000 description 1
- 102100022536 Helicase POLQ-like Human genes 0.000 description 1
- 101100045622 Homo sapiens CCT6A gene Proteins 0.000 description 1
- 101000899334 Homo sapiens Helicase POLQ-like Proteins 0.000 description 1
- 101000835622 Homo sapiens Tubulin-specific chaperone A Proteins 0.000 description 1
- 101000935117 Homo sapiens Voltage-dependent P/Q-type calcium channel subunit alpha-1A Proteins 0.000 description 1
- 102100034343 Integrase Human genes 0.000 description 1
- 241001026509 Kata Species 0.000 description 1
- 241001175904 Labeo bata Species 0.000 description 1
- 101100172630 Mus musculus Eri1 gene Proteins 0.000 description 1
- ZSXGLVDWWRXATF-UHFFFAOYSA-N N,N-dimethylformamide dimethyl acetal Chemical compound COC(OC)N(C)C ZSXGLVDWWRXATF-UHFFFAOYSA-N 0.000 description 1
- 238000005481 NMR spectroscopy Methods 0.000 description 1
- KDLHZDBZIXYQEI-UHFFFAOYSA-N Palladium on carbon Substances [Pd] KDLHZDBZIXYQEI-UHFFFAOYSA-N 0.000 description 1
- 208000036758 Postinfectious cerebellitis Diseases 0.000 description 1
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 1
- 101000974926 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) Alcohol O-acetyltransferase 2 Proteins 0.000 description 1
- 108020004682 Single-Stranded DNA Proteins 0.000 description 1
- QAOWNCQODCNURD-UHFFFAOYSA-N Sulfuric acid Chemical compound OS(O)(=O)=O QAOWNCQODCNURD-UHFFFAOYSA-N 0.000 description 1
- 101100045624 Sus scrofa CCT6 gene Proteins 0.000 description 1
- 102100030664 T-complex protein 1 subunit zeta Human genes 0.000 description 1
- 102000003978 Tissue Plasminogen Activator Human genes 0.000 description 1
- 108090000373 Tissue Plasminogen Activator Proteins 0.000 description 1
- 239000007983 Tris buffer Substances 0.000 description 1
- 102100026477 Tubulin-specific chaperone A Human genes 0.000 description 1
- 241001355948 Turnip curly top virus Species 0.000 description 1
- 102100025330 Voltage-dependent P/Q-type calcium channel subunit alpha-1A Human genes 0.000 description 1
- 108091027569 Z-DNA Proteins 0.000 description 1
- 150000007513 acids Chemical class 0.000 description 1
- 239000000654 additive Substances 0.000 description 1
- 238000000246 agarose gel electrophoresis Methods 0.000 description 1
- 229910021529 ammonia Inorganic materials 0.000 description 1
- 235000012538 ammonium bicarbonate Nutrition 0.000 description 1
- 239000000908 ammonium hydroxide Substances 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- CBHOOMGKXCMKIR-UHFFFAOYSA-N azane;methanol Chemical compound N.OC CBHOOMGKXCMKIR-UHFFFAOYSA-N 0.000 description 1
- 239000011324 bead Substances 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 239000012620 biological material Substances 0.000 description 1
- 238000013406 biomanufacturing process Methods 0.000 description 1
- 229920001222 biopolymer Polymers 0.000 description 1
- HRHJHXJQMNWQTF-UHFFFAOYSA-N cannabichromenic acid Chemical compound O1C(C)(CCC=C(C)C)C=CC2=C1C=C(CCCCC)C(C(O)=O)=C2O HRHJHXJQMNWQTF-UHFFFAOYSA-N 0.000 description 1
- 125000002680 canonical nucleotide group Chemical group 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 239000003153 chemical reaction reagent Substances 0.000 description 1
- 125000001309 chloro group Chemical group Cl* 0.000 description 1
- 239000012612 commercial material Substances 0.000 description 1
- 229940125797 compound 12 Drugs 0.000 description 1
- 229940125878 compound 36 Drugs 0.000 description 1
- 229940125807 compound 37 Drugs 0.000 description 1
- 229940127573 compound 38 Drugs 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 238000000205 computational method Methods 0.000 description 1
- 238000004590 computer program Methods 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- HAAZLUGHYHWQIW-KVQBGUIXSA-N dGTP Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 HAAZLUGHYHWQIW-KVQBGUIXSA-N 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000003467 diminishing effect Effects 0.000 description 1
- 239000001177 diphosphate Substances 0.000 description 1
- 235000011180 diphosphates Nutrition 0.000 description 1
- 238000011143 downstream manufacturing Methods 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 238000001962 electrophoresis Methods 0.000 description 1
- 230000006862 enzymatic digestion Effects 0.000 description 1
- 238000001976 enzyme digestion Methods 0.000 description 1
- 230000001973 epigenetic effect Effects 0.000 description 1
- 238000001704 evaporation Methods 0.000 description 1
- 230000008020 evaporation Effects 0.000 description 1
- 108010052305 exodeoxyribonuclease III Proteins 0.000 description 1
- 238000004880 explosion Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 238000010353 genetic engineering Methods 0.000 description 1
- 125000000623 heterocyclic group Chemical group 0.000 description 1
- SHFJWMWCIHQNCP-UHFFFAOYSA-M hydron;tetrabutylazanium;sulfate Chemical compound OS([O-])(=O)=O.CCCC[N+](CCCC)(CCCC)CCCC SHFJWMWCIHQNCP-UHFFFAOYSA-M 0.000 description 1
- 238000001727 in vivo Methods 0.000 description 1
- 239000004615 ingredient Substances 0.000 description 1
- 239000000543 intermediate Substances 0.000 description 1
- INQOMBQAUSQDDS-UHFFFAOYSA-N iodomethane Chemical compound IC INQOMBQAUSQDDS-UHFFFAOYSA-N 0.000 description 1
- 238000010501 iterative synthesis reaction Methods 0.000 description 1
- 239000010410 layer Substances 0.000 description 1
- 238000004811 liquid chromatography Methods 0.000 description 1
- 230000001404 mediated effect Effects 0.000 description 1
- 150000004712 monophosphates Chemical class 0.000 description 1
- 230000035772 mutation Effects 0.000 description 1
- APVPOHHVBBYQAV-UHFFFAOYSA-N n-(4-aminophenyl)sulfonyloctadecanamide Chemical compound CCCCCCCCCCCCCCCCCC(=O)NS(=O)(=O)C1=CC=C(N)C=C1 APVPOHHVBBYQAV-UHFFFAOYSA-N 0.000 description 1
- 229910000069 nitrogen hydride Inorganic materials 0.000 description 1
- 125000003835 nucleoside group Chemical group 0.000 description 1
- PIDFDZJZLOTZTM-KHVQSSSXSA-N ombitasvir Chemical compound COC(=O)N[C@@H](C(C)C)C(=O)N1CCC[C@H]1C(=O)NC1=CC=C([C@H]2N([C@@H](CC2)C=2C=CC(NC(=O)[C@H]3N(CCC3)C(=O)[C@@H](NC(=O)OC)C(C)C)=CC=2)C=2C=CC(=CC=2)C(C)(C)C)C=C1 PIDFDZJZLOTZTM-KHVQSSSXSA-N 0.000 description 1
- TWLXDPFBEPBAQB-UHFFFAOYSA-N orthoperiodic acid Chemical compound OI(O)(O)(O)(O)=O TWLXDPFBEPBAQB-UHFFFAOYSA-N 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 230000002085 persistent effect Effects 0.000 description 1
- AHWALFGBDFAJAI-UHFFFAOYSA-N phenyl carbonochloridate Chemical compound ClC(=O)OC1=CC=CC=C1 AHWALFGBDFAJAI-UHFFFAOYSA-N 0.000 description 1
- 229920001223 polyethylene glycol Polymers 0.000 description 1
- 238000006116 polymerization reaction Methods 0.000 description 1
- ZNNZYHKDIALBAK-UHFFFAOYSA-M potassium thiocyanate Chemical compound [K+].[S-]C#N ZNNZYHKDIALBAK-UHFFFAOYSA-M 0.000 description 1
- 229940116357 potassium thiocyanate Drugs 0.000 description 1
- PSHHQIGKVLIVBD-UHFFFAOYSA-N purine-2,4-diamine Chemical compound C1=NC(N)=NC2(N)N=CN=C21 PSHHQIGKVLIVBD-UHFFFAOYSA-N 0.000 description 1
- GRJJQCWNZGRKAU-UHFFFAOYSA-N pyridin-1-ium;fluoride Chemical compound F.C1=CC=NC=C1 GRJJQCWNZGRKAU-UHFFFAOYSA-N 0.000 description 1
- 238000003908 quality control method Methods 0.000 description 1
- 230000010076 replication Effects 0.000 description 1
- 230000003362 replicative effect Effects 0.000 description 1
- 238000012827 research and development Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 238000007363 ring formation reaction Methods 0.000 description 1
- 238000002390 rotary evaporation Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 238000007480 sanger sequencing Methods 0.000 description 1
- 229910052706 scandium Inorganic materials 0.000 description 1
- 238000011451 sequencing strategy Methods 0.000 description 1
- 239000000741 silica gel Substances 0.000 description 1
- 229910002027 silica gel Inorganic materials 0.000 description 1
- 229910052708 sodium Inorganic materials 0.000 description 1
- 235000017557 sodium bicarbonate Nutrition 0.000 description 1
- 229910000030 sodium bicarbonate Inorganic materials 0.000 description 1
- AKHNMLFCWUSKQB-UHFFFAOYSA-L sodium thiosulfate Chemical compound [Na+].[Na+].[O-]S([O-])(=O)=S AKHNMLFCWUSKQB-UHFFFAOYSA-L 0.000 description 1
- 235000019345 sodium thiosulphate Nutrition 0.000 description 1
- 239000012321 sodium triacetoxyborohydride Substances 0.000 description 1
- 239000012265 solid product Substances 0.000 description 1
- 108010068698 spleen exonuclease Proteins 0.000 description 1
- 230000005477 standard model Effects 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 238000001308 synthesis method Methods 0.000 description 1
- BCNZYOJHNLTNEZ-UHFFFAOYSA-N tert-butyldimethylsilyl chloride Chemical compound CC(C)(C)[Si](C)(C)Cl BCNZYOJHNLTNEZ-UHFFFAOYSA-N 0.000 description 1
- FPGGTKZVZWFYPV-UHFFFAOYSA-M tetrabutylammonium fluoride Chemical compound [F-].CCCC[N+](CCCC)(CCCC)CCCC FPGGTKZVZWFYPV-UHFFFAOYSA-M 0.000 description 1
- 230000001225 therapeutic effect Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
- AKUNSPZHHSNFFX-UHFFFAOYSA-M tributyl(tetradecyl)phosphanium;chloride Chemical compound [Cl-].CCCCCCCCCCCCCC[P+](CCCC)(CCCC)CCCC AKUNSPZHHSNFFX-UHFFFAOYSA-M 0.000 description 1
- YNJBWRMUSHSURL-UHFFFAOYSA-N trichloroacetic acid Chemical compound OC(=O)C(Cl)(Cl)Cl YNJBWRMUSHSURL-UHFFFAOYSA-N 0.000 description 1
- LENZDBCJOHFCAS-UHFFFAOYSA-N tris Chemical compound OCC(N)(CO)CO LENZDBCJOHFCAS-UHFFFAOYSA-N 0.000 description 1
- 229910052720 vanadium Inorganic materials 0.000 description 1
- 239000003039 volatile agent Substances 0.000 description 1
- 238000005406 washing Methods 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
- C12N9/1241—Nucleotidyltransferases (2.7.7)
- C12N9/1252—DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1093—General methods of preparing gene libraries, not provided for in other subgroups
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12P—FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
- C12P19/00—Preparation of compounds containing saccharide radicals
- C12P19/26—Preparation of nitrogen-containing carbohydrates
- C12P19/28—N-glycosides
- C12P19/30—Nucleotides
- C12P19/34—Polynucleotides, e.g. nucleic acids, oligoribonucleotides
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y207/00—Transferases transferring phosphorus-containing groups (2.7)
- C12Y207/07—Nucleotidyltransferases (2.7.7)
- C12Y207/07007—DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B40/00—Libraries per se, e.g. arrays, mixtures
- C40B40/04—Libraries containing only organic compounds
- C40B40/06—Libraries containing nucleotides or polynucleotides, or derivatives thereof
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
Definitions
- the name of the XML file containing the sequence listing is 3915-P1293WO.UW_Sequence_Listing.xml.
- the XML file is 172,291 bytes; was created on February 07, 2024; and is being submitted electronically via Patent Center with the filing of the specification.
- BACKGROUND [0003] The four-letter standard genetic alphabet of DNA (A, T, G, C) is ubiquitous and one of the defining biomolecular signatures of life on Earth. Organisms’ ability to read, write, and translate this information forms the basis for evolution as an emergent property of nucleic acid heteropolymers. Humanity has learned how to manipulate the four standard letters of DNA, spurring major advancements in biotechnology, information, and healthcare.
- non-standard nucleotides that are capable of base-pairing with other non-standard nucleotides and/or standard nucleotides.
- non-standard nucleotide refers to any nucleotide that is not one of the standard four nucleotides of DNA (i.e., A, T, G, C).
- An example of such a nucleotide includes, but is not limited to, a xenonucleotide (XNA).
- XNA xenonucleotide
- the disclosure provides a method for generating an N+1 tailing product comprising a non-standard nucleotide that is covalently bound with a 3’ end of a precursor double-stranded DNA (dsDNA) template and is non-base-paired, the method comprising: combining the precursor dsDNA template with a DNA polymerase and a non-standard deoxyribonucleotide triphosphate (dNTP) under a reaction condition conducive to a blunt-end N+1 addition of the non-standard nucleotide to the 3’ end of the precursor dsDNA template by the DNA polymerase.
- dNTP non-standard deoxyribonucleotide triphosphate
- the non-standard nucleotide is a xenonucleotide (XNA) and the non-standard dNTP is a deoxy-xeno-ribonucleotide triphosphate (dxNTP).
- the DNA polymerase comprises a polypeptide sequence of a small Klenow Fragment (KF exo-) of DNA Polymerase I.
- the polypeptide sequence comprises a sequence of SEQ ID NO:2.
- the non-standard nucleotide is B or P.
- the reaction condition proceeds at about 37°C for about 1 to 16 hours and comprises about 0.71 U/µL of the DNA polymerase and about 1.19 mM of the non-standard dNTP.
- the DNA polymerase comprises a polypeptide sequence of an engineered polymerase from a hyperthermophilic marine archaeon.
- the engineered polymerase is a variant of 9°N DNA polymerase.
- the polypeptide sequence comprises a sequence of SEQ ID NO:3.
- the non-standard nucleotide is selected from S_n, S_c, Z, X_t, K_n, J, and V, and the reaction condition proceeds at about 60°C for about 4 to 16 hours and comprises about 0.29 U/µL of the DNA polymerase and about 1.19 mM of the non-standard dNTP.
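The two tailing conditions above can be restated as a small lookup table, which makes it easier to see which polymerase, temperature, and enzyme concentration go with which base. This is a minimal sketch that only re-encodes the values in the text (dropping the "about" qualifiers); the names `TailingCondition` and `tailing_condition` are hypothetical, and writing subscripted bases as `S_n`, `X_t`, and so on is an assumed convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TailingCondition:
    """One N+1 tailing condition as described in the text ("about" values)."""
    polymerase: str
    temperature_c: float
    polymerase_u_per_ul: float
    dxntp_mm: float
    hours: tuple  # (minimum, maximum) incubation time in hours


# Klenow Fragment (exo-), SEQ ID NO:2: about 37°C for 1 to 16 h.
KF_EXO = TailingCondition("Klenow Fragment (exo-), SEQ ID NO:2", 37.0, 0.71, 1.19, (1.0, 16.0))
# Engineered 9°N DNA polymerase variant, SEQ ID NO:3: about 60°C for 4 to 16 h.
NINE_N_VARIANT = TailingCondition("9°N variant, SEQ ID NO:3", 60.0, 0.29, 1.19, (4.0, 16.0))

# Subscripted base names are written with an underscore (assumed convention).
CONDITIONS = {"B": KF_EXO, "P": KF_EXO}
CONDITIONS.update({base: NINE_N_VARIANT for base in ("S_n", "S_c", "Z", "X_t", "K_n", "J", "V")})


def tailing_condition(base: str) -> TailingCondition:
    """Return the described tailing condition for a non-standard base."""
    try:
        return CONDITIONS[base]
    except KeyError:
        raise ValueError(f"no tailing condition is described for base {base!r}") from None
```

For example, `tailing_condition("Z")` returns the 9°N-variant entry at 60°C and 0.29 U/µL. Bases not listed in the text raise an error instead of silently falling back to a default.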
- the disclosure provides a method for generating a base pair of two nucleotides of a polynucleotide, wherein at least one nucleotide of the two nucleotides is a non-standard nucleotide.
- the method comprises: generating a second N+1 tailing product comprising a second non-standard nucleotide that is base-pair complementary with the non-standard nucleotide, wherein the second non-standard nucleotide is non-base-paired; and ligating the N+1 tailing product with the second N+1 tailing product to form a dsDNA ligation product that comprises a base pair between the non-standard nucleotide and the second non-standard nucleotide.
- the N+1 tailing product comprises a hairpin.
- the second N+1 tailing product comprises a hairpin.
- the dsDNA ligation product does not comprise a free 5’ end or a free 3’ end.
- the method comprises: contacting the dsDNA ligation product with a type IIS restriction enzyme under a reaction condition conducive for the type IIS restriction enzyme to cleave the dsDNA ligation product to generate a blunt-end DNA template that comprises the base pair between the non-standard nucleotide and the second non-standard nucleotide.
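The ligation step only yields a product when the two single-nucleotide tails are complementary. The check can be sketched as follows. The pairing table is an assumption: A:T and G:C are standard, and the four XNA pairs are read off adjacent letters of the 12-letter alphabet ATGCBSPZXKJV used in the figures. This excerpt does not state the pairs explicitly, and subscripted variants such as S_n or X_t are omitted. The function names are hypothetical.

```python
# ASSUMED pairing table: A:T and G:C are standard; the XNA pairs follow the
# adjacent letters of the 12-letter alphabet ATGCBSPZXKJV.
BASE_PAIRS = [("A", "T"), ("G", "C"), ("B", "S"), ("P", "Z"), ("X", "K"), ("J", "V")]

COMPLEMENT = {}
for left, right in BASE_PAIRS:
    COMPLEMENT[left] = right
    COMPLEMENT[right] = left


def overhangs_compatible(tail_1: str, tail_2: str) -> bool:
    """True if two single-nucleotide, non-base-paired 3' tails can anneal as a pair."""
    return COMPLEMENT.get(tail_1) == tail_2


def ligation_base_pair(tail_1: str, tail_2: str) -> str:
    """Describe the base pair formed when two N+1 tailed hairpins are ligated."""
    if not overhangs_compatible(tail_1, tail_2):
        raise ValueError(f"tails {tail_1!r} and {tail_2!r} are not complementary")
    return f"{tail_1}:{tail_2}"
```

Under this assumed table, a P-tailed hairpin and a Z-tailed hairpin would ligate to give a P:Z pair. A P/S combination would be rejected.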
- … can hydrogen bond to a second base, a nucleobase that can base pair (without hydrogen bonding) to a second base, a nucleobase that relies on steric exclusion for base pairing, a nucleobase that relies on hydrophobic interactions for base pairing, a nucleobase that relies on a transition metal complex for base pairing, a chemical modification, or any combination thereof.
- the non-standard nucleotide is the nucleobase that is configured to hydrogen bond to the second base and the second base is a standard base or a non-standard base.
- the non-standard nucleotide comprises the chemical modification and the chemical modification comprises a fluorophore, a biotin, a terminal alkyne, an azide, a cyclooctyne, a tetrazine, a terminal alkene, a phosphine, a halo-alkane, an aldehyde, a thiol, a transition metal complex, another reactive handle, or any combination thereof.
- the disclosure provides a dsDNA ligation product. In an aspect, the disclosure provides a further dsDNA ligation product.
- the disclosure provides a defined non-standard nucleotide base pair library comprising a library polynucleotide sequence of the dsDNA ligation product or the blunt-end dsDNA template, wherein the library polynucleotide sequence comprises the base pair between the non-standard nucleotide and the second non-standard nucleotide.
- the disclosure provides a defined non-standard nucleotide base pair library comprising a library polynucleotide sequence of the further dsDNA ligation product or the further blunt-end dsDNA template, wherein the library polynucleotide sequence comprises the plurality of base pairs between the plurality of non-standard nucleotides and the plurality of second non-standard nucleotides.
- the library polynucleotide sequence further comprises: a context barcode associated with a sequence context adjacent to a base pair of a non-
- the disclosure provides a method for generating a machine learning (ML) model that correlates one or more observed current reads with an unknown non-standard nucleotide for assignment of an identity to the unknown non-standard nucleotide, the method comprising: sequencing, with a nanopore sequencing method, the defined non-standard nucleotide base pair library to produce the one or more observed current reads; and training, with a ML algorithm, the ML model to associate the one or more observed current reads with a known identity of a defined non-standard nucleotide of the defined non-standard nucleotide base pair library, wherein the ML model is configured to assign the identity to the unknown non-standard nucleotide based on the known identity of the defined non-standard nucleotide.
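The training step depends on having reads whose underlying sequence is known, so each observed current can be labeled. The sketch below does not use the convolutional LSTM RNN described next. Instead it fits a simpler stand-in: a Gaussian (mean, standard deviation) per labeled kmer, similar in spirit to the kmer models shown in the figures. The input format of `(kmer, current_mean)` pairs and the `min_sd` floor are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_kmer_model(events, min_sd=1e-3):
    """Fit a Gaussian (mean, standard deviation) per kmer.

    events: iterable of (kmer, current_mean) pairs, where each kmer label is
        known because it comes from a defined, barcoded library.
    min_sd: floor so that kmers seen once (or with identical values) still
        get a usable, non-zero sigma.
    """
    grouped = defaultdict(list)
    for kmer, current in events:
        grouped[kmer].append(float(current))
    model = {}
    for kmer, values in grouped.items():
        sd = pstdev(values) if len(values) > 1 else 0.0
        model[kmer] = (mean(values), max(sd, min_sd))
    return model
```

A neural model would learn from the raw signal rather than from pre-segmented event means. The table above only illustrates how known library identities turn current observations into a labeled model.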
- ML machine learning
- the ML model comprises a convolutional long short-term memory recurrent neural network (LSTM RNN).
- LSTM RNN convolutional long short-term memory recurrent neural network
- the disclosure provides a non-transitory computer-readable storage medium having stored thereon at least part of a ML model.
- the disclosure provides a computational device or computational system comprising the non-transitory computer-readable storage medium.
- the disclosure provides a nanopore sequencing kit, device, or system comprising the non-transitory computer-readable storage medium.
- the disclosure provides a method for basecalling a non-standard nucleotide expanded alphabet, the method comprising: sequencing, with a nanopore sequencing method, a subject polynucleotide sequence that comprises a non-standard nucleotide to generate a subject current read; computing, with the computational device or computational system, an association between the subject current read and the known identity of the defined non-standard nucleotide of the defined non-standard nucleotide base pair library; and computing, based on the association, a structure of the non-standard nucleotide.
- the disclosure provides circuitry configured to perform all or part of a method.
- the disclosure provides a nanopore sequencing kit, device, or system comprising the circuitry.
- FIGs 1A and 1B show nucleobases for an expanded 12-letter supernumerary DNA alphabet.
- FIG. 1A Structures of standard purine and pyrimidine nucleobases.
- FIG. 1B Structures of mutually orthogonal synthetic xenonucleobases that can form the basis of a 12-letter supernumerary DNA. The single-letter abbreviation of each base is indicated above its nucleobase structure.
- FIGs 2A-2H show that XNA tailing and XNA ligation enable a facile means for enzymatic XNA incorporation.
- FIG. 2A Polymerase XNA tailing activity screened by detection of released 2′-deoxy-xenonucleoside monophosphates (dxNMPs). Hairpin HP-3′PT was used as the tailing substrate (Table 2); ‘*’ indicates positions of phosphorothioate bonds.
- Extracted ion chromatograms for each dNMP and dxNMP in assays indicate dNTP and dxNTP tailing by (FIG. 2B) Klenow Fragment (exo-) and (FIG. 2C) Therminator polymerase.
- Source data are provided as a Source Data file.
- FIG. 2D Assay measuring extent of XNA tailing by T4 ligation. Tailed hairpins are not substrates for T4 ligation.
- FIG. 2E XNA tailing of hairpin using optimized conditions showing XNA tailed hairpin is the major product.
- (–) is blunt-ended hairpin negative control.
- G + is a hairpin synthesized to contain a single nucleotide 3′-G overhang as the positive control (gel representative of 3 experimental replicates; yield estimates are listed in Table 9).
- FIG. 2F Assay to ligate two DNA hairpins with complementary single nucleotide XNA overhangs. Ligated hairpins are protected from exonucleases as they lack free 5′ and 3′- ends.
- FIG. 2G XNA ligation of hairpins tailed with complementary purine (pur) and pyrimidine (pyr) XNA bases using optimized reaction conditions. (+) is a positive control that used blunt DNA substrate.
- (*) is a negative control that used blunt DNA substrate without DNA ligase.
- FIGs 3A-3D show generation of 12-letter (ATGCBSPZXKJV) nanopore sequencing kmer models.
- FIG. 3A Overview of construction of NNNNNNN libraries, starting from two synthetic oligo pools (NNN-Pool) that contain blunt, NNN-3′ ends. The 24-nt triplet-barcodes in these hairpins are linked to the 3′-NNN sequence, allowing for proper identification of bases adjacent to XNA inserts. Complementary XNA base pairs are added to the library hairpins using XNA tailing and XNA ligation.
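Linking each 24-nt barcode to its 3′-NNN sequence means that reading the barcode tells you which bases flank the XNA insert. Decoding can be sketched as a nearest-barcode lookup by Hamming distance that refuses ties and distant matches. The barcode scheme here is a hypothetical toy (each of the 64 NNN contexts repeated eight times to make 24 nt) and is not the library's actual design. The `max_mismatches=3` threshold is also an assumption.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def decode_context(read_barcode, barcode_to_context, max_mismatches=3):
    """Return the 3'-NNN context linked to the closest barcode.

    Returns None when the best match exceeds max_mismatches or is tied, because
    the adjacent bases then cannot be identified unambiguously.
    """
    scored = sorted((hamming(read_barcode, bc), ctx) for bc, ctx in barcode_to_context.items())
    best_distance, best_context = scored[0]
    if best_distance > max_mismatches:
        return None
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None
    return best_context


# HYPOTHETICAL toy barcode scheme: each of the 64 NNN contexts repeated 8 times (24 nt).
ALL_NNN = ["".join(p) for p in product("ATGC", repeat=3)]
TOY_BARCODES = {context * 8: context for context in ALL_NNN}
```

With the toy table, a read barcode carrying one sequencing error (`"TCG" + "ACG" * 7`) still decodes to `ACG`. A read that is equally far from several barcodes returns `None`.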
- FIGs 4A-4C show construction and end-to-end nanopore sequencing of 6-letter DNA alphabets.
- FIG. 4A Proof of concept deployment of an XNA-refinement pipeline using 4-nt kmer models measured in this disclosure.
- Pipeline is used to transform raw commercial nanopore reads into likely XNA basecalls for the sense (+) and antisense (-) strands.
- FIG. 4C Response
- FIG. 5 shows enzyme-assisted synthesis and third-generation sequencing of supernumerary 12-letter DNA.
- the kmer probability density function (observed signal mean $\langle I_z \rangle$, model mean $\mu_{k_i}$, model standard deviation $\sigma$) is used to calculate log-likelihoods, while maximum likelihood with outlier-robust log-likelihood ratios is used to determine the base call.
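A maximum-likelihood call of this kind can be sketched as follows. Each candidate base predicts a Gaussian kmer current (mean μ, standard deviation σ) for every event that covers the position. The observed event means ⟨I_z⟩ are scored under each candidate, and the best candidate wins. The exact outlier-robust rule is not given here. This sketch assumes one common choice: a floor on each per-event log-likelihood, so that a single aberrant event cannot dominate. The `floor=-10.0` value and the function names are hypothetical.

```python
import math


def gaussian_log_pdf(x, mu, sigma):
    """Log of the normal density N(x; mu, sigma)."""
    return -0.5 * math.log(2.0 * math.pi * sigma**2) - (x - mu) ** 2 / (2.0 * sigma**2)


def call_base(observed_means, candidate_models, floor=-10.0):
    """Maximum-likelihood call among candidate bases at one position.

    observed_means: event means <I_z> for the events covering the position.
    candidate_models: {base: [(mu, sigma), ...]}, one (mu, sigma) per event,
        giving the expected kmer current if that base were present.
    floor: lower bound on each per-event log-likelihood (ASSUMED robustness
        rule), so one outlier event cannot dominate the comparison.
    Returns (best_base, log-likelihood ratio of best vs runner-up).
    """
    totals = {}
    for base, params in candidate_models.items():
        if len(params) != len(observed_means):
            raise ValueError(f"model for {base!r} does not match the number of events")
        totals[base] = sum(
            max(gaussian_log_pdf(x, mu, sd), floor)
            for x, (mu, sd) in zip(observed_means, params)
        )
    ranked = sorted(totals, key=totals.get, reverse=True)
    best = ranked[0]
    if len(ranked) == 1:
        return best, math.inf
    return best, totals[best] - totals[ranked[1]]
```

A larger ratio means a more confident call. Without the floor, one badly segmented event could swing the ratio arbitrarily. With the floor, each event can shift it by a bounded amount.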
- FIG. 6A shows an overview of an example non-templated N+1 tailing reaction. Tailing of blunt-end hairpin DNA substrates (N) can lead to complete formation of XNA-tailed hairpin products (N+1 major).
- PPi release from tailing leads to a slow background rate of pyrophosphorolysis, which acts in the reverse direction of nucleotide tailing (3′-exo). Pyrophosphorolysis is mitigated by adding yeast inorganic pyrophosphatase (YiPP) to tailing reactions and by balancing reaction duration against reaction rates.
- the over-tailing of products to generate (N+2) hairpins is also considered when optimizing tailing reactions.
- N+1 tailing is generally thought to occur at a first-order reaction rate, 2 orders of magnitude slower than templated polymerization.
- N+2 addition rates are polymerase-specific and are thought to occur at first-order rates 2 orders of magnitude slower than N+1 product formation. End abbreviations: 3′ indicates 3′-OH, 5′- indicates 5′-PO4.
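The statement that N+2 addition is first-order and about 100-fold slower than N+1 formation explains why an N+1 product can dominate over a wide time window. The consecutive first-order scheme N → N+1 → N+2 has a standard closed-form solution, sketched below. The rate constants are hypothetical and given in arbitrary time units. The sketch assumes pseudo-first-order kinetics with dNTP in excess and ignores the reverse pyrophosphorolysis reaction.

```python
import math


def tailing_fractions(t, k1, k2):
    """Fractions of hairpin as N, N+1 and N+2 at time t for N -> N+1 -> N+2.

    k1, k2: pseudo-first-order rate constants (same time unit as t), k1 != k2.
    Pyrophosphorolysis (the reverse, 3'-exo direction) is ignored.
    """
    if k1 <= 0 or k2 < 0:
        raise ValueError("rate constants must be positive (k2 may be zero)")
    if math.isclose(k1, k2):
        raise ValueError("this closed form requires k1 != k2")
    n = math.exp(-k1 * t)
    n_plus_1 = k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    n_plus_2 = 1.0 - n - n_plus_1
    return n, n_plus_1, n_plus_2


def time_of_max_n_plus_1(k1, k2):
    """Time at which the N+1 fraction peaks: ln(k1/k2) / (k1 - k2)."""
    if k1 <= 0 or k2 <= 0 or math.isclose(k1, k2):
        raise ValueError("need distinct positive rate constants")
    return math.log(k1 / k2) / (k1 - k2)
```

With a hypothetical k1 = 1 per hour and k2 = k1/100, the N+1 fraction peaks near 4.65 h at roughly 95% of the hairpin. The peak is broad, which is consistent with the text's emphasis on balancing reaction duration rather than timing it exactly.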
- N A, T, G, C
- T4 ligation assay: A 5′-phosphorylated hairpin oligo with a 3′-blunt end was purchased from IDT (5′Phos-15HP; Table 2). Oligos are first refolded by incubating 20 µM of oligo in 100 mM NaCl, 10 mM Tris-HCl buffer (pH 8.2) at 90°C for 3 minutes, then cooling at 0.1°C/s until reaching 20°C. All subsequent tailing reactions used 16 µM 5′Phos-15HP (blunt-end with 15 nt in the hairpin region) and 1.19 mM dNTP (the dNTP used in each lane is specified in the figure panel), and were tailed for 1 h at the specified temperature using the specified polymerases.
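The refolding protocol above implies a specific duration: a 70°C drop at 0.1°C/s, plus the 3-minute hold. The arithmetic can be checked with a small helper. The function names are hypothetical, and the sketch assumes an ideal linear ramp with no instrument overhead.

```python
def ramp_duration_s(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Seconds needed to move between two temperatures at a constant ramp rate."""
    if rate_c_per_s <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(start_c - end_c) / rate_c_per_s


def refold_duration_min(hold_min=3.0, start_c=90.0, end_c=20.0, rate_c_per_s=0.1):
    """Total refolding time in minutes: the denaturing hold plus the cooling ramp."""
    return hold_min + ramp_duration_s(start_c, end_c, rate_c_per_s) / 60.0
```

The cooling ramp alone takes 700 s (about 11.7 min), so each refold takes roughly 14.7 min in total under these assumptions.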
- T4 ligation reactions were performed with 11.2 µM of oligo for 1 h using T4 DNA Ligase Reaction Buffer, which contains 1 mM ATP.
- FIG. 6BA Tailing screen for Taq polymerase (0.25 U/µL, 72 °C) and Klenow Fragment (exo-; KF) polymerase (0.68 U/µL, 37 °C) followed by high-concentration T4 ligation.
- FIG. 6BB Tailing screen for Deep Vent (exo-; DV) polymerase (0.1 U/µL, 72 °C) and Therminator (Therm) polymerase (0.1 U/µL, 72 °C) followed by high-concentration T4 ligation.
- FIGs 6CA-6CM show UPLC/QTOF validation of tailing activity for all dNTPs and dxNTPs by Klenow Fragment (exo-).
- Full set of controls for the data shown in FIG. 2B.
- Extracted ion chromatograms (EIC) show the relative abundance of dNMP or dxNMP released when the corresponding dNTPs/dxNTPs are used as a substrate for polymerase (KF exo-) tailing. Chromatogram scales are normalized for comparison of runs within each panel. The dNTP or dxNTP used in each reaction is shown in the panel legend.
- FIGs 6DA-6DM show UPLC/QTOF validation of tailing activity for all dNTPs and dxNTPs by Therminator.
- Full set of controls for the data shown in FIG. 2C.
- Extracted ion chromatograms (EIC) show the relative abundance of dNMP or dxNMP released when the corresponding dNTPs/dxNTPs are used as a substrate for polymerase (Therminator; Therm) tailing. Chromatogram scales are normalized for comparison of runs within each panel. The dNTP or dxNTP used in each reaction is shown in the panel legend.
- FIGs 6EA-6EE show screening and optimization of XNA tailing conditions. All tailing reactions used 11.9 µM 5′Phos-11HP and 1.19 mM of the specified dNTP/dxNTP, and were tailed at the specified temperature for the specified times using either Klenow Fragment (KF exo-; 0.71 U/µL) or Therminator (Therm; 0.29 U/µL). Tailing completeness was measured via T4 ligation assays.
- FIG. 6EA XNA tailing screen using KF exo- and Therm for 8 h.
- FIG. 6EB XNA tailing screen using KF and Therm for 8 h.
- FIG. 6EC Additional Sc tailing screen using Therm for 8 or 16 h.
- FIG. 6F shows that addition of yeast inorganic pyrophosphatase (YiPP) leads to slight improvements in XNA tailing reaction yield.
- 5′-phosphorylated hairpin oligos with either a 3′-blunt end or 3′-single nucleotide (-G or -C) overhangs were purchased from IDT (5′-Phos-11HP; Table 2). Separately, 11.4 µM of 3′-blunt end oligos were tailed with 1.14 mM of dCTP or dGTP, Klenow Fragment (exo-; KF; 0.68 U/µL), and either 0.009 U/µL of YiPP or no YiPP at 37 °C for 4 h.
- Ligation reactions were performed using 2.6 µM of two oligos with complementary overhang bases, either enzymatically tailed (G, C) or synthesized overhangs (G*, C*). Ligation reactions were incubated for 15 min at 16 °C using T7 DNA ligase (272 U/µL) and carried out in 1X NEB StickTogether™ buffer, which contains 7.5% (w/v) PEG 6000. Blunt-end hairpins (-/-) serve as a negative ligation control, because the short reaction time prevents blunt-end ligation.
- Unligated materials were digested using exonuclease I (2.7 U/µL), exonuclease III (13.3 U/µL) and exonuclease VII (1.33 U/µL) for 1 h at 37 °C. Exonuclease reactions were heat inactivated by incubation at 95 °C for 10 min and then at 80 °C for 10 min.
- Exo VII was used here; it has a higher heat-inactivation temperature than the Exo VIII (truncated) used in other aspects of this disclosure. It was also found that Exo VII resulted in incomplete digestion (lower band) and required different buffer conditions. In subsequent screening work, Exo VIII (truncated) was used instead in the exonuclease treatment steps. A positive control with G* and C* shows ligation of hairpins with synthetic G and C overhangs. Gel representative of a single experimental replicate.
- FIG. 6G shows that enzymatic tailing does not lead to measurable differences in ligation when compared to ligation using fully synthetic hairpins with N+1 tails.
- Ligating an over-tailed product (i.e., one with more than one nucleotide added to the blunt 3′-end) to an N+1 tailed hairpin would result in dsDNA that contains a gap of one or more nucleotides.
- the gap region exposes a 3′ and a 5′ end that would make this product susceptible to exonuclease degradation. Therefore, one way to test whether over-tailing was a problem was to compare how much ligated product was observed (as measured by agarose gel band intensity) when hairpins were tailed enzymatically vs. made synthetically.
- 5′-phosphorylated hairpin oligos with either a 3′-blunt end or 3′-single nucleotide (-G or -C) overhangs were purchased from IDT (Table 2). Oligos were first folded using previously described methods. Blunt-end oligo 5′Phos-11HP was then tailed with dCTP using the conditions listed in Table 8. Subsequent ligation reactions were performed using T7 or T4 DNA ligase. Either the dCTP-tailed oligo (Tailed) or 5′Phos-HP-3′C (Synth) was ligated to 5′Phos-HP-3′G.
- T7 ligation reactions: 2.7 µM of each oligo were incubated with 272 U/µL of T7 DNA ligase in StickTogether™ DNA ligase buffer at 16 °C for 15 min, after which the ligase was heat inactivated at 65 °C for 10 min.
- T4 ligation reactions: 4.2 µM of each oligo were incubated with 80 U/µL of T4 DNA ligase in T4 DNA ligase buffer at 16 °C for 2 h, after which the ligase was heat inactivated at 65 °C for 10 min.
- FIGs 6HA-6HQ show high-resolution LC/MS of an oligo, with N+1 tailing as the major product.
- FIG. 6HA Hairpin oligo, 5′Phos-ScaI-HP (Table 2) was tailed
- FIG. 6I shows an overview of T3 DNA ligase, T4 DNA ligase, and T7 DNA ligase products. (top) Major products formed from T3 ligation and T4 ligation assays between hairpins generated in this disclosure. (bottom) Major and minor products formed for T7 ligation assays in this disclosure.
- T7 ligase preferentially ligates hairpins with a cohesive nucleotide overhang and has minimal blunt-end ligation activity.
- T7 ligase has been observed to perform blunt-end ligation, though to a lesser extent than T3 ligase and T4 ligase.
- Full hairpin sequences used in this disclosure can be found in Table 2. Nucleic acid end abbreviations: 3′ indicates 3′-OH, 5P′- indicates 5′-PO4.
- FIG. 6J shows an overview of XNA ligation products from XNA tailed hairpins. XNA ligation reactions were optimized making the following considerations of possible side products.
- FIGs 6KA-6KE show screening and optimization of ligation conditions across all XNA bases. All tailing reactions used conditions listed in Table 8 unless otherwise specified.
- Ligation reactions were performed using 4.7 µM of one oligo or 2.4 µM of two oligos with complementary tailed bases. Ligation reactions were incubated for 16 h at 16 °C using the specified ligase and carried out in 1X NEB StickTogether™ buffer, which contains 7.5% (w/v) PEG 6000. Improperly ligated
- FIGs 6LA-6LC show results from screening T3 ligase, T4 ligase, and T7 ligase for J:V, Xt:Kn, and B:Sc XNA ligation.
- Two blunt end hairpins that create a restriction enzyme site upon blunt ligation were purchased from IDT (5′Phos-NdeI-HP-1 and 5′Phos-NdeI-HP-2; Table 2). Blunt-end ligated hairpins create an NdeI restriction site, while successfully tailed and ligated hairpins do not.
- FIG. 6LA T3 ligase assay (272 U/µL);
- FIG. 6LB T4 ligase assay (36 U/µL);
- FIG. 6LC T7 ligase assay (272 U/µL) for reactions containing single hairpins or a mixture of two hairpins (as indicated).
- FIGs 6MA-6MC show full gels of XNA tailing and XNA ligation using optimized conditions. All assays were done with a 5′-phosphorylated hairpin oligo with a 3′-blunt end, purchased from IDT (5′-Phos-11HP; Table 2). Each DNA/XNA base was tailed using conditions from Table 8. (FIG. 6MA) Full gel for optimized XNA tailing conditions from FIG. 2E. Tailing completeness was measured via T4 ligation.
- FIGs 6NA-6NF show a proof of concept for XNA tailing and XNA ligation cycling to insert two consecutive P:Z base pairs.
- FIG. 6NA Agarose gel showing steps in consecutive XNA insertion.
- FIG. 6NB A hairpin containing an MlyI restriction site adjacent to the site of XNA ligation is used (donor hairpin, HP D ).
- MlyI is a type IIS restriction enzyme (5′-GAGTCNNNNN^-3′) that leaves a blunt end after cutting.
- a donor hairpin with an MlyI site and an acceptor hairpin were tailed with P and Z, respectively (generating HPD-P and HPA-Z), ligated and treated with exonucleases following the optimized conditions described in this disclosure, and then purified (lane 1).
- the purified construct contains a single P:Z base pair insertion.
- site was prepared by XNA tailing (HPP-P).
- XNA ligation followed by MlyI and exonuclease treatment does not result in formation of a ligation product (lane 3).
- FIG. 6ND In a second round, the reaction product mixture from lane 2 was tailed with Z to produce a Z-tailed donor hairpin (HPD-Z) and a Z-tailed PZ-acceptor hairpin (HPA-ZZ).
- XNA ligation followed by MlyI and exonuclease treatment does not result in formation of a ligation product (lane 4).
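- To make the MlyI step concrete, the sketch below locates MlyI cut positions in a top-strand sequence using the recognition sequence 5′-GAGTCNNNNN^-3′ given above. A site on the bottom strand reads GACTC on the top strand and cuts 5 nt upstream of it. The function name and the donor sequence in the usage line are hypothetical. The example only shows that a cut placed 5 nt past the site leaves an adjacent inserted base at the new blunt end; it does not model hairpin structure, ligation, or exonuclease treatment.

```python
from typing import List

MLYI_SITE = "GAGTC"     # MlyI recognition sequence, 5'-GAGTCNNNNN^-3'
MLYI_SITE_RC = "GACTC"  # the same site read from the opposite strand
MLYI_OFFSET = 5         # nt between the site and the blunt cut on both strands


def mlyi_cut_positions(seq: str) -> List[int]:
    """Return 0-based positions p (cut between seq[p-1] and seq[p]) where MlyI
    would cut a linear duplex whose top strand is `seq`. Only cuts that fall
    strictly inside the sequence are reported. Non-ACGT letters such as P or Z
    are allowed outside the recognition site."""
    s = seq.upper()
    k = len(MLYI_SITE)
    cuts = set()
    for i in range(len(s) - k + 1):
        window = s[i:i + k]
        if window == MLYI_SITE:        # site on top strand: cut 3' of the site
            pos = i + k + MLYI_OFFSET
        elif window == MLYI_SITE_RC:   # site on bottom strand: cut 5' of it on top
            pos = i - MLYI_OFFSET
        else:
            continue
        if 0 < pos < len(s):
            cuts.add(pos)
    return sorted(cuts)


donor = "TTGAGTCAAAAAPCC"  # hypothetical sequence with an inserted P after the site
print(mlyi_cut_positions(donor))  # prints [12]; P starts the right-hand fragment
```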
- FIGs 6OA-6OB show examples of basecalling XNA sequences with guppy.
- FIG. 6OA ONT guppy was trained to basecall sequences composed of standard nucleic acids (A, T, G, or C).
- FIG. 6PB Complete NNNNNNN library products for all XNA base pairs and blunt end ligation library sequenced in this disclosure.
- FIG. 6PC Self-ligation of library hairpins to check for incomplete tailing and pyrophosphorolysis products. Library hairpins were tailed with the listed XNA using the conditions listed in Table 8, and 4.7 µM of each hairpin (except B* and Sc at 2.6 µM) was ligated to itself using the conditions listed in Table 10.
- FIGs 6QA-6QI show examples of variance minimization for segmentation steps of signal-to-sequence mapping.
- Signal-to-sequence mapping was performed using Tombo. Tombo uses an informed kmer model to improve the accuracy of signal-to-sequence mapping. Without a prior model, segmentation requires assigning each XNA to a standard base, and improper segmentation leads to inaccurate model parameter estimates. To minimize bias in segmentation, each XNA can be assigned to the standard base that minimizes the total variance in observed kmer signal levels.
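- One way to express the variance-minimization criterion above is sketched below. For each standard base substituted for the XNA during segmentation, the observed signal levels are collected per kmer and the within-kmer variances are summed; the base with the smallest total is kept. The data layout, function names, and signal values are hypothetical. The sketch does not cover how the levels are extracted (for example, by Tombo resquiggling).

```python
from statistics import pvariance
from typing import Dict, List

# kmer -> observed mean current levels for that kmer across reads
LevelsByKmer = Dict[str, List[float]]


def total_kmer_variance(levels_by_kmer: LevelsByKmer) -> float:
    """Sum of within-kmer variances; kmers observed only once contribute nothing."""
    return sum(pvariance(v) for v in levels_by_kmer.values() if len(v) > 1)


def best_proxy_base(segmentations: Dict[str, LevelsByKmer]) -> str:
    """Pick the standard base whose substitution for the XNA during segmentation
    gives the smallest total variance in observed kmer signal levels."""
    return min(segmentations, key=lambda base: total_kmer_variance(segmentations[base]))


# Hypothetical levels from segmenting the same reads with P treated as G or as A
runs = {
    "G": {"AGT": [80.1, 81.0, 79.4], "GTC": [95.2, 94.8]},
    "A": {"AAT": [60.3, 88.7, 74.9], "ATC": [70.0, 99.5]},
}
print(best_proxy_base(runs))  # prints G
```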
- FIGs 6RA-6RE show example traces of signal deviation from the standard model.
- FIG. 6S shows an example xenomorph preprocessing pipeline.
- Xenomorph preprocess integrates basecalling, raw multi-to-single fast5 conversion, reference sequence fasta conversion, segmentation, and level assignment into a single command.
- Level-extracted output files from xenomorph preprocess are the inputs to basecalling through alternative hypothesis testing using xenomorph morph. Separating the preprocessing steps from alternative hypothesis testing allows users to experiment with basecalling using various model parameter settings or alternative models without having to rerun the slower signal extraction steps.
- xenomorph preprocess uses guppy for initial basecalling, minimap2 for initial basecall-reference alignment, and ONT Tombo for signal normalization and signal-to-sequence alignment.
- FIGs 6TA-6TC show PCR amplification and sequencing of a DNA template with a P:Z base pair.
- FIG. 6TA Synthetic template DNA containing a P:Z base pair was amplified with Taq polymerase in a pH 8.0 buffer with varying concentrations of dxNTP and dNTP (Tables 22, 23). PCR products were sequenced on a MinION nanopore flow cell, then basecalled for PZ detection. Read fractions that basecalled to (FIG. 6TB) P and (FIG. 6TC) Z for each condition are shown. PCR conditions differ by the concentrations of dxNTP and dNTPs used. The remaining fraction for each base corresponds to G and C basecalls, respectively (the most likely standard mutations for P and Z).
- FIGs 6UA-6UB show construction of 12-letter DNA for nanopore sequencing. All assays were performed using 12-letter DNA construction oligos as
- FIG. 6V shows an example workflow from sequencing to heptamer classification.
- FIGs 6WA-6WB and 6XA-6XB show an example method for generating a defined non-standard nucleotide base pair library that uses a Type IIS restriction enzyme and a context barcode (“Barcode”) associated with a sequence context and a pool barcode (“Pool-Barcode”) associated with a non-standard nucleotide, as well as steps for sequencing and machine learning (ML) model training. Randomer region indicated.
- FIGs 6YA-6YF show example process flows for training ML models for processing read data obtained by nanopore sequencing of polynucleotide sequences containing non-standard nucleotides (FIGs 6YA-6YD), as well as base calling using trained ML models for quantification of XNA retention in PCR reactions (FIG.6YE) and quantification of XNA transcription errors from in vivo transcription (FIG.6YF).
- the present disclosure provides an array of breakthrough approaches for synthesizing polynucleotide (e.g., DNA) sequences containing at least one non-standard nucleotide.
- the non-standard nucleotide can include a hydrogen bonding pattern that is consistent or compatible with a hydrogen bonding pattern of a standard or existing nucleotide (e.g., C, G, T, A).
- the present disclosure also provides breakthrough approaches for sequencing polynucleotide sequences containing one or more non-standard nucleotides, optionally using next-generation sequencing (NGS) platforms, such as nanopore sequencing.
- the disclosure also enables non-standard nucleotides to be integrated into a wide range of technologies, such as biological computing and information storage systems, therapeutics, aptamers, biosensors, and the like.
- Methods of synthesizing polynucleotides containing one or more non-standard nucleotides make use of an N+1 tailing reaction of a suitable DNA polymerase. Accordingly, in an aspect, the disclosure provides a method for generating an N+1 tailing product comprising a non-standard nucleotide that is covalently bound with a 3’ end of a precursor double-stranded DNA (dsDNA) template, such that the non-standard nucleotide is non-base-paired.
- the method comprises combining the precursor dsDNA template with a DNA polymerase and a non-standard deoxyribonucleotide triphosphate (dNTP) under a reaction condition conducive to a blunt-end N+1 addition of the non-standard nucleotide to the 3’ end of the precursor dsDNA template by the DNA polymerase.
- the non-standard nucleotide is a xenonucleotide (XNA) and the non-standard dNTP is a deoxy-xeno-ribonucleotide triphosphate (dxNTP).
- the DNA polymerase comprises a polypeptide sequence of a small Klenow Fragment (KF exo-) of DNA Polymerase I, as further described herein.
- the polypeptide sequence comprises a sequence of SEQ ID NO:2.
- a variety of XNAs can be incorporated into DNA using methods of the present disclosure; however, it was found that improving or optimizing reaction conditions allows the N+1 tailing reaction to proceed at an acceptable rate.
- the non-standard nucleotide being added is B or P, and the reaction condition proceeds at about 37°C for between about 1-16 hours and comprises about 0.71
- the non-standard nucleotide is selected from Sn, Sc, Z, Xt, Kn, J, and V, and the reaction condition proceeds at about 60°C for between about 4-16 hours and comprises about 0.29 U/µL of the DNA polymerase and about 1.19 mM of the non-standard dNTP. While these or similar conditions were found to be effective for the disclosed reaction, other conditions, including less-than-optimal or non-improved conditions, can be implemented in embodiments without departing from the scope and spirit of the disclosure.
- although the KF exo- of DNA polymerase I can be used in embodiments, it is not the only DNA polymerase that was surprisingly and unexpectedly found to be able to add non-standard nucleotides to a dsDNA template in an N+1 tailing reaction.
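- The two condition sets above can be captured as a lookup table for planning reactions. In this sketch the field and function names are hypothetical. The temperatures, durations, and concentrations are those stated in the text. Values the text does not give, including the truncated enzyme amount for B and P, are left as None rather than guessed.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TailingCondition:
    temperature_c: float
    duration_h: Tuple[float, float]        # (min, max) incubation time
    polymerase_u_per_ul: Optional[float]   # None where the text gives no value
    dxntp_mm: Optional[float]              # None where the text gives no value


# B and P: about 37 C for about 1-16 h (enzyme amount truncated in the text)
_AT_37C = TailingCondition(37.0, (1.0, 16.0), None, None)
# Sn, Sc, Z, Xt, Kn, J, V: about 60 C, 4-16 h, 0.29 U/uL polymerase, 1.19 mM dxNTP
_AT_60C = TailingCondition(60.0, (4.0, 16.0), 0.29, 1.19)

TAILING_CONDITIONS: Dict[str, TailingCondition] = {
    "B": _AT_37C,
    "P": _AT_37C,
    **{x: _AT_60C for x in ("Sn", "Sc", "Z", "Xt", "Kn", "J", "V")},
}


def tailing_condition(xna: str) -> TailingCondition:
    """Look up starting N+1 tailing conditions for one non-standard nucleotide."""
    try:
        return TAILING_CONDITIONS[xna]
    except KeyError:
        raise ValueError(f"no tailing condition listed for {xna!r}") from None


print(tailing_condition("Z"))
```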
- the DNA polymerase comprises a polypeptide sequence of an engineered polymerase from a hyperthermophilic marine archaeon.
- the engineered polymerase is a variant of 9°N DNA polymerase.
- the polypeptide sequence comprises a sequence of SEQ ID NO:3 (e.g., Therminator TM ).
- the disclosure provides a method for generating a base pair of two nucleotides of a polynucleotide, wherein at least one nucleotide of the two nucleotides is a non-standard nucleotide.
- the base pair is comprised of one non-standard nucleotide base paired with one standard nucleotide.
- the base pair is comprised of a first non-standard nucleotide base paired with a second non-standard nucleotide.
- Creation of a base pair comprised of two non-standard nucleotides can be implemented with a method that comprises generating a second N+1 tailing product comprising a second non-standard nucleotide that is base-pair complementary with the non-standard nucleotide, such that the second non-standard nucleotide is non-base-paired.
- the second N+1 tailing product can be generated based on the same or a similar reaction as the N+1 tailing product (of the first N+1 tailing reaction).
- the method can further include ligating the N+1 tailing product with the second N+1 tailing product, which forms a dsDNA ligation product that comprises a base pair between the non-standard nucleotide and the second non-standard nucleotide, as further described herein.
- the N+1 tailing product can be linear or, in embodiments, can comprise a hairpin.
- the second N+1 tailing product can be linear or, in embodiments, can comprise a hairpin.
- the dsDNA ligation product does not comprise a free 5’ end or a free 3’ end and is fully resistant to exonucleases.
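- The logic of N+1 tailing followed by ligation of complementary overhangs can be modeled abstractly as shown below. This is a toy model, not a chemistry simulation. The pair table lists only the non-standard pairs named in this section (P:Z, B:Sc, J:V, Xt:Kn) plus the Watson-Crick pairs, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Non-standard pairs named in this section, plus Watson-Crick pairs
PAIRS = {("P", "Z"), ("B", "Sc"), ("J", "V"), ("Xt", "Kn"), ("A", "T"), ("G", "C")}


def complementary(x: str, y: str) -> bool:
    return (x, y) in PAIRS or (y, x) in PAIRS


@dataclass(frozen=True)
class Hairpin:
    name: str
    overhang: Optional[str] = None  # single 3' nucleotide from N+1 tailing; None = blunt


def tail(hp: Hairpin, base: str) -> Hairpin:
    """N+1 tailing: add one non-templated nucleotide to a blunt 3' end."""
    if hp.overhang is not None:
        raise ValueError(f"{hp.name} is already tailed (N+2 products are not modeled)")
    return Hairpin(hp.name, base)


def ligate(a: Hairpin, b: Hairpin) -> Tuple[str, str]:
    """Join two N+1 hairpins whose overhangs pair and return the new base pair.
    Joining two hairpins leaves no free 5' or 3' end in the product."""
    if a.overhang is None or b.overhang is None:
        raise ValueError("both hairpins need a single-nucleotide 3' overhang")
    if not complementary(a.overhang, b.overhang):
        raise ValueError(f"{a.overhang} and {b.overhang} do not pair")
    return a.overhang, b.overhang


donor = tail(Hairpin("donor"), "P")
acceptor = tail(Hairpin("acceptor"), "Z")
print(ligate(donor, acceptor))  # prints ('P', 'Z')
```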
- Additional non-standard nucleotides can be added iteratively and/or sequentially, such that two or more non-standard nucleotides can be added or inserted to a polynucleotide. This can be achieved by cleaving the dsDNA ligation product and exposing the non-standard base pair. The resultant blunt-end DNA template then becomes a template for a subsequent N+1 tailing reaction.
- the method comprises contacting the dsDNA ligation product with a type IIS restriction enzyme under a reaction condition that is conducive for the type IIS restriction enzyme to cleave the dsDNA ligation product, which generates a blunt-end DNA template.
- the resultant blunt-end DNA template comprises the base pair between the non-standard nucleotide and the second non-standard nucleotide.
- the method can be performed a plurality of times for creation of a plurality of base pairs between a plurality of non-standard nucleotides and a plurality of second non-standard nucleotides as sequence elements of the further dsDNA ligation product.
- the method comprises contacting the further dsDNA ligation product with a type IIS restriction enzyme under a reaction condition conducive for the type IIS restriction enzyme to cleave the further dsDNA ligation product to generate a further blunt-end DNA template that comprises the plurality of base pairs between the plurality of non-standard nucleotides and the plurality of second non-standard nucleotides.
- the method is modular and can be repeated any number of times for addition of any number of non-standard nucleotides, either with non-standard nucleotides added in a continuous manner or in a manner such that the non-standard nucleotides are interspersed with, or interrupted by, one or more standard nucleotides, for example.
- a quantity of non-standard nucleotides added to a polynucleotide with a method of the disclosure is selected from the group including, but not necessarily limited to, the set of integers defined by the range of 1 to 10,000,000,000, inclusive.
- a quantity of standard nucleotides added to a polynucleotide with a method of the present disclosure is selected from the group including, but not necessarily limited to, the set of integers defined by the range of 1 to 10,000,000,000, inclusive.
- the non-standard nucleotide comprises an epigenetic modification, a modified sugar, a phosphate backbone, a nucleobase, a nucleobase that can hydrogen bond to a second base, a nucleobase that can base pair (without hydrogen bonding) to a second base, a nucleobase that relies on steric exclusion for base pairing, a nucleobase that relies on hydrophobic interactions for base pairing, a nucleobase that relies on a transition metal complex for base pairing, a chemical modification, or any combination thereof.
- the non-standard nucleotide is the nucleobase that is configured to hydrogen bond to the second base and the second base is a standard base or a non-standard base. In other example embodiments, the non-standard nucleotide is the nucleobase that can base pair (without hydrogen bonding) to the second base and the second base is a standard base or a non-standard base. In embodiments, the non-standard nucleotide comprises an epigenetic modification or is 4-methyl-cytosine, 5-methyl cytosine, 6-methyl adenosine, 5-hydroxymethyl cytosine, 7-methylguanosine, or N6-methyladenosine.
- the non-standard nucleotide comprises the chemical modification and the chemical modification comprises a fluorophore, a biotin, a terminal alkyne, an azide, a cyclooctyne, a tetrazine, a terminal alkene, a phosphine, a halo-alkane, an aldehyde, a thiol, a transition metal complex, another reactive handle, or any combination thereof.
- the disclosure also contemplates products, and in at least some instances, intermediates, of methods herein as also being within the scope of the disclosure.
- the disclosure provides a dsDNA ligation product that can comprise a non-standard nucleotide.
- the disclosure provides a further dsDNA ligation product that can comprise two or more non-standard nucleotides.
- the disclosure contemplates defined libraries of non-standard nucleotide base pairs, in any of a variety of nucleotide contexts, produced by the methods
- the disclosure provides a defined non-standard nucleotide base pair library comprising a library polynucleotide sequence of the dsDNA ligation product or the blunt-end dsDNA template.
- the library polynucleotide sequence comprises a base pair between a non- standard nucleotide and a second non-standard nucleotide.
- a plurality of base pairs can be incorporated into one or more defined libraries.
- the disclosure provides a defined non-standard nucleotide base pair library comprising a library polynucleotide sequence of a further dsDNA ligation product or a further blunt-end dsDNA template, such that the library polynucleotide sequence comprises the plurality of base pairs between the plurality of non-standard nucleotides and the plurality of second non-standard nucleotides.
- a library polynucleotide sequence further comprises a context barcode associated with a sequence context adjacent to a base pair of a non-standard nucleotide and a second non-standard nucleotide of the library polynucleotide sequence, and a pool barcode associated with the non-standard nucleotide, the second non-standard nucleotide, or both.
- These or similar barcodes can be comprised of standard or otherwise sequenceable nucleotides, so that the identities of the non-standard nucleotides and the contexts can be known with a high degree of confidence. This facilitates correlation between the empirical data and the non-standard nucleotide bases being observed.
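- The sketch below shows one way the two barcodes could be used to assign ground-truth labels to library reads before model training. Barcode sequences and names are hypothetical. Exact substring matching is a simplification; real nanopore reads would normally need error-tolerant barcode matching.

```python
from typing import Dict, Optional, Tuple


def label_read(read: str,
               pool_barcodes: Dict[str, str],
               context_barcodes: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (XNA, sequence context) for a read that carries exactly one pool
    label and exactly one context label, else None (ambiguous or unassigned)."""
    pools = {xna for bc, xna in pool_barcodes.items() if bc in read}
    contexts = {ctx for bc, ctx in context_barcodes.items() if bc in read}
    if len(pools) == 1 and len(contexts) == 1:
        return next(iter(pools)), next(iter(contexts))
    return None


pool_barcodes = {"ACGTACGT": "P", "TGCATGCA": "Z"}           # hypothetical
context_barcodes = {"GGCCAA": "ctx-01", "TTAACC": "ctx-02"}  # hypothetical
print(label_read("AAACGTACGTTTTGGCCAATT", pool_barcodes, context_barcodes))  # prints ('P', 'ctx-01')
```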
- Machine learning can be used with one or more methods for facilitation of sequence data analysis.
- the disclosure provides a method for generating a machine learning (ML) model that correlates one or more observed current reads with an unknown non-standard nucleotide, for assignment of an identity to the unknown non-standard nucleotide.
- Such a method comprises sequencing, with a nanopore sequencing method, the defined non-standard nucleotide base pair library to produce the one or more observed current reads, and training, with an ML algorithm, the ML model to associate the one or more observed current reads with a known identity of a defined non-standard nucleotide of the defined non-standard nucleotide base pair library.
- the ML model can be configured to assign the identity to the unknown non-standard nucleotide based on the known identity of the defined non-standard nucleotide.
- the ML model comprises a convolutional long short-term memory recurrent neural network (LSTM RNN); however, other ML models can be implemented in embodiments.
- the disclosure also contemplates computer memory, computer products, computer devices, computer systems, and the like, that implement all or part of one or more methods of the disclosure as being within the scope of the disclosure.
- the disclosure provides a non-transitory computer-readable storage medium having stored thereon at least part of a ML model.
- the disclosure provides a computational device or computational system comprising the non-transitory computer-readable storage medium.
- the disclosure provides a nanopore sequencing kit, device, or system comprising the non-transitory computer-readable storage medium, optionally further including instructional materials for use of the kit.
- the disclosure provides novel and innovative tools for use in synthesizing and sequencing polynucleotides containing non-standard nucleotides. Accordingly, in an aspect, the disclosure provides a method for basecalling a non-standard nucleotide expanded alphabet.
- the method comprises: sequencing, with a nanopore sequencing method, a subject polynucleotide sequence that comprises a non-standard nucleotide to generate a subject current read; computing, with the computational device or computational system, an association between the subject current read and the known identity of the defined non-standard nucleotide of the defined non-standard nucleotide base pair library; and computing, based on the association, a structure of the non-standard nucleotide.
- the structure of the non-standard nucleotide can include, correspond, or relate to an identity of the non-standard nucleotide.
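- A nearest-centroid classifier can serve as a deliberately simple stand-in to show the two steps: summarize library reads of known identity, then assign a subject read to the closest summary. It is not the model described above, which is a convolutional LSTM RNN. The fixed-length signal profiles, values, and function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, List, Sequence


def train_centroids(labeled: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the signal profiles of library reads with a known (barcoded) identity."""
    return {
        xna: [sum(column) / len(profiles) for column in zip(*profiles)]
        for xna, profiles in labeled.items()
    }


def assign_identity(profile: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Assign the identity whose centroid is nearest (Euclidean) to the subject profile."""
    return min(centroids, key=lambda xna: math.dist(profile, centroids[xna]))


library = {"P": [[70.0, 88.0], [72.0, 86.0]], "Z": [[100.0, 110.0], [98.0, 112.0]]}
centroids = train_centroids(library)
print(assign_identity([71.5, 87.2], centroids))  # prints P
```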
- circuitry includes dedicated hardware having electronic circuitry configured to perform operations or computations on a dedicated basis, without any use of microprocessors, central processing units, or software or firmware or processor-executable instructions.
- circuitry includes, among other things, one or more computing devices such as one or more processors (e.g., microprocessor(s)), one or more central processing units (CPU), one or more digital signal processors (DSP), one or more application-specific integrated circuits (ASIC), one or more field-programmable gate arrays (FPGA), or the like, or any variations or combinations thereof, and can include discrete digital and/or analog circuit elements or electronics, or combinations thereof.
- circuitry includes combinations of circuits and computer program products having software or firmware processor-executable instructions stored on one or more computer readable memories, e.g., non-transitory computer-readable storage mediums, that work together to cause a device or system to perform one or more methodologies or technologies described herein.
- circuitry includes circuits, such as, for example, microprocessors or portions of microprocessors, that require software, firmware, and the like for operation.
- circuitry includes an implementation comprising one or more processors or portions thereof and accompanying software, firmware, hardware, and the like.
- circuitry includes a baseband integrated circuit or applications processor integrated circuit or a similar integrated circuit in a server, a cellular network device, other network device, or other computing device.
- circuitry includes one or more remotely located components (e.g., server, server cluster, server farm, virtual private network, etc.) and/or one or more non-remotely located components (e.g., desktop computer, workstation, mobile device, controller, etc.).
- remotely located components are operatively connected via one or more receivers, transmitters, transceivers, or the like.
- Embodiments include one or more data stores that, for example, store instructions and/or data.
- Non-limiting examples of one or more data stores include volatile memory (e.g., Random Access memory (RAM), Dynamic Random Access memory (DRAM), or the like), non-volatile memory (e.g., Read-Only memory (ROM), Electrically Erasable Programmable Read-Only memory (EEPROM), Compact Disc Read-Only memory (CD-ROM), or the like), persistent memory, or the like. Further non- limiting examples of one or more data stores include Erasable Programmable Read-Only memory (EPROM), flash memory, or the like.
- the one or more data stores can be connected to, for example, one or more computing devices by one or more instructions, data, or power buses.
- circuitry includes one or more computer-readable media drives, interface sockets, Universal Serial Bus (USB) ports, memory card slots, or the like, and one or more input/output components such as, for example, a graphical user
- circuitry includes one or more user input/output components that are operatively connected to at least one computing device to control (electrical, electromechanical, software- implemented, firmware-implemented, or other control, or combinations thereof) one or more aspects of the embodiment.
- circuitry includes a computer-readable media drive or memory slot configured to accept signal-bearing medium (e.g., computer-readable memory media, computer-readable recording media, or the like).
- a program for causing a system to execute any of the disclosed methods can be stored on, for example, a computer-readable recording medium (CRMM), a signal-bearing medium, or the like.
- signal-bearing media include a recordable type medium such as any form of flash memory, magnetic tape, floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), Blu-Ray Disc, a digital tape, a computer memory, or the like, as well as a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link (e.g., transmitter, receiver, transceiver, transmission logic, reception logic, etc.)).
- signal-bearing media include, but are not limited to, DVD-ROM, DVD-RAM, DVD+RW, DVD-RW, DVD-R, DVD+R, CD-ROM, Super Audio CD, CD±R, CD+R, CD+RW, CD-RW, Video Compact Discs, Super Video Discs, flash memory, magnetic tape, magneto-optic disk, MINIDISC, non-volatile memory card, EEPROM, optical disk, optical storage, RAM, ROM, system memory, web server, or the like.
- the present application can include references to directions, such as “vertical,” “horizontal,” “front,” “rear,” “left,” “right,” “top,” and “bottom,” etc. These references, and other similar references in the present application, are intended to help describe and understand the particular embodiment (such as when the embodiment is positioned for use) and are not intended to limit the present disclosure to these directions or locations.
- The present application can also reference quantities and numbers. Unless specifically stated, such quantities and numbers are not to be considered restrictive, but rather examples of the possible quantities or numbers associated with the present application.
- “about” refers to the stated value and a range that includes values 11% above the stated value, 12% above the stated value, 13% above the stated value, 14% above the stated value, 15% above the stated value, 16% above the stated value, 17% above the stated value, 18% above the stated value, 19% above the stated value, 20% above the stated value, 21% above the stated value, 22% above the stated value, 23% above the stated value, 24% above the stated value, or 25% above the stated value.
- a range is stated, e.g., the range of 1-16, the stated range includes every value between the lower and upper limits as well as the lower and upper limits of the stated range, themselves, as stated values.
- the approximately stated range includes every value between the lower and upper limits as well as the lower and upper limits of the stated range, themselves, as stated values (e.g., 1 and 16 are each stated values), including those non-stated values that are near to or approximate the stated values according to practicable ranges as would be recognized by those skilled in the art or as otherwise described herein.
- the phrase “at least one of A, B, and C,” means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C), including all further possible permutations when greater than three elements are listed.
- the term “A, B, and/or C” means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C), including all further possible permutations when greater than three elements are listed.
- the term “or” is an inclusive “or”, and the phrase “A or B” means (A), (B), or (A and B).
- the term “and” requires both elements; for example, the phrase “A and B” means (A and B).
- the term “comprising” is inclusive or open-ended and does not exclude additional, unrecited elements or method steps.
- Example 1 Enzymatic Synthesis and Nanopore Sequencing of 12-letter Supernumerary DNA
- Abstract: The 4-letter DNA alphabet (A, T, G, C) is an elegant, yet non-exhaustive, solution to the problem of storage, transfer, and evolution of biological information. This example provides strategies for both writing and reading DNA with expanded alphabets composed of up to 12 letters (A, T, G, C, B, S, P, Z, X, K, J, V).
- an enzymatic strategy is devised for inserting a single, orthogonal xenonucleic acid (XNA) base pair into standard DNA sequences using 2′-deoxy-xenonucleoside triphosphates as substrates. By integrating this strategy with combinatorial oligos generated on a chip, libraries containing single XNA bases are constructed for parameterizing kmer basecalling models for nanopore sequencing. These elementary steps are combined to synthesize and sequence DNA containing 12 letters, the upper limit of what is accessible within the electroneutral, canonical base-pairing framework.
- the 4-letter standard genetic alphabet of DNA (A, T, G, C) is ubiquitous and one of the defining biomolecular signatures of life on Earth. The ability to read, write, and translate this information forms the basis for life as an emergent property of nucleic acid heteropolymers. Humanity has learned how to manipulate the 4 letters of DNA, spurring major advances in biotechnology, information storage, and healthcare.
- the standard nucleic acids can serve as components of diagnostic tests to screen for disease or detect toxins, as therapeutics that elicit immune responses, and even as a molecular system for long-term storage of digital information.
- Parameters of biomolecular compatibility of expanded non-canonical hydrogen bonding base pairings include stability in the DNA double helix, the ability to be replicated by DNA polymerases, transcribed by RNA polymerases, reverse transcribed by reverse transcriptases, and even translated by the ribosome. These xenonucleotides are at the forefront of nucleic acids research since they significantly expand DNA’s chemical, structural, and binding repertoire.
- methods for sequencing xenonucleic acids are decades behind those for DNA and RNA, and rely on low-throughput, non-multiplexed measurements such as gel-shift assays, mass spectrometry, and selective conversion of XNAs to standard bases followed by Sanger sequencing.
- XNA sequencing technology is lower throughput, less sensitive, and less generalizable than the methods Sanger and Coulson developed in the 1970s and has no service-oriented solution.
- ATGC-sequencing technology is in its ‘third generation.’
- One possible solution is to adapt existing first-, second-, or third-generation DNA sequencing technology to work with more DNA letters.
- Nanopore sequencing has the ability to sequence non-canonical bases such as epigenetic and epitranscriptomic modifications.
- nanopore sequencing can be used for sequencing 8-letter hachimoji DNA (A, T, G, C, B, S_c, P, Z) using the Hel308 motor protein with an MspA pore.
- third-generation (high throughput, multiplexable, single molecule, real-time) sequencing of supernumerary DNA is possible despite the “k-mer explosion” in possible current signals induced by an expanded DNA alphabet.
- previous efforts in this regard did not attempt to build models for decoding the nanopore current signals to nucleic acid sequences.
- Non-standard bases can be classified using commercial nanopores (e.g., GridION, ONT), showing that commercial nanopore sequencing platforms are indeed capable of sequencing chemically modified nucleobases, including 2,4-diaminopurine, 5-nitroindole, and 5-octadiynyldeoxyuracil.
- 3915-P1293WO.UW -32- phosphoramidite synthesis – commercial access is both limited and costly, standing as a major barrier to entry.
- standard phosphoramidite synthesis costs for non-standard bases average around $100-400 USD/nt, or over 1000 times more than A, T, G, C synthesis ($0.04-0.40 USD/nt).
- next-generation synthesis methods that have transformed the ability to explore sequence space (pooled synthesis, synthesis-on-a-chip, enzymatic synthesis) are not commercially available for orthogonal base pairs.
- Enzymes like terminal deoxynucleotidyl transferase can catalyze non-templated addition of a wide range of modified nucleotide building blocks on ssDNA, and can do so at neutral pH.
- TdT: terminal deoxynucleotidyl transferase
- enzymes precludes them from being used for sequence-defined addition of dNTPs. Moreover, TdT-based enzymatic synthesis of nucleic acids would require specially protected building blocks or polymerase-nucleotide conjugates that are not commercially available. Lacking a suitable alternative, an enzymatic synthesis strategy needed to be developed that would be flexible enough to handle all desired xenonucleobases, using 2′-deoxynucleoside triphosphates as the universal building block, and specific enough to catalyze a non-processive N+1 addition.
- the 2′-deoxy-xenonucleoside triphosphates of the remaining bases were chemically synthesized: dX_tTP, dK_nTP, dJTP, dVTP (FIGs 6BA-6BE).
- a sensitive liquid chromatography/mass spectrometry (UPLC/QTOF) assay was developed for detecting tailing activity.
- the hairpin design of the substrates generates a desired dsDNA ligation product that lacks a free 5′ or 3′ end, making it fully resistant to exonucleases. Subsequent treatment of the ligation reaction with exonucleases therefore allows one to remove unreacted starting material and partially ligated products.
- the ideal dsDNA ligase should be able to ligate DNA strands with single nucleotide overhangs and have relaxed specificity for both the overhanging nucleotide
- phage ligases T3 DNA ligase, T4 DNA ligase, and T7 DNA ligase
- FIG. 6I: modified and non-standard nucleotide substrates
- a negative control can be performed in which hairpins are incubated individually in the presence of the respective ligases (FIG. 6J). In these single hairpin reactions, any ligation product would indicate either blunt-end ligation, from incomplete XNA tailing, or formation of a self-ligation (mismatch ligation) product.
- Nanopore sequencing (Oxford Nanopore Technologies®) has features that make it adaptable for sequencing supernumerary DNA: it can sequence single DNA molecules without amplification, without the requirement for fluorescently labeled building blocks, and with high throughput (100k-10M reads per run). In nanopore sequencing, an ion current signal is generated as single-stranded DNA is threaded through a protein nanopore. Conversion of signal to sequence, or basecalling, is performed computationally by either statistical or machine learning models. However, since commercial nanopore basecalling algorithms were empirically trained on standard 4-letter DNA (A, T, G, C), they are unable to decode xenonucleobases (B, S_n, S_c, P, Z, X_t, K_n, J, V; FIGs 6OA-6OB). With this in mind, one can build and measure diverse DNA-XNA libraries that can be used to construct de novo models for sequencing single xenonucleotides within a natural DNA context.
- NNNNNNN library was sequenced independently for model building, generating between 150k and 800k raw reads per library (Tables 14-15). Signals were then segmented and aligned to each barcoded reference sequence, while filtering out reads that aligned to possible ligation side products (FIGs 3B, 6J and 6QA-6QI). From these signal-to-sequence alignments, XNA-heptamer
- Example kmer signal distributions can be generated. Mean signal currents spanning all 2,304 xenonucleotide-containing kmers, μ_k, are shown in FIG. 3C, and comparisons can be made to the most similar standard bases. Basecalling single xenonucleotide substitutions: next, one can apply this model to predict signals emitted by sequences that contain a single xenonucleotide (B, S_n, S_c, P, Z, X_t, K_n, J, or V).
- the expected signal is found by decomposing a heptamer sequence into its constituent kmers, then using measured kmer means to model current transitions (e.g., AGTBCCT → [μ_AGTB, μ_GTBC, μ_TBCC, μ_BCCT]).
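- The decomposition step above can be sketched in Python as follows. This is a minimal illustration, not the Xenomorph implementation: the function names and the `toy_means` values are hypothetical placeholders, not measured kmer levels, and each XNA is assumed to be encoded as a single character.

```python
from typing import Dict, List


def heptamer_to_kmers(heptamer: str, k: int = 4) -> List[str]:
    """Split a heptamer into its overlapping k-mers (sliding window, step 1)."""
    if len(heptamer) != 7:
        raise ValueError("expected a 7-nt heptamer")
    return [heptamer[i:i + k] for i in range(len(heptamer) - k + 1)]


def expected_levels(heptamer: str, kmer_means: Dict[str, float]) -> List[float]:
    """Look up the modeled mean level for each constituent k-mer, in order."""
    return [kmer_means[kmer] for kmer in heptamer_to_kmers(heptamer)]


# Hypothetical k-mer means (arbitrary units), for illustration only.
toy_means = {"AGTB": 0.42, "GTBC": -0.10, "TBCC": 0.35, "BCCT": 0.05}
print(heptamer_to_kmers("AGTBCCT"))           # ['AGTB', 'GTBC', 'TBCC', 'BCCT']
print(expected_levels("AGTBCCT", toy_means))  # [0.42, -0.1, 0.35, 0.05]
```

With the XNA at the center of a heptamer, every one of the four 4-nt kmers contains the XNA, so all four modeled levels carry information about it.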
- FIG. 3D shows examples of signal-level predictions generated by an example model (XNA model), overlaid on observations of that library sequence and on the most similar standard-bases model (DNA model).
- the modeled probability density function can be used to calculate the likelihood that an observed set of signal levels was emitted from a particular sequence.
- the correct basecall should be the one that has the maximum likelihood of observation.
- the modularity of the 4-nt kmer model allows one to make a diverse set of comparisons between a xenonucleotide and 1) a standard base (e.g., P vs. G), 2) any of the standard bases (e.g., P vs. A, T, G, C), or 3) any of the full supernumerary letters (e.g., P vs. A, T, G, C, B, S_c, Z, X_t, K_n, J, V).
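- A maximum-likelihood comparison over a candidate set, as described above, might look like the following sketch. It assumes Gaussian kmer levels with one shared standard deviation and a context string with a single "?" at the XNA position; the function names and all `toy_means` numbers are hypothetical and only illustrate the P vs. G case.

```python
import math
from typing import Dict, Iterable, List


def log_normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Log of the normal density N(mu, sigma^2) evaluated at x."""
    return -math.log(sigma * math.sqrt(2 * math.pi)) - (x - mu) ** 2 / (2 * sigma ** 2)


def kmers(seq: str, k: int = 4) -> List[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def max_likelihood_call(context: str, observed: List[float], candidates: Iterable[str],
                        kmer_means: Dict[str, float], sigma: float) -> str:
    """Pick the letter for the single '?' in `context` whose modeled k-mer
    levels give the highest total log-likelihood for `observed`."""
    best_letter, best_ll = "", -math.inf
    for letter in candidates:
        seq = context.replace("?", letter)
        expected = [kmer_means[km] for km in kmers(seq)]
        if len(expected) != len(observed):
            raise ValueError("observed levels must match the number of k-mers")
        ll = sum(log_normal_pdf(x, mu, sigma) for x, mu in zip(observed, expected))
        if ll > best_ll:
            best_letter, best_ll = letter, ll
    return best_letter


# Hypothetical model values for a P-vs-G comparison (illustrative only).
toy_means = {
    "AGTP": 0.50, "GTPC": 0.20, "TPCC": -0.10, "PCCT": 0.30,
    "AGTG": 0.90, "GTGC": 0.80, "TGCC": 0.60, "GCCT": 0.70,
}
observed = [0.45, 0.25, -0.05, 0.35]
print(max_likelihood_call("AGT?CCT", observed, ["P", "G"], toy_means, sigma=0.1))  # P
```

Widening the comparison to all standard bases or to the full 12-letter set only changes the `candidates` list, provided the model contains means for every resulting kmer.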
- XNA tailing and XNA ligation can be used to enzymatically synthesize a new validation library composed of contextually diverse sequences.
- in this library, the nucleotide sequences adjacent to the XNA-containing heptamer can be further diversified, placing them further away in sequence space from those used to build the 4-nt kmer models.
- This validation library can be built
- each set of hairpins can contain 10 unique sequences.
- the 20 bp at the 3′-end of each hairpin can be designed by randomly selecting standard bases from a uniform probability distribution.
- Individual hairpin sets can be tailed with XNA bases using XNA tailing.
- Two sets of hairpins with complementary tails can be ligated, producing a library of 100 possible sequences (10 x 10), with each sequence containing a single XNA base pair. These ligated hairpin libraries can be pooled together and sequenced for benchmarking (FIGs 4B-4C).
- the elementary tailing and ligation synthesis steps can be coupled with an additional Golden Gate ligation to generate two proof-of-concept 12-letter supernumerary dsDNA hairpins: S_cuper-12 and S_nuper-12 (FIGs 6UA-6UB, Tables 7, 12, and 13).
- exonucleases can be added to remove intermediary DNA products, generating the desired 244 bp 12-letter dsDNA product.
- basecalling can be performed in two different ways: 1) by comparing the XNA base at a position against a model that contains all 12 possible nucleobases, and 2) by comparing the XNA base at a position against a model that contains the XNA and the most similar standard nucleobase. Even when all 12 letters are present in the model, the presently disclosed basecalling model is able to properly decode XNAs in S_cuper-12 with 39-89% per-read recall (FIG. 5, Tables 25, 26). In an example experiment with the S_nuper-12 sequence, all but one XNA were properly decoded by the 12-letter model, the exception being K_n (per-read recall of 14%).
- a general strategy is described for incorporating up to four additional orthogonal base pairs into standard DNA, and these methods can be used to build openly accessible models for sequencing XNAs (B, S_n, S_c, P, Z, X_t, K_n, J, V) in a standard DNA context (A, T, G, C) on commercial nanopore devices.
- the enzymatic synthesis strategy developed utilizes unmodified 2′-deoxy-xenonucleoside triphosphates as the elementary building blocks, avoiding the use of phosphoramidites or caged-triphosphates.
- Nanopore sequencing of XNAs can be performed using a nanopore sequencing device, which significantly expands the accessibility of XNA sequencing. As the history of sequencing has shown, wider adoption and collection of XNA nanopore sequencing data can help catalyze the improvement of sequencing models with newer basecalling algorithms, including data-intensive deep learning models. As these methods improve and adoption widens, strategies for synthesis and sequencing of higher-complexity nucleic acids become possible.
- an additional base pair enables site-specific incorporation of chemically modified groups, including the addition of nucleobases such as Z that can act as a Brønsted base.
- Adenosine triphosphate sodium salt (ATP; A6419-5G), acetonitrile (A955-4; LC/MS-grade), formic acid (A118P-500), ammonium acetate (A637-500), ammonium carbonate (207861-25G), Tris base (10708976001), 5 M betaine solution (B0300-1VL), 6 N hydrochloric acid (1430071000), GelGreen (SCT124), and sodium chloride (S3014-5KG) were purchased from Sigma-Aldrich (St. Louis, MO).
- AMPure XP beads (A63880) were purchased from Beckman Coulter (Brea, CA).
- T4 DNA ligase and high-concentration T4 DNA ligase (M0202M, M0202L), T7 DNA ligase (M0318L), T3 DNA ligase (M0317S), yeast inorganic pyrophosphatase (YiPP; M2403L), thermolabile proteinase K (P8111S), Exo III (M0206L), thermolabile Exo I (M0568L), Exo I (M0293L), Exo VII (M0379L), Exo VIII (truncated; M0545S), Klenow Fragment (exo-; M0212L), Taq polymerase (M0267L), Bsu polymerase (M0330S), Deep Vent (exo-) polymerase (M0259S), Bst polymerase (M0275S), Sulfolobus DNA polymerase IV (M0327S), Therminator polymerase (M0261L), NEBNext® Ultra™ II End Repair
- Xenonucleoside triphosphates dS_cTP, dPTP, dZTP, dBTP (dSTP-401S, dPTP-201, dZTP-101, dBTP-301P) were purchased from FireBird Biomolecular Sciences LLC (Alachua, FL).
- Xenonucleoside triphosphate dS_nTP (M-1015) was purchased from TriLink
- the eluted oligo was then folded in 100 mM NaCl and 10 mM Tris-HCl (pH 8.2) buffer by incubating at 90 °C for 3 minutes, then cooling at 0.1 °C/s until reaching 20 °C. 15 µL of this refolded oligo was incubated with 0.17 mM dNTP or dxNTP, 300 units of Exo III, and either KF (exo-) with rCutSmart™ buffer or Therminator with ThermoPol® buffer for 16 h. For reactions using KF, the reaction was incubated with 15 units of KF at 37 °C.
- oligos are first refolded by incubating 40 µM of oligo in a 100 mM NaCl, 10 mM Tris-HCl buffer (pH 8.2) at 90 °C for 3 minutes, then cooling at 0.1 °C/s until reaching 20 °C.
- the refolded oligos are then tailed by incubating 23.8 µM of oligo in the presence of dNTP or dxNTP (1.19 mM or 2.38 mM), YiPP (0.005 U/µL; except for the dATP tailing reaction, which did not contain YiPP), polymerase (0.71 U/µL Klenow Fragment (KF exo-), 0.29 U/µL Therminator polymerase, or 0.71 U/µL Taq polymerase), and polymerase buffer (either rCutSmart™ or ThermoPol buffer). Full conditions are tabulated in Table 8.
- reactions were terminated by heat inactivation at 72 °C for 20 min.
- Therminator and Taq reactions were terminated by addition of 1X rCutSmart™ buffer and 0.005 U/µL of thermolabile proteinase K at 37 °C for 15 min, followed by heat inactivation at 72 °C for 20 min.
- hairpins were refolded.
- 19.8 µM of oligo was incubated with 1.8 U/µL of ScaI-HF at 37 °C for 2 h, followed by heat inactivation at 80 °C for 20 min.
- oligos are first refolded by incubating 20 µM of oligo in a 100 mM NaCl, 10 mM Tris-HCl buffer (pH 8.2) at 90 °C for 3 minutes, then cooling at 0.1 °C/s until reaching 20 °C.
- the refolded oligos are then tailed by incubating 11.9 µM of oligo in the presence of dNTP or dxNTP (1.19 mM or 2.38 mM), YiPP (0.005 U/µL; except for the dATP tailing reaction, which did not contain YiPP), polymerase (0.71 U/µL Klenow Fragment (KF exo-), 0.29 U/µL Therminator polymerase, or 0.71 U/µL Taq polymerase), and polymerase buffer (either rCutSmart™ or ThermoPol buffer).
- Reactions were either incubated for 8 h at 37 °C (KF exo-); 1, 4, 8, or 16 h at 60 °C (Therminator); or 1 h at 60 °C (Taq). Following incubation, KF exo- reactions were terminated by heat inactivation at 72 °C for 20 min. Therminator and Taq reactions were terminated by addition of 0.005 U/µL of thermolabile proteinase K at 37 °C for 15 min, followed by heat inactivation at 72 °C for 20 min. Following either set of heat inactivation steps, hairpins were refolded.
- Resulting hairpins contained a mixture of product (tailed hairpins) and unreacted starting material (3′-blunt end hairpins).
- T4 DNA ligase was then used to screen reactions for remaining unreacted 3′-blunt ends by adding 80 U/µL of T4 DNA ligase alongside 1X T4 DNA ligase reaction buffer. These T4 ligation reactions were incubated at 16 °C for 2 h, after which T4 ligase was heat inactivated at 65 °C for 10 min.
- a synthetic oligo hairpin with a 3′-G overhang (5′Phos-HP-3′G , Table 2) was used in the T4 ligation reaction.
- the starting material (5′Phos-11HP) was used in the T4 ligation reaction. Reaction products were run on a 2% (w/v) agarose gel, stained with GelGreen,
- Exonuclease reactions were heat inactivated by incubation at either 80 °C for 20 min (for reactions containing Exo I) or 70 °C for 20 min (for reactions containing thermolabile Exo I). Reaction products were run on a 2% (w/v) agarose gel, stained with GelGreen, and visualized using a blue light transilluminator.
- Consecutive insertion of XNA base pairs using the MlyI type IIS restriction enzyme. 5′-phosphorylated hairpin oligos were purchased from IDT (5′Phos-11HP, 5′Phos-15HP, and 5′Phos-ScaI-HP; Table 2). 5′Phos-15HP contains an MlyI restriction site adjacent to the site of XNA ligation.
- MlyI is a type IIS restriction enzyme (5′-GAGTCNNNNN^-3′) that leaves a blunt end after cutting.
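- To make the MlyI cut geometry concrete, the sketch below locates GAGTC sites on a top-strand sequence and reports where the blunt cut (5 nt 3′ of the site) would fall. It is an illustrative helper with a hypothetical name, not part of the described protocol, and it only handles sites in the GAGTC orientation on the given strand.

```python
import re
from typing import List

MLYI_SITE = "GAGTC"  # MlyI recognition sequence (top strand)
MLYI_OFFSET = 5      # cuts 5 nt 3' of the site on both strands -> blunt end


def mlyi_cut_positions(top_strand: str) -> List[int]:
    """Return 0-based indices on the top strand where MlyI would cut
    (the cut falls immediately before each returned index).

    Only sites in the GAGTC orientation are considered; a site on the
    bottom strand (GACTC on the top strand) would cut upstream instead.
    """
    seq = top_strand.upper()
    cuts = []
    for match in re.finditer(MLYI_SITE, seq):
        cut = match.end() + MLYI_OFFSET
        if cut < len(seq):  # skip sites whose cut would fall at or past the end
            cuts.append(cut)
    return cuts


seq = "AAGAGTCTTTTTGCA"
cuts = mlyi_cut_positions(seq)
print(cuts)                          # [12]
print(seq[:cuts[0]], seq[cuts[0]:])  # AAGAGTCTTTTT GCA
```

Because the cut lies outside the recognition site, the site is removed with the discarded fragment, which is what lets a terminal base pair be exposed next to a blunt end.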
- 5′Phos-15HP donor hairpin with MlyI site; abbreviated HPD
- 5′Phos-11HP acceptor hairpin; abbreviated HPA
- These two hairpins were then ligated and subsequently treated with exonuclease following the optimized conditions described in “XNA ligation conditions and reaction components.” This material was purified using Zymo’s DNA Clean and Concentrator and eluted in 30 µL of elution buffer.
- the purified construct, containing a single P:Z base pair insertion, was digested using 1.24 U/µL of MlyI and 1X rCutSmart™ buffer at 37 °C for 2 h, then heat inactivated at 65 °C for 20 min. MlyI digestion results in a hairpin with a terminal P:Z,
- 5′-phosphorylated oligo pools (purchased as oPools™ from Integrated DNA Technologies) were designed to form blunt-end hairpins with two barcodes: a 24 nt Triplet-barcode [NNN-BC] and an 8 nt pool-barcode [Pool-BC] (FIG. 3A, Tables 3-5).
- the Triplet-barcode is linked to the NNN sequence at the 3′-blunt end of the hairpin, while the pool-barcode is used to decode which dxNTP/dNTP was tailed (Table 12).
- Each Triplet-barcode maps 1:1 with a corresponding NNN sequence adjacent to an XNA base.
- Ligation reactions for libraries generate combinations with two different pool barcodes. Restriction enzyme cut sites were included upstream of Triplet-barcodes to remove hairpins following ligation reactions and prepare DNA for nanopore sequencing. Full hairpin sequences in each library can be produced based on the present disclosure.
- Val-20 validation library design. 5′-phosphorylated oligo pools (purchased as oPools™ from Integrated DNA Technologies) were designed to form blunt-ended hairpins with a variable 20 nt region at the end (Tables 3, 6).
- variable 20 nt region was designed computationally by randomization with a uniform prior probability for each base.
- Candidate sequences were passed through IDT oligo analyzer tool to remove sequences that might form secondary structures that could disrupt hairpin formation.
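- Drawing the variable regions with a uniform prior, as described above, can be sketched as follows. This is an assumption-laden illustration (hypothetical function name and seed); the secondary-structure screen performed with the IDT oligo analyzer tool is not reproduced here.

```python
import random
from typing import List


def random_variable_regions(n: int, length: int = 20, seed: int = 0) -> List[str]:
    """Draw n distinct sequences; every position is A, C, G, or T with p = 1/4."""
    rng = random.Random(seed)  # seeded for reproducibility
    seqs = set()
    while len(seqs) < n:
        seqs.add("".join(rng.choice("ACGT") for _ in range(length)))
    return sorted(seqs)


pool = random_variable_regions(10)  # e.g., 10 unique sequences per validation pool
print(len(pool), pool[0])
```

Candidates from such a draw would still need to pass a hairpin and secondary-structure check before ordering.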
- Each validation oligo pool contained 10 unique sequences (six total pools: Val_A-F; Table 6) and was synthesized at a scale of 50 pmol/oligo.
- Two different validation oligo pools can be tailed with a dxNTP. Ligating two pools together (with complementary N+1 tails) results in a library with 100 possible sequences (10 x 10 combinations). Restriction enzyme cut sites were included upstream of these variable regions for nanopore library preparation following ligation.
- the assembled product contains two different restriction sites for hairpin removal, 5′-GATATC-3′ (EcoRV) and 5′-AGTACT-3′ (ScaI).
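- Both sites above are palindromic blunt cutters (EcoRV cuts GAT^ATC and ScaI cuts AGT^ACT), so a single-strand scan finds every site. The helper below is a hypothetical sketch for checking where each enzyme would cut in a designed construct; the example sequence is made up.

```python
from typing import Dict, List, Tuple

# Blunt-cutting enzymes used for hairpin removal: (site, cut offset from site start).
BLUNT_CUTTERS: Dict[str, Tuple[str, int]] = {
    "EcoRV": ("GATATC", 3),  # GAT^ATC
    "ScaI": ("AGTACT", 3),   # AGT^ACT
}


def find_blunt_cuts(seq: str) -> Dict[str, List[int]]:
    """Map each enzyme to the 0-based cut positions found in `seq`.

    Both sites are palindromic, so scanning one strand finds every site."""
    seq = seq.upper()
    result: Dict[str, List[int]] = {}
    for enzyme, (site, offset) in BLUNT_CUTTERS.items():
        hits = []
        start = seq.find(site)
        while start != -1:
            hits.append(start + offset)
            start = seq.find(site, start + 1)
        result[enzyme] = hits
    return result


print(find_blunt_cuts("TTGATATCAAAAGTACTGG"))  # {'EcoRV': [5], 'ScaI': [14]}
```

Having each site on only one hairpin means each enzyme removes exactly one end, which matches the asymmetric design described here.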
- The asymmetric placement of restriction sites on the two hairpins allows a single hairpin to be removed, generating a blunt end on the assembled product.
- the resulting dsDNA contains a single 3′- and 5′-end.
- Subsequent library preparation and sequencing of the dsDNA yield reads in which both sense and antisense strands, containing all 12 nucleobases, are read in a single sequencing event (S_cuper-12 and S_nuper-12; FIG. 5, FIGs 6UA-6UB).
- NNNNNNN library, validation library, and 12-letter DNA preparation by XNA tailing and XNA ligation. Oligos or oligo pools were first refolded by incubating 20 µM of oligo pool in a 100 mM NaCl, 10 mM Tris-HCl (pH 8.2) buffer at 90 °C for 3 minutes, then cooling at 0.1 °C/s until reaching 20 °C.
- oligos or oligo pools were tailed with a corresponding dxNTP using tailing conditions listed in Table 8. Reactions tailed with KF exo- were heat inactivated, while those tailed with Therminator were inactivated by thermolabile proteinase K treatment. Following inactivation of polymerase, oligos were refolded. Tailed oligo or oligo pools with complementary 3′-ends were then ligated with either T4 DNA ligase, T3 DNA ligase, or T7 DNA ligase using ligation conditions listed in Table 10. As a negative control for tailing, the starting material 3′-blunt end oligo or oligo pool (e.g.
- Purified NNN-oligo pools were then digested for 1 h at 37 °C using 1 U/µL of BbsI-HF and rCutSmart™ buffer, then purified again using AMPure XP with a 2:1 bead-to-sample ratio and eluted in 30 µL of nuclease-free water. Purified NNNNNNN library samples were then prepared for nanopore sequencing following the details in the Nanopore sample preparation section.
- ligated validation oligo pool reactions were purified using AMPure XP with a 3:1 bead-to-sample ratio and eluted in 30 µL of elution buffer (10 mM Tris-HCl, pH 8.2), then combined to a final concentration of 0.2 µM/pool before enzymatic digestion for 1 h at 37 °C using 1 U/µL of BbsI-HF and 1X rCutSmart™ buffer.
- Each ligated oligo set was then combined at a final equimolar concentration of 0.05 or 0.075 µM/oligo before proceeding to a Golden Gate ligation with the addition of 1 U/µL of BbsI-HF, 20 U/µL of T4 DNA ligase, 1X rCutSmart™ buffer, and 1X T4 DNA Ligase Reaction Buffer (FIG. 6UA).
- the Golden Gate ligation included 60 cycles of 1) 37 °C for 5 min and 2) 16 °C for 5 min, followed by a final step at 37 °C for 10 min and a heat inactivation step at 65 °C for 20 min.
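- As a quick planning check, the programmed hold time of this Golden Gate run works out to 60 × (5 + 5) + 10 + 20 = 630 min, or 10.5 h. The hypothetical helper below computes the same total and ignores thermocycler ramp times, so the actual run will take somewhat longer.

```python
def golden_gate_minutes(cycles: int = 60, step1_min: float = 5, step2_min: float = 5,
                        final_min: float = 10, inactivation_min: float = 20) -> float:
    """Total programmed hold time, ignoring thermocycler ramp times."""
    return cycles * (step1_min + step2_min) + final_min + inactivation_min


total = golden_gate_minutes()
print(total, total / 60)  # 630 10.5
```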
- the reaction was further digested to remove incomplete ligation products by the addition of 0.45 U/µL of BbsI-HF, 0.45 U/µL of thermolabile Exo I, 2.27 U/µL of Exo III, and 0.23 U/µL of Exo VIII (truncated), incubating at 37 °C for 1 h, followed by a heat inactivation step at 70 °C for 20 min.
- This reaction was then purified using AMPure XP with a 1.8:1 bead-to-sample ratio and eluted in 30 µL of nuclease-free water.
- the hairpin on either end of the complete, desired product was removed by splitting the reaction in half and adding 1X rCutSmart™ and 2.78 U/µL of either ScaI-HF or EcoRV-HF. These reactions were incubated at 37 °C for 1 h, followed by a heat inactivation step at 80 °C for 20 min. The split samples were then
- Nanopore sample preparation and data acquisition. Nanopore sample preparation followed the standard Flongle or MinION Genomic DNA by Ligation protocol (available on the ONT community) using the SQK-LSK110 preparation kit, with the following modifications.
- the NEBNext FFPE Repair Mix was omitted to avoid potential XNA removal by repair enzymes.
- the volume of the repair mix was replaced by nuclease-free water.
- the AMPure XP bead-to-sample ratio was increased to 2:1 for the NNNNNNN library and 3:1 for the validation library.
- Signal-to-sequence mapping uses the Tombo (github.com/nanoporetech/tombo, ONT) pipeline.
- raw multi FAST5 files are split into single FAST5 using the ont-fast5-api (github.com/nanoporetech/ont_fast5_api, ONT) command multi_to_single_fast5.
- Single FAST5 files are then basecalled using guppy (version 6.1.5+446c355, ONT) with the high accuracy configuration settings (dna_r9.4.1_450bps_hac.cfg).
- FASTQ basecalls passing default guppy quality score settings are assigned to their corresponding single FAST5 files using the Tombo command tombo preprocess annotate_raw_with_fastqs.
- Tombo uses a reference FASTA file that contains ground-truth sequences.
- the reference FASTA file was generated programmatically by considering every possible combination of ligation product, including mismatch homo-ligation (e.g. P1-A+P1-A, see Table 12), blunt-end ligations leading to a gap (e.g. P1-P2, P1-P1, P2-P2), or pyrophosphorolysis ligation products.
- Full reference alignment files are deposited in the SRA (Table 31).
- the ground-truth XNA base (B, S_n, S_c, P, Z, J, V, X_t, K_n) must be replaced with a canonical base (A, T, G, C) for processing in FASTA format.
- XNAs in reference sequences were replaced with the canonical bases that minimized observed variance in kmer levels, as determined empirically (B→A; S_n→A; S_c→A; P→G; Z→C; X→A; K→G; J→C; V→G).
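- A minimal sketch of the substitution step, assuming sequences are stored as token lists so that two-character names such as Sn or Sc can be represented (this token encoding, the helper name, and the 60-column line width are assumptions, not the Xenomorph file format). The mapping itself is the one listed above, with X and K written as Xt and Kn.

```python
from typing import Dict, List

# Substitutions listed above; multi-letter keys (Sn, Sc, Xt, Kn) are a
# hypothetical token encoding for S_n, S_c, X_t, K_n.
XNA_TO_CANONICAL: Dict[str, str] = {
    "B": "A", "Sn": "A", "Sc": "A", "P": "G", "Z": "C",
    "Xt": "A", "Kn": "G", "J": "C", "V": "G",
}


def to_reference_fasta(name: str, tokens: List[str], width: int = 60) -> str:
    """Render a tokenized XNA-containing sequence as a canonical-only FASTA record."""
    seq = "".join(XNA_TO_CANONICAL.get(tok, tok) for tok in tokens)
    unknown = set(seq) - set("ACGT")
    if unknown:
        raise ValueError(f"unmapped symbols: {sorted(unknown)}")
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{name}"] + lines) + "\n"


print(to_reference_fasta("example", ["A", "G", "T", "Sc", "C", "C", "T"]), end="")
# >example
# AGTACCT
```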
- Substituted bases are in general agreement with observations from basecalling XNA-containing reads with guppy (FIGs 6OA-6OB and 6QA-6QI). Signal-to-sequence mapping then proceeds using Tombo resquiggle.
- the Tombo resquiggle command uses mappy (minimap2 version 2.22-r1101 with ONT configuration) to first assign each single FAST5 read to a reference FASTA sequence based on the given FASTQ basecall. Following sequence assignment, Tombo uses dynamic programming for signal segmentation and proceeds to perform per-read signal normalization. As a general comment on the limitations of segmentation-based basecalling, Tombo is sensitive to the reference canonical base chosen for signal assignment.
- the per-read, median normalized level signal for each base is then extracted using the Tombo resquiggle results through the Tombo Python API. Details regarding how Tombo performs mapping, matching, and normalization, along with the Tombo Python API usage, can be found in the Tombo documentation (nanoporetech.github.io/tombo/).
- the resulting preprocessed and normalized signal-extracted data is exported to a CSV file for downstream processing (Tables 17, 18).
- the entire data preprocessing steps, including command groups and parameter settings, are wrapped into a single command (xenomorph preprocess) and available on the Xenomorph repository.
- XNA kmer model parameterization. NNNNNNN libraries for a given XNA base pair are prepared as previously described in “NNNNNNN library, validation library, and 12-letter DNA preparation by XNA tailing and XNA ligation” and sequenced
- Signal-to-sequence mapping is then performed using the previously described pipeline in “Raw nanopore data preprocessing and signal-to-sequence mapping” with the following specifications. Reads that do not map with full coverage of the triplet-barcodes and pool-barcodes of the XNA position are filtered out. Likewise, reads with a q-score ≤ 9 or a signal match score > 3 are not used in model building. Signal-to-sequence mapping is also carried out against blunt-end ligation products (i.e., NNNNNN, or no XNA insertion), so that reads that map better to blunt-end ligation products are not used.
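- The read-exclusion rules above can be expressed as a single predicate. In this sketch the `ReadSummary` record and its field names are hypothetical; the Tombo and Xenomorph outputs may store these values differently.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ReadSummary:
    # Hypothetical per-read fields; real pipeline outputs may name these differently.
    read_id: str
    q_score: float
    signal_match_score: float
    barcodes_fully_covered: bool
    better_match_to_blunt_end: bool


def reads_for_model_building(reads: Iterable[ReadSummary]) -> List[ReadSummary]:
    """Keep only reads that pass every exclusion criterion described above."""
    return [
        r for r in reads
        if r.barcodes_fully_covered
        and not r.better_match_to_blunt_end
        and r.q_score > 9              # reads with q-score <= 9 are excluded
        and r.signal_match_score <= 3  # reads with signal match score > 3 are excluded
    ]


reads = [
    ReadSummary("r1", 12.0, 1.5, True, False),  # kept
    ReadSummary("r2", 9.0, 1.0, True, False),   # q-score too low
    ReadSummary("r3", 14.0, 3.5, True, False),  # poor signal match
    ReadSummary("r4", 15.0, 1.0, True, True),   # maps better to a blunt-end product
]
print([r.read_id for r in reads_for_model_building(reads)])  # ['r1']
```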
- the 4-nt kmer was chosen in this disclosure as a proof of concept since reasonable kmer coverage could be obtained for the full NNNNNNN library (512 kmers per XNA base pair insertion) in a single Flongle flow cell run.
- each kmer consists of four nucleotide bases centered around the 0th-position nucleotide, as exemplified in Table 16. Therefore, each heptamer sequence (NNNNNNN) is composed of four 4-nt kmers (i.e. +2 pos NNNN, +1 pos NNNN, 0 pos NNNN, -1 pos NNNN).
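- The kmer counts quoted in this section follow from simple enumeration: one XNA can sit at any of 4 positions in a 4-nt kmer, with 4^3 = 64 standard-base flanks, giving 256 kmers per XNA, 512 per XNA base pair, and 9 × 256 = 2,304 across the nine XNAs. The sketch below (hypothetical helper; single-character stand-ins are used for S_n, S_c, X_t, and K_n) confirms this arithmetic.

```python
from itertools import product
from typing import List


def single_xna_kmers(xna: str, k: int = 4, standard: str = "ACGT") -> List[str]:
    """All k-mers containing exactly one copy of `xna` plus k-1 standard bases."""
    kmers = []
    for pos in range(k):  # position of the XNA within the k-mer
        for flank in product(standard, repeat=k - 1):
            bases = list(flank)
            bases.insert(pos, xna)
            kmers.append("".join(bases))
    return kmers


per_xna = len(single_xna_kmers("P"))                # 4 * 4**3 = 256
per_pair = per_xna + len(single_xna_kmers("Z"))     # 512 for a P:Z pair
# Nine XNAs; n, s, t, k stand in for S_n, S_c, X_t, K_n.
total = sum(len(single_xna_kmers(x)) for x in "BnsPZtkJV")
print(per_xna, per_pair, total)  # 256 512 2304
```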
- Observed kmer levels are modeled as normal distributions parameterized with a mean ($\mu_k$) and standard deviation ($\sigma_k$). These parameters are used to describe observed kmer signal level probability density functions:

$$P(I_{obs} \mid k) = \frac{1}{\sigma_k \sqrt{2\pi}} \exp\left(-\frac{(I_{obs} - \mu_k)^2}{2\sigma_k^2}\right)$$

where $P(I_{obs} \mid k)$ is the probability that $I_{obs}$ came from kmer 'k'; $\mu_k$ is the median normalized kmer level mean for kmer 'k'; $\sigma_k$ is the standard deviation of median normalized kmer levels for kmer 'k'; and $I_{obs}$ is the observed median normalized kmer level.
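- The density above translates directly into code. This is a plain-Python sketch with hypothetical function names; the log form is included because summing log-densities across several kmers avoids numerical underflow.

```python
import math


def kmer_level_pdf(i_obs: float, mu_k: float, sigma_k: float) -> float:
    """Normal density P(I_obs | k) for an observed median-normalized level."""
    if sigma_k <= 0:
        raise ValueError("sigma_k must be positive")
    z = (i_obs - mu_k) / sigma_k
    return math.exp(-0.5 * z * z) / (sigma_k * math.sqrt(2 * math.pi))


def kmer_level_logpdf(i_obs: float, mu_k: float, sigma_k: float) -> float:
    """Log of the same density; summing logs avoids underflow across many k-mers."""
    if sigma_k <= 0:
        raise ValueError("sigma_k must be positive")
    z = (i_obs - mu_k) / sigma_k
    return -0.5 * z * z - math.log(sigma_k * math.sqrt(2 * math.pi))
```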
- level model means were approximated using the following kmer-specific bandwidth selection (Silverman's rule of thumb):

$$BW_k = 0.9 \, \min\left(\sigma_k, \frac{IQR_k}{1.34}\right) n_k^{-1/5}$$

where $BW_k$ is the bandwidth used for the kernel density estimate; $IQR_k$ is the interquartile range of kmer levels for kmer 'k'; $\sigma_k$ is the standard deviation of median normalized kmer levels for kmer 'k'; and $n_k$ is the number of observations (measurements) of kmer 'k'. For practical purposes detailed in the Tombo documentation (github.com/nanoporetech/tombo), one can set a global standard deviation taken as the average observed standard deviation across all kmers in the model (i.e.
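- A NumPy sketch of the bandwidth rule above follows. The sample standard deviation (`ddof=1`), NumPy's default percentile interpolation, and the fallback to $\sigma_k$ when the IQR is zero are assumptions of this illustration rather than details stated in the source.

```python
import numpy as np


def silverman_bandwidth(levels) -> float:
    """Kernel bandwidth BW_k = 0.9 * min(sigma_k, IQR_k / 1.34) * n_k**(-1/5)."""
    x = np.asarray(levels, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations")
    sigma = x.std(ddof=1)                  # sample standard deviation
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25
    # If the IQR is zero (e.g., heavily tied data), fall back to sigma alone.
    spread = min(sigma, iqr / 1.34) if iqr > 0 else sigma
    return float(0.9 * spread * n ** (-1 / 5))


print(round(silverman_bandwidth(list(range(1, 9))), 4))
```

For the levels 1 through 8, the sample standard deviation (√6 ≈ 2.449) is smaller than IQR/1.34 (3.5/1.34 ≈ 2.612), so it sets the spread term.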
- kmer models. Documentation for model building and code used to generate kmer models can be found in the Xenomorph repository (github.com/xenobiolab/xenomorph). For quality control, the entire experimental and computational procedure, from building libraries to generating 4-nt kmer models, was performed in duplicate. Models were built from data collected in a single run. The
- For each heptamer sequence (NNNNNNN), a set of mapping kmer sequences (NNNN, NNNN, NNNN, NNNN) and observed signal levels (I_NNNN, I_NNNN, I_NNNN, I_NNNN) are extracted. See Table 16 for additional information on the numbering nomenclature of kmer sequences within a heptamer region.
- the kmer probability density function described previously in “XNA kmer model parameterization” is used to estimate the probability that each observed level came from the corresponding kmer (e.g.
- LLR: log-likelihood ratio
- an LLR > 0 is used as the default criterion for deciding whether the XNA model is more likely than an alternative model for a given observed sequence of signals.
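- The default LLR decision can be sketched as below, assuming every kmer shares one global standard deviation (as in the Tombo-style setting mentioned above). The function names and example values are hypothetical, and this is the plain LLR, not the ORLLR variant.

```python
import math
from typing import Sequence


def total_loglik(levels: Sequence[float], means: Sequence[float], sigma: float) -> float:
    """Sum of normal log-densities, one per k-mer level, with a shared (global) sigma."""
    if len(levels) != len(means):
        raise ValueError("levels and means must have the same length")
    norm = math.log(sigma * math.sqrt(2 * math.pi))
    return sum(-norm - (x - mu) ** 2 / (2 * sigma ** 2) for x, mu in zip(levels, means))


def llr(levels: Sequence[float], xna_means: Sequence[float],
        alt_means: Sequence[float], sigma: float) -> float:
    """log P(levels | XNA model) - log P(levels | alternative model)."""
    return total_loglik(levels, xna_means, sigma) - total_loglik(levels, alt_means, sigma)


def xna_model_preferred(levels, xna_means, alt_means, sigma: float) -> bool:
    return llr(levels, xna_means, alt_means, sigma) > 0  # default LLR > 0 criterion


# Hypothetical observed levels and model means for one heptamer (four k-mers).
obs = [0.45, 0.25, -0.05, 0.35]
xna = [0.50, 0.20, -0.10, 0.30]
alt = [0.90, 0.80, 0.60, 0.70]
print(xna_model_preferred(obs, xna, alt, sigma=0.1))  # True
```

With a shared sigma, the normalization constants cancel, so the LLR depends only on how much closer the observed levels are to one set of means than to the other.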
- ORLLR (outlier-robust log-likelihood ratio) is a modified LLR test statistic that is nominally more robust to outliers.
- the ORLLR test statistic is defined over the sequence using the following quantities: the median normalized kmer level for each of the two kmers being compared, the scale difference between them, and the global standard deviation of median normalized kmer levels.
- Consensus recall and specificity are calculated from sequence-level assignments (rather than per-read assignments). Specificity of kmer models was calculated by alternative hypothesis testing on sequences that did not contain any XNAs. The definition of each statistic is provided below.
- $\text{recall} = \frac{TP}{TP + FN}$, where TP = true positives and FN = false negatives.
- $\text{specificity} = 1 - \text{FDR} = 1 - \frac{FP}{FP + TN}$, where FP = false positives, TN = true negatives, and FDR = false discovery rate.
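The two statistics above can be computed from boolean XNA calls with a short sketch like the following (helper names are hypothetical). For per-read values each entry is one read; for consensus values each entry is one sequence's consensus call.

```python
from typing import Sequence, Tuple


def confusion_counts(truth: Sequence[bool], called: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN) for binary XNA calls (True = XNA present / called)."""
    tp = fn = fp = tn = 0
    for t, c in zip(truth, called):
        if t and c:
            tp += 1
        elif t:
            fn += 1
        elif c:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def specificity(fp: int, tn: int) -> float:
    # 1 - FP / (FP + TN), following the definition above.
    return 1.0 - fp / (fp + tn)
```

With truth `[True, True, True, False, False]` and calls `[True, True, False, True, False]`, the counts are TP = 2, FN = 1, FP = 1, TN = 1, giving a recall of 2/3 and a specificity of 0.5.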
- Receiver operating characteristic (ROC) curves were generated using the roc_curve function from the scikit-learn Python library.
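A minimal sketch of how `sklearn.metrics.roc_curve` can be applied to LLR scores, assuming binary labels (1 = the read truly carries the XNA) and the LLR as the score for the positive class. The wrapper name and the toy data are hypothetical.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve


def roc_from_llr(is_xna, llr_scores):
    """ROC curve and AUC, treating the LLR as the score for the XNA class."""
    fpr, tpr, thresholds = roc_curve(np.asarray(is_xna), np.asarray(llr_scores))
    return fpr, tpr, thresholds, auc(fpr, tpr)


# Hypothetical toy data: 1 = the read truly carries the XNA at the tested position.
labels = [0, 0, 1, 1]
scores = [-1.2, 0.4, 0.3, 2.5]
fpr, tpr, thresholds, roc_auc = roc_from_llr(labels, scores)
```

Sweeping the threshold instead of fixing it at LLR = 0 shows how recall trades off against the false-positive rate; in this toy example one negative read outscores one positive read, so the AUC is 0.75.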
- 3915-P1293WO.UW -56- contained XNA bases flanked by 20 randomly chosen canonical bases. Recall on the validation set was calculated at the per-read and consensus level as described previously in “Recall and specificity calculations.”
- PCR amplification and basecalling of P-Z template DNA: Two complementary oligos containing P and Z (PCR_Template_P, PCR_Template_Z, Table 22) were synthesized by Firebird Biomolecular Sciences (Alachua, FL) and hybridized in a 1:1 molar ratio. 25 ng of this hybridized PZ DNA construct was used as the template for a PCR reaction.
- PCR reactions contained 0.2 µM of each forward and reverse primer (PCR_Amp_F, PCR_Amp_R1-4, Table 22) and 5 U/µL of Taq polymerase in 1X ThermoPol buffer (pH 8.0). Triphosphate concentrations for dxNTPs and dNTPs varied by condition (no dxNTP, limiting, equimolar, optimal) and are tabulated in FIGs 6TA-6TC. The PCR reaction then proceeded with the thermocycler conditions tabulated in Table 23. PCR reactions were purified using the Zymo DNA Clean and Concentrator and eluted in 30 µL of nuclease-free water.
- The Xenomorph XNA sequencing pipeline: One of the goals of this disclosure was to build a publicly available end-to-end pipeline for validating XNA incorporation in target sequences. As a proof of concept, one can create a Python tool called “Xenomorph” that comprises a two-step pipeline: 1) preprocessing (xenomorph preprocess) and 2) alternative hypothesis testing (xenomorph morph).
- Xenomorph runs raw FAST5 data through the preprocessing pipeline, with an additional FASTA-handling modification that allows users to input reference sequences containing XNA base pairs. Outputs of the preprocessing steps are provided in a .csv file (see Table 17 for header descriptions), which is used as the input for xenomorph morph.
- Xenomorph uses the XNA base pairs found in the input reference sequence to perform LLR or ORLLR testing against user-defined alternatives. For example, for a sequence containing the bases A, T, G, C, B, and Sn, users can test the most likely base at the XNA position against:
  - the most similar canonical base (e.g., B vs A);
  - purines or pyrimidines;
  - the canonical bases (e.g., B vs A, T, G, C); or
  - all bases (e.g., B vs A, T, G, C, Sn).
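The four kinds of alternative listed above can be sketched as candidate sets. The mode names, the purine/pyrimidine grouping (B treated as a purine analogue, Sn as a pyrimidine analogue), and the similarity table are assumptions for illustration; they are not Xenomorph's command-line options, and only the B vs A pairing comes from the text.

```python
from typing import List, Sequence

# Assumed groupings for the isoG/isoC (B/Sn) alphabet.
PURINES = {"A", "G", "B"}
PYRIMIDINES = {"T", "C", "Sn"}
CANONICAL = ["A", "T", "G", "C"]
# Only B -> A is given in the text; other entries would have to be supplied.
MOST_SIMILAR = {"B": "A"}


def alternative_bases(xna: str, alphabet: Sequence[str], mode: str) -> List[str]:
    """Bases tested against `xna` under one of four hypothetically named modes."""
    if mode == "similar":
        return [MOST_SIMILAR[xna]]
    if mode == "purine_pyrimidine":
        group = PURINES if xna in PURINES else PYRIMIDINES
        return [b for b in alphabet if b in group and b != xna]
    if mode == "canonical":
        return list(CANONICAL)
    if mode == "all":
        return [b for b in alphabet if b != xna]
    raise ValueError(f"unknown mode: {mode}")
```

Each returned set would then be scored against the XNA hypothesis with the LLR or ORLLR statistic, and the best-scoring base is reported.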
- Alternative hypothesis testing can be performed on a per-read basis or a global basis.
- XNA kmer models generated in this disclosure are built in and can be viewed using xenomorph models. Model compilation is performed ad hoc, allowing users to experiment with kmer models.
- Outputs for alternative hypothesis testing are provided as a .csv file (see Table 18 for header description).
- kmer models are inherently independent (i.e. signal observations of NNNBNNN are independent of NNNSNNN observations) and therefore modular.
- Xenomorph was built to be flexible, allowing users to add more kmer models or modify them, and straightforward, requiring two commands to go from raw nanopore data to XNA-refined sequences.
- A graphical overview of the preprocessing pipeline can be found in FIG. 6S.
- Xenomorph can be found in the Xenomorph repository (github.com/xenobiolab/xenomorph) alongside all code, documentation, and parameters used in this disclosure.
- building and basecalling can be downloaded from the SRA Bioproject PRJNA932328 [https://www.ncbi.nlm.nih.gov/bioproject/PRJNA932328]. An additional overview of how the Xenomorph pipeline performs XNA basecalling is found in Note 1. [0164] Data availability: Models measured in this disclosure and used for basecalling are provided in Data Table 1 and can also be found in the Xenomorph GitHub repository (github.com/xenobiolab/xenomorph/tree/main/models).
- the raw nanopore sequences (FAST5) and guppy basecalls (FASTQ) used in this disclosure to build models, validate models, and test 12-letter DNA sequencing have been deposited in the Sequence Read Archive (SRA) under Bioproject PRJNA932328 [https://www.ncbi.nlm.nih.gov/bioproject/PRJNA932328] and can be accessed without restriction (Table 31).
- Raw nanopore data for PZ PCR amplification experiments (FIGs 6TA-6TC) are available under restricted access, as this data was collected in a pooled nanopore run and contains additional data. Full sequences for hairpin libraries purchased for this work can be produced based on this disclosure. Additional source data can be produced based on this disclosure.
- Code availability: Code for end-to-end processing of nanopore reads and basecalling of the xenonucleotides described in this example can be produced based on this disclosure.
- Information for Example 1 Enzymatic Synthesis and Nanopore Sequencing of 12-letter Supernumerary DNA
- Methods [0168] Organic synthesis of dXtTP: 8-(2′-Deoxy-β-D-erythro-pentofuranosyl)imidazo[1,2-a]-s-triazin-2,4-dione 5′-triphosphate.
- xenonucleobases (B, Sn, Sc, P, Z, Xt, Kn, J, V) are integrated for selection.
- the pipeline, as built, also allows users to generate their own models.
- Basecalling can be performed either per-read or per-sequence (global). In per-read basecalling, individual reads are basecalled, while in per-sequence basecalling, the signals of all reads that match a sequence are averaged before a global call is made.
- the per-read consensus is defined as the most frequent basecall among all reads that match a certain sequence.
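The difference between per-read consensus and per-sequence (global) calling can be sketched as follows. The caller here is a hypothetical stand-in that picks the candidate whose expected kmer levels best fit the signal under a shared Gaussian standard deviation; it is not Xenomorph's implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence


def call_base(levels: Sequence[float], candidates: Dict[str, Sequence[float]], sd: float) -> str:
    """Return the candidate base whose expected kmer levels best fit `levels`
    (maximum Gaussian log-likelihood with one shared standard deviation)."""
    def log_lik(means: Sequence[float]) -> float:
        return -sum((x - m) ** 2 for x, m in zip(levels, means)) / (2 * sd ** 2)
    return max(candidates, key=lambda base: log_lik(candidates[base]))


def per_read_consensus(reads: List[Sequence[float]], candidates: Dict[str, Sequence[float]], sd: float) -> str:
    """Call each read separately, then take the most frequent call."""
    calls = [call_base(r, candidates, sd) for r in reads]
    return Counter(calls).most_common(1)[0][0]


def global_call(reads: List[Sequence[float]], candidates: Dict[str, Sequence[float]], sd: float) -> str:
    """Average signal levels across all matching reads, then make a single call."""
    n = len(reads)
    averaged = [sum(r[i] for r in reads) / n for i in range(len(reads[0]))]
    return call_base(averaged, candidates, sd)
```

Averaging first lets many noisy reads contribute to one call, while the per-read consensus keeps each read's call separate and simply counts them.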
- 4-nt kmer models are parameterized with a kmer mean ($\mu_k$) and a kmer variance ($\sigma_k^2$). Users can set $\mu_k$ to experimentally measured signal means, signal medians, or means from kernel density estimates. Options for $\sigma_k^2$ are either the kmer-specific measured variance or a fixed global variance. The set of bases to use in the model can also be specified. As described, basecalling in this disclosure uses signal means for $\mu_k$ and the global average kmer variance for $\sigma_k^2$. [0207] Full code and documentation of Xenomorph are available on GitHub. Sample data, such as the FAST5 data generated in this disclosure, can be found in the SRA under Bioproject PRJNA932328 (Table 31). [0208] Note 2.
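A minimal sketch of the parameterization choices above, using only the Python standard library. `build_kmer_model` is a hypothetical name, the KDE-based option is omitted, and the global value is taken as the square of the average per-kmer standard deviation (following the Tombo-style global standard deviation described earlier), which is an assumption about how the "global average kmer variance" is formed.

```python
import statistics
from typing import Dict, Sequence, Tuple


def build_kmer_model(
    levels_by_kmer: Dict[str, Sequence[float]],
    center: str = "mean",
    spread: str = "global",
) -> Dict[str, Tuple[float, float]]:
    """Return {kmer: (mu_k, sigma_k^2)} from observed median-normalized levels.

    center: "mean" or "median".
    spread: "kmer" for kmer-specific variance, or "global" for one shared value.
    """
    mus, sds = {}, {}
    for kmer, levels in levels_by_kmer.items():
        if center == "mean":
            mus[kmer] = statistics.fmean(levels)
        elif center == "median":
            mus[kmer] = statistics.median(levels)
        else:
            raise ValueError(f"unknown center: {center}")
        sds[kmer] = statistics.stdev(levels)   # needs >= 2 observations per kmer
    if spread == "global":
        shared = statistics.fmean(sds.values())
        sds = {kmer: shared for kmer in sds}
    elif spread != "kmer":
        raise ValueError(f"unknown spread: {spread}")
    return {kmer: (mus[kmer], sds[kmer] ** 2) for kmer in mus}
```

Because kmer models are independent, models built separately for different XNAs could be combined with a plain dictionary merge such as `{**model_b, **model_sn}`.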
- Each hairpin pool contains 10 unique sequences. Ligating two hairpin pools together generates a final library of 100 possible sequence combinations (10 x 10).
- the table shows constant regions for all oligos in each pool (black), with regions in brackets (blue, bold) being replaced with their corresponding sequence elements from Tables 4-6. ‘-F’ and ‘-R’ are used to note forward and reverse sequences of different components after the hairpin is folded.
- NNN denotes the 3 randomized bases at the end of the hairpins
- [NNN-BC] i.e., Triplet-barcode
- [Pool-BC] i.e., Pool-barcode
- Regions highlighted in red denote the restriction-site sequence differences between HP_v1 and HP_v2 and between HP1 and HP2. All sequences are shown in the 5′ to 3′ direction.
- Full hairpin sequences purchased for this disclosure can be produced based on this disclosure.
- Triplet-barcode sequences: sequences of the Triplet-barcodes and the NNN sequences they are assigned to.
- the Triplet-barcode is a 24 nt sequence that is distal to the 3′-NNN end in each hairpin and is used to assign the true identity of the 3′-NNN bases that flank XNA insertions (Fig. 3a).
- N = A, T, G, or C, giving 4^3 = 64 NNN combinations.
- Barcode sequences were chosen from Oxford Nanopore Technologies' list of barcodes for long-read sequencing.
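The 64 NNN contexts and their barcode assignment can be sketched as an enumeration plus a lookup table. The barcode IDs below are hypothetical placeholders; the real 24-nt Oxford Nanopore barcode sequences are not reproduced.

```python
from itertools import product
from typing import Dict, List


def nnn_contexts(alphabet: str = "ATGC") -> List[str]:
    """All 4**3 = 64 possible 3-nt contexts (NNN)."""
    return ["".join(p) for p in product(alphabet, repeat=3)]


def assign_barcodes(contexts: List[str]) -> Dict[str, str]:
    """Map placeholder barcode IDs (hypothetical, not real barcode names) to NNN contexts."""
    return {f"NNN-BC{i + 1:02d}": ctx for i, ctx in enumerate(contexts)}


barcode_to_nnn = assign_barcodes(nnn_contexts())
```

Once a read's Triplet-barcode has been identified, recovering the true identity of the flanking NNN bases is a single dictionary lookup, for example `barcode_to_nnn["NNN-BC01"]`.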
- Barcode sequences are shown in 5′ to 3′ direction.
- the Triplet-barcode (abbreviated as [NNN-BC]) and NNN sequences used to construct the HP_v1-NNN-[Pool-ID] and HP_v2-NNN-[Pool-ID] hairpin sequences shown in Table 3, by insertion into the [NNN-BC] and [NNN] regions, respectively.
- Full sequences of all hairpins used for model generation can be produced based on this disclosure.
- Validation pool sequences were randomly generated and intended to provide sequence diversity (±20 nt surrounding an XNA nucleotide) much greater than that present in the model-training NNN pools.
- the smaller library size (100 sequences per ligated pool) and richer sequence diversity made it possible to multiplex all the validation sets while still obtaining sufficient coverage for calculating appropriate statistics.
- Validation pool sequences are a subset of the HP1-[VAL-ID] and HP2-[VAL-ID] hairpin sequences shown in Table 3. Sequences are shown in the 5′ to 3′ direction. Full sequences of the hairpins ordered, alongside the ligation products generated, can be produced based on this disclosure.
- The table shows the barcode for each oligo, which links to the variable 3-nt sequence on the 3′-end and to the xenonucleotide tailed onto the 3′-end (bold), as well as restriction site sequences (red, bold). Sequences are shown in the 5′ to 3′ direction.
- Primer sequences are used to amplify the template: each condition used a different barcoded reverse primer (PCR_Amp_R1: Equimolar; PCR_Amp_R2: Optimal; PCR_Amp_R3: No dxNTP; PCR_Amp_R4: Limiting). All conditions used the same forward primer (PCR_Amp_F). Sequences are shown in the 5′ to 3′ direction.
- The table shows: (left) the fraction of each base called at each xenonucleotide position using the full 12-letter supernumerary model; (right) the base called using a model with simplified priors, with separate symbols marking positions where the xenonucleotide was called and positions where the most similar standard base was called instead. The box highlights the base pair chosen by picking the most likely nucleobase among any purine or pyrimidine set and then fixing the complementary base. [Table: base called at Sc positions, Super-12 model]
- The table shows: (left) the fraction of each base called at each xenonucleotide position using the full 12-letter supernumerary model; (right) the base called using a model with simplified priors, with separate symbols marking positions where the xenonucleotide was called and positions where the most similar standard base was called instead. The box highlights the base pair chosen by picking the most likely nucleobase among any purine or pyrimidine set and then fixing the complementary base. [Table: base called at Sn positions, Super-12 model]
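One reading of the pair-selection rule in these captions (pick the single most likely nucleobase at either position of the base pair, then assign its complement to the other strand) can be sketched as below. Both this interpretation and the pairing table, which is limited to the standard pairs plus B:Sn, are assumptions for illustration.

```python
from typing import Dict, Tuple

# Pairing rules assumed for this sketch: standard pairs plus the B:Sn isoG/isoC pair.
PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G", "B": "Sn", "Sn": "B"}


def call_base_pair(top: Dict[str, float], bottom: Dict[str, float]) -> Tuple[str, str]:
    """Pick the single most likely base across both strands, then fix its partner.

    `top` and `bottom` map candidate bases to call fractions (or probabilities) at the
    two positions of one base pair. Returns (top_base, bottom_base).
    """
    best_top = max(top, key=top.get)
    best_bottom = max(bottom, key=bottom.get)
    if top[best_top] >= bottom[best_bottom]:
        return best_top, PARTNER[best_top]
    return PARTNER[best_bottom], best_bottom
```

Fixing the complement from the stronger call means a confident call on one strand can override a weaker, possibly mis-called base on the opposite strand.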
- Table 27: Tabulation of per-read recall from simulated signal levels for the standard genetic code (A, T, G, C). Information regarding read simulation can be found in the Note section.
- Table 28: Tabulation of per-read recall from simulated signal levels for the isoG/isoC code (A, T, G, C, B, Sn).
- a method for generating an N+1 tailing product comprising a non-standard nucleotide that is covalently bound with a 3’ end of a precursor double-stranded DNA (dsDNA) template and is non-base-paired, the method comprising: combining the precursor dsDNA template with a DNA polymerase and a non-standard deoxyribonucleotide triphosphate (dNTP) under a reaction condition conducive to a blunt-end N+1 addition of the non-standard nucleotide to the 3’ end of the precursor dsDNA template by the DNA polymerase.
- dNTP non-standard deoxyribonucleotide triphosphate
- Embodiment 2 The method of Embodiment 1 or any other Embodiment, wherein the non-standard nucleotide is a xenonucleotide (XNA) and the non-standard dNTP is a deoxy-xeno-ribonucleotide triphosphate (dxNTP).
- XNA xenonucleotide
- dxNTP deoxy-xeno-ribonucleotide triphosphate
- Embodiment 3 The method of Embodiment 1 or any other Embodiment, wherein the DNA polymerase comprises a polypeptide sequence of a small Klenow Fragment (KF exo-) of DNA Polymerase I.
- Embodiment 4 The method of Embodiment 3 or any other Embodiment, wherein the polypeptide sequence comprises a sequence of SEQ ID NO:2.
- Embodiment 5. The method of any of Embodiments 3-4 or any other Embodiment, wherein the non-standard
- Embodiment 6 The method of Embodiment 1 or any other Embodiment, wherein the DNA polymerase comprises a polypeptide sequence of an engineered polymerase from a hyperthermophilic marine archaeon.
- Embodiment 7 The method of Embodiment 6 or any other Embodiment, wherein the engineered polymerase is a variant of 9°N DNA polymerase.
- Embodiment 9 The method of any of Embodiments 6-8 or any other Embodiment, wherein the non-standard nucleotide is selected from Sn, Sc, Z, Xt, Kn, J, and V, and the reaction condition proceeds at about 60°C for between about 4-16 hours and comprises about 0.29 U/µL of the DNA polymerase and about 1.19 mM of the non-standard dNTP.
- Embodiment 10 A method for generating a base pair of two nucleotides of a polynucleotide, wherein at least one nucleotide of the two nucleotides is a non-standard nucleotide.
- Embodiment 11 The method of Embodiment 10 or any other Embodiment, comprising the method of any of Embodiments 1-9 or any other Embodiment.
- Embodiment 12 The method of any of Embodiments 10-11 or any other Embodiment, comprising: generating a second N+1 tailing product comprising a second non-standard nucleotide that is base-pair complementary with the non-standard nucleotide, wherein the second non-standard nucleotide is non-base-paired; and ligating the N+1 tailing product with the second N+1 tailing product to form a dsDNA ligation product that comprises a base pair between the non-standard nucleotide and the second non-standard nucleotide.
- Embodiment 13 The method of any of Embodiments 10-12 or any other Embodiment, wherein the N+1 tailing product comprises a hairpin.
- Embodiment 14 The method of any of Embodiments 10-13 or any other Embodiment, wherein the second N+1 tailing product comprises a hairpin.
- Embodiment 15 The method of Embodiment 14 or any other Embodiment, wherein the dsDNA ligation product does not comprise a free 5’ end or a free 3’ end.
- Embodiment 16 The method of any of Embodiments 12-15 or any other Embodiment, comprising: contacting the dsDNA ligation product with a type IIS restriction enzyme under a reaction condition conducive for the type IIS restriction enzyme to cleave the dsDNA ligation product to generate a blunt-end DNA template that comprises the base pair between the non-standard nucleotide and the second non-standard nucleotide.
- Embodiment 17 The method of Embodiment 16 or any other Embodiment, wherein the method is performed a plurality of times for creation of a plurality of base pairs between a plurality of non-standard nucleotides and a plurality of second non-standard nucleotides as sequence elements of a further dsDNA ligation product.
- Embodiment 18 The method of Embodiment 17 or any other Embodiment, comprising: contacting the further dsDNA ligation product with a type IIS restriction enzyme under a reaction condition conducive for the type IIS restriction enzyme to cleave the further dsDNA ligation product to generate a further blunt-end DNA template that comprises the plurality of base pairs between the plurality of non-standard nucleotides and the plurality of second non-standard nucleotides.
- Embodiment 20 The method of Embodiment 19 or any other Embodiment, wherein the non-standard nucleotide is the nucleobase that is configured to hydrogen bond to the second base and the second base is a standard base or a non-standard base.
- Embodiment 21 The method of Embodiment 19 or any other Embodiment, wherein the non-standard nucleotide is the nucleobase that can base pair (without hydrogen bonding) to the second base and the second base is a standard base or a non-standard base.
- Embodiment 22 The method of Embodiment 19 or any other Embodiment, wherein the non-standard nucleotide comprises an epigenetic modification or is 4-methyl-cytosine, 5-methyl cytosine, 6-methyl adenosine, 5-hydroxymethyl cytosine, 7-methylguanosine, or N6-methyladenosine.
- Embodiment 23 The method of Embodiment 19 or any other Embodiment, wherein the non-standard nucleotide comprises the chemical modification and the chemical modification comprises a fluorophore, a biotin, a terminal alkyne, an azide, a cyclooctyne, a tetrazine, a terminal alkene, a phosphine, a halo-alkane, an aldehyde, a thiol, a transition metal complex, another reactive handle, or any combination thereof.
- Embodiment 24 A dsDNA ligation product produced by the method of any of Embodiments 12-23 or any other Embodiment.
- Embodiment 25 A further dsDNA ligation product produced by the method of any of Embodiments 17-23 or any other Embodiment.
- Embodiment 26 A blunt-end dsDNA template produced by the method of any of Embodiments 16-23 or any other Embodiment.
- Embodiment 27 A further blunt-end dsDNA template produced by the method of any of Embodiments 18-23 or any other Embodiment.
- Embodiment 28 A defined non-standard nucleotide base pair library comprising a library polynucleotide sequence of the dsDNA ligation product of Embodiment 24 or any other Embodiment or the blunt-end dsDNA template of Embodiment 26 or any other Embodiment, wherein the library polynucleotide sequence comprises the base pair between the non-standard nucleotide and the second non-standard nucleotide.
- Embodiment 29 A defined non-standard nucleotide base pair library comprising a library polynucleotide sequence of the further dsDNA ligation product of Embodiment 25 or any other Embodiment or the further blunt-end dsDNA template of Embodiment 27 or any other Embodiment, wherein the library polynucleotide sequence
- Embodiment 30 The defined non-standard nucleotide base pair library of any of Embodiments 28-29 or any other Embodiment, wherein the library polynucleotide sequence further comprises: a context barcode associated with a sequence context adjacent to a base pair of a non-standard nucleotide and a second non-standard nucleotide of the library polynucleotide sequence; and a pool barcode associated with the non-standard nucleotide, the second non-standard nucleotide, or both.
- Embodiment 31 A method for generating a machine learning (ML) model that correlates one or more observed current reads with an unknown non-standard nucleotide for assignment of an identity to the unknown non-standard nucleotide, the method comprising: sequencing, with a nanopore sequencing method, the defined non-standard nucleotide base pair library of any of Embodiments 28-30 or any other Embodiment to produce the one or more observed current reads; and training, with a ML algorithm, the ML model to associate the one or more observed current reads with a known identity of a defined non-standard nucleotide of the defined non-standard nucleotide base pair library of any of Embodiments 28-30 or any other Embodiment, wherein the ML model is configured to assign the identity to the unknown non-standard nucleotide based on the known identity of the defined non-standard nucleotide.
- ML machine learning
- Embodiment 32 The method of Embodiment 31 or any other Embodiment, wherein the ML model comprises a convolutional long short-term memory recurrent neural network (LSTM RNN).
- Embodiment 33 A non-transitory computer-readable storage medium having stored thereon at least part of a ML model produced by any of Embodiments 31-32 or any other Embodiment.
- Embodiment 34 A computational device or computational system comprising the non-transitory computer-readable storage medium of Embodiment 33 or any other Embodiment.
- Embodiment 35 Embodiment 35.
- Embodiment 36 A method for basecalling a non-standard nucleotide expanded alphabet, the method comprising: sequencing, with a nanopore sequencing
- Embodiment 37 A circuitry configured to perform all or part of the method of Embodiment 36 or any other Embodiment.
- Embodiment 38 A circuitry configured to perform all or part of the method of Embodiment 36 or any other Embodiment.
- Embodiment 39 A nanopore sequencing kit, device, or system comprising the circuitry of Embodiment 37 or any other Embodiment.
- Embodiment 40 A nanopore sequencing kit, device, or system comprising the circuitry of Embodiment 38 or any other Embodiment.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Wood Science & Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Zoology (AREA)
- Molecular Biology (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Microbiology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Chemical Kinetics & Catalysis (AREA)
- General Chemical & Material Sciences (AREA)
- Medicinal Chemistry (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Plant Pathology (AREA)
- Crystallography & Structural Chemistry (AREA)
- Immunology (AREA)
- Library & Information Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Systems and methods for generating defined base pairs in a sequence-defined library format that includes at least one non-standard nucleotide in a base pair. The non-standard base pairs can be created by using a nucleic acid polymerase (e.g., a DNA polymerase, an RNA polymerase, or a terminal deoxynucleotidyl transferase) for blunt-end tailing with the non-standard base, which can then be ligated to another nucleotide end. Nucleotide sequences containing non-standard base pairs can be used to generate libraries for models used to basecall the non-standard base on next-generation sequencing (NGS) platforms and to sequence non-standard nucleotide sequences, including xenonucleotides (XNA).
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202363483926P | 2023-02-08 | 2023-02-08 | |
US63/483,926 | 2023-02-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024168196A1 (fr) | 2024-08-15 |
Family
ID=92263538
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2024/015068 | WO2024168196A1 (fr): Systems and methods for enzymatic synthesis of polynucleotides containing non-standard nucleotide base pairs | 2023-02-08 | 2024-02-08 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024168196A1 (fr) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150133320A1 (en) * | 1997-04-01 | 2015-05-14 | Illumina, Inc. | Method of nucleic acid amplification |
WO2019081680A1 (fr) * | 2017-10-25 | 2019-05-02 | Institut Pasteur | Immobilization of nucleic acids using an enzymatic histidine-tag mimetic for diagnostic applications |
US20200263218A1 (en) * | 2017-10-04 | 2020-08-20 | Centrillion Technology Holdings Corporation | Method and system for enzymatic synthesis of oligonucleotides |
US20200392572A1 (en) * | 2017-12-21 | 2020-12-17 | Curevac Ag | Linear double stranded dna coupled to a single support or a tag and methods for producing said linear double stranded dna |
US10934569B1 (en) * | 2018-12-20 | 2021-03-02 | Nicole A Leal | Enzymatic processes for synthesizing RNA containing certain non-standard nucleotides |
US20210171920A1 (en) * | 2015-10-29 | 2021-06-10 | Temple University-Of The Commonwealth System Of Higher Education | Modification of 3' Terminal Ends of Nucleic Acids by DNA Polymerase Theta |
US20210355519A1 (en) * | 2020-05-15 | 2021-11-18 | Codex Dna, Inc. | Demand synthesis of polynucleotide sequences |
-
2024
- 2024-02-08 WO PCT/US2024/015068 patent/WO2024168196A1/fr unknown
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24754101; Country of ref document: EP; Kind code of ref document: A1 |