WO2024086596A1 - Polypeptide fusions or conjugates for gene editing - Google Patents
Polypeptide fusions or conjugates for gene editing
- Publication number
- WO2024086596A1 (application PCT/US2023/077111)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dna
- polypeptide
- domain
- site
- composition
- Prior art date
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/90—Isomerases (5.)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y599/00—Other isomerases (5.99)
- C12Y599/01—Other isomerases (5.99.1)
- C12Y599/01002—DNA topoisomerase (5.99.1.2)
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
- C12N9/1241—Nucleotidyltransferases (2.7.7)
- C12N9/1252—DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y207/00—Transferases transferring phosphorus-containing groups (2.7)
- C12Y207/07—Nucleotidyltransferases (2.7.7)
- C12Y207/07007—DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
Definitions
- Programmable nucleases such as CRISPR-associated Cas endonucleases have revolutionized the ability to perform gene editing in organisms in a precise, site-directed manner.
- the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first fragment comprising WED and RECI domains of a first Casl2 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas 12 polypeptide, wherein said first and second Cas 12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site.
- a fusion protein comprising: (a) a first fragment comprising WED and RECI domains of a first Casl2 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas 12 polypeptide, wherein said first and second Cas 12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site.
- composition comprising a fusion protein comprising: (a) a first segment comprising a Casl2 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (i) a sequence comprising WED and RECI domains of a first Casl2 polypeptide; or (ii) a sequence comprising RuvC, REC2, and Nuc domains of a second Casl2 polypeptide; and (c) a third segment comprising a heterologous domain of at least about 100 amino acids.
- the present disclosure provides for a fusion protein having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any of the polypeptides described herein.
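The identity thresholds above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and counts identical columns; the function names `percent_identity` and `meets_threshold` are hypothetical, and the disclosure does not state in this passage which alignment algorithm or identity definition is intended.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over two pre-aligned, equal-length sequences.

    Assumption: '-' marks an alignment gap; a column counts as a match only
    when both residues are identical and neither is a gap.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns where both sequences carry a gap
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(seq_a: str, seq_b: str, threshold: float = 80.0) -> bool:
    """True when the aligned pair reaches the given percent-identity cutoff."""
    return percent_identity(seq_a, seq_b) >= threshold
```

Under this definition, a ten-residue pair differing at one position scores 90% and clears the 80% cutoff, while a four-column pair with one gap scores 75% and does not.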
- the present disclosure provides for a method of editing a nucleic acid site in a cell, comprising contacting said cell with any of the compositions described herein.
- the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to said site (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptides are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with
- the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to said site (i) a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (c) a third segment comprising a heterologous domain of at least about 100 amino acids; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site; and (iii
- the present disclosure provides for a kit for disrupting a DNA site, comprising (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptides are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; and (ii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.
- the present disclosure provides for a kit for disrupting a DNA site, comprising (i) a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (ii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.
- the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains derived from a Cas12 enzyme; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains from a Cas12 enzyme, wherein said Cas12 enzyme is configured to bind a double-stranded deoxyribonucleic acid (DNA) site.
- said second fragment further comprises a REC2 domain derived from a Cas12 enzyme.
- said Cas12 enzyme is a Class 2, Type V-F or a Cas12f enzyme.
- said heterologous domain comprises at least about 10, 25, 50, 75, or 100 to about 1500 amino acids in length. In some embodiments, said heterologous domain comprises at least about 876 amino acids or at least about 900 amino acids. In some embodiments, said heterologous domain comprises at most about 10, 25, 50, 75, 100, 250, 500, 750, 1000, 1200, 1500, 1700, 2000, 2200, 2500, 2700, or 3000 amino acids in length. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, said domain with DNA-dependent DNA polymerase activity or said domain with Topoisomerase activity does not comprise inactivating mutations in an active site residue.
- said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a T7 DNA polymerase, Bst polymerase or an analog thereof, T4 DNA polymerase, Taq polymerase, Vent polymerase, Q5 polymerase, Klenow fragment, DNA polymerase theta, or Phi29 polymerase, or a functional fragment or derivative thereof.
- said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 44-52, or a variant thereof.
- said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type I topoisomerase domain.
- said Type I topoisomerase domain comprises E. coli Eubacterial DNA topoisomerase I, E. coli Eubacterial DNA topoisomerase III, S. cerevisiae Yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, or M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, or a functional fragment thereof.
- said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type II topoisomerase domain.
- said type II topoisomerase domain comprises E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof.
- said heterologous domain comprises a domain with topoisomerase activity and said heterologous domain comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 53-55.
- said first fragment and said second fragment are derived from a same Cas12 enzyme.
- said first fragment and said second fragment are derived from a different Cas12 enzyme.
- said first fragment and said second fragment do not comprise an inactivating mutation in an active site residue of said Cas12 enzyme.
- said first fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 1, 5, 13, 15, or 24-34, or a variant thereof.
- said second fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 2, 6, 11, or 35-43.
- said composition further comprises an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site.
- said region with complementarity to a region 5' to said nucleic acid site or said region with complementarity to a region 3' to said nucleic acid site comprises at least 4 to 30 bp or at least 4 to 400 bp.
- said insert nucleic acid sequence comprises at least about 1 bp to at least about 20 kb.
- said insert nucleic acid sequence comprises at least about 100 bp, 250 bp, 500 bp, 750 bp, 1 kb, 1.2 kb, 1.5 kb, 1.7 kb, 2.0 kb, 2.5 kb, 3 kb, 3.5 kb, 4 kb, 4.5 kb, 5 kb, 6 kb, 6.5 kb, 7 kb, 7.5 kb, 8 kb, 8.5 kb, 9 kb, 9.5 kb, 10 kb, 11 kb, 12 kb, 13 kb, 14 kb, 15 kb, 16 kb, 17 kb, 18 kb, 19 kb, 20 kb, or any range between these values.
- said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
- said insert DNA molecule is: (i) linked to said programmable nuclease; (ii) linked to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme; or (iii) hybridized to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme.
- said composition further comprises a guide polynucleotide configured to interact with said Cas protein, wherein (a) said guide polynucleotide further comprises a hybridization domain at a 3' end; and (b) wherein said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.
- said insert DNA molecule comprises a region with complementarity to a region 5' to said double-stranded DNA site at said 5' end of said insert DNA.
- said insert DNA molecule is linked to a catalytic hydroxyl group of said domain having DNA topoisomerase activity at a first end, and wherein said insert DNA molecule comprises said region homologous to a region 5' to said nucleic acid site or said region homologous to a region 3' to said nucleic acid site at a second end.
- said composition further comprises a linker between said first fragment and said heterologous domain, or between said heterologous domain and said second fragment.
- said linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof.
- said linker comprises LPXTG (SEQ ID NO: 79), GGG (SEQ ID NO: 80), (GGG)n (SEQ ID NO: 81), (GGGGS)n (SEQ ID NO: 82), (GGGS)n (SEQ ID NO: 83), N1-7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.
- a composition comprising a fusion protein comprising: (a) a first segment comprising a Cas12 enzyme configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (i) a sequence comprising WED and REC1 domains derived from a Cas12 enzyme; or (ii) a sequence comprising RuvC, REC2, and Nuc domains from a Cas12 enzyme; and (c) a third segment comprising a heterologous domain of at least about 10, 25, 50, 75, or 100 amino acids.
- said first domain further comprises WED, REC1, RuvC, REC2, and Nuc domains from said Cas12
- said heterologous domain comprises at least about 10, 25, 50, 75, or 100 to about 900 amino acids in length. In some embodiments, said heterologous domain comprises at least about 100 to about 1500 amino acids in length. In some embodiments, said heterologous domain comprises at least about 876 amino acids or at least about 900 amino acids. In some embodiments, said heterologous domain comprises at most about 10, 25, 50, 75, 100, 250, 500, 750, 1000, 1200, 1500, 1700, 2000, 2200, 2500, 2700, or 3000 amino acids in length.
- said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, said domain with DNA-dependent DNA polymerase activity or said domain with Topoisomerase activity does not comprise inactivating mutations in an active site residue.
- said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a T7 DNA polymerase, Bst polymerase or an analog thereof, T4 DNA polymerase, Taq polymerase, Vent polymerase, Q5 polymerase, Klenow fragment, DNA polymerase theta, or Phi29 polymerase, or a functional fragment or derivative thereof.
- said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 44-52, or a variant thereof.
- said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type I topoisomerase domain.
- said Type I topoisomerase domain comprises E. coli Eubacterial DNA topoisomerase I, E. coli Eubacterial DNA topoisomerase III, S. cerevisiae Yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, or M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, or a functional fragment thereof.
- said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type II topoisomerase domain.
- said type II topoisomerase domain comprises E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof.
- said heterologous domain comprises a domain with topoisomerase activity and said heterologous domain comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 53-55, or a variant thereof.
- said first fragment and said second fragment are derived from a same Cas12 enzyme.
- said first fragment and said second fragment are derived from a different Cas12 enzyme.
- said first fragment and said second fragment do not comprise an inactivating mutation in an active site residue of said Cas12 enzyme.
- said first fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 1, 5, 13, 15, or 24-34, or a variant thereof.
- said second fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 2, 6, 11, or 35-43, or a variant thereof.
- the composition further comprises an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site.
- said region with complementarity to a region 5' to said nucleic acid site or said region with complementarity to a region 3' to said nucleic acid site comprises at least 4 to 30 bp or at least 4 to 400 bp.
- said insert nucleic acid sequence comprises at least about 1 bp to at least about 20 kb.
- said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
- said insert DNA molecule is: (i) linked to said programmable nuclease; (ii) linked to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme; or (iii) hybridized to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme.
- said composition further comprises a guide polynucleotide configured to interact with said Cas protein, wherein (a) said guide polynucleotide further comprises a hybridization domain at a 3' end; and (b) wherein said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.
- said insert DNA molecule comprises a region with complementarity to a region 5' to said double-stranded DNA site at said 5' end of said insert DNA.
- said insert DNA molecule is linked to a catalytic hydroxyl group of said domain having DNA topoisomerase activity at a first end, and wherein said insert DNA molecule comprises said region homologous to a region 5' to said nucleic acid site or said region homologous to a region 3' to said nucleic acid site at a second end.
- the composition further comprises a linker between said first fragment and said heterologous domain, or between said heterologous domain and said second fragment.
- said linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof.
- said linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1-7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.
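The peptide linker options listed above can be sketched as simple string operations on amino acid sequences. This is an illustrative sketch only: the helper names (`repeat_linker`, `find_lpxtg`, `join_fragments`) and the toy fragment strings are hypothetical, X in LPXTG is assumed to stand for any single residue, and the repeat count n is left to the caller because the text does not fix it here.

```python
import re
from typing import List

# Assumption: X in LPXTG stands for any single amino acid residue
LPXTG_MOTIF = re.compile(r"LP[A-Z]TG")


def repeat_linker(unit: str, n: int) -> str:
    """Expand a repeat linker such as (GGGGS)n into a plain sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return unit * n


def find_lpxtg(protein: str) -> List[int]:
    """Return 0-based start positions of LPXTG motifs in a protein sequence."""
    return [m.start() for m in LPXTG_MOTIF.finditer(protein.upper())]


def join_fragments(first_fragment: str, heterologous_domain: str,
                   second_fragment: str, n_linker: str = "",
                   c_linker: str = "") -> str:
    """Concatenate fragment, optional linker, heterologous domain,
    optional linker, and fragment, in N- to C-terminal order."""
    return (first_fragment + n_linker + heterologous_domain
            + c_linker + second_fragment)
```

For example, `repeat_linker("GGGGS", 3)` gives `GGGGSGGGGSGGGGS`, which can be passed as `n_linker` or `c_linker` to place it on either side of the heterologous domain.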
- the present disclosure provides for a fusion protein comprising a sequence having at least 80% identity to any one of SEQ ID NOs: 20-23, or a variant thereof.
- In some aspects, the present disclosure provides for a system comprising any of the components of the compositions described herein.
- the present disclosure provides for a method of editing a DNA locus in a cell comprising contacting to a cell any of the compositions described herein, or a nucleic acid comprising or encoding any of the compositions described herein.
- the cell is a bacterial, archaeal, plant, mammalian, primate, or human cell.
- FIG. 1 depicts a schematic representation of split Cas12f functional domains. Examples are based on two Cas12f proteins: Sp Cas12f and Pt Cas12f. It was considered that fA and fB can be integrated with different intervening heterologous domains to add new enzymatic activity while simultaneously preserving specific guided DNA cleavage activity.
- FIG. 2 depicts examples of split Cas12f proteins incorporating various heterologous polymerase domains.
- the Cas12f (Sp Cas12f and Pt Cas12f) functional domains were extracted and integrated with heterologous sequences (central bar: DNA polymerase Bst or Phi29) such that the Cas12f components retain structural/functional integrity.
- the polymerase function is strategically located in the structure such that it can best support activity of the polymerases on DNA.
- FIG. 3 depicts examples of split Cas12f proteins incorporating SpyCatcher domains to allow linkage to an additional protein in vitro.
- FIGs. 4A and 4B depict examples of an alternative anchor/connector design incorporating one functional SpCas12f domain and an additional Cas12f fragment to allow for anchoring of the heterologous domain at the proper place in the complex.
- the natural dimer structure of Cas12f is used to provide a scaffold.
- the leftmost fragment is a full Sp Cas12f monomer; the central domains include SpyCatcher (which serves here as a linker but can also be used to attach different heterologous polypeptides).
- the rightmost domain is a structural monomer used in a minimal truncated form, whose role is limited to that of an anchor/connector for an accessory protein (grey box, e.g. DNA polymerase or Topoisomerase).
- SpyCatcher can be replaced with linker sequences.
- Figures 5A and 5B depict expression/purification and activity assays for Sp and Pt Cas12f and derivative constructs. These assays illustrate that domains extracted from Cas12f and integrated with SpyCatcher retain specific guided cleavage activity.
- Figure 5A depicts expression and activity assays for a first batch of split Cas12f constructs. The top panel shows SDS-PAGE of the purification products in Example 1. The analyses verified production of all variants described in the example.
- the bottom panel shows an agarose gel depicting results of the in vitro guided cleavage assay in Example 1.
- the substrate and cleavage products are annotated.
- the assay demonstrates that all the variants retain wild-type-like activity.
- FIG. 5B depicts expression and activity assays for a second batch of split Cas12f constructs.
- the top panel shows SDS-PAGE of the purification products in Example 1.
- the analysis shows good yield for all variants described in the example.
- the bottom panel shows an agarose gel depicting results of the in vitro guided cleavage assay in Example 1.
- the substrate and cleavage products are annotated.
- the assay demonstrates that all the variants retain wild-type-like activity.
- FIG. 6 shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs can facilitate integration of genomic DNA; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to the ends of the DNA insert.
- the endonuclease catalyzes DNA damage at a specified site to induce DNA repair.
- the DNA insert includes homology fragments which are used in homology recombination/repair.
- the DNA insert in this case is presented by two Cas polypeptides which specifically recognize and bind target DNA.
- FIG. 7 shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs can facilitate integration of genomic DNA; in this embodiment, one end of the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA, while the other end is free.
- the endonuclease catalyzes DNA damage at a specified site to induce DNA repair incorporating the DNA insert.
- the DNA insert includes homology fragments which are used in homology recombination/repair.
- the DNA insert is presented by a single Cas polypeptide (e.g. enzyme) that specifically recognizes and binds target DNA.
- FIG. 8 shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs with topoisomerase or Protein A integrated between the halves of the Cas12 can facilitate integration of genomic DNA; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to the ends of the DNA insert.
- the topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands. Damage to DNA target induces DNA repair.
- the DNA insert includes homology fragments which are used in homology recombination/repair.
- Figure 9 shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs according to the current disclosure with topoisomerase or Protein A integrated between the halves of the Cas12 can facilitate integration of genomic DNA; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to one end of the DNA insert, while the other end of the DNA insert is free.
- the topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands. Damage to DNA target induces DNA repair.
- the DNA insert includes homology fragments which are used in homology recombination/repair. In this case, the DNA insert is presented by a single Cas-topoisomerase.
- FIG. 10 shows a proposed design schematic for Cas endonuclease (e.g. split-Cas12f endonuclease) constructs with topoisomerase or Protein A and exonuclease integrated in the constructs; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to the ends of the DNA insert.
- the topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands.
- the DNA insert includes homology fragments which are used in homology recombination/repair.
- the DNA insert is presented by two Cas-topoisomerases which are specifically covalently attached to target DNA.
- FIG. 11 shows a proposed design schematic for Cas endonuclease (e.g. split-Cas12f endonuclease) constructs with topoisomerase or Protein A and exonuclease integrated in the constructs; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to one end of the DNA insert, while the other end of the DNA insert is free.
- the topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands. Damage to DNA target induces DNA repair.
- the DNA insert includes homology fragments which are used in homology recombination/repair.
- the DNA insert is presented by a single Cas polypeptide that specifically recognizes and binds target DNA.
- FIGs. 12A and 12B illustrate various DNA insert formats that can be used with Cas (e.g. Cas12f) polypeptides (e.g. enzymes) to facilitate homologous recombination at a target site.
- Figure 12A shows several diagrams of attachment of gRNAs to DNA inserts.
- double Cas polypeptides are attached to the insert with linear dsDNA hybridization; the insert includes target homology sequence required for the editing on either end.
- panel (B) a single Cas polypeptide is attached with linear dsDNA hybridization via one side of the insert, while the other side is protected from exonucleolytic degradation with modified nucleotides (e.g. using nucleotide modifications such as C3'-spacer, hexanediol, 1',2'-Dideoxyribose/dSpacer, PC Spacer, Spacer 9, or Spacer 18, or bond modifications such as phosphorothioate bonds).
- a single cas polypeptide is attached to dumbbell dsDNA containing an insert: one side of the insert is attached to Cas via oligo hybridization, while the other side lacks a hybridization site and simply has a dumbbell loop.
- the dumbbell includes two target homology sequences required to complete homologous recombination.
- double cas polypeptides are attached to dumbbell dsDNA containing an insert analogous to panel (C) but having a second hybridization site for an oligo that binds a gRNA on both sides of the dumbbell.
- a single cas is attached to dumbbell dsDNA containing an insert, but via a hybridization site internal to the homology regions.
- Figure 12B shows an example dumbbell sequence (5'- GAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGACAGCTTTTCCTAGACAG GGGCTAGTATGTGCAtttcctgatgtcgatgtgCCAGGAGAGGAGGGAGAAATCCCTCCTCTC CTGGcacatcgacatcaggaaaTGCACATACTAGCCCCTGTCTAGGAAAAGCTGTCCTGCGAC GCCCTCTGGAGGAAGCAGGGCTTCCTTTCGTCAGTCAGTCAGTCAGTCAGTCAGT C AGTC AGTC AGTC AGTC AGTC AGTC AGTC A-3', SEQ ID NO: 74) that can be used in the embodiments described in Figure 12A, where underlined sequences (5'- GAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGACAGCTTTTCCTAGACAGG GGCTAGTATGTGCAtttcctgatgtcgatgtgCCAGGAGAGGA-3' and 5'- TCC
- GTCAGTC AGTC AGTC AGTC AGTC AGTC AGTC AGTC AGTC AGTC AGTC AGTC AGTC AGTC AGTCAGTCAGTCAGTCA-3' can be used as a gRNA hybridization site.
- Such a dumbbell can be constructed by ligating the adaptor sequences (bolded) using e.g. T4 DNA ligase.
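The self-complementary structure of the example dumbbell can be checked with a short reverse-complement sketch. The two lowercase segments below are copied from the sequence given for Figure 12B. The hairpin stem and loop are taken from the uppercase run between them, on the assumption that the whitespace inside that run is a line-wrap artifact. The names `reverse_complement` and `forms_stem` are hypothetical helpers, not part of the disclosure.

```python
# Complement table that preserves upper/lower case
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (case-preserving)."""
    return seq.translate(_COMPLEMENT)[::-1]


def forms_stem(five_prime: str, three_prime: str) -> bool:
    """True when the two segments can base-pair along their full length."""
    return reverse_complement(five_prime).upper() == three_prime.upper()


# Lowercase segments as they appear in the Figure 12B example sequence
left_arm = "tttcctgatgtcgatgtg"
right_arm = "cacatcgacatcaggaaa"

# Uppercase run between the lowercase segments, split as stem / loop / stem
# (assumption: the space in the source text is an extraction artifact)
hairpin_stem_5p = "CCAGGAGAGGAGGGA"
hairpin_loop = "GAAA"
hairpin_stem_3p = "TCCCTCCTCTCCTGG"

print(forms_stem(left_arm, right_arm))
print(forms_stem(hairpin_stem_5p, hairpin_stem_3p))
```

Both checks pass for these segments, which is consistent with a duplex region closed by a short loop at one end of the dumbbell.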
- FIG. 13 shows a schematic representation of an editing design including Cas, DNA polymerase and DNA inserts according to the current disclosure.
- the schematic emphasizes three possible paths after specific target recognition, cleavage, and annealing of the 3'-ssDNA tail to the cleaved and displaced target strand: (A) a path where the displaced target DNA cleaved by Cas serves as a primer and the DNA insert serves as a template; (B) a path where the insert’s 3'-ssDNA tail serves as a primer and the cleaved target DNA serves as a template; (C) a path similar to path B but wherein the target DNA is not cleaved.
- the paths generate stable attachment of the insert to target DNA and involve the host DNA repair mechanism to finalize editing.
- FIG. 14A shows a schematic explaining a method to incorporate long insert sequences where the insert has high homology to the target DNA. Shown are two methods with accessory 3' to 5' exonuclease and 5' to 3' exonuclease (e.g. where exonucleases are incorporated as a heterologous polypeptide between or in addition to split Cas12f domains).
- One of the strands of the cleaved DNA target is digested by 3' to 5' exonuclease or 5' to 3' exonuclease. The digestion process competes with the host DNA repair mechanism (homologous recombination); as a result, the homology cross-point is moved away from the target site, increasing insert length.
- Figure 14B depicts schematics showing how the designs in Figure 14A can be implemented using Cas/polymerase/exonuclease fusions.
- Figures 15A, 15B, and 15C depict polypeptide diagrams showing how Cas/polymerase/exonuclease fusions as shown in Figures 14B and 14C can be constructed at a domain level.
- FIG. 15A depicts examples using split Cas12f and T5 exonuclease or Exo III;
- FIG. 15B depicts examples using intact Cas12f and T5 exonuclease or Exo III;
- FIG. 15C depicts examples using Cas9 or Cas12 alongside Bst Polymerase and T5 exonuclease or Exo III.
- the left and rightmost domains represent Cas12f Sp or Pt
- the domains flanking the central domains are linkers and the central domain is exonuclease T5Exo or ExoIII.
- Figure 16 depicts an example hybrid gRNA/DNA insert usable with methods according to the disclosure.
- gRNA and insert DNA are covalently attached either by standard 5'-phosphate-3' bond or an alternative covalent conjugation method (e.g.
- Such hybrids can be constructed by ligation of gRNA to a DNA insert, chemical synthesis, or primer extension starting from a gRNA primer using DNA Pol I or Klenow fragment.
- the term “programmable nuclease” generally refers to endonucleases that are “targeted” (“programmed”) to recognize and edit a pre-determined site in a genome of an organism.
- the programmable nuclease can induce site specific DNA cleavage at a pre-determined site in a genome.
- the programmable nuclease may be programmed to recognize a genomic location with a DNA binding protein domain, or combination of DNA binding protein domains.
- Cas programmable nucleases or Cas polypeptides are described in, e.g. Makarova, et al. Nat. Rev. Microbiol., 18, 67-83, Shmakov et al. Nat. Rev. Microbiol., 15, 169-182, and Karvelis et al. Nucleic Acids Research, 2020, Vol. 48, No. 9, 5016-5023, each of which is incorporated by reference herein in its entirety.
- the programmable nuclease is a Cas12 polypeptide, such as a Class 2, Type V-F polypeptide or a Cas12f polypeptide for which example domain organization, domain types (e.g.
- WED, REC1, RuvC, Nuc, and REC21 domains
- functional residues e.g. relative to SEQ ID NO: 84
- structure e.g. relative to SEQ ID NO: 84
- a “guide nucleic acid” or “guide polynucleotide” generally refers to a nucleic acid that may hybridize to another nucleic acid.
- a guide nucleic acid may be RNA.
- a guide nucleic acid may be DNA.
- the guide nucleic acid may be programmed to bind specifically to a nucleic acid with a particular sequence.
- the nucleic acid to be targeted, or the target nucleic acid may comprise nucleotides.
- the guide nucleic acid may comprise nucleotides.
- a portion of the target nucleic acid may be complementary to a portion of the guide nucleic acid.
- the strand of a double-stranded target polynucleotide that is complementary to and hybridizes with the guide nucleic acid may be called the complementary strand.
- the strand of the double-stranded target polynucleotide that is complementary to the complementary strand, and therefore may not be complementary to the guide nucleic acid may be called a noncomplementary strand.
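The strand naming above can be illustrated with a minimal sketch. It assumes a DNA guide written 5'->3' and ignores PAM requirements and RNA bases; the function names and the example sequences are hypothetical and are not taken from the disclosure.

```python
# Illustrative only: helper names and sequences are assumptions, not from the disclosure.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def classify_strands(top: str, guide: str) -> dict:
    """Label the strands of a duplex relative to a guide.

    `top` is one strand (5'->3'); the other strand is its reverse complement.
    The strand that contains the reverse complement of the guide can hybridize
    with the guide and is labeled the complementary strand; the opposite
    strand is labeled the noncomplementary strand.
    """
    top = top.upper()
    bottom = reverse_complement(top)
    site = reverse_complement(guide)
    if site in top:
        return {"complementary": "top", "noncomplementary": "bottom"}
    if site in bottom:
        return {"complementary": "bottom", "noncomplementary": "top"}
    return {"complementary": None, "noncomplementary": None}
```

In this sketch a guide whose sequence matches part of the top strand pairs with the bottom strand, so the bottom strand is reported as complementary.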
- a guide nucleic acid may comprise a polynucleotide chain and can be called a “single guide nucleic acid.”
- a guide nucleic acid may comprise two polynucleotide chains and may be called a “double guide nucleic acid.”
- the term “guide nucleic acid” may be inclusive, referring to both single guide nucleic acids and double guide nucleic acids.
- Guide nucleic acids may comprise a nucleic acid targeting segment (e.g. a crRNA) and a protein binding sequence.
- Guide nucleic acids may comprise a nucleic acid targeting segment (e.g. a crRNA), a protein binding sequence, and a trans-activating RNA (e.g. a tracrRNA).
- a guide nucleic acid may comprise a segment that can be referred to as a “nucleic acid-targeting segment”, a “nucleic acid-targeting sequence”, or a “seed sequence”. In some cases, the sequence is 19-21 nucleotides in length. In some cases, the “nucleic acid-targeting segment” or “nucleic acid-targeting sequence” comprises a crRNA.
- a nucleic acid-targeting segment may comprise a sub-segment that may be referred to as a “protein binding segment” or “protein binding sequence” or “Cas protein binding segment”.
- guide RNA can generally refer to a nucleic acid with at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity and/or sequence similarity to a wild type example guide RNA sequence (e.g., a type V guide RNA from S. pyogenes, S. aureus, etc).
- Guide RNA can refer to a nucleic acid with at most about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity and/or sequence similarity to a wild type example guide RNA sequence.
- Guide RNA may refer to a modified form of a guide RNA that can comprise a nucleotide change such as a deletion, insertion, or substitution, variant, mutation, or chimera.
- a guide RNA may refer to a nucleic acid that can be at least about 60% identical to a wild type example guide RNA sequence over a stretch of at least 6 contiguous nucleotides.
- a guide RNA sequence can be at least about 60% identical, at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, or 100 % identical to a wild type example guide RNA sequence over a stretch of at least 6 contiguous nucleotides.
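The criterion of identity "over a stretch of at least 6 contiguous nucleotides" can be sketched as an ungapped sliding-window comparison. This is only one possible reading of that criterion; the function names, the default window of 6, and the 60% default threshold are illustrative assumptions, and a real analysis would typically rely on an alignment tool.

```python
# Illustrative sketch: ungapped window comparison, not a substitute for an aligner.
def window_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_window_identity(candidate: str, reference: str, window: int = 6) -> float:
    """Highest percent identity between any `window`-long stretch of each sequence."""
    candidate, reference = candidate.upper(), reference.upper()
    if window < 1 or min(len(candidate), len(reference)) < window:
        return 0.0
    best = 0.0
    for i in range(len(candidate) - window + 1):
        for j in range(len(reference) - window + 1):
            score = window_identity(candidate[i:i + window], reference[j:j + window])
            best = max(best, score)
    return best


def meets_threshold(candidate: str, reference: str,
                    threshold: float = 60.0, window: int = 6) -> bool:
    """True if some window pair reaches the given percent identity."""
    return best_window_identity(candidate, reference, window) >= threshold
```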
- sequence identity in the context of two or more nucleic acids or polypeptide sequences, generally refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm.
- Suitable sequence comparison algorithms for polypeptide sequences include, e.g., BLASTP using parameters of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of
- MUSCLE with default parameters
- MAFFT with parameters retree of 2 and maxiterations of 1000
- Novafold with default parameters
- HMMER hmmalign with default parameters.
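As a minimal sketch of how a percent identity value can be read off an alignment once one of the tools above has produced it: the function below takes two rows of a gapped alignment and counts matching columns. The function name and the choice to exclude gapped columns from the denominator are assumptions; other conventions (for example dividing by the full alignment length) give different numbers.

```python
# Illustrative only: the denominator convention here is an assumption.
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns in which neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap in either row
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```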
- optimally aligned in the context of two or more nucleic acids or polypeptide sequences, generally refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that have been aligned to maximal correspondence of amino acid residues or nucleotides, for example, as determined by the alignment producing a highest or “optimized” percent identity score.
- WED domain generally refers to a domain (e.g. present in a Cas protein) that interacts primarily with the repeat:anti-repeat duplex of the sgRNA and the PAM duplex.
- a WED domain can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences.
- REC domain generally refers to a domain (e.g. present in a Cas protein) comprising at least one of two segments (REC1 or REC2) that are alpha helical domains thought to contact the guide RNA.
- a REC domain or segments thereof can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences (e.g., Pfam PF19501 for domain REC1).
- “pharmaceutically acceptable carrier” or “pharmaceutically acceptable excipient” as used herein generally refers to a diluent, adjuvant, excipient, or vehicle with which a probe of the disclosure is administered and which is approved by a regulatory agency of the Federal or a state government or listed in the U.S. Pharmacopeia or other generally recognized pharmacopeia for use in animals, and more particularly in humans.
- Such pharmaceutical carriers can be liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like.
- the pharmaceutical carriers can be saline, gum acacia, gelatin, starch paste, talc, keratin, colloidal silica, urea, and the like.
- the probe and pharmaceutically acceptable carriers can be sterile.
- Water can be a useful carrier when the composition is administered intravenously.
- Saline solutions and aqueous dextrose and glycerol solutions can also be employed as liquid carriers, particularly for injectable solutions.
- Suitable pharmaceutical carriers also include excipients such as glucose, lactose, sucrose, glycerol monostearate, sodium chloride, glycerol, propylene glycol, water, ethanol and the like.
- the present compositions can also contain minor amounts of wetting or emulsifying agents, or pH buffering agents.
- the present compositions may take the form of solutions, emulsions, sustained-release formulations, or any other form suitable for use.
- the pharmaceutically acceptable excipient may comprise a transfection agent.
- Suitable transfection agents include, but are not limited to, linear or branched polyethylenimines (see e.g. Bonnet et al., (2008) Pharmaceut. Res. 25: 2972-2982, which is incorporated by reference herein in its entirety for all purposes), nanoparticles, lipid nanoparticles (LNPs, see e.g. in Finn et al. Cell Rep.
- variants of any of the enzymes, polypeptides, proteins, or domains described herein with one or more conservative amino acid substitutions can be made in the amino acid sequence of a polypeptide without disrupting the three-dimensional structure or function of the polypeptide.
- Conservative substitutions can be accomplished by substituting amino acids with similar hydrophobicity, polarity, and R chain length for one another. Additionally, or alternatively, by comparing aligned sequences of homologous proteins from different species, conservative substitutions can be identified by locating amino acid residues that have been mutated between species (e.g., nonconserved residues) without altering the basic functions of the encoded proteins.
- Such conservatively substituted variants may include variants with at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99% identity to any one of the endonuclease protein sequences described herein.
- such conservatively substituted variants are functional variants.
- Such functional variants can encompass sequences with substitutions such that the activity of one or more critical active site residues or guide polynucleotide binding residues of the endonuclease are not disrupted.
- variants of any of the enzymes, polypeptides, proteins, or domains described herein with substitution of one or more catalytic residues to decrease or eliminate activity of the enzyme, polypeptide, protein, or domain (e.g. decreased-activity variants).
- a decreased activity variant of an enzyme, polypeptide, protein, or domain described herein comprises a disrupting substitution of at least one, at least two, three, four, five, six, or all catalytic residues.
- any of the endonucleases described herein can comprise a nickase mutation.
- any of the endonucleases described herein can comprise a RuvC domain lacking nuclease activity.
- any of the endonucleases described herein can be configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, any of the endonucleases described herein can be configured to lack endonuclease activity or be catalytically dead.
- the present disclosure provides for a composition comprising a fusion protein comprising (a) a first fragment comprising WED and REC1 domains of, derived from, or obtained from a first Cas12 polypeptide or enzyme (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme); (b) a heterologous domain; and (c) a second fragment comprising RuvC and Nuc domains of, derived from, or obtained from a second Cas12 polypeptide (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme).
- the Cas12 enzyme is configured to bind a double-stranded deoxyribonucleic acid (DNA) site.
- the second fragment further comprises a REC2 domain of, derived from, or obtained from a second Cas12 polypeptide.
- the first fragment and the second fragment are derived from the same Cas12 polypeptide (e.g. the first and the second Cas12 polypeptide are the same).
- the first fragment and the second fragment are derived from different Cas12 polypeptides (e.g. the first and the second Cas12 polypeptide are different).
- the first fragment and the second fragment do not comprise an inactivating mutation in an active site residue of the first or the second Cas12 polypeptide. In some embodiments, the first fragment and the second fragment do comprise an inactivating mutation in an active site residue of the first or the second Cas12 polypeptide.
- the fusion protein further comprises a linker between the first fragment and the heterologous domain, or between the heterologous domain and the second fragment. In some embodiments, the linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof.
- the linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1.7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.
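The fragment, heterologous domain, and repeat-linker arrangement described above can be sketched at the sequence level. This is a hedged illustration only: the function names are hypothetical, the short amino acid strings in the usage note are placeholders rather than sequences from the disclosure, and only peptide linkers such as (GGGGS)n are modeled (not isopeptide, small-molecule, or biotin-streptavidin linkages).

```python
# Illustrative only: names and placeholder sequences are assumptions.
def repeat_linker(unit: str, n: int) -> str:
    """Build a repeat linker such as (GGGGS)n from a unit and a repeat count."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return unit * n


def assemble_fusion(first_fragment: str, heterologous_domain: str,
                    second_fragment: str, linker_1: str = "",
                    linker_2: str = "") -> str:
    """Concatenate N- to C-terminus:
    first fragment, optional linker, heterologous domain, optional linker, second fragment.
    """
    return (first_fragment + linker_1 + heterologous_domain
            + linker_2 + second_fragment)
```

For example, `assemble_fusion("AAA", "HHH", "CCC", linker_1=repeat_linker("GGGGS", 2))` places a (GGGGS)2 linker only between the first fragment and the heterologous domain, which mirrors the "or" in the linker placement described above.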
- the composition further comprises an insert DNA molecule.
- the composition further comprises a guide polynucleotide configured to interact with the Cas12 polypeptide.
- the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be any suitable Cas12 polypeptide (e.g. a Cas12 polypeptide that can be separated into non-contiguous fragments while still retaining enzymatic or binding activity).
- the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be from particular species, e.g.
- the Cas12 polypeptide is a Class 2, Type V-F or Cas12f polypeptide (for which example domain organization, functional residues, and structure relative to SEQ ID NO: 84 are outlined in e.g.
- a Class 2, Type V-F or Cas12f polypeptide according to the disclosure comprises one or more active site residues D326, E422, D510 (from the RuvC domain), or R490 (from the Nuc domain) relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine).
- a Class 2, Type V-F or Cas12f polypeptide according to the disclosure comprises one or more PAM interacting residues S142, R163, Y146, S286, Y146, K196, REC1 c residues 134-152, or H139 relative to SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine).
- the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can comprise WED, REC1, RuvC, Nuc, or REC2 domains (or any combination thereof) having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to WED, REC1, RuvC, Nuc, or REC2 domains of any one of SEQ ID Nos: 1, 2, 5, 6, 11, 13, 15, 24-43, or 84, or a variant thereof.
- the heterologous domain can comprise any suitable polypeptide residues or domains of appropriate size.
- the heterologous domain comprises (e.g. consists of) at least about 100-1500 amino acids in length. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100-2000 amino acids in length. In some cases, the heterologous domain comprises (e.g.
- the heterologous domain can comprise an enzyme.
- the heterologous domain can comprise a DNA-binding or a DNA-conjugating domain.
- the heterologous domain can comprise a domain with DNA-dependent DNA polymerase activity or a domain with topoisomerase activity.
- the heterologous domain can comprise a T7 DNA polymerase domain, a Bst polymerase domain or an analog thereof (e.g. a Bst large fragment polymerase domain or a Bst.
- T7 DNA polymerase domain, a T4 DNA polymerase domain, a Taq polymerase domain, a Vent polymerase domain, a Q5 polymerase domain, a Klenow fragment domain, a DNA polymerase theta domain, or a Phi29 polymerase domain, or a functional fragment or derivative thereof.
- Example organization, structure, and function of T7 DNA polymerase can be found in e.g. Dwine et al. Curr Opin Struct Biol. 1998 Dec;8(6):704-12. doi: 10.1016/s0959-440x(98)80089-4 and UniProtKB/Swiss-Prot accession no. P00581.1, both of which are incorporated by reference herein for all purposes.
- a T7 DNA polymerase domain can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) H506, R518, K522, Y526, E480, or Y530, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 46.
- a T7 DNA polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 46, or a variant thereof.
- Example organization, structure, and function of large fragment Bst polymerase e.g. can be found in e.g. SEQ ID NO: 45
- a Bst polymerase domain (e.g. Bst large fragment or Bst 2.0 such as SEQ ID Nos: 44 or 45) according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) D653, D830, E831, H829, Q797, R615, or E658, or any combination thereof relative to (e.g.
- a Bst polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 45, or a variant thereof.
- Example organization, structure, and function of phi29 polymerase can be found in, e.g.
- a phi29 polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g.
- a phi29 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 51, or a variant thereof.
- Example organization, structure, and function of Taq polymerase can be found in, e.g. Eun. “Enzymology Primer for Recombinant DNA Technology” (Chapter 6, DNA polymerases).
- a Taq polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) G308, V310, L356, R405, R25, or R74 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 48.
- a Taq polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 48, or a variant thereof.
- Example organization, structure, and function of T4 polymerase can be found in, e.g. Wang et al. Biochemistry. 1996 Jun 25;35(25):8110-9.
- a T4 polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y320 or E191 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 47.
- a T4 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 47, or a variant thereof.
- Example organization, structure, and function of Klenow polymerase can be found in, e.g. Polesky et al. J Biol Chem.
- a Klenow polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y766, R841, N845, N849, R668, or D882 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 50.
- a Klenow polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least
- a Vent polymerase can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g.
- a Vent polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 49, or a variant thereof.
- the heterologous domain can comprise a topoisomerase domain.
- the heterologous domain can comprise a Type I (e.g. Type IA) or Type II topoisomerase domain, any combination thereof, or a functional fragment or derivative thereof.
- Example organization and function of Type I (e.g. type IA) topoisomerases can be found in e.g. Chen et al. J Biol Chem. 1998 Mar 13;273(11):6050-6. doi: 10.1074/jbc.273.11.6050, which is incorporated by reference herein for all purposes.
- a Type I topoisomerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) E9, H33, D111, E115, N309, E313, T318, R321, T322, D323, H365, or T496, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 85.
- Example organization, structure, and function of Type II topoisomerases can be found in e.g. Liu et al. J Biol Chem. 1998 Aug 7;273(32):20252-60.
- a Type II topoisomerase can comprise one or more critical (e.g. active site) residue(s) Y782, R690, D697, K700, R704, or R781 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 86, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine).
- the heterologous domain can comprise E. coli eubacterial DNA topoisomerase I, E.
- the heterologous domain can comprise E. coli eubacterial DNA gyrase, E.
- the insert DNA molecule can have a variety of structures and configurations suitable for insertion into genomic DNA (e.g. via homologous recombination or other DNA repair methods).
- the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
- the insert DNA molecule comprises a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site.
- the region with complementarity to a region 5' to the double-stranded DNA site comprises (e.g.
- the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides.
- the region with complementarity to a region 5' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides.
- the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220,
- the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides.
- the region with complementarity to a region 3' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides.
- the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides to at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides, or any range between these values.
- the insert DNA molecule further comprises a transgene.
- the transgene comprises an open reading frame (ORF).
- the transgene comprises a promoter operably linked to an ORF.
- the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt).
- the transgene comprises at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt).
- the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500,
- the transgene is flanked by the region with complementarity to a region 5' to the double-stranded DNA site and the region with complementarity to a region 3' to the nucleic acid site.
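The flanked layout described above (5' region, transgene, 3' region) can be sketched as simple string assembly. This is an illustration under stated assumptions: the function name, the `site` index convention (position between two bases of one genomic strand), and the example sequences are hypothetical, and the flanking regions are copied from one strand so that they are complementary to the opposite strand on either side of the site.

```python
# Illustrative only: names, index convention, and sequences are assumptions.
def build_insert(genome: str, site: int, transgene: str, arm_len: int = 20) -> str:
    """Return 5' flanking region + transgene + 3' flanking region for one strand.

    `site` is the index in `genome` at which the transgene would be placed;
    `arm_len` is the length of each flanking region in nucleotides.
    """
    if arm_len < 1:
        raise ValueError("arm_len must be at least 1")
    if site - arm_len < 0 or site + arm_len > len(genome):
        raise ValueError("flanking regions extend past the end of the sequence")
    left_arm = genome[site - arm_len:site]    # region 5' to the site
    right_arm = genome[site:site + arm_len]   # region 3' to the site
    return left_arm + transgene + right_arm
```

With `genome = "AAAACCCCGGGGTTTT"`, `site = 8`, and `arm_len = 4`, the flanking regions are `CCCC` and `GGGG`, and the transgene sits between them.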
- the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide.
- the insert DNA molecule is linked to a hydroxyl (e.g. catalytic hydroxyl) group of the domain having DNA topoisomerase activity at a first end, and the insert DNA molecule comprises the region homologous to a region 5' to the nucleic acid site or the region homologous to a region 3' to the nucleic acid site at a second end.
- the insert DNA molecule comprises a first end configured to hybridize with a hybridization domain of a guide polynucleotide at the 3' end of the insert DNA when the guide polynucleotide further comprises a hybridization domain at a 3' end.
- the guide polynucleotide configured to interact with the Cas12 polypeptide can be any suitable guide polynucleotide configured to hybridize to the DNA site (e.g. an RNA-comprising guide suitable for interacting with a Cas12f enzyme or a Class 2, Type V-F enzyme, or a mixture of RNA and DNA comprising a region configured to hybridize to, or complementary to, the DNA site).
- the guide polynucleotide further comprises a hybridization domain at a 3' end.
- the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides. In some embodiments, the hybridization domain comprises at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides. In some embodiments, the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides to at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides, or any range between these values.
- the composition can comprise a pharmaceutically acceptable excipient.
- the excipient can comprise a transfection agent (e.g. a liposome or a lipid nanoparticle).
- a fusion protein of the disclosure is provided in a lipid nanoparticle (LNP) by encapsulating the fusion protein with an optional guide polynucleotide or insert DNA molecule into the LNP. This can be performed using methodologies documented e.g. in Finn et al. Cell Rep. 2018 Feb 27;22(9):2227-2235. doi: 10.1016/j.celrep.2018.02.014 or Yin et al. Nat Biotechnol. 2016 Mar;34(3):328-33. doi: 10.1038/nbt.3471, both of which are incorporated by reference herein in their entireties for all purposes.
- composition comprising a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme) configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (i) a sequence comprising WED and REC1 domains of, derived from, or obtained from a first Cas12 polypeptide (e.g.
- the first segment further comprises WED, REC1, RuvC, REC2, and Nuc domains of, derived from, or obtained from the first Cas12 polypeptide.
- the fusion protein further comprises a linker between (a), (b), or (c).
- the linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof.
- the linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1.7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.
- the composition further comprises an insert DNA molecule.
- the composition further comprises a guide polynucleotide configured to interact with the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide).
- the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be any suitable Cas12 polypeptide (e.g. a Cas12 polypeptide that can be separated into non-contiguous fragments while still retaining enzymatic or binding activity).
- the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be from particular species, e.g.
- the Casl2 polypeptide is a Class 2, Type V-F or Casl2f polypeptide (for which example domain organization, functional residues, and structure relative to SEQ ID NO: 84 are outlined in e.g.
- a Class 2, Type V-F or Casl2f polypeptide according to the disclosure comprises one or more active site residues D326, E422, D510 (from the RuvC domain), or R490 (from the Nuc domain) relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine).
- a Class 2, Type V-F or Casl2f polypeptide according to the disclosure comprises one or more PAM interacting residues S142, R163, Y146, S286, Y146, K196, RECl c residues 134-152, or Hl 39 relative to SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine).
- the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can comprise WED, REC1, RuvC, Nuc, or REC2 domains (or any combination thereof) having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to WED, REC1, RuvC, Nuc, or REC2 domains of any one of SEQ ID NOs: 1, 2, 5, 6, 11, 13, 15, 24-43, or 84, or a variant thereof.
- the heterologous domain can comprise any suitable polypeptide residues or domains of appropriate size.
- the heterologous domain comprises (e.g. consists of) at least about 100-1500 amino acids in length. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100-2000 amino acids in length. In some cases, the heterologous domain comprises (e.g.
- the heterologous domain can comprise an enzyme.
- the heterologous domain can comprise a DNA-binding or a DNA-conjugating domain.
- the heterologous domain can comprise a domain with DNA-dependent DNA polymerase activity or a domain with topoisomerase activity.
- the heterologous domain can comprise a T7 DNA polymerase domain, a Bst polymerase domain or an analog thereof (e.g. a Bst large fragment polymerase domain or a Bst 2.0 polymerase domain), a T4 DNA polymerase domain, a Taq polymerase domain, a Vent polymerase domain, a Q5 polymerase domain, a Klenow fragment domain, a DNA polymerase theta domain, or a Phi29 polymerase domain, or a functional fragment or derivative thereof.
- Example organization, structure, and function of T7 DNA polymerase can be found in e.g. Dwine et al. Curr Opin Struct Biol. 1998 Dec;8(6):704-12. doi: 10.1016/s0959-440x(98)80089-4 and UniProtKB/Swiss-Prot accession no. P00581.1, both of which are incorporated by reference herein for all purposes.
- a T7 DNA polymerase domain can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) H506, R518, K522, Y526, E480, or Y530, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 46.
- a T7 DNA polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO:
- Example organization, structure, and function of large fragment Bst polymerase (e.g. SEQ ID NO: 45) can be found in e.g. Oscorbin et al. Comput Struct Biotechnol J. 2023 Sep 12;21:4519-4535. doi: 10.1016/j.csbj.2023.09.008. eCollection 2023, which is incorporated by reference herein in its entirety for all purposes.
- a Bst polymerase domain (e.g. Bst large fragment or Bst 2.0, such as SEQ ID NOs: 44 or 45) can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) D653, D830, E831, H829, Q797, R615, or E658, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 45.
- a Bst polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 45, or a variant thereof.
- Example organization, structure, and function of phi29 polymerase can be found in, e.g. Del Prado et al. Sci Rep. 2019 Jan 29;9(1):923. doi:
- a phi29 polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g.
- a phi29 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least
- a Taq polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) G308, V310, L356, R405, R25, or R74 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 48.
- a Taq polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 48, or a variant thereof.
- Example organization, structure, and function of T4 polymerase can be found in, e.g. Wang et al. Biochemistry. 1996 Jun 25;35(25):8110-9.
- a T4 polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y320 or E191 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 47.
- a T4 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 47, or a variant thereof.
- Example organization, structure, and function of Klenow polymerase can be found in, e.g. Polesky et al. J Biol Chem.
- a Klenow polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y766, R841, N845, N849, R668, or D882 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 50.
- a Klenow polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least
- a Vent polymerase can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g.
- a Vent polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 49, or a variant thereof.
- the heterologous domain can comprise a topoisomerase domain.
- the heterologous domain can comprise a Type I (e.g. Type IA) or Type II topoisomerase domain, any combination thereof, or a functional fragment or derivative thereof.
- Example organization and function of Type I (e.g. type IA) topoisomerases can be found in e.g. Chen et al. J Biol Chem. 1998 Mar 13;273(11):6050-6. doi: 10.1074/jbc.273.11.6050, which is incorporated by reference herein for all purposes.
- a Type I topoisomerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) E9, H33, D111, E115, N309, E313, T318, R321, T322, D323, H365, or T496, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 85.
- Example organization, structure, and function of Type II topoisomerases can be found in e.g. Liu et al. J Biol Chem. 1998 Aug 7;273(32):20252-60.
- a Type II topoisomerase can comprise one or more critical (e.g. active site) residue(s) Y782, R690, D697, K700, R704, or R781 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 86, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine).
- the heterologous domain can comprise E. coli eubacterial DNA topoisomerase I, E.
- the heterologous domain can comprise E. coli eubacterial DNA gyrase, E.
- the insert DNA molecule can have a variety of structures and configurations suitable for insertion into genomic DNA (e.g. via homologous recombination or other DNA repair methods).
- the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
- the insert DNA molecule comprises a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site.
- the region with complementarity to a region 5' to the double-stranded DNA site comprises (e.g. consists of) at least about 4 bp or nucleotides to at least about 400 bp or nucleotides.
- the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides.
- the region with complementarity to a region 5' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides.
- the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220,
- the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides.
- the region with complementarity to a region 3' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides.
- the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290,
- the insert DNA molecule further comprises a transgene.
- the transgene comprises an open reading frame (ORF).
- the transgene comprises a promoter operably linked to an ORF.
- the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt).
- the transgene comprises at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt).
- the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500,
- the transgene is flanked by the region with complementarity to a region 5' to the double-stranded DNA site and the region with complementarity to a region 3' to the nucleic acid site.
- the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide.
- the insert DNA molecule is linked to a hydroxyl (e.g. catalytic hydroxyl) group of the domain having DNA topoisomerase activity at a first end, and the insert DNA molecule comprises the region homologous to a region 5' to the nucleic acid site or the region homologous to a region 3' to the nucleic acid site at a second end.
- the insert DNA molecule comprises a first end configured to hybridize with a hybridization domain of a guide polynucleotide at the 3' end of the insert DNA when the guide polynucleotide further comprises a hybridization domain at a 3' end.
- the guide polynucleotide configured to interact with the Cas12 polypeptide can be any suitable guide polynucleotide (e.g. an RNA comprising a guide suitable for interacting with a Cas12f enzyme or a Class 2, Type V-F enzyme, or a mixture of RNA and DNA).
- the guide polynucleotide further comprises a hybridization domain at a 3' end.
- the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides.
- the hybridization domain comprises at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides. In some embodiments, the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides to at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides, or any range between these values.
- the composition can comprise a pharmaceutically acceptable excipient.
- the excipient can comprise a transfection agent (e.g. a liposome or a lipid nanoparticle).
- a fusion protein of the disclosure is provided in a lipid nanoparticle (LNP) by encapsulating the fusion protein with an optional guide polynucleotide or insert DNA molecule into the LNP. This can be performed using methodologies documented e.g. in Finn et al. Cell Rep. 2018 Feb 27;22(9):2227-2235. doi: 10.1016/j.celrep.2018.02.014 or Yin et al. Nat Biotechnol. 2016 Mar;34(3):328-33. doi: 10.1038/nbt.3471, both of which are incorporated by reference herein in their entireties for all purposes.
- the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to the site (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein the first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site; and (iii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site.
- the second fragment further comprises a REC2 domain of the second Cas12 polypeptide.
- the first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.
- the heterologous domain comprises at least about 100-1500 amino acids in length.
- the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.
- the first Cas12 polypeptide and the second Cas12 polypeptide comprise a same Cas12 polypeptide.
- the first Cas12 polypeptide and the second Cas12 polypeptide comprise different Cas12 polypeptides.
- the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
- the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide.
- (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA.
- the cell is a bacterial, archaeal, plant, mammalian, primate, or human cell.
- the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to the site (i) a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (c) a third segment comprising a heterologous domain of at least about 100 amino acids; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site; and (iii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site.
- the Cas12 polypeptide, the first Cas12 polypeptide, or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.
- the heterologous domain comprises at least about 100-1500 amino acids in length.
- the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.
- the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
- the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide.
- (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA.
- said cell is a bacterial, archaeal, plant, mammalian, primate, or human cell.
- a kit for disrupting a DNA site comprising (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein the first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; and (ii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site.
- the kit further comprises (iii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site.
- the second fragment further comprises a REC2 domain of the second Cas12 polypeptide.
- the first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.
- the heterologous domain comprises at least about 100-1500 amino acids in length.
- the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.
- the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
- the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide.
- (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA.
- the kit further comprises a transfection agent. In some embodiments, the kit further comprises instructions for targeting the DNA site.
- kits for disrupting a DNA site comprising (i) a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (ii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site.
- the kit further comprises (iii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site.
- the second fragment further comprises a REC2 domain of the second Cas12 polypeptide.
- the first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.
- the heterologous domain comprises at least about 100-1500 amino acids in length.
- the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.
- the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
- the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide.
- (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA.
- the kit further comprises a transfection agent. In some embodiments, the kit further comprises instructions for targeting the DNA site.
- This example demonstrates that Cas12f group enzymes can be rearranged into a split domain format, with a heterologous domain inserted between the N- and C-terminal domains, to allow for e.g. new enzymatic activity while preserving specific guided DNA cleavage activity.
- the biomass was lysed with BugBuster protein extraction reagent (EMD Millipore), which additionally included 90 U rLysozyme per 10 mL of lysate (EMD Millipore), 1 tablet protease inhibitor per 10 mL of lysate (Pierce Protease Inhibitor mini tablets, EDTA-free, from Thermo Scientific), 50 mM sodium phosphate pH 7.7, 0.05% TritonX, and 2.5 mM TCEP. Lysis was conducted at 12°C for 45 min.
- the lysate was mixed with dilution buffer (50 mM sodium phosphate pH 7.7, 1 M NaCl, 0.05% TritonX, 2.5 mM TCEP) at a ratio of 1:1 and incubated for 45 min at 12°C. After incubation the preparation was centrifuged at 14,000 rpm for 1 h at 8°C. Purification was conducted using a batch method with His-Affinity Gel (Zymo Research), and included a loading procedure, wash procedure, and elution procedure. The washing buffer included 50 mM sodium phosphate, 0.5 M NaCl, 30 mM imidazole, 0.05% TritonX and 2.5 mM TCEP.
- the protein was eluted using 50 mM sodium phosphate pH 7.7, 300 mM NaCl, 300 mM imidazole, 0.05% TritonX and 2.5 mM TCEP.
- the eluted protein was dialyzed at room temperature for 3 h using a Slide-A-Lyzer Dialysis Cassette G2 (Thermo Scientific), where the dialysis buffer included: 20 mM Tris-HCl pH 7.5, 300 mM NaCl, 0.05% TritonX and 2 mM DTT.
- the products of purification were analyzed using SDS-PAGE electrophoresis (see Figures 7 and 8, top panel) and quantified using the Qubit Protein Assay (Thermo Fisher).
- ribonucleoprotein complexes were assembled from the purified proteins.
- six different variants were tested - SpCas12f (SEQ ID NO: 70), Ptcas12f (SEQ ID NO: 71), SpCas12f-inter (SEQ ID NO: 9), Ptcas12 C-tag (SEQ ID NO: 72) and Ptcas12f N-tag (SEQ ID NO: 73).
- the complex formation was conducted at 37°C for 30 min. The reaction included: 1 µM gRNA, 1 µM Cas variant, 14 mM Tris-HCl pH 7.5, 80 mM NaCl, 1 mM DTT and 0.01% TritonX.
- the gRNA_Sp (SEQ ID NO: 56) was used in the reaction with enzymes including Sp Cas12f components, whereas the gRNA_Pt (SEQ ID NO: 57) was used with enzymes including Pt Cas12f components.
- the DNA Sp cleavage substrate (target DNA) was used in the reactions with enzymes including Sp Cas12f components, while DNA Pt was used in the reaction with enzymes including Pt Cas12f components; both target DNA substrates were 513 bp long and included the target sequence AGTTGACCCAACGTCGCCGG.
- the reaction was conducted at 37°C for 1 h.
- the products of the reaction were analyzed using agarose gel electrophoresis. Successful cleavage reactions generated two products: ~215 bp and ~298 bp.
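The fragment sizes reported for the cleavage assay can be sanity-checked against the substrate length: a single double-strand cut in a linear 513 bp substrate should give two products whose lengths sum to roughly 513 bp. The sketch below is illustrative only. The function names, the blunt-cut simplification, and the 10 bp gel-sizing tolerance are assumptions and are not taken from the disclosure.

```python
def cleavage_fragments(substrate_len: int, cut_position: int) -> tuple:
    """Return the two fragment lengths produced by one double-strand cut.

    Assumption: the cut is treated as blunt, so staggered overhangs
    (which Cas12-family nucleases can produce) are ignored.
    """
    if not 0 < cut_position < substrate_len:
        raise ValueError("cut position must fall inside the substrate")
    return cut_position, substrate_len - cut_position


def consistent_with_substrate(fragments, substrate_len: int, tolerance_bp: int = 10) -> bool:
    """Check that observed fragment sizes add up to the substrate length.

    tolerance_bp is a hypothetical allowance for approximate gel sizing.
    """
    return abs(sum(fragments) - substrate_len) <= tolerance_bp


SUBSTRATE_LEN = 513          # substrate length stated in the example
OBSERVED = (215, 298)        # approximate product sizes stated in the example

print(consistent_with_substrate(OBSERVED, SUBSTRATE_LEN))  # True
print(cleavage_fragments(SUBSTRATE_LEN, 215))              # (215, 298)
```

Here 215 + 298 = 513, so the two reported products account for the full substrate, which is what a single cut at the target site would be expected to give.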
Abstract
Provided herein are improved methods, compositions, and systems for editing genomic DNA, including editing genomic DNA by inserting long DNA sequences into genomic DNA.
Description
POLYPEPTIDE FUSIONS OR CONJUGATES FOR GENE EDITING
CROSS-REFERENCE
[0001] This application claims the benefit of U.S. Provisional Application No. 63/380,047 filed October 18, 2022, entitled “POLYPEPTIDE FUSIONS OR CONJUGATES FOR GENE EDITING”, which application is incorporated by reference herein in its entirety.
BACKGROUND
[0002] Programmable nucleases such as CRISPR-associated Cas endonucleases have revolutionized the ability to perform gene editing in organisms in a precise, site-directed manner.
SEQUENCE LISTING
[0003] The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on October 5, 2023, is named 59409-70260 l_Seq_Listing.xml and is 136,990 bytes in size.
SUMMARY
[0004] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first fragment comprising WED and RECI domains of a first Casl2 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas 12 polypeptide, wherein said first and second Cas 12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site.
[0005] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (i) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (ii) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (c) a third segment comprising a heterologous domain of at least about 100 amino acids.
[0006] In some aspects, the present disclosure provides for a fusion protein having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any of the polypeptides described herein.
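As an illustration of what a percent-identity threshold such as "at least 80%" means in practice, the following is a minimal Python sketch. It assumes the two sequences have already been aligned (equal length, with `-` marking gaps) and that identity is counted as matching columns divided by aligned columns; this passage does not specify an alignment method or identity convention, so both are assumptions, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment.

    Assumption (not from the disclosure): inputs are pre-aligned, '-' is a
    gap, and identity = identical columns / aligned columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as a match only if both residues are equal and not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """Return True if the alignment reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not from the Sequence Listing.
    print(percent_identity("MKTAYIAK", "MKTAHIAK"))
    print(meets_identity_threshold("MKTAYIAK", "MKTAHIAK", 80.0))
```

Other conventions (for example, dividing by the length of the shorter sequence, or excluding gapped columns) give different percentages, so the convention used should be stated alongside any reported identity value.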
[0007] In some aspects, the present disclosure provides for a method of editing a nucleic acid site in a cell, comprising contacting said cell with any of the compositions described herein.
[0008] In some aspects, the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to said site (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptides are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site; and (iii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.
[0009] In some aspects, the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to said site (i) a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (c) a third segment comprising a heterologous domain of at least about 100 amino acids; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site; and (iii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.
[0010] In some aspects, the present disclosure provides for a kit for disrupting a DNA site, comprising (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptides are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; and (ii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.
[0011] In some aspects, the present disclosure provides for a kit for disrupting a DNA site, comprising (i) a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (ii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.
[0012] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains derived from a Cas12 enzyme; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains from a Cas12 enzyme, wherein said Cas12 enzyme is configured to bind a double-stranded deoxyribonucleic acid (DNA) site. In some embodiments, said second fragment further comprises a REC2 domain derived from a Cas12 enzyme. In some embodiments, said Cas12 enzyme is a Class 2, Type V-F or a Cas12f enzyme. In some embodiments, said heterologous domain comprises at least about 10, 25, 50, 75, or 100 to about 1500 amino acids in length. In some embodiments, said heterologous domain comprises at least about 876 amino acids or at least about 900 amino acids. In some embodiments, said heterologous domain comprises at most about 10, 25, 50, 75, 100, 250, 500, 750, 1000, 1200, 1500, 1700, 2000, 2200, 2500, 2700, or 3000 amino acids in length. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.
In some embodiments, said domain with DNA-dependent DNA polymerase activity or said domain with Topoisomerase activity do not comprise inactivating mutations in an active site residue. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a T7 DNA polymerase, Bst polymerase or an analog thereof, T4 DNA polymerase, Taq polymerase, Vent polymerase, Q5 polymerase, Klenow fragment, DNA polymerase theta, or Phi29 polymerase, or a functional fragment or derivative thereof. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 44-52, or a variant thereof. In some embodiments, said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type I topoisomerase domain. In some embodiments, said Type I topoisomerase domain comprises E. coli Eubacterial DNA topoisomerase I, E. coli Eubacterial DNA topoisomerase III, S. cerevisiae Yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, or M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, or a functional fragment thereof. In some embodiments, said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type II topoisomerase domain. In some embodiments, said type II topoisomerase domain comprises E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof. In some embodiments, said heterologous domain comprises a domain with topoisomerase activity and said heterologous domain comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 53-55. In some embodiments, said first fragment and said second fragment are derived from a same Cas12 enzyme. In some embodiments, said first fragment and said second fragment are derived from a different Cas12 enzyme. In some embodiments, said first fragment and said second fragment do not comprise an inactivating mutation in an active site residue of said Cas12 enzyme. In some embodiments, said first fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 1, 5, 13, 15, or 24-34, or a variant thereof. In some embodiments, said second fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 2, 6, 11, or 35-43. In some embodiments, said composition further comprises an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site. In some embodiments, said region with complementarity to a region 5' to said nucleic acid site or said region with complementarity to a region 3' to said nucleic acid site comprises at least 4 to 30 bp or at least 4 to 400 bp.
In some embodiments, said insert nucleic acid sequence comprises at least about 1 bp to at least about 20 kb. In some embodiments, said insert nucleic acid sequence comprises at least about 100 bp, 250 bp, 500 bp, 750 bp, 1 kb, 1.2 kb, 1.5 kb, 1.7 kb, 2.0 kb, 2.5 kb, 3 kb, 3.5 kb, 4 kb, 4.5 kb, 5 kb, 6 kb, 6.5 kb, 7 kb, 7.5 kb, 8 kb, 8.5 kb, 9 kb, 9.5 kb, 10 kb, 11 kb, 12 kb, 13 kb, 14 kb, 15 kb, 16 kb, 17 kb, 18 kb, 19 kb, 20 kb, or any range between these values. In some embodiments, said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, said insert DNA molecule is: (i) linked to said programmable nuclease; (ii) linked to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme; or (iii) hybridized to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme. In some embodiments, said composition further comprises a guide polynucleotide configured to interact with said Cas protein, wherein (a) said guide polynucleotide further comprises a hybridization domain at a 3' end; and (b) wherein said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA. In some embodiments, said insert DNA molecule comprises a region with complementarity to a region 5' to said double-stranded DNA site at said 5' end of said insert DNA. In some embodiments, said insert DNA molecule is linked to a catalytic hydroxyl group of said domain having DNA topoisomerase activity at a first end, and wherein said insert DNA molecule comprises said region homologous to a region 5' to said nucleic acid site or said region homologous to a region 3' to said nucleic acid site at a second end. In some embodiments, said composition further comprises a linker between said first fragment and said heterologous domain, or between said heterologous domain and said second fragment. In some embodiments, said linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof. In some embodiments, said linker comprises LPXTG (SEQ ID NO: 79), GGG (SEQ ID NO: 80), (GGG)n (SEQ ID NO: 81), (GGGGS)n (SEQ ID NO: 82), (GGGS)n (SEQ ID NO: 83), NI-7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.
[0013] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first segment comprising a Cas12 enzyme configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (i) a sequence comprising WED and REC1 domains derived from a Cas12 enzyme; or (ii) a sequence comprising RuvC, REC2, and Nuc domains from a Cas12 enzyme; and (c) a third segment comprising a heterologous domain of at least about 10, 25, 50, 75, or 100 amino acids. In some embodiments, said first domain further comprises WED, REC1, RuvC, REC2, and Nuc domains from said Cas12 enzyme. In some embodiments, said heterologous domain comprises at least about 10, 25, 50, 75, or 100 to about 900 amino acids in length. In some embodiments, said heterologous domain comprises at least about 100 to about 1500 amino acids in length. In some embodiments, said heterologous domain comprises at least about 876 amino acids or at least about 900 amino acids. In some embodiments, said heterologous domain comprises at most about 10, 25, 50, 75, 100, 250, 500, 750, 1000, 1200, 1500, 1700, 2000, 2200, 2500, 2700, or 3000 amino acids in length. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, said domain with DNA-dependent DNA polymerase activity or said domain with Topoisomerase activity do not comprise inactivating mutations in an active site residue. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a T7 DNA polymerase, Bst polymerase or an analog thereof, T4 DNA polymerase, Taq polymerase, Vent polymerase, Q5 polymerase, Klenow fragment, DNA polymerase theta, or Phi29 polymerase, or a functional fragment or derivative thereof. In some embodiments, said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 44-52, or a variant thereof. In some embodiments, said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type I topoisomerase domain. In some embodiments, said Type I topoisomerase domain comprises E. coli Eubacterial DNA topoisomerase I, E. coli Eubacterial DNA topoisomerase III, S. cerevisiae Yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, or M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, or a functional fragment thereof. In some embodiments, said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type II topoisomerase domain. In some embodiments, said type II topoisomerase domain comprises E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof. In some embodiments, said heterologous domain comprises a domain with topoisomerase activity and said heterologous domain comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 53-55, or a variant thereof.
In some embodiments, said first fragment and said second fragment are derived from a same Cas12 enzyme. In some embodiments, said first fragment and said second fragment are derived from a different Cas12 enzyme. In some embodiments, said first fragment and said second fragment do not comprise an inactivating mutation in an active site residue of said Cas12 enzyme. In some embodiments, said first fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 1, 5, 13, 15, or 24-34, or a variant thereof. In some embodiments, said second fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 2, 6, 11, or 35-43, or a variant thereof. In some embodiments, the composition further comprises an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site. In some embodiments, said region with complementarity to a region 5' to said nucleic acid site or said region with complementarity to a region 3' to said nucleic acid site comprises at least 4 to 30 bp or at least 4 to 400 bp. In some embodiments, said insert nucleic acid sequence comprises at least about 1 bp to at least about 20 kb. In some embodiments, said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, said insert DNA molecule is: (i) linked to said programmable nuclease; (ii) linked to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme; or (iii) hybridized to a guide polynucleotide configured to interact with a Cas endonuclease, wherein said programmable nuclease is a Cas enzyme. In some embodiments, said composition further comprises a guide polynucleotide configured to interact with said Cas protein, wherein (a) said guide polynucleotide further comprises a hybridization domain at a 3' end; and (b) wherein said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA. In some embodiments, said insert DNA molecule comprises a region with complementarity to a region 5' to said double-stranded DNA site at said 5' end of said insert DNA. In some embodiments, said insert DNA molecule is linked to a catalytic hydroxyl group of said domain having DNA topoisomerase activity at a first end, and wherein said insert DNA molecule comprises said region homologous to a region 5' to said nucleic acid site or said region homologous to a region 3' to said nucleic acid site at a second end. In some embodiments, the composition further comprises a linker between said first fragment and said heterologous domain, or between said heterologous domain and said second fragment. In some embodiments, said linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof.
In some embodiments, said linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1.7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.
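The domain order recited above (first fragment, optional linker, heterologous domain, optional linker, second fragment) can be sketched at the sequence level as follows. This is an illustrative Python sketch only: the function names are hypothetical, the fragment strings are toy placeholders rather than sequences from the Sequence Listing, and treating "at least about 100 amino acids" as a hard cutoff of 100 is a simplifying assumption.

```python
def flexible_linker(unit: str = "GGGGS", n: int = 3) -> str:
    """Build a repeat linker such as (GGGGS)n by repeating a unit n times."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return unit * n


def assemble_fusion(first_fragment: str,
                    heterologous: str,
                    second_fragment: str,
                    left_linker: str = "",
                    right_linker: str = "",
                    min_heterologous_len: int = 100) -> str:
    """Concatenate fragments in N-to-C order with optional linkers.

    Assumption: the length floor is enforced as a strict minimum; the text
    says "at least about", which is looser than this check.
    """
    if len(heterologous) < min_heterologous_len:
        raise ValueError(
            f"heterologous domain has {len(heterologous)} residues; "
            f"expected at least {min_heterologous_len}"
        )
    return (first_fragment + left_linker + heterologous
            + right_linker + second_fragment)


if __name__ == "__main__":
    # Toy placeholder strings, NOT real Cas12f or polymerase sequences.
    first = "MKWAA"        # stands in for a WED/REC1-containing fragment
    second = "RRVCN"       # stands in for a RuvC/Nuc-containing fragment
    domain = "A" * 120     # stands in for a heterologous domain
    linker = flexible_linker("GGGGS", 2)
    fusion = assemble_fusion(first, domain, second, linker, linker)
    print(len(fusion))
```

Keeping the two linkers as separate parameters reflects that the text recites a linker on either side of the heterologous domain, not necessarily both.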
[0014] In some aspects, the present disclosure provides for a fusion protein comprising a sequence having at least 80% identity to any one of SEQ ID NOs: 20-23, or a variant thereof.
[0015] In some aspects, the present disclosure provides for a system comprising any of the components of the compositions described herein.
[0016] In some aspects, the present disclosure provides for a method of editing a DNA locus in a cell comprising contacting to a cell any of the compositions described herein, or a nucleic acid comprising or encoding any of the compositions described herein. In some embodiments, the cell is a bacterial, archaeal, plant, mammalian, primate, or human cell.
[0017] Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
INCORPORATION BY REFERENCE
[0018] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
[0020] Figure 1 (FIG. 1) depicts a schematic representation of split Cas12f functional domains. Examples are based on two cas12f proteins: Sp cas12f and Pt cas12f. It was considered that fA and fB can be integrated with different intervening heterologous domains to add new enzymatic activity while simultaneously preserving specific guided DNA cleavage activity. Abbreviations: Nuclease lobe (NUC), Wedge (WED), Recognition lobe REC1 and REC2, Nuclease (Lid RuvC).
[0021] Figure 2 (FIG. 2) depicts examples of split Cas12f proteins incorporating various heterologous polymerase domains. The Cas12f (Sp cas12f and Pt cas12f) functional domains were extracted and integrated with heterologous sequences (central bar: DNA polymerase Bst or Phi29) such that the Cas12f components retain structural/functional integrity. The polymerase function is strategically located in the structure such that it can best support activity of the polymerases on DNA.
[0022] Figure 3 (FIG. 3) depicts examples of split Cas12f proteins incorporating SpyCatcher domains to allow linkage to an additional protein in vitro.
[0023] Figures 4A and 4B (FIGs. 4A and 4B) depict examples of an alternative anchor/connector design incorporating one functional Sp Cas12f domain and an additional Cas12f fragment to allow for anchoring of the heterologous domain at the proper place in the complex. In this embodiment, the natural dimer structure of Cas12f is used to provide a scaffold. The leftmost fragment is a full Sp cas12f monomer; the central domains include SpyCatcher (which serves here as a linker but can also be used to attach different heterologous polypeptides). The rightmost domain is a structural monomer used in a minimal truncated form, where its role is limited to that of an anchor/connector for an accessory protein (grey box, e.g. DNA polymerase or Topoisomerase). In some embodiments, SpyCatcher can be replaced with linker sequences.
[0024] Figures 5A and 5B (FIGs. 5A and 5B) depict expression/purification and activity assays for Sp and Pt cas12f and derivative constructs. These assays illustrate that domains extracted from Cas12f and integrated with SpyCatcher retain specific guided cleavage activity.
[0025] Figure 5A (FIG. 5A) depicts expression and activity assays for a first batch of split cas12f constructs. The top panel shows SDS-PAGE of the purification products in Example 1. The analyses verified production of all variants described in the example. Lane order: 1- Pt cas12f N-term SpyTag, 2- Sp cas12f N-term SpyTag, 3- Pt cas12f C-term SpyTag, 4- Sp cas12f C-term SpyTag, 5- Pt integral SpyCatcher, 6- Sp integral SpyCatcher. The bottom panel shows an agarose gel depicting results of the in vitro guided cleavage assay in Example 1. The substrate and cleavage products are annotated. The assay demonstrates that all the variants retain wild-type-like activity. Lane order: 1- Sp cas12f, 2- Pt cas12f, 3- Sp integral SpyCatcher, 4- Pt integral SpyCatcher, 5- Pt cas12f C-term SpyTag, 6- 1-Pt cas12f N-term SpyTag, M- molecular weight marker.
[0026] Figure 5B (FIG. 5B) depicts expression and activity assays for a second batch of split cas12f constructs. The top panel shows SDS-PAGE of the purification products in Example 1. The analysis shows good yield for all variants described in the example. Lane order: 1- Sp integral BstPol, 2- Pt integral BstPol, M- molecular weight marker. The bottom panel shows an agarose gel depicting results of the in vitro guided cleavage assay in Example 1. The substrate and cleavage products are annotated. The assay demonstrates that all the variants retain wild-type-like activity. Lane order: 1- Sp cas12f, 2- Pt cas12f, 3- Sp integral BstPol, 4- Pt integral BstPol, 5- Sp target DNA, M- molecular weight marker.
[0027] Figure 6 (FIG. 6) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs can facilitate integration of genomic DNA; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to the ends of the DNA insert. The endonuclease catalyzes DNA damage at a specified site to induce DNA repair. The DNA insert includes homology fragments which are used in homologous recombination/repair. The DNA insert in this case is presented by two Cas polypeptides which specifically recognize and bind target DNA.
[0028] Figure 7 (FIG. 7) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs can facilitate integration of genomic DNA; in this embodiment, one end of the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA, while the other end is free. In this model, the endonuclease catalyzes DNA damage at a specified site to induce DNA repair incorporating the DNA insert. The DNA insert includes homology fragments which are used in homologous recombination/repair. In this case, the DNA insert is presented by a single Cas polypeptide (e.g. enzyme) that specifically recognizes and binds target DNA.
[0029] Figure 8 (FIG. 8) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs with topoisomerase or Protein A integrated between the halves of the Cas12 can facilitate integration of genomic DNA; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to the ends of the DNA insert. The topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands. Damage to the DNA target induces DNA repair. The DNA insert includes homology fragments which are used in homologous recombination/repair.
[0030] Figure 9 (FIG. 9) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs according to the current disclosure with topoisomerase or Protein A integrated between the halves of the Cas12 can facilitate integration of genomic DNA; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to one end of the DNA insert, while the other end of the DNA insert is free. The topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands. Damage to the DNA target induces DNA repair. The DNA insert includes homology fragments which are used in homologous recombination/repair. In this case, the DNA insert is presented by a single Cas-topoisomerase.
[0031] Figure 10 (FIG. 10) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs with topoisomerase or Protein A and exonuclease integrated in the constructs; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to the ends of the DNA insert. The topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands. To make the topoisomerase reaction irreversible, a second accessory protein, an exonuclease, damages the ssDNA in the vicinity of the cut. Damage to the target DNA induces DNA repair. The DNA insert includes homology fragments which are used in homologous recombination/repair. In this case, the DNA insert is presented by two Cas-topoisomerases which are specifically covalently attached to target DNA.
[0032] Figure 11 (FIG. 11) shows a proposed design schematic by which Cas endonuclease (e.g. split-Cas12f endonuclease) constructs with topoisomerase or Protein A and exonuclease integrated in the constructs; in this embodiment, the DNA insert is attached to the Cas polypeptide (e.g. enzyme) complex via addition of a complementary sequence that binds a free end of the guide RNA to one end of the DNA insert, while the other end of the DNA insert is free. The topoisomerase or Protein A catalyzes cleavage of target DNA and simultaneously covalently attaches to one of the strands. Damage to the DNA target induces DNA repair. To make the topoisomerase reaction irreversible, a second accessory protein, an exonuclease, damages the ssDNA in the vicinity of the cut. The DNA insert includes homology fragments which are used in homologous recombination/repair. In this case, the DNA insert is presented by a single Cas that specifically recognizes and binds target DNA.
[0033] Figures 12A and 12B (FIGs. 12A and 12B) illustrate various DNA insert formats that can be used with Cas (e.g. Cas12f) polypeptides (e.g. enzymes) to facilitate homologous recombination at a target site.
[0034] Figure 12A shows several diagrams of attachment of gRNAs to DNA inserts. In panel (A), double Cas polypeptides are attached to the insert with linear dsDNA hybridization; the insert includes the target homology sequence required for the editing on either end. In panel (B), a single Cas is attached with linear dsDNA hybridization via one side of the insert, while the other side is protected from exonucleolytic degradation with modified nucleotides (e.g. using nucleotide modifications such as C3'-spacer, hexanediol, 1',2'-Dideoxyribose/dSpacer, PC Spacer, Spacer 9, or Spacer 18, or bond modifications such as phosphorothioate bonds). In panel (C), a single Cas polypeptide is attached to dumbbell dsDNA containing an insert: one side of the insert is attached to Cas via oligo hybridization, while the other side lacks a hybridization site and simply has a dumbbell loop. As in the case of the linear dsDNA insert in panels (A) and (B), the dumbbell includes two target homology sequences required to complete homologous recombination. In panel (D), double Cas polypeptides are attached to dumbbell dsDNA containing an insert analogous to panel (C) but having a second hybridization site for an oligo that binds a gRNA on both sides of the dumbbell. In panel (E), a single Cas is attached to dumbbell dsDNA containing an insert, but via a hybridization site internal to the homology regions.
[0035] Figure 12B shows an example dumbbell sequence (5'-GAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGACAGCTTTTCCTAGACAGGGGCTAGTATGTGCAtttcctgatgtcgatgtgCCAGGAGAGGAGGGAGAAATCCCTCCTCTCCTGGcacatcgacatcaggaaaTGCACATACTAGCCCCTGTCTAGGAAAAGCTGTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTCGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCA-3', SEQ ID NO: 74) that can be used in the embodiments described in Figure 12A, where underlined sequences (5'-GAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGACAGCTTTTCCTAGACAGGGGCTAGTATGTGCAtttcctgatgtcgatgtgCCAGGAGAGGAGGGA-3' and 5'-TCCCTCCTCTCCTGGcacatcgacatcaggaaaTGCACATACTAGCCCCTGTCTAGGAAAAGCTGTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTC-3') denote stem parts of the dumbbell and italicized (5'-GAAA-3' and 5'-GTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCA-3') letters denote single strand parts of the dumbbell. The last italicized sequence (5'-GTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCA-3') can be used as a gRNA hybridization site. Such a dumbbell can be constructed by ligating the adaptor sequences (bolded) using e.g. T4 DNA ligase.
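As a quick sanity check on the two stem sequences quoted above, the following minimal Python sketch tests whether they are reverse complements of one another, which is what allows them to fold back into the double-stranded stem of the dumbbell. The helper names are hypothetical; the stem strings are copied from the paragraph above with whitespace removed and case ignored.

```python
# Complement table for upper-case DNA letters (inputs are upper-cased first).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove whitespace and normalize case."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return clean(seq).translate(_COMPLEMENT)[::-1]


def stems_pair(stem_a: str, stem_b: str) -> bool:
    """True if stem_a is exactly the reverse complement of stem_b."""
    return clean(stem_a) == reverse_complement(stem_b)


# Stem sequences as quoted in the description of Figure 12B.
STEM_A = (
    "GAAAGGAAGCCCTGCTTCCTCCAGAGGGCGTCGCAGGACAGCTTTTCCTAGACAGG"
    "GGCTAGTATGTGCAtttcctgatgtcgatgtgCCAGGAGAGGAGGGA"
)
STEM_B = (
    "TCCCTCCTCTCCTGGcacatcgacatcaggaaaTGCACATACTAGCCCCTGTCTAGGAAAAGC"
    "TGTCCTGCGACGCCCTCTGGAGGAAGCAGGGCTTCCTTTC"
)

if __name__ == "__main__":
    print(stems_pair(STEM_A, STEM_B))
```

The check is strict (no mismatches or bulges tolerated); a design with intentional mismatches in the stem would need a looser comparison.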
[0036] Figure 13 (FIG. 13) shows a schematic representation of an editing design including Cas, DNA polymerase and DNA inserts according to the current disclosure. The schematic emphasizes three possible paths after specific target recognition, cleavage, and annealing of the 3'- ssDNA tail to the cleaved and displaced target strand: (A) a path where the displaced and cleavage by Cas target DNA serves as a primer and DNA insert is a template; (B) a path where the insert’s 3'- ssDNA tail serves as primer and cleaved target DNA serves as a template; (C) a path similar to path B but wherein the target DNA is not cleaved. The paths generate stable attachment of the insert to target DNA and involve the host DNA repair mechanism to finalize editing.
[0037] Figure 14A (FIG. 14A) shows a schematic explaining a method to incorporate long insert sequences where the insert has high homology to the target DNA. Shown are two methods, one with an accessory 3' to 5' exonuclease and one with an accessory 5' to 3' exonuclease (e.g. where exonucleases are incorporated as a heterologous polypeptide between or in addition to split Cas12f domains). One of the strands of the cleaved DNA target is digested by the 3' to 5' exonuclease or the 5' to 3' exonuclease. The digestion process competes with the host DNA repair mechanism (homologous recombination); as a result, the homology cross-point is moved away from the target site, increasing insert length.
[0038] Figure 14B (FIG. 14B) and Figure 14C (FIG. 14C) depict schematics showing how the designs in Figure 14A can be implemented using Cas/polymerase/exonuclease fusions.
[0039] Figures 15A, 15B, and 15C (FIGs. 15A, 15B, and 15C) depict polypeptide diagrams showing how Cas/polymerase/exonuclease fusions as shown in Figures 14B and 14C can be constructed at a domain level. FIG. 15A depicts examples using split Cas12f and T5 exonuclease or Exo III; FIG. 15B depicts examples using intact Cas12f and T5 exonuclease or Exo III; and FIG. 15C depicts examples using Cas9 or Cas12 alongside Bst polymerase and T5 exonuclease or Exo III. The leftmost and rightmost domains represent Cas12f Sp or Pt, the domains flanking the central domain are linkers, and the central domain is the exonuclease T5 Exo or Exo III.
[0040] Figure 16 (FIG. 16) depicts an example hybrid gRNA/DNA insert usable with methods according to the disclosure. In this example, gRNA and insert DNA are covalently attached either by a standard 5'-phosphate-3' bond or by an alternative covalent conjugation method (e.g. chemical synthesis, splint ligation of RNA using T4 DNA ligase and a bridging DNA oligonucleotide complementary to an RNA, or any of the methods described in Huang et al. Nucleic Acids Research, Volume 24, Issue 21, 1 November 1996, Pages 4360-4361, which is incorporated by reference in its entirety herein, or Mack et al. Curr Protoc Chem Biol. 2016; 8(2): 83-95, which is incorporated by reference in its entirety herein). For example, such hybrids can be constructed by ligation of gRNA to a DNA insert, chemical synthesis, or primer extension starting from a gRNA primer using DNA Pol I or Klenow fragment.
DETAILED DESCRIPTION
[0041] There is a need for endonuclease compositions, methods, and systems that improve the efficiency of transgene insertions into precise locations for genomic editing. Insertion efficiencies of transgenes using CRISPR-Cas relying on simple homologous recombination can be in the single digits for large inserts, making approaches relying on such methods technically laborious. Provided herein are methods, compositions, and systems for improved gene editing, particularly involving large insert DNAs.
Definitions
[0042] The practice of some methods disclosed herein employs, unless otherwise indicated, techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics and recombinant DNA. See for example Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th Edition (2012); the series Current Protocols in Molecular Biology (F. M. Ausubel, et al. eds.); the series Methods In Enzymology (Academic Press, Inc.), PCR 2: A Practical Approach (M.J. MacPherson, B.D. Hames and G.R. Taylor eds. (1995)), Harlow and Lane, eds. (1988) Antibodies, A Laboratory Manual, and Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 6th Edition (R.I. Freshney, ed. (2010)) (which are entirely incorporated by reference herein).
[0043] As used herein, the term “programmable nuclease” generally refers to endonucleases that are “targeted” (“programmed”) to recognize and edit a pre-determined site in a genome of an organism. In an embodiment, the programmable nuclease can induce site-specific DNA cleavage at a pre-determined site in a genome. In an embodiment, the programmable nuclease may be programmed to recognize a genomic location with a DNA binding protein domain, or combination of DNA binding protein domains. Example features of Cas programmable nucleases or Cas polypeptides (e.g. enzymes) are described in, e.g. Makarova, et al. Nat. Rev. Microbiol., 18, 67-83, Shmakov et al. Nat. Rev. Microbiol., 15, 169-182, and Karvelis et al. Nucleic Acids Research, 2020, Vol. 48, No. 9, 5016-5023, each of which is incorporated by reference herein in its entirety. In some cases, the programmable nuclease is a Cas12 polypeptide, such as a Class 2, Type V-F polypeptide or a Cas12f polypeptide for which example domain organization, domain types (e.g. WED, REC1, RuvC, Nuc, and REC21 domains), functional residues, and structure (e.g. relative to SEQ ID NO: 84) are outlined in e.g. Xiao et al. Nucleic Acids Res. 2021 Apr 19; 49(7): 4120-4128, which is incorporated by reference herein for all purposes.
[0044] As used herein, a “guide nucleic acid” or “guide polynucleotide” generally refers to a nucleic acid that may hybridize to another nucleic acid. A guide nucleic acid may be RNA. A guide nucleic acid may be DNA. The guide nucleic acid may be programmed to bind specifically to a nucleic acid with a particular sequence. The nucleic acid to be targeted, or the target nucleic acid, may comprise nucleotides. The guide nucleic acid may comprise nucleotides. A portion of the target nucleic acid may be complementary to a portion of the guide nucleic acid. The strand of a double-stranded target polynucleotide that is complementary to and hybridizes with the guide nucleic acid may be called the complementary strand. The strand of the double-stranded target polynucleotide that is complementary to the complementary strand, and therefore may not be complementary to the guide nucleic acid, may be called the noncomplementary strand. A guide nucleic acid may comprise a single polynucleotide chain and can be called a “single guide nucleic acid.” A guide nucleic acid may comprise two polynucleotide chains and may be called a “double guide nucleic acid.” If not otherwise specified, the term “guide nucleic acid” may be inclusive, referring to both single guide nucleic acids and double guide nucleic acids. Guide nucleic acids may comprise a nucleic acid targeting segment (e.g. a crRNA) and a protein binding sequence. Guide nucleic acids may comprise a nucleic acid targeting segment (e.g. a crRNA), a protein binding sequence, and a trans-activating RNA (e.g. a tracrRNA).
[0045] A guide nucleic acid may comprise a segment that can be referred to as a “nucleic acid-targeting segment”, a “nucleic acid-targeting sequence”, or a “seed sequence”. In some cases, the sequence is 19-21 nucleotides in length. In some cases, the “nucleic acid-targeting segment” or “nucleic acid-targeting sequence” comprises a crRNA. A nucleic acid-targeting segment may comprise a sub-segment that may be referred to as a “protein binding segment”, “protein binding sequence”, or “Cas protein binding segment”.
[0046] The term “guide RNA”, as used herein, can generally refer to a nucleic acid with at least about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity and/or sequence similarity to a wild type example guide RNA sequence (e.g., a type V guide RNA from S. pyogenes, S. aureus, etc.). Guide RNA can refer to a nucleic acid with at most about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% sequence identity and/or sequence similarity to a wild type example guide RNA sequence. Guide RNA may refer to a modified form of a guide RNA that can comprise a nucleotide change such as a deletion, insertion, substitution, variant, mutation, or chimera. A guide RNA may refer to a nucleic acid that can be at least about 60% identical to a wild type example guide RNA sequence over a stretch of at least 6 contiguous nucleotides. For example, a guide RNA sequence can be at least about 60% identical, at least about 65% identical, at least about 70% identical, at least about 75% identical, at least about 80% identical, at least about 85% identical, at least about 90% identical, at least about 95% identical, at least about 98% identical, at least about 99% identical, or 100% identical to a wild type example guide RNA sequence over a stretch of at least 6 contiguous nucleotides.
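One possible reading of "at least about 60% identical ... over a stretch of at least 6 contiguous nucleotides" can be sketched as an ungapped window comparison. The Python below is a minimal illustration under that assumption; the function names and the exhaustive window search are hypothetical choices, not a procedure stated in the disclosure.

```python
def window_identity(a: str, b: str) -> float:
    """Fraction of positions that match between two equal-length windows."""
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_window_identity(query: str, reference: str, window: int = 6) -> float:
    """Highest ungapped identity between any window-length stretch of the
    query and any window-length stretch of the reference.

    Returns 0.0 when either sequence is shorter than the window.
    """
    best = 0.0
    for i in range(len(query) - window + 1):
        for j in range(len(reference) - window + 1):
            identity = window_identity(
                query[i:i + window], reference[j:j + window]
            )
            best = max(best, identity)
    return best
```

Under this reading, a candidate would meet the 60% criterion when `best_window_identity(candidate, wild_type) >= 0.6`; gapped alignments, which the sketch ignores, could give different values.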
[0047] The term “sequence identity” or “percent identity” in the context of two or more nucleic acids or polypeptide sequences, generally refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm. Suitable sequence comparison algorithms for polypeptide sequences include, e.g., BLASTP using parameters of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment for polypeptide sequences longer than 30 residues; BLASTP using parameters of a wordlength (W) of 2, an expectation (E) of 1000000, and the PAM30 scoring matrix setting gap costs at 9 to open gaps and 1 to extend gaps for sequences of less than 30 residues (these are the default parameters for BLASTP in the BLAST suite available at https://blast.ncbi.nlm.nih.gov); CLUSTALW with parameters of ; the Smith-Waterman homology search algorithm with parameters of a match of 2, a mismatch of -1, and a gap of -1; MUSCLE with default parameters; MAFFT with parameters retree of 2 and maxiterations of 1000; Novafold with default parameters; HMMER hmmalign with default parameters.
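To make the Smith-Waterman parameters named above concrete (match of 2, mismatch of -1, gap of -1), the following Python sketch computes the best local alignment score between two sequences. It assumes a linear gap penalty and returns only the score, not the alignment; the function name is hypothetical, and production tools add traceback and substitution matrices.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -1) -> int:
    """Best local alignment score of a and b under a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, aligning `"ACGT"` against `"ACT"` scores two matches, one gap, and one further match, for a total of 5.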
[0048] The term “optimally aligned” in the context of two or more nucleic acids or polypeptide sequences, generally refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that have been aligned to maximal correspondence of amino acids residues or nucleotides, for example, as determined by the alignment producing a highest or “optimized” percent identity score.
[0049] As used herein, the term “Wedge” (WED) domain generally refers to a domain (e.g. present in a Cas protein) that interacts primarily with the repeat:anti-repeat duplex of the sgRNA and the PAM duplex. A WED domain can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences.
[0050] As used herein, the term “REC domain” generally refers to a domain (e.g. present in a Cas protein) comprising at least one of two segments (REC1 or REC2) that are alpha-helical domains thought to contact the guide RNA. A REC domain or segments thereof can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences (e.g., Pfam PF19501 for domain REC1).
[0051] The term "pharmaceutically acceptable carrier" or “pharmaceutically acceptable excipient” as used herein generally refers to a diluent, adjuvant, excipient, or vehicle with which a probe of the disclosure is administered and which is approved by a regulatory agency of the Federal or a state government or listed in the U.S. Pharmacopeia or other generally recognized pharmacopeia for use in animals, and more particularly in humans. Such pharmaceutical carriers can be liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like. The pharmaceutical carriers can be saline, gum acacia, gelatin, starch paste, talc, keratin, colloidal silica, urea, and the like. When administered to a patient, the probe and pharmaceutically acceptable carriers can be sterile. Water can be a useful carrier when the composition is administered intravenously. Saline solutions and aqueous dextrose and glycerol solutions can also be employed as liquid carriers, particularly for injectable solutions. Suitable pharmaceutical carriers also include excipients such as glucose, lactose, sucrose, glycerol monostearate, sodium chloride, glycerol, propylene glycol, water, ethanol and the like. The present compositions, if desired, can also contain minor amounts of wetting or emulsifying agents, or pH buffering agents. The present compositions may take the form of solutions, emulsions, sustained-release formulations, or any other form suitable for use. In some cases the pharmaceutically acceptable excipient may comprise a transfection agent. Suitable transfection agents include, but are not limited to, linear or branched polyethylenimines (see e.g. Bonnet et al., (2008) Pharmaceut. Res. 25: 2972-2982, which is incorporated by reference herein in its entirety for all purposes), nanoparticles, lipid nanoparticles (LNPs, see e.g. Finn et al. Cell Rep. 2018 Feb 27;22(9):2227-2235. doi: 10.1016/j.celrep.2018.02.014 or Yin et al. Nat Biotechnol. 2016 Mar;34(3):328-33. doi: 10.1038/nbt.3471, both of which are incorporated by reference herein in their entireties for all purposes), lipophilic particles, peptides, micelles, dendrimers, hydrogels, synthetic or naturally derived exosomes, polymeric compositions, virus-like particles (see e.g. Lisziewicz et al., (2012) PLoS ONE 7:e35416, which is incorporated by reference herein in its entirety for all purposes), and any combination thereof.
[0052] Included in the current disclosure are variants of any of the enzymes, polypeptides, proteins, or domains described herein with one or more conservative amino acid substitutions. Such conservative substitutions can be made in the amino acid sequence of a polypeptide without disrupting the three-dimensional structure or function of the polypeptide. Conservative substitutions can be accomplished by substituting amino acids with similar hydrophobicity, polarity, and R chain length for one another. Additionally, or alternatively, by comparing aligned sequences of homologous proteins from different species, conservative substitutions can be identified by locating amino acid residues that have been mutated between species (e.g., nonconserved residues) without altering the basic functions of the encoded proteins. Such conservatively substituted variants may include variants with at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of the endonuclease protein sequences described herein. In some embodiments, such conservatively substituted variants are functional variants. Such functional variants can encompass sequences with substitutions such that the activity of one or more critical active site residues or guide polynucleotide binding residues of the endonuclease is not disrupted.
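As a minimal sketch of how conservative substitutions might be recognized in practice, the Python below encodes the eight substitution groups enumerated in paragraph [0054] and classifies each difference between two equal-length polypeptide sequences. The function names and the toy sequences in the usage note are hypothetical, and real comparisons would first align the sequences.

```python
# The eight conservative substitution groups listed in paragraph [0054].
CONSERVATIVE_GROUPS = [
    set("AG"), set("DE"), set("NQ"), set("RK"),
    set("ILMV"), set("FYW"), set("ST"), set("CM"),
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two residues differ and share at least one group."""
    a, b = original.upper(), replacement.upper()
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def count_substitutions(reference: str, variant: str) -> tuple:
    """Return (conservative, non_conservative) substitution counts for two
    ungapped, equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    conservative = non_conservative = 0
    for ref_res, var_res in zip(reference, variant):
        if ref_res == var_res:
            continue
        if is_conservative(ref_res, var_res):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative
```

For the toy pair `"MKDL"` and `"MRDA"`, the K-to-R change falls within group 4 and counts as conservative, while the L-to-A change does not.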
[0053] Also included in the current disclosure are variants of any of the enzymes, polypeptides, proteins, or domains described herein with substitution of one or more catalytic residues to decrease or eliminate activity of the enzyme, polypeptide, protein, or domain (e.g. decreased-activity variants). In some embodiments, a decreased-activity variant of an enzyme, polypeptide, protein, or domain described herein comprises a disrupting substitution of at least one, at least two, three, four, five, six, or all catalytic residues. In some embodiments, any of the endonucleases described herein can comprise a nickase mutation. In some embodiments, any of the endonucleases described herein can comprise a RuvC domain lacking nuclease activity. In some embodiments, any of the endonucleases described herein can be configured to cleave one strand of a double-stranded target deoxyribonucleic acid. In some embodiments, any of the endonucleases described herein can be configured to lack endonuclease activity or be catalytically dead.
[0054] Conservative substitution tables providing functionally similar amino acids are available from a variety of references (see, e.g., Creighton, Proteins: Structures and Molecular Properties (W H Freeman & Co.; 2nd edition, December 1993), which is incorporated herein in its entirety for all purposes). The following eight groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Glycine (G);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
7) Serine (S), Threonine (T); and
8) Cysteine (C), Methionine (M).
Example Embodiments
[0055] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising (a) a first fragment comprising WED and REC1 domains of, derived from, or obtained from a first Cas12 polypeptide or enzyme (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme); (b) a heterologous domain; and (c) a second fragment comprising RuvC and Nuc domains of, derived from, or obtained from a second Cas12 polypeptide (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme). In some embodiments, the Cas12 enzyme is configured to bind a double-stranded deoxyribonucleic acid (DNA) site. In some embodiments, the second fragment further comprises a REC2 domain of, derived from, or obtained from a second Cas12 polypeptide. In some embodiments, the first fragment and the second fragment are derived from a same Cas12 polypeptide (e.g. the first and the second Cas12 polypeptide are the same). In some embodiments, the first fragment and the second fragment are derived from different Cas12 polypeptides (e.g. the first and the second Cas12 polypeptide are different). In some embodiments, the first fragment and the second fragment do not comprise an inactivating mutation in an active site residue of the first or the second Cas12 polypeptide. In some embodiments, the first fragment and the second fragment comprise an inactivating mutation in an active site residue of the first or the second Cas12 polypeptide. In some embodiments, the fusion protein further comprises a linker between the first fragment and the heterologous domain, or between the heterologous domain and the second fragment. In some embodiments, the linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof. In some embodiments, the linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1.7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond. In some embodiments, the composition further comprises an insert DNA molecule. In some embodiments, the composition further comprises a guide polynucleotide configured to interact with the Cas12 polypeptide.
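The domain order described above (first fragment, optional linker, heterologous domain, optional linker, second fragment) can be sketched as a simple sequence-assembly helper. The Python below is illustrative only: it covers peptide linkers built from repeat units such as (GGGGS)n, the class and function names are hypothetical, and the fragment strings in the usage note are placeholders rather than sequences from the disclosure.

```python
from dataclasses import dataclass


def repeat_linker(unit: str, n: int) -> str:
    """Build a repeat linker such as (GGGGS)n from its unit and repeat count."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return unit * n


@dataclass(frozen=True)
class SplitFusion:
    """N-to-C layout: first fragment, linker, heterologous domain, linker,
    second fragment. Empty linkers mean direct peptide-bond fusion."""
    first_fragment: str       # e.g. WED and REC1 domains
    heterologous_domain: str  # e.g. a polymerase or topoisomerase domain
    second_fragment: str      # e.g. RuvC and Nuc domains
    linker_1: str = ""
    linker_2: str = ""

    def sequence(self) -> str:
        """Concatenate the parts into a single polypeptide sequence."""
        return "".join([
            self.first_fragment, self.linker_1,
            self.heterologous_domain, self.linker_2,
            self.second_fragment,
        ])
```

For instance, `SplitFusion("MAAA", "HHHH", "CCCC", linker_1=repeat_linker("GGGGS", 2)).sequence()` yields the placeholder fragments joined through a (GGGGS)2 linker on the N-terminal side only. Non-peptide linkers such as a biotin-streptavidin pair cannot be represented as a single string and fall outside this sketch.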
[0056] The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be any suitable Cas12 polypeptide (e.g. a Cas12 polypeptide that can be separated into non-contiguous fragments while still retaining enzymatic or binding activity). The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be from particular species, e.g. Streptococcus pyogenes, Parageobacillus thermoglucosidasius, an archaeon, Candidatus Micrarchaeota (archaeon), Candidatus Aureabacteria (bacterium), Acidibacillus sulfuroxidans, Ruminococcus, Syntrophomonas palmitatica, Clostridium novyi, or any combination thereof. In some embodiments, the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) is a Class 2, Type V-F or Cas12f polypeptide (for which example domain organization, functional residues, and structure relative to SEQ ID NO: 84 are outlined in e.g. Xiao et al. Nucleic Acids Res. 2021 Apr 19; 49(7): 4120-4128, which is incorporated by reference herein for all purposes). In some cases, a Class 2, Type V-F or Cas12f polypeptide according to the disclosure comprises one or more active site residues D326, E422, D510 (from the RuvC domain), or R490 (from the Nuc domain) relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). In some cases, a Class 2, Type V-F or Cas12f polypeptide according to the disclosure comprises one or more PAM interacting residues S142, R163, Y146, S286, Y146, K196, REC1c residues 134-152, or H139 relative to SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can comprise WED, REC1, RuvC, Nuc, or REC2 domains (or any combination thereof) having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to WED, REC1, RuvC, Nuc, or REC2 domains of any one of SEQ ID NOs: 1, 2, 5, 6, 11, 13, 15, 24-43, or 84, or a variant thereof.
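Residue positions stated "relative to" a reference such as SEQ ID NO: 84 are located in another polypeptide by aligning the two sequences and reading across the alignment column. The Python sketch below illustrates that lookup for a pre-computed pairwise alignment in which gaps are written as "-"; the function name and the toy alignment in the usage note are hypothetical, and producing the optimal alignment itself is left to an alignment tool.

```python
def map_reference_position(aligned_ref: str, aligned_query: str, ref_pos: int):
    """Find the query residue aligned to a 1-based reference position.

    Returns (query_position, query_residue), or None when the reference
    residue is aligned to a gap in the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_index = 0    # residues of the reference seen so far
    query_index = 0  # residues of the query seen so far
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_index += 1
        if query_char != "-":
            query_index += 1
        if ref_char != "-" and ref_index == ref_pos:
            if query_char == "-":
                return None
            return query_index, query_char
    raise IndexError("ref_pos is beyond the end of the reference")
```

With the toy alignment `"MKD-EL"` over `"M-DAEL"`, reference position 3 (D) maps to query position 2, while reference position 2 (K) has no counterpart because the query carries a gap there.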
[0057] The heterologous domain can comprise any suitable polypeptide residues or domains of appropriate size. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100-1500 amino acids in length. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100-2000 amino acids in length. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. In some cases, the heterologous domain comprises (e.g. consists of) at most about 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. In some embodiments, the heterologous domain comprises (e.g. consists of) at least about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length to at most about 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. The heterologous domain can comprise an enzyme. The heterologous domain can comprise a DNA-binding or a DNA-conjugating domain. The heterologous domain can comprise a domain with DNA-dependent DNA polymerase activity or a domain with topoisomerase activity. 
The heterologous domain can comprise a T7 DNA polymerase domain, a Bst polymerase domain or an analog thereof (e.g. a Bst large fragment polymerase domain or a Bst 2.0 polymerase domain), a T4 DNA polymerase domain, a Taq polymerase domain, a Vent polymerase domain, a Q5 polymerase domain, a Klenow fragment domain, a DNA polymerase theta domain, or a Phi29 polymerase domain, or a functional fragment or derivative thereof. Example organization, structure, and function of T7 DNA polymerase can be found in e.g. Doublie et al. Curr Opin Struct Biol. 1998 Dec;8(6):704-12. doi: 10.1016/s0959-440x(98)80089-4 and UniProtKB/Swiss-Prot accession no. P00581.1, both of which are incorporated by reference herein for all purposes. In some cases, a T7 DNA polymerase domain according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) H506, R518, K522, Y526, E480, or Y530, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 46. In some embodiments, a T7 DNA polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 46, or a variant thereof. Example organization, structure, and function of large fragment Bst polymerase (e.g. SEQ ID NO: 45) can be found in e.g. Oscorbin et al. Comput Struct Biotechnol J. 2023 Sep 12;21:4519-4535. doi: 10.1016/j.csbj.2023.09.008. eCollection 2023, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a Bst polymerase domain (e.g. Bst large fragment or Bst 2.0 such as SEQ ID NOs: 44 or 45) according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) D653, D830, E831, H829, Q797, R615, or E658, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 45. In some embodiments, a Bst polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 45, or a variant thereof. Example organization, structure, and function of phi29 polymerase can be found in, e.g. Del Prado et al. Sci Rep. 2019 Jan 29;9(1):923. 
doi: 10.1038/s41598-018-37513-7 and UniProtKB/Swiss-Prot accession no. P03680.1, both of which are incorporated by reference herein in their entirety for all purposes. In some embodiments, a phi29 polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y101, T189, Q180, 12, 14, 15, 59, 61, 62, 65, 66, 69, 122, 123, 128, 143, 148, 169, 196, 198, 249, 252, 253, 255, 364, 371, 383, 392, 393, 434, 437, 438, 455, 456, 457, 458, 498, or 500, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 51. In some embodiments, a phi29 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 51, or a variant thereof. Example organization, structure, and function of Taq polymerase can be found in, e.g. Eun, “Enzymology Primer for Recombinant DNA Technology” (Chapter 6, DNA polymerases), ISBN 978-0-12-243740-3 (Academic Press, 1996), Park et al. Mol Cells. 1997 Jun 30;7(3):419-24, and UniProtKB/Swiss-Prot accession no. P19821.1, all of which are incorporated by reference herein in their entirety for all purposes. In some embodiments, a Taq polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) G308, V310, L356, R405, R25, or R74 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 48. In some embodiments, a Taq polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 48, or a variant thereof. Example organization, structure, and function of T4 polymerase can be found in, e.g. Wang et al. Biochemistry. 1996 Jun 25;35(25):8110-9. doi: 10.1021/bi960178r, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a T4 polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y320 or E191 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 47. In some embodiments, a T4 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 47, or a variant thereof. 
Example organization, structure, and function of Klenow polymerase can be found in, e.g. Polesky et al. J Biol Chem. 1990 Aug 25;265(24):14579-91, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a Klenow polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y766, R841, N845, N849, R668, or D882 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 50. In some embodiments, a Klenow polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 50, or a variant thereof. Example organization, structure, and function of Vent (e.g. T. litoralis) polymerase can be found in, e.g. Gardner et al. Nucleic Acids Res. 1999 Jun 15;27(12):2545-53. doi: 10.1093/nar/27.12.2545, which is incorporated by reference in its entirety herein for all purposes. In some embodiments, a Vent polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) A488, N494, S495, Y412, K490, N494, Q486, R487, Y496, or Y499 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 49 or a variant thereof. In some embodiments, a Vent polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 49, or a variant thereof.
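The recurring phrase "substituted to alanine or glycine" at a named residue can be illustrated with a small helper that applies a point substitution written in the usual wild-type/position/replacement notation (e.g. D653A). The Python below is a sketch under that assumption; the function names are hypothetical, and the sequences in the usage note are toy strings, not any SEQ ID NO of the disclosure.

```python
import re


def apply_substitution(sequence: str, mutation: str) -> str:
    """Apply a point substitution such as 'D3A' (1-based position).

    Raises ValueError if the notation is malformed or the wild-type residue
    named in the mutation does not match the sequence.
    """
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"unrecognized mutation notation: {mutation}")
    wild_type, position, replacement = (
        parsed.group(1), int(parsed.group(2)), parsed.group(3)
    )
    if position < 1 or position > len(sequence):
        raise IndexError("position is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[:position - 1] + replacement + sequence[position:]


def alanine_variants(sequence: str, positions) -> dict:
    """Map each mutation name to the sequence carrying that single
    alanine substitution."""
    variants = {}
    for pos in positions:
        name = f"{sequence[pos - 1]}{pos}A"
        variants[name] = apply_substitution(sequence, name)
    return variants
```

Applied to the toy sequence `"MKDEL"`, `apply_substitution("MKDEL", "D3A")` gives `"MKAEL"`. For a real domain, positions quoted relative to a reference SEQ ID NO would first be mapped onto the sequence being edited through an alignment.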
[0058] The heterologous domain can comprise a topoisomerase domain. The heterologous domain can comprise a Type I (e.g. Type IA) or Type II topoisomerase domain, any combination thereof, or a functional fragment or derivative thereof. Example organization and function of Type I (e.g. Type IA) topoisomerases can be found in e.g. Chen et al. J Biol Chem. 1998 Mar 13;273(11):6050-6. doi: 10.1074/jbc.273.11.6050, which is incorporated by reference herein for all purposes. In some cases, a Type I topoisomerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) E9, H33, D111, E115, N309, E313, T318, R321, T322, D323, H365, or T496, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 85. Example organization, structure, and function of Type II topoisomerases can be found in e.g. Liu et al. J Biol Chem. 1998 Aug 7;273(32):20252-60. doi: 10.1074/jbc.273.32.20252, which is incorporated by reference herein for all purposes. In some cases, a Type II topoisomerase according to the current disclosure can comprise one or more critical (e.g. active site) residue(s) Y782, R690, D697, K700, R704, or R781 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 86, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). The heterologous domain can comprise E. coli eubacterial DNA topoisomerase I, E. coli eubacterial DNA topoisomerase III, S. cerevisiae yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, or phiX174 protein A, or a functional fragment thereof. The heterologous domain can comprise E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof.
[0059] The insert DNA molecule can have a variety of structures and configurations suitable for insertion into genomic DNA (e.g. via homologous recombination or other DNA repair methods). In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule comprises a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises (e.g. consists of) at least about 4 bp or nucleotides to at least about 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides to at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides, or any range between these values. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides to at most about 10,
20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides, or any range between these values. In some embodiments, the insert DNA molecule further comprises a transgene. In some embodiments, the transgene comprises an open reading frame (ORF). In some embodiments, the transgene comprises a promoter operably linked to an ORF. In some embodiments, the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt). In some embodiments, the transgene comprises at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt). In some embodiments, the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500,
2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt) to at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt), or any range between these values. In some embodiments, the transgene is flanked by the region with complementarity to a region 5' to the double-stranded DNA site and the region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments, the insert DNA molecule is linked to a hydroxyl (e.g. catalytic hydroxyl) group of the domain having DNA topoisomerase activity at a first end, and the insert DNA molecule comprises the region homologous to a region 5' to the nucleic acid site or the region homologous to a region 3' to the nucleic acid site at a second end. In some embodiments, the insert DNA molecule comprises a first end configured to hybridize with a hybridization domain of a guide polynucleotide at the 3' end of the insert DNA when the guide polynucleotide further comprises a hybridization domain at a 3' end.
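As a non-limiting illustration of the insert architecture described above (a transgene flanked by a 5' region and a 3' region), the following Python sketch assembles an insert and checks the lengths against the outer bounds recited in this paragraph (about 4 to 400 nucleotides for each flanking region, about 100 to 20,000 nucleotides for the transgene). The class name, field names, and the choice to enforce the broadest recited bounds are assumptions made for illustration; strand orientation and the complementarity itself are not modeled.

```python
from dataclasses import dataclass

# Outer bounds taken from the ranges recited in the text (nucleotides).
ARM_MIN, ARM_MAX = 4, 400
TRANSGENE_MIN, TRANSGENE_MAX = 100, 20_000


@dataclass(frozen=True)
class InsertDNA:
    """Hypothetical container: 5' flanking region, transgene, 3' flanking region."""

    five_prime_region: str
    transgene: str
    three_prime_region: str

    def __post_init__(self) -> None:
        # Validate both flanking regions against the recited length window.
        for name in ("five_prime_region", "three_prime_region"):
            length = len(getattr(self, name))
            if not ARM_MIN <= length <= ARM_MAX:
                raise ValueError(f"{name} length {length} outside {ARM_MIN}-{ARM_MAX} nt")
        if not TRANSGENE_MIN <= len(self.transgene) <= TRANSGENE_MAX:
            raise ValueError("transgene length outside recited range")

    @property
    def sequence(self) -> str:
        """Full insert, written 5' to 3': flank, transgene, flank."""
        return self.five_prime_region + self.transgene + self.three_prime_region
```

A narrower embodiment (e.g. flanking regions of at least about 20 and at most about 100 nucleotides) could be modeled by tightening the module-level constants.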
[0060] The guide polynucleotide configured to interact with the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be any suitable guide polynucleotide configured to hybridize to the DNA site (e.g. an RNA comprising a guide suitable for interacting with a Cas12f enzyme or a Class 2, Type V-F enzyme, or a mixture of RNA and DNA comprising a region configured to hybridize to, or complementary to, the DNA site). In some embodiments, the guide polynucleotide further comprises a hybridization domain at a 3' end. In some embodiments, the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides. In some embodiments, the hybridization domain comprises at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides. In some embodiments, the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides to at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides, or any range between these values.
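For illustration, one way a 3' hybridization domain of a guide could be derived so that it base-pairs with the 3' end of an insert DNA is to take the reverse complement of that end. The Python sketch below assumes an all-RNA guide, standard Watson-Crick pairing, and an insert written 5' to 3' using only A, C, G, and T; the function names and the default length of 20 nucleotides are hypothetical choices within the 1 to 50 nucleotide window recited above.

```python
# DNA base -> pairing RNA base (Watson-Crick; assumes an RNA guide).
_DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def hybridization_domain(insert_dna: str, length: int = 20) -> str:
    """RNA sequence (5'->3') that pairs with the last `length` nt of the insert."""
    if not 1 <= length <= 50:
        raise ValueError("length must be within the recited 1-50 nt window")
    if length > len(insert_dna):
        raise ValueError("insert is shorter than the requested domain")
    three_prime_end = insert_dna.upper()[-length:]
    # Reverse the target end and complement each base so the strands are antiparallel.
    return "".join(_DNA_TO_RNA_COMPLEMENT[base] for base in reversed(three_prime_end))


def append_hybridization_domain(guide_rna: str, insert_dna: str, length: int = 20) -> str:
    """Return the guide with the hybridization domain added at its 3' end."""
    return guide_rna + hybridization_domain(insert_dna, length)
```

With a toy insert ending in 5'-GGGT-3', a four-nucleotide hybridization domain under these assumptions is 5'-ACCC-3'.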
[0061] The composition can comprise a pharmaceutically acceptable excipient. The excipient can comprise a transfection agent (e.g. a liposome or a lipid nanoparticle). In some embodiments, a fusion protein of the disclosure is provided in a lipid nanoparticle (LNP) by encapsulating the fusion protein with an optional guide polynucleotide or insert DNA molecule into the LNP. This can be performed using methodologies documented e.g. in Finn et al. Cell Rep. 2018 Feb 27;22(9):2227-2235. doi: 10.1016/j.celrep.2018.02.014 or Yin et al. Nat Biotechnol. 2016 Mar;34(3):328-33. doi: 10.1038/nbt.3471, both of which are incorporated by reference herein in their entireties for all purposes.
[0062] In some aspects, the present disclosure provides for a composition comprising a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme) configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (i) a sequence comprising WED and REC1 domains of, derived from, or obtained from a first Cas12 polypeptide (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme); or (ii) a sequence comprising RuvC, REC2, and Nuc domains of, derived from, or obtained from a second Cas12 polypeptide (e.g. a natural genomic or polypeptide sequence of a Cas12 polypeptide or enzyme); and (c) a third segment comprising a heterologous domain of at least about 100 amino acids. In some embodiments, the first segment further comprises WED, REC1, RuvC, REC2, and Nuc domains of, derived from, or obtained from the first Cas12 polypeptide. In some embodiments, the fusion protein further comprises a linker between (a), (b), or (c). In some embodiments, the linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof. In some embodiments, the linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1.7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond. In some embodiments, the composition further comprises an insert DNA molecule. In some embodiments, the composition further comprises a guide polynucleotide configured to interact with the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide).
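As a simplified illustration of the segment-and-linker arrangement described above, the following Python sketch builds a repeat linker such as (GGGGS)n and joins a Cas12-derived segment to a heterologous domain, enforcing the recited minimum of about 100 amino acids for the heterologous domain. The N-to-C order shown (Cas12 segment, linker, heterologous domain), the function names, and the default n = 3 are assumptions for illustration only; the disclosure does not limit the order of the segments, and non-peptide linkers (e.g. biotin-streptavidin) are not modeled.

```python
HETEROLOGOUS_MIN_LENGTH = 100  # "at least about 100 amino acids"


def repeat_linker(unit: str = "GGGGS", n: int = 3) -> str:
    """Peptide linker made of n copies of a repeat unit, e.g. (GGGGS)n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return unit * n


def assemble_fusion(cas12_segment: str, heterologous_domain: str, linker: str) -> str:
    """Join the segments N-terminus to C-terminus (order is an assumption)."""
    if len(heterologous_domain) < HETEROLOGOUS_MIN_LENGTH:
        raise ValueError("heterologous domain shorter than about 100 amino acids")
    return cas12_segment + linker + heterologous_domain
```

Calling `repeat_linker("GGGS", 2)` would give the (GGGS)n form with n = 2 under the same assumptions.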
[0063] The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be any suitable Cas12 polypeptide (e.g. a Cas12 polypeptide that can be separated into non-contiguous fragments while still retaining enzymatic or binding activity). The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can be from particular species, e.g. Streptococcus pyogenes, Parageobacillus thermoglucosidasius, an archaeon, Candidatus Micrarchaeota (archaeon), Candidatus Aureabacteria (bacterium), Acidibacillus sulfuroxidans, Ruminococcus, Syntrophomonas palmitatica, Clostridium novyi, or any combination thereof. In some embodiments, the Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) is a Class 2, Type V-F or Cas12f polypeptide (for which example domain organization, functional residues, and structure relative to SEQ ID NO: 84 are outlined in e.g. Xiao et al. Nucleic Acids Res. 2021 Apr 19; 49(7): 4120-4128, which is incorporated by reference herein for all purposes). In some cases, a Class 2, Type V-F or Cas12f polypeptide according to the disclosure comprises one or more active site residues D326, E422, D510 (from the RuvC domain), or R490 (from the Nuc domain) relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). In some cases, a Class 2, Type V-F or Cas12f polypeptide according to the disclosure comprises one or more PAM interacting residues S142, R163, Y146, S286, Y146, K196, REClc residues 134-152, or H139 relative to SEQ ID NO: 84, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). The Cas12 polypeptide (e.g. first or second Cas12 polypeptide, or intact Cas12 polypeptide) can comprise WED, REC1, RuvC, Nuc, or REC2 domains (or any combination thereof) having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to WED, REC1, RuvC, Nuc, or REC2 domains of any one of SEQ ID NOs: 1, 2, 5, 6, 11, 13, 15, 24-43, or 84, or a variant thereof.
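The phrase "relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to)" a reference sequence can be illustrated by mapping a reference residue number through an alignment onto the polypeptide of interest and then substituting that residue (e.g. to alanine). The Python sketch below assumes the alignment has already been produced elsewhere (gaps written as "-") and uses 1-based reference numbering; the function names and toy sequences are hypothetical and are not taken from any SEQ ID NO.

```python
from typing import Iterable, Optional


def query_index_for_reference_position(
    aligned_ref: str, aligned_query: str, ref_pos: int
) -> Optional[int]:
    """0-based index in the ungapped query of the residue aligned to the
    1-based reference position, or None if the query has a gap there."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0
    query_index = -1
    for r, q in zip(aligned_ref, aligned_query):
        if q != "-":
            query_index += 1  # advance through the ungapped query
        if r != "-":
            ref_count += 1  # advance through the reference numbering
            if ref_count == ref_pos:
                return query_index if q != "-" else None
    raise ValueError("ref_pos is beyond the end of the reference")


def substitute_at_reference_positions(
    aligned_ref: str,
    aligned_query: str,
    ref_positions: Iterable[int],
    new_residue: str = "A",
) -> str:
    """Ungapped query with residues at the given reference positions replaced."""
    query = list(aligned_query.replace("-", ""))
    for pos in ref_positions:
        idx = query_index_for_reference_position(aligned_ref, aligned_query, pos)
        if idx is not None:  # skip positions the query lacks entirely
            query[idx] = new_residue
    return "".join(query)
```

In the toy alignment of reference "MKD-EL" against query "MRDGEL", reference position 4 (E) corresponds to the fifth residue of the query because the query carries one extra residue before it, so an alanine substitution at reference position 4 yields "MRDGAL".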
[0064] The heterologous domain can comprise any suitable polypeptide residues or domains of appropriate size. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100-1500 amino acids in length. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100-2000 amino acids in length. In some cases, the heterologous domain comprises (e.g. consists of) at least about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. In some cases, the heterologous domain comprises (e.g. consists of) at most about 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. In some embodiments, the heterologous domain comprises (e.g. consists of) at least about 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length to at most about 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950, or 2000 amino acids in length, or any range between these values. The heterologous domain can comprise an enzyme. The heterologous domain can comprise a DNA-binding or a DNA-conjugating domain. The heterologous domain can comprise a domain with DNA-dependent DNA polymerase activity or a domain with topoisomerase activity. 
The heterologous domain can comprise a T7 DNA polymerase domain, a Bst polymerase domain or an analog thereof (e.g. a Bst large fragment polymerase domain or a Bst 2.0 polymerase domain), a T4 DNA polymerase domain, a Taq polymerase domain, a Vent polymerase domain, a Q5 polymerase domain, a Klenow fragment domain, a DNA polymerase theta domain, or a Phi29 polymerase domain, or a functional fragment or derivative thereof. Example organization, structure, and function of T7 DNA polymerase can be found in e.g. Doublie et al. Curr Opin Struct Biol. 1998 Dec;8(6):704-12. doi: 10.1016/s0959-440x(98)80089-4 and UniProtKB/Swiss-Prot accession no. P00581.1, both of which are incorporated by reference herein for all purposes. In some cases, a T7 DNA polymerase domain according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) H506, R518, K522, Y526, E480, or Y530, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 46. In some embodiments, a T7 DNA polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO:
46, or a variant thereof. Example organization, structure, and function of large fragment Bst polymerase (e.g. SEQ ID NO: 45) can be found in e.g. Oscorbin et al. Comput Struct Biotechnol J. 2023 Sep 12;21:4519-4535. doi: 10.1016/j.csbj.2023.09.008. eCollection 2023, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a Bst polymerase domain (e.g. Bst large fragment or Bst 2.0 such as SEQ ID NOs: 44 or 45) according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) D653, D830, E831, H829, Q797, R615, or E658, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 45. In some embodiments, a Bst polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 45, or a variant thereof. Example organization, structure, and function of phi29 polymerase can be found in, e.g. Del Prado et al. Sci Rep. 2019 Jan 29;9(1):923. doi: 10.1038/s41598-018-37513-7 and UniProtKB/Swiss-Prot accession no. P03680.1, both of which are incorporated by reference herein in their entireties for all purposes. In some embodiments, a phi29 polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y101, T189, Q180, 12, 14, 15, 59, 61, 62, 65, 66, 69, 122, 123, 128, 143, 148, 169, 196, 198, 249, 252, 253, 255, 364, 371, 383, 392, 393, 434, 437, 438, 455, 456, 457, 458, 498, or 500, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 51. In some embodiments, a phi29 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 51, or a variant thereof. Example organization, structure, and function of Taq polymerase can be found in, e.g. Eun. “Enzymology Primer for Recombinant DNA Technology” (Chapter 6, DNA polymerases). ISBN 978-0-12-243740-3 (Academic Press, 1996), Park et al. Mol Cells. 1997 Jun 30;7(3):419-24, and UniProtKB/Swiss-Prot accession no. P19821.1, all of which are incorporated by reference herein in their entireties for all purposes. In some embodiments, a Taq polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) G308, V310, L356, R405, R25, or R74 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 48. In some embodiments, a Taq polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 48, or a variant thereof. Example organization, structure, and function of T4 polymerase can be found in, e.g. Wang et al. Biochemistry. 1996 Jun 25;35(25):8110-9. doi: 10.1021/bi960178r, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a T4 polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y320 or E191 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 47.
In some embodiments, a T4 polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 47, or a variant thereof. Example organization, structure, and function of Klenow polymerase can be found in, e.g. Polesky et al. J Biol Chem. 1990 Aug 25;265(24):14579-91, which is incorporated by reference herein in its entirety for all purposes. In some embodiments, a Klenow polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) Y766, R841, N845, N849, R668, or D882 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 50. In some embodiments, a Klenow polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 50, or a variant thereof. Example organization, structure, and function of Vent (e.g. T. litoralis) polymerase can be found in, e.g. Gardner et al. Nucleic Acids Res. 1999 Jun 15;27(12):2545-53. doi: 10.1093/nar/27.12.2545, which is incorporated by reference in its entirety herein for all purposes. In some embodiments, a Vent polymerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) A488, N494, S495, Y412, K490, N494, Q486, R487, Y496, or Y499 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 49 or a variant thereof. In some embodiments, a Vent polymerase domain can comprise a sequence having at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 49, or a variant thereof.
[0065] The heterologous domain can comprise a topoisomerase domain. The heterologous domain can comprise a Type I (e.g. Type IA) or Type II topoisomerase domain, any combination thereof, or a functional fragment or derivative thereof. Example organization and function of Type I (e.g. Type IA) topoisomerases can be found in e.g. Chen et al. J Biol Chem. 1998 Mar 13;273(11):6050-6. doi: 10.1074/jbc.273.11.6050, which is incorporated by reference herein for all purposes. In some cases, a Type I topoisomerase according to the current disclosure can comprise, lack, or comprise substituted to alanine or glycine one or more critical (e.g. active site) residue(s) E9, H33, D111, E115, N309, E313, T318, R321, T322, D323, H365, or T496, or any combination thereof relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 85. Example organization, structure, and function of Type II topoisomerases can be found in e.g. Liu et al. J Biol Chem. 1998 Aug 7;273(32):20252-60. doi: 10.1074/jbc.273.32.20252, which is incorporated by reference herein for all purposes. In some cases, a Type II topoisomerase according to the current disclosure can comprise one or more critical (e.g. active site) residue(s) Y782, R690, D697, K700, R704, or R781 relative to (e.g. corresponding to, or when the sequence of the polypeptide is optimally aligned to) SEQ ID NO: 86, any combination thereof, or a lack of any of these active site residues (or a mutation of any of these residues to glycine or alanine). The heterologous domain can comprise E. coli eubacterial DNA topoisomerase I, E. coli eubacterial DNA topoisomerase III, S. cerevisiae yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, or phiX174 protein A, or a functional fragment thereof. The heterologous domain can comprise E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof.
[0066] The insert DNA molecule can have a variety of structures and configurations suitable for insertion into genomic DNA (e.g. via homologous recombination or other DNA repair methods). In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule comprises a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises (e.g. consists of) at least about 4 bp or nucleotides to at least about 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 5' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides to at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides, or any range between these values. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides. In some embodiments, the region with complementarity to a region 3' to the double-stranded DNA site comprises at least about 4, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides to at most about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220,
230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, or 400 bp or nucleotides, or any range between these values. In some embodiments, the insert DNA molecule further comprises a transgene. In some embodiments, the transgene comprises an open reading frame (ORF). In some embodiments, the transgene comprises a promoter operably linked to an ORF. In some embodiments, the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt). In some embodiments, the transgene comprises at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt). In some embodiments, the transgene comprises at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500,
2,750, 3,000, 3,250, 3,500, 3,750, 4,000, 4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000,
6,250, 6,500, 6,750, 7,000, 7,250, 7,500, 7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500,
9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt) to at most about 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,250, 1,500, 1,750, 2,000, 2,250, 2,500, 2,750, 3,000, 3,250, 3,500, 3,750, 4,000,
4,250, 4,500, 4,750, 5,000, 5,250, 5,500, 5,750, 6,000, 6,250, 6,500, 6,750, 7,000, 7,250, 7,500,
7,750, 8,000, 8,250, 8,500, 8,750, 9,000, 9,250, 9,500, 9,750, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, or 20,000 bp (base pairs) or nucleotides (nt), or any range between these values. In some embodiments, the transgene is flanked by the region with complementarity to a region 5' to the double-stranded DNA site and the region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the transgene comprises an open reading frame (ORF). In some embodiments, the transgene comprises a promoter operably linked to an ORF. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to a guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In
some embodiments, the insert DNA molecule is linked to a hydroxyl (e.g. catalytic hydroxyl) group of the domain having DNA topoisomerase activity at a first end, and the insert DNA molecule comprises the region homologous to a region 5' to the nucleic acid site or the region homologous to a region 3' to the nucleic acid site at a second end. In some embodiments, the insert DNA molecule comprises a first end configured to hybridize with a hybridization domain of a guide polynucleotide at the 3' end of the insert DNA when the guide polynucleotide further comprises a hybridization domain at a 3' end.
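As an illustration only, the layout described above (a transgene flanked by a region matching sequence 5' of the site and a region matching sequence 3' of the site) can be sketched in Python. The function and class names, the example sequences, and the simplification of copying the arms from a single target strand are assumptions made for this sketch and are not taken from the disclosure; the 4 to 400 bp bounds mirror the ranges recited above.

```python
from dataclasses import dataclass

# Arm length bounds mirroring the 4-400 bp ranges recited in the text.
MIN_ARM_BP = 4
MAX_ARM_BP = 400


@dataclass
class InsertDesign:
    """Hypothetical container for the three parts of an insert DNA molecule."""
    left_arm: str    # matches sequence 5' of the target site
    transgene: str   # payload, e.g. an ORF
    right_arm: str   # matches sequence 3' of the target site

    def sequence(self) -> str:
        # The transgene is flanked by the two arms.
        return self.left_arm + self.transgene + self.right_arm


def build_insert(target: str, site_start: int, site_end: int,
                 arm_len: int, transgene: str) -> InsertDesign:
    """Copy arms of arm_len bases from either side of target[site_start:site_end]."""
    if not MIN_ARM_BP <= arm_len <= MAX_ARM_BP:
        raise ValueError("arm length outside the 4-400 bp range")
    if site_start - arm_len < 0 or site_end + arm_len > len(target):
        raise ValueError("target too short for the requested arm length")
    left = target[site_start - arm_len:site_start]
    right = target[site_end:site_end + arm_len]
    return InsertDesign(left, transgene, right)


# Hypothetical 24-base target with a 4-base site at positions 10-13.
example_target = "ACGTACGTAC" + "TTTT" + "GGCCGGCCGG"
design = build_insert(example_target, 10, 14, arm_len=6, transgene="ATGAAATAA")
```

A real design would also account for strand orientation and for the single- or double-stranded form of the insert, which this sketch ignores.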
[0067] The guide polynucleotide configured to interact with the Cas12 polypeptide (e.g. first Cas12 polypeptide or second Cas12 polypeptide, either intact or part of segments described herein) can be any suitable guide polynucleotide (e.g. an RNA comprising a guide suitable for interacting with a Cas12f enzyme or a Class 2, Type V-F enzyme, or a mixture of RNA and DNA). In some embodiments, the guide polynucleotide further comprises a hybridization domain at a 3' end. In some embodiments, the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides. In some embodiments, the hybridization domain comprises at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides. In some embodiments, the hybridization domain comprises at least about 1, 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides to at most about 2, 5, 7, 10, 12, 15, 17, 20, 22, 25, 27, 30, 32, 35, 37, 40, 42, 45, 47, or 50 nucleotides, or any range between these values.
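By way of a non-limiting sketch, whether the 3' end of an insert DNA could base-pair with a 3' hybridization domain of an RNA guide can be checked by comparing it with the reverse complement of that domain. The function names and example sequences below are hypothetical, and only perfect Watson-Crick pairing is considered, which is a simplifying assumption.

```python
# RNA base -> DNA base that pairs with it (Watson-Crick only; an assumption of this sketch).
RNA_TO_PAIRING_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_partner_of_rna(rna: str) -> str:
    """Return the DNA strand (written 5'->3') that pairs with an RNA written 5'->3'."""
    return "".join(RNA_TO_PAIRING_DNA[base] for base in reversed(rna.upper()))


def insert_end_hybridizes(guide_rna: str, insert_dna: str, domain_len: int) -> bool:
    """True if the last domain_len bases of the insert pair with the guide's 3' domain."""
    if not 1 <= domain_len <= 50:
        raise ValueError("hybridization domain length outside the 1-50 nt range")
    if domain_len > len(guide_rna) or domain_len > len(insert_dna):
        raise ValueError("domain longer than one of the sequences")
    domain = guide_rna[-domain_len:]  # 3' hybridization domain of the guide
    return insert_dna[-domain_len:].upper() == dna_partner_of_rna(domain)


# Hypothetical sequences: the guide ends in AUGGCA, whose DNA partner is TGCCAT.
pairs = insert_end_hybridizes("GGAUCCUUAUGGCA", "ACGTACGTTGCCAT", domain_len=6)
no_pair = insert_end_hybridizes("GGAUCCUUAUGGCA", "ACGTACGTAAAAAA", domain_len=6)
```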
[0068] The composition can comprise a pharmaceutically acceptable excipient. The excipient can comprise a transfection agent (e.g. a liposome or a lipid nanoparticle). In some embodiments, a fusion protein of the disclosure is provided in a lipid nanoparticle (LNP) by encapsulating the fusion protein with an optional guide polynucleotide or insert DNA molecule into the LNP. This can be performed using methodologies documented e.g. in Finn et al. Cell Rep. 2018 Feb 27;22(9):2227-2235. doi: 10.1016/j.celrep.2018.02.014 or Yin et al. Nat Biotechnol. 2016 Mar;34(3):328-33. doi: 10.1038/nbt.3471, both of which are incorporated by reference herein in their entireties for all purposes.
[0069] In some aspects, the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to the site (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein the first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site; and (iii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site. In some embodiments, the second fragment further comprises a REC2 domain of the second Cas12 polypeptide. In some embodiments, the first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide. In some embodiments, the heterologous domain comprises at least about 100-1500 amino acids in length. In some embodiments, the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, the first Cas12 polypeptide and the second Cas12 polypeptide comprise a same Cas12 polypeptide. In some embodiments, the first Cas12 polypeptide and the second Cas12 polypeptide comprise different Cas12 polypeptides. In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments: (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA.
[0070] In some aspects, the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to the site (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein the first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site; and (iii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site. In some embodiments, the second fragment further comprises a REC2 domain of the second Cas12 polypeptide. In some embodiments, the first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide. In some embodiments, the heterologous domain comprises at least about 100-1500 amino acids in length. In some embodiments, the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, the first Cas12 polypeptide and the second Cas12 polypeptide comprise a same Cas12 polypeptide. In some embodiments, the first Cas12 polypeptide and the second Cas12 polypeptide comprise different Cas12 polypeptides. In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments: (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA. In some embodiments, the cell is a bacterial, archaeal, plant, mammalian, primate, or human cell.
[0071] In some aspects, the present disclosure provides for a method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to the site (i) a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and (c) a third segment comprising a heterologous domain of at least about 100 amino acids; (ii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site; and (iii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site. In some embodiments, the Cas12 polypeptide, the first Cas12 polypeptide, or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide. In some embodiments, the heterologous domain comprises at least about 100-1500 amino acids in length. In some embodiments, the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments: (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA. In some embodiments, said cell is a bacterial, archaeal, plant, mammalian, primate, or human cell.
[0072] In some aspects, the present disclosure provides for a kit for disrupting a DNA site, comprising (i) a fusion protein comprising: (a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide; (b) a heterologous domain comprising at least about 100 amino acids; and (c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein the first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; and (ii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site. In some embodiments, the kit further comprises (iii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the second fragment further comprises a REC2 domain of the second Cas12 polypeptide. In some embodiments, the first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide. In some embodiments, the heterologous domain comprises at least about 100-1500 amino acids in length. In some embodiments, the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments: (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA. In some embodiments, the kit further comprises a transfection agent. In some embodiments, the kit further comprises instructions for targeting the DNA site.
[0073] In some aspects, the present disclosure provides for a kit for disrupting a DNA site, comprising (i) a fusion protein comprising: (a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site; (b) a second segment comprising either: (A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or (B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; (ii) a guide polynucleotide configured to interact with the first Cas12 polypeptide or the second Cas12 polypeptide and configured to hybridize to the DNA site. In some embodiments, the kit further comprises (iii) an insert DNA molecule comprising a region with complementarity to a region 5' to the double-stranded DNA site or a region with complementarity to a region 3' to the nucleic acid site. In some embodiments, the second fragment further comprises a REC2 domain of the second Cas12 polypeptide. In some embodiments, the first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide. In some embodiments, the heterologous domain comprises at least about 100-1500 amino acids in length. In some embodiments, the heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity. In some embodiments, the insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule. In some embodiments, the insert DNA molecule is: (i) linked to the first or the second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with the first or the second Cas12 polypeptide. In some embodiments: (a) the guide polynucleotide further comprises a hybridization domain configured to hybridize to the DNA site at a 3' end; and (b) the insert DNA molecule comprises a first end configured to hybridize with the hybridization domain of the guide polynucleotide at the 3' end of the insert DNA. In some embodiments, the kit further comprises a transfection agent. In some embodiments, the kit further comprises instructions for targeting the DNA site.
Table 1: Sequences of Genes and Components Described Herein (change in text style format denotes domain boundaries below)
EXAMPLES
[0074] Example 1 - Testing activity of split Cas12f complexes
[0075] This example demonstrates that Cas12f group enzymes can be rearranged into a split-domain format in which a heterologous domain is inserted between the N- and C-terminal domains, allowing for e.g. new enzymatic activity while preserving specific guided DNA cleavage activity.
[0076] Protein expression and purification
[0077] All constructs were expressed and purified using the same method. Recombinant protein coding sequences were cloned into the pET45 (EMD Millipore) vector, and the vector was transformed into BL21(DE3)pLysS E. coli (Thermo Fisher) for expression. The protein was expressed at 20°C for 48 h using Overnight Express Instant TB Media (EMD Millipore). After incubation, the E. coli biomass was harvested by centrifugation for 2 min at 4500 RCF and frozen at -80°C. The biomass was lysed with BugBuster protein extraction reagent (EMD Millipore), which additionally included 90 U rLysozyme per 10 mL of lysate (EMD Millipore), 1 tablet of protease inhibitor per 10 mL of lysate (Pierce Protease Inhibitor mini tablets, EDTA-free, from Thermo Scientific), 50 mM sodium phosphate pH 7.7, 0.05% TritonX, and 2.5 mM TCEP. Lysis was conducted at 12°C for 45 min. Next, the lysate was mixed with dilution buffer (50 mM sodium phosphate pH 7.7, 1 M NaCl, 0.05% TritonX, 2.5 mM TCEP) at a ratio of 1:1 and incubated for 45 min at 12°C. After incubation, the preparation was centrifuged at 14,000 rpm for 1 h at 8°C. Purification was conducted using a batch method with His-Affinity Gel (Zymo Research), and included a loading procedure, wash procedure, and elution procedure. The washing buffer included 50 mM sodium phosphate, 0.5 M NaCl, 30 mM imidazole, 0.05% TritonX, and 2.5 mM TCEP. The protein was eluted using 50 mM sodium phosphate pH 7.7, 300 mM NaCl, 300 mM imidazole, 0.05% TritonX, and 2.5 mM TCEP. In a concluding procedure, the eluted protein was dialyzed at room temperature for 3 h using a Slide-A-Lyzer Dialysis Cassette G2 (Thermo Scientific), where the dialysis buffer included 20 mM Tris-HCl pH 7.5, 300 mM NaCl, 0.05% TritonX, and 2 mM DTT. The products of purification were analyzed using SDS-PAGE electrophoresis (see Figures 7 and 8, top panel) and quantified using the Qubit Protein Assay (Thermo Fisher).
Target cleavage - in vitro assay
[0078] Following purification of the split Cas12f constructs, ribonucleoprotein complex formation was assessed by proper Cas function and DNA cleavage.
[0079] First, ribonucleoprotein complexes were constructed from the purified proteins. In this experiment six different variants were tested - SpCas12f (SEQ ID NO: 70), PtCas12f (SEQ ID NO: 71), SpCas12f-inter (SEQ ID NO: 9), PtCas12 C-tag (SEQ ID NO: 72) and PtCas12f N-tag (SEQ ID NO: 73). The complex formation was conducted at 37°C for 30 min. The reaction included: 1 µM gRNA, 1 µM Cas variant, 14 mM Tris-HCl pH 7.5, 80 mM NaCl, 1 mM DTT, and 0.01% TritonX. gRNA_Sp (SEQ ID NO: 56) was used in reactions with enzymes that include SpCas12f components, whereas gRNA_Pt (SEQ ID NO: 57) was used with enzymes that include PtCas12f components.
[0080] After reconstitution as ribonucleoprotein complexes, the complexes above were used in DNA cleavage reactions. The reactions included: 0.7 µM Cas-gRNA ribonucleoprotein complex, 12.5 mM Tris-HCl pH 7.5, 53 mM NaCl, 1 mM DTT, 0.01% TritonX, 5 mM MgCl2, and 10 nM target DNA (DNA_Sp cleavage substrate or DNA_Pt cleavage substrate). gRNA_Pt and gRNA_Sp were generated before the reactions using HiScribe T7 and the Monarch RNA Cleanup Kit (both NEB) according to the manufacturer's protocol. The DNA_Sp cleavage substrate (target DNA) was used in reactions with enzymes that include SpCas12f components, while DNA_Pt was used in reactions with enzymes that include PtCas12f components; both target DNA substrates were 513 bp long and included the target sequence AGTTGACCCAACGTCGCCGG. The reaction was conducted at 37°C for 1 h. The products of the reaction were analyzed using agarose gel electrophoresis. Successful cleavage reactions generated two products: ~215 bp and ~298 bp.
[0081] As can be seen in the agarose gels in FIG. 5A and FIG. 5B (bottom panels), the split-Cas12f variants tested retained activity similar to wild-type, as they were able to cleave the target DNA fragments into products of the appropriate sizes.
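The product sizes reported for this assay can be sanity-checked with a short calculation: a single cut in a 513 bp substrate yields two fragments whose lengths sum to 513 bp. The Python sketch below is illustrative only. The function names and the 10 bp tolerance are assumptions, and the cut is modeled as a single position even though Cas12-family enzymes can leave staggered ends.

```python
def cleavage_fragments(substrate_len: int, cut_pos: int) -> tuple:
    """Fragment lengths produced by one cut at cut_pos in a linear substrate."""
    if not 0 < cut_pos < substrate_len:
        raise ValueError("cut position must fall inside the substrate")
    return (cut_pos, substrate_len - cut_pos)


def matches_expected(observed, expected, tolerance_bp: int = 10) -> bool:
    """Compare observed band sizes with expected sizes within a gel-reading tolerance."""
    if len(observed) != len(expected):
        return False
    return all(abs(o - e) <= tolerance_bp
               for o, e in zip(sorted(observed), sorted(expected)))


# A 513 bp substrate cut 215 bp from one end gives the ~215 bp and ~298 bp products.
expected_products = cleavage_fragments(513, 215)
bands_ok = matches_expected((300, 212), expected_products)  # hypothetical gel estimates
```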
[0082] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
Claims
WHAT IS CLAIMED IS:
1. A composition comprising a fusion protein comprising:
(a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide;
(b) a heterologous domain comprising at least about 100 amino acids; and
(c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site.
2. The composition of claim 1, wherein said second fragment further comprises a REC2 domain of said second Cas12 polypeptide.
3. The composition of claim 1 or 2, wherein said first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.
4. The composition of any one of claims 1-3, wherein said heterologous domain comprises at least about 100-1500 amino acids in length.
5. The composition of any one of claims 1-4, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.
6. The composition of claim 5, wherein said domain with DNA-dependent DNA polymerase activity or said domain with Topoisomerase activity do not comprise inactivating mutations in an active site residue.
7. The composition of claim 5 or 6, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a T7 DNA polymerase, Bst polymerase or an analog thereof, T4 DNA polymerase, Taq polymerase, Vent polymerase, Q5 polymerase, Klenow fragment, DNA polymerase theta, or Phi29 polymerase, or a functional fragment or derivative thereof.
8. The composition of any one of claims 1-7, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 44-52, or a variant thereof.
9. The composition of any one of claims 1-5, wherein said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type I topoisomerase domain.
10. The composition of claim 9, wherein said Type I topoisomerase domain comprises E. coli Eubacterial DNA topoisomerase I, E. coli Eubacterial DNA topoisomerase III, S. cerevisiae Yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, or M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, phiX174 protein A, or a functional fragment thereof.
11. The composition of any one of claims 1-6, wherein said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type II topoisomerase domain.
12. The composition of claim 11, wherein said type II topoisomerase domain comprises E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof.
13. The composition of any one of claims 1-6, wherein said heterologous domain comprises a domain with topoisomerase activity and said heterologous domain comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 53-55.
14. The composition of any one of claims 1-13, wherein said first Cas12 polypeptide and said second Cas12 polypeptide comprise a same Cas12 polypeptide.
15. The composition of any one of claims 1-13, wherein said first Cas12 polypeptide and said second Cas12 polypeptide comprise different Cas12 polypeptides.
16. The composition of any one of claims 1-15, wherein said first Cas12 polypeptide and said second Cas12 polypeptide do not comprise an inactivating mutation in an active site residue of said first Cas12 polypeptide or said second Cas12 polypeptide.
17. The composition of any one of claims 1-15, wherein said first fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 1, 5, 13, 15, or 24-34, or a variant thereof.
18. The composition of any one of claims 1-15, wherein said second fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 2, 6, 11, or 35-43.
19. The composition of any one of claims 1-18, further comprising an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site.
20. The composition of claim 19, wherein said region with complementarity to a region 5' to said nucleic acid site or said region with complementarity to a region 3' to said nucleic acid site comprises at least 4 to 30 bp or at least 4 to 400 bp.
21. The composition of claims 19 or 20, wherein said insert nucleic acid sequence comprises at least about 1 bp to at least about 20 kb.
22. The composition of any one of claims 19-21, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
23. The composition of any one of claims 19-22, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to a guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to a guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.
24. The composition of claim 22, further comprising a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide, wherein
(a) said guide polynucleotide further comprises a hybridization domain configured to hybridize to said DNA site at a 3' end; and
(b) wherein said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.
25. The composition of claim 24, wherein said insert DNA molecule comprises a region with complementarity to a region 5' to said double-stranded DNA site at said 5' end of said insert DNA.
26. The composition of any one of claims 19-22, or 24-25, wherein said insert DNA molecule is linked to a catalytic hydroxyl group of said domain having DNA topoisomerase activity at a first end, and wherein said insert DNA molecule comprises said region homologous to a region 5' to said nucleic acid site or said region homologous to a region 3' to said nucleic acid site at a second end.
27. The composition of any one of claims 1-25, further comprising a linker between said first fragment and said heterologous domain, or between said heterologous domain and said second fragment.
28. The composition of claim 26, wherein said linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof.
29. The composition of claim 26, wherein said linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1.7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.
30. A composition comprising a fusion protein comprising:
(a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site;
(b) a second segment comprising either:
(i) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or
(ii) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and
(c) a third segment comprising a heterologous domain of at least about 100 amino acids.
31. The composition of claim 30, wherein said first segment further comprises WED, REC1, RuvC, REC2, and Nuc domains of said first Cas12 polypeptide.
32. The composition of claim 30 or 31, wherein said Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.
33. The composition of any one of claims 30-32, wherein said heterologous domain comprises at least about 100-900 amino acids in length.
34. The composition of any one of claims 30-33, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.
35. The composition of any one of claims 30-34, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity.
36. The composition of claim 35, wherein said domain with DNA-dependent DNA polymerase activity or said domain with Topoisomerase activity do not comprise inactivating mutations in an active site residue.
37. The composition of claim 35 or 36, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a T7 DNA polymerase, Bst polymerase or an analog thereof, T4 DNA polymerase, Taq polymerase, Vent polymerase, Q5 polymerase, Klenow fragment, DNA polymerase theta, or Phi29 polymerase, or a functional fragment or derivative thereof.
38. The composition of any one of claims 30-37, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity, and comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 44-52, or a variant thereof.
39. The composition of any one of claims 30-35, wherein said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type I topoisomerase domain.
40. The composition of claim 39, wherein said Type I topoisomerase domain comprises E. coli Eubacterial DNA topoisomerase I, E. coli Eubacterial DNA topoisomerase III, S. cerevisiae Yeast DNA topoisomerase III, H. sapiens DNA topoisomerase IIIα or IIIβ, S. acidocaldarius eubacterial and archaeal reverse DNA gyrase, M. kandleri eubacterial reverse gyrase, H. sapiens eukaryotic DNA topoisomerase I, Vaccinia poxvirus topoisomerase I, or M. kandleri hyperthermophilic eubacterial DNA topoisomerase V, phiX174 protein A, or a functional fragment thereof.
41. The composition of any one of claims 30-36, wherein said heterologous domain comprises a domain with Topoisomerase activity, and comprises a Type II topoisomerase domain.
42. The composition of claim 41, wherein said type II topoisomerase domain comprises E. coli eubacterial DNA gyrase, E. coli eubacterial DNA topoisomerase IV, S. cerevisiae yeast DNA topoisomerase II, H. sapiens mammalian DNA topoisomerase IIα or IIβ, or S. shibatae archaeal DNA topoisomerase VI, or a functional fragment thereof.
43. The composition of any one of claims 30-36, wherein said heterologous domain comprises a domain with topoisomerase activity and said heterologous domain comprises a sequence with at least 80% sequence identity to any one of SEQ ID NOs: 53-55.
44. The composition of any one of claims 30-43, wherein said first Cas12 polypeptide and said second Cas12 polypeptide are a same Cas12 polypeptide.
45. The composition of any one of claims 30-43, wherein said first Cas12 polypeptide and said second Cas12 polypeptide are different Cas12 polypeptides.
46. The composition of any one of claims 30-45, wherein said first Cas12 polypeptide and said second Cas12 polypeptide do not comprise an inactivating mutation in an active site residue of said first Cas12 polypeptide or said second Cas12 polypeptide.
47. The composition of any one of claims 30-45, wherein said first fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 1, 5, 13, 15, or 24-34, or a variant thereof.
48. The composition of any one of claims 30-45, wherein said second fragment comprises a sequence with at least 80% identity to any one of SEQ ID NOs: 2, 6, 11, or 35-43.
49. The composition of any one of claims 30-48, further comprising an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site.
50. The composition of claim 49, wherein said region with complementarity to a region 5' to said nucleic acid site or said region with complementarity to a region 3' to said nucleic acid site comprises at least 4 to 30 bp or at least 4 to 400 bp.
51. The composition of claims 49 or 50, wherein said insert nucleic acid sequence comprises at least about 1 bp to at least about 20 kb.
52. The composition of any one of claims 49-51, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
53. The composition of any one of claims 49-52, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to a guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to a guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.
54. The composition of claim 52, wherein said composition further comprises a guide polynucleotide configured to interact with said first or said second Cas12 polypeptide and configured to hybridize to said DNA site, wherein
(a) said guide polynucleotide further comprises a hybridization domain at a 3' end; and
(b) wherein said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.
55. The composition of claim 54, wherein said insert DNA molecule comprises a region with complementarity to a region 5' to said double-stranded DNA site at said 5' end of said insert DNA.
56. The composition of any one of claims 30-52, and 54-55, wherein said insert DNA molecule is linked to a catalytic hydroxyl group of said domain having DNA topoisomerase activity at a first end, and wherein said insert DNA molecule comprises said region homologous to a region 5' to said nucleic acid site or said region homologous to a region 3' to said nucleic acid site at a second end.
57. The composition of any one of claims 30-55, further comprising a linker between said first fragment and said heterologous domain, or between said heterologous domain and said second fragment.
58. The composition of claim 56, wherein said linker comprises a peptide bond, an isopeptide bond, a small molecule linker, or a combination thereof.
59. The composition of claim 56, wherein said linker comprises LPXTG, GGG, (GGG)n, (GGGGS)n, (GGGS)n, N1.7, a biotin-streptavidin pair or an analog of a biotin-streptavidin pair, or SpyTag/SpyCatcher sequences linked by an isopeptide bond.
60. A fusion protein comprising a sequence having at least 80% identity to any one of SEQ ID NOs: 20-23.
61. A method of editing a nucleic acid site in a cell, comprising contacting to said cell said composition of any one of claims 1-60.
62. The method of claim 61, wherein said cell is a bacterial, archaeal, plant, mammalian, primate, or human cell.
63. A method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to said site
(i) a fusion protein comprising:
(a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide;
(b) a heterologous domain comprising at least about 100 amino acids; and
(c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site;
(ii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site; and
(iii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.
64. The method of claim 63, wherein said second fragment further comprises a REC2 domain of said second Cas12 polypeptide.
65. The method of claim 63 or 64, wherein said first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.
66. The method of any one of claims 63-65, wherein said heterologous domain comprises at least about 100-1500 amino acids in length.
67. The method of claim 66, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.
68. The method of any one of claims 63-67, wherein said first Cas12 polypeptide and said second Cas12 polypeptide comprise a same Cas12 polypeptide.
69. The method of any one of claims 63-68, wherein said first Cas12 polypeptide and said second Cas12 polypeptide comprise different Cas12 polypeptides.
70. The method of any one of claims 63-69, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
71. The method of any one of claims 63-70, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.
72. The method of claim 71, wherein: (a) said guide polynucleotide further comprises a hybridization domain configured to hybridize to said DNA site at a 3' end; and (b) said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.
73. A method of editing a double-stranded deoxyribonucleic acid (DNA) site in a cell, comprising contacting to said site
(i) a fusion protein comprising:
(a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site;
(b) a second segment comprising either:
(A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or
(B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide; and
(c) a third segment comprising a heterologous domain of at least about 100 amino acids;
(ii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site; and
(iii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.
74. The method of claim 73, wherein said Cas12 polypeptide, said first Cas12 polypeptide, or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.
75. The method of claim 73 or 74, wherein said heterologous domain comprises at least about 100-1500 amino acids in length.
76. The method of any one of claims 73-75, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.
77. The method of any one of claims 73-76, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
78. The method of any one of claims 73-77, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.
79. The method of claim 78, wherein: (a) said guide polynucleotide further comprises a hybridization domain configured to hybridize to said DNA site at a 3' end; and (b) said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.
80. A kit for disrupting a DNA site, comprising
(i) a fusion protein comprising:
(a) a first fragment comprising WED and REC1 domains of a first Cas12 polypeptide;
(b) a heterologous domain comprising at least about 100 amino acids; and
(c) a second fragment comprising RuvC and Nuc domains of a second Cas12 polypeptide, wherein said first and second Cas12 polypeptide are configured to bind a double-stranded deoxyribonucleic acid (DNA) site; and
(ii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.
81. The kit of claim 80, further comprising (iii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site.
82. The kit of claim 80, wherein said second fragment further comprises a REC2 domain of said second Cas12 polypeptide.
83. The kit of claim 80 or 81, wherein said first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.
84. The kit of any one of claims 80-83, wherein said heterologous domain comprises at least about 100-1500 amino acids in length.
85. The kit of any one of claims 80-84, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.
86. The kit of any one of claims 81-85, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
87. The kit of any one of claims 81-86, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.
88. The kit of claim 87, wherein: (a) said guide polynucleotide further comprises a hybridization domain configured to hybridize to said DNA site at a 3' end; and (b) said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.
89. The kit of any one of claims 80-88, further comprising a transfection agent.
90. The kit of any one of claims 80-89, further comprising instructions for targeting said DNA site.
91. A kit for disrupting a DNA site, comprising
(i) a fusion protein comprising:
(a) a first segment comprising a Cas12 polypeptide configured to bind a double-stranded deoxyribonucleic acid (DNA) site;
(b) a second segment comprising either:
(A) a sequence comprising WED and REC1 domains of a first Cas12 polypeptide; or
(B) a sequence comprising RuvC, REC2, and Nuc domains of a second Cas12 polypeptide;
(ii) a guide polynucleotide configured to interact with said first Cas12 polypeptide or said second Cas12 polypeptide and configured to hybridize to said DNA site.
92. The kit of claim 91, further comprising (iii) an insert DNA molecule comprising a region with complementarity to a region 5' to said double-stranded DNA site or a region with complementarity to a region 3' to said nucleic acid site.
93. The kit of claim 91 or 92, wherein said second fragment further comprises a REC2 domain of said second Cas12 polypeptide.
94. The kit of any one of claims 91-93, said first or second Cas12 polypeptide is a Class 2, Type V-F or a Cas12f polypeptide.
95. The kit of any one of claims 91-94, wherein said heterologous domain comprises at least about 100-1500 amino acids in length.
96. The kit of any one of claims 91-95, wherein said heterologous domain comprises a domain with DNA-dependent DNA polymerase activity or a domain with Topoisomerase activity.
97. The kit of any one of claims 92-96, wherein said insert DNA molecule is a single-stranded deoxyribonucleic acid molecule, at least partially a single-stranded deoxyribonucleic acid molecule, or at least partially a double-stranded deoxyribonucleic acid molecule.
98. The kit of any one of claims 92-97, wherein said insert DNA molecule is: (i) linked to said first or said second Cas12 polypeptide; (ii) linked to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide; or (iii) hybridized to the guide polynucleotide configured to interact with said first or said second Cas12 polypeptide.
99. The kit of claim 98, wherein: (a) said guide polynucleotide further comprises a hybridization domain configured to hybridize to said DNA site at a 3' end; and (b) said insert DNA molecule comprises a first end configured to hybridize with said hybridization domain of said guide polynucleotide at said 3' end of said insert DNA.
100. The kit of any one of claims 91-99, further comprising a transfection agent.
101. The kit of any one of claims 91-100, further comprising instructions for targeting said DNA site.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263380047P | 2022-10-18 | 2022-10-18 | |
US63/380,047 | 2022-10-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024086596A1 (en) | 2024-04-25 |
Family
ID=90738518
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/077111 WO2024086596A1 (en) | Polypeptide fusions or conjugates for gene editing | 2022-10-18 | 2023-10-17 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024086596A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018089664A1 (en) * | 2016-11-11 | 2018-05-17 | The Regents Of The University Of California | Variant rna-guided polypeptides and methods of use |
WO2020168132A1 (en) * | 2019-02-13 | 2020-08-20 | Beam Therapeutics Inc. | Adenosine deaminase base editors and methods of using same to modify a nucleobase in a target sequence |
WO2022040909A1 (en) * | 2020-08-25 | 2022-03-03 | Institute Of Zoology, Chinese Academy Of Sciences | Split cas12 systems and methods of use thereof |
WO2022155532A1 (en) * | 2021-01-15 | 2022-07-21 | 4M Genomics Inc. | Polypeptide fusions or conjugates for gene editing |
-
2023
- 2023-10-17 WO PCT/US2023/077111 patent/WO2024086596A1/en unknown
Similar Documents
Publication | Title | Publication Date |
---|---|---|
AU2020223370B2 (en) | Enzymes with RuvC domains | |
EP2527438B1 (en) | Methods and compositions for DNA fragmentation and tagging by transposases | |
JP4220557B2 (en) | Overexpression and purification of truncated thermostable DNA polymerase by protein fusion | |
JP2018068311A (en) | Materials and methods for synthesizing nucleic acid molecules with minimum error | |
US20140295492A1 (en) | Methods for Cell-Free Protein Synthesis | |
JP6963238B2 (en) | DNA polymerase mutant | |
JP5612469B2 (en) | Mutant DNA polymerase and related methods | |
JP2003510052A (en) | Methods and compositions for improved polynucleotide synthesis | |
CN116096892A (en) | Enzyme with RuvC domain | |
US20240101987A1 (en) | Polypeptide fusions or conjugates for gene editing | |
JP2007043963A (en) | Dna ligase variant | |
US20150284768A1 (en) | Eukaryotic transposase mutants and transposon end compositions for modifying nucleic acids and methods for production and use in the generation of sequencing libraries | |
JP2017178804A (en) | Fusion protein | |
WO2024086596A1 (en) | Polypeptide fusions or conjugates for gene editing | |
US20170114333A1 (en) | Improvements to eukaryotic transposase mutants and transposon end compositions for modifying nucleic acids and methods for production and use in the generation of sequencing libraries | |
US20110294168A1 (en) | Dna polymerases and related methods | |
US20110020896A1 (en) | Mutant dna polymerases and their genes | |
Slesarev et al. | [15] Topoisomerase V from Methanopyrus kandleri | |
JP4533990B2 (en) | Sugar nucleotide synthase mutant | |
US20230332118A1 (en) | Dna polymerase and dna polymerase derived 3'-5'exonuclease | |
Ohnishi et al. | Identification and characterization of Thermus thermophilus HB8 RuvA protein, the subunit of the RuvAB protein complex that promotes branch migration of Holliday junctions | |
EP4399290A1 (en) | Class ii, type v crispr systems | |
KR101151602B1 (en) | Method for improving the performance of PCR and RT-PCR using a Klenow fragment | |
JP2004024102A (en) | Expression vector, host, fusion protein, protein, method for producing fusion protein and method for producing protein | |
EP4041877A1 (en) | Dna polymerase and dna polymerase derived 3'-5'exonuclease |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application |
Ref document number: 23880728 Country of ref document: EP Kind code of ref document: A1 |