WO2024168147A9 - Evolved recombinases for editing a genome in combination with prime editing
- Publication number
- WO2024168147A9 PCT/US2024/014998
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- recombinase
- site
- bxb1
- recombinase recognition
- recognition site
- Prior art date
Links
- 102000018120 Recombinases Human genes 0.000 title claims abstract description 825
- 108010091086 Recombinases Proteins 0.000 title claims abstract description 825
- 102000040430 polynucleotide Human genes 0.000 claims abstract description 238
- 108091033319 polynucleotide Proteins 0.000 claims abstract description 238
- 239000002157 polynucleotide Substances 0.000 claims abstract description 238
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 178
- 102000039446 nucleic acids Human genes 0.000 claims abstract description 166
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 166
- 238000000034 method Methods 0.000 claims abstract description 105
- 108091028043 Nucleic acid sequence Proteins 0.000 claims abstract description 53
- 238000003780 insertion Methods 0.000 claims abstract description 51
- 230000037431 insertion Effects 0.000 claims abstract description 51
- 239000000203 mixture Substances 0.000 claims abstract description 43
- 230000006798 recombination Effects 0.000 claims abstract description 43
- 238000005215 recombination Methods 0.000 claims abstract description 43
- 238000012217 deletion Methods 0.000 claims abstract description 21
- 230000037430 deletion Effects 0.000 claims abstract description 21
- 125000003275 alpha amino acid group Chemical group 0.000 claims description 297
- 108020004414 DNA Proteins 0.000 claims description 294
- 150000001413 amino acids Chemical class 0.000 claims description 144
- 230000006820 DNA synthesis Effects 0.000 claims description 142
- 108091033409 CRISPR Proteins 0.000 claims description 141
- 108020005004 Guide RNA Proteins 0.000 claims description 117
- 101000607560 Homo sapiens Ubiquitin-conjugating enzyme E2 variant 3 Proteins 0.000 claims description 112
- 102100039936 Ubiquitin-conjugating enzyme E2 variant 3 Human genes 0.000 claims description 112
- 102100034343 Integrase Human genes 0.000 claims description 109
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 claims description 108
- 210000004027 cell Anatomy 0.000 claims description 102
- 108090000623 proteins and genes Proteins 0.000 claims description 95
- 230000010354 integration Effects 0.000 claims description 63
- 108091032973 (ribonucleotides)n+m Proteins 0.000 claims description 61
- 238000006467 substitution reaction Methods 0.000 claims description 56
- 102220407717 c.13G>A Human genes 0.000 claims description 55
- 238000009434 installation Methods 0.000 claims description 45
- 230000000295 complement effect Effects 0.000 claims description 39
- 210000000349 chromosome Anatomy 0.000 claims description 37
- 102220414501 c.40G>A Human genes 0.000 claims description 33
- 125000006850 spacer group Chemical group 0.000 claims description 31
- 108090000765 processed proteins & peptides Proteins 0.000 claims description 30
- 102200069315 rs751222088 Human genes 0.000 claims description 26
- 108010008532 Deoxyribonuclease I Proteins 0.000 claims description 25
- 102000007260 Deoxyribonuclease I Human genes 0.000 claims description 25
- 230000001404 mediated effect Effects 0.000 claims description 24
- 102220315565 rs1553606121 Human genes 0.000 claims description 23
- 102220575132 Leucine-rich repeat transmembrane protein FLRT1_S86N_mutation Human genes 0.000 claims description 21
- 102100030768 ETS domain-containing transcription factor ERF Human genes 0.000 claims description 17
- 239000013598 vector Substances 0.000 claims description 17
- 102220347676 c.153C>G Human genes 0.000 claims description 15
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 claims description 15
- 102200033756 rs55637216 Human genes 0.000 claims description 14
- 102220084672 rs864309678 Human genes 0.000 claims description 12
- 238000011144 upstream manufacturing Methods 0.000 claims description 12
- 102220324105 rs1223305093 Human genes 0.000 claims description 11
- 102200102666 rs3179969 Human genes 0.000 claims description 11
- 102220501223 Scaffold attachment factor B1_W35L_mutation Human genes 0.000 claims description 10
- 102220348732 c.223A>G Human genes 0.000 claims description 10
- 102220574283 5-hydroxytryptamine receptor 2A_S86T_mutation Human genes 0.000 claims description 9
- 102220363426 c.1358C>A Human genes 0.000 claims description 9
- 201000010099 disease Diseases 0.000 claims description 9
- 102220294274 rs149101812 Human genes 0.000 claims description 9
- 102220084657 rs864309703 Human genes 0.000 claims description 9
- 102220466635 Aromatic-L-amino-acid decarboxylase_M239L_mutation Human genes 0.000 claims description 8
- 125000000539 amino acid group Chemical group 0.000 claims description 8
- 102220309555 rs864309703 Human genes 0.000 claims description 8
- 102220585505 D site-binding protein_H89N_mutation Human genes 0.000 claims description 7
- 102220631431 Histone H3.2_A97S_mutation Human genes 0.000 claims description 7
- 241000713869 Moloney murine leukemia virus Species 0.000 claims description 7
- 102220468581 Podocin_T116P_mutation Human genes 0.000 claims description 7
- 102220323621 c.57_58invTG Human genes 0.000 claims description 7
- 102200010360 rs1057520644 Human genes 0.000 claims description 7
- 102220220120 rs1060500786 Human genes 0.000 claims description 7
- 102220055260 rs140446520 Human genes 0.000 claims description 7
- 102220065899 rs201559220 Human genes 0.000 claims description 7
- 102200029884 rs2848477 Human genes 0.000 claims description 7
- 102220005529 rs28928881 Human genes 0.000 claims description 7
- 102220049168 rs587783633 Human genes 0.000 claims description 7
- 102220075586 rs760419507 Human genes 0.000 claims description 7
- 102200067663 rs80358463 Human genes 0.000 claims description 7
- 102220587361 Cellular tumor antigen p53_L93M_mutation Human genes 0.000 claims description 6
- 102000052510 DNA-Binding Proteins Human genes 0.000 claims description 6
- 101710096438 DNA-binding protein Proteins 0.000 claims description 6
- 102220358754 c.107A>G Human genes 0.000 claims description 6
- 208000035475 disorder Diseases 0.000 claims description 6
- 102200115778 rs121918093 Human genes 0.000 claims description 6
- 102220243541 rs1555280115 Human genes 0.000 claims description 6
- 102220257881 rs368473853 Human genes 0.000 claims description 6
- 102220257927 rs749807415 Human genes 0.000 claims description 6
- 102220518048 NAD-dependent protein deacetylase sirtuin-1_S47A_mutation Human genes 0.000 claims description 5
- 102220346698 c.13G>T Human genes 0.000 claims description 5
- 102220356875 c.260T>C Human genes 0.000 claims description 5
- 239000003814 drug Substances 0.000 claims description 5
- 102220032394 rs104895290 Human genes 0.000 claims description 5
- 102220226170 rs1064795187 Human genes 0.000 claims description 5
- 102200043783 rs17850684 Human genes 0.000 claims description 5
- 102220181989 rs186074112 Human genes 0.000 claims description 5
- 102200080525 rs200322968 Human genes 0.000 claims description 5
- 102220495179 NAD(P)H pyrophosphatase NUDT13, mitochondrial_S10A_mutation Human genes 0.000 claims description 4
- 230000002441 reversible effect Effects 0.000 claims description 4
- 102220049133 rs143343083 Human genes 0.000 claims description 4
- 102200043454 rs281875272 Human genes 0.000 claims description 4
- 102220521112 Nuclear autoantigenic sperm protein_H321N_mutation Human genes 0.000 claims description 3
- 238000001727 in vivo Methods 0.000 claims description 2
- 108020004999 messenger RNA Proteins 0.000 claims description 2
- 102220226390 rs5030803 Human genes 0.000 claims 15
- 102200051804 rs387907345 Human genes 0.000 claims 11
- 102220538072 CCR4-NOT transcription complex subunit 3_E20Q_mutation Human genes 0.000 claims 7
- 102220005384 rs33939620 Human genes 0.000 claims 4
- 102200029974 rs1056827 Human genes 0.000 claims 3
- 102220215776 rs1060502560 Human genes 0.000 claims 3
- 102200114143 rs9282858 Human genes 0.000 claims 3
- 238000003259 recombinant expression Methods 0.000 claims 2
- 102200059835 rs11557488 Human genes 0.000 claims 2
- 102200037858 rs387906703 Human genes 0.000 claims 2
- 102220465744 Protein angel homolog 1_E139A_mutation Human genes 0.000 claims 1
- 238000000338 in vitro Methods 0.000 claims 1
- 238000004519 manufacturing process Methods 0.000 claims 1
- 102220214800 rs1060503568 Human genes 0.000 claims 1
- 102200066795 rs183846665 Human genes 0.000 claims 1
- 102220076579 rs574164748 Human genes 0.000 claims 1
- 102200074518 rs786200936 Human genes 0.000 claims 1
- 102220123720 rs886043483 Human genes 0.000 claims 1
- 230000000694 effects Effects 0.000 abstract description 40
- 230000005945 translocation Effects 0.000 abstract description 9
- 230000035772 mutation Effects 0.000 description 723
- 235000001014 amino acid Nutrition 0.000 description 240
- 229940024606 amino acid Drugs 0.000 description 130
- 239000002773 nucleotide Substances 0.000 description 79
- 125000003729 nucleotide group Chemical group 0.000 description 79
- 235000018102 proteins Nutrition 0.000 description 69
- 102000004169 proteins and genes Human genes 0.000 description 69
- 239000013612 plasmid Substances 0.000 description 57
- 230000027455 binding Effects 0.000 description 48
- 101710163270 Nuclease Proteins 0.000 description 41
- 102000053602 DNA Human genes 0.000 description 39
- 102000037865 fusion proteins Human genes 0.000 description 26
- 108020001507 fusion proteins Proteins 0.000 description 26
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 23
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 23
- 239000012634 fragment Substances 0.000 description 18
- 238000006116 polymerization reaction Methods 0.000 description 17
- 108091079001 CRISPR RNA Proteins 0.000 description 16
- 101001000998 Homo sapiens Protein phosphatase 1 regulatory subunit 12C Proteins 0.000 description 16
- 102100035620 Protein phosphatase 1 regulatory subunit 12C Human genes 0.000 description 16
- 101000910035 Streptococcus pyogenes serotype M1 CRISPR-associated endonuclease Cas9/Csn1 Proteins 0.000 description 16
- 101710149870 C-C chemokine receptor type 5 Proteins 0.000 description 14
- 102100035875 C-C chemokine receptor type 5 Human genes 0.000 description 14
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 description 14
- 230000015572 biosynthetic process Effects 0.000 description 14
- 238000010362 genome editing Methods 0.000 description 14
- 229920001184 polypeptide Polymers 0.000 description 14
- 102000004196 processed proteins & peptides Human genes 0.000 description 14
- 108091028113 Trans-activating crRNA Proteins 0.000 description 13
- 230000008859 change Effects 0.000 description 13
- 230000002068 genetic effect Effects 0.000 description 13
- 238000010839 reverse transcription Methods 0.000 description 13
- 239000000523 sample Substances 0.000 description 13
- 230000037432 silent mutation Effects 0.000 description 13
- 102000004190 Enzymes Human genes 0.000 description 12
- 108090000790 Enzymes Proteins 0.000 description 12
- 108010061833 Integrases Proteins 0.000 description 12
- 230000001419 dependent effect Effects 0.000 description 12
- 230000014509 gene expression Effects 0.000 description 12
- 108010042407 Endonucleases Proteins 0.000 description 11
- 102000013609 MutL Protein Homolog 1 Human genes 0.000 description 11
- 238000005580 one pot reaction Methods 0.000 description 11
- 238000003786 synthesis reaction Methods 0.000 description 11
- 230000033616 DNA repair Effects 0.000 description 10
- 238000002474 experimental method Methods 0.000 description 10
- 238000001890 transfection Methods 0.000 description 10
- 230000004568 DNA-binding Effects 0.000 description 9
- 238000004458 analytical method Methods 0.000 description 9
- 238000005457 optimization Methods 0.000 description 9
- 102100031780 Endonuclease Human genes 0.000 description 8
- 238000012512 characterization method Methods 0.000 description 8
- 230000009977 dual effect Effects 0.000 description 8
- 230000008569 process Effects 0.000 description 8
- 208000024891 symptom Diseases 0.000 description 8
- 230000007018 DNA scission Effects 0.000 description 7
- 230000003197 catalytic effect Effects 0.000 description 7
- 230000030648 nucleus localization Effects 0.000 description 7
- 239000000047 product Substances 0.000 description 7
- 241000193996 Streptococcus pyogenes Species 0.000 description 6
- 238000003776 cleavage reaction Methods 0.000 description 6
- 238000010790 dilution Methods 0.000 description 6
- 239000012895 dilution Substances 0.000 description 6
- 230000034431 double-strand break repair via homologous recombination Effects 0.000 description 6
- 230000006872 improvement Effects 0.000 description 6
- 210000004962 mammalian cell Anatomy 0.000 description 6
- 230000010076 replication Effects 0.000 description 6
- 102220316605 rs577020327 Human genes 0.000 description 6
- 230000007017 scission Effects 0.000 description 6
- 241000894007 species Species 0.000 description 6
- 108091027544 Subgenomic mRNA Proteins 0.000 description 5
- 108010010574 Tn3 resolvase Proteins 0.000 description 5
- 241000700605 Viruses Species 0.000 description 5
- 239000002299 complementary DNA Substances 0.000 description 5
- 230000003247 decreasing effect Effects 0.000 description 5
- 230000006870 function Effects 0.000 description 5
- 230000004927 fusion Effects 0.000 description 5
- 238000009396 hybridization Methods 0.000 description 5
- 230000007246 mechanism Effects 0.000 description 5
- 230000004048 modification Effects 0.000 description 5
- 238000012986 modification Methods 0.000 description 5
- 239000003607 modifier Substances 0.000 description 5
- 230000001105 regulatory effect Effects 0.000 description 5
- 102220259379 rs767909551 Human genes 0.000 description 5
- 238000012163 sequencing technique Methods 0.000 description 5
- 239000000758 substrate Substances 0.000 description 5
- 108091032955 Bacterial small RNA Proteins 0.000 description 4
- 108010051219 Cre recombinase Proteins 0.000 description 4
- 102220526137 Dihydrofolate reductase_L23Y_mutation Human genes 0.000 description 4
- 102000012330 Integrases Human genes 0.000 description 4
- 108091092195 Intron Proteins 0.000 description 4
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 description 4
- 101100084404 Mus musculus Prodh gene Proteins 0.000 description 4
- 108010008281 Recombinant Fusion Proteins Proteins 0.000 description 4
- 102000007056 Recombinant Fusion Proteins Human genes 0.000 description 4
- 108020004682 Single-Stranded DNA Proteins 0.000 description 4
- 108010052160 Site-specific recombinase Proteins 0.000 description 4
- 241000191967 Staphylococcus aureus Species 0.000 description 4
- IQFYYKKMVGJFEH-XLPZGREQSA-N Thymidine Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](CO)[C@@H](O)C1 IQFYYKKMVGJFEH-XLPZGREQSA-N 0.000 description 4
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 4
- OIRDTQYFTABQOQ-KQYNXXCUSA-N adenosine Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O OIRDTQYFTABQOQ-KQYNXXCUSA-N 0.000 description 4
- 230000001580 bacterial effect Effects 0.000 description 4
- 108020001778 catalytic domains Proteins 0.000 description 4
- 238000006555 catalytic reaction Methods 0.000 description 4
- 230000005714 functional activity Effects 0.000 description 4
- 238000010369 molecular cloning Methods 0.000 description 4
- 229920000642 polymer Polymers 0.000 description 4
- 238000012545 processing Methods 0.000 description 4
- 238000000746 purification Methods 0.000 description 4
- 102220253549 rs11203289 Human genes 0.000 description 4
- 102220243326 rs1183892581 Human genes 0.000 description 4
- 102220271471 rs1336300148 Human genes 0.000 description 4
- 102220233231 rs757145186 Human genes 0.000 description 4
- 230000003007 single stranded DNA break Effects 0.000 description 4
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 4
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 description 4
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 3
- 108091023037 Aptamer Proteins 0.000 description 3
- 108700004991 Cas12a Proteins 0.000 description 3
- 108091026890 Coding region Proteins 0.000 description 3
- 108020004705 Codon Proteins 0.000 description 3
- 102000004533 Endonucleases Human genes 0.000 description 3
- 108090000652 Flap endonucleases Proteins 0.000 description 3
- 102000004150 Flap endonucleases Human genes 0.000 description 3
- 101000756632 Homo sapiens Actin, cytoplasmic 1 Proteins 0.000 description 3
- 101001003581 Homo sapiens Lamin-B1 Proteins 0.000 description 3
- 101001109620 Homo sapiens Nucleolar and coiled-body phosphoprotein 1 Proteins 0.000 description 3
- 101710203526 Integrase Proteins 0.000 description 3
- 102100026517 Lamin-B1 Human genes 0.000 description 3
- 102100022726 Nucleolar and coiled-body phosphoprotein 1 Human genes 0.000 description 3
- 102000006382 Ribonucleases Human genes 0.000 description 3
- 108010083644 Ribonucleases Proteins 0.000 description 3
- 101710145752 Serine recombinase gin Proteins 0.000 description 3
- DRTQHJPVMGBUCF-XVFCMESISA-N Uridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-XVFCMESISA-N 0.000 description 3
- 230000004075 alteration Effects 0.000 description 3
- 230000033228 biological regulation Effects 0.000 description 3
- 230000002759 chromosomal effect Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000002955 isolation Methods 0.000 description 3
- 230000000670 limiting effect Effects 0.000 description 3
- 239000013642 negative control Substances 0.000 description 3
- 230000006780 non-homologous end joining Effects 0.000 description 3
- 230000008439 repair process Effects 0.000 description 3
- 230000002123 temporal effect Effects 0.000 description 3
- 238000012360 testing method Methods 0.000 description 3
- 230000001225 therapeutic effect Effects 0.000 description 3
- 230000002103 transcriptional effect Effects 0.000 description 3
- UHDGCWIWMRVCDJ-UHFFFAOYSA-N 1-beta-D-Xylofuranosyl-NH-Cytosine Natural products O=C1N=C(N)C=CN1C1C(O)C(O)C(CO)O1 UHDGCWIWMRVCDJ-UHFFFAOYSA-N 0.000 description 2
- ZAYHVCMSTBRABG-JXOAFFINSA-N 5-methylcytidine Chemical compound O=C1N=C(N)C(C)=CN1[C@H]1[C@H](O)[C@H](O)[C@@H](CO)O1 ZAYHVCMSTBRABG-JXOAFFINSA-N 0.000 description 2
- 108010088141 Argonaute Proteins Proteins 0.000 description 2
- 102000008682 Argonaute Proteins Human genes 0.000 description 2
- 241000894006 Bacteria Species 0.000 description 2
- 108020004635 Complementary DNA Proteins 0.000 description 2
- UHDGCWIWMRVCDJ-PSQAKQOGSA-N Cytidine Natural products O=C1N=C(N)C=CN1[C@@H]1[C@@H](O)[C@@H](O)[C@H](CO)O1 UHDGCWIWMRVCDJ-PSQAKQOGSA-N 0.000 description 2
- 102220605874 Cytosolic arginine sensor for mTORC1 subunit 2_D10A_mutation Human genes 0.000 description 2
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 2
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 2
- 238000012286 ELISA Assay Methods 0.000 description 2
- 101710191360 Eosinophil cationic protein Proteins 0.000 description 2
- 108010046276 FLP recombinase Proteins 0.000 description 2
- 108700007698 Genetic Terminator Regions Proteins 0.000 description 2
- NYHBQMYGNKIUIF-UUOKFMHZSA-N Guanosine Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O NYHBQMYGNKIUIF-UUOKFMHZSA-N 0.000 description 2
- 108091027305 Heteroduplex Proteins 0.000 description 2
- 108010015268 Integration Host Factors Proteins 0.000 description 2
- 241000124008 Mammalia Species 0.000 description 2
- 108091092724 Noncoding DNA Proteins 0.000 description 2
- 108091007494 Nucleic acid- binding domains Proteins 0.000 description 2
- 108091081062 Repeated sequence (DNA) Proteins 0.000 description 2
- 102100036007 Ribonuclease 3 Human genes 0.000 description 2
- 101710192197 Ribonuclease 3 Proteins 0.000 description 2
- 102000003661 Ribonuclease III Human genes 0.000 description 2
- 108010057163 Ribonuclease III Proteins 0.000 description 2
- 101150081851 SMN1 gene Proteins 0.000 description 2
- 102220493196 Sodium/calcium exchanger 3_I78V_mutation Human genes 0.000 description 2
- 108700015512 Staphylococcus aureus sin Proteins 0.000 description 2
- 241000194020 Streptococcus thermophilus Species 0.000 description 2
- 238000000692 Student's t-test Methods 0.000 description 2
- 241001661355 Synapsis Species 0.000 description 2
- 108010020764 Transposases Proteins 0.000 description 2
- 102000008579 Transposases Human genes 0.000 description 2
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical compound [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 description 2
- 230000003213 activating effect Effects 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 230000008970 bacterial immunity Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 108010051210 beta-Fructofuranosidase Proteins 0.000 description 2
- DRTQHJPVMGBUCF-PSQAKQOGSA-N beta-L-uridine Natural products O[C@H]1[C@@H](O)[C@H](CO)O[C@@H]1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-PSQAKQOGSA-N 0.000 description 2
- 210000004899 c-terminal region Anatomy 0.000 description 2
- 238000010804 cDNA synthesis Methods 0.000 description 2
- 108091092356 cellular DNA Proteins 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 238000012239 gene modification Methods 0.000 description 2
- 230000005017 genetic modification Effects 0.000 description 2
- 235000013617 genetically modified food Nutrition 0.000 description 2
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 2
- 238000002744 homologous recombination Methods 0.000 description 2
- 230000006801 homologous recombination Effects 0.000 description 2
- 230000036039 immunity Effects 0.000 description 2
- 235000011073 invertase Nutrition 0.000 description 2
- 238000005304 joining Methods 0.000 description 2
- 230000035800 maturation Effects 0.000 description 2
- 239000002777 nucleoside Substances 0.000 description 2
- 230000036961 partial effect Effects 0.000 description 2
- 230000037361 pathway Effects 0.000 description 2
- 125000002467 phosphate group Chemical group [H]OP(=O)(O[H])O[*] 0.000 description 2
- 230000037048 polymerization activity Effects 0.000 description 2
- 230000037452 priming Effects 0.000 description 2
- 238000003753 real-time PCR Methods 0.000 description 2
- 230000007115 recruitment Effects 0.000 description 2
- 102220086654 rs864622439 Human genes 0.000 description 2
- 210000003568 synaptosome Anatomy 0.000 description 2
- 238000012353 t test Methods 0.000 description 2
- 229940113082 thymine Drugs 0.000 description 2
- 238000013518 transcription Methods 0.000 description 2
- 230000035897 transcription Effects 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 229940035893 uracil Drugs 0.000 description 2
- DRTQHJPVMGBUCF-UHFFFAOYSA-N uracil arabinoside Natural products OC1C(O)C(CO)OC1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-UHFFFAOYSA-N 0.000 description 2
- 229940045145 uridine Drugs 0.000 description 2
- 229910052725 zinc Inorganic materials 0.000 description 2
- 239000011701 zinc Substances 0.000 description 2
- MTCFGRXMJLQNBG-REOHCLBHSA-N (2S)-2-Amino-3-hydroxypropansäure Chemical compound OC[C@H](N)C(O)=O MTCFGRXMJLQNBG-REOHCLBHSA-N 0.000 description 1
- RIFDKYBNWNPCQK-IOSLPCCCSA-N (2r,3s,4r,5r)-2-(hydroxymethyl)-5-(6-imino-3-methylpurin-9-yl)oxolane-3,4-diol Chemical compound C1=2N(C)C=NC(=N)C=2N=CN1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O RIFDKYBNWNPCQK-IOSLPCCCSA-N 0.000 description 1
- RKSLVDIXBGWPIS-UAKXSSHOSA-N 1-[(2r,3r,4s,5r)-3,4-dihydroxy-5-(hydroxymethyl)oxolan-2-yl]-5-iodopyrimidine-2,4-dione Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)C(I)=C1 RKSLVDIXBGWPIS-UAKXSSHOSA-N 0.000 description 1
- PISWNSOQFZRVJK-XLPZGREQSA-N 1-[(2r,4s,5r)-4-hydroxy-5-(hydroxymethyl)oxolan-2-yl]-5-methyl-2-sulfanylidenepyrimidin-4-one Chemical compound S=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](CO)[C@@H](O)C1 PISWNSOQFZRVJK-XLPZGREQSA-N 0.000 description 1
- GFYLSDSUCHVORB-IOSLPCCCSA-N 1-methyladenosine Chemical compound C1=NC=2C(=N)N(C)C=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O GFYLSDSUCHVORB-IOSLPCCCSA-N 0.000 description 1
- UTAIYTHAJQNQDW-KQYNXXCUSA-N 1-methylguanosine Chemical compound C1=NC=2C(=O)N(C)C(N)=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O UTAIYTHAJQNQDW-KQYNXXCUSA-N 0.000 description 1
- RFCQJGFZUQFYRF-UHFFFAOYSA-N 2'-O-Methylcytidine Natural products COC1C(O)C(CO)OC1N1C(=O)N=C(N)C=C1 RFCQJGFZUQFYRF-UHFFFAOYSA-N 0.000 description 1
- SXUXMRMBWZCMEN-UHFFFAOYSA-N 2'-O-methyl uridine Natural products COC1C(O)C(CO)OC1N1C(=O)NC(=O)C=C1 SXUXMRMBWZCMEN-UHFFFAOYSA-N 0.000 description 1
- RFCQJGFZUQFYRF-ZOQUXTDFSA-N 2'-O-methylcytidine Chemical class CO[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)N=C(N)C=C1 RFCQJGFZUQFYRF-ZOQUXTDFSA-N 0.000 description 1
- YKBGVTZYEHREMT-KVQBGUIXSA-N 2'-deoxyguanosine Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@H]1C[C@H](O)[C@@H](CO)O1 YKBGVTZYEHREMT-KVQBGUIXSA-N 0.000 description 1
- CKTSBUTUHBMZGZ-SHYZEUOFSA-N 2'‐deoxycytidine Chemical compound O=C1N=C(N)C=CN1[C@@H]1O[C@H](CO)[C@@H](O)C1 CKTSBUTUHBMZGZ-SHYZEUOFSA-N 0.000 description 1
- ZDTFMPXQUSBYRL-UUOKFMHZSA-N 2-Aminoadenosine Chemical compound C12=NC(N)=NC(N)=C2N=CN1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O ZDTFMPXQUSBYRL-UUOKFMHZSA-N 0.000 description 1
- GOJUJUVQIVIZAV-UHFFFAOYSA-N 2-amino-4,6-dichloropyrimidine-5-carbaldehyde Chemical group NC1=NC(Cl)=C(C=O)C(Cl)=N1 GOJUJUVQIVIZAV-UHFFFAOYSA-N 0.000 description 1
- JRYMOPZHXMVHTA-DAGMQNCNSA-N 2-amino-7-[(2r,3r,4s,5r)-3,4-dihydroxy-5-(hydroxymethyl)oxolan-2-yl]-1h-pyrrolo[2,3-d]pyrimidin-4-one Chemical compound C1=CC=2C(=O)NC(N)=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O JRYMOPZHXMVHTA-DAGMQNCNSA-N 0.000 description 1
- RHFUOMFWUGWKKO-XVFCMESISA-N 2-thiocytidine Chemical compound S=C1N=C(N)C=CN1[C@H]1[C@H](O)[C@H](O)[C@@H](CO)O1 RHFUOMFWUGWKKO-XVFCMESISA-N 0.000 description 1
- BCZUPRDAAVVBSO-MJXNYTJMSA-N 4-acetylcytidine Chemical compound C1=CC(C(=O)C)(N)NC(=O)N1[C@H]1[C@H](O)[C@H](O)[C@@H](CO)O1 BCZUPRDAAVVBSO-MJXNYTJMSA-N 0.000 description 1
- UVGCZRPOXXYZKH-QADQDURISA-N 5-(carboxyhydroxymethyl)uridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)C(C(O)C(O)=O)=C1 UVGCZRPOXXYZKH-QADQDURISA-N 0.000 description 1
- ZAYHVCMSTBRABG-UHFFFAOYSA-N 5-Methylcytidine Natural products O=C1N=C(N)C(C)=CN1C1C(O)C(O)C(CO)O1 ZAYHVCMSTBRABG-UHFFFAOYSA-N 0.000 description 1
- MMUBPEFMCTVKTR-IBNKKVAHSA-N 5-[(2s,3r,4s,5r)-3,4-dihydroxy-5-(hydroxymethyl)-2-methyloxolan-2-yl]-1h-pyrimidine-2,4-dione Chemical compound C=1NC(=O)NC(=O)C=1[C@]1(C)O[C@H](CO)[C@@H](O)[C@H]1O MMUBPEFMCTVKTR-IBNKKVAHSA-N 0.000 description 1
- AGFIRQJZCNVMCW-UAKXSSHOSA-N 5-bromouridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)C(Br)=C1 AGFIRQJZCNVMCW-UAKXSSHOSA-N 0.000 description 1
- FHIDNBAQOFJWCA-UAKXSSHOSA-N 5-fluorouridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)C(F)=C1 FHIDNBAQOFJWCA-UAKXSSHOSA-N 0.000 description 1
- KDOPAZIWBAHVJB-UHFFFAOYSA-N 5h-pyrrolo[3,2-d]pyrimidine Chemical compound C1=NC=C2NC=CC2=N1 KDOPAZIWBAHVJB-UHFFFAOYSA-N 0.000 description 1
- BXJHWYVXLGLDMZ-UHFFFAOYSA-N 6-O-methylguanine Chemical compound COC1=NC(N)=NC2=C1NC=N2 BXJHWYVXLGLDMZ-UHFFFAOYSA-N 0.000 description 1
- UEHOMUNTZPIBIL-UUOKFMHZSA-N 6-amino-9-[(2r,3r,4s,5r)-3,4-dihydroxy-5-(hydroxymethyl)oxolan-2-yl]-7h-purin-8-one Chemical compound O=C1NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O UEHOMUNTZPIBIL-UUOKFMHZSA-N 0.000 description 1
- HCAJQHYUCKICQH-VPENINKCSA-N 8-Oxo-7,8-dihydro-2'-deoxyguanosine Chemical compound C1=2NC(N)=NC(=O)C=2NC(=O)N1[C@H]1C[C@H](O)[C@@H](CO)O1 HCAJQHYUCKICQH-VPENINKCSA-N 0.000 description 1
- HDZZVAMISRMYHH-UHFFFAOYSA-N 9beta-Ribofuranosyl-7-deazaadenin Natural products C1=CC=2C(N)=NC=NC=2N1C1OC(CO)C(O)C1O HDZZVAMISRMYHH-UHFFFAOYSA-N 0.000 description 1
- 241000251468 Actinopterygii Species 0.000 description 1
- 241000380131 Ammophila arenaria Species 0.000 description 1
- 241000203069 Archaea Species 0.000 description 1
- 241000271566 Aves Species 0.000 description 1
- DWRXFEITVBNRMK-UHFFFAOYSA-N Beta-D-1-Arabinofuranosylthymine Natural products O=C1NC(=O)C(C)=CN1C1C(O)C(O)C(CO)O1 DWRXFEITVBNRMK-UHFFFAOYSA-N 0.000 description 1
- 241000283690 Bos taurus Species 0.000 description 1
- 239000002126 C01EB10 - Adenosine Substances 0.000 description 1
- 108010040467 CRISPR-Associated Proteins Proteins 0.000 description 1
- 238000010356 CRISPR-Cas9 genome editing Methods 0.000 description 1
- 238000010453 CRISPR/Cas method Methods 0.000 description 1
- 241000283707 Capra Species 0.000 description 1
- 101710132601 Capsid protein Proteins 0.000 description 1
- 108091006146 Channels Proteins 0.000 description 1
- 101710094648 Coat protein Proteins 0.000 description 1
- MIKUYHXYGGJMLM-GIMIYPNGSA-N Crotonoside Natural products C1=NC2=C(N)NC(=O)N=C2N1[C@H]1O[C@@H](CO)[C@H](O)[C@@H]1O MIKUYHXYGGJMLM-GIMIYPNGSA-N 0.000 description 1
- NYHBQMYGNKIUIF-UHFFFAOYSA-N D-guanosine Natural products C1=2NC(N)=NC(=O)C=2N=CN1C1OC(CO)C(O)C1O NYHBQMYGNKIUIF-UHFFFAOYSA-N 0.000 description 1
- HMFHBZSHGGEWLO-SOOFDHNKSA-N D-ribofuranose Chemical class OC[C@H]1OC(O)[C@H](O)[C@@H]1O HMFHBZSHGGEWLO-SOOFDHNKSA-N 0.000 description 1
- 238000010442 DNA editing Methods 0.000 description 1
- 238000012270 DNA recombination Methods 0.000 description 1
- 230000004543 DNA replication Effects 0.000 description 1
- CKTSBUTUHBMZGZ-UHFFFAOYSA-N Deoxycytidine Natural products O=C1N=C(N)C=CN1C1OC(CO)C(O)C1 CKTSBUTUHBMZGZ-UHFFFAOYSA-N 0.000 description 1
- 241000702421 Dependoparvovirus Species 0.000 description 1
- 241000255601 Drosophila melanogaster Species 0.000 description 1
- 101000764582 Enterobacteria phage T4 Tape measure protein Proteins 0.000 description 1
- 241000588724 Escherichia coli Species 0.000 description 1
- 101000621102 Escherichia phage Mu Portal protein Proteins 0.000 description 1
- 241000701959 Escherichia virus Lambda Species 0.000 description 1
- 108091029865 Exogenous DNA Proteins 0.000 description 1
- 108700024394 Exon Proteins 0.000 description 1
- 108060002716 Exonuclease Proteins 0.000 description 1
- 102000009095 Fanconi Anemia Complementation Group A protein Human genes 0.000 description 1
- 108010087740 Fanconi Anemia Complementation Group A protein Proteins 0.000 description 1
- 241000282326 Felis catus Species 0.000 description 1
- 241000724791 Filamentous phage Species 0.000 description 1
- 208000034951 Genetic Translocation Diseases 0.000 description 1
- 102100021181 Golgi phosphoprotein 3 Human genes 0.000 description 1
- 108060003760 HNH nuclease Proteins 0.000 description 1
- 102000029812 HNH nuclease Human genes 0.000 description 1
- 241000238631 Hexapoda Species 0.000 description 1
- 208000026350 Inborn Genetic disease Diseases 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- 229930010555 Inosine Natural products 0.000 description 1
- 241000270322 Lepidosauria Species 0.000 description 1
- 241000186805 Listeria innocua Species 0.000 description 1
- 102100025354 Macrophage mannose receptor 1 Human genes 0.000 description 1
- 101710125418 Major capsid protein Proteins 0.000 description 1
- 201000009906 Meningitis Diseases 0.000 description 1
- 241001465754 Metazoa Species 0.000 description 1
- 241000714177 Murine leukemia virus Species 0.000 description 1
- 101710135898 Myc proto-oncogene protein Proteins 0.000 description 1
- 102100038895 Myc proto-oncogene protein Human genes 0.000 description 1
- 241001343377 Mycobacterium virus Bxb1 Species 0.000 description 1
- VQAYFKKCNSOZKM-IOSLPCCCSA-N N(6)-methyladenosine Chemical compound C1=NC=2C(NC)=NC=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O VQAYFKKCNSOZKM-IOSLPCCCSA-N 0.000 description 1
- VQAYFKKCNSOZKM-UHFFFAOYSA-N NSC 29409 Natural products C1=NC=2C(NC)=NC=NC=2N1C1OC(CO)C(O)C1O VQAYFKKCNSOZKM-UHFFFAOYSA-N 0.000 description 1
- 241000588653 Neisseria Species 0.000 description 1
- 241000244206 Nematoda Species 0.000 description 1
- 101710141454 Nucleoprotein Proteins 0.000 description 1
- 108700026244 Open Reading Frames Proteins 0.000 description 1
- 241001494479 Pecora Species 0.000 description 1
- 101710083689 Probable capsid protein Proteins 0.000 description 1
- PYMYPHUHKUWMLA-LMVFSUKVSA-N Ribose Natural products OC[C@@H](O)[C@@H](O)[C@@H](O)C=O PYMYPHUHKUWMLA-LMVFSUKVSA-N 0.000 description 1
- 241000283984 Rodentia Species 0.000 description 1
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 1
- 241000194017 Streptococcus Species 0.000 description 1
- RYYWUUFWQRZTIU-UHFFFAOYSA-N Thiophosphoric acid Chemical class OP(O)(S)=O RYYWUUFWQRZTIU-UHFFFAOYSA-N 0.000 description 1
- 101710150448 Transcriptional regulator Myc Proteins 0.000 description 1
- 241000589892 Treponema denticola Species 0.000 description 1
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 1
- 108010017070 Zinc Finger Nucleases Proteins 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 210000005006 adaptive immune system Anatomy 0.000 description 1
- 229960005305 adenosine Drugs 0.000 description 1
- 230000003281 allosteric effect Effects 0.000 description 1
- HMFHBZSHGGEWLO-UHFFFAOYSA-N alpha-D-Furanose-Ribose Natural products OCC1OC(O)C(O)C1O HMFHBZSHGGEWLO-UHFFFAOYSA-N 0.000 description 1
- 150000001408 amides Chemical group 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- PYMYPHUHKUWMLA-WDCZJNDASA-N arabinose Chemical class OC[C@@H](O)[C@@H](O)[C@H](O)C=O PYMYPHUHKUWMLA-WDCZJNDASA-N 0.000 description 1
- PYMYPHUHKUWMLA-UHFFFAOYSA-N arabinose Chemical class OCC(O)C(O)C(O)C=O PYMYPHUHKUWMLA-UHFFFAOYSA-N 0.000 description 1
- 238000003556 assay Methods 0.000 description 1
- 230000004888 barrier function Effects 0.000 description 1
- SRBFZHDQGSBBOR-UHFFFAOYSA-N beta-D-Pyranose-Lyxose Chemical class OC1COC(O)C(O)C1O SRBFZHDQGSBBOR-UHFFFAOYSA-N 0.000 description 1
- IQFYYKKMVGJFEH-UHFFFAOYSA-N beta-L-thymidine Natural products O=C1NC(=O)C(C)=CN1C1OC(CO)C(O)C1 IQFYYKKMVGJFEH-UHFFFAOYSA-N 0.000 description 1
- 230000008275 binding mechanism Effects 0.000 description 1
- 230000004071 biological effect Effects 0.000 description 1
- 125000000837 carbohydrate group Chemical group 0.000 description 1
- 230000022131 cell cycle Effects 0.000 description 1
- 210000003855 cell nucleus Anatomy 0.000 description 1
- 239000013043 chemical agent Substances 0.000 description 1
- 150000005829 chemical entities Chemical class 0.000 description 1
- 230000021615 conjugation Effects 0.000 description 1
- 239000000470 constituent Substances 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- UHDGCWIWMRVCDJ-ZAKLUEHWSA-N cytidine Chemical compound O=C1N=C(N)C=CN1[C@H]1[C@H](O)[C@@H](O)[C@H](CO)O1 UHDGCWIWMRVCDJ-ZAKLUEHWSA-N 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000001627 detrimental effect Effects 0.000 description 1
- 230000009699 differential effect Effects 0.000 description 1
- ZPTBLXKRQACLCR-XVFCMESISA-N dihydrouridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)CC1 ZPTBLXKRQACLCR-XVFCMESISA-N 0.000 description 1
- 230000005782 double-strand break Effects 0.000 description 1
- 239000012636 effector Substances 0.000 description 1
- 230000002255 enzymatic effect Effects 0.000 description 1
- 102000013165 exonuclease Human genes 0.000 description 1
- 125000004030 farnesyl group Chemical group [H]C([*])([H])C([H])=C(C([H])([H])[H])C([H])([H])C([H])([H])C([H])=C(C([H])([H])[H])C([H])([H])C([H])([H])C([H])=C(C([H])([H])[H])C([H])([H])[H] 0.000 description 1
- 125000005313 fatty acid group Chemical group 0.000 description 1
- 210000002950 fibroblast Anatomy 0.000 description 1
- 238000000684 flow cytometry Methods 0.000 description 1
- 238000007306 functionalization reaction Methods 0.000 description 1
- 108010089843 gamma delta resolvase Proteins 0.000 description 1
- 230000004545 gene duplication Effects 0.000 description 1
- 238000003209 gene knockout Methods 0.000 description 1
- 238000001415 gene therapy Methods 0.000 description 1
- 208000016361 genetic disease Diseases 0.000 description 1
- 238000010353 genetic engineering Methods 0.000 description 1
- 229940029575 guanosine Drugs 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 150000002402 hexoses Chemical class 0.000 description 1
- 210000005260 human cell Anatomy 0.000 description 1
- 125000002887 hydroxy group Chemical group [H]O* 0.000 description 1
- 230000007124 immune defense Effects 0.000 description 1
- 230000008676 import Effects 0.000 description 1
- 238000010348 incorporation Methods 0.000 description 1
- 208000015181 infectious disease Diseases 0.000 description 1
- 229960003786 inosine Drugs 0.000 description 1
- 239000001573 invertase Substances 0.000 description 1
- 230000003902 lesion Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000010534 mechanism of action Effects 0.000 description 1
- 230000000813 microbial effect Effects 0.000 description 1
- 238000005065 mining Methods 0.000 description 1
- 238000002703 mutagenesis Methods 0.000 description 1
- 231100000350 mutagenesis Toxicity 0.000 description 1
- 230000025308 nuclear transport Effects 0.000 description 1
- 230000000269 nucleophilic effect Effects 0.000 description 1
- 150000003833 nucleoside derivatives Chemical class 0.000 description 1
- 125000003835 nucleoside group Chemical group 0.000 description 1
- 230000001717 pathogenic effect Effects 0.000 description 1
- 150000008300 phosphoramidites Chemical class 0.000 description 1
- 229920002401 polyacrylamide Polymers 0.000 description 1
- 230000000379 polymerizing effect Effects 0.000 description 1
- 210000001236 prokaryotic cell Anatomy 0.000 description 1
- 230000001915 proofreading effect Effects 0.000 description 1
- 235000004252 protein component Nutrition 0.000 description 1
- 108020001580 protein domains Proteins 0.000 description 1
- 230000002829 reductive effect Effects 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 230000008672 reprogramming Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000001177 retroviral effect Effects 0.000 description 1
- 102220094365 rs776407427 Human genes 0.000 description 1
- RHFUOMFWUGWKKO-UHFFFAOYSA-N s2C Natural products S=C1N=C(N)C=CN1C1C(O)C(O)C(CO)O1 RHFUOMFWUGWKKO-UHFFFAOYSA-N 0.000 description 1
- 239000000126 substance Substances 0.000 description 1
- 235000000346 sugar Nutrition 0.000 description 1
- 150000008163 sugars Chemical class 0.000 description 1
- 239000006228 supernatant Substances 0.000 description 1
- 210000000225 synapse Anatomy 0.000 description 1
- 230000008685 targeting Effects 0.000 description 1
- 229940104230 thymidine Drugs 0.000 description 1
- HDZZVAMISRMYHH-KCGFPETGSA-N tubercidin Chemical compound C1=CC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O HDZZVAMISRMYHH-KCGFPETGSA-N 0.000 description 1
- 241001515965 unidentified phage Species 0.000 description 1
- 239000013603 viral vector Substances 0.000 description 1
- 230000003612 virological effect Effects 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases [RNase]; Deoxyribonucleases [DNase]
Definitions
- Homing endonucleases and programmable endonucleases, such as zinc finger nucleases, TALE nucleases, and Cas9 nucleases, have been used to introduce targeted DSBs and induce HDR in the presence of donor DNA. In most post-mitotic cells, DSB-induced HDR is strongly down-regulated and generally inefficient.
- repair of DSBs by error-prone repair pathways such as non-homologous end-joining (NHEJ) or single-strand annealing (SSA) causes random insertions or deletions (indels) of nucleotides at the DSB site at a higher frequency than HDR.
- NHEJ non-homologous end-joining
- SSA single-strand annealing
- the efficiency of HDR can be increased if cells are subjected to conditions forcing cell-cycle synchronization, or if the enzymes involved in NHEJ are inhibited. However, such conditions can cause many random and unpredictable events, limiting potential applications.
- the present disclosure describes the development of evolved and engineered recombinases using PACE, PANCE, and other genetic engineering methods.
- the recombinase variants provided herein exhibit increased recombination activity, for example, at recombinase recognition sites that have been introduced into a nucleic acid (e.g., genome of an organism) using prime editing.
- the recombinase variants provided herein exhibit increased insertion efficiency of donor DNA molecules at recombinase recognition sites of 2-fold or more (e.g., 3-fold or more, 4-fold or more, 5-fold or more, 6-fold or more, 7-fold or more, 8-fold or more, 9-fold or more, or 10-fold or more) relative to wild-type Bxb1 recombinase.
- the instant disclosure provides evolved and engineered recombinases, systems, compositions, polynucleotides and vectors, kits, and methods that leverage the power of prime editing (PE), e.g., single-flap or “classical” PE, twinPE, or multi-flap PE (also known as quadruple flap PE), to carry out site-specific and large-scale genetic modifications.
- PE prime editing
- modifications include, but are not limited to, insertions, deletions, inversions, replacements, and chromosomal
- the present disclosure provides Bxb1 recombinases comprising an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to the amino acid sequence of SEQ ID NO: 1, wherein the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions at positions selected from the group consisting of amino acid residues 3, 5, 10, 14, 15, 20, 23, 24, 25, 29, 35, 36, 39, 40, 43, 45, 47, 49, 50, 51, 54, 58, 60, 66, 68, 69, 70
- the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3X, V5X, S10X, D14X, A15X, E20X, L23X, E24X, S25X, L29X, W35X, D36X, G39X, V40X, D43X, D45X, S47X, A49X, V50X, D51X, D54X, R58X, N60X, A66X, E68X, E69X, Q70X, D73X, V74X, I75X, Y78X, T84X, S86X, I87X, H89X, L93X, H95X, A97X, H100X, K101X, V105X, T116X, A119X, A124X, G127X, E139X, F147X, Y154
- a homologous recombinase e.g., a recombinase with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to Bxb1 recombinase
- X represents any amino
- the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3T, A3V, V5F, V5I, S10A, D14N, A15T, E20K, E20Q, L23F, L23M, E24K, S25I, L29F, W35P, W35L, D36G, D36V, G39D, V40I, D43E, D45G, S47A, A49E, A49T, V50I, D51E, D51N, D51Y, D54N, R58K, N60S, A66T, E68K, E69D, Q70P, D73G, V74A, V74M, I75V, Y78H, Y78N, T84S, S86G, S86N, S86T, I87T, I87T, I87
- the Bxb1 recombinase comprises a combination of substitutions as in any one of the variants listed in Tables 1-7 provided herein.
- the Bxb1 recombinase comprises a V74X mutation relative to the amino acid sequence of SEQ ID NO: 1, wherein X is any amino acid other than the wild type amino acid.
- the Bxb1 recombinase comprises a V74A mutation relative to the amino acid sequence of SEQ ID NO: 1.
- the Bxb1 recombinase comprises V74X, E229X, and V375X mutations relative to the amino acid sequence of SEQ ID
- the Bxb1 recombinase comprises V74A, E229K, and V375I mutations relative to the amino acid sequence of SEQ ID NO: 1.
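The substitution notation used above (e.g., V74A, E229K, V375I) reads as wild-type residue, position, replacement residue. The following is a minimal Python sketch of that reading, assuming 1-based positions relative to the reference sequence. The function name `apply_substitutions` and the six-residue toy sequence are hypothetical illustrations; the toy sequence is not SEQ ID NO: 1, and the sketch rejects the placeholder "X" because it does not name a concrete replacement.

```python
import re

# One substitution such as "V74A": wild-type residue, 1-based position, new residue.
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(sequence, substitutions):
    """Return a copy of `sequence` with each substitution applied.

    Raises ValueError if the notation is malformed, uses the placeholder "X",
    or names a wild-type residue that does not match the sequence.
    """
    residues = list(sequence)
    for sub in substitutions:
        match = SUBSTITUTION.fullmatch(sub)
        if match is None:
            raise ValueError(f"malformed substitution: {sub!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if new == "X":
            raise ValueError(f"{sub}: 'X' is a placeholder, not a concrete residue")
        if position < 1 or position > len(residues):
            raise ValueError(f"{sub}: position {position} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{sub}: expected {wild_type} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


# Hypothetical toy sequence for illustration only (NOT SEQ ID NO: 1).
toy_sequence = "MRALVV"
print(apply_substitutions(toy_sequence, ["A3T", "V5I"]))  # prints MRTLIV
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, which are easy to introduce when a variant list is copied between differently numbered references.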
- the present disclosure provides systems comprising any of the Bxb1 recombinases provided herein.
- a system comprises any of the Bxb1 recombinases provided herein, a prime editor, and one or more prime editing guide RNAs (pegRNAs) comprising a DNA synthesis template encoding a recombinase recognition site.
- pegRNAs prime editing guide RNAs
- a system comprises a polynucleotide encoding any of the Bxb1 recombinases provided herein, a polynucleotide encoding a prime editor, and one or more prime editing guide RNAs (pegRNAs) or one or more polynucleotides encoding one or more pegRNAs, wherein each pegRNA comprises a DNA synthesis template encoding a recombinase recognition site.
- the system further comprises a polynucleotide comprising DNA (e.g., one or more donor genes) for insertion into a target nucleic acid (e.g., at a recombinase recognition site newly installed using prime editing).
- the DNA for insertion is flanked by one or two recombinase recognition sites.
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- the present disclosure provides systems for inserting DNA into a target nucleic acid comprising: (i) a pegRNA or a first polynucleotide encoding the pegRNA, wherein the pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into a target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site.
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- the present disclosure provides systems for exchanging DNA in a target nucleic acid comprising:
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- the present disclosure provides systems for deleting DNA from a target nucleic acid comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- the present disclosure provides systems for recombining target nucleic acids (e.g., in two chromosomes) comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- the present disclosure provides systems for inverting a target nucleic acid comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation of the first recombinase recognition site, optionally wherein the second recombinase recognition site
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
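The deletion and inversion systems above differ in the relative orientation of the two installed recombinase recognition sites: same orientation for deletion, opposite orientation for inversion. A toy Python sketch of that rule follows. It treats the target nucleic acid as an ordered list of labeled segments and is only an illustration under stated assumptions: the function name `recombine_between_sites` and the segment labels are hypothetical, the recognition sites themselves are not modeled, and strand complementation during inversion is ignored.

```python
def recombine_between_sites(segments, start, end, same_orientation):
    """Toy outcome of recombination between two installed recognition sites.

    `segments[start:end]` is the stretch lying between the two sites.
    Same orientation  -> the stretch is excised (deletion).
    Opposite orientation -> the stretch is reversed in place (inversion).
    A real inversion also reverse-complements the sequence; this toy model
    only reverses the order of the labeled segments.
    """
    if not 0 <= start <= end <= len(segments):
        raise ValueError("start and end must satisfy 0 <= start <= end <= len(segments)")
    flanked = segments[start:end]
    middle = [] if same_orientation else flanked[::-1]
    return segments[:start] + middle + segments[end:]


# Hypothetical segment labels for illustration only.
target = ["upstream", "exon_a", "exon_b", "downstream"]
print(recombine_between_sites(target, 1, 3, same_orientation=True))
print(recombine_between_sites(target, 1, 3, same_orientation=False))
```

In this model the only input that changes between the two outcomes is the orientation flag, which mirrors how the two systems above differ only in how the second recognition site is oriented relative to the first.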
- the present disclosure provides compositions comprising any of the Bxb1 recombinases provided herein.
- a composition comprises any of the Bxb1 recombinases provided herein, a prime editor, and one or more prime editing guide RNAs (pegRNAs) comprising a DNA synthesis template encoding a recombinase recognition site.
- a composition comprises a polynucleotide encoding any of the Bxb1 recombinases provided herein, a polynucleotide encoding a prime editor, and one or more prime editing guide RNAs (pegRNAs) or one or more polynucleotides encoding one or more
- compositions for inserting DNA into a target nucleic acid comprising: (i) a pegRNA or a first polynucleotide encoding the pegRNA, wherein the pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into a target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site.
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- the present disclosure provides compositions for exchanging DNA in a target nucleic acid comprising: (i) one or more pegRNAs or a first polynucleotide encoding one or more pegRNAs, wherein each of the one or more pegRNAs comprises a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth poly
- compositions for deleting DNA from a target nucleic acid comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- the present disclosure provides compositions for recombining target nucleic acids (e.g., in two chromosomes) comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for installation at a site on a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the same orientation as
- compositions for inverting a target nucleic acid comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation of the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- the present disclosure provides polynucleotides and vectors encoding any of the Bxb1 recombinases described herein.
- the present disclosure provides cells comprising any of the Bxb1 recombinases, polynucleotides, vectors, and/or compositions described herein.
- the present disclosure provides kits comprising any of the Bxb1 recombinases, polynucleotides, vectors, and/or compositions described herein.
- the present disclosure provides methods for modifying one or more target nucleic acids in a cell comprising contacting the one or more target nucleic acids with any of the recombinases provided herein.
- the present disclosure provides methods for modifying a target nucleic acid in a cell using prime editing and a recombinase comprising expressing in the cell a polynucleotide encoding any of the Bxb1 recombinases provided herein, a polynucleotide encoding a prime editor, and one or more polynucleotides encoding one or more prime editing guide RNAs (pegRNAs) comprising DNA synthesis templates encoding one or more recombinase recognition sites.
- the present disclosure provides methods for modifying a target nucleic acid (e.g., DNA, such as genomic DNA) in a cell using prime editing and a recombinase comprising expressing in the cell a polynucleotide encoding any of the Bxb1 recombinases
- the present disclosure provides methods for inserting DNA into a target nucleic acid in a cell using prime editing and a Bxb1 recombinase provided herein.
- the method comprises expressing in a cell: (i) a first polynucleotide encoding a pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs the first recombinase recognition site in the target nucleic acid, thereby facilitating Bx
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- the method is a method for exchanging DNA in a target nucleic acid in a cell using prime editing and a recombinase.
- the method comprises expressing in a cell: (i) a first polynucleotide encoding one or more pegRNAs comprising a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and
- a fourth polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA is flanked on both sides by a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs a first instance and a second instance of the first recombinase recognition site in the target nucleic acid, thereby facilitating Bxb1-mediated recombination between the first recombinase recognition sites in the target nucleic acid and the second recombinase recognition sites flanking the DNA, resulting in excision of the target nucleic acid sequence between the first instance and the second instance of the first recombinase recognition site and insertion of the DNA in its place.
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- the method is a method for deleting DNA from a target nucleic acid in a cell using prime editing and a recombinase.
- the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the B
- the method is a method for recombining target nucleic acids (e.g., in two chromosomes) in a cell using prime editing and a recombinase.
- the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a site on a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site on a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- the method is a method for inverting a target nucleic acid in a cell using prime editing and a recombinase.
- the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation as the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein;
- the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
- FIG.1 provides schematics for applications of using prime editors in tandem with large serine recombinase (Bxb1), e.g., for large DNA integration, cassette exchange, genomic deletion, gene inversion, and translocation.
- Evolved Bxb1 (eBxb1) enhances one-pot and sequential editing of large genomic DNA.
- the present disclosure describes the use of directed evolution to yield novel Bxb1 variants that overcome intrinsic enzymatic barriers and enhance gene editing efficiency. Combination of engineered prime editors and eBxb1 variants further enhances large DNA editing, expanding the number of pathogenic mutations that are targetable via ex vivo and in vivo therapeutics.
- FIGs.2A-2B provide schematics of PACE and PANCE circuits for evolving recombinases for integration (insertion of DNA from, for example, a donor plasmid or other nucleic acid molecule at a recombinase recognition site) (FIG.2A) and cassette exchange (exchange of DNA from, for example, a donor plasmid or other nucleic acid molecule with a DNA sequence between two recombinase recognition sites) (FIG.2B).
- the selection phage (SP) encodes the recombinase being evolved.
- plasmid P1 contains a
- FIG.2A Plasmid integration circuit. A single att site is present in the AP and P1 plasmids. After recombination, the AP and P1 combine to form a large AP+P1 plasmid.
- FIG.2B Cassette-exchange circuit. Two att sites, facing each other, are located in the AP and P1 plasmids. After recombination, sequences present between the att sites in each plasmid are recombined.
- FIG.3 provides a flow chart for design of the evolution trajectory of PACE and PANCE for evolving Bxb1 recombinase. Seven individual evolution campaigns were performed to generate a suite of Bxb1 variants.
- FIG.4 shows genotypes of evolved Bxb1 recombinase variants from PANCE version 1 (v1) evolution.
- FIG.5 shows characterization of PANCEv1 variants in AAVS1-attP and CCR5-attB HEK293T stable cells with either attP in the AAVS1 locus or attB in the CCR5 locus. Codon-optimized Bxb1 with PANCEv1 mutations was used, and percent knockin was assessed by ddPCR.
- FIG.6 shows characterization of PANCEv1 variants in a one-pot system (transfection of HEK293T cells with Bxb1, PE2, dual pegRNAs, and DNA donor at the same time). The prime editor first installs the att site into the genome. The recombinase then integrates the desired DNA cargo.
- FIG.7 shows variants from PANCEv1 that show improved integration efficiency over wild type enzyme.
- FIG.8 shows that variants from PANCEv2 improve 5.6 kb donor integration in a one-pot system (transfection of HEK293T cells with Bxb1, PE2, dual pegRNAs, and DNA donor). Twenty-two variants showed improved knockin efficiency over the codon-optimized wild type Bxb1 (“Bxb1-CO”) system. The highest fold-improvement observed was 1.6-fold. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used.
- FIG.9 shows variants from PANCEv2 that show improved integration efficiency in mammalian cells over wild type enzyme.
- FIG.10 shows that PACEv1 yields variants that improve gene integration in the one-pot system in HEK293T cells. Fourteen out of twenty PACEv1 variants showed improved knockin efficiency over the codon-optimized wild type Bxb1 (“coBxb1”) system. The highest fold-improvement observed was two-fold. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used.
- FIG.11 shows variants from PACEv1 that showed improved DNA integration efficiency in mammalian cells over wild type enzyme.
- FIG.12 shows a first test of DNA integration efficiency with new variants from PANCEv3, PANCEv4, PACEv3, and PANCEv5 in stable HEK293T cells with either attP in the AAVS1 locus or attB in the CCR5 locus. 100 ng large serine recombinase (LSR; Bxb1) and 150 ng 5.6 kb DNA donor were used. Data was generated using ddPCR.
- Bxb1 variants showed improved activity compared to codon-optimized wild type Bxb1 (“Bxb1-CO”).
- B-30 WT Bxb1: wild type Bxb1.
- Bxb1-CO: codon-optimized wild type Bxb1.
- FIG.13 shows a second test of DNA integration efficiency with new variants from PANCEv3, PANCEv4, PACEv3, and PANCEv5 in HEK293T stable cells with attP in the AAVS1 locus. 100 ng LSR and 150 ng 5.6 kb DNA donor were used. Data was generated using ddPCR.
- Bxb1-CO: codon-optimized wild type Bxb1.
- B100.74 (V74M, M342V), B100.50 (V5I, V74M, M239L, T453N, G468D), B100.51 (V5I, S86N, S157G, K214R, E273D, E361G), B100.32 (D14N), B100.36 (V105I), B.55 (Y78N), B100.38 (I78V, E361D), B100.41 (A49T, S86T, T116P), B.63 (L29F), B.59 (D51Y), B100.65 (V5I, D14N, R207Q), B100.75 (V74A), B100.72 (V50I, I87V, R208S, V375I), B100.76 (D51E, V375I), B100.73 (D51E, V375I).
- FIG.14 shows improved variants from PANCEv3, PANCEv4, PACEv3, and PANCEv5 in HEK293T stable cells. Thirty-six out of eighty-one variants tested from five different evolutions were > 1.1x better on average than wild type Bxb1 in stable cell lines with either attP in the AAVS1 locus or attB in the CCR5 locus. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used.
- B-30 WT Bxb1: wild type Bxb1.
- Bxb1-CO: codon-optimized wild type Bxb1.
- B100.16 (D14N, E267D), B100.43 (E20Q, E361D), B100.28 (E20Q), B100.49 (V5I, S86N, H321N), B100.50 (V5I, V74M, M239L, T453N, G468D), B100.17 (D14N, E273D), B100.30, B100.32 (D14N), B100.23 (D14N, E483K), B100.76 (D51E, V375I), B100.41 (A49T, S86T, T116P), B100.36 (V105I), B100.65 (V5I, D14N, R207Q), B100.73 (D51E, V375I), B100.38 (I78V, E361D), B100.18 (L29F), B100.72 (V50I, I87V, R208S, V375I), B100.75 (V74A).
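The variant genotypes listed above use the standard substitution shorthand (wild-type residue, 1-based position, replacement residue). As a minimal sketch of how such labels could be parsed and applied, the following Python helper is illustrative only: the names `Substitution`, `parse_variant`, and `apply_substitutions` are hypothetical, and the toy sequence in the usage note is not the Bxb1 sequence.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str   # one-letter code of the original residue
    position: int    # 1-based residue position
    mutant: str      # one-letter code of the replacement residue


# Matches labels such as "V74A" or "G468D".
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a single label such as 'V74A'."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return Substitution(wild_type, int(position), mutant)


def parse_variant(mutations: str) -> list[Substitution]:
    """Parse a comma-separated genotype such as 'V74M, M342V'."""
    return [parse_substitution(part) for part in mutations.split(",") if part.strip()]


def apply_substitutions(sequence: str, substitutions: list[Substitution]) -> str:
    """Apply substitutions to a protein sequence, checking each wild-type residue."""
    residues = list(sequence)
    for sub in substitutions:
        index = sub.position - 1
        if residues[index] != sub.wild_type:
            raise ValueError(f"expected {sub.wild_type} at position {sub.position}")
        residues[index] = sub.mutant
    return "".join(residues)
```

For example, `parse_variant("V74M, M342V")` yields two `Substitution` records, and `apply_substitutions("MAVK", parse_variant("V3I"))` returns `"MAIK"` for the hypothetical four-residue sequence.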
- FIG.15 shows variants from PANCEv3, PANCEv4, PACEv3, and PANCEv5 that show improved integration efficiency over wild type enzyme in mammalian cells.
- FIG.16 shows that combined single and double mutations in Bxb1 improve DNA integration efficiency. 55 ng LSR and 55 ng 5.6 kb DNA donor were used. A stable cell line with attP in the AAVS1 locus was used. A low dose of recombinase and DNA donor was used to dissect differences between variants. Three double mutants showed improvements over single point mutant variants. Data was generated using ddPCR.
- FIG.17 shows DNA integration efficiency of additional Bxb1 double mutants. Combining mutations from PACE improves large-gene integration. Additional double-mutant variants were assessed in stable cell lines with either attP in the AAVS1 locus or attB in the CCR5 locus. Eight different double-mutant variants showed improvements over single-mutant variants. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used.
- FIG.18 shows DNA integration efficiency of further Bxb1 double mutants. Additional double-mutant variants were assessed in stable cell lines with either attP in the AAVS1 locus or attB in the CCR5 locus. Five different double-mutant variants showed improvements over single-mutant variants. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used.
- FIG.19 shows rationally designed Bxb1 double mutants that show improved large-gene integration efficiency in mammalian cells.
- FIG.20 shows further one-pot delivery optimization of plasmid doses. Optimization of plasmid dose was shown to be important for improving one-pot knockin efficiencies. Integration efficiencies were quantified using unique molecular identifier (UMI) barcoding.
- FIG.21 shows additional optimization of plasmid doses for one-pot delivery. Integration efficiencies were quantified using ddPCR. Optimal plasmid condition is highlighted by the black box. The plasmid dose that resulted in the highest integration efficiency was 100 ng of PE2- encoding plasmid, 10 ng each of pegRNA-encoding plasmid and Bxb1 recombinase-encoding plasmid, and 150 ng of DNA donor plasmid.
- FIG.22 shows a schematic for further one-pot delivery optimization of trimmed dual pegRNAs to promote donor DNA integration into the genome and avoid donor/pegRNA plasmid recombination.
- FIG.23 shows data for further one-pot delivery optimization of trimmed dual pegRNAs. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used. The B200.1 variant showed the best donor integration efficiency (up to 12.5% efficiency with a trimmed pegRNA pair that contains a 28 bp overlap length).
- FIG.24 shows further optimization of one-pot delivery through the use of evolved reverse transcriptase variants. Variants of the reverse transcriptases Ec48 and Tf1 showed the highest integration efficiency. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used.
- FIG.25 shows further optimization of one-pot delivery through the use of evolved Cas9 variants to improve integration efficiency.
- Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used.
- FIGs.26A-26D show phage-assisted evolution of the Bxb1 recombinase for PASSIGE.
- FIG.26A shows an overview of PASSIGE.
- Prime editing (dual-flap or single-flap) precisely installs a large serine recombinase attachment site (attB or attP) into a targeted locus in the genome.
- A recombinase then recognizes the installed att motif and integrates donor DNA into this site.
- FIG.26B shows an overview of phage-assisted continuous evolution (PACE).
- The selection phage (SP) encodes the protein being evolved.
- Host E. coli cells encode a mutagenesis plasmid (MP), as well as plasmids that link the activity of the evolving protein to expression of gIII, an essential phage gene. Only phage that encode active variants trigger gIII expression and propagate. A constant dilution of host cells and media washes out inactive phage variants that are unable to propagate faster than the dilution rate.
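The washout behavior described above can be illustrated with a toy model. The sketch below assumes simple exponential kinetics, in which the phage titer changes at a net rate equal to the propagation rate minus the lagoon dilution rate (both per hour). This model and the function name `phage_titer` are illustrative assumptions, not part of the disclosure.

```python
import math


def phage_titer(initial_titer: float,
                propagation_rate: float,
                dilution_rate: float,
                hours: float) -> float:
    """Toy exponential model of phage titer in a continuously diluted lagoon.

    Assumes dN/dt = (propagation_rate - dilution_rate) * N, with both rates
    in units of 1/hour. Phage persist only if they propagate faster than
    they are diluted.
    """
    net_rate = propagation_rate - dilution_rate
    return initial_titer * math.exp(net_rate * hours)
```

Under this assumption, a variant with `propagation_rate` below `dilution_rate` decays toward zero, while a variant above it grows, which is the qualitative selection pressure the figure description refers to.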
- FIG.26C shows a schematic of the recombinase-PACE selection circuit. Bxb1 recombinase is encoded on the SP. Host cells harbor plasmid P1 that encodes promoter Pro1, and plasmid P2 that encodes a promoter-less gIII
- Bxb1-mediated recombination places Pro1 upstream of the gIII cassette, driving its expression.
- In circuit 1, two attachment sites are present in each plasmid, resulting in two recombination events that exchange sequences between P1 and P2.
- In circuit 2, one attachment site is present in each plasmid, resulting in one recombination event that integrates P1 and P2.
- FIG.26D shows PANCE phage titer for the evolution of Bxb1 recombinase across six circuits (1.1-1.4 and 2.1-2.2). Each trace reflects the mean value of phage titers across four different lagoons. Individual traces for each lagoon are shown in FIG.31.
- FIGs.27A-27E show characterization of evolved Bxb1 variants in mammalian cells.
- FIG.27B shows a heatmap of fold-change in integration efficiency compared to wild-type (WT) Bxb1 for evolved variants.
- FIG.27D shows an AlphaFold2-predicted structure of the Bxb1 recombinase.
- FIG.27E shows the predicted positions of mutated residues that gave the highest integration efficiencies. Residues are mapped onto the AlphaFold2 predicted structure of the NTD of Bxb1. Integration efficiency was assessed by ddPCR analysis as described in Example 2.
- FIGs.28A-28F show a characterization of evolved Bxb1 variants for PASSIGE.
- FIG. 28A shows absolute integration efficiencies for ten evolved Bxb1 variants with the highest activity from FIGs.27B-27C, and wild-type (WT) Bxb1 in the PASSIGE system.
- Dual pegRNAs were used to install attP into AAVS1 or attB into CCR5 in HEK293T cells.
- FIG.28B shows absolute integration efficiencies for PASSIGE (WT Bxb1), evoPASSIGE (Bxb1-V74A), and eePASSIGE (Bxb1-V74A+E229K+V375I). Dual pegRNAs were used to install attP into
- FIG.28C shows a percentage of mCherry positive cells 14 days after transfecting a 3.2-kB donor DNA plasmid along with either dead Bxb1, wild-type (WT) Bxb1, evoBxb1, or eeBxb1.
- The donor plasmid either has an attP or attB site and encodes mCherry under the CMV promoter.
- Statistical significance was calculated using Student’s unpaired two-tailed t-test, ***P < 0.001.
- FIG.28D shows recommended configurations for PASSIGE using evoBxb1 and eeBxb1.
- When installing attP into the genome, eePASSIGE may be used due to its high efficiency and undetected off-target integration.
- When installing attB into the genome, evoPASSIGE may be used due to off-target integration observed when using eePASSIGE in this orientation.
- FIGs.28E-28F show representative flow cytometry plots used to assess off-target integration in FIG.28C.
- FIG.28E shows an untreated sample.
- FIG.28F shows the histogram used to assess mCherry+ cells when transfecting cells with either the dead Bxb1 negative control or eeBxb1.
- V1L represents mCherry-negative cells.
- V1R represents mCherry-positive cells.
- FIGs.29A-29F show characterization of PASSIGE, evoPASSIGE, eePASSIGE, and PASTE.
- FIG.29A shows a comparison of targeted 5.6-kB donor DNA plasmid integration efficiencies when installing either attP or attB into the genome using PASSIGE, evoPASSIGE, and eePASSIGE at the AAVS1, CCR5, ACTB, and Rosa26 loci in mammalian cells.
- FIG.29B shows a comparison of PASTE, PASSIGE, evoPASSIGE, and eePASSIGE. Dual pegRNAs were used to install attP into AAVS1 and ACTB, and attB into CCR5 and Rosa26.
- FIG.29C shows absolute integration efficiencies for PASSIGE, evoPASSIGE, eePASSIGE, and PASTE at eight different therapeutically relevant genomic sites.
- FIG.29D shows fold-change in integration efficiencies relative to PASSIGE for evoPASSIGE and eePASSIGE across all sites tested in Example 2.
- FIG.29E shows fold-change in integration efficiencies relative to PASTE for PASSIGE, evoPASSIGE, and eePASSIGE across all sites tested in Example 2.
- FIG.29F shows absolute integration efficiencies at 12 sites when using integration strategies with undetectable off-target. Either attP was installed into the genome and eePASSIGE was used, or attB was
- FIGs.30A-30D show integration of therapeutic DNA cargo using PASSIGE variants.
- FIG. 30B shows F9 protein measurement via ELISA assay.
- For the PASSIGE and PASTE experiments in FIGs.30A-30B, all components were delivered using a single transfection. For the negative control, all components except the prime editor protein were transfected into cells, and the eeBxb1 recombinase was used.
- FIG.30C shows the ddPCR plots used to assess integration efficiency at the FANCA locus using PASSIGE (Data shown in FIG.30A).
- FIG.30D shows ddPCR plots for the FAM channel and corresponding % integration values obtained when using the genome-donor junction-binding probes used in Example 2.
- FIG.31 shows individual PANCE experiments for Bxb1 evolutions in FIG.26D.
- In circuits 1.1-1.4, two recombinase attachment sites are present in both plasmids, P1 and P2.
- Circuits 2.1 and 2.2 use one recombinase attachment site per plasmid. Circuits 1.1, 1.2, 2.1, and 2.2 have attB in P1 and attP in P2.
- Circuits 1.3 and 1.4 have attP in P1 and attB in P2.
- Circuits 1.2, 1.4, and 2.2 have a GA central dinucleotide in the attachment sites instead of GT, which is present in circuits 1.1, 1.3, and 2.1.
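The six selection circuits described for FIG.31 can be tabulated as follows. This is a sketch that restates the attachment-site layout given in the text; the `Circuit` class, its field names, and the `outcome` helper are hypothetical, and the mapping of two-site circuits to cassette exchange and one-site circuits to integration follows the circuit 1 and circuit 2 descriptions for FIG.26C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circuit:
    name: str                  # circuit label used in the figure descriptions
    sites_per_plasmid: int     # attachment sites in each of P1 and P2
    p1_site: str               # attachment site type carried on plasmid P1
    p2_site: str               # attachment site type carried on plasmid P2
    central_dinucleotide: str  # central dinucleotide of the attachment sites


CIRCUITS = [
    Circuit("1.1", 2, "attB", "attP", "GT"),
    Circuit("1.2", 2, "attB", "attP", "GA"),
    Circuit("1.3", 2, "attP", "attB", "GT"),
    Circuit("1.4", 2, "attP", "attB", "GA"),
    Circuit("2.1", 1, "attB", "attP", "GT"),
    Circuit("2.2", 1, "attB", "attP", "GA"),
]


def outcome(circuit: Circuit) -> str:
    """Two sites per plasmid give cassette exchange; one site gives integration."""
    return "cassette exchange" if circuit.sites_per_plasmid == 2 else "integration"


if __name__ == "__main__":
    for circuit in CIRCUITS:
        print(circuit.name, circuit.p1_site, circuit.p2_site,
              circuit.central_dinucleotide, outcome(circuit))
```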
- PANCE traces for each lagoon (L1-L4) are shown. Selection stringency was modulated by decreasing the selection time and increasing dilution factor. Unless
- FIGs.32A-32C show PANCE and PACE experiments for Bxb1 evolution in FIG.27A.
- FIG.32A shows PANCE traces in circuit 1.3. Traces for individual lagoons (L1-L8) are shown. Selection stringency was modulated by decreasing the strength of the ribosome binding site (RBS) from sd8 to sd5, decreasing selection time, and increasing dilution factor.
- FIG.32B shows PACE traces across four lagoons (L1-L4) using circuit 1.3.
- Phage pools obtained from PANCE in were used to inoculate all lagoons. Selection stringency was modulated by increasing flow rate from 0.5 vol/hr to 3.0 vol/hr.
- FIG.32C shows PANCE traces for evolution where size of P1 was increased from 3.2-kB to 6.5-kB.
- Phage pools obtained from PACE (FIG.32B) were used to inoculate ten individual lagoons (L1-L10). Selection stringency was modulated by increasing dilution factor. For all PANCE experiments, unless otherwise indicated, each passage was performed overnight, and phage were diluted 1:50 after each passage.
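To make the 1:50 passaging arithmetic concrete, the sketch below computes the cumulative dilution over several passages and the titer expected if phage amplify by a fixed fold per passage. The fixed-fold amplification is an illustrative assumption, and the function names are hypothetical.

```python
def cumulative_dilution(dilution_factor: int, passages: int) -> int:
    """Total dilution after repeated passaging, e.g. 1:50 per passage."""
    return dilution_factor ** passages


def titer_after_passages(start_titer: float,
                         fold_amplification: float,
                         dilution_factor: float,
                         passages: int) -> float:
    """Titer after repeated rounds of growth and dilution.

    Assumes the same fold amplification in every passage; phage are
    maintained only if amplification keeps pace with the dilution factor.
    """
    return start_titer * (fold_amplification / dilution_factor) ** passages
```

With a 1:50 dilution, three passages correspond to a cumulative 125,000-fold dilution, so under this assumption a phage pool must amplify at least 50-fold per passage to avoid being lost.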
- FIGs.33A-33D show mapping evolved mutations onto the predicted structure of Bxb1.
- FIG.33A shows AlphaFold2 predicted structures of the NTD, CTD-a, and CTD-b of Bxb1. Each domain aligns well with solved structures of serine recombinases (PDB: 1ZR4, 6DNW, and 4KIS).
- FIG.33B shows positions of beneficial evolved mutations in the AlphaFold2 predicted structure of the NTD of Bxb1.
- The DNA substrate from the gammadelta resolvase tetramer (PDB: 1ZR4) was docked onto the predicted structure.
- FIG.33C shows positions of beneficial mutations on the surface of the AlphaFold2 predicted structure of the NTD of Bxb1.
- FIG.33D shows predicted positions of the four mutated residues in the core of the NTD that resulted in the highest integration efficiencies. Positions were predicted using AlphaFold2. The remaining three unmutated residues in each case are in dark grey.
- FIGs.34A-34B show optimization of the PASSIGE system.
- FIG.34A shows a schematic of trimmed pegRNA optimization for PASSIGE. When the overlap length between the two newly synthesized 3' flaps is decreased, unwanted plasmid recombination is reduced as each pegRNA plasmid encodes a trimmed attachment site sequence.
- FIG.34B shows editing efficiencies when using trimmed pegRNAs to install either attP or attB into the AAVS1 and CCR5 loci. Overlap lengths from 50 bp to 8 bp and 38 bp to 12 bp were tested to install attP and
- FIGs.35A-35C show characterization of evolved and engineered Bxb1 variants.
- FIG. 35A shows a heat map of fold-change in integration efficiencies compared to wild-type (WT) Bxb1 for evolved and engineered (ee) Bxb1 variants that were generated by combining one mutation from each domain of Bxb1.
- FIG.35B shows a percentage of mCherry-positive cells 14 days after transfecting a 3.2-kB donor DNA plasmid along with either dead Bxb1, evoBxb1, Bxb1 (V74A+V375I), Bxb1 (V74A+E229K), or eeBxb1.
- The donor plasmid either has an attP or attB site and encodes mCherry under the CMV promoter. Statistical significance was calculated using Student’s unpaired two-tailed t-test, ***P < 0.001, ****P < 0.0001. Bars reflect the mean of three independent replicates and dots show the values of individual replicates.
- FIG.35C shows the predicted position of the E229K mutation that resulted in off-target integration when delivering an attP-containing donor.
- The DNA substrate from Listeria innocua prophage serine recombinase (PDB: 6DNW) was docked onto the AlphaFold2 predicted structure of the CTD-a domain of Bxb1.
- FIG.36 shows PE6 variants to enhance Bxb1 attachment site installation for sites in FIG.29A. Attachment site installation efficiencies with prime editors PEmax, PE6b, PE6c, and PE6d at the AAVS1, CCR5, and ACTB loci in HEK293T cells, and at the Rosa26 locus in N2a cells. Prime editing and indel efficiencies were assessed by Illumina MiSeq analysis. Bars reflect the mean of three independent replicates. Dots show the values of individual replicates.
- FIGs.37A-37F show a comparison of PASTE with PASSIGE.
- FIG.37A shows a comparison of the optimized pegRNA scaffold (atgRNAv2) used in PASTE with the original pegRNA scaffold used in PASSIGE.
- FIG.37B shows a comparison of the XTEN-48 linker between the Cas9 and M-MLV reverse transcriptase domain of the prime editor used in PASTE with the SGGSx2-bpNLS SV40 -SGGSx2 linker (SEQ ID NO: 90) used in PASSIGE.
- FIG.37C shows a comparison of the mutated M-MLV RT with the L139P mutation used in PASTE with the M-MLV RT used in PASSIGE.
- FIG.37D shows a comparison of fusion of Bxb1 to the
- FIG.37E shows a comparison of the mutated attP sequence used in PASTE with the original attP sequence used in PASSIGE.
- FIG.37F shows a comparison of PASSIGE architecture with the PASTE architecture using wild-type Bxb1, evoBxb1, and eeBxb1 recombinases.
- Bxb1 variants and the PEmax prime editor are unfused.
- Bxb1 variants are fused to the prime editor used in PASTE using the same linker specified in (FIG.37D).
- FIG.38 shows PE6 variants for attachment site installation for therapeutic loci in FIG. 29C.
- FIGs.39A-39B show performance of ddPCR probes at the LMNB1, NOLC1, and ACTB sites.
- FIG.39A shows integration efficiencies for PASSIGE, evoPASSIGE, eePASSIGE, and PASTE at the top three most common sites used to characterize PASTE in the Yarnall et al.
- FIG.39B shows ddPCR plots for PASTE, no PEmax (+eeBxb1) control, and dead Bxb1 control when using different ddPCR probes.
- The bold line shows the threshold that was set to assess integration efficiencies in FIG.39A. Details of how thresholds are calculated are provided in Example 2.
- Probes used in the original PASTE paper are compared side-by-side with probes used in Example 2.
- Probes bind to either the DNA donor plasmid (LMNB1 and NOLC1) or the Bxb1 attB sequence, which is also present in the pegRNA plasmid (ACTB).
- FIGs.40A-40B show sequencing of individual phages after PANCE 1 in circuits 1.1-1.4 and 2.1-2.2.
- FIG.41 shows sequencing of individual phages after PANCE 2 in circuit 1.3 with 3.2 kB P1 plasmid.
- FIG.42 shows sequencing of individual phages after PACE in circuit 1.3 with 3.2 kB P1 plasmid.
- FIG.43 shows sequencing of individual phages after PANCE in circuit 1.3 with 6.5 kB P1 plasmid.
- FIG.44 shows that eeBxb1 improves integration efficiency of donor DNA in human primary fibroblast cells relative to wild type Bxb1. Delivery of RNA encoding eeBxb1 and the prime editor further increased integration efficiency relative to delivery of DNA encoding the same.
- DEFINITIONS
- Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
- The term “Cas9” or “Cas9 nuclease” refers to an RNA-guided nuclease comprising a Cas9 domain, or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9).
- A “Cas9 domain,” as used herein, is a protein fragment comprising an active or fully or partly inactive cleavage domain of Cas9 and/or the gRNA binding domain of Cas9.
- A “Cas9 protein” is a full length Cas9 protein.
- A Cas9 nuclease is also referred to sometimes as a casn1 nuclease or a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat)-associated nuclease.
- CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids).
- CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids.
- CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA).
- In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 domain. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA.
- Cas9/crRNA/tracrRNA endonucleolytically cleaves a linear or circular dsDNA target complementary to the spacer.
- The strand in the target DNA not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically.
- DNA-binding and cleavage typically requires protein and both RNAs.
- Single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821(2012), the contents of which are incorporated herein by reference.
- Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self.
- Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti et al., J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc.
- Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.
- A Cas9 nuclease comprises one or more mutations that partially impair or inactivate the DNA cleavage domain.
- A nuclease-inactivated Cas9 domain may interchangeably be referred to as a “dCas9” protein (for nuclease-“dead” Cas9).
- Methods for generating a Cas9 domain (or a fragment thereof) having an inactive DNA cleavage domain are known (see, e.g., Jinek et al., Science. 337:816-821(2012); Qi et al., “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression” (2013) Cell.28;152(5):1173-83, the entire contents of each of which are incorporated herein by reference).
- The DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain.
- The HNH subdomain cleaves the strand complementary to the gRNA.
- The RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9.
- The mutations D10A and H840A completely inactivate the nuclease activity of S. pyogenes Cas9 (Jinek et al., Science. 337:816-821(2012); Qi et al., Cell.28;152(5):1173-83 (2013)).
- proteins comprising fragments of a Cas9 protein are provided.
- a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9.
- proteins comprising Cas9, or fragments thereof are referred to as “Cas9 variants.”
- a Cas9 variant shares homology to Cas9, or a fragment thereof.
- a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, at least about 99.8% identical, or at least about 99.9% identical to wild
- The Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 6).
- the Cas9 variant comprises a fragment of SEQ ID NO: 6 Cas9 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 6).
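The percent-identity thresholds above presuppose a way of scoring two aligned sequences. The sketch below shows one common convention, assumed here for illustration: identity is the number of matching columns divided by the number of aligned columns, skipping columns where both sequences have a gap. The disclosure does not specify this convention, the function name is hypothetical, and the sequences in the usage note are toy strings rather than Cas9 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gaps are written as '-'. Columns where both sequences have a gap are
    ignored; a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

For example, `percent_identity("MDKK", "MDKR")` gives 75.0, which would fall below an "at least about 80% identical" threshold under this convention.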
- The fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 6).
- CRISPR is a family of DNA sequences (i.e., CRISPR clusters) in bacteria and archaea that represent snippets of prior infections by viruses that have invaded the prokaryote.
- the snippets of DNA are used by the prokaryotic cell to detect and destroy DNA from subsequent attacks by similar viruses and effectively compose, along with an array of CRISPR-associated proteins (including Cas9 and homologs thereof) and CRISPR-associated RNA, a prokaryotic immune defense system.
- CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA).
- In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA.
- Cas9/crRNA/tracrRNA endonucleolytically cleaves a linear or circular dsDNA target complementary to the RNA. Specifically, the DNA strand in the target that is not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically.
- DNA-binding and cleavage typically requires protein and both RNAs.
- Single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species – the
- A “CRISPR system” refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer”), or other sequences and transcripts from a CRISPR locus.
- the tracrRNA of the system is complementary (fully or partially) to the tracr mate sequence present on the guide RNA.
- The terms “DNA synthesis template” and “reverse transcription template (RTT)” refer to the region or portion of the extension arm of a PEgRNA that is utilized as a template by a polymerase of a prime editor to encode a 3′ single-strand DNA flap that contains the desired edit and which then, through the mechanism of prime editing, replaces the corresponding endogenous strand of DNA at the target site.
- The extension arm, including the DNA synthesis template, may be comprised of DNA or RNA.
- the polymerase of the prime editor can be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase).
- the polymerase of the prime editor can be a DNA-dependent DNA polymerase.
- the DNA synthesis template may comprise the “edit template” and the “homology arm”, and all or a portion of an optional 5′ end modifier region and/or an optional 3′ end modifier region.
- The DNA synthesis template can include the portion of the extension arm that spans from the 5′ end of the primer binding site (PBS) to the 3′ end of the gRNA core that may operate as a template for the synthesis of a single-strand of DNA by a polymerase (e.g., a reverse transcriptase).
- The DNA synthesis template can include the portion of the extension arm that spans from the 5′ end of the PEgRNA molecule to the 5′ end of the PBS.
- Certain embodiments described here refer to a “DNA synthesis template,” an “RT template,” or an “RTT,” which is also inclusive of the edit template and the homology arm, but wherein the RT edit template reflects the use of a prime editor having a polymerase that is a reverse transcriptase, and wherein the DNA synthesis template reflects more broadly the use of a prime editor having any polymerase.
- an RT template may be used to refer to a template polynucleotide for reverse transcription, e.g., in a prime editing system, complex, or method using a prime editor having a polymerase that is a reverse transcriptase.
- a DNA synthesis template may be used to refer to a template polynucleotide for DNA polymerization, e.g., RNA-dependent DNA polymerization or DNA-dependent polymerization, e.g., in a prime editing system, complex, or method using a prime editor having a polymerase that is an RNA-dependent DNA polymerase or a DNA-dependent DNA polymerase.
- The term “edit template” refers to a portion of the extension arm that encodes the desired edit in the single strand 3′ DNA flap that is synthesized by the polymerase, e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase (e.g., a reverse transcriptase).
- The term “DNA synthesis template” refers to the region or portion of the extension arm of a pegRNA that is utilized as a template strand by a polymerase of a prime editor to encode a 3′ single-strand DNA flap that contains the desired edit and which then, through the mechanism of prime editing, replaces the corresponding endogenous strand of DNA at the target site.
- the extension arm including the DNA synthesis template, may be comprised of DNA or RNA.
- the polymerase of the prime editor can be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase).
- the polymerase of the prime editor can be a DNA-dependent DNA polymerase.
- The DNA synthesis template comprises the “edit template” and a “homology arm.”
- the DNA synthesis template may comprise the “edit template” and a “homology arm”, and all or a portion of the optional 5′ end modifier region, e2. That is, depending on the nature of the e2 region (e.g., whether it includes a hairpin, toeloop, or stem/loop secondary structure), the polymerase may encode none, some, or all of the e2 region, as well.
- the DNA synthesis template can include the portion of the extension arm that spans from the 5′ end of the primer binding site (PBS) to the 3′ end of the gRNA core that may operate as a template for the synthesis of a single strand of DNA by a polymerase (e.g., a reverse transcriptase).
- the DNA synthesis template can include the portion of the extension arm that spans from the 5′ end of the pegRNA molecule to the 3′ end of the edit template.
- the DNA synthesis template excludes the primer binding site (PBS) of pegRNAs either having a 3′ extension arm or a 5′ extension arm.
- an RT template which is inclusive of the edit template and the homology arm, i.e., the sequence of the pegRNA extension arm which is actually used as a template during DNA synthesis.
- the term “RT template” is equivalent to the term “DNA synthesis template.”
- the DNA synthesis template is a single-stranded portion of the PEgRNA that is 5′ of the PBS and comprises a region of complementarity to the PAM strand (i.e., the non-target strand or the edit strand), and comprises one or more nucleotide edits compared to the endogenous sequence of the double stranded target DNA.
- the DNA synthesis template is complementary or substantially complementary to a sequence on the non-target strand that is downstream of a nick site, except for one or more non-complementary nucleotides at the intended nucleotide edit positions.
- the DNA synthesis template is complementary or substantially complementary to a sequence on the non-target strand that is immediately downstream (i.e., directly downstream) of a nick site, except for one or more non-complementary nucleotides at the intended nucleotide edit positions. In some embodiments, one or more of the non-complementary nucleotides at the intended nucleotide edit positions are immediately downstream of a nick site. In some embodiments, the DNA synthesis template comprises one or more nucleotide edits relative to the double-stranded target DNA sequence. In some embodiments, the DNA synthesis template comprises one or more nucleotide edits relative to the non-target strand of the double-stranded target DNA sequence.
- a nick site is characteristic of the particular napDNAbp with which the gRNA core of the PEgRNA associates, and is characteristic of the particular PAM required for recognition and function of the napDNAbp.
- the nick site is in the phosphodiester bond between bases three (“-3” position relative to position 1 of the PAM sequence) and four (“-4” position relative to position 1 of the PAM sequence).
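The nick position described above can be sketched in a few lines. This is a minimal illustration, not part of the disclosure: it assumes the PAM sits immediately 3′ of the protospacer on the PAM strand, and the function name `nick_index` and the example sequences are hypothetical.

```python
def nick_index(pam_strand: str, protospacer: str, pam_len: int = 3) -> int:
    """Return i such that the nick lies between pam_strand[i - 1] and pam_strand[i].

    pam_strand is the non-target (edit) strand written 5'->3'; the PAM is
    assumed to sit immediately 3' of the protospacer on that strand.
    """
    start = pam_strand.find(protospacer)
    if start < 0:
        raise ValueError("protospacer not found on the PAM strand")
    pam_start = start + len(protospacer)  # index of PAM position 1
    if pam_start + pam_len > len(pam_strand):
        raise ValueError("sequence ends before a full PAM")
    # Position -1 is pam_start - 1, so position -3 is pam_start - 3.
    # The nicked bond joins positions -4 and -3.
    return pam_start - 3


protospacer = "GACCATGATTACGCCAAGCT"  # hypothetical 20-nt protospacer
pam_strand = "AAAA" + protospacer + "TGG" + "CCCC"  # hypothetical context
i = nick_index(pam_strand, protospacer)
primer_side = pam_strand[:i]  # ends at position -4, carrying the free 3' end
flap_side = pam_strand[i:]    # begins at position -3, downstream of the nick
```

The sketch does not validate the PAM itself; a different napDNAbp would need a different offset and PAM length.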
- the DNA synthesis template and the primer binding site are immediately adjacent to each other.
- nucleotide edit refers to a specific nucleotide edit, e.g., a specific deletion of one or more nucleotides, a specific insertion of one or more nucleotides, a specific substitution(s) of one or more nucleotides, or a combination thereof, at a specific position in a DNA synthesis template of a PEgRNA to be incorporated in a target DNA sequence.
- the DNA synthesis template comprises more than one nucleotide edit relative to the double-stranded target DNA sequence.
- each nucleotide edit is a specific nucleotide edit at a specific position in the DNA synthesis template, and each nucleotide edit is at a different specific position relative to any of the other nucleotide edits.
- each nucleotide edit is independently selected from a specific deletion of one or more nucleotides, a specific insertion of one or more nucleotides, a specific substitution(s) of one or more nucleotides, or a combination thereof.
- a nucleotide edit may refer to the edit on the DNA synthesis template as compared to the sequence on the target strand of the double stranded target DNA, or may refer to the edit encoded by the DNA synthesis template on the newly synthesized single stranded DNA that replaces the endogenous target DNA sequence on the non-target strand.
- Edit strand and non-edit strand are terms that may be used when describing the mechanism of action of a prime editing system on a double-stranded DNA substrate.
- the “edit strand” refers to the strand of DNA which is nicked by the prime editor complex to form a 3′ end, which is then extended as a newly synthesized single stranded DNA (also referred to herein as the newly synthesized 3′ DNA flap), which comprises a desired edit and ultimately displaces and replaces the single strand region of DNA just downstream of the nick, thereby installing the 3′ DNA flap containing the desired edit downstream of the nick on the “edit strand.”
- the newly synthesized 3′ DNA flap comprising the nucleotide edit is paired in a heteroduplex with the non-edit strand that does not comprise the nucleotide edit, thereby creating a mismatch.
- the mismatch is recognized by DNA repair machinery, and/or replication machinery, e.g., an endogenous DNA repair machinery.
- the intended nucleotide edit is incorporated into both strands of the target double-stranded DNA substrate.
- the application may also refer to the “edit strand” as the “protospacer strand” or the “PAM strand” since these elements are present in that strand.
- the “edit strand” may also be called the “non-target strand” since the edit strand is not the strand that becomes annealed to the spacer of the PEgRNA molecule, but rather is the complement of the strand that is annealed by the spacer of the PEgRNA.
- the “non-edit strand” is not directly edited by the PE system. Rather, the desired edit created by the PE system in the 3′ DNA flap is incorporated into the “non-edit strand” through DNA replication and/or repair.
- the “non-edit strand” is the strand that anneals to the spacer of the PEgRNA, and thus is also called the “target strand.”
- extension arm refers to a nucleotide sequence component of a PEgRNA which comprises a primer binding site (PBS) and a DNA synthesis template for a polymerase (e.g., an RT template for reverse transcriptase).
- the extension arm is located at the 3′ end of the guide RNA.
- the extension arm is located at the 5′ end of the guide RNA.
- the extension arm comprises a DNA synthesis template and a primer binding site.
- the extension arm comprises the following components in a 5′ to 3′ direction: the DNA synthesis template, and the primer binding site.
- the extension arm also includes a homology arm.
- the extension arm comprises the following components in a 5′ to 3′ direction: the homology arm, the edit template, and the primer binding site. Since polymerization activity of the reverse transcriptase is in the 5′ to 3′ direction, the preferred arrangement of the homology arm, edit template, and primer binding site is in the 5′ to 3′ direction such that the reverse transcriptase, once primed by an annealed primer sequence, polymerizes a single strand of DNA using the edit template as a complementary template strand.
- the extension arm may be described as comprising generally two regions: a primer binding site (PBS) and a DNA synthesis template, for instance.
- the primer binding site binds to a primer sequence, for example, a single stranded primer sequence containing a free 3′ end at the nick site that is formed from the endogenous DNA strand of the target site when it becomes nicked by the prime editor complex, thereby exposing a 3′ end on the endogenous nicked strand.
- the binding of the primer sequence to the primer binding site on the extension arm of the PEgRNA creates a duplex region with an exposed 3′ end (i.e., the 3′ end of the primer sequence), which then provides a substrate for a polymerase to begin polymerizing a single strand of DNA from the exposed 3′ end along the length of the DNA synthesis template.
- the sequence of the single strand DNA product is the complement of the DNA synthesis template. Polymerization continues towards the 5′ end of the DNA synthesis template (or extension arm) until polymerization terminates.
- the DNA synthesis template represents the portion of the extension arm that is encoded into a single strand DNA product (i.e., the 3′ single strand DNA flap containing the desired nucleotide edit) by the polymerase of the prime editor complex and that ultimately replaces the corresponding endogenous DNA strand of the target site that sits immediately downstream of the PE-induced nick site.
- polymerization of the DNA synthesis template continues towards the 5′ end of the extension arm until a termination event.
- Polymerization may terminate in a variety of ways, including, but not limited to (a) reaching a 5′ terminus of the PEgRNA (e.g., in the case of the 5′ extension arm wherein the DNA polymerase simply runs out of template), (b) reaching an impassable RNA secondary structure (e.g., hairpin or stem/loop), or (c) reaching a replication termination signal, e.g., a specific nucleotide sequence that blocks or inhibits the polymerase, or a nucleic acid topological signal, such as supercoiled DNA or RNA.
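The priming and synthesis steps above amount to taking reverse complements. The following is a minimal sketch under stated assumptions: the extension arm is written 5′ to 3′ as DNA synthesis template followed by PBS, only termination mode (a) (running out of template) is modeled, and the function names and example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def reverse_complement_dna(template: str) -> str:
    """Read a template 3'->5' and write the complementary DNA 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))


def synthesize_flap(extension_arm: str, pbs_len: int) -> tuple[str, str]:
    """Return (primer, flap) for an extension arm of template + PBS (5'->3')."""
    if not 0 < pbs_len < len(extension_arm):
        raise ValueError("pbs_len must leave a non-empty template and PBS")
    synthesis_template = extension_arm[:-pbs_len]
    pbs = extension_arm[-pbs_len:]
    primer = reverse_complement_dna(pbs)               # nicked strand that anneals to the PBS
    flap = reverse_complement_dna(synthesis_template)  # new DNA added to the primer's 3' end
    return primer, flap


# Hypothetical RNA extension arm: 8-nt template followed by a 6-nt PBS.
primer, flap = synthesize_flap("AUGGCUCA" + "GGCAUC", pbs_len=6)
extended_strand = primer + flap  # 5'->3' product after polymerization
```

Secondary-structure or sequence-based termination would truncate the flap and is deliberately left out of this sketch.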
- Fusion protein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins.
- One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively.
- a protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein.
- proteins provided herein may be produced by any method known in the art.
- the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker.
- Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the entire contents of which are incorporated herein by reference.
- guide RNA is a particular type of guide nucleic acid which is most commonly associated with a Cas protein of a CRISPR-Cas9 system and which associates with Cas9, directing the Cas9 protein to a specific sequence in a DNA molecule that includes complementarity to the spacer sequence of the guide RNA.
- this term also embraces the equivalent guide nucleic acid molecules that associate with Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and which otherwise program the Cas9 equivalent to localize to a specific target
- the Cas9 equivalents may include other napDNAbp from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), and C2c3 (a type V CRISPR-Cas system).
- “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), the contents of which are incorporated herein by reference.
- Exemplary sequences and structures of guide RNAs are provided herein.
- methods for designing appropriate guide RNA sequences are provided herein.
- the “guide RNA” may also be referred to as a “traditional guide RNA” to contrast it with the modified forms of guide RNA termed “prime editing guide RNAs” (or “PEgRNAs”) and “engineered PEgRNAs” (or “epegRNAs”).
- Guide RNAs or PEgRNAs/epegRNAs may comprise various structural elements that include, but are not limited to:
- Spacer sequence – the sequence in the guide RNA or pegRNA/epegRNA (having about 20 nts in length) that has the same sequence as the protospacer in the target DNA, except that the guide RNA or PEgRNA/epegRNA comprises Uracil and the target protospacer contains Thymine.
- gRNA core (or gRNA scaffold or backbone sequence) – the sequence within the gRNA that is responsible for binding with a nucleic acid programmable DNA binding protein, e.g., a Cas9.
- Transcription terminator – the guide RNA or PEgRNA may comprise a transcriptional termination sequence at the 3′ end of the molecule.
- a pegRNA or epegRNA may also comprise an extension arm – a single strand extension at the 3′ end or the 5′ end of the PEgRNA which comprises a primer binding site and a DNA synthesis template sequence that encodes via a polymerase (e.g., a reverse transcriptase) a single stranded DNA flap containing the desired nucleotide change, which then integrates into the endogenous DNA by replacing the corresponding endogenous strand, thereby installing the desired nucleotide change.
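The relationship between spacer and protospacer stated above (same sequence, with uracil in place of thymine) can be illustrated as follows. This is a minimal sketch; the function name and the example protospacer are hypothetical.

```python
def spacer_from_protospacer(protospacer: str) -> str:
    """Return the RNA spacer that has the same sequence as a DNA protospacer."""
    seq = protospacer.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("protospacer must contain only A, C, G, and T")
    # Same sequence as the protospacer, except uracil replaces thymine.
    return seq.replace("T", "U")


spacer = spacer_from_protospacer("GACCATGATTACGCCAAGCT")  # hypothetical 20-nt protospacer
```

Because the spacer matches the protospacer, it is complementary to, and anneals with, the opposite (target) strand.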
- Linker refers to a molecule linking two other molecules or moieties. The linker can be an amino acid sequence in the case of a peptide linker joining two
- a napDNAbp (e.g., Cas9) can be fused to a reverse transcriptase by an amino acid linker sequence.
- the linker can also be a nucleotide sequence in the case of joining two nucleotide sequences together (e.g., in a gRNA).
- the traditional guide RNA is linked via a spacer or linker nucleotide sequence to the RNA extension of a prime editing guide RNA which may comprise an RT template sequence and an RT primer binding site.
- the linker is an organic molecule, group, polymer, or chemical moiety.
- the linker is 5-200 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
- the term “nucleic acid programmable DNA binding protein” or “napDNAbp,” of which Cas9 is an example, refers to a protein that uses RNA:DNA hybridization to target and bind to specific sequences in a DNA molecule.
- Each napDNAbp is associated with at least one guide nucleic acid (e.g., guide RNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the protospacer of a guide RNA).
- the guide nucleic-acid “programs” the napDNAbp (e.g., Cas9 or equivalent) to localize and bind to a complementary sequence.
- the binding mechanism of a napDNAbp – guide RNA complex includes the step of forming an R-loop whereby the napDNAbp induces the unwinding of a double-strand DNA target, thereby separating the strands in the region bound by the napDNAbp.
- the guide RNA protospacer then hybridizes to the “target strand.” This displaces a “non-target strand” that is complementary to the target strand, which forms the single strand region of the R-loop.
- the napDNAbp includes one or more nuclease activities, which then cut the DNA, leaving various types of lesions.
- the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location, and/or cuts the target strand at a second location.
- the target DNA can be cut to form a “double-stranded break” whereby both strands are cut.
- the target DNA can be cut at only a single site, i.e., the DNA is “nicked” on one strand.
- Exemplary napDNAbp with different nuclease activities include “Cas9 nickase”
- nickase refers to a napDNAbp (e.g., a Cas protein) which is capable of cleaving only one of the two complementary strands of a double-stranded target DNA sequence, thereby generating a nick in that strand.
- the nickase cleaves a non-target strand of a double stranded target DNA sequence.
- the nickase comprises an amino acid sequence with one or more mutations in a catalytic domain of a canonical napDNAbp (e.g., a Cas protein), wherein the one or more mutations reduces or abolishes nuclease activity of the catalytic domain.
- the nickase is a Cas9 that comprises one or more mutations in a RuvC-like domain relative to a wild type Cas9 sequence or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents.
- the nickase is a Cas9 that comprises one or more mutations in an HNH-like domain relative to a wild type Cas9 sequence or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents.
- the nickase is a Cas9 that comprises an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 relative to a canonical SpCas9 sequence or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents.
- the nickase is a Cas9 that comprises an H840A, N854A, and/or N863A mutation relative to a canonical SpCas9 sequence, or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents.
- the term “Cas9 nickase” refers to a Cas9 with one of the two nuclease domains inactivated. This enzyme is capable of cleaving only one strand of a target DNA.
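The nickase definitions above can be summarized with a small model of which strand remains cleavable. This is a simplified sketch, not the disclosure's own formulation: it assumes each listed substitution fully inactivates its domain, and it relies on the general understanding (not stated in this section) that the RuvC domain of SpCas9 cleaves the non-target strand, that the HNH domain cleaves the target strand, and that H840, N854, and N863 lie in the HNH domain. All names are hypothetical.

```python
from dataclasses import dataclass

# Assumed mapping of SpCas9 substitutions to the catalytic domain they inactivate.
MUTATION_DOMAIN = {"D10A": "RuvC", "H840A": "HNH", "N854A": "HNH", "N863A": "HNH"}
# Assumed mapping of each catalytic domain to the strand it cleaves.
DOMAIN_STRAND = {"RuvC": "non-target", "HNH": "target"}


@dataclass(frozen=True)
class Cas9Variant:
    mutations: frozenset

    def cut_strands(self) -> set:
        """Strands still cleaved after the listed inactivating mutations."""
        inactive = {MUTATION_DOMAIN[m] for m in self.mutations}
        return {s for d, s in DOMAIN_STRAND.items() if d not in inactive}

    def kind(self) -> str:
        return {2: "nuclease", 1: "nickase", 0: "inactive"}[len(self.cut_strands())]


def cas9(*mutations: str) -> Cas9Variant:
    return Cas9Variant(frozenset(mutations))


wild_type = cas9()
h840a = cas9("H840A")          # nicks the non-target (edit) strand
d10a = cas9("D10A")            # nicks the target strand
dead = cas9("D10A", "H840A")   # both domains inactivated
```

In this model the H840A variant used in the PE fusion proteins leaves only the non-target strand cleavable, consistent with the nickase cleaving the non-target strand.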
- the nickase is a Cas protein that is not a Cas9 nickase.
- the napDNAbp of the prime editing complex comprises an endonuclease having nucleic acid programmable DNA binding ability.
- the napDNAbp comprises an active endonuclease capable of cleaving both strands of a double stranded target DNA.
- the napDNAbp is a nuclease active endonuclease, e.g., a nuclease active Cas protein, that can cleave both strands of a double stranded target DNA by generating a nick on each strand.
- a nuclease active Cas protein can generate a cleavage (a nick) on each strand of a double stranded target DNA.
- the two nicks on both strands are staggered nicks, for example, generated by a napDNAbp comprising a
- the two nicks on both strands are at the same genomic position, for example, generated by a napDNAbp comprising a nuclease active Cas9.
- the napDNAbp comprises an endonuclease that is a nickase.
- the napDNAbp comprises an endonuclease comprising one or more mutations that reduce nuclease activity of the endonuclease, rendering it a nickase.
- the napDNAbp comprises an inactive endonuclease, for example, in some embodiments, the napDNAbp comprises an endonuclease comprising one or more mutations that abolish the nuclease activity.
- the napDNAbp is a Cas9 protein or variant thereof.
- the napDNAbp can also be a nuclease active Cas9, a nuclease inactive Cas9 (dCas9), or a Cas9 nickase (nCas9).
- the napDNAbp is Cas9 nickase (nCas9) that nicks only a single strand.
- the napDNAbp can be selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute and optionally has a nickase activity such that only one strand is cut.
- the napDNAbp is selected from Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute and optionally has a nickase activity such that one DNA strand is cut preferentially to the other DNA strand.
- nuclear localization sequence or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport.
- Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. For example, NLS sequences are described in Plank et al., international PCT application, PCT/EP2000/011690, filed November 23, 2000, published as WO 2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for its disclosure of exemplary nuclear localization sequences.
- an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 94), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 99), KRTADGSEFESPKKKRKV (SEQ ID NO: 97), KRTADGSEFEPKKKRKV (SEQ ID NO: 106), NLSKRPAAIKKAGQAKKKK (SEQ ID NO: 107), PAAKRVKLD (SEQ ID NO: 98), RQRRNELKRSF (SEQ ID NO: 108), or NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 109).
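As an illustration of how the listed NLS sequences might be located in a fusion protein, the sketch below scans a protein for exact matches to a few of the motifs above. The function name is hypothetical, only four of the listed motifs are included, and the example protein is the N-terminal fragment of the PE2 sequence as quoted elsewhere in this section; exact string matching is an assumption and would miss variant NLS sequences.

```python
# A subset of the NLS sequences listed above, keyed by sequence identifier.
NLS_MOTIFS = {
    "SEQ ID NO: 94": "PKKKRKV",
    "SEQ ID NO: 97": "KRTADGSEFESPKKKRKV",
    "SEQ ID NO: 106": "KRTADGSEFEPKKKRKV",
    "SEQ ID NO: 98": "PAAKRVKLD",
}


def find_nls(protein: str) -> list:
    """Return (identifier, 0-based start) for every exact motif occurrence."""
    hits = []
    for label, motif in NLS_MOTIFS.items():
        start = protein.find(motif)
        while start != -1:
            hits.append((label, start))
            start = protein.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


n_terminus = "MKRTADGSEFESPKKKRKVDKKYSIGLDIG"  # start of the quoted PE2 sequence
hits = find_nls(n_terminus)
```

The shorter motif (SEQ ID NO: 94) is contained within the longer bipartite-style motif (SEQ ID NO: 97), so both are reported for this fragment.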
- nucleic acid refers to a polymer of nucleotides.
- the polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5- methylcytidine, C5 bromouridine, C5 fluorouridine, C5 iodouridine, C5 propynyl uridine, C5 propynyl cytidine, C5 methylcytidine, 7-deazaadenosine, 7-deazagu
- the terms “prime editing guide RNA” or “PEgRNA” or “pegRNA” or “extended guide RNA” refer to a specialized form of a guide RNA that has been modified to include one or more additional sequences for implementing the prime editing methods and compositions described herein.
- the prime editing guide RNAs comprise one or more “extended regions,” also referred to herein as “extension arms,” of nucleic acid sequence.
- the extended regions may comprise, but are not limited to, single-stranded RNA or DNA. Further, the extended regions may occur at the 3′ end of a traditional guide RNA. In other arrangements, the extended regions may occur at the 5′ end of a traditional guide RNA.
- the extended region may occur at an intramolecular region of the traditional guide RNA, for example, in the gRNA core region which associates and/or binds to the napDNAbp.
- the extended region comprises a “DNA synthesis template” which encodes (by the polymerase of the prime editor) a single-stranded DNA which, in turn, has been designed to be (a) homologous with the endogenous target DNA to be edited, and (b) which comprises at least one desired nucleotide change (e.g., a transition, a transversion, a deletion, or an insertion) to be introduced or integrated into the endogenous target DNA.
- the extended region may also comprise other functional sequence elements, such as, but not limited to, a “primer binding site” and a “linker” sequence, or other structural elements, such as, but not limited to, aptamers, stem
- the “primer binding site” comprises a sequence that hybridizes to a single-strand DNA sequence having a 3′ end generated from the nicked DNA of the R-loop.
- the PEgRNAs have a 3′ extension arm, a spacer, and a gRNA core.
- the 3′ extension arm further comprises in the 5′ to 3′ direction a DNA synthesis template, a primer binding site, and a linker.
- the RT template may also be referred to more broadly as the “DNA synthesis template” where the polymerase of a prime editor described herein is not an RT, but another type of polymerase.
- the PEgRNAs have a 5′ extension arm, a spacer, and a gRNA core.
- the 5′ extension arm further comprises in the 5′ to 3′ direction a DNA synthesis template, a primer binding site, and a linker.
- the RT template may also be referred to more broadly as the “DNA synthesis template” where the polymerase of a prime editor described herein is not an RT, but another type of polymerase.
- the PEgRNAs have in the 5′ to 3′ direction a spacer (1), a gRNA core (2), and an extension arm (3).
- the extension arm (3) is at the 3′ end of the PEgRNA.
- the extension arm (3) further comprises in the 5′ to 3′ direction a homology arm, an edit template, and a primer binding site.
- the extension arm (3) may also comprise an optional modifier region at the 3′ and 5′ ends, which may be the same sequences or different sequences.
- the 3′ end of the PEgRNA may comprise a transcriptional terminator sequence.
- the PEgRNAs have in the 5′ to 3′ direction an extension arm (3), a spacer (1), and a gRNA core (2).
- the extension arm (3) is at the 5′ end of the PEgRNA.
- the extension arm (3) further comprises in the 3′ to 5′ direction a primer binding site, an edit template, and a homology arm.
- the extension arm (3) may also comprise an optional modifier region at the 3′ and 5′ ends, which may be the same sequences or different sequences.
- the PEgRNAs may also comprise a transcriptional terminator sequence at the 3′ end.
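The two arrangements described above differ only in where the extension arm is attached. A minimal sketch of the component ordering follows; the function name and single-letter placeholder components are hypothetical, and the optional modifier regions, linker, and transcriptional terminator are omitted.

```python
def assemble_pegrna(spacer: str, grna_core: str, homology_arm: str,
                    edit_template: str, pbs: str, arm_end: str = "3'") -> str:
    """Concatenate PEgRNA components 5'->3' for a 3' or 5' extension arm."""
    # Read 5'->3', the extension arm is homology arm, edit template, PBS in
    # both layouts (equivalently PBS, edit template, homology arm read 3'->5').
    extension_arm = homology_arm + edit_template + pbs
    if arm_end == "3'":
        return spacer + grna_core + extension_arm
    if arm_end == "5'":
        return extension_arm + spacer + grna_core
    raise ValueError("arm_end must be \"3'\" or \"5'\"")


# Placeholder letters stand in for real sequences to show the ordering only.
three_prime = assemble_pegrna("S", "C", "H", "E", "P", arm_end="3'")
five_prime = assemble_pegrna("S", "C", "H", "E", "P", arm_end="5'")
```

Keeping the internal order of the extension arm fixed reflects the point made earlier that the polymerase is primed at the PBS and proceeds toward the 5′ end of the template.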
- PE1 refers to a prime editing composition comprising 1) a fusion protein comprising a Cas9 protein variant Cas9(H840A) and a wild type MMLV RT having the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(wt)] -NLS and 2) a desired
- PE1 protein has the amino acid sequence of SEQ ID NO: 3, which is shown as follows.
- PE2 refers to a prime editing composition comprising 1) a fusion protein comprising a Cas9 protein variant Cas9(H840A) and a variant MMLV RT having the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)(T306K)(W313F)]-NLS and 2) a desired PEgRNA, wherein the fusion protein (referred to as the PE2 protein) has the amino acid sequence of SEQ ID NO: 4, which is shown as follows: MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNT DRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMA
- PE3 refers to a prime editing composition comprising a PE2 and further comprising a second-strand nicking guide RNA that complexes with the PE2 and introduces a nick in the non-edit DNA strand in order to induce preferential replacement of the edit strand.
- PE3b refers to a prime editing composition comprising PE2 and further comprising a second-strand nicking guide RNA that complexes with PE2 and introduces a nick in the non-edit DNA strand, wherein the second-strand nicking guide RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit.
- this is achieved by designing the second strand nicking guide RNA with a spacer sequence that comprises complementarity to, and only hybridizes with, the edited strand after installation of the desired nucleotide edit(s), but not the endogenous target DNA sequence.
- mismatches between the nicking guide RNA spacer and the unedited target DNA should disfavor nicking by the sgRNA until after the editing event on the PAM strand takes place.
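The temporal-control idea above can be expressed as a simple sequence check: the nicking spacer should match the edited sequence and mismatch the unedited one. The sketch below is illustrative only; the function names and sequences are hypothetical, and actual nicking depends on the number and position of mismatches, which this comparison does not model.

```python
def mismatches(spacer_rna: str, site_dna: str) -> int:
    """Count positions where an RNA spacer differs from a DNA protospacer."""
    if len(spacer_rna) != len(site_dna):
        raise ValueError("spacer and site must be the same length")
    as_dna = spacer_rna.upper().replace("U", "T")
    return sum(s != d for s, d in zip(as_dna, site_dna.upper()))


def is_edit_specific(spacer_rna: str, unedited_site: str, edited_site: str) -> bool:
    """True if the spacer matches the edited site exactly but not the unedited site."""
    return (mismatches(spacer_rna, edited_site) == 0
            and mismatches(spacer_rna, unedited_site) > 0)


unedited = "GCTAGCTAGGATCCATGCAA"  # hypothetical protospacer before editing
edited = "GCTAGCTAGGATCTATGCAA"    # same site after a C-to-T edit
nicking_spacer = edited.replace("T", "U")       # PE3b-style design
generic_spacer = unedited.replace("T", "U")     # matches before editing instead
```

A spacer that overlaps no edited position, like `generic_spacer`, offers no temporal control in this model.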
- PE4 refers to a prime editing composition comprising a PE2 and further comprising an MLH1 dominant negative protein variant (i.e., wild-type MLH1 with amino acids 754-756 truncated, which may be referred to herein as “MLH1 Δ754-756” or “MLH1dn”).
- the MLH1 dominant negative protein variant may be expressed in trans in some embodiments.
- a PE4 system comprises a fusion protein comprising a PE2 protein and an MLH1 dominant negative protein joined via an optional linker.
- PE5 refers to a prime editing composition comprising a PE3 and further comprising an MLH1 dominant negative protein variant (i.e., wild-type MLH1 with amino acids 754-756 truncated, which may be referred to as “MLH1 Δ754-756” or “MLH1dn”).
- the MLH1 dominant negative variant may be expressed in trans in some embodiments.
- a PE5 system comprises a fusion protein comprising a PE2 protein and an MLH1 dominant negative protein joined via an optional linker.
- PE5b refers to a prime editing composition comprising a PE3 and an MLH1 dominant negative protein, wherein the second-strand nicking guide RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit. This is achieved by designing the second strand nicking guide RNA with a spacer sequence that comprises complementarity to, and hybridizes with, only the edited strand after installation of the desired nucleotide edit(s), but not the endogenous target DNA sequence.
- PEmax refers to a prime editing composition comprising 1) a fusion protein comprising a Cas9 protein variant Cas9(R221K N394K H840A) and a variant MMLV RT having the following structure: [bipartite NLS]-[Cas9(R221K)(N394K)(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)]-[bipartite NLS]-[NLS] and 2) a desired PEgRNA, wherein the fusion protein (referred to as the PEmax protein) has the amino acid sequence of SEQ ID NO: 5, which is shown as follows: MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNT DRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVD
- PE3max can be considered as PE3 except wherein the PE2 component is substituted with PEmax.
- PE3bmax refers to a prime editing composition comprising a PEmax protein, a desired pegRNA, and a second strand nicking guide RNA, wherein the second-strand nicking guide RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit. This is achieved by designing the second strand nicking guide RNA with a spacer sequence that comprises complementarity to, and hybridizes with, only the edited strand after installation of the desired nucleotide edit(s), but not the endogenous target DNA sequence.
- PE4max refers to PE4 but wherein the PE2 component is substituted with PEmax.
- PE5max refers to PE5, but wherein the PE2 component of PE3 is substituted with PEmax.
- PE5bmax refers to PE5b wherein the PE2 component of PE3 is substituted with PEmax.
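The PE2 through PE5bmax definitions above build on one another, which can be captured in a small lookup table. This is one possible way to organize the definitions, not a structure used by the disclosure; the class and field names are hypothetical, and every system is assumed to also include a desired PEgRNA.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PrimeEditingSystem:
    editor: str                       # "PE2" or "PEmax" fusion protein
    nicking_guide: bool = False       # second-strand nicking guide RNA
    edit_specific_nick: bool = False  # "b" designs: nick only after the edit
    mlh1dn: bool = False              # MLH1 dominant negative variant


BASE = {
    "PE2": PrimeEditingSystem("PE2"),
    "PE3": PrimeEditingSystem("PE2", nicking_guide=True),
    "PE3b": PrimeEditingSystem("PE2", nicking_guide=True, edit_specific_nick=True),
    "PE4": PrimeEditingSystem("PE2", mlh1dn=True),
    "PE5": PrimeEditingSystem("PE2", nicking_guide=True, mlh1dn=True),
    "PE5b": PrimeEditingSystem("PE2", nicking_guide=True,
                               edit_specific_nick=True, mlh1dn=True),
}


def max_name(name: str) -> str:
    """PE2 becomes PEmax; every other name gains a 'max' suffix."""
    return "PEmax" if name == "PE2" else name + "max"


# The "max" systems substitute PEmax for the PE2 component and keep the rest.
SYSTEMS = dict(BASE)
for name, system in BASE.items():
    SYSTEMS[max_name(name)] = replace(system, editor="PEmax")
```

Looking up `SYSTEMS["PE5bmax"]` then shows at a glance that it combines PEmax, an edit-specific nicking guide, and MLH1dn.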
- PE6 refers to a suite of prime editors (PE6a, PE6b, PE6c, PE6d, PE6e, PE6f, and PE6g) comprising improved reverse transcriptase and/or Cas9 variants.
- the improved reverse transcriptase and Cas9 domains of the PE6 variants can also be combined with each other to offer cumulative benefits.
- a PE6 prime editor comprising an improved reverse transcriptase variant of PE6a and an improved Cas9 variant of PE6e is referred to herein as the prime editor “PE6a-e” (or “PE6e-a”).
- Any possible combination of PE6 prime editors is contemplated by the present disclosure including, for example, PE6a-e, PE6a-f, PE6a-g, PE6b-e, PE6b-f, PE6b-g, PE6c-e, PE6c-f, PE6c-g, PE6d-e, PE6d-f, and PE6d-g.
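The listed combinations follow a regular pattern that can be enumerated directly. The sketch below assumes, based on the pattern of the list and the PE6a-e example, that PE6a through PE6d contribute the reverse transcriptase variant and PE6e through PE6g contribute the Cas9 variant; the variable names are hypothetical.

```python
from itertools import product

RT_VARIANTS = "abcd"   # assumed: PE6a-PE6d supply the reverse transcriptase variant
CAS9_VARIANTS = "efg"  # assumed: PE6e-PE6g supply the Cas9 variant

# Pair every RT variant with every Cas9 variant, RT letter first.
combinations = [f"PE6{rt}-{cas9}" for rt, cas9 in product(RT_VARIANTS, CAS9_VARIANTS)]
```

This yields the twelve names in the same order as the list above; the reversed form (e.g., “PE6e-a”) names the same editor.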
- Any of the PE6 prime editors may also comprise the architecture of the PEmax protein as provided herein.
- any of the PE6 prime editors provided herein may further comprise additional amino acid mutations, e.g., any of those included in PEmax.
- a PE6 protein comprises a reverse transcriptase of the following amino acid sequence (the RT domain of “PE6a”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVEELKAIAELSLDEKYTLKKI PKIDGSKRIVYSLHPKMRLLQSRINERIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRSVFEEILHIKDEALDYLVDICTKDDFVVQGAL
- a PE6 protein comprises a reverse transcriptase of the following amino acid sequence (the RT domain of “PE6b”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: ISSSKHTLSQ
- a PE6 protein comprises a reverse transcriptase comprising the following amino acid sequence (the RT domain of “PE6d”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSI KQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNK RVEDIHPNVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTS
- polymerase refers to an enzyme that synthesizes a nucleotide strand and that may be used in connection with the prime editor delivery systems described herein.
- the DNA polymerase can be a “DNA-dependent DNA polymerase” (i.e., whereby the template molecule is a strand of DNA).
- the DNA template molecule can be a PEgRNA, wherein the extension arm comprises a strand of DNA.
- the PEgRNA may be referred to as a chimeric or hybrid PEgRNA which comprises an RNA portion (i.e., the guide RNA components, including the spacer and the gRNA core) and a DNA portion (i.e., the extension arm).
- the DNA polymerase can be an “RNA-dependent DNA polymerase” (i.e., whereby the template molecule is a strand of RNA).
- the PEgRNA is RNA, i.e., including an RNA extension.
- the term “polymerase” may also refer to an enzyme that catalyzes the polymerization of nucleotides (i.e., the polymerase activity). Generally, the enzyme will initiate synthesis at the 3′-end of a primer annealed to a polynucleotide template sequence (e.g., such as a primer sequence annealed to the primer binding site of a PEgRNA) and will proceed toward the 5′ end of the template strand.
- a “DNA polymerase” catalyzes the polymerization of deoxynucleotides.
- DNA polymerase includes a “functional fragment thereof.”
- a “functional fragment thereof” refers to any portion of a wild-type or mutant DNA polymerase that encompasses less than the entire amino acid sequence of the polymerase and which retains the ability, under at least one set of conditions, to
- Prime editing refers to an approach for gene editing using napDNAbps, a polymerase (e.g., a reverse transcriptase), and specialized guide RNAs that include a primer binding site and a DNA synthesis template for encoding desired new genetic information (or deleting genetic information) that is then incorporated into a target DNA sequence.
- prime editing may be used to incorporate one or more recombinase recognition sequences into target DNA sequence such as a genome, as described herein.
- Prime editing is described in Anzalone, A. V. et al., Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019), which is incorporated herein by reference. See also International PCT Application, PCT/US2020/023721, filed March 19, 2020, and published as WO 2020/191239, which is incorporated herein by reference.
- Prime editing represents a platform for genome editing that is a versatile and precise method to directly write new genetic information into a specified DNA site using a nucleic acid programmable DNA binding protein (“napDNAbp”) working in association with a polymerase (i.e., in the form of a fusion protein or otherwise provided in trans with the napDNAbp), wherein the prime editing system is programmed with a prime editing (PE) guide RNA (“PEgRNA”) that both specifies the target site and templates the synthesis of the desired edit (e.g., a recombinase recognition sequence to be inserted into a target DNA) in the form of a replacement DNA strand by way of an extension (either DNA or RNA) engineered onto a guide RNA (e.g., at the 5′ or 3′ end, or at an internal portion of a guide RNA).
- PE prime editing
- PEgRNA prime editing guide RNA
- the replacement strand containing the desired edit (e.g., a single nucleobase substitution) shares the same sequence as the endogenous strand (or is homologous to it) immediately downstream of the nick site of the target site to be edited (with the exception that it includes the desired edit).
- the endogenous strand downstream of the nick site is replaced by the newly synthesized replacement strand containing the desired edit.
- prime editing may be thought of as a “search-and-replace” genome editing technology since the prime editors, as described herein, not only search and locate the desired target site to be edited, but at the same time, encode a replacement strand containing a desired edit that is installed in place of the corresponding target site endogenous DNA strand.
- TPRT target-primed reverse transcription
- Prime editing can be leveraged or adapted for conducting precision CRISPR/Cas-based genome editing with high efficiency and genetic flexibility.
- TPRT is naturally used by mobile DNA elements, such as mammalian non-LTR retrotransposons and bacterial Group II introns.
- Cas protein-reverse transcriptase fusions or related systems are used to target a specific DNA sequence with a guide RNA, generate a single strand nick at the target site, and use the nicked DNA as a primer for reverse transcription of an engineered DNA synthesis template that is integrated with the guide RNA.
- prime editors that use reverse transcriptase as the DNA polymerase component
- the prime editors described herein are not limited to reverse transcriptases but may include the use of virtually any DNA polymerase. Indeed, while the application throughout may refer to prime editors with “reverse transcriptases,” it is set forth here that reverse transcriptases are only one type of DNA polymerase that may work with prime editing. Thus, wherever the specification mentions a “reverse transcriptase,” the person having ordinary skill in the art should appreciate that any suitable DNA polymerase may be used in place of the reverse transcriptase.
- the extension which provides the template for polymerization of the replacement strand containing the edit—can be formed from RNA or DNA.
- the polymerase of the prime editor can be an RNA-dependent DNA polymerase (such as a reverse transcriptase).
- the polymerase of the prime editor may be a DNA-dependent DNA polymerase. The newly synthesized strand (i.e., the replacement DNA strand containing the desired nucleotide edit) that is formed by the prime editor would be homologous to the genomic target sequence (i.e., have the same sequence as), except for the desired nucleotide changes (e.g., a single nucleotide substitution, a deletion, or an insertion, or a combination thereof).
- the newly synthesized (or replacement) strand of DNA may also be referred to as a single strand DNA flap, which would compete for hybridization with the complementary homologous endogenous DNA strand, thereby displacing the corresponding endogenous strand.
- Resolution of the hybridized intermediate (also referred to as a heteroduplex, comprising the single strand DNA flap synthesized by the reverse transcriptase hybridized to the endogenous DNA strand with the exception of mismatches at positions where desired nucleotide edits are installed in the edit strand) can include removal of the resulting displaced flap of endogenous DNA (e.g., with a 5′ end DNA flap endonuclease, FEN1), ligation of the synthesized single strand DNA flap to the target DNA, and assimilation of the desired nucleotide changes as a result of cellular DNA repair and/or replication processes.
- FEN1 5′ end DNA flap endonuclease
- the system can be combined with the use of an error-prone reverse transcriptase enzyme (e.g., provided as a fusion protein with the Cas9 domain, or provided in trans to the Cas9 domain).
- the error-prone reverse transcriptase enzyme can introduce alterations during synthesis of the single strand DNA flap.
- error-prone reverse transcriptase can be utilized to introduce nucleotide changes to the target DNA.
- prime editing operates by contacting a target DNA molecule (for which a change in the nucleotide sequence is desired to be introduced) with a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a prime editing guide RNA (PEgRNA).
- napDNAbp nucleic acid programmable DNA binding protein
- PEgRNA prime editing guide RNA
- the prime editing guide RNA comprises an extension at the 3′ or 5′ end of the guide RNA, or at an intramolecular location in the guide RNA, and encodes the desired nucleotide change (e.g., single nucleotide substitution, insertion, or deletion).
- the napDNAbp/extended gRNA complex contacts the DNA molecule, and the extended gRNA guides the napDNAbp to bind to a target locus.
- a nick in one of the strands of DNA of the target locus is introduced (e.g., by a nuclease or chemical agent), thereby creating an available 3′ end in one of the strands of the target locus.
- nick is created in the strand of DNA that corresponds to the R-loop strand, i.e., the strand that is not hybridized to the guide RNA sequence, i.e., the “non-target strand.”
- the nick could be introduced in either of the strands. That is, the nick could be introduced into the R-loop “target strand” (i.e., the strand hybridized to the protospacer of the extended gRNA) or the “non-target strand” (i.e., the strand forming the single-stranded portion of the R-loop and which is complementary to the target strand).
- the DNA polymerase e.g., reverse transcriptase
- the DNA polymerase can be fused to the napDNAbp or alternatively can be provided in trans to the napDNAbp.
- This forms a single-strand DNA flap comprising the desired nucleotide change (e.g., the single base change, insertion, or deletion, or a combination thereof, for example, a recombinase recognition sequence to be inserted into a target DNA sequence such as a genome) and that is otherwise homologous to the endogenous DNA at or adjacent to the nick site.
- the napDNAbp and guide RNA are released.
- the final two steps relate to the resolution of the single strand DNA flap such that the desired nucleotide change becomes incorporated into the target locus.
- This process can be driven towards the desired product formation by removing the corresponding 5′ endogenous DNA flap that forms once the 3′ single strand DNA flap invades and hybridizes to the endogenous DNA sequence.
- the cell’s endogenous DNA repair and replication processes resolve the mismatched DNA to incorporate the nucleotide change(s) to form the desired altered product.
- the process can also be driven towards product formation with “second strand nicking.” This process may introduce at least one or more of the following genetic changes: transversions, transitions, deletions, and insertions (e.g., insertion of a recombinase recognition sequence).
- PE prime editor
- “prime editor (PE)” or “PE system” or “PE editing system” refers to the compositions involved in the method of genome editing using target-primed reverse transcription (TPRT) described herein, including, but not limited to, the napDNAbps, reverse transcriptases, fusion proteins (e.g., comprising napDNAbps and reverse transcriptases), prime editing guide RNAs, and complexes comprising fusion proteins and prime editing guide RNAs, as well as accessory elements, such as second strand nicking components (e.g., second strand nicking sgRNAs) and 5′ endogenous DNA flap removal endonucleases (e.g., FEN1) for helping to drive the prime editing process towards the edited product formation.
- the PEgRNA constitutes a single molecule comprising a guide RNA (which itself comprises a spacer sequence and a gRNA core or scaffold) and a 5′ or 3′ extension arm comprising the primer binding site and a DNA synthesis template
- the PEgRNA may also take the form of two individual molecules.
- a PEgRNA may comprise a guide RNA and a trans prime editor RNA template (tPERT), which essentially houses the extension arm (including, in particular, the primer binding site and the DNA synthesis domain) and an RNA-protein recruitment domain (e.g., MS2 aptamer or hairpin) in the same molecule which becomes co-localized or recruited to a modified prime editor complex that comprises a tPERT recruiting protein (e.g., MS2cp protein, which binds to the MS2 aptamer).
- tPERT trans prime editor RNA template
- a prime editor system can comprise one or more prime editing guide RNAs (PEgRNAs).
- a prime editor system has one PEgRNA (the “single flap prime editing system”) that targets one strand of a double stranded DNA, e.g., a target genomic site.
- a single flap prime editing system may comprise a spacer sequence that comprises complementarity to a target strand of a double stranded target DNA, a primer binding site that comprises complementarity to a non-target strand of the double stranded target DNA, and a DNA synthesis template that comprises (and encodes) a nucleotide edit compared to the double stranded target DNA sequence, e.g., a recombinase recognition site.
- a prime editor system (the “dual-flap prime editing system” or “twin prime editing” or “twinPE”) comprises at least two different PEgRNAs that can target opposite strands of a double stranded target DNA, e.g., a target genomic site.
- a twin prime editing system may comprise
- each of the two PEgRNAs comprises a DNA synthesis template having a region of complementarity to the other, and the two PEgRNAs direct the synthesis of two 3′ flaps that have a region of complementarity to each other and that contain a nucleotide edit compared to the double stranded target DNA sequence (e.g., a recombinase recognition sequence).
- Variants of twin prime editing include quadruple-flap prime editing whereby the two sets of twin prime editors are used to introduce a genetic change at two different genetic loci, e.g., two different recombinase recognition sequences located at the 5′ end and 3′ end of a gene.
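The requirement that the two 3′ flaps share a region of complementarity can be checked computationally. The following is a simplified sketch that assumes the complementary region sits at the 3′ ends of both flaps, which the text does not specify; the function names and toy sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def flap_overlap(flap_a: str, flap_b: str) -> int:
    """Length of the longest 3'-terminal stretch of flap_a that can pair,
    in antiparallel orientation, with the 3'-terminal stretch of flap_b.
    Both flaps are written 5'->3'. Returns 0 if no such stretch exists."""
    a, b = flap_a.upper(), flap_b.upper()
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == reverse_complement(b[-k:]):
            return k
    return 0


# Toy flaps (hypothetical): a shared 8-nt region plus unrelated 5' ends.
shared = "CCGGATTC"
flap_a = "TTTT" + shared
flap_b = "GGGG" + reverse_complement(shared)
overlap = flap_overlap(flap_a, flap_b)
```

In this toy case `overlap` equals the length of the shared region, so the two flaps could anneal to each other over those eight nucleotides.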
- twin prime editing is a versatile and precise genome editing method that directly writes new genetic information into a specified DNA site using a nucleic acid programmable DNA binding protein (“napDNAbp”) working in association with a polymerase (i.e., in the form of a fusion protein or otherwise provided in trans with the napDNAbp), wherein the prime editing system is programmed with a prime editing (PE) guide RNA (“PEgRNA”) that both specifies the target site and templates the synthesis of the desired edit in the form of a replacement DNA strand by way of an extension (either DNA or RNA) engineered onto a guide RNA (e.g., at the 5′ or 3′ end, or at an internal portion of a guide RNA).
- the replacement strand containing the desired edit (e.g., a recombinase recognition sequence for insertion into a target DNA sequence) shares the same sequence as the endogenous strand of the target site to be edited (with the exception that it includes the desired edit).
- a prime editor comprises a napDNAbp (e.g., Cas9 nickase) and a reverse transcriptase provided in trans, i.e., the napDNAbp and the reverse transcriptase are not fused.
- the in trans napDNAbp and the reverse transcriptase may be tethered via a non-peptide linkage, e.g., a MS2 RNA-protein binding RNA sequence and a MS2 coat protein fused to either the napDNAbp or the reverse transcriptase, or may be unlinked to each other and simply recruited by the pegRNA.
- a prime editor composition, system, or complex provided herein comprises a fusion protein or a fusion protein complexed with a PEgRNA, and/or further complexed with a second-strand nicking sgRNA.
- the prime editor system may also refer to the complex comprising a fusion protein (reverse transcriptase fused to a napDNAbp), a PEgRNA, and a regular guide RNA capable of directing the second-site nicking step of the non-edited strand as described herein.
- the primer binding site is capable of binding to the primer sequence that is formed after nicking of the edit strand (the non-target strand) of the target DNA sequence by the prime editor.
- when the prime editor (e.g., by a Cas9 nickase component of a prime editor) nicks the edit strand of the target DNA sequence, a free 3′ end is formed in the edit strand, which serves as a primer
- the PBS is complementary to or substantially complementary to and can anneal to a free 3′ end on the non-target strand of the double stranded target DNA at the nick site. In some embodiments, the PBS annealed to the free 3′ end on the non-target strand can initiate target-primed DNA synthesis.
- Protein, peptide, and polypeptide [0138] The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein and refer to a polymer of amino acid residues linked together by peptide (amide) bonds.
- a protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex.
- a protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide.
- a protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker.
- Protospacer refers to the sequence (e.g., of ~20 bp) in DNA adjacent to the PAM (protospacer adjacent motif) sequence.
- the protospacer shares the same sequence as the spacer sequence of the guide RNA (except that a protospacer contains Thymine and the spacer sequence contains Uracil).
- the guide RNA anneals to the complement of the protospacer sequence on the target DNA (specifically, one strand thereof, i.e., the “target strand”).
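The relationship between protospacer, spacer, and target strand described above can be made concrete with a short sketch. It assumes sequences are written 5′ to 3′ as plain strings and that pairing is strictly Watson-Crick; the function names and the 20-nt example sequence are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def spacer_from_protospacer(protospacer: str) -> str:
    """The spacer has the protospacer's sequence, with uracil in place of thymine."""
    return protospacer.upper().replace("T", "U")


def anneals(spacer_rna: str, target_strand_dna: str) -> bool:
    """True if the RNA spacer pairs base-for-base, in antiparallel
    orientation, with the given DNA strand (both written 5'->3')."""
    partner = {"A": "T", "U": "A", "G": "C", "C": "G"}
    if len(spacer_rna) != len(target_strand_dna):
        return False
    return all(
        partner[r] == d
        for r, d in zip(spacer_rna.upper(), reversed(target_strand_dna.upper()))
    )


protospacer = "GATTACAGATTACAGATTAC"  # toy 20-nt example
spacer = spacer_from_protospacer(protospacer)
target_site = reverse_complement(protospacer)  # complement on the target strand
```

With these definitions, `anneals(spacer, target_site)` is true, while the spacer does not pair with the protospacer strand itself.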
- in order for a Cas nickase component of a prime editor to function, it also requires a specific protospacer adjacent motif (PAM) that varies depending on the Cas protein component itself, e.g., the type of Cas protein and the bacterial species from which it is derived.
- PAM protospacer adjacent motif
- the canonical PAM sequence (i.e., the PAM sequence that is associated with the Cas9 nuclease of Streptococcus pyogenes or SpCas9) is 5′-NGG-3′, wherein “N” is any nucleobase followed by two guanine (“G”) nucleobases.
- SpCas9 can also recognize additional non-canonical PAMs (e.g., NAG and NGA).
- Different PAM sequences can be associated with different Cas9 nucleases or equivalent proteins from different organisms.
- Cas9 enzymes from different bacterial species can have varying PAM specificities.
- Cas9 from Staphylococcus aureus (SaCas9) recognizes NGRRT or NGRRN.
- Cas9 from Neisseria meningitidis (NmCas) recognizes NNNNGATT.
- Cas9 from Streptococcus thermophilus (StCas9) recognizes NNAGAAW.
- non-SpCas9s bind a variety of PAM sequences, which makes them useful when no suitable SpCas9 PAM sequence is present at the desired target cut site.
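The PAM motifs listed above use IUPAC degenerate codes (N for any base, R for A or G, W for A or T), so checking whether a site carries a suitable PAM reduces to pattern matching. The sketch below assumes the PAM lies immediately 3′ of a 20-nt protospacer on the scanned strand, as is the case for SpCas9; it scans only the strand given, and all names are illustrative.

```python
import re

# IUPAC codes needed for the PAMs listed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "W": "[AT]"}

# PAM motifs as given in the text.
PAMS = {
    "SpCas9": ["NGG"],
    "SaCas9": ["NGRRT", "NGRRN"],
    "NmCas": ["NNNNGATT"],
    "StCas9": ["NNAGAAW"],
}


def pam_regex(pam: str) -> re.Pattern:
    """Compile a degenerate PAM motif into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in pam.upper()))


def matches_pam(seq: str, pam: str) -> bool:
    """True if seq is a concrete instance of the degenerate PAM motif."""
    return pam_regex(pam).fullmatch(seq.upper()) is not None


def find_protospacers(dna: str, pam: str, length: int = 20):
    """List (start, protospacer, pam_seq) for every position on the given
    strand where a PAM match immediately follows a protospacer."""
    dna = dna.upper()
    pattern = pam_regex(pam)
    hits = []
    for start in range(len(dna) - length - len(pam) + 1):
        pam_seq = dna[start + length:start + length + len(pam)]
        if pattern.fullmatch(pam_seq):
            hits.append((start, dna[start:start + length], pam_seq))
    return hits
```

For example, `find_protospacers(site, "NGG")` returning an empty list for a given site, while `find_protospacers(site, "NNAGAAW")` returns a hit, would be the situation in which a non-SpCas9 enzyme becomes useful. A fuller search would also scan the reverse complement.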
- non-SpCas9s may have other characteristics that make them more useful than SpCas9.
- Cas9 from Staphylococcus aureus (SaCas9) is about 1 kilobase smaller than SpCas9, so it can be packaged into adeno-associated virus (AAV).
- Recombinase refers to a site-specific enzyme that mediates the recombination of DNA between recombinase recognition sequences, which results in the excision, integration, inversion, or exchange (e.g., translocation) of DNA fragments between the recombinase recognition sequences.
- Recombinases can be classified into two distinct families: serine recombinases (e.g., resolvases and invertases) and tyrosine recombinases (e.g., integrases).
- serine recombinases include, without limitation, Hin, Gin, Tn3, β-six, CinH, ParA, γδ, Bxb1, φC31, TP901, TG1, φBT1, R4, φRV1, φFC1, MR11, A118, U153, and gp29.
- tyrosine recombinases include, without limitation, Cre, FLP, R, Lambda, HK101, HK022, and pSAM2.
- a recombinase is a Bxb1 recombinase, or a variant thereof (e.g., any of the evolved Bxb1 recombinases described herein).
- the serine and tyrosine recombinase names stem from the conserved nucleophilic amino acid residue that the recombinase uses to attack the DNA and which becomes covalently linked to the DNA during strand exchange.
- Recombinases have numerous applications, including the creation of gene knockouts/knock-ins and gene therapy applications.
- the methods and compositions of the invention can be expanded by mining databases for new orthogonal recombinases or designing synthetic recombinases with defined DNA specificities (See, e.g., Groth et al., “Phage integrases: biology and applications.” J. Mol. Biol. 2004; 335, 667-678; Gordley et al., “Synthesis of programmable integrases.” Proc. Natl. Acad. Sci. USA. 2009; 106, 5053-5058; the entire contents of each are hereby incorporated by reference in their entirety).
- the catalytic domains of a recombinase are fused to a nuclease-inactivated RNA-programmable nuclease (e.g., dCas9, or a fragment thereof), such that the recombinase domain does not comprise a nucleic acid binding domain or is unable to bind to a target nucleic acid (e.g., the recombinase domain is engineered such that it does not have specific DNA binding activity).
- Recombinases lacking DNA binding activity and methods for engineering such are known, and include those described by Klippel et al., “Isolation and characterisation of unusual gin mutants.” EMBO J.1988; 7: 3983–3989: Burke et al., “Activating mutations of Tn3 resolvase marking interfaces important in recombination catalysis and its regulation.
- serine recombinases of the resolvase-invertase group (e.g., Tn3 and γδ resolvases and the Hin and Gin invertases) have modular structures with autonomous catalytic and DNA-binding domains (See, e.g., Grindley et al., “Mechanism of site-specific recombination.” Ann Rev Biochem. 2006; 75: 567–605, the entire contents of which are incorporated by reference).
- activated recombinase mutants that do not require any accessory factors (e.g., DNA binding activities)
- Recombinase recognition sequence refers to a nucleotide sequence target that is recognized by a recombinase and undergoes strand exchange with another RRS (which can be on the same DNA molecule or a different DNA molecule, e.g., on a different chromosome, or on a donor DNA molecule such as a donor DNA vector) that results in excision, integration, in
- the multi-strand prime editors may install one or more recombinase sites in a target DNA molecule, or in more than one target molecule.
- the recombinase sites can be installed at adjacent target sites or non-adjacent target sites (e.g., separate chromosomes).
- single installed recombinase sites can be used as “landing sites” for a recombinase-mediated reaction between the genomic recombinase site and a second recombinase site within an exogenously supplied nucleic acid molecule, e.g., a plasmid comprising a donor DNA sequence. This enables the targeted integration of a desired nucleic acid molecule.
- the recombinase sites can be used for recombinase-mediated excision or inversion of the intervening sequence, or for recombinase-mediated cassette exchange with exogenous DNA having the same recombinase sites.
- a recombinase recognition sequence comprises an attP site. In some embodiments, a recombinase recognition sequence comprises an attB site. In some embodiments, the attP and attB recombinase recognition sequences recognized by Bxb1 comprise the sequences
- Recombination can result in, inter alia, the insertion, exchange, inversion, excision, or translocation of nucleic acids, e.g., in or between one or more nucleic acid molecules.
- Reverse transcriptase describes a class of polymerases characterized as RNA-dependent DNA polymerases. All known reverse transcriptases require a primer to synthesize a DNA transcript from an RNA template. Historically, reverse transcriptase has been used primarily to transcribe mRNA into cDNA, which can then be cloned into a vector for further manipulation.
- Avian myeloblastosis virus (AMV) reverse transcriptase was the first widely used RNA-dependent DNA polymerase (Verma, Biochim. Biophys. Acta 473:1 (1977)).
- the enzyme has 5′-3′ RNA-directed DNA polymerase activity, 5′-3′ DNA-directed DNA polymerase activity, and RNase H activity.
- RNase H is a processive 5′ and 3′ ribonuclease specific for the RNA strand of RNA-DNA hybrids (Perbal, A Practical Guide to Molecular Cloning, New York: Wiley & Sons (1984)).
- M-MLV reverse transcriptase substantially lacking in RNase H activity has also been described. See, e.g., U.S. Pat. No. 5,244,797.
- the invention contemplates the use of any such reverse transcriptases, or variants or mutants thereof.
- the invention contemplates the use of reverse transcriptases that are error-prone, i.e., that may be referred to as error-prone reverse transcriptases or reverse transcriptases that do not support high fidelity incorporation of nucleotides during polymerization.
- the error-prone reverse transcriptase can introduce one or more nucleotides that are mismatched with the RT template sequence, thereby introducing changes to the nucleotide sequence through erroneous polymerization of the single-strand DNA flap.
- spacer sequence in connection with a guide RNA or a PEgRNA refers to the portion of the guide RNA or PEgRNA of about 20 nucleotides that contains a nucleotide sequence that shares the same sequence as the protospacer sequence in the target DNA sequence.
- the spacer sequence anneals to the complement of the protospacer sequence to form a ssRNA/ssDNA hybrid structure at the target site and a corresponding R loop ssDNA structure of the endogenous DNA strand.
- silent mutation refers to a mutation in a nucleic acid molecule that does not have an effect on the phenotype of the nucleic acid molecule, or the protein it produces if it encodes a protein. Silent mutations can be introduced into coding regions of a nucleic acid (i.e., segments of a gene that encode for a protein), or they can be introduced in non-coding regions of a nucleic acid. A silent mutation in a nucleic acid sequence, e.g., in a target DNA sequence or in a DNA synthesis template sequence to be installed in the target sequence, may be a nucleotide alteration that does not result in a change in the expression or function of the amino acid sequence encoded by the nucleic acid sequence, or other functional features of the target nucleic acid sequence.
- When silent mutations are present in a coding region, they may be synonymous mutations.
- Synonymous mutations refer to substitutions of one base for another in a gene such that the corresponding amino acid residue of the protein produced by the gene is not modified. This is due to the redundancy of the genetic code, allowing for multiple different codons to encode for the same amino acid in a particular organism.
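Whether a single-codon change is synonymous can be checked against a codon table. The sketch below uses the standard genetic code, which is an assumption: the text notes that codon usage is organism-specific, and some organisms and organelles use variant codes. The names are illustrative.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """True if the two codons differ but encode the same amino acid."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    return ref != alt and CODON_TABLE[ref] == CODON_TABLE[alt]


leucine_swap = is_synonymous("CTG", "CTA")  # both codons encode leucine
asp_to_glu = is_synonymous("GAT", "GAA")    # aspartate -> glutamate
```

`leucine_swap` is true and `asp_to_glu` is false, which mirrors the distinction drawn in the text between a synonymous change and one that alters the encoded residue.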
- when a silent mutation is in a non-coding region or at a junction of a coding region and a non-coding region (e.g., an intron/exon junction), it may be in a region that does not impact any biological properties of the nucleic acid molecule (e.g., splicing, gene regulation, RNA lifetime, etc.).
- a silent mutation may also be a “benign” mutation, for example, where a nucleotide substitution results in one or more alterations in the amino acid sequence encoded, but does not result in detrimental impact on the expression or function of the polypeptide.
- Silent mutations may be useful, for example, for increasing the length of contiguous changes in a desired nucleotide edit or the number of nucleotide edits made to a target nucleotide sequence using prime editing to evade correction of the edit by the MMR pathway.
- the number of silent mutations installed may be one, or two, or three, or four, or five, or six, or seven, or eight, or nine, or ten, or more.
- the silent mutations may be installed within one, or two, or three, or four, or five, or six, or seven, or eight, or nine, or ten, or 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides from the intended edit site.
- silent mutations are installed in order to alter or optimize the secondary structure that a particular pegRNA will form in a cell.
- changing some bases of a pegRNA to incorporate silent mutations results in changes to the secondary structure of the pegRNA that can improve editing efficiency.
- subject refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human.
- the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cattle, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile,
- the subject is a research animal.
- the subject is genetically engineered, e.g., a genetically engineered non-human subject.
- the subject may be of either sex and may be at any stage of development.
- the subject has a disease or disorder, or is suspected of having a disease or disorder.
- the disease or disorder is treated using the gene editing methods involving prime editing and a recombinase described herein.
- substitution refers to replacement of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence.
- mutation may also be used throughout the present disclosure to refer to a substitution (i.e., a “nucleic acid mutation” or an “amino acid mutation”). Substitutions are typically described herein by identifying the original residue followed by the position of the residue within the sequence and the identity of the newly mutated/substituted residue.
- a substitution is in a recombinase, e.g., a Bxb1 recombinase.
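The substitution notation described above (original residue, then position, then new residue) is easy to parse and apply programmatically. This sketch assumes single-letter residue codes and 1-based positions; the label `V3A` and the five-residue sequence are toy examples and are not mutations or sequences from the disclosure.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    original: str
    position: int  # 1-based position within the sequence
    new: str


_LABEL = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'V3A' into (original, position, new)."""
    match = _LABEL.fullmatch(label.strip().upper())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def apply_substitution(sequence: str, label: str) -> str:
    """Return the sequence with the substitution applied, after checking
    that the stated original residue is actually present at that position."""
    sub = parse_substitution(label)
    if not 1 <= sub.position <= len(sequence):
        raise ValueError(f"position {sub.position} is outside the sequence")
    index = sub.position - 1
    if sequence[index] != sub.original:
        raise ValueError(
            f"expected {sub.original} at position {sub.position}, "
            f"found {sequence[index]}"
        )
    return sequence[:index] + sub.new + sequence[index + 1:]


variant = apply_substitution("MKVLA", "V3A")  # toy sequence, hypothetical label
```

Checking the original residue before substituting catches numbering mistakes, such as a label written against a different reference sequence.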
- Target site refers to a sequence within a nucleic acid molecule that is edited by a prime editor (PE) disclosed herein.
- PE prime editor
- the target site further refers to the sequence within a nucleic acid molecule to which a complex of the prime editor (PE) and gRNA binds.
- treatment refers to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein.
- treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or
- treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence.
- variant Bxb1 recombinase is a Bxb1 recombinase comprising one or more changes in amino acid residues as compared to a wild type Bxb1 recombinase amino acid sequence.
- variant Bxb1 recombinase encompasses homologous proteins having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% identity with a reference sequence and having the same or substantially the same functional activity or activities as the reference sequence.
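The identity thresholds above can be illustrated with a simple calculation. The source does not state how percent identity is computed, so the following Python sketch makes its own assumptions: the two sequences are already aligned to equal length, gaps are written as "-", and identity is the number of identical non-gap columns divided by the alignment length. The function name `percent_identity` is hypothetical, and other conventions (for example, dividing by the shorter sequence length) would give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Assumes the inputs are pre-aligned (equal length, '-' marks a gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    # Count columns where the residues agree and neither is a gap.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

For instance, under these assumptions two aligned 10-residue sequences that differ at a single position are 90% identical.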
- vector refers to a nucleic acid that can be modified to encode a gene of interest and that is able to enter a host cell, mutate, and replicate within the host cell, and then transfer a replicated form of the vector into another host cell.
- exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phage, and conjugative plasmids. Additional suitable vectors will be apparent to those of skill in the art based on the instant disclosure.
- wild type is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene, or characteristic as it occurs in nature as distinguished from mutant or variant forms.
- DETAILED DESCRIPTION [0166] The present disclosure provides evolved and engineered recombinase variants.
- the recombinase variants provided herein exhibit increased recombination activity (e.g., increased insertion efficiency of donor DNA molecules at recombinase recognition sites), for example, between recombinase recognition sites that have been introduced into one or more target DNA sequences (e.g., in a genome of an organism) using prime editing.
- the present disclosure also provides systems and compositions comprising the recombinase variants described herein and a prime editor and pegRNA, or polynucleotides encoding each of the recombinase variant, prime editor, and pegRNA.
- Methods for editing a target nucleic acid using the recombinase variants provided herein and, optionally, prime editing (e.g., for insertion, deletion, exchange, inversion, or translocation) are also described in the present disclosure.
- Evolved and Engineered Recombinases [0167] Some aspects of the present disclosure provide evolved and/or engineered recombinases that exhibit improved activity (e.g., when used for recombining recombinase sites introduced into one or more target DNA sequences using prime editing). In some aspects, the present disclosure provides Bxb1 recombinase variants that exhibit improved activity (e.g., increased insertion efficiency of donor DNA molecules at recombinase recognition sites).
- Bxb1 recombinase variants comprise various amino acid substitutions relative to the amino acid sequence of Bxb1 recombinase, which is provided below: [0168] Bxb1 recombinase: MRALVVIRLSRVTDATTSPERQLESCQQLCAQRGWDVVGVAEDLDVSGAVDPFDRKRR PNLARWLAFEEQPFDVIVAYRVDRLTRSIRHLQQLVHWAEDHKKLVVSATEAHFDTTTP FAAVVIALMGTVAQMELEAIKERNRSAAHFNIRAGKYRGSLPPWGYLPTRVDGEWRLV PDPVQRERILEVYHRVVDNHEPLHLVAHDLNRRGVLSPKDYFAQLQGREPQGREWSAT ALKRSMISEAMLGYATLNGKTVRDDDGAPLVRAEPILTREQLEALRAELVKTSRAKPAV STPSLLLRVLFCAVCGEPAYKFAGGGRKHPRYRCRSMGFPKHCGNGTVAMAEWDAFC E
- the Bxb1 recombinase comprises an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to the amino acid sequence of SEQ ID NO: 1, wherein the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions at positions selected from the group consisting of amino acid residues 3, 5, 10, 14, 15, 20, 23, 24, 25, 29, 35, 36, 39, 40, 43, 45, 47, 49, 50, 51, 54, 58, 60, 66, 68, 69, 70, 73, 74, 75, 78, 84, 86, 87, 89, 93, 95, 97, 100, 101, 105, 116, 119, 124, 127, 139, 147, 157, 158, 169, 17
- the Bxb1 recombinase comprises an amino acid sequence that is not identical to the amino acid sequence of wild type Bxb1 recombinase, or to any other variants of Bxb1 recombinase known in the art.
- the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3X, V5X, S10X, D14X, A15X, E20X, L23X, E24X, S25X, L29X, W35X, D36X, G39X, V40X, D43X, D45X, S47X, A49X, V50X, D51X, D54X, R58X, N60X, A66X, E68X, E69X, Q70X, D73X, V74X, I75X, Y78X, T84X, S86X, I87X,
- the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3X, V5X, D14X, A15X, E20X, L23X, E24X, S25X, L29X, W35X, D36X, G39X, V40X, D43X, D45X, S47X, A49X, V50X, D51X, D54X, R58X, N60X, A66X, E68X, E69X, Q70X, D73X, V74X, I75X, Y78X, T84X, S86X, I87X, H89X, L93X, H95X, A97X, H100X, K101X, V105X, T116X, A119X, A124X, G127X, E139X, F147X, S157X, L158X
- the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3T, A3V, V5F, V5I, S10A, D14N, A15T, E20K, E20Q, L23F, L23M, E24K, S25I, L29F, W35P, W35L, D36G, D36V, G39D, V40I, D43E, D45G, S47A, A49E, A49T, V50I, D51E, D51N, D51Y, D54N, R58K, N60S, A66T, E68K, E69D, Q70P, D73G, V74A, V74M, I75V, Y78H, Y78N, T84S, S86G,
- the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3T, A3V, V5F, V5I, D14N, A15T, E20K, E20Q, L23F, L23M, E24K, S25I, L29F, W35P, W35L, D36G, D36V, G39D, V40I, D43E, D45G, S47A, A49E, A49T, V50I, D51E, D51N, D51Y, D54N, R58K, N60S, A66T, E68K, E69D, Q70P, D73G, V74A, V74M, I75V, Y78H, Y78N, T84S, S86G, S86N, S86T, I87T, I87V, H89
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 3 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A3X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A3T mutation.
- the mutation is an A3V mutation.
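The position-specific mutations described above can be checked against the wild type sequence before they are applied. The Python sketch below is illustrative only: `BXB1_PREFIX` holds just the first 40 residues of the Bxb1 recombinase sequence provided above, the function names are hypothetical, and "X" is assumed here to range over the 20 canonical amino acids, which is narrower than "any amino acid other than the wild type" if non-canonical residues are contemplated.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 canonical one-letter codes

# First 40 residues of the Bxb1 recombinase sequence given above (truncated for brevity).
BXB1_PREFIX = "MRALVVIRLSRVTDATTSPERQLESCQQLCAQRGWDVVGV"


def apply_substitution(sequence: str, original: str, position: int, new: str) -> str:
    """Return a copy of `sequence` with the residue at 1-based `position` replaced."""
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    if new == original:
        raise ValueError("the new residue must differ from the wild type residue")
    return sequence[:index] + new + sequence[index + 1:]


def expand_x(sequence: str, original: str, position: int) -> list[str]:
    """Enumerate every variant covered by an 'X' label such as A3X."""
    return [
        apply_substitution(sequence, original, position, aa)
        for aa in AMINO_ACIDS
        if aa != original
    ]
```

Under these assumptions, applying A3T to the prefix changes its third residue from A to T, and expanding A3X produces 19 candidate sequences.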
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 5 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a V5X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V5F mutation. In certain embodiments, the mutation is a V5I mutation.
- [0172] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 10 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- In some embodiments, the mutation is an S10X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S10A mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 14 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a D14X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a D14N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 15 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A15X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A15T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 20 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E20X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an E20K mutation.
- the mutation is an E20Q mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 23 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an L23X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an L23F mutation.
- the mutation is an L23M mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 24 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E24X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E24K mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 25 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an S25X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S25I mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 29 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an L29X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an L29F mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 35 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a W35X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a W35P mutation. In certain embodiments, the mutation is a W35L mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 36 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a D36X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D36G mutation. In certain embodiments, the mutation is a D36V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 39 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a G39X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a G39D mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 40 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a V40X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V40I mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 43 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a D43X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D43E mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 45 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a D45X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a D45G mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 47 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an S47X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an S47A mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 49 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A49X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A49E mutation.
- the mutation is an A49T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 50 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a V50X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a V50I mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 51 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a D51X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D51E mutation. In certain embodiments, the mutation is a D51N mutation. In certain embodiments, the mutation is a D51Y mutation.
- [0190] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 54 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- In some embodiments, the mutation is a D54X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D54N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 58 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an R58X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an R58K mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 60 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an N60X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an N60S mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 66 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A66X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A66T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 68 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E68X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an E68K mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 69 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E69X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E69D mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 70 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a Q70X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a Q70P mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 73 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a D73X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a D73G mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 74 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a V74X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a V74A mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 75 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an I75X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an I75V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 78 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a Y78X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a Y78H mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 84 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a T84X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a T84S mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 86 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an S86X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an S86G mutation.
- the mutation is an S86N mutation.
- the mutation is an S86T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 87 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an I87X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an I87T mutation.
- the mutation is an I87V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 89 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an H89X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an H89N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 93 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an L93X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an L93M mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 95 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an H95X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an H95Y mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 97 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A97X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A97S mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 100 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an H100X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H100Y mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 101 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a K101X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a K101R mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 105 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a V105X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a V105I mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 116 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a T116X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a T116P mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 119 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A119X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A119S mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 124 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A124X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A124S mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 127 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a G127X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a G127E mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 139 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E139X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an E139A mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 147 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an F147X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an F147Y mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 154 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a Y154X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a Y154C mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 157 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an S157X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an S157G mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 158 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an L158X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an L158M mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 169 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a D169X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D169N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 175 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a V175X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a V175I mutation.
- the mutation is a V175M mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 179 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a V179X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a V179G mutation. In certain embodiments, the mutation is a V179M mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 181 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an R181X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R181Q mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 183 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an R183X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an R183L mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 185 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an L185X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an L185M mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 194 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an N194X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an N194D mutation.
- the mutation is an N194K mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 197 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a P197X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a P197Q mutation.
- the mutation is a P197T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 199 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an H199X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an H199Y mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 202 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A202X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A202S mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 203 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an H203X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an H203Y mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 204 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a D204X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D204G mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 207 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an R207X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an R207I mutation.
- the mutation is an R207Q mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 208 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an R208X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an R208S mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 209 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a G209X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a G209V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 214 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a K214X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a K214R mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 221 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a Q221X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a Q221R mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 229 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E229X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E229K mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 239 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an M239X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an M239L mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 248 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A248X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A248V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 252 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a G252X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a G252S mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 261 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A261X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A261V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 266 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A266X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A266T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 267 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E267X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E267D mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 273 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E273X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an E273D mutation.
- the mutation is an E273K mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 279 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an R279X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an R279C mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 280 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A280X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A280T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 281 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E281X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an E281K mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 284 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a K284X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a K284N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 285 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a T285X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a T285A mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 287 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an R287X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R287P mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 288 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A288X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A288T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 291 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A291X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A291S mutation. In certain embodiments, the mutation is an A291T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 309 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E309X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E309D mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 311 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A311X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A311V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 321 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an H321X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H321N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 328 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an S328X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S328T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 333 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a K333X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a K333N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 334 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an H334X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an H334P mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 342 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an M342X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an M342V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 343 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A343X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A343T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 345 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a W345X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a W345L mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 347 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A347X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A347T mutation. In certain embodiments, the mutation is an A347V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 360 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A360X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A360T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 361 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E361X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an E361D mutation. In certain embodiments, the mutation is an E361G mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 362 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an R362X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R362K mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 365 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a K365X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a K365N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 368 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a V368X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a V368A mutation.
- the mutation is a V368N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 374 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A374X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A374V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 375 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a V375X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a V375I mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 378 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A378X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A378T mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 389 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an S389X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an S389R mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 393 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an S393X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an S393F mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 400 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an S400X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S400Y mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 411 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A411X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A411V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 415 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A415X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an A415V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 419 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E419X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an E419D mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 421 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E421X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an E421K mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 422 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a G422X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a G422S mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 424 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E424X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E424G mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 434 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E434X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E434G mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 435 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a T435X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a T435A mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 438 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an R438X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an R438Q mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 440 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a G440X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a G440E mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 447 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a D447X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a D447N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 449 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an A449X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A449V mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 453 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a T453X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a T453A mutation. In certain embodiments, the mutation is a T453N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 462 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an L462X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an L462M mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 463 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a T463X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a T463I mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 466 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a V466X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a V466M mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 468 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a G468X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a G468D mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 469 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a G469X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a G469R mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 478 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a D478X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D478E mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 483 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an E483X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an E483K mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 485 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an H485X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an H485Y mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 487 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an R487X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an R487K mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 490 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an S490X mutation, wherein X is any amino acid other than the wild type.
- the mutation is an S490N mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 494 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an R494X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R494Q mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 496 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is an H496X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H496P mutation.
- the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 497 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence.
- the mutation is a T497X mutation, wherein X is any amino acid other than the wild type.
- the mutation is a T497A mutation.
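The substitution labels used above (for example A266T, or E273X where X stands for any non-wild-type residue) follow a regular pattern: wild-type residue, 1-based position, new residue. The following is a minimal sketch, not part of the disclosure, of how such labels could be parsed and applied to a sequence string. The function names are hypothetical, and the ten-residue input in the usage note is only a toy fragment taken from the start of the sequence shown in the text.

```python
import re

# Substitution labels have the form <wild type><position><new residue>.
# The new residue may be the placeholder "X" (any non-wild-type residue).
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'A266T' into (wild_type, position, new_residue)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, new_residue = match.groups()
    return wild_type, int(position), new_residue


def apply_mutation(sequence, label):
    """Return a copy of `sequence` carrying the substitution named by `label`.

    Positions are 1-based. A label ending in 'X' names a class of
    substitutions rather than one residue, so it is rejected here.
    """
    wild_type, position, new_residue = parse_mutation(label)
    if new_residue == "X":
        raise ValueError(f"{label} is a placeholder, not a specific substitution")
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + new_residue + sequence[position:]


def matches_placeholder(placeholder, specific):
    """True if `specific` (e.g. 'E273K') is an instance of `placeholder` (e.g. 'E273X')."""
    p_wt, p_pos, p_new = parse_mutation(placeholder)
    s_wt, s_pos, s_new = parse_mutation(specific)
    return p_new == "X" and (p_wt, p_pos) == (s_wt, s_pos) and s_new != s_wt
```

For instance, `apply_mutation("MRALVVIRLS", "A3T")` returns `"MRTLVVIRLS"`, and `matches_placeholder("E273X", "E273K")` is true. Checking the wild-type residue before substituting guards against applying a label to a sequence with different numbering.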
- the Bxb1 recombinase comprises a substitution or combination of substitutions of any one of the Bxb1 variants in Table 1 below: [0301] Table 1: Bxb1 Variants from PANCEv1 that Showed Improved Integration Efficiency Amino Acid Position(s) Amino Acid Mutation(s)
- the Bxb1 recombinase comprises a substitution or combination of substitutions of any one of the Bxb1 variants in Table 3 below: [0305] Table 3: Bxb1 Variants from PACEv1 that Showed Improved Integration Efficiency. Amino Acid Position(s): V5X, A119X, E281X, G422S, R487X. Amino Acid Mutation(s): V5I, A119S, E281K, G422S, R487K.
- the Bxb1 recombinase comprises a substitution or combination of substitutions of any one of the Bxb1 variants in Table 4 below: [0307] Table 4: Bxb1 Variants from PANCEv3, PANCEv4, PACEv2, and PANCEv5 that Showed Improved Integration Efficiency. Amino Acid Position(s): V5X, P197X, [...] R494X. Amino Acid Mutation(s): V5I, P197T, [...] R494
- Table 7: Rationally Designed Bxb1 Triple Mutants that Showed Improved Integration Efficiency. Amino Acid Position(s): D14X, R207X, and T453X. Amino Acid Mutation(s): D14N, R207Q, and T453A.
- [...] the amino acid sequence of SEQ ID NO: 1, wherein X is any amino acid other than the wild type amino acid.
- the Bxb1 recombinase comprises a V74A mutation relative to the amino acid sequence of SEQ ID NO: 1.
- the Bxb1 recombinase comprises the following amino acid sequence (which is also referred to herein as “evoBxb1” and “evoPASSIGE”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: MRALVVIRLSRVTDATTSPERQLESCQQLCAQRGWDVVGVAEDLDVSGAVDPFDRKRR PNLARWLAFEEQPFDAIVAYRVDRLTRSIRHLQQLVHWAEDHKKLVVSATEAHFDTTTP FAAVVIALMGTVAQMELEAIKERNRSAAHFNIRAGKYRGSLPPWGYLPTRVDGEWRLV PDPVQRERILEVYHRVVDNHEPLHLVAHDLNR
- the Bxb1 recombinase comprises V74X, E229X, and V375X mutations relative to the amino acid sequence of SEQ ID NO: 1, wherein X is any amino acid other than the wild type amino acid.
- the Bxb1 recombinase comprises V74A, E229K, and V375I mutations relative to the amino acid sequence of SEQ ID NO: 1.
- the Bxb1 recombinase comprises the following amino acid sequence (which is also referred to herein as “eeBxb1” and “eePASSIGE”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: MRALVVIRLSRVTDATTSPERQLESCQQLCAQRGWDVVGVAEDLDVSGAVDPFDRKRR PNLARWLAFEEQPFDAIVAYRVDRLTRSIRHLQQLVHWAEDHKKLVVSATEAHFDTTTP FAAVVIALMGTVAQMELEAIKERNRSAAHFNIRAGKYRGSLPPWGYLPTRVDGEWRLV PDPVQRERILEVYHRVVV
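The identity thresholds recited above (at least 80% through at least 99%) presuppose some way of scoring two sequences against each other. The text does not specify an alignment method or an identity formula, so the following is only an illustrative sketch under stated assumptions: the two sequences are already aligned to equal length with `-` as the gap character, and identity is counted over every alignment column. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Assumptions (not taken from the text): '-' marks a gap, columns where
    both sequences have a gap are ignored, and a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Thresholds recited in the text.
THRESHOLDS = (80, 85, 90, 95, 96, 97, 98, 99)


def thresholds_met(identity):
    """Return the recited thresholds that a given percent identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]
```

With the toy pair `"MRALVVIRLS"` and `"MRTLVVIRLS"` (one substitution in ten residues), `percent_identity` gives 90.0, which meets the 80, 85, and 90 thresholds. Other identity conventions, such as dividing by the shorter sequence length, would give different values for gapped alignments.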
- mutation of an amino acid with a hydrophobic side chain may be a mutation to a second amino acid with a different hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan).
- a mutation of an alanine to a threonine (e.g., an A3T mutation)
- any of the amino acid mutations provided herein from one amino acid to an isoleucine may be an amino acid mutation to an alanine, valine, methionine, or leucine.
- any of the amino acid mutations provided herein from one amino acid to a lysine may be an amino acid mutation to an arginine.
- any of the amino acid mutations provided herein from one amino acid to an aspartic acid may be an amino acid mutation to a glutamic acid or asparagine.
- any of the amino acid mutations provided herein from one amino acid to a valine may be an amino acid mutation to an alanine, isoleucine, methionine, or leucine.
- any of the amino acid mutations provided herein from one amino acid to a glycine may be an amino acid mutation to an alanine. It should be appreciated, however, that additional conserved amino acid residues would be recognized by the skilled artisan, and any of the amino acid mutations to other conserved amino acid residues are also within the scope of this disclosure.
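The alternative substitutions listed above can be collected in a small lookup table. The sketch below encodes only the alternatives named in the text (for mutations to isoleucine, lysine, aspartic acid, valine, and glycine) and expands a specific substitution label into its listed alternatives. The names `CONSERVATIVE_ALTERNATIVES` and `conservative_variants` are hypothetical, and the table is not a complete statement of conserved residues.

```python
# Alternatives named in the text, keyed by the residue a mutation introduces.
CONSERVATIVE_ALTERNATIVES = {
    "I": ["A", "V", "M", "L"],  # isoleucine: alanine, valine, methionine, leucine
    "K": ["R"],                 # lysine: arginine
    "D": ["E", "N"],            # aspartic acid: glutamic acid, asparagine
    "V": ["A", "I", "M", "L"],  # valine: alanine, isoleucine, methionine, leucine
    "G": ["A"],                 # glycine: alanine
}


def conservative_variants(label):
    """Expand a label such as 'V375I' into the alternatives listed in the text.

    Alternatives equal to the wild-type residue are dropped, since they
    would not be mutations.
    """
    wild_type, position, new_residue = label[0], label[1:-1], label[-1]
    if not position.isdigit():
        raise ValueError(f"not a substitution label: {label!r}")
    alternatives = CONSERVATIVE_ALTERNATIVES.get(new_residue, [])
    return [f"{wild_type}{position}{alt}" for alt in alternatives if alt != wild_type]
```

For example, `conservative_variants("V375I")` yields `["V375A", "V375M", "V375L"]`, and `conservative_variants("E281K")` yields `["E281R"]`. A label whose new residue has no listed alternatives, such as A266T, returns an empty list.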
- the present disclosure provides Bxb1 recombinase variants comprising mutations corresponding to any of the mutations disclosed herein, or any combination thereof, at a homologous position in another recombinase.
- the other recombinase is also a serine recombinase like Bxb1. Examples of additional serine recombinases include, without limitation, Hin, Gin, Tn3, β-six, CinH, ParA, γδ, φC31, TP901, TG1, φBT1, R4, φRV1,
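Finding "a homologous position in another recombinase" is usually done by aligning the two sequences and reading across an alignment column. The text does not prescribe an alignment tool, so the sketch below assumes the alignment has already been produced elsewhere and is supplied as two equal-length strings with `-` for gaps. The function name and the toy alignment in the usage note are hypothetical.

```python
def map_position(aligned_ref, aligned_other, ref_position):
    """Map a 1-based residue position in the reference onto the other sequence.

    Returns (position_in_other, residue_in_other), or None when the
    reference residue is aligned to a gap in the other sequence.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have the same length")
    ref_index = 0    # residues of the reference seen so far
    other_index = 0  # residues of the other sequence seen so far
    for ref_char, other_char in zip(aligned_ref, aligned_other):
        if ref_char != "-":
            ref_index += 1
        if other_char != "-":
            other_index += 1
        if ref_char != "-" and ref_index == ref_position:
            if other_char == "-":
                return None
            return other_index, other_char
    raise ValueError("ref_position is beyond the end of the reference")
```

With the toy alignment `"MRA-LVV"` (reference) and `"M-AKLIV"` (other), reference position 3 maps to `(2, "A")`, position 5 maps to `(5, "I")`, and position 2 maps to `None` because it faces a gap.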
- this disclosure combines the use of prime editing (PE), twin PE, or multi-flap prime editing with site-specific recombination.
- site-specific recombination refers to a type of genetic recombination also known as “conservative site-specific recombination.” Site-specific recombination is a type of genetic recombination in which DNA strand exchange takes place between segments possessing at least a certain degree of sequence homology.
- site-specific recombinases (SSRs)
- Bxb1 site-specific recombinases
- recombinase recognition sites specific DNA sequence
- the presence of a recombinase enzyme and the recombination sites is sufficient for the reaction to proceed; in other systems a number of accessory proteins and/or accessory sites are required.
- recombinase-mediated cassette exchange (RMCE)
- a cognate recombinase that recognizes the installed recombinase recognition site may be used to catalyze the precise cleavage, strand exchange, and rejoining of DNA fragments at the defined recombinase recognition sites. This is accomplished without relying on endogenous repair mechanisms in a cell for repairing double-strand breaks that otherwise can induce indels and other undesirable
- the reactions catalyzed by recombinases and recombinase recognition sites result in large-scale genomic changes, such as insertions, deletions, inversions, replacements, and chromosomal translocations of one or more chromosomal regions, including one or more loci, one or more genes, or one or more portions of genes (e.g., gene exons, introns, and gene regulatory regions).
- the one or more recombinase recognition sites can be inserted or introduced anywhere within a genome.
- a genome is organized as a single chromosome (e.g., in bacteria) and the recombinase recognition site may be inserted at any locus within the chromosome.
- the insertion site may be within a gene or within an intergenic region of a chromosome.
- the insertion may be within an exon, intron, or therebetween, or within a regulatory sequence, such as a promoter, enhancer, or transcription binding sequence.
- the genome is organized into more than one chromosome, and the recombinase recognition site may be inserted at any locus within the chromosome. For instance, in humans, the genome comprises 23 pairs of chromosomes.
- the genome also may be mitochondrial DNA.
- the insertion site may be within a gene or within an intergenic region of a chromosome.
- the insertion may be within an exon, intron, or therebetween, or within a regulatory sequence, such as a promoter, enhancer, or transcription binding sequence.
- reference to “inserting in a genome” may include inserting the one or more SSR recognition sites into the one or more chromosomes of the genome.
- reference to “inserting in a genome” refers to inserting one or more SSR recognition sites in any one of chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, chromosome 17, chromosome 18, chromosome 19, chromosome 20, chromosome 21, chromosome 22, or chromosome 23 (also known as the sex chromosomes, X and Y), or insertion into any combination of said chromosomes, or in a mitochondrial genome.
- the recombinase recognition sites are inserted upstream by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
- the recombinase recognition sites are inserted by PE or twinPE downstream of a gene.
- the recombinase recognition sites are inserted downstream by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101,
- the recombinase recognition sites are inserted within an exon, within an intron, or at the junction between an intron and exon, or upstream or downstream of an exon or intron.
- the recombinase recognition sites may be inserted at a position that is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90
- the recombinase recognition sites are inserted at a position that is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109,
- the recombinase recognition sites are inserted at a position that is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
- the recombinase recognition sites may be inserted at a position that is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109
- the disclosure provides compositions and methods for installing one or more recombinase recognition sites using single flap prime editing (“classical PE”), twin prime editing (or twinPE), or multi-flap PE.
- single flap prime editing (“classical PE”)
- twin prime editing (twinPE)
- multi-flap PE may be used to insert one or more or two or more recombinase recognition sites into one or more desired genomic sites.
- Insertion of recombinase recognition sites provides a programmed location for effecting one or more site-specific intended edits in a target DNA, e.g., genetic changes in a target gene or
- Non-limiting examples of intended edits via genetic recombination include insertion of an exogenous sequence into a target DNA, deletion (excision) of an endogenous sequence in a target DNA, inversion of an endogenous sequence in a target DNA, replacement of an endogenous sequence in a target DNA by an exogenous sequence, translocation of sequences between two target DNA sequences (e.g., between two different chromosomes), and any combination thereof.
- genetic changes via recombination can include, for example, genomic integration of an exogenous DNA sequence, e.g., sequence of a plasmid or a part thereof, genomic deletion or insertion, chromosomal translocations, and replacement of an endogenous genomic sequence in a target genome by an exogenous sequence (“cassette exchanges”), among other genetic changes.
- genomic integration of an exogenous DNA sequence (e.g., sequence of a plasmid or a part thereof)
- genomic deletion or insertion
- chromosomal translocations
- replacement of an endogenous genomic sequence in a target genome by an exogenous sequence (“cassette exchanges”) are illustrated in FIG. 1.
- The mechanism of installing a recombinase recognition site into the genome is analogous to installing other sequences, such as peptide/protein and RNA tags, into the genome.
- Recombinase recognition sites can be installed in a target DNA, e.g., a target genome, with single flap prime editing, twin prime editing, or multi-flap prime editing.
- the present disclosure provides methods for modifying one or more target nucleic acids in a cell comprising contacting the one or more target nucleic acids with any of the recombinases provided herein (e.g., one or more target nucleic acids that comprise one or more recombinase recognition sites).
- the present disclosure provides methods for modifying a target nucleic acid in a cell using prime editing and a recombinase, comprising expressing in the cell a polynucleotide encoding any of the recombinases provided herein, a polynucleotide encoding a prime editor, and one or more polynucleotides encoding one or more prime editing guide RNAs (pegRNAs) comprising DNA synthesis templates encoding one or more recombinase recognition sites.
- the present disclosure provides methods for modifying a target nucleic acid in a cell using prime editing and a recombinase, comprising expressing in the cell a polynucleotide encoding any of the recombinases provided herein and a polynucleotide encoding a prime editor, and providing to the cell one or more prime editing guide RNAs (pegRNAs) comprising DNA synthesis templates encoding one or more recombinase recognition sites.
- the method further comprises expressing in the cell a polynucleotide comprising DNA for insertion into the target nucleic acid.
- the DNA comprises one or more donor genes.
- the DNA comprises a recombinase recognition site (e.g., an attB site).
- the ratio of the polynucleotide encoding the prime editor to the polynucleotide encoding the pegRNA to the polynucleotide encoding the Bxb1 recombinase to the polynucleotide encoding the DNA for insertion into the target nucleic acid is about 10:1:10:15 (e.g., about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 : 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.2, 1.4, 1.6, 1.8, or 2 : 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 : 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20).
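As an illustration of the ratio above, the following minimal sketch splits a total nucleic acid amount across the four components in proportion to about 10:1:10:15. The function name `split_by_ratio`, the component labels, and the 720-unit total are hypothetical, and the sketch does not assume whether the ratio is expressed by mass or by moles, since the text does not specify this.

```python
def split_by_ratio(total_amount, ratio):
    """Split total_amount across components in proportion to their ratio parts."""
    total_parts = sum(ratio.values())
    return {name: total_amount * parts / total_parts for name, parts in ratio.items()}


# Ratio from the text: prime editor : pegRNA : Bxb1 recombinase : DNA for insertion.
# The dictionary keys are illustrative labels, not terms defined by the disclosure.
ratio = {
    "prime_editor": 10,
    "pegRNA": 1,
    "Bxb1_recombinase": 10,
    "donor_DNA": 15,
}

# 720 is an arbitrary example total (36 parts of 20 units each).
amounts = split_by_ratio(720.0, ratio)
for name, amount in amounts.items():
    print(f"{name}: {amount:g}")
```

Any of the alternative values listed in the parenthetical (for example, 5 to 15 parts of the prime editor polynucleotide) can be substituted into the `ratio` dictionary without changing the function.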
- the prime editor installs a recombinase recognition site (e.g., an attP site) in the target nucleic acid, thereby facilitating Bxb1-mediated recombination with the recombinase recognition site flanking the DNA, resulting in insertion of the DNA into the target nucleic acid.
- the donor DNA sequence is flanked on both sides by a recombinase recognition site (e.g., an attB site).
- the prime editor installs a first instance and a second instance of a recombinase recognition site (e.g., attP sites) in the target nucleic acid, thereby facilitating Bxb1-mediated recombination between the recombinase recognition sites in the target nucleic acid and the recombinase recognition sites flanking the DNA, resulting in excision of the target nucleic acid sequence between the first instance and the second instance of the recombinase recognition site and insertion of the DNA in its place.
- the method comprises expressing in the cell a polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site (e.g., an attP site) and a polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site (e.g., an attB site).
- the method comprises providing to the cell a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site (e.g., an attP site) and a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site (e.g., an attB site).
- a first prime editor installs the first recombinase recognition site into a target nucleic acid sequence on a first chromosome
- a second prime editor installs the second recombinase recognition site into a target nucleic acid sequence on a second chromosome, thereby facilitating Bxb1-mediated recombination between the two chromosomes.
- the first and the second recombinase recognition sites are encoded in opposite orientations.
- a first prime editor installs the first recombinase recognition site into the target nucleic acid
- a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated inversion of the nucleic acid sequence between the first and the second recombinase recognition sites.
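The relationship between where the two recombinase recognition sites are installed and the resulting edit can be summarized in a short sketch. It assumes only the three arrangements described in this section: sites on two different chromosomes (recombination between the chromosomes), sites on the same target nucleic acid in the same orientation (deletion), and sites on the same target nucleic acid in opposite orientations (inversion). The `InstalledSite` type, its field names, and the `predicted_outcome` function are hypothetical names introduced for illustration, and the sketch does not model recombination efficiency or attB/attP pairing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstalledSite:
    """A recombinase recognition site installed by prime editing (illustrative model)."""
    molecule: str      # label of the target nucleic acid, e.g. a chromosome name
    position: int      # coordinate of the installed site on that molecule
    orientation: str   # "+" or "-"


def predicted_outcome(first, second):
    """Map the arrangement of two installed sites to the edit described in the text."""
    for site in (first, second):
        if site.orientation not in ("+", "-"):
            raise ValueError(f"orientation must be '+' or '-', got {site.orientation!r}")
    if first.molecule != second.molecule:
        # Sites on two chromosomes: recombination between the two chromosomes.
        return "translocation"
    if first.position == second.position:
        raise ValueError("the two sites must be installed at different positions")
    if first.orientation == second.orientation:
        # Same molecule, same orientation: the intervening sequence is deleted.
        return "deletion"
    # Same molecule, opposite orientations: the intervening sequence is inverted.
    return "inversion"


print(predicted_outcome(InstalledSite("chr1", 100, "+"), InstalledSite("chr1", 900, "-")))
```

In this simplified model the orientation is ignored for sites on different chromosomes, whereas the two-chromosome methods in the text describe sites encoded in the same orientation.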
- the method is a method for inserting DNA into a target nucleic acid in a cell using prime editing and a recombinase.
- the method comprises expressing in a cell: (i) a first polynucleotide encoding a pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs the first recombinase recognition site in the target nucleic acid, thereby facilitating Bx
- the method is a method for exchanging DNA in a target nucleic acid in a cell using prime editing and a recombinase.
- the method comprises expressing in a cell: (i) a first polynucleotide encoding one or more pegRNAs comprising a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA is
- the method is a method for deleting DNA from a target nucleic acid in a cell using prime editing and a recombinase.
- the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site;
- the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a site in a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site in a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb
- the method is a method for inverting a target nucleic acid in a cell using prime editing and a recombinase.
- the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation as the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site;
- the method is a method for inserting DNA into a target nucleic acid in a cell using prime editing and a recombinase comprising providing to the cell a pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site, and expressing in the cell: (i) a first polynucleotide encoding any of the Bxb1 recombinases described herein; (ii) a second polynucleotide encoding a prime editor; and (iii) a third polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs
- the method is a method for exchanging DNA in a target nucleic acid in a cell using prime editing and a recombinase comprising providing to the cell one or more pegRNAs comprising a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; and expressing in the cell: (i) a first polynucleotide encoding any of the Bxb1 recombinases described herein; (ii) a second polynucleotide encoding a prime editor; and (iii) a third polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA is flanked on both sides by a second recombinase recognition site, optionally wherein the second recombina
- the method is a method for deleting DNA from a target nucleic acid in a cell using prime editing and a recombinase comprising providing to the cell 1) a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site, and 2) a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site; and expressing in the cell: (i) a first polynucleotide encoding a prime editor; and (ii) a second polynucle
- the method is a method for recombining target nucleic acids in two chromosomes in a cell using prime editing and a recombinase comprising providing to the cell 1) a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a site on a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site, and 2) a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site on a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site; and expressing in the cell: (i) a first polynucleotide encoding a prime editor; and (ii) a second polynucleotide encoding any of the Bxb1 recombinases described herein; wherein a first prime editor installs the first recombinase recognition site into a target nucleic acid on a first chromosome, and a second prime editor installs the second recombinase recognition site into a target nucleic acid on a second chromosome, thereby facilitating Bxb1-mediated recombination between the two chromosomes.
- the method is a method for inverting a target nucleic acid in a cell using prime editing and a recombinase comprising providing to the cell 1) a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site, and 2) a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation as the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; and expressing in the cell: (i) a first polynucleotide encoding a prime editor; and (ii) a second polynucleotide encoding any of the Bxb1 recombinases described herein; wherein a first prime editor install
- any of the polynucleotides used in the systems, compositions, and methods provided herein comprise DNA. In some embodiments, any of the polynucleotides used in the systems, compositions, and methods provided herein are DNA. In some embodiments, any of the polynucleotides used in the systems, compositions, and methods provided herein comprise RNA. In certain embodiments, any of the polynucleotides used in the systems, compositions, and methods provided herein are RNA.
- any of the polynucleotides encoding a Bxb1 recombinase and/or any of the polynucleotides encoding a prime editor used in the systems, compositions, and methods provided herein comprise RNA (e.g., mRNA) or are RNA (e.g., mRNA).
- the methods provided herein utilize two pegRNAs designed to promote integration of a donor DNA into a target nucleic acid and prevent unwanted
- the methods utilize a first pegRNA and a second pegRNA that each produce DNA flaps on the target nucleic acid that partially overlap each other, wherein each flap comprises a 5′ portion that does not overlap with the other flap.
- the partially overlapping flaps promote integration of a donor DNA into the target nucleic acid and prevent recombination between a polynucleotide encoding the donor DNA and a polynucleotide encoding a pegRNA.
- any of the methods described herein are performed in vitro. In some embodiments, any of the methods described herein are performed ex vivo.
- any of the methods described herein are performed in vivo. In some embodiments, any of the methods described herein are performed in a subject. In certain embodiments, the subject is a human. In some embodiments, editing a target nucleic acid using any of the methods described herein may be performed in order to treat a disease or disorder, for example, in a subject such as a human.
- napDNAbp
- the prime editors utilized in the systems and methods described herein comprise a nucleic acid programmable DNA binding protein (napDNAbp).
- prime editors may include a napDNAbp domain having a wild type Cas9 sequence, including, for example, the canonical Streptococcus pyogenes Cas9 sequence of SEQ ID NO: 6, shown as follows. Description: SpCas9; Sequence: MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGN; SEQ ID NO: 6
- the prime editors used in the methods described herein include any of the following other wild type SpCas9 sequences, which may be modified with one or more of the mutations described herein at corresponding amino acid positions: Description: SpCas9; Sequence: ATGGATAAGAAATACTCAATAGGCTTAGATATCGGCACAAATAGCG
- the Cas9 protein can be a wild type Cas9 ortholog from another bacterial species different from the canonical Cas9 from S. pyogenes.
- modified versions of the following Cas9 orthologs can be used in connection with the prime editors described in this specification by making mutations at positions corresponding to H840A or any other amino acids of interest in wild type SpCas9.
- any variant Cas9 orthologs having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to any of the below orthologs may also be used with the prime editors.
- the napDNAbp used in the prime editors described herein may include any suitable homologs and/or orthologs or naturally occurring enzymes, such as Cas9.
- the Cas moiety may be configured (e.g., mutagenized, recombinantly engineered, or otherwise obtained from nature) as a nickase, i.e., capable of cleaving only a single strand of the target double-stranded DNA.
- a Cas9 nuclease has an inactive (e.g., an inactivated) DNA cleavage domain; that is, the Cas9 is a nickase.
- the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the Cas9 orthologs in the above tables.
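The sequence identity thresholds above can be made concrete with a small calculation. The sketch below computes percent identity over two sequences that have already been aligned to equal length, counting identical columns over all aligned columns. This is one common convention and is an assumption here, since the text does not state how identity is calculated. The function names and the ten-residue example strings are hypothetical; the first string is simply the start of the SpCas9 sequence quoted above, and the second is an invented variant.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold_percent):
    """True if the pair is at least threshold_percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


reference = "MDKKYSIGLD"   # first ten residues of the SpCas9 sequence quoted above
variant = "MDKRYSIGLE"     # invented example with two substitutions
print(percent_identity(reference, variant))
print(meets_threshold(reference, variant, 85.0))
```

In practice the alignment itself would come from a dedicated alignment tool; the sketch covers only the final counting step.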
- Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference.
- Additional exemplary Cas variants and homologs include, but are not limited to, Cas9 (e.g., dCas9 and nCas9), Cpf1, CasX, CasY, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, Nme2Cas9, circularly permuted Cas9, Argonaute (Ago), Cas9-KKH, SmacCas9, Spy-macCas9, SpCas9-VRQR, SpCas9-NRRH, SpaCas9-NRTH, SpCas9-NRCH, LbCas12a, AsCas12a, CeCas12a, MbC
- the prime editors used in the systems and methods described herein comprise a reverse transcriptase domain.
- the reverse transcriptase domain is a wild type MMLV reverse transcriptase.
- the reverse transcriptase domain is a variant of wild type MMLV reverse transcriptase having the amino acid sequence of SEQ ID NO: 29.
- PE2 and PEmax comprise a variant reverse transcriptase domain of SEQ ID NO: 29, which is based on the wild type MMLV reverse transcriptase domain of SEQ ID NO: 28 (and, in particular, a Genscript codon optimized MMLV reverse transcriptase having the nucleotide sequence of SEQ ID NO: 28) and which comprises amino acid substitutions D200N,
- the prime editors used in the methods described herein can include a variant RT comprising one or more of the following mutations: P51L, S67K, E69K, L139P, T197A, D200N, H204R, F209N, E302K, E302R, T306K, F309N, W313F, T330P, L345G, L435G, N454K, D524G, E562Q, D583N, H594Q, L603W, E607K, or D653N in the wild type M-MLV RT of SEQ ID NO: 28, or at a corresponding amino acid position in another wild type RT polypeptide sequence.
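The substitution notation used above (for example, D200N: aspartate at position 200 replaced by asparagine) can be read mechanically. The following minimal sketch parses such labels and applies them to a sequence, checking that the stated wild-type residue is actually present at the stated 1-based position. The function name and the six-residue toy sequence are hypothetical; the sketch does not use the sequence of SEQ ID NO: 28, and it does not handle the "X" (any amino acid) form.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, substitutions):
    """Return a copy of sequence with each substitution (e.g. 'D200N') applied."""
    residues = list(sequence)
    for label in substitutions:
        match = _SUBSTITUTION.match(label)
        if match is None:
            raise ValueError(f"unrecognized substitution: {label!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        index = position - 1  # convert 1-based position to 0-based index
        if index < 0 or index >= len(residues):
            raise ValueError(f"{label}: position {position} is outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(
                f"{label}: expected {wild_type} at position {position}, found {residues[index]}"
            )
        residues[index] = replacement
    return "".join(residues)


toy_sequence = "MTDKLW"  # invented six-residue example, not a real RT sequence
print(apply_substitutions(toy_sequence, ["D3N", "W6F"]))
```

The wild-type check is what makes a label such as D200N meaningful only relative to a stated reference sequence or to the corresponding position in another RT.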
- exemplary reverse transcriptases that can be fused to napDNAbp proteins or provided as individual proteins according to various embodiments of this disclosure are provided below.
- exemplary reverse transcriptases include variants with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the following wild-type enzymes or partial enzymes: Description; Sequence (variant substitutions relative to wild type). W313F: WGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLT
- the prime editors used in the methods described herein can include a variant RT comprising one or more of the following mutations: P51X, S67X, E69X, L139X, T197X, D200X, H204X, F209X, E302X, T306X, F309X, W313X, T330X, L345X, L435X, N454X, D524X, E562X, D583X, H594X, L603X, E607X, or D653X in the wild type M-MLV RT of SEQ ID NO: 28, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein
- exemplary reverse transcriptases include variants with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the wild-type enzymes or partial enzymes described in SEQ ID NOs: 28-78.
- the prime editor (PE) system described here contemplates any publicly-available reverse transcriptase described or disclosed in any of the following U.S. patents (each of which are incorporated by reference): U.S. Patent Nos: 10,202,658; 10,189,831; 10,150,955; 9,932,567;
- A Reverse Transcriptase-Cas1 Fusion Protein Contains a Cas6 Domain Required for Both CRISPR RNA Biogenesis and RNA Spacer Acquisition. Mol. Cell 72, 700–714.e8 (2016).
- Gerard, G. F. et al. The role of template-primer in protection of reverse transcriptase from thermal inactivation. Nucleic Acids Res 30, 3118–3129 (2002).
- Monot, C. et al. The Specificity and Flexibility of L1 Reverse Transcription Priming at Imperfect T-Tracts. PLOS Genetics 9, e1003499 (2013).
- Mohr, S. et al. Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing. RNA 19, 958–970 (2013).
- the prime editor proteins comprise an MMLV reverse transcriptase comprising one or more amino acid substitutions.
- the wild-type MMLV reverse transcriptase is provided by the following sequence (table columns: DESCRIPTION, SEQUENCE).
- the reverse transcriptase is the MMLV pentamutant described above (i.e., comprising amino acid substitutions D200N, T306K, W313F, T330P, and L603W).
- the disclosure also contemplates the use of any wild-type reverse transcriptase in the prime editors described herein.
- Exemplary wild-type reverse transcriptases which may be used include, but are not limited to, the following sequences, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto (table columns: DESCRIPTION, SEQUENCE, SEQ ID NO).
- the domain comprising an RNA-dependent DNA polymerase activity comprises a Tf1 reverse transcriptase.
- the prime editor proteins described herein may comprise a Tf1 reverse transcriptase comprising one or more mutations relative to the amino acid sequence of SEQ ID NO: 30.
- the Tf1 reverse transcriptase comprises one or more mutations selected from the group consisting of V14A, E22K, P70T, G72V, M102I, K106R, K118R, A139T, L158Q, F269L, S297Q, K356E, A363V, K413E, I423V, and S492N relative to the amino acid sequence of SEQ ID NO: 30.
- the Tf1 reverse transcriptase comprises any one of the following groups of amino acid substitutions relative to the amino acid sequence of SEQ ID NO: 30: K118R and S297Q; V14A, L158Q, F269L, and K356E; K106R, L158Q, F269L, A363V, and I423V; E22K, P70T, G72V, M102I, K106R, A139T, L158Q, F269L, A363V, K413E, and S492N; or P70T, G72V, M102I, K106R, L158Q, F269L, A363V, K413E, and S492N.
- the present disclosure provides reverse transcriptases, and prime editors (e.g. fusion proteins or prime editors in which each component is provided in trans) comprising reverse transcriptases, wherein the reverse transcriptase is a Tf1 reverse transcriptase of SEQ ID NO: 30, or a Tf1 reverse transcriptase variant having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with SEQ ID NO: 30, wherein the Tf1 reverse transcriptase variant comprises one or more mutations selected from the group consisting of V14A, E22K, I64L, I64W, P70T, G72V, M102I, K106R, K118R, L133N, A139T, L158Q, S188K, I260L, F269L, E274R, R288Q, Q293K, S297Q, N316Q, K
- the Tf1 reverse transcriptase variant comprises any one of the following groups of mutations relative to the amino acid sequence of SEQ ID NO: 30: K118R and S297Q; V14A, L158Q, F269L, and K356E; E22K, P70T, G72V, M102I, K106R, A139T, L158Q, F269L, A363V, K413E, and S492N; P70T, G72V, M102I, K106R, L158Q, F269L, A363V, K413E, and S492N; K106R, L158Q, F269L, A363V, and I423V; K118R, S297Q, S188K, I64L, I260L, and R288Q; E22K, P70T, G72V, M102I, K106R, A139T, L158Q, F269L
- the Tf1 reverse transcriptase variant comprises the amino acid sequence of any one of SEQ ID NOs: 30 and 56-78, or an amino acid sequence at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to any one of SEQ ID NOs: 30 and 56-78, wherein the amino acid sequence comprises at least one of residues 14A, 22K, 64L, 64W, 70T, 72V, 102I, 106R, 118R, 133N, 139T, 158Q, 188K, 260L, 269L, 274R, 288Q, 293K, 297Q, 316Q, 321R, 356E, 363V, 413E, 423V, 492N: Tf1 variant 5.131: ISSSKHTLSQMNKVSNIVKEPKLPDIYKEFKDITADTNTEKLP
- the domain comprising an RNA-dependent DNA polymerase activity comprises an Ec48 reverse transcriptase.
- the prime editor proteins described herein may comprise an Ec48 reverse transcriptase comprising one or more mutations relative to the amino acid sequence of SEQ ID NO: 31.
- the Ec48 reverse transcriptase comprises one or more mutations selected from the group consisting of A36V, E54K, K87E, R205K, V214L, D243N, R267I, S277F, E279K, N317S, K318E, H324Q, K326E, E328K, and R372K relative to the amino acid sequence of SEQ ID NO: 31.
- the Ec48 reverse transcriptase comprises any one of the following groups of amino acid substitutions relative to the amino acid sequence of SEQ ID NO: 31: R267I, K318E, K326E, E328K, and R372K; K87E, R205K, V214L, D243N, R267I, N317S, K318E, H324Q, and K326E; E54K, K87E, D243N, R267I, E279K, and K318E; A36V, K87E, R205K, D243N, R267I, E279K, and K318E; or E54K, K87E, D243N, R267I, S277F, E279K, and K318E.
- the present disclosure provides reverse transcriptases, and prime editors (e.g. fusion proteins or prime editors in which each component is provided in trans) comprising reverse transcriptases, wherein the reverse transcriptase is an Ec48 reverse transcriptase of SEQ ID NO: 31, or an Ec48 reverse transcriptase variant having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with SEQ ID NO: 31, wherein the Ec48 reverse transcriptase variant comprises one or more mutations selected from the group consisting of A36V, E54K, E60K, K87E, S151T, E165D, L182N, T189N, R205K, V214L, D243N, R267I, S277F, E279K, V303M, K307R, R315K, N317S, K318E, H324Q, K3
- the Ec48 reverse transcriptase variant comprises the amino acid sequence of any one of SEQ ID NOs: 31 and 48-55 or an amino acid sequence at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to any one of SEQ ID NOs: 31 and 48-55, wherein the amino acid sequence comprises at least one of residues 36V, 54K, 60K, 87E, 151T, 165D, 182N, 189N, 205K, 214L, 243N, 267I, 277F, 279K, 303M, 307R, 315K, 317S, 318E, 324Q, 326E, 328K, 343N, 372K, 378K, and 385R: Ec48 variant 3.23: GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVE
- the fusion proteins comprise at least two NLSs. In embodiments with at least two NLSs, the NLSs can be the same NLSs or they can be different NLSs. In some embodiments, one or more of the NLSs are bipartite NLSs (“bpNLS”). In certain embodiments, the disclosed fusion proteins comprise two bipartite NLSs. In some embodiments, the disclosed fusion proteins comprise more than two bipartite NLSs.
- the location of the NLS fusion can be at the N-terminus, the C-terminus, or within a sequence of a fusion protein (e.g., inserted between the encoded napDNAbp component (e.g., Cas9) and a polymerase domain (e.g., a reverse transcriptase)).
- the NLSs may be any known NLS sequence in the art.
- the NLSs may also be any future-discovered NLSs for nuclear localization.
- the NLSs also may be any naturally-occurring NLS, or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations).
- an NLS comprises the amino acid sequences NLSKRPAAIKKAGQAKKKK (SEQ ID NO: 107), PAAKRVKLD (SEQ ID NO: 98), RQRRNELKRSF (SEQ ID NO: 108), or NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 109).
- a prime editor or other fusion protein may be modified with one or more nuclear localization sequences (NLS), preferably at least two NLSs.
- the fusion proteins are modified with two or more NLSs.
- the disclosure contemplates the use of any nuclear localization sequence known in the art at the time of the disclosure, or any nuclear localization sequence that is identified or otherwise made available in the state of the art after the time of the instant filing.
- a representative nuclear localization sequence is a peptide sequence that directs the protein to the nucleus of the cell in which the sequence is expressed.
- a nuclear localization signal is predominantly basic, can be positioned almost anywhere in a protein's amino acid sequence, generally comprises a short sequence of four amino acids (Autieri & Agrawal, (1998) J. Biol. Chem. 273: 14731-37, incorporated herein by reference) to eight amino acids, and is typically rich in lysine and arginine residues (Magin et al., (2000) Virology 274: 11-16, incorporated herein by reference).
- Nuclear localization sequences often comprise proline residues.
- a variety of nuclear localization sequences have been identified and have been used to effect transport of biological molecules from the cytoplasm to the nucleus of a cell. See, e.g., Tinland et al., (1992) Proc.
- NLSs can be classified in three general groups: (i) a monopartite NLS exemplified by the SV40 large T antigen NLS (PKKKRKV (SEQ ID NO: 94)); (ii) a bipartite motif consisting of two basic domains separated by a variable number of spacer amino acids and exemplified by the Xenopus nucleoplasmin NLS (KRXXXXXXXXXKKKL (SEQ ID NO: 110)); and (iii) noncanonical sequences such as M9 of the hnRNP A1 protein, the influenza virus nucleoprotein NLS, and the yeast Gal4 protein NLS (Dingwall and Laskey 1991).
- Nuclear localization sequences appear at various points in the amino acid sequences of proteins. NLSs have been identified at the N-terminus, the C-terminus, and in the central region of proteins.
- the disclosure provides fusion proteins that may be modified with one or more NLSs at the C-terminus and/or the N-terminus, as well as at internal regions of the fusion protein.
- the residues of a longer sequence that do not function as component NLS residues should be selected so as not to interfere, for example, ionically or sterically, with the nuclear localization signal itself. Therefore, although there are no strict limits on the composition of an NLS-comprising sequence, in practice, such a sequence can be functionally limited in length and composition.
- the present disclosure contemplates any suitable means by which to modify a fusion protein to include one or more NLSs.
- the fusion proteins may be engineered to express a fusion protein that is translationally fused at its N-terminus or its C-terminus (or both) to one or more NLSs, i.e., to form a prime editor-NLS fusion construct.
- a fusion protein-encoding nucleotide sequence may be genetically modified to incorporate a reading frame that encodes one or more NLSs in an internal region of the encoded prime editor.
- the NLSs may include various amino acid linkers or spacer regions encoded between the prime editor and the N-terminally, C-terminally, or internally-attached NLS amino acid sequence.
- nucleotide constructs, vectors, and host cells for expressing fusion proteins that comprise a prime editor and one or more NLSs, among other components.
- the prime editors described herein may also comprise nuclear localization sequences that are linked to a prime editor through one or more linkers, e.g., a polymeric, amino acid, nucleic acid, polysaccharide, or chemical linker element.
- linker refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease.
- a linker joins a gRNA binding domain of an RNA-
- a linker joins a Cas9 nickase and a reverse transcriptase.
- the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two.
- the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein).
- the linker is an organic molecule, group, polymer, or chemical moiety.
- the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated.
- the linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length.
- the linker is a polypeptide, or amino acid-based. In other embodiments, the linker is not peptide-like.
- the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.). In certain embodiments, the linker is a carbon-nitrogen bond of an amide linkage. In certain embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched, aliphatic or heteroaliphatic linker. In certain embodiments, the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, polyester, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminoalkanoic acid.
- the linker comprises a peptide. In certain embodiments, the linker comprises an aryl or heteroaryl moiety. In certain embodiments, the linker is based on a phenyl ring.
- the linker may include functionalized moieties to facilitate attachment of a nucleophile (e.g., thiol, amino) from the peptide to the linker. Any electrophile may be used as part of the linker. Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, alkyl halides, aryl halides, acyl halides, and isothiocyanates.
- the linker comprises the amino acid sequence (GGGGS)n (SEQ ID NO: 84), (G)n (SEQ ID NO: 85), (EAAAK)n (SEQ ID NO: 86), (GGS)n (SEQ ID NO: 87), (SGGS)n (SEQ ID NO: 81), (XP)n (SEQ ID NO: 88), or any combination thereof, wherein n is independently an integer between 1 and 30, and wherein X is any amino acid.
- the linker comprises the amino acid sequence (GGS)n (SEQ ID NO: 87), wherein n is 1, 3, or 7.
- the linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 89). In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 90). In some embodiments, the linker comprises the amino acid sequence SGGSGGSGGS (SEQ ID NO: 91). In some embodiments, the linker comprises the amino acid sequence SGGS (SEQ ID NO: 82). In other embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 83, 60AA).
- the linker comprises the amino acid sequence GGS, GGSGGS (SEQ ID NO: 92), GGSGGSGGS (SEQ ID NO: 93), SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 80), SGSETPGTSESATPES (SEQ ID NO: 89), or SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 83).
- linkers may be used to link any of the peptides or peptide domains or moieties of the invention (e.g., a napDNAbp linked or fused to a reverse transcriptase domain, and/or a napDNAbp linked to one or more NLS). Any of the domains of the fusion proteins used in the systems and methods described herein may also be connected to one another through any of the presently described linkers.
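The repeat-motif linkers listed above can be written out mechanically. The following is a minimal sketch, assuming that "between 1 and 30" is inclusive; the helper name `repeat_linker` is hypothetical, and the (XP)n motif is left out because X may be any amino acid. The fixed sequences are copied from the text with whitespace removed.

```python
# Illustrative only: names are hypothetical, sequences come from the text above.
REPEAT_MOTIFS = {"GGGGS", "G", "EAAAK", "GGS", "SGGS"}

# Two of the fixed linker sequences recited above (whitespace removed).
SEQ_ID_89 = "SGSETPGTSESATPES"
SEQ_ID_83 = "SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS"


def repeat_linker(motif: str, n: int) -> str:
    """Return the peptide formed by n tandem copies of a linker motif."""
    if motif not in REPEAT_MOTIFS:
        raise ValueError(f"unknown linker motif: {motif!r}")
    if not 1 <= n <= 30:  # assumption: the stated range is inclusive
        raise ValueError("n must be an integer between 1 and 30")
    return motif * n


if __name__ == "__main__":
    # (GGS)n for the values of n named in the text.
    for n in (1, 3, 7):
        linker = repeat_linker("GGS", n)
        print(n, linker, len(linker))
    # Length check for the linker annotated as 60 amino acids long.
    print(len(SEQ_ID_83))
```

Checking the computed length against the "60AA" annotation is a quick way to catch a sequence that was broken by a stray space or line wrap.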
- PEgRNAs
- the prime editing systems and methods described herein contemplate the use of any suitable PEgRNAs, e.g., to introduce recombinase recognition sites into a target DNA sequence, such as a genome, using prime editing.
- an extended guide RNA, or pegRNA, used in the prime editing systems and methods disclosed herein includes a spacer sequence (e.g., a ~20 nt spacer sequence) and a gRNA core region, which binds with the napDNAbp.
- pegRNA includes an extended RNA segment, i.e., an extension arm, at the 5′ end, i.e., a 5′ extension.
- the 5′ extension includes a DNA synthesis template sequence, a primer binding site, and an optional 5-20 nucleotide linker sequence.
- the RT primer binding site hybridizes to the free 3′ end that is formed after a nick is formed in the non-target strand of the R-loop, thereby priming reverse transcriptase for DNA polymerization in the 5′-3′ direction.
- an extended guide RNA used in the prime editing systems and methods provided herein includes a spacer sequence (e.g., a ~20 nt spacer sequence) and a gRNA core, which binds with the napDNAbp.
- the pegRNA includes an extended RNA segment, i.e., an extension arm, at the 3′ end, i.e., a 3′ extension.
- the 3′ extension includes a DNA synthesis template sequence, and a reverse transcription primer binding site.
- an extended guide RNA used in the prime editing systems and methods provided herein includes a spacer sequence (e.g., a ~20 nt spacer sequence) and a gRNA core, which binds with the napDNAbp.
- the pegRNA includes an extended RNA segment, i.e., an extension arm, at an intramolecular position within the gRNA core, i.e., an intramolecular extension.
- the intramolecular extension includes a DNA synthesis template sequence, and a reverse transcription primer binding site. The RT primer binding site hybridizes to the free 3′ end that is formed after a nick is formed in the non-target strand of the R-loop, thereby priming reverse transcriptase for DNA polymerization in the 5′-3′ direction.
- the position of the intramolecular RNA extension is not in the spacer sequence of the guide RNA.
- the position of the intramolecular RNA extension is in the gRNA core. In still another embodiment, the position of the intramolecular RNA extension is anywhere within the guide RNA molecule except within the spacer sequence, or at a position which disrupts the spacer sequence. In one embodiment, the intramolecular RNA extension is inserted downstream from the 3′ end of the spacer sequence.
- the intramolecular RNA extension is inserted at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10
- the intramolecular RNA extension is inserted into the gRNA core, which refers to the portion of a traditional guide RNA corresponding to or comprising the tracrRNA, which binds and/or interacts with the napDNAbp, e.g., a Cas9 protein or equivalent thereof (i.e., a different napDNAbp).
- the insertion of the intramolecular RNA extension does not disrupt or minimally disrupts the interaction between the tracrRNA portion and the napDNAbp.
- the length of the RNA extension (which includes at least the RT template and primer binding site) can be any useful length.
- the RNA extension is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least
- the RT template sequence can also be any suitable length.
- the RT template sequence can be at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides
- the reverse transcription primer binding site sequence is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12
- the optional linker or spacer sequence is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200
- the RT template sequence encodes a single-stranded DNA molecule which is homologous to the non-target strand (and thus, complementary to the corresponding site of the target strand) but includes one or more nucleotide changes, e.g., for introducing a recombinase recognition sequence into a target DNA molecule.
- the one or more nucleotide changes may include one or more single-base nucleotide changes, one or more deletions, and/or one or more insertions.
- the synthesized single-stranded DNA product of the RT template sequence is homologous to the non-target strand except that it contains one or more nucleotide changes.
- the single-stranded DNA product of the RT template sequence hybridizes in equilibrium with the complementary target strand sequence, thereby displacing the homologous endogenous target strand sequence.
- the displaced endogenous strand may be referred to in some embodiments as a 5′ endogenous DNA flap species.
- This 5′ endogenous DNA flap species can be removed by a 5′ flap endonuclease (e.g., FEN1) and the single-stranded DNA product, now hybridized to the
- the nucleotide sequence of the RT template sequence corresponds to the nucleotide sequence of the non-target strand that becomes displaced as the 5′ flap species and that overlaps with the site to be edited.
- the DNA synthesis template sequence may encode a single-strand DNA flap that is complementary to an endogenous DNA sequence adjacent to a nick site, wherein the single-strand DNA flap comprises a desired nucleotide change.
- the single-stranded DNA flap may displace an endogenous single-strand DNA at the nick site.
- the displaced endogenous single-strand DNA at the nick site can have a 5′ end and form an endogenous flap, which can be excised by the cell.
- excision of the 5′ end endogenous flap can help drive product formation since removing the 5′ end endogenous flap encourages hybridization of the single-strand 3′ DNA flap to the corresponding complementary DNA strand, and the incorporation or assimilation of the desired nucleotide change carried by the single-strand 3′ DNA flap into the target DNA.
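The relationship between the non-target strand, the primer binding site, and the DNA synthesis template can be made concrete with a small sketch. It assumes a 3′ extension arm, takes the nick position as an index into the non-target strand, and uses a toy sequence; the function names are hypothetical and the sketch ignores every practical design consideration (lengths, G/C content, secondary structure).

```python
# Illustrative sketch; function names and the toy sequence are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def design_extension(non_target: str, nick: int, pbs_len: int, edited_flap: str):
    """Return (template_rna, pbs_rna), both written 5'->3'.

    non_target  : non-target strand, 5'->3'
    nick        : the nick lies between non_target[nick - 1] and non_target[nick]
    pbs_len     : number of nucleotides 5' of the nick used as the primer
    edited_flap : DNA to be synthesized 3' of the nick, i.e. the non-target
                  strand sequence carrying the desired nucleotide change(s)
    """
    if not 0 < pbs_len <= nick <= len(non_target):
        raise ValueError("nick and pbs_len must lie within the strand")
    primer = non_target[nick - pbs_len:nick]  # free 3' end created by the nick
    pbs = reverse_complement(primer).replace("T", "U")
    template = reverse_complement(edited_flap).replace("T", "U")
    return template, pbs


def synthesized_flap(template_rna: str) -> str:
    """DNA flap obtained by copying the template (5'->3')."""
    return reverse_complement(template_rna.replace("U", "T"))


if __name__ == "__main__":
    strand = "GACCTGAAGCTTGGATCCAT"  # toy non-target strand
    edited = "TTGCATCCAT"            # strand[10:] with one G-to-C change
    template, pbs = design_extension(strand, nick=10, pbs_len=5, edited_flap=edited)
    print("extension arm 5'->3':", template + pbs)
    print("flap:", synthesized_flap(template))
```

Because the flap is the reverse complement of the template, it matches the non-target strand downstream of the nick except at the edited position, which is the homology that lets it displace the endogenous 5′ flap.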
- cleavage site refers to a specific position in between two nucleotides or two base pairs in the double-stranded target DNA sequence. In some embodiments, the position of a nick site is determined relative to the position of a specific PAM sequence.
- the nick site is the particular position where a nick will occur when the double stranded target DNA is contacted with a napDNAbp, e.g., a nickase such as a Cas nickase, that recognizes a specific PAM sequence.
- a nick site (e.g., the “first nick site” when referred to in the context of PE3, PE5 and similar approaches) is characteristic of the particular napDNAbp with which the gRNA core of the PEgRNA associates and is characteristic of the particular PAM required for recognition and function of the napDNAbp.
- a nick site is in the phosphodiester bond between bases three (“-3” position relative to position 1 of the PAM sequence) and four (“-4” position relative to position 1 of the PAM sequence).
- a nick site is in a target strand of the double-stranded target DNA sequence.
- a nick site is in a non-target strand of the double-stranded
- the nick site is in a protospacer sequence. In some embodiments, the nick site is adjacent to a protospacer sequence. In some embodiments, a nick site is downstream of a region, e.g., on a non-target strand, that is complementary to a primer binding site of a PEgRNA. In some embodiments, a nick site is downstream of a region, e.g., on a non-target strand, that binds to a primer binding site of a PEgRNA.
- the nick site is 3 nucleotides upstream of the PAM sequence, and the PAM sequence is recognized by a Streptococcus pyogenes Cas9 nickase, a P. lavamentivorans Cas9 nickase, a C. diphtheriae Cas9 nickase, a N. cinerea Cas9, a S. aureus Cas9, or a N. lari Cas9 nickase.
- the nick site is 3 nucleotides upstream of the PAM sequence, and the PAM sequence is recognized by a Cas9 nickase, wherein the Cas9 nickase comprises a nuclease active HNH domain and a nuclease inactive RuvC domain.
- the nick site is 2 base pairs upstream of the PAM sequence, and the PAM sequence is recognized by a S. thermophilus Cas9 nickase.
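The PAM-relative nick positions described above reduce to simple index arithmetic. The sketch below assumes the PAM is read on the strand supplied (5′→3′), uses an NGG pattern as a default example, and exposes the upstream offset as a parameter (3 or 2, per the embodiments above); `nick_positions` is a hypothetical helper and only scans the one strand it is given.

```python
import re


def nick_positions(strand: str, pam_regex: str = "[ACGT]GG", offset: int = 3):
    """Return (pam_start, nick_index) pairs for every PAM on the given strand.

    The nick lies between strand[nick_index - 1] and strand[nick_index],
    i.e. `offset` nucleotides upstream of position 1 of the PAM.
    """
    results = []
    # A lookahead is used so that overlapping PAM matches are all reported.
    for match in re.finditer(f"(?=(?:{pam_regex}))", strand):
        pam_start = match.start()
        nick_index = pam_start - offset
        if nick_index > 0:  # skip PAMs too close to the 5' end
            results.append((pam_start, nick_index))
    return results


if __name__ == "__main__":
    strand = "ATGCATGCATGCATGCATGCAGGTT"
    for pam_start, nick in nick_positions(strand):
        # Show the strand split at the nick, with the PAM start index.
        print(pam_start, strand[:nick] + "|" + strand[nick:])
```

With `offset=3`, the nick falls between the bases at the -4 and -3 positions relative to the PAM; passing `offset=2` models the embodiment in which the nick is 2 base pairs upstream.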
- the cellular repair of the single-strand DNA flap results in installation of the desired nucleotide change, thereby forming a desired product.
- the desired nucleotide change is installed in an editing window that is between about -5 to +5 of the nick site, or between about -10 to +10 of the nick site, or between about -20 to +20 of the nick site, or between about -30 to +30 of the nick site, or between about -40 to +40 of the nick site, or between about -50 to +50 of the nick site, or between about -60 to +60 of the nick site, or between about -70 to +70 of the nick site, or between about -80 to +80 of the nick site, or between about -90 to +90 of the nick site, or between about -100 to +100 of the nick site, or between about -200 to +200 of the nick site.
- the desired nucleotide change is installed in an editing window that is between about +1 to +2 from the nick site, or about +1 to +3, +1 to +4, +1 to +5, +1 to +6, +1 to +7, +1 to +8, +1 to +9, +1 to +10, +1 to +11, +1 to +12, +1 to +13, +1 to +14, +1 to +15, +1 to +16, +1 to +17, +1 to +18, +1 to +19, +1 to +20, +1 to +21, +1 to +22, +1 to +23, +1 to +24, +1 to +25, +1 to +26, +1 to +27, +1 to +28, +1 to +29, +1 to +30, +1 to +31, +1 to +32, +1 to +33, +1 to +34, +1 to +35, +1 to +36, +1 to +37
- the desired nucleotide change is installed in an editing window that is between about +1 to +2 from the nick site, or about +1 to +5, +1 to +10, +1 to +15, +1 to +20, +1 to +25, +1 to +30, +1 to +35, +1 to +40, +1 to +45, +1 to +50, +1 to +55, +1 to +100, +1 to +105, +1 to +110, +1 to +115, +1 to +120, +1 to +125, +1 to +130, +1 to +135, +1 to +140, +1 to +145, +1 to +150, +1 to +155, +1 to +160, +1 to +165, +1 to +170, +1 to +175, +1 to +180, +1 to +185, +1 to +190, +1 to +195, or +1 to +200, from the nick site.
- the extended guide RNAs are modified versions of an extended guide RNA.
- pegRNAs (i.e., extended guide RNAs) and ngRNAs may be expressed from an encoding nucleic acid, or synthesized chemically. Methods are well known in the art for obtaining or otherwise synthesizing guide RNAs, and for determining the appropriate sequence of the pegRNA, including the protospacer sequence, which interacts and hybridizes with the target strand of a genomic target site of interest.
- the particular design aspects of a pegRNA sequence and ngRNA sequence will depend upon the nucleotide sequence of a genomic target site of interest (i.e., the desired site to be edited), the type of napDNAbp (e.g., Cas9 protein), PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology regions, secondary structures, etc.
- a spacer sequence (i.e., a guide sequence) of a pegRNA or ngRNA can be any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, or Cas9 variant) to the target sequence.
- the degree of complementarity between a guide sequence and its corresponding target sequence when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more.
- Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net).
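As a rough illustration of "degree of complementarity", the sketch below scores the simplest case only: an ungapped, end-to-end pairing of a guide RNA against a target DNA strand of the same length. It is not an implementation of Smith-Waterman, Needleman-Wunsch, or any of the named tools, which also handle gaps and partial overlaps; the function name is hypothetical.

```python
# Watson-Crick partners for an RNA guide base pairing with a DNA target base.
_RNA_DNA_PAIR = {"A": "T", "C": "G", "G": "C", "U": "A"}


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    """Percent of guide positions that pair with the target strand.

    Both sequences are written 5'->3'; pairing is antiparallel, so the
    guide is compared against the target read in reverse. Ungapped only.
    """
    if not guide_rna or len(guide_rna) != len(target_dna):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        _RNA_DNA_PAIR.get(g) == t
        for g, t in zip(guide_rna, reversed(target_dna))
    )
    return 100.0 * paired / len(guide_rna)


if __name__ == "__main__":
    guide = "GACCUGAAGCUUGGAUCCAU"
    perfect_target = "ATGGATCCAAGCTTCAGGTC"   # fully complementary strand
    one_mismatch = "CTGGATCCAAGCTTCAGGTC"     # first base changed
    print(percent_complementarity(guide, perfect_target))
    print(percent_complementarity(guide, one_mismatch))
```

A single mismatch in a 20-nucleotide guide corresponds to 95% complementarity on this measure, which gives a sense of scale for the thresholds recited above.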
- a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a prime editor to a target sequence may be assessed by any suitable assay.
- the components of a prime editor including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of a prime editor disclosed herein, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein.
- cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a prime editor, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions.
- Other assays are possible, and will occur to those skilled in the art.
- a guide sequence may be selected to target any target sequence.
- the target sequence is a sequence within a genome of a cell.
- Exemplary target sequences include those that are unique in the target genome.
- a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGG where NNNNNNNNNNXGG (N is A, G, T, or C; and X can be anything).
- a unique target sequence in a genome may include an S.
- a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXAGAAW where NNNNNNNNNNXXAGAAW (N is A, G, T, or C; X can be anything; and W is A or T).
- a unique target sequence in a genome may include an S.
- a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGGXG where NNNNNNNNNNXGGXG (N is A, G, T, or C; and X can be anything).
- a unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNNNXGGXG where NNNNNNNNNNNXGGXG (N is A, G, T, or C; and X can be anything).
- M may be A, G, T, or C, and need not be considered in identifying a sequence as unique.
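A uniqueness screen of this kind can be sketched with a regular expression. The example below assumes the N(12)-XGG form, treats the 12 N positions as the part that must be unique (M and X are ignored, X being a wildcard), and scans both strands of a plain DNA string; the function names are hypothetical and a real genome would need indexed search rather than a regex pass.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(_COMPLEMENT)[::-1]


def find_sites(genome: str, seed_len: int = 12):
    """Return (strand, start, seed) for every seed followed by an XGG PAM.

    For the '-' strand, `start` is an index into the reverse complement.
    """
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{seed_len}}})[ACGT]GG)")
    sites = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for match in pattern.finditer(seq):
            sites.append((strand, match.start(), match.group(1)))
    return sites


def unique_sites(genome: str, seed_len: int = 12):
    """Keep only sites whose seed occurs once among all PAM-adjacent seeds."""
    sites = find_sites(genome, seed_len)
    counts = Counter(seed for _, _, seed in sites)
    return [site for site in sites if counts[site[2]] == 1]


if __name__ == "__main__":
    print(unique_sites("ACGTACGTACGTAGG"))
    # The same seed next to two PAMs is no longer unique.
    print(unique_sites("ACGTACGTACGTAGG" + "ACGTACGTACGTTGG"))
```

Other PAM forms recited above (for example XXAGAAW) could be screened the same way by swapping the PAM portion of the pattern.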
- a guide sequence is selected to reduce the degree of secondary structure within the guide sequence. Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy.
- the scaffold or gRNA core portion of a pegRNA comprises sequences corresponding to the tracr sequence and tracr mate sequence of a traditional guide RNA.
- a tracr mate sequence includes any sequence that has sufficient complementarity with a tracr sequence to promote one or more of: (1) excision of a guide sequence flanked by tracr mate sequences in a cell containing the corresponding tracr sequence; and (2) formation of a complex at a target sequence, wherein the complex comprises the tracr mate sequence hybridized to the tracr sequence.
- degree of complementarity is with reference to the optimal alignment of the tracr mate sequence and tracr sequence, along the length of the shorter of the two sequences.
- Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the tracr sequence or tracr mate sequence.
- the degree of complementarity between the tracr sequence and tracr mate sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher.
- the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length.
- the tracr sequence and tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin.
- Preferred loop forming sequences for use in hairpin structures are four nucleotides in length, and most preferably have the sequence GAAA. However, longer or shorter loop sequences may be used, as may alternative sequences.
- the sequences preferably include a nucleotide triplet (for example, AAA), and an additional nucleotide (for example C or G). Examples of loop forming sequences include CAAA and AAAG.
- the transcript or transcribed polynucleotide sequence has two or more hairpins. In preferred embodiments, the transcript has two, three, four or five hairpins. In a further embodiment of the invention, the transcript has at most five hairpins.
- the single transcript further includes a transcription termination sequence; preferably this is a polyT sequence, for example six T nucleotides.
- single polynucleotides comprising a guide sequence, a tracr mate sequence, and a tracr sequence are as follows (listed 5′ to 3′), where “N” represents a base of a
- the first block of lower case letters represent the tracr mate sequence
- the second block of lower case letters represent the tracr sequence
- the final poly-T sequence represents the transcription terminator: (1)NNNNNNNNGTTTTTGTACTCTCAAGATTTAGAAATAAATCTTGCAGAAGCTACAAA GATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTCGTTAT TTAATTTTTT (SEQ ID NO: 113); (2)NNNNNNNNNNNNNNNNNNNNNNNNGTTTTTGTACTCTCAGAAATGCAGAAGCTACAAAGAT AAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTCGTTATTTA ATTTTTTTT (SEQ ID NO: 114); (3)NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTTTTTGTACTCTCAGAAATGCAGAAGCTACAAA GATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTTTTTTTTTTTTTTTTTTT
- sequences (1) to (3) are used in combination with Cas9 from S. thermophilus CRISPR1.
- sequences (4) to (6) are used in combination with Cas9 from S. pyogenes.
- the tracr sequence is a separate transcript from a transcript comprising the tracr mate sequence.
- a target site e.g., a site at which a recombinase recognition sequence is to be introduced
- a guide RNA e.g., an sgRNA
- a guide RNA typically comprises a tracrRNA framework allowing for Cas9 binding, and a guide sequence, which confers sequence specificity to the Cas9:nucleic acid editing enzyme/domain fusion protein.
- a pegRNA comprises a structure 5′-[guide sequence]-GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU (SEQ ID NO: 119)-extension arm-3′, wherein the guide sequence comprises a sequence that is complementary to the target sequence.
- the guide sequence also referred to herein as the spacer sequence, is typically 20 nucleotides long.
- RNA sequences typically comprise guide sequences that are complementary to a nucleic acid sequence within 50 nucleotides upstream or downstream of the target nucleotide to be edited.
- Some exemplary guide RNA sequences suitable for targeting any of the provided fusion proteins to specific target sequences are provided herein. Additional guide sequences are well known in the art and can be used with the prime editors utilized in the methods and compositions described herein.
- a PEgRNA comprises three main component elements ordered in the 5′ to 3′ direction, namely: a spacer, a gRNA core, and an extension arm at the 3′ end.
- the extension arm may further be divided into the following structural elements in the 5′ to 3′ direction, namely: an edit template, a homology arm, and a primer binding site. In some embodiments, the extension arm may further be divided into the following structural elements in the 5′ to 3′ direction, namely: a homology arm, an edit template, and a primer binding site. In some embodiments, the extension arm may further be divided into the following structural elements in the 5′ to 3′ direction, namely: a DNA synthesis template (e.g., a RT template), and a primer binding site.
- the PEgRNA may comprise an optional 3′ end modifier region and an optional 5′ end modifier region.
- the PEgRNA may comprise a transcriptional termination signal at the 3′ end of the PEgRNA.
- These structural elements are further defined herein. The depiction of the structure of the PEgRNA is not meant to be limiting and embraces variations in the arrangement of the elements. For example, the optional sequence modifiers could be positioned within or between any of the other regions shown, and are not limited to being located at the 3′ and 5′ ends.
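The 5′ to 3′ ordering of the PEgRNA elements described above can be captured in a small data structure. This is a sketch under stated assumptions: the class and field names are hypothetical, only the spacer, gRNA core, DNA synthesis template and primer binding site are modeled (no optional modifier regions or terminator), and the scaffold used in the example is a short toy placeholder, not a functional gRNA core.

```python
from dataclasses import dataclass

_RNA_ALPHABET = set("ACGU")


@dataclass(frozen=True)
class PegRNA:
    """PEgRNA elements, each written 5'->3' as an RNA string."""
    spacer: str
    scaffold: str              # gRNA core
    synthesis_template: str    # DNA synthesis (e.g., RT) template
    primer_binding_site: str

    def __post_init__(self):
        for name in ("spacer", "scaffold", "synthesis_template", "primer_binding_site"):
            value = getattr(self, name)
            if not value or set(value) - _RNA_ALPHABET:
                raise ValueError(f"{name} must be a non-empty RNA sequence (A, C, G, U)")

    @property
    def extension_arm(self) -> str:
        # 3' extension arm: synthesis template followed by primer binding site.
        return self.synthesis_template + self.primer_binding_site

    @property
    def sequence(self) -> str:
        # Full molecule in 5'->3' order: spacer, gRNA core, extension arm.
        return self.spacer + self.scaffold + self.extension_arm


if __name__ == "__main__":
    peg = PegRNA(
        spacer="GACCUGAAGCUUGGAUCCAU",
        scaffold="GUUUUAGAGCUA",          # toy placeholder, not a real core
        synthesis_template="AUGGAUGCAA",
        primer_binding_site="GCUUC",
    )
    print(peg.extension_arm)
    print(peg.sequence)
```

Keeping the elements as separate fields makes it straightforward to model the alternative arrangements mentioned above, for example by adding optional modifier fields or reordering how `sequence` concatenates them.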
- PEgRNA modifications
- the PEgRNAs may also include additional design modifications that may alter the properties and/or characteristics of PEgRNAs, thereby improving the efficacy of prime editing.
- these modifications may belong to one or more of a number of different categories, including but not limited to: (1) designs to enable efficient expression of functional PEgRNAs from non-polymerase III (pol III) promoters, which would enable the expression of longer PEgRNAs without burdensome sequence requirements; (2) modifications to the core, Cas9-binding PEgRNA scaffold, which could improve efficacy; (3) modifications to the PEgRNA to improve RT processivity, allowing the insertion of longer sequences at targeted genomic loci; and (4) addition of RNA motifs to the 5′ or 3′ termini of the PEgRNA that improve PEgRNA stability, enhance RT processivity, prevent misfolding of the PEgRNA, or recruit additional factors important for genome editing.
- compositions comprising any of the evolved Bxb1 recombinases, guide RNAs (including, e.g., PEgRNAs and ePEgRNAs), prime editors, and polynucleotides described herein.
- the term “pharmaceutical composition”, as used herein, refers to a composition formulated for pharmaceutical use.
- the pharmaceutical composition further comprises a pharmaceutically acceptable carrier.
- the pharmaceutical composition comprises additional agents (e.g., for specific delivery, increasing half-life, or other therapeutic compounds).
- the term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body, to another site (e.g., an organ, tissue, or other part of the body).
- a pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.).
- materials that can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients, such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil, and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum component, such as serum album
- the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing.
- Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseus, periocular, intratumoral, intracerebral, and intracerebroventricular administration.
- the pharmaceutical composition described herein is administered locally to a diseased site (e.g., tumor site).
- the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a silastic membrane, or a fiber.
- the pharmaceutical composition described herein is delivered in a controlled release system.
- a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Ref. Biomed.
- polymeric materials can be used.
- Polymeric materials See, e.g., Medical Applications of Controlled Release (Langer and Wise eds., CRC Press, Boca Raton, Fla., 1974); Controlled Drug Bioavailability, Drug
- the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human.
- compositions for administration by injection are solutions in sterile isotonic aqueous buffer.
- the pharmaceutical composition can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection.
- the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water free concentrate in a hermetically sealed container such as an ampoule or sachette indicating the quantity of active agent.
- where the pharmaceutical composition is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline.
- a pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer’s or Hank’s solution.
- the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.
- the pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration.
- the particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein.
- Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE), low levels (5-10 mol%) of cationic lipid and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther.1999, 6:1438-47).
- lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles.
- the preparation of such lipid particles is well known. See, e.g., U.S.
- compositions described herein may be administered or packaged as a unit dose, for example.
- the term “unit dose,” when used in reference to a pharmaceutical composition of the present disclosure, refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., carrier or vehicle.
- the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a compound of the invention in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection.
- the pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compound of the invention.
- Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use, or sale for human administration.
- an article of manufacture containing materials useful for the treatment of the diseases described above is included.
- the article of manufacture comprises a container and a label.
- Suitable containers include, for example, bottles, vials, syringes, and test tubes.
- the containers may be formed from a variety of materials such as glass or plastic.
- the container holds a composition that is effective for treating a disease and may have a sterile access port.
- the container may be an intravenous solution bag or a vial having a stopper pierce-able by a hypodermic injection needle.
- the active agent in the composition is a compound of the invention.
- the label on or associated with the container indicates that the composition is used for treating the disease of choice.
- the article of manufacture may further comprise a second container comprising a pharmaceutically-acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.
- the present disclosure provides polynucleotides and vectors encoding any of the Bxb1 recombinases described herein. In some aspects, the present disclosure provides polynucleotides and vectors encoding prime editors and pegRNAs as disclosed herein. In some embodiments, the polynucleotides and vectors provided herein comprise DNA. In some embodiments, the polynucleotides and vectors provided herein comprise RNA. In some embodiments, the polynucleotides and vectors provided herein consist of DNA.
- Kits. The evolved Bxb1 recombinases, guide RNAs (including pegRNAs and epegRNAs), prime editors, and compositions of the present disclosure may be assembled into kits.
- the kit comprises polynucleotides for expression of the evolved Bxb1 recombinases, prime editors, pegRNAs, and/or epegRNAs described herein.
- the kit further comprises appropriate guide nucleotide sequences or nucleic acid vectors for the expression of such guide nucleotide sequences, to target the Cas9 protein of the prime editors to the desired target sequence, e.g., to introduce a recombinase recognition sequence at the target site.
- kits described herein may include one or more containers housing components for performing the methods described herein, and optionally instructions for use. Any of the kits described herein may further comprise components needed for performing the prime editing methods described herein.
- Each component of the kits, where applicable, may be provided in liquid form (e.g., in solution) or in solid form (e.g., a dry powder). In certain cases, some of the components may be reconstitutable or otherwise processible (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water), which may or may not be provided with the kit.
- the kits may optionally include instructions and/or promotion for use of the components provided.
- instructions can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc.
- the written instructions may be in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological
- kits may include other components depending on the specific application, as described herein.
- the kits may contain any one or more of the components described herein in one or more containers.
- the components may be prepared sterilely, packaged in a syringe, and shipped refrigerated. Alternatively, they may be housed in a vial or other container for storage.
- kits may include the active agents premixed and shipped in a vial, tube, or other container.
- the kits may have a variety of forms, such as a blister pouch, a shrink-wrapped pouch, a vacuum sealable pouch, a sealable thermoformed tray, or a similar pouch or tray form, with the accessories loosely packed within the pouch, one or more tubes, containers, a box, or a bag.
- the kits may be sterilized after the accessories are added, thereby allowing the individual accessories in the container to be otherwise unwrapped.
- the kits can be sterilized using any appropriate sterilization techniques, such as radiation sterilization, heat sterilization, or other sterilization methods known in the art.
- kits may also include other components, depending on the specific application, for example, containers, cell media, salts, buffers, reagents, syringes, needles, a fabric, such as gauze, for applying or removing a disinfecting agent, disposable gloves, a support for the agents prior to administration, etc.
- kits comprising a nucleic acid construct comprising a nucleotide sequence encoding the prime editor systems described herein, or various components thereof (e.g., including, but not limited to, the napDNAbps, reverse transcriptase domains, and pegRNAs/epegRNAs).
- the nucleotide sequence(s) comprises a heterologous promoter (or more than a single promoter) that drives expression of the prime editor system components.
- kits comprising one or more nucleic acid constructs encoding the various components of the prime editing system described herein.
- the nucleotide sequence comprises a heterologous promoter that drives expression of the prime editing system components.
- Cells that may contain any of the recombinases, guide RNAs, prime editors, and/or compositions described herein include prokaryotic cells and eukaryotic cells.
- the methods described herein may be used to deliver a recombinase and a prime editor and guide RNA into a eukaryotic cell (e.g., a mammalian cell, such as a human cell).
- the cell is in vitro (e.g., a cultured cell).
- the cell is in vivo (e.g., in a subject, such as a human subject).
- the cell is ex vivo (e.g., isolated from a subject and may be administered back to the same or a different subject).
- Mammalian cells of the present disclosure include human cells, primate cells (e.g., Vero cells), rat cells (e.g., GH3 cells, OC23 cells) or mouse cells (e.g., MC3T3 cells).
- human cell lines including, without limitation, human embryonic kidney (HEK) cells, HeLa cells, cancer cells from the National Cancer Institute's 60 cancer cell lines (NCI60), DU145 (prostate cancer) cells, LNCaP (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-438 (breast cancer) cells, PC3 (prostate cancer) cells, T47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, SHSY5Y human neuroblastoma cells (cloned from a myeloma) and Saos-2 (bone cancer) cells.
- prime editors and/or guide RNAs are delivered into human embryonic kidney (HEK) cells (e.g., HEK293 or HEK293T cells).
- prime editors and/or guide RNAs are delivered into stem cells (e.g., human stem cells) such as, for example, pluripotent stem cells (e.g., human pluripotent stem cells including human induced pluripotent stem cells (hiPSCs)).
- stem cell refers to a cell with the ability to divide for indefinite periods in culture and to give rise to specialized cells.
- a pluripotent stem cell refers to a type of stem cell that is capable of differentiating into all tissues of an organism, but not alone capable of sustaining full organismal development.
- a human induced pluripotent stem cell refers to a somatic (e.g., mature or adult) cell that has been reprogrammed to an embryonic stem cell-like state by being forced to express genes and factors important for maintaining the defining properties of embryonic stem cells (see, e.g., Takahashi and Yamanaka, Cell 126 (4): 663–76, 2006, incorporated by reference herein).
- Human induced pluripotent stem cells express stem cell markers and are capable of generating cells characteristic of all three germ layers (ectoderm, endoderm, mesoderm).
- a host cell is transiently or non-transiently transfected with one or more vectors described herein.
- a cell is transfected as it naturally occurs in a subject.
- a cell that is transfected is taken from a subject.
- the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art.
- cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB
- Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)).
- a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences.
- a cell transiently transfected with the components of a CRISPR system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of a CRISPR complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence.
- LSRs Large serine recombinases
- FIG.1 The biggest barrier preventing the use of LSRs for large gene insertion is their low DNA integration efficiency.
- the two circuits are shown in FIGs.2A-2B.
- M13 bacteriophage survival was linked with the successful integration of a promoter sequence in front of gIII, a gene necessary for phage infectivity and detachment.
- seven individual evolution campaigns were performed (FIG.3), each consisting of multiple rounds of selection, and many convergent mutations were observed (FIG.4).
- the variants from these evolutions were then cloned into mammalian expression vectors and transfected into HEK293T cells to test their performance.
- the variants were tested in cell lines that have the Bxb1 attachment sites installed in them (FIGs.5, 12, 13, and 14) as well as in the one-pot system (FIGs.6, 8, and 10).
- the one-pot system refers to the transfection of prime editor, pegRNAs, recombinase, and donor DNA at the same time.
- the prime editor first installs the Bxb1 attachment site into the genomic locus of interest.
- the recombinase then integrates the desired DNA cargo. This method allows for precise, programmable gene integration in mammalian cells.
- WT wild-type
- Bxb1 activity by greater than 3-fold.
- the activity of Bxb1 was also further improved by rationally combining mutations from the evolution (FIGs.16, 17, and 18). Rationally combined variants with improved activity are listed in FIG.19.
- plasmid dosage and ratios were also optimized, and new-generation prime editors were tested (FIGs.20, 21, 24, and 25). These optimized systems offer ⁇ 20% integration efficiency in the one-pot system.
- the RT templates of the dual pegRNAs used to install the attachment site were also optimized to reduce background recombination that occurs between the pegRNA and donor plasmids (FIGs.22 and 23).
- Bxb1 variants with improved integration efficiencies were generated, and these variants can be harnessed for applications including programmable gene integration and other targeted genetic changes (e.g., DNA inversion) in mammalian cells.
- Example 2. Phage-assisted evolution of Bxb1 recombinase enhances prime editing-mediated programmable large-gene integration in mammalian cells
- [0487] Mutations that contribute to diseases in humans range from single nucleotide changes to large deletions, inversions, translocations, and duplications 1-3. For many genetic diseases, a variety of loss-of-function mutations within a specific gene can cause pathogenesis: for instance, more than 500 ABCA4 gene variants, 1,000 PAH gene variants, and 2,000 CFTR gene variants have been reported in patients with Stargardt disease, phenylketonuria, and cystic fibrosis, respectively 4-6.
- the ability to integrate full-length healthy genes or cDNAs into their endogenous loci in principle could serve as a single therapeutic strategy to serve patients with many different pathogenic alleles. Integration into the native locus could preserve physiological control of gene expression, evading overexpression issues associated with viral vector-mediated gene therapy that can induce pathology 7-10 .
- CRISPR-associated transposase systems show great promise for programmable integration but currently suffer from low efficiency in mammalian cells ( ⁇ 2% genomic integration for Type-I CAST systems 21,22 , while Type-V-K CAST systems have thus far not achieved integration into mammalian cell genomic DNA targets 23,24 ).
- PASSIGE prime editing-assisted site-specific integrase gene editing
- PASSIGE can be performed with a single-transfection by simultaneously delivering the prime editor, pegRNA(s), nicking sgRNA, donor DNA plasmid, and recombinase (LSR), or using two successive transfections to perform the prime editing step and recombination at different times.
- evoBxb1 the best evolved variant, evoBxb1
- evoBxb1 achieved a 3.2-fold average improvement in genomic integration efficiencies in human cells at pre-installed recombinase attachment sites.
- Evolved mutations from different domains were also combined to generate an even more active variant, eeBxb1.
- The use of PASSIGE with evoBxb1 or eeBxb1 is referred to herein as evoPASSIGE or eePASSIGE, respectively.
- Circuit 2 has one recombinase attachment site present in each plasmid, and a single Bxb1-mediated recombination event integrates plasmids P1 and P2, also resulting in placement of the promoter upstream of gene III. It was anticipated that circuit 1 would be more stringent than circuit 2, as two recombination events are required for phage propagation in circuit 1.
- [0495] To identify the best evolution strategy to evolve Bxb1, four different sub-circuits (1.1-1.4) were established for circuit 1 and two different sub-circuits (2.1-2.2) were established for circuit 2 (FIG.31).
- In sub-circuits 1.1, 1.2, 2.1, and 2.2, attB, one DNA landing site substrate for Bxb1, was placed in plasmid P1, and attP, the partner DNA landing site substrate for Bxb1, was placed in plasmid P2.
- In sub-circuits 1.3 and 1.4, attP was instead placed in plasmid P1, and attB was placed in plasmid P2.
- the central dinucleotide of the attachment site for the Bxb1 recombinase can either be GT or GA 34 .
- sub-circuits 1.2, 1.4, and 2.2 contained a GA instead of the canonical GT central dinucleotide.
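The sub-circuit layout described above can be summarized as a small lookup table. The following Python sketch is illustrative only: the class and field names are hypothetical, and it assumes that sub-circuits not listed as containing GA use the canonical GT central dinucleotide, as the text implies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SubCircuit:
    """One selection sub-circuit (field names are illustrative, not from the source)."""
    name: str
    recombination_events: int  # events required for phage propagation
    p1_site: str               # attachment site carried on plasmid P1
    p2_site: str               # attachment site carried on plasmid P2
    central_dinucleotide: str  # "GT" (canonical) or "GA"


def build_sub_circuits() -> List[SubCircuit]:
    """Encode the six sub-circuits as described in the text."""
    attp_in_p1 = {"1.3", "1.4"}        # attP on P1, attB on P2
    ga_circuits = {"1.2", "1.4", "2.2"}  # GA instead of canonical GT
    circuits = []
    for name in ["1.1", "1.2", "1.3", "1.4", "2.1", "2.2"]:
        # Circuit 1 needs two recombination events; circuit 2 needs one.
        events = 2 if name.startswith("1.") else 1
        p1 = "attP" if name in attp_in_p1 else "attB"
        p2 = "attB" if p1 == "attP" else "attP"
        dinuc = "GA" if name in ga_circuits else "GT"
        circuits.append(SubCircuit(name, events, p1, p2, dinuc))
    return circuits


SUB_CIRCUITS = {c.name: c for c in build_sub_circuits()}
```

Looking up `SUB_CIRCUITS["1.4"]`, for example, gives the most divergent design: two required recombination events, attP on P1, and a GA central dinucleotide.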
- Selection stringency was further increased by decreasing gIII expression, and PANCE was continued for four additional passages (FIG.27A and FIGs.32A-32C).
- the resulting phage pools emerging from PANCE were evolved for an additional 132 hours using PACE (FIG.32B).
- the selection stringency was increased in the lagoons by increasing the flow rate from 0.5 to 3.0 volume/hour.
- the phage pools surviving PACE were subjected to six additional passages of PANCE on a more stringent circuit in which the size of the P1 donor plasmid was increased from 3.2 kB to 6.5 kB (FIG.32C).
- NTD N-terminal domain
- CTD-a C-terminal domain-a
- CTD-b C-terminal domain-b
- the top 15 performing variants all contained a mutation in the NTD, and upon docking the DNA substrate of gammadelta resolvase tetramer 37 (PDB: 1ZR4) onto the predicted structure of the NTD, it was realized that most of these mutations are predicted to be in flexible loops of the enzyme, close to the active site and the DNA substrate (FIG.33B). All mutated residues that were present in the flexible regions of the enzyme were also surface-exposed (FIG.33C). Other conserved mutations that led to the highest improvements in integration efficiencies, including V105I, L29F, V74A, and W35L, were all clustered at the core of the protein (FIG.27E).
- HEK293T cells were co-transfected with plasmids encoding PEmax and twinPE pegRNAs with overlap lengths ranging from 8 bp to 50 bp for attP, and from 12 bp to 38 bp for attB installation. It was found that 3' flap overlap lengths could be truncated up to 28-bp for attP and 20-bp for attB installation without any
- HEK293T cells were co-transfected with a 5.6-kB donor DNA plasmid along with plasmids encoding either WT Bxb1 or an evolved Bxb1 variant, PEmax, and dual pegRNAs for twinPE 25. After 72 hours, integration efficiencies were assessed using ddPCR.
- HEK293T cells were co-transfected with either dead Bxb1, WT Bxb1, evoBxb1, or eeBxb1 along with an attP- or attB-containing donor DNA plasmid encoding mCherry. Post-transfection, cells were passaged for two weeks, and then flow cytometry was performed to assess the percentage of mCherry+ cells.
- PASSIGE variants were compared side-by-side with PASTE, a similar technology reported to have improved integration efficiencies over PASSIGE.
- PASTE differs from PASSIGE by using 1) a pegRNA scaffold mutant previously described by Wu and coworkers 42 (atgRNAv2), 2) a different linker (an XTEN-48 linker) between the Cas9 and reverse transcriptase (RT), 3) addition of the L139P mutation that was previously characterized 26 to the engineered M-MLV RT in PE2, and 4) a mutated attP sequence 27 .
- the atgRNAv2 scaffold slightly improved integration on a case-by-case basis (FIG.37A), but the Cas9–RT linker, L139P mutation in the M-MLV RT, and attP mutant reduced integration efficiencies across all sites tested (FIGs.37B-37C, 37E).
- fusing the recombinase to the prime editor protein was reported to substantially improve integration 27 .
- WT Bxb1, evoBxb1, or eeBxb1 were fused to PEmax
- a substantial decrease in integration efficiencies was observed in all cases from fused prime editor–recombinase compared to unfused prime editor + recombinase (FIG.37D). This trend persisted when the recombinase was replaced in PASTE with Bxb1 variants generated in this Example (FIG.37F).
- ALB, a highly expressed gene in the liver that has been used to express clinically relevant protein levels for loss-of-function diseases 30,45
- B2M and TRAC which have been used to express chimeric antigen receptors for CAR-T cell therapy 46
- CFTR, GBA1, COL7A1, FANCA, and Smn1, implicated in cystic fibrosis 4, Gaucher disease 47, Parkinson’s disease 48, dystrophic epidermolysis bullosa 49, Fanconi anemia 50, and spinal muscular atrophy 51.
- PegRNAs were designed and PE6 variants were tested to install both attB and attP into each locus (FIG.38).
- the attachment sites were installed into the 5' untranslated region surrounding the start codon, since disruption of these genes increases therapeutic potency 46 .
- the attachment sites were installed into intron 4 of the gene as the majority of disease-causing mutations are located after exon 4 49 .
- the attachment site was installed into intron 1.
- attB was installed into intron 1 of ALB, and the F9 cDNA (Δexon 1) was then integrated using PASSIGE variants along with a minicircle DNA donor that is free of bacterial DNA sequences 54.
- the F9 minicircle encoded an attP motif and a splice acceptor, followed by the F9 cDNA (Δexon 1) and 3' UTR sequence.
- splicing between the secretion signal of ALB exon 1 and the integrated F9 cDNA leads to F9 expression and release 25,45 .
- ELISA was performed on conditioned media to detect human F9 expression 9 days after transfection. Average F9 levels of 0.79, 4.0, and 6.9 ng/mL were observed upon
- the evolved mutations may help improve integration by improving enzyme solubility, catalysis, and/or attachment site binding.
- the evoBxb1 variant (V74A) demonstrates a 2.7-fold average improvement in donor integration across 12 sites from a single catalytic domain mutation, while the eeBxb1 variant (V74A, E229K, V375I) that was generated by rationally combining evolved mutations from distinct domains of the enzyme demonstrates a 4.2-fold average improvement over PASSIGE. No off-target integration was detected for evoBxb1 or eeBxb1 when delivering an attB-containing donor.
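The variant definitions above use standard point-mutation notation (wild-type residue, position, mutant residue). As a minimal sketch, the Python below parses that notation and lists which mutations eeBxb1 adds on top of evoBxb1. The function and variable names are hypothetical; only the mutation lists themselves come from the text.

```python
import re
from typing import List, NamedTuple


class PointMutation(NamedTuple):
    wild_type: str  # one-letter code of the original residue
    position: int   # 1-based residue position
    mutant: str     # one-letter code of the substituted residue


_MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label: str) -> PointMutation:
    """Parse a label such as 'V74A' into its three parts."""
    match = _MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return PointMutation(wild_type, int(position), mutant)


# Mutation sets as stated in the text.
VARIANTS = {
    "evoBxb1": ["V74A"],
    "eeBxb1": ["V74A", "E229K", "V375I"],
}


def added_mutations(base: str, derived: str) -> List[str]:
    """Mutations present in `derived` but not in `base`, in listed order."""
    base_set = set(VARIANTS[base])
    return [m for m in VARIANTS[derived] if m not in base_set]
```

Calling `added_mutations("evoBxb1", "eeBxb1")` returns the two mutations combined with V74A to produce eeBxb1.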
- Evo- and ee-PASSIGE show improvements in multi-kB targeted DNA integration at all 12 genomic loci tested in three mammalian cell lines, can efficiently integrate cDNA cassettes into six therapeutically relevant endogenous genomic sites in human cells, and can integrate gene cargoes that produce functional protein.
- PASSIGE variants show large improvements over other programmable gene integration methods including PASTE, with evo- and ee-PASSIGE offering a 9.1-fold and 16.2-fold average improvement in integration across all 12 sites tested in human and mouse cells, respectively. Indeed, PASTE did not outperform PASSIGE at any site tested in this study and exhibited on average 3.3-fold lower donor knock-in. PASTE also installed recombinase landing sites 2.0-fold less efficiently on average than PEmax, the prime editor typically used in PASSIGE, across seven genomic sites tested (FIG.38).
- fragments were obtained from PCR amplification, plasmid vector digestion, or synthetic gene fragments and assembled using NEBuilder HiFi DNA assembly master mix (New England Biolabs). PCR was performed using Phusion U Hot Start II DNA polymerase (Thermo Fisher Scientific), Phusion U Green Multiplex PCR Master Mix (Thermo Fisher Scientific), or Q5 Hot Start High-Fidelity 2× Master Mix (New England Biolabs). DNA oligonucleotides were obtained from either Integrated DNA Technologies (IDT) or Eton-Biosciences. Synthetic gene fragments were obtained from either IDT or GenScript. Plasmids for mammalian expression of Bxb1 variants were cloned into the pCMV-Bxb1 vector backbone (Addgene, #182142). Plasmids expressing pegRNAs were cloned by assembling PCR-amplified pegRNA backbone (forward primer: 5'-GCTCGAGGTACCTCTCTA-3' (SEQ ID NO: 126), reverse primer: 5'-GAAATACTTTCAAGTTACGG-3' (SEQ ID NO: 127)) or BsaI-digested pegRNA backbone (Addgene, #132777) and pegRNA-encoding eBlocks ordered from IDT.
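When transcribing primers such as the two pegRNA backbone primers above, a quick sanity check of length, GC content, and reverse complement can catch copy errors. The sketch below is a generic helper, not part of the described method; the function names are hypothetical, and only the two primer sequences are taken from the text.

```python
# Primer sequences as given in the text (SEQ ID NO: 126 and 127).
FORWARD_PRIMER = "GCTCGAGGTACCTCTCTA"
REVERSE_PRIMER = "GAAATACTTTCAAGTTACGG"

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    for name, primer in [("forward", FORWARD_PRIMER), ("reverse", REVERSE_PRIMER)]:
        print(name, len(primer), f"{gc_fraction(primer):.2f}", reverse_complement(primer))
```

The forward primer is 18 nt and the reverse primer is 20 nt; a `KeyError` from `reverse_complement` would flag a non-ACGT character introduced during transcription.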
- DNA donor plasmids for mammalian cell experiments were cloned by assembling PCR-amplified fragments or synthetic gene fragments into either Factor IX donor vector backbone (Addgene, #182141) or attB-puro donor vector backbone (Addgene, #181923) digested by restriction enzymes. All prime editor variants used in this study (PEmax, PE6b-d) are available on Addgene (#174820, #207852-207854). Constructs for PASTE experiments were obtained from Addgene: PASTE v3 (#179105), PASTE DNA donor plasmid (#179115), ACTB atgRNA (#179108), and ACTB nicking sgRNA (#179109).
- HEK293T cells American Type Culture Collection (ATCC) CRL-3216
- N2A cells ATCC, CCL-131
- HuH7 cells originated from ATCC
- HEK293T clonal cell lines with either pre-installed attP at AAVS1 or attB at CCR5 were cultured in Dulbecco’s Modified Eagle Medium (DMEM) plus GlutaMAX (Thermo Fisher Scientific) supplemented with 10% (v/v) fetal bovine serum (FBS) (Thermo Fisher Scientific).
- Phage plaquing
- Plaque assays were performed to check for phage that cheat the selection (for example by integrating gIII onto the SP), to measure phage titers, and for bacteriophage cloning. An overnight culture of host cells was diluted by 50-fold in Davis Rich Medium (DRM) with carbenicillin and grown at 37 °C with shaking at 225 rpm until OD600 reached 0.3-0.8. Phage were serially diluted by a factor of 10 in water, up to 10^6-fold, and four different dilutions were then chosen for plaquing. Plates for plaquing were made by pipetting approximately 1 mL of molten 2× YT agar mixed with 0.04% Bluo-gal (Gold Biotechnologies) into a 12-well plate (Corning). Top agar was made by combining 2× YT media and agar (2:1 ratio) and stored at 55 °C until use.
- To plaque, 100 µL of host cells, 10 µL of serially diluted phage, and 500 µL of top agar were mixed and quickly added onto the solid agar in the 12-well plate. After the top agar solidified, plates were incubated overnight at 37 °C.
- Preparation and transformation of chemically competent cells
- [0526] Strain S2060 was used for all evolution experiments.
- an overnight culture of bacteria was diluted by 50-fold into 30 mL of 2× YT media with the appropriate antibiotics and grown at 37 °C with shaking at 225 rpm until OD600 reached 0.3-0.4.
- the cells were centrifuged for 10 minutes at 4,000 g at 4 °C, and the pellet was resuspended in 3 mL of cold TSS media (LB media supplemented with 5% v/v DMSO, 10% w/v PEG 3350, and 20 mM MgCl2) on ice.
- the resuspended cells were aliquoted into 100 µL volumes, frozen in dry ice, and stored at -80 °C.
- 1-5 µL of each plasmid, 20 µL of 5× KCM solution (500 mM KCl, 150 mM CaCl2, and 250 mM MgCl2), 100 µL of chemically competent cells, and 80 µL of water were mixed and incubated on ice for 10 minutes. Cells were heat-shocked at 42 °C for 90 seconds, then 1 mL of SOC media (New England Biolabs) was added for recovery. Cells were recovered at 37 °C with shaking at 225 rpm for 1-2 hours before plating.
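The salt concentrations in the transformation mix follow from simple dilution arithmetic (C1·V1 = C2·V2). The sketch below works this out for the stated volumes; the helper name and the `plasmid_volume_uL` parameter are hypothetical, and the calculation treats the "5× KCM" stock concentrations as given in the text.

```python
from typing import Dict

# Stock composition as stated in the text, in mM.
KCM_STOCK_MM = {"KCl": 500.0, "CaCl2": 150.0, "MgCl2": 250.0}


def final_concentrations_mM(
    stock_mM: Dict[str, float],
    stock_volume_uL: float,
    other_volumes_uL: float,
    plasmid_volume_uL: float = 0.0,
) -> Dict[str, float]:
    """Dilute a stock into a mix: C_final = C_stock * V_stock / V_total."""
    total_uL = stock_volume_uL + other_volumes_uL + plasmid_volume_uL
    return {salt: conc * stock_volume_uL / total_uL for salt, conc in stock_mM.items()}


# 20 uL stock + 100 uL cells + 80 uL water, ignoring the 1-5 uL of plasmid.
MIX = final_concentrations_mM(KCM_STOCK_MM, 20.0, 100.0 + 80.0)
```

Ignoring the small plasmid volume, the 20 µL of stock is diluted 10-fold in a 200 µL mix, giving 50 mM KCl, 15 mM CaCl2, and 25 mM MgCl2; passing `plasmid_volume_uL=5.0` shows how little the added plasmid changes these values.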
- Bacteriophage cloning
- [0527] Cloning of Bxb1 phage was performed using Gibson assembly of PCR fragments, as previously described 56. Following assembly, the reaction was transformed into chemically competent S2060 E. coli host cells containing plasmid pJC175e, which encodes gIII under the phage-shock promoter and allows for activity-independent phage propagation 57. After transformation, the cloned phage in E. coli was grown first for 15 minutes in DRM media without antibiotics at 37 °C, and then overnight in media with carbenicillin. Bacteria were centrifuged for 3 minutes at 8,000 g and plaqued in host strain S2060 transformed with pJC175e.
- the overnight culture was then diluted by 50-fold in DRM media with the appropriate antibiotics and grown at 37 °C with shaking at 225 rpm until OD600 reached 0.3-0.4.
- arabinose was added to reach a final concentration of 20 mM.
- 1 mL of this culture was mixed with 10 ⁇ L of the selection phage in a 96-well plate (Avantor, VWR) and grown overnight for 12-18 hours at 37 °C with shaking at 225 rpm. The plate was centrifuged for 10 minutes at 4,000 g and phage was isolated from the supernatant. Isolated phage were used to infect the next PANCE passage until a noticeable change in phage propagation was observed.
- titers of isolated phage were determined by qPCR (described below), and this information was used to determine the selection strategy for the next passage.
- phage were plaqued in 1) host strain S2060 to check for cheater phage that might have recombined with gIII, and 2) host strain S2060 transformed with pJC175e to determine phage titers.
- Individual plaques were PCR amplified using the same primers noted in ‘Bacteriophage cloning’ and sent for Sanger sequencing. Mutation tables were generated using Mutato (hub.docker.com/r/araguram/mutato).
- S2060 host cells with the appropriate plasmids (P1, P2, and MP6) were grown until OD 600 reached 0.3-0.4.
- the chemostat and all four lagoons were filled with 80 mL and 15 mL of this cell culture, respectively.
- an appropriate flow rate (approximately 80 mL/hour) was established to continuously dilute the cells with fresh media (59 g Harvard Custom Media C, 50 µL of 0.1 M CaCl2, 120 µL of trace metal solution, 400 mg chloramphenicol pre-dissolved in 4 mL of ethanol, 500 ng carbenicillin, 1 g spectinomycin, 500 mL DI water, and 20 L Harvard Custom Media A solution).
- a flow rate of 7.5 mL/hour was set in the lagoons, and cells were induced with 10 mM arabinose. This set-up was allowed to equilibrate for at least an hour before selection phage infection.
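The lagoon settings above can be related to the dilution rates quoted earlier for PACE (0.5 to 3.0 volume/hour) with one division. The sketch below is a worked calculation only; the function names are hypothetical, and it assumes the 15 mL lagoon volume stays constant when the flow rate is raised.

```python
def dilution_rate_per_hour(flow_mL_per_hour: float, volume_mL: float) -> float:
    """Vessel volumes exchanged per hour."""
    return flow_mL_per_hour / volume_mL


def flow_for_dilution_rate(rate_per_hour: float, volume_mL: float) -> float:
    """Pump flow (mL/hour) needed to reach a target dilution rate."""
    return rate_per_hour * volume_mL


LAGOON_VOLUME_ML = 15.0  # lagoon fill volume stated in the text

# Starting condition: 7.5 mL/hour through a 15 mL lagoon.
START_RATE = dilution_rate_per_hour(7.5, LAGOON_VOLUME_ML)

# Flow needed for the most stringent rate quoted (3.0 volume/hour),
# assuming the lagoon volume is unchanged.
MAX_FLOW = flow_for_dilution_rate(3.0, LAGOON_VOLUME_ML)
```

Under these assumptions the initial 7.5 mL/hour corresponds to 0.5 lagoon volumes per hour, and reaching 3.0 volumes per hour would need a flow of 45 mL/hour.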
- For PASTE experiments, 100 ng of prime editor plasmid and 100 ng of Bxb1 plasmid were replaced with 200 ng of PASTEv3 construct (Addgene: #179105).
- 20 ng of pegRNA plasmid, and 10 ng of nicking sgRNA plasmid were used.
- 100 ng of prime editor or 200 ng of PASTE v3 along with 10 ng of each pegRNA were transfected.
- 25 ng of each pegRNA was transfected instead of 10 ng, and the amount of all other components were kept the same as above.
- genomic DNA for ddPCR, the above mixture was further purified using the DNAdvance kit from Beckman Coulter (Cat# A48705), according to the manufacturer’s protocol. Briefly, 60 μL of Pre-Bind PBBA buffer was mixed with 30 μL of cell lysate. Next, 60 μL of Bind BBE buffer with beads was added and thoroughly mixed. The beads were washed twice with 200 μL of freshly prepared 70% ethanol and air-dried for five minutes before eluting in 20-30 μL of water or elution buffer. Final DNA concentrations were determined using NanoDrop (Thermo Fisher Scientific).
- PCR2 a second 25 μL PCR reaction
- PCR2 was performed using 0.5 μM of unique forward and reverse Illumina barcoding primer pair, Phusion U Hot Start II DNA polymerase, 1 μL of PCR1 product, and water.
- PCR2 was performed with the following conditions: 98 °C for 2 min, 10 cycles of [98 °C for 10 s, 61 °C for 20 s and 72 °C for 30 s], and finally a 72 °C extension for 2 min.
- Products from PCR2 were combined, electrophoresed on a 1.5% agarose gel, and extracted using QIAquick Gel Extraction Kit (Qiagen).
- DNA concentration of the resulting library was quantified using a Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific). The library was then normalized and sequenced on an Illumina MiSeq instrument according to the manufacturer’s protocols. Individual sequencing reads were demultiplexed using MiSeq Reporter (Illumina). [0536] CRISPResso2 61 was used to analyze high-throughput sequencing reads, as previously described 25.
- CRISPResso2 was executed on HDR mode with the following parameters for each edit: “e” specified the amplicon expected after editing, “qwc” specified the quantification window which was set between 10-bp upstream of the first nick and 10-bp downstream of the second nick, “discard_indel_reads” was set to TRUE, and “q” was set to 30.
- Percent editing was quantified by multiplying the ratio of non-discarded HDR aligned reads and total reads aligned to all amplicons by 100.
- Percent indels were quantified by multiplying the ratio of indel-containing discarded reads and total reads aligned to all amplicons by 100.
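The two percentages defined above share the same denominator (total reads aligned to all amplicons). A minimal sketch of that calculation follows; the function and argument names are illustrative, and the read counts in the example are made up for demonstration rather than taken from the source.

```python
def percent_of_aligned(read_count: int, total_aligned_reads: int) -> float:
    """Return read_count as a percentage of all reads aligned to any amplicon."""
    if total_aligned_reads <= 0:
        raise ValueError("total_aligned_reads must be positive")
    if read_count < 0 or read_count > total_aligned_reads:
        raise ValueError("read_count must lie between 0 and total_aligned_reads")
    return 100.0 * read_count / total_aligned_reads


def summarize_editing(hdr_reads_kept: int, indel_reads_discarded: int,
                      total_aligned_reads: int) -> dict:
    """Percent editing uses non-discarded HDR-aligned reads; percent indels
    uses the indel-containing reads that were discarded."""
    return {
        "percent_editing": percent_of_aligned(hdr_reads_kept, total_aligned_reads),
        "percent_indels": percent_of_aligned(indel_reads_discarded, total_aligned_reads),
    }


# Hypothetical counts, for illustration only.
example = summarize_editing(hdr_reads_kept=4200, indel_reads_discarded=300,
                            total_aligned_reads=10000)
print(example)
```

Keeping the denominator identical for both values means the two percentages can be compared directly for the same sample.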
- ddPCR analysis was used to determine the abundance of genomic DNA fragments containing the genome–donor junction in comparison to a reference gene. Approximately 50–200 ng of bead-purified DNA was added to a 25 μL reaction mixture containing 1) 2X ddPCR Supermix for Probes (No dUTP) (Bio-Rad, 1863025), 2) reference gene primer pair + probe master mix from
- FIG.30C shows the ddPCR plots used to assess integration efficiency at the FANCA locus using PASSIGE (Data shown in FIG.30A).
- FAM channel genome-donor junction
- a first line is drawn right above the negative cloud (3221, dashed line).
- a second line is drawn around the mean of the positive cloud (4493, dashed line).
- the same process is used to determine the threshold for the reference (HEX channel). In this case, the threshold for the reference is 3265.
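Once a threshold is fixed for each channel, droplets can be classified as positive or negative and the junction signal compared with the reference. The sketch below is one way this could be done, not necessarily the calculation used in the source: it assumes the standard Poisson correction for droplet digital PCR (mean copies per droplet = -ln(1 - positive fraction)) and reports the junction-to-reference ratio as a percentage. The HEX threshold of 3265 is taken from the text; the FAM threshold value, the function names, and the droplet amplitudes are hypothetical.

```python
import math


def copies_per_droplet(amplitudes, threshold):
    """Poisson-corrected mean target copies per droplet for one channel."""
    total = len(amplitudes)
    if total == 0:
        raise ValueError("no droplets supplied")
    positives = sum(1 for a in amplitudes if a > threshold)
    if positives == total:
        raise ValueError("all droplets positive; sample is saturated")
    return -math.log(1.0 - positives / total)


def percent_integration(fam_amplitudes, fam_threshold, hex_amplitudes, hex_threshold):
    """Junction (FAM) copies as a percentage of reference (HEX) copies."""
    reference = copies_per_droplet(hex_amplitudes, hex_threshold)
    if reference == 0:
        raise ValueError("no reference-positive droplets")
    junction = copies_per_droplet(fam_amplitudes, fam_threshold)
    return 100.0 * junction / reference


HEX_THRESHOLD = 3265  # reference-channel threshold stated in the text
FAM_THRESHOLD = 4000  # hypothetical placeholder; the final FAM value is not given here

# Toy amplitudes for ten droplets, for illustration only.
toy_fam = [1000] * 9 + [5000]
toy_hex = [1000] * 8 + [5000] * 2
toy_result = percent_integration(toy_fam, FAM_THRESHOLD, toy_hex, HEX_THRESHOLD)
print(f"{toy_result:.1f}% integration (toy data)")
```

The Poisson step matters mainly at high positive fractions, where a single droplet is increasingly likely to hold more than one template copy.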
- FIG.30D shows ddPCR plots for the FAM channel and corresponding % integration values obtained when using genome-donor junction binding probes used in this study.
- An attL or attR binding probe was used to assess integration efficiencies at the GBA1 and FANCA loci respectively (Data shown in FIG.30A). False positives from plasmid-donor recombined products
- transfected cells were trypsinized, resuspended in PBS solution, and assessed for mCherry fluorescence using the CytoFLEX S Flow Cytometer (Beckman Coulter) software. The following highlights an example of how the cells were gated.
- FIG.28C To assess off-target integration in FIG.28C, a representative flow cytometry plot, shown in FIGs.28E-28F, was used. 15,000 HEK293T cells were transfected with a recombinase variant and mCherry donor plasmid. The cells were passaged for 14 days before performing flow cytometry analysis.
- FIG.28E shows an untreated sample.
- FIG.28F shows the histogram used to assess mCherry+ cells when transfecting cells with either the dead Bxb1 negative control or eeBxb1.
- V1L represents mCherry- cells
- V1R represents mCherry+ cells.
- Minicircle donor production [0543] Minicircle donor DNA was prepared using MC-Easy minicircle DNA Production Kit (System Biosciences, Cat #: MA925A-1).
- the Factor IX donor plasmid (Addgene, # 182141) was transformed into ZYCY10P3S2T Minicircle Production Cells.
- Transformed ZYCY10P3S2T cells were inoculated in 2 mL of LB media with kanamycin and grown for one hour at 30 °C. Then, 1 mL of this culture was inoculated into 200 mL of LB media without antibiotics and grown overnight at 30 °C with shaking at 225 rpm. The next day, 200 mL of induction media was added before the OD600 of the overnight culture reached 0.6. This culture was further grown for 3 hours at 30 °C and 1 hour at 37 °C with shaking at 225 rpm.
- Qiagen Plasmid Maxi Kit was used to extract the minicircle DNA. To check the quality of the minicircle, 1 μg of the maxi-prepped product was linearized using restriction enzymes and electrophoresed on a 1% agarose gel. The F9 minicircle donor was used to transfect HuH7 cells as described above.
- CRISPR-Cas9 induces large structural variants at on-target and off-target sites in vivo that segregate across generations. Nat Commun 13, 627 (2022). [0565] 21. Lampe, G.D. et al. Targeted DNA integration in human cells without double-strand breaks using CRISPR-associated transposases. Nat Biotechnol (2023). [0566] 22. Klompe, S.E., Vo, P.L.H., Halpin-Healy, T.S. & Sternberg, S.H. Transposon-encoded CRISPR-Cas systems direct RNA-guided DNA integration. Nature 571, 219-225 (2019). [0567] 23. Strecker, J. et al.
- the invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.
- the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims is introduced into another claim.
- any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim.
- elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Genetics & Genomics (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Organic Chemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Biomedical Technology (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Medicinal Chemistry (AREA)
- Mycology (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Plant Pathology (AREA)
- Enzymes And Modification Thereof (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
Abstract
The present disclosure provides evolved and engineered Bxb1 recombinase variants. The recombinase variants provided herein exhibit increased recombination activity, for example, between recombinase recognition sites that have been introduced into one or more target DNA sequences (e.g., in a genome) using prime editing. The present disclosure also provides systems and compositions comprising the recombinase variants described herein and a prime editor and pegRNA, or polynucleotides encoding each of the recombinase variant, prime editor, and pegRNA. Methods for editing a target nucleic acid using the recombinase variants provided herein and prime editing (e.g., for insertion, deletion, exchange, inversion, or translocation) are also described in the present disclosure.
Description
EVOLVED RECOMBINASES FOR EDITING A GENOME IN COMBINATION WITH PRIME EDITING RELATED APPLICATIONS [0001] This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application, U.S.S.N. 63/484,184, filed February 9, 2023, and U.S. Provisional Application, U.S.S.N. 63/619,465, filed January 10, 2024, each of which is incorporated herein by reference. GOVERNMENT SUPPORT [0002] This invention was made with government support under grant numbers R01 HL160970, U01 AI142756, R01 EB031172, RM1 HG009490, and R35 GM118062 awarded by the National Institutes of Health. The government has certain rights in the invention. BACKGROUND OF THE INVENTION [0003] Efficient, programmable, and site-specific genome modification remains a longstanding goal of genetics and medicine. In particular, the ability to efficiently and accurately render large-scale genomic changes, such as gene-level or exon-level insertions, inversions, translocations, and deletions, has long been sought. Such methods would greatly advance the state of the art. Indeed, numerous genetic diseases caused by large-scale genomic defects, such as gene loss, gene inversion, gene duplication, and chromosomal translocations, could be treated with gene editing technologies that are capable of making such large-scale genetic modifications. [0004] Early attempts at making such large-scale changes in the genome were focused on harnessing the power of homologous recombination. For instance, previous methods involved directing recombination to loci of interest (e.g., a disease-associated inverted or duplicated gene) based on available endogenous sites for recombination. This strategy was hampered by poor efficiency. More recent efforts have exploited the ability of double-stranded DNA breaks (DSBs) to induce homology-directed repair (HDR). See, e.g., International PCT Application, PCT/US2017/046144, filed August 9, 2017, and published as WO 2018/031683, which is incorporated herein by reference. 
Homing endonucleases and programmable endonucleases, such as zinc finger nucleases, TALE nucleases, and Cas9 nucleases, have been used to introduce targeted DSBs and induce HDR in the presence of donor DNA. In most post-mitotic cells,
B1195.70174WO00 12131093.2
however, DSB-induced HDR is strongly down-regulated and generally inefficient. Moreover, repair of DSBs by error-prone repair pathways, such as non-homologous end-joining (NHEJ) or single-strand annealing (SSA), causes random insertions or deletions (indels) of nucleotides at the DSB site at a higher frequency than HDR. The efficiency of HDR can be increased if cells are subjected to conditions forcing cell-cycle synchronization, or if the enzymes involved in NHEJ are inhibited. However, such conditions can cause many random and unpredictable events, limiting potential applications. [0005] More recently, the use of prime editing to introduce recombinase recognition sites at desired locations in one or more target nucleotide sequences that can then be used by a recombinase to catalyze such large genomic changes has been described. See, e.g., International PCT Application, PCT/US2020/023721, filed March 19, 2020, and published as WO 2020/191239, which is incorporated herein by reference. Additional compositions, systems, and methods capable of efficiently and accurately introducing large-scale genomic changes (such as gene-level or exon-level insertions, inversions, translocations, and deletions) using prime editing and a recombinase would significantly advance gene editing. SUMMARY OF THE INVENTION [0006] The present disclosure describes the development of evolved and engineered recombinases using PACE, PANCE, and other genetic engineering methods. The recombinase variants provided herein exhibit increased recombination activity, for example, at recombinase recognition sites that have been introduced into a nucleic acid (e.g., genome of an organism) using prime editing. 
In particular, the recombinase variants provided herein exhibit increased insertion efficiency of donor DNA molecules at recombinase recognition sites of 2-fold or more (e.g., 3-fold or more, 4-fold or more, 5-fold or more, 6-fold or more, 7-fold or more, 8-fold or more, 9-fold or more, or 10-fold or more) relative to wild-type Bxb1 recombinase. The instant disclosure provides evolved and engineered recombinases, systems, compositions, polynucleotides and vectors, kits, and methods that leverage the power of prime editing (PE), e.g., single-flap or “classical” PE, twinPE, or multi-flap PE (also known as quadruple flap PE), to carry out site-specific and large-scale genetic modifications. Such modifications include, but are not limited to, insertions, deletions, inversions, replacements, and chromosomal
translocations of chromosomes or portions thereof, chromosomal loci, or one or more genes or regions thereof, such as exons, introns, or regulatory regions of a gene. [0007] Thus, in one aspect, the present disclosure provides Bxb1 recombinases comprising an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to the amino acid sequence of SEQ ID NO: 1, wherein the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions at positions selected from the group consisting of amino acid residues 3, 5, 10, 14, 15, 20, 23, 24, 25, 29, 35, 36, 39, 40, 43, 45, 47, 49, 50, 51, 54, 58, 60, 66, 68, 69, 70, 73, 74, 75, 78, 84, 86, 87, 89, 93, 95, 97, 100, 101, 105, 116, 119, 124, 127, 139, 147, 154, 157, 158, 169, 175, 179, 181, 183, 185, 194, 197, 199, 202, 203, 204, 207, 208, 209, 214, 221, 229, 239, 248, 252, 261, 266, 267, 273, 279, 280, 281, 284, 285, 287, 288, 291, 309, 311, 321, 328, 333, 334, 342, 343, 345, 347, 360, 361, 362, 365, 368, 374, 375, 378, 389, 393, 400, 411, 415, 419, 421, 422, 424, 434, 435, 438, 440, 447, 449, 453, 462, 463, 466, 468, 469, 478, 483, 485, 487, 490, 494, 496, and 497 of the amino acid sequence provided in SEQ ID NO: 1, or at corresponding positions in a homologous recombinase (e.g., a recombinase with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to Bxb1 recombinase). 
In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3X, V5X, S10X, D14X, A15X, E20X, L23X, E24X, S25X, L29X, W35X, D36X, G39X, V40X, D43X, D45X, S47X, A49X, V50X, D51X, D54X, R58X, N60X, A66X, E68X, E69X, Q70X, D73X, V74X, I75X, Y78X, T84X, S86X, I87X, H89X, L93X, H95X, A97X, H100X, K101X, V105X, T116X, A119X, A124X, G127X, E139X, F147X, Y154X, S157X, L158X, D169X, V175X, V179X, R181X, R183X, L185X, N194X, P197X, H199X, A202X, H203X, D204X, R207X, R208X, G209X, K214X, Q221X, E229X, M239X, A248X, G252X, A261X, A266X, E267X, E273X, R279X, A280X, E281X, K284X, T285X, R287X, A288X, A291X, E309X, A311X, H321X, S328X, K333X, H334X, M342X, A343X, W345X, A347X, A360X, E361X, R362X, K365X, V368X, A374X, V375X, A378X, S389X, S393X, S400X, A411X, A415X, E419X, E421X, G422X, E424X, E434X, T435X, R438X, G440X, D447X, A449X, T453X, L462X, T463X, V466X, G468X, G469X, D478X, E483X, H485X, R487X, S490X,
R494X, H496X, and T497X relative to the amino acid sequence provided in SEQ ID NO: 1, or at one or more corresponding positions in a homologous recombinase (e.g., a recombinase with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to Bxb1 recombinase), wherein X represents any amino acid other than the wild type amino acid. In certain embodiments, the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3T, A3V, V5F, V5I, S10A, D14N, A15T, E20K, E20Q, L23F, L23M, E24K, S25I, L29F, W35P, W35L, D36G, D36V, G39D, V40I, D43E, D45G, S47A, A49E, A49T, V50I, D51E, D51N, D51Y, D54N, R58K, N60S, A66T, E68K, E69D, Q70P, D73G, V74A, V74M, I75V, Y78H, Y78N, T84S, S86G, S86N, S86T, I87T, I87V, H89N, L93M, H95Y, A97S, H100Y, K101R, V105I, T116P, A119S, A124S, G127E, E139A, F147Y, Y154C, S157G, L158M, D169N, V175I, V175M, V179G, V179M, R181Q, R183L, L185M, N194D, N194K, P197Q, P197T, H199Y, A202S, H203Y, D204G, R207I, R207Q, R208S, G209V, K214R, Q221R, E229K, M239L, A248V, G252S, A261V, A266T, E267D, E273D, E273K, R279C, A280T, E281K, K284N, T285A, R287P, A288T, A291S, A291T, E309D, A311V, H321N, S328T, K333N, H334P, M342V, A343T, W345L, A347T, A347V, A360T, E361D, E361G, R362K, K365N, V368A, V368N, A374V, V375I, A378T, S389R, S393F, S400Y, A411V, A415V, E419D, E421K, G422S, E424G, E434G, T435A, R438Q, G440E, D447N, A449V, T453A, T453N, L462M, T463I, V466M, G468D, G469R, D478E, E483K, H485Y, R487K, S490N, R494Q, H496P, and T497A relative to the amino acid sequence of SEQ ID NO: 1, or at corresponding positions in a homologous recombinase (e.g., a recombinase with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at 
least 99% sequence identity to Bxb1 recombinase). In some embodiments, the Bxb1 recombinase comprises a combination of substitutions as in any one of the variants listed in Tables 1-7 provided herein. [0008] In some embodiments, the Bxb1 recombinase comprises a V74X mutation relative to the amino acid sequence of SEQ ID NO: 1, wherein X is any amino acid other than the wild type amino acid. In certain embodiments, the Bxb1 recombinase comprises a V74A mutation relative to the amino acid sequence of SEQ ID NO: 1. In some embodiments, the Bxb1 recombinase comprises V74X, E229X, and V375X mutations relative to the amino acid sequence of SEQ ID
NO: 1, wherein X is any amino acid other than the wild type amino acid. In certain embodiments, the Bxb1 recombinase comprises V74A, E229K, and V375I mutations relative to the amino acid sequence of SEQ ID NO: 1. [0009] In another aspect, the present disclosure provides systems comprising any of the Bxb1 recombinases provided herein. In some embodiments, a system comprises any of the Bxb1 recombinases provided herein, a prime editor, and one or more prime editing guide RNAs (pegRNAs) comprising a DNA synthesis template encoding a recombinase recognition site. In some embodiments, a system comprises a polynucleotide encoding any of the Bxb1 recombinases provided herein, a polynucleotide encoding a prime editor, and one or more prime editing guide RNAs (pegRNAs) or one or more polynucleotides encoding one or more pegRNAs, wherein each pegRNA comprises a DNA synthesis template encoding a recombinase recognition site. In certain embodiments, the system further comprises a polynucleotide comprising DNA (e.g., one or more donor genes) for insertion into a target nucleic acid (e.g., at a recombinase recognition site newly installed using prime editing). In certain embodiments, the DNA for insertion is flanked by one or two recombinase recognition sites. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. 
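The substitution shorthand used above (for example V74A, E229K, and V375I) encodes the wild-type residue, its 1-based position, and the replacement residue. The following is a minimal sketch of how such codes could be parsed and applied to a sequence. The helper names are illustrative, and the five-residue toy sequence is a stand-in for demonstration only; it is not SEQ ID NO: 1.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(code):
    """Split a code such as 'V74A' into (wild_type, position, replacement)."""
    match = _SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitutions(sequence, codes):
    """Apply each substitution, checking the wild-type residue at each position."""
    residues = list(sequence)
    for code in codes:
        wild_type, position, replacement = parse_substitution(code)
        if replacement == "X":
            # 'X' denotes any non-wild-type residue, so it cannot be applied directly.
            raise ValueError(f"{code} names a position, not a specific residue")
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code} is outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{code} does not match residue {residues[position - 1]}")
        residues[position - 1] = replacement
    return "".join(residues)


print(parse_substitution("V74A"))
# Toy sequence for illustration only (not SEQ ID NO: 1).
print(apply_substitutions("MAVKL", ["V3A", "K4R"]))
```

Checking the wild-type residue before substituting catches numbering mismatches, which is useful when mapping positions onto a homologous recombinase.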
[0010] In some embodiments, the present disclosure provides systems for inserting DNA into a target nucleic acid comprising: (i) a pegRNA or a first polynucleotide encoding the pegRNA, wherein the pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into a target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0011] In some embodiments, the present disclosure provides systems for exchanging DNA in a target nucleic acid comprising:
(i) one or more pegRNAs or a first polynucleotide encoding one or more pegRNAs, wherein the one or more pegRNAs each comprise a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into a target nucleic acid, wherein the DNA is flanked on both sides by a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0012] In some embodiments, the present disclosure provides systems for deleting DNA from a target nucleic acid comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein. 
In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0013] In some embodiments, the present disclosure provides systems for recombining target nucleic acids (e.g., in two chromosomes) comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for
installation at a site on a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site on a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0014] In some embodiments, the present disclosure provides systems for inverting a target nucleic acid comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation of the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0015] In another aspect, the present disclosure provides compositions comprising any of the Bxb1 recombinases provided herein. 
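The deletion, chromosome-recombination, and inversion systems described above differ only in where the two recombinase recognition sites are installed and in their relative orientation. The sketch below restates that mapping as a small lookup. The class and function names are illustrative, and the case of opposite-orientation sites on different molecules returns None because it is not described in this passage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstalledSite:
    molecule: str     # e.g. a chromosome name (illustrative label)
    orientation: int  # +1 or -1, relative to an arbitrary reference direction


def predicted_outcome(first: InstalledSite, second: InstalledSite) -> Optional[str]:
    """Map the placement of two installed sites to the outcome described in the text."""
    for site in (first, second):
        if site.orientation not in (1, -1):
            raise ValueError("orientation must be +1 or -1")
    same_orientation = first.orientation == second.orientation
    if first.molecule == second.molecule:
        # Same target nucleic acid: same orientation deletes, opposite inverts.
        return "deletion" if same_orientation else "inversion"
    if same_orientation:
        # Sites on two different chromosomes in the same orientation.
        return "recombination between chromosomes"
    return None  # not covered by the passage above


print(predicted_outcome(InstalledSite("chr1", 1), InstalledSite("chr1", 1)))
print(predicted_outcome(InstalledSite("chr1", 1), InstalledSite("chr1", -1)))
print(predicted_outcome(InstalledSite("chr1", 1), InstalledSite("chr2", 1)))
```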
In some embodiments, a composition comprises any of the Bxb1 recombinases provided herein, a prime editor, and one or more prime editing guide RNAs (pegRNAs) comprising a DNA synthesis template encoding a recombinase recognition site. In some embodiments, a composition comprises a polynucleotide encoding any of the Bxb1 recombinases provided herein, a polynucleotide encoding a prime editor, and one or more prime editing guide RNAs (pegRNAs) or one or more polynucleotides encoding one or more
pegRNAs, wherein each pegRNA comprises a DNA synthesis template encoding a recombinase recognition site. In certain embodiments, the composition further comprises a polynucleotide comprising DNA (e.g., one or more donor genes) for insertion into a target nucleic acid. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0016] In some embodiments, the present disclosure provides compositions for inserting DNA into a target nucleic acid comprising: (i) a pegRNA or a first polynucleotide encoding the pegRNA, wherein the pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into a target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. 
[0017] In some embodiments, the present disclosure provides compositions for exchanging DNA in a target nucleic acid comprising: (i) one or more pegRNAs or a first polynucleotide encoding one or more pegRNAs, wherein each of the one or more pegRNAs comprises a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into a target nucleic acid, wherein the DNA is flanked on both sides by a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
[0018] In some embodiments, the present disclosure provides compositions for deleting DNA from a target nucleic acid comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. 
[0019] In some embodiments, the present disclosure provides compositions for recombining target nucleic acids (e.g., in two chromosomes) comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for installation at a site on a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site on a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
[0020] In some embodiments, the present disclosure provides compositions for inverting a target nucleic acid comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation of the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0021] In another aspect, the present disclosure provides polynucleotides and vectors encoding any of the Bxb1 recombinases described herein. [0022] In another aspect, the present disclosure provides cells comprising any of the Bxb1 recombinases, polynucleotides, vectors, and/or compositions described herein. [0023] In another aspect, the present disclosure provides kits comprising any of the Bxb1 recombinases, polynucleotides, vectors, and/or compositions described herein. [0024] In another aspect, the present disclosure provides methods for modifying one or more target nucleic acids in a cell comprising contacting the one or more target nucleic acids with any of the recombinases provided herein. 
[0025] In another aspect, the present disclosure provides methods for modifying a target nucleic acid in a cell using prime editing and a recombinase comprising expressing in the cell a polynucleotide encoding any of the Bxb1 recombinases provided herein, a polynucleotide encoding a prime editor, and one or more polynucleotides encoding one or more prime editing guide RNAs (pegRNAs) comprising DNA synthesis templates encoding one or more recombinase recognition sites. [0026] In another aspect, the present disclosure provides methods for modifying a target nucleic acid (e.g., DNA, such as genomic DNA) in a cell using prime editing and a recombinase comprising expressing in the cell a polynucleotide encoding any of the Bxb1 recombinases
provided herein and a polynucleotide encoding a prime editor, and providing to the cell one or more prime editing guide RNAs (pegRNAs) comprising DNA synthesis templates encoding one or more recombinase recognition sites. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0027] In some embodiments, the present disclosure provides methods for inserting DNA into a target nucleic acid in a cell using prime editing and a Bxb1 recombinase provided herein. In certain embodiments, the method comprises expressing in a cell: (i) a first polynucleotide encoding a pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs the first recombinase recognition site in the target nucleic acid, thereby facilitating Bxb1-mediated recombination with the second recombination site, resulting in insertion of the DNA into the target nucleic acid. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0028] In some embodiments, the method is a method for exchanging DNA in a target nucleic acid in a cell using prime editing and a recombinase. 
In certain embodiments, the method comprises expressing in a cell: (i) a first polynucleotide encoding one or more pegRNAs comprising a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and
(iv) a fourth polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA is flanked on both sides by a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs a first instance and a second instance of the first recombinase recognition site in the target nucleic acid, thereby facilitating Bxb1-mediated recombination between the first recombinase recognition sites in the target nucleic acid and the second recombinase recognition sites flanking the DNA, resulting in excision of the target nucleic acid sequence between the first instance and the second instance of the first recombinase recognition site and insertion of the DNA in its place. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0029] In some embodiments, the method is a method for deleting DNA from a target nucleic acid in a cell using prime editing and a recombinase. 
In certain embodiments, the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein; wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated deletion of the nucleic acid between the first and the second recombinase recognition sites. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA.
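The compositions and methods in this section distinguish deletion, inversion, and recombination between chromosomes by where the two recombinase recognition sites are installed and by their relative orientation. The following is a minimal illustrative sketch of that decision logic only. The `InstalledSite` type, the `predicted_outcome` function, and the molecule labels are hypothetical names introduced for illustration; they are not part of the disclosure and do not model any enzymology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstalledSite:
    """A recombinase recognition site installed by prime editing (hypothetical model)."""
    molecule: str      # label of the target nucleic acid, e.g. "chr1" (illustrative)
    orientation: str   # "+" or "-" relative to an arbitrary reference strand


def predicted_outcome(first: InstalledSite, second: InstalledSite) -> str:
    """Map site placement and relative orientation to the outcome named in the text."""
    for site in (first, second):
        if site.orientation not in ("+", "-"):
            raise ValueError(f"unknown orientation: {site.orientation!r}")

    same_molecule = first.molecule == second.molecule
    same_orientation = first.orientation == second.orientation

    if same_molecule and same_orientation:
        # Two sites in the same orientation on the same target nucleic acid
        return "deletion"
    if same_molecule:
        # Two sites in opposite orientation on the same target nucleic acid
        return "inversion"
    if same_orientation:
        # Sites in the same orientation on two different chromosomes
        return "recombination between chromosomes"
    # Assumption: this configuration is not addressed in this section
    return "not described"


print(predicted_outcome(InstalledSite("chr1", "+"), InstalledSite("chr1", "-")))
```

The sketch deliberately returns "not described" for opposite-orientation sites on different chromosomes rather than guessing at a result.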
[0030] In some embodiments, the method is a method for recombining target nucleic acids (e.g., in two chromosomes) in a cell using prime editing and a recombinase. In certain embodiments, the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a site on a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site on a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein; wherein a first prime editor installs the first recombinase recognition site into a target nucleic acid on a first chromosome, and a second prime editor installs the second recombinase recognition site into a target nucleic acid on a second chromosome, thereby facilitating Bxb1-mediated recombination between the two chromosomes. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0031] In some embodiments, the method is a method for inverting a target nucleic acid in a cell using prime editing and a recombinase. 
In certain embodiments, the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation as the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein;
wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated inversion of the nucleic acid between the first and the second recombinase recognition sites. In certain embodiments, the polynucleotide encoding the prime editor and/or the polynucleotide encoding the Bxb1 recombinase comprise RNA. [0032] It should be appreciated that the foregoing concepts, and additional concepts discussed below, may be arranged in any suitable combination, as the present disclosure is not limited in this respect. Further, other advantages and novel features of the present disclosure will become apparent from the following detailed description of various non-limiting embodiments when considered in conjunction with the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present disclosure, which can be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein. [0034] FIG.1 provides schematics for applications of using prime editors in tandem with large serine recombinase (Bxb1), e.g., for large DNA integration, cassette exchange, genomic deletion, gene inversion, and translocation. Evolved Bxb1 (eBxb1) enhances one-pot and sequential editing of large genomic DNA. The present disclosure describes the use of directed evolution to yield novel Bxb1 variants that overcome intrinsic enzymatic barriers and enhance gene editing efficiency. 
Combination of engineered prime editors and eBxb1 variants further enhances large DNA editing, expanding the number of pathogenic mutations that are targetable via ex vivo and in vivo therapeutics. [0035] FIGs.2A-2B provide schematics of PACE and PANCE circuits for evolving recombinases for integration (insertion of DNA from, for example, a donor plasmid or other nucleic acid molecule at a recombinase recognition site) (FIG.2A) and cassette exchange (exchange of DNA from, for example, a donor plasmid or other nucleic acid molecule with a DNA sequence between two recombinase recognition sites) (FIG.2B). The selection phage (SP) encodes for the recombinase being evolved. In the bacterial host cells, plasmid P1 contains a
Pro1 promoter, and the accessory plasmid (AP) encodes a promoter-less gIII. After recombination, Pro1 is placed in front of gIII, which drives its expression. (FIG.2A) Plasmid integration circuit. A single att site is present in the AP and P1 plasmids. After recombination, the AP and P1 combine to form a large AP+P1 plasmid. (FIG.2B) Cassette-exchange circuit. Two att sites, facing each other, are located in the AP and P1 plasmids. After recombination, sequences present between the att sites in each plasmid are recombined. [0036] FIG.3 provides a flow chart for design of the evolution trajectory of PACE and PANCE for evolving Bxb1 recombinase. Seven individual evolution campaigns were performed to generate a suite of Bxb1 variants. [0037] FIG.4 shows genotypes of evolved Bxb1 recombinase variants from PANCE version 1 (v1) evolution. [0038] FIG.5 shows characterization of PANCEv1 variants in AAVS1-attP and CCR5-attB HEK293T stable cells with either attP in the AAVS1 locus or attB in the CCR5 locus. Codon-optimized Bxb1 with PANCEv1 mutations was used, and percent knockin was assessed by ddPCR. Ten variants showed improvements over codon-optimized wild type Bxb1 (“CO-Bxb1”), some of which exhibited a two-fold improvement in knockin efficiency. Data was generated using ddPCR with a 5.6 kb donor. [0039] FIG.6 shows characterization of PANCEv1 variants in a one-pot system (transfection of HEK293T cells with Bxb1, PE2, dual pegRNAs, and DNA donor at the same time). The prime editor first installs the att site into the genome. The recombinase then integrates the desired DNA cargo. A flow-based assay showed ten variants generated from PANCEv1 with improved knockin efficiency compared to codon-optimized wild type Bxb1 (“Bxb1 CO”) in HEK293T cells. Flow was performed to assess percent knockin two weeks after transfection. [0040] FIG.7 shows variants from PANCEv1 that show improved integration efficiency over wild type enzyme. 
[0041] FIG.8 shows that variants from PANCEv2 improve 5.6 kb donor integration in a one-pot system (transfection of HEK293T cells with Bxb1, PE2, dual pegRNAs, and DNA donor). Twenty-two variants showed improved knockin efficiency over the codon-optimized wild type Bxb1 (“Bxb1-CO”) system. The highest fold-improvement observed was 1.6-fold. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used.
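The figure descriptions report variant performance as a fold-improvement over codon-optimized wild type Bxb1. A minimal sketch of one way such a ratio could be computed from replicate percent-knockin measurements is shown below. The `fold_change` function is a hypothetical helper, the ratio-of-means definition is an assumption (the text does not state how the fold values were derived), and the numbers in the example are placeholder values, not data from the disclosure.

```python
from statistics import mean
from typing import Sequence


def fold_change(variant_pct: Sequence[float], reference_pct: Sequence[float]) -> float:
    """Ratio of mean percent knockin (variant) to mean percent knockin (reference).

    Assumption: fold-improvement is taken as a simple ratio of replicate means.
    """
    if not variant_pct or not reference_pct:
        raise ValueError("each condition needs at least one replicate")
    reference_mean = mean(reference_pct)
    if reference_mean <= 0:
        raise ValueError("reference mean must be positive to form a ratio")
    return mean(variant_pct) / reference_mean


# Placeholder replicate values chosen only to illustrate the arithmetic
example_variant = [8.0, 8.0, 8.0]
example_reference = [5.0, 5.0, 5.0]
print(f"{fold_change(example_variant, example_reference):.2f}-fold")
```

A ratio of means is only one option; a mean of per-replicate ratios would give a different value when replicates are paired.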
[0042] FIG.9 shows variants from PANCEv2 that show improved integration efficiency in mammalian cells over wild type enzyme. [0043] FIG.10 shows that PACEv1 yields variants that improve gene integration in the one-pot system in HEK293T cells. Fourteen out of twenty PACEv1 variants showed improved knockin efficiency over the codon-optimized wild type Bxb1 (“coBxb1”) system. The highest fold-improvement observed was two-fold. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used. [0044] FIG.11 shows variants from PACEv1 that showed improved DNA integration efficiency in mammalian cells over wild type enzyme. [0045] FIG.12 shows a first test of DNA integration efficiency with new variants from PANCEv3, PANCEv4, PACEv3, and PANCEv5 in stable HEK293T cells with either attP in the AAVS1 locus or attB in the CCR5 locus. 100 ng large serine recombinase (LSR; Bxb1) and 150 ng 5.6 kb DNA donor were used. Data was generated using ddPCR. Several Bxb1 variants showed improved activity compared to codon-optimized wild type Bxb1 (“Bxb1-CO”). B-30 WT Bxb1: wild type Bxb1. Bxb1-CO: codon-optimized wild type Bxb1. B100.18 (L29F), B100.30 (E20Q), B100.23 (D14N, E483K), PACEv1_6 (E24K), B100.17 (D14N, E273D), B-55 (Y78N), B100.28 (E20Q), B100.16 (D14N, E267D), B100.29 (L29F, R183L, K333N), B-57 (W35L). [0046] FIG.13 shows a second test of DNA integration efficiency with new variants from PANCEv3, PANCEv4, PACEv3, and PANCEv5 in HEK293T stable cells with attP in the AAVS1 locus. 100 ng LSR and 150 ng 5.6 kb DNA donor were used. Data was generated using ddPCR. B-30 WT Bxb1: wild type Bxb1. Bxb1-CO: codon-optimized wild type Bxb1. 
B100.74 (V74M, M342V), B100.50 (V5I, V74M, M239L, T453N, G468D), B100.51 (V5I, S86N, S157G, K214R, E273D, E361G), B100.32 (D14N), B100.36 (V105I), B.55 (Y78N), B100.38 (I78V, E361D), B100.41 (A49T, S86T, T116P), B.63 (L29F), B.59 (D51Y), B100.65 (V5I, D14N, R207Q), B100.75 (V74A), B100.72 (V50I, I87V, R208S, V375I), B100.76 (D51E, V375I), B100.73 (D51E, V375I). [0047] FIG.14 shows improved variants from PANCEv3, PANCEv4, PACEv3, and PANCEv5 in HEK293T stable cells. Thirty-six out of eighty-one variants tested from five different evolutions were > 1.1x better on average than wild type Bxb1 in stable cell lines with either attP in the AAVS1 locus or attB in the CCR5 locus. Data was generated using ddPCR, and a 5.6 kB
donor DNA plasmid was used. B-30 WT Bxb1: wild type Bxb1. Bxb1-CO: codon-optimized wild type Bxb1. B100.16 (D14N, E267D), B100.43 (E20Q, E361D), B100.28 (E20Q), B100.49 (V5I, S86N, H321N), B100.50 (V5I, V74M, M239L, T453N, G468D), B100.17 (D14N, E273D), B100.30, B100.32 (D14N), B100.23 (D14N, E483K), B100.76 (D51E, V375I), B100.41 (A49T, S86T, T116P), B100.36 (V105I), B100.65 (V5I, D14N, R207Q), B100.73 (D51E, V375I), B100.38 (I78V, E361D), B100.18 (L29F), B100.72 (V50I, I87V, R208S, V375I), B100.75 (V74A). [0048] FIG.15 shows variants from PANCEv3, PANCEv4, PACEv3, and PANCEv5 that show improved integration efficiency over wild type enzyme in mammalian cells. [0049] FIG.16 shows that combined single and double mutations in Bxb1 improve DNA integration efficiency. 55 ng LSR and 55 ng 5.6 kb DNA donor were used. A stable cell line with attP in the AAVS1 locus was used. A low dose of recombinase and DNA donor was used to dissect differences between variants. Three double mutants showed improvements over single point mutant variants. Data was generated using ddPCR. In the variant names, “CO” means codon-optimized, and “non-CO” refers to the un-optimized wild type Bxb1 sequence. Bxb1-CO: codon-optimized wild type Bxb1. [0050] FIG.17 shows DNA integration efficiency of additional Bxb1 double mutants. Combining mutations from PACE improves large-gene integration. Additional double-mutant variants were assessed in stable cell lines with either attP in the AAVS1 locus or attB in the CCR5 locus. Eight different double-mutant variants showed improvements over single-mutant variants. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used. [0051] FIG.18 shows DNA integration efficiency of further Bxb1 double mutants. Additional double-mutant variants were assessed in stable cell lines with either attP in the AAVS1 locus or attB in the CCR5 locus. 
Five different double-mutant variants showed improvements over single-mutant variants. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used. [0052] FIG.19 shows rationally designed Bxb1 double mutants that show improved large-gene integration efficiency in mammalian cells. [0053] FIG.20 shows further one-pot delivery optimization of plasmid doses. Optimization of plasmid dose was shown to be important for improving one-pot knockin efficiencies. Integration efficiencies were quantified using unique molecular identifier (UMI) barcoding.
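The variant labels in the descriptions of FIGs.12-14 list amino acid substitutions in the form wild-type residue, position, substituted residue (for example, D14N, E483K). As an illustration of how such genotype strings could be handled when tabulating single and combined mutants, a minimal parsing sketch follows. The `Substitution` type and `parse_genotype` function are hypothetical names introduced here, and the sketch assumes one-letter residue codes and comma-separated entries as they appear in those descriptions.

```python
import re
from typing import List, NamedTuple


class Substitution(NamedTuple):
    wild_type: str   # one-letter code of the original residue
    position: int    # residue number in the protein sequence
    mutant: str      # one-letter code of the substituted residue


# One uppercase letter, one or more digits, one uppercase letter (e.g. "V74A")
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_genotype(text: str) -> List[Substitution]:
    """Parse a comma-separated genotype such as 'D14N, E483K' into substitutions."""
    substitutions = []
    for token in text.split(","):
        token = token.strip()
        match = _SUBSTITUTION.match(token)
        if match is None:
            raise ValueError(f"not a substitution in residue-position-residue form: {token!r}")
        substitutions.append(
            Substitution(match.group(1), int(match.group(2)), match.group(3))
        )
    return substitutions


for sub in parse_genotype("D14N, E483K"):
    print(sub.wild_type, sub.position, sub.mutant)
```

Parsing into structured records makes it straightforward to group variants by mutated position or to check which double mutants share a substitution with a given single mutant.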
[0054] FIG.21 shows additional optimization of plasmid doses for one-pot delivery. Integration efficiencies were quantified using ddPCR. Optimal plasmid condition is highlighted by the black box. The plasmid dose that resulted in the highest integration efficiency was 100 ng of PE2-encoding plasmid, 10 ng each of pegRNA-encoding plasmid and Bxb1 recombinase-encoding plasmid, and 150 ng of DNA donor plasmid. [0055] FIG.22 shows a schematic for further one-pot delivery optimization of trimmed dual pegRNAs to promote donor DNA integration into the genome and avoid donor/pegRNA plasmid recombination. [0056] FIG.23 shows data for further one-pot delivery optimization of trimmed dual pegRNAs. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used. The B200.1 variant showed the best donor integration efficiency (up to 12.5% efficiency with a trimmed pegRNA pair that contains a 28 bp overlap length). [0057] FIG.24 shows further optimization of one-pot delivery through the use of evolved reverse transcriptase variants. Variants of the reverse transcriptases Ec48 and Tf1 showed the highest integration efficiency. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used. [0058] FIG.25 shows further optimization of one-pot delivery through the use of evolved Cas9 variants to improve integration efficiency. Data was generated using ddPCR, and a 5.6 kB donor DNA plasmid was used. [0059] FIGs.26A-26D show phage-assisted evolution of the Bxb1 recombinase for PASSIGE. FIG.26A shows an overview of PASSIGE. Prime editing (dual-flap or single-flap) precisely installs a large serine recombinase attachment site (attB or attP) into a targeted locus in the genome. A recombinase then recognizes the installed att motif and integrates donor DNA into this site. FIG.26B shows an overview of phage-assisted continuous evolution (PACE). The selection phage (SP) encodes the protein being evolved. Host E. 
coli cells encode a mutagenesis plasmid (MP), as well as plasmids that link the activity of the evolving protein to expression of gIII, an essential phage gene. Only phage that encode active variants trigger gIII expression and propagate. A constant dilution of host cells and media washes out inactive phage variants that are unable to propagate faster than the dilution rate. FIG.26C shows a schematic of the recombinase-PACE selection circuit. Bxb1 recombinase is encoded on the SP. Host cells harbor plasmid P1 that encodes promoter Pro1, and plasmid P2 that encodes a promoter-less gIII
cassette. Bxb1-mediated recombination places Pro1 upstream of the gIII cassette, driving its expression. In circuit 1, two attachment sites are present in each plasmid, resulting in two recombination events that exchange sequences between P1 and P2. In circuit 2, one attachment site is present in each plasmid, resulting in one recombination event that integrates P1 and P2. FIG.26D shows PANCE phage titer for the evolution of Bxb1 recombinase across six circuits (1.1-1.4 and 2.1-2.2). Each trace reflects the mean value of phage titers across four different lagoons. Individual traces for each lagoon are shown in FIG.31. Selection stringency was modulated by decreasing the selection time and increasing dilution factor. Unless otherwise indicated, each passage was performed overnight, and phage were diluted 1:50 after each passage. [0060] FIGs.27A-27E show characterization of evolved Bxb1 variants in mammalian cells. FIG.27A shows a summary of the Bxb1 evolution campaign. p=PANCE passages, h=hours. FIG.27B shows a heatmap of fold-change in integration efficiency compared to wild-type (WT) Bxb1 for evolved variants. A 5.6-kB donor plasmid along with either recombinase-dead Bxb1, WT Bxb1, or an evolved variant were transfected into HEK293T cells with either pre-installed attP in AAVS1 or attB in CCR5. Each square reflects the mean value for three independent replicates. Absolute integration efficiencies and genotype for each Bxb1 variant are reported in Table 6. FIG.27C shows absolute integration efficiencies for 15 evolved Bxb1 variants with the highest activity, and WT Bxb1 from FIG.27B. Bars reflect the mean of three independent replicates, and dots show individual n=3 replicate values. FIG.27D shows an AlphaFold2-predicted structure of the Bxb1 recombinase. The three distinct domains, NTD, CTD-a, and CTD-b are shown. All mutated residues in each domain are listed. FIG.27E shows the predicted positions of mutated residues that gave the highest integration efficiencies. 
Residues are mapped onto the AlphaFold2 predicted structure of the NTD of Bxb1. Integration efficiency was assessed by ddPCR analysis as described in Example 2. [0061] FIGs.28A-28F show a characterization of evolved Bxb1 variants for PASSIGE. FIG. 28A shows absolute integration efficiencies for ten evolved Bxb1 variants with the highest activity from FIGs.27B-27C, and wild-type (WT) Bxb1 in the PASSIGE system. Dual pegRNAs were used to install attP into AAVS1 or attB into CCR5 in HEK293T cells. FIG.28B shows absolute integration efficiencies for PASSIGE (WT Bxb1), evoPASSIGE (Bxb1-V74A), and eePASSIGE (Bxb1-V74A+E229K+V375I). Dual pegRNAs were used to install attP into
AAVS1 or attB into CCR5 in HEK293T cells and attB into Rosa26 in N2a cells. FIG.28C shows a percentage of mCherry positive cells 14 days after transfecting a 3.2-kB donor DNA plasmid along with either dead Bxb1, wild-type (WT) Bxb1, evoBxb1, or eeBxb1. The donor plasmid either has an attP or attB site and encodes mCherry under the CMV promoter. Statistical significance was calculated using Student’s unpaired two-tailed t-test, ***P < 0.001. FIG.28D shows recommended configurations for PASSIGE using evoBxb1 and eeBxb1. When installing attP into the genome, eePASSIGE may be used due to its high efficiency and undetected off-target integration. When installing attB into the genome, evoPASSIGE may be used due to off-target integration observed when using eePASSIGE in this orientation. For PASSIGE experiments, prime editor protein, dual pegRNAs, 5.6-kB DNA donor plasmid, and a recombinase variant were all delivered into cells using single-transfection (FIGs.28A-28B). Bars reflect the mean of three independent replicates and dots show individual n=3 replicate values (FIGs.28A-28C). Integration efficiency was assessed by ddPCR analysis as described in Example 2. FIGs.28E-28F show a representative flow cytometry plot used to assess off-target integration in FIG.28C. FIG.28E shows an untreated sample. FIG.28F shows the histogram used to assess mCherry+ cells when transfecting cells with either the dead Bxb1 negative control or eeBxb1. V1L represents mCherry- cells, and V1R represents mCherry+ cells. [0062] FIGs.29A-29F show characterization of PASSIGE, evoPASSIGE, eePASSIGE, and PASTE. FIG.29A shows a comparison of targeted 5.6-kB donor DNA plasmid integration efficiencies when installing either attP or attB into the genome using PASSIGE, evoPASSIGE, and eePASSIGE at the AAVS1, CCR5, ACTB, and Rosa26 loci in mammalian cells. FIG.29B shows a comparison of PASTE, PASSIGE, evoPASSIGE, and eePASSIGE. 
Dual pegRNAs were used to install attP into AAVS1 and ACTB, and attB into CCR5 and Rosa26. FIG.29C shows absolute integration efficiencies for PASSIGE, evoPASSIGE, eePASSIGE, and PASTE at eight different therapeutically relevant genomic sites. Integration was assessed when installing both attP and attB into each locus separately. FIG.29D shows fold-change in integration efficiencies relative to PASSIGE for evoPASSIGE and eePASSIGE across all sites tested in Example 2. FIG.29E shows fold-change in integration efficiencies relative to PASTE for PASSIGE, evoPASSIGE, and eePASSIGE across all sites tested in Example 2. FIG.29F shows absolute integration efficiencies at 12 sites when using integration strategies with undetectable off-target. Either attP was installed into the genome and eePASSIGE was used, or attB was
installed into the genome and evoPASSIGE was used. For PASSIGE and PASTE experiments (FIGs.29A-29F), all components were delivered using single transfection, and a 5.6-kB donor DNA plasmid was used. Smn1 and Rosa26 are genomic sites in N2a cells, and all other genomic sites are in HEK293T cells (FIGs.29A-29F). For FIG.29D and FIG.29E, integration efficiencies were evaluated at 12 different loci when installing both attP and attB separately. Bars reflect the mean of three independent replicates (FIGs.29A-29C, 29F), dots show individual n=3 replicate values (FIGs.29A-29F), and horizontal lines show the mean value (FIGs.29D-29E). Integration efficiency was assessed by ddPCR analysis as described in Example 2. [0063] FIGs.30A-30D show integration of therapeutic DNA cargo using PASSIGE variants. FIG.30A shows absolute integration efficiencies for PASSIGE, eePASSIGE, and PASTE when integrating therapeutically relevant cDNA cargoes into multiple loci in HEK293T and N2a cells. Bars reflect the mean of three independent replicates, and dots show individual n=3 replicate values. Integration efficiency was assessed by ddPCR analysis as described in Example 2. FIG.30B shows F9 protein measurement via ELISA assay. HuH7 cells were passaged 72 hours after transfection. At day 9 after transfection, media supernatant was collected from each condition and used for F9 ELISA assay. Bars reflect the mean of two independent replicates, and dots show individual n=2 replicate values. For PASSIGE and PASTE experiments in FIGs.30A-30B, all components were delivered using single transfection. For the negative control, all components except the prime editor protein were transfected into cells, and the eeBxb1 recombinase was used. FIG.30C shows the ddPCR plots used to assess integration efficiency at the FANCA locus using PASSIGE (Data shown in FIG.30A). 
FIG.30D shows ddPCR plots for the FAM channel and corresponding % integration values obtained when using the genome-donor junction binding probes used in Example 2. [0064] FIG.31 shows individual PANCE experiments for Bxb1 evolutions in FIG.26D. In circuits 1.1-1.4, two recombinase attachment sites are present in both plasmids, P1 and P2. Circuits 2.1 and 2.2 use one recombinase attachment site per plasmid. Circuits 1.1, 1.2, 2.1, and 2.2 have attB in P1 and attP in P2. Circuits 1.3 and 1.4 have attP in P1 and attB in P2. Circuits 1.2, 1.4, and 2.2 have a GA central dinucleotide in the attachment sites instead of GT, which is present in circuits 1.1, 1.3, and 2.1. PANCE traces for each lagoon (L1-L4) are shown. Selection stringency was modulated by decreasing the selection time and increasing dilution factor. Unless
otherwise indicated, each PANCE passage was performed overnight, and phage were diluted 1:50 after each passage. PANCE titers were measured by qPCR as described in Example 2. [0065] FIGs.32A-32C show PANCE and PACE experiments for Bxb1 evolution in FIG.27A. FIG.32A shows PANCE traces in circuit 1.3. Traces for individual lagoons (L1-L8) are shown. Selection stringency was modulated by decreasing the strength of the ribosome binding site (RBS) from sd8 to sd5, decreasing selection time, and increasing dilution factor. FIG.32B shows PACE traces across four lagoons (L1-L4) using circuit 1.3. Phage pools obtained from PANCE in (FIG.32A) were used to inoculate all lagoons. Selection stringency was modulated by increasing flow rate from 0.5 vol/hr to 3.0 vol/hr. FIG.32C shows PANCE traces for evolution where size of P1 was increased from 3.2-kB to 6.5-kB. Phage pools obtained from PACE in (FIG.32B) were used to inoculate ten individual lagoons (L1-L10). Selection stringency was modulated by increasing dilution factor. For all PANCE experiments, unless otherwise indicated, each passage was performed overnight, and phage were diluted 1:50 after each passage. PANCE titers were measured using qPCR, and PACE titers were measured using plaquing, as described in Example 2. [0066] FIGs.33A-33D show mapping evolved mutations onto the predicted structure of Bxb1. FIG.33A shows AlphaFold2 predicted structures of the NTD, CTD-a, and CTD-b of Bxb1. Each domain aligns well with solved structures of serine recombinases (PDB: 1ZR4, 6DNW, and 4KIS). FIG.33B shows positions of beneficial evolved mutations in the AlphaFold2 predicted structure of the NTD of Bxb1. The DNA substrate from gammadelta resolvase tetramer (PDB: 1ZR4) was docked onto the predicted structure. S10 is the catalytic residue. FIG.33C shows positions of beneficial mutations on the surface of the AlphaFold2 predicted structure of the NTD of Bxb1. 
FIG.33D shows predicted positions of the four mutated residues in the core of the NTD that resulted in the highest integration efficiencies. Positions were predicted using AlphaFold2. The remaining three unmutated residues in each case are in dark grey. [0067] FIGs.34A-34B show optimization of the PASSIGE system. FIG.34A shows a schematic of trimmed pegRNA optimization for PASSIGE. When the overlap length between the two newly synthesized 3' flaps is decreased, unwanted plasmid recombination is reduced as each pegRNA plasmid encodes a trimmed attachment site sequence. FIG.34B shows editing efficiencies when using trimmed pegRNAs to install either attP or attB into the AAVS1 and CCR5 loci. Overlap lengths from 50 bp to 8 bp and 38 bp to 12 bp were tested to install attP and
attB, respectively. The % edit or indels was assessed by Illumina Miseq analysis. Bars reflect the mean of three independent replicates, and dots show the values of individual replicates. [0068] FIGs.35A-35C show characterization of evolved and engineered Bxb1 variants. FIG. 35A shows a heat map of fold-change in integration efficiencies compared to wild-type (WT) Bxb1 for evolved and engineered (ee) Bxb1 variants that were generated by combining one mutation from each domain of Bxb1. A 5.6-kB donor plasmid along with either WT Bxb1, or an ee-variant were transfected into HEK293T cells with either pre-installed attP in AAVS1 or attB in CCR5. Each square reflects the mean value for three independent replicates. Absolute integration efficiencies for each variant are reported in Table 7. Integration efficiency was assessed by ddPCR analysis as described in Example 2. FIG.35B shows a percentage of mCherry-positive cells 14 days after transfecting a 3.2-kB donor DNA plasmid along with either dead Bxb1, evoBxb1, Bxb1 (V74A+V375I), Bxb1 (V74A+E229K), or eeBxb1. The donor plasmid either has an attP or attB site and encodes mCherry under the CMV promoter. Statistical significance was calculated using Student’s unpaired two-tailed t-test, ***P < 0.001, ****P < 0.0001. Bars reflect the mean of three independent replicates and dots show the values of individual replicates. FIG. 35C shows predicted position of the E229K mutation that resulted in off-target integration when delivering an attP containing donor. The DNA substrate from Listeria innocua prophage serine recombinase (PDB: 6DNW) was docked onto the AlphaFold2 predicted structure of the CTD-a domain of Bxb1. [0069] FIG.36 shows PE6 variants to enhance Bxb1 attachment site installation for sites in FIG.29A. Attachment site installation efficiencies with prime editors PEmax, PE6b, PE6c, and PE6d at the AAVS1, CCR5, and ACTB loci in HEK293T cells, and at the Rosa26 locus in N2a cells. 
Prime editing and indel efficiencies were assessed by Illumina Miseq analysis. Bars reflect the mean of three independent replicates. Dots show the values of individual replicates. [0070] FIGs.37A-37F show a comparison of PASTE with PASSIGE. FIG.37A shows a comparison of the optimized pegRNA scaffold (atgRNAv2) used in PASTE with the original pegRNA scaffold used in PASSIGE. FIG.37B shows a comparison of the XTEN-48 linker between the Cas9 and M-MLV reverse transcriptase domain of the prime editor used in PASTE with the SGGSx2-bpNLSSV40-SGGSx2 linker (SEQ ID NO: 90) used in PASSIGE. FIG.37C shows a comparison of the mutated M-MLV RT with the L139P mutation used in PASTE with the M-MLV RT used in PASSIGE. FIG.37D shows a comparison of fusion of Bxb1 to the
PEmax prime editor using the same linker specified in PASTE (cis) with the unfused Bxb1 used in PASSIGE (trans). FIG.37E shows a comparison of the mutated attP sequence used in PASTE with the original attP sequence used in PASSIGE. FIG.37F shows a comparison of the PASSIGE architecture with the PASTE architecture using wild-type Bxb1, evoBxb1, and eeBxb1 recombinases. In the PASSIGE architecture, Bxb1 variants and the PEmax prime editor are unfused. In the PASTE architecture, Bxb1 variants are fused to the prime editor used in PASTE using the same linker specified in FIG.37D. For PASSIGE and PASTE experiments, all components were delivered using a single transfection, a 5.6-kB donor DNA plasmid was used, and all experiments were performed in HEK293T cells. attP was installed at the AAVS1 locus, and attB was installed at the CCR5 and ACTB loci. In all cases, WT Bxb1, evoBxb1, and eeBxb1 were used. Bars reflect the mean of three independent replicates and dots show individual replicate values. Integration efficiency was assessed by ddPCR analysis as described in Example 2. [0071] FIG.38 shows PE6 variants for attachment site installation for therapeutic loci in FIG. 29C. Attachment site installation efficiencies with the PASTE prime editor–Bxb1 fusion compared with prime editors PEmax, PE6b, PE6c, and PE6d. The Smn1 site is in N2a cells, while all other sites are in HEK293T cells. The % edit or indels was assessed by Illumina Miseq analysis. Bars reflect the mean of three independent replicates and dots show individual replicate values. [0072] FIGs.39A-39B show performance of ddPCR probes at the LMNB1, NOLC1, and ACTB sites. FIG.39A shows integration efficiencies for PASSIGE, evoPASSIGE, eePASSIGE, and PASTE at the top three most common sites used to characterize PASTE in the Yarnall et al. 
PASTE report using different ddPCR probes (DNA donor or pegRNA binding probes used in Yarnall et al.27 compared with the genome-donor junction probe used in twin prime editing25 and in this work). Bars reflect the mean of three independent replicates, and dots show individual replicate values. FIG.39B shows ddPCR plots for PASTE, no PEmax (+eeBxb1) control, and dead Bxb1 control when using different ddPCR probes. The bold line shows the threshold that was set to assess integration efficiencies in FIG.39A. Details of how thresholds are calculated are provided in Example 2. In FIGs.39A-39B, probes used in the original PASTE paper are compared side-by-side with probes used in Example 2. In the original PASTE report, probes bind to either the DNA donor plasmid (LMNB1 and NOLC1) or the Bxb1 attB sequence, which is also present in the pegRNA plasmid (ACTB). In Example 2, probes that bind to the Bxb1 attL or
attR sites were used, which are only present in cells after recombination. When using the probe from Yarnall et al. that binds to the DNA donor plasmid at LMNB1 and NOLC1, high background was observed in the negative controls: no PEmax (+eeBxb1) and dead Bxb1 both showed false positive signals. In contrast, minimal or no background was observed at these sites when using an attL-binding probe. For PASSIGE and PASTE experiments, all components were delivered using a single transfection, and a 4.5-kB donor DNA plasmid reported in the PASTE paper was used. The primer pair used for ddPCR is the same as what was reported in the PASTE paper for all reactions. All experiments were performed in HEK293T cells. [0073] FIGs.40A-40B show sequencing of individual phages after PANCE 1 in circuits 1.1-1.4 and 2.1-2.2. [0074] FIG.41 shows sequencing of individual phages after PANCE 2 in circuit 1.3 with 3.2 kB P1 plasmid. [0075] FIG.42 shows sequencing of individual phages after PACE in circuit 1.3 with 3.2 kB P1 plasmid. [0076] FIG.43 shows sequencing of individual phages after PANCE in circuit 1.3 with 6.5 kB P1 plasmid. [0077] FIG.44 shows that eeBxb1 improves integration efficiency of donor DNA in human primary fibroblast cells relative to wild type Bxb1. Delivery of RNA encoding eeBxb1 and the prime editor further increased integration efficiency relative to delivery of DNA encoding the same. DEFINITIONS [0078] Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. 
(eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
Cas9 [0079] The term “Cas9” or “Cas9 nuclease” refers to an RNA-guided nuclease comprising a Cas9 domain, or a fragment thereof (e.g., a protein comprising an active or inactive DNA cleavage domain of Cas9, and/or the gRNA binding domain of Cas9). A “Cas9 domain,” as used herein, is a protein fragment comprising an active or fully or partly inactive cleavage domain of Cas9 and/or the gRNA binding domain of Cas9. A “Cas9 protein” is a full length Cas9 protein. A Cas9 nuclease is also referred to sometimes as a casn1 nuclease or a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeat)-associated nuclease. CRISPR is an adaptive immune system that provides protection against mobile genetic elements (viruses, transposable elements, and conjugative plasmids). CRISPR clusters contain spacers, sequences complementary to antecedent mobile elements, and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In type II CRISPR systems, correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 domain. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves a linear or circular dsDNA target complementary to the spacer. The strand in the target DNA not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821(2012), the contents of which are incorporated herein by reference. 
Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. Cas9 nuclease sequences and structures are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A.98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable
dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. In some embodiments, a Cas9 nuclease comprises one or more mutations that partially impair or inactivate the DNA cleavage domain. [0080] A nuclease-inactivated Cas9 domain may interchangeably be referred to as a “dCas9” protein (for nuclease-“dead” Cas9). Methods for generating a Cas9 domain (or a fragment thereof) having an inactive DNA cleavage domain are known (see, e.g., Jinek et al., Science. 337:816-821(2012); Qi et al., “Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression” (2013) Cell.28;152(5):1173-83, the entire contents of each of which are incorporated herein by reference). For example, the DNA cleavage domain of Cas9 is known to include two subdomains, the HNH nuclease subdomain and the RuvC1 subdomain. The HNH subdomain cleaves the strand complementary to the gRNA, whereas the RuvC1 subdomain cleaves the non-complementary strand. Mutations within these subdomains can silence the nuclease activity of Cas9. For example, the mutations D10A and H840A completely inactivate the nuclease activity of S. pyogenes Cas9 (Jinek et al., Science.337:816-821(2012); Qi et al., Cell.28;152(5):1173-83 (2013)). 
In some embodiments, proteins comprising fragments of a Cas9 protein are provided. For example, in some embodiments, a protein comprises one of two Cas9 domains: (1) the gRNA binding domain of Cas9; or (2) the DNA cleavage domain of Cas9. In some embodiments, proteins comprising Cas9, or fragments thereof, are referred to as “Cas9 variants.” A Cas9 variant shares homology to Cas9, or a fragment thereof. For example, a Cas9 variant is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, at least about 99.8% identical, or at least about 99.9% identical to wild
type Cas9 (e.g., SpCas9 of SEQ ID NO: 6). In some embodiments, the Cas9 variant may have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acid changes compared to wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 6). In some embodiments, the Cas9 variant comprises a fragment of the Cas9 of SEQ ID NO: 6 (e.g., a gRNA binding domain or a DNA-cleavage domain), such that the fragment is at least about 70% identical, at least about 80% identical, at least about 90% identical, at least about 95% identical, at least about 96% identical, at least about 97% identical, at least about 98% identical, at least about 99% identical, at least about 99.5% identical, or at least about 99.9% identical to the corresponding fragment of wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 6). In some embodiments, the fragment is at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% of the amino acid length of a corresponding wild type Cas9 (e.g., SpCas9 of SEQ ID NO: 6). CRISPR [0081] CRISPR is a family of DNA sequences (i.e., CRISPR clusters) in bacteria and archaea that represent snippets of prior infections by a virus that has invaded the prokaryote. The snippets of DNA are used by the prokaryotic cell to detect and destroy DNA from subsequent attacks by similar viruses and effectively compose, along with an array of CRISPR-associated proteins (including Cas9 and homologs thereof) and CRISPR-associated RNA, a prokaryotic immune defense system. In nature, CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). 
In certain types of CRISPR systems (e.g., type II CRISPR systems), correct processing of pre-crRNA requires a trans-encoded small RNA (tracrRNA), endogenous ribonuclease 3 (rnc), and a Cas9 protein. The tracrRNA serves as a guide for ribonuclease 3-aided processing of pre-crRNA. Subsequently, Cas9/crRNA/tracrRNA endonucleolytically cleaves a linear or circular dsDNA target complementary to the RNA. Specifically, the DNA strand in the target that is not complementary to crRNA is first cut endonucleolytically, then trimmed 3′-5′ exonucleolytically. In nature, DNA-binding and cleavage typically requires protein and both RNAs. However, single guide RNAs (“sgRNA”, or simply “gRNA”) can be engineered so as to incorporate aspects of both the crRNA and tracrRNA into a single RNA species – the
guide RNA. See, e.g., Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821(2012), the entire contents of which are hereby incorporated by reference. Cas9 recognizes a short motif in the CRISPR repeat sequences (the PAM or protospacer adjacent motif) to help distinguish self versus non-self. CRISPR biology, as well as Cas9 nuclease sequences and structures, are well known to those of skill in the art (see, e.g., “Complete genome sequence of an M1 strain of Streptococcus pyogenes.” Ferretti J.J., McShan W.M., Ajdic D.J., Savic D.J., Savic G., Lyon K., Primeaux C., Sezate S., Suvorov A.N., Kenton S., Lai H.S., Lin S.P., Qian Y., Jia H.G., Najar F.Z., Ren Q., Zhu H., Song L., White J., Yuan X., Clifton S.W., Roe B.A., McLaughlin R.E., Proc. Natl. Acad. Sci. U.S.A.98:4658-4663(2001); “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III.” Deltcheva E., Chylinski K., Sharma C.M., Gonzales K., Chao Y., Pirzada Z.A., Eckert M.R., Vogel J., Charpentier E., Nature 471:602-607(2011); and “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.” Jinek M., Chylinski K., Fonfara I., Hauer M., Doudna J.A., Charpentier E. Science 337:816-821(2012), the entire contents of each of which are incorporated herein by reference). Cas9 orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. Additional suitable Cas9 nucleases and sequences will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. 
[0082] In general, a “CRISPR system” refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene, a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” ), or other sequences and transcripts from a CRISPR locus. The tracrRNA of the system is complementary (fully or partially) to the tracr mate sequence present on the guide RNA.
DNA synthesis template (or reverse transcription template (RTT)) [0083] As used herein, the terms “DNA synthesis template” and “reverse transcription template (RTT)” refer to the region or portion of the extension arm of a PEgRNA that is utilized as a template by a polymerase of a prime editor to encode a 3ʹ single-strand DNA flap that contains the desired edit and which then, through the mechanism of prime editing, replaces the corresponding endogenous strand of DNA at the target site. The extension arm, including the DNA synthesis template, may be comprised of DNA or RNA. In the case of RNA, the polymerase of the prime editor can be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). In the case of DNA, the polymerase of the prime editor can be a DNA-dependent DNA polymerase. In various embodiments, the DNA synthesis template may comprise the “edit template” and the “homology arm”, and all or a portion of an optional 5′ end modifier region and/or an optional 3′ end modifier region. Said another way, in the case of a 3ʹ extension arm, the DNA synthesis template can include the portion of the extension arm that spans from the 5ʹ end of the primer binding site (PBS) to 3ʹ end of the gRNA core that may operate as a template for the synthesis of a single-strand of DNA by a polymerase (e.g., a reverse transcriptase). In the case of a 5ʹ extension arm, the DNA synthesis template can include the portion of the extension arm that spans from the 5ʹ end of the PEgRNA molecule to the 5′ end of the PBS. Certain embodiments described here refer to a “DNA synthesis template,” an “RT template,” or an “RTT,” which is also inclusive of the edit template and the homology arm, but wherein the RT edit template reflects the use of a prime editor having a polymerase that is a reverse transcriptase, and wherein the DNA synthesis template reflects more broadly the use of a prime editor having any polymerase. 
In certain embodiments, an RT template may be used to refer to a template polynucleotide for reverse transcription, e.g., in a prime editing system, complex, or method using a prime editor having a polymerase that is a reverse transcriptase. In some embodiments, a DNA synthesis template may be used to refer to a template polynucleotide for DNA polymerization, e.g., RNA-dependent DNA polymerization or DNA-dependent polymerization, e.g., in a prime editing system, complex, or method using a prime editor having a polymerase that is an RNA-dependent DNA polymerase or a DNA-dependent DNA polymerase. The term “edit template” refers to a portion of the extension arm that encodes the desired edit in the single strand 3ʹ DNA flap that is synthesized by the polymerase, e.g., a DNA-dependent DNA polymerase or an RNA-dependent DNA polymerase (e.g., a reverse transcriptase).
[0084] As used herein, the term “DNA synthesis template” refers to the region or portion of the extension arm of a pegRNA that is utilized as a template strand by a polymerase of a prime editor to encode a 3ʹ single-strand DNA flap that contains the desired edit and which then, through the mechanism of prime editing, replaces the corresponding endogenous strand of DNA at the target site. The extension arm, including the DNA synthesis template, may be comprised of DNA or RNA. In the case of RNA, the polymerase of the prime editor can be an RNA-dependent DNA polymerase (e.g., a reverse transcriptase). In the case of DNA, the polymerase of the prime editor can be a DNA-dependent DNA polymerase. In various embodiments, the DNA synthesis template comprises the “edit template” and a “homology arm.” In various embodiments, the DNA synthesis template may comprise the “edit template” and a “homology arm”, and all or a portion of the optional 5′ end modifier region, e2. That is, depending on the nature of the e2 region (e.g., whether it includes a hairpin, toeloop, or stem/loop secondary structure), the polymerase may encode none, some, or all of the e2 region, as well. Said another way, in the case of a 3ʹ extension arm, the DNA synthesis template can include the portion of the extension arm that spans from the 5ʹ end of the primer binding site (PBS) to the 3ʹ end of the gRNA core that may operate as a template for the synthesis of a single-strand of DNA by a polymerase (e.g., a reverse transcriptase). In the case of a 5ʹ extension arm, the DNA synthesis template can include the portion of the extension arm that spans from the 5ʹ end of the pegRNA molecule to the 3ʹ end of the edit template. In some embodiments, the DNA synthesis template excludes the primer binding site (PBS) of pegRNAs either having a 3ʹ extension arm or a 5ʹ extension arm. 
Certain embodiments refer to an “RT template,” which is inclusive of the edit template and the homology arm, i.e., the sequence of the pegRNA extension arm which is actually used as a template during DNA synthesis. The term “RT template” is equivalent to the term “DNA synthesis template.” In certain embodiments, an RT template may be used to refer to a template polynucleotide for reverse transcription, e.g., in a prime editing system, complex or method using a prime editor having a polymerase that is a reverse transcriptase. In some embodiments, a DNA synthesis template may be used to refer to a template polynucleotide for DNA polymerization, e.g., RNA-dependent DNA polymerization or DNA-dependent polymerization, e.g., in a prime editing system, complex, or method using a prime editor having a polymerase that is an RNA-dependent DNA polymerase or a DNA-dependent DNA polymerase.
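The relationship between a 3ʹ extension arm, its DNA synthesis template, and the encoded 3ʹ DNA flap can be illustrated with a minimal Python sketch. The sketch assumes an RNA extension arm written 5ʹ to 3ʹ in which the PBS occupies the last nucleotides and everything 5ʹ of the PBS is the DNA synthesis template. The function names, the example sequence, and the 13-nt PBS length are hypothetical and chosen only for illustration; they are not taken from this disclosure.

```python
# Illustrative sketch only: sequences, lengths, and function names are hypothetical.

# Base-pairing of an RNA template nucleotide with the DNA nucleotide that is
# incorporated opposite it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def split_3prime_extension_arm(extension_arm: str, pbs_length: int) -> tuple[str, str]:
    """Split an RNA 3' extension arm (written 5'->3') into
    (DNA synthesis template, primer binding site).

    Assumes the PBS is the last `pbs_length` nucleotides and the DNA
    synthesis template is everything 5' of it.
    """
    if not 0 < pbs_length < len(extension_arm):
        raise ValueError("pbs_length must be between 1 and len(extension_arm) - 1")
    return extension_arm[:-pbs_length], extension_arm[-pbs_length:]


def encoded_flap(dna_synthesis_template: str) -> str:
    """Return the single-strand DNA (5'->3') templated by an RNA template.

    The template is read 3'->5', so the product is its reverse complement.
    """
    return "".join(
        RNA_TO_DNA_COMPLEMENT[nt] for nt in reversed(dna_synthesis_template.upper())
    )


# Hypothetical extension arm: a 9-nt template followed by a 13-nt PBS.
arm = "AUGGCACUU" + "CGGAUCCAGUCAA"
template, pbs = split_3prime_extension_arm(arm, pbs_length=13)
print(template)                # AUGGCACUU
print(pbs)                     # CGGAUCCAGUCAA
print(encoded_flap(template))  # AAGTGCCAT
```

The sketch treats the whole region 5ʹ of the PBS as template; it does not model optional modifier regions or early termination of polymerization.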
[0085] In some embodiments, the DNA synthesis template is a single-stranded portion of the PEgRNA that is 5′ of the PBS and comprises a region of complementarity to the PAM strand (i.e., the non-target strand or the edit strand), and comprises one or more nucleotide edits compared to the endogenous sequence of the double stranded target DNA. In some embodiments, the DNA synthesis template is complementary or substantially complementary to a sequence on the non-target strand that is downstream of a nick site, except for one or more non-complementary nucleotides at the intended nucleotide edit positions. In some embodiments, the DNA synthesis template is complementary or substantially complementary to a sequence on the non-target strand that is immediately downstream (i.e., directly downstream) of a nick site, except for one or more non-complementary nucleotides at the intended nucleotide edit positions. In some embodiments, one or more of the non-complementary nucleotides at the intended nucleotide edit positions are immediately downstream of a nick site. In some embodiments, the DNA synthesis template comprises one or more nucleotide edits relative to the double-stranded target DNA sequence. In some embodiments, the DNA synthesis template comprises one or more nucleotide edits relative to the non-target strand of the double-stranded target DNA sequence. For each PEgRNA described herein, a nick site is characteristic of the particular napDNAbp with which the gRNA core of the PEgRNA associates, and is characteristic of the particular PAM required for recognition and function of the napDNAbp. For example, for a PEgRNA that comprises a gRNA core that associates with a SpCas9, the nick site is in the phosphodiester bond between bases three (“-3” position relative to the position 1 of the PAM sequence) and four (“-4” position relative to position 1 of the PAM sequence). 
In some embodiments, the DNA synthesis template and the primer binding site are immediately adjacent to each other. The terms “nucleotide edit”, “nucleotide change”, “desired nucleotide change”, and “desired nucleotide edit” are used interchangeably to refer to a specific nucleotide edit, e.g., a specific deletion of one or more nucleotides, a specific insertion of one or more nucleotides, a specific substitution(s) of one or more nucleotides, or a combination thereof, at a specific position in a DNA synthesis template of a PEgRNA to be incorporated in a target DNA sequence. In some embodiments, the DNA synthesis template comprises more than one nucleotide edit relative to the double-stranded target DNA sequence. In such embodiments, each nucleotide edit is a specific nucleotide edit at a specific position in the DNA synthesis template, each nucleotide edit is at a different specific position relative to any of the other nucleotide edits
in the DNA synthesis template, and each nucleotide edit is independently selected from a specific deletion of one or more nucleotides, a specific insertion of one or more nucleotides, a specific substitution(s) of one or more nucleotides, or a combination thereof. A nucleotide edit may refer to the edit on the DNA synthesis template as compared to the sequence on the target strand of the double stranded target DNA, or may refer to the edit encoded by the DNA synthesis template on the newly synthesized single stranded DNA that replaces the endogenous target DNA sequence on the non-target strand. Edit strand and non-edit strand [0086] The terms “edit strand” and “non-edit strand” are terms that may be used when describing the mechanism of action of a prime editing system on a double-stranded DNA substrate. The “edit strand” refers to the strand of DNA which is nicked by the prime editor complex to form a 3ʹ end, which is then extended as a newly synthesized single stranded DNA (also referred to herein as the newly synthesized 3′ DNA flap), which comprises a desired edit and ultimately displaces and replaces the single strand region of DNA just downstream of the nick, thereby installing the 3ʹ DNA flap containing the desired edit downstream of the nick on the “edit strand.” In some embodiments, the newly synthesized 3′ DNA flap comprising the nucleotide edit is paired in a heteroduplex with the non-edit strand that does not comprise the nucleotide edit, thereby creating a mismatch. In some embodiments, the mismatch is recognized by DNA repair machinery, and/or replication machinery, e.g., an endogenous DNA repair machinery. In some embodiments, through DNA repair, the intended nucleotide edit is incorporated into both strands of the target double-stranded DNA substrate. The application may also refer to the “edit strand” as the “protospacer strand” or the “PAM strand” since these elements are present in that strand. 
The “edit strand” may also be called the “non-target strand” since the edit strand is not the strand that becomes annealed to the spacer of the PEgRNA molecule, but rather is the complement of the strand that is annealed by the spacer of the PEgRNA. The “non-edit” strand is not directly edited by the PE system. Rather, the desired edit created by the PE system in the 3ʹ DNA flap is incorporated into the “non-edited strand” through DNA replication and/or repair. In some embodiments, the “non-edit strand” is the strand that anneals to the spacer of the PEgRNA, and thus is also called the “target strand.”
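As a rough illustration of how the nick on the edit strand exposes a 3ʹ end, the following Python sketch locates the SpCas9 nick position on a PAM-strand sequence using the convention stated in this disclosure (the nick falls between the “-4” and “-3” positions relative to position 1 of the PAM). It assumes the edit strand is given 5ʹ to 3ʹ and that the protospacer is immediately followed by the PAM. The function names and all sequences are hypothetical examples, and the sketch does not validate the PAM sequence itself.

```python
# Illustrative sketch only: sequences and function names are hypothetical.

def spcas9_nick_index(edit_strand: str, protospacer: str) -> int:
    """Return index i such that the nick lies between edit_strand[i - 1]
    (the "-4" position) and edit_strand[i] (the "-3" position).

    `edit_strand` is the PAM strand written 5'->3'; the PAM is assumed to
    start immediately after the first occurrence of `protospacer`.
    """
    start = edit_strand.find(protospacer)
    if start < 0:
        raise ValueError("protospacer not found on the edit strand")
    pam_start = start + len(protospacer)  # index of PAM position 1
    if pam_start >= len(edit_strand):
        raise ValueError("no sequence 3' of the protospacer to serve as a PAM")
    return pam_start - 3


def primer_at_nick(edit_strand: str, protospacer: str, pbs_length: int) -> str:
    """Return the `pbs_length` nucleotides ending at the free 3' end created
    by the nick, i.e., the primer sequence that anneals to the PBS."""
    nick = spcas9_nick_index(edit_strand, protospacer)
    if not 0 < pbs_length <= nick:
        raise ValueError("pbs_length exceeds the sequence available 5' of the nick")
    return edit_strand[nick - pbs_length:nick]


# Hypothetical target: 5-nt flank, 20-nt protospacer, TGG PAM, 4-nt flank.
protospacer = "GACCTGAAGCTTCGATCGGA"
edit_strand = "TTGAC" + protospacer + "TGG" + "ACCT"

nick = spcas9_nick_index(edit_strand, protospacer)
print(nick)                                        # 22
print(edit_strand[:nick])                          # TTGACGACCTGAAGCTTCGATC
print(edit_strand[nick:])                          # GGATGGACCT
print(primer_at_nick(edit_strand, protospacer, 8)) # CTTCGATC
```

In this sketch the sequence 5ʹ of the nick carries the exposed 3ʹ end that would be extended into the 3ʹ DNA flap, and the sequence 3ʹ of the nick is the region the flap would displace.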
Extension arm [0087] The term “extension arm” refers to a nucleotide sequence component of a PEgRNA which comprises a primer binding site (PBS) and a DNA synthesis template for a polymerase (e.g., an RT template for reverse transcriptase). In some embodiments, the extension arm is located at the 3ʹ end of the guide RNA. In other embodiments, the extension arm is located at the 5ʹ end of the guide RNA. In some embodiments, the extension arm comprises a DNA synthesis template and a primer binding site. In some embodiments, the extension arm comprises the following components in a 5ʹ to 3ʹ direction: the DNA synthesis template, and the primer binding site. In some embodiments, the extension arm also includes a homology arm. In various embodiments, the extension arm comprises the following components in a 5ʹ to 3ʹ direction: the homology arm, the edit template, and the primer binding site. Since polymerization activity of the reverse transcriptase is in the 5ʹ to 3ʹ direction, the preferred arrangement of the homology arm, edit template, and primer binding site is in the 5ʹ to 3ʹ direction such that the reverse transcriptase, once primed by an annealed primer sequence, polymerizes a single strand of DNA using the edit template as a complementary template strand. [0088] The extension arm may be described as comprising generally two regions: a primer binding site (PBS) and a DNA synthesis template, for instance. The primer binding site binds to a primer sequence, for example, a single stranded primer sequence containing a free 3′ end at the nick site that is formed from the endogenous DNA strand of the target site when it becomes nicked by the prime editor complex, thereby exposing a 3ʹ end on the endogenous nicked strand. 
As explained herein, the binding of the primer sequence to the primer binding site on the extension arm of the PEgRNA creates a duplex region with an exposed 3ʹ end (i.e., the 3ʹ of the primer sequence), which then provides a substrate for a polymerase to begin polymerizing a single strand of DNA from the exposed 3ʹ end along the length of the DNA synthesis template. The sequence of the single strand DNA product is the complement of the DNA synthesis template. Polymerization continues towards the 5ʹ of the DNA synthesis template (or extension arm) until polymerization terminates. Thus, the DNA synthesis template represents the portion of the extension arm that is encoded into a single strand DNA product (i.e., the 3ʹ single strand DNA flap containing the desired nucleotide edit) by the polymerase of the prime editor complex and that ultimately replaces the corresponding endogenous DNA strand of the target site that sits immediately downstream of the PE-induced nick site. Without being bound by theory,
polymerization of the DNA synthesis template continues towards the 5ʹ end of the extension arm until a termination event. Polymerization may terminate in a variety of ways, including, but not limited to (a) reaching a 5ʹ terminus of the PEgRNA (e.g., in the case of the 5ʹ extension arm wherein the DNA polymerase simply runs out of template), (b) reaching an impassable RNA secondary structure (e.g., hairpin or stem/loop), or (c) reaching a replication termination signal, e.g., a specific nucleotide sequence that blocks or inhibits the polymerase, or a nucleic acid topological signal, such as supercoiled DNA or RNA. Fusion protein [0089] The term “fusion protein” as used herein refers to a hybrid polypeptide which comprises protein domains from at least two different proteins. One protein may be located at the amino-terminal (N-terminal) portion of the fusion protein or at the carboxy-terminal (C-terminal) portion, thus forming an “amino-terminal fusion protein” or a “carboxy-terminal fusion protein,” respectively. A protein may comprise different domains, for example, a nucleic acid binding domain (e.g., the gRNA binding domain of Cas9 that directs the binding of the protein to a target site) and a nucleic acid cleavage domain or a catalytic domain of a nucleic-acid editing protein. Another example includes fusion of a Cas9 or equivalent thereof to a reverse transcriptase (i.e., a prime editor). Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 
(2012)), the entire contents of which is incorporated herein by reference. Guide RNA (“gRNA”) [0090] As used herein, the term “guide RNA” is a particular type of guide nucleic acid which is mostly commonly associated with a Cas protein of a CRISPR-Cas9 and which associates with Cas9, directing the Cas9 protein to a specific sequence in a DNA molecule that includes complementarity to the spacer sequence of the guide RNA. However, this term also embraces the equivalent guide nucleic acid molecules that associate with Cas9 equivalents, homologs, orthologs, or paralogs, whether naturally occurring or non-naturally occurring (e.g., engineered or recombinant), and which otherwise program the Cas9 equivalent to localize to a specific target
nucleotide sequence. The Cas9 equivalents may include other napDNAbp from any type of CRISPR system (e.g., type II, V, VI), including Cpf1 (a type V CRISPR-Cas system), C2c1 (a type V CRISPR-Cas system), C2c2 (a type VI CRISPR-Cas system), and C2c3 (a type V CRISPR-Cas system). Further Cas-equivalents are described in Makarova et al., “C2c2 is a single-component programmable RNA-guided RNA-targeting CRISPR effector,” Science 2016; 353(6299), the contents of which are incorporated herein by reference. Exemplary sequences and structures of guide RNAs are provided herein. In addition, methods for designing appropriate guide RNA sequences are provided herein. As used herein, the “guide RNA” may also be referred to as a “traditional guide RNA” to contrast it with the modified forms of guide RNA termed “prime editing guide RNAs” (or “PEgRNAs”) and “engineered PEgRNAs” (or “epegRNAs”).

[0091] Guide RNAs or PEgRNAs/epegRNAs may comprise various structural elements that include, but are not limited to:

[0092] Spacer sequence – the sequence in the guide RNA or PEgRNA/epegRNA (about 20 nt in length) that has the same sequence as the protospacer in the target DNA, except that the guide RNA or PEgRNA/epegRNA comprises uracil where the target protospacer contains thymine.

[0093] gRNA core (or gRNA scaffold or backbone sequence) – the sequence within the gRNA that is responsible for binding with a nucleic acid programmable DNA binding protein, e.g., a Cas9. It does not include the spacer sequence that is used to guide Cas9 to target DNA.

[0094] Transcription terminator – the guide RNA or PEgRNA may comprise a transcriptional termination sequence at the 3ʹ end of the molecule.
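The relationship between the spacer and the protospacer described for the spacer sequence element can be sketched in Python. This is a minimal illustration only: the function name `spacer_from_protospacer` and the 20-nt example sequence are hypothetical and are not taken from this disclosure.

```python
def spacer_from_protospacer(protospacer_dna: str) -> str:
    """Return the RNA spacer that has the same sequence as a DNA protospacer.

    The spacer is identical to the protospacer except that uracil (U)
    replaces thymine (T). The input is read 5' to 3'.
    """
    seq = protospacer_dna.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"not a DNA sequence, found: {sorted(unexpected)}")
    return seq.replace("T", "U")


# Hypothetical 20-nt protospacer used purely as an example input.
example_protospacer = "GGCCCAGACTGAGCACGTGA"
print(spacer_from_protospacer(example_protospacer))  # GGCCCAGACUGAGCACGUGA
```

The sketch covers only the spacer; the gRNA core, any extension arm, and the terminator are separate elements and are not modeled here.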
[0095] In some embodiments, a pegRNA or epegRNA may also comprise an extension arm – a single strand extension at the 3ʹ end or the 5ʹ end of the PEgRNA which comprises a primer binding site and a DNA synthesis template sequence that encodes via a polymerase (e.g., a reverse transcriptase) a single stranded DNA flap containing the desired nucleotide change, which then integrates into the endogenous DNA by replacing the corresponding endogenous strand, thereby installing the desired nucleotide change.

Linker

[0096] The term “linker,” as used herein, refers to a molecule linking two other molecules or moieties. The linker can be an amino acid sequence in the case of a peptide linker joining two
domains of a fusion protein. For example, a napDNAbp (e.g., Cas9) can be fused to a reverse transcriptase by an amino acid linker sequence. The linker can also be a nucleotide sequence in the case of joining two nucleotide sequences together (e.g., in a gRNA). For example, in the instant case, the traditional guide RNA is linked via a spacer or linker nucleotide sequence to the RNA extension of a prime editing guide RNA which may comprise an RT template sequence and an RT primer binding site. In other embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-200 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated. napDNAbp [0097] As used herein, the term “nucleic acid programmable DNA binding protein” or “napDNAbp,” of which Cas9 is an example, refers to a protein that uses RNA:DNA hybridization to target and bind to specific sequences in a DNA molecule. Each napDNAbp is associated with at least one guide nucleic acid (e.g., guide RNA), which localizes the napDNAbp to a DNA sequence that comprises a DNA strand (i.e., a target strand) that is complementary to the guide nucleic acid, or a portion thereof (e.g., the protospacer of a guide RNA). In other words, the guide nucleic-acid “programs” the napDNAbp (e.g., Cas9 or equivalent) to localize and bind to a complementary sequence. [0098] Without being bound by theory, the binding mechanism of a napDNAbp – guide RNA complex, in general, includes the step of forming an R-loop whereby the napDNAbp induces the unwinding of a double-strand DNA target, thereby separating the strands in the region bound by the napDNAbp. 
The guide RNA protospacer then hybridizes to the “target strand.” This displaces a “non-target strand” that is complementary to the target strand, which forms the single strand region of the R-loop. In some embodiments, the napDNAbp includes one or more nuclease activities, which then cut the DNA, leaving various types of lesions. For example, the napDNAbp may comprise a nuclease activity that cuts the non-target strand at a first location, and/or cuts the target strand at a second location. Depending on the nuclease activity, the target DNA can be cut to form a “double-stranded break” whereby both strands are cut. In other embodiments, the target DNA can be cut at only a single site, i.e., the DNA is “nicked” on one strand. Exemplary napDNAbp with different nuclease activities include “Cas9 nickase”
(“nCas9”) and a deactivated Cas9 having no nuclease activities (“dead Cas9” or “dCas9”). Exemplary sequences for these and other napDNAbp are provided herein. Nickase [0099] As used herein, a “nickase” refers to a napDNAbp (e.g., a Cas protein) which is capable of cleaving only one of the two complementary strands of a double-stranded target DNA sequence, thereby generating a nick in that strand. In some embodiments, the nickase cleaves a non-target strand of a double stranded target DNA sequence. In some embodiments, the nickase comprises an amino acid sequence with one or more mutations in a catalytic domain of a canonical napDNAbp (e.g., a Cas protein), wherein the one or more mutations reduces or abolishes nuclease activity of the catalytic domain. In some embodiments, the nickase is a Cas9 that comprises one or more mutations in a RuvC-like domain relative to a wild type Cas9 sequence or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is a Cas9 that comprises one or more mutations in an HNH-like domain relative to a wild type Cas9 sequence or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is a Cas9 that comprises an aspartate-to-alanine substitution (D10A) in the RuvC I catalytic domain of Cas9 relative to a canonical SpCas9 sequence or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents. In some embodiments, the nickase is a Cas9 that comprises an H840A, N854A, and/or N863A mutation relative to a canonical SpCas9 sequence, or to an equivalent amino acid position in other Cas9 variants or Cas9 equivalents. In some embodiments, the term “Cas9 nickase” refers to a Cas9 with one of the two nuclease domains inactivated. This enzyme is capable of cleaving only one strand of a target DNA. In some embodiments, the nickase is a Cas protein that is not a Cas9 nickase. 
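The substitution notation used above (e.g., D10A, H840A) names the wild-type residue, its 1-based position, and the replacement residue. A minimal Python sketch of applying such a substitution follows. The function name `apply_substitution` is hypothetical, and the short test sequence is simply the first 18 residues of a Cas9 sequence as it appears in this document, used only so that position 10 can be checked by eye.

```python
import re


def apply_substitution(protein: str, mutation: str) -> str:
    """Apply a point substitution written as <wild type><position><new>, e.g. 'D10A'.

    Positions are 1-based. The wild-type residue is checked before the
    change is made, so a numbering mismatch raises an error.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[position - 1]}"
        )
    return protein[: position - 1] + replacement + protein[position:]


# First 18 residues of a Cas9 sequence; residue 10 is aspartate (D).
cas9_n_terminus = "MDKKYSIGLDIGTNSVGW"
print(apply_substitution(cas9_n_terminus, "D10A"))  # MDKKYSIGLAIGTNSVGW
```

Because the wild-type residue is verified first, the same helper also flags cases where a sequence carries an N-terminal tag that shifts the numbering relative to the canonical sequence.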
[0100] In some embodiments, the napDNAbp of the prime editing complex comprises an endonuclease having nucleic acid programmable DNA binding ability. In some embodiments, the napDNAbp comprises an active endonuclease capable of cleaving both strands of a double stranded target DNA. In some embodiments, the napDNAbp is a nuclease active endonuclease, e.g., a nuclease active Cas protein, that can cleave both strands of a double stranded target DNA by generating a nick on each strand. For example, a nuclease active Cas protein can generate a cleavage (a nick) on each strand of a double stranded target DNA. In some embodiments, the two nicks on both strands are staggered nicks, for example, generated by a napDNAbp comprising a
Cas12a or Cas12b1. In some embodiments, the two nicks on both strands are at the same genomic position, for example, generated by a napDNAbp comprising a nuclease active Cas9. In some embodiments, the napDNAbp comprises an endonuclease that is a nickase. For example, in some embodiments, the napDNAbp comprises an endonuclease comprising one or more mutations that reduce nuclease activity of the endonuclease, rendering it a nickase. In some embodiments, the napDNAbp comprises an inactive endonuclease, for example, in some embodiments, the napDNAbp comprises an endonuclease comprising one or more mutations that abolish the nuclease activity. In various embodiments, the napDNAbp is a Cas9 protein or variant thereof. The napDNAbp can also be a nuclease active Cas9, a nuclease inactive Cas9 (dCas9), or a Cas9 nickase (nCas9). In a preferred embodiment, the napDNAbp is Cas9 nickase (nCas9) that nicks only a single strand. In other embodiments, the napDNAbp can be selected from the group consisting of: Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute and optionally has a nickase activity such that only one strand is cut. In some embodiments, the napDNAbp is selected from Cas9, Cas12e, Cas12d, Cas12a, Cas12b1, Cas12b2, Cas13a, Cas12c, Cas12d, Cas12e, Cas12h, Cas12i, Cas12g, Cas12f (Cas14), Cas12f1, Cas12j (CasΦ), and Argonaute and optionally has a nickase activity such that one DNA strand is cut preferentially to the other DNA strand. Nuclear localization sequence (NLS) [0101] The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. 
For example, NLS sequences are described in Plank et al., international PCT application, PCT/EP2000/011690, filed November 23, 2000, published as WO 2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference for its disclosure of exemplary nuclear localization sequences. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 94), MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 99), KRTADGSEFESPKKKRKV (SEQ ID NO: 97), KRTADGSEFEPKKKRKV (SEQ ID NO: 106), NLSKRPAAIKKAGQAKKKK (SEQ ID NO: 107), PAAKRVKLD (SEQ ID NO: 98), RQRRNELKRSF (SEQ ID NO: 108), or NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 109).
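As a simple illustration of how the listed NLS sequences can be located in a fusion protein, the following Python sketch searches a protein sequence for exact matches to some of the NLS sequences recited above. Exact substring matching and the function name `find_nls` are assumptions of this sketch; it is not a general NLS prediction method. The example input is the first 28 residues of the PE1 protein sequence as given in this document.

```python
# A subset of the NLS sequences recited above, keyed by sequence identifier.
NLS_MOTIFS = {
    "SEQ ID NO: 94": "PKKKRKV",
    "SEQ ID NO: 97": "KRTADGSEFESPKKKRKV",
    "SEQ ID NO: 106": "KRTADGSEFEPKKKRKV",
    "SEQ ID NO: 98": "PAAKRVKLD",
}


def find_nls(protein: str) -> list[tuple[str, int, int]]:
    """Return (identifier, start, end) for every exact NLS match, 1-based inclusive."""
    hits = []
    for label, motif in NLS_MOTIFS.items():
        start = protein.find(motif)
        while start != -1:
            hits.append((label, start + 1, start + len(motif)))
            start = protein.find(motif, start + 1)
    # Report hits in order of their start position.
    return sorted(hits, key=lambda hit: hit[1])


pe1_n_terminus = "MKRTADGSEFESPKKKRKVDKKYSIGLD"
print(find_nls(pe1_n_terminus))
# [('SEQ ID NO: 97', 2, 19), ('SEQ ID NO: 94', 13, 19)]
```

The two hits overlap because the shorter sequence (SEQ ID NO: 94) is contained within the longer one (SEQ ID NO: 97).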
Nucleic acid

[0102] The term “nucleic acid,” as used herein, refers to a polymer of nucleotides. The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (e.g., 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, 5-methylcytidine, C5 bromouridine, C5 fluorouridine, C5 iodouridine, C5 propynyl uridine, C5 propynyl cytidine, C5 methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 4-acetylcytidine, 5-(carboxyhydroxymethyl)uridine, dihydrouridine, methylpseudouridine, 1-methyl adenosine, 1-methyl guanosine, N6-methyl adenosine, and 2-thiocytidine), chemically modified bases, biologically modified bases (e.g., methylated bases), intercalated bases, modified sugars (e.g., 2′-fluororibose, ribose, 2′-deoxyribose, 2′-O-methylcytidine, arabinose, and hexose), or modified phosphate groups (e.g., phosphorothioates and 5ʹ N phosphoramidite linkages).

PEgRNA

[0103] As used herein, the terms “prime editing guide RNA” or “PEgRNA” or “pegRNA” or “extended guide RNA” refer to a specialized form of a guide RNA that has been modified to include one or more additional sequences for implementing the prime editing methods and compositions described herein. As described herein, the prime editing guide RNAs comprise one or more “extended regions,” also referred to herein as “extension arms,” of nucleic acid sequence. The extended regions may comprise, but are not limited to, single-stranded RNA or DNA. Further, the extended regions may occur at the 3′ end of a traditional guide RNA. In other arrangements, the extended regions may occur at the 5′ end of a traditional guide RNA. In still other arrangements, the extended region may occur at an intramolecular region of the traditional guide RNA, for example, in the gRNA core region which associates and/or binds to the napDNAbp.
The extended region comprises a “DNA synthesis template” which encodes (by the polymerase of the prime editor) a single-stranded DNA which, in turn, has been designed to be (a) homologous with the endogenous target DNA to be edited, and (b) which comprises at least one desired nucleotide change (e.g., a transition, a transversion, a deletion, or an insertion) to be introduced or integrated into the endogenous target DNA. The extended region may also comprise other functional sequence elements, such as, but not limited to, a “primer binding site” and a “linker” sequence, or other structural elements, such as, but not limited to, aptamers, stem
loops, hairpins, toe loops (e.g., a 3′ toeloop), or an RNA-protein recruitment domain (e.g., MS2 hairpin). As used herein, the “primer binding site” comprises a sequence that hybridizes to a single-strand DNA sequence having a 3′ end generated from the nicked DNA of the R-loop.

[0104] In certain embodiments, the PEgRNAs have a 3ʹ extension arm, a spacer, and a gRNA core. The 3ʹ extension arm further comprises in the 5ʹ to 3ʹ direction a DNA synthesis template, a primer binding site, and a linker. Where the polymerase of a prime editor described herein is a reverse transcriptase (RT), the DNA synthesis template may also be referred to as the “RT template”; the broader term “DNA synthesis template” also covers prime editors whose polymerase is not an RT, but another type of polymerase.

[0105] In certain other embodiments, the PEgRNAs have a 5ʹ extension arm, a spacer, and a gRNA core. The 5ʹ extension arm further comprises in the 5ʹ to 3ʹ direction a DNA synthesis template, a primer binding site, and a linker. As above, the term “DNA synthesis template” is used broadly and also covers prime editors whose polymerase is not an RT, but another type of polymerase.

[0106] In still other embodiments, the PEgRNAs have in the 5ʹ to 3ʹ direction a spacer (1), a gRNA core (2), and an extension arm (3). The extension arm (3) is at the 3ʹ end of the PEgRNA. The extension arm (3) further comprises in the 5ʹ to 3ʹ direction a homology arm, an edit template, and a primer binding site. The extension arm (3) may also comprise an optional modifier region at the 3ʹ and 5ʹ ends, which may be the same sequences or different sequences. In addition, the 3ʹ end of the PEgRNA may comprise a transcriptional terminator sequence. These sequence elements of the PEgRNAs are further described and defined herein.

[0107] In still other embodiments, the PEgRNAs have in the 5ʹ to 3ʹ direction an extension arm (3), a spacer (1), and a gRNA core (2). The extension arm (3) is at the 5ʹ end of the PEgRNA. 
The extension arm (3) further comprises in the 3ʹ to 5ʹ direction a primer binding site, an edit template, and a homology arm. The extension arm (3) may also comprise an optional modifier region at the 3ʹ and 5ʹ ends, which may be the same sequences or different sequences. The PEgRNAs may also comprise a transcriptional terminator sequence at the 3ʹ end. These sequence elements of the PEgRNAs are further described and defined herein. PE1 [0108] As used herein, “PE1” refers to a prime editing composition comprising 1) a fusion protein comprising a Cas9 protein variant Cas9(H840A) and a wild type MMLV RT having the following structure: [NLS]-[Cas9(H840A)]-[linker]-[MMLV_RT(wt)] -NLS and 2) a desired
PEgRNA, wherein the fusion protein (referred to as the PE1 protein) has the amino acid sequence of SEQ ID NO: 3, which is shown as follows. MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTD RHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDS FFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLI YLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAI LSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSK DTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYD EHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEK MDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKDNRE KIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERM TNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDL LFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDKDFLD NEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGWGRLSR KLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLH EHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNS RERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRL SDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLN AKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYD ENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKY PKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIR KRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKR NSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIME RSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNEL ALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILA DANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTK EVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSSGSETPGTSESATPESSGGSSGG SSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSIKQ YPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNKRVEDIH PTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGISGQLTWTRL PQGFKNSPTLFDEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRALLQTLGNLG YRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFLGTAGFCRLW IPGFAEMAAPLYPLTKTGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGY 
AKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPH AVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAE AHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELI ALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGLLTSEGKEIKNKDEILALLKALFLPKR LSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFEP KKKRKV (SEQ ID NO: 3) KEY: NUCLEAR LOCALIZATION SEQUENCE (NLS) TOP:(SEQ ID NO: 95), BOTTOM: (SEQ ID NO: 96) CAS9(H840A) (SEQ ID NO: 10) 33-AMINO ACID LINKER (SEQ ID NO: 80)
M-MLV reverse transcriptase (SEQ ID NO: 28). PE2 [0109] As used herein, “PE2” refers to prime editing composition comprising 1) a fusion protein comprising a Cas9 protein variant Cas9(H840A) and a variant MMLV RT having the following structure: [NLS]-[Cas9(H840A)]-[linker]- [MMLV_RT(D200N)(T330P)(L603W)(T306K)(W313F)] -NLS and 2) a desired PEgRNA, wherein the fusion protein (referred to as the PE2 protein) has the amino acid sequence of SEQ ID NO: 4, which is shown as follows: MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNT DRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVD DSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADL RLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA KAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQ LSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPI LEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFI ERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKA IVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDK DFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGW GRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQ GDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQK GQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQE LDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYW RQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRM NTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGT ALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITL ANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKE SILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKEL LGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGEL QKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFS KRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRK RYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSSGSETPGTSESATPESS GGSSGGSSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKAT 
STPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREV NKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGIS GQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLLLAATSELDCQQGTRA LLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKTPRQLREFL GKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAPALGLPDLTKPF ELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRMVAAIAVLTKDAGKLT MGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDRVQFGPVVALNPATLLPLPEE GLQHNCLDILAEAHGTRPDLTDQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKA
LPAGTSAQRAELIALTQALKMAEGKKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNK DEILALLKALFLPKRLSIIHCPGHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPS GGSKRTADGSEFEPKKKRKV (SEQ ID NO: 4)

KEY:
NUCLEAR LOCALIZATION SEQUENCE (NLS): TOP (SEQ ID NO: 95), BOTTOM (SEQ ID NO: 96)
CAS9(H840A) (SEQ ID NO: 10)
33-AMINO ACID LINKER (SEQ ID NO: 80)
M-MLV reverse transcriptase (SEQ ID NO: 29)

PE3

As used herein, “PE3” refers to a prime editing composition comprising a PE2 and further comprising a second-strand nicking guide RNA that complexes with the PE2 and introduces a nick in the non-edit DNA strand in order to induce preferential replacement of the edit strand.

PE3b

[0110] As used herein, “PE3b” refers to a prime editing composition comprising PE2 and further comprising a second-strand nicking guide RNA that complexes with PE2 and introduces a nick in the non-edit DNA strand, wherein the second-strand nicking guide RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit. This is achieved by designing the second strand nicking guide RNA with a spacer sequence that comprises complementarity to, and only hybridizes with, the edited strand after installation of the desired nucleotide edit(s), but not the endogenous target DNA sequence. Using this strategy, mismatches between the nicking guide RNA spacer and the unedited target DNA should disfavor nicking by the sgRNA until after the editing event on the PAM strand takes place.

PE4

[0111] As used herein, “PE4” refers to a prime editing composition comprising a PE2 and further comprising an MLH1 dominant negative protein variant (i.e., wild-type MLH1 with amino acids 754-756 truncated, which may be referred to herein as “MLH1 Δ754-756” or “MLH1dn”). The MLH1 dominant negative protein variant may be expressed in trans in some embodiments. 
In some embodiments, a PE4 system comprises a fusion protein comprising a PE2 protein and an MLH1 dominant negative protein joined via an optional linker.
PE5 and PE5b

[0112] As used herein, “PE5” refers to a prime editing composition comprising a PE3 and further comprising an MLH1 dominant negative protein variant (i.e., wild-type MLH1 with amino acids 754-756 truncated, which may be referred to as “MLH1 Δ754-756” or “MLH1dn”). The MLH1 dominant negative variant may be expressed in trans in some embodiments. In some embodiments, a PE5 system comprises a fusion protein comprising a PE2 protein and an MLH1 dominant negative protein joined via an optional linker. “PE5b” refers to a prime editing composition comprising a PE3 and an MLH1 dominant negative protein, wherein the second-strand nicking guide RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit. This is achieved by designing the second strand nicking guide RNA with a spacer sequence that comprises complementarity to, and hybridizes with, only the edited strand after installation of the desired nucleotide edit(s), but not the endogenous target DNA sequence.
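The temporal-control design used for the “b” variants (a nicking spacer that matches the edited sequence but not the endogenous sequence) can be illustrated with a simplified Python check. This sketch compares sequences by counting mismatches at aligned positions rather than modeling hybridization, and all names, sequences, and the mismatch threshold are hypothetical choices made for illustration.

```python
def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_edit_specific(spacer: str, unedited_site: str, edited_site: str,
                     min_mismatches_unedited: int = 1) -> bool:
    """Return True if the spacer matches the edited site perfectly and
    mismatches the unedited site at least `min_mismatches_unedited` times.

    All three sequences are written in the same orientation (protospacer
    form, DNA letters), which is a simplification made for this sketch.
    """
    return (count_mismatches(spacer, edited_site) == 0
            and count_mismatches(spacer, unedited_site) >= min_mismatches_unedited)


# Hypothetical 20-nt site before and after a single T-to-A edit at position 14.
unedited = "ACGTTGCAAGGCTTACCGTA"
edited = "ACGTTGCAAGGCTAACCGTA"

print(is_edit_specific(edited, unedited, edited))    # True
print(is_edit_specific(unedited, unedited, edited))  # False
```

A spacer that passes this check is a candidate only; the check says nothing about PAM availability or how strongly a given mismatch disfavors nicking.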
PEmax

[0113] As used herein, “PEmax” refers to a prime editing composition comprising 1) a fusion protein comprising a Cas9 protein variant Cas9(R221K N394K H840A) and a variant MMLV RT having the following structure: [bipartite NLS]-[Cas9(R221K)(N394K)(H840A)]-[linker]-[MMLV_RT(D200N)(T330P)(L603W)]-[bipartite NLS]-[NLS] and 2) a desired PEgRNA, wherein the fusion protein (referred to as the PEmax protein) has the amino acid sequence of SEQ ID NO: 5, which is shown as follows:

MKRTADGSEFESPKKKRKVDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNT DRHSIKKNLIGALLFDSGETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVD DSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADL RLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENPINASGVDA KAILSARLSKSRKLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQ LSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIK RYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPI LEKMDGTEELLVKLKREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDFYPFLKD NREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFI ERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKA IVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLLKIIKDK DFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTGW GRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQ GDSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQK GQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQE LDINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYW
RQLLNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRM NTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGT ALIKKYPKLESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITL ANGEIRKRPLIETNGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKE SILPKRNSDKLIARKKDWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKEL LGITIMERSSFEKNPIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGEL QKGNELALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFS KRVILADANLDKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRK RYTSTKEVLDATLIHQSITGLYETRIDLSQLGGDSGGSSGGSKRTADGSEFESPKKK RKVSGGSSGGSTLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAP LIIPLKATSTPVSIKQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTND YRPVQDLREVNKRVEDIHPTVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPL FAFEWRDPEMGISGQLTWTRLPQGFKNSPTLFNEALHRDLADFRIQHPDLILLQYVDDLL LAATSELDCQQGTRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEAR KETVMGQPTPKTPRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKA YQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDP VAAGWPPCLRMVAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMT HYQALLLDTDRVQFGPVVALNPATLLPLPEEGLQHNCLDILAEAHGTRPDLTDQPLPDA DHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKALPAGTSAQRAELIALTQALKMAEG KKLNVYTDSRYAFATAHIHGEIYRRRGWLTSEGKEIKNKDEILALLKALFLPKRLSIIHCP GHQKGHSAEARGNRMADQAARKAAITETPDTSTLLIENSSPSGGSKRTADGSEFESPKK KRKVGSGPAAKRVKLD (SEQ ID NO: 5)

KEY:
BIPARTITE SV40 NUCLEAR LOCALIZATION SEQUENCE (NLS), TOP (SEQ ID NO: 95)
CAS9(R221K N394K H840A) (SEQ ID NO: 11)
SGGSx2-BIPARTITE SV40NLS-SGGSx2 LINKER (SEQ ID NO: 79)
M-MLV reverse transcriptase (D200N T306K W313F T330P L603W) (SEQ ID NO: 29)
Other linker sequence (SEQ ID NO: 82)
BIPARTITE SV40NLS (SEQ ID NO: 97)
Other linker sequence
c-Myc NLS (SEQ ID NO: 98)

PE3max and PE3bmax

[0114] As used herein, “PE3max” refers to a prime editing composition comprising a PEmax protein, a desired pegRNA, and a second strand nicking guide RNA. In some embodiments, PE3max can be considered as PE3 except wherein the PE2 component is substituted with PEmax. 
“PE3bmax” refers to a prime editing composition comprising a PEmax protein, a desired pegRNA, and a second strand nicking guide RNA, wherein the second-strand nicking guide
RNA is designed for temporal control such that the second strand nick is not introduced until after the installation of the desired edit. This is achieved by designing the second strand nicking guide RNA with a spacer sequence that comprises complementarity to, and hybridizes with, only the edited strand after installation of the desired nucleotide edit(s), but not the endogenous target DNA sequence.

PE4max

[0115] As used herein, “PE4max” refers to PE4 but wherein the PE2 component is substituted with PEmax.

PE5max and PE5bmax

[0116] As used herein, “PE5max” refers to PE5, but wherein the PE2 component of PE3 is substituted with PEmax. “PE5bmax” refers to PE5b wherein the PE2 component of PE3 is substituted with PEmax.

PE6

[0117] The term “PE6” refers to a suite of prime editors (PE6a, PE6b, PE6c, PE6d, PE6e, PE6f, and PE6g) comprising improved reverse transcriptase and/or Cas9 variants. The improved reverse transcriptase and Cas9 domains of the PE6 variants can also be combined with each other to offer cumulative benefits. For example, a PE6 prime editor comprising an improved reverse transcriptase variant of PE6a and an improved Cas9 variant of PE6e is referred to herein as the prime editor “PE6a-e” (or “PE6e-a”). Any possible combination of PE6 prime editors is contemplated by the present disclosure including, for example, PE6a-e, PE6a-f, PE6a-g, PE6b-e, PE6b-f, PE6b-g, PE6c-e, PE6c-f, PE6c-g, PE6d-e, PE6d-f, and PE6d-g.

[0118] Any of the PE6 prime editors may also comprise the architecture of the PEmax protein as provided herein. In some embodiments, any of the PE6 prime editors provided herein may further comprise additional amino acid mutations, e.g., any of those included in PEmax.
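The combination names listed above follow a regular pattern that can be enumerated in a few lines of Python. Treating PE6a through PE6d as the reverse transcriptase variants and PE6e through PE6g as the Cas9 variants is inferred from the listed examples and is an assumption of this sketch, as is the function name `pe6_combinations`.

```python
from itertools import product

# Assumed grouping, inferred from the example combinations in the text.
RT_VARIANTS = "abcd"    # PE6a, PE6b, PE6c, PE6d
CAS9_VARIANTS = "efg"   # PE6e, PE6f, PE6g


def pe6_combinations() -> list[str]:
    """Name every pairing of one RT variant with one Cas9 variant."""
    return [f"PE6{rt}-{cas9}" for rt, cas9 in product(RT_VARIANTS, CAS9_VARIANTS)]


names = pe6_combinations()
print(len(names))  # 12
print(names[0], names[-1])  # PE6a-e PE6d-g
```

Under this grouping the sketch yields twelve names, in the same order as the combinations listed in the text.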
[0119] In some embodiments, a PE6 protein comprises a reverse transcriptase of the following amino acid sequence (the RT domain of “PE6a”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVEELKAIAELSLDEKYTLKKI PKIDGSKRIVYSLHPKMRLLQSRINERIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRSVFEEILHIKDEALDYLVDICTKDDFVVQGAL
TSSYIATLCLFAVEGDVVRRAQRKGLVYTRLVDDITVSSKISNYDFSQMQSHIERMLSEH NLPINKHKTKIFHCSSEPIKVHGLIVDYDSPRLPSDKVKRIRASIHNLKLLAAKNNTKTSV AYRKEFNRCMGRVNELGRVGHEKYESFKKQLQAIKPMPSNRDVAVIDAAIKSLELSYSK GNQNKHWYKRKYDLTRYKMIILTRSESFKEKLECFKSRLASLKPL (SEQ ID NO: 52) [0120] In some embodiments, a PE6 protein comprises a reverse transcriptase of the following amino acid sequence (the RT domain of “PE6b”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLTPVKMQAMNDEINQGLKGGIIRESKAINACPVIFVPRKEGTLRMVVDYRPLNK YVKPNVYPLPLIEQLLAKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGLTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DVSDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLEHWRHY LESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALSR IVDETEPIPKDNEDNSINFVNQISI (SEQ ID NO: 76) [0121] In some embodiments, a PE6 protein comprises a reverse transcriptase of the following amino acid sequence (the RT domain of “PE6c”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLTPVKMQAMNDEINQGLKGGIIRESKAINACPVIFVPRKEGTLRMVVDYRPLNK YVKPNVYPLPLIEQLLAKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGIKTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFLGYHISEKGLTPCQENIDKVLQWKQPKNQKELRQFLGQVNY LRKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DVSDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLEHWRHY
LESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALSR IVDETEPIPKDNEDNSINFVNQISI (SEQ ID NO: 77) [0122] In some embodiments, a PE6 protein comprises a reverse transcriptase comprising the following amino acid sequence (the RT domain of “PE6d”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: TLNIEDEYRLHETSKEPDVSLGSTWLSDFPQAWAETGGMGLAVRQAPLIIPLKATSTPVSI KQYPMSQEARLGIKPHIQRLLDQGILVPCQSPWNTPLLPVKKPGTNDYRPVQDLREVNK RVEDIHPNVPNPYNLLSGLPPSHQWYTVLDLKDAFFCLRLHPTSQPLFAFEWRDPEMGIS GQLTWTRLPQGFKNSPTLFCEALHRDLADFRIQHPDLILLQYYDDLLLAATSELDCQQG TRALLQTLGNLGYRASAKKAQICQKQVKYLGYLLKEGQRWLTEARKETVMGQPTPKT PRQLREFLGKAGFCRLFIPGFAEMAAPLYPLTKPGTLFNWGPDQQKAYQEIKQALLTAP ALGLPDLTKPFELFVDEKQGYAKGVLTQKLGPWRRPVAYLSKKLDPVAAGWPPCLRM VAAIAVLTKDAGKLTMGQPLVILAPHAVEALVKQPPDRWLSNARMTHYQALLLDTDR VQFGPVVALNPATLLPLPEEGLQHNCLD (SEQ ID NO: 120) [0123] In some embodiments, a PE6 protein comprises a Cas9 protein of the following amino acid sequence (the Cas9 domain of “PE6e”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIF GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNS DVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFG NLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSD AILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGY AGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGEL HAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEE VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPA FLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLL KIIKDKDFLDNEENEDILEDIVLTLTLFEDREMIEERLKTYAHLFDDKVMKQLKRRRYTG 
WGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQG
DSLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQRN SRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLSD YDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLIT QRKFDNLTKAERGGLSELDKAGFIARQLVETRQITKHVAQILDSRMNTKYDENDKLIRE VKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVYG DYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETGEI VWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPKK YGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYKE VKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKGS PEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENI IHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD (SEQ ID NO: 121) [0124] In some embodiments, a PE6 protein comprises a Cas9 protein of the following amino acid sequence (the Cas9 domain of “PE6f”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFRRLEESFLVEEDKKHERHPIF GNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNS DVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFG NLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSD AILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGY AGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGEL HAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEKTITPWNFEE VVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPA FLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLGTYHDLL KIIKDKDFLDNEENEDILEDIVLTLTLFEDREMVEERLKTYAHLFDNKVMKQLKRRRYT GWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQ GDSLYEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQK NSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINRLS DYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLI
TQRKFDNLTKAERGGLSELDKAGFIARQLVETRQITKHVAQILDSRMNTKYDENDKLIR EVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFVY GDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGETG EIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKKDWDPK KYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLEAKGYK EVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHYEKLKG SPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAE NIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQLGGD (SEQ ID NO: 122) [0125] In some embodiments, a PE6 protein comprises a Cas9 protein of the following amino acid sequence (the Cas9 domain of “PE6g”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: [0126] MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDS GETAEATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFRRLEESFLVEEDKKHE RHPIFGNIVDEVAYHEKYPTIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDL NPDNSDVDKLFIQLVQTYNQLFEENPINASGVDAKAILSARLSKSRRLENLIAQLPGEKK NGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAA KNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQ SKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQI HLGELHAILRRQEDFYPFLKDNREKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEKTITP WNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEG MRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISGVEDRFNASLG TYHDLLKIIKDKDFLDNEENEDILEDIVLTLTLFEDREMVEERLKTYAHLFDNKVMKQL KRCRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKA QVSGQGDSLYEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTT QKGQKNSRERMKRIEEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQEL DINRLSDYDVDAIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNYWRQL LNAKLITQRKFDNLTKAERGGLSELDKAGFIKRQLVETRQITKHVAQILDSRMNTKYDE NDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKL ESEFVYGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIET
NGETGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKESILPKRNSDKLIARKK DWDPKKYGGFDSPTVAYSVLVVAKVEKGKSKKLKSVKELLGITIMERSSFEKNPIDFLE AKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNELALPSKYVNFLYLASHY EKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPI REQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYTSTKEVLDATLIHQSITGLYETRIDLSQ LGGD (SEQ ID NO: 123) Polymerase [0127] As used herein, the term “polymerase” refers to an enzyme that synthesizes a nucleotide strand and that may be used in connection with the prime editor delivery systems described herein. The polymerase can be a “template-dependent” polymerase (i.e., a polymerase that synthesizes a nucleotide strand based on the order of nucleotide bases of a template strand). The polymerase can also be a “template-independent” polymerase (i.e., a polymerase that synthesizes a nucleotide strand without the requirement of a template strand). A polymerase may also be further categorized as a “DNA polymerase” or an “RNA polymerase.” In various embodiments, the prime editor system comprises a DNA polymerase. In various embodiments, the DNA polymerase can be a “DNA-dependent DNA polymerase” (i.e., whereby the template molecule is a strand of DNA). In such cases, the DNA template molecule can be a PEgRNA, wherein the extension arm comprises a strand of DNA. In such cases, the PEgRNA may be referred to as a chimeric or hybrid PEgRNA which comprises an RNA portion (i.e., the guide RNA components, including the spacer and the gRNA core) and a DNA portion (i.e., the extension arm). In various other embodiments, the DNA polymerase can be an “RNA-dependent DNA polymerase” (i.e., whereby the template molecule is a strand of RNA). In such cases, the PEgRNA is RNA, i.e., including an RNA extension. The term “polymerase” may also refer to an enzyme that catalyzes the polymerization of nucleotides (i.e., the polymerase activity). 
Generally, the enzyme will initiate synthesis at the 3′-end of a primer annealed to a polynucleotide template sequence (e.g., such as a primer sequence annealed to the primer binding site of a PEgRNA) and will proceed toward the 5′ end of the template strand. A “DNA polymerase” catalyzes the polymerization of deoxynucleotides. As used herein in reference to a DNA polymerase, the term DNA polymerase includes a “functional fragment thereof.” A “functional fragment thereof” refers to any portion of a wild-type or mutant DNA polymerase that encompasses less than the entire amino acid sequence of the polymerase and which retains the ability, under at least one set of conditions, to
catalyze the polymerization of a polynucleotide. Such a functional fragment may exist as a separate entity, or it may be a constituent of a larger polypeptide, such as a fusion protein. Prime editing [0128] As used herein, the term “prime editing” refers to an approach for gene editing using napDNAbps, a polymerase (e.g., a reverse transcriptase), and specialized guide RNAs that include a primer binding site and a DNA synthesis template for encoding desired new genetic information (or deleting genetic information) that is then incorporated into a target DNA sequence. For example, prime editing may be used to incorporate one or more recombinase recognition sequences into a target DNA sequence such as a genome, as described herein. Prime editing is described in Anzalone, A. V. et al., Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019), which is incorporated herein by reference. See also International PCT Application, PCT/US2020/023721, filed March 19, 2020, and published as WO 2020/191239, which is incorporated herein by reference. [0129] Prime editing is a versatile and precise genome editing platform that directly writes new genetic information into a specified DNA site using a nucleic acid programmable DNA binding protein (“napDNAbp”) working in association with a polymerase (i.e., in the form of a fusion protein or otherwise provided in trans with the napDNAbp), wherein the prime editing system is programmed with a prime editing (PE) guide RNA (“PEgRNA”) that both specifies the target site and templates the synthesis of the desired edit (e.g., a recombinase recognition sequence to be inserted into a target DNA) in the form of a replacement DNA strand by way of an extension (either DNA or RNA) engineered onto a guide RNA (e.g., at the 5ʹ or 3ʹ end, or at an internal portion of a guide RNA). 
The replacement strand containing the desired edit (e.g., a single nucleobase substitution) shares the same sequence as the endogenous strand (or is homologous to it) immediately downstream of the nick site of the target site to be edited (with the exception that it includes the desired edit). Through DNA repair and/or replication machinery, the endogenous strand downstream of the nick site is replaced by the newly synthesized replacement strand containing the desired edit. In some cases, prime editing may be thought of as a “search-and-replace” genome editing technology since the prime editors, as described herein, not only search and locate the desired target site to be edited, but at the same time, encode a replacement strand containing a desired edit that is installed in place of the corresponding target site endogenous DNA strand. The prime editors of the present disclosure
relate, in part, to the discovery that the mechanism of target-primed reverse transcription (TPRT) or “prime editing” can be leveraged or adapted for conducting precision CRISPR/Cas-based genome editing with high efficiency and genetic flexibility. TPRT is naturally used by mobile DNA elements, such as mammalian non-LTR retrotransposons and bacterial Group II introns. Cas protein-reverse transcriptase fusions or related systems are used to target a specific DNA sequence with a guide RNA, generate a single strand nick at the target site, and use the nicked DNA as a primer for reverse transcription of an engineered DNA synthesis template that is integrated with the guide RNA. However, while the concept begins with prime editors that use reverse transcriptase as the DNA polymerase component, the prime editors described herein are not limited to reverse transcriptases but may include the use of virtually any DNA polymerase. Indeed, while the application throughout may refer to prime editors with “reverse transcriptases,” it is set forth here that reverse transcriptases are only one type of DNA polymerase that may work with prime editing. Thus, wherever the specification mentions a “reverse transcriptase,” the person having ordinary skill in the art should appreciate that any suitable DNA polymerase may be used in place of the reverse transcriptase. Thus, in one aspect, the prime editors may comprise Cas9 (or an equivalent napDNAbp), which is programmed to target a DNA sequence by associating it with a specialized guide RNA (i.e., PEgRNA) containing a spacer sequence that anneals to a complementary sequence (the complementary sequence to an endogenous protospacer sequence) in the target DNA. The PEgRNA also contains new genetic information in the form of an extension that encodes a replacement strand of DNA containing a desired nucleotide change which is used to replace a corresponding endogenous DNA strand at the target site. 
To transfer information from the PEgRNA to the target DNA, the mechanism of prime editing involves nicking the target site in one strand of the DNA to expose a 3′-hydroxyl group. The exposed 3′-hydroxyl group can then be used to prime the DNA polymerization of the edit-encoding extension on the PEgRNA directly into the target site. In various embodiments, the extension, which provides the template for polymerization of the replacement strand containing the edit, can be formed from RNA or DNA. In the case of an RNA extension, the polymerase of the prime editor can be an RNA-dependent DNA polymerase (such as a reverse transcriptase). In the case of a DNA extension, the polymerase of the prime editor may be a DNA-dependent DNA polymerase. The newly synthesized strand (i.e., the replacement DNA strand containing the desired nucleotide edit) that is formed by the prime editor would be homologous to the genomic
target sequence (i.e., have the same sequence as), except for the inclusion of one or more desired nucleotide changes (e.g., a single nucleotide substitution, a deletion, or an insertion, or a combination thereof). The newly synthesized (or replacement) strand of DNA may also be referred to as a single strand DNA flap, which would compete for hybridization with the complementary homologous endogenous DNA strand, thereby displacing the corresponding endogenous strand. Resolution of the hybridized intermediate (also referred to as a heteroduplex, comprising the single strand DNA flap synthesized by the reverse transcriptase hybridized to the endogenous DNA strand with the exception of mismatches at positions where desired nucleotide edits are installed in the edit strand) can include removal of the resulting displaced flap of endogenous DNA (e.g., with a 5ʹ end DNA flap endonuclease, FEN1), ligation of the synthesized single strand DNA flap to the target DNA, and assimilation of the desired nucleotide changes as a result of cellular DNA repair and/or replication processes. Because templated DNA synthesis offers single nucleotide precision for the modification of any nucleotide, including insertions and deletions, the scope of this approach is very broad and could foreseeably be used for myriad applications in basic science and therapeutics. In certain embodiments, the system can be combined with the use of an error-prone reverse transcriptase enzyme (e.g., provided as a fusion protein with the Cas9 domain, or provided in trans to the Cas9 domain). The error-prone reverse transcriptase enzyme can introduce alterations during synthesis of the single strand DNA flap. Thus, in certain embodiments, error-prone reverse transcriptase can be utilized to introduce nucleotide changes to the target DNA. Depending on the error-prone reverse transcriptase that is used with the system, the changes can be random or non-random. 
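The nick, prime, and copy steps described above can be illustrated with a toy string model. This is only a sketch under stated assumptions, not the method of the disclosure: the function names are hypothetical, the nick is placed 3 nt 5′ of an NGG PAM on the PAM-containing strand (commonly reported SpCas9 behavior that this text does not itself specify), and the 13-nt primer binding site length and the example sequences are arbitrary. Flap resolution by cellular repair is reduced to simple string replacement.

```python
# Toy string model of single-flap prime editing (illustrative only).
# Assumptions not taken from the text: SpCas9 nicks the PAM-containing
# (non-target) strand 3 nt 5' of the PAM; the PBS length is arbitrary.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_nick(strand: str, protospacer: str) -> int:
    """Index in `strand` (the PAM-containing strand) where the nick falls."""
    start = strand.index(protospacer)
    pam_start = start + len(protospacer)
    pam = strand[pam_start:pam_start + 3]
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError("protospacer is not followed by an NGG PAM")
    return pam_start - 3


def design_extension(strand: str, nick: int, pbs_len: int,
                     new_downstream: str) -> str:
    """Return a 3' extension (5'->3' RNA): synthesis template, then PBS.

    The PBS is complementary to the DNA just upstream of the nick; the
    template is complementary to the desired sequence downstream of it.
    """
    if nick - pbs_len < 0:
        raise ValueError("not enough sequence upstream of the nick")
    primer = strand[nick - pbs_len:nick]
    return (revcomp(new_downstream) + revcomp(primer)).replace("T", "U")


def simulate_edit(strand: str, nick: int, extension: str,
                  pbs_len: int, replaced_len: int) -> str:
    """Copy the template into a 3' flap and swap it in for the old DNA."""
    template_rna = extension[:len(extension) - pbs_len]
    flap = revcomp(template_rna.replace("U", "T"))
    return strand[:nick] + flap + strand[nick + replaced_len:]


# Hypothetical example: a T-to-G substitution just downstream of the nick.
protospacer = "GATTACAGATTACAGATTAC"
strand = "CCTTA" + protospacer + "AGG" + "TTCAGTC"
nick = find_nick(strand, protospacer)
extension = design_extension(strand, nick, pbs_len=13,
                             new_downstream="GACAGGTT")
edited = simulate_edit(strand, nick, extension, pbs_len=13, replaced_len=8)
```

In this model an insertion is expressed by making `new_downstream` longer than `replaced_len`, with the extra bases placed ahead of the homologous sequence.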
[0130] In various embodiments, prime editing operates by contacting a target DNA molecule (for which a change in the nucleotide sequence is desired to be introduced) with a nucleic acid programmable DNA binding protein (napDNAbp) complexed with a prime editing guide RNA (PEgRNA). In various embodiments, the prime editing guide RNA (PEgRNA) comprises an extension at the 3′ or 5′ end of the guide RNA, or at an intramolecular location in the guide RNA, and encodes the desired nucleotide change (e.g., single nucleotide substitution, insertion, or deletion). First, the napDNAbp/extended gRNA complex contacts the DNA molecule, and the extended gRNA guides the napDNAbp to bind to a target locus. Next, a nick in one of the strands of DNA of the target locus is introduced (e.g., by a nuclease or chemical agent), thereby creating an available 3′ end in one of the strands of the target locus. In certain embodiments, the
nick is created in the strand of DNA that corresponds to the R-loop strand, i.e., the strand that is not hybridized to the guide RNA sequence (the “non-target strand”). The nick, however, could be introduced in either of the strands. That is, the nick could be introduced into the R-loop “target strand” (i.e., the strand hybridized to the spacer of the extended gRNA) or the “non-target strand” (i.e., the strand forming the single-stranded portion of the R-loop and which is complementary to the target strand). In the next step, the 3′ end of the DNA strand (formed by the nick) interacts with the extended portion of the guide RNA in order to prime reverse transcription (i.e., “target-primed RT”). In certain embodiments, the 3′ end of the DNA strand hybridizes to a specific RT priming sequence on the extended portion of the guide RNA, i.e., the “reverse transcriptase priming sequence” or “primer binding site” on the PEgRNA. In the next step, a reverse transcriptase (or other suitable DNA polymerase) is introduced that synthesizes a single strand of DNA from the 3′ end of the primed site towards the 5′ end of the prime editing guide RNA. The DNA polymerase (e.g., reverse transcriptase) can be fused to the napDNAbp or alternatively can be provided in trans to the napDNAbp. This forms a single-strand DNA flap comprising the desired nucleotide change (e.g., the single base change, insertion, or deletion, or a combination thereof, for example, a recombinase recognition sequence to be inserted into a target DNA sequence such as a genome) and that is otherwise homologous to the endogenous DNA at or adjacent to the nick site. In the next step, the napDNAbp and guide RNA are released. The final two steps relate to the resolution of the single strand DNA flap such that the desired nucleotide change becomes incorporated into the target locus. 
This process can be driven towards the desired product formation by removing the corresponding 5′ endogenous DNA flap that forms once the 3′ single strand DNA flap invades and hybridizes to the endogenous DNA sequence. Without being bound by theory, the cell’s endogenous DNA repair and replication processes resolve the mismatched DNA to incorporate the nucleotide change(s) to form the desired altered product. The process can also be driven towards product formation with “second strand nicking.” This process may introduce at least one or more of the following genetic changes: transversions, transitions, deletions, and insertions (e.g., insertion of a recombinase recognition sequence). In some embodiments, one or more recombinase recognition sequences are inserted into a target DNA sequence using prime editing, and then these recombinase recognition sequences are contacted with a recombinase (e.g., any of the evolved recombinases
provided herein) and, optionally, a donor DNA sequence to be inserted into the target DNA sequence. [0131] The term “prime editor (PE) system” or “prime editor (PE)” or “PE system” or “PE editing system” refers to the compositions involved in the method of genome editing using target-primed reverse transcription (TPRT) described herein, including, but not limited to, the napDNAbps, reverse transcriptases, fusion proteins (e.g., comprising napDNAbps and reverse transcriptases), prime editing guide RNAs, and complexes comprising fusion proteins and prime editing guide RNAs, as well as accessory elements, such as second strand nicking components (e.g., second strand nicking sgRNAs) and 5′ endogenous DNA flap removal endonucleases (e.g., FEN1) for helping to drive the prime editing process toward formation of the edited product. [0132] Although in the embodiments described thus far the PEgRNA constitutes a single molecule comprising a guide RNA (which itself comprises a spacer sequence and a gRNA core or scaffold) and a 5ʹ or 3ʹ extension arm comprising the primer binding site and a DNA synthesis template, the PEgRNA may also take the form of two individual molecules. For example, in some embodiments, a PEgRNA may comprise a guide RNA and a trans prime editor RNA template (tPERT), which essentially houses the extension arm (including, in particular, the primer binding site and the DNA synthesis domain) and an RNA-protein recruitment domain (e.g., MS2 aptamer or hairpin) in the same molecule, which becomes co-localized or recruited to a modified prime editor complex that comprises a tPERT recruiting protein (e.g., MS2cp protein, which binds to the MS2 aptamer). [0133] A prime editor system can comprise one or more prime editing guide RNAs (PEgRNAs). In some embodiments, a prime editor system has one PEgRNA (the “single flap prime editing system”) that targets one strand of a double stranded DNA, e.g., a target genomic site. 
For example, a single flap prime editing system may comprise a spacer sequence that comprises complementarity to a target strand of a double stranded target DNA, a primer binding site that comprises complementarity to a non-target strand of the double stranded target DNA, and a DNA synthesis template that comprises (and encodes) a nucleotide edit compared to the double stranded target DNA sequence, e.g., a recombinase recognition site. In some embodiments, a prime editor system (the “dual-flap prime editing system” or “twin prime editing” or “twinPE”) comprises at least two different PEgRNAs that can target opposite strands of a double stranded target DNA, e.g., a target genomic site. For example, a twin prime editing system may comprise
two PEgRNAs, wherein the DNA synthesis templates of the two PEgRNAs have a region of complementarity to each other and direct the synthesis of two 3′ flaps that have a region of complementarity to each other and contain a nucleotide edit compared to the double stranded target DNA sequence (e.g., a recombinase recognition sequence). Unlike single flap prime editing, there is no requirement for the pair of edited DNA strands (3′ flaps) to directly compete with 5′ flaps in endogenous genomic DNA (i.e., no requirement for a homology arm in the extension arm which would generate a region having complementarity to the endogenous DNA), as the complementary edited strand is available for hybridization instead. Since both strands of the duplex are synthesized as edited DNA, the dual-flap prime editing system obviates the need for the replacement of the non-edited complementary DNA strand required by classical prime editing. Instead, cellular DNA repair machinery need only excise the paired 5′ flaps (original genomic DNA) and ligate the paired 3′ flaps (edited DNA) into the locus. Therefore, there is also no need to include sequences homologous to genomic DNA in the newly synthesized DNA strands, allowing selective hybridization of the new strands and facilitating edits that contain minimal genomic homology. Nuclease-active versions of prime editors that cut both strands of DNA could also be used to accelerate the removal of the original DNA sequence. [0134] Variants of twin prime editing include quadruple-flap prime editing, whereby two sets of twin prime editors are used to introduce a genetic change at two different genetic loci, e.g., two different recombinase recognition sequences located at the 5′ end and 3′ end of a gene. 
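The dual-flap outcome described above can be sketched as a toy string operation. This is an illustration under simplifying assumptions rather than the disclosed method: `twin_pe_replace` is a hypothetical name, both nick positions are given in top-strand coordinates, the two flaps are taken to be fully complementary, and the inserted sequence is an arbitrary placeholder, not a real recombinase recognition site.

```python
# Toy model of twin (dual-flap) prime editing: the DNA between two nicks on
# opposite strands is replaced by a pair of complementary 3' flaps.
# All names and sequences here are hypothetical placeholders.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def twin_pe_replace(top: str, nick_top: int, nick_bottom: int, new_seq: str):
    """Return (edited_top, edited_bottom, flap_a, flap_b).

    nick_top and nick_bottom are top-strand coordinates of the two nicks.
    The flaps need no homology to the genome; they only need to pair with
    each other.
    """
    if not 0 <= nick_top <= nick_bottom <= len(top):
        raise ValueError("nicks must satisfy 0 <= nick_top <= nick_bottom <= len(top)")
    flap_a = new_seq            # 3' flap extended from the top-strand nick
    flap_b = revcomp(new_seq)   # 3' flap extended from the bottom-strand nick
    edited_top = top[:nick_top] + flap_a + top[nick_bottom:]
    return edited_top, revcomp(edited_top), flap_a, flap_b


# Hypothetical example: swap 6 bp of genomic sequence for a placeholder site.
top = "AAACCCGGGTTTAAACCC"
placeholder_site = "GGCCGTACGTTAGC"  # arbitrary, not a real attB/attP sequence
edited_top, edited_bottom, flap_a, flap_b = twin_pe_replace(
    top, nick_top=6, nick_bottom=12, new_seq=placeholder_site)
```

Because the replacement is defined entirely by the two flaps, the same call covers insertion (`nick_top == nick_bottom`), deletion (empty `new_seq`), and replacement.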
[0135] Like classical prime editing, twin prime editing (including dual-flap and quadruple-flap prime editing) is a versatile and precise genome editing method that directly writes new genetic information into a specified DNA site using a nucleic acid programmable DNA binding protein (“napDNAbp”) working in association with a polymerase (i.e., in the form of a fusion protein or otherwise provided in trans with the napDNAbp), wherein the prime editing system is programmed with a prime editing (PE) guide RNA (“PEgRNA”) that both specifies the target site and templates the synthesis of the desired edit in the form of a replacement DNA strand by way of an extension (either DNA or RNA) engineered onto a guide RNA (e.g., at the 5ʹ or 3ʹ end, or at an internal portion of a guide RNA). The replacement strand containing the desired edit (e.g., a recombinase recognition sequence for insertion into a target DNA sequence) shares the same sequence as the endogenous strand of the target site to be edited (with the exception that it includes the desired edit). Through DNA repair and/or replication machinery, the endogenous
strand of the target site is replaced by the newly synthesized replacement strand containing the desired edit. Prime editor [0136] The term “prime editor” refers to the polypeptide or polypeptide components involved in prime editing as described herein. In some embodiments, a prime editor comprises a fusion construct comprising a napDNAbp (e.g., Cas9 nickase) and a reverse transcriptase. In some embodiments, a prime editor is capable of carrying out prime editing on a target nucleotide sequence in the presence of a PEgRNA (or “extended guide RNA”). In some embodiments, a prime editor comprises a napDNAbp (e.g., Cas9 nickase) and a reverse transcriptase provided in trans, i.e., the napDNAbp and the reverse transcriptase are not fused. The in trans napDNAbp and the reverse transcriptase may be tethered via a non-peptide linkage, e.g., a MS2 RNA-protein binding RNA sequence and a MS2 coat protein fused to either the napDNAbp or the reverse transcriptase, or may be unlinked to each other and simply recruited by the PEgRNA. In some embodiments, a prime editor composition, system, or complex provided herein comprises a fusion protein or a fusion protein complexed with a PEgRNA, and/or further complexed with a second-strand nicking sgRNA. In some embodiments, the prime editor system may also refer to the complex comprising a fusion protein (reverse transcriptase fused to a napDNAbp), a PEgRNA, and a regular guide RNA capable of directing the second-site nicking step of the non-edited strand as described herein. Primer binding site [0137] The term “primer binding site” or “PBS” refers to a single-stranded portion of a PEgRNA that is a component of the extension arm (e.g., at the 3ʹ end of the extension arm) and that comprises a region of complementarity to a sequence on the non-target strand of a double stranded target DNA. 
In some embodiments, the primer binding site is complementary to a region upstream of a nick site in a non-target strand. In some embodiments, the primer binding site is complementary to a region immediately upstream of a nick site in the non-target strand. In some embodiments, the primer binding site is capable of binding to the primer sequence that is formed after nicking of the edit strand (the non-target strand) of the target DNA sequence by the prime editor. When the prime editor (e.g., by a Cas9 nickase component of a prime editor) nicks the edit strand of the target DNA sequence, a free 3′ end is formed in the edit strand, which serves as a primer
sequence that anneals to the primer binding site on the PEgRNA to prime reverse transcription. In some embodiments, the PBS is complementary to or substantially complementary to and can anneal to a free 3′ end on the non-target strand of the double stranded target DNA at the nick site. In some embodiments, annealing of the PBS to the free 3′ end on the non-target strand can initiate target-primed DNA synthesis. Protein, peptide, and polypeptide [0138] The terms “protein,” “peptide,” and “polypeptide” are used interchangeably herein and refer to a polymer of amino acid residues linked together by peptide (amide) bonds. The terms refer to a protein, peptide, or polypeptide of any size, structure, or function. Typically, a protein, peptide, or polypeptide will be at least three amino acids long. A protein, peptide, or polypeptide may refer to an individual protein or a collection of proteins. One or more of the amino acids in a protein, peptide, or polypeptide may be modified, for example, by the addition of a chemical entity such as a carbohydrate group, a hydroxyl group, a phosphate group, a farnesyl group, an isofarnesyl group, a fatty acid group, a linker for conjugation, functionalization, or other modification, etc. A protein, peptide, or polypeptide may also be a single molecule or may be a multi-molecular complex. A protein, peptide, or polypeptide may be just a fragment of a naturally occurring protein or peptide. A protein, peptide, or polypeptide may be naturally occurring, recombinant, or synthetic, or any combination thereof. Any of the proteins provided herein may be produced by any method known in the art. For example, the proteins provided herein may be produced via recombinant protein expression and purification, which is especially suited for fusion proteins comprising a peptide linker. 
Methods for recombinant protein expression and purification are well known, and include those described by Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)), the contents of which are incorporated herein by reference. Protospacer [0139] As used herein, the term “protospacer” refers to the sequence (e.g., of ~20 bp) in DNA adjacent to the PAM (protospacer adjacent motif) sequence. The protospacer shares the same sequence as the spacer sequence of the guide RNA (except that a protospacer contains Thymine and the spacer sequence contains Uracil). The guide RNA anneals to the complement of the protospacer sequence on the target DNA (specifically, one strand thereof, i.e., the “target strand”
versus the “non-target strand” of the target DNA sequence). In some embodiments, in order for a Cas nickase component of a prime editor to function, it also requires a specific protospacer adjacent motif (PAM) that varies depending on the Cas protein component itself, e.g., the type of Cas protein and the bacterial species from which it is derived. The most commonly used Cas9 nuclease, derived from S. pyogenes, recognizes a PAM sequence of NGG that is directly downstream of the protospacer sequence in the genomic DNA, on the non-target strand. Protospacer adjacent motif (PAM) [0140] As used herein, the term “protospacer adjacent motif” or “PAM” refers to a DNA sequence (e.g., an approximately 2-6 nucleotide sequence) that is an important targeting component of a Cas nuclease, e.g., a Cas9. For example, in some embodiments for a Cas9 nuclease, the PAM sequence is on either strand and is downstream in the 5ʹ to 3ʹ direction of the Cas9 cut site. The canonical PAM sequence (i.e., the PAM sequence that is associated with the Cas9 nuclease of Streptococcus pyogenes or SpCas9) is 5ʹ-NGG-3ʹ, wherein “N” is any nucleobase followed by two guanine (“G”) nucleobases. In some embodiments, SpCas9 can also recognize additional non-canonical PAMs (e.g., NAG and NGA). [0141] Different PAM sequences can be associated with different Cas9 nucleases or equivalent proteins from different organisms. In addition, any given Cas9 nuclease, e.g., SpCas9, may be modified to alter the PAM specificity of the nuclease such that the nuclease recognizes an alternative PAM sequence. 
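The relationship between protospacer and PAM described above can be sketched as a small scanner for candidate SpCas9 sites. This is a minimal illustration with hypothetical names: it looks only for the canonical 5ʹ-NGG-3ʹ PAM, assumes a 20-nt protospacer immediately 5ʹ of the PAM, searches both strands, and ignores non-canonical PAMs and any scoring of guide quality.

```python
# Minimal scanner for canonical SpCas9 target sites (illustrative sketch).
# A site is a protospacer of fixed length followed directly by an NGG PAM
# on the same strand. Function and variable names are hypothetical.
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna: str) -> str:
    """Reverse complement of a DNA string written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(dna: str, protospacer_len: int = 20):
    """Return a list of (strand, protospacer, pam) tuples.

    strand is "+" for the given sequence and "-" for its reverse complement;
    both protospacer and pam are written 5'->3' on that strand.
    """
    sites = []
    for strand_name, seq in (("+", dna), ("-", revcomp(dna))):
        # The lookahead lets overlapping PAMs (e.g. in "GGG") all be reported.
        for match in re.finditer(r"(?=([ACGT]GG))", seq):
            pam_start = match.start()
            if pam_start >= protospacer_len:
                protospacer = seq[pam_start - protospacer_len:pam_start]
                sites.append((strand_name, protospacer, match.group(1)))
    return sites


# Hypothetical example sequence containing one site on each strand.
example = "CCTTA" + "GATTACAGATTACAGATTAC" + "AGG" + "TTCAGTC"
sites = find_spcas9_sites(example)
```

The guide RNA spacer for a reported site would have the same sequence as the protospacer, with uracil in place of thymine.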
[0142] For example, with reference to the canonical SpCas9 amino acid sequence SEQ ID NO: 6, the PAM specificity can be modified by introducing one or more mutations, including (a) D1135V, R1335Q, and T1337R (“the VQR variant”), which alters the PAM specificity to NGAN or NGNG, (b) D1135E, R1335Q, and T1337R (“the EQR variant”), which alters the PAM specificity to NGAG, and (c) D1135V, G1218R, R1335E, and T1337R (“the VRER variant”), which alters the PAM specificity to NGCG. In addition, the D1135E variant of canonical SpCas9 still recognizes NGG, but it is more selective compared to the wild type SpCas9 protein. [0143] It will also be appreciated that Cas9 enzymes from different bacterial species (i.e., Cas9 orthologs) can have varying PAM specificities. For example, Cas9 from Staphylococcus aureus (SaCas9) recognizes NGRRT or NGRRN. In addition, Cas9 from Neisseria meningitidis (NmCas) recognizes NNNNGATT. In another example, Cas9 from Streptococcus thermophilus (StCas9) recognizes NNAGAAW. In still another example, Cas9 from Treponema denticola (TdCas)
recognizes NAAAAC. These are examples and are not meant to be limiting. It will be further appreciated that non-SpCas9s bind a variety of PAM sequences, which makes them useful when no suitable SpCas9 PAM sequence is present at the desired target cut site. Furthermore, non-SpCas9s may have other characteristics that make them more useful than SpCas9. For example, Cas9 from Staphylococcus aureus (SaCas9) is about 1 kilobase smaller than SpCas9, so it can be packaged into adeno-associated virus (AAV). Further reference is made to Shah et al., “Protospacer recognition motifs: mixed identities and functional diversity,” RNA Biology, 10(5): 891-899 (which is incorporated herein by reference). Recombinase [0144] The term “recombinase,” as used herein, refers to a site-specific enzyme that mediates the recombination of DNA between recombinase recognition sequences, which results in the excision, integration, inversion, or exchange (e.g., translocation) of DNA fragments between the recombinase recognition sequences. Recombinases can be classified into two distinct families: serine recombinases (e.g., resolvases and invertases) and tyrosine recombinases (e.g., integrases). Examples of serine recombinases include, without limitation, Hin, Gin, Tn3, β-six, CinH, ParA, γδ, Bxb1, ϕC31, TP901, TG1, φBT1, R4, φRV1, φFC1, MR11, A118, U153, and gp29. Examples of tyrosine recombinases include, without limitation, Cre, FLP, R, Lambda, HK101, HK022, and pSAM2. In some embodiments, a recombinase is a Bxb1 recombinase, or a variant thereof (e.g., any of the evolved Bxb1 recombinases described herein). The serine and tyrosine recombinase names stem from the conserved nucleophilic amino acid residue that the recombinase uses to attack the DNA and which becomes covalently linked to the DNA during strand exchange. Recombinases have numerous applications, including the creation of gene knockouts/knock-ins and gene therapy applications. 
See, e.g., Brown et al., “Serine recombinases as tools for genome engineering.” Methods. 2011; 53(4):372-9; Hirano et al., “Site-specific recombinases as tools for heterologous gene integration.” Appl. Microbiol. Biotechnol. 2011; 92(2):227-39; Chavez and Calos, “Therapeutic applications of the ΦC31 integrase system.” Curr. Gene Ther. 2011; 11(5):375-81; Turan and Bode, “Site-specific recombinases: from tag-and-target- to tag-and-exchange-based genomic modifications.” FASEB J. 2011; 25(12):4088-107; Venken and Bellen, “Genome-wide manipulations of Drosophila melanogaster with transposons, Flp recombinase, and ΦC31 integrase.” Methods Mol. Biol. 2012; 859:203-28; Murphy, “Phage recombinases and their applications.” Adv. Virus Res. 2012; 83:367-414; Zhang
et al., “Conditional gene manipulation: Cre-ating a new biological era.” J. Zhejiang Univ. Sci. B. 2012; 13(7):511-24; Karpenshif and Bernstein, “From yeast to mammals: recent advances in genetic control of homologous recombination.” DNA Repair (Amst). 2012; 11(10):781-8; the entire contents of each are hereby incorporated by reference. [0145] The recombinases provided herein are not meant to be exclusive examples of recombinases that can be used in embodiments of the invention. The methods and compositions of the invention can be expanded by mining databases for new orthogonal recombinases or designing synthetic recombinases with defined DNA specificities (See, e.g., Groth et al., “Phage integrases: biology and applications.” J. Mol. Biol. 2004; 335: 667-678; Gordley et al., “Synthesis of programmable integrases.” Proc. Natl. Acad. Sci. USA. 2009; 106: 5053-5058; the entire contents of each are hereby incorporated by reference). Corresponding mutations or substitutions to those in the evolved Bxb1 recombinases provided herein may also be made at homologous positions in other recombinases, and the present disclosure contemplates the use of such variants of recombinases homologous to Bxb1. [0146] Other examples of recombinases that are useful in the methods and compositions described herein are known to those of skill in the art, and any new recombinase that is discovered or generated is expected to be able to be used in the different embodiments of the invention (e.g., recombinases incorporating the mutations or substitutions described herein at positions homologous to those in the evolved Bxb1 variants described herein). 
In some embodiments, the catalytic domains of a recombinase are fused to a nuclease-inactivated RNA-programmable nuclease (e.g., dCas9, or a fragment thereof), such that the recombinase domain does not comprise a nucleic acid binding domain or is unable to bind to a target nucleic acid (e.g., the recombinase domain is engineered such that it does not have specific DNA binding activity). Recombinases lacking DNA binding activity and methods for engineering such are known, and include those described by Klippel et al., “Isolation and characterisation of unusual gin mutants.” EMBO J. 1988; 7: 3983–3989; Burke et al., “Activating mutations of Tn3 resolvase marking interfaces important in recombination catalysis and its regulation.” Mol Microbiol. 2004; 51: 937–948; Olorunniji et al., “Synapsis and catalysis by activated Tn3 resolvase mutants.” Nucleic Acids Res. 2008; 36: 7181–7191; Rowland et al., “Regulatory mutations in Sin recombinase support a structure-based model of the synaptosome.” Mol Microbiol. 2009; 74: 282–298; Akopian et al., “Chimeric recombinases with designed DNA
sequence recognition.” Proc Natl Acad Sci USA. 2003; 100: 8688–8691; Gordley et al., “Evolution of programmable zinc finger-recombinases with activity in human cells.” J Mol Biol. 2007; 367: 802–813; Gordley et al., “Synthesis of programmable integrases.” Proc Natl Acad Sci USA. 2009; 106: 5053–5058; Arnold et al., “Mutants of Tn3 resolvase which do not require accessory binding sites for recombination activity.” EMBO J. 1999; 18: 1407–1414; Gaj et al., “Structure-guided reprogramming of serine recombinase DNA sequence specificity.” Proc Natl Acad Sci USA. 2011; 108(2):498-503; and Proudfoot et al., “Zinc finger recombinases with adaptable DNA sequence specificity.” PLoS One. 2011; 6(4):e19537; the entire contents of each are hereby incorporated by reference. [0147] For example, serine recombinases of the resolvase-invertase group, e.g., Tn3 and γδ resolvases and the Hin and Gin invertases, have modular structures with autonomous catalytic and DNA-binding domains (See, e.g., Grindley et al., “Mechanism of site-specific recombination.” Ann Rev Biochem. 2006; 75: 567–605, the entire contents of which are incorporated by reference). The catalytic domains of these recombinases are thus amenable to being recombined with nuclease-inactivated RNA-programmable nucleases (e.g., dCas9, or a fragment thereof) as described herein, e.g., following the isolation of “activated” recombinase mutants that do not require any accessory factors (e.g., DNA binding activities) (See, e.g., Klippel et al., “Isolation and characterisation of unusual gin mutants.” EMBO J. 1988; 7: 3983–3989; Burke et al., “Activating mutations of Tn3 resolvase marking interfaces important in recombination catalysis and its regulation.” 
Mol Microbiol. 2004; 51: 937–948; Olorunniji et al., “Synapsis and catalysis by activated Tn3 resolvase mutants.” Nucleic Acids Res. 2008; 36: 7181–7191; Rowland et al., “Regulatory mutations in Sin recombinase support a structure-based model of the synaptosome.” Mol Microbiol. 2009; 74: 282–298; Akopian et al., “Chimeric recombinases with designed DNA sequence recognition.” Proc Natl Acad Sci USA. 2003; 100: 8688–8691). [0148] Additionally, many other natural serine recombinases having an N-terminal catalytic domain and a C-terminal DNA binding domain are known (e.g., phiC31 integrase, TnpX transposase, IS607 transposase), and their catalytic domains can be co-opted to engineer programmable site-specific recombinases as described herein (See, e.g., Smith et al., “Diversity in the serine recombinases.” Mol Microbiol. 2002; 44: 299–307, the entire contents of which are incorporated by reference). This includes other natural serine recombinases engineered to include
one or more of the mutations described herein at positions homologous to those in the evolved Bxb1 recombinase variants provided in the present disclosure. [0149] Similarly, the core catalytic domains of tyrosine recombinases (e.g., Cre, λ integrase) are known, and can be similarly co-opted to engineer programmable site-specific recombinases as described herein (See, e.g., Guo et al., “Structure of Cre recombinase complexed with DNA in a site-specific recombination synapse.” Nature. 1997; 389:40–46; Hartung et al., “Cre mutants with altered DNA binding properties.” J Biol Chem. 1998; 273:22884–22891; Shaikh et al., “Chimeras of the Flp and Cre recombinases: Tests of the mode of cleavage by Flp and Cre.” J Mol Biol. 2000; 302:27–48; Rongrong et al., “Effect of deletion mutation on the recombination activity of Cre recombinase.” Acta Biochim Pol. 2005; 52:541–544; Kilbride et al., “Determinants of product topology in a hybrid Cre-Tn3 resolvase site-specific recombination system.” J Mol Biol. 2006; 355:185–195; Warren et al., “A chimeric cre recombinase with regulated directionality.” Proc Natl Acad Sci USA. 2008; 105:18278–18283; Van Duyne, “Teaching Cre to follow directions.” Proc Natl Acad Sci USA. 2009 Jan 6; 106(1):4-5; Numrych et al., “A comparison of the effects of single-base and triple-base changes in the integrase arm-type binding sites on the site-specific recombination of bacteriophage λ.” Nucleic Acids Res. 
1990; 18:3953–3959; Tirumalai et al., “The recognition of core-type DNA sites by λ integrase.” J Mol Biol.1998; 279:513–527; Aihara et al., “A conformational switch controls the DNA cleavage activity of λ integrase.” Mol Cell.2003; 12:187–198; Biswas et al., “A structural basis for allosteric control of DNA recombination by λ integrase.” Nature.2005; 435:1059–1066; and Warren et al., “Mutations in the amino-terminal domain of λ-integrase have differential effects on integrative and excisive recombination.” Mol Microbiol.2005; 55:1104–1112; the entire contents of each are incorporated by reference). [0150] In some embodiments, the present disclosure provides variants of a Bxb1 recombinase (from Mycobacteriophage Bxb1) of SEQ ID NO: 1: MRALVVIRLSRVTDATTSPERQLESCQQLCAQRGWDVVGVAEDLDVSGAVDPFDRKRR PNLARWLAFEEQPFDVIVAYRVDRLTRSIRHLQQLVHWAEDHKKLVVSATEAHFDTTTP FAAVVIALMGTVAQMELEAIKERNRSAAHFNIRAGKYRGSLPPWGYLPTRVDGEWRLV PDPVQRERILEVYHRVVDNHEPLHLVAHDLNRRGVLSPKDYFAQLQGREPQGREWSAT ALKRSMISEAMLGYATLNGKTVRDDDGAPLVRAEPILTREQLEALRAELVKTSRAKPAV STPSLLLRVLFCAVCGEPAYKFAGGGRKHPRYRCRSMGFPKHCGNGTVAMAEWDAFC
EEQVLDLLGDAERLEKVWVAGSDSAVELAEVNAELVDLTSLIGSPAYRAGSPQREALD ARIAALAARQEELEGLEARPSGWEWRETGQRFGDWWREQDTAAKNTWLRSMNVRLTF DVRGGLTRTIDFGDLQEYEQHLRLGSVVERLHTGMS (SEQ ID NO: 1) Recombinase recognition sequence [0151] The term “recombinase recognition sequence”, or equivalently as “RRS” or “recombinase target sequence” or “recombinase site,” as used herein, refers to a nucleotide sequence target that is recognized by a recombinase and undergoes strand exchange with another RRS (which can be on the same DNA molecule or a different DNA molecule, e.g., on a different chromosome, or on a donor DNA molecule such as a donor DNA vector) that results in excision, integration, inversion, or exchange of DNA fragments between the recombinase recognition sequences. In various embodiments, the multi-strand prime editors may install one or more recombinase sites in a target DNA molecule, or in more than one target molecule. When more than one recombinase site is installed by a multi-strand prime editor, the recombinase sites can be installed at adjacent target sites or non-adjacent target sites (e.g., separate chromosomes). In various embodiments, single installed recombinase sites can be used as “landing sites” for a recombinase-mediated reaction between the genomic recombinase site and a second recombinase site within an exogenously supplied nucleic acid molecule, e.g., a plasmid comprising a donor DNA sequence. This enables the targeted integration of a desired nucleic acid molecule. 
In other embodiments, where two recombinase sites are inserted in adjacent regions of DNA (e.g., separated by 25-50 bp, 50-100 bp, 100-200 bp, 200-300 bp, 300-400 bp, 400-500 bp, 500-600 bp, 600-700 bp, 700-800 bp, 800-900 bp, 900-1000 bp, 1000-2000 bp, 2000-3000 bp, 3000-4000 bp, 4000-5000 bp, or more), the recombinase sites can be used for recombinase-mediated excision or inversion of the intervening sequence, or for recombinase-mediated cassette exchange with exogenous DNA having the same recombinase sites. When the two or more recombinase sites are installed by prime editors on two different chromosomes, translocation of the intervening sequence can occur from a first chromosomal location to the second. [0152] In some embodiments, a recombinase recognition sequence comprises an attP site. In some embodiments, a recombinase recognition sequence comprises an attB site. In some embodiments, the attP and attB recombinase recognition sequences recognized by Bxb1 comprise the sequences
GGTTTGTCTGGTCAACCACCGCGGTCTCAGTGGTGTACGGTACAAACC (SEQ ID NO: 111) and GGCTTGTCGACGACGGCGGTCTCCGTCGTCAGGATCAT (SEQ ID NO: 112). Recombine or recombination [0153] The term “recombine,” or “recombination,” in the context of a nucleic acid modification (e.g., a genomic modification), is used to refer to the process by which two or more nucleic acid molecules, or two or more regions of a single nucleic acid molecule, are modified by the action of a recombinase protein (e.g., an inventive recombinase fusion protein provided herein). Recombination can result in, inter alia, the insertion, exchange, inversion, excision, or translocation of nucleic acids, e.g., in or between one or more nucleic acid molecules. Reverse transcriptase [0154] The term “reverse transcriptase” describes a class of polymerases characterized as RNA-dependent DNA polymerases. All known reverse transcriptases require a primer to synthesize a DNA transcript from an RNA template. Historically, reverse transcriptase has been used primarily to transcribe mRNA into cDNA, which can then be cloned into a vector for further manipulation. Avian myeloblastosis virus (AMV) reverse transcriptase was the first widely used RNA-dependent DNA polymerase (Verma, Biochim. Biophys. Acta 473:1 (1977)). The enzyme has 5ʹ-3ʹ RNA-directed DNA polymerase activity, 5ʹ-3ʹ DNA-directed DNA polymerase activity, and RNase H activity. RNase H is a processive 5ʹ and 3ʹ ribonuclease specific for the RNA strand of RNA-DNA hybrids (Perbal, A Practical Guide to Molecular Cloning, New York: Wiley & Sons (1984)). Errors in transcription cannot be corrected by reverse transcriptase because known viral reverse transcriptases lack the 3ʹ-5ʹ exonuclease activity necessary for proofreading (Saunders and Saunders, Microbial Genetics Applied to Biotechnology, London: Croom Helm (1987)). 
A detailed study of the activity of AMV reverse transcriptase and its associated RNase H activity has been presented by Berger et al., Biochemistry 22:2365-2372 (1983). Another reverse transcriptase that is used extensively in molecular biology is reverse transcriptase originating from Moloney murine leukemia virus (M-MLV or “MMLV”). See, e.g., Gerard, G. R., DNA 5:271-279 (1986), and Kotewicz, M. L., et al., Gene 35:249-258 (1985). M-MLV reverse transcriptase substantially lacking in RNase H activity has also been described. See, e.g., U.S. Pat. No. 5,244,797. The invention contemplates the use of any such reverse transcriptases, or variants or mutants thereof.
[0155] In addition, the invention contemplates the use of reverse transcriptases that are error-prone, i.e., that may be referred to as error-prone reverse transcriptases or reverse transcriptases that do not support high fidelity incorporation of nucleotides during polymerization. During synthesis of the single-strand DNA flap based on the RT template integrated with the guide RNA, the error-prone reverse transcriptase can introduce one or more nucleotides that are mismatched with the RT template sequence, thereby introducing changes to the nucleotide sequence through erroneous polymerization of the single-strand DNA flap. These errors introduced during synthesis of the single strand DNA flap then become integrated into the double strand molecule through hybridization to the corresponding endogenous target strand, removal of the endogenous displaced strand, ligation, and then through one more round of endogenous DNA repair and/or sequencing processes. In some embodiments, the prime editors used in the systems and methods provided herein comprise MMLV RT. Reverse transcription [0156] As used herein, the term “reverse transcription” indicates the capability of an enzyme to synthesize a DNA strand (that is, complementary DNA or cDNA) using RNA as a template. In some embodiments, the reverse transcription can be “error-prone reverse transcription,” which refers to the properties of certain reverse transcriptase enzymes that are error-prone in their DNA polymerization activity. Spacer sequence [0157] As used herein, the term “spacer sequence” in connection with a guide RNA or a PEgRNA refers to the portion of the guide RNA or PEgRNA of about 20 nucleotides that contains a nucleotide sequence that shares the same sequence as the protospacer sequence in the target DNA sequence. 
The spacer sequence anneals to the complement of the protospacer sequence to form a ssRNA/ssDNA hybrid structure at the target site and a corresponding R loop ssDNA structure of the endogenous DNA strand. Silent mutation [0158] As used herein, the term “silent mutation” refers to a mutation in a nucleic acid molecule that does not have an effect on the phenotype of the nucleic acid molecule, or the protein it produces if it encodes a protein. Silent mutations can be introduced into coding regions of a nucleic acid (i.e., segments of a gene that encode for a protein), or they can be introduced in non-coding regions of a nucleic acid. A silent mutation in a nucleic acid sequence, e.g., in a target
DNA sequence or in a DNA synthesis template sequence to be installed in the target sequence, may be a nucleotide alteration that does not alter the expression or function of the amino acid sequence encoded by the nucleic acid sequence, or other functional features of the target nucleic acid sequence. When silent mutations are present in a coding region, they may be synonymous mutations. Synonymous mutations refer to substitutions of one base for another in a gene such that the corresponding amino acid residue of the protein produced by the gene is not modified. This is due to the redundancy of the genetic code, allowing for multiple different codons to encode for the same amino acid in a particular organism. When a silent mutation is in a noncoding region or a junction of a coding region and a non-coding region (e.g., an intron/exon junction), it may be in a region that does not impact any biological properties of the nucleic acid molecule (e.g., splicing, gene regulation, RNA lifetime, etc.). In particular embodiments, a silent mutation may also be a “benign” mutation, for example, where a nucleotide substitution results in one or more alterations in the amino acid sequence encoded, but does not result in detrimental impact on the expression or function of the polypeptide. Silent mutations may be useful, for example, for increasing the length of contiguous changes in a desired nucleotide edit or the number of nucleotide edits made to a target nucleotide sequence using prime editing to evade correction of the edit by the MMR pathway. In certain embodiments, the number of silent mutations installed may be one, or two, or three, or four, or five, or six, or seven, or eight, or nine, or ten, or more. 
In certain other embodiments involving at least two silent mutations, the silent mutations may be installed within one, or two, or three, or four, or five, or six, or seven, or eight, or nine, or ten, or 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 nucleotides from the intended edit site. In some embodiments, silent mutations are installed in order to alter or optimize the secondary structure that a particular pegRNA will form in a cell. In some embodiments, changing some bases of a pegRNA to incorporate silent mutations results in changes to the secondary structure of the pegRNA that can improve editing efficiency. Subject [0159] The term “subject,” as used herein, refers to an individual organism, for example, an individual mammal. In some embodiments, the subject is a human. In some embodiments, the subject is a non-human mammal. In some embodiments, the subject is a non-human primate. In some embodiments, the subject is a rodent. In some embodiments, the subject is a sheep, a goat, a cow, a cat, or a dog. In some embodiments, the subject is a vertebrate, an amphibian, a reptile,
a fish, an insect, a fly, or a nematode. In some embodiments, the subject is a research animal. In some embodiments, the subject is genetically engineered, e.g., a genetically engineered non-human subject. The subject may be of either sex and may be at any stage of development. In some embodiments, the subject has a disease or disorder, or is suspected of having a disease or disorder. In some embodiments, the disease or disorder is treated using the gene editing methods involving prime editing and a recombinase described herein. Substitution [0160] The term “substitution,” as used herein, refers to replacement of a residue within a sequence, e.g., a nucleic acid or amino acid sequence, with another residue, or a deletion or insertion of one or more residues within a sequence. The term “mutation” may also be used throughout the present disclosure to refer to a substitution (i.e., a “nucleic acid mutation” or an “amino acid mutation”). Substitutions are typically described herein by identifying the original residue followed by the position of the residue within the sequence and the identity of the newly mutated/substituted residue. Various methods for making the amino acid substitutions provided herein are well known in the art, and are provided by, for example, Green and Sambrook, Molecular Cloning: A Laboratory Manual (4th ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2012)). In some embodiments, a substitution is in a recombinase, e.g., a Bxb1 recombinase. Target site [0161] The term “target site” refers to a sequence within a nucleic acid molecule that is edited by a prime editor (PE) disclosed herein. The target site further refers to the sequence within a nucleic acid molecule to which a complex of the prime editor (PE) and gRNA binds. 
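The substitution notation described above (original residue, then 1-based position, then new residue) can be read mechanically. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the label “A3T” is an arbitrary example that is not taken from the disclosure, and the reference is only the first 20 residues of SEQ ID NO: 1, used here to keep the example short.

```python
import re

# Pattern for labels such as "A3T": original residue, 1-based position, new residue.
SUBSTITUTION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# First 20 residues of SEQ ID NO: 1, used only as a short illustrative reference.
BXB1_PREFIX = "MRALVVIRLSRVTDATTSPE"


def parse_substitution(label: str):
    """Split a label into (original_residue, position, new_residue)."""
    match = SUBSTITUTION_RE.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def apply_substitution(sequence: str, label: str) -> str:
    """Return `sequence` with the substitution applied.

    The original residue named in the label is checked against the sequence,
    so a label that does not fit the reference raises ValueError.
    """
    original, position, new = parse_substitution(label)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != original:
        raise ValueError(
            f"expected {original} at position {position}, "
            f"found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + new + sequence[position:]


# Hypothetical example: replace the alanine at position 3 with threonine.
example_variant = apply_substitution(BXB1_PREFIX, "A3T")
```

Checking the original residue before substituting guards against off-by-one numbering errors, which are a common source of mistakes when positions are transferred between a reference sequence and a homologous sequence.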
Treatment [0162] The terms “treatment,” “treat,” and “treating” refer to a clinical intervention aimed to reverse, alleviate, delay the onset of, or inhibit the progress of a disease or disorder, or one or more symptoms thereof, as described herein. In some embodiments, treatment may be administered after one or more symptoms have developed and/or after a disease has been diagnosed. In other embodiments, treatment may be administered in the absence of symptoms, e.g., to prevent or delay onset of a symptom or inhibit onset or
progression of a disease. For example, treatment may be administered to a susceptible individual prior to the onset of symptoms (e.g., in light of a history of symptoms and/or in light of genetic or other susceptibility factors). Treatment may also be continued after symptoms have resolved, for example, to prevent or delay their recurrence. Variant [0163] As used herein, the term “variant” should be taken to mean the exhibition of qualities that have a pattern that deviates from what occurs in nature, e.g., a variant Bxb1 recombinase is a Bxb1 recombinase comprising one or more changes in amino acid residues as compared to a wild type Bxb1 recombinase amino acid sequence. The term “variant” encompasses homologous proteins having at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% identity with a reference sequence and having the same or substantially the same functional activity or activities as the reference sequence. The term also encompasses mutants, truncations, or domains of a reference sequence that display the same or substantially the same functional activity or activities as the reference sequence. Vector [0164] The term “vector,” as used herein, refers to a nucleic acid that can be modified to encode a gene of interest and that is able to enter a host cell, mutate, and replicate within the host cell, and then transfer a replicated form of the vector into another host cell. Exemplary suitable vectors include viral vectors, such as retroviral vectors or bacteriophages and filamentous phage, and conjugative plasmids. Additional suitable vectors will be apparent to those of skill in the art based on the instant disclosure. Wild type [0165] As used herein the term “wild type” is a term of the art understood by skilled persons and means the typical form of an organism, strain, gene, or characteristic as it occurs in nature as distinguished from mutant or variant forms. 
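The percent-identity thresholds used in the definition of “variant” can be illustrated with a short calculation. The sketch below is an assumption-laden simplification: it expects two sequences that have already been aligned to equal length (with “-” marking gaps) and divides the number of identical columns by the number of aligned columns. Alignment tools differ in how they choose the denominator, and the disclosure does not specify one here; the function names and the example pair are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Both inputs must have the same length, with '-' marking gaps. Columns in
    which both sequences have a gap are ignored; a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no usable columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical pair: the first 20 residues of SEQ ID NO: 1 and a copy that
# differs at a single position, giving 19 identical columns out of 20.
reference = "MRALVVIRLSRVTDATTSPE"
variant = "MRTLVVIRLSRVTDATTSPE"
identity = percent_identity(reference, variant)
```

For full-length proteins of different lengths, the sequences would first need to be aligned with a dedicated pairwise alignment tool before a calculation of this kind is meaningful.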
DETAILED DESCRIPTION [0166] The present disclosure provides evolved and engineered recombinase variants. The recombinase variants provided herein exhibit increased recombination activity (e.g., increased insertion efficiency of donor DNA molecules at recombinase recognition sites), for example, between recombinase recognition sites that have been introduced into one or more target DNA
sequences (e.g., in a genome of an organism) using prime editing. The present disclosure also provides systems and compositions comprising the recombinase variants described herein and a prime editor and pegRNA, or polynucleotides encoding each of the recombinase variant, prime editor, and pegRNA. Methods for editing a target nucleic acid using the recombinase variants provided herein and, optionally, prime editing (e.g., for insertion, deletion, exchange, inversion, or translocation) are also described in the present disclosure. Evolved and Engineered Recombinases [0167] Some aspects of the present disclosure provide evolved and/or engineered recombinases that exhibit improved activity (e.g., when used for recombining recombinase sites introduced into one or more target DNA sequences using prime editing). In some aspects, the present disclosure provides Bxb1 recombinase variants that exhibit improved activity (e.g., increased insertion efficiency of donor DNA molecules at recombinase recognition sites). 
The recombinase variants provided herein comprise various amino acid substitutions relative to the amino acid sequence of Bxb1 recombinase, which is provided below: [0168] Bxb1 recombinase: MRALVVIRLSRVTDATTSPERQLESCQQLCAQRGWDVVGVAEDLDVSGAVDPFDRKRR PNLARWLAFEEQPFDVIVAYRVDRLTRSIRHLQQLVHWAEDHKKLVVSATEAHFDTTTP FAAVVIALMGTVAQMELEAIKERNRSAAHFNIRAGKYRGSLPPWGYLPTRVDGEWRLV PDPVQRERILEVYHRVVDNHEPLHLVAHDLNRRGVLSPKDYFAQLQGREPQGREWSAT ALKRSMISEAMLGYATLNGKTVRDDDGAPLVRAEPILTREQLEALRAELVKTSRAKPAV STPSLLLRVLFCAVCGEPAYKFAGGGRKHPRYRCRSMGFPKHCGNGTVAMAEWDAFC EEQVLDLLGDAERLEKVWVAGSDSAVELAEVNAELVDLTSLIGSPAYRAGSPQREALD ARIAALAARQEELEGLEARPSGWEWRETGQRFGDWWREQDTAAKNTWLRSMNVRLTF DVRGGLTRTIDFGDLQEYEQHLRLGSVVERLHTGMS (SEQ ID NO: 1) [0169] In some aspects, the present disclosure provides Bxb1 recombinases comprising an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to the amino acid sequence of SEQ ID NO: 1, wherein the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions at positions selected from the group consisting of amino acid residues 3, 5, 10, 14, 15, 20, 23, 24, 25, 29, 35, 36, 39, 40, 43, 45, 47, 49, 50, 51, 54, 58, 60, 66, 68, 69, 70, 73, 74, 75, 78, 84, 86, 87,
89, 93, 95, 97, 100, 101, 105, 116, 119, 124, 127, 139, 147, 154, 157, 158, 169, 175, 179, 181, 183, 185, 194, 197, 199, 202, 203, 204, 207, 208, 209, 214, 221, 229, 239, 248, 252, 261, 266, 267, 273, 279, 280, 281, 284, 285, 287, 288, 291, 309, 311, 321, 328, 333, 334, 342, 343, 345, 347, 360, 361, 362, 365, 368, 374, 375, 378, 389, 393, 400, 411, 415, 419, 421, 422, 424, 434, 435, 438, 440, 447, 449, 453, 462, 463, 466, 468, 469, 478, 483, 485, 487, 490, 494, 496, and 497 of the amino acid sequence provided in SEQ ID NO: 1, or at corresponding positions in a homologous recombinase (e.g., a recombinase with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to Bxb1 recombinase). In some embodiments, the Bxb1 recombinase comprises an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to the amino acid sequence of SEQ ID NO: 1, wherein the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions at positions selected from the group consisting of amino acid residues 3, 5, 10, 14, 15, 20, 23, 24, 25, 29, 35, 36, 39, 40, 43, 45, 47, 49, 50, 51, 54, 58, 60, 66, 68, 69, 70, 73, 74, 75, 78, 84, 86, 87, 89, 93, 95, 97, 100, 101, 105, 116, 119, 124, 127, 139, 147, 157, 158, 169, 175, 179, 181, 183, 185, 194, 197, 199, 202, 203, 204, 207, 208, 209, 214, 221, 229, 239, 248, 252, 261, 266, 267, 273, 279, 280, 281, 284, 285, 287, 288, 291, 309, 311, 321, 328, 333, 334, 342, 343, 345, 347, 360, 361, 362, 365, 368, 374, 375, 378, 389, 393, 400, 411, 415, 419, 421, 422, 424, 434, 435, 438, 440, 447, 449, 453, 462, 463, 466, 468, 469, 478, 483, 485, 487, 490, 494, 496, and 497 of the amino acid sequence provided in SEQ ID NO: 1, or at 
corresponding positions in a homologous recombinase (e.g., a recombinase with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to Bxb1 recombinase). In some embodiments, the Bxb1 recombinase comprises an amino acid sequence that is not identical to the amino acid sequence of wild type Bxb1 recombinase, or to any other variants of Bxb1 recombinase known in the art. In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3X, V5X, S10X, D14X, A15X, E20X, L23X, E24X, S25X, L29X, W35X, D36X, G39X, V40X, D43X, D45X, S47X, A49X, V50X, D51X, D54X, R58X, N60X, A66X, E68X, E69X, Q70X, D73X, V74X, I75X, Y78X, T84X, S86X, I87X,
H89X, L93X, H95X, A97X, H100X, K101X, V105X, T116X, A119X, A124X, G127X, E139X, F147X, Y154X, S157X, L158X, D169X, V175X, V179X, R181X, R183X, L185X, N194X, P197X, H199X, A202X, H203X, D204X, R207X, R208X, G209X, K214X, Q221X, E229X, M239X, A248X, G252X, A261X, A266X, E267X, E273X, R279X, A280X, E281X, K284X, T285X, R287X, A288X, A291X, E309X, A311X, H321X, S328X, K333X, H334X, M342X, A343X, W345X, A347X, A360X, E361X, R362X, K365X, V368X, A374X, V375X, A378X, S389X, S393X, S400X, A411X, A415X, E419X, E421X, G422X, E424X, E434X, T435X, R438X, G440X, D447X, A449X, T453X, L462X, T463X, V466X, G468X, G469X, D478X, E483X, H485X, R487X, S490X, R494X, H496X, and T497X relative to the amino acid sequence provided in SEQ ID NO: 1, or at corresponding positions in a homologous recombinase (e.g., a recombinase with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to Bxb1 recombinase), wherein X represents any amino acid other than the wild type amino acid. 
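As an illustration only, the substitution notation used above (for example A3T, or A3X where X represents any amino acid other than the wild type amino acid) can be read as wild-type residue, 1-based position, replacement residue. The following is a minimal sketch, assuming one-letter amino acid codes and a short made-up reference sequence (`TOY_REFERENCE`), which is hypothetical and is not SEQ ID NO: 1. The function names are likewise hypothetical and are not part of the disclosure.

```python
import re
from typing import NamedTuple

# The 20 standard amino acids in one-letter code. "X" is deliberately excluded:
# in the notation above it is a wildcard, not a residue.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical toy reference, NOT SEQ ID NO: 1 (A at position 3, V at position 5).
TOY_REFERENCE = "MKAGVLS"


class Substitution(NamedTuple):
    wild_type: str    # residue expected in the reference
    position: int     # 1-based position in the reference
    replacement: str  # one-letter code, or "X" for "any residue other than wild type"


_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'A3T' or 'A3X' into its three parts."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if wild_type not in AMINO_ACIDS:
        raise ValueError(f"unknown wild-type residue in {label!r}")
    if replacement != "X" and replacement not in AMINO_ACIDS:
        raise ValueError(f"unknown replacement residue in {label!r}")
    if replacement == wild_type:
        raise ValueError(f"{label!r} does not change the residue")
    return Substitution(wild_type, position, replacement)


def has_substitution(reference: str, variant: str, sub: Substitution) -> bool:
    """Return True if the variant carries the substitution relative to the reference."""
    index = sub.position - 1
    if index >= len(reference) or index >= len(variant):
        return False
    if reference[index] != sub.wild_type:
        raise ValueError(f"reference has {reference[index]} at position {sub.position}")
    observed = variant[index]
    if sub.replacement == "X":
        # X matches any standard residue except the wild-type one.
        return observed in AMINO_ACIDS and observed != sub.wild_type
    return observed == sub.replacement


def count_substitutions(reference: str, variant: str, labels: list[str]) -> int:
    """Count how many of the listed substitutions the variant carries."""
    return sum(has_substitution(reference, variant, parse_substitution(l)) for l in labels)
```

Under these assumptions, a variant of the toy reference with threonine at position 3 would satisfy both A3X and A3T but not A3V, and a count of this kind is one way to check an "at least 1, at least 2, ..." condition against a chosen group of substitutions.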
In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3X, V5X, D14X, A15X, E20X, L23X, E24X, S25X, L29X, W35X, D36X, G39X, V40X, D43X, D45X, S47X, A49X, V50X, D51X, D54X, R58X, N60X, A66X, E68X, E69X, Q70X, D73X, V74X, I75X, Y78X, T84X, S86X, I87X, H89X, L93X, H95X, A97X, H100X, K101X, V105X, T116X, A119X, A124X, G127X, E139X, F147X, S157X, L158X, D169X, V175X, V179X, R181X, R183X, L185X, N194X, P197X, H199X, A202X, H203X, D204X, R207X, R208X, G209X, K214X, Q221X, E229X, M239X, A248X, G252X, A261X, A266X, E267X, E273X, R279X, A280X, E281X, K284X, T285X, R287X, A288X, A291X, E309X, A311X, H321X, S328X, K333X, H334X, M342X, A343X, W345X, A347X, A360X, E361X, R362X, K365X, V368X, A374X, V375X, A378X, S389X, S393X, S400X, A411X, A415X, E419X, E421X, G422X, E424X, E434X, T435X, R438X, G440X, D447X, A449X, T453X, L462X, T463X, V466X, G468X, G469X, D478X, E483X, H485X, R487X, S490X, R494X, H496X, and T497X relative to the amino acid sequence provided in SEQ ID NO: 1, or at corresponding positions in a homologous recombinase (e.g., a recombinase with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to Bxb1 recombinase), wherein X represents any amino acid other than the wild type
amino acid. In certain embodiments, the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3T, A3V, V5F, V5I, S10A, D14N, A15T, E20K, E20Q, L23F, L23M, E24K, S25I, L29F, W35P, W35L, D36G, D36V, G39D, V40I, D43E, D45G, S47A, A49E, A49T, V50I, D51E, D51N, D51Y, D54N, R58K, N60S, A66T, E68K, E69D, Q70P, D73G, V74A, V74M, I75V, Y78H, Y78N, T84S, S86G, S86N, S86T, I87T, I87V, H89N, L93M, H95Y, A97S, H100Y, K101R, V105I, T116P, A119S, A124S, G127E, E139A, F147Y, Y154C, S157G, L158M, D169N, V175I, V175M, V179G, V179M, R181Q, R183L, L185M, N194D, N194K, P197Q, P197T, H199Y, A202S, H203Y, D204G, R207I, R207Q, R208S, G209V, K214R, Q221R, E229K, M239L, A248V, G252S, A261V, A266T, E267D, E273D, E273K, R279C, A280T, E281K, K284N, T285A, R287P, A288T, A291S, A291T, E309D, A311V, H321N, S328T, K333N, H334P, M342V, A343T, W345L, A347T, A347V, A360T, E361D, E361G, R362K, K365N, V368A, V368N, A374V, V375I, A378T, S389R, S393F, S400Y, A411V, A415V, E419D, E421K, G422S, E424G, E434G, T435A, R438Q, G440E, D447N, A449V, T453A, T453N, L462M, T463I, V466M, G468D, G469R, D478E, E483K, H485Y, R487K, S490N, R494Q, H496P, and T497A relative to the amino acid sequence of SEQ ID NO: 1, or at corresponding positions in a homologous recombinase (e.g., a recombinase with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to Bxb1 recombinase). 
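By way of illustration only, the two recurring notions above, percent sequence identity and "corresponding positions in a homologous recombinase", can both be read off a pairwise alignment. The sketch below assumes the two sequences have already been aligned by some external tool and are supplied as equal-length strings with `-` marking gaps. It counts identity over columns in which neither sequence has a gap, which is only one of several conventions in use and is an assumption here rather than the definition applied in this disclosure. The function names and example strings are hypothetical.

```python
from typing import Optional

GAP = "-"


def percent_identity(aligned_ref: str, aligned_other: str) -> float:
    """Percent of identical residues over alignment columns that contain no gap."""
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for ref_residue, other_residue in zip(aligned_ref, aligned_other):
        if ref_residue == GAP or other_residue == GAP:
            continue  # gapped columns are skipped under this convention
        compared += 1
        if ref_residue == other_residue:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def corresponding_position(
    aligned_ref: str, aligned_other: str, ref_position: int
) -> Optional[int]:
    """Map a 1-based position in the reference to the 1-based position in the
    other sequence that shares its alignment column, or None if that column
    is a gap in the other sequence."""
    ref_count = 0
    other_count = 0
    for ref_residue, other_residue in zip(aligned_ref, aligned_other):
        if ref_residue != GAP:
            ref_count += 1
        if other_residue != GAP:
            other_count += 1
        if ref_residue != GAP and ref_count == ref_position:
            return other_count if other_residue != GAP else None
    raise ValueError("ref_position is beyond the end of the reference")
```

With such a mapping, a substitution stated relative to the reference numbering can be looked up at the aligned position of a homologous sequence even when insertions or deletions shift the numbering between the two.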
In certain embodiments, the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3T, A3V, V5F, V5I, D14N, A15T, E20K, E20Q, L23F, L23M, E24K, S25I, L29F, W35P, W35L, D36G, D36V, G39D, V40I, D43E, D45G, S47A, A49E, A49T, V50I, D51E, D51N, D51Y, D54N, R58K, N60S, A66T, E68K, E69D, Q70P, D73G, V74A, V74M, I75V, Y78H, Y78N, T84S, S86G, S86N, S86T, I87T, I87V, H89N, L93M, H95Y, A97S, H100Y, K101R, V105I, T116P, A119S, A124S, G127E, E139A, F147Y, S157G, L158M, D169N, V175I, V175M, V179G, V179M, R181Q, R183L, L185M, N194D, N194K, P197Q, P197T, H199Y, A202S, H203Y, D204G, R207I, R207Q, R208S, G209V, K214R, Q221R, E229K, M239L, A248V, G252S, A261V, A266T, E267D, E273D, E273K, R279C, A280T, E281K, K284N, T285A, R287P, A288T, A291S, E309D, A311V, H321N, S328T, K333N, H334P, M342V,
A343T, W345L, A347T, A360T, E361D, R362K, K365N, V368A, V368N, A374V, V375I, A378T, S389R, S393F, S400Y, A411V, A415V, E419D, E421K, G422S, E424G, E434G, T435A, R438Q, G440E, D447N, A449V, T453A, T453N, L462M, T463I, V466M, G468D, G469R, D478E, E483K, H485Y, R487K, S490N, R494Q, H496P, and T497A relative to the amino acid sequence of SEQ ID NO: 1, or at corresponding positions in a homologous recombinase (e.g., a recombinase with at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity to Bxb1 recombinase). [0170] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 3 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A3X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A3T mutation. In certain embodiments, the mutation is an A3V mutation. [0171] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 5 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a V5X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V5F mutation. In certain embodiments, the mutation is a V5I mutation. [0172] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 10 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an S10X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S10A mutation. 
[0173] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 14 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a D14X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D14N mutation. [0174] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 15 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
A15X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A15T mutation. [0175] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 20 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E20X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E20K mutation. In certain embodiments, the mutation is an E20Q mutation. [0176] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 23 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an L23X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an L23F mutation. In certain embodiments, the mutation is an L23M mutation. [0177] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 24 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E24X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E24K mutation. [0178] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 25 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an S25X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S25I mutation. 
[0179] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 29 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an L29X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an L29F mutation. [0180] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 35 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a
W35X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a W35P mutation. In certain embodiments, the mutation is a W35L mutation. [0181] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 36 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a D36X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D36G mutation. In certain embodiments, the mutation is a D36V mutation. [0182] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 39 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a G39X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a G39D mutation. [0183] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 40 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a V40X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V40I mutation. [0184] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 43 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a D43X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D43E mutation. 
[0185] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 45 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a D45X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D45G mutation. [0186] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 47 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
S47X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S47A mutation. [0187] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 49 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A49X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A49E mutation. In certain embodiments, the mutation is an A49T mutation. [0188] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 50 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a V50X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V50I mutation. [0189] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 51 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a D51X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D51E mutation. In certain embodiments, the mutation is a D51N mutation. In certain embodiments, the mutation is a D51Y mutation. [0190] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 54 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a D54X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D54N mutation. 
[0191] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 58 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an R58X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R58K mutation. [0192] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 60 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
N60X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an N60S mutation. [0193] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 66 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A66X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A66T mutation. [0194] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 68 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E68X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E68K mutation. [0195] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 69 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E69X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E69D mutation. [0196] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 70 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a Q70X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a Q70P mutation. [0197] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 73 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is a D73X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D73G mutation. [0198] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 74 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a
V74X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V74A mutation. [0199] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 75 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an I75X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an I75V mutation. [0200] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 78 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a Y78X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a Y78H mutation. [0201] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 84 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a T84X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a T84S mutation. [0202] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 86 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an S86X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S86G mutation. In certain embodiments, the mutation is an S86N mutation. In certain embodiments, the mutation is an S86T mutation. 
[0203] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 87 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an I87X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an I87T mutation. In certain embodiments, the mutation is an I87V mutation. [0204] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 89 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
H89X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H89N mutation. [0205] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 93 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an L93X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an L93M mutation. [0206] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 95 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an H95X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H95Y mutation. [0207] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 97 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A97X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A97S mutation. [0208] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 100 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an H100X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H100Y mutation. [0209] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 101 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is a K101X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a K101R mutation. [0210] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 105 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a
V105X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V105I mutation. [0211] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 116 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a T116X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a T116P mutation. [0212] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 119 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A119X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A119S mutation. [0213] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 124 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A124X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A124S mutation. [0214] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 127 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a G127X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a G127E mutation. [0215] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 139 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is an E139X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E139A mutation. [0216] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 147 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
F147X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an F147Y mutation. [0217] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 154 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a Y154X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a Y154C mutation. [0218] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 157 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an S157X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S157G mutation. [0219] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 158 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an L158X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an L158M mutation. [0220] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 169 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a D169X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D169N mutation. [0221] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 175 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is a V175X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V175I mutation. In certain embodiments, the mutation is a V175M mutation. [0222] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 179 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a
V179X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V179G mutation. In certain embodiments, the mutation is a V179M mutation. [0223] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 181 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an R181X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R181Q mutation. [0224] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 183 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an R183X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R183L mutation. [0225] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 185 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an L185X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an L185M mutation. [0226] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 194 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an N194X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an N194D mutation. In certain embodiments, the mutation is an N194K mutation. 
[0227] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 197 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a P197X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a P197Q mutation. In certain embodiments, the mutation is a P197T mutation. [0228] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 199 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
H199X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H199Y mutation. [0229] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 202 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A202X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A202S mutation. [0230] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 203 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an H203X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H203Y mutation. [0231] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 204 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a D204X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D204G mutation. [0232] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 207 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an R207X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R207I mutation. In certain embodiments, the mutation is an R207Q mutation. [0233] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 208 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is an R208X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R208S mutation. [0234] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 209 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a
G209X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a G209V mutation. [0235] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 214 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a K214X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a K214R mutation. [0236] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 221 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a Q221X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a Q221R mutation. [0237] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 229 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E229X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E229K mutation. [0238] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 239 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an M239X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an M239L mutation. [0239] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 248 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is an A248X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A248V mutation. [0240] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 252 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a
G252X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a G252S mutation. [0241] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 261 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A261X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A261V mutation. [0242] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 266 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A266X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A266T mutation. [0243] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 267 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E267X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E267D mutation. [0244] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 273 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E273X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E273D mutation. In certain embodiments, the mutation is an E273K mutation. [0245] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 279 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is an R279X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R279C mutation. [0246] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 280 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
A280X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A280T mutation. [0247] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 281 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E281X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E281K mutation. [0248] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 284 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a K284X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a K284N mutation. [0249] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 285 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a T285X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a T285A mutation. [0250] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 287 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an R287X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R287P mutation. [0251] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 288 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is an A288X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A288T mutation. [0252] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 291 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
A291X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A291S mutation. In certain embodiments, the mutation is an A291T mutation. [0253] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 309 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E309X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E309D mutation. [0254] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 311 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A311X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A311V mutation. [0255] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 321 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an H321X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H321N mutation. [0256] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 328 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an S328X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S328T mutation. [0257] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 333 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is a K333X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a K333N mutation. [0258] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 334 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
H334X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H334P mutation. [0259] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 342 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an M342X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an M342V mutation. [0260] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 343 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A343X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A343T mutation. [0261] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 345 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a W345X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a W345L mutation. [0262] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 347 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A347X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A347T mutation. In certain embodiments, the mutation is an A347V mutation. [0263] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 360 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is an A360X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A360T mutation. [0264] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 361 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
E361X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E361D mutation. In certain embodiments, the mutation is an E361G mutation. [0265] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 362 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an R362X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R362K mutation. [0266] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 365 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a K365X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a K365N mutation. [0267] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 368 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a V368X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V368A mutation. In certain embodiments, the mutation is a V368N mutation. [0268] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 374 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A374X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A374V mutation. 
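The substitution labels used throughout these paragraphs (for example, E361X as the generic form and E361D or E361G as specific forms) follow a simple pattern: wild-type residue, position in SEQ ID NO: 1, and replacement residue, with X standing for any amino acid other than the wild type. The following is a minimal illustrative sketch of that notation in Python; the names `Substitution`, `parse_substitution`, and `matches` are hypothetical helpers introduced here and are not part of the disclosure.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    wild_type: str  # one-letter code of the residue in the reference sequence
    position: int   # 1-based position in the reference sequence
    mutant: str     # replacement residue; "X" means any residue except wild type


_LABEL = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'E361D' into its three parts."""
    m = _LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(m.group(1), int(m.group(2)), m.group(3))


def matches(generic: str, specific: str) -> bool:
    """Return True if `specific` (e.g. 'E361D') falls within `generic` (e.g. 'E361X')."""
    g = parse_substitution(generic)
    s = parse_substitution(specific)
    if (g.wild_type, g.position) != (s.wild_type, s.position):
        return False
    if g.mutant == "X":
        # X excludes the wild-type residue; a second wildcard is not a specific residue.
        return s.mutant not in (s.wild_type, "X")
    return g.mutant == s.mutant


print(matches("E361X", "E361D"))  # specific substitution covered by the generic form
print(matches("E361X", "R362K"))  # different position, so not covered
```

Under this reading, both E361D and E361G fall within E361X, whereas a label that keeps the wild-type residue does not.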
[0269] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 375 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a V375X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V375I mutation. [0270] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 378 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
A378X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A378T mutation. [0271] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 389 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an S389X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S389R mutation. [0272] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 393 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an S393X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S393F mutation. [0273] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 400 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an S400X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S400Y mutation. [0274] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 411 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A411X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A411V mutation. [0275] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 415 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is an A415X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A415V mutation. [0276] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 419 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
E419X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E419D mutation. [0277] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 421 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E421X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E421K mutation. [0278] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 422 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a G422X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a G422S mutation. [0279] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 424 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E424X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E424G mutation. [0280] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 434 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an E434X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E434G mutation. [0281] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 435 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is a T435X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a T435A mutation. [0282] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 438 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
R438X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R438Q mutation. [0283] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 440 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a G440X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a G440E mutation. [0284] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 447 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a D447X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D447N mutation. [0285] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 449 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an A449X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an A449V mutation. [0286] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 453 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a T453X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a T453A mutation. In certain embodiments, the mutation is a T453N mutation. [0287] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 462 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is an L462X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an L462M mutation. [0288] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 463 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a
T463X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a T463I mutation. [0289] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 466 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a V466X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a V466M mutation. [0290] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 468 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a G468X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a G468D mutation. [0291] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 469 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a G469X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a G469R mutation. [0292] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 478 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is a D478X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a D478E mutation. [0293] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 483 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is an E483X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an E483K mutation. [0294] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 485 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an
H485X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H485Y mutation. [0295] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 487 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an R487X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R487K mutation. [0296] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 490 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an S490X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an S490N mutation. [0297] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 494 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an R494X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an R494Q mutation. [0298] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 496 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. In some embodiments, the mutation is an H496X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is an H496P mutation. [0299] In some embodiments, the amino acid sequence of the Bxb1 recombinase comprises a mutation at amino acid position 497 of SEQ ID NO: 1, or a corresponding mutation at a homologous position in another recombinase sequence. 
In some embodiments, the mutation is a T497X mutation, wherein X is any amino acid other than the wild type. In certain embodiments, the mutation is a T497A mutation.
[0300] In some embodiments, the Bxb1 recombinase comprises a substitution or combination of substitutions of any one of the Bxb1 variants in Table 1 below:
[0301] Table 1: Bxb1 Variants from PANCEv1 that Showed Improved Integration Efficiency
Amino Acid Position(s) | Amino Acid Mutation(s)
Y78X | Y78N or Y78H
L23X and D51X | L23M and D51E
[0302] In some embodiments, the Bxb1 recombinase comprises a substitution or combination of
substitutions of any one of the Bxb1 variants in Table 2 below: [0303] Table 2: Bxb1 Variants from PANCEv2 that Showed Improved Integration Efficiency Amino Acid Position(s) Amino Acid Mutation(s) V5X S86X N194X A266X R362X and V5I S86G N194D A266V R362K and
[0304] In some embodiments, the Bxb1 recombinase comprises a substitution or combination of substitutions of any one of the Bxb1 variants in Table 3 below:
[0305] Table 3: Bxb1 Variants from PACEv1 that Showed Improved Integration Efficiency
Amino Acid Position(s) | Amino Acid Mutation(s)
V5X, A119X, E281X, G422X, R487X | V5I, A119S, E281K, G422S, R487K
[0306] In some embodiments, the Bxb1 recombinase comprises a substitution or combination of
substitutions of any one of the Bxb1 variants in Table 4 below: [0307] Table 4: Bxb1 Variants from PANCEv3, PANCEv4, PACEv2, and PANCEv5 that Showed Improved Integration Efficiency Amino Acid Position(s) Amino Acid Mutation(s) V5X P197X d R494X V5I P197T d R494
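V5X, S86X, and H321X | V5I, S86N, and H321N
V5X, V74X, M239X, T453X, and G468X | V5I, V74M, M239L, T453N, and G468D
[0308] In some embodiments, the Bxb1 recombinase comprises a substitution or combination of substitutions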
of any one of the Bxb1 variants in Table 5 below:
[0309] Table 5: Rationally Designed Bxb1 Double Mutants that Showed Improved Integration Efficiency
Amino Acid Position(s) | Amino Acid Mutation(s)
A49X and H496X | A49E and H496P
[0310] In some embodiments, the Bxb1 recombinase comprises a substitution or combination of substitutions of any one of the Bxb1 variants in Table 6 below:
[0311] Table 6: Additional Evolved Bxb1 Mutants that Showed Improved Integration Efficiency
Amino Acid Position(s) | Amino Acid Mutation(s)
S157X, G209X, and V368X | S157G, G209V, and V368A
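L29X, W35X, V74X, and V105X | L29F, W35L, V74A, and V105I
[0312] In some embodiments, the Bxb1 recombinase comprises a substitution or combination of substitutions of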
any one of the Bxb1 variants in Table 7 below:
[0313] Table 7: Rationally Designed Bxb1 Triple Mutants that Showed Improved Integration Efficiency
Amino Acid Position(s) | Amino Acid Mutation(s)
D14X, R207X, and T453X | D14N, R207Q, and T453A
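The combinations in the tables above are sets of substitutions applied together relative to a reference sequence. As a minimal sketch only, the following Python shows one way to apply such a combination to a sequence and to recover the substitution labels by comparing two equal-length sequences. The six-residue `toy_reference` is a hypothetical placeholder and is not SEQ ID NO: 1; the function names are likewise illustrative assumptions.

```python
import re


def apply_substitutions(reference: str, labels: list[str]) -> str:
    """Apply labels such as 'D3N' to `reference`, checking each wild-type residue."""
    residues = list(reference)
    for label in labels:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
        if m is None:
            raise ValueError(f"not a substitution label: {label!r}")
        wild_type, position, mutant = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= position <= len(reference):
            raise ValueError(f"{label}: position outside the reference sequence")
        if reference[position - 1] != wild_type:
            raise ValueError(
                f"{label}: reference has {reference[position - 1]} at position {position}"
            )
        residues[position - 1] = mutant
    return "".join(residues)


def list_substitutions(reference: str, variant: str) -> list[str]:
    """Return labels for every position where two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    return [
        f"{ref}{i}{var}"
        for i, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# Hypothetical toy reference, NOT SEQ ID NO: 1: M1 K2 D3 L4 R5 T6.
toy_reference = "MKDLRT"
toy_variant = apply_substitutions(toy_reference, ["D3N", "R5Q", "T6A"])
print(toy_variant)
print(list_substitutions(toy_reference, toy_variant))
```

The wild-type check is a design choice of this sketch: it rejects a label whose first letter does not agree with the reference at that position, which guards against numbering a mutation relative to the wrong sequence.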
[0314] In some embodiments, the Bxb1 recombinase comprises a V74X mutation relative to the amino acid sequence of SEQ ID NO: 1, wherein X is any amino acid other than the wild type amino acid. In certain embodiments, the Bxb1 recombinase comprises a V74A mutation relative to the amino acid sequence of SEQ ID NO: 1. In certain embodiments, the Bxb1 recombinase comprises the following amino acid sequence (which is also referred to herein as “evoBxb1” and “evoPASSIGE”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: MRALVVIRLSRVTDATTSPERQLESCQQLCAQRGWDVVGVAEDLDVSGAVDPFDRKRR PNLARWLAFEEQPFDAIVAYRVDRLTRSIRHLQQLVHWAEDHKKLVVSATEAHFDTTTP FAAVVIALMGTVAQMELEAIKERNRSAAHFNIRAGKYRGSLPPWGYLPTRVDGEWRLV PDPVQRERILEVYHRVVDNHEPLHLVAHDLNRRGVLSPKDYFAQLQGREPQGREWSAT
ALKRSMISEAMLGYATLNGKTVRDDDGAPLVRAEPILTREQLEALRAELVKTSRAKPAV STPSLLLRVLFCAVCGEPAYKFAGGGRKHPRYRCRSMGFPKHCGNGTVAMAEWDAFC EEQVLDLLGDAERLEKVWVAGSDSAVELAEVNAELVDLTSLIGSPAYRAGSPQREALD ARIAALAARQEELEGLEARPSGWEWRETGQRFGDWWREQDTAAKNTWLRSMNVRLTF DVRGGLTRTIDFGDLQEYEQHLRLGSVVERLHTGMS (SEQ ID NO: 124)
[0315] In some embodiments, the Bxb1 recombinase comprises V74X, E229X, and V375X mutations relative to the amino acid sequence of SEQ ID NO: 1, wherein X is any amino acid other than the wild type amino acid. In certain embodiments, the Bxb1 recombinase comprises V74A, E229K, and V375I mutations relative to the amino acid sequence of SEQ ID NO: 1. In certain embodiments, the Bxb1 recombinase comprises the following amino acid sequence (which is also referred to herein as “eeBxb1” and “eePASSIGE”), or an amino acid sequence at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to the following amino acid sequence: MRALVVIRLSRVTDATTSPERQLESCQQLCAQRGWDVVGVAEDLDVSGAVDPFDRKRR PNLARWLAFEEQPFDAIVAYRVDRLTRSIRHLQQLVHWAEDHKKLVVSATEAHFDTTTP FAAVVIALMGTVAQMELEAIKERNRSAAHFNIRAGKYRGSLPPWGYLPTRVDGEWRLV PDPVQRERILEVYHRVVDNHEPLHLVAHDLNRRGVLSPKDYFAQLQGREPQGRKWSAT ALKRSMISEAMLGYATLNGKTVRDDDGAPLVRAEPILTREQLEALRAELVKTSRAKPAV STPSLLLRVLFCAVCGEPAYKFAGGGRKHPRYRCRSMGFPKHCGNGTVAMAEWDAFC EEQVLDLLGDAERLEKVWVAGSDSAIELAEVNAELVDLTSLIGSPAYRAGSPQREALDA RIAALAARQEELEGLEARPSGWEWRETGQRFGDWWREQDTAAKNTWLRSMNVRLTFD VRGGLTRTIDFGDLQEYEQHLRLGSVVERLHTGMS (SEQ ID NO: 125)
[0316] It should be appreciated that any of the amino acid mutations described herein (e.g., A3T) from a first amino acid residue (e.g., A) to a second amino acid residue (e.g., T) may also include mutations from the first amino acid residue to an amino acid residue that is similar to (e.g., conserved) the second amino acid residue. 
For example, mutation of an amino acid with a hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan) may be a mutation to a second amino acid with a different hydrophobic side chain (e.g., alanine, valine, isoleucine, leucine, methionine, phenylalanine, tyrosine, or tryptophan). For example, a mutation of an alanine to a threonine (e.g., an A3T mutation) may also be a mutation from an alanine to an amino acid that is similar in size and chemical
properties to a threonine, for example, serine. As another example, mutation of an amino acid with a positively charged side chain (e.g., arginine, histidine, or lysine) may be a mutation to a second amino acid with a different positively charged side chain (e.g., arginine, histidine, or lysine). As another example, mutation of an amino acid with a polar side chain (e.g., serine, threonine, asparagine, or glutamine) may be a mutation to a second amino acid with a different polar side chain (e.g., serine, threonine, asparagine, or glutamine). Additional similar amino acid pairs include, but are not limited to, the following: phenylalanine and tyrosine; asparagine and glutamine; methionine and cysteine; aspartic acid and glutamic acid; and arginine and lysine. [0317] The skilled artisan would recognize that such conservative amino acid substitutions will likely have minor effects on protein structure and are likely to be well tolerated without compromising function. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a threonine may be an amino acid mutation to a serine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an arginine may be an amino acid mutation to a lysine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an isoleucine may be an amino acid mutation to an alanine, valine, methionine, or leucine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a lysine may be an amino acid mutation to an arginine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to an aspartic acid may be an amino acid mutation to a glutamic acid or asparagine. In some embodiments, any of the amino acid mutations provided herein from one amino acid to a valine may be an amino acid mutation to an alanine, isoleucine, methionine, or leucine. 
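The substitution rules listed so far in this paragraph can be sketched as a lookup table. The following is a minimal illustration only, assuming one-letter residue codes and mutation labels of the form used in the text (e.g., A3T, V375I); the function and table names are hypothetical and are not part of the disclosure.

```python
import re

# Conservative alternatives stated in the text, keyed by the residue a
# mutation changes TO (one-letter codes). Hypothetical table name.
CONSERVATIVE_ALTERNATIVES = {
    "T": ["S"],                 # threonine -> serine
    "R": ["K"],                 # arginine -> lysine
    "I": ["A", "V", "M", "L"],  # isoleucine -> alanine, valine, methionine, leucine
    "K": ["R"],                 # lysine -> arginine
    "D": ["E", "N"],            # aspartic acid -> glutamic acid or asparagine
    "V": ["A", "I", "M", "L"],  # valine -> alanine, isoleucine, methionine, leucine
}

# A label such as "V375I": wild-type residue, 1-based position, new residue.
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label like 'V375I' into ('V', 375, 'I')."""
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def conservative_variants(label):
    """Return the named mutation followed by its listed conservative variants.

    Alternatives equal to the wild-type residue are skipped, because they
    would not be mutations at all.
    """
    wild_type, position, mutant = parse_mutation(label)
    variants = [label]
    for alternative in CONSERVATIVE_ALTERNATIVES.get(mutant, []):
        if alternative != wild_type:
            variants.append(f"{wild_type}{position}{alternative}")
    return variants
```

Under these assumptions, `conservative_variants("A3T")` yields the named mutation plus the serine variant, and a mutation to a residue with no listed alternatives is returned unchanged.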
In some embodiments, any of the amino acid mutations provided herein from one amino acid to a glycine may be an amino acid mutation to an alanine. It should be appreciated, however, that additional conserved amino acid residues would be recognized by the skilled artisan, and any of the amino acid mutations to other conserved amino acid residues are also within the scope of this disclosure. [0318] In some aspects, the present disclosure provides Bxb1 recombinase variants comprising mutations corresponding to any of the mutations disclosed herein, or any combination thereof, at a homologous position in another recombinase. In some embodiments, the other recombinase is also a serine recombinase like Bxb1. Examples of additional serine recombinases include, without limitation, Hin, Gin, Tn3, β-six, CinH, ParA, γδ, ϕC31, TP901, TG1, φBT1, R4, φRV1,
φFC1, MR11, A118, U153, gp29, MJ1, TP901-1, V153, phiRV1, phi370.1, WB, BL3, SprA, ϕJoe, ϕK38, Int2, Int3, Int4, Int7, Int8, Int9, Int10, Int11, Int12, Int13, L1, peaches, Bxz2, and SV1. Additional serine recombinases are known in the art and will be readily apparent to those of skill in the art.
Methods of Editing
[0319] As referenced throughout, this disclosure combines the use of prime editing (PE), twin PE, or multi-flap prime editing with site-specific recombination. The term “site-specific recombination” refers to a type of genetic recombination also known as “conservative site-specific recombination.” Site-specific recombination is a type of genetic recombination in which DNA strand exchange takes place between segments possessing at least a certain degree of sequence homology. Enzymes known as site-specific recombinases (“SSRs”), such as Bxb1, perform rearrangements of DNA segments by recognizing and binding to short, specific DNA sequences (“recombinase recognition sites”), at which they cleave the DNA backbone, exchange the two DNA helices involved, and rejoin the DNA strands. In some cases, the presence of a recombinase enzyme and the recombination sites is sufficient for the reaction to proceed; in other systems a number of accessory proteins and/or accessory sites are required. Many different genome modification strategies, among these recombinase-mediated cassette exchange (RMCE), an advanced approach for the targeted introduction of transcription units into predetermined genomic loci, rely on SSRs. Site-specific recombination systems are highly specific, fast, and efficient, even when faced with complex eukaryotic genomes. They are employed naturally in a variety of cellular processes, including bacterial genome replication, differentiation and pathogenesis, and movement of mobile genetic elements. 
Recombination sites are typically between about 30 and 200 nucleotides in length and generally consist of two motifs with a partial inverted-repeat symmetry, to which the recombinase binds, and which flank a central crossover sequence at which the recombination takes place. The pairs of sites between which the recombination occurs are usually identical, but there are exceptions (e.g., attP and attB). [0320] Once a recombinase recognition site is installed in the genome, a cognate recombinase that recognizes the installed recombinase recognition site may be used to catalyze the precise cleavage, strand exchange, and rejoining of DNA fragments at the defined recombinase recognition sites. This is accomplished without relying on endogenous repair mechanisms in a cell for repairing double-strand breaks that otherwise can induce indels and other undesirable
DNA rearrangements. The reactions catalyzed by recombinases and recombinase recognition sites result in large-scale genomic changes, such as insertions, deletions, inversions, replacements, and chromosomal translocations of one or more chromosomal regions, including one or more loci, one or more genes, or one or more portions of genes (e.g., gene exons, introns, and gene regulatory regions).
[0321] In certain embodiments, the one or more recombinase recognition sites can be inserted or introduced anywhere within a genome. In some organisms, a genome is organized as a single chromosome (e.g., bacteria) and the recombinase recognition site may be inserted at any locus within the chromosome. The insertion site may be within a gene or within an intergenic region of a chromosome. The insertion may be within an exon, intron, or therebetween, or within a regulatory sequence, such as a promoter, enhancer, or transcription binding sequence. In other organisms, e.g., humans, the genome is organized into more than one chromosome, and the recombinase recognition site may be inserted at any locus within any of the chromosomes. For instance, in humans, the genome comprises 23 pairs of chromosomes. In addition, the genome also may be mitochondrial DNA. The insertion site may be within a gene or within an intergenic region of a chromosome. The insertion may be within an exon, intron, or therebetween, or within a regulatory sequence, such as a promoter, enhancer, or transcription binding sequence.
[0322] As used herein “inserting in a genome” in any organism can include inserting one or more SSR recognition sites in any one or more chromosomes of a given genome (depending upon the number of chromosomes making up the genome) and at any chromosomal locus or loci. Where a genome comprises more than one chromosome, reference to “inserting in a genome” may include inserting the one or more SSR recognition sites into the one or more chromosomes of the genome. 
For example, in humans—which have 23 pairs of chromosomes—reference to “inserting in a genome” refers to inserting one or more SSR recognition sites in any one of chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, chromosome 17, chromosome 18, chromosome 19, chromosome 20, chromosome 21, chromosome 22, or chromosome 23 (aka, XX chromosome or XY chromosome), or insertion into any combination of said chromosomes, or in a mitochondrial genome.
[0323] In various embodiments, the recombinase recognition sites are inserted by PE or twinPE upstream of a gene. For instance, the recombinase recognition sites may be inserted upstream by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, or up to 150 base pairs from the 5′ end of a gene. In other embodiments, the recombinase recognition sites are inserted upstream by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, or up to 150 base pairs from the 5′ end of the transcription start site of a gene. 
In still other embodiments, the recombinase recognition sites are inserted upstream by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, or up to 150 base pairs from the 5′ end of a promoter element.
[0324] In various embodiments, the recombinase recognition sites are inserted by PE or twinPE downstream of a gene. For instance, the recombinase recognition sites are inserted downstream by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, or up to 150 base pairs from the 3′ end of a gene. In other embodiments, the recombinase recognition sites are inserted downstream by 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, or up to 150 base pairs from the 3′ end of the transcription termination site of a gene. [0325] In still another embodiment, the recombinase recognition sites are inserted within an exon, within an intron, or at the junction between an intron and exon, or upstream or downstream of an exon or intron. In various embodiments, the recombinase recognition sites may be inserted at a position that is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, or up to 150 base pairs upstream from the 5′ end of an exon. 
[0326] In various embodiments, the recombinase recognition sites are inserted at a position that is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, or up to 150 base pairs downstream from the 3′ end of an exon. [0327] In various embodiments, the recombinase recognition sites are inserted at a position that is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, or up to 150 base pairs upstream from the 5′ end of an intron. [0328] In various embodiments, the recombinase recognition sites are inserted at a position that is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, or up to 150 base pairs downstream from the 3′ end of an intron. [0329] In other embodiments, the recombinase recognition sites are inserted at a position that is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, or up to 150 base pairs upstream from the 5′ end of a regulatory sequence or element (e.g., a promoter, transcription binding site, or enhancer element). [0330] In various embodiments, the recombinase recognition sites may be inserted at a position that is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, or up to 150 base pairs downstream from the 3′ end of a regulatory sequence or element (e.g., a cis-promoter element, transcription binding site, or enhancer element). 
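The positional language in the preceding paragraphs (a site placed some number of base pairs upstream of the 5′ end, or downstream of the 3′ end, of a gene, exon, intron, or regulatory element) can be illustrated with a short coordinate calculation. This is a sketch under stated assumptions: 1-based inclusive genomic coordinates, a feature described by its start, end, and strand, and an offset capped at the 150 base pairs named in the text. The function name and the coordinate convention are hypothetical and do not come from the disclosure.

```python
MAX_OFFSET_BP = 150  # the text enumerates offsets of 1 to 125, or up to 150, base pairs


def insertion_position(feature_start, feature_end, strand, side, offset):
    """Return the coordinate `offset` bp upstream of a feature's 5' end
    or downstream of its 3' end.

    Coordinates are assumed to be 1-based and inclusive, with
    feature_start <= feature_end on the reference (plus) strand.
    """
    if not 1 <= offset <= MAX_OFFSET_BP:
        raise ValueError(f"offset must be between 1 and {MAX_OFFSET_BP} bp")
    if feature_start > feature_end:
        raise ValueError("feature_start must not exceed feature_end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if side not in ("upstream", "downstream"):
        raise ValueError("side must be 'upstream' or 'downstream'")

    # On the plus strand the 5' end is the lower coordinate, so upstream
    # means smaller coordinates. On the minus strand this is mirrored.
    if strand == "+":
        position = feature_start - offset if side == "upstream" else feature_end + offset
    else:
        position = feature_end + offset if side == "upstream" else feature_start - offset

    if position < 1:
        raise ValueError("computed position falls before the start of the sequence")
    return position
```

For a plus-strand exon spanning 1000 to 2000, a site 50 base pairs upstream of its 5′ end would sit at coordinate 950 under this convention; for the same interval on the minus strand it would sit at 2050.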
[0331] In various embodiments, the disclosure provides compositions and methods for installing one or more recombinase recognition sites using single flap prime editing (“classical PE”), twin prime editing (or twinPE), or multi-flap PE.
[0332] In some embodiments, classical PE may be used to insert one or more or two or more recombinase recognition sites into a desired genomic site.
[0333] In some embodiments, twinPE may be used to insert one or more or two or more recombinase recognition sites into a desired genomic site.
[0334] In some embodiments, multi-flap PE may be used to insert one or more or two or more recombinase recognition sites into one or more desired genomic sites.
[0335] Insertion of recombinase recognition sites provides a programmed location for effecting one or more site-specific intended edits in a target DNA, e.g., genetic changes in a target gene or
a genome. Non-limiting examples of intended edits via genetic recombination include insertion of an exogenous sequence into a target DNA, deletion (excision) of an endogenous sequence in a target DNA, inversion of an endogenous sequence in a target DNA, replacement of an endogenous sequence in a target DNA by an exogenous sequence, translocation of sequences between two target DNA sequences (e.g., between two different chromosomes), and any combination thereof. Accordingly, when the target DNA is a target gene or target genome, genetic changes via recombination can include, for example, genomic integration of an exogenous DNA sequence, e.g., sequence of a plasmid or a part thereof, genomic deletion or insertion, chromosomal translocations, and replacement of an endogenous genomic sequence in a target genome by an exogenous sequence (“cassette exchanges”), among other genetic changes. These exemplary types of genetic changes are illustrated in FIG. 1.
[0336] The mechanism of installing a recombinase recognition site into the genome is analogous to installing other sequences, such as peptide/protein and RNA tags, into the genome. Recombinase sites can be installed in a target DNA, e.g., a target genome, with single flap prime editing, twin prime editing, or multi-flap prime editing.
[0337] In another aspect, the present disclosure provides methods for modifying one or more target nucleic acids in a cell comprising contacting the one or more target nucleic acids with any of the recombinases provided herein (e.g., one or more target nucleic acids that comprise one or more recombinase recognition sites). 
[0338] In one aspect, the present disclosure provides methods for modifying a target nucleic acid in a cell using prime editing and a recombinase, comprising expressing in the cell a polynucleotide encoding any of the recombinases provided herein, a polynucleotide encoding a prime editor, and one or more polynucleotides encoding one or more prime editing guide RNAs (pegRNAs) comprising DNA synthesis templates encoding one or more recombinase recognition sites. [0339] In one aspect, the present disclosure provides methods for modifying a target nucleic acid in a cell using prime editing and a recombinase, comprising expressing in the cell a polynucleotide encoding any of the recombinases provided herein and a polynucleotide encoding a prime editor, and providing to the cell one or more prime editing guide RNAs (pegRNAs) comprising DNA synthesis templates encoding one or more recombinase recognition sites.
[0340] In some embodiments (e.g., embodiments in which the method is for insertion of a donor nucleic acid sequence into a target nucleic acid sequence), the method further comprises expressing in the cell a polynucleotide comprising DNA for insertion into the target nucleic acid. In certain embodiments, the DNA comprises one or more donor genes. In some embodiments, the DNA comprises a recombinase recognition site (e.g., an attB site). In some embodiments, the ratio of the polynucleotide encoding the prime editor to the polynucleotide encoding the pegRNA to the polynucleotide encoding the Bxb1 recombinase to the polynucleotide encoding the DNA for insertion into the target nucleic acid is about 10:1:10:15 (e.g., about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 : 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.2, 1.4, 1.6, 1.8, or 2 : 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 : 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20). In some embodiments, the prime editor installs a recombinase recognition site (e.g., an attP site) in the target nucleic acid, thereby facilitating Bxb1-mediated recombination with the recombinase recognition site flanking the DNA, resulting in insertion of the DNA into the target nucleic acid.
[0341] In some embodiments (e.g., embodiments in which the method is for cassette exchange), the donor DNA sequence is flanked on both sides by a recombinase recognition site (e.g., an attB site). In some embodiments, the prime editor installs a first instance and a second instance of a recombinase recognition site (e.g., attP sites) in the target nucleic acid, thereby facilitating Bxb1-mediated recombination between the recombinase recognition sites in the target nucleic acid and the recombinase recognition sites flanking the DNA, resulting in excision of the target nucleic acid sequence between the first instance and the second instance of the recombinase recognition site and insertion of the DNA in its place. 
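The approximately 10:1:10:15 ratio of prime editor, pegRNA, Bxb1 recombinase, and donor polynucleotides given in paragraph [0340] can be turned into per-component amounts with simple arithmetic. The sketch below is illustrative only. The text does not state whether the ratio is by mass or by moles, so the code assumes that every component is measured in the same, unspecified unit; the component keys and function name are hypothetical.

```python
# Ratio stated in the text, in the order: prime editor, pegRNA,
# Bxb1 recombinase, DNA for insertion (donor). Keys are hypothetical labels.
DEFAULT_RATIO = {
    "prime_editor": 10,
    "pegRNA": 1,
    "recombinase": 10,
    "donor": 15,
}


def split_by_ratio(total_amount, ratio=DEFAULT_RATIO):
    """Divide `total_amount` among components in proportion to `ratio`.

    The unit of `total_amount` is whatever unit the ratio is expressed in
    (an assumption; the text does not specify mass versus molar ratio).
    """
    if total_amount <= 0:
        raise ValueError("total_amount must be positive")
    if any(parts <= 0 for parts in ratio.values()):
        raise ValueError("every ratio entry must be positive")
    total_parts = sum(ratio.values())
    return {name: total_amount * parts / total_parts for name, parts in ratio.items()}
```

Because the default ratio sums to 36 parts, the pegRNA polynucleotide always receives 1/36 of the total and the donor polynucleotide 15/36. Other ratios within the ranges listed in the text can be passed through the `ratio` argument.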
[0342] In some embodiments, the method comprises expressing in the cell a polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site (e.g., an attP site) and a polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site (e.g., an attB site). In some embodiments, the method comprises providing to the cell a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site (e.g., an attP site) and a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site (e.g., an attB site). In some embodiments, the first and the second recombinase recognition sites are encoded in the same orientation. In some embodiments (e.g., embodiments in which the method is for deletion of a target nucleic acid sequence), a first prime editor installs
the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated deletion of the nucleic acid between the first and the second recombinase recognition sites. In some embodiments (e.g., embodiments in which the method is for translocation between two nucleic acid molecules, such as two chromosomes), a first prime editor installs the first recombinase recognition site into a target nucleic acid sequence on a first chromosome, and a second prime editor installs the second recombinase recognition site into a target nucleic acid sequence on a second chromosome, thereby facilitating Bxb1-mediated recombination between the two chromosomes. In some embodiments, the first and the second recombinase recognition sites are encoded in opposite orientations. In some embodiments (e.g., embodiments in which the method is for inversion of a nucleic acid sequence), a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated inversion of the nucleic acid sequence between the first and the second recombinase recognition sites. [0343] In some embodiments, the method is a method for inserting DNA into a target nucleic acid in a cell using prime editing and a recombinase. 
In certain embodiments, the method comprises expressing in a cell: (i) a first polynucleotide encoding a pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs the first recombinase recognition site in the target nucleic acid, thereby facilitating Bxb1-mediated recombination with the second recombinase recognition site, resulting in insertion of the DNA into the target nucleic acid.
[0344] In some embodiments, the method is a method for exchanging DNA in a target nucleic acid in a cell using prime editing and a recombinase. In certain embodiments, the method comprises expressing in a cell: (i) a first polynucleotide encoding one or more pegRNAs comprising a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding any of the Bxb1 recombinases provided herein; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA is flanked on both sides by a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs a first instance and a second instance of the first recombinase recognition site in the target nucleic acid, thereby facilitating Bxb1-mediated recombination between the first recombinase recognition sites in the target nucleic acid and the second recombinase recognition sites flanking the DNA, resulting in excision of the target nucleic acid sequence between the first instance and the second instance of the first recombinase recognition site and insertion of the DNA in its place. [0345] In some embodiments, the method is a method for deleting DNA from a target nucleic acid in a cell using prime editing and a recombinase. 
In certain embodiments, the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein;
wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated deletion of the nucleic acid between the first and the second recombinase recognition sites.
[0346] In some embodiments, the method is a method for recombining target nucleic acids in two chromosomes in a cell using prime editing and a recombinase. In certain embodiments, the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a site in a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site in a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein; wherein a first prime editor installs the first recombinase recognition site into a target nucleic acid on a first chromosome, and a second prime editor installs the second recombinase recognition site into a target nucleic acid on a second chromosome, thereby facilitating Bxb1-mediated recombination between the two chromosomes.
[0347] In some embodiments, the method is a method for inverting a target nucleic acid in a cell using prime editing and a recombinase. 
In certain embodiments, the method comprises expressing in a cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation to the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site;
(iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding any of the Bxb1 recombinases provided herein; wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated inversion of the nucleic acid between the first and the second recombinase recognition sites.
[0348] In some embodiments, the method is a method for inserting DNA into a target nucleic acid in a cell using prime editing and a recombinase comprising providing to the cell a pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site, and expressing in the cell: (i) a first polynucleotide encoding any of the Bxb1 recombinases described herein; (ii) a second polynucleotide encoding a prime editor; and (iii) a third polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs the first recombinase recognition site in the target nucleic acid, thereby facilitating Bxb1-mediated recombination with the second recombinase recognition site, resulting in insertion of the DNA into the target nucleic acid. 
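The deletion, translocation, and inversion methods described above differ only in where the two installed recombinase recognition sites lie and how they are oriented relative to one another. The sketch below encodes that decision as stated in the text: same molecule and same orientation gives deletion, same molecule and opposite orientation gives inversion, and two chromosomes with sites in the same orientation gives recombination between the chromosomes. The class and function names are hypothetical, and the case of opposite orientations on two different chromosomes is reported as unspecified because the text does not describe it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstalledSite:
    """A recombinase recognition site installed by prime editing (illustrative)."""
    molecule: str     # hypothetical label for the chromosome or other nucleic acid
    position: int     # coordinate on that molecule
    orientation: str  # "+" or "-", relative to the molecule's reference strand


def predicted_outcome(first, second):
    """Classify the rearrangement expected between two installed sites."""
    for site in (first, second):
        if site.orientation not in ("+", "-"):
            raise ValueError("orientation must be '+' or '-'")

    same_orientation = first.orientation == second.orientation

    if first.molecule != second.molecule:
        # The text describes translocation only for same-orientation sites.
        return "translocation" if same_orientation else "unspecified"

    if first.position == second.position:
        raise ValueError("two sites on one molecule must be at different positions")

    # On one molecule, the second site lies upstream or downstream of the first.
    return "deletion" if same_orientation else "inversion"
```

As a usage example, two sites on a molecule labeled "chr1" at positions 1000 and 5000, both in the "+" orientation, are classified as a deletion of the intervening sequence; flipping one site to "-" changes the classification to inversion.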
[0349] In some embodiments, the method is a method for exchanging DNA in a target nucleic acid in a cell using prime editing and a recombinase comprising providing to the cell one or more pegRNAs comprising a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; and expressing in the cell: (i) a first polynucleotide encoding any of the Bxb1 recombinases described herein; (ii) a second polynucleotide encoding a prime editor; and (iii) a third polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA is flanked on both sides by a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site;
B1195.70174WO00 12131093.2
wherein the prime editor installs a first instance and a second instance of the first recombinase recognition site in the target nucleic acid, thereby facilitating Bxb1-mediated recombination between the first recombinase recognition sites in the target nucleic acid and the second recombinase recognition sites flanking the DNA, resulting in excision of the target nucleic acid sequence between the first instance and the second instance of the first recombinase recognition site and insertion of the DNA in its place.
[0350] In some embodiments, the method is a method for deleting DNA from a target nucleic acid in a cell using prime editing and a recombinase comprising providing to the cell 1) a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site, and 2) a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site; and expressing in the cell: (i) a first polynucleotide encoding a prime editor; and (ii) a second polynucleotide encoding any of the Bxb1 recombinases described herein; wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated deletion of the nucleic acid between the first and the second recombinase recognition sites.
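The dependence of the outcome on the relative orientation of the two installed recognition sites (same orientation for deletion, opposite orientation for inversion) can be illustrated with a small toy model. The sketch below is only an illustration under stated assumptions: the names `Site`, `predict_outcome`, and `apply_outcome` are hypothetical, the model assumes one attP and one attB site on the same molecule, and the product sites are shown as a generic `[att]` placeholder rather than real sequences.

```python
from dataclasses import dataclass

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Site:
    """Hypothetical description of an installed recognition site."""
    kind: str         # "attP" or "attB"
    orientation: int  # +1 or -1 relative to the target strand


def predict_outcome(first: Site, second: Site) -> str:
    """Same orientation -> deletion; opposite orientation -> inversion."""
    # Assumption of this toy model: the pair is one attP and one attB.
    if {first.kind, second.kind} != {"attP", "attB"}:
        raise ValueError("this toy model expects one attP and one attB site")
    return "deletion" if first.orientation == second.orientation else "inversion"


def apply_outcome(left: str, middle: str, right: str,
                  first: Site, second: Site) -> str:
    """Apply the predicted outcome to left + site + middle + site + right.

    "[att]" is a placeholder for a product site, not a real sequence.
    """
    if predict_outcome(first, second) == "deletion":
        # The intervening segment is removed; one product site remains.
        return left + "[att]" + right
    # The intervening segment is inverted between two product sites.
    return left + "[att]" + reverse_complement(middle) + "[att]" + right
```

For instance, `apply_outcome("AAA", "ACG", "TTT", Site("attP", 1), Site("attB", 1))` drops the middle segment, while flipping the second site's orientation to `-1` keeps it in reverse-complemented form. The model deliberately ignores everything else about the biology (efficiency, site sequences, which product site remains).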
[0351] In some embodiments, the method is a method for recombining target nucleic acids in two chromosomes in a cell using prime editing and a recombinase comprising providing to the cell 1) a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a site on a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site, and 2) a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site on a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site; and expressing in the cell:
(i) a first polynucleotide encoding a prime editor; and (ii) a second polynucleotide encoding any of the Bxb1 recombinases described herein; wherein a first prime editor installs the first recombinase recognition site into a target nucleic acid on a first chromosome, and a second prime editor installs the second recombinase recognition site into a target nucleic acid on a second chromosome, thereby facilitating Bxb1-mediated recombination between the two chromosomes.
[0352] In some embodiments, the method is a method for inverting a target nucleic acid in a cell using prime editing and a recombinase comprising providing to the cell 1) a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site, and 2) a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation to the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; and expressing in the cell: (i) a first polynucleotide encoding a prime editor; and (ii) a second polynucleotide encoding any of the Bxb1 recombinases described herein; wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated inversion of the nucleic acid between the first and the second recombinase recognition sites.
[0353] In some embodiments, any of the polynucleotides used in the systems, compositions, and methods provided herein comprise DNA. In some embodiments, any of the polynucleotides used in the systems, compositions, and methods provided herein are DNA.
In some embodiments, any of the polynucleotides used in the systems, compositions, and methods provided herein comprise RNA. In certain embodiments, any of the polynucleotides used in the systems, compositions, and methods provided herein are RNA. In certain embodiments, any of the polynucleotides encoding a Bxb1 recombinase and/or any of the polynucleotides encoding a prime editor used in the systems, compositions, and methods provided herein comprises RNA (e.g., mRNA) or are RNA (e.g., mRNA). [0354] In some embodiments, the methods provided herein utilize two pegRNAs designed to promote integration of a donor DNA into a target nucleic acid and prevent unwanted
recombination with other molecules. For example, in some embodiments, the methods utilize a first pegRNA and a second pegRNA that each produce DNA flaps on the target nucleic acid that partially overlap each other, wherein each flap comprises a 5′ portion that does not overlap with the other flap. In certain embodiments, the partially overlapping flaps promote integration of a donor DNA into the target nucleic acid and prevent recombination between a polynucleotide encoding the donor DNA and a polynucleotide encoding a pegRNA.
[0355] In some embodiments, any of the methods described herein are performed in vitro. In some embodiments, any of the methods described herein are performed ex vivo. In some embodiments, any of the methods described herein are performed in vivo. In some embodiments, any of the methods described herein are performed in a subject. In certain embodiments, the subject is a human. In some embodiments, editing a target nucleic acid using any of the methods described herein may be performed in order to treat a disease or disorder, for example, in a subject such as a human.
napDNAbp
[0356] In various embodiments, the prime editors utilized in the systems and methods described herein comprise a nucleic acid programmable DNA binding protein (napDNAbp).
[0357] In various embodiments, prime editors may include a napDNAbp domain having a wild type Cas9 sequence, including, for example, the canonical Streptococcus pyogenes Cas9 sequence of SEQ ID NO: 6, shown as follows.
Description Sequence SEQ ID NO: S C 9 MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGN 6
QLKRRRYTGWGRLSRKLINGIRDKQSGKTILDFLKSDG FANRNFMQLIHDDSLTFKEDIQKAQVSGQGDSLHEHIA 7
GRDMYVDQELDINRLSDYDVDAIVPQSFLKDDSIDNKVLT RSDKNRGKSDNVPSEEVVKKMKNYWRQLLNAKLITQRKF
described above, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto. In some embodiments, the prime editors used in the methods described herein include any of the following other wild type SpCas9 sequences, which may be modified with one or more of the mutations described herein at corresponding amino acid positions: Description Sequence S C 9 ATGGATAAGAAATACTCAATAGGCTTAGATATCGGCACAAATAGCG AG C CT C A A T A T GG G C AT G C G A T A A
TCATCAAGACTTGACTCTTTTAAAAGCTTTAGTTCGACAACAACTTC CAGAAAAGTATAAAGAAATCTTTTTTGATCAATCAAAAAACGGATA A TT T C C T A TA T TA A A A G G T AC T G A AT T C T CA G GT TC T AG T A A C GA CC TG C C CA A T AT T
TCTTCAAAACAGAAATTACACTTGCAAATGGAGAGATTCGCAAACG CCCTCTAATCGAAACTAATGGGGAAACTGGAGAAATTGTCTGGGAT C C TC C G A A C G A G C TT G A CT C C IG F D FE G N E N LT R SG L H A L G D W A Y EI D DP KN
PIDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASAGELQKGNEL ALPSKYVNFLYLASHYEKLKGSPEDNEQKQLFVEQHKHYLDEIIEQISEF K T A CT TC G G G G A CT G A G A A A A A T A AA T G A T T A A T A T C TT T A G T A TT G A
AAACATACGCTCACCTGTTCGACGATAAGGTTATGAAACAGTTAAA GAGGCGTCGCTATACGGGCTGGGGACGATTGTCGCGGAAACTTATC C AT TC TT A A A A T A G C TG G T G A C T T G G A TT GC G A A G A A CA AG A A G A C TC G A A A C
TGATTCACCAATCCATCACGGGATTATATGAAACTCGGATAGATTTG TCACAGCTTGGGGGTGACGGATCCCCCAAGAAGAAGAGGAAAGTCT T IG F D F L K LP L IL E LS S A F I I R K TK EI KS D K S Q EII A S G A T A A G A G A A A C C
GCAAGTGGAGTAGATGCTAAAGCGATTCTTTCTGCACGATTGAGTAA ATCAAGACGATTAGAAAATCTCATTGCTCAGCTCCCCGGTGAGAAG C C G T A A T A A T C A A T G C TT G A G A TT T T C A G G AA A A CC T A CG A AA G T A TC CT A TC
AAACGCCAATTGGTTGAAACTCGCCAAATCACTAAGCATGTGGCAC AAATTTTGGATAGTCGCATGAATACTAAATACGATGAAAATGATAA T C C G G AT TC T C G TT G A A T A G C A G GT T A C A C A IG F D F L K LP L IL E LS S A F I I
EEGIKELGSQILKEHPVENTQLQNEKLYLYYLQNGRDMYVDQELDINR LSDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMK TK EI KS D K S Q EII A
the above SpCas9 sequences, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto. In other embodiments, the Cas9 protein can be a wild type Cas9 ortholog from another bacterial species different from the canonical Cas9 from S. pyogenes. For example, modified versions of the following Cas9 orthologs can be used in connection with the prime editors described in this specification by making mutations at positions corresponding to H840A or any other amino acids of interest in wild type SpCas9. In addition, any variant Cas9 orthologs having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to any of the below orthologs may also be used with the prime editors. Description Sequence R K N E Y L Q L T Y
Description Sequence KGWGRLSKKLLTGIVDENGQRIIDLMWNTDQNFKEIVDQPVFKEQIDQL G K F R I K V F L K H A N P L Y D I M I L Q I N Y K
Description Sequence DKVLSAYNKHRDKPIREQAENIIHLFTLTNLGAPAAFKYFDTTIDRKRYT S Q Q R P P Y K Q N V E K K I H F K A F R E K LI I K
Description Sequence RLEKSLKELGSKILKENIPAKLSKIDNNALQNDRLYLYYLQNGKDMYTG V R K A N I L G A P I GI D L E N F W A Q L L R G R S
Description Sequence LKISDKAMVLNQILILLHSNATSPVLEKLGYHTRFTLGKKHNLISENAVL R E K L F A Y L L S V G M R Y H Q A N E L V I E
Description Sequence NGIITKDKLLMTFKFRIPYYVGPLNSYHKDKGGNSWIVRKEEGKILPWNF Q K K A AI D I F T FI E P K Y I Y E K E D L D M K E P E N K E
Description Sequence KDRGLTDVEILIPKVLINSLFRYNGSLVRITGRGDTRLLLVHEQPLYVSNS Y D D V Q Q F S Y F N N R S V L G I Q S S K L
Description Sequence Wild t e SVYYSDLKDNEDKIRSILTFRIPYYFGPLNITKDRQFDWIIKKEGKENERIL N R Q D V H D L N K F V F R E E S K V K V D K
Description Sequence QEILGITIMEKTRFEQNPILFLEEKGFLRPRVLMKLPKYTLYEFPEGRRRL N FI V L AI K F R A K V E C I Q L V Q D Y I V R
Description Sequence LHYDENEENEFKNRNLNDTRYISRFLANFIREHLKFADSDDKQKVYTVN R K Q E I K Y E E P A E L I S F A K V L S Y F L
[0361] The napDNAbp used in the prime editors described herein may include any suitable homologs and/or orthologs of naturally occurring enzymes, such as Cas9. Cas9 homologs and/or
orthologs have been described in various species, including, but not limited to, S. pyogenes and S. thermophilus. The Cas moiety may be configured (e.g., mutagenized, recombinantly engineered, or otherwise obtained from nature) as a nickase, i.e., capable of cleaving only a single strand of the target double-stranded DNA. In some embodiments, a Cas9 nuclease has an inactive (e.g., an inactivated) DNA cleavage domain; that is, the Cas9 is a nickase. In some embodiments, the Cas9 protein comprises an amino acid sequence that is at least 85%, at least 90%, at least 92%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or at least 99.5% identical to the amino acid sequence of a Cas9 protein as provided by any one of the Cas9 orthologs in the above tables.
[0362] Additional suitable napDNAbp sequences that can be used in prime editors will be apparent to those of skill in the art based on this disclosure, and such Cas9 nucleases and sequences include Cas9 sequences from the organisms and loci disclosed in Chylinski, Le Rhun, and Charpentier, “The tracrRNA and Cas9 families of type II CRISPR-Cas immunity systems” (2013) RNA Biology 10:5, 726-737; the entire contents of which are incorporated herein by reference. Additional exemplary Cas variants and homologs include, but are not limited to, Cas9 (e.g., dCas9 and nCas9), Cpf1, CasX, CasY, C2c1, C2c2, C2c3, GeoCas9, CjCas9, Cas12a, Cas12b, Cas12g, Cas12h, Cas12i, Cas13b, Cas13c, Cas13d, Cas14, Csn2, xCas9, SpCas9-NG, Nme2Cas9, circularly permuted Cas9, Argonaute (Ago), Cas9-KKH, SmacCas9, Spy-macCas9, SpCas9-VRQR, SpCas9-NRRH, SpCas9-NRTH, SpCas9-NRCH, LbCas12a, AsCas12a, CeCas12a, MbCas12a, Cas3, CasΦ, and circularly permuted Cas9 domains such as CP1012, CP1028, CP1041, CP1249, and CP1300, and variants and homologs thereof.
Reverse transcriptase domain
[0363] In various embodiments, the prime editors used in the systems and methods described herein comprise a reverse transcriptase domain.
In some embodiments, the reverse transcriptase domain is a wild type MMLV reverse transcriptase. In some embodiments, the reverse transcriptase domain is a variant of wild type MMLV reverse transcriptase having the amino acid sequence of SEQ ID NO: 29. [0364] For example, PE2 and PEmax comprise a variant reverse transcriptase domain of SEQ ID NO: 29, which is based on the wild type MMLV reverse transcriptase domain of SEQ ID NO: 28 (and, in particular, a Genscript codon optimized MMLV reverse transcriptase having the nucleotide sequence of SEQ ID NO: 28) and which comprises amino acid substitutions D200N,
T306K, W313F, T330P, and L603W relative to the wild type MMLV RT of SEQ ID NO: 28. The amino acid sequence of the variant RT of PE2 and PEmax is SEQ ID NO: 29.
[0365] Prime editors may also comprise other variant RTs. In various embodiments, the prime editors used in the methods described herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising one or more of the following mutations: P51L, S67K, E69K, L139P, T197A, D200N, H204R, F209N, E302K, E302R, T306K, F309N, W313F, T330P, L345G, L435G, N454K, D524G, E562Q, D583N, H594Q, L603W, E607K, or D653N in the wild type M-MLV RT of SEQ ID NO: 28, or at a corresponding amino acid position in another wild type RT polypeptide sequence.
[0366] Some exemplary reverse transcriptases that can be fused to napDNAbp proteins or provided as individual proteins according to various embodiments of this disclosure are provided below. Exemplary reverse transcriptases include variants with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the following wild-type enzymes or partial enzymes:
Description Sequence (variant substitutions relative to wild type) L Y L G L P L S K L L Y L G
Description Sequence (variant substitutions relative to wild type) PDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKL P L S K L L Y L G L P L S K L L Y L G L P L S W A W
Description Sequence (variant substitutions relative to wild type) GPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQ G V D A L Y L G L P L S L Y L G L P L S L L Y L G
Description Sequence (variant substitutions relative to wild type) PDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKL P L S L Y L G L P L S L Y L G L P L S L Y L G
Description Sequence (variant substitutions relative to wild type) PDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKL P L S L Y L P P L S L Y L G L P L S A L Y L
Description Sequence (variant substitutions relative to wild type) PDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLTQKL P L S L Y L G L P L S L P W M V T K P W
Description Sequence (variant substitutions relative to wild type) W313F WGPDQQKAYQEIKQALLTAPALGLPDLTKPFELFVDEKQGYAKGVLT M V T K L Y L L P L S d
herein (with RT provided as either a fusion partner or in trans) can include a variant RT comprising one or more of the following mutations: P51X, S67X, E69X, L139X, T197X, D200X, H204X, F209X, E302X, T306X, F309X, W313X, T330X, L345X, L435X, N454X, D524X, E562X, D583X, H594X, L603X, E607X, or D653X in the wild type M-MLV RT of SEQ ID NO: 28, or at a corresponding amino acid position in another wild type RT polypeptide sequence, wherein “X” can be any amino acid.
[0368] Some exemplary reverse transcriptases that can be fused to napDNAbp proteins or provided as individual proteins according to various embodiments of this disclosure are provided below. Exemplary reverse transcriptases include variants with at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to the wild-type enzymes or partial enzymes described in SEQ ID NOs: 28-78.
[0369] The prime editor (PE) system described here contemplates any publicly available reverse transcriptase described or disclosed in any of the following U.S. patents (each of which is incorporated by reference): U.S. Patent Nos: 10,202,658; 10,189,831; 10,150,955; 9,932,567;
9,783,791; 9,580,698; 9,534,201; and 9,458,484, and any variant thereof that can be made using known methods for installing mutations or known methods for evolving proteins. The following references describe reverse transcriptases known in the art. Each of their disclosures is incorporated herein by reference.
[0370] Herzig, E., Voronin, N., Kucherenko, N. & Hizi, A. A Novel Leu92 Mutant of HIV-1 Reverse Transcriptase with a Selective Deficiency in Strand Transfer Causes a Loss of Viral Replication. J. Virol. 89, 8119–8129 (2015).
[0371] Mohr, G. et al. A Reverse Transcriptase-Cas1 Fusion Protein Contains a Cas6 Domain Required for Both CRISPR RNA Biogenesis and RNA Spacer Acquisition. Mol. Cell 72, 700–714.e8 (2018).
[0372] Zhao, C., Liu, F. & Pyle, A. M. An ultraprocessive, accurate reverse transcriptase encoded by a metazoan group II intron. RNA 24, 183–195 (2018).
[0373] Zimmerly, S. & Wu, L. An Unexplored Diversity of Reverse Transcriptases in Bacteria. Microbiol Spectr 3, MDNA3-0058–2014 (2015).
[0374] Ostertag, E. M. & Kazazian Jr, H. H. Biology of Mammalian L1 Retrotransposons. Annual Review of Genetics 35, 501–538 (2001).
[0375] Perach, M. & Hizi, A. Catalytic Features of the Recombinant Reverse Transcriptase of Bovine Leukemia Virus Expressed in Bacteria. Virology 259, 176–189 (1999).
[0376] Lim, D. et al. Crystal structure of the moloney murine leukemia virus RNase H domain. J. Virol. 80, 8379–8389 (2006).
[0377] Zhao, C. & Pyle, A. M. Crystal structures of a group II intron maturase reveal a missing link in spliceosome evolution. Nature Structural & Molecular Biology 23, 558–565 (2016).
[0378] Griffiths, D. J. Endogenous retroviruses in the human genome sequence. Genome Biol. 2, REVIEWS1017 (2001).
[0379] Baranauskas, A. et al. Generation and characterization of new highly thermostable and processive M-MuLV reverse transcriptase variants. Protein Eng Des Sel 25, 657–668 (2012).
[0380] Zimmerly, S., Guo, H., Perlman, P. S. & Lambowitz, A. M. Group II intron mobility occurs by target DNA-primed reverse transcription. Cell 82, 545–554 (1995).
[0381] Feng, Q., Moran, J. V., Kazazian, H. H. & Boeke, J. D. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905–916 (1996).
[0382] Berkhout, B., Jebbink, M. & Zsíros, J. Identification of an Active Reverse Transcriptase Enzyme Encoded by a Human Endogenous HERV-K Retrovirus. Journal of Virology 73, 2365–2375 (1999).
[0383] Kotewicz, M. L., Sampson, C. M., D’Alessio, J. M. & Gerard, G. F. Isolation of cloned Moloney murine leukemia virus reverse transcriptase lacking ribonuclease H activity. Nucleic Acids Res 16, 265–277 (1988).
[0384] Arezi, B. & Hogrefe, H. Novel mutations in Moloney Murine Leukemia Virus reverse transcriptase increase thermostability through tighter binding to template-primer. Nucleic Acids Res 37, 473–481 (2009).
[0385] Blain, S. W. & Goff, S. P. Nuclease activities of Moloney murine leukemia virus reverse transcriptase. Mutants with altered substrate specificities. J. Biol. Chem. 268, 23585–23592 (1993).
[0386] Xiong, Y. & Eickbush, T. H. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J 9, 3353–3362 (1990).
[0387] Herschhorn, A. & Hizi, A. Retroviral reverse transcriptases. Cell. Mol. Life Sci. 67, 2717–2747 (2010).
[0388] Taube, R., Loya, S., Avidan, O., Perach, M. & Hizi, A. Reverse transcriptase of mouse mammary tumour virus: expression in bacteria, purification and biochemical characterization. Biochem. J. 329 (Pt 3), 579–587 (1998).
[0389] Liu, M. et al. Reverse Transcriptase-Mediated Tropism Switching in Bordetella Bacteriophage. Science 295, 2091–2094 (2002).
[0390] Luan, D. D., Korman, M. H., Jakubczak, J. L. & Eickbush, T. H. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72, 595–605 (1993).
[0391] Nottingham, R. M. et al. RNA-seq of human reference RNA samples using a thermostable group II intron reverse transcriptase. RNA 22, 597–613 (2016).
[0392] Telesnitsky, A. & Goff, S. P. RNase H domain mutations affect the interaction between Moloney murine leukemia virus reverse transcriptase and its primer-template. Proc. Natl. Acad. Sci. U.S.A. 90, 1276–1280 (1993).
[0393] Halvas, E. K., Svarovskaia, E. S. & Pathak, V. K. Role of Murine Leukemia Virus Reverse Transcriptase Deoxyribonucleoside Triphosphate-Binding Site in Retroviral Replication and In Vivo Fidelity. Journal of Virology 74, 10349–10358 (2000).
[0394] Nowak, E. et al. Structural analysis of monomeric retroviral reverse transcriptase in complex with an RNA/DNA hybrid. Nucleic Acids Res 41, 3874–3887 (2013).
[0395] Stamos, J. L., Lentzsch, A. M. & Lambowitz, A. M. Structure of a Thermostable Group II Intron Reverse Transcriptase with Template-Primer and Its Functional and Evolutionary Implications. Molecular Cell 68, 926-939.e4 (2017).
[0396] Das, D. & Georgiadis, M. M. The Crystal Structure of the Monomeric Reverse Transcriptase from Moloney Murine Leukemia Virus. Structure 12, 819–829 (2004).
[0397] Avidan, O., Meer, M. E., Oz, I. & Hizi, A. The processivity and fidelity of DNA synthesis exhibited by the reverse transcriptase of bovine leukemia virus. European Journal of Biochemistry 269, 859–867 (2002).
[0398] Gerard, G. F. et al. The role of template-primer in protection of reverse transcriptase from thermal inactivation. Nucleic Acids Res 30, 3118–3129 (2002).
[0399] Monot, C. et al. The Specificity and Flexibility of L1 Reverse Transcription Priming at Imperfect T-Tracts. PLOS Genetics 9, e1003499 (2013).
[0400] Mohr, S. et al. Thermostable group II intron reverse transcriptase fusion proteins and their use in cDNA synthesis and next-generation RNA sequencing. RNA 19, 958–970 (2013).
[0401] In some embodiments, the prime editor proteins comprise an MMLV reverse transcriptase comprising one or more amino acid substitutions. The wild-type MMLV reverse transcriptase is provided by the following sequence:
DESCRIPTION SEQUENCE A L V W P K Q K T
DESCRIPTION SEQUENCE DQPLPDADHTWYTDGSSLLQEGQRKAGAAVTTETEVIWAKA H Q
or more mutations relative to the wild-type amino acid sequence. In some embodiments, the reverse transcriptase is the MMLV pentamutant described above (i.e., comprising amino acid substitutions D200N, T306K, W313F, T330P, and L603W). [0403] The disclosure also contemplates the use of any wild-type reverse transcriptase in the prime editors described herein. Exemplary wild-type reverse transcriptases which may be used include, but are not limited to, the following sequences, or any variant thereof having at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity thereto: SEQ DESCRIPTION SEQUENCE ID O 0 1
least 80%, at least 85%, at least 90%, at least 95%, or at least 99% sequence identity to any of
the enzymes above in the prime editor proteins disclosed herein is also contemplated by the present disclosure. [0405] In some embodiments, the domain comprising an RNA-dependent DNA polymerase activity comprises a Tf1 reverse transcriptase. For example, the prime editor proteins described herein may comprise a Tf1 reverse transcriptase comprising one or more mutations relative to the amino acid sequence of SEQ ID NO: 30. In some embodiments, the Tf1 reverse transcriptase comprises one or more mutations selected from the group consisting of V14A, E22K, P70T, G72V, M102I, K106R, K118R, A139T, L158Q, F269L, S297Q, K356E, A363V, K413E, I423V, and S492N relative to the amino acid sequence of SEQ ID NO: 30. In certain embodiments, the Tf1 reverse transcriptase comprises any one of the following groups of amino acid substitutions relative to the amino acid sequence of SEQ ID NO: 30: K118R and S297Q; V14A, L158Q, F269L, and K356E; K106R, L158Q, F269L, A363V, and I423V; E22K, P70T, G72V, M102I, K106R, A139T, L158Q, F269L, A363V, K413E, and S492N; or P70T, G72V, M102I, K106R, L158Q, F269L, A363V, K413E, and S492N. [0406] In some embodiments, the present disclosure provides reverse transcriptases, and prime editors (e.g. fusion proteins or prime editors in which each component is provided in trans) comprising reverse transcriptases, wherein the reverse transcriptase is a Tf1 reverse transcriptase of SEQ ID NO: 30, or a Tf1 reverse transcriptase variant having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with SEQ ID NO: 30, wherein the Tf1 reverse transcriptase variant comprises one or more mutations selected from the group consisting of V14A, E22K, I64L, I64W, P70T, G72V, M102I, K106R, K118R, L133N, A139T, L158Q, S188K, I260L, F269L, E274R, R288Q, Q293K, S297Q, N316Q, K321R, K356E, A363V, K413E, I423V, and S492N relative to SEQ ID NO: 30. 
In some embodiments, the Tf1 reverse transcriptase variant comprises a single mutation, wherein the single mutation is an I64L mutation, an I64W mutation, a K118R mutation, an L133N mutation, an S188K mutation, an I260L mutation, an E274R mutation, an R288Q mutation, a Q293K mutation, an S297Q mutation, an N316Q mutation, or a K321R mutation.
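The substitution notation used above (for example, V14A or I64L, meaning wild-type residue, 1-based position, replacement residue) and the sequence-identity thresholds can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not a method taken from this disclosure: the function names `apply_substitutions` and `percent_identity` are hypothetical, the identity calculation assumes two equal-length, already aligned sequences (a real comparison would use a proper alignment), and `PARENT_FRAGMENT` is a 72-residue N-terminal fragment copied from the Tf1 variant sequences listed in this section and treated here, as an assumption, as the parent.

```python
import re

# Wild-type residue, 1-based position, replacement residue (e.g. "V14A").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")

# Assumption: 72-residue N-terminal fragment used only as a toy parent.
PARENT_FRAGMENT = (
    "ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL"
    "PIRNYPLPPG"
)


def apply_substitutions(parent: str, mutations: list[str]) -> str:
    """Apply substitutions such as "I64L" to a parent sequence."""
    residues = list(parent)
    for label in mutations:
        match = MUTATION_RE.match(label)
        if match is None:
            raise ValueError(f"unrecognized mutation label: {label}")
        wild_type = match.group(1)
        position = int(match.group(2))
        replacement = match.group(3)
        if not 1 <= position <= len(parent):
            raise ValueError(f"position out of range: {label}")
        # Guard against numbering mistakes: the parent must carry the
        # stated wild-type residue at that position.
        if parent[position - 1] != wild_type:
            raise ValueError(f"parent residue does not match: {label}")
        residues[position - 1] = replacement
    return "".join(residues)


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


variant = apply_substitutions(PARENT_FRAGMENT, ["V14A", "I64L"])
identity = percent_identity(PARENT_FRAGMENT, variant)
```

The wild-type check is the main design choice here: a label such as I64L is rejected if the parent does not have isoleucine at position 64, which catches off-by-one numbering errors before a variant is built. With two substitutions in a 72-residue fragment, `identity` is 70/72 of 100 percent.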
[0407] In some embodiments, the Tf1 reverse transcriptase variant comprises any one of the following groups of mutations relative to the amino acid sequence of SEQ ID NO: 30: K118R and S297Q; V14A, L158Q, F269L, and K356E; E22K, P70T, G72V, M102I, K106R, A139T, L158Q, F269L, A363V, K413E, and S492N; P70T, G72V, M102I, K106R, L158Q, F269L, A363V, K413E, and S492N; K106R, L158Q, F269L, A363V, and I423V; K118R, S297Q, S188K, I64L, I260L, and R288Q; E22K, P70T, G72V, M102I, K106R, A139T, L158Q, F269L, A363V, K413E, S492N, K118R, S297Q, S188K, I64L, and I260L; K118R and S188K; K118R, S188K, and I260L; K118R, S188K, I260L, and S297Q; or K118R, S188K, I260L, R288K, and S297Q. [0408] In certain embodiments, the Tf1 reverse transcriptase variant comprises the amino acid sequence of any one of SEQ ID NOs: 30 and 56-78, or an amino acid sequence at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to any one of SEQ ID NOs: 30 and 56-78, wherein the amino acid sequence comprises at least one of residues 14A, 22K, 64L, 64W, 70T, 72V, 102I, 106R, 118R, 133N, 139T, 158Q, 188K, 260L, 269L, 274R, 288Q, 293K, 297Q, 316Q, 321R, 356E, 363V, 413E, 423V, 492N: Tf1 variant 5.131: ISSSKHTLSQMNKVSNIVKEPKLPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYR LPIRNYPLTPVKMQAMNDEINQGLKSGIIRESKAINACPVIFVPRKEGTLRMVVDYRPLN KYVKPNIYPLPLIEQLLTKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGLTPCQENIDKVLQWKQPKNRKELRQFLGQVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DVSDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLEHWRHY LESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALSR IVDETEPIPKDNEDNSINFVNQISI (SEQ ID NO: 56) Tf1 variant 5.27: ISSSKHTLSQMNKASNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYKPLN 
KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGLTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKEILLETD ASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRHYL ESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALSRI VDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 57)
Tf1 variant 5.47: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPRKEGTLRMVVDYKPLN KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGLTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DVSDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTVEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADAL SRIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 58) Tf1 variant 5.59: ISSSKHTLSQMNKVSNIVKEPKLPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYR LPIRNYPLTPVKMQAMNDEINQGLKSGIIRESKAINACPVIFVPRKEGTLRMVVDYKPLN KYVKPNIYPLPLIEQLLTKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGLTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DVSDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLEHWRHY LESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALSR IVDETEPIPKDNEDNSINFVNQISI (SEQ ID NO: 59) Tf1 variant 5.60: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLTPVKMQAMNDEINQGLKSGIIRESKAINACPVIFVPRKEGTLRMVVDYKPLNK YVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGLTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DVSDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLEHWRHY LESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALSR IVDETEPIPKDNEDNSINFVNQISISGGSKRTADGSEFEPKKKRKV (SEQ ID NO: 60) Tf1 variant 5.612: ISSSKHTLSQMNKVSNIVKEPKLPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYR LPLRNYPLTPVKMQAMNDEINQGLKSGIIRESKAINACPVIFVPRKEGTLRMVVDYRPLN KYVKPNIYPLPLIEQLLTKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGIKTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFLGYHISEKGLTPCQENIDKVLQWKQPKNRKELRQFLGQVNY 
LRKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DVSDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLEHWRHY LESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALSR IVDETEPIPKDNEDNSINFVNQISI (SEQ ID NO: 61) Tf1 variant 5.618:
ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYRPLN KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGFTPCQENIDKVLQWKQPKNRKELRQFLGQVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 62) Tf1 variant S188K: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYKPLN KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLV MPYGIKTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGFTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 63) Tf1 variant I260L: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYKPLN KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFLGYHISEKGFTPCQENIDKVLQWKQPKNRKELRQFLGSVNY LRKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 64) Tf1 variant R288Q: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYKPLN KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGFTPCQENIDKVLQWKQPKNQKELRQFLGSVNYL 
RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 65) Tf1 variant Q293K:
ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYKPLN KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGFTPCQENIDKVLQWKQPKNRKELRKFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 66) Tf1 variant I64L: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PLRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYKPLN KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGFTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 67) Tf1 variant I64W: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PWRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYKPL NKYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYL VMPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKN ANLIINQAKCEFHQSQVKFIGYHISEKGFTPCQENIDKVLQWKQPKNRKELRQFLGSVNY LRKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 68) Tf1 variant N316Q: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYKPLN KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGFTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL 
RKFIPKTSQLTHPLQKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 69) Tf1 variant K321R: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYKPLN
KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGFTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKRDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 70) Tf1 variant L133N: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYKPLN KYVKPNIYPLPNIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGFTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 71) Tf1 variant K118R: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYRPLN KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGFTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 72) Tf1 variant K118R: Tf1 variant S297Q: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYKPLN KYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGFTPCQENIDKVLQWKQPKNRKELRQFLGQVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHWRH 
YLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALS RIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 73) Tf1-rat4: MISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENY RLPIRNYPLPPGKMQAMNDEINQGLKSGIIRESKAINACPVMFVPKKEGTLRMVVDYRP LNKYVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHLIRVRKGDEHKLAFRCPRGVFEY LVMPYGIKTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLK
NANLIINQAKCEFHQSQVKFLGYHISEKGFTPCQENIDKVLQWKQPKNQKELRQFLGQV NYLRKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKIL LETDASDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLKHW RHYLESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADA LSRIVDETEPIPKDSEDNSINFVNQISI (SEQ ID NO: 74) Tf1evo3.1: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLTPVKMQAMNDEINQGLKSGIIRESKAINACPVIFVPRKEGTLRMVVDYKPLNK YVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYCINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGLTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DVSDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLEHWRHY LESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALSR IVDETEPIPKDNEDNSINFVNQISI (SEQ ID NO: 75) Tf1evo3.2: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLTPVKMQAMNDEINQGLKGGIIRESKAINACPVIFVPRKEGTLRMVVDYRPLNK YVKPNVYPLPLIEQLLAKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGISTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFIGYHISEKGLTPCQENIDKVLQWKQPKNRKELRQFLGSVNYL RKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DVSDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLEHWRHY LESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALSR IVDETEPIPKDNEDNSINFVNQISI (SEQ ID NO: 76) Tf1evo+rat-1: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL PIRNYPLTPVKMQAMNDEINQGLKGGIIRESKAINACPVIFVPRKEGTLRMVVDYRPLNK YVKPNVYPLPLIEQLLAKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGIKTAPAHFQYFINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA NLIINQAKCEFHQSQVKFLGYHISEKGLTPCQENIDKVLQWKQPKNQKELRQFLGQVNY LRKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DVSDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLEHWRHY LESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALSR IVDETEPIPKDNEDNSINFVNQISI (SEQ ID NO: 77) Tf1evo+rat2: ISSSKHTLSQMNKVSNIVKEPELPDIYKEFKDITADTNTEKLPKPIKGLEFEVELTQENYRL 
PIRNYPLTPVKMQAMNDEINQGLKSGIIRESKAINACPVIFVPRKEGTLRMVVDYRPLNK YVKPNIYPLPLIEQLLAKIQGSTIFTKLDLKSAYHQIRVRKGDEHKLAFRCPRGVFEYLV MPYGIKTAPAHFQYCINTILGEAKESHVVCYMDDILIHSKSESEHVKHVKDVLQKLKNA
NLIINQAKCEFHQSQVKFLGYHISEKGLTPCQENIDKVLQWKQPKNQKELRQFLGQVNY LRKFIPKTSQLTHPLNKLLKKDVRWKWTPTQTQAIENIKQCLVSPPVLRHFDFSKKILLET DVSDVAVGAVLSQKHDDDKYYPVGYYSAKMSKAQLNYSVSDKEMLAIIKSLEHWRHY LESTIEPFKILTDHRNLIGRITNESEPENKRLARWQLFLQDFNFEINYRPGSANHIADALSR IVDETEPIPKDNEDNSINFVNQISI (SEQ ID NO: 78) [0409] In some embodiments, the domain comprising an RNA-dependent DNA polymerase activity comprises an Ec48 reverse transcriptase. For example, the prime editor proteins described herein may comprise an Ec48 reverse transcriptase comprising one or more mutations relative to the amino acid sequence of SEQ ID NO: 31. In some embodiments, the Ec48 reverse transcriptase comprises one or more mutations selected from the group consisting of A36V, E54K, K87E, R205K, V214L, D243N, R267I, S277F, E279K, N317S, K318E, H324Q, K326E, E328K, and R372K relative to the amino acid sequence of SEQ ID NO: 31. In certain embodiments, the Ec48 reverse transcriptase comprises any one of the following groups of amino acid substitutions relative to the amino acid sequence of SEQ ID NO: 31: R267I, K318E, K326E, E328K, and R372K; K87E, R205K, V214L, D243N, R267I, N317S, K318E, H324Q, and K326E; E54K, K87E, D243N, R267I, E279K, and K318E; A36V, K87E, R205K, D243N, R267I, E279K, and K318E; E54K, K87E, D243N, R267I, E279K, and K318E; or E54K, K87E, D243N, R267I, S277F, E279K, and K318E. [0410] In some embodiments, the present disclosure provides reverse transcriptases, and prime editors (e.g. 
fusion proteins or prime editors in which each component is provided in trans) comprising reverse transcriptases, wherein the reverse transcriptase is an Ec48 reverse transcriptase of SEQ ID NO: 31, or an Ec48 reverse transcriptase variant having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with SEQ ID NO: 31, wherein the Ec48 reverse transcriptase variant comprises one or more mutations selected from the group consisting of A36V, E54K, E60K, K87E, S151T, E165D, L182N, T189N, R205K, V214L, D243N, R267I, S277F, E279K, V303M, K307R, R315K, N317S, K318E, H324Q, K326E, E328K, K343N, R372K, R378K, and T385R relative to SEQ ID NO: 31. In some embodiments, the Ec48 reverse transcriptase variant comprises a single mutation, wherein the single mutation is an L182N
mutation, a T189N mutation, a K307R mutation, an R315K mutation, an R378K mutation, or a T385R mutation. [0411] In some embodiments, the Ec48 reverse transcriptase variant comprises any one of the following groups of mutations relative to the amino acid sequence of SEQ ID NO: 31: R267I, K318E, K326E, E328K, and R372K; K87E, R205K, V214L, D243N, R267I, N317S, K318E, H324Q, and K326E; E54K, K87E, D243N, R267I, E279K, and K318E; A36V, K87E, R205K, D243N, R267I, E279K, and K318E; E54K, K87E, D243N, R267I, E279K, and K318E; E54K, K87E, D243N, R267I, S277F, E279K, and K318E; E60K, K87E, E165D, D243N, R267I, E279K, K318E, and K343N; E60K, K87E, S151T, E165D, D243N, R267I, E279K, V303M, K318E, and K343N; or R315K, L182N, and T189N. [0412] In certain embodiments, the Ec48 reverse transcriptase variant comprises the amino acid sequence of any one of SEQ ID NOs: 31 and 48-55 or an amino acid sequence at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identical to any one of SEQ ID NOs: 31 and 48-55, wherein the amino acid sequence comprises at least one of residues 36V, 54K, 60K, 87E, 151T, 165D, 182N, 189N, 205K, 214L, 243N, 267I, 277F, 279K, 303M, 307R, 315K, 317S, 318E, 324Q, 326E, 328K, 343N, 372K, 378K, and 385R: Ec48 variant 3.23: GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVEELKAIAELSLDEKYTLKEI PKIDGSKRIVYSLHPKMRLLQSRINERIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRSVFEEILHIKDEALEYLVDICTKDDFVVQGAL TSSYIATLCLFAVEGDVVRRAQKKGLVYTRLLDDITVSSKISNYDFSQMQSHIERMLSEH NLPINKHKTKIFHCSSEPIKVHGLIVDYDSPRLPSDEVKRIRASIHNLKLLAAKNNTKTSV AYRKEFNRCMGRVSELGRVGQEEYESFKKQLQAIKPMPSKRDVAVIDAAIKSLELSYSK GNQNKHWYKRKYDLTRYKMIILTRSESFKEKLECFKSRLASLKPL (SEQ ID NO: 48) Ec48 variant 3.35 (or Ec48-evo2): GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVEELKAIAELSLDKKYTLKEI PKIDGSKRIVYSLHPKMRLLQSRINERIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRSVFEEILHIKDEALEYLVDICTKDDFVVQGAL 
TSSYIATLCLFAVEGDVVRRAQRKGLVYTRLVDDITVSSKISNYDFSQMQSHIERMLSEH NLPINKHKTKIFHCSSEPIKVHGLIVDYDSPRLPSDKVKRIRASIHNLKLLAAKNNTKTSV AYRKEFNRCMGRVNELGRVGHEKYESFKKQLQAIKPMPSKRDVAVIDAAIKSLELSYSK GNQNKHWYKRKYDLTRYKMIILTRSESFKEKLECFKSRLASLKPL (SEQ ID NO: 49) Ec48 variant 3.36:
GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKVLSISVEELKAIAELSLDEKYTLKEI PKIDGSKRIVYSLHPKMRLLQSRINERIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRSVFEEILHIKDEALEYLVDICTKDDFVVQGAL TSSYIATLCLFAVEGDVVRRAQKKGLVYTRLVDDITVSSKISNYDFSQMQSHIERMLSEH NLPINKHKTKIFHCSSEPIKVHGLIVDYDSPRLPSDKVKRIRASIHNLKLLAAKNNTKTSV AYRKEFNRCMGRVNELGRVGHEKYESFKKQLQAIKPMPSKRDVAVIDAAIKSLELSYSK GNQNKHWYKRKYDLTRYKMIILTRSESFKEKLECFKSRLASLKPL (SEQ ID NO: 50) Ec48 variant 3.37: GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVEELKAIAELSLDKKYTLKEI PKIDGSKRIVYSLHPKMRLLQSRINERIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRSVFEEILHIKDEALEYLVDICTKDDFVVQGAL TSSYIATLCLFAVEGDVVRRAQRKGLVYTRLVDDITVSSKISNYDFSQMQSHIERMLSEH NLPINKHKTKIFHCSSEPIKVHGLIVDYDSPRLPSDKVKRIRASIHNLKLLAAKNNTKTSV AYRKEFNRCMGRVNELGRVGHEKYESFKKQLQAIKPMPSKRDVAVIDAAIKSLELSYSK GNQNKHWYKRKYDLTRYKMIILTRSESFKEKLECFKSRLASLKPL (SEQ ID NO: 49) Ec48 variant 3.38: GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVEELKAIAELSLDKKYTLKEI PKIDGSKRIVYSLHPKMRLLQSRINERIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRSVFEEILHIKDEALEYLVDICTKDDFVVQGAL TSSYIATLCLFAVEGDVVRRAQRKGLVYTRLVDDITVSSKISNYDFSQMQSHIERMLSEH NLPINKHKTKIFHCSSEPIKVHGLIVDYDSPRLPFDKVKRIRASIHNLKLLAAKNNTKTSV AYRKEFNRCMGRVNELGRVGHEKYESFKKQLQAIKPMPSKRDVAVIDAAIKSLELSYSK GNQNKHWYKRKYDLTRYKMIILTRSESFKEKLECFKSRLASLKPL (SEQ ID NO: 51) Ec48 variant 3.500: GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVEELKAIAELSLDEKYTLKKI PKIDGSKRIVYSLHPKMRLLQSRINERIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRSVFEEILHIKDEALDYLVDICTKDDFVVQGAL TSSYIATLCLFAVEGDVVRRAQRKGLVYTRLVDDITVSSKISNYDFSQMQSHIERMLSEH NLPINKHKTKIFHCSSEPIKVHGLIVDYDSPRLPSDKVKRIRASIHNLKLLAAKNNTKTSV AYRKEFNRCMGRVNELGRVGHEKYESFKKQLQAIKPMPSNRDVAVIDAAIKSLELSYSK GNQNKHWYKRKYDLTRYKMIILTRSESFKEKLECFKSRLASLKPL (SEQ ID NO: 52) Ec48 variant 3.501: GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVEELKAIAELSLDEKYTLKKI PKIDGSKRIVYSLHPKMRLLQSRINERIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRTVFEEILHIKDEALDYLVDICTKDDFVVQGAL 
TSSYIATLCLFAVEGDVVRRAQRKGLVYTRLVDDITVSSKISNYDFSQMQSHIERMLSEH NLPINKHKTKIFHCSSEPIKVHGLIVDYDSPRLPSDKVKRIRASIHNLKLLAAKNNTKTSM AYRKEFNRCMGRVNELGRVGHEKYESFKKQLQAIKPMPSNRDVAVIDAAIKSLELSYSK GNQNKHWYKRKYDLTRYKMIILTRSESFKEKLECFKSRLASLKPL (SEQ ID NO: 53) Ec48 variant 3.8 (or Ec48-evo1):
GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVEELKAIAELSLDEKYTLKEI PKIDGSKRIVYSLHPKMRLLQSRINKRIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRSVFEEILHIKDEALEYLVDICTKDDFVVQGAL TSSYIATLCLFAVEGDVVRRAQRKGLVYTRLVDDITVSSKISNYDFSQMQSHIERMLSEH DLPINKHKTKIFHCSSEPIKVHGLIVDYDSPRLPSDEVKRIRASIHNLKLLAAKNNTKTSV AYRKEFNRCMGRVNELGRVGHEEYKSFKKQLQAIKPMPSKRDVAVIDAAIKSLELSYSK GNQNKHWYKKKYDLTRYKMIILTRSESFKEKLECFKSRLASLKPL (SEQ ID NO: 54) Ec48-v2: GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVEELKAIAELSLDEKYTLKEI PKIDGSKRIVYSLHPKMRLLQSRINKRIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRSVFEEILHIKDEALEYLVDICTKDDFVVQGAN TSSYIANLCLFAVEGDVVRRAQRKGLVYTRLVDDITVSSKISNYDFSQMQSHIERMLSEH DLPINKHKTKIFHCSSEPIKVHGLRVDYDSPRLPSDEVKRIRASIHNLKLLAAKNNTKTSV AYRKEFNRCMGKVNKLGRVGHEKYESFKKQLQAIKPMPSKRDVAVIDAAIKSLELSYS KGNQNKHWYKRKYDLTRYKMIILTRSESFKEKLECFKSRLASLKPL (SEQ ID NO: 55) Ec48-evo3: GRPYVTLNLNGMFMDKFKPYSKSNAPITTLEKLSKALSISVEELKAIAELSLDEKYTLKKI PKIDGSKRIVYSLHPKMRLLQSRINERIFKELVVFPSFLFGSVPSKNDVLNSNVKRDYVSC AKAHCGAKTVLKVDISNFFDNIHRDLVRSVFEEILHIKDEALDYLVDICTKDDFVVQGAL TSSYIATLCLFAVEGDVVRRAQRKGLVYTRLVDDITVSSKISNYDFSQMQSHIERMLSEH NLPINKHKTKIFHCSSEPIKVHGLIVDYDSPRLPSDKVKRIRASIHNLKLLAAKNNTKTSV AYRKEFNRCMGRVNELGRVGHEKYESFKKQLQAIKPMPSNRDVAVIDAAIKSLELSYSK GNQNKHWYKRKYDLTRYKMIILTRSESFKEKLECFKSRLASLKPL (SEQ ID NO: 52) Nuclear localization sequences (NLS) [0413] In various embodiments, the prime editors used in the systems and methods described herein may comprise one or more nuclear localization sequences (NLS), which help promote translocation of a protein into the cell nucleus. Such sequences are well-known in the art and can include the following examples: DESCRIPTION SEQUENCE SEQ ID NO:
NLS OF POLYOMA LARGE T-AG: VSRKRPRP (SEQ ID NO: 103)
[0414] The presently described systems and methods may comprise any known NLS sequence, including any of those described in Cokol et al., “Finding nuclear localization signals,” EMBO Rep., 2000, 1(5): 411-415 and Freitas et al., “Mechanisms and Signals for the Nuclear Import of Proteins,” Current Genomics, 2009, 10(8): 550-7, each of which is incorporated herein by reference. [0415] In various embodiments, the fusion proteins used in the systems and methods described herein further comprise one or more (and preferably at least two) nuclear localization sequences. In certain embodiments, the fusion proteins comprise at least two NLSs. In embodiments with at least two NLSs, the NLSs can be the same NLSs or they can be different NLSs. In some embodiments, one or more of the NLSs are bipartite NLSs (“bpNLS”). In certain embodiments, the disclosed fusion proteins comprise two bipartite NLSs. In some embodiments, the disclosed fusion proteins comprise more than two bipartite NLSs. [0416] The location of the NLS fusion can be at the N-terminus, the C-terminus, or within a sequence of a fusion protein (e.g., inserted between the encoded napDNAbp component (e.g., Cas9) and a polymerase domain (e.g., a reverse transcriptase)). [0417] The NLSs may be any known NLS sequence in the art. The NLSs may also be any future-discovered NLSs for nuclear localization. The NLSs also may be any naturally-occurring NLS, or any non-naturally occurring NLS (e.g., an NLS with one or more desired mutations). [0418] The term “nuclear localization sequence” or “NLS” refers to an amino acid sequence that promotes import of a protein into the cell nucleus, for example, by nuclear transport. Nuclear localization sequences are known in the art and would be apparent to the skilled artisan. 
For example, NLS sequences are described in Plank et al., International PCT application PCT/EP2000/011690, filed November 23, 2000, published as WO/2001/038547 on May 31, 2001, the contents of which are incorporated herein by reference. In some embodiments, an NLS comprises the amino acid sequence PKKKRKV (SEQ ID NO: 94),
MDSLLMNRRKFLYQFKNVRWAKGRRETYLC (SEQ ID NO: 99), KRTADGSEFESPKKKRKV (SEQ ID NO: 97), or KRTADGSEFEPKKKRKV (SEQ ID NO: 106). In other embodiments, an NLS comprises the amino acid sequence NLSKRPAAIKKAGQAKKKK (SEQ ID NO: 107), PAAKRVKLD (SEQ ID NO: 98), RQRRNELKRSF (SEQ ID NO: 108), or NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 109). [0419] In one aspect of the disclosure, a prime editor or other fusion protein may be modified with one or more nuclear localization sequences (NLS), preferably at least two NLSs. In certain embodiments, the fusion proteins are modified with two or more NLSs. The disclosure contemplates the use of any nuclear localization sequence known in the art at the time of the disclosure, or any nuclear localization sequence that is identified or otherwise made available in the state of the art after the time of the instant filing. A representative nuclear localization sequence is a peptide sequence that directs the protein to the nucleus of the cell in which the sequence is expressed. A nuclear localization signal is predominantly basic, can be positioned almost anywhere in a protein's amino acid sequence, generally comprises a short sequence of four amino acids (Autieri & Agrawal, (1998) J. Biol. Chem. 273: 14731-37, incorporated herein by reference) to eight amino acids, and is typically rich in lysine and arginine residues (Magin et al., (2000) Virology 274: 11-16, incorporated herein by reference). Nuclear localization sequences often comprise proline residues. A variety of nuclear localization sequences have been identified and have been used to effect transport of biological molecules from the cytoplasm to the nucleus of a cell. See, e.g., Tinland et al., (1992) Proc. Natl. Acad. Sci. U.S.A. 89:7442-46; Moede et al., (1999) FEBS Lett. 461:229-34, each of which is incorporated herein by reference. Translocation is currently thought to involve nuclear pore proteins. 
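The compositional description above (predominantly basic, rich in lysine and arginine, often containing proline) can be illustrated with a minimal Python sketch. The function names `basic_fraction` and `find_motif` and the toy protein string are hypothetical and introduced only for illustration; the peptide sequences are the NLS examples recited above. The sketch measures composition only and does not predict whether a sequence functions as an NLS.

```python
# Illustrative sketch only: composition of NLS peptides recited in the text.
BASIC_RESIDUES = set("KR")  # lysine and arginine


def basic_fraction(peptide: str) -> float:
    """Return the fraction of residues that are lysine (K) or arginine (R)."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    return sum(residue in BASIC_RESIDUES for residue in peptide) / len(peptide)


def find_motif(protein: str, motif: str) -> list:
    """Return every 0-based start index at which `motif` occurs in `protein`."""
    protein, motif = protein.upper(), motif.upper()
    hits = []
    start = protein.find(motif)
    while start != -1:
        hits.append(start)
        start = protein.find(motif, start + 1)
    return hits


# NLS examples taken from the text above.
NLS_EXAMPLES = {
    "SEQ ID NO: 94": "PKKKRKV",
    "SEQ ID NO: 98": "PAAKRVKLD",
    "SEQ ID NO: 108": "RQRRNELKRSF",
}

if __name__ == "__main__":
    for seq_id, peptide in NLS_EXAMPLES.items():
        print(seq_id, peptide, round(basic_fraction(peptide), 3), "P" in peptide)
    # Hypothetical toy protein carrying the SV40-type motif at its C-terminus.
    print(find_motif("MAAAGSPKKKRKV", "PKKKRKV"))
```

Such a helper could be used to locate a chosen NLS within a designed fusion protein sequence before synthesis.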
[0420] Most NLSs can be classified in three general groups: (i) a monopartite NLS exemplified by the SV40 large T antigen NLS (PKKKRKV (SEQ ID NO: 94)); (ii) a bipartite motif consisting of two basic domains separated by a variable number of spacer amino acids and exemplified by the Xenopus nucleoplasmin NLS (KRXXXXXXXXXXKKKL (SEQ ID NO: 110)); and (iii) noncanonical sequences such as M9 of the hnRNP A1 protein, the influenza virus nucleoprotein NLS, and the yeast Gal4 protein NLS (Dingwall and Laskey 1991). [0421] Nuclear localization sequences appear at various points in the amino acid sequences of proteins. NLS have been identified at the N-terminus, the C-terminus, and in the central region of
proteins. Thus, the disclosure provides fusion proteins that may be modified with one or more NLSs at the C-terminus and/or the N-terminus, as well as at internal regions of the fusion protein. The residues of a longer sequence that do not function as component NLS residues should be selected so as not to interfere, for example, ionically or sterically, with the nuclear localization signal itself. Therefore, although there are no strict limits on the composition of an NLS-comprising sequence, in practice, such a sequence can be functionally limited in length and composition. [0422] The present disclosure contemplates any suitable means by which to modify a fusion protein to include one or more NLSs. In one aspect, the fusion proteins may be engineered to be translationally fused at the N-terminus or the C-terminus (or both) to one or more NLSs, i.e., to form a prime editor-NLS fusion construct. In other embodiments, a fusion protein-encoding nucleotide sequence may be genetically modified to incorporate a reading frame that encodes one or more NLSs in an internal region of the encoded prime editor. In addition, the NLSs may include various amino acid linkers or spacer regions encoded between the prime editor and the N-terminally, C-terminally, or internally-attached NLS amino acid sequence. Thus, the present disclosure also provides for nucleotide constructs, vectors, and host cells for expressing fusion proteins that comprise a prime editor and one or more NLSs, among other components. [0423] The prime editors described herein may also comprise nuclear localization sequences that are linked to a prime editor through one or more linkers, e.g., a polymeric, amino acid, nucleic acid, polysaccharide, or chemical linker element. 
The linkers within the contemplated scope of the disclosure are not intended to have any limitations and can be any suitable type of molecule (e.g., polymer, amino acid, polysaccharide, nucleic acid, lipid, or any synthetic chemical linker domain) and can be joined to the prime editor by any suitable strategy that effectuates forming a bond (e.g., covalent linkage, hydrogen bonding) between the prime editor and the one or more NLSs. Linkers [0424] The prime editors used in the systems and methods described herein may include one or more linkers. As defined above, the term “linker,” as used herein, refers to a chemical group or a molecule linking two molecules or moieties, e.g., a binding domain and a cleavage domain of a nuclease. In some embodiments, a linker joins a gRNA binding domain of an RNA-programmable nuclease and a polymerase (e.g., a reverse transcriptase). In some embodiments, a linker joins a Cas9 nickase and a reverse transcriptase. Typically, the linker is positioned between, or flanked by, two groups, molecules, or other moieties and connected to each one via a covalent bond, thus connecting the two. In some embodiments, the linker is an amino acid or a plurality of amino acids (e.g., a peptide or protein). In some embodiments, the linker is an organic molecule, group, polymer, or chemical moiety. In some embodiments, the linker is 5-100 amino acids in length, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-150, or 150-200 amino acids in length. Longer or shorter linkers are also contemplated. [0425] The linker may be as simple as a covalent bond, or it may be a polymeric linker many atoms in length. In certain embodiments, the linker is a polypeptide, or amino acid-based. In other embodiments, the linker is not peptide-like. In certain embodiments, the linker is a covalent bond (e.g., a carbon-carbon bond, disulfide bond, carbon-heteroatom bond, etc.). In certain embodiments, the linker is a carbon-nitrogen bond of an amide linkage. In certain embodiments, the linker is a cyclic or acyclic, substituted or unsubstituted, branched or unbranched, aliphatic or heteroaliphatic linker. In certain embodiments, the linker is polymeric (e.g., polyethylene, polyethylene glycol, polyamide, polyester, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminoalkanoic acid. In certain embodiments, the linker comprises an aminoalkanoic acid (e.g., glycine, ethanoic acid, alanine, beta-alanine, 3-aminopropanoic acid, 4-aminobutanoic acid, 5-pentanoic acid, etc.). In certain embodiments, the linker comprises a monomer, dimer, or polymer of aminohexanoic acid (Ahx). 
In certain embodiments, the linker is based on a carbocyclic moiety (e.g., cyclopentane, cyclohexane). In other embodiments, the linker comprises a polyethylene glycol moiety (PEG). In other embodiments, the linker comprises amino acids. In certain embodiments, the linker comprises a peptide. In certain embodiments, the linker comprises an aryl or heteroaryl moiety. In certain embodiments, the linker is based on a phenyl ring. The linker may include functionalized moieties to facilitate attachment of a nucleophile (e.g., thiol, amino) from the peptide to the linker. Any electrophile may be used as part of the linker. Exemplary electrophiles include, but are not limited to, activated esters, activated amides, Michael acceptors, alkyl halides, aryl halides, acyl halides, and isothiocyanates.
[0426] In some other embodiments, the linker comprises the amino acid sequence (GGGGS)n (SEQ ID NO: 84), (G)n (SEQ ID NO: 85), (EAAAK)n (SEQ ID NO: 86), (GGS)n (SEQ ID NO: 87), (SGGS)n (SEQ ID NO: 81), (XP)n (SEQ ID NO: 88), or any combination thereof, wherein n is independently an integer between 1 and 30, and wherein X is any amino acid. In some embodiments, the linker comprises the amino acid sequence (GGS)n (SEQ ID NO: 87), wherein n is 1, 3, or 7. In some embodiments, the linker comprises the amino acid sequence SGSETPGTSESATPES (SEQ ID NO: 89). In some embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESSGGSSGGS (SEQ ID NO: 90). In some embodiments, the linker comprises the amino acid sequence SGGSGGSGGS (SEQ ID NO: 91). In some embodiments, the linker comprises the amino acid sequence SGGS (SEQ ID NO: 82). In other embodiments, the linker comprises the amino acid sequence SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 83, 60AA). In some embodiments, the linker comprises the amino acid sequence GGS, GGSGGS (SEQ ID NO: 92), GGSGGSGGS (SEQ ID NO: 93), SGGSSGGSSGSETPGTSESATPESSGGSSGGSS (SEQ ID NO: 80), SGSETPGTSESATPES (SEQ ID NO: 89), or SGGSSGGSSGSETPGTSESATPESAGSYPYDVPDYAGSAAPAAKKKKLDGSGSGGSSGGS (SEQ ID NO: 83). [0427] In certain embodiments, linkers may be used to link any of the peptides or peptide domains or moieties of the invention (e.g., a napDNAbp linked or fused to a reverse transcriptase domain, and/or a napDNAbp linked to one or more NLS). Any of the domains of the fusion proteins used in the systems and methods described herein may also be connected to one another through any of the presently described linkers. PEgRNAs [0428] The prime editing systems and methods described herein contemplate the use of any suitable PEgRNAs, e.g., to introduce recombinase recognition sites into a target DNA sequence, such as a genome, using prime editing. 
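The repeat-unit linker notation used above, e.g., (GGS)n or (GGGGS)n with n between 1 and 30, can be expanded into explicit amino acid sequences with a minimal Python sketch. The function name `repeat_linker` is hypothetical, and treating the range 1 to 30 as inclusive is an assumption made for illustration.

```python
# Illustrative sketch only: expand a repeat-unit linker such as (GGS)n.
def repeat_linker(unit: str, n: int) -> str:
    """Return `unit` repeated n times; n is assumed to lie in 1..30 inclusive."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    if not 1 <= n <= 30:
        raise ValueError("n is assumed to be an integer from 1 to 30")
    return unit * n


if __name__ == "__main__":
    # (GGS)n for the values of n recited in the text (1, 3, or 7).
    for n in (1, 3, 7):
        print(n, repeat_linker("GGS", n))
    # A combination of repeat units, e.g., (SGGS)2 followed by (GGGGS)1.
    print(repeat_linker("SGGS", 2) + repeat_linker("GGGGS", 1))
```

Under these assumptions, (GGS)2 and (GGS)3 expand to GGSGGS and GGSGGSGGS, which match the sequences recited as SEQ ID NO: 92 and SEQ ID NO: 93.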
PEgRNA architecture [0429] In some embodiments, an extended guide RNA, or pegRNA, used in the prime editing systems and methods disclosed herein includes a spacer sequence (e.g., a ~20 nt spacer sequence) and a gRNA core region, which binds with the napDNAbp. In some embodiments, the
pegRNA includes an extended RNA segment, i.e., an extension arm, at the 5′ end, i.e., a 5′ extension. In some embodiments, the 5′ extension includes a DNA synthesis template sequence, a primer binding site, and an optional 5-20 nucleotide linker sequence. The RT primer binding site hybridizes to the free 3′ end that is formed after a nick is formed in the non-target strand of the R-loop, thereby priming reverse transcriptase for DNA polymerization in the 5′-3′ direction. [0430] In another embodiment, an extended guide RNA (i.e., a pegRNA) used in the prime editing systems and methods provided herein includes a spacer sequence (e.g., a ~20 nt spacer sequence) and a gRNA core, which binds with the napDNAbp. In some embodiments, the pegRNA includes an extended RNA segment, i.e., an extension arm, at the 3′ end, i.e., a 3′ extension. In some embodiments, the 3′ extension includes a DNA synthesis template sequence, and a reverse transcription primer binding site. The RT primer binding site hybridizes to the free 3′ end that is formed after a nick is formed in the non-target strand of the R-loop, thereby priming reverse transcriptase for DNA polymerization in the 5′-3′ direction. [0431] In another embodiment, an extended guide RNA (i.e., a pegRNA) used in the prime editing systems and methods provided herein includes a spacer sequence (e.g., a ~20 nt spacer sequence) and a gRNA core, which binds with the napDNAbp. In some embodiments, the pegRNA includes an extended RNA segment, i.e., an extension arm, at an intramolecular position within the gRNA core, i.e., an intramolecular extension. In some embodiments, the intramolecular extension includes a DNA synthesis template sequence, and a reverse transcription primer binding site. The RT primer binding site hybridizes to the free 3′ end that is formed after a nick is formed in the non-target strand of the R-loop, thereby priming reverse transcriptase for DNA polymerization in the 5′-3′ direction. 
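The 3′-extension architecture described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a design tool: the function names are hypothetical, the component order 5′-spacer, gRNA core, DNA synthesis template, primer binding site-3′ is assumed for the 3′ extension, and the nick is assumed to fall 3 nucleotides upstream of the 3′ end of the protospacer on the non-target strand (a typical position for SpCas9, used here only as a default). The primer binding site is computed as the RNA reverse complement of the non-target-strand bases immediately 5′ of the nick, so that it can hybridize to the free 3′ end formed by the nick.

```python
# Illustrative sketch only: derive a primer binding site (PBS) and assemble
# a pegRNA with a 3' extension. All names and defaults are assumptions.
RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A", "U": "A"}


def rna_reverse_complement(seq: str) -> str:
    """Return the RNA reverse complement of a DNA or RNA sequence."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def primer_binding_site(protospacer: str, pbs_len: int = 13, nick_offset: int = 3) -> str:
    """Return a PBS complementary to the bases immediately 5' of the nick.

    `protospacer` is the non-target-strand DNA, written 5'->3'.
    `nick_offset` is the assumed distance of the nick from its 3' end.
    """
    nick = len(protospacer) - nick_offset
    if pbs_len < 1 or pbs_len > nick:
        raise ValueError("pbs_len must fit within the region 5' of the nick")
    free_3prime_end = protospacer[nick - pbs_len:nick]
    return rna_reverse_complement(free_3prime_end)


def assemble_3prime_pegrna(spacer: str, grna_core: str, rt_template: str, pbs: str) -> str:
    """Concatenate components 5'->3': spacer, gRNA core, RT template, PBS."""
    return spacer + grna_core + rt_template + pbs


if __name__ == "__main__":
    protospacer = "ACGTACGTACGTACGTACGT"  # toy 20-nt sequence, not a real target
    pbs = primer_binding_site(protospacer, pbs_len=5)
    spacer = protospacer.replace("T", "U")  # RNA version of the protospacer
    # "GUUUU" and "AAGG" are placeholders for a gRNA core and an RT template.
    print(pbs)
    print(assemble_3prime_pegrna(spacer, "GUUUU", "AAGG", pbs))
```

For a 5′ extension, the same components would instead be placed upstream of the spacer, optionally with a linker sequence, as described above.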
[0432] In one embodiment, the position of the intramolecular RNA extension is not in the spacer sequence of the guide RNA. In another embodiment, the position of the intramolecular RNA extension is in the gRNA core. In still another embodiment, the position of the intramolecular RNA extension is anywhere within the guide RNA molecule except within the spacer sequence, or at a position which disrupts the spacer sequence. In one embodiment, the intramolecular RNA extension is inserted downstream from the 3′ end of the spacer sequence. In another embodiment, the intramolecular RNA extension is inserted at least 1 nucleotide, at least 2 nucleotides, at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10
nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, or at least 25 nucleotides downstream of the 3′ end of the spacer sequence. [0433] In other embodiments, the intramolecular RNA extension is inserted into the gRNA core, which refers to the portion of a traditional guide RNA corresponding to or comprising the tracrRNA, which binds and/or interacts with the napDNAbp, e.g., a Cas9 protein or equivalent thereof (i.e., a different napDNAbp). Preferably the insertion of the intramolecular RNA extension does not disrupt or minimally disrupts the interaction between the tracrRNA portion and the napDNAbp. [0434] The length of the RNA extension (which includes at least the RT template and primer binding site) can be any useful length. In various embodiments, the RNA extension is at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 21 nucleotides, at least 22 nucleotides, at least 23 nucleotides, at least 24 nucleotides, at least 25 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length. [0435] The RT template sequence can also be any suitable length. 
For example, the RT template sequence can be at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length.
[0436] In still other embodiments, the reverse transcription primer binding site sequence is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length. [0437] In other embodiments, the optional linker or spacer sequence is at least 3 nucleotides, at least 4 nucleotides, at least 5 nucleotides, at least 6 nucleotides, at least 7 nucleotides, at least 8 nucleotides, at least 9 nucleotides, at least 10 nucleotides, at least 11 nucleotides, at least 12 nucleotides, at least 13 nucleotides, at least 14 nucleotides, at least 15 nucleotides, at least 16 nucleotides, at least 17 nucleotides, at least 18 nucleotides, at least 19 nucleotides, at least 20 nucleotides, at least 30 nucleotides, at least 40 nucleotides, at least 50 nucleotides, at least 60 nucleotides, at least 70 nucleotides, at least 80 nucleotides, at least 90 nucleotides, at least 100 nucleotides, at least 200 nucleotides, at least 300 nucleotides, at least 400 nucleotides, or at least 500 nucleotides in length. 
[0438] The RT template sequence, in certain embodiments, encodes a single-stranded DNA molecule which is homologous to the non-target strand (and thus, complementary to the corresponding site of the target strand) but includes one or more nucleotide changes, e.g., for introducing a recombinase recognition sequence into a target DNA molecule. The one or more nucleotide changes may include one or more single-base nucleotide changes, one or more deletions, and/or one or more insertions. [0439] The synthesized single-stranded DNA product of the RT template sequence is homologous to the non-target strand except that it contains one or more nucleotide changes. The single-stranded DNA product of the RT template sequence hybridizes in equilibrium with the complementary target strand sequence, thereby displacing the homologous endogenous target strand sequence. The displaced endogenous strand may be referred to in some embodiments as a 5′ endogenous DNA flap species. This 5′ endogenous DNA flap species can be removed by a 5′ flap endonuclease (e.g., FEN1) and the single-stranded DNA product, now hybridized to the
endogenous target strand, may be ligated, thereby creating a mismatch between the endogenous sequence and the newly synthesized strand. The mismatch may be resolved by the cell’s innate DNA repair and/or replication processes. [0440] In various embodiments, the nucleotide sequence of the RT template sequence corresponds to the nucleotide sequence of the non-target strand that becomes displaced as the 5′ flap species and that overlaps with the site to be edited. [0441] In various embodiments of the extended guide RNAs, the DNA synthesis template sequence may encode a single-strand DNA flap that is complementary to an endogenous DNA sequence adjacent to a nick site, wherein the single-strand DNA flap comprises a desired nucleotide change. The single-stranded DNA flap may displace an endogenous single-strand DNA at the nick site. The displaced endogenous single-strand DNA at the nick site can have a 5′ end and form an endogenous flap, which can be excised by the cell. In various embodiments, excision of the 5′ end endogenous flap can help drive product formation since removing the 5′ end endogenous flap encourages hybridization of the single-strand 3′ DNA flap to the corresponding complementary DNA strand, and the incorporation or assimilation of the desired nucleotide change carried by the single-strand 3′ DNA flap into the target DNA. [0442] The terms “cleavage site,” “nick site,” and “cut site” as used interchangeably herein in the context of prime editing, refer to a specific position in between two nucleotides or two base pairs in the double-stranded target DNA sequence. In some embodiments, the position of a nick site is determined relative to the position of a specific PAM sequence. In some embodiments, the nick site is the particular position where a nick will occur when the double stranded target DNA is contacted with a napDNAbp, e.g., a nickase such as a Cas nickase, that recognizes a specific PAM sequence. 
For each PEgRNA described herein, a nick site (e.g., the “first nick site” when referred to in the context of PE3, PE5 and similar approaches) is characteristic of the particular napDNAbp with which the gRNA core of the PEgRNA associates, and is characteristic of the particular PAM required for recognition and function of the napDNAbp. For example, for a PEgRNA that comprises a gRNA core that associates with a SpCas9, the nick site is in the phosphodiester bond between bases three (“-3” position relative to position 1 of the PAM sequence) and four (“-4” position relative to position 1 of the PAM sequence).
[0443] In some embodiments, a nick site is in a target strand of the double-stranded target DNA sequence. In some embodiments, a nick site is in a non-target strand of the double-stranded
target DNA sequence. In some embodiments, the nick site is in a protospacer sequence. In some embodiments, the nick site is adjacent to a protospacer sequence. In some embodiments, a nick site is downstream of a region, e.g., on a non-target strand, that is complementary to a primer binding site of a PEgRNA. In some embodiments, a nick site is downstream of a region, e.g., on a non-target strand, that binds to a primer binding site of a PEgRNA. In some embodiments, a nick site is immediately downstream of a region, e.g., on a non-target strand, that is complementary to a primer binding site of a PEgRNA. In some embodiments, the nick site is upstream of a specific PAM sequence on the non-target strand of the double stranded target DNA, wherein the PAM sequence is specific for recognition by a napDNAbp that associates with the gRNA core of a PEgRNA. In some embodiments, the nick site is downstream of a specific PAM sequence on the non-target strand of the double stranded target DNA, wherein the PAM sequence is specific for recognition by a napDNAbp that associates with the gRNA core of a PEgRNA. In some embodiments, the nick site is 3 nucleotides upstream of the PAM sequence, and the PAM sequence is recognized by a Streptococcus pyogenes Cas9 nickase, a P. lavamentivorans Cas9 nickase, a C. diphtheriae Cas9 nickase, a N. cinerea Cas9, a S. aureus Cas9, or a N. lari Cas9 nickase. In some embodiments, the nick site is 3 nucleotides upstream of the PAM sequence, and the PAM sequence is recognized by a Cas9 nickase, wherein the Cas9 nickase comprises a nuclease active HNH domain and a nuclease inactive RuvC domain. In some embodiments, the nick site is 2 base pairs upstream of the PAM sequence, and the PAM sequence is recognized by a S. thermophilus Cas9 nickase.
[0444] In various embodiments of the extended guide RNAs, the cellular repair of the single-strand DNA flap results in installation of the desired nucleotide change, thereby forming a desired product.
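By way of illustration only, the nick-site geometry described above for an S. pyogenes Cas9 nickase (a nick 3 nucleotides upstream of the PAM on the non-target strand) can be sketched in Python. The function name `spcas9_nick_sites`, the NGG PAM pattern, and the default 20-nucleotide protospacer length are assumptions made for this sketch and are not part of the disclosure; other napDNAbps would need a different PAM pattern and offset.

```python
import re
from typing import Iterator, Tuple


def spcas9_nick_sites(non_target_strand: str,
                      protospacer_len: int = 20) -> Iterator[Tuple[int, int]]:
    """Yield (pam_start, nick_index) for each NGG PAM on the given strand.

    The sequence is read 5'->3'. nick_index is a 0-based cut position:
    the nick falls between seq[nick_index - 1] and seq[nick_index],
    i.e. 3 nucleotides upstream of position 1 of the PAM.
    """
    seq = non_target_strand.upper()
    # A zero-width lookahead reports overlapping PAM candidates as well.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        # Hypothetical filter: skip PAMs without room for a full protospacer.
        if pam_start < protospacer_len:
            continue
        yield pam_start, pam_start - 3
```

For a 20-nucleotide protospacer followed directly by a TGG PAM, the sketch reports the PAM at index 20 and the nick at index 17, leaving three protospacer nucleotides between the nick and the PAM.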
[0445] In still other embodiments, the desired nucleotide change is installed in an editing window that is between about -5 to +5 of the nick site, or between about -10 to +10 of the nick site, or between about -20 to +20 of the nick site, or between about -30 to +30 of the nick site, or between about -40 to +40 of the nick site, or between about -50 to +50 of the nick site, or between about -60 to +60 of the nick site, or between about -70 to +70 of the nick site, or between about -80 to +80 of the nick site, or between about -90 to +90 of the nick site, or between about -100 to +100 of the nick site, or between about -200 to +200 of the nick site.
[0446] In other embodiments, the desired nucleotide change is installed in an editing window that is between about +1 to +2 from the nick site, or about +1 to +3, +1 to +4, +1 to +5, +1 to +6, +1 to +7, +1 to +8, +1 to +9, +1 to +10, +1 to +11, +1 to +12, +1 to +13, +1 to +14, +1 to +15, +1 to +16, +1 to +17, +1 to +18, +1 to +19, +1 to +20, +1 to +21, +1 to +22, +1 to +23, +1 to +24, +1 to +25, +1 to +26, +1 to +27, +1 to +28, +1 to +29, +1 to +30, +1 to +31, +1 to +32, +1 to +33, +1 to +34, +1 to +35, +1 to +36, +1 to +37, +1 to +38, +1 to +39, +1 to +40, +1 to +41, +1 to +42, +1 to +43, +1 to +44, +1 to +45, +1 to +46, +1 to +47, +1 to +48, +1 to +49, +1 to +50, +1 to +51, +1 to +52, +1 to +53, +1 to +54, +1 to +55, +1 to +56, +1 to +57, +1 to +58, +1 to +59, +1 to +60, +1 to +61, +1 to +62, +1 to +63, +1 to +64, +1 to +65, +1 to +66, +1 to +67, +1 to +68, +1 to +69, +1 to +70, +1 to +71, +1 to +72, +1 to +73, +1 to +74, +1 to +75, +1 to +76, +1 to +77, +1 to +78, +1 to +79, +1 to +80, +1 to +81, +1 to +82, +1 to +83, +1 to +84, +1 to +85, +1 to +86, +1 to +87, +1 to +88, +1 to +89, +1 to +90, +1 to +91, +1 to +92, +1 to +93, +1 to +94, +1 to +95, +1 to +96, +1 to +97, +1 to +98, +1 to +99, +1 to +100, +1 to +101, +1 to +102, +1 to +103, +1 to +104, +1 to +105, +1 to +106, +1 to +107, +1 to +108, +1 to +109, +1 to +110, +1 to +111, +1 to +112, +1 to +113, +1 to +114, +1 to +115, +1 to +116, +1 to +117, +1 to +118, +1 to +119, +1 to +120, +1 to +121, +1 to +122, +1 to +123, +1 to +124, or +1 to +125 from the nick site.
[0447] In still other embodiments, the desired nucleotide change is installed in an editing window that is between about +1 to +2 from the nick site, or about +1 to +5, +1 to +10, +1 to +15, +1 to +20, +1 to +25, +1 to +30, +1 to +35, +1 to +40, +1 to +45, +1 to +50, +1 to +55, +1 to +100, +1 to +105, +1 to +110, +1 to +115, +1 to +120, +1 to +125, +1 to +130, +1 to +135, +1 to +140, +1 to +145, +1 to +150, +1 to +155, +1 to +160, +1 to +165, +1 to +170, +1 to +175, +1 to +180, +1 to +185, +1 to +190, +1 to +195, or +1 to +200, from the nick site. [0448] In various aspects, the extended guide RNAs are modified versions of an extended guide RNA. pegRNAs (i.e., extended guide RNAs) and ngRNAs may be expressed from an encoding nucleic acid, or synthesized chemically. Methods are well known in the art for obtaining or otherwise synthesizing guide RNAs, and for determining the appropriate sequence of the pegRNA, including the protospacer sequence, which interacts and hybridizes with the target strand of a genomic target site of interest. [0449] In various embodiments, the particular design aspects of a pegRNA sequence and ngRNA sequence will depend upon the nucleotide sequence of a genomic target site of interest
(i.e., the desired site to be edited) and the type of napDNAbp (e.g., Cas9 protein) present in the prime editing systems utilized in the methods and compositions described herein, among other factors, such as PAM sequence locations, percent G/C content in the target sequence, the degree of microhomology regions, secondary structures, etc.
[0450] In general, a spacer sequence (i.e., a guide sequence) of a pegRNA or ngRNA can be any polynucleotide sequence having sufficient complementarity with a target polynucleotide sequence to hybridize with the target sequence and direct sequence-specific binding of a napDNAbp (e.g., a Cas9, Cas9 homolog, or Cas9 variant) to the target sequence. In some embodiments, the degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences, non-limiting examples of which include the Smith-Waterman algorithm, the Needleman-Wunsch algorithm, algorithms based on the Burrows-Wheeler Transform (e.g., the Burrows Wheeler Aligner), ClustalW, Clustal X, BLAT, Novoalign (Novocraft Technologies), ELAND (Illumina, San Diego, Calif.), SOAP (available at soap.genomics.org.cn), and Maq (available at maq.sourceforge.net). In some embodiments, a guide sequence is about or more than about 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 75, or more nucleotides in length.
[0451] In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20, 15, 12, or fewer nucleotides in length. The ability of a guide sequence to direct sequence-specific binding of a prime editor to a target sequence may be assessed by any suitable assay.
For example, the components of a prime editor, including the guide sequence to be tested, may be provided to a host cell having the corresponding target sequence, such as by transfection with vectors encoding the components of a prime editor disclosed herein, followed by an assessment of preferential cleavage within the target sequence, such as by Surveyor assay as described herein. Similarly, cleavage of a target polynucleotide sequence may be evaluated in a test tube by providing the target sequence, components of a prime editor, including the guide sequence to be tested and a control guide sequence different from the test guide sequence, and comparing binding or rate of cleavage at the target sequence between the test and control guide sequence reactions. Other assays are possible, and will occur to those skilled in the art.
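As a non-limiting illustration of the degree of complementarity discussed above, the following Python sketch scores a guide sequence against a target-strand segment of equal length. It is a deliberate simplification: it performs an ungapped, position-by-position comparison rather than an optimal alignment by an algorithm such as Smith-Waterman or Needleman-Wunsch, and it counts only Watson-Crick pairs (no G:U wobble). The function name `percent_complementarity` and the lookup table are hypothetical and are introduced only for this sketch.

```python
# Watson-Crick partner on a DNA target for each base of an RNA guide.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def percent_complementarity(guide_rna: str, target_dna_5to3: str) -> float:
    """Return the percentage of guide positions that pair with the target.

    Both sequences are written 5'->3'. The target is reversed before
    comparison because the two strands pair in antiparallel orientation.
    """
    guide = guide_rna.upper().replace("T", "U")
    target = target_dna_5to3.upper()[::-1]
    if not guide or len(guide) != len(target):
        raise ValueError("guide and target must be non-empty and equal in length")
    paired = sum(
        RNA_TO_DNA_COMPLEMENT.get(g) == t for g, t in zip(guide, target)
    )
    return 100.0 * paired / len(guide)
```

Under these assumptions, a guide GACU scored against the target segment AGTC pairs at every position, while a single mismatch in a four-nucleotide guide lowers the score to 75%.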
[0452] A guide sequence may be selected to target any target sequence. In some embodiments, the target sequence is a sequence within a genome of a cell. Exemplary target sequences include those that are unique in the target genome. For example, for the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGG where NNNNNNNNNNNNXGG (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome. A unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGG where NNNNNNNNNNNXGG (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome. For the S. thermophilus CRISPR1 Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXXAGAAW where NNNNNNNNNNNNXXAGAAW (N is A, G, T, or C; X can be anything; and W is A or T) has a single occurrence in the genome. A unique target sequence in a genome may include an S. thermophilus CRISPR1 Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXXAGAAW where NNNNNNNNNNNXXAGAAW (N is A, G, T, or C; X can be anything; and W is A or T) has a single occurrence in the genome. For the S. pyogenes Cas9, a unique target sequence in a genome may include a Cas9 target site of the form MMMMMMMMNNNNNNNNNNNNXGGXG where NNNNNNNNNNNNXGGXG (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome. A unique target sequence in a genome may include an S. pyogenes Cas9 target site of the form MMMMMMMMMNNNNNNNNNNNXGGXG where NNNNNNNNNNNXGGXG (N is A, G, T, or C; and X can be anything) has a single occurrence in the genome. In each of these sequences “M” may be A, G, T, or C, and need not be considered in identifying a sequence as unique.
[0453] In some embodiments, a guide sequence is selected to reduce the degree of secondary structure within the guide sequence. Secondary structure may be determined by any suitable polynucleotide folding algorithm. Some programs are based on calculating the minimal Gibbs free energy. An example of one such algorithm is mFold, as described by Zuker and Stiegler (Nucleic Acids Res. 9 (1981), 133-148).
Another example folding algorithm is the online webserver RNAfold, developed at the Institute for Theoretical Chemistry at the University of Vienna, using the centroid structure prediction algorithm (see, e.g., A. R. Gruber et al., 2008, Cell 106(1): 23-24; and PA Carr and GM Church, 2009, Nature Biotechnology 27(12): 1151-62). Further algorithms may be found in U.S. Application Ser. No. 61/836,080, incorporated herein
by reference. In some embodiments, silent mutations are introduced in a guide sequence in order to alter its secondary structure and increase the efficiency of prime editing.
[0454] In some embodiments, the scaffold or gRNA core portion of a pegRNA comprises sequences corresponding to the tracr sequence and tracr mate sequence of a traditional guide RNA. In general, a tracr mate sequence includes any sequence that has sufficient complementarity with a tracr sequence to promote one or more of: (1) excision of a guide sequence flanked by tracr mate sequences in a cell containing the corresponding tracr sequence; and (2) formation of a complex at a target sequence, wherein the complex comprises the tracr mate sequence hybridized to the tracr sequence. In general, degree of complementarity is with reference to the optimal alignment of the tracr mate sequence and tracr sequence, along the length of the shorter of the two sequences. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the tracr sequence or tracr mate sequence. In some embodiments, the degree of complementarity between the tracr sequence and tracr mate sequence along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, the tracr sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length. In some embodiments, the tracr sequence and tracr mate sequence are contained within a single transcript, such that hybridization between the two produces a transcript having a secondary structure, such as a hairpin. Preferred loop forming sequences for use in hairpin structures are four nucleotides in length, and most preferably have the sequence GAAA.
However, longer or shorter loop sequences may be used, as may alternative sequences. The sequences preferably include a nucleotide triplet (for example, AAA), and an additional nucleotide (for example C or G). Examples of loop forming sequences include CAAA and AAAG. In an embodiment of the invention, the transcript or transcribed polynucleotide sequence has at least two or more hairpins. In preferred embodiments, the transcript has two, three, four or five hairpins. In a further embodiment of the invention, the transcript has at most five hairpins. In some embodiments, the single transcript further includes a transcription termination sequence; preferably this is a polyT sequence, for example six T nucleotides. Further non-limiting examples of single polynucleotides comprising a guide sequence, a tracr mate sequence, and a tracr sequence are as follows (listed 5′ to 3′), where “N” represents a base of a
guide sequence, the first block of lower case letters represent the tracr mate sequence, and the second block of lower case letters represent the tracr sequence, and the final poly-T sequence represents the transcription terminator:
(1) NNNNNNNNGTTTTTGTACTCTCAAGATTTAGAAATAAATCTTGCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTCGTTATTTAATTTTTT (SEQ ID NO: 113);
(2) NNNNNNNNNNNNNNNNNNGTTTTTGTACTCTCAGAAATGCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTCGTTATTTAATTTTTT (SEQ ID NO: 114);
(3) NNNNNNNNNNNNNNNNNNNNGTTTTTGTACTCTCAGAAATGCAGAAGCTACAAAGATAAGGCTTCATGCCGAAATCAACACCCTGTCATTTTATGGCAGGGTGTTTTTT (SEQ ID NO: 115);
(4) NNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTT (SEQ ID NO: 116);
(5) NNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGTTTTTTT (SEQ ID NO: 117); and
(6) NNNNNNNNNNNNNNNNNNNNGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCATTTTTTTT (SEQ ID NO: 118).
[0455] In some embodiments, sequences (1) to (3) are used in combination with Cas9 from S. thermophilus CRISPR1. In some embodiments, sequences (4) to (6) are used in combination with Cas9 from S. pyogenes. In some embodiments, the tracr sequence is a separate transcript from a transcript comprising the tracr mate sequence.
[0456] It will be apparent to those of skill in the art that in order to target any of the fusion proteins comprising a Cas9 domain and a single-stranded DNA binding protein, as disclosed herein, to a target site, e.g., a site at which a recombinase recognition sequence is to be introduced, it is typically necessary to co-express the fusion protein together with a guide RNA, e.g., an sgRNA. As explained in more detail elsewhere herein, a guide RNA typically comprises a tracrRNA framework allowing for Cas9 binding, and a guide sequence, which confers sequence specificity to the Cas9:nucleic acid editing enzyme/domain fusion protein.
[0457] In some embodiments, a pegRNA comprises a structure 5ʹ-[guide sequence]-GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUU (SEQ ID NO: 119)-extension arm-3ʹ, wherein the guide sequence comprises a sequence that is complementary to the target sequence. The guide sequence, also referred to herein as the spacer sequence, is typically 20 nucleotides long. The sequences of suitable guide RNAs for targeting Cas9:nucleic acid editing enzyme/domain fusion proteins to specific genomic target sites will be apparent to those of skill in the art based on the instant disclosure. Such suitable guide RNA sequences typically comprise guide sequences that are complementary to a nucleic acid sequence within 50 nucleotides upstream or downstream of the target nucleotide to be edited. Some exemplary guide RNA sequences suitable for targeting any of the provided fusion proteins to specific target sequences are provided herein. Additional guide sequences are well known in the art and can be used with the prime editors utilized in the methods and compositions described herein.
[0458] In some embodiments, a PEgRNA comprises three main component elements ordered in the 5ʹ to 3ʹ direction, namely: a spacer, a gRNA core, and an extension arm at the 3ʹ end. In some embodiments, the extension arm may further be divided into the following structural elements in the 5ʹ to 3ʹ direction, namely: an edit template, a homology arm, and a primer binding site. In some embodiments, the extension arm may further be divided into the following structural elements in the 5ʹ to 3ʹ direction, namely: a homology arm, an edit template, and a primer binding site. In some embodiments, the extension arm may further be divided into the following structural elements in the 5ʹ to 3ʹ direction, namely: a DNA synthesis template (e.g., a RT template), and a primer binding site.
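Purely as an illustrative sketch, the 5ʹ to 3ʹ ordering of the PEgRNA elements described above (spacer, gRNA core, then an extension arm made up of a DNA synthesis template followed by a primer binding site) can be modeled in Python. The class name `PegRNA`, its field names, and the restriction to the four unmodified bases A, C, G, and U are assumptions made for this sketch; the optional modifier regions, linkers, and terminator are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PegRNA:
    """Hypothetical container for PEgRNA elements, each written 5'->3'."""
    spacer: str
    grna_core: str
    synthesis_template: str   # e.g., an RT template
    primer_binding_site: str

    def __post_init__(self) -> None:
        # Reject empty elements and anything other than unmodified RNA bases.
        for name in ("spacer", "grna_core",
                     "synthesis_template", "primer_binding_site"):
            seq = getattr(self, name)
            if not seq or set(seq) - set("ACGU"):
                raise ValueError(f"{name} must be a non-empty A/C/G/U string")

    @property
    def extension_arm(self) -> str:
        # The primer binding site sits at the 3' end of the extension arm.
        return self.synthesis_template + self.primer_binding_site

    @property
    def sequence(self) -> str:
        # Full transcript: spacer, then gRNA core, then extension arm.
        return self.spacer + self.grna_core + self.extension_arm
```

The toy inputs used to exercise this sketch are arbitrary short strings, not sequences from the disclosure; a real design would supply an actual spacer, a gRNA core such as the one recited above, and an extension arm chosen for the intended edit.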
In addition, the PEgRNA may comprise an optional 3ʹ end modifier region and an optional 5ʹ end modifier region. Still further, the PEgRNA may comprise a transcriptional termination signal at the 3ʹ end of the PEgRNA. These structural elements are further defined herein. The depiction of the structure of the PEgRNA is not meant to be limiting and embraces variations in the arrangement of the elements. For example, the optional sequence modifiers could be positioned within or between any of the other regions shown, and are not limited to being located at the 3ʹ and 5ʹ ends.
PEgRNA modifications
[0459] The PEgRNAs may also include additional design modifications that may alter the properties and/or characteristics of PEgRNAs, thereby improving the efficacy of prime editing.
In various embodiments, these modifications may belong to one or more of a number of different categories, including but not limited to: (1) designs to enable efficient expression of functional PEgRNAs from non-polymerase III (pol III) promoters, which would enable the expression of longer PEgRNAs without burdensome sequence requirements; (2) modifications to the core, Cas9-binding PEgRNA scaffold, which could improve efficacy; (3) modifications to the PEgRNA to improve RT processivity, allowing the insertion of longer sequences at targeted genomic loci; and (4) addition of RNA motifs to the 5ʹ or 3ʹ termini of the PEgRNA that improve PEgRNA stability, enhance RT processivity, prevent misfolding of the PEgRNA, or recruit additional factors important for genome editing. Such modifications are described further, for example, in PCT publication WO 2022/067130, which is incorporated herein by reference.
Pharmaceutical compositions
[0460] Other aspects of the present disclosure relate to pharmaceutical compositions comprising any of the evolved Bxb1 recombinases, guide RNAs (including, e.g., PEgRNAs and ePEgRNAs), prime editors, and polynucleotides described herein. The term “pharmaceutical composition”, as used herein, refers to a composition formulated for pharmaceutical use. In some embodiments, the pharmaceutical composition further comprises a pharmaceutically acceptable carrier. In some embodiments, the pharmaceutical composition comprises additional agents (e.g., for specific delivery, increasing half-life, or other therapeutic compounds).
[0461] As used herein, the term “pharmaceutically-acceptable carrier” means a pharmaceutically-acceptable material, composition or vehicle, such as a liquid or solid filler, diluent, excipient, manufacturing aid (e.g., lubricant, talc, magnesium, calcium or zinc stearate, or stearic acid), or solvent encapsulating material, involved in carrying or transporting the compound from one site (e.g., the delivery site) of the body, to another site (e.g., an organ, tissue, or other part of the body). A pharmaceutically acceptable carrier is “acceptable” in the sense of being compatible with the other ingredients of the formulation and not injurious to the tissue of the subject (e.g., physiologically compatible, sterile, physiologic pH, etc.). Some examples of materials that can serve as pharmaceutically-acceptable carriers include: (1) sugars, such as lactose, glucose and sucrose; (2) starches, such as corn starch and potato starch; (3) cellulose, and its derivatives, such as sodium carboxymethyl cellulose, methylcellulose, ethyl cellulose, microcrystalline cellulose and cellulose acetate; (4) powdered tragacanth; (5) malt; (6) gelatin; (7) lubricating agents, such as magnesium stearate, sodium lauryl sulfate and talc; (8) excipients,
such as cocoa butter and suppository waxes; (9) oils, such as peanut oil, cottonseed oil, safflower oil, sesame oil, olive oil, corn oil, and soybean oil; (10) glycols, such as propylene glycol; (11) polyols, such as glycerin, sorbitol, mannitol and polyethylene glycol (PEG); (12) esters, such as ethyl oleate and ethyl laurate; (13) agar; (14) buffering agents, such as magnesium hydroxide and aluminum hydroxide; (15) alginic acid; (16) pyrogen-free water; (17) isotonic saline; (18) Ringer's solution; (19) ethyl alcohol; (20) pH buffered solutions; (21) polyesters, polycarbonates and/or polyanhydrides; (22) bulking agents, such as polypeptides and amino acids; (23) serum components, such as serum albumin, HDL, and LDL; (24) C2-C12 alcohols, such as ethanol; and (25) other non-toxic compatible substances employed in pharmaceutical formulations. Wetting agents, coloring agents, release agents, coating agents, sweetening agents, flavoring agents, perfuming agents, preservatives, and antioxidants can also be present in the formulation. The terms such as “excipient,” “carrier,” “pharmaceutically acceptable carrier,” or the like are used interchangeably herein.
[0462] In some embodiments, the pharmaceutical composition is formulated for delivery to a subject, e.g., for gene editing. Suitable routes of administering the pharmaceutical composition described herein include, without limitation: topical, subcutaneous, transdermal, intradermal, intralesional, intraarticular, intraperitoneal, intravesical, transmucosal, gingival, intradental, intracochlear, transtympanic, intraorgan, epidural, intrathecal, intramuscular, intravenous, intravascular, intraosseus, periocular, intratumoral, intracerebral, and intracerebroventricular administration.
[0463] In some embodiments, the pharmaceutical composition described herein is administered locally to a diseased site (e.g., tumor site).
In some embodiments, the pharmaceutical composition described herein is administered to a subject by injection, by means of a catheter, by means of a suppository, or by means of an implant, the implant being of a porous, non-porous, or gelatinous material, including a membrane, such as a silastic membrane, or a fiber.
[0464] In other embodiments, the pharmaceutical composition described herein is delivered in a controlled release system. In one embodiment, a pump may be used (see, e.g., Langer, 1990, Science 249:1527-1533; Sefton, 1989, CRC Crit. Ref. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used. (See, e.g., Medical Applications of Controlled Release (Langer and Wise eds., CRC Press, Boca Raton, Fla., 1974); Controlled Drug Bioavailability, Drug
Product Design and Performance (Smolen and Ball eds., Wiley, New York, 1984); Langer and Peppas, 1983, Macromol. Sci. Rev. Macromol. Chem. 23:61. See also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105). Other controlled release systems are discussed, for example, in Langer, supra.
[0465] In some embodiments, the pharmaceutical composition is formulated in accordance with routine procedures as a composition adapted for intravenous or subcutaneous administration to a subject, e.g., a human. In some embodiments, pharmaceutical compositions for administration by injection are solutions in sterile isotonic aqueous buffer. Where necessary, the pharmaceutical composition can also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water free concentrate in a hermetically sealed container such as an ampoule or sachet indicating the quantity of active agent. Where the pharmaceutical composition is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the pharmaceutical composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients can be mixed prior to administration.
[0466] A pharmaceutical composition for systemic administration may be a liquid, e.g., sterile saline, lactated Ringer’s or Hank’s solution. In addition, the pharmaceutical composition can be in solid forms and re-dissolved or suspended immediately prior to use. Lyophilized forms are also contemplated.
[0467] The pharmaceutical composition can be contained within a lipid particle or vesicle, such as a liposome or microcrystal, which is also suitable for parenteral administration.
The particles can be of any suitable structure, such as unilamellar or plurilamellar, so long as compositions are contained therein. Compounds can be entrapped in “stabilized plasmid-lipid particles” (SPLP) containing the fusogenic lipid dioleoylphosphatidylethanolamine (DOPE), low levels (5-10 mol%) of cationic lipid, and stabilized by a polyethyleneglycol (PEG) coating (Zhang Y. P. et al., Gene Ther. 1999, 6:1438-47). Positively charged lipids such as N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium methylsulfate, or “DOTAP,” are particularly preferred for such particles and vesicles. The preparation of such lipid particles is well known. See, e.g., U.S.
Patent Nos. 4,880,635; 4,906,477; 4,911,928; 4,917,951; 4,920,016; and 4,921,757; each of which is incorporated herein by reference.
[0468] The pharmaceutical compositions described herein may be administered or packaged as a unit dose, for example. The term “unit dose” when used in reference to a pharmaceutical composition of the present disclosure refers to physically discrete units suitable as unitary dosage for the subject, each unit containing a predetermined quantity of active material calculated to produce the desired therapeutic effect in association with the required diluent, i.e., carrier or vehicle.
[0469] Further, the pharmaceutical composition can be provided as a pharmaceutical kit comprising (a) a container containing a compound of the invention in lyophilized form and (b) a second container containing a pharmaceutically acceptable diluent (e.g., sterile water) for injection. The pharmaceutically acceptable diluent can be used for reconstitution or dilution of the lyophilized compound of the invention. Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use, or sale for human administration.
[0470] In another aspect, an article of manufacture containing materials useful for the treatment of the diseases described above is included. In some embodiments, the article of manufacture comprises a container and a label. Suitable containers include, for example, bottles, vials, syringes, and test tubes. The containers may be formed from a variety of materials such as glass or plastic. In some embodiments, the container holds a composition that is effective for treating a disease and may have a sterile access port. For example, the container may be an intravenous solution bag or a vial having a stopper pierce-able by a hypodermic injection needle.
The active agent in the composition is a compound of the invention. In some embodiments, the label on or associated with the container indicates that the composition is used for treating the disease of choice. The article of manufacture may further comprise a second container comprising a pharmaceutically-acceptable buffer, such as phosphate-buffered saline, Ringer's solution, or dextrose solution. It may further include other materials desirable from a commercial and user standpoint, including other buffers, diluents, filters, needles, syringes, and package inserts with instructions for use.
Polynucleotides, Vectors, Kits, and Cells
[0471] In some aspects, the present disclosure provides polynucleotides and vectors encoding any of the Bxb1 recombinases described herein. In some aspects, the present disclosure provides polynucleotides and vectors encoding prime editors and pegRNAs as disclosed herein. In some embodiments, the polynucleotides and vectors provided herein comprise DNA. In some embodiments, the polynucleotides and vectors provided herein comprise RNA. In some embodiments, the polynucleotides and vectors provided herein consist of DNA.
[0472] The evolved Bxb1 recombinases, guide RNAs (including pegRNAs and epegRNAs), prime editors, and compositions of the present disclosure may be assembled into kits. In some embodiments, the kit comprises polynucleotides for expression of the evolved Bxb1 recombinases, prime editors, pegRNAs, and/or epegRNAs described herein. In other embodiments, the kit further comprises appropriate guide nucleotide sequences or nucleic acid vectors for the expression of such guide nucleotide sequences, to target the Cas9 protein of the prime editors to the desired target sequence, e.g., to introduce a recombinase recognition sequence at the target site.
[0473] The kits described herein may include one or more containers housing components for performing the methods described herein, and optionally instructions for use. Any of the kits described herein may further comprise components needed for performing the prime editing methods described herein. Each component of the kits, where applicable, may be provided in liquid form (e.g., in solution) or in solid form (e.g., a dry powder). In certain cases, some of the components may be reconstitutable or otherwise processible (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water), which may or may not be provided with the kit.
[0474] In some embodiments, the kits may optionally include instructions and/or promotion for use of the components provided. As used herein, “instructions” can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc. The written instructions may be in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological
products, which can also reflect approval by the agency of manufacture, use, or sale for animal administration. As used herein, “promoted” includes all methods of doing business including methods of education, hospital and other clinical instruction, scientific inquiry, drug discovery or development, academic research, pharmaceutical industry activity including pharmaceutical sales, and any advertising or other promotional activity including written, oral, and electronic communication of any form, associated with the disclosure. Additionally, the kits may include other components depending on the specific application, as described herein.
[0475] The kits may contain any one or more of the components described herein in one or more containers. The components may be prepared sterilely, packaged in a syringe, and shipped refrigerated. Alternatively, they may be housed in a vial or other container for storage. A second container may have other components prepared sterilely. Alternatively, the kits may include the active agents premixed and shipped in a vial, tube, or other container.
[0476] The kits may have a variety of forms, such as a blister pouch, a shrink-wrapped pouch, a vacuum sealable pouch, a sealable thermoformed tray, or a similar pouch or tray form, with the accessories loosely packed within the pouch, one or more tubes, containers, a box, or a bag. The kits may be sterilized after the accessories are added, thereby allowing the individual accessories in the container to be otherwise unwrapped. The kits can be sterilized using any appropriate sterilization techniques, such as radiation sterilization, heat sterilization, or other sterilization methods known in the art.
The kits may also include other components, depending on the specific application, for example, containers, cell media, salts, buffers, reagents, syringes, needles, a fabric, such as gauze, for applying or removing a disinfecting agent, disposable gloves, a support for the agents prior to administration, etc. Some aspects of this disclosure provide kits comprising a nucleic acid construct comprising a nucleotide sequence encoding the prime editor systems described herein, or various components thereof (e.g., including, but not limited to, the napDNAbps, reverse transcriptase domains, and pegRNAs/epegRNAs). In some embodiments, the nucleotide sequence(s) comprises a heterologous promoter (or more than a single promoter) that drives expression of the prime editor system components.
[0477] Other aspects of this disclosure provide kits comprising one or more nucleic acid constructs encoding the various components of the prime editing system described herein. In some embodiments, the nucleotide sequence comprises a heterologous promoter that drives expression of the prime editing system components.
[0478] Cells that may contain any of the recombinases, guide RNAs, prime editors, and/or compositions described herein include prokaryotic cells and eukaryotic cells. The methods described herein may be used to deliver a recombinase and a prime editor and guide RNA into a eukaryotic cell (e.g., a mammalian cell, such as a human cell). In some embodiments, the cell is in vitro (e.g., a cultured cell). In some embodiments, the cell is in vivo (e.g., in a subject, such as a human subject). In some embodiments, the cell is ex vivo (e.g., isolated from a subject and may be administered back to the same or a different subject).
[0479] Mammalian cells of the present disclosure include human cells, primate cells (e.g., Vero cells), rat cells (e.g., GH3 cells, OC23 cells) or mouse cells (e.g., MC3T3 cells). There are a variety of human cell lines, including, without limitation, human embryonic kidney (HEK) cells, HeLa cells, cancer cells from the National Cancer Institute's 60 cancer cell lines (NCI60), DU145 (prostate cancer) cells, LNCaP (prostate cancer) cells, MCF-7 (breast cancer) cells, MDA-MB-438 (breast cancer) cells, PC3 (prostate cancer) cells, T47D (breast cancer) cells, THP-1 (acute myeloid leukemia) cells, U87 (glioblastoma) cells, SHSY5Y human neuroblastoma cells (cloned from a myeloma) and Saos-2 (bone cancer) cells. In some embodiments, prime editors and/or guide RNAs are delivered into human embryonic kidney (HEK) cells (e.g., HEK293 or HEK293T cells). In some embodiments, prime editors and/or guide RNAs are delivered into stem cells (e.g., human stem cells) such as, for example, pluripotent stem cells (e.g., human pluripotent stem cells including human induced pluripotent stem cells (hiPSCs)). A stem cell refers to a cell with the ability to divide for indefinite periods in culture and to give rise to specialized cells.
A pluripotent stem cell refers to a type of stem cell that is capable of differentiating into all tissues of an organism, but not alone capable of sustaining full organismal development. A human induced pluripotent stem cell refers to a somatic (e.g., mature or adult) cell that has been reprogrammed to an embryonic stem cell-like state by being forced to express genes and factors important for maintaining the defining properties of embryonic stem cells (see, e.g., Takahashi and Yamanaka, Cell 126 (4): 663–76, 2006, incorporated by reference herein). Human induced pluripotent stem cells express stem cell markers and are capable of generating cells characteristic of all three germ layers (ectoderm, endoderm, mesoderm).
[0480] Additional non-limiting examples of cell lines that may be used in accordance with the present disclosure include 293-T, 3T3, 4T1, 721, 9L, A-549, A172, A20, A253, A2780, A2780ADR, A2780cis, A431, ALC, B16, B35, BCP-1, BEAS-2B, bEnd.3, BHK-21, BR 293,
BxPC3, C2C12, C3H-10T1/2, C6, C6/36, Cal-27, CGR8, CHO, CML T1, CMT, COR-L23, COR-L23/5010, COR-L23/CPR, COR-L23/R23, COS-7, COV-434, CT26, D17, DH82, DU145, DuCaP, E14Tg2a, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, Hepa1c1c7, High Five cells, HL-60, HMEC, HT-29, HUVEC, J558L cells, Jurkat, JY cells, K562 cells, KCL22, KG1, Ku812, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-10A, MCF-7, MDA-MB-231, MDA-MB-435, MDA-MB-468, MDCK II, MG63, MONO-MAC 6, MOR/0.2R, MRC5, MTD-1A, MyEnd, NALM-1, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NW-145, OPCN/OPCT, Peer, PNT-1A/PNT 2, PTK2, Raji, RBL cells, RenCa, RIN-5F, RMA/RMAS, S2, Saos-2 cells, Sf21, Sf9, SiHa, SKBR3, SKOV-3, T-47D, T2, T84, THP1, U373, U87, U937, VCaP, WM39, WT-49, X63, YAC-1, and YAR cells.
[0481] Some aspects of this disclosure provide cells comprising any of the constructs disclosed herein. In some embodiments, a host cell is transiently or non-transiently transfected with one or more vectors described herein. In some embodiments, a cell is transfected as it naturally occurs in a subject. In some embodiments, a cell that is transfected is taken from a subject. In some embodiments, the cell is derived from cells taken from a subject, such as a cell line. A wide variety of cell lines for tissue culture are known in the art.
Examples of cell lines include, but are not limited to, C8161, CCRF-CEM, MOLT, mIMCD-3, NHDF, HeLa-S3, Huh1, Huh4, Huh7, HUVEC, HASMC, HEKn, HEKa, MiaPaCell, Panc1, PC-3, TF1, CTLL-2, C1R, Rat6, CV1, RPTE, A10, T24, J82, A375, ARH-77, Calu1, SW480, SW620, SKOV3, SK-UT, CaCo2, P388D1, SEM-K2, WEHI-231, HB56, TIB55, Jurkat, J45.01, LRMB, Bcl-1, BC-3, IC21, DLD2, Raw264.7, NRK, NRK-52E, MRC5, MEF, Hep G2, HeLa B, HeLa T4, COS, COS-1, COS-6, COS-M6A, BS-C-1 monkey kidney epithelial, BALB/3T3 mouse embryo fibroblast, 3T3 Swiss, 3T3-L1, 132-d5 human fetal fibroblasts; 10.1 mouse fibroblasts, 293-T, 3T3, 721, 9L, A2780, A2780ADR, A2780cis, A172, A20, A253, A431, A-549, ALC, B16, B35, BCP-1 cells, BEAS-2B, bEnd.3, BHK-21, BR 293, BxPC3, C3H-10T1/2, C6/36, Cal-27, CHO, CHO-7, CHO-IR, CHO-K1, CHO-K2, CHO-T, CHO Dhfr −/−, COR-L23, COR-L23/CPR, COR-L23/5010, COR-L23/R23, COS-7, COV-434, CML T1, CMT, CT26, D17, DH82, DU145, DuCaP, EL4, EM2, EM3, EMT6/AR1, EMT6/AR10.0, FM3, H1299, H69, HB54, HB55, HCA2, HEK-293, HeLa, Hepa1c1c7, HL-60, HMEC, HT-29, Jurkat, JY cells, K562 cells, Ku812, KCL22, KG1, KYO1, LNCap, Ma-Mel 1-48, MC-38, MCF-7, MCF-10A, MDA-MB-231, MDA-MB-468, MDA-MB-435, MDCK II, MOR/0.2R, MONO-MAC 6, MTD-1A, MyEnd, NCI-H69/CPR, NCI-H69/LX10, NCI-H69/LX20, NCI-H69/LX4, NIH-3T3, NALM-1, NW-145, OPCN/OPCT cell lines, Peer, PNT-1A/PNT 2, RenCa, RIN-5F, RMA/RMAS, Saos-2 cells, Sf-9, SkBr3, T2, T-47D, T84, THP1 cell line, U373, U87, U937, VCaP, Vero cells, WM39, WT-49, X63, YAC-1, YAR, and transgenic varieties thereof.
[0482] Cell lines are available from a variety of sources known to those with skill in the art (see, e.g., the American Type Culture Collection (ATCC) (Manassas, Va.)). In some embodiments, a cell transfected with one or more vectors described herein is used to establish a new cell line comprising one or more vector-derived sequences. In some embodiments, a cell transiently transfected with the components of a CRISPR system as described herein (such as by transient transfection of one or more vectors, or transfection with RNA), and modified through the activity of a CRISPR complex, is used to establish a new cell line comprising cells containing the modification but lacking any other exogenous sequence. In some embodiments, cells transiently or non-transiently transfected with one or more vectors described herein, or cell lines derived from such cells, are used in assessing one or more test compounds.
EXAMPLES
Example 1. Evolution of Bxb1 Recombinase Variants
[0483] Large serine recombinases (LSRs) are a group of DNA integrases encoded by bacteriophage that can integrate large DNA molecules into bacterial genomes. They harbor tremendous potential for precise gene integration to study gene regulation, generate transgenic models, and treat genetic diseases. Various applications of LSRs are listed in FIG. 1. The biggest barrier preventing the use of LSRs for large gene insertion is their low DNA integration efficiency. Previous reports have shown that the most widely used LSR, Bxb1, has an integration efficiency of around 20% in mammalian cells.
Due to its complex mechanism of action and large size, rational mutagenesis of Bxb1 to enhance enzymatic activity is a difficult process, and directed evolution platforms are better suited for this purpose. Thus, phage-assisted continuous and non-continuous evolution (PACE and PANCE) were used to enhance the integration efficiency of the Bxb1 recombinase.
[0484] To perform the evolution, two evolution circuits were first designed and established: 1) a plasmid integration circuit, and 2) a cassette exchange circuit. The schematic and description of
the two circuits are noted in FIGs. 2A-2B. In both selection circuits, M13 bacteriophage survival was linked with the successful integration of a promoter sequence in front of gIII, a gene necessary for phage infectivity and detachment. Using these circuits, seven individual evolution campaigns were performed (FIG. 3), each consisting of multiple rounds of selection, and many convergent mutations were observed (FIG. 4). The variants from these evolutions were then cloned into mammalian expression vectors and transfected into HEK293T cells to test their performance. The variants were tested in cell lines that have the Bxb1 attachment sites installed in them (FIGs. 5, 12, 13, and 14) as well as in the one-pot system (FIGs. 6, 8, and 10). The one-pot system refers to the transfection of prime editor, pegRNAs, recombinase, and donor DNA at the same time. The prime editor first installs the Bxb1 attachment site into the genomic locus of interest. The recombinase then integrates the desired DNA cargo. This method allows for precise, programmable gene integration in mammalian cells.
[0485] Through evolution, >80 different Bxb1 variants that exhibit improved activity over the wild-type (WT) enzyme were observed (FIGs. 7, 9, 11, and 15). Some of these variants improve Bxb1 activity by greater than 3-fold. The activity of Bxb1 was also further improved by rationally combining mutations from the evolution (FIGs. 16, 17, and 18). Rationally combined variants with improved activity are listed in FIG. 19.
[0486] To further improve the performance of Bxb1 for programmable gene integration, plasmid dosage and ratios were also optimized, and new-generation prime editors were tested (FIGs. 20, 21, 24, and 25). These optimized systems offer ~20% integration efficiency in the one-pot system. Lastly, the RT templates of the dual pegRNAs used to install the attachment site were also optimized to reduce background recombination that occurs between the pegRNA and donor plasmids (FIGs. 22 and 23).
Collectively, multiple Bxb1 variants with improved integration efficiencies were generated, and these variants can be harnessed for applications including programmable gene integration and other targeted genetic changes (e.g., DNA inversion) in mammalian cells.
Example 2. Phage-assisted evolution of Bxb1 recombinase enhances prime editing-mediated programmable large-gene integration in mammalian cells
[0487] Mutations that contribute to diseases in humans range from single nucleotide changes to large deletions, inversions, translocations, and duplications1-3. For many genetic diseases, a
variety of loss-of-function mutations within a specific gene can cause pathogenesis: for instance, more than 500 ABCA4 gene variants, 1,000 PAH gene variants, and 2,000 CFTR gene variants have been reported in patients with Stargardt disease, phenylketonuria, and cystic fibrosis, respectively4-6. The ability to integrate full-length healthy genes or cDNAs into their endogenous loci could, in principle, serve as a single therapeutic strategy for patients with many different pathogenic alleles. Integration into the native locus could preserve physiological control of gene expression, evading overexpression issues associated with viral vector-mediated gene therapy that can induce pathology7-10.
[0488] Motivated by this potential, the development of technologies that can efficiently and precisely integrate large DNA sequences into the mammalian genome in live cells at specified target sites has been a longstanding goal of the field11. Although programmable nucleases followed by either random end-joining or homology-directed repair can be used to perform targeted DNA integration, these approaches require double-stranded breaks that can induce undesired consequences such as target locus deletion or chromosomal translocations, suffer from low integration efficiencies, and typically generate a high frequency of uncontrolled indels, reversed-orientation cargo byproducts, and multimeric insertions12-20. The recent discovery and characterization of CRISPR-associated transposase systems (CASTs) show great promise for programmable integration but currently suffer from low efficiency in mammalian cells (≤2% genomic integration for Type-I CAST systems21,22, while Type-V-K CAST systems have thus far not achieved integration into mammalian cell genomic DNA targets23,24).
[0489] Recently, prime editing-assisted site-specific integrase gene editing (PASSIGE), a platform that uses prime editing and site-specific recombinases to integrate multi-kilobase DNA cargo into targeted sites in the mammalian genome with up to 6.8% efficiency following a single transfection, was reported25 (FIG. 26A). In PASSIGE, single-flap or dual-flap prime editing installs a site-specific recombinase landing site into a target genomic location25,26. The corresponding recombinase then catalyzes the insertion of the cargo DNA into the landing site, resulting in targeted integration. PASSIGE can be performed with a single transfection by simultaneously delivering the prime editor, pegRNA(s), nicking sgRNA, donor DNA plasmid, and recombinase (LSR), or using two successive transfections to perform the prime editing step and recombination at different times. A similar method that uses prime editing and site-specific recombinases, PASTE, was later described by Yarnall, Ioannidi, Schmitt-Ulms, and coworkers27.
[0490] Current programmable, large gene-integration platforms with reported activity in mammalian cells exhibit low or modest integration efficiencies. In PASSIGE, even though the recombinase attachment site is effectively installed into the genome using dual-flap PE (typically >50%), overall integration efficiencies remain modest (2.6%-6.8%), indicating that the recombination step mediated by Bxb1 recombinase strongly constrains integration yields in mammalian cells25. Indeed, previous data have demonstrated that in a clonal HEK293T cell line with a homozygous recombinase attachment site, Bxb1-mediated recombination results in a maximum of 17% integration25. These observations suggest the opportunity to improve recombinase-mediated targeted gene integration through evolving and engineering recombinase enzymes.
[0491] Presented in this Example is the development of a phage-assisted continuous and non-continuous evolution (PACE and PANCE)28,29 selection for recombinase activity and its use to evolve Bxb1 to yield much higher PASSIGE efficiencies. Among dozens of Bxb1 variants with improved activity, the best evolved variant, evoBxb1, achieved a 3.2-fold average improvement in genomic integration efficiencies in human cells at pre-installed recombinase attachment sites. Evolved mutations from different domains were also combined to generate an even more active variant, eeBxb1.
[0492] The use of PASSIGE with evoBxb1 or eeBxb1 is referred to herein as evoPASSIGE or eePASSIGE, respectively. Across 12 genomic loci in mammalian cells, evoPASSIGE and eePASSIGE demonstrate a 2.7-fold and 4.2-fold average improvement in targeted large DNA integration efficiencies over PASSIGE, and both variants strongly outperform PASTE in side-by-side comparisons by an average of 9.1-fold and 16-fold. PASSIGE variants developed in this study can achieve >30% integration of multi-kb gene-sized cargo at both safe-harbor and therapeutic loci following a single transfection.
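The bottleneck argument in paragraph [0490] can be illustrated with a short calculation. The sketch below assumes a simple two-step multiplicative model (overall integration = attachment-site installation × recombination), which is not stated in the text and is used here only for illustration; the function name and the 50% installation figure (taken as a lower bound from "typically >50%") are likewise assumptions.

```python
def implied_recombination_yield(overall_pct: float, installation_pct: float) -> float:
    """Return the recombination yield (%) implied by a two-step multiplicative model.

    Assumed model (not stated in the text): overall = installation x recombination.
    """
    if not 0.0 < installation_pct <= 100.0:
        raise ValueError("installation_pct must be in (0, 100]")
    if not 0.0 <= overall_pct <= installation_pct:
        raise ValueError("overall_pct must lie between 0 and installation_pct")
    return 100.0 * overall_pct / installation_pct


# Reported range of overall PASSIGE integration efficiencies (percent).
reported_overall = (2.6, 6.8)
# Installation is reported as typically >50%; 50% is used as a lower bound,
# so each computed yield is an upper bound under the assumed model.
installation_lower_bound = 50.0

upper_bounds = [
    implied_recombination_yield(overall, installation_lower_bound)
    for overall in reported_overall
]
for overall, bound in zip(reported_overall, upper_bounds):
    print(f"overall {overall:.1f}% -> recombination yield <= {bound:.1f}%")
```

Under this assumed model, the implied recombination yields are at most about 5.2% to 13.6%, which is below the 17% maximum reported for the clonal cell line and is consistent with recombination, rather than attachment-site installation, limiting overall integration.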
Results
Development of a recombinase PACE circuit
[0493] PACE and PANCE28,29 are methods for rapidly evolving proteins with diverse functions33 (FIG. 26B). During PACE and PANCE, gene III, which encodes pIII, a protein essential for phage replication, is removed from the M13 filamentous bacteriophage and replaced with the protein being evolved to generate the selection phage (SP). In a fixed-volume vessel (‘lagoon’), the SP infects host Escherichia coli cells that harbor accessory plasmids linking the activity of
the protein being evolved to the expression of gene III, and a mutagenesis plasmid (MP) that constantly mutagenizes the phage genome after infection. During PACE, the SP is continuously diluted with fresh host E. coli cells. Only SP that encode proteins with the desired activity persist within the lagoon, while those that encode inactive variants are rapidly diluted out of the lagoon28. During PANCE, the SP is diluted in discrete steps, typically every 12-24 hours. PANCE is less stringent than PACE and can be helpful in the early phases of evolution when variants have low initial activity29.
[0494] To link the activity of Bxb1-mediated recombination to gene III expression and subsequent phage propagation, two selection circuits were developed (FIG. 26C). In both circuits, the selection phage encodes Bxb1, and the host E. coli cells harbor a plasmid P1 with a promoter sequence and a plasmid P2 with a promoter-less gene III cassette. In circuit 1, the promoter sequence and the sequence upstream of gene III are both placed between two recombinase attachment sites. Upon expression of Bxb1, two recombination events occur between P1 and P2 such that the promoter is placed in front of gene III, driving gene III expression. In contrast, circuit 2 has one recombinase attachment site present in each plasmid, and a single Bxb1-mediated recombination event integrates plasmids P1 and P2, also resulting in placement of the promoter upstream of gene III. It was anticipated that circuit 1 would be more stringent than circuit 2 as two recombination events are required for phage propagation in circuit 1.
[0495] To identify the best evolution strategy to evolve Bxb1, four different sub-circuits (1.1-1.4) were established for circuit 1 and two different sub-circuits (2.1-2.2) were established for circuit 2 (FIG. 31).
In sub-circuits 1.1, 1.2, 2.1, and 2.2, attB, one DNA landing site substrate for Bxb1, was placed in plasmid P1, and attP, the partner DNA landing site substrate for Bxb1, was placed in plasmid P2. In sub-circuits 1.3 and 1.4, attP was instead placed in plasmid P1, and attB was placed in plasmid P2. It is known that the central dinucleotide of the attachment site for the Bxb1 recombinase can be either GT or GA34. To test this variable, sub-circuits 1.2, 1.4, and 2.2 contained a GA instead of the canonical GT central dinucleotide.
Evolution of the Bxb1 recombinase
[0496] Next, PANCE of the Bxb1 recombinase was performed in all six sub-circuits in parallel. Throughout the evolution, stringency was increased by reducing the time between serial dilution events from 12 to 4 hours and increasing dilution ratios between passages from 50:1 to 5000:1.
After six PANCE passages, phage across all six PANCE lagoons began to propagate from approximately 1-fold to >20,000-fold overnight, suggesting that they had enriched Bxb1 variants with improved activity (FIG. 26D and FIG. 31). Sequencing of individual phage revealed mutational convergence, even across different circuits (FIGs. 40A-40B), although the number of mutations that emerged was sparse, suggesting the opportunity for further improvement through evolution.
[0497] Since the evolution trajectory of sub-circuit 1.3 suggested it to be the most stringent among those tested, the evolution campaign was continued using this circuit. Selection stringency was further increased by decreasing gIII expression, and PANCE was continued for four additional passages (FIG. 27A and FIGs. 32A-32C). The resulting phage pools emerging from PANCE were evolved for an additional 132 hours using PACE (FIG. 32B). Throughout PACE, the selection stringency was increased in the lagoons by increasing the flow rate from 0.5 to 3.0 volume/hour. Finally, the phage pools surviving PACE were subjected to six additional passages of PANCE on a more stringent circuit in which the size of the P1 donor plasmid was increased from 3.2 kb to 6.5 kb (FIG. 32C). Overall, selection phage encoding Bxb1 emerging after the entire evolution process described above survived an average total dilution of ~10^150, and sequencing of individual phage revealed a high degree of mutational convergence (FIGs. 41-43).
Characterization of evolved variants in mammalian cells with pre-installed attachment sites
[0498] Forty unique evolved Bxb1 variants were cloned into mammalian expression vectors and tested in human HEK293T cells generated by prime editing that were either homozygous for attP insertion at AAVS1 or homozygous for attB insertion at CCR5. These evolved variants were tested alongside wild-type (WT) Bxb1 used in PASSIGE and a catalytically inactive Bxb1 variant (dead Bxb1, S10A, Y154C)35.
Clonal HEK293T cells were co-transfected with the recombinase plasmid along with a 5.6-kb donor plasmid containing either an attP or attB landing site. After 72 hours, cells were harvested, and integration efficiencies were assessed by droplet digital polymerase chain reaction (ddPCR). Nearly all evolved variants (39/40) showed enhanced integration efficiencies over WT Bxb1, and the top 15 variants showed improvements exceeding 2.4-fold (FIG. 27B and Table 6). Notably, the highest-activity evolved recombinase (Bxb1 V105I) supported 60% and 39% integration efficiencies at genomic sites AAVS1 and
CCR5, respectively, compared to 18% and 12% with WT Bxb1, corresponding to a 3.3-fold average improvement (FIG. 27C).
Mapping beneficial mutations onto the AlphaFold-predicted structure of Bxb1
[0499] To hypothesize potential roles of the evolved mutations that improve integration efficiencies, the mutations were mapped onto an AlphaFold2 (ref. 36) predicted structure of the Bxb1 recombinase. The N-terminal domain (NTD), C-terminal domain-a (CTD-a), and C-terminal domain-b (CTD-b) of the recombinase all aligned well with previously solved structures of serine recombinases (PDB: 1ZR4 (ref. 37), 6DNW (ref. 38), and 4KIS (ref. 39)) (FIG. 33A). Despite being the smallest domain, the catalytic NTD harbored 30 unique mutations, more than any other domain. The DNA-binding CTD-a and CTD-b harbored 12 and 13 distinct mutations, respectively, and the linker connecting the two domains contained two mutations (FIG. 27D).
[0500] Interestingly, the top 15 performing variants all contained a mutation in the NTD, and upon docking the DNA substrate of the gammadelta resolvase tetramer (ref. 37; PDB: 1ZR4) onto the predicted structure of the NTD, it was found that most of these mutations are predicted to be in flexible loops of the enzyme, close to the active site and the DNA substrate (FIG. 33B). All mutated residues that were present in the flexible regions of the enzyme were also surface-exposed (FIG. 33C). Other conserved mutations that led to the highest improvements in integration efficiencies, including V105I, L29F, V74A, and W35L, were all clustered at the core of the protein (FIG. 27E). The position of these mutations suggests that they likely stabilize the core of the NTD: for example, the change of Val to Ile at position 105 and Leu to Phe at position 29 may help stabilize the bulky, hydrophobic Trp residue at position 35 (FIG. 33D). Collectively, these observations suggest that the evolved variants may enhance integration by optimizing the conformation of active site residues or by improving protein stability.
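The 3.3-fold average improvement reported above for Bxb1 V105I can be checked from the stated integration efficiencies. The following is a minimal sketch that assumes "average" means the arithmetic mean of the per-locus fold changes; the text does not specify the averaging method, and the variable and function names are illustrative only.

```python
from statistics import mean

# Integration efficiencies (percent) as stated in the text.
v105i = {"AAVS1": 60.0, "CCR5": 39.0}
wild_type = {"AAVS1": 18.0, "CCR5": 12.0}


def fold_improvements(variant: dict, reference: dict) -> dict:
    """Per-locus fold change of a variant's efficiency over a reference."""
    return {locus: variant[locus] / reference[locus] for locus in variant}


folds = fold_improvements(v105i, wild_type)
# Assumption: the reported average is the arithmetic mean across loci.
average_fold = mean(folds.values())

for locus, fold in folds.items():
    print(f"{locus}: {fold:.2f}-fold")
print(f"average: {average_fold:.1f}-fold")
```

With these inputs the per-locus values are about 3.33-fold (AAVS1) and 3.25-fold (CCR5), and their mean rounds to 3.3-fold, matching the figure in the text.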
Characterization of evolved variants for PASSIGE
[0501] As reported previously²⁵, reducing the 3' flap overlap between the two twinPE pegRNAs improves PASSIGE integration efficiencies due to reduced recombination between the donor DNA plasmid and pegRNA-encoding plasmid (FIG. 34A). To identify the ideal overlap length for installation of either attP or attB into the human genome, HEK293T cells were co-transfected with plasmids encoding PEmax and twinPE pegRNAs with overlap lengths ranging from 8 bp to 50 bp for attP, and from 12 bp to 38 bp for attB installation. It was found that 3' flap overlap lengths could be truncated to 28 bp for attP and 20 bp for attB installation without any decrease in installation efficiencies or increase in indels at both the AAVS1 and CCR5 loci (FIG. 34B). Unless otherwise stated in Tables 10A-10D, all experiments described below used an overlap length of 28 bp and 20 bp for attP and attB installation, respectively. Using these overlap lengths greatly reduces the complexity of pegRNA design as it defines the reverse transcriptase template (RTT, or DNA synthesis template) sequence when installing either attachment site into the genome.
[0502] Next, the ten highest-efficiency Bxb1 variants were picked from the experiment in FIG. 27C and were tested in PASSIGE at the AAVS1 and CCR5 loci in which prime editing installed either an attP or attB sequence into the genome, respectively. HEK293T cells were co-transfected with a 5.6-kB donor DNA plasmid along with plasmids encoding either WT Bxb1 or an evolved Bxb1 variant, PEmax, and dual pegRNAs for twinPE²⁵. After 72 hours, integration efficiencies were assessed using ddPCR. All ten variants showed improvements >2-fold compared to WT Bxb1, and the highest-performing variant, Bxb1 V74A, showed a 2.8-fold and 3.9-fold improvement in integration efficiency at the AAVS1 and CCR5 loci, respectively (FIG. 28A). Hereinafter, the Bxb1 V74A variant is referred to as evoBxb1, and the use of evoBxb1 for PASSIGE is referred to as evoPASSIGE.
Combining evolved mutations to further enhance integration efficiency
[0503] Since evoBxb1 as well as several other top-performing variants harbor only a single mutation, often located in the NTD of the enzyme, the integration efficiencies of Bxb1 variants were next evaluated with combined evolved NTD, CTD-a, and CTD-b mutations (FIG. 27D). Nineteen triple-mutant variants were generated that individually harbored one mutation in each domain of the protein and were tested alongside WT Bxb1 in cell lines with pre-installed attP at the AAVS1 locus or attB at the CCR5 locus.
In both cell lines, combining evoBxb1 with E229K and V375I resulted in the highest integration efficiencies when knocking in a 5.6-kB donor DNA plasmid (FIG. 35A and Table 7). Hereinafter, this evolved and engineered triple-mutant variant is referred to as eeBxb1, and the use of eeBxb1 for PASSIGE is referred to as eePASSIGE.
[0504] The performance of single-transfection eePASSIGE, evoPASSIGE, and PASSIGE was compared side-by-side at the AAVS1 and CCR5 sites in HEK293T cells, and the highest integration efficiencies were observed with eePASSIGE: 30% and 24% at the AAVS1 and CCR5 loci, respectively, compared to 26% and 21% with evoPASSIGE, and 9.3% and 5.5% with PASSIGE (FIG. 28B). In mouse N2a cells at the safe-harbor locus Rosa26, an even more pronounced difference was observed: eePASSIGE integrated a 5.6-kB donor DNA with 20% efficiency compared to 9.5% with evoPASSIGE and 3.2% with PASSIGE, corresponding to 2.9-fold and 6.2-fold improvements over PASSIGE for evoPASSIGE and eePASSIGE, respectively (FIG. 28B). For the Rosa26 site, PE6d⁴⁰ was used instead of PEmax to install the attB sequence.
Off-target profiling of evolved Bxb1 variants
[0505] To assess off-target integration of the new Bxb1 variants, HEK293T cells were co-transfected with either dead Bxb1, WT Bxb1, evoBxb1, or eeBxb1 along with an attP- or attB-containing donor DNA plasmid encoding mCherry. Post-transfection, cells were passaged for two weeks, and then flow cytometry was performed to assess the percentage of mCherry+ cells. It was reasoned that after two weeks, most of the donor DNA plasmid would be diluted out, and mCherry expression detected above background (as measured by the signal with dead Bxb1) could be attributed to stable integration of the mCherry cassette into an off-target site in the genome.
[0506] A very low percentage of mCherry+ cells was observed for dead Bxb1, and no statistically significant integration above background was observed for the WT Bxb1 or evoBxb1 when transfecting either donor plasmid (P > 0.4), indicating the absence of off-target activity (FIG. 28C). For eeBxb1, a significant increase in mCherry expression above background was observed when transfecting an attP-donor (P < 0.001), suggesting that this highly active variant may recognize and integrate its cargo into attB-resembling sequences in the genome. In contrast, no off-target activity was detected when transfecting eeBxb1 with an attB-donor (P = 0.2). This discrepancy may be attributed to the fact that the minimal attP sequence required for recombination is 10 bp longer than that for attB, making its occurrence in random DNA approximately 4¹⁰ (about one million) times rarer. The identities of these ten bases in attP are also the most important for Bxb1-mediated recombination⁴¹.
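By way of non-limiting illustration, the rarity estimate in the preceding paragraph can be reproduced with a short calculation. The sketch below assumes, purely for illustration, that bases in random DNA are independent and equally likely, so each additional specified position reduces the expected frequency of a matching site by a factor of four. The function name `rarity_factor` is hypothetical and is not part of any method described herein.

```python
# Illustrative arithmetic only.
# Assumption (not from the specification): A, C, G and T are independent
# and equally likely at every position of random DNA.

def rarity_factor(extra_bases: int, alphabet_size: int = 4) -> int:
    """Fold-reduction in expected random matches when a recognition
    sequence requires `extra_bases` additional specified positions."""
    if extra_bases < 0:
        raise ValueError("extra_bases must be non-negative")
    return alphabet_size ** extra_bases


if __name__ == "__main__":
    # The minimal attP sequence is described as 10 bp longer than attB.
    factor = rarity_factor(10)
    print(f"4^10 = {factor:,} (roughly one million-fold rarer)")
```

Under this simplified model, a sequence with ten additional required bases is expected about 4^10 = 1,048,576 times less often in random DNA, which matches the order of magnitude stated above. Real genomes are not uniformly random, so the figure is best read as an estimate.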
[0507] To identify which CTD mutations contribute to off-target integration when transfecting an attP-donor with eeBxb1, variants V74A (evoBxb1), V74A+E229K, V74A+V375I, and V74A+E229K+V375I (eeBxb1) were tested alongside dead Bxb1. E229K was identified as the cause of off-target integration since the addition of this mutation significantly increased mCherry expression above background (P < 0.0001) (FIG. 35B). Docking the DNA substrate from the Listeria innocua prophage serine recombinase³⁸ (PDB: 6DNW) onto the AlphaFold2³⁶ predicted structure of the CTD-a domain of Bxb1 suggests that this negatively charged Glu side chain is located near the DNA substrate (FIG. 35C). Mutating this residue to a positively charged lysine may increase the affinity of the recombinase to the negatively charged DNA phosphate backbone in a sequence-independent manner, resulting in more potent DNA engagement and increased integration efficiency but reduced sequence specificity.
[0508] Given these results, it is recommended that evoPASSIGE be used when installing attB into the genome and eePASSIGE be used when installing attP into the genome. These strategies should offer the highest integration efficiencies while minimizing off-target events.
Further characterization of PASSIGE variants
[0509] To determine whether the identity of the attachment site in the genome affects integration, PE6 prime editor variants⁴⁰ were screened to further optimize installation efficiencies (FIG. 36), and then donor integration when installing either attP or attB into the genome was evaluated. When installing attP, higher integration efficiencies were observed at AAVS1 and ACTB but not CCR5 and Rosa26, indicating that the optimal choice of landing site (attP or attB) to install using prime editing is locus-dependent (FIG. 29A).
[0510] Next, PASSIGE variants were compared side-by-side with PASTE, a similar technology reported to have improved integration efficiencies over PASSIGE. PASTE differs from PASSIGE by using 1) a pegRNA scaffold mutant previously described by Wu and coworkers⁴² (atgRNAv2), 2) a different linker (an XTEN-48 linker) between the Cas9 and reverse transcriptase (RT), 3) the previously characterized²⁶ L139P mutation added to the engineered M-MLV RT in PE2, and 4) a mutated attP sequence²⁷. Each of these optimizations was systematically tested in PASSIGE, evoPASSIGE, and eePASSIGE systems, but no consistent improvements in targeted integration were observed across multiple genomic loci in HEK293T cells (FIGs. 37A-37C, 37E).
The atgRNAv2 scaffold slightly improved integration on a case-by-case basis (FIG. 37A), but the Cas9–RT linker, L139P mutation in the M-MLV RT, and attP mutant reduced integration efficiencies across all sites tested (FIGs. 37B-37C, 37E).
[0511] In PASTE, fusing the recombinase to the prime editor protein was reported to substantially improve integration²⁷. However, when WT Bxb1, evoBxb1, or eeBxb1 was fused to PEmax, a substantial decrease in integration efficiencies was observed in all cases with the fused prime editor–recombinase compared to unfused prime editor plus recombinase (FIG. 37D). This trend persisted when the recombinase was replaced in PASTE with Bxb1 variants generated in this Example (FIG. 37F). These observations are consistent with the mechanism of prime editing and recombinase-mediated integration, in which the prime editor must vacate the target site before the recombinase can act on the installed sequence. Indeed, when PASTE was directly compared with PASSIGE, evoPASSIGE, and eePASSIGE across three genomic loci (AAVS1, CCR5, and ACTB) in human HEK293T cells, it was observed that PASSIGE, evoPASSIGE, and eePASSIGE all outperformed PASTE by an average of 2.5-fold, 6.1-fold, and 10-fold, respectively (FIG. 29B).
Characterization of PASSIGE variants at therapeutic sites
[0512] The above experiments evaluated evolved Bxb1 variants at safe-harbor loci AAVS1, CCR5, and Rosa26⁴³ and at the highly expressed essential gene ACTB⁴⁴. Next, the ability of PASSIGE to integrate gene-sized cargo into eight therapeutically relevant endogenous genomic sites was tested in HEK293T and N2a cells. These sites included (i) ALB, a highly expressed gene in the liver that has been used to express clinically relevant protein levels for loss-of-function diseases³⁰,⁴⁵, (ii) B2M and TRAC, which have been used to express chimeric antigen receptors for CAR-T cell therapy⁴⁶, and (iii) CFTR, GBA1, COL7A1, FANCA, and Smn1, which are implicated in cystic fibrosis⁴, Gaucher disease⁴⁷ and Parkinson’s disease⁴⁸, dystrophic epidermolysis bullosa⁴⁹, Fanconi anemia⁵⁰, and spinal muscular atrophy⁵¹, respectively. PegRNAs were designed and PE6 variants were tested to install both attB and attP into each locus (FIG. 38). For B2M and TRAC, the attachment sites were installed into the 5' untranslated region surrounding the start codon, since disruption of these genes increases therapeutic potency⁴⁶. For COL7A1, the attachment sites were installed into intron 4 of the gene as the majority of disease-causing mutations are located after exon 4⁴⁹. For all other genes, the attachment site was installed into intron 1.
[0513] Next, a 5.6-kB donor plasmid was integrated into all eight therapeutically relevant loci using PASSIGE, evoPASSIGE, eePASSIGE, and PASTE.
Evo- and ee-PASSIGE showed substantial improvements in integration over PASSIGE at all eight target sites, and PASSIGE substantially outperformed PASTE at all sites tested (FIG. 29C). Across all eight target sites in human and mouse cells, evoPASSIGE, eePASSIGE, PASSIGE, and PASTE mediated targeted donor DNA integration with an average efficiency of 22%, 17%, 7.8%, and 3.8%, respectively.
[0514] Then, the fold-change in integration efficiencies of evolved Bxb1 variants was assessed across all 12 genomic sites used in this Example. Averaged across all 12 sites, evoPASSIGE and eePASSIGE outperformed PASSIGE by 2.7-fold and 4.2-fold, respectively (FIG. 29D). PASSIGE, evoPASSIGE, and eePASSIGE outperformed PASTE by an average of 3.3-fold, 9.1-fold, and 16.2-fold, respectively (FIG. 29E). When using strategies with no detected off-target integration, greater than 15% cargo knock-in was observed at all 12 sites using either evo- or ee-PASSIGE (FIG. 29F). Notably, at AAVS1, ACTB, and FANCA, greater than 30% integration was observed, and at B2M, GBA1, COL7A1, CFTR, Smn1, and CCR5, greater than 20% integration was observed. Overall, these findings demonstrate that evoPASSIGE and eePASSIGE achieve higher targeted multi-kb donor DNA integration efficiencies in human and mouse cells than previously reported methods at multiple safe-harbor and therapeutic loci.
Integration of therapeutic DNA cargo using PASSIGE variants
[0515] Next, therapeutic gene cargoes were integrated into multiple genomic loci optimized in FIG. 29C using PASSIGE, eePASSIGE, and PASTE. Across all sites, either PEmax or a PE6 variant was used to install the attP recombinase landing site (FIG. 30A). The following were integrated: 1) a 6.1-kB plasmid encoding GBA1 cDNA (∆ exon 1) into intron 1 of GBA1, 2) an 8.8-kB plasmid encoding FANCA cDNA (∆ exon 1) into intron 1 of FANCA, 3) a 5.9-kB plasmid encoding a CD19-CAR cassette⁵²,⁵³ into the 5' UTR regions of TRAC and B2M, 4) a 7.1-kB plasmid encoding the human Factor IX (F9) cDNA (∆ exon 1) into intron 1 of ALB, and 5) a 5.3-kB plasmid encoding Smn1 cDNA (∆ exon 1) into intron 1 of Smn1. In all five cases, eePASSIGE resulted in the highest integration efficiencies (32% average integration and a minimum of 23% integration across all sites). At GBA1, B2M, and ALB, therapeutic cargoes were integrated with >30% efficiency, and at FANCA, cargo knock-in reached 46% (FIG. 30A). Consistent with the above observations in other genomic loci and human cell types, PASTE yielded the lowest integration efficiencies among all tested methods, averaging 4.4%.
[0516] To assess whether the knock-in of therapeutic cargoes led to successful protein production, the integration and expression of F9 were next assessed in hepatocyte-derived HuH7 cells, in which albumin is expressed. The attB site was installed into intron 1 of ALB, and the F9 cDNA (∆ exon 1) was then integrated using PASSIGE variants along with a minicircle DNA donor that is free of bacterial DNA sequences⁵⁴. The F9 minicircle encoded an attP motif and a splice acceptor, followed by the F9 cDNA (∆ exon 1) and 3' UTR sequence. After cargo knock-in at intron 1, splicing between the secretion signal of ALB exon 1 and the integrated F9 cDNA leads to F9 expression and release²⁵,⁴⁵. Next, ELISA was performed on conditioned media to detect human F9 expression 9 days after transfection. Average F9 levels of 0.79, 4.0, and 6.9 ng/mL were observed upon PASSIGE, evoPASSIGE, and eePASSIGE treatment, respectively. EvoPASSIGE and eePASSIGE thus showed overall 5.0-fold and 8.7-fold higher F9 expression than PASSIGE, respectively (FIG. 30B).
[0517] Collectively, these results demonstrate that evoPASSIGE and eePASSIGE are robust, programmable large DNA integration platforms capable of mediating targeted gene integration at a wide variety of therapeutically relevant loci with efficiencies suitable for many therapeutic applications.
Discussion
[0518] Targeted integration of large DNA payloads into the genome has been a long-standing challenge, and existing approaches to this goal, such as PASSIGE²⁵, PASTE²⁷, CRISPR-associated transposases¹⁹⁻²⁴, and nuclease-mediated integration¹²,¹³, suffer from low efficiencies or safety concerns. Here, phage-assisted evolution was used to substantially enhance the activity of the Bxb1 recombinase for large DNA cargo integration in mammalian cells. In HEK293T cells with pre-installed recombinase attachment sites, evolved Bxb1 variants demonstrate substantial improvements, achieving up to 60% integration of a 5.6-kB plasmid compared to 18% observed with the WT enzyme. The evolved mutations may help improve integration by improving enzyme solubility, catalysis, and/or attachment site binding.
[0519] When combined with prime editing installation of recombinase landing sites in the PASSIGE system, the evoBxb1 variant (V74A) demonstrates a 2.7-fold average improvement in donor integration across 12 sites from a single catalytic domain mutation, while the eeBxb1 variant (V74A, E229K, V375I) that was generated by rationally combining evolved mutations from distinct domains of the enzyme demonstrates a 4.2-fold average improvement over PASSIGE. No off-target integration was detected for evoBxb1 or eeBxb1 when delivering an attB-containing donor.
However, when delivering an attP-containing donor, off-target integration was detected for eeBxb1, indicating that this highly active recombinase may have acquired the ability to use attB-like pseudosites in the genome.
[0520] Given these observations, it is recommended that evoBxb1 be used when installing attB, and eeBxb1 be used when installing attP into the genome. Due to the further enhanced activity of eeBxb1, the second strategy usually leads to higher integration efficiencies (at 10 of 12 sites in FIG. 29F). At the two target sites where the first strategy was preferred, attB installation was on average 2.2-fold more efficient than attP installation, suggesting that the bottleneck in this configuration when using eePASSIGE is landing site installation, not Bxb1-mediated recombination (FIGs. 37A-37F). Because of this locus dependency, it is recommended that both attB and attP genomic installation be tested during optimization.
[0521] Evo- and ee-PASSIGE show improvements in multi-kB targeted DNA integration at all 12 genomic loci tested in three mammalian cell lines, can efficiently integrate cDNA cassettes into six therapeutically relevant endogenous genomic sites in human cells, and can integrate gene cargoes that produce functional protein. Additionally, PASSIGE variants show large improvements over other programmable gene integration methods including PASTE, with evo- and ee-PASSIGE offering a 9.1-fold and 16.2-fold average improvement in integration across all 12 sites tested in human and mouse cells, respectively. Indeed, PASTE did not outperform PASSIGE at any site tested in this study and exhibited on average 3.3-fold lower donor knock-in. PASTE also installed recombinase landing sites 2.0-fold less efficiently on average than PEmax, the prime editor typically used in PASSIGE, across seven genomic sites tested (FIG. 38). It is noted that one of the most common ddPCR probes used to quantify PASTE integration efficiency in the original report at multiple sites²⁷ does not exclusively report integrated DNA product formation and shows high background in negative controls lacking prime editor or using dead Bxb1 recombinase (FIGs. 39A-39B). This background may partially explain the disparity between the PASTE performance reported herein and that in the previous report.
[0522] Collectively, the evolved and engineered Bxb1 variants generated in this study enable high levels of programmable large gene integration, consistently achieving over 30% (and up to 46%) donor gene knock-in at safe harbor and therapeutically relevant loci.
Methods
General methods and molecular cloning
[0523] Gibson assembly was used to clone all plasmids.
Briefly, for Gibson cloning, fragments were obtained from PCR amplification, plasmid vector digestion, or synthetic gene fragments and assembled using NEBuilder HiFi DNA Assembly Master Mix (New England Biolabs). PCR was performed using Phusion U Hot Start II DNA polymerase (Thermo Fisher Scientific), Phusion U Green Multiplex PCR Master Mix (Thermo Fisher Scientific), or Q5 Hot Start High-Fidelity 2× Master Mix (New England Biolabs). DNA oligonucleotides were obtained from either Integrated DNA Technologies (IDT) or Eton-Biosciences. Synthetic gene fragments were obtained from either IDT or GenScript. Plasmids for mammalian expression of Bxb1 variants were cloned into the pCMV-Bxb1 vector backbone (Addgene, #182142). Plasmids expressing pegRNAs were cloned by assembling PCR-amplified pegRNA backbone (forward primer: 5'-GCTCGAGGTACCTCTCTA-3' (SEQ ID NO: 126), reverse primer: 5'-GAAATACTTTCAAGTTACGG-3' (SEQ ID NO: 127)) or BsaI-digested pegRNA backbone (Addgene, #132777) and pegRNA-encoding eBlocks ordered from IDT. DNA donor plasmids for mammalian cell experiments were cloned by assembling PCR-amplified fragments or synthetic gene fragments into either Factor IX donor vector backbone (Addgene, #182141) or attB-puro donor vector backbone (Addgene, #181923) digested by restriction enzymes. All prime editor variants used in this study (PEmax, PE6b-d) are available on Addgene (#174820, 207852-207854). Constructs for PASTE experiments were obtained from Addgene: PASTE v3 (#179105), PASTE DNA donor plasmid (#179115), ACTB atgRNA (#179108), and ACTB nicking sgRNA (#179109). All vectors for mammalian cell experiments were purified using Plasmid Plus Midiprep kits (Qiagen), QIAprep Spin Miniprep kits, or Qiagen Plasmid Plus 96 Miniprep Kit. Sequences of all pegRNA and sgRNA constructs, and PE6 variants used in this work are listed in Tables 10A-10D.
General mammalian cell culture conditions
[0524] HEK293T cells (American Type Culture Collection (ATCC) CRL-3216), N2a cells (ATCC, CCL-131), HuH7 cells (originated from ATCC), and HEK293T clonal cell lines with either pre-installed attP at AAVS1 or attB at CCR5 were cultured in Dulbecco’s Modified Eagle Medium (DMEM) plus GlutaMAX (Thermo Fisher Scientific) supplemented with 10% (v/v) fetal bovine serum (FBS) (Thermo Fisher Scientific). Clonal cell lines were generated using twin prime editing as previously described²⁵. All cell lines were maintained and cultured at 37 °C with 5% CO2, authenticated by their respective suppliers, and tested negative for mycoplasma.
Phage plaquing
[0525] Plaque assays were performed to check for phage that cheat the selection (for example by integrating gIII onto the SP), to measure phage titers, and for bacteriophage cloning. An overnight culture of host cells was diluted by 50-fold in Davis Rich Medium (DRM) with carbenicillin and grown at 37 °C with shaking at 225 rpm until OD600 reached 0.3-0.8. Phage were serially diluted by a factor of 10 in water, up to 10⁶-fold, and four different dilutions were then chosen for plaquing. Plates for plaquing were made by pipetting ~1 mL of molten 2×YT agar mixed with 0.04% Bluo-gal (Gold Biotechnologies) into a 12-well plate (Corning). Top agar was made by combining 2×YT media and agar (2:1 ratio) and stored at 55 °C until use. To plaque, 100 µL of host cells, 10 µL of serially diluted phage, and 500 µL of top agar were mixed and quickly added onto the solid agar in the 12-well plate. After the top agar solidified, plates were incubated overnight at 37 °C.
Preparation and transformation of chemically competent cells
[0526] Strain S2060 was used for all evolution experiments. To make competent cells, an overnight culture of bacteria was diluted by 50-fold into 30 mL of 2×YT media with the appropriate antibiotics and grown at 37 °C with shaking at 225 rpm until OD600 reached 0.3-0.4. The cells were centrifuged for 10 minutes at 4,000 g at 4 °C, and the pellet was resuspended in 3 mL of cold TSS media (LB media supplemented with 5% v/v DMSO, 10% w/v PEG 3350, and 20 mM MgCl2) on ice. The resuspended cells were aliquoted into 100 µL volumes, frozen in dry ice, and stored at −80 °C. To transform cells with the appropriate plasmids, 1-5 µL of each plasmid, 20 µL of 5×KCM solution (500 mM KCl, 150 mM CaCl2, and 250 mM MgCl2), 100 µL of chemically competent cells, and 80 µL of water were mixed and incubated on ice for 10 minutes. Cells were heat-shocked at 42 °C for 90 seconds, then 1 mL of SOC media (New England Biolabs) was added for recovery. Cells were recovered at 37 °C with shaking at 225 rpm for 1-2 hours before plating.
Bacteriophage cloning
[0527] Cloning of Bxb1 phage was performed using Gibson assembly of PCR fragments, as previously described⁵⁶. Following assembly, the reaction was transformed into chemically competent S2060 E. coli host cells containing plasmid pJC175e, which encodes gIII under the phage-shock promoter and allows for activity-independent phage propagation⁵⁷. After transformation, the cloned phage in E. coli was grown first for 15 minutes in DRM media without antibiotics at 37 °C, and then overnight in media with carbenicillin.
Bacteria were centrifuged for 3 minutes at 8,000 g and plaqued in host strain S2060 transformed with pJC175e. The next day, individual plaques were picked and grown in DRM with carbenicillin. Once the culture reached late growth phase, bacteria were centrifuged for 10 minutes at 4,000 g, and the supernatant containing phage was isolated. Colony PCR was performed using primers (5'-GCTGTCTTTCGCTGCTGAGG-3' (SEQ ID NO: 128) and 5'-GCAAGAAACAATGAAATAGCAATAGCTATCTTACCGAAGCCC-3' (SEQ ID NO: 129)), and the products were sent for Sanger sequencing (Quintara Biosciences).
Phage-assisted noncontinuous evolution (PANCE)
[0528] Strain S2060 cells were transformed with the appropriate P1 and P2 plasmid (FIG. 26C) and made chemically competent. Chemically competent host cells were transformed with the mutagenesis plasmid (MP6)⁵⁸ as described above and plated on 2×YT agar with 100 mM glucose. The next day, several colonies were picked and grown overnight at 37 °C with shaking at 225 rpm. The overnight culture was then diluted by 50-fold in DRM media with the appropriate antibiotics and grown at 37 °C with shaking at 225 rpm until OD600 reached 0.3-0.4. To induce MP6 expression, arabinose was added to reach a final concentration of 20 mM. Immediately, 1 mL of this culture was mixed with 10 µL of the selection phage in a 96-well plate (Avantor, VWR) and grown overnight for 12-18 hours at 37 °C with shaking at 225 rpm. The plate was centrifuged for 10 minutes at 4,000 g and phage was isolated from the supernatant. Isolated phage were used to infect the next PANCE passage until a noticeable change in phage propagation was observed. After each PANCE passage, titers of isolated phage were determined by qPCR (described below), and this information was used to determine the selection strategy for the next passage. After evolution, phage were plaqued in 1) host strain S2060 to check for cheater phage that might have recombined with gIII, and 2) host strain S2060 transformed with pJC175e to determine phage titers. Individual plaques were PCR amplified using the same primers noted in ‘Bacteriophage cloning’ and sent for Sanger sequencing. Mutation tables were generated using Mutato (hub.docker.com/r/araguram/mutato).
Assessment of PANCE titers using quantitative PCR (qPCR)
[0529] To generate a standard curve for qPCR, a standard phage sample of high titer (~1×10¹⁰ pfu/mL as determined by plaquing) was serially diluted by a factor of 10, up to 10⁸-fold, in water and carried forward along with the isolated phage from PANCE. First, 50 µL of phage was lysed for 30 minutes at 80 °C. To remove the genome of replication-incompetent polyphage, 5 µL of the lysed phage was mixed with 44.5 µL of 1× DNase buffer and 0.5 µL of DNase I enzyme (New England Biolabs). This mixture was incubated first for 20 minutes at 37 °C, then for 20 minutes at 95 °C. Finally, 1.5 µL of this reaction was combined with 14 µL of Q5 Hot Start High-Fidelity 2× Master Mix, SYBR Green (Invitrogen), 0.125 µL each of 100 µM M13 forward and reverse primers (5'-CACCGTTCATCTGTCCTCTTT-3' (SEQ ID NO: 128), and 5'-CGACCTGCTCCATGTTACTTAG-3' (SEQ ID NO: 129)), and water to achieve a final volume of 28 µL. The qPCR was performed with the following conditions: 98 °C for 2 min, and then 45 cycles of [98 °C for 10 s, 60 °C for 20 s, and finally 72 °C for 15 s]. A standard curve was generated using Cq values, and phage titers of PANCE pools were determined accordingly.
Phage-assisted continuous evolution (PACE)
[0530] PACE experiments were performed as previously described³³. Briefly, as explained above for PANCE, S2060 host cells with the appropriate plasmids (P1, P2, and MP6) were grown until OD600 reached 0.3-0.4. Next, the chemostat and all four lagoons were filled with 80 mL and 15 mL of this cell culture, respectively. To maintain an OD600 between ~0.2-0.8 in the chemostat, an appropriate flow rate (~80 mL/hour) was established to continuously dilute the cells with fresh media (59 g Harvard Custom Media C, 50 µL of 0.1 M CaCl2, 120 µL of trace metal solution, 400 mg chloramphenicol pre-dissolved in 4 mL of ethanol, 500 ng carbenicillin, 1 g spectinomycin, 500 mL DI water, and 20 L Harvard Custom Media A solution). A flow rate of 7.5 mL/hour was set in the lagoons, and cells were induced with 10 mM arabinose. This set-up was allowed to equilibrate for at least an hour before selection phage infection. Next, all pumps were turned off and the lagoons were infected with ~10⁸ PFU/mL phage. After 10 minutes, the pumps were turned back on to start the evolution, and samples (~500 µL) were taken from the waste line of each lagoon for plaquing (T=0 timepoint). As indicated in FIG. 32B, samples from the lagoons were taken throughout evolution at different timepoints, and the flow rate in the lagoons was increased with time.
After collection, samples were centrifuged for 3 minutes at 8,000 g, and the supernatant was collected and plaqued as described above in PANCE to check for cheater phage and determine phage titers. Individual plaques were picked, PCR amplified, sequenced, and analyzed as described in the ‘PANCE’ section.
Structure analysis via AlphaFold2
[0531] All protein structures were predicted using AlphaFold2 via ColabFold v1.5.3³⁶,⁵⁹. ChimeraX⁶⁰ was used to align structures.
Mammalian cell culture transfection
[0532] All transfections were performed in 96-well poly-D-lysine coated plates (Corning). HEK293T, N2a, and HuH7 cells were seeded at densities of 10,000, 20,000, and 15,000 cells per well, respectively. After 16–24 hours, cells were transfected at approximately 50-60% confluency with 0.5 µL of Lipofectamine 2000 (Thermo Fisher Scientific), according to the manufacturer’s protocols. In HEK293T cells for PASSIGE, evoPASSIGE, or eePASSIGE, 100 ng of prime editor plasmid, 10 ng of each pegRNA plasmid, 100 ng of Bxb1 plasmid, and 150 ng of donor plasmid were transfected. In FIG. 30A, 20 ng of each pegRNA was transfected instead of 10 ng. For PASTE experiments, 100 ng of prime editor plasmid and 100 ng of Bxb1 plasmid were replaced with 200 ng of PASTEv3 construct (Addgene: #179105). For PE3 experiments, 20 ng of pegRNA plasmid and 10 ng of nicking sgRNA plasmid were used. In FIGs. 36 and 38, to assess attachment site installation using twinPE, 100 ng of prime editor or 200 ng of PASTE v3 along with 10 ng of each pegRNA were transfected. In N2a cells, 25 ng of each pegRNA was transfected instead of 10 ng, and the amounts of all other components were kept the same as above. In HuH7 cells, 50 ng of Bxb1 plasmid, 50 ng of prime editor, 75 ng of F9 DNA donor plasmid or minicircle plasmid (see below for preparation), and 10 ng of each pegRNA plasmid were transfected.
Genomic DNA preparation for high-throughput sequencing and droplet digital PCR (ddPCR)
[0533] For the extraction of genomic DNA, media was removed from cells cultured for 3 days after transfection. The cells were washed with 1× PBS solution (Thermo Fisher Scientific) before adding 50 µL of freshly prepared lysis buffer (10 mM Tris-HCl at pH 8.0; 0.05% SDS; 25 µg/mL of proteinase K (Thermo Fisher Scientific)) into each well. This mixture was incubated at 37 °C for 1–2 hours, transferred into a 96-well PCR plate, and then heated at 80 °C for 30 minutes to inactivate proteinase K. This genomic DNA mixture was directly used as a template for high-throughput sequencing (see below).
[0534] To prepare genomic DNA for ddPCR, the above mixture was further purified using the DNAdvance kit from Beckman Coulter (Cat# A48705), according to the manufacturer’s protocol. Briefly, 60 µL of Pre-Bind PBBA buffer was mixed with 30 µL of cell lysate. Next, 60 µL of Bind BBE buffer with beads was added and thoroughly mixed. The beads were washed twice with 200 µL of freshly prepared 70% ethanol and air-dried for five minutes before eluting in 20-30 µL of water or elution buffer. Final DNA concentrations were determined using a NanoDrop instrument (Thermo Fisher Scientific).
High-throughput DNA sequencing and analysis of genomic DNA samples
[0535] Illumina MiSeq instruments were used to sequence PCR-amplified genomic sites of interest, as previously described²⁵. First, a 25 µL PCR reaction (PCR1) was performed using 0.5 µM of each forward and reverse primer containing Illumina adapters (Table 8), Phusion U Hot Start II DNA polymerase, 1 µL of extracted genomic DNA (see above), and water. PCR1 was performed with the following conditions: 98 °C for 2 min, 30 cycles of [98 °C for 10 s, 61 °C for 20 s and 72 °C for 30 s], and finally a 72 °C extension for 2 min. Next, a second 25 µL PCR reaction (PCR2) was performed using 0.5 µM of a unique forward and reverse Illumina barcoding primer pair, Phusion U Hot Start II DNA polymerase, 1 µL of PCR1 product, and water. PCR2 was performed with the following conditions: 98 °C for 2 min, 10 cycles of [98 °C for 10 s, 61 °C for 20 s and 72 °C for 30 s], and finally a 72 °C extension for 2 min. Products from PCR2 were combined, electrophoresed on a 1.5% agarose gel, and extracted using QIAquick Gel Extraction Kit (Qiagen). DNA concentration of the resulting library was quantified using a Qubit dsDNA High Sensitivity Assay Kit (Thermo Fisher Scientific). The library was then normalized and sequenced on an Illumina MiSeq instrument according to the manufacturer’s protocols. Individual sequencing reads were demultiplexed using MiSeq Reporter (Illumina).
[0536] CRISPResso2⁶¹ was used to analyze high-throughput sequencing reads, as previously described²⁵. Briefly, for all experiments, CRISPResso2 was executed in HDR mode with the following parameters for each edit: “e” specified the amplicon expected after editing, “qwc” specified the quantification window, which was set between 10 bp upstream of the first nick and 10 bp downstream of the second nick, “discard_indel_reads” was set to TRUE, and “q” was set to 30.
Percent editing was quantified by multiplying by 100 the ratio of non-discarded HDR-aligned reads to total reads aligned to all amplicons. Percent indels were quantified by multiplying by 100 the ratio of indel-containing discarded reads to total reads aligned to all amplicons.
ddPCR analysis to assess integration efficiency
[0537] ddPCR was used to determine the abundance of genomic DNA fragments containing the genome–donor junction in comparison to a reference gene. Approximately 50–200 ng of bead-purified DNA was added to a 25 µL reaction mixture containing 1) 2X ddPCR Supermix for Probes (No dUTP) (Bio-Rad, 1863025), 2) reference gene primer pair + probe master mix from
Bio-Rad (ACTB (Unique Assay ID: dHsaCNS141996500) or GAPDH (Unique Assay ID: dHsaCNS794216737) for human cells; Tfrc (Unique Assay ID: dMmuCNS420644255) for mouse cells), and 3) genome–donor junction primer pair and probe (900 nM each primer, 250 nM probe). All primer pairs, probes, and reference primer pair + probe master mixes used in this study are listed in Table S9. Droplet generation, PCR, and droplet reading steps were all performed using the Bio-Rad QX ONE platform. PCR was performed with the following conditions: 95 °C for 10 min; 50 cycles of [94 °C for 30 s and 58 °C for 2 min]; and finally 98 °C for 10 min. Data from ddPCR were analyzed using the QX ONE software 1.3 Standard Edition. The following highlights an example of how thresholds for each channel were determined to avoid false positives observed from plasmid recombined products. To determine % integration, the ratio between the concentrations (copies/µL) of the genome–donor junction and the reference gene was multiplied by 100.
[0538] FIG. 30C shows the ddPCR plots used to assess integration efficiency at the FANCA locus using PASSIGE (data shown in FIG. 30A). The “10% rule”, recommended by ddPCR instrument manufacturer Bio-Rad, was applied to determine a high threshold and therefore avoid false positives from plasmid-donor recombined products. In the genome–donor junction (FAM) channel, a first line is drawn right above the negative cloud (3221, dashed line). A second line is drawn around the mean of the positive cloud (4493, dashed line). The threshold (3348, solid line) is then determined using the following formula:
$$\text{Threshold} = 10\% \times (\text{Amplitude of second line} - \text{Amplitude of first line}) + \text{Amplitude of first line}$$
[0539] The same process is used to determine the threshold for the reference (HEX channel). In this case, the threshold for the reference is 3265.
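The threshold arithmetic above, together with the % integration ratio defined in [0537], can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the FAM amplitudes are the ones quoted for FIG. 30C, and the example concentrations are made-up values rather than data from this study.

```python
def ten_percent_threshold(first_line: float, second_line: float) -> float:
    """'10% rule': 10% of the gap between the two lines, added to the first line.

    first_line  -- amplitude drawn just above the negative cloud
    second_line -- amplitude drawn around the mean of the positive cloud
    """
    return 0.10 * (second_line - first_line) + first_line


def percent_integration(junction_copies_per_ul: float, reference_copies_per_ul: float) -> float:
    """Genome-donor junction (FAM) concentration relative to the reference (HEX), times 100."""
    if reference_copies_per_ul <= 0:
        raise ValueError("reference concentration must be positive")
    return 100.0 * junction_copies_per_ul / reference_copies_per_ul


if __name__ == "__main__":
    # FAM channel amplitudes quoted in the text for the FANCA example.
    fam_threshold = ten_percent_threshold(3221, 4493)
    print(round(fam_threshold))  # 3348, matching the solid line described for FIG. 30C

    # Hypothetical concentrations (copies/uL), for illustration only.
    print(percent_integration(120.0, 800.0))
```

Because the threshold sits only 10% of the way up from the negative cloud toward the positive cloud, droplets must clear the negative population before they are counted as positive.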
Percent integration is determined using the following formula:
$$\%\ \text{integration} = \frac{\text{Concentration of FAM channel (copies/}\mu\text{L)}}{\text{Concentration of HEX channel (copies/}\mu\text{L)}} \times 100$$
[0540] FIG. 30D shows ddPCR plots for the FAM channel and corresponding % integration values obtained when using the genome–donor junction binding probes used in this study. An attL or attR binding probe was used to assess integration efficiencies at the GBA1 and FANCA loci, respectively (data shown in FIG. 30A). False positives from plasmid-donor recombined products
were highly reduced, as the control lacking prime editor but containing eeBxb1 showed ≤0.5% integration in both cases.
Flow cytometry to assess off-target integration
[0541] To assess off-target integration, 15,000 HEK293T cells were transfected with 100 ng of Bxb1 variant and 150 ng of a 3.2-kb donor DNA plasmid carrying either an attP or attB Bxb1 landing site and encoding mCherry under the CMV promoter. Cells were passaged for 2 weeks to dilute the plasmid DNA. Next, transfected cells were trypsinized, resuspended in PBS solution, and assessed for mCherry fluorescence using the CytoFLEX S Flow Cytometer (Beckman Coulter) software. The following highlights an example of how the cells were gated.
[0542] To assess off-target integration in FIG. 28C, a representative flow cytometry plot, shown in FIGs. 28E-28F, was used. 15,000 HEK293T cells were transfected with a recombinase variant and mCherry donor plasmid. The cells were passaged for 14 days before flow cytometry analysis. Cells were gated to remove dead cells (P1), then to remove doublets (P2), and finally to assess mCherry+ cells from the ECD channel. FIG. 28E shows an untreated sample. FIG. 28F shows the histogram used to assess mCherry+ cells when transfecting cells with either the dead Bxb1 negative control or eeBxb1. V1L represents mCherry− cells, and V1R represents mCherry+ cells.
Minicircle donor production
[0543] Minicircle donor DNA was prepared using the MC-Easy Minicircle DNA Production Kit (System Biosciences, Cat #: MA925A-1). Briefly, the Factor IX donor plasmid (Addgene, # 182141) was transformed into ZYCY10P3S2T Minicircle Production Cells. Transformed ZYCY10P3S2T cells were inoculated in 2 mL of LB media with kanamycin and grown for one hour at 30 °C. Then, 1 mL of this culture was inoculated into 200 mL of LB media without antibiotics and grown overnight at 30 °C with shaking at 225 rpm. The next day, 200 mL of induction media was added before the OD600 of the overnight culture reached 0.6. This culture was grown for a further 3 hours at 30 °C and 1 hour at 37 °C with shaking at 225 rpm. A Qiagen Plasmid Maxi Kit was used to extract the minicircle DNA. To check the quality of the minicircle, 1 µg of the maxi-prepped product was linearized using restriction enzymes and electrophoresed on a 1% agarose gel. The F9 minicircle donor was used to transfect HuH7 cells as described above.
Factor IX expression and quantification in HuH7 cells
[0544] Three days after transfection, HuH7 cells were passaged into a 48-well plate with fresh media and allowed to grow to confluence. ELISA was used to measure Factor IX concentration in conditioned media 9 days after transfection, according to the manufacturer’s protocol (Innovative Research Human Total Factor IX ELISA Kit, IHUFIXKTT).
References in Example 2
[0545] 1. Landrum, M.J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 42, D980-985 (2014).
[0546] 2. Weischenfeldt, J., Symmons, O., Spitz, F. & Korbel, J.O. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat Rev Genet 14, 125-138 (2013).
[0547] 3. 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68-74 (2015).
[0548] 4. Bareil, C. & Bergougnoux, A. CFTR gene variants, epidemiology and molecular pathology. Arch Pediatr 27 Suppl 1, eS8-eS12 (2020).
[0549] 5. Jin, X. et al. Identification of novel deep intronic PAH gene variants in patients diagnosed with phenylketonuria. Hum Mutat 43, 56-66 (2022).
[0550] 6. Riveiro-Alvarez, R. et al. Frequency of ABCA4 mutations in 278 Spanish controls: an insight into the prevalence of autosomal recessive Stargardt disease. Br J Ophthalmol 93, 1359-1364 (2009).
[0551] 7. Van Alstyne, M. et al. Gain of toxic function by long-term AAV9-mediated SMN overexpression in the sensorimotor circuit. Nat Neurosci 24, 930-940 (2021).
[0552] 8. Sacco, M.G. et al. Lymphoid abnormalities in CD40 ligand transgenic mice suggest the need for tight regulation in gene therapy approaches to hyper immunoglobulin M (IgM) syndrome. Cancer Gene Ther 7, 1299-1306 (2000).
[0553] 9. Collins, A.L. et al. Mild overexpression of MeCP2 causes a progressive neurological disorder in mice. Hum Mol Genet 13, 2679-2689 (2004).
[0554] 10. Thomas, C.E., Ehrhardt, A. & Kay, M.A. Progress and problems with the use of viral vectors for gene therapy. Nat Rev Genet 4, 346-358 (2003).
[0555] 11. Anzalone, A.V., Koblan, L.W. & Liu, D.R. Genome editing with CRISPR-Cas nucleases, base editors, transposases and prime editors. Nat Biotechnol 38, 824-844 (2020).
[0556] 12. Suzuki, K. et al. In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration. Nature 540, 144-149 (2016).
[0557] 13. Wang, B. et al. Highly efficient CRISPR/HDR-mediated knock-in for mouse embryonic stem cells and zygotes. Biotechniques 59, 201-202, 204, 206-208 (2015).
[0558] 14. Heyer, W.D., Ehmsen, K.T. & Liu, J. Regulation of homologous recombination in eukaryotes. Annu Rev Genet 44, 113-139 (2010).
[0559] 15. Branzei, D. & Foiani, M. Regulation of DNA repair throughout the cell cycle. Nat Rev Mol Cell Biol 9, 297-308 (2008).
[0560] 16. Kosicki, M., Tomberg, K. & Bradley, A. Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements. Nat Biotechnol 36, 765-771 (2018).
[0561] 17. Alanis-Lobato, G. et al. Frequent loss of heterozygosity in CRISPR-Cas9-edited early human embryos. Proc Natl Acad Sci U S A 118 (2021).
[0562] 18. Song, Y. et al. Large-Fragment Deletions Induced by Cas9 Cleavage while Not in the BEs System. Mol Ther Nucleic Acids 21, 523-526 (2020).
[0563] 19. Pawelczak, K.S., Gavande, N.S., VanderVere-Carozza, P.S. & Turchi, J.J. Modulating DNA Repair Pathways to Improve Precision Genome Engineering. ACS Chem Biol 13, 389-396 (2018).
[0564] 20. Hoijer, I. et al. CRISPR-Cas9 induces large structural variants at on-target and off-target sites in vivo that segregate across generations. Nat Commun 13, 627 (2022).
[0565] 21. Lampe, G.D. et al. Targeted DNA integration in human cells without double-strand breaks using CRISPR-associated transposases. Nat Biotechnol (2023).
[0566] 22. Klompe, S.E., Vo, P.L.H., Halpin-Healy, T.S. & Sternberg, S.H. Transposon-encoded CRISPR-Cas systems direct RNA-guided DNA integration. Nature 571, 219-225 (2019).
[0567] 23. Strecker, J. et al. RNA-guided DNA insertion with CRISPR-associated transposases. Science 365, 48-53 (2019).
[0568] 24. Tou, C.J., Orr, B. & Kleinstiver, B.P. Precise cut-and-paste DNA insertion using engineered type V-K CRISPR-associated transposases. Nat Biotechnol 41, 968-979 (2023).
[0569] 25. Anzalone, A.V. et al. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat Biotechnol 40, 731-740 (2022).
[0570] 26. Anzalone, A.V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149-157 (2019).
[0571] 27. Yarnall, M.T.N. et al. Drag-and-drop genome insertion of large sequences without double-strand DNA cleavage using CRISPR-directed integrases. Nat Biotechnol 41, 500-512 (2023).
[0572] 28. Esvelt, K.M., Carlson, J.C. & Liu, D.R. A system for the continuous directed evolution of biomolecules. Nature 472, 499-503 (2011).
[0573] 29. Roth, T.B., Woolston, B.M., Stephanopoulos, G. & Liu, D.R. Phage-Assisted Evolution of Bacillus methanolicus Methanol Dehydrogenase 2. ACS Synth Biol 8, 796-806 (2019).
[0574] 30. Nathwani, A.C. et al. Long-term safety and efficacy of factor IX gene therapy in hemophilia B. N Engl J Med 371, 1994-2004 (2014).
[0575] 31. Kuo, C.Y. et al. Site-Specific Gene Editing of Human Hematopoietic Stem Cells for X-Linked Hyper-IgM Syndrome. Cell Rep 23, 2606-2616 (2018).
[0576] 32. Carrette, L.L.G., Blum, R., Ma, W., Kelleher, R.J., 3rd & Lee, J.T. Tsix-Mecp2 female mouse model for Rett syndrome reveals that low-level MECP2 expression extends life and improves neuromotor function. Proc Natl Acad Sci U S A 115, 8185-8190 (2018).
[0577] 33. Miller, S.M., Wang, T. & Liu, D.R. Phage-assisted continuous and non-continuous evolution. Nat Protoc 15, 4101-4127 (2020).
[0578] 34. Jusiak, B. et al. Comparison of Integrases Identifies Bxb1-GA Mutant as the Most Efficient Site-Specific Integrase System in Mammalian Cells. ACS Synth Biol 8, 16-24 (2019).
[0579] 35. Ghosh, P., Pannunzio, N.R. & Hatfull, G.F. Synapsis in phage Bxb1 integration: selection mechanism for the correct pair of recombination sites. J Mol Biol 349, 331-348 (2005).
[0580] 36. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583-589 (2021).
[0581] 37. Li, W. et al. Structure of a synaptic gammadelta resolvase tetramer covalently linked to two cleaved DNAs. Science 309, 1210-1215 (2005).
[0582] 38. Li, H., Sharp, R., Rutherford, K., Gupta, K. & Van Duyne, G.D. Serine Integrase attP Binding and Specificity. J Mol Biol 430, 4401-4418 (2018).
[0583] 39. Rutherford, K., Yuan, P., Perry, K., Sharp, R. & Van Duyne, G.D. Attachment site recognition and regulation of directionality by the serine integrases. Nucleic Acids Res 41, 8341-8356 (2013).
[0584] 40. Doman, J.L. et al. Phage-assisted evolution and protein engineering yield compact, efficient prime editors. Cell 186, 3983-4002 e3926 (2023).
[0585] 41. Bessen, J.L. et al. High-resolution specificity profiling and off-target prediction for site-specific DNA recombinases. Nat Commun 10, 1937 (2019).
[0586] 42. Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR-Cas9 knockout efficiency. Genome Biol 16, 280 (2015).
[0587] 43. Shin, S. et al. Comprehensive Analysis of Genomic Safe Harbors as Target Sites for Stable Expression of the Heterologous Gene in HEK293 Cells. ACS Synth Biol 9, 1263-1269 (2020).
[0588] 44. Patrinostro, X. et al. Essential nucleotide- and protein-dependent functions of Actb/beta-actin. Proc Natl Acad Sci U S A 115, 7973-7978 (2018).
[0589] 45. Sharma, R. et al. In vivo genome editing of the albumin locus as a platform for protein replacement therapy. Blood 126, 1777-1784 (2015).
[0590] 46. Eyquem, J. et al. Targeting a CAR to the TRAC locus with CRISPR/Cas9 enhances tumour rejection. Nature 543, 113-117 (2017).
[0591] 47. Mistry, P.K. et al. Gaucher disease: Progress and ongoing challenges. Mol Genet Metab 120, 8-21 (2017).
[0592] 48. Migdalska-Richards, A. & Schapira, A.H. The relationship between glucocerebrosidase mutations and Parkinson disease. J Neurochem 139 Suppl 1, 77-90 (2016).
[0593] 49. Dang, N. & Murrell, D.F. Mutation analysis and characterization of COL7A1 mutations in dystrophic epidermolysis bullosa. Exp Dermatol 17, 553-568 (2008).
[0594] 50. Castella, M. et al. Origin, functional role, and clinical impact of Fanconi anemia FANCA mutations. Blood 117, 3759-3769 (2011).
[0595] 51. Keinath, M.C., Prior, D.E. & Prior, T.W. Spinal Muscular Atrophy: Mutations, Testing, and Clinical Relevance. Appl Clin Genet 14, 11-25 (2021).
[0596] 52. Kochenderfer, J.N. et al. Construction and preclinical evaluation of an anti-CD19 chimeric antigen receptor. J Immunother 32, 689-702 (2009).
[0597] 53. Brudno, J.N. et al. Safety and feasibility of anti-CD19 CAR T cells with fully human binding domains in patients with B-cell lymphoma. Nat Med 26, 270-280 (2020).
[0598] 54. Kay, M.A., He, C.Y. & Chen, Z.Y. A robust system for production of minicircle DNA vectors. Nat Biotechnol 28, 1287-1289 (2010).
[0599] 55. Durrant, M.G. et al. Systematic discovery of recombinases for efficient integration of large DNA sequences into the human genome. Nat Biotechnol 41, 488-499 (2023).
[0600] 56. Neugebauer, M.E. et al. Evolution of an adenine base editor into a small, efficient cytosine base editor with low off-target activity. Nat Biotechnol 41, 673-685 (2023).
[0601] 57. Hubbard, B.P. et al. Continuous directed evolution of DNA-binding proteins to improve TALEN specificity. Nat Methods 12, 939-942 (2015).
[0602] 58. Badran, A.H. & Liu, D.R. Development of potent in vivo mutagenesis plasmids with broad mutational spectra. Nat Commun 6, 8425 (2015).
[0603] 59. Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679-682 (2022).
[0604] 60. Goddard, T.D. et al. UCSF ChimeraX: Meeting modern challenges in visualization and analysis. Protein Sci 27, 14-25 (2018).
[0605] 61. Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat Biotechnol 37, 224-226 (2019).
Tables referenced in Example 2
Table 6: Absolute knock-in efficiencies of all variants tested in FIG. 27B
Columns: Variant | Genotype | % Knock-in (AAVS1) | % Knock-in (CCR5)
(table body not reproduced)
Table 7: Absolute knock-in efficiencies of all variants tested in FIG. 35A
Columns: Variant | % Knock-in (AAVS1) | % Knock-in (CCR5)
(table body not reproduced)
Table 8: HTS primers used in Example 2
Columns: HTS-Fwd | SEQ ID NO: | HTS-rev | SEQ ID NO:
Table 9: Primers and probes used for ddPCR
(table body not reproduced)
Table 10: pegRNA sequences used for each site tested in Example 2
(table body not reproduced)
T T D I : T OT Q T T T 1 1
2 A TAAT C C T A CA C A T AC TG A GAAGGT T GG C TG CGG AGT T ACA A C C GCAAG CA A C CA C A CA AAT G T G TA G ) 1 A C T C G C GAG A A A A GA AG TA CGC AA 7 0 2 AG G C CAG C G G 1 9 1 ATTGCAGAACGGAA T T TAT TA A G G 0 8 1 AC
G G C T T C C A A CA gn GT C C T C GAT T C GT CG A C T C C G GT C T GT C C C e L GA -ll T CAAAA A GA C C T G TGG TAG u G TAA T C A G A G GA TA C A GC T T G F – a rt e p t e e 1 x o p b a l E n - r e v w t Btt a ec 6 1 o e b n3 n E e e g mi n g iti e t E P u q r P a y P E d r n P E P E P i n i n i n i e t s w t w w w P / e x S t t t a E e r na x x x T A o t i a a a mS P N m i r i r t a m m d 6 m E A R g P d E V E P E P E P E P e e e t 0 0 B P i tt : s t n m i o B n 8 2 P O a B t 0 a 1 e g - t A t B t a t B t a t B t a t W 4 A a 7 2.B T e 8 2 l e t . s 1 S 5 6 R 2 9 1 2 1 0 7 3 . 9 a . S 5 0 1 C b a i S G V I C s G V 9 1 3 1 A T F A A C o R I F A A 1 B 2 1
1 2 2 ATCTAGGCAC T C CGAAGT C T TGGGGAT AGT GACAGACAC TATGGAAGA C T CGAA T C CAAACAAGTAG T C C GGTAGGGAC T T TGTATGGCAC C C A C CGCGG T CGCAAAC CA AC CAC G C T
T T G C C G T C T TAT T T T G C T C T T C T T G C C T T T G C TGC C T C AT TGAT T T C G T C C C C C C C T C C GC C TAT T T C C C C CGA GAA GA AAC T A C C G A C T C AA A GT AAAA G AA GAAAA AA AT CA C T G T T C T T T T CAGT CA T CG G TA T CA C T G T T CAATGG CAA A T GC T GCA A GAA A GCA A GCAA AC A G T A G A G G G G T T C T T T T A G A G GC T A GA T A p e p pb a l - r e e wBtt p b a l e r e Btt 6 t 1 v e b a - 6 e w t 1 v e b a o n o nE P E P E P E P E P E E E E E E n n n P P P P P P i i i n i n i n i n i n i n i n i n i w t w t w t w t w t w t w t w t w t w t w t x a x a x a x xm b b m m d a a 6 6 6 d 6 m b 6 m d E 6 P E P E P E P E P E P E P E P E P E P E P 00 P t B t P t B t P B P P B P B Ot a t a t a t t a t t a t t a t t a t tt tt tt W B a a a a 4 7 2 1 9 C 1.3 S V 5 R 5 6 R B 2 6 2 2 1 6 2 9 2 0 7 T B T . 9 a a . S V 5 R B T a . 5 0 1 A C C C C s o s o GI A C s G 9 1 3 1 A C C A A R R F A C C A o R I F 1 B 2 1
9 2 2 GAGGAC T A TGAC TAGA C C C T TGGC CAAA CG AGGA T G A C CAGG A C C AACAG TAGGGGT C C T GG TAAGT C C CGAACGGAG T G TGTATAGAAATAAGATGT A C CAAAC C T T TAC TATAGACGC
T C GT T T T T C G G A T T C A C A T C GG GGAT G G ACGGTGAT A T TG A C TGA T G G T G T AAGAGA T AAAAAGT TAAG T A AGT T GAT CAAACAG GT C A C A GT T GAT GAT T C T T T T A GT CAA G GT CAA C G T T T T g C T C CAT C T g T T T C C T C A C C C T C T T T A C G T AAT C A C A AT C C T T T T AC T T A G G A G G G A G G A G G G A G AT C E P E n P E P E P E P E P E P E P E P E P E P i n i n i n i n i n i n i n i n i n i n i w t w t w t w t w t w t w t w t w t w t w t xa x a x x x b b c a m b c a a 6 6 m m 6 6 6 m m b E 6 P E P E P E P E P E P E P E P E P E P E P 00 B tt P t B t P t B t P t B t P t B t P t B t O a t a t a t a t a t a t a t a t a t a t a W 4 71 2 1 A 0. 7 3 . 9 B B C A C A R T R T 1 7 L 5 0 9 1 L L M 2 M A 2 F F B A B O 1 3 1 1 A A B B R T R T C C G G C 1 B 2 1
0 4 2 TCT A G T CGC T G CGAAT T C G TGGGGAT AGTAACATACAT TGT GGCAGA C C CGAA C T CAAAAACGTAG T C C GGTAGGGAC T T TGTATGG C C CAA C C TGGGATAGGAAAAAGAAAAGGT
AT C T C C C C T C C T T C TGC T T G C C G T C TA T TAAC C C T C C C T C C T CAT C C GCGGC C TAT TG T T T C T T C C GAC C TGGCGA AAGAAC GAGGAC G A AA GAGT G T AAC AAC C C TATAAT TAG G TAGGAA TAG TAAAC G AT T T TAGTAAA AC g T C A T C C g T C T T C G CA C C T C CAAT T AC T T C C T C CA G A A T T G T T G T A G T G G T A G A G G G G T G T C G T A p pb a l e e - r B 6 e w t tt 1 v a o e b nE P E P E P E P E P E P E P E P E P E P E n n n n n n n i n P i i i i i i i n i n i n i w t w t w t w t w t w t w t w t w t w t w t xa x a x a x a x a x b m m d d a 6 6 6 m m b 6 b 6 m m E P E P E P E P E P E P E P E P E P E P E P 00 P t B P B P E 9 B P B P B P Ot tt tt tt tt 2 tt tt tt t t t a a a a a - a a a t a t a t a W 4 7 2A A A D 9 1 0.3 7 L C N C N 1 n 1 2 n . 1 S 1 S 5 7 R 5 R B T B T . 9 5 0 1 O A A G V V 9 I A A C C C C 1 3 1 C 1 F F m S m S F A A C C A A 1 B 2 1
3 2 2 CCGAGGAC A T TGACAT AGC C T C TGGCAC AACGGAAGT G A C CAGG A C C AACAGATGGGGT C C T GGTAAGT C C CGAACGGAG G T TGT TA AGAGAA CAAG TAT T C CAT CAA C T CAT T T TGT G C CAC T
T CGGG TAACAACAGT T GAC T GAA T T T A CAAA CAA T GT T T T C C T C A C C C T C T T AT C A C AT A TA C TAAT C AAT T T ATAAT G T T C C T T G AA AC G A G A G C G C T T G C G G C T T g T C g C T A G G A G G G A G G A G G E P E P E P E P E P E P E P E E E E E n n n P P P P P i i i n i n i n i n i n i n i n i n i n i w t w t w t w t w t w t w t w t w t w t w t w t xa x a x a x a x d d b b c b c a 6 6 6 6 m m 6 m 6 6 m m E P E P E P E P E P E P E P E P E P E P E P E P 00 B t P t B t P t B t P t B t P t B t P t B t P t Ot a t a t a t a t a t a t a t a t a t a t a t a W 4 71 2. 6 2 6 3 a 2 0 s a s B B M C A C A R T R T 1 1 7 . 9 A A 5 0 9 1 o o L L 2 M 2 R F F B B 1 3 1 1 R R A A B B T R T C C G G B 2 1
A GAT CAAC CAAAT T G C G CAT TGA T C TAT C T TGG C G C C CAC C C T T C T CAC T C CAAA CAGAC C TATGT T
T C T T T AC T T C C TATAATAG G TAGG TAA TAATAAAC G GAACAGT G A G AT C A A G A AC g T C A C C g T C T T T T GC T G T GC TA C A GC TAC A GC TAAT T T A G A G GC TAA A GT C T T T T p pb a l e e - r e w t Btt 61 v a o e b n EP E P E P E P E P E P E P E P E P E E n i n i n i n i n i n i n i n i n P i n P i n i w t w t w t w t w t w t w t w t w t w t w t xa x x x b a a a 6 b 6 m m d 6 d 6 m m b E P E P E P E P E P E P E P E 6 d P E 6 b P E 6 P E P 00 Bt P t B t P B P P P B P P O t t t tt tt t t t t t t a a a a a t a t t t t t W F a a a a a 4 A 7 A A A 7 2 9 1 0.3 L 7 L C N C N 1 n 1 2 1 6 2 7 . 9 n . B T S V 5 R a B 5 0 9 1 O O G s 1 3 1 C 1 C 1 A F A F m S m S I F C A A A C C o R L A 1 B 2 1
2 3 2 GGGAG T AGTAACAT A C C T TGT GGCAGA C C CGAA T C C AACAGC A T C GACAGATGGGGT C C T GGTAAGT C C CGAACGGAG G T TGTATAGAAAT CAGG T A C T C AACAGCAT GAC T G T C GGC A C T T
C T GT C T GGG GA A T C T T AG T T C C C C C C C C G TGAT G C TGGA T AGA T AA AAGT G CA TGA AA TGA AGT T CGATAAT GT T C G T T CAT T T g T T C G T C C T C A C T AT C T T T T AC TA ATAG A G G A G G G A G A AC g T GAA C T T T T T GC TA TAG AT A C g T T T AGT C T T T T AC T T A G G G A GT C T T T T E P E P E P E P E P E P E E E E n n n n n P P P P i i i i i n i n i n i n i n i w t w t w t w t w t w t w t w t w t w t x a x a x a x x x b a a am m 6 m b 6 m d 6 m m b E E E E 6 P P P P E P E P E P E P E P E P 00 P t P t B t P P P P P P P Ot t t tt tt tt tt tt tt t a a a a a a a t W 4 A a a a 7 2.M C 1 A A 0 A 1 0 3 2 A R T F A 7 B L C O N 1 3 1 7 . 9 n . C N A G A I A B B 5 0 9 1 L 1 3 1 1 B R T C G C 1 F m S F F G A B 2 1
2 3 2 GGGAG T AGTAACAT A C C T TGT GGCAGA C C CGAA T C C AACAGC A T C GACAGATGGGGT C C T GGTAAGT C C CGAACGGAG G T TGTATAGAAAT CAGG T A C T C AACAGCAT GAC T G T C GGC
A C T T AT T G C T G T GC T G C T C T T T C A CC GT C C C T C GC C C T C T C T GC A T T C C GC C T C GCG C G GTG CG G G T AA TA GAAT GA GA GAC GAGGAAA T T T T A G T CAA T T T G T G TAA CAG T CA A CAGT T T CAA T CAA T CAG T CAG A GCA A GC T A GCG A GCA A GC G T G C T T T g C T A G T T T T T T T T T GC T G p pb a l e e p al e e p al e e p a p p l e e a l e e a l e e - r 8 e 3 v w t Btt p b r o e b a - n 0 e 5 v w t Ptt p b r o e b a - n 8 e 3 v w t Ptt p b r Ptt p b r Ptt p r Ptt o e b a - n 8 e 2 v w t o e b a - n 8 e 1 v w t o e b a b n - e 8 v w t o e b a n E P E P E P E P E P E P E P E P E n i n i n i n i n i n i n i n P i n i w t w t w t w t w t w t w t w t w t x a x a x a x a x a x a x a x d am m 6 m m m m m m E P E P E P E P E P E P E P E P E P 05 8 - - 2 8 - 1- 8- 0 0 P tt P t a t P t a t B t a t P tt P tt B a B a a 8 P t 3 t P t O a t P t a t a W 4 7 2 0 4 1 0.3 1 3 . 3 . 1M C S 1 S 1 S 1 S 1 S 7 . 9 5 0 1 2 A n GI B V V V V V 9 L GI A 1 3 1 B R T m S F A F A A A A A A A A A 1 B 2 1
0 5 2 ATCTAGGC C C T TGGCAT TGC CGCGGAC A C TAAGAAAGAG T C TAGAG C AACGGT A C T CATAAAGGTAA T C C GGTAGGGAC T T TGTATGGCAC C C A C CGCGG T CGCAAAC CA AC CAC G C TGT CGA
GA G G GC G G C A GAG AAGT AA AA AAC AAG ATAA T CAG T CAG T T CAG AC T T T T AC TG AC TA AC T C AC T A A AC TA GC T A GC T G GC T G GC T G G A GT T G AA C G AA C G AT C G A A G AA C p e e p e e p e e p e e p e e p e e p e e p e p e p p b a l - r 8 e w Btt p b a l r e w Btt p b a l r e w Btt p b a l r e w Btt p b a l r e w Ptt p b a l r Ptt p b a l r Ptt p b a l r e Ptt a p l r e Ptt p b a l r 3 v t a - o e b n 8 2 v t a - o e b n 0 2 v t o e b a - n 2 1 v t o e b a - n 0 5 v t o e b a - n 8 e 3 v w t o e b a - n 8 e 2 v w t o e b a - n 8 e w t 1 v o e b a b n - e w t 8 v o e b a - n 8 e 3 v oE P E P E P E P E P E P E P E E E n n n n n n n P n P P i i i i i i i i n i n i w t w t w t w t w t w t w t w t w t w t x a x a x a x a x a x a x a x a x a x a m m m m m m m m m m E P E P E P E P E P E P E P E P E P E P 8 3 8 0 2 0 8 - 2- 2- 1- 5- - 2 8 - 1 8 - 8- 3- 0 0 B t B t B t B t P t P t P t P t P t B t Ot a t a t a t a t a t a 8 3 t a t a t a t a W 4 71 2. 1 S 1 1 1 0 3 V S V S V S V 5 R 5 R 5 R 5 R 5 7 R 5 R . 9 5 0 9 1 A A A A A A A A C C C 1 3 1 C C C C C C C C C 1 B 2 1
G T CA G CA G C G C TGC T T G C C G T C T TAT T T T G C TGAA TA A GAG T T GAC GCGGC C T A GAG C GAT TGAT AG T T C G T C C C C C C T C CA C A G C T GA A AGT AA AA A CT C C T AC T T CAG T T CAAAC T GAT T T TAGTAATGGA A A A A AA T AC T C C AC A G AC C G A A G A G GC T G GC T A G A G G G GT T GC T T C GC T A GA T A p p b a l e e p a e e p e e - r e w t Btt p b l - r e w t Btt p b a l - r e w Btt 8 2 v e b a 0 2 v e b a 2 v t e a o n o n 1 o b nE P E P E P E P E P E P E P E E E n n n n n P P P i i i i i n i n i n i n i n i w t w t w t w t w t w t w t w t w t w t x a x a x a s t n s t s t s t s t s t s t a n a n a n a n n nm m m l i l i l i l i a i a i a i E l P E P E P r l r l r l r l l r l l r l l r Aa v Aa v Aa v Aa v Aa v Aa v Aa v8 2 0 2 - 2- 1- 0 0 B t B B B P B P B P B Ot tt tt tt tt tt tt tt t t a a a a a a a a t a t a W 4 7 2 6 1. 5 R 5 3 1 1 6 R 5 R . S S 5 R 5 R B 2 0 7 3 T B T . 9 a 5 0 1 C C C G V V I A A C C C s 9 1 3 1 C C C F A A C C A C A o R 1 B 2 1
8 2 2 ATCTAGGCAC T C CGAAGT C T TGGGGAT AGT GACAGACAC TATGGAAGA C T CGAA T C CAAACAAGTAG T C C GGTAGGGAC T T TGTATGGC G C T A C C CGT C G TGGAAC CA AC TA AGAG C T T T C
AAC GAC GAAAAC GG G TG TG G T G TAA AAA AG T CAA T C A A T T GAC T GAA T T T AA AA AG AA T CA T C CAAC T G T T C G C T AT A AT T T ATA AT G C C C T T T G T A G T A G A G G T C G C C G C T T G C G G C T T T T g C T C T g C G T A E P E P E P E P E P E P E E E E n n n n n P P P P i i i i i n i n i n i n i n i w t w t w t w t w t w t w t w t w t w t st / / / n x a E x a E x a E s t n s t n s t s t s t s t T T T T l S T S n T S n T S n T S n T S l a i r Aa mS mS mS v E P A P E P A P E P A P l l a i r Aa A l a i A l a i A l a i A l a i A l a i A v P l / E r Aa v P l / E r Aa v P l / E r Aa v P l / E r Aa v P l / E r Aa v P / E F 0 0 P tt 7 3- P t A t B t B t P t B t P t B t P t B t O a a t a t a t a t a t a t a t a t a W 4 6 7 2 3 7 1 2. 1 8 3 0 7 3 . 9 a . s s S V 5 R B T .o GI A C C GI B L B C L M 2 M A C A 5 0 9 1 1 3 1 R F A C A F A A B 2 B R T R T 1 B 2 1
5 3 2 GGAC T A TGAC TAGA C C C T TGGC CAAA CG AGGA T G A C CAGG C A C A CAAG TAGGGGT C T CGG TA GAT C C CGA CAGGAG T G TGTATAGAAAAAC GATAT C A CACAC CAT TAC T TGGG T C T C TGGAA
AAG A AGT GA GA GG G AAG AAC C T C A C C C T T C T T T A AAT C A C A AT C T T T C T AC C T T C T C TA T TAA T A TAG G TAG G TAA C A A C g T C A C T T T C G CA G G A G G G A G G A G G G A G A G A A C g C T T G T T G T A E P E P E P E P E P E E E E E n n n n P P P P P i i i i n i n i n i n i n i n i w t w t w t w t w t w t w t w t w t w t st n s t s t s t s t s t s t s t s t s a T S n a T S n a T S n a T S n a T S n a T S n a T S n a T S n T t S nl l i r A l l i r A l l i A l i A l i A l i A l i A l i A l a i A l a i Aa v P / E Aa v P / E r Aa v P l / E r Aa v P l / E r Aa v P l / E r Aa v P l / E r Aa v P l / E r Aa v P l / E r Aa v P l / E r Aa v 00 P t B t P t B t P t B t P B P B Ot t t t a t t t t t a a a a t a t a t a t a t a W 4 71 2 A A A A 0.3 R 9 T R T 1 1 7 L 7 L C C 1 1 7 . 5 0 1 F F A B A B O O N N n n 9 A 1 3 1 C C G G C 1 C 1 F A F m S m S 1 B 2 1
ta n i r T T T T
A ATGAA T C r AA AT e AGAAG k TT GAT GA ni G l GC C C T AGT GC AGT C : C Q
C GC C T TA G GC T A ni k p s C GT CGT G GC T C GAG G G A AT C ci a N e p a e / r t t e 2 p x o b l Bt n - r e w t t a e E c 6 1 v o e b n 3 3 n e e g mi n g iti e t E a y P E P E E u q r d r n i n i P P P e t s w t w t / xa E / T x E e S e r n S a T S A m i o ti ai r t x a x a m m N m m E A r R d a P P E P A P g P E V E P E P e B P e e ti 0 tt B t : s n m B 0 i o 8 O a t a C tt n 2 0 a e g - P t A t B t a t a W 4 7 2 1 1 8 2 1. C 1 0 7 3 9 L B T e l b e t . i s S V 5 R . 5 0 9 1 O C a S GI C 1 3 1 N A T F A A C 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A TTAGA C T 0 9 2 A T TAC GA T 6 9 1 TCGAG G C CG A G 2 9 1 GGAGGAT C G T G T T TA GT C 7 6 2
TTAC C A C AT CAG ATAG TAGGCAGT T G CA GT CG T C AA T G GT CGG TA G C CA G GC C C GT CGG TA ATAG G GC A T C GAG G G A AT C p pb a l e e - r 6 e Btt 1 v w t a o e b nE P E E n P P E P i n i n i n i w t w t w t w t xd a x a 6 m m b E 6 P E P E P E P 00 B t B t P t B Ot t t t a a t W 4 A a a 7 2 6 2 9 1 2 1 1 0. 7 3 . 9 a s . S V S V 5 R 5 0 1 o G 9 I A C 1 3 1 R F A A A C 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T 9 8 2 CTT CA GC T 7 9 1 CCAGAC C T C GA C 2 9 1 GGAGGAT C G T G T T TA GT C 6 6 2
TA GC GGG GGG TT CA TAG T G A GAGG G TGAG G G C C G TGAG G C A T C C C AG T T G G A AT C GA G A G AC C GA G G A G AC C A G GT CG AA T G E P E P E E n n P P i i n i n i w t w t w t w t xa x b a 6 m m d E 6 P E P E P E P 00 P tt B t P B O a t t a t t a t a W 4 71 2. 5 6 R B 2 0 7 3 T B T . 9 a 5 0 1 C C C s 9 1 3 1 C A A o R 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T TGA T C T T C 7 9 1 CCAGAC C T C GA C 1 9 1 GAATGAAATATGT TA GT C 9
6 2 6 2 6 2 6 2A TC CAG CG ACAG TAGGC GGGGG C C A C G CAGA AATAG GGAG C GA T T AC C A G A GT CGT G GC T C GAG G G A AT T C GAC G A G AC C G p pb a l e e - r 6 e Btt 1 v w t a o e b nE P E E n P P E P i n i n i n i w t w t w t w t xa x d m b a 6 6 m E P E P E P E P 00 P t P t B P Ot t t t a t t W B a a a 4 7 2 6 2 9 1. 2 1 0 7 3 . 9 a s . S V 5 R B T 5 0 1 o G 9 I A C 1 3 1 R F A C C A 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A TTAGA C T 0 9 2 A T TAC GA T 6 9 1 TCGAG G C CG A G 2 9 1 GGAGGAT C G T G T T TA GT C 7 6 2 TTAC
AT C T TAC T A C T TAC T A C T G T CAG T T G T A C G T A C A GT G C GT CG AA T A AA A AA G G A G A G G A G A G T g GA A GG C G E P E E E n P P P i n i n i n i w t w t w t w t x d a 6 b 6 b 6 m E P E P E P E P 00 B t B t P t B t Ot a t a t t W 4 C a a 7 2 6 2 9 1. 2 0 7 3 . 9 a s . B B 5 0 M 9 1 o GI L L 2 1 3 1 R F A A B 1 B 2 1
T T T T T T T T
A T G ATG C C GC T CGGAA CG A CCAA G GG CG C G G A CG A A AAGTGAG C G A G TGA G G G TGA G C G A G T g A G C G g T T G A A g T T G A G T T T G E P E E n P P E P i n i n i n i w t w t w t w t x a x m c a 6 m b E P E P E 6 P E P 00 P t B P B Ot tt tt t a a a t a W 4 71 2 0. 7 3 . 9 5 0M C 2 A C A R T 9 1 1 3 1 B R T R T F C 1 B 2 1
T T T T T T T T C TGAGC C C G T A CG G
CCAA CCA A A AAG A CG AT A CCCAT GA G A AC T G C G T A T A T G T AGC CGC C A AGC CGC C G T G TG T C T C A G G G A A A G A A A G G G G EP E E E n P n P n P i i i n i w t w t w t w t xa x c a 6 m m b E E E 6 P P P E P 00 Pt B P B O t tt tt t a a a t a W 4 71 2. RT 1 1 A 0 7 3 . 9 F A B A 7 B L 5 0 9 1 1 3 1 C G G O C 1 1 B 2 1
T T T T T T T T
G T G TG T C TA C C C C C C C G A C C C G A C C A C A TAC G G C G G G A C C C G A C C C G T A G A E P E E E n P n P P i i n i n i w t w t w t w t x x b a a 6 m m d E E 6 P P E P E P 00 P t B t P B Ot t tt t a a a t a W 4A A A 7 1 2 0.3 7 L C N C N 1 7 . 9 n 5 0 9 1 O A 1 3 1 C 1 F A F m S 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T TGA T C T T C 7 9 1 CCAGAC C T C GA C 1 9 1 GAATGAAATATGT TA GT C 9
6 2 6 2 6 2 6 2A T CAG CG ACAG AT CAG TAG TGC C C C A C G CAGA G CAGA AA AG GA T T AC C A G A GT CGT G GC C C GT CGT G GC T C GAG G G A AT C p pb a l e e - r e w Btt 61 v t o e b a nE P E E E n P n P P i i n i n i w t w t w t w t xa x d a 6 m m b E 6 P E P E P E P 00 P E t 9 B t P t B Ot a 2- t t tt W 4 D a a a 7 9 1 2.3 1 2n . 1 S 1 0 7 V S V 5 R . 9 5 0 1 m G 9 S I F A A A A C 1 3 1 C 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T 9 8 2 CTT CA GC T 7 9 1 CCAGAC C T C GA C 2 9 1 GGAGGAT C G T G T T TA GT C 6 6 2
TA GC GGG GGG TT CA TAG T G A GAGG G TGAG G G C C G TGAG G C A T C C C AG T T G G A AT C GA G A G AC C GA G G A G AC C A G GT CG AA T G E P E P E E n n P P i i n i n i w t w t w t w t xa x b a 6 m m d E 6 P E P E P E P 00 P tt B t P B O a t t a t t a t a W 4 71 2. 5 6 R B 2 0 7 3 T B T . 9 a 5 0 1 C C C s 9 1 3 1 C A A o R 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T TGA T C T T C 7 9 1 CCAGAC C T C GA C 1 9 1 GAATGAAATATGT TA GT C 9 6 2
A AGG A A C A C A G C C C C T T C T C T T C T A C T G C C C A T C C G T A G A ATGT G A A A AAC ATAAC TGA A GG C G T A G A G A G A G G A G A G g C G E P E E E n P P P i n i n i n i w t w t w t w t x d b b a 6 6 6 m E P E P E P E P 00 P tt B t O t P t B t a a t a t a W 4 71 2. 6 2 0 7 3 . 9 a s o B L B 5 0 9 1 L M 1 3 1 R A A 2 B 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T CTAC G C C G 3 9 1 CAA CG TA T C GA C 2 9 1 GGAGGAT C G T G T T TA GT C 1 7 2 A
T G TG C CG A CG A CCAA GC T C GA CG C GA CG AA T g GA G C GGG GGG AAAGC T A GG C G A g G T T TGA G A A g G T T TGA G G A GG T A T G T G E P E E E n P P P i n i n i n i w t w t w t w t x a xm c a 6 m b E 6 P E P E P E P 00 P tt B t O t P t B t a a t a t a W 4 71 2 0. 7 3 . 9M C A C A R T 5 0 9 1 2 F 1 3 1 1 B R T R T C B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T C CTT T T C A 7 9 1 CCAGAC C T C GA C 2 9 1 GGAGGAT C G T G T T TA GT C 3 7 2
CCAA CCAA A T C AT T C AAG A CG A A C C GA G A A TG C G T A C C A C C G T G TGC T T A T G T AGCG AGCG T A G G G A C A A G A C A A G G C G G E P E n P E P E P i n i n i n i w t w t w t w t xa x c a 6 m m b E 6 P E P E P E P 00 P t B P B Ot t a t t a t t a t a W 4 71 2.R T 1 1 A 0 7 3 . 9 A 7 L 5 0 9 1 F B A B O 1 3 1 C G G C 1 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T GCGT C A A 3 9 1 CAA CG TA T C GA C 1 9 1 GAATGAAATATGT TA GT C 5 7 2
GAAT C CC C CC C A A GG T G TGC T T C A C C C CGC C C A C C C CGC C C C C C G A C C G GT A G G G AG A C TAC C C C C G AG C A C C C GA T A G A E P E E E n P P P i n i n i n i w t w t w t w t x x b a a 6 m m d E 6 P E P E P E P 00 P t B t P t B Ot t a t t a a t a W 4 71 2A A A 0.3 7 L C N C N 1 7 . 9 n 5 0 9 1 O A A 1 3 1 1 C 1 F F m S B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T TGA T C T T C 7 9 1 CCAGAC C T C GA C 1 9 1 GAATGAAATATGT TA GT C 9
6 2 6 2 6 2 6 2ACAG CG GGGG T G ACAG TAGGC C C C A TAC C G TGA GAG C C G CA GA T A G A G A G AC C C G GT CGG TA G GC A TATA C GAGG G G A AT C p pb a l e e - r 6 e w t Btt 1 v a o e b n E P E P E P E n i n i n P i n i w t w t w t w t x x d a a 6 m m b E 6 P E P E P E P 00 P t P P B Ot t a t tt tt W F a a a 4 7 9 1 2.3 1 2 1 0 7 . 9n . B T S V 5 R 5 0 1 m G 9 S I F C A A A C 1 3 1 C 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T TGA T C T T C 7 9 1 CCAGAC C T C GA C 1 9 1 GAATGAAATATGT TA GT C 9 6 2
A AGG A C A G C C CG AC C C T T C T A C T G T C C A T C C G T A C A GC T GA G GGGCG A T A A AA TGAG C A GTGA G A G A G A G A G g A G C G g T T G A E P E P E E n n P P i i n i n i w t w t w t w t xa x d b a 6 6 m m E P E P E P E P 00 P tt P tt P t O t P t a a a t a W 4 71 2 6 2 0. 7 3 . 9 a s o B C L M 2 A 5 0 9 1 1 3 1 R A B R T 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T C CTT T T C A 6 9 1 TCGAG G C CG A G 2 9 1 GGAGGAT C G T G T T TA GT C 3 7 2
CCAA CCAT GA T C C CA A AAG A CG A A T CA CGCG C T A C C G T G TGC T C C C C C GG T A T G T AG G G AC CG A A G GT C A G G G AG C A C C C E P E E E n P P P i n i n i n i w t w t w t w t x x b a b a 6 m 6 m E P E P E P E P 00 B t P t P P Ot a t t a t t a t a W 4 71 2.3 R T 1 A A 0 7 7 L C N . 9 5 0 9 1 F A B O A 1 3 1 C G C 1 F 1 B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T TGA T C T T C 7 9 1 CCAGAC C T C GA C 1 9 1 GAATGAAATATGT TA GT C 9 6 2 AC C C C
TAC C C G A G AGA C AG G ACGC ATAAC GA T A C C C C A A G A G A G E P E n P E P E P i n i n i n i w t w t w t w t xa x d a 6 m m b E P E P E 6 P E P 00 P t Ot P t P t P t a t a t t W 4 A a a 7 0 1 2. 1 3 A 0 7 3 . 9n . C 1 5 0 1 G N A B 9 1 3 m I A B L 1 1 S F F G A B 2 1
T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T CTAC G C C G 3 9 1 CAA CG TA T C GA C 2 9 1 GGAGGAT C G T G T T TA GT C 1 7 2 A T G
T C CGGA C ACAG CG T TAC T AA GT G C GGG G C C C A C T C G A T g GAG C A G TGA A AC ATAAC A G G g T T G A G T A G A G A G A G p pb a l e e - r Bt 8 e 3 v w t t a o e b nE P E E E n P P P i n i n i n i w t w t w t w t x a x a x m m d a 6 m E P E P E P E P 00 P tt P t P B O a t t a t t a t W B a 4 7 2 0 1.M C 3 0 7 3 2 A 1 . 9 n . GI B 5 0 9 1 L 1 3 1 1 B R T m S F A B 2 1
T T T T T T T T T T T T T T T T T T T T T T 0 2 2 C TAT C T TGG C G C 8 8 2 CAC C 2 8 2 TT TGG A 1 9 1 GAGAT T T T G 5 6 2 A
G GG C C G TA G C C C G TA G C G C C G TA G C G CA A A C CAGA G C C TG C TG C TG C GT C G C G G C G C G G C G C G G C GT C G GC C GT C p p b a l e - r e p Ptt p b a l e r e p Ptt p b a l e r e p Ptt p b a l e r e p Ptt a l r 0 e 5 v w t a - e w t a - e w t a - e w t a p b e o e b n 8 3 v o e b n 8 2 v o e b n 8 1 v o e b n - 8 v oE P E E E E n P P P P i n i n i n i n i w t w t w t w t w t x a x a x a x a x a m m m m m E P E P E P E P E P 0 5 8 8 - - 2- 1- 8- 0 0 P t P t P t P P Ot t a t t a t t a 8 3 a t a W 4 71 2 0. 1 3 S 1 V S 1 V S 1 V S 1 7 V S V . 9 5 0 9 1 A 1 3 1 A A A A A A A A A 1 B 2 1
T T T T T T T T GAT CAAC CAAA A A GGG A A CA A C T A G T TG G G ATAC T TG GGAAA A T CA G A AG A
GG T G C GG T G C GG T G C GG T G C GG T G GC C GT C G GC C GT C G GC C GT C G GC C GT C G GC C e e p e p e p e p e w t Ptt p b a l r e e e w Btt p b a l r e Btt p b a l r e Btt p b a l r e Btt b a - 83 v t e b a - 8 e w t 2 v e b a - 0 e w t 2 v e b a - 2 e w t 1 v e b a n o n o n o n o n EP E P E E n i n P i n P i n i w t w t w t w t xa x a x a x a m m m m E P E P E P E P 83 8 - 2 0 - 2 2 - 1- 0 0 Bt B B B O t tt t t a a t a t a W 4 71 2. 1 0 3 S 1 V S 1 V S 1 V S 7 V . 9 5 0 9 1 A 1 3 1 A A A A A A A 1 B 2 1
T T T T T T T T
TA G AG T C TA G A G A G A AG T C T G T C T G T C ATGAGG G G A AT A T AG A TA AG A TA AG C GAG G G A AT C GAG G G A AT C GAG G G A AT C p p pb a l e e a l e e p al e e p al e e - r e w t Ptt p b- r e w t Ptt p b- r e w t Ptt p b r e w Ptt 05 v o e b a n 8 3 v o e b a n 8 2 v o e b a - n 8 1 v t o e b a n EP E P E E n n P n P i i i n i w t w t w t w t xa x a x a x a m m m m E P E P E P E P 05 8 8 - - 2- 1- 0 0 Pt O t P t P t P t a t a 8 3 t a t a W 4 71 2 0. 5 7 3 9 R 5 R 5 R 5 R . 5 0 9 1 C C 1 3 1 C C C C C C 1 B 2 1
T T T T T T T T T T T T T T 0 2 2 C T T TAT G C C T G T CGAG T C T GG C A T 9 8 2 CTT CA GC T 5 8 2 C T G TGGG CA GC T 1 9 1 GAATGAAATATGT TA GT C 6 6 2 TAA TA
GAGG G G G T A TGAG A TGAG A TGAG A TGA G G A A C G G A AT C G G A AT C G G A AT C G G A p a p l e b r e p Ptt p b a l e e p Btt p b a l e e p Btt p b a l e e p Btt p b a l e e Btt- e w t - r e w t - r e w t - r e w t - r e w t 8 v e b a 8 3 v e b a 8 v e a 0 v e a 2 v e a o n o n 2 o b n 2 o b n 1 o b nE P E P E P E P E n i n i n i n P i n i w t w t w t w t w t x a x a x a x a x a m m m m m E P E P E P E P E P 8 8 8 0 2- 3- 2- 2- 1- 0 0 P tt B tt B t B t B t O a a t a t a t a W 4 71 2 0. 5 R 5 R 5 R 5 7 3 R 5 R . 9 5 0 9 1 C 1 3 1 C C C C C C C C C 1 B 2 1
T T T T T T T T AAA C T A T C C GGT G T T CGAAAAC TA A A A
GGGG G G GGGGG TC A A AGG TGA GA T C C C GAG C C C G TGAG C C AG T C C C A G T C A TAC C G A G A C G G A G A C G GT C AA T G GA T A G A EP E P E E n P P i n i n i n i w t w t w t w t st n s t n s t n s t n ll a i r l l a i l a i l a i a r a l r a l r A v A v A v Aa v 00 Btt P tt B t O t P t a a a t a W 4 71 2. B 6 2 6 2 0 7 3 9 T B T . a s a 5 0 9 1 C C o s o 1 3 1 A A R R 1 B 2 1
T T T T T T T T
G C C CAG TA A TAG TAG G TGAG G C T T C G T A C A TG C GAG GAC ATAAC G C G G C G G A AT C G A G AC C G G A G A G EP E P E P E n P i n i n i n i w t w t w t w t / xa E / / s t T x a E T x a E T n T mS mS mS E P A P E P A P E P A P l l a i r S A Aa v P / E 00 Pt B t B P O t a t t a t t a t a W 4 7 2 1 8 1 0.3 S V 5 3 7 R B T . . 9 5 0 1 A G B 9 1 3 1 A C C C A I F L A 1 B 2 1
T T T T T T T T
GGG G G A g G T T TGA G C T G C T A C C G A GG T A T G T G G T A T G T GC CG G G G A A A EP E P E P E n n i n P i i n i w t w t w t w t st n s t s t s t ll a i T r S n a T S n T S n T S a A l l i r a A l l a i r a A l l a i r A A v P / E A v P / E A v P / E Aa v P / E 00 Btt P t B t P t O a t a t a t a W 4 71 2. 1 0 7 3 . 9 CA R T R T A 5 0 9 1 1 3 R F F B 1 1 T C C G B 2 1
T T T T T T T T
ACCCA G T GAAC AA T GAAC T CCACC AG GC C G T G TGC TA G T G TGC T C C C C CGC G AC C A A G GT C G G G GT C A G G G AG C A C C C EP E P E E n n P P i i n i n i w t w t w t w t st n s t s t s t ll a i T r S n A l l a i T S n A l a i T S n A l a i T S A Aa v P / E r Aa v P l / E r Aa v P l / E r Aa v P / E 00 Bt P B P O t t a t t a t t a t a W 4 71 2 1 A 7 A A 0.3 7 7 . 9 5 0 A B L L C N 9 1 3 G O C 1 O C 1 A 1 F 1 1 B 2 1
T T T T T T T T
CCACC ACAG CG ACAGG TGCC G C C CGC C C C C C C TGC C C C G C A T C A C C C A T AC C A A T T AC C G CGGG CG G A G A G A G A G A G G G A E P E E n P P i n i n i 3 w t w t w t E P st n s t s t / xa E ai T S n ai T S n ai T S T l l r a A l l r A l l r A mS A v P / E Aa v P / E Aa v P / E E P A P B 0 0 B tt P tt B t 9 3- B t Oa a t a A t a W 4 A 9 3 1 7 1 2 0.3 C N 1 . 7 . 9 n 1 n s B N 5 0 9 1 A m m GI M 1 3 1 1 F S S F L B 2 1
T T
o 2 AC AC TG A A GGA T T A C GG T T T C C G G T C N AG R T T AGC AA g C e T CG CTG T CGG T GC T C CG
G A C C C gn G C T TGT A T T T T C AG G C C A C G T i k G GG C T T G G G G G ci a N/ 2 r t e p t x o p b a l e E n - r e e w Btt 6 v t a ec 1 o e b n e g g E 3 n e m i ni ti e t a y P E n P E P i n n E u q r P d r t i i P e S e s w t w t w t / xa E e r n mT S A mo ti ai t x a x a d E P A P N i R r r g P d Ea m m 6 V E P E P E P eP e e Btt : tis 0 n m B 0 i o 8 O a D tt n 2 0 a 1 e g - P t A t B t a t B t a t a W 4 82 7 1 2. B 1 6 0 7 3 9 T e l b e t . i s S V 5 R 2 . a 5 0 9 1 C a S GI A C s 1 3 1 A T F A C o R 1 B 2 1
T T T TAAGAT CAAC CAAAT T G C G CAT TGA T C TAT C T TGG C G CGAT C A T A TAG TAC TAG G
A AAC AAT T C T C T G G G T A A T A T C T G C T T C T G C T TGT T A AT T T T C A TGGT C GT T G T G T T G T AT C A A C C TG G TGT T GT T G G A G GG C T T G G G G G AA CAT A GG C A T C GG C C AT T AGT G G GT C T T G GG C T C T G GG C T T p pb a l e e - r e w t Btt 61 v a o e b n EP E P E P E E E E E E n n n P n P n P n P n P P i i i i i i i n i n i w t w t w t w t w t w t w t w t w t xa x a x x x b a a a m m 6 b 6 m m d 6 d 6 m E P E P E P E P E P E P E P E P E P 00 Btt P tt B tt P t B t P t B t P t P t O a a a t a t a t t t t W 4 A a a a a 7 2 9 B 2 . 1 S 1 S 5 6 R 5 R B B 2 6 2 9 1. 2 . 1 0 S 7 3 . 9 5 0 1 G V V T T a I A A C C C C s a o s o G V 9 I A 1 3 1 1 F A A C C A A R R F A B 2 1
4 9 2 CTCAT CGTAATGCGGAGCAT AGT CATAGACAC T T T C TGGAG A C C GGA G T C AAAG CAGG TAGT CGGC TAGGAAGT T C T C T A G C CGGG T T CGGAAAGAAAGAT GATATACAAGAG C T T
[Sequence table: rotated table of nucleotide sequences; its contents could not be recovered from the extracted text.]
EQUIVALENTS AND SCOPE

[0607] In the claims, articles such as “a,” “an,” and “the” may mean one or more than one unless indicated to the contrary or otherwise evident from the context. Claims or descriptions that include “or” between one or more members of a group are considered satisfied if one, more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process unless indicated to the contrary or otherwise evident from the context. The invention includes embodiments in which exactly one member of the group is present in, employed in, or otherwise relevant to a given product or process. The invention includes embodiments in which more than one, or all of the group members are present in, employed in, or otherwise relevant to a given product or process.

[0608] Furthermore, the invention encompasses all variations, combinations, and permutations in which one or more limitations, elements, clauses, and descriptive terms from one or more of the listed claims is introduced into another claim. For example, any claim that is dependent on another claim can be modified to include one or more limitations found in any other claim that is dependent on the same base claim. Where elements are presented as lists, e.g., in Markush group format, each subgroup of the elements is also disclosed, and any element(s) can be removed from the group. It should be understood that, in general, where the invention, or aspects of the invention, is/are referred to as comprising particular elements and/or features, certain embodiments of the invention or aspects of the invention consist, or consist essentially of, such elements and/or features. For purposes of simplicity, those embodiments have not been specifically set forth in haec verba herein. It is also noted that the terms “comprising” and “containing” are intended to be open and permit the inclusion of additional elements or steps. Where ranges are given, endpoints are included. Furthermore, unless otherwise indicated or otherwise evident from the context and understanding of one of ordinary skill in the art, values that are expressed as ranges can assume any specific value or sub-range within the stated ranges in different embodiments of the invention, to the tenth of the unit of the lower limit of the range, unless the context clearly dictates otherwise.

[0609] This application refers to various issued patents, published patent applications, journal articles, and other publications, all of which are incorporated herein by reference. If there is a conflict between any of the incorporated references and the instant specification, the specification shall control. In addition, any particular embodiment of the present invention that falls within the prior art may be explicitly excluded from any one or more of the claims. Because such embodiments are deemed to be known to one of ordinary skill in the art, they may be excluded even if the exclusion is not set forth explicitly herein. Any particular embodiment of the invention can be excluded from any claim, for any reason, whether or not related to the existence of prior art.

[0610] Those skilled in the art will recognize or be able to ascertain using no more than routine experimentation many equivalents to the specific embodiments described herein. The scope of the present embodiments described herein is not intended to be limited to the above Description, but rather is as set forth in the appended claims. Those of ordinary skill in the art will appreciate that various changes and modifications to this description may be made without departing from the spirit or scope of the present invention, as defined in the following claims.
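The claims identify Bxb1 variants with a compact substitution shorthand: wild-type residue, 1-based position in SEQ ID NO: 1, then the replacement residue (for example V74A), with X standing for any amino acid other than the wild type. The following is a minimal sketch of how that shorthand could be parsed and applied to a reference sequence. The function names and the six-residue toy sequence are hypothetical illustrations, not taken from the application, and SEQ ID NO: 1 itself is not reproduced here.

```python
import re
from typing import Iterable, NamedTuple


class Substitution(NamedTuple):
    wild_type: str    # residue expected in the reference, e.g. "V"
    position: int     # 1-based position in the reference sequence
    replacement: str  # new residue, or "X" for "any non-wild-type residue"


# One-letter residue, a position number, one-letter residue (e.g. "E229K").
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'V74A' into its three parts."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    wild_type, position, replacement = match.groups()
    return Substitution(wild_type, int(position), replacement)


def apply_substitutions(reference: str, labels: Iterable[str]) -> str:
    """Return a copy of `reference` carrying every listed substitution.

    Raises ValueError if a label names a residue that the reference does
    not have at that position, or if it uses the 'X' placeholder.
    """
    residues = list(reference)
    for label in labels:
        sub = parse_substitution(label)
        if sub.replacement == "X":
            raise ValueError(f"{label}: 'X' is a placeholder, not a residue")
        index = sub.position - 1  # labels are 1-based
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position outside the reference")
        if reference[index] != sub.wild_type:
            raise ValueError(
                f"{label}: reference has {reference[index]} at {sub.position}"
            )
        residues[index] = sub.replacement
    return "".join(residues)


# Hypothetical toy reference, NOT SEQ ID NO: 1.
toy_reference = "MKAVLG"
toy_variant = apply_substitutions(toy_reference, ["A3T", "V4I"])
```

Checking the wild-type residue before substituting catches off-by-one numbering errors, which are easy to make when positions are counted against a homologous recombinase rather than SEQ ID NO: 1.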
Claims
What is claimed is:
1. A Bxb1 recombinase comprising an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to the amino acid sequence of SEQ ID NO: 1, wherein the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions at positions selected from the group consisting of amino acid residues 3, 5, 10, 14, 15, 20, 23, 24, 25, 29, 35, 36, 39, 40, 43, 45, 47, 49, 50, 51, 54, 58, 60, 66, 68, 69, 70, 73, 74, 75, 78, 84, 86, 87, 89, 93, 95, 97, 100, 101, 105, 116, 119, 124, 127, 139, 147, 154, 157, 158, 169, 175, 179, 181, 183, 185, 194, 197, 199, 202, 203, 204, 207, 208, 209, 214, 221, 229, 239, 248, 252, 261, 266, 267, 273, 279, 280, 281, 284, 285, 287, 288, 291, 309, 311, 321, 328, 333, 334, 342, 343, 345, 347, 360, 361, 362, 365, 368, 374, 375, 378, 389, 393, 400, 411, 415, 419, 421, 422, 424, 434, 435, 438, 440, 447, 449, 453, 462, 463, 466, 468, 469, 478, 483, 485, 487, 490, 494, 496, and 497 of the amino acid sequence provided in SEQ ID NO: 1, or at corresponding positions in a homologous recombinase.
2. The Bxb1 recombinase of claim 1, wherein the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3X, V5X, S10X, D14X, A15X, E20X, L23X, E24X, S25X, L29X, W35X, D36X, G39X, V40X, D43X, D45X, S47X, A49X, V50X, D51X, D54X, R58X, N60X, A66X, E68X, E69X, Q70X, D73X, V74X, I75X, Y78X, T84X, S86X, I87X, H89X, L93X, H95X, A97X, H100X, K101X, V105X, T116X, A119X, A124X, G127X, E139X, F147X, Y154X, S157X, L158X, D169X, V175X, V179X, R181X, R183X, L185X, N194X, P197X, H199X, A202X, H203X, D204X, R207X, R208X, G209X, K214X, Q221X, E229X, M239X, A248X, G252X, A261X, A266X, E267X, E273X, R279X, A280X, E281X, K284X, T285X, R287X, A288X, A291X, E309X, A311X, H321X, S328X, K333X, H334X, M342X, A343X, W345X, A347X, A360X, E361X, R362X, K365X, V368X, A374X, V375X, A378X, S389X, S393X, S400X, A411X, A415X, E419X, E421X, G422X, E424X, E434X, T435X, R438X, G440X, D447X, A449X, T453X, L462X, T463X, V466X, G468X, G469X, D478X, E483X, H485X, R487X, S490X, R494X, H496X, and T497X relative to the amino acid sequence provided in SEQ ID NO: 1, or at corresponding positions in a homologous recombinase, wherein X represents any amino acid other than the wild type amino acid.
3. The Bxb1 recombinase of claim 1 or 2, wherein the amino acid sequence of the Bxb1 recombinase comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 substitutions selected from the group consisting of A3T, A3V, V5F, V5I, S10A, D14N, A15T, E20K, E20Q, L23F, L23M, E24K, S25I, L29F, W35P, W35L, D36G, D36V, G39D, V40I, D43E, D45G, S47A, A49E, A49T, V50I, D51E, D51N, D51Y, D54N, R58K, N60S, A66T, E68K, E69D, Q70P, D73G, V74A, V74M, I75V, Y78H, Y78N, T84S, S86G, S86N, S86T, I87T, I87V, H89N, L93M, H95Y, A97S, H100Y, K101R, V105I, T116P, A119S, A124S, G127E, E139A, F147Y, Y154C, S157G, L158M, D169N, V175I, V175M, V179G, V179M, R181Q, R183L, L185M, N194D, N194K, P197Q, P197T, H199Y, A202S, H203Y, D204G, R207I, R207Q, R208S, G209V, K214R, Q221R, E229K, M239L, A248V, G252S, A261V, A266T, E267D, E273D, E273K, R279C, A280T, E281K, K284N, T285A, R287P, A288T, A291S, A291T, E309D, A311V, H321N, S328T, K333N, H334P, M342V, A343T, W345L, A347T, A347V, A360T, E361D, E361G, R362K, K365N, V368A, V368N, A374V, V375I, A378T, S389R, S393F, S400Y, A411V, A415V, E419D, E421K, G422S, E424G, E434G, T435A, R438Q, G440E, D447N, A449V, T453A, T453N, L462M, T463I, V466M, G468D, G469R, D478E, E483K, H485Y, R487K, S490N, R494Q, H496P, and T497A relative to the amino acid sequence of SEQ ID NO: 1, or at corresponding positions in a homologous recombinase.
4. The Bxb1 recombinase of any one of claims 1-3, wherein the Bxb1 recombinase comprises a combination of substitutions of any one of the variants listed in Tables 1-7.
5. The Bxb1 recombinase of any one of claims 1-4, wherein the Bxb1 recombinase comprises substitutions at any of the following positions or groups of positions: Y78X; L23X and D51X; W35X; D51X; A49X; D45X; L29X; E229X; V105X; D73X; H100X; A449X; R183X; or E273X; wherein X represents any amino acid other than the wild type amino acid.
6. The Bxb1 recombinase of claim 5, wherein the Bxb1 recombinase comprises any of the following substitutions or groups of substitutions: Y78N; L23M and D51E; W35L; D51Y; A49E; D45G; L29F; E229K; D51N; Y78H; V105I; D73G; H100Y; A449V; R183L; or E273K.
7. The Bxb1 recombinase of any one of claims 1-4, wherein the Bxb1 recombinase comprises substitutions at any of the following positions or groups of positions: V5X, S86X, N194X, A266X, R362X, and A411X; A3X, V5X, and G39X; A3X, V5X, and K284X; H203X, A280X, and H485X; D36X, L185X, R362X, and H485X; S157X and H485X; A3X, V5X, F147X, H199X, R207X, and D447X; A3X and V5X; A3X, V5X, and H485X; A3X, V5X, and A374X; V5X and S86X; V5X, S86X, and T463X; V5X, S86X, and G469X; V5X, S86X, and A248X; V5X and N194X; V5X and H203X; V5X; S157X and A202X; E139X, S157X, and S400X; S157X; S157X and H485X; or A3X, V5X, A66X, V179X, and Q221X; wherein X represents any amino acid other than the wild type amino acid.
8. The Bxb1 recombinase of claim 7, wherein the Bxb1 recombinase comprises any of the following substitutions or groups of substitutions: V5I, S86G, N194D, A266V, R362K, and A411V; A3T, V5I, and G39D; A3T, V5I, and K284N; H203Y, A280T, and H485Y; D36G, L185M, R362K, and H485Y; S157G and H485Y; A3T, V5I, F147Y, H199Y, R207I, and D447N; A3T and V5I; A3T, V5I, and H485Y; A3T, V5I, and A374V; V5I and S86G; V5I, S86G, and T463I; V5I, S86G, and G469R; V5I, S86G, and A248V; V5I and N194D; V5I and H203Y; V5I; S157G and A202S; E139A, S157G, and S400Y; S157G; S157G and H485Y; or A3T, V5I, A66T, V179G, and Q221R.
9. The Bxb1 recombinase of any one of claims 1-4, wherein the Bxb1 recombinase comprises substitutions at any of the following positions or groups of positions: V5X, A119X, E281X, G422S, R487X; V5X, E281X, A311X, and G422X; V5X, A97X, H199X, A288X, G422X, E434X, and A449X; V5X, S86X, E281X, and G422X;
V5X, S86X, E281X, H334X, and G422X; A3X, S86X, and A347X; V5X, S86X, E281X, and G422X; S86X; A12X, I75X, G252X, T453X, and T503X; I75X, G252X, and T453X; S86X, S157X, and W345X; V5X, H89X, and E281X; A3X and E421X; A3X; S157X, T453X, and V466X; or A3X, A280X, E309X, and S328X; wherein X represents any amino acid other than the wild type amino acid.
10. The Bxb1 recombinase of claim 9, wherein the Bxb1 recombinase comprises any of the following substitutions or groups of substitutions: V5I, A119S, E281K, G422S, R487K; V5I, E281K, A311V, and G422S; V5I, A97S, H199Y, A288T, G422S, E434G, and A449V; V5I, S86N, E281K, and G422S; V5I, S86N, E281K, H334P, and G422S; A3T, S86N, and A347T; V5I, S86N, E281K, and G422S; S86N; A12D, I75V, G252S, T453A, and T503P; I75V, G252S, and T453A; S86N, S157G, and W345L; V5I, H89N, and E281K; A3T and E421K; A3T; S157G, T453A, and V466M; or
A3T, A280T, E309D, and S328T.
11. The Bxb1 recombinase of claim 9, wherein the Bxb1 recombinase comprises any of the following substitutions or groups of substitutions: V5I, A119S, E281K, G422S, R487K; V5I, E281K, A311V, and G422S; V5I, A97S, H199Y, A288T, G422S, E434G, and A449V; V5I, S86N, E281K, and G422S; V5I, S86N, E281K, H334P, and G422S; A3T, S86N, and A347T; V5I, S86N, E281K, and G422S; S86N; A12D, I75V, G252S, T453A, and T503P; S86N, S157G, and W345L; V5I, H89N, and E281K; A3T; S157G, T453A, and V466M; or A3T, A280T, E309D, and S328T.
12. The Bxb1 recombinase of any one of claims 1-4, wherein the Bxb1 recombinase comprises substitutions at any of the following positions or groups of positions: V5X, P197X, and R494X; D14X and E267X; D14X and E273D; D14X and E483X; E20X, D51X, and E68X; E20X; L29X, R183X, and K333X; D14X; E24X and A261X; I87X and E361X; Q70X and T84X;
A49X, S86X, and T116X; V5X, L29X, A124X, and R287X; E20X and E361X; V5X, L93X, S157X, D204X, and T453X; V5X, V74X, S389X, T453X, and G468X; V5X, S86X, and H321X; V5X, V74X, M239X, T453X, and G468X; V5X, S86X, S157X, K214X, E273X, and E361X; I75X, S157X, L158X, G252X, and T453X; I75X, S157X, G252X, and T453X; V5X, D14X, and R207X; V5X, R207X, T435X, and L462X; V50X, I87X, R208X, and V375X; D51X and V375X; V74X and M342X; V74X; D51X and V375X; N60X; S157X, G209X, and V368X; E20X, E69X, A291X, and A343X; or V5X, T116X, S157X, G209X, and T453X; wherein X represents any amino acid other than the wild type amino acid.
13. The Bxb1 recombinase of claim 12, wherein the Bxb1 recombinase comprises any of the following substitutions or groups of substitutions: V5I, P197T, and R494Q; D14N and E267D; D14N and E273D; D14N and E483K; E20K, D51N, and E68K; E20Q; L29F, R183L, and K333N;
D14N; E24K and A261V; I87V and E361D; Q70P and T84S; A49T, S86T, and T116P; V5I, L29F, A124S, and R287P; E20Q and E361D; V5I, L93M, S157G, D204G, and T453A; V5I, V74M, S389R, T453N, and G468D; V5I, S86N, and H321N; V5I, V74M, M239L, T453N, and G468D; V5I, S86N, S157G, K214R, E273D, and E361G; I75V, S157G, L158M, G252S, and T453A; I75V, S157G, G252S, and T453A; V5I, D14N, and R207Q; V5I, R207Q, T435A, and L462M; V50I, I87V, R208S, and V375I; D51E and V375I; V74M and M342V; V74A; D51E and V375I; N60S; S157G, G209V, and V368A; E20Q, E69D, A291T, and A343T; or V5I, T116P, S157G, G209V, and T453A.
14. The Bxb1 recombinase of any one of claims 1-4, wherein the Bxb1 recombinase comprises substitutions at any of the following groups of positions: L29X and H496P; A49X and H496X; L29X and E229X;
A49X and E229X; L29X and R207X; L29X and A261X; L29X and E361X; L29X and V375X; L29X and T453X; L29X and G468X; L29X and R494X; W35X and E229X; D51X and E229X; V105X and E229X; D14X and E229X; or V74X and E229X; wherein X represents any amino acid other than the wild type amino acid.
15. The Bxb1 recombinase of claim 14, wherein the Bxb1 recombinase comprises any of the following groups of substitutions: L29F and H496P; A49E and H496P; L29F and E229K; A49E and E229K; L29F and R207Q; L29F and A261V; L29F and E361G; L29F and V375I; L29F and T453A; L29F and T453N; L29F and G468D; L29F and R494Q; W35L and E229K; D51E and E229K;
V105I and E229K; D14N and E229K; or V74A and E229K.
16. The Bxb1 recombinase of any one of claims 1-4, wherein the Bxb1 recombinase comprises substitutions at any of the following positions or groups of positions: S157X, G209X, and V368X; A449X; V5X, T116X, S157X, G209X, and T453X; R183X; V5X, R207X, T435X, and L462X; E20X, E69X, A291X, and A343X; D51X; V40X; E24X and A261X; V5X, P197X, and R494X; N60X; D73X; V5X, L29X, A124X, and R287X; Y78X; V74X and M342X; E229X; Q70X and T84X; E20X, D51X, and E68X; E273X; H100X; E20X; E24X; D45X; L29X, R183X, and K333X; D14X;
D14X and E273X; W35X; V50X, I87X, R208X, and V375X; D51X and V375X; V5X, D14X, and R207X; E20X and E361X; D14X and E483X; D14X and E267X; I87X and E361X; A49X, S86X, and T116X; V74X; Y78X; L29X; V105X; E229X; or L29X, W35X, V74X, and V105X; wherein X represents any amino acid other than the wild type amino acid.
17. The Bxb1 recombinase of claim 16, wherein the Bxb1 recombinase comprises any of the following substitutions or groups of substitutions: S157G, G209V, and V368A; A449V; V5I, T116P, S157G, G209V, and T453A; R183L; V5I, R207Q, T435A, and L462M; E20Q, E69D, A291T, and A343T; D51N; V40I; E24K and A261V; V5I, P197T, and R494Q; N60S;
D73G; V5I, L29F, A124S, and R287P; Y78H; V74M and M342V; E229K; Q70P and T84S; E20K, D51N, and E68K; E273K; H100Y; E20Q; E24K; D45G; L29F, R183L, and K333N; D14N; D14N and E273D; W35L; D51Y; D51E; V50I, I87V, R208S, and V375I; D51E and V375I; V5I, D14N, and R207Q; E20Q and E361D; D14N and E483K; D14N and E267D; I87V and E361D; A49T, S86T, and T116P; V74A; Y78N; L29F; V105I; E229K; or
L29F, W35L, V74A, and V105I.
18. The Bxb1 recombinase of any one of claims 1-4, wherein the Bxb1 recombinase comprises substitutions at any of the following groups of positions: D14X, R207X, and T453X; D14X, A261X, and T453X; D14X, A261X, and V375X; D14X, R207X, and V375X; D14X, E361X, and V375X; V105X, A261X, and T453X; V105X, R207X, and T453X; V74X, A261X, and T453X; V74X, R207X, and V375X; V74X, R207X, and T453X; V105X, A261X, and V375X; V74X, A261X, and V375X; V105X, R207X, and V375X; D14X, E229X, and V375X; D14X, E229X, and T453X; V105X, E229X, and T453X; V105X, E229X, and V375X; V74X, E229X, and T453X; or V74X, E229X, and V375X; wherein X is any amino acid other than the wild type amino acid.
19. The Bxb1 recombinase of claim 18, wherein the Bxb1 recombinase comprises any of the following groups of substitutions: D14N, R207Q, and T453A; D14N, A261V, and T453A; D14N, A261V, and V375I; D14N, R207Q, and V375I;
D14N, E361G, and V375I; V105I, A261V, and T453A; V105I, R207Q, and T453A; V74A, A261V, and T453A; V74A, R207Q, and V375I; V74A, R207Q, and T453A; V105I, A261V, and V375I; V74A, A261V, and V375I; V105I, R207Q, and V375I; D14N, E229K, and V375I; D14N, E229K, and T453A; V105I, E229K, and T453A; V105I, E229K, and V375I; V74A, E229K, and T453A; or V74A, E229K, and V375I.
20. The Bxb1 recombinase of any one of claims 1-4, wherein the Bxb1 recombinase comprises substitutions at any of the following positions or groups of positions: V74X; or V74X, E229X, and V375X; wherein X is any amino acid other than the wild type amino acid.
21. The Bxb1 recombinase of claim 20, wherein the Bxb1 recombinase comprises substitutions at any of the following positions or groups of positions: V74A; or V74A, E229K, and V375I.
22. A polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21, optionally wherein the polynucleotide is RNA, e.g., mRNA.
23. A vector comprising the polynucleotide of claim 22.
24. A cell comprising the Bxb1 recombinase of any one of claims 1-21, the polynucleotide of claim 22, or the vector of claim 23.
25. A kit comprising the Bxb1 recombinase of any one of claims 1-21, the polynucleotide of claim 22, or the vector of claim 23.
26. A system comprising the Bxb1 recombinase of any one of claims 1-21.
27. The system of claim 26 further comprising a prime editor or one or more polynucleotides encoding the prime editor.
28. The system of claim 26 or 27 further comprising one or more prime editing guide RNAs (pegRNAs) or one or more polynucleotides encoding the one or more pegRNAs, wherein each pegRNA comprises a DNA synthesis template encoding a recombinase recognition site recognizable by a Bxb1 recombinase.
29. A system comprising a polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21, a polynucleotide encoding a prime editor, and one or more prime editing guide RNAs (pegRNAs) or one or more polynucleotides encoding one or more pegRNAs, wherein each of the one or more pegRNAs comprises a DNA synthesis template encoding a recombinase recognition site.
30. The system of claim 29 further comprising a first prime editing guide RNA (pegRNA) or one or more polynucleotides encoding the first pegRNA, and a second pegRNA or one or more polynucleotides encoding the second pegRNA, wherein the first pegRNA comprises a first spacer, first gRNA core, first PBS, and a first DNA synthesis template, wherein the second pegRNA comprises a second spacer, second gRNA core, second PBS, and a second DNA synthesis template, wherein the first spacer is complementary to a first site on a first strand of a double stranded target DNA, wherein the second spacer is complementary to a second site on a second strand of the double stranded target DNA, wherein the first DNA synthesis template comprises a region of complementarity to the second DNA synthesis template and optionally comprises a region not complementary to the second DNA synthesis template, and wherein the sequence 5'-[non-complementary region of the first DNA synthesis template]-[second DNA synthesis template]-3' comprises a recombinase recognition site recognizable by a Bxb1 recombinase.
31. The system of claim 30, wherein the first DNA synthesis template comprises the recombinase recognition site.
32. The system of claim 30 or 31, wherein the second DNA synthesis template comprises the recombinase recognition site.
33. The system of any one of claims 28-32, wherein the recombinase recognition site is an attB sequence.
34. The system of any one of claims 28-32, wherein the recombinase recognition site is an attP sequence.
35. The system of any one of claims 26-34 further comprising a polynucleotide comprising a donor DNA for insertion into a target nucleic acid.
36. The system of claim 35, wherein the donor DNA comprises one or more genes.
37. The system of claim 35 or 36, wherein the polynucleotide further comprises one or more recombinase recognition sites recognizable by a Bxb1 recombinase, optionally wherein the donor DNA is flanked by one or two recombinase recognition sites.
38. The system of claim 33, wherein the polynucleotide further comprises an attB sequence, optionally wherein the donor DNA is flanked by one or two attB recombinase recognition sites.
39. The system of claim 34, wherein the polynucleotide further comprises an attP sequence, optionally wherein the donor DNA is flanked by one or two attP recombinase recognition sites.
40. The system of claim 39, wherein the Bxb1 recombinase comprises a V74A substitution relative to SEQ ID NO: 1.
41. The system of claim 38, wherein the Bxb1 recombinase comprises V74A, E229K, and V375I substitutions relative to SEQ ID NO: 1.
42. The system of any one of claims 33-41, wherein the attB sequence comprises SEQ ID NO: 112, or the reverse complement thereof.
43. The system of any one of claims 34-42, wherein the attP sequence comprises SEQ ID NO: 111 or the reverse complement thereof.
44. The system of any one of claims 26-43, wherein the system comprises a pegRNA comprising a DNA synthesis template encoding an attP recombinase recognition site.
45. The system of any one of claims 26-44, wherein the system comprises a pegRNA comprising a DNA synthesis template encoding an attB recombinase recognition site.
46. The system of any one of claims 26-45, wherein the system comprises a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site and a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site.
47. The system of any one of claims 26-46, wherein the system comprises a first pegRNA comprising a DNA synthesis template encoding an attP recombinase recognition site and a second pegRNA comprising a DNA synthesis template encoding an attB recombinase recognition site.
48. The system of claim 46 or 47, wherein the first and the second recombinase recognition sites are encoded in the same orientation.
49. The system of claim 46 or 47, wherein the first and the second recombinase recognition sites are encoded in opposite orientations.
50. The system of any one of claims 46-49, wherein the first pegRNA targets a first site on a genome, and the second pegRNA targets a second site upstream or downstream of the first site on the genome.
51. The system of any one of claims 46-49, wherein the first pegRNA targets a site on a first chromosome, and the second pegRNA targets a site on a second chromosome.
52. The system of any one of claims 26-51, wherein the prime editor comprises a nucleic acid programmable DNA-binding protein (napDNAbp) and a reverse transcriptase.
53. The system of claim 52, wherein the napDNAbp comprises an amino acid sequence of any one of SEQ ID NOs: 6-27, or an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to any one of SEQ ID NOs: 6-27.
54. The system of claim 52 or 53, wherein the napDNAbp is a Cas9 protein.
55. The system of any one of claims 52-54, wherein the napDNAbp is a Cas9 nickase.
56. The system of any one of claims 52-55, wherein the reverse transcriptase comprises an amino acid sequence of any one of SEQ ID NOs: 28-78, or an amino acid sequence that is at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to any one of SEQ ID NOs: 28-78.
57. The system of any one of claims 52-56, wherein the reverse transcriptase is an MMLV reverse transcriptase.
58. The system of any one of claims 52-57, wherein the napDNAbp or the reverse transcriptase is fused to the Bxb1 recombinase directly or via a peptide linker.
59. The system of any one of claims 52-57, wherein the prime editor is not covalently linked to the Bxb1 recombinase.
60. The system of any one of claims 26-59, wherein the prime editor is PE2, PE3, PE4, PE5, PE2max, PE3max, PE4max, PE5max, or a PE6 prime editor.
61. The system of any one of claims 29-60, wherein the polynucleotide encoding the Bxb1 recombinase and/or the polynucleotide encoding the prime editor comprises RNA.
62. A system comprising: (i) a pegRNA or a first polynucleotide encoding the pegRNA, wherein the pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into a target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site.
63. A system comprising: (i) one or more pegRNAs or a first polynucleotide encoding one or more pegRNAs, wherein the one or more pegRNAs each comprise a DNA synthesis template encoding a first
recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into a target nucleic acid, wherein the DNA is flanked on both sides by a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site.
64. A system comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21.
65. A system comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for installation at a site in a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site in a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site;
(iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21.
66. A system comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation of the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21.
67. The system of any one of claims 62-66, wherein each of the first, second, third, and fourth polynucleotides are comprised on separate vectors for recombinant expression.
68. The system of any one of claims 62-67, wherein any of the polynucleotides encoding the prime editor and/or the Bxb1 recombinase comprise RNA.
69. A composition comprising the Bxb1 recombinase of any one of claims 1-21, a prime editor, and one or more prime editing guide RNAs (pegRNAs) comprising a DNA synthesis template encoding a recombinase recognition site.
70. A composition comprising a polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21, a polynucleotide encoding a prime editor, and one or more prime editing guide RNAs (pegRNAs) or one or more polynucleotides encoding one or more pegRNAs, wherein the one or more pegRNAs each comprise a DNA synthesis template encoding a recombinase recognition site.
71. The composition of claim 69 or 70 further comprising a polynucleotide comprising DNA for insertion into a target nucleic acid.
72. The composition of claim 70 or 71, wherein the polynucleotide encoding the Bxb1 recombinase and/or the polynucleotide encoding the prime editor comprise RNA.
73. A composition comprising: (i) a pegRNA or a first polynucleotide encoding the pegRNA, wherein the pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into a target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site.
74. A composition comprising: (i) one or more pegRNAs or a first polynucleotide encoding one or more pegRNAs, wherein each of the one or more pegRNAs comprises a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into a target nucleic acid, wherein the DNA is flanked on both sides by a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site.
75. A composition comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for
installation at a first site on a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site on a target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21.
76. A composition comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site for installation at a site on a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site on a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21.
77. A composition comprising: (i) a first pegRNA or a first polynucleotide encoding the first pegRNA, wherein the first pegRNA comprises a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second pegRNA or a second polynucleotide encoding the second pegRNA, wherein the second pegRNA comprises a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation of the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site;
(iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21.
78. The composition of any one of claims 73-77, wherein each of the first, second, third, and fourth polynucleotides are comprised on separate vectors for recombinant expression.
79. The composition of any one of claims 73-78, wherein any of the polynucleotides encoding the prime editor and/or the Bxb1 recombinase comprise RNA.
80. A cell comprising the composition of any one of claims 69-79.
81. A kit comprising the composition of any one of claims 69-79.
82. A method for modifying one or more target nucleic acids in a cell comprising contacting the one or more target nucleic acids with the Bxb1 recombinase of any one of claims 1-21.
83. A method for modifying a target nucleic acid in a cell using prime editing and a recombinase, the method comprising expressing in the cell a polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21, a polynucleotide encoding a prime editor, and one or more polynucleotides encoding one or more prime editing guide RNAs (pegRNAs) comprising DNA synthesis templates encoding one or more recombinase recognition sites.
84. A method for modifying a target nucleic acid in a cell using prime editing and a recombinase, the method comprising expressing in the cell a polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21 and a polynucleotide encoding a prime editor, and providing to the cell one or more prime editing guide RNAs (pegRNAs) comprising DNA synthesis templates encoding one or more recombinase recognition sites.
85. The method of claim 83 or 84, wherein the polynucleotide encoding the Bxb1 recombinase and/or the polynucleotide encoding the prime editor comprises RNA.
86. The method of any one of claims 83-85 further comprising expressing in the cell a polynucleotide comprising DNA for insertion into the target nucleic acid.
87. The method of claim 86, wherein the ratio of the polynucleotide encoding the prime editor to the polynucleotide encoding the pegRNA to the polynucleotide encoding the Bxb1 recombinase to the polynucleotide encoding the DNA for insertion into the target nucleic acid is about 10:1:10:15.
88. The method of claim 86 or 87, wherein the DNA comprises one or more donor genes.
89. The method of any one of claims 86-88, wherein the DNA comprises a recombinase recognition site, optionally wherein the recombinase recognition site is an attB site or an attP site.
90. The method of claim 89, wherein the prime editor installs a recombinase recognition site in the target nucleic acid, optionally wherein the recombinase recognition site is an attP site or an attB site, thereby facilitating Bxb1-mediated recombination with the recombinase recognition site flanking the DNA, resulting in insertion of the DNA into the target nucleic acid.
91. The method of any one of claims 86-90, wherein the DNA is flanked on both sides by a recombinase recognition site, optionally wherein the recombinase recognition site is an attB site or an attP site.
92. The method of claim 91, wherein the prime editor installs a first instance and a second instance of a recombinase recognition site in the target nucleic acid, optionally wherein the first and second instance of the recombinase recognition site are both attP sites or both attB sites, thereby facilitating Bxb1-mediated recombination between the recombinase recognition sites in the target nucleic acid and the recombinase recognition sites flanking the DNA, resulting in excision of the target nucleic acid sequence between the first instance and the second instance of the recombinase recognition site and insertion of the DNA in its place.
93. The method of claim 86, comprising expressing in the cell a polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site, and a polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site.
94. The method of claim 93, wherein the first and the second recombinase recognition sites are encoded in the same orientation.
95. The method of claim 94, wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated deletion of the nucleic acid between the first and the second recombinase recognition sites.
96. The method of claim 94, wherein a first prime editor installs the first recombinase recognition site into a target nucleic acid sequence on a first chromosome, and a second prime editor installs the second recombinase recognition site into a target nucleic acid sequence on a second chromosome, thereby facilitating Bxb1-mediated recombination between the two chromosomes.
97. The method of claim 93, wherein the first and the second recombinase recognition sites are encoded in opposite orientations.
98. The method of claim 97, wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated inversion of the nucleic acid sequence between the first and the second recombinase recognition sites.
99. A method for inserting DNA into a target nucleic acid in a cell using prime editing and a recombinase, the method comprising expressing in the cell: (i) a first polynucleotide encoding a pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs the first recombinase recognition site in the target nucleic acid, thereby facilitating Bxb1-mediated recombination with the second recombination site, resulting in insertion of the DNA into the target nucleic acid.
100. A method for exchanging DNA in a target nucleic acid in a cell using prime editing and a recombinase, the method comprising expressing in the cell: (i) a first polynucleotide encoding one or more pegRNAs comprising a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA is flanked on both sides by a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs a first instance and a second instance of the first recombinase recognition site in the target nucleic acid, thereby facilitating Bxb1-mediated recombination between the first recombinase recognition sites in the target nucleic acid and the second recombinase recognition sites flanking the DNA, resulting in excision of the target nucleic acid sequence between the first instance and the second instance of the first recombinase recognition site and insertion of the DNA in its place.
101. A method for deleting DNA from a target nucleic acid in a cell using prime editing and a recombinase, the method comprising expressing in the cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated deletion of the nucleic acid between the first and the second recombinase recognition sites.
102. A method for recombining target nucleic acids in two chromosomes in a cell using prime editing and a recombinase, the method comprising expressing in the cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a site on a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site on a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; wherein a first prime editor installs the first recombinase recognition site into a target nucleic acid on a first chromosome, and a second prime editor installs the second recombinase
recognition site into a target nucleic acid on a second chromosome, thereby facilitating Bxb1-mediated recombination between the two chromosomes.
103. A method for inverting a target nucleic acid in a cell using prime editing and a recombinase, the method comprising expressing in the cell: (i) a first polynucleotide encoding a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; (ii) a second polynucleotide encoding a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation as the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; (iii) a third polynucleotide encoding a prime editor; and (iv) a fourth polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated inversion of the nucleic acid between the first and the second recombinase recognition sites.
104. A method for inserting DNA into a target nucleic acid in a cell using prime editing and a recombinase, the method comprising providing to the cell a pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site; and expressing in the cell: (i) a first polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; (ii) a second polynucleotide encoding a prime editor; and (iii) a third polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA comprises a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site;
wherein the prime editor installs the first recombinase recognition site in the target nucleic acid, thereby facilitating Bxb1-mediated recombination with the second recombination site, resulting in insertion of the DNA into the target nucleic acid.
105. A method for exchanging DNA in a target nucleic acid in a cell using prime editing and a recombinase, the method comprising providing to the cell one or more pegRNAs comprising a DNA synthesis template encoding a first recombinase recognition site for installation at one or more sites in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site; and expressing in the cell: (i) a first polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; (ii) a second polynucleotide encoding a prime editor; and (iii) a third polynucleotide encoding a DNA for insertion into the target nucleic acid, wherein the DNA is flanked on both sides by a second recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; wherein the prime editor installs a first instance and a second instance of the first recombinase recognition site in the target nucleic acid, thereby facilitating Bxb1-mediated recombination between the first recombinase recognition sites in the target nucleic acid and the second recombinase recognition sites flanking the DNA, resulting in excision of the target nucleic acid sequence between the first instance and the second instance of the first recombinase recognition site and insertion of the DNA in its place.
106. A method for deleting DNA from a target nucleic acid in a cell using prime editing and a recombinase, the method comprising providing to the cell 1) a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a first site in a target nucleic acid, optionally wherein the first recombinase recognition site is an attP site or an attB site, and 2) a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a second site in the same target nucleic acid, optionally wherein the second recombinase recognition site is an attB site or an attP site; and expressing in the cell: (i) a first polynucleotide encoding a prime editor; and (ii) a second polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21;
wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated deletion of the nucleic acid between the first and the second recombinase recognition sites.
107. A method for recombining target nucleic acids in two chromosomes in a cell using prime editing and a recombinase, the method comprising providing to the cell 1) a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site for installation at a site on a first chromosome, optionally wherein the first recombinase recognition site is an attP site or an attB site, and 2) a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the same orientation as the first recombinase recognition site for installation at a site on a second chromosome, optionally wherein the second recombinase recognition site is an attB site or an attP site; and expressing in the cell: (i) a first polynucleotide encoding a prime editor; and (ii) a second polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21; wherein a first prime editor installs the first recombinase recognition site into a target nucleic acid on a first chromosome, and a second prime editor installs the second recombinase recognition site into a target nucleic acid on a second chromosome, thereby facilitating Bxb1-mediated recombination between the two chromosomes.
108. A method for inverting a target nucleic acid in a cell using prime editing and a recombinase, the method comprising providing to the cell 1) a first pegRNA comprising a DNA synthesis template encoding a first recombinase recognition site, optionally wherein the first recombinase recognition site is an attP site or an attB site, and 2) a second pegRNA comprising a DNA synthesis template encoding a second recombinase recognition site in the opposite orientation as the first recombinase recognition site, optionally wherein the second recombinase recognition site is an attB site or an attP site; and expressing in the cell: (i) a first polynucleotide encoding a prime editor; and (ii) a second polynucleotide encoding the Bxb1 recombinase of any one of claims 1-21;
wherein a first prime editor installs the first recombinase recognition site into the target nucleic acid, and a second prime editor installs the second recombinase recognition site into the target nucleic acid at a position upstream or downstream of the first recombinase recognition site, thereby facilitating Bxb1-mediated inversion of the nucleic acid between the first and the second recombinase recognition sites.
109. The method of any one of claims 99-108, wherein any of the polynucleotides encoding the Bxb1 recombinase and/or the prime editor comprise RNA.
110. The method of any one of the preceding claims, wherein a first pegRNA and a second pegRNA each produce DNA flaps on the target nucleic acid that partially overlap each other, wherein each flap comprises a 5′ portion that does not overlap with the other flap.
111. The method of claim 110, wherein the partially overlapping flaps promote integration of a donor DNA into the target nucleic acid and prevent recombination between a polynucleotide encoding the donor DNA and a polynucleotide encoding a pegRNA.
112. The method of any one of claims 82-111, wherein the method is performed in vitro.
113. The method of any one of claims 82-111, wherein the method is performed ex vivo.
114. The method of any one of claims 82-111, wherein the method is performed in vivo.
115. The method of claim 114, wherein the method is performed in a subject.
116. The method of claim 115, wherein the subject is a human.
117. The method of any one of claims 82-116, wherein the target nucleic acid is edited in order to treat a disease or disorder.
118. Use of the recombinase of any one of claims 1-21 in the manufacture of a medicament.
119. The recombinase of any one of claims 1-21 for use in medicine.
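The method claims above tie the recited rearrangement to where the two pegRNA-installed recombinase recognition sites sit and how they are oriented. The following is a minimal bookkeeping sketch of that mapping, not a biological model and not part of the claims. The `InstalledSite` type, the `recited_outcome` function, and the molecule labels are hypothetical names introduced only for illustration, and the sketch ignores attP/attB pairing, efficiency, and donor DNA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstalledSite:
    """A recombinase recognition site installed by prime editing (illustrative only)."""
    molecule: str     # hypothetical label, e.g. "chr1" or "chr2"
    position: int     # hypothetical coordinate on that molecule
    orientation: str  # "+" or "-"


def recited_outcome(first: InstalledSite, second: InstalledSite) -> str:
    """Return the outcome the claims recite for a pair of installed sites."""
    for site in (first, second):
        if site.orientation not in ("+", "-"):
            raise ValueError(f"orientation must be '+' or '-', got {site.orientation!r}")

    same_orientation = first.orientation == second.orientation

    if first.molecule != second.molecule:
        # Claims 96, 102, and 107: same orientation, sites on two chromosomes.
        if same_orientation:
            return "recombination between chromosomes"
        # Opposite orientations on two chromosomes are not addressed in these claims.
        return "not recited"

    # Both sites on the same target nucleic acid, one upstream or downstream of the other.
    if same_orientation:
        return "deletion"   # claims 95, 101, and 106
    return "inversion"      # claims 98, 103, and 108


if __name__ == "__main__":
    a = InstalledSite("chr1", 100, "+")
    print(recited_outcome(a, InstalledSite("chr1", 900, "+")))
    print(recited_outcome(a, InstalledSite("chr1", 900, "-")))
    print(recited_outcome(a, InstalledSite("chr2", 40, "+")))
```

Insertion and exchange of donor DNA (claims 99, 100, 104, and 105) depend on recognition sites carried by the donor polynucleotide and are deliberately left out of this sketch.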
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202363484184P | 2023-02-09 | 2023-02-09 | |
US63/484,184 | 2023-02-09 | | |
US202463619465P | 2024-01-10 | 2024-01-10 | |
US63/619,465 | 2024-01-10 | | |
Publications (3)
Publication Number | Publication Date |
---|---|
WO2024168147A2 (en) | 2024-08-15 |
WO2024168147A3 (en) | 2024-09-19 |
WO2024168147A9 (en) | 2024-10-10 |
Family
ID=90366757
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2024/014998 WO2024168147A2 (en) | 2023-02-09 | 2024-02-08 | Evolved recombinases for editing a genome in combination with prime editing |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024168147A2 (en) |
Family Cites Families (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4880635B1 (en) | 1984-08-08 | 1996-07-02 | Liposome Company | Dehydrated liposomes |
US4921757A (en) | 1985-04-26 | 1990-05-01 | Massachusetts Institute Of Technology | System for delayed and pulsed release of biologically active substances |
US4920016A (en) | 1986-12-24 | 1990-04-24 | Linear Technology, Inc. | Liposomes with enhanced circulation time |
JPH0825869B2 (en) | 1987-02-09 | 1996-03-13 | 株式会社ビタミン研究所 | Antitumor agent-embedded liposome preparation |
US4911928A (en) | 1987-03-13 | 1990-03-27 | Micro-Pak, Inc. | Paucilamellar lipid vesicles |
US4917951A (en) | 1987-07-28 | 1990-04-17 | Micro-Pak, Inc. | Lipid vesicles formed of surfactants and steroids |
US5244797B1 (en) | 1988-01-13 | 1998-08-25 | Life Technologies Inc | Cloned genes encoding reverse transcriptase lacking rnase h activity |
AU785007B2 (en) | 1999-11-24 | 2006-08-24 | Mcs Micro Carrier Systems Gmbh | Polypeptides comprising multimers of nuclear localization signals or of protein transduction domains and their use for transferring molecules into cells |
US8178291B2 (en) | 2005-02-18 | 2012-05-15 | Monogram Biosciences, Inc. | Methods and compositions for determining hypersusceptibility of HIV-1 to non-nucleoside reverse transcriptase inhibitors |
US9783791B2 (en) | 2005-08-10 | 2017-10-10 | Agilent Technologies, Inc. | Mutant reverse transcriptase and methods of use |
WO2008132722A1 (en) | 2007-04-26 | 2008-11-06 | Ramot At Tel-Aviv University Ltd. | Pluripotent autologous stem cells from oral mucosa and methods of use |
AU2010221284B2 (en) | 2009-03-04 | 2015-10-01 | Board Of Regents, The University Of Texas System | Stabilized reverse transcriptase fusion proteins |
US9458484B2 (en) | 2010-10-22 | 2016-10-04 | Bio-Rad Laboratories, Inc. | Reverse transcriptase mixtures with improved storage stability |
JO3470B1 (en) | 2012-10-08 | 2020-07-05 | Merck Sharp & Dohme | 5-phenoxy-3h-pyrimidin-4-one derivatives and their use as hiv reverse transcriptase inhibitors |
AU2017308889B2 (en) | 2016-08-09 | 2023-11-09 | President And Fellows Of Harvard College | Programmable Cas9-recombinase fusion proteins and uses thereof |
US9580698B1 (en) | 2016-09-23 | 2017-02-28 | New England Biolabs, Inc. | Mutant reverse transcriptase |
BR112021018607A2 (en) | 2019-03-19 | 2021-11-23 | Massachusetts Inst Technology | Methods and compositions for editing nucleotide sequences |
GB202006462D0 (en) * | 2020-05-04 | 2020-06-17 | Mote Res Limited | Modifying genomes with integrase |
AU2021350835A1 (en) | 2020-09-24 | 2023-04-27 | President And Fellows Of Harvard College | Prime editing guide rnas, compositions thereof, and methods of using the same |
KR20230091894A (en) * | 2020-10-21 | 2023-06-23 | 메사추세츠 인스티튜트 오브 테크놀로지 | Systems, methods, and compositions for site-specific genetic engineering using programmable addition via site-specific targeting elements (PASTE) |
EP4347859A4 (en) * | 2021-05-26 | 2025-07-02 | Flagship Pioneering Innovations Vi Llc | INTEGRASE COMPOSITIONS AND METHODS |
2024
- 2024-02-08 WO PCT/US2024/014998 patent/WO2024168147A2/en unknown
Also Published As
Publication number | Publication date |
---|---|
WO2024168147A3 (en) | 2024-09-19 |
WO2024168147A2 (en) | 2024-08-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20250011748A1 (en) | Base editors, compositions, and methods for modifying the mitochondrial genome | |
US20240209338A1 (en) | Evolution of cytidine deaminases | |
US20240417719A1 (en) | Methods and compositions for editing a genome with prime editing and a recombinase | |
CN116497067B (en) | Compositions and methods for treating hemoglobinopathies | |
US20220315906A1 (en) | Base editors with diversified targeting scope | |
US20230021641A1 (en) | Cas9 variants having non-canonical pam specificities and uses thereof | |
US20220307001A1 (en) | Evolved cas9 variants and uses thereof | |
US20240173430A1 (en) | Base editing for treating hutchinson-gilford progeria syndrome | |
US20220177877A1 (en) | Highly multiplexed base editing | |
WO2021158999A1 (en) | Gene editing methods for treating spinal muscular atrophy | |
AU2022206476A1 (en) | Prime editor variants, constructs, and methods for enhancing prime editing efficiency and precision | |
EP4143315A1 (en) | Targeted base editing of the USH2A gene | |
AU2022325166A1 (en) | Improved prime editors and methods of use | |
WO2024155745A1 (en) | Base editing-mediated readthrough of premature termination codons (bert) | |
WO2024155741A1 (en) | Prime editing-mediated readthrough of premature termination codons (pert) | |
WO2024168147A9 (en) | Evolved recombinases for editing a genome in combination with prime editing | |
WO2023205687A1 (en) | Improved prime editing methods and compositions | |
WO2024108092A1 (en) | Prime editor delivery by aav | |
WO2024077267A1 (en) | Prime editing methods and compositions for treating triplet repeat disorders | |
AU2023325079A1 (en) | Evolved cytosine deaminases and methods of editing dna using same | |
WO2024243415A1 (en) | Evolved and engineered prime editors with improved editing efficiency | |
CN118056010A (en) | Improved prime editors and methods of use | |
EP4323384A2 (en) | Evolved double-stranded dna deaminase base editors and methods of use | |
CN117321201A (en) | Prime editor variants, constructs, and methods for enhancing prime editing efficiency and precision | |
HK1261797A1 (en) | Evolved cas9 proteins for gene editing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24712644; Country of ref document: EP; Kind code of ref document: A2 |