WO2024102666A2 - Serine recombinases for gene editing
Serine recombinases for gene editing
- Publication number
- WO2024102666A2 (PCT/US2023/078852)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- aav
- sequence
- attachment site
- gene editing
- derivative
- Prior art date
Links
- 102000018120 Recombinases Human genes 0.000 title claims abstract description 179
- 108010091086 Recombinases Proteins 0.000 title claims abstract description 179
- MTCFGRXMJLQNBG-UHFFFAOYSA-N Serine Natural products OCC(N)C(O)=O MTCFGRXMJLQNBG-UHFFFAOYSA-N 0.000 title claims abstract description 166
- 238000010362 genome editing Methods 0.000 title claims abstract description 79
- 150000007523 nucleic acids Chemical group 0.000 claims abstract description 129
- 238000000034 method Methods 0.000 claims abstract description 56
- 230000010354 integration Effects 0.000 claims abstract description 29
- 102000039446 nucleic acids Human genes 0.000 claims description 122
- 108020004707 nucleic acids Proteins 0.000 claims description 122
- 210000004027 cell Anatomy 0.000 claims description 105
- 241000700605 Viruses Species 0.000 claims description 96
- 102000040430 polynucleotide Human genes 0.000 claims description 73
- 108091033319 polynucleotide Proteins 0.000 claims description 73
- 239000002157 polynucleotide Substances 0.000 claims description 73
- 108090000623 proteins and genes Proteins 0.000 claims description 51
- 230000006798 recombination Effects 0.000 claims description 43
- 238000005215 recombination Methods 0.000 claims description 43
- 125000003729 nucleotide group Chemical group 0.000 claims description 42
- 101710163270 Nuclease Proteins 0.000 claims description 39
- 239000002773 nucleotide Substances 0.000 claims description 39
- 108091033409 CRISPR Proteins 0.000 claims description 30
- 101000607560 Homo sapiens Ubiquitin-conjugating enzyme E2 variant 3 Proteins 0.000 claims description 30
- 102100039936 Ubiquitin-conjugating enzyme E2 variant 3 Human genes 0.000 claims description 30
- 239000013598 vector Substances 0.000 claims description 29
- 210000001106 artificial yeast chromosome Anatomy 0.000 claims description 26
- 239000003550 marker Substances 0.000 claims description 26
- 239000013612 plasmid Substances 0.000 claims description 25
- 210000004436 artificial bacterial chromosome Anatomy 0.000 claims description 24
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 claims description 23
- 210000003527 eukaryotic cell Anatomy 0.000 claims description 21
- 230000003612 virological effect Effects 0.000 claims description 16
- 102000008579 Transposases Human genes 0.000 claims description 13
- 108010020764 Transposases Proteins 0.000 claims description 13
- 230000001580 bacterial effect Effects 0.000 claims description 13
- 230000001105 regulatory effect Effects 0.000 claims description 13
- 241001529453 unidentified herpesvirus Species 0.000 claims description 13
- 239000002458 cell surface marker Substances 0.000 claims description 12
- 108091006047 fluorescent proteins Proteins 0.000 claims description 12
- 102000034287 fluorescent proteins Human genes 0.000 claims description 12
- 230000003115 biocidal effect Effects 0.000 claims description 11
- 108010000700 Acetolactate synthase Proteins 0.000 claims description 10
- 102000002260 Alkaline Phosphatase Human genes 0.000 claims description 10
- 108020004774 Alkaline Phosphatase Proteins 0.000 claims description 10
- 108010035563 Chloramphenicol O-acetyltransferase Proteins 0.000 claims description 10
- 108010066133 D-octopine dehydrogenase Proteins 0.000 claims description 10
- ULGZDMOVFRHVEP-RWJQBGPGSA-N Erythromycin Chemical compound O([C@@H]1[C@@H](C)C(=O)O[C@@H]([C@@]([C@H](O)[C@@H](C)C(=O)[C@H](C)C[C@@](C)(O)[C@H](O[C@H]2[C@@H]([C@H](C[C@@H](C)O2)N(C)C)O)[C@H]1C)(C)O)CC)[C@H]1C[C@@](C)(OC)[C@@H](O)[C@H](C)O1 ULGZDMOVFRHVEP-RWJQBGPGSA-N 0.000 claims description 10
- 102000053187 Glucuronidase Human genes 0.000 claims description 10
- 108010060309 Glucuronidase Proteins 0.000 claims description 10
- 108010001336 Horseradish Peroxidase Proteins 0.000 claims description 10
- 108060001084 Luciferase Proteins 0.000 claims description 10
- 239000005089 Luciferase Substances 0.000 claims description 10
- 108010058731 nopaline synthase Proteins 0.000 claims description 10
- UCSJYZPVAKXKNQ-HZYVHMACSA-N streptomycin Chemical compound CN[C@H]1[C@H](O)[C@@H](O)[C@H](CO)O[C@H]1O[C@@H]1[C@](C=O)(O)[C@H](C)O[C@H]1O[C@@H]1[C@@H](NC(N)=N)[C@H](O)[C@@H](NC(N)=N)[C@H](O)[C@H]1O UCSJYZPVAKXKNQ-HZYVHMACSA-N 0.000 claims description 10
- 102000016938 Catalase Human genes 0.000 claims description 9
- 108010053835 Catalase Proteins 0.000 claims description 9
- 241000713666 Lentivirus Species 0.000 claims description 9
- 241001655883 Adeno-associated virus - 1 Species 0.000 claims description 8
- 241000702423 Adeno-associated virus - 2 Species 0.000 claims description 7
- 241000202702 Adeno-associated virus - 3 Species 0.000 claims description 7
- 241000580270 Adeno-associated virus - 4 Species 0.000 claims description 7
- 241001634120 Adeno-associated virus - 5 Species 0.000 claims description 7
- 241000972680 Adeno-associated virus - 6 Species 0.000 claims description 7
- 241001164823 Adeno-associated virus - 7 Species 0.000 claims description 7
- 241001164825 Adeno-associated virus - 8 Species 0.000 claims description 7
- 241000649045 Adeno-associated virus 10 Species 0.000 claims description 7
- 241000649047 Adeno-associated virus 12 Species 0.000 claims description 7
- 241000300529 Adeno-associated virus 13 Species 0.000 claims description 7
- 241000710929 Alphavirus Species 0.000 claims description 7
- 241001339993 Anelloviridae Species 0.000 claims description 7
- 241000124740 Bocaparvovirus Species 0.000 claims description 7
- 241000725619 Dengue virus Species 0.000 claims description 7
- 241000700588 Human alphaherpesvirus 1 Species 0.000 claims description 7
- 241000701074 Human alphaherpesvirus 2 Species 0.000 claims description 7
- 241000701041 Human betaherpesvirus 7 Species 0.000 claims description 7
- 241001502974 Human gammaherpesvirus 8 Species 0.000 claims description 7
- 241000701027 Human herpesvirus 6 Species 0.000 claims description 7
- 241000125945 Protoparvovirus Species 0.000 claims description 7
- 241000700618 Vaccinia virus Species 0.000 claims description 7
- 210000004962 mammalian cell Anatomy 0.000 claims description 7
- 230000001225 therapeutic effect Effects 0.000 claims description 7
- 241000701161 unidentified adenovirus Species 0.000 claims description 7
- 241000701447 unidentified baculovirus Species 0.000 claims description 7
- 241001430294 unidentified retrovirus Species 0.000 claims description 7
- 241000649046 Adeno-associated virus 11 Species 0.000 claims description 6
- 102100031780 Endonuclease Human genes 0.000 claims description 6
- 241000701085 Human alphaherpesvirus 3 Species 0.000 claims description 6
- 239000003242 anti bacterial agent Substances 0.000 claims description 6
- 210000005260 human cell Anatomy 0.000 claims description 6
- 230000008685 targeting Effects 0.000 claims description 6
- YMHOBZXQZVXHBM-UHFFFAOYSA-N 2,5-dimethoxy-4-bromophenethylamine Chemical compound COC1=CC(CCN)=C(OC)C=C1Br YMHOBZXQZVXHBM-UHFFFAOYSA-N 0.000 claims description 5
- 108091005950 Azurite Proteins 0.000 claims description 5
- 108010006654 Bleomycin Proteins 0.000 claims description 5
- 108091005944 Cerulean Proteins 0.000 claims description 5
- 108091005960 Citrine Proteins 0.000 claims description 5
- 108091005943 CyPet Proteins 0.000 claims description 5
- 241000702421 Dependoparvovirus Species 0.000 claims description 5
- 108091005941 EBFP Proteins 0.000 claims description 5
- 108091005947 EBFP2 Proteins 0.000 claims description 5
- 108091005942 ECFP Proteins 0.000 claims description 5
- 229930193140 Neomycin Natural products 0.000 claims description 5
- 108010093965 Polymyxin B Proteins 0.000 claims description 5
- 239000004098 Tetracycline Substances 0.000 claims description 5
- 241000545067 Venus Species 0.000 claims description 5
- 108010084455 Zeocin Proteins 0.000 claims description 5
- 229960000723 ampicillin Drugs 0.000 claims description 5
- AVKUERGKIZMTKX-NJBDSQKTSA-N ampicillin Chemical compound C1([C@@H](N)C(=O)N[C@H]2[C@H]3SC([C@@H](N3C2=O)C(O)=O)(C)C)=CC=CC=C1 AVKUERGKIZMTKX-NJBDSQKTSA-N 0.000 claims description 5
- 102000005936 beta-Galactosidase Human genes 0.000 claims description 5
- 108010005774 beta-Galactosidase Proteins 0.000 claims description 5
- 229960001561 bleomycin Drugs 0.000 claims description 5
- OYVAGSVQBOHSSS-UAPAGMARSA-O bleomycin A2 Chemical compound N([C@H](C(=O)N[C@H](C)[C@@H](O)[C@H](C)C(=O)N[C@@H]([C@H](O)C)C(=O)NCCC=1SC=C(N=1)C=1SC=C(N=1)C(=O)NCCC[S+](C)C)[C@@H](O[C@H]1[C@H]([C@@H](O)[C@H](O)[C@H](CO)O1)O[C@@H]1[C@H]([C@@H](OC(N)=O)[C@H](O)[C@@H](CO)O1)O)C=1N=CNC=1)C(=O)C1=NC([C@H](CC(N)=O)NC[C@H](N)C(N)=O)=NC(N)=C1C OYVAGSVQBOHSSS-UAPAGMARSA-O 0.000 claims description 5
- 229960003669 carbenicillin Drugs 0.000 claims description 5
- FPPNZSSZRUTDAP-UWFZAAFLSA-N carbenicillin Chemical compound N([C@H]1[C@H]2SC([C@@H](N2C1=O)C(O)=O)(C)C)C(=O)C(C(O)=O)C1=CC=CC=C1 FPPNZSSZRUTDAP-UWFZAAFLSA-N 0.000 claims description 5
- 229960005091 chloramphenicol Drugs 0.000 claims description 5
- WIIZWVCIJKGZOK-RKDXNWHRSA-N chloramphenicol Chemical compound ClC(Cl)C(=O)N[C@H](CO)[C@H](O)C1=CC=C([N+]([O-])=O)C=C1 WIIZWVCIJKGZOK-RKDXNWHRSA-N 0.000 claims description 5
- 239000011035 citrine Substances 0.000 claims description 5
- 229960003276 erythromycin Drugs 0.000 claims description 5
- 229960000318 kanamycin Drugs 0.000 claims description 5
- 229930027917 kanamycin Natural products 0.000 claims description 5
- SBUJHOSQTJFQJX-NOAMYHISSA-N kanamycin Chemical compound O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CN)O[C@@H]1O[C@H]1[C@H](O)[C@@H](O[C@@H]2[C@@H]([C@@H](N)[C@H](O)[C@@H](CO)O2)O)[C@H](N)C[C@@H]1N SBUJHOSQTJFQJX-NOAMYHISSA-N 0.000 claims description 5
- 229930182823 kanamycin A Natural products 0.000 claims description 5
- 239000002679 microRNA Substances 0.000 claims description 5
- 229960004927 neomycin Drugs 0.000 claims description 5
- CWCMIVBLVUHDHK-ZSNHEYEWSA-N phleomycin D1 Chemical compound N([C@H](C(=O)N[C@H](C)[C@@H](O)[C@H](C)C(=O)N[C@@H]([C@H](O)C)C(=O)NCCC=1SC[C@@H](N=1)C=1SC=C(N=1)C(=O)NCCCCNC(N)=N)[C@@H](O[C@H]1[C@H]([C@@H](O)[C@H](O)[C@H](CO)O1)O[C@@H]1[C@H]([C@@H](OC(N)=O)[C@H](O)[C@@H](CO)O1)O)C=1N=CNC=1)C(=O)C1=NC([C@H](CC(N)=O)NC[C@H](N)C(N)=O)=NC(N)=C1C CWCMIVBLVUHDHK-ZSNHEYEWSA-N 0.000 claims description 5
- 229920000024 polymyxin B Polymers 0.000 claims description 5
- 229960005266 polymyxin b Drugs 0.000 claims description 5
- UNFWWIHTNXNPBV-WXKVUWSESA-N spectinomycin Chemical compound O([C@@H]1[C@@H](NC)[C@@H](O)[C@H]([C@@H]([C@H]1O1)O)NC)[C@]2(O)[C@H]1O[C@H](C)CC2=O UNFWWIHTNXNPBV-WXKVUWSESA-N 0.000 claims description 5
- 229960000268 spectinomycin Drugs 0.000 claims description 5
- 229960005322 streptomycin Drugs 0.000 claims description 5
- 229960002180 tetracycline Drugs 0.000 claims description 5
- 229930101283 tetracycline Natural products 0.000 claims description 5
- 235000019364 tetracycline Nutrition 0.000 claims description 5
- 150000003522 tetracyclines Chemical class 0.000 claims description 5
- GWBUNZLLLLDXMD-UHFFFAOYSA-H tricopper;dicarbonate;dihydroxide Chemical compound [OH-].[OH-].[Cu+2].[Cu+2].[Cu+2].[O-]C([O-])=O.[O-]C([O-])=O GWBUNZLLLLDXMD-UHFFFAOYSA-H 0.000 claims description 5
- 108091005957 yellow fluorescent proteins Proteins 0.000 claims description 5
- 239000003623 enhancer Substances 0.000 claims description 4
- QQXQGKSPIMGUIZ-AEZJAUAXSA-N queuosine Chemical group C1=2C(=O)NC(N)=NC=2N([C@H]2[C@@H]([C@H](O)[C@@H](CO)O2)O)C=C1CN[C@H]1C=C[C@H](O)[C@@H]1O QQXQGKSPIMGUIZ-AEZJAUAXSA-N 0.000 claims description 4
- 102000004657 Calcium-Calmodulin-Dependent Protein Kinase Type 2 Human genes 0.000 claims description 3
- 108010003721 Calcium-Calmodulin-Dependent Protein Kinase Type 2 Proteins 0.000 claims description 3
- 101000829506 Homo sapiens Rhodopsin kinase GRK1 Proteins 0.000 claims description 3
- 108700011259 MicroRNAs Proteins 0.000 claims description 3
- 102100023742 Rhodopsin kinase GRK1 Human genes 0.000 claims description 3
- 102000001435 Synapsin Human genes 0.000 claims description 3
- 108050009621 Synapsin Proteins 0.000 claims description 3
- 230000001939 inductive effect Effects 0.000 claims description 3
- 101100178718 Drosophila melanogaster Hsc70-4 gene Proteins 0.000 claims 1
- 108020004414 DNA Proteins 0.000 description 48
- 102000053602 DNA Human genes 0.000 description 48
- 150000002632 lipids Chemical class 0.000 description 28
- 108090000765 processed proteins & peptides Proteins 0.000 description 28
- 235000018102 proteins Nutrition 0.000 description 28
- 102000004169 proteins and genes Human genes 0.000 description 28
- -1 diTP Chemical compound 0.000 description 26
- 102000004196 processed proteins & peptides Human genes 0.000 description 25
- 102100034343 Integrase Human genes 0.000 description 24
- 229920001184 polypeptide Polymers 0.000 description 24
- 235000001014 amino acid Nutrition 0.000 description 23
- 230000000694 effects Effects 0.000 description 23
- 229940024606 amino acid Drugs 0.000 description 22
- 150000001413 amino acids Chemical class 0.000 description 22
- 229920002477 rna polymer Polymers 0.000 description 21
- 239000002105 nanoparticle Substances 0.000 description 19
- HVYWMOMLDIMFJA-DPAQBDIFSA-N cholesterol Chemical compound C1C=C2C[C@@H](O)CC[C@]2(C)[C@@H]2[C@@H]1[C@@H]1CC[C@H]([C@H](C)CCCC(C)C)[C@@]1(C)CC2 HVYWMOMLDIMFJA-DPAQBDIFSA-N 0.000 description 18
- 102000004190 Enzymes Human genes 0.000 description 16
- 108090000790 Enzymes Proteins 0.000 description 16
- 238000010354 CRISPR gene editing Methods 0.000 description 13
- 238000006243 chemical reaction Methods 0.000 description 12
- 239000012634 fragment Substances 0.000 description 12
- 101001065663 Homo sapiens Lipolysis-stimulated lipoprotein receptor Proteins 0.000 description 11
- 230000037431 insertion Effects 0.000 description 11
- 238000003780 insertion Methods 0.000 description 11
- 108020004999 messenger RNA Proteins 0.000 description 11
- 108091028043 Nucleic acid sequence Proteins 0.000 description 10
- 239000002502 liposome Substances 0.000 description 10
- 229950010342 uridine triphosphate Drugs 0.000 description 10
- 235000012000 cholesterol Nutrition 0.000 description 9
- 238000006467 substitution reaction Methods 0.000 description 9
- 230000000295 complement effect Effects 0.000 description 8
- 238000000338 in vitro Methods 0.000 description 8
- 239000000203 mixture Substances 0.000 description 8
- 230000002068 genetic effect Effects 0.000 description 7
- 230000004048 modification Effects 0.000 description 7
- 238000012986 modification Methods 0.000 description 7
- 229920000642 polymer Polymers 0.000 description 7
- 230000037430 deletion Effects 0.000 description 6
- 238000012217 deletion Methods 0.000 description 6
- 230000035772 mutation Effects 0.000 description 6
- 238000013518 transcription Methods 0.000 description 6
- 230000035897 transcription Effects 0.000 description 6
- 108091026890 Coding region Proteins 0.000 description 5
- 108020004635 Complementary DNA Proteins 0.000 description 5
- 108010042407 Endonucleases Proteins 0.000 description 5
- 108020005004 Guide RNA Proteins 0.000 description 5
- 125000003275 alpha amino acid group Chemical group 0.000 description 5
- 238000010804 cDNA synthesis Methods 0.000 description 5
- 238000003776 cleavage reaction Methods 0.000 description 5
- 239000002299 complementary DNA Substances 0.000 description 5
- 239000012636 effector Substances 0.000 description 5
- VYXSBFYARXAAKO-UHFFFAOYSA-N ethyl 2-[3-(ethylamino)-6-ethylimino-2,7-dimethylxanthen-9-yl]benzoate;hydron;chloride Chemical compound [Cl-].C1=2C=C(C)C(NCC)=CC=2OC2=CC(=[NH+]CC)C(C)=CC2=C1C1=CC=CC=C1C(=O)OCC VYXSBFYARXAAKO-UHFFFAOYSA-N 0.000 description 5
- 230000006870 function Effects 0.000 description 5
- 238000001638 lipofection Methods 0.000 description 5
- 230000008569 process Effects 0.000 description 5
- 230000007017 scission Effects 0.000 description 5
- 230000017105 transposition Effects 0.000 description 5
- YBJHBAHKTGYVGT-ZKWXMUAHSA-N (+)-Biotin Chemical compound N1C(=O)N[C@@H]2[C@H](CCCCC(=O)O)SC[C@@H]21 YBJHBAHKTGYVGT-ZKWXMUAHSA-N 0.000 description 4
- OAKPWEUQDVLTCN-NKWVEPMBSA-N 2',3'-Dideoxyadenosine-5-triphosphate Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@H]1CC[C@@H](CO[P@@](O)(=O)O[P@](O)(=O)OP(O)(O)=O)O1 OAKPWEUQDVLTCN-NKWVEPMBSA-N 0.000 description 4
- ZKHQWZAMYRWXGA-KQYNXXCUSA-J ATP(4-) Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](COP([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O)[C@@H](O)[C@H]1O ZKHQWZAMYRWXGA-KQYNXXCUSA-J 0.000 description 4
- ZKHQWZAMYRWXGA-UHFFFAOYSA-N Adenosine triphosphate Natural products C1=NC=2C(N)=NC=NC=2N1C1OC(COP(O)(=O)OP(O)(=O)OP(O)(O)=O)C(O)C1O ZKHQWZAMYRWXGA-UHFFFAOYSA-N 0.000 description 4
- 229930182558 Sterol Natural products 0.000 description 4
- ARLKCWCREKRROD-POYBYMJQSA-N [[(2s,5r)-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl] phosphono hydrogen phosphate Chemical compound O=C1N=C(N)C=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)CC1 ARLKCWCREKRROD-POYBYMJQSA-N 0.000 description 4
- 230000027455 binding Effects 0.000 description 4
- 230000003197 catalytic effect Effects 0.000 description 4
- RGWHQCVHVJXOKC-SHYZEUOFSA-J dCTP(4-) Chemical compound O=C1N=C(N)C=CN1[C@@H]1O[C@H](COP([O-])(=O)OP([O-])(=O)OP([O-])([O-])=O)[C@@H](O)C1 RGWHQCVHVJXOKC-SHYZEUOFSA-J 0.000 description 4
- PYWVYCXTNDRMGF-UHFFFAOYSA-N rhodamine B Chemical compound [Cl-].C=12C=CC(=[N+](CC)CC)C=C2OC2=CC(N(CC)CC)=CC=C2C=1C1=CC=CC=C1C(O)=O PYWVYCXTNDRMGF-UHFFFAOYSA-N 0.000 description 4
- 239000004055 small Interfering RNA Substances 0.000 description 4
- 150000003432 sterols Chemical class 0.000 description 4
- 235000003702 sterols Nutrition 0.000 description 4
- ABZLKHKQJHEPAX-UHFFFAOYSA-N tetramethylrhodamine Chemical compound C=12C=CC(N(C)C)=CC2=[O+]C2=CC(N(C)C)=CC=C2C=1C1=CC=CC=C1C([O-])=O ABZLKHKQJHEPAX-UHFFFAOYSA-N 0.000 description 4
- 238000013519 translation Methods 0.000 description 4
- 239000001226 triphosphate Substances 0.000 description 4
- 235000011178 triphosphate Nutrition 0.000 description 4
- 108010053770 Deoxyribonucleases Proteins 0.000 description 3
- 102000016911 Deoxyribonucleases Human genes 0.000 description 3
- AHCYMLUZIRLXAA-SHYZEUOFSA-N Deoxyuridine 5'-triphosphate Chemical compound O1[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C[C@@H]1N1C(=O)NC(=O)C=C1 AHCYMLUZIRLXAA-SHYZEUOFSA-N 0.000 description 3
- 241000196324 Embryophyta Species 0.000 description 3
- PEDCQBHIVMGVHV-UHFFFAOYSA-N Glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 3
- 108700026244 Open Reading Frames Proteins 0.000 description 3
- HDRRAMINWIWTNU-NTSWFWBYSA-N [[(2s,5r)-5-(2-amino-6-oxo-3h-purin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl] phosphono hydrogen phosphate Chemical compound C1=2NC(N)=NC(=O)C=2N=CN1[C@H]1CC[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 HDRRAMINWIWTNU-NTSWFWBYSA-N 0.000 description 3
- 125000000539 amino acid group Chemical group 0.000 description 3
- 230000004071 biological effect Effects 0.000 description 3
- URGJWIFLBWJRMF-JGVFFNPUSA-N ddTTP Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)CC1 URGJWIFLBWJRMF-JGVFFNPUSA-N 0.000 description 3
- 238000004520 electroporation Methods 0.000 description 3
- 238000009472 formulation Methods 0.000 description 3
- 239000011159 matrix material Substances 0.000 description 3
- 230000007246 mechanism Effects 0.000 description 3
- 238000002887 multiple sequence alignment Methods 0.000 description 3
- 230000007935 neutral effect Effects 0.000 description 3
- 238000010839 reverse transcription Methods 0.000 description 3
- 241000894007 species Species 0.000 description 3
- 238000010361 transduction Methods 0.000 description 3
- 230000026683 transduction Effects 0.000 description 3
- 125000002264 triphosphate group Chemical class [H]OP(=O)(O[H])OP(=O)(O[H])OP(=O)(O[H])O* 0.000 description 3
- NRJAVPSFFCBXDT-HUESYALOSA-N 1,2-distearoyl-sn-glycero-3-phosphocholine Chemical compound CCCCCCCCCCCCCCCCCC(=O)OC[C@H](COP([O-])(=O)OCC[N+](C)(C)C)OC(=O)CCCCCCCCCCCCCCCCC NRJAVPSFFCBXDT-HUESYALOSA-N 0.000 description 2
- VGIRNWJSIRVFRT-UHFFFAOYSA-N 2',7'-difluorofluorescein Chemical compound OC(=O)C1=CC=CC=C1C1=C2C=C(F)C(=O)C=C2OC2=CC(O)=C(F)C=C21 VGIRNWJSIRVFRT-UHFFFAOYSA-N 0.000 description 2
- WCKQPPQRFNHPRJ-UHFFFAOYSA-N 4-[[4-(dimethylamino)phenyl]diazenyl]benzoic acid Chemical compound C1=CC(N(C)C)=CC=C1N=NC1=CC=C(C(O)=O)C=C1 WCKQPPQRFNHPRJ-UHFFFAOYSA-N 0.000 description 2
- SJQRQOKXQKVJGJ-UHFFFAOYSA-N 5-(2-aminoethylamino)naphthalene-1-sulfonic acid Chemical compound C1=CC=C2C(NCCN)=CC=CC2=C1S(O)(=O)=O SJQRQOKXQKVJGJ-UHFFFAOYSA-N 0.000 description 2
- 241000713838 Avian myeloblastosis virus Species 0.000 description 2
- 108091003079 Bovine Serum Albumin Proteins 0.000 description 2
- 241000222120 Candida <Saccharomycetales> Species 0.000 description 2
- 241001337994 Cryptococcus <scale insect> Species 0.000 description 2
- DHMQDGOQFOQNFH-UHFFFAOYSA-N Glycine Chemical compound NCC(O)=O DHMQDGOQFOQNFH-UHFFFAOYSA-N 0.000 description 2
- XKMLYUALXHKNFT-UUOKFMHZSA-N Guanosine-5'-triphosphate Chemical compound C1=2NC(N)=NC(=O)C=2N=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)[C@H]1O XKMLYUALXHKNFT-UUOKFMHZSA-N 0.000 description 2
- 241000238631 Hexapoda Species 0.000 description 2
- 101000600434 Homo sapiens Putative uncharacterized protein encoded by MIR7-3HG Proteins 0.000 description 2
- 108010061833 Integrases Proteins 0.000 description 2
- 108091092195 Intron Proteins 0.000 description 2
- FFEARJCKVFRZRR-BYPYZUCNSA-N L-methionine Chemical compound CSCC[C@H](N)C(O)=O FFEARJCKVFRZRR-BYPYZUCNSA-N 0.000 description 2
- 241000713869 Moloney murine leukemia virus Species 0.000 description 2
- 238000010222 PCR analysis Methods 0.000 description 2
- 108091005804 Peptidases Proteins 0.000 description 2
- 239000004365 Protease Substances 0.000 description 2
- 102100037401 Putative uncharacterized protein encoded by MIR7-3HG Human genes 0.000 description 2
- 102100037486 Reverse transcriptase/ribonuclease H Human genes 0.000 description 2
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 2
- 235000014680 Saccharomyces cerevisiae Nutrition 0.000 description 2
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N Silicium dioxide Chemical compound O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 2
- 108091027967 Small hairpin RNA Proteins 0.000 description 2
- 108020004459 Small interfering RNA Proteins 0.000 description 2
- 101000910035 Streptococcus pyogenes serotype M1 CRISPR-associated endonuclease Cas9/Csn1 Proteins 0.000 description 2
- 108020004566 Transfer RNA Proteins 0.000 description 2
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 2
- PGAVKCOVUIYSFO-UHFFFAOYSA-N [[5-(2,4-dioxopyrimidin-1-yl)-3,4-dihydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl] phosphono hydrogen phosphate Chemical compound OC1C(O)C(COP(O)(=O)OP(O)(=O)OP(O)(O)=O)OC1N1C(=O)NC(=O)C=C1 PGAVKCOVUIYSFO-UHFFFAOYSA-N 0.000 description 2
- 230000004913 activation Effects 0.000 description 2
- 210000004102 animal cell Anatomy 0.000 description 2
- 238000013459 approach Methods 0.000 description 2
- 101150010487 are gene Proteins 0.000 description 2
- 210000004507 artificial chromosome Anatomy 0.000 description 2
- 230000008901 benefit Effects 0.000 description 2
- 230000015572 biosynthetic process Effects 0.000 description 2
- 239000011616 biotin Substances 0.000 description 2
- 229960002685 biotin Drugs 0.000 description 2
- 235000020958 biotin Nutrition 0.000 description 2
- 229940098773 bovine serum albumin Drugs 0.000 description 2
- 125000002091 cationic group Chemical group 0.000 description 2
- 108091092259 cell-free RNA Proteins 0.000 description 2
- 210000000349 chromosome Anatomy 0.000 description 2
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 2
- SUYVUBYJARFZHO-RRKCRQDMSA-N dATP Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 SUYVUBYJARFZHO-RRKCRQDMSA-N 0.000 description 2
- SUYVUBYJARFZHO-UHFFFAOYSA-N dATP Natural products C1=NC=2C(N)=NC=NC=2N1C1CC(O)C(COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 SUYVUBYJARFZHO-UHFFFAOYSA-N 0.000 description 2
- 230000003247 decreasing effect Effects 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000002255 enzymatic effect Effects 0.000 description 2
- 238000000684 flow cytometry Methods 0.000 description 2
- GNBHRKFJIUUOQI-UHFFFAOYSA-N fluorescein Chemical compound O1C(=O)C2=CC=CC=C2C21C1=CC=C(O)C=C1OC1=CC(O)=CC=C21 GNBHRKFJIUUOQI-UHFFFAOYSA-N 0.000 description 2
- 230000002538 fungal effect Effects 0.000 description 2
- 230000004927 fusion Effects 0.000 description 2
- 230000006801 homologous recombination Effects 0.000 description 2
- 238000002744 homologous recombination Methods 0.000 description 2
- 210000000688 human artificial chromosome Anatomy 0.000 description 2
- 238000001727 in vivo Methods 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 210000003734 kidney Anatomy 0.000 description 2
- 229920002521 macromolecule Polymers 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 230000001404 mediated effect Effects 0.000 description 2
- 229930182817 methionine Natural products 0.000 description 2
- 108091070501 miRNA Proteins 0.000 description 2
- 238000010369 molecular cloning Methods 0.000 description 2
- 239000013642 negative control Substances 0.000 description 2
- 230000037361 pathway Effects 0.000 description 2
- 230000001566 pro-viral effect Effects 0.000 description 2
- 210000001236 prokaryotic cell Anatomy 0.000 description 2
- 230000008439 repair process Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 108020004418 ribosomal RNA Proteins 0.000 description 2
- 239000000243 solution Substances 0.000 description 2
- ATHGHQPFGPMSJY-UHFFFAOYSA-N spermidine Chemical compound NCCCCNCCCN ATHGHQPFGPMSJY-UHFFFAOYSA-N 0.000 description 2
- 239000000126 substance Substances 0.000 description 2
- RWQNBRDOKXIBIV-UHFFFAOYSA-N thymine Chemical compound CC1=CNC(=O)NC1=O RWQNBRDOKXIBIV-UHFFFAOYSA-N 0.000 description 2
- 230000001052 transient effect Effects 0.000 description 2
- 230000005945 translocation Effects 0.000 description 2
- 238000011144 upstream manufacturing Methods 0.000 description 2
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 2
- 210000005253 yeast cell Anatomy 0.000 description 2
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 1
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 1
- JKMHFZQWWAIEOD-UHFFFAOYSA-N 2-[4-(2-hydroxyethyl)piperazin-1-yl]ethanesulfonic acid Chemical compound OCC[NH+]1CCN(CCS([O-])(=O)=O)CC1 JKMHFZQWWAIEOD-UHFFFAOYSA-N 0.000 description 1
- ZLOIGESWDJYCTF-XVFCMESISA-N 4-thiouridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=S)C=C1 ZLOIGESWDJYCTF-XVFCMESISA-N 0.000 description 1
- LQLQRFGHAALLLE-UHFFFAOYSA-N 5-bromouracil Chemical compound BrC1=CNC(=O)NC1=O LQLQRFGHAALLLE-UHFFFAOYSA-N 0.000 description 1
- NJYVEMPWNAYQQN-UHFFFAOYSA-N 5-carboxyfluorescein Chemical compound C12=CC=C(O)C=C2OC2=CC(O)=CC=C2C21OC(=O)C1=CC(C(=O)O)=CC=C21 NJYVEMPWNAYQQN-UHFFFAOYSA-N 0.000 description 1
- WQZIDRAQTRIQDX-UHFFFAOYSA-N 6-carboxy-x-rhodamine Chemical compound OC(=O)C1=CC=C(C([O-])=O)C=C1C(C1=CC=2CCCN3CCCC(C=23)=C1O1)=C2C1=C(CCC1)C3=[N+]1CCCC3=C2 WQZIDRAQTRIQDX-UHFFFAOYSA-N 0.000 description 1
- FVFVNNKYKYZTJU-UHFFFAOYSA-N 6-chloro-1,3,5-triazine-2,4-diamine Chemical compound NC1=NC(N)=NC(Cl)=N1 FVFVNNKYKYZTJU-UHFFFAOYSA-N 0.000 description 1
- 241000093740 Acidaminococcus sp. Species 0.000 description 1
- 241000203069 Archaea Species 0.000 description 1
- 239000004475 Arginine Substances 0.000 description 1
- DCXYFEDJOCDNAF-UHFFFAOYSA-N Asparagine Natural products OC(=O)C(N)CC(N)=O DCXYFEDJOCDNAF-UHFFFAOYSA-N 0.000 description 1
- 108090001008 Avidin Proteins 0.000 description 1
- 108091079001 CRISPR RNA Proteins 0.000 description 1
- 241000589875 Campylobacter jejuni Species 0.000 description 1
- 241000589986 Campylobacter lari Species 0.000 description 1
- 108090000994 Catalytic RNA Proteins 0.000 description 1
- 102000053642 Catalytic RNA Human genes 0.000 description 1
- 108010077544 Chromatin Proteins 0.000 description 1
- 108020004638 Circular DNA Proteins 0.000 description 1
- KQLDDLUWUFBQHP-UHFFFAOYSA-N Cordycepin Natural products C1=NC=2C(N)=NC=NC=2N1C1OCC(CO)C1O KQLDDLUWUFBQHP-UHFFFAOYSA-N 0.000 description 1
- 108091029523 CpG island Proteins 0.000 description 1
- 241000699800 Cricetinae Species 0.000 description 1
- 241000699802 Cricetulus griseus Species 0.000 description 1
- 150000008574 D-amino acids Chemical class 0.000 description 1
- 230000033616 DNA repair Effects 0.000 description 1
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 1
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 1
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 1
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 1
- 241000255601 Drosophila melanogaster Species 0.000 description 1
- 101000764582 Enterobacteria phage T4 Tape measure protein Proteins 0.000 description 1
- 108010013369 Enteropeptidase Proteins 0.000 description 1
- 102100029727 Enteropeptidase Human genes 0.000 description 1
- 241000588724 Escherichia coli Species 0.000 description 1
- 101000621102 Escherichia phage Mu Portal protein Proteins 0.000 description 1
- 108700024394 Exon Proteins 0.000 description 1
- 108010074860 Factor Xa Proteins 0.000 description 1
- 241000589602 Francisella tularensis Species 0.000 description 1
- 241000589599 Francisella tularensis subsp. novicida Species 0.000 description 1
- 241000233866 Fungi Species 0.000 description 1
- 108700028146 Genetic Enhancer Elements Proteins 0.000 description 1
- WHUUTDBJXJRKMK-UHFFFAOYSA-N Glutamic acid Natural products OC(=O)C(N)CCC(O)=O WHUUTDBJXJRKMK-UHFFFAOYSA-N 0.000 description 1
- 239000004471 Glycine Substances 0.000 description 1
- NYHBQMYGNKIUIF-UUOKFMHZSA-N Guanosine Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O NYHBQMYGNKIUIF-UUOKFMHZSA-N 0.000 description 1
- 239000007995 HEPES buffer Substances 0.000 description 1
- 108060003760 HNH nuclease Proteins 0.000 description 1
- 102000029812 HNH nuclease Human genes 0.000 description 1
- 241000543133 Helicobacter canadensis Species 0.000 description 1
- 241000256244 Heliothis virescens Species 0.000 description 1
- 101000914514 Homo sapiens T-cell-specific surface glycoprotein CD28 Proteins 0.000 description 1
- 108090000144 Human Proteins Proteins 0.000 description 1
- 102000003839 Human Proteins Human genes 0.000 description 1
- 102000008394 Immunoglobulin Fragments Human genes 0.000 description 1
- 108010021625 Immunoglobulin Fragments Proteins 0.000 description 1
- 229930010555 Inosine Natural products 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- 101710203526 Integrase Proteins 0.000 description 1
- QNAYBMKLOCPYGJ-REOHCLBHSA-N L-alanine Chemical compound C[C@H](N)C(O)=O QNAYBMKLOCPYGJ-REOHCLBHSA-N 0.000 description 1
- 150000008575 L-amino acids Chemical class 0.000 description 1
- DCXYFEDJOCDNAF-REOHCLBHSA-N L-asparagine Chemical compound OC(=O)[C@@H](N)CC(N)=O DCXYFEDJOCDNAF-REOHCLBHSA-N 0.000 description 1
- CKLJMWTZIZZHCS-REOHCLBHSA-N L-aspartic acid Chemical compound OC(=O)[C@@H](N)CC(O)=O CKLJMWTZIZZHCS-REOHCLBHSA-N 0.000 description 1
- HNDVDQJCIGZPNO-YFKPBYRVSA-N L-histidine Chemical compound OC(=O)[C@@H](N)CC1=CN=CN1 HNDVDQJCIGZPNO-YFKPBYRVSA-N 0.000 description 1
- AGPKZVBTJJNPAG-WHFBIAKZSA-N L-isoleucine Chemical compound CC[C@H](C)[C@H](N)C(O)=O AGPKZVBTJJNPAG-WHFBIAKZSA-N 0.000 description 1
- ROHFNLRQFUQHCH-YFKPBYRVSA-N L-leucine Chemical compound CC(C)C[C@H](N)C(O)=O ROHFNLRQFUQHCH-YFKPBYRVSA-N 0.000 description 1
- COLNVLDHVKWLRT-QMMMGPOBSA-N L-phenylalanine Chemical compound OC(=O)[C@@H](N)CC1=CC=CC=C1 COLNVLDHVKWLRT-QMMMGPOBSA-N 0.000 description 1
- QIVBCDIJIAJPQS-VIFPVBQESA-N L-tryptophane Chemical compound C1=CC=C2C(C[C@H](N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-VIFPVBQESA-N 0.000 description 1
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 description 1
- KZSNJWFQEVHDMF-BYPYZUCNSA-N L-valine Chemical compound CC(C)[C@H](N)C(O)=O KZSNJWFQEVHDMF-BYPYZUCNSA-N 0.000 description 1
- 241000689670 Lachnospiraceae bacterium ND2006 Species 0.000 description 1
- ROHFNLRQFUQHCH-UHFFFAOYSA-N Leucine Natural products CC(C)CC(N)C(O)=O ROHFNLRQFUQHCH-UHFFFAOYSA-N 0.000 description 1
- 239000000232 Lipid Bilayer Substances 0.000 description 1
- 239000004472 Lysine Substances 0.000 description 1
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 description 1
- 108010052285 Membrane Proteins Proteins 0.000 description 1
- 102000018697 Membrane Proteins Human genes 0.000 description 1
- 241001465754 Metazoa Species 0.000 description 1
- 108060004795 Methyltransferase Proteins 0.000 description 1
- 102000016397 Methyltransferase Human genes 0.000 description 1
- KWYHDKDOAIKMQN-UHFFFAOYSA-N N,N,N',N'-tetramethylethylenediamine Chemical compound CN(C)CCN(C)C KWYHDKDOAIKMQN-UHFFFAOYSA-N 0.000 description 1
- 108091061960 Naked DNA Proteins 0.000 description 1
- 108020004711 Nucleic Acid Probes Proteins 0.000 description 1
- 108091034117 Oligonucleotide Proteins 0.000 description 1
- 240000007019 Oxalis corniculata Species 0.000 description 1
- 229910019142 PO4 Inorganic materials 0.000 description 1
- 108091093037 Peptide nucleic acid Proteins 0.000 description 1
- 108091000080 Phosphotransferase Proteins 0.000 description 1
- 206010035226 Plasma cell myeloma Diseases 0.000 description 1
- 229930185560 Pseudouridine Natural products 0.000 description 1
- PTJWIQPHWPFNBW-UHFFFAOYSA-N Pseudouridine C Natural products OC1C(O)C(CO)OC1C1=CNC(=O)NC1=O PTJWIQPHWPFNBW-UHFFFAOYSA-N 0.000 description 1
- 230000004570 RNA-binding Effects 0.000 description 1
- 108020004511 Recombinant DNA Proteins 0.000 description 1
- 108020003564 Retroelements Proteins 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 241000256251 Spodoptera frugiperda Species 0.000 description 1
- 241000191967 Staphylococcus aureus Species 0.000 description 1
- 108010090804 Streptavidin Proteins 0.000 description 1
- 241000194017 Streptococcus Species 0.000 description 1
- 241000193996 Streptococcus pyogenes Species 0.000 description 1
- 241000194020 Streptococcus thermophilus Species 0.000 description 1
- 241000187747 Streptomyces Species 0.000 description 1
- 102100027213 T-cell-specific surface glycoprotein CD28 Human genes 0.000 description 1
- 108700026226 TATA Box Proteins 0.000 description 1
- PZBFGYYEXUXCOF-UHFFFAOYSA-N TCEP Chemical compound OC(=O)CCP(CCC(O)=O)CCC(O)=O PZBFGYYEXUXCOF-UHFFFAOYSA-N 0.000 description 1
- 108010017842 Telomerase Proteins 0.000 description 1
- 102100032938 Telomerase reverse transcriptase Human genes 0.000 description 1
- AYFVYJQAPQTCCC-UHFFFAOYSA-N Threonine Natural products CC(O)C(N)C(O)=O AYFVYJQAPQTCCC-UHFFFAOYSA-N 0.000 description 1
- 239000004473 Threonine Substances 0.000 description 1
- 108010010574 Tn3 resolvase Proteins 0.000 description 1
- 108091023040 Transcription factor Proteins 0.000 description 1
- 102000040945 Transcription factor Human genes 0.000 description 1
- 102100030986 Transgelin-3 Human genes 0.000 description 1
- 108050006165 Transgelin-3 Proteins 0.000 description 1
- 241000255993 Trichoplusia ni Species 0.000 description 1
- QIVBCDIJIAJPQS-UHFFFAOYSA-N Tryptophan Natural products C1=CC=C2C(CC(N)C(O)=O)=CNC2=C1 QIVBCDIJIAJPQS-UHFFFAOYSA-N 0.000 description 1
- KZSNJWFQEVHDMF-UHFFFAOYSA-N Valine Natural products CC(C)C(N)C(O)=O KZSNJWFQEVHDMF-UHFFFAOYSA-N 0.000 description 1
- JCZSFCLRSONYLH-UHFFFAOYSA-N Wyosine Natural products N=1C(C)=CN(C(C=2N=C3)=O)C=1N(C)C=2N3C1OC(CO)C(O)C1O JCZSFCLRSONYLH-UHFFFAOYSA-N 0.000 description 1
- NOXMCJDDSWCSIE-DAGMQNCNSA-N [[(2R,3S,4R,5R)-5-(2-amino-4-oxo-3H-pyrrolo[2,3-d]pyrimidin-7-yl)-3,4-dihydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl] phosphono hydrogen phosphate Chemical compound C1=2NC(N)=NC(=O)C=2C=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)[C@H]1O NOXMCJDDSWCSIE-DAGMQNCNSA-N 0.000 description 1
- AZJLCKAEZFNJDI-DJLDLDEBSA-N [[(2r,3s,5r)-5-(4-aminopyrrolo[2,3-d]pyrimidin-7-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl] phosphono hydrogen phosphate Chemical compound C1=CC=2C(N)=NC=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 AZJLCKAEZFNJDI-DJLDLDEBSA-N 0.000 description 1
- AZRNEVJSOSKAOC-VPHBQDTQSA-N [[(2r,3s,5r)-5-[5-[(e)-3-[6-[5-[(3as,4s,6ar)-2-oxo-1,3,3a,4,6,6a-hexahydrothieno[3,4-d]imidazol-4-yl]pentanoylamino]hexanoylamino]prop-1-enyl]-2,4-dioxopyrimidin-1-yl]-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl] phosphono hydrogen phosphate Chemical compound O1[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C[C@@H]1N1C(=O)NC(=O)C(\C=C\CNC(=O)CCCCCNC(=O)CCCC[C@H]2[C@H]3NC(=O)N[C@H]3CS2)=C1 AZRNEVJSOSKAOC-VPHBQDTQSA-N 0.000 description 1
- ZXZIQGYRHQJWSY-NKWVEPMBSA-N [hydroxy-[[(2s,5r)-5-(6-oxo-3h-purin-9-yl)oxolan-2-yl]methoxy]phosphoryl] phosphono hydrogen phosphate Chemical compound O1[C@H](COP(O)(=O)OP(O)(=O)OP(O)(=O)O)CC[C@@H]1N1C(NC=NC2=O)=C2N=C1 ZXZIQGYRHQJWSY-NKWVEPMBSA-N 0.000 description 1
- 230000021736 acetylation Effects 0.000 description 1
- 238000006640 acetylation reaction Methods 0.000 description 1
- 108020002494 acetyltransferase Proteins 0.000 description 1
- 102000005421 acetyltransferase Human genes 0.000 description 1
- 230000009471 action Effects 0.000 description 1
- 239000011543 agarose gel Substances 0.000 description 1
- 238000000246 agarose gel electrophoresis Methods 0.000 description 1
- 235000004279 alanine Nutrition 0.000 description 1
- 125000000217 alkyl group Chemical group 0.000 description 1
- 230000003321 amplification Effects 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 description 1
- 235000009582 asparagine Nutrition 0.000 description 1
- 229960001230 asparagine Drugs 0.000 description 1
- 125000000613 asparagine group Chemical group N[C@@H](CC(N)=O)C(=O)* 0.000 description 1
- 235000003704 aspartic acid Nutrition 0.000 description 1
- WGDUUQDYDIIBKT-UHFFFAOYSA-N beta-Pseudouridine Natural products OC1OC(CN2C=CC(=O)NC2=O)C(O)C1O WGDUUQDYDIIBKT-UHFFFAOYSA-N 0.000 description 1
- OQFSQFPPLPISGP-UHFFFAOYSA-N beta-carboxyaspartic acid Natural products OC(=O)C(N)C(C(O)=O)C(O)=O OQFSQFPPLPISGP-UHFFFAOYSA-N 0.000 description 1
- 238000006664 bond formation reaction Methods 0.000 description 1
- 239000000872 buffer Substances 0.000 description 1
- 238000006555 catalytic reaction Methods 0.000 description 1
- 239000006143 cell culture medium Substances 0.000 description 1
- 238000002659 cell therapy Methods 0.000 description 1
- 108091092356 cellular DNA Proteins 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 239000003153 chemical reaction reagent Substances 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 229960001231 choline Drugs 0.000 description 1
- OEYIOHPDSNJKLS-UHFFFAOYSA-N choline Chemical compound C[N+](C)(C)CCO OEYIOHPDSNJKLS-UHFFFAOYSA-N 0.000 description 1
- 210000003483 chromatin Anatomy 0.000 description 1
- 230000005757 colony formation Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000021615 conjugation Effects 0.000 description 1
- OFEZSBMBBKLLBJ-BAJZRUMYSA-N cordycepin Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](CO)C[C@H]1O OFEZSBMBBKLLBJ-BAJZRUMYSA-N 0.000 description 1
- OFEZSBMBBKLLBJ-UHFFFAOYSA-N cordycepine Natural products C1=NC=2C(N)=NC=NC=2N1C1OC(CO)CC1O OFEZSBMBBKLLBJ-UHFFFAOYSA-N 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 230000000875 corresponding effect Effects 0.000 description 1
- 235000018417 cysteine Nutrition 0.000 description 1
- XUJNEKJLAYXESH-UHFFFAOYSA-N cysteine Natural products SCC(N)C(O)=O XUJNEKJLAYXESH-UHFFFAOYSA-N 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- HAAZLUGHYHWQIW-KVQBGUIXSA-N dGTP Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@H]1C[C@H](O)[C@@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)O1 HAAZLUGHYHWQIW-KVQBGUIXSA-N 0.000 description 1
- NHVNXKFIZYSCEB-XLPZGREQSA-N dTTP Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C1 NHVNXKFIZYSCEB-XLPZGREQSA-N 0.000 description 1
- 238000002716 delivery method Methods 0.000 description 1
- 239000005549 deoxyribonucleoside Substances 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 239000005546 dideoxynucleotide Substances 0.000 description 1
- ZPTBLXKRQACLCR-XVFCMESISA-N dihydrouridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)CC1 ZPTBLXKRQACLCR-XVFCMESISA-N 0.000 description 1
- XSWSEQPWKOWORN-UHFFFAOYSA-N dodecan-2-ol Chemical compound CCCCCCCCCCC(C)O XSWSEQPWKOWORN-UHFFFAOYSA-N 0.000 description 1
- 230000005782 double-strand break Effects 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- LYCAIKOWRPUZTN-UHFFFAOYSA-N ethylene glycol Natural products OCCO LYCAIKOWRPUZTN-UHFFFAOYSA-N 0.000 description 1
- 238000005206 flow analysis Methods 0.000 description 1
- 229940118764 francisella tularensis Drugs 0.000 description 1
- 108010089843 gamma delta resolvase Proteins 0.000 description 1
- 238000001415 gene therapy Methods 0.000 description 1
- 108091006104 gene-regulatory proteins Proteins 0.000 description 1
- 102000034356 gene-regulatory proteins Human genes 0.000 description 1
- 238000010353 genetic engineering Methods 0.000 description 1
- 235000013922 glutamic acid Nutrition 0.000 description 1
- 239000004220 glutamic acid Substances 0.000 description 1
- ZDXPYRJPNDTMRX-UHFFFAOYSA-N glutamine Natural products OC(=O)C(N)CCC(N)=O ZDXPYRJPNDTMRX-UHFFFAOYSA-N 0.000 description 1
- 230000013595 glycosylation Effects 0.000 description 1
- 238000006206 glycosylation reaction Methods 0.000 description 1
- 229940029575 guanosine Drugs 0.000 description 1
- 208000002672 hepatitis B Diseases 0.000 description 1
- HNDVDQJCIGZPNO-UHFFFAOYSA-N histidine Natural products OC(=O)C(N)CC1=CN=CN1 HNDVDQJCIGZPNO-UHFFFAOYSA-N 0.000 description 1
- 229910052739 hydrogen Inorganic materials 0.000 description 1
- 239000001257 hydrogen Substances 0.000 description 1
- 230000002209 hydrophobic effect Effects 0.000 description 1
- WGCNASOHLSPBMP-UHFFFAOYSA-N hydroxyacetaldehyde Natural products OCC=O WGCNASOHLSPBMP-UHFFFAOYSA-N 0.000 description 1
- 238000000099 in vitro assay Methods 0.000 description 1
- 229960003786 inosine Drugs 0.000 description 1
- 229960000310 isoleucine Drugs 0.000 description 1
- AGPKZVBTJJNPAG-UHFFFAOYSA-N isoleucine Natural products CCC(C)C(N)C(O)=O AGPKZVBTJJNPAG-UHFFFAOYSA-N 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 230000000670 limiting effect Effects 0.000 description 1
- 230000029226 lipidation Effects 0.000 description 1
- 239000007788 liquid Substances 0.000 description 1
- 210000005229 liver cell Anatomy 0.000 description 1
- 230000001320 lysogenic effect Effects 0.000 description 1
- 210000001161 mammalian embryo Anatomy 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 125000002496 methyl group Chemical group [H]C([H])([H])* 0.000 description 1
- 239000000693 micelle Substances 0.000 description 1
- 238000000520 microinjection Methods 0.000 description 1
- 239000003068 molecular probe Substances 0.000 description 1
- 238000009126 molecular therapy Methods 0.000 description 1
- 125000004573 morpholin-4-yl group Chemical group N1(CCOCC1)* 0.000 description 1
- 210000003205 muscle Anatomy 0.000 description 1
- 201000000050 myeloid neoplasm Diseases 0.000 description 1
- 238000003199 nucleic acid amplification method Methods 0.000 description 1
- 239000002853 nucleic acid probe Substances 0.000 description 1
- 230000030648 nucleus localization Effects 0.000 description 1
- 230000009437 off-target effect Effects 0.000 description 1
- 210000001672 ovary Anatomy 0.000 description 1
- 230000003647 oxidation Effects 0.000 description 1
- 238000007254 oxidation reaction Methods 0.000 description 1
- 238000004806 packaging method and process Methods 0.000 description 1
- 230000036961 partial effect Effects 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- COLNVLDHVKWLRT-UHFFFAOYSA-N phenylalanine Natural products OC(=O)C(N)CC1=CC=CC=C1 COLNVLDHVKWLRT-UHFFFAOYSA-N 0.000 description 1
- 239000010452 phosphate Substances 0.000 description 1
- 150000004713 phosphodiesters Chemical class 0.000 description 1
- 230000026731 phosphorylation Effects 0.000 description 1
- 238000006366 phosphorylation reaction Methods 0.000 description 1
- 102000020233 phosphotransferase Human genes 0.000 description 1
- 238000013081 phylogenetic analysis Methods 0.000 description 1
- 230000008488 polyadenylation Effects 0.000 description 1
- 230000023603 positive regulation of transcription initiation, DNA-dependent Effects 0.000 description 1
- 239000000843 powder Substances 0.000 description 1
- 239000002243 precursor Substances 0.000 description 1
- 230000010469 pro-virus integration Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 239000013636 protein dimer Substances 0.000 description 1
- PTJWIQPHWPFNBW-GBNDHIKLSA-N pseudouridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1C1=CNC(=O)NC1=O PTJWIQPHWPFNBW-GBNDHIKLSA-N 0.000 description 1
- 239000002096 quantum dot Substances 0.000 description 1
- 230000002285 radioactive effect Effects 0.000 description 1
- 238000010188 recombinant method Methods 0.000 description 1
- 230000003362 replicative effect Effects 0.000 description 1
- 230000002207 retinal effect Effects 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 239000002342 ribonucleoside Substances 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 108091092562 ribozyme Proteins 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
- 238000010845 search algorithm Methods 0.000 description 1
- 102000023888 sequence-specific DNA binding proteins Human genes 0.000 description 1
- 108091008420 sequence-specific DNA binding proteins Proteins 0.000 description 1
- 239000000377 silicon dioxide Substances 0.000 description 1
- 150000003384 small molecules Chemical class 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 239000002904 solvent Substances 0.000 description 1
- 125000006850 spacer group Chemical group 0.000 description 1
- 229940063673 spermidine Drugs 0.000 description 1
- 239000000725 suspension Substances 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 230000009897 systematic effect Effects 0.000 description 1
- 102000055501 telomere Human genes 0.000 description 1
- 108091035539 telomere Proteins 0.000 description 1
- 210000003411 telomere Anatomy 0.000 description 1
- 238000012360 testing method Methods 0.000 description 1
- MPLHNVLQVRSVEE-UHFFFAOYSA-N texas red Chemical compound [O-]S(=O)(=O)C1=CC(S(Cl)(=O)=O)=CC=C1C(C1=CC=2CCCN3CCCC(C=23)=C1O1)=C2C1=C(CCC1)C3=[N+]1CCCC3=C2 MPLHNVLQVRSVEE-UHFFFAOYSA-N 0.000 description 1
- IBVCSSOEYUMRLC-GABYNLOESA-N texas red-5-dutp Chemical compound O1[C@H](COP(O)(=O)OP(O)(=O)OP(O)(O)=O)[C@@H](O)C[C@@H]1N1C(=O)NC(=O)C(C#CCNS(=O)(=O)C=2C=C(C(C=3C4=CC=5CCCN6CCCC(C=56)=C4OC4=C5C6=[N+](CCC5)CCCC6=CC4=3)=CC=2)S([O-])(=O)=O)=C1 IBVCSSOEYUMRLC-GABYNLOESA-N 0.000 description 1
- 150000003573 thiols Chemical class 0.000 description 1
- ANRHNWWPFJCPAZ-UHFFFAOYSA-M thionine Chemical compound [Cl-].C1=CC(N)=CC2=[S+]C3=CC(N)=CC=C3N=C21 ANRHNWWPFJCPAZ-UHFFFAOYSA-M 0.000 description 1
- 229940113082 thymine Drugs 0.000 description 1
- 210000001519 tissue Anatomy 0.000 description 1
- 238000001890 transfection Methods 0.000 description 1
- UNXRWKVEANCORM-UHFFFAOYSA-N triphosphoric acid Chemical compound OP(O)(=O)OP(O)(=O)OP(O)(O)=O UNXRWKVEANCORM-UHFFFAOYSA-N 0.000 description 1
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 description 1
- 241001478887 unidentified soil bacteria Species 0.000 description 1
- 229940035893 uracil Drugs 0.000 description 1
- 239000004474 valine Substances 0.000 description 1
- 239000013603 viral vector Substances 0.000 description 1
- 210000002845 virion Anatomy 0.000 description 1
- 239000000277 virosome Substances 0.000 description 1
- JCZSFCLRSONYLH-QYVSTXNMSA-N wyosin Chemical compound N=1C(C)=CN(C(C=2N=C3)=O)C=1N(C)C=2N3[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O JCZSFCLRSONYLH-QYVSTXNMSA-N 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
- C12N15/902—Stable introduction of foreign DNA into chromosome using homologous recombination
- C12N15/907—Stable introduction of foreign DNA into chromosome using homologous recombination in mammalian cells
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2800/00—Nucleic acids vectors
- C12N2800/30—Vector systems comprising sequences for excision in presence of a recombinase, e.g. loxP or FRT
Definitions
- the disclosure is based, in part, upon the development of serine recombinases for use in gene editing systems to integrate nucleic acid sequences.
- Described herein are gene editing systems comprising: a) a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115 or a nucleic acid encoding the serine recombinase; and b) a nucleic acid comprising a donor polynucleotide and a first attachment site sequence.
- the first attachment site sequence is 5’ of the donor polynucleotide.
- the nucleic acid encoding the serine recombinase further comprises a second attachment site sequence.
- the second attachment site sequence is 5’ of the serine recombinase.
- the first attachment site sequence and the second attachment site sequence are capable of recombination.
- the first attachment site sequence is a bacterial genomic recombination sequence (attB).
- the first attachment site sequence is a phage genomic recombination sequence (attP).
- the second attachment site sequence is a bacterial genomic recombination sequence (attB).
- the second attachment site sequence is a phage genomic recombination sequence (attP).
- the attB sequence comprises about 20 to about 500 nucleotides. In some embodiments, the attP sequence comprises about 20 to about 500 nucleotides.
- the nucleic acid comprising the donor polynucleotide and the first attachment site sequence is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid.
- the nucleic acid encoding the serine recombinase is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid.
- the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus.
- the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV-HSC
- the herpesvirus is HSV-1, HSV-2, VZV, EBV, CMV, HHV-6, HHV-7, or HHV-8.
- the donor polynucleotide comprises a size of at least about 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, 100 kb, 110 kb, 120 kb, or more than 120 kb.
- the donor polynucleotide encodes a therapeutic, a reporter, or a marker.
- the reporter comprises a fluorescent protein.
- the fluorescent protein is GFP, EBFP, EBFP2, Azurite, mKalama1, ECFP, Cerulean, CyPet, YFP, Citrine, Venus, YPet, RFP, CFP, or a derivative thereof.
- the reporter is acetohydroxyacid synthase (AHAS), alkaline phosphatase (AP), beta galactosidase (LacZ), beta glucuronidase (GUS), chloramphenicol acetyltransferase (CAT), horseradish peroxidase (HRP), luciferase (Luc), nopaline synthase (NOS), octopine synthase (OCS), luciferase, or a derivative thereof.
- the marker is an antibiotic resistance marker.
- the antibiotic resistance marker is kanamycin, spectinomycin, streptomycin, ampicillin, carbenicillin, bleomycin, erythromycin, polymyxin B, tetracycline, chloramphenicol, neomycin, zeocin, or a derivative thereof.
- the marker is a cell surface marker.
- eukaryotic cells comprising a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115.
- the eukaryotic cell is a mammalian cell.
- the eukaryotic cell is a human cell.
- the serine recombinase comprises an integration efficiency of at least about 5%.
- the serine recombinase comprises an integration efficiency of at least about 25%.
- the serine recombinase comprises an integration efficiency of at least about 50%.
- the serine recombinase is capable of targeting genes comprising a catalase domain or synthase domain.
- the catalase is manganese catalase.
- the synthase is Queuosine synthase.
- the serine recombinase is capable of targeting genes comprising a DUF4244 Pfam domain.
- vectors comprising: a) a nucleic acid encoding a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1- 115; and b) one or more regulatory elements.
- the one or more regulatory elements comprises a promoter, an enhancer, an intron, a microRNA, a linker, a splicing element, or a polyA signal.
- the promoter is selected from a constitutive promoter, an inducible promoter, a mini promoter, or a derivative thereof.
- the promoter is selected from the group consisting of: CMV, CBA, EF1a, CAG, PGK, TRE, U6, UAS, T7, Sp6, lac, araBad, trp, Ptac, p5, p19, p40, Synapsin, CaMKII, GRK1, polH, EM7, OpIE1, and a derivative thereof.
- vectors comprising a nucleic acid encoding a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115, wherein the vector is selected from the group consisting of: a plasmid, a nanoplasmid, a phagemid, a phage derivative, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), and a cosmid.
- BAC bacterial artificial chromosome
- YAC yeast artificial chromosome
- Described herein are methods for gene editing comprising: a) providing or identifying a first attachment site sequence in a host genome; b) providing a nucleic acid comprising a donor polynucleotide and a second attachment site sequence to a host cell; and c) contacting the host cell with a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115 or a nucleic acid encoding the serine recombinase, wherein the first attachment site sequence and the second attachment site sequence are capable of recombination.
- the first attachment site sequence is endogenous in the host genome.
- the first attachment site sequence is provided using viral delivery.
- the first attachment site sequence is provided using a transposase. In some embodiments, the first attachment site sequence is provided using a nuclease. In some embodiments, the nuclease is a double-strand nuclease. In some embodiments, the nuclease is a Type II CRISPR endonuclease. In some embodiments, the nuclease is a Type V CRISPR endonuclease. In some embodiments, the nuclease is Cas9. In some embodiments, the first attachment site sequence is provided using a reverse transcriptase. In some embodiments, the second attachment site sequence is 5’ of the donor polynucleotide.
- the first attachment site sequence is a bacterial genomic recombination sequence (attB). In some embodiments, the first attachment site sequence is a phage genomic recombination sequence (attP). In some embodiments, the second attachment site sequence is a bacterial genomic recombination sequence (attB). In some embodiments, the second attachment site sequence is a phage genomic recombination sequence (attP). In some embodiments, the attB sequence comprises about 20 to about 500 nucleotides. In some embodiments, the attP sequence comprises about 20 to about 500 nucleotides.
- the nucleic acid comprising the donor polynucleotide and the second attachment site sequence is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid.
- the nucleic acid encoding the serine recombinase is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid.
- the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus.
- the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV-
- the herpesvirus is HSV-1, HSV-2, VZV, EBV, CMV, HHV-6, HHV-7, or HHV-8.
- the donor polynucleotide comprises a size of at least about 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, 100 kb, 110 kb, 120 kb, or more than 120 kb.
- the donor polynucleotide encodes a therapeutic, a reporter, or a marker.
- the reporter comprises a fluorescent protein.
- the fluorescent protein is GFP, EBFP, EBFP2, Azurite, mKalama1, ECFP, Cerulean, CyPet, YFP, Citrine, Venus, YPet, RFP, CFP, or a derivative thereof.
- the reporter is acetohydroxyacid synthase (AHAS), alkaline phosphatase (AP), beta galactosidase (LacZ), beta glucuronidase (GUS), chloramphenicol acetyltransferase (CAT), horseradish peroxidase (HRP), luciferase (Luc), nopaline synthase (NOS), octopine synthase (OCS), luciferase, or a derivative thereof.
- the marker is an antibiotic resistance marker.
- the antibiotic resistance marker is kanamycin, spectinomycin, streptomycin, ampicillin, carbenicillin, bleomycin, erythromycin, polymyxin B, tetracycline, chloramphenicol, neomycin, zeocin, or a derivative thereof.
- the marker is a cell surface marker.
- FIG. 1 shows a phylogenetic protein tree of LSRs of the disclosure.
- the tree was inferred from a global multiple sequence alignment of LSR sequences clustered at 90% amino acid identity (AAI). Selected serine recombinase family candidates are highlighted by large dots.
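The identity thresholds that recur in this text (for example, at least about 80% sequence identity to a SEQ ID NO, or clustering at 90% AAI) can be illustrated with a minimal sketch. It assumes the two sequences are already aligned to equal length, uses `-` as the gap character, and counts only columns where neither sequence has a gap. The source does not specify how identity is computed, so the function names, the denominator convention, and the toy sequences are all illustrative assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: inputs are pre-aligned strings of equal length. Other
    conventions (e.g. dividing by full alignment length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the pair reaches the given percent-identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical toy peptides, not taken from SEQ ID NOs: 1-115.
    print(percent_identity("MKT-AL", "MKTSAV"))
    print(meets_threshold("MKT-AL", "MKTSAV", threshold=80.0))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch only covers the final counting step.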
- FIGs. 2A-2C show a schematic of an exemplary in vitro screening procedure for serine recombinase recombination activity.
- FIG. 2A shows a schematic of recombinase in vitro expression from a linear or circular dsDNA construct.
- FIG. 2B shows a schematic for a recombination reaction using integrase that is added to the recombination reaction together with attP and attB dsDNA fragments specific to the serine recombinase.
- FIG. 2C shows a schematic of a PCR analysis by agarose gel electrophoresis of the recombined DNA amplified by attL- and attR-specific primers.
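The recombination step in FIGs. 2B-2C can be pictured with a toy string model. It assumes each attachment site is a left arm, a shared central core, and a right arm, and that recombination exchanges the arms at the core so that attB x attP yields attL and attR, the hybrid junctions the attL- and attR-specific primers would amplify. The `AttSite` type, the `recombine` function, and the sequences are hypothetical illustrations, not sequences or methods from the disclosure.

```python
from typing import NamedTuple, Tuple


class AttSite(NamedTuple):
    """Toy attachment site: left arm, central core, right arm."""
    left: str
    core: str
    right: str

    def sequence(self) -> str:
        return self.left + self.core + self.right


def recombine(att_b: AttSite, att_p: AttSite) -> Tuple[AttSite, AttSite]:
    """Exchange arms at the shared core: attB x attP -> (attL, attR)."""
    if att_b.core != att_p.core:
        raise ValueError("attB and attP must share the same central core")
    att_l = AttSite(att_b.left, att_b.core, att_p.right)  # B arm + core + P' arm
    att_r = AttSite(att_p.left, att_p.core, att_b.right)  # P arm + core + B' arm
    return att_l, att_r


if __name__ == "__main__":
    # Hypothetical sequences chosen only to make the arm exchange visible.
    att_b = AttSite("GGTTC", "TT", "CAGGA")
    att_p = AttSite("ACCTG", "TT", "GTCCA")
    att_l, att_r = recombine(att_b, att_p)
    print(att_l.sequence())
    print(att_r.sequence())
```

A primer pair that spans one attB arm and one attP arm would give a product only from the recombined junction, which is the logic behind the gel readout.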
- SEQ ID NOs: 1-115 show amino acid sequences of MG178 family large serine recombinases suitable for use in gene editing as described herein.
- DLBs DNA double-stranded breaks
- HR homologous recombination
- lentiviruses or adeno-associated viruses in combination with a CRISPR nuclease are used to insert large pieces of DNA, for example whole genes.
- lentiviral-mediated integration lacks the targetability feature, as integration occurs mostly randomly in open chromatin.
- AAV-mediated delivery has a limited cargo capacity and is not available for all cell types.
- a safe and efficient targeted genome editing system that allows for large template integration is needed.
- the present disclosure is based, in part, upon the development of gene editing systems comprising large serine recombinases (LSRs) or serine recombinases for targetable and programmable integration of large fragments of DNA into a eukaryotic genome.
- LSRs large serine recombinases
- serine recombinases described herein can integrate multi-kilobase DNA sequences.
- the term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within one or more than one standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 15%, up to 10%, up to 5%, or up to 1% of a given value.
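The percentage-range reading of "about" can be sketched as a relative-tolerance check. The function name `is_about` and the default tolerance of 10% are assumptions for illustration. The definition above lists several alternative ranges (up to 20%, 15%, 10%, 5%, or 1%) and also a standard-deviation reading, which this sketch does not cover.

```python
def is_about(value: float, target: float, tolerance: float = 0.10) -> bool:
    """True if value lies within +/- tolerance (as a fraction) of target.

    Assumption: 'about' is read as a symmetric relative range around target.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return abs(value - target) <= tolerance * abs(target)


if __name__ == "__main__":
    # Is 78% identity "about 80%" under a 5% relative range?
    print(is_about(78.0, 80.0, tolerance=0.05))
    # Is 70% "about 80%" under a 10% relative range?
    print(is_about(70.0, 80.0, tolerance=0.10))
```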
- nucleotide refers to a base-sugar-phosphate combination.
- Contemplated nucleotides include naturally occurring nucleotides and synthetic nucleotides.
- Nucleotides are monomeric units of a nucleic acid sequence (e.g., deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)).
- nucleotide includes the ribonucleoside triphosphates adenosine triphosphate (ATP), uridine triphosphate (UTP), cytidine triphosphate (CTP), and guanosine triphosphate (GTP), and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP, dUTP, dGTP, dTTP, or derivatives thereof.
- Such derivatives include, for example, [αS]dATP, 7-deaza-dGTP and 7-deaza-dATP, and nucleot
- nucleotide as used herein encompasses dideoxyribonucleoside triphosphates (ddNTPs) and their derivatives.
- Illustrative examples of ddNTPs include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP.
- a nucleotide may be unlabeled or detectably labeled, such as using moieties comprising optically detectable moieties (e.g., fluorophores) or quantum dots.
- Detectable labels include, for example, radioactive isotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels, and enzyme labels.
- Fluorescent labels of nucleotides include, but are not limited to, fluorescein, 5-carboxyfluorescein (FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE), rhodamine, 6-carboxyrhodamine (R6G), N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 4-(4'-dimethylaminophenylazo)benzoic acid (DABCYL), Cascade Blue, Oregon Green, Texas Red, Cyanine, and 5-(2'-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS).
- fluorescently labeled nucleotides include [R6G]dUTP, [TAMRA]dUTP, [R110]dCTP, [R6G]dCTP, [TAMRA]dCTP, [JOE]ddATP, [R6G]ddATP, [FAM]ddCTP, [R110]ddCTP, [TAMRA]ddGTP, [ROX]ddTTP, [dR6G]ddATP, [dR110]ddCTP, [dTAMRA]ddGTP, and [dROX]ddTTP available from Perkin Elmer, Foster City, Calif.; FluoroLink DeoxyNucleotides, FluoroLink Cy3-dCTP, FluoroLink Cy5-dCTP, FluoroLink Fluor X-dCTP, FluoroLink Cy3-dUTP, and FluoroLink Cy5-dUTP available from Amersham, Arlington Heights, IL; Fluorescein-15-d
- nucleotide encompasses chemically modified nucleotides.
- An exemplary chemically modified nucleotide is biotin-dNTP.
- biotinylated dNTPs include biotin-dATP (e.g., bio-N6-ddATP, biotin-14-dATP), biotin-dCTP (e.g., biotin-11-dCTP, biotin-14-dCTP), and biotin-dUTP (e.g., biotin-11-dUTP, biotin-16-dUTP, biotin-20-dUTP).
- polynucleotide, oligonucleotide, and nucleic acid refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof, either in single-, double-, or multi-stranded form.
- Contemplated polynucleotides include a gene or fragment thereof.
- Exemplary polynucleotides include, but are not limited to, DNA, RNA, coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, cell-free polynucleotides including cell-free DNA (cfDNA) and cell-free RNA (cfRNA), nucleic acid probes, and primers.
- a T means U (Uracil) in RNA and T (Thymine) in DNA.
- a polynucleotide can be exogenous or endogenous to a cell and/or exist in a cell-free environment.
- the term polynucleotide encompasses modified polynucleotides (e.g., altered backbone, sugar, or nucleobase). If present, modifications to the nucleotide structure are imparted before or after assembly of the polymer.
- Non-limiting examples of modifications include: 5-bromouracil, peptide nucleic acid, xeno nucleic acid, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol-containing nucleotides, biotin-linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine.
- the sequence of nucleotides may be interrupted by non-nucleotide components.
- transfection or “transfected” generally refer to introduction of a nucleic acid into a cell by non-viral or viral-based methods.
- the nucleic acid molecules may be gene sequences encoding complete proteins or functional portions thereof. See, e.g., Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 18.1-18.88.
- peptide refers to a polymer of at least two amino acid residues joined by peptide bond(s). This term does not connote a specific length of polymer, nor is it intended to imply or distinguish whether the peptide is produced using recombinant techniques, chemical or enzymatic synthesis, or is naturally occurring. The terms apply to naturally occurring amino acid polymers as well as amino acid polymers comprising at least one modified amino acid. In some cases, the polymer is interrupted by non-amino acids. The terms include amino acid chains of any length, including full length proteins, and proteins with or without secondary or tertiary structure (e.g., domains).
- amino acid polymer that has been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, oxidation, and any other manipulation such as conjugation with a labeling component.
- amino acid and amino acids refer to natural and non-natural amino acids, including, but not limited to, modified amino acids.
- Modified amino acids include amino acids that have been chemically modified to include a group or a chemical moiety not naturally present on the amino acid.
- amino acid includes both D-amino acids and L-amino acids.
- non-native refers to a nucleic acid or polypeptide sequence that is non-naturally occurring.
- Non-native refers to a non-naturally occurring nucleic acid or polypeptide sequence that comprises modifications such as mutations, insertions, or deletions.
- the term non-native encompasses fusion nucleic acids or polypeptides that encode or exhibit an activity (e.g., enzymatic activity, methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.) of the nucleic acid or polypeptide sequence to which the non-native sequence is fused.
- a non-native nucleic acid or polypeptide sequence includes those linked to a naturally-occurring nucleic acid or polypeptide sequence (or a variant thereof) by genetic engineering to generate a chimeric nucleic acid or polypeptide sequence encoding a chimeric nucleic acid or polypeptide.
- promoter refers to the regulatory DNA region which controls transcription or expression of a polynucleotide (e.g., a gene) and which may be located adjacent to or overlapping a nucleotide or region of nucleotides at which RNA transcription is initiated.
- a promoter may contain specific DNA sequences which bind protein factors, often referred to as transcription factors, which facilitate binding of RNA polymerase to the DNA leading to gene transcription.
- Eukaryotic basal promoters typically, though not necessarily, contain a TATA-box and/or a CAAT box.
- expression refers to the process by which a nucleic acid sequence or a polynucleotide is transcribed from a DNA template (such as into mRNA or other RNA transcript) and/or the process by which a transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins. Transcripts and encoded polypeptides may be collectively referred to as “gene product.” If the polynucleotide is derived from genomic DNA, the term expression includes splicing of the mRNA in a eukaryotic cell.
- operably linked refers to an arrangement of genetic elements, e.g., a promoter, an enhancer, a polyadenylation sequence, etc., wherein an operation (e.g., movement or activation) of a first genetic element has some effect on the second genetic element.
- the effect on the second genetic element can be, but need not be, of the same type as operation of the first genetic element.
- two genetic elements are operably linked if movement of the first element causes an activation of the second element.
- a regulatory element which may comprise promoter and/or enhancer sequences, is operatively linked to a coding region if the regulatory element helps initiate transcription of the coding sequence. There may be intervening residues between the regulatory element and coding region so long as this functional relationship is maintained.
- a “vector” as used herein refers to a macromolecule or association of macromolecules that comprises or associates with a polynucleotide and which mediates delivery of the polynucleotide to a cell.
- vectors include nucleic acid-based vectors (e.g., plasmids and viral vectors) and liposomes.
- An exemplary nucleic acid-based vector comprises genetic elements, e.g., regulatory elements, operatively linked to a gene to facilitate expression of the gene in a target.
- expression cassette and “nucleic acid cassette” are used interchangeably to refer to a component of a vector comprising a combination of nucleic acid sequences or elements (e.g., therapeutic gene, promoter, and a terminator) that are expressed together or are operably linked for expression.
- the terms encompass an expression cassette including a combination of regulatory elements and a gene or genes to which they are operably linked for expression.
- a “functional fragment” of a DNA or protein sequence refers to a fragment that retains a biological activity (either functional or structural) that is substantially similar to a biological activity of the full-length DNA or protein sequence.
- a biological activity of a DNA sequence includes its ability to influence expression in a manner attributed to the full-length sequence.
- engineered refers to an object that has been modified by human intervention.
- the terms refer to a polynucleotide or polypeptide that is non-naturally occurring.
- An engineered peptide has, but does not require, low sequence identity (e.g., less than 50% sequence identity, less than 25% sequence identity, less than 10% sequence identity, less than 5% sequence identity, less than 1% sequence identity) to a naturally occurring human protein.
- VPR and VP64 domains are synthetic transactivation domains.
- Non-limiting examples include the following: a nucleic acid modified by changing its sequence to a sequence that does not occur in nature; a nucleic acid modified by ligating it to a nucleic acid that it does not associate with in nature such that the ligated product possesses a function not present in the original nucleic acid; an engineered nucleic acid synthesized in vitro with a sequence that does not exist in nature; a protein modified by changing its amino acid sequence to a sequence that does not exist in nature; an engineered protein acquiring a new function or property.
- An “engineered” system comprises at least one engineered component.
- a “guide nucleic acid” or “guide polynucleotide” refers to a nucleic acid that may hybridize to a target nucleic acid and thereby directs an associated nuclease to the target nucleic acid.
- a guide nucleic acid is, but is not limited to, RNA (guide RNA or gRNA), DNA, or a mixture of RNA and DNA.
- a guide nucleic acid can include a crRNA or a tracrRNA or a combination of both.
- guide nucleic acid encompasses an engineered guide nucleic acid and a programmable guide nucleic acid to specifically bind to the target nucleic acid.
- a portion of the target nucleic acid may be complementary to a portion of the guide nucleic acid.
- the strand of a double-stranded target polynucleotide that is complementary to and hybridizes with the guide nucleic acid is the complementary strand.
- the strand of the double-stranded target polynucleotide that is complementary to the complementary strand, and therefore is not complementary to the guide nucleic acid, is called the noncomplementary strand.
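The strand naming above can be illustrated with a minimal Python sketch. It assumes a perfectly matched spacer, a target given as its top strand written 5'→3', and plain A/C/G/T(U) letters; the function names `reverse_complement` and `classify_strands` are hypothetical and are not part of the disclosure.

```python
# Translation table for complementing DNA bases (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string, written 5'->3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def classify_strands(spacer: str, top_strand: str):
    """Label the strands of a double-stranded target relative to a guide spacer.

    Returns (complementary, noncomplementary) as "top"/"bottom" labels,
    or None if the spacer pairs with neither strand. Illustrative only:
    real guides tolerate mismatches, which this sketch ignores.
    """
    # A "T" means U in RNA, so convert an RNA spacer to DNA letters first.
    spacer_dna = spacer.upper().replace("U", "T")
    top = top_strand.upper()
    bottom = reverse_complement(top)
    # The strand that hybridizes with the spacer carries its reverse complement.
    pairing_site = reverse_complement(spacer_dna)
    if pairing_site in top:
        return ("top", "bottom")
    if pairing_site in bottom:
        return ("bottom", "top")
    return None
```

For example, `classify_strands("GACUU", "TTGACTTTT")` labels the bottom strand as the complementary strand, because the top strand carries the same sequence as the spacer rather than its complement.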
- a guide nucleic acid having a single polynucleotide chain is a “single guide nucleic acid.”
- a guide nucleic acid having two polynucleotide chains is a “double guide nucleic acid.”
- the term “guide nucleic acid” is inclusive, referring to both single guide nucleic acids and double guide nucleic acids.
- a guide nucleic acid may comprise a segment referred to as a “nucleic acid-targeting segment” or a “nucleic acid-targeting sequence,” or a “spacer.”
- a nucleic acid-targeting segment can include a subsegment referred to as a “protein binding segment” or “protein binding sequence” or “Cas protein binding segment.”
- tracrRNA or “tracr sequence” means trans-activating CRISPR RNA.
- tracrRNA interacts with the CRISPR (cr) RNA to form a guide nucleic acid (e.g., guide RNA or gRNA) that may hybridize to a target nucleic acid and thereby directs an associated nuclease to the target nucleic acid.
- RuvC III domain refers to a third discontinuous segment of a RuvC endonuclease domain (the RuvC nuclease domain being comprised of three discontiguous segments, RuvC I, RuvC II, and RuvC III).
- a RuvC domain or segments thereof can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences (e.g., Pfam HMM PF18541 for RuvC III).
- HNH domain refers to an endonuclease domain having characteristic histidine and asparagine residues.
- An HNH domain can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences (e.g., Pfam HMM PF01844 for domain HNH).
- transposon refers to mobile elements that move in and out of genomes carrying “cargo DNA” with them. These transposons can differ on the type of nucleic acid to transpose, the type of repeat at the ends of the transposon, the type of cargo to be carried, or by the mode of transposition (i.e., self-repair or host-repair).
- transposase or “transposases” refers to an enzyme that binds to the end of a transposon and catalyzes its movement to another part of the genome. Types of movement include a cut and paste mechanism and a replicative transposition mechanism.
- Tn7 or “Tn7-like transposase” refers to a family of transposases comprising three main components: a heteromeric transposase (TnsA and/or TnsB) alongside a regulator protein (TnsC).
- Tn7 elements can encode dedicated target site-selection proteins, TnsD and TnsE.
- in conjunction with TnsABC, the sequence-specific DNA-binding protein TnsD directs transposition into a conserved site referred to as the “Tn7 attachment site,” attTn7.
- TnsD is a member of a large family of proteins that also includes TniQ. TniQ has been shown to target transposition into resolution sites of plasmids.
- Gene editing and “genome editing” can be used interchangeably.
- Gene editing or genome editing means to change the nucleic acid sequence of a gene or a genome.
- Genome editing can include, for example, insertions, deletions, and mutations.
- Genome editing can be performed by a gene editing system, for example a nuclease, a reverse transcriptase, a recombinase, or a base editor.
- recombinase refers to an enzyme that mediates the recombination of DNA fragments located between recombinase recognition sequences, which results in the excision, insertion, inversion, exchange, or translocation of the DNA fragments located between the recombinase recognition sequences.
- nucleic acid modification refers to the process by which two or more nucleic acid molecules, or two or more regions of a single nucleic acid molecule, are modified by the action of a recombinase protein. Recombination can result in, inter alia, the insertion, inversion, excision, or translocation of a nucleic acid sequence, e.g., in or between one or more nucleic acid molecules.
- the term “complex” refers to a joining of at least two components.
- the two components may each retain the properties/activities they had prior to forming the complex or gain properties as a result of forming the complex.
- the joining includes, but is not limited to, covalent bonding, non-covalent bonding (i.e., hydrogen bonding, ionic interactions, Van der Waals interactions, and hydrophobic bond), use of a linker, fusion, or any other suitable method.
- Contemplated components of the complex include polynucleotides, polypeptides, or combinations thereof.
- a complex comprises an endonuclease and a guide polynucleotide.
- contig or “contigs” is a set of DNA segments or sequences that overlap in a way that provides a contiguous representation of a genomic region.
- sequence identity or “percent identity” in the context of two or more nucleic acids or polypeptide sequences, refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm.
- Suitable sequence comparison algorithms for polypeptide sequences include, e.g., BLASTP using parameters of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment for polypeptide sequences longer than 30 residues; BLASTP using parameters of a wordlength (W) of 2, an expectation (E) of 1000000, and the PAM30 scoring matrix setting gap costs at 9 to open gaps and 1 to extend gaps for sequences of less than 30 residues (these are the default parameters for BLASTP in the BLAST suite available at https://blast.ncbi.nlm.nih.gov); CLUSTALW with the Smith-Waterman homology search algorithm parameters with a match of 2, a mismatch of -1, and a gap of -1; MUSCLE with default parameters; MAFFT with parameters of a retree of 2 and max iterations of 1000; Novafold with default parameters; HMMER hmmalign with
- optimally aligned in the context of two or more nucleic acids or polypeptide sequences, refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that have been aligned to maximal correspondence of amino acid residues or nucleotides, for example, as determined by the alignment producing a highest or “optimized” percent identity score.
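As a rough illustration of how a percent identity score might be read off an existing pairwise alignment, the following Python sketch counts identical residues over the aligned columns. The alignment itself is assumed to come from one of the tools listed above. The choice of denominator (columns where neither sequence has a gap) is an assumption made for this example, since different tools use different conventions; the function names `percent_identity` and `meets_threshold` are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    The denominator convention is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the aligned pair reaches at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With the toy alignment `"MKT-LL"` against `"MKTALV"`, four of the five ungapped columns match, so the pair sits exactly at an 80% threshold under this convention.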
- variants of any of the enzymes described herein with one or more conservative amino acid substitutions can be made in the amino acid sequence of a polypeptide without disrupting the three-dimensional structure or function of the polypeptide.
- Conservative substitutions can be accomplished by substituting amino acids with similar hydrophobicity, polarity, and R chain length for one another. Additionally, or alternatively, by comparing aligned sequences of homologous proteins from different species, conservative substitutions can be identified by locating amino acid residues that have been mutated between species (e.g., non-conserved residues) without altering the basic functions of the encoded proteins.
- Such conservatively substituted variants include variants with at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of the large serine recombinase protein sequences described herein (e.g., MG178 family large serine recombinase, or any other family large serine recombinase described herein).
- such conservatively substituted variants are functional variants.
- Such functional variants can encompass sequences with substitutions such that the activity of one or more critical active site residues are not disrupted.
- a decreased activity variant of a protein described herein comprises a disrupting substitution of at least one, at least two, or all three catalytic residues.
- LSRs Large serine recombinases
- Viral LSRs range between 400 and 700 amino acids long and drive phage genome integration into a bacterial host genome when the virus enters its lysogenic life cycle.
- the mechanism for prophage integration involves the LSR recognizing a specific attachment site in the host genome, the attB site, and a phage attachment site, the attP site, on the phage genome.
- Viral genome integration occurs via recombination at these attachment sites, a process that leads to the generation of two new attachment sites, the attL and attR sites flanking the prophage.
- Serine recombinases described herein provide for genome engineering due to their ability to integrate a desired cargo into a specific target site.
- Serine Recombinases
- Described herein are gene editing systems comprising: a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115 or a nucleic acid encoding the serine recombinase. Further described herein are nucleic acids, vectors, and cells comprising a serine recombinase described herein. Further described herein are means for integrating nucleic acid sequences in a genome.
- Serine recombinases are enzymes that catalyze site-specific recombination events by facilitating DNA strand exchanges between two DNA segments possessing cognate recombinase recognition sites.
- the serine recombinase family comprises, for example, the small serine recombinases gamma-delta resolvase (from the Tn1000 transposon) and Tn3 resolvase (from the Tn3 transposon), or the large serine recombinases (LSRs) φC31 integrase (from the φC31 phage), Bxb1 integrase (from the Bxb1 mycobacteriophage), and R4 integrase.
- Serine recombinases are characterized by a conserved catalytic serine amino acid residue that attacks the DNA phosphodiester and becomes covalently linked to a DNA strand end during catalysis. Serine recombinases recognize cognate attachment site sequences termed attB on the acceptor DNA strand (for example a bacterial genome) and attP on the donor DNA strand (for example the phage genome). After the recombination event, the attB and attP sites are recombined to form the attL and attR sites flanking the newly integrated sequence. attB and attP sites are typically up to about 50 bases long.
- the serine recombinases form a tetrameric complex, with a protein dimer each attaching to an attB or attP attachment site.
- the serine recombinases cleave each strand, producing a double-strand break with a 2 bp overhang, and then exchange and ligate the strands.
- no other enzymes are needed to perform the reaction.
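The attB × attP → attL + attR outcome described above can be sketched at the sequence level in Python. This is a simplified string model under stated assumptions, not the disclosed method: each attachment site is represented as a left arm, a shared 2 bp core where the staggered cut occurs, and a right arm; the donor is a circular molecule written as a linear string whose attP site does not span the linearization point; and the names `AttSite` and `integrate` are hypothetical.

```python
from typing import NamedTuple, Tuple


class AttSite(NamedTuple):
    """Toy attachment site: left arm + 2 bp core (cut site) + right arm."""
    left: str
    core: str
    right: str

    @property
    def seq(self) -> str:
        return self.left + self.core + self.right


def integrate(genome: str, att_b: AttSite,
              donor_circle: str, att_p: AttSite) -> Tuple[str, str, str]:
    """Integrate a circular attP donor into a linear genome at attB.

    Returns (recombined genome, attL sequence, attR sequence).
    """
    if att_b.core != att_p.core:
        raise ValueError("attB and attP must share the same core dinucleotide")
    b_pos = genome.find(att_b.seq)
    p_pos = donor_circle.find(att_p.seq)
    if b_pos < 0 or p_pos < 0:
        raise ValueError("attachment site not found")
    # Open the circle at attP: the cargo runs from just after attP,
    # wraps around the circle, and ends just before attP.
    p_end = p_pos + len(att_p.seq)
    cargo = donor_circle[p_end:] + donor_circle[:p_pos]
    # Strand exchange at the core swaps the right arms of the two sites.
    att_l = AttSite(att_b.left, att_b.core, att_p.right).seq
    att_r = AttSite(att_p.left, att_p.core, att_b.right).seq
    b_end = b_pos + len(att_b.seq)
    product = genome[:b_pos] + att_l + cargo + att_r + genome[b_end:]
    return product, att_l, att_r
```

In this model the integrated cargo ends up flanked by attL on its 5' side and attR on its 3' side, and neither hybrid site matches the original attB or attP sequence.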
- the serine recombinase comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least
- the serine recombinase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-115.
- the serine recombinase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-115.
- the serine recombinase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having 100% identity to any one of SEQ ID NOs: 1-115.
- eukaryotic cells comprising a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115.
- the eukaryotic cell is a mammalian cell. In some embodiments, the eukaryotic cell is a human cell.
- the serine recombinases described herein comprise improved integration efficiency. In some embodiments, the serine recombinases described herein comprise an integration efficiency of at least about 5%. In some embodiments, the serine recombinases described herein comprise an integration efficiency of at least about 25%. In some embodiments, the serine recombinases described herein comprise an integration efficiency of at least about 50%. In some embodiments, the serine recombinases described herein comprise an integration efficiency of at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or more than 95%.
- the serine recombinases described herein comprise an improved integration efficiency as compared to a serine recombinase selected from the group consisting of: β-six, CinH, ParA, γδ, Bxb1, φC31, TP901, TG1, φBT1, R4, φRV1, φFC1, MR11, A118, U153, and gp29.
- the serine recombinase is a viral, prokaryotic, or eukaryotic serine recombinase. In some embodiments, the serine recombinase is capable of targeting genes comprising a catalase domain or synthase domain. In some embodiments, the catalase is manganese catalase. In some embodiments, the synthase is Queuosine synthase. In some embodiments, the serine recombinase is capable of targeting genes comprising a DUF4244 Pfam domain.
- the serine recombinase described herein comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the serine recombinase.
- the NLS comprises any of the sequences in Table 1 below, or a combination thereof:
- the serine recombinase comprises a tag.
- the tag is an affinity tag.
- affinity tags include, but are not limited to, a His-tag, a Flag tag, a Myc-tag, an MBP-tag, and a GST-tag.
- the serine recombinase comprises a protease cleavage site.
- exemplary protease cleavage sites include, but are not limited to, a TEV site, a C3 site, a Factor Xa site, and an Enterokinase site.
- compositions comprising: a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115 or a nucleic acid encoding the serine recombinase; and a nucleic acid comprising a donor polynucleotide and a first attachment site sequence.
- the first attachment site sequence is 5’ of the donor polynucleotide.
- the nucleic acid encoding the serine recombinase further comprises a second attachment site sequence. In some embodiments, the second attachment site sequence is 5’ of the serine recombinase. In some embodiments, the nucleic acid encoding the serine recombinase comprises one or more attachment site sequences. In some embodiments, the nucleic acid encoding the serine recombinase comprises 1, 2, 3, 4, 5, or more than 5 attachment site sequences.
- the nucleic acid comprising a donor polynucleotide comprises one or more attachment site sequences. In some embodiments, the nucleic acid comprising a donor polynucleotide comprises 1, 2, 3, 4, 5, or more than 5 attachment site sequences.
- the first attachment site sequence and the second attachment site sequence are capable of recombination.
- the first attachment site sequence is a bacterial genomic recombination sequence (attB).
- the attB sequence comprises about 20 to about 500 nucleotides.
- the attB sequence comprises about 20 to about 450, about 20 to about 400, about 20 to about 350, about 20 to about 300, about 20 to about 250, about 20 to about 200, about 20 to about 150, about 20 to about 100, about 20 to about 50, about 50 to about 450, about 50 to about 400, about 50 to about 350, about 50 to about 300, about 50 to about 250, about 50 to about 200, about 50 to about 150, about 50 to about 100, about 100 to about 450, about 100 to about 400, about 100 to about 350, about 100 to about 300, about 100 to about 250, about 100 to about 200, or about 100 to about 150 nucleotides.
- the first attachment site sequence is a phage genomic recombination sequence (attP).
- the attP sequence comprises about 20 to about 450, about 20 to about 400, about 20 to about 350, about 20 to about 300, about 20 to about 250, about 20 to about 200, about 20 to about 150, about 20 to about 100, about 20 to about 50, about 50 to about 450, about 50 to about 400, about 50 to about 350, about 50 to about 300, about 50 to about 250, about 50 to about 200, about 50 to about 150, about 50 to about 100, about 100 to about 450, about 100 to about 400, about 100 to about 350, about 100 to about 300, about 100 to about 250, about 100 to about 200, or about 100 to about 150 nucleotides.
- the second attachment site sequence is a bacterial genomic recombination sequence (attB).
- the attB sequence comprises about 20 to about 500 nucleotides.
- the attB sequence comprises about 20 to about 450, about 20 to about 400, about 20 to about 350, about 20 to about 300, about 20 to about 250, about 20 to about 200, about 20 to about 150, about 20 to about 100, about 20 to about 50, about 50 to about 450, about 50 to about 400, about 50 to about 350, about 50 to about 300, about 50 to about 250, about 50 to about 200, about 50 to about 150, about 50 to about 100, about 100 to about 450, about 100 to about 400, about 100 to about 350, about 100 to about 300, about 100 to about 250, about 100 to about 200, or about 100 to about 150 nucleotides.
- the second attachment site sequence is a phage genomic recombination sequence (attP).
- the attP sequence comprises about 20 to about 450, about 20 to about 400, about 20 to about 350, about 20 to about 300, about 20 to about 250, about 20 to about 200, about 20 to about 150, about 20 to about 100, about 20 to about 50, about 50 to about 450, about 50 to about 400, about 50 to about 350, about 50 to about 300, about 50 to about 250, about 50 to about 200, about 50 to about 150, about 50 to about 100, about 100 to about 450, about 100 to about 400, about 100 to about 350, about 100 to about 300, about 100 to about 250, about 100 to about 200, or about 100 to about 150 nucleotides.
- the nucleic acid comprising the donor polynucleotide and the first attachment site sequence is delivered by a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid.
- eukaryotic genomes comprising a donor polynucleotide and an attL sequence 5’ to the donor polynucleotide sequence.
- the eukaryotic genomes further comprise an attR sequence 3’ to the donor polynucleotide sequence.
- eukaryotic genomes comprising a donor polynucleotide sequence; and an attL sequence 3’ to the donor polynucleotide sequence.
- the eukaryotic genomes further comprise an attR sequence 3’ to the donor polynucleotide sequence.
- eukaryotic genomes comprising a donor polynucleotide sequence and an attL sequence 5’ or 3’ to the donor polynucleotide sequence.
- the attL sequence and the attR sequence are the same. In some embodiments, the attL sequence is a recombined sequence of a first attachment site sequence and a second attachment site sequence. In some embodiments, the attR sequence is a recombined sequence of a first attachment site sequence and a second attachment site sequence.
- Serine recombinases described herein can provide for integration of polynucleotides (e.g., donor polynucleotides) of large sizes.
- the donor polynucleotide comprises a size of at least about 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, or more than 50 kb.
- the donor polynucleotide comprises a size of at least about 15 kb, 20 kb, 25 kb, 30 kb, 35 kb, 50 kb, 100 kb, 200 kb, 300 kb, 400 kb, or 500 kb. In some embodiments, the donor polynucleotide comprises a size of about 200 base pairs (bp) to about 500 kb, 200 bp to about 250 kb, or 200 bp to about 100 kb.
- the donor polynucleotide comprises a size of about 1 kb to about 10 kb, about 1 to about 7.5 kb, about 1 to about 5 kb, about 1 to about 3 kb, about 2 to about 10 kb, about 2 to about 7.5 kb, about 2 to about 5 kb, about 2 to about 3 kb, about 3 to about 10 kb, about 3 to about 7.5 kb, or about 3 to about 5 kb.
- the donor polynucleotide comprises a size of about 10 kb to about 500 kb, 10 kb to about 400 kb, 10 kb to about 300 kb, 10 kb to about 200 kb, 10 kb to about 100 kb, about 10 kb to about 75 kb, about 10 kb to about 50 kb, about 10 kb to about 30 kb, about 20 kb to about 100 kb, about 20 to about 75 kb, about 20 kb to about 50 kb, about 20 kb to about 30 kb, about 30 kb to about 100 kb, about 30 kb to about 75 kb, or about 30 kb to about 50 kb.
- the donor polynucleotide comprises a size of about 10 to about 500, 20 to about 400, 10 to about 300, 10 to about 200, or 10 to about 100. In some embodiments, the donor polynucleotide is circular. In some embodiments, the donor polynucleotide is linear.
- the donor polynucleotide encodes a therapeutic, a reporter, or a marker.
- the reporter comprises a fluorescent protein.
- the fluorescent protein is GFP, EBFP, EBFP2, Azurite, mKalama1, ECFP, Cerulean, CyPet, YFP, Citrine, Venus, YPet, RFP, CFP, or a derivative thereof.
- the reporter is acetohydroxyacid synthase (AHAS), alkaline phosphatase (AP), beta galactosidase (LacZ), beta glucuronidase (GUS), chloramphenicol acetyltransferase (CAT), horseradish peroxidase (HRP), luciferase (Luc), nopaline synthase (NOS), octopine synthase (OCS), or a derivative thereof.
- the marker is an antibiotic resistance marker.
- the antibiotic resistance marker is kanamycin, spectinomycin, streptomycin, ampicillin, carbenicillin, bleomycin, erythromycin, polymyxin B, tetracycline, chloramphenicol, neomycin, zeocin, or a derivative thereof.
- the marker is a cell surface marker.
- the cell surface marker is a membrane protein, a sugar moiety, or a small molecule (for example biotin) presented on the cell surface.
- the cell surface marker is a CD3, B2M, CD4, CD8, CD28, HLA proteins, MHC complex, streptavidin, or avidin.
- the cell surface marker is an antibody for example an IgG, or an antibody fragment for example an scFv, or an Fc.
- the cell surface marker is bound by a specific antibody.
- the cell can be analyzed for expression of the cell surface marker by flow cytometry.
- the nucleic acid encoding the serine recombinase or the serine recombinase gene editing system is a DNA, for example a linear DNA, a plasmid DNA, or a minicircle DNA.
- the nucleic acid is an RNA, for example a mRNA.
- vectors comprising: a) a nucleic acid encoding a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1- 115; and b) one or more regulatory elements.
- vectors comprising a nucleic acid encoding a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115, wherein the vector is selected from the group consisting of: a plasmid, a nanoplasmid, a phagemid, a phage derivative, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), and a cosmid.
- the nucleic acid encoding the serine recombinase or the serine recombinase gene editing system is delivered by a nucleic acid-based vector.
- the nucleic acid-based vector is a plasmid (e.g., circular DNA molecules that can autonomously replicate inside a cell), cosmid (e.g., pWE or sCos vectors), artificial chromosome, human artificial chromosome (HAC), yeast artificial chromosome (YAC), bacterial artificial chromosome (BAC), P1-derived artificial chromosome (PAC), phagemid, phage derivative, bacmid, or virus.
- the nucleic acid-based vector is selected from the list consisting of: pSF-CMV-NEO-NH2-PPT-3XFLAG, pSF-CMV-NEO-COOH-3XFLAG, pSF-CMV-PURO-NH2-GST-TEV, pSF-OXB20-COOH-TEV-FLAG(R)-6His, pCEP4, pDEST27, pSF-CMV-Ub-KrYFP, pSF-CMV-FMDV-daGFP, pEF1a-mCherry-N1 vector, pEF1a-tdTomato vector, pSF-CMV-FMDV-Hygro, pSF-CMV-PGK-Puro, pMCP-tag(m), pSF-CMV-PURO-NH2-CMYC, pSF-OXB20-BetaGal, pSF-OXB20-Fluc, pSF-OXB20
- the one or more regulatory elements comprises a promoter, an enhancer, an intron, a microRNA, a linker, a splicing element, or a poly A signal.
- the promoter is selected from a constitutive promoter, an inducible promoter, a mini promoter, or a derivative thereof.
- the promoter is selected from the group consisting of: CMV, CBA, EF1a, CAG, PGK, TRE, U6, UAS, T7, Sp6, lac, araBad, trp, Ptac, p5, p19, p40, Synapsin, CaMKII, GRK1, polH, EM7, OpIE1, and a derivative thereof.
- the promoter is a U6 promoter.
- the promoter is a CAG promoter.
- the nucleic acid-based vector is a virus.
- the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus.
- the virus is an alphavirus.
- the virus is a parvovirus.
- the virus is an adenovirus.
- the virus is an AAV.
- the virus is a baculovirus.
- the virus is a Dengue virus. In some embodiments, the virus is a lentivirus. In some embodiments, the virus is a herpesvirus. In some embodiments, the virus is a poxvirus. In some embodiments, the virus is an anellovirus. In some embodiments, the virus is a bocavirus. In some embodiments, the virus is a vaccinia virus. In some embodiments, the virus is a retrovirus.
- the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV
- the virus is AAV1 or a derivative thereof. In some embodiments, the virus is AAV2 or a derivative thereof. In some embodiments, the virus is AAV3 or a derivative thereof. In some embodiments, the virus is AAV4 or a derivative thereof. In some embodiments, the virus is AAV5 or a derivative thereof. In some embodiments, the virus is AAV6 or a derivative thereof. In some embodiments, the virus is AAV7 or a derivative thereof. In some embodiments, the virus is AAV8 or a derivative thereof. In some embodiments, the virus is AAV9 or a derivative thereof. In some embodiments, the virus is AAV10 or a derivative thereof. In some embodiments, the virus is AAV11 or a derivative thereof.
- the virus is AAV12 or a derivative thereof. In some embodiments, the virus is AAV13 or a derivative thereof. In some embodiments, the virus is AAV14 or a derivative thereof. In some embodiments, the virus is AAV15 or a derivative thereof. In some embodiments, the virus is AAV16 or a derivative thereof. In some embodiments, the virus is AAV-rh8 or a derivative thereof. In some embodiments, the virus is AAV-rh10 or a derivative thereof. In some embodiments, the virus is AAV-rh20 or a derivative thereof. In some embodiments, the virus is AAV-rh39 or a derivative thereof. In some embodiments, the virus is AAV-rh74 or a derivative thereof.
- the virus is AAV-rhM4-1 or a derivative thereof. In some embodiments, the virus is AAV-hu37 or a derivative thereof. In some embodiments, the virus is AAV-Anc80 or a derivative thereof. In some embodiments, the virus is AAV-Anc80L65 or a derivative thereof. In some embodiments, the virus is AAV-7m8 or a derivative thereof. In some embodiments, the virus is AAV-PHP-B or a derivative thereof. In some embodiments, the virus is AAV-PHP-EB or a derivative thereof. In some embodiments, the virus is AAV-2.5 or a derivative thereof. In some embodiments, the virus is AAV-2tYF or a derivative thereof.
- the virus is AAV-3B or a derivative thereof. In some embodiments, the virus is AAV-LK03 or a derivative thereof. In some embodiments, the virus is AAV-HSC1 or a derivative thereof. In some embodiments, the virus is AAV-HSC2 or a derivative thereof. In some embodiments, the virus is AAV-HSC3 or a derivative thereof. In some embodiments, the virus is AAV-HSC4 or a derivative thereof. In some embodiments, the virus is AAV-HSC5 or a derivative thereof. In some embodiments, the virus is AAV-HSC6 or a derivative thereof. In some embodiments, the virus is AAV-HSC7 or a derivative thereof.
- the virus is AAV-HSC8 or a derivative thereof. In some embodiments, the virus is AAV-HSC9 or a derivative thereof. In some embodiments, the virus is AAV-HSC10 or a derivative thereof. In some embodiments, the virus is AAV-HSC11 or a derivative thereof. In some embodiments, the virus is AAV-HSC12 or a derivative thereof. In some embodiments, the virus is AAV-HSC13 or a derivative thereof. In some embodiments, the virus is AAV-HSC14 or a derivative thereof. In some embodiments, the virus is AAV-HSC15 or a derivative thereof. In some embodiments, the virus is AAV-TT or a derivative thereof.
- the virus is AAV-DJ/8 or a derivative thereof. In some embodiments, the virus is AAV-Myo or a derivative thereof. In some embodiments, the virus is AAV-NP40 or a derivative thereof. In some embodiments, the virus is AAV-NP59 or a derivative thereof. In some embodiments, the virus is AAV-NP22 or a derivative thereof. In some embodiments, the virus is AAV-NP66 or a derivative thereof. In some embodiments, the virus is AAV-HSC16 or a derivative thereof. In some embodiments, the virus is HSV-1 or a derivative thereof. In some embodiments, the virus is HSV-2 or a derivative thereof. In some embodiments, the virus is VZV or a derivative thereof.
- the virus is EBV or a derivative thereof. In some embodiments, the virus is CMV or a derivative thereof. In some embodiments, the virus is HHV-6 or a derivative thereof. In some embodiments, the virus is HHV-7 or a derivative thereof. In some embodiments, the virus is HHV-8 or a derivative thereof.
- the nucleic acid encoding the serine recombinase or a serine recombinase gene editing system is delivered by a non-nucleic acid-based delivery system (e.g., a non-viral delivery system).
- the non-viral delivery system is a liposome.
- the nucleic acid is associated with a lipid.
- the nucleic acid associated with a lipid, in some embodiments, is encapsulated in the aqueous interior of a liposome, interspersed within the lipid bilayer of a liposome, attached to a liposome via a linking molecule that is associated with both the liposome and the nucleic acid, entrapped in a liposome, complexed with a liposome, dispersed in a solution containing a lipid, mixed with a lipid, combined with a lipid, contained as a suspension in a lipid, contained or complexed with a micelle, or otherwise associated with a lipid.
- the nucleic acid is comprised in a lipid nanoparticle (LNP).
- the serine recombinase or the serine recombinase gene editing system is introduced into the cell in any suitable way, either stably or transiently.
- the serine recombinase or the serine recombinase gene editing system is transfected into the cell.
- the cell is transduced or transfected with a nucleic acid construct that encodes the serine recombinase or the serine recombinase gene editing system.
- a cell is transduced (e.g., with a virus encoding the serine recombinase or the serine recombinase gene editing system), or transfected (e.g., with a plasmid encoding the serine recombinase or the serine recombinase gene editing system) with a nucleic acid that encodes the serine recombinase or the serine recombinase gene editing system, or with the translated serine recombinase or serine recombinase gene editing system.
- the transduction is a stable or transient transduction.
- a plasmid expressing the serine recombinase or the serine recombinase gene editing system is introduced into cells through electroporation, transient transfection (e.g., lipofection), stable genome integration (e.g., piggyBac), viral transduction (for example lentivirus or AAV), or other methods known to those of skill in the art.
- the gene editing system is introduced into the cell as one or more polypeptides.
- delivery is achieved through the use of RNP complexes. Delivery methods to cells for polypeptides and/or RNPs are known in the art, for example by electroporation or by cell squeezing.
- Exemplary methods of delivery of nucleic acids include lipofection, nucleofection, electroporation, stable genome integration (e.g., piggybac), microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA.
- lipofection is described in e.g., U.S. Pat. Nos.
- lipofection reagents are sold commercially (e.g., Transfectam™, Lipofectin™, and SF Cell Line 4D-Nucleofector X Kit™ (Lonza)).
- Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of WO 91/17424 and WO 91/16024.
- the delivery is to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration).
- the nucleic acid is comprised in a liposome or a nanoparticle that specifically targets a host cell.
- delivery of the serine recombinase or the serine recombinase gene editing system to the target nucleic acid site comprises delivering a nucleic acid comprising an open reading frame encoding the serine recombinase or the serine recombinase gene editing system.
- the nucleic acid comprises a promoter.
- the open reading frame encoding the serine recombinase or the serine recombinase gene editing system is operably linked to the promoter.
- the promoter is a ribonucleic acid (RNA) pol III promoter.
- delivery of the serine recombinase or the serine recombinase gene editing system to the target nucleic acid site comprises delivering a capped mRNA containing the open reading frame encoding the serine recombinase or the serine recombinase gene editing system. In some embodiments, delivery of the serine recombinase or the serine recombinase gene editing system to the target nucleic acid site comprises delivering a translated polypeptide.
- delivery of the serine recombinase or the serine recombinase gene editing system to the target nucleic acid site comprises delivering a deoxyribonucleic acid (DNA) encoding the serine recombinase or the serine recombinase gene editing system operably linked to a ribonucleic acid (RNA) pol III promoter.
- lipid nanoparticles comprising the serine recombinase or the serine recombinase gene editing system of the disclosure for delivery into a cell.
- the lipid nanoparticle comprises the serine recombinase or the serine recombinase gene editing system or a nucleic acid encoding the serine recombinase or the serine recombinase gene editing system. In some embodiments, the lipid nanoparticle comprises the one or more components of the serine recombinase gene editing system. In some embodiments, the lipid nanoparticle comprises the serine recombinase or a nucleic acid encoding the serine recombinase. In some embodiments, the lipid nanoparticle comprises the donor polynucleotide.
- the lipid nanoparticle is tethered to the serine recombinase gene editing system.
- Lipid nanoparticles as described herein can be 4-component lipid nanoparticles.
- Such nanoparticles can be configured for delivery of RNA or other nucleic acids (e.g., synthetic RNA, mRNA, or in vitro-synthesized mRNA) and can be generally formulated as described in WO2012135805A2.
- Such nanoparticles can generally comprise: (a) a cationic lipid, (b) a neutral lipid (e.g., DSPC or DOPE), (c) a sterol (e.g., cholesterol or a cholesterol analog), or (d) a PEG-modified lipid (e.g., PEG-DMG).
- Cationic lipid formulations can include particles comprising either 3 or 4 or more components in addition to polynucleotide, primary construct, or RNA (e.g., mRNA).
- formulations with certain cationic lipids include, but are not limited to, 98N12-5 and may contain 42% lipidoid, 48% cholesterol and 10% PEG (C14 or greater alkyl chain length).
- formulations with certain lipidoids include, but are not limited to, C12-200 and may contain 50% cationic lipid, 10% distearoylphosphatidylcholine, 38.5% cholesterol, and 1.5% PEG-DMG.
- the cationic lipid nanoparticle comprises a cationic lipid, a PEG-modified lipid, a sterol, and a non-cationic lipid.
- the cationic lipid nanoparticle has a molar ratio of about 20-60% cationic lipid: about 5-25% non-cationic lipid: about 25-55% sterol: about 0.5-15% PEG-modified lipid.
- the cationic lipid nanoparticle comprises a molar ratio of about 50% cationic lipid, about 1.5% PEG-modified lipid, about 38.5% cholesterol, and about 10% non-cationic lipid.
- the cationic lipid nanoparticle comprises a molar ratio of about 55% cationic lipid, about 2.5% PEG-modified lipid, about 32.5% cholesterol, and about 10% non-cationic lipid.
- the cationic lipid is an ionizable cationic lipid.
- the non-cationic lipid is a neutral lipid.
- the sterol is a cholesterol.
- the cationic lipid nanoparticle has a molar ratio of 50:38.5:10:1.5 of cationic lipid:cholesterol:PEG2000-DMG:DSPC or DMG:DOPE.
- lipid nanoparticles as described herein can comprise cholesterol, 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,1'-((2-(4-(2-((2-(bis(2-hydroxydodecyl)amino)ethyl)(2-hydroxydodecyl)amino)ethyl)piperazin-1-yl)ethyl)azanediyl)bis(dodecan-2-ol) (C12-200), and DMG-PEG-2000 at molar ratios of 47.5:16:35:1.5.
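As an illustrative aid only, the molar compositions recited above can be checked arithmetically. The following Python sketch normalizes a set of molar parts to mole fractions and tests whether a formulation falls inside the approximate mol% ranges given in the text. The function names, dictionary keys, and the choice to treat "about" as exact range bounds are assumptions made for illustration; they are not part of the disclosure.

```python
# Illustrative sketch: the helper names and component labels are hypothetical.
# The percentages are the approximate values recited in the text above.

def mole_fractions(parts):
    """Normalize a mapping of component -> molar parts to fractions summing to 1."""
    total = sum(parts.values())
    if total <= 0:
        raise ValueError("molar parts must sum to a positive value")
    return {name: value / total for name, value in parts.items()}


def within_ranges(percentages, ranges):
    """Return True if every component's mol% lies inside its (low, high) range."""
    return all(
        ranges[name][0] <= pct <= ranges[name][1]
        for name, pct in percentages.items()
    )


# Approximate mol% ranges from the text ("about" treated as exact bounds here).
RANGES = {
    "cationic lipid": (20, 60),
    "non-cationic lipid": (5, 25),
    "sterol": (25, 55),
    "PEG-modified lipid": (0.5, 15),
}

# The two example compositions recited in the text (mol%).
formulation_a = {
    "cationic lipid": 50,
    "PEG-modified lipid": 1.5,
    "sterol": 38.5,
    "non-cationic lipid": 10,
}
formulation_b = {
    "cationic lipid": 55,
    "PEG-modified lipid": 2.5,
    "sterol": 32.5,
    "non-cationic lipid": 10,
}

if __name__ == "__main__":
    for label, formulation in (("A", formulation_a), ("B", formulation_b)):
        print(label, sum(formulation.values()), within_ranges(formulation, RANGES))
```

Both example compositions sum to 100 mol% and fall inside the recited ranges under these assumptions.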
- the first attachment site sequence is endogenous in the host genome.
- the first attachment site sequence is provided using viral delivery.
- viral delivery comprises use of a virus, wherein the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus.
- the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV-HSC12
- the first attachment site sequence is provided using a transposase.
- the transposase is transposase (Tnp) Tn5, Sleeping Beauty transposase, or a Tn7 transposon.
- the gene editing system comprises an enzyme with transposase activity. Additional enzymes with transposase activity include, but are not limited to, retrons and IS200/IS605 transposons.
- the first attachment site sequence is provided using a nuclease.
- the nuclease is a double-strand nuclease.
- the nuclease is a Type II CRISPR endonuclease.
- the nuclease is Cas9.
- Type II CRISPR systems are considered the simplest in terms of components. In Type II CRISPR systems, the processing of the CRISPR array into mature crRNAs does not require the presence of a special endonuclease subunit, but rather a small trans-encoded crRNA (tracrRNA) with a region complementary to the array repeat sequence; the tracrRNA interacts with both its corresponding effector nuclease (e.g., Cas9) and the repeat sequence to form a precursor dsRNA structure, which is cleaved by endogenous RNAse III to generate a mature effector enzyme loaded with both tracrRNA and crRNA.
- Type II nucleases are known as DNA nucleases.
- Type II nucleases generally exhibit a structure consisting of a RuvC-like endonuclease domain that adopts the RNase H fold with an unrelated HNH nuclease domain inserted within the folds of the RuvC-like nuclease domain.
- the RuvC-like domain is responsible for the cleavage of the target (e.g., crRNA complementary) DNA strand, while the HNH domain is responsible for cleavage of the displaced DNA strand.
- Exemplary CRISPR Cas9 proteins include, but are not limited to, Cas9 from Streptococcus pyogenes (UniProtKB - Q99ZW2 (CAS9_STRP1)), Streptococcus thermophilus (UniProtKB - G3ECR1 (CAS9_STRTR)), Staphylococcus aureus (UniProtKB - J7RUA5 (CAS9_STAAU)), Campylobacter jejuni (UniProtKB - Q0P897 (CAS9_CAMJE)), Campylobacter lari (UniProtKB - A0A0A8HTA3 (A0A0A8HTA3_CAMLA)), Helicobacter canadensis (UniProtKB - C5ZYI3 (C5ZYI3_9HELI)), and Francisella tularensis subsp. novicida (UniProtKB - A0Q5Y3 (CAS9_FRATN)).
- Additional Type II nucleases are described in International Patent Application Publications WO 2021/226363, WO 2022/159758, and WO 2022/056324.
- the nuclease is a CRISPR nuclease.
- the CRISPR nuclease is a Class 2 Type II SpCas9 or a Class 2 Type V-A Cas12a (previously Cpf1).
- the Type V-A nuclease has a guide RNA of 42-44 nucleotides compared with approximately 100 nt for SpCas9.
- the Type V-A nuclease results in staggered cut sites.
- the Type V-A nuclease results in staggered cut sites to facilitate directed repair pathways, such as microhomology-dependent targeted integration (MITI).
- the nuclease is a Type V CRISPR endonuclease.
- Type V CRISPR systems are characterized by a nuclease effector (e.g., Cas12) structure similar to that of Type II effectors, comprising a RuvC-like domain. Similar to Type II, most (but not all) Type V CRISPR systems use a tracrRNA to process pre-crRNAs into mature crRNAs; however, unlike Type II systems, which require RNAse III to cleave the pre-crRNA into multiple crRNAs, Type V systems are capable of using the effector nuclease itself to cleave pre-crRNAs.
- Type V CRISPR systems are known as DNA nucleases. Unlike Type II CRISPR systems, some Type V enzymes (e.g., Cas12a) appear to have a robust single-stranded nonspecific deoxyribonuclease activity that is activated by the first crRNA-directed cleavage of a double-stranded target sequence.
- Type V-A enzymes require a 5’ protospacer adjacent motif (PAM) next to the chosen target site: 5’-TTTV-3’ for Lachnospiraceae bacterium ND2006 LbCas12a and Acidaminococcus sp. AsCas12a; and 5’-TTV-3’ for Francisella novicida FnCas12a.
- PAM sequence is YTV, YYN, or TTN. Additional Type II nucleases are described in International Patent Application Publication WO 2021/226363.
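The PAM motifs above use IUPAC degenerate nucleotide codes (V = A, C, or G; Y = C or T; N = any base). As a minimal sketch of how a 5’ PAM such as TTTV or TTV could be located in a candidate target sequence, the following Python example converts a motif into a regular expression and reports each hit together with the protospacer immediately 3’ of it. The function names, the default protospacer length of 20 nt, and the restriction to the given strand only are assumptions for illustration, not parameters stated in the text.

```python
import re

# IUPAC codes needed for the PAM motifs quoted in the text (TTTV, TTV, YTV, YYN, TTN).
# Other degenerate codes are intentionally not handled in this sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "V": "[ACG]", "Y": "[CT]", "N": "[ACGT]",
}


def pam_to_regex(pam):
    """Translate a PAM written with IUPAC codes into a regular expression string."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_5prime_pam_sites(sequence, pam, protospacer_len=20):
    """Return (pam_start, pam_match, protospacer) for each PAM on the given strand.

    The protospacer is taken immediately 3' of the PAM; hits without a
    full-length protospacer are skipped. protospacer_len is a hypothetical
    default chosen for illustration.
    """
    sequence = sequence.upper()
    # A lookahead lets overlapping PAM occurrences be reported.
    pattern = re.compile("(?=(" + pam_to_regex(pam) + "))")
    hits = []
    for match in pattern.finditer(sequence):
        start = match.start()
        ps_start = start + len(pam)
        ps_end = ps_start + protospacer_len
        if ps_end <= len(sequence):
            hits.append((start, match.group(1), sequence[ps_start:ps_end]))
    return hits


if __name__ == "__main__":
    demo = "GGTTTACCCCCCCCCCGG"  # made-up sequence for demonstration
    print(find_5prime_pam_sites(demo, "TTTV", protospacer_len=5))
    print(find_5prime_pam_sites(demo, "TTV", protospacer_len=5))
```

A fuller search would also scan the reverse complement; that step is omitted here to keep the sketch short.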
- the first attachment site sequence is provided using a reverse transcriptase.
- Reverse transcription is the synthesis of a complementary DNA from an RNA template.
- Reverse transcription is performed by enzymes termed reverse transcriptases (RT) that are enzymes with RNA-dependent DNA polymerase activity that create the complementary DNA (cDNA) strand from an RNA template.
- Some of the RT enzymes also have DNA-dependent DNA polymerase activity to create a double-stranded dsDNA.
- Reverse transcriptases can be of viral origin (for example HIV, hepatitis B, Moloney murine leukemia virus (MMLV), or avian myeloblastosis virus (AMV)) or bacterial origin (for example group II introns, retrons/retron-like RTs, diversity-generating retroelements (DGRs), Abi-like RTs, CRISPR-associated RTs, and group II-like RTs (G2L)).
- Reverse transcriptases of eukaryotic origin comprise the telomerase reverse transcriptase that maintains the telomeres of eukaryotic chromosomes. Reverse transcription allows the introduction of site-directed insertions, deletions, and mutations into the cDNA by encoding them in the RNA template.
- the reverse transcriptase is a viral, prokaryotic, or eukaryotic reverse transcriptase.
- the reverse transcriptase is an MG151, MG153, or MG160 family reverse transcriptase.
- the reverse transcriptase is an MG140, MG146, MG148, MG149, MG151, MG153, MG154, MG155, MG156, MG157, MG158, MG159, MG160, MG163, MG164, MG165, MG166, MG167, MG168, MG169, MG170, or MG176 family reverse transcriptase.
- the reverse transcriptase comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of the MG140, MG146, MG148, MG149, MG151, MG153, MG154, MG155, MG156, MG157, MG158, MG159, MG160, MG163, MG164, MG165, MG166, MG167, MG168, MG169, MG170, MG172, MG173, or MG176 family reverse transcriptases or retrotransposases.
- the reverse transcriptase comprises a sequence with at least 80% sequence identity to any one of the MG140, MG146, MG148, MG149, MG151, MG153, MG154, MG155, MG156, MG157, MG158, MG159, MG160, MG163, MG164, MG165, MG166, MG167, MG168, MG169, MG170, MG172, MG173, or MG176 family reverse transcriptases or retrotransposases or variants thereof.
- the reverse transcriptase is smaller than 300 amino acids. In some embodiments, the reverse transcriptase is smaller than 250 amino acids.
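The sequence identity thresholds recited above (for example, at least 80%) presuppose some way of scoring two aligned sequences. The following Python sketch shows one simple convention: percent identity computed over the columns of an existing pairwise alignment, counting gap-containing columns as mismatches. The function name and this particular convention are assumptions for illustration; the text here does not specify the alignment method or the identity definition, and other conventions (for example, normalizing by the shorter sequence) would give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Columns where both sequences carry a gap are ignored; columns where only
    one sequence carries a gap count as mismatches. This is one convention
    among several and is chosen here only for illustration.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no scorable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a, aligned_b, threshold=80.0):
    """Return True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Made-up aligned peptide fragments for demonstration only.
    print(percent_identity("MKTAYIAK", "MKTA-IAR"))
    print(meets_threshold("MKTAYIAK", "MKTA-IAR"))
```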
- the methods are used to introduce a modification in the genome of a cell.
- the modification is an insertion, deletion, or mutation.
- the methods are used to introduce site-directed insertions, deletions, and/or mutations in the genome of a cell (for example an insertion and a mutation).
- the methods are used in combination with a nucleic acid template to facilitate site-directed insertions into the genome of a cell.
- the cell is a human cell.
- the cell genome or a vector comprised in the cell is modified.
- the cell genome is modified ex vivo.
- the cell genome is modified in vivo.
- the methods described herein further comprise detecting the genome modifications.
- the cell is cultured for a certain amount of time.
- the DNA or RNA is extracted and sequenced, and modified sequence areas are mapped and compared with an unmodified sequence.
- cells are stained with antibodies for protein products that are translated from the modified nucleic acid, and the resulting stained proteins or polypeptides in the cell are analyzed, for example by flow cytometry.
- a cell comprising the serine recombinase or the serine recombinase system described herein.
- the cell comprises the eukaryotic genome described herein.
- the cell is a human cell.
- the cell is a eukaryotic cell (e.g., a plant cell, an animal cell, a protist cell, or a fungal cell), a mammalian cell (a Chinese hamster ovary (CHO) cell, baby hamster kidney (BHK), human embryo kidney (HEK), mouse myeloma (NS0), or human retinal cells), an immortalized cell (e.g., a HeLa cell, a COS cell, a HEK-293T cell, a MDCK cell, a 3T3 cell, a PC12 cell, a Huh7 cell, a HepG2 cell, a K562 cell, a N2a cell, or a SY5Y cell), an insect cell (e.g., a Spodoptera frugiperda cell, a Trichoplusia ni cell, a Drosophila melanogaster cell, a S2 cell, or a Heliothis virescen
- the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is an immortalized cell. In some embodiments, the cell is an insect cell. In some embodiments, the cell is a yeast cell. In some embodiments, the cell is a plant cell. In some embodiments, the cell is a fungal cell. In some embodiments, the cell is a prokaryotic cell.
- the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, HeLa, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, a primary cell, or derivative thereof.
- the cell is a liver cell.
- kits comprising one or more nucleic acid constructs encoding the various components of the serine recombinases described herein, e.g., comprising a nucleotide sequence encoding the components of the serine recombinases capable of modifying a target DNA sequence.
- any of the serine recombinases disclosed herein is assembled into a pharmaceutical, diagnostic, or research kit to facilitate its use in therapeutic, diagnostic, or research applications.
- a kit may include one or more containers housing any of the vectors disclosed herein and instructions for use.
- the kit may be designed to facilitate use of the methods described herein by researchers and can take many forms.
- Each of the compositions of the kit may be provided in liquid form (e.g., in solution), or in solid form, (e.g., a dry powder).
- the compositions are constitutable or otherwise processable (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water or a cell culture medium), which may or may not be provided with the kit.
- Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc.
- the written instructions, in some embodiments, are in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which instructions can also reflect approval by the agency of manufacture, use, or sale for animal administration.
- LSRs Putative large serine recombinases
- LSR candidates were identified based on the presence of resolvase, recombinase, and Zn-finger domains, as well as catalytic residues required for activity (FIG. 1). Phylogenetic analysis of LSR candidates indicated that these enzymes are encoded in highly diverse genomes, and prophage boundaries were predicted for many (FIG. 1). Prophage genomes mobilized by LSR reached nearly 94 kb in length.
- Prophage boundaries are identified by aligning the contigs containing the LSR with highly similar contig sequences lacking the LSRs, which likely represent the host without the integration event. With integration boundaries delineated, the attachment site’s common cores are identified by searching for repeats near the boundaries.
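The repeat search described above can be sketched in a few lines. The following Python example takes two sequence windows, one around each predicted integration boundary, and returns the longest exact substring they share as a candidate common core. The function name, the minimum length of 2, and the use of exact matching are assumptions for illustration; the text does not state which repeat-finding method or parameters are used.

```python
def longest_shared_repeat(left_window, right_window, min_len=2):
    """Return the longest exact substring present in both windows.

    Uses a standard dynamic-programming longest-common-substring scan.
    Returns None when the best shared substring is shorter than min_len.
    If several substrings tie for longest, the first one found in
    left_window is returned.
    """
    best = ""
    prev = [0] * (len(right_window) + 1)
    for i in range(1, len(left_window) + 1):
        curr = [0] * (len(right_window) + 1)
        for j in range(1, len(right_window) + 1):
            if left_window[i - 1] == right_window[j - 1]:
                # Extend the match that ended at the previous position in both windows.
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = left_window[i - curr[j]:i]
        prev = curr
    return best if len(best) >= min_len else None


if __name__ == "__main__":
    # Made-up windows around the left and right prophage boundaries.
    left = "ACGTTGACCAGT"
    right = "CCATTGACCTTA"
    print(longest_shared_repeat(left, right))
```

In practice the windows would be sliced from the LSR-containing contig around each boundary identified by comparison with the contig lacking the integration event.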
- the attP and attB sites from the attL, attR, and common core sequences from the native integrated prophage genomic context are determined bioinformatically and tested in in vitro recombination reactions.
- The attB and attP sites are synthesized in gene fragments ~300 bp in length with primer binding sites unique to each attachment site end (FIG. 2C).
- Serine recombinases are expressed in vitro, while negative controls include in vitro expression reactions without template (null) (FIG. 2A).
- Negative recombination reaction controls are set up in 10 µL reactions using 50 ng of attB, 50 ng of attP, recombination buffer (20 mM HEPES pH 7.5, 50 µg/mL bovine serum albumin (BSA), 2 mM TCEP, 5 mM MgCl2, 100 mM KCl, 5 mM spermidine, 2 mM ZnCl2, and 5% glycerol) and 1 µL null reaction (no recombinase template).
- Experimental conditions include 50 ng of attB, 50 ng of attP, and 1 µL of in vitro-expressed recombinase (FIG. 2B).
- Recombination reactions are incubated at 30 °C for 1 hour and diluted with water at 1:10. PCR reactions are then performed with attL- (attB5 and attP3) or attR- (attB3 and attP5) specific primer sets (FIG. 2C) and analyzed on a 2% agarose gel to determine amplification and size of resulting products. Product-forming reactions are Sanger sequenced and aligned to the predicted attL and attR sequences determined bioinformatically.
- Recombinases are tested for their activity in human cells by synthesizing the attB fragment into a target plasmid (pTarget) with the attP site upstream of a promoterless mCherry coding ORF.
- attB fragments are synthesized into a pDonor plasmid encoding a pCMV promoter upstream of the attB site without a downstream coding ORF.
- When cotransfected with the active recombinase, the pCMV promoter of pDonor is recombined with the pTarget mCherry, and the junction of the pCMV promoter to the mCherry drives transcription and translation of the mCherry coding region. Efficiency of the recombinase is compared to the negative control of a cell population transfected with both pDonor and pTarget without the recombinase plasmid.
- Example 5 Prophetic - Landing pad activity in mammalian cells
- The landing pad, an attP or attB sequence site, is (1) found to be endogenous to the human genome sequence, (2) introduced using viral delivery or by way of a transposable element, (3) integrated into the genome using HDR coupled with a nuclease, or (4) reverse transcribed into the genome using a targeted reverse transcriptase.
- LSR activity to the genome is determined by using a DNA donor comprising (1) a promoter-driven fluorescent protein construct or (2) a promoterless fluorescent coding construct with the cognate attachment (attB/attP) site and/or (3) an antibiotic resistance marker or (4) a screenable cell surface marker.
- the donor is introduced into the cell as a plasmid, a minicircle, a Bacterial Artificial Chromosome, a nanoplasmid, or a linear dsDNA construct to integrate into the landing pad.
- the LSR is transfected into the cell using either, (1) a plasmid encoding for the transcription and translation of the LSR, (2) an mRNA coded for LSR translation, or (3) a purified protein.
- Landing pad efficiency is determined by flow analysis in the case of a fluorescent protein and/or cell surface marker donor, or colony formation under selective conditions and subsequent PCR analysis of exogenous/endogenous DNA junction formation.
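
The relationship between the attB/attP substrates and the attL/attR products described in the items above can be sketched in Python. This is only an illustrative sketch, not the procedure used in the disclosure: it assumes that each site contains the common core exactly once and that strand exchange occurs within that core. The function names and the toy sequences in the usage note are hypothetical.

```python
from typing import Tuple


def split_at_core(site: str, core: str) -> Tuple[str, str]:
    """Return the left and right arms flanking the single occurrence of `core`."""
    idx = site.find(core)
    # Assumption: the common core occurs exactly once in each attachment site.
    if idx < 0 or site.find(core, idx + 1) >= 0:
        raise ValueError("core must occur exactly once in the site")
    return site[:idx], site[idx + len(core):]


def recombine(att_b: str, att_p: str, core: str) -> Tuple[str, str]:
    """Predict attL and attR from attB and attP sharing a common core."""
    b_left, b_right = split_at_core(att_b, core)
    p_left, p_right = split_at_core(att_p, core)
    att_l = b_left + core + p_right  # spans the attB 5' arm and the attP 3' arm
    att_r = p_left + core + b_right  # spans the attP 5' arm and the attB 3' arm
    return att_l, att_r


def infer_substrates(att_l: str, att_r: str, core: str) -> Tuple[str, str]:
    """Reconstruct attB and attP from attL, attR, and the common core."""
    l_left, l_right = split_at_core(att_l, core)
    r_left, r_right = split_at_core(att_r, core)
    att_b = l_left + core + r_right
    att_p = r_left + core + l_right
    return att_b, att_p
```

With the toy inputs `recombine("GGCCATTCGGCC", "ACACATTGAGAG", "TT")`, the predicted attL junction joins the 5' arm of attB to the 3' arm of attP, which is consistent with the attB5/attP3 primer pairing used for attL-specific PCR; `infer_substrates` runs the same bookkeeping in reverse.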
Abstract
The disclosure relates generally to gene editing systems comprising large serine recombinases and methods of using such large serine recombinases for integration of nucleic acid sequences.
Description
SERINE RECOMBINASES FOR GENE EDITING
CROSS-REFERENCE
[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/382,692 filed November 7, 2022, which is incorporated by reference in its entirety herein.
BRIEF SUMMARY
[0002] The disclosure is based, in part, upon the development of serine recombinases for use in gene editing systems to integrate nucleic acid sequences.
[0003] Described herein are gene editing systems comprising: a) a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115 or a nucleic acid encoding the serine recombinase; and b) a nucleic acid comprising a donor polynucleotide and a first attachment site sequence. In some embodiments, the first attachment site sequence is 5’ of the donor polynucleotide. In some embodiments, the nucleic acid encoding the serine recombinase further comprises a second attachment site sequence. In some embodiments, the second attachment site sequence is 5’ of the serine recombinase. In some embodiments, the first attachment site sequence and the second attachment site sequence are capable of recombination. In some embodiments, the first attachment site sequence is a bacterial genomic recombination sequence (attB). In some embodiments, the first attachment site sequence is a phage genomic recombination sequence (attP). In some embodiments, the second attachment site sequence is a bacterial genomic recombination sequence (attB). In some embodiments, the second attachment site sequence is a phage genomic recombination sequence (attP). In some embodiments, the attB sequence comprises about 20 to about 500 nucleotides. In some embodiments, the attP sequence comprises about 20 to about 500 nucleotides. In some embodiments, the nucleic acid comprising the donor polynucleotide and the first attachment site sequence is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid. In some embodiments, the nucleic acid encoding the serine recombinase is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid. 
In some embodiments, the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus. In some embodiments, the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV-HSC12, AAV-HSC13, AAV-HSC14, AAV-HSC15, AAV-TT, AAV-DJ/8, AAV-Myo, AAV-NP40, AAV-NP59, AAV-NP22, AAV-NP66, or AAV-HSC16, or a derivative thereof. In some embodiments, the herpesvirus is HSV-1, HSV-2, VZV, EBV, CMV, HHV-6, HHV-7, or HHV-8. In some embodiments, the donor polynucleotide comprises a size of at least about 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, 100 kb, 110 kb, 120 kb, or more than 120 kb. In some embodiments, the donor polynucleotide encodes a therapeutic, a reporter, or a marker. In some embodiments, the reporter comprises a fluorescent protein. In some embodiments, the fluorescent protein is GFP, EBFP, EBFP2, Azurite, mKalama1, ECFP, Cerulean, CyPet, YFP, Citrine, Venus, YPet, RFP, CFP, or a derivative thereof. In some embodiments, the reporter is acetohydroxyacid synthase (AHAS), alkaline phosphatase (AP), beta galactosidase (LacZ), beta glucuronidase (GUS), chloramphenicol acetyltransferase (CAT), horseradish peroxidase (HRP), luciferase (Luc), nopaline synthase (NOS), octopine synthase (OCS), luciferase, or a derivative thereof. In some embodiments, the marker is an antibiotic resistance marker.
In some embodiments, the antibiotic resistance marker is kanamycin, spectinomycin, streptomycin, ampicillin, carbenicillin, bleomycin, erythromycin, polymyxin B, tetracycline, chloramphenicol, neomycin, zeocin, or a derivative thereof. In some embodiments, the marker is a cell surface marker.
[0004] Described herein are eukaryotic cells comprising a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115. In some embodiments, the eukaryotic cell is a mammalian cell. In some embodiments, the eukaryotic cell is a human cell. In some embodiments, the serine recombinase comprises an integration efficiency of at least about 5%. In some embodiments, the serine recombinase comprises an integration efficiency of at least about 25%. In some embodiments, the serine recombinase comprises an integration efficiency of at least about 50%. In some embodiments, the serine recombinase is capable of targeting genes comprising a catalase domain or synthase domain. In some embodiments, the catalase is manganese catalase. In some embodiments, the synthase is Queuosine synthase. In some embodiments, the serine recombinase is capable of targeting genes comprising a DUF4244 Pfam domain.
[0005] Described herein are vectors comprising: a) a nucleic acid encoding a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115; and b) one or more regulatory elements. In some embodiments, the one or more regulatory elements comprises a promoter, an enhancer, an intron, a microRNA, a linker, a splicing element, or a polyA signal. In some embodiments, the promoter is selected from a constitutive promoter, an inducible promoter, a mini promoter, or a derivative thereof. In some embodiments, the promoter is selected from the group consisting of: CMV, CBA, EF1a, CAG, PGK, TRE, U6, UAS, T7, Sp6, lac, araBad, trp, Ptac, p5, p19, p40, Synapsin, CaMKII, GRK1, polH, EM7, OpIE1, and a derivative thereof.
[0006] Described herein are vectors comprising a nucleic acid encoding a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115, wherein the vector is selected from the group consisting of: a plasmid, a nanoplasmid, a phagemid, a phage derivative, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), and a cosmid.
[0007] Described herein are methods for gene editing, comprising: a) providing or identifying a first attachment site sequence in a host genome; b) providing a nucleic acid comprising a donor polynucleotide and a second attachment site sequence to a host cell; and c) contacting the host cell with a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115 or a nucleic acid encoding the serine recombinase, wherein the first attachment site sequence and the second attachment site sequence are capable of recombination. In some embodiments, the first attachment site sequence is endogenous in the host genome. In some embodiments, the first attachment site sequence is provided using viral delivery. In some embodiments, the first attachment site sequence is provided using a transposase. In some embodiments, the first attachment site sequence is provided using a nuclease. In some embodiments, the nuclease is a double-strand nuclease. In some embodiments, the nuclease is a Type II CRISPR endonuclease. In some embodiments, the nuclease is a Type V CRISPR endonuclease. In some embodiments, the nuclease is Cas9. In some embodiments, the first attachment site sequence is provided using a reverse transcriptase. In some embodiments, the second attachment site sequence is 5’ of the donor polynucleotide. In some embodiments, the first attachment site sequence is a bacterial genomic recombination sequence (attB). In some embodiments, the first attachment site
sequence is a phage genomic recombination sequence (attP). In some embodiments, the second attachment site sequence is a bacterial genomic recombination sequence (attB). In some embodiments, the second attachment site sequence is a phage genomic recombination sequence (attP). In some embodiments, the attB sequence comprises about 20 to about 500 nucleotides. In some embodiments, the attP sequence comprises about 20 to about 500 nucleotides. In some embodiments, the nucleic acid comprising the donor polynucleotide and the second attachment site sequence is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid. In some embodiments, the nucleic acid encoding the serine recombinase is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid. In some embodiments, the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus. In some embodiments, the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV-HSC12, AAV-HSC13, AAV-HSC14, AAV-HSC15, AAV-TT, AAV-DJ/8, AAV-Myo, AAV-NP40, AAV-NP59, AAV-NP22, AAV-NP66, or AAV-HSC16, or a derivative thereof. In some embodiments, the herpesvirus is HSV-1, HSV-2, VZV, EBV, CMV, HHV-6, HHV-7, or HHV-8.
In some embodiments, the donor polynucleotide comprises a size of at least about 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, 100 kb, 110 kb, 120 kb, or more than 120 kb. In some embodiments, the donor polynucleotide encodes a therapeutic, a reporter, or a marker. In some embodiments, the reporter comprises a fluorescent protein. In some embodiments, the fluorescent protein is GFP, EBFP, EBFP2, Azurite, mKalama1, ECFP, Cerulean, CyPet, YFP, Citrine, Venus, YPet, RFP, CFP, or a derivative thereof. In some embodiments, the reporter is acetohydroxyacid synthase (AHAS), alkaline phosphatase (AP), beta galactosidase (LacZ), beta glucuronidase (GUS), chloramphenicol acetyltransferase (CAT), horseradish peroxidase (HRP), luciferase (Luc), nopaline synthase (NOS), octopine synthase (OCS), luciferase, or a derivative thereof. In some embodiments, the marker is an antibiotic resistance marker. In some embodiments, the antibiotic resistance marker is kanamycin, spectinomycin, streptomycin, ampicillin, carbenicillin, bleomycin, erythromycin, polymyxin B, tetracycline, chloramphenicol, neomycin, zeocin, or a derivative thereof. In some embodiments, the marker is a cell surface marker.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The novel features of the disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings of which:
[0009] FIG. 1 shows a phylogenetic protein tree of LSRs of the disclosure. The tree was inferred from a global multiple sequence alignment of LSR sequences clustered at 90% amino acid identity (AAI). Selected serine recombinase family candidates are highlighted by large dots.
[0010] FIGs. 2A-2C show a schematic of an exemplary in vitro screening procedure for serine recombinase recombination activity. FIG. 2A shows a schematic of recombinase in vitro expression from a linear or circular dsDNA construct. FIG. 2B shows a schematic for a recombination reaction using integrase that is added to the recombination reaction together with attP and attB dsDNA fragments specific to the serine recombinase. FIG. 2C shows a schematic of a PCR analysis by agarose gel electrophoresis of the recombined DNA amplified by attL- and attR-specific primers.
BRIEF DESCRIPTION OF THE SEQUENCE LISTING
[0011] The Sequence Listing filed herewith provides exemplary polynucleotide and polypeptide sequences for use in methods, compositions, and systems according to the disclosure. Below are exemplary descriptions of sequences therein.
[0012] SEQ ID NOs: 1-115 show amino acid sequences of MG178 family large serine recombinases suitable for use in gene editing as described herein.
DETAILED DESCRIPTION
[0013] Site-directed gene editing systems are powerful tools for site-directed genome engineering in cells. Most of the current gene editing systems depend on DNA double-stranded breaks (DSBs) to direct cellular DNA repair pathways such as homologous recombination (HR). However, these gene editing systems are often correlated with high indel rates, low insertion efficiency, high off-target activity, and a limited cargo size.
[0014] Additionally, the repair or insertion of longer pieces of DNA has remained challenging, and a safe and efficient way of targeted integration of large templates into a genome, for example for gene therapies or engineered cell therapies, is lacking. To date, lentiviruses or adeno-associated viruses (AAV) in combination with a CRISPR nuclease are used to insert large pieces of DNA, for example whole genes. However, lentiviral-mediated integration lacks the targetability feature, as integration occurs mostly randomly in open chromatin. AAV-mediated delivery has a limited cargo capacity and is not available for all cell types. A safe and efficient targeted genome editing system that allows for large template integration is needed.
[0015] The present disclosure is based, in part, upon the development of gene editing systems comprising large serine recombinases (LSRs) or serine recombinases for targetable and programmable integration of large fragments of DNA into a eukaryotic genome. In some embodiments, serine recombinases described herein can integrate multi-kilobase DNA sequences.
Definitions
[0016] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as is commonly understood by one of skill in the art to which the claimed subject matter belongs. It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of any subject matter claimed. The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.
[0017] The practice of some methods disclosed herein employs, unless otherwise indicated, techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA. See for example Sambrook and Green, Molecular Cloning: A Laboratory Manual, 4th Edition (2012); the series Current Protocols in Molecular Biology (F. M. Ausubel, et al. eds.); the series Methods In Enzymology (Academic Press, Inc.), PCR 2: A Practical Approach (M.J. MacPherson, B.D. Hames and G.R. Taylor eds. (1995)), Harlow and Lane, eds. (1988) Antibodies, A Laboratory Manual, and Culture of Animal Cells: A Manual of Basic Technique and Specialized Applications, 6th Edition (R.I. Freshney, ed. (2010)).
[0018] As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, to the extent that the terms “including”, “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description and/or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising”.
[0019] The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within one or more than one standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 15%, up to 10%, up to 5%, or up to 1% of a given value.
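
As a purely illustrative sketch of the percentage-based reading of “about” in paragraph [0019], the Python function below checks whether an observed value falls within a chosen fraction of a stated value. The function name, its parameters, and the choice of 20% as the default are assumptions made for this example; the disclosure equally allows 15%, 10%, 5%, 1%, or a standard-deviation-based reading.

```python
def within_about(observed: float, stated: float, max_fraction: float = 0.20) -> bool:
    """Return True if `observed` lies within `max_fraction` of `stated`.

    `max_fraction` is a hypothetical parameter: 0.20 corresponds to the
    "up to 20%" reading, 0.01 to the "up to 1%" reading.
    """
    if max_fraction < 0:
        raise ValueError("max_fraction must be non-negative")
    return abs(observed - stated) <= max_fraction * abs(stated)
```

For instance, under the 20% reading a measured value of 84 would count as “about 80”, while under the 1% reading it would not.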
[0020] The term “nucleotide,” as used herein, refers to a base-sugar-phosphate combination. Contemplated nucleotides include naturally occurring nucleotides and synthetic nucleotides. Nucleotides are monomeric units of a nucleic acid sequence (e.g., deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)). The term nucleotide includes ribonucleoside triphosphates adenosine triphosphate (ATP), uridine triphosphate (UTP), cytosine triphosphate (CTP), guanosine triphosphate (GTP) and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP, dUTP, dGTP, dTTP, or derivatives thereof. Such derivatives include, for example, [αS]dATP, 7-deaza-dGTP and 7-deaza-dATP, and nucleotide derivatives that confer nuclease resistance on the nucleic acid molecule containing them. The term nucleotide as used herein encompasses dideoxyribonucleoside triphosphates (ddNTPs) and their derivatives. Illustrative examples of ddNTPs include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP, and ddTTP. A nucleotide may be unlabeled or detectably labeled, such as using moieties comprising optically detectable moieties (e.g., fluorophores) or quantum dots. Detectable labels include, for example, radioactive isotopes, fluorescent labels, chemiluminescent labels, bioluminescent labels, and enzyme labels. Fluorescent labels of nucleotides include but are not limited to fluorescein, 5-carboxyfluorescein (FAM), 2'7'-dimethoxy-4'5-dichloro-6-carboxyfluorescein (JOE), rhodamine, 6-carboxyrhodamine (R6G), N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA), 6-carboxy-X-rhodamine (ROX), 4-(4'dimethylaminophenylazo) benzoic acid (DABCYL), Cascade Blue, Oregon Green, Texas Red, Cyanine and 5-(2'-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS). Specific examples of fluorescently labeled nucleotides include [R6G]dUTP, [TAMRA]dUTP, [R110]dCTP, [R6G]dCTP, [TAMRA]dCTP, [JOE]ddATP, [R6G]ddATP, [FAM]ddCTP, [R110]ddCTP, [TAMRA]ddGTP, [ROX]ddTTP, [dR6G]ddATP, [dR110]ddCTP, [dTAMRA]ddGTP, and [dROX]ddTTP available from Perkin Elmer, Foster City, Calif.; FluoroLink DeoxyNucleotides, FluoroLink Cy3-dCTP, FluoroLink Cy5-dCTP, FluoroLink Fluor X-dCTP, FluoroLink Cy3-dUTP, and FluoroLink Cy5-dUTP available from Amersham, Arlington Heights, IL; Fluorescein-15-dATP, Fluorescein-12-dUTP, Tetramethyl-rodamine-6-dUTP, IR770-9-dATP, Fluorescein-12-ddUTP, Fluorescein-12-UTP, and Fluorescein-15-2'-dATP available from Boehringer Mannheim, Indianapolis, Ind.; and Chromosome Labeled Nucleotides, BODIPY-FL-14-UTP, BODIPY-FL-4-UTP, BODIPY-TMR-14-UTP, BODIPY-TMR-14-dUTP, BODIPY-TR-14-UTP, BODIPY-TR-14-dUTP, Cascade Blue-7-UTP, Cascade Blue-7-dUTP, fluorescein-12-UTP, fluorescein-12-dUTP, Oregon Green 488-5-dUTP, Rhodamine Green-5-UTP, Rhodamine Green-5-dUTP, tetramethylrhodamine-6-UTP, tetramethylrhodamine-6-dUTP, Texas Red-5-UTP, Texas Red-5-dUTP, and Texas Red-12-dUTP available from Molecular Probes, Eugene, Oreg. The term nucleotide encompasses chemically modified nucleotides. An exemplary chemically-modified nucleotide is biotin-dNTP. Non-limiting examples of biotinylated dNTPs include, biotin-dATP (e.g., bio-N6-ddATP, biotin-14-dATP), biotin-dCTP (e.g., biotin-11-dCTP, biotin-14-dCTP), and biotin-dUTP (e.g., biotin-11-dUTP, biotin-16-dUTP, biotin-20-dUTP).
[0021] The terms “polynucleotide,” “oligonucleotide,” and “nucleic acid” are used interchangeably to refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof, either in single-, double-, or multi-stranded form. Contemplated polynucleotides include a gene or fragment thereof.
Exemplary polynucleotides include, but are not limited to, DNA, RNA, coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, cell-free polynucleotides including cell-free DNA (cfDNA) and cell-free RNA (cfRNA), nucleic acid probes, and primers. In a polynucleotide when referring to a T, a T means U (Uracil) in RNA and T (Thymine) in DNA. A polynucleotide can be exogenous or endogenous to a cell and/or exist in a cell-free environment. The term polynucleotide encompasses modified polynucleotides (e.g., altered backbone, sugar, or nucleobase). If present, modifications to the nucleotide structure are imparted before or after assembly of the polymer. Non-limiting examples of modifications include: 5-bromouracil, peptide nucleic acid, xeno nucleic acid, morpholinos, locked nucleic acids, glycol nucleic acids, threose nucleic acids, dideoxynucleotides, cordycepin, 7-deaza-GTP, fluorophores (e.g., rhodamine or fluorescein linked to the sugar), thiol-containing nucleotides, biotin-linked nucleotides, fluorescent base analogs, CpG islands, methyl-7-guanosine, methylated nucleotides, inosine, thiouridine, pseudouridine, dihydrouridine, queuosine, and wyosine. The sequence of nucleotides may be interrupted by non-nucleotide components.
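
The convention in paragraph [0021] that a written T denotes uracil in RNA and thymine in DNA can be illustrated with a short Python sketch. The helper names are hypothetical and the code assumes plain single-letter sequence strings; it is offered only as an illustration of the notation, not as part of the disclosed methods.

```python
# Translation tables for the T/U convention (upper and lower case).
_TO_RNA = str.maketrans("Tt", "Uu")
_TO_DNA = str.maketrans("Uu", "Tt")


def as_rna(seq: str) -> str:
    """Read a sequence written with T as its RNA form (T is read as U)."""
    return seq.translate(_TO_RNA)


def as_dna(seq: str) -> str:
    """Read a sequence written with U as its DNA form (U is read as T)."""
    return seq.translate(_TO_DNA)
```

A sequence listed once, such as `ATGTTC`, can therefore be read as `AUGUUC` when the molecule in question is RNA.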
[0022] The terms “transfection” or “transfected” generally refer to introduction of a nucleic acid into a cell by non-viral or viral-based methods. The nucleic acid molecules may be gene sequences encoding complete proteins or functional portions thereof. See, e.g., Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 18.1-18.88.
[0023] The terms “peptide,” “polypeptide,” and “protein” are used interchangeably herein to refer to a polymer of at least two amino acid residues joined by peptide bond(s). This term does not connote a specific length of polymer, nor is it intended to imply or distinguish whether the peptide is produced using recombinant techniques, chemical or enzymatic synthesis, or is naturally occurring. The terms apply to naturally occurring amino acid polymers as well as amino acid polymers comprising at least one modified amino acid. In some cases, the polymer is interrupted by non-amino acids. The terms include amino acid chains of any length, including full length proteins, and proteins with or without secondary or tertiary structure (e.g., domains). The terms also encompass an amino acid polymer that has been modified, for example, by disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation, oxidation, and any other manipulation such as conjugation with a labeling component. The terms “amino acid” and “amino acids,” as used herein, refer to natural and non-natural amino acids, including, but not limited to, modified amino acids. Modified amino acids include amino acids that have been chemically modified to include a group or a chemical moiety not naturally present on the amino acid. The term “amino acid” includes both D-amino acids and L-amino acids.
[0024] As used herein, the term “non-native” refers to a nucleic acid or polypeptide sequence that is non-naturally occurring. Non-native refers to a non-naturally occurring nucleic acid or polypeptide sequence that comprises modifications such as mutations, insertions, or deletions. The term non-native encompasses fusion nucleic acids or polypeptides that encode or exhibit an activity (e.g., enzymatic activity, methyltransferase activity, acetyltransferase activity, kinase activity, ubiquitinating activity, etc.) of the nucleic acid or polypeptide sequence to which the non-native sequence is fused. A non-native nucleic acid or polypeptide sequence includes those linked to a naturally-occurring nucleic acid or polypeptide sequence (or a variant thereof) by genetic engineering to generate a chimeric nucleic acid or polypeptide sequence encoding a chimeric nucleic acid or polypeptide.
[0025] The term “promoter”, as used herein, refers to the regulatory DNA region which controls transcription or expression of a polynucleotide (e.g., a gene) and which may be located adjacent to or overlapping a nucleotide or region of nucleotides at which RNA transcription is initiated. A promoter may contain specific DNA sequences which bind protein factors, often referred to as transcription factors, which facilitate binding of RNA polymerase to the DNA leading to gene transcription. Eukaryotic basal promoters typically, though not necessarily, contain a TATA-box and/or a CAAT box.
[0026] The term “expression”, as used herein, refers to the process by which a nucleic acid sequence or a polynucleotide is transcribed from a DNA template (such as into mRNA or other RNA transcript) and/or the process by which a transcribed mRNA is subsequently translated into peptides, polypeptides, or proteins. Transcripts and encoded polypeptides may be collectively referred to as “gene product.” If the polynucleotide is derived from genomic DNA, the term expression includes splicing of the mRNA in a eukaryotic cell.
[0027] As used herein, “operably linked”, “operable linkage”, “operatively linked”, or grammatical equivalents thereof refer to an arrangement of genetic elements, e.g., a promoter, an enhancer, a polyadenylation sequence, etc., wherein an operation (e.g., movement or activation) of a first genetic element has some effect on the second genetic element. The effect on the second genetic element can be, but need not be, of the same type as operation of the first genetic element. For example, two genetic elements are operably linked if movement of the first element causes an activation of the second element. For instance, a regulatory element, which may comprise promoter and/or enhancer sequences, is operatively linked to a coding region if the regulatory element helps initiate transcription of the coding sequence. There may be intervening residues between the regulatory element and coding region so long as this functional relationship is maintained.
[0028] A “vector” as used herein, refers to a macromolecule or association of macromolecules that comprises or associates with a polynucleotide and which mediates delivery of the polynucleotide to a cell. Examples of vectors include nucleic acid-based vectors (e.g., plasmids and viral vectors) and liposomes. An exemplary nucleic acid-based vector comprises genetic elements, e.g., regulatory elements, operatively linked to a gene to facilitate expression of the gene in a target.
[0029] As used herein, “expression cassette” and “nucleic acid cassette” are used interchangeably to refer to a component of a vector comprising a combination of nucleic acid sequences or elements (e.g., therapeutic gene, promoter, and a terminator) that are expressed together or are operably linked for expression. The terms encompass an expression cassette including a combination of regulatory elements and a gene or genes to which they are operably linked for expression.
[0030] A “functional fragment” of a DNA or protein sequence refers to a fragment that retains a biological activity (either functional or structural) that is substantially similar to a biological activity of the full-length DNA or protein sequence. A biological activity of a DNA sequence includes its ability to influence expression in a manner attributed to the full-length sequence.
[0031] The terms “engineered,” “synthetic,” and “artificial” are used interchangeably herein to refer to an object that has been modified by human intervention. For example, the terms refer to a polynucleotide or polypeptide that is non-naturally occurring. An engineered peptide can have, but does not require, low sequence identity (e.g., less than 50% sequence identity, less than 25% sequence identity, less than 10% sequence identity, less than 5% sequence identity, less than 1% sequence identity) to a naturally occurring human protein. For example, VPR and VP64 domains are synthetic transactivation domains. Non-limiting examples include the following: a nucleic acid modified by changing its sequence to a sequence that does not occur in nature; a nucleic acid modified by ligating it to a nucleic acid that it does not associate with in nature such that the ligated product possesses a function not present in the original nucleic acid; an engineered nucleic acid synthesized in vitro with a sequence that does not exist in nature; a protein modified by changing its amino acid sequence to a sequence that does not exist in nature; an engineered protein acquiring a new function or property. An “engineered” system comprises at least one engineered component.
[0032] As used herein, a “guide nucleic acid” or “guide polynucleotide” refers to a nucleic acid that may hybridize to a target nucleic acid and thereby direct an associated nuclease to the target nucleic acid. A guide nucleic acid can be, but is not limited to, RNA (guide RNA or gRNA), DNA, or a mixture of RNA and DNA. A guide nucleic acid can include a crRNA or a tracrRNA or a combination of both. The term guide nucleic acid encompasses an engineered guide nucleic acid and a programmable guide nucleic acid that specifically binds to the target nucleic acid. A portion of the target nucleic acid may be complementary to a portion of the guide nucleic acid. The strand of a double-stranded target polynucleotide that is complementary to and hybridizes with the guide nucleic acid is the complementary strand. The strand of the double-stranded target polynucleotide that is complementary to the complementary strand, and therefore is not complementary to the guide nucleic acid, is called the noncomplementary strand. A guide nucleic acid having one polynucleotide chain is a “single guide nucleic acid.” A guide nucleic acid having two polynucleotide chains is a “double guide nucleic acid.” If not otherwise specified, the term “guide nucleic acid” is inclusive, referring to both single guide nucleic acids and double guide nucleic acids. A guide nucleic acid may comprise a segment referred to as a “nucleic acid-targeting segment” or a “nucleic acid-targeting sequence,” or a “spacer.” A nucleic acid-targeting segment can include a subsegment referred to as a “protein binding segment” or “protein binding sequence” or “Cas protein binding segment.”
[0033] The term “tracrRNA” or “tracr sequence” means trans-activating CRISPR RNA. tracrRNA interacts with the CRISPR (cr) RNA to form a guide nucleic acid (e.g., guide RNA or gRNA) that may hybridize to a target nucleic acid and thereby direct an associated nuclease to the target nucleic acid.
[0034] As used herein, the term “RuvC III domain” refers to a third discontinuous segment of a RuvC endonuclease domain (the RuvC nuclease domain being comprised of three discontiguous segments, RuvC I, RuvC II, and RuvC III). A RuvC domain or segments thereof can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences (e.g., Pfam HMM PF18541 for RuvC III).
[0035] As used herein, the term “HNH domain” refers to an endonuclease domain having characteristic histidine and asparagine residues. An HNH domain can generally be identified by alignment to documented domain sequences, structural alignment to proteins with annotated domains, or by comparison to Hidden Markov Models (HMMs) built based on documented domain sequences (e.g., Pfam HMM PF01844 for domain HNH).
[0036] As used herein, the term “transposon” refers to mobile elements that move in and out of genomes carrying “cargo DNA” with them. These transposons can differ on the type of nucleic acid to transpose, the type of repeat at the ends of the transposon, the type of cargo to be carried, or by the mode of transposition (i.e., self-repair or host-repair).
[0037] As used herein, the term “transposase” or “transposases” refers to an enzyme that binds to the end of a transposon and catalyzes its movement to another part of the genome. Types of movement include a cut and paste mechanism and a replicative transposition mechanism.
[0038] As used herein, the term “Tn7” or “Tn7-like transposase” refers to a family of transposases comprising three main components: a heteromeric transposase (TnsA and/or TnsB) alongside a regulator protein (TnsC). In addition to the TnsABC transposition proteins, Tn7 elements can encode dedicated target site-selection proteins, TnsD and TnsE. In conjunction with TnsABC, the sequence-specific DNA-binding protein TnsD directs transposition into a conserved site referred to as the “Tn7 attachment site,” attTn7. TnsD is a member of a large family of proteins that also includes TniQ. TniQ has been shown to target transposition into resolution sites of plasmids.
[0039] As used herein, the terms “gene editing” and “genome editing” can be used interchangeably. Gene editing or genome editing means to change the nucleic acid sequence of a gene or a genome. Genome editing can include, for example, insertions, deletions, and mutations. Genome editing can be performed by a gene editing system, for example a nuclease, a reverse transcriptase, a recombinase, or a base editor.
[0040] As used herein, the term “recombinase” refers to an enzyme that mediates the recombination of DNA fragments located between recombinase recognition sequences, which results in the excision, insertion, inversion, exchange, or translocation of the DNA fragments located between the recombinase recognition sequences.
[0041] As used herein, the term “recombine,” or “recombination,” in the context of a nucleic acid modification (e.g., a genomic modification), refers to the process by which two or more nucleic acid molecules, or two or more regions of a single nucleic acid molecule, are modified by the action of a recombinase protein. Recombination can result in, inter alia, the insertion, inversion, excision, or translocation of a nucleic acid sequence, e.g., in or between one or more nucleic acid molecules.
[0042] As used herein, the term “complex” refers to a joining of at least two components. The two components may each retain the properties/activities they had prior to forming the complex or gain properties as a result of forming the complex. The joining includes, but is not limited to, covalent bonding, non-covalent bonding (i.e., hydrogen bonding, ionic interactions, Van der Waals interactions, and hydrophobic bond), use of a linker, fusion, or any other suitable method. Contemplated components of the complex include polynucleotides, polypeptides, or combinations thereof. For example, a complex comprises an endonuclease and a guide polynucleotide.
[0043] The term “contig” or “contigs” refers to a set of DNA segments or sequences that overlap in a way that provides a contiguous representation of a genomic region.
[0044] The term “sequence identity” or “percent identity” in the context of two or more nucleic acids or polypeptide sequences, refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a local or global comparison window, as measured using a sequence comparison algorithm. Suitable sequence comparison algorithms for polypeptide sequences include, e.g., BLASTP using parameters of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix setting gap costs at existence of 11, extension of 1, and using a conditional compositional score matrix adjustment for polypeptide sequences longer than 30 residues; BLASTP using parameters of a wordlength (W) of 2, an expectation (E) of 1000000, and the PAM30 scoring matrix setting gap costs at 9 to open gaps and 1 to extend gaps for sequences of less than 30 residues (these are the default parameters for BLASTP in the BLAST suite available at https://blast.ncbi.nlm.nih.gov); CLUSTALW with the Smith-Waterman homology search algorithm parameters with a match of 2, a mismatch of -1, and a gap of -1; MUSCLE with default parameters; MAFFT with parameters of a retree of 2 and max iterations of 1000; Novafold with default parameters; HMMER hmmalign with default parameters.
[0045] The term “optimally aligned” in the context of two or more nucleic acids or polypeptide sequences, refers to two (e.g., in a pairwise alignment) or more (e.g., in a multiple sequence alignment) sequences that have been aligned to maximal correspondence of amino acids residues or nucleotides, for example, as determined by the alignment producing a highest or “optimized” percent identity score.
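As a minimal illustration of how a percent identity value can be read off an existing pairwise alignment, the following Python sketch counts identical columns in two already-aligned sequences. It is not one of the algorithms listed above and does not perform the alignment itself. The function names, the gap character, and the choice of denominator (all columns in which at least one sequence has a residue) are assumptions made for illustration; other conventions, such as dividing by the length of the shorter sequence, give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where both sequences carry a gap are ignored, and
    every other column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column is empty in both sequences
        compared += 1
        if a == b:
            matches += 1  # identical residue (a gap never matches a residue)

    if compared == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical toy alignment: 5 identical columns out of 6.
    print(round(percent_identity("MKT-LV", "MKTALV"), 2))
    print(meets_identity_threshold("MKT-LV", "MKTALV", threshold=80.0))
```

In practice the aligned strings would come from one of the tools named above (for example BLASTP or MAFFT output), and the threshold would be chosen to match the identity level of interest.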
[0046] Included in the current disclosure are variants of any of the enzymes described herein with one or more conservative amino acid substitutions. Such conservative substitutions can be made in the amino acid sequence of a polypeptide without disrupting the three-dimensional structure or function of the polypeptide. Conservative substitutions can be accomplished by substituting amino acids with similar hydrophobicity, polarity, and R chain length for one another. Additionally, or alternatively, by comparing aligned sequences of homologous proteins from different species, conservative substitutions can be identified by locating amino acid residues that have been mutated between species (e.g., non-conserved residues) without altering the basic functions of the encoded proteins. Such conservatively substituted variants include variants with at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity to any one of the large serine recombinase protein sequences described herein (e.g., MG178 family large serine recombinase, or any other family large serine recombinase described herein). In some embodiments, such conservatively substituted variants are functional variants. Such functional variants can encompass sequences with substitutions such that the activity of one or more critical active site residues is not disrupted.
[0047] Also included in the current disclosure are variants of any of the enzymes described herein with substitution of one or more catalytic residues to decrease or eliminate activity of the enzyme (e.g., decreased-activity variants). In some embodiments, a decreased-activity variant of a protein described herein comprises a disrupting substitution of at least one, at least two, or all three catalytic residues.
[0048] Conservative substitution tables providing functionally similar amino acids are available from a variety of references (see, e.g., Creighton, Proteins: Structures and Molecular Properties (W H Freeman & Co.; 2nd edition (December 1993))). The following eight groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Glycine (G);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V);
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W);
7) Serine (S), Threonine (T); and
8) Cysteine (C), Methionine (M)
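The eight groups above can be encoded as a simple lookup. The following Python sketch is illustrative only: the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical, and the rule applied (two different residues are conservative substitutions for one another if they share at least one group) is a direct reading of the list, under which methionine belongs to both group 5 and group 8.

```python
# The eight groups listed above, written with one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    frozenset("AG"),    # 1) Alanine, Glycine
    frozenset("DE"),    # 2) Aspartic acid, Glutamic acid
    frozenset("NQ"),    # 3) Asparagine, Glutamine
    frozenset("RK"),    # 4) Arginine, Lysine
    frozenset("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    frozenset("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    frozenset("ST"),    # 7) Serine, Threonine
    frozenset("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` stays within a group.

    Identical residues are not treated as a substitution and return False.
    """
    o, r = original.upper(), replacement.upper()
    if o == r:
        return False
    return any(o in group and r in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("D", "E"))  # same group (2)
    print(is_conservative("M", "C"))  # same group (8)
    print(is_conservative("C", "L"))  # C is only in group 8, L only in group 5
```

Note that group membership is not transitive here: M pairs with both C and L, but C and L do not share a group.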
Serine Recombinase Gene Editing Systems
[0049] Current gene editing systems lack the ability to integrate multi-kilobase nucleic acid sequences. Large serine recombinases (LSRs) are capable of integrating large fragments of DNA into a eukaryotic genome. Viral LSRs range from 400 to 700 amino acids in length and drive phage genome integration into a bacterial host genome when the virus enters its lysogenic life cycle. The mechanism for prophage integration involves the LSR recognizing a specific attachment site in the host genome, the attB site, and a phage attachment site, the attP site, on the phage genome. Viral genome integration occurs via recombination at these attachment sites, a process that leads to the generation of two new attachment sites, the attL and attR sites flanking the prophage. Serine recombinases described herein provide for genome engineering due to their ability to integrate a desired cargo into a specific target site.
Serine Recombinases
[0050] Described herein are gene editing systems comprising: a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115 or a nucleic acid encoding the serine recombinase. Further described herein are nucleic acids, vectors, and cells comprising a serine recombinase described herein. Further described herein are means for integrating nucleic acid sequences in a genome.
[0051] Serine recombinases are enzymes that catalyze site-specific recombination events by facilitating DNA strand exchanges between two DNA segments possessing cognate recombinase recognition sites. The serine recombinase family comprises, for example, the small serine recombinases gamma-delta resolvase (from the Tn1000 transposon) and Tn3 resolvase (from the Tn3 transposon), or the large serine recombinases (LSRs) φC31 integrase (from the φC31 phage), Bxb1 integrase (from the mycobacteriophage), and R4 integrase. Serine recombinases are characterized by a conserved catalytic serine amino acid residue that attacks the DNA phosphodiester and becomes covalently linked to a DNA strand end during catalysis. Serine recombinases recognize cognate attachment site sequences termed attB on the acceptor DNA strand (for example a bacterial genome) and attP on the donor DNA strand (for example the phage genome). After the recombination event, the attB and attP sites are recombined to form the attL and attR sites flanking the newly integrated sequence. attB and attP sites are typically up to about 50 bases long. During the recombination event, the serine recombinases form a tetrameric complex, with a protein dimer each attaching to an attB or attP attachment site. The serine recombinases cleave each strand, producing a double strand break and leaving a 2 bp overhang, and then exchange and ligate the strands. Typically, for serine recombinases no other enzymes are needed to perform the reaction.
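The outcome of the recombination event described above can be sketched at the sequence level. The following Python example is a simplified, single-strand model and not a description of any particular recombinase disclosed herein: each attachment site is represented as a left half-site, a 2 bp central crossover sequence (corresponding to the 2 bp overhang), and a right half-site, and a circular donor carrying attP is integrated into a linear target carrying attB. The class and function names, the toy sequences, and the requirement that the two central dinucleotides be identical are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttSite:
    """Attachment site modeled as left half-site + 2 bp core + right half-site."""
    left: str
    core: str
    right: str

    @property
    def sequence(self) -> str:
        return self.left + self.core + self.right


def integrate(target: str, att_b: AttSite, att_p: AttSite, cargo: str):
    """Integrate a circular donor (attP + cargo) into a linear target at attB.

    Returns (product, attL, attR). Assumes attB occurs exactly once in the
    target and that both sites share the same 2 bp core.
    """
    if len(att_b.core) != 2 or len(att_p.core) != 2:
        raise ValueError("core must be a 2 bp crossover sequence")
    if att_b.core != att_p.core:
        raise ValueError("attB and attP cores must match for strand exchange")
    if target.count(att_b.sequence) != 1:
        raise ValueError("target must contain exactly one attB site")

    start = target.find(att_b.sequence)
    upstream = target[:start]
    downstream = target[start + len(att_b.sequence):]

    # Strand exchange at the core produces two hybrid sites.
    att_l = AttSite(att_b.left, att_b.core, att_p.right)
    att_r = AttSite(att_p.left, att_p.core, att_b.right)

    # The linearized donor (cargo) ends up flanked by attL and attR.
    product = upstream + att_l.sequence + cargo + att_r.sequence + downstream
    return product, att_l, att_r


if __name__ == "__main__":
    # Hypothetical toy sequences, far shorter than real attachment sites.
    att_b = AttSite("GGCC", "GT", "TTAA")
    att_p = AttSite("ACAC", "GT", "CTCT")
    target = "TTTT" + att_b.sequence + "CCCC"
    product, att_l, att_r = integrate(target, att_b, att_p, cargo="ATGAAATAG")
    print(att_l.sequence, att_r.sequence)
    print(product)
```

In this model the product grows by exactly the length of the donor (attP plus cargo), and neither attB nor attP is regenerated, which is consistent with attL and attR being new sites flanking the integrated sequence.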
[0052] In some embodiments, the serine recombinase comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 70% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 75% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 80% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 85% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 90% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 95% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 96% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 97% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 98% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having at least about 99% identity to any one of SEQ ID NOs: 1-115. In some embodiments, the serine recombinase comprises a sequence having 100% identity to any one of SEQ ID NOs: 1-115.
[0053] Further described herein are eukaryotic cells comprising a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115. In some embodiments, the eukaryotic cell is a mammalian cell. In some embodiments, the eukaryotic cell is a human cell.
[0054] In some embodiments, the serine recombinases described herein comprise improved integration efficiency. In some embodiments, the serine recombinases described herein comprise an integration efficiency of at least about 5%. In some embodiments, the serine recombinases described herein comprise an integration efficiency of at least about 25%. In some embodiments, the serine recombinases described herein comprise an integration efficiency of at least about 50%. In some embodiments, the serine recombinases described herein comprise an integration efficiency of at least about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or more than 95%. In some embodiments, the serine recombinases described herein comprise an improved integration efficiency as compared to a serine recombinase selected from the group consisting of: β-six, CinH, ParA, γδ, Bxb1, φC31, TP901, TG1, φBT1, R4, φRV1, φFC1, MR11, A118, U153, and gp29.
[0055] In some embodiments, the serine recombinase is a viral, prokaryotic, or eukaryotic serine recombinase. In some embodiments, the serine recombinase is capable of targeting genes comprising a catalase domain or synthase domain. In some embodiments, the catalase is manganese catalase. In some embodiments, the synthase is Queuosine synthase. In some embodiments, the serine recombinase is capable of targeting genes comprising a DUF4244 Pfam domain.
[0056] In some embodiments, the serine recombinase described herein comprises one or more nuclear localization sequences (NLSs) proximal to an N- or C-terminus of the serine recombinase.
[0057] In some embodiments, the NLS comprises any of the sequences in Table 1 below, or a combination thereof:
[0058] In some embodiments, the serine recombinase comprises a tag. In some embodiments, the tag is an affinity tag. Exemplary affinity tags include, but are not limited to, a His-tag, a Flag tag, a Myc-tag, an MBP-tag, and a GST-tag.
[0059] In some embodiments, the serine recombinase comprises a protease cleavage site. Exemplary protease cleavage sites include, but are not limited to, a TEV site, a C3 site, a Factor Xa site, and an Enterokinase site.
Recombination Sites
[0060] Described herein are compositions comprising: a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115 or a nucleic acid encoding the serine recombinase; and a nucleic acid comprising a donor polynucleotide and a first attachment site sequence.
[0061] In some embodiments, the first attachment site sequence is 5’ of the donor polynucleotide.
[0062] In some embodiments, the nucleic acid encoding the serine recombinase further comprises a second attachment site sequence. In some embodiments, the second attachment site sequence is 5’ of the serine recombinase. In some embodiments, the nucleic acid encoding the serine recombinase comprises one or more attachment site sequences. In some embodiments, the nucleic acid encoding the serine recombinase comprises 1, 2, 3, 4, 5, or more than 5 attachment site sequences.
[0063] In some embodiments, the nucleic acid comprising a donor polynucleotide comprises one or more attachment site sequences. In some embodiments, the nucleic acid comprising a donor polynucleotide comprises 1, 2, 3, 4, 5, or more than 5 attachment site sequences.
[0064] In some embodiments, the first attachment site sequence and the second attachment site sequence are capable of recombination.
[0065] In some embodiments, the first attachment site sequence is a bacterial genomic recombination sequence (attB). In some embodiments, the attB sequence comprises about 20 to about 500 nucleotides. In some embodiments, the attB sequence comprises about 20 to about 450, about 20 to about 400, about 20 to about 350, about 20 to about 300, about 20 to about 250, about 20 to about 200, about 20 to about 150, about 20 to about 100, about 20 to about 50, about 50 to about 450, about 50 to about 400, about 50 to about 350, about 50 to about 300, about 50 to about 250, about 50 to about 200, about 50 to about 150, about 50 to about 100, about 100 to about 450, about 100 to about 400, about 100 to about 350, about 100 to about 300, about 100 to about 250, about 100 to about 200, or about 100 to about 150 nucleotides.
[0066] In some embodiments, the first attachment site sequence is a phage genomic recombination sequence (attP). In some embodiments, the attP sequence comprises about 20 to about 450, about 20 to about 400, about 20 to about 350, about 20 to about 300, about 20 to about 250, about 20 to about 200, about 20 to about 150, about 20 to about 100, about 20 to about 50, about 50 to about 450, about 50 to about 400, about 50 to about 350, about 50 to about 300, about 50 to about 250, about 50 to about 200, about 50 to about 150, about 50 to about 100, about 100 to about 450, about 100 to about 400, about 100 to about 350, about 100 to about 300, about 100 to about 250, about 100 to about 200, or about 100 to about 150 nucleotides.
[0067] In some embodiments, the second attachment site sequence is a bacterial genomic recombination sequence (attB). In some embodiments, the attB sequence comprises about 20 to about 500 nucleotides. In some embodiments, the attB sequence comprises about 20 to about 450, about 20 to about 400, about 20 to about 350, about 20 to about 300, about 20 to about 250, about 20 to about 200, about 20 to about 150, about 20 to about 100, about 20 to about 50, about 50 to about 450, about 50 to about 400, about 50 to about 350, about 50 to about 300, about 50 to about 250, about 50 to about 200, about 50 to about 150, about 50 to about 100, about 100 to about 450, about 100 to about 400, about 100 to about 350, about 100 to about 300, about 100 to about 250, about 100 to about 200, or about 100 to about 150 nucleotides.
[0068] In some embodiments, the second attachment site sequence is a phage genomic recombination sequence (attP). In some embodiments, the attP sequence comprises about 20 to about 450, about 20 to about 400, about 20 to about 350, about 20 to about 300, about 20 to about 250, about 20 to about 200, about 20 to about 150, about 20 to about 100, about 20 to about 50, about 50 to about 450, about 50 to about 400, about 50 to about 350, about 50 to about 300, about 50 to about 250, about 50 to about 200, about 50 to about 150, about 50 to about 100, about 100 to about 450, about 100 to about 400, about 100 to about 350, about 100 to about 300, about 100 to about 250, about 100 to about 200, or about 100 to about 150 nucleotides.
[0069] In some embodiments, the nucleic acid comprising the donor polynucleotide and the first attachment site sequence is delivered by a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid.
[0070] Described herein are eukaryotic genomes comprising a donor polynucleotide and an attL sequence 5’ to the donor polynucleotide sequence. In some embodiments, the eukaryotic genomes further comprise an attR sequence 3’ to the donor polynucleotide sequence.
[0071] Described herein are eukaryotic genomes comprising a donor polynucleotide sequence; and an attL sequence 3’ to the donor polynucleotide sequence. In some embodiments, the eukaryotic genomes further comprise an attR sequence 3’ to the donor polynucleotide sequence.
[0072] Described herein are eukaryotic genomes comprising a donor polynucleotide sequence and an attL sequence 5’ or 3’ to the donor polynucleotide sequence.
[0073] In some embodiments, the attL sequence and the attR sequence are the same.
[0074] In some embodiments, the attL sequence is a recombined sequence of a first attachment site sequence and a second attachment site sequence. In some embodiments, the attR sequence is a recombined sequence of a first attachment site sequence and a second attachment site sequence.
Donor Polynucleotides
[0075] Serine recombinases described herein can provide for integration of polynucleotides (e.g., donor polynucleotides) of large sizes. In some embodiments, the donor polynucleotide comprises a size of at least about 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, or more than 50 kb. In some embodiments, the donor polynucleotide comprises a size of at least about 15 kb, 20 kb, 25 kb, 30 kb, 35 kb, 50 kb, 100 kb, 200 kb, 300 kb, 400 kb, or 500 kb. In some embodiments, the donor polynucleotide comprises a size of about 200 base pairs (bp) to about 500 kb, 200 bp to about 250 kb, or 200 bp to about 100 kb. In some embodiments, the donor polynucleotide comprises a size of about 1 kb to about 10 kb, about 1 to about 7.5 kb, about 1 to about 5 kb, about 1 to about 3 kb, about 2 to about 10 kb, about 2 to about 7.5 kb, about 2 to about 5 kb, about 2 to about 3 kb, about 3 to about 10 kb, about 3 to about 7.5 kb, or about 3 to about 5 kb. In some embodiments, the donor polynucleotide comprises a size of about 10 kb to about 500 kb, 10 kb to about 400 kb, 10 kb to about 300 kb, 10 kb to about 200 kb, 10 kb to about 100 kb, about 10 kb to about 75 kb, about 10 kb to about 50 kb, about 10 kb to about 30 kb, about 20 kb to about 100 kb, about 20 to about 75 kb, about 20 kb to about 50 kb, about 20 kb to about 30 kb, about 30 kb to about 100 kb, about 30 kb to about 75 kb, or about 30 kb to about 50 kb. In some embodiments, the donor polynucleotide comprises a size of about 10 to about 500, 20 to about 400, 10 to about 300, 10 to about 200, or 10 to about 100. In some embodiments, the donor polynucleotide is circular. In some embodiments, the donor polynucleotide is linear.
[0076] In some embodiments, the donor polynucleotide encodes a therapeutic, a reporter, or a marker.
[0077] In some embodiments, the reporter comprises a fluorescent protein. In some embodiments, the fluorescent protein is GFP, EBFP, EBFP2, Azurite, mKalama1, ECFP, Cerulean, CyPet, YFP, Citrine, Venus, YPet, RFP, CFP, or a derivative thereof.
[0078] In some embodiments, the reporter is acetohydroxyacid synthase (AHAS), alkaline phosphatase (AP), beta galactosidase (LacZ), beta glucuronidase (GUS), chloramphenicol acetyltransferase (CAT), horseradish peroxidase (HRP), luciferase (Luc), nopaline synthase (NOS), octopine synthase (OCS), or a derivative thereof.
[0079] In some embodiments, the marker is an antibiotic resistance marker. In some embodiments, the antibiotic resistance marker is kanamycin, spectinomycin, streptomycin, ampicillin, carbenicillin, bleomycin, erythromycin, polymyxin B, tetracycline, chloramphenicol, neomycin, zeocin, or a derivative thereof.
[0080] In some embodiments, the marker is a cell surface marker. In some embodiments, the cell surface marker is a membrane protein, a sugar moiety, or a small molecule (for example biotin) presented on the cell surface. In some embodiments, the cell surface marker is a CD3, B2M, CD4, CD8, CD28, HLA proteins, MHC complex, streptavidin, or avidin. In some embodiments, the cell surface marker is an antibody for example an IgG, or an antibody fragment for example an scFv, or an Fc. In some embodiments, the cell surface marker is bound by a specific antibody. In some embodiments, the cell can be analyzed for expression of the cell surface marker by flow cytometry.
Delivery and Vectors
[0081] Disclosed herein, in some embodiments, are nucleic acid sequences encoding a serine recombinase or a serine recombinase gene editing system disclosed herein.
[0082] In some embodiments, the nucleic acid encoding the serine recombinase or the serine recombinase gene editing system is a DNA, for example a linear DNA, a plasmid DNA, or a minicircle DNA. In some embodiments, the nucleic acid is an RNA, for example a mRNA.
[0083] Described herein are vectors comprising: a) a nucleic acid encoding a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115; and b) one or more regulatory elements. Further described herein are vectors comprising a nucleic acid encoding a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115, wherein the vector is selected from the group consisting of: a plasmid, a nanoplasmid, a phagemid, a phage derivative, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), and a cosmid.
[0084] In some embodiments, the nucleic acid encoding the serine recombinase or the serine recombinase gene editing system is delivered by a nucleic acid-based vector. In some embodiments, the nucleic acid-based vector is a plasmid (e.g., circular DNA molecules that can autonomously replicate inside a cell), cosmid (e.g., pWE or sCos vectors), artificial chromosome, human artificial chromosome (HAC), yeast artificial chromosomes (YAC), bacterial artificial chromosome (BAC), P1-derived artificial chromosomes (PAC), phagemid, phage derivative, bacmid, or virus. In some embodiments, the nucleic acid-based vector is selected from the list consisting of: pSF-CMV-NEO-NH2-PPT-3XFLAG, pSF-CMV-NEO-COOH-3XFLAG, pSF-CMV-PURO-NH2-GST-TEV, pSF-OXB20-COOH-TEV-FLAG(R)-6His, pCEP4, pDEST27, pSF-CMV-Ub-KrYFP, pSF-CMV-FMDV-daGFP, pEF1a-mCherry-N1 vector, pEF1a-tdTomato vector, pSF-CMV-FMDV-Hygro, pSF-CMV-PGK-Puro, pMCP-tag(m), pSF-CMV-PURO-NH2-CMYC, pSF-OXB20-BetaGal, pSF-OXB20-Fluc, pSF-OXB20, pSF-Tac, pRI 101-AN DNA, pCambia2301, pTYB21, pKLAC2, pAc5.1/V5-His A, and pDEST8.
[0085] In some embodiments, the one or more regulatory elements comprises a promoter, an enhancer, an intron, a microRNA, a linker, a splicing element, or a polyA signal. In some embodiments, the promoter is selected from a constitutive promoter, an inducible promoter, a mini promoter, or a derivative thereof. In some embodiments, the promoter is selected from the group consisting of: CMV, CBA, EF1a, CAG, PGK, TRE, U6, UAS, T7, Sp6, lac, araBAD, trp, Ptac, p5, p19, p40, Synapsin, CaMKII, GRK1, polH, EM7, OpIE1, and a derivative thereof. In some embodiments, the promoter is a U6 promoter. In some embodiments, the promoter is a CAG promoter.
[0086] In some embodiments, the nucleic acid-based vector is a virus. In some embodiments, the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus. In some embodiments, the virus is an alphavirus. In some embodiments, the virus is a parvovirus. In some embodiments, the virus is an adenovirus. In some embodiments, the virus is an AAV. In some embodiments, the virus is a baculovirus. In some embodiments, the virus is a Dengue virus. In some embodiments, the virus is a lentivirus. In some embodiments, the virus is a herpesvirus. In some embodiments, the virus is a poxvirus. In some embodiments, the virus is an anellovirus. In some embodiments, the virus is a bocavirus. In some embodiments, the virus is a vaccinia virus. In some embodiments, the virus is a retrovirus.
[0087] In some embodiments, the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV-HSC12, AAV-HSC13, AAV-HSC14, AAV-HSC15, AAV-TT, AAV-DJ/8, AAV-Myo, AAV-NP40, AAV-NP59, AAV-NP22, AAV-NP66, AAV-HSC16, or a derivative thereof. In some embodiments, the herpesvirus is HSV type 1, HSV-2, VZV, EBV, CMV, HHV-6, HHV-7, or HHV-8.
[0088] In some embodiments, the virus is AAV1 or a derivative thereof. In some embodiments, the virus is AAV2 or a derivative thereof. In some embodiments, the virus is AAV3 or a derivative thereof. In some embodiments, the virus is AAV4 or a derivative thereof. In some embodiments, the virus is AAV5 or a derivative thereof. In some embodiments, the virus is AAV6 or a derivative thereof. In some embodiments, the virus is AAV7 or a derivative thereof. In some embodiments, the virus is AAV8 or a derivative thereof. In some embodiments, the virus is AAV9 or a derivative thereof. In some embodiments, the virus is AAV10 or a derivative thereof. In some embodiments, the virus is AAV11 or a derivative thereof. In some embodiments, the virus is AAV12 or a derivative thereof. In some embodiments, the virus is AAV13 or a derivative thereof. In some embodiments, the virus is AAV14 or a derivative thereof. In some embodiments, the virus is AAV15 or a derivative thereof. In some embodiments, the virus is AAV16 or a derivative thereof. In some embodiments, the virus is AAV-rh8 or a derivative thereof. In some embodiments, the virus is AAV-rh10 or a derivative thereof. In some embodiments, the virus is AAV-rh20 or a derivative thereof. In some embodiments, the virus is AAV-rh39 or a derivative thereof. In some embodiments, the virus is AAV-rh74 or a derivative thereof. In some embodiments, the virus is AAV-rhM4-1 or a derivative thereof. In some embodiments, the virus is AAV-hu37 or a derivative thereof. In some embodiments, the virus is AAV-Anc80 or a derivative thereof. In some embodiments, the virus is AAV-Anc80L65 or a derivative thereof. In some embodiments, the virus is AAV-7m8 or a derivative thereof. In some embodiments, the virus is AAV-PHP-B or a derivative thereof. In some embodiments, the virus is AAV-PHP-EB or a derivative thereof. In some embodiments, the virus is AAV-2.5 or a derivative thereof. In some embodiments, the virus is AAV-2tYF or a derivative thereof. In some embodiments, the virus is AAV-3B or a derivative thereof. In some embodiments, the virus is AAV-LK03 or a derivative thereof. In some embodiments, the virus is AAV-HSC1 or a derivative thereof. In some embodiments, the virus is AAV-HSC2 or a derivative thereof. In some embodiments, the virus is AAV-HSC3 or a derivative thereof. In some embodiments, the virus is AAV-HSC4 or a derivative thereof. In some embodiments, the virus is AAV-HSC5 or a derivative thereof. In some embodiments, the virus is AAV-HSC6 or a derivative thereof. In some embodiments, the virus is AAV-HSC7 or a derivative thereof. In some embodiments, the virus is AAV-HSC8 or a derivative thereof. In some embodiments, the virus is AAV-HSC9 or a derivative thereof. In some embodiments, the virus is AAV-HSC10 or a derivative thereof. In some embodiments, the virus is AAV-HSC11 or a derivative thereof. In some embodiments, the virus is AAV-HSC12 or a derivative thereof. In some embodiments, the virus is AAV-HSC13 or a derivative thereof. In some embodiments, the virus is AAV-HSC14 or a derivative thereof. In some embodiments, the virus is AAV-HSC15 or a derivative thereof. In some embodiments, the virus is AAV-TT or a derivative thereof. In some embodiments, the virus is AAV-DJ/8 or a derivative thereof. In some embodiments, the virus is AAV-Myo or a derivative thereof. In some embodiments, the virus is AAV-NP40 or a derivative thereof. In some embodiments, the virus is AAV-NP59 or a derivative thereof. In some embodiments, the virus is AAV-NP22 or a derivative thereof. In some embodiments, the virus is AAV-NP66 or a derivative thereof. In some embodiments, the virus is AAV-HSC16 or a derivative thereof.
[0089] In some embodiments, the virus is HSV-1 or a derivative thereof. In some embodiments, the virus is HSV-2 or a derivative thereof. In some embodiments, the virus is VZV or a derivative thereof. In some embodiments, the virus is EBV or a derivative thereof. In some embodiments, the virus is CMV or a derivative thereof. In some embodiments, the virus is HHV-6 or a derivative thereof. In some embodiments, the virus is HHV-7 or a derivative thereof. In some embodiments, the virus is HHV-8 or a derivative thereof.
[0090] In some embodiments, the nucleic acid encoding the serine recombinase or a serine recombinase gene editing system is delivered by a non-nucleic acid-based delivery system (e.g., a non-viral delivery system). In some embodiments, the non-viral delivery system is a liposome. In some embodiments, the nucleic acid is associated with a lipid. The nucleic acid associated with a lipid, in some embodiments, is encapsulated in the aqueous interior of a liposome, interspersed within the lipid bilayer of a liposome, attached to a liposome via a linking molecule that is associated with both the liposome and the nucleic acid, entrapped in a liposome, complexed with a liposome, dispersed in a solution containing a lipid, mixed with a lipid, combined with a lipid, contained as a suspension in a lipid, contained or complexed with a micelle, or otherwise associated with a lipid. In some embodiments, the nucleic acid is comprised in a lipid nanoparticle (LNP).
[0091] In some embodiments, the serine recombinase or the serine recombinase gene editing system is introduced into the cell in any suitable way, either stably or transiently. In some embodiments, the serine recombinase or the serine recombinase gene editing system is transfected into the cell. In some embodiments, the cell is transduced or transfected with a nucleic acid construct that encodes the serine recombinase or the serine recombinase gene editing system. For example, a cell is transduced (e.g., with a virus encoding the serine recombinase or the serine recombinase gene editing system), or transfected (e.g., with a plasmid encoding the serine recombinase or the serine recombinase gene editing system) with a nucleic acid that encodes the serine recombinase or the serine recombinase gene editing system, or with the translated serine recombinase or serine recombinase gene editing system. In some embodiments, the transduction is a stable or transient transduction. In some embodiments, a plasmid expressing the serine recombinase or the serine recombinase gene editing system is introduced into cells through electroporation, transient transfection (e.g., lipofection), stable genome integration (e.g., piggyBac), viral transduction (for example, lentivirus or AAV), or other methods known to those of skill in the art. In some embodiments, the gene editing system is introduced into the cell as one or more polypeptides. In some embodiments, delivery is achieved through the use of RNP complexes. Delivery methods to cells for polypeptides and/or RNPs are known in the art, for example by electroporation or by cell squeezing.
[0092] Exemplary methods of delivery of nucleic acids include lipofection, nucleofection, electroporation, stable genome integration (e.g., piggyBac), microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™, Lipofectin™ and SF Cell Line 4D-Nucleofector X Kit™ (Lonza)). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of WO 91/17424 and WO 91/16024. In some embodiments, the delivery is to cells (e.g., in vitro or ex vivo administration) or target tissues (e.g., in vivo administration). In some embodiments, the nucleic acid is comprised in a liposome or a nanoparticle that specifically targets a host cell.
[0093] Additional methods for the delivery of nucleic acids to cells are known to those skilled in the art. See, for example, US 2003/0087817.
[0094] In some embodiments, delivery of the serine recombinase or the serine recombinase gene editing system to the target nucleic acid site comprises delivering a nucleic acid comprising an open reading frame encoding the serine recombinase or the serine recombinase gene editing system. In some embodiments, the nucleic acid comprises a promoter. In some embodiments, the open reading frame encoding the serine recombinase or the serine recombinase gene editing system is operably linked to the promoter. In some embodiments, the promoter is a ribonucleic acid (RNA) pol III promoter.
[0095] In some embodiments, delivery of the serine recombinase or the serine recombinase gene editing system to the target nucleic acid site comprises delivering a capped mRNA containing the open reading frame encoding the serine recombinase or the serine recombinase gene editing system. In some embodiments, delivery of the serine recombinase or the serine recombinase gene editing system to the target nucleic acid site comprises delivering a translated polypeptide. In some embodiments, delivery of the serine recombinase or the serine recombinase gene editing system to the target nucleic acid site comprises delivering a deoxyribonucleic acid (DNA) encoding the serine recombinase or the serine recombinase gene editing system operably linked to a ribonucleic acid (RNA) pol III promoter.
Lipid nanoparticles
[0096] Disclosed herein, in certain embodiments, are lipid nanoparticles comprising the serine recombinase or the serine recombinase gene editing system of the disclosure for delivery into a cell.
[0097] In some embodiments, the lipid nanoparticle comprises the serine recombinase or the serine recombinase gene editing system or a nucleic acid encoding the serine recombinase or the serine recombinase gene editing system. In some embodiments, the lipid nanoparticle comprises the one or more components of the serine recombinase gene editing system. In some embodiments, the lipid nanoparticle comprises the serine recombinase or a nucleic acid encoding the serine recombinase. In some embodiments, the lipid nanoparticle comprises the donor polynucleotide.
[0098] In some embodiments, the lipid nanoparticle is tethered to the serine recombinase gene editing system.
[0099] Lipid nanoparticles as described herein can be 4-component lipid nanoparticles. Such nanoparticles can be configured for delivery of RNA or other nucleic acids (e.g., synthetic RNA, mRNA, or in vitro-synthesized mRNA) and can be generally formulated as described in WO2012135805A2. Such nanoparticles can generally comprise: (a) a cationic lipid, (b) a neutral lipid (e.g., DSPC or DOPE), (c) a sterol (e.g., cholesterol or a cholesterol analog), or (d) a PEG-modified lipid (e.g., PEG-DMG).
[0100] The cationic lipid referred to herein as “C12-200” is disclosed by Love et al., Proc Natl Acad Sci USA. 2010 107:1864-1869 and Liu and Huang, Molecular Therapy. 2010 669-670. Cationic lipid formulations can include particles comprising either 3 or 4 or more components in addition to polynucleotide, primary construct, or RNA (e.g., mRNA). As an example, formulations with certain cationic lipids include, but are not limited to, 98N12-5 and may contain 42% lipidoid, 48% cholesterol and 10% PEG (C14 or greater alkyl chain length). As another example, formulations with certain lipidoids include, but are not limited to, C12-200 and may contain 50% cationic lipid, 10% distearoylphosphatidylcholine, 38.5% cholesterol, and 1.5% PEG-DMG.
[0101] In some embodiments, the cationic lipid nanoparticle comprises a cationic lipid, a PEG-modified lipid, a sterol, and a non-cationic lipid. In some embodiments, the cationic lipid nanoparticle has a molar ratio of about 20-60% cationic lipid: about 5-25% non-cationic lipid: about 25-55% sterol; and about 0.5-15% PEG-modified lipid. In some embodiments, the cationic lipid nanoparticle comprises a molar ratio of about 50% cationic lipid, about 1.5% PEG-modified lipid, about 38.5% cholesterol, and about 10% non-cationic lipid. In some embodiments, the cationic lipid nanoparticle comprises a molar ratio of about 55% cationic lipid, about 2.5% PEG-modified lipid, about 32.5% cholesterol, and about 10% non-cationic lipid. In some embodiments, the cationic lipid is an ionizable cationic lipid, the non-cationic lipid is a neutral lipid, and the sterol is a cholesterol. In some embodiments, the cationic lipid nanoparticle has a molar ratio of 50:38.5:10:1.5 of cationic lipid: cholesterol: PEG2000-DMG:DSPC or DMG:DOPE. In some embodiments, lipid nanoparticles as described herein can comprise cholesterol, 1,2-dioleoyl-sn-glycero-3-phosphoethanolamine (DOPE), 1,1'-((2-(4-(2-((2-(bis(2-hydroxydodecyl)amino)ethyl)(2-hydroxydodecyl)amino)ethyl)piperazin-1-yl)ethyl)azanediyl)bis(dodecan-2-ol) (C12-200), and DMG-PEG-2000 at molar ratios of 47.5:16:35:1.5.
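As an illustration only, the molar-percent ranges recited in paragraph [0101] can be checked programmatically. The sketch below is not part of the disclosed formulations: the function name, dictionary keys, and tolerance are hypothetical, and the ranges are treated as hard limits even though the text states them as "about" values.

```python
# Molar-percent ranges recited in paragraph [0101], treated here as hard limits
# (the text says "about", so a real check would need some tolerance policy).
RANGES = {
    "cationic_lipid": (20.0, 60.0),
    "non_cationic_lipid": (5.0, 25.0),
    "sterol": (25.0, 55.0),
    "peg_lipid": (0.5, 15.0),
}


def check_formulation(mol_percent, tol=1e-6):
    """Return a list of problems; an empty list means the formulation fits."""
    problems = []
    total = sum(mol_percent.values())
    if abs(total - 100.0) > tol:
        problems.append(f"total is {total:g} mol%, expected 100")
    for name, (low, high) in RANGES.items():
        value = mol_percent.get(name)
        if value is None:
            problems.append(f"missing component: {name}")
        elif not (low <= value <= high):
            problems.append(f"{name} = {value:g} mol% is outside {low:g}-{high:g}")
    return problems


# The two example compositions given in paragraph [0101].
example_a = {"cationic_lipid": 50.0, "peg_lipid": 1.5,
             "sterol": 38.5, "non_cationic_lipid": 10.0}
example_b = {"cationic_lipid": 55.0, "peg_lipid": 2.5,
             "sterol": 32.5, "non_cationic_lipid": 10.0}

for label, formulation in (("A", example_a), ("B", example_b)):
    print(label, check_formulation(formulation) or "within the recited ranges")
```

Both example compositions sum to 100 mol% and fall inside each recited range, which is the only property this sketch verifies.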
Methods for Gene Editing
[0102] Described herein, in some embodiments, are methods for gene editing, comprising: a) providing or identifying a first attachment site sequence in a host genome; b) providing a nucleic acid comprising a donor polynucleotide and a second attachment site sequence to a host cell; and c) contacting the host cell with a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115 or a nucleic acid encoding the serine recombinase, wherein the first attachment site sequence and the second attachment site sequence are capable of recombination.
[0103] In some embodiments, the first attachment site sequence is endogenous in the host genome.
[0104] In some embodiments, the first attachment site sequence is provided using viral delivery. In some embodiments, viral delivery comprises use of a virus, wherein the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus. In some embodiments, the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV-HSC12, AAV-HSC13, AAV-HSC14, AAV-HSC15, AAV-TT, AAV-DJ/8, AAV-Myo, AAV-NP40, AAV-NP59, AAV-NP22, AAV-NP66, AAV-HSC16, or a derivative thereof. In some embodiments, the herpesvirus is HSV type 1, HSV-2, VZV, EBV, CMV, HHV-6, HHV-7, or HHV-8.
[0105] In some embodiments, the first attachment site sequence is provided using a transposase. In some embodiments, the transposase is transposase (Tnp) Tn5, Sleeping Beauty transposase, or a Tn7 transposon. In some embodiments, the gene editing system comprises an enzyme with transposase activity. Additional enzymes with transposase activity include, but are not limited to, retrons and IS200/IS605 transposons.
[0106] In some embodiments, the first attachment site sequence is provided using a nuclease. In some embodiments, the nuclease is a double-strand nuclease.
[0107] In some embodiments, the nuclease is a Type II CRISPR endonuclease. In some embodiments, the nuclease is Cas9. Type II CRISPR systems are considered the simplest in terms of components. In Type II CRISPR systems, the processing of the CRISPR array into mature crRNAs does not require the presence of a special endonuclease subunit, but rather a small trans-encoded crRNA (tracrRNA) with a region complementary to the array repeat sequence; the tracrRNA interacts with both its corresponding effector nuclease (e.g., Cas9) and the repeat sequence to form a precursor dsRNA structure, which is cleaved by endogenous RNase III to generate a mature effector enzyme loaded with both tracrRNA and crRNA. Type II nucleases are known as DNA nucleases. Type II nucleases generally exhibit a structure consisting of a RuvC-like endonuclease domain that adopts the RNase H fold with an unrelated HNH nuclease domain inserted within the folds of the RuvC-like nuclease domain. The HNH domain is responsible for cleavage of the target (e.g., crRNA-complementary) DNA strand, while the RuvC-like domain is responsible for cleavage of the displaced DNA strand. Exemplary CRISPR Cas9 proteins include, but are not limited to, Cas9 from Streptococcus pyogenes (UniProtKB - Q99ZW2 (CAS9_STRP1)), Streptococcus thermophilus (UniProtKB - G3ECR1 (CAS9_STRTR)), Staphylococcus aureus (UniProtKB - J7RUA5 (CAS9_STAAU)), Campylobacter jejuni (UniProtKB - Q0P897 (CAS9_CAMJE)), Campylobacter lari (UniProtKB - A0A0A8HTA3 (A0A0A8HTA3_CAMLA)), Helicobacter canadensis (UniProtKB - C5ZYI3 (C5ZYI3_9HELI)), and Francisella tularensis subsp. novicida (UniProtKB - A0Q5Y3 (CAS9_FRATN)). Additional Type II nucleases are described in International Patent Application Publication Nos. WO 2021/226363, WO 2022/159758, and WO 2022/056324.
[0108] In some embodiments, the nuclease is a CRISPR nuclease. In some embodiments, the CRISPR nuclease is a Class 2 Type II SpCas9 or a Class 2 Type V-A Cas12a (previously Cpf1). In some embodiments, the Type V-A nuclease has a guide RNA of 42-44 nucleotides compared with approximately 100 nt for SpCas9. In some embodiments, the Type V-A nuclease results in staggered cut sites. In some embodiments, the Type V-A nuclease results in staggered cut sites to facilitate directed repair pathways, such as microhomology-dependent targeted integration (MITI).
[0109] In some embodiments, the nuclease is a Type V CRISPR endonuclease. Type V CRISPR systems are characterized by a nuclease effector (e.g., Cas12) structure similar to that of Type II effectors, comprising a RuvC-like domain. Similar to Type II, most (but not all) Type V CRISPR systems use a tracrRNA to process pre-crRNAs into mature crRNAs; however, unlike Type II systems, which require RNase III to cleave the pre-crRNA into multiple crRNAs, Type V systems are capable of using the effector nuclease itself to cleave pre-crRNAs. Like Type II CRISPR systems, Type V CRISPR systems are known as DNA nucleases. Unlike Type II CRISPR systems, some Type V enzymes (e.g., Cas12a) appear to have a robust single-stranded nonspecific deoxyribonuclease activity that is activated by the first crRNA-directed cleavage of a double-stranded target sequence.
[0110] The most commonly used Type V-A enzymes require a 5’ protospacer adjacent motif (PAM) next to the chosen target site: 5’-TTTV-3’ for Lachnospiraceae bacterium ND2006 LbCas12a and Acidaminococcus sp. AsCas12a; and 5’-TTV-3’ for Francisella novicida FnCas12a. In some embodiments, the PAM sequence is YTV, YYN, or TTN. Additional Type II nucleases are described in International Patent Application Publication WO 2021/226363.
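The PAM motifs in paragraph [0110] use IUPAC ambiguity codes (V = A, C, or G; Y = C or T; N = any base). A minimal sketch of how candidate target sites with a 5’ PAM might be located is given below. The function names, the toy sequence, and the protospacer length are hypothetical placeholders and are not taken from the disclosure; only the strand supplied is scanned, so a real search would also need the reverse complement.

```python
import re

# IUPAC codes needed for the PAMs named in paragraph [0110].
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "V": "[ACG]",   # not T
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}


def pam_regex(pam):
    """Translate an IUPAC PAM such as 'TTTV' into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_targets(seq, pam, protospacer_len=20):
    """Return (pam_start, pam_seq, protospacer) for every 5' PAM on this strand.

    The PAM sits immediately 5' of the protospacer, as for Type V-A enzymes.
    protospacer_len is an illustrative placeholder value.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(
        f"(?=({pam_regex(pam)})([ACGT]{{{protospacer_len}}}))"
    )
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy sequence, invented for the example.
toy = "GGTTTACCCCCCCCCCGG"
print(find_targets(toy, "TTTV", protospacer_len=10))
print(find_targets(toy, "TTV", protospacer_len=10))
```

The lookahead-based pattern is a design choice that keeps overlapping PAMs (for example, within a run of T residues) from masking one another.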
[0111] In some embodiments, the first attachment site sequence is provided using a reverse transcriptase. Reverse transcription is the synthesis of a complementary DNA from an RNA template. Reverse transcription is performed by enzymes termed reverse transcriptases (RTs), which have RNA-dependent DNA polymerase activity and create the complementary DNA (cDNA) strand from an RNA template. Some RT enzymes also have DNA-dependent DNA polymerase activity to create a double-stranded DNA (dsDNA). Reverse transcriptases can be of viral origin (for example HIV, hepatitis B, Moloney murine leukemia virus (MMLV), or avian myeloblastosis virus (AMV)) or bacterial origin (for example group II introns, retrons/retron-like RTs, diversity-generating retroelements (DGRs), Abi-like RTs, CRISPR-associated RTs, and group II-like RTs (G2L)). Reverse transcriptases of eukaryotic origin comprise the telomerase reverse transcriptase that maintains the telomeres of eukaryotic chromosomes. Reverse transcription allows the introduction of site-directed insertions, deletions, and mutations into the cDNA by encoding them in the RNA template.
[0112] In some embodiments, the reverse transcriptase is a viral, prokaryotic, or eukaryotic reverse transcriptase. In some embodiments, the reverse transcriptase is an MG151, MG153, or MG160 family reverse transcriptase. In some embodiments, the reverse transcriptase is an MG140, MG146, MG148, MG149, MG151, MG153, MG154, MG155, MG156, MG157, MG158, MG159, MG160, MG163, MG164, MG165, MG166, MG167, MG168, MG169, MG170, or MG176 family reverse transcriptase. In some embodiments, the reverse transcriptase comprises a sequence with at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to any one of the MG140, MG146, MG148, MG149, MG151, MG153, MG154, MG155, MG156, MG157, MG158, MG159, MG160, MG163, MG164, MG165, MG166, MG167, MG168, MG169, MG170, MG172, MG173, or MG176 family reverse transcriptases or retrotransposases. In some embodiments, the reverse transcriptase comprises a sequence with at least 80% sequence identity to any one of the MG140, MG146, MG148, MG149, MG151, MG153, MG154, MG155, MG156, MG157, MG158, MG159, MG160, MG163, MG164, MG165, MG166, MG167, MG168, MG169, MG170, MG172, MG173, or MG176 family reverse transcriptases or retrotransposases or variants thereof. In some embodiments, the reverse transcriptase is smaller than 300 amino acids. In some embodiments, the reverse transcriptase is smaller than 250 amino acids.
[0113] In some embodiments, the methods are used to introduce a modification in the genome of a cell. In some embodiments, the modification is an insertion, deletion, or mutation. In some embodiments, the methods are used to introduce site-directed insertions, deletions, and/or mutations in the genome of a cell (for example an insertion and a mutation). In some embodiments, the methods are used in combination with a nucleic acid template to facilitate site-directed insertions into the genome of a cell. In some embodiments, the cell is a human cell. In some embodiments, the cell genome or a vector comprised in the cell is modified. In some embodiments, the cell genome is modified ex vivo. In some embodiments, the cell genome is modified in vivo.
[0114] In some embodiments, the methods described herein further comprise detecting the genome modifications. In some embodiments, after the cell genome is modified, the cell is cultured for a certain amount of time. In some embodiments, the DNA or RNA is extracted and sequenced, and modified sequence areas are mapped and compared with an unmodified sequence. In some embodiments, cells are stained with antibodies for protein products that are translated from the modified nucleic acid, and the resulting stained proteins or polypeptides in the cell are analyzed, for example by flow cytometry.
Cells
[0115] Described herein, in certain embodiments, is a cell comprising the serine recombinase or the serine recombinase system described herein. In some embodiments, the cell (e.g., mammalian cell) comprises the eukaryotic genome described herein. In some embodiments, the cell is a human cell.
[0116] In some embodiments, the cell is a eukaryotic cell (e.g., a plant cell, an animal cell, a protist cell, or a fungal cell), a mammalian cell (e.g., a Chinese hamster ovary (CHO) cell, a baby hamster kidney (BHK) cell, a human embryonic kidney (HEK) cell, a mouse myeloma (NS0) cell, or a human retinal cell), an immortalized cell (e.g., a HeLa cell, a COS cell, a HEK-293T cell, an MDCK cell, a 3T3 cell, a PC12 cell, a Huh7 cell, a HepG2 cell, a K562 cell, an N2a cell, or an SY5Y cell), an insect cell (e.g., a Spodoptera frugiperda cell, a Trichoplusia ni cell, a Drosophila melanogaster cell, an S2 cell, or a Heliothis virescens cell), a yeast cell (e.g., a Saccharomyces cerevisiae cell, a Cryptococcus cell, or a Candida cell), a plant cell (e.g., a parenchyma cell, a collenchyma cell, or a sclerenchyma cell), a fungal cell (e.g., a Saccharomyces cerevisiae cell, a Cryptococcus cell, or a Candida cell), or a prokaryotic cell (e.g., an E. coli cell, a Streptococcus cell, a Streptomyces cell, or an archaeal cell). In some embodiments, the cell is a eukaryotic cell. In some embodiments, the cell is a mammalian cell. In some embodiments, the cell is an immortalized cell. In some embodiments, the cell is an insect cell. In some embodiments, the cell is a yeast cell. In some embodiments, the cell is a plant cell. In some embodiments, the cell is a fungal cell. In some embodiments, the cell is a prokaryotic cell.
[0117] In some embodiments, the cell is an A549, HEK-293, HEK-293T, BHK, CHO, HeLa, MRC5, Sf9, Cos-1, Cos-7, Vero, BSC 1, BSC 40, BMT 10, WI38, Saos, C2C12, L cell, HT1080, HepG2, Huh7, K562, a primary cell, or a derivative thereof.
[0118] In some embodiments, the cell is a liver cell.
Kits
[0119] In some embodiments, this disclosure provides kits comprising one or more nucleic acid constructs encoding the various components of the serine recombinases described herein, e.g., comprising a nucleotide sequence encoding the components of the serine recombinases capable of modifying a target DNA sequence.
[0120] In some embodiments, any of the serine recombinases disclosed herein is assembled into a pharmaceutical, diagnostic, or research kit to facilitate its use in therapeutic, diagnostic, or research applications. A kit may include one or more containers housing any of the vectors disclosed herein and instructions for use.
[0121] The kit may be designed to facilitate use of the methods described herein by researchers and can take many forms. Each of the compositions of the kit, where applicable, may be provided in liquid form (e.g., in solution), or in solid form, (e.g., a dry powder). In some embodiments, the compositions are constitutable or otherwise processable (e.g., to an active form), for example, by the addition of a suitable solvent or other species (for example, water or a cell culture medium), which may or may not be provided with the kit. As used herein, "instructions" can define a component of instruction and/or promotion, and typically involve written instructions on or associated with packaging of the disclosure. Instructions also can include any oral or electronic instructions provided in any manner such that a user will clearly recognize that the instructions are to be associated with the kit, for example, audiovisual (e.g., videotape, DVD, etc.), Internet, and/or web-based communications, etc. The written instructions, in some embodiments, are in a form prescribed by a governmental agency regulating the manufacture, use, or sale of pharmaceuticals or biological products, which instructions can also reflect approval by the agency of manufacture, use, or sale for animal administration.
EXAMPLES
[0122] The following examples are given for the purpose of illustrating various embodiments of the disclosure and are not meant to limit the present disclosure in any fashion. The present examples, along with the methods described herein, are presently representative of preferred embodiments, are exemplary, and are not intended as limitations on the scope of the disclosure. Changes therein and other uses which are encompassed within the spirit of the disclosure as defined by the scope of the claims will occur to those skilled in the art.
Example 1. Bioinformatic Identification of Large Serine Recombinases
[0123] This example describes the identification of proteins with large serine recombinase function by a bioinformatic approach.
[0124] Putative large serine recombinases (LSRs) were identified in an extensive database of viral, prokaryotic, and eukaryotic proteins. The search resulted in 163,797 non-partial homologs with a score > 50. LSRs were further filtered by requiring contigs to have a 1 kbp flank on either side of the LSR, and dereplicated at 90% average amino acid identity (AAI). After dereplication, 8,364 LSRs were globally aligned, and a phylogenetic tree was constructed. Closely related contigs lacking the LSRs were identified by searching for contigs containing the two genes flanking approximate proviral boundaries identified. To ensure the contigs were from a closely related strain, local alignments were performed requiring the two genes to share > 99% AAI. Precise proviral boundaries were identified by locally aligning the contigs containing and lacking the LSRs at the nucleotide level. Once integration boundaries were delineated, the attL and attR sites flanking the prophage, as well as the attachment sites’ common core, were identified by searching for imperfect repeats near the boundaries.
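The first filtering steps of paragraph [0124] (non-partial homologs, score above 50, and a 1 kbp flank on either side of the LSR) can be sketched as follows. This is a minimal illustration and not the pipeline actually used: the `Hit` record, its field names, the example values, and the 0-based half-open coordinate convention are assumptions, and the scoring scheme behind the cutoff is not specified in the text. Dereplication at 90% AAI is not shown.

```python
from dataclasses import dataclass

MIN_SCORE = 50        # hits must score above this cutoff
MIN_FLANK_BP = 1000   # contig sequence required on each side of the LSR gene


@dataclass
class Hit:
    """Hypothetical record for one candidate LSR on one contig."""
    name: str
    score: float
    is_partial: bool
    gene_start: int      # 0-based, inclusive
    gene_end: int        # 0-based, exclusive
    contig_length: int


def passes_filters(hit: Hit) -> bool:
    """Apply the non-partial, score, and flank criteria to one hit."""
    left_flank = hit.gene_start
    right_flank = hit.contig_length - hit.gene_end
    return (
        not hit.is_partial
        and hit.score > MIN_SCORE
        and left_flank >= MIN_FLANK_BP
        and right_flank >= MIN_FLANK_BP
    )


# Invented example hits.
hits = [
    Hit("lsr_a", 120.0, False, 2500, 4000, 9000),
    Hit("lsr_b", 45.0, False, 2500, 4000, 9000),   # score too low
    Hit("lsr_c", 300.0, False, 400, 1900, 9000),   # left flank too short
    Hit("lsr_d", 88.0, True, 2500, 4000, 9000),    # partial gene
]
kept = [h.name for h in hits if passes_filters(h)]
print(kept)
```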
[0125] LSR candidates were identified based on the presence of resolvase, recombinase, and Zn-finger domains, as well as catalytic residues required for activity (FIG. 1). Phylogenetic analysis of LSR candidates indicated that these enzymes are encoded in highly diverse genomes, and prophage boundaries were predicted for many (FIG. 1). Prophage genomes mobilized by LSR reached nearly 94 kb in length.
Example 2. Prophetic - In silico prediction of LSR attachment sites
[0126] Prophage boundaries are identified by aligning the contigs containing the LSR with highly similar contig sequences lacking the LSRs, which likely represent the host without the integration event. With integration boundaries delineated, the attachment site’s common cores are identified by searching for repeats near the boundaries.
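The repeat search described in paragraph [0126] can be approximated, for illustration, by finding the longest sequence shared between the windows around the two predicted boundaries. The sketch below is a simplification under stated assumptions: it finds only exact shared substrings, whereas imperfect repeats would require an alignment that tolerates mismatches; the function name and the toy sequences are hypothetical.

```python
def longest_shared_repeat(left_window, right_window):
    """Longest exact substring present in both boundary windows.

    Standard dynamic-programming longest-common-substring; ties are resolved
    in favor of the first repeat found in left_window.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(right_window) + 1)
    for i in range(1, len(left_window) + 1):
        curr = [0] * (len(right_window) + 1)
        for j in range(1, len(right_window) + 1):
            if left_window[i - 1] == right_window[j - 1]:
                # Extend the match that ended at the previous pair of bases.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return left_window[best_end - best_len:best_end]


# Toy windows around the left and right prophage boundaries (invented).
left_boundary = "ACGTTTGTACCGA"
right_boundary = "GGATTGTACCTTC"
print(longest_shared_repeat(left_boundary, right_boundary))
```

In this toy case the shared repeat would be the candidate common core; short chance matches are likely in real data, so a minimum length threshold would normally be applied.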
Example 3. Prophetic - In vitro assay of serine recombinase activity
[0127] In vitro recombination reactions
[0128] To test the functionality of the serine recombinases, the attP and attB sites are derived bioinformatically from the attL, attR, and common core sequences of the native integrated prophage genomic context and tested in in vitro recombination reactions. The attB and attP sites are synthesized in gene fragments ~300 bp in length with primer binding sites unique to each attachment site end (FIG. 2C). Serine recombinases are expressed in vitro, while negative controls include in vitro expression reactions without template (null) (FIG. 2A). Negative recombination reaction controls are set up in 10 µL reactions using 50 ng of attB, 50 ng of attP, recombination buffer (20 mM HEPES pH 7.5, 50 µg/mL bovine serum albumin (BSA), 2 mM TCEP, 5 mM MgCl2, 100 mM KCl, 5 mM spermidine, 2 mM ZnCl2, and 5% glycerol) and 1 µL null reaction (no recombinase template). Experimental conditions include 50 ng of attB, 50 ng of attP, and 1 µL of in vitro-expressed recombinase (FIG. 2B). Recombination reactions are incubated at 30 °C for 1 hour and diluted with water at 1:10. PCR reactions are then performed with attL- (attB5 and attP3) or attR- (attB3 and attP5) specific primer sets (FIG. 2C) and analyzed on a 2% agarose gel to determine amplification and size of resulting products. Product-forming reactions are Sanger sequenced and aligned to the predicted attL and attR sequences determined bioinformatically.
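The relationship between attL/attR and attB/attP implied by the primer sets above (attL spans the attB 5’ end and the attP 3’ end; attR spans the attP 5’ end and the attB 3’ end) can be sketched as a simple arm swap around the common core. This is an illustrative sketch only: the function names and toy sequences are hypothetical, both sites are assumed to be given on the same strand, and the core is assumed to occur exactly once in each site.

```python
def split_at_core(site, core):
    """Split a site into (left_arm, right_arm) around a unique core."""
    idx = site.find(core)
    if idx < 0:
        raise ValueError("core not found in site")
    if site.find(core, idx + 1) >= 0:
        raise ValueError("core occurs more than once; split is ambiguous")
    return site[:idx], site[idx + len(core):]


def reconstruct_attb_attp(att_l, att_r, core):
    """Rebuild attB and attP from attL, attR, and the shared core.

    Assumed layout: attL = attB 5' arm + core + attP 3' arm
                    attR = attP 5' arm + core + attB 3' arm
    """
    b_left, p_right = split_at_core(att_l, core)
    p_left, b_right = split_at_core(att_r, core)
    att_b = b_left + core + b_right
    att_p = p_left + core + p_right
    return att_b, att_p


# Toy sequences, invented for the example (real sites are far longer).
att_l = "AAACGGCACA"
att_r = "TATAGGCTTT"
att_b, att_p = reconstruct_attb_attp(att_l, att_r, core="GG")
print(att_b, att_p)
```

Recombining the reconstructed attB and attP at the core should regenerate the attL and attR inputs, which gives a quick self-consistency check before synthesis.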
Example 4. Prophetic - In cell plasmid recombination
[0129] Recombinases are tested for their activity in human cells by synthesizing the attP fragment into a target plasmid (pTarget), with the attP site upstream of a promoterless mCherry ORF. attB fragments are synthesized into a pDonor plasmid encoding a pCMV promoter upstream of the attB site without a downstream coding ORF. When cotransfected with the active recombinase, the pCMV promoter of pDonor is recombined with the pTarget mCherry, and the junction of the pCMV promoter to the mCherry drives transcription and translation of the mCherry coding region. Efficiency of the recombinase is compared to the negative control of a cell population transfected with both pDonor and pTarget without the recombinase plasmid.
Example 5. Prophetic - Landing pad activity in mammalian cells
[0130] To introduce exogenous donor DNA into the human genome using large serine recombinases, the landing pad (an attP or attB site) is (1) found to be endogenous to the human genome sequence, (2) introduced using viral delivery or by way of a transposable element, (3) integrated into the genome using HDR coupled with a nuclease, or (4) reverse transcribed into the genome using a targeted reverse transcriptase.
[0131] After introduction or identification of the landing pad (either an attP or attB) site to the genome, LSR activity to the genome is determined by using a DNA donor comprising (1) a promoter-driven fluorescent protein construct or (2) a promoterless fluorescent coding construct with the cognate attachment (attB/attP) site and/or (3) an antibiotic resistance marker or (4) a screenable cell surface marker. The donor is introduced into the cell as a plasmid, a minicircle, a Bacterial Artificial Chromosome, a nanoplasmid, or a linear dsDNA construct to integrate into the landing pad.
[0132] Along with introducing the donor into the cell, the LSR is delivered into the cell using (1) a plasmid encoding the LSR for transcription and translation, (2) an mRNA encoding the LSR for translation, or (3) a purified protein. Landing pad efficiency is determined by flow cytometry in the case of a fluorescent protein and/or cell surface marker donor, or by colony formation under selective conditions and subsequent PCR analysis of exogenous/endogenous DNA junction formation.
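As an illustration of how a flow-based efficiency could be computed, the following Python sketch reports the percentage of marker-positive cells in a recombinase-treated sample after subtracting the percentage seen in a no-recombinase control. Background subtraction is an assumption made for this sketch, since the text states only that samples are compared with a negative control. The function names and event counts are hypothetical and are not experimental results.

```python
def percent_positive(positive_events: int, total_events: int) -> float:
    """Return the percentage of gated events that are marker-positive."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= positive_events <= total_events:
        raise ValueError("positive_events must be between 0 and total_events")
    return 100.0 * positive_events / total_events


def background_subtracted_efficiency(
    sample_positive: int,
    sample_total: int,
    control_positive: int,
    control_total: int,
) -> float:
    """Percent positive in the sample minus percent positive in the control.

    The result is floored at zero so that a noisy control cannot yield a
    negative efficiency.
    """
    sample_pct = percent_positive(sample_positive, sample_total)
    control_pct = percent_positive(control_positive, control_total)
    return max(sample_pct - control_pct, 0.0)


# Made-up counts for illustration: 2,600 of 10,000 cells positive with the
# recombinase, 100 of 10,000 positive without it.
efficiency = background_subtracted_efficiency(2600, 10000, 100, 10000)
print(f"{efficiency:.1f}% integration efficiency")
```

A value computed this way could be compared against thresholds such as the integration efficiencies of at least about 5%, 25%, or 50% recited in the claims, although the claims do not specify how efficiency is to be calculated.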
References
[0133] Anzalone AV, Gao XD, Podracky CJ, Nelson AT, Koblan LW, Raguram A, Levy JM, Mercer JAM, Liu DR. Programmable deletion, replacement, integration and inversion of large DNA sequences with twin prime editing. Nat Biotechnol. 2022 May;40(5):731-740. doi: 10.1038/s41587-021-01133-w. Epub 2021 Dec 9. PMID: 34887556; PMCID: PMC9117393.
[0134] Durrant MG, Fanton A, Tycko J, Hinks M, Chandrasekaran SS, Perry NT, Schaepe J, Du PP, Lotfy P, Bassik MC, Bintu L, Bhatt AS, Hsu PD. Systematic discovery of recombinases for efficient integration of large DNA sequences into the human genome. Nat Biotechnol. 2022 Oct 10. doi: 10.1038/s41587-022-01494-w. PMID: 36217031
[0135] Smith MCM. Phage-encoded Serine Integrases and Other Large Serine Recombinases. Microbiol Spectr. 2015 Aug;3(4). doi: 10.1128/microbiolspec.MDNA3-0059-2014. PMID: 26350324
[0136] Robert C. Edgar, Search and clustering orders of magnitude faster than BLAST, Bioinformatics, Volume 26, Issue 19, 1 October 2010, Pages 2460-2461, doi.org/10.1093/bioinformatics/btq461
[0137] Nayfach, S., Camargo, A.P., Schulz, F. et al. CheckV assesses the quality and completeness of metagenome-assembled viral genomes. Nat Biotechnol 39, 578-585 (2021). doi.org/10.1038/s41587-020-00774-7
[0138] Price MN, Dehal PS, Arkin AP (2010) FastTree 2 - Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE 5(3): e9490. doi.org/10.1371/journal.pone.0009490
[0139] Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772-780. doi: 10.1093/molbev/mst010
[0140] Steinegger, M., Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol 35, 1026-1028 (2017). doi.org/10.1038/nbt.3988
EQUIVALENTS
[0141] The disclosure may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the disclosure described herein. Scope of the disclosure is thus indicated by the appended claims rather than by the foregoing description, and all changes that come within the meaning and range of equivalency of the claims are intended to be embraced therein.
Claims
1. A gene editing system comprising: a) a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115 or a nucleic acid encoding the serine recombinase; and b) a nucleic acid comprising a donor polynucleotide and a first attachment site sequence.
2. The gene editing system of claim 1, wherein the first attachment site sequence is 5’ of the donor polynucleotide.
3. The gene editing system of any one of claims 1-2, wherein the nucleic acid encoding the serine recombinase further comprises a second attachment site sequence.
4. The gene editing system of claim 3, wherein the second attachment site sequence is 5’ of the serine recombinase.
5. The gene editing system of any one of claims 3-4, wherein the first attachment site sequence and the second attachment site sequence are capable of recombination.
6. The gene editing system of any one of claims 1-5, wherein the first attachment site sequence is a bacterial genomic recombination sequence (attB).
7. The gene editing system of any one of claims 1-5, wherein the first attachment site sequence is a phage genomic recombination sequence (attP).
8. The gene editing system of any one of claims 3-7, wherein the second attachment site sequence is a bacterial genomic recombination sequence (attB).
9. The gene editing system of any one of claims 3-7, wherein the second attachment site sequence is a phage genomic recombination sequence (attP).
10. The gene editing system of any one of claims 6-9, wherein the attB sequence comprises about 20 to about 500 nucleotides.
11. The gene editing system of any one of claims 7-10, wherein the attP sequence comprises about 20 to about 500 nucleotides.
12. The gene editing system of any one of claims 1-11, wherein the nucleic acid comprising the donor polynucleotide and the first attachment site sequence is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid.
13. The gene editing system of any one of claims 1-12, wherein the nucleic acid encoding the serine recombinase is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid.
14. The gene editing system of any one of claims 12-13, wherein the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus.
15. The gene editing system of claim 14, wherein the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV-HSC12, AAV-HSC13, AAV-HSC14, AAV-HSC15, AAV-TT, AAV-DJ/8, AAV-Myo, AAV-NP40, AAV-NP59, AAV-NP22, AAV-NP66, or AAV-HSC16, or a derivative thereof.
16. The gene editing system of claim 14, wherein the herpesvirus is HSV-1, HSV-2, VZV, EBV, CMV, HHV-6, HHV-7, or HHV-8.
17. The gene editing system of any one of claims 1-16, wherein the donor polynucleotide comprises a size of at least about 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, 100 kb, 110 kb, 120 kb, or more than 120 kb.
18. The gene editing system of any one of claims 1-17, wherein the donor polynucleotide encodes a therapeutic, a reporter, or a marker.
19. The gene editing system of claim 18, wherein the reporter comprises a fluorescent protein.
20. The gene editing system of claim 19, wherein the fluorescent protein is GFP, EBFP, EBFP2, Azurite, mKalama1, ECFP, Cerulean, CyPet, YFP, Citrine, Venus, YPet, RFP, CFP, or a derivative thereof.
21. The gene editing system of claim 18, wherein the reporter is acetohydroxyacid synthase (AHAS), alkaline phosphatase (AP), beta galactosidase (LacZ), beta glucuronidase (GUS), chloramphenicol acetyltransferase (CAT), horseradish peroxidase (HRP), luciferase (Luc), nopaline synthase (NOS), octopine synthase (OCS), luciferase, or a derivative thereof.
22. The gene editing system of any one of claims 18-21, wherein the marker is an antibiotic resistance marker.
23. The gene editing system of claim 22, wherein the antibiotic resistance marker is kanamycin, spectinomycin, streptomycin, ampicillin, carbenicillin, bleomycin, erythromycin, polymyxin B, tetracycline, chloramphenicol, neomycin, zeocin, or a derivative thereof.
24. The gene editing system of any one of claims 18-23, wherein the marker is a cell surface marker.
25. A eukaryotic cell comprising a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115.
26. The eukaryotic cell of claim 25, wherein the eukaryotic cell is a mammalian cell.
27. The eukaryotic cell of claim 25, wherein the eukaryotic cell is a human cell.
28. The eukaryotic cell of claim 25, wherein the serine recombinase comprises an integration efficiency of at least about 5%.
29. The eukaryotic cell of claim 25, wherein the serine recombinase comprises an integration efficiency of at least about 25%.
30. The eukaryotic cell of claim 25, wherein the serine recombinase comprises an integration efficiency of at least about 50%.
31. The eukaryotic cell of claim 25, wherein the serine recombinase is capable of targeting genes comprising a catalase domain or synthase domain.
32. The eukaryotic cell of claim 31, wherein the catalase is manganese catalase.
33. The eukaryotic cell of any one of claims 31-32, wherein the synthase is Queuosine synthase.
34. The eukaryotic cell of any one of claims 31-33, wherein the serine recombinase is capable of targeting genes comprising a DUF4244 Pfam domain.
35. A vector comprising: a) a nucleic acid encoding a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115; and b) one or more regulatory elements.
36. The vector of claim 35, wherein the one or more regulatory elements comprises a promoter, an enhancer, an intron, a microRNA, a linker, a splicing element, or a polyA signal.
37. The vector of claim 36, wherein the promoter is selected from a constitutive promoter, an inducible promoter, a mini promoter, or a derivative thereof.
38. The vector of claim 36, wherein the promoter is selected from the group consisting of: CMV, CBA, EF1a, CAG, PGK, TRE, U6, UAS, T7, Sp6, lac, araBad, trp, Ptac, p5, p19, p40, Synapsin, CaMKII, GRK1, polH, EM7, OpIE1, and a derivative thereof.
39. A vector comprising a nucleic acid encoding a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115, wherein the vector is selected from the group consisting of: a plasmid, a nanoplasmid, a phagemid, a phage derivative, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), and a cosmid.
40. A method for gene editing, comprising: a) providing or identifying a first attachment site sequence in a host genome; b) providing a nucleic acid comprising a donor polynucleotide and a second attachment site sequence to a host cell; and c) contacting the host cell with a serine recombinase comprising at least about 80% sequence identity to any one of SEQ ID NOs: 1-115 or a nucleic acid encoding the serine recombinase, wherein the first attachment site sequence and the second attachment site sequence are capable of recombination.
41. The method of claim 40, wherein the first attachment site sequence is endogenous in the host genome.
42. The method of claim 40, wherein the first attachment site sequence is provided using viral delivery.
43. The method of claim 40, wherein the first attachment site sequence is provided using a transposase.
44. The method of claim 40, wherein the first attachment site sequence is provided using a nuclease.
45. The method of claim 44, wherein the nuclease is a double-strand nuclease.
46. The method of claim 44, wherein the nuclease is a Type II CRISPR endonuclease.
47. The method of claim 44, wherein the nuclease is a Type V CRISPR endonuclease.
48. The method of claim 44, wherein the nuclease is Cas9.
49. The method of claim 40, wherein the first attachment site sequence is provided using a reverse transcriptase.
50. The method of any one of claims 40-49, wherein the second attachment site sequence is 5’ of the donor polynucleotide.
51. The method of any one of claims 40-50, wherein the first attachment site sequence is a bacterial genomic recombination sequence (attB).
52. The method of any one of claims 40-50, wherein the first attachment site sequence is a phage genomic recombination sequence (attP).
53. The method of any one of claims 40-52, wherein the second attachment site sequence is a bacterial genomic recombination sequence (attB).
54. The method of any one of claims 40-52, wherein the second attachment site sequence is a phage genomic recombination sequence (attP).
55. The method of any one of claims 51-54, wherein the attB sequence comprises about 20 to about 500 nucleotides.
56. The method of any one of claims 52-55, wherein the attP sequence comprises about 20 to about 500 nucleotides.
57. The method of any one of claims 40-56, wherein the nucleic acid comprising the donor polynucleotide and the second attachment site sequence is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid.
58. The method of any one of claims 40-57, wherein the nucleic acid encoding the serine recombinase is provided within a plasmid, a nanoplasmid, a phagemid, a phage derivative, a virus, a bacmid, a bacterial artificial chromosome (BAC), a minicircle, a doggybone, a yeast artificial chromosome (YAC), or a cosmid.
59. The method of any one of claims 57-58, wherein the virus is an alphavirus, a parvovirus, an adenovirus, an AAV, a baculovirus, a Dengue virus, a lentivirus, a herpesvirus, a poxvirus, an anellovirus, a bocavirus, a vaccinia virus, or a retrovirus.
60. The method of claim 59, wherein the AAV is AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9, AAV10, AAV11, AAV12, AAV13, AAV14, AAV15, AAV16, AAV-rh8, AAV-rh10, AAV-rh20, AAV-rh39, AAV-rh74, AAV-rhM4-1, AAV-hu37, AAV-Anc80, AAV-Anc80L65, AAV-7m8, AAV-PHP-B, AAV-PHP-EB, AAV-2.5, AAV-2tYF, AAV-3B, AAV-LK03, AAV-HSC1, AAV-HSC2, AAV-HSC3, AAV-HSC4, AAV-HSC5, AAV-HSC6, AAV-HSC7, AAV-HSC8, AAV-HSC9, AAV-HSC10, AAV-HSC11, AAV-HSC12, AAV-HSC13, AAV-HSC14, AAV-HSC15, AAV-TT, AAV-DJ/8, AAV-Myo, AAV-NP40, AAV-NP59, AAV-NP22, AAV-NP66, or AAV-HSC16, or a derivative thereof.
61. The method of claim 59, wherein the herpesvirus is HSV-1, HSV-2, VZV, EBV, CMV, HHV-6, HHV-7, or HHV-8.
62. The method of any one of claims 40-61, wherein the donor polynucleotide comprises a size of at least about 1 kilobase (kb), 2 kb, 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 20 kb, 30 kb, 40 kb, 50 kb, 60 kb, 70 kb, 80 kb, 90 kb, 100 kb, 110 kb, 120 kb, or more than 120 kb.
63. The method of any one of claims 40-62, wherein the donor polynucleotide encodes a therapeutic, a reporter, or a marker.
64. The method of claim 63, wherein the reporter comprises a fluorescent protein.
65. The method of claim 64, wherein the fluorescent protein is GFP, EBFP, EBFP2, Azurite, mKalama1, ECFP, Cerulean, CyPet, YFP, Citrine, Venus, YPet, RFP, CFP, or a derivative thereof.
66. The method of claim 63, wherein the reporter is acetohydroxyacid synthase (AHAS), alkaline phosphatase (AP), beta galactosidase (LacZ), beta glucuronidase (GUS), chloramphenicol acetyltransferase (CAT), horseradish peroxidase (HRP), luciferase (Luc), nopaline synthase (NOS), octopine synthase (OCS), luciferase, or a derivative thereof.
67. The method of any one of claims 63-66, wherein the marker is an antibiotic resistance marker.
68. The method of claim 67, wherein the antibiotic resistance marker is kanamycin, spectinomycin, streptomycin, ampicillin, carbenicillin, bleomycin, erythromycin, polymyxin B, tetracycline, chloramphenicol, neomycin, zeocin, or a derivative thereof.
69. The method of any one of claims 63-66, wherein the marker is a cell surface marker.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263382692P | 2022-11-07 | 2022-11-07 | |
US63/382,692 | 2022-11-07 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2024102666A2 (en) | 2024-05-16 |
WO2024102666A3 (en) | 2024-06-20 |
Family
ID=91033417
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/078852 WO2024102666A2 (en) | 2022-11-07 | 2023-11-06 | Serine recombinases for gene editing |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024102666A2 (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9034650B2 (en) * | 2005-02-02 | 2015-05-19 | Intrexon Corporation | Site-specific serine recombinases and methods of their use |
TW202132565A (en) * | 2019-11-01 | 2021-09-01 | 美商聖加莫治療股份有限公司 | Gin recombinase variants |
EP4305165A1 (en) * | 2021-03-08 | 2024-01-17 | Flagship Pioneering Innovations VI, LLC | Lentivirus with altered integrase activity |
Also Published As
Publication number | Publication date |
---|---|
WO2024102666A3 (en) | 2024-06-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23889560 Country of ref document: EP Kind code of ref document: A2 |