WO2024017189A1 - TnpB-based genome editor - Google Patents
TnpB-based genome editor
- Publication number
- WO2024017189A1 (application PCT/CN2023/107697, CN2023107697W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- polypeptide
- tnpb
- seq
- polynucleotide
- nucleotide sequence
- Prior art date
Links
- 229920001184 polypeptide Polymers 0.000 claims abstract description 590
- 102000004196 processed proteins & peptides Human genes 0.000 claims abstract description 590
- 108090000765 processed proteins & peptides Proteins 0.000 claims abstract description 590
- 230000000694 effects Effects 0.000 claims abstract description 136
- 108010042407 Endonucleases Proteins 0.000 claims abstract description 86
- 230000004927 fusion Effects 0.000 claims abstract description 76
- 238000010362 genome editing Methods 0.000 claims abstract description 58
- 238000000034 method Methods 0.000 claims abstract description 52
- 102000004533 Endonucleases Human genes 0.000 claims abstract description 19
- 125000003729 nucleotide group Chemical group 0.000 claims description 236
- 239000002773 nucleotide Substances 0.000 claims description 233
- 102000040430 polynucleotide Human genes 0.000 claims description 161
- 108091033319 polynucleotide Proteins 0.000 claims description 161
- 239000002157 polynucleotide Substances 0.000 claims description 161
- 108020005004 Guide RNA Proteins 0.000 claims description 146
- 210000004027 cell Anatomy 0.000 claims description 119
- 150000001413 amino acids Chemical class 0.000 claims description 116
- 239000012634 fragment Substances 0.000 claims description 113
- 235000001014 amino acid Nutrition 0.000 claims description 110
- 108020004414 DNA Proteins 0.000 claims description 101
- 108090000623 proteins and genes Proteins 0.000 claims description 77
- 125000003275 alpha amino acid group Chemical group 0.000 claims description 64
- 239000013612 plasmid Substances 0.000 claims description 62
- 230000008685 targeting Effects 0.000 claims description 54
- 238000003776 cleavage reaction Methods 0.000 claims description 51
- 230000007017 scission Effects 0.000 claims description 51
- 102000004169 proteins and genes Human genes 0.000 claims description 44
- 230000027455 binding Effects 0.000 claims description 43
- 235000018102 proteins Nutrition 0.000 claims description 42
- 230000014509 gene expression Effects 0.000 claims description 32
- 102000053602 DNA Human genes 0.000 claims description 31
- 230000004048 modification Effects 0.000 claims description 29
- 238000012986 modification Methods 0.000 claims description 29
- 210000003527 eukaryotic cell Anatomy 0.000 claims description 20
- 244000005700 microbiome Species 0.000 claims description 20
- 238000013518 transcription Methods 0.000 claims description 20
- 230000035897 transcription Effects 0.000 claims description 20
- 241000588724 Escherichia coli Species 0.000 claims description 17
- 150000007523 nucleic acids Chemical class 0.000 claims description 17
- 239000000203 mixture Substances 0.000 claims description 16
- 241000588626 Acinetobacter baumannii Species 0.000 claims description 15
- 241001468246 Aeribacillus pallidus Species 0.000 claims description 15
- 241000607548 Aeromonas media Species 0.000 claims description 15
- 241000607525 Aeromonas salmonicida Species 0.000 claims description 15
- 241001160853 Anoxybacillus amylolyticus Species 0.000 claims description 15
- 241000193755 Bacillus cereus Species 0.000 claims description 15
- 241000193388 Bacillus thuringiensis Species 0.000 claims description 15
- 241000589877 Campylobacter coli Species 0.000 claims description 15
- 241000193403 Clostridium Species 0.000 claims description 15
- 241000193155 Clostridium botulinum Species 0.000 claims description 15
- 241000193468 Clostridium perfringens Species 0.000 claims description 15
- 241000724200 Clostridium phage c-st Species 0.000 claims description 15
- 241000959949 Deinococcus geothermalis Species 0.000 claims description 15
- 241000194031 Enterococcus faecium Species 0.000 claims description 15
- 241000987609 Halorubrum halophilum Species 0.000 claims description 15
- 241000588747 Klebsiella pneumoniae Species 0.000 claims description 15
- 241000205284 Methanosarcina acetivorans Species 0.000 claims description 15
- 241000205274 Methanosarcina mazei Species 0.000 claims description 15
- 241000192673 Nostoc sp. Species 0.000 claims description 15
- 241000531124 Raoultella ornithinolytica Species 0.000 claims description 15
- 241001138501 Salmonella enterica Species 0.000 claims description 15
- 241000192560 Synechococcus sp. Species 0.000 claims description 15
- 241000203780 Thermobifida fusca Species 0.000 claims description 15
- 241001313699 Thermosynechococcus elongatus Species 0.000 claims description 15
- 241000317522 Youngiibacter multivorans Species 0.000 claims description 15
- 229940097012 bacillus thuringiensis Drugs 0.000 claims description 15
- 102000039446 nucleic acids Human genes 0.000 claims description 12
- 108020004707 nucleic acids Proteins 0.000 claims description 12
- 210000001236 prokaryotic cell Anatomy 0.000 claims description 11
- 230000005782 double-strand break Effects 0.000 claims description 10
- HCHKCACWOHOZIP-UHFFFAOYSA-N Zinc Chemical group [Zn] HCHKCACWOHOZIP-UHFFFAOYSA-N 0.000 claims description 9
- 235000004279 alanine Nutrition 0.000 claims description 9
- 125000003295 alanine group Chemical group N[C@@H](C)C(=O)* 0.000 claims description 9
- 210000004899 c-terminal region Anatomy 0.000 claims description 9
- 239000011701 zinc Substances 0.000 claims description 9
- 229910052725 zinc Inorganic materials 0.000 claims description 9
- APOBEC1 Proteins 0.000 claims description 8
- 108700019146 Transgenes Proteins 0.000 claims description 8
- 238000012216 screening Methods 0.000 claims description 7
- 239000012190 activator Substances 0.000 claims description 6
- 239000003112 inhibitor Substances 0.000 claims description 6
- 102000012758 APOBEC-1 Deaminase Human genes 0.000 claims description 3
- 108010052875 Adenine deaminase Proteins 0.000 claims description 3
- 101100121888 Arabidopsis thaliana SBE2.1 gene Proteins 0.000 claims description 3
- 101100504552 Arabidopsis thaliana SBE2.2 gene Proteins 0.000 claims description 3
- 108010031325 Cytidine deaminase Proteins 0.000 claims description 3
- 102100026846 Cytidine deaminase Human genes 0.000 claims description 3
- 102100040263 DNA dC->dU-editing enzyme APOBEC-3A Human genes 0.000 claims description 3
- 108010033040 Histones Proteins 0.000 claims description 3
- 101000964378 Homo sapiens DNA dC->dU-editing enzyme APOBEC-3A Proteins 0.000 claims description 3
- 101000615488 Homo sapiens Methyl-CpG-binding domain protein 2 Proteins 0.000 claims description 3
- 102100021299 Methyl-CpG-binding domain protein 2 Human genes 0.000 claims description 3
- 108060004795 Methyltransferase Proteins 0.000 claims description 3
- 102000016397 Methyltransferase Human genes 0.000 claims description 3
- 102000004160 Phosphoric Monoester Hydrolases Human genes 0.000 claims description 3
- 108090000608 Phosphoric Monoester Hydrolases Proteins 0.000 claims description 3
- 108091000080 Phosphotransferase Proteins 0.000 claims description 3
- 102000006275 Ubiquitin-Protein Ligases Human genes 0.000 claims description 3
- 108010083111 Ubiquitin-Protein Ligases Proteins 0.000 claims description 3
- 102000005421 acetyltransferase Human genes 0.000 claims description 3
- 108020002494 acetyltransferase Proteins 0.000 claims description 3
- 208000031753 acute bilirubin encephalopathy Diseases 0.000 claims description 3
- 230000006154 adenylylation Effects 0.000 claims description 3
- 230000008859 change Effects 0.000 claims description 3
- 230000006114 demyristoylation Effects 0.000 claims description 3
- 229940079593 drug Drugs 0.000 claims description 3
- 239000003814 drug Substances 0.000 claims description 3
- 230000007498 myristoylation Effects 0.000 claims description 3
- 102000020233 phosphotransferase Human genes 0.000 claims description 3
- 230000008439 repair process Effects 0.000 claims description 3
- 150000003384 small molecules Chemical class 0.000 claims description 3
- 241000702421 Dependoparvovirus Species 0.000 claims description 2
- 230000030648 nucleus localization Effects 0.000 claims description 2
- 102100031780 Endonuclease Human genes 0.000 description 68
- 101710163270 Nuclease Proteins 0.000 description 34
- 108091028043 Nucleic acid sequence Proteins 0.000 description 29
- 239000004287 Dehydroacetic acid Substances 0.000 description 28
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 24
- 238000003780 insertion Methods 0.000 description 23
- 230000037431 insertion Effects 0.000 description 23
- 108091026890 Coding region Proteins 0.000 description 20
- 239000012636 effector Substances 0.000 description 18
- 238000012217 deletion Methods 0.000 description 16
- 230000037430 deletion Effects 0.000 description 16
- 230000001105 regulatory effect Effects 0.000 description 16
- 238000006467 substitution reaction Methods 0.000 description 16
- 239000013598 vector Substances 0.000 description 16
- 238000007792 addition Methods 0.000 description 15
- 108020004999 messenger RNA Proteins 0.000 description 15
- 238000003556 assay Methods 0.000 description 12
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 11
- 210000005260 human cell Anatomy 0.000 description 11
- 238000012360 testing method Methods 0.000 description 11
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 10
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 10
- 230000006801 homologous recombination Effects 0.000 description 10
- 238000002744 homologous recombination Methods 0.000 description 10
- 238000009396 hybridization Methods 0.000 description 9
- QTBSBXVTEAMEQO-UHFFFAOYSA-N Acetic acid Natural products CC(O)=O QTBSBXVTEAMEQO-UHFFFAOYSA-N 0.000 description 8
- 101000829958 Homo sapiens N-acetyllactosaminide beta-1,6-N-acetylglucosaminyl-transferase Proteins 0.000 description 8
- 102100023315 N-acetyllactosaminide beta-1,6-N-acetylglucosaminyl-transferase Human genes 0.000 description 8
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 8
- WOWHHFRSBJGXCM-UHFFFAOYSA-M cetyltrimethylammonium chloride Chemical compound [Cl-].CCCCCCCCCCCCCCCC[N+](C)(C)C WOWHHFRSBJGXCM-UHFFFAOYSA-M 0.000 description 8
- 230000000295 complement effect Effects 0.000 description 8
- 238000001727 in vivo Methods 0.000 description 8
- 239000004310 lactic acid Substances 0.000 description 8
- 238000013519 translation Methods 0.000 description 8
- 241000699666 Mus <mouse, genus> Species 0.000 description 7
- 230000004075 alteration Effects 0.000 description 7
- 230000006870 function Effects 0.000 description 7
- 238000004458 analytical method Methods 0.000 description 6
- 210000000349 chromosome Anatomy 0.000 description 6
- 239000000047 product Substances 0.000 description 6
- 241000894007 species Species 0.000 description 6
- 239000000126 substance Substances 0.000 description 6
- 101150044011 tnpB gene Proteins 0.000 description 6
- 101710172824 CRISPR-associated endonuclease Cas9 Proteins 0.000 description 5
- 241001465754 Metazoa Species 0.000 description 5
- 108091034117 Oligonucleotide Proteins 0.000 description 5
- 125000000539 amino acid group Chemical group 0.000 description 5
- 238000013461 design Methods 0.000 description 5
- 230000001404 mediated effect Effects 0.000 description 5
- 238000012545 processing Methods 0.000 description 5
- 230000000717 retained effect Effects 0.000 description 5
- 241001164825 Adeno-associated virus - 8 Species 0.000 description 4
- 239000004254 Ammonium phosphate Substances 0.000 description 4
- 108020005544 Antisense RNA Proteins 0.000 description 4
- 108091033409 CRISPR Proteins 0.000 description 4
- 230000007018 DNA scission Effects 0.000 description 4
- 241000196324 Embryophyta Species 0.000 description 4
- XBDQKXXYIPTUBI-UHFFFAOYSA-N Propionic acid Substances CCC(O)=O XBDQKXXYIPTUBI-UHFFFAOYSA-N 0.000 description 4
- 102000004389 Ribonucleoproteins Human genes 0.000 description 4
- 108010081734 Ribonucleoproteins Proteins 0.000 description 4
- OIRDTQYFTABQOQ-KQYNXXCUSA-N adenosine Chemical compound C1=NC=2C(N)=NC=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O OIRDTQYFTABQOQ-KQYNXXCUSA-N 0.000 description 4
- 239000001569 carbon dioxide Substances 0.000 description 4
- 239000003184 complementary RNA Substances 0.000 description 4
- 238000004520 electroporation Methods 0.000 description 4
- 230000035772 mutation Effects 0.000 description 4
- 238000011144 upstream manufacturing Methods 0.000 description 4
- 229940035893 uracil Drugs 0.000 description 4
- 241000702423 Adeno-associated virus - 2 Species 0.000 description 3
- 108700028369 Alleles Proteins 0.000 description 3
- LFQSCWFLJHTTHZ-UHFFFAOYSA-N Ethanol Chemical compound CCO LFQSCWFLJHTTHZ-UHFFFAOYSA-N 0.000 description 3
- 108091092195 Intron Proteins 0.000 description 3
- 241000699670 Mus sp. Species 0.000 description 3
- 238000010459 TALEN Methods 0.000 description 3
- 108010043645 Transcription Activator-Like Effector Nucleases Proteins 0.000 description 3
- 241000251539 Vertebrata <Metazoa> Species 0.000 description 3
- 108010017070 Zinc Finger Nucleases Proteins 0.000 description 3
- 230000001580 bacterial effect Effects 0.000 description 3
- 238000000684 flow cytometry Methods 0.000 description 3
- 238000011534 incubation Methods 0.000 description 3
- 230000001939 inductive effect Effects 0.000 description 3
- 238000010369 molecular cloning Methods 0.000 description 3
- 239000013642 negative control Substances 0.000 description 3
- 230000007935 neutral effect Effects 0.000 description 3
- 210000003463 organelle Anatomy 0.000 description 3
- 230000008488 polyadenylation Effects 0.000 description 3
- 229920000642 polymer Polymers 0.000 description 3
- 239000002243 precursor Substances 0.000 description 3
- 230000004044 response Effects 0.000 description 3
- 238000011426 transformation method Methods 0.000 description 3
- 230000017105 transposition Effects 0.000 description 3
- 108700010070 Codon Usage Proteins 0.000 description 2
- 108010017826 DNA Polymerase I Proteins 0.000 description 2
- 102000004594 DNA Polymerase I Human genes 0.000 description 2
- 102000052510 DNA-Binding Proteins Human genes 0.000 description 2
- 101710096438 DNA-binding protein Proteins 0.000 description 2
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 2
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 2
- 229940113491 Glycosylase inhibitor Drugs 0.000 description 2
- NYHBQMYGNKIUIF-UUOKFMHZSA-N Guanosine Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O NYHBQMYGNKIUIF-UUOKFMHZSA-N 0.000 description 2
- 241000238631 Hexapoda Species 0.000 description 2
- 101000950695 Homo sapiens Mitogen-activated protein kinase 8 Proteins 0.000 description 2
- 108091027974 Mature messenger RNA Proteins 0.000 description 2
- 102100037808 Mitogen-activated protein kinase 8 Human genes 0.000 description 2
- 101100001705 Mus musculus Angptl3 gene Proteins 0.000 description 2
- 108010076504 Protein Sorting Signals Proteins 0.000 description 2
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N Silicium dioxide Chemical compound O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 2
- IQFYYKKMVGJFEH-XLPZGREQSA-N Thymidine Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](CO)[C@@H](O)C1 IQFYYKKMVGJFEH-XLPZGREQSA-N 0.000 description 2
- DRTQHJPVMGBUCF-XVFCMESISA-N Uridine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-XVFCMESISA-N 0.000 description 2
- 241000700605 Viruses Species 0.000 description 2
- 210000005006 adaptive immune system Anatomy 0.000 description 2
- 238000012512 characterization method Methods 0.000 description 2
- HVYWMOMLDIMFJA-DPAQBDIFSA-N cholesterol Chemical compound C1C=C2C[C@@H](O)CC[C@]2(C)[C@@H]2[C@@H]1[C@@H]1CC[C@H]([C@H](C)CCCC(C)C)[C@@]1(C)CC2 HVYWMOMLDIMFJA-DPAQBDIFSA-N 0.000 description 2
- 230000002759 chromosomal effect Effects 0.000 description 2
- 239000002299 complementary DNA Substances 0.000 description 2
- OPTASPLRGRRNAP-UHFFFAOYSA-N cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 description 2
- 230000001419 dependent effect Effects 0.000 description 2
- 230000007613 environmental effect Effects 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 238000000338 in vitro Methods 0.000 description 2
- 238000002955 isolation Methods 0.000 description 2
- 239000000463 material Substances 0.000 description 2
- 238000013206 minimal dilution Methods 0.000 description 2
- 210000004940 nucleus Anatomy 0.000 description 2
- 239000002245 particle Substances 0.000 description 2
- 108020001580 protein domains Proteins 0.000 description 2
- 108091008146 restriction endonucleases Proteins 0.000 description 2
- YGSDEFSMJLZEOE-UHFFFAOYSA-N salicylic acid Chemical compound OC(=O)C1=CC=CC=C1O YGSDEFSMJLZEOE-UHFFFAOYSA-N 0.000 description 2
- 239000000523 sample Substances 0.000 description 2
- 230000001568 sexual effect Effects 0.000 description 2
- 230000035882 stress Effects 0.000 description 2
- 210000001519 tissue Anatomy 0.000 description 2
- 238000010361 transduction Methods 0.000 description 2
- 230000026683 transduction Effects 0.000 description 2
- 238000001890 transfection Methods 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- 102000040650 (ribonucleotides)n+m Human genes 0.000 description 1
- WKKCYLSCLQVWFD-UHFFFAOYSA-N 1,2-dihydropyrimidin-4-amine Chemical compound N=C1NCNC=C1 WKKCYLSCLQVWFD-UHFFFAOYSA-N 0.000 description 1
- YKBGVTZYEHREMT-KVQBGUIXSA-N 2'-deoxyguanosine Chemical compound C1=NC=2C(=O)NC(N)=NC=2N1[C@H]1C[C@H](O)[C@@H](CO)O1 YKBGVTZYEHREMT-KVQBGUIXSA-N 0.000 description 1
- JLIDBLDQVAYHNE-LXGGSRJLSA-N 2-cis-abscisic acid Chemical compound OC(=O)/C=C(/C)\C=C\C1(O)C(C)=CC(=O)CC1(C)C JLIDBLDQVAYHNE-LXGGSRJLSA-N 0.000 description 1
- 108020005345 3' Untranslated Regions Proteins 0.000 description 1
- 108020003589 5' Untranslated Regions Proteins 0.000 description 1
- MSSXOMSJDRHRMC-UHFFFAOYSA-N 9H-purine-2,6-diamine Chemical compound NC1=NC(N)=C2NC=NC2=N1 MSSXOMSJDRHRMC-UHFFFAOYSA-N 0.000 description 1
- 239000013607 AAV vector Substances 0.000 description 1
- HRPVXLWXLXDGHG-UHFFFAOYSA-N Acrylamide Chemical compound NC(=O)C=C HRPVXLWXLXDGHG-UHFFFAOYSA-N 0.000 description 1
- 241000251468 Actinopterygii Species 0.000 description 1
- 241000589158 Agrobacterium Species 0.000 description 1
- 239000004475 Arginine Substances 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- 239000002126 C01EB10 - Adenosine Substances 0.000 description 1
- 241000283707 Capra Species 0.000 description 1
- 102000020313 Cell-Penetrating Peptides Human genes 0.000 description 1
- 108010051109 Cell-Penetrating Peptides Proteins 0.000 description 1
- MIKUYHXYGGJMLM-GIMIYPNGSA-N Crotonoside Natural products C1=NC2=C(N)NC(=O)N=C2N1[C@H]1O[C@@H](CO)[C@H](O)[C@@H]1O MIKUYHXYGGJMLM-GIMIYPNGSA-N 0.000 description 1
- NYHBQMYGNKIUIF-UHFFFAOYSA-N D-guanosine Natural products C1=2NC(N)=NC(=O)C=2N=CN1C1OC(CO)C(O)C1O NYHBQMYGNKIUIF-UHFFFAOYSA-N 0.000 description 1
- 238000007400 DNA extraction Methods 0.000 description 1
- 102100035273 E3 ubiquitin-protein ligase CBL-B Human genes 0.000 description 1
- 102000004190 Enzymes Human genes 0.000 description 1
- 108090000790 Enzymes Proteins 0.000 description 1
- 229940123611 Genome editing Drugs 0.000 description 1
- 102100023823 Homeobox protein EMX1 Human genes 0.000 description 1
- 101000931098 Homo sapiens DNA (cytosine-5)-methyltransferase 1 Proteins 0.000 description 1
- 101000737265 Homo sapiens E3 ubiquitin-protein ligase CBL-B Proteins 0.000 description 1
- 101001048956 Homo sapiens Homeobox protein EMX1 Proteins 0.000 description 1
- 101000653360 Homo sapiens Methylcytosine dioxygenase TET1 Proteins 0.000 description 1
- 101000579123 Homo sapiens Phosphoglycerate kinase 1 Proteins 0.000 description 1
- 229930010555 Inosine Natural products 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- 102100021244 Integral membrane protein GPR180 Human genes 0.000 description 1
- 102000003960 Ligases Human genes 0.000 description 1
- 108090000364 Ligases Proteins 0.000 description 1
- 239000004472 Lysine Substances 0.000 description 1
- KDXKERNSBIXSRK-UHFFFAOYSA-N Lysine Natural products NCCCCC(N)C(O)=O KDXKERNSBIXSRK-UHFFFAOYSA-N 0.000 description 1
- 101150083522 MECP2 gene Proteins 0.000 description 1
- 102100039124 Methyl-CpG-binding protein 2 Human genes 0.000 description 1
- 108020005196 Mitochondrial DNA Proteins 0.000 description 1
- KJWZYMMLVHIVSU-IYCNHOCDSA-N PGK1 Chemical compound CCCCC[C@H](O)\C=C\[C@@H]1[C@@H](CCCCCCC(O)=O)C(=O)CC1=O KJWZYMMLVHIVSU-IYCNHOCDSA-N 0.000 description 1
- 241001494479 Pecora Species 0.000 description 1
- 102100028251 Phosphoglycerate kinase 1 Human genes 0.000 description 1
- 239000002202 Polyethylene glycol Substances 0.000 description 1
- 229920002873 Polyethylenimine Polymers 0.000 description 1
- 108091034057 RNA (poly(A)) Proteins 0.000 description 1
- 230000004570 RNA-binding Effects 0.000 description 1
- 108010092799 RNA-directed DNA polymerase Proteins 0.000 description 1
- 108020004511 Recombinant DNA Proteins 0.000 description 1
- 108091028664 Ribonucleotide Proteins 0.000 description 1
- 241000283984 Rodentia Species 0.000 description 1
- 240000004808 Saccharomyces cerevisiae Species 0.000 description 1
- 238000012300 Sequence Analysis Methods 0.000 description 1
- 108020004682 Single-Stranded DNA Proteins 0.000 description 1
- 101100166144 Staphylococcus aureus cas9 gene Proteins 0.000 description 1
- 108091081024 Start codon Proteins 0.000 description 1
- 108010006785 Taq Polymerase Proteins 0.000 description 1
- 108010073062 Transcription Activator-Like Effectors Proteins 0.000 description 1
- 108700029229 Transcriptional Regulatory Elements Proteins 0.000 description 1
- 102000008579 Transposases Human genes 0.000 description 1
- 108010020764 Transposases Proteins 0.000 description 1
- 230000001133 acceleration Effects 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 229960005305 adenosine Drugs 0.000 description 1
- 210000004102 animal cell Anatomy 0.000 description 1
- 238000010171 animal model Methods 0.000 description 1
- 230000000692 anti-sense effect Effects 0.000 description 1
- ODKSFYDXXFIFQN-UHFFFAOYSA-N arginine Natural products OC(=O)C(N)CCCNC(N)=N ODKSFYDXXFIFQN-UHFFFAOYSA-N 0.000 description 1
- DRTQHJPVMGBUCF-PSQAKQOGSA-N beta-L-uridine Natural products O[C@H]1[C@@H](O)[C@H](CO)O[C@@H]1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-PSQAKQOGSA-N 0.000 description 1
- 230000003115 biocidal effect Effects 0.000 description 1
- 230000033228 biological regulation Effects 0.000 description 1
- 238000009395 breeding Methods 0.000 description 1
- 230000001488 breeding effect Effects 0.000 description 1
- 238000004422 calculation algorithm Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 230000030570 cellular localization Effects 0.000 description 1
- 239000012707 chemical precursor Substances 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 239000007795 chemical reaction product Substances 0.000 description 1
- 239000003153 chemical reaction reagent Substances 0.000 description 1
- 229960005091 chloramphenicol Drugs 0.000 description 1
- WIIZWVCIJKGZOK-RKDXNWHRSA-N chloramphenicol Chemical compound ClC(Cl)C(=O)N[C@H](CO)[C@H](O)C1=CC=C([N+]([O-])=O)C=C1 WIIZWVCIJKGZOK-RKDXNWHRSA-N 0.000 description 1
- 235000012000 cholesterol Nutrition 0.000 description 1
- 239000013611 chromosomal DNA Substances 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 229940104302 cytosine Drugs 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 239000005547 deoxyribonucleotide Substances 0.000 description 1
- 125000002637 deoxyribonucleotide group Chemical group 0.000 description 1
- 238000009795 derivation Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000001214 effect on cellular process Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 239000003623 enhancer Substances 0.000 description 1
- ZMMJGEGLRURXTF-UHFFFAOYSA-N ethidium bromide Chemical compound [Br-].C12=CC(N)=CC=C2C2=CC=C(N)C=C2[N+](CC)=C1C1=CC=CC=C1 ZMMJGEGLRURXTF-UHFFFAOYSA-N 0.000 description 1
- 229960005542 ethidium bromide Drugs 0.000 description 1
- 230000007717 exclusion Effects 0.000 description 1
- 230000001747 exhibiting effect Effects 0.000 description 1
- 239000013613 expression plasmid Substances 0.000 description 1
- 239000013604 expression vector Substances 0.000 description 1
- 238000013230 female C57BL/6J mice Methods 0.000 description 1
- 230000002538 fungal effect Effects 0.000 description 1
- 238000012239 gene modification Methods 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- 238000010353 genetic engineering Methods 0.000 description 1
- 210000004602 germ cell Anatomy 0.000 description 1
- 239000001963 growth medium Substances 0.000 description 1
- 229940029575 guanosine Drugs 0.000 description 1
- IIRDTKBZINWQAW-UHFFFAOYSA-N hexaethylene glycol Chemical group OCCOCCOCCOCCOCCOCCO IIRDTKBZINWQAW-UHFFFAOYSA-N 0.000 description 1
- 230000003054 hormonal effect Effects 0.000 description 1
- 102000057967 human DNMT1 Human genes 0.000 description 1
- 239000000411 inducer Substances 0.000 description 1
- 208000015181 infectious disease Diseases 0.000 description 1
- 238000002347 injection Methods 0.000 description 1
- 239000007924 injection Substances 0.000 description 1
- 229960003786 inosine Drugs 0.000 description 1
- ZNJFBWYDHIGLCU-HWKXXFMVSA-N jasmonic acid Chemical compound CC\C=C/C[C@@H]1[C@@H](CC(O)=O)CCC1=O ZNJFBWYDHIGLCU-HWKXXFMVSA-N 0.000 description 1
- 229930027917 kanamycin Natural products 0.000 description 1
- 229960000318 kanamycin Drugs 0.000 description 1
- SBUJHOSQTJFQJX-NOAMYHISSA-N kanamycin Chemical compound O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CN)O[C@@H]1O[C@H]1[C@H](O)[C@@H](O[C@@H]2[C@@H]([C@@H](N)[C@H](O)[C@@H](CO)O2)O)[C@H](N)C[C@@H]1N SBUJHOSQTJFQJX-NOAMYHISSA-N 0.000 description 1
- 229930182823 kanamycin A Natural products 0.000 description 1
- 210000004185 liver Anatomy 0.000 description 1
- 210000005228 liver tissue Anatomy 0.000 description 1
- 210000004962 mammalian cell Anatomy 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 239000002609 medium Substances 0.000 description 1
- 230000021121 meiosis Effects 0.000 description 1
- 238000000520 microinjection Methods 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 210000003470 mitochondria Anatomy 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 231100000350 mutagenesis Toxicity 0.000 description 1
- 239000002105 nanoparticle Substances 0.000 description 1
- 238000007481 next generation sequencing Methods 0.000 description 1
- 229940046166 oligodeoxynucleotide Drugs 0.000 description 1
- 230000008723 osmotic stress Effects 0.000 description 1
- FJKROLUGYXJWQN-UHFFFAOYSA-N papa-hydroxy-benzoic acid Natural products OC(=O)C1=CC=C(O)C=C1 FJKROLUGYXJWQN-UHFFFAOYSA-N 0.000 description 1
- 229930195732 phytohormone Natural products 0.000 description 1
- 210000002706 plastid Anatomy 0.000 description 1
- 229920001223 polyethylene glycol Polymers 0.000 description 1
- 230000001124 posttranscriptional effect Effects 0.000 description 1
- 230000019525 primary metabolic process Effects 0.000 description 1
- 150000003212 purines Chemical class 0.000 description 1
- 150000003230 pyrimidines Chemical class 0.000 description 1
- 238000010188 recombinant method Methods 0.000 description 1
- 230000006798 recombination Effects 0.000 description 1
- 238000005215 recombination Methods 0.000 description 1
- 230000009711 regulatory function Effects 0.000 description 1
- 230000003362 replicative effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 239000002336 ribonucleotide Substances 0.000 description 1
- 125000002652 ribonucleotide group Chemical group 0.000 description 1
- 108091092562 ribozyme Proteins 0.000 description 1
- 229920002477 rna polymer Polymers 0.000 description 1
- 229960004889 salicylic acid Drugs 0.000 description 1
- 150000003839 salts Chemical class 0.000 description 1
- 238000002864 sequence alignment Methods 0.000 description 1
- 238000012163 sequencing technique Methods 0.000 description 1
- 239000000377 silicon dioxide Substances 0.000 description 1
- 210000001082 somatic cell Anatomy 0.000 description 1
- 125000006850 spacer group Chemical group 0.000 description 1
- 210000000130 stem cell Anatomy 0.000 description 1
- 238000003786 synthesis reaction Methods 0.000 description 1
- 101150097091 tnpA gene Proteins 0.000 description 1
- 230000000699 topical effect Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000001052 transient effect Effects 0.000 description 1
- DRTQHJPVMGBUCF-UHFFFAOYSA-N uracil arabinoside Natural products OC1C(O)C(CO)OC1N1C(=O)NC(=O)C=C1 DRTQHJPVMGBUCF-UHFFFAOYSA-N 0.000 description 1
- 229940045145 uridine Drugs 0.000 description 1
- 210000003462 vein Anatomy 0.000 description 1
- 230000003612 virological effect Effects 0.000 description 1
- 238000005406 washing Methods 0.000 description 1
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/87—Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
- C12N15/90—Stable introduction of foreign DNA into chromosome
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/102—Mutagenizing nucleic acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/52—Genes encoding for enzymes or proenzymes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/10—Transferases (2.)
- C12N9/12—Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
- C12N9/1241—Nucleotidyltransferases (2.7.7)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/16—Hydrolases (3) acting on ester bonds (3.1)
- C12N9/22—Ribonucleases RNAses, DNAses
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61K—PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
- A61K48/00—Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
- A61K48/005—Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy characterised by an aspect of the 'active' part of the composition delivered, i.e. the nucleic acid delivered
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/63—Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
- C12N15/70—Vectors or expression systems specially adapted for E. coli
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2750/00—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA ssDNA viruses
- C12N2750/00011—Details
- C12N2750/14011—Parvoviridae
- C12N2750/14111—Dependovirus, e.g. adenoassociated viruses
- C12N2750/14141—Use of virus, viral particle or viral elements as a vector
- C12N2750/14143—Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector
Definitions
- the present invention relates to molecular biology.
- the present invention provides novel RNA-guided systems for gene editing.
- the modification of genome at a predetermined site has been enabled by employing site-specific systems.
- Genome-editing techniques such as meganucleases, designer zinc finger nucleases (ZFNs), or transcription activator-like effector nucleases (TALENs) are available for producing targeted genome modification, but these systems tend to have low specificity and employ designed nucleases that need to be redesigned for each target site, which renders them costly and time-consuming to prepare.
- ZFNs designer zinc finger nucleases
- TALENs transcription activator-like effector nucleases
- CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
- Cas nuclease is generally large in size, making it difficult to deliver the CRISPR-Cas systems into a cell.
- Transposition has a key role in reshaping genomes of all living organisms. Insertion sequences of IS200/IS605 and IS607 families are among the simplest mobile genetic elements and contain only the genes that are required for their transposition and its regulation. These elements encode tnpA transposase, which is essential for mobilization, and often carry an accessory tnpB gene, which is dispensable for transposition.
- a TnpB protein (ISDra2 TnpB) has been reported to have the activity of RNA-guided DNA endonuclease (Karvelis et al., Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease).
- TnpB is a functional progenitor of CRISPR–Cas nucleases and is established as a prototype of a new system for genome editing. TnpB proteins are generally much smaller than Cas proteins and are thus more convenient to deliver into a cell.
- the ISDra2 TnpB needs to recognize a transposon-associated motif (TAM) of TTGAT for effecting the RNA-guided cleavage. It is known that a longer sequence is generally present in the genome with a lower frequency. Therefore, the use of the ISDra2 TnpB in genome editing will be limited.
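The frequency argument above can be made concrete with a short calculation. The sketch below assumes an idealized sequence with independent, equally frequent bases, which real genomes only approximate; the function names are illustrative and do not come from the disclosure.

```python
def expected_spacing(motif_length: int) -> int:
    """Average distance in bp between occurrences of one fixed motif on one
    strand, assuming independent and equally frequent bases (an idealization)."""
    return 4 ** motif_length


def count_motif(sequence: str, motif: str) -> int:
    """Count occurrences (overlaps allowed) of motif on the given strand."""
    sequence, motif = sequence.upper(), motif.upper()
    width = len(motif)
    return sum(
        1
        for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] == motif
    )


if __name__ == "__main__":
    # TTGAT is the 5-nt TAM of ISDra2 TnpB cited in the text.
    print(expected_spacing(len("TTGAT")))  # 1024
    # A hypothetical 4-nt motif is expected four times as often.
    print(expected_spacing(4))  # 256
    print(count_motif("AATTGATCCTTGATTTGAT", "TTGAT"))  # 3
```

Under this idealization, each extra nucleotide in the motif makes a given site four times rarer, which is why a shorter or more degenerate TAM widens the set of targetable positions.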
- TAM transposon-associated motif
- the present disclosure identifies a number of TnpB polypeptides having the activity of RNA-guided DNA endonuclease.
- the identification of the TnpB polypeptides increases the possibility of editing various genomic regions that are not accessible to the TnpB polypeptide of the prior art.
- the TnpB polypeptides of the present disclosure provide an editing efficiency higher than the prior art TnpB polypeptide, and even comparable to Cas9 nuclease.
- the present disclosure provides a recombinant gene editing system comprising
- TnpB polypeptide or a functional fragment thereof or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and
- gRNA guide RNA
- the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
- the present disclosure provides a composition comprising
- a target double-stranded DNA comprising a nucleotide sequence of interest and a TAM recognized by the TnpB polypeptide
- gRNA recombinant guide RNA
- the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
- the present disclosure provides a method of introducing a double-strand break into a polynucleotide of interest comprising a step of contacting the polynucleotide with a recombinant gene editing system comprising
- TnpB polypeptide or a functional fragment thereof or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and
- gRNA guide RNA
- the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
- the present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell a recombinant gene editing system comprising
- TnpB polypeptide or a functional fragment thereof or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and
- gRNA guide RNA
- the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
- the present disclosure provides a modified TnpB polypeptide comprising a modification in the DDE motif as compared to the parent TnpB polypeptide, wherein the parent polypeptide has an activity of RNA-guided endonuclease, and wherein the modified TnpB is deprived of the activity of cleaving double-stranded DNA.
- the present disclosure provides a recombinant system comprising
- modified TnpB polypeptide of the fifth aspect or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the modified TnpB polypeptide or the functional fragment thereof, and
- gRNA guide RNA
- the present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell a recombinant system of the sixth aspect and a gene editing system targeting the genomic sequence, wherein the nucleotide sequence of interest is next to the genomic sequence.
- the present disclosure provides a fusion polypeptide comprising a TnpB polypeptide or a functional fragment thereof or a disarmed TnpB polypeptide fused to a fusion partner, wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, wherein the TnpB polypeptide has an activity of RNA-guided endonuclease, and wherein the disarmed TnpB polypeptide is deprived of the activity of cleaving double-stranded DNA.
- the present disclosure provides a gene editing system comprising
- fusion polypeptide of the present disclosure or a polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide
- gRNA guide RNA
- the present disclosure provides a method of modifying a genomic sequence in a eukaryotic cell, comprising a step of introducing the gene editing system of the ninth aspect into the eukaryotic cell, wherein the gRNA comprises a targeting region capable of hybridizing to a portion of the genomic sequence.
- the present disclosure provides a method of screening TnpB polypeptide for the activity of cleaving double-stranded DNA comprising the steps of:
- gRNA comprising a targeting region and a backbone region, wherein the backbone region comprises 100-350 nucleotides before the 3’ end of the IS, which naturally comprises the nucleotide sequence encoding the TnpB polypeptide;
- a target DNA comprising a nucleotide sequence that hybridizes to the nucleotide sequence of the targeting region and a TAM recognized by the TnpB polypeptide, wherein the TAM consists of four or five consecutive nucleotides adjacent to the 5’ end of the IS;
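The way the backbone region and the candidate TAM are located relative to the insertion sequence (IS) can be sketched as simple coordinate arithmetic. This is a minimal illustration under stated assumptions, not the screening procedure of the disclosure: it assumes the IS lies on the forward strand, reads "adjacent to the 5’ end of the IS" as the nucleotides immediately upstream of the IS, and reads "before the 3’ end of the IS" as the last nucleotides inside the IS. The function names and the toy sequence are hypothetical.

```python
def tam_candidates(genome: str, is_start: int) -> dict:
    """Return the 4-nt and 5-nt sequences immediately upstream of the IS.

    is_start is the 0-based index of the first IS nucleotide (assumption:
    the IS is annotated on the forward strand).
    """
    if is_start < 5:
        raise ValueError("need at least 5 nt of flanking sequence")
    return {
        4: genome[is_start - 4:is_start],
        5: genome[is_start - 5:is_start],
    }


def backbone_dna(genome: str, is_end: int, length: int) -> str:
    """Return the last `length` nucleotides of the IS as DNA.

    is_end is the 0-based index one past the last IS nucleotide; length is
    restricted to the 100-350 nt range stated in the text.
    """
    if not 100 <= length <= 350:
        raise ValueError("backbone length must be 100-350 nt")
    if is_end - length < 0:
        raise ValueError("IS is shorter than the requested backbone")
    return genome[is_end - length:is_end]


if __name__ == "__main__":
    # Toy sequence: 8 nt of flank, a 400-nt placeholder "IS", 2 nt of flank.
    genome = "ACGTTGAT" + "C" * 400 + "GG"
    print(tam_candidates(genome, is_start=8))
    print(len(backbone_dna(genome, is_end=408, length=100)))
```

Both candidate TAM lengths are returned because the text allows either four or five nucleotides; which one a given TnpB polypeptide actually recognizes would have to be determined experimentally.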
- Fig. 1 shows the maps of the pair of plasmids for screening TnpB polypeptide having the activity of RNA-guided DNA endonuclease, comprising the test plasmid (encoding a TnpB polypeptide and a gRNA, Fig. 1A) and the reporter plasmids (comprising a target sequence, Fig. 1B).
- Fig. 2 shows the results of the screening for TnpB polypeptide having the activity of RNA-guided DNA endonuclease.
- Fig. 3 shows the depletion ratio of TnpB polypeptides and dTnpB polypeptides (the TnpB polypeptides with the DDE motif substituted by alanine).
- Fig. 4 shows the alignment of the amino acid sequences of TnpB polypeptides having the activity of RNA-guided DNA endonuclease.
- Fig. 5 shows the amino acid sequences of TnpB polypeptides having the activity of RNA-guided DNA endonuclease, and the DDE motif in the amino acid sequences (the residues in bold and underlined).
- Fig. 6 shows the structure and mechanism of the fluorescence-reporting system.
- Fig. 7 shows the maps of the plasmids for detecting the RNA-guided cleavage in 293T cells with the fluorescence-reporting system, including a plasmid encoding the fluorescence-reporting system (A), a plasmid encoding the TnpB polypeptide (B), and a plasmid encoding gRNA (C).
- Fig. 8 shows the results of flow cytometry for detecting the expression of GFP which indicates the RNA-guided cleavage in the fluorescence-reporting system by ISTfu1 TnpB, ISDge10 TnpB, ISAba30 TnpB, ISAam1 and ISYmu1 TnpB polypeptides.
- Fig. 9 shows the efficiency of editing by ISTfu1 TnpB and ISDra2 TnpB polypeptides with different TAMs in the fluorescence-reporting system.
- Fig. 10 shows the results of the surveyor assays for detecting the RNA-guided cleavage in human cells by ISTfu1 TnpB (panel A), ISDge10 TnpB (panel B), ISAba30 TnpB (panel C), ISAam1 (panel D) and ISYmu1 (panel E) TnpB polypeptides.
- Fig. 11 shows the effect of the backbone design on the RNA-guided cleavage by ISDra2 (panel A), ISTfu1 (panel B), ISDge10 (panel C), ISAba30 (panel D), ISAam1 (panel E), and ISYmu1 (panel F) TnpB polypeptides.
- Fig. 12 shows the distribution of 10 conserved residues in 25 active TnpB proteins together with ISDra2 TnpB (panel A), which are marked as asterisks and as black lines in the bottom bar with the domain architecture overlaid, and that the endonuclease activity of TnpB mutants (N to A) was sharply decreased (panel B).
- Fig. 14 shows the gRNA design for seven nucleases at ten human genomic loci. Nucleases and the corresponding TAMs are color-coded. The gRNAs are aligned according to their strand position. Taking CBLB as an example, the gRNAs overlap more for ISAam1 and the three Cas12f variants than for the other three nucleases.
- Fig. 15 shows the comparison of editing efficiency of two TnpB systems and five Cas nucleases at 10 genomic loci in human HEK293T (panel A) and HCT116 (panel B) cells, and the comparison of editing efficiency of ISAam1 (panel C) or ISYmu1 (panel D) relative to five Cas nucleases at three genomic loci in HEK293T cells.
- each dot represents the average efficiency of three biological replicates.
- the distribution is shown as a box plot, where the box indicates the median (middle line) and the interquartile range (IQR, box limits), and the whiskers show the values from minimum to maximum.
- the gRNA design is shown on the left panel, and editing efficiency shown on the right panel.
- the seven nucleases are color-coded. Since it is impossible to design overlapping gRNAs targeting the same location across all seven nucleases, two groups of overlapping gRNAs were designed separately: one for ISAam1, the three Cas12f variants and Nme2-C.NR, and one for ISAam1 and SaCas9. The same situation applied to ISYmu1.
- TnpB polypeptide refers to a polypeptide encoded by the tnpB gene in an insertion sequence (IS).
- “TnpB endonuclease”, “TnpB effector” and “TnpB nuclease” are used interchangeably herein and refer to the TnpB polypeptide having an activity of RNA-guided endonuclease.
- a TnpB polypeptide is generally 300-500 amino acid residues in size.
- TnpB polypeptide “derived from” a microorganism refers to the TnpB polypeptide naturally occurring in the microorganism, including TnpB polypeptide that can be found in an online database such as the National Center for Biotechnology Information (NCBI), and the natural variants thereof.
- NCBI National Center for Biotechnology Information
- “polypeptide” and “protein” are used interchangeably herein and refer to a polymer of amino acids, including full-length proteins and fragments thereof.
- Endonucleases are enzymes that cleave the phosphodiester bond within a polynucleotide chain, and include restriction endonucleases that cleave DNA at specific sites without damaging the bases.
- endonucleases include, but are not limited to, restriction endonucleases, meganucleases, TAL effector nucleases (TALENs), zinc finger nucleases, and Cas (CRISPR-associated) effector endonucleases.
- the present disclosure provides novel RNA-guided TnpB endonucleases.
- nucleic acid means a polynucleotide and includes a single- or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases. Nucleic acids may also include fragments and modified nucleotides. Thus, the terms “polynucleotide”, “nucleic acid sequence”, “nucleotide sequence” and “nucleic acid fragment” are used interchangeably to denote a polymer of RNA and/or DNA and/or RNA-DNA that is single- or double-stranded, optionally comprising synthetic, non-natural, or altered nucleotide bases.
- Nucleotides are referred to by their single letter designation as follows: “A” for adenosine or deoxyadenosine (for RNA or DNA, respectively), “C” for cytidine or deoxycytidine, “G” for guanosine or deoxyguanosine, “U” for uridine, “T” for deoxythymidine, “R” for purines (A or G), “Y” for pyrimidines (C or T), “K” for G or T, “H” for A or C or T, “I” for inosine, and “N” for any nucleotide.
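The single-letter designations listed above include degenerate codes, which matter when a recognition motif such as a TAM is written with ambiguous positions. The following is a minimal sketch of matching a DNA sequence against such a motif. It covers only the DNA codes named in the text (inosine and uridine are left out for simplicity), and the table and function names are illustrative.

```python
# Degenerate codes named in the text, mapped to the DNA bases they allow.
CODE_TO_BASES = {
    "A": "A",
    "C": "C",
    "G": "G",
    "T": "T",
    "R": "AG",    # purines
    "Y": "CT",    # pyrimidines
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if every base of sequence is allowed by the code at the
    same position of pattern. Lengths must be equal."""
    if len(pattern) != len(sequence):
        return False
    return all(
        base in CODE_TO_BASES[code]
        for code, base in zip(pattern.upper(), sequence.upper())
    )


if __name__ == "__main__":
    print(matches("TTGAT", "TTGAT"))  # True: exact match
    print(matches("NTGAY", "ATGAC"))  # True: N allows A, Y allows C
    print(matches("RK", "CT"))        # False: R does not allow C
```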
- genome, as it applies to a prokaryotic or eukaryotic cell or organism, encompasses not only chromosomal DNA found within the nucleus, but also organelle DNA found within subcellular components (e.g., mitochondria or plastids) of the cell.
- genome refers to the entire complement of genetic material (genes and non-coding sequences) that is present in each cell of an organism, or virus or organelle; and/or a complete set of chromosomes inherited as a (haploid) unit from one parent.
- selectively hybridizes means hybridization, preferably under stringent hybridization conditions, of a nucleic acid sequence to a specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid sequences, and the substantial exclusion of non-target nucleic acids.
- Selectively hybridizing sequences typically have at least about 80% sequence identity, or 90% sequence identity, up to and including 100% sequence identity (i.e., fully complementary) with each other.
- “stringent conditions” or “stringent hybridization conditions” includes reference to conditions under which a probe will selectively hybridize to its target sequence in an in vitro hybridization assay. Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified which are 100% complementary to the probe (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing).
- various conditions for hybridization including stringent hybridization conditions and highly stringent hybridization conditions.
- homology refers to DNA sequences that are similar.
- a "region of homology to a genomic region” that is found on the donor DNA is a region of DNA that has a similar sequence to a given "genomic region” in the cell or organism genome.
- a region of homology can be of any length that is sufficient to promote homologous recombination at the cleaved target site.
- the region of homology can comprise 5-3000 or more bases, such as at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, or 3000 in length to enable the homologous recombination with the corresponding genomic region.
- a “genomic region” is a segment on a chromosome or organelle DNA of a cell that is present either upstream or downstream of the target site or, alternatively, also comprises a portion (at either 5’ or 3’ end) of the target site.
- the genomic region can comprise 5-3000 or more bases, such as at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, or 3000 in length to enable the homologous recombination with the corresponding region of homology.
- homologous recombination means the exchange of DNA fragments between two DNA molecules at the sites of homology.
- the frequency of homologous recombination is influenced by a number of factors.
- the amount of homologous recombination and the relative proportion of homologous to non-homologous recombination vary in different organisms.
- the length of the region of homology affects the frequency of homologous recombination events: the longer the region of homology, the greater the frequency.
- the homologous recombination needs a certain length of the homologous region, which is species-variable.
- “Sequence identity” or “identity” in the context of nucleotide or amino acid sequences refers to the nucleotide bases or amino acid residues in two sequences that are the same when aligned for maximum correspondence over a specified comparison window.
- the term "percentage of sequence identity” refers to the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the nucleotide or amino acid sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences.
- the percentage sequence identity is calculated by dividing the number of matched positions (i.e., positions at which the nucleotide bases or amino acid residues in the two sequences are identical) by the total number of positions in the window of comparison and multiplying the result by 100. For example, when aligning two sequences, if 950 positions in two sequences, which are optimally aligned in a comparison window of 1000 positions, are identical, the sequences are 95% identical to each other.
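The calculation described above can be sketched in a few lines. The example assumes the two sequences have already been optimally aligned by some other tool and are passed in as equal-length strings with "-" marking gaps, so that the whole alignment serves as the comparison window; the function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both sequences carry the
    same residue. Columns containing a gap never count as matches but do
    count toward the window length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1
        for x, y in zip(aligned_a, aligned_b)
        if x == y and x != gap
    )
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # 950 identical positions in a 1000-position window, as in the text.
    a = "A" * 950 + "C" * 50
    b = "A" * 950 + "G" * 50
    print(percent_identity(a, b))  # 95.0
```

Alignment tools differ in how they treat gap columns and which window they report, so values from a real alignment program may not agree exactly with this simplified count.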
- BLAST is a searching algorithm provided by NCBI used to find regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches to identify sequences having sufficient similarity to a query sequence such that the similarity would not be predicted to have occurred randomly. BLAST reports the identified sequences and their local alignment to the query sequence. It is well understood by one skilled in the art that many levels of sequence identity are useful in identifying polypeptides from other species or modified naturally or synthetically wherein such polypeptides have the same or similar function or activity. Useful examples of percent identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or any percentage from 50% to 100%.
- any amino acid identity from 50% to 100% may be useful in describing the present disclosure, such as 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%.
- “centimorgan” or “map unit” is the distance between two polynucleotide sequences, linked genes, markers, target sites, loci, or any pair thereof, wherein 1% of the products of meiosis are recombinant.
- a centimorgan is equivalent to a distance equal to a 1% average recombination frequency between the two linked genes, markers, target sites, loci, or any pair thereof.
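As a small worked illustration of this definition, the map distance between two loci can be estimated as the percentage of recombinant products among all scored products of meiosis. The sketch below applies only that direct proportionality, which is a reasonable approximation for closely linked loci; the function name and the counts are hypothetical.

```python
def map_distance_cm(recombinant: int, total: int) -> float:
    """Estimate the map distance in centimorgans as the percentage of
    recombinant products among all scored meiotic products."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need 0 <= recombinant <= total and total > 0")
    return 100.0 * recombinant / total


if __name__ == "__main__":
    # 1 recombinant among 100 products corresponds to 1 centimorgan.
    print(map_distance_cm(1, 100))    # 1.0
    print(map_distance_cm(25, 1000))  # 2.5
```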
- an "isolated" polynucleotide, polypeptide, or protein is substantially or essentially free from components that normally accompany or interact with the polynucleotide, polypeptide, or protein as found in its naturally occurring environment.
- an isolated polynucleotide or polypeptide or protein is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized.
- an "isolated" polynucleotide is free of sequences that naturally flank the polynucleotide (i.e., sequences located at the 5' and 3' ends of the polynucleotide) in the genomic DNA of the organism from which the polynucleotide is derived.
- Isolated polynucleotides and polypeptides may be purified from a cell in which they naturally occur.
- the methods for isolating or purifying polynucleotides or polypeptides are known to a person skilled in the art. The term also embraces recombinant or chemically synthesized polynucleotides and polypeptides.
- fragment refers to a contiguous set of nucleotides or amino acids. In one embodiment, a fragment comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 contiguous nucleotides. In one embodiment, a fragment comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 contiguous amino acids. A fragment may or may not exhibit the function of a sequence sharing some percent identity over the length of said fragment.
- the term "functional fragment” refers to a portion of an isolated polynucleotide or polypeptide that displays the same activity or function as the longer or full-length sequence from which it derives.
- gene includes a nucleic acid fragment that expresses a functional molecule such as, but not limited to, a specific protein, including regulatory sequences preceding (5' non-coding sequences) and following (3' non-coding sequences) the coding sequence.
- endogenous means a sequence or other molecule that naturally occurs in a cell or organism.
- An endogenous polynucleotide is normally found in the genome of a cell; that is, not heterologous.
- heterologous refers to the difference between the original environment, location, or composition of a particular polynucleotide or polypeptide and its current environment, location, or composition.
- Non-limiting examples include differences in taxonomic derivation (e.g., a polynucleotide obtained from species A would be heterologous if inserted into the genome of species B, or of a different variety or cultivar of species A; or a polynucleotide obtained from a bacterium and introduced into a cell of a plant or an animal), or sequence (e.g., a polynucleotide obtained from species A, isolated, modified, and re-introduced into a plant of species A).
- an "allele” is one of several alternative forms of a gene occupying a given locus on a chromosome. When all the alleles present at a given locus on a chromosome in a cell or an organism are the same, the cell or organism is homozygous at that locus. If the alleles present at a given locus on a chromosome differ, the cell or organism is heterozygous at that locus.
- Coding sequence refers to a nucleotide sequence which codes for a specific amino acid sequence.
- Regulatory sequences refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence, which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include, but are not limited to, promoters, translation leader sequences, 5' untranslated sequences, 3' untranslated sequences, introns, polyadenylation signal sequences, RNA processing sites, effector binding sites, and stem-loop structures.
- a “mutated gene” is a gene that has been altered through human intervention.
- a “mutated gene” has a sequence that differs from the sequence of the corresponding non-mutated gene by the addition, deletion, insertion or substitution of at least one nucleotide.
- the mutated gene comprises an alteration that results from a guide polynucleotide/TnpB endonuclease system as disclosed herein.
- a mutated organism is an organism comprising a mutated gene.
- a "targeted mutation” is a mutation in a gene that is made in a target sequence within the gene using any method known to a person skilled in the art, including a method involving a guided TnpB endonuclease system as disclosed herein.
- knock-out refers to a DNA sequence in a cell that has been rendered partially or completely inoperative, e.g., by targeting with a TnpB protein of the present disclosure; for example, a DNA sequence prior to knock-out could have encoded an amino acid sequence, or could have had a regulatory function (e.g., promoter) .
- knock-in represents the replacement or insertion of a DNA sequence at a specific site in the genome of a cell by targeting with a TnpB protein (for example by homologous recombination (HR) , wherein a suitable donor DNA polynucleotide is also used) .
- the knock-in can be a specific insertion of a heterologous nucleotide sequence that encodes an amino acid sequence or a functional RNA, or a specific insertion of a transcriptional regulatory element.
- domain means a contiguous stretch of nucleotides (that can be RNA, DNA, and/or RNA-DNA-combination sequence) or contiguous or non-contiguous amino acids.
- a “conserved domain” or “motif” means a set of nucleotides or amino acids conserved at specific positions along an aligned sequence of evolutionarily related genes or proteins. While nucleotides or amino acids at other positions can vary between homologous proteins, nucleotides or amino acids that are highly conserved at specific positions indicate amino acids that are essential for the structure, the stability, or the function of a polynucleotide or protein.
- a "codon-optimized” nucleotide sequence is a nucleotide sequence having its frequency of codon usage designed to mimic the frequency of preferred codon usage of the host cell.
- An “optimized” polynucleotide comprises a nucleotide sequence that has been optimized for improved expression in a particular heterologous host cell.
- a “promoter” is a nucleotide sequence involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, and/or comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of some variation may have identical promoter activity.
- promoters that cause a gene to be expressed in most tissues or cell types at most times are commonly referred to as “constitutive promoters”.
- the term “inducible promoter” or “regulated promoter” refers to a promoter that selectively expresses a coding sequence or functional RNA in response to the presence of an endogenous or exogenous stimulus, for example by chemical compounds (chemical inducers) or in response to environmental, hormonal, chemical, and/or developmental signals.
- Inducible or regulated promoters include, for example, promoters induced or regulated by light, heat, stress, flooding or drought, salt stress, osmotic stress, phytohormones, wounding, or chemicals such as ethanol, abscisic acid (ABA) , jasmonate, salicylic acid, or safeners.
- An “enhancer” is a nucleotide sequence that can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the activity or tissue-specificity of a promoter.
- translation leader sequence refers to a nucleotide sequence located between the promoter sequence and the coding sequence.
- the translation leader sequence is present in the mRNA upstream of the start codon.
- the translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency.
- 3' non-coding sequences which can be exchanged with "transcription terminator” or “termination sequences” refer to nucleotide sequences located downstream of a coding sequence and include polyadenylation signal sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression.
- the polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3' end of the mRNA precursor.
- RNA transcript refers to the product resulting from transcription of a DNA sequence catalyzed by RNA polymerase. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript or pre-mRNA. An RNA transcript derived from post-transcriptional processing of the pre-mRNA is referred to as mature RNA or messenger RNA (mRNA). “Messenger RNA” or “mRNA” refers to the RNA that can be translated into protein and does not comprise introns. “cDNA” refers to a DNA that is complementary to, and synthesized from, an mRNA template using reverse transcriptase.
- the cDNA can be single-stranded or converted into double-stranded form using, e.g., the Klenow fragment of DNA polymerase I.
- Sense RNA refers to RNA transcript that includes the mRNA and can be translated into protein within a cell or in vitro.
- Antisense RNA refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA, and that can block the expression of a target gene. The complementarity of an antisense RNA may be with any part of the specific gene transcript, i.e., at the 5' non-coding sequence, 3' non-coding sequence, introns, or the coding sequence.
- “Functional RNA” refers to antisense RNA, ribozyme RNA, or other RNA that may not be translated but yet has an effect on cellular processes.
- complement and “reverse complement” are used interchangeably herein with respect to mRNA transcripts, and are meant to define the antisense RNA of the message.
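As a minimal illustration of the reverse-complement relationship described above, the following Python sketch derives the antisense RNA of a short mRNA sequence. The function name and the example sequence are hypothetical and chosen only for illustration.

```python
# Watson-Crick pairing for the four standard RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(mrna):
    """Return the reverse complement (antisense RNA) of an mRNA, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


# Hypothetical six-nucleotide message used only as an example.
antisense = reverse_complement_rna("AUGGCC")
print(antisense)
```

Applying the function twice returns the original message, which is a quick sanity check on the pairing table.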
- operably linked refers to the association of nucleotide sequences on a single nucleic acid fragment so that the function of one is regulated by the other.
- a promoter is operably linked with a coding sequence when it is capable of regulating the expression of the coding sequence (i.e., the coding sequence is transcribed under the control of the promoter) .
- Coding sequences can be operably linked to regulatory sequences in a sense or antisense orientation.
- a “host cell” refers to an in vivo or isolated eukaryotic cell, prokaryotic cell (e.g., bacterial or archaeal cell) , or cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, into which a heterologous polynucleotide or polypeptide has been introduced.
- the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell and an animal cell, such as an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, an insect cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
- the cell is isolated.
- the cell is in vivo.
- recombinant refers to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis, or by genetic engineering techniques.
- Plasmid and "vector” refer to a linear or circular extra chromosomal element often carrying genes that are not part of the central metabolism of the cell, and usually in the form of double-stranded DNA.
- Such elements may be autonomously replicating sequences, genome integrating sequences, phage, or nucleotide sequences, in linear or circular form, of a single-or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a polynucleotide of interest into a cell.
- nucleic acid construct when referring to nucleic acid molecules, comprises an artificial combination of nucleic acid sequences, e.g., regulatory and coding sequences that are not all found together in nature.
- when a nucleic acid construct contains the control sequences required to express the coding sequence of the present invention, the term is synonymous with the term “expression cassette”.
- a nucleic acid construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. Such a construct may be used by itself or may be used in conjunction with a vector.
- the vector for expressing a coding sequence (e.g., comprising an expression construct) is referred to as “expression vector” .
- expression refers to the production of a functional end-product (e.g., an mRNA, guide RNA, or a protein) in either precursor or mature form.
- a “mature” protein refers to a post-translationally processed polypeptide (i.e., one from which any pre-or propeptides present in the primary translation product have been removed) .
- Precursor protein refers to the primary product of translation of mRNA (i.e., with pre-and propeptides still present) . Pre-and propeptides may be but are not limited to intracellular localization signals.
- an "effector” or “effector protein” is a protein that encompasses an activity including recognizing, binding to, and/or cleaving or nicking a polynucleotide target.
- An effector, or effector protein may also be an endonuclease, such as the TnpB polypeptide of the invention.
- the "effector complex" of a gene editing system includes TnpB polypeptide involved in gRNA and target recognition and binding.
- a “functional fragment" of a TnpB endonuclease refers to a portion of the TnpB endonuclease of the present disclosure in which the ability to recognize, bind to, and/or cleave (introduce a double-strand break in) the target site is retained.
- the "functional variant" of a TnpB endonuclease refers to a variant of the TnpB endonuclease disclosed herein in which the ability to recognize, bind to, and/or cleave a target sequence is retained.
- a TnpB endonuclease may also include a multifunctional TnpB endonuclease, which refers to a single polypeptide that has endonuclease activity (comprising at least one protein domain that can act as an endonuclease) and at least one other functionality, such as but not limited to, the functionality to form a complex (comprises at least a second protein domain that can form a complex with other proteins).
- the term “guide polynucleotide” relates to a polynucleotide that can form a complex with a TnpB endonuclease, such as the TnpB endonuclease described herein, and enables the TnpB endonuclease to recognize, optionally bind to, and optionally cleave a DNA target site.
- the guide polynucleotide can be a guide RNA, a guide DNA sequence, or a combination thereof (an RNA-DNA combination molecule) .
- the guide RNA is also referred to as “right element RNA” or “reRNA” .
- a “functional fragment” of a guide polynucleotide refers to a portion or subsequence of the guide polynucleotide of the present disclosure in which the ability to function as a guide polynucleotide is retained.
- a “functional variant” of a guide polynucleotide refers to a variant of the guide polynucleotide of the present disclosure in which the ability to function as a guide polynucleotide is retained.
- targeting domain and “targeting region” are used interchangeably herein and include a nucleotide sequence that can hybridize (is complementary) to one strand (nucleotide sequence) of a double strand DNA target site.
- the percent complementation between the targeting region and the target sequence can be at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%.
- variable targeting region can be at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides in length. In some embodiments, the variable targeting domain comprises a contiguous stretch of 12 to 30 nucleotides.
- the targeting domain can be composed of a DNA sequence, a RNA sequence, a modified DNA sequence, a modified RNA sequence, or any combination thereof.
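The percent complementation and length limits stated for the targeting region can be checked with a short calculation. The sketch below is one possible way to do so, assuming an ungapped, equal-length comparison in which the targeting region pairs antiparallel with the target strand; the function names and example sequences are hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(targeting_region, target_strand):
    """Percentage of positions at which the targeting region pairs with the target strand.

    targeting_region: RNA or DNA written 5'->3' (U is treated like T).
    target_strand: the DNA strand it hybridizes to, written 5'->3'.
    Assumption: both have the same length and no gaps are considered.
    """
    guide = targeting_region.upper().replace("U", "T")
    target = target_strand.upper()
    if len(guide) != len(target):
        raise ValueError("sequences must have equal length")
    # Antiparallel pairing: guide position i faces target position len-1-i.
    paired = sum(DNA_COMPLEMENT[g] == t for g, t in zip(guide, reversed(target)))
    return 100.0 * paired / len(guide)


def has_valid_length(targeting_region, minimum=12, maximum=30):
    """Check the 12 to 30 nucleotide range given for the variable targeting domain."""
    return minimum <= len(targeting_region) <= maximum


# Hypothetical 14-nt example: a perfect match and a single-mismatch variant.
target = "GATTACAGATTACA"
perfect = percent_complementarity("UGUAAUCUGUAAUC", target)
one_mismatch = percent_complementarity("AGUAAUCUGUAAUC", target)
print(perfect, round(one_mismatch, 2))
```

A single mismatch in a 14-nucleotide targeting region corresponds to 13 of 14 paired positions, which is still well above the 50% lower bound listed above.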
- the "backbone" of a guide polynucleotide comprises a nucleotide sequence that interacts with a TnpB polypeptide.
- gRNA/TnpB complex refers to an RNA component and a TnpB endonuclease that are capable of forming a complex, wherein the complex can direct the TnpB endonuclease to a DNA target site, enabling the TnpB endonuclease to recognize, bind to, and/or cleave (introduce a double-strand break) the DNA target site.
- target site refers to a nucleotide sequence on a chromosome, episome, a locus, or any other DNA molecule in the genome (including chromosomal, chloroplastic, mitochondrial DNA, plasmid DNA) of a cell, at which a gRNA/TnpB complex can recognize, bind to, and optionally nick or cleave.
- the target site can be an endogenous site in the genome of a cell, or alternatively, the target site can be heterologous to the cell and thereby not be naturally occurring in the genome of the cell, or the target site can be found in a heterologous genomic location compared to where it occurs in nature.
- a “transposon-associated motif” (TAM) herein refers to a short nucleotide sequence adjacent to a target sequence that is recognized (targeted) by a gRNA/TnpB complex described herein.
- the TnpB endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not adjacent to a TAM sequence.
- the sequence and length of a TAM herein can differ depending on the TnpB protein.
- the TAM sequence is typically 4 or 5 nucleotides long.
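To make the TAM requirement concrete, the following sketch scans one strand of a DNA sequence for a given TAM and reports the candidate target that follows it. Several points are assumptions not stated in this section: that the TAM lies immediately 5' of the target on the scanned strand, that only exact TAM matches on a single strand are considered, and that a fixed target length is used. The function name and the example sequence are hypothetical.

```python
def find_tam_adjacent_targets(dna, tam, target_length=20):
    """Return (tam_start, target) pairs for each TAM followed by a full-length target.

    Assumption: the target sequence starts immediately 3' of the TAM on the
    scanned strand. Positions are 0-based.
    """
    dna = dna.upper()
    tam = tam.upper()
    sites = []
    start = dna.find(tam)
    while start != -1:
        target_start = start + len(tam)
        target = dna[target_start:target_start + target_length]
        # Keep only sites with enough sequence left for a full-length target.
        if len(target) == target_length:
            sites.append((start, target))
        start = dna.find(tam, start + 1)
    return sites


# Hypothetical sequence containing one TTGAT motif.
example = "GGTTGATACGTACGTACGTACGTACGTCC"
sites = find_tam_adjacent_targets(example, "TTGAT", target_length=12)
print(sites)
```

A fuller search would also scan the reverse complement strand and accept degenerate motifs such as TTAN.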
- a “modified” TnpB polypeptide/endonuclease refers to a TnpB polypeptide comprising the substitution, deletion, insertion or addition of at least one amino acid when compared to the initial or wildtype TnpB polypeptide. If the modified TnpB is deprived of the activity of cleaving the DNA molecule while the ability of recognizing and binding to polynucleotide is retained, the modified TnpB polypeptide can be referred to as a “disarmed” TnpB polypeptide.
- altered target site refers to a target sequence as disclosed herein that comprises at least one alteration when compared to non-altered target sequence.
- alteration includes, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, (iv) a chemical alteration of at least one nucleotide, or (v) any combination of (i) - (iv) .
- a “modified nucleotide” or “edited nucleotide” refers to a nucleotide sequence of interest that comprises at least one alteration when compared to its non-modified nucleotide sequence.
- Such "alterations” include, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, (iv) a chemical alteration of at least one nucleotide, or (v) any combination of (i) - (iv) .
- Methods for "modifying a target site” and “altering a target site” are used interchangeably herein and refer to methods for producing an altered target site.
- donor DNA is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a gRNA/TnpB complex of the invention.
- polynucleotide modification template includes a polynucleotide that comprises at least one nucleotide modification when compared to the nucleotide sequence to be edited.
- a nucleotide modification can be the substitution, addition, insertion or deletion of at least one nucleotide.
- the polynucleotide modification template can further comprise homologous nucleotide sequences flanking the at least one nucleotide modification, wherein the flanking homologous nucleotide sequences provide sufficient homology to the desired nucleotide sequence to be edited.
- the term "before” in reference to a sequence position, refers to an occurrence of one sequence upstream of another sequence (at the 5’ end for nucleotide sequences, or at the N terminus for the amino acid sequences) .
- the term “after” in reference to a sequence position refers to an occurrence of one sequence downstream of another sequence (at the 3’ end for nucleotide sequences, or at the C terminus for the amino acid sequences) .
- TnpB polypeptides encoded by the TnpB gene in the insertion sequences (IS) . That is, the TnpB polypeptides can work as the effector protein in a gene editing system.
- the active TnpB polypeptides comprise an N-terminal helix-turn-helix (HTH) domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
- the present disclosure provides an isolated TnpB polypeptide having the activity of RNA-guided endonuclease or a functional fragment thereof.
- the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
- the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis.
- the TnpB polypeptide is encoded by a tnpB gene in an insertion sequence (IS) selected from a group consisting of ISEfa4, ISAs26, ISCpe2, ISMma22, ISBce3, ISAeme8, ISTfu1, ISCco1, ISSoc3, ISTel2, ISNsp3, ISCbt1, ISMac7, ISEc46, ISSen6, ISHahl1, ISKpn69, ISDge10, ISKpn85, ISNsp2, ISAba30, ISRor9, ISAam1, ISYmu1, ISCytsp1, ISCvi1, ISCvi2, ISAepa1 and ISBth16.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids.
- the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
- the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
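The identity thresholds above presuppose some way of computing percent identity, which this section does not specify. The sketch below shows one common convention, assuming the two sequences have already been aligned (with "-" marking gaps) and that identity is counted over alignment columns; other conventions use a different denominator. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: '-' marks a gap, columns that are gaps in both sequences are
    ignored, and identity is the share of remaining columns with the same residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no comparable alignment columns")
    identical = sum(a == b and a != "-" for a, b in columns)
    return 100.0 * identical / len(columns)


# Toy alignment: 5 identical columns out of 7.
identity = percent_identity("MKT-LLA", "MKTALVA")
print(round(identity, 2))
```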
- the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
- the TnpB polypeptide of the present disclosure can recognize a shorter TAM as compared to ISDra2 TnpB polypeptide.
- the TnpB polypeptide of the present disclosure can recognize a TAM consisting of four consecutive nucleotides.
- the TnpB polypeptide of the present disclosure can recognize a TAM of CCAT, CTAC, TGAC, TGAT, TTAC, TTAG, TTAA, TTAT, ACAT, TTTAT, TTTAA or TTGAT.
- the TnpB polypeptide recognizes a TAM of CCAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 2, 9, 15, 17 or 19.
- the TnpB polypeptide recognizes a TAM of CTAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 16.
- the TnpB polypeptide recognizes a TAM of TTAN, where N is any nucleotide.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 4, 5, 6, 8, 11, 12, 13, 14, 18, 20, 22, 26, 28 or 29.
- the TnpB polypeptide recognizes a TAM of TTAC.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 5, 11, 12 or 20.
- the TnpB polypeptide recognizes a TAM of TTAG.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 4, 6 or 14.
- the TnpB polypeptide recognizes a TAM of TTAA. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 22, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 8, 13, 18 or 26.
- the TnpB polypeptide recognizes a TAM of TTGAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 24.
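The TAM assignments listed above can be collected into a simple lookup. The sketch below transcribes the TAM and SEQ ID NO pairings from the preceding statements and treats N as a wildcard; the dictionary and function names are hypothetical.

```python
# TAM -> SEQ ID NOs, transcribed from the embodiments listed above.
TAM_TO_SEQ_IDS = {
    "CCAT": [2, 9, 15, 17, 19],
    "CTAC": [16],
    "TTAC": [5, 11, 12, 20],
    "TTAG": [4, 6, 14],
    "TTAA": [1, 22, 28, 29],
    "TTAT": [8, 13, 18, 26],
    "TTGAT": [24],
}


def seq_ids_for_tam(tam):
    """Return the sorted SEQ ID NOs whose listed TAM matches `tam`.

    An 'N' in the query matches any nucleotide at that position.
    """
    tam = tam.upper()
    matches = set()
    for known_tam, seq_ids in TAM_TO_SEQ_IDS.items():
        same_length = len(known_tam) == len(tam)
        if same_length and all(q in ("N", k) for q, k in zip(tam, known_tam)):
            matches.update(seq_ids)
    return sorted(matches)


print(seq_ids_for_tam("TTAN"))
```

Querying TTAN returns the union of the TTAC, TTAG, TTAA and TTAT entries, which coincides with the SEQ ID NOs listed for TTAN above.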
- the TnpB polypeptide of the present disclosure or the functional fragment thereof is capable of effecting RNA-guided cleavage in a prokaryotic and/or eukaryotic cell, preferably in both prokaryotic and eukaryotic cells.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24.
- the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- the variant may differ from SEQ ID NO: 7, 18, 21, 23 or 24 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids.
- the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
- the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
- the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
- Examples for the DDE motif include
- the present disclosure also provides a modified/disarmed TnpB polypeptide comprising a modified DDE motif.
- the DDE motif is modified by substituting at least one amino acid in the motif with a neutral amino acid or a basic amino acid.
- at least one amino acid in the motif is substituted by alanine.
- the modified TnpB polypeptide comprises
- D185A, E268A and/or D350A as compared to SEQ ID NO: 10;
- D234A, E342A and/or D436A as compared to SEQ ID NO: 12;
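Substitutions such as D185A are written as the wild-type residue, its 1-based position, and the replacement residue. The following sketch applies such a substitution to an amino acid sequence after checking that the wild-type residue is actually present at the stated position. The function name and the toy sequence are hypothetical; the real sequences are those of the referenced SEQ ID NOs.

```python
import re


def apply_substitution(sequence, mutation):
    """Apply a substitution such as 'D185A' to a protein sequence.

    The notation is read as: wild-type residue, 1-based position, new residue.
    Raises ValueError if the notation is malformed or the wild-type residue
    does not match the sequence at that position.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError("unrecognized substitution: " + mutation)
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError("position outside sequence: " + mutation)
    index = position - 1
    if sequence[index] != wild_type:
        raise ValueError("wild-type residue mismatch for " + mutation)
    return sequence[:index] + replacement + sequence[index + 1:]


# Toy five-residue sequence, used only to show the mechanics.
disarmed = apply_substitution(apply_substitution("MKDLE", "D3A"), "E5A")
print(disarmed)
```

Checking the wild-type residue first guards against applying a substitution numbered for one SEQ ID NO to a different sequence.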
- the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide is conserved at positions corresponding to N31, G179, L267, C332, C335, C351, and C354 of SEQ ID NO: 7.
- the present disclosure also provides a modified/disarmed TnpB polypeptide comprising a modification at the position corresponding to N31 of SEQ ID NO: 7.
- the modification is a substitution with alanine.
- the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide is conserved at positions corresponding to G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
- the modified TnpB polypeptide retains the ability to recognize and bind to a DNA molecule but cannot cleave the DNA molecule, e.g., double-stranded DNA; i.e., it is a disarmed TnpB polypeptide.
- the present disclosure provides a fusion polypeptide comprising a TnpB polypeptide of the present disclosure or a functional fragment thereof or a modified/disarmed TnpB polypeptide of the present disclosure, fused to a fusion partner.
- the TnpB polypeptide includes the TnpB polypeptide with the activity of RNA-guided endonuclease as described above, or the functional fragment thereof.
- the modified/disarmed TnpB polypeptide retains the ability to recognize and bind to a DNA molecule but cannot cleave the DNA molecule, e.g., double-stranded DNA.
- the fusion partner is a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone or other DNA-binding protein) associated with the target DNA.
- the fusion partner is a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.
- the fusion partner is a polypeptide that directly provides for increased transcription of the target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription regulator, etc. ) .
- the fusion partner is another polypeptide or domain, for example Clo51 or FokI nuclease, to generate double-strand breaks (Guilinger et al. Nature Biotechnology, volume 32, number 6, June 2014) .
- the fusion partner is a polypeptide that directs editing of single or multiple bases in a polynucleotide sequence, for example a site-specific deaminase that can change the identity of a nucleotide, for example from C-G to T-A or from A-T to G-C (Gaudelli et al., Programmable base editing of A-T to G-C in genomic DNA without DNA cleavage. Nature (2017); Nishida et al., Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353 (6305) (2016); Komor et al., Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage).
- the fusion polypeptide may comprise, for example, an active (double strand break creating) , partially active (nickase) or deactivated (deprived of cleaving) TnpB endonuclease and a deaminase (such as, but not limited to, a cytidine deaminase, an adenine deaminase, APOBEC1, APOBEC3A, BE2, BE3, BE4, ABEs, or the like) .
- the fusion partner includes base edit repair inhibitors and glycosylase inhibitors (e.g., uracil glycosylase inhibitor (to prevent uracil removal) ) .
- the fusion partner can be a Cas endonuclease or another TnpB endonuclease as described in the present disclosure.
- the TnpB polypeptide, the functional fragment thereof, or the modified/disarmed TnpB polypeptide of the present disclosure can also be fused to a heterologous nuclear localization sequence (NLS) .
- a heterologous NLS herein may be of sufficient strength to drive accumulation of the TnpB polypeptide, the functional fragment thereof, the modified/disarmed TnpB polypeptide or the fusion polypeptide in a detectable amount in the nucleus of a eukaryotic cell.
- An NLS may comprise one (monopartite) or more (e.g., bipartite) short sequences (e.g., 2 to 20 residues) of basic, positively charged residues (e.g., lysine and/or arginine) .
- An NLS may be operably linked to the N-terminus or C-terminus of a TnpB polypeptide, for example.
- Two or more NLS sequences can be linked to a TnpB polypeptide, for example, on both the N-and C-termini of a TnpB polypeptide.
- the guide polynucleotide enables target recognition, binding, and optionally cleavage by the TnpB polypeptide.
- the guide polynucleotide sequence can be a RNA sequence, a DNA sequence, or a combination thereof (a RNA-DNA combination sequence) .
- the guide polynucleotide can comprise at least one nucleotide, phosphodiester bond or linkage modification such as, but not limited to, Locked Nucleic Acid (LNA), 5-methyl dC, 2,6-Diaminopurine, 2'-Fluoro A, 2'-Fluoro U, 2'-O-Methyl RNA, phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 (hexaethylene glycol chain) molecule, or 5' to 3' covalent linkage resulting in circularization.
- a guide polynucleotide that solely comprises ribonucleic acids is also referred to as a "guide RNA” or “gRNA” .
- a guide polynucleotide may be engineered or synthetic.
- the gRNA for TnpB polypeptide is also referred to as “right element RNA” or “reRNA”.
- the guide polynucleotide includes a chimeric non-naturally occurring guide RNA comprising regions that are not found together in nature (i.e., they are heterologous with each other).
- a chimeric non-naturally occurring guide RNA comprises a targeting region that can hybridize to a nucleotide sequence in a target DNA, linked to a backbone region that can recognize the TnpB polypeptide, wherein the targeting region and the backbone region are not found linked together in nature.
- the targeting region is at the 3’ end of the scaffold.
- the guide polynucleotide for TnpB polypeptide is a single guide, and the backbone can be 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides before the right end (RE) of the IS, from which the TnpB is derived.
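The single-guide layout described above (a backbone taken from the 115-350 nucleotides before the right end of the IS, with the targeting region at its 3’ end) can be sketched as follows. This is a minimal illustration under stated assumptions: the input `is_sequence_to_re` is assumed to be the DNA sequence of the IS read on the transcribed sense and ending at the last nucleotide before the RE, the spacer is assumed to be supplied as DNA, and all names and the default length are hypothetical.

```python
def build_guide_rna(is_sequence_to_re: str, spacer: str, backbone_len: int = 150) -> str:
    """Assemble backbone + targeting region and return it as an RNA string (5' to 3').

    is_sequence_to_re: DNA of the IS up to (not including) the right end (assumed input).
    spacer: targeting region given as DNA (assumed input).
    backbone_len: number of nucleotides taken immediately before the RE.
    """
    if not 115 <= backbone_len <= 350:
        raise ValueError("backbone length outside the 115-350 nt range")
    if len(is_sequence_to_re) < backbone_len:
        raise ValueError("IS sequence is shorter than the requested backbone")
    if not spacer:
        raise ValueError("empty targeting region")
    backbone = is_sequence_to_re[-backbone_len:]  # nucleotides just before the RE
    guide_dna = backbone + spacer                 # targeting region at the 3' end
    return guide_dna.upper().replace("T", "U")    # write the guide as RNA


# Toy input: a 400-nt repeat standing in for a real IS sequence.
toy_is = "ACGT" * 100
toy_guide = build_guide_rna(toy_is, "GATTACA", backbone_len=120)
```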
- the guide polynucleotide can effect the target recognition, binding, and optionally cleavage by the TnpB polypeptide when removing a part or the whole of one or several stem structures in the backbone.
- the guide polynucleotide can further comprise an additional nucleotide sequence at the 5’ end of the backbone.
- the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.
- the targeting region and the backbone region are selected from the group consisting of a DNA sequence, an RNA sequence, and a combination thereof.
- the guide polynucleotide comprises RNA backbone modifications that enhance stability, DNA backbone modifications that enhance stability, or a combination thereof (see Kanasty et al., 2013, Common RNA-backbone modifications, Nature Materials 12:976-977; US20150082478 published 19 Mar. 2015 and US20150059010 published 26 Feb. 2015).
- TnpB endonuclease, the functional fragment thereof, the disarmed TnpB polypeptide and the fusion polypeptide of the present disclosure can be isolated from a native source (for TnpB polypeptide), or from a recombinant source where the host cell is genetically modified to express the nucleotide sequence encoding the polypeptide.
- the TnpB polypeptide and fusion polypeptide can be produced using cell free protein expression systems, or be synthetically produced.
- the present disclosure also provides an isolated polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide, the functional fragment thereof, the disarmed TnpB polypeptide and the fusion polypeptide of the present disclosure.
- the TnpB polypeptide, the functional fragment thereof, the disarmed TnpB polypeptide and the fusion polypeptide, as well as the guide polynucleotide can be expressed in a cell.
- Cells include, but are not limited to, human, non-human, animal, bacterial, fungal, insect, yeast, and plant cells.
- vectors and constructs including circular plasmids, and linear polynucleotides, comprising a polynucleotide of interest and optionally other components including linkers, adapters, regulatory sequences.
- the vector comprises an expression cassette encoding both the TnpB polypeptide and the guide polynucleotide.
- a recognition site and/or target site can be comprised within an intron, coding sequence, 5' UTRs, 3' UTRs, and/or regulatory regions.
- the vector comprises two expression cassettes encoding the TnpB polypeptide and the guide polynucleotide, respectively.
- the expression of the TnpB polypeptide and/or the guide polynucleotide is driven by a constitutive promoter, an inducible promoter, or a spatio-temporal specific promoter.
- the present disclosure provides a recombinant gene editing system comprising the novel TnpB polypeptide having the activity of RNA-guided endonuclease of the present disclosure.
- the recombinant gene editing system comprises:
- TnpB polypeptide of the present disclosure having the activity of RNA-guided endonuclease or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide, or the functional fragment thereof, and
- a guide polynucleotide such as a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.
- the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
- the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis.
- the TnpB polypeptide is encoded by a tnpB gene in an insertion sequence (IS) selected from a group consisting of ISEfa4, ISAs26, ISCpe2, ISMma22, ISBce3, ISAeme8, ISTfu1, ISCco1, ISSoc3, ISTel2, ISNsp3, ISCbt1, ISMac7, ISEc46, ISSen6, ISHahl1, ISKpn69, ISDge10, ISKpn85, ISNsp2, ISAba30, ISRor9, ISAam1, ISYmu1, ISCytsp1, ISCvi1, ISCvi2, ISAepa1 and ISBth16.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
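The percent-identity thresholds recited above can be made concrete with a short Python sketch. The disclosure does not state how identity is computed in this passage, so the convention used here (identical residues divided by alignment columns, over a pair of sequences that have already been aligned with `-` as the gap character) is an assumption, as are the function names.

```python
# Thresholds as recited in the text above.
IDENTITY_THRESHOLDS = (50, 60, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-aligned pair (assumed convention: matches / columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(a == b and a != "-" for a, b in columns)
    return 100.0 * matches / len(columns)


def highest_threshold_met(identity: float):
    """Return the largest recited threshold that the identity reaches, or None."""
    met = [t for t in IDENTITY_THRESHOLDS if identity >= t]
    return max(met) if met else None
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same pair, so the chosen denominator should always be stated alongside the result.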
- the variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids.
- the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
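Whether a variant is conserved at positions "corresponding to" residues of a reference can be checked from a pairwise alignment, as in the following minimal Python sketch. The position list is the one recited above; the alignment itself is assumed to have been produced elsewhere, the toy sequences in the usage lines are not SEQ ID NO: 7, and the function names are hypothetical.

```python
# 1-based positions of SEQ ID NO: 7 recited in the text above.
CONSERVED_POSITIONS = (31, 179, 181, 265, 267, 332, 335, 351, 354, 361)


def residues_at_reference_positions(ref_aln: str, var_aln: str, positions):
    """Map 1-based reference positions to (reference residue, variant residue) pairs."""
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must have equal length")
    wanted = set(positions)
    found = {}
    ref_pos = 0
    for ref_res, var_res in zip(ref_aln.upper(), var_aln.upper()):
        if ref_res == "-":
            continue  # insertion in the variant; no reference position consumed
        ref_pos += 1
        if ref_pos in wanted:
            found[ref_pos] = (ref_res, var_res)
    return found


def is_conserved(ref_aln: str, var_aln: str, positions) -> bool:
    """True if every requested reference position exists and carries the same residue."""
    pairs = residues_at_reference_positions(ref_aln, var_aln, positions)
    return len(pairs) == len(set(positions)) and all(r == v for r, v in pairs.values())


# Toy alignment (not real TnpB sequences): the variant has one inserted residue.
toy_ref = "MNG-DE"
toy_var = "MNGADQ"
toy_ok = is_conserved(toy_ref, toy_var, [2, 3, 4])
```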
- the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
- the TnpB polypeptide of the present disclosure can recognize a shorter TAM as compared to ISDra2 TnpB polypeptide.
- the TnpB polypeptide of the present disclosure can recognize a TAM consisting of four consecutive nucleotides.
- the TnpB polypeptide of the present disclosure can recognize a TAM of CCAT, CTAC, TGAC, TGAT, TTAC, TTAG, TTAA, TTAT, ACAT, TTTAT, TTTAA or TTGAT.
- the TnpB polypeptide recognizes a TAM of CCAT.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 2, 9, 15, 17 or 19.
- the TnpB polypeptide recognizes a TAM of CTAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 16.
- the TnpB polypeptide recognizes a TAM of TTAN, where N is any nucleotide.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 4, 5, 6, 8, 11, 12, 13, 14, 18, 20, 22, 26, 28 or 29.
- the TnpB polypeptide recognizes a TAM of TTAC.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 5, 11, 12 or 20.
- the TnpB polypeptide recognizes a TAM of TTAG.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 4, 6 or 14.
- the TnpB polypeptide recognizes a TAM of TTAA. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 22, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 8, 13, 18 or 26.
- the TnpB polypeptide recognizes a TAM of TTGAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 24.
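Locating candidate sites for the TAMs listed above can be sketched in Python as a motif scan that treats N as any nucleotide. The text only says the TAM is adjacent to the nucleotide sequence of interest; placing the target immediately 3’ of the TAM on the scanned strand and using a 20-nt target length are assumptions made for illustration, and the function name is hypothetical.

```python
import re

# Only the symbols needed for the TAMs recited above (A, C, G, T and N).
_IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_tam_sites(dna: str, tam: str, target_len: int = 20):
    """Return (TAM start index, matched TAM, adjacent target) for each site on one strand.

    Assumes the target lies immediately 3' of the TAM; sites too close to the
    end of the sequence to supply a full-length target are skipped.
    """
    dna = dna.upper()
    pattern = "".join(_IUPAC[base] for base in tam.upper())
    sites = []
    # A lookahead is used so that overlapping TAM occurrences are all reported.
    for match in re.finditer(f"(?=({pattern}))", dna):
        target_start = match.start() + len(tam)
        target = dna[target_start:target_start + target_len]
        if len(target) == target_len:
            sites.append((match.start(), match.group(1), target))
    return sites


# Toy sequence: one TTAC site followed by a 20-nt stretch.
toy_dna = "GGTTAC" + "A" * 20 + "CC"
toy_sites = find_tam_sites(toy_dna, "TTAN")
```

Scanning the reverse complement as well would be needed to cover both strands; that step is omitted here.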
- the TnpB polypeptide of the present disclosure or the functional fragment thereof is capable of effecting RNA-guided cleavage in a prokaryotic and/or eukaryotic cell, preferably in both prokaryotic and eukaryotic cells.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24.
- the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- the variant may differ from SEQ ID NO: 7, 18, 21, 23 or 24 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids.
- the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
- the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
- the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
- the TnpB comprises a DDE motif corresponding to
- the recombinant gene editing system comprises a first polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
- the recombinant gene editing system further comprises a heterologous polynucleotide, such as an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
- the backbone of the guide polynucleotide comprises the 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides before the right end (RE) of the IS, from which the TnpB is derived.
- the backbone is modified by removing a part or the whole of one or several stem structures in the backbone. In some embodiments, a part or the whole of the first stem structure from the 3’ end is removed.
- the guide polynucleotide comprises an additional nucleotide sequence at the 5’ end of the backbone.
- the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.
- the gRNA further comprises one or more additional protein-binding domains.
- the system comprises one or more additional effector polypeptides capable of binding to the one or more additional protein-binding domains, or the polynucleotide comprising a nucleotide sequence encoding the one or more effector polypeptides, to form one or more ribonucleoproteins in tandem.
- composition or complex comprising
- TnpB polypeptide of the present disclosure having the activity of RNA-guided endonuclease or a functional fragment thereof,
- a target double-stranded DNA comprising a nucleotide sequence of interest and a TAM recognized by the TnpB polypeptide
- a recombinant guide RNA (gRNA)
- the present disclosure also provides an isolated cell comprising
- TnpB polypeptide of the present disclosure having the activity of RNA-guided endonuclease or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide, or the functional fragment thereof,
- a target double-stranded DNA comprising a nucleotide sequence of interest and a TAM recognized by the TnpB polypeptide
- a recombinant guide RNA (gRNA)
- the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
- the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis.
- the TnpB polypeptide is encoded by a tnpB gene in an insertion sequence (IS) selected from a group consisting of ISEfa4, ISAs26, ISCpe2, ISMma22, ISBce3, ISAeme8, ISTfu1, ISCco1, ISSoc3, ISTel2, ISNsp3, ISCbt1, ISMac7, ISEc46, ISSen6, ISHahl1, ISKpn69, ISDge10, ISKpn85, ISNsp2, ISAba30, ISRor9, ISAam1, ISYmu1, ISCytsp1, ISCvi1, ISCvi2, ISAepa1 and ISBth16.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids.
- the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
- the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
- the TnpB polypeptide of the present disclosure can recognize a shorter TAM as compared to ISDra2 TnpB polypeptide.
- the TnpB polypeptide of the present disclosure can recognize a TAM consisting of four consecutive nucleotides.
- the TnpB polypeptide of the present disclosure can recognize a TAM of CCAT, CTAC, TGAC, TGAT, TTAC, TTAG, TTAA, TTAT, ACAT, TTTAT, TTTAA or TTGAT.
- the TnpB polypeptide recognizes a TAM of CCAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 2, 9, 15, 17 or 19.
- the TnpB polypeptide recognizes a TAM of CTAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 16.
- the TnpB polypeptide recognizes a TAM of TTAN, where N is any nucleotide.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 4, 5, 6, 8, 11, 12, 13, 14, 18, 20, 22, 26, 28 or 29.
- the TnpB polypeptide recognizes a TAM of TTAC.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 5, 11, 12 or 20.
- the TnpB polypeptide recognizes a TAM of TTAG.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 4, 6 or 14.
- the TnpB polypeptide recognizes a TAM of TTAA. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 22, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 8, 13, 18 or 26.
- the TnpB polypeptide recognizes a TAM of TTGAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 24.
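The TAM-to-sequence assignments recited above can be collected into a lookup table, as in this minimal Python sketch. The table contains only the pairings stated in this passage (TAMs such as TGAC, TGAT, ACAT, TTTAT and TTTAA are listed as recognizable but are not paired with a SEQ ID NO here, so they are omitted); the function name is hypothetical. Expanding TTAN into its four concrete TAMs reproduces the SEQ ID NO list given for TTAN.

```python
import itertools

# TAM -> SEQ ID NOs, exactly as paired in the passage above.
TAM_TO_SEQ_IDS = {
    "CCAT": (2, 9, 15, 17, 19),
    "CTAC": (16,),
    "TTAC": (5, 11, 12, 20),
    "TTAG": (4, 6, 14),
    "TTAA": (1, 22, 28, 29),
    "TTAT": (8, 13, 18, 26),
    "TTGAT": (24,),
}


def seq_ids_for_tam(tam: str):
    """Return the sorted SEQ ID NOs paired with a TAM; N expands to A, C, G and T."""
    choices = ["ACGT" if base == "N" else base for base in tam.upper()]
    seq_ids = set()
    for combo in itertools.product(*choices):
        seq_ids.update(TAM_TO_SEQ_IDS.get("".join(combo), ()))
    return sorted(seq_ids)


ttan_ids = seq_ids_for_tam("TTAN")
```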
- the TnpB polypeptide of the present disclosure or the functional fragment thereof is capable of effecting RNA-guided cleavage in a prokaryotic and/or eukaryotic cell, preferably in both prokaryotic and eukaryotic cells.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24.
- the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- the variant may differ from SEQ ID NO: 7, 18, 21, 23 or 24 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids.
- the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
- the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
- the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
- the TnpB comprises a DDE motif corresponding to
- the recombinant gene editing system comprises a first polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
- the composition further comprises a heterologous polynucleotide, such as an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
- the backbone of the guide polynucleotide comprises the 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides before the right end (RE) of the IS, from which the TnpB is derived.
- the backbone is modified by removing a part or the whole of one or several stem structures in the backbone. In some embodiments, a part or the whole of the first stem structure from the 3’ end is removed.
- the guide polynucleotide comprises an additional nucleotide sequence at the 5’ end of the backbone.
- the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.
- the gRNA further comprises one or more additional protein-binding domains.
- the composition or isolated cell comprises one or more additional effector polypeptides capable of binding to the one or more additional protein-binding domains, or the polynucleotide comprising a nucleotide sequence encoding the one or more effector polypeptides, to form one or more ribonucleoproteins in tandem.
- the present disclosure provides a recombinant system comprising
- modified TnpB polypeptide of the present disclosure comprising a modification in the DDE motif as compared to the parent TnpB polypeptide, or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the modified TnpB polypeptide or the functional fragment thereof, wherein the parent polypeptide has the activity of RNA-guided endonuclease, and the modified TnpB polypeptide is deprived of the activity of cleaving double-stranded DNA, and
- a guide RNA (gRNA)
- the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
- the DDE motif is modified by substituting at least one amino acid in the motif with a neutral amino acid or a basic amino acid. In some embodiments, at least one amino acid in the motif is substituted by alanine. In some embodiments, the modified TnpB polypeptide comprises
- D185A, E268A and/or D350A as compared to SEQ ID NO: 10;
- D234A, E342A and/or D436A as compared to SEQ ID NO: 12;
- the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide is conserved at positions corresponding to N31, G179, L267, C332, C335, C351, and C354 of SEQ ID NO: 7.
- the modified TnpB polypeptide comprises a modification at the position corresponding to N31 of SEQ ID NO: 7.
- the modification is a substitution with alanine.
- the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide is conserved at positions corresponding to G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
- the modified TnpB polypeptide has the ability to recognize and bind to a DNA molecule, but is deprived of the ability to cleave the DNA molecule, e.g., double-stranded DNA, i.e., it is a disarmed TnpB polypeptide.
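The alanine substitutions described above for the DDE motif (for example, D185A relative to SEQ ID NO: 10) follow the usual "parent residue, 1-based position, new residue" notation, which the following minimal Python sketch applies to a sequence string while checking that the parent residue is actually present. The function name and the toy sequence are hypothetical, and the sketch says nothing about whether a given substitution abolishes cleavage.

```python
import re

# Substitution written as parent residue, 1-based position, new residue (e.g. D185A).
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(parent: str, substitutions) -> str:
    """Return the parent sequence with each substitution applied.

    Raises ValueError if a substitution is malformed, out of range, or names a
    parent residue that does not match the sequence at that position.
    """
    residues = list(parent.upper())
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub.upper())
        if match is None:
            raise ValueError(f"malformed substitution: {sub}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position out of range: {sub}")
        if residues[pos - 1] != old:
            raise ValueError(f"parent residue mismatch at {sub}")
        residues[pos - 1] = new
    return "".join(residues)


# Toy parent (not a real TnpB sequence): D at position 3 and E at position 5.
toy_disarmed = apply_substitutions("MKDLE", ["D3A", "E5A"])
```

Checking the parent residue before substituting guards against applying position numbers from one SEQ ID NO to a different sequence.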
- the backbone of the guide polynucleotide comprises the 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides before the right end (RE) of the IS, from which the TnpB is derived.
- the backbone is modified by removing a part or the whole of one or several stem structures in the backbone. In some embodiments, a part or the whole of the first stem structure from the 3’ end is removed.
- the guide polynucleotide comprises an additional nucleotide sequence at the 5’ end of the backbone.
- the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.
- the gRNA further comprises one or more additional protein-binding domains.
- the system comprises one or more additional effector polypeptides capable of binding to the one or more additional protein-binding domains, or the polynucleotide comprising a nucleotide sequence encoding the one or more effector polypeptides, to form one or more ribonucleoproteins in tandem.
- the present disclosure provides a gene editing system comprising
- fusion polypeptide of the present disclosure e.g., comprising a TnpB polypeptide having the activity of RNA-guided endonuclease, or a functional fragment thereof, or a modified TnpB polypeptide fused to a fusion partner, or a polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide, or the functional fragment thereof, and
- a guide polynucleotide such as a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.
- the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
- the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis.
- the TnpB polypeptide is encoded by a tnpB gene in an insertion sequence (IS) selected from a group consisting of ISEfa4, ISAs26, ISCpe2, ISMma22, ISBce3, ISAeme8, ISTfu1, ISCco1, ISSoc3, ISTel2, ISNsp3, ISCbt1, ISMac7, ISEc46, ISSen6, ISHahl1, ISKpn69, ISDge10, ISKpn85, ISNsp2, ISAba30, ISRor9, ISAam1, ISYmu1, ISCytsp1, ISCvi1, ISCvi2, ISAepa1 and ISBth16.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids.
- the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
- the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
- the TnpB polypeptide of the present disclosure can recognize a shorter TAM as compared to ISDra2 TnpB polypeptide.
- the TnpB polypeptide of the present disclosure can recognize a TAM consisting of four consecutive nucleotides.
- the TnpB polypeptide of the present disclosure can recognize a TAM of CCAT, CTAC, TGAC, TGAT, TTAC, TTAG, TTAA, TTAT, ACAT, TTTAT, TTTAA or TTGAT.
- the TnpB polypeptide recognizes a TAM of CCAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 2, 9, 15, 17 or 19.
- the TnpB polypeptide recognizes a TAM of CTAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 16.
- the TnpB polypeptide recognizes a TAM of TTAN, where N is any nucleotide.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 4, 5, 6, 8, 11, 12, 13, 14, 18, 20, 22, 26, 28 or 29.
- the TnpB polypeptide recognizes a TAM of TTAC.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 5, 11, 12 or 20.
- the TnpB polypeptide recognizes a TAM of TTAG.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 4, 6 or 14.
- the TnpB polypeptide recognizes a TAM of TTAA. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 22, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 8, 13, 18 or 26.
- the TnpB polypeptide recognizes a TAM of TTGAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 24.
- the TnpB polypeptide of the present disclosure or the functional fragment thereof is capable of effecting RNA-guided cleavage in a prokaryotic and/or eukaryotic cell, preferably in both prokaryotic and eukaryotic cells.
- the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24.
- the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- the variant may differ from SEQ ID NO: 7, 18, 21, 23 or 24 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids.
- the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
- the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
- the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
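The percent-identity thresholds recited for the functional variants can be illustrated with a short sketch. The text does not say how identity is calculated; the convention below (matches divided by the number of alignment columns in which at least one sequence has a residue, on sequences that have already been aligned) is an assumption, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: columns where both sequences carry a gap are ignored,
    and a residue opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

A variant would then satisfy, say, the 90% embodiment if `percent_identity(variant_aln, reference_aln) >= 90.0` under this convention; other conventions give slightly different numbers.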
- the TnpB comprises a DDE motif corresponding to
- the recombinant gene editing system comprises a first polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
- the recombinant gene editing system further comprises a heterologous polynucleotide, such as an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
- the backbone of the guide polynucleotide comprises 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides, before the right end (RE) of the IS from which the TnpB is derived.
- the backbone is modified by removing a part or the whole of one or several stem structures in the backbone. In some embodiments, a part or the whole of the first stem structure from the 3’ end is removed.
- the guide polynucleotide comprises an additional nucleotide sequence at the 5’ end of the backbone.
- the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.
- the modified TnpB polypeptide comprises a modification in the DDE motif as compared to the parent TnpB polypeptide, wherein the parent polypeptide comprises a N-terminal HTH domain, a central domain, a C-terminal Zinc finger domain, and a DDE motif.
- the DDE motif is modified by substituting at least one amino acid in the motif with a neutral amino acid or a basic amino acid. In some embodiments, at least one amino acid in the motif is substituted by alanine. In some embodiments, the modified TnpB polypeptide comprises
- D185A, E268A and/or D350A as compared to SEQ ID NO: 10;
- D234A, E342A and/or D436A as compared to SEQ ID NO: 12;
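Substitution labels such as D185A follow the usual "wild-type residue, 1-based position, new residue" notation. A minimal sketch of applying such labels to a sequence is given below; the helper name is hypothetical, and any sequence passed to it in examples is a toy string, not one of the SEQ ID NOs.

```python
import re

def apply_substitutions(seq: str, substitutions: list[str]) -> str:
    """Apply point substitutions written like 'D185A' to an amino acid sequence.

    Positions are 1-based. The wild-type residue in each label is checked
    against the sequence so that a mis-numbered label fails loudly.
    """
    residues = list(seq)
    for label in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
        if match is None:
            raise ValueError(f"unrecognized substitution label: {label}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{label}: sequence has {residues[position - 1]} at that position")
        residues[position - 1] = replacement
    return "".join(residues)
```

With a toy sequence, `apply_substitutions("MKDLE", ["D3A", "E5A"])` replaces the third and fifth residues with alanine.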
- the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide is conserved at positions corresponding to N31, G179, L267, C332, C335, C351, and C354 of SEQ ID NO: 7.
- the modified TnpB polypeptide comprises a modification at the position corresponding to N31 of SEQ ID NO: 7.
- the modification is a substitution with alanine.
- the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- the modified TnpB polypeptide is conserved at positions corresponding to G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
- the modified TnpB polypeptide retains the ability to recognize and bind to a DNA molecule but lacks the ability to cleave the DNA molecule, e.g., double-stranded DNA, i.e., it is a disarmed TnpB polypeptide.
- the fusion partner is a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone or other DNA-binding protein) associated with the target DNA.
- the fusion partner is a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.
- the fusion partner is a polypeptide that directly provides for increased transcription of the target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription regulator, etc.).
- the fusion partner is another polypeptide or domain, for example Clo51 or FokI nuclease, to generate double-strand breaks (Guilinger et al. Nature Biotechnology, volume 32, number 6, June 2014).
- the fusion partner is a polypeptide that directs editing of single or multiple bases in a polynucleotide sequence, for example a site-specific deaminase that can change the identity of a nucleotide, for example from C-G to T-A or from A-T to G-C (Gaudelli et al. "Programmable base editing of A-T to G-C in genomic DNA without DNA cleavage." Nature (2017); Nishida et al. "Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems." Science 353 (6305) (2016); Komor et al. "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage." Nature 533 (7603) (2016): 420-4).
- the fusion partner is an active (double strand break creating), partially active (nickase) or deactivated (deprived of cleaving) TnpB endonuclease and a deaminase (such as, but not limited to, a cytidine deaminase, an adenine deaminase, APOBEC1, APOBEC3A, BE2, BE3, BE4, ABEs, or the like).
- the fusion partner includes base edit repair inhibitors and glycosylase inhibitors (e.g., uracil glycosylase inhibitor (to prevent uracil removal)).
- the fusion partner is a Cas endonuclease or another TnpB endonuclease as described in the present disclosure.
- the fusion partner is a heterologous NLS.
- the NLS is operably linked to the N-terminus or C-terminus of the TnpB polypeptide, the functional fragment thereof or the modified TnpB polypeptide.
- the fusion polypeptide comprises two or more NLS sequences linked to the TnpB polypeptide, the functional fragment thereof or the modified TnpB polypeptide, for example, on both the N- and C-termini of the same.
- the gRNA further comprises one or more additional protein-binding domains.
- the system comprises one or more additional effector polypeptides capable of binding to the one or more additional protein-binding domains, or the polynucleotide comprising a nucleotide sequence encoding the one or more effector polypeptides, to form one or more ribonucleoproteins in tandem.
- the present disclosure provides a method of introducing a double-strand break into a polynucleotide of interest comprising a step of contacting the polynucleotide with the recombinant gene editing system of the present disclosure targeting a nucleotide sequence in the polynucleotide.
- the present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell the recombinant gene editing system or the fusion system of the present disclosure targeting a genomic sequence in the cell.
- the present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell the disarmed system of the present disclosure and a gene editing system targeting the genomic sequence, wherein the nucleotide sequence targeted by the disarmed sequence is next to the genomic sequence.
- Methods for introducing polynucleotides or polypeptides or a polynucleotide-protein complex into cells or organisms are known in the art including, but not limited to, microinjection, electroporation, stable transformation methods, transient transformation methods, ballistic particle acceleration (particle bombardment), whiskers-mediated transformation, Agrobacterium-mediated transformation, direct gene transfer, viral-mediated introduction, transfection, transduction, cell-penetrating peptides, mesoporous silica nanoparticle (MSN)-mediated direct protein delivery, topical applications, sexual crossing, sexual breeding, and any combination thereof.
- Adeno-associated virus (AAV) is a widely used vector for delivering heterologous polynucleotides.
- the delivery of Cas system with recombinant AAV is limited, and it is generally not possible to deliver a Cas-fusion system (the fusion and gRNA) encoded in a single vector.
- the gene editing system with a TnpB polypeptide and a TnpB fusion can be delivered in a single rAAV.
- the present disclosure provides a recombinant adeno-associated virus (rAAV) comprising a genome comprising a first expression cassette encoding the TnpB polypeptide, the modified TnpB polypeptide, or the fusion polypeptide of the present disclosure.
- the first expression cassette comprises a promoter and a terminator.
- the genome comprises a second expression cassette encoding a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof.
- the second expression cassette comprises a promoter and a terminator.
- the first expression cassette comprises less than about 4,700 nucleotides, less than about 4,600 nucleotides, less than about 4,500 nucleotides, less than about 4,400 nucleotides, less than about 4,300 nucleotides, less than about 4,200 nucleotides, less than about 4,100 nucleotides, less than about 4,000 nucleotides, less than about 3,900 nucleotides, less than about 3,800 nucleotides, less than about 3,700 nucleotides, less than about 3,600 nucleotides, less than about 3,500 nucleotides, less than about 3,400 nucleotides, less than about 3,300 nucleotides, less than about 3,200 nucleotides, less than about 3,100 nucleotides, less than about 3,000 nucleotides, less than about 2,900 nucleotides, less than about 2,800 nucleotides, less than about 2,700 nucleotides, less than about 2,600 nucleotides, or less than
- the genome comprises about 4,500 to about 4,700 nucleotides.
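The size limits above amount to a simple length budget for a single-vector design. The sketch below adds up cassette lengths and compares the total with an upper limit; the 4,700-nt default is taken from the upper figure quoted in the text, whereas the 145-nt ITR length (typical of AAV2) and the function name are assumptions for illustration.

```python
def fits_in_raav(cassette_lengths, itr_length: int = 145, max_genome: int = 4700):
    """Return (total_length, fits) for a single rAAV genome.

    cassette_lengths: lengths in nucleotides of each expression cassette
    (promoter through terminator). Two ITRs are added to the total.
    """
    total = sum(cassette_lengths) + 2 * itr_length
    return total, total <= max_genome
```

For instance, a hypothetical 2,000-nt nuclease cassette plus a 400-nt gRNA cassette would leave roughly 2,000 nt of headroom under this budget, whereas a 4,500-nt cassette would not fit alongside a gRNA cassette.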
- the present disclosure provides a method of screening TnpB polypeptide for the activity of cleaving double-stranded DNA comprising the steps of:
- gRNA comprising a targeting region and a backbone region, wherein the backbone region comprises at least 100 nucleotides before the 3’ end of the IS, which naturally comprises the nucleotide sequence encoding the TnpB polypeptide;
- a target DNA comprising a nucleotide sequence that hybridizes to the nucleotide sequence of the targeting region and a TAM recognized by the TnpB polypeptide, wherein the TAM consists of four or five consecutive nucleotides adjacent to the 5’ end of the IS;
- the backbone region comprises at least 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300 or more nucleotides before the 3’ end of the IS. In some embodiments, the backbone region comprises 100-350, 125-325, 150-300, 175-275, 200-250, 150-225, 175-225, 150-200, 175-225, or 175-200 nucleotides before the 3’ end of the IS.
- the TnpB polypeptide comprises a N-terminal HTH domain, a central domain, a C-terminal Zinc finger domain, and a DDE motif.
- the TnpB polypeptide can be provided as an isolated polypeptide or by the expression from a polynucleotide encoding the same.
- the TnpB polypeptide is provided by a first polynucleotide, preferably DNA, comprising a first nucleotide sequence encoding the same.
- the gRNA can be provided as an isolated RNA molecule or by transcription from a DNA molecule encoding the same.
- the gRNA is provided as a second polynucleotide, preferably DNA, comprising a second nucleotide sequence encoding the same.
- the first and second polynucleotides can be provided in a single vector, or in separate vectors.
- the first and second polynucleotides are provided in a first vector, such as a first plasmid.
- the first nucleotide sequence is operably linked to a first promoter.
- the second nucleotide sequence is operably linked to a second promoter.
- the target DNA is provided in a second plasmid.
- contacting the TnpB polypeptide with the gRNA and the target DNA comprises introducing the first and second plasmids into a host cell comprising the target DNA.
- This Example was carried out to screen TnpB polypeptides with endonuclease activity.
- a series of plasmid pairs were constructed, each consisting of a test plasmid comprising nucleotide sequences encoding a TnpB polypeptide, the gRNA (comprising a targeting sequence of SEQ ID NO: 240 and the related reRNA backbone) and a chloramphenicol (Cm) resistance gene, and a reporter plasmid comprising a target sequence of SEQ ID NO: 240, a TAM and a kanamycin resistance gene.
- TnpB genes and gRNA coding sequences (backbone + targeting region) were synthesized by Tsingke (Beijing, China) and cloned into the pBAD backbone by Gibson Assembly. The TnpB genes were driven by the J23108 promoter, and the gRNA coding sequences were driven by the J23119 promoter (see Leenay et al., 2016, Identifying and Visualizing Functional PAM Diversity across CRISPR-Cas Systems, Molecular Cell, 62, 1-11).
- the reporter plasmid (Kan+) carrying oligos containing target nucleotide sequence and related TAM flanked by EcoRI and XhoI restriction sites were ordered from Tsingke (Beijing, China) .
- oligos and the pCB457 plasmid were digested with EcoRI and BamHI at 37°C for 1 h.
- the digested products were isolated with and ligated using T4 ligase according to the manufacturer’s instructions.
- The maps of the plasmid pair for ISTfu1 TnpB are shown in Fig. 1 as an example.
- Each of the plasmid pairs was transformed into E. coli BW25141 cells by electroporation to test the endonuclease activity of the TnpB polypeptide (test group), and a plasmid pair in which the gRNA coding sequence had been removed from the test plasmid was used as the negative control.
- test and target plasmids were electroporated into E. coli (NEB 10β, C3020K) using a BIO-RAD machine (Gene Pulser Xcell) with the program 1.8 kV, 25 μF, 200 Ω. After electroporation, 900 μl SOC medium was added, followed by incubation at 37°C for 1 hour. Then, the mixture was quartered, serially diluted (10×) and inoculated onto different LB plates (Cm+/Kan+, Cm+, Kan+ and plain) (50 mg/L for each antibiotic). After incubation at 37°C for 12 hours, photos of the plates were taken.
- depletion ratio = the number of colonies of the TnpB & reRNA group at the minimal dilution / the number of colonies of the TnpB-alone group at the minimal dilution
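The depletion ratio defined above is a simple quotient of colony counts. A minimal sketch follows; the function name and the example counts are hypothetical and are not data from this Example.

```python
def depletion_ratio(test_colonies: int, control_colonies: int) -> float:
    """Colonies with TnpB and reRNA divided by colonies with TnpB alone.

    Both counts are taken at the minimal dilution on the double-selection plate.
    A ratio well below 1 indicates that the reporter plasmid was depleted,
    which is read as RNA-guided cleavage.
    """
    if control_colonies <= 0:
        raise ValueError("the control plate must have at least one colony")
    return test_colonies / control_colonies
```

With made-up counts of 5 colonies in the test group and 500 in the control group, the ratio would be 0.01.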
- the plain plate, Kan+ plate, and Cm+ plate were used to show the efficiency of electroporation. Generally, the negative control showed numbers of colonies on these three plates similar to those of the test group.
- TnpB polypeptides in total were tested, and the results demonstrated that the TnpB polypeptides of SEQ ID NOs: 1-29 showed endonuclease activity (see Fig. 2), cleaving a target site in an RNA-guided manner with various depletion ratios (see Fig. 3), while 52 TnpB polypeptides encoded by SEQ ID NOs: 59-110 did not show endonuclease activity (data not shown).
- TnpB polypeptides having RNA-guided endonuclease activity and gRNAs as well as the TAMs thereof are shown in Table 1.
- the amino acid sequences of the TnpB polypeptides showing endonuclease activity were aligned (Clustal Omega) .
- the results showed that the DDE motif is conserved in the TnpB polypeptides (see Figs. 4 and 5) .
- although the DDE motif in ISBce3 TnpB and ISCbt1 TnpB was not completely aligned with the others, it was present and is indicated in the amino acid sequences (see Fig. 5).
- TnpB polypeptides having RNA-guided endonuclease activity are generally conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7, except that ISCbt1 (SEQ ID NO: 12) differs from the others at positions corresponding to L267, C332, C335, C351 and C354 of SEQ ID NO: 7.
- DDE motif is essential for the endonuclease activity
- the variants of the TnpB polypeptides of SEQ ID NOs: 1-22 with the first D residue of the DDE motif substituted by A were tested as described in Example 1. The results showed that the variants substantially lost the endonuclease activity (see Figs. 2 and 3), indicating that the DDE motif is essential for the endonuclease activity and that a disarmed TnpB polypeptide (dTnpB) can be prepared by introducing mutation(s) into the DDE motif.
- The variants of the TnpB polypeptides ISTfu1 (SEQ ID NO: 7), ISDge10 (SEQ ID NO: 18), ISAba30 (SEQ ID NO: 21) and ISDra2 with the amino acid corresponding to N31 of SEQ ID NO: 7 substituted by A were tested as described in Example 1. The results showed that the variants substantially lost the endonuclease activity (see Fig.
- This Example was carried out to verify the RNA-guided cleavage in eukaryotic cells with the TnpB polypeptides.
- a fluorescence-reporting system was used to identify the RNA-guided cleavage.
- the system comprised a target sequence (SEQ ID NO: 240) and a TAM located between mRFP coding sequence and GFP coding sequence (SEQ ID NOs: 140 and 141) .
- the mRFP and GFP coding sequences were linked out of frame, and thus, the GFP would not be expressed if no cleavage occurred in the target sequence. Once the cleavage occurred, the mRFP and GFP might be linked in frame upon repairing.
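The reading-frame logic of this reporter can be sketched as follows. The text does not state by how many nucleotides GFP is out of frame relative to mRFP, so `frame_offset` is a hypothetical parameter; the sketch only encodes the general rule that GFP becomes readable when the net indel left by repair brings the total offset back to a multiple of three.

```python
def restores_frame(frame_offset: int, net_indel: int) -> bool:
    """True if a repair outcome puts GFP back in frame with mRFP.

    frame_offset: nucleotides (1 or 2, modulo 3) by which GFP is shifted
                  relative to the mRFP frame in the uncut reporter (assumed value).
    net_indel:    inserted minus deleted nucleotides at the target site.
    """
    return (frame_offset + net_indel) % 3 == 0
```

With an assumed offset of 1, a 1-nt deletion or a 2-nt insertion would restore GFP, while an unedited site (net indel 0) would not. Only a fraction of repair outcomes restores the frame, which is consistent with the wording that the two coding sequences "might" be linked in frame after repair.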
- Plasmid groups each comprising a plasmid encoding a TnpB (TnpB plasmid) , a plasmid encoding the corresponding gRNA (gRNA plasmid) and a reporting plasmid comprising the fluorescence-reporting system, were constructed.
- oligos containing target sequence and related TAM were ordered from Tsingke (Beijing, China) . Then, the oligos were annealed and ligated into pRGS vector digested with EcoRI and BamHI (see Kim et al., Surrogate Reporters for Enrichment of Cells with Nuclease-induced Mutations, Nature Methods, 2011, 8 (11) : 941-944) .
- the gRNA plasmid was constructed by inserting the oligos encoding gRNA (the target sequence+backbone as shown in Table 1) flanked by EcoRI and BamHI restriction sites into pUC19 plasmid under the control of U6 promoter, and the TnpB plasmid was constructed by inserting the coding sequence of the TnpB polypeptide (see Table 1) into pcDNA3.1.
- the maps of the above three plasmids are shown in Fig. 7.
- a group of plasmids (120 ng TnpB plasmid + 80 ng gRNA plasmid + 200 ng reporting plasmid) was co-transfected into HEK293T cells (ATCC, CRL3216) with 2000 Reagent (Invitrogen) according to the manufacturer’s instructions.
- the resulting cells were analyzed by flow cytometry (LSRFortessa, BD Biosciences).
- the predicted TAM sequence for ISTfu1 TnpB polypeptide is 5'-TGAT, which is similar to the TAM sequence (5'-TTGAT) of ISDra2 TnpB.
- test plasmids encoding ISTfu1 TnpB polypeptide or ISDra2 TnpB polypeptide and their respective guide RNAs were co-transfected with the reporting plasmids into 293T cells, and the resulting cells were analyzed by flow cytometry, as described in Example 3.
- ISTfu1 TnpB polypeptide can recognize all the four TAMs, and a higher efficiency of cleavage was observed for TAMs TTGAT and CTGAT, while ISDra2 TnpB polypeptide can recognize TTGAT only. Further, ISTfu1 TnpB polypeptide showed an efficiency of cleavage more than 2 times higher than ISDra2 TnpB polypeptide.
- This Example was carried out to verify the RNA-guided cleavage of an endogenous gene in human cells by the TnpB polypeptide of the invention.
- the gRNA backbones used in this Example are those listed in Table 1.
- Plasmids encoding ISTfu1 TnpB polypeptide (SEQ ID NO: 7) and a gRNA comprising a targeting region of SEQ ID NO: 142 (Target sequence 1 in hDNMT1) and a backbone of SEQ ID NO: 117 were constructed and transfected into HEK293T cells as described in Example 3. After incubation at 37 °C for two days, the transfected cells were collected for the isolation of genomic DNA with an isolation kit (DP201, Bioteke Corporation, Beijing) according to the manufacturer’s instructions. The genomic DNA was analyzed by Surveyor assay to determine the efficiency of cleavage, and genomic DNA from untreated 293T cells was used as the control.
- PCR products were then denatured and annealed with 3 μL 1X AccuPrime Buffer II, then digested with 0.5 mL Surveyor nuclease (Integrated DNA Technologies, IDT, USA).
- Lanes #1, #2, and #3 showed cleavage by Surveyor nuclease (indel% of 14.5%, 16.6% and 20.0%, respectively), indicating that ISTfu1 TnpB polypeptide can achieve RNA-guided cleavage of human DNMT1 in human cells.
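The text does not state how the indel percentages were derived from the gel. A commonly used estimate for mismatch-cleavage assays such as Surveyor is indel% = 100 × (1 − sqrt(1 − f_cut)), where f_cut is the fraction of band intensity in the two cleavage products. The sketch below implements that estimate as an assumption, not as the method used in this Example; the intensities in any usage are made-up numbers.

```python
import math

def surveyor_indel_percent(uncut: float, cut_1: float, cut_2: float) -> float:
    """Estimate indel% from band intensities of a mismatch-cleavage assay.

    uncut:        integrated intensity of the undigested PCR product
    cut_1, cut_2: integrated intensities of the two cleavage products
    The square root accounts for heteroduplexes forming between edited and
    unedited strands during re-annealing.
    """
    total = uncut + cut_1 + cut_2
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    f_cut = (cut_1 + cut_2) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - f_cut))
```

Under this estimate, a lane in which 36% of the signal lies in the cleavage bands corresponds to about 20% indels.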
- Plasmids encoding ISDge10 TnpB polypeptide (SEQ ID NO: 17) and a gRNA comprising a targeting region of SEQ ID NO: 143 (Target sequence 1 in hTET1) , 144 (Target sequence 1 in hTET2) or 145 (Target sequence in hHPRT) and a backbone of SEQ ID NO: 127 were constructed and transfected into 293T cells as described in Example 3. Plasmids encoding a spCas9 and gRNA comprising the same targeting region were used as reference.
- transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers listed below.
- ISDge10 TnpB polypeptide introduced indels into hTET1, hTET2 and hHPRT, indicating that it can achieve RNA-guided cleavage of these genes in human cells.
- Plasmids encoding ISAba30 TnpB polypeptide (SEQ ID NO: 22) and a gRNA comprising a targeting region of SEQ ID NO: 146 (Target sequence 2 in hDNMT1) and a backbone of SEQ ID NO: 132 were constructed and transfected into 293T cells as described in Example 3.
- transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers of SEQ ID NOs: 226 and 227.
- ISAba30 TnpB polypeptide introduced indels into hDNMT1 (indel% of 24.9% and 26.0% for #1 and #2, respectively), indicating that it can achieve RNA-guided cleavage of hDNMT1 in human cells.
- Plasmids encoding ISAam1 TnpB polypeptide (SEQ ID NO: 23) and a gRNA comprising a targeting region of SEQ ID NO: 147 (Target sequence 2 in hTET1) or 148 (Target sequence 2 in hTET2) and a backbone of SEQ ID NO: 133 were constructed and transfected into 293T cells as described in Example 3. Plasmids encoding a spCas9 and gRNA comprising the same targeting region were used as reference.
- transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers listed below.
- ISAam1 TnpB polypeptide introduced indels into hTET1 and hTET2, indicating that it can achieve RNA-guided cleavage of hTET1 and hTET2 in human cells with an indel% comparable to or even higher than that of spCas9.
- Plasmids encoding ISYmu1 TnpB polypeptide (SEQ ID NO: 24) and a gRNA comprising a targeting region of SEQ ID NO: 142, 149 (Target sequence in hDNMT3b) , 150 (Target sequence 3 in hDNMT1) , 151 (Target sequence 3 in hTET2) or 152 (Target sequence in hPGK1) and a backbone of SEQ ID NO: 207 or 210 were constructed and transfected into 293T cells as described in Example 3.
- the transfected cells were tested by Surveyor assay of the genomic DNA as described in
- ISYmu1 TnpB polypeptide introduced indels into hDNMT1, hDNMT3b, hTET, and hPGK1, indicating that ISYmu1 TnpB polypeptide can achieve RNA-guided cleavage of these genes in human cells.
- transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers listed below.
- ISAam1 and ISYmu1 TnpB polypeptides achieved higher editing efficiency than ISDra2 in most of the tested genes in human cells.
- the gRNA backbones for ISDra2, ISTfu1, ISDge10, ISAba30, ISAam1, and ISYmu1 TnpB polypeptides were designed as “N” nucleotides at the right end of the IS (referred to as “gN” ) , and the sequences thereof are shown in Tables 3-8. That is, “N” indicates the left end of the gRNA backbone.
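The "gN" naming can be illustrated with a short sketch that takes the last N nucleotides before the right end of an IS and writes them as RNA. It assumes the IS sequence is supplied 5'→3' as DNA on the strand that is transcribed into the reRNA, with its last nucleotide at the right end; the function name is hypothetical and any sequence used with it is a placeholder, not one of the backbones in Tables 3-8.

```python
def backbone_gN(is_sequence: str, n: int) -> str:
    """Return the gN backbone: the last n nucleotides of the IS, written as RNA.

    The first nucleotide of the returned string is the left end of the gRNA
    backbone, n nucleotides upstream of the right end (RE) of the IS.
    """
    seq = is_sequence.upper()
    if not 1 <= n <= len(seq):
        raise ValueError("n must be between 1 and the length of the IS sequence")
    return seq[-n:].replace("T", "U")
```

Calling `backbone_gN(is_sequence, 120)` and `backbone_gN(is_sequence, 150)` would give the g120 and g150 backbones for the same IS, which is how a series of truncations like the one tested in this Example could be generated.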
- RNA-guided cleavages by the ISDra2, ISTfu1, ISDge10, ISAba30, ISAam1, and ISYmu1 TnpB polypeptides were conducted as described in Example 3, and the results were shown in Fig. 11.
- ISDra2 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 129 nt (Fig. 11A); ISTfu1 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 139 nt (Fig. 11B); ISDge10 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 122 nt (Fig. 11C); ISAba30 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 101 nt (Fig.
- ISAam1 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 100 nt (Fig. 11E); and ISYmu1 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 120 nt (Fig. 11F).
- the TnpB polypeptides were also able to cleave the target sequence with longer gRNA backbones (Figs. 11A-11F), indicating that longer gRNA backbones can also be used to form the RNP.
- Example 7: Comparison of editing efficiency of TnpB polypeptides and Cas nucleases. This Example was carried out to demonstrate that the editing efficiency of the TnpB polypeptides is superior to that of the small Cas nucleases.
- ISAam1 and ISYmu1 TnpB polypeptides were examined for their ex vivo and in vivo activity, together with their specificity, in comparison with five well-developed CRISPR-Cas editors: two optimized Un1Cas12f1 variants (referred to as Un1Cas12f1 and CasMINI), AsCas12f1, optimized Nme2Cas9 (Nme2-C.NR), and SaCas9.
- Plasmid groups, each comprising a plasmid encoding a TnpB or Cas editor, a plasmid encoding the corresponding gRNA (gRNA plasmid) and a reporting plasmid comprising the fluorescence-reporting system, were constructed and transfected into HEK293T cells and HCT116 cells (ATCC CCL-247), and the genome editing efficiency at 10 genomic loci was tested as described in Example 3.
- ISAam1 and ISYmu1 TnpB polypeptides achieved an editing efficiency that was significantly higher than that of the well-developed small Cas editors (Un1Cas12f1, CasMINI, AsCas12f1, and Nme2-C.NR) and was comparable to that of SaCas9.
- rAAVs encoding a genome editing system were prepared as described previously (see Ran, F. A. et al. 2015, In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186-191).
- a nuclease expression cassette driven by CMV promoter and a gRNA/reRNA expression cassette driven by human U6 promoter were cloned between ITRs (see Fig. 16A) , and used to prepare recombinant AAV2 and AAV8.
- HEK293T cells were plated in 150 mm dishes 12 h before transfection; 30 μg helper plasmid, 15 μg AAV2 plasmid and 15 μg expression plasmid were transfected using polyethyleneimine, and AAV vectors were purified three days later.
- AAV8 for mouse injection was generated by PackGene Biotech Co, with a concentration of 10^13 gc/ml.
- C2C12 cells (ATCC CRL-1772) were seeded at 5 × 10^4 gc per well on a 48-well plate. AAV2 was added to cells at a multiplicity of infection of 10^4 gc per well. Cells were collected 4 days after transduction for genomic DNA extraction and editing efficiency analysis by next generation sequencing. With the Rosa26 locus as the target, the ISAam1 TnpB and Cas9 systems show appreciable activity (Fig. 16A). Considering that AsCas12f has the lowest average editing activity among the Cas12fs and that the activity of Nme2-C.NR is generally lower than that of SaCas9, we removed them from the in vivo experiment to avoid sacrificing more mice than necessary.
- We then individually delivered a single AAV8 vector encoding each of five different editing systems into mice and analyzed the editing activity in the target organ, the liver. All experiments related to animal work described in this study were performed strictly in accordance with the guidelines for the Care and Use of Laboratory Animals, and approved by the Animal Welfare and Research Ethics Committee of the Institute of Zoology, Chinese Academy of Sciences.
- the mouse strain C57BL/6J was obtained from Vitalriver. 6-week-old female C57BL/6J mice were injected with 5 × 10^11 gc in a 100 μl volume via the tail vein. Mice were sacrificed 14 days later and liver tissues were collected for genome extraction. This in vivo result roughly recapitulates the aforementioned ex vivo data, where the TnpB and SaCas9 systems show relatively high activity (Fig. 16B).
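The injection dose can be related to the stated AAV8 stock concentration by simple arithmetic. The sketch below computes how much stock and diluent would make up one injection; it assumes the 10^13 gc/ml stock is the one used for the tail-vein injections and that it is diluted to the final volume, neither of which is spelled out in the text, and the function name is hypothetical.

```python
def injection_mix(stock_gc_per_ml: float, dose_gc: float, final_volume_ul: float):
    """Return (stock_ul, diluent_ul) needed to deliver dose_gc in final_volume_ul."""
    stock_ul = dose_gc / stock_gc_per_ml * 1000.0  # convert ml to microlitres
    if stock_ul > final_volume_ul:
        raise ValueError("stock is too dilute to reach the dose in this volume")
    return stock_ul, final_volume_ul - stock_ul
```

With a 10^13 gc/ml stock, a dose of 5 × 10^11 gc in 100 μl corresponds to about 50 μl of stock plus about 50 μl of diluent per mouse.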
- For quantifying the editing specificity or off-target level, we employed one candidate-based assay in mouse N2a cells and one unbiased genome-wide assay in human HEK293T cells by iGUIDE-seq (see Nobles, C. L. et al., 2019, iGUIDE: an improved pipeline for analyzing CRISPR cleavage specificity. Genome Biol. 20, 14).
- half a million HEK293T cells were transfected with 1 μg nuclease plasmid, 500 ng gRNA plasmid and 50 pmol double-stranded oligodeoxynucleotide (dsODN) using the Lonza 4D system (Program CM-130).
- ISAam1 shows the lowest off-target ratio for two loci (Figs. 17A and 17B), while ISYmu1 also exhibits a low degree of off-target editing (no off-target editing for one locus and the second or third lowest off-target editing level for the remaining two loci).
- an in-depth characterization indicates that ISAam1 and ISYmu1 TnpBs outperforms Cas12 variants in terms of ex vivo and in vivo efficiency, while exhibiting comparable performance to Cas9 variants. Moreover, their editing specificity is on par with these Cas12 or Cas9 variants.
Abstract
Provided herein are novel TnpB polypeptides having the activity of RNA-guided endonuclease. Also provided herein are a gene editing system comprising the TnpB polypeptide of the present invention or a disarmed variant thereof, or a fusion polypeptide comprising the same, as well as a method for gene editing with the TnpB polypeptide or the disarmed variant thereof, the fusion polypeptide, or the gene editing system of the invention.
Description
The present invention relates to molecular biology. In particular, the present invention provides novel RNA-guided systems for gene editing.
The modification of genome at a predetermined site has been enabled by employing site-specific systems.
Genome-editing techniques such as meganucleases, designer zinc finger nucleases (ZFNs) , or transcription activator-like effector nucleases (TALENs) , are available for producing targeted genome modification, but these systems tend to have low specificity and employ designed nucleases that need to be redesigned for each target site, which renders them costly and time-consuming to prepare.
Recently, CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) -Cas systems have become popular, which comprise different domains of effector proteins that encompass a variety of activities (DNA recognition, binding, and optionally cleavage) . However, the Cas nuclease is generally large in size, making it difficult to deliver the CRISPR-Cas systems into a cell.
Transposition has a key role in reshaping the genomes of all living organisms. Insertion sequences of the IS200/IS605 and IS607 families are among the simplest mobile genetic elements and contain only the genes that are required for their transposition and its regulation. These elements encode the tnpA transposase, which is essential for mobilization, and often carry an accessory tnpB gene, which is dispensable for transposition. A TnpB protein (ISDra2 TnpB) has been reported to have the activity of RNA-guided DNA endonuclease (Karvelis et al., Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature, 2021, 599: 692-696). TnpB is a functional progenitor of CRISPR–Cas nucleases and is established as a prototype of a new system for genome editing. TnpB proteins are generally much shorter than Cas proteins and will thus be more convenient to deliver into a cell.
However, the ISDra2 TnpB needs to recognize a transposon-associated motif (TAM) of TTGAT for effecting the RNA-guided cleavage. It is known that a longer sequence is generally present in the genome with a lower frequency. Therefore, the use of the ISDra2 TnpB in genome editing will be limited.
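The frequency argument above can be illustrated with a small calculation. The following is a minimal sketch, assuming a random sequence in which the four bases are independent and equally likely (real genomes only approximate this); the function name `expected_sites` and the 1 Mb region size are hypothetical and chosen for illustration only.

```python
def expected_sites(motif_length: int, genome_length: int, both_strands: bool = True) -> float:
    """Expected number of occurrences of one fixed motif in a random sequence.

    Assumption: bases are independent and equiprobable (probability 1/4 each).
    Counting both strands simply doubles the estimate, which is only valid
    for motifs that are not their own reverse complement.
    """
    per_position = 0.25 ** motif_length                    # chance of a match at one position
    positions = max(genome_length - motif_length + 1, 0)   # number of possible start positions
    strands = 2 if both_strands else 1
    return per_position * positions * strands


if __name__ == "__main__":
    region = 1_000_000  # hypothetical 1 Mb region
    for label, length in [("5-nt TAM such as TTGAT", 5), ("4-nt TAM", 4)]:
        print(f"{label}: about {expected_sites(length, region):.0f} expected sites")
```

Under these assumptions, shortening the TAM by one nucleotide increases the expected number of targetable sites roughly four-fold.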
Hence, there is a need to identify new TnpB polypeptides having the activity of RNA-guided DNA endonuclease that recognize a different TAM, such as a shorter TAM, as compared to ISDra2 TnpB, and genome editing systems comprising the same.
To meet the need above, the inventors identified a number of TnpB polypeptides having the activity of RNA-guided DNA endonuclease. The identification of these TnpB polypeptides increases the possibility of editing various genomic regions that are not accessible to the TnpB polypeptide of the prior art. Further, the TnpB polypeptides of the present disclosure provide an editing efficiency higher than that of the prior art TnpB polypeptide, and even comparable to that of Cas9 nuclease.
In a first aspect, the present disclosure provides a recombinant gene editing system comprising
- a TnpB polypeptide or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and
- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA, wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
In a second aspect, the present disclosure provides a composition comprising
- a recombinant TnpB polypeptide or a functional fragment thereof,
- a target double-stranded DNA comprising a nucleotide sequence of interest and a TAM recognized by the TnpB polypeptide; and
- a recombinant guide RNA (gRNA) comprising a targeting region capable of hybridizing to the nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or a functional fragment thereof, wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
In a third aspect, the present disclosure provides a method of introducing a double-strand break into a polynucleotide of interest comprising a step of contacting the polynucleotide with a recombinant gene editing system comprising
- a TnpB polypeptide or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and
- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence in the polynucleotide of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA,
wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
In a fourth aspect, the present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell a recombinant gene editing system comprising
- a TnpB polypeptide or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and
- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a portion of the genomic sequence and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA,
wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
In a fifth aspect, the present disclosure provides a modified TnpB polypeptide comprising a modification in the DDE motif as compared to the parent TnpB polypeptide, wherein the parent polypeptide has an activity of RNA-guided endonuclease, and wherein the modified TnpB is deprived of the activity of cleaving double-stranded DNA.
In a sixth aspect, the present disclosure provides a recombinant system comprising
- the modified TnpB polypeptide of the fifth aspect, or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the modified TnpB polypeptide or the functional fragment thereof, and
- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.
In a seventh aspect, the present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell a recombinant system of the sixth aspect and a gene editing system targeting the genomic sequence, wherein the nucleotide sequence of interest is next to the genomic sequence.
In an eighth aspect, the present disclosure provides a fusion polypeptide comprising a TnpB polypeptide or a functional fragment thereof or a disarmed TnpB polypeptide fused to a fusion partner, wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, wherein the TnpB polypeptide has an activity of RNA-guided endonuclease, and wherein the disarmed TnpB polypeptide is deprived of the activity of cleaving double-stranded DNA.
In a ninth aspect, the present disclosure provides a gene editing system comprising
- the fusion polypeptide of the present disclosure, or a polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide, and
- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.
In a tenth aspect, the present disclosure provides a method of modifying a genomic sequence in a eukaryotic cell, comprising a step of introducing the gene editing system of the ninth aspect into the eukaryotic cell, wherein the gRNA comprises a targeting region capable of hybridizing to a portion of the genomic sequence.
In an eleventh aspect, the present disclosure provides a method of screening TnpB polypeptide for the activity of cleaving double-stranded DNA comprising the steps of:
- providing a candidate TnpB polypeptide from a microorganism;
- providing a gRNA comprising a targeting region and a backbone region, wherein the backbone region comprises 100-350 nucleotides before the 3’ end of the IS, which naturally comprises the nucleotide sequence encoding the TnpB polypeptide;
- providing a target DNA comprising a nucleotide sequence that hybridizes to the nucleotide sequence of the targeting region and a TAM recognized by the TnpB polypeptide, wherein the TAM consists of four or five consecutive nucleotides adjacent to the 5’ end of the IS;
- contacting the TnpB polypeptide with the gRNA and the target DNA; and
- detecting the cleavage on the target DNA.
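The sequence-derivation part of the screening steps above can be sketched in code. This is an illustrative sketch only, not the inventors' pipeline: it assumes that "adjacent to the 5’ end of the IS" means the nucleotides immediately upstream of the IS, and that the backbone is the stretch of the IS ending at its 3’ end. The names `ScreenDesign` and `design_from_is`, and the coordinate convention, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScreenDesign:
    tam: str       # candidate TAM (DNA, 5'->3')
    backbone: str  # DNA template of the candidate gRNA backbone (5'->3')


def design_from_is(locus: str, is_start: int, is_end: int,
                   tam_len: int = 5, backbone_len: int = 200) -> ScreenDesign:
    """Derive a candidate TAM and gRNA backbone from an annotated IS.

    locus:    DNA sequence containing the IS plus flanking sequence.
    is_start: 0-based index of the first IS nucleotide (its 5' end).
    is_end:   0-based index one past the last IS nucleotide (its 3' end).

    Assumptions (not stated explicitly in the text): the TAM lies immediately
    upstream of the IS 5' end, and the backbone ends exactly at the IS 3' end.
    """
    if tam_len not in (4, 5):
        raise ValueError("TAM length must be 4 or 5 nucleotides")
    if not 100 <= backbone_len <= 350:
        raise ValueError("backbone length must be 100-350 nucleotides")
    if not 0 <= is_start < is_end <= len(locus):
        raise ValueError("IS coordinates fall outside the locus")
    if is_start < tam_len:
        raise ValueError("not enough upstream flank to read the TAM")
    if is_end - is_start < backbone_len:
        raise ValueError("IS is shorter than the requested backbone")

    tam = locus[is_start - tam_len:is_start].upper()
    backbone = locus[is_end - backbone_len:is_end].upper()
    return ScreenDesign(tam=tam, backbone=backbone)
```

In practice one would scan several backbone lengths within the 100-350 nucleotide window and both TAM lengths, then test each design for cleavage of the target DNA.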
Fig. 1 shows the maps of the pair of plasmids for screening TnpB polypeptide having the activity of RNA-guided DNA endonuclease, comprising the test plasmid (encoding a TnpB polypeptide and a gRNA, Fig. 1A) and the reporter plasmids (comprising a target sequence, Fig. 1B) .
Fig. 2 shows the results of the screening for TnpB polypeptide having the activity of RNA-guided DNA endonuclease.
Fig. 3 shows the depletion ratio of TnpB polypeptides and dTnpB polypeptides (the TnpB polypeptides with the DDE motif substituted by alanine) .
Fig. 4 shows the alignment of the amino acid sequences of TnpB polypeptides having the activity of RNA-guided DNA endonuclease.
Fig. 5 shows the amino acid sequences of TnpB polypeptides having the activity of RNA-guided DNA endonuclease, and the DDE motif in the amino acid sequences (the residues bold and underlined) .
Fig. 6 shows the structure and mechanism of the fluorescence-reporting system.
Fig. 7 shows the maps of the plasmids for detecting the RNA-guided cleavage in 293T cells with the fluorescence-reporting system, including a plasmid encoding the fluorescence-reporting system (A) , a plasmid encoding the TnpB polypeptide (B) , and a plasmid encoding gRNA (C) .
Fig. 8 shows the results of flow cytometry for detecting the expression of GFP which indicates the RNA-guided cleavage in the fluorescence-reporting system by ISTfu1 TnpB, ISDge10 TnpB, ISAba30 TnpB, ISAam1 and ISYmu1 TnpB polypeptides.
Fig. 9 shows the efficiency of editing by ISTfu1 TnpB and ISDra2 TnpB polypeptides with different TAMs in the fluorescence-reporting system.
Fig. 10 shows the results of the surveyor assays for detecting the RNA-guided cleavage in human cells by ISTfu1 TnpB (panel A) , ISDge10 TnpB (panel B) , ISAba30 TnpB (panel C) , ISAam1 (panel D) and ISYmu1 (panel E) TnpB polypeptides.
Fig. 11 shows the effect of the backbone design on the RNA-guided cleavage by ISDra2 (panel A) , ISTfu1 (panel B) , ISDge10 (panel C) , ISAba30 (panel D) , ISAam1 (panel E) , and ISYmu1 (panel F) TnpB polypeptides.
Fig. 12 shows the distribution of 10 conserved residues in 25 active TnpB proteins together with ISDra2 TnpB (panel A) , which are marked as asterisks and as black lines in the bottom bar with the domain architecture overlaid, and that the endonuclease activity of TnpB mutants (N to A) was sharply decreased (panel B) .
Fig. 13 shows the editing efficiency of ISAam1, ISYmu1 and ISDra2 systems at six randomly selected endogenous sites in HEK293T cells. Data are shown as the mean ± SD, n=3.
Fig. 14 shows gRNA design for seven nucleases at ten genomic loci of human. Nucleases and the corresponding TAM are color-coded. The gRNAs are aligned according to the stranded position. Taking CBLB as an example, the gRNA is more overlapping for ISAam1 and three Cas12f variants than for the other three nucleases.
Fig. 15 shows the comparison of editing efficiency of two TnpB systems and five Cas nucleases at 10 genomic loci in human HEK293T (panel A) and HCT116 (panel B) cells, and the comparison of editing efficiency of ISAam1 (panel C) or ISYmu1 (panel D) relative to five Cas nucleases at three genomic loci in HEK293T cells. For panels A and B, each dot represents the average efficiency of three biological replicates. The distribution is shown as a box plot where the box indicates the median (middle line) and the interquartile range (IQR, box limits), and values from minimum to maximum are shown by the whiskers. For panels C and D, the gRNA design is shown on the left panel, and editing efficiency is shown on the right panel. The seven nucleases are color-coded. Since it is impossible to design overlapping gRNAs targeting the same location across all seven nucleases, two groups of overlapping gRNAs were separately designed for ISAam1, three Cas12f variants and Nme2-C.NR, and for ISAam1 and SaCas9. ISYmu1 was in a similar scenario.
Fig. 16 shows the AAV2-delivery based editing efficiency at Rosa26 locus in mouse C2C12 cell (n=3, panel A) and the efficiency of AAV8 mediated in vivo gene editing at Rosa26 site (n=4) and Angptl3 site (n=3) in mouse (panel B) . Data are shown as the mean ± SD.
Fig. 17 shows the ratio of off-target vs. on-target editing efficiency at top two predicted off-target sites of each nuclease (n=3, panel A) , and the off-target level quantified by iGUIDE analysis at MAPK8 locus in HEK293T cells (panel B, in which the pie chart shows the proportion of on-target and off-target reads, respectively) .
1. Definitions
As used herein, the term “TnpB polypeptide” refers to a polypeptide encoded by the tnpB gene in an insertion sequence (IS). “TnpB endonuclease”, “TnpB effector” and “TnpB nuclease” are used interchangeably herein and refer to the TnpB polypeptide having an activity of RNA-guided endonuclease. A TnpB polypeptide is generally 300-500 amino acid residues in size. A TnpB polypeptide “derived from” a microorganism refers to the TnpB polypeptide naturally occurring in the microorganism, including a TnpB polypeptide that can be found in an online database such as that of the National Center for Biotechnology Information (NCBI), and the natural variants thereof.
The terms “polypeptide” and “protein” are used interchangeably herein and refer to a polymer of amino acids and includes full-length proteins and fragments thereof.
Endonucleases are enzymes that cleave the phosphodiester bond within a polynucleotide chain, and include restriction endonucleases that cleave DNA at specific sites without damaging the bases. Examples of endonucleases include, but are not limited to, restriction endonucleases, meganucleases, TAL effector nucleases (TALENs) , zinc finger nucleases, and Cas (CRISPR-associated) effector endonucleases. The present disclosure provides novel RNA-guided TnpB endonucleases.
As used herein, "nucleic acid" means a polynucleotide and includes a single- or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases. Nucleic acids may also include fragments and modified nucleotides. Thus, the terms "polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" are used interchangeably to denote a polymer of RNA and/or DNA and/or RNA-DNA that is single- or double-stranded, optionally comprising synthetic, non-natural, or altered nucleotide bases. Nucleotides (usually found in their 5'-monophosphate form) are referred to by their single letter designation as follows: "A" for adenosine or deoxyadenosine (for RNA or DNA, respectively), "C" for cytosine or deoxycytosine, "G" for guanosine or deoxyguanosine, "U" for uridine, "T" for deoxythymidine, "R" for purines (A or G), "Y" for pyrimidines (C or T), "K" for G or T, "H" for A or C or T, "I" for inosine, and "N" for any nucleotide.
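The single-letter designations above can be used to test whether a DNA sequence fits a degenerate motif, for example a TAM written with ambiguity codes. The following is a minimal sketch limited to the DNA codes listed in this paragraph; the table name `IUPAC_DNA` and the function `matches` are hypothetical names used for illustration.

```python
# Ambiguity codes for DNA as listed in the definition above.
# "U" and "I" are omitted because this sketch handles plain DNA only.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purines
    "Y": "CT",    # pyrimidines
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def matches(pattern: str, sequence: str) -> bool:
    """Return True if `sequence` is one of the sequences described by `pattern`.

    Both strings are compared position by position, so they must have the
    same length. Unknown pattern symbols raise ValueError.
    """
    pattern, sequence = pattern.upper(), sequence.upper()
    unknown = set(pattern) - set(IUPAC_DNA)
    if unknown:
        raise ValueError(f"unsupported symbols in pattern: {sorted(unknown)}")
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC_DNA[code] for code, base in zip(pattern, sequence))
```

For instance, `matches("TTGAT", "TTGAT")` holds for the ISDra2 TAM, and a degenerate pattern such as `"TYRN"` accepts any four-nucleotide sequence beginning with T followed by a pyrimidine and then a purine.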
The term "genome" as it applies to prokaryotic and eukaryotic cells or organisms encompasses not only chromosomal DNA found within the nucleus, but also organelle DNA found within subcellular components (e.g., mitochondria or plastids) of the cell. The term "genome" refers to the entire complement of genetic material (genes and non-coding sequences) that is present in each cell of an organism, or virus or organelle; and/or a complete set of chromosomes inherited as a (haploid) unit from one parent.
The term "selectively hybridizes" means hybridization, preferably under stringent hybridization conditions, of a nucleic acid sequence to a specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid sequences, and the substantial exclusion of non-target nucleic acids. Selectively hybridizing sequences typically have at least about 80% sequence identity, or 90% sequence identity, up to and including 100% sequence identity (i.e., fully complementary) with each other.
The term "stringent conditions" or "stringent hybridization conditions" includes reference to conditions under which a probe will selectively hybridize to its target sequence in an in vitro hybridization assay. Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified which are 100% complementary to the probe (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). A person skilled in the art knows various conditions for hybridization, including stringent hybridization conditions and highly stringent hybridization conditions. See, for example, Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, N.Y.; and Ausubel et al. (eds.), 1995, Current Protocols in Molecular Biology, John Wiley & Sons, N.Y.
The term "homology" refers to DNA sequences that are similar. For example, a "region of homology to a genomic region" that is found on the donor DNA is a region of DNA that has a similar sequence to a given "genomic region" in the cell or organism genome. A region of homology can be of any length that is sufficient to promote homologous recombination at the cleaved target site. For example, the region of homology can comprise 5-3000 or more bases, such as at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, or 3000 in length to enable the homologous recombination with the corresponding genomic region.
As used herein, a "genomic region" is a segment on a chromosome or organelle DNA of a cell that is present either upstream or downstream of the target site or, alternatively, also comprises a portion (at either the 5’ or 3’ end) of the target site. The genomic region can comprise 5-3000 or more bases, such as at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, or 3000 in length to enable the homologous recombination with the corresponding region of homology.
The term "homologous recombination" (HR) means the exchange of DNA fragments between two DNA molecules at the sites of homology. The frequency of homologous recombination is influenced by a number of factors. The amount of homologous recombination and the relative proportion of homologous to non-homologous recombination vary in different organisms. Generally, the length of the region of homology affects the frequency of homologous recombination events: the longer the region of homology, the greater the frequency. Further, the homologous recombination needs a certain length of the homologous region, which is species-variable. See, for example, Singer et al., (1982) Cell 31: 25-33; Shen and Huang, (1986) Genetics 112: 441-57; Watt et al., (1985) Proc. Natl. Acad. Sci. USA 82: 4768-72; Sugawara and Haber, (1992) Mol Cell Biol 12: 563-75; Rubnitz and Subramani, (1984) Mol Cell Biol 4: 2253-8; Ayares et al., (1986) Proc. Natl. Acad. Sci. USA 83: 5199-203; Liskay et al., (1987) Genetics 115: 161-7.
"Sequence identity" or "identity" in the context of nucleotide or amino acid sequences refers to the nucleotide bases or amino acid residues in two sequences that are the same when aligned for maximum correspondence over a specified comparison window.
The term "percentage of sequence identity" refers to the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the nucleotide or amino acid sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage sequence identity is calculated by dividing the number of matched positions (i.e., positions at which the nucleotide bases or amino acid residues in the two sequences are identical) by the total number of positions in the window of comparison and multiplying the result by 100. For example, when aligning two sequences, if 950 positions in two sequences, which are optimally aligned in a comparison window of 1000 positions, are identical, the sequences are 95% identical to each other.
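The calculation described above can be written out directly. The following is a minimal sketch that assumes the two sequences have already been optimally aligned by some other tool and are supplied as equal-length strings with "-" marking gaps; the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over a comparison window.

    Inputs are two pre-aligned sequences of equal length (the comparison
    window). Matched positions are those where both sequences carry the same
    residue; a gap never counts as a match. The denominator is the total
    number of positions in the window, as in the definition above.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matched / len(aligned_a)
```

With a window of 1000 positions of which 950 are identical, the function reproduces the 95% figure from the example in the text. Note that alignment programs differ in how they treat gapped positions in the denominator, so values reported by a given tool may deviate slightly from this simple definition.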
A variety of comparison methods have been designed for sequence alignments and the calculations of percent identity or similarity, including, but not limited to, the MegAlign™ program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, Wis.). Within the context of this application it will be understood that where sequence analysis software is used for analysis, the results of the analysis will be based on the "default values" of the program referenced, unless otherwise specified. As used herein "default values" will mean any set of values or parameters that originally load with the software when first initialized.
"BLAST" is a searching algorithm provided by NCBI used to find regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches to identify sequences having sufficient similarity to a query sequence such that the similarity would not be predicted to have occurred randomly. BLAST reports the identified sequences and their local alignment to the query sequence. It is well understood by one skilled in the art that many levels of sequence identity are useful in identifying polypeptides from other species or modified naturally or synthetically wherein such polypeptides have the same or similar function or activity. Useful examples of percent identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or any percentage from 50% to 100%. Indeed, any amino acid identity from 50% to 100% may be useful in describing the present disclosure, such as 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%.
A "centimorgan" (cM) or "map unit" is the distance between two polynucleotide sequences, linked genes, markers, target sites, loci, or any pair thereof, wherein 1% of the products of meiosis are recombinant. Thus, a centimorgan is equivalent to a distance equal to a 1% average recombination frequency between the two linked genes, markers, target sites, loci, or any pair thereof.
An "isolated" polynucleotide, polypeptide, or protein is substantially or essentially free from components that normally accompany or interact with the polynucleotide, polypeptide, or protein as found in its naturally occurring environment. Thus, an isolated polynucleotide or polypeptide or protein is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. Preferably, an "isolated" polynucleotide is free of sequences that naturally flank the polynucleotide (i.e., sequences located at the 5' and 3' ends of the polynucleotide) in the genomic DNA of the organism from which the polynucleotide is derived. Isolated polynucleotides and polypeptides may be purified from a cell in which they naturally occur. The methods for isolating or purifying polynucleotides or polypeptides are known to a person skilled in the art. The term also embraces recombinant or chemically synthesized polynucleotides and polypeptides.
The term "fragment" refers to a contiguous set of nucleotides or amino acids. In one embodiment, a fragment comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 contiguous nucleotides. In one embodiment, a fragment comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 contiguous amino acids. A fragment may or may not exhibit the function of a sequence sharing some percent identity over the length of said fragment.
The term "functional fragment" refers to a portion of an isolated polynucleotide or polypeptide that displays the same activity or function as the longer or full-length sequence from which it derives.
The term "gene" includes a nucleic acid fragment that expresses a functional molecule such as, but not limited to, a specific protein, including regulatory sequences preceding (5' non-coding sequences) and following (3' non-coding sequences) the coding sequence.
The term "endogenous" means a sequence or other molecule that naturally occurs in a cell or organism. An endogenous polynucleotide is normally found in the genome of a cell; that is, not heterologous.
The term "heterologous" refers to the difference between the original environment, location, or composition of a particular polynucleotide or polypeptide and its current environment, location, or composition. Non-limiting examples include differences in taxonomic derivation (e.g., a polynucleotide obtained from species A would be heterologous if inserted into the genome of species B, or of a different variety or cultivar of species A; or a polynucleotide obtained from a bacterium and introduced into a cell of a plant or an animal), or sequence (e.g., a polynucleotide obtained from species A, isolated, modified, and re-introduced into a plant of species A).
An "allele" is one of several alternative forms of a gene occupying a given locus on a chromosome. When all the alleles present at a given locus on a chromosome in a cell or an organism are the same, the cell or organism is homozygous at that locus. If the alleles present at a given locus on a chromosome differ, the cell or organism is heterozygous at that locus.
"Coding sequence" refers to a nucleotide sequence which codes for a specific amino acid sequence. "Regulatory sequences" refer to nucleotide sequences located upstream (5' non-coding sequences) , within, or downstream (3' non-coding sequences) of a coding sequence, which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include, but are not limited to, promoters, translation leader sequences, 5' untranslated sequences, 3' untranslated sequences, introns, polyadenylation signal sequences, RNA processing sites, effector binding sites, and stem-loop structures.
A "mutated gene" is a gene that has been altered through human intervention. A "mutated gene" has a sequence that differs from the sequence of the corresponding non-mutated gene by the addition, deletion, insertion or substitution of at least one nucleotide. In the present disclosure, the mutated gene comprises an alteration that results from a guide polynucleotide/TnpB endonuclease system as disclosed herein. A mutated organism is an organism comprising a mutated gene.
As used herein, a "targeted mutation" is a mutation in a gene that is made in a target sequence within the gene using any method known to a person skilled in the art, including a method involving a guided TnpB endonuclease system as disclosed herein.
The term "knock-out" refers to a DNA sequence in a cell that has been rendered partially or completely inoperative, e.g., by targeting with a TnpB protein of the present disclosure; for example, a DNA sequence prior to knock-out could have encoded an amino acid sequence, or could have had a regulatory function (e.g., promoter) .
The term "knock-in" represents the replacement or insertion of a DNA sequence at a specific site in the genome of a cell by targeting with a TnpB protein (for example by homologous recombination (HR), wherein a suitable donor DNA polynucleotide is also used). The knock-in can be a specific insertion of a heterologous nucleotide sequence that encodes an amino acid sequence or a functional RNA, or a specific insertion of a transcriptional regulatory element.
The term "domain" means a contiguous stretch of nucleotides (that can be RNA, DNA, and/or RNA-DNA-combination sequence) or contiguous or non-contiguous amino acids.
A "conserved domain" or "motif" means a set of nucleotides or amino acids conserved at specific positions along an aligned sequence of evolutionarily related genes or proteins. While nucleotides or amino acids at other positions can vary between homologous proteins, nucleotides or amino acids that are highly conserved at specific positions indicate amino acids that are essential for the structure, the stability, or the function of a polynucleotide or protein.
A "codon-optimized" nucleotide sequence is a nucleotide sequence having its frequency of codon usage designed to mimic the frequency of preferred codon usage of the host cell. An "optimized" polynucleotide comprises a nucleotide sequence that has been optimized for improved expression in a particular heterologous host cell.
A "promoter" is a nucleotide sequence involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, and/or comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of some variation may have identical promoter activity.
A promoter that causes a gene to be expressed in most tissues or cell types at most times is commonly referred to as a "constitutive promoter". The term "inducible promoter" or “regulated promoter” refers to a promoter that selectively expresses a coding sequence or functional RNA in response to the presence of an endogenous or exogenous stimulus, for example chemical compounds (chemical inducers), or in response to environmental, hormonal, chemical, and/or developmental signals. Inducible or regulated promoters include, for example, promoters induced or regulated by light, heat, stress, flooding or drought, salt stress, osmotic stress, phytohormones, wounding, or chemicals such as ethanol, abscisic acid (ABA), jasmonate, salicylic acid, or safeners.
An "enhancer" is a nucleotide sequence that can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the activity or tissue-specificity of a promoter.
The term "translation leader sequence" refers to a nucleotide sequence located between the promoter sequence and the coding sequence. The translation leader sequence is present in the mRNA upstream of the start codon. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency.
The term "3' non-coding sequences", which is used interchangeably with "transcription terminator" or "termination sequences", refers to nucleotide sequences located downstream of a coding sequence, including polyadenylation signal sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3' end of the mRNA precursor.
The term "RNA transcript" refers to the product resulting from transcription of a DNA sequence catalyzed by RNA polymerase. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript or pre-mRNA. An RNA transcript derived from post-transcriptional processing of the pre-mRNA is referred to as mature RNA or messenger RNA (mRNA). "Messenger RNA" or "mRNA" refers to the RNA that can be translated into protein and does not comprise introns. "cDNA" refers to a DNA that is complementary to, and synthesized from, an mRNA template using reverse transcriptase. The cDNA can be single-stranded or converted into double-stranded form using, e.g., the Klenow fragment of DNA polymerase I. "Sense" RNA refers to an RNA transcript that includes the mRNA and can be translated into protein within a cell or in vitro. "Antisense RNA" refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA, and that can block the expression of a target gene. The complementarity of an antisense RNA may be with any part of the specific gene transcript, i.e., at the 5' non-coding sequence, 3' non-coding sequence, introns, or the coding sequence. "Functional RNA" refers to antisense RNA, ribozyme RNA, or other RNA that may not be translated but yet has an effect on cellular processes. The terms "complement" and "reverse complement" are used interchangeably herein with respect to mRNA transcripts, and are meant to define the antisense RNA of the message.
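For illustration only, the relationship between an mRNA and its full-length antisense RNA described above can be sketched in a few lines of Python. The function name and the example sequence are hypothetical and are not taken from the present disclosure; the sketch assumes an unmodified RNA alphabet (A, C, G, U) and canonical base pairing.

```python
# Minimal sketch: derive the full-length antisense RNA of an mRNA.
# Assumption: only the four standard RNA letters are present.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(mrna: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5'->3'."""
    seq = mrna.upper()
    unexpected = set(seq) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected RNA letters: {sorted(unexpected)}")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return seq.translate(RNA_COMPLEMENT)[::-1]


# Hypothetical six-nucleotide message used only as a toy input.
antisense = reverse_complement_rna("AUGGCC")
```

Reversing after complementing keeps both strands written 5' to 3', which is the usual convention when comparing a message with its antisense RNA.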
The term "operably linked" refers to the association of nucleotide sequences on a single nucleic acid fragment so that the function of one is regulated by the other. For example, a promoter is operably linked with a coding sequence when it is capable of regulating the expression of the coding sequence (i.e., the coding sequence is transcribed under the control of the promoter) . Coding sequences can be operably linked to regulatory sequences in a sense or antisense orientation.
The term "host" refers to an organism or cell into which a heterologous component (polynucleotide, polypeptide, other molecule, cell) has been introduced. As used herein, a "host cell" refers to an in vivo or isolated eukaryotic cell, prokaryotic cell (e.g., bacterial or archaeal cell) , or cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, into which a heterologous polynucleotide or polypeptide has been introduced. The cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell and an animal cell, such as an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, an insect cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell. In some cases, the cell is isolated. In some cases, the cell is in vivo.
The term "recombinant" refers to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis, or by genetic engineering techniques.
The terms "plasmid" and "vector" refer to a linear or circular extra chromosomal element often carrying genes that are not part of the central metabolism of the cell, and usually in the form of double-stranded DNA. Such elements may be autonomously replicating sequences, genome integrating sequences, phage, or nucleotide sequences, in linear or circular form, of a single-or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a polynucleotide of interest into a cell.
The term "construct" , when referring to nucleic acid molecules, comprises an artificial combination of nucleic acid sequences, e.g., regulatory and coding sequences that are not all found together in nature. When the nucleic acid construct contains the control sequences required to express the coding sequence of the present invention, the term is synonymous with the term “expression cassette” . For example, a nucleic acid construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. Such a construct may be used by itself or may be used in conjunction with a vector. If a vector is used, then the choice of vector is dependent upon the method that will be used to introduce the vector into the host cells as is well known to those skilled in the art. The vector for expressing a coding sequence (e.g., comprising an expression construct) is referred to as “expression vector” .
The term "expression" , as used herein, refers to the production of a functional end-product (e.g., an mRNA, guide RNA, or a protein) in either precursor or mature form.
A "mature" protein refers to a post-translationally processed polypeptide (i.e., one from which any pre-or propeptides present in the primary translation product have been removed) .
"Precursor" protein refers to the primary product of translation of mRNA (i.e., with pre-and propeptides still present) . Pre-and propeptides may be but are not limited to intracellular localization signals.
As used herein, an "effector" or "effector protein" is a protein that encompasses an activity including recognizing, binding to, and/or cleaving or nicking a polynucleotide target. An effector, or effector protein, may also be an endonuclease, such as the TnpB polypeptide of the invention. The "effector complex" of a gene editing system includes TnpB polypeptide involved in gRNA and target recognition and binding.
A "functional fragment" of a TnpB endonuclease refers to a portion of the TnpB endonuclease of the present disclosure in which the ability to recognize, bind to, and/or cleave (introduce a double-strand break in) the target site is retained. The "functional variant" of a TnpB endonuclease refers to a variant of the TnpB endonuclease disclosed herein in which the ability to recognize, bind to, and/or cleave a target sequence is retained.
A TnpB endonuclease may also include a multifunctional TnpB endonuclease, which refers to a single polypeptide that has endonuclease activity (comprising at least one protein domain that can act as an endonuclease) and at least one other functionality, such as, but not limited to, the functionality to form a complex (comprising at least a second protein domain that can form a complex with other proteins).
As used herein, the term "guide polynucleotide" relates to a polynucleotide that can form a complex with a TnpB endonuclease, such as the TnpB endonuclease described herein, and enables the TnpB endonuclease to recognize, optionally bind to, and optionally cleave a DNA target site. The guide polynucleotide can be a guide RNA, a guide DNA sequence, or a combination thereof (an RNA-DNA combination molecule). For a TnpB polypeptide, the guide RNA is also referred to as “right element RNA” or “reRNA”.
A "functional fragment" of a guide polynucleotide refers to a portion or subsequence of the guide polynucleotide of the present disclosure in which the ability to function as a guide polynucleotide is retained. A "functional variant" of a guide polynucleotide refers to a variant of the guide polynucleotide of the present disclosure in which the ability to function as a guide polynucleotide is retained.
The terms "targeting domain" and "targeting region" are used interchangeably herein and include a nucleotide sequence that can hybridize (is complementary) to one strand (nucleotide sequence) of a double-stranded DNA target site. The percent complementarity between the targeting region and the target sequence can be at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%. The variable targeting region can be at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides in length. In some embodiments, the variable targeting domain comprises a contiguous stretch of 12 to 30 nucleotides. The targeting domain can be composed of a DNA sequence, an RNA sequence, a modified DNA sequence, a modified RNA sequence, or any combination thereof.
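As a non-limiting illustration, the percent complementarity between a targeting region and one strand of a target site could be computed as sketched below. The function name and the example sequences are hypothetical; the sketch assumes equal-length, ungapped sequences and counts only canonical Watson-Crick pairs (G-U wobble pairs are treated as mismatches, which is a simplifying assumption rather than a statement of the disclosure).

```python
# Minimal sketch: percent complementarity between a guide targeting region
# and the DNA strand it is meant to hybridize with.
# Assumptions: equal lengths, no gaps, canonical base pairs only.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("U", "A"), ("A", "U")}


def percent_complementarity(targeting_region: str, target_strand: str) -> float:
    """Both sequences are given 5'->3'; pairing is evaluated antiparallel."""
    guide = targeting_region.upper()
    # Reverse the target so position i of the guide faces its pairing partner.
    target = target_strand.upper()[::-1]
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((g, t) in PAIRS for g, t in zip(guide, target))
    return 100.0 * paired / len(guide)


# Hypothetical 12-nt RNA targeting region and a fully complementary DNA strand.
guide_rna = "GGAUCCAAGUCU"
perfect_target = "AGACTTGGATCC"
one_mismatch_target = "AGACTTGGATCA"

full = percent_complementarity(guide_rna, perfect_target)
partial = percent_complementarity(guide_rna, one_mismatch_target)
```

With these toy inputs the first call reports full complementarity and the second reports 11 paired positions out of 12, which could then be compared against any of the thresholds listed above.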
The "backbone" of a guide polynucleotide comprises a nucleotide sequence that interacts with a TnpB polypeptide.
The term "gRNA/TnpB complex" refers to an RNA component and a TnpB endonuclease that are capable of forming a complex, wherein the complex can direct the TnpB endonuclease to a DNA target site, enabling the TnpB endonuclease to recognize, bind to, and/or cleave (introduce a double-strand break) the DNA target site.
The terms "target site" , "target sequence" , and "target region" are used interchangeably herein and refer to a nucleotide sequence on a chromosome, episome, a locus, or any other DNA molecule in the genome (including chromosomal, chloroplastic, mitochondrial DNA, plasmid DNA) of a cell, at which a gRNA/TnpB complex can recognize, bind to, and optionally nick or cleave. The target site can be an endogenous site in the genome of a cell, or alternatively, the target site can be heterologous to the cell and thereby not be naturally occurring in the genome of the cell, or the target site can be found in a heterologous genomic location compared to where it occurs in nature.
A "transposon-associated motif" (TAM) herein refers to a short nucleotide sequence adjacent to a target sequence that is recognized (targeted) by a gRNA/TnpB complex described herein. The TnpB endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not adjacent to a TAM sequence. The sequence and length of a TAM herein can differ depending on the TnpB protein. The TAM sequence is typically 4 or 5 nucleotides long.
A “modified” TnpB polypeptide/endonuclease refers to a TnpB polypeptide comprising the substitution, deletion, insertion or addition of at least one amino acid when compared to the initial or wild-type TnpB polypeptide. If the modified TnpB polypeptide is deprived of the activity of cleaving the DNA molecule while retaining the ability to recognize and bind to a polynucleotide, the modified TnpB polypeptide can be referred to as a “disarmed” TnpB polypeptide.
An "altered target site" , "altered target sequence" , "modified target site" , "modified target sequence" are used interchangeably herein and refer to a target sequence as disclosed herein that comprises at least one alteration when compared to non-altered target sequence. Such "alteration" includes, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, (iv) a chemical alteration of at least one nucleotide, or (v) any combination of (i) - (iv) .
A "modified nucleotide" or "edited nucleotide" refers to a nucleotide sequence of interest that comprises at least one alteration when compared to its non-modified nucleotide sequence. Such "alterations" include, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, (iv) a chemical alteration of at least one nucleotide, or (v) any combination of (i) - (iv) .
Methods for "modifying a target site" and "altering a target site" are used interchangeably herein and refer to methods for producing an altered target site.
As used herein, "donor DNA" is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a gRNA/TnpB complex of the invention.
The term "polynucleotide modification template" includes a polynucleotide that comprises at least one nucleotide modification when compared to the nucleotide sequence to be edited. A nucleotide modification can be the substitution, addition, insertion or deletion of at least one nucleotide. Optionally, the polynucleotide modification template can further comprise homologous nucleotide sequences flanking the at least one nucleotide modification, wherein the flanking homologous nucleotide sequences provide sufficient homology to the desired nucleotide sequence to be edited.
As used herein, the term "before" , in reference to a sequence position, refers to an occurrence of one sequence upstream of another sequence (at the 5’ end for nucleotide sequences, or at the N terminus for the amino acid sequences) . The term “after” in reference to a sequence position, refers to an occurrence of one sequence downstream of another sequence (at the 3’ end for nucleotide sequences, or at the C terminus for the amino acid sequences) .
2. TnpB polypeptide with the activity of RNA-guided endonuclease
The inventors have identified a number of novel RNA-guided endonucleases, which are TnpB polypeptides encoded by the TnpB gene in insertion sequences (IS). That is, the TnpB polypeptides can work as the effector protein in a gene editing system. Upon sequence analysis, the inventors found that the active TnpB polypeptides comprise an N-terminal helix-turn-helix (HTH) domain, a central domain, a C-terminal Zinc finger domain, and a DDE motif.
Therefore, the present disclosure provides an isolated TnpB polypeptide having the activity of RNA-guided endonuclease or a functional fragment thereof. In some embodiments, the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal Zinc finger domain, and a DDE motif.
In some embodiments, the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis.
In some embodiments, the TnpB polypeptide is encoded by a tnpB gene in an insertion sequence (IS) selected from a group consisting of ISEfa4, ISAs26, ISCpe2, ISMma22, ISBce3, ISAeme8, ISTfu1, ISCco1, ISSoc3, ISTel2, ISNsp3, ISCbt1, ISMac7, ISEc46, ISSen6, ISHahl1, ISKpn69, ISDge10, ISKpn85, ISNsp2, ISAba30, ISRor9, ISAam1, ISYmu1, ISCytsp1, ISCvi1, ISCvi2, ISAepa1 and ISBth16.
In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. The variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
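By way of a hedged illustration, the two criteria above (percent identity to a reference sequence and conservation at positions "corresponding to" given reference residues) can be evaluated on a pairwise alignment as sketched below. The function names and the toy alignment are hypothetical and do not reproduce any SEQ ID NO; the identity convention used (identical columns divided by total alignment columns) is only one of several in common use and is an assumption of this sketch.

```python
# Minimal sketch operating on a precomputed pairwise alignment, where "-"
# marks a gap. How the alignment is produced is outside this sketch.
from typing import Dict, Iterable


def percent_identity(ref_aln: str, var_aln: str) -> float:
    """Identical columns / alignment columns, as a percentage.

    Assumption: any column containing a gap counts as non-identical.
    """
    if not ref_aln or len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    same = sum(r == v and r != "-" for r, v in zip(ref_aln, var_aln))
    return 100.0 * same / len(ref_aln)


def conserved_at(ref_aln: str, var_aln: str,
                 ref_positions: Iterable[int]) -> Dict[int, bool]:
    """Map each 1-based position of the ungapped reference to whether the
    variant carries the same residue in the corresponding alignment column."""
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must be of equal length")
    wanted = set(ref_positions)
    result: Dict[int, bool] = {}
    ref_pos = 0
    for r, v in zip(ref_aln, var_aln):
        if r == "-":
            continue  # insertion in the variant; no reference position here
        ref_pos += 1
        if ref_pos in wanted:
            result[ref_pos] = (v == r)
    missing = wanted - result.keys()
    if missing:
        raise ValueError(f"positions beyond reference length: {sorted(missing)}")
    return result


# Toy alignment (hypothetical): reference "MNGDEL" versus a variant with one
# inserted residue and one deleted residue.
ref_toy = "MN-GDEL"
var_toy = "MNAGD-L"
identity = percent_identity(ref_toy, var_toy)
conservation = conserved_at(ref_toy, var_toy, [2, 3, 4, 5, 6])
```

Tracking the reference position separately from the alignment column is what allows a residue such as N31 of a reference to be located in a variant whose own numbering has shifted because of insertions or deletions.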
In some embodiments, the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
In some embodiments, the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity. The TnpB polypeptide of the present disclosure can recognize a shorter TAM as compared to ISDra2 TnpB polypeptide. In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM consisting of four consecutive nucleotides.
In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM of CCAT, CTAC, TGAC, TGAT, TTAC, TTAG, TTAA, TTAT, ACAT, TTTAT, TTTAA or TTGAT.
In some embodiments, the TnpB polypeptide recognizes a TAM of CCAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 2, 9, 15, 17 or 19.
In some embodiments, the TnpB polypeptide recognizes a TAM of CTAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 16.
In some embodiments, the TnpB polypeptide recognizes a TAM of TGAY, where Y = C or T. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTAN, where N is any nucleotide. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 4, 5, 6, 8, 11, 12, 13, 14, 18, 20, 22, 26, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 5, 11, 12 or 20. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAG. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 4, 6 or 14. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAA. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 22, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 8, 13, 18 or 26.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAW, where W = A or T. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 3. In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 23. In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAA. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 25 or 27.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTGAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 24.
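As a non-limiting sketch, candidate TAM occurrences in a DNA sequence, including the degenerate forms TGAY, TTAN and TTTAW, could be located as follows. The function name and the example DNA string are hypothetical; only the degenerate letters used in the present section (Y, W, N) are mapped, and the sketch scans a single strand without deciding on which side of the TAM the target sequence lies.

```python
# Minimal sketch: locate TAM occurrences, allowing degenerate letters.
# Assumption: only Y (C/T), W (A/T) and N (any base) are needed here.
import re
from typing import List

DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]", "W": "[AT]", "N": "[ACGT]",
}


def find_tam_sites(dna: str, tam: str) -> List[int]:
    """Return 0-based start indices of every TAM match on the given strand.

    A lookahead is used so that overlapping matches are also reported.
    """
    try:
        body = "".join(DEGENERATE[base] for base in tam.upper())
    except KeyError as exc:
        raise ValueError(f"unsupported TAM letter: {exc.args[0]}") from None
    pattern = re.compile(f"(?=({body}))")
    return [match.start() for match in pattern.finditer(dna.upper())]


# Hypothetical 19-nt DNA used only as a toy input.
toy_dna = "GGTGATCCTTAAGTGACTT"
tgay_sites = find_tam_sites(toy_dna, "TGAY")
ttan_sites = find_tam_sites(toy_dna, "TTAN")
tttaw_sites = find_tam_sites(toy_dna, "TTTAW")
```

In practice the same scan would typically be repeated on the reverse-complement strand, since a TAM may lie on either strand of a double-stranded target.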
In some embodiments, the TnpB polypeptide of the present disclosure or the functional fragment thereof is capable of effecting RNA-guided cleavage in a prokaryotic and/or eukaryotic cell, preferably in both prokaryotic and eukaryotic cells. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24.
In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. The variant may differ from SEQ ID NO: 7, 18, 21, 23 or 24 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
In some embodiments, the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
Examples for the DDE motif include
D186, E280 and D362 of SEQ ID NO: 1;
D188, E272 and D368 of SEQ ID NO: 2;
D205, E289 and D372 of SEQ ID NO: 3;
D184, E268 and D348 of SEQ ID NO: 4;
D175, E260 and D339 of SEQ ID NO: 5;
D186, E270 and D352 of SEQ ID NO: 6;
D181, E265 and D361 of SEQ ID NO: 7;
D199, E293 and D383 of SEQ ID NO: 8;
D211, E295 and D373 of SEQ ID NO: 9;
D185, E268 and D350 of SEQ ID NO: 10;
D181, E265 and D345 of SEQ ID NO: 11;
D234, E342 and D436 of SEQ ID NO: 12;
D184, E268 and D350 of SEQ ID NO: 13;
D185, E269 and D351 of SEQ ID NO: 14;
D189, E273 and D369 of SEQ ID NO: 15;
D190, E290 and D376 of SEQ ID NO: 16;
D188, E272 and D368 of SEQ ID NO: 17;
D187, E271 and D351 of SEQ ID NO: 18;
D188, E272 and D368 of SEQ ID NO: 19;
D181, E265 and D345 of SEQ ID NO: 20;
D184, E268 and D364 of SEQ ID NO: 21;
D183, E267 and D349 of SEQ ID NO: 22;
D186, E270 and D351 of SEQ ID NO: 23;
D185, E279 and D361 of SEQ ID NO: 24;
D188, E272 and D352 of SEQ ID NO: 25;
D181, E265 and D348 of SEQ ID NO: 26;
D186, E276 and D359 of SEQ ID NO: 27;
D189, E274 and D354 of SEQ ID NO: 28; and
D188, E272 and D352 of SEQ ID NO: 29.
The inventors found that the TnpB endonuclease can be disarmed by modifying the DDE motif.
Therefore, the present disclosure also provides a modified/disarmed TnpB polypeptide comprising a modified DDE motif. In some embodiments, the DDE motif is modified by substituting at least one amino acid in the motif with a neutral amino acid or a basic amino acid. In some embodiments, at least one amino acid in the motif is substituted by alanine. In some embodiments, the modified TnpB polypeptide comprises
D186A, E280A and/or D362A as compared to SEQ ID NO: 1;
D188A, E272A and/or D368A as compared to SEQ ID NO: 2;
D205A, E289A and/or D372A as compared to SEQ ID NO: 3;
D184A, E268A and/or D348A as compared to SEQ ID NO: 4;
D175A, E260A and/or D339A as compared to SEQ ID NO: 5;
D186A, E270A and/or D352A as compared to SEQ ID NO: 6;
D181A, E265A and/or D361A as compared to SEQ ID NO: 7;
D199A, E293A and/or D383A as compared to SEQ ID NO: 8;
D211A, E295A and/or D373A as compared to SEQ ID NO: 9;
D185A, E268A and/or D350A as compared to SEQ ID NO: 10;
D181A, E265A and/or D345A as compared to SEQ ID NO: 11;
D234A, E342A and/or D436A as compared to SEQ ID NO: 12;
D184A, E268A and/or D350A as compared to SEQ ID NO: 13;
D185A, E269A and/or D351A as compared to SEQ ID NO: 14;
D189A, E273A and/or D369A as compared to SEQ ID NO: 15;
D190A, E290A and/or D376A as compared to SEQ ID NO: 16;
D188A, E272A and/or D368A as compared to SEQ ID NO: 17;
D187A, E271A and/or D351A as compared to SEQ ID NO: 18;
D188A, E272A and/or D368A as compared to SEQ ID NO: 19;
D181A, E265A and/or D345A as compared to SEQ ID NO: 20;
D184A, E268A and/or D364A as compared to SEQ ID NO: 21;
D183A, E267A and/or D349A as compared to SEQ ID NO: 22;
D186A, E270A and/or D351A as compared to SEQ ID NO: 23;
D185A, E279A and/or D361A as compared to SEQ ID NO: 24;
D188A, E272A and/or D352A as compared to SEQ ID NO: 25;
D181A, E265A and/or D348A as compared to SEQ ID NO: 26;
D186A, E276A and/or D359A as compared to SEQ ID NO: 27;
D189A, E274A and/or D354A as compared to SEQ ID NO: 28; and
D188A, E272A and/or D352A as compared to SEQ ID NO: 29.
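For illustration only, the substitution notation used in the list above (wild-type residue, 1-based position, replacement residue, e.g. D181A) can be applied to a sequence programmatically as sketched below. The function names and the seven-residue toy sequence are hypothetical and do not correspond to any SEQ ID NO of the present disclosure; the check that the wild-type residue is actually present at the stated position is a safeguard added in this sketch.

```python
# Minimal sketch: apply point substitutions written as e.g. "D181A".
# Assumption: positions are 1-based and refer to the unmodified sequence.
import re
from typing import Iterable

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitution(seq: str, mutation: str) -> str:
    """Return seq with a single residue replaced, after verifying that the
    wild-type residue named in the mutation is present at that position."""
    parsed = SUBSTITUTION.fullmatch(mutation)
    if parsed is None:
        raise ValueError(f"unrecognized mutation format: {mutation!r}")
    wild_type, position, replacement = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    index = position - 1
    if index < 0 or index >= len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {seq[index]}"
        )
    return seq[:index] + replacement + seq[index + 1:]


def apply_substitutions(seq: str, mutations: Iterable[str]) -> str:
    """Apply several substitutions; substitutions never shift positions."""
    for mutation in mutations:
        seq = apply_substitution(seq, mutation)
    return seq


# Hypothetical toy sequence with D, E and D at positions 3, 5 and 7.
toy_sequence = "MKDLEAD"
toy_disarmed = apply_substitutions(toy_sequence, ["D3A", "E5A", "D7A"])
```

Because each listed embodiment recites "and/or", any subset of the three substitutions for a given sequence can be passed to the same helper.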
In some embodiments, the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide is conserved at positions corresponding to N31, G179, L267, C332, C335, C351, and C354 of SEQ ID NO: 7.
The inventors further found that the TnpB endonuclease can be disarmed by modifying the amino acid corresponding to N31 of SEQ ID NO: 7. Therefore, the present disclosure also provides a modified/disarmed TnpB polypeptide comprising a modification at the position corresponding to N31 of SEQ ID NO: 7. In some embodiments, the modification is a substitution with alanine.
In some embodiments, the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide is conserved at positions corresponding to G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
In some embodiments, the modified TnpB polypeptide retains the ability to recognize and bind to a DNA molecule but is unable to cleave the DNA molecule, e.g., double-stranded DNA; that is, it is a disarmed TnpB polypeptide.
3. Fusion polypeptide
The present disclosure provides a fusion polypeptide comprising a TnpB polypeptide of the present disclosure or a functional fragment thereof or a modified/disarmed TnpB polypeptide of the present disclosure, fused to a fusion partner. The TnpB polypeptide includes the TnpB polypeptide with the activity of RNA-guided endonuclease as described above, or the functional fragment thereof. The modified/disarmed TnpB polypeptide retains the ability to recognize and bind to a DNA molecule but is unable to cleave the DNA molecule, e.g., double-stranded DNA.
In some embodiments, the fusion partner is a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone or other DNA-binding protein) associated with the target DNA.
In further embodiments, the fusion partner is a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.
In further embodiments, the fusion partner is a polypeptide that directly provides for increased transcription of the target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription regulator, etc.). In some embodiments, the fusion partner is another polypeptide or domain, for example Clo51 or FokI nuclease, to generate double-strand breaks (Guilinger et al. Nature Biotechnology, volume 32, number 6, June 2014).
In some embodiments, the fusion partner is a polypeptide that directs editing of single or multiple bases in a polynucleotide sequence, for example a site-specific deaminase that can change the identity of a nucleotide, for example from C-G to T-A or from A-T to G-C (Gaudelli et al. "Programmable base editing of A-T to G-C in genomic DNA without DNA cleavage." Nature (2017); Nishida et al. "Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems." Science 353 (6305) (2016); Komor et al. "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage." Nature 533 (7603) (2016): 420-4).
The fusion polypeptide may comprise, for example, an active (double-strand-break-creating), partially active (nickase) or deactivated (unable to cleave) TnpB endonuclease and a deaminase (such as, but not limited to, a cytidine deaminase, an adenine deaminase, APOBEC1, APOBEC3A, BE2, BE3, BE4, ABEs, or the like). In some embodiments, the fusion partner includes base edit repair inhibitors and glycosylase inhibitors (e.g., uracil glycosylase inhibitor (to prevent uracil removal)).
In some embodiments, the fusion partner can be a Cas endonuclease or another TnpB endonuclease as described in the present disclosure.
The TnpB polypeptide, the functional fragment thereof, or the modified/disarmed TnpB polypeptide of the present disclosure can also be fused to a heterologous nuclear localization sequence (NLS). A heterologous NLS herein may be of sufficient strength to drive accumulation of the TnpB polypeptide, the functional fragment thereof, the modified/disarmed TnpB polypeptide or the fusion polypeptide in a detectable amount in the nucleus of a eukaryotic cell. An NLS may comprise one (monopartite) or more (e.g., bipartite) short sequences (e.g., 2 to 20 residues) of basic, positively charged residues (e.g., lysine and/or arginine). An NLS may be operably linked to the N-terminus or C-terminus of a TnpB polypeptide, for example. Two or more NLS sequences can be linked to a TnpB polypeptide, for example, on both the N- and C-termini of a TnpB polypeptide.
4. Guide polynucleotide
The guide polynucleotide enables target recognition, binding, and optionally cleavage by the TnpB polypeptide. The guide polynucleotide sequence can be an RNA sequence, a DNA sequence, or a combination thereof (an RNA-DNA combination sequence). Optionally, the guide polynucleotide can comprise at least one nucleotide, phosphodiester bond or linkage modification such as, but not limited to, Locked Nucleic Acid (LNA), 5-methyl dC, 2,6-Diaminopurine, 2'-Fluoro A, 2'-Fluoro U, 2'-O-Methyl RNA, phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 (hexaethylene glycol chain) molecule, or 5' to 3' covalent linkage resulting in circularization. A guide polynucleotide that solely comprises ribonucleic acids is also referred to as a "guide RNA" or "gRNA". A guide polynucleotide may be engineered or synthetic. The gRNA for TnpB polypeptide is also referred to as “right element RNA” or “reRNA”.
The guide polynucleotide includes a chimeric non-naturally occurring guide RNA comprising regions that are not found together in nature (i.e., they are heterologous with each other). For example, the guide polynucleotide can be a chimeric non-naturally occurring guide RNA comprising a targeting region that can hybridize to a nucleotide sequence in a target DNA, linked to a backbone region that can recognize the TnpB polypeptide, wherein the targeting region and the backbone region are not found linked together in nature. In some embodiments, the targeting region is at the 3’ end of the scaffold.
The guide polynucleotide for TnpB polypeptide is a single guide, and the backbone can be 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides before the right end (RE) of the IS, from which the TnpB is derived. The guide polynucleotide can still effect target recognition, binding, and optionally cleavage by the TnpB polypeptide when a part or the whole of one or several stem structures in the backbone is removed.
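By way of non-limiting illustration, a guide of the layout described above (a backbone taken from the nucleotides before the right end of the IS, followed by a targeting region at the 3’ end) can be assembled as in the following minimal sketch. The function name `build_guide_rna`, the default backbone length and the placeholder sequences are assumptions made for illustration only; the sketch also assumes that the input is written as DNA and that the guide is wanted as RNA.

```python
def build_guide_rna(upstream_of_re: str, targeting_region: str,
                    backbone_length: int = 115) -> str:
    """Assemble a single guide as backbone + targeting region (5' to 3').

    `upstream_of_re` is the IS sequence ending at the right end (RE), written
    as DNA; the backbone is taken as its last `backbone_length` nucleotides.
    """
    if not 115 <= backbone_length <= 350:
        raise ValueError("backbone length outside the 115-350 nt range")
    if len(upstream_of_re) < backbone_length:
        raise ValueError("not enough sequence before the right end")
    backbone = upstream_of_re[-backbone_length:]
    # Targeting region is placed at the 3' end of the backbone.
    guide_dna = backbone + targeting_region
    # Convert the DNA spelling to RNA (assumption: an all-RNA guide is wanted).
    return guide_dna.upper().replace("T", "U")


# Placeholder sequences (hypothetical, not taken from any IS of the disclosure).
upstream_of_re = "ACGT" * 40            # 160 nt ending at the RE
guide = build_guide_rna(upstream_of_re, "GATTACAGATTACAGA", backbone_length=120)
```

The sketch does not model removal of stem structures from the backbone; doing so would require a secondary-structure prediction, which is outside the scope of this illustration.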
The guide polynucleotide can further comprise an additional nucleotide sequence at the 5’ end of the backbone. In some embodiments, the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.
In some embodiments, the targeting region and the backbone region are selected from the group consisting of a DNA sequence, an RNA sequence, and a combination thereof.
In some embodiments, the guide polynucleotide comprises RNA backbone modifications that enhance stability, DNA backbone modifications that enhance stability, or a combination thereof (see Kanasty et al., 2013, Common RNA-backbone modifications, Nature Materials 12:976-977; US20150082478 published 19 Mar. 2015 and US20150059010 published 26 Feb. 2015).
5. Polynucleotide and construct for expressing the TnpB polypeptide or fusion polypeptide
The TnpB endonuclease, the functional fragment thereof, the disarmed TnpB polypeptide and the fusion polypeptide of the present disclosure can be isolated from a native source (for TnpB polypeptide), or from a recombinant source where the host cell is genetically modified to express the nucleotide sequence encoding the polypeptide. Alternatively, the TnpB polypeptide and fusion polypeptide can be produced using cell free protein expression systems, or be synthetically produced.
Therefore, the present disclosure also provides an isolated polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide, the functional fragment thereof, the disarmed TnpB polypeptide and the fusion polypeptide of the present disclosure.
The TnpB polypeptide, the functional fragment thereof, the disarmed TnpB polypeptide and the fusion polypeptide, as well as the guide polynucleotide can be expressed in a cell. Cells include, but are not limited to, human, non-human, animal, bacterial, fungal, insect, yeast, and plant cells.
Standard recombinant DNA and molecular cloning techniques used herein are well known in the art and are described more fully in Sambrook et al., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y. (1989). Transformation methods are well known to those skilled in the art and are described infra.
Provided are also vectors and constructs including circular plasmids, and linear polynucleotides, comprising a polynucleotide of interest and optionally other components including linkers, adapters, regulatory sequences.
In some embodiments, the vector comprises an expression cassette encoding both the TnpB polypeptide and the guide polynucleotide. In some examples, a recognition site and/or target site can be comprised within an intron, coding sequence, 5' UTRs, 3' UTRs, and/or regulatory regions.
In some embodiments, the vector comprises two expression cassettes encoding the TnpB polypeptide and the guide polynucleotide, respectively.
In some embodiments, the expression of the TnpB polypeptide and/or the guide polynucleotide is driven by a constitutive promoter, an inducible promoter, or a spatio-temporal specific promoter.
6. Gene Editing with the TnpB polypeptide
6.1. Recombinant gene editing system
The present disclosure provides a recombinant gene editing system comprising the novel TnpB polypeptide having the activity of RNA-guided endonuclease of the present disclosure.
In some embodiments, the recombinant gene editing system comprises:
- a TnpB polypeptide of the present disclosure having the activity of RNA-guided endonuclease or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide, or the functional fragment thereof, and
- a guide polynucleotide, such as a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.
In some embodiments, the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
In some embodiments, the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis.
In some embodiments, the TnpB polypeptide is encoded by a tnpB gene in an insertion sequence (IS) selected from a group consisting of ISEfa4, ISAs26, ISCpe2, ISMma22, ISBce3, ISAeme8, ISTfu1, ISCco1, ISSoc3, ISTel2, ISNsp3, ISCbt1, ISMac7, ISEc46, ISSen6, ISHahl1, ISKpn69, ISDge10, ISKpn85, ISNsp2, ISAba30, ISRor9, ISAam1, ISYmu1, ISCytsp1, ISCvi1, ISCvi2, ISAepa1 and ISBth16.
In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. The variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
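By way of non-limiting illustration, the two criteria recited above (a percent identity threshold and conservation at positions numbered against a reference sequence) can be evaluated as in the following minimal sketch. The disclosure does not specify the alignment method or the exact identity definition; the sketch assumes that the two sequences have already been aligned by an external tool (gaps written as "-") and that identity is counted over all alignment columns. The function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_ref: str, aligned_query: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for r, q in zip(aligned_ref, aligned_query)
                  if r == q and r != "-")
    return 100.0 * matches / len(aligned_ref)


def conserved_at(aligned_ref: str, aligned_query: str, positions: list[int]) -> bool:
    """True if the query matches the reference at each 1-based reference position.

    Positions are numbered along the ungapped reference, so gap columns in the
    reference are skipped when mapping a position to an alignment column.
    """
    column_of = {}
    ref_position = 0
    for column, residue in enumerate(aligned_ref):
        if residue != "-":
            ref_position += 1
            column_of[ref_position] = column
    return all(aligned_query[column_of[p]] == aligned_ref[column_of[p]]
               for p in positions)


# Toy pre-aligned pair (hypothetical): one insertion in the query, one mismatch.
ref_aln = "MKD-LEAG"
query_aln = "MKDALQAG"
identity = percent_identity(ref_aln, query_aln)
keeps_d3_l4 = conserved_at(ref_aln, query_aln, [3, 4])
keeps_e5 = conserved_at(ref_aln, query_aln, [5])
```

Other identity definitions (for example, dividing by the length of the shorter sequence) give different values for the same alignment, so the denominator chosen here should be read as an assumption of the sketch.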
In some embodiments, the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
In some embodiments, the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity. The TnpB polypeptide of the present disclosure can recognize a shorter TAM as compared to the ISDra2 TnpB polypeptide. In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM consisting of four consecutive nucleotides.
In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM of CCAT, CTAC, TGAC, TGAT, TTAC, TTAG, TTAA, TTAT, ACAT, TTTAT, TTTAA or TTGAT. In some embodiments, the TnpB polypeptide recognizes a TAM of CCAT. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 2, 9, 15, 17 or 19.
In some embodiments, the TnpB polypeptide recognizes a TAM of CTAC. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 16.
In some embodiments, the TnpB polypeptide recognizes a TAM of TGAY, where Y = C or T. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 7, 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAC. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAT. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 7.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTAN, where N is any nucleotide. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 1, 4, 5, 6, 8, 11, 12, 13, 14, 18, 20, 22, 26, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAC. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 5, 11, 12 or 20. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAG. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 4, 6 or 14. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAA. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 1, 22, 28, or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAT. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 8, 13, 18 or 26.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAW, where W = A or T. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 3. In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAT. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 23. In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAA. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 25 or 27.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTGAT. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 24.
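By way of non-limiting illustration, candidate TAM occurrences in a DNA sequence can be located by expanding the degenerate codes used above (Y = C or T, W = A or T, N = any nucleotide) into a pattern, as in the following minimal sketch. The function name `find_tam_sites` and the example sequence are hypothetical. The sketch scans only the strand as written and reports start positions of TAM matches; it does not determine on which side of the TAM the target lies.

```python
import re

# Degenerate nucleotide codes used in the TAM definitions, as regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]",    # C or T
    "W": "[AT]",    # A or T
    "N": "[ACGT]",  # any nucleotide
}


def find_tam_sites(dna: str, tam: str) -> list[int]:
    """Return 0-based start positions of every match of `tam` in `dna`.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    body = "".join(IUPAC[base] for base in tam.upper())
    pattern = re.compile(f"(?={body})")
    return [match.start() for match in pattern.finditer(dna.upper())]


# Hypothetical 20-nt example containing TGAC, TTAG and TGAT.
example = "GGTGACAATTAGCCTGATTT"
tgay_sites = find_tam_sites(example, "TGAY")
ttan_sites = find_tam_sites(example, "TTAN")
```

To cover both strands, the same function can be applied to the reverse complement of the sequence, with the reported positions mapped back to the original coordinates.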
In some embodiments, the TnpB polypeptide of the present disclosure or the functional fragment thereof is capable of effecting RNA-guided cleavage in a prokaryotic and/or eukaryotic cell, preferably in both prokaryotic and eukaryotic cells. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24.
In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. The variant may differ from SEQ ID NO: 7, 18, 21, 23 or 24 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
In some embodiments, the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
In some embodiments, the TnpB comprises a DDE motif corresponding to
D186, E280 and D362 of SEQ ID NO: 1;
D188, E272 and D368 of SEQ ID NO: 2;
D205, E289 and D372 of SEQ ID NO: 3;
D184, E268 and D348 of SEQ ID NO: 4;
D175, E260 and D339 of SEQ ID NO: 5;
D186, E270 and D352 of SEQ ID NO: 6;
D181, E265 and D361 of SEQ ID NO: 7;
D199, E293 and D383 of SEQ ID NO: 8;
D211, E295 and D373 of SEQ ID NO: 9;
D185, E268 and D350 of SEQ ID NO: 10;
D181, E265 and D345 of SEQ ID NO: 11;
D234, E342 and D436 of SEQ ID NO: 12;
D184, E268 and D350 of SEQ ID NO: 13;
D185, E269 and D351 of SEQ ID NO: 14;
D189, E273 and D369 of SEQ ID NO: 15;
D190, E290 and D376 of SEQ ID NO: 16;
D188, E272 and D368 of SEQ ID NO: 17;
D187, E271 and D351 of SEQ ID NO: 18;
D188, E272 and D368 of SEQ ID NO: 19;
D181, E265 and D345 of SEQ ID NO: 20;
D184, E268 and D364 of SEQ ID NO: 21;
D183, E267 and D349 of SEQ ID NO: 22;
D186, E270 and D351 of SEQ ID NO: 23;
D185, E279 and D361 of SEQ ID NO: 24;
D188, E272 and D352 of SEQ ID NO: 25;
D181, E265 and D348 of SEQ ID NO: 26;
D186, E276 and D359 of SEQ ID NO: 27;
D189, E274 and D354 of SEQ ID NO: 28; and
D188, E272 and D352 of SEQ ID NO: 29.
In some embodiments, the recombinant gene editing system comprises a first polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
In some embodiments, the recombinant gene editing system further comprises a heterologous polynucleotide, such as an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
In some embodiments, the backbone of the guide polynucleotide comprises the 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides before the right end (RE) of the IS, from which the TnpB is derived. In some embodiments, the backbone is modified by removing a part or the whole of one or several stem structures in the backbone. In some embodiments, a part or the whole of the first stem structure from the 3’ end is removed.
In some embodiments the guide polynucleotide comprises an additional nucleotide sequence at the 5’ end of the backbone. In some embodiments, the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.
In some embodiments, the gRNA further comprises one or more additional protein-binding domains. In some embodiments, the system comprises one or more additional effector polypeptides capable of binding to the one or more additional protein-binding domains, or the polynucleotide comprising a nucleotide sequence encoding the one or more effector polypeptides, to form one or more ribonucleoproteins in tandem.
6.2. Composition, complex and isolated cell
During the gene editing, a complex or composition can be formed. Therefore, the present disclosure provides a composition or complex comprising
- a recombinant TnpB polypeptide of the present disclosure having the activity of RNA-guided endonuclease or a functional fragment thereof,
- a target double-stranded DNA comprising a nucleotide sequence of interest and a TAM recognized by the TnpB polypeptide; and
- a recombinant guide RNA (gRNA) comprising a targeting region capable of hybridizing to the nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or a functional fragment thereof.
The present disclosure also provides an isolated cell comprising
- a recombinant TnpB polypeptide of the present disclosure, having the activity of RNA-guided endonuclease or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide, or the functional fragment thereof,
- a target double-stranded DNA comprising a nucleotide sequence of interest and a TAM recognized by the TnpB polypeptide; and
- a recombinant guide RNA (gRNA) comprising a targeting region capable of hybridizing to the nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or a functional fragment thereof.
In some embodiments, the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
In some embodiments, the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis.
In some embodiments, the TnpB polypeptide is encoded by a tnpB gene in an insertion sequence (IS) selected from a group consisting of ISEfa4, ISAs26, ISCpe2, ISMma22, ISBce3, ISAeme8, ISTfu1, ISCco1, ISSoc3, ISTel2, ISNsp3, ISCbt1, ISMac7, ISEc46, ISSen6, ISHahl1, ISKpn69, ISDge10, ISKpn85, ISNsp2, ISAba30, ISRor9, ISAam1, ISYmu1, ISCytsp1, ISCvi1, ISCvi2, ISAepa1 and ISBth16.
In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. The variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
In some embodiments, the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
In some embodiments, the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity. The TnpB polypeptide of the present disclosure can recognize a shorter TAM as compared to the ISDra2 TnpB polypeptide. In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM consisting of four consecutive nucleotides.
In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM of CCAT, CTAC, TGAC, TGAT, TTAC, TTAG, TTAA, TTAT, ACAT, TTTAT, TTTAA or TTGAT.
In some embodiments, the TnpB polypeptide recognizes a TAM of CCAT. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 2, 9, 15, 17 or 19.
In some embodiments, the TnpB polypeptide recognizes a TAM of CTAC. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 16.
In some embodiments, the TnpB polypeptide recognizes a TAM of TGAY, where Y = C or T. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 7, 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAC. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAT. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 7.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTAN, where N is any nucleotide. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 1, 4, 5, 6, 8, 11, 12, 13, 14, 18, 20, 22, 26, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAC. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 5, 11, 12 or 20. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAG. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 4, 6 or 14. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAA. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 1, 22, 28, or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAT. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 8, 13, 18 or 26.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAW, where W = A or T. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 3. In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAT. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 23. In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAA. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 25 or 27.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTGAT. In some embodiments, the TnpB polypeptide comprises an amino acid sequence of SEQ ID NO: 24.
In some embodiments, the TnpB polypeptide of the present disclosure or the functional fragment thereof is capable of effecting RNA-guided cleavage in a prokaryotic and/or eukaryotic cell, preferably in both prokaryotic and eukaryotic cells. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24.
In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. The variant may differ from SEQ ID NO: 7, 18, 21, 23 or 24 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
In some embodiments, the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
In some embodiments, the TnpB comprises a DDE motif corresponding to
D186, E280 and D362 of SEQ ID NO: 1;
D188, E272 and D368 of SEQ ID NO: 2;
D205, E289 and D372 of SEQ ID NO: 3;
D184, E268 and D348 of SEQ ID NO: 4;
D175, E260 and D339 of SEQ ID NO: 5;
D186, E270 and D352 of SEQ ID NO: 6;
D181, E265 and D361 of SEQ ID NO: 7;
D199, E293 and D383 of SEQ ID NO: 8;
D211, E295 and D373 of SEQ ID NO: 9;
D185, E268 and D350 of SEQ ID NO: 10;
D181, E265 and D345 of SEQ ID NO: 11;
D234, E342 and D436 of SEQ ID NO: 12;
D184, E268 and D350 of SEQ ID NO: 13;
D185, E269 and D351 of SEQ ID NO: 14;
D189, E273 and D369 of SEQ ID NO: 15;
D190, E290 and D376 of SEQ ID NO: 16;
D188, E272 and D368 of SEQ ID NO: 17;
D187, E271 and D351 of SEQ ID NO: 18;
D188, E272 and D368 of SEQ ID NO: 19;
D181, E265 and D345 of SEQ ID NO: 20;
D184, E268 and D364 of SEQ ID NO: 21;
D183, E267 and D349 of SEQ ID NO: 22;
D186, E270 and D351 of SEQ ID NO: 23;
D185, E279 and D361 of SEQ ID NO: 24;
D188, E272 and D352 of SEQ ID NO: 25;
D181, E265 and D348 of SEQ ID NO: 26;
D186, E276 and D359 of SEQ ID NO: 27;
D189, E274 and D354 of SEQ ID NO: 28; and
D188, E272 and D352 of SEQ ID NO: 29.
In some embodiments, the recombinant gene editing system comprises a first polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
In some embodiments, the composition further comprises a heterologous polynucleotide, such as an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
In some embodiments, the backbone of the guide polynucleotide comprises 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides before the right end (RE) of the IS from which the TnpB is derived. In some embodiments, the backbone is modified by removing a part or the whole of one or several stem structures in the backbone. In some embodiments, a part or the whole of the first stem structure from the 3’ end is removed.
In some embodiments, the guide polynucleotide comprises an additional nucleotide sequence at the 5’ end of the backbone. In some embodiments, the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.
In some embodiments, the gRNA further comprises one or more additional protein-binding domains. In some embodiments, the composition or isolated cell comprises one or more additional effector polypeptides capable of binding to the one or more additional protein-binding domains, or the polynucleotide comprising a nucleotide sequence encoding the one or more effector polypeptides, to form one or more ribonucleoproteins in tandem.
6.3. Disarmed system
The present disclosure provides a recombinant system comprising
- a modified TnpB polypeptide of the present disclosure comprising a modification in the DDE motif as compared to the parent TnpB polypeptide, or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the modified TnpB polypeptide or the functional fragment thereof, wherein the parent polypeptide has the activity of RNA-guided endonuclease, and the modified TnpB polypeptide is deprived of the activity of cleaving double-stranded DNA, and
- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.
In some embodiments, the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
In some embodiments, the DDE motif is modified by substituting at least one amino acid in the motif with a neutral amino acid or a basic amino acid. In some embodiments, at least one amino acid in the motif is substituted by alanine. In some embodiments, the modified TnpB polypeptide comprises
D186A, E280A and/or D362A as compared to SEQ ID NO: 1;
D188A, E272A and/or D368A as compared to SEQ ID NO: 2;
D205A, E289A and/or D372A as compared to SEQ ID NO: 3;
D184A, E268A and/or D348A as compared to SEQ ID NO: 4;
D175A, E260A and/or D339A as compared to SEQ ID NO: 5;
D186A, E270A and/or D352A as compared to SEQ ID NO: 6;
D181A, E265A and/or D361A as compared to SEQ ID NO: 7;
D199A, E293A and/or D383A as compared to SEQ ID NO: 8;
D211A, E295A and/or D373A as compared to SEQ ID NO: 9;
D185A, E268A and/or D350A as compared to SEQ ID NO: 10;
D181A, E265A and/or D345A as compared to SEQ ID NO: 11;
D234A, E342A and/or D436A as compared to SEQ ID NO: 12;
D184A, E268A and/or D350A as compared to SEQ ID NO: 13;
D185A, E269A and/or D351A as compared to SEQ ID NO: 14;
D189A, E273A and/or D369A as compared to SEQ ID NO: 15;
D190A, E290A and/or D376A as compared to SEQ ID NO: 16;
D188A, E272A and/or D368A as compared to SEQ ID NO: 17;
D187A, E271A and/or D351A as compared to SEQ ID NO: 18;
D188A, E272A and/or D368A as compared to SEQ ID NO: 19;
D181A, E265A and/or D345A as compared to SEQ ID NO: 20;
D184A, E268A and/or D364A as compared to SEQ ID NO: 21;
D183A, E267A and/or D349A as compared to SEQ ID NO: 22;
D186A, E270A and/or D351A as compared to SEQ ID NO: 23;
D185A, E279A and/or D361A as compared to SEQ ID NO: 24;
D188A, E272A and/or D352A as compared to SEQ ID NO: 25;
D181A, E265A and/or D348A as compared to SEQ ID NO: 26;
D186A, E276A and/or D359A as compared to SEQ ID NO: 27;
D189A, E274A and/or D354A as compared to SEQ ID NO: 28; and
D188A, E272A and/or D352A as compared to SEQ ID NO: 29.
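The substitution labels listed above follow mechanically from the DDE motif positions of each parent sequence. A minimal Python sketch of that bookkeeping is given below. It assumes 1-based residue numbering, copies the positions for only three of the listed sequences, and uses hypothetical function names; the real amino acid sequences are not reproduced here, so the usage example relies on a toy string.

```python
import re

# DDE motif residues copied from the list above for three of the sequences
# (1-based numbering). The other SEQ ID NOs follow the same pattern.
DDE_MOTIF = {
    7: ("D181", "E265", "D361"),
    18: ("D187", "E271", "D351"),
    24: ("D185", "E279", "D361"),
}


def alanine_substitutions(seq_id):
    """Return substitution labels such as 'D181A' for one SEQ ID NO."""
    return [residue + "A" for residue in DDE_MOTIF[seq_id]]


def apply_substitution(sequence, label):
    """Apply one substitution label (e.g. 'D181A') to a protein sequence string.

    Raises ValueError if the residue found at that position is not the
    expected one, which guards against off-by-one numbering errors.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognised substitution label: {label}")
    old, position, new = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # convert the 1-based position to a 0-based index
    if index < 0 or index >= len(sequence) or sequence[index] != old:
        raise ValueError(f"expected {old} at position {position}")
    return sequence[:index] + new + sequence[index + 1:]
```

For example, `apply_substitution("MKDLE", "D3A")` returns `"MKALE"`. Because the labels are joined by "and/or" in the text, one, two or all three substitutions may be applied to a given parent sequence.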
In some embodiments, the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide is conserved at positions corresponding to N31, G179, L267, C332, C335, C351, and C354 of SEQ ID NO: 7.
In some embodiments, the modified TnpB polypeptide comprises a modification at the position corresponding to N31 of SEQ ID NO: 7. In some embodiments, the modification is a substitution with alanine.
In some embodiments, the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide is conserved at positions corresponding to G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
In some embodiments, the modified TnpB polypeptide is able to recognize and bind to a DNA molecule, but is deprived of the ability to cleave the DNA molecule, e.g., double-stranded DNA, i.e., is a disarmed TnpB polypeptide.
In some embodiments, the backbone of the guide polynucleotide comprises 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides before the right end (RE) of the IS from which the TnpB is derived. In some embodiments, the backbone is modified by removing a part or the whole of one or several stem structures in the backbone. In some embodiments, a part or the whole of the first stem structure from the 3’ end is removed.
In some embodiments, the guide polynucleotide comprises an additional nucleotide sequence at the 5’ end of the backbone. In some embodiments, the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.
In some embodiments, the gRNA further comprises one or more additional protein-binding domains. In some embodiments, the system comprises one or more additional effector polypeptides capable of binding to the one or more additional protein-binding domains, or the polynucleotide comprising a nucleotide sequence encoding the one or more effector polypeptides, to form one or more ribonucleoproteins in tandem.
6.4. Fusion system
The present disclosure provides a gene editing system comprising
- a fusion polypeptide of the present disclosure, e.g., comprising a TnpB polypeptide having the activity of RNA-guided endonuclease, or a functional fragment thereof, or a modified TnpB polypeptide fused to a fusion partner, or a polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide, or the functional fragment thereof, and
- a guide polynucleotide, such as a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.
In some embodiments, the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
In some embodiments, the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis.
In some embodiments, the TnpB polypeptide is encoded by a tnpB gene in an insertion sequence (IS) selected from a group consisting of ISEfa4, ISAs26, ISCpe2, ISMma22, ISBce3, ISAeme8, ISTfu1, ISCco1, ISSoc3, ISTel2, ISNsp3, ISCbt1, ISMac7, ISEc46, ISSen6, ISHahl1, ISKpn69, ISDge10, ISKpn85, ISNsp2, ISAba30, ISRor9, ISAam1, ISYmu1, ISCytsp1, ISCvi1, ISCvi2, ISAepa1 and ISBth16.
In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. The variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
In some embodiments, the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
In some embodiments, the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity. The TnpB polypeptide of the present disclosure can recognize a shorter TAM as compared to the ISDra2 TnpB polypeptide. In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM consisting of four consecutive nucleotides.
In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM of CCAT, CTAC, TGAC, TGAT, TTAC, TTAG, TTAA, TTAT, ACAT, TTTAT, TTTAA or TTGAT.
In some embodiments, the TnpB polypeptide recognizes a TAM of CCAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 2, 9, 15, 17 or 19.
In some embodiments, the TnpB polypeptide recognizes a TAM of CTAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 16.
In some embodiments, the TnpB polypeptide recognizes a TAM of TGAY, where Y = C or T. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTAN, where N is any nucleotide. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 4, 5, 6, 8, 11, 12, 13, 14, 18, 20, 22, 26, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 5, 11, 12 or 20. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAG. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 4, 6 or 14. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAA. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 22, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 8, 13, 18 or 26.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAW, where W = A or T. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 3. In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 23. In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAA. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 25 or 27.
In some embodiments, the TnpB polypeptide recognizes a TAM of TTGAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 24.
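The degenerate TAM notations used above (Y = C or T, W = A or T, N = any nucleotide) can be expanded or matched programmatically. The following Python sketch is illustrative only: the code table covers just the symbols that appear in this section, and the function names are hypothetical.

```python
# Nucleotide codes needed for the TAM patterns named in this section.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT",    # C or T
    "W": "AT",    # A or T
    "N": "ACGT",  # any nucleotide
}


def tam_matches(pattern, site):
    """Return True if a concrete site (A/C/G/T) fits a degenerate TAM pattern."""
    if len(pattern) != len(site):
        return False
    return all(base in IUPAC[code] for code, base in zip(pattern, site.upper()))


def expand_tam(pattern):
    """List every concrete sequence covered by a degenerate TAM pattern."""
    results = [""]
    for code in pattern:
        results = [prefix + base for prefix in results for base in IUPAC[code]]
    return results
```

Under these definitions, `expand_tam("TGAY")` yields TGAC and TGAT, and `expand_tam("TTTAW")` yields TTTAA and TTTAT, which agrees with the individual TAMs recited for those groups.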
In some embodiments, the TnpB polypeptide of the present disclosure or the functional fragment thereof is capable of effecting RNA-guided cleavage in a prokaryotic and/or eukaryotic cell, preferably in both prokaryotic and eukaryotic cells. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24.
In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. The variant may differ from SEQ ID NO: 7, 18, 21, 23 or 24 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
In some embodiments, the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.
In some embodiments, the TnpB comprises a DDE motif corresponding to
D186, E280 and D362 of SEQ ID NO: 1;
D188, E272 and D368 of SEQ ID NO: 2;
D205, E289 and D372 of SEQ ID NO: 3;
D184, E268 and D348 of SEQ ID NO: 4;
D175, E260 and D339 of SEQ ID NO: 5;
D186, E270 and D352 of SEQ ID NO: 6;
D181, E265 and D361 of SEQ ID NO: 7;
D199, E293 and D383 of SEQ ID NO: 8;
D211, E295 and D373 of SEQ ID NO: 9;
D185, E268 and D350 of SEQ ID NO: 10;
D181, E265 and D345 of SEQ ID NO: 11;
D234, E342 and D436 of SEQ ID NO: 12;
D184, E268 and D350 of SEQ ID NO: 13;
D185, E269 and D351 of SEQ ID NO: 14;
D189, E273 and D369 of SEQ ID NO: 15;
D190, E290 and D376 of SEQ ID NO: 16;
D188, E272 and D368 of SEQ ID NO: 17;
D187, E271 and D351 of SEQ ID NO: 18;
D188, E272 and D368 of SEQ ID NO: 19;
D181, E265 and D345 of SEQ ID NO: 20;
D184, E268 and D364 of SEQ ID NO: 21;
D183, E267 and D349 of SEQ ID NO: 22;
D186, E270 and D351 of SEQ ID NO: 23;
D185, E279 and D361 of SEQ ID NO: 24;
D188, E272 and D352 of SEQ ID NO: 25;
D181, E265 and D348 of SEQ ID NO: 26;
D186, E276 and D359 of SEQ ID NO: 27;
D189, E274 and D354 of SEQ ID NO: 28; and
D188, E272 and D352 of SEQ ID NO: 29.
In some embodiments, the recombinant gene editing system comprises a first polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
In some embodiments, the recombinant gene editing system further comprises a heterologous polynucleotide, such as an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
In some embodiments, the backbone of the guide polynucleotide comprises 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides before the right end (RE) of the IS from which the TnpB is derived. In some embodiments, the backbone is modified by removing a part or the whole of one or several stem structures in the backbone. In some embodiments, a part or the whole of the first stem structure from the 3’ end is removed.
In some embodiments, the guide polynucleotide comprises an additional nucleotide sequence at the 5’ end of the backbone. In some embodiments, the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.
In some embodiments, the modified TnpB polypeptide comprises a modification in the DDE motif as compared to the parent TnpB polypeptide, wherein the parent polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
In some embodiments, the DDE motif is modified by substituting at least one amino acid in the motif with a neutral amino acid or a basic amino acid. In some embodiments, at least one amino acid in the motif is substituted by alanine. In some embodiments, the modified TnpB polypeptide comprises
D186A, E280A and/or D362A as compared to SEQ ID NO: 1;
D188A, E272A and/or D368A as compared to SEQ ID NO: 2;
D205A, E289A and/or D372A as compared to SEQ ID NO: 3;
D184A, E268A and/or D348A as compared to SEQ ID NO: 4;
D175A, E260A and/or D339A as compared to SEQ ID NO: 5;
D186A, E270A and/or D352A as compared to SEQ ID NO: 6;
D181A, E265A and/or D361A as compared to SEQ ID NO: 7;
D199A, E293A and/or D383A as compared to SEQ ID NO: 8;
D211A, E295A and/or D373A as compared to SEQ ID NO: 9;
D185A, E268A and/or D350A as compared to SEQ ID NO: 10;
D181A, E265A and/or D345A as compared to SEQ ID NO: 11;
D234A, E342A and/or D436A as compared to SEQ ID NO: 12;
D184A, E268A and/or D350A as compared to SEQ ID NO: 13;
D185A, E269A and/or D351A as compared to SEQ ID NO: 14;
D189A, E273A and/or D369A as compared to SEQ ID NO: 15;
D190A, E290A and/or D376A as compared to SEQ ID NO: 16;
D188A, E272A and/or D368A as compared to SEQ ID NO: 17;
D187A, E271A and/or D351A as compared to SEQ ID NO: 18;
D188A, E272A and/or D368A as compared to SEQ ID NO: 19;
D181A, E265A and/or D345A as compared to SEQ ID NO: 20;
D184A, E268A and/or D364A as compared to SEQ ID NO: 21;
D183A, E267A and/or D349A as compared to SEQ ID NO: 22;
D186A, E270A and/or D351A as compared to SEQ ID NO: 23;
D185A, E279A and/or D361A as compared to SEQ ID NO: 24;
D188A, E272A and/or D352A as compared to SEQ ID NO: 25;
D181A, E265A and/or D348A as compared to SEQ ID NO: 26;
D186A, E276A and/or D359A as compared to SEQ ID NO: 27;
D189A, E274A and/or D354A as compared to SEQ ID NO: 28; and
D188A, E272A and/or D352A as compared to SEQ ID NO: 29.
In some embodiments, the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide is conserved at positions corresponding to N31, G179, L267, C332, C335, C351, and C354 of SEQ ID NO: 7.
In some embodiments, the modified TnpB polypeptide comprises a modification at the position corresponding to N31 of SEQ ID NO: 7. In some embodiments, the modification is a substitution with alanine.
In some embodiments, the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide is conserved at positions corresponding to G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
In some embodiments, the modified TnpB polypeptide is able to recognize and bind to a DNA molecule, but is deprived of the ability to cleave the DNA molecule, e.g., double-stranded DNA, i.e., is a disarmed TnpB polypeptide.
In some embodiments, the fusion partner is a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone or other DNA-binding protein) associated with the target DNA.
In further embodiments, the fusion partner is a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.
In further embodiments, the fusion partner is a polypeptide that directly provides for increased transcription of the target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription regulator, etc.). In some embodiments, the fusion partner is another polypeptide or domain, for example Clo51 or FokI nuclease, to generate double-strand breaks (Guilinger et al. Nature Biotechnology, volume 32, number 6, June 2014).
In some embodiments, the fusion partner is a polypeptide that directs editing of single or multiple bases in a polynucleotide sequence, for example a site-specific deaminase that can change the identity of a nucleotide, for example from C-G to T-A or from A-T to G-C (Gaudelli et al. "Programmable base editing of A-T to G-C in genomic DNA without DNA cleavage." Nature (2017); Nishida et al. "Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems." Science 353 (6305) (2016); Komor et al. "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage." Nature 533 (7603) (2016): 420-4).
In some embodiments, the fusion partner is an active (double strand break creating), partially active (nickase) or deactivated (deprived of cleavage activity) TnpB endonuclease and a deaminase (such as, but not limited to, a cytidine deaminase, an adenine deaminase, APOBEC1, APOBEC3A, BE2, BE3, BE4, ABEs, or the like). In some embodiments, the fusion partner includes base edit repair inhibitors and glycosylase inhibitors (e.g., uracil glycosylase inhibitor (to prevent uracil removal)).
In some embodiments, the fusion partner is a Cas endonuclease or another TnpB endonuclease as described in the present disclosure.
In some embodiments, the fusion partner is a heterologous NLS. In some embodiments, the NLS is operably linked to the N-terminus or C-terminus of the TnpB polypeptide, the functional fragment thereof or the modified TnpB polypeptide. In some embodiments, the fusion polypeptide comprises two or more NLS sequences linked to the TnpB polypeptide, the functional fragment thereof or the modified TnpB polypeptide, for example, on both the N- and C-termini of the same.
In some embodiments, the gRNA further comprises one or more additional protein-binding domains. In some embodiments, the system comprises one or more additional effector polypeptides capable of binding to the one or more additional protein-binding domains, or the polynucleotide comprising a nucleotide sequence encoding the one or more effector polypeptides, to form one or more ribonucleoproteins in tandem.
7. Method for gene editing
The present disclosure provides a method of introducing a double-strand break into a polynucleotide of interest comprising a step of contacting the polynucleotide with the recombinant gene editing system of the present disclosure targeting a nucleotide sequence in the polynucleotide.
The present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell the recombinant gene editing system or the fusion system of the present disclosure targeting a genomic sequence in the cell.
The present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell the disarmed system of the present disclosure and a gene editing system targeting the genomic sequence, wherein the nucleotide sequence targeted by the disarmed system is next to the genomic sequence.
Methods for introducing polynucleotides or polypeptides or a polynucleotide-protein complex into cells or organisms are known in the art including, but not limited to, microinjection, electroporation, stable transformation methods, transient transformation methods, ballistic particle acceleration (particle bombardment), whiskers-mediated transformation, Agrobacterium-mediated transformation, direct gene transfer, viral-mediated introduction, transfection, transduction, cell-penetrating peptides, mesoporous silica nanoparticle (MSN)-mediated direct protein delivery, topical applications, sexual crossing, sexual breeding, and any combination thereof.
Adeno-associated virus (AAV) is a widely used vector for delivering heterologous polynucleotides. However, due to the small genome size of AAV (about 4.7 kb) and the large size of Cas polypeptides (more than 1000 amino acids), the delivery of a Cas system with recombinant AAV (rAAV) is limited, and it is generally not possible to deliver a Cas-fusion system (the fusion and gRNA) encoded in a single vector. In contrast to Cas systems, due to the much smaller size of the TnpB polypeptides, the gene editing system with a TnpB polypeptide and a TnpB fusion can be delivered in a single rAAV.
Therefore, the present disclosure provides a recombinant adeno-associated virus (rAAV) comprising a genome comprising a first expression cassette encoding the TnpB polypeptide, the modified TnpB polypeptide, or the fusion polypeptide of the present disclosure. In some embodiments, the first expression cassette comprises a promoter and a terminator.
In some embodiments, the genome comprises a second expression cassette encoding a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof. In some embodiments, the second expression cassette comprises a promoter and a terminator.
In some embodiments, the first expression cassette comprises less than about 4,700 nucleotides, less than about 4,600 nucleotides, less than about 4,500 nucleotides, less than about 4,400 nucleotides, less than about 4,300 nucleotides, less than about 4,200 nucleotides, less than about 4,100 nucleotides, less than about 4,000 nucleotides, less than about 3,900 nucleotides, less than about 3,800 nucleotides, less than about 3,700 nucleotides, less than about 3,600 nucleotides, less than about 3,500 nucleotides, less than about 3,400 nucleotides, less than about 3,300 nucleotides, less than about 3,200 nucleotides, less than about 3,100 nucleotides, less than about 3,000 nucleotides, less than about 2,900 nucleotides, less than about 2,800 nucleotides, less than about 2,700 nucleotides, less than about 2,600 nucleotides, or less than about 2,500 nucleotides.
In some embodiments, the genome comprises about 4,500 to about 4,700 nucleotides.
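The packaging argument above reduces to simple arithmetic, sketched below in Python. Everything except the approximately 4.7 kb capacity is an assumption made for illustration: the 400-residue TnpB length, the 1,368-residue Cas length (the commonly cited length of Streptococcus pyogenes Cas9), and the sizes of the remaining elements are placeholders rather than values taken from this disclosure, and the function names are hypothetical.

```python
AAV_CAPACITY_NT = 4700  # approximate AAV genome size given in the text (about 4.7 kb)


def coding_length_nt(num_amino_acids):
    """Nucleotides needed to encode a polypeptide, including one stop codon."""
    return 3 * (num_amino_acids + 1)


def fits_in_raav(element_lengths_nt, capacity_nt=AAV_CAPACITY_NT):
    """Return (total length, whether it fits) for a list of element lengths."""
    total = sum(element_lengths_nt)
    return total, total <= capacity_nt


# Assumed sizes for everything except the nuclease coding sequence:
# two promoters, two terminators, the gRNA and both ITRs (illustrative only).
OTHER_ELEMENTS_NT = [600, 250, 250, 300, 290]

# Assumed polypeptide lengths: 400 residues for a TnpB, 1368 for a Cas nuclease.
tnpb_total, tnpb_fits = fits_in_raav([coding_length_nt(400)] + OTHER_ELEMENTS_NT)
cas_total, cas_fits = fits_in_raav([coding_length_nt(1368)] + OTHER_ELEMENTS_NT)

print(tnpb_total, tnpb_fits)
print(cas_total, cas_fits)
```

With these placeholder numbers, the TnpB construct leaves well over 1 kb of headroom that a fusion partner could occupy, whereas the Cas construct already exceeds the capacity before any fusion partner is added.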
8. Method of screening TnpB polypeptides having the activity of RNA-guided endonuclease
The present disclosure provides a method of screening TnpB polypeptide for the activity of cleaving double-stranded DNA comprising the steps of:
- providing a candidate TnpB polypeptide from a microorganism;
- providing a gRNA comprising a targeting region and a backbone region, wherein the backbone region comprises at least 100 nucleotides before the 3’ end of the IS, which naturally comprises the nucleotide sequence encoding the TnpB polypeptide;
- providing a target DNA comprising a nucleotide sequence that hybridizes to the nucleotide sequence of the targeting region and a TAM recognized by the TnpB polypeptide, wherein the TAM consists of four or five consecutive nucleotides adjacent to the 5’ end of the IS;
- contacting the TnpB polypeptide with the gRNA and the target DNA; and
- detecting the cleavage on the target DNA.
In some embodiments, the backbone region comprises at least 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300 or more nucleotides before the 3’ end of the IS. In some embodiments, the backbone region comprises 100-350, 125-325, 150-300, 175-275, 200-250, 150-225, 175-225, 150-200, 175-225, or 175-200 nucleotides before the 3’ end of the IS.
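The TAM and backbone definitions used in the screening steps can be illustrated with a short Python sketch. It assumes a hypothetical genomic sequence string and 0-based, half-open coordinates for the IS on the forward strand; the function name, parameters and toy sequence are illustrative only and are not part of the disclosure.

```python
def tam_and_backbone(genome, is_start, is_end, tam_len=4, backbone_len=100):
    """Return (TAM, backbone) for an IS located at genome[is_start:is_end].

    Assumptions (not from the disclosure): coordinates are 0-based and
    half-open, and the IS lies on the forward strand of `genome`.
    """
    if tam_len not in (4, 5):
        raise ValueError("the TAM is taken as four or five nucleotides")
    if is_start < tam_len:
        raise ValueError("not enough sequence upstream of the IS for a TAM")
    if is_end - is_start < backbone_len:
        raise ValueError("the IS is shorter than the requested backbone")
    # TAM: consecutive nucleotides immediately adjacent to the 5' end of the IS.
    tam = genome[is_start - tam_len:is_start]
    # Backbone: the last `backbone_len` nucleotides before the 3' end of the IS.
    backbone = genome[is_end - backbone_len:is_end]
    return tam, backbone


# Toy example with a made-up sequence (purely illustrative).
toy_genome = "GGCCTTGAT" + "A" * 20 + "C" * 100 + "GGTT"
tam, backbone = tam_and_backbone(toy_genome, is_start=9, is_end=129)
print(tam, len(backbone))
```

Passing `tam_len=5` or a larger `backbone_len` would give the five-nucleotide TAM or a longer backbone candidate from the same coordinates.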
In some embodiments, the TnpB polypeptide comprises a N-terminal HTH domain, a central domain, a C-terminal Zinc finger domain, and a DDE motif.
The TnpB polypeptide can be provided as an isolated polypeptide or by expression from a polynucleotide encoding the same. In some embodiments, the TnpB polypeptide is provided by a first polynucleotide, preferably DNA, comprising a first nucleotide sequence encoding the same.
The gRNA can be provided as an isolated RNA molecule or by transcription from a DNA molecule encoding the same. In some embodiments, the gRNA is provided as a second polynucleotide, preferably DNA, comprising a second nucleotide sequence encoding the same. The first and second polynucleotides can be provided in a single vector, or in separate vectors. In some embodiments, the first and second polynucleotides are provided in a first vector, such as a first plasmid. In some embodiments, the first nucleotide sequence is operably linked to a first promoter. In some embodiments, the second nucleotide sequence is operably linked to a second promoter.
In some embodiments, the target DNA is provided in a second plasmid.
In some embodiments, contacting the TnpB polypeptide with the gRNA and the target DNA comprises introducing the first and second plasmids into a host cell comprising the target DNA.
Examples
Example 1. Screening of TnpB polypeptides with endonuclease activity
This Example was carried out to screen TnpB polypeptides with endonuclease activity.
A series of plasmid pairs was constructed, each consisting of a test plasmid comprising nucleotide sequences encoding a TnpB polypeptide, the gRNA (comprising a targeting sequence of SEQ ID NO: 240 and the related reRNA backbone) and a chloramphenicol (Cm) resistance gene, and a reporter plasmid comprising a target sequence of SEQ ID NO: 240, a TAM and a kanamycin resistance gene.
In brief, for the test plasmids, TnpB genes and gRNA coding sequences (backbone + targeting region) were synthesized by Tsingke (Beijing, China) and cloned into the pBAD backbone by Gibson Assembly; the TnpB genes were driven by the J23108 promoter, and the gRNA coding sequences were driven by the J23119 promoter (see Leenay et al., 2016, Identifying and Visualizing Functional PAM Diversity across CRISPR-Cas Systems, Molecular Cell, 62, 1-11).
For the reporter plasmid (Kan+), oligos containing the target nucleotide sequence and the related TAM, flanked by EcoRI and XhoI restriction sites, were ordered from Tsingke (Beijing, China). In brief, the oligos and the pCB457 plasmid were digested with EcoRI and BamHI at 37 ℃ for 1 h. The digested products were isolated and ligated using T4 ligase according to the manufacturer’s instructions.
The maps of the plasmid pair for ISTfu1 TnpB are shown in Fig. 1 as an example.
Each of the plasmid pairs was transformed into E. coli BW25141 cells by electroporation to test the endonuclease activity of the TnpB polypeptide (test group), and a plasmid pair in which the gRNA coding sequence had been removed from the test plasmid was used as a negative control.
In particular, the test and target plasmids were electroporated into E. coli (NEB 10β, C3020K) using a Bio-Rad Gene Pulser Xcell with the program 1.8 kV, 25 μF, 200 ohm. After electroporation, 900 μl SOC medium was added, followed by incubation at 37 ℃ for 1 hour. Then, the mixture was quartered, serially diluted (10×) and inoculated onto different LB plates (Cm+/Kan+, Cm+, Kan+ and plain) (50 mg/L for each antibiotic). After incubation at 37 ℃ for 12 hours, photos of the plates were taken.
Viable colonies were counted and the depletion ratio was calculated (depletion ratio = the number of colonies of the TnpB & reRNA group at minimal dilution / the number of colonies of the TnpB alone group at minimal dilution).
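The depletion ratio defined above can be written as a small Python helper. This is a minimal sketch; the function name and the colony counts in the example are hypothetical and are not data from this Example.

```python
def depletion_ratio(colonies_tnpb_rerna, colonies_tnpb_alone):
    """Colonies in the TnpB & reRNA group divided by colonies in the
    TnpB-alone group, both counted at the minimal dilution."""
    if colonies_tnpb_rerna < 0:
        raise ValueError("colony counts cannot be negative")
    if colonies_tnpb_alone <= 0:
        raise ValueError("the TnpB-alone group must have at least one colony")
    return colonies_tnpb_rerna / colonies_tnpb_alone


# Hypothetical counts, for illustration only.
ratio = depletion_ratio(colonies_tnpb_rerna=3, colonies_tnpb_alone=300)
print(ratio)
```

A ratio close to 0 corresponds to strong depletion of colonies on the Cm+/Kan+ plate, while a ratio close to 1 corresponds to no detectable depletion.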
The plain plate, K+ plate, and Cm+ plate were used to show the efficiency of electroporation. Generally, the negative control showed similar numbers of colonies on these three plates as compared to the test group.
On the K+Cm+ plate, negative controls and the test groups encoding a TnpB polypeptide not having endonuclease activity showed colonies, while the test groups encoding a TnpB polypeptide having endonuclease activity did not.
81 TnpB polypeptides in total were tested, and the results demonstrated that the TnpB polypeptides of SEQ ID NOs: 1-29 showed endonuclease activity (see Fig. 2), cleaving a target site in an RNA-guided manner with various depletion ratios (see Fig. 3), while 52 TnpB polypeptides encoded by SEQ ID NOs: 59-110 did not show endonuclease activity (data not shown).
The TnpB polypeptides having RNA-guided endonuclease activity and gRNAs as well as the TAMs thereof are shown in Table 1.
Table 1 TnpB polypeptides having desired activity
Example 2. Characterization of the TnpB polypeptides
2.1. Analysis of sequence
The amino acid sequences of the TnpB polypeptides showing endonuclease activity were aligned (Clustal Omega). The results showed that the DDE motif is conserved in the TnpB polypeptides (see Figs. 4 and 5). Although the DDE motif in ISBce3 TnpB and ISCbt1 TnpB was not completely aligned with others, it was actually present, and was indicated in the amino acid sequences (see Fig. 5). It was noted that the TnpB polypeptides having the activity of RNA-guided endonuclease are generally conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7, except that ISCbt1 (SEQ ID NO: 12) is different from others at positions corresponding to L267, C332, C335, C351 and C354 of SEQ ID NO: 7.
For clarity, the alignment between 25 of the newly-identified TnpB polypeptides and ISDra2 TnpB polypeptide is shown in Fig. 12A.
2.2. DDE motif is essential for the endonuclease activity
The variants of the TnpB polypeptides of SEQ ID NOs: 1-22 with the first D residue of the DDE motif substituted by A were tested as described in Example 1. The results showed that the variants substantially lost the endonuclease activity (see Figs. 2 and 3), indicating that the DDE motif is essential for the endonuclease activity, and that it is possible to prepare a disarmed TnpB polypeptide (dTnpB) by introducing mutation(s) into the DDE motif.
2.3. The conserved amino acid residues are essential for the endonuclease activity
The variants of the TnpB polypeptides ISTfu1 (SEQ ID NO: 7), ISDge10 (SEQ ID NO: 18), ISAba30 (SEQ ID NO: 21) and ISDra2 with the amino acid corresponding to N31 of SEQ ID NO: 7 substituted by A were tested as described in Example 1. The results showed that the variants substantially lost the endonuclease activity (see Fig. 12B), indicating that a conserved amino acid residue such as that corresponding to N31 of SEQ ID NO: 7 is essential for the endonuclease activity, and that it is possible to prepare a disarmed TnpB polypeptide (dTnpB) by introducing mutation(s) into amino acid(s) that are conserved between TnpB polypeptides.
Example 3. RNA-guided cleavage in eukaryotic cells
This Example was carried out to verify the RNA-guided cleavage in eukaryotic cells with the TnpB polypeptides.
A fluorescence-reporting system was used to identify the RNA-guided cleavage. As shown in Fig. 6, the system comprised a target sequence (SEQ ID NO: 240) and a TAM located between the mRFP coding sequence and the GFP coding sequence (SEQ ID NOs: 140 and 141). The mRFP and GFP coding sequences were linked out of frame, and thus GFP would not be expressed if no cleavage occurred in the target sequence. Once cleavage occurred, the mRFP and GFP coding sequences might be brought in frame upon repair.
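The reading-frame logic of such a reporter can be sketched in Python as simple modular arithmetic. This is only an illustration under stated assumptions: the frame offset between the mRFP and GFP coding sequences (`frame_offset`) is a hypothetical value, the function name is invented here, and the sketch ignores stop codons and repair outcomes that remove part of the reporter.

```python
def gfp_in_frame(frame_offset, indel_net_length):
    """Return True if GFP is read in frame with mRFP after repair.

    frame_offset: number of nucleotides by which GFP is shifted out of
        frame in the intact reporter (hypothetical; 1 or 2).
    indel_net_length: net change in length at the repaired site
        (positive for insertions, negative for deletions).
    """
    return (frame_offset + indel_net_length) % 3 == 0


# With an assumed +1 offset, list which small indels would restore the frame.
for indel in (-2, -1, 0, 1, 2):
    print(indel, gfp_in_frame(frame_offset=1, indel_net_length=indel))
```

Under this simplified model only a subset of repair outcomes restores the GFP frame, so the GFP-positive fraction is a surrogate readout that would be expected to underestimate the total cleavage frequency.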
Plasmid groups, each comprising a plasmid encoding a TnpB (TnpB plasmid) , a plasmid encoding the corresponding gRNA (gRNA plasmid) and a reporting plasmid comprising the fluorescence-reporting system, were constructed.
In brief, for constructing the reporting plasmid, oligos containing the target sequence and the related TAM were ordered from Tsingke (Beijing, China). Then, the oligos were annealed and ligated into the pRGS vector digested with EcoRI and BamHI (see Kim et al., Surrogate Reporters for Enrichment of Cells with Nuclease-induced Mutations, Nature Methods, 2011, 8(11): 941-944).
The gRNA plasmid was constructed by inserting the oligos encoding gRNA (the target sequence+backbone as shown in Table 1) flanked by EcoRI and BamHI restriction sites into pUC19 plasmid under the control of U6 promoter, and the TnpB plasmid was constructed by inserting the coding sequence of the TnpB polypeptide (see Table 1) into pcDNA3.1. The maps of the above three plasmids are shown in Fig. 7.
A group of plasmids (120 ng TnpB plasmid + 80 ng gRNA plasmid + 200 ng reporting plasmid) was co-transfected into HEK293T cells (ATCC, CRL3216) with 2000 Reagent (Invitrogen) according to the manufacturer’s instructions. The resulting cells were analyzed by flow cytometry (LSRFortessa, BD Biosciences).
As shown in Fig. 8, five (ISTfu1, ISDge10, ISAba30, ISAam1, and ISYmu1) of the 29 TnpB polypeptides showed RNA-guided cleavage in eukaryotic cells, i.e., GFP signal was shown in the presence of both the TnpB polypeptide and the gRNA.
Example 4. TAM preference of ISTfu1 TnpB polypeptide
The predicted TAM sequence for ISTfu1 TnpB polypeptide is 5'-TGAT, which is similar to the TAM sequence (5'-TTGAT) of ISDra2 TnpB. In order to further identify the difference between them, a series of reporting plasmids comprising TAM sequences 5'-nTGAT (n = T, A, G, or C) were constructed as described in Example 3.
The test plasmids encoding ISTfu1 TnpB polypeptide or ISDra2 TnpB polypeptide and their respective guide RNAs were co-transfected with the reporting plasmids into 293T cells, and the resulting cells were analyzed by flow cytometry, as described in Example 3.
The results are shown as GFP percentages, which were normalized to the GFP percentage of the ISDra2 TnpB/TTGAT group.
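The normalization step can be sketched in Python as follows. The group labels and GFP percentages in the example are hypothetical placeholders and are not measurements from this Example; the function name is likewise illustrative.

```python
def normalize_gfp(gfp_percent, reference):
    """Divide every group's GFP percentage by that of the reference group."""
    ref_value = gfp_percent[reference]
    if ref_value <= 0:
        raise ValueError("the reference group must have a positive GFP percentage")
    return {group: value / ref_value for group, value in gfp_percent.items()}


# Hypothetical GFP percentages, for illustration only.
raw = {"ISDra2/TTGAT": 10.0, "ISTfu1/TTGAT": 25.0, "ISTfu1/ATGAT": 5.0}
normalized = normalize_gfp(raw, reference="ISDra2/TTGAT")
print(normalized)
```

After normalization the reference group equals 1 by construction, so every other value reads directly as a fold change relative to the ISDra2 TnpB/TTGAT group.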
As shown in Fig. 9, ISTfu1 TnpB polypeptide can recognize all the four TAMs, and a higher efficiency of cleavage was observed for TAMs TTGAT and CTGAT, while ISDra2 TnpB polypeptide can recognize TTGAT only. Further, ISTfu1 TnpB polypeptide showed an efficiency of cleavage more than 2 times higher than ISDra2 TnpB polypeptide.
Example 5. RNA-guided cleavage of an endogenous gene in human cells
This Example was carried out to verify the RNA-guided cleavage of an endogenous gene in human cells by the TnpB polypeptide of the invention. The gRNA backbones used in this Example are those listed in Table 1.
5.1. Editing in hDNMT1 by ISTfu1 TnpB
Plasmids encoding ISTfu1 TnpB polypeptide (SEQ ID NO: 7) and a gRNA comprising a targeting region of SEQ ID NO: 142 (Target sequence 1 in hDNMT1) and a backbone of SEQ ID NO: 117 were constructed and transfected into HEK293T cells as described in Example 3. After incubation at 37 ℃ for two days, the transfected cells were collected for the isolation of genomic DNA with an isolation kit (DP201, Bioteke Corporation, Beijing) according to the manufacturer’s instructions. The genomic DNA was analyzed by Surveyor assay to determine the efficiency of cleavage, and genomic DNA from untreated 293T cells was used as a control.
Surveyor assay was performed by reference to Guschin et al., 2010 (Guschin et al., A Rapid and General Assay for Monitoring Endogenous Gene Modification, Methods in Molecular Biology, 2010, 649: 247-256) . In brief, 100ng genomic DNA was used in a 25 μL PCR reaction system using AccuPrime Taq polymerase (Invitrogen, USA) and the primers of SEQ ID NOs: 218 and 219. The PCR conditions were as follows: 94℃ for 2 min; 30X (94 ℃ for 20 s, 60℃ for 20 s, 68℃ for 40 s) ; 68℃ for 3 min; hold at 4℃.
6.5 μL of PCR products were then denatured and annealed with 3 μL of 1X AccuPrime Buffer II, then digested with 0.5 μL Surveyor nuclease (Integrated DNA Technologies, IDT, USA). Samples were run on a 10% acrylamide TBE gel, stained with ethidium bromide for 10 min, rinsed with water and then imaged on a Bio-Rad gel imager. The band intensities were quantified using ImageJ software, and the genome editing efficiency was calculated using the equation: $\%\,\text{genome editing} = 100 \times \left(1 - (1 - \text{fraction cleaved})^{1/2}\right)$.
As shown in Fig. 10A, Lanes #1, #2, and #3 showed cleavage by Surveyor nuclease (the indels% was 14.5%, 16.6% or 20.0%), indicating that ISTfu1 TnpB polypeptide can achieve RNA-guided cleavage of human DNMT1 in human cells.
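The editing-efficiency equation can be expressed as a short Python sketch. The helper that derives the fraction cleaved from band intensities assumes the common definition (cleaved-band intensity divided by total intensity), which is not spelled out in this Example; all names and the band intensities shown are hypothetical.

```python
import math


def fraction_cleaved(cleaved_band_intensities, uncleaved_band_intensity):
    """Assumed definition: summed cleaved bands over total band intensity."""
    cleaved = sum(cleaved_band_intensities)
    total = cleaved + uncleaved_band_intensity
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return cleaved / total


def percent_genome_editing(frac_cleaved):
    """% genome editing = 100 * (1 - (1 - fraction cleaved) ** 0.5)."""
    if not 0.0 <= frac_cleaved <= 1.0:
        raise ValueError("fraction cleaved must lie between 0 and 1")
    return 100.0 * (1.0 - math.sqrt(1.0 - frac_cleaved))


# Hypothetical band intensities in arbitrary units, for illustration only.
frac = fraction_cleaved([200.0, 160.0], 640.0)
print(round(percent_genome_editing(frac), 1))
```

The square root reflects that a cleavable heteroduplex forms whenever at least one of the two annealed strands carries an indel, so the cleaved fraction overstates the mutant allele frequency unless it is corrected in this way.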
5.2. Editing in hTET1, hTET2 and hHPRT by ISDge10 TnpB
Plasmids encoding ISDge10 TnpB polypeptide (SEQ ID NO: 17) and a gRNA comprising a targeting region of SEQ ID NO: 143 (Target sequence 1 in hTET1) , 144 (Target sequence 1 in hTET2) or 145 (Target sequence in hHPRT) and a backbone of SEQ ID NO: 127 were constructed and transfected into 293T cells as described in Example 3. Plasmids encoding a spCas9 and gRNA comprising the same targeting region were used as reference.
The transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers listed below.
As shown in Fig. 10B, ISDge10 TnpB polypeptide achieved RNA-guided cleavage of hTET1, hTET2 and hHPRT, thereby introducing indels into the same, indicating that ISDge10 TnpB polypeptide can achieve RNA-guided cleavage of hTET1, hTET2 and hHPRT in human cells.
5.3. Editing in hDNMT1 by ISAba30 TnpB
Plasmids encoding ISAba30 TnpB polypeptide (SEQ ID NO: 22) and a gRNA comprising a targeting region of SEQ ID NO: 146 (Target sequence 2 in hDNMT1) and a backbone of SEQ ID NO: 132 were constructed and transfected into 293T cells as described in Example 3.
The transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers of SEQ ID NOs: 226 and 227.
As shown in Fig. 10C, ISAba30 TnpB polypeptide achieved RNA-guided cleavage of hDNMT1, thereby introducing indels (indels% of 24.9% and 26.0% for #1 and #2, respectively) into the same, indicating that ISAba30 TnpB polypeptide can achieve RNA-guided cleavage of hDNMT1 in human cells.
5.4. Editing in hTET1 and hTET2 by ISAam1 TnpB
Plasmids encoding ISAam1 TnpB polypeptide (SEQ ID NO: 23) and a gRNA comprising a targeting region of SEQ ID NO: 147 (Target sequence 2 in hTET1) or 148 (Target sequence 2 in hTET2) and a backbone of SEQ ID NO: 133 were constructed and transfected into 293T cells as described in Example 3. Plasmids encoding a spCas9 and gRNA comprising the same targeting region were used as reference.
The transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers listed below.
As shown in Fig. 10D, ISAam1 TnpB polypeptide achieved RNA-guided cleavage of hTET1 and hTET2, thereby introducing indels into the same, indicating that ISAam1 TnpB polypeptide can achieve RNA-guided cleavage of hTET1 and hTET2 in human cells with an indels% comparable to or even higher than that of spCas9.
5.5. Editing in hDNMT1, hDNMT3b, hTET, and hPGK1 by ISYmu1 TnpB
Plasmids encoding ISYmu1 TnpB polypeptide (SEQ ID NO: 24) and a gRNA comprising a targeting region of SEQ ID NO: 142, 149 (Target sequence in hDNMT3b) , 150 (Target sequence 3 in hDNMT1) , 151 (Target sequence 3 in hTET2) or 152 (Target sequence in hPGK1) and a backbone of SEQ ID NO: 207 or 210 were constructed and transfected into 293T cells as described in Example 3.
The transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers listed below.
As shown in Fig. 10E, ISYmu1 TnpB polypeptide achieved RNA-guided cleavage of hDNMT1, hDNMT3b, hTET, and hPGK1, thereby introducing indels into the same, indicating that ISYmu1 TnpB polypeptide can achieve RNA-guided cleavage of hDNMT1, hDNMT3b, hTET, and hPGK1 in human cells.
5.6. Comparison of ISAam1 and ISYmu1 with ISDra2 by editing in human genes
Plasmids encoding ISAam1 TnpB polypeptide (SEQ ID NO: 23), ISYmu1 TnpB polypeptide (SEQ ID NO: 24) or ISDra2 TnpB polypeptide and a gRNA comprising a targeting region in the AGBL1-1, APOB-3, EMX1, MECP2, PGK1, and TET1 genes (see Table 2) and a backbone of SEQ ID NO: 133 (for ISAam1), SEQ ID NO: 210 (for ISYmu1) or SEQ ID NO: 158 (for ISDra2) were constructed and transfected into human cells.
Table 2
The transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers listed below.
As shown in Fig. 13, ISAam1 and ISYmu1 TnpB polypeptides achieved higher editing efficiency than ISDra2 in most of the tested genes in human cells.
Example 6. Analysis of the gRNA backbone for TnpB polypeptide
The gRNA backbones for ISDra2, ISTfu1, ISDge10, ISAba30, ISAam1, and ISYmu1 TnpB polypeptides were designed as “N” nucleotides at the right end of the IS (referred to as “gN” ) , and the sequences thereof are shown in Tables 3-8. That is, “N” indicates the left end of the gRNA backbone.
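The “gN” naming can be illustrated with a short Python sketch that derives a truncation series from the right end of an IS. The sequence used here is a made-up placeholder rather than any sequence from Tables 3-8, and the function name is hypothetical.

```python
def gn_backbones(is_right_end, lengths):
    """Map 'gN' to the last N nucleotides at the right end of the IS.

    The left end of each backbone moves with N, while the right end is
    always the right end of the IS.
    """
    backbones = {}
    for n in lengths:
        if n <= 0 or n > len(is_right_end):
            raise ValueError("N must be between 1 and the sequence length")
        backbones["g" + str(n)] = is_right_end[-n:]
    return backbones


# Placeholder sequence, for illustration only.
series = gn_backbones("ACGTACGTAC", lengths=[3, 5, 10])
for name, seq in series.items():
    print(name, seq)
```

Because every member of the series shares the same right end, comparing their activities isolates the effect of shortening the backbone from the left.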
Table 3. gRNA backbones for ISDra2
Table 4. gRNA backbones for ISTfu1
Table 5. gRNA backbones for ISDge10
Table 6. gRNA backbones for ISAba30
Table 7. gRNA backbones for ISAam1
Table 8. gRNA backbones for ISYmu1
The RNA-guided cleavages by the ISDra2, ISTfu1, ISDge10, ISAba30, ISAam1, and ISYmu1 TnpB polypeptides were conducted as described in Example 3, and the results are shown in Fig. 11.
In particular, ISDra2 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 129 nt (Fig. 11A); ISTfu1 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 139 nt (Fig. 11B); ISDge10 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 122 nt (Fig. 11C); ISAba30 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 101 nt (Fig. 11D); ISAam1 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 100 nt (Fig. 11E); and ISYmu1 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 120 nt (Fig. 11F). The TnpB polypeptides were also able to cleave the target sequence with longer gRNA backbones (Figs. 11A-11F), indicating that it is possible to use a longer gRNA backbone to form the RNP.
Backbone designs with an altered right end (based on the backbones of SEQ ID NOs: 157, 166, and 178, respectively; see Figs. 10A-10C, right panels) were prepared for the ISDra2, ISTfu1 and ISDge10 TnpB polypeptides, and RNA-guided cleavage was tested as described in Example 3. As shown in Figs. 10A-10C, substitution of the right-end nucleotide inhibited RNA-guided cleavage, and substitution of the two right-end nucleotides greatly inhibited it, while addition or deletion of one or more nucleotides at the right end greatly inhibited or even eliminated RNA-guided cleavage, indicating that the right end of the gRNA backbone is essential for RNA-guided cleavage by TnpB polypeptides.
Example 7. Comparison of editing efficiency of TnpB polypeptides and Cas nucleases
This Example was carried out to demonstrate that the editing efficiency of the TnpB polypeptides is superior to that of the small Cas nucleases.
In particular, ISAam1 and ISYmu1 TnpB polypeptides were examined for their ex vivo and in vivo activity together with specificity in comparison with five well-developed CRISPR-Cas editors, which include two optimized Un1Cas12f1 variants (referred to as Un1Cas12f1 and CasMINI), AsCas12f1, optimized Nme2Cas9 (Nme2-C.NR), and SaCas9. To make these seven systems comparable despite their different TAM/PAM requirements, we carefully chose genomic regions within which the gRNA for each system could be designed to target sequences overlapping within a narrow range (see, e.g., Fig. 14).
Plasmid groups, each comprising a plasmid encoding a TnpB or Cas editor, a plasmid encoding the corresponding gRNA (gRNA plasmid) and a reporting plasmid comprising the fluorescence-reporting system, were constructed and transfected into HEK293T cells and HCT116 cells (ATCC CCL-247) , and the genome editing efficiency at 10 genomic loci were tested as described in Example 3.
As shown in Figs. 15A and 15B, ISAam1 and ISYmu1 TnpB polypeptides achieved an editing efficiency which was significantly higher than that of the well-developed small Cas editors (Un1Cas12f1, CasMINI, AsCas12f1, and Nme2-C.NR) and was comparable to that of SaCas9.
To further minimize the confounding effect of distances between targets, we performed a comparison with overlapping (>15 bp) gRNAs against three endogenous targets in the HEK293T cell line. The former patterns are largely reproduced, where ISAam1 and ISYmu1 TnpBs together with SaCas9 show higher activity compared to Cas12f nucleases (Fig. 15C).
rAAVs encoding a genome editing system were prepared as described previously (see Ran, F. A. et al. 2015, In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186-191). In brief, a nuclease expression cassette driven by the CMV promoter and a gRNA/reRNA expression cassette driven by the human U6 promoter were cloned between ITRs (see Fig. 16A), and used to prepare recombinant AAV2 and AAV8. HEK293T cells were plated in 150 mm dishes 12 h before transfection; 30 μg helper plasmid, 15 μg AAV2 plasmid and 15 μg expression plasmid were transfected using polyethyleneimine, and AAV vectors were purified three days later. AAV8 for mouse injection was generated by PackGene Biotech Co, with a concentration of 10^13 gc/ml.
C2C12 cells (ATCC CRL-1772) were seeded at 5 × 10^4 gc per well in a 48-well plate. AAV2 was added to cells at a multiplicity of infection of 10^4 gc per well. Cells were collected 4 days after transduction for genomic DNA extraction and editing efficiency analysis by next generation sequencing. With the Rosa26 locus as the target, the ISAam1 TnpB and Cas9 systems show appreciable activity (Fig. 16A). Considering that AsCas12f has the lowest average editing activity among Cas12fs and that the activity of Nme2-C.NR is generally lower than that of SaCas9, we removed them from the in vivo experiment to avoid sacrificing more mice than necessary.
We then individually delivered a single AAV8 vector encoding each of five different editing systems into mice and analyzed the editing activity in the target organ, the liver. All experiments related to animal work described in this study were performed strictly in accordance with the guidelines for the Care and Use of Laboratory Animals, and approved by the Animal Welfare and Research Ethics Committee of the Institute of Zoology, Chinese Academy of Sciences. The mouse strain C57BL/6J was obtained from Vitalriver. 6-week-old female C57BL/6J mice were injected with 5 × 10^11 gc in a 100 μl volume via the tail vein. Mice were sacrificed 14 days later and liver tissues were collected for genome extraction. This in vivo result roughly recapitulates the aforementioned ex vivo data, where the TnpB and SaCas9 systems show relatively high activity (Fig. 16B).
For quantifying the editing specificity or off-target level, we employed one candidate-based assay in mouse N2a cells and one unbiased genome-wide assay in human HEK293T cells by iGUIDE-seq (see Nobles, C. L. et al., 2019, iGUIDE: an improved pipeline for analyzing CRISPR cleavage specificity. Genome Biol. 20, 14). In brief, half a million HEK293T cells were transfected with 1 μg nuclease plasmid, 500 ng gRNA plasmid and 50 pmol double-stranded oligodeoxynucleotide (dsODN) using the Lonza 4D system (Program CM-130). Cells were collected 3 days after nucleofection for genomic DNA extraction. The genome library was prepared and subjected to sequencing. Specifically, for the Rosa26 and Angptl3 loci in mouse, we predicted potential off-target sites for the seven systems, quantified the indel frequencies at the top 2 off-target sites, and calculated the ratio between off-target and on-target edits. For the human MAPK8 locus, iGUIDE-seq was performed to characterize the specificity. Among the three loci, ISAam1 shows the lowest off-target ratio for two loci (Figs. 17A and 17B), while ISYmu1 also exhibits a low degree of off-target editing (no off-target editing for one locus and the second or third lowest off-target editing level for the remaining two loci).
In summary, an in-depth characterization indicates that ISAam1 and ISYmu1 TnpBs outperform Cas12 variants in terms of ex vivo and in vivo efficiency, while exhibiting comparable performance to Cas9 variants. Moreover, their editing specificity is on par with these Cas12 or Cas9 variants.
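The candidate-based off-target metric can be sketched in Python as the ratio of each off-target indel frequency to the on-target indel frequency. The function name and the frequencies in the example are hypothetical and do not reproduce any data from Figs. 17A and 17B.

```python
def off_target_ratios(on_target_indel, off_target_indels):
    """Return off-target/on-target indel ratios for the listed candidate sites."""
    if on_target_indel <= 0:
        raise ValueError("on-target indel frequency must be positive")
    if any(freq < 0 for freq in off_target_indels):
        raise ValueError("indel frequencies cannot be negative")
    return [freq / on_target_indel for freq in off_target_indels]


# Hypothetical indel frequencies (%) at one on-target site and its
# top two predicted off-target sites, for illustration only.
ratios = off_target_ratios(on_target_indel=40.0, off_target_indels=[2.0, 0.0])
print(ratios)
```

Expressing off-target editing relative to on-target editing allows systems with very different absolute activities to be compared on the same scale.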
Claims (74)
- A recombinant gene editing system comprising
  - a TnpB polypeptide or a functional fragment thereof or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and
  - a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA,
  wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
- The recombinant gene editing system of claim 1, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- The recombinant gene editing system of claim 1 or 2, which comprises a first polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
- The recombinant gene editing system of any of claims 1 to 3, wherein the TnpB polypeptide or the functional fragment thereof recognizes a transposon-associated motif (TAM) adjacent to the nucleotide sequence of interest and has an endonuclease activity.
- The recombinant gene editing system of any of claims 1 to 4, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- The recombinant gene editing system of claim 4, wherein the TAM consists of four consecutive nucleotides.
- The recombinant gene editing system of any of claims 1 to 6, further comprising a heterologous polynucleotide.
- The recombinant gene editing system of claim 7, wherein the heterologous polynucleotide is an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
- A composition comprising
  - a recombinant TnpB polypeptide or a functional fragment thereof,
  - a target double-stranded DNA comprising a nucleotide sequence of interest and a TAM recognized by the TnpB polypeptide; and
  - a recombinant guide RNA (gRNA) comprising a targeting region capable of hybridizing to the nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or a functional fragment thereof,
  wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
- The composition of claim 9, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- The composition of claim 9 or 10, wherein the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
- The composition of any of claims 9 to 11, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- The composition of claim 11, wherein the TAM consists of four consecutive nucleotides.
- The composition of any of claims 9 to 13, further comprising a heterologous polynucleotide.
- The composition of claim 14, wherein the heterologous polynucleotide is an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
- A method of introducing a double-strand break into a polynucleotide of interest comprising a step of contacting the polynucleotide with a recombinant gene editing system comprising
  - a TnpB polypeptide or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and
  - a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence in the polynucleotide of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA,
  wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
- The method of claim 16, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- The method of claim 16 or 17, wherein the gene editing system comprises the TnpB polypeptide or the functional fragment thereof and the gRNA, or a first polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
- The method of any of claims 16 to 18, wherein the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
- The method of any of claims 16 to 19, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- The method of claim 19, wherein the TAM consists of four consecutive nucleotides.
- The method of any of claims 16 to 21, wherein the gene editing system further comprises a heterologous polynucleotide.
- The method of claim 22, wherein the heterologous polynucleotide is an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
- A method of modifying a genomic sequence in a cell comprising a step of introducing into the cell a recombinant gene editing system comprising: a TnpB polypeptide or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof; and a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a portion of the genomic sequence and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA, wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
- The method of claim 24, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- The method of claim 24 or 25, wherein the gene editing system comprises a first polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
- The method of any of claims 24 to 26, wherein the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
- The method of any of claims 24 to 27, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
- The method of claim 27, wherein the TAM consists of four consecutive nucleotides.
- The method of any of claims 24 to 29, wherein the gene editing system further comprises a heterologous polynucleotide.
- The method of claim 30, wherein the heterologous polynucleotide is an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
- The method of any of claims 24 to 31, wherein the cell is a prokaryotic or eukaryotic cell.
- A modified TnpB polypeptide comprising a modification in the DDE motif as compared to the parent TnpB polypeptide, wherein the parent polypeptide has an activity of RNA-guided endonuclease, and wherein the modified TnpB is deprived of the activity of cleaving double-stranded DNA.
- The modified TnpB polypeptide of claim 33, wherein at least one amino acid in the DDE motif is substituted by alanine, an amino acid corresponding to N31 of SEQ ID NO: 7 is substituted by alanine.
- The modified TnpB polypeptide of claim 33 or 34, wherein the parent TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis.
- The modified TnpB polypeptide of claim 33, wherein the modified TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- A recombinant system comprising: the modified TnpB polypeptide of any of claims 33 to 36, or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the modified TnpB polypeptide or the functional fragment thereof; and a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.
- The recombinant system of claim 37, which comprises a first polynucleotide comprising a nucleotide sequence encoding the modified TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
- The recombinant system of claim 37 or 38, wherein the gRNA further comprises one or more protein-binding domains.
- A method of modifying a genomic sequence in a cell comprising a step of introducing into the cell a recombinant system of any of claims 37 to 39 and a gene editing system targeting the genomic sequence, wherein the nucleotide sequence of interest is next to the genomic sequence.
- A fusion polypeptide comprising a TnpB polypeptide, or a functional fragment thereof, or a disarmed variant thereof fused to a fusion partner, wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and has an activity of RNA-guided endonuclease, and wherein the disarmed variant is the modified TnpB polypeptide of any of claims 33-36.
- The fusion polypeptide of claim 41, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
- The fusion polypeptide of claim 41 or 42, wherein the TnpB polypeptide has the activity of cleaving double-stranded DNA.
- The fusion polypeptide of any of claims 41 to 43, wherein the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
- The fusion polypeptide of claim 44, wherein the TAM consists of four consecutive nucleotides.
- A gene editing system comprising: the fusion polypeptide of any of claims 41-44, or a polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide; and a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.
- The gene editing system of claim 46, which comprises a first polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
- The gene editing system of claim 46 or 47, further comprising a heterologous polynucleotide.
- The gene editing system of claim 48, wherein the heterologous polynucleotide is an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
- A method of modifying a genomic sequence in a eukaryotic cell, comprising a step of introducing the gene editing system of any of claims 46-49 into the eukaryotic cell, wherein the gRNA comprises a targeting region capable of hybridizing to a portion of the genomic sequence.
- A method of screening a TnpB polypeptide for the activity of cleaving double-stranded DNA, comprising the steps of: providing a candidate TnpB polypeptide from a microorganism; providing a gRNA comprising a targeting region and a backbone region, wherein the backbone region comprises 100-350 nucleotides before the 3’ end of the insertion sequence (IS), which naturally comprises the nucleotide sequence encoding the TnpB polypeptide; providing a target DNA comprising a nucleotide sequence that hybridizes to the nucleotide sequence of the targeting region and a TAM recognized by the TnpB polypeptide, wherein the TAM consists of four or five consecutive nucleotides adjacent to the 5’ end of the IS; contacting the TnpB polypeptide with the gRNA and the target DNA; and detecting the cleavage on the target DNA.
- The method of claim 51, wherein the TnpB polypeptide is provided as a first polynucleotide comprising a first nucleotide sequence encoding the same.
- The method of claim 51 or 52, wherein the gRNA is provided as a second polynucleotide comprising a second nucleotide sequence encoding the same.
- The method of claim 52 or 53, wherein the first and second polynucleotides are provided in a first plasmid.
- The method of claim 52, wherein the first nucleotide sequence is operably linked to a first promoter.
- The method of claim 53, wherein the second nucleotide sequence is operably linked to a second promoter.
- The method of any of claims 51-56, wherein the target DNA is provided in a second plasmid.
- The method of claim 57, wherein contacting the TnpB polypeptide with the gRNA and the target DNA comprises introducing the first and second plasmids into a host cell comprising the target DNA.
- The method of any of claims 51-58, wherein the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
- A fusion polypeptide comprising the modified TnpB polypeptide of any of claims 33-36 fused to a fusion partner.
- The fusion polypeptide of claim 60, wherein the fusion partner is a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone) associated with the target DNA.
- The fusion polypeptide of claim 60, wherein the fusion partner is a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.
- The fusion polypeptide of claim 60, wherein the fusion partner is a polypeptide that directly provides for increased transcription of the target nucleic acid.
- The fusion polypeptide of claim 63, wherein the fusion partner is a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, or a small molecule/drug-responsive transcription regulator.
- The fusion polypeptide of claim 60, wherein the fusion partner is another polypeptide or domain to generate double-strand breaks.
- The fusion polypeptide of claim 60, wherein the fusion partner is a polypeptide that directs editing of single or multiple bases in a polynucleotide sequence.
- The fusion polypeptide of claim 66, wherein the fusion partner is a site-specific deaminase that can change the identity of a nucleotide, for example from C-G to T-A or from A-T to G-C.
- The fusion polypeptide of claim 66, wherein the fusion partner is a deaminase such as a cytidine deaminase, an adenine deaminase, APOBEC1, APOBEC3A, BE2, BE3, BE4, or ABEs.
- The fusion polypeptide of claim 66, wherein the fusion partner includes base edit repair inhibitors and glycosylase inhibitors.
- The fusion polypeptide of claim 60, wherein the fusion partner can be a Cas endonuclease or another TnpB endonuclease as described in the present disclosure.
- The fusion polypeptide of claim 60, wherein the fusion partner is a nuclear localization sequence (NLS).
- A recombinant adeno-associated virus (rAAV) comprising a genome comprising a first expression cassette encoding the fusion polypeptide of any of claims 41-45 and 60-71.
- The rAAV of claim 72, wherein the genome comprises a second expression cassette encoding a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof.
- The rAAV of claim 71 or 72, wherein the first expression cassette comprises less than about 4,700 nucleotides, less than about 4,600 nucleotides, less than about 4,500 nucleotides, less than about 4,400 nucleotides, less than about 4,300 nucleotides, less than about 4,200 nucleotides, less than about 4,100 nucleotides, or less than about 4,000 nucleotides.
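The screening method claimed above derives two inputs from the position of the insertion sequence (IS) in its native locus: a TAM of four or five nucleotides adjacent to the 5’ end of the IS, and a gRNA backbone of 100-350 nucleotides ending at the 3’ end of the IS. The following is a minimal illustrative sketch of that bookkeeping only, not the applicant's procedure. The function name, the 0-based half-open coordinates, the default lengths, and the reading of "adjacent to the 5’ end" as "immediately upstream of the IS" are all assumptions made for this example.

```python
from typing import NamedTuple


class ScreeningInputs(NamedTuple):
    tam: str       # nucleotides immediately upstream of the IS 5' end (assumed reading)
    backbone: str  # last nucleotides of the IS, ending at its 3' end


def derive_screening_inputs(
    locus: str,
    is_start: int,
    is_end: int,
    tam_len: int = 4,
    backbone_len: int = 200,
) -> ScreeningInputs:
    """Slice a candidate TAM and gRNA backbone out of a locus sequence.

    `locus` is the DNA sequence containing the IS; `is_start` and `is_end`
    are hypothetical 0-based, half-open coordinates of the IS within it.
    """
    # Length limits taken from the claim wording: TAM of 4 or 5 nt,
    # backbone of 100-350 nt.
    if tam_len not in (4, 5):
        raise ValueError("TAM length must be 4 or 5 nucleotides")
    if not 100 <= backbone_len <= 350:
        raise ValueError("backbone length must be 100-350 nucleotides")
    if not 0 <= is_start < is_end <= len(locus):
        raise ValueError("IS coordinates fall outside the locus")
    if is_start < tam_len:
        raise ValueError("not enough flanking sequence upstream of the IS")
    if is_end - is_start < backbone_len:
        raise ValueError("IS is shorter than the requested backbone")

    locus = locus.upper()
    tam = locus[is_start - tam_len:is_start]
    backbone = locus[is_end - backbone_len:is_end]
    return ScreeningInputs(tam=tam, backbone=backbone)
```

Calling `derive_screening_inputs(locus, is_start, is_end, tam_len=5, backbone_len=150)` returns a five-nucleotide TAM candidate and a 150-nucleotide backbone candidate; whether a given candidate supports cleavage still has to be established experimentally, as the claimed detection step requires.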
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2022106290 | 2022-07-18 | | |
CNPCT/CN2022/106290 | 2022-07-18 | | |
CNPCT/CN2023/098324 | 2023-06-05 | | |
CN2023098324 | 2023-06-05 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024017189A1 (en) | 2024-01-25 |
Family
ID=89617116
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2023/107697 WO2024017189A1 (en) | 2022-07-18 | 2023-07-17 | Tnpb-based genome editor |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024017189A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130022590A1 (en) * | 2009-12-11 | 2013-01-24 | Mackay Joel | Compositions Comprising Zinc Finger Domains and Uses Therefor |
US20200063126A1 (en) * | 2018-03-14 | 2020-02-27 | Arbor Biotechnologies, Inc. | Novel crispr dna targeting enzymes and systems |
CN113528582A (en) * | 2020-04-15 | 2021-10-22 | 博雅辑因(北京)生物科技有限公司 | Method and medicine for targeted editing of RNA based on LEAPER technology |
WO2022060707A1 (en) * | 2020-09-15 | 2022-03-24 | Rutgers, The State University Of New Jersey | Systems for gene editing and methods of use thereof |
Non-Patent Citations (3)
Title |
---|
DATABASE Protein 18 October 2021 (2021-10-18), ANONYMOUS : "MULTISPECIES: IS200/IS605 family element RNA-guided endonuclease TnpB", XP093131815, retrieved from NCBI Database accession no. WP_002287525.1 * |
DATABASE Protein 6 April 2020 (2020-04-06), ANONYMOUS : "transposase-like protein B (plasmid) [Enterococcus faecium] ", XP093131817, retrieved from NCBI Database accession no. ABZ01936.1 * |
KARVELIS TAUTVYDAS; DRUTEIKA GYTIS; BIGELYTE GRETA; BUDRE KAROLINA; ZEDAVEINYTE RIMANTE; SILANSKAS ARUNAS; KAZLAUSKAS DARIUS; VENC: "Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease", NATURE, vol. 599, no. 7886, 7 October 2021 (2021-10-07), pages 692 - 696, XP037627757, DOI: 10.1038/s41586-021-04058-1 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23842252; Country of ref document: EP; Kind code of ref document: A1 |