WO2024017189A1

WO2024017189A1 - TnpB-based genome editor

Info

Publication number: WO2024017189A1
Application number: PCT/CN2023/107697
Authority: WO
Inventors: Haoyi WANG; Guanghai XIANG; Yong Zhang; Yuanqing Li; Yongyuan HUO; Jing Sun
Original assignee: Beijing Institute For Stem Cell And Regenerative Medicine; Institute Of Zoology, Chinese Academy Of Sciences
Priority date: 2022-07-18
Filing date: 2023-07-17
Publication date: 2024-01-25

Abstract

Provided herein are novel TnpB polypeptides having the activity of RNA-guided endonuclease. Also provided herein are a gene editing system comprising the TnpB polypeptide of the present invention or a disarmed variant thereof, or a fusion polypeptide comprising the same, as well as a method for gene editing with the TnpB polypeptide or the disarmed variant thereof, the fusion polypeptide, or the gene editing system of the invention.

Description

TnpB-Based Genome Editor

Technical Field

The present invention relates to molecular biology. In particular, the present invention provides novel RNA-guided systems for gene editing.

Background

The modification of a genome at a predetermined site has been enabled by site-specific systems.

Genome-editing techniques such as meganucleases, designer zinc finger nucleases (ZFNs), or transcription activator-like effector nucleases (TALENs) are available for producing targeted genome modification, but these systems tend to have low specificity and employ designed nucleases that need to be redesigned for each target site, which renders them costly and time-consuming to prepare.

Recently, CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)-Cas systems have become popular, which comprise different domains of effector proteins that encompass a variety of activities (DNA recognition, binding, and optionally cleavage). However, the Cas nuclease is generally large in size, making it difficult to deliver the CRISPR-Cas systems into a cell.

Transposition has a key role in reshaping the genomes of all living organisms. Insertion sequences of the IS200/IS605 and IS607 families are among the simplest mobile genetic elements and contain only the genes that are required for their transposition and its regulation. These elements encode the tnpA transposase, which is essential for mobilization, and often carry an accessory tnpB gene, which is dispensable for transposition. A TnpB protein (ISDra2 TnpB) has been reported to have the activity of an RNA-guided DNA endonuclease (Karvelis et al., Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature, 2021, 599: 692-696). TnpB is a functional progenitor of CRISPR–Cas nucleases and is established as a prototype of a new system for genome editing. TnpB proteins are generally much shorter than Cas proteins and are thus more convenient to deliver into a cell.

However, the ISDra2 TnpB needs to recognize the transposon-associated motif (TAM) TTGAT to effect the RNA-guided cleavage. A longer motif generally occurs in a genome at a lower frequency, so fewer sites are available for targeting. Therefore, the use of the ISDra2 TnpB in genome editing will be limited.
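The frequency argument can be made concrete with a small calculation. The sketch below assumes a random sequence with equal base frequencies (an idealization; real genomes are compositionally biased), under which a fully specified motif of length k is expected about once every 4^k positions on a given strand. The function names are illustrative and are not part of the disclosure.

```python
def expected_spacing(motif_length: int) -> int:
    """Expected number of positions per occurrence of a fully specified
    motif on one strand, assuming equal base frequencies (assumption)."""
    return 4 ** motif_length


def expected_sites(motif_length: int, sequence_length: int) -> float:
    """Expected number of motif occurrences on one strand of a random
    sequence of the given length (idealized model, not a measurement)."""
    windows = max(sequence_length - motif_length + 1, 0)
    return windows / expected_spacing(motif_length)


if __name__ == "__main__":
    # Compare 4-nt and 5-nt motifs in a hypothetical 1 Mb random sequence.
    for k in (4, 5):
        print(k, expected_spacing(k), expected_sites(k, 1_000_000))
```

Under this model a 5-nt TAM such as TTGAT is expected once per 1024 positions per strand, four times less often than a 4-nt TAM.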

Hence, there is a need to identify new TnpB polypeptides having the activity of RNA-guided DNA endonuclease, which recognize a different TAM, such as a shorter TAM, as compared to ISDra2 TnpB, and genome editing systems comprising the same.

Summary of the Invention

To meet the need above, the inventors identified a number of TnpB polypeptides having the activity of RNA-guided DNA endonuclease. The identification of these TnpB polypeptides increases the possibility of editing various genomic regions that are not accessible to the TnpB polypeptide of the prior art. Further, the TnpB polypeptides of the present disclosure provide an editing efficiency higher than that of the prior art TnpB polypeptide, and even comparable to that of Cas9 nuclease.

In a first aspect, the present disclosure provides a recombinant gene editing system comprising

- a TnpB polypeptide or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and

- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA, wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.

In a second aspect, the present disclosure provides a composition comprising

- a recombinant TnpB polypeptide or a functional fragment thereof,

- a target double-stranded DNA comprising a nucleotide sequence of interest and a TAM recognized by the TnpB polypeptide; and

- a recombinant guide RNA (gRNA) comprising a targeting region capable of hybridizing to the nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or a functional fragment thereof, wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.

In a third aspect, the present disclosure provides a method of introducing a double-strand break into a polynucleotide of interest comprising a step of contacting the polynucleotide with a recombinant gene editing system comprising

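- a TnpB polypeptide or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and

- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence in the polynucleotide of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA,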

wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.

In a fourth aspect, the present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell a recombinant gene editing system comprising

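- a TnpB polypeptide or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and

- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a portion of the genomic sequence and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA,

wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.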

In a fifth aspect, the present disclosure provides a modified TnpB polypeptide comprising a modification in the DDE motif as compared to the parent TnpB polypeptide, wherein the parent polypeptide has an activity of RNA-guided endonuclease, and wherein the modified TnpB is deprived of the activity of cleaving double-stranded DNA.
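One modification of this kind is the alanine substitution that Fig. 3 describes for the dTnpB polypeptides. The sketch below is a minimal illustration of how such a variant sequence could be written down in silico. It assumes the 1-based positions of the DDE-motif residues are already known from an alignment; the helper name, the toy sequence and the positions are hypothetical and do not correspond to any polypeptide of the disclosure.

```python
def substitute_with_alanine(sequence: str, positions: list[int]) -> str:
    """Return a copy of `sequence` with the residues at the given 1-based
    positions replaced by alanine. Each position must hold D or E, as
    expected for a DDE-motif residue (sanity check, an assumption here)."""
    residues = list(sequence.upper())
    for pos in positions:
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] not in ("D", "E"):
            raise ValueError(f"residue {pos} is {residues[pos - 1]}, not D or E")
        residues[pos - 1] = "A"
    return "".join(residues)


if __name__ == "__main__":
    toy_parent = "MDKDLE"          # hypothetical toy sequence
    toy_motif = [2, 4, 6]          # hypothetical motif positions
    print(substitute_with_alanine(toy_parent, toy_motif))
```

The residue check guards against applying positions taken from one polypeptide to a different, unaligned polypeptide.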

In a sixth aspect, the present disclosure provides a recombinant system comprising

- the modified TnpB polypeptide of the fifth aspect, or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the modified TnpB polypeptide or the functional fragment thereof, and

- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.

In a seventh aspect, the present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell a recombinant system of the sixth aspect and a gene editing system targeting the genomic sequence, wherein the nucleotide sequence of interest is next to the genomic sequence.

In an eighth aspect, the present disclosure provides a fusion polypeptide comprising a TnpB polypeptide or a functional fragment thereof, or a disarmed TnpB polypeptide, fused to a fusion partner, wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, wherein the TnpB polypeptide has an activity of RNA-guided endonuclease, and wherein the disarmed TnpB polypeptide is deprived of the activity of cleaving double-stranded DNA.

In a ninth aspect, the present disclosure provides a gene editing system comprising

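- the fusion polypeptide of the present disclosure, or a polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide, and

- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the fusion polypeptide, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.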

In a tenth aspect, the present disclosure provides a method of modifying a genomic sequence in a eukaryotic cell, comprising a step of introducing the gene editing system of the ninth aspect into the eukaryotic cell, wherein the gRNA comprises a targeting region capable of hybridizing to a portion of the genomic sequence.

In an eleventh aspect, the present disclosure provides a method of screening TnpB polypeptide for the activity of cleaving double-stranded DNA comprising the steps of:

- providing a candidate TnpB polypeptide from a microorganism;

- providing a gRNA comprising a targeting region and a backbone region, wherein the backbone region comprises 100-350 nucleotides before the 3’ end of the IS, which naturally comprises the nucleotide sequence encoding the TnpB polypeptide;

- providing a target DNA comprising a nucleotide sequence that hybridizes to the nucleotide sequence of the targeting region and a TAM recognized by the TnpB polypeptide, wherein the TAM consists of four or five consecutive nucleotides adjacent to the 5’ end of the IS;

- contacting the TnpB polypeptide with the gRNA and the target DNA; and

- detecting the cleavage on the target DNA.
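How the TAM and the gRNA backbone are read off an insertion sequence in the steps above can be sketched as follows. This is a minimal illustration, not the inventors' pipeline. It assumes the IS and its upstream flank are available as plain strings on the same strand, and it takes "adjacent to the 5' end of the IS" to mean the nucleotides immediately upstream of the IS. All function names and toy sequences are hypothetical.

```python
def candidate_tams(upstream_flank: str) -> list[str]:
    """Return the 4-nt and 5-nt candidate TAMs, i.e. the last four and
    last five nucleotides of the flank immediately upstream of the IS."""
    if len(upstream_flank) < 5:
        raise ValueError("need at least 5 nt of upstream flank")
    flank = upstream_flank.upper()
    return [flank[-4:], flank[-5:]]


def backbone_region(is_sequence: str, length: int) -> str:
    """Return the last `length` nucleotides before the 3' end of the IS
    (100-350 nt per the method above) as a backbone candidate."""
    if not 100 <= length <= 350:
        raise ValueError("backbone length must be 100-350 nt")
    if length > len(is_sequence):
        raise ValueError("IS is shorter than the requested backbone")
    return is_sequence.upper()[-length:]


if __name__ == "__main__":
    toy_flank = "ccgattgat"                 # hypothetical upstream flank
    toy_is = "A" * 500 + "C" * 150          # hypothetical IS sequence
    print(candidate_tams(toy_flank))
    print(len(backbone_region(toy_is, 150)))
```

Returning both candidate lengths keeps the choice between a four-nucleotide and a five-nucleotide TAM open until cleavage has been tested.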

Brief Description of the Drawings

Fig. 1 shows the maps of the pair of plasmids for screening TnpB polypeptide having the activity of RNA-guided DNA endonuclease, comprising the test plasmid (encoding a TnpB polypeptide and a gRNA, Fig. 1A) and the reporter plasmid (comprising a target sequence, Fig. 1B).

Fig. 2 shows the results of the screening for TnpB polypeptide having the activity of RNA-guided DNA endonuclease.

Fig. 3 shows the depletion ratio of TnpB polypeptides and dTnpB polypeptides (the TnpB polypeptides with the DDE motif substituted by alanine).

Fig. 4 shows the alignment of the amino acid sequences of TnpB polypeptides having the activity of RNA-guided DNA endonuclease.

Fig. 5 shows the amino acid sequences of TnpB polypeptides having the activity of RNA-guided DNA endonuclease, and the DDE motif in the amino acid sequences (the residues in bold and underlined).

Fig. 6 shows the structure and mechanism of the fluorescence-reporting system.

Fig. 7 shows the maps of the plasmids for detecting the RNA-guided cleavage in 293T cells with the fluorescence-reporting system, including a plasmid encoding the fluorescence-reporting system (A), a plasmid encoding the TnpB polypeptide (B), and a plasmid encoding gRNA (C).

Fig. 8 shows the results of flow cytometry for detecting the expression of GFP which indicates the RNA-guided cleavage in the fluorescence-reporting system by ISTfu1 TnpB, ISDge10 TnpB, ISAba30 TnpB, ISAam1 and ISYmu1 TnpB polypeptides.

Fig. 9 shows the efficiency of editing by ISTfu1 TnpB and ISDra2 TnpB polypeptides with different TAMs in the fluorescence-reporting system.

Fig. 10 shows the results of the Surveyor assays for detecting the RNA-guided cleavage in human cells by ISTfu1 TnpB (panel A), ISDge10 TnpB (panel B), ISAba30 TnpB (panel C), ISAam1 (panel D) and ISYmu1 (panel E) TnpB polypeptides.

Fig. 11 shows the effect of the backbone design on the RNA-guided cleavage by ISDra2 (panel A), ISTfu1 (panel B), ISDge10 (panel C), ISAba30 (panel D), ISAam1 (panel E), and ISYmu1 (panel F) TnpB polypeptides.

Fig. 12 shows the distribution of 10 conserved residues in 25 active TnpB proteins together with ISDra2 TnpB (panel A), which are marked as asterisks and as black lines in the bottom bar with the domain architecture overlaid, and that the endonuclease activity of TnpB mutants (N to A) was sharply decreased (panel B).

Fig. 13 shows the editing efficiency of ISAam1, ISYmu1 and ISDra2 systems at six randomly selected endogenous sites in HEK293T cells. Data are shown as the mean ± SD, n=3.

Fig. 14 shows the gRNA design for seven nucleases at ten human genomic loci. Nucleases and the corresponding TAM are color-coded. The gRNAs are aligned according to the stranded position. Taking CBLB as an example, the gRNAs overlap more for ISAam1 and three Cas12f variants than for the other three nucleases.

Fig. 15 shows the comparison of editing efficiency of two TnpB systems and five Cas nucleases at 10 genomic loci in human HEK293T (panel A) and HCT116 (panel B) cells, and the comparison of editing efficiency of ISAam1 (panel C) or ISYmu1 (panel D) relative to five Cas nucleases at three genomic loci in HEK293T cells. For panels A and B, each dot represents the average efficiency of three biological replicates. The distribution is shown as a box plot where the box indicates the median (middle line) and the interquartile range (IQR, box limits), and values from minimum to maximum are shown by the whiskers. For panels C and D, the gRNA design is shown on the left panel, and the editing efficiency is shown on the right panel. The seven nucleases are color-coded. Since it is impossible to design overlapping gRNAs targeting the same location across all seven nucleases, two groups of overlapping gRNAs were separately designed for ISAam1, three Cas12f variants and Nme2-C.NR, and for ISAam1 and SaCas9. ISYmu1 was in a similar scenario.

Fig. 16 shows the AAV2-delivery based editing efficiency at the Rosa26 locus in mouse C2C12 cells (n=3, panel A) and the efficiency of AAV8-mediated in vivo gene editing at the Rosa26 site (n=4) and the Angptl3 site (n=3) in mouse (panel B). Data are shown as the mean ± SD.

Fig. 17 shows the ratio of off-target vs. on-target editing efficiency at the top two predicted off-target sites of each nuclease (n=3, panel A), and the off-target level quantified by iGUIDE analysis at the MAPK8 locus in HEK293T cells (panel B, in which the pie chart shows the proportion of on-target and off-target reads, respectively).

Detailed Description of the Invention

1. Definitions

As used herein, the term “TnpB polypeptide” refers to a polypeptide encoded by the tnpB gene in an insertion sequence (IS). “TnpB endonuclease”, “TnpB effector” and “TnpB nuclease” are used interchangeably herein and refer to the TnpB polypeptide having an activity of RNA-guided endonuclease. A TnpB polypeptide is generally 300-500 amino acid residues in size. A TnpB polypeptide “derived from” a microorganism refers to the TnpB polypeptide naturally occurring in the microorganism, including a TnpB polypeptide that can be found in an online database such as that of the National Center for Biotechnology Information (NCBI), and the natural variants thereof.

The terms “polypeptide” and “protein” are used interchangeably herein and refer to a polymer of amino acids and includes full-length proteins and fragments thereof.

Endonucleases are enzymes that cleave the phosphodiester bond within a polynucleotide chain, and include restriction endonucleases that cleave DNA at specific sites without damaging the bases. Examples of endonucleases include, but are not limited to, restriction endonucleases, meganucleases, TAL effector nucleases (TALENs), zinc finger nucleases, and Cas (CRISPR-associated) effector endonucleases. The present disclosure provides novel RNA-guided TnpB endonucleases.

As used herein, "nucleic acid" means a polynucleotide and includes a single- or a double-stranded polymer of deoxyribonucleotide or ribonucleotide bases. Nucleic acids may also include fragments and modified nucleotides. Thus, the terms "polynucleotide", "nucleic acid sequence", "nucleotide sequence" and "nucleic acid fragment" are used interchangeably to denote a polymer of RNA and/or DNA and/or RNA-DNA that is single- or double-stranded, optionally comprising synthetic, non-natural, or altered nucleotide bases. Nucleotides (usually found in their 5'-monophosphate form) are referred to by their single letter designation as follows: "A" for adenosine or deoxyadenosine (for RNA or DNA, respectively), "C" for cytosine or deoxycytosine, "G" for guanosine or deoxyguanosine, "U" for uridine, "T" for deoxythymidine, "R" for purines (A or G), "Y" for pyrimidines (C or T), "K" for G or T, "H" for A or C or T, "I" for inosine, and "N" for any nucleotide.
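The single-letter designations above can be used to search a sequence for a degenerate motif, for example a TAM written with "N" or "R". The following is a minimal sketch under stated assumptions: it covers only the codes listed in the definition (inosine is left out), treats "U" as "T" so that RNA and DNA inputs compare equal, and uses hypothetical function names.

```python
import re

# Degenerate codes from the definition above, expressed as DNA base sets.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "K": "GT", "H": "ACT", "N": "ACGT",
}


def motif_regex(motif: str) -> re.Pattern:
    """Compile a degenerate motif into a regular expression."""
    if not motif:
        raise ValueError("motif must not be empty")
    parts = []
    for symbol in motif.upper():
        if symbol not in CODES:
            raise ValueError(f"unsupported symbol: {symbol}")
        bases = CODES[symbol]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return re.compile("".join(parts))


def find_motif(motif: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all matches, including
    overlapping ones, on the given strand only."""
    regex = motif_regex(motif)
    seq = sequence.upper().replace("U", "T")
    width = len(motif)
    return [i for i in range(len(seq) - width + 1)
            if regex.fullmatch(seq, i, i + width)]


if __name__ == "__main__":
    print(find_motif("TTGAT", "ACTTGATCC"))
    print(find_motif("RY", "ACGT"))
```

Each window is tested separately so that overlapping occurrences are reported, which a plain `re.findall` would skip.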

The term "genome", as it applies to a prokaryotic or eukaryotic cell or organism, encompasses not only chromosomal DNA found within the nucleus, but also organelle DNA found within subcellular components (e.g., mitochondria or plastids) of the cell. The term "genome" refers to the entire complement of genetic material (genes and non-coding sequences) that is present in each cell of an organism, or virus or organelle; and/or a complete set of chromosomes inherited as a (haploid) unit from one parent.

The term "selectively hybridizes" means hybridization, preferably under stringent hybridization conditions, of a nucleic acid sequence to a specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid sequences, and the substantial exclusion of non-target nucleic acids. Selectively hybridizing sequences typically have at least about 80% sequence identity, or 90% sequence identity, up to and including 100% sequence identity (i.e., fully complementary) with each other.

The term "stringent conditions" or "stringent hybridization conditions" includes reference to conditions under which a probe will selectively hybridize to its target sequence in an in vitro hybridization assay. Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified which are 100% complementary to the probe (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). A person skilled in the art knows various conditions for hybridization, including stringent hybridization conditions and highly stringent hybridization conditions. See, for example, Sambrook et al., 1989, Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, N.Y.; and Ausubel et al. (eds.), 1995, Current Protocols in Molecular Biology, John Wiley & Sons, N.Y.

The term "homology" refers to DNA sequences that are similar. For example, a "region of homology to a genomic region" that is found on the donor DNA is a region of DNA that has a similar sequence to a given "genomic region" in the cell or organism genome. A region of homology can be of any length that is sufficient to promote homologous recombination at the cleaved target site. For example, the region of homology can comprise 5-3000 or more bases, such as at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, or 3000 in length to enable the homologous recombination with the corresponding genomic region.

As used herein, a "genomic region" is a segment on a chromosome or organelle DNA of a cell that is present either upstream or downstream of the target site or, alternatively, also comprises a portion (at either the 5’ or 3’ end) of the target site. The genomic region can comprise 5-3000 or more bases, such as at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 2900, or 3000 in length to enable the homologous recombination with the corresponding region of homology.

The term "homologous recombination" (HR) means the exchange of DNA fragments between two DNA molecules at the sites of homology. The frequency of homologous recombination is influenced by a number of factors. The amount of homologous recombination and the relative proportion of homologous to non-homologous recombination vary in different organisms. Generally, the length of the region of homology affects the frequency of homologous recombination events: the longer the region of homology, the greater the frequency. Further, the homologous recombination needs a certain length of the homologous region, which is species-variable. See, for example, Singer et al., (1982) Cell 31: 25-33; Shen and Huang, (1986) Genetics 112: 441-57; Watt et al., (1985) Proc. Natl. Acad. Sci. USA 82: 4768-72; Sugawara and Haber, (1992) Mol Cell Biol 12: 563-75; Rubnitz and Subramani, (1984) Mol Cell Biol 4: 2253-8; Ayares et al., (1986) Proc. Natl. Acad. Sci. USA 83: 5199-203; Liskay et al., (1987) Genetics 115: 161-7.

"Sequence identity" or "identity" in the context of nucleotide or amino acid sequences refers to the nucleotide bases or amino acid residues in two sequences that are the same when aligned for maximum correspondence over a specified comparison window.

The term "percentage of sequence identity" refers to the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the nucleotide or amino acid sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage sequence identity is calculated by dividing the number of matched positions (i.e., positions at which the nucleotide bases or amino acid residues in the two sequences are identical) by the total number of positions in the window of comparison and multiplying the result by 100. For example, when aligning two sequences, if 950 positions in two sequences, which are optimally aligned in a comparison window of 1000 positions, are identical, the sequences are 95% identical to each other.
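The calculation in this definition can be sketched in a few lines. The sketch assumes the two sequences have already been optimally aligned by some other tool, so that they have equal length and gaps are written as "-"; it does not perform the alignment itself. The function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Matched positions divided by the total number of positions in the
    comparison window, times 100. Gap columns count toward the total but
    never count as matches."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window must not be empty")
    a, b = aligned_a.upper(), aligned_b.upper()
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # The worked example from the text: 950 identical positions in a
    # comparison window of 1000 positions.
    window_a = "A" * 1000
    window_b = "A" * 950 + "C" * 50
    print(percent_identity(window_a, window_b))
```

Counting gap columns in the denominator follows the wording "total number of positions in the window of comparison"; some programs instead exclude gaps, which gives a different value for the same alignment.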

A variety of comparison methods have been designed for sequence alignments and the calculations of percent identity or similarity, including, but not limited to, the MegAlign™ program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, Wis.). Within the context of this application it will be understood that where sequence analysis software is used for analysis, the results of the analysis will be based on the "default values" of the program referenced, unless otherwise specified. As used herein "default values" will mean any set of values or parameters that originally load with the software when first initialized.

"BLAST" is a searching algorithm provided by NCBI used to find regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches to identify sequences having sufficient similarity to a query sequence such that the similarity would not be predicted to have occurred randomly. BLAST reports the identified sequences and their local alignment to the query sequence. It is well understood by one skilled in the art that many levels of sequence identity are useful in identifying polypeptides from other species or modified naturally or synthetically wherein such polypeptides have the same or similar function or activity. Useful examples of percent identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or any percentage from 50% to 100%. Indeed, any amino acid identity from 50% to 100% may be useful in describing the present disclosure, such as 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%.

A "centimorgan" (cM) or "map unit" is the distance between two polynucleotide sequences, linked genes, markers, target sites, loci, or any pair thereof, wherein 1% of the products of meiosis are recombinant. Thus, a centimorgan is equivalent to a distance equal to a 1% average recombination frequency between the two linked genes, markers, target sites, loci, or any pair thereof.
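As a worked illustration of this definition, the map distance in centimorgans can be read directly from counts of recombinant and total meiotic products. The sketch below applies the definition as stated (1 cM per 1% recombination frequency); for loci far apart, multiple crossovers make this direct reading an underestimate, and a mapping function would be needed, which is outside this sketch. The function name and the counts are hypothetical.

```python
def map_distance_cm(recombinant_products: int, total_products: int) -> float:
    """Recombination frequency in percent, read as centimorgans under the
    definition above (1 cM = 1% recombinant products of meiosis)."""
    if total_products <= 0:
        raise ValueError("total_products must be positive")
    if not 0 <= recombinant_products <= total_products:
        raise ValueError("recombinant_products must be within 0..total")
    return 100.0 * recombinant_products / total_products


if __name__ == "__main__":
    # Hypothetical counts: 25 recombinant products among 1000 scored.
    print(map_distance_cm(25, 1000))
```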

An "isolated" polynucleotide, polypeptide, or protein is substantially or essentially free from components that normally accompany or interact with the polynucleotide, polypeptide, or protein as found in its naturally occurring environment. Thus, an isolated polynucleotide or polypeptide or protein is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. Preferably, an "isolated" polynucleotide is free of sequences that naturally flank the polynucleotide (i.e., sequences located at the 5' and 3' ends of the polynucleotide) in the genomic DNA of the organism from which the polynucleotide is derived. Isolated polynucleotides and polypeptides may be purified from a cell in which they naturally occur. The methods for isolating or purifying polynucleotides or polypeptides are known to a person skilled in the art. The term also embraces recombinant or chemically synthesized polynucleotides and polypeptides.

The term "fragment" refers to a contiguous set of nucleotides or amino acids. In one embodiment, a fragment comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 contiguous nucleotides. In one embodiment, a fragment comprises 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 contiguous amino acids. A fragment may or may not exhibit the function of a sequence sharing some percent identity over the length of said fragment.

The term "functional fragment" refers to a portion of an isolated polynucleotide or polypeptide that displays the same activity or function as the longer or full-length sequence from which it derives.

The term "gene" includes a nucleic acid fragment that expresses a functional molecule such as, but not limited to, a specific protein, including regulatory sequences preceding (5' non-coding sequences) and following (3' non-coding sequences) the coding sequence.

The term "endogenous" means a sequence or other molecule that naturally occurs in a cell or organism. An endogenous polynucleotide is normally found in the genome of a cell; that is, not heterologous.

The term "heterologous" refers to the difference between the original environment, location, or composition of a particular polynucleotide or polypeptide and its current environment, location, or composition. Non-limiting examples include differences in taxonomic derivation (e.g., a polynucleotide obtained from species A would be heterologous if inserted into the genome of species B, or of a different variety or cultivar of species A; or a polynucleotide obtained from a bacterium and introduced into a cell of a plant or an animal), or sequence (e.g., a polynucleotide obtained from species A, isolated, modified, and re-introduced into a plant of species A).

An "allele" is one of several alternative forms of a gene occupying a given locus on a chromosome. When all the alleles present at a given locus on a chromosome in a cell or an organism are the same, the cell or organism is homozygous at that locus. If the alleles present at a given locus on a chromosome differ, the cell or organism is heterozygous at that locus.

"Coding sequence" refers to a nucleotide sequence which codes for a specific amino acid sequence. "Regulatory sequences" refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence, which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include, but are not limited to, promoters, translation leader sequences, 5' untranslated sequences, 3' untranslated sequences, introns, polyadenylation signal sequences, RNA processing sites, effector binding sites, and stem-loop structures.

A "mutated gene" is a gene that has been altered through human intervention. A "mutated gene" has a sequence that differs from the sequence of the corresponding non-mutated gene by the addition, deletion, insertion or substitution of at least one nucleotide. In the present disclosure, the mutated gene comprises an alteration that results from a guide polynucleotide/TnpB endonuclease system as disclosed herein. A mutated organism is an organism comprising a mutated gene.

As used herein, a "targeted mutation" is a mutation in a gene that is made in a target sequence within the gene using any method known to a person skilled in the art, including a method involving a guided TnpB endonuclease system as disclosed herein.

The term "knock-out" refers to a DNA sequence in a cell that has been rendered partially or completely inoperative, e.g., by targeting with a TnpB protein of the present disclosure; for example, a DNA sequence prior to knock-out could have encoded an amino acid sequence, or could have had a regulatory function (e.g., promoter) .

The term "knock-in" represents the replacement or insertion of a DNA sequence at a specific site in the genome of a cell by targeting with a TnpB protein (for example by homologous recombination (HR), wherein a suitable donor DNA polynucleotide is also used). The knock-in can be a specific insertion of a heterologous nucleotide sequence that encodes an amino acid sequence or a functional RNA, or a specific insertion of a transcriptional regulatory element.

The term "domain" means a contiguous stretch of nucleotides (that can be RNA, DNA, and/or RNA-DNA-combination sequence) or contiguous or non-contiguous amino acids.

A "conserved domain" or "motif" means a set of nucleotides or amino acids conserved at specific positions along an aligned sequence of evolutionarily related genes or proteins. While nucleotides or amino acids at other positions can vary between homologous proteins, nucleotides or amino acids that are highly conserved at specific positions indicate amino acids that are essential for the structure, the stability, or the function of a polynucleotide or protein.

A "codon-optimized" nucleotide sequence is a nucleotide sequence having its frequency of codon usage designed to mimic the frequency of preferred codon usage of the host cell. An "optimized" polynucleotide comprises a nucleotide sequence that has been optimized for improved expression in a particular heterologous host cell.

A "promoter" is a nucleotide sequence involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, and/or comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of some variation may have identical promoter activity.

A promoter that causes a gene to be expressed in most tissues or cell types at most times is commonly referred to as a "constitutive promoter". The term "inducible promoter" or "regulated promoter" refers to a promoter that selectively expresses a coding sequence or functional RNA in response to the presence of an endogenous or exogenous stimulus, for example chemical compounds (chemical inducers), or in response to environmental, hormonal, chemical, and/or developmental signals. Inducible or regulated promoters include, for example, promoters induced or regulated by light, heat, stress, flooding or drought, salt stress, osmotic stress, phytohormones, wounding, or chemicals such as ethanol, abscisic acid (ABA), jasmonate, salicylic acid, or safeners.

An "enhancer" is a nucleotide sequence that can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the activity or tissue-specificity of a promoter.

The term "translation leader sequence" refers to a nucleotide sequence located between the promoter sequence and the coding sequence. The translation leader sequence is present in the mRNA upstream of the start codon. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency.

The term "3' non-coding sequences", which can be used interchangeably with "transcription terminator" or "termination sequences", refers to nucleotide sequences located downstream of a coding sequence and includes polyadenylation signal sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3' end of the mRNA precursor.

The term "RNA transcript" refers to the product resulting from transcription of a DNA sequence catalyzed by RNA polymerase. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript or pre-mRNA. An RNA transcript derived from post-transcriptional processing of the pre-mRNA is referred to as mature RNA or messenger RNA (mRNA). "Messenger RNA" or "mRNA" refers to the RNA that can be translated into protein and does not comprise introns. "cDNA" refers to a DNA that is complementary to, and synthesized from, an mRNA template using reverse transcriptase. The cDNA can be single-stranded or converted into double-stranded form using, e.g., the Klenow fragment of DNA polymerase I. "Sense" RNA refers to an RNA transcript that includes the mRNA and can be translated into protein within a cell or in vitro. "Antisense RNA" refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA, and that can block the expression of a target gene. The complementarity of an antisense RNA may be with any part of the specific gene transcript, i.e., at the 5' non-coding sequence, 3' non-coding sequence, introns, or the coding sequence. "Functional RNA" refers to antisense RNA, ribozyme RNA, or other RNA that may not be translated but yet has an effect on cellular processes. The terms "complement" and "reverse complement" are used interchangeably herein with respect to mRNA transcripts, and are meant to define the antisense RNA of the message.

The term "operably linked" refers to the association of nucleotide sequences on a single nucleic acid fragment so that the function of one is regulated by the other. For example, a promoter is operably linked with a coding sequence when it is capable of regulating the expression of the coding sequence (i.e., the coding sequence is transcribed under the control of the promoter) . Coding sequences can be operably linked to regulatory sequences in a sense or antisense orientation.

The term "host" refers to an organism or cell into which a heterologous component (polynucleotide, polypeptide, other molecule, cell) has been introduced. As used herein, a "host cell" refers to an in vivo or isolated eukaryotic cell, prokaryotic cell (e.g., bacterial or archaeal cell) , or cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, into which a heterologous polynucleotide or polypeptide has been introduced. The cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell and an animal cell, such as an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, an insect cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell. In some cases, the cell is isolated. In some cases, the cell is in vivo.

The term "recombinant" refers to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis, or by genetic engineering techniques.

The terms "plasmid" and "vector" refer to a linear or circular extrachromosomal element often carrying genes that are not part of the central metabolism of the cell, and usually in the form of double-stranded DNA. Such elements may be autonomously replicating sequences, genome integrating sequences, phage, or nucleotide sequences, in linear or circular form, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a polynucleotide of interest into a cell.

The term "construct" , when referring to nucleic acid molecules, comprises an artificial combination of nucleic acid sequences, e.g., regulatory and coding sequences that are not all found together in nature. When the nucleic acid construct contains the control sequences required to express the coding sequence of the present invention, the term is synonymous with the term “expression cassette” . For example, a nucleic acid construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. Such a construct may be used by itself or may be used in conjunction with a vector. If a vector is used, then the choice of vector is dependent upon the method that will be used to introduce the vector into the host cells as is well known to those skilled in the art. The vector for expressing a coding sequence (e.g., comprising an expression construct) is referred to as “expression vector” .

The term "expression" , as used herein, refers to the production of a functional end-product (e.g., an mRNA, guide RNA, or a protein) in either precursor or mature form.

A "mature" protein refers to a post-translationally processed polypeptide (i.e., one from which any pre-or propeptides present in the primary translation product have been removed) .

"Precursor" protein refers to the primary product of translation of mRNA (i.e., with pre-and propeptides still present) . Pre-and propeptides may be but are not limited to intracellular localization signals.

As used herein, an "effector" or "effector protein" is a protein that encompasses an activity including recognizing, binding to, and/or cleaving or nicking a polynucleotide target. An effector, or effector protein, may also be an endonuclease, such as the TnpB polypeptide of the invention. The "effector complex" of a gene editing system includes TnpB polypeptide involved in gRNA and target recognition and binding.

A "functional fragment" of a TnpB endonuclease refers to a portion of the TnpB endonuclease of the present disclosure in which the ability to recognize, bind to, and/or cleave (introduce a double-strand break in) the target site is retained. The "functional variant" of a TnpB endonuclease refers to a variant of the TnpB endonuclease disclosed herein in which the ability to recognize, bind to, and/or cleave a target sequence is retained.

A TnpB endonuclease may also include a multifunctional TnpB endonuclease, which refers to a single polypeptide that has endonuclease activity (comprising at least one protein domain that can act as an endonuclease) and at least one other functionality, such as, but not limited to, the functionality to form a complex (comprising at least a second protein domain that can form a complex with other proteins).

As used herein, the term "guide polynucleotide" relates to a polynucleotide that can form a complex with a TnpB endonuclease, such as the TnpB endonuclease described herein, and enables the TnpB endonuclease to recognize, optionally bind to, and optionally cleave a DNA target site. The guide polynucleotide can be a guide RNA, a guide DNA sequence, or a combination thereof (an RNA-DNA combination molecule). For a TnpB polypeptide, the guide RNA is also referred to as "right element RNA" or "reRNA".

A "functional fragment" of a guide polynucleotide refers to a portion or subsequence of the guide polynucleotide of the present disclosure in which the ability to function as a guide polynucleotide is retained. A "functional variant" of a guide polynucleotide refers to a variant of the guide polynucleotide of the present disclosure in which the ability to function as a guide polynucleotide is retained.

The terms "targeting domain" and "targeting region" are used interchangeably herein and include a nucleotide sequence that can hybridize (is complementary) to one strand (nucleotide sequence) of a double-stranded DNA target site. The percent complementarity between the targeting region and the target sequence can be at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%. The variable targeting region can be at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides in length. In some embodiments, the variable targeting domain comprises a contiguous stretch of 12 to 30 nucleotides. The targeting domain can be composed of a DNA sequence, an RNA sequence, a modified DNA sequence, a modified RNA sequence, or any combination thereof.
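The percent complementarity between a targeting region and one strand of a target site can be illustrated with a minimal sketch. The function name, the restriction to Watson-Crick pairs, and the length check are assumptions made for illustration only and are not part of the disclosure.

```python
# Illustrative sketch only; percent_complementarity is a hypothetical helper.
# Maps a guide base (RNA or DNA) to the DNA base it pairs with.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def percent_complementarity(targeting_region: str, target_strand: str) -> float:
    """Percent of guide positions that base-pair with the target strand.

    Both sequences are written 5'->3', so the target strand is read in
    reverse to compare the two antiparallel. Only Watson-Crick pairs are
    counted (assumption); wobble pairs are treated as mismatches.
    """
    guide = targeting_region.upper()
    target = target_strand.upper()
    if len(guide) != len(target):
        raise ValueError("sequences must have the same length")
    if len(guide) < 12:
        raise ValueError("targeting region is expected to be at least 12 nucleotides")
    paired = sum(
        1 for g, t in zip(guide, reversed(target)) if COMPLEMENT.get(g) == t
    )
    return 100.0 * paired / len(guide)
```

For example, a 12-nucleotide RNA targeting region `ACGUACGUACGU` is fully complementary to the DNA strand `ACGTACGTACGT`, while a single mismatch lowers the value to 11/12, about 91.7%.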

The "backbone" of a guide polynucleotide comprises a nucleotide sequence that interacts with a TnpB polypeptide.

The term "gRNA/TnpB complex" refers to an RNA component and a TnpB endonuclease that are capable of forming a complex, wherein the complex can direct the TnpB endonuclease to a DNA target site, enabling the TnpB endonuclease to recognize, bind to, and/or cleave (introduce a double-strand break in) the DNA target site.

The terms "target site" , "target sequence" , and "target region" are used interchangeably herein and refer to a nucleotide sequence on a chromosome, episome, a locus, or any other DNA molecule in the genome (including chromosomal, chloroplastic, mitochondrial DNA, plasmid DNA) of a cell, at which a gRNA/TnpB complex can recognize, bind to, and optionally nick or cleave. The target site can be an endogenous site in the genome of a cell, or alternatively, the target site can be heterologous to the cell and thereby not be naturally occurring in the genome of the cell, or the target site can be found in a heterologous genomic location compared to where it occurs in nature.

A "transposon-associated motif" (TAM) herein refers to a short nucleotide sequence adjacent to a target sequence that is recognized (targeted) by a gRNA/TnpB complex described herein. The TnpB endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not adjacent to a TAM sequence. The sequence and length of a TAM herein can differ depending on the TnpB protein. The TAM sequence is typically 4 or 5 nucleotides long.

A “modified” TnpB polypeptide/endonuclease refers to a TnpB polypeptide comprising the substitution, deletion, insertion or addition of at least one amino acid when compared to the initial or wildtype TnpB polypeptide. If the modified TnpB is deprived of the activity of cleaving the DNA molecule while the ability to recognize and bind to a polynucleotide is retained, the modified TnpB polypeptide can be referred to as a “disarmed” TnpB polypeptide.

An "altered target site" , "altered target sequence" , "modified target site" , "modified target sequence" are used interchangeably herein and refer to a target sequence as disclosed herein that comprises at least one alteration when compared to non-altered target sequence. Such "alteration" includes, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, (iv) a chemical alteration of at least one nucleotide, or (v) any combination of (i) - (iv) .

A "modified nucleotide" or "edited nucleotide" refers to a nucleotide sequence of interest that comprises at least one alteration when compared to its non-modified nucleotide sequence. Such "alterations" include, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, (iv) a chemical alteration of at least one nucleotide, or (v) any combination of (i) - (iv) .

Methods for "modifying a target site" and "altering a target site" are used interchangeably herein and refer to methods for producing an altered target site.

As used herein, "donor DNA" is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a gRNA/TnpB complex of the invention.

The term "polynucleotide modification template" includes a polynucleotide that comprises at least one nucleotide modification when compared to the nucleotide sequence to be edited. A nucleotide modification can be the substitution, addition, insertion or deletion of at least one nucleotide. Optionally, the polynucleotide modification template can further comprise homologous nucleotide sequences flanking the at least one nucleotide modification, wherein the flanking homologous nucleotide sequences provide sufficient homology to the desired nucleotide sequence to be edited.

As used herein, the term "before" , in reference to a sequence position, refers to an occurrence of one sequence upstream of another sequence (at the 5’ end for nucleotide sequences, or at the N terminus for the amino acid sequences) . The term “after” in reference to a sequence position, refers to an occurrence of one sequence downstream of another sequence (at the 3’ end for nucleotide sequences, or at the C terminus for the amino acid sequences) .

2. TnpB polypeptide with the activity of RNA-guided endonuclease

The inventors have identified a number of novel RNA-guided endonucleases, which are TnpB polypeptides encoded by the tnpB gene in insertion sequences (IS). That is, the TnpB polypeptides can work as the effector protein in a gene editing system. Upon sequence analysis, the inventors found that the active TnpB polypeptides comprise an N-terminal helix-turn-helix (HTH) domain, a central domain, a C-terminal Zinc finger domain, and a DDE motif.

Therefore, the present disclosure provides an isolated TnpB polypeptide having the activity of RNA-guided endonuclease or a functional fragment thereof. In some embodiments, the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal Zinc finger domain, and a DDE motif.

In some embodiments, the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis.

In some embodiments, the TnpB polypeptide is encoded by a tnpB gene in an insertion sequence (IS) selected from a group consisting of ISEfa4, ISAs26, ISCpe2, ISMma22, ISBce3, ISAeme8, ISTfu1, ISCco1, ISSoc3, ISTel2, ISNsp3, ISCbt1, ISMac7, ISEc46, ISSen6, ISHahl1, ISKpn69, ISDge10, ISKpn85, ISNsp2, ISAba30, ISRor9, ISAam1, ISYmu1, ISCytsp1, ISCvi1, ISCvi2, ISAepa1 and ISBth16.

In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.

In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. The variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.
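As a minimal sketch of how a percent identity value of the kind recited above might be computed, the following assumes the two amino acid sequences have already been aligned by an external tool, with gaps written as `-`. The function name and the choice of denominator (all alignment columns that are not gap-only) are assumptions; the disclosure does not prescribe a particular identity calculation here.

```python
# Illustrative sketch only; percent_identity is a hypothetical helper.
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over a pre-computed pairwise alignment.

    Assumption: the denominator is the number of alignment columns in which
    at least one sequence has a residue. Other conventions exist (e.g.
    dividing by the length of the shorter sequence) and give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    identical = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * identical / len(columns)
```

Under this convention, the toy alignment `MKT-AYIAK` / `MKTQAYLAK` has 7 identical columns out of 9, i.e. about 77.8% identity.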

In some embodiments, the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.

In some embodiments, the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity. The TnpB polypeptide of the present disclosure can recognize a shorter TAM as compared to ISDra2 TnpB polypeptide. In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM consisting of four consecutive nucleotides.

In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM of CCAT, CTAC, TGAC, TGAT, TTAC, TTAG, TTAA, TTAT, ACAT, TTTAT, TTTAA or TTGAT.

In some embodiments, the TnpB polypeptide recognizes a TAM of CCAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 2, 9, 15, 17 or 19.

In some embodiments, the TnpB polypeptide recognizes a TAM of CTAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 16.

In some embodiments, the TnpB polypeptide recognizes a TAM of TGAY, where Y = C or T. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7.

In some embodiments, the TnpB polypeptide recognizes a TAM of TTAN, where N is any nucleotide. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 4, 5, 6, 8, 11, 12, 13, 14, 18, 20, 22, 26, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 5, 11, 12 or 20. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAG. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 4, 6 or 14. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAA. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 22, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 8, 13, 18 or 26.

In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAW, where W = A or T. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 3. In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 23. In some embodiments, the TnpB polypeptide recognizes a TAM of TTTAA. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 25 or 27.

In some embodiments, the TnpB polypeptide recognizes a TAM of TTGAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 24.
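The TAM assignments recited above can be summarized as a lookup table, and the degenerate codes (Y, W, N) can be expanded when scanning a DNA sequence for candidate TAM sites. The following is a minimal sketch under stated assumptions: the names `TAM_TO_SEQ_IDS`, `matches_tam`, `find_tam_sites` and `seq_ids_for_tam` are hypothetical, only the given strand is scanned, and positions are reported 0-based. No assumption is made about which side of the TAM the target sequence lies on.

```python
# Illustrative sketch only; all names below are hypothetical.
# IUPAC codes used in the text: Y = C or T, W = A or T, N = any nucleotide.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "W": "AT", "N": "ACGT"}

# TAM -> SEQ ID NOs, transcribed from the embodiments recited above.
TAM_TO_SEQ_IDS = {
    "CCAT": [2, 9, 15, 17, 19],
    "CTAC": [16],
    "TGAC": [10, 21],
    "TGAT": [7],
    "TTAC": [5, 11, 12, 20],
    "TTAG": [4, 6, 14],
    "TTAA": [1, 22, 28, 29],
    "TTAT": [8, 13, 18, 26],
    "TTTAW": [3],
    "TTTAT": [23],
    "TTTAA": [25, 27],
    "TTGAT": [24],
}


def matches_tam(motif: str, site: str) -> bool:
    """True if a concrete site matches a (possibly degenerate) TAM motif."""
    site = site.upper()
    return len(motif) == len(site) and all(
        base in IUPAC[code] for code, base in zip(motif, site)
    )


def find_tam_sites(dna: str, motif: str) -> list[int]:
    """0-based start positions of motif occurrences on the given strand."""
    dna = dna.upper()
    k = len(motif)
    return [i for i in range(len(dna) - k + 1) if matches_tam(motif, dna[i:i + k])]


def seq_ids_for_tam(site: str) -> list[int]:
    """SEQ ID NOs whose recited TAM matches the concrete site."""
    return sorted(
        seq_id
        for motif, ids in TAM_TO_SEQ_IDS.items()
        if matches_tam(motif, site)
        for seq_id in ids
    )
```

For instance, `find_tam_sites("GGTGATCCTTAGA", "TGAY")` reports the TGAT occurrence at position 2, and `seq_ids_for_tam("TTTAT")` returns SEQ ID NO: 3 (via TTTAW) and 23.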

In some embodiments, the TnpB polypeptide of the present disclosure or the functional fragment thereof is capable of effecting RNA-guided cleavage in a prokaryotic and/or eukaryotic cell, preferably in both prokaryotic and eukaryotic cells. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 7, 18, 21, 23 or 24.

In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. The variant may differ from SEQ ID NO: 7, 18, 21, 23 or 24 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.

In some embodiments, the functional fragment comprises at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the functional fragment consists of at least 100, 150, 200, 225, 250, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350 or more contiguous amino acids in SEQ ID NO: 7, 18, 21, 23 or 24.

Examples for the DDE motif include

D186, E280 and D362 of SEQ ID NO: 1;

D188, E272 and D368 of SEQ ID NO: 2;

D205, E289 and D372 of SEQ ID NO: 3;

D184, E268 and D348 of SEQ ID NO: 4;

D175, E260 and D339 of SEQ ID NO: 5;

D186, E270 and D352 of SEQ ID NO: 6;

D181, E265 and D361 of SEQ ID NO: 7;

D199, E293 and D383 of SEQ ID NO: 8;

D211, E295 and D373 of SEQ ID NO: 9;

D185, E268 and D350 of SEQ ID NO: 10;

D181, E265 and D345 of SEQ ID NO: 11;

D234, E342 and D436 of SEQ ID NO: 12;

D184, E268 and D350 of SEQ ID NO: 13;

D185, E269 and D351 of SEQ ID NO: 14;

D189, E273 and D369 of SEQ ID NO: 15;

D190, E290 and D376 of SEQ ID NO: 16;

D188, E272 and D368 of SEQ ID NO: 17;

D187, E271 and D351 of SEQ ID NO: 18;

D188, E272 and D368 of SEQ ID NO: 19;

D181, E265 and D345 of SEQ ID NO: 20;

D184, E268 and D364 of SEQ ID NO: 21;

D183, E267 and D349 of SEQ ID NO: 22;

D186, E270 and D351 of SEQ ID NO: 23;

D185, E279 and D361 of SEQ ID NO: 24;

D188, E272 and D352 of SEQ ID NO: 25;

D181, E265 and D348 of SEQ ID NO: 26;

D186, E276 and D359 of SEQ ID NO: 27;

D189, E274 and D354 of SEQ ID NO: 28; and

D188, E272 and D352 of SEQ ID NO: 29.
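The DDE motif positions listed above can be held in a small table and checked against a sequence. The sketch below is illustrative only: `DDE_MOTIF` and `has_dde_motif` are hypothetical names, positions are treated as 1-based, and the `toy` sequence is a synthetic placeholder rather than any sequence of the disclosure.

```python
# Illustrative sketch only; names are hypothetical.
# SEQ ID NO -> 1-based positions of the D, E and D residues, as listed above.
DDE_MOTIF = {
    1: (186, 280, 362), 2: (188, 272, 368), 3: (205, 289, 372),
    4: (184, 268, 348), 5: (175, 260, 339), 6: (186, 270, 352),
    7: (181, 265, 361), 8: (199, 293, 383), 9: (211, 295, 373),
    10: (185, 268, 350), 11: (181, 265, 345), 12: (234, 342, 436),
    13: (184, 268, 350), 14: (185, 269, 351), 15: (189, 273, 369),
    16: (190, 290, 376), 17: (188, 272, 368), 18: (187, 271, 351),
    19: (188, 272, 368), 20: (181, 265, 345), 21: (184, 268, 364),
    22: (183, 267, 349), 23: (186, 270, 351), 24: (185, 279, 361),
    25: (188, 272, 352), 26: (181, 265, 348), 27: (186, 276, 359),
    28: (189, 274, 354), 29: (188, 272, 352),
}


def has_dde_motif(protein: str, positions: tuple[int, int, int]) -> bool:
    """True if the residues at the three 1-based positions are D, E and D."""
    if max(positions) > len(protein) or min(positions) < 1:
        return False
    return all(protein[pos - 1] == aa for pos, aa in zip(positions, "DED"))


# Synthetic placeholder: 400 glycines with D/E/D placed at the SEQ ID NO: 7 positions.
toy = ["G"] * 400
for pos, aa in zip(DDE_MOTIF[7], "DED"):
    toy[pos - 1] = aa
toy = "".join(toy)
```

With this placeholder, `has_dde_motif(toy, DDE_MOTIF[7])` is true, whereas the positions listed for SEQ ID NO: 1 do not match it.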

The inventors found that the TnpB endonuclease can be disarmed by modifying the DDE motif.

Therefore, the present disclosure also provides a modified/disarmed TnpB polypeptide comprising a modified DDE motif. In some embodiments, the DDE motif is modified by substituting at least one amino acid in the motif with a neutral amino acid or a basic amino acid. In some embodiments, at least one amino acid in the motif is substituted by alanine. In some embodiments, the modified TnpB polypeptide comprises

D186A, E280A and/or D362A as compared to SEQ ID NO: 1;

D188A, E272A and/or D368A as compared to SEQ ID NO: 2;

D205A, E289A and/or D372A as compared to SEQ ID NO: 3;

D184A, E268A and/or D348A as compared to SEQ ID NO: 4;

D175A, E260A and/or D339A as compared to SEQ ID NO: 5;

D186A, E270A and/or D352A as compared to SEQ ID NO: 6;

D181A, E265A and/or D361A as compared to SEQ ID NO: 7;

D199A, E293A and/or D383A as compared to SEQ ID NO: 8;

D211A, E295A and/or D373A as compared to SEQ ID NO: 9;

D185A, E268A and/or D350A as compared to SEQ ID NO: 10;

D181A, E265A and/or D345A as compared to SEQ ID NO: 11;

D234A, E342A and/or D436A as compared to SEQ ID NO: 12;

D184A, E268A and/or D350A as compared to SEQ ID NO: 13;

D185A, E269A and/or D351A as compared to SEQ ID NO: 14;

D189A, E273A and/or D369A as compared to SEQ ID NO: 15;

D190A, E290A and/or D376A as compared to SEQ ID NO: 16;

D188A, E272A and/or D368A as compared to SEQ ID NO: 17;

D187A, E271A and/or D351A as compared to SEQ ID NO: 18;

D188A, E272A and/or D368A as compared to SEQ ID NO: 19;

D181A, E265A and/or D345A as compared to SEQ ID NO: 20;

D184A, E268A and/or D364A as compared to SEQ ID NO: 21;

D183A, E267A and/or D349A as compared to SEQ ID NO: 22;

D186A, E270A and/or D351A as compared to SEQ ID NO: 23;

D185A, E279A and/or D361A as compared to SEQ ID NO: 24;

D188A, E272A and/or D352A as compared to SEQ ID NO: 25;

D181A, E265A and/or D348A as compared to SEQ ID NO: 26;

D186A, E276A and/or D359A as compared to SEQ ID NO: 27;

D189A, E274A and/or D354A as compared to SEQ ID NO: 28; and

D188A, E272A and/or D352A as compared to SEQ ID NO: 29.
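Substitutions written in the notation used above (wild-type residue, 1-based position, replacement residue, e.g. D181A) can be applied to a sequence in silico. The following is a minimal sketch; `apply_substitutions` is a hypothetical helper, and the example sequence is a short synthetic placeholder, not a sequence of the disclosure.

```python
# Illustrative sketch only; apply_substitutions is a hypothetical helper.
import re

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(protein: str, substitutions: list[str]) -> str:
    """Apply point substitutions such as 'D181A' to a protein sequence.

    Each substitution is checked against the sequence: the stated wild-type
    residue must be present at the stated 1-based position.
    """
    residues = list(protein)
    for sub in substitutions:
        match = SUBSTITUTION.fullmatch(sub)
        if match is None:
            raise ValueError(f"cannot parse substitution: {sub}")
        wild_type, pos, replacement = match.group(1), int(match.group(2)), match.group(3)
        if pos < 1 or pos > len(residues):
            raise ValueError(f"position out of range: {sub}")
        if residues[pos - 1] != wild_type:
            raise ValueError(f"expected {wild_type} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = replacement
    return "".join(residues)
```

Because the recited substitutions are joined by "and/or", any non-empty subset may be passed. On the placeholder `MGDGGEGGDG`, applying `["D3A", "E6A", "D9A"]` yields `MGAGGAGGAG`, and applying `["D3A"]` alone yields `MGAGGEGGDG`.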

In some embodiments, the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide is conserved at positions corresponding to N31, G179, L267, C332, C335, C351, and C354 of SEQ ID NO: 7.

The inventors further found that the TnpB endonuclease can be disarmed by modifying the amino acid corresponding to N31 of SEQ ID NO: 7. Therefore, the present disclosure also provides a modified/disarmed TnpB polypeptide comprising a modification at the position corresponding to N31 of SEQ ID NO: 7. In some embodiments, the modification is a substitution with alanine.

In some embodiments, the modified TnpB polypeptide comprises an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide consists of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the modified TnpB polypeptide is conserved at positions corresponding to G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.

In some embodiments, the modified TnpB polypeptide has the ability to recognize and bind to a DNA molecule, but lacks the ability to cleave the DNA molecule, e.g., double-stranded DNA, i.e., is a disarmed TnpB polypeptide.

3. Fusion polypeptide

The present disclosure provides a fusion polypeptide comprising a TnpB polypeptide of the present disclosure or a functional fragment thereof or a modified/disarmed TnpB polypeptide of the present disclosure, fused to a fusion partner. The TnpB polypeptide includes the TnpB polypeptide with the activity of RNA-guided endonuclease as described above, or the functional fragment thereof. The modified/disarmed TnpB polypeptide has the ability to recognize and bind to a DNA molecule, but lacks the ability to cleave the DNA molecule, e.g., double-stranded DNA.

In some embodiments, the fusion partner is a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone or other DNA-binding protein) associated with the target DNA.

In further embodiments, the fusion partner is a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.

In further embodiments, the fusion partner is a polypeptide that directly provides for increased transcription of the target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription regulator, etc.). In some embodiments, the fusion partner is another polypeptide or domain, for example Clo51 or FokI nuclease, to generate double-strand breaks (Guilinger et al., Nature Biotechnology, volume 32, number 6, June 2014).

In some embodiments, the fusion partner is a polypeptide that directs editing of single or multiple bases in a polynucleotide sequence, for example a site-specific deaminase that can change the identity of a nucleotide, for example from C-G to T-A or from A-T to G-C (Gaudelli et al., "Programmable base editing of A-T to G-C in genomic DNA without DNA cleavage." Nature (2017); Nishida et al., "Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems." Science 353 (6305) (2016); Komor et al., "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage." Nature 533 (7603) (2016): 420-4).

The fusion polypeptide may comprise, for example, an active (double strand break creating), partially active (nickase) or deactivated (deprived of cleaving) TnpB endonuclease and a deaminase (such as, but not limited to, a cytidine deaminase, an adenine deaminase, APOBEC1, APOBEC3A, BE2, BE3, BE4, ABEs, or the like). In some embodiments, the fusion partner includes base edit repair inhibitors and glycosylase inhibitors (e.g., a uracil glycosylase inhibitor, to prevent uracil removal).

In some embodiments, the fusion partner can be a Cas endonuclease or another TnpB endonuclease as described in the present disclosure.

The TnpB polypeptide, the functional fragment thereof, or the modified/disarmed TnpB polypeptide of the present disclosure can also be fused to a heterologous nuclear localization sequence (NLS). A heterologous NLS herein may be of sufficient strength to drive accumulation of the TnpB polypeptide, the functional fragment thereof, the modified/disarmed TnpB polypeptide or the fusion polypeptide in a detectable amount in the nucleus of a eukaryotic cell. An NLS may comprise one (monopartite) or more (e.g., bipartite) short sequences (e.g., 2 to 20 residues) of basic, positively charged residues (e.g., lysine and/or arginine). An NLS may be operably linked to the N-terminus or C-terminus of a TnpB polypeptide, for example. Two or more NLS sequences can be linked to a TnpB polypeptide, for example, on both the N- and C-termini of a TnpB polypeptide.

4. Guide polynucleotide

The guide polynucleotide enables target recognition, binding, and optionally cleavage by the TnpB polypeptide. The guide polynucleotide sequence can be an RNA sequence, a DNA sequence, or a combination thereof (an RNA-DNA combination sequence). Optionally, the guide polynucleotide can comprise at least one nucleotide, phosphodiester bond or linkage modification such as, but not limited to, Locked Nucleic Acid (LNA), 5-methyl dC, 2,6-Diaminopurine, 2'-Fluoro A, 2'-Fluoro U, 2'-O-Methyl RNA, phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 (hexaethylene glycol chain) molecule, or 5' to 3' covalent linkage resulting in circularization. A guide polynucleotide that solely comprises ribonucleic acids is also referred to as a "guide RNA" or "gRNA". A guide polynucleotide may be engineered or synthetic. The gRNA for TnpB polypeptide is also referred to as “right element RNA” or “reRNA”.

The guide polynucleotide includes a chimeric non-naturally occurring guide RNA comprising regions that are not found together in nature (i.e., they are heterologous with each other). For example, the chimeric non-naturally occurring guide RNA comprises a targeting region that can hybridize to a nucleotide sequence in a target DNA, linked to a backbone region that can recognize the TnpB polypeptide, wherein the targeting region and the backbone region are not found linked together in nature. In some embodiments, the targeting region is at the 3’ end of the scaffold.

The guide polynucleotide for TnpB polypeptide is a single guide, and the backbone can be 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides before the right end (RE) of the IS from which the TnpB is derived. The guide polynucleotide can still effect target recognition, binding, and optionally cleavage by the TnpB polypeptide after a part or the whole of one or several stem structures in the backbone is removed.
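As a non-limiting sketch of the arrangement described above, the following takes the nucleotides immediately preceding the right end of an insertion sequence as the backbone and appends a targeting region at its 3' end. It assumes the IS sequence is supplied 5' to 3' with its last nucleotide at the RE boundary; the function names, the toy IS sequence and the 20-nucleotide targeting region used in any example are hypothetical and are not taken from the disclosure.

```python
def backbone_before_right_end(is_element: str, length: int) -> str:
    """Return the `length` nucleotides immediately preceding the right end (RE).

    Assumption: `is_element` is written 5'->3' and ends exactly at the RE.
    """
    if not 115 <= length <= 350:
        raise ValueError("backbone length is expected to be 115-350 nucleotides")
    if length > len(is_element):
        raise ValueError("IS element is shorter than the requested backbone")
    return is_element[-length:].upper()


def assemble_guide(backbone: str, targeting_region: str, as_rna: bool = True) -> str:
    """Join backbone and targeting region, with the targeting region at the 3' end."""
    guide = (backbone + targeting_region).upper()
    # Transcribe to RNA letters when a guide RNA (reRNA) is wanted.
    return guide.replace("T", "U") if as_rna else guide
```

With a 400-nucleotide toy IS element and a backbone length of 120, the assembled guide is 140 nucleotides long and ends with the targeting region.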

The guide polynucleotide can further comprise an additional nucleotide sequence at the 5’ end of the backbone. In some embodiments, the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.

In some embodiments, the targeting region and the backbone region are selected from the group consisting of a DNA sequence, an RNA sequence, and a combination thereof.

In some embodiments, the guide polynucleotide comprises RNA backbone modifications that enhance stability, DNA backbone modifications that enhance stability, or a combination thereof (see Kanasty et al., 2013, Common RNA-backbone modifications, Nature Materials 12:976-977; US20150082478 published 19 Mar. 2015 and US20150059010 published 26 Feb. 2015).

5. Polynucleotide and construct for expressing the TnpB polypeptide or fusion polypeptide

The TnpB endonuclease, the functional fragment thereof, the disarmed TnpB polypeptide and the fusion polypeptide of the present disclosure can be isolated from a native source (for TnpB polypeptide), or from a recombinant source where the host cell is genetically modified to express the nucleotide sequence encoding the polypeptide. Alternatively, the TnpB polypeptide and fusion polypeptide can be produced using cell free protein expression systems, or be synthetically produced.

Therefore, the present disclosure also provides an isolated polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide, the functional fragment thereof, the disarmed TnpB polypeptide and the fusion polypeptide of the present disclosure.

The TnpB polypeptide, the functional fragment thereof, the disarmed TnpB polypeptide and the fusion polypeptide, as well as the guide polynucleotide can be expressed in a cell. Cells include, but are not limited to, human, non-human, animal, bacterial, fungal, insect, yeast, and plant cells.

Standard recombinant DNA and molecular cloning techniques used herein are well known in the art and are described more fully in Sambrook et al., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y. (1989). Transformation methods are well known to those skilled in the art and are described infra.

Also provided are vectors and constructs, including circular plasmids and linear polynucleotides, comprising a polynucleotide of interest and optionally other components including linkers, adapters, and regulatory sequences.

In some embodiments, the vector comprises an expression cassette encoding both the TnpB polypeptide and the guide polynucleotide. In some examples a recognition site and/or target site can be comprised within an intron, coding sequence, 5' UTRs, 3' UTRs, and/or regulatory regions.

In some embodiments, the vector comprises two expression cassettes encoding the TnpB polypeptide and the guide polynucleotide, respectively.

In some embodiments, the expression of the TnpB polypeptide and/or the guide polynucleotide is driven by a constitutive promoter, an inducible promoter, or a spatio-temporal specific promoter.

6. Gene Editing with the TnpB polypeptide

6.1. Recombinant gene editing system

The present disclosure provides a recombinant gene editing system comprising the novel TnpB polypeptide having the activity of RNA-guided endonuclease of the present disclosure.

In some embodiments, the recombinant gene editing system comprises:

- a TnpB polypeptide of the present disclosure having the activity of RNA-guided endonuclease or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide, or the functional fragment thereof, and

- a guide polynucleotide, such as a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.

In some embodiments, the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal Zinc finger domain, and a DDE motif.

In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. The variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.

In some embodiments, the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity. The TnpB polypeptide of the present disclosure can recognize a shorter TAM as compared to ISDra2 TnpB polypeptide. In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM consisting of four consecutive nucleotides.

In some embodiments, the TnpB polypeptide of the present disclosure can recognize a TAM of CCAT, CTAC, TGAC, TGAT, TTAC, TTAG, TTAA, TTAT, ACAT, TTTAT, TTTAA or TTGAT. In some embodiments, the TnpB polypeptide recognizes a TAM of CCAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 2, 9, 15, 17 or 19.
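Candidate target sites for a given TAM can be located with a simple scan, sketched below without limitation. The sketch assumes, for illustration only, that the nucleotide sequence of interest lies immediately 3' of the TAM on the scanned strand and is 20 nucleotides long; neither the orientation nor the length is fixed by the passage above. Only the supplied strand is scanned, and the function name is hypothetical.

```python
# TAM sequences recited in the text.
RECITED_TAMS = (
    "CCAT", "CTAC", "TGAC", "TGAT", "TTAC", "TTAG",
    "TTAA", "TTAT", "ACAT", "TTTAT", "TTTAA", "TTGAT",
)


def find_tam_sites(sequence: str, tam: str, target_length: int = 20):
    """Return (tam_start, target) pairs for every TAM occurrence on this strand.

    Assumptions: the target immediately follows the TAM (3' side) and has a
    fixed length; occurrences too close to the end of the sequence are skipped.
    `tam_start` is a 0-based index into `sequence`.
    """
    sequence = sequence.upper()
    tam = tam.upper()
    sites = []
    start = sequence.find(tam)
    while start != -1:
        target_start = start + len(tam)
        target = sequence[target_start:target_start + target_length]
        if len(target) == target_length:
            sites.append((start, target))
        start = sequence.find(tam, start + 1)  # allow overlapping matches
    return sites
```

Scanning the reverse complement of the same sequence with the same function would cover the opposite strand.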

In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. The variant may differ from SEQ ID NO: 7, 18, 21, 23 or 24 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.

In some embodiments, the TnpB comprises a DDE motif corresponding to

D186, E280 and D362 of SEQ ID NO: 1;

D188, E272 and D368 of SEQ ID NO: 2;

D205, E289 and D372 of SEQ ID NO: 3;

D184, E268 and D348 of SEQ ID NO: 4;

D175, E260 and D339 of SEQ ID NO: 5;

D186, E270 and D352 of SEQ ID NO: 6;

D181, E265 and D361 of SEQ ID NO: 7;

D199, E293 and D383 of SEQ ID NO: 8;

D211, E295 and D373 of SEQ ID NO: 9;

D185, E268 and D350 of SEQ ID NO: 10;

D181, E265 and D345 of SEQ ID NO: 11;

D234, E342 and D436 of SEQ ID NO: 12;

D184, E268 and D350 of SEQ ID NO: 13;

D185, E269 and D351 of SEQ ID NO: 14;

D189, E273 and D369 of SEQ ID NO: 15;

D190, E290 and D376 of SEQ ID NO: 16;

D188, E272 and D368 of SEQ ID NO: 17;

D187, E271 and D351 of SEQ ID NO: 18;

D188, E272 and D368 of SEQ ID NO: 19;

D181, E265 and D345 of SEQ ID NO: 20;

D184, E268 and D364 of SEQ ID NO: 21;

D183, E267 and D349 of SEQ ID NO: 22;

D186, E270 and D351 of SEQ ID NO: 23;

D185, E279 and D361 of SEQ ID NO: 24;

D188, E272 and D352 of SEQ ID NO: 25;

D181, E265 and D348 of SEQ ID NO: 26;

D186, E276 and D359 of SEQ ID NO: 27;

D189, E274 and D354 of SEQ ID NO: 28; and

D188, E272 and D352 of SEQ ID NO: 29.
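The motif positions listed above lend themselves to a lookup table. The following non-limiting sketch transcribes the positions (1-based) and checks whether a supplied amino acid sequence carries D, E and D at the positions recited for a given SEQ ID NO. The function name is hypothetical, and the sequences of SEQ ID NO: 1 to 29 themselves are not reproduced here, so any input used to exercise the sketch is a stand-in.

```python
# (D, E, D) positions, 1-based, keyed by SEQ ID NO, transcribed from the list above.
DDE_POSITIONS = {
    1: (186, 280, 362), 2: (188, 272, 368), 3: (205, 289, 372),
    4: (184, 268, 348), 5: (175, 260, 339), 6: (186, 270, 352),
    7: (181, 265, 361), 8: (199, 293, 383), 9: (211, 295, 373),
    10: (185, 268, 350), 11: (181, 265, 345), 12: (234, 342, 436),
    13: (184, 268, 350), 14: (185, 269, 351), 15: (189, 273, 369),
    16: (190, 290, 376), 17: (188, 272, 368), 18: (187, 271, 351),
    19: (188, 272, 368), 20: (181, 265, 345), 21: (184, 268, 364),
    22: (183, 267, 349), 23: (186, 270, 351), 24: (185, 279, 361),
    25: (188, 272, 352), 26: (181, 265, 348), 27: (186, 276, 359),
    28: (189, 274, 354), 29: (188, 272, 352),
}
EXPECTED_RESIDUES = ("D", "E", "D")


def has_dde_motif(sequence: str, seq_id: int) -> bool:
    """True if `sequence` has D, E, D at the positions recited for `seq_id`."""
    positions = DDE_POSITIONS[seq_id]
    for position, expected in zip(positions, EXPECTED_RESIDUES):
        if position > len(sequence):
            return False  # sequence too short to contain this position
        if sequence[position - 1].upper() != expected:
            return False
    return True
```

For a variant rather than the reference sequence, the "corresponding" positions would first have to be mapped through an alignment; that step is outside this sketch.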

In some embodiments, the recombinant gene editing system comprises a first polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.

In some embodiments, the recombinant gene editing system further comprises a heterologous polynucleotide, such as an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.

In some embodiments, the backbone of the guide polynucleotide comprises the 115-350 nucleotides, e.g., at least 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340 nucleotides before the right end (RE) of the IS, from which the TnpB is derived. In some embodiments, the backbone is modified by removing a part or the whole of one or several stem structures in the backbone. In some embodiments, a part or the whole of the first stem structure from the 3’ end is removed.

In some embodiments, the guide polynucleotide comprises an additional nucleotide sequence at the 5’ end of the backbone. In some embodiments, the additional nucleotide sequence can recognize and/or bind to an additional nuclease, such as a TnpB polypeptide of the disclosure or a Cas nuclease.

In some embodiments, the gRNA further comprises one or more additional protein-binding domains. In some embodiments, the system comprises one or more additional effector polypeptides capable of binding to the one or more additional protein-binding domains, or the polynucleotide comprising a nucleotide sequence encoding the one or more effector polypeptides, to form one or more ribonucleoproteins in tandem.

6.2. Composition, complex and isolated cell

During the gene editing, a complex or composition can be formed. Therefore, the present disclosure provides a composition or complex comprising

- a recombinant TnpB polypeptide of the present disclosure having the activity of RNA-guided endonuclease or a functional fragment thereof, and

- a recombinant guide RNA (gRNA) comprising a targeting region capable of hybridizing to the nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or a functional fragment thereof.

The present disclosure also provides an isolated cell comprising

- a recombinant TnpB polypeptide of the present disclosure, having the activity of RNA-guided endonuclease or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide, or the functional fragment thereof,

In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide consists of the amino acid sequence of SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.

In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. The variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.

In some embodiments, the TnpB comprises a DDE motif corresponding to

D186, E280 and D362 of SEQ ID NO: 1;

D188, E272 and D368 of SEQ ID NO: 2;

D205, E289 and D372 of SEQ ID NO: 3;

D184, E268 and D348 of SEQ ID NO: 4;

D175, E260 and D339 of SEQ ID NO: 5;

D186, E270 and D352 of SEQ ID NO: 6;

D181, E265 and D361 of SEQ ID NO: 7;

D199, E293 and D383 of SEQ ID NO: 8;

D211, E295 and D373 of SEQ ID NO: 9;

D185, E268 and D350 of SEQ ID NO: 10;

D181, E265 and D345 of SEQ ID NO: 11;

D234, E342 and D436 of SEQ ID NO: 12;

D184, E268 and D350 of SEQ ID NO: 13;

D185, E269 and D351 of SEQ ID NO: 14;

D189, E273 and D369 of SEQ ID NO: 15;

D190, E290 and D376 of SEQ ID NO: 16;

D188, E272 and D368 of SEQ ID NO: 17;

D187, E271 and D351 of SEQ ID NO: 18;

D188, E272 and D368 of SEQ ID NO: 19;

D181, E265 and D345 of SEQ ID NO: 20;

D184, E268 and D364 of SEQ ID NO: 21;

D183, E267 and D349 of SEQ ID NO: 22;

D186, E270 and D351 of SEQ ID NO: 23;

D185, E279 and D361 of SEQ ID NO: 24;

D188, E272 and D352 of SEQ ID NO: 25;

D181, E265 and D348 of SEQ ID NO: 26;

D186, E276 and D359 of SEQ ID NO: 27;

D189, E274 and D354 of SEQ ID NO: 28; and

D188, E272 and D352 of SEQ ID NO: 29.

In some embodiments, the composition further comprises a heterologous polynucleotide, such as an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.

In some embodiments, the gRNA further comprises one or more additional protein-binding domains. In some embodiments, the composition or isolated cell comprises one or more additional effector polypeptides capable of binding to the one or more additional protein-binding domains, or the polynucleotide comprising a nucleotide sequence encoding the one or more effector polypeptides, to form one or more ribonucleoproteins in tandem.

6.3. Disarmed system

The present disclosure provides a recombinant system comprising

- a modified TnpB polypeptide of the present disclosure comprising a modification in the DDE motif as compared to the parent TnpB polypeptide, or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the modified TnpB polypeptide or the functional fragment thereof, wherein the parent polypeptide has the activity of RNA-guided endonuclease, and the modified TnpB polypeptide is deprived of the activity of cleaving double-stranded DNA, and

In some embodiments, the DDE motif is modified by substituting at least one amino acid in the motif with a neutral amino acid or a basic amino acid. In some embodiments, at least one amino acid in the motif is substituted by alanine. In some embodiments, the modified TnpB polypeptide comprises

D186A, E280A and/or D362A as compared to SEQ ID NO: 1;

D188A, E272A and/or D368A as compared to SEQ ID NO: 2;

D205A, E289A and/or D372A as compared to SEQ ID NO: 3;

D184A, E268A and/or D348A as compared to SEQ ID NO: 4;

D175A, E260A and/or D339A as compared to SEQ ID NO: 5;

D186A, E270A and/or D352A as compared to SEQ ID NO: 6;

D181A, E265A and/or D361A as compared to SEQ ID NO: 7;

D199A, E293A and/or D383A as compared to SEQ ID NO: 8;

D211A, E295A and/or D373A as compared to SEQ ID NO: 9;

D185A, E268A and/or D350A as compared to SEQ ID NO: 10;

D181A, E265A and/or D345A as compared to SEQ ID NO: 11;

D234A, E342A and/or D436A as compared to SEQ ID NO: 12;

D184A, E268A and/or D350A as compared to SEQ ID NO: 13;

D185A, E269A and/or D351A as compared to SEQ ID NO: 14;

D189A, E273A and/or D369A as compared to SEQ ID NO: 15;

D190A, E290A and/or D376A as compared to SEQ ID NO: 16;

D188A, E272A and/or D368A as compared to SEQ ID NO: 17;

D187A, E271A and/or D351A as compared to SEQ ID NO: 18;

D188A, E272A and/or D368A as compared to SEQ ID NO: 19;

D181A, E265A and/or D345A as compared to SEQ ID NO: 20;

D184A, E268A and/or D364A as compared to SEQ ID NO: 21;

D183A, E267A and/or D349A as compared to SEQ ID NO: 22;

D186A, E270A and/or D351A as compared to SEQ ID NO: 23;

D185A, E279A and/or D361A as compared to SEQ ID NO: 24;

D188A, E272A and/or D352A as compared to SEQ ID NO: 25;

D181A, E265A and/or D348A as compared to SEQ ID NO: 26;

D186A, E276A and/or D359A as compared to SEQ ID NO: 27;

D189A, E274A and/or D354A as compared to SEQ ID NO: 28; and

D188A, E272A and/or D352A as compared to SEQ ID NO: 29.
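The substitution labels above follow the usual "original residue, 1-based position, replacement residue" notation. As a non-limiting sketch, the following parses such a label, verifies that the parent sequence actually carries the stated original residue at that position, and applies the substitution. The function names are hypothetical, and any sequence used to exercise the sketch is a stand-in rather than one of SEQ ID NO: 1 to 29.

```python
import re

# One-letter original residue, 1-based position, one-letter replacement.
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(sequence: str, label: str) -> str:
    """Apply a single substitution such as 'D181A' to `sequence`."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label!r}")
    original, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if position < 1 or position > len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1
    if sequence[index] != original:
        raise ValueError(
            f"expected {original} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


def disarm(sequence: str, labels) -> str:
    """Apply one or more substitutions in turn (the text recites 'and/or')."""
    for label in labels:
        sequence = apply_substitution(sequence, label)
    return sequence
```

Checking the original residue before substituting guards against applying a label recited for one SEQ ID NO to a different parent sequence.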

In some embodiments, the modified TnpB polypeptide comprises a modification at the position corresponding to N31 of SEQ ID NO: 7. In some embodiments, the modification is a substitution with alanine.

6.4. Fusion system

The present disclosure provides a gene editing system comprising

- a fusion polypeptide of the present disclosure, e.g., comprising a TnpB polypeptide having the activity of RNA-guided endonuclease, or a functional fragment thereof, or a modified TnpB polypeptide fused to a fusion partner, or a polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide, or the functional fragment thereof, and

In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29. The variant may differ from SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.

In some embodiments, the TnpB polypeptide recognizes a TAM of TGAY, where Y = C or T. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7, 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 10 or 21. In some embodiments, the TnpB polypeptide recognizes a TAM of TGAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 7.

In some embodiments, the TnpB polypeptide recognizes a TAM of TTAN, where N is any nucleotide. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 4, 5, 6, 8, 11, 12, 13, 14, 18, 20, 22, 26, 28 or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAC. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 5, 11, 12 or 20. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAG. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 4, 6 or 14. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAA. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 1, 22, 28, or 29. In some embodiments, the TnpB polypeptide recognizes a TAM of TTAT. In some embodiments, the TnpB polypeptide comprises the amino acid sequence of SEQ ID NO: 8, 13, 18 or 26.
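The degenerate TAMs TGAY and TTAN above use standard IUPAC nucleotide codes. The following non-limiting sketch expands a degenerate TAM into its concrete four-nucleotide sequences and collects the SEQ ID NOs that the two preceding paragraphs associate with each of them. The table is transcribed from those paragraphs; the function names are hypothetical, and only the IUPAC codes needed here are included.

```python
from itertools import product

# Subset of IUPAC nucleotide codes used by the TAMs in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}

# Concrete TAM -> SEQ ID NOs, as recited in the two preceding paragraphs.
TAM_TO_SEQ_IDS = {
    "TGAC": (10, 21),
    "TGAT": (7,),
    "TTAC": (5, 11, 12, 20),
    "TTAG": (4, 6, 14),
    "TTAA": (1, 22, 28, 29),
    "TTAT": (8, 13, 18, 26),
}


def expand_tam(pattern: str):
    """Expand a degenerate TAM (e.g. 'TGAY') into all concrete sequences."""
    choices = [IUPAC[symbol] for symbol in pattern.upper()]
    return ["".join(combo) for combo in product(*choices)]


def seq_ids_for_tam(pattern: str):
    """Sorted SEQ ID NOs associated with any concrete form of `pattern`."""
    ids = set()
    for tam in expand_tam(pattern):
        ids.update(TAM_TO_SEQ_IDS.get(tam, ()))
    return sorted(ids)
```

Expanding TGAY yields TGAC and TGAT, and the union of their entries reproduces the SEQ ID NO: 7, 10 or 21 grouping recited for TGAY; the same holds for TTAN and its four concrete forms.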

In some embodiments, the TnpB polypeptide is a functional variant comprising an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. In some embodiments, the TnpB polypeptide is a functional variant consisting of an amino acid sequence at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to SEQ ID NO: 7, 18, 21, 23 or 24. The variant may differ from SEQ ID NO: 7, 18, 21, 23 or 24 by the substitution, insertion, deletion and/or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more amino acids. In some embodiments, the variant is conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7.

In some embodiments, the TnpB comprises a DDE motif corresponding to

D186, E280 and D362 of SEQ ID NO: 1;

D188, E272 and D368 of SEQ ID NO: 2;

D205, E289 and D372 of SEQ ID NO: 3;

D184, E268 and D348 of SEQ ID NO: 4;

D175, E260 and D339 of SEQ ID NO: 5;

D186, E270 and D352 of SEQ ID NO: 6;

D181, E265 and D361 of SEQ ID NO: 7;

D199, E293 and D383 of SEQ ID NO: 8;

D211, E295 and D373 of SEQ ID NO: 9;

D185, E268 and D350 of SEQ ID NO: 10;

D181, E265 and D345 of SEQ ID NO: 11;

D234, E342 and D436 of SEQ ID NO: 12;

D184, E268 and D350 of SEQ ID NO: 13;

D185, E269 and D351 of SEQ ID NO: 14;

D189, E273 and D369 of SEQ ID NO: 15;

D190, E290 and D376 of SEQ ID NO: 16;

D188, E272 and D368 of SEQ ID NO: 17;

D187, E271 and D351 of SEQ ID NO: 18;

D188, E272 and D368 of SEQ ID NO: 19;

D181, E265 and D345 of SEQ ID NO: 20;

D184, E268 and D364 of SEQ ID NO: 21;

D183, E267 and D349 of SEQ ID NO: 22;

D186, E270 and D351 of SEQ ID NO: 23;

D185, E279 and D361 of SEQ ID NO: 24;

D188, E272 and D352 of SEQ ID NO: 25;

D181, E265 and D348 of SEQ ID NO: 26;

D186, E276 and D359 of SEQ ID NO: 27;

D189, E274 and D354 of SEQ ID NO: 28; and

D188, E272 and D352 of SEQ ID NO: 29.

In some embodiments, the recombinant gene editing system comprises a first polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.

In some embodiments, the modified TnpB polypeptide comprises a modification in the DDE motif as compared to the parent TnpB polypeptide, wherein the parent polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal Zinc finger domain, and a DDE motif.

In some embodiments, the modified TnpB polypeptide comprises

D186A, E280A and/or D362A as compared to SEQ ID NO: 1;

D188A, E272A and/or D368A as compared to SEQ ID NO: 2;

D205A, E289A and/or D372A as compared to SEQ ID NO: 3;

D184A, E268A and/or D348A as compared to SEQ ID NO: 4;

D175A, E260A and/or D339A as compared to SEQ ID NO: 5;

D186A, E270A and/or D352A as compared to SEQ ID NO: 6;

D181A, E265A and/or D361A as compared to SEQ ID NO: 7;

D199A, E293A and/or D383A as compared to SEQ ID NO: 8;

D211A, E295A and/or D373A as compared to SEQ ID NO: 9;

D185A, E268A and/or D350A as compared to SEQ ID NO: 10;

D181A, E265A and/or D345A as compared to SEQ ID NO: 11;

D234A, E342A and/or D436A as compared to SEQ ID NO: 12;

D184A, E268A and/or D350A as compared to SEQ ID NO: 13;

D185A, E269A and/or D351A as compared to SEQ ID NO: 14;

D189A, E273A and/or D369A as compared to SEQ ID NO: 15;

D190A, E290A and/or D376A as compared to SEQ ID NO: 16;

D188A, E272A and/or D368A as compared to SEQ ID NO: 17;

D187A, E271A and/or D351A as compared to SEQ ID NO: 18;

D188A, E272A and/or D368A as compared to SEQ ID NO: 19;

D181A, E265A and/or D345A as compared to SEQ ID NO: 20;

D184A, E268A and/or D364A as compared to SEQ ID NO: 21;

D183A, E267A and/or D349A as compared to SEQ ID NO: 22;

D186A, E270A and/or D351A as compared to SEQ ID NO: 23;

D185A, E279A and/or D361A as compared to SEQ ID NO: 24;

D188A, E272A and/or D352A as compared to SEQ ID NO: 25;

D181A, E265A and/or D348A as compared to SEQ ID NO: 26;

D186A, E276A and/or D359A as compared to SEQ ID NO: 27;

D189A, E274A and/or D354A as compared to SEQ ID NO: 28; and

D188A, E272A and/or D352A as compared to SEQ ID NO: 29.

In some embodiments, the fusion partner is a polypeptide that directs editing of single or multiple bases in a polynucleotide sequence, for example a site-specific deaminase that can change the identity of a nucleotide, for example from C-G to T-A or from A-T to G-C (Gaudelli et al., "Programmable base editing of A-T to G-C in genomic DNA without DNA cleavage." Nature (2017); Nishida et al., "Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems." Science 353 (6305) (2016); Komor et al., "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage." Nature 533 (7603) (2016): 420-4).

In some embodiments, the fusion polypeptide comprises an active (double strand break creating), partially active (nickase) or deactivated (deprived of cleaving) TnpB endonuclease and a deaminase (such as, but not limited to, a cytidine deaminase, an adenine deaminase, APOBEC1, APOBEC3A, BE2, BE3, BE4, ABEs, or the like). In some embodiments, the fusion partner includes base edit repair inhibitors and glycosylase inhibitors (e.g., a uracil glycosylase inhibitor, to prevent uracil removal).

In some embodiments, the fusion partner is a Cas endonuclease or another TnpB endonuclease as described in the present disclosure.

In some embodiments, the fusion partner is a heterologous NLS. In some embodiments, the NLS is operably linked to the N-terminus or C-terminus of the TnpB polypeptide, the functional fragment thereof or the modified TnpB polypeptide. In some embodiments, the fusion polypeptide comprises two or more NLS sequences linked to the TnpB polypeptide, the functional fragment thereof or the modified TnpB polypeptide, for example, on both the N- and C-termini of the same.

7. Method for gene editing

The present disclosure provides a method of introducing a double-strand break into a polynucleotide of interest comprising a step of contacting the polynucleotide with the recombinant gene editing system of the present disclosure targeting a nucleotide sequence in the polynucleotide.

The present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell the recombinant gene editing system or the fusion system of the present disclosure targeting a genomic sequence in the cell.

The present disclosure provides a method of modifying a genomic sequence in a cell comprising a step of introducing into the cell the disarmed system of the present disclosure and a gene editing system targeting the genomic sequence, wherein the nucleotide sequence targeted by the disarmed system is adjacent to the genomic sequence.

Methods for introducing polynucleotides or polypeptides or a polynucleotide-protein complex into cells or organisms are known in the art including, but not limited to, microinjection, electroporation, stable transformation methods, transient transformation methods, ballistic particle acceleration (particle bombardment), whiskers mediated transformation, Agrobacterium-mediated transformation, direct gene transfer, viral-mediated introduction, transfection, transduction, cell-penetrating peptides, mesoporous silica nanoparticle (MSN)-mediated direct protein delivery, topical applications, sexual crossing, sexual breeding, and any combination thereof.

Adeno-associated virus (AAV) is a widely used vector for delivering heterologous polynucleotides. However, due to the small genome size of AAV (about 4.7 kb) and the large size of Cas polypeptides (more than 1000 amino acids), the delivery of a Cas system with recombinant AAV (rAAV) is limited, and it is generally not possible to deliver a Cas-fusion system (the fusion and gRNA) encoded in a single vector. In contrast to Cas systems, due to the much smaller size of the TnpB polypeptides, the gene editing system with a TnpB polypeptide and a TnpB fusion can be delivered in a single rAAV.

Therefore, the present disclosure provides a recombinant adeno-associated virus (rAAV) comprising a genome comprising a first expression cassette encoding the TnpB polypeptide, the modified TnpB polypeptide, or the fusion polypeptide of the present disclosure. In some embodiments, the first expression cassette comprises a promoter and a terminator.

In some embodiments, the genome comprises a second expression cassette encoding a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof. In some embodiments, the second expression cassette comprises a promoter and a terminator.

In some embodiments, the first expression cassette comprises less than about 4,700 nucleotides, less than about 4,600 nucleotides, less than about 4,500 nucleotides, less than about 4,400 nucleotides, less than about 4,300 nucleotides, less than about 4,200 nucleotides, less than about 4,100 nucleotides, less than about 4,000 nucleotides, less than about 3,900 nucleotides, less than about 3,800 nucleotides, less than about 3,700 nucleotides, less than about 3,600 nucleotides, less than about 3,500 nucleotides, less than about 3,400 nucleotides, less than about 3,300 nucleotides, less than about 3,200 nucleotides, less than about 3,100 nucleotides, less than about 3,000 nucleotides, less than about 2,900 nucleotides, less than about 2,800 nucleotides, less than about 2,700 nucleotides, less than about 2,600 nucleotides, or less than about 2,500 nucleotides.

In some embodiments, the genome comprises about 4,500 to about 4,700 nucleotides.
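The size constraint described above can be illustrated with a minimal sketch that sums the lengths of the elements placed in an rAAV genome and compares the total with the approximate 4.7 kb capacity. All element names and lengths below are hypothetical values chosen for illustration; they are not taken from the present disclosure.

```python
# Approximate rAAV packaging capacity cited in the text (about 4.7 kb).
AAV_CAPACITY_NT = 4700


def genome_length(parts):
    """Sum the lengths (in nucleotides) of all elements in the rAAV genome."""
    return sum(parts.values())


def fits_in_raav(parts, capacity=AAV_CAPACITY_NT):
    """Return True when the summed element lengths do not exceed the capacity."""
    return genome_length(parts) <= capacity


# Hypothetical element lengths, for illustration only.
tnpb_design = {
    "ITRs": 290,
    "promoter_1": 600,
    "nuclease_CDS": 1300,   # assumed length for a compact TnpB coding sequence
    "terminator_1": 200,
    "promoter_2": 250,
    "gRNA": 200,
    "terminator_2": 10,
}
# Same layout with a much larger (Cas-sized) coding sequence, also hypothetical.
cas_design = dict(tnpb_design, nuclease_CDS=4100)

print(genome_length(tnpb_design), fits_in_raav(tnpb_design))
print(genome_length(cas_design), fits_in_raav(cas_design))
```

With these assumed lengths, the compact design leaves room for a fusion partner within a single vector, whereas the larger coding sequence alone pushes the total past the capacity.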

8. Method of screening TnpB polypeptides having the activity of RNA-guided endonuclease

The present disclosure provides a method of screening TnpB polypeptide for the activity of cleaving double-stranded DNA comprising the steps of:

- providing a candidate TnpB polypeptide from a microorganism;

- providing a gRNA comprising a targeting region and a backbone region, wherein the backbone region comprises at least 100 nucleotides before the 3’ end of the IS, which naturally comprises the nucleotide sequence encoding the TnpB polypeptide;

- contacting the TnpB polypeptide with the gRNA and the target DNA; and

- detecting the cleavage on the target DNA.

In some embodiments, the backbone region comprises at least 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300 or more nucleotides before the 3’ end of the IS. In some embodiments, the backbone region comprises 100-350, 125-325, 150-300, 175-275, 200-250, 150-225, 175-225, 150-200, 175-225, or 175-200 nucleotides before the 3’ end of the IS.
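As an illustration of how a candidate backbone may be derived for the screening method, the following minimal sketch takes the last N nucleotides before the 3’ end of an IS sequence and joins them to a targeting region. The sequences are toy placeholders, the function names are hypothetical, and placing the targeting region directly after the backbone is an assumption made for this sketch.

```python
def candidate_backbone(is_sequence, n):
    """Return the last n nucleotides before the 3' end of the IS."""
    if n <= 0 or n > len(is_sequence):
        raise ValueError("n must be between 1 and the length of the IS")
    return is_sequence[-n:]


def build_grna_coding_sequence(is_sequence, n, targeting_region):
    """Join backbone and targeting region.

    Assumption for this sketch: the targeting region directly follows
    the right (3') end of the backbone.
    """
    return candidate_backbone(is_sequence, n) + targeting_region


# Toy placeholder sequences, not sequences from the disclosure.
toy_is = "ACGT" * 50            # 200-nt stand-in for an IS
toy_target = "GATTACAGATTACAGA"  # 16-nt stand-in for a targeting region

for n in (100, 150, 200):
    grna = build_grna_coding_sequence(toy_is, n, toy_target)
    print(n, len(grna))
```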

The TnpB polypeptide can be provided as an isolated polypeptide or by the expression from a polynucleotide encoding the same. In some embodiments, the TnpB polypeptide is provided by a first polynucleotide, preferably DNA, comprising a first nucleotide sequence encoding the same.

The gRNA can be provided as an isolated RNA molecule or by the transcription from a DNA molecule encoding the same. In some embodiments, the gRNA is provided as a second polynucleotide, preferably DNA, comprising a second nucleotide sequence encoding the same. The first and second polynucleotides can be provided in a single vector, or in separate vectors. In some embodiments, the first and second polynucleotides are provided in a first vector, such as a first plasmid. In some embodiments, the first nucleotide sequence is operably linked to a first promoter. In some embodiments, the second nucleotide sequence is operably linked to a second promoter.

In some embodiments, the target DNA is provided in a second plasmid.

In some embodiments, contacting the TnpB polypeptide with the gRNA and the target DNA comprises introducing the first and second plasmids into a host cell comprising the target DNA.

Examples

Example 1. Screening of TnpB polypeptides with endonuclease activity

This Example was carried out to screen TnpB polypeptides with endonuclease activity.

A series of plasmid pairs were constructed, each consisting of a test plasmid (comprising nucleotide sequences encoding a TnpB polypeptide, the gRNA (comprising a targeting sequence of SEQ ID NO: 240 and the related reRNA backbone) and a chloramphenicol (Cm) resistance gene) and a reporter plasmid (comprising a target sequence of SEQ ID NO: 240, a TAM and a kanamycin resistance gene).

In brief, for the test plasmids, TnpB genes and gRNA coding sequences (backbone + targeting region) were synthesized by Tsingke (Beijing, China) and cloned into the pBAD backbone by Gibson Assembly. The TnpB genes were driven by the J23108 promoter, and the gRNA coding sequences were driven by the J23119 promoter (see Leenay et al., 2016, Identifying and Visualizing Functional PAM Diversity across CRISPR-Cas Systems, Molecular Cell, 62, 1-11).

For the reporter plasmid (Kan+), oligos containing the target nucleotide sequence and the related TAM flanked by EcoRI and XhoI restriction sites were ordered from Tsingke (Beijing, China). In brief, the oligos and the pCB457 plasmid were digested with EcoRI and BamHI at 37℃ for 1 h. The digested products were isolated and ligated using T4 ligase according to the manufacturer’s instructions.

The maps of the plasmid pair for ISTfu1 TnpB are shown in Fig. 1 as an example.

Each of the plasmid pairs was transformed into E. coli BW25141 cells by electroporation to test the endonuclease activity of the TnpB polypeptide (test group), and a plasmid pair in which the gRNA coding sequence was removed from the test plasmid was used as the negative control.

In particular, the test and target plasmids were electroporated into E. coli (NEB 10β, C3020K) using a Bio-Rad Gene Pulser Xcell with the program 1.8 kV, 25 μF, 200 ohm. After electroporation, 900 μl SOC medium was added, followed by incubation at 37℃ for 1 hour. Then, the mixture was quartered, serially diluted (10×) and inoculated onto different LB plates (Cm+/Kan+, Cm+, Kan+ and plain) (50 mg/L for each antibiotic). After incubation at 37℃ for 12 hours, photos of the plates were taken.

Viable colonies were counted and the depletion ratio was calculated (depletion ratio = the number of colonies of the TnpB & reRNA group in minimal dilution / the number of colonies of the TnpB alone group in minimal dilution).
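The depletion ratio defined above can be written as a short calculation. The colony counts used here are hypothetical and serve only to illustrate the arithmetic.

```python
def depletion_ratio(colonies_tnpb_rerna, colonies_tnpb_alone):
    """Depletion ratio = colonies (TnpB & reRNA) / colonies (TnpB alone).

    Both counts are taken at the same (minimal) dilution.
    """
    if colonies_tnpb_alone <= 0:
        raise ValueError("the TnpB-alone group must have at least one colony")
    return colonies_tnpb_rerna / colonies_tnpb_alone


# Hypothetical colony counts, for illustration only.
print(depletion_ratio(3, 300))    # strong depletion, ratio close to 0
print(depletion_ratio(280, 300))  # little depletion, ratio close to 1
```

Under this definition, a lower ratio indicates stronger RNA-guided cleavage of the reporter plasmid.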

The plain plate, K+ plate, and Cm+ plate were used to show the efficiency of electroporation. Generally, the negative control showed similar numbers of colonies on these three plates as compared to the test group.

On the K+Cm+ plate, negative controls and the test groups encoding a TnpB polypeptide not having endonuclease activity showed colonies, while the test groups encoding a TnpB polypeptide having endonuclease activity did not.

81 TnpB polypeptides in total were tested, and the results demonstrated that the TnpB polypeptides of SEQ ID NOs: 1-29 showed endonuclease activity (see Fig. 2) , cleaving a target site in an RNA-guided manner with various depletion ratios (see Fig. 3) , while 52 TnpB polypeptides encoded by SEQ ID NOs: 59-110 did not show endonuclease activity (data not shown) .

The TnpB polypeptides having RNA-guided endonuclease activity and gRNAs as well as the TAMs thereof are shown in Table 1.

Table 1 TnpB polypeptides having desired activity

Example 2. Characterization of the TnpB polypeptides

2.1. Analysis of sequence

The amino acid sequences of the TnpB polypeptides showing endonuclease activity were aligned (Clustal Omega). The results showed that the DDE motif is conserved in the TnpB polypeptides (see Figs. 4 and 5). Although the DDE motif in ISBce3 TnpB and ISCbt1 TnpB was not completely aligned with the others, it was actually present, and is indicated in the amino acid sequences (see Fig. 5). It was noted that the TnpB polypeptides having the activity of RNA-guided endonuclease are generally conserved at positions corresponding to N31, G179, D181, E265, L267, C332, C335, C351, C354 and D361 of SEQ ID NO: 7, except that ISCbt1 (SEQ ID NO: 12) differs from the others at positions corresponding to L267, C332, C335, C351 and C354 of SEQ ID NO: 7.

For clarity, the alignment between 25 of the newly-identified TnpB polypeptides and ISDra2 TnpB polypeptide is shown in Fig. 12A.

2.2. DDE motif is essential for the endonuclease activity

The variants of the TnpB polypeptides of SEQ ID NOs: 1-22 with the first D residue of the DDE motif substituted by A were tested as described in Example 1. The results showed that the variants substantially lost the endonuclease activity (see Figs. 2 and 3), indicating that the DDE motif is essential for the endonuclease activity, and that it is possible to prepare a disarmed TnpB polypeptide (dTnpB) by introducing mutation(s) into the DDE motif.

2.3. The conserved amino acid residues are essential for the endonuclease activity

The variants of the TnpB polypeptides ISTfu1 (SEQ ID NO: 7), ISDge10 (SEQ ID NO: 18), ISAba30 (SEQ ID NO: 21) and ISDra2 with the amino acid corresponding to N31 of SEQ ID NO: 7 substituted by A were tested as described in Example 1. The results showed that the variants substantially lost the endonuclease activity (see Fig. 12B), indicating that a conserved amino acid residue such as that corresponding to N31 of SEQ ID NO: 7 is essential for the endonuclease activity, and that it is possible to prepare a disarmed TnpB polypeptide (dTnpB) by introducing mutation(s) into amino acid(s) that are conserved between TnpB polypeptides.

Example 3. RNA-guided cleavage in eukaryotic cells

This Example was carried out to verify the RNA-guided cleavage in eukaryotic cells with the TnpB polypeptides.

A fluorescence-reporting system was used to identify the RNA-guided cleavage. As shown in Fig. 6, the system comprised a target sequence (SEQ ID NO: 240) and a TAM located between the mRFP coding sequence and the GFP coding sequence (SEQ ID NOs: 140 and 141). The mRFP and GFP coding sequences were linked out of frame, and thus GFP would not be expressed if no cleavage occurred in the target sequence. Once cleavage occurred, the mRFP and GFP coding sequences might be brought in frame upon repair.
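The reading-frame logic of such a reporter can be sketched as follows. This is a simplified illustration: the initial frame offset of 1 is a hypothetical value, and the sketch ignores stop codons and larger rearrangements that may arise during repair.

```python
def gfp_in_frame(initial_offset, indel_length):
    """Return True when a repair outcome brings GFP into frame with mRFP.

    initial_offset: frame offset (1 or 2) of GFP relative to mRFP
                    before editing (assumed value in this sketch).
    indel_length:   net number of inserted (+) or deleted (-) nucleotides.
    """
    return (initial_offset + indel_length) % 3 == 0


# Hypothetical reporter with GFP shifted by one nucleotide.
for indel in (0, -1, +2, -4, +1, -3):
    print(indel, gfp_in_frame(1, indel))
```

Only a subset of repair outcomes restores the frame, which is why the reporter indicates that cleavage occurred rather than measuring the full editing frequency.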

Plasmid groups, each comprising a plasmid encoding a TnpB (TnpB plasmid) , a plasmid encoding the corresponding gRNA (gRNA plasmid) and a reporting plasmid comprising the fluorescence-reporting system, were constructed.

In brief, for constructing the reporting plasmid, oligos containing target sequence and related TAM were ordered from Tsingke (Beijing, China) . Then, the oligos were annealed and ligated into pRGS vector digested with EcoRI and BamHI (see Kim et al., Surrogate Reporters for Enrichment of Cells with Nuclease-induced Mutations, Nature Methods, 2011, 8 (11) : 941-944) .

The gRNA plasmid was constructed by inserting the oligos encoding gRNA (the target sequence+backbone as shown in Table 1) flanked by EcoRI and BamHI restriction sites into pUC19 plasmid under the control of U6 promoter, and the TnpB plasmid was constructed by inserting the coding sequence of the TnpB polypeptide (see Table 1) into pcDNA3.1. The maps of the above three plasmids are shown in Fig. 7.

A group of plasmids (120 ng TnpB plasmid + 80 ng gRNA plasmid + 200 ng reporting plasmid) was co-transfected into HEK293T cells (ATCC, CRL3216) with Lipofectamine 2000 Reagent (Invitrogen) according to the manufacturer’s instructions. The resulting cells were analyzed by flow cytometry (LSRFortessa, BD Biosciences).

As shown in Fig. 8, five (ISTfu1, ISDge10, ISAba30, ISAam1, and ISYmu1) of the 29 TnpB polypeptides showed RNA-guided cleavage in eukaryotic cells, i.e., GFP signal was shown in the presence of both the TnpB polypeptide and the gRNA.

Example 4. TAM preference of ISTfu1 TnpB polypeptide

The predicted TAM sequence for ISTfu1 TnpB polypeptide is 5'-TGAT, which is similar to the TAM sequence (5'-TTGAT) of ISDra2 TnpB. In order to further identify the difference between them, a series of reporting plasmids comprising TAM sequences 5'-nTGAT (n = T, A, G, or C) were constructed as described in Example 3.

The test plasmids encoding ISTfu1 TnpB polypeptide or ISDra2 TnpB polypeptide and their respective guide RNAs were co-transfected with the reporting plasmids into 293T cells, respectively, and the resulting cells were analyzed by flow cytometry, as described in Example 3.

The results are shown as GFP percentages, which were normalized to the GFP percentage of the ISDra2 TnpB/TTGAT group.
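The normalization described above amounts to dividing each GFP percentage by that of the reference group. A minimal sketch follows; the GFP percentages are hypothetical placeholders and do not reproduce the data of Fig. 9.

```python
def normalize_to_reference(gfp_percent, reference_key):
    """Divide every GFP percentage by the value of the reference group."""
    reference = gfp_percent[reference_key]
    if reference <= 0:
        raise ValueError("reference GFP percentage must be positive")
    return {key: value / reference for key, value in gfp_percent.items()}


# Hypothetical GFP percentages keyed by (TnpB, TAM), for illustration only.
gfp_percent = {
    ("ISDra2", "TTGAT"): 10.0,
    ("ISTfu1", "TTGAT"): 25.0,
    ("ISTfu1", "CTGAT"): 20.0,
}
normalized = normalize_to_reference(gfp_percent, ("ISDra2", "TTGAT"))
for key, value in normalized.items():
    print(key, round(value, 2))
```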

As shown in Fig. 9, ISTfu1 TnpB polypeptide can recognize all the four TAMs, and a higher efficiency of cleavage was observed for TAMs TTGAT and CTGAT, while ISDra2 TnpB polypeptide can recognize TTGAT only. Further, ISTfu1 TnpB polypeptide showed an efficiency of cleavage more than 2 times higher than ISDra2 TnpB polypeptide.

Example 5. RNA-guided cleavage of an endogenous gene in human cells

This Example was carried out to verify the RNA-guided cleavage of an endogenous gene in human cells by the TnpB polypeptide of the invention. The gRNA backbones used in this Example are those listed in Table 1.

5.1. Editing in hDNMT1 by ISTfu1 TnpB

Plasmids encoding ISTfu1 TnpB polypeptide (SEQ ID NO: 7) and a gRNA comprising a targeting region of SEQ ID NO: 142 (Target sequence 1 in hDNMT1) and a backbone of SEQ ID NO: 117 were constructed and transfected into HEK293T cells as described in Example 3. After incubation at 37℃ for two days, the transfected cells were collected for the isolation of genomic DNA with an isolation kit (DP201, Bioteke Corporation, Beijing) according to the manufacturer’s instructions. The genomic DNA was analyzed by Surveyor assay to determine the efficiency of cleavage, and the genomic DNA from untreated 293T cells was used as the control.

Surveyor assay was performed by reference to Guschin et al., 2010 (Guschin et al., A Rapid and General Assay for Monitoring Endogenous Gene Modification, Methods in Molecular Biology, 2010, 649: 247-256) . In brief, 100ng genomic DNA was used in a 25 μL PCR reaction system using AccuPrime Taq polymerase (Invitrogen, USA) and the primers of SEQ ID NOs: 218 and 219. The PCR conditions were as follows: 94℃ for 2 min; 30X (94 ℃ for 20 s, 60℃ for 20 s, 68℃ for 40 s) ; 68℃ for 3 min; hold at 4℃.

6.5 μL of the PCR products were then denatured and annealed with 3 μL 1× AccuPrime Buffer II, then digested with 0.5 μL Surveyor nuclease (Integrated DNA Technologies, IDT, USA). Samples were run on a 10% acrylamide TBE gel, stained with ethidium bromide for 10 min, rinsed with water and then imaged on a Bio-Rad gel imager. The band intensities were quantified using ImageJ software, and the genome editing efficiency was calculated using the equation: % genome editing = 100 × (1 − (1 − fraction cleaved)^(1/2)).

As shown in Fig. 10A, Lanes #1, #2, and #3 showed cleavage by Surveyor nuclease (the indels % was 14.5%, 16.6% or 20.0%), indicating that ISTfu1 TnpB polypeptide can achieve RNA-guided cleavage of human DNMT1 in human cells.
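The Surveyor equation given in this Example can be evaluated with a short sketch. The band intensities below are hypothetical, and computing the fraction cleaved as the cleaved-band intensity divided by the total intensity is an assumption of this sketch, since the text does not spell out that step.

```python
import math


def fraction_cleaved(uncut_intensity, cut_intensities):
    """Assumed definition: cleaved band intensity over total lane intensity."""
    cut = sum(cut_intensities)
    total = uncut_intensity + cut
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return cut / total


def percent_genome_editing(frac_cleaved):
    """% genome editing = 100 * (1 - (1 - fraction cleaved)^(1/2))."""
    if not 0.0 <= frac_cleaved <= 1.0:
        raise ValueError("fraction cleaved must lie between 0 and 1")
    return 100.0 * (1.0 - math.sqrt(1.0 - frac_cleaved))


# Hypothetical band intensities (arbitrary units), for illustration only.
frac = fraction_cleaved(64.0, [20.0, 16.0])
print(round(frac, 2), round(percent_genome_editing(frac), 1))
```

The square root accounts for the fact that a cleavable heteroduplex forms only when an edited strand anneals to a strand of different sequence, so the fraction cleaved overstates the fraction of edited alleles.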

5.2. Editing in hTET1, hTET2 and hHPRT by ISDge10 TnpB

Plasmids encoding ISDge10 TnpB polypeptide (SEQ ID NO: 17) and a gRNA comprising a targeting region of SEQ ID NO: 143 (Target sequence 1 in hTET1) , 144 (Target sequence 1 in hTET2) or 145 (Target sequence in hHPRT) and a backbone of SEQ ID NO: 127 were constructed and transfected into 293T cells as described in Example 3. Plasmids encoding a spCas9 and gRNA comprising the same targeting region were used as reference.

The transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers listed below.

As shown in Fig. 10B, ISDge10 TnpB polypeptide achieved RNA-guided cleavage of hTET1, hTET2 and hHPRT, thereby introducing indels into the same, indicating that ISDge10 TnpB polypeptide can achieve RNA-guided cleavage of hTET1, hTET2 and hHPRT in human cells.

5.3. Editing in hDNMT1 by ISAba30 TnpB

Plasmids encoding ISAba30 TnpB polypeptide (SEQ ID NO: 22) and a gRNA comprising a targeting region of SEQ ID NO: 146 (Target sequence 2 in hDNMT1) and a backbone of SEQ ID NO: 132 were constructed and transfected into 293T cells as described in Example 3.

The transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers of SEQ ID NOs: 226 and 227.

As shown in Fig. 10C, ISAba30 TnpB polypeptide achieved RNA-guided cleavage of hDNMT1, thereby introducing indels (indels % of 24.9% and 26.0% for #1 and #2, respectively) into the same, indicating that ISAba30 TnpB polypeptide can achieve RNA-guided cleavage of hDNMT1 in human cells.

5.4. Editing in hTET1 and hTET2 by ISAam1 TnpB

Plasmids encoding ISAam1 TnpB polypeptide (SEQ ID NO: 23) and a gRNA comprising a targeting region of SEQ ID NO: 147 (Target sequence 2 in hTET1) or 148 (Target sequence 2 in hTET2) and a backbone of SEQ ID NO: 133 were constructed and transfected into 293T cells as described in Example 3. Plasmids encoding a spCas9 and gRNA comprising the same targeting region were used as reference.

The transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers listed below.

As shown in Fig. 10D, ISAam1 TnpB polypeptide achieved RNA-guided cleavage of hTET1 and hTET2, thereby introducing indels into the same, indicating that ISAam1 TnpB polypeptide can achieve RNA-guided cleavage of hTET1 and hTET2 in human cells with an indels % comparable to or even higher than that of spCas9.

5.5. Editing in hDNMT1, hDNMT3b, hTET, and hPGK1 by ISYmu1 TnpB

Plasmids encoding ISYmu1 TnpB polypeptide (SEQ ID NO: 24) and a gRNA comprising a targeting region of SEQ ID NO: 142, 149 (Target sequence in hDNMT3b) , 150 (Target sequence 3 in hDNMT1) , 151 (Target sequence 3 in hTET2) or 152 (Target sequence in hPGK1) and a backbone of SEQ ID NO: 207 or 210 were constructed and transfected into 293T cells as described in Example 3.

The transfected cells were tested by Surveyor assay of the genomic DNA as described in Example 5.1 with the primers listed below.

As shown in Fig. 10E, ISYmu1 TnpB polypeptide achieved RNA-guided cleavage of hDNMT1, hDNMT3b, hTET, and hPGK1, thereby introducing indels into the same, indicating that ISYmu1 TnpB polypeptide can achieve RNA-guided cleavage of hDNMT1, hDNMT3b, hTET, and hPGK1 in human cells.

5.6. Comparison of ISAam1 and ISYmu1 with ISDra2 by editing in human genes

Plasmids encoding ISAam1 TnpB polypeptide (SEQ ID NO: 23), ISYmu1 TnpB polypeptide (SEQ ID NO: 24) or ISDra2 TnpB polypeptide, together with a gRNA comprising a targeting region in the AGBL1-1, APOB-3, EMX1, MECP2, PGK1, and TET1 genes (see Table 2) and a backbone of SEQ ID NO: 133 (for ISAam1), SEQ ID NO: 210 (for ISYmu1) or SEQ ID NO: 158 (for ISDra2), were constructed.

Table 2

As shown in Fig. 13, ISAam1 and ISYmu1 TnpB polypeptides achieved higher editing efficiency than ISDra2 in most of the tested genes in human cells.

Example 6. Analysis of the gRNA backbone for TnpB polypeptide

The gRNA backbones for ISDra2, ISTfu1, ISDge10, ISAba30, ISAam1, and ISYmu1 TnpB polypeptides were designed as “N” nucleotides at the right end of the IS (referred to as “gN” ) , and the sequences thereof are shown in Tables 3-8. That is, “N” indicates the left end of the gRNA backbone.

Table 3. gRNA backbones for ISDra2

Table 4. gRNA backbones for ISTfu1

Table 5. gRNA backbones for ISDge10

Table 6. gRNA backbones for ISAba30

Table 7. gRNA backbones for ISAam1

Table 8. gRNA backbones for ISYmu1

The RNA-guided cleavages by the ISDra2, ISTfu1, ISDge10, ISAba30, ISAam1, and ISYmu1 TnpB polypeptides were conducted as described in Example 3, and the results are shown in Fig. 11.

In particular, ISDra2 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 129 nt (Fig. 11A); ISTfu1 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 139 nt (Fig. 11B); ISDge10 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 122 nt (Fig. 11C); ISAba30 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 101 nt (Fig. 11D); ISAam1 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 100 nt (Fig. 11E); and ISYmu1 TnpB polypeptide was able to cleave the target sequence with a gRNA backbone of 120 nt (Fig. 11F). The TnpB polypeptides were also able to cleave the target sequence with longer gRNA backbones (Figs. 11A-11F), indicating that it is possible to use a longer gRNA backbone to form the RNP.

Backbone designs with an altered right end (based on the backbones of SEQ ID NOs: 157, 166, and 178, respectively; see Figs. 10A-10C, right panels) were prepared for ISDra2, ISTfu1 and ISDge10 TnpB polypeptides, and the RNA-guided cleavages were also conducted as described in Example 3. As shown in Figs. 10A-10C, substitution of the right-end nucleotide inhibited the RNA-guided cleavage, and substitution of the two right-end nucleotides greatly inhibited it, while the addition or deletion of one or more nucleotides at the right end greatly inhibited or even eliminated the RNA-guided cleavage, indicating that the right end of the gRNA backbone is essential for the RNA-guided cleavage by TnpB polypeptides.

Example 7. Comparison of editing efficiency of TnpB polypeptides and Cas nucleases

This Example was carried out to demonstrate that the editing efficiency of the TnpB polypeptides is superior to that of the small Cas nucleases.

In particular, ISAam1 and ISYmu1 TnpB polypeptides were examined for their ex vivo and in vivo activity together with specificity in comparison with five well-developed CRISPR-Cas editors, which include two optimized Un1Cas12f1 variants (referred to as Un1Cas12f1 and CasMINI), AsCas12f1, optimized Nme2Cas9 (Nme2-C. NR), and SaCas9. To make these seven systems comparable despite their different TAM/PAM requirements, we carefully chose genomic regions within which the gRNA for each system could be designed to target sequences overlapping within a narrow range (see, e.g., Fig. 14).

Plasmid groups, each comprising a plasmid encoding a TnpB or Cas editor, a plasmid encoding the corresponding gRNA (gRNA plasmid) and a reporting plasmid comprising the fluorescence-reporting system, were constructed and transfected into HEK293T cells and HCT116 cells (ATCC CCL-247), and the genome editing efficiency at 10 genomic loci was tested as described in Example 3.

As shown in Figs. 15A and 15B, ISAam1 and ISYmu1 TnpB polypeptides achieved an editing efficiency which was significantly higher than the well developed small Cas editors (Un1Cas12f1, CasMINI, AsCas12f1, and Nme2-C. NR) and was comparable to SaCas9.

To further minimize the confounding effect of distances between targets, we performed a comparison with overlapping (>15 bp) gRNAs against three endogenous targets in the HEK293T cell line. The former patterns were largely reproduced, with ISAam1 and ISYmu1 TnpBs together with SaCas9 showing higher activity compared to Cas12f nucleases (Fig. 15C).

rAAVs encoding a genome editing system were prepared as described previously (see Ran, F. A. et al. 2015, In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186-191). In brief, a nuclease expression cassette driven by the CMV promoter and a gRNA/reRNA expression cassette driven by the human U6 promoter were cloned between ITRs (see Fig. 16A), and used to prepare recombinant AAV2 and AAV8. HEK293T cells were plated in 150 mm dishes 12 h before transfection; 30 μg helper plasmid, 15 μg AAV2 plasmid and 15 μg expression plasmid were transfected using polyethyleneimine, and AAV vectors were purified three days later. AAV8 for mouse injection was generated by PackGene Biotech Co, with a concentration of 10¹³ gc/ml.

C2C12 cells (ATCC CRL-1772) were seeded at 5 x 10⁴ cells per well on a 48-well plate. AAV2 was added to the cells at a multiplicity of infection of 10⁴ gc per well. Cells were collected 4 days after transduction for genomic DNA extraction and editing efficiency analysis by next generation sequencing. With the Rosa26 locus as the target, ISAam1 TnpB and Cas9 systems showed appreciable activity (Fig. 16A). Considering that AsCas12f has the lowest average editing activity among the Cas12fs and that the activity of Nme2-C. NR is generally lower than that of SaCas9, we removed them from the in vivo experiment to avoid sacrificing more mice than necessary.

We then individually delivered a single AAV8 vector encoding each of five different editing systems into mice and analyzed the editing activity in the target organ, the liver. All experiments related to animal work described in this study were performed strictly in accordance with the guidelines for the Care and Use of Laboratory Animals, and approved by the Animal Welfare and Research Ethics Committee of the Institute of Zoology, Chinese Academy of Sciences. The mouse strain C57BL/6J was obtained from Vitalriver. 6-week-old female C57BL/6J mice were injected with 5 x 10¹¹ gc in a 100 μl volume via the tail vein. Mice were sacrificed 14 days later and liver tissues were collected for genome extraction. This in vivo result roughly recapitulates the aforementioned ex vivo data, where TnpB and SaCas9 systems show relatively high activity (Fig. 16B).
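The dose arithmetic implied by the stated stock concentration (10¹³ gc/ml) and the injected dose (5 x 10¹¹ gc in 100 μl) can be sketched as below. Preparing the injection by diluting the stock with a diluent is an assumption of this sketch; the text does not describe how the injection volume was made up.

```python
def injection_mix(stock_gc_per_ml, dose_gc, final_volume_ul):
    """Return (stock volume, diluent volume) in microliters for one injection."""
    stock_ul = dose_gc / stock_gc_per_ml * 1000.0  # ml converted to microliters
    if stock_ul > final_volume_ul:
        raise ValueError("stock is too dilute for the requested dose and volume")
    return stock_ul, final_volume_ul - stock_ul


# Values stated in the text: 1e13 gc/ml stock, 5e11 gc per mouse, 100 ul injected.
stock_ul, diluent_ul = injection_mix(1e13, 5e11, 100.0)
print(round(stock_ul, 1), round(diluent_ul, 1))
```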

For quantifying the editing specificity or off-target level, we employed one candidate-based assay in mouse N2a cells and one unbiased genome-wide assay in human HEK293T cells by iGUIDE-seq (see Nobles, C. L. et al., 2019, iGUIDE: an improved pipeline for analyzing CRISPR cleavage specificity. Genome Biol. 20, 14). In brief, half a million HEK293T cells were transfected with 1 μg nuclease plasmid, 500 ng gRNA plasmid and 50 pmol double-stranded oligodeoxynucleotide (dsODN) using the Lonza 4D system (Program CM-130). Cells were collected 3 days after nucleofection for genomic DNA extraction. The genome library was prepared and subjected to sequencing. Specifically, for the Rosa26 and Angptl3 loci in mouse, we predicted potential off-target sites for the seven systems, quantified the indel frequencies at the top 2 off-target sites, and calculated the ratio between off-target and on-target edits. For the human MAPK8 locus, iGUIDE-seq was performed to characterize the specificity. Among the three loci, ISAam1 shows the lowest off-target ratio for two loci (Figs. 17A and 17B), while ISYmu1 also exhibits a low degree of off-target editing (no off-target editing for one locus and the second or third lowest off-target editing level for the remaining two loci).

In summary, an in-depth characterization indicates that ISAam1 and ISYmu1 TnpBs outperform Cas12 variants in terms of ex vivo and in vivo efficiency, while exhibiting comparable performance to Cas9 variants. Moreover, their editing specificity is on par with these Cas12 or Cas9 variants.
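The ratio between off-target and on-target edits in the candidate-based assay can be illustrated with a minimal sketch. The indel frequencies are hypothetical, and reporting one ratio per off-target site is an assumption, since the text does not state how the two sites were combined.

```python
def off_target_ratios(on_target_indel, off_target_indels):
    """Return off-target / on-target indel ratios, one per off-target site."""
    if on_target_indel <= 0:
        raise ValueError("on-target indel frequency must be positive")
    return [off / on_target_indel for off in off_target_indels]


# Hypothetical indel frequencies (%), for illustration only.
on_target = 40.0
top_two_off_targets = [0.4, 0.0]
print([round(r, 4) for r in off_target_ratios(on_target, top_two_off_targets)])
```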

Claims

A recombinant gene editing system comprising

- a TnpB polypeptide or a functional fragment thereof or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and

- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA,

wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
The recombinant gene editing system of claim 1, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
The recombinant gene editing system of claim 1 or 2, which comprises a first polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
The recombinant gene editing system of any of claims 1 to 3, wherein the TnpB polypeptide or the functional fragment thereof recognizes a transposon-associated motif (TAM) adjacent to the nucleotide sequence of interest and has an endonuclease activity.
The recombinant gene editing system of any of claims 1 to 4, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
The recombinant gene editing system of claim 4, wherein the TAM consists of four consecutive nucleotides.
The recombinant gene editing system of any of claims 1 to 6, further comprising a heterologous polynucleotide.
The recombinant gene editing system of claim 7, wherein the heterologous polynucleotide is an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
A composition comprising

- a recombinant TnpB polypeptide or a functional fragment thereof,

- a target double-stranded DNA comprising a nucleotide sequence of interest and a TAM recognized by the TnpB polypeptide; and

- a recombinant guide RNA (gRNA) comprising a targeting region capable of hybridizing to the nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or a functional fragment thereof,

wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
The composition of claim 9, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
The composition of claim 9 or 10, wherein the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
The composition of any of claims 9 to 11, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
The composition of claim 11, wherein the TAM consists of four consecutive nucleotides.
The composition of any of claims 9 to 13, further comprising a heterologous polynucleotide.
The composition of claim 14, wherein the heterologous polynucleotide is an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
A method of introducing a double-strand break into a polynucleotide of interest comprising a step of contacting the polynucleotide with a recombinant gene editing system comprising

- a TnpB polypeptide or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and

- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence in the polynucleotide of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA,

wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
The method of claim 16, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
The method of claim 16 or 17, wherein the gene editing system comprises the TnpB polypeptide or the functional fragment thereof and the gRNA, or a first polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
The method of any of claims 16 to 18, wherein the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
The method of any of claims 16 to 19, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
The method of claim 19, wherein the TAM consists of four consecutive nucleotides.
The method of any of claims 16 to 21, wherein the gene editing system further comprises a heterologous polynucleotide.
The method of claim 22, wherein the heterologous polynucleotide is an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
A method of modifying a genomic sequence in a cell comprising a step of introducing into the cell a recombinant gene editing system comprising

- a TnpB polypeptide or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof, and

- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a portion of the genomic sequence and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA,

wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and wherein the TnpB polypeptide has an activity of RNA-guided endonuclease.
The method of claim 24, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
The method of claim 24 or 25, wherein the gene editing system comprises a first polynucleotide comprising a nucleotide sequence encoding the TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
The method of any of claims 24 to 26, wherein the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
The method of any of claims 24 to 27, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 7, 18, 21, 23 or 24.
The method of claim 27, wherein the TAM consists of four consecutive nucleotides.
The method of any of claims 24 to 29, wherein the gene editing system further comprises a heterologous polynucleotide.
The method of claim 30, wherein the heterologous polynucleotide is an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
The method of any of claims 24 to 31, wherein the cell is a prokaryotic or eukaryotic cell.
A modified TnpB polypeptide comprising a modification in the DDE motif as compared to the parent TnpB polypeptide, wherein the parent polypeptide has an activity of RNA-guided endonuclease, and wherein the modified TnpB is deprived of the activity of cleaving double-stranded DNA.
The modified TnpB polypeptide of claim 33, wherein at least one amino acid in the DDE motif is substituted by alanine, an amino acid corresponding to N31 of SEQ ID NO: 7 is substituted by alanine.
The modified TnpB polypeptide of claim 33 or 34, wherein the parent TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis.
The modified TnpB polypeptide of claim 33, wherein the modified TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
A recombinant system comprising

- the modified TnpB polypeptide of any of claims 33 to 36, or a functional fragment thereof, or a polynucleotide comprising a nucleotide sequence encoding the modified TnpB polypeptide or the functional fragment thereof, and

- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.
The recombinant system of claim 37, which comprises a first polynucleotide comprising a nucleotide sequence encoding the modified TnpB polypeptide or the functional fragment thereof and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
The recombinant system of claim 37 or 38, wherein the gRNA further comprises one or more protein-binding domains.
A method of modifying a genomic sequence in a cell comprising a step of introducing into the cell a recombinant system of any of claims 37 to 39 and a gene editing system targeting the genomic sequence, wherein the nucleotide sequence of interest is next to the genomic sequence.
A fusion polypeptide comprising a TnpB polypeptide, or a functional fragment thereof, or a disarmed variant thereof fused to a fusion partner, wherein the TnpB polypeptide is derived from a microorganism selected from a group consisting of Enterococcus faecium, Aeromonas salmonicida, Clostridium perfringens, Methanosarcina mazei, Bacillus cereus, Aeromonas media, Thermobifida fusca, Campylobacter coli, Synechococcus sp. JA-3-3Ab, Thermosynechococcus elongatus, Nostoc sp. PCC 7120, Clostridium botulinum type C C-Stockholm Bacteriophage c-st, Methanosarcina acetivorans, Escherichia coli, Salmonella enterica, Halorubrum halophilum, Klebsiella pneumoniae, Deinococcus geothermalis, Acinetobacter baumannii, Raoultella ornithinolytica, Anoxybacillus amylolyticus, Youngiibacter multivorans, Cytobacillus sp. CY-G, Clostridium vitabionis, Aeribacillus pallidus and Bacillus thuringiensis, and has an activity of RNA-guided endonuclease, and wherein the disarmed variant is the modified TnpB polypeptide of any of claims 33-36.
The fusion polypeptide of claim 41, wherein the TnpB polypeptide comprises an amino acid sequence at least 70% identical to SEQ ID NO: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 or 29.
The fusion polypeptide of claim 41 or 42, wherein the TnpB polypeptide has the activity of cleaving double-stranded DNA.
The fusion polypeptide of any of claims 41 to 43, wherein the TnpB polypeptide or the functional fragment thereof recognizes a TAM adjacent to the nucleotide sequence of interest and has an endonuclease activity.
The fusion polypeptide of claim 44, wherein the TAM consists of four consecutive nucleotides.
A gene editing system comprising

- the fusion polypeptide of any of claims 41-44, or a polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide, and

- a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof, or a polynucleotide comprising the nucleotide sequence encoding the gRNA.
The gene editing system of claim 46, which comprises a first polynucleotide comprising a nucleotide sequence encoding the fusion polypeptide and a second polynucleotide comprising the nucleotide sequence encoding the gRNA.
The gene editing system of claim 46 or 47, further comprising a heterologous polynucleotide.
The gene editing system of claim 48, wherein the heterologous polynucleotide is an expression cassette, a transgene, a donor DNA, or a polynucleotide modification template.
A method of modifying a genomic sequence in a eukaryotic cell, comprising a step of introducing the gene editing system of any of claims 46-49 into the eukaryotic cell, wherein the gRNA comprises a targeting region capable of hybridizing to a portion of the genomic sequence.
A method of screening a TnpB polypeptide for the activity of cleaving double-stranded DNA, comprising the steps of:

- providing a candidate TnpB polypeptide from a microorganism;

- providing a gRNA comprising a targeting region and a backbone region, wherein the backbone region comprises 100-350 nucleotides before the 3’ end of the IS, which naturally comprises the nucleotide sequence encoding the TnpB polypeptide;

- providing a target DNA comprising a nucleotide sequence that hybridizes to the nucleotide sequence of the targeting region and a TAM recognized by the TnpB polypeptide, wherein the TAM consists of four or five consecutive nucleotides adjacent to the 5’ end of the IS;

- contacting the TnpB polypeptide with the gRNA and the target DNA; and

- detecting the cleavage on the target DNA.
The method of claim 51, wherein the TnpB polypeptide is provided as a first polynucleotide comprising a first nucleotide sequence encoding the same.
The method of claim 51 or 52, wherein the gRNA is provided as a second polynucleotide comprising a second nucleotide sequence encoding the same.
The method of claim 52 or 53, wherein the first and second polynucleotides are provided in a first plasmid.
The method of claim 52, wherein the first nucleotide sequence is operably linked to a first promoter.
The method of claim 53, wherein the second nucleotide sequence is operably linked to a second promoter.
The method of any of claims 51-56, wherein the target DNA is provided in a second plasmid.
The method of claim 57, wherein contacting the TnpB polypeptide with the gRNA and the target DNA comprises introducing the first and second plasmids into a host cell comprising the target DNA.
The method of any of claims 51-58, wherein the TnpB polypeptide comprises an N-terminal HTH domain, a central domain, a C-terminal zinc finger domain, and a DDE motif.
A fusion polypeptide comprising the modified TnpB polypeptide of any of claims 33-36 fused to a fusion partner.
The fusion polypeptide of claim 60, wherein the fusion partner is a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone) associated with the target DNA.
The fusion polypeptide of claim 60, wherein the fusion partner is a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.
The fusion polypeptide of claim 60, wherein the fusion partner is a polypeptide that directly provides for increased transcription of the target nucleic acid.
The fusion polypeptide of claim 63, wherein the fusion partner is a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, or a small molecule/drug-responsive transcription regulator.
The fusion polypeptide of claim 60, wherein the fusion partner is another polypeptide or domain to generate double-strand breaks.
The fusion polypeptide of claim 60, wherein the fusion partner is a polypeptide that directs editing of single or multiple bases in a polynucleotide sequence.
The fusion polypeptide of claim 66, wherein the fusion partner is a site-specific deaminase that can change the identity of a nucleotide, for example from C-G to T-A or from A-T to G-C.
The fusion polypeptide of claim 66, wherein the fusion partner is a deaminase such as a cytidine deaminase, an adenine deaminase, APOBEC1, APOBEC3A, BE2, BE3, BE4, or ABEs.
The fusion polypeptide of claim 66, wherein the fusion partner includes base edit repair inhibitors and glycosylase inhibitors.
The fusion polypeptide of claim 60, wherein the fusion partner can be a Cas endonuclease or another TnpB endonuclease as described in the present disclosure.
The fusion polypeptide of claim 60, wherein the fusion partner is a nuclear localization sequence (NLS).
A recombinant adeno-associated virus (rAAV) comprising a genome comprising a first expression cassette encoding the fusion polypeptide of any of claims 41-45 and 60-71.
The rAAV of claim 72, wherein the genome comprises a second expression cassette encoding a guide RNA (gRNA) comprising a targeting region capable of hybridizing to a nucleotide sequence of interest and a backbone region capable of binding to the TnpB polypeptide or the functional fragment thereof.
The rAAV of claim 72 or 73, wherein the first expression cassette comprises less than about 4,700 nucleotides, less than about 4,600 nucleotides, less than about 4,500 nucleotides, less than about 4,400 nucleotides, less than about 4,300 nucleotides, less than about 4,200 nucleotides, less than about 4,100 nucleotides, or less than about 4,000 nucleotides.
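By way of non-limiting illustration only, the sequence elements recited in the screening method above (a TAM of four or five consecutive nucleotides adjacent to the 5' end of the IS, and a gRNA backbone region of 100-350 nucleotides before the 3' end of the IS) may be derived from an annotated sequence as sketched below in Python. The function name `derive_screening_elements`, its parameters, the 0-based half-open coordinates, the forward-strand orientation of the IS, and the reading of "before the 3' end" as the final nucleotides ending at the IS boundary are all assumptions made for this sketch and are not taken from the disclosure.

```python
def derive_screening_elements(genome, is_start, is_end, tam_len=4, backbone_len=200):
    """Return (TAM, backbone RNA) for an IS element located at genome[is_start:is_end].

    Assumptions (hypothetical, for illustration only):
      - the IS lies on the forward strand of `genome`;
      - coordinates are 0-based and half-open;
      - the TAM is the `tam_len` nucleotides immediately upstream of the IS 5' end;
      - the backbone is the last `backbone_len` nucleotides of the IS.
    """
    if tam_len not in (4, 5):
        raise ValueError("TAM length must be 4 or 5 nucleotides")
    if not 100 <= backbone_len <= 350:
        raise ValueError("backbone length must be 100-350 nucleotides")
    if not 0 <= is_start < is_end <= len(genome):
        raise ValueError("IS coordinates fall outside the sequence")
    if is_start < tam_len:
        raise ValueError("not enough upstream flank to read the TAM")
    if is_end - is_start < backbone_len:
        raise ValueError("IS is shorter than the requested backbone")

    # TAM: flanking nucleotides adjacent to the 5' end of the IS.
    tam = genome[is_start - tam_len:is_start].upper()
    # Backbone: final nucleotides of the IS, written as RNA (T -> U).
    backbone_dna = genome[is_end - backbone_len:is_end].upper()
    backbone_rna = backbone_dna.replace("T", "U")
    return tam, backbone_rna


if __name__ == "__main__":
    # Toy sequence, not a real IS element: 7-nt flank, 160-nt "IS", 3-nt flank.
    toy = "CC" + "TTGAT" + "ACGT" * 40 + "GGG"
    tam, backbone = derive_screening_elements(toy, 7, 167, tam_len=5, backbone_len=100)
    print(tam, len(backbone))
```

For an IS on the reverse strand, the sequence would first be reverse-complemented so that the same coordinate logic applies. The targeting region of the gRNA is chosen separately and is not derived by this sketch.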