CN116355878B

CN116355878B - Novel TnpB programmable nuclease and application thereof

Info

Publication number: CN116355878B
Application number: CN202310177144.3A
Authority: CN
Inventors: 彭楠; 徐颖; 刘涛; 王静; 熊彬杨
Original assignee: Huazhong Agricultural University
Current assignee: Huazhong Agricultural University
Priority date: 2023-02-28
Filing date: 2023-02-28
Publication date: 2024-04-26
Anticipated expiration: 2043-02-28
Also published as: CN116355878A

Abstract

The invention relates to a novel TnpB programmable nuclease and belongs to the field of nucleic acid editing. Specifically, the invention provides a novel TnpB programmable nuclease that shares low homology with previously reported TnpB enzymes, differs markedly from them in its properties, exhibits nuclease activity both inside and outside cells, and has broad application prospects.

Description

Novel TnpB programmable nuclease and application thereof

Technical Field

The invention relates to a novel TnpB programmable nuclease and belongs to the field of nucleic acid editing.

Background

TnpB is a transposon-encoded protein belonging to the Ω system. In recent years it has been found to be a novel RNA-guided programmable nuclease with nucleic acid cleavage and editing activity (Karvelis T, Druteika G, Bigelyte G, et al. Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease [J]. Nature, 2021, 599(7886): 692-696; Altae-Tran H, Kannan S, Demircioglu FE, et al. The widespread IS200/IS605 transposon family encodes diverse programmable RNA-guided endonucleases [J]. Science, 2021, 374(6563): 57-65). Because TnpB is smaller than Cas9, it has an advantage in cell delivery efficiency, can be used for genome editing and modification in vivo and in vitro, and, as a new nucleic acid editing tool, has great prospects in gene therapy, genome modification of species, trait improvement, and related fields. Developing different types of TnpB proteins facilitates the use of this enzyme in nucleic acid editing. However, TnpB is a transposon-encoded protein that is ubiquitous in microorganisms: more than 400,000 TnpB proteins are annotated in the NCBI database, and not every TnpB can be used as a tool for nucleic acid cleavage and editing.

ISDra2 TnpB from Deinococcus radiodurans is a relatively well-studied TnpB programmable nuclease. It recognizes 5'-TTGAT as its TAM sequence and, guided by a gRNA with a single-hairpin structure, cleaves the non-target and target strands at the target site in a staggered manner to produce cohesive ends, thereby enabling recognition, cleavage, and editing of target nucleic acid sequences in vitro and in vivo.

To discover more TnpB programmable nucleases whose properties differ from those of ISDra2, the invention isolates several novel TnpB programmable nucleases from archaea. They are evolutionarily distant from ISDra2 and share very low similarity with it (less than 30%). The invention further finds that these nucleases have unique value in terms of the TAM sites they recognize, omega RNA structure, enzyme active sites, temperature sensitivity, metal ion dependence, and the like, and have great application prospects.

Disclosure of Invention

The invention provides a composition, which is characterized by comprising the following components:

a) A TnpB protein or one or more nucleotide sequences encoding the TnpB protein;

b) An omega RNA molecule or one or more nucleotide sequences encoding the omega RNA molecule, the omega RNA molecule being capable of forming a complex with the TnpB protein of a) and directing TnpB to recognize a target sequence;

Wherein the TnpB protein is selected from any one of sequences No. 1-22 listed in Table 1;

Wherein the omega RNA molecule is of a double hairpin structure; optionally, the omega RNA molecule has the sequence of ：UUAAGAAGGACUUGACUUUGGCUGACCGUGUGUUUGUAUGUCCUAAAUGUGGUUGGACUGUAGAUCGUGACUAUAAUGCUUCUCUAAAUAUUCUUCGUGCGGGGUCGGGACUGCCCUUAGAGCCUGUGGACAGGGGACCUCUGCUAUACAUUCCCUUCUCAGAAGGGGUGUAUAGUAAGUUUCUUGGAAGAAGCAGGAAAUCUCCAUCGUGAGGUGGAGAUGCCACGUCCGUAAGGGCGGGGUUGUUCAC.

In some embodiments, the TAM (Transposon-Associated Motif) sequence at the 5' end of the above target sequence is 5'-TTTAA, 5'-ATTAA, 5'-TCTAA, 5'-TGTAA, 5'-TATAA, 5'-TTCAA, 5'-TTAAA, 5'-TTTCA, 5'-TTTGA, 5'-TTTTA, 5'-TTTAC, 5'-TTTAG, or 5'-TTTAT.

In some embodiments, the above composition further comprises one or more metal ions; the metal ions comprise magnesium ions, manganese ions, or calcium ions; the concentration of the metal ions is 10 mM. Adding these metal ions can enhance the activity of the above TnpB.

The invention also provides a vector system capable of encoding the above composition.

The invention also provides an engineered host cell containing the above composition; in some embodiments, the host cell is a microbial cell, such as a Pediococcus acidilactici, Lactobacillus reuteri, or E. coli cell; an animal cell, such as a HEK293T cell; or a plant cell.

The invention also provides application of the composition, the vector system and the host cell in the field of nucleic acid recognition or modification.

The invention also provides application of the composition, the vector system and the host cell in the field of phage infection resistance.

The invention also provides a nucleic acid recognition or modification method, characterized in that a target sequence and the above composition are placed in an environment of 37-85 ℃; in some embodiments, the ambient temperature is 37 ℃, 42 ℃, 55 ℃, 65 ℃, 75 ℃, or 85 ℃; in some embodiments, the ambient temperature is 75 ℃. The TnpB provided by the invention is active over a wide temperature range, so it can be used with host cells cultured at different temperatures. In addition, the enzymatic activity of the above TnpB is highest at an ambient temperature of 75 ℃.

The invention also provides a TnpB mutant protein, characterized in that it contains any one of sequences No. 1-22 listed in Table 1 and carries a mutation at the position corresponding to D187 and/or E271 of sequence No. 6. The D187 and E271 sites of protein No. 6 (SiRe_0632) are key sites for enzyme activity, and mutating them severely impairs the activity. These two sites are conserved among the 22 SiRe proteins provided by the invention, so the corresponding site in any of the SiRe proteins can be located by protein sequence alignment and mutated to obtain the mutant protein.

The invention also provides application of the mutant protein in the field of nucleic acid recognition or modification. Because the mutant proteins lose cleavage activity, they still recognize the target sequence under the guidance of ωRNA but cannot cleave it. Combined with an adenine or cytosine deaminase, they can be developed into base editors that achieve single-base editing of a target sequence.

The beneficial effects of the invention are as follows: the invention provides a novel TnpB enzyme that shares low homology with the reported TnpB enzyme ISDra2, differs markedly from it in its properties, exhibits nuclease activity both inside and outside cells, has unique value in terms of the TAM sites it recognizes, omega RNA structure, enzyme active sites, temperature sensitivity, metal ion dependence, and the like, and has broad application prospects.

Drawings

FIG. 1 Structure and evolutionary relationship of the newly isolated TnpB. A: Schematic diagram of the TnpB structure. LE: left element sequence; RE: right element sequence. B: Evolutionary relationship between the 22 SiRe-class TnpB proteins and the published ISDra2 TnpB protein.

FIG. 2 TAM sequence, omega RNA sequence, and omega RNA structure of the SiRe-class TnpB. A: TAM and LE. B: omega RNA scaffold and guide sequences. C: omega RNA structure. D: Purification of the SiRe protein.

FIG. 3 Enzyme activity at different temperatures. A: Cleavage of plasmid by SisTnpB1 at different temperatures. L: molecular weight standard; Ctrl: blank control; OC: nicked plasmid; FLL: linear plasmid; SC: supercoiled plasmid; 37-85: reaction temperatures (℃). B: Enzyme activity at different temperatures, quantified from the FLL band. C: Cleavage specificity. Target: the target sequence. D: Metal ion specificity. -: no metal ions. E: Enzyme active site analysis. F: Cleavage pattern of double-stranded DNA. Arrows indicate the cleavage positions.

FIG. 4 SisTnpB1 cleaves both dsDNA and ssDNA. A, B: SisTnpB1 cleaves dsDNA carrying TAM and the target sequence at 75 ℃. C, D: SisTnpB1 cleaves dsDNA carrying TAM and the target sequence at 37 ℃. E, F: SisTnpB1 cleaves dsDNA carrying the target sequence but without TAM at 75 ℃. G, H: SisTnpB1 cleaves dsDNA carrying the target sequence but without TAM at 37 ℃. I, J: SisTnpB1 cleaves dsDNA without the target sequence and TAM at 75 ℃. K, L: SisTnpB1 cleaves ssDNA carrying the target sequence with TAM (K) and without TAM (L) at 75 ℃. M, N: SisTnpB1 cleaves ssDNA carrying the target sequence with TAM (M) and without TAM (N) at 37 ℃. O, P: SisTnpB1 cleaves ssDNA without the target sequence and TAM at 75 ℃. FAM: fluorescent label.

FIG. 5 Exploring the seed sequence and TAM diversity by base mutation. A: Schematic of the pairing between the SisTnpB1 guide RNA and the target sequence. B: Cleavage by SisTnpB1 after mutating 5 consecutive bases of the target sequence. C: Cleavage by SisTnpB1 after single-base mutations at positions +1 to +10 of the target sequence. D: Cleavage by SisTnpB1 after single-base mutations at positions -1 to -5 of the TAM. Ctrl: control without SisTnpB1.

FIG. 6 Editing of a bacterial genome using SisTnpB1.

A: Schematic of interference with two endogenous plasmids of Pediococcus acidilactici by an exogenous plasmid carrying SisTnpB1 and its guide RNA. B: Results of SisTnpB1 interference with the plasmids carrying GE00037 and GE00033, respectively; GE00014+GE00039 are genes common to both plasmids. C: Schematic of gene knockout at two target sites in the pyrE gene of the Pediococcus acidilactici genome by an exogenous plasmid carrying SisTnpB1 and its guide RNA. D: Results of SisTnpB1 gene knockout at the two target sites in the pyrE gene. E: Sequencing of the edited transformants.

Detailed Description

The following definitions and methods are provided to better define the present application and to guide those of ordinary skill in the art in the practice of the present application. Unless otherwise indicated, terms are to be construed according to conventional usage by those of ordinary skill in the relevant art. All patent documents, academic papers, industry standards, and other publications cited herein are incorporated by reference in their entirety.

Unless otherwise indicated, nucleic acids are written left to right in the 5' to 3' direction, and amino acid sequences are written left to right in the amino to carboxyl direction. Amino acids may be represented herein by their commonly known three-letter symbols or by the single-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Likewise, nucleotides may be referred to by their commonly accepted single-letter codes. Numerical ranges include the numbers defining the range. As used herein, "nucleic acid" includes reference to deoxyribonucleotide or ribonucleotide polymers in either single- or double-stranded form and, unless otherwise limited, includes known analogs (e.g., peptide nucleic acids) having the basic properties of natural nucleotides, in that they hybridize to single-stranded nucleic acids in a manner similar to naturally occurring nucleotides. As used herein, the term "encode" or "encoded", when used in the context of a particular nucleic acid, means that the nucleic acid contains the information necessary to direct translation of the nucleotide sequence into a particular protein. The information encoding the protein is represented using codons. As used herein, the "full-length sequence" of a particular polynucleotide or of the protein it encodes refers to the entire nucleic acid sequence or entire amino acid sequence of a natural (non-synthetic) endogenous sequence. A full-length polynucleotide encodes the full-length, catalytically active form of the particular protein. The terms "polypeptide", "peptide", and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogs of the corresponding naturally occurring amino acids, as well as to naturally occurring amino acid polymers.
The terms "residue" or "amino acid" are used interchangeably herein to refer to an amino acid that is incorporated into a protein, polypeptide, or peptide (collectively, "protein"). Amino acids may be naturally occurring amino acids, and unless otherwise limited, may include known analogs of natural amino acids, which analogs may function in a similar manner to naturally occurring amino acids.

In the present application, the terms "comprises," "comprising," or variations thereof, are to be understood to encompass other elements, numbers, or steps in addition to those described. "subject plant" or "subject plant cell" refers to a plant or plant cell in which genetic engineering has been effected, or a progeny cell of a plant or cell so engineered, which progeny cell comprises the engineering. "control" or "control plants" provide a reference point for measuring phenotypic changes in a subject plant.

Those skilled in the art will readily recognize that advances in molecular biology, such as site-specific and random mutagenesis, polymerase chain reaction methods, and protein engineering techniques, provide a wide range of suitable tools and procedures for altering or engineering the amino acid sequences, and the underlying gene sequences, of proteins of interest.

In some embodiments, the nucleotide sequences of the present application may be altered to make conservative amino acid substitutions. The principles and examples of conservative amino acid substitutions are described further below. In certain embodiments, substitutions may be made to the nucleotide sequences of the present application in accordance with published species codon preferences without altering the amino acid sequence. In some embodiments, a portion of the nucleotide sequence in the present application is replaced with a different codon encoding the same amino acid sequence, such that the amino acid sequence encoded thereby is not changed while the nucleotide sequence is changed. Conservative variants include those sequences that encode the amino acid sequence of one of the proteins of an embodiment due to the degeneracy of the genetic code. Those skilled in the art will recognize that amino acid additions and/or substitutions are generally based on the relative similarity of amino acid side chain substituents, e.g., hydrophobicity, charge, size, etc., of the substituents. Exemplary amino acid substituents having various of the aforementioned contemplated properties are well known to those skilled in the art and include arginine and lysine; glutamic acid and aspartic acid; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine. Guidelines for suitable amino acid substitutions that do not affect the biological activity of the protein of interest can be found in the model of Dayhoff et al (1978) Atlas of Protein Sequence and Structure (protein sequence and structure atlas) (Natl. Biomed. Res. Foundation, washington, D.C.), incorporated herein by reference. Conservative substitutions, such as substitution of one amino acid for another with similar properties, may be made. Identification of sequence identity includes hybridization techniques. 
For example, all or part of a known nucleotide sequence is used as a probe for selective hybridization with other corresponding nucleotide sequences present in a cloned genomic DNA fragment or population of cDNA fragments (i.e., a genomic library or cDNA library) from a selected organism. The hybridization probes may be genomic DNA fragments, cDNA fragments, RNA fragments, or other oligonucleotides, and may be labeled with a detectable group such as 32P or other detectable marker. Thus, for example, hybridization probes can be prepared by labeling synthetic oligonucleotides based on the sequences of the embodiments. Methods for preparing hybridization probes and constructing cDNA and genomic libraries are generally known in the art. Hybridization of the sequences may be performed under stringent conditions. As used herein, the term "stringent conditions" or "stringent hybridization conditions" refers to conditions under which a probe will hybridize to its target sequence to a detectably greater extent (e.g., at least 2-fold, 5-fold, or 10-fold over background) relative to hybridization to other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the hybridization stringency and/or controlling the washing conditions, target sequences 100% complementary to the probes can be identified (homologous probe method). Alternatively, stringent conditions can be adjusted to allow for some sequence mismatches in order to detect lower similarity (heterologous probe method). Typically, the probe is less than about 1000 or 500 nucleotides in length. 
Typically, stringent conditions are those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 M to 1.0 M Na ion (or other salts), at pH 7.0 to 8.3, and the temperature is at least about 30 ℃ for short probes (e.g., 10 to 50 nucleotides) and at least about 60 ℃ for long probes (e.g., greater than 50 nucleotides). Stringent conditions can also be achieved by adding destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization at 37 ℃ in a buffer of 30% to 35% formamide, 1 M NaCl, and 1% SDS (sodium dodecyl sulfate), and washing in 1× to 2× SSC (20× SSC = 3.0 M NaCl/0.3 M trisodium citrate) at 50 ℃ to 55 ℃. Exemplary moderately stringent conditions include hybridization in 40% to 45% formamide, 1.0 M NaCl, and 1% SDS at 37 ℃, and washing in 0.5× to 1× SSC at 55 ℃ to 60 ℃. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, and 1% SDS at 37 ℃, and a final wash in 0.1× SSC at 60 ℃ to 65 ℃ for at least about 20 minutes. Optionally, the wash buffer may comprise about 0.1% to about 1% SDS. The duration of hybridization is generally less than about 24 hours, typically from about 4 hours to about 12 hours. Specificity generally depends on the post-hybridization washes, the key factors being the ionic strength and temperature of the final wash solution. The Tm (thermal melting point) of DNA-DNA hybrids can be approximated from the formula of Meinkoth and Wahl (1984) Anal. Biochem. 138:267-284: Tm = 81.5 ℃ + 16.6 (log M) + 0.41 (%GC) - 0.61 (%formamide) - 500/L, where M is the molarity of monovalent cations, %GC is the percentage of guanosine and cytosine nucleotides in the DNA, %formamide is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. 
Tm is the temperature (at a defined ionic strength and pH) at which 50% of the complementary target sequence hybridizes to a perfectly matched probe. Washing is typically performed at least until equilibrium is reached and a low background level of hybridization is achieved, such as for 2 hours, 1 hour, or 30 minutes. Each 1% of mismatch corresponds to a decrease in Tm of about 1 ℃; thus, Tm, hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with ≥90% identity are desired, the Tm can be decreased by 10 ℃. Typically, stringent conditions are selected to be about 5 ℃ lower than the Tm for the specific sequence and its complement at a defined ionic strength and pH. However, under very stringent conditions, hybridization and/or washing may be performed at 4 ℃ below the Tm; under moderately stringent conditions, at 6 ℃ below the Tm; and under low stringency conditions, at 11 ℃ below the Tm.
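As an illustration of the Meinkoth and Wahl approximation and the 1 ℃ per 1% mismatch rule described above, the calculation can be sketched as follows. This is a minimal sketch only; the function names and the example input values are assumptions chosen for illustration and do not come from the application.

```python
import math


def hybrid_tm(monovalent_molar, percent_gc, percent_formamide, length_bp):
    """Approximate Tm (deg C) of a DNA-DNA hybrid.

    Tm = 81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    """
    if monovalent_molar <= 0 or length_bp <= 0:
        raise ValueError("molarity and length must be positive")
    return (81.5
            + 16.6 * math.log10(monovalent_molar)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / length_bp)


def tm_for_identity(tm, percent_identity):
    """Lower Tm by about 1 deg C for each 1% of mismatch tolerated."""
    return tm - (100.0 - percent_identity)


if __name__ == "__main__":
    # Hypothetical inputs: 1 M monovalent cation, 50% GC,
    # 50% formamide, 500 bp hybrid.
    tm = hybrid_tm(1.0, 50.0, 50.0, 500)
    print(f"Tm for a perfect match: {tm:.1f}")
    print(f"Tm allowing 90% identity: {tm_for_identity(tm, 90.0):.1f}")
```

The wash or hybridization temperature for a given stringency is then obtained by subtracting the offset stated above (for example, about 5 ℃) from the computed Tm.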

In some embodiments, fragments of the nucleotide sequence and the amino acid sequence encoded thereby are also included. As used herein, the term "fragment" refers to a portion of the nucleotide sequence of a polynucleotide or a portion of the amino acid sequence of a polypeptide of an embodiment. Fragments of a nucleotide sequence may encode protein fragments that retain the biological activity of the native or corresponding full-length protein and thus have protein activity. Mutant proteins include biologically active fragments of a native protein that comprise consecutive amino acid residues that retain the biological activity of the native protein.

In the present invention, a "target sequence" or "target polynucleotide" or "target nucleic acid" may be any polynucleotide that is endogenous or exogenous to a cell (e.g., a prokaryotic or eukaryotic cell). For example, the target polynucleotide may be a polynucleotide that is present in the nucleus of a eukaryotic cell. The target polynucleotide may be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or unwanted DNA). In some cases, the target sequence should be associated with a protospacer adjacent motif (PAM) or a transposon-associated motif (TAM).

"Target sequence" refers to a polynucleotide targeted by a guide sequence in a gRNA, e.g., a sequence that has complementarity to the guide sequence, wherein hybridization between the target sequence and the guide sequence will promote the formation of a CRISPR or TnpB complex (including Cas or TnpB protein and gRNA). Complete complementarity is not necessary so long as sufficient complementarity exists to cause hybridization and promote the formation of a complex. The target sequence may comprise any polynucleotide, such as DNA or RNA. In some cases, the target sequence is located either inside or outside the cell. In some cases, the target sequence is located in the nucleus or cytoplasm of the cell. In some cases, the target sequence may be located within an organelle of a eukaryotic cell, such as a mitochondria or chloroplast.

The following examples are illustrative of the application and are not intended to limit the scope of the application. Modifications and substitutions to methods, procedures, or conditions of the present application without departing from the spirit and nature of the application are intended to be within the scope of the present application. Unless otherwise indicated, the examples follow conventional experimental conditions, such as those in the molecular cloning laboratory manual of Sambrook et al. (Sambrook J & Russell DW, Molecular Cloning: A Laboratory Manual, 2001), or the conditions recommended in the manufacturer's instructions. Unless otherwise indicated, all chemical reagents used in the examples were conventional commercial reagents, and the technical means used in the examples were conventional means well known to those skilled in the art.

Example 1 Isolation of TnpB proteins from archaea

Because TnpB is smaller than Cas9, it improves cell delivery efficiency and can be used for genome editing in vivo and in vitro; as a new nucleic acid editing tool, it has great prospects in gene therapy, genome modification of species, trait improvement, and related fields. Developing different types of TnpB proteins facilitates the use of this enzyme in nucleic acid editing. However, TnpB is a transposon-encoded protein that is ubiquitous in microorganisms: more than 400,000 TnpB proteins are annotated in the NCBI database, and not every TnpB can be used as a tool for nucleic acid cleavage and editing.

So far, RNA-guided endonucleases used for genome editing have been identified only from bacteria. Bacterial insertion sequences of the IS200/IS605 and IS607 families encode RNA-guided TnpB endonucleases that have been reprogrammed for genome editing. Insertion sequences of these families are also widely distributed in archaea. However, it is unclear whether TnpB proteins usable for genome editing exist in archaea and, in particular, which ones.

The invention identified 149,194 IS200/605 elements from sequenced and assembled archaeal genomes and screened out 8,574 of them that encode both the TnpA and TnpB genes. Alignment of conserved catalytic amino acid residues indicated that the TnpB proteins encoded by some of these IS200/605 elements may have programmable nuclease activity. They are evolutionarily distant from the published ISDra2 TnpB protein, share very low similarity with it (< 30%), constitute a novel class of TnpB proteins, and may have properties different from those of ISDra2 TnpB.
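The first screening step, keeping only those IS200/605 elements that encode both TnpA and TnpB, can be pictured with the following minimal sketch. The application does not state which annotation tools or data formats were used, so the `ISElement` record, its field names, and the example identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ISElement:
    """Hypothetical record for one annotated IS200/605 element."""
    element_id: str
    genes: set = field(default_factory=set)  # gene names annotated in the element


def encodes_tnpa_and_tnpb(element):
    """True if the element carries both a tnpA and a tnpB gene."""
    return {"tnpA", "tnpB"} <= element.genes


def screen_elements(elements):
    """Keep only the elements that encode both TnpA and TnpB."""
    return [e for e in elements if encodes_tnpa_and_tnpb(e)]


if __name__ == "__main__":
    # Toy input; identifiers are made up for illustration.
    toy = [
        ISElement("IS_a", {"tnpA", "tnpB"}),
        ISElement("IS_b", {"tnpB"}),
        ISElement("IS_c", {"tnpA"}),
    ]
    print([e.element_id for e in screen_elements(toy)])
```

The later steps (alignment of catalytic residues and activity experiments) are experimental or alignment-based and are not covered by this sketch.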

By aligning the left element sequences of IS200/605, conserved TAM sequences were identified immediately outside the conserved left element; the guide RNA sequences, and the guide sequences immediately adjacent at their 3' end, were predicted from the conserved right element sequences. Activity experiments further showed that 22 proteins have cleavage activity, whereas other proteins, such as WP_198539375.1, WP_240570379.1, WP_240570379.1, MCL4379344.1, WP_014512934.1, WP_014513195.1, and ADX85008, have no cleavage activity. These 22 proteins (listed in Table 1) are therefore expected to be developed into valuable TnpB programmable nucleases.

Table 1 Isolated TnpB proteins with cleavage activity

The structure, sequence similarity, and evolutionary relationship of these 22 proteins all differ significantly from those of ISDra2 (WP_010887311.1) (FIG. 1).

RNA alignment and bioinformatic analysis showed that the TAM (Transposon-Associated Motif) also differs from that of ISDra2 and is 5'-TTTAA, and that the sequence and structure of the omega RNA also differ from those of ISDra2 (see FIG. 2 for details), the double-hairpin structure being unique. The sequence of the omega RNA obtained is as follows:
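A similarity figure such as "< 30%" is typically derived from a pairwise alignment. The sketch below computes percent identity from two already-aligned sequences. The application does not say which alignment program or which identity definition was used, so the definition here (identical columns divided by all columns that are not gap-to-gap) and the toy sequences are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; every other
    column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Toy aligned fragments, not real TnpB sequences.
    print(f"{percent_identity('MKT-LV', 'MRTALV'):.1f}%")
```

Other identity definitions (for example, dividing by the length of the shorter sequence) give different values, so a threshold comparison is only meaningful when the same definition is used throughout.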

UUAAGAAGGACUUGACUUUGGCUGACCGUGUGUUUGUAUGUCCUAAAUGUGGUUGGACUGUAGAUCGUGACUAUAAUGCUUCUCUAAAUAUUCUUCGUGCGGGGUCGGGACUGCCCUUAGAGCCUGUGGACAGGGGACCUCUGCUAUACAUUCCCUUCUCAGAAGGGGUGUAUAGUAAGUUUCUUGGAAGAAGCAGGAAAUCUCCAUCGUGAGGUGGAGAUGCCACGUCCGUAAGGGCGGGGUUGUUCAC.

Example 2 Analysis of SiRe protein properties

The invention further analyzes the properties of the SiRe proteins, mainly activity at different temperatures, cleavage specificity, metal ion dependence, enzyme active sites, and the double-stranded DNA cleavage pattern.

1. Activity at different temperatures.

One of the SiRe proteins, SiRe_0632 (designated SisTnpB1), was tested in vitro at different temperatures for cleavage of a plasmid containing its specific target site and TAM. The specific steps are as follows:

SisTnpB1 RNP complex (5.4 nM; the complex formed by SisTnpB1 and ωRNA) and 110 ng of pUC19 plasmid DNA carrying cloned double-stranded oligonucleotides with different target and TAM sequences were added to a reaction buffer containing 10 mM Tris-HCl (pH 7.5), 1 mM DTT, 1 mM EDTA, 100 mM NaCl, and 10 mM MgCl₂, and reacted at different temperatures (37-85 ℃) for 60 minutes. The reaction was stopped by adding 20 mM proteinase K and 4% SDS solution and incubating at 37 ℃ for 1 hour. A 2× loading dye was then added, and the cleavage reaction was analyzed by agarose gel or denaturing PAGE electrophoresis. DNA fragments in agarose gels were visualized by ethidium bromide staining, and DNA fragments in denaturing PAGE gels were detected using a FUJIFILM scanner (FLA-5100).

The results showed that the programmable nuclease had specific cleavage activity at 37 to 85 ℃, with optimal activity at 75 ℃ (FIG. 3A, B).
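To set up the reaction buffer described above from concentrated stocks, the volume of each component follows from C1·V1 = C2·V2. The sketch below uses the final concentrations stated in the method; the stock concentrations and the 20 µL reaction volume are hypothetical values chosen only for illustration.

```python
# Final concentrations (mM) taken from the method described above.
FINAL_MM = {
    "Tris-HCl pH 7.5": 10.0,
    "DTT": 1.0,
    "EDTA": 1.0,
    "NaCl": 100.0,
    "MgCl2": 10.0,
}

# Hypothetical stock concentrations (mM); not specified in the application.
STOCK_MM = {
    "Tris-HCl pH 7.5": 1000.0,
    "DTT": 100.0,
    "EDTA": 500.0,
    "NaCl": 5000.0,
    "MgCl2": 1000.0,
}


def component_volumes(reaction_ul, final_mm=FINAL_MM, stock_mm=STOCK_MM):
    """Volume (uL) of each stock needed so that C1*V1 = C2*V2 holds."""
    volumes = {}
    for name, final in final_mm.items():
        stock = stock_mm[name]
        if final > stock:
            raise ValueError(f"stock of {name} is too dilute")
        volumes[name] = reaction_ul * final / stock
    return volumes


if __name__ == "__main__":
    # 20 uL is an assumed reaction volume.
    for name, vol in component_volumes(20.0).items():
        print(f"{name}: {vol:.2f} uL")
```

In practice such small volumes are usually handled by preparing a concentrated buffer premix; the remaining volume is taken up by the RNP complex, plasmid DNA, and water.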

2. Specificity of cleavage.

Cleavage specificity was measured by reacting the SisTnpB1 RNP complex with pUC19 plasmid DNA carrying neither the target sequence nor the TAM sequence, carrying the target sequence but no TAM sequence, or carrying both the target and TAM sequences, according to the method described in 1, with the reaction temperature set at 75 ℃. The programmable nuclease was found to be highly specific, showing cleavage activity only when both the TAM sequence and the target sequence were present (FIG. 3C).

3. Metal ion dependence.

The reaction temperature was set at 37 ℃, the 10 mM MgCl₂ in the buffer was replaced with MnCl₂, CaCl₂, ZnCl₂, or NiCl₂, and metal ion dependence was examined according to the method described in 1. The programmable nuclease was found to accept a broad range of metal ions, with the highest activity in the presence of manganese ions and lower activity in the presence of zinc or nickel ions (FIG. 3D). Therefore, adding metal ions such as magnesium, manganese, and calcium, and especially manganese, can increase the enzyme activity. Specific data are shown in Table 2 (enzyme activity is reflected by the percentage of linear plasmid).

Table 2 Quantified enzyme activity of SisTnpB1 with different metal ions

4. Enzyme active site.

Through sequence alignment and analysis of the SiRe proteins, the invention identified several putative nuclease active sites in the RuvC domain. Each of these sites was mutated to obtain mutant proteins, and the enzymatic activity of the mutants was measured according to the method described in 1, with the reaction temperature set at 37 ℃. Enzyme activity decreased significantly after mutation of either of two sites, D187 and E271 (FIG. 3E), demonstrating the important role of these two sites in enzyme activity. Because the mutant proteins lose cleavage activity, they still recognize the target sequence under the guidance of ωRNA but cannot cleave it. The SiRe proteins can therefore be modified by mutating these two sites and, combined with an adenine or cytosine deaminase, developed into base editors that achieve single-base editing of a target sequence.
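Locating the residue that corresponds to D187 or E271 in another SiRe protein amounts to walking along a pairwise alignment with the reference sequence. The following minimal sketch shows that coordinate mapping; the alignment itself is assumed to come from any standard alignment program, and the function name and toy sequences are hypothetical.

```python
def map_reference_position(ref_aligned, query_aligned, ref_pos, gap="-"):
    """Map a 1-based position in the reference to the aligned query.

    Returns (query_position, query_residue), or None if the reference
    residue is aligned to a gap in the query.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")
    ref_count = 0
    query_count = 0
    for ref_res, query_res in zip(ref_aligned, query_aligned):
        if ref_res != gap:
            ref_count += 1
        if query_res != gap:
            query_count += 1
        if ref_res != gap and ref_count == ref_pos:
            if query_res == gap:
                return None
            return query_count, query_res
    raise IndexError("reference position lies outside the alignment")


if __name__ == "__main__":
    # Toy alignment; real use would align full-length protein sequences.
    ref = "MD-EK"
    qry = "-DAEK"
    print(map_reference_position(ref, qry, 2))  # reference D at position 2
```

A mapped residue that is not D (for position 187) or E (for position 271) would suggest that the alignment should be checked before designing the mutation.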

5. Double-stranded DNA cleavage scheme.

The linearized double-stranded plasmid formed after cleavage was further recovered and sequenced. The sequencing results showed that the double-stranded DNA was cut in a staggered manner, at 15-18 nt from the TAM on the non-target strand and at 20-28 nt from the TAM on the target strand (FIG. 3F), producing cohesive ends with 5' overhangs.

These results show that the SiRe proteins differ from ISDra2 in their activity at different temperatures, cleavage specificity, dependence on metal ions, enzyme active site and double-stranded DNA cleavage pattern, revealing some unique properties of such proteins.

Example 3 analysis of the dependence of SisTnpB protein on target and TAM sequences

To investigate whether SisTnpB1 was able to cleave double-stranded DNA (dsDNA) and single-stranded DNA (ssDNA) simultaneously, the present invention tested the cleavage capacity of TnpB at different temperatures with short double-stranded or single-stranded oligonucleotides with or without TAM and target sequences as substrates. TnpB1 was found to rapidly cleave dsDNA carrying TAM and 20bp target sequence at 75deg.C; whereas cleavage of the target sequence and non-target sequence (NTS, nontarget sequence) sequences was strongly reduced at 37 ℃. If no omega RNA matching sequence is present, sisTnpB1 has no cleavage activity and shows stronger specificity. At an optimal enzyme activity temperature of 75 ℃, sisTnpB1 showed very weak cleavage on the target strand of the dsDNA substrate, whereas there was no cleavage on the non-target strand without TAM sequence. Little cleavage activity at 37℃showed a more stringent PAM dependence, consistent with plasmid cleavage results. SisTnpB1 also cleaved a matched single stranded DNA at 75℃whether TAM was present or not. Whereas the presence of TAM sequences on ssDNA substrates resulted in higher cleavage efficiency (fig. 4).

These results indicate that SisTnpB1 is highly target-dependent over a broad temperature range when used to cleave double-stranded DNA, and is also strongly target-dependent when used to cleave single-stranded DNA.

Example 4 development of TAM site diversity

To investigate whether SisTnpB1 recognizes more diverse TAM sites, the invention introduced base mutations into the target sequence and the adjacent TAM and studied the effect of the target sequence and TAM sequence on SisTnpB1 endonuclease activity (fig. 5A). Mutations M1-M5, which converted +1 gccaa +5 to +1 cggtt +5 in the target sequence, almost abolished DNA cleavage by SisTnpB1 (fig. 5B), whereas mutations M11-M15 and M16-M20 had less effect on target DNA cleavage, indicating that the target sequence essential for cleavage is located at positions +1 to +10. In addition, mutations M-1 to M-5, in which the TAM sequence was mutated, almost eliminated DNA cleavage (fig. 5B).
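The block mutation design described above can be illustrated with a short Python sketch. It replaces a block of target positions with complementary bases, which is the pattern of the M1-M5 example (gccaa to cggtt). This is a minimal sketch under stated assumptions: that every mutation block follows the same complementary-substitution pattern, that blocks are 5 nt wide, and that positions are numbered from +1 at the TAM-proximal end. The function names and the 20-nt example target (apart from its first five bases) are hypothetical and do not come from the invention.

```python
# Complementary-base substitution (A<->T, G<->C), preserving letter case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def mutate_block(target, first, last):
    """Return target with positions first..last (1-based, inclusive)
    replaced by their complementary bases."""
    if not 1 <= first <= last <= len(target):
        raise ValueError("block must lie within the target sequence")
    i, j = first - 1, last
    return target[:i] + target[i:j].translate(COMPLEMENT) + target[j:]


def block_scan(target, block=5):
    """Build one mutant per consecutive block (M1-M5, M6-M10, ...).

    The block width of 5 is an assumption taken from the M1-M5 naming.
    """
    variants = {}
    for first in range(1, len(target) + 1, block):
        last = min(first + block - 1, len(target))
        variants[f"M{first}-M{last}"] = mutate_block(target, first, last)
    return variants


# Hypothetical 20-nt target: only the first five bases (gccaa) are from the text.
EXAMPLE_TARGET = "gccaa" + "acgtacgtacgtacg"

if __name__ == "__main__":
    for name, seq in block_scan(EXAMPLE_TARGET).items():
        print(name, seq)
```

Each variant differs from the original target in exactly one block, so a loss of cleavage can be attributed to the positions in that block.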

Single-base mutations were then introduced into the seed and TAM. Mutation of the +1 nucleotide of the target sequence strongly inhibited SisTnpB1 cleavage, while the other single-base mutations had less effect on SisTnpB1 cleavage of the target DNA (fig. 5C), indicating that SisTnpB1 is highly tolerant of mutations in the target DNA sequence.

More importantly, SisTnpB1 can recognize a wider variety of TAM sequences, including 5'-TTTAA, 5'-ATTAA, 5'-TCTAA, 5'-TGTAA, 5'-TATAA, 5'-TTCAA, 5'-TTAAA, 5'-TTTCA, 5'-TTTGA, 5'-TTTTA, 5'-TTTAC, 5'-TTTAG and 5'-TTTAT, which greatly expands the application range of the protein.
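To show how the TAM list above might be used when choosing candidate target sites, the following Python sketch scans a DNA sequence for any of the listed TAMs and reports the sequence immediately 3' of each one. It is only a sketch under assumptions: the target is taken to be the 20 nt directly downstream of the TAM on the same strand (consistent with the +1 numbering and the 20 bp target used in the examples), only the given strand is scanned, and the function name and example sequence are hypothetical.

```python
# TAM variants listed in the description (written 5'->3').
TAMS = frozenset({
    "TTTAA", "ATTAA", "TCTAA", "TGTAA", "TATAA", "TTCAA", "TTAAA",
    "TTTCA", "TTTGA", "TTTTA", "TTTAC", "TTTAG", "TTTAT",
})
TAM_LEN = 5
TARGET_LEN = 20  # assumption: 20-nt target, as in the dsDNA cleavage example


def find_candidate_targets(seq, tams=TAMS, target_len=TARGET_LEN):
    """Return (tam_start, tam, target) for every listed TAM in seq that is
    followed by a full-length target on the same strand (0-based start)."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - TAM_LEN - target_len + 1):
        tam = seq[i:i + TAM_LEN]
        if tam in tams:
            target = seq[i + TAM_LEN:i + TAM_LEN + target_len]
            hits.append((i, tam, target))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: 2 nt flank, TAM, 20-nt target, 2 nt flank.
    example = "GG" + "TTTAA" + "GCCAAACGTACGTACGTACG" + "CC"
    for start, tam, target in find_candidate_targets(example):
        print(start, tam, target)
```

To cover the opposite strand as well, the same function could be applied to the reverse complement of the sequence.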

Example 5 editing of microbial, animal and plant cell genomes using SisTnpB1

The invention selected Pediococcus acidilactici as a representative to test the effect of SisTnpB1 on editing the bacterial genome. The specific steps are as follows:

An interference plasmid encoding the SisTnpB1 RNP was transformed into Pediococcus acidilactici. The guide RNA on this plasmid (with 5'-TTTAA as TAM) was designed to target an endogenous plasmid of the bacterium, and the results after transformation indicated that the endogenous plasmid was targeted and eliminated by cleavage (fig. 6A, B).

In addition, a knockout plasmid for the protein was designed, and two target sites on the pyrE gene, with 5'-ATTTAA or 5'-TTTAT as TAM, were selected. After transformation, the cells were cultured at 37 ℃ and 45 ℃ respectively, and PCR amplification with primers designed around the target sites showed that the gene was knocked out with high efficiency (fig. 6C, D). Deletion of the sequence at the target sites was verified by sequencing (fig. 6E).

These results show that SisTnpB1 of the invention can be applied to interference and genome editing in bacteria; the same experimental design also achieved high editing efficiency when used in Lactobacillus reuteri and Escherichia coli.

In addition, the effect of SisTnpB1 on editing animal cells was tested, with HEK293T cells as a representative. Target sequences following the optimal TAM 5'-TTTAA were selected at three sites, AGBL, EMX1 and AAVS1. High-throughput sequencing of the target sequences revealed DNA double-strand break (DSB) repair at all three SisTnpB1 targets, whereas no editing of the targets was detected in the control HEK293T cells. These results show that SisTnpB1 can successfully edit the genome of animal cells.

Likewise, maize was selected to test the effect of the protein on plant genome editing. The maize editing system was designed to target regions adjacent to the optimal TAM in the ms26 and wax genes. Starting 24 h after transformation, the immature embryos were incubated at 45 ℃ for 4 hours each day for a total of 3 days. Embryos incubated at 37 ℃ served as a control; the timing and duration of incubation were the same as for the 45 ℃ treatment, and the embryos were kept at 37 ℃ throughout the experiment. After the treatment, embryos were collected and the target regions were subjected to high-throughput sequencing, which showed that SisTnpB1 generated targeted mutations at both the ms26 and wax sites under the 45 ℃ incubation treatment, indicating successful editing in plant cells.

Example 6 use of SisTnpB1 to combat phage infection

SisTnpB1 may also be used to combat phage infection.

Guide RNA interference plasmids that specifically target the E. coli phage genome were designed. The plasmid was transformed into the E. coli Rosetta strain, which was grown in liquid culture to an OD of 0.6-0.8 and plated. Ten-fold serial dilutions of the E. coli phages T5 and T7 were then applied to different areas of the plates. After about 8 hours, E. coli containing SisTnpB1 and the guide RNA was found to be resistant to phage infection compared with the control.

Tests showed that the other SiRe proteins besides SisTnpB1 also have the same characteristics, functions and technical effects.

While the invention has been described in detail in the foregoing general description and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims

1. A composition, comprising:

a) A TnpB protein or one or more nucleic acid molecules encoding the TnpB protein;

b) An omega RNA molecule or one or more nucleic acid molecules encoding the omega RNA molecule, the omega RNA molecule capable of forming a complex with a) the TnpB protein and directing TnpB to recognize a target nucleic acid;

wherein the TnpB protein is selected from the protein with sequence number 6 in Table 1;

wherein the omega RNA molecule has a double hairpin structure and its sequence is: UUAAGAAGGACUUGACUUUGGCUGACCGUGUGUUUGUAUGUCCUAAAUGUGGUUGGACUGUAGAUCGUGACUAUAAUGCUUCUCUAAAUAUUCUUCGUGCGGGGUCGGGACUGCCCUUAGAGCCUGUGGACAGGGGACCUCUGCUAUACAUUCCCUUCUCAGAAGGGGUGUAUAGUAAGUUUCUUGGAAGAAGCAGGAAAUCUCCAUCGUGAGGUGGAGAUGCCACGUCCGUAAGGGCGGGGUUGUUCAC.

2. The composition of claim 1, wherein the composition further comprises one or more metal ions.

3. The composition of claim 2, wherein the metal ions comprise magnesium ions or manganese ions or calcium ions.

4. A composition according to claim 3, wherein the concentration of the metal ions is 10mM.

5. A vector system capable of encoding the composition of any one of claims 1-4.

6. An engineered host cell comprising the composition of any one of claims 1-4, said host cell being a microbial cell or an animal cell.

7. Use of the composition of any one of claims 1-4, or the vector system of claim 5, or the host cell of claim 6 in the field of nucleic acid recognition or modification, for purposes other than disease diagnosis and treatment.

8. The use of claim 7, wherein the use comprises targeted cleavage of double-stranded DNA, single-stranded DNA, or targeted recognition of a target nucleic acid.

9. The use of claim 6, further comprising combating phage infection.

10. A method of nucleic acid recognition or modification, characterized in that a target nucleic acid and the composition of any one of claims 1-4 are placed in an environment of 37-85 ℃, said method being for purposes other than disease diagnosis and treatment.

11. The method of claim 10, wherein the ambient temperature is 37 ℃ or 42 ℃ or 55 ℃ or 65 ℃ or 75 ℃ or 85 ℃.

12. The method of claim 11, wherein the ambient temperature is 75 ℃.