CN116355878A

CN116355878A - Novel TnpB programmable nuclease and application thereof

Info

Publication number: CN116355878A
Application number: CN202310177144.3A
Authority: CN
Inventors: 彭楠; 徐颖; 刘涛; 王静; 熊彬杨
Original assignee: Huazhong Agricultural University
Current assignee: Huazhong Agricultural University
Priority date: 2023-02-28
Filing date: 2023-02-28
Publication date: 2023-06-30
Anticipated expiration: 2043-02-28
Also published as: CN116355878B

Abstract

The invention relates to a novel TnpB programmable nuclease and belongs to the field of nucleic acid editing. Specifically, the invention provides a novel TnpB programmable nuclease that has low homology to previously reported TnpB enzymes, differs substantially from them in its characteristics, exhibits nuclease activity both inside and outside cells, and has broad application prospects.

Description

Novel TnpB programmable nuclease and application thereof

Technical Field

The invention relates to a novel TnpB programmable nuclease and belongs to the field of nucleic acid editing.

Background

TnpB is a transposon-encoded protein belonging to the OMEGA system. In recent years it has been found to be a novel RNA-guided programmable nuclease with nucleic acid cleavage and editing activity (Karvelis T, Druteika G, Bigelyte G, et al. Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease [J]. Nature, 2021, 599(7886): 692-696; Altae-Tran H, Kannan S, Demircioglu FE, et al. The widespread IS200/IS605 transposon family encodes diverse programmable RNA-guided endonucleases [J]. Science, 2021, 374(6563): 57-65). Because TnpB is smaller than Cas9, it has an advantage in cell delivery efficiency, can be used for genome editing and modification in vivo and in vitro, and, as a novel nucleic acid editing tool, shows great promise for future applications in gene therapy, species genome modification, trait improvement, and similar fields. The discovery of different types of TnpB proteins facilitates the use of this enzyme in the field of nucleic acid editing. However, TnpB is a transposon-encoded protein that is ubiquitous in microorganisms: more than 400,000 TnpB proteins are annotated in the NCBI database, and not every TnpB can be used as a tool for nucleic acid cleavage and editing.

ISDra2 TnpB from Deinococcus radiodurans is a well-studied TnpB programmable nuclease. It recognizes 5'-TTGAT as its TAM sequence and, guided by a gRNA with a single-hairpin structure, cleaves the non-target strand and the target strand at the target site in a staggered manner to form cohesive ends, thereby allowing recognition, cleavage, and editing of target nucleic acid sequences in vitro and in vivo.

To discover more TnpB programmable nucleases whose characteristics differ from those of ISDra2, the invention isolates several novel TnpB programmable nucleases from archaea. They are evolutionarily distant from ISDra2 and share low similarity with it (less than 30%). The invention further finds that these TnpB programmable nucleases have unique value in terms of the TAM sites recognized, omega RNA structure, enzyme active sites, temperature sensitivity, metal ion dependence, and the like, and have great application prospects.

Disclosure of Invention

The invention provides a composition, which is characterized by comprising the following components:

a) A TnpB protein or one or more nucleotide sequences encoding the TnpB protein;

b) An omega RNA molecule or one or more nucleotide sequences encoding the omega RNA molecule, the omega RNA molecule capable of forming a complex with a) the TnpB protein and directing the recognition of a target sequence by TnpB;

wherein the TnpB protein is selected from any one of the sequences 1-22 listed in Table 1;

wherein the omega RNA molecule is of a double hairpin structure; optionally, the sequence of the omega RNA molecule is: UUAAGAAGGACUUGACUUUGGCUGACCGUGUGUUUGUAUGUCCUAAAUGUGGUUGGACUGUAGAUCGUGACUAUAAUGCUUCUCUAAAUAUUCUUCGUGCGGGGUCGGGACUGCCCUUAGAGCCUGUGGACAGGGGACCUCUGCUAUACAUUCCCUUCUCAGAAGGGGUGUAUAGUAAGUUUCUUGGAAGAAGCAGGAAAUCUCCAUCGUGAGGUGGAGAUGCCACGUCCGUAAGGGCGGGGUUGUUCAC.

In some embodiments, the TAM (Transposon Associated Motif) sequence at the 5' end of the above target sequence is 5'-TTTAA, 5'-ATTAA, 5'-TCTAA, 5'-TGTAA, 5'-TATAA, 5'-TTCAA, 5'-TTAAA, 5'-TTTCA, 5'-TTTGA, 5'-TTTTA, 5'-TTTAC, 5'-TTTAG, or 5'-TTTAT.
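As an illustration of how the listed TAM variants could be used to locate candidate sites, the following is a minimal Python sketch rather than part of the disclosed method. The function name `find_tam_sites`, the 20-nt window (taken from the 20 bp target used in Example 3), and the choice to scan only the strand as given are assumptions made for the example; the last three motifs are spelled as they appear in Example 4.

```python
# TAM variants listed in this disclosure, written 5'->3'.
TAMS = frozenset({
    "TTTAA", "ATTAA", "TCTAA", "TGTAA", "TATAA", "TTCAA", "TTAAA",
    "TTTCA", "TTTGA", "TTTTA", "TTTAC", "TTTAG", "TTTAT",
})

TAM_LEN = 5


def find_tam_sites(dna, target_len=20):
    """Return (index, TAM, adjacent sequence) for every listed TAM.

    Hypothetical helper: scans only the strand as given and reports the
    `target_len` bases immediately 3' of each TAM (assumed layout).
    """
    dna = dna.upper()
    hits = []
    # Only report sites with a complete adjacent window.
    for i in range(len(dna) - TAM_LEN - target_len + 1):
        tam = dna[i:i + TAM_LEN]
        if tam in TAMS:
            start = i + TAM_LEN
            hits.append((i, tam, dna[start:start + target_len]))
    return hits


if __name__ == "__main__":
    example = "GGGG" + "TTTAA" + "CGTACGTACGTACGTACGTA" + "CC"
    for index, tam, adjacent in find_tam_sites(example):
        print(index, tam, adjacent)
```

The sketch does not score or rank sites; whether a reported site is actually cleaved depends on the experimental factors described in the examples.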

In some embodiments, the above-described composition further comprises one or more metal ions; the metal ions comprise magnesium ions, manganese ions, or calcium ions; the concentration of the metal ions is 10 mM. Adding these metal ions can enhance the activity of the TnpB described above.

The invention also provides a vector system capable of encoding the above composition.

The invention also provides an engineered host cell containing the composition; in some embodiments, the host cell is a microbial cell, such as a Pediococcus acidilactici, Lactobacillus reuteri, or E. coli cell; or an animal cell, such as a HEK293T cell; or a plant cell.

The invention also provides application of the composition, the vector system and the host cell in the field of nucleic acid recognition or modification.

The invention also provides application of the composition, the vector system and the host cell in the field of phage infection resistance.

The invention also provides a nucleic acid recognition or modification method, characterized in that a target sequence and the composition are placed in an environment of 37-85 ℃; in some embodiments, the ambient temperature is 37 ℃, 42 ℃, 55 ℃, 65 ℃, 75 ℃, or 85 ℃; in some embodiments, the ambient temperature is 75 ℃. The TnpB provided by the invention tolerates a wide range of temperatures and can therefore be used with different host cells cultured at different temperatures. In addition, the enzymatic activity of the TnpB described above is highest at an ambient temperature of 75 ℃.

The present invention also provides a TnpB mutant protein, characterized in that the protein contains any one of the sequences No. 1 to 22 listed in Table 1 and has a mutation at the position corresponding to D187 and/or E271 of sequence No. 6. The D187 and E271 sites of the SiRe_0632 sequence (protein No. 6) are key sites for enzyme activity, and mutating them severely impairs that activity. Both sites are conserved in the 22 SiRe proteins provided by the invention, so the corresponding sites in any of the SiRe proteins can be located by protein sequence alignment and mutated to obtain the mutant protein.

The invention also provides application of the mutant protein in the field of nucleic acid recognition or modification. The mutant proteins lose enzymatic activity and thus recognize the target sequence under the direction of ωRNA, but cannot cleave. In combination with adenine or cytosine deaminase, a base editor can be developed to realize single base editing of a target sequence.

The invention has the beneficial effects that: the invention provides a novel TnpB enzyme, which has low homology with reported TnpB enzyme ISDra2, has larger difference in characteristics, can realize nuclease activity in cells and outside cells, has unique value in the aspects of identified TAM sites, omega RNA structures, enzyme activity sites, temperature sensitivity, metal ion dependence and the like, and has wide application prospect.

Drawings

FIG. 1 Structure and evolutionary relationship of the newly isolated TnpB. A: Schematic diagram of the TnpB structure. LE: left element sequence; RE: right element sequence. B: Evolutionary relationship between the 22 SiRe-class TnpB proteins and the published ISDra2 TnpB protein.

FIG. 2 TAM sequence, omega RNA sequence, and structure of SiRe-class TnpB. A: TAM and LE. B: omega RNA scaffold and guide sequences. C: omega RNA structure. D: Purification of the SiRe protein.

FIG. 3 Enzyme activity at different temperatures. A: Cleavage of the plasmid by SisTnpB1 at different temperatures. L: molecular weight standard; Ctrl: blank control group; OC: open-circular (nicked) plasmid; FLL: full-length linear plasmid; SC: supercoiled plasmid; 37-85: different temperatures (℃). B: Enzyme activity at different temperatures, as reflected by quantification of FLL. C: Specificity of cleavage. Target represents the target sequence. D: Metal ion specificity. -: no metal ions. E: Enzyme active site analysis. F: Cleavage pattern of double-stranded DNA. The arrows indicate the cleavage positions.

FIG. 4 SisTnpB1 cleaves both dsDNA and ssDNA. A, B: SisTnpB1 cleaves dsDNA carrying the TAM and the target sequence at 75 ℃. C, D: SisTnpB1 cleaves dsDNA carrying the TAM and the target sequence at 37 ℃. E, F: SisTnpB1 cleaves dsDNA carrying the target sequence but without the TAM at 75 ℃. G, H: SisTnpB1 cleaves dsDNA carrying the target sequence but without the TAM at 37 ℃. I, J: SisTnpB1 cleaves dsDNA without the target sequence and the TAM at 75 ℃. K, L: SisTnpB1 cleaves ssDNA carrying the target sequence with (K) and without (L) the TAM at 75 ℃. M, N: SisTnpB1 cleaves ssDNA carrying the target sequence with (M) and without (N) the TAM at 37 ℃. O, P: SisTnpB1 cleaves ssDNA without the target sequence and the TAM at 75 ℃. FAM: fluorescent label.

FIG. 5 Exploring the seed sequence and TAM diversity by base mutation. A: Schematic representation of the pairing between the guide RNA of SisTnpB1 and the target sequence. B: Cleavage by SisTnpB1 of target sequences in which 5 consecutive bases were mutated. C: Cleavage by SisTnpB1 after single-base mutation at positions +1 to +10 of the target sequence. D: Cleavage by SisTnpB1 after single-base mutation at positions -1 to -5 of the TAM. Ctrl: control group without SisTnpB1.

FIG. 6 Editing of bacterial genomes using SisTnpB1.

A: Schematic diagram of interference with two endogenous plasmids of Pediococcus acidilactici using the designed exogenous SisTnpB1 and its guide RNA. B: Interference by SisTnpB1 with the plasmids carrying GE00037 and GE00033, respectively; GE00014+GE00039 is a gene common to both plasmids. C: Schematic diagram of gene knockout at two target sites in the pyrE gene of the Pediococcus acidilactici genome using the designed exogenous SisTnpB1 and its guide RNA. D: Results of gene knockout by SisTnpB1 at the two target sites in the pyrE gene. E: Sequencing of the edited transformants.

Detailed Description

The following definitions and methods are provided to better define the present application and to guide those of ordinary skill in the art in the practice of the present application. Unless otherwise indicated, terms are to be construed according to conventional usage by those of ordinary skill in the relevant art. All patent documents, academic papers, industry standards, and other publications cited herein are incorporated by reference in their entirety.

Unless otherwise indicated, nucleic acids are written in the 5 'to 3' direction from left to right; the amino acid sequence is written in the amino to carboxyl direction from left to right. Amino acids may be represented herein by their commonly known three-letter symbols or by the single-letter symbols recommended by the IUPAC-IUB biochemical nomenclature committee. Likewise, nucleotides may be referred to by commonly accepted single letter codes. The numerical range includes the numbers defining the range. As used herein, "nucleic acid" includes reference to deoxyribonucleotide or ribonucleotide polymers in either single-or double-stranded form, and unless otherwise limited, includes known analogs (e.g., peptide nucleic acids) having the basic properties of natural nucleotides that hybridize to single-stranded nucleic acids in a manner similar to naturally occurring nucleotides. As used herein, the term "encode" or "encoded" when used in the context of a particular nucleic acid, means that the nucleic acid contains the necessary information to direct translation of the nucleotide sequence into a particular protein. The information encoding the protein is represented using codons. As used herein, reference to a "full-length sequence" of a particular polynucleotide or protein encoded thereby refers to an entire nucleic acid sequence or an entire amino acid sequence having a natural (non-synthetic) endogenous sequence. The full length polynucleotide encodes the full length, catalytically active form of the particular protein. The terms "polypeptide", "polypeptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The term is used for amino acid polymers in which one or more amino acid residues are artificial chemical analogs of the corresponding naturally occurring amino acid. The term is also used for naturally occurring amino acid polymers. 
The terms "residue" or "amino acid" are used interchangeably herein to refer to an amino acid that is incorporated into a protein, polypeptide, or peptide (collectively, "protein"). Amino acids may be naturally occurring amino acids, and unless otherwise limited, may include known analogs of natural amino acids, which analogs may function in a similar manner to naturally occurring amino acids.

In this application, the terms "comprises," "comprising," or variations thereof, are to be understood to encompass other elements, numbers, or steps in addition to those described. "subject plant" or "subject plant cell" refers to a plant or plant cell in which genetic engineering has been effected, or a progeny cell of a plant or cell so engineered, which progeny cell comprises the engineering. "control" or "control plants" provide a reference point for measuring phenotypic changes in a subject plant.

Those skilled in the art will readily recognize that advances in the field of molecular biology, such as site-specific and random mutagenesis, polymerase chain reaction methods, and protein engineering techniques, provide a wide range of suitable tools and procedures for engineering or modifying the amino acid sequences, and the underlying genetic sequences, of proteins of interest.

In some embodiments, the nucleotide sequences of the present application may be altered to make conservative amino acid substitutions. The principles and examples of conservative amino acid substitutions are described further below. In certain embodiments, substitutions may be made to the nucleotide sequences of the present application in accordance with the disclosed codon bias of the species without altering the amino acid sequence. In some embodiments, a portion of the nucleotide sequence herein is replaced with a different codon encoding the same amino acid sequence, such that the amino acid sequence encoded thereby is not changed while the nucleotide sequence is changed. Conservative variants include those sequences that encode the amino acid sequence of one of the proteins of an embodiment due to the degeneracy of the genetic code. Those skilled in the art will recognize that amino acid additions and/or substitutions are generally based on the relative similarity of amino acid side chain substituents, e.g., hydrophobicity, charge, size, etc., of the substituents. Exemplary amino acid substituents having various of the aforementioned contemplated properties are well known to those skilled in the art and include arginine and lysine; glutamic acid and aspartic acid; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine. Guidelines for suitable amino acid substitutions that do not affect the biological activity of the protein of interest can be found in the model of Dayhoff et al (1978) Atlas of Protein Sequence and Structure (protein sequence and structure atlas) (Natl. Biomed. Res. Foundation, washington, D.C.), incorporated herein by reference. Conservative substitutions, such as substitution of one amino acid for another with similar properties, may be made. Identification of sequence identity includes hybridization techniques. 
For example, all or part of a known nucleotide sequence is used as a probe for selective hybridization with other corresponding nucleotide sequences present in a cloned genomic DNA fragment or population of cDNA fragments (i.e., a genomic library or cDNA library) from a selected organism. The hybridization probes may be genomic DNA fragments, cDNA fragments, RNA fragments, or other oligonucleotides, and may be labeled with a detectable group such as 32P or other detectable marker. Thus, for example, hybridization probes can be prepared by labeling synthetic oligonucleotides based on the sequences of the embodiments. Methods for preparing hybridization probes and constructing cDNA and genomic libraries are generally known in the art. Hybridization of the sequences may be performed under stringent conditions. As used herein, the term "stringent conditions" or "stringent hybridization conditions" refers to conditions under which a probe will hybridize to its target sequence to a detectably greater extent (e.g., at least 2-fold, 5-fold, or 10-fold over background) relative to hybridization to other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the hybridization stringency and/or controlling the washing conditions, target sequences 100% complementary to the probes can be identified (homologous probe method). Alternatively, stringent conditions can be adjusted to allow for some sequence mismatches in order to detect lower similarity (heterologous probe method). Typically, the probe is less than about 1000 or 500 nucleotides in length. 
Typically, stringent conditions are those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 M to 1.0 M Na ion concentration (or other salt) at a pH of 7.0 to 8.3, and the temperature conditions are: when used with short probes (e.g., 10 to 50 nucleotides), at least about 30 ℃; when used with long probes (e.g., greater than 50 nucleotides), at least about 60 ℃. Stringent conditions can also be achieved with the addition of destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization at 37 ℃ in a buffer of 30% to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulfate), and washing in 1× to 2× SSC (20× SSC = 3.0 M NaCl/0.3 M trisodium citrate) at 50 ℃ to 55 ℃. Exemplary moderately stringent conditions include hybridization in 40% to 45% formamide, 1.0 M NaCl, 1% SDS at 37 ℃ and washing in 0.5× to 1× SSC at 55 ℃ to 60 ℃. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37 ℃ and a final wash in 0.1× SSC at 60 ℃ to 65 ℃ for at least about 20 minutes. Optionally, the wash buffer may comprise about 0.1% to about 1% SDS. The duration of hybridization is typically less than about 24 hours, usually from about 4 hours to about 12 hours. Specificity generally depends on post-hybridization washing, the key factors being the ionic strength and temperature of the final wash solution. The Tm (thermal melting point) of DNA-DNA hybrids can be approximated from the equation of Meinkoth and Wahl (1984) Anal. Biochem. 138:267-284: Tm = 81.5 ℃ + 16.6 (log M) + 0.41 (%GC) - 0.61 (% formamide) - 500/L; where M is the molar concentration of monovalent cations, %GC is the percentage of guanosine and cytosine nucleotides in the DNA, % formamide is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. 
Tm is the temperature (at a defined ionic strength and pH) at which 50% of the complementary target sequence hybridizes to a perfectly matched probe. Washing is typically performed at least until equilibrium is reached and a low hybridization background level is reached, such as 2 hours, 1 hour, or 30 minutes. Each 1% mismatch corresponds to a decrease in Tm of about 1 ℃; thus, Tm, hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with ≥90% identity are desired, the Tm can be reduced by 10 ℃. Typically, stringent conditions are selected to be about 5 ℃ lower than the Tm for the specific sequence and its complement at a defined ionic strength and pH. However, under very stringent conditions, hybridization and/or washing may be performed at 4 ℃ below the Tm; under moderately stringent conditions, hybridization and/or washing may be performed at 6 ℃ below the Tm; under low stringency conditions, hybridization and/or washing may be performed at 11 ℃ below the Tm.
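The Meinkoth and Wahl approximation and the 1 ℃ per 1% mismatch rule quoted above can be worked through numerically. The following is a minimal Python sketch under stated assumptions: the logarithm is taken as base 10, the function names `tm_dna_hybrid` and `tm_for_identity` are hypothetical, and the input values in the usage lines are illustrative only and are not taken from the examples of this application.

```python
import math


def tm_dna_hybrid(monovalent_molar, gc_percent, formamide_percent, length_bp):
    """Approximate Tm (in degrees C) of a DNA-DNA hybrid.

    Tm = 81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(% formamide) - 500/L
    (base-10 logarithm assumed).
    """
    return (81.5
            + 16.6 * math.log10(monovalent_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)


def tm_for_identity(tm_perfect, identity_percent):
    """Lower Tm by about 1 degree C for each 1% of sequence mismatch."""
    return tm_perfect - (100.0 - identity_percent)


if __name__ == "__main__":
    # Illustrative inputs: 1 M monovalent cations, 50% GC,
    # 50% formamide, 500 bp hybrid.
    tm = tm_dna_hybrid(1.0, 50.0, 50.0, 500)
    print(f"Tm for a perfect match: {tm:.1f} C")
    print(f"Tm target for >=90% identity: {tm_for_identity(tm, 90.0):.1f} C")
```

With these illustrative inputs the logarithmic term is zero, so the result is simply 81.5 + 20.5 - 30.5 - 1.0.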

In some embodiments, fragments of the nucleotide sequence and the amino acid sequence encoded thereby are also included. As used herein, the term "fragment" refers to a portion of the nucleotide sequence of a polynucleotide or a portion of the amino acid sequence of a polypeptide of an embodiment. Fragments of a nucleotide sequence may encode protein fragments that retain the biological activity of the native or corresponding full-length protein and thus have protein activity. Mutant proteins include biologically active fragments of a native protein that comprise consecutive amino acid residues that retain the biological activity of the native protein.

In the present invention, a "target sequence" or "target polynucleotide" or "target nucleic acid" may be any polynucleotide that is endogenous or exogenous to a cell (e.g., a prokaryotic or eukaryotic cell). For example, the target polynucleotide may be a polynucleotide that is present in the nucleus of a eukaryotic cell. The target polynucleotide may be a sequence encoding a gene product (e.g., a protein) or a non-coding sequence (e.g., a regulatory polynucleotide or unwanted DNA). In some cases, the target sequence should be associated with a protospacer adjacent motif (PAM) or a transposon-associated motif (TAM).

"target sequence" refers to a polynucleotide targeted by a guide sequence in a gRNA, e.g., a sequence that has complementarity to the guide sequence, wherein hybridization between the target sequence and the guide sequence will promote the formation of a CRISPR or TnpB complex (including Cas or TnpB proteins and gRNA). Complete complementarity is not necessary so long as sufficient complementarity exists to cause hybridization and promote the formation of a complex. The target sequence may comprise any polynucleotide, such as DNA or RNA. In some cases, the target sequence is located either inside or outside the cell. In some cases, the target sequence is located in the nucleus or cytoplasm of the cell. In some cases, the target sequence may be located within an organelle of a eukaryotic cell, such as a mitochondria or chloroplast.

The following examples are illustrative of the invention and are not intended to limit the scope of the invention. Modifications and substitutions to methods, procedures, or conditions of the present invention without departing from the spirit and nature of the invention are intended to be within the scope of the present application. Unless otherwise indicated, the examples follow conventional experimental conditions, such as those in Molecular Cloning: A Laboratory Manual by Sambrook et al. (Sambrook J & Russell DW, Molecular Cloning: A Laboratory Manual, 2001), or the conditions recommended in the manufacturer's instructions. Unless otherwise indicated, all chemical reagents used in the examples were conventional commercial reagents, and the technical means used in the examples were conventional means well known to those skilled in the art.

Example 1 Isolation of TnpB proteins from archaea

Because TnpB is smaller than Cas9, it has an advantage in cell delivery efficiency, can be used for genome editing in vivo and in vitro, and, as a novel nucleic acid editing tool, has great application prospects in gene therapy, species genome modification, trait improvement, and similar fields. The discovery of different types of TnpB proteins facilitates the use of this enzyme in the field of nucleic acid editing. However, TnpB is a transposon-encoded protein that is ubiquitous in microorganisms: more than 400,000 TnpB proteins are annotated in the NCBI database alone, and not every TnpB can be used as a tool for nucleic acid cleavage and editing.

So far, RNA-guided endonucleases for genome editing have been identified only from bacteria. Bacterial insertion sequences of the IS200/IS605 and IS607 families encode RNA-guided TnpB endonucleases that have been reprogrammed for genome editing. These families of insertion sequences are also widely distributed in archaea. However, it is unclear whether archaea contain TnpBs that can be used for genome editing and, if so, which ones.

The invention identifies 149194 IS200/605 elements from sequenced and assembled archaeal genomes and screens out 8574 IS200/605 elements that encode both the TnpA and TnpB genes. Alignment of conserved catalytic amino acid residues identified some IS200/605-encoded TnpB proteins that may have programmable nuclease activity. They are evolutionarily distant from the published ISDra2 TnpB protein and share very low similarity with it (<30%); they are novel TnpB proteins and may have properties different from those of ISDra2 TnpB.

By comparing the Left Element sequences of IS200/605, conserved TAM sequences were identified immediately outside the conserved Left Element; the guide RNA sequence and the guide sequence immediately 3' to it were predicted from the conserved sequences of the Right Element. Activity experiments further showed that 22 proteins have cleavage activity, while other proteins, such as WP_198539375.1, WP_240570379.1, WP_240570379.1, MCL4379344.1, WP_014512934.1, WP_014513195.1, ADX85008 and the like, have no cleavage activity. These 22 proteins (information shown in Table 1) are therefore expected to be developed into valuable TnpB programmable nucleases.

Table 1 Isolated active TnpB proteins

The structure, sequence similarity, and evolutionary relationship of these 22 proteins all differ significantly from those of ISDra2 (WP_010887311.1) (FIG. 1).

RNA alignment and bioinformatic analysis showed that the TAM (Transposon Associated Motif), 5'-TTTAA, also differs from that of ISDra2, and that the sequence and structure of the omega RNA likewise differ from those of ISDra2 (see FIG. 2 for details): the omega RNA adopts a unique double-hairpin structure. The sequence of the omega RNA obtained is as follows:

UUAAGAAGGACUUGACUUUGGCUGACCGUGUGUUUGUAUGUCCUAAAUGUGGUUGGACUGUAGAUCGUGACUAUAAUGCUUCUCUAAAUAUUCUUCGUGCGGGGUCGGGACUGCCCUUAGAGCCUGUGGACAGGGGACCUCUGCUAUACAUUCCCUUCUCAGAAGGGGUGUAUAGUAAGUUUCUUGGAAGAAGCAGGAAAUCUCCAUCGUGAGGUGGAGAUGCCACGUCCGUAAGGGCGGGGUUGUUCAC.

Example 2 Analysis of SiRe protein properties

The invention further analyzes the characteristics of the SiRe proteins, mainly their activity at different temperatures, cleavage specificity, dependence on metal ions, enzyme active sites, and double-stranded DNA cleavage pattern.

1. Activity at different temperatures.

One of the SiRe proteins, SiRe_0632 (named SisTnpB1), was tested in vitro in cleavage experiments at different temperatures using a plasmid carrying its specific target site and a TAM. The specific steps are as follows:

5.4 nM SisTnpB1 RNP complex (the complex of SisTnpB1 and ωRNA) and 110 ng of pUC19 plasmid DNA, into which double-stranded oligonucleotides carrying the different target sequences and TAM sequences had been cloned, were added to a buffer containing 10 mM Tris-HCl (pH 7.5), 1 mM DTT, 1 mM EDTA, 100 mM NaCl, and 10 mM MgCl₂, and reacted at different temperatures (37-85 ℃) for 60 minutes. The reaction was stopped by adding 20 mM proteinase K and 4% SDS solution and incubating at 37 ℃ for 1 hour. Then, 2× loading dye was added and the cleavage reaction was analyzed by agarose or denaturing PAGE electrophoresis. The DNA fragments in the agarose gel were visualized by ethidium bromide staining, and the DNA fragments in denaturing PAGE were detected using a FUJIFILM scanner (FLA-5100).

The results showed that the programmable nuclease had specific cleavage activity at 37 to 85 ℃, with optimal activity at 75 ℃ (FIG. 3A, B).

2. Specificity of cleavage.

Cleavage specificity was measured by the method described in 1, with the reaction temperature set at 75 ℃, by reacting the SisTnpB1 RNP complex with pUC19 plasmid DNA carrying neither the target sequence nor the TAM sequence, carrying only one of the two, or carrying both. The programmable nuclease was found to be highly specific, showing cleavage activity only when both the TAM sequence and the target sequence were present (FIG. 3C).

3. Metal ion dependence.

Following the method described in 1, the reaction temperature was set at 37 ℃ and the 10 mM MgCl₂ in the buffer was replaced with MnCl₂, CaCl₂, ZnCl₂, or NiCl₂ to test the metal ion dependence. The programmable nuclease was found to have a broad spectrum of metal ion dependence, with the highest activity in the presence of manganese ions and lower activity in the presence of zinc or nickel ions (FIG. 3D). Enzyme activity can therefore be increased by adding metal ions such as magnesium, manganese, and calcium, especially manganese ions. Specific data are shown in Table 2 (enzyme activity is reflected by the percentage of linear plasmid).

Table 2 Quantified enzyme activity of SisTnpB1 with different metal ions

4. Enzyme active site.

Through sequence alignment and analysis of the SiRe proteins, the invention identified several candidate nuclease active sites in the RuvC domain. Each of these sites was mutated to obtain mutant proteins, and their enzymatic activity was tested by the method described in 1 with the reaction temperature set at 37 ℃. Enzyme activity decreased significantly after mutation of the two sites D187 and E271 (FIG. 3E), demonstrating the important role of these two sites in enzyme activity. The mutant proteins lose enzymatic activity and thus recognize the target sequence under the direction of ωRNA but cannot cleave it. The SiRe protein can be modified by mutating these two sites and, in combination with an adenine or cytosine deaminase, developed into a base editor that achieves single-base editing of a target sequence.

5. Double-stranded DNA cleavage scheme.

The linearized plasmid formed after cleavage was further recovered and sequenced. The sequencing results showed that the double-stranded DNA is cleaved in a staggered manner, 15-18 nt from the TAM on the non-target strand and 20-28 nt from the TAM on the target strand (FIG. 3F), resulting in cohesive ends with 5' overhangs.

These results show that the SiRe protein differs from ISDra2 in its activity at different temperatures, cleavage specificity, dependence on metal ions, enzyme active sites, and double-stranded DNA cleavage pattern, revealing some unique properties of this type of protein.

Example 3 Analysis of the dependence of SisTnpB1 protein on target and TAM sequences

To investigate whether SisTnpB1 can cleave both double-stranded DNA (dsDNA) and single-stranded DNA (ssDNA), the invention examined the cleavage capacity of SisTnpB1 at different temperatures using short double-stranded or single-stranded oligonucleotides, with or without the TAM and target sequences, as substrates. SisTnpB1 rapidly cleaved dsDNA carrying the TAM and a 20 bp target sequence at 75 ℃, whereas cleavage of both the target strand (TS) and the non-target strand (NTS) was strongly reduced at 37 ℃. When no sequence matching the omega RNA was present, SisTnpB1 showed no cleavage activity, indicating strong specificity. On a dsDNA substrate lacking the TAM sequence, SisTnpB1 cleaved the target strand only very weakly and did not cleave the non-target strand at the optimal temperature of 75 ℃, and it showed almost no cleavage activity at 37 ℃, indicating a more stringent TAM dependence, consistent with the plasmid cleavage results. SisTnpB1 also cleaved matched single-stranded DNA at 75 ℃ whether or not a TAM was present, although the presence of a TAM sequence on the ssDNA substrate resulted in higher cleavage efficiency (FIG. 4).

These results indicate that SisTnpB1 is highly target-dependent over a broad range of temperatures when cleaving double-stranded DNA, and that it is likewise strongly target-dependent when cleaving single-stranded DNA.

Example 4 exploration of TAM site diversity

To study whether SisTnpB1 recognizes more diverse TAM sites, the invention introduced base substitutions into the target sequence and the adjacent TAM and studied their effect on the endonuclease activity of SisTnpB1 (fig. 5A). Mutations M1-M5 converted +1gccaa+5 of the target sequence to +1cggtt+5, which almost abolished DNA cleavage by SisTnpB1 (fig. 5B), whereas the M11-M15 and M16-M20 mutations had less effect on target DNA cleavage, indicating that the part of the target sequence critical for cleavage lies within positions +1 to +10. In addition, mutations M-1 to M-5, which altered the TAM sequence, almost eliminated DNA cleavage (FIG. 5B).
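The block substitution described above (gccaa converted to cggtt) swaps each base for its complement without reversing the order. A minimal sketch of generating such block mutants is given below for illustration. The function names `substitute_block` and `block_mutants` are hypothetical, and only the first five bases of the example target come from the text; the remaining fifteen are placeholders.

```python
# Illustrative sketch only; the helper names and the placeholder sequence are
# assumptions, not part of the disclosed method.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def substitute_block(seq: str, start: int, end: int) -> str:
    """Swap positions start..end (1-based, inclusive) to complementary bases.

    The bases are complemented in place without reversing their order,
    which is how gccaa becomes cggtt.
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError("block must lie within the sequence")
    block = seq[start - 1:end].translate(COMPLEMENT)
    return seq[:start - 1] + block + seq[end:]


def block_mutants(seq: str, block: int = 5) -> dict[str, str]:
    """Build one mutant per consecutive block, named like M1-M5, M6-M10."""
    mutants = {}
    for start in range(1, len(seq) + 1, block):
        end = min(start + block - 1, len(seq))
        mutants[f"M{start}-M{end}"] = substitute_block(seq, start, end)
    return mutants


# Only the first five bases (gccaa) come from the text; the remaining
# fifteen are placeholders chosen for illustration.
target = "gccaa" + "tcgatcgatcgatcg"
for name, mutant in block_mutants(target).items():
    print(name, mutant)
```

Each mutant differs from the starting sequence in one five-base block only, so a loss of cleavage can be attributed to that block.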

Single-base inversion mutations were then introduced into the seed and the TAM. Inversion of the +1 nucleotide of the target sequence strongly inhibited cleavage by SisTnpB1, while the other single inversions had less effect on cleavage of the target DNA by SisTnpB1 (FIG. 5C), indicating that SisTnpB1 is highly tolerant of mutations in the target DNA sequence.

More importantly, SisTnpB1 can recognize more diverse TAM sequences, including 5'-TTTAA, 5'-ATTAA, 5'-TCTAA, 5'-TGTAA, 5'-TATAA, 5'-TTCAA, 5'-TTAAA, 5'-TTTCA, 5'-TTTGA, 5'-TTTTA, 5'-TTTAC, 5'-TTTAG and 5'-TTTAT, which greatly expands the application range of the protein.
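As an illustration of how the TAM list above might be used when choosing target sites, the following minimal sketch scans one strand of a DNA sequence for any of the listed TAMs and reports the sequence immediately 3' of each. The function name `find_targets` is hypothetical, and the 20 nt target length and the placement of the target directly downstream of the TAM are assumptions drawn from the 20 bp target used in Example 3. Only the given strand is scanned.

```python
# Illustrative sketch only; find_targets is a hypothetical helper and the
# 20 nt target length is an assumption based on the substrates in Example 3.
TAMS = {
    "TTTAA", "ATTAA", "TCTAA", "TGTAA", "TATAA", "TTCAA", "TTAAA",
    "TTTCA", "TTTGA", "TTTTA", "TTTAC", "TTTAG", "TTTAT",
}
TAM_LEN = 5


def find_targets(seq: str, target_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (0-based TAM start, TAM, target) for each TAM on this strand.

    The target is taken as the target_len bases immediately 3' of the TAM;
    TAMs too close to the end of the sequence to leave a full target are
    skipped.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - TAM_LEN - target_len + 1):
        tam = seq[i:i + TAM_LEN]
        if tam in TAMS:
            target = seq[i + TAM_LEN:i + TAM_LEN + target_len]
            hits.append((i, tam, target))
    return hits


# Placeholder sequence for illustration: two flanking bases, a TTTAA TAM,
# a 20 nt stretch, and two more flanking bases.
example = "GG" + "TTTAA" + "GCCAAACGTACGTACGTACG" + "CC"
for start, tam, target in find_targets(example):
    print(start, tam, target)
```

To cover both strands, the same function could be applied to the reverse complement of the sequence.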

Example 5 editing of microbial, animal and plant cell genomes using SisTnpB1

The invention selected Pediococcus acidilactici as a representative bacterium to test the effect of SisTnpB1 on bacterial genome editing. The specific steps were as follows:

An interference plasmid encoding the SisTnpB1 RNP, carrying a guide RNA designed to target an endogenous plasmid of the bacterium (with 5'-TTTAA as the TAM), was transformed into Pediococcus acidilactici. The results after transformation indicated that the endogenous plasmid was targeted, cleaved and depleted (FIG. 6A, B).

In addition, a knockout plasmid was designed, with two target sites selected on the pyrE gene using 5'-ATTTAA or 5'-TTTAT as the TAM. After transformation, the cultures were grown at 37 ℃ and 45 ℃ respectively, and PCR amplification with primers designed around the target sites showed that the gene was knocked out with high efficiency (FIG. 6C, D). Deletion of the sequence at these sites was verified by sequencing (FIG. 6E).

These results indicate that SisTnpB1 can be applied to interference and genome editing in bacteria; the same experimental design also gave high editing efficiency in Lactobacillus reuteri and Escherichia coli.

In addition, the editing effect of SisTnpB1 in animal cells was tested, with HEK293T cells as a representative. Target sequences following the optimal TAM 5'-TTTAA were selected at three sites, AGBL1, EMX1 and AAVS1. High-throughput sequencing of the target sequences revealed DNA double-strand break (DSB) repair at all three SisTnpB1 targets, whereas no editing of the targets was detected in the control HEK293T cells. These results show that SisTnpB1 can successfully edit the genome of animal cells.

Likewise, maize was selected to test the effect of the protein on plant genome editing. The maize editing system was designed to target regions following the optimal TAM in both the ms26 and wax genes. Starting 24 h after transformation, the immature embryos were incubated at 45 ℃ for 4 hours per day over a total of 3 days. Embryos incubated at 37 ℃ were used as a control; their incubation times and durations were the same as for the 45 ℃ treatment, and they were kept at 37 ℃ throughout the experiment. After the treatment, the embryos were collected and the target regions were subjected to high-throughput sequencing, which showed that both the ms26 and wax sites carried targeted mutations in the SisTnpB1 treatment incubated at 45 ℃, indicating successful editing in plant cells.

Example 6 use of SisTnpB1 against phage infection

SisTnpB1 can also be used to combat phage infection.

A guide RNA interference plasmid that specifically targets the genome of a coliphage was designed. The plasmid was transformed into the E. coli Rosetta strain, which was grown in liquid culture to an OD of 0.6-0.8 and plated. Coliphages T5 and T7 were then spotted in 10-fold serial dilutions on separate regions of the plates. After about 8 h, E. coli containing SisTnpB1 and the guide RNA was found to be resistant to phage infection compared with the control.

Tests showed that SiRe proteins other than SisTnpB1 also have the same characteristics, functions and technical effects.

While the invention has been described in detail in the foregoing general description and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims

1. A composition, comprising:

wherein the TnpB protein is selected from any one of numbers 1-22 listed in Table 1;

optionally, the omega RNA molecule is in a double hairpin structure; optionally, the sequence of the omega RNA molecule is: UUAAGAAGGACUUGACUUUGGCUGACCGUGUGUUUGUAUGUCCUAAAUGUGGUUGGACUGUAGAUCGUGACUAUAAUGCUUCUCUAAAUAUUCUUCGUGCGGGGUCGGGACUGCCCUUAGAGCCUGUGGACAGGGGACCUCUGCUAUACAUUCCCUUCUCAGAAGGGGUGUAUAGUAAGUUUCUUGGAAGAAGCAGGAAAUCUCCAUCGUGAGGUGGAGAUGCCACGUCCGUAAGGGCGGGGUUGUUCAC.

2. The composition of claim 1, wherein the TAM (Transposon Associated Motif) sequence 5' to the target sequence is 5'-TTTAA or 5'-ATTAA or 5'-TGTAA or 5'-TATAA or 5'-TTCAA or 5'-TTAAA or 5'-TTTCA or 5'-TTTGA or 5'-TTTTA or 5'-TTTAC or 5'-TTTAG or 5'-TTTAT.

3. The composition of claim 1, wherein said composition further comprises one or more metal ions; optionally, the metal ions comprise magnesium ions or manganese ions or calcium ions; optionally, the concentration of the metal ion is 10mM.

4. A vector system capable of encoding the composition of any one of claims 1-2.

5. An engineered host cell comprising the composition of any one of claims 1-2; optionally, the host cell is a microbial cell or an animal cell or a plant cell.

6. Use of the composition, the vector system, the host cell according to claims 1-5 in the field of nucleic acid recognition or modification; optionally, the use includes targeted cleavage of double-stranded DNA, single-stranded DNA, or targeted recognition of a target nucleic acid.

7. The use of claim 6, further comprising combating phage infection.

8. A method of nucleic acid recognition or modification, characterized in that a target sequence and the composition of any one of claims 1-3 are placed in an environment of 37-85 ℃; optionally, the ambient temperature is 37 ℃ or 42 ℃ or 55 ℃ or 65 ℃ or 75 ℃ or 85 ℃; optionally, the ambient temperature is 75 ℃.

9. A TnpB mutein comprising any one of the numbers 1-22 listed in table 1 and having a mutation at position D187 and/or E271 corresponding to the protein sequence No. 6.

10. Use of the mutein according to claim 9 in the field of nucleic acid recognition or modification.