WO2021110119A1

WO2021110119A1 - Highly active transposase and application thereof

Info

Publication number: WO2021110119A1
Application number: PCT/CN2020/133796
Authority: WO
Inventors: 文雯; 宋姗姗; 刘韬; 刘祥箴; 金华君; 钱其军
Original assignee: 上海细胞治疗集团有限公司
Priority date: 2019-12-04
Filing date: 2020-12-04
Publication date: 2021-06-10
Also published as: CN112899252A

Abstract

Provided are a highly active transposase and an application thereof. The amino acid sequence of the transposase is as shown in SEQ ID NO: 2 or 12, and when used in a transposon system the transposase can significantly improve the gene transfer activity of the transposon. The transposase and its coding nucleotide sequence can be used to construct a gene transfer system, and to prepare or serve as a drug, preparation, or tool for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation.

Description

A highly active transposase and its application

Technical field

The invention belongs to the field of molecular biology and biomedicine, and specifically relates to a high-activity transposase and its application.

Background art

A DNA transposon is a mobile DNA sequence that can be transposed from one position in the genome to another through a series of processes such as excision and reintegration. The PiggyBac (PB) transposon is a DNA transposon isolated from the Trichoplusia ni TN368 cell line. It inserts specifically at "TTAA" target sites. With the help of the transposase, the PiggyBac transposon can precisely excise the target gene from the host without altering the host chromosome. The PB transposon has no potential viral genotoxicity, can carry a long foreign gene fragment (up to 150 kb), and has strong transformability. Transgenesis mediated by the PB transposase is characterized by high integration efficiency, stable integration, long-term expression, single-copy integration, locatable insertion sites, and ease of manipulation. It is often used in the production of transgenic mice, the genetic manipulation of mouse embryonic stem cells, gene mutagenesis and other genetic manipulation, pluripotent stem cell induction, and other fields.
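The "TTAA" target-site preference described above can be illustrated with a minimal sketch that lists candidate insertion sites in a DNA string. The function name `find_ttaa_sites` and the toy sequence are assumptions made for illustration only; they are not taken from the application.

```python
def find_ttaa_sites(dna: str) -> list[int]:
    """Return the 1-based start position of every TTAA tetranucleotide in dna."""
    dna = dna.upper()
    target = "TTAA"
    positions = []
    start = dna.find(target)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = dna.find(target, start + 1)
    return positions


# Hypothetical toy sequence, not a real genomic region.
toy_sequence = "GATTAACCGTTAATTAAG"
print(find_ttaa_sites(toy_sequence))
```

The sketch only locates the tetranucleotide; it says nothing about which sites a transposase would actually use in a cell.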

The transposition activity of the PB transposase is the highest among existing mammalian DNA transposons, and it has very broad application prospects. Many studies at home and abroad have used the PB transposon system as a gene editing method for transgenesis and gene mutation in a variety of organisms, including insect cells, protists, plants and vertebrates. In 2003, Tomita fused human type III collagen with enhanced green fluorescent protein (EGFP) and used the PB transposon to integrate it into the silkworm silk protein gene, obtaining a transgenic silkworm that stably expresses human collagen. In 2005, Balu inserted human dihydrofolate reductase (hDHFR) into the Plasmodium genome through the PB transposon system. In 2014, Eric T obtained a stable transgenic line with a PB transposon capable of transposing in vivo. In 2005, Sheng Ding used PB transposons to efficiently introduce foreign gene fragments into human and mouse cell lines cultured in vitro with stable expression, and bred transgenic fluorescent mice with stable traits, demonstrating that the PB transposon system can serve as an effective tool for studying gene function in other vertebrates.

The DNA transposon system consists of two parts: the transposon, which has inverted repeats (IRs) at both ends and can carry the target DNA fragment, and the transposase, which catalyzes the "cut and paste" of the transposon. The transposase first binds the IR sequences on both sides of the transposon, then excises the transposon from the host DNA site accurately and seamlessly, and finally integrates the DNA fragment into the new site. An efficient transposition system can achieve targeted knockout or targeted introduction of target genes, providing an effective vector tool for gene editing in mammalian cells. The transposition efficiency of the transposon system determines the efficiency of gene editing, and a large part of the transposition efficiency depends on the expression level of the transposase. Therefore, increasing the transposase activity is a key technical point for increasing the transposition efficiency of transposons.

The transposition activity of the transposase is affected by the binding site, the active site, the structure and other factors. At present, the crystal structure of the transposase has not been clearly resolved, but some domains are considered to be structurally important, and experiments have shown that the activity of the transposase can be affected by any non-specific amino acid.

"A hyperactive piggyBac transposase for mammalian applications" (PNAS, January 25, 2011, vol. 108, no. 4, 1531-1536) discloses a highly active PiggyBac transposase whose transposition efficiency is 10-fold that of mPBase (the wild-type PiggyBac transposase optimized with mammalian codons) and which has amino acid mutations at the following positions (hereinafter referred to as the existing highly active transposase hyPBase, shown in SEQ ID NO: 1): I30V, G165S, S103P, M282V, S509G, N570S and N538K.

"PiggyBac transposon variants and methods of use" (US9670503B2) and "PiggyBac transposon variants and methods of use" (CN102421902A) are applications claiming priority from U.S. Provisional Application No. 61/155206. They disclose continued mutation and selection starting from integration-deficient PiggyBac mutants to obtain variants with higher integrase activity than the integration-deficient PiggyBac mutants, and mutation and selection starting from wild-type PiggyBac to obtain variants with higher integration activity than wild-type PiggyBac.

Although the enzyme activity of the existing PiggyBac transposase mutants is higher than that of the wild-type PiggyBac transposase, it still cannot meet higher and stricter requirements for enzymatic activity. Therefore, research on PiggyBac transposases with high enzymatic activity is still necessary.

Summary of the invention

The present invention provides a new highly active transposase, which exhibits extremely high transposition activity in E. coli, insect cells, yeast cells, mammalian cells and other cells, higher than that of the existing highly active transposase hyPBase. It is applicable to a broad spectrum of host cells and also has high transposition activity in mammalian cells, especially in human cells. It provides new clues and a new basis for the exploration of transposases, especially in human cells.

The present invention also provides amino acid sequences and peptides that are the basis of the new highly active transposase of the present invention; nucleotide sequences encoding the amino acid sequences, peptides and proteins of the highly active transposase of the present invention; nucleic acids, nucleic acid constructs, recombinant vectors and host cells based on these nucleotide sequences; and gene transfer systems and applications based on the above peptides, proteins, nucleic acids, nucleic acid constructs, recombinant vectors and host cells.

In some embodiments of the present invention, in the amino acid sequence of the existing highly active transposase hyPBase (shown in SEQ ID NO: 1), the isoleucine at position 92 is mutated to asparagine, the valine at position 119 is mutated to alanine, and the glutamine at position 601 is mutated to arginine, giving the target mutant amino acid sequence shown in SEQ ID NO: 2. In CHO cells, the transposition efficiency of the existing highly active transposase hyPBase, codon-optimized and provided with a nuclear localization signal, is 30.9%, whereas the transposition efficiency of the target highly active enzyme bz-hyPBase generated from the amino acid sequence of SEQ ID NO: 2 is 51.7%, an increase of nearly 21 percentage points. In PBMC cells, the transposition efficiency of the existing highly active transposase hyPBase, codon-optimized and provided with a nuclear localization signal, is 9.81%, whereas the transposition efficiency of the target highly active enzyme bz-hyPBase generated from the amino acid sequence of SEQ ID NO: 2 is 19.4%, an increase of nearly 10 percentage points. This shows that the target highly active enzyme based on the mutant amino acid sequence of the present invention exhibits better transposition activity than the existing highly active transposase hyPBase, especially in mammalian cells and human-derived cells. Therefore, some embodiments of the present invention provide a new highly active transposase that contains one or more amino acid sequences shown in SEQ ID NO: 2. This highly active transposase shows extremely high transposition activity in Escherichia coli, insect cells, yeast cells and mammalian cells, and in particular meets the high transposition activity requirements of mammalian and human-derived cells.
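The efficiency figures quoted above can be compared in two ways: as an absolute difference in percentage points and as a fold change. The following is a minimal arithmetic sketch using only the four percentages stated in the text; the function name `compare_efficiency` is an illustrative assumption.

```python
def compare_efficiency(baseline_pct: float, new_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, fold change over baseline)."""
    return new_pct - baseline_pct, new_pct / baseline_pct


# Transposition efficiencies quoted in the text: hyPBase vs. bz-hyPBase.
results = {
    "CHO": compare_efficiency(30.9, 51.7),
    "PBMC": compare_efficiency(9.81, 19.4),
}

for cell_type, (gain, fold) in results.items():
    print(f"{cell_type}: +{gain:.1f} percentage points, {fold:.2f}-fold")
```

On these numbers the gains are about 20.8 and 9.6 percentage points, corresponding to roughly 1.67-fold and 1.98-fold changes, which is why the figures "nearly 21%" and "nearly 10%" are best read as percentage-point differences.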

Amino acid sequence of hyPBase transposase containing nuclear localization sequence (SEQ ID NO:1):

Target mutant amino acid sequence containing nuclear localization sequence (SEQ ID NO: 2):

(The underlined and bold italic parts are nuclear localization sequences)

Amino acid sequences obtained by making the above amino acid mutations in the amino acid sequence of the existing highly active transposase hyPBase (shown in SEQ ID NO: 1) at only one of positions 92, 119 and 601, or at any two of these positions, give enzymes, formed on the basis of one or more such mutant amino acid sequences, whose transposition efficiency is the same as or similar to that of the target highly active transposase bz-hyPBase described in the Examples of the present invention, or the same as or similar to that of the existing hyPBase. These sequences are also mutant amino acid sequences of the new highly active transposase to be protected by the present invention, and enzymes formed on the basis of these mutant amino acid sequences also belong to the new highly active transposase to be protected by the present invention.

As mentioned above, mutant amino acid sequences are obtained by making the above amino acid mutations at any one, any two or all three of positions 92, 119 and 601 of the amino acid sequence of the existing highly active transposase hyPBase (shown in SEQ ID NO: 1). Amino acid sequences obtained from these mutant sequences by one or more amino acid deletions, substitutions, insertions or additions that still maintain or improve the enzyme activity are alternatives to the technical solution of the present invention with the same or similar technical effects. They fall within the scope of protection of the present invention as mutant amino acid sequences of the new highly active transposase, and enzymes formed on the basis of one or more of these mutant amino acid sequences also belong to the new highly active transposase to be protected by the present invention.

As mentioned above, the mutant amino acid sequence obtained by making the above amino acid mutations at any one, any two or all three of positions 92, 119 and 601 of the amino acid sequence of the existing highly active transposase hyPBase (shown in SEQ ID NO: 1) may also contain the amino acid sequence of a functional protein. A functional protein is added to the new highly active transposase to improve or extend its function; examples include the amino acid sequence of a nuclear localization signal, the amino acid sequence of EGFP green fluorescent protein, the amino acid sequence of a tag protein, or the amino acid sequence of an antibody. These functional proteins can improve the transposition activity of the new highly active transposase (for example, a nuclear localization signal can help improve the transposition activity of the transposase); or enhance the transposition monitoring function of the highly active transposase (for example, EGFP green fluorescent protein or a tag protein facilitates qualitative and/or quantitative monitoring of transposase activity); or add new functions to the new highly active transposase (for example, an antibody can additionally increase immune activity).

The present invention also protects peptides, that is, chain compounds formed by dehydration condensation of amino acids joined by peptide bonds, that are formed from the mutant amino acid sequence obtained by making the above amino acid mutations at any one, any two or all three of positions 92, 119 and 601 of the amino acid sequence of the existing highly active transposase hyPBase (shown in SEQ ID NO: 1), or from a derivative amino acid sequence obtained by performing one or more amino acid deletions, substitutions, insertions or additions on the mutant amino acid sequence while still maintaining or improving the enzyme activity. The number of peptides containing the above mutant amino acid sequence or the above derivative amino acid sequence can be one or more. The peptide may also be linked to a peptide formed from the amino acid sequence of a functional protein, such as a nuclear localization signal peptide, an EGFP green fluorescent protein peptide, a tag protein peptide or an antibody peptide.

The mutant amino acid sequences obtained by making the above amino acid mutations at any one, any two or all three of positions 92, 119 and 601 of the amino acid sequence of the existing highly active transposase hyPBase (shown in SEQ ID NO: 1); the peptides formed on the basis of these mutant amino acid sequences; the derivative amino acid sequences obtained by performing one or more amino acid deletions, substitutions, insertions or additions on the mutant amino acid sequences while still maintaining or improving the enzyme activity; and the proteins formed from peptides based on these derivative amino acid sequences all belong to the new highly active transposase protected by the present invention. The number of the above mutant amino acid sequences, derivative amino acid sequences, and peptides formed on their basis in the new highly active transposase is one or more.

The following all belong to the mutant nucleotide sequences protected by the present invention that encode the new highly active transposase, its peptides and its amino acid sequences: a mutant nucleotide sequence encoding the above new highly active transposase, peptide or amino acid sequence of the present invention; a nucleotide sequence complementary to, hybridizing with or overlapping the mutant nucleotide sequence; a nucleotide sequence obtained from the mutant nucleotide sequence by base substitution, deletion or addition that still encodes the new highly active transposase; or a nucleotide sequence having at least 80% homology, preferably at least 90% homology, and more preferably at least 96% homology with the mutant nucleotide sequence. Such a sequence can be present as one copy or as multiple repeated copies. The details are as follows:

The nucleotide sequence encoding the amino acid sequence of the existing highly active enzyme hyPBase (SEQ ID NO: 1) is optimized with human codons to obtain a human codon-optimized nucleotide sequence (SEQ ID NO: 4). On the basis of this sequence, the following base mutations are made: base T at position 276 is mutated to base C, base T at position 356 is mutated to base C, base G at position 900 is mutated to base A, and base A at position 1802 is mutated to base G. This gives a mutant nucleotide sequence, shown in SEQ ID NO: 3, that encodes the amino acid sequence of the new highly active transposase bz-hyPBase of the present invention (shown in SEQ ID NO: 2).
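To relate the nucleotide positions listed above to amino acid positions, the following minimal sketch maps a 1-based nucleotide position to its codon number and its place within that codon. It assumes, purely for illustration, that numbering starts at the first base of the coding sequence and that the reading frame begins there; the application does not state this explicitly, and the function name `codon_of` is hypothetical.

```python
def codon_of(nt_position: int) -> tuple[int, int]:
    """Map a 1-based nucleotide position to (codon number, base within codon).

    Assumes position 1 is the first base of the first codon.
    """
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    codon_number = (nt_position - 1) // 3 + 1
    base_in_codon = (nt_position - 1) % 3 + 1
    return codon_number, base_in_codon


# Nucleotide positions named in the text.
for nt in (276, 356, 900, 1802):
    codon, base = codon_of(nt)
    print(f"nt {nt} -> codon {codon}, base {base} of 3")
```

Under that assumption the four positions fall in codons 92, 119, 300 and 601 respectively.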

The nucleotide sequence (SEQ ID NO: 4) of the existing high-activity enzyme hyPBase with nuclear localization sequence optimized by human-derived codons:

Mutant nucleotide sequence containing nuclear localization sequence (SEQ ID NO: 3):

Alternatively, a nucleotide sequence obtained from the mutant nucleotide sequence (shown in SEQ ID NO: 3) by base substitution, deletion or addition that encodes the new highly active transposase bz-hyPBase;

Or, in accordance with the principle of complementary base pairing, a nucleotide sequence complementary to the mutant nucleotide sequence (shown in SEQ ID NO: 3), or to a sequence obtained from it by base substitution, deletion or addition, that corresponds to a nucleotide sequence encoding the new highly active transposase bz-hyPBase;

Or a nucleotide sequence that overlaps the mutant nucleotide sequence (shown in SEQ ID NO: 3) and encodes the new highly active transposase bz-hyPBase;

Or a nucleotide sequence that hybridizes with the mutant nucleotide sequence (shown in SEQ ID NO: 3) and encodes the new highly active transposase bz-hyPBase;

Or a nucleotide sequence that has more than 80% homology with the mutant nucleotide sequence (shown in SEQ ID NO: 3) and encodes the new highly active transposase bz-hyPBase; preferably a nucleotide sequence that has more than 90% homology with the mutant nucleotide sequence (shown in SEQ ID NO: 3) and encodes the new highly active transposase bz-hyPBase; more preferably a nucleotide sequence that has more than 96% homology with the mutant nucleotide sequence (shown in SEQ ID NO: 3) and encodes the new highly active transposase bz-hyPBase;

These all belong to the mutant nucleotide sequences, to be protected by the present invention, that encode the new highly active transposase bz-hyPBase, its peptides, or its amino acid sequence.
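The principle of complementary base pairing mentioned above (A pairs with T, G pairs with C) can be sketched as follows. The helper name `reverse_complement` and the short test strings are illustrative assumptions; the sequences of SEQ ID NO: 3 and 4 are not reproduced here.

```python
# Translation table implementing Watson-Crick pairing for DNA (A<->T, G<->C).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand of dna, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


# Hypothetical short fragments, not taken from the sequence listing.
print(reverse_complement("ATGC"))
print(reverse_complement("TTAA"))
```

Applying the function twice returns the original strand, and a palindromic site such as TTAA is its own reverse complement.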

If a functional protein is connected to the new highly active transposase of the present invention, the mutant nucleotide sequence encoding it also contains a nucleotide sequence encoding the functional protein, such as a nucleotide sequence encoding a nuclear localization signal, a nucleotide sequence expressing EGFP green fluorescent protein, a nucleotide sequence encoding a tag protein peptide, or a nucleotide sequence encoding an antibody.

The present invention also provides the above-mentioned nucleic acid polymerized from the mutant nucleotide sequence encoding the new highly active transposase of the present invention, or its peptide fragment, or its amino acid sequence. When a functional protein is connected to the novel high-activity transposase of the present invention, the nucleic acid also contains a nucleotide sequence encoding the functional protein (nuclear localization signal, EGFP green fluorescent protein, tag protein or antibody).

The present invention also provides a nucleic acid construct to which one or more regulatory sequences are operably linked, the regulatory sequences directing the expression of the target coding sequence in a host cell. Expression includes any step involved in the production of a protein or polypeptide, including but not limited to transcription, post-transcriptional modification, translation, post-translational modification and secretion. The nucleic acid construct also contains the above mutant nucleotide sequence encoding the new highly active transposase of the present invention, or its peptide, or its amino acid sequence, or a nucleic acid polymerized from the mutant nucleotide sequence.

The present invention also provides a recombinant vector containing the above mutant nucleotide sequence encoding the new highly active transposase of the present invention, or its peptide, or its amino acid sequence, or the nucleic acid polymerized from the mutant nucleotide sequence, or the above nucleic acid construct. The recombinant vector includes a recombinant cloning vector, a recombinant eukaryotic expression vector or a recombinant viral vector. The recombinant cloning vector includes a pRS vector, a T vector or a pUC vector, etc.; the recombinant eukaryotic expression vector includes pEGFP, pCMVp-NEO-BAN or pSV2, etc.; and the recombinant viral vector includes a recombinant adenovirus vector or a lentivirus vector.

The present invention also provides a host cell that contains the above mutant nucleotide sequence encoding the new highly active transposase of the present invention, or its peptide, or its amino acid sequence, or the nucleic acid polymerized from the mutant nucleotide sequence, or the above nucleic acid construct, or the above recombinant vector. The host cells include E. coli cells, insect cells, yeast cells, mammalian cells, and the like.

The present invention provides a new highly active transposase that is used in a transposon system to improve the transposition activity of the transposon; or a peptide constituting the new highly active transposase; or a nucleic acid construct encoding the new highly active transposase; or a recombinant vector encoding the new highly active transposase; or a host cell (E. coli cell, insect cell, yeast cell, mammalian cell, etc.) containing the new highly active transposase and/or a nucleic acid construct encoding the new highly active transposase and/or a recombinant vector encoding the new highly active transposase. These integrate foreign genes into the host cell genome in a site-directed, stable and efficient manner and achieve long-term, stable expression without affecting the stable expression of the original host genes. They can be used to construct new gene transfer systems, and can be prepared into or used as drugs and/or preparations, or as tools, for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation.

A gene transfer system containing the new highly active transposase of the present invention, or a nucleic acid construct encoding the new highly active transposase, or a recombinant vector encoding the new highly active transposase, or a host cell containing the new highly active transposase and/or a nucleic acid construct encoding the new highly active transposase and/or a recombinant vector encoding the new highly active transposase.

The gene transfer system also contains a transposon gene. The nucleic acid or nucleic acid construct encoding the new highly active transposase is integrated with the transposon gene; or the nucleic acid or nucleic acid construct encoding the new highly active transposase is independent of the transposon gene; or the nucleic acid or nucleic acid construct encoding the new highly active transposase is located on the same recombinant vector as the transposon gene; or the nucleic acid or nucleic acid construct encoding the new highly active transposase and the transposon gene are located on different recombinant vectors; or the transposon gene is integrated into the nucleic acid construct encoding the new highly active transposase; or the transposon gene is integrated into the recombinant vector encoding the new highly active transposase; or the transposon gene is independent of the recombinant vector encoding the new highly active transposase; or the transposon gene is transferred into the host cell containing the new highly active transposase and/or a nucleic acid construct encoding the new highly active transposase and/or a recombinant vector encoding the new highly active transposase; or the transposon gene is located outside the host cell containing the new highly active transposase and/or a nucleic acid construct encoding the new highly active transposase and/or a recombinant vector encoding the new highly active transposase.

A medicine and/or preparation for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation, containing the new highly active transposase of the present invention, or a nucleic acid construct encoding the new highly active transposase, or a recombinant vector encoding the new highly active transposase, or a host cell containing the new highly active transposase and/or a nucleic acid construct encoding the new highly active transposase and/or a recombinant vector encoding the new highly active transposase, or the above gene transfer system.

The medicine for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation also contains pharmaceutically acceptable excipients, can be prepared into any pharmaceutically feasible dosage form, and can also be supplemented with auxiliary therapeutic components.

A tool for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation, containing the new highly active transposase of the present invention, or a nucleic acid construct encoding the new highly active transposase, or a recombinant vector encoding the new highly active transposase, or a host cell containing the new highly active transposase and/or a nucleic acid construct encoding the new highly active transposase and/or a recombinant vector encoding the new highly active transposase, or the above gene transfer system.

Description of the drawings

Figure 1 is a vector map of PRS316-URA-PBase in step (3) of Example 1.

Figure 2 is a schematic diagram of the flow of multiple rounds of cumulative error-prone PCR mutagenesis of the transposase in step (3) of Example 1 (top), and a schematic diagram of transforming the transposase fragments recovered from error-prone PCR and the linearized vector, at a 10:1 molar ratio, into the ura-deficient yeast strain (bottom).

Figure 3 is a schematic diagram of the mutant library and screening of high-efficiency transposase in step (3) of Example 1.

Figure 4 shows the plasmid PRS316-URA-PBase in Example 2 and a diagram of its working principle (A); a visual diagram of the transposition of WT PBase, hyPBase, optimized hyPBase and bz-hyPBase in yeast (B); a statistical graph of the transposition of WT PBase, hyPBase, optimized hyPBase and bz-hyPBase in yeast (C); and a statistical histogram of the transposition of WT PBase, hyPBase, optimized hyPBase and bz-hyPBase in yeast (D).

Figure 5 is a schematic diagram of the structure of the ploxP-bz-HyPB plasmid in Example 3.

Figure 6 is a schematic diagram of the pSAD-EGFP plasmid structure in Example 3.

FIG. 7 is a comparison diagram of the efficiency of editing CHO cell genome using optimized hyPBase and bz-hyPBase transposase in Example 3. It can be seen that the transposition efficiency of bz-hyPBase in CHO cells is significantly increased.

FIG. 8 is a comparison diagram of the efficiency of preparing CAR T cells using optimized hyPBase and bz-hyPBase transposase in Example 4. It can be seen that the transposition efficiency of bz-hyPBase in multiple PBMC donors is significantly increased. A: Results of 7 days; B: Results of 14 days.

Detailed description

Compared with the transposase shown in SEQ ID NO: 1, the highly active transposase provided by the present invention has amino acid mutations, including amino acid insertions, deletions or substitutions, at one, any two or all three positions selected from positions 92, 119 and 601; or, compared with the transposase shown in SEQ ID NO: 11, it has amino acid insertions, deletions or substitutions at one, any two or all three positions selected from positions 82, 109 and 591. A preferred mutation is a substitution mutation. Preferably, the highly active transposase of the present invention has mutations at all three of the above positions, in particular amino acid substitutions. Preferably, apart from the mutations at these positions, the amino acid residues at the remaining positions of the transposase of the present invention are the same as the amino acid residues at the corresponding positions of SEQ ID NO: 1 or 11.

In a preferred embodiment, the amino acid sequence of the highly active transposase of the present invention has one, any two or all three of the following substitution mutations compared with the sequence shown in SEQ ID NO: 1: the isoleucine at position 92 is mutated to asparagine, the valine at position 119 is mutated to alanine, and the glutamine at position 601 is mutated to arginine; further preferably, the substitution mutation has occurred at all three of these positions. In a preferred embodiment, the amino acid sequence of the highly active transposase of the present invention has one, any two or all three of the following substitution mutations compared with the sequence shown in SEQ ID NO: 11: the isoleucine at position 82 is mutated to asparagine, the valine at position 109 is mutated to alanine, and the glutamine at position 591 is mutated to arginine; further preferably, the substitution mutation has occurred at all three of these positions. Preferably, apart from the mutations at these positions, the amino acid residues at the remaining positions of the transposase of the present invention are the same as the amino acid residues at the corresponding positions of SEQ ID NO: 1 or 11.
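The substitution mutations described above can be written in the common shorthand "original residue, position, new residue" (for example I92N, V119A, Q601R). The following minimal sketch applies such substitutions to a protein string and checks that the expected original residue is present. The function name `apply_substitutions` and the seven-residue toy sequence are assumptions for illustration; the actual SEQ ID NO: 1 sequence is not reproduced here.

```python
import re


def apply_substitutions(protein: str, mutations: list[str]) -> str:
    """Apply substitutions such as 'I92N' (1-based positions) to a protein string."""
    residues = list(protein)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        old, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if residues[position - 1] != old:
            raise ValueError(
                f"expected {old} at position {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)


# Hypothetical toy sequence with I, V and Q at positions 3, 5 and 7.
toy_protein = "MKIAVLQ"
toy_mutant = apply_substitutions(toy_protein, ["I3N", "V5A", "Q7R"])
print(toy_mutant)
```

Checking the original residue before substituting guards against applying a mutation list to the wrong numbering scheme, for example to a sequence with or without the nuclear localization sequence.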

In some embodiments, the amino acid sequence of the highly active transposase of the present invention is shown in SEQ ID NO: 12. In a particularly preferred embodiment, the amino acid sequence of the highly active transposase of the present invention is shown in SEQ ID NO: 2. The amino acid sequence of the transposase shown in SEQ ID NO: 11 and 12 herein does not contain a nuclear localization sequence.
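The paired positions given in the text (92, 119, 601 in SEQ ID NO: 1 versus 82, 109, 591 in SEQ ID NO: 11) differ by a constant of 10. The sketch below converts between the two numbering schemes on that basis. The offset of 10 is inferred from those paired positions and is not stated as such in the application, and the function names are hypothetical.

```python
# Offset inferred from the paired positions in the text (92/82, 119/109, 601/591).
NLS_OFFSET = 10


def to_nls_free_numbering(position_with_nls: int) -> int:
    """Convert a position in the NLS-containing sequence to NLS-free numbering."""
    if position_with_nls <= NLS_OFFSET:
        raise ValueError("position lies within the N-terminal extension")
    return position_with_nls - NLS_OFFSET


def to_nls_numbering(position_without_nls: int) -> int:
    """Convert a position in the NLS-free sequence to NLS-containing numbering."""
    if position_without_nls < 1:
        raise ValueError("positions are 1-based")
    return position_without_nls + NLS_OFFSET


print([to_nls_free_numbering(p) for p in (92, 119, 601)])
```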

The present invention also includes the following transposases: a transposase that, compared with SEQ ID NO: 1, has, in addition to the mutations described in any of the embodiments herein at one, any two or all three of positions 92, 119 and 601, one or more insertion, deletion and/or substitution mutations at one or more other amino acid positions of SEQ ID NO: 1; or a transposase that, compared with SEQ ID NO: 11, has, in addition to the mutations described in any of the embodiments herein at one, any two or all three of positions 82, 109 and 591, one or more insertion, deletion and/or substitution mutations at one or more other amino acid positions of SEQ ID NO: 11; in either case, the transposase still has the transposase activity described herein. Preferred mutations are substitution mutations, and more preferred are conservative substitutions. For example, substituting an amino acid residue with a residue of the same or similar properties usually does not significantly change the transposase activity of the resulting mutant. For example, amino acids whose side chain groups have the same polarity can be used for substitution. Based on the polarity of their side chain groups, amino acids can be divided into non-polar (hydrophobic) amino acids and polar (hydrophilic) amino acids. Non-polar amino acids include alanine, valine, leucine, isoleucine, proline, phenylalanine, tryptophan and methionine; polar amino acids include neutral, basic and acidic amino acids, among which neutral amino acids include serine, threonine, cysteine, tyrosine, asparagine and glutamine, basic amino acids include lysine, arginine and histidine, and acidic amino acids include aspartic acid and glutamic acid. In some embodiments, this type of transposase has at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with SEQ ID NO: 1 and has the substitution mutations described in any of the embodiments herein at at least one, any two or all three of positions 92, 119 and 601; or has at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% sequence identity with SEQ ID NO: 11 and has the substitution mutations described in any of the embodiments herein at at least one, any two or all three of positions 82, 109 and 591. A tool known in the art, such as BLASTP, can be used to calculate the sequence identity between two amino acid sequences.
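The position numbering and the identity thresholds described above can be illustrated with a minimal sketch. The function names, the toy reference sequence and the position-by-position identity measure are assumptions made for illustration only: the toy sequence is not SEQ ID NO: 1, the offset of 10 residues between the two numbering schemes is inferred from the paired positions given in the text (92/82, 119/109, 601/591), and a tool such as BLASTP performs a real alignment rather than this simplified comparison.

```python
# Hypothetical helpers for illustration; not taken from the sequence listing.

# Substitutions named in the text, numbered against SEQ ID NO: 1 (1-based).
SUBSTITUTIONS_SEQ1 = {92: ("I", "N"), 119: ("V", "A"), 601: ("Q", "R")}


def shift_positions(substitutions, offset):
    """Renumber substitutions for a sequence lacking `offset` leading residues."""
    return {pos - offset: change for pos, change in substitutions.items()}


def apply_substitutions(sequence, substitutions):
    """Return a copy of `sequence` with each (old -> new) substitution applied."""
    residues = list(sequence)
    for pos, (old, new) in substitutions.items():
        if residues[pos - 1] != old:
            raise ValueError(f"expected {old} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned, equal-length sequences (no gaps)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def make_toy_reference(length=610):
    """Build a placeholder sequence with I, V and Q at positions 92, 119 and 601."""
    residues = ["G"] * length
    residues[91], residues[118], residues[600] = "I", "V", "Q"
    return "".join(residues)


toy_reference = make_toy_reference()
toy_mutant = apply_substitutions(toy_reference, SUBSTITUTIONS_SEQ1)
# Assumed offset of 10 residues for the numbering without the nuclear localization sequence.
SUBSTITUTIONS_SEQ11 = shift_positions(SUBSTITUTIONS_SEQ1, 10)
print(sorted(SUBSTITUTIONS_SEQ11))
print(percent_identity(toy_reference, toy_mutant) >= 99.0)
```

In this sketch a triple mutant of a 610-residue placeholder differs from its reference at only three positions, so it stays above the 99% identity threshold while carrying all three substitutions.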

In some embodiments, the present invention provides a fusion protein, which contains the highly active transposase described in any embodiment of the present invention and a functional protein, or consists of the highly active transposase and the functional protein. It should be understood that the fusion protein should at least retain the transposition activity of the highly active transposase described herein. The functional protein is used to improve or increase the biological activity or biological function of the highly active transposase of the present invention. Exemplary functional proteins include, but are not limited to, functional proteins used to increase the transposition activity of the transposase, to monitor the transposition function of the transposase, and/or to add new functions to the transposase. For example, functional proteins include, but are not limited to: nuclear localization signal proteins/sequences, which can guide the transposase to accumulate in the nucleus, thereby helping to improve the transposition efficiency of the transposase; marker proteins (such as fluorescent marker proteins, for example green fluorescent protein (such as EGFP), red fluorescent protein, blue fluorescent protein, yellow fluorescent protein, etc.) or tag proteins (such as His6, Flag, GST, MBP, HA, Myc, His-Myc, etc.), which enhance monitoring of the transposition function of the transposase and facilitate qualitative and/or quantitative monitoring of its transposition activity; and antibodies of interest, which add a new function to the transposase, such as increased immunogenicity. An exemplary nuclear localization signal protein or sequence is the c-myc nuclear localization signal sequence, and its sequence may be as shown in amino acid residues 3-11 of SEQ ID NO: 1.

In a preferred embodiment, in the fusion protein of the present invention, the amino acid sequence of the transposase has one, any two or all three of the following substitution mutations compared with the sequence shown in SEQ ID NO: 1: the isoleucine at position 92 is mutated to asparagine, the valine at position 119 is mutated to alanine, and the glutamine at position 601 is mutated to arginine; further preferably, the transposase has the substitution mutations at all three of the above positions; and further preferably, the amino acid residues at the remaining positions of the transposase are the same as the amino acid residues at the corresponding positions of SEQ ID NO: 1. Alternatively, in the fusion protein of the present invention, the amino acid sequence of the transposase has one, any two or all three of the following substitution mutations compared with the sequence shown in SEQ ID NO: 11: the isoleucine at position 82 is mutated to asparagine, the valine at position 109 is mutated to alanine, and the glutamine at position 591 is mutated to arginine; further preferably, the transposase has the substitution mutations at all three of the above positions; and further preferably, the amino acid residues at the remaining positions of the transposase are the same as the amino acid residues at the corresponding positions of SEQ ID NO: 1. In a particularly preferred embodiment, the amino acid sequence of the transposase in the fusion protein of the present invention is shown in SEQ ID NO: 2 or 12.

In the fusion protein of the present invention, if necessary, the transposase and the functional protein can be connected via a linker sequence. The linker sequence may be a conventional linker, such as a linker sequence containing glycine and serine. In the fusion protein, the transposase can be located at the N-terminus or C-terminus of the fusion protein; or, when the fusion protein has two or more functional proteins, the transposase can also be located between the functional proteins.

The present invention includes nucleic acid molecules whose polynucleotide sequence is the coding sequence of the transposase described herein or the complementary sequence of the coding sequence, or the coding sequence of the fusion protein described herein or the complementary sequence thereof. In some embodiments, compared with SEQ ID NO: 4, the coding sequence of the transposase of the present invention has a base mutation at one, any two, or all three of positions 276, 356 and 1802, and optionally a base mutation at position 900. Preferably, when a mutation occurs, base T at position 276 is mutated to base C, base T at position 356 is mutated to base C, base G at position 900 is mutated to base A, and base A at position 1802 is mutated to base G. The polynucleotide sequence of the nucleic acid molecule of the present invention is shown in SEQ ID NO: 3. In some embodiments, compared with SEQ ID NO: 13, the polynucleotide sequence of the nucleic acid molecule of the present invention has a base mutation at one, any two, or all three of positions 246, 326 and 1772, and optionally a base mutation at position 870; preferably, the mutation at position 246 is a mutation of base T to base C, the mutation at position 326 is a mutation of base T to base C, the mutation at position 870 is a mutation of base G to base A, and the mutation at position 1772 is a mutation of base A to base G; more preferably, the polynucleotide sequence is as shown in SEQ ID NO: 14. Here, the polynucleotide sequences shown in SEQ ID NOs: 13 and 14 do not contain the coding sequence of the nuclear localization sequence.
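The relationship between the base positions listed above and the amino acid positions of the transposase can be sketched with simple arithmetic. The helper name `codon_of` is hypothetical, and the sketch assumes that base 1 is the first base of the first codon of the coding sequence and that the two numbering schemes differ by a fixed offset of 30 bases, which is inferred from the paired positions in the text (276/246, 356/326, 900/870, 1802/1772) rather than stated in the source.

```python
def codon_of(base_position):
    """Map a 1-based base position to (codon number, position within codon), both 1-based."""
    index = base_position - 1
    return index // 3 + 1, index % 3 + 1


# Base positions numbered against SEQ ID NO: 4, as listed in the text.
FULL_LENGTH_POSITIONS = (276, 356, 900, 1802)

# Assumption: fixed offset between SEQ ID NO: 4 and SEQ ID NO: 13 numbering.
NLS_FREE_OFFSET = 30

for position in FULL_LENGTH_POSITIONS:
    codon, within = codon_of(position)
    print(f"base {position} (or {position - NLS_FREE_OFFSET}): codon {codon}, base {within} of 3")
```

Under these assumptions, bases 276, 356 and 1802 fall in codons 92, 119 and 601, and base 900 falls in codon 300; an offset of 30 bases corresponds to 10 codons, which matches the shift from amino acid positions 92, 119 and 601 to positions 82, 109 and 591.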

In some embodiments, the polynucleotide sequence of the nucleic acid molecule of the present invention has at least 80% homology, preferably at least 90% homology, and more preferably at least 96% homology with the polynucleotide sequence shown in SEQ ID NO: 4 or 13 and has the same coding function, while having the base mutation at one, any two or all three of position 276 or 246, position 356 or 326, and position 1802 or 1772, and optionally having the base mutation at position 900 or 870.

Also provided herein is a nucleic acid construct containing the coding sequence of the transposase described in any embodiment herein or its complement, or the coding sequence of the fusion protein described in any embodiment herein or its complement. In some embodiments, the nucleic acid construct is an expression cassette, and in addition to the coding sequence, the expression cassette also contains a transcription termination sequence such as a PolyA tailing signal sequence and a promoter. Appropriate promoters are well known in the art, and those skilled in the art can select a suitable promoter capable of promoting the expression of the transposase described herein or its fusion protein in the host according to the host used for expression.

In some embodiments, the nucleic acid construct sequentially includes the following elements: a transposon 5' terminal repeat sequence (5'ITR), a multiple cloning insertion site, a polyA tailing signal sequence, a transposon 3' terminal repeat sequence (3'ITR), the nucleic acid molecule described in any of the embodiments herein, and a promoter that controls the expression of the nucleic acid molecule. The direction and/or order referred to by "sequentially" in "sequentially includes the following elements" is from upstream to downstream. In the present invention, unless otherwise specified, the "forward direction" is from upstream to downstream, and the "reverse direction" is from downstream to upstream.

In some embodiments, the 5' terminal repeat sequence of the transposon is the 5' terminal repeat sequence of the PiggyBac transposon, and its nucleotide sequence is, for example, as shown in SEQ ID NO: 15; the 3' terminal repeat sequence of the transposon is the 3' terminal repeat sequence of the PiggyBac transposon, and its nucleotide sequence is, for example, as shown in SEQ ID NO: 16.

In some embodiments, the polyA tailing signal sequence has a polyA tailing signal function in both forward and reverse directions.

In some embodiments, each of the above 6 elements is independently a single copy or multiple copies.

The above-mentioned 6 elements may be directly connected, or may contain other sequences such as linker or restriction site.
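The upstream-to-downstream order of the six elements can be written down as a small data structure. This is only an illustrative sketch: the class name `Element`, the `copies` field and the `describe` helper are hypothetical and are used here just to record the order and copy number described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One element of the construct; `copies` reflects single or multiple copies."""
    name: str
    copies: int = 1


# Order follows the text, read from upstream to downstream.
CONSTRUCT = [
    Element("transposon 5' terminal repeat sequence (5'ITR)"),
    Element("multiple cloning insertion site"),
    Element("polyA tailing signal sequence"),
    Element("transposon 3' terminal repeat sequence (3'ITR)"),
    Element("transposase coding nucleic acid molecule"),
    Element("promoter controlling expression of the nucleic acid molecule"),
]


def describe(elements):
    """Render the elements as a single upstream-to-downstream string."""
    parts = []
    for element in elements:
        if element.copies < 1:
            raise ValueError(f"{element.name}: copy number must be at least 1")
        suffix = f" x{element.copies}" if element.copies > 1 else ""
        parts.append(element.name + suffix)
    return " -> ".join(parts)


print(describe(CONSTRUCT))
```

Because the promoter is listed downstream of the coding nucleic acid molecule, the sketch keeps that order exactly as stated rather than rearranging it into a conventional promoter-first cassette.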

In the present invention, if there is no special description, the above-mentioned "the polyA tailing signal sequence has a polyA tailing signal function in both forward and reverse directions" includes but is not limited to the following situations:

1) A polyA tailing signal sequence, which has the function of polyA tailing signal in both forward and reverse directions;

2) Two polyA tailing signal sequences, one has the function of polyA tailing in the forward direction, and the other has the function of polyA tailing signal in the reverse direction.

Preferably, the solution in 1) above is adopted. Without being bound by theory, the exogenous gene expression cassette and the PiggyBac transposase expression cassette can share one polyA tailing signal sequence, thereby eliminating one polyA tailing signal sequence, making the construct more compact, reducing the size of the plasmid, and helping to increase the capacity of the foreign gene expression cassette while maintaining transfection efficiency.

In some embodiments, the PB expression cassette is placed in the same direction as the exogenous gene expression cassette, and two polyA tailing signal sequences are used, where the PB expression cassette is in front and one polyA tailing signal sequence is placed between one of the ITRs and the exogenous gene promoter. For example, the elements are arranged as follows: the promoter that controls the expression of PB transposase, the PB transposase coding sequence, the transposon 5' terminal repeat sequence, polyA tailing signal sequence 1, the foreign gene promoter and foreign gene (multiple cloning insertion site), polyA tailing signal sequence 2, and the transposon 3' terminal repeat sequence; and the direction of the expression cassette of the PB transposase is the same as the direction of the expression cassette of the foreign gene.

In some embodiments, the positions of the 5' terminal repeat sequence and the 3' terminal repeat sequence of the transposon can be interchanged.

In some embodiments, the nucleotide sequence of the multiple cloning insertion site is shown in SEQ ID NO: 17.

In some embodiments, the nucleotide sequence of the polyA tailing signal sequence is shown in SEQ ID NO: 18; the sequence shown in SEQ ID NO: 18 has a polyA tailing signal function in both forward and reverse directions.

Exemplary promoters include, but are not limited to, CMV promoter, EF1α promoter, SV40 promoter, Ubiquitin B promoter, CAG promoter, HSP70 promoter, PGK-1 promoter, β-actin promoter, TK promoter and GRP78 promoter.

One or more identical or different foreign genes of interest, and optionally a promoter that controls the expression of the foreign gene, can be operably inserted into the multiple cloning site of the nucleic acid construct of the present invention, or the multiple cloning site can be replaced with one or more identical or different exogenous gene coding sequences and optionally a promoter that controls the expression of the exogenous gene; each exogenous gene is independently present as a single copy or multiple copies.

In some embodiments, the direction of the expression cassette of the transposase is opposite to the direction of the expression cassette of the foreign gene.

In some embodiments, the exogenous gene is selected from one or more of fluorescent protein reporter genes (such as green fluorescent protein, red fluorescent protein, yellow fluorescent protein, etc.), luciferase genes (such as firefly luciferase, Renilla luciferase, etc.), natural functional protein genes (such as TP53, GM-CSF, OCT4, SOX2, Nanog, KLF4, c-Myc), RNAi genes and artificial chimeric genes (such as chimeric antigen receptor genes, Fc fusion protein genes, full-length antibody genes, Nanobody genes).

As used herein, the term "expression cassette" refers to the complete elements required to express a gene, including promoters, gene coding sequences, and PolyA tailing signal sequences.

The term "nucleic acid construct" is defined herein as a single-stranded or double-stranded nucleic acid molecule, and preferably refers to an artificially constructed nucleic acid molecule. Optionally, the nucleic acid construct further comprises one or more control sequences operably linked, and the control sequences can direct the coding sequence to be expressed in a suitable host cell under compatible conditions. Expression should be understood to include any steps involved in the production of a protein or polypeptide, including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.

The term "operably inserted/linked" is defined herein as a conformation in which the regulatory sequence is located at an appropriate position relative to the coding sequence of the DNA sequence so that the regulatory sequence directs the expression of the protein or polypeptide. In the nucleic acid construct of the present invention, for example, the foreign gene promoter and the foreign gene coding sequence are placed at the multiple cloning site by DNA recombination technology. The "operably linked" can be achieved by means of DNA recombination, specifically, the nucleic acid construct is a recombinant nucleic acid construct.

The term "coding sequence" is defined herein as the part of a nucleic acid sequence that directly determines the amino acid sequence of its protein product. The boundary of the coding sequence is usually determined by the ribosome binding site immediately upstream of the 5'open reading frame of the mRNA (for prokaryotic cells) and the transcription termination sequence immediately downstream of the 3'open reading frame of the mRNA. Coding sequences can include, but are not limited to DNA, cDNA, and recombinant nucleic acid sequences.

The term "regulatory sequence" herein is defined as including all components necessary or advantageous for expressing the peptide of the present invention. Each control sequence may be naturally contained or foreign to the nucleic acid sequence encoding the protein or polypeptide. These regulatory sequences include, but are not limited to, leader sequences, polyadenylation sequences, propeptide sequences, promoters, signal sequences, and transcription terminator. At a minimum, regulatory sequences should include promoters and termination signals for transcription and translation. In order to introduce specific restriction sites to connect the regulatory sequence to the coding region of the nucleic acid sequence encoding the protein or polypeptide, a regulatory sequence with a linker can be provided.

The control sequence may be a suitable promoter sequence, that is, a nucleic acid sequence recognized by the host cell expressing the nucleic acid sequence. The promoter sequence contains transcriptional regulatory sequences that mediate the expression of the protein or polypeptide. The promoter can be any nucleic acid sequence that is transcriptionally active in the host cell of choice, including mutant, truncated and hybrid promoters, and can be derived from genes encoding extracellular or intracellular proteins or polypeptides that are homologous or heterologous to the host cell.

The regulatory sequence can also be a suitable transcription termination sequence, that is, a sequence that can be recognized by the host cell to terminate transcription. The termination sequence can be operably linked to the 3'end of the nucleic acid sequence encoding the protein or polypeptide. Any terminator that can function in the host cell of choice can be used in the present invention.

The control sequence can also be a suitable leader sequence, that is, an untranslated region of mRNA that is important for translation by the host cell. The leader sequence is operably linked to the 5'end of the nucleic acid sequence encoding the polypeptide. Any leader sequence that can function in the host cell of choice can be used in the present invention.

The control sequence can also be a signal peptide coding region, which encodes an amino acid sequence linked to the amino terminus of a protein or polypeptide, which can guide the encoded polypeptide into the secretory pathway of cells. The 5'end of the coding region of the nucleic acid sequence may naturally contain a signal peptide coding region in which the translation reading frame is naturally linked to the fragment of the coding region of the secreted polypeptide. Alternatively, the 5'end of the coding region may contain a signal peptide coding region that is foreign to the coding sequence. When the coding sequence normally does not contain a signal peptide coding region, it may be necessary to add a foreign signal peptide coding region. Alternatively, the natural signal peptide coding region can be simply replaced with a foreign signal peptide coding region to enhance polypeptide secretion. However, any signal peptide coding region that can guide the expressed polypeptide into the secretory pathway of the host cell used can be used in the present invention.

The control sequence can also be a propeptide coding region, which encodes an amino acid sequence located at the amino terminus of the polypeptide. The resulting polypeptide is called a zymogen or a propolypeptide. The propolypeptide is usually inactive and can be converted into a mature active polypeptide by cleaving the propeptide from the propolypeptide through catalysis or autocatalysis.

When there is both a signal peptide and a pro-peptide region at the amino terminus of the polypeptide, the pro-peptide region is adjacent to the amino terminus of the polypeptide, and the signal peptide region is adjacent to the amino terminus of the pro-peptide region.

It may also be necessary to add regulatory sequences that can regulate the expression of the polypeptide according to the growth of the host cell. Examples of regulatory systems are those that respond to chemical or physical stimuli (including in the presence of regulatory compounds) to turn on or turn off gene expression. Other examples of regulatory sequences are those that enable gene amplification. In these examples, the nucleic acid sequence encoding the protein or polypeptide should be operably linked to the regulatory sequence.

In some embodiments, the invention provides recombinant vectors. The recombinant vector may contain the nucleic acid molecule or nucleic acid construct described in any of the embodiments herein. The recombinant vector can be a recombinant cloning vector, a recombinant eukaryotic expression vector or a recombinant virus vector. The recombinant vector may contain other regulatory elements, including but not limited to a leader sequence, polyadenylation sequence, propeptide sequence, enhancer, transcription terminator, resistance gene, etc. The corresponding recombinant vector can be selected and constructed according to different purposes, so that it contains the required regulatory elements. The recombinant cloning vector is preferably a pRS vector, a T vector or a pUC vector, the recombinant eukaryotic expression vector is preferably pEGFP, pCMVp-NEO-BAN or pSV2, and the recombinant viral vector is preferably a recombinant adenovirus vector or a lentiviral vector. In some embodiments, the recombinant cloning vector is a recombinant vector obtained by recombination of the nucleic acid construct according to any one of the embodiments of the present invention with a pUC18, pUC19, pMD18-T, pMD19-T, pGM-T, pUC57, pMAX or pDC315 series vector; the recombinant expression vector is a recombinant vector obtained by recombination of the nucleic acid construct according to any embodiment of the present invention with a pCDNA3 series vector, pCDNA4 series vector, pCDNA5 series vector, pCDNA6 series vector, pRL series vector, pUC57 vector, pMAX vector or pDC315 series vector; the recombinant virus vector is a recombinant adenovirus vector, a recombinant adeno-associated virus vector, a recombinant retrovirus vector, a recombinant herpes simplex virus vector or a recombinant vaccinia virus vector.

The nucleic acid constructs and recombinant vectors can be constructed by methods well known in the art, and expressed by conventional methods, so as to prepare the transposase and fusion proteins described herein.

In some embodiments, the present invention also provides a host cell, which contains the nucleic acid molecule, nucleic acid construct and/or recombinant vector described in any of the embodiments herein, or expresses the transposase and/or fusion protein described in any of the embodiments herein. The host cell of the present invention is preferably an E. coli cell, an insect cell, a yeast cell or a mammalian cell. In some embodiments, the host cell is a recombinant mammalian cell; for example, a recombinant primary culture T cell, Jurkat cell, K562 cell, tumor cell, HEK293 cell or CHO cell.

In some embodiments, the present invention also provides a gene transfer system, which contains the transposase, fusion protein, nucleic acid molecule, nucleic acid construct, recombinant vector or host cell described in any of the embodiments herein. In some embodiments, the gene transfer system further contains a transposon gene. In some embodiments, the nucleic acid molecule or nucleic acid construct described in any of the embodiments herein is integrated with the transposon gene; in some embodiments, the nucleic acid molecule or nucleic acid construct is relatively independent of the transposon gene; in some embodiments, the nucleic acid molecule or nucleic acid construct and the transposon gene are located on the same recombinant vector; in some embodiments, the nucleic acid molecule or nucleic acid construct and the transposon gene are located on different recombinant vectors; in some embodiments, the transposon gene is integrated into the nucleic acid construct; in some embodiments, the transposon gene is integrated into the recombinant vector described in any of the embodiments herein; in some embodiments, the transposon gene is transferred into the host cell described in any of the embodiments herein; in some embodiments, the transposon gene is located outside the host cell described in any of the embodiments herein.

The present invention also provides the use of the transposase, fusion protein, nucleic acid molecule, nucleic acid construct, recombinant vector, host cell or gene transfer system described in any of the embodiments herein in any of the following:

(1) Preparation of, or use as, drugs and/or preparations for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation; preferably, preparation of, or use as, drugs and/or preparations for integrating foreign genes into the host cell genome, where the host cell is preferably an E. coli cell, an insect cell, a yeast cell or a mammalian cell;

(2) Preparation of, or use as, a tool for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation; preferably, preparation of, or use as, a tool for integrating foreign genes into the host cell genome, where the host cell is preferably an E. coli cell, an insect cell, a yeast cell or a mammalian cell.

The present invention also provides a medicine and/or preparation for genome research, gene therapy, cell therapy, or induction and/or differentiation of pluripotent stem cells, containing the transposase, fusion protein, nucleic acid molecule, nucleic acid construct, recombinant vector, host cell or gene transfer system described in any of the embodiments herein.

The present invention also provides a tool for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation, which contains the transposase, fusion protein, nucleic acid molecule, nucleic acid construct, recombinant vector, host cell or gene transfer system described in any of the embodiments herein.

In some embodiments, the present invention includes the following items 1 to 18:

1. An amino acid sequence of a highly active transposase, containing one or more of the following amino acid sequences: (1) an amino acid sequence having transposase activity obtained by amino acid mutation at the following positions of the amino acid sequence shown in SEQ ID NO: 1: at least one of amino acid 92, amino acid 119 or amino acid 601; preferably amino acid 92, amino acid 119 and amino acid 601 are mutated simultaneously; more preferably the isoleucine at position 92 is mutated to asparagine, the valine at position 119 is mutated to alanine, and the glutamine at position 601 is mutated to arginine; (2) an amino acid sequence having transposase activity obtained by deletion, substitution, insertion or addition of one or more amino acids other than the mutated amino acid 92, amino acid 119 or amino acid 601 in (1); preferably an amino acid sequence having transposase activity obtained by deletion, substitution, insertion or addition of one or more amino acids other than amino acid 92, amino acid 119 and amino acid 601, where all three are mutated simultaneously; more preferably an amino acid sequence having transposase activity obtained by deletion, substitution, insertion or addition of one or more amino acids other than the isoleucine at position 92 mutated to asparagine, the valine at position 119 mutated to alanine, and the glutamine at position 601 mutated to arginine.

2. The amino acid sequence according to item 1, wherein the amino acid sequence also contains the amino acid sequence of a functional protein; the amino acid sequence of the functional protein is preferably the amino acid sequence of a nuclear localization signal, the amino acid sequence of EGFP green fluorescent protein, the amino acid sequence of a tag protein, the amino acid sequence of an antibody, etc.

3. An amino acid sequence of a highly active transposase, containing the amino acid sequence shown in SEQ ID NO: 2, or an amino acid sequence having transposase activity obtained by deleting, substituting, inserting or adding one or more amino acids of the amino acid sequence shown in SEQ ID NO: 2 other than amino acid 92, amino acid 119 and amino acid 601.

4. A peptide fragment containing one or more amino acid sequences described in any one of items 1-3.

5. A protein containing one or more amino acid sequences described in any one of items 1 to 3 or one or more peptide fragments described in item 4, which has transposase activity.

6. A nucleotide sequence encoding the amino acid sequence of any one of items 1-3 or the peptide fragment of item 4 or the protein of item 5, containing one or more of the following nucleotide sequences: (1) a nucleotide sequence obtained by base mutation of the nucleotide sequence shown in SEQ ID NO: 4 at at least one of base 276, base 356, base 900 or base 1802; preferably base 275, base 356, base 900 and base 1802 are mutated simultaneously; more preferably base T at position 275 is mutated to base C, base T at position 356 is mutated to base C, base G at position 900 is mutated to base A, and base A at position 1802 is mutated to base G; or (2) a nucleotide sequence complementary to the mutated nucleotide sequence in (1); or (3) a nucleotide sequence that overlaps with the mutated nucleotide sequence in (1) and has the same coding function; or (4) a nucleotide sequence that hybridizes with the mutated nucleotide sequence in (1) and has the same coding function; or (5) a nucleotide sequence that is obtained by substitution, deletion or addition of one or more bases, other than at the mutation sites, in the nucleotide sequence of (1), (2), (3) or (4) and has the same coding function; or (6) a nucleotide sequence that has at least 80% homology with the nucleotide sequence in (1), (2), (3) or (4) and has the same coding function; preferably a nucleotide sequence with at least 90% homology and the same coding function; more preferably a nucleotide sequence with at least 96% homology and the same coding function.

7. A nucleotide sequence encoding the amino acid sequence described in any one of items 1-3 or the peptide fragment described in item 4 or the protein described in item 5, containing one or more of the following nucleotide sequences: (1) the nucleotide sequence shown in SEQ ID NO: 3; or (2) a nucleotide sequence complementary to the mutated nucleotide sequence in (1); or (3) a nucleotide sequence that overlaps with the mutated nucleotide sequence in (1) and has the same coding function; or (4) a nucleotide sequence that hybridizes with the mutated nucleotide sequence in (1) and has the same coding function; or (5) a nucleotide sequence that is obtained by substitution, deletion or addition of one or more bases, other than at the mutation sites, in the nucleotide sequence of (1), (2), (3) or (4) and has the same coding function; or (6) a nucleotide sequence that has at least 80% homology with the nucleotide sequence in (1), (2), (3) or (4) and has the same coding function; preferably a nucleotide sequence with at least 90% homology and the same coding function; more preferably a nucleotide sequence with at least 96% homology and the same coding function.

8. The nucleotide sequence described in item 6 or 7, which also contains a nucleotide sequence encoding a functional protein, preferably a nucleotide sequence encoding a nuclear localization signal, a nucleotide sequence encoding EGFP green fluorescent protein, a nucleotide sequence encoding a tag protein peptide, or a nucleotide sequence encoding an antibody.

9. A nucleic acid containing the nucleotide sequence of any one of items 6-8.

10. A nucleic acid construct encoding the amino acid sequence described in any one of items 1 to 3 or the peptide fragment described in item 4 or the protein described in item 5.

11. A nucleic acid construct according to item 10, which contains the nucleotide sequence according to any one of items 6 to 8, or contains the nucleic acid according to item 9.

12. A recombinant vector containing the nucleotide sequence of any one of items 6-8, or the nucleic acid of item 9, or the nucleic acid construct of any one of items 10-11; the recombinant vector is preferably a recombinant cloning vector, a recombinant eukaryotic expression vector or a recombinant viral vector, the recombinant cloning vector is preferably a pRS vector, a T vector or a pUC vector, the recombinant eukaryotic expression vector is preferably pEGFP, pCMVp-NEO-BAN or pSV2, and the recombinant virus vector is preferably a recombinant adenovirus vector or a lentivirus vector.

13. A host cell containing the nucleic acid construct according to any one of items 10-11 or the recombinant vector according to item 12; the host cell is preferably an E. coli cell, an insect cell, a yeast cell or a mammalian cell.

14. A gene transfer system, characterized in that it contains the peptide fragment of item 4, or the protein of item 5, or the nucleic acid of item 9, or the nucleic acid construct of any one of items 10-11, or the recombinant vector of item 12, or the host cell of item 13.

15. A gene transfer system according to item 14, characterized in that it further contains a transposon gene, wherein the nucleic acid of item 9 or the nucleic acid construct of any one of items 10-11 is integrated with the transposon gene; or the nucleic acid of item 9 or the nucleic acid construct of any one of items 10-11 and the transposon gene are relatively independent; or the nucleic acid of item 9 or the nucleic acid construct of any one of items 10-11 and the transposon gene are located on the same recombinant vector; or the nucleic acid of item 9 or the nucleic acid construct of any one of items 10-11 and the transposon gene are located on different recombinant vectors; or the transposon gene is integrated into the nucleic acid construct of any one of items 10-11; or the transposon gene is integrated into the recombinant vector of item 12; or the transposon gene is transferred into the host cell of item 13; or the transposon gene is located outside the host cell of item 13.

16. Use of the peptide fragment of item 4, or the protein of item 5, or the nucleic acid of item 9, or the nucleic acid construct of any one of items 10-11, or the recombinant vector of item 12, or the host cell of item 13, or the gene transfer system of any one of items 14-15, in any one of the following items:

17. A drug and/or preparation for genome research, gene therapy, cell therapy, or induction and/or differentiation of pluripotent stem cells, containing the peptide of item 4, or the protein of item 5, or the nucleic acid of item 9, or the nucleic acid construct of any one of items 10-11, or the recombinant vector of item 12, or the host cell of item 13, or the gene transfer system of any one of items 14-15.

18. A tool for genome research, gene therapy, cell therapy, or induction and/or differentiation of pluripotent stem cells, containing the peptide of item 4, or the protein of item 5, or the nucleic acid of item 9, or the nucleic acid construct of any one of items 10-11, or the recombinant vector of item 12, or the host cell of item 13, or the gene transfer system of any one of items 14-15.

The present invention is described more clearly below with reference to the drawings and specific embodiments. The specific embodiments serve only to explain the present invention and do not limit it in any way. Unless otherwise specified, the experimental methods and conditions in the examples are conventional, and reagents and the like are used according to the manufacturer's instructions.

Example 1 Obtaining a highly active bz-hyPBase mutant

Starting from the original sequence of the existing highly active piggyBac transposase (hyPBase for short; amino acid sequence shown in SEQ ID NO: 1), we made the following changes to obtain the sequence of the baize piggyBac transposase claimed herein (bz-hyPBase for short):

(1) Based on human codon usage preferences, we optimized the existing high-activity piggybac transposase to obtain the nucleotide sequence shown in SEQ ID NO: 4 to increase the expression level of the transposase;

(2) A human c-myc nuclear localization signal is added after the start codon to improve the integration efficiency of foreign genes in the host cell;

(3) The following method was used to randomly mutate the nucleotide sequence shown in SEQ ID NO: 4 to obtain a mutant whose transposition efficiency is significantly higher than that of the existing highly active piggyBac transposase. We named it bz-hyPBase (amino acid sequence shown in SEQ ID NO: 2, nucleotide sequence shown in SEQ ID NO: 3). The details are as follows:

a. Construction of the screening reporter vector

A G418 resistance gene is inserted between the 5'IR and 3'IR of the original transposon by gene synthesis to form the transposon G418-IR. The transposon is inserted at the TTAA site in the URA3 gene by recombination after PCR, and the transposase with an inducible promoter is inserted into the multiple cloning site of PRS316, finally yielding the screening reporter vector PRS316-URA-PBase. The specific operations are as follows:

(1) PCR was performed on the template PRS316 using primers pURA-F (SEQ ID NO: 5: aagccgctaaaggcattatccgcc) and pURA-R (SEQ ID NO: 6: aactgtgccctccatggaaaaatcagtc) to obtain linearized fragment 1 of plasmid PRS316.

(2) Use primers pURA-IR-F (SEQ ID NO: 7) and pURA-IR-R (SEQ ID NO: 8) to perform PCR on the synthesized transposon G418-IR to obtain linearized fragment 2, the transposon carrying sequences homologous to PRS316.

pURA-IR-F (SEQ ID NO: 7):

pURA-IR-R (SEQ ID NO: 8):

(3) Use NEBuilder homologous recombinase to connect fragment 1 and fragment 2 to form plasmid PRS316-URA.

(4) Synthesize the PB transposase gene with the GALS inducible promoter and clone it into the vector PRS316-URA using SacI and EcoRI to generate the plasmid PRS316-URA-PBase. The PRS316-URA-PBase vector map is shown in Figure 1.

b. Construction of mutant library

PCR primers are designed outside the open reading frame (ORF) of the transposase: GR-F (SEQ ID NO: 9: taatcagcgaagcgatga) and GR-R (SEQ ID NO: 10: cagcatgcctgctattgtcttcc), so that the amplified fragment shares a homologous sequence of about 50 bp with the PRS316-URA-PBase vector at each end of the transposase ORF. The transposase is mutated using Clontech's error-prone PCR kit, and mutations can be accumulated by using the recovered PCR fragments as the template for further rounds of mutagenesis (as shown in the upper flow chart of Figure 2), finally yielding transposase fragments containing point mutations. The screening reporter vector PRS316-URA-PBase is linearized with XbaI and EcoRI, which removes the original unmutated transposase. The transposase fragments recovered after PCR and the linearized vector are transformed into a ura-deficient yeast strain at a molar ratio of 10:1 (as shown in the lower flow chart of Figure 2 and in Figure 3). The yeast uses its own homologous recombination repair mechanism to insert the exogenous target fragment, via its homology arms, into the gapped plasmid, thereby assembling a complete plasmid carrying the target fragment inside the yeast cell. This method achieves one-step cloning of DNA fragments in the yeast strain and, at the same time, reduces the frequent duplication of mutants that arises when plasmids are constructed and amplified in Escherichia coli and then transferred into yeast. With this method, the clones obtained on the plate after transformation are mutants, and a mutant library of a given size can be obtained by picking single clones.
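As a rough illustration of the 10:1 fragment-to-vector molar ratio used in the co-transformation described above, the following Python sketch converts a molar ratio into a mass of fragment. It relies only on the approximation that the mass of double-stranded DNA is proportional to its length. The function name, the fragment and vector lengths, and the vector amount are hypothetical values chosen for illustration and are not taken from this document.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=10.0):
    """Mass of insert (ng) giving insert:vector = molar_ratio:1.

    Assumes the mass of double-stranded DNA scales linearly with its
    length, so the number of moles is proportional to mass / length.
    """
    if vector_ng <= 0 or vector_bp <= 0 or insert_bp <= 0 or molar_ratio <= 0:
        raise ValueError("all quantities must be positive")
    return molar_ratio * vector_ng * insert_bp / vector_bp


# Hypothetical numbers: 100 ng of an 8000 bp linearized vector and a
# 2000 bp mutagenized transposase fragment.
print(insert_mass_ng(vector_ng=100, vector_bp=8000, insert_bp=2000))
```

With these illustrative numbers, 250 ng of fragment corresponds to a 10:1 molar excess over 100 ng of vector; actual amounts depend on the real fragment and vector lengths.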

c. Screening process for efficient transposase

As shown in Figure 3, the screening process consists of two rounds. In the first screening, all mutants are screened on a large scale to obtain mutants with significantly higher transposition efficiency than the unmutated control group. The second screening is carried out on the yeast obtained in the first screening, and the exact transposition efficiency is calculated, yielding bz-hyPBase, a mutant with increased transposition efficiency in yeast (amino acid sequence SEQ ID NO: 2, nucleotide sequence SEQ ID NO: 3).

First screening: clones of the transformed mutant library are picked and activated in YPD medium containing G418 in 96-well plates. After 24 hours of activation, they are transferred with a replicator and inoculated into YPD medium containing 2% galactose for induction. After 24 hours of induction, the culture is diluted to 10^-2 or 10^-3 (depending on the growth of the yeast), 10 μl is spotted onto ura-deficient solid medium, and the growth of the mutants is observed after 48 hours of culture. By comparison with the unmutated clones, clones with significantly higher transposition efficiency are selected and passed to the second screening.

Second screening: the candidate mutants obtained in the first screening are activated for 24 hours, adjusted to the same OD600 value, and inoculated at a ratio of 1:100 into YPD medium containing 2% galactose for 24 hours of induction. After induction, the OD600 value is again adjusted to be the same, and the cultures are diluted to 10^-2, 10^-3 and 10^-4. 20 μl of the 10^-2 and 10^-3 dilutions is spread on ura-deficient solid medium and cultured for 24 hours, and the clones are counted; clones that grow on ura-deficient solid medium are those in which transposition has occurred. In parallel, 20 μl of the 10^-3 and 10^-4 dilutions is spread on YPD complete solid medium as a control; the clones that grow give the total number of yeast. Transposase transposition efficiency = number of transposed clones / total number of clones = (number of clones on ura-deficient medium × dilution factor) / (number of clones on YPD medium × dilution factor) × 100%. This method enables high-throughput screening: a single run can screen 96-960 mutants, which greatly increases the probability of obtaining a highly active transposase.
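The transposition efficiency formula stated above can be sketched in Python as follows. This is a minimal sketch that assumes the same volume is plated on both media; the function name and the colony counts in the usage example are hypothetical and are not data from this document.

```python
def transposition_efficiency(ura_colonies, ura_dilution, ypd_colonies, ypd_dilution):
    """Transposition efficiency in percent.

    ura_colonies / ypd_colonies: colonies counted on ura-deficient and
    YPD plates. *_dilution: fold dilution of the plated culture
    (e.g. 1e2 for a 10^-2 dilution). Equal plated volumes are assumed.
    """
    if ypd_colonies <= 0:
        raise ValueError("YPD plate must have at least one colony")
    transposed = ura_colonies * ura_dilution  # transposed clones, back-calculated
    total = ypd_colonies * ypd_dilution       # total clones, back-calculated
    return transposed / total * 100.0


# Hypothetical counts: 30 colonies at 10^-2 on ura-deficient medium,
# 150 colonies at 10^-4 on YPD medium.
print(transposition_efficiency(30, 1e2, 150, 1e4))
```

With these illustrative counts the efficiency is about 0.2%. Multiplying each count by its own dilution factor is what allows plates from different dilutions to be compared.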

The above calculation gives the exact transposition efficiency of each mutant, and the strains with increased transposition efficiency are selected for mutation site analysis. The yeast from the initially activated 96-well plate is inoculated for expansion culture, the yeast plasmid is extracted and sent out for sequencing and analysis, and the mutation sites of the mutant are identified by comparison with the original sequence.

The amino acid sequence of hyPBase with nuclear localization sequence (SEQ ID NO:1):

The amino acid sequence of bz-hyPBase with nuclear localization sequence (SEQ ID NO: 2):

Starting from the amino acid sequence of the existing highly active transposase hyPBase (SEQ ID NO: 1), isoleucine at position 92 is mutated to asparagine, valine at position 119 is mutated to alanine, and glutamine at position 601 is mutated to arginine, giving the amino acid sequence of bz-hyPBase shown in SEQ ID NO: 2.
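The three substitutions described above can be written in the common wild-type/position/mutant shorthand (I92N, V119A, Q601R). The following Python sketch applies such substitutions to a protein sequence and checks that the expected original residue is present at each position. Because the sequence of SEQ ID NO: 1 is not reproduced here, the example uses a toy placeholder sequence; the function and variable names are hypothetical.

```python
import re

# Substitutions stated in the text, in wild-type/position/mutant notation
# (1-based positions in SEQ ID NO: 1).
BZ_SUBSTITUTIONS = ["I92N", "V119A", "Q601R"]


def apply_substitutions(seq, substitutions):
    """Return seq with each substitution applied, checking the original residue."""
    residues = list(seq)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if pos < 1 or pos > len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != old:
            raise ValueError(
                f"expected {old} at position {pos}, found {residues[pos - 1]}"
            )
        residues[pos - 1] = new
    return "".join(residues)


# Toy placeholder sequence (NOT SEQ ID NO: 1): poly-glycine with the
# expected wild-type residues placed at positions 92, 119 and 601.
toy = list("G" * 610)
toy[91], toy[118], toy[600] = "I", "V", "Q"
mutant = apply_substitutions("".join(toy), BZ_SUBSTITUTIONS)
print(mutant[91], mutant[118], mutant[600])
```

Checking the original residue before replacing it guards against off-by-one numbering errors, for example when a sequence with and without the nuclear localization sequence is numbered differently.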

The nucleotide sequence of human codon-optimized hyPBase transposase containing nuclear localization sequence (SEQ ID NO: 4):

Nucleotide sequence of bz-hyPBase transposase containing nuclear localization sequence (SEQ ID NO: 3):

The nucleotide sequence of the existing highly active transposase hyPBase was optimized for human codon usage to obtain a human codon-optimized nucleotide sequence. Starting from this human codon-optimized nucleotide sequence (SEQ ID NO: 4), the following base mutations were made: base T at position 276 was mutated to base C, base T at position 356 was mutated to base C, base G at position 900 was mutated to base A, and base A at position 1802 was mutated to base G. This gives the mutated nucleotide sequence shown in SEQ ID NO: 3, which encodes the new highly active transposase bz-hyPBase of the present invention.
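To relate the base positions listed above to amino acid positions, the following Python sketch maps a 1-based nucleotide position to its codon number and its position within that codon. It assumes that position 1 of SEQ ID NO: 4 is the first base of the start codon and that the reading frame is uninterrupted; this is an assumption, since the sequence itself is not reproduced here, and the function name is hypothetical.

```python
def codon_of(nt_position):
    """Map a 1-based nucleotide position in an ORF to
    (1-based codon number, 1-based position within that codon)."""
    if nt_position < 1:
        raise ValueError("positions are 1-based")
    return (nt_position - 1) // 3 + 1, (nt_position - 1) % 3 + 1


for nt in (276, 356, 900, 1802):
    codon, offset = codon_of(nt)
    print(f"base {nt}: codon {codon}, codon position {offset}")
```

Under that assumption, bases 276, 356 and 1802 fall in codons 92, 119 and 601, the same positions as the three amino acid changes described for SEQ ID NO: 1, while base 900 falls in codon 300, which is not among the listed amino acid changes.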

Example 2 bz-hyPBase has higher transposition efficiency in yeast

We inserted the transposon carrying the G418 resistance gene into the URA3 gene of the yeast plasmid PRS316, disrupting expression of the URA gene, and at the same time cloned the transposase with the inducible promoter into PRS316 to generate the plasmid PRS316-URA-PBase. Plasmids carrying the different transposases WT PBase, hyPBase, optimized hyPBase and bz-hyPBase were prepared in parallel. Each plasmid was transformed into the ura-deficient Saccharomyces cerevisiae strain BJ2168, which cannot survive in ura-deficient medium. Expression of the transposase is switched on by the inducer galactose and promotes transposition of the transposon; once the transposon has been excised, the URA gene is expressed normally, and clones in which transposition has occurred resume normal growth in ura-deficient medium. By counting the number of transposed clones among a given number of yeast cells, the transposition efficiency of the transposase in Saccharomyces cerevisiae can be calculated. Using this method, we compared the transposition efficiency of the wild-type piggyBac transposase WT PBase, the existing highly active piggyBac transposase hyPBase, the codon-optimized transposase optimized hyPBase, and bz-hyPBase with the nuclear localization signal added. The experimental results (Figure 4) show that the transposition efficiency of bz-hyPBase is 3 times that of hyPBase, which demonstrates that bz-hyPBase has higher transposition efficiency in yeast.

WT PBase is a plasmid carrying a mammalian codon-optimized piggyBac transposase; hyPBase is a plasmid carrying the existing highly active piggyBac transposase (obtained by mutating 7 amino acid sites of WT PBase, as described in the background art); optimized hyPBase is a plasmid carrying the transposase obtained from the existing highly active piggyBac transposase by human codon optimization and addition of a nuclear localization signal; and bz-hyPBase is the new highly active transposase screened in the present invention (i.e., a plasmid obtained by introducing into optimized hyPBase the three amino acid site mutations described in the Examples of the present invention).

Example 3 bz-hyPBase has higher gene editing efficiency in CHO cells

We cloned optimized hyPBase and bz-hyPBase into mammalian cell expression vectors to generate the plasmids ploxP-optimized hyPBase (same structure as in Figure 5, except that the bz-hyPBase transposase in Figure 5 is replaced by optimized hyPBase) and ploxP-bz-HyPB (Figure 5) to express the transposase. Both the optimized hyPBase and bz-hyPBase promoters are connected to the human c-myc nuclear localization signal. The transposon carrying the EGFP gene was cloned into the vector pSAD-EGFP (Figure 6) to express green fluorescent protein. The plasmid expressing the transposase and the plasmid carrying the transposon were co-electroporated into CHO cells. Under the action of the transposase, the transposon carrying EGFP is inserted into the genome, so that green fluorescent protein is stably expressed. After two subcultures, the cells expressing green fluorescent protein were counted by flow cytometry on days 7 and 14. The more cells that express the fluorescent protein, the higher the transposition efficiency of the transposase. The statistical results in Figure 7 show that the transposition activity of bz-hyPBase is significantly better than that of hyPBase.

Example 4 bz-hyPBase has higher gene editing efficiency in T cells

We introduced the ploxP-optimized hyPBase and ploxP-bz-HyPB plasmids of Example 3 into peripheral blood mononuclear cells (PBMCs) for T cell genome editing. The transposon carrying the EGFP green fluorescent protein gene edits the T cell genome under the action of the transposase, and the editing efficiency in T cells reflects the strength of the transposase activity. We used PBMCs from 3 different healthy individuals to conduct multiple experiments. Gene editing efficiency was measured by flow cytometry on day 5. The higher the EGFP-positive rate, the higher the transposase activity. The experimental results are shown in Figure 8. In PBMCs from different donors, the transposition activity of bz-hyPBase is better than that of optimized hyPBase.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit its scope of protection. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that several improvements can be made without departing from the principle of the present invention, and these improvements should also be regarded as falling within the scope of protection of the present invention.

Claims

A transposase, characterized in that the transposase is a mutant of SEQ ID NO: 11 whose amino acid sequence, compared with SEQ ID NO: 11, has amino acid mutations at one, any two, or all three positions selected from positions 82, 109 and 591, and optionally has one or more amino acid mutations at positions of SEQ ID NO: 11 other than positions 82, 109 and 591; or

The transposase is a mutant of SEQ ID NO: 1 whose amino acid sequence, compared with SEQ ID NO: 1, has amino acid mutations at one, any two, or all three positions selected from positions 92, 119 and 601, and optionally has one or more amino acid mutations at positions of SEQ ID NO: 1 other than positions 92, 119 and 601.
The transposase of claim 1, wherein the amino acid sequence of the transposase, compared with the sequence shown in SEQ ID NO: 11, has one, any two, or all three of the following substitution mutations: isoleucine at position 82 is mutated to asparagine, valine at position 109 is mutated to alanine, and glutamine at position 591 is mutated to arginine; optionally, the transposase has one or more amino acid mutations at positions of SEQ ID NO: 11 other than positions 82, 109 and 591; preferably, the amino acid residues at the remaining positions of the transposase are the same as in SEQ ID NO: 11; or

Compared with the sequence shown in SEQ ID NO: 1, the amino acid sequence of the transposase has one, any two, or all three of the following substitution mutations: isoleucine at position 92 is mutated to asparagine, valine at position 119 is mutated to alanine, and glutamine at position 601 is mutated to arginine; optionally, the transposase has one or more amino acid mutations at positions of SEQ ID NO: 1 other than positions 92, 119 and 601; preferably, the amino acid residues at the remaining positions of the transposase are the same as in SEQ ID NO: 1;

Preferably, the amino acid sequence of the transposase is shown in SEQ ID NO: 2 or 12.
A fusion protein comprising the transposase of any one of claims 1-2 and a functional protein, or consisting of the transposase of any one of claims 1-2 and the functional protein.
The fusion protein of claim 3, wherein the functional protein is a functional protein used to increase the transposition activity of the transposase, to monitor the transposition function of the transposase, and/or to add new functions to the transposase;

Preferably, the functional protein is selected from the group consisting of a nuclear localization signal protein, a marker protein or a tag protein and an antibody of interest.
A nucleic acid molecule whose polynucleotide sequence is:

(1) A polynucleotide sequence encoding the transposase of any one of claims 1-2;

(2) A polynucleotide sequence encoding the fusion protein of any one of claims 3-4; or

(3) The complementary sequence of the polynucleotide sequence described in (1) or (2).
The nucleic acid molecule of claim 5, wherein the polynucleotide sequence, compared with SEQ ID NO: 4, has base mutations at one, any two, or all three of positions 276, 356 and 1802, and optionally also has a base mutation at position 900; preferably, the mutation at position 276 is a mutation of base T to base C, the mutation at position 356 is a mutation of base T to base C, the mutation at position 900 is a mutation of base G to base A, and the mutation at position 1802 is a mutation of base A to base G; more preferably, the polynucleotide sequence is as shown in SEQ ID NO: 3; or

Compared with SEQ ID NO: 13, the polynucleotide sequence has base mutations at one, any two, or all three of positions 246, 326 and 1772, and optionally also has a base mutation at position 870; preferably, the mutation at position 246 is a mutation of base T to base C, the mutation at position 326 is a mutation of base T to base C, the mutation at position 870 is a mutation of base G to base A, and the mutation at position 1772 is a mutation of base A to base G; more preferably, the polynucleotide sequence is as shown in SEQ ID NO: 14.
A nucleic acid construct containing the nucleic acid molecule of claim 5 or 6; preferably, the nucleic acid construct is an expression cassette.
The nucleic acid construct according to claim 7, wherein the nucleic acid construct comprises, in order: a transposon 5' end repeat sequence, a multiple cloning site, a polyA tailing signal sequence, a transposon 3' end repeat sequence, the nucleic acid molecule of claim 5 or 6, and a promoter that controls the expression of the nucleic acid molecule.
A recombinant vector containing the nucleic acid molecule of claim 5 or 6 or the nucleic acid construct of claim 7 or 8; preferably, the recombinant vector is a recombinant cloning vector, a recombinant eukaryotic expression vector or a recombinant viral vector.
The recombinant vector of claim 9, wherein the recombinant cloning vector is a pRS vector, a T vector or a pUC vector, the recombinant eukaryotic expression vector is pEGFP, pCMVp-NEO-BAN or pSV2, and the recombinant viral vector is a recombinant adenovirus vector or a lentivirus vector.
A host cell containing the nucleic acid molecule of claim 5 or 6, the nucleic acid construct of claim 7 or 8, or the recombinant vector of claim 9 or 10, and/or expressing the transposase of any one of claims 1-2 and/or the fusion protein of claim 3 or 4; preferably, the host cell is an E. coli cell, an insect cell, a yeast cell or a mammalian cell.
A gene transfer system, which contains the transposase of any one of claims 1-2, the fusion protein of claim 3 or 4, the nucleic acid molecule of claim 5 or 6, the nucleic acid construct of claim 7 or 8, the recombinant vector of claim 9 or 10, or the host cell of claim 11.
The gene transfer system of claim 12, wherein the gene transfer system further contains a transposon gene;

Preferably, the nucleic acid molecule of claim 5 or 6 or the nucleic acid construct of claim 7 or 8 is integrated with the transposon gene, or the nucleic acid molecule of claim 5 or 6 or the nucleic acid construct of claim 7 or 8 is relatively independent of the transposon gene, or the nucleic acid molecule of claim 5 or 6 or the nucleic acid construct of claim 7 or 8 and the transposon gene are located on the same recombinant vector, or the nucleic acid molecule of claim 5 or 6 or the nucleic acid construct of claim 7 or 8 and the transposon gene are located on different recombinant vectors, or the transposon gene is integrated into the nucleic acid construct of claim 7 or 8, or the transposon gene is integrated into the recombinant vector of claim 9 or 10, or the transposon gene is transferred into the host cell of claim 11, or the transposon gene is located outside the host cell of claim 11.
The transposase of any one of claims 1-2, the fusion protein of claim 3 or 4, the nucleic acid molecule of claim 5 or 6, the nucleic acid construct of claim 7 or 8, Use of the recombinant vector of claim 9 or 10, the host cell of claim 11 or the gene transfer system of claim 12 or 13 in any of the following:

(1) Preparation of, or use as, drugs and/or preparations for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation; preferably, preparation of, or use as, drugs and/or preparations for integrating foreign genes into the host cell genome, the host cell preferably being an E. coli cell, an insect cell, a yeast cell or a mammalian cell;

(2) Preparation of, or use as, a tool for genome research, gene therapy, cell therapy, or induction and/or differentiation of pluripotent stem cells; preferably, preparation of, or use as, a tool for integrating foreign genes into the host cell genome, the host cell preferably being an E. coli cell, an insect cell, a yeast cell or a mammalian cell.
A medicine, preparation or tool for genome research, gene therapy, cell therapy, or induction and/or differentiation of pluripotent stem cells, which contains the transposase of any one of claims 1-2, the fusion protein of claim 3 or 4, the nucleic acid molecule of claim 5 or 6, the nucleic acid construct of claim 7 or 8, the recombinant vector of claim 9 or 10, the host cell of claim 11, or the gene transfer system of claim 12 or 13.