CN112899252A

CN112899252A - High-activity transposase and application thereof

Info

Publication number: CN112899252A
Application number: CN202010940595.4A
Authority: CN
Inventors: 文雯; 宋姗姗; 刘韬; 刘祥箴; 金华君; 钱其军
Original assignee: Shanghai Cell Therapy Research Institute; Shanghai Cell Therapy Group Co Ltd
Current assignee: Shanghai Cell Therapy Research Institute; Shanghai Cell Therapy Group Co Ltd
Priority date: 2019-12-04
Filing date: 2020-09-09
Publication date: 2021-06-04
Also published as: WO2021110119A1

Abstract

The invention belongs to the fields of molecular biology and biomedicine, and particularly relates to a high-activity transposase and its application. The amino acid sequence of the high-activity transposase is shown in SEQ ID NO: 2, and the nucleotide sequence encoding it is shown in SEQ ID NO: 3. When the high-activity transposase based on this amino acid sequence is used in a transposition system, the gene transfer activity of the transposon is markedly improved. The high-activity transposase and the nucleotide sequence encoding it can be used to construct gene transfer systems, and to prepare, or serve as, medicines, preparations or tools for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation.

Description

High-activity transposase and application thereof

Technical Field

The invention belongs to the fields of molecular biology and biomedicine, and particularly relates to a high-activity transposase and application thereof.

Background

DNA transposons are mobile DNA sequences that can move from one location in the genome to another through a series of steps such as excision and reintegration. The PiggyBac (PB) transposon is a DNA transposon isolated from the Trichoplusia ni TN368 cell line. It inserts specifically at TTAA sites and, with the aid of its transposase, can be excised precisely from the host without leaving a footprint in the host chromosome. The PB transposon carries no potential viral genotoxicity, can carry long exogenous gene fragments (up to 150 kb) and has a strong transformation capability. PB transposase-mediated transgenesis is characterized by high integration efficiency, stable integration, long-term expression, single-copy integration, locatable insertion sites and simple operation. It is commonly applied in fields such as transgenic mouse production, genetic manipulation of mouse embryonic stem cells, gene mutagenesis and pluripotent stem cell induction.
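Because PB inserts specifically at TTAA sites, candidate insertion positions in a sequence can be listed by a simple string scan. The following is a minimal sketch only; the function name `find_ttaa_sites` and the toy sequence are illustrative assumptions and do not come from the patent.

```python
def find_ttaa_sites(dna: str) -> list[int]:
    """Return the 0-based start index of every TTAA tetranucleotide in dna."""
    seq = dna.upper()
    sites = []
    start = seq.find("TTAA")
    while start != -1:
        sites.append(start)
        # Continue one base further so that overlapping matches are not missed.
        start = seq.find("TTAA", start + 1)
    return sites


# Hypothetical toy sequence, used only to exercise the function.
toy_genome = "GCGTTAACCGATTAAGGTTAATTAAC"
print(find_ttaa_sites(toy_genome))  # [3, 11, 17, 21]
```

A scan like this only lists sequence-level candidates; it says nothing about which sites a transposase would actually use in a cell.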

Among the DNA transposons currently available for mammalian cells, the PB transposase has the highest transposition activity, and it has broad application prospects. Many studies at home and abroad have used the PB transposon system as a gene editing method for transgenesis and gene mutation in a variety of organisms, including insect cells, protists, plants and vertebrates. In 2003, Tomita fused human type III collagen with enhanced green fluorescent protein (EGFP) and integrated the fusion into the silkworm fibroin gene using the PB transposon, obtaining transgenic silkworms that stably express human collagen. In 2005, Balu inserted human dihydrofolate reductase (hDHFR) into the genome of Plasmodium (the malaria parasite) via the PB transposon system. In 2014, Eric T obtained a stable transgenic line in which the PB transposon transposes in vivo. In 2005, Sheng Ding used PB transposons to introduce exogenous gene fragments with high efficiency into human and mouse cell lines cultured in vitro, where they were stably expressed, and bred transgenic fluorescent mice with stable traits, demonstrating that the PB transposon system can serve as an effective tool for studying gene function in other vertebrates.

The DNA transposon system consists of two parts: a transposon, which has inverted repeats (IRs) at both ends and can carry the DNA fragment of interest, and a transposase, which catalyzes the "cut and paste" of the transposon. The transposase first binds the IR sequences flanking the transposon, the transposon is then excised precisely and tracelessly from the host DNA site, and finally the DNA fragment is integrated into a new site. Establishing a high-efficiency transposition system makes it possible to knock out or introduce target genes in a site-directed manner, and provides an effective vector tool for gene editing in mammalian cells. The transposition efficiency of the system determines the efficiency of gene editing and depends largely on the expression level of the transposase, so increasing transposase activity is a key technical point for increasing the transposition efficiency of transposons.
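The "cut and paste" steps can be pictured with a toy string model. This is a sketch under stated assumptions rather than a description of the patented enzyme: the functions `excise` and `integrate` and all sequences are hypothetical, the inverted repeats are omitted for brevity, and the duplication of the TTAA target site on insertion (with a single TTAA restored on excision) reflects commonly described piggyBac behaviour that the text above does not spell out.

```python
TARGET_SITE = "TTAA"


def excise(donor: str, cargo: str) -> str:
    """Cut TTAA + cargo + TTAA out of the donor, leaving one intact TTAA behind."""
    flanked = TARGET_SITE + cargo + TARGET_SITE
    if donor.count(flanked) != 1:
        raise ValueError("cargo must occur exactly once, flanked by TTAA on both sides")
    return donor.replace(flanked, TARGET_SITE)


def integrate(target: str, cargo: str, site_index: int) -> str:
    """Paste cargo at the TTAA starting at site_index, so TTAA flanks it on both sides."""
    if target[site_index:site_index + len(TARGET_SITE)] != TARGET_SITE:
        raise ValueError("no TTAA at the requested position")
    # target[site_index:] still begins with TTAA, which becomes the right-hand copy.
    return target[:site_index] + TARGET_SITE + cargo + target[site_index:]


# Hypothetical toy sequences.
cargo = "GCATGC"
donor = "GGC" + TARGET_SITE + cargo + TARGET_SITE + "CCG"
target = "AAGTTAAGTC"

print(excise(donor, cargo))         # GGCTTAACCG
print(integrate(target, cargo, 3))  # AAGTTAAGCATGCTTAAGTC
```

In this model the donor is left with no trace of the cargo after excision, which is the "traceless" property described above.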

The transposition activity of a transposase is influenced by factors such as its binding sites, active sites and structure. Although the crystal structure of the transposase has not yet been clearly resolved, some structural domains are considered to be important, and experiments have shown that amino acids outside these specific sites can also affect transposase activity.

The paper "A hyperactive piggyBac transposase for mammalian applications" (PNAS, January 25, 2011, vol. 108, no. 4, 1531-1536) discloses a high-activity PiggyBac transposase (hereinafter the existing high-activity transposase hyPBase, shown in SEQ ID NO: 1) whose transposition efficiency is 10 times higher than that of mBase (wild-type PiggyBac transposase optimized with mammalian codons). It carries the following amino acid mutations: I30V, S103P, G165S, M282V, S509G, N538K and N570S.

PiggyBac transposon mutants and their use (PiggyBac transposon variants and methods of use, US9670503B2) and PiggyBac transposon variants and methods of use (CN102421902A) are both applications claiming priority from U.S. provisional application No. 61/155206. Both disclose the following: starting from an integration-deficient PiggyBac mutant, further mutation and selection yielded mutants with integration activity higher than that of the integration-deficient mutant; and starting from wild-type PiggyBac, mutation and selection yielded mutants with integration activity higher than that of the wild type.

Although the enzymatic activity of existing PiggyBac transposase mutants is improved over that of wild-type PiggyBac transposase, it still cannot meet practical requirements for enzyme activity, so research on PiggyBac transposases with higher enzymatic activity remains necessary.

Disclosure of Invention

The invention provides a novel high-activity transposase that shows extremely high transposition activity in cells such as Escherichia coli, insect cells, yeast cells and mammalian cells. Compared with the existing high-activity transposase hyPBase, it is applicable to a broad spectrum of host cells and also has high transposition activity in mammalian cells, particularly human cells. It thus provides a new lead and basis for the search for transposases, particularly transposases for use in human cells.

The invention also provides the amino acid sequence and peptide fragment on which the novel high-activity transposase is based; nucleotide sequences encoding the amino acid sequence, the peptide fragment and the high-activity transposase protein; nucleic acid constructs, recombinant vectors and host cells based on these nucleotide sequences; and gene transfer systems and applications based on the peptide fragment, protein, nucleic acid construct, recombinant vector and host cell.

In the amino acid sequence of the existing high-activity transposase hyPBase (shown in SEQ ID NO: 1), the invention simultaneously mutates isoleucine at position 92 to asparagine, valine at position 119 to alanine, and glutamine at position 601 to arginine, giving the target mutant amino acid sequence shown in SEQ ID NO: 2. In CHO cells, the transposition efficiency of the target high-activity transposase bz-hyPBase produced on the basis of the amino acid sequence of SEQ ID NO: 2 (51.7%) is nearly 21 percentage points higher than that of the existing high-activity transposase hyPBase optimized by codon optimization and addition of a nuclear localization signal (30.9%). In PBMC cells, the transposition efficiency of bz-hyPBase (19.4%) is nearly 10 percentage points higher than that of the same optimized hyPBase (9.81%). This demonstrates that the target high-activity enzyme based on the mutated amino acid sequence of the invention has better transposition activity than the existing high-activity transposase hyPBase, especially in mammalian cells and human-derived cells. The invention therefore provides a novel high-activity transposase that contains one or more amino acid sequences shown in SEQ ID NO: 2. This transposase shows extremely high transposition activity in Escherichia coli, insect cells, yeast cells and mammalian cells, and in particular meets the requirement for high transposition activity in mammalian and human cells.
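The efficiencies quoted above can be compared in more than one way. The sketch below uses only the four percentages given in the text and reports the absolute difference (in percentage points), the relative increase and the fold change; the function name `efficiency_gain` is an illustrative assumption.

```python
def efficiency_gain(baseline_pct: float, new_pct: float) -> tuple[float, float, float]:
    """Return (percentage-point gain, relative gain in %, fold change)."""
    point_gain = new_pct - baseline_pct
    relative_gain = point_gain / baseline_pct * 100
    fold_change = new_pct / baseline_pct
    return point_gain, relative_gain, fold_change


# Transposition efficiencies quoted in the text (optimized hyPBase vs. bz-hyPBase).
reported = {"CHO": (30.9, 51.7), "PBMC": (9.81, 19.4)}

for cell_type, (hypbase, bz_hypbase) in reported.items():
    points, relative, fold = efficiency_gain(hypbase, bz_hypbase)
    print(f"{cell_type}: +{points:.2f} points, +{relative:.2f}% relative, {fold:.2f}-fold")
```

Read this way, the "nearly 21%" and "nearly 10%" figures correspond to absolute differences of 20.8 and 9.59 percentage points, while the relative increases are larger (roughly 67% and 98%).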

The amino acid sequence of hyPBase (SEQ ID NO: 1):

MGPAAKRVKLDGSSLDDEHILSALLQSDDELVGEDSDSEVSDHVSEDDVQSDTEEAFIDEVHEVQPTSSGSEILDEQNVIEQPGSSLASNRILTLPQRTIRGKNKHCWSTSKPTRRSRVSALNIVRSQRGPTRMCRNIYDPLLCFKLFFTDEIISEIVKWTNAEISLKRRESMTSATFRDTNEDEIYAFFGILVMTAVRKDNHMSTDDLFDRSLSMVYVSVMSRDRFDFLIRCLRMDDKSIRPTLRENDVFTPVRKIWDLFIHQCIQNYTPGAHLTIDEQLLGFRGRCPFRVYIPNKPSKYGIKILMMCDSGTKYMINGMPYLGRGTQTNGVPLGEYYVKELSKPVHGSCRNITCDNWFTSIPLAKNLLQEPYKLTIVGTVRSNKREIPEVLKNSRSRPVGTSMFCFDGPLTLVSYKPKPAKMVYLLSSCDEDASINESTGKPQMVMYYNQTKGGVDTLDQMCSVMTCSRKTNRWPMALLYGMINIACINSFIIYSHNVSSKGEKVQSRKKFMRNLYMGLTSSFMRKRLEAPTLKRYLRDNISNILPKEVPGTSDDSTEEPVMKKRTYCTYCPSKIRRKASASCKKCKKVICREHNIDMCQSCF*

Target mutant amino acid sequence (SEQ ID NO: 2):

MGPAAKRVKLDGSSLDDEHILSALLQSDDELVGEDSDSEVSDHVSEDDVQSDTEEAFIDEVHEVQPTSSGSEILDEQNVIEQPGSSLASNRNLTLPQRTIRGKNKHCWSTSKPTRRSRASALNIVRSQRGPTRMCRNIYDPLLCFKLFFTDEIISEIVKWTNAEISLKRRESMTSATFRDTNEDEIYAFFGILVMTAVRKDNHMSTDDLFDRSLSMVYVSVMSRDRFDFLIRCLRMDDKSIRPTLRENDVFTPVRKIWDLFIHQCIQNYTPGAHLTIDEQLLGFRGRCPFRVYIPNKPSKYGIKILMMCDSGTKYMINGMPYLGRGTQTNGVPLGEYYVKELSKPVHGSCRNITCDNWFTSIPLAKNLLQEPYKLTIVGTVRSNKREIPEVLKNSRSRPVGTSMFCFDGPLTLVSYKPKPAKMVYLLSSCDEDASINESTGKPQMVMYYNQTKGGVDTLDQMCSVMTCSRKTNRWPMALLYGMINIACINSFIIYSHNVSSKGEKVQSRKKFMRNLYMGLTSSFMRKRLEAPTLKRYLRDNISNILPKEVPGTSDDSTEEPVMKKRTYCTYCPSKIRRKASASCKKCKKVICREHNIDMCRSCF*
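The three substitutions can be read off by comparing the two listings residue by residue. The sketch below is illustrative only: to stay short it compares three windows copied from SEQ ID NO: 1 and SEQ ID NO: 2 rather than the full sequences, and the window start positions (88, 115, 598) are assumptions derived from the position numbering used in this description.

```python
def substitutions(ref: str, alt: str, first_position: int = 1) -> list[str]:
    """List point substitutions between two equal-length sequences, e.g. 'I92N'."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length")
    return [
        f"{r}{first_position + i}{a}"
        for i, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


# (position of first residue, window from SEQ ID NO: 1, window from SEQ ID NO: 2)
WINDOWS = [
    (88, "ASNRILTLP", "ASNRNLTLP"),
    (115, "RRSRVSA", "RRSRASA"),
    (598, "DMCQSCF", "DMCRSCF"),
]

found = [s for start, ref, alt in WINDOWS for s in substitutions(ref, alt, start)]
print(found)  # ['I92N', 'V119A', 'Q601R']
```

Passing the complete sequences with `first_position=1` would perform the same comparison over the whole protein.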

A mutated amino acid sequence obtained by carrying out the above amino acid mutations at any one or any two of positions 92, 119 and 601 of the amino acid sequence of the existing high-activity transposase hyPBase (shown in SEQ ID NO: 1), where the enzyme formed on the basis of one or more such mutated amino acid sequences has a transposition efficiency the same as or similar to that of the target high-activity transposase bz-hyPBase described in the embodiments of the invention or of the existing hyPBase, also belongs to the mutated amino acid sequences of the novel high-activity transposase to be protected by the invention. The enzyme formed on the basis of such a mutated amino acid sequence likewise belongs to the novel high-activity transposase to be protected by the invention.
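The single-site and double-site variants described here, together with the triple mutant of SEQ ID NO: 2, can be enumerated mechanically. This is a minimal sketch; the function name `variant_sets` is an illustrative assumption, and the code only lists combinations without saying anything about their activity.

```python
from itertools import combinations

# The three substitutions named in the description.
SUBSTITUTIONS = ("I92N", "V119A", "Q601R")


def variant_sets(subs: tuple[str, ...]) -> list[tuple[str, ...]]:
    """Return every non-empty combination of the given substitutions."""
    return [
        combo
        for size in range(1, len(subs) + 1)
        for combo in combinations(subs, size)
    ]


for combo in variant_sets(SUBSTITUTIONS):
    print("+".join(combo))
```

Three substitutions give seven combinations in total: three single, three double and one triple.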

As described above, an amino acid sequence that is obtained by carrying out the above amino acid mutations at any one, any two or all three of positions 92, 119 and 601 of the amino acid sequence of the existing high-activity transposase hyPBase (shown in SEQ ID NO: 1), followed by deletion, substitution, insertion or addition of one or more amino acids, and that still maintains or improves the enzymatic activity, is an alternative with the same or similar technical effect within the technical scheme of the invention and falls within its scope of protection. It also belongs to the mutated amino acid sequences of the novel high-activity transposase to be protected by the invention, and an enzyme formed on the basis of one or more such mutated amino acid sequences likewise belongs to the novel high-activity transposase to be protected by the invention.

As described above, the mutant amino acid sequence obtained by carrying out the above amino acid mutations at any one, any two or all three of positions 92, 119 and 601 of the amino acid sequence of the existing high-activity transposase hyPBase (shown in SEQ ID NO: 1) may further contain the amino acid sequence of a functional protein. The functional protein is added to the novel high-activity transposase to improve or extend its functions; examples include the amino acid sequence of a nuclear localization signal, of EGFP green fluorescent protein, of a tag protein or of an antibody. Such functional proteins can increase the transposition activity of the novel high-activity transposase (for example, a nuclear localization signal can help increase transposition activity); can enhance monitoring of transposition (for example, EGFP green fluorescent protein or a tag protein facilitates qualitative and/or quantitative monitoring of the transposition activity of the transposase); or can add new functions to the novel high-activity transposase (for example, an antibody may additionally confer immune activity).

The invention also protects a peptide fragment, that is, a chain compound formed by linking amino acids through peptide bonds after dehydration condensation. Its amino acid sequence is either the mutant amino acid sequence obtained by carrying out the above amino acid mutations at any one, any two or all three of positions 92, 119 and 601 of the amino acid sequence of the existing high-activity transposase hyPBase (shown in SEQ ID NO: 1), or a derived amino acid sequence that is obtained from that mutant amino acid sequence by deletion, substitution, insertion or addition of one or more amino acids and still maintains or improves the enzyme activity. The peptide fragment may contain one or more of the above mutant amino acid sequences or derived amino acid sequences. The peptide fragment may also be linked to a peptide fragment of a functional protein, formed from the amino acid sequence of the functional protein by dehydration condensation and peptide bonding, such as a nuclear localization signal peptide fragment, an EGFP green fluorescent protein peptide fragment, a tag protein peptide fragment or an antibody peptide fragment.

The following all belong to the novel high-activity transposase protected by the invention: the mutant amino acid sequence obtained by carrying out the amino acid mutations at positions 92, 119 and 601 of the amino acid sequence of the existing high-activity transposase hyPBase (shown in SEQ ID NO: 1); a peptide fragment formed on the basis of the mutant amino acid sequence; a derived amino acid sequence that is obtained from the mutant amino acid sequence by deletion, substitution, insertion or addition of one or more amino acids and still maintains or improves the enzyme activity; a peptide fragment formed on the basis of the derived amino acid sequence; and a protein formed on the basis of these peptide fragments. The novel high-activity transposase may contain one or more of the above mutant amino acid sequences, derived amino acid sequences, and peptide fragments formed on their basis.

The following are all mutant nucleotide sequences to be protected by the invention that encode the novel high-activity transposase, its peptide fragment or its amino acid sequence: the mutant nucleotide sequence encoding them; a nucleotide sequence that is complementary to, hybridizes with or overlaps the mutant nucleotide sequence; a nucleotide sequence that has undergone base substitution, deletion or addition and still encodes the novel high-activity transposase; and a nucleotide sequence having at least 80%, preferably at least 90%, more preferably at least 96% homology with the mutant nucleotide sequence. Such a sequence may be present in one copy or in multiple copies. Specifically:

The nucleotide sequence encoding the amino acid sequence of the existing high-activity enzyme hyPBase (shown in SEQ ID NO: 1) is optimized with human codons to obtain a human codon-optimized nucleotide sequence (SEQ ID NO: 4), and the following base mutations are made in it: T at position 276 to C, T at position 356 to C, G at position 900 to A, and A at position 1802 to G. This yields the mutant nucleotide sequence, shown in SEQ ID NO: 3, which encodes the amino acid sequence (shown in SEQ ID NO: 2) of the novel high-activity transposase bz-hyPBase.

Nucleotide sequence of human codon optimized prior high activity enzyme hyPBase (SEQ ID NO: 4): atgggccctgctgccaagagggtcaagttggacggcagcagcctggacgacgagcacatcctgagcgccctgctgcagagcgacgacgagctggtgggcgaggacagcgacagcgaggtgagcgaccacgtgagcgaggacgacgtgcagagcgacaccgaggaggccttcatcgacgaggtgcacgaggtgcagcccaccagcagcggcagcgagatcctggacgagcagaacgtgatcgagcagcccggcagcagcctggccagcaaccgcaatctgaccctgccccagcgcaccatccgcggcaagaacaagcactgctggagcaccagcaagcccacccgccgcagccgcgtcagcgccctgaacatcgtgcgcagccagcgcggccccacccgcatgtgccgcaacatctacgaccccctgctgtgcttcaagctgttcttcaccgacgagatcatcagcgagatcgtgaagtggaccaacgccgagatcagcctgaagcgccgcgagagcatgaccagcgccaccttccgcgacaccaacgaggacgagatctacgccttcttcggcatcctggtgatgaccgccgtgcgcaaggacaaccacatgagcaccgacgacctgttcgaccgcagcctgagcatggtgtacgtgagcgtgatgagccgcgaccgcttcgacttcctgatccgctgcctgcgcatggacgacaagagcatccgccccaccctgcgcgagaacgacgtgttcacccccgtgcgcaagatctgggacctgttcatccaccagtgcatccagaactacacccccggcgcccacctgaccatcgacgagcagctgctgggcttccgcggccgctgccccttccgcgtgtacatccccaacaagcccagcaagtacggcatcaagatcctgatgatgtgcgacagcggcaccaagtacatgatcaacggcatgccctacctgggccgcggcacccagaccaacggcgtgcccctgggcgagtactacgtgaaggagctgagcaagcccgtgcacggcagctgccgcaacatcacctgcgacaactggttcaccagcatccccctggccaagaacctgctgcaggagccctacaagctgaccatcgtgggcaccgtgcgcagcaacaagcgcgagatccccgaggtgctgaagaacagccgcagccgccccgtgggcaccagcatgttctgcttcgacggccccctgaccctggtgagctacaagcccaagcccgccaagatggtgtacctgctgagcagctgcgacgaggacgccagcatcaacgagagcaccggcaagccccagatggtgatgtactacaaccagaccaagggcggcgtggacaccctggaccagatgtgcagcgtgatgacctgcagccgcaagaccaaccgctggcccatggccctgctgtacggcatgatcaacatcgcctgcatcaacagcttcatcatctacagccacaacgtgagcagcaagggcgagaaggtgcagagccgcaagaagttcatgcgcaacctgtacatgggcctgaccagcagcttcatgcgcaagcgcctggaggcccccaccctgaagcgctacctgcgcgacaacatcagcaacatcctgcccaaggaggtgcccggcaccagcgacgacagcaccgaggagcccgtgatgaagaagcgcacctactgcacctactgccccagcaagatccgccgcaaggccagcgccagctgcaagaagtgcaagaaggtgatctgccgcgagcacaacatcgacatgtgccagagctgcttctaa

Mutant nucleotide sequence (SEQ ID NO: 3):

atgggccctgctgccaagagggtcaagttggacggcagcagcctggacgacgagcacatcctgagcgccctgctgcagagcgacgacgagctggtgggcgaggacagcgacagcgaggtgagcgaccacgtgagcgaggacgacgtgcagagcgacaccgaggaggccttcatcgacgaggtgcacgaggtgcagcccaccagcagcggcagcgagatcctggacgagcagaacgtgatcgagcagcccggcagcagcctggccagcaaccgcaacctgaccctgccccagcgcaccatccgcggcaagaacaagcactgctggagcaccagcaagcccacccgccgcagccgcgccagcgccctgaacatcgtgcgcagccagcgcggccccacccgcatgtgccgcaacatctacgaccccctgctgtgcttcaagctgttcttcaccgacgagatcatcagcgagatcgtgaagtggaccaacgccgagatcagcctgaagcgccgcgagagcatgaccagcgccaccttccgcgacaccaacgaggacgagatctacgccttcttcggcatcctggtgatgaccgccgtgcgcaaggacaaccacatgagcaccgacgacctgttcgaccgcagcctgagcatggtgtacgtgagcgtgatgagccgcgaccgcttcgacttcctgatccgctgcctgcgcatggacgacaagagcatccgccccaccctgcgcgagaacgacgtgttcacccccgtgcgcaagatctgggacctgttcatccaccagtgcatccagaactacacccccggcgcccacctgaccatcgacgagcagctgctgggcttccgcggccgctgccccttccgcgtgtacatccccaacaagcccagcaaatacggcatcaagatcctgatgatgtgcgacagcggcaccaagtacatgatcaacggcatgccctacctgggccgcggcacccagaccaacggcgtgcccctgggcgagtactacgtgaaggagctgagcaagcccgtgcacggcagctgccgcaacatcacctgcgacaactggttcaccagcatccccctggccaagaacctgctgcaggagccctacaagctgaccatcgtgggcaccgtgcgcagcaacaagcgcgagatccccgaggtgctgaagaacagccgcagccgccccgtgggcaccagcatgttctgcttcgacggccccctgaccctggtgagctacaagcccaagcccgccaagatggtgtacctgctgagcagctgcgacgaggacgccagcatcaacgagagcaccggcaagccccagatggtgatgtactacaaccagaccaagggcggcgtggacaccctggaccagatgtgcagcgtgatgacctgcagccgcaagaccaaccgctggcccatggccctgctgtacggcatgatcaacatcgcctgcatcaacagcttcatcatctacagccacaacgtgagcagcaagggcgagaaggtgcagagccgcaagaagttcatgcgcaacctgtacatgggcctgaccagcagcttcatgcgcaagcgcctggaggcccccaccctgaagcgctacctgcgcgacaacatcagcaacatcctgcccaaggaggtgcccggcaccagcgacgacagcaccgaggagcccgtgatgaagaagcgcacctactgcacctactgccccagcaagatccgccgcaaggccagcgccagctgcaagaagtgcaagaaggtgatctgccgcgagcacaacatcgacatgtgccggagctgcttctaa

or a nucleotide sequence that is obtained from the mutant nucleotide sequence (shown in SEQ ID NO: 3) by base substitution, deletion or addition and that encodes the novel high-activity transposase bz-hyPBase;

or a nucleotide sequence that is complementary, according to the base complementary pairing principle, to the mutant nucleotide sequence (shown in SEQ ID NO: 3) or to such a sequence after base substitution, deletion or addition, and that corresponds to the novel high-activity transposase bz-hyPBase;

or a nucleotide sequence that overlaps the mutant nucleotide sequence (shown in SEQ ID NO: 3) and encodes the novel high-activity transposase bz-hyPBase;

or a nucleotide sequence that hybridizes with the mutant nucleotide sequence (shown in SEQ ID NO: 3) and encodes the novel high-activity transposase bz-hyPBase;

or a nucleotide sequence that has more than 80% homology with the mutant nucleotide sequence (shown in SEQ ID NO: 3) and encodes the novel high-activity transposase bz-hyPBase; preferably a nucleotide sequence that has 90% or more homology with the mutant nucleotide sequence (shown in SEQ ID NO: 3) and encodes the novel high-activity transposase bz-hyPBase; more preferably a nucleotide sequence that has more than 96% homology with the mutant nucleotide sequence (shown in SEQ ID NO: 3) and encodes the novel high-activity transposase bz-hyPBase;

all of these belong to the mutant nucleotide sequences, to be protected by the invention, that encode the novel high-activity transposase bz-hyPBase, its peptide fragment or its amino acid sequence.
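The base positions listed earlier for SEQ ID NO: 4 (276, 356, 900 and 1802) can be related to codon numbers with simple integer arithmetic, assuming the coding sequence is numbered from the first base of the start codon. The sketch below is illustrative; the function name `locate_in_codon` is an assumption.

```python
def locate_in_codon(nt_position: int) -> tuple[int, int]:
    """Map a 1-based nucleotide position in a coding sequence to
    (1-based codon number, 1-based position within that codon)."""
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    codon_index, offset = divmod(nt_position - 1, 3)
    return codon_index + 1, offset + 1


for position in (276, 356, 900, 1802):
    codon, within = locate_in_codon(position)
    print(f"base {position}: codon {codon}, position {within} of 3")
# base 276: codon 92, position 3 of 3
# base 356: codon 119, position 2 of 3
# base 900: codon 300, position 3 of 3
# base 1802: codon 601, position 2 of 3
```

Under this numbering, three of the four base changes fall in codons 92, 119 and 601, matching the amino acid positions named in the description, while base 900 falls in codon 300, which is not among them.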

If the novel high-activity transposase of the invention is also linked to a functional protein, the mutant nucleotide sequence encoding it also contains a nucleotide sequence encoding the functional protein, such as a nucleotide sequence encoding a nuclear localization signal, a nucleotide sequence expressing EGFP green fluorescent protein, a nucleotide sequence encoding a tag protein peptide fragment or a nucleotide sequence encoding an antibody.

The present invention also provides a nucleic acid polymerized from the above mutant nucleotide sequence encoding the novel high-activity transposase of the present invention, its peptide fragment or its amino acid sequence. When the novel high-activity transposase of the present invention is linked to a functional protein, the nucleic acid also contains a nucleotide sequence encoding the functional protein (nuclear localization signal, EGFP green fluorescent protein, tag protein or antibody).

The present invention also provides a nucleic acid construct comprising one or more control sequences operably linked to the above nucleic acid. The control sequences direct the expression of the coding sequence in a host cell, where expression includes any step involved in the production of the protein or polypeptide, including but not limited to transcription, post-transcriptional modification, translation, post-translational modification and secretion. The nucleic acid construct further comprises the above mutant nucleotide sequence encoding the novel high-activity transposase of the present invention, its peptide fragment or its amino acid sequence, or a nucleic acid polymerized from the mutant nucleotide sequence.

The present invention also provides a recombinant vector comprising the above-mentioned mutant nucleotide sequence encoding the novel high-activity transposase of the present invention, or a peptide fragment thereof, or an amino acid sequence thereof, or a nucleic acid obtained by polymerizing the mutant nucleotide sequence, or the above-mentioned nucleic acid construct. The recombinant vector comprises a recombinant cloning vector, a recombinant eukaryotic expression vector or a recombinant virus vector, wherein the recombinant cloning vector comprises a pRS vector, a T vector or a pUC vector and the like, the recombinant eukaryotic expression vector comprises pEGFP, pCMVp-NEO-BAN or pSV2 and the like, and the recombinant virus vector comprises a recombinant adenovirus vector or a lentivirus vector and the like.

The present invention also provides a host cell comprising the above-mentioned mutant nucleotide sequence encoding the novel high-activity transposase of the present invention, or a peptide fragment thereof, or an amino acid sequence thereof, or a nucleic acid polymerized from the mutant nucleotide sequence, or the above-mentioned nucleic acid construct, or the above-mentioned recombinant vector. The host cell includes Escherichia coli cell, insect cell, yeast cell or mammal cell, etc.

The novel high-activity transposase provided by the invention for improving the transposition activity of transposons in a transposition system, the peptide fragment forming it, the nucleic acid construct encoding it, the recombinant vector encoding it, or a host cell (Escherichia coli cell, insect cell, yeast cell, mammalian cell, etc.) containing the novel high-activity transposase and/or the nucleic acid construct and/or the recombinant vector encoding it, can be used to integrate exogenous genes into the host cell genome in a site-directed, stable and efficient manner and to achieve long-term, stable expression without affecting the stable expression of the host's own genes. It can be used to construct new gene transfer systems; to prepare, or serve as, medicines or preparations for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation; and to prepare, or serve as, tools for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation.

A gene transfer system comprising the novel high-activity transposase of the present invention, or a nucleic acid construct encoding the novel high-activity transposase, or a recombinant vector encoding the novel high-activity transposase, or a host cell comprising the novel high-activity transposase and/or a nucleic acid construct encoding the novel high-activity transposase and/or a recombinant vector encoding the novel high-activity transposase.

In the gene transfer system, a transposon gene is also contained, and a nucleic acid or a nucleic acid construct encoding a novel high-activity transposase is integrated with the transposon gene; or nucleic acids or nucleic acid constructs encoding the novel highly active transposase are independent of the transposon gene; or the nucleic acid or nucleic acid construct encoding the novel high activity transposase is located on the same recombinant vector as the transposon gene; or the nucleic acid or nucleic acid construct encoding the novel high activity transposase is located on a different recombinant vector from the transposon gene; or the transposon gene is integrated into a nucleic acid construct encoding a novel highly active transposase; or the transposon gene is integrated into a recombinant vector encoding a new transposase with high activity; or the transposon gene is independent of a recombinant vector encoding a new transposase with high activity; or the transposon gene is transferred into a host cell containing a new high-activity transposase and/or a nucleic acid construct encoding the new high-activity transposase and/or a recombinant vector encoding the new high-activity transposase; or the transposon gene is located outside the host cell containing the novel high activity transposase and/or the nucleic acid construct encoding the novel high activity transposase and/or the recombinant vector encoding the novel high activity transposase.

A drug and/or a preparation for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation, which comprises the novel high-activity transposase of the present invention, or a nucleic acid construct encoding the novel high-activity transposase, or a recombinant vector encoding the novel high-activity transposase, or a host cell comprising the novel high-activity transposase and/or a nucleic acid construct encoding the novel high-activity transposase and/or a recombinant vector encoding the novel high-activity transposase, or a gene transfer system as described above.

The medicine for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation also contains pharmaceutically acceptable excipients, can be prepared in any pharmaceutically feasible dosage form, and can be supplemented with auxiliary therapeutic components.

A tool for genome research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation, comprising the novel high-activity transposase of the present invention, or a nucleic acid construct encoding the novel high-activity transposase, or a recombinant vector encoding the novel high-activity transposase, or a host cell comprising the novel high-activity transposase and/or a nucleic acid construct encoding the novel high-activity transposase and/or a recombinant vector encoding the novel high-activity transposase, or the above-mentioned gene transfer system.

Drawings

FIG. 1 is a vector map of PRS316-URA-PBase in step (3) of example 1.

FIG. 2 is a schematic diagram of the multi-round cumulative error-prone PCR mutation of the transposase in step (3) of example 1 (upper panel) and of the transformation of the transposase fragments recovered from error-prone PCR and the linearized vector, at a molar ratio of 10:1, into a ura-deficient yeast strain (lower panel).

FIG. 3 is a schematic diagram showing the procedures for constructing the mutant library and screening for a high-efficiency transposase in step (3) of example 1.

FIG. 4 shows the PRS316-URA-PBase map and plasmid working diagram (A), a visual map of WT PBase, hyPBase, optimized hyPBase and bz-hyPBase in yeast (B), a statistical map of WT PBase, hyPBase, optimized hyPBase and bz-hyPBase in yeast (C), and a statistical histogram of bz-hyPBase in yeast (D).

FIG. 5 is a schematic diagram showing the structure of the ploxP-bz-HyPB plasmid in example 3.

FIG. 6 is a schematic diagram of the structure of the pSAD-EGFP plasmid in example 3.

FIG. 7 is a graph comparing the efficiency of editing the CHO cell genome using the optimized hyPBase and bz-hyPBase transposases in example 3. The transposition efficiency of bz-hyPBase in CHO cells is significantly increased.

FIG. 8 is a graph comparing the efficiency of preparing CAR T cells in example 4 using the optimized hyPBase and bz-hyPBase transposases. The transposition efficiency of bz-hyPBase is significantly increased in PBMCs from multiple donors.

Detailed Description

The invention is described more clearly below with reference to the drawings and the specific embodiments, which are intended to illustrate the invention and not to limit it. Unless otherwise specified, the experimental conditions in the examples are conventional; reagents are used according to the manufacturer's instructions unless otherwise stated.

EXAMPLE 1 obtaining of highly active bz-hyPBase mutant

Based on the original sequence (SEQ ID NO: 1) of the existing high-activity piggyBac transposase (hyPBase for short), the sequence of the piggyBac transposase to be protected by the invention (bz-hyPBase for short) was obtained through the following changes:

(1) based on human codon usage preference, the existing high-activity piggyBac transposase was codon optimized to give the nucleotide sequence shown in SEQ ID NO. 4, so as to increase the expression level of the transposase;

(2) a human c-myc nuclear localization signal was added downstream of the start codon to improve the integration efficiency of the exogenous gene in the host cell;

(3) the nucleotide sequence shown in SEQ ID NO. 4 was randomly mutated by the following method to obtain a mutant whose transposition efficiency is markedly superior to that of the existing high-activity piggyBac transposase; the mutant is named bz-hyPBase (amino acid sequence shown in SEQ ID NO. 2, nucleotide sequence shown in SEQ ID NO. 3). The method comprises the following steps:

a. construction of screening reporter vectors

The G418 resistance gene was inserted between the 5' IR and 3' IR of the transposon element by gene synthesis to form the transposon G418-IR. After PCR, the transposon was inserted by recombination into a TTAA site of the URA3 gene, and the transposase with an inducible promoter was inserted into the multiple cloning site of PRS316, finally forming the screening reporter vector PRS316-URA-PBase. The specific operation is as follows:

(1) PCR was performed on the template PRS316 using the primers pURA-F (SEQ ID NO: 5: aagccgctaaaggcattatccgcc) and pURA-R (SEQ ID NO: 6: aactgtgccctccatggaaaaatcagtc) to give linearized fragment 1 of plasmid PRS316.

(2) PCR was performed on the synthetic transposon G418-IR using the primers pURA-IR-F (SEQ ID NO: 7) and pURA-IR-R (SEQ ID NO: 8) to obtain linearized fragment 2, the transposon flanked by sequences homologous to PRS316.

pURA-IR-F(SEQ ID NO:7)：

gactgatttttccatggagggcacagttaaccctagaaagatagtctgcgtaaaattgacgcatgcgac

pURA-IR-R(SEQ ID NO:8)：

ggcggataatgcctttagcggcttaaccctagaaagataatcatattgtg

(3) Fragment 1 and fragment 2 were joined using NEBuilder homologous recombinase to construct plasmid PRS316-URA.

(4) The PB transposase gene with the GALS inducible promoter was synthesized and cloned into the vector PRS316-URA using SacI and EcoRI, resulting in the plasmid PRS316-URA-PBase. The PRS316-URA-PBase vector map is shown in FIG. 1.
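The homology that step (3) relies on can be checked directly from the primer sequences given above. The following is a minimal sketch in Python, not part of the original disclosure; the function names are illustrative. It tests whether the 5' end of each transposon primer is the reverse complement of the corresponding vector primer, which is what would give fragment 2 ends homologous to fragment 1.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a lowercase DNA string."""
    complement = {"a": "t", "t": "a", "g": "c", "c": "g"}
    return "".join(complement[base] for base in reversed(seq.lower()))

# Primer sequences copied from SEQ ID NO: 5-8 in 10-base blocks.
PURA_F = "aagccgctaa" "aggcattatc" "cgcc"
PURA_R = "aactgtgccc" "tccatggaaa" "aatcagtc"
PURA_IR_F = ("gactgatttt" "tccatggagg" "gcacagttaa" "ccctagaaag"
             "atagtctgcg" "taaaattgac" "gcatgcgac")
PURA_IR_R = ("ggcggataat" "gcctttagcg" "gcttaaccct" "agaaagataa"
             "tcatattgtg")

def homology_tail(insert_primer: str, vector_primer: str) -> str:
    """Return the 5' tail of insert_primer that is the reverse complement
    of vector_primer, or an empty string if there is no such tail."""
    tail = reverse_complement(vector_primer)
    return tail if insert_primer.startswith(tail) else ""

tail_f = homology_tail(PURA_IR_F, PURA_R)
tail_r = homology_tail(PURA_IR_R, PURA_F)
print(len(tail_f), tail_f)
print(len(tail_r), tail_r)
```

In both transposon primers the homology tail is followed by "aa", so each junction reads "ttaa" immediately before the start of the inverted repeat, consistent with insertion at a TTAA site.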

b. Construction of mutant pools

PCR primers were designed outside the transposase open reading frame (ORF): GR-F (SEQ ID NO: 9: taatcagcgaagcgatga) and GR-R (SEQ ID NO: 10: cagcatgcctgctattgtcttcc), so that the amplified product carries homologous sequences of about 50 bp flanking each end of the transposase ORF on the PRS316-URA-PBase vector. The transposase was mutated using a Clontech error-prone PCR kit, and the number of mutations was accumulated by using the recovered PCR fragments as templates for further rounds (upper flow chart of FIG. 2), finally yielding transposase fragments containing point mutations. The screening reporter vector PRS316-URA-PBase was linearized with XbaI and EcoRI to remove the original, unmutated transposase. The transposase fragments recovered from PCR and the linearized vector were transformed into a ura-deficient yeast strain at a molar ratio of 10:1 (lower flow chart of FIG. 2, and FIG. 3). Yeast uses its own homologous recombination repair mechanism to insert the exogenous target fragment into the gapped plasmid via the homology arms, so that a complete plasmid carrying the target fragment is assembled automatically in the yeast cell. This method achieves one-step cloning of the DNA fragments into the yeast strain and reduces the high-frequency duplication of mutants that occurs when plasmids constructed and amplified in Escherichia coli are transferred into yeast. With this method, the clones obtained on the plate after transformation are mutants, and a mutant library of the desired size can be obtained by picking single clones.
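The 10:1 molar ratio can be converted into DNA masses once the fragment and vector lengths are known. The sketch below is illustrative only: the vector length and vector mass are hypothetical placeholders (the source does not state them), and it assumes the usual approximation of about 650 g/mol per base pair of double-stranded DNA.

```python
AVERAGE_BP_MASS = 650.0  # g/mol per base pair of dsDNA (common approximation)

def picomoles(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass in nanograms to picomoles."""
    return mass_ng * 1000.0 / (length_bp * AVERAGE_BP_MASS)

def insert_mass_ng(vector_ng: float, vector_bp: int,
                   insert_bp: int, molar_ratio: float) -> float:
    """Mass of insert giving insert:vector = molar_ratio (mol/mol)."""
    return molar_ratio * vector_ng * insert_bp / vector_bp

# Hypothetical inputs, for illustration only.
vector_ng = 100.0   # assumed mass of linearized vector
vector_bp = 8000    # assumed vector length
insert_bp = 2000    # assumed length of ORF plus homology arms

needed_ng = insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=10.0)
ratio = picomoles(needed_ng, insert_bp) / picomoles(vector_ng, vector_bp)
print(f"insert needed: {needed_ng:.0f} ng (molar ratio {ratio:.1f}:1)")
```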

c. Process for screening high-efficiency transposase

As shown in FIG. 3, screening is carried out in two rounds. The first round screens all mutants broadly to obtain mutants whose transposition efficiency is markedly higher than that of the unmutated control. The second round is carried out on the yeast obtained from the first round; by calculating the exact transposition efficiency, the mutant with increased transposition efficiency in yeast, namely bz-hyPBase (amino acid sequence SEQ ID NO: 2, nucleotide sequence SEQ ID NO: 3), is obtained.

First screening: single clones from the transformed mutant library were picked into 96-well plates containing YPD medium with G418 antibiotic for activation. After 24 hours of activation, they were transferred with a replicator into YPD medium containing 2% galactose for induction. After 24 hours of induction, the cultures were diluted to 10^-2 or 10^-3 (depending on yeast growth), and 10 μl was spotted onto ura-deficient solid medium. After 48 hours of culture, growth of the mutants was observed and compared with that of the unmutated clone; clones with markedly improved transposition efficiency were selected for the second screening.

Second screening: the candidate mutants obtained from the first screening were activated for 24 hours, and the OD600 values were adjusted to be equal after activation. The cultures were inoculated at a ratio of 1:100 into YPD medium containing 2% galactose and induced for 24 hours, after which the OD600 values were again adjusted to be equal and the cultures were serially diluted to 10^-2, 10^-3, and 10^-4. 20 μl of the 10^-2 and 10^-3 dilutions was spread on ura-deficient solid medium and cultured for 24 hours, and the clones were counted; clones that grow on ura-deficient solid medium are those in which transposition has occurred. In parallel, 20 μl of the 10^-3 and 10^-4 dilutions was spread on YPD complete solid medium as a control, and the clones that grew represent the total yeast number. Transposition efficiency of the transposase = number of transposed clones / total number of clones = (number of clones on ura-deficient medium × dilution factor) / (number of clones on YPD medium × dilution factor) × 100%. This method enables high-throughput screening: a single operator can screen 96 to 960 mutants in one run, which greatly increases the probability of obtaining a high-activity transposase.
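The efficiency formula above can be written as a short calculation. The following Python sketch is illustrative only; the colony counts are hypothetical placeholders rather than data from the examples, and it assumes the same volume (20 μl) is plated on both media so that the volume cancels.

```python
def transposition_efficiency(ura_colonies: int, ura_dilution_factor: float,
                             ypd_colonies: int, ypd_dilution_factor: float) -> float:
    """Percentage of yeast cells in which transposition occurred.

    Each colony count is multiplied by its dilution factor (for example
    100 for a 10^-2 dilution) to bring both plates to the same scale.
    """
    transposed = ura_colonies * ura_dilution_factor
    total = ypd_colonies * ypd_dilution_factor
    return transposed / total * 100.0

# Hypothetical counts: 50 colonies from the 10^-2 dilution on ura-deficient
# medium, 200 colonies from the 10^-4 dilution on YPD medium.
efficiency = transposition_efficiency(50, 1e2, 200, 1e4)
print(f"transposition efficiency: {efficiency:.2f}%")
```

Comparing mutants is then a matter of dividing each mutant's efficiency by that of the unmutated control measured in the same run.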

The above calculation gives the exact transposition efficiency of each mutant, and strains with increased transposition efficiency are selected for mutation site analysis. Yeast from the initially activated 96-well plate is inoculated for expansion culture, the yeast plasmid is extracted and sent to a company for sequencing, and the mutation sites of the mutant are obtained by comparison with the original sequence.

The amino acid sequence of hyPBase (SEQ ID NO: 1):

amino acid sequence of bz-hyPBase (SEQ ID NO: 2):

Relative to the amino acid sequence of the existing high-activity transposase hyPBase (SEQ ID NO: 1), isoleucine at position 92 is mutated to asparagine, threonine at position 119 is mutated to alanine, and glutamine at position 601 is mutated to arginine, giving the amino acid sequence of bz-hyPBase shown in SEQ ID NO: 2.

Nucleotide sequence of human codon optimized hyPBase (SEQ ID NO: 4):

atgggccctgctgccaagagggtcaagttggacggcagcagcctggacgacgagcacatcctgagcgccctgctgcagagcgacgacgagctggtgggcgaggacagcgacagcgaggtgagcgaccacgtgagcgaggacgacgtgcagagcgacaccgaggaggccttcatcgacgaggtgcacgaggtgcagcccaccagcagcggcagcgagatcctggacgagcagaacgtgatcgagcagcccggcagcagcctggccagcaaccgcaatctgaccctgccccagcgcaccatccgcggcaagaacaagcactgctggagcaccagcaagcccacccgccgcagccgcgtcagcgccctgaacatcgtgcgcagccagcgcggccccacccgcatgtgccgcaacatctacgaccccctgctgtgcttcaagctgttcttcaccgacgagatcatcagcgagatcgtgaagtggaccaacgccgagatcagcctgaagcgccgcgagagcatgaccagcgccaccttccgcgacaccaacgaggacgagatctacgccttcttcggcatcctggtgatgaccgccgtgcgcaaggacaaccacatgagcaccgacgacctgttcgaccgcagcctgagcatggtgtacgtgagcgtgatgagccgcgaccgcttcgacttcctgatccgctgcctgcgcatggacgacaagagcatccgccccaccctgcgcgagaacgacgtgttcacccccgtgcgcaagatctgggacctgttcatccaccagtgcatccagaactacacccccggcgcccacctgaccatcgacgagcagctgctgggcttccgcggccgctgccccttccgcgtgtacatccccaacaagcccagcaagtacggcatcaagatcctgatgatgtgcgacagcggcaccaagtacatgatcaacggcatgccctacctgggccgcggcacccagaccaacggcgtgcccctgggcgagtactacgtgaaggagctgagcaagcccgtgcacggcagctgccgcaacatcacctgcgacaactggttcaccagcatccccctggccaagaacctgctgcaggagccctacaagctgaccatcgtgggcaccgtgcgcagcaacaagcgcgagatccccgaggtgctgaagaacagccgcagccgccccgtgggcaccagcatgttctgcttcgacggccccctgaccctggtgagctacaagcccaagcccgccaagatggtgtacctgctgagcagctgcgacgaggacgccagcatcaacgagagcaccggcaagccccagatggtgatgtactacaaccagaccaagggcggcgtggacaccctggaccagatgtgcagcgtgatgacctgcagccgcaagaccaaccgctggcccatggccctgctgtacggcatgatcaacatcgcctgcatcaacagcttcatcatctacagccacaacgtgagcagcaagggcgagaaggtgcagagccgcaagaagttcatgcgcaacctgtacatgggcctgaccagcagcttcatgcgcaagcgcctggaggcccccaccctgaagcgctacctgcgcgacaacatcagcaacatcctgcccaaggaggtgcccggcaccagcgacgacagcaccgaggagcccgtgatgaagaagcgcacctactgcacctactgccccagcaagatccgccgcaaggccagcgccagctgcaagaagtgcaagaaggtgatctgccgcgagcacaacatcgacatgtgccagagctgcttctaa

the nucleotide sequence of bz-hyPBase (SEQ ID NO: 3):

The nucleotide sequence of the existing high-activity transposase hyPBase was human-codon optimized to give the sequence shown in SEQ ID NO: 4, and the following nucleotide mutations were introduced into this human-codon-optimized sequence: base T at position 276 to C, base T at position 356 to C, base G at position 900 to A, and base A at position 1802 to G. The resulting mutant nucleotide sequence, which encodes the novel high-activity transposase bz-hyPBase, is shown in SEQ ID NO: 3.
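The four base changes can be checked against the sequence listing by comparing the relevant rows of SEQ ID NO: 4 and SEQ ID NO: 3. The Python sketch below is illustrative; only the four listing rows that contain the stated positions are included, each keyed by the 1-based position of its first base, and the function name is an assumption of this example.

```python
# Rows copied from the sequence listing (spaces are block separators).
SEQ4_ROWS = {
    241: "gagcagcccg gcagcagcct ggccagcaac cgcaatctga ccctgcccca gcgcaccatc",
    301: "cgcggcaaga acaagcactg ctggagcacc agcaagccca cccgccgcag ccgcgtcagc",
    841: "ctgctgggct tccgcggccg ctgccccttc cgcgtgtaca tccccaacaa gcccagcaag",
    1801: "cagagctgct tctaa",
}
SEQ3_ROWS = {
    241: "gagcagcccg gcagcagcct ggccagcaac cgcaacctga ccctgcccca gcgcaccatc",
    301: "cgcggcaaga acaagcactg ctggagcacc agcaagccca cccgccgcag ccgcgccagc",
    841: "ctgctgggct tccgcggccg ctgccccttc cgcgtgtaca tccccaacaa gcccagcaaa",
    1801: "cggagctgct tctaa",
}

def row_differences(ref_rows: dict, alt_rows: dict) -> list:
    """List (1-based position, reference base, altered base) for every
    mismatch between rows that share the same start position."""
    differences = []
    for start in sorted(ref_rows):
        ref = ref_rows[start].replace(" ", "")
        alt = alt_rows[start].replace(" ", "")
        for offset, (ref_base, alt_base) in enumerate(zip(ref, alt)):
            if ref_base != alt_base:
                differences.append((start + offset, ref_base, alt_base))
    return differences

differences = row_differences(SEQ4_ROWS, SEQ3_ROWS)
for position, ref_base, alt_base in differences:
    print(f"base {position}: {ref_base.upper()} -> {alt_base.upper()}")
```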

Example 2 higher transposition efficiency of bz-hyPBase in Yeast

A transposon carrying a G418 resistance gene is inserted into the URA3 gene of the yeast plasmid PRS316 to disrupt expression of the URA gene, and a transposase under an inducible promoter is cloned into the same PRS316 to generate the plasmid PRS316-URA-PBase; plasmids carrying the different transposases WT PBase, hyPBase, optimized hyPBase, and bz-hyPBase were prepared in parallel. Each plasmid is transformed into the ura-deficient Saccharomyces cerevisiae strain BJ2168, which cannot survive on ura-deficient medium. Under the control of the inducer galactose, the transposase is expressed and promotes transposition of the transposon; once the transposon is excised, the URA gene is expressed normally, and clones in which transposition has occurred resume normal growth on ura-deficient medium. The transposition efficiency of a transposase in Saccharomyces cerevisiae can therefore be calculated by counting the number of transposed clones in a given amount of yeast. Using this method, the transposition efficiencies of the wild-type piggyBac transposase (WT PBase), the existing high-activity piggyBac transposase (hyPBase), the codon-optimized transposase with an added nuclear localization signal (optimized hyPBase), and bz-hyPBase were compared. The results in FIG. 4 show that the transposition efficiency of bz-hyPBase is 3 times higher than that of hyPBase, demonstrating that bz-hyPBase has a higher transposition efficiency in yeast.

WT PBase is a plasmid carrying the mammalian-codon-optimized piggyBac transposase; hyPBase is a plasmid carrying the existing high-activity piggyBac transposase (obtained by mutating 7 amino acid sites of WT PBase, as described in the background); optimized hyPBase is a plasmid carrying the transposase obtained by human codon optimization of the existing high-activity piggyBac transposase and addition of a nuclear localization signal; and bz-hyPBase is a plasmid carrying the novel high-activity transposase screened in the present invention (that is, the transposase obtained by mutating three amino acid sites of the optimized hyPBase described in the examples).

Example 3 higher Gene editing efficiency of bz-hyPBase in CHO cells

We cloned optimized hyPBase and bz-hyPBase into mammalian cell expression vectors to generate the plasmids ploxP-optimized hyPBase (same structure as in FIG. 5, with the transposase bz-hyPBase replaced by optimized hyPBase) and ploxP-bz-HyPB (FIG. 5), which express the transposases. A human c-myc nuclear localization signal is attached downstream of the promoter in both optimized hyPBase and bz-hyPBase. The transposon carrying the EGFP gene was cloned to give the vector pSAD-EGFP (FIG. 6), which expresses green fluorescent protein. The transposase plasmid and the transposon plasmid were co-electroporated into CHO cells; under the action of the transposase, the EGFP transposon is inserted into the genome so that the cells stably express green fluorescent protein. After two passages, cells expressing green fluorescent protein were counted by flow cytometry on days 7 and 14; the more cells express the fluorescent protein, the higher the transposition efficiency of the transposase. According to the statistical results in FIG. 7, the transposition activity of bz-hyPBase was significantly superior to that of optimized hyPBase.

Example 4 higher Gene editing efficiency of bz-hyPBase in T cells

We electroporated the ploxP-optimized hyPBase and ploxP-bz-HyPB plasmids of example 3 into peripheral blood mononuclear cells (PBMCs) for T cell genome editing. Under the action of the transposase, the transposon carrying the EGFP green fluorescent protein gene edits the T cell genome, and the editing efficiency in T cells reflects the activity of the transposase. We performed multiple sets of experiments using PBMCs from 3 different healthy donors and measured gene editing efficiency by flow cytometry on day 5; a higher EGFP-positive rate represents higher transposase activity. As shown in FIG. 8, the transposition activity of bz-hyPBase was superior to that of optimized hyPBase in PBMCs from different donors.
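The per-donor comparison described above reduces to computing an EGFP-positive rate for each transposase and donor. The Python sketch below only illustrates that bookkeeping; the event counts and donor labels are hypothetical placeholders, not results from FIG. 8.

```python
def positive_rate(egfp_positive_events: int, total_events: int) -> float:
    """EGFP-positive cells as a percentage of all gated events."""
    return 100.0 * egfp_positive_events / total_events

# Hypothetical flow cytometry counts: (EGFP-positive events, total events).
counts = {
    "donor_1": {"optimized hyPBase": (1200, 10000), "bz-hyPBase": (2400, 10000)},
    "donor_2": {"optimized hyPBase": (900, 10000), "bz-hyPBase": (1800, 10000)},
    "donor_3": {"optimized hyPBase": (1500, 10000), "bz-hyPBase": (2700, 10000)},
}

rates = {
    donor: {name: positive_rate(*events) for name, events in by_enzyme.items()}
    for donor, by_enzyme in counts.items()
}

for donor, by_enzyme in rates.items():
    higher = by_enzyme["bz-hyPBase"] > by_enzyme["optimized hyPBase"]
    print(donor, by_enzyme, "bz-hyPBase higher:", higher)
```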

Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the protection scope of the present invention, and although the present invention is described in detail with reference to the foregoing embodiments, it should be understood by those skilled in the art that several modifications can be made without departing from the principle of the present invention, and these modifications should also be regarded as the protection scope of the present invention.

Sequence listing

<110> Shanghai cell therapy institute

SHANGHAI CELL THERAPY GROUP Co.,Ltd.

<120> a high-activity transposase and use thereof

<130> 199908Z1

<150> CN 201911227263.5

<151> 2019-12-04

<160> 10

<170> SIPOSequenceListing 1.0

<210> 1

<211> 604

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 1

Met Gly Pro Ala Ala Lys Arg Val Lys Leu Asp Gly Ser Ser Leu Asp

1 5 10 15

Asp Glu His Ile Leu Ser Ala Leu Leu Gln Ser Asp Asp Glu Leu Val

20 25 30

Gly Glu Asp Ser Asp Ser Glu Val Ser Asp His Val Ser Glu Asp Asp

35 40 45

Val Gln Ser Asp Thr Glu Glu Ala Phe Ile Asp Glu Val His Glu Val

50 55 60

Gln Pro Thr Ser Ser Gly Ser Glu Ile Leu Asp Glu Gln Asn Val Ile

65 70 75 80

Glu Gln Pro Gly Ser Ser Leu Ala Ser Asn Arg Ile Leu Thr Leu Pro

85 90 95

Gln Arg Thr Ile Arg Gly Lys Asn Lys His Cys Trp Ser Thr Ser Lys

100 105 110

Pro Thr Arg Arg Ser Arg Val Ser Ala Leu Asn Ile Val Arg Ser Gln

115 120 125

Arg Gly Pro Thr Arg Met Cys Arg Asn Ile Tyr Asp Pro Leu Leu Cys

130 135 140

Phe Lys Leu Phe Phe Thr Asp Glu Ile Ile Ser Glu Ile Val Lys Trp

145 150 155 160

Thr Asn Ala Glu Ile Ser Leu Lys Arg Arg Glu Ser Met Thr Ser Ala

165 170 175

Thr Phe Arg Asp Thr Asn Glu Asp Glu Ile Tyr Ala Phe Phe Gly Ile

180 185 190

Leu Val Met Thr Ala Val Arg Lys Asp Asn His Met Ser Thr Asp Asp

195 200 205

Leu Phe Asp Arg Ser Leu Ser Met Val Tyr Val Ser Val Met Ser Arg

210 215 220

Asp Arg Phe Asp Phe Leu Ile Arg Cys Leu Arg Met Asp Asp Lys Ser

225 230 235 240

Ile Arg Pro Thr Leu Arg Glu Asn Asp Val Phe Thr Pro Val Arg Lys

245 250 255

Ile Trp Asp Leu Phe Ile His Gln Cys Ile Gln Asn Tyr Thr Pro Gly

260 265 270

Ala His Leu Thr Ile Asp Glu Gln Leu Leu Gly Phe Arg Gly Arg Cys

275 280 285

Pro Phe Arg Val Tyr Ile Pro Asn Lys Pro Ser Lys Tyr Gly Ile Lys

290 295 300

Ile Leu Met Met Cys Asp Ser Gly Thr Lys Tyr Met Ile Asn Gly Met

305 310 315 320

Pro Tyr Leu Gly Arg Gly Thr Gln Thr Asn Gly Val Pro Leu Gly Glu

325 330 335

Tyr Tyr Val Lys Glu Leu Ser Lys Pro Val His Gly Ser Cys Arg Asn

340 345 350

Ile Thr Cys Asp Asn Trp Phe Thr Ser Ile Pro Leu Ala Lys Asn Leu

355 360 365

Leu Gln Glu Pro Tyr Lys Leu Thr Ile Val Gly Thr Val Arg Ser Asn

370 375 380

Lys Arg Glu Ile Pro Glu Val Leu Lys Asn Ser Arg Ser Arg Pro Val

385 390 395 400

Gly Thr Ser Met Phe Cys Phe Asp Gly Pro Leu Thr Leu Val Ser Tyr

405 410 415

Lys Pro Lys Pro Ala Lys Met Val Tyr Leu Leu Ser Ser Cys Asp Glu

420 425 430

Asp Ala Ser Ile Asn Glu Ser Thr Gly Lys Pro Gln Met Val Met Tyr

435 440 445

Tyr Asn Gln Thr Lys Gly Gly Val Asp Thr Leu Asp Gln Met Cys Ser

450 455 460

Val Met Thr Cys Ser Arg Lys Thr Asn Arg Trp Pro Met Ala Leu Leu

465 470 475 480

Tyr Gly Met Ile Asn Ile Ala Cys Ile Asn Ser Phe Ile Ile Tyr Ser

485 490 495

His Asn Val Ser Ser Lys Gly Glu Lys Val Gln Ser Arg Lys Lys Phe

500 505 510

Met Arg Asn Leu Tyr Met Gly Leu Thr Ser Ser Phe Met Arg Lys Arg

515 520 525

Leu Glu Ala Pro Thr Leu Lys Arg Tyr Leu Arg Asp Asn Ile Ser Asn

530 535 540

Ile Leu Pro Lys Glu Val Pro Gly Thr Ser Asp Asp Ser Thr Glu Glu

545 550 555 560

Pro Val Met Lys Lys Arg Thr Tyr Cys Thr Tyr Cys Pro Ser Lys Ile

565 570 575

Arg Arg Lys Ala Ser Ala Ser Cys Lys Lys Cys Lys Lys Val Ile Cys

580 585 590

Arg Glu His Asn Ile Asp Met Cys Gln Ser Cys Phe

595 600

<210> 2

<211> 604

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 2

Met Gly Pro Ala Ala Lys Arg Val Lys Leu Asp Gly Ser Ser Leu Asp

1 5 10 15

Asp Glu His Ile Leu Ser Ala Leu Leu Gln Ser Asp Asp Glu Leu Val

20 25 30

Gly Glu Asp Ser Asp Ser Glu Val Ser Asp His Val Ser Glu Asp Asp

35 40 45

Val Gln Ser Asp Thr Glu Glu Ala Phe Ile Asp Glu Val His Glu Val

50 55 60

Gln Pro Thr Ser Ser Gly Ser Glu Ile Leu Asp Glu Gln Asn Val Ile

65 70 75 80

Glu Gln Pro Gly Ser Ser Leu Ala Ser Asn Arg Asn Leu Thr Leu Pro

85 90 95

Gln Arg Thr Ile Arg Gly Lys Asn Lys His Cys Trp Ser Thr Ser Lys

100 105 110

Pro Thr Arg Arg Ser Arg Ala Ser Ala Leu Asn Ile Val Arg Ser Gln

115 120 125

Arg Gly Pro Thr Arg Met Cys Arg Asn Ile Tyr Asp Pro Leu Leu Cys

130 135 140

Phe Lys Leu Phe Phe Thr Asp Glu Ile Ile Ser Glu Ile Val Lys Trp

145 150 155 160

Thr Asn Ala Glu Ile Ser Leu Lys Arg Arg Glu Ser Met Thr Ser Ala

165 170 175

Thr Phe Arg Asp Thr Asn Glu Asp Glu Ile Tyr Ala Phe Phe Gly Ile

180 185 190

Leu Val Met Thr Ala Val Arg Lys Asp Asn His Met Ser Thr Asp Asp

195 200 205

Leu Phe Asp Arg Ser Leu Ser Met Val Tyr Val Ser Val Met Ser Arg

210 215 220

Asp Arg Phe Asp Phe Leu Ile Arg Cys Leu Arg Met Asp Asp Lys Ser

225 230 235 240

Ile Arg Pro Thr Leu Arg Glu Asn Asp Val Phe Thr Pro Val Arg Lys

245 250 255

Ile Trp Asp Leu Phe Ile His Gln Cys Ile Gln Asn Tyr Thr Pro Gly

260 265 270

Ala His Leu Thr Ile Asp Glu Gln Leu Leu Gly Phe Arg Gly Arg Cys

275 280 285

Pro Phe Arg Val Tyr Ile Pro Asn Lys Pro Ser Lys Tyr Gly Ile Lys

290 295 300

Ile Leu Met Met Cys Asp Ser Gly Thr Lys Tyr Met Ile Asn Gly Met

305 310 315 320

Pro Tyr Leu Gly Arg Gly Thr Gln Thr Asn Gly Val Pro Leu Gly Glu

325 330 335

Tyr Tyr Val Lys Glu Leu Ser Lys Pro Val His Gly Ser Cys Arg Asn

340 345 350

Ile Thr Cys Asp Asn Trp Phe Thr Ser Ile Pro Leu Ala Lys Asn Leu

355 360 365

Leu Gln Glu Pro Tyr Lys Leu Thr Ile Val Gly Thr Val Arg Ser Asn

370 375 380

Lys Arg Glu Ile Pro Glu Val Leu Lys Asn Ser Arg Ser Arg Pro Val

385 390 395 400

Gly Thr Ser Met Phe Cys Phe Asp Gly Pro Leu Thr Leu Val Ser Tyr

405 410 415

Lys Pro Lys Pro Ala Lys Met Val Tyr Leu Leu Ser Ser Cys Asp Glu

420 425 430

Asp Ala Ser Ile Asn Glu Ser Thr Gly Lys Pro Gln Met Val Met Tyr

435 440 445

Tyr Asn Gln Thr Lys Gly Gly Val Asp Thr Leu Asp Gln Met Cys Ser

450 455 460

Val Met Thr Cys Ser Arg Lys Thr Asn Arg Trp Pro Met Ala Leu Leu

465 470 475 480

Tyr Gly Met Ile Asn Ile Ala Cys Ile Asn Ser Phe Ile Ile Tyr Ser

485 490 495

His Asn Val Ser Ser Lys Gly Glu Lys Val Gln Ser Arg Lys Lys Phe

500 505 510

Met Arg Asn Leu Tyr Met Gly Leu Thr Ser Ser Phe Met Arg Lys Arg

515 520 525

Leu Glu Ala Pro Thr Leu Lys Arg Tyr Leu Arg Asp Asn Ile Ser Asn

530 535 540

Ile Leu Pro Lys Glu Val Pro Gly Thr Ser Asp Asp Ser Thr Glu Glu

545 550 555 560

Pro Val Met Lys Lys Arg Thr Tyr Cys Thr Tyr Cys Pro Ser Lys Ile

565 570 575

Arg Arg Lys Ala Ser Ala Ser Cys Lys Lys Cys Lys Lys Val Ile Cys

580 585 590

Arg Glu His Asn Ile Asp Met Cys Arg Ser Cys Phe

595 600

<210> 3

<211> 1815

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 3

atgggccctg ctgccaagag ggtcaagttg gacggcagca gcctggacga cgagcacatc 60

ctgagcgccc tgctgcagag cgacgacgag ctggtgggcg aggacagcga cagcgaggtg 120

agcgaccacg tgagcgagga cgacgtgcag agcgacaccg aggaggcctt catcgacgag 180

gtgcacgagg tgcagcccac cagcagcggc agcgagatcc tggacgagca gaacgtgatc 240

gagcagcccg gcagcagcct ggccagcaac cgcaacctga ccctgcccca gcgcaccatc 300

cgcggcaaga acaagcactg ctggagcacc agcaagccca cccgccgcag ccgcgccagc 360

gccctgaaca tcgtgcgcag ccagcgcggc cccacccgca tgtgccgcaa catctacgac 420

cccctgctgt gcttcaagct gttcttcacc gacgagatca tcagcgagat cgtgaagtgg 480

accaacgccg agatcagcct gaagcgccgc gagagcatga ccagcgccac cttccgcgac 540

accaacgagg acgagatcta cgccttcttc ggcatcctgg tgatgaccgc cgtgcgcaag 600

gacaaccaca tgagcaccga cgacctgttc gaccgcagcc tgagcatggt gtacgtgagc 660

gtgatgagcc gcgaccgctt cgacttcctg atccgctgcc tgcgcatgga cgacaagagc 720

atccgcccca ccctgcgcga gaacgacgtg ttcacccccg tgcgcaagat ctgggacctg 780

ttcatccacc agtgcatcca gaactacacc cccggcgccc acctgaccat cgacgagcag 840

ctgctgggct tccgcggccg ctgccccttc cgcgtgtaca tccccaacaa gcccagcaaa 900

tacggcatca agatcctgat gatgtgcgac agcggcacca agtacatgat caacggcatg 960

ccctacctgg gccgcggcac ccagaccaac ggcgtgcccc tgggcgagta ctacgtgaag 1020

gagctgagca agcccgtgca cggcagctgc cgcaacatca cctgcgacaa ctggttcacc 1080

agcatccccc tggccaagaa cctgctgcag gagccctaca agctgaccat cgtgggcacc 1140

gtgcgcagca acaagcgcga gatccccgag gtgctgaaga acagccgcag ccgccccgtg 1200

ggcaccagca tgttctgctt cgacggcccc ctgaccctgg tgagctacaa gcccaagccc 1260

gccaagatgg tgtacctgct gagcagctgc gacgaggacg ccagcatcaa cgagagcacc 1320

ggcaagcccc agatggtgat gtactacaac cagaccaagg gcggcgtgga caccctggac 1380

cagatgtgca gcgtgatgac ctgcagccgc aagaccaacc gctggcccat ggccctgctg 1440

tacggcatga tcaacatcgc ctgcatcaac agcttcatca tctacagcca caacgtgagc 1500

agcaagggcg agaaggtgca gagccgcaag aagttcatgc gcaacctgta catgggcctg 1560

accagcagct tcatgcgcaa gcgcctggag gcccccaccc tgaagcgcta cctgcgcgac 1620

aacatcagca acatcctgcc caaggaggtg cccggcacca gcgacgacag caccgaggag 1680

cccgtgatga agaagcgcac ctactgcacc tactgcccca gcaagatccg ccgcaaggcc 1740

agcgccagct gcaagaagtg caagaaggtg atctgccgcg agcacaacat cgacatgtgc 1800

cggagctgct tctaa 1815

<210> 4

<211> 1815

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 4

atgggccctg ctgccaagag ggtcaagttg gacggcagca gcctggacga cgagcacatc 60

ctgagcgccc tgctgcagag cgacgacgag ctggtgggcg aggacagcga cagcgaggtg 120

agcgaccacg tgagcgagga cgacgtgcag agcgacaccg aggaggcctt catcgacgag 180

gtgcacgagg tgcagcccac cagcagcggc agcgagatcc tggacgagca gaacgtgatc 240

gagcagcccg gcagcagcct ggccagcaac cgcaatctga ccctgcccca gcgcaccatc 300

cgcggcaaga acaagcactg ctggagcacc agcaagccca cccgccgcag ccgcgtcagc 360

gccctgaaca tcgtgcgcag ccagcgcggc cccacccgca tgtgccgcaa catctacgac 420

cccctgctgt gcttcaagct gttcttcacc gacgagatca tcagcgagat cgtgaagtgg 480

accaacgccg agatcagcct gaagcgccgc gagagcatga ccagcgccac cttccgcgac 540

accaacgagg acgagatcta cgccttcttc ggcatcctgg tgatgaccgc cgtgcgcaag 600

gacaaccaca tgagcaccga cgacctgttc gaccgcagcc tgagcatggt gtacgtgagc 660

gtgatgagcc gcgaccgctt cgacttcctg atccgctgcc tgcgcatgga cgacaagagc 720

atccgcccca ccctgcgcga gaacgacgtg ttcacccccg tgcgcaagat ctgggacctg 780

ttcatccacc agtgcatcca gaactacacc cccggcgccc acctgaccat cgacgagcag 840

ctgctgggct tccgcggccg ctgccccttc cgcgtgtaca tccccaacaa gcccagcaag 900

tacggcatca agatcctgat gatgtgcgac agcggcacca agtacatgat caacggcatg 960

ccctacctgg gccgcggcac ccagaccaac ggcgtgcccc tgggcgagta ctacgtgaag 1020

gagctgagca agcccgtgca cggcagctgc cgcaacatca cctgcgacaa ctggttcacc 1080

agcatccccc tggccaagaa cctgctgcag gagccctaca agctgaccat cgtgggcacc 1140

gtgcgcagca acaagcgcga gatccccgag gtgctgaaga acagccgcag ccgccccgtg 1200

ggcaccagca tgttctgctt cgacggcccc ctgaccctgg tgagctacaa gcccaagccc 1260

gccaagatgg tgtacctgct gagcagctgc gacgaggacg ccagcatcaa cgagagcacc 1320

ggcaagcccc agatggtgat gtactacaac cagaccaagg gcggcgtgga caccctggac 1380

cagatgtgca gcgtgatgac ctgcagccgc aagaccaacc gctggcccat ggccctgctg 1440

tacggcatga tcaacatcgc ctgcatcaac agcttcatca tctacagcca caacgtgagc 1500

agcaagggcg agaaggtgca gagccgcaag aagttcatgc gcaacctgta catgggcctg 1560

accagcagct tcatgcgcaa gcgcctggag gcccccaccc tgaagcgcta cctgcgcgac 1620

aacatcagca acatcctgcc caaggaggtg cccggcacca gcgacgacag caccgaggag 1680

cccgtgatga agaagcgcac ctactgcacc tactgcccca gcaagatccg ccgcaaggcc 1740

agcgccagct gcaagaagtg caagaaggtg atctgccgcg agcacaacat cgacatgtgc 1800

cagagctgct tctaa 1815

<210> 5

<211> 24

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 5

aagccgctaa aggcattatc cgcc 24

<210> 6

<211> 28

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 6

aactgtgccc tccatggaaa aatcagtc 28

<210> 7

<211> 69

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 7

gactgatttt tccatggagg gcacagttaa ccctagaaag atagtctgcg taaaattgac 60

gcatgcgac 69

<210> 8

<211> 50

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 8

ggcggataat gcctttagcg gcttaaccct agaaagataa tcatattgtg 50

<210> 9

<211> 18

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 9

taatcagcga agcgatga 18

<210> 10

<211> 23

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 10

cagcatgcct gctattgtct tcc 23

Claims

1. An amino acid sequence of a highly active transposase comprising one or more of the following amino acid sequences: (1) an amino acid sequence with transposase activity obtained by amino acid mutation of the amino acid sequence shown in SEQ ID NO: 1 at at least one site selected from amino acid 92, amino acid 119, and amino acid 601; preferably, amino acid 92, amino acid 119, and amino acid 601 are mutated simultaneously; more preferably, isoleucine at position 92 is mutated to asparagine, threonine at position 119 is mutated to alanine, and glutamine at position 601 is mutated to arginine;

(2) an amino acid sequence having transposase activity obtained by deletion, substitution, insertion or addition of one or more amino acids other than the amino acid mutation at amino acid position 92, amino acid position 119 or amino acid position 601 in (1); preferably amino acid sequence with transposase activity obtained by deletion, substitution, insertion or addition of one or more amino acids except amino acid mutation at amino acid 92, amino acid 119 or amino acid 601; more preferably, the amino acid sequence with transposase activity is obtained by deleting, substituting, inserting or adding one or more amino acids except that isoleucine at position 92 is mutated into asparagine, threonine at position 119 is mutated into alanine, and glutamine at position 601 is mutated into arginine.

2. The amino acid sequence of claim 1, further comprising an amino acid sequence of a functional protein; the amino acid sequence of the functional protein is preferably the amino acid sequence of a nuclear localization signal, the amino acid sequence of an EGFP green fluorescent protein, the amino acid sequence of a tag protein, the amino acid sequence of an antibody and the like.

3. An amino acid sequence of a high-activity transposase is characterized by comprising one or more amino acid sequences shown in SEQ ID NO. 2 or an amino acid sequence with transposase activity obtained by carrying out amino acid deletion, substitution, insertion or addition on one or more amino acids except amino acid 92, amino acid 119 and amino acid 601 in the amino acid sequence shown in SEQ ID NO. 2.

4. A peptide fragment comprising one or more amino acid sequences according to any one of claims 1 to 3.

5. A protein comprising one or more amino acid sequences as claimed in any of claims 1 to 3 or one or more peptide fragments as claimed in claim 4, having transposase activity.

6. A nucleotide sequence encoding an amino acid sequence according to any one of claims 1 to 3 or a peptide fragment according to claim 4 or a protein according to claim 5, comprising one or more of the following nucleotide sequences: (1) a nucleotide sequence obtained by base mutation of the nucleotide sequence shown in SEQ ID NO. 4 at at least one of base 276, base 356, base 900, or base 1802; preferably, bases 276, 356, 900, and 1802 are mutated simultaneously; more preferably, base T at position 276 is mutated to base C, base T at position 356 is mutated to base C, base G at position 900 is mutated to base A, and base A at position 1802 is mutated to base G; or

(2) A nucleotide sequence complementary to the mutated nucleotide sequence of (1); or

(3) A nucleotide sequence which is overlapped with the mutated nucleotide sequence in the step (1) and has the same coding function; or

(4) A nucleotide sequence which is hybridized with the mutated nucleotide sequence in the step (1) and has the same coding function; or

(5) A nucleotide sequence obtained by substituting, deleting or adding one or more bases of the nucleotide sequence in (1), (2), (3) or (4) except for the gene mutation site and having the same coding function; or

(6) A nucleotide sequence having at least 80% homology with the nucleotide sequence in (1), (2), (3) or (4) and having the same coding function; preferably at least 90% homologous and having the same coding function; more preferably nucleotide sequences which are at least 96% homologous and have the same coding function.

7. A nucleotide sequence encoding an amino acid sequence according to any one of claims 1 to 3 or a peptide fragment according to claim 4 or a protein according to claim 5, comprising one or more of the following nucleotide sequences:

(1) The nucleotide sequence shown in SEQ ID NO. 3; or

Nucleotide sequences for column functions.

8. The nucleotide sequence of claim 6 or 7, further comprising a nucleotide sequence encoding a functional protein, preferably a nucleotide sequence encoding a nuclear localization signal, a nucleotide sequence encoding enhanced green fluorescent protein (EGFP), a nucleotide sequence encoding a peptide fragment of a tag protein, or a nucleotide sequence encoding an antibody.

9. A nucleic acid comprising the nucleotide sequence of any one of claims 6-8.

10. A nucleic acid construct encoding an amino acid sequence according to any one of claims 1 to 3 or a peptide fragment according to claim 4 or a protein according to claim 5.

11. A nucleic acid construct according to claim 10 comprising a nucleotide sequence according to any one of claims 6 to 8, or comprising a nucleic acid according to claim 9.

12. A recombinant vector comprising the nucleotide sequence of any one of claims 6-8, or the nucleic acid of claim 9, or the nucleic acid construct of any one of claims 10-11; the recombinant vector is preferably a recombinant cloning vector, a recombinant eukaryotic expression vector or a recombinant viral vector; the recombinant cloning vector is preferably a pRS vector, a T vector or a pUC vector; the recombinant eukaryotic expression vector is preferably pEGFP, pCMVp-NEO-BAN or pSV2; and the recombinant viral vector is preferably a recombinant adenoviral vector or a lentiviral vector.

13. A host cell comprising the nucleic acid construct of any one of claims 10-11 or the recombinant vector of claim 12; the host cell is preferably an E. coli cell, an insect cell, a yeast cell or a mammalian cell.

14. A gene transfer system comprising the peptide fragment of claim 4, the protein of claim 5, the nucleic acid of claim 9, the nucleic acid construct of any one of claims 10 to 11, the recombinant vector of claim 12, or the host cell of claim 13.

15. A gene transfer system according to claim 14, further comprising a transposon gene, wherein: the nucleic acid of claim 9 or the nucleic acid construct of any one of claims 10 to 11 is integrated with the transposon gene; or the nucleic acid of claim 9 or the nucleic acid construct of any one of claims 10 to 11 is relatively independent of the transposon gene; or the nucleic acid of claim 9 or the nucleic acid construct of any one of claims 10 to 11 is located on the same recombinant vector as the transposon gene; or the nucleic acid of claim 9 or the nucleic acid construct of any one of claims 10 to 11 and the transposon gene are located on different recombinant vectors; or the transposon gene is integrated in the nucleic acid construct of any one of claims 10 to 11; or the transposon gene is integrated in the recombinant vector of claim 12; or the transposon gene is integrated in the host cell of claim 13; or the transposon gene is located outside the host cell of claim 13.

16. Use of the peptide fragment of claim 4, or the protein of claim 5, or the nucleic acid of claim 9, or the nucleic acid construct of any one of claims 10 to 11, or the recombinant vector of claim 12, or the host cell of claim 13, or the gene transfer system of any one of claims 14 to 15, in any one of:

(1) preparation of, or use as, a medicament and/or formulation for genomic research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation; preferably preparation of, or use as, a medicament and/or formulation for integrating a foreign gene into the genome of a host cell, the host cell preferably being an E. coli cell, an insect cell, a yeast cell or a mammalian cell;

(2) preparation of, or use as, a tool for genomic research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation; preferably preparation of, or use as, a tool for integrating a foreign gene into the genome of a host cell, the host cell preferably being an E. coli cell, an insect cell, a yeast cell or a mammalian cell.

17. A medicament and/or formulation for genomic research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation, comprising the peptide fragment of claim 4, or the protein of claim 5, or the nucleic acid of claim 9, or the nucleic acid construct of any one of claims 10 to 11, or the recombinant vector of claim 12, or the host cell of claim 13, or the gene transfer system of any one of claims 14 to 15.

18. A tool for genomic research, gene therapy, cell therapy, or pluripotent stem cell induction and/or differentiation, comprising the peptide fragment of claim 4, or the protein of claim 5, or the nucleic acid of claim 9, or the nucleic acid construct of any one of claims 10 to 11, or the recombinant vector of claim 12, or the host cell of claim 13, or the gene transfer system of any one of claims 14 to 15.