CN113755512A

CN113755512A - Method for preparing tandem repeat protein and application

Info

Publication number: CN113755512A
Application number: CN202011405477.XA
Authority: CN
Inventors: 毕昌昊; 张学礼; 刘丽; 赵东东; 李斯微
Original assignee: Tianjin Institute of Industrial Biotechnology of CAS
Current assignee: Tianjin Institute of Industrial Biotechnology of CAS
Priority date: 2020-12-03
Filing date: 2020-12-03
Publication date: 2021-12-07
Anticipated expiration: 2040-12-03
Also published as: CN113755512B

Abstract

The invention provides a method for preparing tandem repeat proteins, together with related products and applications. The method comprises introducing an expression vector containing a double-stranded DNA molecule, named a single-copy gene expression cassette, into a recipient cell to obtain a recombinant cell, extracting total RNA of the recombinant cell, and translating the total RNA to obtain the tandem repeat protein, which greatly shortens the time needed to prepare long tandem repeat proteins. Experiments show that tandem repeat MaSp1 with 40 repeats can be obtained in only 7 days, far less time than the traditional method requires. The method has a short experimental cycle, saves time and cost, and is highly efficient.

Description

Method for preparing tandem repeat protein and application

Technical Field

The invention relates to a method for preparing tandem repeat protein and application thereof in the field of biotechnology.

Background

Tandem repeat proteins are proteins with highly repetitive amino acid sequences, produced by the expression of tandem repeat genes. In the prior art, tandem repeat proteins are prepared by constructing an expression vector containing tandem repeat DNA and then expressing it. At present, tandem repeat DNA expression vectors are constructed mainly by two methods: the asymmetric sticky-end complementation method and the isocaudomer method. The copy number generated by the asymmetric sticky-end complementation method is random, and several enzymes are needed for digestion and ligation. The isocaudomer method is also complicated and requires repeated rounds of digestion and ligation. Both methods are time-consuming and labor-intensive.

The dragline silk protein in spider silk has very high strength: at the same weight, spider dragline silk is 5 times as strong as steel wire and 3 times as strong as synthetic Kevlar fiber. Spider silk also has good plasticity, and these two characteristics make it widely applicable in many fields. In industry, for example, it can be used to produce parachutes, protective clothing, and composite materials for aircraft. In the biomedical field, applications include wound sutures, delivery vehicles for biopharmaceuticals, and scaffolds for cell culture and organ transplantation. Dragline silk is composed primarily of the spidroin proteins MaSp1 (major ampullate spidroin 1) and MaSp2 (major ampullate spidroin 2). Both are highly modular proteins with long repeats within the sequence, flanked by terminal regions approximately 100 amino acid residues in length. However, spider silk is difficult to obtain in large quantities by rearing spiders, because spiders are strongly territorial and aggressive. Therefore, many studies have attempted to express recombinant spidroin proteins in other hosts. Increasing the length of the recombinant dragline silk protein is one of the key factors for improving the mechanical performance of spun spider silk. Natural dragline silk proteins are 250-320 kDa in size. In 2010, researchers expressed a 284.9 kDa recombinant spidroin protein in Escherichia coli, and fibers spun from it had mechanical properties similar to those of natural spider silk. Expressing the 284.9 kDa recombinant spidroin protein required synthesizing the repeating unit MaSp1, then using isocaudomer-based seamless splicing to build the MaSp2 concatemer, then repeating the same method to synthesize MaSp4, MaSp8, MaSp16, MaSp32 and MaSp48 in sequence, and finally splicing these into MaSp96. The steps are complicated, time-consuming and labor-intensive. Moreover, if the spider silk sequence is to be optimized, the gene must be resynthesized, and a significant amount of time is again needed to reconstruct the whole series of concatemers.

Disclosure of Invention

The problem to be solved by the present invention is how to prepare tandem repeat proteins.

In order to solve the above technical problem, the present invention provides a method for preparing a tandem repeat protein, comprising introducing an expression vector containing a double-stranded DNA molecule, named a single-copy gene expression cassette, into a recipient cell to obtain a recombinant cell, culturing the recombinant cell, and expressing to obtain the tandem repeat protein; the single-copy gene expression cassette comprises a promoter, an intron named the 3' intron connected to the promoter, a target protein coding gene named the single-copy gene connected to the 3' intron, a coding sequence of a ribosome binding site (RBS) connected to the target protein coding gene, a spacer sequence connected to the coding sequence of the ribosome binding site, an initiation codon connected to the spacer sequence, and an intron named the 5' intron connected to the initiation codon; the 3' intron and the 5' intron satisfy condition A, where condition A is that the precursor RNA transcribed from the single-copy gene expression cassette in the recombinant cell forms splicing bubbles by complementary base pairing and a mature circular single-stranded RNA molecule is produced by a splicing reaction; the target protein coding gene does not contain a stop codon.

In the above method, the spacer is the sequence between the RBS and the ATG, and its function is to allow ribosomes to bind the mRNA strongly. The spacer sequence may be a double-stranded DNA of 4-10 bp, for example a double-stranded DNA one strand of which has the nucleotide sequence of positions 5535-5543 of sequence 1.

In the above method, the tandem repeat protein may contain more than 2 copies of the single-copy protein, for example more than 7 copies or more than 10 copies of the single-copy protein.

In the above method, the single-copy gene expression cassette consists of a promoter, the 3' intron, the target protein coding gene, the coding sequence of the ribosome binding site, the spacer sequence, the initiation codon, and the 5' intron, linked to one another in sequence.

As the expression cassette of the circular mRNA of the target protein, the single-copy gene expression cassette may include a promoter for initiating the transcription of the gene encoding the target protein, and may further include a terminator for terminating the transcription of the gene encoding the target protein. Further, the single copy gene expression cassette may also include an enhancer sequence. Promoters useful in the present invention include, but are not limited to: constitutive promoters, tissue, organ and development specific promoters, and inducible promoters. Examples of promoters include, but are not limited to: the T7 promoter of the T7 phage, the constitutive promoter of cauliflower mosaic virus 35S. They may be used alone or in combination with other promoters. Suitable transcription terminators include, but are not limited to: an agrobacterium nopaline synthase terminator (NOS terminator), a cauliflower mosaic virus CaMV 35S terminator, a tml terminator.

In the above method, the single copy gene is a gene encoding a protein of interest, and the gene encoding the protein of interest does not contain a stop codon (TAA, TGA or TAG).
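The requirement above can be checked mechanically. The following is a minimal sketch, not part of the disclosed method: it scans a coding sequence for in-frame TAA, TGA or TAG codons and, for a circular reading frame, also checks that the length is a multiple of three so that the frame is preserved from one lap to the next. The function names and the short test sequences are hypothetical and chosen only for illustration.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}


def in_frame_stops(cds: str) -> list[int]:
    """Return 0-based offsets of in-frame stop codons, reading from index 0."""
    cds = cds.upper()
    return [i for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] in STOP_CODONS]


def supports_continuous_translation(circle: str) -> bool:
    """True if a circular reading frame that starts at index 0 could be read
    lap after lap: length divisible by 3 and no in-frame stop codon."""
    return len(circle) % 3 == 0 and not in_frame_stops(circle)


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from sequence 1.
    print(in_frame_stops("ATGGGTGCTGGC"))
    print(in_frame_stops("ATGGGTTAAGGC"))
    print(supports_continuous_translation("ATGGGTGCTGGC"))
```

Because the length check keeps every lap in the same frame, scanning the linear form of the sequence once is sufficient in this sketch.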

In the above method, the initiation codon is ATG.

In the above method, the target gene may further comprise a replication origin (pMB1) gene.

In the above method, the target gene may further comprise a selection marker gene. The selection marker gene is a gene of known function and sequence capable of functioning as a specific marker. For example, a gene encoding an enzyme or a luminescent compound which can produce a color change (GUS gene, luciferase gene, etc.), a marker gene for antibiotics (e.g., nptII gene conferring resistance to kanamycin and related antibiotics, bar gene conferring resistance to the herbicide phosphinothricin, hph gene conferring resistance to the antibiotic hygromycin, and dhfr gene conferring resistance to methotrexate, EPSPS gene conferring resistance to glyphosate), or a chemical-resistant marker gene (e.g., herbicide-resistant gene), a mannose-6-phosphate isomerase gene providing the ability to metabolize mannose.

In the above method, the target protein coding gene encodes a target protein, which may be MaSp1; MaSp1 is a protein having the amino acid sequence of SEQ ID No. 2.

In the above method, the recipient cell is any one of C1)-C4):

C1) a prokaryotic microbial cell;

C2) a gram-negative bacterial cell;

C3) an Escherichia bacterial cell;

C4) an Escherichia coli BL21(DE3) cell.

In the above method, the 3 'intron and the 5' intron which satisfy condition a are a pair of introns as follows:

the 3' intron comprises 6 splicing bubbles and a 3' splice site; the DNAs encoding the 6 splicing bubbles are named the 3' sp1 gene, 3' sp2 gene, 3' sp3 gene, 3' sp4 gene, 3' sp5 gene and 3' sp6 gene, respectively, and the DNA encoding the 3' splice site is named the 3' ss gene; each is a double-stranded DNA molecule, one strand of which has the following nucleotide sequence in sequence 1 of the sequence listing: the 3' sp1 gene, positions 5193-5214; the 3' sp2 gene, positions 5278-5289; the 3' sp3 gene, positions 5293-5306; the 3' sp4 gene, positions 5318-5337; the 3' sp5 gene, positions 5352-5370; the 3' sp6 gene, positions 5371-5386; the 3' splice site, positions 5419-5423;

the 5' intron contains a 5' splice site and a 5' ss sequence; the nucleotide sequence of the 5' splice site is positions 5547-5556 of sequence 1; the nucleotide sequence of the 5' ss sequence is positions 5557-5721 of sequence 1 and comprises 4 splicing bubbles, the DNAs encoding which are named the 5' sp1 gene, 5' sp2 gene, 5' sp3 gene and 5' sp4 gene, respectively; each is a double-stranded DNA molecule, one strand of which has the following nucleotide sequence in sequence 1 of the sequence listing: the 5' sp1 gene, positions 5569-5590; the 5' sp2 gene, positions 5631-5643; the 5' sp3 gene, positions 5648-5698; the 5' sp4 gene, positions 5671-5687.

In the above method, the 3' intron is a double-stranded DNA one strand (the coding strand) of which has the nucleotide sequence of positions 5190-5423 of sequence 1, and the 5' intron is a double-stranded DNA one strand (the coding strand) of which has the nucleotide sequence of positions 5547-5721 of sequence 1.

In the above method, the single-copy gene expression cassette is a double-stranded DNA molecule one strand (the coding strand) of which has the nucleotide sequence of positions 5117-5835 of SEQ ID No. 1;

or the expression vector is a double-stranded DNA molecule (expressing tandem repeat MaSp protein) one strand of which has the nucleotide sequence of SEQ ID No. 1.
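The coordinates quoted above are 1-based and inclusive, so element lengths follow from end - start + 1. The sketch below tabulates them and, under a labeled assumption, derives the segment that would remain in the circular RNA. The assumption, which is an illustration rather than a statement of the disclosure, is that the retained segment is exactly the region lying between the end of the 3' intron and the start of the 5' intron. The names `ELEMENTS`, `span_length` and `RETAINED` are hypothetical.

```python
# 1-based inclusive coordinates in sequence 1, as quoted in the text above.
ELEMENTS = {
    "single-copy gene expression cassette": (5117, 5835),
    "3' intron": (5190, 5423),
    "spacer": (5535, 5543),
    "5' intron": (5547, 5721),
}


def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


LENGTHS = {name: span_length(*span) for name, span in ELEMENTS.items()}

# Assumption: the segment kept in the circular RNA is the region between
# the last base of the 3' intron and the first base of the 5' intron.
RETAINED = (ELEMENTS["3' intron"][1] + 1, ELEMENTS["5' intron"][0] - 1)
RETAINED_LENGTH = span_length(*RETAINED)

if __name__ == "__main__":
    for name, length in LENGTHS.items():
        print(f"{name}: {length} nt")
    print(f"retained segment {RETAINED}: {RETAINED_LENGTH} nt")
```

Under that assumption the retained segment spans positions 5424-5546 and is 123 nt long, a multiple of three, which is consistent with a ribosome staying in the same reading frame on successive laps.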

The invention also provides any one of the following products related to the method:

A1) said double stranded DNA molecule in said method named single copy gene expression cassette;

A2) a vector containing the double-stranded DNA molecule of A1);

A3) a recombinant microorganism comprising the double-stranded DNA molecule of A1).

The vector of A2) can be constructed using an existing expression vector. Existing expression vectors include the pMD 18-T vector, pET21b and the like. The existing expression vector may also contain the 3' untranslated region of a foreign gene, i.e., a region containing the polyadenylation signal and any other DNA segments involved in mRNA processing or gene expression. The polyadenylation signal can direct the addition of poly A to the 3' end of the mRNA precursor. When constructing the vector of A2), enhancers, such as transcription enhancers, may also be used; these enhancer regions may be ATG start codons or regions adjacent to start codons, but must be in the same reading frame as the coding sequence in order to ensure correct translation of the entire sequence. To facilitate identification and screening of transformants, the existing expression vector may be modified, for example by adding genes encoding enzymes or luminescent compounds that produce a color change (GUS gene, luciferase gene, etc.), antibiotic marker genes (e.g., the nptII gene, which confers resistance to kanamycin and related antibiotics; the bar gene, which confers resistance to the herbicide phosphinothricin; the hph gene, which confers resistance to the antibiotic hygromycin; the dhfr gene, which confers resistance to methotrexate; and the EPSPS gene, which confers resistance to glyphosate), chemical resistance marker genes (e.g., herbicide resistance genes), or the mannose-6-phosphate isomerase gene, which provides the ability to metabolize mannose.

The invention provides the use of the above method or the above product in the preparation of tandem repeat proteins.

The invention provides a method for preparing tandem repeat proteins, which comprises introducing an expression vector containing a double-stranded DNA molecule, named a single-copy gene expression cassette, into a recipient cell to obtain a recombinant cell, extracting total RNA of the recombinant cell, and translating the total RNA into the tandem repeat protein, which greatly shortens the time needed to prepare long tandem repeat proteins. Experiments show that MaSp1 protein with 40 tandem repeats can be obtained in only 7 days, so the preparation time is greatly shortened.

Drawings

FIG. 1 is a schematic diagram of the structure of MaSp1 RNA expression cassette in example 1 of the present invention. In the figure, RBS is the coding sequence of the ribosome binding site and ATG is the initiation codon.

FIG. 2 is a schematic diagram showing the mechanism by which intron splicing generates circular MaSp1 RNA in example 1 of the present invention. In the figure, BSJ is the back-splice junction site, i.e., the splice junction; RBS is the ribosome binding site; ATG is the initiation codon.

FIG. 3 is a schematic diagram of the mechanism of the MaSp1 tandem repeat protein translation in example 1 of the present invention. RBS is the ribosome binding site; ATG is the initiation codon.

FIG. 4 is a diagram showing the verification of MaSp1 RNA circularization in example 1 of the present invention.

FIG. 5 is a graph showing the Sanger sequencing result of the MaSp1 RNA splice junction in example 1 of the present invention.

FIG. 6 is a PAGE gel of the protein obtained by translating circularized MaSp1 RNA in example 1 of the present invention, wherein M is the marker, 1 is the MaSp1 inclusion body fraction, and 2 is the MaSp1 supernatant.

FIG. 7 is a Western blot of the protein obtained by translating circularized MaSp1 RNA in example 1, wherein 1 is the MaSp1 inclusion body fraction and 2 is the MaSp1 supernatant.

FIG. 8 is a mass spectrum of the putative MaSp1 protein in example 1 of the present invention.

Detailed Description

The present invention is described in further detail below with reference to specific embodiments, which are given for the purpose of illustration only and are not intended to limit the scope of the invention. The experimental procedures in the following examples are conventional unless otherwise specified. Materials, reagents and the like used in the following examples are commercially available unless otherwise specified.

In the following examples, Escherichia coli DH5α (BC102-02) is a product of Biomed; Escherichia coli BL21 (CW0809S) is a product of Beijing Kangwei Century (CWBIO).

In the following examples, the RNAprep Pure cultured cell/bacteria total RNA extraction kit (DP430) is a product of TIANGEN; the ReverTra Ace qPCR RT Kit (cDNA synthesis kit) (FSQ-101) is a product of TOYOBO.

In the following examples, the 10× BSA protein solution (B9000S) is a product of NEB; 2× Es Taq MasterMix (with dye) (CW0690H) is a product of Beijing Kangwei Century (CWBIO).

In the following examples, the media used are specifically as follows:

The solid LB medium is a sterile medium prepared from tryptone, yeast extract, NaCl, agar and deionized water, with the following contents: 10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl and 15 g/L agar.

The liquid LB medium is a sterile medium prepared from tryptone, yeast extract, NaCl and deionized water, with the following contents: 10 g/L tryptone, 5 g/L yeast extract and 10 g/L NaCl.

The solid LB medium with an ampicillin concentration of 100 μg/mL is a sterile medium prepared from ampicillin, tryptone, yeast extract, NaCl, agar and deionized water, with the following contents: 100 μg/mL ampicillin, 10 g/L tryptone, 5 g/L yeast extract, 10 g/L NaCl and 15 g/L agar.

The liquid LB medium with an ampicillin concentration of 100 μg/mL is a sterile medium prepared from ampicillin, tryptone, yeast extract, NaCl and deionized water, with the following contents: 100 μg/mL ampicillin, 10 g/L tryptone, 5 g/L yeast extract and 10 g/L NaCl.
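The four recipes above differ only in whether agar and ampicillin are included, so the amounts for any batch volume can be computed from the stated concentrations. The following sketch is an illustration only; the function name `lb_recipe` and the 100 mg/mL ampicillin stock concentration are assumptions that do not appear in the text.

```python
# Concentrations as stated in the media descriptions above.
BASE_G_PER_L = {"tryptone": 10.0, "yeast extract": 5.0, "NaCl": 10.0}
AGAR_G_PER_L = 15.0
AMPICILLIN_UG_PER_ML = 100.0


def lb_recipe(volume_ml, solid=False, ampicillin=False, amp_stock_mg_per_ml=100.0):
    """Return the amounts needed for `volume_ml` of LB medium.

    `amp_stock_mg_per_ml` is a hypothetical stock concentration.
    """
    litres = volume_ml / 1000.0
    recipe = {f"{name} (g)": round(g * litres, 3) for name, g in BASE_G_PER_L.items()}
    if solid:
        recipe["agar (g)"] = round(AGAR_G_PER_L * litres, 3)
    if ampicillin:
        needed_ug = AMPICILLIN_UG_PER_ML * volume_ml
        # 1 mg/mL equals 1 ug/uL, so the stock value can be used directly.
        recipe["ampicillin stock (uL)"] = round(needed_ug / amp_stock_mg_per_ml, 1)
    return recipe


if __name__ == "__main__":
    print(lb_recipe(200, solid=True, ampicillin=True))
```

For example, 200 mL of solid LB with ampicillin would need 2 g tryptone, 1 g yeast extract, 2 g NaCl, 3 g agar and, with the assumed stock, 200 μL of ampicillin solution.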

Example 1. Preparation of tandem repeat MaSp1

In this example, an expression vector containing a single-copy gene expression cassette, named pMaSP1, was prepared; pMaSP1 is a double-stranded DNA one strand of which has the nucleotide sequence of SEQ ID No. 1 of the sequence listing. In sequence 1, positions 494-1459 are an ampicillin resistance gene, and positions 5117-5835 are the DNA molecule named the single-copy gene expression cassette, hereinafter referred to as the MaSp1 RNA expression cassette. The structure of the MaSp1 RNA expression cassette is shown in FIG. 1, and comprises a T7 promoter (the nucleotide sequence is positions 5117-[...]), [...] an initiation codon ATG (the nucleotide sequence is positions 5544-5546 of sequence 1), and an intron named the 5' intron connected to the initiation codon (the nucleotide sequence is positions 5547-5721 of sequence 1, of which positions 5547-5556 are the 5' splice site and positions 5557-5721 are the 5' ss gene). Positions 5788-5835 of sequence 1 in the sequence listing are a terminator for terminating transcription of the introns and the MaSp1 gene, and positions 12-467 are the replication origin.

The MaSp1 gene does not contain a stop codon (TAA, TGA or TAG). The MaSp1 gene encodes MaSp1, a protein having the amino acid sequence of sequence 2. The 3' intron and the 5' intron satisfy condition A, namely that the precursor RNA transcribed from the single-copy gene expression cassette in the recombinant cell forms splicing bubbles by complementary base pairing and a mature circular single-stranded RNA molecule is produced by a splicing reaction (a G-OH-catalyzed splicing reaction; see the mechanism in FIG. 2).
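To illustrate why a circular RNA without a stop codon can yield a tandem repeat product, the sketch below walks a reading frame around a circular sequence for a fixed number of laps and stops early only if a stop codon is met. It is a toy model under stated assumptions: it uses the standard genetic code, works on the DNA alphabet, and uses short hypothetical sequences rather than sequence 1; the function name `rolling_translate` is likewise hypothetical. In this model every codon on the circle is read on each lap, so any ribosome binding site, spacer and ATG carried on a circle would be translated along with the gene.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def rolling_translate(circle: str, start: int, laps: int) -> str:
    """Translate a circular sequence from `start` for up to `laps` laps,
    wrapping around the end and stopping at the first stop codon."""
    circle = circle.upper()
    n = len(circle)
    residues = []
    for k in range((n * laps) // 3):
        i = start + 3 * k
        codon = "".join(circle[(i + j) % n] for j in range(3))
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


if __name__ == "__main__":
    # Hypothetical 12-nt circle: ATG GGT GCT GGC, no stop codon.
    print(rolling_translate("ATGGGTGCTGGC", 0, 3))
```

With a stop-free circle whose length is a multiple of three, each lap appends the same residues again; introducing an in-frame stop codon ends the product on the first lap.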

The 3' intron contains 6 splicing bubbles and a 3' splice site; the DNAs encoding the 6 splicing bubbles are named the 3' sp1 gene, 3' sp2 gene, 3' sp3 gene, 3' sp4 gene, 3' sp5 gene and 3' sp6 gene, respectively, and the DNA encoding the 3' splice site is named the 3' ss gene. Each is a double-stranded DNA molecule, one strand of which has the following nucleotide sequence in sequence 1 of the sequence listing: the 3' sp1 gene, positions 5193-5214; the 3' sp2 gene, positions 5278-5289; the 3' sp3 gene, positions 5293-5306; the 3' sp4 gene, positions 5318-5337; the 3' sp5 gene, positions 5352-5370; the 3' sp6 gene, positions 5371-5386; the 3' splice site, positions 5419-5423.

The mechanism by which the tandem repeat protein is prepared using the expression vector pMaSp1 containing the single-copy gene expression cassette is as follows. pMaSp1 is introduced into recipient cells to obtain recombinant cells, and in the recombinant cells pMaSp1 is transcribed into the precursor RNA shown in the left panel of FIG. 2, also called the precursor mRNA (pre-mRNA). In the precursor RNA, the 3' intron and the 5' intron form splicing bubbles by complementary base pairing, a G-OH-catalyzed splicing reaction takes place, and the RNA is then circularized to generate the mature circular single-stranded RNA molecule shown in the right panel of FIG. 2, which is called circular MaSp1 RNA. Ribosomes bind to the ribosome binding site (RBS) sequence on the circular MaSp1 RNA and initiate translation from the AUG; because the circular MaSp1 RNA contains no UAA, UGA or UAG, the ribosomes keep translating around the circular MaSp1 mRNA, thereby generating the MaSp1 tandem repeat protein (FIG. 3). The specific process is as follows:

1. Preparation of expression vector pMaSP1 containing the single-copy gene expression cassette

pMaSP1 is constructed in a modular manner, and the modules are joined by the Golden Gate method: protective bases, the recognition site of restriction endonuclease BsaI, and complementary sticky ends are added to both ends of each module by embedding them in the primers. The specific steps are as follows:

1.1 Module

The construction of pMaSp1 requires module A and module B:

The deoxyribonucleotide sequence of module A is positions 5547 to 5423 of sequence 1 of the sequence listing, running from position 5547 through the end of the circular plasmid sequence and on to position 5423. Within it, positions 5547-5721 are the 5' intron (of which positions 5547-5556 are the 5' splice site and positions 5557-5721 are the 5' ss gene), positions 5788-5835 are the transcription terminator, positions 12-467 are the replication origin (pMB1) gene, positions 494-1459 are the ampicillin resistance gene, positions 5117-5135 are the T7 promoter, and positions 5190-5423 are the 3' intron (of which positions 5190-5418 are the 3' ss gene and positions 5419-5423 are the 3' splice site).

Module B contains the MaSp1 gene, and its deoxyribonucleotide sequence is positions 5424-5528 of sequence 1 of the sequence listing.

1.2 processing of the modules

Protective bases, the recognition site of restriction endonuclease BsaI, and complementary sticky ends were added to both ends of module A by PCR to obtain module A with BsaI sites at both ends, named module A-BsaI; the primer pair used for this PCR was PartA-F and PartA-R.

PartA-F: 5'-CCAGGTCTCAAAGGAGTACTCGATGGATCTCAGGTCAATTGAGGCCTGAGTA-3' (the BsaI recognition site is GGTCTC)

PartA-R: 5'-CCAGGTCTCAGGTAGCATTATGTTCAGATAAGGTC-3' (the BsaI recognition site is GGTCTC)

Protective bases, the recognition site of restriction endonuclease BsaI, and complementary sticky ends were added to both ends of module B by PCR to obtain module B with BsaI sites at both ends, named module B-BsaI; the primer pair used for this PCR was PartB-F and PartB-R.

PartB-F: 5'-CCAGGTCTCATACCAGCGGACGTGG-3' (the BsaI recognition site is GGTCTC)

PartB-R: 5'-CCAGGTCTCACCTTTGTTCCCTGGCTTCC-3' (the BsaI recognition site is GGTCTC)
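The four primers above can be checked for compatible sticky ends. BsaI cuts one nucleotide after its recognition site GGTCTC on the top strand and five nucleotides after it on the bottom strand, leaving a 4-nt 5' overhang made of the second to fifth bases downstream of the site. The sketch below extracts that overhang from each primer and tests whether the reverse primer of one module and the forward primer of the other produce complementary ends. The helper names are hypothetical; the primer sequences are those listed above.

```python
BSAI_SITE = "GGTCTC"

PRIMERS = {
    "PartA-F": "CCAGGTCTCAAAGGAGTACTCGATGGATCTCAGGTCAATTGAGGCCTGAGTA",
    "PartA-R": "CCAGGTCTCAGGTAGCATTATGTTCAGATAAGGTC",
    "PartB-F": "CCAGGTCTCATACCAGCGGACGTGG",
    "PartB-R": "CCAGGTCTCACCTTTGTTCCCTGGCTTCC",
}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bsai_overhang(primer: str) -> str:
    """4-nt overhang left by BsaI: bases 2-5 downstream of the first GGTCTC."""
    i = primer.find(BSAI_SITE)
    if i < 0:
        raise ValueError("no BsaI recognition site found")
    start = i + len(BSAI_SITE) + 1  # skip the single base after the site
    return primer[start:start + 4]


def ends_compatible(rev_primer: str, fwd_primer: str) -> bool:
    """True if the end made by `rev_primer` can anneal to the end made by `fwd_primer`."""
    return revcomp(bsai_overhang(rev_primer)) == bsai_overhang(fwd_primer)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, bsai_overhang(seq))
    print(ends_compatible(PRIMERS["PartA-R"], PRIMERS["PartB-F"]))
    print(ends_compatible(PRIMERS["PartB-R"], PRIMERS["PartA-F"]))
```

Both junctions pair as expected: the PartA-R overhang GGTA is the reverse complement of the PartB-F overhang TACC, and the PartB-R overhang CCTT is the reverse complement of the PartA-F overhang AAGG, so the two modules can close into a circle in a single orientation.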

1.3 construction of pMaSP1

Module A-BsaI and module B-BsaI were ligated at a molar ratio of 1:1 to prepare the circular plasmid for expressing MaSp1 RNA, using the following 20 μL reaction system: module A-BsaI 5.05E-8 mol (about 100 ng), module B-BsaI 5.05E-8 mol, BsaI enzyme 1 μL, T4 DNA ligase 1 μL, 10× T4 buffer 2 μL, and 10× BSA protein solution 2 μL, made up to 20 μL with deionized water. The ligation was carried out under the following reaction conditions: 25 cycles of 37 °C for 3 min and 25 °C for 4 min; 50 °C for 5 min to digest unligated fragments; then 80 °C for 5 min to inactivate the enzymes. After the reaction, the Golden Gate reaction solution of pMaSP1 was obtained.
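The reaction set-up above can be written down as a small data structure, which makes the total run time and the volume left for DNA and water easy to check. This is a sketch under one labeled assumption: the 25 cycles are taken to cover both the 37 °C and the 25 °C steps. The class and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    minutes: float


# Assumption: both steps below are repeated in each of the 25 cycles.
CYCLE = [Step(37, 3), Step(25, 4)]
N_CYCLES = 25
FINAL_STEPS = [Step(50, 5), Step(80, 5)]  # digest unligated fragments, then inactivate

# Fixed-volume components of the 20 uL reaction, in uL.
TOTAL_UL = 20
FIXED_UL = {"BsaI": 1, "T4 DNA ligase": 1, "10x T4 buffer": 2, "10x BSA": 2}


def total_minutes() -> float:
    """Total programmed incubation time, ignoring ramp times."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return N_CYCLES * per_cycle + sum(step.minutes for step in FINAL_STEPS)


def volume_for_dna_and_water() -> float:
    """Volume remaining for the two modules plus deionized water."""
    return TOTAL_UL - sum(FIXED_UL.values())


if __name__ == "__main__":
    print(total_minutes(), "min")
    print(volume_for_dna_and_water(), "uL")
```

Under that reading the programme runs for 185 min, and 14 μL of the 20 μL reaction remains for the two modules and water.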

5 μL of the Golden Gate reaction solution of pMaSP1 was transformed into Escherichia coli DH5α competent cells, which were screened on solid LB medium containing 100 μg/mL ampicillin; colonies were picked and sequenced to identify correctly constructed plasmids, and the plasmid was amplified and extracted to obtain pMaSP1.

2. Construction and verification of circular MaSp1 RNA

0.5 μL of pMaSP1 was transformed into Escherichia coli BL21(DE3) competent cells, which were screened on solid LB medium containing 100 μg/mL ampicillin to obtain Escherichia coli BL21(DE3) positive transformants (recombinant cells carrying pMaSP1). A positive transformant was inoculated into liquid LB medium containing 100 μg/mL ampicillin and cultured at 37 °C to an OD600 of about 0.4, then induced with 1 mM isopropyl-β-D-thiogalactoside (IPTG) for 12 h. 1 mL of the culture was centrifuged at 12000 rpm for 2 min, the supernatant was discarded, and total RNA of the Escherichia coli BL21(DE3) positive transformant was extracted with the RNAprep Pure cultured cell/bacteria total RNA extraction kit according to the kit instructions.

Total RNA was reverse transcribed into cDNA using the ReverTra Ace qPCR RT Kit (cDNA synthesis kit) according to the kit instructions.

PCR was performed on the cDNA using the primer pair Testify cirMaSp1-F and Testify cirMaSp1-R to verify whether the MaSp1 RNA had circularized.

Testify cirMaSp1-F: 5'-CAGGACAGGGAGGATATGGA-3';

Testify cirMaSp1-R: 5'-CTCCTCCCATGGCTGC-3'.

The DNA polymerase used for verification was 2× Es Taq MasterMix (with dye). The PCR products were run on a gel, and the electrophoretogram is shown in FIG. 4. The band of about 100 bp was sent for sequencing, and the Sanger sequencing result of the MaSp1 RNA splice junction is shown in FIG. 5. Sequencing showed that the 5' splice site was joined to the 3' splice site, indicating that the MaSp1 RNA had been circularized.
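The logic of this verification can be sketched in code: primers that point away from each other on a linear molecule give no product, but on a circular molecule they amplify across the junction where the two ends were joined. The example below is a simplified model using a hypothetical 40-nt template and hypothetical primers, not the MaSp1 sequence or the Testify primers; it matches primers by exact string search and ignores all reaction chemistry.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def pcr_product(template: str, fwd: str, rev: str, circular: bool):
    """Return the top-strand product for exact-match primers, or None.

    `fwd` must match the top strand; `rev` must match the bottom strand.
    """
    f = template.find(fwd)
    r = template.find(revcomp(rev))
    if f < 0 or r < 0:
        return None
    r_end = r + len(rev)
    if f <= r:
        # Primers face each other: ordinary product on either topology.
        return template[f:r_end]
    if circular:
        # Primers face outward: product exists only by reading across the junction.
        return template[f:] + template[:r_end]
    return None


# Hypothetical template and outward-facing primers, for illustration only.
TEMPLATE = "AACCGTGACTTGGATCCAAGTCTAGGCATGCTTAGCGTCA"
FWD = "TGCTTAGC"   # matches template[28:36]
REV = "CAAGTCAC"   # reverse complement of template[4:12]

if __name__ == "__main__":
    print(pcr_product(TEMPLATE, FWD, REV, circular=False))
    print(pcr_product(TEMPLATE, FWD, REV, circular=True))
```

In this toy case the linear template gives no product, while the circular template gives a 24-nt product that contains the junction between the last and first bases of the template, which is the kind of evidence the sequenced band in FIG. 5 is described as providing.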

3. Preparation of MaSp1 tandem repeat proteins

The Escherichia coli BL21(DE3) positive transformant in which circularization of MaSp1 RNA was verified in step 2 was inoculated into liquid LB medium containing 100 μg/mL ampicillin, cultured at 37 °C to an OD600 of about 0.4, and then induced with 1 mM IPTG at 37 °C for 6 h (labeled MaSp1 cirmRNA 6h in FIG. 6). The BL21 strain carrying the empty vector was used as a control in place of the BL21 strain with circularized MaSp1 RNA (labeled empty vector 6h in FIG. 6). After induction, samples were prepared for protein gel electrophoresis; the results are shown in FIG. 6. Compared with the control, three additional protein bands appeared in the Escherichia coli BL21(DE3) positive transformant in which MaSp1 RNA had been circularized. The three bands were excised separately and analyzed by mass spectrometry; the results are shown in FIG. 8, and the band with a molecular weight above 118 kD is spidroin protein.

The BL21 strain carrying the empty vector was a transformant obtained by transforming the empty vector into Escherichia coli BL21(DE3). The empty vector was a plasmid obtained by removing the MaSp1 RNA expression cassette from pMaSP1 while keeping the other nucleotides of pMaSP1 unchanged; it differed from pMaSp1 only in that it did not contain the MaSp1 RNA expression cassette.

Experiments show that tandem repeat MaSp1 with 40 repeats can be obtained in only 7 days, far less time than the traditional method requires.

The present invention has been described in detail above. It will be apparent to those skilled in the art that the invention can be practiced over a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the invention and without undue experimentation. While the invention has been described with reference to specific embodiments, it will be appreciated that the invention can be further modified. In general, this application is intended to cover any variations, uses, or adaptations of the invention that follow its principles, including departures from the present disclosure that come within known or customary practice in the art to which the invention pertains. Some of the essential features may be applied within the scope of the appended claims.

Sequence listing

<110> Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences

<120> a method for preparing tandem repeat protein and uses thereof

<130> GNCSY200930

<160> 2

<170> SIPOSequenceListing 1.0

<210> 1

<211> 5860

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 1

tggcgaatgg gacgcgccct gtagcggcgc attaagcgcg gcgggtgtgg tggttacgcg 60

cagcgtgacc gctacacttg ccagcgccct agcgcccgct cctttcgctt tcttcccttc 120

ctttctcgcc acgttcgccg gctttccccg tcaagctcta aatcgggggc tccctttagg 180

gttccgattt agtgctttac ggcacctcga ccccaaaaaa cttgattagg gtgatggttc 240

acgtagtggg ccatcgccct gatagacggt ttttcgccct ttgacgttgg agtccacgtt 300

ctttaatagt ggactcttgt tccaaactgg aacaacactc aaccctatct cggtctattc 360

ttttgattta taagggattt tgccgatttc ggcctattgg ttaaaaaatg agctgattta 420

acaaaaattt aacgcgaatt ttaacaaaat attaacgttt acaatttcag gtggcacttt 480

tcggggaaat gtgcgcggaa cccctatttg tttatttttc taaatacatt caaatatgta 540

tccgctcatg agacaataac cctgataaat gcttcaataa tattgaaaaa ggaagagtat 600

gagtattcaa catttccgtg tcgcccttat tccctttttt gcggcatttt gccttcctgt 660

ttttgctcac ccagaaacgc tggtgaaagt aaaagatgct gaagatcagt tgggtgcacg 720

agtgggttac atcgaactgg atctcaacag cggtaagatc cttgagagtt ttcgccccga 780

agaacgtttt ccaatgatga gcacttttaa agttctgcta tgtggcgcgg tattatcccg 840

tattgacgcc gggcaagagc aactcggtcg ccgcatacac tattctcaga atgacttggt 900

tgagtactca ccagtcacag aaaagcatct tacggatggc atgacagtaa gagaattatg 960

cagtgctgcc ataaccatga gtgataacac tgcggccaac ttacttctga caacgatcgg 1020

aggaccgaag gagctaaccg cttttttgca caacatgggg gatcatgtaa ctcgccttga 1080

tcgttgggaa ccggagctga atgaagccat accaaacgac gagcgtgaca ccacgatgcc 1140

tgcagcaatg gcaacaacgt tgcgcaaact attaactggc gaactactta ctctagcttc 1200

ccggcaacaa ttaatagact ggatggaggc ggataaagtt gcaggaccac ttctgcgctc 1260

ggcccttccg gctggctggt ttattgctga taaatctgga gccggtgagc gtgggtctcg 1320

cggtatcatt gcagcactgg ggccagatgg taagccctcc cgtatcgtag ttatctacac 1380

gacggggagt caggcaacta tggatgaacg aaatagacag atcgctgaga taggtgcctc 1440

actgattaag cattggtaac tgtcagacca agtttactca tatatacttt agattgattt 1500

aaaacttcat ttttaattta aaaggatcta ggtgaagatc ctttttgata atctcatgac 1560

caaaatccct taacgtgagt tttcgttcca ctgagcgtca gaccccgtag aaaagatcaa 1620

aggatcttct tgagatcctt tttttctgcg cgtaatctgc tgcttgcaaa caaaaaaacc 1680

accgctacca gcggtggttt gtttgccgga tcaagagcta ccaactcttt ttccgaaggt 1740

aactggcttc agcagagcgc agataccaaa tactgtcctt ctagtgtagc cgtagttagg 1800

ccaccacttc aagaactctg tagcaccgcc tacatacctc gctctgctaa tcctgttacc 1860

agtggctgct gccagtggcg ataagtcgtg tcttaccggg ttggactcaa gacgatagtt 1920

accggataag gcgcagcggt cgggctgaac ggggggttcg tgcacacagc ccagcttgga 1980

gcgaacgacc tacaccgaac tgagatacct acagcgtgag ctatgagaaa gcgccacgct 2040

tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc agggtcggaa caggagagcg 2100

cacgagggag cttccagggg gaaacgcctg gtatctttat agtcctgtcg ggtttcgcca 2160

cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg gggcggagcc tatggaaaaa 2220

cgccagcaac gcggcctttt tacggttcct ggccttttgc tggccttttg ctcacatgtt 2280

ctttcctgcg ttatcccctg attctgtgga taaccgtatt accgcctttg agtgagctga 2340

taccgctcgc cgcagccgaa cgaccgagcg cagcgagtca gtgagcgagg aagcggaaga 2400

gcgcctgatg cggtattttc tccttacgca tctgtgcggt atttcacacc gcatatatgg 2460

tgcactctca gtacaatctg ctctgatgcc gcatagttaa gccagtatac actccgctat 2520

cgctacgtga ctgggtcatg gctgcgcccc gacacccgcc aacacccgct gacgcgccct 2580

gacgggcttg tctgctcccg gcatccgctt acagacaagc tgtgaccgtc tccgggagct 2640

gcatgtgtca gaggttttca ccgtcatcac cgaaacgcgc gaggcagctg cggtaaagct 2700

catcagcgtg gtcgtgaagc gattcacaga tgtctgcctg ttcatccgcg tccagctcgt 2760

tgagtttctc cagaagcgtt aatgtctggc ttctgataaa gcgggccatg ttaagggcgg 2820

ttttttcctg tttggtcact gatgcctccg tgtaaggggg atttctgttc atgggggtaa 2880

tgataccgat gaaacgagag aggatgctca cgatacgggt tactgatgat gaacatgccc 2940

ggttactgga acgttgtgag ggtaaacaac tggcggtatg gatgcggcgg gaccagagaa 3000

aaatcactca gggtcaatgc cagcgcttcg ttaatacaga tgtaggtgtt ccacagggta 3060

gccagcagca tcctgcgatg cagatccgga acataatggt gcagggcgct gacttccgcg 3120

tttccagact ttacgaaaca cggaaaccga agaccattca tgttgttgct caggtcgcag 3180

acgttttgca gcagcagtcg cttcacgttc gctcgcgtat cggtgattca ttctgctaac 3240

cagtaaggca accccgccag cctagccggg tcctcaacga caggagcacg atcatgcgca 3300

cccgtggggc cgccatgccg gcgataatgg cctgcttctc gccgaaacgt ttggtggcgg 3360

gaccagtgac gaaggcttga gcgagggcgt gcaagattcc gaataccgca agcgacaggc 3420

cgatcatcgt cgcgctccag cgaaagcggt cctcgccgaa aatgacccag agcgctgccg 3480

gcacctgtcc tacgagttgc atgataaaga agacagtcat aagtgcggcg acgatagtca 3540

tgccccgcgc ccaccggaag gagctgactg ggttgaaggc tctcaagggc atcggtcgag 3600

atcccggtgc ctaatgagtg agctaactta cattaattgc gttgcgctca ctgcccgctt 3660

tccagtcggg aaacctgtcg tgccagctgc attaatgaat cggccaacgc gcggggagag 3720

gcggtttgcg tattgggcgc cagggtggtt tttcttttca ccagtgagac gggcaacagc 3780

tgattgccct tcaccgcctg gccctgagag agttgcagca agcggtccac gctggtttgc 3840

cccagcaggc gaaaatcctg tttgatggtg gttaacggcg ggatataaca tgagctgtct 3900

tcggtatcgt cgtatcccac taccgagata tccgcaccaa cgcgcagccc ggactcggta 3960

atggcgcgca ttgcgcccag cgccatctga tcgttggcaa ccagcatcgc agtgggaacg 4020

atgccctcat tcagcatttg catggtttgt tgaaaaccgg acatggcact ccagtcgcct 4080

tcccgttccg ctatcggctg aatttgattg cgagtgagat atttatgcca gccagccaga 4140

cgcagacgcg ccgagacaga acttaatggg cccgctaaca gcgcgatttg ctggtgaccc 4200

aatgcgacca gatgctccac gcccagtcgc gtaccgtctt catgggagaa aataatactg 4260

ttgatgggtg tctggtcaga gacatcaaga aataacgccg gaacattagt gcaggcagct 4320

tccacagcaa tggcatcctg gtcatccagc ggatagttaa tgatcagccc actgacgcgt 4380

tgcgcgagaa gattgtgcac cgccgcttta caggcttcga cgccgcttcg ttctaccatc 4440

gacaccacca cgctggcacc cagttgatcg gcgcgagatt taatcgccgc gacaatttgc 4500

gacggcgcgt gcagggccag actggaggtg gcaacgccaa tcagcaacga ctgtttgccc 4560

gccagttgtt gtgccacgcg gttgggaatg taattcagct ccgccatcgc cgcttccact 4620

ttttcccgcg ttttcgcaga aacgtggctg gcctggttca ccacgcggga aacggtctga 4680

taagagacac cggcatactc tgcgacatcg tataacgtta ctggtttcac attcaccacc 4740

ctgaattgac tctcttccgg gcgctatcat gccataccgc gaaaggtttt gcgccattcg 4800

atggtgtccg ggatctcgac gctctccctt atgcgactcc tgcattagga agcagcccag 4860

tagtaggttg aggccgttga gcaccgccgc cgcaaggaat ggtgcatgca aggagatggc 4920

gcccaacagt cccccggcca cggggcctgc caccataccc acgccgaaac aagcgctcat 4980

gagcccgaag tggcgagccc gatcttcccc atcggtgatg tcggcgatat aggcgccagc 5040

aaccgcacct gtggcgccgg tgatgccggc cacgatgcgt ccggcgtaga ggatcgagat 5100

ctcgatcccg cgaaattaat acgactcact ataggggaat tgtgagcgga taacaattcc 5160

cctctagaaa taattttgtt taactttaaa attctagaga aaatttcgtc tggattagtt 5220

acttatcgtg taaaatctga taaatggaat tggttctaca taaatgccta acgactatcc 5280

ctttggggag tagggtcaag tgactcgaaa cgatagacaa cttgctttaa caagttggag 5340

atatagtctg ctctgcatgg tgacatgcag ctggatataa ttccggggta agattaacga 5400

ccttatctga acataatgct accagcggtc gcggcggtct gggtggccag ggtgcaggta 5460

tggcggctgc ggctgcaatg ggcggtgctg gccaaggtgg ctacggcggc ctgggttctc 5520

agggtactaa ggagatatac catatggatc tgcgttcaat tgaggcctga gtataaggtg 5580

acttatactt gtaatctatc taaacgggga acctctctag tagacaatcc cgtgctaaat 5640

tgtaggactg ccctttaata aatacttcta tatttaaaga ggtatttatg aaaagcggaa 5700

tttatcagat taaaaatact ttgagatccg gctgctaaca aagcccgaaa ggaagctgag 5760

ttggctgctg ccaccgctga gcaataacta gcataacccc ttggggcctc taaacgggtc 5820

ttgaggggtt ttttgctgaa aggaggaact atatccggat 5860

<210> 2

<211> 35

<212> PRT

<213> Artificial Sequence

<400> 2

Ser Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly Met Ala Ala Ala

1 5 10 15

Ala Ala Met Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser

20 25 30

Gln Gly Thr

35
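The claims below cite nucleotide positions in SEQ ID No. 1 (for example 5190-5423), so it helps to be able to turn the numbered rows of the listing back into a single string and slice it with 1-based, inclusive coordinates. The following is a minimal Python sketch of that step, not part of the patent. The function names `parse_listing` and `subseq` are hypothetical, and the sketch assumes each row consists of residue blocks followed by the running residue count, as the rows above do.

```python
import re


def parse_listing(rows):
    """Join numbered sequence rows into one lowercase string.

    Assumes each non-empty row holds blocks of residues followed by the
    running residue count, as in the rows of SEQ ID No. 1.
    """
    parts = []
    total = 0
    for row in rows:
        fields = row.split()
        if not fields:
            continue  # skip blank separator lines
        *blocks, count = fields
        residues = "".join(blocks).lower()
        if not re.fullmatch(r"[acgt]+", residues):
            raise ValueError(f"unexpected characters in row: {row!r}")
        total += len(residues)
        if total != int(count):
            raise ValueError(f"running count {count} does not match {total}")
        parts.append(residues)
    return "".join(parts)


def subseq(seq, start, end):
    """Return positions start..end of seq, 1-based and inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("coordinates outside the sequence")
    return seq[start - 1:end]


# First two rows of SEQ ID No. 1, copied from the listing above.
EXCERPT = """\
tggcgaatgg gacgcgccct gtagcggcgc attaagcgcg gcgggtgtgg tggttacgcg 60

cagcgtgacc gctacacttg ccagcgccct agcgcccgct cctttcgctt tcttcccttc 120
"""

seq = parse_listing(EXCERPT.splitlines())
print(len(seq))
print(subseq(seq, 61, 70))
```

Checking the running count on every row catches a dropped or duplicated block before any coordinate is looked up. Feeding all rows of SEQ ID No. 1 to the same function should give a string whose length can be compared with the <211> value of 5860.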

Claims

1. A method for preparing a tandem repeat protein, comprising: introducing an expression vector containing a double-stranded DNA molecule, named a single-copy gene expression cassette, into a recipient cell to obtain a recombinant cell; culturing the recombinant cell; and expressing to obtain the tandem repeat protein; wherein the single-copy gene expression cassette comprises a promoter, an intron connected to the promoter and named the 3' intron, a target protein coding gene connected to the 3' intron and named the single-copy gene, a coding sequence of a ribosome binding site connected to the target protein coding gene, a spacer sequence connected to the coding sequence of the ribosome binding site, an initiation codon connected to the spacer sequence, and an intron connected to the initiation codon and named the 5' intron; the 3' intron and the 5' intron satisfy condition A, condition A being that the precursor RNA transcribed from the single-copy gene expression cassette in the recombinant cell forms a splicing bubble by complementary base pairing and yields a mature circular single-stranded RNA molecule through a splicing reaction; and the target protein coding gene does not contain a stop codon.

2. The method of claim 1, wherein: the single-copy gene expression cassette is formed by connecting, in order, the promoter, the 3' intron, the target protein coding gene, the coding sequence of the ribosome binding site, the spacer sequence, the initiation codon and the 5' intron.

3. The method according to claim 1 or 2, characterized in that: the target protein is MaSp1.

4. The method of claim 3, wherein: the MaSp1 is a protein whose amino acid sequence is SEQ ID No. 2.

5. The method according to any one of claims 1-4, wherein: the recipient cell is any one of C1)-C4):

C1) a prokaryotic microbial cell;

C2) a gram-negative bacterial cell;

C3) a bacterial cell of the genus Escherichia;

C4) an Escherichia coli BL21(DE3) cell.

6. The method according to any one of claims 1-5, wherein: the 3' intron and the 5' intron that satisfy condition A are a pair of introns as follows:

7. The method according to any one of claims 1-6, wherein: the 3' intron is a double-stranded DNA in which the nucleotide sequence of one strand is nucleotides 5190-5423 of SEQ ID No. 1, and the 5' intron is a double-stranded DNA in which the nucleotide sequence of one strand is nucleotides 5547-5721 of SEQ ID No. 1.

8. The method according to any one of claims 1-7, wherein: the single-copy gene expression cassette is a double-stranded DNA molecule in which the nucleotide sequence of one strand is nucleotides 5117-5835 of SEQ ID No. 1;

or the expression vector is a double-stranded DNA molecule in which the nucleotide sequence of one strand is SEQ ID No. 1.

9. Any one of the following products:

A1) the double-stranded DNA molecule named the single-copy gene expression cassette as defined in the method of any one of claims 1-8;

A2) a vector containing the double-stranded DNA molecule of A1);

10. Use of the method of any one of claims 1 to 8 or the product of claim 9 for the preparation of tandem repeat proteins.
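The mechanism in claim 1 can be illustrated with a short Python sketch, which is not part of the patent. The segment between the two introns of claim 7 (nucleotides 5424-5546 of SEQ ID No. 1, copied from the sequence listing) is treated as the circular RNA, and translation starts at the initiation codon that ends that segment. The sketch assumes that the ribosome reads on in frame through the ribosome binding site, spacer and initiation codon on every lap. That is an inference from the coordinates, not something the claims state. The name `rolling_translate` and the choice of 40 laps are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotides 5424-5546 of SEQ ID No. 1, i.e. the segment between the
# 3' intron (ends at 5423) and the 5' intron (starts at 5547).
EXON = "".join([
    "agcggtc",     # 5424-5430
    "gcggcggtct", "gggtggccag", "ggtgcaggta",                # 5431-5460
    "tggcggctgc", "ggctgcaatg", "ggcggtgctg",                # 5461-5490
    "gccaaggtgg", "ctacggcggc", "ctgggttctc",                # 5491-5520
    "agggtactaa", "ggagatatac",                              # 5521-5540
    "catatg",      # 5541-5546
])

# SEQ ID No. 2 in one-letter code.
UNIT = "SGRGGLGGQGAGMAAAAAMGGAGQGGYGGLGSQGT"


def rolling_translate(circle, laps):
    """Translate a circular RNA for `laps` full turns.

    Assumption: translation begins at the ATG formed by the last three
    nucleotides of `circle` and continues around the circle in frame
    until a stop codon is met or the requested laps are completed.
    """
    if len(circle) % 3 != 0:
        raise ValueError("circle length is not a multiple of 3")
    start = len(circle) - 3
    if circle[start:] != "atg":
        raise ValueError("circle does not end with an initiation codon")
    doubled = circle + circle  # lets a codon be read across the junction
    protein = []
    for i in range(laps * len(circle) // 3):
        pos = (start + 3 * i) % len(circle)
        aa = CODON_TABLE[doubled[pos:pos + 3]]
        if aa == "*":
            break  # a stop codon would end the chain here
        protein.append(aa)
    return "".join(protein)


protein = rolling_translate(EXON, 40)
print(len(EXON), len(protein), protein.count(UNIT))
```

Under these assumptions the circle is 123 nucleotides long, a multiple of three, and contains no in-frame stop codon, so every lap adds one copy of SEQ ID No. 2 together with the residues encoded by the ribosome binding site, spacer and initiation codon. Adding a stop codon to the target gene would end the chain after a single copy, which is why claim 1 excludes it.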