CN112111504A

CN112111504A - Method for screening enzyme digestion adaptive fusion protein and IGF-I preparation method

Info

Publication number: CN112111504A
Application number: CN202010952359.4A
Authority: CN
Inventors: 汤华东; 李冠霖; 柳余莉; 欧阳聪; 杨薇; 高思偲; 尹领鹏; 谢倩; 杨琴霞; 张盼盼
Original assignee: Wuhan Hiteck Biological Pharma Co ltd
Current assignee: Wuhan Hiteck Biological Pharma Co ltd
Priority date: 2020-09-11
Filing date: 2020-09-11
Publication date: 2020-12-22

Abstract

The invention discloses a method for screening enzyme digestion adaptive fusion protein and an IGF-I preparation method, wherein the screening method is to insert target protein sequence coding genes into a plurality of different expression vectors containing protease enzyme digestion sites respectively to obtain amino acid sequences of a plurality of different fusion proteins; predicting the secondary structure of the amino acid sequence of the obtained fusion protein; taking amino acid sequences of different fusion proteins and corresponding secondary structure prediction results thereof as input files to predict a three-dimensional structure; and performing molecular docking on the three-dimensional structure prediction result and the protease corresponding to the fusion protein by adopting a Cluspro algorithm, selecting a digestion system with the correct docking result ratio exceeding a threshold value for experimental verification, and screening out the fusion protein with the most suitable protease. The soluble expression system of thrombin and HRV 3C protease suitable for IGF-I is screened by the method, and the soluble expression system has the advantages of stability, solubility, high-activity mass expression, convenient enzyme digestion and the like.

Description

Method for screening enzyme digestion adaptive fusion protein and IGF-I preparation method

Technical Field

The invention relates to the technical field of recombinant protein production, in particular to a method for screening enzyme digestion adaptive fusion protein, which saves manpower, material resources and time, and a preparation method of IGF-I.

Background

Insulin-like growth factor-I (IGF-I) is a multifunctional regulatory factor that plays an important role in the growth, development and proliferation of cells. Human IGF-I has been used clinically, with good results, to treat diabetes, insulin resistance syndrome, dwarfism, nervous system diseases and other conditions. When IGF-I is expressed directly in a prokaryotic expression system, the target protein usually forms inclusion bodies. To improve protein solubility and facilitate purification, a solubility-enhancing tag and a purification tag are therefore added for fusion expression. To recover the target protein, a protease cleavage site sequence is usually inserted between the target protein and these tags, so that the tags can be removed with the corresponding protease to give the final target protein.

Commonly used proteases include enterokinase, thrombin, Factor Xa and HRV 3C protease (human rhinovirus 3C protease). Each protease places certain restrictions on the amino acid sequence before and after the cleavage site (for example, the optimal cleavage site sequence of thrombin is LVPR↓GS, where ↓ indicates the position at which the protease cleaves the polypeptide chain; the same applies below), and protein drugs do not allow redundant amino acids before the mature peptide. The protease for a fusion protein must therefore be chosen so that the target protein obtained after cleavage has a native N-terminus and/or C-terminus with no additional amino acid residues. In actual production, several fusion proteins containing different protease cleavage site sequences usually have to be expressed and purified, and the optimal expression system is selected by comparing the cleavage efficiencies of the different proteases. The following problems may arise during this process: (1) the cleavage site is buried inside the fusion protein, so the protease cannot cleave it; (2) the fusion protein contains several cleavage sites for the protease, so the protease cleaves the fusion protein non-specifically and may even cleave the target protein. In either case, manpower and material resources are wasted, the development period is prolonged, and the project may even be terminated.

Therefore, a rapid and efficient method is urgently needed to screen out a soluble expression system which can be successfully digested by enzyme and successfully prepare the recombinant human insulin-like growth factor-I so as to facilitate large-scale production.

Disclosure of Invention

In view of the above drawbacks or needs for improvement in the prior art, an object of the present invention is to provide a method for screening fusion proteins adapted to protease digestion. The method uses a virtual screening technique to screen such fusion proteins efficiently, thereby addressing the technical risk that the cleavage effect could previously be verified only through a large number of experiments.

The invention also aims to provide a preparation method of the insulin-like growth factor-I, which obtains the insulin-like growth factor-I fusion protein with good enzyme digestion effect by the virtual screening method, realizes soluble expression and successful enzyme digestion of the fusion protein, and solves the technical problem that the recombinant human insulin-like growth factor-I is difficult to stably, soluble and highly actively express in an escherichia coli expression system.

The technical scheme of the invention is detailed as follows:

A method for screening an enzyme digestion adaptive fusion protein, comprising the following steps:

(1) respectively inserting target protein sequence coding genes into a plurality of different expression vectors containing protease enzyme cutting sites to obtain amino acid sequences of a plurality of different fusion proteins; predicting the secondary structure of the amino acid sequence of the obtained fusion protein by using a secondary structure prediction algorithm PSIPRED;

(2) taking the amino acid sequences of the different fusion proteins in the step (1) and the corresponding secondary structure prediction results thereof as input files, and predicting the three-dimensional structure by a structure prediction algorithm I-TASSER;

(3) performing molecular docking between the three-dimensional structure prediction result obtained in step (2) and the protease corresponding to the fusion protein using the Cluspro algorithm, selecting for experimental verification each digestion system whose proportion of correct docking results exceeds a threshold value, and screening out the fusion protein best adapted to its protease.

In the screening method, candidate proteases can be listed one by one, fusion protein sequences containing target proteins and enzyme cutting sites corresponding to the proteases are obtained respectively, and fusion proteins with corresponding quantities are obtained according to the quantity of the proteases. The subsequent secondary structure prediction algorithm PSIPRED, the tertiary structure prediction algorithm I-TASSER and the molecular docking algorithm Cluspro are all the existing known technologies, and the operation can be carried out by a person skilled in the art according to the instruction of each algorithm.

Preferably, the method further comprises the following steps:

(4) performing codon optimization on the gene encoding the best-adapted fusion protein obtained in step (3). The expression level can be improved after optimization.

Preferably, in the above method, the target protein is human insulin-like growth factor-I.

Preferably, in the above method, the protease is enterokinase, thrombin or HRV 3C protease, and when the three-dimensional structure prediction results of the fusion proteins corresponding to different proteases are subjected to docking, the docking sites are selected as follows:

selecting all lysine (Lys) in an amino acid sequence and 4 amino acids on the left and right of the lysine (Lys) as candidate enzyme cutting sites of the fusion protein corresponding to the enterokinase;

selecting arginine (Arg) and 4 amino acids on the left and right of the Arg as candidate enzyme cutting sites of the fusion protein corresponding to the thrombin;

the fusion protein corresponding to HRV 3C protease selects glutamine (Gln) and 4 amino acids on the left and right of the glutamine as candidate enzyme cutting sites.
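The site-selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the inventors' tooling: the function name `candidate_windows` and the `flank` parameter are hypothetical, residue numbering is assumed to be 1-based, windows are clipped at the ends of the sequence, and overlapping windows are assumed to be merged into a single region.

```python
def candidate_windows(sequence, residue, flank=4):
    """Return merged 1-based (start, end) windows around every `residue`.

    Hypothetical helper: each window covers the residue plus `flank`
    amino acids on either side, clipped to the sequence ends.
    """
    windows = []
    for index, amino_acid in enumerate(sequence, start=1):
        if amino_acid != residue:
            continue
        start = max(1, index - flank)
        end = min(len(sequence), index + flank)
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], end)
        else:
            windows.append((start, end))
    return windows


# Toy sequence containing a single thrombin-like LVPR motif.
print(candidate_windows("AAAALVPRGSAAAA", "R"))
```

The raw windows produced this way are only a starting list; the regions actually used for docking may be a curated selection.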

Preferably, in the above method, the threshold value in step (3) is 50%.

A method for preparing recombinant human insulin-like growth factor-I uses a fusion protein with the amino acid sequence shown in SEQ ID NO.1, expresses it in a prokaryotic host using the pET-32a (+) vector, and digests the expressed fusion protein with thrombin. The gene encoding the fusion protein is cloned into the pET-32a (+) vector between the MscI and EcoRV restriction sites.

Preferably, in the above IGF-I preparation method, the encoding gene of the fusion protein shown in SEQ ID NO.1 is:

(1) a gene sequence shown as SEQ ID NO. 4; or

(2) A gene sequence which has 90 to 100 percent of homology with the gene sequence shown in SEQ ID NO.4 and encodes the same functional protein; or

(3) A gene sequence which is derived from (1) by adding, deleting or replacing one or more codons and which encodes a protein with the same activity.

A method for preparing recombinant human insulin-like growth factor-I uses a fusion protein with the amino acid sequence shown in SEQ ID NO.3, expresses it in a prokaryotic host using the pET-48b (+) vector, and digests the expressed fusion protein with HRV 3C enzyme. The gene encoding the fusion protein is cloned into the pET-48b (+) vector between the SacII and HindIII restriction sites.

Preferably, in the above IGF-I preparation method, the coding gene corresponding to the fusion protein shown in SEQ ID NO.3 is:

(1) a gene sequence shown as SEQ ID NO. 5; or

(2) a gene sequence which has 90 to 100 percent of homology with the gene sequence shown in SEQ ID NO.5 and encodes the same functional protein; or

(3) A gene sequence which is derived from (1) by adding, deleting or replacing one or more codons and which encodes a protein with the same activity.

Preferably, in any of the above-described preparation methods, the prokaryotic host is BL21(DE3) E.coli strain, Rosetta-gami B (DE3) E.coli strain, Origami B (DE3) E.coli strain or Rosetta-gami2(DE3) E.coli strain.

Compared with the prior art, the invention has the following beneficial effects:

The method for screening enzyme digestion adaptive fusion protein provided by the invention adopts virtual screening: through structure simulation and molecular docking it compares the cleavage effects of various proteases in silico and screens out a suitable fusion protein expression system and protease before any experiment is run. The experimental verification agrees with the prediction. Compared with the conventional high-input approach, in which candidates can only be screened by verifying them experimentally one by one, the method greatly reduces capital, labor and time costs and is more convenient and efficient.

According to the invention, the enzyme digestion effects of thrombin, enterokinase and HRV 3C protease are obtained through virtual screening by the screening method, and the enzyme digestion effects are consistent with the prediction result after experimental verification, and the thrombin, the HRV 3C protease and the corresponding fusion protein thereof are more suitable for preparing IGF-I, so that a soluble expression system which is suitable for the recombinant human insulin-like growth factor-I and can be successfully subjected to enzyme digestion is obtained.

The preparation method of the recombinant human insulin-like growth factor-I fusion protein provided by the invention has the advantages of soluble expression, large expression quantity and the like, can successfully obtain the recombinant human insulin-like growth factor-I by protease enzyme digestion, and solves the technical problem that the human insulin-like growth factor-I cannot be directly obtained.

Drawings

FIG. 1 is a PSIPRED online prediction server input interface;

FIG. 2 is an I-TASSER online prediction server input interface;

FIG. 3 is a Cluspro online prediction server input interface;

FIG. 4 is a map of E.coli expression vector pET-32a (+);

FIG. 5 is a map of the multiple cloning site region of E.coli expression vector pET-32a (+);

FIG. 6 is a SDS-PAGE electrophoretic analysis of thrombin fusion proteins;

FIG. 7 is a Western blot analysis of the thrombin fusion protein;

FIG. 8 is a map of E.coli expression vector pET-48b (+);

FIG. 9 is a map of the multiple cloning site region of E.coli expression vector pET-48b (+);

FIG. 10 is a Western blot analysis of the HRV 3C enzyme fusion protein;

FIG. 11 is an SDS-PAGE analysis of the thrombin fusion protein after digestion with thrombin;

FIG. 12 is a Western blot analysis of the HRV 3C enzyme fusion protein after digestion with HRV 3C enzyme;

FIG. 13 is a Western blot analysis of the enterokinase fusion protein;

FIG. 14 is an SDS-PAGE analysis of enterokinase fusion protein after digestion with enterokinase.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Through earlier in-depth studies of protein structures and molecular modeling methods, the inventors succeeded in improving the thermal stability of proteins using various structure prediction algorithms (Li, et al. Appl Environ Microbiol, 2018, 84(2), e02129-17; Li, et al. RSC Adv, 2018, 8, 1948). The invention uses a structure prediction algorithm and a molecular docking algorithm to virtually screen out fusion proteins suitable for preparing recombinant human insulin-like growth factor-I, and then digests them successfully with the corresponding protease to prepare recombinant human insulin-like growth factor-I. With the technical scheme of the invention, stable, soluble and highly active expression of IGF-I in an Escherichia coli expression system can be achieved, which makes industrial production of IGF-I possible and benefits new drug development and clinical application.

With respect to the examples, it is noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless otherwise indicated, the technical terms used are terms commonly used by those of ordinary skill in the art; the experimental method without specific conditions noted is a conventional experimental method; the test materials used are commercially available products unless otherwise specified, and the ingredients and preparation methods of various reagents and media can be found in conventional laboratory manuals.

Example 1: obtaining amino acid sequences of fusion proteins

According to literature reports, direct expression of insulin-like growth factor-I (IGF-I) in Escherichia coli forms inclusion bodies (Rosano GL, et al. Front Microbiol, 2014, 5:172), and soluble expression of a protein can be achieved by adding a solubilizing protein tag for fusion expression. At present, the most widely used Escherichia coli expression vectors are those of the pET system; a solubilizing tag such as thioredoxin (Trx) and a His-tag purification tag are added to the vector, which promotes soluble expression and purification of the fusion protein.

So that no extra amino acid remains at the N-terminus of IGF-I after the fusion protein is digested, the candidate proteases are thrombin, Factor Xa, enterokinase, TEV enzyme and HRV 3C enzyme. Taking into account enzyme cost and availability, simulated digestion experiments were performed with thrombin, enterokinase and HRV 3C enzyme. Different pET expression vectors contain cleavage sites for thrombin, enterokinase and HRV 3C enzyme, so the target protein can be obtained by protease digestion. Finally, three fusion proteins were chosen for the subsequent experiments: pET-32a (+)-thrombin cleavage site-IGF-I, pET-32a (+)-enterokinase cleavage site-IGF-I and pET-48b (+)-HRV 3C cleavage site-IGF-I.

The amino acid sequence of IGF-I is shown in SEQ ID No. 6:

GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA

The amino acid sequence of IGF-I was inserted after the thrombin cleavage site sequence LVPR and after the enterokinase cleavage site sequence DDDDK of the pET-32a (+) vector, and after the HRV 3C cleavage site sequence LEVLFQ of the pET-48b (+) vector, giving the following fusion protein amino acid sequences:

Thrombin fusion protein amino acid sequence SEQ ID No.1:

MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHMHHHHHHSSGLVPRGPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA

Enterokinase fusion protein amino acid sequence SEQ ID No.2:

MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHMHHHHHHSSGLVPRGSGMKETAAAKFERQHMDSPDLGTDDDDKGPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA

HRV 3C enzyme fusion protein amino acid sequence SEQ ID No.3:

MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHTSGGGGSNNNPPTPTPSSGSGHHHHHHSAALEVLFQGPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
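The design goal of Example 1, that cleavage releases IGF-I with no extra N-terminal residues, can be checked with a small sequence-level sketch. The helper names `release_target` and `has_native_n_terminus` and the toy tag are hypothetical. The sketch models only an ideal cut immediately after the intended recognition sequence; it says nothing about site accessibility or off-target cleavage, which is what the docking steps address.

```python
# Mature IGF-I sequence (SEQ ID No.6).
IGF_I = "GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA"

# Recognition sequences immediately upstream of the cut, as given in Example 1.
SITES = {"thrombin": "LVPR", "enterokinase": "DDDDK", "HRV 3C": "LEVLFQ"}


def release_target(fusion, protease):
    """Cut after the last occurrence of the site and return the C-terminal product."""
    site = SITES[protease]
    position = fusion.rfind(site)
    if position == -1:
        raise ValueError(f"no {protease} site in fusion sequence")
    return fusion[position + len(site):]


def has_native_n_terminus(fusion, protease, target=IGF_I):
    """True if the released product is exactly the target, with no extra residues."""
    return release_target(fusion, protease) == target


# Hypothetical short tag standing in for the Trx and His-tag region.
toy_tag = "MHHHHHHSSG"
print(has_native_n_terminus(toy_tag + "LVPR" + IGF_I, "thrombin"))
```

Using the last occurrence of the site is a simplification chosen because the intended junction sits directly in front of IGF-I in all three constructs.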

Example 2: prediction of fusion protein secondary structure

The PSIPRED online prediction server (http://bioinf.cs.ucl.ac.uk/psipred/) is opened; its page is shown in FIG. 1. The amino acid sequence obtained in Example 1 is entered in the "Protein Sequence" field, the Job Name and Email address are filled in, and the task is submitted; the ss2 file containing the prediction result is then sent to the designated mailbox.

The predicted secondary structure of the thrombin fusion protein in Example 1 is as follows, where C represents a random coil, S represents a β-sheet, and H represents an α-helix:

CCCCCSSCCHHHHHHHHHHCCCCSSSSSSCCCCHHHHHHHHHHHHHHHHHCCCSSSSSSSCCCCCCHHHHHCCCCCCSSSSSSCCSSSSSSSCCCCHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHCCCCSSSCCCCCCCCCCCCCCCCCCCCCHHCCCCCSSSSSSSSCCCCCCCCC

The predicted secondary structure of the enterokinase fusion protein in Example 1 is as follows:

CCCCCSSCCCCCHHHHHHCCCCCSSSSSCCCCCHHHHHHHHHHHHHHHHHCCCSSSSSSSCCCCCCCCCCCCCCCCCSSSSSSCCSSSSSSSCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCCCSSSCCCCCCCCCCCCCCCCCCCCCHHCCCCCCCHHHHHCCCCCCCCCC

The predicted secondary structure of the HRV 3C enzyme fusion protein in Example 1 is as follows:

CCCCCSSCCCCCHHHHHHCCCCCSSSSSCCCCCHHHHHHHHHHHHHHHHHCCCSSSSSSSCCCCCCCCCCCCCCCCCSSSSSSCCSSSSSSSCCCCHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHCCCCCCHHHHHHHHHHHHHCCCCSSSCCCCCCCCCCCCCCCCCCCCCHHCCCCCCCHHHHHCCCCCCCCCC
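Prediction strings like those above can be recovered from an ss2 file with a short parser. The sketch below rests on assumptions not stated in the text: that each data line of the ss2 file holds a residue index, the residue letter, the predicted state and three probabilities, and that the file marks strands with E, which is mapped here to the S used in this document. The function names are hypothetical.

```python
def read_ss2(text, strand_symbol="S"):
    """Parse ss2-style text into (sequence, structure) strings.

    Assumed layout per data line: index, residue, state, three probabilities.
    Strand state "E" is rewritten as `strand_symbol`.
    """
    sequence, structure = [], []
    for line in text.splitlines():
        fields = line.split()
        if line.startswith("#") or len(fields) < 3:
            continue  # skip the header and blank lines
        sequence.append(fields[1])
        state = fields[2]
        structure.append(strand_symbol if state == "E" else state)
    return "".join(sequence), "".join(structure)


def state_fractions(structure):
    """Fraction of residues in coil (C), helix (H) and sheet (S)."""
    return {state: structure.count(state) / len(structure) for state in "CHS"}


# Made-up three-residue example, not taken from a real prediction.
sample = """# PSIPRED VFORMAT (PSIPRED V4.0)

   1 M C   0.999  0.000  0.001
   2 S E   0.100  0.050  0.850
   3 D H   0.050  0.900  0.050
"""
print(read_ss2(sample))
```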

Example 3: prediction of three-dimensional structure of fusion protein

The I-TASSER online prediction server (https://zhanglab.ccmb.med.umich.edu/I-TASSER/) is opened; its page is shown in FIG. 2. The amino acid sequence of the fusion protein is entered in the sequence box, and the ss2 file obtained in Example 2 is uploaded with the upload button under "Option III: Specify secondary structure for specific residues". An email address and a task name are then filled in and the prediction task is submitted. When the task is finished, the predicted pdb file of the three-dimensional structure of the fusion protein is received.

Example 4 molecular docking of fusion proteins with proteases

The Cluspro online prediction server (https://cluspro.bu.edu/home.php) is opened; its page is shown in FIG. 3. The "Upload PDB" button in the "Receptor" column is clicked to upload the fusion protein PDB file predicted in Example 3. In the "Ligand" column, the PDB ID of the protease corresponding to the fusion protein is filled in: "1ETR" for thrombin, "1EKB" for enterokinase and "2IN2" for HRV 3C enzyme.

To speed up docking and improve its accuracy, in the "Attraction and Repulsion" options below, the "Attraction" field for the Receptor is filled with the candidate cleavage sites of the fusion protein, and the "Attraction" field for the Ligand is filled with the surface amino acid residues near the protease active site that may contact the fusion protein.

For the thrombin fusion protein, Arg and the 4 amino acids on either side of it are selected as candidate cleavage sites; for the enterokinase fusion protein, every Lys in the amino acid sequence and the 4 amino acids on either side of it; and for the HRV 3C enzyme fusion protein, Gln and the 4 amino acids on either side of it. Preferably, the docking regions of the thrombin fusion protein are residues 49-57, 66-78, 93-101, 125-133 and 146-154 of SEQ ID No.1; those of the enterokinase fusion protein are residues 1-8, 15-23, 33-41, 49-62, 66-75, 79-105, 130-144, 154-162, 181-189 and 219-228 of SEQ ID No.2; and those of the HRV 3C enzyme fusion protein are residues 47-55, 59-67, 95-103, 147-155, 162-170 and 187-195 of SEQ ID No.3.

The amino acid sequence of thrombin (PDB ID:1ETR) is shown in SEQ ID No. 7:

TFGAGEADCGLRPLFEKKQVQDQTEKELFESYIEGRIVEGQDAEVGLSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPWDKNFTVDDLLVRIGKHSRTRYERKVEKISMLDKIYIHPRYNWKENLDRDIALLKLKRPIELSDYIHPVCLPDKQTAAKLLHAGFKGRVTGWGNRRETWTTSVAEVQPSVLQVVNLPLVERPVCKASTRIRITDNMFCAGYKPGEGKRGDACEGDSGGPFVMKSPYNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDRLGS

The docking region comprises residues 51-53, 55-56, 58-65, 79-80, 83, 85-86, 88, 90, 101, 104, 105, 131, 132, 135, 175, 179, 182, 190, 192, 194, 214, 215, 224, 225, 234, 241, 261, 267, 269, 273 and 276.

The amino acid sequence of enterokinase (PDB ID:1EKB) is shown in SEQ ID No. 8:

CGKKLVTQEVSPKIVGGSDSREGAWPWVVALYFDDQQVCGASLVSRDWLVSAAHCVYGRNMEPSKWKAVLGLHMASNLTSPQIETRLIDQIVINPHYNKRRKNNDIAMMHLEMKVNYTDYIQPICLPEENQVFPPGRICSIAGWGALIYQGSTADVLQEADVPLLSNEKCQQQMPEYNITENMVCAGYEAGGVDSCQGDSGGPLMCQENNRWLLAGVTSFGYQCALPNRPGVYARVPRFTEWIQSFLH

The docking region comprises residues 33, 36-39, 54-55, 57-59, 97, 99-100, 102, 105, 195-221, 218-221 and 231-232.

The amino acid sequence of HRV 3C enzyme (PDB ID:2IN2) is shown in SEQ ID No. 9:

GPNTEFALSLLRKNIMTITTSKGEFTGLGIHDRVCVIPTHAQPGDDVLVNGQKIRVKDKYKLVDPENINLELTVLTLDRNEKFRDIRGFISEDLEGVDATLVVHSNNFTNTILEVGPVTMAGLINLSSTPTNRMIRYDYATKTGQCGGVLCATGKIFGIHVGGNGRQGFSAQLKKQYFVEKQ

The docking region comprises residues 26-30, 44-48, 64-68, 74, 76-78, 111-112, 131-137, 139, 147-152, 166-170 and 174.
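Residue lists written as mixed single numbers and ranges, such as the docking regions above, are easier to reuse once expanded into explicit residue numbers. The helper below is a hypothetical sketch of that bookkeeping; it removes duplicates from overlapping ranges and makes no assumption about the exact text format the docking server's input fields expect.

```python
def expand_ranges(spec):
    """Expand a string such as "26-30, 74, 76-78" into sorted residue numbers."""
    residues = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(value) for value in part.split("-"))
            residues.update(range(first, last + 1))
        else:
            residues.add(int(part))
    return sorted(residues)


# First few entries of the HRV 3C enzyme docking region listed above.
print(expand_ranges("26-30, 44-48, 74, 76-78"))
```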

After the docking files and docking sites have been entered, the "Dock" button is clicked to run the docking prediction. Cluspro outputs the 20 most probable sets of docked complex structures together with their pdb structure files, and gives the number of structures contained in each set. The complex structure files of the 20 sets are inspected to determine the docking position of each complex.

The output of the thrombin and fusion protein docking is as follows:

The 20 structure sets, listed in order of size with the number of complex structures in parentheses, dock at the following cleavage positions: 1) PR↓GP (220, the largest set); 2) IR↓GI (161); 3) no definite docking position; 4) PR↓GP (64); 5) no definite docking position; 6) IR↓GI (49); 7) PR↓GP (46); 8) PR↓GP (43); 9) PR↓GP (29); 10) DR↓GF (26); 11) no definite docking position; 12) no definite docking position; 13) no definite docking position; 14) DR↓GF (15); 15) IR↓GI (14); 16) PR↓GP (12); 17) PR↓GP (10); 18) no definite docking position; 19) IR↓GI (4); 20) DR↓GF (1).

Counting the docking positions and the number of structures at each position shows that 61.1% of the thrombin active sites dock at the correct cleavage position (PR↓GP):

MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIR(32.8％)GIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHMHHHHHHSSGLVPR(61.1％)GPETLCGAELVDALQFVCGDR(6.1％)GFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
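The percentages annotated on the sequence can be reproduced from the set sizes listed for the thrombin docking output. The sketch below assumes, consistently with the reported figures, that sets with no definite docking position are left out of the total; the function name is hypothetical and "|" stands in for the cut mark.

```python
from collections import Counter

# (docking position, number of structures) for the sets with a definite
# position, copied from the thrombin docking output; "|" marks the cut.
THROMBIN_SETS = [
    ("PR|GP", 220), ("IR|GI", 161), ("PR|GP", 64), ("IR|GI", 49),
    ("PR|GP", 46), ("PR|GP", 43), ("PR|GP", 29), ("DR|GF", 26),
    ("DR|GF", 15), ("IR|GI", 14), ("PR|GP", 12), ("PR|GP", 10),
    ("IR|GI", 4), ("DR|GF", 1),
]


def site_percentages(sets):
    """Share of docked structures at each position, in percent."""
    totals = Counter()
    for site, size in sets:
        totals[site] += size
    grand_total = sum(totals.values())
    return {site: 100.0 * count / grand_total for site, count in totals.items()}


for site, percent in site_percentages(THROMBIN_SETS).items():
    print(site, round(percent, 2))
```

With these counts, 424 of 694 structures fall on PR↓GP, which rounds to the 61.1% stated above. The IR↓GI share computes to about 32.85%, which appears in the annotated sequence as 32.8%.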

Similarly, the docking positions of enterokinase and their percentages are shown below; the proportion at the correct cleavage site (DK↓GP) is 14.7%:

MSDK(17.1％)IIHLTDDSFDTDVLKADGAILVDFWAEWCGPCK(1.8％)MIAPILDEIADEYQGKLTVAKLNIDQNPGTAPK(30.1％)YGIRGIPTLLLFKNGEVAATK(10.4％)VGALSK(2.3％)GQLKEFLDANLAGSGSGHMHHHHHHSSGLVPRGSGMK(14.4％)ETAAAK(3.0％)FERQHMDSPDLGTDDDDK(14.7％)GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLK(6.1％)PAKSA

The docking positions of HRV 3C enzyme and their percentages are shown below; the proportion at the correct cleavage site (FQ↓GP) is 56.6%:

MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHTSGGGGSNNNPPTPTPSSGSGHHHHHHSAALEVLFQ(56.6％)GPETLCGAELVDALQ(43.4％)FVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA

A correct-cleavage-site proportion above 50% in the prediction result is taken to mean that the protease can readily contact the substrate and cleave specifically at the correct site, so 50% is chosen as the threshold. Accordingly, the thrombin cleavage system and the HRV 3C cleavage system were selected as examples for expression experiments, and the enterokinase cleavage system was selected as a comparative example.
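The threshold rule amounts to a simple filter over the correct-site proportions reported in this example. The function name `select_systems` is hypothetical; the three percentages are the values stated above.

```python
def select_systems(correct_site_percent, threshold=50.0):
    """Return the digestion systems whose correct-site share exceeds the threshold."""
    return [name for name, percent in correct_site_percent.items()
            if percent > threshold]


reported = {"thrombin": 61.1, "enterokinase": 14.7, "HRV 3C": 56.6}
print(select_systems(reported))  # ['thrombin', 'HRV 3C']
```

The comparison is strict, matching the wording "exceeding a threshold value"; a system at exactly 50% would not be selected under this reading.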

Example 5 preparation of fusion protein and enzyme cleavage

5.1 obtaining of fusion protein Gene sequences

In order to realize large-scale and high-efficiency expression of each fusion protein in Escherichia coli, the inventor finally designs a proper gene sequence for expression by referring to the codon bias of Escherichia coli and considering factors such as codon degeneracy, GC content, proper restriction endonuclease and the like.

Specifically, the thrombin fusion protein gene sequence is shown in SEQ ID No.4; its 5' end carries an MscI restriction endonuclease site (TGGCCA), and its 3' end carries two consecutive stop codons (TAATGA) followed by an EcoRV recognition site (GATATC):

TGGCCATATGCACCATCATCATCATCATTCTTCTGGTCTGGTGCCACGCGGTCCGGAGACCCTGTGCGGTGCGGAACTGGTGGACGCGCTGCAATTTGTTTGCGGTGATCGTGGCTTCTACTTTAACAAGCCGACCGGTTATGGTAGCAGCAGCCGTCGTGCGCCGCAGACCGGTATCGTTGACGAGTGCTGCTTCCGTAGCTGCGATCTGCGTCGTCTGGAAATGTATTGCGCGCCGCTGAAGCCGGCGAAAAGCGCGTAATGAGATATC

The HRV 3C enzyme fusion protein gene sequence is shown in SEQ ID No.5; its 5' end carries a SacII restriction endonuclease site (CCGCGG), and its 3' end carries a stop codon (TAA) followed by a HindIII recognition site (AAGCTT):

CCGCGGCTCTGGAAGTGCTGTTTCAAGGTCCGGAGACCCTGTGCGGTGCGGAACTGGTGGACGCGCTGCAATTTGTTTGCGGTGATCGTGGCTTCTACTTTAACAAGCCGACCGGTTATGGTAGCAGCAGCCGTCGTGCGCCGCAGACCGGTATCGTTGACGAGTGCTGCTTCCGTAGCTGCGATCTGCGTCGTCTGGAAATGTATTGCGCGCCGCTGAAGCCGGCGAAAAGCGCGTAAAAGCTT

The above sequences were obtained by chemical synthesis.
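Before ordering a synthetic gene it is useful to confirm that it carries the intended restriction sites at its ends and that the coding region translates to the expected peptide. The sketch below uses the standard genetic code and short fragments copied from SEQ ID No.4 and SEQ ID No.5; the helper names and the `offset` parameter are hypothetical, and the reading frame of the full insert depends on the vector, so the offset must be chosen by the user.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, offset=0):
    """Translate from `offset` until the first stop codon or the end."""
    protein = []
    for start in range(offset, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def has_flanking_sites(dna, five_prime_site, three_prime_site):
    """Check that the sequence starts and ends with the given recognition sites."""
    return dna.startswith(five_prime_site) and dna.endswith(three_prime_site)


# Fragment of SEQ ID No.4 spanning the thrombin site and the start of IGF-I.
print(translate("CTGGTGCCACGCGGTCCG"))
```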

5.2 construction of recombinant expression vector of thrombin fusion protein and fusion expression engineering bacterium

The map of the pET-32a (+) expression vector and the sequence of its multiple cloning site are shown in FIGS. 4 and 5. The synthesized cDNA sequence of the thrombin fusion protein (SEQ ID No.4) was double-digested with MscI and EcoRV and ligated with T4 DNA ligase into the pET-32a (+) expression vector that had been double-digested in the same way, giving a recombinant expression vector designated pET-32a-Thrombin-IGF-I. This vector was transformed into the Escherichia coli cloning strain Top10, recombinants were cultured and screened at 37 ℃ on LB medium containing 50 μg/ml ampicillin, and after verification by PCR and restriction digestion the sequence was confirmed by sequencing.

The correctly identified recombinant plasmid was extracted and transformed into the Escherichia coli expression strain Origami B (DE3), and recombinants were cultured and screened at 37 ℃ on LB medium containing 50 μg/ml ampicillin (Amp), 15 μg/ml kanamycin (Kan) and 12.5 μg/ml tetracycline (Tet). The resulting recombinant is the IGF-I engineering strain for Trx fusion expression, named Thrombin-IGF. It was stored in 15% glycerol and frozen at -80 ℃. The fusion protein expressed by the Thrombin-IGF engineering strain contains a thioredoxin (Trx) tag, a His-tag purification tag, a thrombin cleavage site and the mature peptide IGF-I sequence, and its theoretical molecular weight is about 21.6 kDa.

The frozen Thrombin-IGF engineering strain was inoculated at 0.2% from a glycerol tube into a test tube of fresh LB selective medium (containing 50 μg/ml Amp, 15 μg/ml Kan and 12.5 μg/ml Tet) and shaken overnight to activate the strain. The culture was transferred into 300 mL of LB medium and shaken at 37 ℃ for about 3 h; when OD₆₀₀ reached 0.6-0.8, isopropyl-β-D-thiogalactoside (IPTG) was added to a final concentration of 0.05 mM, expression was induced at 25 ℃ for 24 h, and the cells were collected by centrifugation. The wet cell weight was measured, lysis buffer (50 mM Tris-HCl, 0.5 M NaCl, pH 8.0) was added at a cell-to-buffer weight ratio of 1:15, the cells were disrupted by sonication, and the samples were analyzed by SDS-PAGE. The results are shown in FIG. 6, where M is the protein marker; 1 is the whole-cell culture before induction; 2 is the whole-cell culture after induction; and 3 is the lysate supernatant after induction. The lysate supernatant shows expression of a fusion protein with a molecular weight close to 26 kDa, which is close to the theoretical molecular weight.

The fusion protein was identified by Western blot using an IGF-I monoclonal antibody (Abcam, cat # ab9572; the same applies below). The result is shown in FIG. 7, where M is the protein marker; 1 is the whole-cell culture before induction; 2 is the whole-cell culture after induction; and 3 is the lysate supernatant after induction. The results show that the fusion protein in the lysate supernatant has IGF-I immunoreactivity, and the figure also shows that the fusion protein can form a small amount of dimer.
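A theoretical molecular weight of this kind is usually estimated by summing average residue masses and adding one water molecule. The sketch below illustrates that calculation; the function name is hypothetical, the residue masses are standard average values, and the estimate ignores disulfide bonds and any processing of the N-terminal methionine, so it need not match the figure quoted above exactly.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.015


def molecular_weight_kda(sequence):
    """Estimate the average molecular weight of a linear peptide in kDa."""
    total = sum(RESIDUE_MASS[amino_acid] for amino_acid in sequence) + WATER
    return total / 1000.0


# Mature IGF-I (SEQ ID No.6), fully reduced.
igf_i = "GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA"
print(round(molecular_weight_kda(igf_i), 2))
```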
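The reagent amounts implied by this protocol follow from simple proportions. The sketch below works them through; the function names are hypothetical, and the 1 M IPTG stock concentration is an assumption made only for illustration, since the text gives final concentrations but not the stock.

```python
def iptg_stock_volume_ul(culture_ml, final_mm, stock_mm=1000.0):
    """Microlitres of IPTG stock needed to reach `final_mm` in the culture.

    `stock_mm` defaults to a hypothetical 1 M (1000 mM) stock solution.
    """
    return culture_ml * final_mm / stock_mm * 1000.0


def lysis_buffer_g(wet_cells_g, buffer_per_cell_weight=15.0):
    """Grams of lysis buffer for a 1:15 cell-to-buffer weight ratio."""
    return wet_cells_g * buffer_per_cell_weight


# 300 mL culture induced at 0.05 mM IPTG, as in section 5.2.
print(round(iptg_stock_volume_ul(300, 0.05), 3))
# Hypothetical 2 g wet cell pellet.
print(lysis_buffer_g(2.0))
```

Under the assumed 1 M stock, the 0.05 mM induction of a 300 mL culture corresponds to about 15 μL of stock; a different stock concentration scales this volume inversely.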

5.3 construction of recombinant expression vector of HRV 3C enzyme fusion protein and fusion expression engineering bacteria

The map of the pET-48b (+) expression vector and the sequence of its multiple cloning site are shown in FIGS. 8 and 9. The artificially synthesized cDNA sequence of the HRV 3C enzyme fusion protein (SEQ ID No.5) was double-digested with SacI and HindIII and ligated with T4 DNA ligase into the pET-48b (+) expression vector that had been double-digested in the same way, giving a recombinant expression vector designated pET-48b-HRV 3C-IGF-I. The vector was transformed into the Escherichia coli cloning strain Top10, and recombinants were cultured and screened at 37 ℃ on LB medium containing 50 μg/ml kanamycin. After the recombinants were confirmed by PCR and restriction digestion, their sequences were verified by sequencing.

The correctly identified recombinant plasmid was extracted and transformed into the Escherichia coli expression strain Rosetta-gami2(DE3). Recombinants were cultured and screened at 37 ℃ on LB medium containing 50 μg/ml Kan, 34 μg/ml Chl, 50 μg/ml streptomycin (Str) and 12.5 μg/ml Tet. The resulting recombinant is the engineered strain expressing IGF-I as a Trx fusion and is named HRV 3C-IGF. It was preserved in 15% glycerol and stored frozen at -80 ℃. The fusion protein expressed by the HRV 3C-IGF engineered bacteria contains a thioredoxin (Trx) tag, a His-tag purification tag, an HRV 3C enzyme cleavage site and the mature IGF-I peptide sequence, with a theoretical molecular weight of about 23.5 kDa.

Frozen HRV 3C-IGF engineered bacteria from a glycerol tube were inoculated at 0.2% into a test tube containing fresh antibiotic-supplemented LB medium (50 μg/ml Kan, 50 μg/ml Str, 34 μg/ml Chl and 12.5 μg/ml Tet) and shake-cultured overnight to activate the strain. The culture was transferred into 300 mL of LB medium and shake-cultured at 37 ℃ for about 3 h; when the OD₆₀₀ reached 0.6-0.8, isopropyl-β-D-thiogalactoside (IPTG) was added to a final concentration of 0.5 mM, expression was induced at 25 ℃ for 25 h, and the cells were collected by centrifugation. The wet cell weight was measured, lysis buffer (50 mM Tris-HCl, 0.5 M NaCl, pH 8.0) was added at a cell-to-buffer weight ratio of 1:15, and the cells were disrupted by sonication. The whole-cell culture and the lysate supernatant were then analyzed by Western blot with the IGF-I monoclonal antibody. The results are shown in FIG. 10, where M is the protein marker; 1 is the whole-cell culture after induction; and 2 is the lysate supernatant after induction. The results show that the fusion protein in the lysate supernatant has IGF-I immunoreactivity and an apparent size of about 26 kDa, which is close to the theoretical molecular weight.

5.4 purification of the fusion protein

The thrombin fusion protein and the HRV 3C enzyme fusion protein were expressed using the Thrombin-IGF and HRV 3C-IGF engineered bacteria constructed in Examples 5.2 and 5.3. The fermentation broth was centrifuged to collect the cells, the cells were disrupted by sonication, and the supernatant was collected and subjected to affinity chromatography on a Ni-Sepharose 6FF column equilibrated with 20 mM Tris-HCl (pH 7.8)-0.5 M NaCl. The target protein was eluted with a 50 mM → 500 mM imidazole gradient and collected. To reduce the effect of high salt concentration on protease activity, excess salt was removed with a G-15 desalting column.

5.5 cleavage of the fusion protein

The mature IGF-I peptide is about 10 kDa in size; if cleavage is correct, a band with IGF-I immunoreactivity is detectable at around 10 kDa.
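What "cleaved correctly" means at the sequence level can be illustrated with a short Python sketch. It uses the mature IGF-I sequence of SEQ ID NO.6 and the C-terminal portions of the fusion proteins of SEQ ID NO.1, NO.3 and NO.2, transcribed into one-letter code. The cut positions are the commonly cited ones for each protease (after Arg in LVPR for thrombin, between Gln and Gly in LEVLFQGP for HRV 3C, after Lys in DDDDK for enterokinase); the names `CLEAVAGE_RULES` and `released_fragment` are illustrative. The sketch only locates the site in the sequence and says nothing about whether the site is accessible to the enzyme in the folded protein.

```python
# Mature IGF-I as given in SEQ ID NO.6 (one-letter code, 70 residues).
IGF1 = ("GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECC"
        "FRSCDLRRLEMYCAPLKPAKSA")

# C-terminal portions of the fusion proteins, from the linker region onward.
FUSION_TAILS = {
    "thrombin": "HHHHHHSSGLVPR" + IGF1,      # tail of SEQ ID NO.1
    "HRV 3C": "HHHHHHSAALEVLFQ" + IGF1,      # tail of SEQ ID NO.3
    "enterokinase": "DLGTDDDDK" + IGF1,      # tail of SEQ ID NO.2
}

# Recognition motif and the number of motif residues left on the N-terminal side.
CLEAVAGE_RULES = {
    "thrombin": ("LVPRG", 4),        # LVPR | G
    "HRV 3C": ("LEVLFQGP", 6),       # LEVLFQ | GP
    "enterokinase": ("DDDDK", 5),    # DDDDK |
}


def released_fragment(protease, fusion):
    """Return the C-terminal fragment expected after a single cut at the motif."""
    motif, offset = CLEAVAGE_RULES[protease]
    start = fusion.find(motif)
    if start < 0:
        raise ValueError(f"no {protease} site found")
    return fusion[start + offset:]


for name, tail in FUSION_TAILS.items():
    fragment = released_fragment(name, tail)
    print(name, len(fragment), fragment == IGF1)
```

In all three designs the predicted C-terminal fragment is the 70-residue mature peptide beginning with Gly-Pro, so the difference between the systems lies in how well each protease actually cuts, which is what the digestion experiments test.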

In the thrombin cleavage system, 1 U of thrombin (Solarbio, cat. no. T8021) was used to cleave 100 μg of fusion protein. The final cleavage buffer was 20 mM Tris-HCl (pH 8.0), 0.15 M NaCl and 0/2.5/10/20 mM CaCl₂, and digestion was carried out in a shaker at 37 ℃ for 24 h. The cleavage results are shown in FIG. 11, where M is the protein marker, 1 is without CaCl₂, 2 is 2.5 mM CaCl₂, 3 is 10 mM CaCl₂ and 4 is 20 mM CaCl₂. As shown in the figure, cleavage was best at a final CaCl₂ concentration of 10 mM, where thrombin cleaved the fusion protein almost completely; a further increase in CaCl₂ concentration affected the enzyme activity and reduced the cleavage efficiency.

In the HRV 3C enzyme cleavage system, 1 U or 10 U of HRV 3C enzyme (Takara, cat. no. 7360) was used to cleave 100 μg of fusion protein. The cleavage buffer was 50 mM Tris-HCl (pH 7.5) and 0.15 M NaCl, and digestion was carried out at 4 ℃ for 24 h. The Western blot (WB) results for the digestion products, detected with the IGF-I monoclonal antibody, are shown in FIG. 12, where M is the protein marker, 1 is with 1 U of enzyme and 2 is with 10 U of enzyme. As can be seen, 1 U of enzyme cleaved the fusion protein to give a small amount of mature IGF-I peptide; cleavage was better with 10 U of enzyme, which gave a large amount of mature IGF-I peptide.

Comparative Example 1 Preparation and enzyme digestion of the enterokinase fusion protein

1.1 obtaining the Gene sequence of the fusion protein

The gene sequence of the enterokinase fusion protein is shown in SEQ ID No.10. Its 5' end carries a KpnI restriction site (GGTACC), and its 3' end carries two consecutive stop codons (TAATGA) followed by a HindIII recognition site (AAGCTT):

GGTACCGACGACGACGACAAGGGTCCGGAGACCCTGTGCGGTGCGGAACTGGTGGACGCGCTGCAATTTGTTTGCGGTGATCGTGGCTTCTACTTTAACAAGCCGACCGGTTATGGTAGCAGCAGCCGTCGTGCGCCGCAGACCGGTATCGTTGACGAGTGCTGCTTCCGTAGCTGCGATCTGCGTCGTCTGGAAATGTATTGCGCGCCGCTGAAGCCGGCGAAAAGCGCGTAATGAAAGCTT
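The features described for this sequence can be checked with a short Python sketch that translates SEQ ID No.10 using the standard genetic code. The sequence is copied from the sequence listing in its 10-nucleotide groups; reading in frame from the first base of the KpnI site is an assumption made here for illustration, and the helper name `translate` is hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# SEQ ID No.10 as printed in the sequence listing (243 nucleotides).
SEQ_ID_10 = (
    "ggtaccgacg acgacgacaa gggtccggag accctgtgcg gtgcggaact ggtggacgcg"
    "ctgcaatttg tttgcggtga tcgtggcttc tactttaaca agccgaccgg ttatggtagc"
    "agcagccgtc gtgcgccgca gaccggtatc gttgacgagt gctgcttccg tagctgcgat"
    "ctgcgtcgtc tggaaatgta ttgcgcgccg ctgaagccgg cgaaaagcgc gtaatgaaag"
    "ctt"
).replace(" ", "").upper()

# Mature IGF-I from SEQ ID NO.6, one-letter code.
IGF1 = ("GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECC"
        "FRSCDLRRLEMYCAPLKPAKSA")


def translate(dna):
    """Translate complete codons of an upper-case DNA string, frame 0."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


has_kpni = SEQ_ID_10.startswith("GGTACC")            # KpnI site at the 5' end
has_stop_hindiii = SEQ_ID_10.endswith("TAATGAAAGCTT")  # stops + HindIII at the 3' end
protein = translate(SEQ_ID_10[:-6])                  # drop the HindIII site

print(len(SEQ_ID_10), has_kpni, has_stop_hindiii)
print(protein)
```

Read this way, the KpnI site encodes Gly-Thr, followed by the enterokinase recognition sequence Asp-Asp-Asp-Asp-Lys, the mature IGF-I peptide and the two stop codons, which agrees with the corresponding region of SEQ ID NO.2.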

1.2 construction of recombinant expression vector of enterokinase fusion protein and fusion expression engineering bacterium

The artificially synthesized cDNA sequence of the enterokinase fusion protein (SEQ ID No.10) was double-digested with KpnI and HindIII and ligated with T4 DNA ligase into the pET-32a (+) expression vector that had been double-digested in the same way, giving a recombinant expression vector designated pET-32a-Enterokinase-IGF-I. The vector was transformed into the Escherichia coli cloning strain Top10, and recombinants were cultured and screened at 37 ℃ on LB medium containing 50 μg/ml ampicillin. After the recombinants were confirmed by PCR and restriction digestion, their sequences were verified by sequencing.

The correctly identified recombinant plasmid was extracted and transformed into competent cells of the Escherichia coli expression strain Origami B (DE3). Recombinants were cultured and screened at 37 ℃ on LB medium containing 50 μg/ml ampicillin (Amp), 15 μg/ml kanamycin (Kan) and 12.5 μg/ml tetracycline (Tet). The resulting recombinant is the engineered strain expressing IGF-I as a Trx fusion and is named Ek-IGF. It was preserved in 15% glycerol and stored frozen at -80 ℃. The fusion protein expressed by the Ek-IGF engineered bacteria should contain a thioredoxin (Trx) tag, a His-tag purification tag, an enterokinase cleavage site and the mature IGF-I peptide sequence, with a theoretical molecular weight of about 24.7 kDa.

Frozen Ek-IGF engineered bacteria from a glycerol tube were inoculated at 0.2% into a test tube containing fresh antibiotic-supplemented LB medium (50 μg/ml Amp, 15 μg/ml Kan and 12.5 μg/ml Tet) and shake-cultured overnight to activate the strain. The culture was transferred into 300 mL of LB medium and shake-cultured at 37 ℃ for about 3 h; when the OD₆₀₀ reached 0.6-0.8, IPTG was added to a final concentration of 0.5 mM, expression was induced at 25 ℃ for 24 h, and the cells were collected by centrifugation. The wet cell weight was measured, lysis buffer (50 mM Tris-HCl, 0.5 M NaCl, pH 8.0) was added at a cell-to-buffer weight ratio of 1:15, and the cells were disrupted by sonication. The lysate supernatant and pellet were then analyzed by Western blot with the IGF-I monoclonal antibody. The results are shown in FIG. 13, where M is the protein marker; 1 is the lysate supernatant; and 2 is the lysate pellet. The results show that the fusion protein in the lysate supernatant has IGF-I immunoreactivity and an apparent size between 26 and 34 kDa.

1.3 purification of enterokinase fusion proteins

The enterokinase fusion protein was expressed using the Ek-IGF engineered bacteria constructed in Comparative Example 1.2. The fermentation broth was centrifuged to collect the cells, the cells were disrupted by sonication, and the supernatant was collected and subjected to affinity chromatography on a Ni-Sepharose 6FF column equilibrated with 20 mM Tris-HCl (pH 7.8)-0.5 M NaCl. The target protein was eluted with a 50 mM → 500 mM imidazole gradient and collected. To reduce the effect of high salt concentration on protease activity, excess salt was removed with a G-15 desalting column.

1.4 enzyme cleavage of enterokinase fusion proteins

In the enterokinase cleavage system, different amounts of recombinant bovine enterokinase (product of Yaohai Biological, cat. no. ez00001) were used to cleave 1 mg of the enterokinase fusion protein. The final cleavage buffer was 20 mM Tris-HCl (pH 8.0), 50 mM NaCl and 2 mM CaCl₂, and digestion was carried out at 16 ℃ for 24 h. The SDS-PAGE results for the digestion products are shown in FIG. 14, where M is the protein marker, 1 is 1 IU of enzyme, 2 is 5 IU of enzyme, 3 is 10 IU of enzyme, 4 is 50 IU of enzyme, 5 is 100 IU of enzyme, 6 is the positive control fusion protein digested with enterokinase, and 7 is the positive control fusion protein without enzyme. As can be seen, enterokinase cleaved the positive control protein but could not cleave the enterokinase fusion protein.

The above embodiments are merely preferred embodiments given to fully illustrate the present invention, and the scope of protection of the present invention is not limited thereto. Equivalent substitutions or modifications made by those skilled in the art on the basis of the present invention all fall within the scope of protection of the present invention. The scope of protection of the present invention is defined by the claims.

Sequence listing

<110> Wuhan Haite biopharmaceuticals GmbH

<120> method for screening enzyme digestion adaptive fusion protein and IGF-I preparation method

<130> WH2008252-1

<160> 10

<170> SIPOSequenceListing 1.0

<210> 1

<211> 199

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 1

Met Ser Asp Lys Ile Ile His Leu Thr Asp Asp Ser Phe Asp Thr Asp

1 5 10 15

Val Leu Lys Ala Asp Gly Ala Ile Leu Val Asp Phe Trp Ala Glu Trp

20 25 30

Cys Gly Pro Cys Lys Met Ile Ala Pro Ile Leu Asp Glu Ile Ala Asp

35 40 45

Glu Tyr Gln Gly Lys Leu Thr Val Ala Lys Leu Asn Ile Asp Gln Asn

50 55 60

Pro Gly Thr Ala Pro Lys Tyr Gly Ile Arg Gly Ile Pro Thr Leu Leu

65 70 75 80

Leu Phe Lys Asn Gly Glu Val Ala Ala Thr Lys Val Gly Ala Leu Ser

85 90 95

Lys Gly Gln Leu Lys Glu Phe Leu Asp Ala Asn Leu Ala Gly Ser Gly

100 105 110

Ser Gly His Met His His His His His His Ser Ser Gly Leu Val Pro

115 120 125

Arg Gly Pro Glu Thr Leu Cys Gly Ala Glu Leu Val Asp Ala Leu Gln

130 135 140

Phe Val Cys Gly Asp Arg Gly Phe Tyr Phe Asn Lys Pro Thr Gly Tyr

145 150 155 160

Gly Ser Ser Ser Arg Arg Ala Pro Gln Thr Gly Ile Val Asp Glu Cys

165 170 175

Cys Phe Arg Ser Cys Asp Leu Arg Arg Leu Glu Met Tyr Cys Ala Pro

180 185 190

Leu Lys Pro Ala Lys Ser Ala

195

<210> 2

<211> 228

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 2

Met Ser Asp Lys Ile Ile His Leu Thr Asp Asp Ser Phe Asp Thr Asp

1 5 10 15

Val Leu Lys Ala Asp Gly Ala Ile Leu Val Asp Phe Trp Ala Glu Trp

20 25 30

Cys Gly Pro Cys Lys Met Ile Ala Pro Ile Leu Asp Glu Ile Ala Asp

35 40 45

Glu Tyr Gln Gly Lys Leu Thr Val Ala Lys Leu Asn Ile Asp Gln Asn

50 55 60

Pro Gly Thr Ala Pro Lys Tyr Gly Ile Arg Gly Ile Pro Thr Leu Leu

65 70 75 80

Leu Phe Lys Asn Gly Glu Val Ala Ala Thr Lys Val Gly Ala Leu Ser

85 90 95

Lys Gly Gln Leu Lys Glu Phe Leu Asp Ala Asn Leu Ala Gly Ser Gly

100 105 110

Ser Gly His Met His His His His His His Ser Ser Gly Leu Val Pro

115 120 125

Arg Gly Ser Gly Met Lys Glu Thr Ala Ala Ala Lys Phe Glu Arg Gln

130 135 140

His Met Asp Ser Pro Asp Leu Gly Thr Asp Asp Asp Asp Lys Gly Pro

145 150 155 160

Glu Thr Leu Cys Gly Ala Glu Leu Val Asp Ala Leu Gln Phe Val Cys

165 170 175

Gly Asp Arg Gly Phe Tyr Phe Asn Lys Pro Thr Gly Tyr Gly Ser Ser

180 185 190

Ser Arg Arg Ala Pro Gln Thr Gly Ile Val Asp Glu Cys Cys Phe Arg

195 200 205

Ser Cys Asp Leu Arg Arg Leu Glu Met Tyr Cys Ala Pro Leu Lys Pro

210 215 220

Ala Lys Ser Ala

225

<210> 3

<211> 221

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 3

Met Ser Asp Lys Ile Ile His Leu Thr Asp Asp Ser Phe Asp Thr Asp

1 5 10 15

Val Leu Lys Ala Asp Gly Ala Ile Leu Val Asp Phe Trp Ala Glu Trp

20 25 30

Cys Gly Pro Cys Lys Met Ile Ala Pro Ile Leu Asp Glu Ile Ala Asp

35 40 45

Glu Tyr Gln Gly Lys Leu Thr Val Ala Lys Leu Asn Ile Asp Gln Asn

50 55 60

Pro Gly Thr Ala Pro Lys Tyr Gly Ile Arg Gly Ile Pro Thr Leu Leu

65 70 75 80

Leu Phe Lys Asn Gly Glu Val Ala Ala Thr Lys Val Gly Ala Leu Ser

85 90 95

Lys Gly Gln Leu Lys Glu Phe Leu Asp Ala Asn Leu Ala Gly Ser Gly

100 105 110

Ser Gly His Thr Ser Gly Gly Gly Gly Ser Asn Asn Asn Pro Pro Thr

115 120 125

Pro Thr Pro Ser Ser Gly Ser Gly His His His His His His Ser Ala

130 135 140

Ala Leu Glu Val Leu Phe Gln Gly Pro Glu Thr Leu Cys Gly Ala Glu

145 150 155 160

Leu Val Asp Ala Leu Gln Phe Val Cys Gly Asp Arg Gly Phe Tyr Phe

165 170 175

Asn Lys Pro Thr Gly Tyr Gly Ser Ser Ser Arg Arg Ala Pro Gln Thr

180 185 190

Gly Ile Val Asp Glu Cys Cys Phe Arg Ser Cys Asp Leu Arg Arg Leu

195 200 205

Glu Met Tyr Cys Ala Pro Leu Lys Pro Ala Lys Ser Ala

210 215 220

<210> 4

<211> 271

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 4

tggccatatg caccatcatc atcatcattc ttctggtctg gtgccacgcg gtccggagac 60

cctgtgcggt gcggaactgg tggacgcgct gcaatttgtt tgcggtgatc gtggcttcta 120

ctttaacaag ccgaccggtt atggtagcag cagccgtcgt gcgccgcaga ccggtatcgt 180

tgacgagtgc tgcttccgta gctgcgatct gcgtcgtctg gaaatgtatt gcgcgccgct 240

gaagccggcg aaaagcgcgt aatgagatat c 271

<210> 5

<211> 245

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 5

ccgcggctct ggaagtgctg tttcaaggtc cggagaccct gtgcggtgcg gaactggtgg 60

acgcgctgca atttgtttgc ggtgatcgtg gcttctactt taacaagccg accggttatg 120

gtagcagcag ccgtcgtgcg ccgcagaccg gtatcgttga cgagtgctgc ttccgtagct 180

gcgatctgcg tcgtctggaa atgtattgcg cgccgctgaa gccggcgaaa agcgcgtaaa 240

agctt 245

<210> 6

<211> 70

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 6

Gly Pro Glu Thr Leu Cys Gly Ala Glu Leu Val Asp Ala Leu Gln Phe

1 5 10 15

Val Cys Gly Asp Arg Gly Phe Tyr Phe Asn Lys Pro Thr Gly Tyr Gly

20 25 30

Ser Ser Ser Arg Arg Ala Pro Gln Thr Gly Ile Val Asp Glu Cys Cys

35 40 45

Phe Arg Ser Cys Asp Leu Arg Arg Leu Glu Met Tyr Cys Ala Pro Leu

50 55 60

Lys Pro Ala Lys Ser Ala

65 70

<210> 7

<211> 295

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 7

Thr Phe Gly Ala Gly Glu Ala Asp Cys Gly Leu Arg Pro Leu Phe Glu

1 5 10 15

Lys Lys Gln Val Gln Asp Gln Thr Glu Lys Glu Leu Phe Glu Ser Tyr

20 25 30

Ile Glu Gly Arg Ile Val Glu Gly Gln Asp Ala Glu Val Gly Leu Ser

35 40 45

Pro Trp Gln Val Met Leu Phe Arg Lys Ser Pro Gln Glu Leu Leu Cys

50 55 60

Gly Ala Ser Leu Ile Ser Asp Arg Trp Val Leu Thr Ala Ala His Cys

65 70 75 80

Leu Leu Tyr Pro Pro Trp Asp Lys Asn Phe Thr Val Asp Asp Leu Leu

85 90 95

Val Arg Ile Gly Lys His Ser Arg Thr Arg Tyr Glu Arg Lys Val Glu

100 105 110

Lys Ile Ser Met Leu Asp Lys Ile Tyr Ile His Pro Arg Tyr Asn Trp

115 120 125

Lys Glu Asn Leu Asp Arg Asp Ile Ala Leu Leu Lys Leu Lys Arg Pro

130 135 140

Ile Glu Leu Ser Asp Tyr Ile His Pro Val Cys Leu Pro Asp Lys Gln

145 150 155 160

Thr Ala Ala Lys Leu Leu His Ala Gly Phe Lys Gly Arg Val Thr Gly

165 170 175

Trp Gly Asn Arg Arg Glu Thr Trp Thr Thr Ser Val Ala Glu Val Gln

180 185 190

Pro Ser Val Leu Gln Val Val Asn Leu Pro Leu Val Glu Arg Pro Val

195 200 205

Cys Lys Ala Ser Thr Arg Ile Arg Ile Thr Asp Asn Met Phe Cys Ala

210 215 220

Gly Tyr Lys Pro Gly Glu Gly Lys Arg Gly Asp Ala Cys Glu Gly Asp

225 230 235 240

Ser Gly Gly Pro Phe Val Met Lys Ser Pro Tyr Asn Asn Arg Trp Tyr

245 250 255

Gln Met Gly Ile Val Ser Trp Gly Glu Gly Cys Asp Arg Asp Gly Lys

260 265 270

Tyr Gly Phe Tyr Thr His Val Phe Arg Leu Lys Lys Trp Ile Gln Lys

275 280 285

Val Ile Asp Arg Leu Gly Ser

290 295

<210> 8

<211> 248

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 8

Cys Gly Lys Lys Leu Val Thr Gln Glu Val Ser Pro Lys Ile Val Gly

1 5 10 15

Gly Ser Asp Ser Arg Glu Gly Ala Trp Pro Trp Val Val Ala Leu Tyr

20 25 30

Phe Asp Asp Gln Gln Val Cys Gly Ala Ser Leu Val Ser Arg Asp Trp

35 40 45

Leu Val Ser Ala Ala His Cys Val Tyr Gly Arg Asn Met Glu Pro Ser

50 55 60

Lys Trp Lys Ala Val Leu Gly Leu His Met Ala Ser Asn Leu Thr Ser

65 70 75 80

Pro Gln Ile Glu Thr Arg Leu Ile Asp Gln Ile Val Ile Asn Pro His

85 90 95

Tyr Asn Lys Arg Arg Lys Asn Asn Asp Ile Ala Met Met His Leu Glu

100 105 110

Met Lys Val Asn Tyr Thr Asp Tyr Ile Gln Pro Ile Cys Leu Pro Glu

115 120 125

Glu Asn Gln Val Phe Pro Pro Gly Arg Ile Cys Ser Ile Ala Gly Trp

130 135 140

Gly Ala Leu Ile Tyr Gln Gly Ser Thr Ala Asp Val Leu Gln Glu Ala

145 150 155 160

Asp Val Pro Leu Leu Ser Asn Glu Lys Cys Gln Gln Gln Met Pro Glu

165 170 175

Tyr Asn Ile Thr Glu Asn Met Val Cys Ala Gly Tyr Glu Ala Gly Gly

180 185 190

Val Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Leu Met Cys Gln Glu

195 200 205

Asn Asn Arg Trp Leu Leu Ala Gly Val Thr Ser Phe Gly Tyr Gln Cys

210 215 220

Ala Leu Pro Asn Arg Pro Gly Val Tyr Ala Arg Val Pro Arg Phe Thr

225 230 235 240

Glu Trp Ile Gln Ser Phe Leu His

245

<210> 9

<211> 182

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 9

Gly Pro Asn Thr Glu Phe Ala Leu Ser Leu Leu Arg Lys Asn Ile Met

1 5 10 15

Thr Ile Thr Thr Ser Lys Gly Glu Phe Thr Gly Leu Gly Ile His Asp

20 25 30

Arg Val Cys Val Ile Pro Thr His Ala Gln Pro Gly Asp Asp Val Leu

35 40 45

Val Asn Gly Gln Lys Ile Arg Val Lys Asp Lys Tyr Lys Leu Val Asp

50 55 60

Pro Glu Asn Ile Asn Leu Glu Leu Thr Val Leu Thr Leu Asp Arg Asn

65 70 75 80

Glu Lys Phe Arg Asp Ile Arg Gly Phe Ile Ser Glu Asp Leu Glu Gly

85 90 95

Val Asp Ala Thr Leu Val Val His Ser Asn Asn Phe Thr Asn Thr Ile

100 105 110

Leu Glu Val Gly Pro Val Thr Met Ala Gly Leu Ile Asn Leu Ser Ser

115 120 125

Thr Pro Thr Asn Arg Met Ile Arg Tyr Asp Tyr Ala Thr Lys Thr Gly

130 135 140

Gln Cys Gly Gly Val Leu Cys Ala Thr Gly Lys Ile Phe Gly Ile His

145 150 155 160

Val Gly Gly Asn Gly Arg Gln Gly Phe Ser Ala Gln Leu Lys Lys Gln

165 170 175

Tyr Phe Val Glu Lys Gln

180

<210> 10

<211> 243

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 10

ggtaccgacg acgacgacaa gggtccggag accctgtgcg gtgcggaact ggtggacgcg 60

ctgcaatttg tttgcggtga tcgtggcttc tactttaaca agccgaccgg ttatggtagc 120

agcagccgtc gtgcgccgca gaccggtatc gttgacgagt gctgcttccg tagctgcgat 180

ctgcgtcgtc tggaaatgta ttgcgcgccg ctgaagccgg cgaaaagcgc gtaatgaaag 240

ctt 243

Claims

1. A method for screening enzyme digestion adaptive fusion proteins, comprising the steps of:

2. The method of claim 1, further comprising the steps of:

(4) performing codon optimization on the gene encoding the most suitable fusion protein obtained in step (3).

3. The method of claim 1, wherein the target protein is human insulin-like growth factor-I.

4. The method according to claim 3, wherein the protease is enterokinase, thrombin or HRV 3C protease, and when each protease is docked with the predicted three-dimensional structure of its corresponding fusion protein, the docking sites are selected as follows:

5. The method of claim 3, wherein the threshold value in step (3) is 50%.

6. A method for preparing recombinant human insulin-like growth factor-I, characterized in that a fusion protein having the amino acid sequence shown in SEQ ID NO.1 is expressed in a prokaryotic host using the pET-32a (+) vector, and the expressed fusion protein is digested with thrombin.

7. The method according to claim 6, wherein the gene encoding the fusion protein of SEQ ID No.1 is:

(1) a gene sequence shown as SEQ ID NO. 4; or

8. A method for preparing recombinant human insulin-like growth factor-I, characterized in that a fusion protein having the amino acid sequence shown in SEQ ID NO.3 is expressed in a prokaryotic host using the pET-48b (+) vector, and the expressed fusion protein is digested with HRV 3C enzyme.

9. The method according to claim 8, wherein the coding gene corresponding to the fusion protein of SEQ ID No.3 is:

(1) a gene sequence shown as SEQ ID NO. 5;

10. The process according to any one of claims 6 to 9, wherein the prokaryotic host is a strain of BL21(DE3) E.coli, a strain of Rosetta-gami B (DE3) E.coli, a strain of Origami B (DE3) E.coli or a strain of Rosetta-gami2(DE3) E.coli.