CN112111504A - Method for screening enzyme digestion adaptive fusion protein and IGF-I preparation method - Google Patents

Info

Publication number
CN112111504A
Authority
CN
China
Prior art keywords
fusion protein
gly
leu
protease
seq
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010952359.4A
Other languages
Chinese (zh)
Inventor
汤华东
李冠霖
柳余莉
欧阳聪
杨薇
高思偲
尹领鹏
谢倩
杨琴霞
张盼盼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Hiteck Biological Pharma Co ltd
Original Assignee
Wuhan Hiteck Biological Pharma Co ltd

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/435: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • C07K14/475: Growth factors; Growth regulators
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/70: Vectors or expression systems specially adapted for E. coli
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K2319/00: Fusion polypeptide
    • C07K2319/50: Fusion polypeptide containing protease site


Abstract

The invention discloses a method for screening enzyme digestion adaptive fusion proteins and a method for preparing IGF-I. In the screening method, the gene encoding the target protein is inserted into several different expression vectors, each containing a protease cleavage site, to obtain the amino acid sequences of several different fusion proteins; the secondary structure of each fusion protein sequence is predicted; the amino acid sequence of each fusion protein and its secondary structure prediction are used as input files to predict a three-dimensional structure; and the predicted three-dimensional structure is docked against the protease corresponding to that fusion protein using the ClusPro algorithm. Cleavage systems in which the proportion of docking results at the correct site exceeds a threshold are selected for experimental verification, and the fusion protein best suited to its protease is screened out. Soluble expression systems for IGF-I that can be cleaved by thrombin and by HRV 3C protease were screened by this method; they offer stable, soluble, high-activity and high-yield expression together with convenient enzymatic cleavage.

Description

Method for screening enzyme digestion adaptive fusion protein and IGF-I preparation method
Technical Field
The invention relates to the technical field of recombinant protein production, in particular to a method for screening enzyme digestion adaptive fusion protein, which saves manpower, material resources and time, and a preparation method of IGF-I.
Background
Insulin-like growth factor-I (IGF-I) is a multifunctional regulatory factor that plays an important role in the growth, development and proliferation of cells. Human IGF-I is currently used clinically, with good results, to treat diabetes, insulin resistance syndrome, dwarfism, nervous system diseases and other conditions. When IGF-I is expressed directly in a prokaryotic expression system, the target protein usually forms inclusion bodies. To improve solubility and simplify purification, a solubility-enhancing tag and a purification tag are added for fusion expression; to recover the target protein, a protease cleavage site sequence is usually inserted between the target protein and these tags, so that the tags can be removed with the corresponding protease to give the final target protein.
Commonly used proteases include enterokinase, thrombin, Factor Xa and HRV 3C protease (human rhinovirus 3C protease). Each protease places certain restrictions on the amino acid sequence before and after its cleavage site (for example, the optimal cleavage site sequence of thrombin is LVPR↓GS, where ↓ indicates the position at which the protease cleaves the polypeptide chain; the same applies below), and protein drugs must not carry extra amino acids ahead of the mature peptide. The protease for a fusion protein must therefore be chosen so that the target protein obtained after cleavage has a native N-terminus and/or C-terminus with no additional amino acid residues. In actual production, several fusion proteins containing different protease cleavage site sequences usually have to be expressed and purified, and the best expression system is chosen by comparing the cleavage efficiencies of the different proteases. Two problems may arise in this process: (1) the cleavage site is buried inside the fusion protein, so the protease cannot cut it; (2) the fusion protein contains several cleavage sites for the protease, so the protease cuts the fusion protein non-specifically and may even cut the target protein. Either case wastes labor and materials, lengthens the development cycle and may even terminate the project.
Therefore, a rapid and efficient method is urgently needed to screen out a soluble expression system which can be successfully digested by enzyme and successfully prepare the recombinant human insulin-like growth factor-I so as to facilitate large-scale production.
Disclosure of Invention
In view of the above drawbacks of, or needs for improvement in, the prior art, one object of the present invention is to provide a method for screening fusion proteins adapted to protease cleavage. The method uses virtual screening to efficiently identify fusion proteins that a given protease can cleave, thereby addressing the technical risk that cleavage could previously be verified only through a large number of experiments.
The invention also aims to provide a preparation method of the insulin-like growth factor-I, which obtains the insulin-like growth factor-I fusion protein with good enzyme digestion effect by the virtual screening method, realizes soluble expression and successful enzyme digestion of the fusion protein, and solves the technical problem that the recombinant human insulin-like growth factor-I is difficult to stably, soluble and highly actively express in an escherichia coli expression system.
The technical scheme of the invention is detailed as follows:
A method for screening enzyme digestion adaptive fusion proteins, comprising the following steps:
(1) respectively inserting target protein sequence coding genes into a plurality of different expression vectors containing protease enzyme cutting sites to obtain amino acid sequences of a plurality of different fusion proteins; predicting the secondary structure of the amino acid sequence of the obtained fusion protein by using a secondary structure prediction algorithm PSIPRED;
(2) taking the amino acid sequences of the different fusion proteins in the step (1) and the corresponding secondary structure prediction results thereof as input files, and predicting the three-dimensional structure by a structure prediction algorithm I-TASSER;
(3) performing molecular docking between the three-dimensional structure prediction result obtained in step (2) and the protease corresponding to the fusion protein using the ClusPro algorithm, selecting the cleavage systems in which the proportion of correct docking results exceeds a threshold for experimental verification, and screening out the fusion protein best suited to its protease.
In the screening method, the candidate proteases can be listed one by one, and for each protease a fusion protein sequence containing the target protein and the cleavage site of that protease is obtained, so that the number of fusion proteins equals the number of proteases. The secondary structure prediction algorithm PSIPRED, the tertiary structure prediction algorithm I-TASSER and the molecular docking algorithm ClusPro used in the subsequent steps are all known technologies, and a person skilled in the art can operate them by following the instructions for each algorithm.
Preferably, the method further comprises the following steps:
(4) performing codon optimization on the gene encoding the best-suited fusion protein obtained in step (3). Optimization can increase the expression level.
Preferably, in the above method, the target protein is human insulin-like growth factor-I.
Preferably, in the above method, the protease is enterokinase, thrombin or HRV 3C protease, and when the predicted three-dimensional structures of the fusion proteins corresponding to the different proteases are docked, the docking sites are selected as follows:
for the fusion protein corresponding to enterokinase, all lysines (Lys) in the amino acid sequence, each with the 4 amino acids on either side of it, are selected as candidate cleavage sites;
for the fusion protein corresponding to thrombin, arginine (Arg) and the 4 amino acids on either side of it are selected as candidate cleavage sites;
for the fusion protein corresponding to HRV 3C protease, glutamine (Gln) and the 4 amino acids on either side of it are selected as candidate cleavage sites.
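The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: positions are 1-based and inclusive, windows are clipped at the ends of the sequence, and overlapping windows are merged into one region. The function name `candidate_windows` and the `TARGET_RESIDUE` table are hypothetical helpers, not part of any of the tools named in this specification.

```python
from typing import List, Tuple

# One-letter code of the residue each protease window is centred on,
# following the selection rule in the text (Lys, Arg, Gln).
TARGET_RESIDUE = {"enterokinase": "K", "thrombin": "R", "HRV 3C": "Q"}


def candidate_windows(sequence: str, residue: str, flank: int = 4) -> List[Tuple[int, int]]:
    """Return merged, 1-based inclusive windows of `flank` residues
    on each side of every occurrence of `residue`."""
    windows: List[Tuple[int, int]] = []
    n = len(sequence)
    for pos, aa in enumerate(sequence, start=1):
        if aa != residue:
            continue
        start, end = max(1, pos - flank), min(n, pos + flank)
        if windows and start <= windows[-1][1]:
            # Assumption: overlapping windows are merged into one region.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


# First 60 residues shared by the fusion proteins, used here only as a short demo input.
example = "MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLN"
print(candidate_windows(example, TARGET_RESIDUE["enterokinase"]))
```

On this truncated input the first three windows are 1-8, 15-23 and 33-41; the last window is clipped at residue 60 because the demo sequence ends there. The regions actually used for docking may have been curated further, so the output of such a sketch need not coincide with every range listed in the specification.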
Preferably, in the above method, the threshold value in step (3) is 50%.
A method for preparing recombinant human insulin-like growth factor-I, in which a fusion protein with the amino acid sequence shown in SEQ ID NO.1 is expressed in a prokaryotic host using the pET-32a (+) vector, and the expressed fusion protein is cleaved with thrombin. The gene encoding the fusion protein is cloned into the pET-32a (+) vector between the MscI and EcoRV restriction sites.
Preferably, in the above IGF-I preparation method, the gene encoding the fusion protein shown in SEQ ID NO.1 is:
(1) the gene sequence shown in SEQ ID NO.4; or
(2) a gene sequence which has 90 to 100 percent homology with the gene sequence shown in SEQ ID NO.4 and encodes a protein with the same function; or
(3) a gene sequence which is derived from (1) by adding, deleting or replacing one or more codons and encodes a protein with the same activity.
A method for preparing recombinant human insulin-like growth factor-I, in which a fusion protein with the amino acid sequence shown in SEQ ID NO.3 is expressed in a prokaryotic host using the pET-48b (+) vector, and the expressed fusion protein is cleaved with HRV 3C protease. The gene encoding the fusion protein is cloned into the pET-48b (+) vector between the SacII and HindIII restriction sites.
Preferably, in the above IGF-I preparation method, the gene encoding the fusion protein shown in SEQ ID NO.3 is:
(1) the gene sequence shown in SEQ ID NO.5; or
(2) a gene sequence which has 90 to 100 percent homology with the gene sequence shown in SEQ ID NO.5 and encodes a protein with the same function; or
(3) a gene sequence which is derived from (1) by adding, deleting or replacing one or more codons and encodes a protein with the same activity.
Preferably, in any of the above-described preparation methods, the prokaryotic host is BL21(DE3) E.coli strain, Rosetta-gami B (DE3) E.coli strain, Origami B (DE3) E.coli strain or Rosetta-gami2(DE3) E.coli strain.
Compared with the prior art, the invention has the following beneficial effects:
The method for screening enzyme digestion adaptive fusion proteins provided by the invention is a virtual screening approach: through structure simulation and molecular docking it compares the cleavage behavior of several proteases in silico, so that a suitable fusion protein expression system and protease can be selected before any experiment is run. The experimental verification agreed with the predictions. Compared with the conventional, resource-intensive approach of testing each candidate experimentally one by one, the method greatly reduces cost, labor and time and is more convenient and efficient.
Using this screening method, the cleavage behavior of thrombin, enterokinase and HRV 3C protease was evaluated by virtual screening, and the experimental results were consistent with the predictions: thrombin and HRV 3C protease, with their corresponding fusion proteins, are better suited to preparing IGF-I. A soluble expression system that is suitable for recombinant human insulin-like growth factor-I and can be successfully cleaved was thereby obtained.
The preparation method of the recombinant human insulin-like growth factor-I fusion protein provided by the invention offers soluble expression and high expression levels. Recombinant human insulin-like growth factor-I can be obtained successfully by protease cleavage, which solves the technical problem that human insulin-like growth factor-I cannot be obtained directly.
Drawings
FIG. 1 is a PSIPRED online prediction server input interface;
FIG. 2 is an I-TASSER online prediction server input interface;
FIG. 3 is a Cluspro online prediction server input interface;
FIG. 4 is a map of E.coli expression vector pET-32a (+);
FIG. 5 is a map of the multiple cloning site region of E.coli expression vector pET-32a (+);
FIG. 6 is an SDS-PAGE analysis of the thrombin fusion protein;
FIG. 7 is a Western blot analysis of the thrombin fusion protein;
FIG. 8 is a map of E.coli expression vector pET-48b (+);
FIG. 9 is a map of the multiple cloning site region of E.coli expression vector pET-48b (+);
FIG. 10 is a Western blot analysis of the HRV 3C enzyme fusion protein;
FIG. 11 is an SDS-PAGE analysis of the thrombin fusion protein after digestion with thrombin;
FIG. 12 is a Western blot analysis of the HRV 3C enzyme fusion protein after digestion with HRV 3C enzyme;
FIG. 13 is a Western blot analysis of the enterokinase fusion protein;
FIG. 14 is an SDS-PAGE analysis of the enterokinase fusion protein after digestion with enterokinase.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Through earlier in-depth studies of protein structure and molecular modeling methods, the inventors succeeded in improving the thermal stability of proteins using various structure prediction algorithms (Li, et al. Appl Environ Microb, 2018, 84(2), e02129-17; Li, et al. RSC Adv, 2018, 8, 1948). In the present invention, a fusion protein suitable for preparing recombinant human insulin-like growth factor-I is screened virtually using a structure prediction algorithm and a molecular docking algorithm, and the corresponding protease is then used to cleave it successfully and prepare recombinant human insulin-like growth factor-I. With the technical scheme of the invention, stable, soluble and highly active expression of IGF-I in an Escherichia coli expression system can be achieved, which makes industrial production of IGF-I possible and benefits new drug development and clinical application.
With respect to the examples, it is noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless otherwise indicated, the technical terms used are terms commonly used by those of ordinary skill in the art; the experimental method without specific conditions noted is a conventional experimental method; the test materials used are commercially available products unless otherwise specified, and the ingredients and preparation methods of various reagents and media can be found in conventional laboratory manuals.
Example 1: obtaining amino acid sequences of fusion proteins
According to literature reports, direct expression of insulin-like growth factor-I (IGF-I) in Escherichia coli forms inclusion bodies (Rosano GL, et al. Front Microbiol, 2014, 5:172), and soluble expression can be achieved by fusing the protein to a solubility-enhancing protein tag. The most widely used Escherichia coli expression vectors at present are those of the pET system; a solubility-enhancing tag such as thioredoxin and a His-tag purification tag carried on the vector promote soluble expression and purification of the fusion protein.
So that no extra amino acid remains at the N-terminus of IGF-I after the fusion protein is cleaved by the protease, the candidate proteases are thrombin, Factor Xa, enterokinase, TEV protease and HRV 3C enzyme. Taking into account the cost of the enzymes and how difficult they are to obtain, simulated cleavage experiments were performed with thrombin, enterokinase and HRV 3C enzyme. Different pET expression vectors contain the cleavage sites of thrombin, enterokinase and HRV 3C enzyme, so the target protein can be obtained by protease cleavage. Finally, three fusion proteins were chosen for the subsequent experiments: pET-32a (+)-thrombin cleavage site-IGF-I, pET-32a (+)-enterokinase cleavage site-IGF-I and pET-48b (+)-HRV 3C cleavage site-IGF-I.
The amino acid sequence of IGF-I is shown in SEQ ID No. 6:
GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
The IGF-I amino acid sequence was placed immediately after the thrombin cleavage site sequence LVPR and after the enterokinase cleavage site sequence DDDDK of the pET-32a (+) vector, and after the HRV 3C cleavage site sequence LEVLFQ of the pET-48b (+) vector, giving the following fusion protein amino acid sequences.
Thrombin fusion protein amino acid sequence SEQ ID No.1:
MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHMHHHHHHSSGLVPRGPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
Enterokinase fusion protein amino acid sequence SEQ ID No.2:
MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHMHHHHHHSSGLVPRGSGMKETAAAKFERQHMDSPDLGTDDDDKGPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
HRV 3C enzyme fusion protein amino acid sequence SEQ ID No.3:
MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHTSGGGGSNNNPPTPTPSSGSGHHHHHHSAALEVLFQGPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
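The design requirement behind these constructs, namely that cutting directly after the cleavage site releases IGF-I with no extra N-terminal residue, can be checked with a short Python sketch. The function `cleavage_product` is a hypothetical helper written for this illustration, and the `TAILS` entries are only the C-terminal ends of the three fusion sequences (the upstream tags are omitted for brevity).

```python
# IGF-I sequence as given in SEQ ID No.6.
IGF1 = "GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA"

# Cleavage site sequences named in the text; cutting is assumed to occur
# directly after the last residue of each site.
SITES = {"thrombin": "LVPR", "enterokinase": "DDDDK", "HRV 3C": "LEVLFQ"}

# Abbreviated C-terminal ends of the three fusion proteins (tags omitted).
TAILS = {
    "thrombin": "HHHHHHSSGLVPR" + IGF1,
    "enterokinase": "DSPDLGTDDDDK" + IGF1,
    "HRV 3C": "HHHHHHSAALEVLFQ" + IGF1,
}


def cleavage_product(fusion: str, site: str, target: str) -> str:
    """Return the fragment released by cutting right after `site`
    where `site` is directly followed by `target`."""
    idx = fusion.find(site + target)
    if idx < 0:
        raise ValueError("cleavage site is not directly followed by the target")
    return fusion[idx + len(site):]


for name, site in SITES.items():
    product = cleavage_product(TAILS[name], site, IGF1)
    print(name, "releases native IGF-I:", product == IGF1)
```

The same check can be run on the full SEQ ID No.1 to No.3 sequences. It says nothing about whether the site is accessible to the protease, which is the question the docking steps are meant to address.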
Example 2: prediction of fusion protein secondary structure
Open the PSIPRED online prediction server (http://bioinf.cs.ucl.ac.uk/PSIPRED/); its page is shown in FIG. 1. Enter the amino acid sequence obtained in Example 1 in the "Protein Sequence" field and fill in the Job Name and Email address. After the task is submitted, the ss2 file containing the prediction result is sent to the specified mailbox.
The predicted secondary structure of the thrombin fusion protein from Example 1 is as follows, where C represents a random coil, S represents a β-sheet and H represents an α-helix:
CCCCCSSCCHHHHHHHHHHCCCCSSSSSSCCCCHHHHHHHHHHHHHHHHHCCCSSSSSSSCCCCCCHHHHHCCCCCCSSSSSSCCSSSSSSSCCCCHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHCCCCSSSCCCCCCCCCCCCCCCCCCCCCHHCCCCCSSSSSSSSCCCCCCCCC
The predicted secondary structure of the enterokinase fusion protein from Example 1 is as follows:
CCCCCSSCCCCCHHHHHHCCCCCSSSSSCCCCCHHHHHHHHHHHHHHHHHCCCSSSSSSSCCCCCCCCCCCCCCCCCSSSSSSCCSSSSSSSCCCCHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHCCCCSSSCCCCCCCCCCCCCCCCCCCCCHHCCCCCCCHHHHHCCCCCCCCCC
The predicted secondary structure of the HRV 3C enzyme fusion protein from Example 1 is as follows:
CCCCCSSCCCCCHHHHHHCCCCCSSSSSCCCCCHHHHHHHHHHHHHHHHHCCCSSSSSSSCCCCCCCCCCCCCCCCCSSSSSSCCSSSSSSSCCCCHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHCCCCCCHHHHHHHHHHHHHCCCCSSSCCCCCCCCCCCCCCCCCCCCCHHCCCCCCCHHHHHCCCCCCCCCC
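Per-residue strings like those above are easier to compare once they are collapsed into segments. The following Python sketch is an illustration only: it assumes the C/S/H alphabet defined in this example, and the helper names `ss_segments` and `ss_composition` are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def ss_segments(ss: str) -> List[Tuple[str, int, int]]:
    """Collapse a per-residue string into (state, start, end) runs,
    using 1-based inclusive positions."""
    segments = []
    pos = 1
    for state, run in groupby(ss):
        length = len(list(run))
        segments.append((state, pos, pos + length - 1))
        pos += length
    return segments


def ss_composition(ss: str) -> Dict[str, float]:
    """Fraction of residues in each state (C coil, S sheet, H helix)."""
    return {state: ss.count(state) / len(ss) for state in "CSH"}


# Short illustrative string in the same alphabet, not a full prediction.
demo = "CCCCCSSCCHHHHHHHHHHCCCC"
for state, start, end in ss_segments(demo):
    print(f"{state}: {start}-{end}")
print(ss_composition(demo))
```

Applied to the three predictions, such a summary makes it straightforward to see which state the residues around each cleavage site are predicted to adopt.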
Example 3: prediction of the three-dimensional structure of the fusion protein
Open the I-TASSER online prediction server (https://zhanglab.ccmb.med.umich.edu/I-TASSER/); the page is shown in FIG. 2. Enter the amino acid sequence of the fusion protein in the sequence box, then use the upload button under "Option III" (the option for specifying the secondary structure of specific residues) to upload the ss2 file obtained in Example 2. Fill in an email address and a task name, and submit the prediction task. When the task has finished, the predicted pdb file of the three-dimensional structure of the fusion protein is received.
Example 4: molecular docking of fusion proteins with proteases
Open the ClusPro online prediction server (https://cluspro.bu.edu/home.php); its page is shown in FIG. 3. Click the "Upload PDB" button in the "Receptor" column to upload the fusion protein PDB file predicted in Example 3. In the "Ligand" column, enter the PDB ID of the protease corresponding to the fusion protein: "1ETR" for thrombin, "1EKB" for enterokinase and "2IN2" for HRV 3C enzyme.
To speed up docking and improve its accuracy, in the "Attraction and Repulsion" options that follow, the "Attraction" field for the Receptor is filled with the candidate cleavage sites of the fusion protein, and the "Attraction" field for the Ligand is filled with the surface residues near the protease active site that may contact the fusion protein.
For the thrombin fusion protein, Arg and the 4 amino acids on either side of it are selected as candidate cleavage sites; for the enterokinase fusion protein, all Lys in the amino acid sequence, each with the 4 amino acids on either side of it, are selected; for the HRV 3C enzyme fusion protein, Gln and the 4 amino acids on either side of it are selected. Preferably, the docking regions of the thrombin fusion protein are positions 49-57, 66-78, 93-101, 125-133 and 146-154 of SEQ ID No.1; the docking regions of the enterokinase fusion protein are positions 1-8, 15-23, 33-41, 49-62, 66-75, 79-105, 130-144, 154-162, 181-189 and 219-228 of SEQ ID No.2; and the docking regions of the HRV 3C enzyme fusion protein are positions 47-55, 59-67, 95-103, 147-155, 162-170 and 187-195 of SEQ ID No.3.
The amino acid sequence of thrombin (PDB ID:1ETR) is shown in SEQ ID No. 7:
TFGAGEADCGLRPLFEKKQVQDQTEKELFESYIEGRIVEGQDAEVGLSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPWDKNFTVDDLLVRIGKHSRTRYERKVEKISMLDKIYIHPRYNWKENLDRDIALLKLKRPIELSDYIHPVCLPDKQTAAKLLHAGFKGRVTGWGNRRETWTTSVAEVQPSVLQVVNLPLVERPVCKASTRIRITDNMFCAGYKPGEGKRGDACEGDSGGPFVMKSPYNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDRLGS
The docking region comprises positions 51-53, 55-56, 58-65, 79-80, 83, 85-86, 88, 90, 101, 104, 105, 131, 132, 135, 175, 179, 182, 190, 192, 194, 214, 215, 224, 225, 234, 241, 261, 267, 269, 273 and 276.
The amino acid sequence of enterokinase (PDB ID:1EKB) is shown in SEQ ID No. 8:
CGKKLVTQEVSPKIVGGSDSREGAWPWVVALYFDDQQVCGASLVSRDWLVSAAHCVYGRNMEPSKWKAVLGLHMASNLTSPQIETRLIDQIVINPHYNKRRKNNDIAMMHLEMKVNYTDYIQPICLPEENQVFPPGRICSIAGWGALIYQGSTADVLQEADVPLLSNEKCQQQMPEYNITENMVCAGYEAGGVDSCQGDSGGPLMCQENNRWLLAGVTSFGYQCALPNRPGVYARVPRFTEWIQSFLH
The docking region comprises positions 33, 36-39, 54-55, 57-59, 97, 99-100, 102, 105, 195-221, 218-221 and 231-232.
The amino acid sequence of HRV 3C enzyme (PDB ID:2IN2) is shown in SEQ ID No. 9:
GPNTEFALSLLRKNIMTITTSKGEFTGLGIHDRVCVIPTHAQPGDDVLVNGQKIRVKDKYKLVDPENINLELTVLTLDRNEKFRDIRGFISEDLEGVDATLVVHSNNFTNTILEVGPVTMAGLINLSSTPTNRMIRYDYATKTGQCGGVLCATGKIFGIHVGGNGRQGFSAQLKKQYFVEKQ
The docking region comprises positions 26-30, 44-48, 64-68, 74, 76-78, 111-112, 131-137, 139, 147-152, 166-170 and 174.
After the docking files and docking sites have been entered, click the "Dock" button to run the docking prediction. ClusPro outputs the 20 most probable clusters of docked complex structures together with their pdb structure files, and reports the number of structures in each cluster. The structure files of the 20 clusters were inspected to determine the docking position of each complex.
The docking output for thrombin and its fusion protein is as follows, listed as cluster number, cleavage position and, in parentheses, the number of complex structures in the cluster (cluster 1 is the largest):
1) PR↓GP (220)
2) IR↓GI (161)
3) no definite docking position
4) PR↓GP (64)
5) no definite docking position
6) IR↓GI (49)
7) PR↓GP (46)
8) PR↓GP (43)
9) PR↓GP (29)
10) DR↓GF (26)
11) no definite docking position
12) no definite docking position
13) no definite docking position
14) DR↓GF (15)
15) IR↓GI (14)
16) PR↓GP (12)
17) PR↓GP (10)
18) no definite docking position
19) IR↓GI (4)
20) DR↓GF (1)
Counting the docking positions and the number of structures at each position shows that the thrombin active site docks at the correct cleavage position (PR↓GP) in 61.1% of cases, that is, 424 of the 694 structures in clusters with a definite docking position:
MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIR(32.8%)GIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHMHHHHHHSSGLVPR(61.1%)GPETLCGAELVDALQFVCGDR(6.1%)GFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
Similarly, the positions docked by enterokinase and their percentages are shown below; the proportion at the correct cleavage site (DK↓GP) is 14.7%:
MSDK(17.1%)IIHLTDDSFDTDVLKADGAILVDFWAEWCGPCK(1.8%)MIAPILDEIADEYQGKLTVAKLNIDQNPGTAPK(30.1%)YGIRGIPTLLLFKNGEVAATK(10.4%)VGALSK(2.3%)GQLKEFLDANLAGSGSGHMHHHHHHSSGLVPRGSGMK(14.4%)ETAAAK(3.0%)FERQHMDSPDLGTDDDDK(14.7%)GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLK(6.1%)PAKSA
The positions docked by HRV 3C enzyme and their percentages are shown below; the proportion at the correct cleavage site (FQ↓GP) is 56.6%:
MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGHTSGGGGSNNNPPTPTPSSGSGHHHHHHSAALEVLFQ(56.6%)GPETLCGAELVDALQ(43.4%)FVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA
A proportion above 50% at the correct cleavage site in the prediction is taken to mean that the protease can readily contact the substrate and cleave specifically at the correct site, so 50% was chosen as the threshold. Accordingly, the thrombin cleavage system and the HRV 3C cleavage system were selected as examples for the expression experiments, and the enterokinase cleavage system was selected as a comparative example.
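The tally and the threshold decision described in this example can be reproduced with a short Python sketch. The cluster sizes are those listed for thrombin above. Leaving out the clusters with no definite docking position (their sizes are not given) is an assumption, although it does reproduce the stated 61.1%. The names `site_fractions` and `THRESHOLD` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (docked cleavage position, number of structures) for the thrombin clusters
# that have a definite docking position.
THROMBIN_CLUSTERS: List[Tuple[str, int]] = [
    ("PR↓GP", 220), ("IR↓GI", 161), ("PR↓GP", 64), ("IR↓GI", 49),
    ("PR↓GP", 46), ("PR↓GP", 43), ("PR↓GP", 29), ("DR↓GF", 26),
    ("DR↓GF", 15), ("IR↓GI", 14), ("PR↓GP", 12), ("PR↓GP", 10),
    ("IR↓GI", 4), ("DR↓GF", 1),
]

THRESHOLD = 0.50  # threshold stated in the text


def site_fractions(clusters: List[Tuple[str, int]]) -> Dict[str, float]:
    """Fraction of docked structures found at each cleavage position."""
    counts: Dict[str, int] = defaultdict(int)
    for site, size in clusters:
        counts[site] += size
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


fractions = site_fractions(THROMBIN_CLUSTERS)

# Correct-site proportions: thrombin is computed here; the other two values
# are the percentages reported in the text.
correct = {
    "thrombin": fractions["PR↓GP"],
    "enterokinase": 0.147,
    "HRV 3C": 0.566,
}
selected = [name for name, frac in correct.items() if frac > THRESHOLD]
print({name: f"{frac:.1%}" for name, frac in correct.items()})
print("selected for experimental verification:", selected)
```

With these inputs thrombin and HRV 3C pass the 50% threshold and enterokinase does not, which matches the choice of examples and comparative example made here.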
Example 5: preparation and enzymatic cleavage of the fusion proteins
5.1 Obtaining the fusion protein gene sequences
To achieve large-scale, efficient expression of each fusion protein in Escherichia coli, the inventors designed suitable gene sequences for expression by referring to the codon bias of Escherichia coli and taking into account factors such as codon degeneracy, GC content and suitable restriction endonuclease sites.
Specifically, the thrombin fusion protein gene sequence is shown in SEQ ID No.4; its 5' end carries an MscI restriction endonuclease site (TGGCCA), and its 3' end carries two consecutive stop codons (TAATGA) followed by the EcoRV recognition site (GATATC):
TGGCCATATGCACCATCATCATCATCATTCTTCTGGTCTGGTGCCACGCGGTCCGGAGACCCTGTGCGGTGCGGAACTGGTGGACGCGCTGCAATTTGTTTGCGGTGATCGTGGCTTCTACTTTAACAAGCCGACCGGTTATGGTAGCAGCAGCCGTCGTGCGCCGCAGACCGGTATCGTTGACGAGTGCTGCTTCCGTAGCTGCGATCTGCGTCGTCTGGAAATGTATTGCGCGCCGCTGAAGCCGGCGAAAAGCGCGTAATGAGATATC
The HRV 3C enzyme fusion protein gene sequence is shown in SEQ ID No.5; its 5' end carries a SacII restriction endonuclease site (CCGCGG), and its 3' end carries a stop codon (TAA) followed by the HindIII recognition site (AAGCTT):
CCGCGGCTCTGGAAGTGCTGTTTCAAGGTCCGGAGACCCTGTGCGGTGCGGAACTGGTGGACGCGCTGCAATTTGTTTGCGGTGATCGTGGCTTCTACTTTAACAAGCCGACCGGTTATGGTAGCAGCAGCCGTCGTGCGCCGCAGACCGGTATCGTTGACGAGTGCTGCTTCCGTAGCTGCGATCTGCGTCGTCTGGAAATGTATTGCGCGCCGCTGAAGCCGGCGAAAAGCGCGTAAAAGCTT
The above sequences were obtained by chemical synthesis.
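A quick way to sanity-check a synthetic insert of this kind is to translate it in silico and compare the result with the intended protein. The sketch below does this for SEQ ID No.4 using the standard genetic code. The reading-frame offset of 1 is an assumption (it is the frame that reproduces the His-tag and thrombin-site residues of SEQ ID No.1), and the helper names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# SEQ ID No.4 as given in the sequence listing (271 nt).
SEQ4 = "".join("""
tggccatatg caccatcatc atcatcattc ttctggtctg gtgccacgcg gtccggagac
cctgtgcggt gcggaactgg tggacgcgct gcaatttgtt tgcggtgatc gtggcttcta
ctttaacaag ccgaccggtt atggtagcag cagccgtcgt gcgccgcaga ccggtatcgt
tgacgagtgc tgcttccgta gctgcgatct gcgtcgtctg gaaatgtatt gcgcgccgct
gaagccggcg aaaagcgcgt aatgagatat c
""".split()).upper()


def translate(dna, offset=0):
    """Translate from `offset` up to (not including) the first stop codon."""
    protein = []
    for i in range(offset, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Assumed frame: the second base of the MscI site starts a codon.
protein = translate(SEQ4, offset=1)
print(len(SEQ4), "nt ->", len(protein), "aa")
print(protein)
```

The translated stretch should begin with the His-tag and thrombin-site residues and end with the C-terminus of mature IGF-I, which can be compared against SEQ ID No.1 and SEQ ID No.6.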
5.2 Construction of the thrombin fusion protein recombinant expression vector and the fusion expression engineering strain
The map of the pET-32a (+) expression vector and the sequence of its multiple cloning site are shown in FIGS. 4 and 5. The synthetic thrombin fusion protein cDNA sequence (SEQ ID No.4) was double-digested with MscI and EcoRV and ligated with T4 DNA ligase into pET-32a (+) vector digested with the same two enzymes, giving a recombinant expression vector designated pET-32a-Thrombin-IGF-I. The vector was transformed into the E. coli cloning strain Top10, and recombinants were selected at 37 ℃ on LB medium containing 50 μg/ml ampicillin; after verification by PCR and restriction digestion, the insert was confirmed by sequencing.
Plasmid was extracted from recombinants identified as correct and transformed into the E. coli expression strain Origami B (DE3), and recombinants were selected at 37 ℃ on LB medium containing 50 μg/ml ampicillin (Amp), 15 μg/ml kanamycin (Kan) and 12.5 μg/ml tetracycline (Tet). The resulting recombinant, an engineering strain expressing IGF-I as a Trx fusion, was named Thrombin-IGF and stored in 15% glycerol at -80 ℃. The fusion protein expressed by the Thrombin-IGF strain contains a thioredoxin (Trx) tag, a His-tag purification tag, a thrombin cleavage site and the mature IGF-I peptide sequence, and has a theoretical molecular weight of about 21.6 kDa.
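The theoretical molecular weight can be estimated directly from the amino acid sequence by summing average residue masses and adding one water. The sketch below applies this to SEQ ID No.1, written in one-letter code as three segments. The residue masses are standard average values; disulfide bonds and any post-translational changes are ignored, and the variable names are illustrative assumptions.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015

# SEQ ID No.1 split into Trx tag plus linker, His-tag plus thrombin site, and IGF-I.
TRX_LINKER = (
    "MSDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVAKLN"
    "IDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDANLAGSGSGH"
)
HIS_THROMBIN = "MHHHHHHSSGLVPR"
IGF1 = "GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA"

fusion = TRX_LINKER + HIS_THROMBIN + IGF1


def average_mass_da(sequence):
    """Average molecular mass of a linear, unmodified polypeptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


mass_kda = average_mass_da(fusion) / 1000
print(f"{len(fusion)} residues, about {mass_kda:.1f} kDa")
```

A value computed this way is only a sequence-based estimate; the apparent size on SDS-PAGE can differ from it, as the gel results for this construct illustrate.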
Frozen Thrombin-IGF engineering bacteria from a glycerol tube were inoculated at 0.2% into a test tube of fresh LB selective medium (containing 50 μg/ml Amp, 15 μg/ml Kan and 12.5 μg/ml Tet) and cultured overnight with shaking to activate the strain. The culture was transferred into 300 mL of LB medium and cultured with shaking at 37 ℃ for about 3 h; when the OD600 reached 0.6-0.8, isopropyl-β-D-thiogalactoside (IPTG) was added to a final concentration of 0.05 mM, expression was induced at 25 ℃ for 24 h, and the cells were collected by centrifugation. The wet cell weight was determined, lysis buffer (50 mM Tris-HCl, 0.5 M NaCl, pH 8.0) was added at a cell-to-buffer weight ratio of 1:15, and the cells were disrupted by sonication and analyzed by SDS-PAGE. The results are shown in FIG. 6, where M is the protein marker, 1 is the whole culture before induction, 2 is the whole culture after induction, and 3 is the lysate supernatant after induction. The lysate supernatant contains an expressed fusion protein with an apparent molecular weight close to 26 kDa, which is close to the theoretical molecular weight.
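The quantities in this step follow from simple dilution arithmetic (C1·V1 = C2·V2) and the 1:15 weight ratio. The sketch below works through them; the 100 mM IPTG stock concentration and the 2 g wet cell weight are hypothetical values chosen for illustration and are not given in the text.

```python
def stock_volume_ml(final_mM, culture_ml, stock_mM):
    """Volume of stock needed, ignoring the small volume the stock itself adds."""
    return final_mM * culture_ml / stock_mM


def lysis_buffer_g(wet_cell_weight_g, ratio=15):
    """Mass of lysis buffer for a cell-to-buffer weight ratio of 1:ratio."""
    return wet_cell_weight_g * ratio


IPTG_STOCK_MM = 100.0  # hypothetical stock concentration
WET_CELLS_G = 2.0      # hypothetical wet cell weight

iptg_ml = stock_volume_ml(final_mM=0.05, culture_ml=300, stock_mM=IPTG_STOCK_MM)
print(f"IPTG stock to add: {iptg_ml * 1000:.0f} uL")
print(f"lysis buffer for {WET_CELLS_G} g cells: {lysis_buffer_g(WET_CELLS_G):.0f} g")
```

Because the ratio is stated by weight, the helper returns grams of buffer; for a dilute aqueous buffer this is roughly the same number of millilitres.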
The fusion protein was identified by Western blot using an IGF-I monoclonal antibody (Abcam, cat. no. ab9572; the same applies below). The result is shown in FIG. 7, where M is the protein marker, 1 is the whole culture before induction, 2 is the whole culture after induction, and 3 is the lysate supernatant after induction. The fusion protein in the lysate supernatant has IGF-I immunoreactivity, and the figure also shows that a small amount of the fusion protein forms dimers.
5.3 Construction of the HRV 3C enzyme fusion protein recombinant expression vector and the fusion expression engineering strain
The map of the pET-48b (+) expression vector and the sequence of its multiple cloning site are shown in FIGS. 8 and 9. The synthetic HRV 3C enzyme fusion protein cDNA sequence (SEQ ID No.5) was double-digested with SacII and HindIII and ligated with T4 DNA ligase into pET-48b (+) vector digested with the same two enzymes, giving a recombinant expression vector designated pET-48b-HRV 3C-IGF-I. The vector was transformed into the E. coli cloning strain Top10, and recombinants were selected at 37 ℃ on LB medium containing 50 μg/ml kanamycin; after verification by PCR and restriction digestion, the insert was confirmed by sequencing.
Plasmid was extracted from recombinants identified as correct and transformed into the E. coli expression strain Rosetta-gami2(DE3), and recombinants were selected at 37 ℃ on LB medium containing 50 μg/ml Kan, 34 μg/ml chloramphenicol (Chl), 50 μg/ml streptomycin (Str) and 12.5 μg/ml Tet. The resulting recombinant, an engineering strain expressing IGF-I as a Trx fusion, was named HRV 3C-IGF and stored in 15% glycerol at -80 ℃. The fusion protein expressed by the HRV 3C-IGF strain contains a thioredoxin (Trx) tag, a His-tag purification tag, an HRV 3C cleavage site and the mature IGF-I peptide sequence, and has a theoretical molecular weight of about 23.5 kDa.
Frozen HRV 3C-IGF engineering bacteria from a glycerol tube were inoculated at 0.2% into a test tube of fresh LB selective medium (containing 50 μg/ml Kan, 50 μg/ml Str, 34 μg/ml Chl and 12.5 μg/ml Tet) and cultured overnight with shaking to activate the strain. The culture was transferred into 300 mL of LB medium and cultured with shaking at 37 ℃ for about 3 h; when the OD600 reached 0.6-0.8, isopropyl-β-D-thiogalactoside (IPTG) was added to a final concentration of 0.5 mM, expression was induced at 25 ℃ for 25 h, and the cells were collected by centrifugation. The wet cell weight was determined, lysis buffer (50 mM Tris-HCl, 0.5 M NaCl, pH 8.0) was added at a cell-to-buffer weight ratio of 1:15, and the cells were disrupted by sonication. The culture before and after induction and the lysate supernatant were analyzed by Western blot with the IGF-I monoclonal antibody. The result is shown in FIG. 10, where M is the protein marker, 1 is the whole culture after induction, and 2 is the lysate supernatant after induction. The fusion protein in the lysate supernatant has IGF-I immunoreactivity and runs near 26 kDa, which is close to the theoretical molecular weight.
5.4 Purification of the fusion proteins
The thrombin fusion protein and the HRV 3C enzyme fusion protein were expressed using the Thrombin-IGF and HRV 3C-IGF engineering strains constructed in sections 5.2 and 5.3. The fermentation broth was centrifuged to collect the cells, the cells were disrupted by sonication, and the supernatant was collected and subjected to affinity chromatography on a Ni-Sepharose 6FF column equilibrated with 20 mM Tris-HCl (pH 7.8), 0.5 M NaCl. The target protein was eluted with an imidazole gradient from 50 mM to 500 mM and collected. To reduce the effect of high salt concentration on protease activity, excess salt was removed using a G-15 desalting column.
5.5 Cleavage of the fusion proteins
The IGF-I mature peptide is about 10 kDa in size; if cleavage is correct, a band with IGF-I immunoreactivity should be detectable at around 10 kDa.
The thrombin cleavage system used 1 U of thrombin (Solarbio, cat. no. T8021) to cleave 100 μg of fusion protein. The final cleavage buffer was 20 mM Tris-HCl (pH 8.0), 0.15 M NaCl and 0, 2.5, 10 or 20 mM CaCl2, and digestion was carried out for 24 h at 37 ℃ in a shaker. The cleavage result is shown in FIG. 11, where M is the protein marker, 1 is without CaCl2, 2 is 2.5 mM CaCl2, 3 is 10 mM CaCl2, and 4 is 20 mM CaCl2. As shown in the figure, cleavage was best at a final CaCl2 concentration of 10 mM, at which thrombin cleaved the fusion protein almost completely; raising the CaCl2 concentration further impaired enzyme activity and reduced cleavage.
The HRV 3C cleavage system used 1 U or 10 U of HRV 3C enzyme (Takara, cat. no. 7360) to cleave 100 μg of fusion protein in a cleavage buffer of 50 mM Tris-HCl (pH 7.5) and 0.15 M NaCl at 4 ℃ for 24 h. The Western blot of the digestion products with the IGF-I monoclonal antibody is shown in FIG. 12, where M is the protein marker, 1 is with 1 U of enzyme, and 2 is with 10 U of enzyme. As can be seen, 1 U of enzyme cleaved the fusion protein to yield a small amount of mature IGF-I peptide; cleavage was better with 10 U of enzyme, which yielded a large amount of mature IGF-I peptide.
Comparative Example 1 Preparation and enzymatic cleavage of the enterokinase fusion protein
1.1 Obtaining the fusion protein gene sequence
The enterokinase fusion protein gene sequence is shown in SEQ ID No.10; its 5' end carries a KpnI restriction endonuclease site (GGTACC), and its 3' end carries tandem stop codons (TAATGA) followed by a HindIII recognition site (AAGCTT):
GGTACCGACGACGACGACAAGGGTCCGGAGACCCTGTGCGGTGCGGAACTGGTGGACGCGCTGCAATTTGTTTGCGGTGATCGTGGCTTCTACTTTAACAAGCCGACCGGTTATGGTAGCAGCAGCCGTCGTGCGCCGCAGACCGGTATCGTTGACGAGTGCTGCTTCCGTAGCTGCGATCTGCGTCGTCTGGAAATGTATTGCGCGCCGCTGAAGCCGGCGAAAAGCGCGTAATGAAAGCTT
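The layout described for this insert (5' restriction site, protease-site codons, coding region, stop codons, 3' restriction site) can be checked mechanically. The sketch below does so for SEQ ID No.10; the function name and the returned dictionary are illustrative, and the frame check assumes the reading frame starts at the first base of the KpnI site, which is consistent with the Gly-Thr-Asp-Asp-Asp-Asp-Lys stretch in SEQ ID No.2.

```python
# SEQ ID No.10 as given in the sequence listing (243 nt).
SEQ10 = "".join("""
ggtaccgacg acgacgacaa gggtccggag accctgtgcg gtgcggaact ggtggacgcg
ctgcaatttg tttgcggtga tcgtggcttc tactttaaca agccgaccgg ttatggtagc
agcagccgtc gtgcgccgca gaccggtatc gttgacgagt gctgcttccg tagctgcgat
ctgcgtcgtc tggaaatgta ttgcgcgccg ctgaagccgg cgaaaagcgc gtaatgaaag
ctt
""".split()).upper()


def check_insert_layout(seq, site_5p, site_codons, stops, site_3p):
    """Report whether each expected element sits where the text says it does."""
    coding_len = len(seq) - len(stops) - len(site_3p)
    return {
        "5' restriction site": seq.startswith(site_5p),
        "protease site codons": seq[len(site_5p):].startswith(site_codons),
        "stops + 3' restriction site": seq.endswith(stops + site_3p),
        "coding length multiple of 3": coding_len % 3 == 0,
    }


report = check_insert_layout(
    SEQ10,
    site_5p="GGTACC",               # KpnI
    site_codons="GACGACGACGACAAG",  # Asp-Asp-Asp-Asp-Lys
    stops="TAATGA",
    site_3p="AAGCTT",               # HindIII
)
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

The same helper could be pointed at SEQ ID No.4 or SEQ ID No.5 with their own sites, although for those inserts the reading frame does not begin at the first base of the 5' site, so the frame check would need adjusting.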
1.2 Construction of the enterokinase fusion protein recombinant expression vector and the fusion expression engineering strain
The synthetic enterokinase fusion protein cDNA sequence (SEQ ID No.10) was double-digested with KpnI and HindIII and ligated with T4 DNA ligase into pET-32a (+) vector digested with the same two enzymes, giving a recombinant expression vector designated pET-32a-Enterokinase-IGF-I. The vector was transformed into the E. coli cloning strain Top10, and recombinants were selected at 37 ℃ on LB medium containing 50 μg/ml ampicillin; after verification by PCR and restriction digestion, the insert was confirmed by sequencing.
Plasmid was extracted from recombinants identified as correct and transformed into competent cells of the E. coli expression strain Origami B (DE3), and recombinants were selected at 37 ℃ on LB medium containing 50 μg/ml ampicillin (Amp), 15 μg/ml kanamycin (Kan) and 12.5 μg/ml tetracycline (Tet). The resulting recombinant, an engineering strain expressing IGF-I as a Trx fusion, was named Ek-IGF and stored in 15% glycerol at -80 ℃. The fusion protein expressed by the Ek-IGF strain should contain a thioredoxin (Trx) tag, a His-tag purification tag, an enterokinase cleavage site and the mature IGF-I peptide sequence, and has a theoretical molecular weight of about 24.7 kDa.
Frozen Ek-IGF engineering bacteria from a glycerol tube were inoculated at 0.2% into a test tube of fresh LB selective medium (containing 50 μg/ml Amp, 15 μg/ml Kan and 12.5 μg/ml Tet) and cultured overnight with shaking to activate the strain. The culture was transferred into 300 mL of LB medium and cultured with shaking at 37 ℃ for about 3 h; when the OD600 reached 0.6-0.8, IPTG was added to a final concentration of 0.5 mM, expression was induced at 25 ℃ for 24 h, and the cells were collected by centrifugation. The wet cell weight was determined, lysis buffer (50 mM Tris-HCl, 0.5 M NaCl, pH 8.0) was added at a cell-to-buffer weight ratio of 1:15, and the cells were disrupted by sonication. The lysate supernatant and pellet were analyzed by Western blot with the IGF-I monoclonal antibody. The result is shown in FIG. 13, where M is the protein marker, 1 is the lysate supernatant, and 2 is the lysate pellet. The fusion protein in the lysate supernatant has IGF-I immunoreactivity and runs between 26 and 34 kDa.
1.3 Purification of the enterokinase fusion protein
The enterokinase fusion protein was expressed using the Ek-IGF engineering strain constructed in section 1.2 of this comparative example. The fermentation broth was centrifuged to collect the cells, the cells were disrupted by sonication, and the supernatant was collected and subjected to affinity chromatography on a Ni-Sepharose 6FF column equilibrated with 20 mM Tris-HCl (pH 7.8), 0.5 M NaCl. The target protein was eluted with an imidazole gradient from 50 mM to 500 mM and collected. To reduce the effect of high salt concentration on protease activity, excess salt was removed using a G-15 desalting column.
1.4 Enzymatic cleavage of the enterokinase fusion protein
The enterokinase cleavage system used different amounts of recombinant bovine enterokinase (Yaohai Biological, cat. no. ez00001) to cleave 1 mg of fusion protein. The final cleavage buffer was 20 mM Tris-HCl (pH 8.0), 50 mM NaCl and 2 mM CaCl2, and digestion was carried out at 16 ℃ for 24 h. The SDS-PAGE result for the digestion products is shown in FIG. 14, where M is the protein marker, 1 is 1 IU of enzyme, 2 is 5 IU, 3 is 10 IU, 4 is 50 IU, 5 is 100 IU, 6 is the positive control fusion protein cleaved with enterokinase, and 7 is the positive control fusion protein without enzyme. As can be seen, enterokinase cleaved the positive control protein but could not cleave the enterokinase fusion protein.
The embodiments described above are merely preferred embodiments given to fully illustrate the present invention, and the scope of protection of the invention is not limited to them. Equivalent substitutions or changes made by those skilled in the art on the basis of the invention all fall within its scope of protection, which is defined by the claims.
Sequence listing
<110> Wuhan Hiteck Biological Pharma Co., Ltd.
<120> method for screening enzyme digestion adaptive fusion protein and IGF-I preparation method
<130> WH2008252-1
<160> 10
<170> SIPOSequenceListing 1.0
<210> 1
<211> 199
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 1
Met Ser Asp Lys Ile Ile His Leu Thr Asp Asp Ser Phe Asp Thr Asp
1 5 10 15
Val Leu Lys Ala Asp Gly Ala Ile Leu Val Asp Phe Trp Ala Glu Trp
20 25 30
Cys Gly Pro Cys Lys Met Ile Ala Pro Ile Leu Asp Glu Ile Ala Asp
35 40 45
Glu Tyr Gln Gly Lys Leu Thr Val Ala Lys Leu Asn Ile Asp Gln Asn
50 55 60
Pro Gly Thr Ala Pro Lys Tyr Gly Ile Arg Gly Ile Pro Thr Leu Leu
65 70 75 80
Leu Phe Lys Asn Gly Glu Val Ala Ala Thr Lys Val Gly Ala Leu Ser
85 90 95
Lys Gly Gln Leu Lys Glu Phe Leu Asp Ala Asn Leu Ala Gly Ser Gly
100 105 110
Ser Gly His Met His His His His His His Ser Ser Gly Leu Val Pro
115 120 125
Arg Gly Pro Glu Thr Leu Cys Gly Ala Glu Leu Val Asp Ala Leu Gln
130 135 140
Phe Val Cys Gly Asp Arg Gly Phe Tyr Phe Asn Lys Pro Thr Gly Tyr
145 150 155 160
Gly Ser Ser Ser Arg Arg Ala Pro Gln Thr Gly Ile Val Asp Glu Cys
165 170 175
Cys Phe Arg Ser Cys Asp Leu Arg Arg Leu Glu Met Tyr Cys Ala Pro
180 185 190
Leu Lys Pro Ala Lys Ser Ala
195
<210> 2
<211> 228
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 2
Met Ser Asp Lys Ile Ile His Leu Thr Asp Asp Ser Phe Asp Thr Asp
1 5 10 15
Val Leu Lys Ala Asp Gly Ala Ile Leu Val Asp Phe Trp Ala Glu Trp
20 25 30
Cys Gly Pro Cys Lys Met Ile Ala Pro Ile Leu Asp Glu Ile Ala Asp
35 40 45
Glu Tyr Gln Gly Lys Leu Thr Val Ala Lys Leu Asn Ile Asp Gln Asn
50 55 60
Pro Gly Thr Ala Pro Lys Tyr Gly Ile Arg Gly Ile Pro Thr Leu Leu
65 70 75 80
Leu Phe Lys Asn Gly Glu Val Ala Ala Thr Lys Val Gly Ala Leu Ser
85 90 95
Lys Gly Gln Leu Lys Glu Phe Leu Asp Ala Asn Leu Ala Gly Ser Gly
100 105 110
Ser Gly His Met His His His His His His Ser Ser Gly Leu Val Pro
115 120 125
Arg Gly Ser Gly Met Lys Glu Thr Ala Ala Ala Lys Phe Glu Arg Gln
130 135 140
His Met Asp Ser Pro Asp Leu Gly Thr Asp Asp Asp Asp Lys Gly Pro
145 150 155 160
Glu Thr Leu Cys Gly Ala Glu Leu Val Asp Ala Leu Gln Phe Val Cys
165 170 175
Gly Asp Arg Gly Phe Tyr Phe Asn Lys Pro Thr Gly Tyr Gly Ser Ser
180 185 190
Ser Arg Arg Ala Pro Gln Thr Gly Ile Val Asp Glu Cys Cys Phe Arg
195 200 205
Ser Cys Asp Leu Arg Arg Leu Glu Met Tyr Cys Ala Pro Leu Lys Pro
210 215 220
Ala Lys Ser Ala
225
<210> 3
<211> 221
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 3
Met Ser Asp Lys Ile Ile His Leu Thr Asp Asp Ser Phe Asp Thr Asp
1 5 10 15
Val Leu Lys Ala Asp Gly Ala Ile Leu Val Asp Phe Trp Ala Glu Trp
20 25 30
Cys Gly Pro Cys Lys Met Ile Ala Pro Ile Leu Asp Glu Ile Ala Asp
35 40 45
Glu Tyr Gln Gly Lys Leu Thr Val Ala Lys Leu Asn Ile Asp Gln Asn
50 55 60
Pro Gly Thr Ala Pro Lys Tyr Gly Ile Arg Gly Ile Pro Thr Leu Leu
65 70 75 80
Leu Phe Lys Asn Gly Glu Val Ala Ala Thr Lys Val Gly Ala Leu Ser
85 90 95
Lys Gly Gln Leu Lys Glu Phe Leu Asp Ala Asn Leu Ala Gly Ser Gly
100 105 110
Ser Gly His Thr Ser Gly Gly Gly Gly Ser Asn Asn Asn Pro Pro Thr
115 120 125
Pro Thr Pro Ser Ser Gly Ser Gly His His His His His His Ser Ala
130 135 140
Ala Leu Glu Val Leu Phe Gln Gly Pro Glu Thr Leu Cys Gly Ala Glu
145 150 155 160
Leu Val Asp Ala Leu Gln Phe Val Cys Gly Asp Arg Gly Phe Tyr Phe
165 170 175
Asn Lys Pro Thr Gly Tyr Gly Ser Ser Ser Arg Arg Ala Pro Gln Thr
180 185 190
Gly Ile Val Asp Glu Cys Cys Phe Arg Ser Cys Asp Leu Arg Arg Leu
195 200 205
Glu Met Tyr Cys Ala Pro Leu Lys Pro Ala Lys Ser Ala
210 215 220
<210> 4
<211> 271
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 4
tggccatatg caccatcatc atcatcattc ttctggtctg gtgccacgcg gtccggagac 60
cctgtgcggt gcggaactgg tggacgcgct gcaatttgtt tgcggtgatc gtggcttcta 120
ctttaacaag ccgaccggtt atggtagcag cagccgtcgt gcgccgcaga ccggtatcgt 180
tgacgagtgc tgcttccgta gctgcgatct gcgtcgtctg gaaatgtatt gcgcgccgct 240
gaagccggcg aaaagcgcgt aatgagatat c 271
<210> 5
<211> 245
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 5
ccgcggctct ggaagtgctg tttcaaggtc cggagaccct gtgcggtgcg gaactggtgg 60
acgcgctgca atttgtttgc ggtgatcgtg gcttctactt taacaagccg accggttatg 120
gtagcagcag ccgtcgtgcg ccgcagaccg gtatcgttga cgagtgctgc ttccgtagct 180
gcgatctgcg tcgtctggaa atgtattgcg cgccgctgaa gccggcgaaa agcgcgtaaa 240
agctt 245
<210> 6
<211> 70
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 6
Gly Pro Glu Thr Leu Cys Gly Ala Glu Leu Val Asp Ala Leu Gln Phe
1 5 10 15
Val Cys Gly Asp Arg Gly Phe Tyr Phe Asn Lys Pro Thr Gly Tyr Gly
20 25 30
Ser Ser Ser Arg Arg Ala Pro Gln Thr Gly Ile Val Asp Glu Cys Cys
35 40 45
Phe Arg Ser Cys Asp Leu Arg Arg Leu Glu Met Tyr Cys Ala Pro Leu
50 55 60
Lys Pro Ala Lys Ser Ala
65 70
<210> 7
<211> 295
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 7
Thr Phe Gly Ala Gly Glu Ala Asp Cys Gly Leu Arg Pro Leu Phe Glu
1 5 10 15
Lys Lys Gln Val Gln Asp Gln Thr Glu Lys Glu Leu Phe Glu Ser Tyr
20 25 30
Ile Glu Gly Arg Ile Val Glu Gly Gln Asp Ala Glu Val Gly Leu Ser
35 40 45
Pro Trp Gln Val Met Leu Phe Arg Lys Ser Pro Gln Glu Leu Leu Cys
50 55 60
Gly Ala Ser Leu Ile Ser Asp Arg Trp Val Leu Thr Ala Ala His Cys
65 70 75 80
Leu Leu Tyr Pro Pro Trp Asp Lys Asn Phe Thr Val Asp Asp Leu Leu
85 90 95
Val Arg Ile Gly Lys His Ser Arg Thr Arg Tyr Glu Arg Lys Val Glu
100 105 110
Lys Ile Ser Met Leu Asp Lys Ile Tyr Ile His Pro Arg Tyr Asn Trp
115 120 125
Lys Glu Asn Leu Asp Arg Asp Ile Ala Leu Leu Lys Leu Lys Arg Pro
130 135 140
Ile Glu Leu Ser Asp Tyr Ile His Pro Val Cys Leu Pro Asp Lys Gln
145 150 155 160
Thr Ala Ala Lys Leu Leu His Ala Gly Phe Lys Gly Arg Val Thr Gly
165 170 175
Trp Gly Asn Arg Arg Glu Thr Trp Thr Thr Ser Val Ala Glu Val Gln
180 185 190
Pro Ser Val Leu Gln Val Val Asn Leu Pro Leu Val Glu Arg Pro Val
195 200 205
Cys Lys Ala Ser Thr Arg Ile Arg Ile Thr Asp Asn Met Phe Cys Ala
210 215 220
Gly Tyr Lys Pro Gly Glu Gly Lys Arg Gly Asp Ala Cys Glu Gly Asp
225 230 235 240
Ser Gly Gly Pro Phe Val Met Lys Ser Pro Tyr Asn Asn Arg Trp Tyr
245 250 255
Gln Met Gly Ile Val Ser Trp Gly Glu Gly Cys Asp Arg Asp Gly Lys
260 265 270
Tyr Gly Phe Tyr Thr His Val Phe Arg Leu Lys Lys Trp Ile Gln Lys
275 280 285
Val Ile Asp Arg Leu Gly Ser
290 295
<210> 8
<211> 248
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 8
Cys Gly Lys Lys Leu Val Thr Gln Glu Val Ser Pro Lys Ile Val Gly
1 5 10 15
Gly Ser Asp Ser Arg Glu Gly Ala Trp Pro Trp Val Val Ala Leu Tyr
20 25 30
Phe Asp Asp Gln Gln Val Cys Gly Ala Ser Leu Val Ser Arg Asp Trp
35 40 45
Leu Val Ser Ala Ala His Cys Val Tyr Gly Arg Asn Met Glu Pro Ser
50 55 60
Lys Trp Lys Ala Val Leu Gly Leu His Met Ala Ser Asn Leu Thr Ser
65 70 75 80
Pro Gln Ile Glu Thr Arg Leu Ile Asp Gln Ile Val Ile Asn Pro His
85 90 95
Tyr Asn Lys Arg Arg Lys Asn Asn Asp Ile Ala Met Met His Leu Glu
100 105 110
Met Lys Val Asn Tyr Thr Asp Tyr Ile Gln Pro Ile Cys Leu Pro Glu
115 120 125
Glu Asn Gln Val Phe Pro Pro Gly Arg Ile Cys Ser Ile Ala Gly Trp
130 135 140
Gly Ala Leu Ile Tyr Gln Gly Ser Thr Ala Asp Val Leu Gln Glu Ala
145 150 155 160
Asp Val Pro Leu Leu Ser Asn Glu Lys Cys Gln Gln Gln Met Pro Glu
165 170 175
Tyr Asn Ile Thr Glu Asn Met Val Cys Ala Gly Tyr Glu Ala Gly Gly
180 185 190
Val Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro Leu Met Cys Gln Glu
195 200 205
Asn Asn Arg Trp Leu Leu Ala Gly Val Thr Ser Phe Gly Tyr Gln Cys
210 215 220
Ala Leu Pro Asn Arg Pro Gly Val Tyr Ala Arg Val Pro Arg Phe Thr
225 230 235 240
Glu Trp Ile Gln Ser Phe Leu His
245
<210> 9
<211> 182
<212> PRT
<213> Artificial Sequence (Artificial Sequence)
<400> 9
Gly Pro Asn Thr Glu Phe Ala Leu Ser Leu Leu Arg Lys Asn Ile Met
1 5 10 15
Thr Ile Thr Thr Ser Lys Gly Glu Phe Thr Gly Leu Gly Ile His Asp
20 25 30
Arg Val Cys Val Ile Pro Thr His Ala Gln Pro Gly Asp Asp Val Leu
35 40 45
Val Asn Gly Gln Lys Ile Arg Val Lys Asp Lys Tyr Lys Leu Val Asp
50 55 60
Pro Glu Asn Ile Asn Leu Glu Leu Thr Val Leu Thr Leu Asp Arg Asn
65 70 75 80
Glu Lys Phe Arg Asp Ile Arg Gly Phe Ile Ser Glu Asp Leu Glu Gly
85 90 95
Val Asp Ala Thr Leu Val Val His Ser Asn Asn Phe Thr Asn Thr Ile
100 105 110
Leu Glu Val Gly Pro Val Thr Met Ala Gly Leu Ile Asn Leu Ser Ser
115 120 125
Thr Pro Thr Asn Arg Met Ile Arg Tyr Asp Tyr Ala Thr Lys Thr Gly
130 135 140
Gln Cys Gly Gly Val Leu Cys Ala Thr Gly Lys Ile Phe Gly Ile His
145 150 155 160
Val Gly Gly Asn Gly Arg Gln Gly Phe Ser Ala Gln Leu Lys Lys Gln
165 170 175
Tyr Phe Val Glu Lys Gln
180
<210> 10
<211> 243
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 10
ggtaccgacg acgacgacaa gggtccggag accctgtgcg gtgcggaact ggtggacgcg 60
ctgcaatttg tttgcggtga tcgtggcttc tactttaaca agccgaccgg ttatggtagc 120
agcagccgtc gtgcgccgca gaccggtatc gttgacgagt gctgcttccg tagctgcgat 180
ctgcgtcgtc tggaaatgta ttgcgcgccg ctgaagccgg cgaaaagcgc gtaatgaaag 240
ctt 243

Claims (10)

1. A method for screening an enzyme digestion adaptive fusion protein, comprising the following steps:
(1) inserting the gene encoding the target protein sequence into each of a plurality of different expression vectors containing protease cleavage sites to obtain the amino acid sequences of a plurality of different fusion proteins; and predicting the secondary structure of each fusion protein amino acid sequence with the secondary structure prediction algorithm PSIPRED;
(2) using the amino acid sequences of the different fusion proteins from step (1) and their corresponding secondary structure prediction results as input files, and predicting their three-dimensional structures with the structure prediction algorithm I-TASSER;
(3) performing molecular docking between each three-dimensional structure prediction obtained in step (2) and the protease corresponding to that fusion protein using the ClusPro algorithm, selecting for experimental verification the cleavage systems in which the proportion of correct docking results exceeds a threshold, and screening out the fusion protein best adapted to its protease.
2. The method of claim 1, further comprising the following step:
(4) performing codon optimization on the gene encoding the best-adapted fusion protein obtained in step (3).
3. The method of claim 1, wherein the target protein is human insulin-like growth factor-I.
4. The method according to claim 3, wherein the protease is enterokinase, thrombin or HRV 3C protease, and when each protease is docked with the three-dimensional structure prediction of its corresponding fusion protein, the docking sites are selected as follows:
for the fusion protein corresponding to enterokinase, every lysine (Lys) in the amino acid sequence, together with the 4 amino acids on its left and right, is selected as a candidate cleavage site;
for the fusion protein corresponding to thrombin, every arginine (Arg), together with the 4 amino acids on its left and right, is selected as a candidate cleavage site;
for the fusion protein corresponding to HRV 3C protease, every glutamine (Gln), together with the 4 amino acids on its left and right, is selected as a candidate cleavage site.
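The site-selection rule above can be sketched as a simple sequence scan. This is only an illustration under stated assumptions: "4 amino acids on the left and right" is read as four residues on each side, windows are truncated at the chain ends, and the function and dictionary names are hypothetical. The example input is the mature IGF-I sequence of SEQ ID No.6.

```python
# Residue scanned for each protease, as listed in the claim.
PROTEASE_RESIDUE = {"enterokinase": "K", "thrombin": "R", "HRV 3C": "Q"}

IGF1 = "GPETLCGAELVDALQFVCGDRGFYFNKPTGYGSSSRRAPQTGIVDECCFRSCDLRRLEMYCAPLKPAKSA"


def candidate_sites(sequence, protease, flank=4):
    """Return (1-based position, window) for each occurrence of the key residue."""
    residue = PROTEASE_RESIDUE[protease]
    sites = []
    for i, aa in enumerate(sequence):
        if aa == residue:
            start = max(0, i - flank)  # truncate the window at the N-terminus
            sites.append((i + 1, sequence[start:i + flank + 1]))
    return sites


for protease in PROTEASE_RESIDUE:
    sites = candidate_sites(IGF1, protease)
    print(protease, [pos for pos, _ in sites])
```

In practice the scan would be run over the full fusion protein sequence, so that the intended protease site in the linker and any look-alike residues in the tag or target protein are all offered to the docking step.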
5. The method of claim 3, wherein the threshold value in step (3) is 50%.
6. A method for preparing recombinant human insulin-like growth factor-I, characterized in that a fusion protein having the amino acid sequence shown in SEQ ID NO.1 is expressed in a prokaryotic host using the pET-32a (+) vector, and the expressed fusion protein is cleaved with thrombin.
7. The method according to claim 6, wherein the gene encoding the fusion protein of SEQ ID No.1 is:
(1) the gene sequence shown in SEQ ID NO.4; or
(2) a gene sequence which has 90 to 100 percent homology with the gene sequence shown in SEQ ID NO.4 and encodes a protein with the same function; or
(3) a gene sequence which is derived from (1) by adding, deleting or replacing one or more codons in the gene sequence shown in SEQ ID NO.4 and encodes a protein with the same activity.
8. A method for preparing recombinant human insulin-like growth factor-I, characterized in that a fusion protein having the amino acid sequence shown in SEQ ID NO.3 is expressed in a prokaryotic host using the pET-48b (+) vector, and the expressed fusion protein is cleaved with HRV 3C enzyme.
9. The method according to claim 8, wherein the gene encoding the fusion protein of SEQ ID No.3 is:
(1) the gene sequence shown in SEQ ID NO.5; or
(2) a gene sequence which has 90 to 100 percent homology with the gene sequence shown in SEQ ID NO.5 and encodes a protein with the same function; or
(3) a gene sequence which is derived from (1) by adding, deleting or replacing one or more codons in the gene sequence shown in SEQ ID NO.5 and encodes a protein with the same activity.
10. The process according to any one of claims 6 to 9, wherein the prokaryotic host is a strain of BL21(DE3) E.coli, a strain of Rosetta-gami B (DE3) E.coli, a strain of Origami B (DE3) E.coli or a strain of Rosetta-gami2(DE3) E.coli.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010952359.4A CN112111504A (en) 2020-09-11 2020-09-11 Method for screening enzyme digestion adaptive fusion protein and IGF-I preparation method

Publications (1)

Publication Number Publication Date
CN112111504A true CN112111504A (en) 2020-12-22

Family

ID=73801900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010952359.4A Pending CN112111504A (en) 2020-09-11 2020-09-11 Method for screening enzyme digestion adaptive fusion protein and IGF-I preparation method

Country Status (1)

Country Link
CN (1) CN112111504A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115340996A (en) * 2022-08-30 2022-11-15 态创生物科技(广州)有限公司 Co-expression method of multi-subunit protein by using specific enzyme cutting site

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1769456A (en) * 2005-05-20 2006-05-10 成都西玛生物科技有限公司 Recombinant a human peptide production method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
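Ning Junkai: "Progress in prokaryotic preparation processes for insulin-like growth factor-1 (IGF-1)", Strait Pharmaceutical Journal *
Yi Huawei et al.: "Research progress in predicting protein stability based on amino acid sequence and simulated structure", Biotechnology Bulletin *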

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201222